[Fwd: [computational.science] Resilience 2009 submission deadline has been extended to March 4] - Cfp

26 Feb 2009


      -------- Original Message --------
Subject: 	[computational.science] Resilience 2009 submission deadline has 
been extended to March 4
Date: 	Wed, 25 Feb 2009 13:35:23 -0500
From: 	Christian Engelmann engelmannc@ornl.gov
Organization: 	"OptimaNumerics"
To: 	Computational Science Mailing List 
computational.science@lists.OptimaNumerics.com
The paper submission deadline has been extended to March 4 (firm deadline).
We apologize if you receive multiple copies of this CFP.
-----------------------------------------------------------------------
Call for Papers
2nd International Workshop on Resiliency in High Performance Computing
                          (Resilience 2009)
              http://xcr.cenit.latech.edu/resilience2009
                        in conjunction with the
International Symposium on High Performance Distributed Computing (HPDC)
                   June 9-13, 2009 Munich, Germany
Recent trends in high-performance computing (HPC) systems have clearly
indicated that future increases in performance, in excess of those
resulting from improvements in single-processor performance, will be
achieved through corresponding increases in system scale, i.e., using a
significantly larger component count. As the raw computational
performance of the world's fastest HPC systems increases from today's
tera-scale to next-generation peta-scale capability and beyond,
their number of computational, networking, and storage components will
grow from the ten to one hundred thousand compute nodes of today's
systems to several hundred thousand compute nodes and more in
the foreseeable future. This substantial growth in system scale, and the
resulting component count, poses a challenge for HPC system and
application software with respect to fault tolerance and resilience.
Furthermore, recent experiences on extreme-scale HPC systems with
non-recoverable soft errors, i.e., bit flips in memory, cache,
registers, and logic, have added another major source of concern. The
probability of such errors not only grows with system size, but also
with increasing architectural vulnerability caused by employing
accelerators, such as FPGAs and GPUs, and by shrinking nanometer
technology. Reactive fault tolerance technologies, such as
checkpoint/restart, are unable to handle high failure rates due to
associated overheads, while proactive resiliency technologies, such as
preemptive migration, simply fail because random soft errors cannot be
predicted. Moreover, soft errors may even remain undetected, resulting in
silent data corruption.
The goal of this Workshop is to bring together experts in the area of fault
tolerance and resiliency for HPC to present the latest achievements and to
discuss the challenges ahead. Accepted papers will be included in the HPDC
conference proceedings published by ACM. Resilience 2009 is the follow-on to the
successful Resilience 2008 workshop http://xcr.cenit.latech.edu/resilience2008
held in conjunction with CCGrid in Lyon, France.
Important Dates:
- Paper Submission Deadline : March  4, 2009 (firm)
- Notification Deadline     : March 18, 2009
- Camera Ready Deadline     : April  2, 2009
Submission Guidelines:
Original, unpublished work is required. Submissions shall be a maximum of 10 ACM
SIG style pages (http://www.acm.org/sigs/publications/proceedings-templates),
including tables and illustrations. All submitted manuscripts will be reviewed
by a distinguished international program committee. Accepted contributions will
be published with the HPDC conference proceedings through ACM. Papers should be
submitted electronically via https://ssl.linklings.net/conferences/hpdc.
Topics of interest include, but are not limited to:
- Reports on current HPC system and application resiliency
- HPC resiliency metrics and standards
- HPC system and application resiliency analysis
- HPC system and application-level fault handling and anticipation
- HPC system and application health monitoring
- Resiliency for HPC file and storage systems
- System-level checkpoint/restart for HPC
- System-level preemptive migration for HPC
- Algorithm-based resiliency for HPC
- Fault tolerant MPI concepts and solutions
- Soft error detection and recovery in HPC systems
- HPC system and application log analysis
- Statistical methods to identify failure root causes
- Fault injection studies in HPC environments
- High availability solutions for HPC systems
- Reliability and availability analysis
- Hardware for fault detection and recovery
General Co-Chairs:
- Stephen L. Scott
  Computer Science and Mathematics Division
  Oak Ridge National Laboratory
  scottsl@ornl.gov
- Chokchai (Box) Leangsuksun
  SWEPCO Endowed Associate Professor of Computer Science,
  Louisiana Tech University, USA
  box@latech.edu
Program Chair:
- Christian Engelmann
  Computer Science and Mathematics Division
  Oak Ridge National Laboratory
  engelmannc@ornl.gov
Program Committee:
- Ann Gentile, Sandia National Laboratories, USA
- Aurelien Bouteiller, University of Tennessee, USA
- Chokchai (Box) Leangsuksun, Louisiana Tech University, USA
- Christian Engelmann, Oak Ridge National Laboratory, USA
- Daniel S. Katz, Louisiana State University, USA
- Dan Stanzione, Arizona State University, USA
- Franck Cappello, INRIA, France
- Geoffroy Vallee, Oak Ridge National Laboratory, USA
- George Bosilca, University of Tennessee, USA
- George Ostrouchov, Oak Ridge National Laboratory, USA
- Greg Bronevetsky, Lawrence Livermore National Laboratory, USA
- Gregory M. Thorson, SGI, USA
- Hong Ong, Oak Ridge National Laboratory, USA
- Jim Brandt, Sandia National Laboratories, USA
- John T. Daly, Center for Exceptional Computing, USA
- Jon Stearley, Sandia National Laboratories, USA
- Li Ou, Dell, USA
- Mihaela Paun, Louisiana Tech University, USA
- Nathan DeBardeleben, Los Alamos National Laboratory, USA
- Paul Hargrove, Lawrence Berkeley National Laboratory, USA
- Stephen Poole, Oak Ridge National Laboratory, USA
- Stephen L. Scott, Oak Ridge National Laboratory, USA
- Sudharshan Vazhkudai, Oak Ridge National Laboratory, USA
- Thomas Naughton, Oak Ridge National Laboratory, USA
- Tong Liu, Mellanox, USA
- Xian-He Sun, Illinois Institute of Technology, USA
- Xubin (Ben) He, Tennessee Tech University, USA
- Yung-Chin Fang, Dell, USA
- Zhiling Lan, Illinois Institute of Technology, USA
-- 
-----------------------------------------------------------------------
Dr. Christian Engelmann                        Phone: +1 (865) 574-3132
Research and Development Staff Member            Fax: +1 (865) 576-5491
Oak Ridge National Laboratory                    One Bethel Valley Road
mailto:engelmannc@ornl.gov                       P.O. Box 2008, MS-6173
http://www.csm.ornl.gov/~engelman              Oak Ridge, TN 37831, USA
-----------------------------------------------------------------------