-------- Forwarded Message --------
We apologize if you receive multiple copies of this call for papers.
--------------------------------------------------------------------------------
13th Workshop on Resiliency in High Performance Computing (Resilience)
in Clusters, Clouds, and Grids
<https://www.csm.ornl.gov/srt/conferences/Resilience/2020>
in conjunction with
the 26th International European Conference on Parallel and Distributed
Computing (Euro-Par), Warsaw, Poland
August 24 - 28, 2020
<http://2020.euro-par.org>
2020 Workshop Format:
Due to the exceptional situation of COVID-19, this year Euro-Par and its
workshops will be organized as an all-virtual event. This includes the
main conference and workshops. Accepted workshop papers must be
presented by one of the authors in order to be included in the
proceedings. There will be a single minimal registration fee for each
accepted paper to cover expenses associated with organization and
proceedings publication. Only one author per paper needs to register
(150 euros). Lastly, the preferred presentation format for the workshop
is a streaming presentation, with slides and pre-recorded video
presentations used in exceptional situations.
Overview:
Resilience is a critical challenge as high performance computing (HPC)
systems continue to increase component counts, individual component
reliability decreases (for example, due to shrinking process technology
and near-threshold voltage (NTV) operation), hardware complexity
increases (for example, due to heterogeneous computing), and software
complexity increases (for example, due to complex data flows and
workflows, real-time requirements, and integration of artificial
intelligence (AI) technologies with traditional applications).
Correctness and execution efficiency, in spite of faults, errors, and
failures, are essential to ensure the success of HPC systems, cluster
computing environments, Grid computing infrastructures, and Cloud
computing services. The impact of faults, errors, and failures in such
HPC systems can range from financial losses due to system downtime
(sometimes several tens of thousands of dollars per lost system-hour),
to financial losses due to unnecessary overprovisioning (acquisition and
operating costs), to financial losses and legal liabilities due to
erroneous or delayed output.
The emergence of AI technology opens up new possibilities, but also new
problems. Using AI technology for operational intelligence that enables
resilience in HPC systems and centers is a complex control problem,
while designing resilient AI technology for HPC applications is a
difficult algorithmic problem. Resilience for HPC systems encompasses a
wide spectrum of fundamental and applied research and development,
including theoretical foundations, error/failure and anomaly detection,
monitoring and control, end-to-end data integrity, enabling
infrastructure, and resilient algorithms.
This workshop brings together experts in the community to further
research and development in HPC resilience and to facilitate exchanges
across the computational paradigms of extreme-scale HPC, cluster
computing, Grid computing, and Cloud computing.
Submission Guidelines:
Authors are invited to submit papers electronically in English in PDF
format. Submitted manuscripts should be structured as technical papers
and BETWEEN 10 AND 12 PAGES, including figures, tables and references,
using Springer's Lecture Notes in Computer Science (LNCS) format at
<http://www.springer.com/computer/lncs?SGWID=0-164-6-793341-0>.
Papers with fewer than 10 or more than 12 pages will not be accepted due
to publisher guidelines. Submissions should include an abstract,
keywords, and the e-mail address of the corresponding author. Papers not
conforming to these guidelines may be returned without review. All
manuscripts will be reviewed and will be judged on correctness,
originality, technical strength, significance, quality of presentation,
and interest and relevance to the conference attendees. Submitted papers
must represent original unpublished research that is not currently under
review for any other conference or journal. Papers not following these
guidelines will be rejected without review and further action may be
taken, including (but not limited to) notifications sent to the heads of
the institutions of the authors and sponsors of the conference.
Submissions received after the due date or not appropriately structured
may also not be considered. The proceedings will be published in
Springer's LNCS as post-conference proceedings. At least one author of
an accepted paper must register for and attend the workshop for
inclusion in the proceedings. Authors may contact the workshop program
chairs for more information.
Important websites:
- Resilience 2020 Website:
<https://www.csm.ornl.gov/srt/conferences/Resilience/2020>
- Resilience 2020 Submissions:
<https://easychair.org/conferences/?conf=europar2020workshop>
- Euro-Par 2020 website:
<http://2020.euro-par.org>
Topics of interest include, but are not limited to:
- Theoretical foundations for resilience:
  - Metrics and measurement
  - Statistics and optimization
  - Simulation and emulation
  - Formal methods
  - Efficiency modeling and uncertainty quantification
  - Experience reports
- Error/failure/anomaly detection and reliability/dependability modeling:
  - Statistical analyses
  - Machine learning and artificial intelligence
  - Digital twins
  - Data collection and aggregation
  - Information visualization
- Monitoring and control for resilience:
  - Center, system and application monitoring and control
  - Reliability, availability, serviceability and performability
  - Tunable fidelity and quality of service
  - Automated response and recovery
  - Operational intelligence to enable resilience
- End-to-end integrity:
  - Fault tolerant design of centers, systems and applications
  - Forward migration and verification
  - Degraded operation
  - Error propagation, failure cascades, and error/failure containment
  - Testing and evaluation, including fault/error/failure injection
- Enabling infrastructure for resilience:
  - Reliability, availability, serviceability systems
  - System software and middleware
  - Resilience extensions for programming models
  - Tools and frameworks
  - Support for resilience in heterogeneous architectures
- Resilient algorithms:
  - Algorithmic detection and correction
  - Resilient solvers and algorithm-based fault tolerance
  - Fault tolerant numerical methods
  - Robust iterative algorithms
  - Resilient artificial intelligence
Important Dates:
- Workshop papers due: June 12, 2020 (extended)
- Workshop author notification: July 21, 2020
- Workshop author registration: TBA
- Workshop paper (for informal workshop proceedings): July 21, 2020
- Workshop date: August 24 or 25, 2020
- Workshop camera-ready papers: September 11, 2020 (after the conference)
General Co-Chairs:
- Stephen L. Scott
Tennessee Tech University, USA
scottsl@ornl.gov
- Christian Engelmann
Oak Ridge National Laboratory, USA
engelmannc@ornl.gov
Program Co-Chairs:
- Ferrol Aderholdt
Middle Tennessee State University, USA
ferrol.aderholdt@mtsu.edu
- Thomas Naughton
Oak Ridge National Laboratory, USA
naughtont@ornl.gov
Workshop Chair Emeritus:
- Chokchai (Box) Leangsuksun
Louisiana Tech University, USA
box@latech.edu
Program Committee:
- Wesley Bland, Intel Corporation, USA
- Hans-Joachim Bungartz, Technical University of Munich, Germany
- Marc Casas, Barcelona Supercomputing Center, Spain
- Zizhong Chen, University of California at Riverside, USA
- Robert Clay, Sandia National Laboratories, USA
- Nathan DeBardeleben, Los Alamos National Laboratory, USA
- James Elliott, Sandia National Laboratories, USA
- Kurt Ferreira, Sandia National Laboratories, USA
- Saurabh Hukerikar, NVIDIA, USA
- Ignacio Laguna, Lawrence Livermore National Laboratory, USA
- Scott Levy, Sandia National Laboratories, USA
- Rolf Riesen, Intel Corporation, USA
- Yves Robert, ENS Lyon, France
- Thomas Ropars, Université Grenoble Alpes, France
- Martin Schulz, Technical University of Munich, Germany
- Keita Teranishi, Sandia National Laboratories, USA
_________________________________________________________________________
Thomas Naughton
naughtont@ornl.gov
Research Associate (865) 576-4184