[Apologies for multiple postings]
ScaDL 2022: Scalable Deep Learning over Parallel And Distributed
Infrastructure - An IPDPS 2022 Workshop
https://2022.scadl.org
Scope of the Workshop
Recently, Deep Learning (DL) has received tremendous attention in
the research community because of the impressive results obtained
for a large number of machine learning problems. The success of
state-of-the-art deep learning systems relies on training deep
neural networks over massive amounts of training data, which
typically requires a large-scale distributed computing
infrastructure. To run these training jobs in a scalable and
efficient manner on cloud infrastructure or dedicated HPC systems,
several interesting research topics specific to DL have emerged.
The sheer size and complexity of deep learning models trained over
large amounts of data make them hard to converge in a reasonable
amount of time, demanding advances along multiple research
directions such as model/data parallelism, model/data compression,
distributed optimization algorithms for DL convergence,
synchronization strategies, efficient communication, and specific
hardware acceleration.
ScaDL seeks to advance the following research directions:
- Asynchronous and Communication-Efficient SGD: Stochastic
gradient descent is at the core of large-scale machine learning.
Parallelizing SGD gradient computation across multiple nodes
increases the data processed per iteration, but exposes SGD to
communication and synchronization delays and to unpredictable node
failures in the system. Thus, there is a critical need to design
robust and scalable distributed SGD methods that achieve fast
error-convergence in spite of such system variability (a toy
data-parallel SGD sketch follows this list).
- High performance computing aspects: Deep learning is highly
compute-intensive. Algorithms for kernel computations on commonly
used accelerators (e.g., GPUs), efficient techniques for
communicating gradients, and fast loading of data from storage are
critical for training performance.
- Model and Gradient Compression Techniques: Techniques such as
reducing the number of weights and the size of weight tensors help
reduce compute complexity. Lower-bit representations obtained via
quantization, together with sparsification, allow more efficient
use of memory and communication bandwidth (see the compression
sketch after this list).
- Distributed Trustworthy AI: New techniques are needed to meet
the goal of global trustworthiness (e.g., fairness and adversarial
robustness) efficiently in a distributed DL setting.
- Emerging AI Hardware Accelerators: With the proliferation of new
hardware accelerators for AI, such as in-memory computing (analog
AI) and neuromorphic computing, novel methods and algorithms need
to be introduced to adapt to the underlying properties of the new
hardware (for example, the non-idealities of phase-change memory
(PCM) and its cycle-to-cycle statistical variations).
- The Intersection of Distributed DL and Neural Architecture
Search (NAS): NAS is increasingly being used to automate the
synthesis of neural networks. However, given the huge
computational demands of NAS, distributed DL is critical to make
NAS computationally tractable (e.g., differentiable distributed
NAS).
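To make the data-parallel SGD direction above concrete, here is a
minimal, self-contained NumPy sketch (illustrative only, with
hypothetical names; not a reference implementation from the
workshop) of synchronous data-parallel SGD on a least-squares
problem. Each simulated worker computes a gradient on its own data
shard and the coordinator averages the gradients, mimicking an
all-reduce step:

    import numpy as np

    rng = np.random.default_rng(0)
    n_workers, n_samples, dim = 4, 4000, 10

    # Synthetic least-squares data, split into one shard per simulated worker.
    X = rng.normal(size=(n_samples, dim))
    w_true = rng.normal(size=dim)
    y = X @ w_true + 0.01 * rng.normal(size=n_samples)
    shards = list(zip(np.array_split(X, n_workers), np.array_split(y, n_workers)))

    def local_gradient(w, X_local, y_local):
        # Gradient of 0.5 * ||X_local w - y_local||^2 / m on one worker's shard.
        residual = X_local @ w - y_local
        return X_local.T @ residual / len(y_local)

    w = np.zeros(dim)
    lr = 0.1
    for step in range(200):
        # In a real system these gradients are computed in parallel on
        # separate nodes; the average corresponds to an all-reduce.
        grads = [local_gradient(w, X_s, y_s) for X_s, y_s in shards]
        w -= lr * np.mean(grads, axis=0)

    print("parameter error:", np.linalg.norm(w - w_true))

A delayed or failed worker stalls the averaging step above, which
is exactly what asynchronous and straggler-resilient SGD variants
aim to relax.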
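Similarly, for the model/gradient compression direction, the
following sketch (again illustrative NumPy with hypothetical helper
names, not an endorsed method) combines two primitives mentioned
above, top-k sparsification and uniform 8-bit quantization of the
surviving gradient values, before "communication":

    import numpy as np

    def topk_sparsify(grad, k):
        # Keep only the k largest-magnitude entries; send (indices, values).
        idx = np.argpartition(np.abs(grad), -k)[-k:]
        return idx, grad[idx]

    def quantize_uint8(values):
        # Uniform 8-bit quantization: send (offset, scale, uint8 codes).
        lo, hi = float(values.min()), float(values.max())
        scale = (hi - lo) / 255.0 or 1.0   # guard against a constant vector
        codes = np.round((values - lo) / scale).astype(np.uint8)
        return lo, scale, codes

    def dequantize(lo, scale, codes):
        return lo + scale * codes.astype(np.float64)

    rng = np.random.default_rng(1)
    grad = rng.normal(size=100_000)

    idx, vals = topk_sparsify(grad, k=1_000)   # roughly 1% of entries survive
    lo, scale, codes = quantize_uint8(vals)    # 8 bits per surviving value

    # The receiver reconstructs a lossy dense gradient from the message.
    recovered = np.zeros_like(grad)
    recovered[idx] = dequantize(lo, scale, codes)

    bytes_sent = idx.nbytes + codes.nbytes + 16   # 16 bytes for lo and scale
    print("compression ratio:", grad.nbytes / bytes_sent)

In practice such lossy communication is usually paired with error
feedback (accumulating what compression discards) so that
convergence is not harmed.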
This intersection of distributed/parallel computing and deep
learning is becoming critical and demands specific attention to
the above topics, which some of the broader forums may not be able
to provide. The aim of this workshop is to foster collaboration
among researchers from the distributed/parallel computing and deep
learning communities and to share relevant problems and results
from current approaches lying at the intersection of these areas.
Areas of Interest
In this workshop, we solicit research papers focused on
distributed deep learning aiming to achieve efficiency and
scalability for deep learning jobs over distributed and parallel
systems. Papers focusing on algorithms, on systems, or on both are
welcome. We invite authors to submit papers on topics including
but not limited to:
- Deep learning on cloud platforms, HPC systems, and edge devices
- Model-parallel and data-parallel techniques
- Asynchronous SGD for Training DNNs
- Communication-Efficient Training of DNNs
- Scalable and distributed graph neural networks; sampling
techniques for graph neural networks
- Federated deep learning, both horizontal and vertical, and its
challenges
- Model/data/gradient compression
- Learning in resource-constrained environments
- Coding Techniques for Straggler Mitigation
- Elasticity for deep learning jobs/spot market enablement
- Hyper-parameter tuning for deep learning jobs
- Hardware Acceleration for Deep Learning including digital and
analog accelerators
- Scalability of deep learning jobs on large clusters
- Deep learning on heterogeneous infrastructure
- Efficient and Scalable Inference
- Data storage/access in shared networks for deep learning
- Communication-efficient distributed fair and adversarially
robust learning
- Distributed learning techniques applied to speed up neural
architecture search
Workshop Format
Due to the continuing impact of COVID-19, ScaDL 2022 will also
adopt relevant IPDPS 2022 policies on virtual participation and
presentation. Consequently, the organizers are currently planning
a hybrid (in-person and virtual) event.
Submission Link
Submissions will be managed through Linklings. The submission link
is available at:
https://2022.scadl.org/call-for-papers
Key Dates
- Paper Submission: January 24, 2022
- Acceptance Notification: March 1, 2022
- Camera ready papers due: March 15, 2022 (hard deadline)
- Workshop Date: TBA (May 30th or June 3rd, 2022)
Author Instructions
ScaDL 2022 accepts submissions in two categories:
Regular papers: 8-10 pages
Short papers/Work in progress: 4 pages
The aforementioned lengths include all technical content,
references and appendices.
We encourage submissions presenting original research, work in
progress, case studies, vision papers, and industrial experience
papers.
Papers should be formatted using IEEE conference style, including
figures, tables, and references. The IEEE conference style
templates for MS Word and LaTeX provided by IEEE eXpress
Conference Publishing are available for download. See the latest
versions at
https://www.ieee.org/conferences/publishing/templates.html
General Chairs
Danilo Ardagna, Politecnico di Milano, Italy
Stacy Patterson, Rensselaer Polytechnic Institute (RPI), USA
Program Committee Chairs
Alex Gittens, Rensselaer Polytechnic Institute (RPI), USA
Kaoutar El Maghraoui, IBM Research AI, USA
Program Committee Members
Misbah Mubarak, Amazon
Hamza Ouarnoughi, UPHF LAMIH
Neil McGlohon, Rensselaer Polytechnic Institute (RPI)
Nathalie Baracaldo Angel, IBM Research, USA
Ignacio Blanquer, Universitat Politecnica de Valencia, Spain
Dario Garcia-Gasulla, Barcelona Supercomputing Center
Saurabh Gupta, AMD
Jalil Boukhobza, ENSTA-Bretagne
Aiichiro Nakano, University of Southern California, USA
Dhabaleswar K. (DK) Panda, Ohio State University
Eduardo Rocha Rodrigues, IBM Research, Brazil
Chen Wang, IBM Research, USA
Yangyang Xu, Rensselaer Polytechnic Institute (RPI)
Hongyi Wang, CMU, MLD lab
Steering Committee
Parijat Dube, IBM Research AI, USA
Vijay K. Garg, University of Texas at Austin
Vinod Muthusamy, IBM Research AI
Ashish Verma, IBM Research AI
Jayaram K. R., IBM Research AI, USA
Yogish Sabharwal, IBM Research AI, India