KAUST Workshop on Distributed Training in the Era of Large Models
ABOUT THE WORKSHOP
The KAUST Workshop on Distributed Training in the Era of Large Models is a three-day, in-person event dedicated to one of the most pressing challenges in artificial intelligence today: how to efficiently scale learning algorithms to meet the massive computational and data demands of modern AI models.
Over the past few years, models have grown from millions to hundreds of billions of parameters, delivering remarkable improvements in capability, but also creating unprecedented challenges for training. Scaling this next generation of AI systems requires advances across distributed optimization, communication-efficient algorithms, model-parallel architectures, and hardware-software co-design.
This workshop will bring together an exceptional group of international researchers, industry practitioners, and KAUST faculty to share state-of-the-art solutions and explore what comes next for distributed and large-scale training. Through invited talks and informal interactions, participants will have the opportunity to engage deeply with emerging ideas, discover new research directions, and build collaborations that will shape the future of scalable machine learning.
AGENDA
*Coming soon
SPEAKERS
*Additional speakers coming soon.
Communication Efficient Model Parallel Training of Language Models
Thalaiyasingam Ajanthan
Founding Scientist, Pluralis Research
Abstract:
Biography:
Thalaiyasingam Ajanthan is a Founding Scientist at Pluralis Research, focusing on making large-scale decentralized training practical and scalable. He also holds a Visiting Researcher position at the Australian National University (ANU). Before this, he was a Senior Machine Learning Scientist at Amazon and a Postdoctoral Research Fellow at the Australian Centre for Robotic Vision (ANU) and the Torr Vision Group (University of Oxford). He received his PhD in 2017 from ANU under the supervision of Prof. Richard Hartley.
His research interests lie at the intersection of optimization and machine learning, with a particular focus on developing robust and efficient decentralized training algorithms.
Training neural networks at any scale
Volkan Cevher
Professor, EPFL
Abstract:
Biography:
Volkan Cevher received the B.Sc. (valedictorian) in electrical engineering from Bilkent University in Ankara, Turkey, in 1999 and the Ph.D. in electrical and computer engineering from the Georgia Institute of Technology in Atlanta, GA, in 2005. He was a Research Scientist with the University of Maryland, College Park, from 2006 to 2007 and with Rice University in Houston, TX, from 2008 to 2009. He was also a Faculty Fellow in the Electrical and Computer Engineering Department at Rice University from 2010 to 2020. Currently, he is an Associate Professor at the Swiss Federal Institute of Technology Lausanne (EPFL) and an Amazon Scholar. His research interests include machine learning, optimization theory and methods, and automated control. Dr. Cevher is an IEEE Fellow ('24) and an ELLIS Fellow, and has received the ICML AdvML Best Paper Award in 2023, a Google Faculty Research Award in 2018, the IEEE Signal Processing Society Best Paper Award in 2016, Best Paper Awards at CAMSAP 2015 and SPARS 2009, an ERC Consolidator Grant in 2016, and an ERC Starting Grant in 2011.
Building Multi-dimensional Parallel Training Systems for Large AI Models
Heming Cui
Associate Professor, University of Hong Kong (HKU)
Biography:
Dr. Heming Cui is an Associate Professor in Computer Science at HKU. He is interested in building software infrastructures and tools to greatly improve the reliability, security, and performance of real-world software. After receiving his PhD from Columbia University in 2015, he joined HKU and independently built a parallel and distributed systems group with about 20 full-time PhD students. His recent research has led to a series of open-source projects and publications in top international conferences and journals across broad areas, including SOSP, IEEE S&P, VLDB, TPDS, MICRO, NSDI, ASPLOS, ATC, ICSE, and EuroSys. Dr. Cui received the ACM ICSE 2025 Best Paper Award and the ACM ACSAC 2017 Best Paper Award. As a research project leader (principal investigator), he has received total funding of about HKD 150 million, including China's National Key R&D Program (CNY 110 million) and Hong Kong's Research Grants Council (e.g., the RGC Research Impact Fund). He received his bachelor's and master's degrees from Tsinghua University and his PhD from Columbia University, all in Computer Science.
On Provable Benefits of Muon in Federated Learning
Hongchang Gao
Assistant Professor, Temple University
Biography:
Hongchang Gao is an assistant professor in the Department of Computer and Information Sciences at Temple University. He received his Ph.D. degree in Electrical and Computer Engineering from the University of Pittsburgh in 2020. His research interests include machine learning, optimization, and biomedical data science, with a special focus on distributed optimization and federated learning. His work has been published in top venues such as ICML, NeurIPS, AISTATS, KDD, AAAI, and IJCAI. He currently serves as an Associate Editor for the Journal of Combinatorial Optimization and regularly acts as an Area Chair for ICML, NeurIPS, and ICLR. He is a recipient of the NSF CAREER Award (2024), the AAAI New Faculty Highlights (2023), and the Cisco Faculty Research Award (2023).
The Benefits of Adam and Sign-like Methods in Training Large Language Models
Frederik Kunstner
Postdoctoral researcher, Inria
Abstract:
Biography:
Frederik Kunstner is a postdoctoral researcher at INRIA Paris, working with Francis Bach. His research focuses on the intersection of optimization theory and machine learning, aiming to build a better understanding of how to train machine learning models. He received his PhD from the University of British Columbia, where he worked with Mark Schmidt. His thesis received the CAIAC Best Doctoral Dissertation Award and an AAAI/ACM SIGAI Honorable Mention, and his work on the Expectation-Maximization algorithm won the Best Paper Award at AISTATS 2021.
Conda: Column-Normalized Adam for Training Large Language Models Faster
Zhouchen Lin
Professor, Peking University
Biography:
Zhouchen Lin received the Ph.D. degree in applied mathematics from Peking University in 2000. He is currently a Boya Special Professor with the State Key Laboratory of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University. His research interests include machine learning and numerical optimization. He has published over 350 technical papers and 5 monographs, receiving over 41,000 Google Scholar citations. He has served many times as an Area Chair or Senior Area Chair for ACML, ACCV, CVPR, ICCV, NIPS/NeurIPS, AAAI, IJCAI, ICLR, and ICML, and was a Program Co-Chair of ICPR 2022. He is currently a member of the ICML Board of Directors, an Associate Editor-in-Chief of the IEEE Transactions on Pattern Analysis and Machine Intelligence, and an Associate Editor of the International Journal of Computer Vision and of Optimization Methods and Software. He is a Fellow of the IAPR, the IEEE, the AAIA, and the CSIG.
Hiding Communication Cost via Overlapping Methods (TBC)
Edouard Oyallon
Research Scientist, CNRS
Abstract:
Biography:
Edouard Oyallon is a CNRS research scientist at Sorbonne University (MLIA). His research spans theoretical and applied machine learning, from studying deep network symmetries to designing algorithms for large-scale distributed and decentralized training with asynchronous methods and parallelism.
Training LLMs: Do We Understand Our Optimizers?
Antonio Orvieto
Principal Investigator & Independent Group Leader, MPI-IS & ELLIS Institute Tübingen
Abstract:
Biography:
Antonio is an Independent Group Leader at the MPI for Intelligent Systems and a Hector Endowed Fellow and Principal Investigator (PI) at the ELLIS Institute Tübingen, where he leads the Deep Models and Optimization group. He received the ETH medal for outstanding doctoral theses and the Schmidt Sciences AI2050 Early Career Fellowship.
In his research, Antonio strives to improve the efficiency of deep learning technologies by pioneering new architectures and training techniques grounded in theoretical knowledge. His work encompasses two main areas: understanding the intricacies of large-scale optimization dynamics and designing innovative architectures and powerful optimizers capable of handling data with a complex sequential nature.
Toward Compute- and Memory-Efficient LLM Training: Optimizers and Low-Precision Recipes
Youngsuk Park
Senior Applied Scientist & Manager, Amazon AWS AI
Abstract:
Biography:
Youngsuk Park leads a core algorithm team at AWS Annapurna Labs advancing scalable and efficient LLM training and inference methods. He manages a high-caliber research group, pioneering innovations in quantization, structured sparsity, and hardware-aware modelling and algorithms optimized for AWS Trainium. His algorithmic work spans the full model lifecycle—from efficient large-scale training recipes to low-latency inference deployment—powering foundation models across Amazon Bedrock, AGI, and partners like Anthropic. He has co-authored 30+ papers at ICLR, ICML, AISTATS, and KDD on LLM training, inference, optimization, time-series and reinforcement learning, and regularly organizes tutorials and workshops at top AI conferences, sharing practical insights on deploying foundation models efficiently on AI accelerators.
Accelerating Neural Network Training: An Analysis of the AlgoPerf Competition
Frank Schneider
Postdoctoral Researcher, University of Tübingen
Abstract:
Biography:
Frank Schneider is a postdoctoral researcher at the Chair for the Methods of Machine Learning at the University of Tübingen. Before that, he did his Ph.D. in the same group as part of the IMPRS-IS (International Max Planck Research School for Intelligent Systems) under the supervision of Prof. Dr. Philipp Hennig. His research focuses on helping the community move beyond the unsatisfactory user experience of current optimization methods for deep learning. He currently serves as one of the two chairs of MLCommons' Algorithms Working Group. He holds Bachelor's and Master's degrees in Simulation Technology from the University of Stuttgart as well as a Master's degree in Industrial and Applied Mathematics from the Eindhoven University of Technology.
Training Neural Networks at Any Scale with the Scion Algorithm
Antonio Silveti-Falls
Associate Professor, CentraleSupélec
Abstract:
Biography:
Antonio Silveti-Falls is a researcher and Associate Professor of artificial intelligence at CentraleSupélec, where he teaches mathematics and optimization courses in the math department. He grew up in San Diego, California, and moved to France in 2017 for his Ph.D. in mathematics, then stayed for a postdoc in Toulouse before taking his current position in the southern Paris region in 2022. His research focuses mainly on nonsmooth analysis and non-Euclidean stochastic optimization. He is an Area Chair for NeurIPS and ICML, the top conferences in machine learning and artificial intelligence, where he has also published papers on topics such as nonsmooth implicit differentiation and training large language models with billions of parameters using stochastic non-Euclidean optimization methods.
TBD
Sebastian Urban Stich
Faculty, CISPA Helmholtz Center
Biography:
Dr. Sebastian Stich is a tenured faculty member at the CISPA Helmholtz Center for Information Security and a member of the European Laboratory for Learning and Intelligent Systems (ELLIS). His research focuses on the intersection of machine learning, optimization, and statistics, with an emphasis on efficient parallel and distributed algorithms for training models over decentralized datasets.
He obtained his PhD from ETH Zurich and held postdoctoral positions at UCLouvain and EPFL. His work has been recognized with a Meta Research Award (2022), a Google Research Scholar Award (2023), and an ERC Consolidator Grant (CollectiveMinds, 2024).
A Memory Efficient Randomized Subspace Optimization Method for Training Large Language Models
Kun Yuan
Assistant Professor, Peking University
Biography:
Dr. Kun Yuan is an Assistant Professor at the Center for Machine Learning Research (CMLR) at Peking University. He completed his Ph.D. at UCLA in 2019 and was a staff algorithm engineer at Alibaba (US) Group between 2019 and 2022. His research focuses on the development of fast, scalable, reliable, and distributed algorithms with applications in large-scale optimization, deep neural network training, federated learning, and the Internet of Things. He received the 2017 IEEE Signal Processing Society Young Author Best Paper Award and the 2017 ICCM Distinguished Paper Award. Some of his work has been integrated into the Alibaba MindOpt solver and the NVIDIA DeepStream library.
CONTACT US
King Abdullah University of Science and Technology (KAUST)
4700 King Abdullah University of Science and Technology
Thuwal 23955-6900
Kingdom of Saudi Arabia