KAUST Workshop on Distributed Training in the Era of Large Models
ABOUT THE WORKSHOP
The KAUST Workshop on Distributed Training in the Era of Large Models is a three-day, in-person event dedicated to one of the most pressing challenges in artificial intelligence today: how to efficiently scale learning algorithms to meet the massive computational and data demands of modern AI models.
Over the past few years, models have grown from millions to hundreds of billions of parameters, delivering remarkable improvements in capability, but also creating unprecedented challenges for training. Scaling this next generation of AI systems requires advances across distributed optimization, communication-efficient algorithms, model-parallel architectures, and hardware-software co-design.
This workshop will bring together an exceptional group of international researchers, industry practitioners, and KAUST faculty to share state-of-the-art solutions and explore what comes next for distributed and large-scale training. Through invited talks and informal interactions, participants will have the opportunity to engage deeply with emerging ideas, discover new research directions, and build collaborations that will shape the future of scalable machine learning.
AGENDA
*Coming soon
SPEAKERS
*Additional speakers coming soon.
Communication Efficient Model Parallel Training of Language Models
Thalaiyasingam Ajanthan
Founding Scientist, Pluralis Research
Abstract:
Biography:
Thalaiyasingam Ajanthan is a Founding Scientist at Pluralis Research, focusing on making large-scale decentralized training practical and scalable. He also holds a Visiting Researcher position at the Australian National University (ANU). Before this, he was a Senior Machine Learning Scientist at Amazon and a Postdoctoral Research Fellow at the Australian Centre for Robotic Vision (ANU) and the Torr Vision Group (University of Oxford). He received his PhD in 2017 from ANU under the supervision of Prof. Richard Hartley.
His research interests lie at the intersection of optimization and machine learning, with a particular focus on developing robust and efficient decentralized training algorithms.
Training neural networks at any scale
Volkan Cevher
Professor, EPFL
Abstract:
Biography:
Volkan Cevher received the B.Sc. (valedictorian) in electrical engineering from Bilkent University in Ankara, Turkey, in 1999 and the Ph.D. in electrical and computer engineering from the Georgia Institute of Technology in Atlanta, GA, in 2005. He was a Research Scientist with the University of Maryland, College Park, from 2006 to 2007 and with Rice University in Houston, TX, from 2008 to 2009. He was also a Faculty Fellow in the Electrical and Computer Engineering Department at Rice University from 2010 to 2020. Currently, he is an Associate Professor at the Swiss Federal Institute of Technology Lausanne (EPFL) and an Amazon Scholar. His research interests include machine learning, optimization theory and methods, and automated control. Dr. Cevher is an IEEE Fellow ('24) and an ELLIS Fellow, and has received the ICML AdvML Best Paper Award in 2023, a Google Faculty Research Award in 2018, the IEEE Signal Processing Society Best Paper Award in 2016, Best Paper Awards at CAMSAP 2015 and SPARS 2009, an ERC Consolidator Grant in 2016, and an ERC Starting Grant in 2011.
Building Multi-dimensional Parallel Training Systems for Large AI Models
Heming Cui
Associate Professor, University of Hong Kong (HKU)
Biography:
Dr. Heming Cui is an Associate Professor in Computer Science at HKU. He is interested in building software infrastructures and tools to greatly improve the reliability, security, and performance of real-world software. After receiving his PhD from Columbia University in 2015, he joined HKU and independently built a parallel and distributed systems group with about 20 full-time PhD students. His recent research has led to a series of open-source projects and publications in top international conferences and journals across broad areas, including SOSP, IEEE S&P, VLDB, TPDS, MICRO, NSDI, ASPLOS, ATC, ICSE, and EuroSys. Dr. Cui received the ACM ICSE 2025 Best Paper Award and the ACM ACSAC 2017 Best Paper Award. As a research project leader (principal investigator), he has received total funding of about HKD 150 million, including China's National Key R&D Program (CNY 110 million) and Hong Kong's Research Grants Council (e.g., the RGC Research Impact Fund). He received his bachelor's and master's degrees from Tsinghua University and his PhD from Columbia University, all in Computer Science.
On Provable Benefits of Muon in Federated Learning
Hongchang Gao
Assistant Professor, Temple University
Biography:
Hongchang Gao is an assistant professor in the Department of Computer and Information Sciences at Temple University. He received his Ph.D. degree in Electrical and Computer Engineering from the University of Pittsburgh in 2020. His research interests include machine learning, optimization, and biomedical data science, with a special focus on distributed optimization and federated learning. His work has been published in top venues such as ICML, NeurIPS, AISTATS, KDD, AAAI, and IJCAI. He currently serves as an Associate Editor for the Journal of Combinatorial Optimization and regularly acts as an Area Chair for ICML, NeurIPS, and ICLR. He is a recipient of the NSF CAREER Award (2024), the AAAI New Faculty Highlights (2023), and the Cisco Faculty Research Award (2023).
The Benefits of Adam and Sign-like Methods in Training Large Language Models
Frederik Kunstner
Postdoctoral researcher, Inria
Abstract:
Biography:
Frederik Kunstner is a postdoctoral researcher at INRIA Paris, working with Francis Bach. His research focuses on the intersection of optimization theory and machine learning, aiming to build a better understanding of how to train machine learning models. He received his PhD from the University of British Columbia, where he worked with Mark Schmidt. His thesis received the CAIAC Best Doctoral Dissertation Award and an AAAI/ACM SIGAI Honorable Mention, and his work on the Expectation-Maximization algorithm won the Best Paper Award at AISTATS 2021.
Conda: Column-Normalized Adam for Training Large Language Models Faster
Zhouchen Lin
Professor, Peking University
Biography:
Zhouchen Lin received the Ph.D. degree in applied mathematics from Peking University in 2000. He is currently a Boya Special Professor with the State Key Laboratory of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University. His research interests include machine learning and numerical optimization. He has published over 350 technical papers and 5 monographs, receiving over 41,000 Google Scholar citations. He has served many times as an Area Chair or Senior Area Chair for ACML, ACCV, CVPR, ICCV, NIPS/NeurIPS, AAAI, IJCAI, ICLR, and ICML, and was a Program Co-Chair of ICPR 2022. He is currently a member of the ICML Board of Directors, an Associate Editor-in-Chief of the IEEE Transactions on Pattern Analysis and Machine Intelligence, and an Associate Editor of the International Journal of Computer Vision and of Optimization Methods and Software. He is a Fellow of the IAPR, the IEEE, the AAIA, and the CSIG.
Hiding Communication Cost via Overlapping Methods (TBC)
Edouard Oyallon
Research Scientist, CNRS
Abstract:
Biography:
Edouard Oyallon is a CNRS research scientist at Sorbonne University (MLIA). His research spans theoretical and applied machine learning, from studying deep network symmetries to designing algorithms for large-scale distributed and decentralized training with asynchronous methods and parallelism.
Training LLMs: Do We Understand Our Optimizers?
Antonio Orvieto
Principal Investigator & Independent Group Leader, MPI-IS & ELLIS Institute Tübingen
Abstract:
Biography:
Antonio is an Independent Group Leader at the MPI for Intelligent Systems and a Hector Endowed Fellow and Principal Investigator (PI) at the ELLIS Institute Tübingen, where he leads the Deep Models and Optimization group. He received the ETH medal for outstanding doctoral theses and the Schmidt Sciences AI2050 Early Career Fellowship.
In his research, Antonio strives to improve the efficiency of deep learning technologies by pioneering new architectures and training techniques grounded in theoretical knowledge. His work encompasses two main areas: understanding the intricacies of large-scale optimization dynamics and designing innovative architectures and powerful optimizers capable of handling data with a complex sequential nature.
Toward Compute- and Memory-Efficient LLM Training: Optimizers and Low-Precision Recipes
Youngsuk Park
Senior Applied Scientist & Manager, Amazon AWS AI
Abstract:
Biography:
Youngsuk Park leads a core algorithm team at AWS Annapurna Labs advancing scalable and efficient LLM training and inference methods. He manages a high-caliber research group, pioneering innovations in quantization, structured sparsity, and hardware-aware modelling and algorithms optimized for AWS Trainium. His algorithmic work spans the full model lifecycle—from efficient large-scale training recipes to low-latency inference deployment—powering foundation models across Amazon Bedrock, AGI, and partners like Anthropic. He has co-authored 30+ papers at ICLR, ICML, AISTATS, and KDD on LLM training, inference, optimization, time-series and reinforcement learning, and regularly organizes tutorials and workshops at top AI conferences, sharing practical insights on deploying foundation models efficiently on AI accelerators.
Accelerating Neural Network Training: An Analysis of the AlgoPerf Competition
Frank Schneider
Postdoctoral Researcher, University of Tübingen
Abstract:
Biography:
Frank Schneider is a postdoctoral researcher at the Chair for the Methods of Machine Learning at the University of Tübingen. Before that, he did his Ph.D. in the same group as part of the IMPRS-IS (International Max Planck Research School for Intelligent Systems) under the supervision of Prof. Dr. Philipp Hennig. His research focuses on helping the community move beyond the unsatisfactory user experience of current optimization methods for deep learning. He currently serves as one of the two chairs of MLCommons' Algorithms Working Group. He holds Bachelor's and Master's degrees in Simulation Technology from the University of Stuttgart as well as a Master's degree in Industrial and Applied Mathematics from the Eindhoven University of Technology.
Training Neural Networks at Any Scale with the Scion Algorithm
Antonio Silveti-Falls
Associate Professor, CentraleSupélec
Abstract:
Biography:
Antonio Silveti-Falls is a researcher and Associate Professor of artificial intelligence at CentraleSupélec, where he teaches mathematics and optimization courses in the math department. He grew up in San Diego, California, and moved to France in 2017 for his Ph.D. in mathematics, then stayed for a postdoc in Toulouse before taking his current position in the southern Paris region in 2022. His research focuses mainly on nonsmooth analysis and non-Euclidean stochastic optimization. He is an Area Chair for NeurIPS and ICML, the top conferences in machine learning and artificial intelligence, where he has also published papers on topics such as nonsmooth implicit differentiation and training large language models with billions of parameters using stochastic non-Euclidean optimization methods.
TBD
Sebastian Urban Stich
Faculty, CISPA Helmholtz Center
Biography:
Dr. Sebastian Stich is a tenured faculty member at the CISPA Helmholtz Center for Information Security and a member of the European Laboratory for Learning and Intelligent Systems (ELLIS). His research focuses on the intersection of machine learning, optimization, and statistics, with an emphasis on efficient parallel and distributed algorithms for training models over decentralized datasets.
He obtained his PhD from ETH Zurich and held postdoctoral positions at UCLouvain and EPFL. His work has been recognized with a Meta Research Award (2022), a Google Research Scholar Award (2023), and an ERC Consolidator Grant (CollectiveMinds, 2024).
A Memory Efficient Randomized Subspace Optimization Method for Training Large Language Models
Kun Yuan
Assistant Professor, Peking University
Biography:
Dr. Kun Yuan is an Assistant Professor at the Center for Machine Learning Research (CMLR) at Peking University. He completed his Ph.D. at UCLA in 2019 and was a staff algorithm engineer at Alibaba (US) Group between 2019 and 2022. His research focuses on the development of fast, scalable, reliable, and distributed algorithms with applications in large-scale optimization, deep neural network training, federated learning, and the Internet of Things. He received the 2017 IEEE Signal Processing Society Young Author Best Paper Award and the 2017 ICCM Distinguished Paper Award. Some of his work has been integrated into the Alibaba MindOpt solver and the NVIDIA DeepStream library.
CONTACT US
King Abdullah University of Science and Technology (KAUST)
4700 King Abdullah University of Science and Technology
Thuwal 23955-6900
Kingdom of Saudi Arabia