Fan Mo
Senior Algorithm Engineer at Moore Threads
A graduate of Shanghai Jiao Tong University, Fan Mo is a senior algorithm engineer at Moore Threads and a maintainer of the open-source project torch_musa. He leads the adaptation and optimization of PyTorch on MUSA and has long focused on performance tuning for large-model training and inference, covering distributed parallelism, operator fusion, low-precision inference, and dynamic compiler stacks. He is also advancing open-source compatibility and engineering automation for the domestic chip ecosystem, working to make domestic compute more efficient and easier to use.
Topic
Collaborative Evolution Practices of Large-Model Training and AI Frameworks at Moore Threads
Large-model training places increasingly diverse demands on AI frameworks. This talk reviews the current state of mainstream frameworks, analyzes the communication, memory, and fault-tolerance challenges encountered in large-scale training, and, drawing on Moore Threads' hands-on experience in co-evolving large-model training and AI frameworks, shares lessons on tuning distributed strategies, optimizing unified compute parallelism, and low-precision training. Finally, it discusses open problems and future directions for framework development in support of efficient large-model training and inference.

1. Introduction to large-model evolution: from dense models to MoE (mixture-of-experts), and how framework requirements are changing along the way.
2. Overview of current AI frameworks, including support for various backends (especially non-CUDA), key development directions, and adaptability to custom optimizations.
3. Challenges current AI frameworks face in large-scale training, including overlap of computation and communication, memory pressure, and fault tolerance.
4. Moore Threads' case studies in large-model training and framework optimization, highlighting tuning of distributed strategies, optimizations for unified compute parallelism, and low-precision training.
5. Open problems and outlook for the future development of large models and AI frameworks.