Fan Mo
Senior Algorithm Engineer at Moore Threads
A graduate of Shanghai Jiao Tong University, Fan Mo is a senior algorithm engineer at Moore Threads and a maintainer of the open-source project torch_musa. He leads the adaptation and optimization of PyTorch on MUSA and has long focused on performance tuning for large-model training and inference, covering distributed parallelism, operator fusion, low-precision inference, and dynamic compiler stacks. He is also advancing open-source compatibility and engineering automation for the domestic chip ecosystem, working to make domestic compute more efficient and easier to use.
Topic
Collaborative Evolution Practices of Large-Model Training and AI Frameworks at Moore Threads
Large-model training places increasingly diverse demands on AI frameworks. This talk reviews the current state of mainstream frameworks, analyzes the communication, memory, and fault-tolerance challenges encountered in large-scale training, and, drawing on Moore Threads' hands-on experience in co-evolving large-model training and AI frameworks, shares lessons on tuning distributed strategies, optimizing unified compute parallelism, and low-precision training. Finally, it discusses open problems and future directions for framework development in support of efficient large-model training and inference.

1. Introduction to large-model evolution: from dense models to MoE (mixture-of-experts), and how framework requirements are changing along the way.
2. Overview of current AI frameworks, including support for various backends (especially non-CUDA), key development directions, and adaptability to custom optimizations.
3. Challenges current AI frameworks face in large-scale training, including overlap of computation and communication, memory pressure, and fault tolerance.
4. Moore Threads' case studies in large-model training and framework optimization, highlighting tuning of distributed strategies, optimizations for unified compute parallelism, and low-precision training.
5. Open problems and outlook for the future development of large models and AI frameworks.