Chen Zhang

Senior Algorithm Engineer at Moore Threads, Former Senior Algorithm Researcher at Tencent

Responsible for Moore Threads' distributed training research and development. More than 10 years of experience in NLP, focusing on NLP algorithms, distributed training, and large-scale optimization. Participated in Tencent Search business optimization and led the team in the CLUE large model benchmark evaluation, achieving a Top-10 ranking with a small model under 1B parameters. Deep learning veteran and MXNet.cpp committer.

Topic

Exploring Distributed Training Performance Optimization for Large Language Models on Moore Threads Full-Featured GPUs

Introduction: In the wave of large model training, the distributed training capability of domestic full-featured GPUs is achieving unprecedented breakthroughs. The Moore Threads AI Infra team has worked on large language model training technology for nearly three years: it has ranked among the top 10 in the CLUE evaluation, adapted nearly all mainstream model training frameworks, and built a large-scale domestic GPU cluster that achieves industry-leading MFU with the help of FP8 acceleration. It was also the first to complete an efficient adaptation of the DeepSeek model, achieving excellent training performance. In this talk, we will analyze the compatibility advantages of domestic full-featured GPUs in large-scale model training, share core optimization practices from Dense models to MoE models, and discuss breakthrough directions for domestic AI computing hardware in future large-scale training, providing developers with real-world experience and in-depth thinking.

Outline:
1. Domestic GPU AI computing architecture: MUSA's high compatibility and achievements with MT-Megatron and other frameworks
2. Dense model optimization exploration: challenges and optimization strategies for distributed training of dense models
3. MoE model acceleration practice: efficient adaptation and performance optimization of DeepSeek-like MoE models
4. Future outlook: how domestic AI computing hardware can continue to make breakthroughs in large-scale model training
