Wanqing He
Vice President of Qingcheng.ai
Dr. He Wanqing is currently the Vice President of Qingcheng.ai. He previously served as Senior Director at Biren Technology, where he was responsible for Turnkey systems and application optimization. His past roles include Chief Engineer at Intel DCAI, Head of High-Performance Computing and Senior Technical Expert at Alibaba Cloud, CTO of 360 Cloud, and R&D Manager at Motorola and Guodian Power. Dr. He graduated from Shanghai Jiao Tong University in 1999 and has devoted the past 25 years to HPC parallel optimization, cloud computing, and AI application performance tuning. He has also invested significant time in fostering industry-academia-research collaboration within the China Computer Federation (CCF). He has served as a CCF Executive Committee Member, Standing Committee Member of the CCF High Performance Computing Technical Committee, Vice Chair of CCF YOCSEF Headquarters, Vice Chair of ACM Hangzhou, and Chair or Committee Member of the Enterprise Track of the CNCC 2022, 2023, and 2024 Technical Forums. His honors include CCF Honorary Member, CCF Distinguished Speaker, and the CCF Outstanding Contribution Award. He has authored three books on parallel development and cloud computing, translated and published five books on internet technology, popular science, and engineering, and received the 40th Anniversary Outstanding Contribution Award from the Publishing House of Electronics Industry as well as the 2024 Best Translator Award from Cheers Publishing.
Topic
Optimization Techniques for Large-Model Training and Inference and Turnkey Performance Delivery
This talk introduces the optimization techniques behind Qingcheng Jizhi's Chitu inference engine and the Bagualu training-optimization toolkit, together with the Turnkey (Taiji) performance-delivery platform. It breaks down how Chitu achieves joint optimization across algorithms, the inference engine, and operators, going beyond the optimization of individual operator sets, and discusses how such engineering optimizations can be delivered as a PaaS product. It also shares practical applications that combine the Bagualu module principles: fine-tuning optimizations, graph compilation, hybrid quantization, memory management, and heterogeneous training. Finally, it provides end-to-end (E2E) optimization templates for deploying inference via the Turnkey platform (including, but not limited to, affinity tuning, load balancing, and cache optimizations) with real-world examples, and describes the practice of prefill/decode (PD) separation in Kubernetes clusters. Illustrative sketches of hybrid quantization, affinity tuning, and PD separation follow the outline below.

Outline:
1. Problem statement and analysis: mathematical models from scientific computing to AI inference, and their requirements for precision and algorithms
2. The origin and evolution of the Chitu inference engine; technical roadmap
3. Principles of the Bagualu training-optimization modules
4. Principles and implementation of the Turnkey (Taiji) performance-delivery engine
5. Optimization case studies
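To make the hybrid-quantization idea concrete, here is a minimal PyTorch sketch of mixed-precision weight quantization: most linear layers are stored as int8 with a per-tensor scale, while precision-sensitive layers stay in full precision. The layer keep-list and the sensitivity heuristic are hypothetical illustrations, not taken from Bagualu itself.

```python
# Hedged sketch: hybrid (mixed-precision) weight quantization.
# Assumption: "lm_head" as a precision-sensitive layer is hypothetical;
# a real toolkit would derive the keep-list from a sensitivity analysis.
import torch
import torch.nn as nn


def quantize_int8(weight: torch.Tensor):
    """Symmetric per-tensor int8 quantization: returns (q, scale)."""
    scale = weight.abs().max().clamp(min=1e-8) / 127.0
    q = torch.clamp((weight / scale).round(), -127, 127).to(torch.int8)
    return q, scale


class HybridLinear(nn.Module):
    """Linear layer storing int8 weights, dequantized on the fly."""

    def __init__(self, linear: nn.Linear):
        super().__init__()
        q, scale = quantize_int8(linear.weight.data)
        self.register_buffer("q_weight", q)
        self.register_buffer("scale", scale)
        self.bias = linear.bias

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Dequantize to the activation dtype before the matmul.
        w = self.q_weight.to(x.dtype) * self.scale.to(x.dtype)
        b = self.bias.to(x.dtype) if self.bias is not None else None
        return nn.functional.linear(x, w, b)


def apply_hybrid_quantization(model: nn.Module, keep_fp=("lm_head",)):
    """Replace each nn.Linear with an int8 version, except layers whose
    name matches the keep-list (a stand-in for real sensitivity analysis)."""
    for name, module in model.named_children():
        if isinstance(module, nn.Linear) and not any(k in name for k in keep_fp):
            setattr(model, name, HybridLinear(module))
        else:
            apply_hybrid_quantization(module, keep_fp)
    return model
```

The design point the sketch illustrates is that quantization is applied per layer rather than uniformly, so memory savings can be traded against accuracy on a layer-by-layer basis.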
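Affinity tuning of the kind a delivery template might encode can be as simple as pinning each inference worker to CPU cores local to its GPU. The sketch below uses the standard Linux-only os.sched_setaffinity call; the core-to-GPU map is a hypothetical stand-in for topology discovery (e.g. via hwloc or nvidia-smi topo).

```python
# Hedged sketch: CPU-affinity tuning for GPU inference workers.
# Assumption: the "8 cores per GPU" topology below is hypothetical.
import os
import multiprocessing as mp

CORES_PER_GPU = 8  # hypothetical: GPU i is NUMA-local to cores 8i..8i+7


def worker(gpu_id: int) -> None:
    cores = set(range(gpu_id * CORES_PER_GPU, (gpu_id + 1) * CORES_PER_GPU))
    os.sched_setaffinity(0, cores)  # 0 = current process (Linux only)
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    print(f"worker {gpu_id} pinned to cores {sorted(cores)}")
    # ... load the model and serve requests here ...


if __name__ == "__main__":
    procs = [mp.Process(target=worker, args=(i,)) for i in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```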
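Prefill/decode separation, in its simplest form, means routing the compute-bound prefill phase and the memory-bound decode phase to two distinct replica pools; in Kubernetes these would typically be two Deployments exposed through separate Services. The following is a minimal routing sketch, with hypothetical service names and a plain round-robin balancer standing in for a production load balancer.

```python
# Hedged sketch: a front-end router for PD-separated inference.
# Assumptions: the service endpoints and the "kv_cache_ready" request
# field are hypothetical; KV-cache migration between pools is elided.
import itertools

PREFILL_POOL = ["prefill-0.prefill-svc:8000", "prefill-1.prefill-svc:8000"]
DECODE_POOL = ["decode-0.decode-svc:8000", "decode-1.decode-svc:8000",
               "decode-2.decode-svc:8000"]

_prefill_rr = itertools.cycle(PREFILL_POOL)  # round-robin load balancing
_decode_rr = itertools.cycle(DECODE_POOL)


def route(request: dict) -> str:
    """New requests go to a prefill replica; once the KV cache has been
    built (and migrated), subsequent decode steps go to a decode replica."""
    if request.get("kv_cache_ready"):
        return next(_decode_rr)
    return next(_prefill_rr)


print(route({"prompt": "hello"}))        # -> a prefill replica
print(route({"kv_cache_ready": True}))   # -> a decode replica
```

Because the two pools scale independently, the decode pool can be sized for memory bandwidth and the prefill pool for compute, which is the core motivation for separating the phases.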