Chaojun Xiao
Postdoctoral researcher at the Natural Language Processing Lab of Tsinghua University, focusing on efficient large model architectures. He has published more than ten papers as first author or co-first author at top-tier international AI conferences and is a primary contributor to MiniCPM-4, an efficient large model for edge devices. His work has been cited over 3,000 times on Google Scholar. His honors include the First Prize of the Qian Weichang Award for Science and Technology in Chinese Information Processing, the Postdoctoral Innovative Talent Support Program, the Tsinghua Shuimu Scholar title, and the Outstanding Scholarship of the Tencent Rhino-Bird Elite Talent Program.
Topic
MiniCPM: Efficient Large Model for Edge Devices
With the rapid development of artificial intelligence, the demand for deploying large models on edge devices has become increasingly urgent. Traditional large models, however, consume heavy computational resources and decode slowly, making them hard to run in resource-constrained edge environments. MiniCPM, a highly efficient large model optimized specifically for edge devices, addresses this problem through innovations along four dimensions:

- Model architecture: InfLLM v2, a trainable sparse attention mechanism, significantly accelerates both the prefill and decoding stages of long-context processing (see the sketch below).
- Training data: the UltraClean data filtering strategy greatly improves the effectiveness and efficiency of data validation.
- Training algorithms: the Model Wind Tunnel enables efficient search over pretraining strategies, segmented exploration sampling achieves load-balanced reinforcement learning, and BitCPM implements extreme parameter compression through post-training ternary quantization (see the sketch below).
- Inference system: the CPM.cu inference engine integrates sparse attention, model quantization, and speculative sampling, greatly improving inference efficiency (see the sketch below).

On typical edge chips, MiniCPM achieves more than a 5× inference speedup over dense models of the same size.
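To make the sparse-attention idea concrete, here is a minimal sketch of block-sparse attention in the spirit of InfLLM v2: the query scores a coarse summary of every key block, keeps only the highest-scoring blocks, and attends densely within them, so cost scales with the blocks kept rather than the full context. The block size, top-k setting, mean-pooled block summaries, and all function names are illustrative assumptions, not MiniCPM's actual implementation.

```python
import numpy as np

def block_sparse_attention(q, K, V, block_size=64, top_k=4):
    """One decode step of block-sparse attention (illustrative sketch).

    Instead of scoring all past keys, the query first scores a coarse
    summary (mean) of each key block, keeps only the top_k blocks, and
    runs dense attention inside those blocks. Shapes: q is (d,), K and V
    are (seq_len, d). block_size and top_k are hypothetical settings.
    """
    seq_len, d = K.shape
    n_blocks = int(np.ceil(seq_len / block_size))

    # Coarse relevance: score each block by its mean key vector.
    block_scores = np.array([
        q @ K[b * block_size:(b + 1) * block_size].mean(axis=0)
        for b in range(n_blocks)
    ])
    keep = np.argsort(block_scores)[-top_k:]  # indices of the top-k blocks

    # Gather the selected positions and attend densely over them only.
    idx = np.concatenate([
        np.arange(b * block_size, min((b + 1) * block_size, seq_len))
        for b in sorted(keep)
    ])
    scores = (K[idx] @ q) / np.sqrt(d)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V[idx]

# Toy usage: 1024-token context, attend to at most 4 * 64 = 256 positions.
rng = np.random.default_rng(0)
q, K, V = rng.normal(size=(64,)), rng.normal(size=(1024, 64)), rng.normal(size=(1024, 64))
out = block_sparse_attention(q, K, V)
```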
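The ternary quantization behind BitCPM can be illustrated with the common "absmean" recipe: scale each weight matrix by its mean absolute value, then round and clip to {-1, 0, +1}. This is a sketch of the general technique, not BitCPM's actual procedure.

```python
import numpy as np

def ternarize(W, eps=1e-8):
    """Quantize a weight matrix to {-1, 0, +1} with one scale per matrix.

    Follows the common "absmean" recipe: divide by the mean absolute
    weight, round, and clip to the ternary set. An illustrative stand-in
    for BitCPM's procedure, not the actual algorithm.
    """
    scale = np.abs(W).mean() + eps
    W_t = np.clip(np.round(W / scale), -1, 1)
    return W_t.astype(np.int8), scale

def dequantize(W_t, scale):
    # At inference time the ternary matrix multiplies activations using
    # only additions/subtractions; the scale is applied once afterwards.
    return W_t.astype(np.float32) * scale

W = np.random.default_rng(1).normal(size=(4, 4)).astype(np.float32)
W_t, s = ternarize(W)
print(np.unique(W_t))                         # subset of [-1, 0, 1]
print(np.abs(W - dequantize(W_t, s)).mean())  # reconstruction error
```

Each weight then needs under two bits of storage, which is what makes the compression "extreme" relative to 16-bit dense weights.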
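Speculative sampling, one of the techniques integrated in CPM.cu, pairs a cheap draft model with the expensive target model: the draft proposes several tokens, the target verifies them, and every accepted token saves a full target decoding step. The greedy variant below is a self-contained sketch; the callables `draft` and `target` are stand-ins rather than a real CPM.cu API, and production engines verify all proposed positions in a single batched forward pass instead of one call per position.

```python
def speculative_decode_greedy(target, draft, prompt, k=4, max_new=32):
    """Greedy speculative decoding (illustrative sketch).

    `draft` and `target` map a token sequence to its argmax next token.
    Each round: draft k tokens cheaply, verify them with the target, keep
    the longest matching prefix, and append one corrected token on a miss.
    """
    tokens = list(prompt)
    goal = len(prompt) + max_new
    while len(tokens) < goal:
        # 1) Draft k tokens autoregressively with the cheap model.
        proposal, ctx = [], list(tokens)
        for _ in range(k):
            t = draft(ctx)
            proposal.append(t)
            ctx.append(t)
        # 2) Verify with the target model (sequential here for clarity).
        accepted = []
        for t in proposal:
            expected = target(tokens + accepted)
            if expected == t:
                accepted.append(t)
            else:
                accepted.append(expected)  # one free corrected token
                break
        tokens.extend(accepted)
    return tokens[:goal]

# Toy usage: the target counts up by 1; the draft is right 3 times in 4,
# so most rounds accept several tokens per expensive "target" pass.
target_fn = lambda seq: seq[-1] + 1
draft_fn = lambda seq: seq[-1] + (2 if len(seq) % 4 == 0 else 1)
print(speculative_decode_greedy(target_fn, draft_fn, [0], k=4, max_new=8))
```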