Pan Zhou | 2025 Machine Learning Summit

Pan Zhou

Multimodal Intelligence Lead at Li Auto

Pan Zhou is an algorithm expert on multimodal foundation models in the Base Model Department at Li Auto. He holds a Ph.D. from the University of Science and Technology of China and previously worked as an algorithm researcher at iFlytek, Sogou, and Tencent, focusing on speech recognition technologies. His research interests include speech recognition, voice interaction, multimodal understanding, and large multimodal models.

Topic

Practical Implementation of MindGPT-4o-Audio: Li Auto's Real-Time Speech Dialogue Large Model

This talk presents Li Auto's real-time speech dialogue large model, MindGPT-4o-Audio. It is a full-duplex, low-latency, end-to-end speech model that can listen and speak at the same time, as a human does. It excels at speech-based knowledge Q&A, expressive multi-role voice generation, diverse style control, and external tool invocation, achieving a level of natural interaction comparable to human-to-human conversation.

Boolan is a leading IT education and consulting company in China. Our core strength is our worldwide team of experts and the cutting-edge technology experience they have accumulated over decades. Adhering to the tenet of "Global Experts, Global Wisdom", we bring together the world's top IT experts to provide our customers with in-house training, technical conferences, software consulting, expert lectures, seminars, talent evaluation and certification, and other services. www.boolan.com
