Pan Zhou
Multimodal Intelligence Lead at Li Auto
Pan Zhou is currently a multimodal foundation model algorithm expert in the Base Model Department at Li Auto. He holds a Ph.D. from the University of Science and Technology of China and has previously worked as an algorithm researcher at iFlytek, Sogou, and Tencent, focusing on speech recognition technologies. His research interests include speech recognition, voice interaction, multimodal understanding, and large multimodal models.
Topic
Practical Implementation of MindGPT-4o-Audio: Li Auto’s Real-Time Speech Dialogue Large Model
This talk presents Li Auto’s real-time speech dialogue large model, MindGPT-4o-Audio, a full-duplex, low-latency, end-to-end speech model capable of natural, human-like “listen-and-speak” interaction. It excels in speech-based knowledge Q&A, multi-role expressive voice generation, diverse style control, and external tool invocation, achieving a level of natural interaction comparable to human-to-human conversation.