Haoze Sun
Multimodal Algorithm Expert, Baichuan Intelligence
He graduated from Peking University in 2017 and has hands-on experience in NLP, search, and recommendation. Since joining Baichuan Intelligence, he has worked on text pre-training, SFT, code agents, and multimodal pre-training, and recently focuses on algorithms for omni-modal models, especially end-to-end speech models. The open-source Baichuan-Omni-1.5 multimodal model currently achieves the best balance among text capability, image/video understanding, and speech understanding and generation.
Topic
Baichuan-Omni-1.5: Baichuan Intelligence's Practical Exploration of End-to-End Multi-Modal Large Models
As an emerging technology paradigm, end-to-end omni-modal models have attracted widespread attention. However, their data organization and training pipelines face many challenges, such as balancing capability across modalities and the "de-intelligence" of the speech modality, where adding speech degrades the model's overall intelligence. In this session, the Baichuan Intelligence multimodal team shares its practical experience and invites discussion with industry colleagues to advance the development of omni-modal models.

Outline:
1. Baichuan-Omni-1.5: multimodal model architecture and training method.
2. Unified speech understanding and generation: Baichuan's technical practice and reflections.
3. The future outlook for omni-modal models.