Haoze Sun
Multimodal Algorithm Expert, Baichuan Intelligence
He graduated from Peking University in 2017 and has hands-on experience in NLP, search, and recommendation. Since joining Baichuan Intelligence, he has worked on text pre-training, SFT, code agents, and multimodal pre-training, and recently focuses on algorithms for omni-modal models, especially end-to-end speech models. The open-source Baichuan-Omni-1.5 multimodal model currently achieves the best balance among text capability, image/video understanding, and speech understanding and generation.
Topic
Baichuan-Omni-1.5: Baichuan Intelligence's Practical Exploration of End-to-End Multi-Modal Large Models
As an emerging technology paradigm, end-to-end omni-modal models have attracted widespread attention. However, their data organization and training pipelines face many challenges, such as balancing capability across modalities and the "de-intelligence" of the speech modality, where adding speech degrades the model's overall intelligence. In this session, the Baichuan Intelligence multimodal team shares its practical experience and invites discussion with industry colleagues to advance the development of omni-modal models.

Outline:
1. Baichuan-Omni-1.5: multimodal model architecture and training method.
2. Unified speech understanding and generation: Baichuan's technical practice and reflections.
3. The future outlook for omni-modal models.