Haoze Sun

Baichuan Intelligent Multimodal Algorithm Expert

He graduated from Peking University in 2017 and has hands-on experience in NLP, search, and recommendation. Since joining Baichuan Intelligence, he has worked on text pre-training, SFT, code agents, and multimodal pre-training, and has recently focused on algorithms for omni-modal models, particularly end-to-end speech models. The open-source Baichuan-Omni-1.5 multimodal model currently achieves a strong balance across text capability, image/video understanding, and speech understanding and generation.

Topic

Baichuan-Omni-1.5: Baichuan Intelligence's Practical Exploration of End-to-End Multi-Modal Large Models

As an emerging technology paradigm, end-to-end omni-modal models have attracted wide attention. However, their data organization and training processes face many challenges, such as balancing capability across modalities and the "de-intelligence" effect in the speech modality. This session shares the practical experience of the Baichuan Intelligence multimodal team and opens a discussion with industry colleagues to advance the development of omni-modal models.

Outline:

1. Architecture and training methodology of the Baichuan-Omni-1.5 multimodal model.

2. Unified speech understanding and generation: Baichuan's technical practice and reflections.

3. Future outlook for omni-modal models.
