
Han Peng

Senior Algorithm Expert at Ant Group and Head of Post-Training Algorithms for the Bailing Multimodal Large Model.

PhD in Physics from the University of Oxford. Former Postdoctoral Researcher at the Visual Geometry Group (VGG) at the University of Oxford and Senior Software Engineer at Google, where he worked on the large-scale industrial deployment of computer vision and agent technologies. As a core contributor, he helped build Ming-flash-omni, a 100B-parameter fully multimodal open-source large model. His current focus is post-training for multimodal foundation models, logical reasoning, and next-generation agent technologies.

Topic

Bailing Multimodal Ming-Omni: R&D Practices and Exploration

On the eve of the Spring Festival, the Bailing team open-sourced a MoE-based multimodal foundation model with 100B total parameters, Ming-flash-omni-2.0, aiming to build a versatile AI that can see, hear, speak, and create images. Compared to its predecessor, the new version delivers significant improvements across key dimensions, including multimodal understanding, logical reasoning, image generation and editing, speech recognition, and audio generation, and has reached state-of-the-art (SOTA) performance on multiple authoritative benchmarks. This progress was driven by two major phases of iteration: first, from Ming-lite-omni to Ming-flash-omni-preview, the team validated the impact of model scaling on performance; then, from Ming-flash-omni-preview to Ming-flash-omni-2.0, fine-grained optimization over massive datasets yielded new SOTA results across all modalities. The release of Ming-flash-omni-2.0 demonstrates that a unified-architecture, fully multimodal model can not only serve as a broadly capable generalist but also achieve top-tier expertise in individual modalities.

© boolan.com Boolan. All rights reserved.
