Sun Yabo

Senior Algorithm Engineer at Kingsoft Office

I focus on large language models (LLMs) and multimodal learning. I led the development of Kingsoft Office's government-grade large model pre-training framework and played a key role in developing the Monkey series of models and putting multimodal training into practice. I also built Kingsoft Office's image translation capability from scratch, delivering an end-to-end pipeline that covers layout understanding and multilingual translation. My goal is to integrate large model technology deeply with real-world office scenarios and drive the large-scale adoption of intelligent office products.

Topic

Exploration and Practice of Multimodal Technologies in the Office Domain

This talk shares Kingsoft Office's exploration of and practical experience with multimodal technologies in office scenarios. Building on the Monkey series of models together with large language models (LLMs) and multimodal techniques, we have built an intelligent office system for document understanding and image translation. In document parsing, multimodal technologies enable precise analysis of complex document layouts and extraction of the information they contain. In image translation, we developed from scratch an end-to-end pipeline for layout understanding and multilingual translation that integrates text recognition, semantic understanding, and translation output. The talk covers the technical roadmap, model optimization, real-world deployment results, and innovative applications in office settings. It also discusses the practical lessons and challenges of working with multimodal large models, and shows how cutting-edge AI can be built into real office products to improve user experience and work efficiency.

© boolan.com Boolan. All rights reserved.
