Sun Yabo
Senior Algorithm Engineer at Kingsoft Office
I focus on large language models (LLMs) and multimodal models. I led the development of Kingsoft Office's government-grade large model pre-training framework and played a key role in developing the Monkey series models and putting multimodal training into practice. I built Kingsoft Office's image translation capabilities from scratch, delivering an end-to-end pipeline for layout understanding and multilingual translation. I am committed to deeply integrating large model technology with real-world office scenarios to drive the large-scale adoption of intelligent office products.
Topic
Exploration and Practice of Multimodal Technologies in the Office Domain
This talk shares Kingsoft Office's exploration of and practical experience with multimodal technologies in office scenarios. Building on the Monkey series of models and combining large language models (LLMs) with multimodal techniques, we have built an intelligent office system for document understanding and image translation. In document parsing, multimodal models enable precise analysis of, and information extraction from, complex document layouts. In image translation, we developed an end-to-end pipeline from scratch for layout understanding and multilingual translation that integrates text recognition, semantic understanding, and translation output. The talk covers the technical roadmap, model optimization, real-world deployment results, and innovative applications in office settings, along with the challenges encountered in working with multimodal large models, and shows how cutting-edge AI technologies can be deeply integrated into real office products to enhance user experience and improve work efficiency.