Algorithm Scientist at Tongyi Lab and Core Author of Tongyi DeepResearch

Ph.D. in Software Engineering from Peking University and Algorithm Scientist at Tongyi Lab. He has been deeply involved in the pre-training and post-training of the Qwen2.5 and Qwen3 series of large language models. Specializing in Agentic AI research, he is a core developer of Tongyi DeepResearch, Alibaba's first open-source deep-research agentic LLM, where he is responsible for the end-to-end construction of agent models. His work spans several key stages, including virtual-environment design, high-quality data generation, exploration of advanced agent paradigms, and model pre-training and post-training. He has achieved state-of-the-art performance on multiple authoritative agent benchmarks, including Humanity's Last Exam, BrowseComp, BrowseComp-zh, and GAIA.

Topic

Tongyi DeepResearch: A Full-Stack Methodology for Building SOTA-Level AI Agents

Alibaba recently released Tongyi DeepResearch, a fully open-source agentic LLM that achieved breakthrough results on several challenging web information retrieval and reasoning benchmarks, performing on par with or even surpassing leading closed-source models. Beyond open-sourcing a high-performance model, this work is the first to publicly release a reproducible, full-stack methodology for building agents. Its core technical breakthroughs span three areas:

1. **Innovative agent paradigms:** Introducing the model's strong performance under the basic ReAct paradigm, and presenting IterResearch (deep mode), a new paradigm designed for extremely complex tasks. By dynamically planning and reconstructing the workspace each round, IterResearch maintains robust long-horizon reasoning (an illustrative sketch of the two loops follows the outline below).
2. **Data synthesis flywheel:** Demonstrating how formal task modeling and automated pipeline construction enable large-scale self-generation and iterative refinement of complex training data that exceeds human-expert quality, breaking through the performance ceiling of agentic LLMs.
3. **Integrated training pipeline:** Detailing the complete process from Agentic CPT and SFT to on-policy RL, revealing the underlying mechanisms that drive the evolution of agent capabilities.

Outline:

I. Introduction and key achievements
II. Agent paradigms: from basic reasoning to deep research
III. End-to-end data synthesis
IV. Integrated training pipeline
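To make the contrast between the two paradigms concrete, here is a minimal, illustrative Python sketch: a standard ReAct loop whose context grows with every step, and an IterResearch-style loop that rebuilds a compact workspace (an evolving report plus the latest observation) each round. The function names (`call_llm`, `run_tool`) and the workspace layout are assumptions made for illustration only, not the Tongyi DeepResearch implementation.

```python
"""Sketch contrasting a basic ReAct loop with an IterResearch-style loop.

All names here (call_llm, run_tool, Workspace) are illustrative stand-ins,
not the actual Tongyi DeepResearch code.
"""

from dataclasses import dataclass


def call_llm(prompt: str) -> str:
    """Placeholder for a chat-model call; returns the model's next step."""
    raise NotImplementedError("wire this to your LLM backend")


def run_tool(action: str) -> str:
    """Placeholder for tool execution (web search, page visit, etc.)."""
    raise NotImplementedError("wire this to your tool layer")


def react_loop(question: str, max_steps: int = 20) -> str:
    """Basic ReAct: the full thought/action/observation history accumulates
    in a single, ever-growing context until a final answer is emitted."""
    history = f"Question: {question}\n"
    for _ in range(max_steps):
        step = call_llm(history)          # model produces thought + action
        history += step + "\n"
        if step.startswith("Final Answer:"):
            return step
        observation = run_tool(step)      # execute the chosen tool call
        history += f"Observation: {observation}\n"
    return "No answer within budget."


@dataclass
class Workspace:
    """IterResearch-style state: a compact, evolving report instead of the
    full trajectory, so the prompt stays small over long-horizon tasks."""
    question: str
    report: str = ""                      # distilled findings so far
    last_observation: str = ""            # only the most recent tool result


def iter_research_loop(question: str, max_rounds: int = 20) -> str:
    ws = Workspace(question)
    for _ in range(max_rounds):
        # Each round the prompt is rebuilt from the synthesized report,
        # not from the raw interaction history.
        prompt = (
            f"Question: {ws.question}\n"
            f"Report so far: {ws.report}\n"
            f"Latest observation: {ws.last_observation}\n"
            "Update the report, then choose the next action or answer."
        )
        step = call_llm(prompt)
        if step.startswith("Final Answer:"):
            return step
        # Fold the new step into the report. Modeled here as a separate call
        # for clarity; an agent can emit the updated report and the next
        # action in a single response.
        ws.report = call_llm(f"Fold this step into the report:\n{ws.report}\n{step}")
        ws.last_observation = run_tool(step)
    return "No answer within budget."
```

The design point the sketch is meant to convey: bounding each round's prompt to a synthesized report plus the latest observation keeps long-horizon context from degrading, which is the property the deep mode relies on for robust long-range reasoning.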
