Algorithm Scientist at Tongyi Lab and Core Author of Tongyi DeepResearch
Ph.D. in Software Engineering from Peking University and Algorithm Scientist at Tongyi Lab. He has been deeply involved in pre-training and post-training for the Qwen-2.5 and Qwen-3 series of large language models. Specializing in Agentic AI research, he is a core developer of Tongyi DeepResearch—Alibaba's first open-source deep-research agentic LLM—and is responsible for the end-to-end construction of its agent models. His work spans multiple key stages, including virtual environment design, high-quality data generation, exploration of advanced agent paradigms, and model pre-training and post-training. It has achieved state-of-the-art performance on multiple authoritative agent benchmarks, including Humanity's Last Exam, BrowseComp, BrowseComp-zh, and GAIA.
Topic
Tongyi DeepResearch: A Full-Stack Methodology for Building SOTA-Level AI Agents
Alibaba recently released the fully open-source agentic LLM "Tongyi DeepResearch," which achieved breakthrough results on several challenging web information retrieval and reasoning benchmarks, performing on par with or even surpassing leading closed-source models. This work not only open-sourced a high-performance model but also, for the first time, publicly released a reproducible, full-stack methodology for building agents. Its core technical contributions fall into three areas:

1. **Innovative agent paradigms:** Presenting the model's exceptional performance under the basic ReAct paradigm, and introducing IterResearch (deep mode), a new paradigm designed for extremely complex tasks. By dynamically planning and restructuring the workspace each round, IterResearch sustains robust long-horizon reasoning.
2. **Data synthesis flywheel:** Demonstrating how formal task modeling and automated pipeline construction enable large-scale self-generation and iterative refinement of complex training data that surpasses human-expert quality, breaking through the performance ceiling of agentic LLMs.
3. **Integrated training pipeline:** Detailing the complete process from Agentic CPT and SFT to on-policy RL, revealing the underlying mechanisms driving the evolution of agent capabilities.

Outline:

I. Introduction and key achievements
II. Agent paradigms—from basic reasoning to deep research
III. End-to-end data synthesis
IV. Integrated training pipeline
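To make the contrast with plain ReAct concrete, here is a minimal, hypothetical sketch of an IterResearch-style loop. All names (`Workspace`, `mock_policy`, `mock_tool`) are illustrative stand-ins, not the actual Tongyi DeepResearch implementation: the key idea shown is that each round the agent rebuilds a compact workspace (task, an evolving report, and only the latest observation) instead of accumulating the full trajectory in context.

```python
# Illustrative sketch of an IterResearch-style loop (hypothetical names;
# not the actual Tongyi DeepResearch code). Unlike plain ReAct, which
# appends every step to one ever-growing context, each round here rebuilds
# a compact workspace: the task, an evolving central report, and only the
# most recent tool observation.

from dataclasses import dataclass


@dataclass
class Workspace:
    task: str
    report: str = ""            # evolving synthesis of findings so far
    last_observation: str = ""  # only the newest tool result is kept


def mock_policy(ws: Workspace, round_idx: int):
    """Stand-in for the LLM: returns (updated report, next action)."""
    new_report = (ws.report + f" finding-{round_idx}").strip()
    action = "finish" if round_idx >= 2 else f"search(query-{round_idx})"
    return new_report, action


def mock_tool(action: str) -> str:
    """Stand-in for a web-search or browsing tool call."""
    return f"result-of-{action}"


def iter_research(task: str, max_rounds: int = 5) -> str:
    ws = Workspace(task=task)
    for i in range(max_rounds):
        # The policy sees only the compact workspace, not the full history,
        # so context length stays bounded regardless of the horizon.
        ws.report, action = mock_policy(ws, i)
        if action == "finish":
            return ws.report
        ws.last_observation = mock_tool(action)
    return ws.report


print(iter_research("example question"))  # → finding-0 finding-1 finding-2
```

The design choice this sketch highlights is that long-horizon robustness comes from bounding the per-round context via the restructured workspace, while the report carries forward the distilled state between rounds.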