Junlin Zhang

Chief Scientist and Head of AI R&D Department at Sina Weibo

Junlin Zhang is a director of the China Society for Computational Linguistics and holds a Ph.D. from the Institute of Software, Chinese Academy of Sciences. He currently serves as Chief Scientist and Head of AI R&D at Sina Weibo. Previously, he was a senior technical expert at Alibaba, where he led a new-technology team. He is also the author of the technical books *This is a Search Engine: A Detailed Explanation of Core Technologies* and *Big Data Daily Knowledge: Architecture and Algorithms*.

Topic

Reinforcement Learning with Verifiable Rewards (RLVR): Industry Experience, Challenges, and Future Directions

Since the release of DeepSeek-R1, Reinforcement Learning with Verifiable Rewards (RLVR) has become a core engine of large-model development. By optimizing against objectively verifiable signals, such as whether a mathematical answer is correct, RLVR improves models' reasoning ability and moves AI from “subjective alignment” toward “objective correctness.” However, the rapid growth of RLVR research has also made technology selection confusing: with so many proposed improvements, developers find it hard to weigh the trade-offs. This talk first summarizes recent academic and industry practice, including the choice between on-policy and off-policy training, how the various RL methods relate to one another, and where each one applies. It then analyzes current challenges in RLVR development, such as redundant research and insufficient training stability. Finally, it explores future directions, such as extending RL to optimizing agent actions and designing rubric-based rewards, to give developers a clear framework for technical decision-making.

Outline:
1. Introduction to RLVR
2. Industry Experience with RLVR
3. Challenges in RLVR Development
4. Future Directions for RLVR
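The abstract does not say how a verifiable reward is computed. Purely as an illustration, here is a minimal Python sketch of a binary correctness reward for math problems. It assumes that completions put their final answer in a LaTeX `\boxed{...}` and that a reference answer string is available. The function names `extract_final_answer`, `normalize`, and `verifiable_reward` are hypothetical and do not come from any particular RLVR framework.

```python
import re
from fractions import Fraction
from typing import Optional, Union


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in a completion, if any.

    Assumption: the model is prompted to box its final answer. This simple
    regex does not handle nested braces such as \\boxed{\\frac{1}{2}}.
    """
    matches = re.findall(r"\\boxed\{([^{}]*)\}", completion)
    return matches[-1].strip() if matches else None


def normalize(answer: str) -> Union[Fraction, str]:
    """Parse numeric answers ('0.5', '1/2', '1,000') exactly; otherwise return stripped text."""
    cleaned = answer.replace(" ", "").replace(",", "")
    try:
        return Fraction(cleaned)
    except (ValueError, ZeroDivisionError):
        return cleaned


def verifiable_reward(completion: str, reference: str) -> float:
    """Binary reward: 1.0 if the extracted final answer matches the reference, else 0.0."""
    predicted = extract_final_answer(completion)
    if predicted is None:
        # No checkable answer means no reward.
        return 0.0
    return 1.0 if normalize(predicted) == normalize(reference) else 0.0


if __name__ == "__main__":
    print(verifiable_reward(r"so the answer is \boxed{0.5}", "1/2"))
    print(verifiable_reward(r"\boxed{3}", "4"))
```

The reward is a deterministic check rather than a learned preference model, so fluent but wrong output cannot earn it. The trade-off is that it covers only tasks whose answers can be checked, and a matcher this simple misses equivalent forms. For example, `\boxed{\frac{1}{2}}` is not extracted at all and scores 0.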

© boolan.com 博览. All rights reserved.