Ying Wen
Tenure-track Associate Professor, School of Artificial Intelligence, Shanghai Jiao Tong University, China
Ying Wen is a tenure-track associate professor and PhD supervisor at the School of Artificial Intelligence, Shanghai Jiao Tong University. His research interests include multi-agent learning, reinforcement learning, and the application of game theory to these areas. He received his Ph.D. and research master's degree from the Department of Computing, University College London, UK, in 2020 and 2016, respectively. He was selected for the Shanghai Overseas High-level Talents program, and he has served as project lead on a topic under the National Key Research and Development Program and on a project under the Shanghai Young Science and Technology Talents Yangfan Program. He has published more than 40 papers at ICML, NeurIPS, ICLR, IJCAI, AAMAS, and other top international conferences in related fields, and he won the Best System Paper Award at CoRL 2020 and the Best Paper Award in the AAMAS 2021 Blue Sky Track. For many consecutive years he has served as a PC member or reviewer for prestigious international conferences and journals such as ICML, NeurIPS, IJCAI, AAAI, IROS, ICAPS, and Operational Research.
Topic
Reinforcement feedback-based self-improvement and reasoning enhancement for large models
The ability of Large Language Models (LLMs) to improve relies on continuous access to high-quality data and feedback signals. While the pre-training phase already uses a large amount of high-quality data, the key to continued growth lies in the constant introduction of new high-quality data. Because manual data production is costly and cannot keep up with demand, it becomes crucial to explore ways for large models to generate and filter data iteratively on their own. This talk will explore the data reproduction process of large models, which consists of three steps: generation, evaluation, and training. The core challenge is to design efficient algorithms and feedback utilization mechanisms that screen and evaluate data effectively, to apply reinforcement learning with feedback signals at different levels so that only the most valuable data is used for iterative training of the model, and to strengthen the model's handling of complex reasoning and decision-making tasks in the inference phase.