Liu Shujie
Principal Researcher, Microsoft Research Asia (MSRA) Hong Kong
Dr. Liu Shujie is a Principal Researcher at Microsoft Research Asia (MSRA) Hong Kong. He joined MSRA Beijing in July 2012 after earning his Ph.D. from the School of Computer Science and Technology at Harbin Institute of Technology, and transferred to MSRA Hong Kong in October 2024. His research spans spoken language processing, multimodal large language models, and medical AI, with a focus on improving quality of life through artificial intelligence. His work received the 2025 IEEE SPS Best Paper Award. He has published more than 100 papers in top journals and conferences in natural language processing and speech processing, co-authored the book Machine Translation, and contributed to the book Introduction to Artificial Intelligence. He has achieved multiple first-place results in international evaluations for natural language and speech processing, and has served as a reviewer and area chair for numerous international conferences. His research has been integrated into a range of Microsoft products, including Microsoft Translator, Skype Translator, Microsoft IME, XiaoIce, and Microsoft Speech Service.
Topic
Zero-Shot Speech Synthesis Based on Large Language Models
As large language models (LLMs) see growing use in natural language processing, speech-focused LLMs are also attracting increasing attention. In this talk, we introduce zero-shot speech synthesis based on large language models, specifically VALL-E, which leverages the in-context learning capabilities of LLMs to generate high-quality, personalized speech using only a three-second recording of an unseen speaker as an audio prompt. Building on this foundation, we discuss several extensions of VALL-E: VALL-E X, which adds multilingual support; VALL-E 2, which addresses stability issues; PALLE, which combines autoregressive (AR) and non-autoregressive (NAR) approaches; and MELLE and FELLE, which are based on continuous encoding techniques.