Zhang Jiagu
vLLM Community Contributor, Red Hat Greater China CTO
Passionate about open-source software and communities, with over 20 years of experience in product development, architecture design, and team management across domains including Linux operating systems, distributed systems, storage and high availability, virtualization and cloud computing, containers and cloud-native technologies, and AI infrastructure. Previously served as Chief Architect, Technical Director, and China Technical Lead at multinational corporations and leading domestic ICT enterprises, and, as an independent contributor, initiated open-source projects and promoted them internationally. Currently serves in the CTO Office at Red Hat Asia Pacific, leading the incubation and promotion of AI inference open-source communities such as vLLM and llm-d, implementing AI technology strategy, and championing the innovative application and deployment of 100% open-source AI products and solutions across vertical industries.
Topic
vLLM-compile: Bringing Compiler Optimizations to Large Model Inference
vLLM has become a widely adopted open-source large-model inference engine, supporting many models and hardware accelerators. Its integration of torch.compile enables performance portability and cleanly decouples model implementation from low-level optimization. This talk explores the design of vLLM-compile, diving into key fused-operator optimizations and graph transformations, and discusses how compilation improves runtime efficiency while boosting developer productivity. We will also cover recent advances in the field, including new techniques for reducing compilation time and a novel compiler intermediate representation designed specifically for large models.

1. Background and Challenges
2. vLLM-compile Architecture Design
3. Detailed Overview of Core Optimization Techniques
4. Latest Features and Future Outlook

Audience Takeaways: Understand the core technologies behind vLLM-compile and the likely directions of its technical evolution.