Zhang Jiagu
vLLM Community Contributor, Red Hat Greater China CTO
Passionate about open-source software and communities, with over 20 years of experience in product development, architecture design, and team management across domains including Linux operating systems, distributed systems, storage and high availability, virtualization and cloud computing, containers and cloud-native technologies, and AI infrastructure. Previously served as Chief Architect, Technical Director, and China Technical Lead at multinational corporations and leading domestic ICT enterprises, and, as an independent contributor, initiated open-source projects and promoted them internationally. Currently serves in the CTO Office at Red Hat Asia Pacific, leading the incubation and promotion of AI inference open-source communities such as vLLM and llm-d, implementing AI technology strategy, and championing the innovative application and deployment of 100% open-source AI products and solutions across vertical industries.
Topic
vLLM-compile: Bringing Compiler Optimizations to Large Model Inference
vLLM has become a widely adopted open-source large-model inference engine, supporting many models and hardware accelerators. Its integration of torch.compile enables performance portability and cleanly decouples model implementation from low-level optimization. This talk explores the design of vLLM-compile, diving into key fused-operator optimizations and graph transformations, and discusses how compilation improves runtime efficiency while boosting developer productivity. We will also cover recent advances in the field, including new techniques for reducing compilation time and a novel compiler intermediate representation designed specifically for large models.

1. Background and Challenges
2. vLLM-compile Architecture Design
3. Detailed Overview of Core Optimization Techniques
4. Latest Features and Future Outlook

Audience Takeaways: Understand the core technologies behind vLLM-compile and the likely directions of its technical evolution.