Dezhao Wang

Technical Expert at Alibaba's Taotian Group and Chief Architect of the MNN Team

Dezhao Wang holds a Master's degree from the Institute of Computing Technology, Chinese Academy of Sciences. He currently serves as a Technical Expert at Alibaba's Taotian Group and Chief Architect of the MNN team, responsible for the architecture design and performance optimization of MNN, a high-performance on-device AI inference engine (12.5k GitHub stars), and its large-model optimization branch, MNN-LLM. He leads the evolution of the MNN core engine and led the team to first place in the 2024–2025 IEEE AICAS LLM Performance Optimization Competition using MNN-LLM. His work focuses on the efficient deployment and real-world application of cutting-edge AI models on mobile, IoT, and other edge devices.

Topic

MNN-LLM: Large Language Model Inference Framework for Mobile Devices

Large Language Models (LLMs) have demonstrated remarkable performance in the field of artificial intelligence, but optimizing their inference on edge devices poses significant challenges. The MNN engine, an efficient multi-platform inference framework, supports a wide range of deep learning models and offers excellent versatility and high performance. This talk focuses on deploying LLM inference on edge devices with the MNN engine and on performance optimizations tailored for edge environments.

Outline:
1. Introduction to MNN-LLM
2. Structure and performance analysis of large language models
3. On-device memory optimization
4. On-device heterogeneous performance optimization
5. Application examples

© boolan.com 博览 (Boolan). All rights reserved.