Shiwei Yu
Director-level Engineer and Department Head at Google
Primarily responsible for Google's infrastructure planning, design, deployment, and operations.
Topic
Planet-Scale Large Model Infrastructure
In today’s highly competitive AI landscape, computing infrastructure has become one of the key factors determining success or failure.

Outline:
1. Four key components of large models: data, models, chips, and computing power, with barriers increasing at each stage.
2. Why the computing power barrier is the highest and the ultimate determinant of LLM success.
3. What constitutes planet-scale computing infrastructure? Centralized training with direct nuclear power supply and geographically distributed inference.
4. Major challenges of planet-scale computing infrastructure:
   * Challenges from physical laws and solutions: energy density, network density, thermal density
   * Challenges from temporal constraints and solutions: data center planning (location, electricity, water), network planning, chip planning
   * Challenges from financial planning and solutions: financial decision timelines, ROI timelines
   * Challenges from communication and collaboration and solutions: planning → design → construction → deployment → operation → decommissioning