Particle.news

Alibaba Cloud’s Aegaeon Cuts Nvidia GPU Needs by 82% in SOSP Paper

The study draws on a months-long marketplace beta that exposed widespread GPU underutilization.

Overview

  • The research details a reduction from 1,192 to 213 in the Nvidia H20 GPUs needed to serve dozens of models with up to 72 billion parameters.
  • Aegaeon was beta-tested for more than three months in Alibaba Cloud’s model marketplace before the results were presented at SOSP.
  • The team measured skewed demand, with 17.7% of allocated GPUs processing only 1.35% of inference requests.
  • The system pools GPU resources with token-level autoscaling; secondary reports state that a single H20 can serve up to seven LLMs and that model-switching latency fell by 97%.
  • The paper lists Peking University and Alibaba Cloud researchers as authors, including Alibaba Cloud CTO Zhou Jingren, while broader coverage links the work to China’s evolving AI chip landscape.
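The pooling-and-autoscaling idea in the bullets above can be sketched in miniature: instead of pinning one model to one GPU, a scheduler time-slices several models on a shared pool, preempting at token boundaries rather than request boundaries. This is a minimal illustration under stated assumptions, not Aegaeon's actual scheduler; the function name, the `Request`/`GPU` types, and the seven-model residency cap (taken from the secondary reports cited above) are all illustrative.

```python
# Illustrative sketch (NOT the paper's implementation): token-level
# scheduling lets a pool of GPUs time-slice many models, generating
# one token per turn and requeueing unfinished requests.
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Request:
    model: str        # which model this request targets
    tokens_left: int  # tokens still to generate


@dataclass
class GPU:
    # Models whose weights are currently resident on this GPU.
    resident: set = field(default_factory=set)


def token_level_schedule(gpus, requests, max_models_per_gpu=7):
    """Round-robin requests one token at a time across a GPU pool.

    A request prefers a GPU that already hosts its model; otherwise
    the least-loaded GPU loads it, evicting a resident model if the
    (hypothetical) cap is hit -- that eviction is a "model switch".
    Returns tokens generated per GPU, to show the pooling effect.
    """
    queue = deque(requests)
    produced = [0] * len(gpus)
    while queue:
        req = queue.popleft()
        # Prefer a GPU where the model is already resident.
        target = next(
            (i for i, g in enumerate(gpus) if req.model in g.resident), None
        )
        if target is None:
            # Least-loaded GPU takes the model, evicting if full.
            target = min(range(len(gpus)), key=lambda i: len(gpus[i].resident))
            if len(gpus[target].resident) >= max_models_per_gpu:
                gpus[target].resident.pop()  # a "model switch"
            gpus[target].resident.add(req.model)
        # Generate exactly one token, then yield the GPU
        # (token-level preemption) and requeue if unfinished.
        req.tokens_left -= 1
        produced[target] += 1
        if req.tokens_left > 0:
            queue.append(req)
    return produced
```

In this toy form, three models share two GPUs instead of occupying three, which is the same consolidation effect the paper reports at far larger scale.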