Particle.news

Alibaba’s Aegaeon Claims 82% Cut in Nvidia GPU Needs in Peer-Reviewed Tests

A SOSP paper describes token-level GPU virtualization validated in months of production at Alibaba Cloud.

Overview

  • In a multi-month beta on Alibaba Cloud's Model Studio, the number of accelerators required fell from 1,192 to 213, a reduction of about 82%. Reports indicate the test used Nvidia H20 chips, which remain available in China under U.S. export rules.
  • Scheduling at the token level let GPUs be pooled across many models and raised effective output by up to nine times compared with older serverless approaches.
  • The research was presented at the 2025 ACM Symposium on Operating Systems Principles (SOSP) in Seoul by authors from Peking University and Alibaba, including Alibaba Cloud CTO Jingren Zhou.
  • The paper does not detail the network fabric, and analysts caution the gains may depend on Alibaba’s vertically integrated stack, leaving portability outside its environment unverified.
  • Market commentary noted shares of several data-center and related companies weakened after the reports, while the broader impact on GPU demand awaits independent replication.
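
The headline figure can be checked from the two GPU counts reported for the beta. The sketch below is only a worked calculation, not anything from the paper; the function name `percent_reduction` is a hypothetical helper introduced for illustration.

```python
def percent_reduction(before: int, after: int) -> float:
    """Return the percentage drop from `before` to `after`."""
    return (before - after) / before * 100


# GPU counts as reported for the Model Studio beta.
gpus_before = 1192
gpus_after = 213

reduction = percent_reduction(gpus_before, gpus_after)
print(f"{reduction:.1f}% fewer GPUs")  # prints "82.1% fewer GPUs"
```

Rounded to the nearest whole number, this matches the 82% in the headline.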