Particle.news

Alibaba’s Aegaeon Slashes LLM Inference GPUs by 82% in Peer-Reviewed Tests

Peer review at SOSP lends the results credibility, though analysts note uncertainty about whether the approach applies beyond Alibaba’s integrated cloud.

Overview

  • Aegaeon pooled GPU resources at the token level during a multi‑month production beta, cutting the fleet for dozens of LLMs from 1,192 to 213 accelerators.
  • The system raised effective output up to ninefold over Alibaba’s prior serverless setup by scheduling tiny slices of work across shared GPUs.
  • Coverage cites Nvidia H20s as the test hardware, reflecting China’s constrained access to high‑end accelerators under U.S. export controls.
  • The research was presented at the 2025 ACM SOSP in Seoul by authors from Peking University and Alibaba, including CTO Jingren Zhou.
  • Analyses say results may hinge on Alibaba’s vertically integrated stack and network fabric, and some on Wall Street linked the reports to weakness in data‑center‑related stocks.
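The token-level pooling the coverage describes can be pictured as a scheduler that hands a shared accelerator one decode step at a time, rotating across requests for different models rather than pinning a GPU to one model until its request finishes. The sketch below is a toy illustration of that idea under stated assumptions; the class names, the round-robin policy, and the stubbed decode step are this example's inventions, not Aegaeon's actual design.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Request:
    model: str          # which LLM this request targets
    max_tokens: int     # tokens still to generate
    tokens: list = field(default_factory=list)

def decode_one_token(req: Request) -> str:
    # Stand-in for a single per-token decode step on the shared GPU.
    # A real system would swap in the right model's weights/KV cache here.
    return f"<{req.model}:tok{len(req.tokens)}>"

def token_level_schedule(requests: list[Request]) -> list[Request]:
    """Round-robin one decode step per request per turn, so one shared
    accelerator is multiplexed across many models at token granularity."""
    queue = deque(requests)
    while queue:
        req = queue.popleft()
        req.tokens.append(decode_one_token(req))
        if len(req.tokens) < req.max_tokens:
            queue.append(req)   # not done: rejoin the rotation
    return requests

# Two requests for different models interleave on the same "GPU"
done = token_level_schedule([Request("model-a", 3), Request("model-b", 2)])
```

Auto-scaling at this granularity is what lets a small pool serve many sporadically used models; the hard parts the paper addresses (fast model switching, memory management) are elided by the stub above.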