Overview
- Nvidia released in‑house benchmarks on Dec. 3 showing that its latest server, built around 72 of its top‑tier chips, delivers roughly a tenfold boost on mixture‑of‑experts inference workloads.
- Tests included Moonshot AI’s Kimi K2 Thinking model, with Nvidia reporting comparable gains on DeepSeek models.
- The company attributes the performance to the large number of chips packed into each system and the high‑speed interconnects linking the accelerators.
- The figures cover inference rather than training, reflecting the industry’s shift toward serving models to users at scale.
- AMD is developing a similar multi‑chip server slated for next year, and Cerebras is among the rivals competing for the inference market.