Overview
- ZAYA1 is described as the first large-scale Mixture-of-Experts model trained entirely on AMD Instinct MI300X GPUs with AMD Pensando networking and the ROCm software stack, detailed in a technical report published today.
- The base model architecture totals 8.3 billion parameters with 760 million active, with reported results matching or exceeding Llama-3-8B and OLMoE, and rivaling Qwen3-4B and Gemma3-12B, across reasoning, math, and coding benchmarks.
- The 192 GB of HBM per MI300X GPU was cited as enabling training without expert or tensor sharding, reducing engineering complexity and improving throughput (see the rough memory sketch after this list).
- Zyphra reported more than 10x faster model save times using AMD-optimized distributed I/O during large-scale training.
- The training system, jointly engineered with IBM Cloud, used 128 nodes running ROCm 6.4, and industry commentary frames the milestone as a validation of AMD’s platform competitiveness versus Nvidia.
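
The claim that 192 GB of HBM removes the need for expert or tensor sharding can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is illustrative only: the 8.3B parameter count and 192 GB capacity come from the report, but the bf16/fp32 precision split and Adam optimizer layout are assumptions, not Zyphra's disclosed training configuration.

```python
# Rough memory-footprint sketch (assumptions, not the reported configuration):
# does an 8.3B-parameter model's per-GPU training state fit in 192 GB of HBM
# without sharding experts or tensors?

TOTAL_PARAMS = 8.3e9      # ZAYA1-base total parameter count (from the report)
HBM_PER_GPU_GB = 192      # MI300X HBM capacity (from the report)

BYTES_BF16 = 2            # assumed bf16 weights and gradients
BYTES_FP32 = 4            # assumed fp32 Adam moments

weights_gb = TOTAL_PARAMS * BYTES_BF16 / 1e9
grads_gb   = TOTAL_PARAMS * BYTES_BF16 / 1e9
adam_gb    = TOTAL_PARAMS * 2 * BYTES_FP32 / 1e9   # two fp32 moments per param

static_state_gb = weights_gb + grads_gb + adam_gb
print(f"weights:      {weights_gb:6.1f} GB")
print(f"gradients:    {grads_gb:6.1f} GB")
print(f"Adam moments: {adam_gb:6.1f} GB")
print(f"static total: {static_state_gb:6.1f} GB of {HBM_PER_GPU_GB} GB HBM")

# ~100 GB of static training state under these assumptions (~133 GB if fp32
# master weights are also kept), leaving headroom for activations -- consistent
# with the claim that 192 GB per GPU avoids expert or tensor sharding.
```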