Overview
- ZAYA1 is described as the first large-scale Mixture-of-Experts model trained entirely on AMD Instinct MI300X GPUs with AMD Pensando networking and the ROCm software stack, detailed in a technical report published today.
- The base model architecture totals 8.3 billion parameters with 760 million active, with reported results matching or exceeding Llama-3-8B and OLMoE, and rivaling Qwen3-4B and Gemma3-12B, across reasoning, math, and coding benchmarks.
- The 192 GB of HBM per MI300X GPU was cited as enabling training without expert or tensor sharding, reducing engineering complexity and improving throughput (see the rough memory sketch after this list).
- Zyphra reported more than 10x faster model save times using AMD-optimized distributed I/O during large-scale training.
- The training system, jointly engineered with IBM Cloud, used 128 nodes running ROCm 6.4, and industry commentary frames the milestone as a validation of AMD’s platform competitiveness versus Nvidia.
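
The claim that 192 GB of HBM removes the need for expert or tensor sharding can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is illustrative only: the 8.3B parameter count and 192 GB capacity come from the report, but the bf16/fp32 precision split and Adam optimizer layout are assumptions, not Zyphra's disclosed training configuration.

```python
# Rough memory-footprint sketch (assumptions, not the reported configuration):
# does an 8.3B-parameter model's per-GPU training state fit in 192 GB of HBM
# without sharding experts or tensors?

TOTAL_PARAMS = 8.3e9      # ZAYA1-base total parameter count (from the report)
HBM_PER_GPU_GB = 192      # MI300X HBM capacity (from the report)

BYTES_BF16 = 2            # assumed bf16 weights and gradients
BYTES_FP32 = 4            # assumed fp32 Adam moments

weights_gb = TOTAL_PARAMS * BYTES_BF16 / 1e9
grads_gb   = TOTAL_PARAMS * BYTES_BF16 / 1e9
adam_gb    = TOTAL_PARAMS * 2 * BYTES_FP32 / 1e9   # two fp32 moments per param

static_state_gb = weights_gb + grads_gb + adam_gb
print(f"weights:      {weights_gb:6.1f} GB")
print(f"gradients:    {grads_gb:6.1f} GB")
print(f"Adam moments: {adam_gb:6.1f} GB")
print(f"static total: {static_state_gb:6.1f} GB of {HBM_PER_GPU_GB} GB HBM")

# ~100 GB of static training state under these assumptions (~133 GB if fp32
# master weights are also kept), leaving headroom for activations -- consistent
# with the claim that 192 GB per GPU avoids expert or tensor sharding.
```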