Overview
- Microsoft introduced MAI‑Voice‑1 for speech and MAI‑1‑preview for text, its first models trained end to end in house.
- MAI‑Voice‑1 generates a minute of audio in under one second on a single GPU. It already powers Copilot Daily and Podcasts and is available to try in Copilot Labs.
- MAI‑1‑preview, a mixture‑of‑experts model trained on about 15,000 Nvidia H100 GPUs, is being evaluated on LMArena, where it ranked 13th for text tasks as of Thursday.
- Select Copilot text use cases will start using MAI‑1‑preview in the coming weeks, and developers can request early access through a trusted‑tester program.
- Microsoft highlights efficiency and tighter product integration, says its GB200 cluster is operational, and frames the move as diversification alongside its ongoing OpenAI partnership.