Overview
- Microsoft introduced MAI‑Voice‑1 for speech and MAI‑1‑preview for text as its first internally developed foundation models.
- MAI‑Voice‑1 generates a minute of audio in under a second on a single GPU and is already live in Copilot Daily and Podcasts, with trials available through Copilot Labs.
- MAI‑1‑preview, Microsoft’s first end‑to‑end in‑house foundation model, is in public evaluation on LMArena and will begin powering select text use cases in Copilot in the coming weeks.
- Microsoft says MAI‑1‑preview was trained on roughly 15,000 Nvidia H100 GPUs using a mixture‑of‑experts architecture and techniques meant to get more out of that compute. The model currently ranks 13th for text tasks on LMArena.
- The company says it now has an operational Nvidia GB200 cluster and has outlined a multi‑year roadmap. It frames the effort as strategic diversification while maintaining its extensive partnership with OpenAI.