Overview
- MAI‑Voice‑1, a speech model, is live in Copilot Daily and Podcasts and available for hands‑on testing in Copilot Labs; Microsoft says it can generate a minute of audio in under a second on a single GPU.
- MAI‑1‑preview, Microsoft’s first foundation text model trained end‑to‑end in‑house, is open for public evaluation on LMArena and will roll out to select Copilot text scenarios in the coming weeks.
- Microsoft says MAI‑1‑preview was trained on roughly 15,000 Nvidia H100 GPUs using an efficiency‑focused approach, and that its GB200 clusters are already operational for the next wave of models.
- On LMArena, MAI‑1‑preview currently sits around 13th for text tasks, and Microsoft has opened a form for developers to request early access.
- The shift to in‑house models arrives as multiple outlets report ongoing Microsoft–OpenAI negotiations over an AGI clause and Azure exclusivity, with a SoftBank investment said to hinge on a year‑end agreement.