Overview
- Mu is a transformer-based small language model with 330 million parameters. Microsoft produced it by distillation and then fine-tuned it on more than 3.6 million examples using low-rank adaptation (LoRA).
- Mu is rolling out to Copilot+ PCs in Windows 11 Insider builds, where it powers AI agents that carry out natural-language commands in the Settings app.
- Microsoft claims Mu delivers sub-half-second response times at more than 100 tokens per second and can exceed 200 tokens per second on optimized devices like the Surface Laptop 7.
- In benchmarks on Qualcomm’s Hexagon NPU, Mu showed a 47 percent lower time to first token and nearly five times the decoding speed of comparable decoder-only models.
- All of Mu’s computations run on the device’s NPU, so personal data stays on the PC, in line with Microsoft’s broader data sovereignty and privacy initiatives.
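
To put the throughput figures above in perspective, the following is a minimal back-of-the-envelope sketch, not Microsoft's benchmarking method. It assumes response time is simply time to first token plus output tokens divided by decode rate. The function names and the `time_to_first_token_s` parameter are hypothetical, and the summary does not state Mu's actual time to first token, so it defaults to zero here.

```python
import math


def response_time_s(output_tokens: int,
                    tokens_per_second: float,
                    time_to_first_token_s: float = 0.0) -> float:
    """Estimate end-to-end response time under a simple linear model.

    Assumption: total time = time to first token + tokens / decode rate.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return time_to_first_token_s + output_tokens / tokens_per_second


def max_tokens_within(budget_s: float,
                      tokens_per_second: float,
                      time_to_first_token_s: float = 0.0) -> int:
    """Largest whole number of output tokens that fits in a time budget."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    decode_window_s = budget_s - time_to_first_token_s
    return max(0, math.floor(decode_window_s * tokens_per_second))


if __name__ == "__main__":
    # 100 and 200 tokens/s are the lower bounds quoted in the summary;
    # 0.5 s is the "sub-half-second" response budget.
    for rate in (100, 200):
        tokens = max_tokens_within(0.5, rate)
        print(f"{rate} tok/s: about {tokens} tokens in 0.5 s")
```

Under these assumptions, a half-second budget leaves room for short outputs only, which is consistent with a model that maps a natural-language request to a Settings action rather than generating long text.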