Particle.news

Apple’s M5 Macs With macOS 26.2 Deliver Up to 4x Faster Local LLM Start Times in MLX

MLX now uses the M5’s Neural Accelerators via macOS 26.2 to cut time to first token dramatically.

Overview

  • Apple’s official benchmarks show up to roughly 4x improvements in time to first token versus M4 when running LLMs with MLX on M5 hardware.
  • Dense 14B models drop below about 10 seconds to first token, while a 30B Mixture‑of‑Experts example falls to under about 3 seconds.
  • Subsequent‑token generation improves by 19–27%, attributable to the M5’s higher memory bandwidth of about 153 GB/s versus 120 GB/s on M4.
  • Independent reporting cites a Qwen3‑14B 4‑bit run moving from roughly 36 seconds to around 8 seconds on M5, corroborating Apple’s results.
  • MLX remains open source with Python, Swift, and C/C++ support, adds fast built‑in quantization for Hugging Face models, and requires macOS 26.2 to tap the M5 accelerators.
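The benchmarks above hinge on two distinct metrics: time to first token (dominated by prompt prefill) and the subsequent‑token generation rate (bounded by memory bandwidth). As a minimal sketch of how those numbers are derived from a generation run, here is a small Python helper that computes both from per‑token timestamps. The function and type names are illustrative, not part of the MLX API.

```python
from dataclasses import dataclass

@dataclass
class GenStats:
    ttft_s: float        # time to first token, in seconds
    tokens_per_s: float  # subsequent-token generation rate

def gen_stats(start_s: float, token_times_s: list[float]) -> GenStats:
    """Compute TTFT and the subsequent-token rate from a generation
    start time and the wall-clock timestamp of each emitted token."""
    if not token_times_s:
        raise ValueError("no tokens generated")
    # TTFT: delay from request start to the first emitted token.
    ttft = token_times_s[0] - start_s
    if len(token_times_s) < 2:
        return GenStats(ttft, 0.0)
    # Rate over the decode phase only, excluding prefill:
    # (n - 1) inter-token intervals span the first-to-last token window.
    span = token_times_s[-1] - token_times_s[0]
    rate = (len(token_times_s) - 1) / span
    return GenStats(ttft, rate)

# Hypothetical run: first token 8 s after start, then one token every 50 ms.
stats = gen_stats(0.0, [8.0 + 0.05 * i for i in range(21)])
print(round(stats.ttft_s, 2), round(stats.tokens_per_s, 1))  # → 8.0 20.0
```

An M5's roughly 4x prefill speedup shrinks the first number, while its higher memory bandwidth lifts the second by the reported 19–27%.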