Particle: Google Limits Meta’s Access to Gemini Over Compute Shortage

Overview

Google told Meta in March that it could not deliver the full Gemini inference capacity Meta requested and has kept compute‑based usage limits in place to manage demand.
The shortfall disrupted some of Meta’s internal AI projects and prompted the company to ask employees to conserve AI “tokens” while shifting more workloads to its in‑house Muse Spark model.
Several other Google Cloud customers have seen reduced access to Gemini at a smaller scale, and CEO Sundar Pichai has acknowledged Google is “compute‑constrained in the near term” with a sharply larger Cloud backlog.
Providers are responding with stopgaps such as metered usage, emergency leases of third‑party capacity reported with firms like SpaceX, and large capital plans to build more chips and data centres.
The squeeze exists because inference queries consume steady GPU, memory and power resources, and the gap between demand and available hardware is accelerating moves to multi‑provider routing, in‑house models and cost controls that could slow some product rollouts and raise operating costs.