Particle.news

Google Limits Meta's Access to Gemini Over Compute Shortage

Google is using rolling five-hour refresh windows and weekly token caps to stretch its limited GPU-backed model capacity during what it frames as an interim demand-management phase.

Overview

  • Multiple reports published on June 28 corroborate that Google told Meta in March it could not deliver the full Gemini compute capacity Meta sought, and that Google has since imposed limits that have delayed some of Meta’s internal AI projects.
  • Google began compute-based rationing for Gemini Apps in mid-May, using rolling five-hour refresh windows plus weekly caps to limit how many AI tokens customers can consume.
  • Meta, whose demand was exceptionally high, has instructed employees to reduce AI token use and optimize workloads to cope with the tighter limits.
  • Other Google Cloud customers have also seen reduced Gemini access but have been less affected than Meta. Google presents the measures as temporary steps while it expands data-centre and chip capacity.
  • The episode underlines that scaling high-end AI compute takes time and substantial investment, a constraint that may push firms to rely more on in-house models, token-efficiency measures, or alternative compute suppliers for less intensive tasks.
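The rationing scheme reported here, a rolling five-hour window combined with a weekly cap, can be illustrated with a simple token quota. This is a minimal sketch under stated assumptions, not Google's implementation: the class name `TokenQuota`, the cap values, and the choice to model the weekly cap as a rolling seven-day window (rather than a calendar week) are all hypothetical, since the reports do not describe how the limits are enforced.

```python
from collections import deque
from dataclasses import dataclass, field

FIVE_HOURS = 5 * 60 * 60      # rolling refresh window, in seconds
ONE_WEEK = 7 * 24 * 60 * 60   # weekly cap, assumed here to be a rolling 7 days


@dataclass
class TokenQuota:
    """Hypothetical per-customer quota: a rolling 5-hour cap plus a weekly cap.

    Both caps are enforced as sliding windows over a log of past grants.
    The cap values passed in are illustrative assumptions only.
    """
    window_cap: int   # max tokens in any rolling 5-hour window
    weekly_cap: int   # max tokens in any rolling 7-day window
    _log: deque = field(default_factory=deque, init=False)  # (timestamp, tokens), oldest first

    def _used_since(self, cutoff: float) -> int:
        # Sum tokens granted strictly after `cutoff`.
        return sum(tokens for ts, tokens in self._log if ts > cutoff)

    def try_consume(self, tokens: int, now: float) -> bool:
        # Forget grants older than a week; they no longer count against either cap.
        while self._log and self._log[0][0] <= now - ONE_WEEK:
            self._log.popleft()
        in_window = self._used_since(now - FIVE_HOURS)
        in_week = self._used_since(now - ONE_WEEK)
        if in_window + tokens > self.window_cap or in_week + tokens > self.weekly_cap:
            return False  # would exceed a cap: the caller must wait or shrink the request
        self._log.append((now, tokens))
        return True


if __name__ == "__main__":
    quota = TokenQuota(window_cap=100_000, weekly_cap=300_000)
    print(quota.try_consume(80_000, 0.0))               # True
    print(quota.try_consume(30_000, 3600.0))            # False: 110k within 5 hours
    print(quota.try_consume(30_000, FIVE_HOURS + 1.0))  # True: first grant rolled out
```

Keeping a log of individual grants makes both caps exact sliding windows, which keeps the example easy to follow; a production limiter serving many customers would more likely use bucketed counters so it does not rescan the log on every request.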