Overview
- Google has introduced compute-based usage limits for its Gemini API, including rolling five-hour refresh windows and weekly caps, that apply across customers.
- Meta sought far more Gemini capacity than Google could supply and was told it could not get the full allotment, a shortfall that disrupted some internal AI projects.
- The restrictions have led Meta to urge staff to conserve AI 'tokens,' the units used to measure model usage, while other Google clients have felt smaller effects.
- Google executives link the caps to rapid commercial uptake of Gemini and a growing Google Cloud backlog, highlighting the high cost and long lead times of adding specialized AI hardware.
- The episode shows how scarce high-performance compute can reshape competition and product plans, as rivals become buyers and firms delay or redesign AI features that need large-scale models.