Overview
- Multiple reports say Apple will send some compute‑intensive Siri queries to Google Cloud, where licensed Gemini models will run on Nvidia Blackwell B200 GPUs that support hardware‑level confidential computing.
- Apple tested a modified Gemini model on its Private Cloud Compute servers and concluded the in‑house system could not deliver the inference speed needed for a real‑time chatbot‑style Siri.
- The planned setup is a hybrid design that keeps smaller, privacy‑focused models on device and hands off heavier requests to cloud GPUs, with Apple expected to keep Private Cloud Compute branding for parts of the service.
- Apple is expected to preview the redesigned, conversational Siri at WWDC on June 8 and to stage a wider rollout later in the year, possibly starting as a labeled beta or an early‑access waitlist.
- The move departs from Apple’s usual insistence on controlling the full stack and raises questions about trust and revenue paths. It also comes with a technical privacy safeguard: Nvidia’s confidential computing is designed to keep data encrypted and isolated while it is being processed.