Overview
- Apple clarified technical and deployment details after Monday's WWDC keynote, confirming a five‑model AFM 3 family that spans two on‑device models and three cloud models.
- The most powerful on‑device model, AFM 3 Core Advanced, is a 20‑billion‑parameter sparse model that Apple says activates only 1–4 billion parameters per request and requires top‑tier Apple silicon such as the A19 Pro.
- To run the 20B model on phones, Apple stores the full model in NAND flash and uses Instruction‑Following Pruning to select and lock a fixed set of experts per prompt, so the device does not need to load all weights into DRAM.
- Apple says AFM Cloud Pro runs on NVIDIA GPUs in Google Cloud under its Private Cloud Compute framework, which relies on confidential computing and an append‑only cryptographic ledger of hardware, with researcher access promised through its security bounty program.
- Apple maintains that its AFMs were pre‑ and post‑trained on Apple data with reinforcement learning, then refined using outputs from Google’s frontier models after it licensed Gemini. The company’s privacy claim is that raw user data is not sent to the cloud.
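
The on‑device approach described above for the 20B model (full weights in flash, a fixed set of experts chosen once per prompt, only those experts held in DRAM) can be illustrated with a toy sketch. This is not Apple's implementation: the class name, the scoring input, the expert count of 20, and the 1‑billion‑parameter expert size are all hypothetical, chosen only so the totals line up with the 20B and 1–4B figures Apple has stated. Shared, non‑expert parameters are ignored for simplicity.

```python
# Toy illustration only: every name and size here is an assumption,
# not a detail Apple has published.

def select_experts(scores, k):
    """Return the indices of the k highest-scoring experts, in ascending order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


class FlashBackedModel:
    """Keeps every expert in 'flash' and copies only the chosen ones into 'DRAM'."""

    def __init__(self, num_experts, params_per_expert):
        # Stand-in for weights stored in NAND flash: expert id -> parameter count.
        self.flash = {i: params_per_expert for i in range(num_experts)}
        # Stand-in for experts currently resident in DRAM.
        self.dram = {}

    def load_for_prompt(self, scores, k):
        """Pick k experts once for this prompt and keep them fixed until the next call."""
        chosen = select_experts(scores, k)
        self.dram = {i: self.flash[i] for i in chosen}
        return chosen

    def total_params(self):
        return sum(self.flash.values())

    def resident_params(self):
        return sum(self.dram.values())


if __name__ == "__main__":
    model = FlashBackedModel(num_experts=20, params_per_expert=1_000_000_000)
    # Hypothetical per-prompt relevance scores, one per expert.
    scores = [0.0] * 20
    scores[2], scores[7], scores[11] = 0.9, 0.8, 0.7
    chosen = model.load_for_prompt(scores, k=3)
    print("experts locked for this prompt:", chosen)
    print("parameters in flash:", model.total_params())
    print("parameters in DRAM:", model.resident_params())
```

In this sketch the selection happens once per prompt rather than once per token, which is what lets the chosen experts stay resident in DRAM for the whole request instead of being swapped in and out of flash.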