Particle: Google DeepMind’s DiffusionGemma Brings 4x Faster Text Generation

Overview

Google DeepMind released DiffusionGemma on June 10, 2026. It is a 26-billion-parameter mixture-of-experts model that activates 3.8 billion parameters at inference and is available on Hugging Face under an Apache 2.0 license.
The model uses diffusion-style block generation to draft and iteratively refine up to 256 tokens in parallel. This design shifts the bottleneck from memory bandwidth to compute and yields vendor-reported speedups of up to about 4x.
Google and NVIDIA published day-one benchmarks showing throughput of more than 1,000 tokens per second on an H100 and more than 700 tokens per second on an RTX 5090. NVIDIA also provides platform playbooks and optimizations across RTX and DGX systems.
DiffusionGemma is explicitly experimental and trades raw output quality for speed; Google recommends standard Gemma 4 for applications that require the highest quality.
Practical local deployment is limited today because the model relies on a specialized drafter/speculative-decoding component that is not yet integrated into many common public runtimes. Community toolchain work and independent benchmarking will therefore determine real-world adoption and performance.
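
To make the block-generation idea concrete, here is a minimal back-of-the-envelope sketch in Python. It only counts forward passes and is not DiffusionGemma's actual algorithm. The 256-token block size comes from the description above; the number of refinement steps per block (refine_steps) and both function names are hypothetical, chosen purely for illustration. The rough intuition is that in single-stream decoding each forward pass has to read the active weights from memory, so fewer and larger passes trade memory traffic for compute.

```python
import math


def autoregressive_passes(num_tokens: int) -> int:
    """Token-by-token decoding: one forward pass per generated token."""
    return num_tokens


def block_diffusion_passes(
    num_tokens: int,
    block_size: int = 256,   # from the article: up to 256 tokens drafted in parallel
    refine_steps: int = 16,  # ASSUMPTION: illustrative value, not a published figure
) -> int:
    """Block generation: each block is drafted and refined for a fixed number of passes."""
    blocks = math.ceil(num_tokens / block_size)
    return blocks * refine_steps


if __name__ == "__main__":
    n = 1024
    ar = autoregressive_passes(n)
    bd = block_diffusion_passes(n)
    print(f"autoregressive forward passes: {ar}")
    print(f"block-diffusion forward passes: {bd}")
```

Under these assumed numbers, 1,024 tokens need 1,024 passes token by token but only 64 passes in blocks. The pass count is not a speedup figure: each block pass processes up to 256 positions and costs far more compute than a single-token pass, which is consistent with the bottleneck moving from memory bandwidth to compute and with reported gains well below the raw pass ratio.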