Particle: NVIDIA Releases Nemotron‑Labs Diffusion Models That Combine Autoregressive and Parallel Decoding

Overview

NVIDIA published the Nemotron‑Labs Diffusion family as open checkpoints with training code. The text models carry permissive licenses, while the 8B vision‑language model ships under a separate license.
The family includes 3B, 8B and 14B text models plus an 8B VLM and offers both base weights and instruction‑tuned chat variants for developer use.
Each model supports three generation modes: standard autoregressive decoding; block‑wise diffusion decoding, which drafts and refines many tokens in parallel; and self‑speculation, which drafts tokens with diffusion and then verifies them with autoregressive decoding.
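The self‑speculation mode can be illustrated with a toy sketch. It is not NVIDIA's implementation: `ar_next` and `diffusion_draft` are hypothetical stand‑ins for the model's autoregressive and diffusion decoding paths, and the sketch assumes greedy decoding with exact‑match acceptance, which the announcement does not specify. Each round drafts a block of k tokens; the longest prefix that agrees with the autoregressive choice is kept, and the first mismatch is replaced by the autoregressive token.

```python
from typing import List, Tuple

Token = int


def ar_next(context: List[Token]) -> Token:
    """Hypothetical autoregressive path: deterministic greedy next token."""
    return (context[-1] * 3 + 1) % 10


def diffusion_draft(context: List[Token], k: int) -> List[Token]:
    """Hypothetical diffusion path: proposes a block of k tokens in one call.

    It follows the same rule as ar_next but deliberately corrupts every
    absolute position divisible by 4, so some drafted tokens get rejected.
    """
    draft: List[Token] = []
    last = context[-1]
    for i in range(k):
        last = (last * 3 + 1) % 10
        if (len(context) + i) % 4 == 0:
            last = (last + 5) % 10  # injected drafting error
        draft.append(last)
    return draft


def autoregressive_decode(prompt: List[Token], max_new: int) -> List[Token]:
    """Baseline: one verified token per step."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(ar_next(out))
    return out[len(prompt):]


def self_speculative_decode(
    prompt: List[Token], max_new: int, k: int = 4
) -> Tuple[List[Token], int]:
    """Draft k tokens per round, keep the prefix the AR path agrees with."""
    out = list(prompt)
    target = len(prompt) + max_new
    rounds = 0
    while len(out) < target:
        rounds += 1
        draft = diffusion_draft(out, k)
        for tok in draft:
            if len(out) >= target:
                break
            # A real model would score every drafted position in one forward pass;
            # this toy calls ar_next once per position for clarity.
            expected = ar_next(out)
            if tok != expected:
                out.append(expected)  # replace the first mismatch with the AR token
                break
            out.append(tok)  # drafted token verified, keep it
    return out[len(prompt):], rounds


if __name__ == "__main__":
    tokens, rounds = self_speculative_decode([1], max_new=12)
    print(tokens, "in", rounds, "rounds")
```

Because every kept token equals the autoregressive choice, the output matches plain greedy autoregressive decoding; any speedup comes from accepting several drafted tokens per verification round instead of producing one token per step.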
NVIDIA says the models were pretrained on large corpora with a joint autoregressive‑plus‑diffusion objective and then fine‑tuned. In its own benchmarks, the company reports modest accuracy gains over a comparator and multi‑fold throughput increases.
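The announcement does not give the form of the joint objective. One plausible way to combine the two, shown here purely as an assumption, is to add a next‑token cross‑entropy term to a masked‑token reconstruction term scaled by a hypothetical coefficient `lam`. The function names, the weighting, and the plain averaging are illustrative only; masked‑diffusion losses in the literature usually also apply a noise‑level‑dependent weight, which this sketch omits.

```python
import math
from typing import Dict, List, Sequence

# A probability table maps token id -> predicted probability for one position.
Probs = Dict[int, float]


def cross_entropy(probs: Probs, target: int) -> float:
    """Negative log-likelihood of the target token."""
    return -math.log(probs[target])


def joint_loss(
    tokens: Sequence[int],
    ar_probs: List[Probs],
    diff_probs: Dict[int, Probs],
    masked_positions: Sequence[int],
    lam: float = 1.0,
) -> float:
    """Hypothetical joint objective: AR next-token loss + lam * masked-denoising loss."""
    # AR term: the prediction made at position t is scored against tokens[t + 1].
    ar_terms = [cross_entropy(ar_probs[t], tokens[t + 1]) for t in range(len(tokens) - 1)]
    ar_loss = sum(ar_terms) / len(ar_terms)
    # Diffusion term: predictions at masked positions are scored against the clean tokens.
    diff_terms = [cross_entropy(diff_probs[p], tokens[p]) for p in masked_positions]
    diff_loss = sum(diff_terms) / len(diff_terms)
    return ar_loss + lam * diff_loss


if __name__ == "__main__":
    loss = joint_loss(
        tokens=[5, 8, 2],
        ar_probs=[{8: 0.5}, {2: 0.25}],
        diff_probs={1: {8: 0.5}},
        masked_positions=[1],
    )
    print(round(loss, 4))
```

Summing the two terms lets one set of weights learn both left‑to‑right prediction and parallel reconstruction of masked positions, which is what allows a single checkpoint to serve the decoding modes described above.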
Deployment support is being added to SGLang; in the meantime, interim inference access is available via a GitHub issue. The company notes that third‑party validation will be needed to confirm performance on real workloads.