Overview
- NVIDIA published the Nemotron‑Labs Diffusion family as open checkpoints with training code. The text models carry permissive licenses, while the 8B vision‑language model (VLM) is released under a separate license.
- The family includes 3B, 8B and 14B text models plus an 8B VLM and offers both base weights and instruction‑tuned chat variants for developer use.
- Each model supports three generation modes: standard autoregressive decoding; block‑wise diffusion decoding, which drafts and refines many tokens in parallel; and self‑speculation, which drafts with diffusion and then verifies the draft with autoregressive decoding.
- NVIDIA says the models were pretrained on large corpora with a joint autoregressive and diffusion objective and then fine‑tuned. In its own benchmarks, it reports modest accuracy gains over a comparator model and multi‑fold throughput increases.
- Deployment support is being rolled into SGLang; in the interim, inference access is available via a GitHub issue. The company notes that third‑party validation will be needed to confirm performance across real workloads.
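
The self‑speculation mode described above (draft a block with diffusion, then verify it autoregressively) can be illustrated with a toy sketch. This is not NVIDIA's implementation or API: the function names, the integer "tokens", and both toy models are hypothetical stand‑ins, and the verifier is called one token at a time for clarity, whereas a real system would presumably score the whole draft in a single batched pass. The sketch only shows the accept‑or‑correct loop, under the assumption of greedy decoding.

```python
from typing import Callable, List

Token = int  # hypothetical: tokens are plain integers in this toy example


def self_speculative_decode(
    prompt: List[Token],
    draft_block: Callable[[List[Token], int], List[Token]],
    ar_next: Callable[[List[Token]], Token],
    block_size: int,
    max_new_tokens: int,
) -> List[Token]:
    """Draft k tokens at once, then keep the prefix the verifier agrees with."""
    out = list(prompt)
    target_len = len(prompt) + max_new_tokens
    while len(out) < target_len:
        k = min(block_size, target_len - len(out))
        draft = draft_block(out, k)  # stand-in for the parallel diffusion draft
        accepted: List[Token] = []
        for tok in draft:
            expected = ar_next(out + accepted)  # stand-in for the AR verifier
            if tok == expected:
                accepted.append(tok)
            else:
                # First mismatch: keep the verifier's token and discard the rest.
                accepted.append(expected)
                break
        if not accepted:
            # Empty draft: fall back to one plain autoregressive step.
            accepted.append(ar_next(out))
        out.extend(accepted)
    return out


def toy_ar_next(ctx: List[Token]) -> Token:
    """Toy 'autoregressive model': the next token is last + 1 (mod 10)."""
    return (ctx[-1] + 1) % 10


def toy_draft_block(ctx: List[Token], k: int) -> List[Token]:
    """Toy 'diffusion drafter': mostly right, deliberately wrong at every 4th position."""
    draft: List[Token] = []
    last = ctx[-1]
    for i in range(k):
        nxt = (last + 1) % 10
        if (len(ctx) + i) % 4 == 0:
            nxt = (nxt + 5) % 10  # injected drafting error
        draft.append(nxt)
        last = nxt
    return draft


def greedy_ar_decode(prompt: List[Token], max_new_tokens: int) -> List[Token]:
    """Reference: plain token-by-token autoregressive decoding."""
    out = list(prompt)
    for _ in range(max_new_tokens):
        out.append(toy_ar_next(out))
    return out


if __name__ == "__main__":
    spec = self_speculative_decode([0], toy_draft_block, toy_ar_next, 4, 8)
    print(spec == greedy_ar_decode([0], 8))
```

Because every kept token is either confirmed or supplied by the verifier, the output in this sketch matches plain greedy autoregressive decoding. Any speedup in a real system would come from accepting several drafted tokens per verification pass, which this toy does not measure.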