Particle.news

NVIDIA Releases Nemotron‑Labs Diffusion Models That Combine Autoregressive and Parallel Decoding

A single checkpoint runs in autoregressive, diffusion or self‑speculation mode, boosting token throughput and enabling iterative token revision.

Overview

  • NVIDIA published the Nemotron‑Labs Diffusion family as open checkpoints with training code, under permissive licenses for the text models and a separate license for the 8B vision‑language model.
  • The family includes 3B, 8B and 14B text models plus an 8B VLM and offers both base weights and instruction‑tuned chat variants for developer use.
  • Each model supports three generation modes: standard autoregressive decoding, block‑wise diffusion decoding that drafts and refines many tokens in parallel, and self‑speculation that drafts with diffusion then verifies outputs with autoregressive decoding.
  • NVIDIA says the models were pretrained on large corpora with a joint autoregressive and diffusion objective, then fine‑tuned; in its own benchmarks it reports modest accuracy gains over a comparator model and multi‑fold throughput increases.
  • Deployment support is being rolled into SGLang; in the interim, inference access is available via a GitHub issue. The company notes that third‑party validation will be needed to confirm performance across real workloads.
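
The self‑speculation mode described above (draft a block of tokens in parallel, then verify it with autoregressive decoding) can be illustrated with a toy sketch. This is not NVIDIA's implementation or API: the function names `self_speculative_decode`, `toy_draft_block` and `toy_ar_next` are hypothetical stand-ins, decoding is assumed to be greedy, and verification is done one token at a time for readability, whereas a real system would typically check a whole block in one forward pass.

```python
from typing import Callable, List

Token = int


def self_speculative_decode(
    prompt: List[Token],
    draft_block: Callable[[List[Token], int], List[Token]],
    ar_next: Callable[[List[Token]], Token],
    block_size: int,
    max_new_tokens: int,
) -> List[Token]:
    """Draft blocks of tokens, keep the prefix the AR verifier agrees with.

    Under greedy verification the result matches plain autoregressive
    decoding; the drafter only affects how many tokens land per step.
    """
    out = list(prompt)
    target_len = len(prompt) + max_new_tokens
    while len(out) < target_len:
        k = min(block_size, target_len - len(out))
        draft = draft_block(out, k)[:k]  # hypothetical parallel drafter
        progressed = False
        for tok in draft:
            expected = ar_next(out)  # what the AR pass would emit here
            out.append(expected)
            progressed = True
            if tok != expected:
                # First mismatch: keep the AR token, discard the rest of the draft.
                break
        if not progressed:
            # Empty draft: fall back to a single autoregressive step.
            out.append(ar_next(out))
    return out


def toy_ar_next(ctx: List[Token]) -> Token:
    """Toy 'autoregressive model': next token is previous + 1, modulo 10."""
    return (ctx[-1] + 1) % 10


def toy_draft_block(ctx: List[Token], k: int) -> List[Token]:
    """Toy 'diffusion drafter': mostly right, but wrong at the third position."""
    block = []
    last = ctx[-1]
    for _ in range(k):
        last = (last + 1) % 10
        block.append(last)
    if k >= 3:
        block[2] = (block[2] + 5) % 10  # injected error to exercise rejection
    return block


result = self_speculative_decode([0], toy_draft_block, toy_ar_next,
                                 block_size=4, max_new_tokens=8)
print(result)  # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```

In this toy run the drafter's third token is rejected in each full block, so every draft‑and‑verify round still yields three tokens instead of one. That is the intuition behind the throughput claims: accepted draft tokens cost less than generating each token in its own sequential step.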