Overview
- DeepSeek-V3.2-Exp introduces DeepSeek Sparse Attention (DSA), a fine-grained sparse attention mechanism that the team says improves training and inference efficiency on long-context workloads while preserving output quality (a hedged sketch of the general idea appears after this list).
- DeepSeek aligned V3.2-Exp’s training setup with V3.1-Terminus and reports overall benchmark parity (e.g., MMLU-Pro 85.0 vs 85.0), with modest task-level variation such as AIME 2025 at 89.3 vs 88.4 and Codeforces at 2121 vs 2046.
- Preliminary internal testing cited by DeepSeek suggests long-context API calls could cost roughly half as much, a claim that awaits third‑party verification.
- The release includes instructions and tools for local use, with Hugging Face conversion scripts, SGLang Docker images for H200, MI350, and NPUs, and day‑0 support in vLLM (see the vLLM sketch after this list).
- Sparse attention kernels are available in FlashMLA, with high‑performance CUDA components in DeepGEMM, and both the codebase and model weights are released under the MIT license.
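
The release itself does not spell out the kernel internals in this summary, so the snippet below is only a rough, assumption-laden illustration of what "fine-grained sparse attention" can look like: it scores all keys densely and then restricts softmax attention to the top-k keys per query. The real DSA kernels avoid the dense scoring step entirely and live in FlashMLA/DeepGEMM; the function name `topk_sparse_attention`, the shapes, and the top-k value here are illustrative, not part of the release.

```python
# Toy sketch of top-k sparse attention (NOT the DeepSeek DSA kernels).
# For exposition it computes dense scores first, which real sparse kernels avoid.
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, k_sel=16):
    """Attend each query only to its k_sel highest-scoring keys."""
    d = q.size(-1)
    scores = (q @ k.transpose(-2, -1)) / d ** 0.5   # [T, T] dense scores (toy stand-in for an indexer)
    k_sel = min(k_sel, scores.size(-1))
    top = scores.topk(k_sel, dim=-1)                # best k_sel keys per query
    sparse = torch.full_like(scores, float("-inf"))
    sparse.scatter_(-1, top.indices, top.values)    # keep only the selected scores
    return F.softmax(sparse, dim=-1) @ v            # weighted sum over the selected keys

T, d = 128, 32
q, k, v = (torch.randn(T, d) for _ in range(3))
print(topk_sparse_attention(q, k, v).shape)         # torch.Size([128, 32])
```

The point of the sketch is the selection step: each query ends up with a probability distribution over only a small subset of keys, which is what makes long-context attention cheaper when the selection itself is computed by a lightweight mechanism rather than a full score matrix.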
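Given the day‑0 vLLM support mentioned above, a minimal offline-inference sketch might look like the following. The Hugging Face repo id, tensor-parallel size, and sampling settings are assumptions to be checked against the official instructions; the vLLM calls themselves (`LLM`, `SamplingParams`, `generate`) are the library's standard offline API.

```python
# Minimal vLLM offline-inference sketch; repo id and parallelism are assumptions.
from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-V3.2-Exp",  # assumed Hugging Face repo id
    tensor_parallel_size=8,                 # set to the number of GPUs available
    trust_remote_code=True,
)
params = SamplingParams(temperature=0.6, max_tokens=256)
outputs = llm.generate(["Summarize DeepSeek Sparse Attention in two sentences."], params)
print(outputs[0].outputs[0].text)
```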