Overview
- DeepSeek-V3.2-Exp introduces DeepSeek Sparse Attention (DSA), a fine-grained sparse attention mechanism that the team says improves training and inference efficiency on long-context workloads while preserving output quality (a hedged sketch of the general idea appears after this list).
- DeepSeek aligned V3.2-Exp’s training setup with V3.1-Terminus and reports overall benchmark parity (e.g., MMLU-Pro 85.0 vs 85.0), with modest task-level variation such as AIME 2025 at 89.3 vs 88.4 and Codeforces at 2121 vs 2046.
- Preliminary internal testing cited by DeepSeek suggests long-context API calls could cost roughly half as much, a claim that awaits third‑party verification.
- The release includes instructions and tools for local use, with Hugging Face conversion scripts, SGLang Docker images for H200, MI350, and NPUs, and day‑0 support in vLLM (see the vLLM sketch after this list).
- Sparse attention kernels are available in FlashMLA, with high‑performance CUDA components in DeepGEMM, and both the codebase and model weights are released under the MIT license.
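
The release itself does not spell out the kernel internals in this summary, so the snippet below is only a rough, assumption-laden illustration of what "fine-grained sparse attention" can look like: it scores all keys densely and then restricts softmax attention to the top-k keys per query. The real DSA kernels avoid the dense scoring step entirely and live in FlashMLA/DeepGEMM; the function name `topk_sparse_attention`, the shapes, and the top-k value here are illustrative, not part of the release.

```python
# Toy sketch of top-k sparse attention (NOT the DeepSeek DSA kernels).
# For exposition it computes dense scores first, which real sparse kernels avoid.
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, k_sel=16):
    """Attend each query only to its k_sel highest-scoring keys."""
    d = q.size(-1)
    scores = (q @ k.transpose(-2, -1)) / d ** 0.5   # [T, T] dense scores (toy stand-in for an indexer)
    k_sel = min(k_sel, scores.size(-1))
    top = scores.topk(k_sel, dim=-1)                # best k_sel keys per query
    sparse = torch.full_like(scores, float("-inf"))
    sparse.scatter_(-1, top.indices, top.values)    # keep only the selected scores
    return F.softmax(sparse, dim=-1) @ v            # weighted sum over the selected keys

T, d = 128, 32
q, k, v = (torch.randn(T, d) for _ in range(3))
print(topk_sparse_attention(q, k, v).shape)         # torch.Size([128, 32])
```

The point of the sketch is the selection step: each query ends up with a probability distribution over only a small subset of keys, which is what makes long-context attention cheaper when the selection itself is computed by a lightweight mechanism rather than a full score matrix.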
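Given the day‑0 vLLM support mentioned above, a minimal offline-inference sketch might look like the following. The Hugging Face repo id, tensor-parallel size, and sampling settings are assumptions to be checked against the official instructions; the vLLM calls themselves (`LLM`, `SamplingParams`, `generate`) are the library's standard offline API.

```python
# Minimal vLLM offline-inference sketch; repo id and parallelism are assumptions.
from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-V3.2-Exp",  # assumed Hugging Face repo id
    tensor_parallel_size=8,                 # set to the number of GPUs available
    trust_remote_code=True,
)
params = SamplingParams(temperature=0.6, max_tokens=256)
outputs = llm.generate(["Summarize DeepSeek Sparse Attention in two sentences."], params)
print(outputs[0].outputs[0].text)
```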