Overview
- DeepSeek-V3.2-Exp introduces DeepSeek Sparse Attention, a two-stage mechanism in which a lightweight "lightning indexer" scores earlier tokens and a fine-grained top-k selection step restricts attention to the highest-scoring ones, cutting compute on long-context inputs (see the sketch after this list).
- DeepSeek reports benchmark performance on par with V3.1-Terminus after aligning training configurations to isolate the impact of sparse attention.
- Model weights, code, and inference demos are published on Hugging Face and GitHub under the permissive MIT license for reproduction and adaptation (a download sketch follows this list).
- Day-one integrations target Huawei's Ascend/CANN stack via recipes from vLLM-Ascend and SGLang; additional support is noted for Cambricon and Hygon hardware.
- Preliminary internal tests suggest API and inference costs on long-context tasks could fall by roughly half, a claim that outlets note still requires third-party validation and safety checks.
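
Below is a minimal NumPy sketch of the two-stage idea referenced above: a cheap indexer scores all prior tokens, then full attention runs only over each query's top-k selections. The ReLU-weighted dot-product score follows the form described in DeepSeek's report; the shapes, head count, `top_k` budget, and variable names are illustrative assumptions, not the production implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
L, d, d_idx, n_idx_heads, top_k = 16, 32, 8, 2, 4  # toy sizes (assumptions)

q = rng.standard_normal((L, d))                       # main attention queries
k = rng.standard_normal((L, d))                       # main attention keys
v = rng.standard_normal((L, d))                       # main attention values
q_idx = rng.standard_normal((n_idx_heads, L, d_idx))  # lightning-indexer queries
k_idx = rng.standard_normal((L, d_idx))               # lightning-indexer keys
w_idx = rng.standard_normal((n_idx_heads, L))         # per-head indexer weights

# Stage 1: lightning indexer. Score I[t, s] = sum_h w[h, t] * relu(q_idx[h, t] . k_idx[s]),
# computed with a few small heads so it stays cheap relative to main attention.
logits = np.einsum("htd,sd->hts", q_idx, k_idx)
scores = np.einsum("ht,hts->ts", w_idx, np.maximum(logits, 0.0))
scores[np.triu_indices(L, k=1)] = -np.inf  # causal mask: no indexing of future tokens

# Stage 2: fine-grained token selection. Each query keeps only its top_k
# highest-scoring positions, and dense attention runs over that subset alone.
out = np.zeros((L, d))
for t in range(L):
    sel = np.argsort(scores[t])[-min(top_k, t + 1):]  # selected token indices
    att = q[t] @ k[sel].T / np.sqrt(d)                # attention over the subset only
    att = np.exp(att - att.max())
    att /= att.sum()
    out[t] = att @ v[sel]
```

Because each query attends to at most `top_k` positions, the main attention cost scales with L·k rather than L², which is the mechanism behind the long-context savings in the cost estimate above.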
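
For reproduction, the released checkpoint can be fetched with standard Hugging Face Hub tooling. A short sketch, assuming the repo id matches the announced repository and using an arbitrary local path:

```python
from huggingface_hub import snapshot_download

# Download the released weights for local use; repo_id is assumed to match the
# announced Hugging Face repository, and local_dir is an arbitrary target path.
snapshot_download(
    repo_id="deepseek-ai/DeepSeek-V3.2-Exp",
    local_dir="./DeepSeek-V3.2-Exp",
)
```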