Overview
- DeepSeek-V3.2-Exp introduces DeepSeek Sparse Attention, a two-stage mechanism in which a lightweight "lightning indexer" scores earlier tokens and a fine-grained top-k selection step restricts attention to the highest-scoring ones, cutting compute on long-context inputs (see the sketch after this list).
- DeepSeek reports benchmark performance on par with V3.1-Terminus after aligning training configurations to isolate the impact of sparse attention.
- Model weights, code, and inference demos are published on Hugging Face and GitHub under the permissive MIT license for reproduction and adaptation (a download sketch follows this list).
- Day-one integrations target Huawei's Ascend/CANN stack via recipes from vLLM-Ascend and SGLang; additional support is noted for Cambricon and Hygon hardware.
- Preliminary internal tests suggest API and inference costs on long-context tasks could fall by roughly half, a claim that outlets note still requires third-party validation and safety checks.
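
Below is a minimal NumPy sketch of the two-stage idea referenced above: a cheap indexer scores all prior tokens, then full attention runs only over each query's top-k selections. The ReLU-weighted dot-product score follows the form described in DeepSeek's report; the shapes, head count, `top_k` budget, and variable names are illustrative assumptions, not the production implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
L, d, d_idx, n_idx_heads, top_k = 16, 32, 8, 2, 4  # toy sizes (assumptions)

q = rng.standard_normal((L, d))                       # main attention queries
k = rng.standard_normal((L, d))                       # main attention keys
v = rng.standard_normal((L, d))                       # main attention values
q_idx = rng.standard_normal((n_idx_heads, L, d_idx))  # lightning-indexer queries
k_idx = rng.standard_normal((L, d_idx))               # lightning-indexer keys
w_idx = rng.standard_normal((n_idx_heads, L))         # per-head indexer weights

# Stage 1: lightning indexer. Score I[t, s] = sum_h w[h, t] * relu(q_idx[h, t] . k_idx[s]),
# computed with a few small heads so it stays cheap relative to main attention.
logits = np.einsum("htd,sd->hts", q_idx, k_idx)
scores = np.einsum("ht,hts->ts", w_idx, np.maximum(logits, 0.0))
scores[np.triu_indices(L, k=1)] = -np.inf  # causal mask: no indexing of future tokens

# Stage 2: fine-grained token selection. Each query keeps only its top_k
# highest-scoring positions, and dense attention runs over that subset alone.
out = np.zeros((L, d))
for t in range(L):
    sel = np.argsort(scores[t])[-min(top_k, t + 1):]  # selected token indices
    att = q[t] @ k[sel].T / np.sqrt(d)                # attention over the subset only
    att = np.exp(att - att.max())
    att /= att.sum()
    out[t] = att @ v[sel]
```

Because each query attends to at most `top_k` positions, the main attention cost scales with L·k rather than L², which is the mechanism behind the long-context savings in the cost estimate above.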
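
For reproduction, the released checkpoint can be fetched with standard Hugging Face Hub tooling. A short sketch, assuming the repo id matches the announced repository and using an arbitrary local path:

```python
from huggingface_hub import snapshot_download

# Download the released weights for local use; repo_id is assumed to match the
# announced Hugging Face repository, and local_dir is an arbitrary target path.
snapshot_download(
    repo_id="deepseek-ai/DeepSeek-V3.2-Exp",
    local_dir="./DeepSeek-V3.2-Exp",
)
```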