Particle.news

DeepSeek Releases Open-Weight V3.2-Exp, a Sparse-Attention Model Claiming Up to 50% Lower Long-Context Costs

The open-weight release invites independent testing with day-one support for Chinese NPUs.

Overview

  • DeepSeek-V3.2-Exp introduces DeepSeek Sparse Attention, a two‑stage system using a “lightning indexer” and fine‑grained token selection to reduce compute for extended inputs.
  • DeepSeek reports benchmark performance on par with V3.1-Terminus after aligning training configurations to isolate the impact of sparse attention.
  • Model weights, code, and inference demos are published on Hugging Face and GitHub under a permissive MIT license for reproduction and adaptation.
  • First-day integrations target Huawei’s Ascend/CANN stack with recipes from vLLM-Ascend and SGLang, with additional support noted for Cambricon and Hygon hardware.
  • Preliminary internal tests suggest API and inference costs for long-context tasks could fall by about half, a claim outlets note requires third‑party validation and safety checks.
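The two-stage design described above can be illustrated with a minimal sketch: a cheap "indexer" first scores all past tokens, and exact attention then runs only over the top-k selected ones, so per-query cost scales with k rather than the full context length. This is an illustrative toy, not DeepSeek's actual implementation; the indexer projection `idx_w` and the top-k size are hypothetical.

```python
import numpy as np

def sparse_attention(q, K, V, idx_w, k_top):
    """One query step of two-stage sparse attention (illustrative sketch).

    Stage 1: a lightweight indexer scores every past token cheaply.
    Stage 2: exact softmax attention runs only over the top-k tokens.
    Shapes and weights here are hypothetical, not DeepSeek's design.
    """
    # Stage 1: cheap relevance scores from a small projection (hypothetical indexer)
    scores = (q @ idx_w) @ K.T                 # shape (T,)
    top = np.argsort(scores)[-k_top:]          # indices of the k_top highest-scoring tokens

    # Stage 2: standard scaled-dot-product attention restricted to selected tokens
    logits = (q @ K[top].T) / np.sqrt(K.shape[1])
    weights = np.exp(logits - logits.max())    # numerically stable softmax
    weights /= weights.sum()
    return weights @ V[top]                    # shape (d,)

rng = np.random.default_rng(0)
T, d = 1024, 64                                # context length, head dimension
q = rng.normal(size=d)
K = rng.normal(size=(T, d))
V = rng.normal(size=(T, d))
idx_w = rng.normal(size=(d, d)) * 0.1          # hypothetical indexer projection
out = sparse_attention(q, K, V, idx_w, k_top=128)
print(out.shape)
```

In this toy setup the expensive attention matmul touches 128 tokens instead of 1,024, which is the kind of compute saving a sparse-attention scheme targets; the real model's savings depend on its indexer quality and selection sizes.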