DeepSeek Releases V3.2-Exp to Test Sparse Attention for Long-Context Efficiency

The open-weight model arrives with runnable code and kernel references under an MIT license to enable independent benchmarking.

Overview

  • DeepSeek-V3.2-Exp introduces DeepSeek Sparse Attention, a fine-grained mechanism the team says improves training and inference efficiency on long-context workloads while preserving output quality (an illustrative sketch of the general idea follows this list).
  • DeepSeek aligned V3.2-Exp’s training setup with V3.1-Terminus and reports rough benchmark parity overall, with MMLU-Pro holding at 85.0 and modest task-level shifts such as AIME 2025 at 89.3 vs 88.4 and a Codeforces rating of 2121 vs 2046.
  • Preliminary internal testing cited by DeepSeek suggests long-context API calls could cost roughly half as much, a claim that awaits third‑party verification.
  • The release includes instructions and tools for local use, including Hugging Face conversion scripts, SGLang Docker images for H200, MI350, and NPUs, and day-0 support in vLLM (a sample local client call appears below, after the attention sketch).
  • Sparse attention kernels are available via FlashMLA, with high-performance CUDA components in DeepGEMM, and both the codebase and the model weights are released under the MIT license.
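
For readers unfamiliar with the technique, the sketch below illustrates the general idea of top-k sparse attention: each query attends only to its highest-scoring keys instead of the full context, which is what makes long-context work cheaper. This is a minimal, unoptimized illustration, not DeepSeek's actual DSA implementation; the function name and the top_k parameter are placeholders, and a real kernel would avoid materializing the full score matrix.

```python
# Illustrative top-k sparse attention sketch (NOT DeepSeek's DSA kernels).
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, top_k=16):
    """Single-head, unbatched causal attention where each query keeps only
    its top_k highest-scoring keys. q, k, v: (T, d) tensors."""
    T, d = q.shape
    scores = (q @ k.T) / d ** 0.5                   # full (T, T) scores; shown only for clarity
    causal_mask = torch.ones(T, T).triu(1).bool()   # block attention to future positions
    scores = scores.masked_fill(causal_mask, float("-inf"))
    kk = min(top_k, T)
    top_vals, top_idx = scores.topk(kk, dim=-1)     # best kk keys per query
    sparse_scores = torch.full_like(scores, float("-inf"))
    sparse_scores.scatter_(-1, top_idx, top_vals)   # keep only the selected positions
    weights = F.softmax(sparse_scores, dim=-1)      # dropped positions get zero weight
    return weights @ v

torch.manual_seed(0)
q = k = v = torch.randn(128, 64)
out = topk_sparse_attention(q, k, v, top_k=16)
print(out.shape)  # torch.Size([128, 64])
```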
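Once a local server is running through one of the supported stacks, both SGLang and vLLM expose an OpenAI-compatible endpoint, so a long-context request can be issued with a standard client. The snippet below is a hypothetical example that assumes a server on localhost:8000 and the model ID deepseek-ai/DeepSeek-V3.2-Exp; adjust the port and model name to your setup.

```python
# Hypothetical client call against a locally served DeepSeek-V3.2-Exp.
# Assumes an OpenAI-compatible server (started via vLLM or SGLang) is
# listening on localhost:8000; port and model ID are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3.2-Exp",
    messages=[
        {"role": "user", "content": "Summarize the key points of this long report: ..."},
    ],
)
print(response.choices[0].message.content)
```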