Overview
- DeepSeek made the experimental V3.2‑Exp model and paper publicly available on Hugging Face, ModelScope, and GitHub, and rolled the model out to its app and web clients.
- The update introduces DeepSeek Sparse Attention, a fine‑grained sparse attention mechanism aimed at boosting long‑sequence training and inference efficiency (see the illustrative sketch after this list).
- Training settings were aligned with V3.1‑Terminus, and DeepSeek reports roughly comparable performance on public benchmarks while keeping temporary V3.1‑Terminus API endpoints available for comparison.
- DeepSeek reduced API pricing by more than 50% effective immediately, citing lower service costs with the new model.
- Huawei’s Ascend platform announced 0‑day support, open‑sourcing inference code and operators for vLLM and SGLang; it claims a time to first token (TTFT) under 2 seconds and a time per output token (TPOT) under 30 milliseconds at 128K sequence length, alongside new LI and SFA operator implementations and tooling such as PyPTO and TileLang (a generic vLLM usage sketch follows this list).
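
To illustrate the general idea behind fine‑grained sparse attention, the toy PyTorch sketch below selects a small top‑k set of earlier tokens per query and attends only to those. This is a simplified assumption‑level illustration, not the DeepSeek Sparse Attention design itself (the actual indexer, kernels, and training procedure are specified in DeepSeek's paper and code), and because it still computes the full score matrix it shows only the selection idea, not the efficiency gain.

```python
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, k_top=64):
    """Toy per-query top-k sparse attention (single head, causal).

    q, k, v: [seq_len, d] tensors. For each query position, only the
    k_top highest-scoring earlier positions contribute to the output,
    instead of all preceding tokens.
    """
    L, d = q.shape
    scores = q @ k.T / d**0.5                        # [L, L] raw attention scores
    causal = torch.tril(torch.ones(L, L, dtype=torch.bool))
    scores = scores.masked_fill(~causal, float("-inf"))

    # Keep only the top-k scores per query row; mask out everything else.
    k_eff = min(k_top, L)
    topk_vals, topk_idx = scores.topk(k_eff, dim=-1)
    sparse_scores = torch.full_like(scores, float("-inf"))
    sparse_scores.scatter_(-1, topk_idx, topk_vals)

    weights = F.softmax(sparse_scores, dim=-1)       # attention over selected tokens only
    return weights @ v                               # [L, d]

# Example: 1,024-token toy sequence with 64-dim heads (hypothetical sizes).
q = torch.randn(1024, 64)
k = torch.randn(1024, 64)
v = torch.randn(1024, 64)
out = topk_sparse_attention(q, k, v, k_top=128)
print(out.shape)  # torch.Size([1024, 64])
```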
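
For context on the vLLM serving path mentioned above, the sketch below shows a minimal, generic vLLM offline‑inference call. The model ID and engine arguments are assumptions based on the public release; Huawei's Ascend‑specific operator packages and launch configuration are not reproduced here.

```python
# Minimal vLLM offline-inference sketch. Assumptions: the checkpoint is
# published as "deepseek-ai/DeepSeek-V3.2-Exp" and an installed vLLM build
# supports it; Ascend-specific operators from Huawei's release are not shown.
from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-V3.2-Exp",  # assumed Hugging Face repo ID
    trust_remote_code=True,                 # assumption: checkpoint ships custom model code
    tensor_parallel_size=8,                 # adjust to the available accelerators
)

params = SamplingParams(temperature=0.6, max_tokens=256)
outputs = llm.generate(["Summarize sparse attention in one sentence."], params)
print(outputs[0].outputs[0].text)
```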