Overview
- DeepSeek says V3.1 adopts a UE8M0 FP8 format tailored for soon‑to‑launch domestic accelerators, without naming specific chipmakers.
- The release unifies reasoning and non‑reasoning modes in a single model, selectable via chat templates, and expands the context window to 131,072 tokens.
- Performance claims include faster responses and fewer “thinking” tokens per response, which lowers serving costs, plus a BrowseComp score of 30 versus 8.9 for an earlier model.
- Access is available through DeepSeek’s chatbot and API, with downloadable weights for base and instruct models on Hugging Face and ModelScope.
- Developer pricing for API use will change on September 6, as the company continues assessing Huawei hardware for inference after earlier training setbacks that led it to rely on Nvidia’s H20 chips.
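For context on the format named above: UE8M0 in the OCP Microscaling (MX) specification is an unsigned, exponent‑only 8‑bit scale (8 exponent bits, 0 mantissa bits), so every representable value is a power of two. The article does not detail DeepSeek’s exact convention, so this is a minimal sketch assuming the MX convention of bias 127 with the byte 0xFF reserved for NaN:

```python
# Hedged sketch of UE8M0, assuming the OCP MX convention:
# value = 2**(e - 127), with e = 0xFF reserved for NaN.
import math

def ue8m0_decode(byte: int) -> float:
    """Decode a UE8M0 byte into its power-of-two scale value."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("UE8M0 is a single byte (0..255)")
    if byte == 0xFF:
        return math.nan  # reserved encoding
    return 2.0 ** (byte - 127)

def ue8m0_encode(scale: float) -> int:
    """Encode a positive scale, rounding its exponent to the nearest integer."""
    if scale <= 0 or math.isnan(scale) or math.isinf(scale):
        raise ValueError("scale must be a positive finite number")
    e = round(math.log2(scale)) + 127
    return max(0, min(0xFE, e))  # clamp to the representable range

print(ue8m0_decode(127))   # 1.0
print(ue8m0_decode(130))   # 8.0
print(ue8m0_encode(0.25))  # 125
```

Because every UE8M0 value is a power of two, applying a scale is just an exponent shift, which is cheap to implement in accelerator hardware; that is one plausible reason a format like this would be tailored to new chips.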