Particle.news

Alibaba’s Upgraded Qwen3 Model Tops OpenAI and DeepSeek in Maths and Coding

The upgrade expands the model’s context window eightfold to 256,000 tokens, enabling it to process lengthy documents.

Overview

  • The open-source Qwen3-235B-A22B-Instruct-2507-FP8 achieved significant improvements in instruction following, logical reasoning, text comprehension, mathematics, science, coding, and tool usage.
  • It scored 70.3 on the 2025 American Invitational Mathematics Examination, outperforming DeepSeek-V3-0324 by 23.7 points and GPT-4o-0327 by 43.6 points.
  • On the MultiPL-E coding benchmark, it earned 87.9 points, ahead of DeepSeek and OpenAI models but just behind Anthropic’s Claude Opus 4 Non-thinking.
  • The model operates exclusively in non-thinking mode, delivering direct answers without exposing intermediate reasoning steps.
  • Alibaba plans to integrate a three-billion-parameter Qwen variant into HP’s Xiaowei Hui assistant on PCs in China to enhance document drafting and meeting summarization.