Particle News: Alibaba Cloud Open-Sources Qwen3-Omni, TTS and Image-Edit Models With Native Multimodality

Overview

The Sept. 23 release makes Qwen3-Omni, Qwen3-TTS, Qwen3-TTS-Flash, Qwen-Image-Edit-2509 and new Qwen3-Next-80B variants publicly available, with code and demos on GitHub, Hugging Face and ModelScope.
Qwen3-Omni accepts text, image, audio and video inputs and streams responses as text or natural speech in real time. It uses a mixture-of-experts (MoE) Thinker–Talker design with AuT pretraining and a multi-codebook scheme to lower latency.
Alibaba reports coverage of 119 text languages plus 19 speech input languages and 10 speech output languages across the suite.
Company benchmarks claim the best results among open-source models on 32 of 36 audio and video tests, and parity with Gemini 2.5 Pro in automatic speech recognition (ASR) and audio understanding.
Qwen3-TTS offers 17 voices, each covering 10 languages, along with multiple Chinese dialects. TTS-Flash targets lower first-packet latency and greater stability, and Qwen-Image-Edit-2509 adds multi-image editing, better single-image consistency and native ControlNet support.