Overview
- Qwen-Image weights, demo code and documentation went live August 4 on Hugging Face and ModelScope under the open-source Apache 2.0 license.
- Reported benchmark results indicate state-of-the-art image generation and precise text-aware editing, with particularly strong performance in complex multilingual text rendering, especially Chinese.
- A curriculum learning strategy guides training from non-text tasks, through simple prompts, to paragraph-level descriptions, across text-to-image, text-image-to-image and image-to-image objectives.
- The model’s dual-encoding framework separates semantic and reconstructive representations to balance text fidelity with visual consistency during advanced editing operations.
- Beyond generation and editing, Qwen-Image supports object detection, semantic segmentation, depth and edge estimation, novel view synthesis and super-resolution tasks.
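
The curriculum described above, which moves from non-text tasks to simple prompts and then to paragraph-level descriptions, can be illustrated with a small scheduling sketch. This is not Qwen-Image's training code: the stage boundaries (30% and 70% of training), the names `Stage`, `stage_for_step` and `objective_for_step`, and the round-robin rotation over objectives are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    start_frac: float  # fraction of training at which the stage begins (assumed value)


# Hypothetical curriculum: the ordering follows the overview, the boundaries are invented.
STAGES = (
    Stage("non_text", 0.0),
    Stage("simple_prompt", 0.3),
    Stage("paragraph", 0.7),
)

# The three objectives named in the overview.
OBJECTIVES = ("text-to-image", "text-image-to-image", "image-to-image")


def stage_for_step(step: int, total_steps: int) -> Stage:
    """Return the curriculum stage that is active at a given training step."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if not 0 <= step < total_steps:
        raise ValueError("step must lie in [0, total_steps)")
    progress = step / total_steps
    current = STAGES[0]
    # Stages are sorted by start_frac, so the last one already reached is active.
    for stage in STAGES:
        if progress >= stage.start_frac:
            current = stage
    return current


def objective_for_step(step: int) -> str:
    """Rotate through the objectives in round-robin order (an assumed mixing rule)."""
    return OBJECTIVES[step % len(OBJECTIVES)]


if __name__ == "__main__":
    total = 100
    for step in (0, 50, 90):
        print(step, stage_for_step(step, total).name, objective_for_step(step))
```

In a real training loop, the active stage would select which data pool to sample from, while the objective would select the task format. A production schedule would more likely blend stages gradually than switch at hard boundaries.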