Overview
- Reasoning-enabled models generated an average of 543.5 thinking tokens per question, compared with 37.7 tokens for concise-answer models.
- None of the AI systems that kept emissions below 500 grams of CO₂ equivalent across the 1,000 test questions achieved more than 80% accuracy, revealing a clear accuracy–sustainability trade-off.
- Queries on complex topics such as abstract algebra or philosophy emitted up to six times more CO₂ than simpler subjects like high school history.
- Having DeepSeek R1 answer 600,000 questions would produce emissions equal to a round-trip London–New York flight, while Qwen 2.5 can answer about 1.9 million questions at similar accuracy for the same carbon output.
- Researchers recommend users limit high-capacity, reasoning-enabled models to tasks requiring deep analysis and prompt AI for concise answers to reduce environmental impact.
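
The ratios implied by the figures above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python; the variable names are illustrative and not taken from the study, and the only inputs are the numbers quoted in this overview.

```python
# Figures quoted in the overview (names are illustrative, not from the study).
REASONING_TOKENS_PER_QUESTION = 543.5   # average thinking tokens, reasoning-enabled models
CONCISE_TOKENS_PER_QUESTION = 37.7      # average tokens, concise-answer models
DEEPSEEK_R1_ANSWERS = 600_000           # answers per round-trip-flight's worth of CO2
QWEN_25_ANSWERS = 1_900_000             # answers for the same carbon output

# How many times more tokens a reasoning-enabled model generates per question.
token_ratio = REASONING_TOKENS_PER_QUESTION / CONCISE_TOKENS_PER_QUESTION

# How many times more questions Qwen 2.5 answers for the same emissions,
# which is also the ratio of per-answer emissions (DeepSeek R1 vs. Qwen 2.5).
answer_ratio = QWEN_25_ANSWERS / DEEPSEEK_R1_ANSWERS

print(f"Token ratio: {token_ratio:.1f}x")    # Token ratio: 14.4x
print(f"Answer ratio: {answer_ratio:.1f}x")  # Answer ratio: 3.2x
```

On these numbers, reasoning-enabled models produce roughly 14 times as many tokens per question, and each DeepSeek R1 answer carries a little over three times the emissions of a Qwen 2.5 answer.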