Overview
- A new study finds that reasoning-enabled LLMs generate an average of 543.5 "thinking" tokens per query, producing up to 50 times more CO₂ emissions than models that answer concisely.
- A clear accuracy–sustainability trade-off emerges: no model that kept emissions below 500 grams of CO₂ equivalent exceeded 80% benchmark accuracy.
- Queries in abstract algebra and philosophy produce up to six times more emissions than straightforward topics such as high school history.
- Polite language such as "please" and "thank you" lengthens responses and raises CO₂ output; instructing models to answer concisely can curb unnecessary energy use.
- Model choice dramatically alters carbon cost at scale: 600,000 answers from DeepSeek R1 emit as much CO₂ as a transatlantic flight, while Qwen2.5 can handle nearly three times that workload for the same impact.
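To make the scale comparison in the last point concrete, here is a back-of-envelope sketch of the per-query arithmetic. The flight footprint figure is an illustrative assumption (the study does not state it here); the query counts follow the ratios reported above.

```python
# Per-query CO2e arithmetic for the DeepSeek R1 vs. Qwen2.5 comparison.
# FLIGHT_KG_CO2E is an assumed transatlantic-flight footprint, not a
# figure from the study; query counts mirror the ratios in the text.

FLIGHT_KG_CO2E = 1_000          # assumed flight footprint in kg CO2e (illustrative)
R1_QUERIES = 600_000            # DeepSeek R1 answers matching that footprint
QWEN_QUERIES = 3 * R1_QUERIES   # Qwen2.5 handles ~3x the workload for the same CO2

def grams_per_query(flight_kg: float, queries: int) -> float:
    """CO2e per answer in grams, given a total budget and a query count."""
    return flight_kg * 1_000 / queries

print(f"DeepSeek R1: {grams_per_query(FLIGHT_KG_CO2E, R1_QUERIES):.2f} g CO2e/query")
print(f"Qwen2.5:     {grams_per_query(FLIGHT_KG_CO2E, QWEN_QUERIES):.2f} g CO2e/query")
```

Under these assumptions, R1 lands around 1.7 g CO₂e per answer and Qwen2.5 around a third of that, which is how a roughly threefold difference in per-query emissions compounds into a flight's worth of carbon at scale.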