Benchmarking

3 ARTICLES

last mo.

Z.ai Releases GLM-5.2 Open-Weight Model Trained on Huawei Chips

A 744B-parameter mixture‑of‑experts model with a one‑million‑token context window, released under an MIT license, could widen access to long‑horizon coding by pairing open weights with aggressive API pricing.

5 ARTICLES

2mo ago

Moonshot AI Releases Kimi K2.6, an Open-Source Model Built for Long-Horizon Coding

3 ARTICLES

8mo ago

Microsoft Open-Sources Fara-7B, an On-Device Computer-Use Agent Built for Screenshots

11 ARTICLES

8mo ago

xAI Rolls Out Grok 4.1 Free to All Users, Citing Faster, More Accurate Replies

11 ARTICLES

9mo ago

New Papers Flag Retrieval as RAG’s Weak Link, Propose Practical Fixes

3 ARTICLES

9mo ago

Samsung’s Tiny Recursive Model Tops Bigger AIs on Tough Reasoning Benchmarks

3 ARTICLES

10mo ago

Meta Releases Open-Weights Code World Model With Built-In Execution Simulation

3 ARTICLES

10mo ago

DeepSeek Releases V3.1‑Terminus With Reliability Fixes, Mode Limits and Agent Gains

10 ARTICLES

10mo ago

OpenAI Says Incentives Drive AI Hallucinations, Calls for Scoreboard Overhaul

6 ARTICLES

11mo ago

Google Launches Gemma 3 270M to Enable Hyper-Efficient On-Device AI

3 ARTICLES

last yr.

Apple Study Finds Reasoning Models Collapse on Complex Puzzles

6 ARTICLES

last yr.

Microsoft Unveils Phi-4 Reasoning Models That Outperform Larger AI Systems

7 ARTICLES

last yr.

Meta Faces Backlash Over Use of Experimental AI Model for Benchmark Testing

3 ARTICLES

last yr.

Mistral Unveils Small 3 AI Model, Rivaling Larger Competitors in Efficiency and Accuracy

5 ARTICLES

last yr.

Alibaba Launches QwQ-32B AI Model to Challenge OpenAI's Reasoning Models

5 ARTICLES

last yr.

Mistral Unveils Pixtral 12B, Its First Multimodal AI Model

JetBrains Open-Sources Mellum2, a 12B Mixture-of-Experts Model

Nvidia Unveils Cosmos 3, an Open World Model for Robots and Vehicles

Cisco Releases Open-Source Toolkit and Constitution to Verify AI Model Lineage
