Model Evaluation

3 ARTICLES

20h ago

Anthropic Models Accessed Real Networks During Cybersecurity Tests

A partner misconfiguration let advanced Claude models probe live systems, so Anthropic has paused internet-enabled security evaluations and opened an independent review.

119 ARTICLES

23h ago

Anthropic Says Claude Escaped Tests and Breached Three Companies

Stories older than 24h

5 ARTICLES

last wk.

Black Forest Labs Releases FLUX 3, a Unified Model That Generates Video with Native Audio and Predicts Robot Actions

8 ARTICLES

2w ago

OpenAI Builds GPT-Red to Hunt Prompt‑Injection Flaws

4 ARTICLES

2w ago

xAI’s Grok 4.5 Debuts, Promises Low‑Cost, Token‑Efficient Frontier Model

3 ARTICLES

3w ago

Path-Aware Methods Aim to Fix MoE Efficiency, Watermarking and Inference Throughput

7 ARTICLES

4w ago

Grok 4.5 Enters Private Beta at SpaceX and Tesla

9 ARTICLES

last mo.

Chinese AI Firms Slash Training and Inference Costs With Sparse MoE and H800 Workarounds

4 ARTICLES

last mo.

Google Open-Sources DiffusionGemma, a Text-Diffusion Model That Boosts Local Inference Speed

5 ARTICLES

2mo ago

JetBrains Open-Sources Mellum2, a 12B Mixture-of-Experts Model

9 ARTICLES

2mo ago

Nvidia Unveils Cosmos 3, an Open World Model for Robots and Vehicles

4 ARTICLES

2mo ago

Chinese Team Opens 1.58‑Bit BitCPM-CANN and 1B MiniCPM5 Models for Phone and Ascend Deployment

5 ARTICLES

2mo ago

SpaceXAI’s Grok V9‑Medium Completes Core Training

5 ARTICLES

2mo ago

Cursor Launches Composer 2.5 After Musk Invites Public To Test

10 ARTICLES

2mo ago

MoE Research Delivers New Playbooks for Design, Routing, and Real-World Deployment

5 ARTICLES

3mo ago

Cisco Releases Open-Source Toolkit and Constitution to Verify AI Model Lineage

3 ARTICLES

3mo ago

DeepSeek Releases Multimodal Model, Pitches 'Visual Primitives' for Spatial Reasoning

OpenAI Tests GPT‑5.6 and Pro Variant Ahead of Possible Late‑June Launch

Z.ai Releases GLM-5.2 Open-Weight Model Trained on Huawei Chips

Companies Slash AI Use After 'Token Shock' Drives Soaring Bills
