Model Evaluation

NVIDIA Releases Nemotron‑Labs Diffusion Models That Combine Autoregressive and Parallel Decoding

A single checkpoint runs in autoregressive, diffusion, or self‑speculation mode, boosting token throughput and enabling iterative token revision.

Enterprise AI Shifts From Pure RAG to Structure-Aware Retrieval After Real-World Failures

RAG Practice Coalesces Into a Seven-Step Playbook Focused on Retrieval

New RAG Papers Tout Structured, Chunk-Free and Hybrid Retrieval Gains

ARC-AGI-3 Launch Exposes Sharp Gap Between Humans and Top AI Models

AI2 Launches MolmoWeb, an Open Visual Web Agent

New Guidance Reframes Production RAG: Fix Retrieval, Not the LLM

Generator-Focused RAG Benchmark Released as Hierarchical System Wins WattBot 2025

OpenAI and Cerebras Launch GPT-5.3‑Codex‑Spark for Real‑Time Coding at 1,000+ Tokens per Second

Sarvam AI Launches India‑Tuned Voice and Vision Models, Claims Benchmark Wins Over Global Rivals

Two arXiv Preprints Propose Adaptive RAG With Complexity Scoring and Agentic Retrieval

DeepSeek Releases DeepSeek‑OCR 2 With Encoder That Learns Human‑Like Reading Order

Google Releases TranslateGemma, an Open Suite of Efficient Translation Models

LMArena Secures $150 Million Series A at $1.7 Billion Valuation

New RAG Research Debuts Adaptive Retrieval Advances as Security Risks Emerge

Anthropic Study Shows AI Agents Can Autonomously Execute Lucrative Smart-Contract Exploits

Google Rolls Out Nano Banana Pro, Gemini 3–Powered Image Generator Focused on Factual Visuals

OpenAI Says Incentives Drive AI Hallucinations, Calls for Scoreboard Overhaul

New RAG Preprints Propose Retrieval, Reasoning, and Graph Methods to Curb Hallucinations

OpenAI Says Guess-Rewarding Evaluations Drive AI Hallucinations, Proposes Scoring Fix