Particle.news

Google Launches Agentic Vision for Gemini 3 Flash, Bringing Code-Driven Image Reasoning

The capability reduces hallucinations by having the model execute Python code to inspect images step by step.

Overview

  • Agentic Vision is available now to developers through the Gemini API in Google AI Studio and Vertex AI, with an initial rollout in the Gemini app.
  • The system follows a Think–Act–Observe loop in which the model plans tasks, runs Python to manipulate or analyze images, and reviews transformed inputs before answering.
  • Google reports a consistent 5–10% quality improvement across most vision benchmarks when code execution is enabled.
  • PlanCheckSolver.com cites a roughly 5% accuracy gain after using iterative code-driven inspection on high‑resolution building plans.
  • Demos include implicit zooming, programmatic annotation on a visual scratchpad, and visual arithmetic. Google's roadmap covers making more of these behaviors implicit, adding web and reverse image search tools, and extending support beyond the Flash model.
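
The Think–Act–Observe loop described above can be illustrated with a toy sketch. This is not Google's implementation: the `Step`, `crop`, `count_marked`, and `think_act_observe` names are hypothetical, the "image" is a small grid of 0/1 values rather than real pixels, and the plan is hard-coded where a model would generate it. It only shows the general shape of planning a region to inspect, zooming in with code, and reading the result before answering.

```python
from dataclasses import dataclass

# Hypothetical stand-in for an image: a grid of 0/1 values (1 = marked cell).
Image = list[list[int]]


@dataclass
class Step:
    thought: str                      # Think: why this region is inspected
    box: tuple[int, int, int, int]    # (top, left, bottom, right), ends exclusive


def crop(image: Image, box: tuple[int, int, int, int]) -> Image:
    """Act: return the sub-grid inside box, a crude 'zoom' into one region."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def count_marked(image: Image) -> int:
    """Observe: count the marked cells in the (cropped) view."""
    return sum(sum(row) for row in image)


def think_act_observe(image: Image, plan: list[Step]) -> list[tuple[str, int]]:
    """Run each planned step and collect observations before answering."""
    observations = []
    for step in plan:
        view = crop(image, step.box)                              # Act
        observations.append((step.thought, count_marked(view)))   # Observe
    return observations


image = [
    [0, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]
# Assumption: in a real system the model would produce this plan itself.
plan = [
    Step("inspect top-left quadrant", (0, 0, 2, 2)),
    Step("inspect bottom-right quadrant", (2, 2, 4, 4)),
]
observations = think_act_observe(image, plan)
total = sum(count for _, count in observations)
print(observations, total)
```

The point of the sketch is that the final count comes from executed code over inspected regions instead of a single guess from the full image, which is the behavior the article credits for the reported quality gains.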