Overview
- Agentic Vision is now available to developers through the Gemini API in Google AI Studio and Vertex AI, and is beginning to roll out in the Gemini app.
- The system follows a Think–Act–Observe loop in which the model plans a task, runs Python to manipulate or analyze images, and reviews the transformed images before answering (see the API sketch after this list).
- Google reports a consistent 5–10% quality improvement across most vision benchmarks when code execution is enabled.
- PlanCheckSolver.com cites a roughly 5% accuracy gain after using iterative code-driven inspection on high‑resolution building plans.
- Demos include implicit zooming, programmatic annotation using a visual scratchpad, and visual arithmetic; the roadmap covers more implicit behaviors, web and reverse image search tools, and support for models beyond Flash.
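The Think–Act–Observe loop can be exercised today by enabling the code-execution tool on an image-bearing request to the Gemini API. The sketch below uses the google-genai Python SDK; the model name, image file, and prompt are placeholders rather than anything specified in the announcement, and the mix of parts returned (generated code, execution output, final text) may vary by model and release.

```python
# Minimal sketch: enable code execution on an image prompt via the google-genai SDK.
# Model name and image path are placeholders; adjust for your environment.
from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the environment

with open("floor_plan.png", "rb") as f:
    image_bytes = f.read()

response = client.models.generate_content(
    model="gemini-flash-latest",  # placeholder; use a model with Agentic Vision enabled
    contents=[
        types.Part.from_bytes(data=image_bytes, mime_type="image/png"),
        "Count the windows on the north elevation and show your work.",
    ],
    config=types.GenerateContentConfig(
        tools=[types.Tool(code_execution=types.ToolCodeExecution())],
    ),
)

# The response interleaves reasoning text, generated Python, and execution results.
for part in response.candidates[0].content.parts:
    if part.executable_code:        # Python the model chose to run (the "Act" step)
        print(part.executable_code.code)
    if part.code_execution_result:  # output the model observed (the "Observe" step)
        print(part.code_execution_result.output)
    if part.text:                   # intermediate reasoning or the final answer
        print(part.text)
```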
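For a sense of what the "Act" step looks like in the implicit-zooming demo, the fragment below is an illustrative stand-in for the kind of Python the model might write in its sandbox to enlarge a detail of a large plan before re-reading it; the crop coordinates and filenames are invented for the example and are not taken from the demos.

```python
# Illustrative only: the sort of crop-and-upscale code the model might generate
# internally when it decides to "zoom" into a region of a high-resolution image.
from PIL import Image

plan = Image.open("floor_plan.png")          # placeholder input image
region = plan.crop((1200, 800, 1800, 1400))  # (left, upper, right, lower) in pixels
zoomed = region.resize((region.width * 2, region.height * 2), Image.LANCZOS)
zoomed.save("zoomed_region.png")             # re-inspected in the next Observe step
```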