Overview
- Google made the Gemini 2.5 Computer Use model available today in public preview through the Gemini API in Google AI Studio and Vertex AI, with a live demo hosted by Browserbase.
- The computer_use tool operates in a loop: the client sends the user's request, a screenshot of the environment, and a history of recent actions; the model returns a UI action such as a click or keystroke; the client executes it, captures a new screenshot, and repeats until the task is complete (see the sketch after this list).
- Supported behaviors include navigating pages and moving back and forward in history, filling forms, typing, scrolling, drag‑and‑drop, hovering the cursor, keyboard combinations, and opening specific URLs.
- Google reports higher accuracy than leading alternatives on web and Android control benchmarks, along with lower latency, citing internal evaluations and testing by Browserbase.
- The release is optimized for browser control, shows promising Android performance, lacks desktop OS‑level control, and includes per‑step safety checks plus configurable confirmations for high‑risk actions.
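
To make the loop concrete, here is a minimal Python sketch of the client side of that cycle, including the per-step confirmation gate for high-risk actions. All helper names (`capture_screenshot`, `request_next_action`, `execute_action`, `confirm_with_user`) and the `HIGH_RISK` set are hypothetical placeholders standing in for your own Gemini API call and browser-automation layer, not functions from Google's SDK.

```python
"""Sketch of a client-side computer-use loop. Helpers below are hypothetical stubs."""
from dataclasses import dataclass, field

@dataclass
class Action:
    name: str                       # e.g. "click", "type", "scroll", "done"
    args: dict = field(default_factory=dict)

# Assumed examples of action types that should pause for user confirmation.
HIGH_RISK = {"purchase", "send_message"}

def capture_screenshot() -> bytes:
    """Placeholder: grab the current browser frame (e.g. via a Playwright screenshot)."""
    return b""

def request_next_action(goal: str, screenshot: bytes, history: list) -> Action:
    """Placeholder: send goal + screenshot + recent actions to the model, parse its UI action."""
    return Action("done")           # stub so the sketch runs end to end

def execute_action(action: Action) -> str:
    """Placeholder: perform the click/type/scroll in the browser and report the outcome."""
    return "ok"

def confirm_with_user(action: Action) -> bool:
    """Placeholder: surface a confirmation prompt before a high-risk step."""
    return False

def run_task(goal: str, max_steps: int = 25) -> None:
    """Drive the cycle: screenshot -> model -> action -> execute -> repeat."""
    history: list[tuple[Action, str]] = []
    for _ in range(max_steps):
        shot = capture_screenshot()
        action = request_next_action(goal, shot, history)
        if action.name == "done":               # model signals the task is complete
            break
        if action.name in HIGH_RISK and not confirm_with_user(action):
            break                               # stop unless the user approves the risky step
        history.append((action, execute_action(action)))

if __name__ == "__main__":
    run_task("Find the cheapest direct flight from SFO to JFK next Friday")
```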