Particle.news

Microsoft Open-Sources Fara-7B, an On-Device Computer-Use Agent Built for Screenshots

The research preview interprets UIs from pixels to issue grounded actions for private, low-latency automation on consumer hardware.

Overview

  • Microsoft released Fara-7B as an experimental, open-weight model under an MIT license, available on Microsoft Foundry and Hugging Face with optimizations for Copilot+ PCs.
  • The 7B-parameter agent operates from screenshots to generate mouse and keyboard actions by predicting pixel coordinates, removing reliance on accessibility trees at inference time.
  • Training used the FaraGen pipeline to create 145,603 verified browser trajectories totaling 1,010,797 steps across 70,117 domains, which were then used for supervised fine-tuning of a Qwen2.5-VL-7B–based multimodal decoder with a 128k-token context.
  • On live web benchmarks, Fara-7B scored 73.5% on WebVoyager, 34.1% on Online-Mind2Web, 26.2% on DeepShop, and 38.4% on WebTailBench, outperforming a 7B baseline and comparing favorably to larger systems.
  • Microsoft reports efficiency gains alongside safeguards: it estimates about $0.025 per WebVoyager task versus roughly $0.30 for larger SoM agents, and describes sandboxed operation, auditable logs, Critical Point consent prompts, and high refusal rates for harmful tasks.
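
To put the training and cost figures in the overview in proportion, the short sketch below derives a few averages from them. The constants are the numbers quoted above; the derived ratios are back-of-the-envelope arithmetic, not values Microsoft reported, and the `ratio` helper is a hypothetical name used only for illustration.

```python
# Figures as quoted in the overview above.
TRAJECTORIES = 145_603   # verified browser trajectories
STEPS = 1_010_797        # total steps across all trajectories
DOMAINS = 70_117         # distinct domains covered
COST_FARA_USD = 0.025    # reported estimate per WebVoyager task
COST_SOM_USD = 0.30      # reported rough estimate for larger SoM agents


def ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, rejecting a zero denominator."""
    if denominator == 0:
        raise ValueError("denominator must be non-zero")
    return numerator / denominator


# Derived averages (assumption: simple means, which hide the real spread).
steps_per_trajectory = ratio(STEPS, TRAJECTORIES)
trajectories_per_domain = ratio(TRAJECTORIES, DOMAINS)
cost_multiple = ratio(COST_SOM_USD, COST_FARA_USD)

if __name__ == "__main__":
    print(f"Average steps per trajectory: {steps_per_trajectory:.2f}")
    print(f"Average trajectories per domain: {trajectories_per_domain:.2f}")
    print(f"Larger SoM agents cost about {cost_multiple:.0f}x more per task")
```

On these numbers, a trajectory averages just under seven steps, each domain contributes about two trajectories, and the larger agents cost roughly twelve times as much per WebVoyager task. These are simple means, so they say nothing about how step counts or per-task costs are distributed.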