Overview
- Microsoft released Fara-7B as an experimental, open-weight model under an MIT license, available on Microsoft Foundry and Hugging Face with optimizations for Copilot+ PCs.
- The 7B-parameter agent takes screenshots as input and generates mouse and keyboard actions by predicting pixel coordinates, so it does not rely on accessibility trees at inference time.
- Training used the FaraGen pipeline to create 145,603 verified browser trajectories comprising 1,010,797 steps across 70,117 domains; these were then used for supervised fine-tuning of a multimodal decoder based on Qwen2.5-VL-7B with a 128k-token context.
- On live web benchmarks, Fara-7B scored 73.5% on WebVoyager, 34.1% on Online-Mind2Web, 26.2% on DeepShop, and 38.4% on WebTailBench, outperforming a 7B baseline and comparing favorably to larger systems.
- Microsoft reports efficiency gains and built-in safeguards: it estimates about $0.025 per WebVoyager task versus roughly $0.30 for larger SoM agents, and it describes sandboxed operation, auditable logs, Critical Point consent prompts, and high refusal rates for harmful tasks.
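
To put the dataset and cost figures above in proportion, the following minimal sketch derives a few ratios from them. The constants are copied from the bullets; the function and key names are hypothetical, chosen only for illustration, and the cost inputs are the approximate estimates quoted above, not exact prices.

```python
# Figures quoted in the overview bullets above.
TRAJECTORIES = 145_603      # verified browser trajectories
STEPS = 1_010_797           # total steps across all trajectories
DOMAINS = 70_117            # distinct domains covered
COST_FARA_USD = 0.025       # approximate cost per WebVoyager task (Fara-7B)
COST_SOM_USD = 0.30         # approximate cost per task (larger SoM agents)


def derived_figures() -> dict[str, float]:
    """Return simple ratios implied by the quoted numbers (illustrative only)."""
    return {
        "steps_per_trajectory": STEPS / TRAJECTORIES,
        "trajectories_per_domain": TRAJECTORIES / DOMAINS,
        "cost_ratio": COST_SOM_USD / COST_FARA_USD,
    }


if __name__ == "__main__":
    for name, value in derived_figures().items():
        print(f"{name}: {value:.2f}")
```

Under these inputs, a trajectory averages about 6.9 steps, each domain contributes about 2.1 trajectories on average, and the quoted per-task cost is roughly one twelfth of the SoM estimate.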