Overview
- Ai2 published SERA-32B and SERA-8B plus the codebase, generated data, and end-to-end training recipes for reproducing and specializing agents.
- On SWE-bench Verified, Ai2 reports that SERA-32B solves 55% of tasks and SERA-8B reaches 29.4%, outperforming similarly sized open models.
- The method centers on soft-verified generation and a taxonomy of 51 bug patterns to create diverse, workflow-faithful synthetic trajectories (a sketch of the soft-verification idea follows this list).
- Training was conducted on two Nvidia H100 GPUs; Ai2 estimates about $400 to replicate the reported results and just over $2,000 for best performance, while independent reporting puts the cost of rivaling top industry models at up to roughly $12,000.
- Ai2 says small fine-tuned models specialized on private repositories can match or exceed larger teacher agents, with community replication and external benchmarking as the next step.
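
The announcement does not detail the soft-verification procedure, so the sketch below is only an illustration of the general idea under assumed names: candidate bug-fix trajectories are scored by a lenient verifier (here a stubbed `score_trajectory` judge) and kept if they clear a threshold, instead of requiring a strict unit-test pass. `Trajectory`, `score_trajectory`, and `soft_verify` are hypothetical, not Ai2's actual API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Trajectory:
    """One synthetic bug-fix attempt: injected bug pattern, agent steps, final patch."""
    bug_pattern: str   # one of the taxonomy's bug patterns (hypothetical field)
    steps: List[str]   # tool calls / edits taken by the generating agent
    patch: str         # final diff produced


def score_trajectory(traj: Trajectory) -> float:
    """Placeholder lenient verifier.

    In practice this could be an LLM judge, partial test signals, or a
    heuristic patch check; here it is stubbed out for illustration.
    """
    return 0.9 if traj.patch.strip() else 0.0


def soft_verify(candidates: List[Trajectory],
                scorer: Callable[[Trajectory], float] = score_trajectory,
                threshold: float = 0.7) -> List[Trajectory]:
    """Keep trajectories the lenient verifier scores above a threshold,
    rather than demanding a binary pass/fail verification."""
    return [t for t in candidates if scorer(t) >= threshold]


if __name__ == "__main__":
    demo = [
        Trajectory("off-by-one", ["open file", "edit loop bound"], "--- a/x.py\n+++ b/x.py\n..."),
        Trajectory("missing-null-check", ["open file"], ""),  # empty patch, filtered out
    ]
    kept = soft_verify(demo)
    print(f"kept {len(kept)} of {len(demo)} trajectories")
```

The point of the soft criterion is yield: a strict verifier discards many useful trajectories, while a lenient score preserves diverse, workflow-faithful training data at the cost of some noise.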