Overview
- DeepReinforce announced Ornith‑1.0 on Monday, June 29, 2026, publishing four open‑source variants under an MIT license that range from a 9 billion‑parameter dense model to a 397 billion‑parameter mixture‑of‑experts flagship.
- Ornith’s central innovation is treating the task scaffold (the harness that orders tool calls, error handling, and verification) as a learnable object, so that the model jointly optimizes how it plans its work and how it writes code.
- The lab reports strong benchmark results: Ornith‑1.0‑397B scores 77.5 on Terminal‑Bench 2.1 and 82.4 on SWE‑Bench Verified, and the compact Ornith‑1.0‑9B posts 69.4 on SWE‑Bench Verified. These claims await independent reproduction.
- To limit reward‑gaming through self‑authored scaffolds, DeepReinforce describes a three‑layer defense: it fixes the outer environment and tool surface, runs a deterministic monitor that blocks out‑of‑bounds actions, and gives a frozen LLM judge veto power over verifier outputs.
- Ornith trains long agent rollouts with pipeline‑RL, using token‑level staleness weighting and a weighted GRPO loss. These engineering choices could let teams run autonomous developer agents both as compact models on edge devices and as large data‑center models, while also raising new verification and safety‑review needs.
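This is a minimal sketch of a learnable scaffold. It assumes the model emits the scaffold as plain text, one step per line, and that a small interpreter then executes it. The step vocabulary (`tool`, `verify`, `on_error`) and the names `Step`, `parse_scaffold`, `run_scaffold`, `TOOLS`, and `CHECKS` are hypothetical illustrations, not DeepReinforce's format. Because the ordering of tool calls, error handling, and verification lives in data the policy writes, a reward computed from the run's outcome would credit the scaffold choices and the code together.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

@dataclass
class Step:
    kind: str      # hypothetical vocabulary: "tool", "verify", or "on_error"
    name: str      # which tool or check to invoke
    arg: str = ""

def parse_scaffold(text: str) -> List[Step]:
    """Parse a model-emitted scaffold: one 'kind name [arg]' step per line."""
    steps = []
    for line in text.strip().splitlines():
        kind, name, *rest = line.split(maxsplit=2)
        steps.append(Step(kind, name, rest[0] if rest else ""))
    return steps

def run_scaffold(steps: List[Step],
                 tools: Dict[str, Callable[[str], str]],
                 checks: Dict[str, Callable[[List[str]], bool]]) -> Tuple[bool, List[str]]:
    """Execute steps in the order the scaffold chose; return (verified, transcript)."""
    transcript: List[str] = []
    handler: Optional[Step] = None
    for step in steps:
        if step.kind == "on_error":
            handler = step  # registered for any later tool failure
        elif step.kind == "tool":
            try:
                transcript.append(tools[step.name](step.arg))
            except RuntimeError as exc:
                transcript.append(f"error: {exc}")
                if handler is not None:
                    transcript.append(tools[handler.name](handler.arg))
        elif step.kind == "verify":
            if not checks[step.name](transcript):
                return False, transcript
        else:
            raise ValueError(f"unknown step kind: {step.kind!r}")
    return True, transcript

def run_tests(arg: str) -> str:
    # Toy stand-in for a test runner: only the argument "ok" passes.
    if arg != "ok":
        raise RuntimeError("tests failed")
    return "tests passed"

TOOLS = {
    "write": lambda path: f"wrote {path}",
    "test": run_tests,
    "revert": lambda _: "reverted",
}
CHECKS = {"tests_passed": lambda transcript: "tests passed" in transcript}

SCAFFOLD = """
on_error revert
tool write patch.py
tool test ok
verify tests_passed
"""

print(run_scaffold(parse_scaffold(SCAFFOLD), TOOLS, CHECKS))
```

Running the file prints `(True, ['wrote patch.py', 'tests passed'])`. Changing `tool test ok` to `tool test bad` triggers the registered `revert` handler and fails verification, so the same interpreter scores different scaffolds differently.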
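The three-layer defense against reward gaming can be sketched as a scoring function wrapped around one episode. This is a hedged illustration with several assumptions. The allowlisted tools, the `/workspace/` boundary, the binary reward, and the choice to end an episode on the first blocked action are all hypothetical. The names `Action`, `monitor`, and `score_episode` are also hypothetical, and `judge` is a plain callable standing in for the frozen LLM judge.

```python
import posixpath
from dataclasses import dataclass
from typing import Callable, List

# Layer 1 (assumed values): the outer environment fixes the tool surface.
# A model-authored scaffold can order these tools but cannot add new ones.
ALLOWED_TOOLS = frozenset({"read_file", "write_file", "run_tests"})
WORKSPACE = "/workspace/"

@dataclass(frozen=True)
class Action:
    tool: str
    path: str

def monitor(action: Action) -> bool:
    """Layer 2: a deterministic, rule-based check with no model in the loop."""
    resolved = posixpath.normpath(action.path)  # collapses "..", exposing escapes
    return action.tool in ALLOWED_TOOLS and resolved.startswith(WORKSPACE)

def score_episode(actions: List[Action],
                  execute: Callable[[Action], None],
                  verifier: Callable[[List[Action]], bool],
                  judge: Callable[[List[Action]], bool]) -> float:
    executed: List[Action] = []
    for action in actions:
        if not monitor(action):
            return 0.0  # assumed policy: block the action and end with zero reward
        execute(action)
        executed.append(action)
    if not verifier(executed):
        return 0.0
    # Layer 3: the frozen judge is consulted only after the verifier passes,
    # so it can veto a reward but never grant one on its own.
    return 1.0 if judge(executed) else 0.0
```

Ordering matters in this sketch. Because the judge runs only after the verifier passes, it acts as a veto rather than as an independent source of reward. Because the monitor runs before `execute`, an out-of-bounds action such as a write to `/workspace/../etc/passwd` is never carried out.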
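In a pipelined RL setup, generation overlaps with training, so the tokens of a single long rollout can come from several policy versions. The pure-Python sketch below shows one way a staleness-weighted, GRPO-style loss could handle that for one group of rollouts. Rewards are normalized within the group, which gives the group-relative advantage that defines GRPO. Each token's PPO-style clipped surrogate is then weighted by how many policy updates behind its sampling policy was. Several details are assumptions rather than DeepReinforce's published loss: the exponential form `decay ** staleness`, the default values, and the omission of GRPO's KL penalty. `Rollout` and the function names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List

@dataclass
class Rollout:
    reward: float
    logp_new: List[float]  # per-token log-probs under the current policy
    logp_old: List[float]  # per-token log-probs under the policy that sampled them
    versions: List[int]    # policy version that generated each token

def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantage: each reward normalized by its group's mean and std."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]

def staleness_weight(version: int, current_version: int, decay: float = 0.5) -> float:
    """Assumed form: with decay=0.5, halve a token's weight per update it lags."""
    return decay ** (current_version - version)

def weighted_grpo_loss(group: List[Rollout], current_version: int,
                       clip_eps: float = 0.2, decay: float = 0.5) -> float:
    advs = group_advantages([r.reward for r in group])
    total, weight_sum = 0.0, 0.0
    for rollout, adv in zip(group, advs):
        for new, old, ver in zip(rollout.logp_new, rollout.logp_old, rollout.versions):
            ratio = math.exp(new - old)
            clipped = min(max(ratio, 1 - clip_eps), 1 + clip_eps)
            surrogate = min(ratio * adv, clipped * adv)  # PPO-style clipped objective
            w = staleness_weight(ver, current_version, decay)
            total += w * surrogate
            weight_sum += w
    return -total / weight_sum  # minimize the negative weighted mean surrogate

# The rewarded rollout's tokens are two versions stale (weight 0.25 each).
group = [Rollout(1.0, [0.0] * 4, [0.0] * 4, [1] * 4),
         Rollout(0.0, [0.0] * 4, [0.0] * 4, [3] * 4)]
print(round(weighted_grpo_loss(group, current_version=3), 4))  # → 0.6
```

If every token came from the current policy, the two equal-and-opposite advantages would cancel and the loss would be about 0. Down-weighting the stale, rewarded tokens lets the fresh rollout's negative advantage dominate. That is the intended effect of trusting off-policy tokens less.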