Particle.news

DeepReinforce Unveils Ornith‑1.0, Open‑Source Models That Learn Their Own Coding Scaffolds

The lab says the technique lets models co‑learn execution strategies with solutions to run multi‑step coding workflows autonomously.

Overview

  • DeepReinforce announced Ornith‑1.0 on Monday, June 29, 2026, publishing four open‑source variants under an MIT license that range from a 9‑billion‑parameter dense model to a 397‑billion‑parameter mixture‑of‑experts flagship.
  • Ornith’s central innovation is treating the task scaffold—the harness that orders tool calls, error handling, and verification—as a learnable object so the model jointly optimizes how it plans work and how it writes code.
  • The lab reports that Ornith‑1.0‑397B scores 77.5 on Terminal‑Bench 2.1 and 82.4 on SWE‑Bench Verified, and that the compact Ornith‑1.0‑9B posts 69.4 on SWE‑Bench Verified. These claims await independent reproduction.
  • To limit reward‑gaming from self‑authored scaffolds, DeepReinforce describes a three‑layer defense that fixes the outer environment and tool surface, runs a deterministic monitor to block out‑of‑bounds actions, and places a frozen LLM judge as a veto over verifier outputs.
  • Ornith uses pipeline‑RL with token‑level staleness weighting and a weighted GRPO loss to train long agent rollouts. These engineering choices could let teams run autonomous developer agents across deployments ranging from edge devices to large data‑center models, while raising new verification and safety review needs.
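
The three‑layer defense described above can be pictured as a chain of gates in front of the reward signal. The sketch below is illustrative only and is not DeepReinforce's code: the tool names, the `Action` type, the path rule in `monitor_allows`, and the `judge_approves` callback (a stand‑in for the frozen LLM judge) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Layer 1 (assumed form): a fixed tool surface the model's own scaffold cannot extend.
ALLOWED_TOOLS = frozenset({"read_file", "write_file", "run_tests"})


@dataclass(frozen=True)
class Action:
    tool: str  # name of the tool the agent wants to call
    arg: str   # single argument, e.g. a relative path


def monitor_allows(action: Action) -> bool:
    """Layer 2: deterministic check that an action stays in bounds.

    The rule here is hypothetical: only allowed tools, and only relative
    paths that do not climb out of the working directory.
    """
    if action.tool not in ALLOWED_TOOLS:
        return False
    if action.arg.startswith("/"):
        return False
    return ".." not in action.arg.split("/")


def gated_reward(
    actions: Sequence[Action],
    verifier_passed: bool,
    judge_approves: Callable[[Sequence[Action]], bool],
) -> float:
    """Grant reward only if the monitor, the verifier, and the judge all agree."""
    if not all(monitor_allows(a) for a in actions):
        return 0.0  # out-of-bounds action blocks the reward outright
    if not verifier_passed:
        return 0.0
    # Layer 3: the judge can veto a passing verifier result but cannot grant reward alone.
    return 1.0 if judge_approves(actions) else 0.0
```

In this arrangement a self‑authored scaffold can reorder or add steps freely, yet a passing verifier result still earns nothing if any step left the fixed tool surface or the judge rejects the trajectory.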
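
The article does not give the form of the staleness weighting or the weighted loss, so the following is only a rough sketch of how such a loss might look. It assumes that each token carries a lag (the number of policy updates since it was sampled), that the weight decays geometrically with that lag, and that advantages are normalized within a group of rollouts as in GRPO. The function names and the `decay` parameter are hypothetical, and the clipping and KL terms usually paired with GRPO are omitted for brevity.

```python
import math
from typing import List, Sequence


def group_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantage: each reward normalized against its own group."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]


def staleness_weight(lag: int, decay: float = 0.5) -> float:
    """Assumed weighting: tokens sampled `lag` policy updates ago count less."""
    return decay ** lag


def weighted_grpo_loss(
    logprobs: Sequence[Sequence[float]],  # logprobs[i][t]: current-policy log-prob of token t in rollout i
    lags: Sequence[Sequence[int]],        # lags[i][t]: policy updates since that token was sampled
    rewards: Sequence[float],             # one scalar reward per rollout in the group
    decay: float = 0.5,
) -> float:
    """Staleness-weighted policy-gradient surrogate, averaged over token weights."""
    advantages = group_advantages(rewards)
    weighted_sum = 0.0
    weight_total = 0.0
    for rollout_logprobs, rollout_lags, adv in zip(logprobs, lags, advantages):
        for logprob, lag in zip(rollout_logprobs, rollout_lags):
            w = staleness_weight(lag, decay)
            weighted_sum += -w * adv * logprob
            weight_total += w
    return weighted_sum / weight_total
```

Weighting per token rather than per rollout matters in a pipelined setup, where model weights may be updated while a long rollout is still being generated, so early and late tokens of the same trajectory can differ in how stale they are.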