Overview
- The Abu Dhabi–based Technology Innovation Institute published Falcon H1R 7B on Hugging Face with full and quantized checkpoints plus a technical report under the Falcon TII license; a minimal loading sketch follows this list.
- TII reports that the 7‑billion‑parameter model matches or surpasses larger systems such as Phi 4 Reasoning Plus 14B, Qwen3‑32B, and Nemotron H 47B, with an 88.1% score on AIME‑24 cited as one example.
- The model builds on a hybrid Transformer–Mamba backbone and a two‑stage regimen of curated supervised fine‑tuning on long‑form reasoning traces followed by reinforcement learning with GRPO (Group Relative Policy Optimization); a sketch of GRPO's group‑relative advantage appears after this list.
- A test‑time scaling method called DeepConf filters out low‑confidence reasoning chains, boosting accuracy while cutting the number of tokens generated and positioning the model on a favorable cost‑performance frontier; a simplified sketch of the filtering step also follows this list.
- The developers cite inference throughput of roughly 1,000 tokens per second per GPU at batch size 32 and about 1,500 at batch size 64, nearly double that of Qwen3‑8B in comparable settings.
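
For orientation, here is a minimal sketch of pulling the checkpoint from Hugging Face with `transformers`. The repository id `tiiuae/Falcon-H1R-7B` and the example prompt are assumptions made for illustration, not details confirmed by the release notes.

```python
# Minimal loading sketch, assuming the checkpoint follows the usual
# Hugging Face layout; the repo id below is a guess, not confirmed.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "tiiuae/Falcon-H1R-7B"  # hypothetical repository id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype="auto",   # use the dtype stored in the checkpoint
    device_map="auto",    # spread layers across available devices
)

prompt = "Prove that the sum of two even integers is even."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```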
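For readers unfamiliar with GRPO, the sketch below shows only the group‑relative advantage computation at its core: several completions are sampled per prompt, scored, and each score is normalized against the group's mean and spread. It is not TII's training code, and the binary verified‑answer reward is an assumption for the example.

```python
# Sketch of the group-relative advantage used in GRPO
# (Group Relative Policy Optimization). Reward values are placeholders.
from statistics import mean, stdev

def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize per-completion rewards within one prompt's sampled group."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: 4 completions for one math prompt, rewarded 1.0 if the
# final answer was verified correct and 0.0 otherwise (an assumed setup).
rewards = [1.0, 0.0, 1.0, 0.0]
print(group_relative_advantages(rewards))  # correct answers get positive advantage
```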
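The DeepConf idea can likewise be illustrated with a toy filter: sample several reasoning chains, score each by a confidence proxy derived from token probabilities, drop the least confident, and vote among the survivors. The mean log‑probability scoring and simple majority vote below are simplifications for illustration, not the exact published procedure or TII's implementation.

```python
# DeepConf-style test-time filtering sketch: keep only the most confident
# reasoning chains, then majority-vote over their final answers.
from collections import Counter

def deepconf_vote(chains: list[dict], keep_fraction: float = 0.5) -> str:
    """chains: [{'answer': str, 'token_logprobs': list[float]}, ...]"""
    # Confidence proxy: average token log-probability of the chain.
    scored = [
        (sum(c["token_logprobs"]) / len(c["token_logprobs"]), c["answer"])
        for c in chains
    ]
    scored.sort(reverse=True)                      # most confident first
    kept = scored[: max(1, int(len(scored) * keep_fraction))]
    votes = Counter(answer for _, answer in kept)  # vote only among survivors
    return votes.most_common(1)[0][0]

# Illustrative log-probs: the low-confidence chain answering "7" is
# filtered out before the vote.
chains = [
    {"answer": "12", "token_logprobs": [-0.1, -0.2, -0.1]},
    {"answer": "12", "token_logprobs": [-0.3, -0.2, -0.4]},
    {"answer": "7",  "token_logprobs": [-2.1, -1.8, -2.5]},
    {"answer": "12", "token_logprobs": [-0.2, -0.3, -0.2]},
]
print(deepconf_vote(chains))  # -> "12"
```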