Particle.news

Thinking Machines Unveils Real‑Time Interaction Model That Listens While It Speaks

The research preview challenges voice systems that layer latency tricks on turn‑based chatbots.

Overview

  • Thinking Machines announced a research preview of “interaction models” that keep listening, seeing, and speaking in one continuous flow across audio, video, and text.
  • The design uses 200‑millisecond micro‑turns for rapid back‑and‑forth and hands slower planning and tool use to a separate background model.
  • The lab introduced TML‑Interaction‑Small, a mixture‑of‑experts system with 276 billion total parameters, of which 12 billion are active per step. It reported large gains on new timing and temporal tests versus OpenAI’s GPT Realtime‑2 minimal.
  • Reported results include 64.7% accuracy on TimeSpeak, a time‑aware speech test, and 35.4% on temporal action counting; the article cites no independent verification of either figure.
  • On the serving side, changes include streaming 200‑millisecond chunks via SGLang and a “bitwise” match between training and inference for deterministic outputs. The lab says longer sessions and scale remain open work for 2026.
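
To make two of the figures above concrete, here is a minimal Python sketch. It is not the lab's code: the 16 kHz sample rate and the helper names `micro_turns` and `active_fraction` are assumptions chosen for illustration. It shows how an audio stream might be cut into 200‑millisecond chunks, and what share of the 276 billion parameters the 12 billion active ones represent.

```python
from typing import Iterator, List, Sequence

CHUNK_MS = 200           # micro-turn length reported in the article
SAMPLE_RATE_HZ = 16_000  # assumption: common speech sample rate, not stated in the article


def micro_turns(
    samples: Sequence[int],
    sample_rate_hz: int = SAMPLE_RATE_HZ,
    chunk_ms: int = CHUNK_MS,
) -> Iterator[List[int]]:
    """Yield consecutive chunks of `chunk_ms` of audio (hypothetical helper).

    The final chunk may be shorter if the stream does not divide evenly.
    """
    chunk_len = sample_rate_hz * chunk_ms // 1000
    if chunk_len <= 0:
        raise ValueError("chunk length must be at least one sample")
    for start in range(0, len(samples), chunk_len):
        yield list(samples[start:start + chunk_len])


def active_fraction(active_params: float, total_params: float) -> float:
    """Share of a mixture-of-experts model's parameters used per step."""
    return active_params / total_params


if __name__ == "__main__":
    one_second = [0] * SAMPLE_RATE_HZ  # one second of silence as stand-in audio
    chunks = list(micro_turns(one_second))
    print(len(chunks), len(chunks[0]))             # 5 3200
    print(f"{active_fraction(12e9, 276e9):.1%}")   # 4.3%
```

Under these assumptions, one second of audio yields five micro‑turns of 3,200 samples each, and roughly 4.3% of the model's parameters are active on any given step.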