Particle.news

Thinking Machines Introduces Interaction Models for Real-Time, Multimodal AI

Limited trials are set to begin in the coming months, with the lab's claims still unverified.

Overview

  • Thinking Machines, which unveiled the approach Monday, published a technical post detailing a research model called TML-Interaction-Small and released internal benchmarks.
  • The system builds interactivity into the network itself, using 200-millisecond micro-turns, encoder-free early fusion of audio and video, and a split design that pairs a fast interaction model with a slower background model for deeper tasks (see the first sketch after this list).
  • The lab claims a response time of roughly 0.40 seconds and large accuracy gains on time-aware tests, including TimeSpeak and temporal action counting, versus OpenAI’s GPT Realtime-2.
  • To speed adoption, the team upstreamed streaming-session support to the SGLang serving stack so clients can send 200-millisecond chunks while the server maintains a persistent GPU sequence (see the second sketch after this list).
  • The company plans a limited research preview in the coming months and a wider release later in 2026, and reporters note the results need independent replication and real-world user testing.
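
To make the split design concrete, here is a minimal conceptual sketch of how a fast interaction model could respond every 200-millisecond micro-turn while a slower background model works on the accumulated context. This is an illustration based only on the description above, not Thinking Machines' code; the function names, latencies, and inputs are assumptions.

```python
import asyncio

CHUNK_MS = 200  # micro-turn length described in the post

async def fast_interaction_model(frame, context):
    """Hypothetical low-latency model: consumes one 200 ms audio/video
    micro-turn and may return a short reply (or None to stay silent)."""
    await asyncio.sleep(0.05)          # stand-in for a fast forward pass
    return f"ack: {frame}" if frame.endswith("?") else None

async def background_model(context):
    """Hypothetical slower model that works over the accumulated context
    for deeper tasks (long-form answers, tool use, summaries)."""
    await asyncio.sleep(1.0)           # stand-in for a heavier forward pass
    return f"deep result over {len(context)} micro-turns"

async def interaction_loop(frames):
    context, background_task = [], None
    for frame in frames:               # one iteration per 200 ms micro-turn
        context.append(frame)
        reply = await fast_interaction_model(frame, context)
        if reply:
            print("fast:", reply)
        # Start or poll the slower model without blocking the micro-turn cadence.
        if background_task is None:
            background_task = asyncio.create_task(background_model(list(context)))
        elif background_task.done():
            print("slow:", background_task.result())
            background_task = None
        await asyncio.sleep(CHUNK_MS / 1000)

asyncio.run(interaction_loop(["hi", "what time is it?", "thanks"]))
```

The streaming-session workflow can be pictured the same way: the client slices captured audio into 200-millisecond chunks and pushes them over one long-lived session while the server keeps appending to a single GPU-resident sequence. The sketch below only shows the client-side chunking and pacing; the sample rate, `send_chunk` stand-in, and session naming are assumptions, not the actual SGLang streaming API.

```python
import time

SAMPLE_RATE = 16_000          # assumed 16 kHz mono PCM; not specified in the post
BYTES_PER_SAMPLE = 2          # 16-bit samples
CHUNK_MS = 200                # chunk length the article describes
CHUNK_BYTES = SAMPLE_RATE * BYTES_PER_SAMPLE * CHUNK_MS // 1000  # 6,400 bytes

def send_chunk(session_id: str, chunk: bytes) -> None:
    """Stand-in for the real streaming call; a production client would push
    each chunk over a persistent connection so the server can keep extending
    one GPU-resident sequence for the session."""
    print(f"session {session_id}: sent {len(chunk)} bytes")

def stream_audio(session_id: str, pcm: bytes) -> None:
    """Slice captured audio into 200 ms chunks and send them in real time."""
    for start in range(0, len(pcm), CHUNK_BYTES):
        send_chunk(session_id, pcm[start:start + CHUNK_BYTES])
        time.sleep(CHUNK_MS / 1000)   # pace uploads at the capture rate

# Two seconds of silence as placeholder input.
stream_audio("demo-session", b"\x00" * (SAMPLE_RATE * BYTES_PER_SAMPLE * 2))
```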
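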