Overview
- Thinking Machines, which unveiled the approach Monday, published a technical post detailing a research model called TML-Interaction-Small and released internal benchmarks.
- The system trains interactivity into the network with 200-millisecond micro-turns, encoder-free early fusion of audio and video, and a split design that pairs a fast interaction model with a slower background model for deeper tasks.
- The lab claims a roughly 0.40-second response time and large accuracy gains on time-aware benchmarks, including TimeSpeak and temporal action counting, over OpenAI’s GPT Realtime-2.
- To speed adoption, the team upstreamed streaming-session support to the SGLang serving stack so clients can send 200-millisecond chunks while the server maintains a persistent GPU sequence.
- The company plans a limited research preview in the coming months and a wider release later in 2026, and reporters note the results need independent replication and real-world user testing.
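To make the micro-turn idea concrete, the client-side chunking described above can be sketched as follows. This is a minimal illustration assuming 16 kHz mono audio; the names (`chunk_stream`, `SAMPLE_RATE`, `CHUNK_MS`) are hypothetical and not taken from Thinking Machines' or SGLang's actual APIs:

```python
# Hypothetical sketch: split an audio sample stream into 200 ms micro-turn
# chunks, as a client might before sending them to a streaming server.
# Constants and function names are illustrative assumptions, not a real API.

SAMPLE_RATE = 16_000                             # samples/second (assumed)
CHUNK_MS = 200                                   # micro-turn size from the post
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_MS // 1000   # 3200 samples per 200 ms chunk

def chunk_stream(samples):
    """Yield fixed-size 200 ms chunks; zero-pad the final partial chunk."""
    for start in range(0, len(samples), CHUNK_SAMPLES):
        chunk = samples[start:start + CHUNK_SAMPLES]
        if len(chunk) < CHUNK_SAMPLES:
            chunk = chunk + [0] * (CHUNK_SAMPLES - len(chunk))
        yield chunk

if __name__ == "__main__":
    one_second = [0] * SAMPLE_RATE               # 1 s of dummy samples
    chunks = list(chunk_stream(one_second))
    print(len(chunks), len(chunks[0]))           # 5 chunks of 3200 samples each
```

In a real deployment each yielded chunk would be sent over a persistent session so the server can keep extending one GPU-resident sequence rather than re-encoding history per request.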