Overview
- Thinking Machines, which unveiled the approach Monday, published a technical post detailing a research model called TML-Interaction-Small and released internal benchmarks.
- The system trains interactivity into the network with 200-millisecond micro-turns, encoder-free early fusion of audio and video, and a split design that pairs a fast interaction model with a slower background model for deeper tasks.
- The lab claims a roughly 0.40-second response time and large accuracy gains on time-aware benchmarks, including TimeSpeak and temporal action counting, over OpenAI’s GPT Realtime-2.
- To speed adoption, the team upstreamed streaming-session support to the SGLang serving stack so clients can send 200-millisecond chunks while the server maintains a persistent GPU sequence.
- The company plans a limited research preview in the coming months and a wider release later in 2026, and reporters note the results need independent replication and real-world user testing.
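To make the micro-turn idea concrete, the client-side chunking described above can be sketched as follows. This is a minimal illustration assuming 16 kHz mono audio; the names (`chunk_stream`, `SAMPLE_RATE`, `CHUNK_MS`) are hypothetical and not taken from Thinking Machines' or SGLang's actual APIs:

```python
# Hypothetical sketch: split an audio sample stream into 200 ms micro-turn
# chunks, as a client might before sending them to a streaming server.
# Constants and function names are illustrative assumptions, not a real API.

SAMPLE_RATE = 16_000                             # samples/second (assumed)
CHUNK_MS = 200                                   # micro-turn size from the post
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_MS // 1000   # 3200 samples per 200 ms chunk

def chunk_stream(samples):
    """Yield fixed-size 200 ms chunks; zero-pad the final partial chunk."""
    for start in range(0, len(samples), CHUNK_SAMPLES):
        chunk = samples[start:start + CHUNK_SAMPLES]
        if len(chunk) < CHUNK_SAMPLES:
            chunk = chunk + [0] * (CHUNK_SAMPLES - len(chunk))
        yield chunk

if __name__ == "__main__":
    one_second = [0] * SAMPLE_RATE               # 1 s of dummy samples
    chunks = list(chunk_stream(one_second))
    print(len(chunks), len(chunks[0]))           # 5 chunks of 3200 samples each
```

In a real deployment each yielded chunk would be sent over a persistent session so the server can keep extending one GPU-resident sequence rather than re-encoding history per request.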