Thinking Machines unveils interaction models that respond in 0.40 seconds for real-time AI chat

Reviewed by Nidhi Govil


Thinking Machines, the AI startup founded by former OpenAI CTO Mira Murati, announced interaction models that process input and generate responses simultaneously. The TML-Interaction-Small model responds in 0.40 seconds—matching natural human conversation speed—and uses full-duplex architecture to listen, see, and talk in real time. A limited research preview arrives in coming months.

Thinking Machines Introduces New Class of AI That Processes While It Talks

Thinking Machines, the AI startup founded by former OpenAI CTO Mira Murati, announced on Monday a research preview of interaction models, a new approach designed to transform how humans collaborate with AI systems [1]. Unlike traditional models that operate in turn-based chat sequences, where users must finish speaking before the AI begins processing, these interaction models can process input and generate responses simultaneously, creating what the company describes as real-time AI conversation [2]. The technical capability enabling this is full-duplex architecture, which allows the system to listen, see, and talk at the same time, much like a phone call rather than an email exchange [3].

Source: SiliconANGLE


The company's first model, TML-Interaction-Small, is a 276-billion-parameter Mixture-of-Experts system with 12 billion active parameters that achieves a turn-taking latency of 0.40 seconds, closely matching natural human conversation speed [4]. On this measure it outperforms comparable systems from other major AI labs, with Gemini-3.1-flash-live clocking in at 0.57 seconds and GPT-realtime-2.0 at 1.18 seconds [3]. On FD-bench, a benchmark specifically designed to measure interaction quality, TML-Interaction-Small scored 77.8, roughly 1.7 times the 46.8 scored by GPT-realtime-2.0 [3].
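To put the reported numbers side by side, the short sketch below computes each system's latency relative to TML-Interaction-Small, along with the FD-bench score ratio. The figures are the ones quoted above; the variable and function names are illustrative assumptions, not anything published by Thinking Machines.

```python
from typing import Dict

# Turn-taking latencies in seconds, as reported in the article.
latency_s: Dict[str, float] = {
    "TML-Interaction-Small": 0.40,
    "Gemini-3.1-flash-live": 0.57,
    "GPT-realtime-2.0": 1.18,
}

# FD-bench scores, as reported in the article.
fd_bench: Dict[str, float] = {
    "TML-Interaction-Small": 77.8,
    "GPT-realtime-2.0": 46.8,
}


def relative_to(baseline: str, table: Dict[str, float]) -> Dict[str, float]:
    """Return each entry divided by the baseline entry (baseline maps to 1.0)."""
    base = table[baseline]
    return {name: value / base for name, value in table.items()}


ratios = relative_to("TML-Interaction-Small", latency_s)
fd_bench_ratio = fd_bench["TML-Interaction-Small"] / fd_bench["GPT-realtime-2.0"]

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x the TML latency")
print(f"FD-bench ratio: {fd_bench_ratio:.2f}x")
```

By this arithmetic the GPT-realtime-2.0 latency is a little under three times that of TML-Interaction-Small, while the FD-bench gap is closer to 1.7 times than to a full doubling.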

How Full-Duplex Architecture Enables Human-Like Interactions

At the core of Thinking Machines' approach is a fundamental architectural shift away from standard alternating token sequences. The system uses a multi-stream, micro-turn design that processes 200-millisecond chunks of input and output simultaneously. This allows the AI to interrupt users when it detects visual or auditory cues, such as noticing a user slouching during a video call or identifying when someone writes a bug in code [2]. Rather than relying on massive standalone encoders, the system employs encoder-free early fusion, taking in raw audio signals and image patches through a lightweight embedding layer and co-training all components from scratch within the transformer.
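The micro-turn idea can be illustrated with a toy loop that slices an input stream into 200-millisecond windows and decides, for every window, what to emit in that same window. This is only a sketch under stated assumptions: the `MicroTurn` record, the `toy_policy` function, and the loudness threshold are all hypothetical, and the loop runs sequentially rather than modeling the actual multi-stream transformer.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Sequence

CHUNK_MS = 200  # micro-turn length reported in the article


@dataclass
class MicroTurn:
    index: int
    start_ms: int
    heard: List[float]  # input samples received during this window
    spoken: str         # output emitted in the same window ("" means silence)


def chunk_samples(samples: Sequence[float], sample_rate_hz: int,
                  chunk_ms: int = CHUNK_MS) -> Iterator[List[float]]:
    """Yield consecutive windows of `chunk_ms` milliseconds of samples."""
    per_chunk = sample_rate_hz * chunk_ms // 1000
    if per_chunk <= 0:
        raise ValueError("sample rate too low for the chosen chunk length")
    for start in range(0, len(samples), per_chunk):
        yield list(samples[start:start + per_chunk])


def run_micro_turns(samples: Sequence[float], sample_rate_hz: int,
                    respond: Callable[[int, List[float]], str]) -> List[MicroTurn]:
    """Pair every input window with the output chosen for that same window."""
    turns = []
    for i, heard in enumerate(chunk_samples(samples, sample_rate_hz)):
        turns.append(MicroTurn(i, i * CHUNK_MS, heard, respond(i, heard)))
    return turns


def toy_policy(index: int, heard: List[float]) -> str:
    """Hypothetical policy: backchannel whenever the window is loud."""
    loud = max((abs(s) for s in heard), default=0.0) > 0.5
    return "mhmm" if loud else ""


# One second of fake audio at 1000 Hz: quiet for 400 ms, then loud.
audio = [0.1] * 400 + [0.9] * 600
turns = run_micro_turns(audio, 1000, toy_policy)
for t in turns:
    print(t.start_ms, repr(t.spoken))
```

Because a decision is made every window instead of once per completed user turn, the toy policy can start responding 400 ms into the input without waiting for it to end.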

Source: VentureBeat


Thinking Machines argues that current models force humans to "contort themselves" to AI interfaces, batching their thoughts and phrasing questions like emails because the AI cannot handle the natural backchanneling (the "mhmms" and "I sees") that characterizes genuine human dialogue [4]. The company's dual-model architecture addresses this limitation by pairing TML-Interaction-Small with an asynchronous background agent that handles complex reasoning, web searches, and tool calls while the interaction model maintains conversational flow [4]. This design balances speed with deep reasoning, enabling multi-modal collaboration across audio, video, and text inputs.
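The division of labor described above, a fast conversational loop in front and a slower worker behind it, can be sketched with Python's `asyncio`. Everything here is an assumption made for illustration: `background_agent` and `interaction_loop` are hypothetical names, the delays are arbitrary, and the real system's interface has not been published.

```python
import asyncio
from typing import List


async def background_agent(question: str, delay_s: float = 0.05) -> str:
    """Hypothetical slow worker standing in for reasoning, search, or tool calls."""
    await asyncio.sleep(delay_s)  # simulate the slow work
    return f"result for {question!r}"


async def interaction_loop(question: str, tick_s: float = 0.01) -> List[str]:
    """Keep the conversation moving while the background task is still running."""
    transcript: List[str] = []
    task = asyncio.create_task(background_agent(question))
    while not task.done():
        # The foreground loop stays responsive instead of blocking on the task.
        transcript.append("filler: still working on it")
        await asyncio.sleep(tick_s)
    transcript.append(f"answer: {task.result()}")
    return transcript


transcript = asyncio.run(interaction_loop("latest FD-bench scores"))
for line in transcript:
    print(line)
```

The point of the design choice is that the foreground loop never awaits the slow task directly; it polls and keeps talking, then folds the result in once it is ready.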

Enterprise Applications and What Comes Next

The implications for enterprise applications are substantial. Models with reduced latency could monitor video feeds in laboratories or manufacturing facilities and alert humans the moment a safety violation occurs, rather than waiting for human supervisors to notice [4]. In customer service contexts, the lower latency promises to make interactions feel more like genuine real-time conversation. The models also possess an internal sense of time, allowing them to manage time-sensitive requests without requiring timestamps in prompts [4].

Source: The Verge


However, this remains a research preview, not a finished product [1]. Thinking Machines plans to open a limited research preview to select partners in the coming months, with a wider release scheduled for later this year [5]. The company, co-founded by Mira Murati and former OpenAI researcher John Schulman, has positioned interactivity as a first-class citizen of model architecture rather than an external software addition. While the benchmarks appear impressive, the real-world effectiveness will only become clear once users can test these systems in practical scenarios. Watch for how this approach influences the broader AI industry's thinking about conversational interfaces and whether competitors from major labs adopt similar architectural strategies.
