Thinking Machines unveils interaction models that listen, see and talk simultaneously in real time

Reviewed by Nidhi Govil


Mira Murati's AI startup Thinking Machines announced interaction models that process audio, video, and text simultaneously with 0.4-second latency. The full-duplex architecture eliminates turn-based chat delays, allowing AI to backchannel and respond while users speak. A limited research preview launches in coming months, with wider release planned for later this year.

Thinking Machines Challenges Turn-Based Chat with Real-Time AI

Thinking Machines, the AI startup founded by former OpenAI CTO Mira Murati and researcher John Schulman, announced a research preview of interaction models that fundamentally reimagine human-AI interaction [1]. Unlike current models, which experience reality in a single thread and wait for users to finish typing or speaking before processing begins, these new systems enable real-time AI collaboration across audio, video, and text simultaneously [1]. The company contends that today's models force humans to "contort themselves" to AI interfaces, phrasing questions like emails and batching thoughts, because existing systems cannot handle interruptions or the subtle backchanneling that characterizes natural human conversation [3].

Source: VentureBeat


Full-Duplex Architecture Enables Humanlike Interactions in Real Time

At the core of Thinking Machines' approach is a full-duplex architecture that treats interactivity as a first-class citizen of model design rather than an external software feature [2]. The system moves away from standard alternating token sequences, instead using a multi-stream, micro-turn design that processes 200-millisecond chunks of input and output simultaneously [2]. This allows the model to backchannel while a user speaks, or to interject when it notices visual cues such as a bug being written in code or a friend entering the video frame [2]. Rather than relying on massive standalone encoders, the system takes in raw audio signals and image patches through a lightweight embedding layer, with all components co-trained from scratch within the transformer [2].
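The micro-turn idea described above can be sketched as a loop that consumes one 200-millisecond input chunk and emits one output chunk in the same step, rather than waiting for a full user turn. This is a toy illustration only; the class, method names, and backchannel policy below are invented for the example and are not Thinking Machines' API.

```python
# Toy sketch of a full-duplex, micro-turn loop: each step consumes one
# ~200 ms chunk of user input AND emits one chunk of model output, so
# the model can "speak" (backchannel) while still listening.
from dataclasses import dataclass, field

CHUNK_MS = 200  # micro-turn size reported in the article


@dataclass
class FullDuplexSession:
    transcript: list = field(default_factory=list)

    def micro_turn(self, user_chunk: str) -> str:
        """Process one input chunk and return one output chunk."""
        self.transcript.append(("user", user_chunk))
        out = self._respond(user_chunk)
        self.transcript.append(("model", out))
        return out

    def _respond(self, chunk: str) -> str:
        # Invented policy: backchannel mid-utterance, answer once the
        # chunk appears to end the user's question.
        if chunk.rstrip().endswith("?"):
            return "Good question -- here's what I see."
        return "mm-hm"  # backchannel while the user keeps talking


session = FullDuplexSession()
outputs = [
    session.micro_turn(c)
    for c in ["so I ran the tests and", "three of them failed,", "any idea why?"]
]
```

The point of the sketch is the interleaving: input and output alternate per chunk inside one stream, instead of the request/response pattern of turn-based chat.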

Source: SiliconANGLE


TML-Interaction-Small Achieves 0.4-Second Latency

The research preview introduces TML-Interaction-Small, a 276-billion-parameter Mixture-of-Experts (MoE) model with 12 billion active parameters [2]. Because real-time multimodal collaboration requires near-instantaneous response times that often conflict with deep reasoning, Thinking Machines architected a dual-model system [2]. The Interaction Model manages dialogue and immediate follow-ups at low latency, while an asynchronous background agent handles complex reasoning, web searches, and tool calls behind the scenes [3]. On FD-bench, a benchmark designed to measure interaction quality, TML-Interaction-Small achieved a turn-taking latency of 0.40 seconds, compared with 0.57 seconds for Gemini-3.1-flash-live and 1.18 seconds for GPT-realtime-2.0 [2][3]. The system scored 77.8 on FD-bench V1.5, nearly doubling the 46.8 score of GPT-realtime-2.0.
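The dual-model split described above — a fast conversational path plus a slow asynchronous agent — maps naturally onto an async task pattern. The sketch below is a hypothetical illustration of that design, assuming invented function names and timings; it is not Thinking Machines' implementation.

```python
# Hypothetical sketch of the dual-model pattern: the interaction path
# acknowledges the user immediately, while a background agent performs
# slow work (reasoning, search, tool calls) concurrently.
import asyncio


async def background_agent(query: str, results: list) -> None:
    # Stand-in for deep reasoning, a web search, or a tool call.
    await asyncio.sleep(0.05)
    results.append(f"deep answer to: {query}")


async def interaction_model(query: str, results: list) -> str:
    # Launch the slow work without blocking the dialogue...
    task = asyncio.create_task(background_agent(query, results))
    # ...and respond within the conversation right away.
    ack = "On it -- I'll keep talking while I dig into that."
    # In a real system the dialogue loop would continue here; we await
    # only so the example terminates deterministically.
    await task
    return ack


results: list = []
ack = asyncio.run(interaction_model("summarize this codebase", results))
```

The design choice this illustrates is that latency and depth are decoupled: the user-facing path never waits on the expensive path, which reports back when its result is ready.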

Enterprise Applications and Future Implications

Thinking Machines demonstrated several practical applications, including listening for mentions of animals in stories, translating speech in real time, and even posture correction that alerts users when they're slouching [1]. For enterprise use, native interaction models could monitor video feeds in laboratories or manufacturing facilities and alert humans the moment a safety violation occurs, rather than waiting for a supervisor to notice [3]. In customer service, reduced latency can make calls feel more like genuine conversations [3]. The models possess an internal sense of time, allowing them to manage time-sensitive requests without requiring timestamps in prompts [3]. Thinking Machines plans to open a limited research preview in the coming months, with a wider release later this year [1][3].

Source: The Verge


© 2026 TheOutpost.AI All rights reserved