Thinking Machines Unveils Real-Time AI Models

Thinking Machines Introduces New Class of AI That Processes While It Talks

Thinking Machines, the AI startup founded by former OpenAI CTO Mira Murati, announced on Monday a research preview of interaction models, a new approach designed to transform how humans collaborate with AI systems1

. Unlike traditional models that operate in turn-based chat sequences—where users must finish speaking before the AI begins processing—these interaction models can process input and generate responses simultaneously, creating what the company describes as real-time AI conversation2

. The technical capability enabling this is full-duplex architecture, which allows the system to listen, see, and talk at the same time, much like a phone call rather than an email exchange3

Source: SiliconANGLE

The company's first model, TML-Interaction-Small, is a 276-billion parameter Mixture-of-Experts system with 12 billion active parameters that achieves a turn-taking latency of 0.40 seconds—closely matching natural human conversation speed4

. This highly responsive model significantly outperforms comparable systems from other major AI labs, with Gemini-3.1-flash-live clocking in at 0.57 seconds and GPT-realtime-2.0 at 1.18 seconds3

. On FD-bench, a benchmark specifically designed to measure interaction quality, TML-Interaction-Small scored 77.8, nearly doubling the 46.8 score of GPT-realtime-2.03

How Full-Duplex Architecture Enables Human-Like Interactions

At the core of Thinking Machines' approach is a fundamental architectural shift away from standard alternating token sequences. The system uses a multi-stream, micro-turn design that processes 200-millisecond chunks of input and output simultaneously. This allows the AI that can interrupt users when it detects visual or auditory cues—such as noticing a user slouching during a video call or identifying when someone writes a bug in code2

. Rather than relying on massive standalone encoders, the system employs encoder-free early fusion, taking in raw audio signals and image patches through a lightweight embedding layer and co-training all components from scratch within the transformer.

Source: VentureBeat

Thinking Machines argues that current models force humans to "contort themselves" to AI interfaces, batching their thoughts and phrasing questions like emails because the AI cannot handle the natural backchanneling—the "mhmms" and "I sees"—that characterize genuine human dialogue4

. The company's dual-model architecture addresses this limitation by pairing TML-Interaction-Small with an asynchronous background agent that handles complex reasoning, web searches, and tool calls while the interaction model maintains conversational flow4

. This design balances speed with deep reasoning, enabling multi-modal collaboration across audio, video, and text inputs.

Enterprise Applications and What Comes Next

The implications for enterprise applications are substantial. Models with reduced latency could monitor video feeds in laboratories or manufacturing facilities and alert humans the moment a safety violation occurs, rather than waiting for human supervisors to notice4

. In customer service contexts, the lower latency promises to make interactions feel more like genuine real-time conversation. The models also possess an internal sense of time, allowing them to manage time-sensitive requests without requiring timestamps in prompts4

Source: The Verge

However, this remains a research preview, not a finished product1

. Thinking Machines plans to open a limited research preview to select partners in the coming months, with a wider release scheduled for later this year5

. The company, co-founded by Mira Murati and former OpenAI researcher John Schulman, has positioned interactivity as a first-class citizen of model architecture rather than an external software addition. While the benchmarks appear impressive, the real-world effectiveness will only become clear once users can test these systems in practical scenarios. Watch for how this approach influences the broader AI industry's thinking about conversational interfaces and whether competitors from major labs adopt similar architectural strategies.

Thinking Machines unveils interaction models that respond in 0.40 seconds for real-time AI chat

Thinking Machines Introduces New Class of AI That Processes While It Talks

How Full-Duplex Architecture Enables Human-Like Interactions

Enterprise Applications and What Comes Next

References

Thinking Machines wants to build an AI that actually listens while it talks | TechCrunch

Here's what Mira Murati's AI company is up to

Thinking Machines shows off preview of near-realtime AI voice and video conversation with new 'interaction models'

Thinking Machines drops a new, highly responsive model designed for humanlike interactions in real-time - SiliconANGLE

Thinking Machines unveils AI that can interrupt users in real-time

Related Stories

OpenAI's Realtime API: A Game-Changer for Smart Speakers and Voice Assistants

Thinking Machines Lab Unveils Tinker: An API for AI Model Fine-Tuning

OpenAI unifies teams to build advanced audio model for audio-first AI device launching in 2026

Recent Highlights

Nvidia RTX Spark chips power new AI laptops with up to 128GB memory and local agent capabilities

Florida sues OpenAI and Sam Altman over ChatGPT safety, alleging AI harms linked to violence

Trump signs AI executive order seeking voluntary 30-day review after industry pushback

Recent Highlights

Today's Top Stories

UN Report Warns AI Could Consume 3% of Global Electricity and Water for 1.3 Billion by 2030

Meta Business Agent launches globally to automate customer support across WhatsApp and Instagram

Google launches Gemma 4 12B, bringing multimodal AI agents to consumer laptops with 16GB RAM

Google's Gemini Avatar lets you create an AI clone of yourself in minutes, sparking deepfake concerns