6 Sources
[1]
Thinking Machines wants to build an AI that actually listens while it talks | TechCrunch
Thinking Machines Lab, the AI startup founded last year by former OpenAI CTO Mira Murati, on Monday announced something called interaction models, which, at its essence, sounds like AI that can interrupt you. Right now, every AI model you've ever used works the same way. You talk, it listens. It responds, you listen. Thinking Machines is trying to change that by building a model that processes your input and generates a response at the same time, so it's more like a phone call than a text chain. The technical term for this is "full duplex," and the company claims its model, TML-Interaction-Small, responds in 0.40 seconds, which is roughly the speed of natural human conversation and significantly faster than comparable models from OpenAI and Google. Still, this is a research preview, not a product. The company isn't releasing it to the public yet. A "limited research preview" is coming in the next few months, it says, with a wider release set for later this year. So what to make of it? We're not sure. The benchmarks are impressive and the underlying idea -- that interactivity should be native to a model, not bolted on -- is definitely interesting. Whether the real-world experience lives up to the technical claims is something we won't know until people can actually use it.
[2]
Here's what Mira Murati's AI company is up to
Thinking Machines, the AI company founded by former OpenAI CTO Mira Murati, announced Monday that it's working on something called "interaction models." The idea behind interaction models, according to Thinking Machines, is that they will let people "collaborate with AI the way we naturally collaborate with each other -- they continuously take in audio, video, and text, and think, respond, and act in real time." As explained by Thinking Machines: Today's models experience reality in a single thread. Until the user finishes typing or speaking, the model waits with no perception of what the user is doing or how the user is doing it. Until the model finishes generating, its perception freezes, receiving no new information until it finishes or is interrupted. This creates a narrow channel for human-AI collaboration that limits how much of a person's knowledge, intent, and judgement can reach the model, and how much of the model's work can be understood. Picture trying to resolve a crucial disagreement over email rather than in person. At Thinking Machines, we believe we can solve this bandwidth bottleneck by making AI interactive in real time across any modality. This enables AI interfaces to meet humans where they are, rather than forcing humans to contort themselves to AI interfaces. Thinking Machines also shared several examples of the model in action, including listening for mentions of animals in a story, translating speech in real time, and telling someone when they're slouching. You can read a deeper explanation of interaction models on the Thinking Machines website. However, you can't try interaction models for yourself just yet; Thinking Machines plans to open a "limited research preview" in the "coming months" and aims to do a "wider release later this year."
[3]
Thinking Machines shows off preview of near-realtime AI voice and video conversation with new 'interaction models'
Is AI leaving the era of "turn-based" chat? Right now, all of us who use AI models regularly for work or in our personal lives know that the basic interaction mode across text, imagery, audio, and video remains the same: the human user provides an input, waits anywhere from milliseconds to minutes (or in some cases, for particularly tough queries, hours and days), and the AI model provides an output. But if AI is to really take on the load of jobs requiring natural interaction, it will need to do more than provide this kind of "turn-based" interactivity -- it will ultimately need to respond more fluidly and naturally to human inputs, even responding while also processing the next human input, be it text or another format. That at least seems to be the contention of Thinking Machines, the well-funded AI startup founded last year by former OpenAI chief technology officer Mira Murati and former OpenAI researcher and co-founder John Schulman, among others. Today, the firm announced a research preview of what it deems to be "interaction models," a new class of native multimodal systems that treats interactivity as a first-class citizen of model architecture rather than an external software "harness," scoring some impressive gains on third-party benchmarks and reduced latency as a result. However, the models are not yet available to the general public or even enterprises -- the company says in its announcement blog post: "In the coming months, we will open a limited research preview to collect feedback, with a wider release later this year."
'Full duplex' simultaneous input/output processing
At the heart of this announcement is a fundamental shift in how AI perceives time and presence. Current frontier models typically experience reality in a single thread; they wait for a user to finish an input before they begin processing, and their perception freezes while they generate a response.
In their blog post, the Thinking Machines researchers described the status quo as a limitation that forces humans to "contort themselves" to AI interfaces, phrasing questions like emails and batching their thoughts. To solve this "collaboration bottleneck," Thinking Machines has moved away from the standard alternating token sequence. Instead, they use a multi-stream, micro-turn design that processes 200ms chunks of input and output simultaneously. This "full-duplex" architecture allows the model to listen, talk, and see in real time, enabling it to backchannel while a user speaks or interject when it notices a visual cue -- such as a user writing a bug in a code snippet or a friend entering a video frame. Technically, the model utilizes encoder-free early fusion. Rather than relying on massive standalone encoders like Whisper for audio, the system takes in raw audio signals as dMel and image patches (40x40) through a lightweight embedding layer, co-training all components from scratch within the transformer.
Dual model system
The research preview introduces TML-Interaction-Small, a 276-billion parameter Mixture-of-Experts (MoE) model with 12 billion active parameters. Because real-time interaction requires near-instantaneous response times that often conflict with deep reasoning, the company has architected a two-part system:
This setup allows the AI to perform tasks like live translation or generating a UI chart while continuing to listen to user feedback -- a capability demonstrated in the announcement video where the model provided typical human reaction times for various cues while simultaneously generating a bar chart.
Impressive performance on major benchmarks against other leading AI labs' fast interaction models
To prove the efficacy of this approach, the lab utilized FD-bench, a benchmark specifically designed to measure interaction quality rather than just raw intelligence. The results show that TML-Interaction-Small significantly outperforms existing real-time systems:
* Responsiveness: It achieved a turn-taking latency of 0.40 seconds, compared to 0.57s for Gemini-3.1-flash-live and 1.18s for GPT-realtime-2.0 (minimal).
* Interaction Quality: On FD-bench V1.5, it scored 77.8, nearly doubling the scores of its primary competitors (GPT-realtime-2.0 minimal scored 46.8).
* Visual Proactivity: In specialized tests like RepCount-A (counting physical repetitions in video) and ProactiveVideoQA, Thinking Machines' model successfully engaged with the visual world while other frontier models remained silent or provided incorrect answers.
A potentially huge boon to enterprises -- once the models are made available
If made available to the enterprise sector, Thinking Machines' interaction models would represent a fundamental shift in how businesses integrate AI into their operational workflows. A native interaction model like TML-Interaction-Small allows for several enterprise capabilities that are currently impossible or highly brittle with standard multimodal models:
Current enterprise AI requires a "turn" to be completed before it can analyze data. In a manufacturing or lab setting, a native interaction model can monitor a video feed and proactively interject the moment it detects a safety violation or a deviation from a protocol -- without waiting for the worker to ask for feedback. The model's success in visual benchmarks like RepCount-A (accurate repetition counting) and ProactiveVideoQA (answering questions as visual evidence appears) suggests it could serve as a real-time auditor for high-stakes physical tasks.
The primary friction in voice-based customer service is the 1-2 second "processing" delay common in 2026's standard APIs. Thinking Machines' model achieves a turn-taking latency of 0.40 seconds, roughly the speed of a natural human conversation. Because it handles simultaneous speech natively, an enterprise support bot could listen to a customer's frustration, provide "backchannel" cues (like "I see" or "mm-hmm") without interrupting the user, and offer live translation that feels like a natural conversation rather than a series of disjointed recordings.
Standard LLMs lack an internal clock; they "know" time only if it is provided in a text prompt. Interaction models are natively time-aware, allowing them to manage time-sensitive processes like "Remind me to check the temperature every 4 minutes" or "Alert me if this process takes longer than the last one". This is critical for industrial maintenance and pharmaceutical research where timing is an essential variable.
Background on Thinking Machines
This release marks the second major milestone for Thinking Machines following the October 2025 launch of Tinker, a managed API for fine-tuning language models that lets researchers and developers control their data and training methods while Thinking Machines handles the infrastructure burden of distributed training. The company said Tinker supports both small and large open-weight models, including mixture-of-experts models, and early users included groups at Princeton, Stanford, Berkeley and Redwood Research. At launch in early 2025, Thinking Machines framed itself as an AI research and product company trying to make advanced AI systems "more widely understood, customizable and generally capable." In July 2025, Thinking Machines said it had raised about $2 billion at a $12 billion valuation in a round led by Andreessen Horowitz, with participation from Nvidia, Accel, ServiceNow, Cisco, AMD and Jane Street, described by WIRED as the largest seed funding round in history.
The Wall Street Journal reported in August 2025 that rival tech CEO Mark Zuckerberg approached Murati about acquiring Thinking Machines Lab and, after she declined, Meta pursued more than a dozen of the startup's roughly 50 employees. In March and April 2026, the company also became known for its compute ambitions: it announced an Nvidia partnership to deploy at least one gigawatt of next-generation Vera Rubin systems, then expanded its Google Cloud relationship to use Google's AI Hypercomputer infrastructure with Nvidia GB300 systems for model research, reinforcement learning workloads, frontier model training and Tinker. By April 2026, Business Insider reported that Meta had hired seven founding members from Thinking Machines, including Mark Jen and Yinghai Lu, while another Thinking Machines researcher, Tianyi Zhang, also moved to Meta. The same reporting said Joshua Gross, who helped build Thinking Machines' flagship fine-tuning product Tinker, had joined Meta Superintelligence Labs, and that the company had grown to about 130 employees despite the departures. Thinking Machines was not simply losing people, however: it also hired Meta veteran Soumith Chintala, creator of PyTorch, as CTO, and added other high-profile technical talent such as Neal Wu. TechCrunch separately reported in April 2026 that Weiyao Wang, an eight-year Meta veteran who worked on multimodal perception systems, had joined Thinking Machines, underscoring that the talent flow was not one-way. Thinking Machines previously stated it was committed to "significant open source components" in its releases to empower the research community. It's unclear if these new interaction models will fall under the same ethos and release terms. But one thing is certain: by making interactivity native to the model, Thinking Machines believes that scaling a model will now make it both smarter and a more effective collaborator.
[4]
Thinking Machines drops a new, highly responsive model designed for humanlike interactions in real-time - SiliconANGLE
Thinking Machines Lab Inc., the artificial intelligence research startup founded by former OpenAI Group PBC Chief Technology Officer Mira Murati, wants to move beyond the era of "turn-based" AI interactions. The company has just announced a research preview of its first "interaction models," which are a new class of multimodal AI systems designed to avoid the inevitable pauses that characterize human interactions with AI systems. As anyone who uses AI regularly knows, the basic interaction is a spotty one, at best: The user provides an input, such as text or an image upload, then waits anywhere from a few milliseconds to several minutes, depending on the model used, before finally receiving the output. This occurs because existing models need to wait for their users to finish asking a question or complete the sentence they're saying before they can start processing a response. To get around this, Thinking Machines has created an entirely new model architecture that enables "full-duplex" communication, which means AI that can listen, see and talk simultaneously. Thinking Machines argues that the back-and-forth interactions with current models force human users to "contort themselves" to the interface. Over multiple months of use, humans have learned to phrase their questions like emails and batch their thoughts, because they know the AI they're using cannot handle interruptions or deal with the subtle "backchanneling," or the "mhmms" and "I sees" that exist in truly natural human interactions. But if AI is to become a true humanlike collaborator in high-stakes applications like medical surgery, it has to find a way to ditch that lag. The company's answer is a new model architecture that drops the standard alternating token sequence in favor of a larger, multistream micro-turn-based design.
The way it works is the system processes inputs and outputs in tiny 200-millisecond chunks, enabling it to react in real-time to any visual or auditory cues it picks up on, even when it's already speaking. The startup says this "dual-model" architecture is designed to balance speed with deep reasoning. The first component of this new architecture is TML-Interaction-Small, a 276-billion parameter Mixture-of-Experts model that's designed to manage dialogue, presence and immediate follow-ups with rapid speed. It's paired with an asynchronous agent that's meant to work behind the scenes, so while the Interaction Model keeps the conversation flowing, the Background Model takes care of all of the heavy lifting - the complex reasoning, web searches and tool calls required to get things done or work things out. It can then send its findings to the Interaction Model when it's ready, and these will be woven into the live chat. In a blog post, the company explained that instead of using heavy external encoders to translate audio or video into signals the model can understand, it utilizes "encoder-free early fusion" that takes in raw signals directly through a lightweight embedding layer. Everything is processed rapidly within the transformer, which is what gives it such an advantage in terms of latency. Thinking Machines claims that this dual-model architecture delivers some impressive results. On FD-bench, a benchmark designed to measure AI interaction quality, TML-Interaction-Small achieved a turn-taking latency of less than 0.4 seconds, well ahead of Google LLC's Gemini-3.1-flash-live, which clocked in at 0.57 seconds, and GPT-realtime-2.0, which achieved a score of just 1.18 seconds. While speedier chatbots will be appreciated by most people, the most significant implications could be found in enterprise applications. 
Models that can see and react in real-time pave the way for possibilities that simply don't exist when dealing with the latency prevalent with today's models. For instance, a native interaction model could be set up to monitor a video feed in a laboratory or a manufacturing facility and alert humans the moment a safety violation occurs, rather than waiting for a human supervisor to stroll past and see it with their own eyes. In customer service, the lower latency can help to make calls feel more like real conversations. What's especially useful is that Thinking Machines' models have an internal sense of time, which allows them to manage time-sensitive requests. A user in a lab could tell a model to "alert me if this chemical reaction takes longer than the last one," without needing to provide any timestamps in the prompt. Thinking Machines says TML-Interaction-Small and its partnering background model are only being made available to a select number of partners during the research preview phase, with a public release slated for later in the year.
[5]
Thinking Machines unveils AI that can interrupt users in real-time
Thinking Machines Lab announced the development of a new AI technology called interaction models, intended to enable AI to interrupt users during conversations. Founded by former OpenAI CTO Mira Murati, the company asserts that traditional AI models function sequentially -- listening and then responding -- whereas its new model aims for simultaneous input processing and response generation, akin to a phone call. The company has named its model TML-Interaction-Small, which reportedly responds in 0.40 seconds, closely matching the speed of natural human conversation. This performance exceeds that of similar models developed by OpenAI and Google, according to the company. The technical capability of this model is described as "full duplex." Currently, TML-Interaction-Small is only available in a research preview phase and is not open to the public. Thinking Machines Lab plans to roll out a limited research preview in the next few months, followed by a wider release planned for later this year. While the benchmarks for TML-Interaction-Small have been characterized as impressive, the effectiveness of the model in practical applications will only be verified once it is accessible to users. The concept of integrating interactivity into AI models has been noted as a novel approach, though its ultimate success remains to be determined.
[6]
AI that talks back in real time: Mira Murati's Thinking Machines unveils 'interaction models' - The Economic Times
Thinking Machines Lab, the AI startup founded by former OpenAI CTO Mira Murati, has introduced what it calls "interaction models". It's a new approach to artificial intelligence designed to make conversations feel more natural and immediate. The idea is simple but ambitious: AI that does not wait for you to finish speaking or typing before responding, but can instead engage in real time.
What are interaction models?
Interaction models are designed to process information and respond to it continuously, making AI behave more like a live conversation than a message thread. Instead of waiting for a full prompt and then replying, the system can listen, think and respond at the same time. The company describes this as a move towards AI that collaborates more naturally with people in real time. "We think interactivity should scale alongside intelligence; the way we work with AI should not be treated as an afterthought," the company said in a blog post.
Early preview and claims
The system, called TML-Interaction-Small, is currently in research preview and not publicly available. The company says it will release a limited preview in the coming months, with wider access planned later this year. Key features include:
How this differs from today's AI
Most current AI tools are built around a simple structure: the user speaks or types, the model responds, and the exchange repeats. Thinking Machines Lab argues this slows down collaboration and limits how useful AI can be in real-world tasks. "AI labs often treat the ability for AI to work autonomously as the model's most important capability. As a result, today's models and interfaces aren't optimized for humans to remain in the loop." The company says its approach is meant to keep humans involved throughout the interaction, rather than pushing them out of the process. It also points to the limitations of current systems, noting that "until the user finishes typing or speaking, the model waits with no perception of what the user is doing or how the user is doing it."
Limitations
Long sessions are still difficult because continuous audio and video quickly fill up context, so managing very long conversations remains an open challenge. The system works better for short- and medium-length interactions, but extended use still requires careful context management. It also depends heavily on strong connectivity for smooth real-time audio and video streaming, and performance can drop if the connection is weak. In addition, larger versions of the model are currently too slow to run in real time, so scaling the system while keeping it fast is still a limitation.
Early reactions
While the announcement has generated interest in the AI community, many observers say the real test will be how the system performs in everyday use. The company itself acknowledges that the technology is still early and experimental. "Autonomous interfaces are valuable, but in most real work, users can't fully specify their requirements upfront and walk away -- good results benefit from a collaborative process where the human stays in the loop, clarifying and giving feedback along the way," Thinking Machines Lab said. For now, interaction models remain a research concept. Whether they reshape how people use AI will depend on how well they work outside the lab.
Thinking Machines, the AI startup founded by former OpenAI CTO Mira Murati, announced interaction models that process input and generate responses simultaneously. The TML-Interaction-Small model responds in 0.40 seconds—matching natural human conversation speed—and uses full-duplex architecture to listen, see, and talk in real time. A limited research preview arrives in coming months.
Thinking Machines, the AI startup founded by former OpenAI CTO Mira Murati, announced on Monday a research preview of interaction models, a new approach designed to transform how humans collaborate with AI systems [1]. Unlike traditional models that operate in turn-based chat sequences, where users must finish speaking before the AI begins processing, these interaction models can process input and generate responses simultaneously, creating what the company describes as real-time AI conversation [2]. The technical capability enabling this is full-duplex architecture, which allows the system to listen, see, and talk at the same time, much like a phone call rather than an email exchange [3].
The company's first model, TML-Interaction-Small, is a 276-billion parameter Mixture-of-Experts system with 12 billion active parameters that achieves a turn-taking latency of 0.40 seconds, closely matching natural human conversation speed [4]. On that measure it is well ahead of comparable systems from other major AI labs, with Gemini-3.1-flash-live clocking in at 0.57 seconds and GPT-realtime-2.0 at 1.18 seconds [3]. On FD-bench, a benchmark specifically designed to measure interaction quality, TML-Interaction-Small scored 77.8, about 1.7 times the 46.8 scored by GPT-realtime-2.0 [3].
At the core of Thinking Machines' approach is a fundamental architectural shift away from standard alternating token sequences. The system uses a multi-stream, micro-turn design that processes 200-millisecond chunks of input and output simultaneously. This allows the AI to interrupt users when it detects visual or auditory cues, such as noticing a user slouching during a video call or identifying when someone writes a bug in code [2]. Rather than relying on massive standalone encoders, the system employs encoder-free early fusion, taking in raw audio signals and image patches through a lightweight embedding layer and co-training all components from scratch within the transformer.
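To make the micro-turn idea concrete, here is a minimal sketch of how input and output can share the same 200 ms slots instead of alternating whole turns. It is an illustration only, not Thinking Machines' implementation: the `MicroTurn` record, the `micro_turns` and `chunks_for` helpers, and the use of short text strings as stand-ins for audio or video chunks are all assumptions made for the example. Only the 200 ms chunk length comes from the coverage.

```python
from dataclasses import dataclass
from typing import Iterator, List

CHUNK_MS = 200  # chunk length reported in the coverage


@dataclass
class MicroTurn:
    index: int     # position of the micro-turn in the session
    start_ms: int  # offset at which this chunk begins
    heard: str     # input received in this slot (text stand-in, hypothetical)
    spoken: str    # output emitted in the same slot ("" means the model stays silent)


def micro_turns(inputs: List[str], outputs: List[str]) -> Iterator[MicroTurn]:
    """Pair input and output chunks slot by slot rather than turn by turn."""
    n = max(len(inputs), len(outputs))
    for i in range(n):
        heard = inputs[i] if i < len(inputs) else ""
        spoken = outputs[i] if i < len(outputs) else ""
        yield MicroTurn(i, i * CHUNK_MS, heard, spoken)


def chunks_for(duration_ms: int) -> int:
    """Number of 200 ms chunks needed to cover a duration (ceiling division)."""
    return -(-duration_ms // CHUNK_MS)


# The user keeps talking while the model backchannels in the same time slots.
user = ["so the", "plan is", "to ship", "friday"]
model = ["", "mm-hmm", "", "got it"]
timeline = list(micro_turns(user, model))
for t in timeline:
    print(f"{t.start_ms:>4} ms  in={t.heard!r:<10} out={t.spoken!r}")
```

Under this framing, one second of conversation spans five slots, and the reported 0.40-second turn-taking latency corresponds to two of them (`chunks_for(400)` returns 2).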
Thinking Machines argues that current models force humans to "contort themselves" to AI interfaces, batching their thoughts and phrasing questions like emails because the AI cannot handle the natural backchanneling (the "mhmms" and "I sees") that characterizes genuine human dialogue [4]. The company's dual-model architecture addresses this limitation by pairing TML-Interaction-Small with an asynchronous background agent that handles complex reasoning, web searches, and tool calls while the interaction model maintains conversational flow [4]. This design balances speed with deep reasoning, enabling multi-modal collaboration across audio, video, and text inputs.
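The division of labor described above can be sketched with Python's `asyncio`: a fast loop keeps answering every chunk while a slow worker runs concurrently and hands back its result through a queue. This is a hedged illustration of the general pattern, not the company's system. The `background_agent` and `interaction_loop` functions, the sleep durations, and the string replies are all hypothetical stand-ins.

```python
import asyncio
from typing import List


async def background_agent(task: str, results: asyncio.Queue) -> None:
    """Hypothetical slow worker standing in for reasoning, search, or tool calls."""
    await asyncio.sleep(0.03)  # simulated heavy work
    await results.put(f"result for {task!r}")


async def interaction_loop(chunks: List[str], results: asyncio.Queue) -> List[str]:
    """Hypothetical fast loop: replies to every chunk, weaving in findings when ready."""
    transcript = []
    for chunk in chunks:
        await asyncio.sleep(0.02)  # simulated per-chunk cadence
        reply = f"ack {chunk}"
        # Non-blocking check: the conversation never waits on the background work.
        while not results.empty():
            reply += f" | {results.get_nowait()}"
        transcript.append(reply)
    return transcript


async def main() -> List[str]:
    results: asyncio.Queue = asyncio.Queue()
    worker = asyncio.create_task(background_agent("chart", results))
    transcript = await interaction_loop(["a", "b", "c", "d", "e"], results)
    await worker
    return transcript


transcript = asyncio.run(main())
for line in transcript:
    print(line)
```

The key design choice is that the interaction loop polls the queue without blocking, so a slow background task delays only its own result, never the next reply.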
The implications for enterprise applications are substantial. Models with reduced latency could monitor video feeds in laboratories or manufacturing facilities and alert humans the moment a safety violation occurs, rather than waiting for human supervisors to notice [4]. In customer service contexts, the lower latency promises to make interactions feel more like genuine real-time conversation. The models also possess an internal sense of time, allowing them to manage time-sensitive requests without requiring timestamps in prompts [4].
However, this remains a research preview, not a finished product [1]. Thinking Machines plans to open a limited research preview to select partners in the coming months, with a wider release scheduled for later this year [5]. The company, co-founded by Mira Murati and former OpenAI researcher John Schulman, has positioned interactivity as a first-class citizen of model architecture rather than an external software addition. While the benchmarks appear impressive, the real-world effectiveness will only become clear once users can test these systems in practical scenarios. Watch for how this approach influences the broader AI industry's thinking about conversational interfaces and whether competitors from major labs adopt similar architectural strategies.
Summarized by Navi