11 Sources
[1]
OpenAI bets big on audio as Silicon Valley declares war on screens | TechCrunch
OpenAI is betting big on audio AI, and it's not just about making ChatGPT sound better. According to new reporting from The Information, the company has unified several engineering, product, and research teams over the past two months to overhaul its audio models, all in preparation for an audio-first personal device expected to launch in about a year. The move reflects where the entire tech industry is headed -- toward a future where screens become background noise and audio takes center stage. Smart speakers have already made voice assistants a fixture in more than a third of U.S. homes. Meta just rolled out a feature for its Ray-Ban smart glasses that uses a five-microphone array to help you hear conversations in noisy rooms -- essentially turning your face into a directional listening device. Google, meanwhile, began experimenting in June with "Audio Overviews" that transform search results into conversational summaries. And Tesla is integrating Grok and other LLMs into its vehicles to create conversational voice assistants that can handle everything from navigation to climate control through natural dialogue. It's not just the tech giants placing this bet. A motley crew of startups has emerged with the same conviction, albeit with varying degrees of success. The makers of the Humane AI Pin burned through hundreds of millions before their screenless wearable became a cautionary tale. The Friend AI pendant, a necklace that records your life and offers companionship, has sparked privacy concerns and existential dread in equal measure. And now at least two companies, including Sandbar and one helmed by Pebble founder Eric Migicovsky, are building AI rings expected to debut in 2026, allowing wearers to literally talk to the hand. The form factors may differ, but the thesis is the same: audio is the interface of the future. Every space -- your home, your car, even your face -- is becoming an interface. OpenAI's new audio model, slated for early 2026, will reportedly sound more natural, handle interruptions like an actual conversation partner, and even speak while you're talking, which is something today's models can't manage. The company is also said to envision a family of devices, possibly including glasses or screenless smart speakers, that act less like tools and more like companions. As The Information notes, former Apple design chief Jony Ive, who joined OpenAI's hardware efforts through the company's $6.5 billion acquisition in May of his firm io, has made reducing device addiction a priority, seeing audio-first design as a chance to "right the wrongs" of past consumer gadgets.
[2]
OpenAI device will be 'audio-based' with new ChatGPT models, per report - 9to5Mac
A new report at The Information details the latest leaks and tidbits on OpenAI's first hardware device, including plans for upgraded audio models for ChatGPT and more. Stephanie Palazzolo writes at The Information: OpenAI is taking steps to improve its audio AI models, in preparation for its eventual release of an AI-powered personal device, said a person with knowledge of the effort. The device is expected to be largely audio-based, said three people with knowledge of it. [...] A new audio-model architecture produces responses that sound more natural and emotive and provide more accurate, in-depth answers, said the person with knowledge of the effort. The new audio model will also be able to speak at the same time as a human user, which today's models can't do, and will handle interruptions better, this person said. The company is aiming to release the new audio model in the first quarter of 2026. The report goes on to explain that OpenAI's mystery hardware product still isn't expected to launch for about another year. It will also, as previously reported, be just the first of a family of devices being developed. This ecosystem of products is expected to be entirely audio-focused. Palazzolo writes: "Among the ideas the company has discussed are glasses and a smart speaker without a display." I'm eager to see how human-like OpenAI can make interactions with ChatGPT feel. Right now I never use voice mode, but not necessarily because of its current limitations. Rather, I'm rarely in environments without other people around, and thus text is a more fitting interaction method for me. Building devices entirely around audio sounds risky, since I imagine I'm not alone in preferring text-based AI interactions. But I remain highly interested in the work Jony Ive, Sam Altman, and team are doing. Would you want an OpenAI device that's entirely audio focused? Let us know in the comments.
[3]
OpenAI Building New Voice Model Ahead of AI Device Launch: Report
Last year, OpenAI acquired ex-Apple design chief Jony Ive's AI hardware startup io for $6.5 billion. OpenAI is working on a new AI audio model architecture, which is slated for release in the first quarter of this year, reported The Information. This model is also being developed for the new voice-based device the company is working on, added the report. Furthermore, OpenAI has restructured and brought together several engineers and researchers in a team to build the new AI model. It is expected to bring significant improvements in accuracy, emotional expression, and naturalness of responses, while also being able to handle interruptions like a real conversation partner. Last year, OpenAI deepened its push into hardware by partnering with former Apple design chief Jony Ive. In May last year, Ive's startup io, focused on building hardware products around artificial intelligence, was acquired by the AI giant in a nearly $6.5 billion all-stock deal. It also marked the next phase of a two-year collaboration between Ive's design firm LoveFrom and OpenAI, to lead design for the AI giant's future hardware and software. In July last year, OpenAI ramped up its hiring across multiple positions in the consumer hardware sector, with positions open for a hardware systems product designer to help build the 'next generation of world's most innovative mobile devices'. According to a Wall Street Journal report from last year, Altman and Ive hinted that these AI companion devices would be fully aware of the user's surroundings while offering an 'unobtrusive' experience. The report added that these devices would be standalone units and would be released later this year. Last August, OpenAI made its Realtime API generally available with new features and released its "most advanced" speech-to-speech model, gpt-realtime. The company claimed that gpt-realtime is better at interpreting system messages and developer prompts. This includes reading disclaimer scripts word-for-word on a support call, repeating back alphanumerics, or switching seamlessly between languages mid-sentence.
[4]
Report: OpenAI plans to launch new audio model in the first quarter - SiliconANGLE
OpenAI Group PBC is reportedly developing a new artificial intelligence model optimized for audio generation tasks. The Information today cited sources as saying that the algorithm will launch by the end of March. According to the publication, it's expected to produce more natural-sounding speech than OpenAI's current models. The AI will also be better at handling real-time back-and-forth interactions with users. OpenAI will reportedly base the model on a new architecture. The company's current flagship real-time audio model, GPT-realtime, uses the ubiquitous transformer architecture. It's unclear whether the company will pivot to an entirely different algorithm design or simply adopt a new transformer implementation. Some transformer-based audio models process speech directly. Others, such as the Whisper algorithm that OpenAI released in 2022, turn audio files into graphs called spectrograms before processing them. Whisper and the company's newer audio models are all available in multiple editions with varying output quality. It's possible OpenAI will also offer multiple versions of the algorithm it's expected to release this quarter. The company has reportedly combined several engineering, product and research teams to support its audio model push. The initiative is said to be led by Kundan Kumar, a former researcher at venture-backed AI provider Character.AI Inc. Many of the startup's other staffers joined Google LLC in late 2024 as part of a $2.7 billion reverse acquihire. It's possible OpenAI's upcoming model will not focus solely on speech generation use cases. The nascent AI-generated music segment is currently experiencing rapid growth: the Wall Street Journal recently reported that one market player, startup Suno Inc., is generating more than $200 million in annual revenue. Joining the fray may help OpenAI boost its consumer business. The upcoming audio model is part of a broader effort on the company's part to enter the consumer electronics market. According to The Information, OpenAI plans to launch an "audio-first personal device" in about a year. It's believed the company could eventually introduce an entire portfolio of devices complete with a smart speaker and smart glasses. Last May, OpenAI acquired product design startup io Products Inc. to support its consumer hardware push. The transaction valued the Jony Ive-founded startup at $6.5 billion. In October, the Financial Times reported that Ive is working on a smartphone-sized device that is designed to sit on a desk or table. OpenAI may seek to develop a lightweight, on-device audio model to support its move into consumer hardware. Processing prompts locally is more cost-efficient than sending them to the cloud. Google has taken a similar approach with its Pixel smartphone series, which uses an on-device model called Gemini Nano to power some AI features.
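For a concrete sense of the spectrogram preprocessing described above, here is a minimal sketch using the open-source openai-whisper Python package. The filename and the "base" checkpoint are placeholder choices (Whisper ships in several editions, as noted); this illustrates the published Whisper pipeline, not OpenAI's unreleased audio model.

```python
# Minimal sketch of Whisper's spectrogram-first pipeline (pip install openai-whisper).
# "meeting.wav" is a placeholder filename.
import whisper

model = whisper.load_model("base")           # one of several available checkpoints

audio = whisper.load_audio("meeting.wav")    # resampled to 16 kHz mono float32
audio = whisper.pad_or_trim(audio)           # Whisper operates on 30-second windows

# The waveform is converted to a log-mel spectrogram before the transformer sees it.
mel = whisper.log_mel_spectrogram(audio).to(model.device)

result = whisper.decode(model, mel, whisper.DecodingOptions(fp16=False))
print(result.text)
```

The transformer encoder only ever sees the log-mel representation, not the raw waveform, which is the design choice the article contrasts with models that process speech directly.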
[5]
OpenAI unifies teams to build audio device with Jony Ive
OpenAI has unified engineering, product, and research teams over the past two months to overhaul its audio models in preparation for an audio-first personal device set to launch in about a year, according to reporting from The Information. The overhaul targets improvements in OpenAI's audio capabilities beyond current limitations. The company's new audio model, scheduled for release in early 2026, will produce more natural-sounding speech. It will manage interruptions in a manner similar to a real conversation partner. This model will also enable the AI to speak simultaneously while the user talks, a function that existing models cannot perform. OpenAI plans a family of devices powered by this advanced audio technology. Possible designs include glasses or screenless smart speakers. These devices aim to function as companions rather than mere tools, integrating seamlessly into daily interactions. Former Apple design chief Jony Ive contributes to OpenAI's hardware initiatives. OpenAI acquired his firm io for $6.5 billion in May. Ive prioritizes audio-first design principles to address device addiction. As The Information notes, Ive views this approach as an opportunity to "right the wrongs" of past consumer gadgets. The push toward audio interfaces aligns with broader industry developments. Smart speakers, featuring voice assistants, exist in more than a third of U.S. homes. These devices have established voice interaction as a standard household element. Meta introduced a feature for its Ray-Ban smart glasses that employs a five-microphone array. This setup assists users in hearing conversations within noisy environments. The technology effectively transforms the glasses into a directional listening device positioned on the face. Google initiated experiments in June with "Audio Overviews." This feature converts search results into conversational summaries delivered via audio. Users receive spoken overviews instead of visual lists, facilitating hands-free information access. Tesla incorporates Grok and other large language models into its vehicles. The integration creates conversational voice assistants capable of natural dialogue. These assistants manage tasks such as navigation and climate control through spoken commands and responses. Startups pursue similar audio-centric hardware with mixed outcomes. The Humane AI Pin, a screenless wearable, consumed hundreds of millions in funding before emerging as a cautionary tale in the sector. The Friend AI pendant functions as a necklace that records aspects of users' lives while providing companionship. This device has generated privacy concerns alongside reports of existential dread among users. Additional startups develop AI rings projected for debut in 2026. Sandbar represents one effort. Another involves Pebble founder Eric Migicovsky. These rings enable wearers to engage in conversations directly with the device on their hand. Form factors vary across these projects -- wearables, pendants, rings -- but all emphasize audio as the primary interface. Spaces such as homes, cars, and even the face evolve into interactive audio environments.
[6]
OpenAI Might Want Users to Start Interacting With AI in a Different Way
OpenAI recently reorganized several of its teams in order to focus on improving its audio-generation AI models, according to a new report -- and these improvements are crucial to OpenAI pulling off one of its most high-profile projects. According to a report from the Information, OpenAI is prioritizing audio AI development because the technology will be at the core of OpenAI's much-anticipated line of physical devices, created by legendary iPhone designer Jony Ive. OpenAI bought Ive's design startup, io Products, for $6.5 billion in May 2025, with the explicit goal of creating a new generation of AI-powered devices. In the months since, rumors have circulated that the devices would eschew screens in favor of an audio-based operating system, in the form of a pair of smart glasses or an Amazon Echo-like speaker. OpenAI CEO Sam Altman further added to those rumors during a December 2025 interview with journalist Alex Kantrowitz, in which he said that using a screen would limit OpenAI's device to "the same way we've had graphical user interfaces working for many decades."
[7]
OpenAI Is Doubling Down on Audio AI as It Prepares a New Audio Device
* OpenAI said to release a new audio model in early 2026 * This AI model will reportedly be more natural-sounding * The company could reportedly launch an audio device in a year OpenAI is reportedly increasing its focus on audio-based artificial intelligence (AI) models. As per the report, the San Francisco-based AI giant has combined several teams, including engineering, product, and research, to develop a new audio generation model. This model is said to power the company's audio-first AI device, which could be launched later in 2026. Notably, a recent leak claimed that the ChatGPT maker is planning to launch not one but three different devices, including an AI Pen. OpenAI Reportedly Preparing for Audio-First AI Device Launch According to The Information (via TechCrunch), OpenAI has unified several teams for the last two months to accelerate the development of a new audio AI model and improve the performance of the existing audio models that power features such as the Advanced Voice mode. Citing people familiar with the model, the publication claimed that the new model will be released in early 2026. This new AI audio generation model will reportedly arrive with features such as a more natural-sounding voice, better handling of interruptions made by the user, and the ability to speak when the user is talking to make the conversation feel more human-like. The last capability, if true, will be a major upgrade compared to the existing audio models available in the market. While the AI model will reportedly be released early this year, the audio-first device is not expected to hit the market before late 2026. Not a lot is known about this device, but it is said that it will be a screenless experience, similar to Humane's AI Pin and the AI Pendant by Limitless. Notably, both of the products failed to impress consumers, and while Humane was shut down and its assets were acquired by HP, Limitless was acquired by Meta last month. A separate report had claimed that OpenAI was currently in the vendor evaluation stage, with production expected to start soon. The company is reportedly planning to let Foxconn handle the manufacturing in its Vietnam plant, but nothing is confirmed at this stage. The AI giant is also said to be working on two other devices, one of which is an AI Pen. Nothing else is known about these devices at this point.
[8]
OpenAI Improving Audio AI Models Ahead of Introduction of Personal AI Devices | PYMNTS.com
Currently, the company's large language model (LLM) that powers the audio version of ChatGPT is less accurate and slower to respond than those powering the text-based versions, The Information reported Thursday (Jan. 1), citing unnamed sources. OpenAI plans to release the new audio model during the first quarter and expects to release its first personal AI device in about a year, according to the report. OpenAI declined The Information's request to comment on the report. According to the report, OpenAI plans to release several personal AI devices, including glasses and a smart speaker, over time. Google, Amazon, Meta and Apple are also developing personal AI devices that will be optimized for artificial intelligence, the report said. Many AI researchers, including those working on OpenAI's first device, expect users to interact with these devices through speech because that is a more natural method than using a screen, per the report. When OpenAI announced its acquisition of io, an AI device startup co-founded by former Apple chief design officer Jony Ive, in May 2025, OpenAI CEO Sam Altman noted that to use ChatGPT today, users need to turn on a computer, open a web browser, go to the ChatGPT website, type in a query and wait for an answer. "I think we have the opportunity here to kind of completely reimagine what it means to use a computer," Altman said. OpenAI Chief Operating Officer Brad Lightcap told The Wall Street Journal in May 2025 that the company sees an opportunity for AI access to be offered through an "ambient computer layer" rather than via web browsers and mobile apps. The company aims to eliminate the need to look at a screen to access AI and wants to build AI that is "truly personal," Lightcap said. PYMNTS reported Tuesday (Dec. 30) that ahead of next week's CES 2026, several consumer electronics manufacturers are previewing hardware products built on artificial intelligence technologies. The devices include wearables, mixed-reality systems, home security, appliances and robotics.
[9]
OpenAI Shifts Focus on Voice, Aims to Launch an Audio Device in 2026
OpenAI seeks to ride the audio-first hardware wave as startups create new form factors for voice-enabled smarts. That voice will soon become the preferred AI interface appears beyond doubt, as we discussed in an article posted a few days ago. A lot is being said and experimented with in the realm of voice LLMs. However, the world's richest and biggest startup (is it?), OpenAI, is chasing a bigger dream of a future where screens become background noise and voice takes centre stage. And no, this isn't just about making ChatGPT sound better or even smarter. Recent reporting from The Information suggests that the company wants to create an audio-first personal device and could be well on the way to launching it in 2026. Towards this end, Sam Altman has created a unified team at OpenAI comprising research, product, and engineering. Of course, there is precious little that the company is letting out for now. Maybe they're not yet clear about the contours of the story, or Altman is just waiting for the opportune time to reveal more details. By which we do not necessarily mean that OpenAI is wary of letting out secrets; it could just be that Altman will play this card when something goes wrong within the company and its PR needs a quick fix. OpenAI's move isn't surprising. Be that as it may, OpenAI's move towards creating a voice assistant is not surprising, given that smart speakers are already doing this job in about 30% of all US homes. Google began playing around with its Audio Overviews last June, which could be a precursor to transforming its search results into an interactive voice-based conversation. More recently, Meta came out with a feature on its Ray-Ban smart glasses that uses a five-microphone array to help the wearer listen to conversations in noisy rooms. And if that technology turned a human face into a directional listening device, there is also the Stream Ring from Sandbar, a thought-capturing device launching for just $249 in a few months' time. Which is why we don't find it surprising that Altman and his team at OpenAI are suddenly pivoting towards voice as the new mode of AI-human interaction. A host of startups have been talking big around this business, none more so than the makers of the Humane AI Pin, who burnt hundreds of millions of dollars creating a screenless wearable. Can Apple's former design chief work wonders for Altman? The various ideas around voice-enabled AI devices differ mainly in form factor; the basic idea is the same. If one isn't doing audio, one is missing out, seems to be the refrain emanating from Silicon Valley. And OpenAI seems to be making a start with its new audio model, due to arrive early this year. They say it would sound more natural, be capable of handling interruptions in conversation with a human being, and even break into its own speech while you are talking. Imagine what a babble such conversations would produce if this came true. The new team at OpenAI is also putting together a bevy of AI-enabled devices that could include smart glasses, smart speakers, and maybe more. Towards this end, the company recently acquired the services of Apple's former design czar. Yes, nothing less will do for Altman and his lofty dreams. Jony Ive, who was Apple's design chief, is now part of OpenAI's hardware team. Of course, Ive didn't simply leave Apple for OpenAI; he came aboard through Altman's $6.5 billion acquisition of his company, called just "io".
It appears that the core idea, as we move into 2026 and the fourth year of LLM madness, is that artificial intelligence is no longer merely a tool. It seeks to be a companion.
[10]
Examining Voice-First AI Amid OpenAI Betting On Audio-Driven AI
OpenAI is reportedly reorganising teams and rebuilding its audio models ahead of an audio-first personal device launch. For context, TechCrunch, while citing The Information, reported that OpenAI has unified its engineering, product, and research teams to improve audio AI for an upcoming device and is developing a new audio model architecture focused on more natural and accurate responses. Notably, the new audio model is expected to handle interruptions and speak while the user is talking, with a rollout timeframe around early 2026. Pertinently, all these developments at OpenAI point to a larger question: what changes when AI becomes voice-first? The significance here is not that OpenAI may ship new hardware, but in how companies now treat audio as a primary interface for AI: a change that reshapes how users experience accuracy, trust, privacy, and security. OpenAI's internal assessment, as reported by The Information, points to a clear problem. Its audio models lag behind its text models in both accuracy and response speed, a gap that becomes harder to ignore as more products move toward voice-led interaction. That assessment reflects a broader shift already visible across major tech platforms. Voice assistants have become a default feature in homes through smart speakers. Meta has begun using multi-microphone arrays in its Ray-Ban smart glasses to actively process and enhance real-world conversations. Meanwhile, Google has started experimenting with "Audio Overviews" that convert search results into spoken summaries rather than lists of links. And Tesla is integrating xAI's chatbot Grok into its vehicles to allow drivers to control navigation-related commands. Against this backdrop, OpenAI is not pursuing incremental improvements in text-to-speech. Instead, it is trying to unify its text and audio stack so the system can listen, reason, and respond in real time, including during overlapping conversational dynamics. As mentioned earlier, OpenAI's next audio model will support more natural interaction, handle interruptions, and speak while the user is still talking. If OpenAI succeeds, it moves AI away from turn-based chat and closer to ambient, continuous conversation, where voice becomes the primary interface. Audio-first AI does not simply add voice input to an existing chatbot. It redesigns the interaction model so speech becomes the default channel in both directions. That produces three practical shifts. Text interfaces introduce friction that can help users stay sceptical. Users can re-read responses, scroll back, or copy claims to verify them elsewhere. Voice-based interaction may reduce some of that friction because responses unfold in real-time and disappear once spoken. Furthermore, a particular user's experience in voice-led interactions can influence perceptions of credibility. To explain, a 2022 systematic review of voice assistant usability found that user acceptance is closely tied to overall usability and interaction experience, particularly in systems where voice is the primary mode of interaction. Moreover, conversational systems can make uncertainty harder to notice. To explain, most audio systems tend to deliver a single response rather than signalling hesitation or asking for clarification, even when speech input contains ambiguity due to accents, background noise, overlapping speakers, or imperfect microphone capture. In such cases, users may have limited visibility into what the system actually heard or how confident it was in its interpretation. 
As a result, trust dynamics may begin to shift before questions of privacy, regulation, or liability come into focus. The shift in trust dynamics has implications beyond perception. Audio-first systems rely on microphones that remain always on or close to it, increasing the likelihood of passive data capture and recording of bystanders. Notably, European data protection regulators have cautioned that voice assistants raise specific data protection risks, particularly where audio is retained by default or speech is captured in shared environments, potentially conflicting with principles such as storage limitation and informed consent. In the Indian context specifically, the Digital Personal Data Protection (DPDP) Act requires that consent be "free, specific, informed, unconditional and unambiguous", and that it be demonstrated through a clear affirmative action by the individual providing it, as set out in Section 6(1) of the Act. However in practice, ambient audio makes that standard difficult to meet, as many affected individuals may never receive notice, let alone an opportunity to consent. Voice-first AI does not merely change how users interact with systems. It also complicates questions of responsibility when those systems act in ways users did not intend. A well-known example surfaced in earlier generations of voice assistants. In 2017, an Amazon Alexa device reportedly picked up a phrase during a television broadcast and placed an unintended order: a case that MediaNama later examined during its Governing the AI Ecosystem roundtable on agentic systems. Participants at the discussion argued that while the outcome was reversible, the incident highlighted how ambient, voice-triggered systems can act without a clear or deliberate user instruction. Legal and policy experts at the roundtable differed on where liability should sit in such cases. Some argued that developers and manufacturers bear responsibility when foreseeable safeguards are missing, treating such failures as closer to product defects. Others pointed to user-configurable settings, such as voice recognition controls or payment authorisations, as factors that could shift responsibility back to users. Several participants also cautioned that consent and choice can become illusory when systems rely on long, complex terms or default configurations that users rarely understand. Overall, these views highlight how voice-first interaction blurs traditional boundaries between user action and system behaviour, complicating how responsibility is assigned when things go wrong. Audio-first AI also expands the attack surface, as security research has shown. Unlike text-based systems, voice interfaces must continuously process ambient sound from their surroundings, increasing exposure to unintended inputs and malicious triggers. As users often cannot see or review what the system has "heard", errors or abuse can occur without immediate detection, making security failures harder to spot and contest. Researchers behind the DolphinAttack paper demonstrated that attackers can embed voice commands into ultrasonic frequencies that humans cannot hear but microphones and speech recognition systems can still pick up. In simple terms, a device can be made to "hear" a command even though the user hears nothing at all. For audio-first systems designed to listen continuously, this means actions could be triggered without the user ever realising a command was issued. Advances in synthetic voice generation have also increased spoofing risks. 
In a widely cited case reported by Forbes, a fraudster used an AI-generated voice to impersonate a company executive and persuade an employee to transfer $243,000 (more than Rs 2.19 crore). For consumer-facing devices, the context differs, but the underlying lesson remains. If a system treats voice as proof of identity, it may be vulnerable to replayed recordings, synthetic voices, or other forms of impersonation. Speaker recognition, which attempts to verify a user based on vocal characteristics, can reduce some risk. But it cannot reliably distinguish between a real person and a high-quality imitation. Consequently, as audio-first systems become more widespread, these vulnerabilities turn voice-driven technology from a convenience into a potential point of failure.
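To make that last point concrete, below is a toy sketch of threshold-based speaker verification. The embed_voice function is a deliberately crude, hypothetical stand-in for a trained speaker-embedding network, and the signals are synthetic; the point is only that acceptance reduces to a similarity score, which a faithful replay or high-quality clone can clear just as easily as the genuine speaker.

```python
# Toy illustration of why "voice as identity" is fragile. embed_voice() is a
# hypothetical stand-in for a trained speaker-embedding model; real systems use
# neural embeddings, but the decision still reduces to a similarity threshold.
import numpy as np

SR = 16_000                      # sample rate (Hz)
t = np.arange(SR) / SR           # one second of timestamps

def embed_voice(waveform, bands=64):
    """Toy 'voiceprint': average FFT magnitude in coarse frequency bands."""
    spectrum = np.abs(np.fft.rfft(waveform))
    return np.array([chunk.mean() for chunk in np.array_split(spectrum, bands)])

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Synthetic stand-ins for voices: harmonic stacks at different pitches.
genuine = np.sin(2 * np.pi * 120 * t) + 0.5 * np.sin(2 * np.pi * 240 * t)
stranger = np.sin(2 * np.pi * 210 * t) + 0.5 * np.sin(2 * np.pi * 420 * t)
replay = genuine + 0.01 * np.random.default_rng(0).normal(size=SR)  # near-perfect copy

enrolled = embed_voice(genuine)
THRESHOLD = 0.75                 # accept the speaker above this similarity

for label, attempt in [("replayed/cloned audio", replay), ("different speaker", stranger)]:
    score = cosine(enrolled, embed_voice(attempt))
    print(f"{label}: similarity={score:.2f} -> accepted={score >= THRESHOLD}")
```

In this toy run the near-identical replay scores close to 1.0 and is accepted, while the mismatched "stranger" falls well below the threshold, which is exactly the behaviour a sufficiently good synthetic clone would exploit.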
[11]
OpenAI ramps up audio AI efforts ahead of new device launch: Report
OpenAI is reportedly accelerating its work on audio AI as it prepares to launch a new personal device expected in about a year. According to The Information, the company has merged several engineering, product, and research teams over the past two months to overhaul its audio models, signalling a major shift toward voice-first technology. At the centre of this push is a new audio model planned for early 2026. The model is expected to sound more natural, deal with interruptions smoothly, and even speak while a user is talking, which today's AI systems still struggle with. OpenAI is also said to be exploring a lineup of devices, possibly including screenless speakers or smart glasses, designed to act more like companions than traditional gadgets. This move fits into a much broader trend across the tech industry. Many companies now believe audio will become the main way people interact with technology, with screens fading into the background. Voice assistants are already common, with smart speakers used in more than a third of US homes, reports TechCrunch. Former Apple design chief Jony Ive, now involved in OpenAI's hardware efforts through the company's acquisition of his firm io, sees audio-first design as a way to reduce screen addiction and "right the wrongs" of past consumer devices. In related news, a tipster recently shared that OpenAI and Ive's team are working on a device that could be a pen or "a 'to-go' audio device." The tipster also claimed that the gadget will likely be manufactured by Foxconn, the same company that builds iPhones.
OpenAI has restructured engineering, product, and research teams to develop a new audio model launching in Q1 2026. The company plans an audio-first personal device in about a year, working with former Apple design chief Jony Ive. The move signals a broader industry shift toward screenless, audio-centric interfaces as Silicon Valley reimagines how users interact with AI technology.
OpenAI has unified several engineering, product, and research teams over the past two months to overhaul its audio models, according to reporting from The Information [1]. The restructuring aims to prepare for an audio-first personal device expected to launch in about a year [2]. The company's new audio model architecture is slated for release in the first quarter of 2026 [3]. This advanced audio model will produce responses that sound more natural and emotive while providing more accurate, in-depth answers [2]. Unlike today's models, the new system will handle interruptions like an actual conversation partner and even speak while users are talking, a capability current ChatGPT audio features cannot manage [1].
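For context on why speaking over the user is hard for today's assistants: current voice features are typically built as a strict turn-taking pipeline, so nothing is generated until the user's utterance has fully ended. The sketch below illustrates that pattern using OpenAI's existing, publicly documented audio endpoints; the model names and voice used here ("whisper-1", "gpt-4o-mini", "tts-1", "alloy") are illustrative assumptions, not details of the unreleased model.

```python
# A minimal sketch of the turn-based voice pattern today's assistants use:
# transcribe a finished utterance, generate a reply, then synthesize speech.
# Nothing runs until the user stops talking, which is the limitation the
# reported full-duplex model is meant to remove.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def voice_turn(audio_path: str, reply_path: str = "reply.mp3") -> str:
    # 1. Speech-to-text: only possible after the user's turn has ended.
    with open(audio_path, "rb") as f:
        transcript = client.audio.transcriptions.create(model="whisper-1", file=f)

    # 2. Text reasoning on the complete transcript.
    chat = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": transcript.text}],
    )
    reply_text = chat.choices[0].message.content

    # 3. Text-to-speech: the assistant speaks only once the reply is complete.
    speech = client.audio.speech.create(model="tts-1", voice="alloy", input=reply_text)
    speech.write_to_file(reply_path)
    return reply_text

# voice_turn("user_utterance.wav")  # one strictly sequential turn, no overlap
```

A full-duplex model of the kind described above would collapse these three sequential stages into a single streaming loop that can listen and speak at the same time.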
Former Apple design chief Jony Ive joined OpenAI's hardware efforts through the company's $6.5 billion acquisition of his firm io in May [1]. Ive has made reducing device addiction a priority, seeing audio-first design as a chance to "right the wrongs" of past consumer gadgets [5]. The collaboration marks the next phase of a two-year partnership between his design firm LoveFrom and OpenAI to lead design for the AI giant's future hardware and software [3]. OpenAI envisions a family of devices that act less like tools and more like companions [1]. Among the ideas discussed are smart glasses and a smart speaker without a display [2]. According to a Wall Street Journal report from last year, these AI companion devices would be fully aware of users' surroundings while offering an unobtrusive experience as standalone units [3].
The move reflects where the entire tech industry is headed: toward a future where screens become background noise and audio takes center stage [1]. Smart speakers have already made voice assistants a fixture in more than a third of U.S. homes [5]. Meta just rolled out a feature for its Ray-Ban smart glasses that uses a five-microphone array to help users hear conversations in noisy rooms [1]. Google began experimenting in June with Audio Overviews that transform search results into conversational summaries [5]. Tesla is integrating Grok and other LLMs into its vehicles to create conversational voice assistants that handle everything from navigation to climate control through natural, interruptible conversations [1]. Startups have pursued screenless, audio-centric interfaces with mixed results. The makers of the Humane AI Pin burned through hundreds of millions before their screenless wearable became a cautionary tale [1]. At least two companies, including Sandbar and one helmed by Pebble founder Eric Migicovsky, are building AI rings expected to debut in 2026 [5].
OpenAI's current flagship real-time audio model, gpt-realtime, uses the transformer architecture [4]. The company released its Whisper algorithm in 2022, which turns audio files into graphs called spectrograms before processing them [4]. Last August, OpenAI made its Realtime API generally available with new features and released gpt-realtime, which the company claimed is better at interpreting system messages and developer prompts [3]. OpenAI may seek to develop a lightweight, on-device audio model to support its move into consumer hardware, since processing prompts locally is more cost-efficient than sending them to the cloud; Google has taken a similar approach with its Pixel smartphone series, which uses an on-device model called Gemini Nano to power some AI features [4]. In July last year, OpenAI ramped up hiring across multiple positions in the consumer hardware sector, with positions open for a hardware systems product designer to help build the next generation of the world's most innovative mobile devices [3].
Summarized by Navi