8 Sources
[1]
The Gemini app just got the one feature everyone was asking for
Google has updated the Gemini app to handle any file type. This includes audio files, which was the top request. Audio length can be up to 10 minutes for free users. Around the beginning of August, we found evidence that Google was working on adding a highly requested feature to Gemini. Specifically, the AI app was getting support for audio file uploads. Fast forward to today, and the function is now live on Android, iOS, and on the web. On social media, Josh Woodward, VP of Google Labs and Gemini, announced that users can now upload any file to the Gemini app. According to Woodward, this also includes the number one request -- audio files. So when you tap or click on the plus icon and select "Upload files" (web) or "Files" (mobile), you'll be able to upload MP3 files, WAV files, and more. Along with this announcement, Google has updated its support document. According to the page, up to 10 files can be uploaded in a single prompt, and the total audio length can be no longer than 10 minutes. However, the total audio length increases to three hours if you pay for Google AI Pro or AI Ultra. While 10 minutes may not seem like much time, it still doubles the total amount of time for videos. As a refresher, video uploads can go up to five minutes, with that time extended to a full hour if you're a paid subscriber.
[2]
This much-requested Gemini feature just went live
Google Gemini's free tier gets a feature ChatGPT has had for months. For months, Google's Gemini app has been able to chew through images, PDFs, and even video uploads, but audio was conspicuously absent. That changes today. Google quietly flipped the switch on one of Gemini's most-requested features: audio file uploads. Expanded support, expanded use cases: a feature users have wanted for months. Vice President of Google Labs and Gemini Josh Woodward confirmed the rollout on X, calling it the "#1 request." The update is live across Android, iOS, and the web, letting you drop in MP3s, WAVs, and most other common formats by hitting the familiar "Upload files" option. As you might imagine, there is a small caveat. Free-tier Gemini users can upload up to 10 files at a time, but the total audio length across those uploads can't exceed 10 minutes. If you're on Google's paid tiers -- Gemini Advanced via AI Pro or AI Ultra -- that cap jumps significantly, to three hours. While it's not unlimited, it is relatively generous. By comparison, Gemini video uploads remain capped at five minutes for free users, with paid users allowed an hour. Audio doubles that free allowance while offering three times the headroom for paid plans, which makes sense for use cases like transcription, parsing meeting notes, or analyzing podcasts. The lack of audio support has been a weird omission since file uploads arrived earlier this year. Gemini could already summarize YouTube videos and handle short clips you threw at it, but recording a quick voice memo and asking the AI to work with it wasn't possible until now. This puts Gemini closer to feature parity with rivals like OpenAI's ChatGPT, which has supported audio uploads and transcription for months. Whether 10 minutes is enough for casual users is up for debate, and the disparity indicates Google wants audio processing to act as another subscription driver.
Regardless, if you've been waiting to throw a podcast snippet, lecture clip, or your own rambling voice notes into Gemini, now's your chance to try it.
[3]
Google Gemini Can Now Take Your Audio Files
Google's Gemini has finally added the ability to upload and analyze audio files. This new feature takes your audio files, including common formats like MP3, M4A, and WAV, and can transcribe, summarize, and extract key details from the content. The feature is now available on Android, iOS, and the web. You can access the new feature through the plus menu on the Gemini mobile app or the Upload files option on the web. From there, just select an audio file from your device. It will then analyze whatever you put into it and make it incredibly easy to find details in your content, whether it's a recorded meeting, an interview, a lecture, or even a personal voice note. Unfortunately, the new transcription service comes with tiered usage limits, which will be different for free users and those with a paid subscription. For users on the free tier, the total audio length that can be uploaded and analyzed is capped at 10 minutes. This is incredibly generous of Google, and it offers more time for audio files than any other free transcription service I've seen. The time limit isn't the only restriction to look out for. You can upload up to 10 files of any supported format on a single prompt by default. This includes code folders with up to 5,000 files, GitHub repositories, and ZIP files containing up to 10 compressed files. The audio update does not expand this limit, but it counts toward the 10-file limit of what you can upload at once. If you're going to use it to transcribe, I'd recommend giving the script back to Gemini and asking if there is anything there that isn't in the audio file. This is just in case the AI messes up at any point, because 10 minutes to three hours is a long time for any AI, and I personally wouldn't completely trust it not to confuse words or hallucinate. Keep in mind that once an audio file is uploaded, Gemini can do more than simply convert it to text. 
Users can prompt the AI to summarize the key points, identify different speakers, or even extract specific action items or quotes. This turns a raw audio file into a structured, searchable, and highly useful document. For power users and professionals who need more extensive transcription capabilities, Google is offering significantly higher limits. Subscribers to Google AI Pro or Google AI Ultra can upload up to three hours of audio. This is a huge increase that makes the service great for transcribing long-form content like podcasts, full-length interviews, or seminars. I can imagine anyone who runs a business or works in transcription could take advantage of the low $20 monthly cost of the AI Pro plan. I have saved a lot of time putting YouTube links into Gemini to find a spot I'm looking for in hour-long videos. Gemini is great at paying attention to what is happening in video links, so I know this upgrade for audio is likely to be really helpful for users. Source: Google, 9to5Google
[4]
You can now upload audio files to the Gemini app
The Gemini app finally supports audio uploads on Android, iOS, and the web for expanded file analysis options. Open Gemini's 'plus' menu for "Files" (mobile) or "Upload files" (web) and select any audio file: MP3, M4A, WAV, etc. Google says "total audio length can be up to 10 minutes" for free users, while it's 3 hours for Google AI Pro or Google AI Ultra subscribers. This can be useful for transcribing audio, with video previously supported. It's a great quality-of-life improvement, with Google noting today how this has been a #1 request. This joins video, which can be up to 5 minutes (free) or 1 hour (paid) and up to 2 GB in size, while "all other supported file types can be up to 100 MB."
[5]
Gemini just got a new highly-requested feature that trumps ChatGPT
The feature turns 10 minutes of voice memos, meetings, lectures, and interviews into searchable documents. Google Gemini has just learned how to listen and make sense of what it hears. You can now upload audio files to the AI assistant on the web or through the mobile apps and get transcriptions, summaries, and key details. For anyone who's ever let a voice memo rot in their phone or dreaded the task of rewatching a meeting recording, this update could be the AI equivalent of hiring a personal note-taker. That said, it can only handle 10 minutes of audio at a time, so no long meetings just yet. You can upload the audio files directly by selecting audio from the usual file upload options. What makes it different from Gemini's earlier Gemini Live voice features is that this isn't just speaking to the AI in real time. Gemini Live is useful for casual commands, but this is more about getting the AI to process data as it does with the other formats. Notably, audio file uploading has apparently been the most requested feature from users, according to Google's VP of Gemini Josh Woodward. I tested it by uploading a couple of sketches from old comedy albums and a phone conversation with a friend. The AI successfully transcribed all the words said in each case, with a couple of small name-related errors. It was also good at pulling out key elements and things set for a to-do list. The demand for audio and Google's response hint at how AI tools are evolving to match how we save information in audio logs and voice memos. Turning that into something searchable has usually meant using external transcription software. Gemini's new feature collapses that process into a single step. What makes the addition feel particularly timely is the way it dovetails with other recent Gemini improvements. Google has already integrated Gemini into various apps, begun testing a card-based visual interface, and significantly expanded Gemini's personalization options.
The ability to process audio continues that trend. The audio option isn't unique to Gemini among AI assistants, but it can at least match some of what ChatGPT can do thanks to its Whisper transcription model. In fact, in my testing, I preferred Google's offering. Anthropic's Claude also handles audio in some developer tools, and Perplexity can extract data from YouTube videos. But Gemini's execution is more focused on everyday use cases. And the output isn't just a dumb transcription. You can ask Gemini to simplify the language, extract speaker-specific comments, generate questions based on the content, or create a study guide from a classroom discussion. Of course, the 10-minute limit puts some restraint on making it part of everyday life. Free-tier users also face daily usage limits. Google hasn't released a formal pricing breakdown for high-volume audio processing, but it's part of the regular Gemini quota, so anyone planning to feed it a dozen hours of legal depositions should pace themselves.
[6]
Google's Gemini AI can now process and talk about audio files
Hey Gemini, give me a one-page summary of this two-hour lecture. Thanks! Google's Gemini AI is multi-modal, which means it can process and generate files in various formats, ranging from text and images to videos. Though it can generate audio, so far, it has lacked the ability to process audio files uploaded by users. That finally changes, as Gemini now lets you feed audio files and talk about them. What's the big change? The ability to upload audio files is now live in the Gemini mobile app and the web version, too. In the Gemini chat bubble, tap on the "+" icon and upload the audio clip by selecting the clip-shaped file upload icon. Oh, by the way, this feature is free for all Gemini users. According to Google's support page, you can upload audio clips of up to ten minutes duration. But if you pay for the Gemini AI Pro or Ultra bundles, you can upload audio files with a run time of up to 3 hours. In case you're curious about what other file formats you can feed to Gemini, here's a quick rundown: up to 10 files in one go, including ZIP files; video of up to 2GB in size, 5 minutes in length for free users and 1 hour for paying customers; and one code folder or one GitHub repository (up to 5,000 files / 100MB size). A boon for the bibliophiles. Not everyone loves digging into an audiobook, podcast, or lecture recording. Sometimes, walls of text are where the real magic happens, or it's where the cognitive comfort zone lies. If you count yourself among the folks who seek some aural liberation, this Gemini feature update is nothing short of a godsend. And yeah, audio support goes beyond the English language. Now, whether it's the summarization of a long lecture, or the need to extract only a few specific talking points from a podcast, Gemini will handle the audio and give you just what you want.
You can ask it to write long reports, short briefs, or even convert it into the form of knowledge slides that you can export as images. On the other end of the rope, we have the fantastic NotebookLM tool. It can turn your long text files into an engaging two-person audio podcast. If you prefer video overviews, it can do that, as well. And while at it, go and avail the free Gemini AI Pro offer that Google is offering to students in numerous countries, including the US.
[7]
Google Gemini now transcribes audio files
Google's Gemini AI assistant now allows audio file uploads, enabling users to transcribe, summarize, and extract key information from recordings. This new feature converts up to 10 minutes of voice memos, meetings, lectures, and interviews into searchable documents directly within the AI environment. Audio file uploads are supported on both the web and mobile applications. Users can access the feature through the standard file-upload interface. This differs from Gemini Live's real-time voice command processing, as the new function processes pre-recorded audio for data extraction and analysis. Josh Woodward, Google's VP of Gemini, stated that audio file upload was the most requested feature from Gemini users. This demand highlights a need for streamlined audio processing within the AI assistant. During testing, Gemini accurately transcribed various audio types, including comedy album sketches and phone conversations, with only minor errors in name recognition. The system also effectively identified key elements and generated to-do lists from the audio content. The addition of audio processing aligns with recent Gemini integrations, such as implementations into various apps, testing of a card-based visual interface, and expanded personalization options. These updates collectively enhance Gemini's functionality and user experience. While Gemini's audio capabilities are not unique, they are comparable to features from competitors like ChatGPT, which uses its Whisper transcription model. Anthropic's Claude also supports audio processing in certain developer tools, and Perplexity can extract data from YouTube videos. Gemini aims to focus on everyday use cases for a broad user base. Beyond simple transcription, Gemini allows users to request language simplification, extract speaker-specific comments, generate questions from audio content, or create study guides from recorded discussions. 
These options provide tools to efficiently manipulate and repurpose audio information. The current 10-minute limit on audio file uploads restricts its applicability for longer recordings. Free-tier users also face daily usage limits on audio processing. These limitations may impact users with extensive audio processing needs. Google has not released specific pricing for high-volume audio processing. However, audio processing is integrated into the regular Gemini quota. This suggests users should manage their usage to avoid exceeding allocated resources.
[8]
You Can Now Upload Audio Files to the Gemini App, But There's a Catch
A Google executive said this was a highly requested feature by users. Google is updating the Gemini app with a new capability. On Monday, a company executive announced that the mobile app version of the artificial intelligence (AI) platform will now support audio files as input. So far, the file upload feature has only accepted text files across a limited number of formats. With this, users can ask the AI chatbot to process and analyse audio files and answer queries based on them. Additionally, the Mountain View-based tech giant is also adding support for ZIP files. Gemini app can now analyse audio files: in a post on X (formerly known as Twitter), Josh Woodward, the Vice President of Google Labs, Gemini, and AI Studio, announced that the Gemini app was being upgraded to support more file formats. This capability is being added to both the Android and iOS versions of the apps. One of the supported formats also includes audio files, allowing users to upload music samples and interviews to the platform. Calling it the number one requested feature, Woodward highlighted that the new formats have been introduced in addition to the existing images, videos, text files, and PDFs. However, just like images and videos, audio files will also arrive with tier-based rate limits. Those on the free tier of Gemini can upload a maximum of 10 minutes of audio and get access to five free prompts per day, as per a support page. On the other hand, the Google AI Pro and Google AI Ultra users can upload files with up to three hours of audio. These users can also upload 10 files a day across all formats. Apart from images and videos, Gemini accepts a wide range of text file formats, including txt, doc, docx, PDF, RTF, dot, dotx, hwp, hwpx, and Google Docs. It also accepts data files such as xls, xlsx, csv, tsv, and Google Sheets. In addition, it will also support ZIP files. But a ZIP file can only contain a maximum of 10 files.
Notably, when uploading coding files or a GitHub repository, users can upload a maximum of 5,000 files with a maximum size of 100MB to a chat session.
Google has added audio file upload support to its Gemini AI app, allowing users to transcribe, summarize, and analyze audio content. This feature is available across Android, iOS, and web platforms, with varying time limits for free and paid users.
Google has rolled out a significant update to its Gemini AI app, introducing support for audio file uploads across Android, iOS, and web platforms [1][2]. This highly anticipated feature, described as the "#1 request" by Josh Woodward, VP of Google Labs and Gemini, allows users to upload and analyze various audio formats, including MP3, WAV, and M4A files [3].
The new audio upload capability enables Gemini to transcribe, summarize, and extract key details from uploaded content [3]. This feature proves particularly useful for processing recorded meetings, interviews, lectures, and personal voice notes. Users can prompt the AI to identify different speakers, extract specific action items, or generate summaries, transforming raw audio into structured, searchable documents [5].
Google has implemented tiered usage limits for the audio upload feature: free users can upload up to 10 minutes of total audio, while Google AI Pro and Google AI Ultra subscribers can upload up to 3 hours [1][3].
These limits apply per prompt, with users able to upload up to 10 files of any supported format in a single interaction [3].
The introduction of audio uploads brings Gemini closer to feature parity with rivals like OpenAI's ChatGPT, which has supported audio uploads and transcription for some time [2]. Notably, Gemini's 10-minute allowance for free users is considered generous compared to other free transcription services [3].
Gemini's video upload feature is limited to 5 minutes for free users and 1 hour for paid subscribers, so the audio allowance of 10 minutes (free) or 3 hours (paid) is double the video limit on the free tier and triple on paid plans [1][4].
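The reported limits can be expressed as a small pre-upload check. The following is only an illustrative sketch in Python, not Google's implementation or any Gemini API: the `Upload` class, the `check_prompt` function, and the tier labels `"free"` and `"paid"` are hypothetical names, and treating the video cap as a per-prompt total is an assumption rather than something the sources state.

```python
from dataclasses import dataclass

# Limits as reported in the sources above. All names here are hypothetical.
MAX_FILES_PER_PROMPT = 10
AUDIO_LIMIT_SECONDS = {"free": 10 * 60, "paid": 3 * 60 * 60}
# Assumption: the video cap is applied to the total length per prompt.
VIDEO_LIMIT_SECONDS = {"free": 5 * 60, "paid": 60 * 60}


@dataclass
class Upload:
    name: str
    kind: str  # "audio", "video", or "other"
    duration_seconds: float = 0.0


def check_prompt(uploads, tier):
    """Return a list of problems; an empty list means the prompt fits the limits."""
    problems = []
    if len(uploads) > MAX_FILES_PER_PROMPT:
        problems.append(
            f"{len(uploads)} files exceeds the {MAX_FILES_PER_PROMPT}-file limit"
        )
    # Sum durations per media type and compare against the tier's cap.
    for kind, limits in (("audio", AUDIO_LIMIT_SECONDS), ("video", VIDEO_LIMIT_SECONDS)):
        total = sum(u.duration_seconds for u in uploads if u.kind == kind)
        if total > limits[tier]:
            problems.append(
                f"total {kind} length {total:.0f}s exceeds {limits[tier]}s for tier '{tier}'"
            )
    return problems


if __name__ == "__main__":
    memo = Upload("memo.mp3", "audio", 8 * 60)
    clip = Upload("clip.wav", "audio", 4 * 60)
    print(check_prompt([memo, clip], "free"))
    print(check_prompt([memo, clip], "paid"))
```

In this sketch, two audio files totaling 12 minutes are flagged on the free tier but pass on a paid tier, which mirrors the total-length rule described above.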
The audio upload feature opens up numerous possibilities for users, and the update aligns with Google's recent efforts to enhance Gemini's functionality and integration across various applications, making it a more versatile tool for everyday use [5].
While the audio upload feature significantly expands Gemini's capabilities, users should be aware of potential limitations. The AI's accuracy in transcription and analysis may vary, especially with longer audio files or complex content. Users are advised to review AI-generated outputs for accuracy, particularly when dealing with important or sensitive information [3].
As Gemini continues to evolve, this new audio processing capability represents a significant step forward in making AI assistance more accessible and useful for a wide range of personal and professional applications.
Summarized by Navi