Meta releases SAM Audio AI model to isolate and edit sounds with simple text prompts

Reviewed by Nidhi Govil

4 Sources


Meta has launched SAM Audio, an open-source AI model that simplifies audio editing by isolating specific sounds from complex recordings using text, visual, or time-based prompts. The unified multimodal model can separate voices, instruments, and background noise without manual editing, though it raises questions about privacy safeguards and struggles with similar overlapping sounds.

Meta Unveils Open-Source AI Model for Prompt-Based Sound Separation

Meta has released SAM Audio, a new AI model designed to isolate and edit audio through simple prompts, marking the company's expansion of its Segment Anything Model family into the audio domain [1]. The open-source AI model is now available through Meta's Segment Anything Playground and for download via the company's website, GitHub, and Hugging Face [4]. Meta describes SAM Audio as "the first unified multimodal model for audio separation," capable of interpreting text prompts, visual selections in video, and time-segment markings to isolate specific sounds [1].

Source: SiliconANGLE


How the Unified Multimodal Model Works with Audio Editing

SAM Audio supports three distinct prompting methods that can be used individually or combined for precise control. Users can describe sounds using text prompts like "drum beat" or "background noise," click on people or objects in videos to visually identify sounds, or mark time spans where specific sounds first appear [4]. The core technology relies on the Perception Encoder Audiovisual engine, built on Meta's open-source Perception Encoder model released earlier this year [3]. This engine acts as the model's "ears," allowing it to comprehend described sounds, isolate them in audio files, and extract them without affecting other audio elements [3].

Source: Gadgets 360


Applications Span Music Production, Podcasting, and Accessibility

The AI model addresses use cases across multiple industries, including music production, podcasting, film and television, and scientific research [2]. Creators can clean up noisy recordings by removing traffic sounds from podcasts, isolate vocals from band recordings, or delete unwanted barking dogs from video presentations [3]. Meta has partnered with US hearing aid manufacturer Starkey to explore potential integrations and is working with 2gether-International, an accelerator for disabled startup founders, to develop accessibility solutions [1]. The model runs faster than real time, with a real-time factor (RTF) of about 0.7, across model sizes from 500M to 3B parameters [3].
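To put the speed figure in concrete terms: a real-time factor is conventionally the processing time divided by the duration of the audio, so any value below 1 means faster than real time. The short Python sketch below only does that arithmetic. The function names are illustrative and are not part of SAM Audio, and the 0.7 default is simply the figure quoted above; actual speed would depend on hardware and model size.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration (RTF < 1 is faster than real time)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def estimated_processing_seconds(audio_seconds: float, rtf: float = 0.7) -> float:
    """Estimate how long a clip takes to process at a given RTF.

    The default of 0.7 is the approximate figure quoted in the article,
    not a measured value.
    """
    if audio_seconds < 0:
        raise ValueError("audio_seconds must not be negative")
    return audio_seconds * rtf


if __name__ == "__main__":
    clip_seconds = 60.0  # a one-minute recording
    estimate = estimated_processing_seconds(clip_seconds)
    print(f"{clip_seconds:.0f} s of audio at RTF 0.7: about {estimate:.0f} s of processing")
```

At the quoted figure, a one-minute clip would take roughly 42 seconds to process.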

Source: Digital Trends


Privacy Concerns and Technical Limitations Emerge

Questions about safety features have surfaced given the model's ability to isolate specific sounds based on user prompts, which could open avenues for surveillance. When asked about safeguards, Meta stated only that use of SAM Audio must comply with applicable laws and regulations, including data protection laws, without detailing built-in protections [1]. The company acknowledged several limitations: SAM Audio cannot perform complete audio separation without prompting, does not support audio-based prompts, and struggles with highly similar audio events, such as isolating one voice from a choir or a single instrument from an orchestra [1][2].

New Benchmark Measures Audio Separation Performance

To advance the field of audio separation, Meta introduced SAM Audio-Bench, a benchmark covering speech, music, and sound effects across text, visual, and span-prompt types [3]. The company also released SAM Audio Judge, which evaluates how natural and accurate separated audio sounds to human listeners without requiring reference tracks [2]. Performance evaluations show SAM Audio achieves state-of-the-art results in modality-specific tasks, with mixed-modality prompting delivering stronger outcomes than single-modality approaches [3]. The launch connects to Meta's broader AI strategy, including improving voice clarity on AI-powered glasses for noisy environments and developing conversational AI to rival ChatGPT [2].


© 2026 Triveous Technologies Private Limited