Nvidia's Fugatto: A Revolutionary AI Model for Audio Generation and Transformation

24 Sources

[1]

PCWorld

Nvidia's new AI model can create 'unheard sounds' like never before

The new Fugatto AI model takes sound synthesis to a new level by creating sounds that previously haven't existed. Nvidia has been instrumental in the current AI boom, but primarily as the manufacturer of GPUs that power all the next-gen AI processing tasks. But the company isn't content with just providing shovels to all the diggers. It has joined the fray with its own AI model that does something truly novel. As reported by Ars Technica, Nvidia's new AI model is called Fugatto and it combines new AI training methods and technologies to transform music, voices, and other sounds in ways that have never been done before, to create soundscapes never before experienced. Fugatto is based on an advanced AI architecture with 2.5 billion parameters, trained on over 50,000 hours of annotated audio data. The model uses a technique called ComposableART (Audio Representation Transformation), which can combine and control different sound properties based on text or audio prompts. The result is completely new sound combinations that weren't present in the training material. For example, Fugatto can generate audio of a violin that sounds like a laughing child, or a factory machine that screams in metallic pain. The model also allows fine-tuning of specific characteristics, such as amplifying or reducing French accents or adjusting the degree of sadness in a voice recording. In addition to combining and transforming sounds, Fugatto can perform classic AI audio tasks, such as changing the emotion of a voice, isolating vocals in music, or adapting musical instruments to new sound sources. For all the nitty-gritty details, you can read about Fugatto in Nvidia's official white paper (PDF). Otherwise, check out the Fugatto page with examples of emergent sounds and emergent tasks.

[2]

NDTV Gadgets 360

Nvidia's New AI Model Can Generate and Mix Different Types of Audio

Nvidia says Fugatto has the ability to combine free-form instructions. Nvidia introduced a new artificial intelligence (AI) model on Monday that can generate a variety of audio and mix different types of sounds. The tech giant calls the foundation model Fugatto, which is short for Foundational Generative Audio Transformer Opus 1. While audio-focused AI platforms such as Beatoven and Suno exist, the company highlighted that Fugatto offers users granular control over the desired output. The AI model can generate or transform any mix of music, voices and sound based on specific prompts. In a blog post, the tech giant detailed its new foundation model. Nvidia said Fugatto can generate music snippets, remove or add instruments from an existing song, change accent or emotion in a voice, and "even let people produce sounds never heard before." The AI model accepts both text and audio files as input, and users can combine both to fine-tune their requests. Under the hood, the foundation model's architecture is based on the company's previous work in speech modelling, audio vocoding, and audio understanding. Its full version uses 2.5 billion parameters and was trained on a bank of Nvidia DGX systems. Nvidia highlighted that the team that built Fugatto collaborated from different countries including Brazil, China, India, Jordan, and South Korea. The diversity of the team also contributed to the AI model's multi-accent and multilingual capabilities, the company said. Turning to the model's capabilities, the tech giant highlighted that it can generate audio output types that it was not pre-trained on. Highlighting an example, Nvidia said, "Fugatto can make a trumpet bark or a saxophone meow. Whatever users can describe, the model can create." Additionally, Fugatto can combine specific audio capabilities using a technique called ComposableART.
With this, users can ask the AI model to generate audio of a person speaking French with a sad feeling. Users can also control the degree of sorrow and the heaviness of the accent with specific instructions. Further, the foundation model can also generate audio with temporal interpolation, or sounds that change over time. For instance, users can generate the sound of a rainstorm with crescendos of thunder that fade into the distance. Users can also experiment with these soundscapes, and the model can create them even if it has never processed such a sound before. At present, the company has not shared any plans to make the AI model available to users or enterprises.
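The weighted combination of instructions and the time-varying control described above can be pictured with a small sketch. This is not Nvidia's implementation: it assumes that each instruction yields a numeric prediction vector and that those vectors are blended around an unconditional baseline, in the style of classifier-free guidance. The function names `compose_guidance` and `interpolate_weights`, the two-element toy vectors, and the instruction labels are all hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def compose_guidance(
    unconditional: Vector,
    conditioned: Dict[str, Vector],
    weights: Dict[str, float],
) -> Vector:
    """Blend per-instruction predictions around an unconditional baseline.

    Each instruction contributes weight * (prediction - baseline), so a
    larger weight pushes the output further toward that instruction.
    Assumption: this guidance-style blend is illustrative only.
    """
    combined = list(unconditional)
    for name, prediction in conditioned.items():
        weight = weights.get(name, 0.0)  # instructions without a weight are ignored
        for i, (p, u) in enumerate(zip(prediction, unconditional)):
            combined[i] += weight * (p - u)
    return combined


def interpolate_weights(
    start: Dict[str, float],
    end: Dict[str, float],
    steps: int,
) -> List[Dict[str, float]]:
    """Linearly move every weight from `start` to `end` over `steps` frames."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    schedule = []
    for k in range(steps):
        t = k / (steps - 1)  # 0.0 at the first frame, 1.0 at the last
        schedule.append(
            {name: (1 - t) * start[name] + t * end[name] for name in start}
        )
    return schedule
```

With a zero baseline, toy predictions `{"french_accent": [1.0, 0.0], "sad": [0.0, 2.0]}` and weights `{"french_accent": 0.5, "sad": 1.5}`, `compose_guidance` returns `[0.5, 3.0]`: a light accent and heavy sadness. Feeding `interpolate_weights` a "thunder" weight that goes from 1.0 to 0.0 gives a per-frame schedule in which the thunder fades while the rain stays constant.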

[3]

Ars Technica

Nvidia's new AI audio model can synthesize sounds that have never existed

At this point, anyone who has been following AI research is long familiar with generative models that can synthesize speech or melodic music from nothing but text prompting. Nvidia's newly revealed "Fugatto" model looks to go a step farther, using new synthetic training methods and inference-level combination techniques to "transform any mix of music, voices and sounds," including the synthesis of sounds that have never existed. While Fugatto isn't available for public testing yet, a sample-filled website showcases how Fugatto can be used to dial a number of distinct audio traits and descriptions up or down, resulting in everything from the sound of saxophones barking to people speaking underwater to ambulance sirens singing in a kind of choir. While the results on display can be a bit hit or miss, the vast array of capabilities on display here helps support Nvidia's description of Fugatto as "a Swiss Army knife for sound." In an explanatory research paper, over a dozen Nvidia researchers explain the difficulty in crafting a training dataset that can "reveal meaningful relationships between audio and language." While standard language models can often infer how to handle various instructions from the text-based data itself, it can be hard to generalize descriptions and traits from audio without more explicit guidance. To that end, the researchers start by using an LLM to generate a python script that can in turn create a large number of template-based and free-form instructions describing different audio "personas" (e.g. "standard, young-crowd, thirty-somethings, professional"). They then generate a set of both absolute (e.g. "synthesize a happy voice") and relative (e.g. "increase the happiness of this voice") instructions that can be applied to those personas. The wide array of open source audio datasets used as the basis for Fugatto generally don't have these kinds of trait measurements embedded in them by default. 
But the researchers make use of existing audio understanding models to create "synthetic captions" for their training clips based on their prompts, creating natural language descriptions that can automatically quantify traits such as gender, emotion, and speech quality. Audio processing tools are also used to describe and quantify training clips on a more acoustic level (e.g. "fundamental frequency variance" or "reverb").
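The instruction-generation step described above can be illustrated with a short sketch. It is a toy reconstruction, not the script from the paper: the persona names and the two example instructions come from the article, while the wrapper phrasings, the trait list, and the function name `build_instructions` are assumptions made for illustration.

```python
from typing import Dict, List

# Persona names are taken from the article; the phrasings are hypothetical.
PERSONA_WRAPPERS: Dict[str, str] = {
    "standard": "{body}",
    "young-crowd": "hey, can you {body}?",
    "thirty-somethings": "could you {body}, please?",
    "professional": "Please {body}.",
}

# Adjective -> noun form, used by absolute and relative templates respectively.
# Assumption: a real trait list would be far longer.
TRAITS: Dict[str, str] = {"happy": "happiness", "sad": "sadness"}

DIRECTIONS = ("increase", "decrease")


def build_instructions() -> List[Dict[str, str]]:
    """Expand every persona x trait pair into absolute and relative prompts."""
    instructions = []
    for persona, wrapper in PERSONA_WRAPPERS.items():
        for adjective, noun in TRAITS.items():
            # Absolute instruction: describes the target outright.
            instructions.append({
                "persona": persona,
                "kind": "absolute",
                "text": wrapper.format(body=f"synthesize a {adjective} voice"),
            })
            # Relative instructions: describe a change to an existing clip.
            for direction in DIRECTIONS:
                instructions.append({
                    "persona": persona,
                    "kind": "relative",
                    "text": wrapper.format(
                        body=f"{direction} the {noun} of this voice"
                    ),
                })
    return instructions
```

Four personas, two traits, and one absolute plus two relative forms per trait yield 24 instructions, including the article's own examples "synthesize a happy voice" and "increase the happiness of this voice" under the "standard" persona.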

[4]

TechSpot

Nvidia's Fugatto AI sound model is set to transform audio production

Forward-looking: A team of researchers from around the globe working with Nvidia have crafted what's being described as a Swiss Army knife for sound - an AI model capable of generating or transforming virtually any mix of music using any combination of audio files or text prompts. The new model is known as Fugatto, which is short for Foundational Generative Audio Transformer Opus 1. According to Nvidia, its capabilities are unparalleled. For example, Fugatto can create a tune based solely on text, change the emotion in a singer's voice or modify their accent, and even add or remove instruments from an existing song. Fugatto could revolutionize the music creation process. With it, a producer could quickly prototype an idea for a new song complete with custom voice styles and instruments, or adjust effects in an existing track. Ido Zmishlany, a multi-platinum producer and songwriter, believes AI and tools like Fugatto will help write the next chapter of music. That said, the model isn't limited to music production. Nvidia highlighted several alternate use cases, such as an advertising agency using it to modify voiceovers in a campaign to accommodate different regions, situations, or languages. The model could also help enhance language learning tools by allowing a user to customize the voice of the speaker, like making it sound like a friend or family member. Video game developers could use the tool to create new assets on the fly based on player inputs, or modify pre-recorded assets to best fit the level of on-screen action at any given time. Rafael Valle, one of the researchers who worked on the project, said they wanted to create a model that understands and generates sound like humans do. More than a year of work went into crafting the full version of Fugatto, which uses 2.5 billion parameters. Nvidia said the model was trained on a group of DGX systems powered by 32 Nvidia H100 Tensor Core GPUs.
Unfortunately, a timeline on when Fugatto might be released to the public was not shared.

[5]

Tom's Guide

Meet Fugatto -- an impressive new AI sound model from Nvidia

Graphics and AI giant NVIDIA has announced a new AI model called Fugatto (short for Foundational Generative Audio Transformer Opus 1). Developed by an international team of researchers, it is being billed as "the world's most flexible sound machine," taking on ElevenLabs and AI music maker Suno in one hit. With this model, we're about to witness a completely new paradigm in how sound and audio are manipulated and transformed by AI. It goes way beyond converting text to speech or producing music from text prompts and delivers some genuinely innovative features we haven't seen before. Fugatto isn't currently available to try as it's only a research paper, but it is likely it will be made available to one or more Nvidia partners in the future, and then we will start to see some significant changes in how sound is developed. Key to Nvidia Fugatto is its ability to exhibit emergent capabilities, which the team is calling ComposableART. This means it can do things it was not trained to do, by combining different capabilities together in new ways. The authors of the launch research paper describe how the model can produce a cello that shouts with anger, or a saxophone that barks. It may sound silly but some of the demonstrations seen on the project's homepage are very impressive. Examples include instantly converting speech into different accents and emotional intensities, or seamlessly adding and deleting instruments in an existing music performance. We have seen some of this from other models such as OpenAI's Advanced Voice, ElevenLabs SFX model or Google's MusicFX experiment, but not in one model. One of the most striking examples the team puts forward is the on-the-fly generation of complex sound effects, some of which are completely new or wacky. Video game developers and those in the movie industry will either be salivating or sweating at the news that almost any kind of soundscape will soon be AI-generated at the touch of a button.
The power of all this technology is delivered via a model that features 2.5 billion parameters and was trained on a huge suite of Nvidia computer processors, as you might expect. As with many of these early research demonstrations, it's likely to be a while before we see a fully fledged product released onto the market. Creating a four-second audio clip of a thunderstorm or a mechanical monster is one thing, making it usable in the real world is another. However, there's no question that the technology behind this new model shows that an important bridge has been crossed in the ability of the machine to master another art form. It may be the first time we've seen AI generational power of this type, but it's certainly not going to be the last.

[6]

Dataconomy

NVIDIA introduces Fugatto as "world's most flexible sound machine"

NVIDIA has unveiled Fugatto, a generative AI model capable of creating and modifying audio content. The model aims to assist music producers, film creators, and game developers by allowing them to generate novel sounds through text prompts. Fugatto combines various audio generation capabilities, employing advanced algorithms to enhance creative processes in the audio industry. Fugatto, short for Foundational Generative Audio Transformer Opus 1, was introduced by NVIDIA, the world's leading supplier of chips and software for AI systems. The technology can generate and alter sound from existing audio files, making it distinct from previous models. For instance, it can transform a piano melody into a human voice or modify a spoken recording's accent and emotional tone. This flexibility allows creators to explore a range of innovative applications across different fields. The team behind Fugatto consists of over a dozen researchers, including Rafael Valle, NVIDIA's applied audio research manager. Valle emphasized the goal of the project: "We wanted to create a model that understands and generates sound like humans do." Key to Fugatto's design is its ability to integrate multiple tasks related to audio generation and transformation, showcasing emergent properties that arise from its extensive training data. Users can instruct Fugatto with free-form prompts to create soundscapes, music snippets, or even unique sound effects. For example, a producer could quickly prototype different styles or instruments for a track. Notably, Fugatto features techniques like ComposableART, allowing users to combine multiple commands. Testing revealed surprising results, according to Rohan Badlani, an AI researcher involved with the model, who described the experience as artistically rewarding despite his technical background. The full version of Fugatto uses 2.5 billion parameters and was trained on NVIDIA DGX systems featuring 32 H100 Tensor Core GPUs.
The model's training relied on a diverse, blended dataset comprising millions of audio samples, enhancing its multi-accent and multilingual functionality. This ambitious project also took over a year to develop, with the team overcoming several challenges in data generation and model training. Fugatto offers several potential applications, including for advertising agencies and language learning platforms. It's been suggested that marketing campaigns could benefit from its ability to tailor voiceovers with different accents or moods. In education, learners might enjoy personalized courses featuring familiar voices. Game developers could adapt in-game audio dynamically, integrating interactive elements that respond to user actions. While Fugatto's capabilities are impressive, NVIDIA has not announced immediate plans to release this technology to the public. The company expresses concern over potential misuse of generative AI, with Bryan Catanzaro, NVIDIA's vice president of applied deep learning research, highlighting the importance of caution given the risks associated with such technology. OpenAI and other firms in the field face similar challenges regarding the responsible deployment of their models, particularly concerning intellectual property rights and misinformation.

[7]

Softonic

Nvidia presents Fugatto, its revolutionary artificial intelligence model for audio and music

The company does not plan to release the model to the public immediately. Nvidia has announced Fugatto, an innovative artificial intelligence model designed to generate music, sound effects, and modify voices, aimed at music, film, and video game creators. This model, whose full name is Foundational Generative Audio Transformer Opus 1, can do everything from creating unique sounds to transforming existing pieces, although the company has made it clear that it does not plan to release it to the public immediately. Nvidia's proposal joins other similar models, such as ElevenLabs or Descript, which generate voice and audio from textual descriptions. The company has highlighted Fugatto's ability to create novel sounds, such as making a trumpet sound like a dog's bark. Additionally, it can modify existing audio, for example, transforming a melody played on the piano into a vocal line or changing the accent and emotion in a spoken recording. "Music has changed a lot in the last 50 years thanks to computers and synthesizers," explained Bryan Catanzaro, vice president of deep learning research at Nvidia. "I believe generative AI will bring new capabilities to music, video games, and anyone who wants to create." However, all these advances are not without ethical and legal issues, especially in the entertainment industry. The relationship between Hollywood and technology is very tense. For example, a few months ago, actress Scarlett Johansson accused OpenAI of using her voice in the advanced voice mode of ChatGPT, which allows for fluid conversations with the AI. Nvidia emphasized that Fugatto has been trained with open-source data and is still evaluating how and when it could be publicly accessible, aware of the inherent risks. The potential misuse of generative models, including the creation of false information or copyright infringement, is one of the biggest concerns that AI companies have (or should have).
Neither Nvidia nor other companies, such as OpenAI and Meta, have yet determined the measures to prevent these issues or set release dates for their most advanced models.

[8]

SiliconANGLE

Nvidia's new music generation model Fugatto creates 'never before heard sounds'

Nvidia Corp. today joined the likes of Meta Platforms Inc., OpenAI and Runway AI Inc. in releasing a generative artificial intelligence model that's designed to create 'new' music and audio from human language prompts. According to the chipmaker, the new model is called Fugatto (Foundational Generative Audio Transformer Opus 1), and it's uniquely able to modify human voices and create "novel sounds" that no other model can produce. Nvidia, which is better known for making the powerful graphics processing units that power AI models, has not publicly released the model yet, due to concerns around safety. The company said Fugatto is different from other music and audio generation models because it has the ability to absorb and modify existing sounds. For instance, it can listen to a musical segment played on a piano, and transform that sound into notes sung by a human voice, or an alternative instrument like a violin. It can also take a human voice recording and alter the accent and mood expressed in the singing. It's perhaps deceiving to say that Fugatto's sounds are entirely novel, because like all AI models, the outputs come from an algorithm that uses existing data sources to try and create something that satisfies the user's prompted requests. Even so, Nvidia says Fugatto is able to "create soundscapes it's never seen before" by overlaying two distinct audio effects to create something original. In a video posted on YouTube, the company demonstrates how Fugatto can generate the sound of a train that slowly morphs into an orchestral performance, change happy voices into angry ones, and so on. Such capabilities haven't been seen before in an audio-generation model, Nvidia claims. Furthermore, beyond basic prompt engineering, Fugatto also comes with more fine-grained controls for users to edit the soundscapes they create.
Nvidia's vice president of Applied Deep Learning Research, Bryan Catanzaro, told Reuters that generative AI has the potential to impact music production in the same way that electronic synthesizers did. "If we think about synthetic audio over the past 50 years, music sounds different now because of computers," he said. "Generative AI is going to bring new capabilities to music, to video games and to ordinary folks that want to create things." Nvidia isn't the first company to try its hand at generative AI music creation. Last month, Meta debuted a new model called Movie Gen, which is able to create both video and soundscapes for the short movies it generates. Nvidia didn't say much about the data used to train Fugatto, other than it's made up of "millions of audio samples" that come from open-source data. The company also confirmed that it doesn't have any plans to make Fugatto available to AI developers just yet, similar to Meta, which also declined to do so. According to Catanzaro, his team is still debating how it can release the model to the public safely. "Any generative technology always carries some risks, because people might use that to generate things that we would prefer they don't," he said. "We need to be careful about that, which is why we don't have immediate plans to release this." In addition to the safety concerns, Nvidia is no doubt mindful of potential copyright issues. In June, record labels including Sony Music Entertainment, Warner Music Group Inc. and Universal Music Group N.V. filed lawsuits against the generative AI music startups Suno Inc. and Uncharted Labs Inc., accusing them of "widespread infringement" of copyrighted sound recordings at an "almost unimaginable scale." The relationship between AI and Hollywood is just as tense.
While some AI firms, like OpenAI, are trying to negotiate with Hollywood studios over the use of their data, the actress Scarlett Johansson has openly accused OpenAI of cloning her voice and has threatened to take legal action against the company.

[9]

engadget

NVIDIA's new AI model Fugatto can create audio from text prompts

NVIDIA has debuted a new experimental generative AI model, which it describes as "a Swiss Army knife for sound." The model, called Foundational Generative Audio Transformer Opus 1, or Fugatto, can take commands from text prompts and use them to create audio or to modify existing music, voice and sound files. It was designed by a team of AI researchers from around the world, and NVIDIA says that made the model's "multi-accent and multilingual capabilities stronger." "We wanted to create a model that understands and generates sound like humans do," said Rafael Valle, one of the researchers behind the project and a manager of applied audio research at NVIDIA. The company listed some possible real-world scenarios wherein Fugatto could be of use in its announcement. Music producers, it suggested, could use the technology to quickly generate a prototype for a song idea, which they can then easily edit to try out different styles, voices and instruments. People could use it to generate materials for language learning tools in the voice of their choice. And video game developers could use it to create variations of pre-recorded assets to fit changes in the game based on the players' choices and actions. In addition, the researchers found that the model can accomplish tasks not part of its pre-training, with some fine-tuning. It could combine instructions that it was trained on separately, such as generating speech that sounds angry with a specific accent or the sound of birds singing during a thunderstorm. The model can generate sounds that change over time, as well, like the pounding of a rainstorm as it moves across the land. NVIDIA didn't say if it will give the public access to Fugatto, but the model isn't the first generative AI technology that can create sounds out of text prompts. Meta previously released an open source AI kit that can create sounds from text descriptions.
Google has its own text-to-music AI called MusicLM that people can access through the company's AI Test Kitchen website.

[10]

Digital Trends

Nvidia's new AI model makes music from text and audio prompts

Nvidia has unveiled a new generative audio AI model that is capable of creating myriad sounds, music, and even voices, based on the user's simple text and audio prompts. Dubbed Fugatto (aka Foundational Generative Audio Transformer Opus 1), the model can, for example, create jingles and song snippets based solely on text prompts, add or remove instruments and vocals from existing tracks, modify both the accent and emotion of a voice, and "even let people produce sounds never heard before," per Monday's announcement post. "We wanted to create a model that understands and generates sound like humans do," said Rafael Valle, a manager of applied audio research at Nvidia. "Fugatto is our first step toward a future where unsupervised multitask learning in audio synthesis and transformation emerges from data and model scale." The company notes that music producers could use the AI model to rapidly prototype and vet song ideas in various musical styles with varying arrangements, or add effects and additional layers to existing tracks. The model could also be leveraged to adapt and localize the music and voiceovers of an existing ad campaign, or adjust the music of a video game on the fly as the player plays through a level. The model is even capable of generating previously unheard sounds like barking trumpets or meowing saxophones. In doing so, it uses a technique called ComposableART to combine the instructions it learned during training. "I wanted to let users combine attributes in a subjective or artistic way, selecting how much emphasis they put on each one," Nvidia AI researcher Rohan Badlani wrote in the announcement post. "In my tests, the results were often surprising and made me feel a little bit like an artist, even though I'm a computer scientist." The Fugatto model itself uses 2.5 billion parameters and was trained on 32 H100 GPUs. Audio AIs like this are becoming increasingly common.
Stability AI unveiled a similar system in April that can generate tracks up to three minutes in length while Google's V2A model can generate "an unlimited number of soundtracks for any video input." YouTube recently released an AI music remixer that generates a 30-second sample based on the input song and the user's text prompts. Even OpenAI is experimenting in this space, having released an AI tool in April that needs just 15 seconds of sample audio in order to fully clone a user's voice and vocal patterns.

[11]

Fast Company

Nvidia's new AI model Fugatto can generate music and audio

Nvidia, the world's biggest supplier of chips and software used to create AI systems, said it does not have immediate plans to publicly release the technology, which it calls Fugatto, short for Foundational Generative Audio Transformer Opus 1. It joins other technologies shown by startups such as Runway and larger players such as Meta Platforms that can generate audio or video from a text prompt. Santa Clara, California-based Nvidia's version generates sound effects and music from a text description, including novel sounds such as making a trumpet bark like a dog.

[12]

The How-To Geek

NVIDIA Fugatto Wants to Change How We Work With Audio

NVIDIA researchers have introduced Fugatto, a new AI model that can modify or generate audio based on natural language input. Unlike other AI tools that focus on specific tasks like writing songs or changing voices, Fugatto (short for Foundational Generative Audio Transformer Opus 1) offers a lot of flexibility and can handle many different audio tasks using both text and audio inputs. The model can create audio and music based on text descriptions, change existing songs by adding or taking away instruments, adjust the tone or emotion of voices, and even invent new sounds. It can also improve audio quality and act as a springboard for musical ideas. Fugatto's design uses a method called ComposableART, which lets users mix different audio instructions at inference time. This means users can combine things like voice accents and emotions in detailed ways. It can also create long, modulated, evolving audio scenes, such as a rainstorm that gradually changes into the sounds of a morning chorus. The development of Fugatto took more than a year and involved a team of people from around the world. They used a large collection of audio samples, plus powerful DGX systems, to train the 2.5-billion-parameter model. One of the main challenges was creating a mixed dataset that would allow the model to handle a variety of tasks effectively. The team used different strategies to create and analyze data, which helped improve the model's functions while minimizing the size of its dataset. Fugatto is an exciting development in the world of audio creation, ideation, and editing. It clearly has the potential to aid music or film development. That said, it is not a finished product. You cannot install or test Fugatto, and NVIDIA has not provided a timeline for the model's release. Fugatto may simply be a proof of concept. Source: NVIDIA

[13]

The Verge

Nvidia claims a new AI audio generator can make sounds never heard before

Nvidia says its new AI music editor can create "sounds never heard before" -- like a trumpet that meows. The tool, called Fugatto, is capable of generating music, sounds, and speech using text and audio inputs it's never been trained on. As shown in a video shared by the company, this allows Fugatto to put together songs based on wild prompts, like "Create a saxophone howling, barking then electronic music with dogs barking." Some other examples shared by the company include the ability to produce unique sound effects based on a description, like "Deep, rumbling bass pulses paired with intermittent, high-pitched digital chirps, like the sound of a massive sentient machine waking up." It can even transform the sound of someone's voice, changing their accent or giving them a different tone, like angry or calm. There are ways to edit music, too, as Fugatto can isolate the vocals in a song, add instruments, and even change up a melody by swapping out a piano for an opera singer. A paper released with the announcement shows the long list of all the datasets Nvidia says Fugatto was trained on, one of which includes a library of sound effects from the BBC. There are already several other AI audio tools out there, including those from Stability AI, OpenAI, Google DeepMind, ElevenLabs, and Adobe, but not ones claiming to create completely new and unheard-of sounds. Some AI startups are even facing copyright lawsuits over their music creation tools, while a recent report found that Nvidia and other companies trained AI models on subtitles from thousands of YouTube videos. To build Fugatto, Nvidia says researchers had to put together a dataset with millions of audio samples. They then created instructions "that considerably expanded the range of tasks the model could perform, while achieving more accurate performance and enabling new tasks without requiring additional data." Nvidia doesn't say when -- or if -- the tool will be widely available.

[14]

PYMNTS

Nvidia Says AI Model Generates 'Sounds Never Heard Before'

Nvidia has unveiled an AI model it dubs "a Swiss Army knife for sound." Fugatto (or "Foundational Generative Audio Transformer Opus 1") is an artificial intelligence (AI) tool that can take prompts using any mix of text and audio files to generate or transform any combination of sounds, music and voices, the tech giant said Monday (Nov. 25). "For example, it can create a music snippet based on a text prompt, remove or add instruments from an existing song, change the accent or emotion in a voice -- even let people produce sounds never heard before," the company wrote on its blog. Nvidia argues that Fugatto, which supports numerous audio generation and transformation tasks, is the first foundational generative AI model that showcases emergent properties -- capabilities stemming from the interaction of its various trained abilities -- and the ability to meld free-form instructions. "Fugatto is our first step toward a future where unsupervised multitask learning in audio synthesis and transformation emerges from data and model scale," said Rafael Valle, a manager of applied audio research at Nvidia. An orchestral conductor and composer, he is among the dozen-plus people who helped develop Fugatto. Valle noted that music producers could use Fugatto to quickly prototype or edit an idea for a song, testing different styles, voices and instruments, or add effects and improve the overall sound quality of an existing track. But the tool's use goes beyond music, the company said. Ad agencies could employ Fugatto to target campaigns for multiple regions or situations, applying a range of different accents and emotions to voiceovers. And video game companies could use the tool to modify prerecorded audio to fit changing action as players progress in a game. The launch of Fugatto comes days after Nvidia released quarterly earnings showing a 94% jump in revenue. And as covered here last week, CEO Jensen Huang is not resting on his laurels after reaching that milestone.
"Many AI services are running 24/7, just like any factory," Huang said during an earnings call. "We're going to see this new type of system come online. And I call it [the company's data centers] an AI factory because that's really close to what it is. It's unlike a data center of the past. And these fundamental trends are really just beginning. We expect this to happen, this growth, this modernization and the creation of a new industry to go on for several years." As PYMNTS wrote, Huang and CFO Colette Kress clearly believe that the company's best days are ahead of it, despite analysts wondering whether or not it can keep up the pace in several areas: large language model (LLM) development, AI usage scale and the rapid-fire revenue growth it has enjoyed over the past two years.

[15]

TelecomTalk

Nvidia Unveils New AI Model Fugatto That Generates Audio from Text and Audio

Fugatto is powered by Nvidia's H100 GPUs and a global team of researchers. Nvidia has unveiled a new generative AI model that can create any combination of music, voices and sounds using text and audio as inputs. Called Fugatto (Foundational Generative Audio Transformer Opus 1), it generates or transforms any mix of music, voices and sounds described with prompts, using any combination of text and audio files. "While some AI models can compose a song or modify a voice, none have the dexterity of the new offering," said Nvidia in a blog post on Monday. Nvidia describes this model as a "Swiss Army knife for sound," one that allows users to control the audio output simply using text. Fugatto can create a music snippet based on a text prompt, remove or add instruments from an existing song, change the accent or emotion in a voice and even let people produce sounds never heard before, the company explained. "We wanted to create a model that understands and generates sound like humans do," said Rafael Valle, a manager of applied audio research at Nvidia. Supporting numerous audio generation and transformation tasks, Fugatto is the first foundational generative AI model that showcases emergent properties -- capabilities that arise from the interaction of its various trained abilities -- and the ability to combine free-form instructions, Nvidia said. "Fugatto is our first step toward a future where unsupervised multitask learning in audio synthesis and transformation emerges from data and model scale," Valle added. According to Nvidia, music producers could use Fugatto to quickly prototype or edit an idea for a song, trying out different styles, voices and instruments. They could also add effects and enhance the overall audio quality of an existing track.
An ad agency could apply Fugatto to quickly target an existing campaign for multiple regions or situations, applying different accents and emotions to voiceovers. Additionally, Nvidia says language learning tools could be personalised to use any voice a speaker chooses. Imagine an online course spoken in the voice of any family member or friend. Video game developers could use the AI model to modify prerecorded assets in their title to fit the changing action as users play the game. Or, they could create new assets easily from text instructions and optional audio inputs. Nvidia said Fugatto is a foundational generative transformer model that builds on prior work in areas such as speech modeling, audio vocoding and audio understanding. Fugatto was made by a diverse group of people from around the world, including India, Brazil, China, Jordan and South Korea. "Their collaboration made Fugatto's multi-accent and multilingual capabilities stronger," said the company. The full version used 2.5 billion parameters and was trained on a bank of Nvidia DGX systems, equipped with 32 Nvidia H100 Tensor Core GPUs.

[16]

Quartz

Nvidia's AI audio model, Claude's new features, and funding for AI agents: This week's AI launches

Nvidia (NVDA) announced its new AI audio model, Fugatto, this week that can generate or transform "any mix of music, voices and sounds described with prompts using any combination of text and audio files." Fugatto is short for Foundational Generative Audio Transformer Opus 1, Nvidia said. With the new model, users can enter a text prompt and generate a music snippet, remove or add instruments to an already existing song, change accents or emotions in a voice, and "produce sounds never heard before." "Fugatto is the first foundational generative AI model that showcases emergent properties -- capabilities that arise from the interaction of its various trained abilities -- and the ability to combine free-form instructions," Nvidia said.

[17]

NVIDIA Blog

Now Hear This: World's Most Flexible Sound Machine Debuts

Using text and audio as inputs, a new generative AI model from NVIDIA can create any combination of music, voices and sounds. A team of generative AI researchers created a Swiss Army knife for sound, one that allows users to control the audio output simply using text. While some AI models can compose a song or modify a voice, none have the dexterity of the new offering. Called Fugatto (short for Foundational Generative Audio Transformer Opus 1), it generates or transforms any mix of music, voices and sounds described with prompts using any combination of text and audio files. For example, it can create a music snippet based on a text prompt, remove or add instruments from an existing song, change the accent or emotion in a voice -- even let people produce sounds never heard before. "This thing is wild," said Ido Zmishlany, a multi-platinum producer and songwriter -- and cofounder of One Take Audio, a member of the NVIDIA Inception program for cutting-edge startups. "Sound is my inspiration. It's what moves me to create music. The idea that I can create entirely new sounds on the fly in the studio is incredible." "We wanted to create a model that understands and generates sound like humans do," said Rafael Valle, a manager of applied audio research at NVIDIA and one of the dozen-plus people behind Fugatto, as well as an orchestral conductor and composer. Supporting numerous audio generation and transformation tasks, Fugatto is the first foundational generative AI model that showcases emergent properties -- capabilities that arise from the interaction of its various trained abilities -- and the ability to combine free-form instructions. "Fugatto is our first step toward a future where unsupervised multitask learning in audio synthesis and transformation emerges from data and model scale," Valle said. For example, music producers could use Fugatto to quickly prototype or edit an idea for a song, trying out different styles, voices and instruments. 
They could also add effects and enhance the overall audio quality of an existing track. "The history of music is also a history of technology. The electric guitar gave the world rock and roll. When the sampler showed up, hip-hop was born," said Zmishlany. "With AI, we're writing the next chapter of music. We have a new instrument, a new tool for making music -- and that's super exciting." An ad agency could apply Fugatto to quickly target an existing campaign for multiple regions or situations, applying different accents and emotions to voiceovers. Language learning tools could be personalized to use any voice a speaker chooses. Imagine an online course spoken in the voice of any family member or friend. Video game developers could use the model to modify prerecorded assets in their title to fit the changing action as users play the game. Or, they could create new assets on the fly from text instructions and optional audio inputs. "One of the model's capabilities we're especially proud of is what we call the avocado chair," said Valle, referring to a novel visual created by a generative AI model for imaging. For instance, Fugatto can make a trumpet bark or a saxophone meow. Whatever users can describe, the model can create. With fine-tuning and small amounts of singing data, researchers found it could handle tasks it was not pretrained on, like generating a high-quality singing voice from a text prompt. Several capabilities add to Fugatto's novelty. During inference, the model uses a technique called ComposableART to combine instructions that were only seen separately during training. For example, a combination of prompts could ask for text spoken with a sad feeling in a French accent. The model's ability to interpolate between instructions gives users fine-grained control over text instructions, in this case the heaviness of the accent or the degree of sorrow. 
"I wanted to let users combine attributes in a subjective or artistic way, selecting how much emphasis they put on each one," said Rohan Badlani, an AI researcher who designed these aspects of the model. "In my tests, the results were often surprising and made me feel a little bit like an artist, even though I'm a computer scientist," said Badlani, who holds a master's degree in computer science with a focus on AI from Stanford. The model also generates sounds that change over time, a feature he calls temporal interpolation. It can, for instance, create the sounds of a rainstorm moving through an area with crescendos of thunder that slowly fade into the distance. It also gives users fine-grained control over how the soundscape evolves. Plus, unlike most models, which can only recreate the training data they've been exposed to, Fugatto allows users to create soundscapes it's never seen before, such as a thunderstorm easing into a dawn with the sound of birds singing. Fugatto is a foundational generative transformer model that builds on the team's prior work in areas such as speech modeling, audio vocoding and audio understanding. The full version uses 2.5 billion parameters and was trained on a bank of NVIDIA DGX systems packing 32 NVIDIA H100 Tensor Core GPUs. Fugatto was made by a diverse group of people from around the world, including India, Brazil, China, Jordan and South Korea. Their collaboration made Fugatto's multi-accent and multilingual capabilities stronger. One of the hardest parts of the effort was generating a blended dataset that contains millions of audio samples used for training. The team employed a multifaceted strategy to generate data and instructions that considerably expanded the range of tasks the model could perform, while achieving more accurate performance and enabling new tasks without requiring additional data. They also scrutinized existing datasets to reveal new relationships among the data. The overall work spanned more than a year. 
Valle remembers two moments when the team knew it was on to something. "The first time it generated music from a prompt, it blew our minds," he said. Later, the team demoed Fugatto responding to a prompt to create electronic music with dogs barking in time to the beat. "When the group broke up with laughter, it really warmed my heart."

[18]

Silicon Republic

Nvidia claims new AI model can generate new sounds

The company's newest model comes at a time when Big Tech is under fire for how its AI technology affects creative industries. Tech giant Nvidia has unveiled its latest AI model, which it describes as "the world's most flexible sound machine". Fugatto, which is short for Foundational Generative Audio Transformer Opus 1, can generate any mix of music, voices or sounds described with prompts using a combination of text and audio files, the company said. The model was built on Nvidia's previous work around speech modelling, voice encoding and audio understanding. Nvidia said Fugatto was created by people from around the world, including India, Brazil, China, Jordan and South Korea, making its "multi-accent and multilingual capabilities stronger". Rafael Valle, a manager of applied audio research at Nvidia, said the company wanted to create a model that understands and generates sounds like humans do. "Fugatto is our first step toward a future where unsupervised multitask learning in audio synthesis and transformation emerges from data and model scale," he said. Nvidia also claims that Fugatto allows users to create soundscapes "it's never seen before", setting it apart from other models. However, it's important to take a company's claims about its own models with a pinch of salt. Earlier this year, the Stanford AI Index claimed that robust evaluations for large language models are "seriously lacking" and there is a lack of standardisation in responsible AI reporting. And last year, the Foundational Model Transparency Index created by US researchers suggested that companies in the foundational AI model space are becoming less transparent about their creations.

Under scrutiny in more ways than one

Nvidia has been investing heavily in the AI space, along with many other tech giants, and has thus far managed to reap the benefits. In May 2023, it became the first chipmaker to reach a $1trn valuation and in June of this year, it became the world's most valuable company.
But the company has come under fire for stifling competition, both in the chips market and in the AI market. And outside of competition investigations, the AI chipmaker has also come under fire for allegedly using copyrighted books to train AI as questions over artificial intelligence's threat to creative industries rumble on. Last year also saw thousands sign a letter written by the US Authors Guild, calling on the likes of OpenAI, Alphabet and Meta to stop using their work to train AI models without "consent, credit or compensation". Earlier this year, hundreds of musicians - including Billie Eilish and Katy Perry - signed an open letter calling on developers to stop using AI to "devalue the rights of human artists". And in May, Sony wrote to more than 700 tech companies asking them to refrain from using its content to train AI models.

[19]

TechRadar

Ever wanted to hear a saxophone bark? Nvidia just made the 'world's most flexible sound machine' that uses AI to blend music, voices and sounds

Fugatto promises to create unique sounds, audio mixes, speech, and more

Nvidia has announced a new generative AI audio tool called Fugatto, which it's describing as the "world's most flexible sound machine" - capable of producing all kinds of music, speech, and other audio, and even unique sounds that have never been heard before. Fugatto, which is short for Foundational Generative Audio Transformer Opus 1, can work with text prompts and audio samples. You can simply describe what you want to hear, or get the AI model to modify or combine existing audio clips. For example, you can have the sound of a train transform into a lush orchestral arrangement, or mix a banjo melody with the sounds of rainfall. You can hear the sound of a saxophone barking, or a flute meowing, just by typing in a prompt. Fugatto can also isolate vocals from tracks, and change the vocal delivery style, as well as generate speech from scratch. Feed in an existing melody, and you can have it played on whatever instrument you like, in any kind of style. So how can you try out this impressive new AI technology? You can't, for the time being: you'll have to make do with Nvidia's promo video and a website of samples. There's no word yet on when Fugatto will be available for public testing. Some of the samples published by Nvidia include the sound of a female voice barking, a factory machine screaming, a typewriter whispering, and a cello shouting with anger. You can see the wide variety of audio effects that are possible. Nvidia has also demonstrated how the AI engine is able to produce spoken word clips, which can then be delivered with a range of different emotions (from angry to happy) and even with different accents applied. "We wanted to create a model that understands and generates sound like humans do," says Nvidia's Rafael Valle, one of the Fugatto team.
"Fugatto is our first step toward a future where unsupervised multitask learning in audio synthesis and transformation emerges from data and model scale."

[20]

The Information

Nvidia Unveils AI Model to Generate Music

Nvidia on Monday unveiled it was working on Fugatto, a new artificial intelligence model for generating audio based on text prompts. Fugatto, an abbreviation of Foundational Generative Audio Transformer Opus 1, can compose a new piece of music based on a text prompt, or users can upload their own audio and request tweaks like adding in new instruments, according to Nvidia's announcement. If

[21]

Economic Times

Nvidia shows AI model that can modify voices, generate novel sounds

Nvidia on Monday showed a new artificial intelligence model for generating music and audio that can modify voices and generate novel sounds - technology aimed at the producers of music, films and video games. Nvidia, the world's biggest supplier of chips and software used to create AI systems, said it does not have immediate plans to publicly release the technology, which it calls Fugatto, short for Foundational Generative Audio Transformer Opus 1. It joins other technologies shown by startups such as Runway and larger players such as Meta Platforms that can generate audio or video from a text prompt. Santa Clara, California-based Nvidia's version generates sound effects and music from a text description, including novel sounds such as making a trumpet bark like a dog. What makes it different from other AI technologies is its ability to take in and modify existing audio, for example by taking a line played on a piano and transforming it into a line sung by a human voice, or by taking a spoken word recording and changing the accent used and the mood expressed. "If we think about synthetic audio over the past 50 years, music sounds different now because of computers, because of synthesizers," said Bryan Catanzaro, vice president of applied deep learning research at Nvidia. "I think that generative AI is going to bring new capabilities to music, to video games and to ordinary folks that want to create things."
While companies such as OpenAI are negotiating with Hollywood studios over whether and how the AI could be used in the entertainment industry, the relationship between tech and Hollywood has become tense, particularly after Hollywood star Scarlett Johansson accused OpenAI of imitating her voice. Nvidia's new model was trained on open-source data, and the company said it is still debating whether and how to release it publicly. "Any generative technology always carries some risks, because people might use that to generate things that we would prefer they don't," Catanzaro said. "We need to be careful about that, which is why we don't have immediate plans to release this." Creators of generative AI models have yet to determine how to prevent abuse of the technology such as a user generating misinformation or infringing on copyrights by generating copyrighted characters. OpenAI and Meta similarly have not said when they plan to release to the public their models that generate audio or video.

[22]

Market Screener

Nvidia shows AI model that can modify voices, generate novel sounds

(Reuters) - Nvidia on Monday showed a new artificial intelligence model for generating music and audio that can modify voices and generate novel sounds - technology aimed at the producers of music, films and video games. Nvidia, the world's biggest supplier of chips and software used to create AI systems, said it does not have immediate plans to publicly release the technology, which it calls Fugatto, short for Foundational Generative Audio Transformer Opus 1. It joins other technologies shown by startups such as Runway and larger players such as Meta Platforms that can generate audio or video from a text prompt. Santa Clara, California-based Nvidia's version generates sound effects and music from a text description, including novel sounds such as making a trumpet bark like a dog. What makes it different from other AI technologies is its ability to take in and modify existing audio, for example by taking a line played on a piano and transforming it into a line sung by a human voice, or by taking a spoken word recording and changing the accent used and the mood expressed. "If we think about synthetic audio over the past 50 years, music sounds different now because of computers, because of synthesizers," said Bryan Catanzaro, vice president of applied deep learning research at Nvidia. "I think that generative AI is going to bring new capabilities to music, to video games and to ordinary folks that want to create things." While companies such as OpenAI are negotiating with Hollywood studios over whether and how the AI could be used in the entertainment industry, the relationship between tech and Hollywood has become tense, particularly after Hollywood star Scarlett Johansson accused OpenAI of imitating her voice. Nvidia's new model was trained on open-source data, and the company said it is still debating whether and how to release it publicly. 
"Any generative technology always carries some risks, because people might use that to generate things that we would prefer they don't," Catanzaro said. "We need to be careful about that, which is why we don't have immediate plans to release this." Creators of generative AI models have yet to determine how to prevent abuse of the technology such as a user generating misinformation or infringing on copyrights by generating copyrighted characters. OpenAI and Meta similarly have not said when they plan to release to the public their models that generate audio or video. (Reporting by Stephen Nellis in San Francisco; Editing by Will Dunham)

[23]

Gizmodo

Want to Hear a Saxophone Bark Like a Dog? Nvidia's New AI Audio Generator Has You Covered

Nvidia's Fugatto is designed for 'film or audio productions,' but it can also create the sound of a saxophone howling like a dog. Nvidia wants to let you know that your weirdest audio whims are now possible. The company's latest AI project, along with its AI NPCs and in-game chatbot, is a text-to-audio AI called Fugatto. Like other AI audio generators, it can create tracks from a simple description, but this program can also create "sounds never heard before," such as a "saxophone howl," whatever that means. In a blog post, Nvidia claimed its "Swiss army knife for sound" AI model can modify existing sounds or create entire soundscapes out of whole cloth. Fugatto is actually an acronym for the obnoxiously long "Foundational Generative Audio Transformer Opus 1." It's capable of processing voices, music, and background noise and producing them all into a single audio track. It can also modify existing sound sources. It's silly to call anything "a sound never heard before," especially if it comes from AI. Whatever the output, the audio is merely an AI algorithm using existing sources in its training data to supply a result that approximates the prompt. Nvidia said its model is unique since it can combine instructions that were separate during training and "create soundscapes it's never seen before." This means it can overlay two distinct audio effects to create something new. In a video, Nvidia showed how it could generate the sound of a train that morphs into an orchestral score. It can also create the sound of a rainstorm that fades into the distance. These are capabilities we haven't seen before. Beyond a prompt to demo "electronic music with dogs barking in time to the beat," Nvidia said its tool offers "fine-grained control" over the created soundscapes.
Nvidia claims the narrator for the video was an AI version of Nvidia CEO Jensen Huang, though if Fugatto produced the obviously fake voice, the AI model needs more work before anybody uses it for their next deepfake project. Plenty of AI audio tools already take text prompts and turn them into audio tracks. Adobe has shopped its own Project MusicGenAI Control tool to unscrupulous musicians. Big tech companies like Meta have already promoted their audio models to the movie industry. Last month, Meta debuted Movie Gen, which can generate soundscapes for AI-generated films. Nvidia quotes AI researcher Rohan Badlani, who said the model "made me feel a little bit like an artist," though, of course, the AI draws from thousands of gigabytes worth of existing music and audio data. Nvidia did not share exact details about its dataset and only said it contains "millions of audio samples used for training." The full version of Fugatto is a 2.5 billion-parameter model trained on Nvidia's own banks of its famed H100 AI GPUs. It's bad news for foley artists, who have made that kind of audio fakery into a renowned art form. The company said Fugatto could be a useful tool for ad agencies, video game developers, or musicians who want to sample changes to their work without doing much extra work. Still, the other side of the coin is all those people who would use it to make "new assets," AKA potentially adding more AI slop to the growing pile. Fugatto potentially has more utility than merely giving an excuse for movie production companies to replace human audio engineers. Nvidia claims it can remove or add instruments to existing music. It can also isolate and modify specific noise from existing sources. Maybe you can get away with generating empty drum rhythms to your blasé synthesizer score, but an entire soundtrack generated with nothing but AI isn't what most people pay for when buying a movie ticket.

[24]

Benzinga

Nvidia Unveils Fugatto: AI Tool That Transforms Sound Creation For Music, Ads, And Gaming - NVIDIA (NASDAQ:NVDA)

Fugatto combines prompts to generate unique soundscapes, from barking trumpets to dynamic storm-to-dawn transitions. Nvidia Corp NVDA showcased a groundbreaking generative AI model named Fugatto. This model is designed as a versatile tool for creating and modifying sounds using text and audio prompts. Fugatto can generate and transform a mix of music, voices, and soundscapes, offering unprecedented capabilities to musicians, developers, and content creators. Fugatto, short for Foundational Generative Audio Transformer Opus 1, supports multiple tasks, such as generating new music, altering accents or emotions in voices, and crafting entirely novel soundscapes. These features mark a significant leap in audio AI innovation. Fugatto empowers users to create audio that combines various instructions and prompts. For example, it can produce a trumpet sound mimicking a barking dog or generate a voice with a specific accent and tone. Beyond music, Fugatto opens possibilities for advertising, education, and gaming. Advertisers can adjust campaign voiceovers for regional audiences, while educators can personalize content with voices familiar to learners. Game developers can modify audio assets or generate them dynamically based on gameplay. Fugatto, powered by a 2.5-billion-parameter generative transformer, was trained on Nvidia DGX systems with 32 H100 Tensor Core GPUs. Its development involved a diverse team spanning several countries, which enhanced its multilingual and multi-accent capabilities. The model's training relied on millions of audio samples, carefully curated to enable complex and diverse tasks. Fugatto's debut signifies a milestone in generative AI, promising to reshape how professionals interact with sound. The model enables users to combine attributes like accent, tone, and emotion into one cohesive sound.
For instance, it can create a dynamic soundscape transitioning from a thunderstorm to a tranquil dawn. Nvidia stock surged 186% year-to-date. Last week, Nvidia reported a third-quarter revenue of $35.1 billion, up 94%, beating the consensus estimate of $33.12 billion. The company reported EPS of 81 cents, which beat the Street consensus estimate of 75 cents. Nvidia expects fourth-quarter revenue to be $37.5 billion plus or minus 2%. Analysts highlighted Nvidia's leadership in AI and data center technologies, with robust demand for its Blackwell and Hopper product lines driving growth through 2025. Investors can gain exposure to Nvidia through ProShares Ultra Semiconductors USD and EA Series Trust Strive U.S. Semiconductor ETF SHOC. Price Action: NVDA stock is down 2.7% at $138.12 at last check Monday.

Nvidia introduces Fugatto, an advanced AI model capable of generating and transforming various types of audio, including music, voices, and sound effects. This innovative technology promises to revolutionize audio production across multiple industries.

Introducing Nvidia's Fugatto: A New Frontier in AI Audio Generation

Nvidia, a company primarily known for its GPU manufacturing, has unveiled a groundbreaking AI model called Fugatto, short for Foundational Generative Audio Transformer Opus 1. This innovative technology is set to revolutionize the audio industry by offering unprecedented capabilities in sound generation and transformation [1].

Advanced Architecture and Training

Fugatto boasts an advanced AI architecture with 2.5 billion parameters, trained on over 50,000 hours of annotated audio data [1]. The model was developed using Nvidia DGX systems, powered by 32 Nvidia H100 Tensor Core GPUs, showcasing the company's commitment to pushing the boundaries of AI technology [4].

Unique Capabilities and Applications

What sets Fugatto apart is its ability to generate and manipulate audio in ways never before possible. The model can:

Create entirely new sounds by combining different audio properties [1]
Transform existing audio, such as changing emotions in voices or modifying accents [2]
Add or remove instruments from music tracks [4]
Generate complex sound effects and soundscapes [5]

One of Fugatto's most impressive features is its use of Composable ART (Audio Representation Transformation), which allows for the combination and control of different sound properties based on text or audio prompts [1].
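
The articles summarized here do not describe how this combination is implemented. As a rough illustration only, one common way to weight several prompts at inference time is to blend each prompt's conditional prediction around an unconditional baseline, in the style of classifier-free guidance. The function name, the toy vectors, and the weights below are all hypothetical; this is a sketch of the general idea, not Nvidia's method or API.

```python
from typing import List, Sequence


def compose_guidance(
    uncond: Sequence[float],
    conds: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[float]:
    """Blend several conditional predictions around an unconditional one.

    Each weight scales how far the result moves from the unconditional
    prediction toward that prompt's prediction. A weight of 0 ignores
    the prompt; larger weights emphasize it.
    """
    if len(conds) != len(weights):
        raise ValueError("need exactly one weight per conditional prediction")
    out = [float(u) for u in uncond]
    for cond, w in zip(conds, weights):
        if len(cond) != len(uncond):
            raise ValueError("all predictions must have the same length")
        for i, (c, u) in enumerate(zip(cond, uncond)):
            out[i] += w * (c - u)
    return out


# Toy stand-ins for model outputs (hypothetical values, not real audio features).
baseline = [0.0, 0.0]        # prediction with no prompt
sad_voice = [1.0, 0.0]       # prediction for a "sad" prompt
french_accent = [0.0, 2.0]   # prediction for a "French accent" prompt

# Half-strength sadness combined with a full-strength accent.
blended = compose_guidance(baseline, [sad_voice, french_accent], [0.5, 1.0])
print(blended)
```

Changing the weights is what would let a user dial the heaviness of an accent or the degree of sadness up or down in a scheme like this.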

Potential Industry Impact

The versatility of Fugatto opens up numerous possibilities across various industries:

Music Production: Producers can quickly prototype ideas and adjust existing tracks with unprecedented ease [4]
Advertising: Agencies can modify voiceovers for different regions or languages [4]
Language Learning: Tools can be enhanced with customizable voice options [4]
Video Game Development: Developers can create dynamic audio assets based on player inputs [4]
Film and Television: Sound designers can generate complex soundscapes on demand [5]

Collaborative Development and Future Prospects

Fugatto was developed by an international team of researchers from countries including Brazil, China, India, Jordan, and South Korea. This diverse collaboration contributed to the model's multi-accent and multilingual capabilities [2].

While Fugatto is not yet available for public testing, Nvidia has showcased its capabilities through a sample-filled website and a detailed research paper [3]. The company has not announced specific plans for public release, but it's likely that Fugatto will be made available to Nvidia partners in the future [5].

As AI continues to evolve, Fugatto represents a significant milestone in audio technology, promising to reshape how we create, manipulate, and experience sound across various media and industries.

References

Summarized by Navi

[1] PCWorld, "Nvidia's new AI model can create 'unheard sounds' like never before"

[2] NDTV Gadgets 360, "Nvidia's New AI Model Can Generate and Mix Different Types of Audio"

[3] Ars Technica, "Nvidia's new AI audio model can synthesize sounds that have never existed"

[4] TechSpot, "Nvidia's Fugatto AI sound model is set to transform audio production"

[5] Tom's Guide, "Meet Fugatto -- an impressive new AI sound model from Nvidia"


