Curated by THEOUTPOST
On Tue, 13 Aug, 12:02 AM UTC
2 Sources
[1]
How to Tell If That Song Was Made With AI
This post is part of Lifehacker's "Exposing AI" series. We're exploring six different types of AI-generated media, and highlighting the common quirks, byproducts, and hallmarks that help you tell the difference between artificial and human-created content.

Of all the AI-generated content out there, AI music might be the weirdest. It doesn't feel like it should be possible to ask a computer to produce a full song from nothing, the same way you ask ChatGPT to write you an essay, but it is: Apps like Suno can generate a song for you from a simple prompt, complete with vocals, instrumentals, melodies, and rhythm, some of which are way too convincing. The better this technology gets, the harder it's going to be to spot AI music when you stumble across it.

In fact, it's already pretty hard. Sure, there are examples that are obvious (as good as they are, nobody thinks Plankton is really singing all these covers), but there are plenty of AI-generated songs out there that are guaranteed to trick casual listeners. Instrumental electronic music that already sounds digital is particularly challenging to discern, and raises a lot of ethical questions, as well as concerns about the future of the music industry. Let's put that aside, however, and focus on the task at hand: spotting AI music when you hear it in the wild.

It sort of seems like magic that you could describe a song in text and have an AI tool generate a full song, vocals and all. But really, it's the product of machine learning. Like all AI generators, AI music generators are based on models that are trained on enormous amounts of data. These particular models are trained on music samples, learning the relationships between the sounds of different instruments, vocals, and rhythms. Programs that produce AI covers, for example, are trained on a specific artist's voice: You provide enough samples of that artist's voice, and the program will map it to the vocal track you're trying to replicate.
If the model is well trained, and you give it enough vocal data, you might just create a convincing AI cover. This is an overly simplified explanation, but it's important to remember that these "new" songs are made possible by a huge dataset of other sounds and songs. Whether the entire song was generated with AI, or just the vocals, the models powering the tech are outputting products based on their previous training. While many of these outputs are impressive, there are consistent quirks you can pick up on, if you're listening for them: Most generative AI products have some artifacts or inconsistencies that can offer a hint to their digital origins. AI music is no different: The audio that AI models generate can sometimes sound very convincing, but if you listen closely, you may hear some oddities here and there.

Take this Suno song, "Ain't Got a Nickel Ain't Got a Dime." It's the kind of AI output that, rightly so, should scare you, as it would likely fool many people into believing it's real. But zero in on the vocals: The entire time, the "singer's" voice is shaky, but not in a way you'd expect from a human. It's modulating, almost like it's being auto-tuned, but it sounds more robotic than digitally altered. Once you get the hang of listening for this sound, you'll hear it pop up in a lot of AI songs. (Though, I begrudgingly admit, this chorus is pretty damn catchy.)

Here's another example, "Stone," which is perhaps even scarier than the last: There are moments in this song, particularly the line "I know it but what am I to do," that sound very realistic. But just after that line, you can hear some of the same modulation issues as above, starting with "oh, my love." Shortly after, there's a weird glitch, where it sounds like the singer and the band all sing and play the wrong note. Perhaps even more telling, the second "chorus" falls apart.
It has the same lyrics, up until "I know it but what am I to do," but transitions halfway through to "I know it, me one day," morphing into the lyrics of another verse. In addition, the AI doesn't seem to remember how the original chorus went, so it makes up a new tune. This second attempt is nowhere near as lifelike as the first. This is one to trust your gut on: So many vocals are edited with digital tools that it can be tricky to differentiate these glitches and modulations from real human voices. But if something sounds a bit too uncanny valley, it might be a robot singing.

If you have a modern streaming service and a good pair of headphones, you might be used to extremely high-quality music playback. AI-generated music, on the other hand, frequently has a classic mp3 sound. It's not crisp; instead, it's often fuzzy, tinny, and flat. You can hear what I mean with most of the samples offered by Soundful: Click through the options, and while you might not think twice about hearing any of them in the background of a YouTube video, notice how none is particularly crisp. Loudly's samples are a bit higher quality, but still suffer from the same effect, as if each track were compressed into a low-quality format. Even many tracks from Suno, which arguably makes the best all-around AI songs right now, sound like they were downloaded over Napster. (Although they seem to be figuring out the bass drop.) Obviously, there is a genuine lo-fi genre of music, which intentionally aims for a "low-quality" sound. But this is just one clue to look out for when determining whether a track was generated with AI or not.

AI might be able to generate vocals, even relatively realistic vocals, but they still aren't perfect. The tech still struggles with producing vocals with realistic variance. You could call it a lack of passion. Check out this song, "Back To The Start." The voice has a general robotic sound to it, but it also doesn't really go anywhere.
Most of the words are sung in the same tone: poppy and light, sure, but a bit subdued, almost bored. This is one area where AI outputs are improving, however: Suno is producing some vocals with lifelike variance (though not always). Even Plankton has some passion in his voice when belting Chappell Roan. Another thing to look out for is the singer sounding "out of breath" in AI songs, when many of the words sound like they're not quite fully realized. I'm not sure what causes this phenomenon, but it's something I've noticed from many an AI singer. Just listen to poor Frank Sinatra struggling with every word while covering Dua Lipa.

As I write about AI, I find myself repeating one particular point: AI doesn't actually "know" anything. These generative models are trained to look for relationships, and their outputs are the results of the relationships they've learned. As such, these songs are not evidence that AI actually knows how to make music or how music is supposed to work. That training doesn't make these models good lyricists, or experts at writing melodies. Rather, they produce content based on their previous training, without any critical abilities. These days, that results in an end product that is often convincing on first listen, but if you listen again, or with a discerning ear, things might fall apart.

When presented with a song you think might have been made by AI, think about the different elements of the song: Do these lyrics actually make any sense? Is the music flowing in a logical way? You don't have to be a music expert to pick up on these things. Consider the "Stone" example above: Suno seems to have "forgotten" how the initial chorus was supposed to go, and, in fact, ended up messing up the lyrics it established early on. That first verse is also a melodic mess, especially the bizarre "without thinking of you" line. Not to mention, the verse is short, moving to the chorus almost immediately.
It's striking how "good" the output is for AI, but that doesn't make it a "good" song.

AI celebrity covers can be impressive, and often sound just like the singer they're impersonating. But the very fact that the song uses a famous voice can be a clue in and of itself: If Taylor Swift covers Sabrina Carpenter, that's going to be news, not something confined to a YouTube video or an Instagram reel. If a major artist puts out real music, you'll likely find it on a streaming platform like Apple Music or Spotify, or at least see some verification from the artist that they indeed recorded the cover.
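The "classic mp3 sound" described above has a measurable counterpart: low-bitrate encoders typically discard energy above roughly 15-16 kHz. Below is a minimal sketch of that check, assuming numpy and synthetic stand-in signals; the 16 kHz cutoff is an illustrative choice, and a hard spectral cutoff is only a weak hint of compression or low-quality generation, not proof of AI involvement.

```python
# Sketch: estimate how much of a clip's spectral energy sits above a cutoff.
# Heavily compressed (or low-quality generated) audio often has almost none.
import numpy as np

SR = 44_100  # sample rate in Hz

def high_band_ratio(signal, sr=SR, cutoff_hz=16_000):
    """Fraction of total spectral energy at or above cutoff_hz."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1 / sr)
    return float(power[freqs >= cutoff_hz].sum() / power.sum())

rng = np.random.default_rng(0)
full_band = rng.standard_normal(SR)  # 1 second of white noise: energy everywhere

# Crudely simulate a low-bitrate encode by zeroing everything above 15 kHz.
spectrum = np.fft.rfft(full_band)
freqs = np.fft.rfftfreq(len(full_band), d=1 / SR)
spectrum[freqs > 15_000] = 0
band_limited = np.fft.irfft(spectrum)

print(f"full band:    {high_band_ratio(full_band):.3f}")     # substantial
print(f"band-limited: {high_band_ratio(band_limited):.3f}")  # near zero
```

Real lo-fi music fails this check too, so treat the result as one signal among many, the same way the article treats the audible version of this clue.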
[2]
How to Identify AI-Generated Speech
In recent years, AI technologies have made it possible to clone someone else's voice and make that "person" say anything you want. You don't even need to be an expert to do it: A quick Google search, and you can have anyone from President Biden to SpongeBob speak your words. It's fascinating, hilarious, and terrifying.

AI voice technology can be used for good: Apple's Personal Voice feature, for example, lets you create a version of your own voice to use for text-to-speech, designed for people who are losing the ability to speak for themselves. It's amazing that we have the ability to preserve people's voices, so that rather than use a generic TTS voice, their words really sound like their own. Of course, there's the other side of the coin: the potential for rampant misinformation. When the current tech makes it all too easy to make anyone say anything, how can you trust that what you're listening to online was actually said?

How AI voice generators work

Like other AI models, such as text and image models, AI voice generators are based on models trained on massive data sets. In this case, the models are trained on samples of other people speaking. OpenAI's Whisper model, for example, was trained on 680,000 hours of data. That's how a model learns not only to replicate the words themselves, but also the other elements of speech, such as tone and pace. Once the model is trained, however, it doesn't need much data in order to replicate a voice. You might not be overly impressed by the results when giving a model five minutes' worth of recordings, but some can output voices that resemble that limited training data. Give it more data, and it will replicate the voice more accurately.
As the tech advances, it's getting more difficult to immediately spot a forgery here. But there are some notable quirks and flaws that most AI voices tend to have, and listening for them is crucial to identifying whether a recording is real or fake.

Listen for weird pronunciations and pacing

AI models are pretty good at mimicking the sound of a person's voice, to the point where it's tough to tell the difference at times. However, where they still struggle is in replicating the way we speak. If in doubt, listen closely to the inflections in the speaker's "voice": An AI bot might pronounce a word incorrectly every now and then, in a way that most people wouldn't. Yes, humans mispronounce things all the time, but be on the lookout for mistakes that might offer more of a tell. For example, the pronunciation of "collages" might shift mid-recording, from co-lah-jez to co-lay-ges. You can hear these exact mistakes from Microsoft's VALL-E 2 model, if you click the first section under Audio Samples and listen to the "Clever cats" example.

The pacing might be affected, as well. While AI is getting better at replicating a normal speaking pace, it also takes weird pauses in between words, or races through others in an unnatural way. An AI model might blow past the spacing between two sentences, giving itself away immediately. (Even a human who can't stop talking doesn't sound so robotic.) When testing out ElevenLabs' free generator, one of the outputs gave no space between my first sentence, "Hey, what's up?" and my second sentence, "Thinking about heading to the movies tonight." To be fair, most attempts did include the space, but be on the lookout for moments like this when determining whether a piece of audio is legit or not. On the flip side, it may take too long to get to the next word or sentence.
While AI is getting better at replicating natural pauses and breaths (yes, some generators will now insert "breaths" before speaking), you'll also hear weird pauses in between words, as if the bot thinks that's how humans tend to talk. It'd be one thing if this were done to mimic someone thinking of the next word they want to say, but it doesn't sound like that. It sounds robotic. You can hear these pauses in the deepfake audio of President Biden that someone made during the primary earlier this year. In the call, the fake Biden tries to persuade voters not to show up for the primary, and says, "Voting this Tuesday only enables the Republicans in their quest to elect...Donald Trump...again."

There's minimal emotion and variation in the voice

On a similar note, AI voices tend to fall a bit flat. It's not that many haven't become convincing, but if you listen closely, there's less variation in tone than you'd expect from most human speakers. It's funny, too, since these models can replicate the sound of someone's voice so accurately, yet often miss the mark when it comes to impersonating the speaker's rhythms and emotions. Check out some of the celebrity examples on PlayHT's generator: If you listen to the Danny DeVito example, it's obvious that it's impersonating DeVito's voice. But you don't get some of the highs and lows of his particular way of speaking. It feels flat. There's some variance here: The bot saying "Ohh, Danny, you're Italian" sounds realistic enough. But soon after, the sentence "I've been to the leaning tower of Pisa" doesn't match it. The last word of the recording, "sandwich," sounds especially off. The Zach Galifianakis recording further down the page has a similar issue: There are some convincing uses of "um" that make the recording sound casual, but most of the sample is without emotion or inflection.

Again, things are advancing fast here. Companies like OpenAI are training their models to be more expressive and reactive in their vocal outputs.
GPT-4o's Advanced Voice Mode is probably the closest a company has come yet to making an all-around convincing AI voice, especially one capable of having real-time "conversations." Even still, there are imperfections you can spot if you're listening closely. In the video below, listen to how the bot says "opposite, adjacent, and hypotenuse" (particularly "hypotenuse"). Here, GPT-4o pauses, the realistic variance drops out, and the voice becomes a bit more robotic as it figures out how to string together those uncommon words. Now, it's very subtle: The larger tells are probably the pauses it puts in between words, such as the pause before it says "opposite." In fact, the way it slows down "identify" is probably a tell as well, but it is impressive how normal the model makes it seem.

Is a celebrity or politician saying something ridiculous or provocative?

Spotting AI voices isn't just about identifying the flaws in the outputs, especially when it comes to recordings of "celebrities." AI-generated speech of people in power and influence is likely going to be one of two things: silly or provocative. Perhaps someone on the internet wants to make a video of a celebrity saying something funny, or a bad actor wants to convince you a politician said something that pisses you off. Most people coming across a video of Trump, Biden, and Obama playing video games together aren't actually going to think it's real: This is an obvious joke. But it's not hard to imagine someone looking to throw a wrench in an election generating a fake recording of a political candidate, playing it over a video, and uploading it to TikTok or Instagram. Elon Musk shared one such video on X, featuring a fake recording of Kamala Harris, without disclosing that the video was made using AI. That's not to excuse content that is real: If a candidate says something that may call their fitness for office into question, it's important to take note.
But as we enter what is sure to be a divisive election season, being skeptical of these types of recordings is going to be more critical than ever. Part of the solution here is to take a look at the source of the audio recording: Who posted it? Was it a media organization, or just some random account on Instagram? If it's real, multiple media organizations will likely pick up on it quickly. If an influencer is sharing something that aligns with their point of view without providing a proper source, take a beat before resharing it yourself.

You can try an AI voice detector (but know the limitations)

There are tools out there that advertise themselves as "AI voice detectors," able to spot whether an audio recording was generated using machine learning or not. PlayHT has one such detector, while ElevenLabs has a detector specifically looking for audio generated with the company's own tools. As with all AI media detectors, however, take these tools with a grain of salt.

AI audio detectors use AI to look for signs of generative audio content, such as absent frequencies, a lack of breaths, and a robotic timbre (some of which you can listen for yourself). But these AI models are only going to be effective at identifying what they know: If they come up against audio with variables they haven't been trained on, like poor audio quality or excessive background noise, that can throw them for a loop. Another problem? These tools are trained on the technologies available to them now, rather than the AI audio that is currently coming out or on its way. They might be able to detect any of the examples listed in this article, but if someone makes a fake Tim Walz recording tomorrow with a new model, they might not catch it. NPR tested three AI detection tools earlier this year, and found that two of them, AI or Not and AI Voice Detector, were wrong about half the time. The other, Pindrop Security, correctly identified 81 of the 84 sample clips submitted, which is impressive.
If you have a recording you aren't sure about, you can give one of these tools a shot. Just understand the limitations of the programs you're using.
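The unnatural-pause tell described above can also be eyeballed in code. This is a toy sketch, assuming numpy and synthetic audio; the frame size and silence threshold are arbitrary choices, and real speech would need much more careful segmentation than a fixed energy cutoff.

```python
# Sketch: find runs of low-energy frames ("silence") and report their
# durations. Long, rigid gaps mid-sentence are one of the pacing quirks
# described in the article.
import numpy as np

SR = 16_000          # sample rate (Hz)
FRAME = 320          # 20 ms frames
SILENCE_RMS = 0.01   # arbitrary energy threshold for "silence"

def silent_gaps(signal, sr=SR, frame=FRAME, threshold=SILENCE_RMS):
    """Return the duration (seconds) of each run of silent frames."""
    n_frames = len(signal) // frame
    frames = signal[: n_frames * frame].reshape(n_frames, frame)
    rms = np.sqrt((frames ** 2).mean(axis=1))
    silent = rms < threshold
    gaps, run = [], 0
    for s in silent:
        if s:
            run += 1
        elif run:
            gaps.append(run * frame / sr)
            run = 0
    if run:
        gaps.append(run * frame / sr)
    return gaps

rng = np.random.default_rng(1)
word = 0.3 * rng.standard_normal(int(0.4 * SR))  # 400 ms of "speech" (noise)
pause = np.zeros(int(0.8 * SR))                  # an 800 ms robotic pause
clip = np.concatenate([word, pause, word])

print(f"longest gap: {max(silent_gaps(clip)):.2f} s")  # the 800 ms pause
```

On real recordings you would compare gap statistics against typical human pacing rather than reading off a single number; this just shows the measurement.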
In recent years, artificial intelligence has made significant strides in creative fields, particularly in music and speech synthesis. As these AI-generated creations become more sophisticated, it's becoming increasingly challenging for the average listener to distinguish between human-made and AI-generated content. This article delves into the methods and tools available to identify AI-created songs and voices, highlighting the importance of media literacy in the digital age.
The music industry has seen a surge in AI-generated content, from complete songs to individual instrumental tracks. While some AI creations are easily identifiable, others can be remarkably convincing. Here are some ways to spot AI-generated music:
Repetitive patterns: AI-generated music often features repetitive melodies or chord progressions that may sound unnatural to the human ear [1].
Lack of dynamics: Many AI music generators struggle with creating natural-sounding dynamics and expression in their compositions.
Unusual instrument combinations: AI might combine instruments in ways that human composers typically wouldn't, resulting in odd sonic textures.
Lyrics and vocals: AI-generated lyrics often lack coherence or emotional depth, while AI vocals may sound slightly robotic or lack natural inflections.
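As a toy illustration of the "repetitive patterns" point above: a track that loops a bar exactly shows a near-perfect autocorrelation peak at the loop length. This sketch assumes numpy and synthetic signals; real analysis would work on perceptual features such as chroma or spectrogram frames, not raw samples.

```python
# Sketch: normalized autocorrelation at a fixed lag as a repetition score.
import numpy as np

def repetition_score(signal, period):
    """Normalized correlation between the signal and itself shifted by
    `period` samples. Values near 1.0 mean near-exact repetition."""
    a, b = signal[:-period], signal[period:]
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / np.sqrt((a ** 2).sum() * (b ** 2).sum()))

rng = np.random.default_rng(2)
period = 1_000

loop = np.tile(rng.standard_normal(period), 8)  # one "bar" looped 8 times
varied = rng.standard_normal(period * 8)        # no exact repetition

print(round(repetition_score(loop, period), 2))    # near 1.0
print(round(repetition_score(varied, period), 2))  # near 0.0
```

Human music repeats too, of course; the tell is repetition that is suspiciously exact, which is why the score for the looped signal saturates.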
Several online tools and services have emerged to help listeners identify AI-generated music:
Deezer's "Spleeter" tool: This open-source AI tool can separate vocals from instrumentals, potentially revealing telltale signs of AI generation [1].
Hume AI: This platform analyzes the emotional content of music, which could help identify AI-generated tracks that lack genuine emotional expression.
As with music, AI-generated speech has become increasingly sophisticated. However, there are still ways to identify artificial voices:
Unnatural pauses or rhythm: AI-generated speech may have odd pauses or an unnatural rhythm that doesn't match human speech patterns [2].
Mispronunciations or accent inconsistencies: AI voices might struggle with certain words or have inconsistent accents throughout a recording.
Lack of background noise: AI-generated speech often lacks the ambient sounds typically present in human recordings.
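The "lack of background noise" point above is also measurable: real-world recordings carry a nonzero noise floor even between words, while synthetic speech is often laid over pure digital silence. A toy numpy sketch, with synthetic signals standing in for real recordings:

```python
# Sketch: estimate the noise floor as the RMS of the quietest frame.
# A floor of exactly zero suggests digitally generated or heavily gated audio.
import numpy as np

def noise_floor(signal, frame=400):
    """RMS of the quietest frame: a crude noise-floor estimate."""
    n = len(signal) // frame
    frames = signal[: n * frame].reshape(n, frame)
    return float(np.sqrt((frames ** 2).mean(axis=1)).min())

rng = np.random.default_rng(3)
speech = 0.5 * rng.standard_normal(8_000)         # stand-in for spoken words
room_noise = 0.005 * rng.standard_normal(24_000)  # constant ambient hiss

# A "real" recording: speech over room tone, with room tone in the pauses.
real = np.concatenate([room_noise[:8_000],
                       speech + room_noise[8_000:16_000],
                       room_noise[16_000:]])
# A "synthetic" clip: the same speech dropped onto pure digital silence.
synthetic = np.concatenate([np.zeros(8_000), speech, np.zeros(8_000)])

print(f"real noise floor:      {noise_floor(real):.4f}")       # small, nonzero
print(f"synthetic noise floor: {noise_floor(synthetic):.4f}")  # exactly zero
```

Noise-gated human recordings can also hit zero, so, as with the other checks, this is a hint rather than a verdict.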
Several tools and techniques can help identify AI-generated speech:
Spectrogram analysis: Examining the visual representation of audio frequencies can reveal patterns typical of AI-generated speech [2].
AI-powered detection tools: Platforms like AI or Not and Pindrop Security use machine learning models to analyze audio and determine whether it's AI-generated.
Critical listening: Training oneself to listen for subtle cues in pitch, tone, and rhythm can help identify AI-generated speech.
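The spectrogram analysis mentioned in the list above is straightforward to try yourself: a short-time Fourier transform turns audio into a time-frequency grid you can inspect for hard cutoffs or unnaturally clean frequency bands. A minimal numpy sketch on a synthetic tone, which stands in for a real clip:

```python
# Sketch: a tiny short-time Fourier transform (non-overlapping Hann-windowed
# frames) producing the power grid a spectrogram image is drawn from.
import numpy as np

SR = 8_000     # sample rate (Hz)
NPERSEG = 256  # frame length in samples

def stft_power(signal, nperseg=NPERSEG):
    """Power spectrum per frame, shape (n_frames, n_bins)."""
    n = len(signal) // nperseg
    frames = signal[: n * nperseg].reshape(n, nperseg)
    window = np.hanning(nperseg)
    return np.abs(np.fft.rfft(frames * window, axis=1)) ** 2

t = np.arange(SR) / SR
audio = np.sin(2 * np.pi * 440 * t)  # one second of a 440 Hz tone

power = stft_power(audio)
freqs = np.fft.rfftfreq(NPERSEG, d=1 / SR)
peak_per_frame = freqs[power.argmax(axis=1)]
print(f"median peak frequency: {np.median(peak_per_frame):.0f} Hz")  # ~440 Hz
```

With real audio you would plot `power` (log-scaled, frames on the x-axis) and look at the picture; libraries such as scipy or librosa provide ready-made spectrogram functions that do the same thing with proper overlap.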
As AI technology continues to advance, the line between human-created and AI-generated content will likely become even more blurred. This underscores the growing importance of media literacy and critical thinking skills in navigating the digital landscape. By familiarizing ourselves with the characteristics of AI-generated content and utilizing available detection tools, we can better discern the origin and authenticity of the media we consume.
References

[1] Lifehacker, "How to Tell If That Song Was Made With AI"
[2] Lifehacker, "How to Identify AI-Generated Speech"