2 Sources
[1]
Deepfakes leveled up in 2025 - here's what's coming next
Over the course of 2025, deepfakes improved dramatically. AI-generated faces, voices and full-body performances that mimic real people improved in quality far beyond what many experts expected just a few years ago. They were also increasingly used to deceive people. For many everyday scenarios -- especially low-resolution video calls and media shared on social media platforms -- their realism is now high enough to reliably fool nonexpert viewers. In practical terms, synthetic media have become indistinguishable from authentic recordings for ordinary people and, in some cases, even for institutions.

And this surge is not limited to quality. The volume of deepfakes has grown explosively: Cybersecurity firm DeepStrike estimates an increase from roughly 500,000 online deepfakes in 2023 to about 8 million in 2025, with annual growth nearing 900%.

I'm a computer scientist who researches deepfakes and other synthetic media. From my vantage point, the situation is likely to get worse in 2026 as deepfakes become synthetic performers capable of reacting to people in real time.

Dramatic improvements

Several technical shifts underlie this dramatic escalation. First, video realism made a significant leap thanks to video generation models designed specifically to maintain temporal consistency. These models produce videos that have coherent motion, consistent identities of the people portrayed, and content that makes sense from one frame to the next. The models disentangle the information representing a person's identity from the information about motion, so that the same motion can be mapped to different identities, or the same identity can perform multiple types of motion. These models produce stable, coherent faces without the flicker, warping or structural distortions around the eyes and jawline that once served as reliable forensic evidence of deepfakes.

Second, voice cloning has crossed what I would call the "indistinguishable threshold." A few seconds of audio now suffice to generate a convincing clone - complete with natural intonation, rhythm, emphasis, emotion, pauses and breathing noise. This capability is already fueling large-scale fraud. Some major retailers report receiving over 1,000 AI-generated scam calls per day. The perceptual tells that once gave away synthetic voices have largely disappeared.

Third, consumer tools have pushed the technical barrier almost to zero. Upgrades from OpenAI's Sora 2 and Google's Veo 3 and a wave of startups mean that anyone can describe an idea, let a large language model such as OpenAI's ChatGPT or Google's Gemini draft a script, and generate polished audio-visual media in minutes. AI agents can automate the entire process. The capacity to generate coherent, storyline-driven deepfakes at a large scale has effectively been democratized.

This combination of surging quantity and personas that are nearly indistinguishable from real humans creates serious challenges for detecting deepfakes, especially in a media environment where people's attention is fragmented and content moves faster than it can be verified. There has already been real-world harm - from misinformation to targeted harassment and financial scams - enabled by deepfakes that spread before people have a chance to realize what's happening.
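To illustrate the identity/motion disentanglement described above, here is a minimal schematic sketch in PyTorch. The module names, layer sizes, and tensor shapes are invented for illustration only; production video generators are far more elaborate and typically diffusion- or transformer-based.

```python
# Schematic sketch of identity/motion disentanglement in a video generator.
# Shapes, layer sizes, and module names are invented for illustration; this
# is NOT any specific published model.
import torch
import torch.nn as nn

class DisentangledGenerator(nn.Module):
    def __init__(self, img_dim=1024, id_dim=128, motion_dim=64):
        super().__init__()
        self.identity_encoder = nn.Linear(img_dim, id_dim)    # who the person is
        self.motion_encoder = nn.Linear(img_dim, motion_dim)  # how they move
        self.decoder = nn.Linear(id_dim + motion_dim, img_dim)

    def forward(self, identity_frame, motion_frame):
        # Identity comes from one clip, motion from another; the decoder
        # recombines them, so the same motion can drive different identities.
        id_code = self.identity_encoder(identity_frame)
        motion_code = self.motion_encoder(motion_frame)
        return self.decoder(torch.cat([id_code, motion_code], dim=-1))

gen = DisentangledGenerator()
person_a = torch.randn(1, 1024)         # flattened features of person A
person_b_motion = torch.randn(1, 1024)  # frame from a clip of person B moving
fake_frame = gen(person_a, person_b_motion)  # person A performing B's motion
print(fake_frame.shape)                 # torch.Size([1, 1024])
```

Encoding identity and motion into separate latent codes is the design choice that lets one clip's motion drive another person's likeness, and it is why the resulting faces stay stable from frame to frame.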
The future is real time

Looking forward, the trajectory for next year is clear: Deepfakes are moving toward real-time synthesis that can produce videos closely resembling the nuances of a human's appearance, making it easier for them to evade detection systems. The frontier is shifting from static visual realism to temporal and behavioral coherence: models that generate live or near-live content rather than pre-rendered clips. Identity modeling is converging into unified systems that capture not just how a person looks, but how they move, sound and speak across contexts. The result goes beyond "this resembles person X" to "this behaves like person X over time."

I expect entire video-call participants to be synthesized in real time; interactive AI-driven actors whose faces, voices and mannerisms adapt instantly to a prompt; and scammers deploying responsive avatars rather than fixed videos.

As these capabilities mature, the perceptual gap between synthetic and authentic human media will continue to narrow. The meaningful line of defense will shift away from human judgment. Instead, it will depend on infrastructure-level protections. These include secure provenance, such as cryptographically signed media and AI content tools that use the Coalition for Content Provenance and Authenticity (C2PA) specifications. It will also depend on multimodal forensic tools such as my lab's Deepfake-o-Meter. Simply looking harder at pixels will no longer be adequate.
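To make "cryptographically signed media" concrete, here is a minimal sketch using Ed25519 from the Python cryptography library. It signs a content hash and verifies it later; the actual C2PA specification defines a much richer manifest of signed assertions, so treat this as the underlying principle only.

```python
# Minimal sketch of cryptographic media signing, in the spirit of provenance
# schemes like C2PA. This signs a raw SHA-256 hash with Ed25519; the real
# C2PA spec defines a full manifest format (assertions, claims, cert chains).
import hashlib
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

# A capture device or editing tool would hold the private key;
# verifiers only need the public key.
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

media_bytes = b"...raw video bytes..."         # stand-in for a real file
digest = hashlib.sha256(media_bytes).digest()  # fingerprint of the content
signature = private_key.sign(digest)           # shipped as provenance metadata

# Later, anyone holding the public key can check that the content
# has not been altered since it was signed.
try:
    public_key.verify(signature, digest)
    print("Provenance intact: content matches the signed hash")
except InvalidSignature:
    print("Content was modified after signing")
```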
[2]
2026 will be the year you get fooled by a deepfake, researcher says. Voice cloning has crossed the 'indistinguishable threshold' | Fortune
Deepfakes evolved dramatically in 2025, with AI-generated faces and voices becoming indistinguishable from authentic media for most viewers. Cybersecurity firm DeepStrike reports deepfakes surged from 500,000 in 2023 to 8 million in 2025—nearly 900% annual growth. Researchers warn 2026 will bring real-time interactive deepfakes capable of responding instantly, making detection increasingly difficult and shifting defense strategies from human judgment to infrastructure-level protections.
Deepfakes achieved a troubling milestone in 2025, with synthetic media technology advancing to the point where AI-generated faces and voices have become indistinguishable from authentic media for ordinary viewers and even some institutions [1]. The quality leap means that for many everyday scenarios, particularly low-resolution video calls and social media content, nonexpert viewers can no longer reliably distinguish synthetic content from genuine recordings [2].
The surge extends beyond quality improvements. Cybersecurity firm DeepStrike estimates deepfakes increased from roughly 500,000 online in 2023 to approximately 8 million in 2025, representing annual growth nearing 900% [1]. This explosive volume, combined with near-perfect realism, creates serious challenges for detection systems, especially in a fragmented media environment where content spreads faster than verification can occur.

Video realism made a significant leap through models designed specifically to maintain temporal consistency. These systems produce videos with coherent motion, consistent identities, and logical frame-to-frame progression [2]. The models disentangle identity information from motion data, allowing the same motion to map across different identities or enabling one identity to exhibit multiple motion types. These advances produce stable, coherent faces without the flicker, warping, or structural distortions around eyes and jawlines that once served as reliable forensic evidence [1].
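As a side note on the volume figures above, a quick bit of compound-growth arithmetic (an illustration only, not DeepStrike's methodology) shows how the 16-fold rise decomposes per year:

```python
# Implied compound annual growth rate (CAGR) for the reported deepfake volumes.
# Illustrative arithmetic only; DeepStrike's ~900% figure presumably measures
# year-over-year growth against its own baseline, not this two-year compounding.
count_2023 = 500_000
count_2025 = 8_000_000
years = 2

total_multiple = count_2025 / count_2023            # 16x overall
cagr = total_multiple ** (1 / years) - 1            # 3.0 -> 300% per year

print(f"{total_multiple:.0f}x over {years} years")  # "16x over 2 years"
print(f"Implied CAGR: {cagr:.0%}")                  # "Implied CAGR: 300%"
```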
Voice cloning crossed what researchers call the "indistinguishable threshold" [2]. Just a few seconds of audio now suffice to generate a highly convincing clone, complete with natural intonation, rhythm, emphasis, emotion, pauses, and breathing noise. This capability already fuels large-scale fraud, with some major retailers reporting over 1,000 AI-generated scam calls per day [1]. The perceptual tells that once revealed synthetic voices have largely disappeared.

Consumer deepfake tools have pushed the technical barrier nearly to zero. Upgrades from OpenAI's Sora 2 and Google's Veo 3, alongside a wave of startups, mean anyone can describe an idea, let large language models like ChatGPT or Gemini draft scripts, and generate polished audio-visual media within minutes [2].
AI agents can automate the entire process, effectively democratizing the capacity to generate coherent, storyline-driven deepfakes at scale.

Looking toward 2026, researchers expect deepfakes to evolve into real-time interactive synthetic performers capable of responding to people instantly [1]. The frontier shifts from static visual realism to temporal and behavioral coherence: models generating live or near-live content rather than pre-rendered clips. Identity modeling converges into unified systems capturing not just appearance, but how people move, sound, and speak across contexts [2].

This evolution moves beyond "this resembles person X" to "this behaves like person X over time." Experts anticipate entire video-call participants synthesized in real time, interactive AI-driven actors whose faces, voices, and mannerisms adapt instantly to prompts, and scammers deploying responsive avatars rather than fixed videos [1].
The combination of surging quantity and personas nearly indistinguishable from real humans has already caused real-world harm through misinformation, targeted harassment, and financial scams [2]. As capabilities mature, the perceptual gap between synthetic and authentic human media continues narrowing, making human judgment inadequate for detection.

The meaningful line of defense shifts to infrastructure-level protections. These include secure content provenance through cryptographic media signing and AI content tools that follow the Coalition for Content Provenance and Authenticity (C2PA) specifications [1]. Multimodal forensic tools like Deepfake-o-Meter offer additional detection capabilities, but simply examining pixels more carefully will no longer suffice as synthetic performers achieve behavioral coherence indistinguishable from genuine human interaction [2].
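To make the idea of multimodal forensics concrete, here is a minimal late-fusion scoring sketch. The per-modality detectors, scores, weights, and threshold are hypothetical placeholders, not Deepfake-o-Meter's actual models or API.

```python
# Minimal late-fusion sketch for multimodal deepfake scoring.
# All detectors, scores, and weights below are hypothetical placeholders;
# they are NOT Deepfake-o-Meter's actual architecture.
from typing import Dict

def fuse_scores(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-modality 'probability fake' scores in [0, 1]."""
    total_weight = sum(weights[m] for m in scores)
    return sum(scores[m] * weights[m] for m in scores) / total_weight

# Hypothetical per-modality outputs for one clip (0 = real, 1 = fake).
scores = {"visual": 0.62, "audio": 0.91, "temporal": 0.74}
weights = {"visual": 0.4, "audio": 0.35, "temporal": 0.25}

fused = fuse_scores(scores, weights)
print(f"Fused fake-probability: {fused:.2f}")  # ~0.75
if fused > 0.5:
    print("Flag for human review")
```

Fusing modalities this way reflects the point above: a synthetic clip that passes a visual check may still betray itself in audio or in temporal behavior, so no single-pixel-level test is decisive on its own.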
Summarized by Navi