Curated by THEOUTPOST
On Wed, 20 Nov, 12:03 AM UTC
2 Sources
[1]
Trying to Watermark LLMs is Useless
It turns out that watermarking is not enough to reduce the spread of AI-driven "misinformation". There is still work to be done.

Two years ago, ChatGPT started a revolution. Apart from sparking the generative AI and LLM craze, the rise of AI also triggered a wave of protests from writers and artists against the use of AI-generated content. Rather than promising a ban on text-generating models, big tech and the companies building these models promised to watermark the text their AI generates.

Watermarks aim to improve transparency by labelling AI-generated content, aiding in the detection of malicious uses. Achieving effective watermarking, however, means balancing competing parameters such as robustness, difficulty of detection, and resistance to removal or spoofing.

Fast-forward two years, and nearly everyone has tried their hand at it. Most recently, Google DeepMind, in collaboration with Hugging Face, open-sourced its research titled 'Scalable watermarking for identifying large language model outputs' in a bid to distinguish between human and AI content on the internet. Launched exactly a year ago, their watermarking tool SynthID is now available for wider access and is designed to have negligible computational impact, which makes it suitable for both cloud and on-device detection.

But what is the point of this tool if OpenAI, Microsoft, and Meta have already tried and failed to build a good watermarking or AI-detection tool? On a similar note, researchers from Carnegie Mellon University's School of Computer Science have analysed the tradeoffs in popular watermarking techniques for text generated by LLMs.

The goal of watermarking is often misunderstood. While it can identify text from a specific, watermarked LLM, it cannot reliably distinguish between AI-generated and human-authored text. The former benefits developers; the latter -- purportedly protecting society from misinformation or misuse -- is practically unattainable with current technology.

Ethan Mollick, professor and co-director of the Generative AI Lab at Wharton, commenting on the launch of SynthID, said, "A new watermarking AI paper is causing a stir, but note that watermarking does not solve for identifying AI content in the real world, because it requires cooperation from the AI company and also breaks when people edit the text."

According to researchers, there are three core problems with watermarking AI text.

First, every capable LLM would have to be watermarked. Open-source models like Llama 3.1 405B have already been downloaded millions of times without watermarks, and that cannot be undone. Harmful actors will always have access to unwatermarked models.

Second, no LLM provider could keep offering users control over token selection and still watermark reliably. Features like temperature settings, which control the randomness of the generated text, are essential for useful applications but sit at odds with watermarking (a toy illustration of what temperature does to the sampling step follows below). Removing these controls could ironically increase harm by weakening existing harm-reduction mechanisms designed to balance creativity and safety.

Third, and most importantly, because watermarking is applied during text generation, it is trivial to disable in open-source models, so a watermarked open-source model cannot meaningfully exist. Moreover, bad actors prefer open models for privacy, rendering API-based watermarking ineffective.
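The second of these problems turns on how sampling controls such as temperature shape the next-token distribution an LLM draws from. As a rough illustration only (plain NumPy, not any vendor's implementation, with a toy four-token vocabulary), the sketch below shows softmax-with-temperature sampling, which operates on the very distribution a watermark would have to bias or constrain.

```python
# Minimal sketch (not any vendor's code) of how a temperature setting reshapes
# the next-token distribution an LLM samples from. Watermarking schemes act on
# this same distribution, which is why sampling controls and watermarking are
# treated above as competing concerns.
import numpy as np

def sample_next_token(logits, temperature=1.0, rng=None):
    """Softmax-with-temperature sampling over raw next-token logits."""
    rng = rng or np.random.default_rng()
    scaled = np.asarray(logits, dtype=float) / max(temperature, 1e-6)
    probs = np.exp(scaled - scaled.max())   # numerically stable softmax
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

toy_logits = [2.0, 1.0, 0.2, -1.0]                       # toy 4-token vocabulary
print(sample_next_token(toy_logits, temperature=0.7))    # sharper, more deterministic
print(sample_next_token(toy_logits, temperature=1.5))    # flatter, more random
```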
A recent research paper, for example, found that LLM watermarking is susceptible to spoofing and scrubbing attacks: for under $50, attackers can bypass existing schemes with over 80% success, underlining the need for stronger protections. Even if powerful future models are only available through APIs, paraphrasing tools derived from open-source models will always manage to bypass watermarks; the same can be achieved with other LLMs, or in some cases even the same one (a toy illustration of paraphrase-based scrubbing follows at the end of this piece).

Earlier, AIM revealed that AI detection tools are largely ineffective, frequently misclassifying both AI-generated and human-written text. The Bhagavad Gita, part of the Mahabharata and believed to have been written between 400 BCE and 200 CE by the sage Veda Vyasa, has been attributed to AI. There's more: even the Preamble of the Indian Constitution is supposedly AI-generated, according to many inaccurate AI detectors. Copyright and watermarks are irrelevant for such historic texts, so when detection tools declare these ancient works AI-generated, it becomes clear that text is not a medium that can be reliably watermarked at all.

Assuming watermarking miraculously worked, would it solve the problem? No, for two key reasons. First, AI-generated and human text are intertwined, as human writers often use LLMs for editing, summarisation or translation. Second, not every AI-generated text is harmful.

To put it simply, in the words of Dominik Lukes, lead business technologist at the AI/ML support competency centre at the University of Oxford, "Even if AI-generated fraudulent text was a bigger problem than human-generated fraudulent text, watermarking would not fix it. Fraudsters would simply use non-watermarked models. Also, outside a school exam, the use of an LLM is no longer a reliable indicator of fraud."
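To make the paraphrase-based "scrubbing" idea concrete, here is a hedged, illustrative sketch using the Hugging Face transformers pipeline. The specific paraphrasing model named below is only an example choice for illustration, not something used or endorsed by the research discussed above, and some paraphrasers expect a different prompt format, so check the model card before running it.

```python
# Illustrative sketch of paraphrase-based watermark scrubbing: pass watermarked
# text through an open-weight paraphraser so that the token-level statistics a
# watermark relies on no longer survive. The model name is an example choice
# only; any capable open-weight paraphraser or general LLM would do.
from transformers import pipeline

paraphraser = pipeline("text2text-generation",
                       model="humarin/chatgpt_paraphraser_on_T5_base")

watermarked_text = "Watermarked output from an API-only model would go here."
# Many T5-based paraphrasers expect a task prefix such as "paraphrase: ";
# consult the chosen model's card for its exact input format.
prompt = "paraphrase: " + watermarked_text
rewritten = paraphraser(prompt, max_new_tokens=64, do_sample=True)[0]["generated_text"]
print(rewritten)   # same meaning, different tokens, diluted watermark signal
```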
[2]
Watermarked LLMs Offer Benefits, but Leading Strategies Come With Tradeoffs
It's increasingly difficult to discern between content generated by humans and artificial intelligence. To help create more transparency around this issue and detect when AI-generated content is used maliciously, computer scientists are researching ways to label content created by large language models (LLMs). One solution: using watermarks. A new study from School of Computer Science researchers looks at the tradeoffs of some of the most popular techniques used to watermark content generated by LLMs and offers strategies to mitigate their shortcomings.

As AI becomes more common, legislators want to make its use more transparent. For example, President Joe Biden's executive order on safe, secure and trustworthy artificial intelligence calls for more guidance around "content authentication and watermarking to clearly label AI-generated content." Additionally, the governor of California signed a legislative package in September to protect Californians from the harms of generative AI; one of the measures in the package requires AI companies to watermark their models. Many tech companies also want watermarks for content generated by their AI models.

Current work seeks to embed invisible watermarks in AI-generated images, videos and audio, but watermarking text is more challenging. Previous approaches, such as classifiers -- which attempt to distinguish human-generated texts from AI-generated texts in a manner similar to a Turing test -- often turn up false positives, said Wenting Zheng, an assistant professor in CMU's Computer Science Department (CSD).

"Watermarking is interesting because it has a pretty nice cryptographic foundation as well," Zheng said, noting that it can use cryptographic methods such as encryption and keys.

Computer scientists consider certain parameters when designing watermarks and may choose to prioritize one parameter over another. Watermarked output text should retain the meaning of the original text, and the watermark should be difficult to both detect and remove. Researchers at CMU found that some of these parameters are often at odds, and that all the watermark design approaches contain vulnerabilities.

"The goal in large language model watermarking is to provide some signal in the LLM output text that can help determine whether or not candidate text was generated by a specific LLM," said Virginia Smith, the Leonardo Associate Professor of Machine Learning. "It's very difficult to find the right balance between them to make watermarking widely useful in practice."

In their research, the CMU team examined watermarking schemes that used robustness, multiple keys and publicly available detection APIs. In general, LLMs take prompts from human users, turn them into tokens and use previous sequences of tokens to return a probability distribution for the next token. For a text prompt, a large language model essentially predicts the next word in a sentence by choosing the one with the highest probability of being the correct next word. Watermarking involves embedding a secret pattern into the text. In robust watermarking schemes such as KGW, Unigram and Exp, the watermark is embedded into the probability distribution of the tokens (a toy sketch of this idea follows below).
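As a toy illustration of the "green list" idea popularised by KGW-style watermarking, and not the code from any of the papers cited here, the sketch below hashes a secret key together with the previous token to pseudo-randomly split the vocabulary, then nudges the logits of the "green" half before sampling. The vocabulary size, key, green fraction and bias strength are all illustrative placeholders.

```python
# Illustrative sketch of green-list watermarking at generation time: a secret
# key plus the previous token seeds a pseudo-random vocabulary split, and green
# tokens get a small logit boost so watermarked text over-represents them in a
# statistically detectable way. All constants below are placeholders.
import hashlib
import numpy as np

VOCAB_SIZE = 50_000
GREEN_FRACTION = 0.5                 # fraction of the vocabulary marked "green" each step
DELTA = 2.0                          # logit bias added to green tokens
SECRET_KEY = b"example-secret-key"   # hypothetical key, for illustration only

def green_mask(prev_token: int) -> np.ndarray:
    """Deterministically split the vocabulary using the key and the previous token."""
    seed = int.from_bytes(
        hashlib.sha256(SECRET_KEY + prev_token.to_bytes(4, "big")).digest()[:8], "big"
    )
    rng = np.random.default_rng(seed)
    mask = np.zeros(VOCAB_SIZE, dtype=bool)
    mask[rng.choice(VOCAB_SIZE, int(GREEN_FRACTION * VOCAB_SIZE), replace=False)] = True
    return mask

def watermarked_sample(logits: np.ndarray, prev_token: int) -> int:
    """Bias green-token logits by DELTA, then sample the next token."""
    biased = logits + DELTA * green_mask(prev_token)
    probs = np.exp(biased - biased.max())
    probs /= probs.sum()
    return int(np.random.default_rng().choice(VOCAB_SIZE, p=probs))

# Demo with random "logits" standing in for a real model's output at one step.
fake_logits = np.random.default_rng(0).normal(size=VOCAB_SIZE)
print(watermarked_sample(fake_logits, prev_token=42))
```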
During detection, developers can run statistical tests on the sentences to get a confidence score for whether they were generated by a watermarked LLM (a toy sketch of such a test follows at the end of this piece). Although these watermarks are hard to remove, which makes them robust, they can be subject to spoofing attacks by malicious actors.

"If you edit the sentence, you can make the sentence inaccurate or toxic, but it will still be detected as watermarked as long as the edited sentence and the original watermarked sentence are close in editing distance," said CSD Ph.D. student Qi Pang.

Such spoofing attacks are hard to defend against. They can also make models seem unreliable and ruin the reputation of model developers. Some techniques to defend against spoofing include combining these robust watermarks with signature-based watermarks, which can be fragile on their own.

"It's important to educate the public that when you see the detection result is watermarked, it doesn't indicate that the whole sentence is generated by the watermarked large language model," Pang warned. "It only indicates that we have high confidence that a large portion of the tokens are from the watermarked large language model."

Another popular design choice for LLM watermarks uses multiple secret keys to embed the watermark, as is standard practice in cryptography. While this approach can better hide the watermark's pattern, attackers can input the same prompt into a model multiple times to sample the distribution pattern of the keys and remove the watermark.

The last design choice the team reviewed was public detection APIs, where any user can query an API to see if a sentence is watermarked. Experts still debate making watermark detection APIs public: although such a tool could help anyone detect watermarked texts, it could also allow bad actors to determine which words or tokens contain the watermark and swap them out to game the system.

"A defense strategy against this action is to add random noise to the detection scores to make the detection algorithm differentially private," Pang said. Adding random noise would enable the model to account for watermarks in sentences that are fairly close to the original. However, attackers might still be able to query the API multiple times to work out how to remove the watermark, so the team suggests that developers of these services consider limiting queries from potential attackers.

"What our work shows is that you don't get anything for free. If you try to optimize the system toward one goal, often you're opening yourself up to another form of attack," Smith said. "Finding the right balance is difficult. Our work provides some general guidelines for how to think about balancing all these components."
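Continuing the toy green-list sketch above (it reuses green_mask and GREEN_FRACTION from that block), the snippet below shows the flavour of the statistical test and of the noise-based defence Pang describes: a z-score for "more green tokens than chance", plus Laplace noise on the published score. The noise scale is a placeholder; a real differentially private detector would calibrate it to the score's sensitivity.

```python
# Illustrative detection for the toy scheme above, not the detector from the
# cited work: a one-sided z-test on how many tokens fall in their step's green
# list, and a noisy variant sketching the differential-privacy defence.
import numpy as np

def detection_z_score(tokens):
    """z-score for observing more green tokens than chance under GREEN_FRACTION."""
    hits = sum(green_mask(prev)[cur] for prev, cur in zip(tokens, tokens[1:]))
    n = len(tokens) - 1
    expected = GREEN_FRACTION * n
    std = (GREEN_FRACTION * (1 - GREEN_FRACTION) * n) ** 0.5
    return (hits - expected) / std

def noisy_detection_score(tokens, noise_scale=0.5):
    """Detection score plus Laplace noise; noise_scale is a placeholder and would
    need to be calibrated to the score's sensitivity for a real privacy guarantee."""
    noise = np.random.default_rng().laplace(0.0, noise_scale)
    return detection_z_score(tokens) + noise

# Demo: score a short token sequence against the toy watermark above.
print(noisy_detection_score([42, 17, 993, 7, 1208, 5]))
```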
An in-depth look at the complexities surrounding watermarking techniques for AI-generated content, highlighting the trade-offs between effectiveness, robustness, and practical implementation.
Two years after ChatGPT sparked a generative AI revolution, the tech industry is grappling with the challenge of distinguishing between AI-generated and human-authored content. In response to concerns about misinformation and the misuse of AI, major tech companies have turned to watermarking as a potential solution [1].
Google DeepMind, in collaboration with Hugging Face, recently open-sourced their research on scalable watermarking for large language model (LLM) outputs. Their tool, SynthID, aims to identify AI-generated content with minimal computational impact [1]. Meanwhile, researchers from Carnegie Mellon University have analyzed the trade-offs in popular watermarking techniques for LLM-generated text [2].
Despite these efforts, experts argue that watermarking faces significant challenges:
- Every capable LLM would have to be watermarked, yet unwatermarked open-source models such as Llama 3.1 405B have already been downloaded millions of times and cannot be recalled.
- Providers would have to give up user-facing controls over token selection, such as temperature, that are essential for useful applications.
- Watermarks applied at generation time are trivial to disable in open-source models, and paraphrasing tools can strip them from API outputs.

Watermarking text presents unique difficulties compared to other media types. The CMU study highlights several key parameters that often conflict:
- The watermarked output should retain the meaning of the original text.
- The watermark should be difficult for outsiders to detect or spoof.
- The watermark should be difficult to remove, i.e. robust to editing.

Even if watermarking technology improves, it may not fully address the underlying issues:
- AI-generated and human-written text are increasingly intertwined, as writers use LLMs for editing, summarisation and translation.
- Not every AI-generated text is harmful, and bad actors can simply switch to non-watermarked models.

Researchers suggest several strategies to mitigate the shortcomings of current watermarking techniques:
- Combine robust watermarks with signature-based watermarks to resist spoofing.
- Add random noise to detection scores so that public detection APIs are differentially private.
- Limit the number of queries a potential attacker can make against a detection API.
References
[1] Trying to Watermark LLMs is Useless
[2] Carnegie Mellon University: Watermarked LLMs Offer Benefits, but Leading Strategies Come With Tradeoffs