New 'Bad Likert Judge' AI Jailbreak Technique Bypasses LLM Safety Guardrails

2 Sources

Cybersecurity researchers unveil a new AI jailbreak method called 'Bad Likert Judge' that significantly increases the success rate of bypassing large language model safety measures, raising concerns about potential misuse of AI systems.

News article

New AI Jailbreak Technique Unveiled

Cybersecurity researchers from Palo Alto Networks' Unit 42 have discovered a new jailbreak technique called 'Bad Likert Judge' that could potentially bypass the safety guardrails of large language models (LLMs). This multi-turn attack strategy significantly increases the likelihood of generating harmful or malicious responses from AI systems 1.

How 'Bad Likert Judge' Works

The technique exploits the LLM's own understanding of harmful content by:

  1. Asking the LLM to act as a judge, scoring the harmfulness of responses using the Likert scale.
  2. Requesting the LLM to generate responses aligning with different scale points.
  3. Potentially extracting harmful content from the example with the highest Likert scale rating.

This method leverages the model's long context window and attention mechanisms, gradually nudging it towards producing malicious responses without triggering internal protections 1.

Impact on AI Security

Unit 42's research revealed alarming results:

  • The technique increased attack success rates (ASR) by over 60% compared to plain attack prompts.
  • Tests were conducted across various categories, including hate speech, harassment, self-harm, sexual content, and malware generation.
  • Six state-of-the-art text-generation LLMs from major tech companies were evaluated 2.

Implications and Recommendations

The discovery of 'Bad Likert Judge' highlights the ongoing challenges in AI security:

  1. Content filtering remains crucial, reducing ASR by an average of 89.2 percentage points across tested models.
  2. Developers and organizations using LLMs should implement comprehensive content filtering as a best practice.
  3. The research aims to help defenders prepare for potential attacks using this technique 2.

Broader Context of AI Vulnerabilities

This new jailbreak method adds to the growing list of AI security concerns:

  • Earlier reports revealed ChatGPT's vulnerability to generating misleading summaries based on hidden content in web pages 1.
  • Cybersecurity firm Trend Micro previously reported on "jailbreak-as-a-service" offerings that trick AI chatbots into generating prohibited content 2.

Future Implications

As AI systems become more prevalent, the need for robust security measures intensifies:

  1. Organizations are advised to fortify their cyberdefenses proactively.
  2. Monitoring criminal forums may help prepare for worst-case scenarios involving AI.
  3. The evolving threat landscape underscores the importance of responsible AI development and deployment 2.
Explore today's top stories

Google Offers Free Weekend Access to Gemini's Veo 3 AI Video Generation Tool

Google is providing free users of its Gemini app temporary access to the Veo 3 AI video generation tool, typically reserved for paying subscribers, for a limited time this weekend.

Android Police logo9to5Google logoTechRadar logo

3 Sources

Technology

21 hrs ago

Google Offers Free Weekend Access to Gemini's Veo 3 AI

UK Government Considers Nationwide ChatGPT Plus Access in Talks with OpenAI

The UK's technology secretary and OpenAI's CEO discussed a potential multibillion-pound deal to provide ChatGPT Plus access to all UK residents, highlighting the government's growing interest in AI technology.

The Guardian logoDigital Trends logo

2 Sources

Technology

5 hrs ago

UK Government Considers Nationwide ChatGPT Plus Access in

AI-Generated Articles Slip Through Editorial Filters at Major Publications

Multiple news outlets, including Wired and Business Insider, have been duped by AI-generated articles submitted under a fake freelancer's name, raising concerns about the future of journalism in the age of artificial intelligence.

Wired logoThe Guardian logoFuturism logo

4 Sources

Technology

2 days ago

AI-Generated Articles Slip Through Editorial Filters at

Google's New Gemini-Powered Smart Speaker: A Glimpse into the Future of AI Home Assistants

Google inadvertently revealed a new smart speaker during its Pixel event, sparking speculation about its features and capabilities. The device is expected to be powered by Gemini AI and could mark a significant upgrade in Google's smart home offerings.

engadget logoGizmodo logoPCWorld logo

5 Sources

Technology

1 day ago

Google's New Gemini-Powered Smart Speaker: A Glimpse into

The Evolution of Search: How AI and Changing User Behavior Are Reshaping Digital Marketing

As AI and new platforms transform search behavior, brands must adapt their strategies beyond traditional SEO to remain visible in an increasingly fragmented digital landscape.

Gulf Business logoCampaign India logo

2 Sources

Technology

1 day ago

The Evolution of Search: How AI and Changing User Behavior
TheOutpost.ai

Your Daily Dose of Curated AI News

Don’t drown in AI news. We cut through the noise - filtering, ranking and summarizing the most important AI news, breakthroughs and research daily. Spend less time searching for the latest in AI and get straight to action.

© 2025 Triveous Technologies Private Limited
Instagram logo
LinkedIn logo