New 'Bad Likert Judge' AI Jailbreak Technique Bypasses LLM Safety Guardrails

New AI Jailbreak Technique Unveiled

Cybersecurity researchers from Palo Alto Networks' Unit 42 have discovered a new jailbreak technique called 'Bad Likert Judge' that could potentially bypass the safety guardrails of large language models (LLMs). This multi-turn attack strategy significantly increases the likelihood of generating harmful or malicious responses from AI systems 1

How 'Bad Likert Judge' Works

The technique exploits the LLM's own understanding of harmful content by:

Asking the LLM to act as a judge, scoring the harmfulness of responses using the Likert scale.
Requesting the LLM to generate responses aligning with different scale points.
Potentially extracting harmful content from the example with the highest Likert scale rating.

This method leverages the model's long context window and attention mechanisms, gradually nudging it towards producing malicious responses without triggering internal protections 1

Impact on AI Security

Unit 42's research revealed alarming results:

The technique increased attack success rates (ASR) by over 60% compared to plain attack prompts.
Tests were conducted across various categories, including hate speech, harassment, self-harm, sexual content, and malware generation.
Six state-of-the-art text-generation LLMs from major tech companies were evaluated 2
2
.

Implications and Recommendations

The discovery of 'Bad Likert Judge' highlights the ongoing challenges in AI security:

Content filtering remains crucial, reducing ASR by an average of 89.2 percentage points across tested models.
Developers and organizations using LLMs should implement comprehensive content filtering as a best practice.
The research aims to help defenders prepare for potential attacks using this technique 2
2
.

Broader Context of AI Vulnerabilities

This new jailbreak method adds to the growing list of AI security concerns:

Earlier reports revealed ChatGPT's vulnerability to generating misleading summaries based on hidden content in web pages 1
1
.
Cybersecurity firm Trend Micro previously reported on "jailbreak-as-a-service" offerings that trick AI chatbots into generating prohibited content 2
2
.

Future Implications

As AI systems become more prevalent, the need for robust security measures intensifies:

Organizations are advised to fortify their cyberdefenses proactively.
Monitoring criminal forums may help prepare for worst-case scenarios involving AI.
The evolving threat landscape underscores the importance of responsible AI development and deployment 2
2
.

New 'Bad Likert Judge' AI Jailbreak Technique Bypasses LLM Safety Guardrails

New AI Jailbreak Technique Unveiled

How 'Bad Likert Judge' Works

Impact on AI Security

Implications and Recommendations

Broader Context of AI Vulnerabilities

Future Implications

References

New AI Jailbreak Method 'Bad Likert Judge' Boosts Attack Success Rates by Over 60%

Unit 42 Warns Developers of Technique That Bypasses LLM Guardrails | PYMNTS.com

Related Stories

Simple "Best-of-N" Technique Easily Jailbreaks Advanced AI Chatbots

OpenAI's GPT-5 and GPT-OSS Models Jailbroken Within Hours of Release

New AI Attack 'Imprompter' Covertly Extracts Personal Data from Chatbot Conversations

Weekly Highlights

Tech Giants Triple Down on AI Infrastructure as Spending Soars to Unprecedented Levels

OpenAI Completes Historic Restructuring, Creates $500 Billion Public Benefit Corporation

Qualcomm Challenges Nvidia with New AI Chips for Data Centers

Weekly Highlights

Today's Top Stories

Nvidia Becomes First Company to Reach $5 Trillion Market Cap Amid AI Boom

Character.AI Bans Open-Ended Chats for Users Under 18 Following Teen Safety Concerns

Nvidia Unveils Vera Rubin Superchip: Six-Trillion Transistor AI Powerhouse Set for 2026 Production

Nvidia Invests $1 Billion in Nokia to Pioneer AI-Powered 6G Networks