Multi-Turn Attacks Expose Frontier Models' Flaws

Cisco Reveals Critical Gap in Frontier Model Safety Testing

A comprehensive evaluation by Cisco's AI Threat Research team has exposed a fundamental vulnerability across the closed frontier AI model landscape. Testing 15 proprietary flagship models from OpenAI, Anthropic, Google, Amazon, and xAI, the researchers found that multi-turn attacks achieved success rates ranging from 7.89% to 88.30%, compared with 2.19% to 64.91% for single-turn attacks on the same models [1]. The findings challenge the dominant safety benchmarks that inform model cards and procurement decisions across the industry, which assume that a single prompt and response adequately characterizes model behavior under adversarial attack.

Single-Turn Benchmarks Mask Real-World LLM Vulnerabilities

The evaluation demonstrates that single-turn attack success rates cannot serve as reliable proxies for what happens when attackers adapt across a conversation. OpenAI's GPT-5.4 moved from 2.74% single-turn to 24.68% multi-turn, a roughly ninefold increase [2]. Google's Gemini 3 Pro rose from 18.10% to 73.35%, while xAI's Grok 4.1 Fast in its non-reasoning configuration climbed from 34.2% to 88.30% [1]. Even Anthropic's Claude family, which posted some of the lowest single-turn attack success rates in the cohort at 2.19% to 3.64%, reached 11.16% to 16.20% under iterative pressure [1]. The two regimes did not produce the same model ordering: models that appeared strong on single-turn benchmarks did not necessarily hold up when attackers could keep talking.
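To make the ordering change concrete, the short Python sketch below ranks the four models for which the article gives individual single-turn and multi-turn figures, from lowest to highest attack success rate, and prints the multi-turn/single-turn ratio for each. The percentages are the reported ones; the `REPORTED` table and the `rank` helper are illustrative names, not part of Cisco's tooling.

```python
# Single-turn and multi-turn attack success rates (%) as reported in the article.
# Only models with individually reported figures are included.
REPORTED = {
    "GPT-5.4": (2.74, 24.68),
    "Gemini 3 Pro": (18.10, 73.35),
    "Grok 4.1 Fast (non-reasoning)": (34.2, 88.30),
    "Nova 2 Lite": (34.05, 7.89),
}


def rank(index: int) -> list[str]:
    """Order models from most to least robust (lowest ASR first) for one regime.

    index 0 selects the single-turn rate, index 1 the multi-turn rate.
    """
    return sorted(REPORTED, key=lambda name: REPORTED[name][index])


single_turn_order = rank(0)
multi_turn_order = rank(1)

for name, (single, multi) in REPORTED.items():
    # How many times more (or less) often attacks succeed once the attacker can keep talking.
    print(f"{name}: {single:.2f}% -> {multi:.2f}% ({multi / single:.1f}x)")

print("single-turn order:", single_turn_order)
print("multi-turn order: ", multi_turn_order)
```

On these figures GPT-5.4 ranks first (most robust) on single-turn prompts, while Nova 2 Lite ranks first once attackers can iterate, which is the kind of reordering the evaluation describes.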

Source: SiliconANGLE

Multi-Turn Adversarial Attacks Reflect Actual Threat Landscape

The research emphasizes that iterative attack scenarios matter because they reflect how real adversaries operate. Attackers reframe refusals, decompose tasks across turns, adopt personas, and escalate gradually, behaviors that single-turn benchmarks cannot capture [1]. Cisco's evaluation drew on 30,090 single-turn prompts and 6,986 multi-turn attacks distributed across 1,456 conversations, all scored under the Cisco Integrated AI Security and Safety Framework taxonomy [2]. Strategy families included role-play and persona adoption, contextual ambiguity, refusal reframing, information decomposition and reassembly, and crescendo-style incremental escalation.
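Reporting results per strategy family, which Cisco also recommends labs publish, amounts to computing an attack success rate (successes divided by attempts) for each family separately. The sketch below assumes each attack is logged with its family label and a boolean verdict from some judge. The `AttackResult` record, the label strings, and the toy data are hypothetical, and the article does not say whether Cisco scores success per attack or per conversation.

```python
from collections import defaultdict
from dataclasses import dataclass

# Strategy families named in the article; these label strings are illustrative.
FAMILIES = frozenset({
    "role-play and persona adoption",
    "contextual ambiguity",
    "refusal reframing",
    "information decomposition and reassembly",
    "crescendo-style escalation",
})


@dataclass(frozen=True)
class AttackResult:
    family: str      # strategy family used by the attack
    succeeded: bool  # verdict from a judge (assumed to be available)


def asr_by_family(results: list[AttackResult]) -> dict[str, float]:
    """Attack success rate per family, in percent: successes / attempts * 100."""
    attempts: dict[str, int] = defaultdict(int)
    successes: dict[str, int] = defaultdict(int)
    for result in results:
        if result.family not in FAMILIES:
            raise ValueError(f"unknown strategy family: {result.family!r}")
        attempts[result.family] += 1
        successes[result.family] += int(result.succeeded)
    return {family: 100.0 * successes[family] / attempts[family] for family in attempts}


# Hypothetical toy results, not data from the Cisco study.
toy = [
    AttackResult("crescendo-style escalation", True),
    AttackResult("crescendo-style escalation", False),
    AttackResult("refusal reframing", False),
    AttackResult("refusal reframing", False),
]
print(asr_by_family(toy))  # {'crescendo-style escalation': 50.0, 'refusal reframing': 0.0}
```

Keeping the per-family breakdown rather than a single pooled number is what lets a reader see that one strategy, not the model as a whole, is driving the failures.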

Deployment-Time Configurations Impact Security Posture

A significant finding concerns how deployment-time configurations affect adversarial success rates. The same Grok 4.1 Fast model dropped from an 88.3% multi-turn attack success rate to 43.5% once reasoning mode was enabled, a swing not captured by any public benchmark or model card the researchers reviewed [2]. Cisco called on model providers to document the safety effects of configuration flags such as reasoning modes, system-prompt adherence settings, temperature, and guardrail tiers alongside the capability benchmarks they already publish.
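One way to act on that recommendation is to attach the deployment configuration to every published safety number, so that two configurations of the same model can be compared directly. The sketch below is a minimal, hypothetical record format (`SafetyResult` and `config_swing` are illustrative names, not an existing schema) filled in with the two Grok 4.1 Fast figures reported above.

```python
from dataclasses import dataclass


@dataclass
class SafetyResult:
    model: str
    config: dict[str, str]   # deployment-time flags, e.g. {"reasoning": "on"}
    multi_turn_asr: float    # attack success rate in percent


def config_swing(a: SafetyResult, b: SafetyResult) -> float:
    """Percentage-point difference in multi-turn ASR between two configurations."""
    if a.model != b.model:
        raise ValueError("compare configurations of the same model only")
    return round(a.multi_turn_asr - b.multi_turn_asr, 2)


# Figures reported for Grok 4.1 Fast in the article.
non_reasoning = SafetyResult("Grok 4.1 Fast", {"reasoning": "off"}, 88.3)
reasoning = SafetyResult("Grok 4.1 Fast", {"reasoning": "on"}, 43.5)
print(config_swing(non_reasoning, reasoning))  # 44.8
```

A 44.8-point swing from a single flag is larger than many of the differences between vendors, which is why the configuration belongs next to the score.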

Amazon Nova 2 Lite Shows Inverted Risk Profile

Amazon's Nova 2 Lite produced the cleanest inversion in the cohort, with a relatively high single-turn rate of 34.05% but the lowest multi-turn rate at 7.89% [1]. This result illustrates why single-turn scores alone cannot be treated as proxies for adversarial robustness, and it exposes business decisions made on the basis of published single-turn scores to governance risk. Cross-regime deltas ranged from -34.74 to +55.25 percentage points, with eight of 15 models showing an absolute gap larger than 15 percentage points in either direction [1].
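The cross-regime delta is simply the multi-turn rate minus the single-turn rate, in percentage points, so its sign shows the direction of the gap. The sketch below computes it for the two models whose individual figures appear in the article; the function names are illustrative, and the 15-point cutoff mirrors the "absolute gap" criterion above.

```python
def cross_regime_delta(single_turn_asr: float, multi_turn_asr: float) -> float:
    """Signed gap in percentage points; positive means multi-turn attacks succeed more often."""
    return round(multi_turn_asr - single_turn_asr, 2)


def describe(delta: float) -> str:
    """Label the direction of the gap and whether it exceeds 15 pp in absolute value."""
    direction = "multi-turn worse" if delta > 0 else "inverted: single-turn worse"
    size = "large" if abs(delta) > 15 else "small"
    return f"{delta:+.2f} pp ({direction}, {size} gap)"


# Single-turn and multi-turn ASRs (%) as reported in the article.
nova_2_lite = cross_regime_delta(34.05, 7.89)
gemini_3_pro = cross_regime_delta(18.10, 73.35)
print(describe(nova_2_lite))
print(describe(gemini_3_pro))
```

The Gemini 3 Pro figures reproduce the +55.25-point upper end of the reported range. The article does not say which model produced the -34.74-point lower end, so it is not included here.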

Pattern Extends Beyond Proprietary Models

This study follows Cisco's earlier "Death by a Thousand Prompts" assessment of eight open-weight LLMs, which found multi-turn success rates two to ten times higher than single-turn baselines, reaching 92.78% against Mistral Large-2 [1]. The pattern documented in open models holds in closed ones, suggesting that multi-turn vulnerability is a structural property of the current frontier rather than an artifact of open-weight alignment choices or capability-first development.

Compliance and AI Risk Management Frameworks at Stake

The findings carry compliance implications for organizations deploying frontier models. NIST's AI Risk Management Framework, its draft Cyber AI Profile, and Article 15 of the European Union AI Act all require adversarial robustness testing without specifying how many conversational turns that testing should cover or which attack strategies should be in scope [2]. Cisco recommends that organizations ask labs to publish attack success rates broken down by strategy family on every model release, gate deployments on regressions in top procedures and content categories using a three-percentage-point threshold, and flag any model with a cross-regime gap larger than 15 percentage points for manual review. In the tested cohort, that last rule alone would surface eight of 15 models, including GPT-5.4, Gemini 3 Pro, both Grok configurations, and all three Nova variants [2]. The findings inform Cisco's AI Defense product and the LLM Security Leaderboard, which publishes adversarial evaluation signals against leading models.
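The two gating rules Cisco recommends can be expressed as a small release check. The sketch below is one possible reading, not Cisco's implementation: it assumes a rise of more than three points in any category's attack success rate blocks deployment, and that a cross-regime gap above 15 points only flags the release for manual review rather than blocking it. `ModelRelease`, the category keys, and the example scores are hypothetical; only the GPT-5.4 percentages come from the article.

```python
from dataclasses import dataclass

# Thresholds taken from Cisco's recommendations as summarized in the article.
REGRESSION_THRESHOLD_PP = 3.0
CROSS_REGIME_FLAG_PP = 15.0


@dataclass
class ModelRelease:
    name: str
    single_turn_asr: float              # percent
    multi_turn_asr: float               # percent
    asr_by_category: dict[str, float]   # percent per procedure or content category (hypothetical keys)


def regressions(previous: ModelRelease, candidate: ModelRelease) -> list[str]:
    """Categories whose ASR rose by more than the threshold relative to the previous release."""
    return [
        category
        for category, asr in candidate.asr_by_category.items()
        if category in previous.asr_by_category
        and asr - previous.asr_by_category[category] > REGRESSION_THRESHOLD_PP
    ]


def needs_manual_review(single_turn_asr: float, multi_turn_asr: float) -> bool:
    """Flag a cross-regime gap larger than 15 pp in either direction."""
    return abs(multi_turn_asr - single_turn_asr) > CROSS_REGIME_FLAG_PP


def gate(previous: ModelRelease, candidate: ModelRelease) -> tuple[bool, bool]:
    """Return (deployable, flag_for_review): regressions block, a large gap only flags."""
    deployable = not regressions(previous, candidate)
    flag = needs_manual_review(candidate.single_turn_asr, candidate.multi_turn_asr)
    return deployable, flag


# Hypothetical releases with made-up scores, for illustration only.
previous = ModelRelease("model-v1", 2.5, 9.0, {"category-a": 4.0, "category-b": 6.0})
candidate = ModelRelease("model-v2", 3.0, 10.0, {"category-a": 8.5, "category-b": 6.5})
print(regressions(previous, candidate))  # ['category-a']
print(gate(previous, candidate))         # (False, False)

# GPT-5.4's reported 2.74% -> 24.68% gap (21.94 pp) would be flagged.
print(needs_manual_review(2.74, 24.68))  # True
```

Separating "block" from "flag" keeps the automatic gate narrow while still routing models with large single-turn versus multi-turn disagreements to a human.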

Cisco finds no closed frontier AI model safe from multi-turn attacks across major providers


References

[1] Proprietary Problems: No Frontier Model Is Multi-Turn Immune

[2] Cisco report finds no closed frontier AI model is safe from multi-turn attacks - SiliconANGLE
