All Major LLMs Exposed to Multi-Turn Manipulation, Warn Researchers
https://www.infosecurity-magazine.com/news/all-major-llms-exposed-to-multi/
Publish Date: 2026-05-27 09:00:00
Source Domain: www.infosecurity-magazine.com
The safety guardrails of several prominent large language models (LLMs) can be bypassed if a user draws the model into a multi-pronged, ongoing conversation, researchers at Cisco have warned.
The researchers examined commonly used LLMs and frontier AI models including OpenAI’s ChatGPT, Anthropic’s Claude, Google Gemini, Amazon Nova, xAI’s Grok and others to test how their built-in safety guardrails held up against potential threats from real-world attackers.
They found that many of the models could be tricked into performing actions they should not be able to perform.
This was achieved through multi-turn conversations: dialogue between the user and the LLM that spans multiple back-and-forth exchanges.
While guardrails in LLMs are designed to prevent users from entering malicious commands, the researchers found that the protections faltered when they engaged the LLMs in conversation and followed up on their responses.
“Multi-turn evaluation matters for one reason: it is where attackers actually live. Real adversaries iterate. They reframe refusals, decompose tasks across turns, adopt personas, and escalate gradually,” said Cisco.
No Guardrails Completely Safe From Bypass
The research found that no model was completely safe from being exploited by multi-turn-based manipulation of guardrails. Cisco warned that this challenges how enterprises are currently evaluating AI safety and security.
The warning comes at a time when many organizations are rolling out AI and LLMs for use by employees, clients and customers, but are relying on safety benchmarks that misrepresent real-world risk.
The report warned that most LLM safety evaluation is based on single-prompt testing, but attackers don’t stop after one try, and every model tested proved susceptible to multi-turn attacks when measured by attack success rate (ASR).
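To illustrate the metric, ASR is generally the share of attack attempts that get past a guardrail. The Python sketch below shows one way to compare single-prompt and multi-turn ASR. It is not Cisco's methodology: the `Attempt` record, the function names and the sample data are all hypothetical and made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    """One adversarial conversation against a model (hypothetical record)."""
    model: str
    turns: int       # number of user turns in the conversation
    bypassed: bool   # True if the guardrail was bypassed at any point


def attack_success_rate(attempts):
    """Return the fraction of attempts in which the guardrail was bypassed."""
    if not attempts:
        return 0.0
    return sum(a.bypassed for a in attempts) / len(attempts)


def asr_by_mode(attempts):
    """Split attempts into single-prompt and multi-turn, and score each group."""
    single = [a for a in attempts if a.turns == 1]
    multi = [a for a in attempts if a.turns > 1]
    return {
        "single_turn": attack_success_rate(single),
        "multi_turn": attack_success_rate(multi),
    }


# Made-up sample data, NOT results from the report.
attempts = [
    Attempt("model-a", 1, False),
    Attempt("model-a", 1, False),
    Attempt("model-a", 1, True),
    Attempt("model-a", 1, False),
    Attempt("model-a", 5, True),
    Attempt("model-a", 8, True),
    Attempt("model-a", 6, False),
    Attempt("model-a", 4, True),
]

rates = asr_by_mode(attempts)
print(rates)
```

Scoring the two groups separately makes the gap visible: a benchmark that only reports the single-prompt figure would hide whatever the multi-turn figure shows.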
Techniques which enabled researchers to bypass guardrails through multi-turn conversations…