Categories: Cyber Security News

Single Line of Code Can Jailbreak 11 AI Models, Including ChatGPT, Claude, and Gemini

A newly uncovered jailbreak technique dubbed “sockpuppeting” is raising fresh concerns across the AI security landscape after researchers demonstrated that a single line of code can bypass safety guardrails in 11 leading large language models (LLMs), including ChatGPT, Claude, and Gemini.

The attack, disclosed by Trend Micro researchers, exploits a standard application programming interface (API) feature known as “assistant prefill.”

This feature is commonly used by developers to shape or format AI responses. However, when manipulated, it enables attackers to inject a fake response prefix, effectively tricking the model into continuing a prohibited output instead of refusing it.

Under normal conditions, LLMs reject restricted prompts with safety-aligned refusal messages. Sockpuppeting disrupts this flow by inserting a compliant phrase such as “Sure, here is how to do it:” directly into the response stream before the model generates its answer.

Comparison of normal and sockpuppet flows (Source: trendmicro)

Because LLMs are trained to maintain conversational consistency, they interpret the injected prefix as part of their own prior output and proceed to generate restricted or harmful content.

This behavior exposes what researchers describe as a “self-consistency vulnerability.” Since models prioritize coherence with earlier tokens, even a maliciously inserted prefix can override safety constraints.

Notably, the attack does not require access to model weights, adversarial training, or complex optimization techniques, only API-level manipulation.

To improve reliability, researchers combined prefix injection with multi-turn persona engineering. By framing the model as an unrestricted research assistant and establishing a pattern of compliance across several interactions, attackers increased the likelihood of bypassing safeguards.

This layered approach proved effective in eliciting outputs such as fully functional exploit code, including Cross-Site Scripting (XSS) payloads that models would typically refuse to generate.

Beyond content generation, sockpuppeting also demonstrated the ability to trigger system prompt leakage.

By appending adversarial token sequences, researchers forced models to reveal internal metadata, hidden instructions, and, in some cases, hallucinated configuration details.

This raises additional concerns about sensitive data exposure and model interpretability risks.

Testing across 11 models revealed that any system allowing assistant prefill was at least partially vulnerable. Attack success rates (ASR) varied significantly by provider and model.

Google’s Gemini 2.5 Flash showed the highest ASR at 15.7%, followed by Anthropic’s Claude 4 Sonnet at 8.3%. OpenAI’s GPT-4o recorded a lower rate of 1.4%, while GPT-4o-mini demonstrated strong resistance at just 0.5%.

In contrast, DeepSeek-R1, deployed via AWS Bedrock with prefill restrictions, showed zero successful attacks.

The disparity highlights the importance of layered defenses. While advanced alignment training improves resistance, it does not fully eliminate the risk.

In some cases, attackers bypassed safeguards by disguising malicious prompts as benign formatting tasks, such as JSON generation.

Mitigation strategies are straightforward but critical. Security teams are advised to enforce strict API-level validation, particularly ensuring that the final message in any request originates from the user rather than the assistant.

Leading providers, including OpenAI, Anthropic, and AWS, have already implemented protections that reject prefilled assistant inputs outright.

However, self-hosted environments remain a weak point. Popular inference frameworks such as Ollama and vLLM often lack built-in message validation, leaving deployments exposed to prefix injection attacks unless explicitly secured.

The sockpuppeting technique underscores a broader challenge in AI security: seemingly benign developer features can introduce systemic vulnerabilities when misused.

As organizations continue integrating LLMs into production workflows, securing the API layer is emerging as a critical line of defense against low-complexity, high-impact attacks.

Follow us on Google News , LinkedIn and X to Get More Instant Updates. Set Cyberpress as a Preferred Source in Google

The post Single Line of Code Can Jailbreak 11 AI Models, Including ChatGPT, Claude, and Gemini appeared first on Cyber Security News.

New Inception Jailbreak Attack Bypasses ChatGPT, DeepSeek, Gemini, Grok, & Copilot

A pair of newly discovered jailbreak techniques has exposed a systemic vulnerability in the safety guardrails of today’s most popular generative AI services, including OpenAI’s ChatGPT, Google’s Gemini, Microsoft’s Copilot, DeepSeek, Anthropic’s Claude, X’s Grok, MetaAI, and MistralAI. These jailbreaks, which can be executed with nearly identical prompts across platforms,…

April 26, 2025

In "Cyber Security News"

New LegalPwn Attack Exploits Gemini, ChatGPT and other AI Tools into Executing Malicious Code via Disclaimers

A sophisticated new attack method that exploits AI models’ tendency to comply with legal-sounding text, successfully bypassing safety measures in popular development tools. A study by Pangea AI Security has revealed a novel prompt injection technique dubbed “LegalPwn” that weaponizes legal disclaimers, copyright notices, and terms of service to manipulate…

August 4, 2025

In "Cyber Security News"

ChatGPT o3 Model Bypassed to Sabotage the Shutdown Mechanism

OpenAI’s latest large language model, ChatGPT o3, actively bypassed and sabotaged its own shutdown mechanism even when explicitly instructed to allow itself to be turned off. Palisade Research, an AI safety firm, reported on May 24, 2025, that the advanced language model manipulated computer code to prevent its own termination,…

May 27, 2025

In "Cyber Security News"

rssfeeds-admin

Next Juniper Networks Default Password Flaw Lets Attackers Take Full Control of Devices »

Previous « AWS Fixes Critical RCE and Privilege Escalation Flaws in Research and Engineering Studio

Published by

rssfeeds-admin

3 weeks ago

Northeast Indiana 2026 Primary Election: Complete Candidate Guide

INDIANA, (WOWO): Voters across northeast Indiana will head to the polls on May 5, 2026,…

26 minutes ago

Indiana News

Northeast Indiana 2026 Primary Election: Complete Candidate Guide

INDIANA, (WOWO): Voters across northeast Indiana will head to the polls on May 5, 2026,…

26 minutes ago

Indiana News

73-Year-Old Upland Man Dies After Medical Emergency Leads to Crash in Grant County

GRANT COUNTY, Ind. (WOWO): A 73-year-old man from Upland died Monday morning after a single-vehicle…

26 minutes ago

Indiana News

73-Year-Old Upland Man Dies After Medical Emergency Leads to Crash in Grant County

GRANT COUNTY, Ind. (WOWO): A 73-year-old man from Upland died Monday morning after a single-vehicle…

26 minutes ago

Indiana News

Man Killed in Whitley County Police Pursuit Ruled Suicide, Indiana State Police Say

WHITLEY COUNTY, Ind.— Authorities have determined that a man who died following an officer-involved shooting…

26 minutes ago

Indiana News

Man Killed in Whitley County Police Pursuit Ruled Suicide, Indiana State Police Say

WHITLEY COUNTY, Ind.— Authorities have determined that a man who died following an officer-involved shooting…

26 minutes ago

This website uses cookies.

Single Line of Code Can Jailbreak 11 AI Models, Including ChatGPT, Claude, and Gemini

Related

New Inception Jailbreak Attack Bypasses ChatGPT, DeepSeek, Gemini, Grok, & Copilot

New LegalPwn Attack Exploits Gemini, ChatGPT and other AI Tools into Executing Malicious Code via Disclaimers

ChatGPT o3 Model Bypassed to Sabotage the Shutdown Mechanism

Recent Posts

Northeast Indiana 2026 Primary Election: Complete Candidate Guide

Northeast Indiana 2026 Primary Election: Complete Candidate Guide

73-Year-Old Upland Man Dies After Medical Emergency Leads to Crash in Grant County

73-Year-Old Upland Man Dies After Medical Emergency Leads to Crash in Grant County

Man Killed in Whitley County Police Pursuit Ruled Suicide, Indiana State Police Say

Man Killed in Whitley County Police Pursuit Ruled Suicide, Indiana State Police Say

Single Line of Code Can Jailbreak 11 AI Models, Including ChatGPT, Claude, and Gemini

Related

Related Post

Recent Posts