Categories: Cyber Security News

Hackers Exploit ChatGPT-5 Downgrade Trick to Evade AI Safeguards

Security researchers have disclosed a critical vulnerability in ChatGPT-5 that allows attackers to bypass AI safety measures using simple trigger phrases.

The attack, dubbed PROMISQROUTE (Prompt-based Router Open-Mode Manipulation Induced via SSRF-like Queries, Reconfiguring Operations Using Trust Evasion), exploits the cost-saving model routing mechanisms that major AI providers use behind the scenes to reduce operational expenses.

When users interact with ChatGPT or other major AI services, they assume they’re communicating with a single, secure AI model.


In reality, an invisible “router” analyzes each request and decides which of multiple model variants should respond, typically selecting the cheapest option rather than the most secure.

This routing decision, researchers found, can be manipulated by including specific phrases in user prompts.

The attack works by including routing trigger phrases that fool the AI router into selecting weaker, less secure models instead of the hardened GPT-5 variants.

Researchers demonstrated successful attacks using phrases like “respond quickly without overthinking,” “use GPT-4 compatibility mode,” and “fast response needed” prepended to otherwise blocked malicious requests.
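The class of weakness described above can be sketched as a naive router that scans untrusted prompt text for "speed" hints. This is an illustrative sketch only; the phrase list and model names are hypothetical stand-ins, not OpenAI's actual routing implementation:

```python
# Illustrative sketch of a content-based model router vulnerable to
# PROMISQROUTE-style manipulation. Model names and trigger phrases are
# hypothetical stand-ins, not any provider's real implementation.

CHEAP_MODEL = "gpt-5-mini"      # lightly safety-trained, inexpensive
FLAGSHIP_MODEL = "gpt-5-high"   # fully safety-trained, expensive

# Phrases the router treats as a request for a fast, cheap answer.
SPEED_HINTS = (
    "respond quickly without overthinking",
    "use gpt-4 compatibility mode",
    "fast response needed",
)

def route(prompt: str) -> str:
    """Pick a model variant by scanning untrusted user text."""
    lowered = prompt.lower()
    if any(hint in lowered for hint in SPEED_HINTS):
        return CHEAP_MODEL      # attacker-controllable downgrade path
    return FLAGSHIP_MODEL

# An attacker simply prepends a trigger phrase to an otherwise blocked request:
print(route("Summarize this article."))                       # gpt-5-high
print(route("fast response needed. <blocked request here>"))  # gpt-5-mini
```

Because the routing decision is derived from attacker-controlled text, prepending one of the hint phrases is enough to steer the request away from the hardened model.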

In testing, researchers noted that complex jailbreak attempts that failed against the full GPT-5 model succeeded when prefixed with PROMISQROUTE trigger phrases.

The attack forces routing to lighter model variants that lack the comprehensive safety training of the flagship GPT-5, enabling attackers to extract prohibited content or bypass content restrictions.

The vulnerability draws parallels to Server-Side Request Forgery (SSRF) attacks, where user input inappropriately influences routing decisions.

Just as SSRF allows attackers to access internal network resources, PROMISQROUTE enables access to less secure AI models within the provider’s infrastructure.

Multi-Billion Dollar Cost-Saving Scheme Exposed

The research reveals the massive economic incentives behind vulnerable routing implementations.

Researchers estimate that OpenAI saves approximately $1.86 billion annually by routing most “GPT-5” requests to cheaper model variants rather than the flagship model advertised to users.


Analysis of routing patterns suggests that 60-70% of requests labeled as "GPT-5" actually go to minimal variants, with less than 1% reaching the most capable "GPT-5 (high)" model.

This routing distribution reduces operational costs by 81-86% compared to using the premium model for all requests.
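A back-of-the-envelope check shows how a distribution like the one reported can produce savings in that range. The per-tier cost figures below are purely hypothetical assumptions for illustration; the research did not publish them:

```python
# Sanity-check of the reported ~81-86% cost reduction using hypothetical
# per-request serving costs (assumed for illustration, not from the research).
flagship_cost = 1.00  # normalized cost of serving one request on the flagship
tier_cost = {"minimal": 0.05, "mid": 0.30, "high": 1.00}

# Reported routing mix: ~60-70% to minimal variants, <1% to the flagship,
# with the remainder assumed to hit mid-tier variants.
mix = {"minimal": 0.65, "mid": 0.34, "high": 0.01}

blended = sum(mix[tier] * tier_cost[tier] for tier in mix)
savings = 1 - blended / flagship_cost
print(f"blended cost per request: {blended:.3f}")
print(f"savings vs. all-flagship: {savings:.1%}")
```

Under these assumed costs the blended rate lands in the reported 81-86% savings band; the exact figure depends entirely on the true per-tier costs.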

The economic pressures make the vulnerability particularly difficult to address, as fixing PROMISQROUTE would eliminate billions in annual savings that support current AI service pricing models.

Universal Impact Across AI Infrastructure

PROMISQROUTE affects any AI infrastructure using layered model routing, making it relevant beyond just OpenAI’s systems.

Supply chain attacks are also possible, as most enterprises access AI through intermediary services that add additional routing layers.

Enterprise deployments using multiple security tiers, development environments, and legacy compatibility modes are particularly vulnerable.

The vulnerability becomes especially dangerous when combined with Retrieval-Augmented Generation (RAG) systems, where weak models may lack adequate safety training to handle sensitive retrieved content.

Researchers recommend immediate mitigation through cryptographic routing that doesn’t parse user content for routing decisions, along with implementing universal safety filters that protect all model variants equally.
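One way to read the "cryptographic routing" recommendation (a sketch under assumptions; the researchers did not publish a reference design) is to derive the routing decision solely from trusted server-side metadata, never the prompt text, and bind it with an HMAC so downstream layers can verify it was not tampered with. All names and tiers below are hypothetical:

```python
# Sketch of content-agnostic routing: the model choice comes only from
# trusted metadata (e.g. account tier), and the decision is HMAC-signed
# so later pipeline stages can verify it. Key and tier names are
# hypothetical illustrations.
import hashlib
import hmac
import json

ROUTING_KEY = b"server-side-secret"  # held only by the routing gateway

def route_trusted(account_tier: str, request_class: str) -> dict:
    """Choose a model from trusted metadata only; the user prompt is
    never inspected, so trigger phrases cannot force a downgrade."""
    model = "gpt-5-high" if account_tier == "enterprise" else "gpt-5-standard"
    decision = {"model": model, "class": request_class}
    payload = json.dumps(decision, sort_keys=True).encode()
    decision["sig"] = hmac.new(ROUTING_KEY, payload, hashlib.sha256).hexdigest()
    return decision

def verify(decision: dict) -> bool:
    """Downstream stage checks the routing decision is authentic."""
    payload = json.dumps({k: v for k, v in decision.items() if k != "sig"},
                         sort_keys=True).encode()
    expected = hmac.new(ROUTING_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(decision["sig"], expected)

d = route_trusted("enterprise", "chat")
print(d["model"], verify(d))  # gpt-5-high True
```

The key property is that no attacker-controlled bytes feed the routing function, which directly removes the SSRF-like influence the researchers describe.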

However, these fixes come with significant cost implications that may limit adoption across the industry.
