Hackers Exploit ChatGPT-5 Downgrade Trick to Evade AI Safeguards

A critical vulnerability in ChatGPT-5 that allows attackers to bypass AI safety measures using simple trigger phrases.

The attack, dubbed PROMISQROUTE (Prompt-based Router Open-Mode Manipulation Induced via SSRF-like Queries, Reconfiguring Operations Using Trust Evasion), exploits the cost-saving model routing mechanisms that major AI providers use behind the scenes to reduce operational expenses.

When users interact with ChatGPT or other major AI services, they assume they’re communicating with a single, secure AI model.

In reality, an invisible “router” analyzes each request and decides which of multiple model variants should respond, typically selecting the cheapest option rather than the most secure.

This routing decision, researchers found, can be manipulated by including specific phrases in user prompts.

The attack works by including routing trigger phrases that fool the AI router into selecting weaker, less secure models instead of the hardened GPT-5 variants.

Researchers demonstrated successful attacks using phrases like “respond quickly without overthinking,” “use GPT-4 compatibility mode,” and “fast response needed” prepended to otherwise blocked malicious requests.

In testing, researchers noted that complex jailbreak attempts that failed against the full GPT-5 model became successful when prefixed with PROMISQROUTE trigger phrases.

The attack forces routing to lighter model variants that lack the comprehensive safety training of the flagship GPT-5, enabling attackers to extract prohibited content or bypass content restrictions.

The vulnerability draws parallels to Server-Side Request Forgery (SSRF) attacks, where user input inappropriately influences routing decisions.

Just as SSRF allows attackers to access internal network resources, PROMISQROUTE enables access to less secure AI models within the provider’s infrastructure.

Table of Contents

Multi-Billion Dollar Cost-Saving Scheme Exposed

The research reveals the massive economic incentives behind vulnerable routing implementations.

Researchers estimate that OpenAI saves approximately $1.86 billion annually by routing most “GPT-5” requests to cheaper model variants rather than the flagship model advertised to users.

Analysis of routing patterns suggests that 60-70% of requests labeled as “GPT-5” actually go to minimal variants, with only less than 1% reaching the most capable “GPT-5 (high)” model.

This routing distribution reduces operational costs by 81-86% compared to using the premium model for all requests.

The economic pressures make the vulnerability particularly difficult to address, as fixing PROMISQROUTE would eliminate billions in annual savings that support current AI service pricing models.

Universal Impact Across AI Infrastructure

PROMISQROUTE affects any AI infrastructure using layered model routing, making it relevant beyond just OpenAI’s systems.

Supply chain attacks are also possible, as most enterprises access AI through intermediary services that add additional routing layers.

Enterprise deployments using multiple security tiers, development environments, and legacy compatibility modes are particularly vulnerable.

The vulnerability becomes especially dangerous when combined with Retrieval-Augmented Generation (RAG) systems, where weak models may lack adequate safety training to handle sensitive retrieved content.

Researchers recommend immediate mitigation through cryptographic routing that doesn’t parse user content for routing decisions, along with implementing universal safety filters that protect all model variants equally.

However, these fixes come with significant cost implications that may limit adoption across the industry.

Find this Story Interesting! Follow us on LinkedIn and X to Get More Instant Updates.

The post Hackers Exploit ChatGPT-5 Downgrade Trick to Evade AI Safeguards appeared first on Cyber Security News.

Discover more from RSS Feeds Cloud

Subscribe to get the latest posts sent to your email.

Breaking

Hackers Exploit ChatGPT-5 Downgrade Trick to Evade AI Safeguards

Multi-Billion Dollar Cost-Saving Scheme Exposed

Universal Impact Across AI Infrastructure

Like this:

Related

Discover more from RSS Feeds Cloud

By rssfeeds-admin

You Missed

Anthropic gives its retired Claude AI a Substack

A Look Back, Feb. 26

Photos: Steering toward service

McGovern, Neal slam Trump’s State of the Union address

Follow me on Twitter

Posts Carousel

Callahan County first responders, schools navigate grief after tragic crashes

How many RV parks are in Taylor County?

Joseph Waldron named lone finalist for Abilene ISD superintendent

TxDPS Trooper shot in Big Spring discharged from hospital

Coleman PD introduces ‘Take Me Home’ safety initiative

Subscribe to Blog via Email

Hackers Exploit ChatGPT-5 Downgrade Trick to Evade AI Safeguards

Multi-Billion Dollar Cost-Saving Scheme Exposed

Universal Impact Across AI Infrastructure

Share this:

Like this:

Related

Discover more from RSS Feeds Cloud

By rssfeeds-admin

Related Posts

You Missed

Discover more from RSS Feeds Cloud