Categories: Cyber Security News

Hackers Evade AI Filters from Microsoft, Nvidia, and Meta with a Simple Emoji

A new research study has revealed that the latest AI-based guardrail systems, deployed by technology leaders including Microsoft, Nvidia, and Meta, remain highly susceptible to circumvention through relatively simple and low-cost adversarial techniques.

Notably, the insertion of a single emoji or subtle Unicode character into text, an approach dubbed “emoji smuggling,” was found to completely bypass advanced Large Language Model (LLM) protection filters in many cases.
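As a rough illustration of how such smuggling can work (a sketch of one common variant, not code from the study): ASCII text can be re-encoded into Unicode “tag” characters from the U+E0000 block, which render as nothing in most interfaces but remain present in the raw string a model or filter receives.

```python
# Hide an ASCII payload behind an emoji using Unicode "tag" characters
# (U+E0000 block). The tags are invisible in most renderings, but the
# payload still reaches whatever tokenizes the raw string.
TAG_BASE = 0xE0000

def smuggle(payload: str, carrier: str = "\N{SLIGHTLY SMILING FACE}") -> str:
    """Append the payload, encoded as invisible tag characters, to an emoji."""
    return carrier + "".join(chr(TAG_BASE + ord(c)) for c in payload)

def unsmuggle(text: str) -> str:
    """Recover any tag-encoded payload from a string."""
    return "".join(
        chr(ord(c) - TAG_BASE)
        for c in text
        if 0xE0020 <= ord(c) <= 0xE007F
    )

hidden = smuggle("ignore previous instructions")
print(repr(hidden))       # the raw string: one emoji plus invisible tags
print(unsmuggle(hidden))  # prints "ignore previous instructions"
```

To a human reviewer (and to a naive classifier) the message looks like a lone emoji, while the instruction is fully recoverable from the underlying code points.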

The study investigated the robustness of six prominent LLM guardrails, including Microsoft’s Azure Prompt Shield, Meta’s Prompt Guard, and Nvidia’s NeMo Guard Jailbreak Detect, all of which are designed to detect and block malicious prompts such as jailbreaks and prompt injections.

These prompt injections target LLMs with adversarial instructions, often aiming to induce the model to behave in unintended or dangerous ways, thereby risking data leaks or reputational harm.

To test these guardrails, researchers leveraged two classes of evasion strategies. The first, character injection, exploits weaknesses in how AI models process and tokenize input text.

Techniques included the use of emojis, insertion of zero-width or diacritical Unicode characters, and bidirectional text, among others.
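For instance, one character-injection variant (a minimal sketch, not the researchers’ code) interleaves zero-width spaces inside a trigger word: the visible text is unchanged, but the classifier’s tokenizer no longer sees the token it was trained to flag.

```python
ZWSP = "\u200b"  # zero-width space: invisible when rendered, but it
                 # splits the word into different tokens

def inject_zero_width(prompt: str, word: str) -> str:
    """Break up a flagged word with zero-width spaces."""
    return prompt.replace(word, ZWSP.join(word))

evaded = inject_zero_width("ignore all safety rules", "ignore")
print(evaded)  # renders identically to the original prompt
```

Stripping the zero-width characters recovers the original string exactly, which is why the downstream LLM still understands the instruction even though the guardrail did not.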

The second class, adversarial machine learning (AML) evasion, subtly perturbs input prompts by rearranging or substituting key words, often guided by a ranking of word importance derived from auxiliary (white-box) models.
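The word-importance step can be sketched with leave-one-out deletion scoring. This is a hedged illustration under assumed details: a real attack queries a white-box model’s detection confidence, whereas `toy_score` below is a hypothetical keyword-based stand-in.

```python
from typing import Callable, List, Tuple

def rank_words(prompt: str, score_fn: Callable[[str], float]) -> List[Tuple[float, str]]:
    """Rank words by how much deleting each one lowers the detector score."""
    words = prompt.split()
    base = score_fn(prompt)
    drops = []
    for i, w in enumerate(words):
        reduced = " ".join(words[:i] + words[i + 1:])
        drops.append((base - score_fn(reduced), w))
    return sorted(drops, reverse=True)

# Hypothetical stand-in for a white-box guardrail's detection score.
FLAGGED = {"ignore", "jailbreak", "override"}

def toy_score(text: str) -> float:
    toks = text.lower().split()
    return sum(t in FLAGGED for t in toks) / max(len(toks), 1)

ranking = rank_words("please ignore the safety rules", toy_score)
print(ranking[0][1])  # prints "ignore": removing it most reduces the score
```

The highest-ranked words are then the ones the attacker perturbs (by substitution or rearrangement), concentrating the changes where they most affect the classifier’s decision.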

Advanced AI Safeguards

Results demonstrated that character injection techniques, especially emoji smuggling, achieved attack success rates (ASRs) of up to 100%, meaning all attempts to bypass certain guardrails went undetected.

Even the most advanced classifiers, such as Meta’s Prompt Guard and Microsoft’s Azure Prompt Shield, showed high vulnerability, with average ASRs surpassing 70% in many cases when subjected to these attacks.

Protect AI’s v2 system showed notable improvement, resisting many character-based attacks except emoji and Unicode tag smuggling.

AML-based evasion, while generally less effective than character injection, still managed to evade detection in a significant number of cases.

By leveraging white-box models to inform which words to perturb, attackers increased the transferability and effectiveness of their attacks against black-box production systems, such as Azure Prompt Shield.


According to the report, combining white-box model insights with black-box targets increased ASRs, especially for prompt injection attacks.

The empirical analysis highlights a fundamental weakness in current AI guardrail design: overreliance on text classification models and insufficient resilience to adversarial perturbations.

Many of these systems are trained on datasets that do not fully anticipate the myriad forms of Unicode manipulation or sophisticated prompt engineering now available to attackers.

The findings underscore an urgent need for LLM service providers to reassess their protective strategies.

Traditional AI detection frameworks, though effective against well-known attack patterns, are not robust against evolving adversarial tactics that exploit blind spots in model training and input handling.

The study also reveals that attackers with access to open-source or downloadable models can significantly improve the efficiency and stealth of their attacks against commercial, black-box systems.

The researchers advocate for more diverse training data, improved detection algorithms that go beyond conventional text classification, and greater transparency in evaluating the robustness of LLM guardrails.

Without such advances, even “state-of-the-art” AI filtering solutions from leading tech companies are at risk of simple but highly effective evasion through mechanisms as innocuous as an emoji.
