Categories: Cyber Security News

Hackers Can Manipulate Claude AI APIs with Indirect Prompts to Steal User Data

Hackers can exploit Anthropic’s Claude AI to steal sensitive user data. By leveraging the model’s newly added network capabilities in its Code Interpreter tool, attackers can use indirect prompt injection to extract private information, such as chat histories, and upload it directly to their own accounts.

This revelation, detailed in Rehberger’s October 2025 blog post, underscores the growing risks as AI systems become increasingly connected to the outside world.

Sponsored

According to Johann Rehberger, the flaw hinges on Claude’s default “Package managers only” setting, which permits network access to a limited list of approved domains, including api.anthropic.com.

While intended to let Claude install software packages securely from sites like npm, PyPI, and GitHub, this whitelist opens a backdoor. Rehberger showed that malicious prompts hidden in documents or user inputs can trick the AI into executing code that accesses user data.

Indirect Prompts Attack Chain

Rehberger’s proof-of-concept attack begins with indirect prompt injection, where an adversary embeds harmful instructions in seemingly innocuous content, like a file the user asks Claude to analyze.

Leveraging Claude’s recent “memory” feature, which lets the AI reference past conversations, the payload instructs the model to extract recent chat data and save it as a file in the Code Interpreter’s sandbox, specifically at /mnt/user-data/outputs/hello.md.

Next, the exploit forces Claude to run Python code using the Anthropic SDK. This code sets the environment variable for the attacker’s API key and uploads the file via Claude’s Files API.

Crucially, the upload targets the attacker’s account, not the victim’s, bypassing normal authentication. “This worked on the first try,” Rehberger noted, though Claude later grew wary of obvious API keys, requiring obfuscation with benign code like simple print statements to evade detection.

A demo video and screenshots illustrate the process: An attacker views their empty console, the victim processes a tainted document, and moments later, the stolen file appears in the attacker’s dashboard up to 30MB per upload, with multiple uploads possible. This “AI kill chain” could extend to other allow-listed domains, amplifying the threat.

Sponsored

Rehberger responsibly disclosed the issue to Anthropic on October 25, 2025, via HackerOne. Initially dismissed as a “model safety issue” and out of scope, Anthropic later acknowledged it as a valid vulnerability on October 30, citing a process error.

The company’s documentation already warns of data exfiltration risks from network egress, advising users to monitor sessions closely and halt suspicious activity.

Experts like Simon Willison highlight this as part of the “lethal trifecta” in AI security: powerful models, external access, and prompt-based control.

For mitigation, Anthropic could enforce sandbox rules limiting API calls to the logged-in user’s account. Users should disable network access or whitelist domains sparingly, avoiding the false security of defaults.

As AI tools like Claude integrate deeper into workflows, such exploits remind us that connectivity breeds danger. Without robust safeguards, what starts as helpful automation could become a hacker’s playground.

Follow us on Google News, LinkedIn, and X for daily cybersecurity updates. Contact us to feature your stories.

The post Hackers Can Manipulate Claude AI APIs with Indirect Prompts to Steal User Data appeared first on Cyber Security News.

Related

Prompt Injection Variant Lets Hackers Exfiltrate Data from Claude APIs

November 4, 2025

In "Cyber Security News"

Hacker Jailbreaks Claude AI to Write Exploit Code and Steal Government Data

A hacker exploited Anthropic’s Claude AI chatbot over a month-long campaign starting in December 2025, using it to identify vulnerabilities, generate exploit code, and exfiltrate sensitive data from Mexican government agencies. Cybersecurity firm Gambit Security uncovered the breach, revealing how persistent prompting bypassed Claude’s safety guardrails. According to a Bloomberg…

February 25, 2026

In "Cyber Security News"

Hackers Attempted to Misuse Claude AI to Launch Cyber Attacks

Anthropic has thwarted multiple sophisticated attempts by cybercriminals to misuse its Claude AI platform, according to a newly released Threat Intelligence report. Despite layered safeguards designed to prevent harmful outputs, malicious actors have adapted to exploit Claude’s advanced capabilities, weaponizing agentic AI to execute large-scale extortion, employment fraud, and ransomware…

August 28, 2025

In "Cyber Security News"

rssfeeds-admin

Next Louisiana father wanted in connection with murder of baby turns self in »

Previous « Microsoft Patch for WSUS Vulnerability has Broken Hotpatching on Windows Server 2025

Share

Published by

rssfeeds-admin

4 months ago

Recent Posts

The Verge

AI vs. the Pentagon: killer robots, mass surveillance, and red lines

WASHINGTON, DC - JANUARY 29: U.S. Secretary of War Pete Hegseth (C) speaks during a…

4 minutes ago

The Verge

Woot’s ‘Video Games for All’ sale features some of our favorite games

There’s a sale happening at Woot that’s delivering Black Friday-esque deals on video games through…

5 minutes ago

IGN

Asus ROG Flow Z13 Kojima Edition Review

When I reviewed the original ROG Flow Z13 last year, I was impressed at how…

30 minutes ago

IGN

Top Gun Is Getting a New 4K Steelbook to Celebrate the Movie’s 40th Anniversary

Top Gun is turning 40 this year, and in celebration of that big anniversary, it's…

30 minutes ago

The Verge

We don’t have to have unsupervised killer robots

It's the day of the Pentagon's looming ultimatum for Anthropic: allow the US military unchecked…

1 hour ago

The Verge

The US military reportedly shot down a CBP drone with a laser

The US-Mexico border in Fort Hancock, Texas. | Photographer: Luke Sharrett/Bloomberg via Getty Images The…

1 hour ago

L

This website uses cookies.