Cybersecurity firm Gambit Security uncovered the breach, revealing how persistent prompting bypassed Claude’s safety guardrails.
According to a Bloomberg report, the operation ran from December 2025 to early January 2026. The hacker wrote Spanish-language prompts that cast Claude as an “elite hacker” taking part in a simulated bug bounty program.
Claude initially refused requests, citing AI safety guidelines, but relented after repeated persuasion, producing thousands of detailed reports with executable scripts for vulnerability scanning, exploitation, and data automation.
When Claude reached its limits, the attacker switched to ChatGPT for lateral movement tactics and evasion strategies.
Gambit researchers analyzed the conversation logs and found that Claude generated step-by-step plans specifying internal targets and required credentials. This “agentic” AI assistance lowered the barrier to cyberattacks: the operation required no advanced infrastructure beyond AI subscriptions.
The breaches targeted high-value entities and exploited at least 20 vulnerabilities across federal and state systems.
| Target Entity | Data Stolen | Volume/Details |
|---|---|---|
| Federal Tax Authority (SAT) | Taxpayer records | 195 million |
| National Electoral Institute (INE) | Voter records | Sensitive voter data |
| State Governments (Jalisco, Michoacán, Tamaulipas) | Employee credentials, civil registries | Multiple |
| Monterrey Water Utility | Civil files, operational data | Part of 150GB total |
Total haul: 150GB of taxpayer, voter, credential, and registry data, with no public leaks reported yet.
Claude’s outputs included reconnaissance scripts for network scanning, SQL injection exploits, and credential-stuffing automation tailored to outdated government systems.
Prompts focused on misconfigurations such as unpatched web apps and weak authentication, both common in legacy Mexican infrastructure. Gambit noted the AI’s ability to chain tasks, from vulnerability discovery to payload deployment, mirroring advanced persistent threats but within reach of solo operators.
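On the defensive side, the SQL injection class named above is typically closed off with parameterized queries. The following is a minimal sketch, not taken from Gambit's findings: it assumes Python's built-in `sqlite3` module and a hypothetical `users` table, and shows that a bound parameter is treated as data rather than as SQL.

```python
import sqlite3


def find_user(conn, username):
    """Look up a user with a bound parameter instead of string formatting."""
    # The "?" placeholder makes the driver pass `username` as a value,
    # so quote characters in the input cannot change the query structure.
    cur = conn.execute(
        "SELECT id, username FROM users WHERE username = ?",
        (username,),
    )
    return cur.fetchall()


def demo():
    """Build a throwaway in-memory database (hypothetical schema) and query it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
    conn.executemany(
        "INSERT INTO users (username) VALUES (?)",
        [("alice",), ("bob",)],
    )
    normal = find_user(conn, "alice")
    # A classic injection string is matched literally and finds no rows.
    injected = find_user(conn, "alice' OR '1'='1")
    conn.close()
    return normal, injected
```

Calling `demo()` returns the matching row for the legitimate lookup and an empty list for the injection attempt, because the whole malicious string is compared against the `username` column as a literal value.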
Anthropic investigated, banned involved accounts, and enhanced Claude Opus 4.6 with real-time misuse probes. OpenAI confirmed ChatGPT rejected policy-violating prompts.
Responses from Mexican authorities varied: Jalisco denied any breach, INE said there had been no unauthorized access, and federal agencies were still assessing the damage. Gambit ruled out nation-state ties, attributing the attack to an unidentified individual.
Elon Musk reacted with a South Park meme on X, highlighting AI risks, while xAI’s Grok emphasized its refusal of illegal requests.
This incident underscores the risk of “AI-orchestrated” cybercrime, in which jailbreaks turn consumer models into hacking tools. Experts urge prompt engineering defenses, behavioral monitoring, and air-gapped AI for sensitive operations.
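Behavioral monitoring can take many forms. One simple illustration, relevant to the credential-stuffing automation described earlier, is a sliding-window counter of failed logins per source address. This is a hedged sketch only: the class name `FailedLoginMonitor`, the thresholds, and the idea of keying on source IP are assumptions made for illustration, not a description of any tool used by Gambit or the affected agencies.

```python
from collections import defaultdict, deque


class FailedLoginMonitor:
    """Flag sources with too many failed logins inside a time window.

    Hypothetical helper for illustration; thresholds are arbitrary defaults.
    """

    def __init__(self, max_failures=5, window_seconds=60.0):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        # Maps each source address to the timestamps of its recent failures.
        self._failures = defaultdict(deque)

    def record_failure(self, source_ip, timestamp):
        """Record one failed login; return True if the source should be flagged."""
        events = self._failures[source_ip]
        events.append(timestamp)
        # Discard failures that have fallen out of the sliding window.
        while events and timestamp - events[0] > self.window_seconds:
            events.popleft()
        return len(events) > self.max_failures
```

In use, an authentication service would call `record_failure(ip, time.time())` on each rejected login and throttle or alert when it returns `True`. Real deployments usually combine several signals (account, device, geography), since attackers can rotate addresses.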
Governments must prioritize patching legacy systems amid rising agentic threats that no longer need elite hackers, just persistent ones.
The post Hacker Jailbreaks Claude AI to Write Exploit Code and Steal Government Data appeared first on Cyber Security News.