Critical RCE Flaws in AI Inference Engines Expose Meta, Nvidia, and Microsoft Frameworks

Security researchers at Oligo Security have uncovered a series of critical Remote Code Execution vulnerabilities affecting widely deployed AI inference servers from major technology companies.

The flaws affect frameworks from Meta, NVIDIA, and Microsoft, as well as open-source projects including vLLM, SGLang, and Modular Max Server, potentially exposing enterprise AI infrastructure to serious security risks.

The vulnerabilities stem from a common root cause dubbed ShadowMQ—the unsafe use of ZeroMQ (ZMQ) combined with Python’s pickle deserialization mechanism.

This security flaw spread across multiple AI frameworks through code reuse, with developers copying vulnerable code patterns from one project to another, sometimes line-for-line.

The problem originated with Meta’s Llama Stack, where researchers discovered the use of ZMQ’s recv_pyobj() method, which deserializes incoming data using Python’s pickle module.

This creates a critical security issue because pickle can execute arbitrary code during deserialization. When recv_pyobj() listens on an unauthenticated network socket, any remote attacker who can reach that socket can send a crafted pickle payload and execute malicious code on the server.
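The behavior described above can be shown with a harmless sketch. The `Payload` class is hypothetical and invented for illustration; it uses pickle's documented `__reduce__` hook to name a callable, and `pickle.loads` invokes that callable on the receiving side. Here the callable is the benign `operator.add`, but the sender chooses it, which is why feeding untrusted bytes to pickle (or to `recv_pyobj()`, which the article says relies on pickle) is unsafe.

```python
import operator
import pickle


class Payload:
    """Hypothetical class: tells pickle how to 'rebuild' the object."""

    def __reduce__(self):
        # pickle stores this callable and its arguments in the byte stream.
        # On load, the receiver calls operator.add(40, 2) without ever
        # inspecting what the callable is. An attacker would pick something
        # far less benign than addition.
        return (operator.add, (40, 2))


blob = pickle.dumps(Payload())   # what a sender would put on the wire
result = pickle.loads(blob)      # what the receiver effectively runs

# The receiver did not get a Payload back; it got the return value of a
# function call that the sender selected.
print(type(result).__name__, result)  # int 42
```

Nothing in the receiving code asked for `operator.add` to be called; the byte stream alone decided that.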

| CVE ID | Affected Product | Severity | Vulnerability Type | Patched Version |
|---|---|---|---|---|
| CVE-2024-50050 | Meta Llama Stack | Critical | Remote Code Execution (RCE) | ≥ v0.0.41 |
| CVE-2025-30165 | vLLM | Critical | Remote Code Execution (RCE) | ≥ v0.8.0 |
| CVE-2025-23254 | NVIDIA TensorRT-LLM | Critical (CVSS 9.3) | Remote Code Execution (RCE) | ≥ v0.18.2 |
| CVE-2025-60455 | Modular Max Server | Critical | Remote Code Execution (RCE) | ≥ v25.6 |

The vulnerability pattern appeared across major AI inference engines that form the backbone of enterprise AI operations.

NVIDIA’s TensorRT-LLM, PyTorch projects vLLM and SGLang, and Modular Max Server all contained nearly identical unsafe patterns.

In SGLang’s case, the vulnerable code file literally began with “Adapted from vLLM,” demonstrating how security flaws propagated through code copying.

Organizations using these frameworks include major technology companies such as xAI, AMD, Intel, LinkedIn, Oracle Cloud, Google Cloud, Microsoft Azure, and AWS, as well as universities such as MIT, Stanford, UC Berkeley, and Tsinghua University.

Researchers identified thousands of exposed ZMQ sockets communicating unencrypted over the public internet, some of which clearly belonged to production inference servers.

Successful exploitation could allow attackers to execute arbitrary code on GPU clusters, escalate privileges to internal systems, exfiltrate sensitive model data, or install cryptominers.

Meta, NVIDIA, vLLM, and Modular responded quickly with patches that replaced pickle with safer serialization mechanisms like JSON or msgpack and implemented HMAC validation.
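As a rough illustration of that kind of fix, the sketch below serializes messages as JSON and prefixes each frame with an HMAC-SHA256 tag. It is not taken from any of the vendors' patches: the function names, the frame layout, and `SECRET_KEY` are assumptions made for this example, and only the Python standard library is used. In a ZMQ service, the resulting bytes would be passed through plain byte-oriented send and receive calls instead of `recv_pyobj()`.

```python
import hashlib
import hmac
import json

# Hypothetical shared secret; a real deployment would load a randomly
# generated key from a secret store, not hard-code it.
SECRET_KEY = b"replace-with-a-randomly-generated-shared-secret"
TAG_LEN = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def encode_message(payload: dict, key: bytes = SECRET_KEY) -> bytes:
    """Serialize to JSON and prepend an HMAC tag over the JSON bytes."""
    body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    tag = hmac.new(key, body, hashlib.sha256).digest()
    return tag + body


def decode_message(frame: bytes, key: bytes = SECRET_KEY) -> dict:
    """Verify the tag first; parse the JSON only if it matches."""
    if len(frame) < TAG_LEN:
        raise ValueError("frame too short to contain an HMAC tag")
    tag, body = frame[:TAG_LEN], frame[TAG_LEN:]
    expected = hmac.new(key, body, hashlib.sha256).digest()
    # compare_digest avoids leaking the match position through timing.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("HMAC verification failed")
    return json.loads(body.decode("utf-8"))


frame = encode_message({"op": "generate", "max_tokens": 16})
message = decode_message(frame)
```

JSON parsing only ever yields plain data (dicts, lists, strings, numbers), so a hostile frame cannot name a callable the way a pickle stream can. The HMAC tag authenticates the sender but does not encrypt the traffic; confidentiality would still need TLS or a comparable transport layer.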

However, some projects, including Microsoft’s Sarathi-Serve, remain vulnerable. Researchers call these “Shadow Vulnerabilities”: known issues without CVEs that persist quietly in production environments.

Organizations using AI inference frameworks should upgrade to the patched versions immediately. Developers must avoid using pickle or recv_pyobj() with untrusted data, implement authentication mechanisms like HMAC or TLS for ZMQ-based communications, and scan for exposed ZMQ endpoints. Network access should be restricted by binding to specific interfaces rather than using “tcp://*”, which exposes sockets on all network interfaces.
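The binding advice can be checked mechanically. The helper below is a hypothetical sketch, not part of ZeroMQ or any of the affected frameworks: it inspects a ZMQ-style endpoint string and reports whether it would listen on every interface. Treating `0.0.0.0` and `[::]` as wildcards alongside `*` is an assumption of this sketch.

```python
# Hosts this sketch treats as "listen on every interface".
WILDCARD_HOSTS = {"*", "0.0.0.0", "[::]"}


def binds_all_interfaces(endpoint: str) -> bool:
    """Return True if a tcp:// endpoint string uses a wildcard host."""
    scheme, sep, rest = endpoint.partition("://")
    if not sep or scheme != "tcp":
        return False  # ipc://, inproc://, or malformed: not a TCP wildcard
    if rest.startswith("["):
        # Bracketed IPv6 literal, e.g. "[::]:5555"
        host = rest[: rest.find("]") + 1]
    else:
        # Drop the trailing ":port" if one is present
        host = rest.rsplit(":", 1)[0]
    return host in WILDCARD_HOSTS


# Example: flag risky entries in a list of configured endpoints.
configured = ["tcp://*:5555", "tcp://127.0.0.1:5555", "ipc:///tmp/engine.sock"]
risky = [e for e in configured if binds_all_interfaces(e)]
```

A check like this could run in CI or at service startup, so that a wildcard bind is a deliberate decision and not a copied default.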

The post Critical RCE Flaws in AI Inference Engines Expose Meta, Nvidia, and Microsoft Frameworks appeared first on Cyber Security News.
