Categories: Cyber Security News

vLLM Vulnerability Enables Remote Code Execution Through Malicious Payloads

A high-severity vulnerability has been discovered in vLLM, a widely used high-throughput inference and serving engine for Large Language Models.

The flaw, identified as CVE-2025-62164, enables attackers to execute arbitrary code remotely through maliciously crafted payloads sent to the Completions API endpoint.

With a CVSS score of 8.8 out of 10, this high-severity vulnerability poses an immediate threat to organizations deploying vLLM in production environments.

The vulnerability affects vLLM versions 0.10.2 and later and stems from improper handling of user-supplied prompt embeddings.

When processing these embeddings, the system deserializes tensors using PyTorch’s torch.load() function without adequate validation checks.

This oversight creates a dangerous attack vector that malicious actors can readily exploit to compromise hosting environments.

Memory Corruption Vulnerability Threatens vLLM Deployments

The root cause of this vulnerability lies in a behavioral change introduced in PyTorch 2.8.0, which disabled sparse tensor integrity checks by default.

This configuration allows attackers to craft malicious tensors that bypass internal bounds checks. When these compromised tensors are processed by the to_dense() function, they trigger an out-of-bounds memory write, corrupting the application’s memory space and enabling code execution.
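To make the failure mode concrete, here is a minimal pure-Python sketch of the kind of bounds check that sparse tensor invariant validation performs. It is not PyTorch's implementation: the names `validate_coo` and `to_dense_2d` are hypothetical, and a simple two-dimensional coordinate layout is assumed for illustration. In native code, skipping this check is what turns an out-of-range index into an out-of-bounds write.

```python
from typing import List, Sequence, Tuple


def validate_coo(indices: Sequence[Tuple[int, int]],
                 values: Sequence[float],
                 shape: Tuple[int, int]) -> None:
    """Raise ValueError if any coordinate falls outside `shape`."""
    if len(indices) != len(values):
        raise ValueError("indices and values must have the same length")
    rows, cols = shape
    for r, c in indices:
        if not (0 <= r < rows and 0 <= c < cols):
            raise ValueError(f"index ({r}, {c}) is outside shape {shape}")


def to_dense_2d(indices: Sequence[Tuple[int, int]],
                values: Sequence[float],
                shape: Tuple[int, int]) -> List[List[float]]:
    """Scatter sparse values into a dense grid, validating first."""
    validate_coo(indices, values, shape)
    rows, cols = shape
    dense = [[0.0] * cols for _ in range(rows)]
    for (r, c), v in zip(indices, values):
        dense[r][c] += v  # safe: every (r, c) was bounds-checked above
    return dense
```

A coordinate such as `(5, 0)` in a 2x2 grid is rejected before any write happens, which is the property the disabled integrity checks no longer guaranteed.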

Omri Fainaro and Bary Levy of the AXION Security Research Team discovered the vulnerability and reported it through coordinated disclosure.

Their findings show that any user with API access can exploit the weakness in two ways: crashing the vLLM server (denial of service) or executing code remotely and compromising the entire hosting environment.

The vulnerability resides in the _load_and_validate_embed function within vllm/entrypoints/renderer.py, where missing validation allows unsafe deserialization of untrusted input.

The function accepts base64-encoded tensors from users but fails to enable PyTorch’s torch.sparse.check_sparse_tensor_invariants context manager, which would normally reject such malformed tensors.
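The following is a minimal sketch of what a hardened loader could look like, not a reproduction of vLLM's actual patch. `torch.load(..., weights_only=True)` and `torch.sparse.check_sparse_tensor_invariants` are existing PyTorch APIs. The function names, the size cap, and the policy of refusing sparse layouts outright are assumptions made for illustration.

```python
import base64
import binascii
import io

MAX_EMBED_BYTES = 16 * 1024 * 1024  # hypothetical size cap; tune per deployment


def decode_embed_payload(b64_payload: str) -> bytes:
    """Strictly decode the base64 payload and enforce a size cap."""
    try:
        raw = base64.b64decode(b64_payload, validate=True)
    except (binascii.Error, ValueError) as exc:
        raise ValueError("prompt embedding is not valid base64") from exc
    if len(raw) > MAX_EMBED_BYTES:
        raise ValueError("prompt embedding payload is too large")
    return raw


def load_embed_safely(b64_payload: str):
    """Deserialize a tensor with sparse invariant checks enabled."""
    raw = decode_embed_payload(b64_payload)
    import torch  # imported lazily so the decoding step works without torch

    # With the context manager active, sparse tensors are validated
    # when they are constructed during loading.
    with torch.sparse.check_sparse_tensor_invariants():
        tensor = torch.load(io.BytesIO(raw), weights_only=True,
                            map_location=torch.device("cpu"))
        if not isinstance(tensor, torch.Tensor):
            raise ValueError("payload did not deserialize to a tensor")
        if tensor.layout != torch.strided:
            # Sketch-level policy (an assumption): refuse sparse layouts.
            raise ValueError("only dense tensors are accepted")
    return tensor
```

Rejecting malformed base64 and oversized payloads before deserialization keeps the cheapest checks first. Whether to accept validated sparse tensors at all is a policy decision for each deployment.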

Attribute	Details
CVE ID	CVE-2025-62164
Severity	High
CVSS Score	8.8/10
Affected Product	vLLM (pip)
Affected Versions	≥ 0.10.2
Attack Vector	Network (Completions API endpoint)
Weakness Categories	Improper Input Validation, Unsafe Deserialization, Out-of-Bounds Write, Write-What-Where Condition
Primary Impact	Remote Code Execution, Denial of Service
Discovery Team	AXION Security Research Team (Omri Fainaro, Bary Levy)

The Common Vulnerability Scoring System assigns this flaw a score of 8.8 out of 10, classifying it as high severity.

The vulnerability spans multiple weakness categories, including improper input validation, deserialization of untrusted data, write-what-where conditions, and out-of-bounds writes, all of which raise the risk profile of this flaw.

The vLLM development team has addressed this security issue through pull request #27204, which implements proper tensor validation before deserialization.

Organizations running vLLM as a server or processing untrusted model-provided payloads should immediately apply the available patch to protect their deployments from potential exploitation.
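As a rough aid for triage, the sketch below checks whether an installed version string falls in the affected range. The lower bound 0.10.2 comes from the summary above. The first fixed release is not stated here, so it is left as an optional parameter to be filled in from the official advisory. The helper names are hypothetical, and pre-release suffixes are ignored for simplicity.

```python
import re
from typing import Optional, Tuple

FIRST_AFFECTED = (0, 10, 2)  # lower bound reported for this CVE


def parse_version(text: str) -> Tuple[int, ...]:
    """Extract the leading numeric release segment of a version string."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_affected(version: str,
                first_fixed: Optional[Tuple[int, ...]] = None) -> bool:
    """Return True if `version` is at or above the first affected release
    and below `first_fixed` (when a fixed release is supplied)."""
    parsed = parse_version(version)
    if parsed < FIRST_AFFECTED:
        return False
    return first_fixed is None or parsed < first_fixed
```

In practice the installed version could be read with `importlib.metadata.version("vllm")` and passed to `is_affected`, together with the fixed release named in the advisory.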

Security teams should prioritize updating vLLM instances to patched versions and implementing network-level controls to restrict API access to trusted sources.

Additionally, organizations should monitor vLLM logs for suspicious tensor-based payloads and consider implementing request validation mechanisms at the API gateway level.
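One way gateway-level request validation might look is sketched below: requests that carry prompt embeddings are refused unless the caller is on an allowlist. The field name `prompt_embeds`, the client identifiers, and the function name are assumptions for illustration and should be checked against the API schema actually deployed.

```python
import json
from typing import Iterable, Tuple

EMBED_FIELD = "prompt_embeds"  # assumed request field name; verify before use


def check_completion_request(body: bytes, client_id: str,
                             trusted_clients: Iterable[str]) -> Tuple[bool, str]:
    """Return (allowed, reason) for a raw Completions API request body."""
    try:
        payload = json.loads(body)
    except (json.JSONDecodeError, UnicodeDecodeError):
        return False, "body is not valid JSON"
    if not isinstance(payload, dict):
        return False, "body must be a JSON object"
    if EMBED_FIELD in payload and client_id not in set(trusted_clients):
        return False, f"{EMBED_FIELD} is not accepted from untrusted clients"
    return True, "ok"
```

A filter like this narrows the exposed attack surface but does not replace patching, since trusted clients can still submit embeddings.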

The post vLLM Vulnerability Enables Remote Code Execution Through Malicious Payloads appeared first on Cyber Security News.

Published by

rssfeeds-admin

5 months ago
