Open Source CyberSOCEval Sets New Standards for AI in Malware Analysis and Threat Intelligence
Released as part of CyberSecEval 4, this innovative benchmark addresses critical gaps in cybersecurity AI evaluation by focusing on two essential defensive domains: Malware Analysis and Threat Intelligence Reasoning.
The research, conducted by Meta and CrowdStrike, reveals that current AI systems are far from saturating these security-focused evaluations, with accuracy scores ranging from approximately 15% to 28% on malware analysis tasks and 43% to 53% on threat intelligence reasoning.
Key Takeaways
1. CyberSOCEval is the first open-source benchmark testing LLMs on Security Operations Center tasks.
2. Current LLMs achieve only 15-28% accuracy on malware analysis and 43-53% on threat intelligence.
3. 609 malware questions and 588 threat intelligence questions evaluate AI systems on JSON logs, MITRE ATT&CK mappings, and complex attack chains.
These results highlight significant opportunities for improvement in AI cyber defense capabilities.
CyberSOCEval’s Malware Analysis component leverages real sandbox detonation data from CrowdStrike Falcon® Sandbox, creating 609 question-answer pairs across five malware categories: ransomware, Remote Access Trojans (RATs), infostealers, EDR/AV killers, and user-mode (UM) unhooking techniques.
The benchmark evaluates AI systems’ ability to interpret complex JSON-formatted system logs, process trees, network traffic, and MITRE ATT&CK framework mappings.
Technical specifications include support for models with up to 128,000 token context windows, with filtering mechanisms that reduce report size while maintaining performance integrity.
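To illustrate what such a filtering mechanism might look like, here is a minimal Python sketch that trims a serialized detonation report to fit a 128,000-token context window. The field names ("screenshots", "raw_memory_dumps", "static_strings") and the four-characters-per-token heuristic are illustrative assumptions, not the benchmark's actual schema or method.

```python
import json

CHARS_PER_TOKEN = 4          # rough heuristic for English/JSON text (assumption)
MAX_TOKENS = 128_000         # context window reported for evaluated models

# Hypothetical low-signal fields to drop first, largest-impact first.
LOW_SIGNAL_FIELDS = ["screenshots", "raw_memory_dumps", "static_strings"]

def estimate_tokens(obj) -> int:
    """Approximate token count from the serialized JSON length."""
    return len(json.dumps(obj)) // CHARS_PER_TOKEN

def filter_report(report: dict, max_tokens: int = MAX_TOKENS) -> dict:
    """Drop low-signal fields until the serialized report fits the budget."""
    trimmed = dict(report)
    for field in LOW_SIGNAL_FIELDS:
        if estimate_tokens(trimmed) <= max_tokens:
            break
        trimmed.pop(field, None)
    return trimmed
```

The design choice here is to discard whole low-value sections rather than truncate mid-structure, so the remaining JSON stays well-formed for the model.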
The evaluation covers critical cybersecurity concepts, including T1055.001 (Process Injection: DLL Injection), T1112 (Modify Registry), and API calls like CreateRemoteThread, VirtualAlloc, and WriteProcessMemory.
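Those three API calls form the classic remote-thread injection chain an analyst (or model) must recognize in detonation logs: allocate memory in the target, write a payload into it, then start a remote thread. A minimal sketch of that pattern check, assuming a simplified event list rather than the benchmark's actual JSON schema:

```python
# Ordered API sequence characteristic of remote-thread process injection.
INJECTION_SEQUENCE = ["VirtualAlloc", "WriteProcessMemory", "CreateRemoteThread"]

def shows_injection_chain(api_calls: list[str]) -> bool:
    """Return True if the calls contain the injection APIs in order,
    possibly with unrelated calls interleaved."""
    idx = 0
    for call in api_calls:
        if call == INJECTION_SEQUENCE[idx]:
            idx += 1
            if idx == len(INJECTION_SEQUENCE):
                return True
    return False
```

A real detector would also check which process handles the calls target; this sketch only captures the ordering signal the benchmark questions probe.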
The Threat Intelligence Reasoning benchmark processes 588 question-answer pairs derived from 45 distinct threat intelligence reports sourced from CrowdStrike, CISA, NSA, and IC3.
Unlike existing frameworks such as CTIBench and SEvenLLM, CyberSOCEval incorporates multimodal intelligence reports combining textual indicators of compromise (IOCs) with tables and diagrams.
The evaluation methodology employs both category-based and relationship-based question generation using Llama 3.2 90B and Llama 4 Maverick models.
[Figure: Detonation report distribution by malware category, and question distribution by topic and difficulty]
Questions require multi-hop reasoning across threat actor relationships, malware attribution, and complex attack chain analysis mapped to frameworks like MITRE ATT&CK.
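The multi-hop reasoning these questions demand can be pictured as chaining relationships across a small knowledge graph (threat actor → malware → ATT&CK technique). The sketch below uses generic placeholder entities, not actors or malware from the benchmark's source reports:

```python
# Toy knowledge graph: (entity, relation) -> list of related entities.
# Entity names are hypothetical placeholders.
EDGES = {
    ("ActorA", "uses"): ["MalwareX"],
    ("MalwareX", "implements"): ["T1055"],  # Process Injection
}

def multi_hop(start: str, relations: list[str]) -> list[str]:
    """Follow a chain of relations from a starting entity,
    e.g. which techniques does ActorA's malware implement?"""
    frontier = [start]
    for rel in relations:
        next_frontier = []
        for node in frontier:
            next_frontier.extend(EDGES.get((node, rel), []))
        frontier = next_frontier
    return frontier
```

Answering "which ATT&CK technique does ActorA's tooling implement?" requires two hops (`uses`, then `implements`), which is exactly the kind of chained inference single-pass retrieval misses.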
Reasoning models leveraging test-time scaling did not demonstrate the performance improvements observed in coding and mathematics domains, suggesting cybersecurity-specific reasoning training represents a key development opportunity, Meta said.
The benchmark’s open-source nature encourages community contributions and provides practitioners with reliable model selection metrics while offering AI developers a clear development roadmap for enhancing cyber defense capabilities.
The post Open Source CyberSOCEval Sets New Standards for AI in Malware Analysis and Threat Intelligence appeared first on Cyber Security News.