Categories: Cyber Security News

Open Source CyberSOCEval Sets New Standards for AI in Malware Analysis and Threat Intelligence

A groundbreaking open-source benchmark suite called CyberSOCEval has emerged as the first comprehensive evaluation framework for Large Language Models (LLMs) in Security Operations Center (SOC) environments.

Released as part of CyberSecEval 4, this innovative benchmark addresses critical gaps in cybersecurity AI evaluation by focusing on two essential defensive domains: Malware Analysis and Threat Intelligence Reasoning.

The research, conducted by Meta and CrowdStrike, reveals that current AI systems are far from saturating these security-focused evaluations, with accuracy scores ranging from approximately 15% to 28% on malware analysis tasks and 43% to 53% on threat intelligence reasoning. 

Key Takeaways
1. CyberSOCEval is the first open-source benchmark testing LLMs on Security Operations Center tasks.
2. Current LLMs achieve only 15-28% accuracy on malware analysis and 43-53% on threat intelligence.
3. 609 malware questions and 588 threat intelligence questions evaluate AI systems on JSON logs, MITRE ATT&CK mappings, and complex attack chains.

These results highlight significant opportunities for improvement in AI cyber defense capabilities.

CyberSOCEval Malware Analysis

CyberSOCEval’s Malware Analysis component leverages real sandbox detonation data from CrowdStrike Falcon® Sandbox, creating 609 question-answer pairs across five malware categories: ransomware, Remote Access Trojans (RATs), infostealers, EDR/AV killers, and user-mode (UM) unhooking techniques.

The benchmark evaluates AI systems’ ability to interpret complex JSON-formatted system logs, process trees, network traffic, and MITRE ATT&CK framework mappings.

The benchmark supports models with context windows of up to 128,000 tokens and uses filtering mechanisms that reduce report size while preserving model performance.
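As a rough illustration of what report filtering for a fixed context window can look like, the sketch below keeps whole report sections in priority order until a token budget is used up. It is not CyberSOCEval's actual filtering logic: the section names, the priority order, and the four-characters-per-token estimate are all assumptions made for the example.

```python
import json

# Hypothetical section names; a real sandbox report schema will differ.
SECTION_PRIORITY = ["signatures", "mitre_attcks", "processes", "network", "strings"]


def estimate_tokens(text: str) -> int:
    """Rough heuristic (an assumption): about four characters per token."""
    return max(1, len(text) // 4)


def filter_report(report: dict, token_budget: int, priority=SECTION_PRIORITY) -> dict:
    """Keep whole sections in priority order, skipping any that would exceed the budget."""
    kept, used = {}, 0
    for key in priority:
        if key not in report:
            continue
        cost = estimate_tokens(json.dumps(report[key]))
        if used + cost <= token_budget:
            kept[key] = report[key]
            used += cost
    return kept


# Toy report: the oversized "strings" section is dropped, the rest fits.
toy_report = {
    "signatures": ["writes to remote process memory"],
    "processes": [{"pid": 4120, "name": "sample.exe"}],
    "strings": ["A" * 2000],
}
filtered = filter_report(toy_report, token_budget=100)
print(sorted(filtered))
```

A production filter would count tokens with the target model's own tokenizer and would likely trim inside large sections instead of dropping them whole.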

The evaluation covers critical cybersecurity concepts, including MITRE ATT&CK techniques such as T1055.001 (Process Injection: Dynamic-link Library Injection) and T1112 (Modify Registry), and Windows API calls like CreateRemoteThread, VirtualAlloc, and WriteProcessMemory.
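The three API calls named above are commonly associated with a process-injection flow: allocate memory, write a payload into it, then start a thread to run it. The sketch below shows one simple way to check whether those calls appear in that order in a list of API names. The trace format and the function name are hypothetical, and this is an illustrative heuristic only, not part of the benchmark or a reliable detector; real traces often show VirtualAllocEx for cross-process allocation.

```python
# Order of calls in a classic injection flow: allocate, write, execute.
INJECTION_SEQUENCE = ("VirtualAlloc", "WriteProcessMemory", "CreateRemoteThread")


def has_ordered_calls(api_calls, sequence=INJECTION_SEQUENCE) -> bool:
    """Return True if `sequence` occurs in `api_calls` in order (gaps allowed)."""
    remaining = iter(api_calls)
    # Each `in` test advances the shared iterator, so order is enforced.
    return all(step in remaining for step in sequence)


# Hypothetical API traces, as plain lists of call names.
suspicious_trace = [
    "LoadLibraryA",
    "VirtualAlloc",
    "WriteProcessMemory",
    "Sleep",
    "CreateRemoteThread",
]
out_of_order_trace = ["CreateRemoteThread", "VirtualAlloc", "ReadFile"]

print(has_ordered_calls(suspicious_trace))
print(has_ordered_calls(out_of_order_trace))
```

Matching an ordered subsequence, not a contiguous run, tolerates unrelated calls interleaved between the steps, which is typical of real sandbox logs.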

The Threat Intelligence Reasoning benchmark processes 588 question-answer pairs derived from 45 distinct threat intelligence reports sourced from CrowdStrike, CISA, NSA, and IC3. 

Unlike existing frameworks such as CTIBench and SEvenLLM, CyberSOCEval incorporates multimodal intelligence reports combining textual indicators of compromise (IOCs) with tables and diagrams.


The evaluation methodology employs both category-based and relationship-based question generation using Llama 3.2 90B and Llama 4 Maverick models. 

[Figure: Detonation report distribution by malware attack; distribution by topic and difficulty]

Questions require multi-hop reasoning across threat actor relationships, malware attribution, and complex attack chain analysis mapped to frameworks like MITRE ATT&CK.

Reasoning models leveraging test-time scaling did not demonstrate the performance improvements observed in coding and mathematics domains, suggesting cybersecurity-specific reasoning training represents a key development opportunity, Meta said.

The benchmark’s open-source nature encourages community contributions and provides practitioners with reliable model selection metrics while offering AI developers a clear development roadmap for enhancing cyber defense capabilities.


