Husain Parvez
Published on: September 23, 2025
CrowdStrike and Meta have jointly released CyberSOCEval, a new open-source benchmark suite designed to evaluate how large language models (LLMs) perform on critical security operations center (SOC) tasks such as malware analysis, incident response, and threat detection.
Built on Meta’s CyberSecEval framework and integrated with CrowdStrike’s threat intelligence, the tool aims to give organizations a standardized way to test the effectiveness of AI models under real-world attack conditions. The benchmark suite, now available on GitHub, includes documentation, sample datasets, and guidance for integrating the tests into existing SOC environments.
The rise of AI in cybersecurity has made it harder for teams to choose the right tools. Many security products now claim AI capabilities, but without clear benchmarks, it’s been difficult to assess which models deliver real-world value. CyberSOCEval addresses this by simulating adversarial tactics and complex security scenarios, allowing teams to validate LLM performance before deployment.
Vincent Gonguet, Director of Product, GenAI at Superintelligence Labs at Meta, said the collaboration “introduces a new open source benchmark suite to evaluate the capabilities of LLMs in real world security scenarios. With these benchmarks in place, and open for the security and AI community to further improve, we can more quickly work as an industry to unlock the potential of AI in protecting against advanced attacks.”
Daniel Bernard, Chief Business Officer at CrowdStrike, added that “when two leaders like CrowdStrike and Meta come together, it’s larger than collaboration, it’s about setting the direction of cybersecurity for the AI era,” emphasizing the benchmark’s role in helping security teams adopt AI with confidence.
The companies hope CyberSOCEval will support both enterprise users and AI developers. Businesses get a transparent framework for comparing models, while developers gain feedback on how their models handle realistic security workflows, including complex reasoning and industry-specific language.