Researchers have developed N-Day-Bench, a platform that evaluates how well large language models (LLMs) identify known security vulnerabilities in real-world codebases. The benchmark refreshes its test cases monthly from GitHub security advisories, which keeps the evaluation current and limits contamination: advisories published after a model's training cutoff cannot have been memorized. Each test case checks out a repository at the last commit before the patch was applied, and the model explores that pre-patch codebase from a sandboxed bash shell. Because the vulnerable state predates the public fix, this setup measures genuine vulnerability discovery rather than recall of memorized advisories, and the interactive shell makes the assessment closer to how a human auditor would work.

The results matter for practitioners because they suggest LLMs could take on a real role in vulnerability discovery, signaling a shift in how vulnerabilities are found and triaged and underscoring the need to track these emerging capabilities.
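The test-case construction described above can be sketched roughly as follows. This is a minimal illustration, not N-Day-Bench's actual code: the class, field names, and advisory identifier are all hypothetical; only the underlying git convention (a commit's first parent, `<commit>^`, is the last commit before the fix) comes from the source's description.

```python
from dataclasses import dataclass

@dataclass
class NDayTestCase:
    """One benchmark item: a repo frozen just before a known fix landed.

    Illustrative schema only; the real benchmark's format is not shown here.
    """
    advisory_id: str   # e.g. a GHSA identifier from the monthly advisory feed
    repo_url: str
    fix_commit: str    # commit that patched the vulnerability

    def pre_patch_ref(self) -> str:
        # In git, "<commit>^" names the commit's first parent, i.e. the
        # last commit before the patch was applied.
        return f"{self.fix_commit}^"

    def checkout_command(self, workdir: str) -> list[str]:
        # Command an evaluation sandbox could run to pin the repo to its
        # vulnerable, pre-patch state before handing it to the model.
        return ["git", "-C", workdir, "checkout", "--detach", self.pre_patch_ref()]

case = NDayTestCase(
    advisory_id="GHSA-xxxx-xxxx-xxxx",   # placeholder identifier
    repo_url="https://github.com/example/project",
    fix_commit="abc1234",
)
print(case.pre_patch_ref())
print(case.checkout_command("/tmp/project"))
```

The model is then given shell access to the checked-out working directory, so its findings reflect exploration of the code rather than lookup of the published advisory.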