At the RSA Conference 2023 today, we are excited to unveil VirusTotal Code Insight, a cutting-edge feature that leverages artificial intelligence for code analysis. Powered by Google Cloud Security AI Workbench, Code Insight produces natural language summaries of code snippets with ease. This functionality empowers security experts and analysts by providing them with deeper insights into the purpose and operation of analyzed code, significantly enhancing their capability to detect and mitigate potential threats.
For quite some time, artificial intelligence (AI) and machine learning (ML) have played a crucial role in anti-malware and cybersecurity, mainly focusing on classification tasks. However, recent advancements in large language models (LLMs) have expanded their capabilities to encompass text generation and summarization.
Impressively, when these models are trained on programming languages, they can adeptly transform code into natural language explanations. This innovation not only expedites malware analysis but also bolsters a variety of cybersecurity applications. Recognizing the immense potential of this cutting-edge technology, we have incorporated it into the VirusTotal platform, significantly enhancing its capabilities.
Code Insight is a new feature based on Sec-PaLM, one of the generative AI models hosted on Google Cloud AI. What sets this functionality apart is its ability to generate natural language summaries from the point of view of an AI collaborator specialized in cybersecurity and malware. This provides security professionals and analysts with a powerful tool to figure out what the code is up to.
At present, this new functionality is deployed to analyze a subset of PowerShell files uploaded to VirusTotal. The system excludes files that are highly similar to those previously processed, as well as files that are excessively large. This approach allows for the efficient use of analysis resources, ensuring that only the most relevant files (such as PS1 files) are subjected to scrutiny. In the coming days, additional file formats will be added to the list of supported files, broadening the scope of this functionality even further.
Let's examine a few examples derived from authentic situations to truly appreciate the functionality of this feature.
In this first case, we have a file that was detected by only three engines on VirusTotal as “PowerShell/PSW-Agent.U” and “HEUR.Trojan-PSW.Multi.Disco.gen”. Meanwhile, Code Insight provided the following explanation:
Unveiling false negatives
It's important to note that Code Insight conducts its analysis independently, relying solely on the content of the file being processed, without access to antivirus results or any other associated metadata. A good example can be observed in this case of a false negative, where Code Insight’s explanation helps us detect malware to stealth user’s credentials that has not been identified by any antivirus software in VirusTotal:
Clearing false positives
In this other example, we have a file that is flagged as trojan and malware by 9 antivirus engines, but it's actually a false positive. Here we can see once again how Code Insight can be a valuable ally when managing incidents and analyzing potential malware. In this case, it explains that it's simply a script that installs Postman CLI:
In this last example, Code Insight demonstrates how it can help improve file categorization in VirusTotal. Code Insight accurately identifies the sample’s file type and fixes the tag that misclassified it as JavaScript.
Although the selected examples illustrate accurate descriptions, the performance of the LLM model may vary on a case-by-case basis, including judgment errors. It’s highly likely that attackers develop new evasive strategies and an ongoing competition between malware and this new approach is expected. That’s why it is crucial for a security analyst to oversee these features as they ultimately need to interpret this information combined with other contextual information and correlations relevant to the case at hand.
Nevertheless, the integration of LLMs into the arsenal of code analysis tools is a significant advancement that enables security professionals to gain valuable insights into the structure and behavior of potentially malicious code, improving threat detection and response efficiency.
Code Insight in VirusTotal Intelligence
This kind of analysis can be carried out by various AI models, each offering varying levels of precision and depth. However, the true value of VirusTotal's Code Insight lies in its capacity to scale this analysis through its platform. This enables not only the examination of individual code samples, but also the aggregation and exploitation of results on a large scale via the VirusTotal Intelligence service. As a result, security teams can swiftly and effectively scrutinize vast quantities of code and identify potential threats, enhancing their efficiency and ultimately fortifying their security stance.
Here’s an example of searching for ”codeinsight:keylogger”:
VirusTotal Intelligence finds several files that, according to the Code Insight report, record keystrokes and write them to a log file. Let’s expand the report of the first one, then we can read a comprehensive analysis explaining this specific keylogger’s behavior:
As we continue to refine and expand the capabilities of VirusTotal Code Insight and other cutting-edge features, we remain dedicated to providing our community with the most advanced and effective tools to stay ahead of evolving cyber threats. We are truly excited about what the future holds and are eager to continue pushing the boundaries of what is possible in the field of cybersecurity. Stay tuned for more updates and developments from the VirusTotal team.