Monday, April 24, 2023

Introducing VirusTotal Code Insight: Empowering threat analysis with generative AI

At the RSA Conference 2023 today, we are excited to unveil VirusTotal Code Insight, a cutting-edge feature that leverages artificial intelligence for code analysis. Powered by Google Cloud Security AI Workbench, Code Insight produces natural language summaries of code snippets with ease. This functionality empowers security experts and analysts by providing them with deeper insights into the purpose and operation of analyzed code, significantly enhancing their capability to detect and mitigate potential threats.

For quite some time, artificial intelligence (AI) and machine learning (ML) have played a crucial role in anti-malware and cybersecurity, mainly focusing on classification tasks. However, recent advancements in large language models (LLMs) have expanded their capabilities to encompass text generation and summarization. 

Impressively, when these models are trained on programming languages, they can adeptly transform code into natural language explanations. This innovation not only expedites malware analysis but also bolsters a variety of cybersecurity applications. Recognizing the immense potential of this cutting-edge technology, we have incorporated it into the VirusTotal platform, significantly enhancing its capabilities.

Code Insight is a new feature based on Sec-PaLM, one of the generative AI models hosted on Google Cloud AI. What sets this functionality apart is its ability to generate natural language summaries from the point of view of an AI collaborator specialized in cybersecurity and malware. This provides security professionals and analysts with a powerful tool to figure out what the code is up to. 

At present, this new functionality is deployed to analyze a subset of PowerShell files uploaded to VirusTotal. The system excludes files that are highly similar to those previously processed, as well as files that are excessively large. This approach allows for the efficient use of analysis resources, ensuring that only the most relevant files (such as PS1 files) are subjected to scrutiny. In the coming days, additional file formats will be added to the list of supported files, broadening the scope of this functionality even further.

Let's examine a few examples derived from authentic situations to truly appreciate the functionality of this feature. 

In this first case, we have a file that was detected by only three engines on VirusTotal as “PowerShell/PSW-Agent.U” and “HEUR.Trojan-PSW.Multi.Disco.gen”. Meanwhile, Code Insight provided the following explanation:

https://www.virustotal.com/gui/file/74662107227a6a28bebb77d5b9ec3890a80e507ee22ed99eb17f35c9d8730bf3/detection

Unveiling false negatives


It's important to note that Code Insight conducts its analysis independently, relying solely on the content of the file being processed, without access to antivirus results or any other associated metadata. A good example can be observed in this case of a false negative, where Code Insight’s explanation helps us detect malware to stealth user’s credentials that has not been identified by any antivirus software in VirusTotal:

https://www.virustotal.com/gui/file/552efb0dc7e62ded08c98d2e6355df1d27a1317c0a37aabeefd48667b7b1917b/detection

Clearing false positives


In this other example, we have a file that is flagged as trojan and malware by 9 antivirus engines, but it's actually a false positive. Here we can see once again how Code Insight can be a valuable ally when managing incidents and analyzing potential malware. In this case, it explains that it's simply a script that installs Postman CLI: