Thursday, August 29, 2024

, ,

Exploring the VirusTotal Dataset | An Analyst's Guide to Effective Threat Research

By Aleksandar Milenkoski (SentinelOne) and Jose Luis Sánchez Martínez

VirusTotal stores a vast collection of files, URLs, domains, and IPs submitted by users worldwide. It features a variety of functionalities and integrates third-party detection engines and tools to analyze the maliciousness of submitted artifacts and gather relevant related information, such as file properties, domain registrars, and execution behaviors. 
The VirusTotal dataset, the backbone of the platform, structures artifact-related information into objects and represents relevant relationships between them, providing contextual links between various artifacts. This makes VirusTotal a valuable resource for threat research, enabling users to perform activities such as clustering artifacts related to specific threat actors or campaigns, tracking malicious activities, and analyzing trends in the threat landscape. 
In this post, part of a collaborative effort between VirusTotal and SentinelLabs, we explore how to effectively use VirusTotal’s wide range of querying capabilities, highlight scenarios in which these capabilities return informative results, and discuss factors that may impact the completeness or relevance of the data. 
The content is aimed at VirusTotal users seeking to better understand the fundamental inner workings of the platform and how to effectively use it as part of their investigations. This contribution complements the comprehensive VirusTotal documentation by discussing certain aspects in greater detail along with a summary of relevant context and usage information, and demonstrating how VirusTotal capabilities are applied in real-world cases.

Overview

The VirusTotal platform analyzes files and network-related artifacts (URLs, domains, and IPs) submitted to the platform to detect maliciousness. The platform aggregates results from third-party detection engines, web scanners, and other tools to provide thorough analysis overviews. 
VirusTotal stores submitted artifacts as well as information related to each artifact in a dataset, which we refer to as the VirusTotal dataset. The artifact-related information is extensive and diverse, including, for example, file properties such as filename, file type, digital signatures, and hashes, as well as URL components such as domains, URL paths, and URL query parameters.
VirusTotal provides interfaces for users to interact with the platform and search filters for querying it. The search filters allow for retrieving and pivoting through artifact-related information. Additionally, they enable clustering multiple artifacts and identifying newly submitted ones based on user-defined queries aimed at capturing overlaps in content or related information. This has many use cases in threat intelligence, such as identifying trends in the threat landscape, commonalities between different threats, and tracking specific threat groups or campaigns.
Below, we first provide an overview of the VirusTotal dataset structure and the different interfaces available for interacting with it, with a focus on querying the dataset using search filters. We then delve into search modifiers, a specific type of VirusTotal search filter, highlighting modifiers for querying data generated by artificial intelligence (AI) technology. Next, we discuss factors that may impact the relevance or completeness of results when querying using search modifiers, including an example of using search modifiers in an actual threat research investigation.

The VirusTotal Dataset

The VirusTotal dataset is the backbone of the VirusTotal platform. Current records indicate that it stores a vast amount of submitted artifacts and related information, including over 50 billion files, 6 billion URLs, and 4 billion domains. The data is stored in a structured and hierarchical manner.
The top-level structure of the VirusTotal dataset
Artifact-related information is structured into objects, which have an ID, a type, and attributes. Optionally, objects may also have one or multiple relationships.
VirusTotal objects
The object ID uniquely identifies an object. An object can be directly related to a submitted artifact: a file, URL, domain, or IP address. In this case, the object's ID is derived from the artifact itself — the SHA-256 hash for files, the IP address or domain itself for IPs or domains, and the SHA-256 hash or the Base-64 encoded form for URLs.
An object type indicates the type of information stored by an object. For example, an object of type file stores file-specific information about submitted files, such as filenames, file hashes, and the file extension, whereas an object of type domain stores domain-specific information about submitted domains, such as the domain’s registrar. The objects of type file, url, ip, or domain are directly related to submitted artifacts, while the rest, such as threat_actor or reference, are not. 
An attribute is a data item that stores information related to an object and can be of a primitive or a complex data type, such as an array or a structure. For example, the file object (an object of type file) includes the string attribute sha256. This attribute stores the SHA-256 hash of a submitted file. Additionally, it may contain the attribute lnk_info, specifically present for Windows Shortcut (LNK) files. lnk_info is a structure containing information specific to LNK files and extracted from the submitted file, such as the date at which the shortcut had been created.
Relationships signify connections between objects of the same or different types, making them particularly useful for describing scenarios involving multiple artifacts. For instance, the malicious file config.lnk file (SHA-256 hash 85b317bb4463a93ecc4d25af872401984d61e9ddcee4c275ea1f1d9875b5fa61) communicates with the IP address 149.51.230[.]198 to download a payload. 
In the context of VirusTotal, this relationship is represented through the communicating_files relationship of the ip_address object, which is directly related to 149.51.230[.]198. communicating_files stores information about all files, in the form of file objects, observed to communicate with the IP address during sandbox execution. VirusTotal executes submitted executable files in sandboxes to capture behaviors and artifacts visible only while the file is executing, such as started processes, network communications, changes to the file system, or strings present in process memory.
The communicating_files relationship
Top-level collections group all objects of the same type. The top-level collections that VirusTotal currently implements are files (a set of all objects of type file), urls (a set of all objects of type url), ips (a set of all objects of type ip), domains (a set of all objects of type domain), collections (a set of all objects of type collection), threat actors (a set of all objects of type threat_actor), and references (a set of all objects of type reference).
The object of type collection is not to be confused with top-level collections. This object groups multiple objects of the same or different types given a user-specified context, such as a threat actor, a malicious campaign, or a malware family.
The top-level collections enable operations that relate to the set of all objects of a given type. Such an operation is submitting a new file for analysis, which adds an object of type file to the files collection. 

Querying VirusTotal

VirusTotal exposes two interfaces for interacting with its dataset: the platform’s graphical user interface (GUI) for manual interaction and the application program interface (API) for programmatic interaction. 

The VirusTotal GUI 

The VirusTotal GUI is the web interface of the platform. To query VirusTotal using the GUI, users enter a search query into the search field. A search query is composed of one or multiple search filters. 
A search filter can be a value uniquely identifying a submitted artifact – a URL, domain, file hash (MD5, SHA-1, or SHA-256), or an IP address. This filter cannot be combined with other filters and is used for retrieving information related to only a single submitted artifact. 
When a user inputs a value, VirusTotal retrieves the corresponding artifact and related information from its dataset and displays this data to the user as a web analysis report. To retrieve the artifact and its related information, VirusTotal searches its dataset for an object that has an ID or an attribute (for MD5 or SHA-1 hashes) that matches the user-provided value. The platform also explores relationships to and from this object.
Web analysis report (search query: 85b317bb4463a93ecc4d25af872401984d61e9ddcee4c275ea1f1d9875b5fa61)
Querying using value uniquely identifying a submitted artifact
A search filter can also be a search modifier in the format modifier:value, where value may be a predefined or a user-specified search criterion. 
Each modifier is mapped to one or more of the top-level collections files, urls, ips, domains, and collections, forming sets of file-, URL-, IP-, domain-, and collection-specific search modifiers. Further information on these modifiers can be found in the official VirusTotal documentation.
Multiple modifiers can be combined into more complex search queries using the logical operators AND, OR, and NOT. Parentheses can be used to group modifiers and logical operators, allowing for more precise queries by controlling the order of operations. All modifiers within a search query must be mapped to a single top-level collection. 
A special modifier is entity, which defines the top-level collection to which the search query is applied. For example, entity:file applies the search query to the files collection, and entity:url applies the query to the urls collection. If a user does not explicitly specify the entity modifier, the platform defaults to entity:file.
Structured overview of commonly used file-specific search modifiers (entity:file)
For each modifier, VirusTotal searches for and retrieves objects within the collection that the modifier is mapped to. These objects have an ID or attribute, or a relationship to another object with an ID or attribute, that meets the search criterion. For example, entity:domain AND domain:test instructs VirusTotal to retrieve all objects of type domain whose IDs (the domains themselves) contain the string test. An exception is the content modifier, which instructs the platform to search through the content of submitted files. 
In the case of a complex search query combining multiple modifiers using AND, OR, and/or NOT, VirusTotal combines the retrieved objects for each modifier into a resulting set that meets the combined criteria.
The platform then displays an overview of the resulting set of objects to the user in the form of a list of web analysis results. When a user clicks on an item in the list, the platform generates a web analysis report based on the corresponding object’s ID, as described previously. 
List of web analysis results (search query: entity:domain AND domain:test)
Querying using search modifiers
In addition to scoping searches to a specific top-level collection, VirusTotal also uses the entity modifier to disambiguate between search modifiers mapped to more than one collection, such as fs (first submission date). For example, entity:file AND fs:2024-07-15 instructs VirusTotal to search the files collection for file objects where the first_submission_date attribute is set to July 15, 2024. In contrast, entity:url AND fs:2024-07-15 directs VirusTotal to search the urls collection for url objects where first_submission_date is set to the same date.

The VirusTotal API

A basic way to query VirusTotal using the API is to issue HTTP GET requests to API endpoints exposed by the platform and specify search filters as part of the request URLs. VirusTotal implements multiple endpoints, such as the following:
  • /api/v3/intelligence/search?query={query}: This endpoint allows querying VirusTotal in the same manner as the GUI, using a value that uniquely identifies a submitted artifact (a URL, domain, IP address, or file hash) or search modifiers. Example request URLs are https://www.virustotal.com/api/v3/intelligence/search?query=test.com and https://www.virustotal.com/api/v3/intelligence/search?query=entity:domain+and+domain:test
  • /api/v3/files/{hash}: This endpoint retrieves the file object whose md5, sha1, or sha256 attribute matches the user-provided value. An example request URL is https://www.virustotal.com/api/v3/files/e6adf40a959308ea9de69699c58d2f25.
Querying using the /api/v3/files/{hash} API endpoint
VirusTotal returns JSON-formatted data in response to API requests. Users can parse this data and use it for additional actions, such as further querying and pivoting through the VirusTotal dataset.
Web requests can be issued using various methods, such as HTTP client libraries, command-line tools, or custom scripts. vt-py, the official Python library for the VirusTotal API, simplifies the process of sending web requests to endpoints and handling the responses, enabling users to perform various tasks programmatically.

The API vs. The GUI

There are several key differences between the VirusTotal GUI and API, particularly regarding scalability and the scope of available information.
The programmatic use of the API enables users to conduct large-scale querying of VirusTotal, which is not achievable through manual use of the GUI. For example, retrieving the names of the processes started by all Windows Shortcut files submitted to VirusTotal over 2024 is a task that is practically feasible only using the API.
Further, not all data stored in the VirusTotal dataset can be used as part of search queries using the GUI, which limits its querying capacity. For example, there are no search modifiers allowing users to query for URLs constructed in process memory during sandbox execution that submitted files have not contacted, such as secondary C2 URLs, which are contacted if communication with the primary C2 server fails. Although such search modifiers can be useful during investigations, the VirusTotal GUI displays these URLs in the Memory pattern URLs section of web analysis reports without providing a method to query for them directly.
The API endpoint /api/v3/files/{hash}/behaviours retrieves sandbox-generated data for files specified by their hash value. URLs discovered in process memory are stored in the memory_pattern_urls field returned by the endpoint. For files that meet other search criteria, users can programmatically extract the URLs and keep the file in consideration if a URL aligns with a specific search requirement.
memory_pattern_urls values 
The API may provide more information than what is visible to users in the GUI. For example, there can be discrepancies in sandbox-generated data provided to users through the GUI and the API. For example, the /api/v3/files/{hash}/behaviours endpoint retrieves all data generated by the sandbox CAPE for the file with the user-specified hash, including details on the suspicious behavior rules triggered by the file during execution. This information is not provided to users in the GUI.
CAPE suspicious behavior rules retrieved by api/v3/files/{hash}/behaviours

AI Search Modifiers

VirusTotal leverages artificial intelligence (AI) to generate natural language summaries of the functionalities of code in executable files submitted to the platform, such as scripts, Microsoft Office documents, or binary files. This feature is particularly beneficial for malware analysis, assisting analysts in understanding the capabilities of malware under investigation.
VirusTotal integrates AI engines into its pipeline for analyzing submitted files. These engines use large language models (LLMs) trained on programming languages, which enable them to analyze and translate code into natural language summaries of its functionalities. Some AI engines also generate verdicts, which are labels categorizing analyzed code as benign, suspicious, or malicious. VirusTotal supports two types of engines: Code Insight and Crowdsourced AI.
Code Insight is VirusTotal's in-house AI engine implementation, based on Google’s Gemini. Crowdsourced AI is a collection of third-party AI engines contributed by the community and is continuously enriched with new additions.
Depending on their training and design, the AI engines specialize in analyzing specific file types. For example, the ByteDefend AI engine is designed to analyze macro code in Microsoft Office files, including Word, Excel, and PowerPoint documents. 
The Code Insight engine focuses on script files, such as PowerShell, Python, and Ruby scripts. It excludes from analysis any script files that exceed a set file size or similarity threshold (these values are currently undocumented). The VirusTotal platform compares each script's code with scripts previously analyzed by the AI engine and calculates a similarity value. This value is then evaluated against the similarity threshold.
While we were writing this post, Google announced that Code Insight will also support Windows Portable Executable (PE) binary files. This feature is enabled by three interconnected phases: 
  • Unpacking binaries using the malware analysis service Mandiant Backscatter: This step reveals the underlying code of potentially obfuscated (packed) malicious binaries submitted to VirusTotal, which is the intended subject of analysis.
  • Decompilation of the unpacked binaries using Hex-Rays IDA Pro decompilers: This step translates the assembly code of the unpacked binaries into decompiled code written in a higher-level programming language (pseudocode). Gemini analyzes decompiled code with greater efficiency compared to assembly code due to its conciseness.
  • Analysis of the decompiled code with Gemini: This step generates a natural language summary of the decompiled code’s functionalities. 
The Code Insight analysis workflow
The summaries generated by Code Insight and Crowdsourced AI engines for a given file are stored in the analysis fields of the crowdsourced_ai_results attribute of the corresponding object of type file. The optional verdict fields of this attribute store the verdicts, while the source fields indicate the AI engines that have analyzed the code within the file.
The crowdsourced_ai_results attribute
Users can query VirusTotal for specific verdicts or content in AI-generated summaries using search modifiers designed for that purpose. These modifiers are mapped to the files top-level collection. 
For each AI search modifier, VirusTotal searches the platform’s dataset for file objects whose analysis and/or verdict fields of the crowdsourced_ai_results attribute meet the user-specified criterion. In addition, based on the source field, each AI search modifier focuses the search on summaries or verdicts generated by either Code Insight, specific Crowdsourced AI engines, or all available AI engines.  
Search modifier Usage and search scope
codeinsight
codeinsight:[text]
Searches for text in summaries generated by Code Insight.
crowdsourced_ai_analysis
crowdsourced_ai_analysis:[text]
Searches for text in summaries generated by Code Insight and all Crowdsourced AI engines.
crowdsourced_ai_verdict
crowdsourced_ai_verdict:[benign|suspicious|malicious]
Searches for benign, suspicious or malicious verdicts generated by Code Insight and all Crowdsourced AI engines.
[ENGINE]_ai_analysis
[ENGINE]_ai_analysis:[content]
Searches for text in summaries generated by a single Crowdsourced AI engine. [ENGINE] is the identifier for a specific engine, such as hispasec (hispasec_ai_analysis).
[ENGINE]_ai_verdict
[ENGINE]_ai_verdict:[benign|suspicious|malicious]
Searches for benign, suspicious or malicious verdicts generated by a single Crowdsourced AI engine.
VirusTotal introduces new engine-specific search modifiers ([ENGINE]_ai_analysis and [ENGINE]_ai_verdict) as new engines are incorporated into Crowdsourced AI. For example, with the addition of the ByteDefend engine, the platform released two new search modifiers: bytedefend_ai_analysis and bytedefend_ai_verdict.
The AI search modifiers can be combined with other AI search modifiers or with any other modifiers supported by VirusTotal using the logical operators AND, OR, and NOT. For example, the search query crowdsourced_ai_analysis:"inject" AND crowdsourced_ai_analysis:"explorer.exe" can be used to identify files that perform injection involving the explorer.exe process. The results returned from VirusTotal include the PowerShell script da.ps1, which injects code from an external file into this process. This functionality of the script is documented in the summary generated by the Code Insight AI engine.
da.ps1 injects code into explorer.exe
Code Insight analysis of da.ps1
Another example is the search query crowdsourced_ai_analysis:"Shell.Run" AND behavior_created_processes:"powershell.exe". This query can be used to identify files that invoke the Run function of the Windows Script Host Shell object to execute the PowerShell process powershell.exe for conducting further activities. The results returned from VirusTotal include the Visual Basic script 297641663, which executes a PowerShell command using the Run function to download a payload from a remote server.
297641663 executes powershell.exe
Code Insight analysis of 297641663
Although the AI engines integrated into VirusTotal provide valuable insights, they should be used as tools to assist in malware analysis efforts, as part of a broader analysis strategy. AI engines are designed and trained to analyze code based on historical data, and therefore may not always accurately interpret novel techniques or highly obfuscated code in malware implementations. As a result, the summaries they generate may sometimes lack sufficient or useful information for analysts.

Clustering With Search Modifiers

The extensive number of VirusTotal search modifiers enables analysts to query the platform in a practical and precise way. This allows for retrieving submitted artifacts and related information that are relevant to specific threats under investigation. However, false positives (where retrieved data is not related to the investigated threat) and false negatives (where relevant data is missing) can impact the relevance and completeness of search results. 
The way in which queries are formulated is important for addressing or alleviating the impact of these challenges. Combining search modifiers using the logical operators AND, OR, and NOT and refining search queries helps reduce the likelihood of false positives and false negatives. This is an iterative process where analysts may integrate information obtained from multiple sources into their query formulations. 
For example, malware analysis may provide characteristics suspected to be unique to the investigated activity cluster, such as specific file names, hashes, registry keys, network indicators, code signatures, strings or functions used by the malware, or distinct patterns of behavior. Additionally, information from previous reports documenting activities potentially related to the current investigation can also be beneficial. Upon reviewing the accuracy and completeness of query results, analysts may adjust the queries to further improve their relevance and precision. To illustrate these concepts, we provide an example from an actual threat research investigation.

Clustering Scenario

In 2023, SentinelLabs conducted an investigation into suspected China-nexus actors targeting Southeast Asian gambling companies. The investigation led to the AdventureQuest.exe Windows PE executable, which had been submitted to VirusTotal on May 11, 2023. Analysis of the file revealed it to be a malware loader implemented using the .NET framework, deploying further executables on compromised systems. These executables download archive files from attacker-controlled servers. The archives contain sideloading capabilities, including malicious DLLs sideloaded by legitimate executables to deploy the Cobalt Strike backdoor. 
AdventureQuest.exe is signed with a certificate issued to the Ivacy VPN vendor PMG PTE LTD (certificate serial number: 0E3E037C57A5447295669A3DB1A28B8A). It is probable that the PMG PTE LTD signing key has been stolen, a tactic often used by suspected Chinese threat actors to sign their malware. Based on overlaps in code and functionalities with malware observed in Operation ChattyGoblin, AdventureQuest.exe is likely part of the same activity cluster. 
The certificate’s serial number provides a starting point for identifying any other malware loaders submitted to VirusTotal that are signed with the same certificate and share implementation characteristics with AdventureQuest.exe, suggesting a potential link to the same threat actor or campaign.
VirusTotal uses the Sigcheck tool to extract digital signature information from submitted Windows PE files, including the serial numbers of code signing certificates. After extraction, this information is stored in the signature_info attribute of the corresponding file objects. Users can query VirusTotal for specific signature information using the signature search modifier.
The signature_info attribute (in the file object for AdventureQuest.exe)
The signature_info attribute (in the web analysis report for AdventureQuest.exe)
The query signature: "0E3E037C57A5447295669A3DB1A28B8A" searches for submitted files that have the serial number 0E3E037C57A5447295669A3DB1A28B8A in their signature information. The query returns 94 results, including both Windows PE executables like AdventureQuest.exe and other file types, such as Windows DLLs.
VirusTotal attempts to determine the type of each submitted file using third-party tools that search for magic numbers (byte sequences) and other types of signatures that identify specific file types, such as 0x4D 0x5A for Windows PE executables. These tools include the Unix utility file, Detect-it-Easy, and AI engines. The platform then inserts keywords indicating the file type in the type_tags attribute of the corresponding file object. Users can query VirusTotal for specific keywords stored in type_tags using the type search modifier.
The type_tags attribute (in the file object for AdventureQuest.exe)
The type_tags attribute (in the web analysis report for AdventureQuest.exe)
Building on the previous search, the query signature: "0E3E037C57A5447295669A3DB1A28B8A" AND type:"peexe" narrows the results to submitted Windows PE executables. The query returns 31 results, some of which, like AdventureQuest.exe, are implemented using the .NET framework, while others are not.
The file and Detect-it-Easy tools may provide VirusTotal with information about the environments in which submitted executables are built. The platform stores the output from these tools in the magic and detectiteasy attributes of the corresponding file objects. Users can query VirusTotal for specific content in these attributes using the magic and detectiteasy search modifiers.
The magic attribute (in the file object for AdventureQuest.exe)
The magic attribute (in the web analysis report for AdventureQuest.exe)
The detectiteasy attribute (in the file object for AdventureQuest.exe)
The detectiteasy attribute (in the web analysis report for AdventureQuest.exe)
Building on the previous search, the query signature: "0E3E037C57A5447295669A3DB1A28B8A" AND type:"peexe" AND magic:".NET" further narrows the results to submitted executables built using the .NET framework. The query returns 13 results.
Closer examination of the resulting files shows that most have PDB paths, such as Ivacy.pdb. However, AdventureQuest.exe does not have a PDB path, which is typical for malware, as malware authors often strip executables of debug information. This suggests that the files with PDB paths may not be associated with the investigated activity.
VirusTotal extracts information from the header of each submitted Windows PE executable and stores this information in the pe_info attribute of the corresponding file object. Users can query VirusTotal for specific content in this attribute using the metadata search modifier.
The pe_info attribute (in the file object for AdventureQuest.exe
Further, the resulting files with PDB paths had been residing at file paths in the Ivacy VPN installation directory, such as C:\Program Files (x86)\Ivacy\IvacyService.exe. For each submitted file, VirusTotal records the names under which the file has been submitted, which may be full file paths rather than just filenames. The platform stores this information in the names attribute of the corresponding file object. Users can query VirusTotal for specific content in this attribute using the name search modifier.
The names attribute (in the file object for AdventureQuest.exe)
The names attribute (in the web analysis report for AdventureQuest.exe)
Based on our insights and previous research on Operation ChattyGoblin, we know that the threat actors do not disguise their malware as Ivacy VPN components. This suggests that the files that had been located in the Ivacy VPN installation directory before submission to VirusTotal may be false positives. An analysis of some of these files using a .NET decompiler revealed that they are indeed legitimate Ivacy VPN components.
Building on the previous search, the query signature:"0E3E037C57A5447295669A3DB1A28B8A" AND tag:"peexe" AND magic:".NET" AND (NOT metadata:".pdb") AND (NOT name:"Program Files (x86)\ivacy") further narrows the results to submitted executables that do not have PDB paths and had not been located in the Ivacy VPN installation directory before submission. The query returns one result, AdventureQuest.exe
This suggests that VirusTotal does not host other malware loaders, which are signed with the same certificate as AdventureQuest.exe and are likely linked to the investigated threat cluster. However, the extensive number of VirusTotal search modifiers allows for the identification of such loaders based on characteristics beyond the used code signing certificate. For example, querying VirusTotal for a code segment specific to AdventureQuest.exe using the content modifier leads to further malware that is likely part of the same activity cluster. We leave this as an exercise for the reader.
Search queries and results

Clustering With Search Modifiers | Limitations

Certain aspects of how VirusTotal collects information on submitted artifacts, which users can query using search modifiers, may increase the likelihood of missing relevant findings in some search scenarios. This is particularly relevant given the third-party tools and functionalities that VirusTotal uses for collecting this information, such as sandboxes and detection engines. Each of these tools has specific limitations, which affect the quality and quantity of information VirusTotal collects and stores in its dataset. In this section, we highlight some of these limitations to help users understand how they impact querying VirusTotal with search modifiers.
As mentioned earlier, VirusTotal executes submitted executable files (executables and scripts) in sandboxes to capture behaviors and artifacts visible only during execution. Additionally, most of the sandboxes VirusTotal integrates can identify MITRE ATT&CK techniques exhibited during execution. This is accomplished through a set of rules that map observed behaviors to MITRE ATT&CK techniques. 
For each submitted file, the sandboxes generate a report documenting captured activities, which are accessible to VirusTotal users. To facilitate systematic searching of sandbox-generated data, VirusTotal stores this data in an object of type file_behaviour. This object has a relationship to the file object that is directly related to the submitted file. Users can query sandbox-generated data using a variety of search modifiers, such as behavior_created_processes (searches for a name of a created process), behavior_files (searches for a name or path of an opened, written, deleted, or dropped file), or attack_technique (searches for a MITRE ATT&CK technique ID).
Sandbox-generated data in a file_behaviour object
VirusTotal’s sandboxes may not always capture relevant behaviors of executable files, for example, due to execution conditions that must be met or techniques intentionally implemented by malware authors to evade sandbox analysis. This includes command-line parameters, library or platform dependencies, or external configuration or data files. In contrast to private submissions, VirusTotal automatically executes the vast volume of executable files continuously submitted to the platform's public corpus in sandboxes, without customizing their execution or execution environments. As a result, search queries with modifiers applied to sandbox-generated data, like behavior_files or attack_technique, may return incomplete results.
For example, the BlackCat ransomware requires operators to provide an execution password as a command-line parameter (referred to as an 'access token') for the malware to initiate encryption. For the BlackCat sample veros3.exe, the CAPE sandbox has not captured any file system activities, such as deleting, creating, or modifying files. When the correct access token is provided, this sample enumerates the files and folders on the filesystem of a compromised system and encrypts files as specified in embedded configuration data.
The CAPE sandbox report for veros3.exe
In addition to running executable files in sandboxes to capture their behaviors, VirusTotal is capable of extracting malware configurations from these files. To achieve this, the platform uses the Mandiant Backscatter malware analysis service, which implements configuration extraction modules to automatically extract configurations based on known implementation patterns. Users can search through extracted configuration data using the malware_config search modifier. However, the automated extraction might not work when the analyzed malware uses new or changed methods for storing its configuration data, which are not covered by the existing modules. As a result, search queries involving malware_config may return incomplete results.
It is also important to note that some information related to submitted artifacts may change over time. For example, the last_analysis_stats attribute of the file object stores the number of third-party detection engines that have labeled the corresponding submitted file as malicious. Users can use the positives search modifier to narrow searches based on whether this number is less than, greater than, or equal to a user-specified value. For example, positives:20+ narrows searches to files that have been labeled as malicious by more than 20 engines.
last_analysis_stats attribute
Setting positives to a relatively high number is a way to focus searches on files that are likely to be malicious. However, the returned results may not include malicious files for which an insufficient number of third-party detection engines have developed detections. The development of detections is fully in control of the engines' vendors and may depend on a variety of factors.
A common factor prompting vendors to develop a detection for a specific malware implementation is the public release of a threat research report listing files that implement the malware. For example, on September 21, 2023, SentinelLabs released a report on the Sandman APT group, identifying the UpdateCheck.dll file as malware used by this group. Prior to this date, on March 15, 22, and 29, 2023, the number of engines detecting the file as malware was 5, 6, and 7, respectively. Shortly after the release of the report, this number spiked to 17 and reached 53 by September 29, 2023.
Number of engines detecting the UpdateCheck.dll malware

Conclusions

Effectively using VirusTotal for threat research requires a good understanding of the platform’s wide range of querying capabilities, the scenarios in which these capabilities return informative results beneficial to investigations, and the factors that may impact the completeness or relevance of the data returned.
While the GUI provides an agile and user-friendly way to query VirusTotal, the API enables large-scale querying, offers expanded querying capabilities, and allows for retrieving more extensive information. Additionally, the AI engines that VirusTotal integrates can significantly speed up malware analysis efforts; however, their outputs should be considered as part of a broader analysis strategy as they may lack sufficient or useful information due to limitations in design or training data. Moreover, the extensive set of search modifiers provides flexible search capabilities, but the relevance and completeness of results can be impacted by false positives and false negatives.
SentinelLabs and VirusTotal are committed to sharing information and insights that help new users gain a solid understanding of the platform’s capabilities, enabling them to make full use of the available VirusTotal features and conduct thorough investigations.

Friday, August 16, 2024

VirusTotal += Huorong

We welcome Huorong anti-malware engine to VirusTotal. In the words of the company:

"Huorong is a Chinese information security company founded in 2011, which has been committed to the research and development of endpoint security products. Huorong anti-malware engine utilizes a builtin virtual sandbox to achieve generic scanning, generic unpacking and malicious behavior analysis capabilities. Huorong anti-malware engine supports scanning files of any format, and supports decomposition of a huge variety of compound archives, including but not limited to compressors, installers, self-extractors, OLE, MIME, PDF, RTF, AutoIT, ASAR, etc. Also, Huorong anti-malware engine is designed to be accurate, lightweight, cross-platform applicable."

Huorong has expressed its commitment to follow the recommendations of AMTSO and, in compliance with our policy, facilitates this Anti-Malware Certification Testing Report by SKD Labs.

Thursday, May 30, 2024

We Made It, Together: 20 Years of VirusTotal!

Hi Everyone,

We can hardly believe it, but VirusTotal is turning 20 on June 1st! As we sit down to write this, we’re filled with a mix of pride and gratitude. It's been an incredible journey, and we wouldn't be here without the amazing community that has supported us every step of the way.

When we started VirusTotal, our goal was simple: to help make the internet a safer place. We never imagined that two decades later, we'd be here celebrating this milestone with all of you. From the early days to now, it's always been about working together. Whether you're a user, a contributor, or a supporter, you've played a crucial role in our success.

Over the years, we've had the privilege of collaborating with some of the brightest minds in cybersecurity. We've received support and guidance from industry leaders who believed in our mission and helped us grow. To mark this special occasion, we reached out to a few of these key figures to share their thoughts and memories about VirusTotal. Their testimonials highlight the power of community and collaboration:

Adrian Hendrik

"VirusTotal has consistently tackled tough challenges in cybersecurity. By assisting them with detailed analyses and organizing the first-ever VirusTotal training in Japan, I've seen their impact firsthand. Celebrating their integration into Google's parent company was a milestone. As VirusTotal marks 20 years, it's clear they've become essential for detecting malware and supporting cyber threat intelligence. Their contributions are invaluable to security personnel. I hope the younger generation continues this vital work, ensuring VirusTotal thrives for another 20 years."

Adrian Hendrik (unixfreaxjp), Cyber Emergency Center of LACERT, Japan

Costin G. Raiu

"It’s difficult to think of a project that has had a greater impact on our industry than VirusTotal. I believe its success rests on three key pillars: providing easy access to top antivirus engines for users, enabling researchers to efficiently use YARA for pivoting, and the incredible dedication and passion of its team. On this 20th anniversary, happy birthday to VirusTotal and to everyone who has worked tirelessly to make this dream a reality! Cheers also to all who rely on VirusTotal daily for their work! Analizar, siempre!"

Costin G. Raiu, Independent security researcher

Florian Roth

“I've been using both VirusTotal and YARA since their early days. Over the past 12 years, I've written more than 18,000 YARA rules, greatly aided by the features and capabilities of VirusTotal. Today, I consider VirusTotal an indispensable tool for the cybersecurity community. We rely on it to track threat actors, connect the dots, uncover new undetected malware, quality test our detections, and discover related and still unnoticed threats. VirusTotal stands as one of the central pillars of the cybersecurity toolset, if not the most important one.”

Florian Roth, VP R&D at Nextron Systems

George Kurtz

“VirusTotal has become a vital asset for cybersecurity defenders globally, providing essential insights that accelerate detection and response. At CrowdStrike, we are proud to have been the first to integrate our NGAV technology with VirusTotal, reflecting our shared commitment to innovation and security. By harnessing collective intelligence, VirusTotal has significantly elevated cybersecurity standards, ensuring a safer digital environment for all. Congratulations on this remarkable milestone and thank you for your dedication to supporting the security community and protecting organizations worldwide.”

George Kurtz, President/CEO and co-founder of CrowdStrike

Heather Adkins

“For two decades, VirusTotal has maintained an unwavering commitment to partnering across the community, creating transparency around the tools that threat actors are using to undermine global safety. They have had a meaningful impact on countless individuals and organizations, uplifting security teams across the planet, in a challenging asymmetric threat landscape. Thank you for all that you've done for Google, and the world.”

Heather Adkins, VP/Fellow, Security Engineering at Google

Joe Pichlmayr

“When the first multi-scanner systems went online, we could not have imagined how quickly a simple way to get multiple scanner opinions would become a substantial building block for our daily malware analysis work. VirusTotal's amazing and comprehensive analyses have not only become an indispensable part of our analyzer work but have also become an essential building block for our threat intelligence services.”

Joe Pichlmayr, CEO at IKARUS

John Lambert

“Yara cut the gordian knot paralyzing information sharing. It gave defenders a way to share detection when they could not share samples. VirusTotal sped up global defense by providing a common hunting ground containing the world’s more important threats.”

John Lambert, Corporate Vice President and Security Fellow, Microsoft

Mark Kennedy

“Over the past 20 years, VirusTotal, or VT to most of us, has evolved from a simple multi-scanner to a key source of security intelligence. It is relied on by security companies as well as security professionals. Beyond that, VT has been a reliable partner from the very beginning. They have always been ready and willing to add features and APIs to make using their services and integrating it into both products and workflows easier. The vast wealth of data analytics and historical data on files and families, has permanently stitched VT into the fabric of security intelligence. I cannot wait to see what the next 20 years of innovation will produce. Congratulations on the first 20 years!”

Mark Kennedy, Distinguished Engineer Broadcom, AMTSO Chair

Mark Russinovich

“Microsoft believes security is a team sport and the integration of SysInternals with VirusTotal has made it easier to analyze malware and share those results to improve security for all. In addition, Microsoft Defender XDR uses VirusTotal reports as an accurate threat intelligence source, and VirusTotal uses detections from Microsoft Defender antivirus as a primary source of detection”

Mark Russinovich, Azure CTO and Technical Fellow, Microsoft

Mikko Hyppönen

"VirusTotal was a real gamechanger. In addition of building a technical platform, it also built a community. Thank You for your work!"

Mikko Hyppönen, Technology speaker and author. CRO at WithSecure

Parisa Tabriz

"Reflecting on VirusTotal's 20th anniversary, I still remember the launch of their URL scan service back in 2010 and early collaborations with Google Safe Browsing and Chrome. We all had an aligned mission to make the web a safer place for everyone. Twenty years in, lots of progress to be proud of protecting people around the world, and our work continues!"

Parisa Tabriz, VP/GM Chrome & Google Security Princess

Shane Huntley

“Since the earliest days of TAG in 2010, VirusTotal and the team have been a critical partner helping us to defend Google, Google users and the world. We all owe a huge debt to all this team has done and how they have provided so much to the community of those fighting against online threats.”

Shane Huntley, Sr Director Google Threat Intel and cofounder of TAG

One of the things we’re most proud of is how VirusTotal has always been a team effort. From our dedicated staff to our passionate users, everyone has contributed in their own way. It's this collective effort that has allowed us to innovate, evolve, and stay ahead of the ever-changing threat landscape.

What's Next?

We'd love to hear your stories! Share your favorite memories or how VirusTotal has impacted your work on Twitter/X, LinkedIn, and other social networks with the hashtag #VirusTotal20Years. We'll be collecting the best stories and sending some cool swag to the top contributors. Stay tuned for more exciting announcements, events, and blog posts about some behind-the-scenes stories from our early days and key milestones in our history throughout our anniversary year!

As we look to the future, we remain committed to our mission. There's still a lot of work to be done, and we know we can't do it alone. We're counting on your continued support, feedback, and collaboration to keep pushing the boundaries and making the digital world safer for everyone.

Thank you for being a part of our journey. Here's to many more years of working together to fight cyber threats and protect our digital lives.

Best regards,

The VirusTotal Founding Team


From left to right:

  • Julio Canto: Wrote the very first lines of code for VT and launched the first version, still in charge of adding all the new engines and tools we use.
  • Alejandro Bermúdez: The mastermind behind how our analyzer farm works. He keeps everything running smoothly to this day.
  • Francisco Santos: Started out designing our very first website, databases, and all those storage systems we rely on. Now he leads the backend analysis team.
  • Bernardo Quintero: Had the initial idea for VT (blame him if anything breaks!) and now focuses on using AI to make threat analysis even smarter.
  • Victor Manuel Alvarez: Gave the world YARA, helped design VT Intelligence and Hunting, and just recently announced YARA-X.
  • Emiliano Martínez: If you've used our VT API, that's Emiliano's work. He's also a co-designer of VT Intelligence and currently keeps everything running as our Product Manager.