Wednesday, 6 November 2019

Pipelining VT Intelligence searches and sandbox report lookups via APIv3 to automatically generate indicators of compromise

TL;DR: VirusTotal APIv3 includes an endpoint to retrieve all the dynamic analysis reports for a given file. This article showcases programmatic retrieval of sandbox behaviour reports in order to produce indicators of compromise that you can use to power up your network perimeter and endpoint defenses. Alongside this blog post we are also releasing a set of Python scripts that illustrate this use case.

We recently rolled out a new Windows dynamic analysis system called VirusTotal Jujubox. This new sandbox represents a major revamp of VirusTotal’s in-house behaviour analysis capabilities as well as a key addition to the multi-sandbox project, which already aggregates behaviour reports from more than 10 partners and covers the most popular operating systems.

Behaviour reports are often perceived as a mechanism to understand what an individual sample does when executed, a quick overview before diving into disassembly and debugging. However, when you have a massive dynamic analysis setup processing hundreds of thousands of files per day, the microscopic dissection capability is far from being the most attractive use case.

When you generate reports at scale and, more importantly, when you index them in Elasticsearch and expose them via API, the generated data can be used for advanced hunting, especially when it is combined with other static, binary and in-the-wild properties.

The basic workflow would be as follows:

  1. Periodically identify new malware variants pertaining to a family that you are tracking by making use of the VT Intelligence search API. Use commonalities among the family’s variants (for instance a section name, the compilation timestamp or a document’s author metadata property) to retrieve a stream of malware.
  2. Focus on recent matches since the previous execution (query: fs:2019-11-01+).
  3. For each match, retrieve the generated behaviour reports for the pertinent file. You can also focus specifically on network communications with the contacted_ips, contacted_domains and contacted_urls relationships.
  4. For each automatically extracted network observable, check popularity ranks in order to filter out noise and false positives (FPs).
  5. All the newly yielded network artefacts (CnCs) can then be fed into SIEMs or transformed into IDS rules to power up network perimeter defenses.
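The polling part of steps 1 and 2 can be sketched as follows. This is a minimal illustration, not the released scripts: the helper names `incremental_query` and `new_hashes` are hypothetical, and the one-day overlap is an assumption meant to tolerate late-indexed samples, with duplicates then dropped by hash.

```python
from datetime import date, timedelta
from typing import Iterable, List, Set


def incremental_query(base_query: str, last_run: date, overlap_days: int = 1) -> str:
    """Append an fs: (first seen) lower bound to a VT Intelligence query.

    Hypothetical helper: going back `overlap_days` before the previous run
    re-checks the boundary day so that late-indexed samples are not missed.
    """
    since = last_run - timedelta(days=overlap_days)
    return f"{base_query} fs:{since.isoformat()}+"


def new_hashes(matches: Iterable[str], already_seen: Set[str]) -> List[str]:
    """Keep only file hashes that a previous run has not processed yet."""
    return [sha256 for sha256 in matches if sha256 not in already_seen]


if __name__ == "__main__":
    query = incremental_query('type:apk behaviour_network:"tuk_tuk.php"', date(2019, 11, 2))
    print(query)  # type:apk behaviour_network:"tuk_tuk.php" fs:2019-11-01+
```

Persisting the last run date and the set of processed hashes between executions (in a file or database) is left out of the sketch.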

Let’s illustrate this with a particular example. Bankbot is an Android banking trojan that allows the attacker to perform:

  • SMS hijacking.
  • GPS tracking.
  • New permission requests.
  • Overlay attacks that mask legitimate bank apps with forms to intercept credentials, sometimes based on a remote set of HTML templates.

The trojan was released in an underground forum and the post included the source code for the client-side and server-side components, including the database setup to collect stolen information.

Initially, the trojan included a hardcoded list of target bank applications that it would overlay in order to intercept banking credentials:

Since the source code of the trojan was also published in the underground forum, other crooks soon modified it to accept a remote list of financial entities to attack. This makes target identification more complex: static analysis is no longer enough to identify the targeted banks and the associated, date-tied CnC infrastructure.

While identifying targeted financial institutions might be a more complex task, discovering new variants of the same family and automatically identifying new network infrastructure tied to it becomes easier. Why is this? A server-side remote target list leads to a common network infrastructure pattern that can be used to track the malware family.

This is an example of a Bankbot sample:

VT Enterprise allows similarity searches and other attribute searches to find additional variants of the same malware family. In this particular case, the Android package name under the details tab seems interesting; clicking on it will launch a VT Intelligence search for other Android APKs that share that very same package name:

The matches do indeed seem to belong to the same family:

When opening these samples and looking at their behaviour reports, certain commonalities are easily noticed:

Static/behaviour/code commonalities are very frequent since attackers usually reuse code across different campaigns. Sometimes the commonalities are a result of recompiling the same code to communicate with a different network infrastructure. Other times, commonalities are present because the attack binaries are generated with some kind of builder or kit for dummies. Similarly, CnC infrastructure often exhibits commonalities such as the same path structure or query parameters; this is the result of attackers reusing the same CnC panel through a server-side kit that they deploy without changing file names or path structure.

These patterns, in conjunction with VT’s massive dynamic analysis setup and indexing, make it easy to automatically discover new malicious network infrastructure and automatically generate indicators of compromise.

The behaviour reports for the identified cluster of samples show that the CnC panel uses the subpaths tuk_tuk.php or checkPanel.php.

Let’s use this common pattern to periodically check VirusTotal for new variants of this malware family, and by doing so, let’s identify new network infrastructure tied to this attack, live, as samples are uploaded to VirusTotal.

Using the APIv3 Intelligence search endpoint, it’s possible to search for any Android APK whose network recordings contain the substring tuk_tuk.php:
type:apk behaviour_network:"tuk_tuk.php"

Multiple properties, such as dynamic/static analysis and metadata, can be combined to make a more refined search:
type:apk behaviour_network:"tuk_tuk.php" behaviour:"del_sws" androguard:"android.permission.ACCESS_FINE_LOCATION"

The API can sort matches by first seen date in descending order, meaning that by executing this search periodically and focusing on the latest results, it’s possible to discover new malicious network infrastructure tied to this particular family.
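A standard-library sketch of the periodic request follows. The `order` value `first_submission_date-` (first seen, newest first) and the `limit`/`cursor` parameter names reflect our understanding of the APIv3 Intelligence search endpoint; treat them as assumptions and confirm them against the current API reference. `search_url` is a hypothetical helper.

```python
from typing import Optional
from urllib.parse import urlencode

API_BASE = "https://www.virustotal.com/api/v3"


def search_url(query: str, limit: int = 10, cursor: Optional[str] = None,
               order: str = "first_submission_date-") -> str:
    """Build an Intelligence search URL sorted by first seen, newest first."""
    params = {"query": query, "limit": limit, "order": order}
    if cursor:
        # Assumed pagination: pass the cursor returned with the previous page.
        params["cursor"] = cursor
    return f"{API_BASE}/intelligence/search?{urlencode(params)}"


print(search_url('type:apk behaviour_network:"tuk_tuk.php"'))
```

Because results come newest first, a poller can stop paging as soon as it reaches a sample it has already processed.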

At the time of writing, this search yielded the following results:





The Intelligence search API endpoint returns a list of file objects matching the search criteria. Each of these file objects can have one or more multi-sandbox reports. These behaviour reports can be retrieved through the pertinent relationship (behaviours) of each file:
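Fetching the behaviour reports for one match can be sketched with the standard library as below. The `/files/{id}/behaviours` path and the `x-apikey` header follow APIv3 conventions; we assume each returned item is a sandbox report object carrying an `attributes` dictionary. `behaviours_request` and `fetch_behaviours` are hypothetical helper names, and nothing is sent until `fetch_behaviours` is called.

```python
import json
import urllib.request
from typing import Any, Dict, List

API_BASE = "https://www.virustotal.com/api/v3"


def behaviours_request(file_id: str, apikey: str) -> urllib.request.Request:
    """Build (without sending) a GET request for a file's sandbox reports."""
    return urllib.request.Request(
        f"{API_BASE}/files/{file_id}/behaviours",
        headers={"x-apikey": apikey},
    )


def fetch_behaviours(file_id: str, apikey: str) -> List[Dict[str, Any]]:
    """Send the request and return the list of behaviour report objects."""
    with urllib.request.urlopen(behaviours_request(file_id, apikey)) as resp:
        payload = json.load(resp)
    return payload.get("data", [])
```

Pagination and error handling (quota limits, missing files) are intentionally left out of this sketch.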

Instead of asking for the whole report, it’s also possible to request only the network communication relationships (contacted_urls, contacted_ips, contacted_domains) along with the search results: behaviour_network:"tuk_tuk.php"&relationships=contacted_urls,contacted_domains,contacted_ips
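When the search is issued with `relationships=contacted_urls,contacted_domains,contacted_ips`, each file object in the response should carry those relationships inline. The sketch below assumes the usual APIv3 layout, where each relationship holds a `data` list of `{"type": ..., "id": ...}` descriptors. Note that for `contacted_urls` the descriptor `id` may be VT’s URL identifier rather than the URL string itself. `network_observables` is a hypothetical helper.

```python
from typing import Any, Dict, Set

NETWORK_RELATIONSHIPS = ("contacted_urls", "contacted_domains", "contacted_ips")


def network_observables(search_response: Dict[str, Any]) -> Dict[str, Set[str]]:
    """Collect relationship descriptor IDs across all matching files."""
    found: Dict[str, Set[str]] = {name: set() for name in NETWORK_RELATIONSHIPS}
    for file_obj in search_response.get("data", []):
        relationships = file_obj.get("relationships", {})
        for name in NETWORK_RELATIONSHIPS:
            # Each relationship is assumed to look like {"data": [descriptor, ...]}.
            for descriptor in relationships.get(name, {}).get("data", []):
                found[name].add(descriptor["id"])
    return found
```

Using sets deduplicates infrastructure shared by several variants, which is common for this kind of family.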

Once the pertinent network infrastructure is parsed, it’s possible to either rely on the objects returned by the network-related relationships (contacted_urls, contacted_ips, contacted_domains) or make a subsequent automated call to the domain / IP address / URL API endpoint in order to retrieve further details about the given network observable. The aim of this subsequent stage is to filter out potential false positives. For instance, the details returned for a domain lookup include several popularity rank lists that can be useful to filter out top domains.
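Filtering on popularity ranks could look like the following sketch. It assumes that a domain object’s attributes contain a `popularity_ranks` dictionary keyed by rank list name, each entry holding a numeric `rank`; the 100,000 cut-off is an arbitrary illustrative threshold, not a VirusTotal recommendation, and both helpers are hypothetical.

```python
from typing import Any, Dict, List


def is_popular(domain_attributes: Dict[str, Any], max_rank: int = 100_000) -> bool:
    """True if any popularity list places the domain within its top `max_rank`."""
    ranks = domain_attributes.get("popularity_ranks", {})
    return any(
        isinstance(entry.get("rank"), int) and entry["rank"] <= max_rank
        for entry in ranks.values()
    )


def drop_popular(domains: Dict[str, Dict[str, Any]], max_rank: int = 100_000) -> List[str]:
    """Return, sorted, the domains that no rank list places in its top `max_rank`."""
    return sorted(name for name, attrs in domains.items() if not is_popular(attrs, max_rank))
```

Domains with no rank entry at all are kept on purpose: a freshly registered CnC domain is unlikely to appear in any popularity list.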

You can easily test this workflow with a little script released along with this blog post. It makes use of our official APIv3 Python library and can serve as your starting point to build more complex pipelines:

python3 --apikey=<YOUR_API_KEY> --query='type:apk behaviour_network:"tuk_tuk.php"'

=== Results: ===

Note that this workflow is exclusively based on behavioural observations and works independently of the detection ratio of files. By pipelining VT Intelligence searches and sandbox report lookups, it is possible to generate indicators of compromise even if the related sample is undetected. The identified domains can be automatically checked against SIEM logs or transformed into IDS rules, serving as an additional layer in your onion-like security strategy.

This blog post focuses on combining VT Intelligence searches with behaviour lookups, but the same can be done with YARA rule matches. VT Hunting Livehunt matches can be programmatically retrieved using APIv3; for each match, the pertinent behaviour reports can be retrieved and CnC network infrastructure can be automatically extracted. Similarly, other properties that can be used as IoCs, such as mutexes, registry keys, embedded domains, file names, cmd parameters and the like, can be automatically yielded. The following two scripts showcase this other VT Hunting workflow:
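Extracting host-based IoCs from a single behaviour report might look like this. The attribute names (`mutexes_created`, `registry_keys_set`, `command_executions`, `files_written`, `dns_lookups`) are assumptions about the sandbox report schema and should be checked against the APIv3 documentation; missing fields simply yield empty lists. `report_iocs` is a hypothetical helper, not part of the released scripts.

```python
from typing import Any, Dict, List


def report_iocs(report: Dict[str, Any]) -> Dict[str, List[str]]:
    """Flatten selected behaviour report fields into IoC lists (assumed schema)."""
    attrs = report.get("attributes", {})
    return {
        "mutexes": list(attrs.get("mutexes_created", [])),
        # Registry entries are assumed to be {"key": ..., "value": ...} dicts.
        "registry_keys": [e["key"] for e in attrs.get("registry_keys_set", []) if "key" in e],
        "commands": list(attrs.get("command_executions", [])),
        "files_written": list(attrs.get("files_written", [])),
        # DNS lookups are assumed to be {"hostname": ..., ...} dicts.
        "hostnames": [e["hostname"] for e in attrs.get("dns_lookups", []) if "hostname" in e],
    }
```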

If you are more of a golang fan, feel free to check out our official VirusTotal golang library:

APIv3 was a major component of our 2019 roadmap. We will soon be officially releasing it and announcing a generous deprecation timeline for APIv2, stay tuned!

Thursday, 31 October 2019

VirusTotal += Bitdefender Theta

We welcome the Bitdefender Theta scanner to VirusTotal. This engine is 100% Machine Learning powered and reinforces the participation of Bitdefender that already had a multi-platform scanner in our service. In the words of the company:

“When it comes to pushing things forward in the fight against cyber-crime Bitdefender Theta checks all the boxes. This new technology stack makes use of deep neural networks to provide industry leading detection rates in the fight against ever changing cyber-attacks. Bitdefender Theta is 100% Machine Learning powered and built on top of Bitdefender's state of the art dynamic behavioral analysis and cloud services is used to identify and block threats without the need for daily signature updates.”

Bitdefender has expressed its commitment to follow the recommendations of AMTSO and, in compliance with our policy, facilitates this review by AV-Comparatives, an AMTSO-member tester.

Monday, 28 October 2019

Test your YARA rules against a collection of goodware before releasing them in production

The rising tide of malware threats has created an arms race in security tool accumulation, which has led to alarm fatigue caused by noisy alerts and false positives. The last thing you need is more false alarms coming from buggy or suboptimal YARA rules, be it the ones you use in VT Hunting or the ones that you feed into your own security defenses.

As you may already know, VT Enterprise incorporates a component that allows you to match your own YARA rules against all newly uploaded files (Livehunt) as well as back in time against our historical malware collection (Retrohunt).

A common challenge for YARA users is that of potential false positives. False positives can degrade a user’s Livehunt feed by producing incorrect results. Similarly, a buggy rule can waste your Retrohunt quota and, given that Retrohunt jobs are lengthy, your time. Since many security tools incorporate YARA these days, some users will be launching their rules against a fleet of machines that they manage, meaning that a buggy rule can be a big waste of resources.

In order to address this common pain point we are releasing a new Retrohunt feature: fast hunting over a goodware corpus. When you launch your Retrohunt jobs you can now select the corpus on which it should act:

The goodware corpus is a set of 1M files chosen from the NIST National Software Reference Library, totalling 147GB. Jobs launched against this collection usually finish in under a minute. As such, we expect that users may change the way they use VT Hunting: before writing a Livehunt YARA rule or launching a Retrohunt job, they will probably want to test it against this corpus and tweak the rule in order to prevent false positives and avoid unnecessary and lengthy Retrohunt iterations.
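Launching a goodware Retrohunt job programmatically could be sketched as below. The `/intelligence/retrohunt_jobs` path, the `retrohunt_job` object type and the `corpus` attribute with the value `"goodware"` are our assumptions about the VT Hunting API; verify them in the API reference before use. The rule and the `retrohunt_request` helper are hypothetical.

```python
import json
import urllib.request

API_BASE = "https://www.virustotal.com/api/v3"

# Hypothetical rule used only to illustrate the payload.
EXAMPLE_RULE = """
rule hypothetical_example
{
    strings:
        $a = "example string"
    condition:
        $a
}
"""


def retrohunt_request(rules: str, apikey: str, corpus: str = "goodware") -> urllib.request.Request:
    """Build (without sending) a POST that creates a Retrohunt job."""
    body = {"data": {"type": "retrohunt_job",
                     "attributes": {"rules": rules, "corpus": corpus}}}
    return urllib.request.Request(
        f"{API_BASE}/intelligence/retrohunt_jobs",
        data=json.dumps(body).encode("utf-8"),
        headers={"x-apikey": apikey, "Content-Type": "application/json"},
        method="POST",
    )
```

Passing the request to `urllib.request.urlopen` would submit the job; any match in the goodware corpus is a strong hint that the rule needs tightening before it goes into Livehunt.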

- Goodware Retrohunt jobs are correspondingly tagged -

In an effort to give back to the community behind VirusTotal and its premium services, we are making this feature entirely free. In other words, Retrohunt jobs against the goodware corpus do not consume Retrohunt quota.

This new feature builds upon some major improvements that have been recently released, such as the new API endpoints to programmatically interact with VT Hunting. Stay tuned: we will soon be announcing far bigger enhancements to Retrohunt. You can take a sneak peek in our 2019 roadmap (Lightning-fast retrohunt).

Thursday, 24 October 2019

Revamping in-house dynamic analysis with VirusTotal Jujubox Sandbox

VirusTotal Jujubox Sandbox in action:

This is a small datastudio dashboard set up to illustrate the kind of analytics that can be built with a massive dynamic analysis setup that generates IoCs. Note that it has several pages.

One of the main themes of VirusTotal’s 2019 roadmap is “Holistic Threat Profiling”. Some users never move beyond the basic use case for VT: checking hashes and looking at detections. However, that use case, while still core to VT, is by no means the most popular. VT also provides information on URLs, IPs and domains, and what’s more, it builds a graph that relates all of these observables. In an effort to allow users to identify the complete attack campaign, beyond the individual malware variants, we continue to introduce new tools and features. This new functionality allows users to characterize a threat from different points of view: static analysis, dynamic analysis, code analysis, relationship analysis, and more.

In our ongoing efforts to improve our behaviour analysis infrastructure we are happy to announce the rollout of a new Windows Sandbox that radically improves and complements our previous Windows XP SP1 analysis system, launched in 2012. The analyses generated by this new system are seamlessly showing up in new file reports, freely for the community. We are also complementing our threat feed offerings with a dynamic analysis feed derived from this new system. More on this later; let’s first focus on the community impact.

The project has been baptised as “Jujubox” (a reference to the type of bad karma - juju- objects it processes) and integrated in the context of the multi-sandbox project. This new sandbox is currently running Windows 7 and records the actions of Windows 32-bit and 64-bit binaries under 80MB when executed. It extracts information such as:

  • File I/O operations.
  • Registry interactions.
  • Network traffic: HTTP calls, DNS resolutions, TCP connections, DGAs, etc.
  • JA3 digests.
  • Dropped files (and the interrelations between them).
  • Mutex operations (Creation, Opening).
  • Runtime modules.
  • Highlighted text in windows, dialogs, etc.
  • Highlighted WinAPI calls/syscalls.

The information from the execution is indexed and searchable through VT Enterprise and fuels services such as VT Graph. Basically, any text found in these reports is indexed in an Elasticsearch database. Each analysis also contains a fully revamped, detailed HTML report with improved filtering capabilities, allowing analysts to grasp the details of sample execution: syscalls, process tree and screenshots.

In order to access the detailed HTML report containing all Windows API calls, just refer to the multi-sandbox action menu bar:

The detailed HTML report logs API calls and return values, meaning that it can greatly expand the observations contained in the summarized report view. You may refer to the following report in order to see an example of the full HTML report:

Let’s take a look at some specific use cases that can be solved with this new setup.


Pivoting and mapping threat campaigns

After the analysis we can gather information from the sample and use it to either find relationships with other elements or to pivot to other campaign artifacts. This is an example illustrating the sandbox analysis:

This new setup contributes to the relationships created between samples and domains, allowing us to appreciate the DGA used by this particular malicious sample. The same goes for its dropped files. The sandbox analysis acts as a microscope, allowing us to better understand an individual threat. For instance, we can also take a look at where this malicious sample usually stores itself for persistence by checking the copied files and registry keys set:

Using inline hover pivots it is easy to find other reports showcasing this very same behaviour:

To pivot even further and find other similar files, we can use one of the advanced search operators to focus on file activity:
behaviour_files:"C:\Program Files\AVG\AVG9\dfncfg.dat" and sandbox_name:jujubox
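If you pivot often, generating such queries from report values can be automated. The sketch below only does string assembly: the `modifier:"value" and sandbox_name:...` shape mirrors the query above, `pivot_queries` is a hypothetical helper, and values that themselves contain double quotes are rejected because we are not certain how the search syntax escapes them.

```python
from typing import Iterable, List


def pivot_queries(modifier: str, values: Iterable[str], sandbox: str = "jujubox") -> List[str]:
    """Turn behaviour values into VT Intelligence pivot queries."""
    queries = []
    for value in values:
        if '"' in value:
            raise ValueError(f"unsupported double quote in value: {value!r}")
        queries.append(f'{modifier}:"{value}" and sandbox_name:{sandbox}')
    return queries


print(pivot_queries("behaviour_files", [r"C:\Program Files\AVG\AVG9\dfncfg.dat"])[0])
# behaviour_files:"C:\Program Files\AVG\AVG9\dfncfg.dat" and sandbox_name:jujubox
```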

Once you have discovered several variants pertaining to the same threat actor, it might be a good time to build a YARA rule and feed it into VT Hunting in order to track the evolution of the given malware family and better understand the attackers behind it.


Finding similar samples by mutexes

Mutexes are often reused by many samples. Although most of them are common and legitimate, malware often chooses very characteristic names for its mutexes, making it easy to identify families and threat campaigns. This sample is a perfect example; it has a very specific mutex name:

By clicking on the mutex name we can find samples sharing the same behavior when it comes to mutex creation. Within VT Enterprise we can execute the query behavior:sfdkjjhgkdsfhgjksd to find such samples.


Pivoting on JA3

JA3 hashing is a way to fingerprint TLS client connections. In this particular report we can see a JA3 hash:

To pivot on this JA3 we click on the hash and generate the pertinent search query. This will use the behavior search modifier:

Another JA3 example is to search for samples that use a Tor client:


Programmatically interacting via API

All of the data described above is freely surfacing in APIv3, giving users a complementary characterization of their files beyond file reputation. A common use case is VT Enterprise users setting up YARA rules in VT Hunting in order to track malware variants or threat actors and then automatically retrieving file behavior reports for their notifications. These file behaviour reports are then data mined for patterns in terms of mutexes, contacted domains, file naming conventions, etc. in order to generate indicators of compromise that can be used to power up security defenses.
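The data-mining step can start as simply as counting how many samples share each value. In this sketch, `common_values` is a hypothetical helper; each inner list holds the values (mutexes, domains, file names...) extracted from one sample’s reports, and a value is counted once per sample so that a single noisy sample cannot inflate it. The `min_samples` threshold is an illustrative choice.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def common_values(per_sample: Iterable[Iterable[str]], min_samples: int = 2) -> List[Tuple[str, int]]:
    """Return (value, sample_count) pairs seen in at least `min_samples` samples."""
    counts: Counter = Counter()
    for values in per_sample:
        counts.update(set(values))  # de-duplicate within one sample
    shared = [(value, n) for value, n in counts.items() if n >= min_samples]
    # Most widespread first; ties broken alphabetically for stable output.
    return sorted(shared, key=lambda item: (-item[1], item[0]))
```

Values shared by most of a cluster are IoC candidates, but they should still go through a goodware or popularity check, since generic system mutexes and domains will also score high.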

The following datastudio showcases the kind of insights that can be derived from the aggregated study of behavioral observations. It clearly illustrates that, by focusing on volume, and beyond that on malware families and clusters, it is sometimes straightforward to identify patterns and commonalities in order to generate alternative detection mechanisms for threats. Note that this datastudio has several pages.


Sandbox feed

This important effort to improve our free community capabilities is also being leveraged to radically improve our premium services. As seen in the datastudio above, when operating at scale we can make use of clustering and data mining in order to generate patterns and commonalities that can be fed into security defenses as yet one more mechanism in our onion layered security model.

As such, we are creating a new offering that expands our portfolio of feeds (file and URL feed), allowing users to retrieve all the dynamic analysis reports generated for files uploaded to VirusTotal. The value proposition is simple:
  • Ingest every single sandbox dynamic analysis report generated for all files which are analyzed within VirusTotal sandbox. As of October 2019, we do our best to sandbox all PE EXE, MSI, Android, MacOS Mach-O/DMG/PKG files.
  • Datamine the feed and identify domains, IP addresses, URLs, mutexes, registry keys, etc. that may be used as indicators of compromise to power-up your security toolset.
  • Discover unknown malware flying under the radar of antivirus solutions by studying behavioral patterns.
  • Implement complex behavior detection rules.
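As a rough sketch of ingestion, assuming the sandbox feed mirrors the existing file feed (one bzip2-compressed batch per minute, with one JSON report per line), a consumer could look like this. The `/feeds/file-behaviours/{YYYYMMDDhhmm}` path is an assumption, since the feed is in Early Access Preview, and the helper names are hypothetical.

```python
import bz2
import json
from datetime import datetime
from typing import Any, Dict, Iterator


def feed_path(minute: datetime) -> str:
    """Assumed per-minute batch path for the sandbox (behaviour) feed."""
    return f"/api/v3/feeds/file-behaviours/{minute:%Y%m%d%H%M}"


def iter_feed_batch(compressed: bytes) -> Iterator[Dict[str, Any]]:
    """Decompress one batch and yield each JSON report it contains."""
    for line in bz2.decompress(compressed).splitlines():
        line = line.strip()
        if line:  # skip blank lines, e.g. a trailing newline
            yield json.loads(line)
```

Processing batches as a generator keeps memory usage flat regardless of how many reports a given minute contains.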

If you are interested in getting Early Access Preview to this service, feel free to reach out to us. In future blog posts we will dive deeper into how the sandbox feed can be leveraged to improve security defenses. Stay tuned.

Wednesday, 23 October 2019

VirusTotal multisandbox += VenusEye

VirusTotal’s multi-sandbox project welcomes VenusEye. The VenusEye sandbox is currently contributing reports on PE executables, documents and JavaScript files.

In their own words:

VenusEye Sandbox, as a core component product of VenusEye Threat Intelligence Center, is a cloud-based sandbox service focused on analyzing malwares and discovering potential vulnerabilities. The sandbox service takes multiple(~100) types of files as input, performs both static analysis and behavior analysis after then, and eventually generates a detailed human-readable report in several supported formats like PDF or HTML. Being weaponized with MITRE ATT&CK knowledge base, VenusEye Sandbox combines the product and the service as a whole. With the help of our sandbox service, users can track threat actors or gather threat intelligence for their hunting in a much easier way.

You can find VenusEye reports under the “Behavior” tab:

Take a look at a few example reports within VirusTotal:

Document with macros

Taking a look at the embedded content preview for the sample 8143a2c2666575152896609c1d8d918717a358d4611a57a0cce2559e3c5cabbf we see that the malware is attempting to trick users into enabling macros.

The VenusEye sandbox automatically enables macros and allows us to see the execution details, including the HTTP requests, DNS resolutions and process tree.

Javascript files

Wide use of online email services that automatically block executable attachments has led attackers to use alternative file formats for their spam campaigns. As depicted above, documents with macros are one example; JavaScript files have also become quite popular. VenusEye represents a very interesting addition to the multi-sandbox project in that, unlike some of the other integrated sandboxes, it also analyses JavaScript files.

In this particular example, the simple fact that a javascript file that can execute in Windows (as opposed to being a website resource) performs DNS resolutions should be enough to consider the file highly suspicious. More so if we take into account the registry keys with which it interacts:

Rich relationships

The two examples above illustrate VenusEye acting as a microscope to understand what an individual threat does. However, thanks to the network traffic recordings, VenusEye also contributes macroscopic patterns that can be easily understood using VT Graph.

For example, when looking at the javascript file above we can make use of the file action menu in order to open it in VT Graph:

By default a one-level-depth inspection is performed, but we can always dig deeper. By expanding the files communicating with the domain, we discover a Windows executable that seems to be using that domain as its command-and-control server:

In other words, VenusEye also helps in tracking entire campaigns thanks to the contributed file/domain/IP/URL relationships.

Advanced pivoting

As usual, all of this information is indexed in the Elasticsearch database powering VT Enterprise, which makes it trivial to pivot to other variants of a given malware family or to other tools built by the same attacker.

Let us now return to the document with macros above. VT Enterprise users can click on any of the behavior report contents in order to launch a VT Intelligence search for files exhibiting the same pattern when executed. Let us click on the first HTTP request entry:

This launches the search behaviour_network:"", finding other samples that communicate with that very same URL. Now that we have identified other variants belonging to the same campaign or threat actor, it is trivial to automatically generate commonalities that we can use as IoCs to power-up our security defenses:

Thank you VenusEye for joining the multi-sandbox family that aggregates more than 10 dynamic analysis partners and counting. If your organization has some kind of dynamic analysis setup, don’t hesitate to contact us to get it integrated in VirusTotal; we will be more than happy to grant you free VT Enterprise quota in exchange.

Wednesday, 17 July 2019

VirusTotal MultiSandbox += SNDBOX

Today, VirusTotal is happy to welcome SNDBOX to the Multi-sandbox project. SNDBOX is a cloud-based automated malware analysis platform whose advanced dynamic analysis capabilities give additional insights into, and visibility over, a variety of file types.

In their own words:
  • SNDBOX malware research platform developed by researchers for researchers and provides static, dynamic and network analysis. 
  • SNDBOX is the first malware research solution to leverage multiple AI detection vectors and undetectable kernel driver analysis. 
  • SNDBOX kernel agent is located between the user mode and kernel mode. The agent has the ability to detect all malicious activities going from the running application to its execution in the operating system.
  • SNDBOX technology delivers in-depth results, quickly while providing AI and big data insights necessary for comprehensive malware research and false positive rate reduction.

Highlighting some examples

Detecting a ZBOT variant, with high visibility into the “Process Hollowing” and “Process Injection” techniques used by the malware.

On the SNDBOX site you can see malicious network domains, and the dropped files found during the analysis can be sent for next-stage file analysis.

VirusTotal Enterprise users can click on the mutex to search for other samples with this same mutex.

This links to a search of behavior:"7EF531C0" which will lead you to other behaviour reports with the same mutex name.

Revealing malicious network domains, as well as enabling next stage file analysis of dropped files found in analysis.


On VirusTotal, take note of the DNS resolutions and dropped files. Dropped files are defined as the interesting files that are written to disk by the sample under analysis.

Pykspa variant, network activity detected with Suricata and dropped files being sent for second stage analysis & detection:

Within the “Registry Keys Set” section we find that the sample is set to RunOnce on next startup, possibly a method to achieve persistence. 

VT Enterprise customers can click on the registry value, which uses the “behavior_registry” search modifier, to search for other files that also use the same registry value: behavior_registry:"nrsyjl"

Bancteian variant data stealer caught and detected by SNDBOX's signatures:

Within the SNDBOX report check out the detections:

Thursday, 27 June 2019

VirusTotal, Chronicle and Google Cloud

It's been more than seven years since Google acquired VirusTotal, and more than one year since we moved to Chronicle. Today we have another update: Chronicle is joining Google Cloud. This update, like our move to Google a few years back, does not change the mission or focus of VirusTotal. We'll continue to operate independently, focused on our mission of helping keep you safe on the web.

Thursday, 6 June 2019

VirusTotal += Segasec URL scanner

We have added Segasec to the assortment of URL scanners on VirusTotal. You can find the results when scanning a URL at

In their own words:

Segasec is a Tel-Aviv based cyber-security startup providing end-to-end digital threat protection against consumer phishing attacks that originate in your blind spot - beyond the enterprise perimeter. Segasec’s patent-pending technology provides intelligence of upcoming attacks at the earliest possible preparation stages, running quadrillions of targeted scans that identify even unknown attack patterns. Segasec blocks compromised assets before they become a live risk, because once customer trust is broken, it’s already too late.

If you ask our customers what made them pick us over the competition, this is what they say -  End-to-end solution, in an entirely managed service. Early, proactive detection, both for brand and non-brand related threats. Fast and efficient block and take down, in under 3 hours.   Zero integration and fast onboarding .

If you would like to see a few example detections, check out these reports:

Tuesday, 14 May 2019

VirusTotal += SecureAge

We welcome the SecureAge APEX scanner to VirusTotal. In the words of the company:

“SecureAge APEX is an anti-malware scanning engine powered by artificial intelligence, designed to extend the detection capabilities of the SecureAge SecureAPlus endpoint protection platform (EPP). The APEX engine provides next-generation endpoint detection as part of the SecureAPlus layered approach to security which includes Application Control & Application Whitelisting, multi-cloud anti-virus, fileless attack protection and more. To deal with advanced threats like zero-day malware, the APEX engine goes beyond traditional scanners by reliably identifying unseen and mutated malware types and variants from day one of their release. The APEX engine that runs in VirusTotal targets Windows PE files; with integration into the VirusTotal ecosystem, SecureAge looks forward to further enhancing APEX's capabilities, and above that, adding value to VirusTotal's cybersecurity services.”

SecureAge has expressed its commitment to follow the recommendations of AMTSO and, in compliance with our policy, facilitates this review by AV-Comparatives, an AMTSO-member tester.

Wednesday, 8 May 2019

VirusTotal MultiSandbox += Yoroi: Yomi sandbox

We are excited to welcome Yomi: The Malware Hunter from Yoroi to the multi-sandbox project. This brings VirusTotal up to seven integrated sandboxes, in addition to VT’s own sandboxes for Windows, MacOS, and Android.

In their own words:
Yomi engine implements a multi-analysis approach able to exploit both static analysis and behavioral analysis, providing ad hoc analysis path for each kind of files. The static analysis section includes document and macro code extraction, imports, dependencies and trust chain analysis. The behavioral detection engine is weaponized to recognize suspicious actions the malware silently does, giving a powerful insight on command and control, exfiltration and lateral movement activities over the network, including encrypted channels. Each analysis is reported in an intuitive aggregated view to spot interesting patterns at a glance.

Some recent samples on VirusTotal with reports from Yoroi:

To see the full details click on the “Full report” within the behavior tab.

Interesting features

Executed commands
Within the Yomi Hunter report, additional information on executed commands can be seen. In this case, we see obfuscated PowerShell commands being run.

To search other behaviour reports for the string “zgohmskxd”, we can use the behavior_processes:zgohmskxd search query, which finds another sample with the same variable name. Check out the other search modifiers that can be used to find similar samples.


Within the Additional information tab, we can also find the mutexes used by the sample under analysis.

To search other sandbox behavior reports with the same string, we can search for behaviour:AversSucksForever


Mitre ATT&CK™ tab

On the MITRE ATT&CK™ tab you can see how the specific behaviour is tagged:


With the Emotet sample we can see the SMB and HTTP traffic. Next, you can click on the relationships tab to see other related IP addresses, domains, URLs and files.

You can visually see these relationships from within VirusTotal Graph: