Thursday, March 25, 2021

Leveraging adversarial data for security control validation

Nowadays defenders have at their disposal a large amount of data describing how attackers proceed in their malicious campaigns, including the TTPs (Tactics, Techniques and Procedures) and artefacts they use. Threat Intelligence is the discipline that, in simple words, tries to make sense of all of this - then it is up to us to make this knowledge actionable. How to use it most effectively depends on each organization, but several well-established methods are gaining traction in the industry and will give you immediate, valuable feedback about your defenses. Let’s explore them in more detail.

One way of leveraging this adversarial data is using it to check that our defenses are up to date to protect us against current real threats and campaigns. But before we go on, if you want to know more about this topic join us in our webinar with AttackIQ next March 31st 15:00 UTC.

In this blogpost we will discuss a few scenarios and examples where this data can be used for adversary-driven red-teaming. 

Checking the latest high-profile campaign

The idea behind this is pretty simple. Every time details about a new relevant campaign are made public (how to define relevance depends on each organization) we can simply take a look at the different artefacts available and see how they are being detected by our defenses. Let’s take as an example the Sunburst malware discovered last December. We first need some indicators to work with.

We could start by looking up a couple of hashes in VTGraph to see whether the community has already shared any graphs. Graphs are sometimes a more up-to-date and richer source of information than the original release of IOCs, which is typically more static. In this particular case we can find a few interesting graphs; we simply select one of them.

Now, what to select from this investigation? It depends on what we want to check, but we could start by taking a look at all the documents or executables used in the attack. Here we should export the IOCs (you can go to Selection>Select All>Files, then export this data). At this point we can choose, for instance, Hashes with Detection and then open in VTI:

In this view it is simple to filter by file type. Clicking on the Commonalities button shows all the different file types among the samples returned by the query, which we can copy or open directly in VTI. Either way, this selection of samples is ready to be used in our red-teaming exercise.
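As a rough illustration of the validation step, the sketch below takes file reports that were already downloaded as JSON and lists the samples a given engine does not flag. It assumes the API v3 file-object layout (`attributes.last_analysis_results`, with a `category` per engine). The engine name `ExampleAV` and the reports themselves are hypothetical.

```python
def missed_by_engine(reports, engine):
    """Return the SHA-256 of every report the given engine did not flag."""
    missed = []
    for report in reports:
        attrs = report.get("attributes", {})
        results = attrs.get("last_analysis_results", {})
        verdict = results.get(engine, {}).get("category")
        # Anything other than an explicit detection counts as a miss.
        if verdict not in ("malicious", "suspicious"):
            missed.append(attrs.get("sha256", report.get("id")))
    return missed


# Hypothetical, hand-made reports for illustration only.
sample_reports = [
    {"id": "aaa", "attributes": {"sha256": "aaa", "last_analysis_results": {
        "ExampleAV": {"category": "malicious"}}}},
    {"id": "bbb", "attributes": {"sha256": "bbb", "last_analysis_results": {
        "ExampleAV": {"category": "undetected"}}}},
]
print(missed_by_engine(sample_reports, "ExampleAV"))
```

The samples returned by such a check are natural candidates for the red-teaming exercise, since they are the ones our defenses currently miss.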

Minimizing our infection surface against ransomware

In this scenario our first step would be understanding how fresh ransomware campaigns are being spread. We can do this in different ways; a simplistic approach would be searching recently seen malware for verdicts that include “ransom”:

The good news is that we can use many different angles for finding suspicious files, including crowdsourced YARA rules. For instance, we could restrict the previous query to results already spotted by some crowdsourced YARA rule. This way we can identify which rules are interesting for further pivoting:

engines:*ryuk* fs:2021-01-01+ type:peexe have:crowdsourced_yara_rule

Once we find an interesting rule, we can use it to find additional artefacts, as in this example where we use one of the crowdsourced YARA rules to find new Ryuk samples.
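Searches like the one above can also be prepared programmatically. The following is a minimal sketch that only builds the request URL. It assumes the API v3 Intelligence search route and its `query` and `limit` parameters; the helper name `build_search_url` is ours.

```python
from urllib.parse import urlencode

# Assumed endpoint: the API v3 Intelligence search route (premium feature).
SEARCH_ENDPOINT = "https://www.virustotal.com/api/v3/intelligence/search"


def build_search_url(*terms, limit=20):
    """Join search modifiers with spaces and URL-encode them as a query string."""
    query = " ".join(terms)
    return SEARCH_ENDPOINT + "?" + urlencode({"query": query, "limit": limit})


url = build_search_url(
    "engines:*ryuk*", "fs:2021-01-01+", "type:peexe", "have:crowdsourced_yara_rule"
)
print(url)
```

Keeping each modifier as a separate argument makes it easy to add or drop conditions while exploring which rules are worth pivoting on.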

Keeping an eye on fresh suspicious attachments

This is a use case I strongly recommend any organization implement. Given that spear-phishing continues to be the most used infection vector, why not regularly monitor anything new coming this way? There are many different ways to do that; a generic approach could be something like this query to find fresh docx files that contain macros and are suspected of being malicious.

We can be a bit more specific by adding search modifiers, for instance to see which of the previous files have been distributed as attachments in spear-phishing attempts:

tag:attachment type:docx fs:2021-03-01+ p:5+ s:2+ tag:macros

Once we have this information in front of us it is relatively easy to spot some patterns. For instance, the visual aspect of this file seems pretty common in the list of suspected samples, so we can simply use visual similarity to find more artefacts.

The resulting samples not only have the same visual aspect; they also share a file name pattern, have similar file sizes and were created around the same time. Armed with this information, we want to make sure we detect this new campaign before it spreads further.
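Patterns like these can be summarized with a few lines of code once the sample metadata has been exported. The sketch below is only illustrative: the field names (`name`, `size`, `created`) and the sample values are hypothetical and do not come from real files.

```python
import os
from datetime import datetime


def summarize_cluster(samples):
    """Summarize traits shared by samples: name prefix, size range, creation span."""
    names = [s["name"] for s in samples]
    sizes = [s["size"] for s in samples]
    created = [datetime.fromisoformat(s["created"]) for s in samples]
    return {
        "name_prefix": os.path.commonprefix(names),
        "size_range": (min(sizes), max(sizes)),
        "created_span_days": (max(created) - min(created)).days,
    }


# Hypothetical metadata, not taken from real samples.
cluster = [
    {"name": "invoice_0412.docx", "size": 48210, "created": "2021-03-02T09:15:00"},
    {"name": "invoice_0977.docx", "size": 48544, "created": "2021-03-03T11:40:00"},
    {"name": "invoice_0135.docx", "size": 47980, "created": "2021-03-02T17:05:00"},
]
print(summarize_cluster(cluster))
```

A tight size range, a common name prefix and a short creation window together are a reasonable hint that the files belong to one campaign, although none of them is conclusive on its own.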

A must for our security strategy

Including adversarial data in our security strategy provides us with real world validation for our defenses. We can use this on a regular basis and shape it to our needs and weaknesses. Continuously monitoring particular adversaries, malware families and campaigns will help us understand how attackers evolve and how to shape our defenses. Not only that, crowdsourced intelligence allows us to stay one step ahead of adversaries by learning from other attacks and making sure our defenses are up to date before being hit by attackers.

We presented here just a few ideas we encourage everyone to explore with an open mind. For instance, in addition to the previous examples we could also replay PCAP files generated by malware in our infrastructure and check how effective our network monitoring and detection capabilities are.

Make sure to join us in our webinar to get additional pro tips!

Happy hunting

Thursday, February 18, 2021


When you go fighting malware don't forget your VT plugins

It's been a year since we launched our VirusTotal plugin for IDA Pro, followed by SentinelOne’s amazing contribution to the community with their VirusTotal plugin for GHIDRA (thanks again for the great job), inspired by the original IDA plugin but adding some cool extra features.

Now, what are IDA Pro and Ghidra? These tools are the most popular disassemblers used by the security community for malware analysis. Basically, they help researchers understand the functionality of the code used to build the malware.

Most of VirusTotal’s users simply use the web interface or the API to do their investigations or enrich their threat intelligence systems, so how and when do these plugins come in handy?

Before we go on, make sure to join us for our next webinar with SentinelOne next February 24th where we will demonstrate how to use both plugins with real life examples. Join us and register here!

Looking inside the malware

VirusTotal usually provides all we need to know about a malware sample and more, especially when it comes to context and the relationships with other samples or malicious infrastructure. However, sometimes as analysts we need to take a deeper look, and this is when IDA Pro and Ghidra come to the rescue.

What do VirusTotal's plugins for these disassemblers have to offer? Basically, they make analysts’ lives easier by providing several handy functionalities that leverage VirusTotal’s knowledge base. For instance, in one click we can search for samples that use a specific relevant piece of code that we found in the sample we are analyzing. Indeed, the plugins’ code similarity search offers new ways to find related samples that aren't easily reachable without going down into the reversing process.

We will usually want to find samples with a set of instructions similar to the one we are analyzing. Let's see an example. If we take a look at the WinMain functions of two different samples (as shown below) it is clear that they are practically identical, only differing in the value of some operands.

If we omit these differences, we can see that they have the same structure and share the same set of instructions.
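To make the idea of omitting operand differences concrete, here is a deliberately simplified sketch that replaces immediate values in disassembly text with a placeholder before comparing two listings. It is not how the plugins implement code similarity; the regular expression and the two listings are our own illustration.

```python
import re

# Matches hex (0x1F, 1Fh) and decimal immediates; a simplification that
# leaves registers and symbolic names such as sub_401000 untouched.
_IMMEDIATE = re.compile(r"\b(?:0x[0-9a-fA-F]+|[0-9][0-9a-fA-F]*h|\d+)\b")


def normalize(listing):
    """Replace immediate values with a placeholder so only structure remains."""
    return [_IMMEDIATE.sub("IMM", line.strip().lower()) for line in listing]


# Hypothetical listings that differ only in the values of some operands.
sample_a = ["push 0x40", "mov eax, 0x1000", "call sub_401000"]
sample_b = ["push 0x80", "mov eax, 0x2000", "call sub_401000"]

print(normalize(sample_a) == normalize(sample_b))
```

A real implementation would work on decoded instructions rather than text and would also have to deal with relocated addresses, but the principle is the same: compare the structure, not the constants.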

You never know what kind of valuable information you will find when analyzing a sample. It could be a very peculiar implementation, or a distinctive function that attackers implement in all their samples. It also could be that we are taking a look into earlier versions of recently deployed malware, giving us the opportunity to understand its evolution before attackers implement anti-reversing techniques.

Analyzing corrupted files

Code similarity provides additional advantages. Let’s consider the case where we have some corrupted samples of a recent malware strain. They can be just memory dumped files, or PE files that were modified during execution - either way, we cannot execute them. These kinds of files are not the best for creating YARA rules, because there is a chance that the content was modified before the memory image was dumped to disk. This is where the VirusTotal plugins shine, as we can search for code that we find interesting enough to lead us to related samples. We previously described this technique to hunt Ryuk samples starting from a corrupted one.

There are many other ways in which these plugins can assist you with code analysis. For instance, we can look for code similarity during a debugging session, the advantage being that we can search for decrypted or uncompressed samples uploaded to VirusTotal by just searching for some instructions obtained at runtime. We'll further explore this technique in our webinar with SentinelOne.

What’s next?

So what is the future of VirusTotal's plugin for IDA Pro? We are working hard on implementing an exciting new set of features focused on assisting you during the reversing process. For instance, we plan to collect contextual information from our database about the sample you are working on and show it in the IDA interface. We will also enrich the disassembled code to highlight the most significant information collected from VirusTotal.

We will show you more about what will be in the new version in our joint webinar next February 24th!

See you there and Happy hunting!

This post was co-authored by Vicente Diaz.

Monday, January 25, 2021

Building towards the richest and most interconnected malware ecosystem

Investigations into malicious activity usually start with a few small pieces of a puzzle whose size and complexity we don't know. Analysts will never have a full picture of the attack under investigation (only the attackers do), but that is probably not necessary either. What is needed is to retrieve the context necessary to achieve the goal of the investigation.

How to get this context? Every piece of the puzzle can be used to obtain new pieces. Then, we repeat the process until we don't find any more clues, or we are satisfied with the results. In this case, the pieces of the puzzle will be Indicators of Compromise (IOCs), usually hashes, domains and IPs.

So when starting the investigation with only a few pieces... how to find the rest in VirusTotal? It is a pretty massive database, so we have been working hard to find every single clue we could to relate different items for you to complete your puzzle. For instance, if we start with a few malware samples we want to find the infrastructure used in the attack as well as other related files used by the same attacker in the same campaign. Maybe we can even use similarity to find potentially related samples from the same actor.

We have good news for everyone! During the last months we have included additional meaningful relationships to create a rich ecosystem that interconnects samples, URLs, domains and IP addresses. Below we will review what kind of relationships you can find in VirusTotal. You can visualize all the relationships-related information under the “Relations” tab in VirusTotal for any sample and networking item.

Below you can find all the fresh new relationships specific for files:

  • Dropped files: Interesting files written to disk during sandbox execution. Extremely useful to find what dropper was used for any specific malware.

        For example: baad6807d751aa8b44bd464b3302a6ad4c200dc27b22b3845b0397cf366e3f4c

  • Overlay children: Files that are contained as overlay in another sample. Once again, finding information about the parent of some malware sample helps us understand the whole execution chain and properly reproduce the attack.

        For example: 12304478f1c50f9d10497bc8afea771bd1e3bd5bd3beaa0370090f727f3713a1

  • PCAP children: Files seen inside the communication traffic for an uploaded PCAP file. Another valuable source of information, as the communication between samples and Command and Control servers can shed light on the artifacts used by attackers once having a foothold in the victim.

        For example: 2804184381e9c1c51a213bdcd703ae0a9a16c6abc39b43cd44619365d5914934

  • PE Resource children: PE files contained in another file as a resource. Similar to the cases above where we want to find the parent of the malware, this time hiding in a different place.

        For example: 12305f7314b7b3c13657d7da48b73a2d10a2303cc23e76d6954ea909ac74e997

  • In the wild (ITW) IP addresses: We have seen this file being downloaded from these IP addresses. This is how we know how the malware was distributed. It could help to find the malicious infrastructure used by attackers, but also hacked sites used as watering holes for example. 

        For example: a3b2528b5e31ab1b82e68247a90ddce9a1237b2994ec739beb096f71d58e3d5b

  • Email attachments: Files that were distributed through email as attachments. Spear phishing is still the most popular method employed by attackers to distribute malware. This relationship helps confirm what artefacts were spread this way. 

        For example: 1230725a4b8cbfa70c19c9eaa925b945511374da1cce787ea2854c2a2303f1b6

You can use the have: modifier with the newly added relationships for your searches in the following format have:name_of_relationship. For instance, you can find Emotet samples distributed through email as an attachment using the following query:

        emotet have:email_attachments

You can find the full list of modifiers for your searches here.

For URLs we also have the following new relationships:

  • Communicating files: Given a URL, we can find all files presenting any sort of traffic to it. This helps us understand what files were distributed from some malicious infrastructure or compromised website. Additionally, understanding what legitimate files communicate with a given URL can also provide valuable insight, for instance for detecting suspicious supply chain activity.

        For example:

  • Referrer files: Any file that contains the given URL in its strings. Maybe we didn't see these files communicating directly with the given URL, but they could be the component that contains the configuration.

        For example:

In addition to all these relationships, we are also stepping up our passive DNS capabilities. As a result, you can now find the following records for domain resolutions in VirusTotal:

  • CAA records

  • CNAME records

  • MX records

  • NS records

  • SOA records

The example below shows in VirusTotal Graph all these DNS records for a given suspicious domain.

A final reminder: you can automate dealing with all this data to make your hunting experience even smoother using API v3. For instance, you can use the following query to retrieve MX records for the domain above:

curl --request GET --url '' --header 'x-apikey: your_api_key_here'
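The same kind of lookup can be sketched in Python. This is an illustration under stated assumptions rather than the exact request above: it assumes the API v3 domain object route (`/domains/{domain}`) and its `last_dns_records` attribute, and the response fragment used for the demonstration is hand-made.

```python
import json
import urllib.request

API_ROOT = "https://www.virustotal.com/api/v3"


def fetch_domain(domain, api_key):
    """Fetch a domain object from API v3 (needs network access and a valid key)."""
    request = urllib.request.Request(
        f"{API_ROOT}/domains/{domain}", headers={"x-apikey": api_key}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def records_of_type(domain_object, record_type):
    """Return the values of all DNS records of one type from a domain object."""
    records = domain_object["data"]["attributes"].get("last_dns_records", [])
    return [r["value"] for r in records if r.get("type") == record_type]


# Hypothetical response fragment for illustration only.
example = {"data": {"attributes": {"last_dns_records": [
    {"type": "MX", "value": "mail.example.com"},
    {"type": "NS", "value": "ns1.example.com"},
]}}}
print(records_of_type(example, "MX"))
```

In practice the output of `fetch_domain` would be passed to `records_of_type`; the two are kept separate so the filtering logic can be tried without an API key.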

Happy hunting!

Thursday, December 10, 2020


VirusTotal Multisandbox += Sangfor ZSand

The VirusTotal multisandbox project welcomes Sangfor ZSand. ZSand currently focuses on PE files, with extensions to other popular file types like JavaScript and Microsoft Office to be released soon.

In their own words:
ZSand, developed by Sangfor Technologies’ Cloud Computing & Security Team, is an agentless behavioral analysis engine incorporating multiple innovative techniques. At the systems level, zSand employs Two-Dimensional Paging (TDP) techniques to inject hidden breakpoints, enabling accurate monitoring of the API calling sequence of a given process for further fine-grained analysis. At the GUI level, interactions are simulated by the virtual network console (VNC) and visual artificial intelligence (AI) techniques, providing a lifelike and fully functional sandbox. At the detection level, zSand identifies all forms of malware, including vulnerability exploits, by uncovering malicious behaviors and synergistically applying both conventional rule-based approaches and advanced AI algorithms. As a core innovation of the Sangfor anti-malware research group, zSand is a significant improvement in cyber-security capability for both Sangfor Technologies and its clients, customers and partners. Use cases include proactive hunting for unknown threats and the near real-time production of threat intelligence identifying malicious URLs, domain names, files, memory fingerprints, and malicious behavioral patterns. zSand is an agentless behavior monitoring engine, allowing users to deploy real-time defenses in a virtual environment.

In comparison with other sandboxes, the key advantages of zSand include:
  • High runtime performance -- By optimising the configuration of TDP and reducing the number of VMExit events, zSand minimizes monitoring overhead and resource utilization.
  • Strong anti-evasion measures -- Thanks to high performance hardware virtualisation and agentless features, zSand is immune to anti-sandbox detection. 
  • Comprehensive monitoring -- zSand retrieves detailed malware behavioral events and associated states of hardware including CPU, memory, disks, and network interfaces. 
  • Extensive and in-depth analysis -- Designed by cyber-security specialists and AI specialists, zSand is able to dynamically detect elusive and concealed malicious behavior, vulnerability exploits, malware persistence, and privilege escalation, at low levels.

Take a look at the behavior tab to view these new sandbox reports:

Example reports:

You can also take a look at a couple of Sangfor ZSand behavior analysis reports here and here.
In case you are interested in searching for specific Sangfor ZSand reports, VirusTotal premium services customers may specify so using sandbox_name:sangfor in their queries.

Pivot on interesting behavioural characteristics

All malware uploaded to VirusTotal is detonated in multiple sandboxes, providing security analysts with many interesting and powerful possibilities. Having multiple fine-tuned sandboxes increases the chances of malware detonating properly (remember that malware usually implements different anti-sandboxing techniques), and provides valuable dynamic data on how the malware behaves.

Why is this data valuable? Because it gives us details that are not visible at static analysis time. For instance, we can use this data to turn some TTPs into something more actionable. We will get back to this topic in a future blog post.

For example, in the following sandbox report we find some potentially interesting mutex names.

We can use this data to pivot and find other malware that uses the same mutexes when detonated in our sandboxes. By clicking on one of the interesting mutexes, in this case ENGEL_12, we create a new search (behaviour:ENGEL_12) which returns samples belonging to a common family of Padodor malware.

It turns out that this is a valuable dynamic indicator we can use to identify malware samples belonging to this particular malware strain. From VirusTotal, we welcome this new addition to our sandboxing arsenal. Happy hunting!
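The same pivot can be prepared in code. The sketch below turns the mutexes of a sandbox summary into `behaviour:` searches. It assumes the API v3 behaviour summary layout with `mutexes_created` and `mutexes_opened` lists. The summary shown is hand-made, and only the ENGEL_12 name comes from the example above.

```python
def mutex_pivots(behaviour_summary):
    """Build one behaviour: search per distinct mutex seen in a sandbox summary."""
    data = behaviour_summary.get("data", {})
    mutexes = set(data.get("mutexes_created", [])) | set(data.get("mutexes_opened", []))
    return [f"behaviour:{name}" for name in sorted(mutexes)]


# Hand-made summary; ExampleMutex is a hypothetical name.
summary = {"data": {
    "mutexes_created": ["ENGEL_12"],
    "mutexes_opened": ["ExampleMutex", "ENGEL_12"],
}}
print(mutex_pivots(summary))
```

Mutex names that contain spaces or special characters would need quoting before being used in a search, which this sketch does not handle.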

Tuesday, December 01, 2020

VirusTotal += BitDefender Falx

We welcome the BitDefender Falx scanner to VirusTotal. This engine specializes in Android and reinforces the participation of Bitdefender, which already had two engines in our service: their multi-platform scanner (BitDefender) and a 100% machine learning engine (BitDefenderTheta). In the words of the company:

“Bitdefender offers a cloud-based malware detection product for Android. It is built on several automated systems that perform different methods of static and dynamic analysis. Powerful machine learning models and other complex threat detection techniques form a state of the art security solution capable of detecting previously-unseen advanced malware. The cloud-based approach offloads computationally intensive tasks to a distributed cloud environment to deliver the best protection with no impact on system or battery performance.”

Bitdefender has expressed its commitment to follow the recommendations of AMTSO and, in compliance with our policy, facilitates this review by AV-Comparatives, an AMTSO-member tester.

Thursday, November 26, 2020


Using similarity to expand context and map out threat campaigns

TL;DR: VirusTotal allows you to search for similar files according to different orthogonal notions (structure, visual layout, icons, execution behaviour, etc.). File similarity can be combined with the “have:” search modifier in order to gain more context about threats, e.g. what are the emails or URLs that distribute them.

This is the second blog post in our similarity series; the first article focused on how to trigger file similarity searches and the different similarity vectors at your disposal. In the context of this series we have also done a webinar that can be viewed on-demand. It focuses on using similarity to automatically produce optimal YARA rules to detect a given malware framework/family/campaign via VTDIFF.

This situation might sound familiar. As a SOC analyst or Incident Responder you are often confronted with files you know nothing about. Your SIEM describes their internal sightings and actions but fails to transmit the bigger picture. You are constrained by the narrow visibility of your corporate logs. Context is king and the problem is that you are fighting threat actors that operate globally with just a piece of the puzzle, your local data.

What is this file? Who is behind it? What is their modus operandi? How did it get there? Are there other related components? What does it do? Are there other variants that could have impacted my organization in the past? Any that could impact us in the future? How do I contain it? Your SIEM, case management system, EDR, firewall, IDS etc. don’t answer these questions. You are missing a necessary layer in your defense-in-depth security strategy.

VirusTotal is your saving grace. You jump into VT ENTERPRISE and look up the hash: threat reputation is useful, but you need further context. Your task is to identify IoCs that can be used for remediation, e.g. by blocking a command-and-control domain in the network perimeter, as well as artefacts that can be used for proactive threat hunting purposes, to determine whether there has been a breach and what is its scope. The issue is that sometimes VirusTotal does not have full context for a specific individual file in terms of sandbox reports, in-the-wild sightings, relationships, etc. and so your investigation might end here.

How to do it better

Isolated hashes are of limited value. Many times they are unique per victim or campaign, so a better idea would be finding the cluster/family/campaign they belong to in order to unearth remediation IoCs and threat hunting patterns. Most importantly, you need to leverage those groupings in order to surface command-and-control domains, dropzones, distribution URLs, phishing emails, etc. that can be used for mitigation and containment, and to build proper understanding and situational awareness.

Similarity and the “have” search modifier to the rescue. Let’s imagine the initial hash that popped up as an alert in our environment was a first stage EMOTET dropper, i.e. a document that delivers a malicious payload through macros.

Threat reputation allows you to perform an immediate first assessment (alert triage), but other than that there is little context in terms of remediation IoCs and hunting artifacts. We still know nothing about how this file gets distributed, i.e. its delivery vector. Similarly, we have no idea whether this is something spear phished exclusively against our organization or part of a larger campaign. What about the threat network infrastructure? Does it download additional payloads? Does it communicate with a command-and-control?

The next step in an incident response engagement - and this is what most analysts fail to do - is to jump into the file’s cluster (its family/framework/campaign) in order to expand context and surface IoCs. This is just one click away:

For documents there is a limited number of approaches to find similar files (other file formats will expose more). That said, they are very rich because they are fully orthogonal: structural features, visual layout, locality-sensitive fuzzy hashing, execution behaviour similarity. Let’s jump to other similar files based on the document’s visual layout by clicking on “Similar by icon/thumbnail” or on the thumbnail itself, located in the top right: main_icon_dhash:23232b2b00010000.

There are too many matches; we would have to iterate over every single one in order to surface particular patterns that may allow us to understand the campaign.

Finding phishing emails that distribute the threat

We can narrow down the search above to match exclusively those files that have been seen as an attachment in some email uploaded to VirusTotal:

main_icon_dhash:23232b2b00010000 AND have:email_parents
(Note that you can also use tag:attachment instead of have:email_parents)

We can now run through the matching files, open up their Relations tab and jump into the pertinent email parent, so as to understand the deception techniques being used in the campaign:

This particular instance poses as some kind of World Health Organization report on COVID. It is important to inspect all the other emails because not only will they tell us more about the lures, they will also allow us to identify targeted industries, geographical spread, activity time spans, etc. For instance, there could be other localized variants targeting some other corporate branches. Access to these emails will not only give us greater insight into the attacker, it is also something we can leverage tactically in order to improve filtering in our email gateways.

Discovering URLs that distribute this threat

We want to see if this campaign is also being distributed via download URLs. If that's the case we can block them in our network perimeter or use them to search across web proxy logs. Let’s ask VirusTotal whether any of the files in the cluster have associated in-the-wild URLs:
main_icon_dhash:23232b2b00010000 AND have:itw

We can now jump into the Relations tab in order to export these additional IoCs:

There are over 3K files with in-the-wild URLs. Note that we can automate all of this via the API.
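As a hint of what that automation could look like, the sketch below gathers the distinct URLs from a set of relationship responses that have already been downloaded. It assumes each response follows the usual API v3 collection layout, with a `data` list whose items carry the URL under `attributes.url`. The example pages are hand-made.

```python
def collect_itw_urls(relationship_pages):
    """Gather distinct URLs from relationship responses, keeping first-seen order."""
    seen = {}
    for page in relationship_pages:
        for item in page.get("data", []):
            url = item.get("attributes", {}).get("url")
            if url:
                # A dict keeps insertion order and drops duplicates.
                seen.setdefault(url, None)
    return list(seen)


# Hypothetical responses for two files of the same cluster.
pages = [
    {"data": [{"type": "url", "attributes": {"url": "http://example.com/a.doc"}}]},
    {"data": [
        {"type": "url", "attributes": {"url": "http://example.com/a.doc"}},
        {"type": "url", "attributes": {"url": "http://example.org/b.doc"}},
        {"type": "url", "id": "no-attributes-available"},
    ]},
]
print(collect_itw_urls(pages))
```

The resulting list can be exported to a blocklist or used to search web proxy logs.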

Identifying command-and-control/exfiltration infrastructure

The next step is to understand whether any of the machines in our corporate fleet are beaconing out to infrastructure tied to this campaign. At the same time, we will probably want to block the CnC and exfiltration points in order to mitigate the impact of historical undetected breaches. Let’s filter down the search to focus exclusively on those files that exhibited network communications when executed in a dynamic analysis sandbox:

main_icon_dhash:23232b2b00010000 AND have:behaviour_network

Most of the matching files have been analysed by several sandboxes participating in our multi-sandbox effort. This gives us unparalleled visibility into the campaign. For an attacker it is easy to evade a single sandbox; it is far more complex to do so for 17+ of them at the same time, each one set up in a different geographical region, going out to the internet through a different IP address, running a different OS version, with different software and language packages installed, etc. As a result, we now have very interesting sightings in terms of infrastructure:

These communication points can be very easily triaged. Remember that VirusTotal also characterizes domains, IP addresses and URLs. Threat reputation for these domains further confirms that they are accurate IoCs:

The domain relationships (in-the-wild sightings) tell the same story:

We now have additional IoCs that we can feed into our stack in order to proactively defend our organization from other variants. As a bonus point, pivoting to other campaign files that have sandbox behaviour reports allows us to shed more light into other TTPs that we might be tracking via MITRE ATT&CK (e.g. installation, actions on objectives, etc.).

Gaining context through the community

Building further on the “have” search modifier, we can also leverage it to find files on which some VT Community user has placed a comment providing more context:

main_icon_dhash:23232b2b00010000 AND have:comments

Community comments often give us interesting details in terms of in-the-wild observations, malware capabilities, reverse engineering reports, attribution, etc. For example, in this particular case we learn about additional distribution URLs:

This other case helps us understand that this first stage is EMOTET and allows us to jump into a pastebin dump with further context about the campaign in terms of related hashes and network infrastructure:

Additional context

The “have” modifier accepts many other values. Some of the more representative ones are:

  • compressed_parents: the files were seen inside a compressed file uploaded to VirusTotal.
  • pcap_parents: the files were seen in a network traffic recording uploaded to VirusTotal.
  • embedded_(urls/domains/ips): a URL/domain/IP address pattern was extracted from the binary bodies of the files.
  • behaviour: the files managed to execute in at least one sandbox and produced the pertinent dynamic analysis report.
  • behaviour_registry: the files executed in a sandbox and interacted with the Windows Registry.
  • crowdsourced_yara_rule: the files match some YARA rule coming from open source community repositories. These rules often provide additional references and descriptions about a threat.

Summing up

VirusTotal aggregates orthogonal means to cluster together groups of related files, files which may belong to the same malware family/framework/campaign/actor. These file similarity vectors range from structural features to dynamic analysis observations.

We started off with a single IoC for which neither we nor VirusTotal had much context beyond basic threat reputation. By leveraging file similarity we managed to find thousands of other files related to the campaign/malware framework. Through the “have” search modifier we then narrowed down our searches to identify phishing emails used by the attackers, distribution URLs, additional network infrastructure such as CnCs and context shared by other threat researchers.

All of this is tactical intelligence that can be fed into network perimeter defenses, but also context that can be operationalized and digested into TTPs in order to characterize threat actors. Finally, this blog post presented an incident response scenario but the very same logic can be applied to threat actor tracking or campaign monitoring use cases.

This post was authored by Emiliano Martinez.

Thursday, November 19, 2020

Why is similarity so relevant when investigating attacks

The concept of similarity is pretty straightforward: are two files similar? There are many ways to figure it out. That's why different similarity algorithms exist. Now, why is this useful? 

Attackers need tools for their attacks, basically malware. Malware in the end is a piece of software, built from frameworks, code and libraries, and takes some time and expertise to create. The result is that two different malware files built from the same developer using the same pieces will look alike.

Imagine you are investigating some attack and you find some suspicious file. After taking a look in VirusTotal, you find nothing really meaningful about the file itself. One idea at this point would be finding similar files: maybe the attacker used similar malware in other campaigns than the one under investigation, and maybe these files will tell more about the infection chain and infrastructure. Here is where similarity comes handy!

Additionally, the same approach can be applied to attribution. We find some malware that looks new and has no references about it. Can we find similar malware? Maybe the new artefacts will tell us more about the author, and maybe they are already well known to the security industry. This is how attribution is built in many cases.

There are many situations where similarity becomes useful. We can always reduce the problem to the following: IOCs can easily be replaced, malware frameworks cannot.

If you want to know more about how to use similarity in real cases, join us next November 25th for our “Similarity brings your threat hunting to the next level” webinar with TrendMicro and Trinity Cyber. Register here.

In this blogpost we will discuss some interesting ideas of what can be done with similarity in VirusTotal.

File similarity in VT

Suppose you come across the sample c9b96d5d694e4e25e03d97c7b95eff637525e539b9c47c8eda498f72ecd51b22 within your network and you want to find some context. Crowdsourced Sigma rules already warn that something fishy might be going on.

At this point we want to get a better understanding of the whole picture, which means getting more artifacts. When we run out of indicators, similarity to the rescue!

How to find similar samples? Right from the Details panel in the sample report there are several hashes that correspond to the output of different similarity algorithms: vhash, authentihash, imphash, rich PE header hash, ssdeep and TLSH:

It is important to understand that different similarity algorithms provide different results. Choosing the right similarity algorithm often depends on the samples we are working with, which is why it is sometimes easier to check them all at once and compare the results.
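To make the difference between an exact hash and a similarity measure concrete, here is a minimal sketch. It does not implement vhash, ssdeep or TLSH; it swaps in a simple Jaccard index over byte n-grams purely for illustration, and the byte strings are made-up stand-ins for two builds of the same tool and for unrelated data.

```python
import hashlib


def ngrams(data: bytes, n: int = 4) -> set:
    """Return the set of overlapping n-byte windows found in data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def jaccard_similarity(a: bytes, b: bytes, n: int = 4) -> float:
    """Jaccard index of the byte n-gram sets of a and b (0.0 to 1.0)."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0  # two inputs too short to have n-grams count as identical
    return len(grams_a & grams_b) / len(grams_a | grams_b)


# Hypothetical "builds": same code, only the embedded address differs.
build_one = b"connect(); beacon(); exfiltrate(); c2=10.0.0.1" * 20
build_two = b"connect(); beacon(); exfiltrate(); c2=10.0.0.2" * 20
unrelated = bytes(range(256)) * 4

# A cryptographic hash only answers "are these byte-for-byte identical?"
same_sha256 = (
    hashlib.sha256(build_one).hexdigest() == hashlib.sha256(build_two).hexdigest()
)

# A similarity score answers "how much do these have in common?"
related_score = jaccard_similarity(build_one, build_two)
unrelated_score = jaccard_similarity(build_one, unrelated)

print("SHA-256 match:", same_sha256)
print("related builds score:", round(related_score, 2))
print("unrelated data score:", round(unrelated_score, 2))
```

Changing the n-gram size or the comparison function changes which files count as "similar", which is the same reason the different hashes in the report return different clusters.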

Clicking on any of the hashes shown in the report will return all similar samples. In this case, vhash returns 57 additional files, imphash finds no other hits and rich PE header hash returns around 1.16 million other files in VT (we can spot potentially non-malicious files by adding the search operator positives:0).
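The same pivots can also be written down as search queries. The sketch below only builds query strings and sends nothing to VirusTotal. `rich_pe_header_hash` and `positives:0` are the modifiers used in this post; treating `vhash` and `imphash` as modifiers with the same `name:value` shape is an assumption to verify against the search documentation, and the helper name `build_similarity_queries` is hypothetical.

```python
from typing import Dict, List

# rich_pe_header_hash appears in this post; vhash and imphash are assumed
# to follow the same "name:value" pattern.
SIMILARITY_MODIFIERS = ("vhash", "imphash", "rich_pe_header_hash")


def build_similarity_queries(
    hashes: Dict[str, str], undetected_only: bool = False
) -> List[str]:
    """Build one search query per similarity hash present in `hashes`."""
    queries = []
    for modifier in SIMILARITY_MODIFIERS:
        value = hashes.get(modifier)
        if not value:
            continue  # this hash was not available for the sample
        query = f"{modifier}:{value}"
        if undetected_only:
            # Restrict results to files with no detections.
            query += " positives:0"
        queries.append(query)
    return queries


# Only the rich PE header hash value is taken from this post; the other
# hashes are left out rather than invented.
sample_hashes = {"rich_pe_header_hash": "640b9fb49577f39427b39125155c2425"}

for query in build_similarity_queries(sample_hashes, undetected_only=True):
    print(query)
```

Keeping one query per algorithm, rather than one combined query, mirrors the advice above: run them all and compare what each returns.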

All of the above might sound too technical, which is why we can sometimes approach the similarity problem from a different angle. For instance, we implement visual similarity. This is especially useful for suspicious documents distributed by attackers, but it also works for executables sharing similar icons. In this case, visual similarity returns 3,390 new files by clicking on the icon above.

We do our best to detonate in a sandbox every file we receive in VirusTotal. Would it be possible to find files with similar behaviour? It is! Even better, we integrate multiple sandboxes, which gives us different options. We can run this similarity search either by selecting it in the multiple similarity button or in the Behavior tab. Following the example, JujuBox behaviour similarity returns 11 additional files. This is an interesting feature when we want to make TTPs actionable, but we will get back to all these topics in a future post.

We have used clustering hashes (both static/structural and behavioural), but are there concrete features that we could pivot on? We can look at the Capabilities and Indicators. Specifically, let's try to find some pivotable features (or clues) among the million files caught by rich PE header hash using the VT Enterprise query [rich_pe_header_hash:640b9fb49577f39427b39125155c2425 have:clue_rule]. One of the results, 15e5353c8d5d1b1dba8d9c99e77075d737771335eac9597eba95d1f3efc3b6cd, shows interesting dropped files, registry keys set and DNS resolutions in the Details panel. We can click on any of these indicators to find their respective clusters.
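When several reports are at hand, one way to choose which indicator to pivot on is to count how many samples share it. The sketch below assumes each report has already been reduced to a plain dictionary; the field names (`dropped_files`, `registry_keys_set`, `dns_resolutions`) and every value are hypothetical placeholders, not the VirusTotal report schema.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical field names for the indicator types mentioned above.
INDICATOR_FIELDS = ("dropped_files", "registry_keys_set", "dns_resolutions")

Indicator = Tuple[str, str]  # (field name, value)


def shared_indicators(
    reports: Iterable[Dict[str, List[str]]], min_samples: int = 2
) -> List[Tuple[Indicator, int]]:
    """Return indicators seen in at least `min_samples` reports, most common first."""
    counts: Counter = Counter()
    for report in reports:
        seen = set()  # count each indicator once per report
        for field in INDICATOR_FIELDS:
            for value in report.get(field, []):
                seen.add((field, value))
        counts.update(seen)
    return [(ind, n) for ind, n in counts.most_common() if n >= min_samples]


# Made-up reports using reserved example names, for illustration only.
reports = [
    {"dropped_files": ["aaa"], "dns_resolutions": ["c2.example.invalid"]},
    {
        "dropped_files": ["bbb"],
        "dns_resolutions": ["c2.example.invalid"],
        "registry_keys_set": ["HKCU\\Software\\Example"],
    },
    {
        "dns_resolutions": ["other.example.invalid"],
        "registry_keys_set": ["HKCU\\Software\\Example"],
    },
]

pivots = shared_indicators(reports)
for (field, value), sample_count in pivots:
    print(f"{field}: {value} (seen in {sample_count} samples)")
```

Indicators that recur across samples are the ones most likely to lead to a cluster worth exploring, while values seen only once add little as pivots.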

We can even drop all of this into a VT-Graph to see the whole picture and the different clusters in a single panel, including the rest of the attack like dropped files, contacted URLs, etc.


To sum up, once we understand the value of using similarity for our threat hunting, it is very important to have all the options available depending on our needs. Different investigations, or different malware families, need different approaches. Behavioural similarity for instance can be very interesting when the samples are different but the TTPs are common.

But we cannot apply similarity without any data to compare against. In VirusTotal we have 2.5 billion files to make sure you get the most from your Threat Intel investigations.

Happy hunting!

This post was co-authored by Marta Gomez and Jose Martin.