Monday, November 28, 2022

Threat Hunting with VirusTotal

We recently conducted our first “Threat Hunting with VirusTotal” open training session, providing some ideas on how to use VT Intelligence to hunt for in-the-wild examples of modern malware and infamous APT campaigns. In case you missed it, here you can find the video recording available on Brighttalk and Youtube.  We also created a PDF version of the slides with all the queries covered during the session and direct links to the documentation.

We received lots of questions during the session that we decided to answer in this Q&A blog post.

1.  How can we search for “have:itw” with a specific URL?
“have:itw” is a search modifier you can include in your VT Intelligence queries to get all samples we found being distributed in the wild. You can specify any particular domain in your query, for instance the following example finds samples being distributed itw through discord:  

2.  How can we convert the search queries to monitoring alerts?
Good question, at the moment we are working on a solution to do this automatically, hopefully available very soon. In the meantime, there are two workarounds: execute your query through the API or, in some file-related cases, you can rely on the Yara VT module to create and deploy a Livehunt rule.

3.  Is there any documentation on the VT website for all this info?
Yes, here you can find general manuals and dedicated documentation for the API. Another good resource is our getting started site. You can find more resources linked in the training slides.

4.  Can we have sessions on hunting part of VT like this.
Yes, we will be having quarterly “Hunting with VT” sessions (at a minimum).
 
5.  Can we use regex in queries?
You can use regex in VT Grep queries. The following is an example using wildcards for a hexadecimal sequence of bytes: content:{686f6c61 ?? 6d756e646f}
There are no wildcards for most of the regular VT Intelligence queries with the exceptions of “name:” and “domain_regex:”, because we use full text search. In some cases you can achieve the same effect by combining search terms with the “AND” keyword.

6.  Can you point out a location for all the most useful queries?
We are working on a Cheat Sheet which will be available very soon, stay tuned.

7.  Can you see the content tab in the free version?
This is only available to VT Enterprise customers.
 
8.  Can you use wildcards in date notations when searching?
Date notations are quite flexible even without wildcards. For example, for malware submitted to VT in January this year you can use the following: 
You can get malware submitted for the last 5 days with the following query:

9.  What is your keyboard? Super nice sound.
DROP CTRL + Kaihua Speed Silver + T0mb3ry SA Carbon

10.  Does “crowdsourced...:malware_name” will give all the rules sigma/yara written for that malware?
Not really. Rules are not assigned to any malware or actor in particular, so we need to rely on the name of the rule. For instance:

provides you with files detected by the Yara or IDS rules with “Sofacy” in its name.

11.  Does Virustotal collect samples from sources other than user submitted files? For example, does it passively scrape the Android App store to check for new apps created by APTs?
Yes! Just as an example, you can submit a file to VirusTotal from Process Explorer. Also, there are different research groups and other volunteers (thanks again to all of you!) who share new samples with the VT community.

12.  How to fetch all the matched samples for a query by script instead of going through all pages?
You can use this API endpoint.

13.  Sometimes I tried to perform behavior search using powershell commands, but it doesn’t work for me. For example, this one. When clicking to the powershell command itself, returns no results. Also, doing something like behavior:"PAAjAGcAeABwACMAPgAgAFIAZQBnAGkAcwB" does not work.
Thanks for the heads up! There is an issue when transforming super-long strings into a search query, we are already working on a fix. 
Regarding the second question, when doing a full text search unfortunately it is not possible to use substrings (unless it is a separate word). For example:

14.  If I am trying to hunt a certain ransomware, and the actual PE Files are present on VT as per YARA Rules. How can I find the initial vector?
Searching for the initial infection vector is tricky and not always possible as it depends on VirusTotal’s visibility, but you can try the following ideas:

  • To find suspicious (p:5+ for 5 or more AV detections, as rule of thumb) files distributed as email attachment you can use tag:attachment p:5+. Then you can get full details in the “Relationship” tab for every resulting sample:


  • To find suspicious files known to be distributed in-the-wild you can use have:itw p:5+. Details, again, in the “Relationship” tab:




15.  Do you have a link to all available VT "entity:" types?
That would be “file”, “url”, “ip”, “domain” and “collection”. You can find extensive information with documentation links on the slide #5 of PDF slides. This will also be available soon in VirusTotal’s documentation.

16.  How can we utilize the VT API to upload our own PE files to sandboxes for dynamic analysis? I have been running it on some executables and most of the files error out during the sandbox execution (e.g. tries to open a file in the file system that doesn’t exist, etc).
Sandboxing is part of the executable processing pipeline, you can submit your files using this API endpoint. Regarding any technical issues including unexpected sandboxing behavior you can reach out to Virustotal Bot at the bottom of the website.

VirusTotal Support Bot

17.  Can we have the option of moving back to the old color scheme in Retrohunts?
Thanks for your request, we’ll think about how to make the Yara editor pleasant for everybody.

18.  Can you show an example of using language with word document search? How can I search for Follina samples with specific languages?
The “lang:” keyword finds all files whose Exif language property matches the language provided. In Follina case, following our training instructions, we can use the crowdsourced Yara rule with language clarification:

19.  Is there a thinking on future development for GodMode to be able to label deployed malwares by sectors? like Healthcare?

20.  Is there Mitre Attack type T. identifier search? 
Yes, you can use the following keywords and specify any technique or tactic.

21.  Is there a way to run custom tools on top of what is available?
You can always organize a custom post-processing of VirusTotal data by using our API. This is exactly what the “APT dashboard” we showed during the training does.

22.  Were we able to search for a specific email and see all other files distributed by the email? 
You can leverage VT Grep to search for email files with specific content and then pivot to the children entities as it appears under the Relationship tab, for example:

23.  How can we combine a domain and a file seen together?
If you want to get files downloaded from a specific domain (drive.google.com in the example below), you can use:

24.  What about wildcards in androguard on receivers/services/activities?
Unfortunately, this data is indexed as full-text search, so no wildcards are allowed. However, if you feel that this is critical to you, please feel free to reach out.

25.  What encodings are accepted by the "content" query parameter? Is it only plaintext and hexadecimal?
You can also use escaped UTF-8, check the VT Grep manual for details.

26.  Why isn't it possible to combine the "have" and "content" modifiers? I have the following error: "have can't be combined with content modifier", example: have:email_parents AND content:"Hello World!"
Content search is a different querying tool by design, that's why it is not possible to combine with all the other keywords by default. We are collecting the feedback and adding compatibility with the most popular ones (tag, type, p, fs, etc). For more details please refer to VT Grep documentation.

27.  How can I submit a yara or a sigma rule to VT for community use? Are they generally just obtained from curated open source?
Yes, we are always looking for solid and active repositories constantly updated with the latest malware signatures. You can find the number of such repositories in our Contributors list. If you do want to contribute, please let us know.

28.  Will the APT dashboard be downloadable from GitHub? Would it be possible to share one of your jupyter notebooks, please?
At the moment we are not openly sharing such notebooks, which are mostly designed for demoing and internal research, and we are not really sure if they are solid enough for production environments. We are currently studying if this is something we will be doing in the near future. However, we are happy to help you build your own! Please feel free to let us know in case you are interested in this or any other ideas.

If you have any other questions, please feel free to reach out.

Happy hunting!


Friday, November 25, 2022

From zero to Zanubis

 A few weeks ago we stumbled upon a suspicious Android sample from a tweet from @malwrhunterteam which was only detected by four antivirus engines:


Antivirus verdicts didn’t provide specifics about the malware family other than it might be either a banking trojan or spyware, so a first approach to continue the investigation is finding other similar samples that will help provide a picture of this malware family. Continuing with the example, the first step is checking the “Relations” tab in the VirusTotal report to find other related IOCs (Indicators of Compromise). In this case we can observe a few interesting “Contacted URLs” obtained during sandbox detonation:


Although both URLs are interesting, the one used for socket communication (port 8000) uses Spanish strings in the endpoint (“instalado” meaning installed). By clicking on this URL in VirusTotal, we immediately find four additional samples sharing the same described networking behavior: 


These results show that the sample used to start the investigations (first seen 2022-08-27) was not the first one submitted to VirusTotal from the set of samples contacting the suspicious URL. A sample having a similar behavior and named “preso.apk” was submitted two weeks earlier (2022-08-11). The “Network Communications” section in the “Behavior” tab shows how it uses the same URL pattern:


We decided to create a Yara rule to monitor freshly submitted samples also having this same pattern.


Finding patterns for hunting

Writing YARA rules for APK files might be a bit complex, however we have two powerful tools on our side: the “vt” YARA module and VTDiff. 


Using VTDiff 


VTDiff finds common patterns between a set of files and excludes the “too common to be useful” ones. You can also provide an “exclusion list” based on a set of hashes. As a result, we get a shortlist of patterns to consider for our YARA. Considering that the APK format is basically a ZIP file, we need to choose the bundled files to submit to VTDiff. In this case, “classes4.dex” seems a good candidate based on the number of detections:

 


In our new VTDiff session we add classes4.dex hashes in the first box, and in the second box (exclusion list), a random DEX file from a clean Android file:

After VTDiff does its magic, we get several patterns matching most or all of the files submitted:


We can explore how useful these patterns are for hunting by doing a content search (just by clicking on the pattern). This returns how many files in VirusTotal match a pattern:


In this case, the chosen pattern returns 20+ additional dex files. We can find their corresponding APK parents through their relationships.


VT YARA module


The VT YARA module enhances YARA rules in many ways, including the possibility of including behavioral patterns in the rules’ conditions. This includes matching URLs extracted during sandbox detonation. In this particular case, we will take advantage of two facts to create our rule:
  • Samples connects to an endpoint called instalado
  • Samples create and delete a temporary backup file with .xml.bak extension.

With these very simple ideas, we can create the following rule:

import "vt"

rule Zanubis_Behaviour_VT
{
  meta:
      author = "@entdark_"
      email = "fdiaz@virustotal.com"
      created = "2022.09.1"
  condition:
      for any http_req in vt.behaviour.http_conversations : (                                                                                
        http_req.url contains "instalado") 
      and
      for any deleted_file in vt.behaviour.files_deleted : (
        deleted_file contains ".xml.bak")
      and vt.FileType.ANDROID
}

In case you are not familiar with YARA, the previous rule:
  • Imports the VT module to obtain access to VirusTotal’s metadata and behavioral data.
  • Iterates all the aggregated HTTP conversations in the behavioral sandbox report and finds patterns contained within the URL field.
  • Iterates all aggregated deleted files on detonation, and searches for the .xml.bak substring.
  • Verifies that the file is an Android file.

To test this YARA rule, we can manually rescan the samples used to generate it and check if they trigger the rule. However, it is more interesting to run a RetroHunt job and observe how many additional samples show up from VirusTotal’s collection:


We started from a single sample that allowed us to find 3 more through pivoting. Now RetroHunt provided us 14 files thanks to our behavior-based Yara rule. That’s great, isn’t it? 

Please note for the moment the “vt” YARA module can only be used for LiveHunts (availability for RetroHunts is in late Beta, very soon available for everyone!).


Analysis of this Android malicious family

Now that we have several samples to work with, we can go into a more manual analysis phase to make sure they are all related and what are the common capabilities. In this case, the samples belong to a family called Zanubis, which is a “work in progress” banking trojan. It uses accessibility services to overlay its victim’s apps with their own login screen. So far, it seems to exclusively target financial entities from Peru and retrieves data from the victim’s device including contacts, device details and SMS data. All the retrieved data is sent back to the remote Command and Control via websockets.


For each iteration, Zanubis’ actors have been adding additional functionality, sometimes completing features where placeholders were found in previous versions. For the last iteration, analyzed samples seem to focus on improving their social engineering capabilities, in this case including links to Peru’s government sites.

Conclusions

The process described in this post is the one many researchers follow when finding an interesting sample. One of the first things to do is finding something unique enough that helps us take a step back and look for additional samples belonging to the same family. In this particular case, networking indicators based on behavioral analysis helped identify the malicious infrastructure, and from here, we were able to find some patterns to continue hunting. With a small set in our hands, we have different options to quickly explore different options to create exploratory YARA rules, in this case we used the “vt” module to add behavior to our rule.

There are many additional ways to reach the same results, as always we are happy to hear any ideas from you side.

Happy hunting!




Thursday, November 17, 2022

Stopping Cobalt Strike with YARA

 VirusTotal was born with the idea of community in mind - an ecosystem where everybody contributes and benefits. This helped grow our product around the concept of crowdsourced intelligence, where all the security community could contribute in different ways to provide more actionable tools for our users, including researchers and analysts, for detection and threat hunting.

Sometimes we have beautiful success stories on how VirusTotal’s users give back to the community what they get from the platform. In this case, our colleagues from Uppercase created a precise set of YARA rules to detect Cobalt Strike components. You can read more about it here.

Unfortunately, Cobalt Strike has become one of the main components in any attacker’s toolset. Albeit a legitimate tool for pentesting, different versions in the last years have been leaked and abused in many different ways. The first step to create a robust set of YARA rules is to have a consistent set of samples, in this case including all the different Cobalt Strike versions we want to be able to detect. VirusTotal was the platform of choice to gather all the samples needed, and thanks to our new Collections, these samples can easily be grouped in a single set - actually, you can find this Cobalt Strike collection here.

Once all the samples are available for the researcher, there is not a single way to create the YARA rules. A first approach could be checking Commonalities among the samples, in case we find any interesting characteristic or metadata among all the samples that we could use for our rule. Below you can see an example of finding commonalities in a collection of suspicious documents with more than 2000 samples:


Another possibility would be using VTDiff to find what particular bytes these samples have in common, and at the same time, have low prevalence in VirusTotal’s collection in order to qualify them as significant for creating a YARA. Independently of using VTDiff, checking the prevalence for any byte sequence or string in VirusTotal’s collection with a quick search is always a great idea to understand how useful they would be in your rule. Remember you can combine different byte sequences (using the “content” modifier) in your VTIntelligence search. If your rule is purely based on strings and byte sequences, you can mostly test its effectiveness with a few searches in VirusTotal.

There is a more technical approach consisting of reversing the samples and finding something interesting and unique for the detection, which is what was used in this case. But how to know if your rules are good enough? Usually this is an iterative process where we want the first versions of the rules to be a bit loose so we can find more suspicious samples. This is a way for us to understand if a rule can be used for hunting. Once we are satisfied with the results (we are finding all the Cobalt Strike samples we wanted), we want to make sure we don’t detect anything else (avoid false positives), especially when it comes to legitimate software.

For the process described above, usually you want to use RetroHunts, as they will check your rules against the whole VirusTotal collection. When launching your RetroHunt, you can specify the collection of samples you want your rules to be checked against, there is one collection of goodware we can use to make sure our rules don’t detect any of these samples by mistake.


There are different ways you can check if the results obtained from your rules are True Positives, usually in VirusTotal you will find plenty of data points you can use to double check, including verdicts, other crowdsourced rules, community comments, presence in other collections, signatures, etc. Unfortunately, sometimes the results from your rules (or VT Intelligence searches) can be huge, in that case we really encourage creating a collection with them and use the Commonalities feature to get a better understanding of how your rule did and discriminate among your results. 

Once we are satisfied with the results, we suggest deploying your rules in LiveHunt for some time, which basically will execute your rule against anything uploaded to VirusTotal from the moment you deploy it. This way you can monitor its effectiveness and do some final polishing if needed. 

And voila! Your rules are ready to be deployed. In this case, the rules can be found here, and now they are part of our set of crowdsourced YARA rules, so everyone in the VirusTotal’s community will collectively benefit from this effort.

We really appreciate the effort and generosity of our colleagues from Uppercase, and we hope these ideas will help everyone understand a bit more about the creation and deployment of YARA rules. As usual, we are happy to hear from you.

Happy hunting!


Tuesday, November 15, 2022

Deception at scale: How attackers abuse governmental infrastructure

 Continuing our initiative of sharing VirusTotal’s visibility to help researchers, security practitioners and the general public better understand the nature of malicious attacks, we are proud to announce our “Deception at scale: How attackers abuse governmental infrastructure” report. Here are some of the main ideas presented there:

  • Governmental domains are among the top categories used by attackers in 2022 to distribute malicious content. 

  • We found dozens of government-related domains hosting many kinds of malware, including trojans, ransomware, phishing, coin miners, banking malware, and lateral movement tools.

  • Although some affected domains seem to be victims of opportunistic attacks, there are indicators that some of them were targeted by sophisticated attackers who abused their infrastructure to deploy their toolsets.

  • Using legitimate government domains for malware hosting can enable an attacker to improve the efficiency of social engineering attacks and avoid defenses and alerts based on deny/allow lists.

  • We also found traces of various webshells hosted in dozens of governmental domains. 

  • More generally, we observed an increase of phishing levels in 2022 along with a large distribution of suspicious PDFs. Recently created XLSX files seem to replace DOCX as the preferred mechanism to distribute  malware.


For full details, you can download the report here

In this blog post we will focus on technical hunting and monitoring ideas you can use to prevent such cyberattacks. We also provide additional technical details for some of the most interesting cases we provide in the report.

Domain categories abused by attackers 

Domains and URLs processed by VirusTotal are categorized by third party solutions, this data can typically be found in the “Details” tab:


You can use the following simple query to get domains categorized as, for example, military:


Unfortunately categories among different vendors might vary a lot as they use different criteria and spelling. Roughly speaking, categories can be grouped in two sets: one related to the business activity of the domain and a second one describing the type of threat detected. This is an important point to keep in mind while working on your own VT Intelligence queries.

We can add additional modifiers to our query. For instance the following query gets URLs (instead of domains) following a particular pattern:


We can add search modifiers for HTML content, metadata, and much more. You can find in the following links the whole list of search keywords for domains and URLs.

Each entity could be tagged by multiple categories at the same time, so you can combine them to search in VT Intelligence, below we provide some examples:

  • Malicious activity on CDN domains:

  • Suspicious phishing/malicious URLs/domains:

  • Suspicious Command and Control login URLs:

  • Suspicious waterhole attacks:

  • Sites distributing cracks for video games:

We did our best to unify different criteria to understand what are the top categories distributing malicious content. We were surprised to see Government among top results by number of domains:

Government-related infrastructure

We found several interesting representative examples when analyzing suspicious activity on governmental infrastructure. The following infographic shows top TLDs for government-related suspicious domains we found in VirusTotal in 2022. We decided to exclude non-specific TLDs (such as .com, .net, .org, etc) from this list.


We manually double-checked different cases we found interesting, described below.

Malware hosting

We found all kinds of malicious content hosted by governmental domains, including phishing, downloaders and trojans, ransomware, lateral movement tools, cryptominers and bankers. To obtain a first list to start working with, we used queries similar to the ones described before to find potentially compromised governmental infrastructure, followed by additional filtering and manual checking. 

Below we describe a few cases we found interesting.  

  • A sample of the Coper Android banking trojan was hosted on an Indonesian governmental entity website.

  • Malware with keylogger and screenshot capabilities hosted in a government office website in Bangladesh for around three months, according to telemetry.

  • A Peruvian governmental site hosted a sample of the dangerous njRAT. This particular sample was first seen in November 2021, and the site already cleaned it up at the time of writing the report, however we found more samples active on the same timeframe and using the same C2 server.

We also found traces of targeted attacks and lateral movement tools hosted in some victims.

  • Traces of Mimikatz hosted in a subdomain of a (likely compromised) public hospital in Indonesia.


  • A Cobalt Strike sample hosted in a Sri Lankan governmental entity last July 2022 under a non-suspicious name.

As for Ransomware, a regional governmental domain in the Philippines was found hosting an AgentTesla sample by mid 2021. Attackers seem to have abused a vulnerability in the CMS to deploy their sample under a URL clearly used for social engineering, greatly increasing its potential to spread.

Suspected webshells

We tried to find (potentially compromised) government-related sites hosting webshells. This is not an easy task, so we played with some ideas. For instance, we searched for common names used by webshells in governmental domains detected as suspicious by antivirus engines. We also combined searching for common content in webshell files with antivirus verdicts. Unfortunately, these queries still provided too many false positives, so we had to manually double check them. Below we detail some interesting findings, all found in governmental infrastructure: 

  • This JPG file embeds PHP code inside the comment section. This is an old well known technique that still seems to be effective to avoid antivirus detection.


Most of the URLs distributing this malware ITW listed in the Relations tab have 0 detections, probably legitimate compromised sites.

  • This PHP webshell, presented as a PDF file, was first seen in VirusTotal in 2013. Since then, the malware has been distributed ITW by at least 29 different domains. Interestingly, when crawled by any search engine bot it returns a 404 error value to avoid being indexed.

  • PHP webshells camouflage themselves as images using different techniques. In this case, this webshell adds the GIF89GHZ string at the beginning of the file to mislead filetype detection. The ITW list of URLs distributing this sample is pretty impressive. It also shows that most of the time this webshell is hosted as a GIF image. Its deobfuscated code shows a very simple upload functionality.

  • A simple PHP uploader sometimes hosted as "favicon.ico". The content tab for this webshell in VirusTotal shows the password it expects from operators.


When webshell content is available, we can use it to pivot (by clicking) to other files with the same content (likely more webshells!). By doing this we easily find nine additional files which also provide new password strings to keep pivoting. 

We also found a second encrypted webshell (first seen in 2016 and embedding two base64 Perl scripts) where the encryption password can be found in clear text in the content tab, which is always a great resource for pivoting. 

  • This file contains a trojanized version of AuraCMS, an Indonesian Content Management Service. The webshell can be found under the "files/siswa/be.php" path.

Interestingly, the file contains a disclaimer visible in the content tab

 

Nevertheless, it contains an obfuscated and encrypted block of code, which can be used to pivot to find other similar samples. The deobfuscated content shows this webshell is capable of file system navigation, local command execution, read/write files, download/upload, mysql interface, list processes, etc. It also includes four different ciphered blocks of code. Two of them create ordinary backdoor connections: $port_bind_bd_pl (perl) and $port_bind_bd_c (C code to compile locally). The other two blocks of code implement reverse shells: $back_connect (perl) and $back_connect_c (C).

  • This webshell is based on the WSO Webshell project (currently removed from Github, but there's a copy of the original repository here). Unfortunately, this is not the only WSO-based webshell we found in governmental domains (another one here). The content tab shows a password and email that will help us find other samples. We can use this information for a content-based VTI query based:

content:{3d2022323132333266323937613537613561373433383934613065346138303166633322} or content:{406d61696c2827686172645f6c696e7578406d61696c2e7275272c2027726f6f74272c2024746d7029}


Conclusions

Compromising government-related infrastructure represents a potential major threat given the implicit trust it represents. All the examples above show traces of both opportunistic and targeted attacks. The lack of regular maintenance seems to be fundamental for many of the observed attacks.

We suggest several ideas to minimize most common risks:
  • Regularly update and maintain government web sites, especially content management systems (CMS), to address vulnerabilities. 

  • Actively monitor government infrastructure for anomalies, such as malware actively communicating with them or subdomains hosting files with malicious verdicts.

  • Regularly scan all hosted files in government infrastructure, especially in subdomains and personal sites. Do not dismiss phishing, as it can be used in social engineering schemes.

  • Assume traffic from trusted domains might be malicious, as the infrastructure can be used to host lateral movement tools or other advanced malicious toolsets.

  • In case of finding anything suspicious, but especially in case of finding webshells or lateral movement tools in the infrastructure, assume compromise and consider a full investigation. 

We hope the examples provided should serve as a heads up towards better security practices when it comes to sensitive infrastructure. We also hope some ideas presented in this blog post will help defenders implement monitoring for their own infrastructure. As usual, we are happy to hear from you

Happy hunting!