Wednesday, May 29, 2024

, , , , ,

Tracking Threat Actors Using Images and Artifacts

When tracking adversaries, we commonly focus on the malware they employ in the final stages of the kill chain and infrastructure, often overlooking samples used in the initial ones.
In this post, we will explore some ideas to track adversary activity leveraging images and artifacts mostly used during delivery. We presented this approach at the FIRST CTI in Berlin and at Botconf in Nice.

Hunting early

In threat hunting and detection engineering activities, analysts typically focus heavily on the latter stages of the kill chain – from execution to actions on objectives (Figure 1). This is mainly because there is more information available about adversaries in these phases, and it's easier to search for clues using endpoint detection and response (EDR), security information and event management (SIEM), and other solutions.
Figure 1: Stages of the kill chain categorized by their emphasis on threat hunting and detection engineering.
We have been exploring ideas to improve our hunting focused on samples built in the weaponization phase and distributed in the delivery phase, focused on the detection of suspicious Microsoft Office documents (Word, Excel, and PowerPoint), PDF files, and emails.
In threat intelligence platforms and cybersecurity in general, green and red colors are commonly used to quickly indicate results and identify whether or not something is malicious. This is because they are perceived as representing good or bad, respectively.
Multiple studies in psychology have demonstrated how colors can influence our decision-making process. VirusTotal, through the third-party engines integrated into it, shows users when something is detected and therefore deemed "malicious," and when something is not detected and considered "benign."
For example, the sample in Figure 2 belongs to a Microsoft Word document distributed by the SideWinder group during the year 2024.
Figure 2: Document used by the SideWinder APT group
The sample in question was identified at the time of writing this post by 31 antivirus engines, leaving no doubt that it is indeed a real malware sample. In the process of pivoting to identify new samples or related infrastructure, starting with Figure 2, the analyst will likely click on the URL detected by 11 out of the 91 engines, and the domains detected by 17 and 15 engines, respectively, to see if there are other samples communicating with them. The remaining two domains (related to and in this case are easily identified as legitimate domains that were likely contacted by the sandbox during its execution.
Figure 3: Relationships within the SideWinder APT group document
In the same sample, if you go down in the VirusTotal report (Figure 3), the analyst will likely click on the ZIP file listed as "compressed parent" to check if there are other samples within this ZIP besides the current one. They may also click on the XML file detected by 8 engines, and the LNK file detected by 4 engines. The remaining files in the bundled files section probably won't be clicked, as the green color indicates they are not malicious, and also because they have less enticing formats — mainly XML and JPEG. But what if we explore them?

XML files generated by Microsoft Office

When you create a new Microsoft Office file, it automatically generates a series of embedded XML files containing information about the document. Additionally, if you use images in the document, they are also embedded within it. Microsoft Office files are compressed files (similar to ZIP files). In VirusTotal, when a Microsoft Word file is uploaded, you can see all these embedded files in the embedded files section.
We have mainly focused on three types of embedded files within Office documents:
  • Images:Many threat actors use images related to the organizations or entities they intend to impersonate. They do this to make documents appear legitimate and gain the trust of their victims.

  • [Content_Types].xml:This file specifies the content types and relationships within the Office Open XML (OOXML) document. It essentially defines the types of content and how they are organized within the file structure.

  • Styles.xml:Stores stylistic definitions for your document. These styles provide consistent formatting instructions for fonts, paragraph spacing, colors, numbering, lists, and much more.

Our hypothesis is: If malicious Microsoft Word documents are copied and pasted during the weaponization building process, with only the content being modified, the hashes of the [Content_Types].xml and styles.xml files will likely remain the same.

Office documents

To check our hypothesis, we selected a set of samples used during delivery and belonging the threat actors listed in Figure 4:
Figure 4: Number of samples per actor within the scope
Let’s analyze some of the results we obtained per actor.

APT28 – Images

We started by focusing on images APT28 has reused for different delivery samples (Figure 5).
Figure 5: Images shared in multiple documents by APT28
Each line in the Figure 5 graph represents the same image, and each point represents at least two samples that used that particular image.
The second image of the graph shows how it was used by different Office documents at different points in time, from 2018 to 2022 (dates related to their upload to VirusTotal).
Now, the chart in Figure 6 visualizes each of these images.
Figure 6: Content of the images shared in multiple documents by APT28
  • The first image is just a simple line with no particular meaning. It's embedded in over 100 files known by VirusTotal.

  • The second image is a hand and has 14 compressed parents.

  • The third image consists of black circles and also has over 100 compressed parents.

  • The last image is like a Word page with a table, presenting a fake EDA Roadmap of the European Commission. The image format is EMF (an old format) and it has 4 compressed parents

If we delve into the compressed parents of the second image (the one with the hand), we can see how the image is used in Office documents that are part of a campaign reported by Mandiant attributed to APT28. The image of the hand was used in fake Word documents for hotel reservations, particularly in a small section where the client was supposed to sign.
Figure 7: Pivoting through a specific image used by APT28

SideWinder – Images

SideWinder (aka RAZER TIGER) is a group focused on carrying out operations against military targets in Pakistan. This group traditionally reused images, which might help monitoring their activity.
Figure 8: Images shared in multiple documents by RAZOR TIGER
In particular, the image in Figure 9 was used in a sample uploaded in September 2021 and in a second one uploaded March 2022. The image in question is the signature of Baber Bilal Haider.
Figure 9: Two different samples of RAZOR TIGER share the same image of a handwritten signature

Gamaredon – [Content_Types].xml and styles.xml

For Gamaredon we found they reused styles.xml and [Content_Types].xml in different documents, which helped reveal new samples.
Figure 10 chart displays all the [Content_Types].xml files from Gamaredon's Office documents.
Figure 10: [Content_Types].xml shared in multiple documents by Gamaredon Group
There are a large number of samples that share the same [Content_Types].xml. It's important to highlight that these [Content_Types].xml files are not necessarily exclusively used by Gamaredon, and can be found in other legitimate files created by users worldwide. However, some of these [Content_Types].xml might be interesting to monitor.
Styles.xml files are usually less generic, which should make them a better candidate to monitor:
Figure 11: Styles.xml shared in multiple documents by Gamaredon Group
We see styles.xml files are less reused than [Content_Types].xml. This could be because some of the samples used by this actor for distribution are created from scratch or reusing legitimate documents.
We used identified patterns in the styles.xml files to launch a retrohunt on VirusTotal. Figure 12 visually represents the original set of style.xml files (left) and those that were added later after running the retrohunt (right).
Figure 12: Initial graph of the styles.xml and its parents used by Gamaredon (left). Final graph after identifying new styles.xml and their parents using retrohunt in VirusTotal (right)
One of the new styles.xml files found in our retrohunt has 17 compressed parents, meaning it was included in 17 Office files.
Figure 13: Number of parent documents for a specific styles.xml file used by Gamaredon
All the parents were malicious, some of them identical and the rest very similar between them. The content of many of them referred to "Foreign institutions of Ukraine - Embassy of Ukraine in Hungary," containing a table with phone numbers and information about the embassy, such as social media links and email accounts. Here's an example:
Figure 14: Document used by Gamaredon in one of its campaigns that includes multiple images which can be used to monitor new samples
The information for social media includes the logos of these platforms, such as the Facebook logo, Skype logo, an image of a telephone, etc. By pivoting, on the image of the Facebook icon, we find that it has 12 additional compressed parents, meaning it appears in 12 documents, all of them sharing the same styles.xml file.
Visualizing all together, we find a set of about 12-14 images used within the same timeframe by the actor. All of these images can be found in the “Embassy of Ukraine in Hungary” document.
Figure 15: Pivoting through the Facebook image that included the document in Figure 14
There's a pattern evident in the previous image where different images were included in files uploaded simultaneously. This pattern is associated with multiple documents used in the same campaign of the Embassy of Ukraine in Hungary, all of them were using the same social media images explained before.

Styles.xml shared between threat actors

Another aspect we explored was if different threat actors shared similar styles.xml files in their documents. Styles.xml files are somewhat more specific and unique than [Content_Types].xml files because they can contain styles created by threat actors or by legitimate entities that originally created the document and then were modified by the actor. This makes them stand out more and can help in identifying threat actor activity.
This doesn't necessarily imply they share information to conduct separate operations, although in some cases, it could be a scenario worth considering.
Figure 16: styles.xml shared between different threat actors
Of all styles.xml files related to actors in our initial set, only six of them were found to be shared by at least two actors. Some styles defined by the styles.xml file are very generic and could identify almost any type of file. However, there are others that could be interesting to explore further.
An interesting case is the Styles.xml file, which seems to be shared by Razor Tiger, APT28, and UAC-0099. Specifically, the samples from APT28 and UAC-0099 are attract because they were uploaded to VirusTotal within short time frames, suggesting they might belong to the same threat actor.
You can see the list of hashes in the appendix of this blog

[Content_Types].xml shared between threat actors

Like in the previous case, we checked if there were Office documents among different threat actors sharing [Content_Types].xml:
Figure 17: [Content_Types].xml shared between different threat actors
In this case, there are eleven [Content_Types].xml files that are shared by at least two different actors.
An interesting case here is the file dfa90f373b8fd8147ee3e4bfe1ee059e536cc1b068f7ec140c3fc0e6554f331a, which is shared by Gamaredon, APT37, Mustang Panda, APT28, SideCopy, and UAC-0099. Again, there could be different explanations for this.
Another interesting case that is worth analyzing in detail is [Content_Types].xml with hash 4ea40d34cfcaf69aa35b405c575c7b87e35c72246f04d2d0c5f381bc50fc8b3d, which is only shared by APT28 and APT29.
You can see the list of hashes in the appendix of this blog

AI to the rescue

The images reused by attackers seem to be a promising idea we decided to further explore.
We used the VirusTotal API to download and unzip a set of Office documents used for delivery, this way we obtained all the images. Then we used Gemini to automatically describe what these images were about.
Figure 18: Results obtained with Gemini after processing some of the embedded images in the documents used by the threat actors
Figure 18 shows some examples of images that were incorporated by certain actors. There were also other results that were not helpful, mainly related to images that did not show a logo or anything specific that indicated what they were.
Figure 19: Results obtained with Gemini after processing some of the embedded images in the documents used by the threat actors
Using the VirusTotal API to obtain documents that you might be looking for and combining the results with Gemini to analyze possible images automatically, can potentially help analysts to monitor potential suspicious documents and create your own database of samples using specific images, for example Government images or specific images about companies. This approach is interesting not only for threat hunting but also for brand monitoring.

PDF Documents

Images dropped by Acrobat Reader

Unlike Office documents, PDF files don't contain embedded XML files or images, although some PDF files may be created from Office documents. Some of our sandboxes include Adobe Acrobat Reader to open PDF documents which generates a thumbnail of the first page in BMP format. This image is stored in the directory C:\Users\\AppData\LocalLow\Adobe\Acrobat\DC\ConnectorIcons. Consequently, our sandboxes provide this BMP image as a dropped file from the PDF, allowing us to pivot.
To illustrate this functionality, see Figure 20 attributed to Blind Eagle, a cybercrime actor associated with Latin America.
Figure 20: Content of a PDF file related to Blind Eagle threat actor
Figure 20 was provided by our sandbox. In the "relations" tab, we can see the BMP image as a dropped file:
Figure 21: BMP file generated by the sandbox that can be used for pivoting
The BMP file itself also shows relations, in particular up to 6 PDF files in the "execution parents" section. In other words, there are other PDFs that look exactly the same as the initial one.
Typically, many actors engaged in financial crime activities utilize widely spread PDF files to deceive their victims, making this approach highly valuable. Another interesting example we found involves phishing activities targeting a Russian bank called "Tinkoff Bank."
The PDF files urge victims to accept an invitation from this bank to participate in a project.
Figure 22: The content of a PDF file used by cybercrime actors
Applying the same approach we identified 20 files with identical content, most of them classified as malicious by AV engines.
Figure 23: BMP file generated by the sandbox that can be used for pivoting, in this case having other 20 PDF with the same image
There are some limitations to this approach. For instance, the PDF file might be slightly modified (font size, some letter/word, color, …) which would generate a completely different hash value for the thumbnail we use to pivot.

Images dropped by Acrobat Reader

Just like the BMP files generated by Acrobat Reader, there are other interesting files that might be dropped during sandbox detonation. These artifacts can be useful on some occasions.
The first example is a JavaScript file dropped in another PDF attributed to Blind Eagle.
Figure 24: BMP file generated by the sandbox that can be used for pivoting, another example of Blind Eagle threat actor
The dropped JavaScript file's name during the PDF execution was "Chrome Cache Entry: 566" indicating that this file was likely generated by opening an URL through Chrome, possibly triggered by a sandbox click on a link within the PDF. Examining the file's contents, we observe some strings and variables in Spanish.
Figure 25: Artifact generated by the sandbox via Google Chrome when connecting to a domain
The strings “registerResourceDictionary”, “sampleCustomStringId”, “rf_RefinementTitle_ManagedPropertyName” are related to Microsoft SharePoint as we were able to confirm. These files were probably generated after visiting sites that have Microsoft Sharepoint functionalities. We found that all the PDFs containing this artifact dropped by Google Chrome came from a website belonging to the Government of Colombia.
Figure 26: Flow of artifact generation related to Google Chrome that can be used for pivoting in VirusTotal

Email files

Many threat actors incorporate images in their emails, such as company logos, to deceive victims. We used this to identify several mailing campaigns where the same footer was used.

Campaign impersonating universities

On November 13, 2023, we details about a new campaign impersonating universities, primarily located in Latin America. By leveraging the presence of social network logos in the footer, we were able to find more universities in different continents targeted by the same attacker.
Figure 27: Email impersonating a university that contains multiple images
Figure 27 shows several images, including the University of Chile's logo and building, as well as images related to social networks like YouTube, Facebook, and Twitter.
Pivoting through the images related to the University of Chile doesn't yield good results, as it's too specific. However, if we pivot through the images of the social media footer, represented as email attachments, we can observe multiple files using the same logo.
Figure 28: Using the images from the email footer to pivot and identify new emails
Just by analyzing one of the social media logos, we saw 33 email parents, all of them related to the same campaign.
Figure 29: Other emails identified through image pivoting techniques

Campaigns impersonating companies

Another usual case is adding a company logo in the email signatures to enhance credibility. Delivery companies, banks, and suppliers are some of the most observed images during our research.
For example, this email utilizes the corporate image of China Anhui Technology Import and Export Co Ltd in the footer.
Figure 30: Email impersonating a Chinese organization using the company logo in the footer
Pivoting through the image we found 20 emails using the same logo.
Figure 31: Other emails identified through image pivoting techniques

Wrapping up

We can potentially trace malicious actors by examining artifacts linked to the initial spreading documents, and in the case of images, AI can help us automate potential victim identification and other hunting aspects.
In order to make this even easier, we are planning to incorporate a new bundled_files field into the IOCs JSON structure, which basically will help to create livehunt rules. In the meantime you can use vt_behaviour_files_dropped.sha256 for those scenarios where the files are dropped.
In certain situations, the styles.xml and [Content_Types].xml files within office documents can provide valuable clues for identifying and tracking the same threat actor. The method presented here offers an alternative to traditional hunting or pivoting techniques, serving as a valuable addition to a team's hunting activities.
We hope you found this research interesting and useful, and as always we are happy to hear your feedback.
Happy hunting!


[Content_types].xml shared between threat actors

[Content_Type].xml sha256

Shared by


APT33, APT32


APT29, APT28


FIN7, Gamaredon, APT28, APT32


FIN7, APT33, TA505, Mustang Panda


Gamaredon, APT33


Gamaredon, Hazy Tiger, APT33,


Razor Tiger, APT28, UAC-0099


Razor Tiger, SideCopy


Gamaredon, APT37, Mustang Panda, APT28, UAC-0099, SideCopy


FIN7, Hazy Tiger


Mustang Panda, APT32

styles.xml shared between threat actors

Styles.xml sha256

Shared by


APT28, UAC-0099, Razor Tiger


Hazy Tiger, Gamaredon, APT33


TA505, Gamaredon


APT28, FIN7, Razor Tiger, APT32, APT33


Hazy Tiger, FIN7


APT32, SideCopy, Mustang Panda, Razor Tiger


Post a Comment