Monday, April 24, 2023

Introducing VirusTotal Code Insight: Empowering threat analysis with generative AI

At the RSA Conference 2023 today, we are excited to unveil VirusTotal Code Insight, a cutting-edge feature that leverages artificial intelligence for code analysis. Powered by Google Cloud Security AI Workbench, Code Insight produces natural language summaries of code snippets with ease. This functionality empowers security experts and analysts by providing them with deeper insights into the purpose and operation of analyzed code, significantly enhancing their capability to detect and mitigate potential threats.

For quite some time, artificial intelligence (AI) and machine learning (ML) have played a crucial role in anti-malware and cybersecurity, mainly focusing on classification tasks. However, recent advancements in large language models (LLMs) have expanded their capabilities to encompass text generation and summarization.

Impressively, when these models are trained on programming languages, they can adeptly transform code into natural language explanations. This innovation not only expedites malware analysis but also bolsters a variety of cybersecurity applications. Recognizing the immense potential of this cutting-edge technology, we have incorporated it into the VirusTotal platform, significantly enhancing its capabilities.

Code Insight is a new feature based on Sec-PaLM, one of the generative AI models hosted on Google Cloud AI. What sets this functionality apart is its ability to generate natural language summaries from the point of view of an AI collaborator specialized in cybersecurity and malware. This provides security professionals and analysts with a powerful tool to figure out what the code is up to.

At present, this new functionality is deployed to analyze a subset of PowerShell files uploaded to VirusTotal. The system excludes files that are highly similar to those previously processed, as well as files that are excessively large. This approach allows for the efficient use of analysis resources, ensuring that only the most relevant files (such as PS1 files) are subjected to scrutiny. In the coming days, additional file formats will be added to the list of supported files, broadening the scope of this functionality even further.
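The pre-filtering described above (skip oversized files and near-duplicates of files already processed) can be illustrated with a minimal sketch. The post does not say how VirusTotal measures similarity or what size limit it applies, so the word-shingle Jaccard comparison, the `PreFilter` class, and both thresholds below are assumptions chosen purely for illustration.

```python
# Hypothetical pre-filter: NOT VirusTotal's implementation.
MAX_SIZE_BYTES = 1_000_000    # assumed size cap
SIMILARITY_THRESHOLD = 0.9    # assumed near-duplicate cutoff


def shingles(text, k=5):
    """Return the set of k-word shingles of a script."""
    tokens = text.split()
    if len(tokens) < k:
        return {" ".join(tokens)}
    return {" ".join(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


class PreFilter:
    """Decides whether a script is worth sending for analysis."""

    def __init__(self, max_size=MAX_SIZE_BYTES, threshold=SIMILARITY_THRESHOLD):
        self.max_size = max_size
        self.threshold = threshold
        self.seen = []  # shingle sets of scripts already accepted

    def should_analyze(self, script):
        # Skip excessively large files.
        if len(script.encode("utf-8")) > self.max_size:
            return False
        # Skip files highly similar to one already processed.
        current = shingles(script)
        if any(jaccard(current, prev) >= self.threshold for prev in self.seen):
            return False
        self.seen.append(current)
        return True
```

A linear scan over every previously seen script would not scale to VirusTotal's volume; a production system would more plausibly use fuzzy hashes or an index, but the decision logic stays the same.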

Let's examine a few examples derived from authentic situations to truly appreciate the functionality of this feature.

In this first case, we have a file that was detected by only three engines on VirusTotal as “PowerShell/PSW-Agent.U” and “HEUR.Trojan-PSW.Multi.Disco.gen”. Meanwhile, Code Insight provided the following explanation:

https://www.virustotal.com/gui/file/74662107227a6a28bebb77d5b9ec3890a80e507ee22ed99eb17f35c9d8730bf3/detection

Unveiling false negatives

It's important to note that Code Insight conducts its analysis independently, relying solely on the content of the file being processed, without access to antivirus results or any other associated metadata. A good example is this false negative, where Code Insight’s explanation helps us detect malware that steals users’ credentials and has not been identified by any antivirus software in VirusTotal:

https://www.virustotal.com/gui/file/552efb0dc7e62ded08c98d2e6355df1d27a1317c0a37aabeefd48667b7b1917b/detection

Clearing false positives

In this other example, we have a file that is flagged as trojan and malware by 9 antivirus engines, but it's actually a false positive. Here we can see once again how Code Insight can be a valuable ally when managing incidents and analyzing potential malware. In this case, it explains that it's simply a script that installs Postman CLI:

https://www.virustotal.com/gui/file/b5796d7e4a9efc0b81efdc94b3e42ba6a6ef71d10274e3b812cd5ef4dfb8787b/detection

In this last example, Code Insight demonstrates how it can help improve file categorization in VirusTotal. Code Insight accurately identifies the sample’s file type and fixes the tag that misclassified it as JavaScript.

Although the selected examples illustrate accurate descriptions, the performance of the LLM may vary on a case-by-case basis, and it can make judgment errors. It’s highly likely that attackers will develop new evasive strategies, and an ongoing competition between malware and this new approach is expected. That’s why it is crucial for a security analyst to oversee these features, as they ultimately need to interpret this information together with other contextual information and correlations relevant to the case at hand.

Nevertheless, the integration of LLMs into the arsenal of code analysis tools is a significant advancement that enables security professionals to gain valuable insights into the structure and behavior of potentially malicious code, improving threat detection and response efficiency.

Code Insight in VirusTotal Intelligence

This kind of analysis can be carried out by various AI models, each offering varying levels of precision and depth. However, the true value of VirusTotal's Code Insight lies in its capacity to scale this analysis through its platform. This enables not only the examination of individual code samples, but also the aggregation and exploitation of results on a large scale via the VirusTotal Intelligence service. As a result, security teams can swiftly and effectively scrutinize vast quantities of code and identify potential threats, enhancing their efficiency and ultimately fortifying their security stance.

Here’s an example of searching for “codeinsight:keylogger”:

VirusTotal Intelligence finds several files that, according to the Code Insight report, record keystrokes and write them to a log file. Let’s expand the report of the first one, then we can read a comprehensive analysis explaining this specific keylogger’s behavior:

https://www.virustotal.com/gui/file/d6111869a8088e2d1b49a92a30fc3d477373d88a4a2f1a7da4e75ce85dc08ba4/detection
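The same `codeinsight:` search could also be issued programmatically. The sketch below builds a request for the VirusTotal API v3 Intelligence search endpoint using only the standard library. The helper names `build_search_request` and `run_search` are hypothetical, the API key is a placeholder, and the endpoint requires a VT Intelligence subscription, so treat this as an outline rather than a supported client.

```python
import json
import urllib.parse
import urllib.request

# VirusTotal API v3 Intelligence search endpoint.
VT_SEARCH_URL = "https://www.virustotal.com/api/v3/intelligence/search"


def build_search_request(query, api_key, limit=10):
    """Build (but do not send) a GET request for an Intelligence search."""
    params = urllib.parse.urlencode({"query": query, "limit": limit})
    return urllib.request.Request(
        f"{VT_SEARCH_URL}?{params}",
        headers={"x-apikey": api_key, "accept": "application/json"},
    )


def run_search(query, api_key, limit=10):
    """Send the request and return the decoded JSON response."""
    request = build_search_request(query, api_key, limit)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Example (placeholder key; performs a network call when uncommented):
# results = run_search("codeinsight:keylogger", "YOUR_API_KEY")
```

Separating request construction from sending keeps the query logic easy to inspect without network access.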

As we continue to refine and expand the capabilities of VirusTotal Code Insight and other cutting-edge features, we remain dedicated to providing our community with the most advanced and effective tools to stay ahead of evolving cyber threats. We are truly excited about what the future holds and are eager to continue pushing the boundaries of what is possible in the field of cybersecurity. Stay tuned for more updates and developments from the VirusTotal team.

APT43: An investigation into the North Korean group’s cybercrime operations

Introduction

As recently reported by our Mandiant colleagues, APT43 is a threat actor believed to be associated with North Korea. APT43’s main targets include governmental institutions, research groups, think tanks, business services, and the manufacturing sector, with most victims located in the United States and South Korea. The group uses a variety of techniques and tools to conduct espionage, sabotage, and theft operations, including spear phishing and credential harvesting.

At VirusTotal, we wanted to contribute to a better understanding of this actor’s latest activity based on telemetry for its malware toolset, including geographical distribution, lookups, submissions, file types, detection ratios, and the efficacy of crowdsourced YARA rules for the IOCs Mandiant attributed to this threat actor. All the data provided in this post is also available to VirusTotal users through VT Intelligence. It can be obtained by aggregating Telemetry and Commonalities from a set of IOCs, which you can do using a VT Intelligence search, Collection or Graph.

Malware artifacts

File type distribution

We used Indicators of Compromise (IOCs) attributed to APT43 by Mandiant to collect telemetry on the threat actor’s latest activity, resulting in the following file type distribution:

Microsoft Word documents (docx) are the most common file format among the samples we analyzed. This suggests that APT43 relies heavily on Microsoft Word documents as a vector for delivering malicious payloads or exploiting vulnerabilities. We also found that most of these files used macros as their infection technique, while only a few of them exploited the CVE-2017-0199 vulnerability, which allows attackers to run malicious code on target systems by embedding malicious links in the docx file.

The file type distribution also reveals interesting patterns. For example, we can see that there are more pedll files than peexe files, even though both are executable formats. Further analysis of the pedll files revealed that the majority of them used the T1129 MITRE ATT&CK technique, which involves executing malicious payloads via loading shared modules. Another example is that there are more hwp files than doc files, even though both are word processing formats. This may indicate that APT43 targets specific regions or organizations that use Hangul Word Processor (HWP), a popular software in South Korea.

Lookups and submissions timeline

The oldest of the samples we analyzed, a portable executable, was uploaded in July 2018 from the United Kingdom.

The initial submission triggered only five positive detections, but after subsequent submissions from South Korea and Italy, the latest on March 30, there are now more than 40 detections. This illustrates how AV detection coverage typically improves over time as new techniques and artifacts are discovered.

For user queries (lookups) of samples now associated with APT43, which reflect the interest of VirusTotal’s users in particular samples, there was a peak around April 2021, possibly related to a particular APT43 campaign. The plateau starting in July 2022 might be related to security companies monitoring this actor’s activity. The very recent peak in April 2023 is most likely related to the publication of the APT43 report, which renewed security analysts’ interest in this actor and related artifacts.

From January to April this year, most of the submissions came from South Korea, and the file type most associated with these submissions was the docx file. For lookups, most of them are from the United States with apk files being the most looked up, followed by docx, text, html, powershell, and peexe.

Geographical distribution

File sample submissions and lookups are important indicators of cyber threat activity and awareness. By analyzing the geographical distribution of these activities, we can gain insights into the regions that are most affected by or interested in a particular threat actor or campaign.

According to our telemetry data, the top countries for file sample submission are South Korea, the United States, Italy, Israel, and the United Kingdom.

South Korea stands out as the country with the most file sample submissions, accounting for almost half of the total submissions. This is not surprising, given that South Korea is (probably) the primary target of APT43, as pointed out by Mandiant. Submissions of the identified samples are observed from July 2018 to April 2023.

South Korea also ranks third in file sample lookups, which is consistent with its leading position in file sample submissions. This is remarkable given South Korea’s number of VT submitters, which is well below that of the top 10 countries.

Interestingly, Turkey stands out as the country with the second highest number of lookups, even though it is not among the top submitters. This may suggest that Turkey is either a victim or a conduit of North Korean cyber attacks. Lookups of the identified samples are observed from October 2020 to April 2023.

Detections

Analysis from the commonalities tool reveals that the most common threat categories are trojan, downloader and dropper. Of the identified samples that have threat names assigned, 32% are labeled ‘kimsuky’. Mandiant’s report mentions kimsuky as a name used by multiple companies in their public statements on APT43’s activities.

To get a sense of the detection distribution among the samples we analyzed, we divided them into four categories:

15 or more detections,
less than 5 detections,
between 10 and 15 detections, and
between 5 and 10 detections.

The most common category was for 15 or more detections, representing 83% of all samples. On the other hand, samples with less than 5 detections came in at 9.7%. The other two categories, between 5 and 10 detections and between 10 and 15 detections, came in at 2.4% and 4.8%, respectively.
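As a rough illustration of how such a breakdown can be computed, the sketch below sorts per-sample detection counts into the four categories and reports each category's share. The post does not state which category a boundary value such as 10 falls into, so the half-open intervals used here (5 to 9, 10 to 14) are an assumption, as are the function names.

```python
from collections import Counter


def bucket(detections):
    """Map a detection count to one of the four categories (assumed bounds)."""
    if detections < 5:
        return "fewer than 5"
    if detections < 10:
        return "between 5 and 10"
    if detections < 15:
        return "between 10 and 15"
    return "15 or more"


def distribution(counts):
    """Return the percentage of samples per category, rounded to 0.1."""
    if not counts:
        return {}
    tally = Counter(bucket(c) for c in counts)
    total = len(counts)
    return {name: round(100 * n / total, 1) for name, n in tally.items()}
```

The input would be one detection count per sample, for example as exported from a VT Intelligence search; the counts themselves are not included in this post.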

MITRE ATT&CK

During our analysis of the samples using the commonalities tool, we observed several MITRE ATT&CK techniques adopted by APT43 to compromise their targets and exfiltrate data from them. VirusTotal extracts TTPs from samples by detonating them in different sandboxes and using tools such as CAPA. Among the techniques, the most prevalent ones were:

T1082: System Information Discovery
T1083: File and Directory Discovery
T1129: Execution through Module Load
T1055: Process Injection
T1071: Application Layer Protocol

Crowdsourced YARA rules

Quite a few of the analyzed samples triggered crowdsourced YARA rules. One rule that matched many of them was created by InQuest Labs: Office_Document_with_VBA_Project. It detects any Office document that contains a VBA project. While a VBA project within an Office document is normally used to automate tasks, it can also be used to perform a range of malicious actions such as stealing sensitive data, installing malware, and modifying or deleting data. It can even be used in phishing attacks where the emails contain attachments with malicious VBA macros. In its publication, Mandiant points out APT43’s use of SPICYTUNA, which is a VBA downloader. Because legitimate documents also contain VBA projects, this rule is best used as a soft signal.
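To make the idea of a soft signal concrete, here is a minimal sketch of a comparable check. It is not the InQuest rule and does not reproduce its logic: it only covers ZIP-based (OOXML) Office files, where macros are stored in a part named `vbaProject.bin`, and it ignores legacy OLE documents. The function name `has_vba_project` is hypothetical.

```python
import io
import zipfile


def has_vba_project(data):
    """Return True if the bytes are a ZIP container holding a vbaProject.bin part."""
    buffer = io.BytesIO(data)
    if not zipfile.is_zipfile(buffer):
        return False  # not an OOXML-style container
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return any(
            name.lower().endswith("vbaproject.bin")
            for name in archive.namelist()
        )
```

A positive result says only that macros are present, not that they are malicious, so it should be combined with other indicators.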

Another crowdsourced YARA rule that detected a few of the analyzed samples was one by Florian Roth, which detects the Ghost419 RAT used by the Gold Dragon malware: GoldDragon_Ghost419_RAT. Mandiant reports that the malware extracts a payload from a hwp document and writes it to a startup directory resulting in the new file being executed when the current user logs in.

A sample using the Gh0st RAT malware was detected by one of the crowdsourced rules: GhostDragon_Gh0stRAT. The malware is a backdoor written in C++ that communicates via a custom binary protocol over TCP or UDP, as reported by Mandiant.

Collections

Collections let our users group different indicators into a shareable set. This has several benefits, such as attaching a name and description, external references, attribution, etc. It also helps when working with large datasets and obtaining commonalities, telemetry and other aggregated data. In addition, we create dozens of collections every day based on OSINT security incidents and publications.

During our analysis of the samples, we observed that several of them belonged to two different collections created by AlienVaultOTX: APT43: North Korean Group Uses Cybercrime to Fund Espionage Operations and Analysis of Smoke Screen in APT campaign aimed at Korea and America. For the first collection, we observed that it contained a variety of file types such as pedll, peexe, docx, php, and powershell in that order. The second collection was different, however, as it contained only two file types. We found that the majority of the files were docx files which accounted for over 90% of the samples, and the remaining were peexe files.

Conclusions

This blog post sheds light on the activities of APT43, a threat actor operating on behalf of the North Korean regime. We have used VirusTotal’s features to explore the file type and geographical distributions, detections, ATT&CK techniques, collections, and crowdsourced YARA rules related to the threat actor’s campaign. We hope that this post has provided some insights into the capabilities and techniques of APT43, and how VirusTotal can help to monitor and investigate such campaigns.

We hope you found this useful, and please reach out to us if you have any suggestions or just want to share feedback.

Happy hunting!

VirusTotal += Deep Instinct

We welcome Deep Instinct to VirusTotal. In their own words:

"Deep Instinct is the only prevention-first cybersecurity company with a natively architected deep learning platform. We keep enterprises safe by stopping >99% of threats before other solutions even see them – at a speed and scale, unprecedented in the industry. Deep Instinct significantly reduces detection noise and false alert storms to reduce overall risk, improve SOC team productivity, and improve the total cost of ownership of our customers' entire cybersecurity stack. For more, visit www.deepinstinct.com."

Deep Instinct has expressed its commitment to follow the recommendations of AMTSO and, in compliance with our policy, facilitates this review by AV-Comparatives, an AMTSO-member tester.
