Friday, September 05, 2025


Supercharging Your Threat Hunts: Join VirusTotal at Labscon for a Workshop on Automation and LLMs

We are excited to announce that our colleague Joseliyo Sánchez will be at Labscon to present our workshop, Advanced Threat Hunting: Automating Large-Scale Operations with LLMs. This workshop is a joint effort with SentinelOne and their researcher, Aleksandar Milenkoski.

In today's rapidly evolving threat landscape, security professionals face an overwhelming tide of data and increasingly sophisticated adversaries. This hands-on workshop is designed to empower you to move beyond the traditional web interface and harness the full potential of the VirusTotal Enterprise API for large-scale, automated threat intelligence and hunting. 

We will dive deep into how you can use the VirusTotal Enterprise API with Python and Google Colab notebooks to automate the consumption of massive datasets. You'll learn how to track the behaviors of advanced persistent threat (APT) actors and cybercrime groups through practical, real-time exercises. 

A key part of our workshop will focus on leveraging Large Language Models (LLMs) to supercharge your analysis. We'll show how you can use AI to help understand complex data, build better queries, and create insightful visualizations to enrich your information for a deeper understanding of threats. 

This session is ideal for cyber threat intelligence analysts, threat hunters, incident responders, SOC analysts, and security researchers looking to automate and scale up their threat hunting workflows. 

After the workshop, we will publish a follow-up blog post that will delve deeper into some of the exercises and examples presented, providing a valuable resource for further learning and implementation. 

We look forward to seeing you at Labscon! 

(All of the scenarios are compatible with Google Threat Intelligence)

 ---- 
Conference website: https://www.labscon.io/ 
Date: September 17-20, 2025 
Registration: Invite-Only 
Place: Scottsdale, Arizona 
Duration: 3-5h

Thursday, September 04, 2025

Uncovering a Colombian Malware Campaign with AI Code Analysis

VirusTotal Code Insight keeps adding new file formats. This time, we’re looking at two vector-based formats from very different eras: SWF and SVG. Curiously, right after we rolled out this update in production, one of the very first submitted files gave us a perfect, and unexpected, example of Code Insight in action: it uncovered an undetected malware campaign using SVG files that impersonated the Colombian justice system.

Audio version of this post, created with NotebookLM Deep Dive

SWF: a blast from the past

Flash is dead: Adobe killed it in 2020, and browsers stopped supporting it shortly after. But surprisingly, SWF files still show up on VirusTotal. Whether it’s old malware resurging, retro hunting, or long-tail campaigns, they haven’t disappeared completely.

In fact, VirusTotal received 47,812 unique SWF files in the last 30 days that had never been seen before, and 466 of them were flagged as malicious by at least one antivirus engine.

SWF files are binary and compiled. That means Code Insight needs to:

  • Unpack and decompress the container (often zlib or LZMA)
  • Parse the internal tag structure
  • Extract embedded scripts, either ActionScript 2 (AVM1) or ActionScript 3 (AVM2 bytecode + decompiling/disassembling)

Once we lift those scripts into something closer to pseudocode or readable disassembly, the LLM steps in to summarize what the file is doing and why it might be suspicious.
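The first of those steps, unpacking the container, can be sketched in a few lines of Python. This is only an illustration of the publicly documented SWF header layout, not Code Insight's actual implementation: the function name `unpack_swf` is made up for this example, and the LZMA variant (ZWS) is deliberately left out.

```python
import struct
import zlib


def unpack_swf(data: bytes) -> bytes:
    """Return the SWF with an uncompressed body (signature rewritten to FWS).

    Header layout: 3-byte signature, 1-byte version, 4-byte little-endian
    length of the uncompressed file. Everything after those 8 bytes is the
    (possibly compressed) body.
    """
    if len(data) < 8:
        raise ValueError("too short to be a SWF file")
    signature, version = data[:3], data[3]
    (declared_length,) = struct.unpack("<I", data[4:8])

    if signature == b"FWS":          # already uncompressed
        return data
    if signature == b"CWS":          # zlib-compressed body
        body = zlib.decompress(data[8:])
        header = b"FWS" + bytes([version]) + struct.pack("<I", declared_length)
        return header + body
    if signature == b"ZWS":          # LZMA-compressed body, not covered here
        raise NotImplementedError("LZMA-compressed SWF is not handled in this sketch")
    raise ValueError("unknown SWF signature: %r" % signature)
```

Tag parsing and ActionScript extraction would start from the returned bytes.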

SVG: modern, open, and still abusable

SVG, on the other hand, is very much alive. It’s a standard web format: open, text-based, and everywhere, from websites to design tools to build systems. And that’s also why attackers like it.

In the last 30 days alone, VirusTotal received 140,803 unique SVG files that had never been seen before, and 1,442 of them were flagged as malicious by at least one antivirus engine. That's roughly 1% with detections, curiously the same rate as for SWF.

SVG is just XML with <svg> at the root. If it’s a .svgz, we decompress it first. From there, Code Insight looks for:

  • Embedded JavaScript in <script> tags or event handlers (onload, onclick…)
  • Redirects using javascript: URLs or location.href
  • Obfuscation tricks (CDATA, character entities, base64 payloads, etc.)

Because SVG is plain text, the challenge isn’t unpacking, it’s spotting the malicious logic hiding in plain sight.
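The checks listed above can be approximated with the Python standard library. This is a rough sketch rather than how Code Insight works internally: `find_active_content` and the shape of its result are assumptions made for this example, and for untrusted samples a hardened XML parser would be a safer choice than `xml.etree`.

```python
import gzip
import xml.etree.ElementTree as ET


def local_name(name: str) -> str:
    """Strip an XML namespace such as '{http://www.w3.org/2000/svg}script'."""
    return name.rsplit("}", 1)[-1]


def find_active_content(data: bytes) -> dict:
    """Collect inline scripts, on* event handlers, and javascript: links."""
    if data[:2] == b"\x1f\x8b":                  # gzip magic: treat as .svgz
        data = gzip.decompress(data)
    findings = {"scripts": [], "event_handlers": [], "js_urls": []}
    for elem in ET.fromstring(data).iter():
        if local_name(elem.tag) == "script":
            findings["scripts"].append(elem.text or "")   # CDATA arrives as text
        for name, value in elem.attrib.items():
            attr = local_name(name).lower()
            if attr.startswith("on"):
                findings["event_handlers"].append((attr, value))
            elif attr == "href" and value.strip().lower().startswith("javascript:"):
                findings["js_urls"].append(value)
    return findings
```

A file with empty results is not necessarily clean; this only surfaces the places where script logic can live.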

Let’s see a couple of examples:

When a SWF is flagged, but isn’t malicious

One common challenge in threat analysis is dealing with files that trigger detections in just a few antivirus engines. They’re not clean, but they’re not clearly malicious either. These gray areas force analysts to dig deeper, often wasting time chasing false positives.

The SWF file in the screenshot is a perfect example.

350422c3915a8a1a1336147f89061b25c8354af58db0050e2f9ef2b384e59f62

It was flagged by 3 out of 63 engines. Enough to raise doubts, but not conclusive. The detections mention known SWF heuristics and an old CVE.

Thanks to Code Insight, we can quickly understand what’s going on. It identifies the SWF as a complex ActionScript-based game, including 3D rendering, sound management, and a full level editor. The analysis also explains why the file might look suspicious: it uses obfuscated classes and cryptographic functions (like RC4 and AES), and gathers system details, techniques often associated with malware, but also common in Flash games to enforce DRM or prevent tampering.

The verdict? No malicious behavior was observed, and now we know why it looked suspicious in the first place.

This kind of context is exactly what Code Insight is designed for: saving time, reducing uncertainty, and helping you focus on real threats.


When AV misses, but Code Insight doesn’t

This second example shows the other side of the coin: a malicious SVG file that evaded all antivirus engines, going completely undetected on VirusTotal. On the surface, it looks clean, but a quick look with Code Insight tells a very different story.

1527ef7ac7f79bb1a61747652fd6015942a6c5b18b4d7ac0829dd39842ad735d

According to Code Insight: “This SVG file executes an embedded JavaScript payload upon rendering. The script decodes and injects a Base64-encoded HTML phishing page impersonating a Colombian government judicial system portal. To deceive the user, it simulates a file download with a progress bar, while in the background, it decodes a second, large Base64 string, which is a malicious ZIP archive, and forces its download.”

We validated this behavior by opening the sample in a controlled environment. As shown in the screenshots below, the fake portal is rendered exactly as described, simulating an official government document download process. The phishing site includes case numbers, security tokens, and visual cues to build trust, all of it crafted within an SVG file.


Despite its zero detections, this SVG hides two layers of abuse:

  • A convincing phishing lure, injected via inline JavaScript and decoded on-the-fly
  • A malware dropper, silently extracting and triggering the download of a ZIP file in the background

This is exactly the kind of threat Code Insight is meant to catch: well-crafted, script-based attacks that fly under the radar.
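To make the two layers concrete, here is a minimal sketch of how one might triage Base64 strings pulled out of a script: decode each long run and look at the first bytes. The function names, the 40-character threshold, and the small set of recognized types are all assumptions chosen for illustration.

```python
import base64
import binascii
import re

# Runs of Base64 alphabet characters, optionally padded.
B64_RUN = re.compile(r"[A-Za-z0-9+/]{40,}={0,2}")


def sniff(decoded: bytes) -> str:
    """Guess the payload type from its leading bytes."""
    if decoded.startswith(b"PK\x03\x04"):        # ZIP local file header
        return "zip"
    head = decoded[:64].lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "html"
    return "unknown"


def classify_base64_blobs(text: str) -> list:
    """Return (kind, decoded_size) for every decodable Base64 run in text."""
    results = []
    for match in B64_RUN.finditer(text):
        blob = match.group(0)
        blob = blob[: len(blob) - len(blob) % 4]  # keep whole 4-char groups
        try:
            decoded = base64.b64decode(blob, validate=True)
        except (binascii.Error, ValueError):
            continue
        results.append((sniff(decoded), len(decoded)))
    return results
```

An SVG whose script carries both an HTML blob and a ZIP blob matches the lure-plus-dropper pattern described above.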

A deeper look: from one SVG to a full campaign

Curiously, the malicious SVG we highlighted earlier wasn’t just any random sample, it was one of the very first files submitted right after we deployed SVG support in Code Insight. A coincidence? Or were we seeing the tip of something bigger?

Thanks to VirusTotal Intelligence, we can search through our massive sample collection using hundreds of parameters, including queries that look inside Code Insight reports. So we ran:

type:svg AND codeinsight:"Colombian"


And voilà: 44 unique SVG files surfaced, all undetected by antivirus engines, but all flagged by Code Insight as part of the same phishing and malware campaign.
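The same search can be scripted. The sketch below assumes the VirusTotal API v3 intelligence search endpoint and an API key with access to it; the helper names are invented for this example and the response is returned as parsed JSON without further interpretation.

```python
import json
import urllib.parse
import urllib.request

SEARCH_URL = "https://www.virustotal.com/api/v3/intelligence/search"


def build_search_request(query: str, api_key: str, limit: int = 10):
    """Prepare a GET request for an intelligence search."""
    params = urllib.parse.urlencode({"query": query, "limit": limit})
    return urllib.request.Request(
        SEARCH_URL + "?" + params,
        headers={"x-apikey": api_key, "Accept": "application/json"},
    )


def run_search(query: str, api_key: str, limit: int = 10) -> dict:
    """Send the request and return the decoded JSON body."""
    with urllib.request.urlopen(build_search_request(query, api_key, limit)) as resp:
        return json.load(resp)
```

For the query above, the call would be `run_search('type:svg AND codeinsight:"Colombian"', api_key)`.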

Diving into the source code of these SVGs, we found:

  • Code obfuscation techniques
  • Use of polymorphism, with slight changes in every file
  • And large amounts of dummy (garbage) code to increase entropy and evade static detection.

But Code Insight had no problem cutting through the noise.

One thing stood out: the attackers left Spanish-language comments in their scripts, with phrases like "POLIFORMISMO_MASIVO_SEGURO" and "Funciones dummy MASIVAS". While most of the code changed from sample to sample, those comments stayed exactly the same, a clear weakness, and a perfect signature for a simple YARA rule.


So we wrote a very basic one:


Running a retrohunt over the last year with this basic rule returned 523 matches.
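The rule itself is not reproduced in this text. As a rough Python stand-in for the idea (plain string matching on the two comments quoted above), something like the following would work; it is not the rule we ran, and treating a single marker as sufficient is an assumption.

```python
# The two comment strings quoted in the post, as raw bytes.
CAMPAIGN_MARKERS = (
    b"POLIFORMISMO_MASIVO_SEGURO",
    b"Funciones dummy MASIVAS",
)


def matched_markers(data: bytes) -> list:
    """Return the campaign markers present in a file's raw bytes."""
    return [marker.decode() for marker in CAMPAIGN_MARKERS if marker in data]


def looks_like_campaign(data: bytes) -> bool:
    """Assumption: any one marker is enough to flag the file."""
    return bool(matched_markers(data))
```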


Sorting by submission time, the first sample dates back to August 14, 2025, also submitted from Colombia, and also with 0 antivirus detections at the time.


We reanalyzed that first sample with the current version of Code Insight, and again, it produced an accurate description of the phishing page and malware dropper, impersonating the Colombian Fiscalía General de la Nación.

Looking deeper, we saw that the earliest samples were larger, around 25MB, and the size decreased over time, suggesting the attackers were evolving their payloads. Most importantly, the distribution vector was email, allowing us to pivot into delivery metadata: senders, subjects, attachment names, and more.


Final thoughts

SWF and SVG are very different formats from very different eras, but both can still cause headaches for analysts.

In the first case, Code Insight helped explain why a SWF file looked suspicious without actually being malicious. In the second, it uncovered malicious behavior in an SVG that had gone completely undetected.

This is where Code Insight helps most: giving context, saving time, and helping focus on what really matters. It’s not magic, and it won’t replace expert analysis, but it’s one more tool to cut through the noise and get to the point faster. And when Code Insight and VirusTotal Intelligence work together, one suspicious sample can become the key to revealing an entire campaign.

Thursday, August 28, 2025


Integrating Code Insight into Reverse Engineering Workflows

More than two years have passed since we announced the launch of Code Insight at RSA 2023. Since then, we have been applying this technology in different scenarios and extending it to new file formats.

As we advance in the automated analysis of new files with Code Insight, we want to offer an alternative that enables the integration of this type of technology into the analysis of disassembled or decompiled code.


Audio version of this post, created with NotebookLM Deep Dive

To that end, we have created a new endpoint that receives code requests and returns a description of its functionality, highlighting the most relevant aspects for malware analysts. This endpoint can be used to query code blocks, chaining previous analyses with modifications or corrections made by the analyst. This significantly reduces the reverse engineering workload by providing the analyst with an assistant that pre-analyzes functions deemed interesting, acquiring knowledge as the analysis proceeds.

This endpoint can be integrated into any reverse engineering tool that processes disassembled or decompiled code. As an implementation example, the VirusTotal plugin for IDA Pro has been updated to support its use from the IDA interface. This offers a simple way to integrate relevant analyses into a notebook, allowing the analyst to keep responses that play a direct role in understanding how the code works.

Endpoint for reversed code queries

Using this new endpoint is quite simple—just make a request to the API as shown in the following example:

import requests

API_URL = 'https://www.virustotal.com'
endpoint = 'api/v3/codeinsights/analyse-binary'

API_KEY = '<your API key>'                # placeholder
code_base64 = '<Base64-encoded code>'     # placeholder: the code to analyze

headers_apiv3 = {
    'Accept': 'application/json',
    'Content-Type': 'application/json',
    'x-apikey': API_KEY,
}

payload = {
    'code': code_base64,
    'code_type': 'disassembled',          # or 'decompiled'
}

# The payload travels under the 'data' key of the JSON body.
response = requests.post(f'{API_URL}/{endpoint}',
                         json={'data': payload},
                         headers=headers_apiv3)


This Python code corresponds to a request to the endpoint located at ‘https://www.virustotal.com/api/v3/codeinsights/analyse-binary’, in which the code to be analyzed is included in the ‘payload’ variable as follows:

payload = {
    'code': code_base64,
    'code_type': 'disassembled',  # or 'decompiled'
    # Previous requests and their (possibly analyst-edited) responses
    'history': [
        {
            'request': code_base64,   # an earlier Base64-encoded request
            'response': {
                'summary': text,
                'description': text,
            },
        },
        {
            'request': code_base64,
            'response': {
                'summary': text,
                'description': text,
            },
        },
    ],
}


The request is divided into two parts: the first includes the code being analyzed (‘code’ and ‘code_type’), and the second includes previous requests—potentially reviewed by the analyst—that provide context for analyzing the queried code.

This request returns a general description of how the submitted code snippet works ("summary") and a more detailed account of how that functionality is carried out ("description"). The analyst can then quickly check whether the function contains any behavior of interest, and either review the execution steps or discard the function as irrelevant.
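Putting the two parts of the request together, a small helper might look like the sketch below. The helper names are hypothetical, and two details are assumptions: that the code is standard Base64 over UTF-8 text, and that 'history' can simply be left out when there is nothing to send (as in the first example).

```python
import base64

VALID_CODE_TYPES = ("disassembled", "decompiled")


def encode_code(code_text: str) -> str:
    """Base64-encode a code listing (assumed: standard alphabet, UTF-8)."""
    return base64.b64encode(code_text.encode("utf-8")).decode("ascii")


def build_request_body(code_text: str, code_type: str, history=()) -> dict:
    """Build the JSON body: the queried code plus previously reviewed analyses.

    Each history item is a dict with 'code', 'summary', and 'description'.
    """
    if code_type not in VALID_CODE_TYPES:
        raise ValueError("code_type must be one of %s" % (VALID_CODE_TYPES,))
    payload = {"code": encode_code(code_text), "code_type": code_type}
    if history:
        payload["history"] = [
            {
                "request": encode_code(item["code"]),
                "response": {
                    "summary": item["summary"],
                    "description": item["description"],
                },
            }
            for item in history
        ]
    return {"data": payload}
```

The returned dictionary can be passed as the `json=` argument of the `requests.post` call shown earlier.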

New version of the VT-IDA Plugin for IDA Pro

Along with this new endpoint, we have updated the VirusTotal plugin to show how this new functionality can be integrated into the analyst's workflow.

This new functionality can be used as follows:
  1. The analyst selects a function from the disassembled or decompiled code to be analyzed.
  2. If the response provided by the endpoint is satisfactory and reveals an interesting function, they can click ‘Accept’ to include it in a list of selected functions, which we call the ‘CodeInsight Notebook’. They can also make modifications to the ‘Summary’ and ‘Description’ fields to correct errors or add information that helps put the code in context.
  3. With each new request sent to the endpoint, all previously stored functions are included—along with any modifications made by the analyst. This allows for more accurate analyses based on previously obtained and reviewed results.

Here’s how the new version of the plugin looks after a few iterations on a malware sample:



A practical example

Let's illustrate the benefits of the new plugin with a practical example. Imagine an analyst needs to analyze a malicious binary file to understand its function. This is typically a time-consuming and complex process, but with the help of Code Insight, their workflow becomes significantly more efficient:

  1. Targeted Analysis: The analyst selects a code block they suspect might be malicious and uses the endpoint to get an automated analysis.

    The code shown below implements an anti-disassembly technique aimed at generating disassembled code that hides malicious functionality through a hidden jump to a memory address. Essentially, the resulting disassembled code is unreliable, as it doesn’t accurately represent the code that will actually be executed.



  2. Review and Refinement: At this point, a request is made to obtain an initial analysis of the code. The analyst reviews the response and can modify both the ‘Summary’ and ‘Description’ fields with their own notes or corrections.

    In this case, the obtained code analysis correctly identifies an anti-disassembly technique that modifies the return address. However, it does not provide information about a possible return address that would help the analyst locate the hidden code.

    At this point, the analyst can modify the output provided by the endpoint to explain how this technique works. This way, the acquired knowledge can be used in the analysis of other code blocks within the sample. To do so, the analyst simply needs to include the (reviewed) analysis in the list of analyzed functions by clicking the ‘Accept’ button.



  3. Iterative Analysis and Improved Results: The file analysis continues in such a way that, with each new request, the list of analyzed functions is sent—effectively representing the knowledge acquired from analyzing the code selected by the analyst.


And as shown in the previous image, this knowledge is used in other function queries that employ a technique similar to the one previously discussed—this time providing more details about how it works and alerting the analyst to the possibility of jumping to an address containing hidden code.
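The accept, edit, and resend loop described in this example can be modeled in a few lines. This is a conceptual sketch only and does not reflect the plugin's code: the class and method names are invented, and the stored fields simply mirror the 'history' entries of the request.

```python
from dataclasses import dataclass, field


@dataclass
class NotebookEntry:
    request: str        # Base64-encoded code exactly as it was sent
    summary: str
    description: str


@dataclass
class CodeInsightNotebook:
    entries: list = field(default_factory=list)

    def accept(self, request, summary, description,
               summary_edit=None, description_edit=None):
        """Store an analysis, preferring the analyst's corrections if given."""
        self.entries.append(NotebookEntry(
            request=request,
            summary=summary_edit if summary_edit is not None else summary,
            description=(description_edit if description_edit is not None
                         else description),
        ))

    def as_history(self) -> list:
        """Render accepted entries in the shape of the 'history' field."""
        return [
            {"request": e.request,
             "response": {"summary": e.summary, "description": e.description}}
            for e in self.entries
        ]
```

Each new query would send `notebook.as_history()` along with the new code, so corrections made on one function inform the analysis of the next.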

Quick Tips

The endpoint offers some interesting features for the analyst. For example, as shown in the following figure, the presence of strings written in languages other than English has been detected, providing a translation and pinpointing their location in memory.



On the other hand, while analyzing assembly code has its own pros and cons compared to decompiled code, we can gain additional benefits by analyzing a decompiled function whose disassembled code has been previously analyzed and stored in the CodeInsight Notebook.

For example, let's look at the decompiled code of a function previously analyzed in its disassembled version:


The image below illustrates how analyzing a decompiled function becomes richer with the help of the previously stored analysis of its disassembled code. This happens because certain features, like text strings, are visible in the disassembled code but often missing from the decompiled version.

As a result, Code Insight can provide a more concise and direct explanation by leveraging the decompiled view, which is supported by the disassembled code.



It is important to highlight that both the endpoint and this new feature of the plugin for IDA Pro are offered in trial mode, with the aim of involving the community in the progress we are making in its application to the field of reverse engineering. Although the results produced by this new functionality have been very positive during the testing phase, it is possible that the output generated by the endpoint may not be 100% accurate and could contain errors or omit some relevant details of the analysis.

We are confident that this new integration will be a great help to analysts who are gradually incorporating LLM capabilities into their workflow. As we continue to harness the power of AI, your feedback is incredibly valuable to us. Stay connected for future updates, and thank you for your continued support.


Monday, August 25, 2025

Applying AI Analysis to PDF Threats

In our previous post we extended VirusTotal Code Insight to browser extensions and supply-chain artifacts. A key finding from that analysis was how our AI could apply contextual knowledge to its evaluation. It wasn’t just analyzing code in isolation; it was correlating a package's stated purpose (its name and description) with its actual behavior, flagging malicious logic that contradicted its public description. We’re now applying the same idea to one of the most common file formats in the world, the PDF.


Audio version of this post, created with NotebookLM Deep Dive

PDFs are multi-layered. There’s the object tree (catalog, pages, objects, streams, actions, embedded files) and there’s the visible layer (the text and images the user reads). Code Insight analyzes both, then correlates them: do the document's content, claims, and branding make sense given its internal behaviors? That lets us surface not only classic PDF exploitation (e.g., auto-actions, JS, external launches) but also pure social engineering (phishing, vishing, QR lures), even when the file has no executable logic. This dual approach allows the AI not only to detect malicious code but also to identify sophisticated scams.

Let's look at real-world samples surfaced by Code Insight during its initial testing phase. We'll start with cases where the PDF contains no malicious code, which traditional engines often miss because there's no executable payload to detect. This is where Code Insight proves useful, identifying clear signs of fraud and social engineering that aim to manipulate the user, not the machine.


Case 1 - Fake debt collection targeting financial fraud

This PDF is a real-world sample sent to VirusTotal and captured by Code Insight during early testing. It was flagged as malicious based entirely on its visible content, without relying on any embedded code or execution logic. The file was marked as clean by all other engines, likely because it contains no scripts, exploits, or embedded payloads.

d92a1a7460c580f8bf6af3cbd39c7840cfe6a146ee15ede8e23c50c2a85becb9

The document pretends to be a debt collection notice from a German agency acting on behalf of Amazon. It includes a formal layout, legal threats, payment instructions, and multiple references to German addresses and regulations. Visually, it looks legitimate.


However, the AI flagged it as fraudulent based on several critical inconsistencies, the most important one being the destination bank account. The payment is requested to an IBAN starting with BG, indicating a Bulgarian account. This contradicts the sender's claimed German identity and would be highly unusual for a legitimate German debt agency. This mismatch alone was enough for Code Insight to classify the file as fraudulent. Additional content cues (urgent tone, fee breakdown, legal pressure) support the assessment.

As described in the Code Insight analysis:

“The visual and textual content confirms the document is a sophisticated phishing attack. It masquerades as an urgent payment demand from a German debt collection agency, supposedly on behalf of Amazon. The document employs high-pressure tactics, including threats of legal action, additional fees, and credit score damage, to compel the recipient to act quickly. The primary and most conclusive indicator of fraud is the demand for payment to a Bulgarian bank account, which is a stark and highly irregular contradiction to the agency's purported German location and registration.”

This is a case where AI adds value by reasoning over the semantics of the content, not the file structure.
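The IBAN mismatch is easy to express as a mechanical check, even though Code Insight reaches it through reasoning over the content rather than a rule like this. The sketch below extracts IBAN-like strings, validates them with the standard mod-97 checksum, and reports those whose country code differs from the claimed one. The function names are made up, and the IBANs used with it should be the well-known published example numbers, not data from the sample.

```python
import re

# Country code, two check digits, then 11 to 30 alphanumerics (spaces allowed).
IBAN_CANDIDATE = re.compile(r"\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]){11,30}\b")


def is_valid_iban(iban: str) -> bool:
    """Mod-97 check: move the first four characters to the end, map letters
    to numbers (A=10 ... Z=35), and expect a remainder of 1."""
    rearranged = iban[4:] + iban[:4]
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


def ibans_outside(text: str, claimed_country: str) -> list:
    """Return valid IBANs whose country code is not the claimed one."""
    found = []
    for match in IBAN_CANDIDATE.finditer(text):
        iban = match.group(0).replace(" ", "")
        if is_valid_iban(iban) and iban[:2] != claimed_country:
            found.append(iban)
    return found
```

A notice that claims a German sender but yields a non-empty result for `ibans_outside(text, "DE")` shows the same contradiction highlighted in the analysis.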


Case 2 - QR-based phishing (quishing) campaign

This is another real-world PDF captured during early testing of Code Insight. At the time of analysis, no antivirus or malware detection engines flagged the file as malicious. The PDF has no embedded scripts, exploits, or execution logic. From a technical perspective, it looks benign.

259e202847d04866acd76427f53bfd9a15372ed6ed56a9e54ba1c62442c945ee

The visible content, however, impersonates an HR notification about a salary increase. It includes multiple social engineering red flags: awkward grammar, lack of personalization, and an irrelevant privacy disclaimer. The only call to action is a QR code, encouraging the recipient to scan it for more details.


Code Insight analyzed and decoded the QR code, extracting the hidden URL. The domain is non-corporate and clearly unrelated to HR or payroll systems. The combination of deceptive HR messaging with a QR code that conceals a phishing URL confirms the document is a credential harvesting fraud delivered via PDF.


Case 3 - Vishing via fake PayPal alert

This is another real-world PDF flagged by Code Insight during early evaluation. No antivirus or malware detection engines classified the file as malicious. Structurally, it’s simple and inert: there are no scripts, automatic actions, or embedded links. Minor stream decoding errors are present but considered low-risk anomalies.

d0bedc70085efff5218b901cdaba95d565df867495181544041ba4b8a6019cea


The threat lies entirely in the content. The document impersonates PayPal and trusted brands like Visa to deliver a fake security alert about a high-value unauthorized purchase. The language is urgent and designed to induce panic.

According to Code Insight:

“[...]the visual content of the document is a clear social engineering lure designed for a voice phishing (vishing) attack. [...] The document's sole purpose is to persuade the user to call a specific phone number under the pretense of canceling the fraudulent order. The malicious nature is confirmed by several red flags, including an awkwardly phrased greeting and a phone number with a geographic area code (808) that is deceptively labeled as "Toll-Free." This tactic aims to route the victim to a scammer for social engineering and potential fraud.”
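The "Toll-Free" red flag can be checked mechanically: North American toll-free numbers use a small, fixed set of area codes, and 808 is not one of them. The following is a minimal sketch with an invented function name and a deliberately simple phone-number pattern; the number used with it should be fictional.

```python
import re

# Area codes reserved for toll-free service in the North American Numbering Plan.
TOLL_FREE_CODES = {"800", "833", "844", "855", "866", "877", "888"}

# Optional +1 / 1 prefix, then a 3-3-4 digit number with common separators.
PHONE = re.compile(r"(?:\+?1[\s.-]?)?\(?(\d{3})\)?[\s.-]?\d{3}[\s.-]?\d{4}")


def mislabeled_toll_free(text: str) -> list:
    """Return area codes of numbers in a 'toll-free' text that are not toll-free."""
    lowered = text.lower()
    if "toll-free" not in lowered and "toll free" not in lowered:
        return []
    return [code for code in PHONE.findall(text) if code not in TOLL_FREE_CODES]
```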


Case 4 - Fake Tax Refund from the Australian Taxation Office

As with previous cases, this PDF wasn’t flagged by any antivirus engine in VirusTotal, but Code Insight identified it as a phishing lure that impersonates the Australian Taxation Office.

b9b763e4b091bc59e9b9f355617622dbabdc1ff2de6707a94ccb26aa7682300e


As described by Code Insight:

“This document is a phishing lure designed to impersonate the Australian Taxation Office (ATO). The visual layer uses an authentic-looking government logo and the promise of a tax refund to entice the recipient into clicking an "Access Document" button. The purpose is to have the user provide an electronic signature for a supposed refund authorization, creating a sense of urgency and financial incentive. The document exhibits multiple red flags common to phishing attacks. These include a generic greeting, a suspicious reference to a .doc file (a common malware vector), instructions that discourage direct replies, and a complete lack of legitimate contact information or alternative methods for verification. The entire premise relies on tricking the user into clicking the button, which likely leads to a malicious website for credential theft or malware download.”


Auto-executing PDF Posing as a Movie Download

Unlike previous examples, this PDF was flagged by 13 antivirus engines in VirusTotal. In this case, the attack is embedded both in the internal structure of the file and its visual appearance. Code Insight correlates these two layers, the technical and the social, to expose the malicious intent.

44e653fe79d1ab160c784c06f4d99def6419e379ef3f802af9f48d595976d2c7


As described by Code Insight:

“The document presents a social engineering lure, masquerading as a download page for pirated movies […] to entice users into clicking links. This theme, centered on illegal content distribution, is a common tactic for malware delivery. Technical analysis of the PDF's internal structure corroborates the malicious intent. The file is configured with an /OpenAction command, a high-risk feature designed to automatically execute an action upon the document being opened […] The combination of a deceptive, high-risk theme with an automatic execution function indicates that the document’s purpose is to compromise the user's system.”
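On the structural side, a first-pass triage for keys such as /OpenAction can be done by scanning the raw bytes. The sketch below is a heuristic only, not a PDF parser and not Code Insight's method: the key list and function names are choices made for this example, and anything stored inside compressed object streams will be missed.

```python
import re

# Keys commonly associated with automatic actions, scripts, and embedded files.
RISKY_KEYS = (b"/OpenAction", b"/AA", b"/JavaScript", b"/JS",
              b"/Launch", b"/EmbeddedFile")


def decode_name_escapes(data: bytes) -> bytes:
    """Undo #xx hex escapes, which can disguise names (e.g. /Open#41ction)."""
    return re.sub(rb"#([0-9A-Fa-f]{2})",
                  lambda m: bytes([int(m.group(1), 16)]), data)


def count_risky_keys(data: bytes) -> dict:
    """Count occurrences of each risky key in the raw PDF bytes."""
    normalized = decode_name_escapes(data)
    counts = {}
    for key in RISKY_KEYS:
        # Require a non-alphanumeric character after the key so that
        # /JS does not match a longer name that merely starts with it.
        pattern = re.escape(key) + rb"(?![A-Za-z0-9])"
        counts[key.decode()] = len(re.findall(pattern, normalized))
    return counts
```

A non-zero /OpenAction count is not malicious by itself; it is the combination with the lure content that matters in this case.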

We are actively improving Code Insight based on what we learn from these early cases. PDF is the 6th most common file type submitted to VirusTotal, with around 100,000 new samples uploaded every day. That volume requires us to be strategic: for now, only a selected percentage of PDF files submitted via the public web interface are processed by Code Insight, as we test, tune, and scale the system.

These first results are helping us refine both effectiveness and performance. We’ll continue expanding coverage as we improve detection of threats.

Thursday, August 14, 2025

Code Insight Expands to Uncover Risks Across the Software Supply Chain

When we launched Code Insight, we started by analyzing PowerShell scripts. Since then, we have been continuously expanding its capabilities to cover more file types. Today, we announce that Code Insight can now analyze a broader range of formats crucial to the software supply chain. This includes browser extensions (CRX for Chrome, XPI for Firefox, VSIX for VS Code), software packages (Python Wheel, NPM), and protocols like MCP that enable Large Language Models to interact with external tools.


Audio version of this post, created with NotebookLM Deep Dive

Attackers are increasingly targeting these formats to distribute malware, steal data, or compromise systems. Traditional detection methods, which often rely on signatures or machine learning focused on classification, can struggle to keep up with the dynamic and obfuscated nature of these threats. This is where AI can make a real difference. By analyzing the underlying code logic, Code Insight can identify malicious behavior even in previously unseen threats, providing a deeper level of security analysis.

This is particularly relevant in a landscape where even a single malicious browser extension can lead to significant data breaches, financial loss, or the compromise of corporate networks.


A Viral Tweet and a Real-World Example

In the last few hours, a tweet from a seasoned crypto user (zak.eth) went viral, narrating how his wallet was drained by a malicious extension for the first time in over ten years of activity. This incident is a stark reminder that anyone can be a target.


This is a prime example of where Code Insight can be instrumental. It can analyze one of the suspicious extensions mentioned in the thread and reveal its malicious nature:

From here, we will explore different examples of the new formats supported by Code Insight and specific examples where traditional engines fail to detect a threat.


CRX (Chrome Extensions)

CRX files are the format used for packaging Google Chrome browser extensions. While they can enhance browsing, they also represent an attack vector if they contain malicious code. Here is an example of a seemingly legitimate "Norton Safe Search" extension. However, Code Insight's analysis reveals its true, malicious purpose:

6ca4466baf5ff09bab90a5d06bf113667717400daa59a287393e8f3f10959aba

The extension is obfuscated to hide its true purpose. The code in js/background.js communicates with a command and control (C2) server located at a domain unrelated to Norton. The most critical malicious behavior is its capability to fetch and execute arbitrary code from the C2 server. This allows the attacker to dynamically change the extension's functionality after installation, effectively turning the user's browser into a bot.

In another case, a banking trojan targeting Westpac customers was identified:

34244257f633e104d06b0c4273caca96eb916d26540eeea68495707cbc920bdb

This extension is a banking trojan specifically targeting Westpac customers. It operates as a Man-in-the-Browser (MitB) malware to steal credentials, session data, and funds. It establishes a persistent WebSocket connection to a hardcoded C2 server, collects all cookies from the browser and intercepts form submissions, specifically targeting the input field for the 'AuthorisationCode' (a 2FA/OTP token).
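Files such as js/background.js live inside the extension package, so any analysis starts by getting to the archive. A CRX file is a ZIP archive preceded by a small header; the sketch below skips that header for the CRX2 and CRX3 layouts and opens the ZIP. The function name is hypothetical and signature verification is intentionally not attempted.

```python
import io
import json
import struct
import zipfile


def open_crx(data: bytes) -> zipfile.ZipFile:
    """Skip the CRX header and return the embedded ZIP archive."""
    if data[:4] != b"Cr24":
        raise ValueError("not a CRX file")
    (version,) = struct.unpack("<I", data[4:8])
    if version == 3:
        # CRX3: magic, version, header length, header, then the ZIP.
        (header_size,) = struct.unpack("<I", data[8:12])
        zip_start = 12 + header_size
    elif version == 2:
        # CRX2: magic, version, key length, signature length, key, signature, ZIP.
        key_len, sig_len = struct.unpack("<II", data[8:16])
        zip_start = 16 + key_len + sig_len
    else:
        raise ValueError("unsupported CRX version: %d" % version)
    return zipfile.ZipFile(io.BytesIO(data[zip_start:]))


def read_manifest(data: bytes) -> dict:
    """Return the parsed manifest.json of a CRX package."""
    with open_crx(data) as archive:
        return json.loads(archive.read("manifest.json"))
```

XPI, VSIX, and WHL packages are plain ZIP archives, so `zipfile.ZipFile` can open them directly without the header step.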


VSIX (Visual Studio Code Extensions)

VSIX files are used for extensions in Visual Studio Code, a popular code editor. Developers can be targeted through these extensions, potentially compromising their development environment and projects.

A deceptive "Zoom" extension for VS Code was found to be stealing user data:

5c89ba9e1bbb7ef869e4553081a40cabbd91a70506d759fd4e97eefb0434c074

The extension attempts to access sensitive user data by reading browser cookies from a known local SQLite database file. It also includes functionality to make external network requests to an unusual domain, which could be used to exfiltrate the collected sensitive data. This combination of local data collection and external communication is an indicator of malicious intent, specifically information theft.


XPI (Firefox Extensions)

XPI files are used for Firefox browser add-ons. Similar to Chrome extensions, they can be used to distribute malware.

A "Mass Tiktok Video Downloader" extension was found to be a phishing and data exfiltration tool:

2c0c8bd05a4942b389feaeb02c372b6443efac9d0931e0bdc602474178b54e7f

It presents a fake Facebook password confirmation popup to phish user credentials. Concurrently, its background script actively collects all browser cookies. All collected data, including the phished passwords, are exfiltrated to a Telegram bot API endpoint.


WHL (Python Wheel)

WHL files are a standard for distributing Python packages. The threats in these examples are not limited to intentionally malicious code; they also include packages with critical vulnerabilities or insecure coding patterns that can be exploited in supply chain attacks.

An "hh-applicant-tool" designed to interact with an API was found to have a suspicious telemetry feature:

1a168e47cb2d81f54fe504e66e353251a772164959ec71517d2070bf96fee957

It collects data, including vacancy details, employer information, and Google Docs links found in messages, and sends it to a custom server. This communication explicitly disables SSL certificate verification (verify=False), making the data transfer vulnerable to Man-in-the-Middle attacks.
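To make the risk concrete, the sketch below uses only Python's standard library to contrast a default TLS context, which validates the certificate chain and hostname, with a context configured roughly the way verify=False behaves. It does not reproduce the package's own code; the helper names are hypothetical and exist only for illustration.

```python
import ssl

def verified_context() -> ssl.SSLContext:
    # Default context: certificate chain and hostname are both checked.
    return ssl.create_default_context()

def unverified_context() -> ssl.SSLContext:
    # Roughly what verify=False amounts to: nothing about the peer is checked.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False       # must be disabled before relaxing verify_mode
    ctx.verify_mode = ssl.CERT_NONE
    return ctx

def is_mitm_resistant(ctx: ssl.SSLContext) -> bool:
    # A connection resists interception only if both checks are enabled.
    return ctx.check_hostname and ctx.verify_mode == ssl.CERT_REQUIRED
```

With the second context, anyone on the network path can present their own certificate and read or alter the telemetry in transit.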

In another instance, a package named "ncatbot" contained a critical security vulnerability:

f2714f6b87689c4d631a587813d14c4e463be7251bf16ff383ad2b7940ca7a4d

A critical security vulnerability exists in the Linux installation process, which executes a remote script with root privileges using curl | sudo bash. This allows for arbitrary code execution and system compromise if the remote script is malicious or its source is compromised.
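One common mitigation for this pattern is to download the installer first and compare it against a pinned digest before running anything. The following is a minimal sketch of that check, assuming the publisher distributes a known SHA-256 for the script; the function name is hypothetical and this is not how the package itself behaves.

```python
import hashlib
import hmac

def script_matches_pin(script_bytes: bytes, expected_sha256: str) -> bool:
    # Hash the downloaded installer and compare it with the pinned digest.
    actual = hashlib.sha256(script_bytes).hexdigest()
    # compare_digest avoids leaking where the two strings differ.
    return hmac.compare_digest(actual, expected_sha256.lower())
```

Only when the check passes would the script be handed to a shell, ideally without root privileges unless they are strictly required.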


NPM (Node Package Manager)

NPM is the default package manager for Node.js and is central to the JavaScript ecosystem. Malicious NPM packages are a constant threat to developers and applications.

A package named "serverless-shop-functions" presented as a benign e-commerce application but contained two malicious Python scripts:

8f7a061901c935493e17f3f897a2b98b5ab21350593fda10a6936a84db5b28b7

Backdoor.Python.PolymorphNecro.h is identified as a polymorphic IRC botnet client whose capabilities include network sniffing, ARP poisoning, and various DDoS attack methods. Main.py is a Discord-controlled Remote Access Trojan (RAT) with extensive capabilities, including establishing persistence, executing arbitrary PowerShell commands, and capturing and exfiltrating screenshots and webcam photos.


PyPI (Python Package Index)

PyPI is the official third-party software repository for Python. It's a common target for attackers looking to distribute malicious packages. However, the threat also comes from packages that, while not intentionally malicious, contain critical vulnerabilities in their design.

A package named python-mcp-client was found to have severe vulnerabilities allowing for remote code execution:

83c4c8d38e3eea555666e26ed85953b7479d46d9b4d2c12c521ae5f505b343d2

The package exposes severe vulnerabilities that allow for remote code execution (RCE) and arbitrary file system operations. The flask_app.py component allows users to dynamically add new MCP servers via the /api/add_server endpoint. This endpoint directly accepts user-provided command and args parameters, enabling an attacker to execute arbitrary shell commands on the host system.
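A safer design for this kind of endpoint is to let callers pick a server by name from a fixed registry instead of supplying command and args directly. The sketch below illustrates the idea under that assumption; the registry contents and function name are hypothetical and are not taken from python-mcp-client.

```python
# Hypothetical registry: the only command lines the service will ever launch.
REGISTERED_SERVERS = {
    "filesystem": ["python3", "-m", "example_fs_server"],
    "search": ["node", "example_search_server.js"],
}

def resolve_server(name: str) -> list[str]:
    """Map a user-supplied server name to a fixed argv list."""
    try:
        # Return a copy so callers cannot mutate the registry.
        return list(REGISTERED_SERVERS[name])
    except KeyError:
        raise ValueError(f"unknown MCP server: {name!r}") from None
```

The returned list can be passed to subprocess.run without shell=True, so user input never reaches a shell or chooses the executable.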

By expanding Code Insight's capabilities, we aim to provide the cybersecurity community with a tool to better understand and mitigate the evolving threats within the software supply chain. Stay tuned as we continue to enhance our platform to counter new attack vectors.

Wednesday, June 04, 2025

YARA-X 1.0.0: The Stable Release and Its Advantages

Short note for everyone who already lives and breathes YARA:

Victor (aka plusvic) just launched YARA-X 1.0.0. Full details: https://virustotal.github.io/yara-x/blog/yara-x-is-stable/


Audio version of this post, created with NotebookLM Deep Dive

What changes for you

| Area | YARA 4.x | YARA-X |
|---|---|---|
| Engine | C/C++, manual memory | Rust, memory-safe |
| Rule compatibility | | ~99 % work as-is |
| Speed (regex / loops) | Can bottleneck scans | Often 5–10× faster |
| Error messages | Generic | Line-accurate, clearer |
| CLI | Plain text | Colour, JSON/YAML dump, shell completion |
| Future work | Bug-fix only | New features land here |


Why move now

  • Performance – heavy rules (large regex, deep loops) finish seconds faster.
  • Safety – Rust core avoids the usual memory bugs and makes crashes rare.
  • Maintainability – parser and scanner are decoupled; easier to embed or extend.
  • Better tooling – built-in formatter (yr fmt), linter-friendly output.
  • Active roadmap – new language features will go to YARA-X only.

We already use YARA-X at VirusTotal for Livehunt and Retrohunt. Billions of files later, it behaves.

Give it a spin, report issues, and send feedback our way. Huge thanks to Victor for pushing the project this far. Let’s keep making pattern matching simpler and faster.

What 17,845 GitHub Repos Taught Us About Malicious MCP Servers

Spoiler: VirusTotal Code Insight’s preliminary audit flagged nearly 8% of MCP (Model Context Protocol) servers on GitHub as potentially forged for evil, though the sad truth is, bad intentions aren’t required to follow bad practices and publish code with critical vulnerabilities.


Audio version of this post, created with NotebookLM Deep Dive

Before we get started, a quick personal note. A couple of weeks ago, I announced at Google that I’m stepping away from my role as a manager of managers and getting back to my roots, focusing on the VirusTotal community. And I’m not doing it alone. I’m joined by some legendary names from the project’s early days, like Julio, the very first VirusTotal developer and Víctor, creator of YARA and YARA-X. In this new chapter, we’re going deep into AI, not just evolving VT and using it to analyze typical threats but also to hunt down the new ones riding the AI wave, like malicious models and MCPs among others.

As many of you already know, MCP (Model Context Protocol) is a simple but powerful standard that lets large language models interact with external tools and APIs via JSON-RPC. Think of it as a universal adapter: MCP turns scripts, services, and data sources into callable functions that models like Claude, GPT or Gemini can use to answer complex queries or automate tasks. In just a few months, MCP has gone from niche to near-standard, with native support across most major LLM platforms.
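For readers who have not seen the wire format, here is a minimal sketch of the JSON-RPC 2.0 request a client sends to invoke a tool through MCP's tools/call method. The tool name and arguments used in the usage note are hypothetical.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests carry an id so replies can be matched

def make_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request)
```

For example, make_tool_call("lookup_hash", {"sha256": "..."}) yields a message a server would dispatch to whatever function it registered under that name, which is why the code behind each tool matters so much.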

Before building and releasing our own MCP server for VirusTotal (which is coming very soon) we wanted to take a step back and understand how this protocol is being used in the wild. Specifically: are people already abusing it to build malicious plugins? And if so, how could we detect and classify these threats inside VT?

With that in mind, I set out to run a quick three-phase experiment (aka three humble Python scripts). First, a harvesting phase to collect as many GitHub projects as possible by querying the API for MCP-related keywords like “model-context-protocol”, “server_mcp” or “define_mcp_tool”, among others. Then came a filtering step to isolate the interesting repos. Not everything with "MCP" in the README is a real server implementation, so I built a scoring system to identify true servers based on dependency files, import statements, keywords in code, presence of mcp.json, and more. After applying that filter, we ended up with a focused dataset of 17,845 likely MCP server projects.
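The filtering step can be pictured as a simple weighted score over the signals listed above. The sketch below is only an illustration of that idea: the weights, the threshold, and all names are assumptions, not the values used in the actual experiment.

```python
from dataclasses import dataclass

@dataclass
class RepoSignals:
    has_mcp_dependency: bool   # e.g. an MCP SDK listed in a dependency file
    has_mcp_import: bool       # an MCP import statement found in the code
    keyword_hits: int          # MCP-related keywords found in source files
    has_mcp_json: bool         # an mcp.json file present in the repo

def score_repo(s: RepoSignals) -> int:
    # Hypothetical weights: strong structural signals count more than keywords.
    score = 3 * s.has_mcp_dependency + 3 * s.has_mcp_import + 2 * s.has_mcp_json
    return score + min(s.keyword_hits, 3)  # cap keywords so a README cannot dominate

def is_likely_server(s: RepoSignals, threshold: int = 5) -> bool:
    return score_repo(s) >= threshold
```

Capping the keyword contribution reflects the point made above: a repository that merely mentions MCP should not pass the filter on text matches alone.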

Finally, as the third phase, we ran a security review using VT Code Insight powered by Gemini 2.5 Flash, taking advantage of its 1-million token context window, speed, and code analysis skills to evaluate each project as a whole. We asked Code Insight for a basic verdict and to flag any High, Medium, or Low vulnerabilities. But after just a few hundred analyses we had to hit pause: Code Insight was surfacing so many issues that the results quickly became overwhelming. So we tightened things up with a second and more focused prompt, asking Code Insight to look specifically for signs of intentional malicious behavior along with reasoning that supported a conclusion of malice.

We let the new prompt run on the full dataset and Code Insight got to work. In the end, it marked 1,408 repositories as likely designed to be malicious. After checking some of these results by hand, two things were clear to me. First: there are many possible attack vectors that can be used through an MCP server. And second: Code Insight seems to trust human developers too much; it often assumes that some bad practices and the resulting critical bugs couldn’t be accidental.

“This pattern—creating a powerful, remotely triggerable code execution vulnerability and simultaneously preparing a collection of sensitive data (including data not needed for normal operation)—is characteristic of an intentional backdoor designed for data exfiltration and system compromise. The dynamic tool generation serves as a plausible cover for the unsafe use of `exec`.” Oh, Code Insight… if only you knew the kind of chaos vibe coding is causing. We’re going to be very busy in cybersecurity cleaning up after these accidental masterpieces

We’ve confirmed some of the flagged projects were just proof-of-concepts and security researcher demos, and many tiny “hello-world” examples were missing basic security features which Code Insight called out as “likely malicious”, because no sane developer would ship that to production. But even if you filter out the hobby projects, there’s still a scary amount of real attack vectors and critical vulnerabilities out there.

While we continue manually reviewing Code Insight’s reports to learn more about the issues and weak spots it uncovered, we also asked Gemini 2.5 Flash to help us categorize them. We provided it with the problem summaries from the 1,408 MCP-related repositories flagged as potentially problematic, and asked for a simple list, just a brief enumeration of the attack techniques involved. Gemini came back with the following list:

| Attack vector | Example indicators |
|---|---|
| Malicious-Server Supply Chain | Self-update scripts, install hooks from non-canonical URLs, latest tag pulls. |
| Rogue Server / Impersonation | Hard-coded IPs or typo-squatted domains, no TLS/mTLS verification. |
| Credential Harvesting | Code that reads ~/.aws, Keychain, or env vars and posts to external endpoint. |
| Tool-Based RCE & File Ops | subprocess, exec, or rm -rf paths built from LLM/user input. |
| Server-Side Command Injection | Server concatenates JSON-RPC params into shell/SQL without escaping. |
| Semantic-Gap Poisoning | Manifest says “read-only”; implementation writes files or opens sockets. |
| Over-broad Permissions | OAuth scopes * / “full_access”, multiple data silos bridged in one tool. |
| Indirect Prompt Injection | HTML comments, zero-width chars, or Base64 blobs returned to the host. |
| Context/Data Poisoning | Unvalidated web-scrape fed straight into context= parameter. |
| Sampling-Feature Abuse | Server requests giant completions before any other call; leaks system prompt. |
| Living-Off-The-Land | Malicious server does nothing but orchestrate trusted tools already installed. |
| Chained MCP Exploitation | Output from Server A becomes params for Server B within one loop. |
| Financial-Fraud Tools / DoS / Persistence | Payment APIs with LLM-supplied dest-IDs, infinite loops without rate limits, hot-swapped binaries. |

If you're building or defending around MCPs, there are a few quick wins to keep things safer:

  • treat MCP servers like browser extensions (sign, hash, and pin specific versions)
  • isolate them in containers or WASM sandboxes with strict file and network limits
  • make permissions visible and revocable through a clear, zero-trust-style UI
  • and never let model outputs go unfiltered: strip out sneaky stuff like invisible characters, HTML comments, or rogue script tags before looping anything back into your LLM.
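As a rough illustration of the last point, the sketch below strips HTML comments, script elements, and a handful of zero-width characters from tool output before it is fed back to a model. The character set and the function name are assumptions, and regex filtering like this is a first line of defense, not a complete answer to prompt injection.

```python
import re

# Zero-width and byte-order-mark characters often used to hide instructions.
_INVISIBLE = re.compile("[\u200b\u200c\u200d\u2060\ufeff]")
_HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
_SCRIPT_TAG = re.compile(r"<script\b.*?</script\s*>", re.DOTALL | re.IGNORECASE)

def sanitize_tool_output(text: str) -> str:
    """Remove hidden or active content before text re-enters the LLM context."""
    text = _HTML_COMMENT.sub("", text)
    text = _SCRIPT_TAG.sub("", text)
    return _INVISIBLE.sub("", text)
```

In practice this would sit alongside the sandboxing and permission controls above, since a determined attacker can encode instructions in ways a simple filter will not catch.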

MCPs are growing fast (almost 18,000 servers already in the wild), and with that growth comes a mountain of security debt. The good news? We’ll soon be launching a dedicated feature in VirusTotal to analyze MCP servers.
Stay tuned… we’re just getting started.

Thursday, January 09, 2025


Research that builds detections

Note: You can view the full content of the blog here.

Introduction

Detection engineering is becoming increasingly important in surfacing new malicious activity. Threat actors might take advantage of previously unknown malware families - but a successful detection of certain methodologies or artifacts can help expose the entire infection chain.
In previous blog posts, we announced the integration of Sigma rules for macOS and Linux into VirusTotal, as well as ways in which Sigma rules can be converted to YARA to take advantage of VirusTotal Livehunt capabilities. In this post, we will show different approaches to hunt for interesting samples and derive new Sigma detection opportunities based on their behavior.

Tell me what role you have and I'll tell you how you use VirusTotal

VirusTotal is a really useful tool that can be used in many different ways. We have seen how people from SOCs and Incident Response teams use it (in fact, we have our VirusTotal Academy videos for SOC and IR teams), and we have also shown how those who hunt for threats or analyze those threats can use it too.
But there's another really cool way to use VirusTotal - for people who build detections and those who are doing research. We want to show everyone how we use VirusTotal in our work. Hopefully, this will be helpful and also give people ideas for new ways to use it themselves.
To explain our process, we used examples of Lummac and VenomRAT samples that we found in recent campaigns. These caught our attention due to some behaviors that had not been identified by public detection rules in the community. For that reason, we have created two Sigma rules to share with the community. If you want all the details about how we identified these behaviors and started our research, go to our Google Threat Intelligence community blog.

Our approach

As detection engineers, it is important to look for techniques that can be in use by multiple threat actors - as this makes tracking malicious activity more efficient. Prior to creating those detections, it is best to check existing research and rule collections, such as the Sigma rules repository. This can save time and effort, as well as provide insight into previously observed samples that can be further researched.
A different approach would be to instead look for malicious files that are not detected by existing Sigma rules, since they can uncover novel methodologies and provide new opportunities for detection creation.
One approach is to hunt for files that are flagged by at least five different AV vendors, were first uploaded within the last month, have sandbox execution (in order to view their behavior), and have not triggered any Crowdsourced Sigma rules.
p:5+ have:behavior fs:30d+ not have:sigma
This initial query can be adapted to incorporate additional filters that the researcher may find relevant. These could include modifiers to identify, for example, the presence of the PowerShell process in the list of executed processes (behavior_created_processes:powershell.exe), filtering results to only include documents (type:document), or identifying communication with services like Pastebin (behavior_network:pastebin.com).
Another way to go is to look at files that have been flagged by at least five AV engines and were tested in either Zenbox or CAPE. These sandboxes often produce detailed Sysmon logs, which are really useful for figuring out how to spot these threats. Again, we'd want to focus on files uploaded in the last month that haven't triggered any Sigma rules. This gives us a good starting point for building new detection rules.
p:5+ (sandbox_name:"CAPE Sandbox" or sandbox_name:"Zenbox") fs:30d+ not have:sigma
Lastly, another idea is to look for files that have not triggered many high severity detections from the Sigma Crowdsourced rules, as these can be more evasive. Specifically, we will look for samples with zero critical, high or medium alerts - and no more than two low severity ones.
p:5+ have:behavior fs:30d+ sigma_critical:0 sigma_high:0 sigma_medium:0 sigma_low:2-
With these queries, we can start investigating some samples that may be interesting to create detection rules.
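Since these queries share the same skeleton, they are easy to generate and submit programmatically. The sketch below rebuilds the first query from parameters and prepares a request for the VirusTotal API v3 intelligence search endpoint; the function names and defaults are assumptions, and actually sending the request requires an API key with Intelligence access.

```python
from urllib.parse import urlencode
from urllib.request import Request

def build_hunting_query(min_positives: int = 5, days: int = 30, extra_terms=()) -> str:
    """Compose a search for detected, sandboxed, recent files with no Sigma hits."""
    terms = [f"p:{min_positives}+", "have:behavior", f"fs:{days}d+", "not have:sigma"]
    terms.extend(extra_terms)  # e.g. "type:document"
    return " ".join(terms)

def build_search_request(query: str, api_key: str) -> Request:
    # Prepares (but does not send) a GET request to the intelligence search API.
    url = "https://www.virustotal.com/api/v3/intelligence/search?" + urlencode({"query": query})
    return Request(url, headers={"x-apikey": api_key})
```

Passing extra_terms=["type:document"] or ["behavior_network:pastebin.com"] reproduces the refinements mentioned earlier; the prepared request can then be sent with urllib.request.urlopen.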

Our detections for the community

Our approach helps us identify behaviors that seem interesting and worth focusing on. In our blog, where we explain this approach in detail, we highlighted two campaigns linked to Lummac and VenomRAT that exhibited interesting activity. Because of this, we decided to share the Sigma rules we developed for these campaigns. Both rules have been published in Sigma's official repository for the community.

Detect The Execution Of More.com And Vbc.exe Related to Lummac Stealer

title: Detect The Execution Of More.com And Vbc.exe Related to Lummac Stealer
id: 19b3806e-46f2-4b4c-9337-e3d8653245ea
status: experimental
description: Detects the execution of more.com and vbc.exe in the process tree. This behavior was observed in a set of samples related to Lummac Stealer. The Lummac payload is injected into the vbc.exe process.
references:
    - https://www.virustotal.com/gui/file/14d886517fff2cc8955844b252c985ab59f2f95b2849002778f03a8f07eb8aef
    - https://strontic.github.io/xcyclopedia/library/more.com-EDB3046610020EE614B5B81B0439895E.html
    - https://strontic.github.io/xcyclopedia/library/vbc.exe-A731372E6F6978CE25617AE01B143351.html
author: Joseliyo Sanchez, @Joseliyo_Jstnk
date: 2024-11-14
tags:
    - attack.defense-evasion
    - attack.t1055
logsource:
    category: process_creation
    product: windows
detection:
    # VT Query: behaviour_processes:"C:\\Windows\\SysWOW64\\more.com" behaviour_processes:"C:\\Windows\\Microsoft.NET\\Framework\\v4.0.30319\\vbc.exe"
    selection_parent:
        ParentImage|endswith: '\more.com'
    selection_child:
        - Image|endswith: '\vbc.exe'
        - OriginalFileName: 'vbc.exe'
    condition: all of selection_*
falsepositives:
    - Unknown
level: high

Sysmon event for: Detect The Execution Of More.com And Vbc.exe Related to Lummac Stealer

{
  "System": {
    "Provider": {
      "Guid": "{5770385F-C22A-43E0-BF4C-06F5698FFBD9}",
      "Name": "Microsoft-Windows-Sysmon"
    },
    "EventID": 1,
    "Version": 5,
    "Level": 4,
    "Task": 1,
    "Opcode": 0,
    "Keywords": "0x8000000000000000",
    "TimeCreated": {
      "SystemTime": "2024-11-26T16:23:05.132539500Z"
    },
    "EventRecordID": 692861,
    "Correlation": {},
    "Execution": {
      "ProcessID": 2396,
      "ThreadID": 3116
    },
    "Channel": "Microsoft-Windows-Sysmon/Operational",
    "Computer": "DESKTOP-B0T93D6",
    "Security": {
      "UserID": "S-1-5-18"
    }
  },
  "EventData": {
    "RuleName": "-",
    "UtcTime": "2024-11-26 16:23:05.064",
    "ProcessGuid": "{C784477D-F5E9-6745-6006-000000003F00}",
    "ProcessId": 4184,
    "Image": "C:\\Windows\\Microsoft.NET\\Framework\\v4.0.30319\\vbc.exe",
    "FileVersion": "14.8.3761.0",
    "Description": "Visual Basic Command Line Compiler",
    "Product": "Microsoft® .NET Framework",
    "Company": "Microsoft Corporation",
    "OriginalFileName": "vbc.exe",
    "CommandLine": "C:\\Windows\\Microsoft.NET\\Framework\\v4.0.30319\\vbc.exe",
    "CurrentDirectory": "C:\\Users\\george\\AppData\\Roaming\\comlocal\\RUYCLAXYVMFJ\\",
    "User": "DESKTOP-B0T93D6\\george",
    "LogonGuid": "{C784477D-9D9B-66FF-6E87-050000000000}",
    "LogonId": "0x5876e",
    "TerminalSessionId": 1,
    "IntegrityLevel": "High",
    "Hashes": {
      "SHA1": "61F4D9A9EE38DBC72E840B3624520CF31A3A8653",
      "MD5": "FCCB961AE76D9E600A558D2D0225ED43",
      "SHA256": "466876F453563A272ADB5D568670ECA98D805E7ECAA5A2E18C92B6D3C947DF93",
      "IMPHASH": "1460E2E6D7F8ECA4240B7C78FA619D15"
    },
    "ParentProcessGuid": "{C784477D-F5D4-6745-5E06-000000003F00}",
    "ParentProcessId": 6572,
    "ParentImage": "C:\\Windows\\SysWOW64\\more.com",
    "ParentCommandLine": "C:\\Windows\\SysWOW64\\more.com",
    "ParentUser": "DESKTOP-B0T93D6\\george"
  }
} 
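To see why this event triggers the rule, the detection logic can be replayed by hand. The sketch below is a hand-written approximation of the rule's condition applied to the EventData fields, not output from a Sigma backend; the function names are hypothetical, and matching is case-insensitive, as Sigma string matching is by default.

```python
def _endswith_ci(value: str, suffix: str) -> bool:
    return value.lower().endswith(suffix.lower())

def matches_more_vbc(event_data: dict) -> bool:
    """Approximate 'all of selection_*' for the more.com / vbc.exe rule."""
    # selection_parent: the parent process is more.com
    parent = _endswith_ci(event_data.get("ParentImage", ""), "\\more.com")
    # selection_child: list items are ORed, so either field may match
    child = (
        _endswith_ci(event_data.get("Image", ""), "\\vbc.exe")
        or event_data.get("OriginalFileName", "").lower() == "vbc.exe"
    )
    return parent and child
```

Fed the ParentImage, Image, and OriginalFileName values from the event above, the function returns True, because both selections are satisfied.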

File Creation Related To RAT Clients

title: File Creation Related To RAT Clients
id: 2f3039c8-e8fe-43a9-b5cf-dcd424a2522d
status: experimental
description: File .conf created related to VenomRAT, AsyncRAT and Lummac samples observed in the wild.
references:
    - https://www.virustotal.com/gui/file/c9f9f193409217f73cc976ad078c6f8bf65d3aabcf5fad3e5a47536d47aa6761
    - https://www.virustotal.com/gui/file/e96a0c1bc5f720d7f0a53f72e5bb424163c943c24a437b1065957a79f5872675
author: Joseliyo Sanchez, @Joseliyo_Jstnk
date: 2024-11-15
tags:
    - attack.execution
logsource:
    category: file_event
    product: windows
detection:
    # VT Query: behaviour_files:"\\AppData\\Roaming\\DataLogs\\DataLogs.conf"
    # VT Query: behaviour_files:"DataLogs.conf" or behaviour_files:"hvnc.conf" or behaviour_files:"dcrat.conf"
    selection_required:
        TargetFilename|contains: '\AppData\Roaming\'
    selection_variants:
        TargetFilename|endswith:
            - '\datalogs.conf'
            - '\hvnc.conf'
            - '\dcrat.conf'
        TargetFilename|contains:
            - '\mydata\'
            - '\datalogs\'
            - '\hvnc\'
            - '\dcrat\'
    condition: all of selection_*
falsepositives:
    - Legitimate software creating a file with the same name
level: high

Sysmon event for: File Creation Related To RAT Clients

{
  "System": {
    "Provider": {
      "Guid": "{5770385F-C22A-43E0-BF4C-06F5698FFBD9}",
      "Name": "Microsoft-Windows-Sysmon"
    },
    "EventID": 11,
    "Version": 2,
    "Level": 4,
    "Task": 11,
    "Opcode": 0,
    "Keywords": "0x8000000000000000",
    "TimeCreated": {
      "SystemTime": "2024-12-02T00:52:23.072811600Z"
    },
    "EventRecordID": 1555690,
    "Correlation": {},
    "Execution": {
      "ProcessID": 2624,
      "ThreadID": 3112
    },
    "Channel": "Microsoft-Windows-Sysmon/Operational",
    "Computer": "DESKTOP-B0T93D6",
    "Security": {
      "UserID": "S-1-5-18"
    }
  },
  "EventData": {
    "RuleName": "-",
    "UtcTime": "2024-12-02 00:52:23.059",
    "ProcessGuid": "{C784477D-04C6-674D-5C06-000000004B00}",
    "ProcessId": 7592,
    "Image": "C:\\Users\\george\\Desktop\\ezzz.exe",
    "TargetFilename": "C:\\Users\\george\\AppData\\Roaming\\MyData\\DataLogs.conf",
    "CreationUtcTime": "2024-12-02 00:52:23.059",
    "User": "DESKTOP-B0T93D6\\george"
  }
}
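The same kind of manual check works for the file creation rule. This is again a hand-written approximation of the condition rather than a real Sigma engine, with a hypothetical function name; the path is lowercased because Sigma matches strings case-insensitively by default.

```python
def matches_rat_conf(event_data: dict) -> bool:
    """Approximate 'all of selection_*' for the RAT client .conf rule."""
    path = event_data.get("TargetFilename", "").lower()
    # selection_required: the file lives under the roaming profile
    required = "\\appdata\\roaming\\" in path
    # selection_variants: both keys must match, each against any listed value
    name_ok = path.endswith(("\\datalogs.conf", "\\hvnc.conf", "\\dcrat.conf"))
    folder_ok = any(d in path for d in ("\\mydata\\", "\\datalogs\\", "\\hvnc\\", "\\dcrat\\"))
    return required and name_ok and folder_ok
```

For the event above, the TargetFilename ends in DataLogs.conf and sits in a MyData folder under AppData\Roaming, so every part of the condition holds.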

Wrapping up

Detection engineering teams can proactively create new detections by hunting for samples that are being distributed and uploaded to our platform. Applying this approach helps teams develop detections for recent behaviors that existing rules do not yet cover, so organizations can build detections proactively from their threat hunting missions.
The Sigma rules created to detect Lummac activity have been used during threat hunting missions to identify new samples of this family in VirusTotal. Another use is translating them into the language of the SIEM or EDR available in the infrastructure, as they could help identify potential behaviors related to Lummac samples observed in late 2024. After passing quality controls and being published on Sigma's public GitHub, they have been integrated for use in VirusTotal, delivering the expected results. You can use them in the following way:
Lummac Stealer Activity - Execution Of More.com And Vbc.exe
sigma_rule:a1021d4086a92fd3782417a54fa5c5141d1e75c8afc9e73dc6e71ef9e1ae2e9c
File Creation Related To RAT Clients
sigma_rule:8f179585d5c1249ab1ef8cec45a16d112a53f91d143aa2b0b6713602b1d19252
We hope you found this blog interesting and useful, and as always we are happy to hear your feedback.