Tuesday, September 30, 2025

, , ,

Advanced Threat Hunting: Automating Large-Scale Operations with LLMs

Last week, we were fortunate enough to attend the fantastic LABScon conference, organized by the SentinelOne Labs team. While there, we presented a workshop titled 'Advanced Threat Hunting: Automating Large-Scale Operations with LLMs.' The main goal of this workshop was to show attendees how they could automate their research using the VirusTotal API and Gemini. Specifically, we demonstrated how to integrate the power of Google Colab to quickly and efficiently generate Jupyter notebooks using natural language.

It goes without saying that the use of LLMs is a must for every analyst today. For this reason, we also want to make life easier for everyone who uses the VirusTotal API for research.

The Power of the VirusTotal API and vt-py

The VirusTotal API is the programmatic gateway to our massive repository of threat intelligence data. While the VirusTotal GUI is great for agile querying, the API unlocks the ability to conduct large-scale, automated investigations and access raw data with more pivoting opportunities.

To make interacting with the API even easier, we recommend using the vt-py library. It simplifies much of the complexity of HTTP requests, JSON parsing, and rate limit management, making it the go-to choice for Python users.

From Natural Language to Actionable Intelligence with Gemini

To bridge the gap between human questions and API queries, we can leverage the integrated Gemini in Google Colab. We have created a "meta Colab" notebook that is pre-populated with working real code snippets for interacting with the VirusTotal API to retrieve different information such as campaigns, threat actors, malware, samples, URLs among others (which we will share soon). This provides Gemini with the necessary context to understand your natural language requests and generate accurate Python code to query the VirusTotal API. Gemini doesn't call the API directly; it creates the code snippet for you to execute.

For Gemini to generate accurate and relevant code, it needs context. Our meta Colab notebook is filled with examples that act as a guide. For complex questions, it will be nice to provide the exact field names that you want to work with. This context generally falls into two categories:

  1. Reference Documentation: We include detailed documentation directly in the Colab. For example, we provide a comprehensive list of all available file search modifiers for the VirusTotal Intelligence search endpoint. This gives Gemini the "vocabulary" it needs to construct precise queries.
  2. Working Code Examples: The notebook is pre-populated with dozens of working vt-py code snippets for common tasks like retrieving file information, performing an intelligence search, or getting relationships. This gives Gemini the "grammar" and correct patterns for interacting with our API.

Example of code snippet context that we have included in our meta colab:

query_results_with_behaviors = []
query = "have:sigma have:yara have:ids have:malware_config fs:1d+ have:bundled_file tag:overlay"
RELATIONS = "behaviours"

async for itemobj in cli.iterator('/intelligence/search',params={'query': query, 'relationships': RELATIONS, 'relationship_attributes[%s]'%(RELATIONS): '\*'},limit=0):
    query_results_with_behaviors.append(itemobj.to_dict())

res_with_behaviors = list(query_results_with_behaviors)
Diagram showing the flow from User's Action to Actionable Intelligence via Gemini and VirusTotal API

Practical Example: Hunting for Malicious LNK Files

One of the exercises in our workshop focused on analyzing malicious LNK files submitted to VirusTotal in a specific period of time to limit results. The goal was to discover common execution techniques and interesting command-line patterns.

Here’s how you could approach this with natural language queries in our meta Colab:

1. Retrieve all LNK file objects matching the criteria.

You can start by asking Gemini:

Create a code snippet to retrieve all LNK files submitted from January 1st, 2025, until April 1st, 2025, with 5 or more positives.

Code generated:

query_results = []

# Query for LNK files submitted between January 1st, 2025, and April 1st, 2025, with 5 or more positives.
query = "fs:2025-01-01+ fs:2025-04-01- p:5+ type:lnk"

async for itemobj in cli.iterator('/intelligence/search',params={'query': "%s"%(query)},limit=0):
    query_results.append(itemobj.to_dict())

lnk_files = list(query_results)
print(f"Found {len(lnk_files)} LNK files matching the criteria.")

2. Flatten the data and visualize it.

Next, you can ask Gemini to process the retrieved data and create visualizations:

Flatten the data in a dataframe and create a bar chart depicting the most common TargetFileDOSName values.

Code generated:

import pandas as pd
import plotly.express as px

# Extract the 'attributes' dictionary from each object in our results
attributes_list = [result.get('attributes', {}) for result in lnk_files]

# Use json_normalize to create a flat DataFrame
df_flat = pd.json_normalize(attributes_list)

# Check if 'exiftool.TargetFileDOSName' column exists and is not empty
if 'exiftool.TargetFileDOSName' in df_flat.columns and not df_flat['exiftool.TargetFileDOSName'].empty:
    # Count the occurrences of each TargetFileDOSName
    target_file_counts = df_flat['exiftool.TargetFileDOSName'].value_counts().reset_index()
    target_file_counts.columns = ['TargetFileDOSName', 'Count']

    # Get the top 10 most common TargetFileDOSName values for visualization
    top_target_files = target_file_counts.head(10)

    # Create an interactive bar chart
    fig = px.bar(top_target_files,
                 x='Count',
                 y='TargetFileDOSName',
                 orientation='h',
                 title='Top 10 Most Common TargetFileDOSName in LNK files',
                 labels={'Count': 'Number of Occurrences', 'TargetFileDOSName': 'Target File DOS Name'})
    fig.show()
else:
    print("The 'exiftool.TargetFileDOSName' column was not found or is empty in the flattened data.")

print("DataFrame shape:", df_flat.shape)
df_flat.head()

This will generate a code snippet using pandas and Plotly to give you a clear view of the most frequent target files in the malicious LNKs.

3. Visualize relationships with a Sankey diagram.

To dig deeper and understand the relationships between the infected machines and the malware, you can ask for a more advanced visualization:

Create a Sankey diagram connecting MachineID (source) to TargetFileDOSName (target), with link widths representing the number of occurrences. Include only MachineID values with 10 or more occurrences, grouping the rest as "Other", and make the visualization interactive with hover details.

Code generated:

import plotly.graph_objects as go
import pandas as pd

# Ensure df_flat is available from previous steps
# If not, you might need to re-run the cell that creates df_flat.

# Check if necessary columns exist in df_flat
if 'lnk_info.machine_id' in df_flat.columns and 'exiftool.TargetFileDOSName' in df_flat.columns:

    # Calculate the frequency of each machine ID and target file name pair
    pair_frequencies = df_flat.groupby(['lnk_info.machine_id', 'exiftool.TargetFileDOSName']).size().reset_index(name='frequency')

    # Identify MachineIDs with less than 10 occurrences
    machine_id_counts = pair_frequencies.groupby('lnk_info.machine_id')['frequency'].sum().reset_index(name='count')
    other_machine_ids = machine_id_counts[machine_id_counts['count'] < 10]['lnk_info.machine_id'].tolist()

    # Replace infrequent MachineIDs with 'Other'
    pair_frequencies['source_node'] = pair_frequencies['lnk_info.machine_id'].apply(lambda x: 'Other' if x in other_machine_ids else x)

    # Aggregate frequencies for the 'Other' node and other valid nodes
    sankey_data = pair_frequencies.groupby(['source_node', 'exiftool.TargetFileDOSName'])['frequency'].sum().reset_index()

    # Create unique labels for the Sankey nodes
    all_nodes = pd.concat([sankey_data['source_node'], sankey_data['exiftool.TargetFileDOSName']]).unique()
    label_to_index = {label: i for i, label in enumerate(all_nodes)}

    # Create the Sankey diagram
    fig = go.Figure(data=[go.Sankey(
        node=dict(
            pad=15,
            thickness=20,
            line=dict(color="black", width=0.5),
            label=all_nodes,
            color="blue"
        ),
        link=dict(
            source=sankey_data['source_node'].map(label_to_index),
            target=sankey_data['exiftool.TargetFileDOSName'].map(label_to_index),
            value=sankey_data['frequency'],
            label=sankey_data.apply(lambda row: f"{row['source_node']} -> {row['exiftool.TargetFileDOSName']}: {row['frequency']}", axis=1)
        )
    )])

    fig.update_layout(title_text="MachineID to TargetFileDOSName Sankey Diagram (Infrequent MachineIDs grouped as 'Other')", font_size=10)
    fig.show()
else:
    print("Required columns 'lnk_info.machine_id' or 'exiftool.TargetFileDOSName' not found in the DataFrame.")

This will create a powerful visualization that shows how different machines are being targeted with specific files, allowing us identify patterns for each machineID metadata.

Sankey diagram

4. Analyze the behavior of the LNK files.

Finally, you can investigate the behavior of the LNK files themselves, for example, by looking at how they are executed:

Create a pie chart depicting the distribution of window behaviors for LNK target processes (hidden, minimized,...).

Code generated:

import plotly.express as px
import pandas as pd

# Check if the relevant column exists
if 'lnk_info.header.show_window_str' in df_flat.columns and not df_flat['lnk_info.header.show_window_str'].empty:
    # Count the occurrences of each window behavior string
    window_behavior_counts = df_flat['lnk_info.header.show_window_str'].value_counts().reset_index()
    window_behavior_counts.columns = ['Window Behavior', 'Count']

    # Create an interactive pie chart
    fig = px.pie(window_behavior_counts,
                 names='Window Behavior',
                 values='Count',
                 title='Distribution of LNK Target Process Window Behaviors',
                 hover_data={'Count': True})
    fig.show()
else:
    print("The 'lnk_info.header.show_window_str' column was not found or is empty in the flattened data. Cannot create the pie chart.")
Pie chart of window behaviors

This allows you to quickly identify if the LNK files are trying to hide their execution, a common use case for malware.

5. Geographic Threat Mapping of Submissions

Understanding the geographical distribution of malware submissions can provide crucial insights into whether a campaign is widespread or targeting a specific region. By analyzing the submission metadata for our set of LNK files, an analyst can quickly identify potential hotspots, which could indicate a geographically focused attack. This is achieved by fetching the submissions relationship for each file, extracting the country of origin for each submission, and plotting this data on a world map (please note that submissions doesn’t mean targeted countries, but in some cases could be related).

Using the LNK files identified previously, create a code snippet that retrieves the submissions relationship for each file. Then, count the number of submissions from each country and display the results on a choropleth world map to visualize the submission hotspots.

In this case, Gemini uses the step-by-step execution, which basically is a set of consecutive executions. The code generated was:

Step 1
query_results_with_submissions = []

# Define the search query string for LNK files
query = "fs:2025-01-01+ fs:2025-04-01- p:5+ type:lnk"

# Define the relationship to request
RELATIONS = "submissions"

# Define the relationship attributes to retrieve (all attributes)
ATTRIBUTES_RELATIONS = "*"

# Use cli.iterator to perform an intelligence search with the specified query, relationships, and relationship attributes.
async for itemobj in cli.iterator('/intelligence/search', params={'query': query, 'relationships': RELATIONS, 'relationship_attributes[%s]'%(RELATIONS): ATTRIBUTES_RELATIONS}, limit=0):
    query_results_with_submissions.append(itemobj.to_dict())

# Convert query_results_with_submissions to a list called lnk_files_with_submissions
lnk_files_with_submissions = list(query_results_with_submissions)

# Print the number of LNK files found with submission data.
print(f"Found {len(lnk_files_with_submissions)} LNK files with submission data matching the criteria.")
Step 2
import pandas as pd

# Initialize an empty list to store submission countries
submission_countries = []

# Iterate through each item (LNK file) in the lnk_files_with_submissions list
for item in lnk_files_with_submissions:
    # Check if 'relationships' and 'submissions' exist and are not empty
    if 'relationships' in item and 'submissions' in item['relationships'] and item['relationships']['submissions']['data']:
        # Iterate through each submission object
        for submission in item['relationships']['submissions']['data']:
            # Extract the 'country' from 'submission['attributes']' and append it to the list
            if 'country' in submission['attributes']:
                submission_countries.append(submission['attributes']['country'])

# Convert the submission_countries list into a pandas Series
submission_countries_series = pd.Series(submission_countries)

# Count the occurrences of each country and store in a DataFrame
country_counts = submission_countries_series.value_counts().reset_index()
country_counts.columns = ['country', 'count']

print(f"Total unique countries found: {len(country_counts)}")
print("Top 5 submission countries:")
print(country_counts.head())
Step 3
!pip install pycountry
import plotly.express as px
import pandas as pd
import pycountry

# Function to get ISO-3 code from country name
def get_iso3(country_name):
    try:
        return pycountry.countries.search_fuzzy(country_name)[0].alpha_3
    except:
        return None

# Apply the function to get ISO-3 codes
country_counts['iso_alpha'] = country_counts['country'].apply(get_iso3)

# Create a world heatmap using Plotly Express with ISO-3 codes
fig = px.choropleth(country_counts.dropna(subset=['iso_alpha']), # Drop rows where ISO code couldn't be found
                      locations="iso_alpha", # Use ISO-3 codes for locations
                      locationmode='ISO-3',   # Specify ISO-3 location mode
                      color="count",
                      hover_name="country",
                      color_continuous_scale="Viridis",
                      title="Submission Country Heatmap")

# Update layout to increase figure size
fig.update_layout(
    autosize=False,
    width=1000, # Adjust width as needed
    height=600 # Adjust height as needed
)

fig.show()
World heatmap of submissions

More Examples

Here are some other examples of how you can use natural language to query the VirusTotal API:

  • Find malicious Chrome extensions (based on this blog):

This specific question is actually a really good example of how Google Colab will run the step-by-step process. As you can see, we asked different questions in the same prompt, for that reason Gemini will do it in different steps.

Retrieve all the samples from the /intelligence/search endpoint with the following query: fs:2025-08-01+ fs:2025-08-14- p:5+ tag:crx have:codeinsight.

Get the bundled_files and create a new dataframe with them too, but store only those having 5 or more malicious value.

Then identify the top 5 magika of those bundled files obtained

Code generated:

Step 1
query_results_with_bundled_files = []

# Define the search query string
query = "fs:2025-08-01+ fs:2025-08-14- p:5+ tag:crx have:codeinsight"

# Define the relationship to request
RELATIONS = "bundled_files"

# Define the relationship attributes to retrieve (all attributes)
ATTRIBUTES_RELATIONS = "*"

# Use cli.iterator to perform an intelligence search with the specified query, relationships, and relationship attributes.
async for itemobj in cli.iterator('/intelligence/search', params={'query': query, 'relationships': RELATIONS, 'relationship_attributes[%s]'%(RELATIONS): ATTRIBUTES_RELATIONS}, limit=0):
    query_results_with_bundled_files.append(itemobj.to_dict())

# Convert query_results_with_bundled_files to a list called crx_files_with_bundled_files
crx_files_with_bundled_files = list(query_results_with_bundled_files)

# Print the number of found CRX files with bundled file data.
print(f"Found {len(crx_files_with_bundled_files)} CRX files with bundled file data matching the criteria.")
Step 2
import pandas as pd

# List to store all bundled file attributes
all_bundled_files_data = []

# Iterate through each CRX file with bundled file data
for crx_file in crx_files_with_bundled_files:
    if 'relationships' in crx_file and 'bundled_files' in crx_file['relationships']:
        for bundled_file in crx_file['relationships']['bundled_files']['data']:
            all_bundled_files_data.append(bundled_file.get('attributes', {}))

# Create a DataFrame from the extracted bundled file attributes
df_bundled_files = pd.json_normalize(all_bundled_files_data)

# Filter for bundled files with 5 or more malicious detections
df_malicious_bundled_files = df_bundled_files[df_bundled_files['last_analysis_stats.malicious'] >= 5]

print(f"Total bundled files extracted: {len(df_bundled_files)}")
print(f"Bundled files with 5 or more malicious detections: {len(df_malicious_bundled_files)}")

df_malicious_bundled_files.head()
Step 3
import pandas as pd
import plotly.express as px

# Check if the 'magika' column exists and is not empty in the filtered DataFrame
if 'magika' in df_malicious_bundled_files.columns and not df_malicious_bundled_files['magika'].empty:
    # Count the occurrences of each magika type
    magika_counts = df_malicious_bundled_files['magika'].value_counts().reset_index()
    magika_counts.columns = ['Magika Type', 'Count']

    # Get the top 5 most frequent magika types
    top_5_magika = magika_counts.head(5)

    print("Top 5 Magika Types in malicious bundled files:")
    print(top_5_magika)

    # Visualize the top 5 magika types
    fig = px.bar(top_5_magika,
                 x='Count',
                 y='Magika Type',
                 orientation='h',
                 title='Top 5 Magika Types in Malicious Bundled Files',
                 labels={'Count': 'Number of Occurrences', 'Magika Type': 'Magika Type'}) 
    fig.update_layout(yaxis={'categoryorder':'total ascending'}) # Order bars by count
    fig.show()
else:
    print("The 'magika' column was not found or is empty in the filtered malicious bundled files DataFrame. Cannot identify top magika types.")
  • Retrieve threat actors:
Retrieve threat actors targeting the United Kingdom with an espionage motivation. Sort the results in descending order of relevance. Display the total number of threat actors and their names.
  • Investigate campaigns:
Retrieve information about threat actors and malware involved in campaigns targeting Pakistan. For each threat actor, retrieve its country of origin, motivations, and targeted industries. For each malware, retrieve its name.

What’s next

This workshop, co-authored with Aleksandar from Sentinel LABS, will be presented at future conferences to show the community how to get the most out of the VirusTotal API. We'll be updating the content of our meta colab regularly and will share more information soon about how to get the Google Colab.

In the meantime, if you have any feedback or ideas to contribute, we are open to suggestions.

Friday, September 05, 2025

, ,

Supercharging Your Threat Hunts: Join VirusTotal at Labscon for a Workshop on Automation and LLMs

We are excited to announce that our colleague Joseliyo Sánchez, will be at Labscon to present our workshop: Advanced Threat Hunting: Automating Large-Scale Operations with LLMs. This workshop is a joint effort with SentinelOne and their researcher, Aleksandar Milenkoski. 

In today's rapidly evolving threat landscape, security professionals face an overwhelming tide of data and increasingly sophisticated adversaries. This hands-on workshop is designed to empower you to move beyond the traditional web interface and harness the full potential of the VirusTotal Enterprise API for large-scale, automated threat intelligence and hunting. 

We will dive deep into how you can use the VirusTotal Enterprise API with Python and Google Colab notebooks to automate the consumption of massive datasets. You'll learn how to track the behaviors of advanced persistent threat (APT) actors and cybercrime groups through practical, real-time exercises. 

A key part of our workshop will focus on leveraging Large Language Models (LLMs) to supercharge your analysis. We'll show how you can use AI to help understand complex data, build better queries, and create insightful visualizations to enrich your information for a deeper understanding of threats. 

This session is ideal for cyber threat intelligence analysts, threat hunters, incident responders, SOC analysts, and security researchers looking to automate and scale up their threat hunting workflows. 

After the workshop, we will publish a follow-up blog post that will delve deeper into some of the exercises and examples presented, providing a valuable resource for further learning and implementation. 

We look forward to seeing you at Labscon! 

(All of the scenarios are compatible with Google Threat Intelligence)

 ---- 
Conference website: https://www.labscon.io/ 
Date: September 17-20, 2025 
Registration: Invite-Only 
Place: Scottsdale, Arizona 
Duration: 3-5h

Thursday, September 04, 2025

Uncovering a Colombian Malware Campaign with AI Code Analysis

VirusTotal Code Insight keeps adding new file formats. This time, we’re looking at two vector-based formats from very different eras: SWF and SVG. Curiously, right after we rolled out this update in production, one of the very first submitted files gave us a perfect, and unexpected, example of Code Insight in action: it uncovered an undetected malware campaign using SVG files that impersonated the Colombian justice system.

Audio version of this post, created with NotebookLM Deep Dive

SWF: a blast from the past

Flash is dead, Adobe killed it in 2020 and browsers stopped supporting it shortly after. But surprisingly, SWF files still show up on VirusTotal. Whether it’s old malware resurging, retro hunting, or long-tail campaigns, they haven’t disappeared completely.

In fact, VirusTotal received 47,812 unique SWF files in the last 30 days that had never been seen before, and 466 of them were flagged as malicious by at least one antivirus engine.

SWF files are binary and compiled. That means Code Insight needs to:

  • Unpack and decompress the container (often zlib or LZMA)
  • Parse the internal tag structure
  • Extract embedded scripts, either ActionScript 2 (AVM1) or ActionScript 3 (AVM2 bytecode + decompiling/disassembling)

Once we lift those scripts into something closer to pseudocode or readable disassembly, the LLM steps in to summarize what the file is doing and why it might be suspicious.

SVG: modern, open, and still abusable

SVGs, on the other hand, are very much alive. It’s a standard web format, open, text-based, and everywhere: websites, design tools, build systems. And that’s also why attackers like it.

In the last 30 days alone, VirusTotal received 140,803 unique SVG files that had never been seen before, and 1,442 of them were flagged as malicious by at least one antivirus engine. That's roughly 1% showing up with detections, just like SWF curiously.

SVG is just XML with <svg> at the root. If it’s a .svgz, we decompress it first. From there, Code Insight looks for:

  • Embedded JavaScript in <script> tags or event handlers (onload, onclick…)
  • Redirects using javascript: URLs or location.href
  • Obfuscation tricks (CDATA, character entities, base64 payloads, etc.)

Because SVG is plain text, the challenge isn’t unpacking, it’s spotting the malicious logic hiding in plain sight.

Let’s see a couple of examples:

When a SWF is flagged, but isn’t malicious

One common challenge in threat analysis is dealing with files that trigger detections in just a few antivirus engines. They’re not clean, but they’re not clearly malicious either. These gray areas force analysts to dig deeper, often wasting time chasing false positives.

The SWF file in the screenshot is a perfect example.

350422c3915a8a1a1336147f89061b25c8354af58db0050e2f9ef2b384e59f62

It was flagged by 3 out of 63 engines. Enough to raise doubts, but not conclusive. The detections mention known SWF heuristics and an old CVE.

Thanks to Code Insight, we can quickly understand what’s going on. It identifies the SWF as a complex ActionScript-based game, including 3D rendering, sound management, and a full level editor. The analysis also explains why the file might look suspicious: it uses obfuscated classes and cryptographic functions (like RC4 and AES), and gathers system details, techniques often associated with malware, but also common in Flash games to enforce DRM or prevent tampering.

The verdict? No malicious behavior was observed, and now we know why it looked suspicious in the first place.

This kind of context is exactly what Code Insight is designed for: saving time, reducing uncertainty, and helping you focus on real threats.


When AV misses, but Code Insight doesn’t

This second example shows the other side of the coin: a malicious SVG file that evaded all antivirus engines, going completely undetected on VirusTotal. On the surface, it looks clean, but a quick look with Code Insight tells a very different story.

1527ef7ac7f79bb1a61747652fd6015942a6c5b18b4d7ac0829dd39842ad735d

According to Code Insight: “This SVG file executes an embedded JavaScript payload upon rendering. The script decodes and injects a Base64-encoded HTML phishing page impersonating a Colombian government judicial system portal. To deceive the user, it simulates a file download with a progress bar, while in the background, it decodes a second, large Base64 string, which is a malicious ZIP archive, and forces its download.”

We validated this behavior by opening the sample in a controlled environment. As shown in the screenshots below, the fake portal is rendered exactly as described, simulating an official government document download process. The phishing site includes case numbers, security tokens, and visual cues to build trust, all of it crafted within an SVG file.


Despite its zero detections, this SVG hides two layers of abuse:

  • A convincing phishing lure, injected via inline JavaScript and decoded on-the-fly
  • A malware dropper, silently extracting and triggering the download of a ZIP file in the background

This is exactly the kind of threat Code Insight is meant to catch: well-crafted, script-based attacks that fly under the radar.

A deeper look: from one SVG to a full campaign

Curiously, the malicious SVG we highlighted earlier wasn’t just any random sample, it was one of the very first files submitted right after we deployed SVG support in Code Insight. A coincidence? Or were we seeing the tip of something bigger?

Thanks to VirusTotal Intelligence, we can search through our massive sample collection using hundreds of parameters, including queries that look inside Code Insight reports. So we ran:

type:svg AND codeinsight:"Colombian"


And voilà: 44 unique SVG files surfaced, all undetected by antivirus engines, but all flagged by Code Insight as part of the same phishing and malware campaign.

Diving into the source code of these SVGs, we found:

  • Code obfuscation techniques
  • Use of polymorphism, with slight changes in every file
  • And large amounts of dummy (garbage) code to increase entropy and evade static detection.

But Code Insight had no problem cutting through the noise.

One thing stood out: the attackers left Spanish-language comments in their scripts, with phrases like "POLIFORMISMO_MASIVO_SEGURO" and "Funciones dummy MASIVAS". While most of the code changed from sample to sample, those comments stayed exactly the same, a clear weakness, and a perfect signature for a simple YARA rule.


So we wrote a very basic one:


Running a retrohunt over the last year with this basic rule returned 523 matches.


Sorting by submission time, the first sample dates back to August 14, 2025, also submitted from Colombia, and also with 0 antivirus detections at the time.


We reanalyzed that first sample with the current version of Code Insight, and again, it produced an accurate description of the phishing page and malware dropper, impersonating the Colombian Fiscalía General de la Nación.

Looking deeper, we saw that the earliest samples were larger, around 25MB, and the size decreased over time, suggesting the attackers were evolving their payloads. Most importantly, the distribution vector was email, allowing us to pivot into delivery metadata: senders, subjects, attachment names, and more.


Final thoughts

SWF and SVG are very different formats from very different eras, but both can still cause headaches for analysts.

In the first case, Code Insight helped explain why a SWF file looked suspicious without actually being malicious. In the second, it uncovered malicious behavior in an SVG that had gone completely undetected.

This is where Code Insight helps most: giving context, saving time, and helping focus on what really matters. It’s not magic, and it won’t replace expert analysis, but it’s one more tool to cut through the noise and get to the point faster. And when Code Insight and VirusTotal Intelligence work together, one suspicious sample can become the key to revealing an entire campaign.