TL;DR: We ran our new AI-based Mach-O analysis pipeline in production on raw Apple binaries alone: no metadata, no prior detections. On Oct 18, 2025, out of 9,981 first-seen samples, VT Code Insight surfaced multiple real Mac and iOS malware cases that had 0 antivirus detections at submission time, including a multi-stage AppleScript infostealer and an iOS credential-stealing tweak. It also helped identify 30 antivirus false positives, which were later confirmed and fixed.
By Bernardo Quintero, Tom Bennett, and Paul Tarter
The Challenge: Reversing at Scale
The long-term goal of Code Insight is ambitious but simple to state: use AI to reason about every single file that reaches VirusTotal in real time. That’s more than two million samples a day, so scalability and efficiency aren’t nice-to-haves; they’re requirements.
We started this journey in early 2023 by analyzing small PowerShell scripts under 25 KB, focusing on fast, context-limited reasoning. As Gemini’s token capacity grew, we expanded support to larger files and richer formats: Office documents with macros, PDFs containing embedded objects, and other file and package types such as NPM, SWF, SVG, MCP, CRX, and VSIX. Each step pushed the boundaries of what Code Insight could interpret automatically.
Eventually, we reached compiled binaries, by far the most challenging class due to their size, complexity, and low-level structure. Analyzing native code with large language models is not straightforward: Mach-O binaries can be massive, and full decompilation or disassembly often exceeds even the largest model contexts, while being too slow and expensive for a high-volume production pipeline.
To make this feasible, we built a pruning-based summarization layer. Instead of feeding Gemini a full decompilation or noisy disassembly, we first extract the most informative elements: code entry points, key imports and exports, relevant strings, and selected function summaries, using Binary Ninja’s High Level Intermediate Language (HLIL) for native code. The goal isn’t to reconstruct the full program logic, but to preserve just enough structure for meaningful reasoning.
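To make the pruning idea concrete, here is a minimal sketch of ranking functions by how "interesting" they look and keeping only the top few. It assumes a hypothetical `FunctionInfo` record that a disassembler such as Binary Ninja would populate (entry-point flag, called imports, referenced strings, HLIL text); the API names, string markers, and weights below are illustrative assumptions, not Code Insight’s actual scoring.

```python
from dataclasses import dataclass, field


@dataclass
class FunctionInfo:
    # Hypothetical per-function record; field names are assumptions.
    name: str
    hlil: str  # decompiled HLIL text for the function
    calls: list[str] = field(default_factory=list)    # imported APIs it calls
    strings: list[str] = field(default_factory=list)  # strings it references
    is_entry: bool = False


# Illustrative signals only; a real pipeline would tune these.
INTERESTING_CALLS = {"NSTask", "system", "popen", "execve",
                     "SecItemCopyMatching", "NSURLSession", "dlopen"}
INTERESTING_MARKERS = ("http://", "https://", "/tmp/", "osascript", ".plist")


def score(fn: FunctionInfo) -> int:
    """Heuristic interest score: entry points first, then risky APIs and strings."""
    s = 10 if fn.is_entry else 0
    s += 3 * sum(1 for c in fn.calls if c in INTERESTING_CALLS)
    s += 2 * sum(1 for st in fn.strings
                 if any(m in st for m in INTERESTING_MARKERS))
    return s


def prune(functions: list[FunctionInfo], keep: int) -> list[FunctionInfo]:
    """Keep up to `keep` highest-scoring functions, dropping zero-score ones."""
    ranked = sorted(functions, key=score, reverse=True)
    return [f for f in ranked[:keep] if score(f) > 0]
```

Only the surviving functions’ HLIL would then be summarized, which is what keeps the representation small regardless of how large the original binary is.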
This distilled representation fits comfortably within Gemini’s 1M-token context window and allows us to generate a concise, human-readable analyst summary in a single LLM call, regardless of the binary’s size. It’s a pragmatic balance between depth and scalability: detailed enough for meaningful reasoning and a fast first-pass triage, yet efficient enough to keep up with the continuous flow of new files reaching VirusTotal every day.
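One way to guarantee that the distilled input always fits a single call is to assemble it section by section against a token budget. The sketch below assumes sections arrive in priority order and uses a rough four-characters-per-token estimate; both the heuristic and the `build_prompt` helper are hypothetical, since real token counts depend on the model’s tokenizer.

```python
# Rough heuristic (an assumption): about 4 characters per token.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Ceiling of len(text) / CHARS_PER_TOKEN."""
    return (len(text) + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN


def build_prompt(sections: list[tuple[str, str]], budget_tokens: int) -> str:
    """Append (title, body) sections in priority order, skipping any section
    whose estimated cost would exceed the remaining budget."""
    parts: list[str] = []
    used = 0
    for title, body in sections:
        chunk = f"## {title}\n{body}\n"
        cost = estimate_tokens(chunk)
        if used + cost > budget_tokens:
            continue  # too large now; a smaller, later section may still fit
        parts.append(chunk)
        used += cost
    return "".join(parts)
```

Skipping rather than stopping at the first oversized section lets small but useful sections (such as a short strings list) still make it into the prompt.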
The 24-Hour Stress Test
On October 18, 2025, VirusTotal received 9,981 unique Mach-O binaries never seen before by our platform. We ran every single one through our new Code Insight pipeline, using only the raw binary, no external metadata, no crowdsourced intelligence, and no previous antivirus results.
Here’s how the AI’s fully independent analysis compared against the aggregate detections from more than 70 traditional antivirus engines on that same day:
- Traditional AV Detections: 67 binaries flagged as malicious by one or more engines.
- Code Insight Detections: 164 binaries identified as malicious.
The absolute numbers are interesting, but the real insight comes from the discrepancies between the two sets.
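Those discrepancies fall out of simple set arithmetic over sample hashes. The sketch below is a hypothetical illustration (the hash values in any usage are placeholders, not the real October 18 data): samples flagged only by antivirus engines become false-positive candidates, and samples flagged only by the AI become candidates for undetected threats.

```python
def triage_buckets(av_flagged: set[str], ai_flagged: set[str]) -> dict[str, set[str]]:
    """Split sample hashes by agreement between AV engines and an AI verdict."""
    return {
        "both": av_flagged & ai_flagged,     # consensus: likely malicious
        "av_only": av_flagged - ai_flagged,  # review as possible AV false positives
        "ai_only": ai_flagged - av_flagged,  # review as possible undetected threats
    }
```

Each non-consensus bucket still needs manual review; the buckets only decide where an analyst should look first.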
Clearing the Fog: AI as a False Positive Filter
For the binaries that antivirus engines flagged but Code Insight did not, manual review confirmed that Code Insight’s explanations were accurate:
- 30 files were false positives from Microsoft’s engine. Once reported, Microsoft promptly reviewed the cases, confirmed the issue, and updated their signatures on October 31. We appreciate their quick response.
- 3 files were flagged by ClamAV with the signature Macos.Trojan.CrackedTool. While this label is technically correct within ClamAV’s detection policy (it flags software signed by alternative markets such as MacKed), Code Insight correctly identified that these binaries did not exhibit inherently malicious behavior.
In a Security Operations Center setting, even a single false alert like this could consume hours of analyst time before being cleared. While VirusTotal operates at global scale and aggregates samples from many independent sources, the same pattern applies within any organization’s network: unnecessary alerts create noise and drain resources. Code Insight demonstrated how AI reasoning can help triage these cases faster and more consistently, assisting rather than replacing human judgment.
Finding the Needles: Zero-Day Detections
Beyond filtering false positives, Code Insight also surfaced nearly 100 binaries that traditional engines had missed entirely at the time of analysis. Many of these were indeed suspicious, ranging from keygens and adware to grayware utilities with excessive privileges, such as certain developer e-learning tools or Roblox cheats often distributed outside the App Store.
That said, not every “malicious” verdict was black and white. Because Code Insight analyzes binaries in isolation, without context about their surrounding environment or intended use, it can occasionally err on the side of caution. For instance, one MCP component from the Hopper decompiler was described accurately in terms of behavior (persistent XPC communication, a JSON-based client–server protocol, API-like command handlers), but it was ultimately benign within its legitimate application context: an MCP server, not a malicious persistent C2 channel. In this case, the technical description was accurate, but Code Insight’s final verdict was a false positive.
These occasional gray-area cases are part of the natural learning curve for AI-based reasoning systems. Still, the vast majority of Code Insight’s findings were technically sound, and its detailed explanations allowed analysts to make quick, informed decisions based on actual capabilities rather than static signatures.
Among those findings, we also identified several clear-cut cases of undetected malware, confirmed through manual reversing and reproducible behavior. Below we highlight two representative examples, one from macOS and another from iOS, both caught by AI yet completely invisible to traditional defenses on Day 0.
1. Multi-stage macOS Dropper (0 Detections)
SHA-256: 9adef73a6255f6bcb203e84cbe9304d000f3c5354d3d7bf3fc3b2a0128b624c3
Code Insight immediately recognized this binary's hostile intent, describing it as a multi-stage threat. It didn't just flag it; it mapped the attack chain:
"The binary is a multi-stage malware that downloads and executes a second-stage AppleScript payload from a C2 server, and exfiltrates local data. It first connects to https://foggydoxz.xyz/dynamic to download an AppleScript, saves it to /tmp/test.scpt, and executes it using /usr/bin/osascript. Subsequently, it reads /tmp/osalogging.zip and exfiltrates it via a POST request to https://foggydoxz.xyz/gate. The malware also bypasses TLS certificate validation to secure its C2 communication."
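The chain in that summary (download, drop to /tmp, run via osascript, stage an archive for exfiltration) can also be expressed as a co-occurrence rule over strings extracted from a binary. The sketch below is a hypothetical illustration, not a production signature: requiring every stage reduces noise from isolated hits, but it would miss variants that build these strings at runtime.

```python
import re

# One illustrative pattern per stage of the chain described above.
STAGES = {
    "download": re.compile(r"https?://\S+"),
    "drop_script": re.compile(r"/tmp/\S+\.scpt"),
    "execute": re.compile(r"/usr/bin/osascript"),
    "stage_archive": re.compile(r"/tmp/\S+\.zip"),
}


def matched_stages(strings: list[str]) -> set[str]:
    """Names of the stages for which at least one extracted string matches."""
    return {name for name, pat in STAGES.items()
            if any(pat.search(s) for s in strings)}


def looks_like_applescript_dropper(strings: list[str]) -> bool:
    # Require every stage, so a lone URL or osascript path does not fire.
    return matched_stages(strings) == set(STAGES)
```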
Manual reversing confirmed not only the verdict but every detail of this AI-generated assessment.
The screenshot above shows how detections for this sample evolved on VirusTotal over time. When it first arrived on October 18, no antivirus engine flagged it as malicious, yet Code Insight already identified it as a multi-stage macOS dropper. Over the following days, traditional detections gradually caught up: three engines marked it nine days later, and eleven by October 28. This pattern is common for truly novel threats: AI reasoning can expose suspicious behaviors before signatures or reputation systems are updated, offering analysts an early warning window that would otherwise not exist.
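The size of that early-warning window is easy to quantify from the timeline. This small sketch reads "nine days later" as October 27 (our interpretation of the text) and computes the lag until a given number of engines flag the sample; the helper name is hypothetical.

```python
from datetime import date
from typing import Optional

# Engine counts from the timeline described above.
FIRST_SEEN = date(2025, 10, 18)
DETECTIONS = {date(2025, 10, 27): 3, date(2025, 10, 28): 11}


def detection_lag_days(first_seen: date, detections: dict[date, int],
                       min_engines: int = 1) -> Optional[int]:
    """Days from first submission until at least `min_engines` engines flag the sample."""
    for day in sorted(detections):
        if detections[day] >= min_engines:
            return (day - first_seen).days
    return None  # threshold never reached within the observed timeline


print(detection_lag_days(FIRST_SEEN, DETECTIONS))      # 9
print(detection_lag_days(FIRST_SEEN, DETECTIONS, 10))  # 10
```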
2. iOS Jailbreak Tweak with a Phishing Twist (0 Detections)
SHA-256: 333913409c1e22b5da03c762cbb7d99a9d38ecdf0231cb9ac6db00efc6b3bd97
This sample masquerades as a dynamic library for jailbroken iOS devices, claiming to unlock premium features in Adobe Lightroom. Code Insight looked beyond the piracy functionality and uncovered a secondary payload focused on credential theft.
The AI correctly identified that it used method swizzling / hooking not only to bypass subscription checks but also to inject a fake login prompt. It highlighted obfuscation mechanisms used to conceal the exfiltration channel, including a hardcoded, obfuscated Telegram Bot API token and custom cryptographic routines to hide command strings.
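The post does not describe the sample’s actual obfuscation routines, so the sketch below uses repeating-key XOR purely as a stand-in to show how an analyst might recover hidden strings with a known-plaintext guess. Searching for `api.telegram.org`, the host used by Telegram’s Bot API, is an assumption about what the decoded string might contain; `xor_bytes` and `find_single_byte_keys` are hypothetical helpers.

```python
def xor_bytes(data: bytes, key: bytes) -> bytes:
    # XOR is its own inverse, so the same routine encodes and decodes.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def find_single_byte_keys(blob: bytes, marker: bytes) -> list[int]:
    """Single-byte XOR keys under which `blob` decodes to bytes containing `marker`."""
    return [k for k in range(256) if marker in xor_bytes(blob, bytes([k]))]
```

Real samples often use multi-byte keys or custom ciphers, in which case brute force gives way to emulating or reimplementing the decoding routine the disassembly reveals.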
Code Insight’s summary read:
"This is an iOS dynamic library (tweak) for jailbroken devices, designed to be injected into the Adobe Lightroom application process. Its primary purpose is to modify the target application's functionality. It exhibits several malicious behaviors: it uses method swizzling (e.g., `sub_41e564`) to hook functions, displays a custom UI overlay on top of the running application, and employs extensive string obfuscation and custom cryptographic routines (e.g., `sub_433b0c`, `sub_415b68`) to hide its functionality. Key IOCs include the installation path `/Library/MobileSubstrate/DynamicLibraries/Lightroom.dylib` and a URL to a Telegram channel (`https://t[.]me/blatants`), likely used for C2 or distribution. These characteristics are consistent with malware designed for piracy, credential theft, or phishing within the context of the compromised host application."
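Summaries like this one are easy to mine for indicators automatically. The sketch below is a hypothetical helper, not part of Code Insight: it refangs common defanging (`[.]`, `hxxp`) and pulls out URLs, MobileSubstrate install paths, and `sub_`-style function names with regular expressions tuned to this summary’s formatting.

```python
import re

URL_RE = re.compile(r"https?://[^\s`)\]]+")
DYLIB_PATH_RE = re.compile(r"/Library/MobileSubstrate/DynamicLibraries/[^\s`]+\.dylib")
FUNC_RE = re.compile(r"\bsub_[0-9a-f]+\b")


def refang(text: str) -> str:
    """Undo common defanging: '[.]' -> '.' and 'hxxp' -> 'http'."""
    return text.replace("[.]", ".").replace("hxxp", "http")


def extract_iocs(summary: str) -> dict[str, list[str]]:
    """Collect deduplicated, sorted indicators from a free-text summary."""
    text = refang(summary)
    return {
        "urls": sorted(set(URL_RE.findall(text))),
        "install_paths": sorted(set(DYLIB_PATH_RE.findall(text))),
        "functions": sorted(set(FUNC_RE.findall(text))),
    }
```

Refanged URLs should be kept out of clickable contexts; the refanging here exists only so the patterns can match.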
A manual reverse-engineering review by an expert confirmed and expanded on the AI’s assessment. The human analysis described the sample as a malicious dynamic library that functions as a dual-purpose tool: (1) it uses method hooking to bypass Lightroom’s premium-feature checks (by replacing subscription validation routines to always return success), and (2) it implements a phishing capability that displays a convincing fake login prompt to capture Adobe credentials. The stolen credentials are then exfiltrated via an obfuscated Telegram Bot API token and Chat ID, with string obfuscation and lightweight crypto used to hide the Telegram URL and tokens. In short, the manual review corroborated the AI’s technical description and confirmed the end-to-end exfiltration mechanism.
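The subscription bypass boils down to replacing a method’s implementation at runtime. In a real tweak this happens through the Objective-C runtime (for example `method_exchangeImplementations`) or a hooking framework such as Cydia Substrate’s `MSHookMessageEx`; the Python sketch below is only an analogy using monkeypatching, with a hypothetical `SubscriptionService` class. It also shows the defensive flip side: recording the original implementation makes the replacement detectable, although code injected into the same process can in practice defeat such in-process checks.

```python
class SubscriptionService:
    # Hypothetical stand-in for an app's license check.
    def is_premium(self) -> bool:
        return False


# Record the original implementation at startup so later tampering is visible.
_ORIGINAL_IS_PREMIUM = SubscriptionService.__dict__["is_premium"]


def hook_always_true(cls: type) -> None:
    """Replace the method at runtime, analogous to swizzling a validation routine."""
    cls.is_premium = lambda self: True


def is_tampered(cls: type) -> bool:
    """True if the class's is_premium is no longer the recorded original."""
    return cls.__dict__.get("is_premium") is not _ORIGINAL_IS_PREMIUM
```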
The VirusTotal report below shows the status of this sample not only at the time it was first analyzed on October 18, but also as of November 3, more than two weeks later. No antivirus engine has flagged it as malicious to date.
Moreover, on the same day this sample was analyzed, Code Insight detected multiple other binaries using the same injection framework, suggesting an organized campaign rather than an isolated specimen.
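The post does not say how the related binaries were linked, but one simple way to surface a campaign is an inverted index from each indicator to the samples that contain it. In this hypothetical sketch, sample hashes and indicator sets are placeholders; any indicator shared by at least `min_size` samples becomes a candidate cluster for an analyst to review.

```python
from collections import defaultdict


def cluster_by_shared_ioc(samples: dict[str, set[str]],
                          min_size: int = 2) -> dict[str, set[str]]:
    """Map each IOC to the sample hashes containing it, keeping only IOCs
    shared by at least `min_size` samples."""
    index: dict[str, set[str]] = defaultdict(set)
    for sha256, iocs in samples.items():
        for ioc in iocs:
            index[ioc].add(sha256)
    return {ioc: hashes for ioc, hashes in index.items() if len(hashes) >= min_size}
```

Shared indicators such as a common Telegram URL or install path are a starting point, not proof of common authorship; they still need confirmation through code similarity or manual review.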
Further investigation of the Telegram channel referenced in the IOCs (https://t[.]me/blatants) revealed that it hosts a large-scale operation called Blatant’s iPA Library, boasting more than 38,000 subscribers. The group distributes automation bots (InjectBot, PatchBot, PaidAppScraper, and FileDownloader) that advertise the ability to inject .dylib payloads into iOS .ipa apps, patch premium features, and share modified packages. This infrastructure closely matches the behaviors described in the AI-generated report and confirmed through manual reversing: a dual-purpose ecosystem for app piracy and credential theft, powered by Telegram’s bot API.
These examples illustrate both the power and practicality of AI-driven reversing. Even without context or prior knowledge, the model can reason through complex binaries, extract intent, and expose behaviors that remain invisible to static or signature-based methods.
The Pragmatic Reality
This work is not about replacing traditional detection engines; it’s about complementing them and covering their blind spots at a scale human teams simply can’t match.
Until recently, reverse engineering and in-depth code analysis were tasks reserved for human analysts. Even in large-scale operations, fewer than 1% of new files ever underwent that level of scrutiny, simply because manual reversing doesn’t scale. Yet those are precisely the samples that tend to slip past signature or ML-based detections, the truly novel threats.
By autonomously performing this kind of junior-analyst–level reasoning across millions of files daily, VT Code Insight brings that deeper layer of understanding to every new sample, not just the few that would normally reach a human analyst’s desk.
It’s a pragmatic shift: AI reasoning where it scales, human expertise where it matters most, helping defenders see further, faster, and with greater context than ever before.

