Thursday, August 28, 2025


Integrating Code Insight into Reverse Engineering Workflows

More than two years have passed since we announced the launch of Code Insight at RSA 2023. Since then, we have applied this technology in different scenarios and extended it to new file formats (1, 2).

As we advance in the automated analysis of new files with Code Insight, we also want to bring this type of technology to the analysis of disassembled or decompiled code.

To that end, we have created a new endpoint that receives a block of code and returns a description of its functionality, highlighting the aspects most relevant to malware analysts. Queries can be chained: each request can include previous analyses, together with any modifications or corrections made by the analyst. This significantly reduces the reverse engineering workload by giving the analyst an assistant that pre-analyzes the functions deemed interesting and acquires knowledge as the analysis proceeds.

This endpoint can be integrated into any reverse engineering tool that processes disassembled or decompiled code. As an implementation example, the VirusTotal plugin for IDA Pro has been updated to support its use from the IDA interface. This offers a simple way to integrate relevant analyses into a notebook, allowing the analyst to keep responses that play a direct role in understanding how the code works.

Endpoint for reversed code queries

Using this new endpoint is quite simple—just make a request to the API as shown in the following example:

import requests

API_URL = 'https://www.virustotal.com'
endpoint = 'api/v3/codeinsights/analyse-binary'
headers_apiv3 = {
    'Accept': 'application/json',
    'Content-Type': 'application/json',
    'x-apikey': API_KEY,  # placeholder: your VirusTotal API key
}

payload = {
    'code': code_base64,          # placeholder: Base64-encoded code to analyze
    'code_type': 'disassembled',  # or 'decompiled'
}

# The payload travels inside the 'data' field of the JSON body.
response = requests.post(f'{API_URL}/{endpoint}',
                         json={'data': payload},
                         headers=headers_apiv3)
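
The request above expects the code in a `code_base64` variable. The following is a minimal sketch of one way to produce it, assuming standard Base64 over the UTF-8 bytes of the listing; the exact Base64 variant the endpoint expects is not stated here, and `encode_code` is a hypothetical helper name.

```python
import base64


def encode_code(code_text: str) -> str:
    """Return a code listing as an ASCII Base64 string.

    Assumption: standard Base64 over UTF-8 bytes. The endpoint may
    expect a different variant (for example URL-safe Base64).
    """
    return base64.b64encode(code_text.encode("utf-8")).decode("ascii")


# Example: encode a short disassembled snippet copied from the tool.
code_base64 = encode_code("push ebp\nmov ebp, esp\nret")
```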


This Python code sends a request to the endpoint located at ‘https://www.virustotal.com/api/v3/codeinsights/analyse-binary’. The code to be analyzed is included in the ‘payload’ variable, which can also carry previous analyses, as follows:

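payload = {
    'code': code_base64,          # Base64-encoded code to analyze
    'code_type': 'disassembled',  # or 'decompiled'
    'history': [                  # previous requests and their responses
        {
            'request': code_base64,  # code sent in an earlier request
            'response': {
                'summary': text,      # placeholder: summary text
                'description': text,  # placeholder: description text
            },
        },
        {
            'request': code_base64,
            'response': {
                'summary': text,
                'description': text,
            },
        },
    ],
}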


The request is divided into two parts: the first includes the code being analyzed (‘code’ and ‘code_type’), and the second includes previous requests—potentially reviewed by the analyst—that provide context for analyzing the queried code.
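The two-part structure can be assembled with a small helper. This is only a sketch: `build_payload` is a hypothetical name, and leaving out ‘history’ when there are no previous analyses is an assumption based on the first example, which has no such field.

```python
from typing import Any, Dict, List, Optional

VALID_CODE_TYPES = ("disassembled", "decompiled")


def build_payload(code_base64: str,
                  code_type: str,
                  history: Optional[List[Dict[str, Any]]] = None) -> Dict[str, Any]:
    """Assemble the request payload: the queried code plus prior context."""
    if code_type not in VALID_CODE_TYPES:
        raise ValueError(f"code_type must be one of {VALID_CODE_TYPES}")

    # Part one: the code being analyzed.
    payload: Dict[str, Any] = {"code": code_base64, "code_type": code_type}

    # Part two: previous requests, possibly reviewed by the analyst.
    # Assumption: the field is omitted when there is no history yet.
    if history:
        payload["history"] = [
            {
                "request": item["request"],
                "response": {
                    "summary": item["response"]["summary"],
                    "description": item["response"]["description"],
                },
            }
            for item in history
        ]
    return payload
```

The result is what would be sent as the value of ‘data’ in the JSON body of the request.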

The response contains a general description of how the submitted code snippet works ("summary") and a more detailed text describing how that functionality is carried out ("description"). This lets the analyst quickly check whether the function contains any behavior of interest, and then either review its execution steps or discard the function as irrelevant.
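The exact JSON envelope around "summary" and "description" is not described here, so the following sketch makes no assumption about it: it walks the decoded response and returns the first object that carries both fields. `find_analysis` is a hypothetical helper, not part of the API or the plugin.

```python
from typing import Any, Dict, Optional


def find_analysis(node: Any) -> Optional[Dict[str, Any]]:
    """Return the first object holding both 'summary' and 'description'.

    Searches nested dicts and lists depth-first; returns None if no
    such object exists (for example, in an error response).
    """
    if isinstance(node, dict):
        if "summary" in node and "description" in node:
            return {"summary": node["summary"],
                    "description": node["description"]}
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None

    for child in children:
        found = find_analysis(child)
        if found is not None:
            return found
    return None
```

With a `requests` response object, this could be called as `find_analysis(response.json())`, after checking the HTTP status code.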

New version of the VT-IDA Plugin for IDA Pro

Along with this new endpoint, we have updated the VirusTotal plugin to show how this new functionality can be integrated into the analyst's workflow.

This new functionality can be used as follows:
  1. The analyst selects a function from the disassembled or decompiled code to be analyzed.
  2. If the response provided by the endpoint is satisfactory and reveals an interesting function, they can click ‘Accept’ to include it in a list of selected functions, which we call the ‘CodeInsight Notebook’. They can also make modifications to the ‘Summary’ and ‘Description’ fields to correct errors or add information that helps put the code in context.
  3. With each new request sent to the endpoint, all previously stored functions are included—along with any modifications made by the analyst. This allows for more accurate analyses based on previously obtained and reviewed results.
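
The steps above can be sketched as a small data structure. This is an illustration of the idea only, not the plugin's actual implementation: the class and method names are hypothetical, and the output format mirrors the ‘history’ field of the request payload.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class NotebookEntry:
    request: str       # Base64-encoded code that was analyzed
    summary: str       # editable by the analyst
    description: str   # editable by the analyst


@dataclass
class CodeInsightNotebook:
    entries: List[NotebookEntry] = field(default_factory=list)

    def accept(self, request: str, summary: str, description: str) -> NotebookEntry:
        """Store an analysis the analyst accepted (step 2)."""
        entry = NotebookEntry(request, summary, description)
        self.entries.append(entry)
        return entry

    def history(self) -> List[Dict[str, Any]]:
        """Render all stored entries, edits included, for the next request (step 3)."""
        return [
            {"request": e.request,
             "response": {"summary": e.summary, "description": e.description}}
            for e in self.entries
        ]


notebook = CodeInsightNotebook()
entry = notebook.accept("QUJD", "Anti-disassembly stub", "Modifies the return address.")
# The analyst corrects the description; the next request carries the edit.
entry.description += " Execution continues at the hidden code."
next_history = notebook.history()
```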
Here’s how the new version of the plugin would look after a few iterations on a malware sample:



A practical example

Let's illustrate the benefits of the new plugin with a practical example. Imagine an analyst needs to analyze a malicious binary file to understand its function. This is typically a time-consuming and complex process, but with the help of Code Insight, their workflow becomes significantly more efficient:

  1. Targeted Analysis: The analyst selects a code block they suspect might be malicious and uses the endpoint to get an automated analysis.

    The code shown below implements an anti-disassembly technique aimed at generating disassembled code that hides malicious functionality through a hidden jump to a memory address. Essentially, the resulting disassembled code is unreliable, as it doesn’t accurately represent the code that will actually be executed.



  2. Review and Refinement: At this point, a request is made to obtain an initial analysis of the code. The analyst reviews the response and can modify both the ‘Summary’ and ‘Description’ fields with their own notes or corrections.

    In this case, the obtained code analysis correctly identifies an anti-disassembly technique that modifies the return address. However, it does not provide information about a possible return address that would help the analyst locate the hidden code.

    At this point, the analyst can modify the output provided by the endpoint to explain how this technique works. This way, the acquired knowledge can be used in the analysis of other code blocks within the sample. To do so, the analyst simply needs to include the (reviewed) analysis in the list of analyzed functions by clicking the ‘Accept’ button.



  3. Iterative Analysis and Improved Results: The file analysis continues, and each new request includes the list of analyzed functions, which effectively represents the knowledge acquired from analyzing the code selected by the analyst.


As shown in the previous image, this knowledge is then applied to queries for other functions that employ a technique similar to the one previously discussed, this time providing more details about how it works and alerting the analyst to the possibility of a jump to an address containing hidden code.

Quick Tips

The endpoint offers some interesting features for the analyst. For example, as shown in the following figure, it detects strings written in languages other than English, provides a translation, and pinpoints their location in memory.



In addition, although analyzing assembly code has its own pros and cons compared to decompiled code, we can gain further benefits by analyzing a decompiled function whose disassembled code has previously been analyzed and stored in the CodeInsight Notebook.

For example, let's look at the decompiled code of a function previously analyzed in its disassembled version:


The image below illustrates how analyzing a decompiled function becomes richer with the help of the previously stored analysis of its disassembled code. This happens because certain features, like text strings, are visible in the disassembled code but often missing from the decompiled version.

As a result, Code Insight can provide a more concise and direct explanation by leveraging the decompiled view, which is supported by the disassembled code.



It is important to highlight that both the endpoint and this new feature of the plugin for IDA Pro are offered in trial mode, with the aim of involving the community in the progress we are making in its application to the field of reverse engineering. Although the results produced by this new functionality have been very positive during the testing phase, it is possible that the output generated by the endpoint may not be 100% accurate and could contain errors or omit some relevant details of the analysis.

We are confident that this new integration will be a great help to analysts who are gradually incorporating LLM capabilities into their workflow. As we continue to harness the power of AI, your feedback is incredibly valuable to us. Stay connected for future updates, and thank you for your continued support.

