Friday, May 12, 2023

VT Code Insight: Updates and Q&A on Purpose, Challenges, and Evolution

Following the announcement of VirusTotal Code Insight at the RSA Conference 2023, we've been thrilled by the overwhelmingly positive response from the cybersecurity community. As enthusiasm grows, we've been flooded with inquiries from those keen to discover more about Code Insight. To address these questions, we've put together a Q&A covering popular topics, including news about the tool's expanded capabilities. Our aim is to establish realistic expectations and provide a comprehensive understanding of Code Insight's purpose, challenges, and ongoing progress.

In case you missed our presentation at RSA, we compiled a brief video showcasing a few use cases:

What's New?

  • Code Insight has broadened its support for script formats, moving beyond PowerShell to offer analysis for a variety of scripting languages. Here are some examples:
  • The maximum file size limit for files processed by Code Insight has been doubled, allowing for analysis of larger files.
  • The model delivers more concise and focused high-level explanations, placing greater emphasis on code behavior.
  • The user interface has been redesigned to display only the first sentences of the report by default, with the option for users to expand the report as needed, preventing extensive reports from overwhelming the default view.

Q: Is Code Insight replacing current detection technologies?
A: No, Code Insight is no substitute for current detection technologies. Rather, it's an additional layer of support designed to boost performance of security analysts, offer a different perspective on the technical details and help the cybersecurity industry improve their technologies.

Q: Is Code Insight replacing human analysts in cybersecurity roles?
A: No, Code Insight is not designed to replace human analysts in cybersecurity. Instead, it serves as a powerful assistant and partner, enhancing their performance and effectiveness. The major advantage of Code Insight is its scalability, as it can act as an extremely productive junior analyst, working 24x7 and analyzing a vast number of files. This enables human analysts to receive preliminary analyses to help in their evaluation and focus on the most interesting or challenging samples. Ultimately, Code Insight's goal is to augment human expertise, rather than replace it, and to help the cybersecurity industry evolve by improving both technologies and workflows.

Q: Why is there a limitation on the size of PowerShell files analyzed with Code Insight?
A: The limitation on the size of PowerShell files analyzed with Code Insight stems from the inherent constraints of large language models (LLMs) concerning token inputs. LLMs have a maximum token limit for each input, which affects the amount of code that can be processed in a single pass.

Like we mentioned earlier in this blog post, Code Insight can now handle files twice the size it could before, and we're not stopping there. We're going to keep working on improving this aspect in the coming months. To overcome the limitations and increase the size of the analyzed code, we are developing several strategies. One approach involves breaking down larger code snippets into smaller chunks, processing them separately, and then combining the results. Another strategy focuses on refining the LLM's pre-processing and tokenization methods to optimize the handling of larger input sizes more efficiently.

Q: Can Code Insight analyze executables?
A: While Code Insight currently focuses on analyzing scripting languages, we are actively working on expanding its capabilities to handle executables and other file formats. Our approach comprises multiple stages, such as disassembling and decompiling, which enable the large language model to effectively generate natural language summaries of the analyzed code.

Q: What are some other limitations of Code Insight and improvements planned in the roadmap?
A: Code Insight is continuously learning and improving its accuracy based on experience and user feedback. One of the current limitations is it only uses code as its unique input, without any additional context about the cases being analyzed. Sometimes, it can be challenging to determine if something is legitimate or malicious based solely on the code. For example, a code snippet that downloads a remote file and executes it could either be a legitimate installer or a malware downloader. This lack of context can sometimes lead to model hallucinations, where the model may make incorrect assumptions about the code’s intent.

To address this limitation, we have plans in our roadmap to provide more context to Code Insight, ultimately improving its criteria and deep analysis capabilities. This includes giving access to any metadata related to the URLs and files linked in the code snippet. By incorporating additional context, Code Insight will be better equipped to distinguish between legitimate and malicious behavior, enhancing its overall effectiveness.

Q: Can attackers develop strategies to confuse Code Insight and provoke mistakes in its judgment?
A: Yes, it was anticipated, and indeed, it has been happening from the very moment Code Insight was announced. As we highlighted in our prior blog post, it's crucial to acknowledge that attackers can devise strategies to deliberately confuse LLMs when analyzing code and cause them to make errors in judgment. This is similar to how they continually refine their tactics to bypass or mislead traditional detection technologies such as antivirus, EDRs, sandboxing, IDS/IPS, and so on.

Q: Have you detected any active malware specifically crafted to confuse Code Insight?

A: So far, we haven't detected any malware designed to confuse Code Insight that has been used in real attacks. However, we have observed numerous proof-of-concept attempts testing prompt injections and other tricks during the first few days after the announcement, as expected. We remain vigilant and anticipate more genuine attempts in the future if attackers and malware creators start to recognize the effectiveness of Code Insight in enhancing cybersecurity measures.

Q: Do you plan to filter out crafted samples designed to confuse Code Insight?

A: No, the nature of VirusTotal is to scan everything, including malware with advanced obfuscation or any kind of attack technique aimed at bypassing detection capabilities. Our mission is to provide our partners and security teams with access to any observed threat to learn and improve product performance and overall security posture. Embracing failure and learning from mistakes is a crucial part of the journey to improve and evolve Code Insight. It continuously learns from attackers' attempts to confuse its judgment, adapting and enhancing its capabilities to remain a valuable tool for security analysts. By analyzing a wide range of samples, including those crafted to cause confusion, Code Insight gains valuable insights that contribute to its ongoing development and effectiveness.

Q: How can I help improve Code Insight?

A: Your feedback is critical to learning and evolving Code Insight. The community's input helps us identify areas for improvement and fine-tune the system for better performance. You can contribute by providing feedback through the "Thumbs Up" and "Thumbs Down" buttons, which you can find in the bottom right corner of the Code Insight reports. These buttons let you express positive or negative feedback on the analysis results. By sharing your insights, you play a valuable role in enhancing the capabilities of Code Insight and driving its ongoing development.


Q: How does Code Insight integrate with VirusTotal Intelligence?

A: Code Insight's true value lies in its ability to scale analysis through the VirusTotal platform, allowing for the examination of individual code samples and the aggregation and exploitation of results on a large scale via VirusTotal Intelligence. All results generated by Code Insight are already easily searchable, pivotable, and accessible through API calls. This enables security teams to quickly and effectively scrutinize vast quantities of code and identify potential threats, ultimately enhancing their security stance.

Example of VTI query with Code Insight include to search samples related to Telegram:

Screenshot 2023-05-05 10.52.55

Q: Are there any additional functionalities of Code Insight specifically for VirusTotal Intelligence users?

A: Yes, one of the key features in our roadmap is the introduction of Code Insight on demand for VT Intelligence users. This functionality allows VT Intelligence users to copy and paste code snippets for immediate analysis by Code Insight, bypassing the need to submit entire samples to VirusTotal. This on-demand feature emphasizes a crucial distinction compared to the standard Code Insight process, where submitted samples may or may not be processed by Code Insight depending on pre-filtering criteria.

The pre-filtering criteria take into account several factors such as size, similarity with previous code snippets already processed by Code Insight, abuse signals, and other relevant considerations. This filtering mechanism ensures that Code Insight focuses on the most valuable and unique samples, which helps maintain the platform's overall efficiency and effectiveness.

However, with the on-demand functionality tailored for VT Intelligence users, security professionals can ensure that their specific code snippets are directly analyzed by Code Insight. This provides a more streamlined and targeted approach, enabling users to receive quick and precise results to further enhance their security posture.

Q: I have new ideas and use cases for implementing AI in VT, and I'm interested in contributing with new models and tools. Can I do this?

A: Absolutely! VirusTotal thrives on its community-driven approach, welcoming contributions from users and companies who have innovative ideas for enhancing the platform. We are eager to consider new features and integrate tools from contributors who share our passion for improving cybersecurity for the benefit of the global community. If you have any new ideas, suggestions, or tools, please don't hesitate to reach out to us. Your expertise and creativity can make a significant impact in advancing our collective efforts in the cybersecurity field, ultimately promoting a safer digital environment worldwide.

What's Next for Code Insight?

We are in the early stages of Code Insight's development; this is just the beginning. We continue to enhance and expand the current features while exploring new avenues to assist analysts even more effectively. Our plans include:

  • Further expanding file type and size support.
  • Analyzing binary and executable files.
  • Enriching analysis with contextual information besides the code itself.

Your feedback and support play a vital role in helping us stay ahead of evolving cyber threats. Thank you for your continued involvement, and stay tuned for more updates and developments from VirusTotal.

Happy hunting!


Post a Comment