Corpus search results
The corpus view is accessed using the F9 shortcut in Full & Pro versions of Malcat. It displays results of pattern and/or Yara searches across a corpus of local directories and files or using online datasets like VTgrep or MalwareBazaar.
What is corpus search?
As a malware analyst or detection engineer, you often need to search for a pattern or yara-scan a large number of files in a timely manner. Some of the most frequent use cases are:
threat attribution: look for samples sharing a piece of code, a string or a Yara rule with the current file
false positive remediation: check whether the selected string is a good candidate for your new Yara rule by searching for it in your set of clean files
malware analysis: to compare the malware currently under analysis against previous versions, search your corpus for all previous samples of the same family (using a Yara rule for instance)
Malcat allows you to perform such searches directly from within the interface. Not only is it fast, but it lets you list and open matching files directly from the same window. This makes the whole process of finding the right file a lot easier!
Local corpus search is done using multiple threads. You can change the number of threads. Disk I/O plays an important part in performance too: while your first corpus search may be slow, subsequent searches should be a lot faster once most of the files are in the cache.
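To give an idea of what a multi-threaded local search involves, here is a minimal Python sketch. It is not Malcat's implementation: the function names `count_hits` and `search_directory` and the default of 4 threads are assumptions made for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def count_hits(path: Path, pattern: bytes) -> int:
    """Return the number of non-overlapping occurrences of pattern in one file."""
    try:
        return path.read_bytes().count(pattern)
    except OSError:
        # Unreadable files (permissions, broken links) are treated as "no hit".
        return 0


def search_directory(root: Path, pattern: bytes, threads: int = 4) -> dict[Path, int]:
    """Search every file below root (recursively) with a pool of worker threads."""
    files = [p for p in root.rglob("*") if p.is_file()]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        counts = pool.map(lambda p: count_hits(p, pattern), files)
        # Keep only the files in which the pattern was found at least once.
        return {p: n for p, n in zip(files, counts) if n > 0}
```

Because most of the time is spent reading files, a second run over the same directory usually benefits from the operating system's file cache, which matches the behaviour described above.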
Organising your local corpus directories
If you want to take advantage of Malcat’s local corpus search features, you first need to organize your corpus collection. A corpus is simply a directory of files that has been given a label.
Using the preferences dialog, you can add, edit and remove corpora:
First click on the Add/remove corpus button (1). This opens a dialog that allows you to organize your corpus labels
Once you have added (2) one or more labels, you can assign a directory (3) to each corpus label. Note that sub-directories will be searched recursively.
Click on OK (4), and you’re good to go!
If you’re on Windows, you can for instance add a “CleanSystem32” corpus that points to C:\Windows\System32. This allows you to scan your Yara rules against Microsoft clean files. This is of course not enough to avoid Yara false positives, but it’s a good start!
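Conceptually, a corpus configuration is a mapping from labels to directories that are walked recursively. The sketch below models this in Python. The dictionary layout and the helper name `iter_corpus_files` are hypothetical and do not reflect how Malcat stores its preferences.

```python
from pathlib import Path
from typing import Iterator


def iter_corpus_files(corpora: dict[str, Path]) -> Iterator[tuple[str, Path]]:
    """Yield (corpus label, file path) for every file below each corpus directory."""
    for label, root in corpora.items():
        # rglob("*") descends into sub-directories, like the recursive search above.
        for path in sorted(root.rglob("*")):
            if path.is_file():
                yield label, path


# Example configuration (assumed layout): one label per directory.
# corpora = {"CleanSystem32": Path(r"C:\Windows\System32")}
# for label, path in iter_corpus_files(corpora):
#     print(label, path)
```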
Once you are happy with your local corpus configuration, you can start your first corpus searches.
Performing corpus searches
In the Hexadecimal view or the Text view, you can select a pattern that you are interested in (cf. Selecting). For instance, is this string a good candidate for a Yara rule? Or have I seen this pattern in a piece of malware before? Feel free to wildcard some of the bytes if you wish (cf. Excluding bytes from selection).
Once you are done, open the selection context menu and start the corpus search from there. This will initiate a parallel search of the selected pattern across all files within your corpus directories.
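To illustrate what searching for a pattern with wildcarded bytes means, the following Python sketch turns a hex pattern into a byte-level regular expression. The textual notation (space-separated hex bytes, with `??` for an excluded byte) and the function names are assumptions chosen for this example, not Malcat's internal format.

```python
import re


def compile_hex_pattern(hex_pattern: str) -> re.Pattern:
    """Compile a pattern such as '4D 5A ?? 00' into a bytes regular expression."""
    parts = []
    for token in hex_pattern.split():
        if token == "??":
            parts.append(b".")  # wildcard: matches any single byte
        else:
            parts.append(re.escape(bytes([int(token, 16)])))
    # DOTALL makes '.' match every byte value, including 0x0A (newline).
    return re.compile(b"".join(parts), re.DOTALL)


def count_matches(data: bytes, hex_pattern: str) -> int:
    """Count non-overlapping matches of the wildcard pattern in data."""
    return sum(1 for _ in compile_hex_pattern(hex_pattern).finditer(data))
```

The more bytes you wildcard, the more files the pattern is likely to hit, so it is worth checking the result against a clean corpus as well.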
In the same vein, you can scan your corpus of files against a single Yara rule. This is useful to test your newly created Yara detection rule for instance. First go to the Yara editor / browser, select the rule you want to scan with in the rules list and open its context menu. You have two options:
Scan corpus: simply scan the corpus and report every file matching the rule
Scan corpus (partial matches allowed): scan the corpus and report every file matching the rule AND every file where at least one string of the Yara rule was found
This will initiate a parallel search of the selected Yara rule across all files within your corpus directories.
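The difference between the two scan options can be pictured with a deliberately simplified model. The Python sketch below assumes a rule whose condition requires all of its strings to be present and whose strings are plain byte sequences. Real Yara conditions are arbitrary expressions, and this is not how Yara or Malcat evaluate rules; the function name `classify` is hypothetical.

```python
def classify(data: bytes, rule_strings: dict[str, bytes]) -> tuple[str, list[str]]:
    """Classify data as a 'full', 'partial' or 'none' match for a simplified rule.

    Assumption: the rule condition is equivalent to "all of them".
    """
    found = [name for name, value in rule_strings.items() if value in data]
    if found and len(found) == len(rule_strings):
        return "full", found  # reported by both scan options
    if found:
        return "partial", found  # reported only when partial matches are allowed
    return "none", found
```

Under this model, "Scan corpus" would list only the files classified as `full`, while the partial-match option would also list those classified as `partial`.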
Once the corpus search has finished (you can monitor the progress through the status bar’s gauge control), Malcat will automatically open the corpus view. It is a three-column grid view that displays the results of the pattern search or Yara scan across local and remote corpus sets.
Local corpus hits
Under the Local corpus hits category, you’ll find all the files located inside one of your local corpus directories (cf. Organising your local corpus directories) that match the searched pattern or the selected Yara rule.
For local corpus hits, the grid columns have the following meaning:
the first column (Object) displays the path to the file relative to the corpus root directory.
the second column (# Hits) displays how many times the selected pattern was found in the file, or the number of string matches in the file for a Yara rule
the third column (Corpus) displays the label of the corpus where the file was found
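As an illustration of the three columns, a local hit can be modelled as a small record. The Python sketch below uses hypothetical names (`LocalHit`, `make_row`) and only mirrors the column meanings listed above.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class LocalHit:
    obj: str     # "Object" column: file path relative to the corpus root directory
    hits: int    # "# Hits" column: number of pattern or Yara string matches
    corpus: str  # "Corpus" column: label of the corpus containing the file


def make_row(file_path: Path, corpus_root: Path, corpus_label: str, hits: int) -> LocalHit:
    """Build one result row, storing the path relative to the corpus root."""
    relative = file_path.relative_to(corpus_root).as_posix()
    return LocalHit(obj=relative, hits=hits, corpus=corpus_label)
```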
Double-clicking on a row will open the selected file as a new project in Malcat. All the matching patterns (or matching Yara strings in the case of a Yara scan) will be automatically highlighted in the newly opened file, so that you can easily inspect them and see if it is indeed the file you were looking for. Hitting Ctrl+N or Ctrl+Shift+N lets you cycle through all the matches, like when Searching in current file.
Some time ago, Virustotal introduced a great feature named VTGrep. VTGrep allows premium Virustotal users to perform fast pattern searches in Virustotal’s malware corpus. This feature is also integrated inside Malcat’s own corpus view, in addition to the local corpus search.
Currently, you can only search for patterns on Virustotal. Yara scans on VT are sadly too slow, and we had to deactivate the functionality.
After selecting a string, a function or any arbitrary data range in Malcat and starting a corpus search (cf. Searching in the corpus), you will be able to list all files containing this pattern in Virustotal:
For Virustotal’s hits, the grid columns have the following meaning:
the first column (Object) displays the initial name of the file in Virustotal. Note that files may have been uploaded several times using different names on Virustotal.
the second column (# Hits) displays the number of antivirus detections on Virustotal for the matching file
the third column (Corpus) displays the type of the matching file as reported by Virustotal
Double-clicking on a row will open a new browser tab showing the Virustotal report for the selected file.
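For readers who want to reproduce such a lookup by hand, the Python sketch below builds a VTGrep-style content query from raw bytes, together with the address of a file report page. It assumes Virustotal's `content:{hex}` search modifier and the `https://www.virustotal.com/gui/file/<hash>` report address. The function names are hypothetical, and nothing here describes the requests Malcat itself sends.

```python
def vtgrep_query(pattern: bytes) -> str:
    """Build a content-search query string for a raw byte pattern (assumed syntax)."""
    return "content:{" + pattern.hex() + "}"


def report_url(sha256: str) -> str:
    """Return the address of the Virustotal report page for a file hash."""
    return "https://www.virustotal.com/gui/file/" + sha256.lower()
```

The resulting address can be opened with the standard `webbrowser.open()` function, which is roughly what double-clicking a row does for you.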
MalwareBazaar queries are not implemented yet, stay tuned!