Strings (analysis.strings)
- analysis.strings: malcat.Strings
The analysis.strings object is a malcat.Strings instance that gives you access to all the strings found by the String analysis algorithm.
Note that in addition to this documentation, you can find usage examples in the sample script which is loaded when you hit F8.
Accessing / enumerating strings
Malcat can identify strings using different algorithms, depending on the file type (cf. String analysis). Algorithms can range from the simple regular-expression based linear sweep to more advanced file-format aware parsers, or even disassembly-based parsers. Multiple algorithms can also be used at the same time.
All identified strings are accessible through the analysis.strings object.
- class malcat.Strings
This class contains all strings identified by the String analysis. Note that all addresses used in this class are effective addresses. See Addressing in Malcat for more details.
- __iter__()
Iterate over all the strings found in the file:
    import itertools
    for s in itertools.islice(analysis.strings, 15):
        print(f"* string {repr(s.text)} found at #{analysis.map.to_phys(s.address):x}")
- Return type:
iterator over FoundString
- __getitem__(interval)
Iterate over all the strings found in the interval (effective address):
    rdata = analysis.map[".rdata"]
    for s in analysis.strings[rdata.start : rdata.end]:
        print(f"* string {repr(s.text)} found at #{analysis.map.to_phys(s.address):x}")
- Parameters:
interval (slice) – effective address interval
- Return type:
iterator over FoundString
- __getitem__(ea)
Returns the string found at the effective address ea.
    for x in analysis.xref:
        if x.address in analysis.strings:
            print(f"string {analysis.strings[x.address]} is referenced {len(x)} times")
- Parameters:
ea (int) – effective address
- Return type:
FoundString instance
- Raises:
KeyError if no string was identified at ea
- __getitem__(string_content)
Returns the string object corresponding to the string “string_content” if found. Note that this queries Malcat’s string cache, which has limitations: for instance, strings bigger than 256 bytes are not cached, and the cache also limits the number of strings it holds.
    if "VirtualProtect" in analysis.strings and "kernel32.VirtualProtect" not in analysis.syms:
        vp = analysis.strings["VirtualProtect"]
        print(f"VirtualProtect string found at #{analysis.map.a2p(vp.address)}")
- Parameters:
string_content (str) – the string to search (can contain unicode characters)
- Return type:
FoundString instance
- Raises:
KeyError if no string “string_content” was identified
- __contains__(ea)
return True iff a string was found at effective address ea
    rdata = analysis.map[".rdata"]
    if rdata.start in analysis.strings:
        print(".rdata section starts with a string")
- Parameters:
ea (int) – address to query
- Return type:
bool
- __contains__(string_content)
return True iff a string “string_content” was found. Note that this queries Malcat’s string cache, which has limitations: for instance, strings bigger than 256 bytes are not cached, and the cache also limits the number of strings it holds.
    if "VirtualProtect" in analysis.strings and "kernel32.VirtualProtect" not in analysis.syms:
        print("Looks like VirtualProtect is imported dynamically!")
- Parameters:
string_content (str) – the string to search (can contain unicode characters)
- Return type:
bool
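Because content-based lookups go through a bounded cache, a miss does not prove that the string is absent from the file. The sketch below shows one possible fallback: try the cache first, then enumerate all strings. It relies only on the documented `in` operator, iteration and the `text` attribute. `contains_text`, `FakeString` and `FakeStrings` are hypothetical names, and the stand-in classes exist only so the snippet runs outside Malcat.

```python
from dataclasses import dataclass


def contains_text(strings, needle):
    """Return True if `needle` is the text of an identified string.

    Tries the fast cache-backed lookup first, then falls back to a full
    (slow) enumeration, since long strings may be missing from the cache.
    """
    if needle in strings:
        return True
    return any(s.text == needle for s in strings)


# --- Stand-ins so the sketch runs outside Malcat (hypothetical) ---
@dataclass
class FakeString:
    address: int
    text: str


class FakeStrings:
    """Mimics a container whose content cache skips strings over 256 bytes."""

    def __init__(self, items):
        self._items = list(items)
        self._cache = {s.text for s in self._items if len(s.text.encode()) <= 256}

    def __iter__(self):
        return iter(self._items)

    def __contains__(self, needle):
        return needle in self._cache


long_text = "A" * 300
strings = FakeStrings([FakeString(0x1000, "VirtualProtect"), FakeString(0x2000, long_text)])
print(contains_text(strings, "VirtualProtect"))
print(long_text in strings, contains_text(strings, long_text))
```

Inside Malcat the call would be `contains_text(analysis.strings, "...")`. The linear scan can be slow on large files, so it is best kept for the cases where a cache miss matters.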
- find(ea)
return the string containing the effective address ea, or None if address ea is not contained in any identified string.
- Parameters:
ea (int) – effective address for the query
- Return type:
FoundString or None
- find_forward(ea)
return the first string defined over an effective address >= ea, or None if no string at or past ea can be found.
    first_string = analysis.strings.find_forward(0)
    if first_string is None:
        raise ValueError("No string in program!")
- Parameters:
ea (int) – effective address for the query
- Return type:
FoundString or None
- find_backward(ea)
return the last string defined over an effective address <= ea, or None if no string at or before ea can be found.
    last_string = analysis.strings.find_backward(analysis.map.end)
    if last_string is None:
        raise ValueError("No string in program!")
- Parameters:
ea (int) – effective address for the query
- Return type:
FoundString or None
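The difference between find, find_forward and find_backward is easiest to see on a toy model. The sketch below is not Malcat's implementation: `StringIndex` is a hypothetical class, plain `(address, size, text)` tuples stand in for FoundString objects, and it assumes that strings do not overlap and that the forward and backward lookups are keyed on each string's start address.

```python
import bisect


class StringIndex:
    """Toy model of the three lookups over sorted, non-overlapping strings.

    Each entry is an (address, size, text) tuple.
    """

    def __init__(self, entries):
        self.entries = sorted(entries)
        self.starts = [e[0] for e in self.entries]

    def find(self, ea):
        # Last string starting at or before ea, kept only if it covers ea.
        i = bisect.bisect_right(self.starts, ea) - 1
        if i >= 0:
            address, size, _ = self.entries[i]
            if address <= ea < address + size:
                return self.entries[i]
        return None

    def find_forward(self, ea):
        # First string whose start address is >= ea (assumed semantics).
        i = bisect.bisect_left(self.starts, ea)
        return self.entries[i] if i < len(self.entries) else None

    def find_backward(self, ea):
        # Last string whose start address is <= ea (assumed semantics).
        i = bisect.bisect_right(self.starts, ea) - 1
        return self.entries[i] if i >= 0 else None


index = StringIndex([(0x10, 4, "abc"), (0x20, 6, "hello"), (0x40, 3, "hi")])
print(index.find(0x22))           # address inside "hello"
print(index.find(0x30))           # gap between two strings
print(index.find_forward(0x21))   # next string after the middle of "hello"
print(index.find_backward(0x3F))  # closest string before "hi"
```

In this model, only find checks that the queried address actually lies inside the returned string. The other two return the nearest neighbour in the given direction.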
- __len__()
return the number of identified strings
    if len(analysis.strings) == 0:
        raise ValueError("No string found!")
- Return type:
int
- total: int
return the cumulative size in bytes of all identified strings
    ratio = analysis.strings.total / len(analysis.file)
    print(f"{100*ratio:.2f}% of the file consists of strings")
String object
An identified string is a FoundString Python object exposing the interface defined below.
Please note that Malcat distinguishes between how a string object is stored and the string it contains. The first character of the string is not always located at the beginning of the string object: .NET user strings are prefixed by their size, for instance. Keeping track of the whole string object is necessary for cross references to work.
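To make the object/content distinction concrete, here is a minimal sketch that decodes one entry of a .NET #US heap by hand, following the ECMA-335 layout: a compressed length prefix, the UTF-16LE characters, then one trailing flag byte. `parse_us_entry` is a hypothetical helper written for illustration and is not part of the Malcat API.

```python
def parse_us_entry(heap, offset):
    """Decode one #US blob and return (object_size, text_offset, text)."""
    first = heap[offset]
    if (first & 0x80) == 0:            # 1-byte length prefix
        prefix, length = 1, first
    elif (first & 0xC0) == 0x80:       # 2-byte length prefix
        prefix = 2
        length = ((first & 0x3F) << 8) | heap[offset + 1]
    else:                              # 4-byte length prefix
        prefix = 4
        length = ((first & 0x1F) << 24) | int.from_bytes(heap[offset + 1:offset + 4], "big")
    text_offset = offset + prefix
    # The blob length counts the UTF-16LE bytes plus one trailing flag byte.
    raw = heap[text_offset:text_offset + length - 1] if length else b""
    return prefix + length, text_offset, raw.decode("utf-16-le")


# A tiny heap: the mandatory empty blob at offset 0, then the string "Hi".
heap = b"\x00" + b"\x05H\x00i\x00\x00"
size, text_offset, text = parse_us_entry(heap, 1)
print(size, text_offset, repr(text))
```

Here the string object starts at offset 1 and occupies 6 bytes, while its 2-character text starts one byte later. This is the kind of gap to expect between address and size on one side and the text and its length on the other.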
String
- class malcat.FoundString
- address: int (effective address)
effective address of the string object. Note that this is not necessarily the address of the first character, since some string objects have a size prefix, as in the .NET #US stream for instance.
- size: int
number of bytes occupied on disk by the string (including any structural prefix/suffix)
- __len__()
number of characters (code points) in the string
- Return type:
int
- bytes: bytes
the encoded bytes of the string
- text: str
the string content (unicode-friendly)
- __repr__()
same as text:
    first_string = analysis.strings.find_forward(0)
    print(first_string)
- Return type:
str
- encoding: malcat.FoundString.Encoding
encoding of the string, can be one of:
- type: malcat.FoundString.Type
the type of the string, i.e. how the string was recovered. It can be one of:
- score: int
string interest score, an int between 0 and 255 (cf. String analysis)
- tag: str
string tag, e.g. “IP” or “URL” (cf. String analysis)
- num_xrefs: int
how many times the string is referenced (code and data references)
- entropy: int
entropy of the string, an int between 0 and 255. This value is not very precise for performance reasons, but it can be used as a prefilter.
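The score, tag, entropy and num_xrefs attributes lend themselves to cheap prefiltering before any heavier processing. The following is a sketch only: `interesting` is a hypothetical helper, the thresholds are arbitrary values chosen for illustration, and the `SimpleNamespace` objects (including the empty tag for untagged strings) are stand-ins so the snippet runs outside Malcat.

```python
from types import SimpleNamespace


def interesting(strings, min_score=128, max_entropy=200, tags=None):
    """Return strings passing cheap attribute checks, most referenced first."""
    kept = [
        s for s in strings
        if s.score >= min_score
        and s.entropy <= max_entropy
        and (tags is None or s.tag in tags)
    ]
    return sorted(kept, key=lambda s: s.num_xrefs, reverse=True)


# Stand-in objects carrying only the attributes used above (hypothetical data).
sample = [
    SimpleNamespace(text="http://example.com/a", score=220, entropy=120, tag="URL", num_xrefs=1),
    SimpleNamespace(text="xK9$qZ#v", score=40, entropy=240, tag="", num_xrefs=0),
    SimpleNamespace(text="10.0.0.1", score=200, entropy=90, tag="IP", num_xrefs=3),
]
print([s.text for s in interesting(sample)])
print([s.text for s in interesting(sample, tags={"URL"})])
```

Inside Malcat, the same function could be called as `interesting(analysis.strings)`. Since entropy is documented as imprecise, it is safer to use it to discard obvious noise than to rank the results.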
Encoding
The string encoding can be one of the following values
Type
The string type, i.e. how the string was recovered:
- class malcat.FoundString.Type
- SCANNED
a string identified using a simple linear sweep algorithm. There is a non-negligible chance that this is not a string
- META
a string used by the file format. These could be symbols, or class names
- USER
a string literal used by user code (e.g. a .NET UserString; for other CPUs these could simply be scanned strings with at least one reference)
- DYNAMIC
a dynamic string, constructed on the stack or in memory
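Since SCANNED strings carry the highest risk of false positives, a common first step is to see how the identified strings are distributed across types and to isolate the scanned ones that nothing references. The sketch below assumes only the documented type and num_xrefs attributes. The local `Type` enum is a stand-in for malcat.FoundString.Type with made-up member values, and `count_by_type` and `weak_candidates` are hypothetical helper names.

```python
from collections import Counter
from enum import Enum, auto
from types import SimpleNamespace


class Type(Enum):
    """Stand-in for malcat.FoundString.Type (member values are made up)."""
    SCANNED = auto()
    META = auto()
    USER = auto()
    DYNAMIC = auto()


def count_by_type(strings):
    """Count identified strings per recovery type."""
    return Counter(s.type for s in strings)


def weak_candidates(strings, scanned=Type.SCANNED):
    """SCANNED strings that nothing references: the likeliest false positives."""
    return [s for s in strings if s.type == scanned and s.num_xrefs == 0]


# Stand-in objects carrying only the attributes used above (hypothetical data).
sample = [
    SimpleNamespace(text="GetProcAddress", type=Type.META, num_xrefs=2),
    SimpleNamespace(text="config.ini", type=Type.SCANNED, num_xrefs=1),
    SimpleNamespace(text="t$ D9", type=Type.SCANNED, num_xrefs=0),
    SimpleNamespace(text="cmd.exe", type=Type.DYNAMIC, num_xrefs=0),
]
counts = count_by_type(sample)
print(counts[Type.SCANNED], counts[Type.META], counts[Type.USER])
print([s.text for s in weak_candidates(sample)])
```

Inside Malcat, one would pass `analysis.strings` and `scanned=malcat.FoundString.Type.SCANNED` instead of the stand-ins.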