Strings (analysis.strings)

analysis.strings: malcat.Strings

The analysis.strings object is a malcat.Strings instance that gives you access to all the strings found by the String analysis algorithm.

Note that in addition to this documentation, you can find usage examples in the sample script which is loaded when you hit F8.

Accessing / enumerating strings

Malcat can identify strings using different algorithms, depending on the file type (cf. String analysis). Algorithms can range from the simple regular-expression based linear sweep to more advanced file-format aware parsers, or even disassembly-based parsers. Multiple algorithms can also be used at the same time.

All identified strings are accessible through the analysis.strings object.

class malcat.Strings

This class contains all strings identified by the String analysis. Note that all addresses used in this class are effective addresses. See Addressing in Malcat for more details.

__iter__()

Iterate over all the strings found in the file

import itertools

for s in itertools.islice(analysis.strings, 15):
    print(f"* string {repr(s.text)} found at #{analysis.map.to_phys(s.address):x}")
Return type:

iterator over FoundString

__getitem__(interval)

Iterate over all the strings found in the interval (effective address):

rdata = analysis.map[".rdata"]
for s in analysis.strings[rdata.start : rdata.end]:
    print(f"* string {repr(s.text)} found at #{analysis.map.to_phys(s.address):x}")
Parameters:

interval (slice) – effective address interval

Return type:

iterator over FoundString

__getitem__(ea)

Returns the string found at the effective address ea.

for x in analysis.xref:
    if x.address in analysis.strings:
        print(f"string {analysis.strings[x.address]} is referenced {len(x)} times")
Parameters:

ea (int) – effective address

Return type:

FoundString instance

Raises:

KeyError if no string was identified at ea

__getitem__(string_content)

Returns the string object corresponding to the string “string_content” if found. Note that this queries Malcat’s string cache, which has some limitations: for instance, strings bigger than 256 bytes are not cached, and the cache also limits the number of strings it holds.

if "VirtualProtect" in analysis.strings and "kernel32.VirtualProtect" not in analysis.syms:
    vp = analysis.strings["VirtualProtect"]
    print(f"VirtualProtect string found at #{analysis.map.a2p(vp.address):x}")
Parameters:

string_content (str) – the string to search (can contain unicode characters)

Return type:

FoundString instance

Raises:

KeyError if no string “string_content” was identified
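
Because of the cache limits mentioned above, a lookup by content can miss a string that the analysis did identify. The sketch below shows one possible fallback, relying only on the documented `__getitem__(string_content)` and `__iter__()` behaviour: it tries the cached lookup first, then scans all strings linearly. The helper name `lookup_string` and the `_DemoStrings` stand-in are hypothetical and exist only so the snippet runs outside Malcat; in a script you would pass `analysis.strings` instead.

```python
from types import SimpleNamespace


def lookup_string(strings, content):
    """Return the first string whose text equals `content`, or None."""
    try:
        # Fast path: cached lookup, documented to raise KeyError on a miss.
        return strings[content]
    except KeyError:
        pass
    # Slow path: linear scan, which also covers strings missing from the cache.
    for s in strings:
        if s.text == content:
            return s
    return None


class _DemoStrings:
    """Hypothetical stand-in for analysis.strings (not part of Malcat)."""

    def __init__(self, cached, uncached):
        self._cached = {s.text: s for s in cached}
        self._all = list(cached) + list(uncached)

    def __getitem__(self, content):
        return self._cached[content]  # raises KeyError when not cached

    def __iter__(self):
        return iter(self._all)


short = SimpleNamespace(text="VirtualProtect", address=0x1000)
long_one = SimpleNamespace(text="A" * 300, address=0x2000)
demo = _DemoStrings(cached=[short], uncached=[long_one])

found_short = lookup_string(demo, "VirtualProtect")
found_long = lookup_string(demo, "A" * 300)
missing = lookup_string(demo, "nope")
```

The linear scan can be slow on large files, so it is probably best kept as a fallback rather than the default path.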

__contains__(ea)

return True iff a string was found at effective address ea

rdata = analysis.map[".rdata"]
if rdata.start in analysis.strings:
    print(".rdata section starts with a string")
Parameters:

ea (int) – address to query

Return type:

bool

__contains__(string_content)

return True iff a string “string_content” was found. Note that this queries Malcat’s string cache, which has some limitations: for instance, strings bigger than 256 bytes are not cached, and the cache also limits the number of strings it holds.

if "VirtualProtect" in analysis.strings and "kernel32.VirtualProtect" not in analysis.syms:
    print("Looks like VirtualProtect is imported dynamically!")
Parameters:

string_content (str) – the string to search (can contain unicode characters)

Return type:

bool

find(ea)

return the string containing the effective address ea, or None if address ea is not contained in any identified string.

Parameters:

ea (int) – effective address for the query

Return type:

FoundString or None
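
As an illustration, `find(ea)` could be used to map an arbitrary address back to the string object that contains it and to compute the offset of that address inside the object. This is a minimal sketch: the helper `locate_in_string` and the `_DemoStrings` stand-in are hypothetical, and the stand-in assumes that a string covers the range `[address, address + size)`. In a Malcat script you would pass `analysis.strings`.

```python
from types import SimpleNamespace


def locate_in_string(strings, ea):
    """Return (string, offset of ea inside the string object), or None."""
    s = strings.find(ea)
    if s is None:
        return None
    return s, ea - s.address


class _DemoStrings:
    """Hypothetical stand-in for analysis.strings (not part of Malcat)."""

    def __init__(self, items):
        self._items = items

    def find(self, ea):
        # Assumption: a string object spans [address, address + size).
        for s in self._items:
            if s.address <= ea < s.address + s.size:
                return s
        return None


demo = _DemoStrings([SimpleNamespace(address=0x1000, size=8, text="kernel32")])
hit = locate_in_string(demo, 0x1003)
miss = locate_in_string(demo, 0x2000)
```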

find_forward(ea)

return the first string defined over an effective address >= ea, or None if no string at or past ea can be found.

first_string = analysis.strings.find_forward(0)
if first_string is None:
    raise ValueError("No string in program!")
Parameters:

ea (int) – effective address for the query

Return type:

FoundString or None

find_backward(ea)

return the last string defined over an effective address <= ea, or None if no string at or before ea can be found.

last_string = analysis.strings.find_backward(analysis.map.end)
if last_string is None:
    raise ValueError("No string in program!")
Parameters:

ea (int) – effective address for the query

Return type:

FoundString or None

__len__()

return the number of identified strings

if len(analysis.strings) == 0:
    raise ValueError("No string found!")
Return type:

int

total: int

return the cumulative size in bytes of all identified strings

ratio = analysis.strings.total / len(analysis.file)
print(f"{100*ratio:.2f}% of the file consists of strings")

String object

An identified string is a FoundString Python object exposing the interface defined below.

Please note that Malcat distinguishes between how string objects are stored and the string they contain. The first character of the string is not always located at the beginning of the string object: .NET user strings, for instance, are prefixed by their size. Keeping track of the whole string object is necessary for cross references to work.

String

class malcat.FoundString
address: int (effective address)

effective address of the string object. Note that it is not necessarily the address of the first character, since some string objects have a size prefix, as in the .NET #US stream for instance.

size: int

number of bytes occupied on disk by the string (including any structural prefix/suffix)

__len__()

number of character points in the string

Return type:

int

bytes: bytes

the encoded bytes of the string
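
To make the difference between the string object and its content more concrete, the sketch below estimates the structural overhead of a string as `size - len(bytes)`. This assumes that `bytes` covers only the encoded characters while `size` covers the whole object, which is how the two attributes are described here but has not been checked against Malcat itself. The helper name and the demo values (a .NET-like user string with one size byte and one trailing byte) are hypothetical.

```python
from types import SimpleNamespace


def structural_overhead(s):
    """Bytes of the string object not covered by the encoded text.

    Assumption: overhead = s.size - len(s.bytes).
    """
    return s.size - len(s.bytes)


# Hypothetical user string: 1 size byte + UTF-16LE text + 1 trailing byte.
encoded = "hello".encode("utf-16-le")
demo = SimpleNamespace(
    address=0x2000,
    size=1 + len(encoded) + 1,
    bytes=encoded,
    text="hello",
)
overhead = structural_overhead(demo)
```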

text: str

the string content (unicode-friendly)

__repr__()

same as text:

first_string = analysis.strings.find_forward(0)
print(first_string)
Return type:

str

encoding: malcat.FoundString.Encoding

encoding of the string; one of the malcat.FoundString.Encoding values (see Encoding below)

type: malcat.FoundString.Type

the type of the string, i.e. how the string was recovered; one of the malcat.FoundString.Type values (see Type below)

score: int

string interest score, an int between 0 and 255 (cf. String analysis)

tag: str

string tag, e.g. “IP” or “URL” (cf. String analysis)

num_xrefs: int

how many times the string is referenced (code and data references)

entropy: int

entropy of string, an int between 0 and 255. This value is not very precise for performance reasons, but it can be used as a prefilter.
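
The `score` and `tag` attributes lend themselves to quick triage. The following sketch groups the text of high-scoring strings by tag. The helper name `interesting_by_tag`, the threshold of 128 and the demo objects are hypothetical choices made for illustration, and the empty tag used for untagged strings is an assumption. In a Malcat script you would pass `analysis.strings`.

```python
from collections import defaultdict
from types import SimpleNamespace


def interesting_by_tag(strings, min_score=128):
    """Group the text of strings scoring at least `min_score` by their tag."""
    groups = defaultdict(list)
    for s in strings:
        if s.score >= min_score:
            groups[s.tag].append(s.text)
    return dict(groups)


# Hypothetical stand-ins for FoundString objects, to run outside Malcat.
demo = [
    SimpleNamespace(text="http://example.com/a", score=200, tag="URL"),
    SimpleNamespace(text="10.0.0.1", score=180, tag="IP"),
    SimpleNamespace(text="AAAA", score=5, tag=""),
]
groups = interesting_by_tag(demo)
```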

Encoding

The string encoding can be one of the following values

class malcat.FoundString.Encoding
ASCII
UTF8
UTF16

currently only utf16-le is supported

BINARY

no encoding; the .text value will be the hexadecimal representation of the bytes
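
A simple way to get an overview of a file is to count strings per encoding. The sketch below does this with `collections.Counter`. The `Encoding` enum defined here is only a hypothetical stand-in that mirrors the names above (its numeric values are made up) so the snippet runs outside Malcat; in a script, the `encoding` attribute of each FoundString would hold the real malcat.FoundString.Encoding values.

```python
from collections import Counter
from enum import Enum
from types import SimpleNamespace


class Encoding(Enum):
    """Hypothetical stand-in mirroring the names of malcat.FoundString.Encoding."""
    ASCII = 0
    UTF8 = 1
    UTF16 = 2
    BINARY = 3


def encoding_histogram(strings):
    """Count how many strings use each encoding."""
    return Counter(s.encoding for s in strings)


demo = [
    SimpleNamespace(text="kernel32.dll", encoding=Encoding.ASCII),
    SimpleNamespace(text="cmd.exe", encoding=Encoding.ASCII),
    SimpleNamespace(text="héllo", encoding=Encoding.UTF16),
]
hist = encoding_histogram(demo)
```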

Type

The string type, i.e. how the string was recovered:

class malcat.FoundString.Type
SCANNED

a string identified using a simple linear sweep algorithm. There is a non-negligible chance that this is not a string

META

a string used by the file format. These could be symbols, or class names

USER

a string literal used by user code (e.g. a .NET UserString; for other CPUs these could simply be scanned strings with at least one reference)

DYNAMIC

a dynamic string, constructed on the stack or in memory
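
Since SCANNED strings may be false positives, one possible filter keeps every string of another type, plus SCANNED strings that are referenced at least once. This is only a sketch: the helper `likely_real_strings` and the `Type` enum below are hypothetical stand-ins (the numeric values are made up) so the snippet runs outside Malcat. In a script you would pass `analysis.strings` and `malcat.FoundString.Type.SCANNED`.

```python
from enum import Enum
from types import SimpleNamespace


class Type(Enum):
    """Hypothetical stand-in mirroring the names of malcat.FoundString.Type."""
    SCANNED = 0
    META = 1
    USER = 2
    DYNAMIC = 3


def likely_real_strings(strings, scanned_type):
    """Keep non-SCANNED strings, plus SCANNED ones with at least one reference."""
    return [s for s in strings if s.type != scanned_type or s.num_xrefs > 0]


demo = [
    SimpleNamespace(text="a", type=Type.SCANNED, num_xrefs=0),
    SimpleNamespace(text="b", type=Type.SCANNED, num_xrefs=2),
    SimpleNamespace(text="c", type=Type.USER, num_xrefs=0),
    SimpleNamespace(text="d", type=Type.DYNAMIC, num_xrefs=0),
]
kept = [s.text for s in likely_real_strings(demo, Type.SCANNED)]
```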