File object (analysis.file)

analysis.file: File: The analysis.file object is a File instance which represents the content of the file/data being analyzed.

A File instance may be a real file located on the file system or a non-physical file (for instance, when you choose to “open” a user selection). The file can be read from any Python script, template or anomaly, i.e. from anything that has access to the bindings. Python scripts can additionally write to the file object and change its size.

class malcat.File

This class represents a file opened in Malcat. Note that all addresses used by this class are physical addresses. See Addressing in Malcat for more details.

name: str: File base name, as shown in the user interface.

path: str: File path, may be None for non-physical files. Note that the file path is also accessible from sys.argv[1].

size: int: File size in bytes. Same as __len__()

__len__()

Return the file size (same as malcat.File.size). Example:

filesize = len(analysis.file)

Return type:

int

I/O methods

Reading data

__getitem__(offset)

Read one byte from the file at <offset> (physical address). Example:

for i in range(10):
    print(analysis.file[i])

Parameters:

offset (int) – physical address of the byte to read

Return type:

int

__getitem__(interval)

Return the bytes read from a (contiguous) file interval. Example:

header = analysis.file[:10]
tail = analysis.file[-10:]
mid = analysis.file[100:200]

Parameters:

interval (slice) – physical offset interval to read from

Return type:

bytes

Note

Non-contiguous (e.g. analysis.file[0:10:2]) or backward (e.g. analysis.file[0:10:-1]) slices are currently not supported.
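A typical use of slicing is to read a header and decode its fields with the standard struct module. The sketch below does not run inside Malcat: it substitutes a fabricated 64-byte DOS header (hypothetical sample data) for analysis.file[:0x40], assuming the slice returns a plain bytes object as stated by its return type.

```python
import struct

# Hypothetical sample data standing in for analysis.file[:0x40]:
# a 64-byte DOS header with the "MZ" magic and e_lfanew set to 0x80.
sample = bytearray(0x40)
sample[0:2] = b"MZ"
struct.pack_into("<I", sample, 0x3C, 0x80)
header = bytes(sample)

# Inside Malcat this line would simply be: header = analysis.file[:0x40]
e_lfanew = None
if header[:2] == b"MZ":
    # e_lfanew is a little-endian 32-bit field at offset 0x3C of the DOS header
    (e_lfanew,) = struct.unpack_from("<I", header, 0x3C)
    print("PE header expected at #0x{:x}".format(e_lfanew))
```

Reading one slice and decoding several fields from it keeps the number of calls into the bindings small.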

read(offset, length)

Read <length> bytes starting at offset <offset> (physical address).

Parameters:

offset (int) – physical address of the first byte to read
length (int) – number of bytes to read

Return type:

bytes
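read(offset, length) is convenient when the length is computed at run time. Assuming it returns the same bytes as the slice analysis.file[offset:offset+length] (its behaviour near EOF is not documented here), the hypothetical read_sketch helper below models it on a bytes stand-in, then decodes a little-endian 32-bit value with int.from_bytes.

```python
def read_sketch(data: bytes, offset: int, length: int) -> bytes:
    """Hypothetical model of File.read(): <length> bytes starting at <offset>."""
    return data[offset:offset + length]

# Stand-in for the file contents (hypothetical sample data)
data = b"\x00\x00\x00\x00\x78\x56\x34\x12rest"

chunk = read_sketch(data, 4, 4)          # inside Malcat: analysis.file.read(4, 4)
assert chunk == data[4:8]                # same bytes as the slice form
value = int.from_bytes(chunk, "little")  # decode as an unsigned little-endian integer
print(hex(value))                        # 0x12345678
```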

read_until(offset, delimiter, read_max_bytes=0)

Read bytes starting at <offset> until either <delimiter> is found or <read_max_bytes> bytes have been read. The delimiter is not included in the result.

Parameters:

offset (int) – physical address of the first byte to read
delimiter (int) – byte value at which to stop (not included in the result)
read_max_bytes (int) – maximum number of bytes to read

Return type:

bytes
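The semantics of read_until can be modelled on a plain bytes buffer. This is a sketch, not Malcat's implementation: read_until_sketch is a hypothetical name, and treating read_max_bytes=0 as "no limit" and stopping silently at EOF are assumptions, since neither is documented above.

```python
def read_until_sketch(data: bytes, offset: int, delimiter: int, read_max_bytes: int = 0) -> bytes:
    """Hypothetical model of File.read_until() on a bytes buffer."""
    # Assumption: read_max_bytes == 0 means "no limit" (read up to EOF at most)
    end = len(data) if read_max_bytes == 0 else min(len(data), offset + read_max_bytes)
    pos = data.find(bytes([delimiter]), offset, end)
    # The delimiter itself is never part of the result
    return data[offset:end] if pos == -1 else data[offset:pos]

request = b"GET / HTTP/1.1\r\nHost: example\r\n"
line = read_until_sketch(request, 0, 0x0D)   # stop at the first carriage return
print(line)                                  # b'GET / HTTP/1.1'
```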

String read

Helpers to read null-terminated (C) strings from the file.

read_cstring_ascii(offset, read_max_bytes=512)

Read the ASCII-encoded null-terminated string (up to <read_max_bytes> bytes) located at <offset> (physical address). Note that the returned string does not include the null byte.

Parameters:

offset (int) – physical address of the first byte of the C string
read_max_bytes (int) – maximum number of bytes to read (including the terminating null byte)

Return type:

str
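The behaviour described above (stop at the null byte, never read more than read_max_bytes bytes) can be sketched on a bytes buffer. read_cstring_ascii_sketch is a hypothetical name; returning the truncated string when no terminator occurs within the limit, and replacing non-ASCII bytes, are assumptions about cases the documentation does not cover.

```python
def read_cstring_ascii_sketch(data: bytes, offset: int, read_max_bytes: int = 512) -> str:
    """Hypothetical model of File.read_cstring_ascii() on a bytes buffer."""
    window = data[offset:offset + read_max_bytes]  # never look past the limit
    end = window.find(b"\x00")
    if end != -1:
        window = window[:end]                      # drop the null byte and what follows
    # errors="replace" is an assumption: handling of non-ASCII bytes is not documented
    return window.decode("ascii", errors="replace")

name = read_cstring_ascii_sketch(b"\x00\x00kernel32.dll\x00junk", 2)
print(name)   # kernel32.dll
```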

read_cstring_utf8(offset, read_max_bytes=512)

Read the UTF-8-encoded null-terminated string (up to <read_max_bytes> bytes) located at <offset> (physical address). Note that the returned string does not include the null byte.

Parameters:

offset (int) – physical address of the first byte of the C string
read_max_bytes (int) – maximum number of bytes to read (including the terminating null byte)

Return type:

str

read_cstring_utf16le(offset, read_max_bytes=512)

Read the UTF-16LE-encoded null-terminated string (up to <read_max_bytes>/2 characters) located at <offset> (physical address). Note that the returned string does not include the terminating null character (two null bytes).

Parameters:

offset (int) – physical address of the first byte of the C string
read_max_bytes (int) – maximum number of bytes to read (including the two terminating null bytes); the maximum number of characters is half of that

Return type:

str

read_cstring_utf16be(offset, read_max_bytes=512)

Read the UTF-16BE-encoded null-terminated string (up to <read_max_bytes>/2 characters) located at <offset> (physical address). Note that the returned string does not include the terminating null character (two null bytes).

Parameters:

offset (int) – physical address of the first byte of the C string
read_max_bytes (int) – maximum number of bytes to read (including the two terminating null bytes); the maximum number of characters is half of that

Return type:

str
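For UTF-16 the terminator is two null bytes, and it has to be searched on 2-byte boundaries: the characters U+0001 U+0200 encode in little-endian as 01 00 00 02, which contains 00 00 at an odd offset. The sketch below models both UTF-16 helpers on a bytes buffer; read_cstring_utf16_sketch and its big_endian flag are hypothetical, and error handling for invalid data is an assumption.

```python
def read_cstring_utf16_sketch(data: bytes, offset: int, read_max_bytes: int = 512,
                              big_endian: bool = False) -> str:
    """Hypothetical model of File.read_cstring_utf16le()/utf16be() on a bytes buffer."""
    window = data[offset:offset + read_max_bytes]
    window = window[: len(window) - len(window) % 2]   # keep whole 2-byte code units only
    for i in range(0, len(window), 2):                 # scan on 2-byte boundaries
        if window[i:i + 2] == b"\x00\x00":             # aligned terminator found
            window = window[:i]
            break
    # errors="replace" is an assumption: handling of invalid UTF-16 is not documented
    return window.decode("utf-16-be" if big_endian else "utf-16-le", errors="replace")

raw_le = "Hello".encode("utf-16-le") + b"\x00\x00"
raw_be = "Hello".encode("utf-16-be") + b"\x00\x00"
print(read_cstring_utf16_sketch(raw_le, 0))                   # Hello
print(read_cstring_utf16_sketch(raw_be, 0, big_endian=True))  # Hello
```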

Write access

Python scripts can modify the file's bytes through the analysis.file object. Writes can be undone/redone from the user interface once the script finishes.

__setitem__(offset, what)

Write one byte to the file at <offset> (physical address). Example:

analysis.file[16] = 56   # set the byte at physical offset 16 to the value 56

Parameters:

offset (int) – physical address of the byte to write
what (int in range [0, 255]) – the byte value to write
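A classic single-byte patch turns a conditional short jump into an unconditional one. Because analysis.file only exists inside Malcat, the sketch below applies the patch to a bytearray stand-in; bytearray raises ValueError for values outside [0, 255], whereas how File reacts to such values is not documented here.

```python
# Stand-in buffer for the file contents: x86 "JE +5" (0x74 0x05) followed by two NOPs
code = bytearray(b"\x74\x05\x90\x90")

# Inside Malcat the equivalent patch would be: analysis.file[jump_offset] = 0xEB
# (jump_offset is a hypothetical variable holding the physical address of the jump)
code[0] = 0xEB          # turn the conditional JE rel8 into an unconditional JMP rel8

try:
    code[1] = 256       # not a valid byte value
except ValueError:
    print("byte values must be in [0, 255]")

print(code.hex())       # eb059090
```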

__setitem__(interval, what)

Write bytes to a (contiguous) file interval. Example:

analysis.file[:10] = b"0123456789"
analysis.file[100:200] = bytes(range(100))

Parameters:

interval (slice) – physical offset interval to write to
what (bytes) – the bytes to write

Note

Non-contiguous (e.g. analysis.file[0:10:2]) or backward (e.g. analysis.file[0:10:-1]) slices are currently not supported.

write(offset, what)

Write the bytes <what> starting at offset <offset> (physical address).

Parameters:

offset (int) – physical address of the first byte to write
what (bytes) – the bytes to write
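Since write() takes raw bytes, integer fields are usually encoded with struct.pack first. The sketch below models write() with a hypothetical write_sketch helper on a bytearray stand-in, assuming it overwrites len(what) bytes in place without resizing; inside Malcat the call would be analysis.file.write(0x3C, struct.pack("<I", 0x100)).

```python
import struct

buf = bytearray(0x40)   # stand-in for the file contents (hypothetical sample data)

def write_sketch(buf: bytearray, offset: int, what: bytes) -> None:
    """Hypothetical model of File.write(): overwrite len(what) bytes starting at offset."""
    buf[offset:offset + len(what)] = what

# Encode 0x100 as a little-endian 32-bit integer and write it at offset 0x3C
write_sketch(buf, 0x3C, struct.pack("<I", 0x100))
print(struct.unpack_from("<I", buf, 0x3C)[0])   # 256
```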

Searching

The file object also exposes two search functions implemented in C++. Why not read the whole file into a Python buffer and use Python's re module instead? For the following reasons:

the file may be very big and not fit in memory (memory-mapped files, for instance)
the C++ search functions use PCRE2 just-in-time (JIT) regexp compilation, which makes them faster than Python's re module when searching a large interval.

search(pattern, offset=0, size=0)

Search for the PCRE2-compliant regular expression <pattern> in the file, starting at <offset> (physical address) and scanning at most <size> bytes (up to EOF if size is zero). Returns a pair (match_offset, match_size); if the pattern is not found, match_offset is None and match_size is 0. Note that capture groups are not supported. Example:

offset, size = analysis.file.search(r"\w+\.html")
if not size or offset is None:
    # when pattern is not found, size is 0 and offset is None
    print("pattern not found")
else:
    print("pattern '{}' found at offset #0x{:x}".format(analysis.file[offset:offset+size], offset))

Parameters:

pattern (str) – PCRE2-compliant regular expression.
offset (int) – physical file offset where to start the search, defaults to 0.
size (int) – number of bytes to search. If zero, the search is performed up to the last byte in the file.

Returns:

(match offset, match size)

Return type:

(int | None, int)
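The (offset, size) contract and the offset/size window can be modelled with Python's re module on a bytes buffer. This is a stand-in, not Malcat's engine: search_sketch is hypothetical, Python's regex syntax differs from PCRE2 in places, and the pattern is bytes here because re requires a bytes pattern for bytes input, while File.search takes a str.

```python
import re
from typing import Optional, Tuple

def search_sketch(data: bytes, pattern: bytes, offset: int = 0, size: int = 0) -> Tuple[Optional[int], int]:
    """Hypothetical model of File.search(), using Python's re instead of PCRE2."""
    end = len(data) if size == 0 else min(len(data), offset + size)  # size 0 means "up to EOF"
    m = re.compile(pattern).search(data, offset, end)                 # pos/endpos bound the scan
    return (m.start(), m.end() - m.start()) if m else (None, 0)

data = b"<a href='index.html'>"
print(search_sketch(data, rb"\w+\.html"))   # (9, 10)
print(search_sketch(data, rb"\.php"))       # (None, 0)
```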

search_all(pattern, offset=0, size=0)

Search for the PCRE2-compliant regular expression <pattern> in the file, starting at <offset> (physical address) and scanning at most <size> bytes (up to EOF if size is zero). Returns an iterator over all the matches found, as (match_offset, match_size) pairs. Example:

for offset, size in analysis.file.search_all(r"\w+\.html"):
    print("pattern '{}' found at offset #0x{:x}".format(analysis.file[offset:offset+size], offset))

Parameters:

pattern (str) – PCRE2-compliant regular expression.
offset (int) – physical file offset where to start the search, defaults to 0.
size (int) – number of bytes to search. If zero, the search is performed up to the last byte in the file.

Returns:

iterator over pairs (match offset, match size)

Return type:

iterator

Note

The search is performed lazily, i.e. the second match will be found only after you’ve iterated over the first one.
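Lazy iteration can be mimicked with a Python generator over re.finditer, which also computes each match only when it is requested. The sketch below is a stand-in for File.search_all (search_all_sketch is hypothetical and uses Python's re, not PCRE2); it takes only the first two hits with itertools.islice.

```python
import itertools
import re
from typing import Iterator, Tuple

def search_all_sketch(data: bytes, pattern: bytes, offset: int = 0, size: int = 0) -> Iterator[Tuple[int, int]]:
    """Hypothetical model of File.search_all(), using Python's re instead of PCRE2."""
    end = len(data) if size == 0 else min(len(data), offset + size)  # size 0 means "up to EOF"
    # finditer scans on demand, so each match is only looked for when requested
    for m in re.compile(pattern).finditer(data, offset, end):
        yield m.start(), m.end() - m.start()

data = b"a.html b.html c.html"
first_two = list(itertools.islice(search_all_sketch(data, rb"\w+\.html"), 2))
print(first_two)   # [(0, 6), (7, 6)]
```

Because of this laziness, leaving a for loop over analysis.file.search_all(...) early with break should spare the rest of the interval from being scanned, which matters on large files.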