File object (malcat.file)
- malcat.file: File
The malcat.file object is a File instance that represents the content of the file or data being analyzed. It may be a real file located on the file system or a non-physical file (for instance when you choose to “open” a user selection). The file can be read from any Python script, template or anomaly: basically everything that has access to the bindings. Python scripts may additionally write to the file object.
- class bindings.File
This class represents a file opened inside Malcat. Note that all addresses used in this class are physical addresses. See Addressing in Malcat for more details.
- name: str
File base name, as shown in the user interface.
- path: str
File path, may be None for non-physical files.
- __len__()
return the file size, same as bindings.File.size. Example:
filesize = len(malcat.file)
- Return type
int
I/O methods
Reading data
- __getitem__(offset)
read one byte from file at <offset> (physical address). Example:
for i in range(10):
    print(malcat.file[i])
- Parameters
offset (int) – physical address of the byte to read
- Return type
byte
- __getitem__(interval)
return the bytes read from a (contiguous) file interval. Example:
header = malcat.file[:10]
tail = malcat.file[-10:]
mid = malcat.file[100:200]
- Parameters
interval (slice) – physical offset interval to read from
- Return type
bytes
Note
Non-contiguous (e.g. malcat.file[0:10:2]) or backward (e.g. malcat.file[0:10:-1]) slices are currently not supported.
- read(offset, length)
return the <length> bytes read at offset <offset>.
- Parameters
offset (int) – physical address of the byte to read
length (int) – number of bytes to read
- Return type
bytes
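For large files it can be convenient to walk the content in fixed-size windows rather than reading everything in one slice. The sketch below relies only on what is documented above: len() gives the file size and a contiguous slice returns bytes. The names iter_chunks and chunk_size are hypothetical and not part of the bindings.

```python
def iter_chunks(file_obj, chunk_size=4096):
    """Yield (offset, data) pairs covering file_obj from start to end.

    file_obj is assumed to behave like malcat.file: len() returns the size
    and file_obj[a:b] returns the bytes of a contiguous physical interval.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    size = len(file_obj)
    for offset in range(0, size, chunk_size):
        # Clamp the end of the window to the file size for the last chunk.
        end = min(offset + chunk_size, size)
        yield offset, file_obj[offset:end]
```

Inside Malcat this would presumably be used as `for offset, data in iter_chunks(malcat.file): ...`. Outside Malcat, any bytes object works for experimenting.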
String read
Some helpers to read C strings from a file.
- read_cstring_ascii(offset, max_bytes=512)
read the ascii-encoded null-terminated string (up to <max_bytes> bytes) located at <offset> (physical address). Note that the returned string does not include the null byte.
- Parameters
offset (int) – physical address of first byte of the C string
max_bytes (int) – maximal number of bytes to read (including terminating null byte)
- Return type
str
- read_cstring_utf8(offset, max_bytes=512)
read the utf8-encoded null-terminated string (up to <max_bytes> bytes) located at <offset> (physical address). Note that the returned string does not include the null byte.
- Parameters
offset (int) – physical address of first byte of the C string
max_bytes (int) – maximal number of bytes to read (including terminating null byte)
- Return type
str
- read_cstring_utf16le(offset, max_bytes=512)
read the utf16le-encoded null-terminated string (up to <max_bytes>/2 characters) located at <offset> (physical address). Note that the returned string does not include the null byte.
- Parameters
offset (int) – physical address of first byte of the C string
max_bytes (int) – maximal number of bytes to read (including terminating null bytes); the number of characters will be half of that
- Return type
str
- read_cstring_utf16be(offset, max_bytes=512)
read the utf16be-encoded null-terminated string (up to <max_bytes>/2 characters) located at <offset> (physical address). Note that the returned string does not include the null byte.
- Parameters
offset (int) – physical address of first byte of the C string
max_bytes (int) – maximal number of bytes to read (including terminating null bytes); the number of characters will be half of that
- Return type
str
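To make the semantics of these helpers concrete, here is a pure-Python approximation built on plain slicing. It is not Malcat's implementation: cstring_ascii and cstring_utf16le are hypothetical names, and the handling of undecodable bytes and of strings with no terminator inside the window are assumptions.

```python
def cstring_ascii(data, offset, max_bytes=512):
    """Approximate read_cstring_ascii on a bytes-like object."""
    window = bytes(data[offset:offset + max_bytes])
    # Keep everything before the first null byte, or the whole window
    # if no terminator was found (this fallback is an assumption).
    raw = window.split(b"\x00", 1)[0]
    return raw.decode("ascii", errors="replace")


def cstring_utf16le(data, offset, max_bytes=512):
    """Approximate read_cstring_utf16le on a bytes-like object.

    The terminator is a null 2-byte code unit, so at most
    max_bytes // 2 code units are examined.
    """
    window = bytes(data[offset:offset + max_bytes])
    end = len(window) - (len(window) % 2)  # ignore a trailing odd byte
    for i in range(0, end, 2):
        if window[i:i + 2] == b"\x00\x00":
            end = i
            break
    return window[:end].decode("utf-16-le", errors="replace")
```

The main point is the difference in terminators: a single null byte for ASCII and UTF-8, and a null code unit aligned on two bytes for UTF-16, which is why max_bytes bounds the UTF-16 variants to half as many characters.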
Direct write access
Python scripts are able to modify the file bytes using the python file object. Writes can be undone/redone from within the user interface once the script finishes (all writes together form a single undo step).
- __setitem__(offset, what)
write one byte to the file at <offset> (physical address). Example:
malcat.file[i] = 56
- Parameters
offset (int) – physical address of the byte to write
what (byte) – the byte to write
- __setitem__(interval, what)
write bytes to a (contiguous) file interval. Example:
malcat.file[:10] = b"0123456789"
malcat.file[100:200] = bytes(range(100))
- Parameters
interval (slice) – physical offset interval to write to
what (bytes) – the bytes to write
Note
Non-contiguous (e.g. malcat.file[0:10:2]) or backward (e.g. malcat.file[0:10:-1]) slices are currently not supported.
- write(offset, what)
write the bytes <what> at offset <offset>.
- Parameters
offset (int) – physical address of the first byte to write
what (bytes) – the bytes to write
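A common use of write access is patching a region in place, for example undoing a single-byte XOR. The sketch below reads an interval, transforms it and writes back the same number of bytes. xor_range is a hypothetical helper, not part of the bindings. It assumes non-negative offsets and an object that supports contiguous slice reads and writes as described above.

```python
def xor_range(file_obj, start, end, key):
    """XOR every byte of file_obj[start:end] with a single-byte key, in place.

    file_obj is assumed to support slice reads and same-length slice writes,
    as malcat.file does for contiguous intervals. Returns the number of
    bytes that were patched.
    """
    if not 0 <= key <= 0xFF:
        raise ValueError("key must fit in one byte")
    original = file_obj[start:end]
    patched = bytes(b ^ key for b in original)
    # Write back exactly as many bytes as were read, so the size is unchanged.
    file_obj[start:start + len(patched)] = patched
    return len(patched)
```

Run from a Malcat script as `xor_range(malcat.file, 0x100, 0x200, 0x41)`, all the writes would form a single undo step once the script finishes. A bytearray can stand in for the file object when testing outside Malcat.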
Searching
The file object also has bindings for two C++-backed search functions. Why not read the whole file into a Python buffer and then use Python's re module? For the following reasons:
the file may be very big and not fit in memory (for memory-mapped files for instance)
the C++ search functions make use of PCRE2 Just-In-Time regexp compilation, which makes them faster than Python when searching over a large interval.
- search(pattern, offset=0, size=0)
search for the pcre2-compliant regexp <pattern> in the file, starting at <offset> (physical address) and covering at most <size> bytes (up to EOF if size is zero). Returns a pair (match_offset, match_size). match_size is 0 if the pattern was not found. Note that capture groups are not supported. Example:
offset, size = malcat.file.search(r"\w+\.html")
if not size:
    print("pattern not found")
else:
    print("pattern '{}' found at offset #0x{:x}".format(malcat.file[offset:offset+size], offset))
- Parameters
pattern (str) – pcre2-compliant regular expression.
offset (int) – physical file offset where to start the search, defaults to 0.
size – number of bytes over which the search is performed. If zero, the search is performed up to the last byte in the file.
- Returns
(match offset, match size)
- Return type
(int, int)
- search_all(pattern, offset=0, size=0)
search for the pcre2-compliant regexp <pattern> in the file, starting at <offset> (physical address) and covering at most <size> bytes (up to EOF if size is zero). Returns an iterator over all the matches that were found. Example:
for offset, size in malcat.file.search_all(r"\w+\.html"):
    print("pattern '{}' found at offset #0x{:x}".format(malcat.file[offset:offset+size], offset))
- Parameters
pattern (str) – pcre2-compliant regular expression.
offset (int) – physical file offset where to start the search, defaults to 0.
size – number of bytes over which the search is performed. If zero, the search is performed up to the last byte in the file.
- Returns
iterator over pairs (match offset, match size)
- Return type
iterator
Note
The search is performed lazily, i.e. the second match will be found only after you’ve iterated over the first one.
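Lazy evaluation makes it cheap to stop early. The sketch below keeps only the first few matches using itertools.islice, so the remainder of the interval is never scanned. first_matches and its limit parameter are hypothetical names. The only assumption about the file object is the search_all(pattern, offset, size) signature documented above.

```python
from itertools import islice


def first_matches(file_obj, pattern, limit=10, offset=0, size=0):
    """Return at most `limit` (match_offset, match_size) pairs.

    Because search_all is lazy, islice stops pulling from the iterator
    once `limit` matches have been collected.
    """
    matches = file_obj.search_all(pattern, offset, size)
    return list(islice(matches, limit))
```

In a Malcat script, `first_matches(malcat.file, r"\w+\.html", limit=5)` would return up to five (offset, size) pairs, which can then be turned into bytes with `malcat.file[offset:offset+size]`.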