Disassembly (analysis.asm)

analysis.asm: malcat.Asm

The analysis.asm object is a malcat.Asm instance that gives you access to Malcat’s disassembler.

Note that in addition to this documentation, you can find usage examples in the sample script which is loaded when you hit F8.

Disassembling

The disassembly interface in Malcat is accessed through the analysis.asm object. Every file-backed address can be disassembled, not only those identified as code. Purely virtual addresses can also be disassembled, but their memory content is assumed to be all zeroes.

While Malcat does not internally use an intermediate representation common to all the supported architectures, most of the disassembly interface is architecture-agnostic, i.e. you can use the same code for all architectures.

Note

We are not very happy with the internal disassembly architecture right now, so keep in mind that this interface may change in the future.

class malcat.Asm

This class is an interface to Malcat’s Disassembler. Note that all addresses used in this class are effective addresses. See Addressing in Malcat for more details.

Disassembling

__getitem__(interval)

Iterate over all the instructions contained in the interval (effective addresses):

ep_rva = analysis.struct['OptionalHeader']['AddressOfEntryPoint']
ep_fn = analysis.fns[analysis.map.from_rva(ep_rva)]
for basic_block in analysis.cfg[ep_fn.start : ep_fn.end]:
    if basic_block.code:
        for insn in analysis.asm[basic_block.start : basic_block.end]:
            print(insn)
Parameters:

interval (slice) – effective address interval

Return type:

iterator over the sequence of instructions (Instruction)
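
As a rough illustration of the slice form, the sketch below counts mnemonics over any iterable of instructions. The helper name mnemonic_histogram is hypothetical (it is not part of Malcat) and it only relies on the mnemonic attribute documented for Instruction.

```python
from collections import Counter

def mnemonic_histogram(instructions):
    """Count how often each mnemonic occurs in an iterable of instructions."""
    # Only the documented `mnemonic` attribute of each instruction is used.
    return Counter(insn.mnemonic for insn in instructions)

# Inside a Malcat script (assuming `start` and `end` are effective addresses):
#     histogram = mnemonic_histogram(analysis.asm[start:end])
#     print(histogram.most_common(5))
```

Since the helper accepts any iterable, you can feed it a single basic block or a whole function.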

__getitem__(ea)

Disassemble the instruction at effective address ea:

ep_rva = analysis.struct['OptionalHeader']['AddressOfEntryPoint']
ep_fn = analysis.fns[analysis.map.from_rva(ep_rva)]
first_insn = analysis.asm[ep_fn.start]
print(first_insn)
Parameters:

ea (int) – effective address where to disassemble

Return type:

Instruction

size(ea)

Returns the size (in bytes) of the instruction located at effective address ea.

Parameters:

ea (int) – effective address of the instruction to measure

Return type:

int
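
If you only need instruction boundaries, size() can drive a simple linear sweep. The following is a minimal sketch: instruction_starts is a hypothetical helper, and stopping on a non-positive size is an assumption made here for safety, not documented Malcat behaviour.

```python
def instruction_starts(size_of, start, end):
    """Yield successive instruction addresses in [start, end).

    `size_of` is any callable mapping an effective address to an
    instruction size in bytes, e.g. analysis.asm.size.
    """
    ea = start
    while ea < end:
        yield ea
        step = size_of(ea)
        if step <= 0:
            # Assumption: bail out instead of looping forever.
            break
        ea += step

# Inside a Malcat script (assuming `fn` is a function object):
#     for ea in instruction_starts(analysis.asm.size, fn.start, fn.end):
#         print(hex(ea))
```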

Utility functions

align(ea)

Returns the estimated start address of the assembly instruction located at effective address ea (ea can point to the middle of the instruction).

Note that unlike the CFG.align() method, this is merely an approximation of the start of the instruction. For some architectures (e.g. x86) getting the real start of the instruction is not always decidable.

start_of_instr = analysis.asm.align(0x100)
Parameters:

ea (int) – effective address for the query

Return type:

int (effective address)
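
One possible use of align() is a quick check of whether an address looks like an instruction boundary. The helper below is a hypothetical sketch; its answer is only as reliable as the estimate returned by align().

```python
def is_probable_instruction_start(align, ea):
    """Return True if `align` maps `ea` onto itself.

    `align` is any callable with the same contract as analysis.asm.align.
    """
    return align(ea) == ea

# Inside a Malcat script:
#     if not is_probable_instruction_start(analysis.asm.align, 0x100):
#         print("0x100 seems to point inside an instruction")
```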

Instructions

Instruction object

class malcat.Instruction

This class gives you information about a disassembled instruction.

Instruction location

address: int (effective address)

the address of the first byte of the instruction

start: int (effective address)

same as address

end: int (effective address)

address of the first byte after the instruction

ep_rva = analysis.struct['OptionalHeader']['AddressOfEntryPoint']
ep_fn = analysis.fns[analysis.map.from_rva(ep_rva)]
first_insn = analysis.asm[ep_fn.start]
next_insn = analysis.asm[first_insn.end]
size: int

size in bytes of the instruction: end - start

ep_rva = analysis.struct['OptionalHeader']['AddressOfEntryPoint']
ep_fn = analysis.fns[analysis.map.from_rva(ep_rva)]
first_insn = analysis.asm[ep_fn.start]
next_insn = analysis.asm[ep_fn.start + first_insn.size]
# equivalent
next_insn = analysis.asm[first_insn.end]
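
Because end is the address of the first byte after the instruction, an instruction covers the half-open range from start to end. The sketch below uses that to find the instruction covering an address; find_covering is a hypothetical helper, not a Malcat function.

```python
def find_covering(instructions, ea):
    """Return the first instruction whose bytes include `ea`, else None."""
    for insn in instructions:
        # `end` is exclusive: it already belongs to the next instruction.
        if insn.start <= ea < insn.end:
            return insn
    return None

# Inside a Malcat script (assuming `fn` is a function object):
#     insn = find_covering(analysis.asm[fn.start:fn.end], some_ea)
```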

Operands

This set of functions gives you access to the instruction operands.

__len__()

Return the number of operands of the instruction

ep_rva = analysis.struct['OptionalHeader']['AddressOfEntryPoint']
ep_fn = analysis.fns[analysis.map.from_rva(ep_rva)]
first_insn = analysis.asm[ep_fn.start]
print(f"{first_insn} has {len(first_insn)} operands")
Return type:

int

__iter__()

Iterate over the instruction operands

ep_rva = analysis.struct['OptionalHeader']['AddressOfEntryPoint']
ep_fn = analysis.fns[analysis.map.from_rva(ep_rva)]
first_insn = analysis.asm[ep_fn.start]
for i, operand in enumerate(first_insn):
    print(f"{i}th operand of {first_insn}: {operand.type} ({operand.value})")
Return type:

iterator over malcat.InstructionOperand

__getitem__(i)

Returns the ith malcat.InstructionOperand of the instruction.

ep_rva = analysis.struct['OptionalHeader']['AddressOfEntryPoint']
ep_fn = analysis.fns[analysis.map.from_rva(ep_rva)]
first_insn = analysis.asm[ep_fn.start]
for i in range(len(first_insn)):
    print(f"{i}th operand of {first_insn}: {first_insn[i].type} ({first_insn[i].value})")
Parameters:

i (int) – zero-based index of the operand to get

Return type:

malcat.InstructionOperand instance

Raises:

IndexError if i >= len(instr)
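
Putting the operand accessors together, here is a small sketch that collects the immediate values of an instruction. immediate_values is a hypothetical name, and the code relies only on the value property being None when it does not apply.

```python
def immediate_values(insn):
    """Return the values of all operands that carry an immediate value."""
    # Compare against None explicitly so that an immediate of 0 is kept.
    return [operand.value for operand in insn if operand.value is not None]

# Inside a Malcat script:
#     print(immediate_values(analysis.asm[ea]))
```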

Other

type: malcat.Instruction.Type

Type (aka category) of the instruction

from bindings import Instruction

ep_rva = analysis.struct['OptionalHeader']['AddressOfEntryPoint']
ep_fn = analysis.fns[analysis.map.from_rva(ep_rva)]
first_insn = analysis.asm[ep_fn.start]
if first_insn.type == Instruction.Type.RETURN:
    raise ValueError("Empty EP")
mnemonic: str

Textual representation of the instruction’s mnemonic (i.e. the opcode without its operands)

ep_rva = analysis.struct['OptionalHeader']['AddressOfEntryPoint']
ep_fn = analysis.fns[analysis.map.from_rva(ep_rva)]
first_insn = analysis.asm[ep_fn.start]
print(first_insn.mnemonic)
>>> "mov"
__repr__()

Returns the textual representation of the disassembled instruction

ep_rva = analysis.struct['OptionalHeader']['AddressOfEntryPoint']
ep_fn = analysis.fns[analysis.map.from_rva(ep_rva)]
first_insn = analysis.asm[ep_fn.start]
disas = str(first_insn)
Return type:

str

Instruction types

In Malcat, every instruction of every supported architecture is assigned an instruction category, or instruction type. This helps when writing heuristics, anomalies or scripts that work across different architectures.

class malcat.Instruction.Type

This enum describes all the different instruction categories.

ADD

add-like opcodes

AND

and-like opcodes

ASSIGN

mov-like opcodes

CALL

calls

CAST

cast-like opcodes

CJUMP

conditional jump opcodes

CMP

comparison opcodes

DIV

div-like opcodes

FAULTY

faulty opcodes (i.e. very likely to raise an error when executed, like int3)

FPU

fpu opcodes

INVALID

invalid opcodes (could not be decoded)

JUMP

non-conditional jumps

LSHIFT

left shift-like opcodes

MMX

mmx opcodes

MUL

mul-like opcodes

NOP

nop-like opcodes

OR

or-like opcodes

POP

pop-like opcodes

PUSH

push-like opcodes

RETURN

return-like opcodes

RSHIFT

right shift-like opcodes

STACK

stack-like opcodes (like dup or stack frame setup)

SUB

sub-like opcodes

XOR

xor-like opcodes

OTHER

opcodes which don’t fit in any other category
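
Since these categories are shared by all architectures, a filter written against them should work regardless of the CPU. The sketch below keeps only the control-flow instructions. The helper name is hypothetical, and the enum is passed in as a parameter so the function does not depend on how you import it.

```python
def control_flow_instructions(instructions, types):
    """Keep the instructions categorized as call, jump or return.

    `types` is the instruction type enum, i.e. malcat.Instruction.Type.
    """
    branching = (types.CALL, types.JUMP, types.CJUMP, types.RETURN)
    return [insn for insn in instructions if insn.type in branching]

# Inside a Malcat script (assuming Instruction has been imported):
#     for insn in control_flow_instructions(analysis.asm[start:end], Instruction.Type):
#         print(insn)
```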

Instruction operands

Each operand of an instruction is represented by an InstructionOperand instance. Since the disassembly interface is still a work in progress, the operand interface is a bit limited for now. It will be properly refactored once we have all the CPU architectures we want. For now, you can query the following properties of an instruction operand:

class malcat.InstructionOperand

This class describes a single operand of a disassembled instruction.

type: InstructionOperand.Type

What kind of operand it is. It can be any of the InstructionOperand.Type values.

action: InstructionOperand.Action

How the operand is accessed. It can be any of the InstructionOperand.Action values.

value: int

The immediate value of the operand, or None if not applicable (e.g. it’s a register)

register: int

The register id of the operand, or None if not applicable (e.g. it’s not a register)

symbol: str

The symbolic value of the operand, or None if not applicable (e.g. it’s not a symbol)

class malcat.InstructionOperand.Type
CONSTANT

The operand is an immediate value, like in push 0x05

REGISTER

Program-wide registers, like eax in x86, or $R1 register in NSIS

LOCAL

A variable local to the current function, like locals or args in .NET, or [ebp/esp+XXX] in x86

OBJECT

A pointer to an instantiated object or a field thereof. For x86, this covers all non-local [reg] or [reg+XXX] addressing; for .NET, every field or object

GLOBAL

A global variable, like push [0x405678]

SYMBOL

A non-resolved symbol, like a class type in Python
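
As an example of how the operand type can be used, the sketch below tests whether an instruction has at least one operand of a given kind. has_operand_of_type is a hypothetical helper; it only iterates over the operands and compares their type property.

```python
def has_operand_of_type(insn, wanted_type):
    """Return True if any operand of `insn` has the given operand type."""
    return any(operand.type == wanted_type for operand in insn)

# Inside a Malcat script, e.g. to spot accesses to global variables
# (assuming InstructionOperand has been imported):
#     if has_operand_of_type(insn, InstructionOperand.Type.GLOBAL):
#         print(insn)
```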

class malcat.InstructionOperand.Action
NONE

The operand is not accessed.

R

The operand is read.

W

The operand is written.

RW

The operand is read and written.
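
Combining the action and register properties, you could for instance list the registers that an instruction modifies. This is a minimal sketch under the assumption that register is None for non-register operands, as stated above. The helper name is hypothetical and the action enum is passed in as a parameter.

```python
def written_registers(insn, actions):
    """Return the register ids of the operands that `insn` writes to.

    `actions` is the operand action enum, i.e. malcat.InstructionOperand.Action.
    """
    writes = (actions.W, actions.RW)
    return [
        operand.register
        for operand in insn
        if operand.register is not None and operand.action in writes
    ]

# Inside a Malcat script (assuming InstructionOperand has been imported):
#     print(written_registers(analysis.asm[ea], InstructionOperand.Action))
```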