Disassembly (analysis.asm)

analysis.asm: malcat.Asm: The analysis.asm object is a malcat.Asm instance that gives you access to Malcat’s disassembler.

Note that in addition to this documentation, you can find usage examples in the sample script which is loaded when you hit F8.

Disassembling 

The disassembly interface in Malcat is accessed through the analysis.asm object. Every file-backed address can be disassembled, not only those identified as code. Purely virtual addresses can also be disassembled, but the memory content will be assumed to be all zeroes.

Although Malcat does not internally use an intermediate representation common to all the supported architectures, most of the disassembly interface is architecture-agnostic, i.e. you can use the same code for all architectures.

Note

We are not very happy with the internal disassembly architecture right now, so keep in mind that this interface may change in the future.

class malcat.Asm

This class is an interface to Malcat’s Disassembler. Note that all addresses used in this class are effective addresses. See Addressing in Malcat for more details.

Disassembling

__getitem__(interval)

Iterate over all the instructions contained in the interval (effective addresses):

ep_rva = analysis.struct['OptionalHeader']['AddressOfEntryPoint']
ep_fn = analysis.fns[analysis.map.from_rva(ep_rva)]
for basic_block in analysis.cfg[ep_fn.start : ep_fn.end]:
    if basic_block.code:
        for insn in analysis.asm[basic_block.start : basic_block.end]:
            print(insn)

Parameters: interval (slice) – effective address interval
Return type: iterator over the sequence of instructions (Instruction)
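As a minimal sketch of what the slice form can be used for, the helper below counts mnemonics over any iterable of instructions. The name mnemonic_histogram is hypothetical (it is not part of Malcat) and it relies only on the mnemonic attribute documented further down. The commented call assumes a Malcat script context where analysis is defined.

```python
from collections import Counter


def mnemonic_histogram(instructions):
    """Count how often each mnemonic occurs in an iterable of instructions."""
    # Only the documented `mnemonic` attribute is read here.
    return Counter(insn.mnemonic for insn in instructions)


# Inside a Malcat script (assumption: `fn` is a function taken from analysis.fns):
# histogram = mnemonic_histogram(analysis.asm[fn.start : fn.end])
# for mnemonic, count in histogram.most_common(5):
#     print(mnemonic, count)
```

Because the helper takes a plain iterable, the same code works for a basic block, a function or any other address interval.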

__getitem__(ea)

Disassemble the instruction at effective address ea:

ep_rva = analysis.struct['OptionalHeader']['AddressOfEntryPoint']
ep_fn = analysis.fns[analysis.map.from_rva(ep_rva)]
first_insn = analysis.asm[ep_fn.start]
print(first_insn)

Parameters: ea (int) – effective address where to disassemble
Return type: Instruction

size(ea)

Returns the size (in bytes) of the instruction located at effective address ea.

Parameters: ea (int) – effective address of the instruction to measure
Return type: int
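The sketch below shows one possible use of size(): stepping through a range and collecting instruction start addresses without building Instruction objects. The helper name instruction_starts is hypothetical, and the guard against a non-positive size is a defensive assumption, since the documentation does not state what size() returns for undecodable bytes.

```python
def instruction_starts(asm, start, end):
    """Return the addresses visited when stepping from start to end with asm.size()."""
    addresses = []
    ea = start
    while ea < end:
        addresses.append(ea)
        step = asm.size(ea)
        if step <= 0:
            # Assumption: stop rather than loop forever if no size is reported.
            break
        ea += step
    return addresses


# Inside a Malcat script (assumption: `bb` is a code basic block from analysis.cfg):
# print([hex(ea) for ea in instruction_starts(analysis.asm, bb.start, bb.end)])
```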

Utility functions

align(ea)

Returns the estimated start address of the assembly instruction located at effective address ea (ea can point to the middle of the instruction).

Note that unlike the CFG.align() method, this is merely an approximation of the start of the instruction. For some architectures (e.g. x86) getting the real start of the instruction is not always decidable.

start_of_instr = analysis.asm.align(0x100)

Parameters: ea (int) – effective address for the query
Return type: int (effective address)
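A small sketch of how align() might be combined with indexing. Both helper names are hypothetical. Since align() is described above as an approximation, the results should be treated as hints rather than ground truth.

```python
def is_instruction_start(asm, ea):
    """Tell whether ea is estimated to be the first byte of an instruction."""
    return asm.align(ea) == ea


def instruction_around(asm, ea):
    """Disassemble the instruction that is estimated to contain ea."""
    return asm[asm.align(ea)]


# Inside a Malcat script:
# if not is_instruction_start(analysis.asm, 0x100):
#     print(instruction_around(analysis.asm, 0x100))
```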

Instructions 

Instruction object 

class malcat.Instruction

This class gives you information about a disassembled instruction.

Instruction location

address: int (effective address): the address of the first byte of the instruction

start: int (effective address): same as address

end: int (effective address)

address of the first byte after the instruction

ep_rva = analysis.struct['OptionalHeader']['AddressOfEntryPoint']
ep_fn = analysis.fns[analysis.map.from_rva(ep_rva)]
first_insn = analysis.asm[ep_fn.start]
next_insn = analysis.asm[first_insn.end]

size: int

size in bytes of the instruction: end - start

ep_rva = analysis.struct['OptionalHeader']['AddressOfEntryPoint']
ep_fn = analysis.fns[analysis.map.from_rva(ep_rva)]
first_insn = analysis.asm[ep_fn.start]
next_insn = analysis.asm[ep_fn.start + first_insn.size]
# equivalent
next_insn = analysis.asm[first_insn.end]

bb: malcat.BasicBlock: the basic block this instruction belongs to

function: malcat.Function: the function this instruction belongs to
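The location attributes describe a half-open range [start, end). As a sketch under that reading, the hypothetical helpers below test whether an address falls inside an instruction and whether a sequence of instructions is contiguous. They use only the start and end attributes documented above.

```python
def covers(insn, ea):
    """True if ea lies inside the half-open range [insn.start, insn.end)."""
    return insn.start <= ea < insn.end


def are_contiguous(instructions):
    """True if every instruction starts exactly where the previous one ends."""
    instructions = list(instructions)
    return all(
        previous.end == current.start
        for previous, current in zip(instructions, instructions[1:])
    )


# Inside a Malcat script (assumption: `bb` is a code basic block from analysis.cfg):
# print(are_contiguous(analysis.asm[bb.start : bb.end]))
```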

Operands

This set of functions gives you access to the instruction operands.

__len__()

Return the number of operands of the instruction

ep_rva = analysis.struct['OptionalHeader']['AddressOfEntryPoint']
ep_fn = analysis.fns[analysis.map.from_rva(ep_rva)]
first_insn = analysis.asm[ep_fn.start]
print(f"{first_insn} has {len(first_insn)} operands")

Return type: int

__iter__()

Iterate over the instruction operands

ep_rva = analysis.struct['OptionalHeader']['AddressOfEntryPoint']
ep_fn = analysis.fns[analysis.map.from_rva(ep_rva)]
first_insn = analysis.asm[ep_fn.start]
for i, operand in enumerate(first_insn):
    print(f"{i}th operand of {first_insn}: {operand.type} ({operand.value})")

Return type: iterator over malcat.InstructionOperand

__getitem__(i)

Returns the ith malcat.InstructionOperand of the instruction.

ep_rva = analysis.struct['OptionalHeader']['AddressOfEntryPoint']
ep_fn = analysis.fns[analysis.map.from_rva(ep_rva)]
first_insn = analysis.asm[ep_fn.start]
for i in range(len(first_insn)):
    print(f"{i}th operand of {first_insn}: {first_insn[i].type} ({first_insn[i].value})")

Parameters: i (int) – zero-based index of the operand to get
Return type: malcat.InstructionOperand instance
Raises: IndexError if i >= len(instr)
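Iteration and the operand attributes can be combined to filter operands. The helper below is a hypothetical sketch that takes the wanted operand type as a parameter, so it makes no assumption about how the enum is imported. The commented call assumes the enum is reachable as malcat.InstructionOperand.Type.

```python
def operand_values(insn, wanted_type):
    """Return the `value` of every operand of insn whose type equals wanted_type."""
    return [operand.value for operand in insn if operand.type == wanted_type]


# Inside a Malcat script (assumption: `malcat` is importable in the script context):
# constants = operand_values(first_insn, malcat.InstructionOperand.Type.CONSTANT)
# print([hex(value) for value in constants])
```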

Other

type: malcat.Instruction.Type

Type (aka category) of the instruction

from bindings import Instruction

ep_rva = analysis.struct['OptionalHeader']['AddressOfEntryPoint']
ep_fn = analysis.fns[analysis.map.from_rva(ep_rva)]
first_insn = analysis.asm[ep_fn.start]
if first_insn.type == Instruction.Type.RETURN:
    raise ValueError("Empty EP")
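A sketch of a slightly more general use of the type attribute: finding the first instruction of a given category in a sequence. The helper name first_of_type is hypothetical. The commented call assumes Instruction is imported as in the example above and that fn is a function taken from analysis.fns.

```python
def first_of_type(instructions, wanted_type):
    """Return the first instruction whose type equals wanted_type, or None."""
    for insn in instructions:
        if insn.type == wanted_type:
            return insn
    return None


# Inside a Malcat script:
# first_call = first_of_type(analysis.asm[fn.start : fn.end], Instruction.Type.CALL)
# if first_call is not None:
#     print(first_call)
```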

mnemonic: str

Textual representation of the instruction’s mnemonic (i.e. the opcode without its operands)

ep_rva = analysis.struct['OptionalHeader']['AddressOfEntryPoint']
ep_fn = analysis.fns[analysis.map.from_rva(ep_rva)]
first_insn = analysis.asm[ep_fn.start]
print(first_insn.mnemonic)
# prints: mov

inrefs: List[malcat.Reference]

List of all data and code locations (as malcat.Reference objects) that reference this instruction

first_insn = analysis.asm[analysis.v2a(0x18000147a)]
for inref in first_insn.inrefs:
    print(f"instruction {first_insn} is referenced by {analysis.ppa(inref.address)} ({inref.type})")
# prints: instruction mov rcx, [0x18002C670] is referenced by 0x180001463 (sub_180001400+63) (Type.CODE)

outrefs: List[malcat.Reference]

List of all data and code locations (as malcat.Reference objects) referenced by this instruction

first_insn = analysis.asm[analysis.v2a(0x180001127)]
for outref in first_insn.outrefs:
    print(f"instruction {first_insn} references {analysis.ppa(outref.address)}")
# prints: instruction mov dword ptr [0x18002B0E0], 0x01 references 0x18002b0e0 (.data:30e0)

__repr__()

Return the disassembled instruction as a string

ep_rva = analysis.struct['OptionalHeader']['AddressOfEntryPoint']
ep_fn = analysis.fns[analysis.map.from_rva(ep_rva)]
first_insn = analysis.asm[ep_fn.start]
disas = str(first_insn)

Return type: str

disasm(use_hexadecimal=True, resolve_symbols=True, resolve_functions=True, resolve_strings=True, resolve_structures=True)

Disassemble this instruction using the given formatting options

Parameters:

use_hexadecimal (bool) – display immediates in hexadecimal base
resolve_symbols (bool) – known symbol addresses will be replaced by their symbol names
resolve_functions (bool) – known function start addresses will be replaced by their function name
resolve_strings (bool) – known string addresses will be replaced by the string content
resolve_structures (bool) – known structure/fields addresses will be replaced by their structure/field name

Return type:

str
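As a sketch of how the formatting flags might be used, the hypothetical helper below builds a listing with every resolution option turned off, so that operands are shown as raw values. It passes only the keyword arguments listed in the signature above.

```python
def raw_listing(instructions):
    """Return one 'address: text' line per instruction, without any name resolution."""
    lines = []
    for insn in instructions:
        text = insn.disasm(
            use_hexadecimal=True,
            resolve_symbols=False,
            resolve_functions=False,
            resolve_strings=False,
            resolve_structures=False,
        )
        lines.append(f"{insn.address:#x}: {text}")
    return lines


# Inside a Malcat script (assumption: `fn` is a function taken from analysis.fns):
# print("\n".join(raw_listing(analysis.asm[fn.start : fn.end])))
```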

Instruction types 

In Malcat, every instruction of every supported architecture is assigned an instruction category, or instruction type. This helps when writing heuristics, anomalies or scripts that work across different architectures.

class malcat.Instruction.Type

This enum describes all the different instruction categories.

ADD: add-like opcodes

AND: and-like opcodes

ASSIGN: mov-like opcodes

CALL: calls

CAST: cast-like opcodes

CJUMP: conditional jumps opcodes

CMP: comparison opcodes

DIV: div-like opcodes

FAULTY: faulty opcodes (i.e very likely to raise an error when executed, like int3)

FPU: fpu opcodes

INVALID: invalid opcodes (could not be decoded)

JUMP: non-conditional jumps

LSHIFT: lelft shift opcodes

MMX: mmx opcodes

MUL: mul-like opcodes

NOP: nop-like opcodes

OR: or-like opcodes

POP: pop-like opcodes

PUSH: push-like opcodes

RETURN: return-like opcodes

RSHIFT: right shift-like opcodes

STACK: stack-like opcodes (like dup or stack frame setup)

SUB: sub-like opcodes

XOR: xor-like opcodes

OTHER: opcodes which don’t fit in any other category

Instruction operands 

Each operand of an instruction is represented by an InstructionOperand instance. The disassembly interface is still a work in progress, so the operand interface is somewhat limited for now; it will be refactored once all the planned CPU architectures are supported. For now, you can query the following properties of an instruction operand:

class malcat.InstructionOperand

This class describes a single operand of an instruction.

type: InstructionOperand.Type

What kind of operand it is. It can be:

malcat.InstructionOperand.Type.CONSTANT
malcat.InstructionOperand.Type.REGISTER
malcat.InstructionOperand.Type.LOCAL
malcat.InstructionOperand.Type.OBJECT
malcat.InstructionOperand.Type.GLOBAL
malcat.InstructionOperand.Type.SYMBOL

action: InstructionOperand.Action

How the operand is accessed. It can be:

malcat.InstructionOperand.Action.NONE
malcat.InstructionOperand.Action.R
malcat.InstructionOperand.Action.W
malcat.InstructionOperand.Action.RW

value: int: The immediate value of the operand, or None if not appicable (e.g. it’s a register).

register: int: The register id of the operand or None if not applicable (e.g its not a register)

symbol: str: The symbolic value of the operand or None if not applicable (e.g its not a symbol). For .net for instance, this could be a typedef/methoddef name

class malcat.InstructionOperand.Type

CONSTANT: The operand is is an immediate value, like in push 0x05

REGISTER: Program-wide registers, like eax in x86, or $R1 register in NSIS

LOCAL: A variable local to the current function, like locals or args in .net, or [ebp/esp+XXX] in x86

OBJECT: A pointer to an instanciated object or field thereof. For x86, it is all non-local [reg] or [reg+XXX] addressing, for .NET every fields or objects

GLOBAL: A global variable, like push [0x405678]

SYMBOL: A non-resolved symbol, like a class type in Python

class malcat.InstructionOperand.Action

NONE: The operand is not accessed.

R: The operand is read.

W: The operand is written.

RW: The operand is read and written.
