Disassembly (malcat.asm)
- malcat.asm: bindings.Asm
The malcat.asm object is a bindings.Asm instance that gives you access to Malcat’s Disassembler.
Note that in addition to this documentation, you can find usage examples in the sample script which is loaded when you hit F8.
Disassembling
The disassembly interface in Malcat is accessed through the malcat.asm object. Every file-backed address can be disassembled, not only those identified as code. Purely virtual addresses can also be disassembled, but the memory content will be assumed to be all zeroes.
Although Malcat does not internally use an intermediate representation common to all supported architectures, most of the disassembly interface is architecture-agnostic, i.e. you can use the same code for every architecture.
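As a rough illustration of that architecture-agnostic style, the sketch below counts mnemonics over an address range using only slicing and the mnemonic attribute documented on this page. The helper name mnemonic_histogram is hypothetical, and the asm argument is assumed to behave like malcat.asm.

```python
from collections import Counter


def mnemonic_histogram(asm, start, end):
    """Count the mnemonics of the instructions in [start, end).

    `asm` is assumed to behave like malcat.asm: slicing it with effective
    addresses yields objects exposing a `mnemonic` string.
    """
    return Counter(insn.mnemonic for insn in asm[start:end])


# Possible usage inside Malcat, with ep_fn obtained as in the examples
# of this page:
#   print(mnemonic_histogram(malcat.asm, ep_fn.start, ep_fn.end).most_common(5))
```

Nothing in the helper depends on the CPU, so the same call should work on any architecture whose instructions expose a mnemonic.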
Note
We are not very happy with the internal disassembly architecture right now, so keep in mind that this interface may change in the future.
- class bindings.Asm
This class is an interface to Malcat’s Disassembler. Note that all addresses used in this class are effective addresses. See Addressing in Malcat for more details.
Disassembling
- __getitem__(interval)
Iterate over all the instructions contained in the interval (effective addresses):
    ep_rva = malcat.struct['OptionalHeader']['AddressOfEntryPoint']
    ep_fn = malcat.fns[malcat.map.from_rva(ep_rva)]
    for basic_block in malcat.cfg[ep_fn.start : ep_fn.end]:
        if basic_block.code:
            for insn in malcat.asm[basic_block.start : basic_block.end]:
                print(insn)
- Parameters
interval (slice) – effective address interval
- Return type
iterator over the sequence of instructions (Instruction)
- __getitem__(ea)
Disassemble the instruction at effective address ea:
    ep_rva = malcat.struct['OptionalHeader']['AddressOfEntryPoint']
    ep_fn = malcat.fns[malcat.map.from_rva(ep_rva)]
    first_insn = malcat.asm[ep_fn.start]
    print(first_insn)
- Parameters
ea (int) – effective address where to disassemble
- Return type
Instruction
- size(ea)
Returns the size (in bytes) of the instruction located at effective address ea.
- Parameters
ea (int) – effective address of the instruction to measure
- Return type
int
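A minimal sketch of how size() could drive a linear sweep over an address range. The helper name instruction_starts is hypothetical, and the guard against a non-positive size is an assumption: this page does not say what size() returns for bytes that cannot be decoded.

```python
def instruction_starts(asm, start, end):
    """Yield the start address of each instruction met while walking
    linearly through [start, end) (effective addresses).

    `asm` is assumed to behave like malcat.asm, i.e. to expose size(ea).
    """
    ea = start
    while ea < end:
        size = asm.size(ea)
        if size <= 0:
            # Assumption: stop rather than loop forever if no size is reported
            break
        yield ea
        ea += size


# Possible usage inside Malcat:
#   for ea in instruction_starts(malcat.asm, ep_fn.start, ep_fn.end):
#       print(hex(ea))
```

Iterating over malcat.asm[start : end] is the more direct way to get the instructions themselves; this variant only helps when the boundaries alone are needed.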
Utility functions
- align(ea)
Returns the estimated start address of the assembly instruction located at effective address ea (ea can point to the middle of the instruction).
Note that unlike the CFG.align() method, this is merely an approximation of the start of the instruction. For some architectures (e.g. x86) getting the real start of the instruction is not always decidable.
    start_of_instr = malcat.asm.align(0x100)
- Parameters
ea (int) – effective address for the query
- Return type
int (effective address)
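Because align() only returns an estimate, a caller may want to check that the instruction decoded at the returned address really covers the queried address. The sketch below does that with the address and end attributes of Instruction; the helper name instruction_containing is hypothetical.

```python
def instruction_containing(asm, ea):
    """Return the instruction that covers `ea`, or None if the estimate
    returned by align() does not yield an instruction spanning `ea`.

    `asm` is assumed to behave like malcat.asm (align() and indexing).
    """
    start = asm.align(ea)
    insn = asm[start]
    # `end` is the address of the first byte after the instruction
    if insn.address <= ea < insn.end:
        return insn
    return None


# Possible usage inside Malcat:
#   insn = instruction_containing(malcat.asm, 0x100)
#   if insn is not None:
#       print(insn)
```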
Instructions
Instruction object
- class bindings.Instruction
This class gives you information about a disassembled instruction.
Instruction location
- address: int (effective address)
the address of the first byte of the instruction
- end: int (effective address)
address of the first byte after the instruction
    ep_rva = malcat.struct['OptionalHeader']['AddressOfEntryPoint']
    ep_fn = malcat.fns[malcat.map.from_rva(ep_rva)]
    first_insn = malcat.asm[ep_fn.start]
    next_insn = malcat.asm[first_insn.end]
- size: int
size in bytes of the instruction: end - address
    ep_rva = malcat.struct['OptionalHeader']['AddressOfEntryPoint']
    ep_fn = malcat.fns[malcat.map.from_rva(ep_rva)]
    first_insn = malcat.asm[ep_fn.start]
    next_insn = malcat.asm[ep_fn.start + first_insn.size]
    # equivalent
    next_insn = malcat.asm[first_insn.end]
Operands
This set of functions gives you access to the instruction operands.
- __len__()
Return the number of operands of the instruction
    ep_rva = malcat.struct['OptionalHeader']['AddressOfEntryPoint']
    ep_fn = malcat.fns[malcat.map.from_rva(ep_rva)]
    first_insn = malcat.asm[ep_fn.start]
    print(f"{first_insn} has {len(first_insn)} operands")
- Return type
int
- __iter__()
Iterate over the instruction operands
    ep_rva = malcat.struct['OptionalHeader']['AddressOfEntryPoint']
    ep_fn = malcat.fns[malcat.map.from_rva(ep_rva)]
    first_insn = malcat.asm[ep_fn.start]
    for i, operand in enumerate(first_insn):
        print(f"{i}th operand of {first_insn}: {operand.type} ({operand.value})")
- Return type
iterator over InstructionOperand
- __getitem__(i)
Returns the ith InstructionOperand of the instruction.
    ep_rva = malcat.struct['OptionalHeader']['AddressOfEntryPoint']
    ep_fn = malcat.fns[malcat.map.from_rva(ep_rva)]
    first_insn = malcat.asm[ep_fn.start]
    for i in range(len(first_insn)):
        print(f"{i}th operand of {first_insn}: {first_insn[i].type} ({first_insn[i].value})")
- Parameters
i (int) – zero-based index of the operand to get
- Return type
InstructionOperand instance
- Raises
IndexError if i >= len(instr)
Other
- type: bindings.Instruction.Type
Type (aka category) of the instruction
    from bindings import Instruction
    ep_rva = malcat.struct['OptionalHeader']['AddressOfEntryPoint']
    ep_fn = malcat.fns[malcat.map.from_rva(ep_rva)]
    first_insn = malcat.asm[ep_fn.start]
    if first_insn.type == Instruction.Type.RETURN:
        raise ValueError("Empty EP")
- mnemonic: str
Textual representation of the instruction’s mnemonic (i.e. the opcode without its operands)
    ep_rva = malcat.struct['OptionalHeader']['AddressOfEntryPoint']
    ep_fn = malcat.fns[malcat.map.from_rva(ep_rva)]
    first_insn = malcat.asm[ep_fn.start]
    print(first_insn.mnemonic)  # e.g. "mov"
- __repr__()
Returns the textual representation of the disassembled instruction
    ep_rva = malcat.struct['OptionalHeader']['AddressOfEntryPoint']
    ep_fn = malcat.fns[malcat.map.from_rva(ep_rva)]
    first_insn = malcat.asm[ep_fn.start]
    disas = str(first_insn)
- Return type
str
Instruction types
In Malcat, every instruction of every supported architecture is assigned an instruction category, or instruction type. This helps when writing heuristics, anomalies or scripts that work across different architectures.
- class bindings.Instruction.Type
This enum describes all the different instruction categories.
- ADD
add-like opcodes
- AND
and-like opcodes
- ASSIGN
mov-like opcodes
- CALL
calls
- CAST
cast-like opcodes
- CJUMP
conditional jumps opcodes
- CMP
comparison opcodes
- DIV
div-like opcodes
- FAULTY
faulty opcodes (i.e. very likely to raise an error when executed, like int3)
- FPU
fpu opcodes
- INVALID
invalid opcodes (could not be decoded)
- JUMP
non-conditional jumps
- LSHIFT
left shift opcodes
- MMX
mmx opcodes
- MUL
mul-like opcodes
- NOP
nop-like opcodes
- OR
or-like opcodes
- POP
pop-like opcodes
- PUSH
push-like opcodes
- RETURN
return-like opcodes
- RSHIFT
right shift-like opcodes
- STACK
stack-like opcodes (like dup or stack frame setup)
- SUB
sub-like opcodes
- XOR
xor-like opcodes
- OTHER
opcodes which don’t fit in any other category
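A short sketch of how these categories might be used to filter instructions independently of the architecture. The helper name addresses_by_type is hypothetical; the wanted categories are passed in by the caller so that the helper itself does not need to import bindings.

```python
def addresses_by_type(instructions, wanted_types):
    """Return the addresses of the instructions whose type is one of
    `wanted_types`.

    `instructions` is any iterable of objects exposing `address` and `type`,
    such as the result of slicing malcat.asm.
    """
    wanted = tuple(wanted_types)
    # Membership in a tuple only relies on equality, not on hashing
    return [insn.address for insn in instructions if insn.type in wanted]


# Possible usage inside Malcat:
#   from bindings import Instruction
#   branches = addresses_by_type(
#       malcat.asm[ep_fn.start : ep_fn.end],
#       [Instruction.Type.CALL, Instruction.Type.JUMP, Instruction.Type.CJUMP],
#   )
```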
Instruction operands
Each operand of an instruction is represented by an InstructionOperand instance. Since the disassembly interface is still a work in progress, the operand interface is a bit limited for now. It will be properly refactored once we have all the CPU architectures we want. For now, you can query the following properties of an instruction operand:
- class bindings.InstructionOperand
This class gives you information about a single operand of a disassembled instruction.
- type: InstructionOperand.Type
What kind of operand it is, as one of the InstructionOperand.Type values.
- action: InstructionOperand.Action
How the operand is accessed, as one of the InstructionOperand.Action values.
- value: int
The immediate value of the operand, or None if not applicable (e.g. it’s a register)
- register: int
The register id of the operand, or None if not applicable (e.g. it’s not a register)
- symbol: str
The symbolic value of the operand, or None if not applicable (e.g. it’s not a symbol)
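Since value, register and symbol are each None when they do not apply, a script can probe them in turn. The sketch below is one possible way to do so; the helper name describe_operand and the order of the checks are assumptions, and an operand may well have more than one of these properties set.

```python
def describe_operand(op):
    """Return a (kind, payload) pair for the first property of `op`
    that is not None.

    `op` is assumed to expose `symbol`, `register` and `value`, like
    InstructionOperand. The probing order is an arbitrary choice.
    """
    if op.symbol is not None:
        return ("symbol", op.symbol)
    if op.register is not None:
        return ("register", op.register)
    if op.value is not None:
        return ("value", op.value)
    return ("unknown", None)


# Possible usage inside Malcat:
#   for operand in malcat.asm[ep_fn.start]:
#       print(describe_operand(operand))
```

The explicit comparison against None matters here: an immediate value of 0 is a legitimate operand and would be skipped by a plain truthiness test.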
- class bindings.InstructionOperand.Type
- CONSTANT
The operand is an immediate value, like in push 0x05
- REGISTER
Program-wide registers, like eax in x86, or the $R1 register in NSIS
- LOCAL
A variable local to the current function, like locals or args in .NET, or [ebp/esp+XXX] in x86
- OBJECT
A pointer to an instantiated object or a field thereof. For x86, this covers all non-local [reg] or [reg+XXX] addressing; for .NET, every field or object
- GLOBAL
A global variable, like push [0x405678]
- SYMBOL
A non-resolved symbol, like a class type in Python
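The operand types can be combined with the value property to pull out, for instance, every immediate used in a range of code. A minimal sketch, assuming the caller passes in the type to match (for example InstructionOperand.Type.CONSTANT); the helper name constant_values is hypothetical, and importing InstructionOperand from bindings is an assumption modelled on the Instruction import shown on this page.

```python
def constant_values(instructions, constant_type):
    """Collect the values of all operands whose type equals `constant_type`.

    `instructions` is any iterable of instruction-like objects that can be
    iterated to get their operands, each exposing `type` and `value`.
    """
    values = []
    for insn in instructions:
        for operand in insn:
            if operand.type == constant_type and operand.value is not None:
                values.append(operand.value)
    return values


# Possible usage inside Malcat (import path is an assumption):
#   from bindings import InstructionOperand
#   imms = constant_values(
#       malcat.asm[ep_fn.start : ep_fn.end],
#       InstructionOperand.Type.CONSTANT,
#   )
```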