Disassembly (analysis.asm)
- analysis.asm: malcat.Asm
The analysis.asm object is a malcat.Asm instance that gives you access to Malcat’s disassembler.
Note that in addition to this documentation, you can find usage examples in the sample script which is loaded when you hit F8.
Disassembling
The disassembly interface in Malcat is accessed through the analysis.asm object. Every file-backed address can be disassembled, not only those identified as code. Purely virtual addresses can also be disassembled, but their memory content is assumed to be all zeroes.
While Malcat does not internally use an intermediate representation common to all the supported architectures, most of the disassembly interface is architecture-agnostic, i.e. you can use the same code for all architectures.
Note
We are not very happy with the internal disassembly architecture right now, so keep in mind that this interface may change in the future.
- class malcat.Asm
This class is an interface to Malcat’s Disassembler. Note that all addresses used in this class are effective addresses. See Addressing in Malcat for more details.
Disassembling
- __getitem__(interval)
Iterate over all the instructions contained in the interval (effective addresses):
    ep_rva = analysis.struct['OptionalHeader']['AddressOfEntryPoint']
    ep_fn = analysis.fns[analysis.map.from_rva(ep_rva)]
    # walk every basic block of the entry point function
    for basic_block in analysis.cfg[ep_fn.start : ep_fn.end]:
        if basic_block.code:
            # disassemble every instruction of the block
            for insn in analysis.asm[basic_block.start : basic_block.end]:
                print(insn)
- Parameters:
interval (slice) – effective address interval
- Return type:
iterator over the sequence of instructions (Instruction)
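As a small illustration of interval disassembly, the sketch below counts how often each mnemonic appears in an address range. The helper name mnemonic_histogram is hypothetical, and the asm argument is only assumed to behave like analysis.asm (slicing yields objects with a mnemonic attribute).

```python
from collections import Counter


def mnemonic_histogram(asm, start, end):
    """Count how often each mnemonic occurs in [start, end).

    `asm` is assumed to behave like analysis.asm: slicing it with an
    effective-address interval yields Instruction-like objects.
    """
    return Counter(insn.mnemonic for insn in asm[start:end])
```

Inside a Malcat script this would presumably be called as mnemonic_histogram(analysis.asm, ep_fn.start, ep_fn.end), with ep_fn obtained as in the example above.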
- __getitem__(ea)
Disassemble the instruction at effective address ea:
    ep_rva = analysis.struct['OptionalHeader']['AddressOfEntryPoint']
    ep_fn = analysis.fns[analysis.map.from_rva(ep_rva)]
    first_insn = analysis.asm[ep_fn.start]
    print(first_insn)
- Parameters:
ea (int) – effective address where to disassemble
- Return type:
Instruction
- size(ea)
Returns the size (in bytes) of the instruction located at effective address ea.
- Parameters:
ea (int) – effective address of the instruction to measure
- Return type:
int
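One possible use of size() is a simple linear sweep that only needs instruction lengths. The sketch below is a hypothetical helper (instruction_starts is not part of Malcat); it assumes asm behaves like analysis.asm, and it guards against a non-positive size because the behaviour of size() on undecodable bytes is not documented here.

```python
def instruction_starts(asm, start, end):
    """Return the instruction start addresses found by sweeping [start, end)."""
    starts = []
    ea = start
    while ea < end:
        starts.append(ea)
        step = asm.size(ea)
        if step <= 0:
            # Assumption: a non-positive size would mean "cannot decode";
            # stop here rather than loop forever.
            break
        ea += step
    return starts
```

A linear sweep like this ignores control flow, so on data embedded in code it may drift out of sync with the real instruction stream.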
Utility functions
- align(ea)
Returns the estimated start address of the assembly instruction located at effective address ea (ea can point to the middle of the instruction).
Note that unlike the CFG.align() method, this is merely an approximation of the start of the instruction. For some architectures (e.g. x86), getting the real start of the instruction is not always decidable.
    start_of_instr = analysis.asm.align(0x100)
- Parameters:
ea (int) – effective address for the query
- Return type:
int (effective address)
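A typical use of align() is to check whether an address falls in the middle of an instruction. The two helpers below are hypothetical names for this sketch; they only assume that asm.align(ea) returns an effective address less than or equal to ea, and their answer is as approximate as align() itself.

```python
def offset_into_instruction(asm, ea):
    """Return how many bytes ea lies past the estimated instruction start."""
    return ea - asm.align(ea)


def is_instruction_start(asm, ea):
    """True if ea is (estimated to be) the first byte of an instruction."""
    return asm.align(ea) == ea
```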
Instructions
Instruction object
- class malcat.Instruction
This class gives you information about a disassembled instruction.
Instruction location
- address: int (effective address)
the address of the first byte of the instruction
- end: int (effective address)
address of the first byte after the instruction
    ep_rva = analysis.struct['OptionalHeader']['AddressOfEntryPoint']
    ep_fn = analysis.fns[analysis.map.from_rva(ep_rva)]
    first_insn = analysis.asm[ep_fn.start]
    # the next instruction starts right after the first one
    next_insn = analysis.asm[first_insn.end]
- size: int
size in bytes of the instruction (end - address):
    ep_rva = analysis.struct['OptionalHeader']['AddressOfEntryPoint']
    ep_fn = analysis.fns[analysis.map.from_rva(ep_rva)]
    first_insn = analysis.asm[ep_fn.start]
    next_insn = analysis.asm[ep_fn.start + first_insn.size]
    # equivalent:
    next_insn = analysis.asm[first_insn.end]
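The location attributes can be chained to read a fixed number of consecutive instructions. The sketch below uses a hypothetical helper, take_instructions, and assumes asm behaves like analysis.asm (indexing with an effective address returns an object with an end attribute).

```python
def take_instructions(asm, ea, count):
    """Disassemble up to `count` consecutive instructions starting at ea."""
    insns = []
    for _ in range(count):
        insn = asm[ea]
        insns.append(insn)
        if insn.end <= ea:
            # Assumption: guard against an instruction that does not
            # advance, so the walk always terminates.
            break
        ea = insn.end
    return insns
```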
Operands
This set of functions gives you access to the instruction operands.
- __len__()
Return the number of operands of the instruction
    ep_rva = analysis.struct['OptionalHeader']['AddressOfEntryPoint']
    ep_fn = analysis.fns[analysis.map.from_rva(ep_rva)]
    first_insn = analysis.asm[ep_fn.start]
    print(f"{first_insn} has {len(first_insn)} operands")
- Return type:
int
- __iter__()
Iterate over the instruction operands
    ep_rva = analysis.struct['OptionalHeader']['AddressOfEntryPoint']
    ep_fn = analysis.fns[analysis.map.from_rva(ep_rva)]
    first_insn = analysis.asm[ep_fn.start]
    for i, operand in enumerate(first_insn):
        print(f"{i}th operand of {first_insn}: {operand.type} ({operand.value})")
- Return type:
iterator over malcat.InstructionOperand
- __getitem__(i)
Returns the ith malcat.InstructionOperand of the instruction.
    ep_rva = analysis.struct['OptionalHeader']['AddressOfEntryPoint']
    ep_fn = analysis.fns[analysis.map.from_rva(ep_rva)]
    first_insn = analysis.asm[ep_fn.start]
    for i in range(len(first_insn)):
        print(f"{i}th operand of {first_insn}: {first_insn[i].type} ({first_insn[i].value})")
- Parameters:
i (int) – zero-based index of the operand to get
- Return type:
malcat.InstructionOperand instance
- Raises:
IndexError if i >= len(instr)
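Since an instruction can be iterated like a sequence of operands, collecting its immediate values takes one line. The helper name immediate_values is hypothetical; the sketch relies only on the documented behaviour that an operand's value is None when it has no immediate value.

```python
def immediate_values(insn):
    """Return the immediate values carried by the operands of insn."""
    return [op.value for op in insn if op.value is not None]
```

In a Malcat script, immediate_values(analysis.asm[ea]) would then give the constants used by the instruction at ea, which can be handy when searching for magic numbers.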
Other
- type: malcat.Instruction.Type
Type (aka category) of the instruction
    from bindings import Instruction
    ep_rva = analysis.struct['OptionalHeader']['AddressOfEntryPoint']
    ep_fn = analysis.fns[analysis.map.from_rva(ep_rva)]
    first_insn = analysis.asm[ep_fn.start]
    if first_insn.type == Instruction.Type.RETURN:
        raise ValueError("Empty EP")
- mnemonic: str
Textual representation of the instruction’s mnemonic (i.e. the opcode without its operands)
    ep_rva = analysis.struct['OptionalHeader']['AddressOfEntryPoint']
    ep_fn = analysis.fns[analysis.map.from_rva(ep_rva)]
    first_insn = analysis.asm[ep_fn.start]
    print(first_insn.mnemonic)  # e.g. mov
- __repr__()
Returns the disassembled instruction as text
    ep_rva = analysis.struct['OptionalHeader']['AddressOfEntryPoint']
    ep_fn = analysis.fns[analysis.map.from_rva(ep_rva)]
    first_insn = analysis.asm[ep_fn.start]
    disas = str(first_insn)
- Return type:
str
Instruction types
In Malcat, every instruction of every supported architecture is assigned an instruction category, or instruction type. This helps in writing heuristics, anomalies or scripts that work across different architectures.
- class malcat.Instruction.Type
This enum describes all the different instruction categories.
- ADD
add-like opcodes
- AND
and-like opcodes
- ASSIGN
mov-like opcodes
- CALL
calls
- CAST
cast-like opcodes
- CJUMP
conditional jump opcodes
- CMP
comparison opcodes
- DIV
div-like opcodes
- FAULTY
faulty opcodes (i.e. very likely to raise an error when executed, like int3)
- FPU
fpu opcodes
- INVALID
invalid opcodes (could not be decoded)
- JUMP
non-conditional jumps
- LSHIFT
left shift-like opcodes
- MMX
mmx opcodes
- MUL
mul-like opcodes
- NOP
nop-like opcodes
- OR
or-like opcodes
- POP
pop-like opcodes
- PUSH
push-like opcodes
- RETURN
return-like opcodes
- RSHIFT
right shift-like opcodes
- STACK
stack-like opcodes (like dup or stack frame setup)
- SUB
sub-like opcodes
- XOR
xor-like opcodes
- OTHER
opcodes which don’t fit in any other category
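Because the categories are shared across architectures, a heuristic can be written once against instruction types instead of mnemonics. The sketch below computes the share of instructions of a given type; ratio_of_type is a hypothetical helper, and the wanted type is passed in by the caller so that the code only relies on comparing insn.type with ==.

```python
def ratio_of_type(instructions, wanted):
    """Return the fraction of instructions whose type equals `wanted`."""
    insns = list(instructions)
    if not insns:
        return 0.0
    matching = sum(1 for insn in insns if insn.type == wanted)
    return matching / len(insns)
```

In a Malcat script, a call such as ratio_of_type(analysis.asm[fn.start : fn.end], Instruction.Type.XOR) could, for instance, feed a heuristic that flags XOR-heavy functions; the threshold to apply is left to the caller.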
Instruction operands
Each operand of an instruction is represented by an InstructionOperand instance. Since the disassembly interface is still a work in progress, the operand interface is a bit limited for now. It will be properly refactored once we have all the CPU architectures we want. For now, you can query the following properties of an instruction operand:
- class malcat.InstructionOperand
This class gives you information about a single operand of an instruction.
- type: InstructionOperand.Type
What kind of operand it is (one of the InstructionOperand.Type values).
- action: InstructionOperand.Action
How the operand is accessed (one of the InstructionOperand.Action values).
- value: int
The immediate value of the operand, or None if not applicable (e.g. it’s a register)
- register: int
The register id of the operand, or None if not applicable (e.g. it’s not a register)
- symbol: str
The symbolic value of the operand, or None if not applicable (e.g. it’s not a symbol)
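Since each of value, register and symbol is None when it does not apply, a script can describe an operand by probing them in turn. The sketch below is a hypothetical helper; the probing order is an arbitrary choice, and it is an assumption that at most one of the three is meaningful for a given operand.

```python
def describe_operand(op):
    """Return a short text describing an InstructionOperand-like object."""
    if op.symbol is not None:
        return f"symbol {op.symbol}"
    if op.register is not None:
        return f"register #{op.register}"
    if op.value is not None:
        return f"value {op.value:#x}"
    # None of the three documented properties applies to this operand.
    return "unknown"
```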
- class malcat.InstructionOperand.Type
- CONSTANT
The operand is an immediate value, like in push 0x05
- REGISTER
Program-wide registers, like eax in x86, or the $R1 register in NSIS
- LOCAL
A variable local to the current function, like locals or args in .NET, or [ebp/esp+XXX] in x86
- OBJECT
A pointer to an instantiated object or a field thereof. For x86, this covers all non-local [reg] or [reg+XXX] addressing; for .NET, every field or object
- GLOBAL
A global variable, like push [0x405678]
- SYMBOL
A non-resolved symbol, like a class type in Python