Disassembly (analysis.asm)
- analysis.asm: malcat.Asm
- The analysis.asm object is a malcat.Asm instance that gives you access to Malcat’s Disassembler.
Note that in addition to this documentation, you can find usage examples in the sample script which is loaded when you hit F8.
Disassembling
The disassembly interface in Malcat is accessed through the analysis.asm object. Every file-backed address can be disassembled, not only those identified as code. Purely virtual addresses can also be disassembled, but their memory content is assumed to be all zeroes.
Although Malcat does not internally use an intermediate representation common to all supported architectures, most of the disassembly interface is architecture-agnostic, i.e. you can use the same code for all architectures.
Note
We are not very happy with the internal disassembly architecture right now, so keep in mind that this interface may change in the future.
- class malcat.Asm
- This class is an interface to Malcat’s Disassembler. Note that all addresses used in this class are effective addresses. See Addressing in Malcat for more details.
 - Disassembling
 - __getitem__(interval)
- Iterate over all the instructions contained in the interval (effective addresses):
    ep_rva = analysis.struct['OptionalHeader']['AddressOfEntryPoint']
    ep_fn = analysis.fns[analysis.map.from_rva(ep_rva)]
    for basic_block in analysis.cfg[ep_fn.start : ep_fn.end]:
        if basic_block.code:
            for insn in analysis.asm[basic_block.start : basic_block.end]:
                print(insn)
- Parameters:
- interval (slice) – effective address interval 
- Return type:
- iterator over the sequence of instructions (Instruction)
 
 - __getitem__(ea)
- Disassemble the instruction at effective address ea:
    ep_rva = analysis.struct['OptionalHeader']['AddressOfEntryPoint']
    ep_fn = analysis.fns[analysis.map.from_rva(ep_rva)]
    first_insn = analysis.asm[ep_fn.start]
    print(first_insn)
- Parameters:
- ea (int) – effective address where to disassemble 
- Return type:
- malcat.Instruction
 
 - size(ea)
- Returns the size (in bytes) of the instruction located at effective address ea.
- Parameters:
- ea (int) – effective address of the instruction to measure 
- Return type:
- int 
 
 - Utility functions
 - align(ea)
- Returns the estimated start address of the instruction located at effective address ea (ea can point to the middle of the instruction).
- Note that unlike the CFG.align() method, this is merely an approximation of the start of the instruction. For some architectures (e.g. x86), the real start of an instruction is not always decidable.
    start_of_instr = analysis.asm.align(0x100)
- Parameters:
- ea (int) – effective address for the query 
- Return type:
- int (effective address) 
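The size(ea) method described above can also be used to walk a range by hand, for instance when only instruction boundaries are needed. The following is a minimal sketch, not part of Malcat: walk_instructions is a hypothetical helper name, and the guard against a non-positive size is an assumption, since this page does not say what size() reports for bytes that cannot be decoded.

```python
def walk_instructions(asm, start, end):
    """Yield (address, size) pairs for consecutive instructions in [start, end).

    `asm` is expected to behave like analysis.asm; only its size(ea)
    method is used here.
    """
    ea = start
    while ea < end:
        size = asm.size(ea)
        if size <= 0:
            # Assumption: stop rather than loop forever if no size is reported.
            break
        yield ea, size
        ea += size


# Inside a Malcat script (ep_fn obtained as in the examples above):
# for ea, size in walk_instructions(analysis.asm, ep_fn.start, ep_fn.end):
#     print(f"{ea:#x}: {size} bytes")
```

For most scripts, slicing analysis.asm with an interval remains the simpler choice; this variant is only useful when the instruction objects themselves are not needed.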
 
 
Instructions
Instruction object
- class malcat.Instruction
- This class gives you information about a disassembled instruction.
 - Instruction location
 - address: int (effective address)
- the address of the first byte of the instruction 
 - end: int (effective address)
- address of the first byte after the instruction:
    ep_rva = analysis.struct['OptionalHeader']['AddressOfEntryPoint']
    ep_fn = analysis.fns[analysis.map.from_rva(ep_rva)]
    first_insn = analysis.asm[ep_fn.start]
    next_insn = analysis.asm[first_insn.end]
 - size: int
- size in bytes of the instruction, i.e. end - address:
    ep_rva = analysis.struct['OptionalHeader']['AddressOfEntryPoint']
    ep_fn = analysis.fns[analysis.map.from_rva(ep_rva)]
    first_insn = analysis.asm[ep_fn.start]
    next_insn = analysis.asm[ep_fn.start + first_insn.size]
    # equivalent
    next_insn = analysis.asm[first_insn.end]
 - bb: malcat.BasicBlock
- the basic block this instruction belongs to 
 - function: malcat.Function
- the function this instruction belongs to 
 - Operands
- This set of functions gives you access to the instruction operands.
 - __len__()
- Return the number of operands of the instruction:
    ep_rva = analysis.struct['OptionalHeader']['AddressOfEntryPoint']
    ep_fn = analysis.fns[analysis.map.from_rva(ep_rva)]
    first_insn = analysis.asm[ep_fn.start]
    print(f"{first_insn} has {len(first_insn)} operands")
- Return type:
- int 
 
 - __iter__()
- Iterate over the instruction operands:
    ep_rva = analysis.struct['OptionalHeader']['AddressOfEntryPoint']
    ep_fn = analysis.fns[analysis.map.from_rva(ep_rva)]
    first_insn = analysis.asm[ep_fn.start]
    for i, operand in enumerate(first_insn):
        print(f"{i}th operand of {first_insn}: {operand.type} ({operand.value})")
- Return type:
- iterator over malcat.InstructionOperand
 
 - __getitem__(i)
- Returns the ith malcat.InstructionOperand of the instruction:
    ep_rva = analysis.struct['OptionalHeader']['AddressOfEntryPoint']
    ep_fn = analysis.fns[analysis.map.from_rva(ep_rva)]
    # first_insn must be defined before it is indexed
    first_insn = analysis.asm[ep_fn.start]
    for i in range(len(first_insn)):
        print(f"{i}th operand of {first_insn}: {first_insn[i].type} ({first_insn[i].value})")
- Parameters:
- i (int) – zero-based index of the operand to get 
- Return type:
- malcat.InstructionOperand instance
- Raises:
- IndexError if i >= len(instr)
 
 - Other
 - type: malcat.Instruction.Type
- Type (aka category) of the instruction:
    from bindings import Instruction
    ep_rva = analysis.struct['OptionalHeader']['AddressOfEntryPoint']
    ep_fn = analysis.fns[analysis.map.from_rva(ep_rva)]
    first_insn = analysis.asm[ep_fn.start]
    if first_insn.type == Instruction.Type.RETURN:
        raise ValueError("Empty EP")
 - mnemonic: str
- Textual representation of the instruction’s mnemonic (i.e. the opcode without its operands):
    ep_rva = analysis.struct['OptionalHeader']['AddressOfEntryPoint']
    ep_fn = analysis.fns[analysis.map.from_rva(ep_rva)]
    first_insn = analysis.asm[ep_fn.start]
    print(first_insn.mnemonic)
    # >>> "mov"
 - inrefs: List[malcat.Reference]
- list of all data and code locations (a list of malcat.Reference objects) that reference this instruction:
    first_insn = analysis.asm[analysis.v2a(0x18000147a)]
    for inref in first_insn.inrefs:
        print(f"instruction {first_insn} is referenced by {analysis.ppa(inref.address)} ({inref.type})")
    # >> "instruction mov rcx, [0x18002C670] is referenced by 0x180001463 (sub_180001400+63) (Type.CODE)"
 - outrefs: List[malcat.Reference]
- list of all data and code locations (a list of malcat.Reference objects) referenced by this instruction:
    first_insn = analysis.asm[analysis.v2a(0x180001127)]
    for outref in first_insn.outrefs:
        print(f"instruction {first_insn} references {analysis.ppa(outref.address)}")
    # >> "instruction mov dword ptr [0x18002B0E0], 0x01 references 0x18002b0e0 (.data:30e0)"
 - __repr__()
- Returns the disassembled instruction as text:
    ep_rva = analysis.struct['OptionalHeader']['AddressOfEntryPoint']
    ep_fn = analysis.fns[analysis.map.from_rva(ep_rva)]
    first_insn = analysis.asm[ep_fn.start]
    disas = str(first_insn)
- Return type:
- str 
 
 - disasm(use_hexadecimal=True, resolve_symbols=True, resolve_functions=True, resolve_strings=True, resolve_structures=True)
- Disassemble this instruction using the given formatting options.
- Parameters:
- use_hexadecimal (bool) – display immediates in hexadecimal base 
- resolve_symbols (bool) – known symbol addresses will be replaced by their symbol names 
- resolve_functions (bool) – known function start addresses will be replaced by their function name 
- resolve_strings (bool) – known string addresses will be replaced by the string content 
- resolve_structures (bool) – known structure/fields addresses will be replaced by their structure/field name 
 
- Return type:
- str 
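As an illustration of the formatting flags above, here is a minimal sketch that renders instructions with every kind of name resolution switched off, which can help when diffing listings between samples. raw_listing is a hypothetical helper name, and the sketch assumes that disasm() accepts its parameters as keyword arguments under the names shown in the signature.

```python
def raw_listing(instructions):
    """Return one 'address: text' line per instruction, without name resolution."""
    lines = []
    for insn in instructions:
        # Assumption: disasm() accepts keyword arguments named as in its signature.
        text = insn.disasm(
            use_hexadecimal=True,
            resolve_symbols=False,
            resolve_functions=False,
            resolve_strings=False,
            resolve_structures=False,
        )
        lines.append(f"{insn.address:#x}: {text}")
    return lines


# Inside a Malcat script (ep_fn obtained as in the examples above):
# print("\n".join(raw_listing(analysis.asm[ep_fn.start : ep_fn.end])))
```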
 
 
Instruction types
In Malcat, every instruction of every supported architecture is assigned an instruction category, or instruction type. This helps when writing heuristics, anomalies or scripts that work across different architectures.
- class malcat.Instruction.Type
- This enum describes all the different instruction categories.
 - ADD
- add-like opcodes 
 - AND
- and-like opcodes 
 - ASSIGN
- mov-like opcodes 
 - CALL
- calls 
 - CAST
- cast-like opcodes 
 - CJUMP
- conditional jump opcodes 
 - CMP
- comparison opcodes 
 - DIV
- div-like opcodes 
 - FAULTY
- faulty opcodes (i.e. very likely to raise an error when executed, like int3) 
 - FPU
- fpu opcodes 
 - INVALID
- invalid opcodes (could not be decoded) 
 - JUMP
- non-conditional jumps 
 - LSHIFT
- left shift opcodes 
 - MMX
- mmx opcodes 
 - MUL
- mul-like opcodes 
 - NOP
- nop-like opcodes 
 - OR
- or-like opcodes 
 - POP
- pop-like opcodes 
 - PUSH
- push-like opcodes 
 - RETURN
- return-like opcodes 
 - RSHIFT
- right shift-like opcodes 
 - STACK
- stack-like opcodes (like dup or stack frame setup) 
 - SUB
- sub-like opcodes 
 - XOR
- xor-like opcodes 
 - OTHER
- opcodes which don’t fit in any other category 
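Because these categories are shared by all architectures, a simple cross-architecture heuristic can be written once. The sketch below computes the share of instructions in a range that fall into a given set of categories; type_ratio is a hypothetical helper name, and the choice of categories in the usage comment is only an illustration, not a Malcat heuristic. It relies solely on comparing insn.type with ==, as the documented example does.

```python
def type_ratio(instructions, wanted):
    """Return the fraction of instructions whose .type is one of `wanted`.

    `wanted` is a tuple or list of Instruction.Type values; membership is
    tested with ==. Returns 0.0 for an empty input.
    """
    total = 0
    matching = 0
    for insn in instructions:
        total += 1
        if insn.type in wanted:
            matching += 1
    return matching / total if total else 0.0


# Inside a Malcat script (Instruction imported as in the type example above):
# arith = (Instruction.Type.XOR, Instruction.Type.LSHIFT, Instruction.Type.RSHIFT)
# print(type_ratio(analysis.asm[ep_fn.start : ep_fn.end], arith))
```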
 
Instruction operands
Each operand of an instruction is represented by an InstructionOperand instance. Because the disassembly interface is still a work in progress, the operand interface is somewhat limited for now. It will be properly refactored once we have all the CPU architectures we want. For now, you can query the following properties of an instruction operand:
- class malcat.InstructionOperand
- This class describes a single operand of an instruction.
 - type: InstructionOperand.Type
- What kind of operand it is: one of the InstructionOperand.Type values. 
 - action: InstructionOperand.Action
- How the operand is accessed: one of the InstructionOperand.Action values. 
 - value: int
- The immediate value of the operand, or None if not applicable (e.g. it’s a register). 
 - register: int
- The register id of the operand, or None if not applicable (e.g. it’s not a register). 
 - symbol: str
- The symbolic value of the operand, or None if not applicable (e.g. it’s not a symbol). For .NET, for instance, this could be a typedef/methoddef name. 
 
- class malcat.InstructionOperand.Type
- CONSTANT
- The operand is an immediate value, like in push 0x05
 - REGISTER
- Program-wide registers, like eax in x86, or the $R1 register in NSIS
 - LOCAL
- A variable local to the current function, like locals or args in .NET, or [ebp/esp+XXX] in x86
 - OBJECT
- A pointer to an instantiated object or a field thereof. For x86, this covers all non-local [reg] or [reg+XXX] addressing; for .NET, every field or object
 - GLOBAL
- A global variable, like push [0x405678]
 - SYMBOL
- A non-resolved symbol, like a class type in Python
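Combining operand iteration with the operand types above, the following minimal sketch collects the values of all operands of a given kind, for example every immediate constant of an instruction. operand_values_of_type is a hypothetical helper name; the operand type is passed in by the caller because this page does not show how InstructionOperand is imported in a script.

```python
def operand_values_of_type(insn, wanted_type):
    """Return the .value of every operand of `insn` whose .type equals `wanted_type`."""
    return [operand.value for operand in insn if operand.type == wanted_type]


# Inside a Malcat script, with InstructionOperand made available by your script:
# constants = operand_values_of_type(first_insn, InstructionOperand.Type.CONSTANT)
# print([hex(value) for value in constants])
```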