Control Flow Graph (analysis.cfg)
- analysis.cfg: malcat.CFG
The analysis.cfg object is a malcat.CFG instance that gives you access to all the basic blocks identified by the CFG reconstruction algorithm.
Note that in addition to this documentation, you can find usage examples in the sample script which is loaded when you hit F8.
Basic blocks definition
Definition
The CFG, or control flow graph, divides executable code into a graph of basic blocks. Basic blocks are contiguous file ranges that satisfy the following conditions:
control flow always starts at the beginning of the block for every possible execution of the program
control flow always goes to the end of the block for every possible execution of the program, i.e. no branching/jump except for the last instruction (with a special case for the EXCEPTION edge, see below)
the basic block is located in a single region
Basic blocks have incoming and outgoing edges, which can be of 4 types:
STEP: normal control flow, next instruction will be executed
JUMP: a conditional or unconditional jump
CALL: a call
EXCEPTION: a conditional jump, the condition being that an exception happens when executing any instruction inside the basic block.
Note
Contrary to some other tools or papers, we consider that a call instruction ends a basic block. Such exotic basic blocks have their BasicBlock.exotic property set to True, to help users accustomed to the other definition.
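To make the definition concrete, here is a minimal sketch of how a linear run of instructions could be cut into basic blocks under the rules above, including the convention that a call ends a block. The Insn tuple, the instruction kinds and the split_blocks helper are hypothetical and not part of the Malcat API; they only illustrate the idea, and the real CFG reconstruction algorithm is certainly more involved (regions, exceptions, indirect jumps).

```python
from typing import List, NamedTuple, Optional, Tuple

class Insn(NamedTuple):
    # Hypothetical toy instruction, not a Malcat type
    address: int
    size: int
    kind: str                      # "other", "jump", "call" or "ret"
    target: Optional[int] = None   # jump/call destination, if any

def split_blocks(insns: List[Insn]) -> List[Tuple[int, int, bool]]:
    """Return (start, end, exotic) triples; end is the last address + 1."""
    # A block starts at the first instruction and at every jump target.
    leaders = {insns[0].address}
    leaders.update(i.target for i in insns if i.kind == "jump" and i.target is not None)
    blocks = []
    start = insns[0].address
    for pos, insn in enumerate(insns):
        end = insn.address + insn.size
        is_last = pos == len(insns) - 1
        # Jumps, returns and (by the convention above) calls close a block.
        ends_block = insn.kind in ("jump", "call", "ret")
        if ends_block or is_last or end in leaders:
            blocks.append((start, end, insn.kind == "call"))
            start = end
    return blocks

code = [
    Insn(0x10, 2, "other"),
    Insn(0x12, 5, "call", 0x100),
    Insn(0x17, 2, "other"),
    Insn(0x19, 2, "jump", 0x17),
    Insn(0x1B, 1, "ret"),
]
blocks = split_blocks(code)
for start, end, exotic in blocks:
    print(f"[{start:#x}, {end:#x}) exotic={exotic}")
```

In this toy input the block ending with the call is the only one flagged as exotic; a tool using the other definition would have merged it with the block that follows.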
Code blocks and data blocks
To simplify the interface and make code easier to read, non-code regions (i.e. data) also belong to special basic blocks named data blocks, which have neither incoming nor outgoing edges. Every byte of the effective address space therefore belongs to either a code block or a data block.
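The covering property can be illustrated with a small sketch: given the code blocks found in an address range, the gaps between them are filled with data blocks so that no byte is left unassigned. The fill_with_data_blocks helper and the tuple layout are hypothetical, and the sketch ignores the rule that a block must stay within a single region, so a real gap may be split into several data blocks.

```python
from typing import List, Tuple

Block = Tuple[int, int, str]   # (start, end, kind), end is exclusive

def fill_with_data_blocks(space_start: int, space_end: int,
                          code_blocks: List[Tuple[int, int]]) -> List[Block]:
    """Cover [space_start, space_end) with code blocks plus gap-filling data blocks."""
    result: List[Block] = []
    cursor = space_start
    for start, end in sorted(code_blocks):
        if start > cursor:
            # Gap before this code block: it becomes a data block.
            result.append((cursor, start, "data"))
        result.append((start, end, "code"))
        cursor = end
    if cursor < space_end:
        # Trailing gap after the last code block.
        result.append((cursor, space_end, "data"))
    return result

layout = fill_with_data_blocks(0x1000, 0x1040, [(0x1010, 0x1020), (0x1020, 0x1030)])
for start, end, kind in layout:
    print(f"{kind}: [{start:#x}, {end:#x})")
```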
Basic blocks
Blocks
A basic block is a BasicBlock instance offering the following Python methods and properties:
- class malcat.BasicBlock
- address: int (effective address)
the start of the basic block
- end: int (effective address)
last effective address inside the basic block + 1
- function: malcat.Function
the function this basic block belongs to (or None if no function, e.g. for data blocks)
- __iter__()
Iterate over the basic block’s instructions (even if data). Shortcut for analysis.asm[bb.start:bb.end].
    ep_rva = analysis.struct['OptionalHeader']['AddressOfEntryPoint']
    ep_bb = analysis.cfg[analysis.map.from_rva(ep_rva)]
    for insn in ep_bb:
        print(insn)
- Return type:
iterator over the sequence of instructions (Instruction)
- __contains__(ea)
return True iff the effective address ea is within the basic block boundaries
- Parameters:
ea (int) – address to query
- Return type:
bool
- disasm(use_hexadecimal=True, resolve_symbols=True, resolve_functions=True, resolve_strings=True, resolve_structures=True)
Disassemble this basic block following the given formatting
- Parameters:
use_hexadecimal (bool) – display immediates in hexadecimal base
resolve_symbols (bool) – known symbol addresses will be replaced by their symbol names
resolve_functions (bool) – known function start addresses will be replaced by their function name
resolve_strings (bool) – known string addresses will be replaced by the string content
resolve_structures (bool) – known structure/fields addresses will be replaced by their structure/field name
- Return type:
str
- hex(exclude_off=False, exclude_disp=False, exclude_reg=False, exclude_imm=False)
Get the masked out basic block hex bytes, e.g.
558BEC68????????8374??45..
- Parameters:
exclude_off (bool) – exclude absolute offsets in valid instructions, e.g.
mov eax, off_15bf34
exclude_disp (bool) – exclude displacements in valid instructions, e.g.
mov eax, [ecx+128]
exclude_reg (bool) – exclude registers in valid instructions, e.g.
push eax
exclude_imm (bool) – exclude immediates in valid instructions, e.g.
mov eax, 0x1223
- Return type:
str
- code: bool
True iff the basic block contains code
- data: bool
True iff the basic block contains data
- entry: bool
True iff the basic block was an entry node in the CFG reconstruction algorithm
- exotic: bool
True iff the last instruction of the basic block is a call
- incoming: List[BasicBlockEdge]
list of incoming edges going to this basic block
- outgoing: List[BasicBlockEdge]
list of outgoing edges departing from this basic block
- inrefs: List[malcat.Reference]
list of all data and code (a list of malcat.Reference objects) that reference the start of this basic block
    bb = analysis.cfg[analysis.v2a(0x18000127c)]
    for inref in bb.inrefs:
        print(f"basic block {analysis.ppa(bb.start)} is referenced by {analysis.ppa(inref.address)} ({inref.type})")
    >> "basic block 0x18000127c (sub_18000127c) is referenced by 0x1800015f3 (sub_1800014c4+12f) (Type.CODE)"
- outrefs: List[malcat.Reference]
list of all data and code (a list of malcat.Reference objects) referenced by any code instruction of this basic block
    bb = analysis.cfg[analysis.v2a(0x18000127c)]
    for outref in bb.outrefs:
        print(f"An instruction of basicblock {bb} references {analysis.ppa(outref.address)}")
    >> "An instruction of basicblock <malcat.BasicBlock object at 0x000000000A95B670> references 0x18002bc58 (.data:3c58)"
    >> "An instruction of basicblock <malcat.BasicBlock object at 0x000000000A95B670> references 0x18000129b (sub_18000127c+1f)"
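The masked string returned by hex() can be used to search for similar code elsewhere. The sketch below turns such a string into a bytes regular expression, assuming that each ?? pair stands for one masked byte, which the example string above suggests but this page does not state explicitly. The mask_to_regex helper is hypothetical and handles whole-byte masks only; a half-masked byte such as 4? raises an error.

```python
import re

def mask_to_regex(masked_hex: str) -> re.Pattern:
    """Compile a masked hex string (hex digit pairs or '??') into a bytes regex."""
    if len(masked_hex) % 2:
        raise ValueError("expected an even number of characters")
    parts = []
    for i in range(0, len(masked_hex), 2):
        pair = masked_hex[i:i + 2]
        if pair == "??":
            parts.append(b".")                            # masked byte: match anything
        else:
            parts.append(re.escape(bytes.fromhex(pair)))  # literal byte
    # DOTALL so that '.' also matches the 0x0A byte
    return re.compile(b"".join(parts), re.DOTALL)

pattern = mask_to_regex("558BEC68????????8374??45")
sample = bytes.fromhex("90558BEC68112233448374AA45C3")
match = pattern.search(sample)
print(match.start() if match else None)
```

Within a Malcat script the input string would come from a call such as bb.hex(exclude_off=True, exclude_imm=True); the sample buffer here is made up for the demonstration.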
Edges
An edge links two basic blocks and is represented by a BasicBlockEdge Python object:
- class malcat.BasicBlockEdge
- address: int (effective address)
for incoming edges, this is the source address; for outgoing edges, this is the target address
- type: BasicBlockEdge.Type
edge type
- conditional: bool
True iff there is a condition attached to the edge. Condition interface will be added later.
- class malcat.BasicBlockEdge.Type
This enum describes the type of a BasicBlockEdge
- CALL
a call edge
- JUMP
a jump edge: conditional jump or (in)direct jump
- EXCEPTION
flow switch because of exception structure (try->catch, try->finally or catch->finally)
- STEP
links two contiguous basic blocks: no jump / call, just normal execution flow
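As an illustration of how the outgoing edges and their types can be combined, the following sketch collects the blocks reachable from an entry block while following only STEP and JUMP edges, i.e. staying inside the current function. It runs on stand-in objects so that it works outside Malcat: EdgeType mimics malcat.BasicBlockEdge.Type, and the reachable_blocks and edge helpers are hypothetical. Only the attributes documented above (address, outgoing, type) are used.

```python
from enum import Enum
from types import SimpleNamespace

class EdgeType(Enum):
    # Stand-in for malcat.BasicBlockEdge.Type, so the sketch runs outside Malcat
    CALL = 1
    JUMP = 2
    EXCEPTION = 3
    STEP = 4

def reachable_blocks(lookup, entry_address, follow=(EdgeType.STEP, EdgeType.JUMP)):
    """Collect start addresses of blocks reachable from entry_address.

    lookup maps an effective address to a block object exposing `address`
    and `outgoing`; each outgoing edge exposes `address` (target) and `type`.
    """
    seen = set()
    todo = [entry_address]
    while todo:
        bb = lookup(todo.pop())
        if bb.address in seen:
            continue                      # already visited (loops, join points)
        seen.add(bb.address)
        for e in bb.outgoing:
            if e.type in follow:
                todo.append(e.address)    # outgoing edge: address is the target
    return sorted(seen)

def edge(address, type_):
    return SimpleNamespace(address=address, type=type_, conditional=False)

# Toy graph: 0x10 calls 0x100 then steps to 0x17, which loops on itself or steps to 0x1B
toy = {
    0x10: SimpleNamespace(address=0x10, outgoing=[edge(0x100, EdgeType.CALL), edge(0x17, EdgeType.STEP)]),
    0x17: SimpleNamespace(address=0x17, outgoing=[edge(0x17, EdgeType.JUMP), edge(0x1B, EdgeType.STEP)]),
    0x1B: SimpleNamespace(address=0x1B, outgoing=[]),
    0x100: SimpleNamespace(address=0x100, outgoing=[]),
}
result = reachable_blocks(toy.__getitem__, 0x10)
print([hex(a) for a in result])
```

Inside a Malcat script, lookup could plausibly be lambda ea: analysis.cfg[ea] and the enum the real BasicBlockEdge.Type; that combination is an assumption and has not been tested here.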