fr33ke

Control Flow Deobfuscation Part 2

Rate this Entry
In part one I explained how to get a CFG from bytecode, and today I'll tell you about the inverse operation.
Well, actually there are two parts to it:
  1. Decide in which order you place the vertices
  2. Put them in that order and insert the correct branches


I'll discuss point 1 in part three, and point 2 now.

It may sound trivial to do this: just get the data for the first vertex, then the branch for the first vertex etc. However this doesn't work because to create the branch we need to know the position of its targets. But for the position of the targets we need to know how big the branch is!

To create an optimal solution where all branches are as small as possible we use a simplification of Robertson's algorithm. The original paper is hard to find but there is good documentation of it in the source of YASM ( http://www.tortall.net/projects/yasm/browser/branches/yasm-0.7.x/libyasm/section.c#L698 ). Basically what we do is assume all branches are 0 bytes1, then increase the size for each branch that doesn't fit and repeat until we find a fixed point where all branches fit.

We can skip some sizes that are impossible, for instance there exists no 1 byte branch in .NET. However a 4 byte branch is possible with two 2 byte branches ("jnz xxx; jmp xxx"). Also we don't have to recalculate every branch if one doesn't fit, only the ones that are affected by the increased branch size; but since I saw little performance increase (surprisingly), I chose to use the simpler version.

Now that I have explained everything, here is the pseudocode:
Code:
rebuild_from_order(cfg, order):
    for vertex in order
        size_of_branch[vertex] = 0

    do
        length = 0
        for vertex in order
            start[vertex] = length
            length += vertex.length + size_of_branch[vertex]

        has_changed = false
        for vertex in order
            for edge in vertex.edges
                delta[edge] = start[edge] - (start + length(vertex.bytecode))
            if not branch_fits(vertex.branchtype, delta, size_of_branch[vertex])
                has_changed = true
                do
                    size_of_branch[vertex]++
                while not possible_branch_size(size_of_branch[vertex])
    while has_changed

    bytecode = empty
    for vertex in order
        length = 0
        for vertex in order
            start[vertex] = length
            length += vertex.length + size_of_branch[vertex]

        for edge in vertex.edges
            delta[edge] = start[edge] - (start + vertex.length)

        bytecode += vertex.bytecode
        bytecode += create_branch(vertex.branchtype, delta)

    return bytecode
See you in part 3 which should be ready in 2009


1 This is possible if it's an unconditional branch with a delta of 0 bytes: "jmp next; next:" is the same as "".

Submit "Control Flow Deobfuscation Part 2" to Digg Submit "Control Flow Deobfuscation Part 2" to del.icio.us Submit "Control Flow Deobfuscation Part 2" to StumbleUpon Submit "Control Flow Deobfuscation Part 2" to Google

Categories
Uncategorized

Comments