1. Control Flow Deobfuscation Part 3

    Now we have:

    So to deobfuscate we only need one more thing:
    • A way to order the vertices

    Interestingly, we could also make a obfuscator in this way. All we have to do is put the vertices in a random order.

    So to deobfuscate we can't just use any ordering. We have to use one that is 'intuitive' and 'natural'. Unfortunately these words mean nothing to a computer; we'll have to specify them more carefully.

    Let's start simple. In which order would you place these vertices if you wanted to deobfuscate?

    I think we can agree that 1-2-3 is he right order.
    3-1-2 is bad because we have to start with 1: this tells everyone that the function starts there.
    1-3-2 is bad too, but why? For one it creates more branches than necessary, but I think this is not the reason it is hard to read. I think the reason is that we go backwards from 2 to 3. Most of us like to read in one direction: from top to bottom or from left to right.

    Another one:

    The right order here is 1-2-4-3 or 1-4-2-3.
    1-2-3-4 is wrong because 4 then goes backwards to 3.

    We can say: the order is right iff for all edges, the start of the edge comes before the end of the edge. This ordering is known as a topological ordering. There are various ways to compute it; it is important to remember that one is the reverse postorder of a depth-first traversal.

    This might all sound like gibberish to you, so feel free to take a pause and look up some of the underlined words. Here's a little pseudocode example for the reverse postorder:
        order = []
        for each child in vertex.edges
            order = reverse_postorder(child) + order
        order = vertex + order
        return order

    Well, so far so good. But what to think of this one:

    I'd say that the best ordering is 1-2-3-5-4. Some compilers order this as 1-5-2-3-4. Even 1-3-5-2-4 isn't unreasonable. But according to our ordering rules, they are all wrong.

    If your graph has a cycle, it isn't possible to get a perfect ordering. We can't have 2 before 3, 3 before 5 and 5 before 2 at the same time, so we'll have to go backwards at least one time.

    Something like 1-4-2-3-5 is even more wrong though. Now we have two backward edges, 3 to 4 and 5 to 2.

    One way to solve the problem is like this:

    We contract the vertices of the strongly connected component (a generalization of cycles) into one "supervertex". Now we order that and get 1-I-4. Then we order I: first we pick one vertex to start, I like to get the one that has most edges to it from outside I, but it doesn't matter much. In this example we pick 2. Then we order the rest (and if we find sub-SCC's in I we recurse). So here we get 3-5. Then the final order is 1-2-3-5-4.

    Now thinking that out is one thing, implementing it is another. The most obvious problem is how to find the SCC's. Luckily some very smart people already figured it out for us. We'll use Tarjan's strongly connected components algorithm. It is a bit hard to understand, but explains it very clearly.

    So, are we done now? Is the following algorithm enough:
    1. Find all SCC's
    2. Collapse them into supervertices
    3. Order the resulting graph
    4. Apply this algorithm to all supervertices in the result

    Yes, in a certain sense. It works fine, but it's quite a lot to code (especially in C) and it's pretty inefficient.

    / this is the clever part
    Now both Tarjan's algorithm and the reverse postorder are based on depth first search. So maybe it's possible to combine them? It turns out we can indeed do this and save ourself a lot of work. The only modification we need to do is:
    • Whenever we find an SCC, we check if it is trivial (one vertex). If so, we add it to the order. Else we order the SCC and put the result in the order.

    So the final algorithm is:
    get_order(first_vertex, vertices_to_consider):
        /* Define some variables */
        order = []
        stack = empty_stack
        cur_dfsnum = 0
        index = []
        low = []
        /* Nested function, has access to above variables (closure) */
            index[cur_vertex] = cur_dfsnum
            low[cur_vertex] = cur_dfsnum
            push cur_vertex on stack
            for each child in cur_vertex.edges where child in vertices_to_consider
                if index[child] == "To be done"
                    low[cur_vertex] = min(low[cur_vertex], low[child])
                else if index[child] == "Done"
                    /* Do nothing */
                    low[cur_vertex] = min(low[cur_vertex], index[child])
            if low[cur_vertex] == index[cur_vertex]
                /* we found an SCC */
                scc = []
                    popped = pop from stack
                    scc += popped
                    index[popped] = "Done"
                until popped == current_vertex
                if scc.length == 1
                    order = cur_vertex + order
                    order = get_order(choose_first(scc), scc) + order
        /* visit() ends */
        for vertex in vertices_to_consider:
            index[vertex] = "To be done"
        /* Special-case the start vertex to prevent infinite recursion */
        index[first_vertex] = "Done"
        for each child in first_vertex.edges where child in vertices_to_consider
            if index[child] == "To be done"
        order = first_vertex + order
        return order
    I omitted the choose_first function because it isn't important.

    Now we have every part, putting it together is simple:
        already_done_vertices = empty
        cfg = makecfg(bytecode)
        order = get_order(cfg, already_done_vertices)
        return rebuild_from_order(cfg, order)

    Well that's it mostly. I wanted to show how to do this so maybe someone doesn't have to spend weeks figuring this stuff out and because I think
    reverse postorder + Tarjan's strongly connected components algorithm = approximate topological ordering
    is a very nice insight.

    P.S. Sorry, my tool to do this for .NET is currently private because I'm not interested in an arms race. The ideas behind it are more important anyway
  2. Control Flow Deobfuscation Part 2

    In part one I explained how to get a CFG from bytecode, and today I'll tell you about the inverse operation.
    Well, actually there are two parts to it:
    1. Decide in which order you place the vertices
    2. Put them in that order and insert the correct branches

    I'll discuss point 1 in part three, and point 2 now.

    It may sound trivial to do this: just get the data for the first vertex, then the branch for the first vertex etc. However this doesn't work because to create the branch we need to know the position of its targets. But for the position of the targets we need to know how big the branch is!

    To create an optimal solution where all branches are as small as possible we use a simplification of Robertson's algorithm. The original paper is hard to find but there is good documentation of it in the source of YASM ( ). Basically what we do is assume all branches are 0 bytes1, then increase the size for each branch that doesn't fit and repeat until we find a fixed point where all branches fit.

    We can skip some sizes that are impossible, for instance there exists no 1 byte branch in .NET. However a 4 byte branch is possible with two 2 byte branches ("jnz xxx; jmp xxx"). Also we don't have to recalculate every branch if one doesn't fit, only the ones that are affected by the increased branch size; but since I saw little performance increase (surprisingly), I chose to use the simpler version.

    Now that I have explained everything, here is the pseudocode:
    rebuild_from_order(cfg, order):
        for vertex in order
            size_of_branch[vertex] = 0
            length = 0
            for vertex in order
                start[vertex] = length
                length += vertex.length + size_of_branch[vertex]
            has_changed = false
            for vertex in order
                for edge in vertex.edges
                    delta[edge] = start[edge] - (start + length(vertex.bytecode))
                if not branch_fits(vertex.branchtype, delta, size_of_branch[vertex])
                    has_changed = true
                    while not possible_branch_size(size_of_branch[vertex])
        while has_changed
        bytecode = empty
        for vertex in order
            length = 0
            for vertex in order
                start[vertex] = length
                length += vertex.length + size_of_branch[vertex]
            for edge in vertex.edges
                delta[edge] = start[edge] - (start + vertex.length)
            bytecode += vertex.bytecode
            bytecode += create_branch(vertex.branchtype, delta)
        return bytecode
    See you in part 3 which should be ready in 2009

    1 This is possible if it's an unconditional branch with a delta of 0 bytes: "jmp next; next:" is the same as "".
  3. Control Flow Deobfuscation Part 1

    Control Flow Obfuscation is one of the biggest protection mechanisms used in commercial protections for intermediate languages like IL (.NET) and Java bytecode. It is also used a bit in some native protectors, mostly to hide their own code, but I won't go into that. I'll use pseudocode for the examples.

    The basic idea behind control flow obfuscation is that spaghetti code is hard to read. So the protectors transform normal code into spaghetti code, with lots of spurious branches. You can repair this by hand, but as functions get bigger this becomes a huge PITA. So we must write a program to do it for us.

    I'll assume we have access to the bytecode in the program. Also we'll make use of some primitive operations on this bytecode. These problems are straightforward to tackle, all you need is the documentation and some (much ) time.

    To reorganize the function we will use a two-step approach:
    1. Make a Control Flow Graph from the bytecode.
    2. Make bytecode from the CFG.

    You might think "That doesn't do anything!", but the trick is that we read in bytecode that is obfuscated and write out bytecode that is simple to read. Both form the same control flow graph, so the meaning is not changed.

    English example:
    • Obfuscated: "The bike, that is owned by John, who has blue eyes, needs to be repaired."
    • Graph:
    • Simplified: "John has blue eyes. His bike needs to be repaired."

    Of course this isn't a perfect analogy but I hope you see what I mean.

    Building the CFG
    Let's see what we want from the CFG:
    • Have vertices with some amount of bytecode
    • Have edges that show the connection between vertices.
      • E.g. if vertex 1 branches to vertex 2 we want an edge from 1 to 2. Note that this edge has a direction: it doesn't go from 2 to 1.
      • We need a label on the edge that shows the type of connection. For instance if vertex 1 branches to vertex 2 if some condition is true, and to vertex 3 if it is false, we would have this graph:
    • Every vertex has one entry point and one exit point.

    (if this is all gibberish for you, please read up on graphs before continuing)

    The last point ensures our CFG is a proper Directed Graph.
    In practice you'll want to keep the edge information with the vertex it comes from, but this changes nothing to the basic ideas.

    With this info, we can make a simple recursive function to construct the CFG:
    already_done_vertices = empty
    cfg_of_function = makecfg(entry_point_of_function)
    function makecfg(current_instruction):
        vertex myvertex
        if current_instruction is in already_done_vertices
            myvertex = the appropriate vertex in already_done_vertices
            myvertex.bytecode   = bytecode of current_instruction
            myvertex.branchtype = branch type of current_instruction
            myvertex.edges    = empty
            add myvertex to already_done_vertices
            for each target of current_instruction
                add makecfg(target) to myvertex.edges
        return myvertex
    Note that the bytecode of a branch is nothing. Its only significance is in the branchtype and the targets. Most instructions have a branch type of "always" with a target that is the next instruction. The branch type relates to the targets, f.i. "top_of_stack != 0" may go to the first target if true, or the second target if false.

    Of course in a real implementation you want to keep the number of vertices and edges low, so you lump together as many instructions as you can without breaking the rules, splitting them when necessary.

    Well, that's it for now, in part 2 (and maybe 3) I'll show how to make bytecode from the CFG.
  4. Serials and hashing

    In a recent target I encountered a hash function in the serial algorithm. It went something like this:
    checksum_hash(string tmp)
      checksum = 0;
      for each char in tmp
        checksum += char;
      checksum = checksum % 9 + 1;
      do checksum times
        tmp = string(md5(tmp));
      return tmp;
    Apparently just hashing it once wasn't enough The funny thing however was that this hashing provided absolutely no security. It was used like this:
      namehash  = checksum_hash(name);
      producthash = checksum_hash("ProductName");
      for(i = 0; i < 4; i++)
        key[i] = "ABCDEFGHJKLMNPQRSTUVWXY123456789"[
          (namehash[i] + producthash[i]) % 32
      return checksum_hash(serial1) == checksum_hash(key);
    The last line is true when serial is the same as key. We can calculate key when we know both namehash and producthash, and we know those because we know the product name and the name of the user.

    The hashing adds no security because if we know something we know its hash too. The strength of a cryptographic hash lies in the reverse: if we know the hash, we don't know which input can make that hash. The only good way to use a hash as the main protection is thus to take the hash of something we don't know, like the serial, and compare it to hardcoded hashes.

    P.S. The target had another check on another part of the serial, but it was easily defeated too