Results 1 to 5 of 5

Thread: Control Flow Deobfuscation Part 1

  1. #1
    King of Redonda
    Join Date
    Jul 2006
    Posts
    109
    Blog Entries
    4

    Control Flow Deobfuscation Part 1

    Control Flow Obfuscation is one of the biggest protection mechanisms used in commercial protections for intermediate languages like IL (.NET) and Java bytecode. It is also used a bit in some native protectors, mostly to hide their own code, but I won't go into that. I'll use pseudocode for the examples.

    The basic idea behind control flow obfuscation is that spaghetti code is hard to read. So the protectors transform normal code into spaghetti code, with lots of spurious branches. You can repair this by hand, but as functions get bigger this becomes a huge PITA. So we must write a program to do it for us.

    I'll assume we have access to the bytecode in the program. Also we'll make use of some primitive operations on this bytecode. These problems are straightforward to tackle, all you need is the documentation and some (much ) time.

    To reorganize the function we will use a two-step approach:
    1. Make a Control Flow Graph from the bytecode.
    2. Make bytecode from the CFG.
    You might think "That doesn't do anything!", but the trick is that we read in bytecode that is obfuscated and write out bytecode that is simple to read. Both form the same control flow graph, so the meaning is not changed.

    English example:
    • Obfuscated: "The bike, that is owned by John, who has blue eyes, needs to be repaired."
    • Graph:
    • Simplified: "John has blue eyes. His bike needs to be repaired."
    Of course this isn't a perfect analogy but I hope you see what I mean.

    Building the CFG
    Let's see what we want from the CFG:
    • Have vertices with some amount of bytecode
    • Have edges that show the connection between vertices.
      • E.g. if vertex 1 branches to vertex 2 we want an edge from 1 to 2. Note that this edge has a direction: it doesn't go from 2 to 1.
      • We need a label on the edge that shows the type of connection. For instance if vertex 1 branches to vertex 2 if some condition is true, and to vertex 3 if it is false, we would have this graph:
    • Every vertex has one entry point and one exit point.
    (if this is all gibberish for you, please read up on graphs before continuing)

    The last point ensures our CFG is a proper Directed Graph.
    In practice you'll want to keep the edge information with the vertex it comes from, but this changes nothing to the basic ideas.

    With this info, we can make a simple recursive function to construct the CFG:
    Code:
    already_done_vertices = empty
    cfg_of_function = makecfg(entry_point_of_function)
    
    function makecfg(current_instruction):
        vertex myvertex
    
        if current_instruction is in already_done_vertices
            myvertex = the appropriate vertex in already_done_vertices
    
        else
            myvertex.bytecode   = bytecode of current_instruction
            myvertex.branchtype = branch type of current_instruction
            myvertex.edges    = empty
            add myvertex to already_done_vertices
            for each target of current_instruction
                add makecfg(target) to myvertex.edges
    
        return myvertex
    Note that the bytecode of a branch is nothing. Its only significance is in the branchtype and the targets. Most instructions have a branch type of "always" with a target that is the next instruction. The branch type relates to the targets, f.i. "top_of_stack != 0" may go to the first target if true, or the second target if false.

    Of course in a real implementation you want to keep the number of vertices and edges low, so you lump together as many instructions as you can without breaking the rules, splitting them when necessary.

    Well, that's it for now, in part 2 (and maybe 3) I'll show how to make bytecode from the CFG.

  2. #2
    Keep 'um coming!

    Regards,
    JMI

  3. #3
    Administrator dELTA's Avatar
    Join Date
    Oct 2000
    Location
    Ring -1
    Posts
    4,206
    Blog Entries
    5
    Nice, looking forward to the upcoming parts (and maybe even a tool in the end? ).
    "Give a man a quote from the FAQ, and he'll ignore it. Print the FAQ, shove it up his ass, kick him in the balls, DDoS his ass and kick/ban him, and the point usually gets through eventually."

  4. #4
    Give dELTA a new tool and he is a happy, happy guy!

    Regards,
    JMI

  5. #5
    nice :P

Similar Threads

  1. .NET Deobfuscation
    By acdwrp in forum The Newbie Forum
    Replies: 2
    Last Post: July 19th, 2011, 20:11
  2. Control Flow Deobfuscation via Abstract Interpretation
    By OpenRCE_RolfRolles in forum Blogs Forum
    Replies: 0
    Last Post: July 9th, 2011, 03:23
  3. Control Flow Integrity: Some interesting papers
    By Piotr Bania Chronicles in forum Blogs Forum
    Replies: 0
    Last Post: May 17th, 2011, 21:18
  4. Control Flow Deobfuscation Part 3
    By fr33ke in forum Blogs Forum
    Replies: 2
    Last Post: July 16th, 2008, 09:44
  5. Control Flow Deobfuscation Part 2
    By fr33ke in forum Blogs Forum
    Replies: 1
    Last Post: April 28th, 2008, 04:21

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •