
Thread: Static Disassembly - Best way forward

  1. #1
    Registered User

    Static Disassembly - Best way forward

    Hey Guys,
    I've been doing some malware analysis over the last few months. I've read quite a bit, stepped through a lot of Lena's stuff, done quite a few crackmes and challenges, analyzed quite a few malware samples, blogged about what I learned, etc. So I'm sort of getting the hang of things. But now I feel stuck again... what's new, eh?? So here's my problem..

    When confronted with a new piece of malware, I tend to do as much dynamic analysis as I can to try and understand what happens. I then load the binary in IDA and study the static disassembly. Almost every time, I fail here.. things get too complicated too fast. So I load the same malware in a debugger and step through it, using IDA just to rename functions, i.e. sub_4012C9 becomes start_of_malware.

    This is more fruitful, and I do move forward, but invariably I only get as far as understanding 7 or 8 functions and their purpose... and what those functions do is very similar to what I already learned from dynamic analysis. So you could say that, by doing static analysis on sections of the code, I have confirmed what I learned from dynamic analysis.

    However, as I understand it, this is nowhere near complete, since there are vast parts of the disassembly with plenty of functions still 'un-analyzed'. And since the malware doesn't trigger them directly (something needs to happen first), I'm left with no choice but to read each function individually and try to understand why it's there. I try doing this, but I get stuck... Some articles tell me to change the entry point to the function I want to trigger, but surely that won't work on its own in most cases? There'd be stack arguments and parameters that have to be passed?

    No, it's not the assembly that's the problem.. I can "understand" the instructions, as in I understand MOV AL,1 .. but why is it there? What is its purpose? All that seems a bit fuzzy. I just read a very old Win95 tutorial that told me to rename "variables" as well as "functions", and I'll try that... but thought I'd ask here too.

    So what do you guys advise to learn this stuff better? Is the way I'm doing it the only way? All suggestions are appreciated.

    Thnx
    Arvind

  2. #2
    From my point of view, it's more of a philosophical problem you may have, not a technical one ;)

    I may be wrong on this, as I'm entirely self-taught, but you may just need to train your brain to subconsciously grasp the big picture in the listing. You can only do that through experience ;).

    You say you understand the mnemonics but fail at understanding the algorithm; that's perfectly OK. It's like school, when you had to solve a hard mathematical problem even though you knew the basic algebra syntax (+, -, /, etc.). Once you get the hang of it, it becomes easier, not because you understand the syntax better, but because your brain starts suggesting more possible solutions, and one of them, after conscious verification, may be the solution you're looking for.

    Maybe the answer is the ability to read the code and interpret it in your mind. When I'm looking at a dead listing, I imagine the higher-level language constructs. There are also patterns that compilers use when generating code, and recognizing them helps you see the higher level of abstraction. For example, if I traced into code that invokes a virtual function of a class derived from an abstract parent class without knowing the pattern, I'd be lost for a long time. But since I know it, it's easier to understand which structures are used, what the tables are doing, why certain pointers are loaded into certain registers, etc. I also look for the possible side effects of an expression. I read somewhere that a good technique is to pretend you're writing the code yourself: think about what you would need, and you'll often see the following instructions doing something similar. I think the key is being able to recognize patterns, which is, after all, what the human brain is designed to do ;)
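
    To make the virtual-call pattern a bit more concrete, here is a small Python sketch that scans IDA-style disassembly text for indirect calls of the form call dword ptr [reg+offset], which is how 32-bit MSVC code commonly invokes a virtual method (vtable pointer loaded into a register, then an indexed call). It is a heuristic under stated assumptions: the listing is plain IDA-syntax text, pointers are 4 bytes, and ordinary function-pointer calls will be flagged too. The names find_virtual_calls and parse_offset are hypothetical helpers, not part of any tool.

```python
import re

# Heuristic (assumption): an indirect call through a general-purpose register plus
# displacement, e.g. "call dword ptr [eax+8]", is a candidate C++ virtual call.
VCALL = re.compile(
    r"call\s+(?:dword\s+ptr\s+)?\[(e[a-d]x|esi|edi)(?:\+(0x[0-9a-f]+|[0-9a-f]+h|\d+))?\]",
    re.IGNORECASE,
)


def parse_offset(text):
    """Parse an IDA-style displacement: '8', '0Ch', '10h' or '0x1c'."""
    if text is None:
        return 0
    t = text.lower()
    if t.startswith("0x"):
        return int(t, 16)
    if t.endswith("h"):
        return int(t[:-1], 16)
    return int(t)


def find_virtual_calls(lines):
    """Return (line_index, register, vtable_slot) for each candidate virtual call."""
    hits = []
    for n, line in enumerate(lines):
        m = VCALL.search(line)
        if m:
            offset = parse_offset(m.group(2))
            # Hypothetical 32-bit layout: each vtable slot is a 4-byte pointer.
            hits.append((n, m.group(1).lower(), offset // 4))
    return hits
```

    For example, in a listing where ecx holds this, mov eax, [ecx] followed by call dword ptr [eax+8] would be reported as slot 2 of the object's vtable, which is a good hint to go looking for the class's vtable in .rdata.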

    That said, I'm no expert in reverse engineering; I did a lot of it a few years ago, then took a long break, only to find that my skill at interpreting disassembly had degraded to the point where I can feel the difference in how comfortably I read assembly code. Of course, this degradation is a learning experience as well, so I'm not ranting about it ;)
    Last edited by ioactivity; September 22nd, 2011 at 02:27. Reason: typo

  3. #3
    Registered User
    Thanks ioactivity. That's an interesting post. Train the brain. So if I break down your post, I should:

    a) Keep working (obviously)
    b) Try and convert as many functions into high level code as possible
    c) Keep identifying patterns, so b) becomes easier to do

    One thing I wasn't clear on, though: you mentioned I should think about how I would write stuff. Now, it's primarily malware I'm analyzing, and I really don't know what I'm looking at, so it's kind of tough to think along those lines. As in, if I knew 'This function connects to malware.xxx.com', I could try rewriting pseudo high-level code from the assembly... that's cool. But if I have no clue at all, how do I do it? I hope that makes sense.

    All the same, some good points.. thank you.

  4. #4
    Kayaker (Teach, Not Flame)
    Hi

    ioactivity makes some good points. It is really all about Zen after all


    But yeah, sometimes you just end up spinning your wheels in the mud, wondering what a particular function does if it's not actually called during normal execution. One thing that might be useful for at least some functions, though I haven't tried it yet, is IDA's Appcall feature.

    http://www.hexblog.com/?p=112
    http://www.hexblog.com/?p=113

    Here's another example that uses IDAPython rather than Appcall to call a self-contained function

    Calculating API hashes with IDA Pro
    http://www.hexblog.com/?p=193



    As for naming variables - YES, always, even if it's to something like "wtf_is_this_var".

    I also find it helps to increase the number of xrefs shown from the default to a _much_ larger number. That way you can more easily see whether one of your defined functions shows up as a reference to another function. You can create a /cfg/idauser.cfg file to set your own default options for all new disassemblies, e.g.:

    Code:
    // /cfg/idauser.cfg

    SHOW_XREFS = 60         // Show up to 60 cross-references (the rest are accessible via Ctrl-X)

    MAX_NAMES_LENGTH = 50   // Maximum length of new names (you may specify values up to 511)

    OPCODE_BYTES = 6        // Number of opcode bytes to display next to each instruction

    Of course, one of the most useful things is being able to resolve APIs and decrypt strings. At that point you can almost read the disassembly like a textbook (at least it helps). That's where things like calculating API hashes or writing an IDC script to decrypt strings come in useful, IF the malware makes it easy enough for you to do that; not all of them are that accommodating.

    Actually, the Lenny Zeltser malware challenge you brought up in an earlier thread is a good example of that. It uses ROL 7 for an API hash, and there just happens to be a WinDbg extension that can deal with that in a standalone manner (one of the MSEC Crash Analyzer Debugger Extension commands). Several APIs are called using a fancy-ass MapViewOfSection technique, but big deal: all you have to do is plug the hash into the WinDbg extension and update your static disassembly with the API name.
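
    If you would rather not depend on a debugger extension, the same lookup can be done with a few lines of Python: hash every candidate API name with the malware's algorithm and look up the constants you find in the disassembly. The thread only says "ROL 7", so the exact variant below (start at 0, then h = rol(h, 7) XOR each ASCII byte) is an assumption; check it against the actual hashing loop before trusting the results. api_hash, build_table and resolve are hypothetical helper names.

```python
MASK32 = 0xFFFFFFFF


def rol32(value, count):
    """Rotate a 32-bit value left by count bits."""
    count &= 31
    return ((value << count) | (value >> (32 - count))) & MASK32


def api_hash(name):
    """Assumed ROL-7/XOR variant: h = rol(h, 7) ^ byte for every character of the name."""
    h = 0
    for byte in name.encode("ascii"):
        h = rol32(h, 7) ^ byte
    return h


def build_table(api_names):
    """Map hash -> API name for a list of candidate names (e.g. a DLL's export list)."""
    return {api_hash(n): n for n in api_names}


def resolve(hash_value, table):
    """Return the API name for a hash, or a placeholder label if it is unknown."""
    return table.get(hash_value, "unknown_%08X" % hash_value)
```

    In practice you would feed build_table the export names of kernel32.dll, ntdll.dll and so on, then rename each hash constant in IDA to whatever resolve returns.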

    So too, the string decryption routine, while effective, can be resolved with an IDC script. At this point the malware is pretty much laid open, and even if a function isn't called during normal tracing, you can guess what it does.
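
    The string side works the same way: reimplement the decryption routine once, then run it over every encrypted blob. Below is the logic as a Python sketch (IDAPython could run the same code inside IDA instead of IDC). The algorithm, repeating-key XOR producing a NUL-terminated ASCII string, is purely an assumption for illustration; the thread does not say what the challenge's routine actually does, so replace xor_bytes with whatever the real loop computes.

```python
def xor_bytes(data, key):
    """Hypothetical cipher: repeating-key XOR (applying it twice restores the input)."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def decrypt_string(blob, key):
    """Decrypt a blob and return the text up to the first NUL byte."""
    plain = xor_bytes(blob, key)
    return plain.split(b"\x00", 1)[0].decode("ascii", errors="replace")
```

    Once decrypt_string reproduces one string you already saw at runtime, you can trust it for the strings that are never decrypted during your dynamic run.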

    And then other times, you just spin your wheels in the mud...

    Cheers,
    Kayaker

  5. #5
    Registered User
    Thanks Kayaker. Will do variable names and Xrefs in IDA..cool. I'm sure that will help a little...

    Now the rest of the stuff, Python inside IDA etc., is OTT for me at the moment. I've read The IDA Pro Book, but only up to chapter 7 or 8, where it teaches how to navigate around IDA. And then it all started glazing over.

    My point being... I get what you're saying about IDA scripting, but I feel I've not yet reached a level where I can say "hey, this needs a script" and then start learning what I need to write it. For example, there was a challenge on Osix.net (level 5) to find a serial given ONLY its hash. I knew doing it manually was stupid and that I had to code, so I wrote a simple Perl script and was done. In that case I knew I had to script: I could visualize the problem and at least 'know' what I needed to do. Here, it's not always that straightforward...
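
    For comparison, that kind of "find the serial from its hash" script is only a few lines in Python as well. The details are assumptions made purely for illustration: the hash is taken to be MD5 and the serial four decimal digits, since the thread does not give the real challenge's hash function or keyspace. find_serial is a hypothetical name.

```python
import hashlib
import itertools


def find_serial(target_hex, alphabet="0123456789", length=4):
    """Brute-force every candidate serial and return the one whose MD5 matches."""
    for combo in itertools.product(alphabet, repeat=length):
        candidate = "".join(combo)
        if hashlib.md5(candidate.encode("ascii")).hexdigest() == target_hex:
            return candidate
    return None  # keyspace exhausted without a match
```

    The same shape (reimplement the check, then loop over the candidates) is what the API-hash and string-decryption scripts in IDA boil down to as well.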

    So after reading that, would you say I just need to keep spinning my wheels in the mud, 'till' I get to the level I need to get to.. to script.. or is there a middle path?

    Arvind

  6. #6
    Super Moderator
    grow dynamically static is stagnant

    Well, jokes apart: like io and Kayaker posted, you need to keep spinning the wheel in the mud till your fingers start making pots

    and once you become potter (not harry) every bit of mud will look like a pot

    Btw, OllyDbg also has the ability to label and comment, and there were utilities (OllySync, I think) that can transfer Olly labels and comments back to IDA for a better picture.

  7. #7
    Registered User
    All right then blabberer.. I'll keep spinning them, and keep coming here when they don't take any shape... let alone a pot

  8. #8
    If I can give a suggestion: try using Olly on one monitor and IDA on another.

    Take note of some of the procs/addresses you enter during dynamic analysis, then check them out in IDA (create code at those addresses if needed) and start your IDA analysis from there.
    You will speed up your analysis this way by examining only 'taken' ways.
    Sometimes you'll be fighting dynamically decrypted code, and there you basically have 2 approaches:

    1) grab the encryption scheme, do your IDA script and decrypt.
    2) do a partial dump right after the interesting code had been decrypted, and check it there.

    Plan B, if you're a decent coder:
    ...grab executed instructions from olly trace, and write a simple IDA plugin that allows you to better view/follow such code.
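
    A minimal sketch of the "taken ways" idea, in Python: pull the executed addresses out of a run-trace text export and check which function ranges they fall into, so you know which functions to read first in IDA. Assumptions: the trace format (the instruction address is the first 8-hex-digit token on each line) and the function table (name mapped to start and end-exclusive addresses, e.g. exported from IDA) are both hypothetical; adjust the regex to whatever your Olly version actually writes out. executed_addresses and functions_hit are illustrative names.

```python
import re

# Assumption: the first standalone 8-hex-digit token on a trace line is the EIP.
ADDR = re.compile(r"\b([0-9A-Fa-f]{8})\b")


def executed_addresses(trace_lines):
    """Collect the set of instruction addresses seen in the trace."""
    addrs = set()
    for line in trace_lines:
        m = ADDR.search(line)
        if m:
            addrs.add(int(m.group(1), 16))
    return addrs


def functions_hit(functions, addrs):
    """functions: {name: (start, end_exclusive)}; return names touched by the trace."""
    return sorted(
        name
        for name, (start, end) in functions.items()
        if any(start <= a < end for a in addrs)
    )
```

    Anything not in the returned list is a function the malware never reached during that run, which is exactly the set Arvind was worried about; at least it tells you how big that set is.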
    I want to know God's thoughts ...the rest are details.
    (A. Einstein)
    --------
    ..."a shellcode is a command you do at the linux shell"...

  9. #9
    Registered User
    Thanks Maximus... No luxury of 2 monitors just yet. Maybe sometime in the future .. But I hear what you say.

    When you say 'taken ways', though, do you mean all the code paths I can see during dynamic analysis, stepping through the code while the malware is running? So, for example, if there are 20 subroutines and the malware uses only 8 of those, as I understand it you're saying: focus on those 8 to start with, and that'll speed things up? Right?

    Will keep your suggestions for encrypted code in mind..when I hit that type and have to work.

    I code OK, but primarily in Perl and Ruby. Time to learn Python then; both Immunity and IDA seem to have Python support.

    Thanks though!!

  10. #10
    Super Moderator
    Maximus We Are In Virtual World
    No need for two monitors: set up a VM, say VPC 2007,
    run OllyDbg in the host,
    run IDA in the VM.

    Run a kernel debugger in the host that debugs IDA debugging your debuggee, and dance through it all.

  11. #11
    Personally I also find multi-monitor setup as more productive. It's very convenient if I can open some docs on one monitor, code on the second monitor, let go of the mouse and keyboard, and just do some thinking without clicking anything ;)

    @live_dont_exist: By "pretend you wrote it yourself" I didn't mean trying to imagine writing the whole procedure, because, as you said, without interpreting it first it's impossible to say what's inside. It's a subtler approach, also based on pattern matching. For example, if you see an OUT instruction to port 0CF8h/0CFCh, with some bit shifting and masking slightly above it that uses local stack variables, you could be looking at some write_to_pci() procedure. That, in turn, could mean you're looking at a PCI device enumeration sequence. So what would you need to write such a procedure? Some local variables holding the bus, device and function numbers, some loops to increment them, and some read/write functions accessing the ports. Suddenly it becomes clearer why some of the CMP instructions use 0x07 or 0x20: the highest PCI function number is 7 (8 functions per device) and there are 32 devices per bus, so these CMPs are the loop conditions that decide whether each loop continues or breaks. After a while it may look as though you reversed a whole big procedure without interpreting the instructions one by one, just by labeling the local variables in IDA and seeing how they interact. You can also figure out the function's return value (or the structure it fills in) and see how it is used by the caller. From there it may be possible to work out other functions based on what you learned from the PCI enumeration function, letting you describe more structures, return values and local variables. Then again, situations like this aren't very common :), and sometimes it's simply more effective to use a debugger to get the value at some memory address. But it's always good to look at IDA, even with a debugger running nearby ;).
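
    Here is what "pretend you wrote it yourself" could look like for that PCI example, sketched in Python rather than the C a driver would use. It follows PCI configuration mechanism #1: the dword written to port 0xCF8 (CONFIG_ADDRESS) has bit 31 as the enable bit, bits 23-16 the bus, 15-11 the device, 10-8 the function and 7-2 the register offset, while data goes through 0xCFC (CONFIG_DATA). The shifts by 16, 11 and 8 and the loop bounds below are what you would expect to see as SHL and CMP constants in the disassembly. The function names are hypothetical, and the actual port I/O is left out because it needs ring 0.

```python
CONFIG_ADDRESS_PORT = 0xCF8  # OUT target for the address dword
CONFIG_DATA_PORT = 0xCFC     # IN/OUT target for the data dword


def pci_config_address(bus, device, function, offset):
    """Build the dword a driver would write to CONFIG_ADDRESS_PORT."""
    if not (0 <= bus < 256 and 0 <= device < 32 and 0 <= function < 8):
        raise ValueError("bus/device/function out of range")
    return (1 << 31) | (bus << 16) | (device << 11) | (function << 8) | (offset & 0xFC)


def enumerate_slots(bus=0):
    """The nested loops whose bounds tend to appear as CMP ..., 20h and CMP ..., 7/8."""
    for device in range(32):       # 32 devices per bus
        for function in range(8):  # functions 0..7 per device
            yield bus, device, function
```

    Depending on how the loop was written (< 8 versus <= 7), the compiler may emit either constant, which is why both are worth recognizing.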

    This probably won't help you much, at least practically, because there are probably different patterns used in regular malware (unless you reverse rootkits), but I hope you get the point :)

  12. #12
    Registered User
    @blabberer: Yes, understood. I'll look at multiple VMs at least as 2 monitors are kind of out for the moment.

    @ioactivity: Got that yes..think logical..not linear is what you're saying... will keep it going

    Thnx
    Arvind

  13. #13
    Registered User
    I felt this thread was very useful, and that there will be many newbies like me struggling with 'information overload', so I wrote a small blog post about it. It summarizes what I have learned so far: stuff from this thread, and stuff I learned after implementing a few of the suggestions here. I do think it will help others as much as it helped me.

    Here is the link to my blog article - http://ardsec.blogspot.com/2011/09/reverse-engineering-know-your-tools.html

    p.s.. If I'm not allowed to 'advertise' stuff I write, please let me know and I will not paste such links in future and keep the discussion only inside the forum. I'll advertise it elsewhere

  14. #14
    Kayaker (Teach, Not Flame)
    Quote:
    p.s.. If I'm not allowed to 'advertise' stuff I write, please let me know and I will not paste such links in future and keep the discussion only inside the forum. I'll advertise it elsewhere
    Informative reversing blogs are always welcome Arvind. Link it in your Signature if you want.

    K.

  15. #15
    Registered User
    Bumping an oldish thread..sorry but there didn't seem to be much sense starting a new one.

    So I've come a little way learning static reversing... today I took up a malware sample that has a DLL and a kernel driver (I know that much from dynamic analysis). Some very superficial analysis of the DLL in Olly and IDA shows 7090 functions (IDA Functions window). Some are imports from other DLLs, so I can ignore those, but that still leaves me with, say, 6000 possible functions.

    So I could sit and manually use Debug > Call DLL export in Olly and struggle my way through all the functions, going mad in the process... but is there a better way to do it?

    Forget the DLL... what about "large" EXEs with a huge number of functions? The approach in my blog below seems okay for smaller EXEs with under 100 functions. More than that? How do you do it comprehensively?
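
    One common triage idea (not something proposed in this thread so far) is to rank functions by which APIs they call and read the highest-scoring ones first, instead of walking thousands of functions in address order. The sketch below assumes you have already exported a mapping of function name to called API names from IDA (for example with a small script that walks each import's cross-references); the input format, the API list and the weights are all hypothetical judgment calls, not a standard.

```python
# Hypothetical weights: which APIs deserve attention first is a judgment call.
INTERESTING = {
    "CreateRemoteThread": 5, "WriteProcessMemory": 5, "VirtualAllocEx": 4,
    "InternetOpenUrlA": 4, "connect": 4, "send": 3, "recv": 3,
    "RegSetValueExA": 3, "CreateServiceA": 4, "DeviceIoControl": 4,
    "CreateFileA": 1, "WriteFile": 1,
}


def score(api_calls):
    """Sum the weights of the interesting APIs a function calls."""
    return sum(INTERESTING.get(api, 0) for api in api_calls)


def triage(functions, top=10):
    """functions: {name: [api, ...]}; return up to `top` names, highest score first."""
    ranked = sorted(functions.items(), key=lambda kv: (-score(kv[1]), kv[0]))
    return [name for name, apis in ranked if score(apis) > 0][:top]
```

    The functions that score zero are not necessarily harmless; they are often the helpers (string, hashing, crypto) called by the high scorers, and naming them usually falls out of reading their callers.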

    Thanks
    Arvind
    Reversing articles, primarily from a newbie perspective - http://ardsec.blogspot.com

    Latest article written - http://resources.infosecinstitute.com/author/arvind
