
Thread: Setting up a malware analysis environment

  1. #31
    chrisu
    Guest
    Quote Originally Posted by VirusBuster View Post
    Minibis could be seen as not for the masses because it was designed to analyze thousands of malware samples, but just as it can analyze 25k samples, it can analyze 1.
    Correct!

    Quote Originally Posted by VirusBuster View Post
    Do malware researchers really need to analyze thousands of malware samples? I don't think so. They usually analyze malware samples one by one and manually, using decompilers like IDA or debuggers like OllyDbg.
    It depends on what's needed. Usually a researcher doesn't look into the code initially - that's what IDA, Olly, etc. are for. First you just want a quick way to find out what you're dealing with. Then, if it's necessary or relevant, you might take a look at the code.
    By the way, IDA is not a decompiler, though Hex-Rays sells a decompiler plugin for their disassembler/debugger IDA Pro. So don't mix these up.

    Quote Originally Posted by VirusBuster View Post
    Do advanced users have 25k malware samples? Don't think so, but even if they do... do they need to analyze them? Again, I don't think so. I don't see a reason for that.
    I wouldn't want to tell an advanced user who is really interested in my tool what to do.

    Quote Originally Posted by VirusBuster View Post
    Being realistic, mass malware analysis tools are intended for antivirus companies that need to filter the huge amount of files they receive, separating harmless from potentially dangerous files. Checking everything they get one by one would be impossible nowadays.
    That's just ONE scenario where mass analysis can make sense. There are way more than this. How do you think malware trends are identified? How do you think statistical data is produced for lists, e.g. the top ten Windows autostart possibilities used by malware authors? How do you think it's possible to find other malware that seems to be created by the same developer, or with the same frameworks/tools? And so on - there are really a lot.

    Quote Originally Posted by VirusBuster View Post
    If an antivirus company must do mass malware analysis, what option will they rely on? Probably their own solution, or a professional solution like Norman Sandbox Analyzer.
    That really depends on too many factors.

    Quote Originally Posted by VirusBuster View Post
    So I think a good question is: is there a "market" for public malware analyzers? It exists, but it's very, very small.
    There is a market - I never claimed it's big.

    Quote Originally Posted by VirusBuster View Post
    Then who will be using public malware analyzers? Mainly advanced users - not malware researchers, because they don't need that, nor antivirus companies, because they will use either their own solution or a professional one.
    Mainly CERTs - that's why I made it public. It's a common approach in the CERT community to share instrumentation.

    Quote Originally Posted by VirusBuster View Post
    So in my opinion the scope of the publicly available malware analysis tools (mass analyzers or not) is advanced users.
    No, see above.

    Quote Originally Posted by VirusBuster View Post
    I will not comment about CERTs because I don't really know whether they process big amounts of samples or mainly work with honeypots.
    I'm from a national and governmental CERT, so I guess I know what our branch is doing. ;-)
    And to answer your question: they do - some more, some less; that depends on many things.

    Quote Originally Posted by VirusBuster View Post
    How do most advanced users prefer to do malware analysis? Probably using online malware analyzers like Anubis, ThreatExpert, JoeBox, etc. Why? I think because they are afraid of possible infections, so they feel safe using online tools.
    That's correct for the normal advanced user. But for CERTs - and from time to time for AV vendors, too - there are scenarios where nothing is allowed to become public; so, no Anubis and the like.

    Quote Originally Posted by VirusBuster View Post
    Of the advanced users who don't mind hosting a malware analyzer, which do they prefer: a Linux- or a Windows-based malware analyzer tool? Windows, of course, because they want to check whether a program is trustworthy before installing it on their system. Having to do the analysis under Linux to analyze a Windows application is not practical for them.
    I don't care much about this, as they are not my main constituency. I just decided to let the public (non-CERT, non-researcher) folks participate in my work, too.

    Quote Originally Posted by VirusBuster View Post
    For all the above reasons, I think malware analysis tools must be hosted under Windows. The few people who will use that kind of tool (let's be realistic, probably just 1 or 2% of computer users) work with Windows.
    You're still mixing up two different things: instruments for fast analysis of lots of samples, and in-depth code analysis, mainly of Windows PE files (executables).

    Cheers,
    Chrisu.

  2. #32
    Quote Originally Posted by chrisu View Post
    It depends on what's needed. Usually a researcher doesn't look into the code initially - that's what IDA, Olly, etc. are for. First you just want a quick way to find out what you're dealing with. Then, if it's necessary or relevant, you might take a look at the code.
    By the way, IDA is not a decompiler, though Hex-Rays sells a decompiler plugin for their disassembler/debugger IDA Pro. So don't mix these up.
    I meant disassembler, not decompiler - sorry.

    Then we must distinguish between independent malware researchers and malware researchers working for antivirus companies.

    The independent malware researcher doesn't need a quick way to check what he's dealing with. Most of the time he will work on samples already known to be malware.

    The malware researcher working for an antivirus company will receive samples already filtered by the company's own malware analyzer tool or by a third-party professional tool.

    Quote Originally Posted by chrisu View Post
    That's just ONE scenario where mass analysis can make sense. There are way more than this. How do you think malware trends are identified? How do you think statistical data is produced for lists, e.g. the top ten Windows autostart possibilities used by malware authors? How do you think it's possible to find other malware that seems to be created by the same developer, or with the same frameworks/tools? And so on - there are really a lot.
    Who does that work? Antivirus companies.

    What tools are being used to do such work? Internal tools or professional ones like the ones developed by Zynamics.

    Quote Originally Posted by chrisu View Post
    Mainly CERTs - that's why I made it public. It's a common approach in the CERT community to share instrumentation.
    How do CERTs get samples?

    Quote Originally Posted by chrisu View Post
    You're still mixing up two different things: instruments for fast analysis of lots of samples, and in-depth code analysis, mainly of Windows PE files (executables).
    I'm talking about malware analysis tools that produce results with human intervention or automatically, regardless of whether they can process lots of samples or only one at a time. And now I'm discussing whether it's better to build them under Linux or Windows, depending on the people that may use them.

    By in-depth code analysis (mainly on Windows PE executables) I meant that the people doing that work don't need tools like Minibis or BSA. Those people work on samples that are already filtered.
    Last edited by VirusBuster; May 12th, 2010 at 13:24.

  3. #33
    Kayaker
    Teach, Not Flame
    Join Date
    Oct 2000
    Posts
    4,047
    Blog Entries
    5
    Adding..

    5 Steps to Building a Malware Analysis Toolkit Using Free Tools

    http://zeltser.com/malware-analysis-toolkit/

  4. #34
    disavowed
    Join Date
    Apr 2002
    Posts
    1,281
    Quote Originally Posted by VirusBuster View Post
    What tools are being used to do such work? Internal tools or professional ones like the ones developed by Zynamics.
    Doubtful, as Zynamics's VxClass doesn't scale to real-world scenarios.
    Last edited by disavowed; June 12th, 2010 at 11:45. Reason: Clarification (thanks, wtbw :)

  5. #35
    Some time ago they wrote to me saying they were working on scaling to match real-world scenarios. I don't know how much they have advanced on that task since then.

  6. #36
    disavowed
    Join Date
    Apr 2002
    Posts
    1,281
    I haven't seen a live demo since VxClass came out, so it's entirely possible that it does scale to millions of samples now. Not sure how often Halvar reads this forum, but would be nice to get some input from him on this.

  7. #37
    Quote Originally Posted by disavowed View Post
    I haven't seen a live demo since VxClass came out, so it's entirely possible that it does scale to millions of samples now. Not sure how often Halvar reads this forum, but would be nice to get some input from him on this.
    Ah, cool, thanks for notifying me.

    When we speak about scalability, we have to look at two angles: Processing the stream of incoming samples, and processing the set of legacy samples accumulated over time.

    I will write a bit about stream processing here:

    A rough measure of how many files you need to process per day is 40k or so (counted by unique MD5 sum).

    When it comes to processing legacy samples, we're quickly speaking of millions of files, but I will talk about this later.

    So, the core point is: we have spent the last year distributing VxClass, and we now regularly run a compute cluster where we process approx. 1k executables per compute node per day -- right now we run an 8-machine cluster. That puts us at processing 20% (8000 files) of the malware that needs to be processed, on a (computing) budget of roughly 800 USD/month. Scaling further would be no problem from our end, but the database server that we're using tends to corrupt tables if we try to push it further (sigh, it seems to be hard to write decent software).
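The arithmetic in that paragraph can be checked directly. A quick sketch (assuming linear scaling with node count, all numbers taken from the post):

```python
# Sanity-check the VxClass throughput figures from the post,
# assuming throughput scales linearly with the number of nodes.
per_node_per_day = 1000   # executables processed per compute node per day
nodes = 8                 # current cluster size
daily_need = 40_000       # rough daily stream of unique samples

processed = per_node_per_day * nodes           # files processed per day
share = processed / daily_need                 # fraction of the daily stream
nodes_needed = daily_need // per_node_per_day  # cluster size to keep up

print(processed, share, nodes_needed)  # 8000 0.2 40
```

This reproduces the 8000 files (20%) figure and the "give me 40 machines" summary below.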

    Now, a second point to consider: the full VxClass run is meant for -correlation- -- i.e. it is designed to favour accuracy over speed. There are two (trivial) tricks to push performance higher:

    1. Disable expensive comparisons if approximate comparisons yield high similarity -- VxClass contains a -very- fast approximate comparison that is used to schedule more expensive comparisons. If the approximate comparison detects high similarity, this is essentially sufficient for the AV scenario -- not necessarily for the correlation scenario though.
    2. Use the automated signature generation for pre-filtering. VxClass can automatically generate "smart" signatures (which is AV-speak for byte signatures with wild cards -- AVs use a lot of hashes these days, making wildcards seem "smart"). The way this works is that we generate byte signatures on the fly, and then only perform expensive comparisons on those executables -not- matched by existing signatures.
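The two tricks above can be sketched roughly as follows. This is a hedged illustration, not VxClass's actual API: `approx_similarity`, `matches_signature`, the feature sets, and the 0.9 threshold are all made-up stand-ins for the real fast comparison and byte-signature matching.

```python
# Sketch of the two-stage scheduling described above: cheap checks
# (byte signatures, then a fast approximate comparison) gate the
# expensive structural comparison. All names here are hypothetical.

APPROX_THRESHOLD = 0.9  # assumed cutoff; tune for your corpus

def approx_similarity(a, b):
    # Stand-in for the fast approximate comparison: Jaccard
    # similarity over (hypothetical) per-sample feature sets.
    fa, fb = set(a["features"]), set(b["features"])
    return len(fa & fb) / len(fa | fb) if fa | fb else 0.0

def matches_signature(sample, signatures):
    # Placeholder for wildcard byte-signature matching.
    return any(sig in sample["bytes"] for sig in signatures)

def schedule_comparisons(new_sample, corpus):
    """Split the corpus into samples handled by cheap checks and
    samples that still need the expensive comparison."""
    cheap_hits, scheduled = [], []
    for known in corpus:
        if matches_signature(new_sample, known.get("signatures", [])):
            cheap_hits.append(known["md5"])   # trick 2: signature pre-filter
        elif approx_similarity(new_sample, known) >= APPROX_THRESHOLD:
            cheap_hits.append(known["md5"])   # trick 1: approx match suffices (AV scenario)
        else:
            scheduled.append(known["md5"])    # needs the expensive comparison
    return cheap_hits, scheduled
```

In this sketch only the `scheduled` list ever reaches the expensive comparison, which is the whole point: most of the stream is absorbed by the two cheap gates.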


    Summary: Give me 40 machines and a decent database server, and I will make VxClass process 40k samples / day.

    Now, an interesting question comes up: What do most AV labs actually do ?

    From what I could gather, their approach is essentially a combination of behavior monitoring and hash generation -- i.e. they run large farms of virtualized environments, inject malware, and then observe behavior. If the behavior is bad, they add a "hash signature" to their signature DB and roll it out.

    Kaspersky seems to be doing something related to image processing (they seem to use a lot of GPU code), though the details are sketchy.

    Regarding dealing with the 30m or 60m legacy samples: The actual quantity of samples that you need to work on is going to be -drastically- less: Process the first 10k, generate signatures, sieve out from the 60m those caught by the signatures, repeat. I would be surprised if you need to perform expensive processing on more than 1m files.
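The sieve loop just described (process a batch, generate signatures, drop everything the fresh signatures already catch, repeat) could be sketched like this. It is only an illustration: `generate_signature` and `matches` are hypothetical stand-ins for the real signature generation and matching.

```python
# Sketch of the iterative sieve for legacy samples: only the batches
# this generator yields need expensive processing; everything caught
# by freshly generated signatures is sieved out of the backlog.

def sieve(samples, batch_size, generate_signature, matches):
    """Yield successive batches that still need expensive processing."""
    remaining = list(samples)
    while remaining:
        batch, remaining = remaining[:batch_size], remaining[batch_size:]
        yield batch
        sigs = [generate_signature(s) for s in batch]
        # Sieve: keep only samples not caught by the new signatures.
        remaining = [s for s in remaining
                     if not any(matches(s, sig) for sig in sigs)]

# Toy usage: a "signature" is the first two characters of a sample name,
# and matching is a simple prefix test.
samples = ["aaa1", "aaa2", "bbb1", "aaa3", "bbb2", "ccc1"]
batches = list(sieve(samples, 2,
                     lambda s: s[:2],
                     lambda s, sig: s.startswith(sig)))
print(batches)  # [['aaa1', 'aaa2'], ['bbb1', 'bbb2'], ['ccc1']]
```

Note how `aaa3` never needs expensive processing: the signatures from the first batch already catch it, which is why the amount of real work on a 60m backlog ends up drastically smaller than 60m.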

    I hope this clarifies a bit. A lot of the issues we are wrestling with are those induced by our small size: an AV company spends more on executive cab fares per month than what we can afford to spend on computation. Most large AVs have -hundreds- of machines processing incoming malware; we have 8.

    Cheers,
    Halvar

  8. #38
    disavowed
    Join Date
    Apr 2002
    Posts
    1,281
    Thanks for the response, Halvar!

  9. #39
    Thank you very much for the detailed explanation!

  10. #40
    Kayaker
    Teach, Not Flame
    Join Date
    Oct 2000
    Posts
    4,047
    Blog Entries
    5
    A new blog post by Lenny Zeltser summarizing the topic:

    How to Get Started With Malware Analysis

    http://blogs.sans.org/computer-forensics/2010/11/12/get-started-with-malware-analysis/

  11. #41
    Yup, and quoting our board =)

  12. #42
    That was nice of Lenny.

    Woodmann
    Learn Or Die.
