Setting up a malware analysis environment

This forum focuses on analyzing malware and any aspects of dealing with packer protections.
chrisu

Post by chrisu »

VirusBuster wrote:Minibis could be seen as not for the masses because it was designed to analyze thousands of malware samples, but just as it can analyze 25k samples, it can analyze 1.
Correct!
VirusBuster wrote:Do malware researchers really need to analyze thousands of malware samples? I don't think so. They usually analyze malware samples one by one and manually, using decompilers like IDA or debuggers like OllyDbg.
It depends on what's needed. Usually a researcher doesn't initially look into the code (that's what IDA, Olly, etc. are for). First you just want to find out quickly what you're dealing with. Then, if it's necessary or relevant, you might take a look at the code.
Btw., IDA is not a decompiler, though Hex-Rays sells a decompiler plugin for their disassembler/debugger IDA Pro. So don't mix these up.
VirusBuster wrote:Do advanced users have 25k malware samples? I don't think so, but even if they do... do they need to analyze them? Again, I don't think so. I don't see a reason for that.
I wouldn't want to tell an advanced user who is really interested in my tool what to do.
VirusBuster wrote:Being realistic, mass malware analysis tools are intended for antivirus companies that need to filter the large number of files they receive and separate the harmless from the potentially dangerous ones. Checking everything they get one by one would be impossible nowadays.
That's just ONE scenario where mass analysis can make sense. There are many more. How do you think malware trends are identified? How do you think statistical data is produced for lists such as the top ten Windows autostart mechanisms used by malware authors? How do you think it's possible to find other malware that seems to have been created by the same developer, or with the same frameworks/tools? And so on - there are really a lot.
VirusBuster wrote:If an antivirus company must do mass malware analysis, which option will they rely on? Probably their own solution or a professional one like Norman Sandbox Analyzer.
That really depends on too many factors.
VirusBuster wrote:So I think a good question is: Is there a "market" for public malware analyzers? It exists, but it's very, very small.
There is a market - I never said that it's big.
VirusBuster wrote:Then who will be using public malware analyzers? Mainly advanced users; not malware researchers, because they don't need them, nor antivirus companies, because they will use either their own solution or a professional one.
Mainly CERTs - that's why I made it public. It's a common approach in the CERT community to share instrumentation.
VirusBuster wrote:So in my opinion the target audience of the publicly available malware analysis tools (mass analyzer or not) is advanced users.
No, see above.
VirusBuster wrote:I will not comment on CERTs because I don't really know whether they process large numbers of samples or mainly work with honeypots.
I'm from a national and government CERT; I guess I know what our branch is doing. ;-)
And to answer your question: They do, some more, some less; that depends on many things.
VirusBuster wrote:How do most advanced users prefer to do malware analysis? Probably using online malware analyzers like Anubis, ThreatExpert, JoeBox, etc. Why? I think because they are afraid of possible infections, so they feel safe using online tools.
That's correct for the normal advanced users. But for CERTs, and from time to time for AV vendors too, there are periodically scenarios where nothing is allowed to become public - so no Anubis and the like.
VirusBuster wrote:Of the advanced users who don't mind hosting a malware analyzer, what do they prefer: a Linux-based or a Windows-based malware analyzer tool? Windows, of course, because they want to check whether a program is trustworthy before installing it on their system. Having to run the analysis under Linux to analyze a Windows application is not practical for them.
I don't really care a lot about this, as they are not my main constituency. I just decided to also let the public (non-CERT, non-researcher) user participate in my work.
VirusBuster wrote:All of the above is why I think malware analysis tools must be hosted under Windows. The few people who will use that kind of tool (let's be realistic, probably just 1 or 2% of computer users) work with Windows.
You're still mixing up two different things: instruments for fast analysis of lots of samples, and in-depth code analysis (mainly) of Windows PE files (executables).

Cheers,
Chrisu.
VirusBuster
Member
Posts: 85
Joined: Mon Aug 27, 2007 10:48 am

Post by VirusBuster »

chrisu wrote:It depends on what's needed. Usually a researcher doesn't initially look into the code (that's what IDA, Olly, etc. are for). First you just want to find out quickly what you're dealing with. Then, if it's necessary or relevant, you might take a look at the code.
Btw., IDA is not a decompiler, though Hex-Rays sells a decompiler plugin for their disassembler/debugger IDA Pro. So don't mix these up.
I meant disassembler not decompiler, sorry.

Then we must differentiate between independent malware researchers and malware researchers working for antivirus companies.

The independent malware researcher doesn't need a quick way to check what he's dealing with. Most of the time he will work on samples already known to be malware.

The malware researcher working for an antivirus company will receive samples already filtered by the company's own malware analyzer tool or a third-party professional tool.
chrisu wrote:That's just ONE scenario where mass analysis can make sense. There are many more. How do you think malware trends are identified? How do you think statistical data is produced for lists such as the top ten Windows autostart mechanisms used by malware authors? How do you think it's possible to find other malware that seems to have been created by the same developer, or with the same frameworks/tools? And so on - there are really a lot.
Who does that work? Antivirus-companies.

What tools are being used to do such work? Internal tools or professional ones like the ones developed by Zynamics.
chrisu wrote:Mainly CERTs - that's why I made it public. It's a common approach in the CERT community to share instrumentation.
How do CERTs get samples?
chrisu wrote:You're still mixing up two different things: instruments for fast analysis of lots of samples, and in-depth code analysis (mainly) of Windows PE files (executables).
I'm talking about malware analysis tools that produce results with human intervention or automatically, regardless of whether they can process lots of samples or only one at a time. And now I'm discussing whether it's better to build them under Linux or Windows, depending on the people who may use them.

By in-depth code analysis (mainly) of Windows PE files (executables), I meant that the people doing that work don't need tools like Minibis or BSA. Those people work on samples that are already filtered.
Kayaker
Posts: 4169
Joined: Thu Oct 26, 2000 11:00 am

Post by Kayaker »

Adding..

5 Steps to Building a Malware Analysis Toolkit Using Free Tools

http://zeltser.com/malware-analysis-toolkit/
disavowed
Posts: 1290
Joined: Mon Apr 01, 2002 3:00 pm

Post by disavowed »

VirusBuster wrote:What tools are being used to do such work? Internal tools or professional ones like the ones developed by Zynamics.
Doubtful, as Zynamics's VxClass doesn't scale to real-world scenarios.
VirusBuster
Member
Posts: 85
Joined: Mon Aug 27, 2007 10:48 am

Post by VirusBuster »

Some time ago they wrote to me saying they were working on scaling up to match real-world scenarios. I don't know how much progress they have made on that since then.
disavowed
Posts: 1290
Joined: Mon Apr 01, 2002 3:00 pm

Post by disavowed »

I haven't seen a live demo since VxClass came out, so it's entirely possible that it does scale to millions of samples now. Not sure how often Halvar reads this forum, but would be nice to get some input from him on this.
halvar
Junior Member
Posts: 17
Joined: Mon Dec 13, 2004 10:30 am

Post by halvar »

disavowed wrote:I haven't seen a live demo since VxClass came out, so it's entirely possible that it does scale to millions of samples now. Not sure how often Halvar reads this forum, but would be nice to get some input from him on this.

Ah, cool, thanks for notifying me.

When we speak about scalability, we have to look at two angles: Processing the stream of incoming samples, and processing the set of legacy samples accumulated over time.

I will write a bit about stream processing here:

A rough measure of how many files you need to process per day is 40k or so (measured by MD5sum).

When it comes to processing legacy samples, we're quickly talking about millions of files, but I will talk about this later.

So, core point is: We have spent the last year distributing VxClass, and we now regularly run a compute cluster where we process approx 1k executables per compute-node per day -- right now we run an 8-machine cluster. That puts us at processing 20% (8000 files) of the malware that needs to be processed on a (computing) budget of roughly 800 USD / month. Scaling further would be no problem from our end, but the database server that we're using tends to corrupt tables if we try to push it further (sigh, it seems to be hard to write decent software).
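The figures above can be checked with a few lines of arithmetic. The sketch below is Python with illustrative names (nothing here is part of VxClass); it computes the share of the daily inflow an 8-node cluster covers and how many nodes the full inflow would need, assuming throughput scales linearly with the number of nodes.

```python
import math

# Figures taken from the post above.
DAILY_INFLOW = 40_000      # new samples per day, counted by MD5
PER_NODE_PER_DAY = 1_000   # executables one compute node processes per day
NODES = 8                  # current cluster size

def coverage(nodes, per_node=PER_NODE_PER_DAY, inflow=DAILY_INFLOW):
    """Fraction of the daily inflow that `nodes` machines can process.

    Assumes linear scaling, which the post suggests but does not prove.
    """
    return nodes * per_node / inflow

def nodes_needed(inflow=DAILY_INFLOW, per_node=PER_NODE_PER_DAY):
    """Number of nodes required to keep up with the whole daily inflow."""
    return math.ceil(inflow / per_node)

print(coverage(NODES))   # 0.2, i.e. 8000 of 40000 files
print(nodes_needed())    # 40
```

This matches the 20% figure and the "40 machines" estimate, and ignores the database bottleneck mentioned in the post.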

Now, a second point to consider: The full VxClass run is meant for -correlation- -- i.e. it is designed to favour accuracy over speed. There are two (trivial) tricks to push performance higher:
  1. Disable expensive comparisons if approximate comparisons yield high similarity -- VxClass contains a -very- fast approximate comparison that is used to schedule more expensive comparisons. If the approximate comparison detects high similarity, this is essentially sufficient for the AV scenario -- not necessarily for the correlation scenario though.
  2. Use the automated signature generation for pre-filtering. VxClass can automatically generate "smart" signatures (which is AV-speak for byte signatures with wild cards -- AVs use a lot of hashes these days, making wildcards seem "smart"). The way this works is that we generate byte signatures on the fly, and then only perform expensive comparisons on those executables -not- matched by existing signatures.
Summary: Give me 40 machines and a decent database server, and I will make VxClass process 40k samples / day.
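To make the two tricks concrete, here is a minimal sketch of how such a triage step could look. It is not VxClass code: the n-gram Jaccard measure is only a stand-in for the fast approximate comparison, the signature format (a list of byte values with None as a wildcard) is an assumption, and all function names are hypothetical.

```python
from typing import Optional, Sequence

Signature = Sequence[Optional[int]]  # None marks a wildcard byte (assumed format)

def matches(sig: Signature, data: bytes) -> bool:
    """True if the wildcard byte signature occurs anywhere in data."""
    for start in range(len(data) - len(sig) + 1):
        if all(b is None or b == data[start + i] for i, b in enumerate(sig)):
            return True
    return False

def approx_similarity(a: bytes, b: bytes, n: int = 4) -> float:
    """Cheap stand-in comparison: Jaccard similarity of byte n-gram sets."""
    grams_a = {a[i:i + n] for i in range(len(a) - n + 1)}
    grams_b = {b[i:i + n] for i in range(len(b) - n + 1)}
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)

def triage(sample: bytes, known, signatures, high: float = 0.9) -> str:
    """Decide whether a sample still needs the expensive comparison."""
    # Trick 2: anything caught by an existing signature is filtered out first.
    if any(matches(sig, sample) for sig in signatures):
        return "signature-hit"
    # Trick 1: a high approximate score is taken as good enough.
    best = max((approx_similarity(sample, k) for k in known), default=0.0)
    if best >= high:
        return "approx-match"
    return "needs-full-comparison"

known = [b"\x55\x8b\xec\x83\xec\x10" + b"A" * 32]
sigs = [[0x55, 0x8B, 0xEC, None, None, 0x10]]
print(triage(b"\x90\x55\x8b\xec\x81\xec\x10\x00", known, sigs))  # signature-hit
print(triage(known[0], known, []))                               # approx-match
print(triage(bytes(range(64)), known, sigs))                     # needs-full-comparison
```

Only samples in the last category would be handed to the slow, accurate comparison; the 0.9 threshold is an arbitrary illustrative value.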

Now, an interesting question comes up: What do most AV labs actually do ?

From what I could gather, their approach is essentially a combination of behavior-monitoring and hash generation -- i.e. they run large farms of virtualized environments, inject malware, and then observe behavior. If the behavior is bad, they add a "hash signature" to their signature DB and roll it out.

Kaspersky seems to be doing something that has relationships to image processing (they seem to use a lot of GPU code), the details are sketchy tho.

Regarding dealing with the 30m or 60m legacy samples: The actual quantity of samples that you need to work on is going to be -drastically- less: Process the first 10k, generate signatures, sieve out from the 60m those caught by the signatures, repeat. I would be surprised if you need to perform expensive processing on more than 1m files.
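The process-sign-sieve-repeat loop can be sketched as follows. This is a toy model in Python, not VxClass: the `make_signatures` and `is_caught` callbacks are hypothetical placeholders for signature generation and matching, and the sample names are made up so that the "family" is visible in the name.

```python
def sieve(samples, batch_size, make_signatures, is_caught):
    """Return how many samples needed expensive processing.

    Each round processes one batch the expensive way, derives signatures
    from it, and drops every remaining sample those signatures catch.
    """
    remaining = list(samples)
    expensive = 0
    while remaining:
        batch, rest = remaining[:batch_size], remaining[batch_size:]
        expensive += len(batch)                  # full processing happens here
        sigs = make_signatures(batch)            # hypothetical callback
        remaining = [s for s in rest if not is_caught(s, sigs)]
    return expensive

# Toy data: 1000 samples drawn from 5 families; a "signature" is a family name.
samples = [f"family{i % 5}-variant{i}" for i in range(1000)]
family = lambda s: s.split("-")[0]

count = sieve(
    samples,
    batch_size=10,
    make_signatures=lambda batch: {family(s) for s in batch},
    is_caught=lambda s, sigs: family(s) in sigs,
)
print(count)  # 10
```

In this toy set the first batch already covers all five families, so only 10 of 1000 samples get the expensive treatment. Real collections will be far less tidy; the point is only that the expensive work shrinks with every round.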

I hope this clarifies a bit. A lot of the issues we are wrestling with are those induced by our small size: An AV company spends more on executive cab fares per month than what we can afford to spend on computation. Most large AVs have -hundreds- of machines processing incoming malware, we have 8 ;)

Cheers,
Halvar
disavowed
Posts: 1290
Joined: Mon Apr 01, 2002 3:00 pm

Post by disavowed »

Thanks for the response, Halvar!
VirusBuster
Member
Posts: 85
Joined: Mon Aug 27, 2007 10:48 am

Post by VirusBuster »

Thank you very much for the detailed explanation!
Kayaker
Posts: 4169
Joined: Thu Oct 26, 2000 11:00 am

Post by Kayaker »

A new blog post by Lenny Zeltser summarizing the topic:

How to Get Started With Malware Analysis

http://blogs.sans.org/computer-forensic ... -analysis/
Silkut
Senior Member
Posts: 579
Joined: Fri Mar 31, 2006 11:29 am

Post by Silkut »

Yup, and quoting our board =)
Woodmann
Posts: 3605
Joined: Fri Jan 26, 2001 6:28 pm

Post by Woodmann »

That was nice of Lenny :yay: .

Woodmann
Learn Or Die.
NickyBlue
Junior Member
Posts: 16
Joined: Thu Oct 11, 2012 4:31 am

May I add something guys! of course if you don't mind ;)

Post by NickyBlue »

Hi!,
Sorry to interrupt, but I myself have some interest in, or idea of, designing software along similar lines for the Windows platform, because I don't know any other platform for now. ;)

I just joined today (a few minutes ago, actually) and had a quick look over this thread's posts.
Hey guys, what are you all talking about here? Don't you think you are all a little bit off the mark? I mean, what are these discussions about virtual machines, sandboxes and malware escaping, and what is all this fighting about?

Where does this malware escape concept come from? That bogus striker shit from "symantic"? I think you are not discussing what you should be concentrating your energy on, instead of these worthless imaginary concepts promoted by various... whatever, for their own personal reasons. (Unless you are one of them in disguise, talking in an indirect sense, which, sorry to say, is not my style. I don't talk in ciphers! ;)

A virtual machine is meant to be a virtual machine, not some running debugging environment. It's virtual, not real. Those ideas have been floated by some people whose engines are incomplete code implementations.

So there's nothing running in there to escape. There's another catch: the VM introduced at the processor level, but that's another concept. It's there to aid debugging or runtime analysis. In malware analysis (analyzing a malware sample), building a virtual machine doesn't mean the code will be run in some controlled environment. That is a foolish concept of automated debugging. It just means our engine can analyze (or virtually execute) an instruction, or in plain terms, can calculate the result of executing that instruction. It's kind of processor logic implemented in software rather than hardware. Yes, to make things fast we sometimes calculate results by executing the instruction itself on the processor, but that's a different thing too. We don't execute the virus body. For example, if it's something like

ADD EAX, [2000]

during instruction analysis the EAX and [2000] components are extracted.

Then they are run through a dummy instruction created from a template on the fly, i.e. all kinds of ADDs can be handled by a single

ADD EAX, EBX instruction

You just load the registers with the extracted values, single-step the instruction within an exception handler or whatever mechanism you devise for it, and store the result. Where does the virus body come into play, eh? This code is executing within the analyzer's body, not the malware's. ;)
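A minimal sketch of the "calculate the result in software" variant of this idea, in Python. Note that it does not single-step a template instruction on the real processor as described above; it computes the 32-bit ADD and its flags purely in software. The register and memory dictionaries, the function name, and reading [2000] as address 0x2000 holding a 32-bit value are all assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF
SIGN32 = 0x80000000

def emulate_add(dst: int, src: int):
    """Compute a 32-bit ADD in software; return (result, flags)."""
    full = dst + src
    result = full & MASK32
    flags = {
        "CF": full > MASK32,          # unsigned carry out of bit 31
        "ZF": result == 0,
        "SF": bool(result & SIGN32),
        # Signed overflow: operands share a sign that the result lacks.
        "OF": bool(~(dst ^ src) & (dst ^ result) & SIGN32),
    }
    return result, flags

# Emulated state held by the analyzer; the sample itself never runs.
regs = {"EAX": 0xFFFFFFFF}
mem = {0x2000: 0x00000001}   # assumed: [2000] is a 32-bit value at 0x2000

# ADD EAX, [2000]: extract both operands, compute, store the result.
regs["EAX"], flags = emulate_add(regs["EAX"], mem[0x2000])
print(hex(regs["EAX"]), flags)
```

With these values EAX wraps around to 0, setting CF and ZF while OF stays clear. Every ADD form (register, memory, immediate) reduces to the same `emulate_add` call once the operands are extracted, which is the point of the template approach.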

And secondly, about the Linux thing. Yes, it's good because it comes in (small) CD packages. But then it must have drivers for reading everything on Windows partitions, such as the pagefile, the registry and the NTFS volume itself. And that needs some... you know what I mean. Secondly, for better results it's good to do the analysis in the infected environment or in an identical clean environment. It will require less coding on your part. :) Yes, sometimes that becomes impossible if some really smart code is active. If some really smart code has hacked the OS kernel, there's no way you can overcome it unless its writer himself hasn't written that handling logic. It will always be the one trapping your calls, not the other way round, but it surely can't hide the fact that the processor is running in VM mode. That's the kind of basic security that is built in at the processor level itself. So if you detect such a condition, you'll have to try some alternative way to handle it.

Pheww!!! Okay guys, I don't want this to become a tutorial, so I'll finish up here.

rest some other time

NickyBlue (aka DarkAvenger: The Resurrection)
NickyBlue
Junior Member
Posts: 16
Joined: Thu Oct 11, 2012 4:31 am

Hi guys! me again! Let me picky poky some more.... haha

Post by NickyBlue »

First of all, it's really a heck of a thread you started here, Kayaker. Where did you get these two guys? ... I mean VirusBuster and chrisu. They really can fight a lot! ;)

Let's do some real debugging of their scripts here... courtesy of DarkAvenger ;)

So let's start from the top...

First, you fight over this Linux thing of yours....

chrisu, I don't know what you mean by this escape thing, but even if it happens like you say, how could it not affect a Linux box unless you add one more line, "Linux on a different processor", right? See the logic here?

Yes, Linux has one point in its favor: its networking component can be modified to your liking, if you are ready to take that much pain. But it's also true that this can be accomplished on Windows as well, not through code modification but through other means.

And you, Mr. smarty-pants VirusBuster: is that a valid reason to base your logic on, that such and such a thing has never been reported so it's not worth worrying about? Or do you design things to patch real threats that are capable of taking off? "This type of crash has never been reported, so let's build the plane this way", right? Who cares about the passengers, that's an accidental, exceptional case, you know! We are not responsible for that... haha

You are fighting over this type of lab versus that type of lab. What are you going to do in whatever type of lab you have? How are you going to accomplish the real task? Fire up IDA Pro or that OllyDbg thing and do analysis? There are lots of so-called reputable organisations doing that. One more is not needed. They are well funded, and chrisu, at least you are getting your paycheck from one of them, as you yourself boasted... by the way, what's your company's name?

Got to be aware of that, you see. Self-protection is my birthright, right? ...haha

Yes, I have seen that manual of yours about the CERT thing. I mean, it was all "this head", "that comp pic" and sketches. But where's the meat? If I give you a 30 MB file infected with some really smart new virus, how are you going to report whether it's infected by something malicious or not? Simple question, isn't it? Will you show it your company's trend chart? Or start chanting C.E.R.T, C.E.R.T in front of it... will that do the trick? If yes, then please, I might try that too... Some nasty people here have been a real pain in my ass, always trying to find a way to install cheap code on my system.

Or is it your secret code... and if it is, then I am sorry, I didn't know that. Then why are you two guys banging your heads on senseless things here? You know, we have a saying here in Hindi (I am Indian, right!): "Gaaun basa nahin, bhikari pehle bhik mangne chaley aaye" (literally, "the village is not yet settled, and the beggars have already come to beg"), meaning: the chain store you are trying to build, Mr. chrisu, needs some product, doesn't it? Something that can do the real work. Or is it just PDF magic you believe in? I know you have been here long and might be doing a lot of debugging. To tell you the truth, I am not even a computer engineer, and I have never been to any of the formal private computer institutes we have on every street corner, let alone studied computer engineering. But I can easily see the senselessness of your talk. And mind you, Mr. chrisu, that of your whole antivirus or security industry as well. So don't try to bog me down with your trend charts. I know what trends this world follows, let alone some worthless malware.

There has been another topic: why are you making it, and for whom? You two have been fighting over this too, no? I don't know about you, but I'll tell you about myself. For you guys! Guys like you... who have gone mad using primitive products like IDA Pro or OllyDbg. It's gone to your heads. Now all you can think of is hex bytes, breakpoints, this VM, that VM, this plugin, that plugin, go and bring that plugin and shove it up my ass. Will that make you happy! ;)

Relax! okay!
Have a break, a holiday trip with your girlfriend.
And when you come back, think about the problem again coolly. It's a very good thing, Mr. chrisu, but you have to be objective rather than theoretical. Leave those statistics within the premises of your office for some time, and think about how you are going to achieve the real code analysis within an executable that you talk about. Who knows, the next state-of-the-art kernel debugger to hit the scene might be chrisu's, if not a fully automated code analyzer.

And last of all, Mr. Kayaker, you are really a very smart fellow. ;)

Thanking you
Your well wisher
Mind it!