Thread: Dynamic Binary Code and Data Flow Analysis Instrumentation.

  1. #1
    |< x != '+' BanMe's Avatar
    Join Date
    Oct 2008
    Location
    Farmington NH
    Posts
    510
    Blog Entries
    4

    Dynamic Binary Code and Data Flow Analysis Instrumentation.

    So I've been integrating Boomerang into Sin32, and I am releasing all future code under the BSD and GPL licenses referenced therein.

    In doing this I don't want to use the GC stuff or the weird LOG class provided for logging all the important information gleaned by this project, so a reimplementation of that is needed (all 367 or so calls that I commented out), as well as a reimplementation of the GUI. Removing Qt was fun. But reworking the controller GUI to also view the server's output is the primary goal. As seen in my 'rekindled hope (maybe)' post, I'm also probing for remote console allocation, for output as well as input commands.

    For the most part I am done getting it to compile correctly. Now I have to make the code examine 'mapped binary portions' instead of 'binary files', which isn't anything really different from what it does anyway; my method is just runtime based...
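    To illustrate what working on a mapped image rather than a file can look like, here is a minimal sketch. It is not code from Boomerang or Sin32: `findPeSignature` is a hypothetical helper that takes the base address of an image already in memory and locates its PE signature through the DOS header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of Boomerang or Sin32): given the base of an
// image that is already mapped into memory, return the offset of its
// "PE\0\0" signature, or 0 if the bytes do not look like a PE image.
// `size` is the number of bytes known to be readable starting at `base`.
std::size_t findPeSignature(const unsigned char *base, std::size_t size)
{
    assert(base != nullptr);

    // The DOS header is 0x40 bytes long and starts with "MZ".
    if (size < 0x40 || base[0] != 'M' || base[1] != 'Z')
        return 0;

    // e_lfanew lives at offset 0x3C and is stored little-endian.
    std::uint32_t e_lfanew = static_cast<std::uint32_t>(base[0x3C])
                           | static_cast<std::uint32_t>(base[0x3D]) << 8
                           | static_cast<std::uint32_t>(base[0x3E]) << 16
                           | static_cast<std::uint32_t>(base[0x3F]) << 24;

    // Reject offsets that would read past the mapped range.
    if (e_lfanew > size - 4)
        return 0;

    if (std::memcmp(base + e_lfanew, "PE\0\0", 4) != 0)
        return 0;
    return e_lfanew;
}
```

    The same header walk works whether the bytes came from a file or from a live module; only the way `base` and `size` are obtained changes.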

    I know the benefits of including this marvelous little tool will be great, but there is so much to be done. I will give you the source and the complete compiling project for VS 2005. This update only runs what has been released in the past for the LPC server portion, maybe with minor updates; expect a BIG update in that regard soon.
    Last edited by BanMe; August 11th, 2010 at 11:28.

  2. #2
    |< x != '+' BanMe's Avatar

    'live' console for boomerang..

    I am adding a console for testing purposes. This is only for testing the command-line interface (looking for testers too, btw), and it is still being refactored to do both static and dynamic analysis. It currently only does static analysis and requires a path to the exe, so some of this will be recognizable and small portions are obsolete. The portion I am focusing on is changing the app into something similar to command.com or cmd.exe. So much still to do...
    Code:
    int Boomerang::commandLine(int argc, const char **argv) 
    {
    	char line[1024];
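    	printf("Sin32_Boomerang %s\n", VERSION);	// Display a version and date (mainly for release versions)
    	printf("Sin32_Boomerang: ");
    	fflush(stdout);		// make sure the first prompt is visible before fgets blocks
    	while (fgets(line, sizeof(line), stdin)) {
    		char **words;	// filled in by splitLine; kept local so the argc/argv parameters are not shadowed or clobbered
    		int nwords = splitLine(line, &words);
    //		if (parseCmd(nwords, (const char **)words) == 2) 
    //			return 2;
    		ParseInputCmds(nwords, (const char **)words);	// returns 0 when the command was not recognised
    		// Re-prompt either way, so a bad command does not leave the console without a prompt.
    		printf("Sin32_Boomerang: ");
    		fflush(stdout);
    	}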
    	return 0;
    }
    Don't expect much change to the internal commands (beyond better detail or naming them). Source is below...
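    The console loop above relies on `splitLine` to turn one line of input into words. As a rough sketch of that step only (this is a hypothetical stand-in named `splitWords`, not Boomerang's own `splitLine`), whitespace splitting can be done with a string stream:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the splitting step: breaks one console line into
// whitespace-separated words. Spaces, tabs and the trailing newline left by
// fgets are all treated as separators, so an empty line yields no words.
std::vector<std::string> splitWords(const std::string &line)
{
    std::vector<std::string> words;
    std::istringstream in(line);
    std::string word;
    while (in >> word)
        words.push_back(word);
    return words;
}
```

    A caller would want to check for an empty result before looking at the first word, since pressing Enter on a blank prompt produces zero words.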

    regards BanMe
    Last edited by BanMe; August 15th, 2010 at 02:05. Reason: Last blog entry has code :P
    No hate for the lost children;
    more love for the paths we walk,
    'words' shatter the truth we seek.
    from the heart and mind of Me
    me, to you.. down and across

    No more words from me, to you...
    Hate and love shatter the heart and Mind of Me.
    For the Lost Children;For the paths we walk; the real truth we seek!

  3. #3
    |< x != '+' BanMe's Avatar

    PARSER updates here..

    Code:
    int Boomerang::ParseInputCmds(int argc,const char**argv)
    {
    	char CmdIntuit[256];
    	int kmd = 0;
    	/*keeping this for later :]
    			case 'g': 
    				if(argv[i][2]=='d')
    					dotFile = argv[++i];
    				else if(argv[i][2]=='c')
    					generateCallGraph=true;
    				else if(argv[i][2]=='s') {
    					generateSymbols=true;
    					stopBeforeDecompile=true;
    				}
    				break;
    				*/
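    	// Need at least one word, and it must fit in CmdIntuit including the terminator.
    	if(argc > 0 && strlen(argv[0]) < sizeof(CmdIntuit))
    	{
    		strcpy(CmdIntuit,argv[0]);
    	}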
    	else
    	{
    		return 0;
    	}
    	switch(CmdIntuit[0])
    	{
    		//alphabetical lower or upper(must be one or other..for now ;P)
    		//case handler for commands.
    		case 'a':
    		{
    			if(CmdIntuit[1] == 'd')
    			{
    				if(argc <= 1)
    				{
    					usage();
    					return 0;
    				}
    				else if(argc <=2)
    				{
    					usage();
    					return 0;
    				}
    				else
    				{
    					if(argv[1][0] == 'e')	// second word must start with "en" (entry)
    					{
    						if(argv[1][1] == 'n')
    						{
    							// lower-case 'add entry' also decodes callees (see help), so noDecodeChildren is left unset
    							ADDRESS addr;
    							int n;
    							decodeMain = false;
    							if(argv[2][0] == '0' && argv[2][1] == 'x')
    							{
    								n = sscanf(argv[2], "0x%x", &addr);
    							} else {
    								n = sscanf(argv[2], "%i", &addr);
    							}
    							if (n != 1)
    							{
    								std::cerr << "bad address: " << argv[2] << std::endl;
    								return 0;
    							}
    							entrypoints.push_back(addr);
    							return 1;
    						}
    					}
    				}
    			}
    			usage();
    			return 0;
    		}
    		case 'A':
    		{
    			if(CmdIntuit[1] == 'D')
    			{
    				if(argc <= 1)
    				{
    					usage();
    					return 0;
    				}
    				else if(argc <=2)
    				{
    					usage();
    					return 0;
    				}
    				else
    				{
    					if(argv[1][0] == 'E')	// second word must start with "EN" (ENTRY)
    					{
    						if(argv[1][1] == 'N')
    						{
    							noDecodeChildren = true;	// upper-case 'ADD ENTRY' decodes this procedure only
    							ADDRESS addr;
    							int n;
    							decodeMain = false;
    							if(argv[2][0] == '0' && argv[2][1] == 'X')
    							{
    								n = sscanf(argv[2], "0X%x", &addr);	// the literal prefix in the format must match the 'X' tested above
    							} else {
    								n = sscanf(argv[2], "%i", &addr);
    							}
    							if (n != 1)
    							{
    								std::cerr << "bad address: " << argv[2] << std::endl;
    								return 0;
    							}
    							entrypoints.push_back(addr);
    							return 1;
    						}
    					}
    				}
    			}
    			usage();
    			return 0;
    		}
    		case 'b':
    		case 'B':
    		case 'c':
    		case 'C':
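    		case 'd':
    		{
    			if(CmdIntuit[1] == 'f')
    			{
    				dfaTypeAnalysis = true;
    				return 1;
    			}
    			return 0;	// without this, "de..." would fall through to the 'h' handler and print help
    		}
    		case 'D':
    		{
    			if(CmdIntuit[1] == 'F')
    			{
    				dfaTypeAnalysis = true;
    				return 1;
    			}
    			return 0;	// same reason as the lower-case case above
    		}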
    		case 'e':
    		case 'E':
    		case 'f':
    		case 'F':
    		case 'g':
    		case 'G':
    		case 'h':
    		{
    			switch(CmdIntuit[1])
    			{
    				case 'e':
    				{
    					if(argc <= 1)	// "!>" is not a C++ operator; no extra words means general help
    					{
    						help();
    						return 1;
    					}
    					else
    					{
    						helpcmd();
    						return 1;
    					}
    				}
    				default:
    					return 0;
    			}
    		}
    		case 'H':
    		{
    			switch(CmdIntuit[1])
    			{
    				case 'E':
    				{
    					if(argc <= 1)	// "!>" is not a C++ operator; no extra words means general help
    					{
    						help();
    						return 1;
    					}
    					else
    					{
    						helpcmd();
    						return 1;
    					}
    				}
    				default:
    					return 0;
    			}
    		}
    		case 'i':
    		case 'I':
    		case 'j':
    		case 'J':
    		case 'k':
    		case 'K':
    		case 'l':
    		case 'L':
    		case 'm':
    		case 'M':
    		case 'n':
    		case 'N':
    		case 'o':
    		case 'O':
    		case 'p':
    		case 'P':
    		case 'q':
    		case 'Q':
    		case 'r':
    		case 'R':
    		case 's':
    		case 'S':
    		case 't':
    		{
    			switch(CmdIntuit[1])
    			{
    				case 'c':
    				{
    					conTypeAnalysis = true;		// -Tc: use old constraint-based type analysis
    					dfaTypeAnalysis = false;
    					return 1;
    				}
    				default:
    					return 0;
    			}
    		}
    		case 'T':
    		{
    			switch(CmdIntuit[1])
    			{
    				case 'C':
    				{
    					conTypeAnalysis = true;		// -Tc: use old constraint-based type analysis
    					dfaTypeAnalysis = false;
    					return 1;
    				}
    				default:
    					return 0;
    			}
    		}
    		case 'u':
    		case 'U':
    		case 'v':
    		{
    			switch(CmdIntuit[1])
    			{
    				case 'e':
    				{
    					vFlag = true;
    					return 1;
    				}
    				default:
    					return 0;
    			}
    		}
    		case 'V':
    		{
    			switch(CmdIntuit[1])
    			{
    				case 'E':
    				{
    					vFlag = true;
    					return 1;
    				}
    				default:
    					return 0;
    			}
    		}		
    		case 'w':
    		case 'W':
    		case 'x':
    		case 'X':
    		case 'y':
    		case 'Y':
    		case 'z':
    		case 'Z':
    		default:
    			return 0;	   	
    	}
    /*
    
    			case 'o': {
    				outputPath = argv[++i];
    				char lastCh = outputPath[outputPath.size()-1];
    				if (lastCh != '/' && lastCh != '\\')
    					outputPath += '/';		// Maintain the convention of a trailing slash
    				break;
    			}
    			case 'n':
    				switch(argv[i][2]) {
    					case 'b':
    						noBranchSimplify = true;
    						break;
    					case 'c':
    						noDecodeChildren = true;
    						break;
    					case 'd':
    						noDataflow = true;
    						break;
    					case 'D':
    						noDecompile = true;
    						break;
    					case 'l':
    						noLocals = true;
    						break;
    					case 'n':
    						noRemoveNull = true;
    						break;
    					case 'P':
    						noPromote = true;
    						break;
    					case 'p':
    						noParameterNames = true;
    						break;
    					case 'r':
    						noRemoveLabels = true;
    						break;
    					case 'R':
    						noRemoveReturns = true;
    						break;
    					case 'g':
    						noGlobals = true;
    						break;
    					case 'G':
    						break;
    					default:
    						help();
    				}
    				break;
    		
    			case 'E':
    				noDecodeChildren = true;
    				// Fall through
    			case 'e':
    				{
    					ADDRESS addr;
    					int n;
    					decodeMain = false;
    					if (++i == argc) {
    						usage();
    						return 1;
    					}
    					if (argv[i][0] == '0' && argv[i+1][1] == 'x') {
    						n = sscanf(argv[i], "0x%x", &addr);
    					} else {
    						n = sscanf(argv[i], "%i", &addr);
    					}
    					if (n != 1) {
    						std::cerr << "bad address: " << argv[i] << std::endl;
    					}
    					entrypoints.push_back(addr);
    				}
    				break;
    			case 's':
    				{
    					if (argv[i][2] == 'f') {
    						symbolFiles.push_back(argv[i+1]);
    						i++;
    						break;
    					}
    					ADDRESS addr;
    					int n;
    					if (++i == argc) {
    						usage();
    						return 1;
    					}
    					if (argv[i][0] == '0' && argv[i+1][1] == 'x') {
    						n = sscanf(argv[i], "0x%x", &addr);
    					} else {
    						n = sscanf(argv[i], "%i", &addr);
    					}
    					if (n != 1) {
    						std::cerr << "bad address: " << argv[i+1] << std::endl;
    						exit(1);
    					}
    					const char *nam = argv[++i];
    					symbols[addr] = nam;
    				}
    				break;
    			case 'd':
    				switch(argv[i][2]) {
    					case 'a':
    						printAST = true;
    						break;
    					case 'c':
    						debugSwitch = true;
    						break;
    					case 'd':
    						debugDecoder = true;
    						break;
    					case 'g':
    						debugGen = true;
    						break;
    					case 'l':
    						debugLiveness = true;
    						break;
    					case 'p':
    						debugProof = true;
    						break;
    					case 's':
    						stopAtDebugPoints = true;
    						break;
    					case 't':		// debug type analysis
    						debugTA = true;
    						break;
    					case 'u':		// debug unused locations (including returns and parameters now)
    						debugUnused = true;
    						break;
    					default:
    						help();
    				}
    				break;
    			case 'm':
    				if (++i == argc) {
    					usage();
    					return 1;
    				}
    				sscanf(argv[i], "%i", &maxMemDepth);
    				break;
    			case 'i':
    				if (argv[i][2] == 'c')
    					decodeThruIndCall = true;		// -ic;
    				if (argv[i][2] == 'w')				// -iw
    					if (ofsIndCallReport) {
    						std::string fname = getOutputPath() + "indirect.txt";
    						ofsIndCallReport = new std::ofstream(fname.c_str());
    					}
    				break;
    			case 'L':
    				if (argv[i][2] == 'D')
    					#if USE_XML
    					loadBeforeDecompile = true;
    					#else
    					std::cerr << "LD command not enabled since compiled without USE_XML\n";
    					#endif
    				break;
    			case 'S':
    				if (argv[i][2] == 'D')
    					#if USE_XML
    					saveBeforeDecompile = true;
    					#else
    					std::cerr << "SD command not enabled since compiled without USE_XML\n";
    					#endif
    				else {
    					sscanf(argv[++i], "%i", &minsToStopAfter);					
    				}
    				break;
    			case 'k':
    				kmd = 1;
    				break;
    			case 'P':
    				progPath = argv[++i];
    				if (progPath[progPath.length()-1] != '\\')
    					progPath += "\\";
    				break;
    			case 'a':
    				assumeABI = true;
    				break;
    			case 'l':
    				if (++i == argc) {
    					usage();
    					return 1;
    				}
    				sscanf(argv[i], "%i", &propMaxDepth);
    				break;
    			default:
    				help();
    		}
    	}
    
    	setOutputDirectory(outputPath.c_str());
    	
    	if (kmd)
    		return cmdLine();
    */
    	return decompile(argv[argc-1]);	   
    }
    New parser. Still more to do, as can be seen...
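    The entry-point handling above has to accept both hex and decimal addresses and report bad input. One possible way to centralise that, sketched here as a hypothetical `parseAddress` helper (not part of Boomerang), is `std::strtoul` with base 0. Like the `%i` conversion used in the parser, base 0 accepts a `0x`/`0X` prefix and treats a leading `0` as octal.

```cpp
#include <cctype>
#include <cerrno>
#include <cstdlib>

// Hypothetical helper, not part of Boomerang: parses "0x7c904020",
// "0X7C904020" or plain decimal into *out. Returns false for empty input,
// a leading sign or space, trailing junk, or a value that overflows.
bool parseAddress(const char *text, unsigned long *out)
{
    // Require a leading digit so strtoul cannot silently accept "-1" or " 5".
    if (text == nullptr || !std::isdigit(static_cast<unsigned char>(text[0])))
        return false;

    errno = 0;
    char *end = nullptr;
    unsigned long value = std::strtoul(text, &end, 0);  // base 0: prefix decides the radix

    // Fail on overflow, or when something other than the terminator stopped the parse.
    if (errno == ERANGE || end == text || *end != '\0')
        return false;

    *out = value;
    return true;
}
```

    Unlike the `sscanf` calls, this also rejects input such as `0x12zz`, where only the leading part is a valid number.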
    Last edited by BanMe; August 13th, 2010 at 13:22.

  4. #4
    |< x != '+' BanMe's Avatar

    Running changes to internals here:

    So with the new parser I needed to modify the help for commands.

    This parser only reads the first 2 letters of each word and goes off of that. This can easily be expanded, but there is no need yet. So keep in mind that 'add entry 0x7c904020' can be written as 'ad en 0x7c904020' as a shortcut. Here is the current help as I've modified it. :]
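    The two-letter rule can also be written as a small comparison routine instead of nested switches. The following is only a sketch under that assumption; `matchesAbbrev` and `matchesCommand` are hypothetical names, and matching is kept case sensitive because the lower-case and upper-case forms of a command mean different things here.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: true when the first two characters of `typed` equal
// the first two characters of `full`. Case sensitive on purpose.
bool matchesAbbrev(const std::string &typed, const std::string &full)
{
    return typed.size() >= 2 && full.size() >= 2 &&
           typed.compare(0, 2, full, 0, 2) == 0;
}

// True when every word of `command` is abbreviated by the typed word in the
// same position. Extra typed words are treated as arguments and ignored here.
bool matchesCommand(const std::vector<std::string> &typed,
                    const std::vector<std::string> &command)
{
    if (typed.size() < command.size())
        return false;
    for (std::size_t i = 0; i < command.size(); ++i)
        if (!matchesAbbrev(typed[i], command[i]))
            return false;
    return true;
}
```

    With this shape, both 'ad en 0x7c904020' and 'add entry 0x7c904020' match the command words {"add", "entry"}, while 'AD EN 0x7c904020' does not and would need its own upper-case entry.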

    Changes to wording, and ideas for commands one would want to have, are welcome (I don't need code, just an idea and a direction).
    Please post your proposed changes or ideas here...

    Code:
    void Boomerang::help() {
    	std::cout << "Symbols\n";
    	std::cout << "  add symbol <addr> <name> : Define a symbol\n";
    	std::cout << "  ADD SYMBOL <addr> <name> : Define a symbol\n";
    	
    	std::cout << "  load symbols <filename>   : Read a symbol/signature file\n";
    	std::cout << "  LOAD SYMBOLS <filename>   : Read a symbol/signature file\n";
    	
    	std::cout << "Decoding/decompilation options\n";
    	std::cout << "  add entry <addr>        : Decode the procedure beginning at addr, and callees\n";
    	std::cout << "  ADD ENTRY <addr>        : Decode the procedure at addr, no callees\n";
    	std::cout << "  decode indirect calls   : Decode Indirect Calls\n";//ic
    	std::cout << "  DECODE INDIRECT CALLS   : Decode Indirect Calls\n";
    	std::cout << "  trace                   : Trace (print address of) every instruction decoded\n";
    	std::cout << "  TRACE                   : Trace (print address of) every instruction decoded\n";
    	std::cout << "  type constraint analysis: Use constraint-based type analysis\n";
    	std::cout << "  TYPE CONSTRAINT ANALYSIS: Use constraint-based type analysis\n";
    	std::cout << "  data flow analysis      : Use data-flow-based type analysis\n";
    	std::cout << "  DATA FLOW ANALYSIS      : Use data-flow-based type analysis\n";
    	std::cout << "  -a               : Assume ABI compliance\n";
    	std::cout << "  -W               : Windows specific decompilation mode (requires pdb information)\n";
    //	std::cout << "  -pa              : only propagate if can propagate to all\n";
    	//std::cout << "Output\n";
    	std::cout << "  verbose               : Set verbose output\n";
    	std::cout << "  VERBOSE               : Set verbose output\n";
    	std::cout << "  help               : This help\n";
    	std::cout << "  HELP               : This help\n";
    	
    	std::cout << "  -o <output path> : Where to generate output (defaults to ./output/)\n";
    	std::cout << "  -x               : Dump XML files\n";
    	std::cout << "  -r               : Print RTL for each proc to log before code generation\n";
    	std::cout << "  -gd <dot file>   : Generate a dotty graph of the program's CFG and DFG\n";
    	std::cout << "  -gc              : Generate a call graph (callgraph.out and callgraph.dot)\n";
    	std::cout << "  -gs              : Generate a symbol file (symbols.h)\n";
    	std::cout << "  -iw              : Write indirect call report to output/indirect.txt\n";
    	std::cout << "Misc.\n";
    	std::cout << "  take command     : Activate Command mode, for available commands see help command\n";
    	std::cout << "  TAKE COMMAND     : Activate Command mode, for available commands see help command\n";
    	std::cout << "  -P <path>        : Path to Boomerang files, defaults to where you run\n";
    	std::cout << "                     Boomerang from\n";
    	std::cout << "  -X               : activate eXperimental code; errors likely\n";
    	std::cout << "  --               : No effect (used for testing)\n";
    	std::cout << "Debug\n";
    	std::cout << "  -da              : Print AST before code generation\n";
    	std::cout << "  -dc              : Debug switch (Case) analysis\n";
    	std::cout << "  -dd              : Debug decoder to stdout\n";
    	std::cout << "  -dg              : Debug code Generation\n";
    	std::cout << "  -dl              : Debug liveness (from SSA) code\n";
    	std::cout << "  -dp              : Debug proof engine\n";
    	std::cout << "  -ds              : Stop at debug points for keypress\n";
    	std::cout << "  -dt              : Debug type analysis\n";
    	std::cout << "  -du              : Debug removing unused statements etc\n";
    	std::cout << "Restrictions\n";
    	std::cout << "  -nb              : No simplifications for branches\n";
    	std::cout << "  -nc              : No decode children in the call graph (callees)\n";
    	std::cout << "  -nd              : No (reduced) dataflow analysis\n";
    	std::cout << "  -nD              : No decompilation (at all!)\n";
    	std::cout << "  -nl              : No creation of local variables\n";
    //	std::cout << "  -nm              : No decoding of the 'main' procedure\n";
    	std::cout << "  -ng              : No replacement of expressions with Globals\n";
    	std::cout << "  -nG              : No garbage collection\n";
    	std::cout << "  -nn              : No removal of NULL and unused statements\n";
    	std::cout << "  -np              : No replacement of expressions with Parameter names\n";
    	std::cout << "  -nP              : No promotion of signatures (other than main/WinMain/\n";
    	std::cout << "                     DriverMain)\n";
    	std::cout << "  -nr              : No removal of unneeded labels\n";
    	std::cout << "  -nR              : No removal of unused Returns\n";
    	std::cout << "  -l <depth>       : Limit multi-propagations to expressions with depth <depth>\n";
    	std::cout << "  -p <num>         : Only do num propagations\n";
    	std::cout << "  -m <num>         : Max memory depth\n";
    }
    Also be aware that some of these are now obsolete. I just haven't commented them out here yet. ;p

  5. #5
    |< x != '+' BanMe's Avatar

    More Changes :)

    Here's the source code from today's work.
    Enjoy the small update. I still need to fix this issue:
    • 1>basicblock.obj : warning LNK4210: .CRT section exists; there may be unhandled static initializers or terminators
      1>dataflow.obj : warning LNK4210: .CRT section exists; there may be unhandled static initializers or terminators
      1>dfa.obj : warning LNK4210: .CRT section exists; there may be unhandled static initializers or terminators
      1>exp.obj : warning LNK4210: .CRT section exists; there may be unhandled static initializers or terminators
      1>proc.obj : warning LNK4210: .CRT section exists; there may be unhandled static initializers or terminators
      1>signature.obj : warning LNK4210: .CRT section exists; there may be unhandled static initializers or terminators
      1>sslparser.obj : warning LNK4210: .CRT section exists; there may be unhandled static initializers or terminators
      1>type.obj : warning LNK4210: .CRT section exists; there may be unhandled static initializers or terminators


    I'm gonna start in basicblock and work my way down. Also, this project still depends on the Win32Binary DLL and some other one, so the 'true' internals don't work yet. I've just got to add in the class and modify a few bits of code to make it not use the external DLLs. After this is complete, and we are finally able to have a lil 'public' testing, the output to the console is going to be immense until I implement my own logger...
    Attached Files
    Last edited by BanMe; August 15th, 2010 at 02:01.
