Terminology:

Profiling: Traversing a program to perform run-time behavioral checks. Profiling is divided into two parts, distinguished by the size of the code segments analyzed and the interdependency of those segments.

Macroprofiling: Performing run-time checks on complex code segments. This type of profiling deals with large bodies of code and the complexity of the calls between them.

Microprofiling: Performing run-time checks on a single line of code or on short code segments.

Throughput: The number of instructions executed by the processor per unit time.

Latency: The time interval required to complete one production cycle.

Profiling measures run-time usage and CPU utilization to determine the resulting effect on the system.
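
As a rough illustration of the two definitions above, the sketch below times a workload with the standard C clock() function and derives both figures from the result. The names profile_sample, derive, busy_work and measure are hypothetical, and the sketch counts completed calls ("cycles") rather than processor instructions, which would need hardware counters or a real profiler.

```c
#include <time.h>

/* Result of one measurement run (hypothetical helper type). */
typedef struct {
    double elapsed_sec;   /* CPU time for the whole run          */
    double latency_sec;   /* average time per completed cycle    */
    double throughput;    /* completed cycles per second         */
} profile_sample;

/* Derive latency and throughput from a cycle count and an elapsed time. */
profile_sample derive(unsigned long cycles, double elapsed_sec)
{
    profile_sample s;
    s.elapsed_sec = elapsed_sec;
    s.latency_sec = (cycles > 0) ? elapsed_sec / (double)cycles : 0.0;
    s.throughput  = (elapsed_sec > 0.0) ? (double)cycles / elapsed_sec : 0.0;
    return s;
}

/* Example workload: a short loop standing in for the code under test. */
unsigned long busy_work(unsigned long n)
{
    volatile unsigned long sum = 0;   /* volatile keeps the loop from being optimized away */
    for (unsigned long i = 0; i < n; ++i)
        sum += i;
    return sum;
}

/* Run the workload `cycles` times and time the whole run with clock(). */
profile_sample measure(unsigned long cycles, unsigned long n)
{
    clock_t start = clock();
    for (unsigned long c = 0; c < cycles; ++c)
        busy_work(n);
    clock_t end = clock();
    return derive(cycles, (double)(end - start) / CLOCKS_PER_SEC);
}
```

A call such as measure(1000, 100000) returns the elapsed CPU time, the average latency per call and the calls completed per second. The numbers depend entirely on the machine and compiler settings, so they are only useful for comparing two versions of the same code on the same system.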

Analytical View.
This entry deals strictly with thread optimization checks. Whenever optimization is undertaken, profiling the code is the logical step that has to follow. For smaller segments of code (single-line command execution), microprofiling is used. For larger bodies of code, macroprofiling is applied. When a process is initialized, threads are generated based on the code that is executing, and the functions that are defined and called run in the context of those threads in the system state. Instruction usage plays a crucial role in profiling. The more complex the code, the more time is needed to work through its inherent complexity, which means the latency is high. This in turn consumes CPU time.
Let's go through the thread entry structure, which clarifies the objects used:

Code:
typedef struct tagTHREADENTRY32
{
    DWORD dwSize;
    DWORD cntUsage;
    DWORD th32ThreadID;
    DWORD th32OwnerProcessID;
    LONG  tpBasePri;
    LONG  tpDeltaPri;
    DWORD dwFlags;
} THREADENTRY32, *PTHREADENTRY32;

The priority levels are given as:

                                  THREAD_PRIORITY_IDLE
                                  THREAD_PRIORITY_LOWEST
                                  THREAD_PRIORITY_BELOW_NORMAL
                                  THREAD_PRIORITY_NORMAL
                                  THREAD_PRIORITY_ABOVE_NORMAL
                                  THREAD_PRIORITY_HIGHEST
                                  THREAD_PRIORITY_TIME_CRITICAL
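
To make the levels above easier to read in profiler or debugger output, here is a minimal sketch that maps a priority value to its name. The function thread_priority_name is hypothetical. The constants are the relative priority values used with SetThreadPriority and GetThreadPriority; on non-Windows builds the sketch defines fallback copies of the documented values purely so that it compiles, and it does not assume that a given tpBasePri reading falls on one of these values.

```c
#ifdef _WIN32
#include <windows.h>
#else
/* Fallback copies of the documented Windows SDK values, used only so the
   sketch compiles on systems without <windows.h>. */
#define THREAD_PRIORITY_IDLE          (-15)
#define THREAD_PRIORITY_LOWEST        (-2)
#define THREAD_PRIORITY_BELOW_NORMAL  (-1)
#define THREAD_PRIORITY_NORMAL        0
#define THREAD_PRIORITY_ABOVE_NORMAL  1
#define THREAD_PRIORITY_HIGHEST       2
#define THREAD_PRIORITY_TIME_CRITICAL 15
#endif

/* Map a relative thread priority value to a printable name. */
const char *thread_priority_name(int priority)
{
    switch (priority) {
    case THREAD_PRIORITY_IDLE:          return "THREAD_PRIORITY_IDLE";
    case THREAD_PRIORITY_LOWEST:        return "THREAD_PRIORITY_LOWEST";
    case THREAD_PRIORITY_BELOW_NORMAL:  return "THREAD_PRIORITY_BELOW_NORMAL";
    case THREAD_PRIORITY_NORMAL:        return "THREAD_PRIORITY_NORMAL";
    case THREAD_PRIORITY_ABOVE_NORMAL:  return "THREAD_PRIORITY_ABOVE_NORMAL";
    case THREAD_PRIORITY_HIGHEST:       return "THREAD_PRIORITY_HIGHEST";
    case THREAD_PRIORITY_TIME_CRITICAL: return "THREAD_PRIORITY_TIME_CRITICAL";
    default:                            return "UNKNOWN";   /* any other value */
    }
}
```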

BOOL WINAPI Thread32First(
    HANDLE hSnapshot,
    LPTHREADENTRY32 lpte
);

BOOL WINAPI Thread32Next(
    HANDLE hSnapshot,
    LPTHREADENTRY32 lpte
);
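
The two functions above are normally used together with CreateToolhelp32Snapshot to walk every thread in the system. The following is a minimal sketch of that pattern, not a definitive implementation: the function name count_threads_in_process is hypothetical, and the non-Windows branch is only a stub so the sketch compiles on other systems.

```c
#include <stdio.h>

#ifdef _WIN32
#include <windows.h>
#include <tlhelp32.h>

/* Count the threads owned by `pid` by walking a system-wide thread snapshot.
   Returns -1 if the snapshot cannot be taken. */
long count_threads_in_process(unsigned long pid)
{
    HANDLE snap = CreateToolhelp32Snapshot(TH32CS_SNAPTHREAD, 0);
    if (snap == INVALID_HANDLE_VALUE)
        return -1;

    THREADENTRY32 te;
    te.dwSize = sizeof(te);   /* must be set before calling Thread32First */
    long count = 0;

    if (Thread32First(snap, &te)) {
        do {
            /* The snapshot covers all processes, so filter by owner. */
            if (te.th32OwnerProcessID == pid) {
                printf("thread 0x%08lX  base priority %ld\n",
                       (unsigned long)te.th32ThreadID, (long)te.tpBasePri);
                ++count;
            }
        } while (Thread32Next(snap, &te));
    }

    CloseHandle(snap);
    return count;
}
#else
/* Tool Help is Windows-only; this stub keeps the sketch compilable elsewhere. */
long count_threads_in_process(unsigned long pid)
{
    (void)pid;
    return -1;
}
#endif
```

On Windows, calling count_threads_in_process(GetCurrentProcessId()) lists the threads of the calling process. The snapshot is a point-in-time copy, so threads created or destroyed afterwards are not reflected.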

The thread entry structure is used extensively.
The code segments directly reflect the working of a thread as it executes in the context of memory. Let's see the model of code profiling.


This model presents the run-time peripherals of the code profiling process. Its main point is to show which characteristics to look for when code snippets are profiled, so a lot depends on the type of code that is executing. Code that involves nested loops, pointer references and deep code interdependencies creates a subtle environment to profile: the latency will be high and the throughput low. Here is an example of a complex code segment.

Code:
void printError( TCHAR* msg )
{
    DWORD eNum;
    TCHAR sysMsg[256];
    TCHAR* p;

    // Fetch the last error code and its system message text
    eNum = GetLastError( );
    FormatMessage( FORMAT_MESSAGE_FROM_SYSTEM | FORMAT_MESSAGE_IGNORE_INSERTS,
                   NULL, eNum,
                   MAKELANGID(LANG_NEUTRAL, SUBLANG_DEFAULT), // Default language
                   sysMsg, 256, NULL );

    // Trim the end of the line and terminate it with a null
    p = sysMsg;

    while( ( *p > 31 ) || ( *p == 9 ) )
        ++p;
    do { *p-- = 0; } while( ( p >= sysMsg ) &&
                            ( ( *p == '.' ) || ( *p < 33 ) ) );

    // Display the message
    printf( "\n  WARNING: %s failed with error %d (%s)", msg, eNum, sysMsg );
}
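
The pointer loop in printError is the part that takes the most effort to follow, so here is a portable restatement of just the trimming step, written with an index instead of a moving pointer. The function name trim_message is hypothetical and the sketch works on plain char rather than TCHAR; it is meant to show what the loop does, not to replace the original.

```c
#include <stddef.h>

/* Cut the string at the first control character (tabs excepted), then strip
   trailing periods, spaces and control characters. Returns the new length. */
size_t trim_message(char *s)
{
    size_t n = 0;

    /* Advance past printable characters and tabs. */
    while (s[n] > 31 || s[n] == '\t')
        ++n;
    s[n] = '\0';

    /* Walk back over trailing periods, spaces and control characters. */
    while (n > 0 && (s[n - 1] == '.' || s[n - 1] < 33))
        s[--n] = '\0';

    return n;
}
```

Given a buffer holding "Access is denied." followed by a carriage return and line feed, the function leaves "Access is denied" and returns 16.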

The THREADENTRY32 structure describes one thread of a process at the moment the snapshot is generated. The Thread32First function retrieves information about the first thread found in the snapshot. Let's see the microprofiling model:


So this shows the general viewpoint. To go further, and as a practical illustration, I am going to thread-profile a Windows binary that executes serially, using the Intel VTune profiler for the analysis. The binary I am going to analyze is Boo32.exe, and its usage is shown below:

Code:
 
E:\tools>BOO32.EXE
BOO32 -- Simple Win32 boot sector read/write utility.
Copyright (c) 1998 Data Fellows.

usage: boo32 [-r | -w] filename [drive]

filename: boot sector image file (512 bytes).
drive:    a letter and a colon (e.g. "A:") for boot sector,
          or a decimal number for MBR (e.g. "0" for the first physical hard drive).
          Default is "A:".

-r        read the sector to the image file
-w        write the sector from the image file (this is the default)

I ran boo32.exe from the console and fed it to the Intel VTune thread profiler. Let's see the view.


The profiler shows the serial nature of the binary, i.e. it simply runs without carrying out any specific operation. Then I ran pslist.exe, which enumerates the processes in a system with the help of thread generation.

Code:
E:\tools>pslist

               Process information for KNOCK:

               Name                Pid Pri Thd  Hnd   Priv        CPU Time    Elapsed Time
               Idle                  0   0   1    0      0    83:10:18.734     0:00:00.000
               System                4   8  54  293      0     1:05:27.281     0:00:00.000
               smss                400  11   3   21    216     0:00:00.015    96:06:18.265
               csrss               508  13  14  451   2300     0:08:46.531    96:06:14.000
               winlogon            540  13  23  574   8388     0:00:18.359    96:06:09.062
               services            596   9  16  269   2220     0:01:03.750    96:05:58.125
               lsass               608   9  14  387   3328     0:00:17.828    96:05:57.812
               svchost             792   8  17  213   3524     0:00:00.968    96:05:53.484
               svchost             868   8  12  306   2368     0:00:05.812    96:05:52.203
               svchost             948   8  46 1166  20652     0:00:49.765    96:05:51.828
               svchost            1056   8  11  168   1828     0:00:56.671    96:05:50.906
               explorer           1332   8  16  899  44580     0:22:53.359    96:05:42.796
               googletalk         1680   8  20  526  33124     0:27:11.593    96:04:51.312
               IEXPLORE           1388   8  14  688  83844     0:43:19.843    79:25:23.968
               svchost            1196   8   8  138   2720     0:00:00.484    79:14:05.296
               spoolsv             588   8  10  132   3688     0:00:00.796    30:14:08.375
               acrotray           1896   8   2   31    984     0:00:00.078    24:35:06.546
               Opera              1356   8   9  237  50260     0:02:15.187     2:39:50.093
               winamp             1108   8  14  268  14088     0:00:23.218     1:55:54.109
               console            3476   8   2   31   2812     0:00:03.296     1:01:36.875
               cmd                4012   8   1   30   2152     0:00:00.078     1:01:36.390
               dexplore           1032   8   5  292  10424     0:00:11.250     0:49:13.796
               notepad            4084   8   1   30   1176     0:00:00.265     0:33:25.953
               VTuneEnv           2872   8   9  510  50208     0:02:05.359     0:16:29.453
               vtunecca           1560  13   8  268  10004     0:00:00.203     0:16:24.625
               wmiapsrv           2904   8   3  151   1664     0:00:01.125     0:16:21.718
               mspaint            1144   8   5  142   7032     0:00:01.359     0:06:55.750
               wuauclt            3804   8   7  172   6784     0:00:00.281     0:02:55.296
               cmd                3392   8   1   31   2152     0:00:00.078     0:00:09.125
               pslist             1036  13   2   87    896     0:00:00.062     0:00:00.031
Let's see the thread profiling for optimization checks:


So one can see the limit checks; the blue code indicates over-utilization. Let's see the summary to check whether critical sections, i.e. mutexes, semaphores etc., are used.


After looking at the summary, things are somewhat clearer for the blue code. The examples are taken in a generalized manner to show the changes that occur in the resulting threads. With a complex system the results are different, but with optimization and profiling the code can be controlled in a sequential manner.

More opinions are required.

Regards
0kn0ck

https://www.openrce.org/blog/view/1050/Thread_Optimization_Checks_:_Code_Prominence