Some notes on how to find out hidden callbacks

Can I blog an incomplete solution or an incomplete analysis? Why not! That’s the spirit of this blog entry!

More than one year ago I started a project with Kayaker: we decided to write a tool able to show hidden callbacks. If I remember correctly, the idea was born while we were putting our hands on a rootkit. I bet many other reversers were thinking the same thing in those days, because the same tool was developed by others. As you can imagine, our tool never saw the light, but not because similar tools are available online; mostly because we are two old lazy reversers!

I bet you are thinking: why the hell are you writing this stupid intro? Well, the tools I mentioned before were buggy, and a few months ago I checked again: they are still buggy (I don't know whether they have fixed their problems by now…). Strange that no one else has noticed it yet.
Anyway, we won't complete the tool, but with this blog post I would like to share some notes about our investigations. At the beginning I wanted to write a detailed and complete article about the subject, but I don't know when I'll be able to finish this project, so I decided to share some of my notes.

It's a joint effort of two minds, so credit goes to Kayaker too!

The idea is to retrieve hidden callbacks that have been installed via CmRegisterCallback, PsSetCreateProcessNotifyRoutine, PsSetCreateThreadNotifyRoutine and PsSetLoadImageNotifyRoutine. After that, it would be good to be able to deregister one or more of them.

Where to start?
First of all you have to understand what's behind functions like CmRegisterCallback and the others; then you'll have something to work on. I'll start with CmRegisterCallback (from XP SP2). The function is used to register a RegistryCallback routine, and I think the XP version is the simplest one for fully understanding the principles behind it. There are some differences between the XP and 7 versions, but I think you'll be able to understand the 7 structure too! Here is the disassembled function (without the useless parts, of course):
487E6B  push   'bcMC'                          ; Pool Tag: "CMcb" 
487E70  xor    ebx, ebx 
487E72  push   38h                             ; NumberOfBytes: 0x38 
487E74  inc    ebx 
487E75  push   ebx                             ; PoolType: PAGEDPOOL 
487E76  call   ExAllocatePoolWithTag           ; ExAllocatePoolWithTag(x,x,x): allocates pool memory 
487E7B  mov    esi, eax                        ; eax is the pointer to the allocated pool memory, PCM_CALLBACK_CONTEXT_BLOCK 
487E7D  xor    edi, edi 
487E7F  cmp    esi, edi                        ; Is PCM_CALLBACK_CONTEXT_BLOCK a NULL pointer? 
487E81  jz     cmRegisterCallback_fails        ; yes: function fails... 
487E87  push   esi 
487E88  push   [ebp+Function]                  ; PEX_CALLBACK_FUNCTION, pointer to callback function 
487E8B  call   _ExAllocateCallBack             ; allocates and fills EX_CALLBACK_ROUTINE_BLOCK structure (more on this later...) 
487E90  cmp    eax, edi                        ; ExAllocateCallback success or not? 
487E92  mov    [ebp+PEX_CALLBACK_ROUTINE_BLOCK], eax ; store the pointer to the allocated pool memory 
487E95  jnz    short _ExAllocateCallBack_success   
    ...                                         ; fill CM_CALLBACK_CONTEXT_BLOCK fields 
487EDC  mov    ebx, offset CmpCallBackVector 
487EE1  mov    [ebp+i], edi                    ; i = 0 
487EE4 try_next_slot: 
487EE4  push   edi                             ; OldBlock: NULL 
487EE5  push   [ebp+PEX_CALLBACK_ROUTINE_BLOCK] ; NewBlock with information to add 
487EE8  push   ebx                             ; CmpCallbackVector[i] 
487EE9  call   _ExCompareExchangeCallBack   ; try to *insert* the new callback inside CmpCallBack vector 
487EEE  test   al, al                          ; check the result... 
487EF0  jnz    short free_slot_has_been_found    ; jump if the vector has an empty space for the new entry 
487EF2  add    [ebp+i], 4                      ; i += 4, the counter is a byte offset 
487EF6  add    ebx, 4                          ; shift to the next item of the vector to check 
487EF9  cmp    [ebp+i], 190h                   ; end of the vector (0x190 bytes = 100 items)? 
487F00  jb     short try_next_slot             ; no: try another one. yes: no free slot!    
487F11 cmRegisterCallback_fails: 
487F16 end_CmRegisterCallback:    
487F1A  retn   0Ch    
487F1D free_slot_has_been_found: 
487F1D  mov    eax, 1 
487F22  mov    ecx, offset _CmpCallBackCount   ; CmpCallBackCount: number of not NULL item inside the vector 
487F27  xadd   [ecx], eax                      ; there's a new callback, it increases the number of item inside the vector 
487F2A  xor    eax, eax 
487F2C  jmp    short end_CmRegisterCallback
As you can see the idea behind the function is really simple!
Basically, it tries to add a new entry inside a vector named CmpCallBackVector, and when the entry is correctly inserted the registration process will end with a success.
How do I know it's using a vector? The add instruction at 0x487EF6 is a clear clue, and the cmp at 0x487EF9 reveals the fixed length of the vector: 0x190 bytes, that is 100 items (0x190/4 = 400/4 = 100). Now that I have this information I'm going to explain the entire procedure in detail. The algorithm can be divided into 5 big blocks:

1: try to allocate 0x38 bytes for a structure named CM_CALLBACK_CONTEXT_BLOCK
2: try to allocate 0x0C bytes for a structure named EX_CALLBACK_ROUTINE_BLOCK
3: fill the CM_CALLBACK_CONTEXT_BLOCK fields
4: look for an empty slot, insert a sort of PEX_CALLBACK_ROUTINE_BLOCK in it and update CmpCallBackCount
5: notify success or error and exit
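To make the five steps concrete, here is a user-mode C model of the flow. It is a sketch, not the kernel code: `model_register`, `callback_vector` and `callback_count` are hypothetical stand-ins for CmRegisterCallback, CmpCallBackVector and CmpCallBackCount, malloc replaces ExAllocatePoolWithTag, and the per-slot insert is simplified to a compare-exchange against an empty (zero) slot, without the OR 7 tagging that ExCompareExchangeCallBack performs. The loop keeps the listing's shape: a byte offset advanced by 4 until it reaches 0x190, which is 100 slots.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define VECTOR_BYTES   0x190u                /* bound of the cmp at 0x487EF9 */
#define CALLBACK_SLOTS (VECTOR_BYTES / 4u)   /* 4-byte slots on x86: 100 */

static _Atomic uintptr_t callback_vector[CALLBACK_SLOTS];  /* models CmpCallBackVector */
static atomic_int callback_count;                          /* models CmpCallBackCount */

/* Simplified slot insert (hypothetical): succeeds only if the slot is still empty. */
static bool try_insert(_Atomic uintptr_t *slot, uintptr_t block)
{
    uintptr_t expected = 0;
    return atomic_compare_exchange_strong(slot, &expected, block);
}

/* Returns 0 on success, -1 if an allocation fails or no slot is free. */
static int model_register(void *function)
{
    void *context = malloc(0x38);               /* step 1: CM_CALLBACK_CONTEXT_BLOCK */
    void **block = malloc(3 * sizeof(void *));  /* step 2: EX_CALLBACK_ROUTINE_BLOCK */
    if (context == NULL || block == NULL) {
        free(context);
        free(block);
        return -1;
    }
    block[0] = NULL;      /* RundownProtect = 0 */
    block[1] = function;  /* callback routine */
    block[2] = context;   /* step 3 (filling the context fields) is omitted here */

    /* step 4: i is a byte offset advanced by 4, like [ebp+i] in the listing */
    for (unsigned i = 0; i < VECTOR_BYTES; i += 4) {
        if (try_insert(&callback_vector[i / 4], (uintptr_t)block)) {
            atomic_fetch_add(&callback_count, 1);  /* the xadd at 0x487F27 */
            return 0;                               /* step 5: success */
        }
    }
    free(block);
    free(context);
    return -1;                                      /* step 5: vector is full */
}
```

Registering a 101st callback fails, just like the jb at 0x487F00 falling through when no slot is free.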

Point #1 is pretty simple to understand, it’s only a call to ExAllocatePoolWithTag.

To understand point #2 you have to see what's going on inside the ExAllocateCallBack procedure. Let's take a look at it:
52AB35  push   'brbC'                              ; Pool Tag: Cbrb
52AB3A  push   0Ch                                 ; NumberOfBytes: 0x0C 
52AB3C  push   1                                   ; PoolType: PAGED_POOL 
52AB3E  call   ExAllocatePoolWithTag               ; allocate an EX_CALLBACK_ROUTINE_BLOCK structure 
52AB43  test   eax, eax                            ; ExAllocatePoolWithTag success or not? 
52AB45  jz     short _ExAllocateCallBack_fails 
52AB47  mov    ecx, [ebp+_pex_callback_function]   ; pointer to callback function (PEX_CALLBACK_FUNCTION) 
52AB4A  and    dword ptr [eax], 0                  ; 1st field: 0 
52AB4D  mov    [eax+4], ecx                        ; 2nd field: _pex_callback_function 
52AB50  mov    ecx, [ebp+_pool_allocated_memory]   ; PCM_CALLBACK_CONTEXT_BLOCK 
52AB53  mov    [eax+8], ecx                        ; 3rd field: _pcm_callback_context_block 
52AB56 _ExAllocateCallBack_fails:   
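The procedure is used to allocate and fill a special structure:

typedef struct _EX_CALLBACK_ROUTINE_BLOCK {
       EX_RUNDOWN_REF             RundownProtect;
       PEX_CALLBACK_FUNCTION      Function;
       PVOID                      Context;
} EX_CALLBACK_ROUTINE_BLOCK, *PEX_CALLBACK_ROUTINE_BLOCK;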
As you can see from the lines above, the first field is set to 0, while the other two fields are filled with pointers: the function to register and the context containing info about the callback.
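For experimenting outside the kernel, the structure and the work done by ExAllocateCallBack can be mirrored as below. The MODEL_* names and model_allocate_callback are hypothetical; EX_RUNDOWN_REF is modelled as a pointer-sized integer and malloc stands in for ExAllocatePoolWithTag. On a 32-bit build the three pointer-sized fields add up to 3 * 4 = 0x0C bytes, the size pushed at 0x52AB3A.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical user-mode stand-ins for the kernel types. */
typedef uintptr_t MODEL_RUNDOWN_REF;            /* EX_RUNDOWN_REF is pointer-sized */
typedef void (*MODEL_CALLBACK_FUNCTION)(void);  /* stands in for PEX_CALLBACK_FUNCTION */

typedef struct MODEL_CALLBACK_ROUTINE_BLOCK {
    MODEL_RUNDOWN_REF       RundownProtect;  /* 1st field: set to 0 */
    MODEL_CALLBACK_FUNCTION Function;        /* 2nd field: routine to call */
    void                   *Context;         /* 3rd field: CM_CALLBACK_CONTEXT_BLOCK */
} MODEL_CALLBACK_ROUTINE_BLOCK;

/* Mirrors the fill sequence at 0x52AB4A..0x52AB53. */
static MODEL_CALLBACK_ROUTINE_BLOCK *model_allocate_callback(MODEL_CALLBACK_FUNCTION fn,
                                                             void *ctx)
{
    MODEL_CALLBACK_ROUTINE_BLOCK *b = malloc(sizeof *b);
    if (b == NULL)
        return NULL;          /* like the jz to _ExAllocateCallBack_fails */
    b->RundownProtect = 0;
    b->Function = fn;
    b->Context = ctx;
    return b;
}
```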

Point #3 is just a series of mov instructions used to fill the CM_CALLBACK_CONTEXT_BLOCK structure, while point #4 gives us some useful information: CmpCallBackVector has 100 elements, and this part of the code scans the vector until an empty element is found. If no empty element is found, the callback is not registered. What happens when there's an empty slot inside the vector? The new entry is added to the vector. Most of the job is done by the function named ExCompareExchangeCallBack; here is the core of the function:

52AB81  mov    eax, [ebp+CmpCallbackVector]    ; vector at the current position 
52AB84  mov    ebx, [eax]                      ; ebx is a PEX_CALLBACK_ROUTINE_BLOCK, the item could be NULL or not 
52AB86  mov    eax, ebx 
52AB88  xor    eax, [ebp+OldBlock]             ; OldBlock is NULL for a registration process 
52AB8B  mov    [ebp+current_pex_callback_routine_block], ebx 
52AB8E  cmp    eax, 7                          ; check used to see if the current item is NULL or not 
52AB91  ja     short loc_52ABB5                ; jump if not NULL 
52AB93  test   esi, esi                        ; is NewBlock NULL? 
52AB95  jz     short loc_52ABA1                ; jump if it's NULL 
52AB97  mov    eax, esi                        ; esi, NewBlock pointer (changed...) 
52AB99  or     eax, 7                          ; PAY ATTENTION HERE: or 7 !?! 
52AB9C  mov    [ebp+NewBlock], eax             ; change NewBlock pointer: NewBlock = NewBlock OR 7 
52AB9F  jmp    short loc_52ABA5    
52ABA5  mov    eax, [ebp+var_4]               ; here if CmpCallbackVector's item is null 
52ABA8  mov    ecx, [ebp+CmpCallbackVector]    ; current empty slot 
52ABAB  mov    edx, [ebp+NewBlock]             ; new pointer to insert 
52ABAE  cmpxchg [ecx], edx                     ; insert the new pointer inside the empty slot! 
52ABB1  cmp    eax, ebx 
52ABB3  jnz    short loc_52AB81 
52ABB5  and    ebx, not 7                     ; PAY ATTENTION HERE! 
52ABB8  cmp    ebx, [ebp+OldBlock]            ; here if CmpCallbackVector's item is not null 
52ABBB  jnz    short loc_52AC19 
52ABBD  test   ebx, ebx 
52ABBF  jz     short loc_52AC15
The routine does a few more things, but we can stop the analysis here because we have everything we need. If the NewBlock pointer is not NULL and there's an empty slot available, the pointer is inserted into the vector; after that, the CmpCallBackCount value is updated (remember the snippet at the beginning of this blog entry?).
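The slot update can be sketched with C11 atomics. This is a hypothetical user-mode model of the behaviour visible in the listing, not the real ExCompareExchangeCallBack: it ignores rundown protection and the ExReleaseRundownProtectionEx path, and it assumes blocks are 8-byte aligned so the low three bits of a pointer are free for the OR 7 tag. As far as I can tell from the Windows Research Kernel sources, these slots are EX_FAST_REF values whose low bits hold a small reference count.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define TAG_BITS ((uintptr_t)7)   /* bits touched by the OR 7 / AND NOT 7 pair */

/* Hypothetical model: replace old_block with new_block in *slot, true on success. */
static bool model_compare_exchange_callback(_Atomic uintptr_t *slot,
                                            uintptr_t new_block,
                                            uintptr_t old_block)
{
    uintptr_t desired = new_block ? (new_block | TAG_BITS) : 0;  /* 0x52AB99: or eax, 7 */
    uintptr_t current = atomic_load(slot);

    for (;;) {
        /* 0x52AB88..0x52AB8E: (current XOR OldBlock) > 7 means the slot holds something else */
        if ((current ^ old_block) > TAG_BITS)
            return false;                                 /* 0x52AC27: al = 0 */
        /* 0x52ABAE: cmpxchg; on failure 'current' is reloaded and we retry */
        if (atomic_compare_exchange_strong(slot, &current, desired))
            return true;                                  /* 0x52AC15: al = 1 */
    }
}
```

In this model, calling it with new_block = 0 and old_block set to a registered block clears the slot, which is the removal direction of the same primitive.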

The last part of the algorithm (point #5) is a simple return with a success or failure value:

52AC15 mov    al, 1                          ; 1 means success, new item has been added to CmpCallbackVector 
52AC17 jmp    short loc_52AC29 
52AC19 test   esi, esi                      ; esi -> NewBlock 
52AC1B jz     short loc_52AC27 
52AC1D push   8 
52AC1F pop    edx 
52AC20 mov    ecx, esi 
52AC22 call   ExReleaseRundownProtectionEx   ; if esi is not null something went wrong... 
52AC27 xor    al, al                         ; 0 means failure, new item has not been added to CmpCallbackVector
OK, I think we have a general idea about the vector: each entry contains a *sort of* pointer to an EX_CALLBACK_ROUTINE_BLOCK, and to reveal all of them you only have to scan the entire vector!

To sum up, there are 3 possible scenarios:
1. CmpCallbackVector's item is empty:
the new block is inserted into the vector. The stored value is not the one passed to ExCompareExchangeCallBack, but that value modified by an "OR 7" operation.
2. CmpCallbackVector's item is occupied:
ExCompareExchangeCallBack simply returns FALSE (al = 0) and CmRegisterCallback tries the next item of the vector.
3. Someone is working on the CmpCallbackVector's item:
the registration process reveals an interesting behaviour: to be sure it is the only one accessing the resource, the system uses a lock mechanism. The OR and AND operations are the core of that mechanism (0x52AB99 and 0x52ABB5, marked with "PAY ATTENTION HERE!"). If the current item of the vector is not NULL, the check at 0x52AB8E fails and the code flow continues from 0x52ABB5. At this point the real address of the item is extracted (stored_value AND NOT 7) and compared with OldBlock (NULL); it's obviously not NULL, so the insertion fails and, as you can see around 0x52AC22, ExReleaseRundownProtectionEx is called on NewBlock. Now you should understand why the hell the system ORs the value with 7 before adding it to the vector.

With all this information I can finally write a routine that reads all the stored callbacks:
// 32-bit (x86) layout, as in the XP SP2 listings above
cells = 0x64;                    // cells inside CmpCallbackVector
nMod = *(DWORD*)_sysmodBuffer;   // _sysmodBuffer filled by "ZwQuerySystemInformation(SystemModuleInformation..."
for (i = 0; i < cells; i++)      // scan ALL the cells, don't stop at an empty one
{
   // take current item from CmpCallbackVector (look at the "& ~7" operation)
   pCBRB = (PEX_CALLBACK_ROUTINE_BLOCK)((*(DWORD*)(_CmpCallbackVectorAddress + 4*i)) & ~7);
   if (pCBRB != 0)
   {
      sysmodTmp = (PSYSTEM_MODULE_INFORMATION)((DWORD)_sysmodBuffer + 4);   // first module entry
      for (j = 0; j < nMod; j++, sysmodTmp++)                                // get the next module
      {
         if (((DWORD)pCBRB->Function >= (DWORD)sysmodTmp->Base) &&
             ((DWORD)pCBRB->Function < (DWORD)sysmodTmp->Base + (DWORD)sysmodTmp->Size))
         {
            // Callback has been found
            DbgPrint("Result: %08lX: %s\r\n", (ULONG)pCBRB->Function, sysmodTmp->ImageName);
            break;
         }
      }
   }
}
It's important to scan all the cells inside the vector! One of the tools available on the web fails to retrieve callbacks stored after an empty element of the vector.
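That bug is easy to reproduce in a user-mode model of the routine above, whose loop never stops at an empty slot. MODEL_ROUTINE_BLOCK, MODEL_MODULE and model_enum_callbacks are hypothetical stand-ins for EX_CALLBACK_ROUTINE_BLOCK, the SYSTEM_MODULE_INFORMATION entries and the kernel routine; the blocks are forced to 8-byte alignment so that masking with ~7 gives back the original pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel structures used by the routine above. */
typedef struct { _Alignas(8) uintptr_t rundown; uintptr_t function; uintptr_t context; } MODEL_ROUTINE_BLOCK;
typedef struct { uintptr_t base; uint32_t size; const char *name; } MODEL_MODULE;

/* Walks every slot, strips the tag bits and reports the module owning each callback. */
static size_t model_enum_callbacks(const uintptr_t *vector, size_t cells,
                                   const MODEL_MODULE *mods, size_t nmods)
{
    size_t found = 0;
    for (size_t i = 0; i < cells; i++) {
        const MODEL_ROUTINE_BLOCK *b =
            (const MODEL_ROUTINE_BLOCK *)(vector[i] & ~(uintptr_t)7);
        if (b == NULL)
            continue;                         /* empty slot: keep scanning */
        for (size_t j = 0; j < nmods; j++) {
            if (b->function >= mods[j].base && b->function < mods[j].base + mods[j].size) {
                printf("Result: %#lx: %s\n", (unsigned long)b->function, mods[j].name);
                found++;
                break;
            }
        }
    }
    return found;
}
```

With callbacks in slots 1 and 4 and empty slots around them, both are reported; a scanner that breaks on the first NULL would report neither, since slot 0 is empty.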

Well, the only missing piece in the code above is _CmpCallbackVectorAddress, the address of CmpCallBackVector. How can I locate its exact address? IMHO, that's the hardest part of the entire process!

How to find CmpCallbackVector address
Writing a tool for a single OS version is pretty easy because the vector's address is hardcoded; it would be nice to find an OS-independent technique.
I think the most common approach is a byte search based on a specific sequence of bytes; it's a nice idea, but I don't want to list every OS version known to man inside my source code. Kayaker and I spent a lot of time on this point: we both wanted to develop something that is not tied to a specific OS version, something that doesn't require a series of "if OS == xxx" statements inside the code. It's practically impossible to write fully OS-independent code, but I believe it's possible to remove some of the OS checks.

We finally came up with two ideas, a practical one and a theoretical one. I hate theory, so mine is the practical solution of course. I think both ideas are valid, and to be sure we find the right vector address we decided to combine them in a hypothetical tool: four eyes are always better than two!

The practical approach
My idea is really simple: since the vector's address is hardcoded, you'll surely find it in two different parts of the code:
PAGE:005392D0   BB 20 05 48 00   mov    ebx, offset _CmpCallBackVector 
.data:00480520                   _CmpCallBackVector db    0
The address appears in two sections: as an immediate operand in PAGE, and as the location of the vector itself in .data. An *xref-search* is the core of the idea! It's a pretty simple trick, but from what I've seen so far it works!
Here is the pseudocode of my xref search; basically it scans the entire PAGE section trying to locate the right address:
callbackAddress = CmUnregisterCallback address in memory 
pagePointer = pointer_to_PAGE_section 
while (pagePointer < pointer_to_PAGE_section + size_of_PAGE_section) 
   value = get dword pointed by pagePointer    
   if (value is inside DATA section)       
      if ((pagePointer > callbackAddress) && (pagePointer < callbackAddress + range))       
         CmpCallbackVector = value      
   pagePointer = pagePointer + 1      // advance one byte: the operand is not dword-aligned
As you can imagine, a plain xref-search can't single out the right value (plenty of dwords in PAGE can happen to point into .data), so you need one more check. That's why I added the line:

if ((pagePointer > callbackAddress) && (pagePointer < callbackAddress + range))
where callbackAddress is the address of CmUnregisterCallback. What does it mean? Well, pagePointer must fall within the first "range" bytes of the CmUnregisterCallback function. If both "if" statements are satisfied, I'm pretty sure the value is the vector's address.
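Here is the pseudocode turned into a small C function that works on a copy of the PAGE section. It is a sketch under stated assumptions: model_find_vector and its parameters are hypothetical, addresses are 32-bit (x86), operands are decoded as little-endian bytes, and the search returns the first match instead of continuing to the end of the section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical xref search.
 * page/page_size : copy of the PAGE section, loaded at page_va
 * data_lo/hi     : [start, end) of the .data section
 * func_va        : address of CmUnregisterCallback
 * range          : how far past func_va the reference may appear
 * Returns the candidate vector address, or 0 if nothing matched.
 */
static uint32_t model_find_vector(const uint8_t *page, size_t page_size, uint32_t page_va,
                                  uint32_t data_lo, uint32_t data_hi,
                                  uint32_t func_va, uint32_t range)
{
    for (size_t off = 0; off + 4 <= page_size; off++) {   /* byte steps: operands are unaligned */
        uint32_t ptr_va = page_va + (uint32_t)off;
        uint32_t value = (uint32_t)page[off] | (uint32_t)page[off + 1] << 8 |
                         (uint32_t)page[off + 2] << 16 | (uint32_t)page[off + 3] << 24;
        if (value >= data_lo && value < data_hi &&         /* points into .data */
            ptr_va > func_va && ptr_va < func_va + range)  /* near CmUnregisterCallback */
            return value;
    }
    return 0;
}
```

Fed with the XP bytes of CmUnregisterCallback listed in this post (loaded at 0x5392C3) and a hypothetical .data range containing 0x480520, a range of 16 finds 0x480520, while a range of 14 finds nothing, because the operand starts exactly 14 bytes into the function.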

There are still 2 points to clarify:
- what is the range variable?
- why CmUnregisterCallback?

range is just a number, and you'll have to choose the value to assign to it. Under XP the first bytes of the CmUnregisterCallback function are:

PAGE:005392C3 8B FF           mov    edi, edi 
PAGE:005392C5 55              push   ebp 
PAGE:005392C6 8B EC           mov    ebp, esp 
PAGE:005392C8 51              push   ecx 
PAGE:005392C9 83 65 FC 00     and    [ebp+var_4], 0 
PAGE:005392CD 53              push   ebx 
PAGE:005392CE 56              push   esi 
PAGE:005392CF 57              push   edi 
PAGE:005392D0 BB 20 05 48 00  mov    ebx, offset _CmpCallBackVector
In this specific case 16 is a possible value: the operand of the mov at 0x5392D0 starts at 0x5392D1, 14 bytes after the start of the function. What about the other OSs? Well, as I said before, I think it's hard to write a universal piece of code, but as far as I have seen it's possible to adjust "range" to cover some more OSs. I don't have Vista and 7 running on my system and I'm working on the dead listing only, but I think 148 could be a good value and it should cover all the OSs. If you are still reading and you have Vista or 7, can you confirm that?
One more thing about the search pattern: I use CmUnregisterCallback because (inspecting all the OSs) CmRegisterCallback doesn't always reference CmpCallbackVector inside the main routine; sometimes the reference is hidden under some calls. For example, look at CmRegisterCallback from 7:
PAGE:0065712A mov  edi, edi 
PAGE:0065712C push ebp 
PAGE:0065712D mov  ebp, esp 
PAGE:0065712F push [ebp+Cookie] 
PAGE:00657132 mov  eax, offset stru_4FFDF0 
PAGE:00657137 push 1 
PAGE:00657139 push [ebp+Context] 
PAGE:0065713C push [ebp+Function] 
PAGE:0065713F call sub_657153                 ; It's everything inside this call!!! 
PAGE:00657144 pop  ebp 
PAGE:00657145 retn 0Ch
It's much more complex to attack a procedure with sub-routines, don't you think? That's why I opted for CmUnregisterCallback.

What about the PsSet* functions?
At the beginning of this blog post I mentioned some more functions; it's time to spend a few words on them too.

The functions are:

- PsSetCreateProcessNotifyRoutine
- PsSetCreateThreadNotifyRoutine
- PsSetLoadImageNotifyRoutine

There are some similarities between CmRegisterCallback and these three functions: they all register something, they all use a vector to store the information, and they all use the same function! YES, to register a function they all use the same scheme:

1. get the address of a specific vector
2. try to insert the new item inside the vector calling ExCompareExchangeCallBack

Just to clarify everything look at this snippet, taken from PsSetCreateThreadNotifyRoutine:

4ED7C4  mov    esi, offset _threadVector   ; the vector 
4ED7C9  push   0 
4ED7CB  push   ebx 
4ED7CC  push   esi 
4ED7CD  call   _ExCompareExchangeCallBack   ; the function 
4ED7D2  test   al, al 
4ED7D4  jnz    short loc_4ED7F3 
4ED7D6  add    edi, 4 
4ED7D9  add    esi, 4 
4ED7DC  cmp    edi, 20h   ; the check over the number of items inside the vector 
4ED7DF  jb     short loc_4ED7C9
The only difference is the length of the vector:
_callbackVector: 0x64 slots
_processVector: 0x8 slots
_threadVector: 0x8 slots
_imageVector: 0x8 slots
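Since only the vector address and its length change, the enumeration can be driven by a small table. The labels are the ones used in this post; VECTOR_INFO and loop_bound_bytes are hypothetical helpers, 4-byte (x86) slots are assumed, and only the 0x190 and 0x20 bounds are confirmed by the listings shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor: one entry per callback vector. */
typedef struct { const char *label; unsigned slots; } VECTOR_INFO;

static const VECTOR_INFO vectors[] = {
    { "_callbackVector", 0x64 },   /* CmRegisterCallback */
    { "_processVector",  0x8  },   /* PsSetCreateProcessNotifyRoutine */
    { "_threadVector",   0x8  },   /* PsSetCreateThreadNotifyRoutine */
    { "_imageVector",    0x8  },   /* PsSetLoadImageNotifyRoutine */
};

/* Byte bound each registration loop compares against (cmp ..., 190h / 20h). */
static unsigned loop_bound_bytes(const VECTOR_INFO *v)
{
    return v->slots * 4u;   /* 4-byte slots on x86 */
}
```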

Well, you can use all the info I gave you about CmRegisterCallback for these three functions too! I think you'll be able to retrieve all the hidden callbacks and, if needed, unregister a callback. There are many ways, from the dirty one (put NULL inside the vector's slot) to the right one (calling the right unregister function)… you only have to decide!
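As an illustration of the dirty approach, here is a user-mode model that finds the slot whose block points to a given function and swaps it for NULL. MODEL_ROUTINE_BLOCK and model_dirty_unregister are hypothetical; unlike the proper unregister functions, this does not wait for callbacks that are already running, does not free the block and leaves the callback counter untouched, which is exactly what makes it dirty.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for EX_CALLBACK_ROUTINE_BLOCK, 8-byte aligned like pool memory. */
typedef struct { _Alignas(8) uintptr_t rundown; uintptr_t function; uintptr_t context; } MODEL_ROUTINE_BLOCK;

/* Clears the first slot whose block points to 'function'; true if a slot was cleared. */
static bool model_dirty_unregister(_Atomic uintptr_t *vector, size_t cells, uintptr_t function)
{
    for (size_t i = 0; i < cells; i++) {
        uintptr_t item = atomic_load(&vector[i]);
        const MODEL_ROUTINE_BLOCK *b = (const MODEL_ROUTINE_BLOCK *)(item & ~(uintptr_t)7);
        if (b != NULL && b->function == function)
            /* only clear the slot if nobody changed it in the meantime */
            return atomic_compare_exchange_strong(&vector[i], &item, (uintptr_t)0);
    }
    return false;
}
```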
