Hooking 32bit System Calls under WOW64

oxff: Georg Wicherski

2011-05-16 16:47:49

While hooking code in userland is fairly common for various purposes (such as sandboxing malware by API hooking), hooking system calls is usually not done in userland. Since the same information is available from hooks in kernelland (just after the transition), people usually deploy their hooks there, where they gain security and stability if the hooks are implemented properly. That being said, there is one application of system call hooking that rightfully belongs in userland: hooking 32bit system calls in a native 64bit environment.

WOW64 is the emulation / abstraction layer introduced in 64bit Windows to support 32bit applications. There are many details about it that I don't want to cover here. However, for various reasons (I'll leave it to your creativity to find your own; I found a good one while playing around with Tillmann Werner), one might be interested in hooking the 32bit system calls issued by a 32bit application running in such an environment.

On 32bit Windows XP, there used to be a function pointer within the KUSER_SHARED_DATA page at offset 0x300 that pointed to the symbol ntdll!KiFastSystemCall for any modern machine and was used in any system call wrapper in ntdll to issue a system call:

0:001> u poi(0x7ffe0000+0x300)
ntdll!KiFastSystemCall:
7c90e510 8bd4 mov edx,esp
7c90e512 0f34 sysenter
ntdll!KiFastSystemCallRet:
7c90e514 c3 ret
7c90e515 8da42400000000 lea esp,[esp]
7c90e51c 8d642400 lea esp,[esp]
ntdll!KiIntSystemCall:
7c90e520 8d542408 lea edx,[esp+8]
7c90e524 cd2e int 2Eh
7c90e526 c3 ret

Hooking this would not make much sense, since one could gather the same data right after the sysenter in kernelland.

Now fast forward to Windows 7, 64bit with a 32bit process running on WOW64. For the following, I will use the 64bit WinDbg version.

On this newer environment, the code executed by a system call wrapper, such as ntdll!ZwCreateFile in this example, does not take any indirection through KUSER_SHARED_DATA. Instead, it calls a function pointer within the TEB:

0:000:x86> u ntdll32!ZwCreateFile
ntdll32!ZwCreateFile:
77a80054 b852000000 mov eax,52h
77a80059 33c9 xor ecx,ecx
77a8005b 8d542404 lea edx,[esp+4]
77a8005f 64ff15c0000000 call dword ptr fs:[0C0h]
77a80066 83c404 add esp,4
77a80069 c22c00 ret 2Ch

This new field is called WOW32Reserved and points into wow64cpu:

    +0x0c0 WOW32Reserved    : 0x743b2320

0:000:x86> u 743b2320 L1
wow64cpu!X86SwitchTo64BitMode:
743b2320 ea1e273b743300 jmp 0033:743B271E

This is in turn a far jmp into the 64bit code segment. The code at that absolute address lives in the 64bit part of wow64cpu and sets up the 64bit stack first:

0:000> u 743B271E
wow64cpu!CpupReturnFromSimulatedCode:
00000000`743b271e 67448b0424 mov r8d,dword ptr [esp]
00000000`743b2723 458985bc000000 mov dword ptr [r13+0BCh],r8d
00000000`743b272a 4189a5c8000000 mov dword ptr [r13+0C8h],esp
00000000`743b2731 498ba42480140000 mov rsp,qword ptr [r12+1480h]

Following this, the code converts the system call specific parameters to their 64bit equivalents and then transitions to the original kernel code.
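To make the far jmp at X86SwitchTo64BitMode concrete, here is a minimal sketch in portable C that decodes such an instruction: opcode EA, followed by a 32bit offset and a 16bit selector, both little-endian. The names far_target and decode_far_jmp are made up for this illustration and are not part of any Windows API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: destination of a direct far jmp (EA ptr16:32). */
typedef struct {
    uint16_t selector;  /* code segment selector, e.g. 0x33 */
    uint32_t offset;    /* 32bit offset within that segment */
} far_target;

/* Decode "EA <offset32> <selector16>" from a byte buffer.
 * Returns 0 on success, -1 if the buffer is too short or the opcode differs. */
static int decode_far_jmp(const uint8_t *code, size_t len, far_target *out)
{
    assert(code != NULL && out != NULL);
    if (len < 7 || code[0] != 0xEA)
        return -1;
    /* Both operands are stored little-endian right after the opcode. */
    out->offset = (uint32_t)code[1]
                | ((uint32_t)code[2] << 8)
                | ((uint32_t)code[3] << 16)
                | ((uint32_t)code[4] << 24);
    out->selector = (uint16_t)(code[5] | (code[6] << 8));
    return 0;
}
```

Given the seven bytes from the dump above (ea 1e 27 3b 74 33 00), this yields selector 0x33 and offset 0x743B271E, which matches what WinDbg prints as jmp 0033:743B271E.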

So the only way to grab the unmodified 32bit system calls (and parameters), before any conversion is done, is to hook this code. My first idea was to hijack the writable function pointer inside the TEB, but that has the inconvenience that I would need to track threads and modify it for every new thread. Since this function pointer always points to the same location, I decided to go for an inline function hook. In this case, the hook is very simple, since I know that there will be one instruction that is long enough and has fixed length operands. However, we have to take into account SMP systems, where another processor might be decoding this instruction while we are writing to it, so it is desirable to use a locked write. Unfortunately, there is not enough room around the instruction to place the hook there and overwrite the original instruction with a short jmp (two bytes, can be written atomically with mov if the address is word-aligned, or with xchg in the general case).
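As a rough sketch of what the replacement bytes could look like: a five byte jmp rel32 (opcode E9) to the hook, padded to an eight byte window by keeping the three bytes that already sit behind the first five. This layout is only an assumption for illustration, not necessarily how the original trampoline was built; build_hook_image and its parameters are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Build the eight bytes to store at the hook site.
 * site: 32bit address of the instruction being overwritten
 * hook: 32bit address of the hook function
 * orig: the eight bytes currently at site
 * out:  receives E9 rel32 plus the untouched bytes 5..7 of orig */
static void build_hook_image(uint32_t site, uint32_t hook,
                             const uint8_t orig[8], uint8_t out[8])
{
    assert(orig != NULL && out != NULL);
    /* rel32 is relative to the end of the five byte jmp; unsigned
     * arithmetic wraps modulo 2^32, just like the address calculation. */
    uint32_t rel = hook - (site + 5u);
    out[0] = 0xE9;
    out[1] = (uint8_t)(rel);
    out[2] = (uint8_t)(rel >> 8);
    out[3] = (uint8_t)(rel >> 16);
    out[4] = (uint8_t)(rel >> 24);
    /* Keep whatever follows, so that the 64bit store changes only five bytes. */
    memcpy(out + 5, orig + 5, 3);
}
```

The last three bytes are never executed after an unconditional jmp, but leaving them unchanged means that the byte following the original seven byte instruction is not disturbed by the eight byte store.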

Hence we need to write our five bytes with one single locked write. There is (at least?) one instruction on x86 in 32bit mode which can do that: cmpxchg8b. Reading the processor manual, it becomes clear that we can abuse it to do an unconditional write by executing two cmpxchg8b in a row. If the first comparison fails, it loads the current memory contents into edx:eax, so the second comparison succeeds and stores ecx:ebx (assuming that no one else is writing there concurrently):

asm("cmpxchg8b (%6)\n\tcmpxchg8b (%6)"
: "=a" (* (DWORD *) origTrampoline), "=d" (* (DWORD *) &origTrampoline[4])
: "a" (* (DWORD *) trampoline), "d" (* (DWORD *) &trampoline[4]),
"b" (* (DWORD *) trampoline), "c" (* (DWORD *) &trampoline[4]),
"D" (fnX86SwitchTo64BitMode));

One can read out the original jump destination from edx:eax in between those two instructions to hotpatch the hook before it is eventually inserted. This is especially useful when a debugger is attached, as single-stepping results in the syscall trampoline being silently executed (this is great for debugger detection). The hook can then simply end in the same jmp far 0x33:?? that was present at X86SwitchTo64BitMode; one just needs to preserve esp and eax.
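The same double compare-and-swap idea can be tried out in portable C11 with <stdatomic.h>, which is handy for experimenting outside of inline assembly. This is only a sketch on an ordinary 64bit variable rather than on code bytes; swap_via_double_cas is a hypothetical name, and like the original it assumes that nobody else writes to the location concurrently.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Unconditionally store `replacement` at *site using two compare-and-swap
 * operations, and return the value that was there before. */
static uint64_t swap_via_double_cas(_Atomic uint64_t *site, uint64_t replacement)
{
    assert(site != NULL);
    uint64_t seen = replacement;
    /* First attempt: unless *site already equals `replacement`, the
     * comparison fails and `seen` is updated to the current contents. */
    atomic_compare_exchange_strong(site, &seen, replacement);
    /* Second attempt: `seen` now matches *site, so the store goes through
     * (assuming no concurrent writer, as stated above). */
    atomic_compare_exchange_strong(site, &seen, replacement);
    return seen;
}
```

On 32bit x86 a compiler would typically implement this 64bit compare-and-swap with lock cmpxchg8b, but that choice is up to the compiler. The returned value plays the role of edx:eax in the inline assembly: it holds the original eight bytes.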

Happy hooking!

Source: Hooking 32bit System Calls under WOW64
