Nytro Posted September 22, 2019 Report Posted September 22, 2019 Writeup for the BFS Exploitation Challenge 2019 Table of Contents Introduction TL;DR Initial Dynamic Analysis Statically Identifying the Vulnerability Strategy Preparing the Exploit Building a ROP Chain See Exploit in Action Contact Introduction Having enjoyed and succeeded in solving a previous BFS Exploitation Challenge from 2017, I've decided to give the 2019 BFS Exploitation Challenge a try. It is a Windows 64 bit executable for which an exploit is expected to work on a Windows 10 Redstone machine. The challenge's goals were set to: Bypass ASLR remotely Achieve arbitrary code execution (pop calc or notepad) Have the exploited process properly continue its execution TL;DR Spare me all the boring details, I want to grab a copy of the challenge study the decompiled code study the exploit Initial Dynamic Analysis Running the file named 'eko2019.exe' opens a console application that seemingly waits for and accepts incoming connections from (remote) network clients. Quickly checking out the running process' security features using SysinternalsProcess Explorer shows that DEP and ASLR are enabled, but Control Flow Guard is not. Good. Further checking out the running process dynamically using tools such as SysinternalsTCPView, Process Monitor or simply running netstat could have been an option right now, but personally I prefer diving directly into the code using my static analysis tool of choice,IDA Pro (I recommended following along with your favourite disassembler / decompiler). Statically Identifying the Vulnerability Having disassembled the executable file and looking at the list of identified functions, the maximum number of functions that need to be analyzed for weaknesses was as little as 17 functions out of 188 in total - with the remaining ones being known library functions, imported functions and the main() function itself. Navigating to and running the disassembled code's main() function through the Hex-Rays decompiler and putting some additional effort into renaming functions, variables and annotating the code resulted in the following output: By looking at the code and annotations shown in the screenshot above, we can see there is a call to a function in line 19 which creates a listening socket on TCP port 54321, shortly followed by a call to accept() in line 27. The socket handle returned by accept() is then passed as an argument to a function handle_client() in line 36. Keeping in mind the goals of this challenge, this is probably where the party is going to happen, so let's have a look at it. As an attacker, what we are going to look for and concentrate on are functions within the server's executable code that process any kind of input that is controlled client-side. All with the goal in mind of identifying faulty program logic that hopefully can be taken advantage of by us. In this case, it is the two calls to the recv() function in lines 21 and 30 in the screenshot above which are responsible for receiving data from a remote network client. The first call to recv() in line 21 receives a hard-coded number of 16 bytes into a "header" structure. It consists of three distinct fields, of which the first one at offset 0 is "magic", a second at offset 8 is"size_payload" and the third is unused. By accessing the "magic" field in line 25 and comparing it to a constant value "Eko2019", the server ensures basic protocol compatibility between connected clients and the server. Any client packet that fails in complying with this magic constant as part of the "header" packet is denied further processing as a consequence. By comparing the "size_payload" field of the "header" structure to a constant value in line 27, the server limits the field's maximum allowed value to 512. This is to ensure that a subsequent call torecv() in line 30 receives a maximum number of 512 bytes in total. Doing so prevents the destination buffer "buf" from being written to beyond its maximum size of 512 bytes - too bad! If this sanity check wasn't present, it would have allowed us to overwrite anything that follows the"buf" buffer, including the return address to main() on the stack. Overwriting the saved return address could have resulted in straightforward and reliable code execution. Skimming through this function's remaining code (and also through all the other remaining functions) doesn't reveal any more code that'd process client-side input in any obviously dangerous way, either. So we must probably have overlooked something and -yes you guessed it- it's in the processing of the "pkthdr" structure. A useful pointer to what the problem could be is provided by the hint window that appears as soon as the mouse is hovered over the comparison operator in line 27. As it turns out, it is a signed integer comparison, which means the size restriction of 512 can successfully be bypassed by providing a negative number along with the header packet in "size_payload"! Looking further down the code at line 30, the "size_payload" variable is typecast to a 16 bit integer type as indicated by the decompiler's LOWORD() macro. Typecasting the 32 bit "size_payload" variable to a 16 bit integer effectively cuts off its upper 16 bits before it is passed as a size argument to recv(). This enables an attacker to cause the server to accept payload data with a size of up to 65535 bytes in total. Sending the server a respectively crafted packet effectively bypasses the intended size restriction of 512 bytes and successfully overwrites the "buf" variable on the stack beyond its intended limits. If we wanted to verify the decompiler's results or if we refrained from using a decompiler entirely because we preferred sharpening or refreshing our assembly comprehension skills instead, we could just as well have a look at the assembler code: the "jle" instruction indicates a signed integer comparison the "movzx eax, word ptr..." instruction moves 16 bits of data from a data source to a 32 bit register eax, zero extending its upper 16 bits. Alright, before we can start exploiting this vulnerability and take control of the server process' instruction pointer, we need to find a way to bypass ASLR remotely. Also, by checking out thehandle_client() function's prologue in the disassembly, we can see there is a stack cookie that will be checked by the function's epilogue which eventually needs to be taken care of . Strategy In order to bypass ASLR, we need to cause the server to leak an address that belongs to its process space. Fortunately, there is a call to the send() function in line 45, which sends 8 bytes of data, so exactly the size of a pointer in 64 bit land. That should serve our purpose just fine. These 8 bytes of data are stored into a _QWORD variable "gadget_buf" as the result of a call to theexec_gadget() function in line 44. Going further up the code to line 43, we can see self-modifying code that uses theWriteProcessMemory() API function to patch the exec_gadget() function with whatever data"gadget_buf" contains. The "gadget_buf" variable in turn is the result of a call to the copy_gadget() function in line 41 which is passed the address of a global variable "g_gadget_array" as an argument. Looking at the copy_gadget() function's decompiled code reveals that it takes an integer argument, swaps its endianness and then returns the result to the caller. In summary, whatever 8 bytes the "g_gadget_array" at position "gadget_idx % 256" points to will be executed by the call to exec_gadget() and its result is then sent back to the connected client. Looking at the cross references to "g_gadget_array" which is only initialized during run-time, we can find a for loop that initializes 256 elements of the array "g_gadget_array" as part of the server's main() function: Going back to the handle_client() function, we find that the "gadget_idx" variable is initialized with 62, which means that a gadget pointed to by "p_gadget_array[62]" is executed by default. The strategy is getting control of the "gadget_idx" variable. Luckily, it is a stack variable adjacent to the "buf[512]" variable and thus can be written to by sending the server data that exceeds the "buf" variable's maximum size of 512 bytes. Having "gadget_idx" under control allows us to have the server execute a gadget other than the default one at index 62 (0x3e). In order to be able to find a reasonable gadget in the first place, I wrote a little Python script that mimics the server's initialization of "g_gadget_array" and then disassembles all its 256 elements using the Capstone Engine Python bindings: I spent quite some time reading the resulting list of gadgets trying to find a suitable gadget to be used for leaking a qualified pointer from the running process, but with partial success only. Knowing I must have been missing something, I still settled with a gadget that would manage to leak the lower 32 bits of a 64 bit pointer only, for the sake of progressing and then fixing it the other day: Using this gadget would modify the pointer that is passed to the call to exec_gadget(), making it point to a location other than what the "p" pointer usually points to, which could then be used to leak further data. Based on working around some limitations by hard-coding stuff, I still managed to develop quite a stable exploit including full process continuation. But it was only after a kind soul asked me whether I hadn't thought of reading from the TEB that I got on the right track to writing an exploit that is more than just quite stable. Thank you Preparing the Exploit The TEB holds vital information that can be used for bypassing ASLR, and it is accessed via the gs segment register on 64 bit Windows systems. Looking through the list of gadgets for any occurence of "gs:" yields a single hit at index 0x65 of the"g_gadget_array" pointer. Acquiring the current thread's TEB address is possible by reading from gs:[030h]. In order to have the gadget that is shown in the screenshot above to do so, the rcx register must first be set to 0x30. The rcx register is the first argument to the exec_gadget() function, which is loaded from the "p" variable on the stack. Like the "gadget_idx variable", "p" is adjacent to the overflowable buffer, hence overwritable as well. Great. By sending a particularly crafted sequence of network packets, we are now given the ability to leak arbitrary data of the server thread's TEB structure. For example, by sending the following packet to the server, gadget number 0x65 will be called with rcx set to 0x30. [0x200*'A'] + ['\x65\x00\x00\x00\x00\x00\x00\x00'] + ['\x30\x00\x00\x00\x00\x00\x00\x00'] Sending this packet will overwrite the target thread's following variables on the stack and will cause the server to send us the current thread's TEB address: [buf] + [gadget_idx] + [p] The following screenshot shows the Python implementation of the leak_teb() function used by the exploit. With the process' TEB address leaked to us, we are well prepared for leaking further information by using the default gagdet 62 (0x3e), which dereferences arbitrary 64 bits of process memory pointed to by rcx per request: In turn, leaking arbitrary memory allows us to bypass DEP and ASLR identify the stack cookie's position on the stack leak the stack cookie locate ourselves on the stack eventually run an external process In order to bypass ASLR, the "ImageBaseAddress" of the target executable must be acquired from the Process Environment Block which is accessible at gs:[060h]. This will allow for relative addressing of the individual ROP gadgets and is required for building a ROP chain that bypasses Data Execution Prevention. Based on the executable's in-memory "ImageBaseAddress", the address of the WinExec() API function, as well as the stack cookie's xor key can be leaked. What's still missing is a way of acquiring the stack cookie from the current thread's stack frame. Although I knew that the approach was faulty, I had initially leaked the cookie by abusing the fact that there exists a reliable pointer to the formatted text that is created by any preceding call to the printf() function. By sending the server a packet that solely consisted of printable characters with a size that would overflow the entire stack frame but stopping right before the stack cookie's position, the call to printf() would leak the stack cookie from the stack into the buffer holding the formatted text whose address had previously been acquired. While this might have been an interesting approach, it is an approach that is error-prone because if the cookie contained any null-bytes right in the middle, the call to printf() will make a partial copy of the cookie only which would have caused the exploit to become unreliable. Instead, I've decided to leak both "StackBase" and "StackLimit" from the TIB which is part of the TEB and walk the entire stack, starting from StackLimit, looking for the first occurence of the saved return address to main(). Relative from there, the cookie that belongs to the handle_client() function's stack frame can be addressed and subsequently leaked to our client. Having a copy of the cookie and a copy of the xor key at hand will allow the rsp register to be recovered, which can then be used to build the final ROP chain. Building a ROP Chain Now that we know how to leak all information from the vulnerable process that is required for building a fully working exploit, we can build a ROP chain and have it cause the server to pop calc. Using ROPgadget, a list of gadgets was created which was then used to craft the following chain: The ROP chain starts at "entry_point", which is located at offset 0x230 of the vulnerable function's "buf" variable and which previously contained the orignal return address to main(). It loads "ptr_to_chain" at offset 0x228 into the rsp register which effectively lets rsp point into the next gadget at 2.). Stack pivoting is a vital step in order to avoid trashing the caller's stack frame. Messing up the caller's frame would risk stable process continuation This gadget loads the address of a "pop rax" gadget into r12 in preparation for a "workaround" that is required in order to compensate for the return address that is pushed onto the stack by the call r12 instruction in 4.). A pointer to "buf" is loaded into rax, which now points to the "calc\0" string The pointer to "calc\0" is copied to rcx which is the first argument for the subsequent API call to WinExec() in 5.). The call to r12 pushes a return address on the stack and causes a "pop rax" gadget to be executed which will pop the address off of the stack again This gadget causes the WinExec() API function to be called The call to WinExec() happens to overwrite some of our ROP chain on the stack, hence the stack pointer is adjusted by this gadget to skip the data that is "corrupted" by the call to WinExec() The original return address to main()+0x14a is loaded into rax rbx is loaded with the address of "entry_point" The original return address to main()+0x14a is restored by patching "entry_point" on the stack -> "mov qword ptr [entry_point], main+0x14a". After that, rsp is adjusted, followed by a few dummy bytes rsp is adjusted so it will slowly slide into its old position at offset 0x230 of"buf", in order to return to main() and guarantee process continuation see 10.) see 10.) see 10.) See Exploit in Action Contact Twitter Sursa: https://github.com/patois/BFS2019 1 Quote