Nytro Posted April 15, 2020 Report Posted April 15, 2020 Interactive guide to Buffer Overflow exploitation December 16, 2019 Written by: Vetle Økland Back to blog A Buffer Overflow is a bug class in a program typically written in a memory unsafe language like C or C++. Buffer Overflow bugs from user-input can often allow someone to overwrite some data in memory they weren't supposed to. Before we dive into how to exploit Buffer Overflow bugs, we will do a quick introduction to Assembly. Assembly is a language that describes raw machine code. Assembly is the lowest level programming language, and the processor executes exactly what you write in Assembly. This means that we can also directly translate machine code bits and bytes back to Assembly without losing any information. Languages like C, C++, Rust, etc. on the other hand translate what you write to machine code, which we can translate back to Assembly, but we can't translate it back to exactly what the code was in C, C++ or Rust, only approximations based on the Assembly. For all intents and purposes, Assembly is the exact code that the processor executes. Let's jump into a little interactive primer on Assembly. First of all, in Assembly we don't really have variables in the sense that we have in JavaScript, C, Rust, Go, etc. Instead, we have a set amount of registers that can store one value at a time. On a 64-bit system, these registers can store up to 64 bits of values (e.g. number from 0 to 18 446 744 073 709 551 615), on a 32-bit system registers can store up to 32-bit of values (0 to 4 294 967 295). Some of the registers in Intel architectures are named RAX, RDI, RBX, these are general purpose registers that you can pretty much use for whatever you want. Then there are others like RIP and RSP which control the address of the next instruction we should execute is in memory and address to the stack (more on that later), respectively. Underneath you will see what happens when we execute mov {register}, {value} instructions. mov is a "command" that tells the processor to store (or "move") values into a register or memory address. When you step through the program you can see that for each mov instruction the register updates with the value specified in the mov instruction. Most values in these examples are represented as HEX values. main: 0xFA2 mov eax, 0x1337 0xFA7 mov ebx, 0x7331 0xFAC mov edi, 0x1995 0xFB1 call exit RAX0x00000000RBX0x00000000RDI0x00000000 Note that eax and rax are two different names for the same register. You can read about that here if you're interested in learning more about it. Most processor architectures have a concept of a "stack". The stack is a quick place to store temporary values in memory (RAM). 64-bit systems store 64 bits (8 bytes) at a time in a growing stack, and when you go to fetch the values you fetch them 64 bits at a time, or 8 bytes. You store a value on the stack by using instruction push {register name} and you fetch a value into a register by using instruction pop {register name}. There is a special register RSP that has the value (pointer) of the memory address of the last value on the stack. This means RSP should always point to the last value you pushed to the stack. If you pop (fetch) a value from the stack, RSP decreases by 8, because we always push (store) and pop (fetch) values in 8-byte increments or decrements. main: 0xFA9 mov eax, 0x13371337 0xFAE push rax 0xFAF pop rdi 0xFB0 call exit RAX0x00000000RDI0x00000000RSP0x00005000 Here we set RAX to the HEX value 0x13371337, we push that value to the stack and then pop it into RDI On the right side underneath the registers you can see the stack, each box is 1 byte, each row is 1 8-byte value. On the top of each box you can see the memory address of the byte, on the bottom is the byte's value and in the middle you can see the ASCII representation of the value. Here's how it looks if you push and pop multiple values. main: 0xFA5 mov eax, 0x13371337 0xFAA push rax 0xFAB push rax 0xFAC push rax 0xFAD pop rdi 0xFAE pop rbx 0xFAF pop rsi 0xFB0 call exit RAX0x00000000RDI0x00000000RBX0x00000000RSI0x00000000RSP0x00005000 There's another special register called RIP that always points to the memory address of the next instruction the processor is going to execute. This register cannot be changed with mov instructions, but we can use jmp instructions instead. This allows us to jump to different places in the program when we need to. add1: 0xFA5 add rax, 1 0xFA9 jmp add2 main: 0xFAB mov eax, 1 0xFB0 jmp add1 add2: 0xFB2 add rax, 2 0xFB6 jmp add1 RAX0x00000000RIP0x00000fab When we jump to "add2" we add 2 to the current value of RAX, when we jump to add1 we add 1. The emulator has conveniently labeled addresses for us, so instead of seeing jmp 0xFA5, it's changed to the more readable jmp add1. The labels in the assembly view are for convenience, but when the processor executes the code, it doesn't care nor need the labels. This means that the label add1 is for address 0xFA5, main is for address 0xFAB and add2 is for address 0xFB2. Another way to jump to another place in the code is to use a call {address} instruction. A call instruction basically pushes the address of the next instruction after the call instruction to stack then jumps to the specified address. The counterpart to call is ret which pops the last value on the stack and jumps (returns) to that address. This lets pieces of code act as functions that you can call and then return from when it's done and continue normal execution. Functions we often want to call is to print something on screen, transform text, etc., just like you use functions in other programming languages. main: 0xF9D mov eax, 0x666 _loop: 0xFA2 call sub1 0xFA7 call add2 0xFAC jmp _loop add2: 0xFAE add rax, 2 0xFB2 ret sub1: 0xFB3 sub rax, 1 0xFB7 ret RAX0x00000000RIP0x00000f9dRSP0x00005000 Calls can also be nested, and you should see the stack grow and shrink as calls and returns are being executed. main: 0xF93 call call1 0xF98 call exit call1: 0xF9D call call2 0xFA2 ret call2: 0xFA3 call call3 0xFA8 ret call3: 0xFA9 call call4 0xFAE ret call4: 0xFAF mov eax, 0x42 0xFB4 ret RAX0x00000000RIP0x00000f93RSP0x00005000 We can also call functions with arguments, but we cannot do that inline in a call instruction, we need to either use the stack or the registers. For functions with few parameters we use registers for the first argument, the second, and the third. If we want to print something, we can use the printf function with the memory address of the string we want to print in the RDI register. To load the memory address into RDI we need to use the lea instruction, for our intents and purposes you can think of lea as a fancy mov. When we run the lea instruction in the emulator you can see that RDI is updated to 0x1000 which is the address of the "Hello, world!" string. main: 0xFA2 lea rdi, qword ptr [hello_world] 0xFA9 call printf 0xFAE call exit RDI0x00000000 Which registers to use for function arguments, and how many before the stack is used instead, varies a lot depending on programming language, compiler, operating system, etc. Assembly does not specify which registers must be used for arguments. Some functions also return some value, this is often in the RAX register. After we've called a function we can get the return value from the RAX register. Underneath is a little program that parrots what you input. First it prints the string "Say something: " to the screen, then it allocates 8 bytes on the stack for your input by subtracting 8 from the value of RSP. The read function takes two arguments, the first (in RDI) is the address of where to put the user's message, the second (in RSI) is the maximum number of bytes from user input to read. main: 0xF7C lea rdi, qword ptr [say_something] 0xF83 call print 0xF88 sub rsp, 8 0xF8C mov rdi, rsp 0xF8F mov esi, 8 0xF94 call read 0xF99 mov rax, rsp 0xF9C lea rdi, qword ptr [you_said] 0xFA3 call printf 0xFA8 call exit RDI0x00000000RSP0x00005000 When you reach the read instruction, all further execution is blocked until you press enter in the console. When running the example, you should be able to see your input (up to 8 bytes) in the stack visualization, each character represented in their own "boxes". printf in this case takes the first format argument in RAX, so we copy the value of RSP over to RAX because the user input string is currently the last thing on the stack, and we want printf to print that string. A bug arises when user input is allowed to exceed the number of bytes we reserved for it. In the case of the last example, if RSI had been a larger number than what we subtracted from RSP before calling read, the user could have written over some previous values on the stack. In the get_name function of the example below we are allocating 0x10 (16) bytes from RSP (with sub rsp, 0x10), but when calling read we are instructing it in the RSI register to accept up to 0x18 (24) bytes into the memory location of RDI (which points at the stack). The value put on the stack before the 16 bytes allocated for user input is a return address from the call made to get_name in main. secret_function: 0xF52 lea rdi, qword ptr [secret_text] 0xF59 call print 0xF5E call exit main: 0xF63 call ask_name 0xF68 call get_name 0xF6D mov rax, rdi 0xF70 call welcome 0xF75 call exit ask_name: 0xF7A lea rdi, qword ptr [wassa_you_name] 0xF81 call print 0xF86 ret get_name: 0xF87 sub rsp, 0x10 0xF8B mov rdi, rsp 0xF8E mov esi, 0x18 0xF93 call read 0xF98 mov rdi, rsp 0xF9B add rsp, 0x10 0xF9F ret welcome: 0xFA0 lea rdi, qword ptr [welcome_msg] 0xFA7 call printf 0xFAC ret RDI0x00000000RSP0x00005000 In this example there is a function called secret_function that is not called anywhere in normal execution. However, because of the buffer overflow it is possible to "return" to that function by first writing 16 bytes of whatever you want followed by 8 bytes that represent the address you want to return to. For example, if I wanted to go directly to call exit at address 0xF75, I would input AAAAAAAAAAAAAAAA\x00\x00\x00\x00\x00\x00\x0f\x75 into the console when prompted for input. By prepending \x to a hexadecimal, you input that byte directly, so \0x41 is the ASCII representation for A, \x00 is just a null byte. The secret_function starts at address 0xf52, you should be able to edit my example input to jump to that function. The last example is more complex than the previous examples. This program generates a random password each time it starts and requires you to input the right one to "log in". You'll find that in the enter_password function read is being given more space than was allocated on the stack. Try to bypass the login function and get to say_success. main: 0xE80 call boot_up 0xE85 call exit say_success: 0xE8A lea rdi, qword ptr [_loggedin_msg] 0xE91 call print 0xE96 call exit say_wrong: 0xE9B lea rdi, qword ptr [wrong_pass_msg] 0xEA2 call print 0xEA7 ret enter_password: 0xEA8 lea rdi, qword ptr [enter_pass_msg] 0xEAF call print 0xEB4 sub rsp, 0x10 0xEB8 mov rdi, rsp 0xEBB mov esi, 0x20 0xEC0 call read 0xEC5 lea rsi, qword ptr [password_ptr] 0xECC mov rsi, qword ptr [rsi] 0xECF mov edx, 0x10 0xED4 call strncmp 0xED9 add rsp, 0x10 0xEDD ret boot_up: 0xEDE lea rdi, qword ptr [booting_msg] 0xEE5 call print 0xEEA call integrity_check 0xEEF lea rdi, qword ptr [welcome_msg] 0xEF6 call print password_loop: 0xEFB call enter_password 0xF00 cmp rax, 0 0xF04 je say_success 0xF06 call say_wrong 0xF0B jmp password_loop 0xF0D ret integrity_check: 0xF0E push rdi 0xF0F lea rdi, qword ptr [password_ptr] 0xF16 mov edi, 0x10 0xF1B call malloc 0xF20 mov rdi, rax 0xF23 call create_random_password 0xF28 lea rax, qword ptr [password_ptr] 0xF2F mov qword ptr [rax], rdi 0xF32 pop rdi 0xF33 ret RAX0x00000000RBX0x00000000RDI0x00000000 Some of the concepts in this blog post are grossly oversimplified, and we are not talking about any mitigations to these types of attacks at all. These types of attacks have worked well in the past, but due to mitigations such as ASLR, DEP, Stack Canaries, and more recently Pointer Authentication, modern binary exploitation requires a lot more effort from the attacker. I have not touched endianness because I only see it as adding unnecessary complexity that isn't really relevant to the exercises. In fact, the emulator actually handles endianness wrong, according to the Intel specifications. Normally, in a situation where you can control the instruction pointer and DEP is not present (e.g. you can execute code on the stack or other writable memory) you would just write your own custom code there and jump to it. That gives you a lot of freedom in terms of how to attack, but with DEP enabled you instead have to create a ROP chain. We have another interactive tutorial on ROP that you might want to look at. I'd love some feedback over on Twitter @bordplate. Or you can check out the code for the emulator on GitHub. Sursa + Interactiv: https://nagarrosecurity.com/blog/interactive-buffer-overflow-exploitation Quote