A required step to understand buffer overflow

Nytro · February 20, 2013


This is not a buffer overflow exploit, but the required background that will help you understand how the CPU and memory work together to execute a program.

I have read many articles about buffer overflows. Most of them start from a specific point and skip the basic knowledge one must have to deeply understand what is going on behind the scenes. I wrote this article to cover (I hope) this gap.

If at the end of this article you feel more comfortable with concepts like CALL and RETN, and with how a function is executed using memory (buffer, stack, etc.), then I will consider this article a success...

First, I would like to point out that everything we say here is about the x86 processor family. In addition, most memory addresses are written in decimal notation (for the sake of clarity, for beginners) instead of the hexadecimal notation that real-world tools actually use.

Requirements in order to read this article:

1. A basic understanding of assembly language.

2. A basic understanding of C language.

Every process is laid out in the computer's memory (RAM, Random Access Memory) in three basic segments:

-Code Segment

-Data Segment (which includes the well-known BSS)

-Stack Segment

CODE SEGMENT

------------

All the instructions of our program "live" in this memory segment. Nobody... (nobody? well ok, almost nobody) can write to this memory segment, i.e. it is a read-only segment.

For example, all the assembly instructions generated from the following C code are located in the code segment:

/* Set the items on the main diagonal to 1, all others to 0 */
for (i = 0; i < 100; i++)
     for (j = 0; j < 100; j++)
          if (i != j)
              a[i][j] = 0;
          else
              a[i][j] = 1;

PS: The remarks /*...*/ are not included in the code segment. The compiler does not produce code for remarks.

DATA SEGMENT

------------

All global variables, initialized or uninitialized, are stored in this writable (not read-only) segment.

For example:

int i;
int j = 0;
int a[100][100];
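A small sketch of what this means in practice, assuming a hosted C compiler: the C language guarantees that globals without an initializer start at zero, which is why they can be placed in the BSS without taking space in the executable file. The helper name globals_start_zeroed is made up for this illustration, and the section names in the comments are what compilers typically choose, not a guarantee.

```c
#include <stddef.h>

/* The globals from the example above: all three have static storage. */
int i;              /* no initializer: typically placed in .bss          */
int j = 0;          /* explicit zero: compilers may still place it in .bss */
int a[100][100];    /* 10000 ints, none of them stored in the executable  */

/* Hypothetical helper: returns 1 if every global above is zero. */
int globals_start_zeroed(void)
{
    if (i != 0 || j != 0)
        return 0;
    for (size_t r = 0; r < 100; r++)
        for (size_t c = 0; c < 100; c++)
            if (a[r][c] != 0)
                return 0;
    return 1;
}
```

Calling globals_start_zeroed() at program start is expected to return 1, because the loader hands the program a zero-filled data area for these variables.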

STACK SEGMENT

-------------

Local variables, function arguments and return addresses are stored in this writable memory.

This segment is actually a stack data structure (familiar to anyone who has attended a basic computer science course). This means that we put values on a stack in memory. The last value put (or pushed) is on the top of the stack, i.e. it is the first one available. This is the well-known LIFO (Last In First Out) data structure.

The processor register ESP (Extended Stack Pointer) holds the address of the top of the stack, i.e. of the last value pushed.

We can put values on the stack (PUSH) and get them back (POP).

There are two important “secrets” here:

[1] PUSH and POP work in 4-byte units because of the 32-bit architecture of the x86 processor family.

[2] The stack grows downward, that is, if ESP=256, just after a PUSH EAX instruction ESP becomes 252 and the value of EAX is placed at address 252.

For example:

STACK
adrs      memory
---- ------------------
256  |   xy          |
252  |               |
248  |               |
244  |               |
...  .................
(ESP=256)

Instruction > PUSH EAX ; remark: suppose EAX = 34

STACK
256  |   xy          |
252  |   34          |
248  |               |
244  |               |
...  .................
(ESP=252)

Instruction > POP EAX ; remark: Get the value from the stack into EAX register

STACK
256  |   xy          |
252  |   34          |
248  |               |
244  |               |
...  .................
(ESP=256)

Instruction > PUSH EAX ; remark: suppose EAX = 15

Instruction > PUSH EBX ; remark: suppose EBX = 16

STACK
256  |   xy          |
252  |   15          |
248  |   16          |
244  |               |
...  .................
(ESP=248)
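The diagrams above can be replayed with a few lines of C. This is only a sketch of the idea, not how the CPU is implemented: the Stack type and the stack_* function names are invented for this illustration, and memory is modelled as a small byte array that starts at the same address 256 used in the diagrams.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_TOP 256u   /* same starting address as the diagrams */

/* Hypothetical model: a tiny memory plus a stack pointer. */
typedef struct {
    uint8_t  mem[STACK_TOP + 4]; /* byte-addressable memory, addresses 0..259 */
    uint32_t esp;                /* address of the last pushed value */
} Stack;

void stack_init(Stack *s)
{
    memset(s->mem, 0, sizeof s->mem);
    s->esp = STACK_TOP;
}

/* PUSH: first move ESP down by 4, then store the value there. */
void stack_push(Stack *s, uint32_t value)
{
    assert(s->esp >= 4);
    s->esp -= 4;
    memcpy(&s->mem[s->esp], &value, 4);
}

/* POP: first read the value at ESP, then move ESP up by 4. */
uint32_t stack_pop(Stack *s)
{
    uint32_t value;
    assert(s->esp + 4 <= STACK_TOP);
    memcpy(&value, &s->mem[s->esp], 4);
    s->esp += 4;
    return value;
}

/* Read a 4-byte value without changing ESP (for inspection only). */
uint32_t stack_peek(const Stack *s, uint32_t address)
{
    uint32_t value;
    assert(address + 4 <= sizeof s->mem);
    memcpy(&value, &s->mem[address], 4);
    return value;
}
```

Pushing 34 from ESP=256 leaves ESP at 252 with 34 stored there. Popping returns 34 and puts ESP back to 256, while the value at 252 stays in memory until something overwrites it, just as in the third diagram.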

What is behind a function-call

-------------------------------

Before we explain what is behind it, we must say a few words about EIP (Extended Instruction Pointer, or simply 'instruction pointer'). This register holds the code segment address of the next instruction that the CPU will execute.

Every time the CPU executes an instruction, it stores into EIP the address of the instruction that follows the one currently being executed.

But how does the CPU find the address of the next instruction?

Well... we have two cases here...

1. The address is immediately after the instruction currently executed.

2. There is a jump (a 'JMP', or a CALL in the case of a function call), so the instruction that needs to be executed next is at an address which is not next to the current one.

In case 1 the address is calculated by simply adding the length of the currently executed instruction to the current EIP value.

Example:

Suppose we have the following 2 instructions at the addresses 100 and 101:

100 push EDX
101 mov  ESP, 0

Suppose that at the starting point of our little program we have: EIP = 100

CPU executes the instruction at address 100.

CPU checks the instruction:

Is it a jump? No, so calculate its size. The CPU knows that the push instruction is 1 byte long.

So,... the new value of

EIP = EIP + size(push EDX) =>

EIP = 100 + 1 =>

EIP = 101

So... the CPU executes the instruction at address 101 next, and so forth...
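The "EIP = EIP + size" rule can be sketched in C. The decoder below is deliberately incomplete and hypothetical: insn_length and next_eip are names invented here, and the table only knows the handful of opcodes that appear in this article (a real x86 decoder has to deal with prefixes, ModRM bytes and much more).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, deliberately incomplete length decoder: it only knows the
 * few opcodes that appear in this article. Returns 0 for anything else. */
size_t insn_length(const uint8_t *code)
{
    switch (code[0]) {
    case 0x52: return 1; /* PUSH EDX                               */
    case 0x55: return 1; /* PUSH EBP                               */
    case 0x5D: return 1; /* POP EBP                                */
    case 0xC3: return 1; /* RETN                                   */
    case 0x6A: return 2; /* PUSH imm8                              */
    case 0x8B: return 2; /* MOV reg,reg (register form assumed)    */
    case 0x83: return 3; /* ADD reg,imm8 (register form assumed)   */
    case 0xE8: return 5; /* CALL rel32                             */
    default:   return 0;
    }
}

/* Case 1: the next EIP is the current EIP plus the instruction length. */
uint32_t next_eip(uint32_t eip, const uint8_t *code)
{
    return eip + (uint32_t)insn_length(code);
}
```

With the single byte 52 (push EDX) at address 100, next_eip gives 101, as in the example above. Walking the opcode bytes of main() from the disassembly later in this article the same way reproduces the addresses in its left column.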

In case 2, we have a jump... things are a bit different.

Conceptually, just before we jump to another address to call a function, the address of the next instruction must be saved somewhere, and just before returning from the function that saved address is written back into EIP. On x86 it is saved on the stack rather than in a register.

The CALL and RETN assembly instructions take care of saving and restoring these addresses:

The CALL is used to do 2 things:

1. To "remember" the next instruction that will be executed after function returns (by pushing its address to the stack) and

2. To write into EIP the address of the called function, i.e. to perform the function call.

The RETN instruction is executed at the end of the function:

It pops (gets) the "return address" that CALL pushed onto the stack, so that execution continues right after the point where the function was called.
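What CALL and RETN do to EIP and ESP can be sketched with a toy model in C. Everything here is an assumption made for illustration: the Cpu type and the cpu_* functions are invented names, memory is a small byte array, and the addresses used in the usage note (a 5-byte CALL at address 100 targeting 200) are made up.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy CPU: just EIP, ESP and a small memory for the stack. */
typedef struct {
    uint32_t eip;
    uint32_t esp;
    uint8_t  mem[512];
} Cpu;

void cpu_store32(Cpu *c, uint32_t addr, uint32_t value)
{
    assert(addr + 4 <= sizeof c->mem);
    memcpy(&c->mem[addr], &value, 4);
}

uint32_t cpu_load32(const Cpu *c, uint32_t addr)
{
    uint32_t value;
    assert(addr + 4 <= sizeof c->mem);
    memcpy(&value, &c->mem[addr], 4);
    return value;
}

/* CALL: push the address of the following instruction, then jump.
 * insn_len is the size of the CALL instruction itself. */
void cpu_call(Cpu *c, uint32_t target, uint32_t insn_len)
{
    uint32_t return_address = c->eip + insn_len;
    c->esp -= 4;
    cpu_store32(c, c->esp, return_address);
    c->eip = target;
}

/* RETN: pop the saved return address back into EIP. */
void cpu_retn(Cpu *c)
{
    c->eip = cpu_load32(c, c->esp);
    c->esp += 4;
}
```

Starting with EIP=100 and ESP=256, cpu_call(&c, 200, 5) leaves EIP=200, ESP=252 and the return address 105 stored at 252. A following cpu_retn(&c) restores EIP=105 and ESP=256.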

The Base pointer (EBP)

----------------------

Each function that is currently executing (even the main() function in C) has its own stack frame. A stack frame is a logical group of consecutive locations on the stack that holds the variables and addresses belonging to that function.

Every address in a stack frame is a relative address. That means we address the locations of data in our stack relative to some reference point. This reference point is EBP, which stands for Extended Base Pointer.

On entry to a function, EBP still holds the base pointer of the caller function. We PUSH this old EBP onto the stack, copy the current ESP into EBP, and then use EBP to reference the arguments and local variables of the callee function by offset.

I hope the use of the base pointer will become clearer in the following example.

A REAL EXAMPLE C PROGRAM:

Consider the following C program:

void function1(int , int , int );
void main()
{
    function1 (1, 2, 3);
}

void function1 (int a, int b, int c)
{
        char z[4];
}

I compile and link the above program and use the OllyDbg debugger to check the generated assembly code.

Skipping the instructions that do not belong to our code (about 90% of the assembly), the rest is the code that corresponds to our little program:

0040123C  /. 55             PUSH EBP
0040123D  |. 8BEC           MOV EBP,ESP
0040123F  |. 6A 03          PUSH 3  ; /Arg3 = 00000003
00401241  |. 6A 02          PUSH 2  ; |Arg2 = 00000002
00401243  |. 6A 01          PUSH 1  ; |Arg1 = 00000001
00401245  |. E8 05000000    CALL bo1.0040124F  ; \bo1.0040124F
0040124A  |. 83C4 0C        ADD ESP,0C
0040124D  |. 5D             POP EBP
0040124E  \. C3             RETN

0040124F  /$ 55             PUSH EBP
00401250  |. 8BEC           MOV EBP,ESP
00401252  |. 51             PUSH ECX
00401253  |. 59             POP ECX
00401254  |. 5D             POP EBP
00401255  \. C3             RETN

ANALYSIS:

---------

The addresses from 0040123C to 0040124E are the main() function.

The addresses from 0040124F to 00401255 are the function1() function.

0040123C /. 55 PUSH EBP

Backs up the old base pointer (EBP) by pushing it onto the stack.

0040123D |. 8BEC MOV EBP,ESP

Copies the current stack pointer into the EBP register. From then on, in the function, we'll reference the function's arguments and local variables through EBP. These two instructions are called the "Procedure Prologue".

The stack now holds the saved EBP value:

STACK
256  |   [ebp]       |
...  .................
(ESP=256)

0040123F  |. 6A 03          PUSH 3             ; /Arg3 = 00000003
00401241  |. 6A 02          PUSH 2             ; |Arg2 = 00000002
00401243  |. 6A 01          PUSH 1             ; |Arg1 = 00000001

Here we push the arguments onto the stack, last argument first.

The stack is:

STACK

256  |   [ebp]       |
252  |     3         |
248  |     2         |
244  |     1         |
...  .................
(ESP=244)

00401245 |. E8 05000000 CALL bo1.0040124F ; \bo1.0040124F

Calls the function at address 0040124F. bo1 is the name of my executable.

The stack becomes:

STACK

256  |   [ebp]       |
252  |     3         |
248  |     2         |
244  |     1         |
240  |  0040124A     | <- the return address when the function1 ends.
...  .................
(ESP=240)

Let’s follow the execution, so go to address 0040124F (the function1):

0040124F  /$ 55             PUSH EBP
00401250  |. 8BEC           MOV EBP,ESP

Hmm... this is the "Procedure Prologue" again (remember, this is executed at the start of every function). It sets up the function's own stack frame. The EBP register is currently pointing at a location in main's stack frame. This value must be preserved, so EBP is pushed onto the stack. Then the contents of ESP are copied to EBP. This allows the arguments to be referenced as offsets from EBP and frees up the stack register ESP to do other things.

The stack now, is:

STACK

256  |   [ebp]       |
252  |     3         |
248  |     2         |
244  |     1         |
240  |  0040124A     | <- the return address when the function1 ends.
236  |  <main’s EBP> | <- Note that now ESP=EBP: both point to this address.
...  .................
(ESP=236)
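With EBP fixed at 236, every item of the frame sits at a known offset from it. The sketch below spells out that arithmetic, assuming 4-byte stack slots as everywhere in this article. The helper names arg_address, return_address_slot and local_address are invented for this illustration.

```c
#include <stdint.h>

/* Offsets relative to EBP after the prologue PUSH EBP / MOV EBP,ESP,
 * assuming 4-byte stack slots as in the article. */
enum {
    SAVED_EBP_OFFSET   = 0,  /* [EBP]    caller's EBP       */
    RETURN_ADDR_OFFSET = 4,  /* [EBP+4]  pushed by CALL     */
    FIRST_ARG_OFFSET   = 8   /* [EBP+8]  first argument (a) */
};

/* Hypothetical helper: address of the n-th argument (n starts at 1). */
uint32_t arg_address(uint32_t ebp, unsigned n)
{
    return ebp + FIRST_ARG_OFFSET + 4u * (n - 1u);
}

/* Hypothetical helper: address of the slot holding the return address. */
uint32_t return_address_slot(uint32_t ebp)
{
    return ebp + RETURN_ADDR_OFFSET;
}

/* Hypothetical helper: address of a local lying bytes_below bytes under EBP. */
uint32_t local_address(uint32_t ebp, uint32_t bytes_below)
{
    return ebp - bytes_below;
}
```

For EBP=236 this gives 244, 248 and 252 for the arguments 1, 2 and 3, and 240 for the return address, matching the diagram. A 4-byte local such as z[4] would sit just below EBP, at 232.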

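00401252  |. 51             PUSH ECX
00401253  |. 59             POP ECX
00401254  |. 5D             POP EBP

PUSH ECX is most likely the compiler's short way of reserving 4 bytes on the stack for the local array z[4] (ESP=232). Since the function does nothing else, POP ECX immediately releases that space (ESP=236), and POP EBP restores main's EBP (ESP=240).

After the two pops the stack becomes:

STACK

256  |   [ebp]       |
252  |     3         |
248  |     2         |
244  |     1         |
240  |  0040124A     | <- the return address when the function1 ends.
...  .................
(ESP=240)

00401255  \. C3             RETN

The function ends: RETN pops the return address 0040124A from the stack into EIP (remember our definition of the RETN instruction), so ESP becomes 244 and execution continues in main() at 0040124A.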

0040124A  |. 83C4 0C        ADD ESP,0C

After the function has returned, we add 12 (0C in hex) to the stack pointer, since we pushed 3 arguments onto the stack, each taking 4 bytes (integers). By increasing ESP we actually shrink the stack (remember that the stack fills downward, from high to low memory addresses), i.e. ESP = 244 + 12 = 256.

STACK

256  |   [ebp]       |
...  .................
(ESP=256)

Thus, ESP has again the value that it had at the first step, before the function call.
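The whole walk-through can be checked by adding up what each instruction does to ESP. The table below is a sketch built from the disassembly listing above (including the PUSH ECX / POP ECX pair of function1). The Step type, the trace array and esp_after are names invented for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one step: the mnemonic and how it changes ESP. */
typedef struct {
    const char *mnemonic;
    int32_t     esp_delta;  /* negative = stack grows, positive = stack shrinks */
} Step;

/* The instructions executed after main's prologue, in execution order. */
static const Step trace[] = {
    { "PUSH 3",       -4 },
    { "PUSH 2",       -4 },
    { "PUSH 1",       -4 },
    { "CALL",         -4 },  /* pushes the return address 0040124A */
    { "PUSH EBP",     -4 },
    { "MOV EBP,ESP",   0 },
    { "PUSH ECX",     -4 },
    { "POP ECX",      +4 },
    { "POP EBP",      +4 },
    { "RETN",         +4 },  /* pops the return address */
    { "ADD ESP,0C",  +12 },  /* caller removes the 3 arguments */
};

#define TRACE_LEN (sizeof trace / sizeof trace[0])

/* Returns ESP after the first `steps` entries of the trace have run. */
uint32_t esp_after(uint32_t start_esp, size_t steps)
{
    uint32_t esp = start_esp;
    for (size_t k = 0; k < steps && k < TRACE_LEN; k++)
        esp = (uint32_t)((int32_t)esp + trace[k].esp_delta);
    return esp;
}
```

Starting from ESP=256, esp_after(256, 4) is 240 (just after the CALL), esp_after(256, 10) is 244 (just after RETN) and esp_after(256, 11) is 256 again: every push has been balanced by a pop or by the final ADD.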

I hope that you now have a basic understanding of the use of the stack and the stack pointer.

In another article I will describe how nasty things can happen here. Hint: How about overwriting the stack item (at address 240 in our example above), and through it the value of the Instruction Pointer (EIP)...

I suggest you try my little program, or better, create your own and test, check, review, test, check, review, test, check, review!!

Happy Programming Guys!!


Posted by Andreas Venieris at 7:49 PM

Source: 0x191 Unauthorized: A required step to understand buffer overflow
