Jump to content
Nytro

Attacking Uninitialized Variables with Recursion

Recommended Posts

Attacking Uninitialized Variables with Recursion

Overview

Recursion can be defined as a computer programming technique involving the use of a procedure, subroutine, function, or algorithm that calls itself one or more times until a specified condition is met. There are debates about recursion, how it should be used, and whether it should be used at all. Recursion is a useful tool that should be in every programmers toolbox. However, when it’s used incorrectly it can over complicate flow logic and introduce vulnerabilities in code. This has lead to some coding standards explicitly banning its use, along with banning other complex flow constructs such as goto. In a nutshell, recursion can get confusing, and confusing code can lead to programming mistakes. In this post I explain recursion and how it can be leveraged for vulnerability research. I also share a plugin I wrote that leverages Binary Ninja’s medium level intermediate language (MLIL) to develop a cross-architecture solution for locating recursive logic in compiled code.

Recursion Summary

Recursion can be broken down into two categories, Direct Recursion and Indirect Recursion. Direct recursion occurs when a routine invokes itself. There are many practical applications for direct recursion such as mathematical algorithms, list sorts, and binary search trees. An example of direct recursion is depicted below. This example returns the factorial of the supplied integer (the product of the integer and all descending integers below it).

uint32_t factorial(uint32_t number) { if (number <= 1) { return 1; } return number * factorial(number -1); }
1
2
3
4
5
6
7
8
9
uint32_t factorial(uint32_t number)
{
    if (number <= 1)
    {
        return 1;
    }
 
    return number * factorial(number -1);
}

Indirect recursion occurs when a routine invokes another routine that eventually results in the invocation of the original routine. The most practical use I’ve found for indirect recursion is a directory traversal program containing a routine that navigates the tree, and another routine that processes the files. If the file is a directory, then the original routine is invoked. Another example, that can be used to check if an integer is odd or even, is depicted below.

int is_odd(int number) { return number == 0 ? 0 : is_even(abs(number) - 1); } int is_even(int number) { return number == 0 ? 1 : is_odd(abs(number) - 1); }
1
2
3
4
5
6
7
8
9
int is_odd(int number)
{
    return number == 0 ? 0 : is_even(abs(number) - 1);
}
 
int is_even(int number)
{
    return number == 0 ? 1 : is_odd(abs(number) - 1);
}

 

Recursion in Vulnerability Research

Stack Overflow Denial-of-Service

Recursion tends to cause problems when the amount of recursive calls are not properly bounded. When a function call is carried out by an x86 processor, the return address and the function arguments are pushed onto the call stack. With each recursive call, the stack continues to grow exponentially. If recursion is too deep, a stack overflow will occur once the allocated stack size is exceeded. Recursion-induced stack overflows are rarely exploitable and often only result in denial-of-service. This is because recusion-based stack overflows exceed the top of the stack region as the call stack grows during recursion. Above the stack (in most cases) there is a gaurd page present that prevents the stack from clashing with another memory region. However, gaurd pages aren’t used in all environments. The Linux kernel for example does not employ this protection. There are also techniques for jumping the gaurd page and overflowing into the heap. These techniques require that the recursive function declares sufficiently large stack variables and that heap memory is near the other side of the guard page. More on jumping gaurd pages can be found here.

Attacking Uninitialized Stack Variables with Recursion

Another role recursion could play in exploiting a vulnerable program is attacking uninitialized stack variables. As mentioned previously, x86 calling convention dictates that function arguments are pushed to the stack by the caller. If an attacker can control the values supplied as function arguments during recursion they could spray the stack with values that could be used to initialize a “uninitialized” stack-based variable. Compilers assume that when you declare a variable, that you are going to initialize it (before using it). Compilers account for this by assigning a virtual location to store the value. In the case of a uninitialized stack variable, the compiler will assign a location on the stack. Since stack frames across function invocations can overlap (and be re-used), an uninitialized stack variable will often contain junk data left behind by a previously invoked routine, until which point it is initialized. To illustrate how recursion can play a role in abusing uninitialized variables, I wrote a very practical program that should set me up nicely for the next Turing award.

#include <stdio.h> #include <stdlib.h> #include <stdint.h> void take_int(int j) { if (j == 1337) printf("How could this be? j was never initialized!\n"); } void recursive_func(int n) { static int i = 0; i++; if (i == 10) return; else recursive_func(n); } void func() { int j; take_int(j); } int main(int argc, char **argv) { int n = atoi(argv[1]); recursive_func(n); func(); return 0; }
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
 
void take_int(int j)
{
    if (j == 1337)
        printf("How could this be? j was never initialized!\n");
}
 
void recursive_func(int n)
{
    static int i = 0;
    i++;
    if (i == 10)
        return;
    else
        recursive_func(n);
}
 
void func()
{
    int j;
    take_int(j);
}
 
int main(int argc, char **argv)
{
    int n = atoi(argv[1]);
    recursive_func(n);
    func();
    return 0;
}

The program above takes an integer as a command-line argument and calls recursive_function, which recurses 10 times. Then the program calls func, which calls take_int while passing j as an argument. The vulnerability in this code exists in func. The variable j is declared, but never initialized before being passed to take_int. Let’s compile and run this program while supplying 1337 as a command-line argument.

$ gcc recursion_demo.c -m32 -o recursion_demo $ ./recursion_demo 1337 How could this be? j was never initialized!
1
2
3
$ gcc recursion_demo.c -m32 -o recursion_demo
$ ./recursion_demo 1337
How could this be? j was never initialized!

The conditional in take_int (which checks if j equals 1337) clearly evaluated as true, despite that j was never initialized,  but why? After recursive_func returns in main, the stack resembles the illustration depicted below.  During repeated recursive calls, the stack pointer is adjusted into lower addressable memory and n (0x00000539) is pushed to the stack along with the return address of the caller. With each time that the function recurses, we have better odds that we can get the program to use our value when accessing the uninitialized variable in take_int. In addition to the return address, and the function argument, there’s also additional data on the stack caused by logic within recursive_func. This is the challenge of attacking uninitialized variables in practical applications. It is difficult to get your data to overlap with the offset used by the uninitialized variable, and it is difficult to achieve this without some other instruction clobbering it before your variable is referenced.

stack-after-recursion-e1510470166953-290 Stack Layout After Recursion

By the time take_int is called, the stack has been polluted with 0x00000539 entries. In the figure below, at 00000644, the operand (stack offset) that is supposed to contain the value of j, is copied onto the stack as the first argument for take_int. However, [ebp-0xc] does not contain the value of j, because j was never initialized. Instead it contains 0x00000539, which was left behind from a call setup for recursive_func.

j-from-stack-1.pngtake_int call setup

Depicted in the next screenshot is the disassembly for a portion of take_int. The most important instruction is at 000005d1. This instruction is where the pointer to arg1 (j) is dereferenced and compared with the constant value 0x00000539. As already established, j is never set, so the program uses what was left behind on the stack at that location, which happens to be 0x00000539, left behind from one of the calls to recursive_func.  The two compared values are equal. Therefore, the program does not take the branch at 000005d8 and proceeds to print the anti-climatic output.

j-from-stack.pngtake_int Dereference of Uninitialized Stack Variable j 

The example program will behave different depending on your version of gcc and how it compiles the source. I compiled the program with gcc version 6.3.0 20170516. It is likely that the example won’t work if you use a different version of gcc. Instead of the uninitialized stack variable (j) overlapping with addressable memory containing your sprayed argument for recursive_func (n), it may instead overlap with a return address or some other data on the stack. It is all dependent on how the compiler lays out the instructions. If you insist on running the example, and your compiler isn’t cooperating, leave a comment or shoot me an email and I’ll post my binary.

Binary Ninja Recursion Plugin

As part of my endeavor to study recursion, I developed a plugin that uses Binary Ninja’s API and medium level intermediate language to locate recursive logic in compiled code. The plugin is capable of identifying both direct and indirect recursion and applies comments to recursive functions and their caller instructions. It also generates a markdown page that is rendered in BN’s UI that contains the locations of recursive logic. It works against binaries compiled under all architectures supported by BN. This plugin is part of Binjago, a suite of plugins that I’ve written to aid in static analysis. A screenshot of the instruction comments applied by the plugin is depicted below.  The plugin can be found here.

direct-recursion.png Binjago Recursive Logic Annotations

Conclusion

Complex recursive routines are often difficult to understand, which sometimes leads to other programming mistakes in surrounding logic. Unbounded recursion can be leveraged to smash the stack and under some conditions, recursion can be leveraged to effect the layout of the stack to attack an uninitialized stack variable. If you are interested in reading about a recursion vulnerability that has been exploited in practice, I recommend Project Zero: Exploiting Recursion in the Linux Kernel. Thank you for reading!

 

Sursa: https://signal11.io/index.php/2017/11/19/attacking-uninitialized-variables-with-recursion/

Link to comment
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.



×
×
  • Create New...