
Leaderboard

Popular Content

Showing content with the highest reputation on 09/19/17 in all areas

  1. rVMI: Perform Full System Analysis with Ease

September 18, 2017 | by Jonas Pfoh, Sebastian Vogl | Threat Research

Manual dynamic analysis is an important concept. It enables us to observe the behavior of a sophisticated malware sample or exploit by executing it in a controlled environment. The information gathered through this process is often crucial in gaining a full understanding of a sample. When performing manual dynamic analysis today, there are essentially two tools one can use: debuggers and sandboxes. While both of these tools are certainly very valuable, neither has been designed for the purpose of manual dynamic analysis. As a consequence, both approaches have inherent shortcomings that make interactive dynamic analysis difficult and tedious. In this blog post we present a novel approach to manual dynamic analysis: rVMI. rVMI was specifically designed for interactive malware analysis. It combines virtual machine introspection (VMI) and memory forensics to provide a platform for interactive and scriptable analysis. This blog post follows our presentation at Black Hat USA 2017.

What is rVMI?

rVMI can best be described as a debugger on steroids. In contrast to traditional debuggers, rVMI operates entirely outside of the target environment and allows the analysis of a live system from the hypervisor level. This is achieved by combining VMI with memory forensics. In particular, rVMI makes use of full system virtualization to move the debugger out of the virtual machine (VM) to the hypervisor level. As a result, the debugger runs isolated from the malware executed in a QEMU/KVM VM. This gives the analyst complete control over the target through VMI while keeping the malware in an isolated, debugger-free environment. In addition, this enables an analyst to pause and resume the VM at any point in time, as well as to use traditional debugging functionality such as breakpoints and watchpoints.

To bridge the semantic gap and support full system analysis, rVMI makes use of Rekall. Rekall is a powerful open source memory forensics framework. It provides a wide range of features that allow one to enumerate processes, inspect kernel data structures, access process address spaces, and much more. While Rekall usually works with static memory dumps, rVMI extends Rekall to support live VMs. This enables an analyst to leverage the entire Rekall feature set while performing an analysis with rVMI, effectively allowing them to inspect user space processes, kernel drivers, and even pre-boot environments with a single tool.

rVMI supports all operating systems that Rekall supports, including Windows (XP-10), Linux, and Mac OS X. Analysis is performed through an iPython shell that makes all Rekall and VMI features available through a single interface. In addition, rVMI provides a Python API that makes it easy to automate tasks through external scripts or on-the-fly within the iPython shell. Finally, rVMI supports snapshots, which allows an analyst to easily save or restore states of the target environment.

Full article: https://www.fireeye.com/blog/threat-research/2017/09/rvmi-full-system-analysis.html
    2 points
  2. This is an expanded version of my talk at NginxConf 2017 on September 6, 2017.

As an SRE on the Dropbox Traffic Team, I'm responsible for our Edge network: its reliability, performance, and efficiency. The Dropbox edge network is an nginx-based proxy tier designed to handle both latency-sensitive metadata transactions and high-throughput data transfers. In a system that is handling tens of gigabits per second while simultaneously processing tens of thousands of latency-sensitive transactions, there are efficiency/performance optimizations throughout the proxy stack, from drivers and interrupts, through TCP/IP and the kernel, to library and application level tunings.

Disclaimer

In this post we'll be discussing lots of ways to tune web servers and proxies. Please do not cargo-cult them. For the sake of the scientific method, apply them one-by-one, measure their effect, and decide whether they are indeed useful in your environment.

This is not a Linux performance post; even though I will make lots of references to bcc tools, eBPF, and perf, this is by no means a comprehensive guide to using performance profiling tools. If you want to learn more about them you may want to read through Brendan Gregg's blog.

This is not a browser-performance post either. I'll be touching on client-side performance when I cover latency-related optimizations, but only briefly. If you want to know more, you should read High Performance Browser Networking by Ilya Grigorik.

And, this is also not a TLS best practices compilation. Though I'll be mentioning TLS libraries and their settings a bunch of times, you and your security team should evaluate the performance and security implications of each of them. You can use Qualys SSL Test to verify your endpoint against the current set of best practices, and if you want to know more about TLS in general, consider subscribing to the Feisty Duck Bulletproof TLS Newsletter.

Structure of the post

We are going to discuss efficiency/performance optimizations of different layers of the system, starting from the lowest levels like hardware and drivers: these tunings can be applied to pretty much any high-load server. Then we'll move to the Linux kernel and its TCP/IP stack: these are the knobs you want to try on any of your TCP-heavy boxes. Finally we'll discuss library and application-level tunings, which are mostly applicable to web servers in general and nginx specifically. For each potential area of optimization I'll try to give some background on latency/throughput tradeoffs (if any), monitoring guidelines, and, finally, suggest tunings for different workloads.

Hardware

CPU

For good asymmetric RSA/EC performance you are looking for processors with at least AVX2 (avx2 in /proc/cpuinfo) support and preferably ones with large-integer-arithmetic-capable hardware (bmi and adx). For the symmetric cases you should look for AES-NI for AES ciphers and AVX512 for ChaCha+Poly. Intel has a performance comparison of different hardware generations with OpenSSL 1.0.2 that illustrates the effect of these hardware offloads.

Latency-sensitive use-cases, like routing, will benefit from fewer NUMA nodes and disabled HT. High-throughput tasks do better with more cores, will benefit from Hyper-Threading (unless they are cache-bound), and generally won't care about NUMA too much. Specifically, if you go the Intel path, you are looking for at least Haswell/Broadwell and ideally Skylake CPUs. If you are going with AMD, EPYC has quite impressive performance.
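As a quick sanity check on a candidate box, the flags mentioned above can be read straight from /proc/cpuinfo. A minimal sketch (the flag names below are the ones the kernel exposes; avx512f stands in for the AVX-512 foundation set, bmi2 for the BMI extensions):

$ grep -o -w -E 'aes|avx2|avx512f|bmi2|adx' /proc/cpuinfo | sort | uniq -c

The count per flag should match the number of logical CPUs; a missing flag means the corresponding fast path in your TLS library will not be used.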
NIC

Here you are looking for at least 10G, preferably even 25G. If you want to push more than that through a single server over TLS, the tuning described here will not be sufficient, and you may need to push TLS framing down to the kernel level (e.g. FreeBSD, Linux). On the software side, you should look for open source drivers with active mailing lists and user communities. This will be very important if (but most likely, when) you'll be debugging driver-related problems.

Memory

The rule of thumb here is that latency-sensitive tasks need faster memory, while throughput-sensitive tasks need more memory.

Hard Drive

It depends on your buffering/caching requirements, but if you are going to buffer or cache a lot you should go for flash-based storage. Some go as far as using a specialized flash-friendly filesystem (usually log-structured), but they do not always perform better than plain ext4/xfs. Anyway, just be careful not to burn through your flash because you forgot to enable TRIM or update the firmware.

Operating systems: Low level

Firmware

You should keep your firmware up-to-date to avoid painful and lengthy troubleshooting sessions. Try to stay recent with CPU microcode, motherboard, NIC, and SSD firmware. That does not mean you should always run bleeding edge; the rule of thumb here is to run the second-to-latest firmware, unless the latest release has critical bug fixes, but not to fall too far behind.

Drivers

The update rules here are pretty much the same as for firmware. Try staying close to current. One caveat here is to try to decouple kernel upgrades from driver updates if possible. For example, you can pack your drivers with DKMS, or pre-compile drivers for all the kernel versions you use. That way when you update the kernel and something does not work as expected, there is one less thing to troubleshoot.

CPU

Your best friend here is the kernel repo and the tools that come with it. In Ubuntu/Debian you can install the linux-tools package with a handful of utils, but for now we only use cpupower, turbostat, and x86_energy_perf_policy. To verify CPU-related optimizations you can stress-test your software with your favorite load-generating tool (for example, Yandex uses Yandex.Tank). Here is a presentation from the last NginxConf from developers about nginx loadtesting best practices: "NGINX Performance testing."

cpupower

Using this tool is way easier than crawling /proc/. To see info about your processor and its frequency governor you should run:

$ cpupower frequency-info
...
driver: intel_pstate
...
available cpufreq governors: performance powersave
...
The governor "performance" may decide which speed to use
...
boost state support:
Supported: yes
Active: yes

Check that Turbo Boost is enabled, and for Intel CPUs make sure that you are running with intel_pstate, not acpi-cpufreq or even pcc-cpufreq. If you are still using acpi-cpufreq, then you should upgrade the kernel, or if that's not possible, make sure you are using the performance governor. When running with intel_pstate, even the powersave governor should perform well, but you need to verify it yourself.

And speaking about idling, to see what is really happening with your CPU, you can use turbostat to directly look into the processor's MSRs and fetch Power, Frequency, and Idle State information:

# turbostat --debug -P
... Avg_MHz Busy% ... CPU%c1 CPU%c3 CPU%c6 ... Pkg%pc2 Pkg%pc3 Pkg%pc6 ...

Here you can see the actual CPU frequency (yes, /proc/cpuinfo is lying to you), and core/package idle states.
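A minimal sketch of forcing the performance governor with the same tool (assuming an intel_pstate system like the one above; switch back to powersave if power matters more than latency):

$ sudo cpupower frequency-set -g performance
$ cpupower frequency-info | grep 'The governor'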
If even with the intel_pstate driver the CPU spends more time in idle than you think it should, you can:

Set the governor to performance.
Set x86_energy_perf_policy to performance.

Or, only for very latency-critical tasks, you can:

Use the /dev/cpu_dma_latency interface.
For UDP traffic, use busy-polling.

You can learn more about processor power management in general and P-states specifically in the Intel OpenSource Technology Center presentation "Balancing Power and Performance in the Linux Kernel" from LinuxCon Europe 2015.

CPU Affinity

You can additionally reduce latency by applying CPU affinity on each thread/process, e.g. nginx has the worker_cpu_affinity directive that can automatically bind each web server process to its own core. This should eliminate CPU migrations, reduce cache misses and pagefaults, and slightly increase instructions per cycle. All of this is verifiable through perf stat.

Sadly, enabling affinity can also negatively affect performance by increasing the amount of time a process spends waiting for a free CPU. This can be monitored by running runqlat on one of your nginx workers' PIDs:

usecs : count distribution
0 -> 1 : 819 |                                        |
2 -> 3 : 58888 |******************************          |
4 -> 7 : 77984 |****************************************|
8 -> 15 : 10529 |*****                                   |
16 -> 31 : 4853 |**                                      |
...
4096 -> 8191 : 34 |                                        |
8192 -> 16383 : 39 |                                        |
16384 -> 32767 : 17 |                                        |

If you see multi-millisecond tail latencies there, then there is probably too much stuff going on on your servers besides nginx itself, and affinity will increase latency instead of decreasing it.

Memory

All mm/ tunings are usually very workflow-specific; there are only a handful of things to recommend (see the sketch right after this list):

Set THP to madvise and enable them only when you are sure they are beneficial, otherwise you may get an order of magnitude slowdown while aiming for a 20% latency improvement.
Unless you are utilizing only a single NUMA node, you should set vm.zone_reclaim_mode to 0.
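A minimal sketch of those two settings (not persistent across reboots; wire them into your config management or an init script as appropriate):

$ echo madvise | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
$ sudo sysctl -w vm.zone_reclaim_mode=0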
NUMA

Modern CPUs are actually multiple separate CPU dies connected by a very fast interconnect and sharing various resources, starting from L1 cache on the HT cores, through L3 cache within the package, to Memory and PCIe links within sockets. This is basically what NUMA is: multiple execution and storage units with a fast interconnect.

For a comprehensive overview of NUMA and its implications you can consult the "NUMA Deep Dive Series" by Frank Denneman. But, long story short, you have a choice of:

Ignoring it, by disabling it in BIOS or running your software under numactl --interleave=all; you can get mediocre, but somewhat consistent performance.
Denying it, by using single-node servers, just like Facebook does with the OCP Yosemite platform.
Embracing it, by optimizing CPU/memory placement in both user- and kernel-space.

Let's talk about the third option, since there is not much optimization needed for the first two. To utilize NUMA properly you need to treat each NUMA node as a separate server; for that you should first inspect the topology, which can be done with numactl --hardware:

$ numactl --hardware
available: 4 nodes (0-3)
node 0 cpus: 0 1 2 3 16 17 18 19
node 0 size: 32149 MB
node 1 cpus: 4 5 6 7 20 21 22 23
node 1 size: 32213 MB
node 2 cpus: 8 9 10 11 24 25 26 27
node 2 size: 0 MB
node 3 cpus: 12 13 14 15 28 29 30 31
node 3 size: 0 MB
node distances:
node   0   1   2   3
  0:  10  16  16  16
  1:  16  10  16  16
  2:  16  16  10  16
  3:  16  16  16  10

Things to look at: the number of nodes, the memory size of each node, the number of CPUs on each node, and the distances between nodes.

This is a particularly bad example since it has 4 nodes as well as nodes without memory attached. It is impossible to treat each node here as a separate server without sacrificing half of the cores on the system. We can verify that by using numastat:

$ numastat -n -c
                 Node 0   Node 1 Node 2 Node 3    Total
               -------- -------- ------ ------ --------
Numa_Hit       26833500 11885723      0      0 38719223
Numa_Miss         18672  8561876      0      0  8580548
Numa_Foreign    8561876    18672      0      0  8580548
Interleave_Hit   392066   553771      0      0   945836
Local_Node      8222745 11507968      0      0 19730712
Other_Node     18629427  8939632      0      0 27569060

You can also ask numastat to output per-node memory usage statistics in the /proc/meminfo format:

$ numastat -m -c
               Node 0 Node 1 Node 2 Node 3 Total
               ------ ------ ------ ------ -----
MemTotal        32150  32214      0      0 64363
MemFree           462   5793      0      0  6255
MemUsed         31688  26421      0      0 58109
Active          16021   8588      0      0 24608
Inactive        13436  16121      0      0 29557
Active(anon)     1193    970      0      0  2163
Inactive(anon)    121    108      0      0   229
Active(file)    14828   7618      0      0 22446
Inactive(file)  13315  16013      0      0 29327
...
FilePages       28498  23957      0      0 52454
Mapped            131    130      0      0   261
AnonPages         962    757      0      0  1718
Shmem             355    323      0      0   678
KernelStack        10      5      0      0    16

Now let's look at an example of a simpler topology:

$ numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
node 0 size: 46967 MB
node 1 cpus: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31
node 1 size: 48355 MB

Since the nodes are mostly symmetrical, we can bind an instance of our application to each NUMA node with numactl --cpunodebind=X --membind=X and then expose it on a different port (see the sketch at the end of this section); that way you can get better throughput by utilizing both nodes and better latency by preserving memory locality.

You can verify NUMA placement efficiency by the latency of your memory operations, e.g. by using bcc's funclatency to measure the latency of a memory-heavy operation such as memmove. On the kernel side, you can observe efficiency by using perf stat and looking for the corresponding memory and scheduler events:

# perf stat -e sched:sched_stick_numa,sched:sched_move_numa,sched:sched_swap_numa,migrate:mm_migrate_pages,minor-faults -p PID
...
1      sched:sched_stick_numa
3      sched:sched_move_numa
41     sched:sched_swap_numa
5,239  migrate:mm_migrate_pages
50,161 minor-faults

The last bit of NUMA-related optimizations for network-heavy workloads comes from the fact that a network card is a PCIe device and each device is bound to its own NUMA node, therefore some CPUs will have lower latency when talking to the network. We'll discuss optimizations that can be applied there when we discuss NIC→CPU affinity, but for now let's switch gears to PCI-Express…
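Here is the per-node pinning described above, sketched for the two-node box. The numactl flags are the real ones; the nginx config paths and the ports they would listen on are hypothetical:

$ sudo numactl --cpunodebind=0 --membind=0 nginx -c /etc/nginx/nginx-node0.conf   # hypothetical config listening on :8080
$ sudo numactl --cpunodebind=1 --membind=1 nginx -c /etc/nginx/nginx-node1.conf   # hypothetical config listening on :8081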
PCIe

Normally you do not need to go too deep into PCIe troubleshooting unless you have some kind of hardware malfunction. Therefore it's usually worth spending minimal effort there by just creating "link width", "link speed", and possibly RxErr/BadTLP alerts for your PCIe devices. This should save you troubleshooting hours caused by broken hardware or failed PCIe negotiation. You can use lspci for that:

# lspci -s 0a:00.0 -vvv
...
LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM L1, Exit Latency L0s <2us, L1 <16us
LnkSta: Speed 8GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
...
Capabilities: [100 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- ...
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- ...
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- ...
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+

PCIe may become a bottleneck though if you have multiple high-speed devices competing for the bandwidth (e.g. when you combine fast network with fast storage), therefore you may need to physically shard your PCIe devices across CPUs to get maximum throughput.

(PCIe generation/speed table source: https://en.wikipedia.org/wiki/PCI_Express#History_and_revisions)

Also see the article "Understanding PCIe Configuration for Maximum Performance" on the Mellanox website, which goes a bit deeper into PCIe configuration and may be helpful at higher speeds if you observe packet loss between the card and the OS.

Intel suggests that sometimes PCIe power management (ASPM) may lead to higher latencies and therefore higher packet loss. You can disable it by adding pcie_aspm=off to the kernel cmdline.

NIC

Before we start, it's worth mentioning that both Intel and Mellanox have their own performance tuning guides, and regardless of the vendor you pick it's beneficial to read both of them. Also, drivers usually come with their own README and a set of useful utilities.

The next place to check for guidelines is your operating system's manuals, e.g. the Red Hat Enterprise Linux Network Performance Tuning Guide, which explains most of the optimizations mentioned below and even more. Cloudflare also has a good article about tuning that part of the network stack on their blog, though it is mostly aimed at low-latency use-cases.

When optimizing NICs, ethtool will be your best friend. A small note here: if you are using a newer kernel (and you really should!) you should also bump some parts of your userland, e.g. for network operations you probably want newer versions of the ethtool, iproute2, and maybe iptables/nftables packages.

Valuable insight into what is happening with your network card can be obtained via ethtool -S:

$ ethtool -S eth0 | egrep 'miss|over|drop|lost|fifo'
rx_dropped: 0
tx_dropped: 0
port.rx_dropped: 0
port.tx_dropped_link_down: 0
port.rx_oversize: 0
port.arq_overflows: 0

Consult your NIC manufacturer for a detailed stats description, e.g. Mellanox has a dedicated wiki page for them. From the kernel side of things you'll be looking at /proc/interrupts, /proc/softirqs, and /proc/net/softnet_stat. There are two useful bcc tools here: hardirqs and softirqs. Your goal in optimizing the network is to tune the system until you have minimal CPU usage while having no packet loss.

Interrupt Affinity

Tunings here usually start with spreading interrupts across the processors. How specifically you should do that depends on your workload:

For maximum throughput you can distribute interrupts across all NUMA nodes in the system.
To minimize latency you can limit interrupts to a single NUMA node. To do that you may need to reduce the number of queues to fit into a single node (this usually implies cutting their number in half with ethtool -L).

Vendors usually provide scripts to do that, e.g. Intel has set_irq_affinity.

Ring buffer sizes

Network cards need to exchange information with the kernel. This is usually done through a data structure called a "ring"; the current/maximum size of that ring can be viewed via ethtool -g:

$ ethtool -g eth0
Ring parameters for eth0:
Pre-set maximums:
RX: 4096
TX: 4096
Current hardware settings:
RX: 4096
TX: 4096

You can adjust these values within the pre-set maximums with -G.
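For example, bumping both rings to the pre-set maximums reported above (a sketch; eth0 and 4096 come from the sample output, substitute your own interface and limits):

$ sudo ethtool -G eth0 rx 4096 tx 4096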
Generally bigger is better here (especially if you are using interrupt coalescing), since it will give you more protection against bursts and in-kernel hiccups, therefore reducing the amount of packets dropped due to no buffer space or missed interrupts. But there are a couple of caveats:

On older kernels, or drivers without BQL support, high values may contribute to higher bufferbloat on the tx side.
Bigger buffers will also increase cache pressure, so if you are experiencing that, try lowering them.

Coalescing

Interrupt coalescing allows you to delay notifying the kernel about new events by aggregating multiple events into a single interrupt. The current setting can be viewed via ethtool -c:

$ ethtool -c eth0
Coalesce parameters for eth0:
...
rx-usecs: 50
tx-usecs: 50

You can either go with static limits, hard-limiting the maximum number of interrupts per second per core, or depend on the hardware to automatically adjust the interrupt rate based on the throughput. Enabling coalescing (with -C) will increase latency and possibly introduce packet loss, so you may want to avoid it for latency-sensitive workloads. On the other hand, disabling it completely may lead to interrupt throttling and therefore limit your performance.

Offloads

Modern network cards are relatively smart and can offload a great deal of work to hardware or emulate that offload in the drivers themselves. All possible offloads can be obtained with ethtool -k:

$ ethtool -k eth0
Features for eth0:
...
tcp-segmentation-offload: on
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]

In the output all non-tunable offloads are marked with the [fixed] suffix. There is a lot to say about all of them, but here are some rules of thumb:

Do not enable LRO, use GRO instead.
Be cautious about TSO, since it highly depends on the quality of your drivers/firmware.
Do not enable TSO/GSO on old kernels, since it may lead to excessive bufferbloat.

Packet Steering

All modern NICs are optimized for multi-core hardware, therefore they internally split packets into virtual queues, usually one per CPU. When it is done in hardware it is called RSS; when the OS is responsible for loadbalancing packets across CPUs it is called RPS (with its TX counterpart called XPS). When the OS also tries to be smart and route flows to the CPUs that are currently handling that socket, it is called RFS. When hardware does that, it is called "Accelerated RFS" or aRFS for short.

Here are a couple of best practices from our production (see the sketch after this list):

If you are using newer 25G+ hardware it probably has enough queues and a huge indirection table to be able to just RSS across all your cores. Some older NICs have limitations of only utilizing the first 16 CPUs.
You can try enabling RPS if you have more CPUs than hardware queues and you want to sacrifice latency for throughput, or if you are using internal tunneling (e.g. GRE/IPinIP) that the NIC can't RSS.
Do not enable RPS if your CPU is quite old and does not have x2APIC.
Binding each CPU to its own TX queue through XPS is generally a good idea.
The effectiveness of RFS is highly dependent on your workload and whether you apply CPU affinity to it.
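RPS and XPS are configured per queue through sysfs bitmasks; a minimal sketch (interface name, queue numbers, and CPU masks are illustrative only):

# steer rx queue 0 to CPUs 0-3 (hex mask 0f) and pin tx queue 0 to CPU 0 (mask 01)
$ echo 0f | sudo tee /sys/class/net/eth0/queues/rx-0/rps_cpus
$ echo 01 | sudo tee /sys/class/net/eth0/queues/tx-0/xps_cpus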
Flow Director and ATR

Enabled flow director (or fdir in Intel terminology) operates by default in an Application Targeting Routing mode, which implements aRFS by sampling packets and steering flows to the core where they presumably are being handled. Its stats are also accessible through ethtool -S:

$ ethtool -S eth0 | egrep 'fdir'
port.fdir_flush_cnt: 0
...

Though Intel claims that fdir increases performance in some cases, external research suggests that it can also introduce up to 1% of packet reordering, which can be quite damaging for TCP performance. Therefore try testing it for yourself and see if FD is useful for your workload, while keeping an eye on the TCPOFOQueue counter.

Operating Systems: Network Stack

There are countless books, videos, and tutorials for tuning the Linux networking stack. And sadly tons of "sysctl.conf cargo-culting" that comes with them. Even though recent kernel versions do not require as much tuning as they used to 10 years ago, and most of the new TCP/IP features are enabled and well-tuned by default, people are still copy-pasting the old sysctl.conf that they used to tune 2.6.18/2.6.32 kernels.

To verify the effectiveness of network-related optimizations you should:

Collect system-wide TCP metrics via /proc/net/snmp and /proc/net/netstat.
Aggregate per-connection metrics obtained either from ss -n --extended --info, or from calling getsockopt(TCP_INFO)/getsockopt(TCP_CC_INFO) inside your webserver.
Run tcptrace(1) analysis of sampled TCP flows.
Analyze RUM metrics from the app/browser.

For sources of information about network optimizations, I usually enjoy conference talks by CDN folks since they generally know what they are doing, e.g. Fastly on LinuxCon Australia. Listening to what Linux kernel devs say about networking is quite enlightening too, for example netdevconf talks and NETCONF transcripts. It's worth highlighting the good deep-dives into the Linux networking stack by PackageCloud, especially since they put an accent on monitoring instead of blindly tuning things:

Monitoring and Tuning the Linux Networking Stack: Receiving Data
Monitoring and Tuning the Linux Networking Stack: Sending Data

Before we start, let me state it one more time: upgrade your kernel! There are tons of new network stack improvements, and I'm not even talking about IW10 (which is so 2010). I am talking about new hotness like TSO autosizing, FQ, pacing, TLP, and RACK, but more on that later. As a bonus, by upgrading to a new kernel you'll get a bunch of scalability improvements, e.g.: removed routing cache, lockless listen sockets, SO_REUSEPORT, and many more.

Overview

From the recent Linux networking papers the one that stands out is "Making Linux TCP Fast." It manages to consolidate multiple years of Linux kernel improvements on 4 pages by breaking down the Linux sender-side TCP stack into functional pieces:

Fair Queueing and Pacing

Fair Queueing is responsible for improving fairness and reducing head-of-line blocking between TCP flows, which positively affects packet drop rates. Pacing schedules packets at the rate set by the congestion control, equally spaced over time, which reduces packet loss even further, therefore increasing throughput.

As a side note: Fair Queueing and Pacing are available in Linux via the fq qdisc. Some of you may know that these are a requirement for BBR (not anymore though), but both of them can be used with CUBIC, yielding up to a 15-20% reduction in packet loss and therefore better throughput on loss-based CCs. Just don't use it in older kernels (< 3.19), since you will end up pacing pure ACKs and cripple your uploads/RPCs.

TSO autosizing and TSQ

Both of these are responsible for limiting buffering inside the TCP stack and hence reducing latency, without sacrificing throughput.
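A minimal sketch of turning on fq (and therefore pacing) on an interface, plus checking which congestion control is currently in use. The interface name eth0 is an assumption; on recent kernels you can also make fq the default qdisc via net.core.default_qdisc:

$ sudo tc qdisc replace dev eth0 root fq
$ sudo sysctl -w net.core.default_qdisc=fq
$ sysctl net.ipv4.tcp_congestion_control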
Congestion Control

CC algorithms are a huge subject by themselves, and there was a lot of activity around them in recent years. Some of that activity was codified as: tcp_cdg (CAIA), tcp_nv (Facebook), and tcp_bbr (Google). We won't go too deep into discussing their inner workings; let's just say that all of them rely more on delay increases than on packet drops for a congestion indication.

BBR is arguably the most well-documented, tested, and practical out of all the new congestion controls. The basic idea is to create a model of the network path based on packet delivery rate and then execute control loops to maximize bandwidth while minimizing rtt. This is exactly what we are looking for in our proxy stack.

Preliminary data from BBR experiments on our Edge PoPs show an increase of file download speeds:

[Figure: 6-hour TCP BBR experiment in a Tokyo PoP; x-axis — time, y-axis — client download speed]

Here I want to stress that we observe a speed increase across all percentiles. That is not the case for backend changes. These usually only benefit p90+ users (the ones with the fastest internet connectivity), since we consider everyone else to be bandwidth-limited already. Network-level tunings like changing congestion control or enabling FQ/pacing show that users are not being bandwidth-limited but, if I can say this, they are "TCP-limited."

If you want to know more about BBR, APNIC has a good entry-level overview of BBR (and its comparison to loss-based congestion controls). For more in-depth information on BBR you probably want to read through the bbr-dev mailing list archives (it has a ton of useful links pinned at the top). For people interested in congestion control in general it may be fun to follow the Internet Congestion Control Research Group activity.

ACK Processing and Loss Detection

But enough about congestion control, let's talk about loss detection; here once again running the latest kernel will help quite a bit. New heuristics like TLP and RACK are constantly being added to TCP, while the old stuff like FACK and ER is being retired. Once added, they are enabled by default, so you do not need to tune any system settings after the upgrade.

Userspace prioritization and HOL

Userspace socket APIs provide implicit buffering and no way to re-order chunks once they are sent, therefore in multiplexed scenarios (e.g. HTTP/2) this may result in HOL blocking and inversion of h2 priorities. The TCP_NOTSENT_LOWAT socket option (and the corresponding net.ipv4.tcp_notsent_lowat sysctl) was designed to solve this problem by setting a threshold at which the socket considers itself writable (i.e. epoll will lie to your app). This can solve problems with HTTP/2 prioritization, but it can also potentially negatively affect throughput, so you know the drill—test it yourself.

Sysctls

One does not simply give a networking optimization talk without mentioning sysctls that need to be tuned. But let me first start with the stuff you don't want to touch:

net.ipv4.tcp_tw_recycle=1—don't use it—it was already broken for users behind NAT, but if you upgrade your kernel, it will be broken for everyone.
net.ipv4.tcp_timestamps=0—don't disable them unless you know all the side effects and you are OK with them. For example, one of the non-obvious side effects is that you will lose window scaling and SACK options on syncookies.

As for sysctls that you should be using (a consolidated sketch follows below):

net.ipv4.tcp_slow_start_after_idle=0—the main problem with slowstart after idle is that "idle" is defined as one RTO, which is too small.
net.ipv4.tcp_mtu_probing=1—useful if there are ICMP blackholes between you and your clients (most likely there are).
net.ipv4.tcp_rmem, net.ipv4.tcp_wmem—should be tuned to fit the BDP; just don't forget that bigger isn't always better.
echo 2 > /sys/module/tcp_cubic/parameters/hystart_detect—if you are using fq+cubic, this might help with tcp_cubic exiting slow-start too early.

It's also worth noting that there is an RFC draft (though a bit inactive) from the author of curl, Daniel Stenberg, named TCP Tuning for HTTP, that tries to aggregate all system tunings that may be beneficial to HTTP in a single place.
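Consolidating the recommendations above into a drop-in file might look roughly like this (a sketch; the tcp_rmem/tcp_wmem values are illustrative, size the max to your BDP):

# /etc/sysctl.d/90-tcp-tuning.conf -- illustrative values, apply with `sysctl --system`
net.ipv4.tcp_slow_start_after_idle = 0
net.ipv4.tcp_mtu_probing = 1
# min, default, max in bytes; the max should roughly fit your BDP
net.ipv4.tcp_rmem = 4096 131072 16777216
net.ipv4.tcp_wmem = 4096 16384 16777216

The hystart_detect knob lives under /sys/module/tcp_cubic/parameters/ and is set separately, as shown above.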
Application level: Midlevel

Tooling

Just like with the kernel, having up-to-date userspace is very important. You should start with upgrading your tools; for example you can package newer versions of perf, bcc, etc. Once you have new tooling you are ready to properly tune and observe the behavior of the system. Throughout this part of the post we'll be mostly relying on on-CPU profiling with perf top, on-CPU flamegraphs, and ad-hoc histograms from bcc's funclatency.

Compiler Toolchain

Having a modern compiler toolchain is essential if you want to compile hardware-optimized assembly, which is present in many libraries commonly used by web servers. Aside from the performance, newer compilers have new security features (e.g. -fstack-protector-strong or SafeStack) that you want applied on the edge. The other use case for modern toolchains is when you want to run your test harnesses against binaries compiled with sanitizers (e.g. AddressSanitizer, and friends).

System libraries

It's also worth upgrading system libraries, like glibc, since otherwise you may be missing out on recent optimizations in low-level functions from -lc, -lm, -lrt, etc. The test-it-yourself warning also applies here, since occasional regressions creep in.

Zlib

Normally the web server would be responsible for compression. Depending on how much data is going through that proxy, you may occasionally see zlib's symbols in perf top, e.g.:

# perf top
...
8.88% nginx [.] longest_match
8.29% nginx [.] deflate_slow
1.90% nginx [.] compress_block

There are ways of optimizing that at the lowest levels: both Intel and Cloudflare, as well as the standalone zlib-ng project, have zlib forks which provide better performance by utilizing new instruction sets.

Malloc

We've been mostly CPU-oriented when discussing optimizations up until now, but let's switch gears and discuss memory-related optimizations. If you use lots of Lua with FFI or heavy third-party modules that do their own memory management, you may observe increased memory usage due to fragmentation. You can try solving that problem by switching to either jemalloc or tcmalloc. Using a custom malloc also has the following benefits:

Separating your nginx binary from the environment, so that glibc version upgrades and OS migrations will affect it less.
Better introspection, profiling, and stats.

PCRE

If you use many complex regular expressions in your nginx configs or heavily rely on Lua, you may see pcre-related symbols in perf top. You can optimize that by compiling PCRE with JIT, and also enabling it in nginx via pcre_jit on;. You can check the result of the optimization by either looking at flame graphs or using funclatency:

# funclatency /srv/nginx-bazel/sbin/nginx:ngx_http_regex_exec -u
...
usecs : count distribution
0 -> 1 : 1159 |**********                              |
2 -> 3 : 4468 |****************************************|
4 -> 7 : 622 |*****                                   |
8 -> 15 : 610 |*****                                   |
16 -> 31 : 209 |*                                       |
32 -> 63 : 91 |                                        |

TLS

If you are terminating TLS on the edge without being fronted by a CDN, then TLS performance optimizations may be highly valuable. When discussing tunings we'll be mostly focusing on server-side efficiency.

So, nowadays the first thing you need to decide is which TLS library to use: vanilla OpenSSL, OpenBSD's LibreSSL, or Google's BoringSSL. After picking the TLS library flavor, you need to properly build it: OpenSSL, for example, has a bunch of build-time heuristics that enable optimizations based on the build environment; BoringSSL has deterministic builds, but sadly is way more conservative and just disables some optimizations by default. Anyway, here is where choosing a modern CPU should finally pay off: most TLS libraries can utilize everything from AES-NI and SSE to ADX and AVX512. You can use the built-in performance tests that come with your TLS library, e.g. in BoringSSL's case it's bssl speed.

Most of the performance comes not from the hardware you have, but from the cipher-suites you are going to use, so you have to optimize them carefully. Also know that changes here can (and will!) affect the security of your web server—the fastest ciphersuites are not necessarily the best. If unsure what encryption settings to use, Mozilla SSL Configuration Generator is a good place to start.

Asymmetric Encryption

If your service is on the edge, then you may observe a considerable amount of TLS handshakes and therefore have a good chunk of your CPU consumed by asymmetric crypto, making it an obvious target for optimization. To optimize server-side CPU usage you can switch to ECDSA certs, which are generally 10x faster than RSA. Also they are considerably smaller, so they may speed up the handshake in the presence of packet loss. But ECDSA is also heavily dependent on the quality of your system's random number generator, so if you are using OpenSSL, be sure to have enough entropy (with BoringSSL you do not need to worry about that).

As a side note, it's worth mentioning that bigger is not always better, e.g. using 4096-bit RSA certs will degrade your performance by 10x:

$ bssl speed
Did 1517 RSA 2048 signing ... (1507.3 ops/sec)
Did 160 RSA 4096 signing ... (153.4 ops/sec)

To make it worse, smaller isn't necessarily the best choice either: by using the non-common p-224 field for ECDSA you'll get 60% worse performance compared to the more common p-256:

$ bssl speed
Did 7056 ECDSA P-224 signing ... (6831.1 ops/sec)
Did 17000 ECDSA P-256 signing ... (16885.3 ops/sec)

The rule of thumb here is that the most commonly used encryption is generally the most optimized one. When running a properly optimized OpenSSL-based library using RSA certs, you should see the following traces in your perf top. AVX2-capable, but not ADX-capable boxes (e.g. Haswell) should use the AVX2 codepath:

6.42% nginx [.] rsaz_1024_sqr_avx2
1.61% nginx [.] rsaz_1024_mul_avx2

While newer hardware should use a generic montgomery multiplication with the ADX codepath:

7.08% nginx [.] sqrx8x_internal
2.30% nginx [.] mulx4x_internal

Symmetric Encryption

If you have lots of bulk transfers like videos, photos, or more generically files, then you may start observing symmetric encryption symbols in the profiler's output. Here you just need to make sure that your CPU has AES-NI support and you set your server-side preferences for AES-GCM ciphers.
Properly tuned hardware should have the following in perf top:

8.47% nginx [.] aesni_ctr32_ghash_6x

But it's not only your servers that will need to deal with encryption/decryption—your clients will share the same burden on a way less capable CPU. Without hardware acceleration this may be quite challenging, therefore you may consider using an algorithm that was designed to be fast without hardware acceleration, e.g. ChaCha20-Poly1305. This will reduce TTLB for some of your mobile clients. ChaCha20-Poly1305 is supported in BoringSSL out of the box; for OpenSSL 1.0.2 you may consider using the Cloudflare patches. BoringSSL also supports "equal preference cipher groups," so you may use the following config to let clients decide what ciphers to use based on their hardware capabilities (shamelessly stolen from cloudflare/sslconfig):

ssl_ciphers '[ECDHE-ECDSA-AES128-GCM-SHA256|ECDHE-ECDSA-CHACHA20-POLY1305|ECDHE-RSA-AES128-GCM-SHA256|ECDHE-RSA-CHACHA20-POLY1305]:ECDHE+AES128:RSA+AES128:ECDHE+AES256:RSA+AES256:ECDHE+3DES:RSA+3DES';
ssl_prefer_server_ciphers on;

Application level: Highlevel

To analyze the effectiveness of your optimizations on that level you will need to collect RUM data. In browsers you can use the Navigation Timing and Resource Timing APIs. Your main metrics are TTFB and TTV/TTI. Having that data in easily queryable and graphable formats will greatly simplify iteration.

Compression

Compression in nginx starts with the mime.types file, which defines the default correspondence between file extension and response MIME type. Then you need to define what types you want to pass to your compressor with e.g. gzip_types. If you want the complete list you can use mime-db to autogenerate your mime.types and to add those with .compressible == true to gzip_types.

When enabling gzip, be careful about two aspects of it:

Increased memory usage. This can be solved by limiting gzip_buffers.
Increased TTFB due to the buffering. This can be solved by using [gzip_no_buffer].

As a side note, http compression is not limited to gzip: nginx has a third-party ngx_brotli module that can improve compression ratio by up to 30% compared to gzip.

As for compression settings themselves, let's discuss two separate use-cases: static and dynamic data (a minimal config sketch follows below).

For static data you can achieve maximum compression ratios by pre-compressing your static assets as a part of the build process. We discussed that in some detail in the Deploying Brotli for static content post, for both gzip and brotli.

For dynamic data you need to carefully balance a full roundtrip: time to compress the data + time to transfer it + time to decompress on the client. Therefore setting the highest possible compression level may be unwise, not only from a CPU usage perspective, but also from TTFB.
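A minimal nginx sketch of those settings (the type list is trimmed for brevity, the compression levels are illustrative, and the brotli_* directives require the third-party ngx_brotli module mentioned above):

gzip on;
gzip_comp_level 4;
gzip_types text/plain text/css application/json application/javascript image/svg+xml;
# only with ngx_brotli compiled in:
brotli on;
brotli_comp_level 4;
brotli_types text/plain text/css application/json application/javascript image/svg+xml;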
Buffering

Buffering inside the proxy can greatly affect web server performance, especially with respect to latency. The nginx proxy module has various buffering knobs that are togglable on a per-location basis, each of them useful for its own purpose. You can separately control buffering in both directions via proxy_request_buffering and proxy_buffering. If buffering is enabled, the upper limit on memory consumption is set by client_body_buffer_size and proxy_buffers; after hitting these thresholds the request/response is buffered to disk. For responses this can be disabled by setting proxy_max_temp_file_size to 0.

The most common approaches to buffering are:

Buffer the request/response up to some threshold in memory and then overflow to disk. If request buffering is enabled, you only send a request to the backend once it is fully received, and with response buffering, you can instantaneously free a backend thread once it is ready with the response. This approach has the benefits of improved throughput and backend protection at the cost of increased latency and memory/IO usage (though if you use SSDs that may not be much of a problem).
No buffering. Buffering may not be a good choice for latency-sensitive routes, especially ones that use streaming. For them you may want to disable it, but now your backend needs to deal with slow clients (incl. malicious slow-POST/slow-read kinds of attacks).
Application-controlled response buffering through the X-Accel-Buffering header.

Whatever path you choose, do not forget to test its effect on both TTFB and TTLB. Also, as mentioned before, buffering can affect IO usage and even backend utilization, so keep an eye out for that too.

TLS

Now we are going to talk about high-level aspects of TLS and latency improvements that can be achieved by properly configuring nginx. Most of the optimizations I'll be mentioning are covered in High Performance Browser Networking's "Optimizing for TLS" section and the Making HTTPS Fast(er) talk at nginx.conf 2014. Tunings mentioned in this part will affect both the performance and security of your web server; if unsure, please consult Mozilla's Server Side TLS Guide and/or your Security Team.

To verify the results of optimizations you can use:

WebpageTest for impact on performance.
SSL Server Test from Qualys, or Mozilla TLS Observatory for impact on security.

Session resumption

As DBAs love to say, "the fastest query is the one you never make." The same goes for TLS—you can reduce latency by one RTT if you cache the result of the handshake. There are two ways of doing that:

You can ask the client to store all session parameters (in a signed and encrypted way), and send them to you during the next handshake (similar to a cookie). On the nginx side this is configured via the ssl_session_tickets directive. This does not consume any memory on the server side but has a number of downsides:

You need the infrastructure to create, rotate, and distribute random encryption/signing keys for your TLS sessions. Just remember that you really shouldn't 1) use source control to store ticket keys or 2) generate these keys from other non-ephemeral material, e.g. a date or the cert.
PFS won't be on a per-session basis but on a per-tls-ticket-key basis, so if an attacker gets a hold of the ticket key, they can potentially decrypt any captured traffic for the duration of the ticket.
Your encryption will be limited to the size of your ticket key. It does not make much sense to use AES256 if you are using a 128-bit ticket key. Nginx supports both 128-bit and 256-bit TLS ticket keys.
Not all clients support ticket keys (all modern browsers do support them though).

Or you can store TLS session parameters on the server and only give a reference (an id) to the client. This is done via the ssl_session_cache directive. It has the benefit of preserving PFS between sessions and greatly limiting the attack surface. Though this approach has its own downsides:

Sessions consume ~256 bytes of memory each on the server, which means you can't store many of them for too long.
They cannot be easily shared between servers.
Therefore you either need a loadbalancer which will send the same client to the same server to preserve cache locality, or you need to write a distributed TLS session storage on top of something like ngx_http_lua_module.

As a side note, if you go with the session ticket approach, then it's worth using 3 keys instead of one, e.g.:

ssl_session_tickets on;
ssl_session_timeout 1h;
ssl_session_ticket_key /run/nginx-ephemeral/nginx_session_ticket_curr;
ssl_session_ticket_key /run/nginx-ephemeral/nginx_session_ticket_prev;
ssl_session_ticket_key /run/nginx-ephemeral/nginx_session_ticket_next;

You will always be encrypting with the current key, but accepting sessions encrypted with both the next and the previous keys.

OCSP Stapling

You should staple your OCSP responses, since otherwise:

Your TLS handshake may take longer because the client will need to contact the certificate authority to fetch OCSP status.
An OCSP fetch failure may result in an availability hit.
You may compromise users' privacy since their browser will contact a third-party service indicating that they want to connect to your site.

To staple the OCSP response you can periodically fetch it from your certificate authority, distribute the result to your web servers, and use it with the ssl_stapling_file directive:

ssl_stapling_file /var/cache/nginx/ocsp/www.der;

TLS record size

TLS breaks data into chunks called records, which you can't verify and decrypt until you receive them in their entirety. You can measure this latency as the difference between TTFB from the network stack and application points of view.

By default nginx uses 16k chunks, which do not even fit into the IW10 congestion window and therefore require an additional roundtrip. Out of the box, nginx provides a way to set record sizes via the ssl_buffer_size directive:

To optimize for low latency you should set it to something small, e.g. 4k. Decreasing it further will be more expensive from a CPU usage perspective.
To optimize for high throughput you should leave it at 16k.

There are two problems with static tuning:

You need to tune it manually.
You can only set ssl_buffer_size on a per-nginx-config or per-server-block basis, therefore if you have a server with mixed latency/throughput workloads you'll need to compromise.

There is an alternative approach: dynamic record size tuning. There is an nginx patch from Cloudflare that adds support for dynamic record sizes. It may be a pain to initially configure it, but once you are done with it, it works quite nicely.

TLS 1.3

TLS 1.3 features indeed sound very nice, but unless you have the resources to be troubleshooting TLS full-time I would suggest not enabling it, since:

It is still a draft.
The 0-RTT handshake has some security implications. And your application needs to be ready for it.
There are still middleboxes (antiviruses, DPIs, etc.) that block unknown TLS versions.

Avoid Eventloop Stalls

Nginx is an eventloop-based web server, which means it can only do one thing at a time. Even though it seems that it does all of these things simultaneously, like in time-division multiplexing, all nginx does is quickly switch between events, handling one after another. It all works because handling each event takes only a couple of microseconds. But if it starts taking too much time, e.g. because it requires going to a spinning disk, latency can skyrocket.

If you start noticing that your nginx is spending too much time inside the ngx_process_events_and_timers function, and the distribution is bimodal, then you are probably affected by eventloop stalls.
# funclatency '/srv/nginx-bazel/sbin/nginx:ngx_process_events_and_timers' -m
msecs : count distribution
0 -> 1 : 3799 |****************************************|
2 -> 3 : 0 |                                        |
4 -> 7 : 0 |                                        |
8 -> 15 : 0 |                                        |
16 -> 31 : 409 |****                                    |
32 -> 63 : 313 |***                                     |
64 -> 127 : 128 |*                                       |

AIO and Threadpools

Since the main source of eventloop stalls, especially on spinning disks, is IO, you should probably look there first. You can measure how much you are affected by it by running fileslower:

# fileslower 10
Tracing sync read/writes slower than 10 ms
TIME(s) COMM TID D BYTES LAT(ms) FILENAME
2.642 nginx 69097 R 5242880 12.18 0002121812
4.760 nginx 69754 W 8192 42.08 0002121598
4.760 nginx 69435 W 2852 42.39 0002121845
4.760 nginx 69088 W 2852 41.83 0002121854

To fix this, nginx has support for offloading IO to a threadpool (it also has support for AIO, but native AIO in Unixes has lots of quirks, so it is better to avoid it unless you know what you are doing). A basic setup consists of simply:

aio threads;
aio_write on;

For more complicated cases you can set up custom thread_pool's, e.g. one per disk, so that if one drive becomes wonky, it won't affect the rest of the requests. Thread pools can greatly reduce the number of nginx processes stuck in D state, improving both latency and throughput. But they won't eliminate eventloop stalls fully, since not all IO operations are currently offloaded to them.

Logging

Writing logs can also take a considerable amount of time, since it is hitting disks. You can check whether that's the case by running ext4slower and looking for access/error log references:

# ext4slower 10
TIME COMM PID T BYTES OFF_KB LAT(ms) FILENAME
06:26:03 nginx 69094 W 163070 634126 18.78 access.log
06:26:08 nginx 69094 W 151 126029 37.35 error.log
06:26:13 nginx 69082 W 153168 638728 159.96 access.log

It is possible to work around this by spooling access logs in memory before writing them, by using the buffer parameter of the access_log directive. By using the gzip parameter you can also compress the logs before writing them to disk, reducing IO pressure even more. But to fully eliminate IO stalls on log writes you should just write logs via syslog; this way logs will be fully integrated with the nginx eventloop (see the sketch at the end of this section).

Open file cache

Since open(2) calls are inherently blocking and web servers are routinely opening/reading/closing files, it may be beneficial to have a cache of open files. You can see how much benefit there is by looking at ngx_open_cached_file function latency:

# funclatency /srv/nginx-bazel/sbin/nginx:ngx_open_cached_file -u
usecs : count distribution
0 -> 1 : 10219 |****************************************|
2 -> 3 : 21 |                                        |
4 -> 7 : 3 |                                        |
8 -> 15 : 1 |                                        |

If you see that either there are too many open calls or there are some that take too much time, you can look at enabling the open file cache:

open_file_cache max=10000;
open_file_cache_min_uses 2;
open_file_cache_errors on;

After enabling open_file_cache you can observe all the cache misses by looking at opensnoop and deciding whether you need to tune the cache limits:

# opensnoop -n nginx
PID COMM FD ERR PATH
69435 nginx 311 0 /srv/site/assets/serviceworker.js
69086 nginx 158 0 /srv/site/error/404.html
...
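Returning to the Logging subsection above, a minimal sketch of both options (the log path, format name, and syslog socket are illustrative):

# buffer log writes in memory and gzip them before they hit the disk
access_log /var/log/nginx/access.log.gz combined buffer=64k gzip flush=1m;
# or bypass local disk IO entirely by logging to syslog
access_log syslog:server=unix:/dev/log,tag=nginx combined;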
Wrapping up

All optimizations that were described in this post are local to a single web server box. Some of them improve scalability and performance. Others are relevant if you want to serve requests with minimal latency or deliver bytes faster to the client. But in our experience a huge chunk of user-visible performance comes from more high-level optimizations that affect the behavior of the Dropbox Edge Network as a whole, like ingress/egress traffic engineering and smarter Internal Load Balancing. These problems are on the edge (pun intended) of knowledge, and the industry has only just started approaching them.

If you've read this far you probably want to work on solving these and other interesting problems! You're in luck: Dropbox is looking for experienced SWEs, SREs, and Managers.

Source: https://blogs.dropbox.com/tech/2017/09/optimizing-web-servers-for-high-throughput-and-low-latency/
    2 points
  3. AWSBucketDump

AWSBucketDump is a tool to quickly enumerate AWS S3 buckets to look for loot. It's similar to a subdomain bruteforcer but is made specifically for S3 buckets and also has some extra features that allow you to grep for delicious files as well as download interesting files if you're not afraid to quickly fill up your hard drive.

@ok_bye_now

Pre-Requisites

Non-Standard Python Libraries:
xmltodict
requests
argparse

Created with Python 3.6

General

This is a tool that enumerates Amazon S3 buckets and looks for interesting files. I have example wordlists but I haven't put much time into refining them. https://github.com/danielmiessler/SecLists will have all the word lists you need. If you are targeting a specific company, you will likely want to use jhaddix's enumall tool, which leverages recon-ng and Alt-DNS: https://github.com/jhaddix/domain && https://github.com/infosec-au/altdns

As far as word lists for grepping interesting files, that is completely up to you. The one I provided has some basics and yes, those word lists are based on files that I personally have found with this tool.

Using the download feature might fill your hard drive up; you can provide a max file size for each download at the command line when you run the tool. Keep in mind that it is in bytes. I honestly don't know if Amazon rate limits this; I am guessing they do to some point, but I haven't gotten around to figuring out what that limit is. By default there are two threads for checking buckets and two threads for downloading.

After building this tool, I did find an interesting article from Rapid7 regarding this research: https://community.rapid7.com/community/infosec/blog/2013/03/27/1951-open-s3-buckets

Usage

usage: AWSBucketDump.py [-h] [-D] [-t THREADS] -l HOSTLIST [-g GREPWORDS] [-m MAXSIZE]

optional arguments:
-h, --help show this help message and exit
-D Download files. This requires significant diskspace
-d If set to 1 or True, create directories for each host w/ results
-t THREADS number of threads
-l HOSTLIST
-g GREPWORDS Provide a wordlist to grep for
-m MAXSIZE Maximum file size to download.

python AWSBucketDump.py -l BucketNames.txt -g interesting_Keywords.txt -D -m 500000 -d 1

Download: AWSBucketDump-master.zip or git clone https://github.com/jordanpotti/AWSBucketDump.git

Source: https://github.com/jordanpotti/AWSBucketDump
    2 points
  4. Lecture 1 - Introduction to x86 Assembly (Slides, Presentation)
Lecture 2 - Reverse Engineering Machine Code Pt. 1 (Slides, Presentation)
Lecture 3 - Reverse Engineering Machine Code Pt. 2 (Slides, Presentation)
Lecture 4 - Reverse Engineering Machine Code Pt. 3 (Slides, Presentation)
Lecture 5 - Executable File Formats (Slides, Presentation)
Lecture 6 - Modern Vulnerability Exploitation: The Stack Overflow (Slides, Presentation)
Lecture 7 - Modern Vulnerability Exploitation: The Heap Overflow (Slides, Presentation)
Lecture 8 - Modern Vulnerability Exploitation: Shellcoding (Slides, Presentation)
Lecture 9 - Modern Vulnerability Exploitation: Format String Attacks (Slides, Presentation)

Source: https://codebreaker.ltsnet.net/resources
    2 points
  5. BeRoot

BeRoot(s) is a post-exploitation tool to check common Windows misconfigurations in order to find a way to escalate our privileges. A compiled version is available here.

It will be added to the pupy project as a post-exploitation module (so it will be executed in memory without touching the disk). Except for one method, this tool is only used to detect, not to exploit. If something is found, templates could be used to exploit it. To use them, just create a test.bat file located next to the service / DLL used. It should execute it once called. Depending on the Redistributable Packages installed on the target host, these binaries may not work.

Run it

|====================================================================|
|                                                                    |
|                    Windows Privilege Escalation                    |
|                                                                    |
|                           ! BANG BANG !                            |
|                                                                    |
|====================================================================|

usage: beRoot.exe [-h] [-l] [-w] [-c CMD]

Windows Privilege Escalation

optional arguments:
-h, --help show this help message and exit
-l, --list list all softwares installed (not run by default)
-w, --write write output
-c CMD, --cmd CMD cmd to execute for the webclient check (default: whoami)

All detection methods are described in the following document.

Path containing spaces without quotes

Consider the following file path:

C:\Program Files\Some Test\binary.exe

If the path contains spaces and no quotes, Windows will try to locate and execute programs in the following order:

C:\Program.exe
C:\Program Files\Some.exe
C:\Program Files\Some Folder\binary.exe

Following this example, if the "C:\" folder is writable, it would be possible to create a malicious executable called "Program.exe". If "binary.exe" runs with high privileges, this could be a good way to escalate our privileges.

Note: BeRoot performs these checks on every service path, scheduled task and startup key located in HKLM.

How to exploit

The vulnerable path runs as:
a service: create a malicious service (or compile the service template)
a classic executable: create your own executable.

Writable directory

Consider the following file path:

C:\Program Files\Some Test\binary.exe

If the root directory of "binary.exe" is writable ("C:\Program Files\Some Test") and it runs with high privileges, it could be used to elevate our privileges.

Note: BeRoot performs these checks on every service path, scheduled task and startup key located in HKLM.

How to exploit

The service is not running: replace the legitimate service with our own, restart it, or check how it's triggered (at reboot, when another process is started, etc.).
The service is running and cannot be stopped: most exploitation will be like that; check for DLL hijacking and try to restart the service using the previous techniques.

Writable directory on %PATH%

This technique affects the following Windows versions:

6.0 => Windows Vista / Windows Server 2008
6.1 => Windows 7 / Windows Server 2008 R2
6.2 => Windows 8 / Windows Server 2012

On a classic Windows installation, when DLLs are loaded by a binary, Windows tries to locate them using the following steps:

- Directory where the binary is located
- C:\Windows\System32
- C:\Windows\System
- C:\Windows\
- Current directory where the binary has been launched
- Directories present in the %PATH% environment variable

If a directory on the %PATH% variable is writable, it would be possible to perform DLL hijacking attacks. Then, the goal would be to find a service which loads a DLL not present in any of these paths. This is the case of the default "IKEEXT" service, which loads the nonexistent "wlbsctrl.dll".
How to exploit: Create a malicious DLL called "wlbsctrl.dll" (use the DLL template) and add it to the writable directory listed in the %PATH% variable. Start the "IKEEXT" service. To start the IKEEXT service without high privileges, a technique described in the French magazine MISC 90 can be used:

Create a file as follows:

C:\Users\bob\Desktop>type test.txt
[IKEEXTPOC]
MEDIA=rastapi
Port=VPN2-0
Device=Wan Miniport (IKEv2)
DEVICE=vpn
PhoneNumber=127.0.0.1

Use the "rasdial" binary to start the IKEEXT service. Even if the connection fails, the service will have been started.

C:\Users\bob\Desktop>rasdial IKEEXTPOC test test /PHONEBOOK:test.txt

MS16-075

For French readers, I recommend the article in MISC 90, which explains in detail how it works. This vulnerability has been fixed by Microsoft with MS16-075, but many servers are still vulnerable to this kind of attack. This check was inspired by the C++ PoC available here. Here is a brief explanation (not in detail):

- Start the WebClient service (used to connect to some shares) using some magic tricks (its UUID)
- Start an HTTP server locally
- Find a service which will be used to trigger a SYSTEM NTLM hash
- Enable file tracing on this service by modifying its registry key to point to our webserver (\\127.0.0.1@port\tracing)
- Start this service
- Our HTTP server starts a negotiation to get the SYSTEM NTLM hash
- Use this hash with SMB to execute our custom payload (smbrelayx has been modified to perform this action)
- Clean everything up (stop the service, clean the registry, etc.)

How to exploit: BeRoot performs this exploitation; use the "-c" option to execute a custom command on the vulnerable host.

beRoot.exe -c "net user Zapata LaLuchaSigue /add"
beRoot.exe -c "net localgroup Administrators Zapata /add"

AlwaysInstallElevated registry key

AlwaysInstallElevated is a setting that allows non-privileged users to run Microsoft Windows Installer Package Files (MSI) with elevated (SYSTEM) permissions. To allow it, two registry entries have to be set to 1:

HKEY_CURRENT_USER\SOFTWARE\Policies\Microsoft\Windows\Installer\AlwaysInstallElevated
HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Microsoft\Windows\Installer\AlwaysInstallElevated

How to exploit: create a malicious MSI package and execute it.

Unattended Install files

These files contain all the configuration settings that were set during the installation process, some of which can include the configuration of local accounts, including Administrator accounts. They may be found at the following paths:

C:\Windows\Panther\Unattend.xml
C:\Windows\Panther\Unattended.xml
C:\Windows\Panther\Unattend\Unattended.xml
C:\Windows\Panther\Unattend\Unattend.xml
C:\Windows\System32\Sysprep\unattend.xml
C:\Windows\System32\Sysprep\Panther\unattend.xml

How to exploit: open the unattend.xml file to check if passwords are present in it.
It should look like:

<UserAccounts>
    <LocalAccounts>
        <LocalAccount>
            <Password>
                <Value>RmFrZVBhc3N3MHJk</Value>
                <PlainText>false</PlainText>
            </Password>
            <Description>Local Administrator</Description>
            <DisplayName>Administrator</DisplayName>
            <Group>Administrators</Group>
            <Name>Administrator</Name>
        </LocalAccount>
    </LocalAccounts>
</UserAccounts>

Other possible misconfigurations

Other tests are performed to check whether it is possible to:
- Modify an existing service
- Create a new service
- Modify a startup key (in HKLM)
- Modify the directory where all scheduled tasks are stored: "C:\Windows\system32\Tasks"

Special thanks

Good description of each check: https://toshellandback.com/2015/11/24/ms-priv-esc/
C++ POC: https://github.com/secruul/SysExec
Impacket, as always, awesome work: https://github.com/CoreSecurity/impacket/

Author: Alessandro ZANNI
zanni.alessandro@gmail.com

Download: BeRoot-master.zip or git clone https://github.com/AlessandroZ/BeRoot.git
Source: https://github.com/AlessandroZ/BeRoot
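To make the unquoted service path check above concrete, here is a minimal Python sketch (not part of BeRoot) that expands a path containing spaces into the candidate executables Windows would try, following the search order described earlier. The example path is the one from the post; the writability check only means anything when run on the target itself.

import ntpath
import os

def unquoted_path_candidates(path):
    # Rebuild the prefixes Windows tries first (each with ".exe" appended),
    # then fall back to the full original path.
    parts = path.split(" ")
    candidates = [" ".join(parts[:i]) + ".exe" for i in range(1, len(parts))]
    candidates.append(path)
    return candidates

service_path = r"C:\Program Files\Some Test\binary.exe"  # illustrative path from the post
for candidate in unquoted_path_candidates(service_path):
    parent = ntpath.dirname(candidate)
    hijackable = os.path.isdir(parent) and os.access(parent, os.W_OK)
    print(("[!] " if hijackable else "    ") + candidate)

Running this kind of check against every unquoted service ImagePath, scheduled task and HKLM startup key is essentially what the corresponding BeRoot module automates.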
    2 points
  6. If you have downloaded or updated the CCleaner application on your computer between August 15 and September 12 of this year from its official website, then pay attention—your computer has been compromised.

CCleaner is a popular application with over 2 billion downloads, created by Piriform and recently acquired by Avast, that allows users to clean up their system to optimize and enhance performance. Security researchers from Cisco Talos discovered that the download servers used by Avast to let users download the application were compromised by unknown hackers, who replaced the original version of the software with a malicious one and distributed it to millions of users for around a month.

This incident is yet another example of a supply-chain attack. Earlier this year, update servers of a Ukrainian company called MeDoc were also compromised in the same way to distribute the Petya ransomware, which wreaked havoc worldwide.

Avast and Piriform have both confirmed that the Windows 32-bit version of CCleaner v5.33.6162 and CCleaner Cloud v1.07.3191 were affected by the malware. Detected on 13 September, the malicious version of CCleaner contains a multi-stage malware payload that steals data from infected computers and sends it to the attackers' remote command-and-control servers.

Moreover, the unknown hackers signed the malicious installation executable (v5.33) using a valid digital signature issued to Piriform by Symantec and used a Domain Generation Algorithm (DGA), so that if the attackers' server went down, the DGA could generate new domains to receive and send stolen information.

"All of the collected information was encrypted and encoded by base64 with a custom alphabet," says Paul Yung, V.P. of Products at Piriform. "The encoded information was subsequently submitted to an external IP address 216.126.x.x (this address was hardcoded in the payload, and we have intentionally masked its last two octets here) via a HTTPS POST request."

The malicious software was programmed to collect a large amount of user data, including:
- Computer name
- List of installed software, including Windows updates
- List of all running processes
- IP and MAC addresses
- Additional information like whether the process is running with admin privileges and whether it is a 64-bit system.

According to the Talos researchers, around 5 million people download CCleaner (or Crap Cleaner) each week, which indicates that more than 20 million people could have been infected with the malicious version of the app.

"The impact of this attack could be severe given the extremely high number of systems possibly affected. CCleaner claims to have over 2 billion downloads worldwide as of November 2016 and is reportedly adding new users at a rate of 5 million a week," Talos said.

However, Piriform estimated that up to 3 percent of its users (up to 2.27 million people) were affected by the malicious installation. Affected users are strongly advised to update their CCleaner software to version 5.34 or higher in order to protect their computers from being compromised. The latest version is available for download here.

Source: http://thehackernews.com/2017/09/ccleaner-hacked-malware.html
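Since the post notes that the stolen data was "encoded by base64 with a custom alphabet", here is a minimal Python sketch of how that kind of encoding is typically handled during analysis. The custom alphabet below is a made-up placeholder, not the one used by the CCleaner payload.

import base64

STANDARD = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
CUSTOM = STANDARD[::-1]  # placeholder permutation; the real malware alphabet differs

def encode_custom(data: bytes) -> str:
    # Encode with the standard alphabet, then map each symbol to the custom one.
    return base64.b64encode(data).decode().translate(str.maketrans(STANDARD, CUSTOM))

def decode_custom(blob: str) -> bytes:
    # Reverse the mapping back to the standard alphabet before decoding.
    return base64.b64decode(blob.translate(str.maketrans(CUSTOM, STANDARD)))

sample = b"hostname|installed software|running processes"
encoded = encode_custom(sample)
assert decode_custom(encoded) == sample
print(encoded)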
    1 point
  7. You can also try Privazer: http://privazer.com/
    1 point
  8. X41 Browser Security White Paper - Tools and PoCs

X41 D-Sec GmbH (“X41”) - a research-driven IT security company - released an in-depth analysis of the three leading enterprise web browsers: Google Chrome, Microsoft Edge, and Internet Explorer.

The full paper can be downloaded at: https://browser-security.x41-dsec.de/X41-Browser-Security-White-Paper.pdf
(SHA-256: d05d9df68ad8d6cee1491896b21485a48296f14112f191253d870fae16dc17de)

Source: https://browser-security.x41-dsec.de/X41-Browser-Security-White-Paper.pdf
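If you download the paper, a few lines of Python are enough to check it against the published SHA-256; the local filename below is just whatever you saved the PDF as.

import hashlib

EXPECTED = "d05d9df68ad8d6cee1491896b21485a48296f14112f191253d870fae16dc17de"

def sha256_of(path):
    # Hash the file in chunks so large PDFs don't have to fit in memory.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()

print(sha256_of("X41-Browser-Security-White-Paper.pdf") == EXPECTED)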
    1 point
  9. Han Sahin, Wesley Gahr, September, 2017 Increased threat for Android users Since the beginning of this year, SfyLabs' threat hunters have discovered several Google Play malware campaigns using new modi operandi such as clean dropper apps that effectively evaded all antivirus and Google Play protection solutions (Bouncer & Protect) for months. Unfortunately this was not the only threat this year. Android actors such as ExoBot have also been very busy adding Remote Access Trojan capabilities (SOCKS5 and VNC) to their software in their attempt to evade fraud detection solutions of financial organizations that mainly rely on IP-based geolocation and device binding vectors. The shift of malware campaigns from desktop (Windows) to mobile (Android) seems largely related to the fact that these days most transactions are initiated from mobile devices instead of the desktop. This motivates actors to invest in developing solutions that target Android and have the same capabilities as the malware variants that have been evolving on the desktop for years. New Android banking trojan: Red Alert 2.0 The last several months a new actor has been very busy developing and distributing a new Android trojan dubbed "Red Alert 2.0" by the actor. The bot and panel (C&C) are fully written from scratch, while many other trojans are evolutions of leaked sources of older trojans. Red Alert has the same capabilities as most other Android banking trojans such as the use of overlay attacks, SMS control and contact list harvesting. There are however also other functions that have not been seen in other Android banking trojans. New attack vectors Red Alert actors are regularly adding new functionality, such as blocking and logging incoming calls of banks (see image below), which could affect the process of fraud operation departments at financials that are calling users on their infected Android phone regarding potential malicious activity. Forum post of Red Alert actor on bot update Another interesting vector is the use of Twitter to avoid losing bots when the C2 server is taken offline (NTD). When the bot fails to connect to the hardcoded C2 it will retrieve a new C2 from a Twitter account. This is something we have seen in the desktop banking malware world before, but the first time we see it happening in an Android banking trojan. All these parts are under development but it gives the reader a good idea of the mindset of the actors behind Red Alert 2.0 as a new Android bot. Technical details The following code flow is triggered when the C2 of Red Alert is unavailable (connection error): 1) Red Alert Android bot has a salt stored in strings.xml 2) The following code uses the current date combined with the salt to create a new MD5 hash of which the first 16 characters are used as a Twitter handle registered by the Red Alert actors (i.e. d8585cf920cb893a for 9/18/2017). 3) The bot then requests the Twitter page of the created handle and parses the response to obtain the new C2 server address. OVERLAY ATTACK TARGETS The interesting part of the overlay attack vector for this malware is that the targets are stored on the C2 server and the list is not sent back to the bot, making it more work to retrieve the list compared to other Android banking trojans. 
The following list is not complete but gives a good overview of most of the overlay HTML the actor has bought and developed: aib.ibank.android au.com.bankwest.mobile au.com.cua.mb au.com.mebank.banking au.com.nab.mobile au.com.newcastlepermanent au.com.suncorp.SuncorpBank com.akbank.android.apps.akbank_direkt com.anz.android.gomoney com.axis.mobile com.bankofireland.mobilebanking com.bbva.bbvacontigo com.caisseepargne.android.mobilebanking com.chase.sig.android com.citibank.mobile.au com.cm_prod.bad com.comarch.security.mobilebanking com.commbank.netbank com.csam.icici.bank.imobile com.finansbank.mobile.cepsube com.garanti.cepsubesi com.infonow.bofa com.instagram.android com.konylabs.capitalone com.konylabs.cbplpat com.latuabancaperandroid com.nearform.ptsb com.palatine.android.mobilebanking.prod com.pozitron.iscep com.sbi.SBIFreedomPlus com.snapwork.hdfc com.suntrust.mobilebanking com.tmobtech.halkbank com.unionbank.ecommerce.mobile.android com.vakifbank.mobile com.wf.wellsfargomobile com.ykb.android com.ziraat.ziraatmobil de.comdirect.android de.commerzbanking.mobil de.postbank.finanzassistent es.cm.android es.lacaixa.mobile.android.newwapicon eu.eleader.mobilebanking.pekao fr.banquepopulaire.cyberplus fr.creditagricole.androidapp fr.laposte.lapostemobile fr.lcl.android.customerarea in.co.bankofbaroda.mpassbook it.nogood.container net.bnpparibas.mescomptes org.stgeorge.bankorg.westpac.bank pl.bzwbk.bzwbk24 pl.bzwbk.mobile.tab.bzwbk24 pl.eurobank pl.ipko.mobile pl.mbank pl.millennium.corpApp src.com.idbi wit.android.bcpBankingApp.millenniumPL OVERLAY ATTACK MECHANISM Upon opening an application that is targeted by Red Alert an overlay is shown to the user. When the user tries to log in he is greeted with an error page. The credentials themselves are then sent to the C2 server. To determine when to show the overlay and which overlay to show, the topmost application is requested periodically. For Android 5.0 and higher, the malware uses Android toolbox, which is different from the implementation used by other Android trojans such as Mazar, Exobot and Bankbot. v0_3 = Runtime.getRuntime().exec("/system/bin/toolbox ps -p - P -x -c"); BufferedReader v1 = new BufferedReader(new InputStreamReader(v0_3.getInputStream())); v2 = new ArrayList(); v3 = new ArrayList(); while(true) { String v4 = v1.readLine(); if(v4 == null) { break; } ((List)v2).add(v4); } ... BOT OPERATIONS The C2 server can command a bot to perform specific actions. 
The commands found in the latest samples are listed below: a.a = new a("START_SMS_INTERCEPTION", 0, "startSmsInterception"); a.b = new a("STOP_SMS_INTERCEPTION", 1, "stopSmsInterception"); a.c = new a("SEND_SMS", 2, "sendSms"); a.d = new a("SET_DEFAULT_SMS", 3, "setDefaultSms"); a.e = new a("RESET_DEFAULT_SMS", 4, "resetDefaultSms"); a.f = new a("GET_SMS_LIST", 5, "getSmsList"); a.g = new a("GET_CALL_LIST", 6, "getCallList"); a.h = new a("GET_CONTACT_LIST", 7, "getContactList"); a.i = new a("SET_ADMIN", 8, "setAdmin"); a.j = new a("LAUNCH_APP", 9, "launchApp"); a.k = new a("BLOCK", 10, "block"); a.l = new a("SEND_USSD", 11, "sendUssd"); a.m = new a("NOTIFY", 12, "notify"); a.o = new a[]{a.a, a.b, a.c, a.d, a.e, a.f, a.g, a.h, a.i, a.j, a.k, a.l, a.m}; SAMPLES Update Flash Player Package name: com.patixof.dxtrix SHA-256: a7c9cfa4ad14b0b9f907db0a1bef626327e1348515a4ae61a20387d6ec8fea78 Update Flash Player Package name: com.acronic SHA-256: bb0c8992c9eb052934c7f341a6b7992f8bb01c078865c4e562fd9b84637c1e1b Update Flash Player Package name: com.glsoftwre.fmc SHA-256: 79424db82573e1d7e60f94489c5ca1992f8d65422dbb8805d65f418d20bbd03a Update Flash Player Package name: com.aox.exsoft SHA-256: 4d74b31907745ba0715d356e7854389830e519f5051878485c4be8779bb55736 Viber Package name: com.aox.exsoft SHA-256: 2dc19f81352e84a45bd7f916afa3353d7f710338494d44802f271e1f3d972aed Android Update Package name: com.aox.exsoft SHA-256: 307f1b6eae57b6475b4436568774f0b23aa370a1a48f3b991af9c9b336733630 Update Google Market Package name: com.aox.exsoft SHA-256: 359341b5b4306ef36343b2ed5625bbbb8c051f2957d268b57be9c84424affd29 WhatsApp Package name: com.aox.exsoft SHA-256: 9eaa3bb33c36626cd13fc94f9de88b0f390ac5219cc04a08ee5961d59bf4946b Update Flash Player Package name: com.aox.exsoft SHA-256: dc11d9eb2b09c2bf74136b313e752075afb05c2f82d1f5fdd2379e46089eb776 Update WhatsApp Package name: com.aox.exsoft SHA-256: 58391ca1e3001311efe9fba1c05c15a2b1a7e5026e0f7b642a929a8fed25b187 Android Update Package name: com.aox.exsoft SHA-256: 36cbe3344f027c2960f7ac0d661ddbefff631af2da90b5122a65c407d0182b69 Update Flash Player Package name: com.aox.exsoft SHA-256: a5db9e4deadb2f7e075ba8a3beb6d927502b76237afaf0e2c28d00bb01570fae Update Flash Player Package name: com.aox.exsoft SHA-256: 0d0490d2844726314b7569827013d0555af242dd32b7e36ff5e28da3982a4f88 Update Flash Player Package name: com.excellentsft.xss SHA-256: 3e47f075b9d0b2eb840b8bbd49017ffb743f9973c274ec04b4db209af73300d6 ebookreader Package name: com.clx.rms SHA-256: 05ea7239e4df91e7ffd57fba8cc81751836d03fa7c2c4aa1913739f023b046f0 Update Flash Player Package name: com.glsoftwre.fmc SHA-256: 9446a9a13848906ca3040e399fd84bfebf21c40825f7d52a63c7ccccec4659b7 Update Flash Player Package name: com.kmc.prod SHA-256: 3a5ddb598e20ca7dfa79a9682751322a869695c500bdfb0c91c8e2ffb02cd6da Android Update Package name: com.kmc.prod SHA-256: b83bd8c755cb7546ef28bac157e51f04257686a045bbf9d64bec7eeb9116fd8a Source
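A minimal sketch of the Twitter-based C2 fallback described earlier in this post: the salt value and the exact date format below are assumptions (the real salt ships in the bot's strings.xml), so this will not reproduce the d8585cf920cb893a handle from the example, but it shows the mechanics of deriving a handle from MD5(date + salt) and using it as a dead-drop for the new C2 address.

import hashlib
from datetime import date

SALT = "REDALERT_SALT_PLACEHOLDER"  # hypothetical; the real value is stored in the APK's strings.xml

def fallback_handle(day: date, salt: str) -> str:
    # "current date combined with the salt" -> MD5 -> first 16 hex chars
    # (the M/D/YYYY format is an assumption based on the 9/18/2017 example)
    seed = "{}/{}/{}".format(day.month, day.day, day.year) + salt
    return hashlib.md5(seed.encode()).hexdigest()[:16]

handle = fallback_handle(date(2017, 9, 18), SALT)
print("bot would fetch https://twitter.com/" + handle + " and parse the page for the new C2 address")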
    1 point
  10. In 2017, the toolbox for making sure your web page loads fast includes everything from minification and asset optimization to caching, CDNs, code splitting and tree shaking. However, you can get big performance boosts with just a few keywords and mindful code structuring, even if you’re not yet familiar with the concepts above and you’re not sure how to get started. The fresh web standard <link rel="preload">, that allows you to load critical resources faster, is coming to Firefox later this month. You can already try it out in Firefox Nightly or Developer Edition, and in the meantime, this is a great chance to review some fundamentals and dive deeper into performance associated with parsing the DOM. Understanding what goes on inside a browser is the most powerful tool for every web developer. We’ll look at how browsers interpret your code and how they help you load pages faster with speculative parsing. We’ll break down how defer and async work and how you can leverage the new keyword preload. Building blocks HTML describes the structure of a web page. To make any sense of the HTML, browsers first have to convert it into a format they understand – the Document Object Model, or DOM. Browser engines have a special piece of code called a parser that’s used to convert data from one format to another. An HTML parser converts data from HTML into the DOM. In HTML, nesting defines the parent-child relationships between different tags. In the DOM, objects are linked in a tree data structure capturing those relationships. Each HTML tag is represented by a node of the tree (a DOM node). The browser builds up the DOM bit by bit. As soon as the first chunks of code come in, it starts parsing the HTML, adding nodes to the tree structure. The DOM has two roles: it is the object representation of the HTML document, and it acts as an interface connecting the page to the outside world, like JavaScript. When you call document.getElementById(), the element that is returned is a DOM node. Each DOM node has many functions you can use to access and change it, and what the user sees changes accordingly. CSS styles found on a web page are mapped onto the CSSOM – the CSS Object Model. It is much like the DOM, but for the CSS rather than the HTML. Unlike the DOM, it cannot be built incrementally. Because CSS rules can override each other, the browser engine has to do complex calculations to figure out how the CSS code applies to the DOM. The history of the <script> tag As the browser is constructing the DOM, if it comes across a <script>...</script> tag in the HTML, it must execute it right away. If the script is external, it has to download the script first. Back in the old days, in order to execute a script, parsing had to be paused. It would only start up again after the JavaScript engine had executed code from a script. Why did the parsing have to stop? Well, scripts can change both the HTML and its product―the DOM. Scripts can change the DOM structure by adding nodes with document.createElement(). To change the HTML, scripts can add content with the notorious document.write() function. It’s notorious because it can change the HTML in ways that can affect further parsing. For example, the function could insert an opening comment tag making the rest of the HTML invalid. Scripts can also query something about the DOM, and if that happens while the DOM is still being constructed, it could return unexpected results. 
document.write() is a legacy function that can break your page in unexpected ways and you shouldn’t use it, even though browsers still support it. For these reasons, browsers have developed sophisticated techniques to get around the performance issues caused by script blocking that I will explain shortly.

What about CSS?

JavaScript blocks parsing because it can modify the document. CSS can’t modify the document, so it seems like there is no reason for it to block parsing, right? However, what if a script asks for style information that hasn’t been parsed yet? The browser doesn’t know what the script is about to execute—it may ask for something like the DOM node’s background-color which depends on the style sheet, or it may expect to access the CSSOM directly. Because of this, CSS may block parsing depending on the order of external style sheets and scripts in the document. If there are external style sheets placed before scripts in the document, the construction of DOM and CSSOM objects can interfere with each other. When the parser gets to a script tag, DOM construction cannot proceed until the JavaScript finishes executing, and the JavaScript cannot be executed until the CSS is downloaded, parsed, and the CSSOM is available.

Another thing to keep in mind is that even if the CSS doesn’t block DOM construction, it blocks rendering. The browser won’t display anything until it has both the DOM and the CSSOM. This is because pages without CSS are often unusable. If a browser showed you a messy page without CSS, then a few moments later snapped into a styled page, the shifting content and sudden visual changes would make a turbulent user experience.

See the Pen "Flash of Unstyled Content" by Milica (@micikato) on CodePen: https://codepen.io/micikato/pen/JroPNm/

That poor user experience has a name – Flash of Unstyled Content, or FOUC. To get around these issues, you should aim to deliver the CSS as soon as possible. Recall the popular “styles at the top, scripts at the bottom” best practice? Now you know why it was there!

Back to the future – speculative parsing

Pausing the parser whenever a script is encountered means that every script you load delays the discovery of the rest of the resources that were linked in the HTML. If you have a few scripts and images to load, for example–

<script src="slider.js"></script>
<script src="animate.js"></script>
<script src="cookie.js"></script>
<img src="slide1.png">
<img src="slide2.png">

–the process used to go like this: That changed around 2008 when IE introduced something they called “the lookahead downloader”. It was a way to keep downloading the files that were needed while the synchronous script was being executed. Firefox, Chrome and Safari soon followed, and today most browsers use this technique under different names. Chrome and Safari have “the preload scanner” and Firefox – the speculative parser.

The idea is: even though it’s not safe to build the DOM while executing a script, you can still parse the HTML to see what other resources need to be retrieved. Discovered files are added to a list and start downloading in the background on parallel connections. By the time the script finishes executing, the files may have already been downloaded.
The waterfall chart for the example above now looks more like this: The download requests triggered this way are called “speculative” because it is still possible that the script could change the HTML structure (remember document.write ?), resulting in wasted guesswork. While this is possible, it is not common, and that’s why speculative parsing still gives big performance improvements. While other browsers only preload linked resources this way, in Firefox the HTML parser also runs the DOM tree construction algorithm speculatively. The upside is that when a speculation succeeds, there’s no need to re-parse a part of the file to actually compose the DOM. The downside is that there’s more work lost if and when the speculation fails. (Pre)loading stuff This manner of resource loading delivers a significant performance boost, and you don’t need to do anything special to take advantage of it. However, as a web developer, knowing how speculative parsing works can help you get the most out of it. The set of things that can be preloaded varies between browsers. All major browsers preload: scripts external CSS and images from the <img> tag Firefox also preloads the poster attribute of video elements, while Chrome and Safari preload @import rules from inlined styles. There are limits to how many files a browser can download in parallel. The limits vary between browsers and depend on many factors, like whether you’re downloading all files from one or from several different servers and whether you are using HTTP/1.1 or HTTP/2 protocol. To render the page as quickly as possible, browsers optimize downloads by assigning priority to each file. To figure out these priorities, they follow complex schemes based on resource type, position in the markup, and progress of the page rendering. While doing speculative parsing, the browser does not execute inline JavaScript blocks. This means that it won’t discover any script-injected resources, and those will likely be last in line in the fetching queue. var script = document.createElement('script'); script.src = "//somehost.com/widget.js"; document.getElementsByTagName('head')[0].appendChild(script); You should make it easy for the browser to access important resources as soon as possible. You can either put them in HTML tags or include the loading script inline and early in the document. However, sometimes you want some resources to load later because they are less important. In that case, you can hide them from the speculative parser by loading them with JavaScript late in the document. You can also check out this MDN guide on how to optimize your pages for speculative parsing. defer and async Still, synchronous scripts blocking the parser remains an issue. And not all scripts are equally important for the user experience, such as those for tracking and analytics. Solution? Make it possible to load these less important scripts asynchronously. The defer and async attributes were introduced to give developers a way to tell the browser which scripts to handle asynchronously. Both of these attributes tell the browser that it may go on parsing the HTML while loading the script “in background”, and then execute the script after it loads. This way, script downloads don’t block DOM construction and page rendering. Result: the user can see the page before all scripts have finished loading. The difference between defer and async is which moment they start executing the scripts. defer was introduced before async. 
Its execution starts after parsing is completely finished, but before the DOMContentLoaded event. It guarantees scripts will be executed in the order they appear in the HTML and will not block the parser. async scripts execute at the first opportunity after they finish downloading and before the window’s load event. This means it’s possible (and likely) that async scripts are not executed in the order in which they appear in the HTML. It also means they can interrupt DOM building. Wherever they are specified, async scripts load at a low priority. They often load after all other scripts, without blocking DOM building. However, if an async script finishes downloading sooner, its execution can block DOM building and all synchronous scripts that finish downloading afterwards. Note: Attributes async and defer work only for external scripts. They are ignored if there’s no src. preload async and defer are great if you want to put off handling some scripts, but what about stuff on your web page that’s critical for user experience? Speculative parsers are handy, but they preload only a handful of resource types and follow their own logic. The general goal is to deliver CSS first because it blocks rendering. Synchronous scripts will always have higher priority than asynchronous. Images visible within the viewport should be downloaded before those below the fold. And there are also fonts, videos, SVGs… In short – it’s complicated. As an author, you know which resources are the most important for rendering your page. Some of them are often buried in CSS or scripts and it can take the browser quite a while before it even discovers them. For those important resources you can now use <link rel="preload"> to communicate to the browser that you want to load them as soon as possible. All you need to write is: <link rel="preload" href="very_important.js" as="script"> You can link pretty much anything and the as attribute tells the browser what it will be downloading. Some of the possible values are: script style image font audio video You can check out the rest of the content types on MDN. Fonts are probably the most important thing that gets hidden in the CSS. They are critical for rendering the text on the page, but they don’t get loaded until browser is sure that they are going to be used. That check happens only after CSS has been parsed, and applied, and the browser has matched CSS rules to the DOM nodes. This happens fairly late in the page loading process and it often results in an unnecessary delay in text rendering. You can avoid it by using the preload attribute when you link fonts. One thing to pay attention to when preloading fonts is that you also have to set the crossorigin attribute even if the font is on the same domain: <link rel="preload" href="font.woff" as="font" crossorigin> The preload feature has limited support at the moment as the browsers are still rolling it out, but you can check the progress here. Conclusion Browsers are complex beasts that have been evolving since the 90s. We’ve covered some of the quirks from that legacy and some of the newest standards in web development. Writing your code with these guidelines will help you pick the best strategies for delivering a smooth browsing experience. Source
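As a small companion to the defer/async/preload discussion above, here is a rough Python sketch (standard library only, not from the article) that flags external scripts carrying neither async nor defer, i.e. the ones that will block the HTML parser, and lists any <link rel="preload"> hints a page already declares. The sample page below is invented for the demo.

from html.parser import HTMLParser

class LoadingAudit(HTMLParser):
    def __init__(self):
        super().__init__()
        self.blocking_scripts = []   # external scripts with neither async nor defer
        self.preload_hints = []      # (href, as) pairs from <link rel="preload">

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and "src" in attrs and "async" not in attrs and "defer" not in attrs:
            self.blocking_scripts.append(attrs["src"])
        elif tag == "link" and attrs.get("rel") == "preload":
            self.preload_hints.append((attrs.get("href"), attrs.get("as")))

page = """<head>
<link rel="preload" href="very_important.js" as="script">
<script src="slider.js"></script>
<script src="analytics.js" async></script>
</head>"""

audit = LoadingAudit()
audit.feed(page)
print("parser-blocking scripts:", audit.blocking_scripts)
print("preload hints:", audit.preload_hints)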
    1 point
  11. https://github.com/mazen160/struts-pwn
    1 point
  12. PENETRATION TESTING FLASH APPS (AKA “HOW TO CHEAT AT BLACKJACK”) " In this post, we will walk through detailed steps to intercept, review, modify, and replay flash-based web apps. For demonstration purposes, I’ve selected a blackjack-style card game. We will work to control what cards are dealt, as well as how a score is calculated. " Source: https://privsec.blog/penetration-testing-flash-apps-aka-how-to-cheat-at-blackjack/
    1 point
  13. High-Level Approaches for Finding Vulnerabilities Fri 15 September 2017 This post is about the approaches I've learned for finding vulnerabilities in applications (i.e. software security bugs, not misconfigurations or patch management issues). I'm writing this because it's something I wish I had when I started. Although this is intended for beginners and isn't new knowledge, I think more experienced analysts might gain from comparing this approach to their own just like I have gained from others like Brian Chess and Jacob West, Gynvael Coldwind, Malware Unicorn, LiveOverflow, and many more. Keep in mind that this is a work-in-progress. It's not supposed to be comprehensive or authoritative, and it's limited by the knowledge and experience I have in this area. I've split it up into a few sections. I'll first go over what I think the discovery process is at a high level, and then discuss what I would need to know and perform when looking for security bugs. Finally, I'll discuss some other thoughts and non-technical lessons learned that don't fit as neatly in the earlier sections. What is the vulnerability discovery process? In some ways, the vulnerability discovery process can be likened to solving puzzles like mazes, jigsaws, or logic grids. One way to think about it abstractly is to see the process as a special kind of maze where: You don't immediately have a birds-eye view of what it looks like. A map of it is gradually formed over time through exploration. You have multiple starting points and end points but aren't sure exactly where they are at first. The final map will almost never be 100% clear, but sufficient to figure out how to get from point A to B. If we think about it less abstractly, the process boils down to three steps: Enumerate entry points (i.e. ways of interacting with the app). Think about insecure states (i.e. vulnerabilities) that an adversary would like to manifest. Manipulate the app using the identified entry points to reach the insecure states. In the context of this process, the maze is the application you're researching, the map is your mental understanding of it, the starting points are your entry points, and the end points are your insecure states. Entry points can range from visibly modifiable parameters in the UI to interactions that are more obscure or transparent to the end-user (e.g. IPC). Some of the types of entry points that are more interesting to an adversary or security researcher are: Areas of code that are older and haven't changed much over time (e.g. vestiges of transition from legacy). Intersections of development efforts by segmented teams or individuals (e.g. interoperability). Debugging or test code that is carried forward into production from the development branch. Gaps between APIs invoked by the client vs. those exposed by the server. Internal requests that are not intended to be influenced directly by end-users (e.g. IPC vs form fields). The types of vulnerabilities I think about surfacing can be split into two categories: generic and contextual. Generic vulnerabilities (e.g. RCE, SQLi, XSS, etc.) can be sought across many applications often without knowing much of their business logic, whereas contextual vulnerabilities (e.g. unauthorized asset exposure/tampering) require more knowledge of business logic, trust levels, and trust boundaries. The rule of thumb I use when prioritizing what to look for is to first focus on what would yield the most impact (i.e. 
highest reward to an adversary and the most damage to the application's stakeholders). Lightweight threat models like STRIDE can also be helpful in figuring out what can go wrong. Let's take a look at an example web application and then an example desktop application. Let's say this web application is a single-page application (SPA) for a financial portal and we have authenticated access to it, but no server-side source code or binaries. When we are enumerating entry points, we can explore the different features of the site to understand their purpose, see what requests are made in our HTTP proxy, and bring some clarity to our mental map. We can also look at the client-side JavaScript to get a list of the RESTful APIs called by the SPA. A limitation of not having server-side code is that we can't see the gaps between the APIs called by the SPA and those that are exposed by the server. The identified entry points can then be manipulated in an attempt to reach the insecure states we're looking for. When we're thinking of what vulnerabilities to surface, we should be building a list of test-cases that are appropriate to the application's technology stack and business logic. If not, we waste time trying test cases that will never work (e.g. trying xp_cmdshell when the back-end uses Postgres) at the expense of not trying test cases that require a deeper understanding of the app (e.g. finding validation gaps in currency parameters of Forex requests). With desktop applications, the same fundamental process of surfacing vulnerabilities through identified entry points still apply but there are a few differences. Arguably the biggest difference with web applications is that it requires a different set of subject-matter knowledge and methodology for execution. The OWASP Top 10 won't help as much and hooking up the app to an HTTP proxy to inspect network traffic may not yield the same level of productivity. This is because the entry points are more diverse with the inclusion of local vectors. Compared to black-box testing, there is less guesswork involved when you have access to source code. There is less guesswork in finding entry points and less guesswork in figuring out vulnerable code paths and exploitation payloads. Instead of sending payloads through an entry point that may or may not lead to a hypothesized insecure state, you can start from a vulnerable sink and work backwards until an entry point is reached. In a white-box scenario, you become constrained more by the limitations of what you know over the limitations of what you have. What knowledge is required? So why are we successful? We put the time in to know that network. We put the time in to know it better than the people who designed it and the people who are securing it. And that's the bottom line. — Rob Joyce, TAO Chief The knowledge required for vulnerability research is extensive, changes over time, and can vary depending on the type of application. The domains of knowledge, however, tend to remain the same and can be divided into four: Application Technologies. This embodies the technologies a developer should know to build the target application, including programming languages, system internals, design paradigms/patterns, protocols, frameworks/libraries, and so on. A researcher who has experience programming with the technologies appropriate to their target will usually be more productive and innovative than someone who has a shallow understanding of just the security concerns associated with them. 
Offensive and Defensive Concepts. These range from foundational security principles to constantly evolving vulnerability classes and mitigations. The combination of offensive and defensive concepts guide researchers toward surfacing vulnerabilities while being able to circumvent exploit mitigations. A solid understanding of application technologies and defensive concepts is what leads to remediation recommendations that are non-generic and actionable. Tools and Methodologies. This is about effectively and efficiently putting concepts into practice. It comes through experience from spending time learning how to use tools and configuring them, optimizing repetitive tasks, and establishing your own workflow. Learning how relevant tools work, how to develop them, and re-purpose them for different use cases is just as important as knowing how to use them. A process-oriented methodology is more valuable than a tool-oriented methodology. A researcher shouldn't stop pursuing a lead when a limitation of a tool they're using has been reached. The bulk of my methodology development has come from reading books and write-ups, hands-on practice, and learning from many mistakes. Courses are usually a good way to get an introduction to topics from experts who know the subject-matter well, but usually aren't a replacement for the experience gained from hands-on efforts. Target Application. Lastly, it's important to be able to understand the security-oriented aspects of an application better than its developers and maintainers. This is about more than looking at what security-oriented features the application has. This involves getting context about its use cases, enumerating all entry points, and being able to hypothesize vulnerabilities that are appropriate to its business logic and technology stack. The next section details the activities I perform to build knowledge in this area. The table below illustrates an example of what the required knowledge may look like for researching vulnerabilities in web applications and Windows desktop applications. Keep in mind that the entries in each category are just for illustration purposes and aren't intended to be exhaustive. Web Applications Desktop Applications Application Technologies Offensive and Defensive Concepts Tools and Methodologies Target Application Thousands of hours, hundreds of mistakes, and countless sleepless nights go into building these domains of knowledge. It's the active combination and growth of these domains that helps increase the likelihood of finding vulnerabilities. If this section is characterized by what should be known, then the next section is characterized by what should be done. What activities are performed? When analyzing an application, I use the four "modes of analysis" below and constantly switch from one mode to another whenever I hit a mental block. It's not quite a linear or cyclical process. I'm not sure if this model is exhaustive, but it does help me stay on track with coverage. Within each mode are active and passive activities. Active activities require some level of interaction with the application or its environment whereas passive activities do not. That said, the delineation is not always clear. The objective for each activity is to: Understand assumptions made about security. Hypothesize how to undermine them. Attempt to undermine them. Use case analysis is about understanding what the application does and what purpose it serves. It's usually the first thing I do when tasked with a new application. 
Interacting with a working version of the application along with reading some high-level documentation helps solidify my understanding of its features and expected boundaries. This helps me come up with test cases faster. If I have the opportunity to request internal documentation (threat models, developer documentation, etc.), I always try to do so in order to get a more thorough understanding. This might not be as fun as doing a deep-dive and trying test cases, but it's saved me a lot of time overall. An example I can talk about is with Oracle Opera where, by reading the user-manual, I was able to quickly find out which database tables stored valuable encrypted data (versus going through them one-by-one).

Implementation analysis is about understanding the environment within which the application resides. This may involve reviewing network and host configurations at a passive level, and performing port or vulnerability scans at an active level. An example of this could be a system service that is installed where the executable has an ACL that allows low-privileged users to modify it (thereby leading to local privilege escalation). Another example could be a web application that has an exposed anonymous FTP server on the same host which could lead to exposure of source code and other sensitive files. These issues are not inherent to the applications themselves, but how they have been implemented in their environments.

Communications analysis is about understanding what and how the target exchanges information with other processes and systems. Vulnerabilities can be found by monitoring or actively sending crafted requests through different entry points and checking if the responses yield insecure states. Many web application vulnerabilities are found this way. Network and data flow diagrams, if available, are very helpful in seeing the bigger picture. While an understanding of application-specific communication protocols is required for this mode, an understanding of the application's internal workings is not. How user-influenced data is being passed and transformed within a system is more or less seen as a black-box in this analysis mode, with the focus on monitoring and sending requests and analyzing the responses that come out.

If we go back to our hypothetical financial portal from earlier, we may want to look at the feature that allows clients to purchase prepaid credit cards in different currencies as a contrived example. Let's assume that a purchase request accepts the following parameters:

fromAccount: The account from which money is being withdrawn to purchase the prepaid card.
fromAmount: The amount of money to place into the card in the currency of fromAccount (e.g. 100).
cardType: The type of card to purchase (e.g. USD, GBP).
currencyPair: The currency pair for fromAccount and cardType (e.g. CADUSD, CADGBP).

The first thing we might want to do is send a standard request so that we know what a normal response should look like as a baseline. A request and response to purchase an $82 USD card from a CAD account might look like this:

Request:
{
  "fromAccount": "000123456",
  "fromAmount": 100,
  "cardType": "USD",
  "currencyPair": "CADUSD"
}

Response:
{
  "status": "ok",
  "cardPan": 4444333322221111,
  "cardType": "USD",
  "toAmount": 82.20
}

We may not know exactly what happened behind the scenes, but it came out ok as indicated by the status attribute.
Now if we tweak the fromAmount to something negative, or the fromAccount to someone else's account, those may return erroneous responses indicating that validation is being performed. If we change the value of currencyPair from CADUSD to CADJPY, we'll see that the toAmount changes from 82.20 to 8863.68 while the cardType is still USD. We're able to get more bang for our buck by using a more favourable exchange rate while the desired card type stays the same. If we have access to back-end code, it would make it easier to know what's happening to that user input and come up with more thorough test cases that could lead to insecure states with greater precision. Perhaps an additional request parameter that's not exposed on the client-side could have altered the expected behaviour in a potentially malicious way. Code and binary analysis is about understanding how user-influenced input is passed and transformed within a target application. To borrow an analogy from Practical Malware Analysis, if the last three analysis modes can be compared to looking at the outside of a body during an autopsy, then this mode is where the dissection begins. There are a variety of activities that can be performed for static and dynamic analysis. Here are a few: Data flow analysis. This is useful for scouting entry points and understanding how data can flow toward potential insecure states. When I'm stuck trying to get a payload to work in the context of communications analysis, I tweak it in different ways to try get toward that hypothesized insecure state. In comparison with this mode, I can first look into checking whether that insecure state actually exists, and if so, figure out how to craft my payload to get there with greater precision. As mentioned earlier, one of the benefits of this mode is being able to find insecure states and being able to work backwards to craft payloads for corresponding entry points. Static and dynamic analysis go hand-in-hand here. If you're looking to go from point A to B, then static analysis is like reading a map, and dynamic analysis is like getting a live overview of traffic and weather conditions. The wide and abstract understanding of an application you get from static analysis is complemented by the more narrow and concrete understanding you get from dynamic analysis. Imports analysis. Analyzing imported APIs can give insight into how the application functions and how it interacts with the OS, especially in the absence of greater context. For example, the use of cryptographic functions can indicate that some asset is being protected. You can trace calls to figure out what it's protecting and whether it's protected properly. If a process is being created, you can look into determining whether user-input can influence that. Understanding how the software interacts with the OS can give insight on entry points you can use to interact with it (e.g. network listeners, writable files, IOCTL requests). Strings analysis. As with analyzing imports, strings can give some insights into the program's capabilities. I tend to look for things like debug statements, keys/tokens, and anything that looks suspicious in the sense that it doesn't fit with how I would expect the program to function. Interesting strings can be traced for its usages and to see if there are code paths reachable from entry points. It's important to differentiate between strings that are part of the core program and those that are included as part of statically-imported libraries. Security scan triage. 
Automated source code scanning tools may be helpful in finding generic low-hanging fruit, but virtually useless at finding contextual or design-based vulnerabilities. I don't typically find this to be the most productive use of my time because of the sheer number of false positives, but if it yields many confirmed vulnerabilities then it could indicate a bigger picture of poor secure coding practices. Dependency analysis. This involves triaging dependencies (e.g. open-source components) for known vulnerabilities that are exploitable, or finding publicly unknown vulnerabilities that could be leveraged within the context of the target application. A modern large application is often built on many external dependencies. A subset of them can have vulnerabilities, and a subset of those vulnerabilities can "bubble-up" to the main application and become exploitable through an entry point. Common examples include Heartbleed, Shellshock, and various Java deserialization vulnerabilities. Repository analysis. If you have access to a code repository, it may help identify areas that would typically be interesting to researchers. Aside from the benefits of having more context than with a binary alone, it becomes easier to find older areas of code that haven't changed in a long time and areas of code that bridge the development efforts of segmented groups. Code and binary analysis typically takes longer than the other modes and is arguably more difficult because researchers often need to understand the application and its technologies to nearly the same degree as its developers. In practice, this knowledge can often be distributed among segmented groups of developers while researchers need to understand it holistically to be effective. I cannot overstate how important it is to have competency in programming for this. A researcher who can program with the technologies of their target application is almost always better equipped to provide more value. On the offensive side, finding vulnerabilities becomes more intuitive and it becomes easier to adapt exploits to multiple environments. On the defensive side, non-generic remediation recommendations can be provided that target the root cause of the vulnerability at the code level. Similar to the domains of knowledge, actively combining this analysis mode with the others helps makes things click and increases the likelihood of finding vulnerabilities. Other Thoughts and Lessons Learned This section goes over some other thoughts worth mentioning that didn't easily fit in previous sections. Vulnerability Complexity Vulnerabilities vary in a spectrum of complexity. On one end, there are trivial vulnerabilities which have intuitive exploit code used in areas that are highly prone to scrutiny (e.g. the classic SQLi authentication bypass). On the other end are the results of unanticipated interactions between system elements that by themselves are neither insecure nor badly engineered, but lead to a vulnerability when chained together (e.g. Chris Domas' "Memory Sinkhole"). I tend to distinguish between these ends of the spectrum with the respective categories of "first-order vulnerabilities" and "second-order vulnerabilities", but there could be different ways to describe them. The modern exploit is not a single shot vulnerability anymore. They tend to be a chain of vulnerabilities that add up to a full-system compromise. 
— Ben Hawkes, Project Zero Working in Teams It is usually helpful to be upfront to your team about what you know and don't know so that you can (ideally) be delegated tasks in your areas of expertise while being given opportunities to grow in areas of improvement. Pretending and being vague is counterproductive because people who know better will sniff it out easily. If being honest can become political, then maybe it's not an ideal team to work with. On the other hand, you shouldn't expect to be spoon-fed everything you need to know. Learning how to learn on your own can help you become self-reliant and help your team's productivity. If you and your team operate on billable time, the time one takes to teach you might be time they won't get back to focus on their task. The composition of a team in a timed project can be a determining factor to the quantity and quality of vulnerabilities found. Depending on the scale and duration of the project, having more analysts could lead to faster results or lead to extra overhead and be counterproductive. A defining characteristic of some of the best project teams I've been on is that in addition to having good rapport and communication, we had diverse skill sets that played off each other which enhanced our individual capabilities. We also delegated parallelizable tasks that had converging outcomes. Here are some examples: Bob scouts for entry points and their parameters while Alice looks for vulnerable sinks. Alice fleshes out the payload to a vulnerable sink while Bob makes sense of the protocol to the sink's corresponding entry point. Bob reverses structs by analyzing client requests dynamically while Alice contributes by analyzing how they are received statically. Bob looks for accessible file shares on a network while Alice sifts through them for valuable information. Overall, working in teams can increase productivity but it takes some effort and foresight to make that a reality. It's also important to know when adding members won't be productive so as to avoid overhead. Final Notes Thanks for taking the time to read this if you made it this far. I hope this has taken some of the magic out of what's involved when finding vulnerabilities. It's normal to feel overwhelmed and a sense of impostor syndrome when researching (I don't think that ever goes away). This is a constant learning process that extends beyond a day job or what any single resource can teach you. I encourage you to try things on your own and learn how others approach this so you can aggregate, combine, and incorporate elements from each into your own approach. I'd like to thank @spookchop and other unnamed reviewers for their time and contributions. I am always open to more (critical) feedback so please contact me if you have any suggestions. The following tools were used for graphics: GIMP, WordItOut, draw.io, and FreeMind. @Jackson_T Sursa: http://jackson.thuraisamy.me/finding-vulnerabilities.html
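To make the prepaid-card walkthrough earlier in this post concrete, here is a rough Python sketch of the test case it describes, sending a baseline purchase and then one with a tampered currencyPair. It uses the third-party requests library; the endpoint URL, the session handling and everything other than the field names quoted in the post are assumptions.

import requests  # third-party; pip install requests

API = "https://portal.example.com/api/purchase-card"  # hypothetical endpoint behind the SPA

def purchase(session, from_account, from_amount, card_type, currency_pair):
    payload = {
        "fromAccount": from_account,
        "fromAmount": from_amount,
        "cardType": card_type,
        "currencyPair": currency_pair,
    }
    return session.post(API, json=payload).json()

session = requests.Session()  # assume an already-authenticated session (cookies/token set elsewhere)

baseline = purchase(session, "000123456", 100, "USD", "CADUSD")   # expected: toAmount around 82.20
tampered = purchase(session, "000123456", 100, "USD", "CADJPY")   # pair no longer matches cardType

if tampered.get("status") == "ok" and tampered.get("toAmount") != baseline.get("toAmount"):
    print("currencyPair is not validated against cardType: favourable-rate purchase may be possible")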
    1 point
  14. If you don't know how to do it, it's not for you. Try ballet instead.
    1 point
  15. Awesome Windows Exploitation A curated list of awesome Windows Exploitation resources, and shiny things. There is no pre-established order of items in each category, the order is for contribution. If you want to contribute, please read the guide. Table of Contents Windows stack overflows Windows heap overflows Kernel based Windows overflows Return Oriented Programming Windows memory protections Bypassing filter and protections Typical windows exploits Exploit development tutorial series Corelan Team Fuzzysecurity Securitysift Whitehatters Academy TheSprawl Tools Windows stack overflows Stack Base Overflow Articles. Win32 Buffer Overflows (Location, Exploitation and Prevention) - by Dark spyrit [1999] Writing Stack Based Overflows on Windows - by Nish Bhalla’s [2005] Stack Smashing as of Today - by Hagen Fritsch [2009] SMASHING C++ VPTRS - by rix [2000] Windows heap overflows Heap Base Overflow Articles. Third Generation Exploitation smashing heap on 2k - by Halvar Flake [2002] Exploiting the MSRPC Heap Overflow Part 1 - by Dave Aitel (MS03-026) [September 2003] Exploiting the MSRPC Heap Overflow Part 2 - by Dave Aitel (MS03-026) [September 2003] Windows heap overflow penetration in black hat - by David Litchfield [2004] Glibc Adventures: The Forgotten Chunk - by François Goichon [2015] Pseudomonarchia jemallocum - by argp & huku The House Of Lore: Reloaded - by blackngel [2010] Malloc Des-Maleficarum - by blackngel [2009] free() exploitation technique - by huku Understanding the heap by breaking it - by Justin N. Ferguson [2007] The use of set_head to defeat the wilderness - by g463 The Malloc Maleficarum - by Phantasmal Phantasmagoria [2005] Exploiting The Wilderness - by Phantasmal Phantasmagoria [2004] Advanced Doug lea's malloc exploits - by jp Kernel based Windows overflows Kernel Base Exploit Development Articles. How to attack kernel based vulns on windows was done - by a Polish group called “sec-labs” [2003] Sec-lab old whitepaper Sec-lab old exploit Windows Local Kernel Exploitation (based on sec-lab research) - by S.K Chong [2004] How to exploit Windows kernel memory pool - by SoBeIt [2005] Exploiting remote kernel overflows in windows - by Eeye Security Kernel-mode Payloads on Windows in uninformed - by Matt Miller Exploiting 802.11 Wireless Driver Vulnerabilities on Windows BH US 2007 Attacking the Windows Kernel Remote and Local Exploitation of Network Drivers Exploiting Comon Flaws In Drivers I2OMGMT Driver Impersonation Attack Real World Kernel Pool Exploitation Exploit for windows 2k3 and 2k8 Alyzing local privilege escalations in win32k Intro to Windows Kernel Security Development There’s a party at ring0 and you’re invited Windows kernel vulnerability exploitation A New CVE-2015-0057 Exploit Technology - by Yu Wang [2016] Exploiting CVE-2014-4113 on Windows 8.1 - by Moritz Jodeit [2016] Easy local Windows Kernel exploitation - by Cesar Cerrudo [2012] Windows Kernel Exploitation - by Simone Cardona 2016 Return Oriented Programming The Geometry of Innocent Flesh on the Bone: Return-into-libc without Function Calls Blind return-oriented programming Sigreturn-oriented Programming Jump-Oriented Programming: A New Class of Code-Reuse Attack Out of control: Overcoming control-flow integrity ROP is Still Dangerous: Breaking Modern Defenses Loop-Oriented Programming(LOP): A New Code Reuse Attack to Bypass Modern Defenses - by Bingchen Lan, Yan Li, Hao Sun, Chao Su, Yao Liu, Qingkai Zeng [2015] Systematic Analysis of Defenses Against Return-Oriented Programming -by R. 
Windows heap overflows
Heap-based overflow articles.
Third Generation Exploitation smashing heap on 2k - by Halvar Flake [2002]
Exploiting the MSRPC Heap Overflow Part 1 - by Dave Aitel (MS03-026) [September 2003]
Exploiting the MSRPC Heap Overflow Part 2 - by Dave Aitel (MS03-026) [September 2003]
Windows heap overflow penetration in black hat - by David Litchfield [2004]
Glibc Adventures: The Forgotten Chunk - by François Goichon [2015]
Pseudomonarchia jemallocum - by argp & huku
The House Of Lore: Reloaded - by blackngel [2010]
Malloc Des-Maleficarum - by blackngel [2009]
free() exploitation technique - by huku
Understanding the heap by breaking it - by Justin N. Ferguson [2007]
The use of set_head to defeat the wilderness - by g463
The Malloc Maleficarum - by Phantasmal Phantasmagoria [2005]
Exploiting The Wilderness - by Phantasmal Phantasmagoria [2004]
Advanced Doug lea's malloc exploits - by jp

Kernel based Windows overflows
Kernel-based exploit development articles.
How to attack kernel based vulns on windows was done - by a Polish group called "sec-labs" [2003]
Sec-lab old whitepaper
Sec-lab old exploit
Windows Local Kernel Exploitation (based on sec-lab research) - by S.K Chong [2004]
How to exploit Windows kernel memory pool - by SoBeIt [2005]
Exploiting remote kernel overflows in windows - by eEye Security
Kernel-mode Payloads on Windows in Uninformed - by Matt Miller
Exploiting 802.11 Wireless Driver Vulnerabilities on Windows
BH US 2007 Attacking the Windows Kernel
Remote and Local Exploitation of Network Drivers
Exploiting Common Flaws In Drivers
I2OMGMT Driver Impersonation Attack
Real World Kernel Pool Exploitation
Exploit for windows 2k3 and 2k8
Analyzing local privilege escalations in win32k
Intro to Windows Kernel Security Development
There's a party at ring0 and you're invited
Windows kernel vulnerability exploitation
A New CVE-2015-0057 Exploit Technology - by Yu Wang [2016]
Exploiting CVE-2014-4113 on Windows 8.1 - by Moritz Jodeit [2016]
Easy local Windows Kernel exploitation - by Cesar Cerrudo [2012]
Windows Kernel Exploitation - by Simone Cardona [2016]

Return Oriented Programming
The Geometry of Innocent Flesh on the Bone: Return-into-libc without Function Calls
Blind return-oriented programming
Sigreturn-oriented Programming
Jump-Oriented Programming: A New Class of Code-Reuse Attack
Out of control: Overcoming control-flow integrity
ROP is Still Dangerous: Breaking Modern Defenses
Loop-Oriented Programming (LOP): A New Code Reuse Attack to Bypass Modern Defenses - by Bingchen Lan, Yan Li, Hao Sun, Chao Su, Yao Liu, Qingkai Zeng [2015]
Systematic Analysis of Defenses Against Return-Oriented Programming - by R. Skowyra, K. Casteel, H. Okhravi, N. Zeldovich, and W. Streilein [2013]
Return-oriented programming without returns - by S. Checkoway, L. Davi, A. Dmitrienko, A. Sadeghi, H. Shacham, and M. Winandy [2010]
Jump-oriented programming: a new class of code-reuse attack - by T. K. Bletsch, X. Jiang, V. W. Freeh, and Z. Liang [2011]
Stitching the gadgets: on the ineffectiveness of coarse-grained control-flow integrity protection - by L. Davi, A. Sadeghi, and D. Lehmann [2014]
Size does matter: Why using gadget-chain length to prevent code-reuse attacks is hard - by E. Göktas, E. Athanasopoulos, M. Polychronakis, H. Bos, and G. Portokalidis [2014]
Buffer overflow attacks bypassing DEP (NX/XD bits) – part 1 - by Marco Mastropaolo [2005]
Buffer overflow attacks bypassing DEP (NX/XD bits) – part 2 - by Marco Mastropaolo [2005]
Practical Rop - by Dino Dai Zovi [2010]
Exploitation with WriteProcessMemory - by Spencer Pratt [2010]
Exploitation techniques and mitigations on Windows - by skape
A little return oriented exploitation on Windows x86 – Part 1 - by Harmony Security and Stephen Fewer [2010]
A little return oriented exploitation on Windows x86 – Part 2 - by Harmony Security and Stephen Fewer [2010]
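The ROP papers above all rely on "gadgets": short instruction sequences already present in executable code that end in a ret (0xC3 on x86/x86-64) and can be chained through the stack. Real gadget-finding tools do proper disassembly, but as a toy sketch of the first step they perform, a raw byte scan for ret opcodes looks roughly like this (the sample byte string is invented for illustration):

    #include <stdio.h>
    #include <stddef.h>

    /* Toy gadget scan: report every offset of a 0xC3 (ret) byte and the few
     * bytes preceding it, which a real ROP tool would then disassemble
     * backwards to recover usable gadgets such as "pop rdi; ret". */
    static void find_ret_gadgets(const unsigned char *code, size_t len, size_t context)
    {
        for (size_t i = 0; i < len; i++) {
            if (code[i] != 0xC3)
                continue;
            size_t start = (i >= context) ? i - context : 0;
            printf("ret at offset 0x%zx, preceding bytes:", i);
            for (size_t j = start; j <= i; j++)
                printf(" %02x", code[j]);
            printf("\n");
        }
    }

    int main(void)
    {
        /* Invented sample containing "pop rdi; ret" (5f c3)
         * and "pop rsi; pop r15; ret" (5e 41 5f c3). */
        const unsigned char sample[] = { 0x90, 0x5f, 0xc3, 0x90, 0x5e, 0x41, 0x5f, 0xc3 };
        find_ret_gadgets(sample, sizeof(sample), 3);
        return 0;
    }

Chaining the addresses of such gadgets on the stack lets an attacker compute without injecting any new code, which is exactly the property the defenses analyzed in the papers above try to break.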
Windows memory protections
Introductory articles on Windows memory protections.
Data Execution Prevention (DEP)
/GS (Buffer Security Check)
/SAFESEH
ASLR
SEHOP

Bypassing filter and protections
Articles on bypassing Windows memory protections.
Third Generation Exploitation smashing heap on 2k - by Halvar Flake [2002]
Creating Arbitrary Shellcode In Unicode Expanded Strings - by Chris Anley
Advanced windows exploitation - by Dave Aitel [2003]
Defeating the Stack Based Buffer Overflow Prevention Mechanism of Microsoft Windows 2003 Server - by David Litchfield
Reliable heap exploits and after that Windows Heap Exploitation (Win2KSP0 through WinXPSP2) - by Matt Conover at CanSecWest 2004
Safely Searching Process Virtual Address Space - by Matt Miller [2004]
IE exploit that used a technique called heap spraying
Bypassing hardware-enforced DEP - by Skape (Matt Miller) and Skywing (Ken Johnson) [October 2005]
Exploiting Freelist[0] On XP Service Pack 2 - by Brett Moore [2005]
Kernel-mode Payloads on Windows in Uninformed
Exploiting 802.11 Wireless Driver Vulnerabilities on Windows
Exploiting Common Flaws In Drivers
Heap Feng Shui in JavaScript - by Alexander Sotirov [2007]
Understanding and bypassing Windows Heap Protection - by Nicolas Waisman [2007]
Heaps About Heaps - by Brett Moore [2008]
Bypassing browser memory protections in Windows Vista - by Mark Dowd and Alex Sotirov [2008]
Attacking the Vista Heap - by Ben Hawkes [2008]
Return oriented programming Exploitation without Code Injection - by Hovav Shacham (and others) [2008]
Token Kidnapping and a super reliable exploit for windows 2k3 and 2k8 - by Cesar Cerrudo [2008]
Defeating DEP the Immunity Way - by Pablo Sole [2008]
Practical Windows XP/2003 Heap Exploitation - by John McDonald and Chris Valasek [2009]
Bypassing SEHOP - by Stefan Le Berre and Damien Cauquil [2009]
Interpreter Exploitation: Pointer Inference and JIT Spraying - by Dionysus Blazakis [2010]
Write-up of Pwn2Own 2010 - by Peter Vreugdenhil
All-in-one 0day presented at rootedCON - by Ruben Santamarta [2010]
DEP/ASLR bypass using 3rd party - by Shahin Ramezany [2013]

Typical windows exploits
Real-world HW-DEP bypass exploit - by Devcode
Bypassing DEP by returning into HeapCreate - by Toto
First public ASLR bypass exploit using partial overwrite - by Skape
Heap spray and bypassing DEP - by Skylined
First public exploit that used ROP to bypass DEP (Adobe libTIFF vulnerability)
Exploit codes for bypassing browser memory protections
PoCs on Token Kidnapping: PoC for 2k3 – part 1 - by Cesar Cerrudo
PoCs on Token Kidnapping: PoC for 2k8 – part 2 - by Cesar Cerrudo
An exploit that works from Windows 3.1 to Windows 7 (KiTrap0D) - by Tavis Ormandy
Old MS08-067 Metasploit module (multi-target and DEP bypass)
PHP 6.0 Dev str_transliterate() buffer overflow – NX + ASLR bypass
SMBv2 Exploit - by Stephen Fewer
Microsoft IIS 7.5 remote heap buffer overflow - by redpantz
Browser Exploitation Case Study for Internet Explorer 11 - by Moritz Jodeit [2016]
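A recurring goal in the DEP-related entries above (hardware-DEP bypasses, returning into HeapCreate, ROP chains that pivot into memory-management APIs) is to make attacker-controlled data executable again. Purely as an illustration of the primitive those techniques aim to reach, and not as code taken from any of the linked exploits, here is the legitimate Win32 way a page's protection is changed (a minimal sketch assuming a Windows build environment):

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        /* Allocate a read/write (non-executable) page, which is how DEP
         * expects plain data to stay. */
        unsigned char *page = VirtualAlloc(NULL, 4096,
                                           MEM_COMMIT | MEM_RESERVE,
                                           PAGE_READWRITE);
        DWORD old;

        if (page == NULL)
            return 1;

        page[0] = 0xC3;   /* an x86 'ret', written as ordinary data */

        /* Flipping the protection to executable is the primitive that
         * DEP-bypass chains (e.g. ROP into VirtualProtect/VirtualAlloc)
         * work so hard to reach. */
        if (!VirtualProtect(page, 4096, PAGE_EXECUTE_READ, &old))
            return 1;

        printf("page %p is now executable (previous protection: 0x%lx)\n",
               (void *)page, (unsigned long)old);
        VirtualFree(page, 0, MEM_RELEASE);
        return 0;
    }

DEP's whole point is that the PAGE_READWRITE allocation above is not executable until something explicitly changes it, so exploit chains are forced to reach an API like VirtualProtect or VirtualAlloc first.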
Exploit development tutorial series
Exploit development tutorial series based on the Windows operating system.

Corelan Team
Exploit writing tutorial part 1: Stack Based Overflows
Exploit writing tutorial part 2: Stack Based Overflows – jumping to shellcode
Exploit writing tutorial part 3: SEH Based Exploits
Exploit writing tutorial part 3b: SEH Based Exploits – just another example
Exploit writing tutorial part 4: From Exploit to Metasploit – The basics
Exploit writing tutorial part 5: How debugger modules & plugins can speed up basic exploit development
Exploit writing tutorial part 6: Bypassing Stack Cookies, SafeSeh, SEHOP, HW DEP and ASLR
Exploit writing tutorial part 7: Unicode – from 0x00410041 to calc
Exploit writing tutorial part 8: Win32 Egg Hunting
Exploit writing tutorial part 9: Introduction to Win32 shellcoding
Exploit writing tutorial part 10: Chaining DEP with ROP – the Rubik's Cube
Exploit writing tutorial part 11: Heap Spraying Demystified

Fuzzysecurity
Part 1: Introduction to Exploit Development
Part 2: Saved Return Pointer Overflows
Part 3: Structured Exception Handler (SEH)
Part 4: Egg Hunters
Part 5: Unicode 0x00410041
Part 6: Writing W32 shellcode
Part 7: Return Oriented Programming
Part 8: Spraying the Heap, Chapter 1: Vanilla EIP
Part 9: Spraying the Heap, Chapter 2: Use-After-Free
Part 10: Kernel Exploitation -> Stack Overflow
Part 11: Kernel Exploitation -> Write-What-Where
Part 12: Kernel Exploitation -> Null Pointer Dereference
Part 13: Kernel Exploitation -> Uninitialized Stack Variable
Part 14: Kernel Exploitation -> Integer Overflow
Part 15: Kernel Exploitation -> UAF
Part 16: Kernel Exploitation -> Pool Overflow
Part 17: Kernel Exploitation -> GDI Bitmap Abuse (Win7-10 32/64bit)
Heap Overflows For Humans 101
Heap Overflows For Humans 102
Heap Overflows For Humans 102.5
Heap Overflows For Humans 103
Heap Overflows For Humans 103.5

Securitysift
Windows Exploit Development – Part 1: The Basics
Windows Exploit Development – Part 2: Intro to Stack Based Overflows
Windows Exploit Development – Part 3: Changing Offsets and Rebased Modules
Windows Exploit Development – Part 4: Locating Shellcode With Jumps
Windows Exploit Development – Part 5: Locating Shellcode With Egghunting
Windows Exploit Development – Part 6: SEH Exploits
Windows Exploit Development – Part 7: Unicode Buffer Overflows

Whitehatters Academy
Intro to Windows kernel exploitation 1/N: Kernel Debugging
Intro to Windows kernel exploitation 2/N: HackSys Extremely Vulnerable Driver
Intro to Windows kernel exploitation 3/N: My first Driver exploit
Intro to Windows kernel exploitation 3.5/N: A bit more of the HackSys Driver
Backdoor 103: Fully Undetected
Backdoor 102
Backdoor 101

TheSprawl
corelan - integer overflows - exercise solution
heap overflows for humans - 102 - exercise solution
exploit exercises - protostar - final levels
exploit exercises - protostar - network levels
exploit exercises - protostar - heap levels
exploit exercises - protostar - format string levels
exploit exercises - protostar - stack levels
open security training - introduction to software exploits - uninitialized variable overflow
open security training - introduction to software exploits - off-by-one
open security training - introduction to re - bomb lab secret phase
open security training - introductory x86 - buffer overflow mystery box
corelan - tutorial 10 - exercise solution
corelan - tutorial 9 - exercise solution
corelan - tutorial 7 - exercise solution
getting from seh to nseh
corelan - tutorial 3b - exercise solution

Tools
Disassemblers, debuggers, and other static and dynamic analysis tools (a minimal Capstone usage example follows at the end of this post).
angr - Platform-agnostic binary analysis framework developed at UCSB's Seclab.
BARF - Multiplatform, open source Binary Analysis and Reverse engineering Framework.
binnavi - Binary analysis IDE for reverse engineering based on graph visualization.
Bokken - GUI for Pyew and Radare.
Capstone - Disassembly framework for binary analysis and reversing, with support for many architectures and bindings in several languages.
codebro - Web based code browser using clang to provide basic code analysis.
dnSpy - .NET assembly editor, decompiler and debugger.
Evan's Debugger (EDB) - A modular debugger with a Qt GUI.
GDB - The GNU debugger.
GEF - GDB Enhanced Features, for exploiters and reverse engineers.
hackers-grep - A utility to search for strings in PE executables, including imports, exports, and debug symbols.
IDA Pro - Windows disassembler and debugger, with a free evaluation version.
Immunity Debugger - Debugger for malware analysis and more, with a Python API.
ltrace - Dynamic analysis for Linux executables.
objdump - Part of GNU binutils, for static analysis of Linux binaries.
OllyDbg - An assembly-level debugger for Windows executables.
PANDA - Platform for Architecture-Neutral Dynamic Analysis.
PEDA - Python Exploit Development Assistance for GDB, an enhanced display with added commands.
pestudio - Perform static analysis of Windows executables.
Process Monitor - Advanced monitoring tool for Windows programs.
Pyew - Python tool for malware analysis.
Radare2 - Reverse engineering framework, with debugger support.
SMRT - Sublime Malware Research Tool, a plugin for Sublime Text 3 to aid malware analysis.
strace - Dynamic analysis for Linux executables.
Udis86 - Disassembler library and tool for x86 and x86-64.
Vivisect - Python tool for malware analysis.
X64dbg - An open-source x64/x32 debugger for Windows.

Source: https://github.com/enddo/awesome-windows-exploitation
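If you want to script against one of the tools in the list above, Capstone is probably the quickest to get started with. The snippet below is a minimal sketch of its C API, disassembling a small hard-coded x86-64 byte sequence (the bytes and base address are arbitrary example values):

    #include <stdio.h>
    #include <inttypes.h>
    #include <capstone/capstone.h>

    int main(void)
    {
        /* x86-64 bytes for: push rbp; mov rbp, rsp; ret */
        const uint8_t code[] = { 0x55, 0x48, 0x89, 0xe5, 0xc3 };
        csh handle;
        cs_insn *insn;
        size_t count;

        if (cs_open(CS_ARCH_X86, CS_MODE_64, &handle) != CS_ERR_OK)
            return 1;

        /* Disassemble the whole buffer, pretending it is loaded at 0x1000. */
        count = cs_disasm(handle, code, sizeof(code), 0x1000, 0, &insn);
        for (size_t i = 0; i < count; i++)
            printf("0x%" PRIx64 ":\t%s\t%s\n",
                   insn[i].address, insn[i].mnemonic, insn[i].op_str);

        if (count > 0)
            cs_free(insn, count);
        cs_close(&handle);
        return 0;
    }

Assuming the library and headers are installed, something like gcc example.c -lcapstone should build it.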
    1 point