Nytro
Posted July 16, 2014

Shellcode analysis like a semi-PRO

During Nicolas Brulez's training at REcon there was a challenge where the goal was to get function names instead of hashes into IDA, in order to make shellcode analysis easier. This post describes the problem in more detail, the possible solutions and the approach I took to solve the challenge. If you would like to know the PRO version, take Nicolas's training next year.

Introduction

In a few words: to resolve API function addresses, shellcode usually parses the EAT (Export Address Table) of the loaded modules and compares a hash of each exported function name against a given hash. Malware sometimes uses this as well, for the sake of stealth. More information about this technique is available here. The problem is that in IDA Pro you get output like the one below:

    seg000:00386C91 mov ecx, 0A1233BBCh
    seg000:00386C96 mov edx, [ebp+4]
    seg000:00386C99 call find_func_addr
    seg000:00386C9E call eax

Since there are many calls to find_func_addr, static analysis without knowing the function name behind each hash is very time-consuming: you would have to follow each call in the debugger and manually update IDA Pro.

Possible solutions

There are two ways to solve this problem: statically or dynamically. Each has its advantages and disadvantages.

Dynamic

The only advantage I can think of for this approach is that you don't need to reverse engineer the hash function in order to get every function name. The disadvantages are: (a) the code can take different paths, so you won't be able to identify all the hashes; (b) it's not portable: the Immunity or Olly script needs to be changed for every other sample that uses hashes. A similar example of the dynamic approach can be found at the VRT blog.

Static

This is the approach I took. The advantages and disadvantages are the reverse of the dynamic one: I had to crack the hash algorithm (which was pretty simple), but at least the other parts are easily portable to further samples.

The static solution

The solution is divided into three parts: (a) get the function names from the EAT of a given DLL; (b) calculate the hash for each function name; (c) import the data into IDA.

Getting the function names

I ended up using DLL Export Viewer; Nicolas used pefile, which I had totally forgotten about. For the purpose of this post, the code below is enough to illustrate:

    import pefile

    dll = "c:/windows/system32/kernel32.dll"
    pe = pefile.PE(dll)
    # Walk the export table and print every exported function name.
    for func in pe.DIRECTORY_ENTRY_EXPORT.symbols:
        print(func.name)

Calculate the hash

This will change for every sample. In this example the hash algorithm was very simple; it can get really complicated, and then the dynamic approach would be better. There is a MUCH easier and cleverer way to calculate the hash; however, you will need to take Nicolas's training to get to know it. His solution to this was a *facepalm* moment.
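
As a cross-check that needs no compiler, the routine used by this sample (calc_hash, disassembled in the listing that follows) can also be sketched in Python. This is only a minimal sketch derived from that disassembly, not part of the original solution: the names rol32, calc_hash and hash_line are made up for illustration, and the output format simply mirrors the "0x%08x,name" lines printed by the C program.

```python
MASK32 = 0xFFFFFFFF


def rol32(value, count):
    """Rotate a 32-bit value left by count bits."""
    count %= 32
    value &= MASK32
    return ((value << count) | (value >> (32 - count))) & MASK32


def _as_bytes(name):
    # Accept both text and bytes (pefile returns bytes on Python 3).
    if isinstance(name, (bytes, bytearray)):
        return bytes(name)
    return name.encode("ascii")


def calc_hash(name):
    """Hash a function name the way the sample's calc_hash loop does."""
    eax = 0  # EAX: lodsb fills AL, the upper bytes hold earlier characters
    ecx = 0  # ECX: running hash
    for byte in bytearray(_as_bytes(name)):
        if byte == 0:                  # test al, al / jz: stop at the NUL
            break
        eax = (eax & 0xFFFFFF00) | byte    # lodsb
        ecx ^= eax                         # xor ecx, eax
        ecx = rol32(ecx, 3)                # rol ecx, 3
        ecx = (ecx + 1) & MASK32           # inc ecx
        eax = (eax << 8) & MASK32          # shl eax, 8
    return ecx


def hash_line(name):
    """Format one 'hash,name' line (assumed format, same as the C tool)."""
    return "0x%08x,%s" % (calc_hash(name), _as_bytes(name).decode("ascii"))
```

Feeding it the names printed by the pefile loop, one hash_line(name) per export, should give the same text file as the C program, assuming the names are plain ASCII. For the one-character name "A", for example, hash_line returns "0x00000209,A".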

Hash function:

    seg000:00386BE0 calc_hash proc near ; CODE XREF: find_func_addr+28p
    seg000:00386BE0 push eax
    seg000:00386BE1 xor eax, eax
    seg000:00386BE3 xor ecx, ecx
    seg000:00386BE5
    seg000:00386BE5 loop_calc_hash: ; CODE XREF: calc_hash+13j
    seg000:00386BE5 lodsb
    seg000:00386BE6 test al, al
    seg000:00386BE8 jz short calc_hash_end
    seg000:00386BEA xor ecx, eax
    seg000:00386BEC rol ecx, 3
    seg000:00386BEF inc ecx
    seg000:00386BF0 shl eax, 8
    seg000:00386BF3 jmp short loop_calc_hash
    seg000:00386BF5 ; ---------------------------------------------------------------------------
    seg000:00386BF5
    seg000:00386BF5 calc_hash_end: ; CODE XREF: calc_hash+8j
    seg000:00386BF5 pop eax
    seg000:00386BF6 retn
    seg000:00386BF6 calc_hash endp

Relevant C code:

    #include <stdio.h>
    #include <inttypes.h>
    #include <string.h>

    /* Rotate a 32-bit value left by `width` bits (GCC inline asm, x86 only). */
    static inline uint32_t rol(uint32_t operand, uint8_t width) {
        __asm__ __volatile__ ("rol %%cl, %%eax"
                              : "=a" (operand)
                              : "a" (operand), "c" (width));
        return operand;
    }

    int main(int argc, char* argv[]) {
        unsigned int i = 0;
        uint32_t out = 0;  /* running hash (ECX in the shellcode) */
        uint32_t eax = 0;  /* characters read so far (EAX in the shellcode) */
        FILE *ptr_file;
        char buf[100];

        if (argc < 2) {
            printf("Usage: %s <file with one function name per line>\n", argv[0]);
            return 1;
        }
        ptr_file = fopen(argv[1], "r");
        if (!ptr_file) {
            printf("Unable to read text file\n");
            return 1;
        }
        while (fgets(buf, 100, ptr_file) != NULL) {
            /* strlen(buf)-1 skips the trailing newline of each line. */
            for (i = 0; i < strlen(buf) - 1; i++) {
                eax = eax | (unsigned char)buf[i];
                out = out ^ eax;
                out = rol(out, 3);
                out += 1;
                eax = eax << 8;
            }
            /* Output format: 0x<hash>,<function name> */
            printf("0x%08x", out);
            printf(",");
            printf("%s", buf);
            eax = 0;
            out = 0;
        }
        fclose(ptr_file);
        return 0;
    }

Import data into IDA

Basically, the script below adds enums to IDA; afterwards, just press 'M' on every hash to get the function name. If a hash is not found, consider importing other DLLs. I didn't find any function in IDA to automatically "refresh" a hex value into an enum value; if one is available, please let me know.
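
The import step consumes the "0x<hash>,<name>" lines written by the C program. Before touching IDA, the parsing and name sanitizing can be tried out on their own. The sketch below is illustrative only: parse_hash_line and load_hash_names are hypothetical helper names, and keeping the first name when two names share a hash is an assumption of this sketch, not something the original script does.

```python
import re

# Characters replaced with "_" in names (same class as the script's regex).
SANE_NAME_RE = re.compile(r"[@?$:`+&\[\]]")


def parse_hash_line(line):
    """Split one '0x<hash>,<name>' line into (hash as int, sanitized name)."""
    value, name = line.rstrip("\r\n").split(",", 1)
    return int(value, 16), SANE_NAME_RE.sub("_", name)


def load_hash_names(lines):
    """Build a {hash: name} table from an iterable of text lines."""
    table = {}
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        value, name = parse_hash_line(line)
        # Assumption: on a hash collision, keep the first name seen.
        table.setdefault(value, name)
    return table
```

With such a table in hand, a hash copied from the disassembly can be looked up directly, for example load_hash_names(open(path)).get(some_hash), where path and some_hash stand for your own file and constant.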

IDAPython script:

    import re

    import idaapi
    import idc

    SANE_NAME_RE = re.compile(r"[@?$:`+&\[\]]", 0)

    def sanitize_name(name):
        # Replace characters that are not allowed in IDA names.
        return SANE_NAME_RE.sub("_", name)

    def main():
        enum_name = idc.AskStr("Kernel32_Functions_Hash", "Enter Enum name:")
        enum_id = idc.AddEnum(0, enum_name, idaapi.hexflag())
        print('Enum id %d' % enum_id)
        file_path = idc.AskFile(0, "*.*", "Open txt file")
        hash_file = open(file_path, 'r')
        # Each line has the form "0x<hash>,<function name>".
        for line in hash_file:
            addr, name = line.split(',')
            addr_hex = int(addr, 16)
            name = sanitize_name(name).rstrip('\n')
            # Add the hash as an enum member named after the function.
            idc.AddConstEx(enum_id, name, addr_hex, -1)
        hash_file.close()

    if __name__ == '__main__':
        main()

After executing the script and applying the enum to the hash value:

    .data:00405159 mov ecx, KERNEL32_ExitProcess
    .data:0040515E mov edx, [ebp+4]
    .data:00405161 call get_func_addr
    .data:00405166 push 0
    .data:00405168 call eax

Conclusion

As shown in another post, even without a clean IAT, or without loading a regular binary file, as when dealing with shellcode, it's still possible to do a decent static analysis by mixing IDAPython with other tools.

Source: Shellcode analysis like a semi-PRO | drimeldotorg