Nytro Posted April 19, 2014 Report Posted April 19, 2014 PlaidCTF 2014 Paris writeupPlaidCTF 2014 Paris writeupClue: This binary was found on some of our Windows machines. It's got The Plaguewritten all over it. What secrets are contained inside?The download is a tbz file containing:$ md5sum paris.exe11e9ffcc9d5e963eb9c2d806cedb2ad3 *paris.exeUsing file we learn:$ file paris.exeparis.exe: PE32 executable (GUI) Intel 80386 (stripped to external PDB), for MS WindowsIDA reports that the import section has been destroyed which is typical ofprograms that have been obfuscated.The entry to the program is unspectacular, using ReadConsole to read up to 255 charactersinto a password buffer located at 0x401490, then overwriting the last two bytes (CRLF)of the input with nulls.Next function sub_402066 is called and we can see that if we return withesi == 0xdeadbeef and the global variable word_4024A3 == 0, that they programwill print "Yeah!"We surmise that sub_402066 is the password checking function and investigate.Here is what the function looks like:.text:00402066 exceptionHell proc near ; CODE XREF: start+56.text:00402066 xor ecx, ecx.text:00402068.text:00402068 loc_402068: ; CODE XREF: exceptionHell+39.text:00402068 push offset MAIN_EXC_HANDLER ; setup SEH == sub_402376.text:0040206D push fs:off_7FFDE000.text:00402074 mov fs:off_7FFDE000, esp.text:0040207B mov eax, 0.text:00402080 mov [eax], eax ; NULL pointer deref generated MEMORY_ACCESS exception.text:00402082 sub edi, 1111h ; This is important .text:00402088 pop fs:off_7FFDE000.text:0040208F add esp, 4.text:00402092 cmp value_1, 1.text:00402099 jz locret_401C90.text:0040209F jmp short loc_402068.text:0040209F exceptionHell endpThis function simply generates MEMORY_ACCESS exceptions until the global variablehalt_flag (byte_4024AA) == 1. So we move on to exploring the behavior of MAIN_EXC_HANDLERat 402376. As an SEH handler it has the following prototype:EXCEPTION_DISPOSITION __cdecl _except_handler( struct _EXCEPTION_RECORD *ExceptionRecord, void * EstablisherFrame, struct _CONTEXT *ContextRecord, void * DispatcherContext );For this program the CONTEXT* is the thing we are interested in. This is where the faulting threads execution context is saved and provided to the exception handler for examination andpossible modification. The CONTEXT struct contains space for all CPU registers, with the general purpose registers beginning at offset 0x9C into the struct.MAIN_EXC_HANDLER uses a global variable (byte_40225E) as an index into a jump tablelocated at 0x40220E. The jump table contains 20 addresses and the function simply loopsthrough the address from 0..19 repeatedly for the duration.At this point it starts to become clear that this main program loop is a simple virtual machinewith a few twists. The basic details of the machine are as follows: 1. The machine has 20 simple instructions. The machines byte code begins at 0x401D8E. 2. The machine's pc is at word_4024A7. 3. The machine has a 256 word stack that grows from low memory to high memory at word_401A90 4. There is an initial value of 0x5a4d on the stack and the initial SP is 2. Push is post-increment, pop is pre-decrement. 5. The machine makes use of 16-bit registers that overlay the exception handlers CONTEXT struct such that uint16_t *registers = (uint16_t*)&CONTEXT._Edi. Thus registers[0] and registers[1] overlay _Edi, registers[2] and registers[3] overlay _Esi etc. This is important because the faulting thread's registers are restored from the CONTEXT struct on return from the exception which means that the VM register may be manipulated from within the exception handler and from outside the exception handler where they take their values at the moment an exception is triggered.In practice, the only register of significance that is changed outside the exception is ediwhich is manipulated at 402082 where 0x1111 is subtracted upon return from every exception.The next step in understanding the program was to determine what the 20 VM instructions doand if possible disassemble the virtual machine program responsible for verifying our password.None of the 20 instruction handlers are terribly difficult to understand and we end up withthe following set of operations:Opcode Operation 0 nop 1 mov rA, rB ; registers[A] = registers[B] 2 mov password[rA], rB ; *(uint16_t*)(registers[A] + (uint8_t*)password) = registers[B] 3 mov rA, password[rB] ; registers[A] = bswap(*(uint16_t*)(registers[B] + (uint8_t*)password)) 4 mov rA, #imm ; registers[A] = 16 bit immediate 5 add rA, rB ; registers[A] += registers[B] 6 sub rA, rB ; registers[A] -= registers[B] 7 xor rA, rB ; registers[A] ^= registers[B] 8 and rA, rB ; registers[A] &= registers[B] 9 shr rA ; registers[A] >>= 8 10 not rA ; registers[A] ~= registers[A] 11 inc rA ; registers[A]++ 12 cmp rA, rB ; ZF = rA == rB 13 jmp dest ; pc = dest 14 beq dest ; if (ZF) pc = dest 15 push rA ; stack[sp] = registers[A], sp += 2 16 pop rA ; sp -= 2, registers[A] = stack[sp] 17 bswap rA ; registers[A] = bswap(registers[A]) 18 xortab rA ; for i in 0..512: lookup_table[i] ^= xor_key[registers[A]] 19 haltWhere:0x401490: password0x401DFA: xor_key0x401690: lookup_tableOne of the challenges with understanding what the virtual machine does is that atfirst glance it appears that the program simply does the following: i = 0 while not halt_flag: execute_inst(i) i = (i + 1) % 20Upon further investigation it turns out that every instruction begins by savingall of the writable portions of the VM to a dedicated save area. Areas saved includethe password buffer, stack, lookup table, stack pointer, program counter, halt flag,and registers.The operands for each instruction are encoded in 1-3 bytes taken from the "program"array at 0x401D8E. These bytes specify which VM registers the instruction operates on,any immediate data or jump target, and a value that indicates whether the instructionwill be discarded or not. When an instruction is discarded, the VM state is restored from the state that was saved at the beginning of the instruction. A computation in the VM advances only when an instruction is not discarded.By modelling the behavior of the VM in a high level language we instrumented each instructionand recorded only those instructions that did not get discarded. From the instruction tracewe learned that the program being executed in the virtual machine looked like the following: 0x00: nop 0x01: nop 0x02: nop 0x03: mov r4, #0x3133 0x06: mov r6, #0x0 0x09: mov r8, #0xff00 0x0c: mov r10, #0xff; r6 used as index into our passwordLENGTH_LOOP: 0x0f: mov r0, htons(pass[r6]) <------- from 0x34 0x11: mov r14, r0 <***** r0 gets decremented by (18 * 0x1111) externally between 0x0f and 0x11 0x13: bswap r14 0x14: not r14 0x15: cmpeq r4, r14 0x17: beq 0x37 (BREAK) ; end of string \x00\x00 encountered 0x1a: mov r12, r14 0x1c: and r12, r8 ; HI(r12) 0x1e: and r14, r10 ; LO(R14) 0x20: shr r12, 8 0x21: xor r14, r12 ; xor the two bytes together 0x23: mov r12, #0x200 0x26: add r14, r14 ; double for uint16_t index 0x28: add r12, r14 0x2a: mov r14, htons(pass[r12]) 0x2c: bswap r14 0x2d: pop r12 0x2e: xor r14, r12 0x30: push r12 0x31: push r14 0x32: xortable r6 0x33: inc r6 0x34: jmp 0xf (LENGTH_LOOP)BREAK: 0x37: xor r14, r14 <---------- from 0x17 0x39: mov r4, #0x100 ; index to beginning of check values 0x3c: mov r12, #0xaf21 ; "end of" value for check values; r4 starts at offset 256 into password buffer (static content); increments by 2 each passCHECK_LOOP: 0x3f: mov r10, htons(pass[r4]) <---------- from 0x4c 0x41: bswap r10 0x42: inc r4 0x43: inc r4 0x44: pop r6 0x45: cmpeq r12, r10 0x47: beq 0x56 (MAYBE) 0x4a: cmpeq r6, r10 0x4c: beq 0x3f (CHECK_LOOP)FAIL: 0x4f: mov r6, #0x0 0x52: mov r4, #0x0 0x55: haltMAYBE: 0x56: mov r10, #0x5a4d <--------- from 0x47 ; "bottom" of stack marker 0x59: cmpeq r10, r6 0x5b: beq 0x65 (SUCCESS)FAIL2: 0x5e: mov r6, #0x0 0x61: mov r4, #0x0 0x64: haltSUCCESS: 0x65: mov r6, #0xdead <---------- from 0x5b 0x68: mov r4, #0xbeef 0x6b: haltPerhaps the trickiest aspect of the computation happens at 0xf, 0x11. BecauseVM register equates to DI, the sub edi, 0x1111 instruction in the mainexception generator loop modifies the content of r0 between 0xf and 0x11.This is complicated by the fact that large numbers of VM instructions are simply discarded but 0x1111 is subtracted from edi regardless of whetherthe VM instruction was discarded or not. It was observed that 17 instructionswere discarded between 0xf and 0x11 so r0 decrements by 18*0x1111 between 0xf and0x11.At a high level, the program above advances through our password 1 byte at a timeprocessing two bytes at a time, ie pass 0 takes pw[0:2], pass 1 takes pw[1:3}, etc.Processing ends when a double null is encountered. Each word is used to computeand index into a lookup table from which a word is extracted and placed on the stack.Following each pass, the entire lookup_table is xor'ed using a key byte based on the current index into the password buffer.Once the end of the password is reached, the stack is unwound and the values on the stack compared against an embedded list of values stored at 0x401590. Noting that 0xaf21is used as an end marker, we can see that our password needs to generate 29 stack words.In other words, the password we are searching for is 29 characters long..text:00401590 check_vals dw 2E0Bh, 6D02h, 7492h, 870Ch, 93B9h, 0EDB3h, 312Ch, 7107h.text:00401590 dw 7D10h, 2007h, 0E7C6h, 3A1Bh, 0BAD8h, 9417h, 0FA6Bh.text:00401590 dw 0BE6Ch, 621Dh, 4D3Bh, 47ADh, 7A7Ah, 3E9Dh, 53A2h, 0F22Fh.text:00401590 dw 0D1A9h, 0F574h, 8173h, 11BCh, 0AE15h, 6179h, 0AF21hThe following code describes how passwords are processed and verified; for (pw = (uint16_t*)(password + r6); *pw; pw = (uint16_t*)(password + r6)) { uint16_t val = *pw; val1 = ~(bswap(val - 18 * 0x1111)); val2 = ((val1 >> 8) ^ val1) & 0xff; push bswap(lookup_table[val2]) ^ TOS //here entire lookup_table is xor'd against xor_key[r6] for (int i = 0; i < 512; i++) *(r6 + (uint8_t*)lookup) ^= xor_key[r6]; r6++; } idx = 0; while (1) { v = pop() if (v == 0x5a4d && check_vals[i] == 0xaf21) { esi = 0xdeadbeef; return; } if (v != check_vals[i]) fail }Recovering the password is a matter of understanding the following1. Each work placed on the stack depends on two adjacent bytes of the password2. The stack word is generated by using the two password bytes to read a value from the lookup table which is xored with the current top item on the stack3. In the last pair of password bytes, one of the bytes is the terminating null4. The fact that we know one byte in the password byte pair allows us to determine the second byte of the pair by xoring the top two check values to learn the corresponding lookup table, and work the process backward to learn what the other byte of the password byte pair was. 5. Once learn one byte we can repeat step 4 to gradually reveal all of the key bytes.As an example of working the problem, the first item on the stack needs to be0x6179 = lookup_val ^ 0x5a4dlookup_val = 0x6179 ^ 0x5a4dlookup_val = 0x3B34which we find at 221 into the lookup table221 is thus the result of a transformation on password[0] and password[1]we can similarly compute the lookup table indicies for all of the check_valuesknowing that password[29] == 0 allows to work backwards to solve forpassword password[28], password[27], etc.A solver may be found at: http://www.idabook.com/paris_solver.cFinal password: V1rTu4L_M4ch1n3s_4r3_Aw3s0m3!Sursa: http://www.idabook.com/paris_writeup.txt Quote