Nytro
Posted January 15, 2012

Obfuscated JavaScript 2.0 - Building an encoder

JavaScript is a wonderful language full of tricks, power and the element of confusion. In this day and age it is likely that most people handling PDF, Java, Flash or browser-based exploits have either seen, reversed or been owned due to JavaScript. To this day attackers continue to find clever new ways of hiding their exploits or making the reversing process a nightmare, but not many have turned to the web 2.0 features. M86 wrote up an entry last week detailing some malware that used AJAX to fetch a portion of its shellcode. Oddly enough, over the winter break, I decided it would be fun to write my own JavaScript encoder with the intention of making it a royal pain to reverse. My encoder also used AJAX, but in a nastier way, so I felt now was the time to do a write-up on it. That, and I am in Miami this week attending Infiltrate, where there is nothing but offensive events happening all around me, so this is my attempt to fit in.

This post won't cover the creation of the encoder, as that in itself could take a couple of posts. Instead, focus will be placed on some of the techniques I used and how the overall output product is generated. Comparisons will also be made between my output and what is currently being seen in the wild.

Routines

The routines used in my encoder are similar to what has already been put out by attackers, but with some more technical aspects to ensure the code is not easily reversed.
The following describes the flow of transformation:

1. Code to be encoded is taken in
2. Encoder sets and splits are generated to be used later in the routines
3. Code is run through a function that converts ASCII to its number form
4. Each number is then mapped to a random alpha key, resulting in a single character
5. Each character is mapped against the encoder set, which results in a long string made up of 3 unique characters
6. Alpha key is stored server side and mapped to a seed token
7. Round one decryption routine is built taking all variables listed above into account (additional data for the round two decryption routine is stored local to the class)
8. Output generated from routine one is fed to routine two
9. Steps 1-5 are run again
10. Round two decryption routine is built taking pieces of data from round one

Output Code and Results

The output code is large, so here is a screenshot capturing the bulk portion of the code:

As you can see, it clearly looks like something malicious is happening here, but without reversing you are left guessing what exploit could be used in the delivery. If you want a live example, visit here, and if you just want the output sample, then see here. After running my code through Wepawet and Jsunpack a few times, I was able to tune the script such that it would be flagged as benign. This is mostly due to the fact that jQuery is required to fully decode the payload and neither of these engines seems to account for that.

In the example I decided not to output the original code that I fed to the encoder, so if you want to have a shot at reversing it, go ahead and email your answer. There won't be any prizes for solving the puzzle, but it could be good practice.

Variable Names

Almost all obfuscated JS makes a point of creating random variable names. At first glance it is difficult to identify what is going on, but after doing several finds and replaces, you have a pretty good idea where to hook so you get the output result.
Instead of doing a 100% random sequence, I opted for something a bit more annoying. All my variable names are derived from a single string made of one repeated letter whose case has been randomized. Each variable is essentially carved from the core string at random lengths and offsets. This results in variable names that are sometimes contained within other variable names. To illustrate what I mean, take a look at this example:

As you can see, no longer can you simply find and replace every instance of a variable. In fact, you can almost never do that with any one variable without first checking where else it occurs. Not only is this effective in making the code difficult to read, but it also retains the same effect that existing code achieves.

Invisible Payloads

Within the ASCII character set are numerous characters that print as blanks. Unfortunately, only four of these characters are printable in a web environment. These characters are spaces, horizontal tabs, vertical tabs and new pages (form feeds). It is ideal to have one of these four characters serve as a spacer, as it makes it possible to identify where separation begins and ends in the encoded output. This leaves three characters that can be used to encode our input code. Fortunately, if we do not take case into account, then three characters taken three at a time give us 3 to the 3rd power (27) combinations, allowing us to represent the entire lowercase alphabet in invisible characters (this process is used in step 5 of the flow).

Initially all encoded output was produced using the same three characters, but that resulted in the same pattern every time. At first this would be a pain to reverse, but once you knew what you were dealing with, it wouldn't be hard at all. To combat this issue, I decided to randomly select the spacer from the four values, and included the remaining three in an array of "encoding characters".
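As a rough illustration of the whitespace encoding described above, here is a minimal sketch, not the encoder's actual code. The names `pickEncoderSet`, `encodeInvisible` and `decodeInvisible` are made up for this example, and it assumes the input is already reduced to lowercase a-z, with each letter written as three base-3 digits.

```javascript
// The four ASCII blanks named in the post:
// space, horizontal tab, vertical tab, form feed ("new page").
const BLANKS = [" ", "\t", "\v", "\f"];

// Pick one blank at random as the spacer; the other three become base-3 digits.
// (Hypothetical helper; `random` is injectable so the sketch can be tested.)
function pickEncoderSet(random = Math.random) {
  const digits = BLANKS.slice();
  const spacerIndex = Math.floor(random() * digits.length);
  const [spacer] = digits.splice(spacerIndex, 1);
  return { spacer, digits };
}

// Encode lowercase a-z: each letter becomes three base-3 digits (3^3 = 27 >= 26),
// and letters are separated by the spacer.
function encodeInvisible(text, { spacer, digits }) {
  return Array.from(text, (ch) => {
    const n = ch.charCodeAt(0) - 97; // 'a' -> 0 ... 'z' -> 25
    if (n < 0 || n > 25) {
      throw new RangeError("only a-z is supported in this sketch");
    }
    return digits[Math.floor(n / 9)] + digits[Math.floor(n / 3) % 3] + digits[n % 3];
  }).join(spacer);
}

// Reverse the mapping; the decoder must know which blank is the spacer.
function decodeInvisible(payload, { spacer, digits }) {
  if (payload === "") return "";
  return payload
    .split(spacer)
    .map((group) => {
      const [a, b, c] = Array.from(group, (d) => digits.indexOf(d));
      return String.fromCharCode(97 + a * 9 + b * 3 + c);
    })
    .join("");
}
```

Calling `encodeInvisible("hello", pickEncoderSet())` yields a string made only of blanks, and which blank plays which role changes from run to run, which is the point of randomizing the spacer.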
Randomizing the spacer this way would sometimes result in a 100% visible encoded output, but other times it results in half-way visible or three-quarters visible output, making it difficult to identify which invisible character is being used for what. Below is a small portion of the output code after being encoded:

Preemptive Hooks

I wrote about JavaScript Hookers a few weeks ago and it dawned on me that these did not exist in malicious output. If I were hooking certain functions to reverse a payload, then why couldn't an attacker do the same? Following the same concept as a reverse engineer, I hooked "eval" and "alert", and reassigned "console" and "document" before clobbering them. Essentially what this means is that if you try to use "console.log" or "alert" when reversing this code, it will send you into an infinite loop. Also, because I reassign certain functionality to random variables, you need to keep those preserved, otherwise you will break the code later when they get used.

To counter this you would need to inject JS after my hooks and redefine the functions back to their original state. I am not certain how this would be done, but in the event someone managed to do it, I decided to throw in another problem. Some hooks are defined in one round and then later used in the second round decoder, meaning you can't just redefine everything back to how it was. Furthermore, on the second round, I clobber all the global functions listed above again, forcing the user to inject another override.

AJAX Required

As part of the decoding routine, AJAX is used to pass data back to the server to get the proper return value. This is based on the alpha key generated and stored during the encoding process. Since this key is random, you are out of luck if you don't have access to the server, unless of course you want to brute force the values.
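To make the server-side half of this concrete, here is a hedged sketch of a seed-token lookup. It is not the real handler: the post does not show its storage, endpoint or parameters, so `createKeyStore`, `register`, `lookup`, the `index` argument and the in-memory `Map` are all assumptions made for illustration. The `callsLeft` counter is one guess at how the per-token call limit that this section goes on to describe might be enforced.

```javascript
// Hypothetical in-memory stand-in for the server-side store.
// `random` is injectable so the behavior can be tested deterministically.
function createKeyStore(random = Math.random) {
  const records = new Map(); // seedToken -> { alphaKey, callsLeft }

  return {
    // At encode time: remember the alpha key and how many lookups decoding needs.
    register(seedToken, alphaKey, callLimit) {
      records.set(seedToken, { alphaKey, callsLeft: callLimit });
    },

    // Per AJAX request: answer from the alpha key while the budget lasts,
    // then (or for unknown tokens) answer with a random letter instead.
    lookup(seedToken, index) {
      const record = records.get(seedToken);
      const valid =
        record !== undefined &&
        record.callsLeft > 0 &&
        Number.isInteger(index) &&
        index >= 0 &&
        index < record.alphaKey.length;
      if (valid) {
        record.callsLeft -= 1;
        return record.alphaKey[index];
      }
      return String.fromCharCode(97 + Math.floor(random() * 26));
    },
  };
}
```

Because a wrong answer looks just like a right one, an analyst replaying the request after the budget is spent gets no error to signal that the decoded output is now garbage.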
The AJAX portion of the encoding is only present in round two of the decoding routine, so at first glance there is no mention of any AJAX. If you copy and paste the script into a reversing environment, you will be able to decode the first round without issue, but the second round will leave you stuck.

The nature of AJAX forces you to hardcode a URL that is within the same domain as your hosted code. This is not really an issue, as we can control this value server-side, but to pull the correct alpha key we need to pass a unique seed token. Someone reversing the code could copy the URL and parameters to just get the values and subvert the whole process. To counter this, the number of iterations is calculated and stored with the seed token and alpha key mapping. This means that the payload is literally only good to run one time. After the sequence has run and talked with the server for its set call limit, the server starts spewing random values, causing the decoding to fail.

Exception Clauses

Try/catch clauses have been known to cause problems for automated analysis engines, but that has been fixed to some extent. For the AJAX portion of the code above to work, I include jQuery (it is small and ubiquitous), which means it uses its own syntax for certain actions. Analysis engines are currently not smart enough to include these libraries and, as a result, we can use this against them in our try/catch clauses. If we wrap our entire code base and routines in try/catch clauses where the try attempts to do something with jQuery, then we know the engine will fail and therefore hit the catch. This simple yet effective technique is used in both rounds of the decoding process.

It should be noted that in some cases the catch portion of the code returns actual data that can be used within the overall decoding process. This means that the code won't break or cease to function just because the exception is hit.
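The try/catch trick can be sketched roughly as follows. This is an illustration under assumptions, not the encoder's output: `decodeChunk` and its `scope` parameter are invented here so the example runs outside a browser, whereas code on a real page would reference the global `jQuery` directly. The only library call used is `jQuery.map(array, callback)`.

```javascript
// `scope` stands in for the page's global object (an assumption of this sketch).
// `random` is injectable so the decoy branch can be tested deterministically.
function decodeChunk(codes, scope, random = Math.random) {
  try {
    // Real path: only succeeds where jQuery is actually loaded.
    return scope.jQuery
      .map(codes, (code) => String.fromCharCode(code))
      .join("");
  } catch (err) {
    // Decoy path: an engine without jQuery lands here and still gets
    // plausible-looking letters back, so nothing visibly breaks.
    return codes
      .map(() => String.fromCharCode(97 + Math.floor(random() * 26)))
      .join("");
  }
}
```

With jQuery present, `decodeChunk([104, 105], window)` follows the real path; in an engine that never loaded the library, the same call quietly returns two random letters instead of failing.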
Put differently, a good exception is not wasted and instead is used to throw the analyst off. If you are not careful, you could easily miss the fact that the catch is hit in the round two decryption, resulting in random characters being generated for the output string.

Encoded Code/Shellcode Detour

One of the less technical or amusing pieces of the code is what I call the "detour data". This is essentially just random code made to look interesting so the user spends time saving, reversing or trying to make sense of it. There is nothing stopping it from getting used later on, but for now it is not and just takes up space. Since it is random, it too changes, giving it the appearance of being useful.

Comment Bombs

When reversing obfuscated JS, it is normal to remove it from the live environment and throw it into a safe place where it can be run. The first tool that comes to mind for dealing with this sort of problem is Malzilla. It does a great job making ugly code readable and assisting in the process of reversing. Unfortunately, the code used to "beautify" the JS is flawed to some extent. If I throw in some specially made comments, when you hit the cleanup it completely sprays the comment data into the code, thereby breaking it. This is by no means advanced or technical, but it can be confusing if done near a single-instruction if/else statement.

Tailored Output

The code currently generated by the encoder does not account for the browser version being used. Keep in mind that if you know the browser, or have a reasonable idea of what version it is, then there are certain things you can do to make life hell or tailor your output code to make it less bulky. As an example, think of Firefox and the Firebug extension. Firebug is great, almost too great, for doing live analysis or code changes. If we can detect that Firebug is on, then why not kill ourselves to avoid being analyzed?
The current output will not kill itself if it senses Firebug, but it will clear the console to keep all the AJAX calls from being seen. This is just a small example, but it helps illustrate what more could be done.

Future Improvements

One and Done

Following the same trend of limiting AJAX calls, there is nothing stopping an attacker from generating a random directory to hold a randomly named JS file that deletes itself after being run once. Imagine a user gets compromised and you now want to see what was used, only to find that the file no longer exists on the server and the payload is useless without the proper decoding handler. I have toyed around with this, but do not intend on sharing it at this time.

Secure Chatting

HTTP GET requests are used when making the AJAX call back to the server. This could easily be changed to HTTPS POST requests, thereby hiding what was sent and killing any hope of understanding what was going on between the client and server. Enabling such a setup is as simple as changing the web server configuration and the AJAX call within the code.

Conclusions

Once again, this example goes to show that attackers can do a lot more to make life hell. The fact that they don't is a huge relief to us, but given we are already starting to see AJAX used to fetch shellcode, I can say with certainty that these sorts of techniques and more are going to show up in malicious code soon. If we start working against them now, it will be easier when they are being used for evil. At this point in time I am not releasing the encoder, as that would spoil the challenge if someone wanted to accept it, but if you are interested in knowing more or discussing the process, email me.

Source: Obfuscated JavaScript 2.0 - Building an encoder - 9b+