A Research on general auto-unpacking methods for Android applications

From: http://drops.wooyun.org/tips/9214

This post is the extended version of the article "A Research on General Auto-Unpacking Methods for Android Applications" presented at the Wooyun Summit.

0x00 Background & Significance

Compared with traditional PC applications, Android applications are easier to reverse: reversing can recover the application's Java code or smali intermediate language almost completely, and both carry rich high-level semantic information that is easy to understand and exposes the program logic even to less skilled attackers. Consolidation (hardening) protection services for Android applications were therefore born. At first only Party A companies provided such services; nowadays almost every large Internet company has one. Meanwhile, more and more money-related Android apps, such as banking applications, are being consolidated, which means the market is booming.

A typical consolidation protection service provides the following protections: anti-reversing, anti-tampering, anti-debugging, anti-theft, and so on. Although it cannot prevent or fix security issues or flaws residing in the application itself, it can effectively protect the application's true logic and integrity. However, these same features can be leveraged by malware. There is evidence showing that the rate of packed malware is increasing as consolidation protection becomes popular. On the one hand, malware analysis first requires unpacking; on the other hand, if a legitimate application can be easily unpacked and analyzed, the dangers it faces also rise.

0x01 Research Object

Normally a consolidation protection service offers both general consolidation solutions and customized solutions for DEX. Customized consolidation often requires closer integration with development and may involve deeper levels of consolidation (such as code consolidation). General DEX consolidation only requires users to submit a compiled Android APK. Currently the former lacks samples and needs close cooperation with consolidation providers, while the latter is provided as a free service by most providers and, for that reason, is applied more widely. This post focuses on the latter, i.e. protection of the executable DEX files, and aims at researching general recovery methods for DEX files. Customized consolidation services and obfuscation protection for native code are not covered here.

0x02 The Characteristics of Consolidation Services

We'll detail the common characteristics of consolidation services by statically reversing one consolidation scheme as an example. A consolidation provider used this scheme several months ago; because consolidation services frequently change their decryption algorithms and schemes, these implementation details no longer apply to current products or to other consolidation services, but the overall consolidation concept and protection methods are much the same.

Usually, when we use static analysis tools to analyze a consolidated app, AndroidManifest.xml keeps all the original information, including the defined components and permissions, and adds an entry point of its own on top of that, which is often an Application class.

However, the code of the DEX will look like this:

[Figure: decompiled view of the consolidated DEX, showing only stub classes]

The DEX contains only a few classes and methods, which mainly do some detection or preparation work and then dynamically load the original DEX file through a native library. Because of this dynamic loading mechanism, the consolidated DEX file does not contain the real code of the original DEX file. (Some consolidation products do not dynamically load a complete DEX.)

Next, you can use IDA to find the entry point and load the native code that actually runs. The .so library is often obfuscated and packed. One method is to corrupt the ELF header information so that IDA's parsing fails, as shown in the figures below:

[Figures: IDA failing to parse the packed native library because of its corrupted ELF header]

As readelf shows, several fields in the ELF header are apparently invalid.

[Figure: readelf output showing the invalid ELF header fields]
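The kind of damage visible in the readelf output can also be spotted programmatically. The following is an illustrative sketch, not the actual repair code from this case: packers typically corrupt the section header fields (e_shoff, e_shnum, e_shentsize) that disassemblers rely on, while the kernel loader, which only uses program headers, still runs the binary.

```cpp
#include <cassert>
#include <cstddef>
#include <elf.h>  // standard Elf32_Ehdr / Elf32_Shdr definitions

// Rough sanity check on the section-header fields that tools like IDA
// and readelf rely on. A packed .so often fails one of these checks.
bool sectionHeadersLookSane(const Elf32_Ehdr& eh, size_t fileSize) {
    if (eh.e_shentsize != sizeof(Elf32_Shdr)) return false;  // bogus entry size
    if (eh.e_shoff == 0 || eh.e_shnum == 0) return false;    // stripped or zeroed
    // the whole section header table must fit inside the file
    return (size_t)eh.e_shoff + (size_t)eh.e_shnum * eh.e_shentsize <= fileSize;
}

// Helper to build a header with given section fields (for illustration).
Elf32_Ehdr makeEhdr(Elf32_Off shoff, Elf32_Half shnum, Elf32_Half shentsize) {
    Elf32_Ehdr eh{};
    eh.e_shoff = shoff;
    eh.e_shnum = shnum;
    eh.e_shentsize = shentsize;
    return eh;
}
```

Repairing such a header means writing plausible values back into exactly these fields so the parser accepts the file again.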

After repairing them, IDA is able to disassemble the .so file, and we can start analyzing from the entry point. You will notice many problems when decompiling to C code with F5: the contents of most functions cannot be decompiled into normal C code. In the assembly you'll see junk code like the following:

[Figure: the junk code pattern in the disassembly]

We summarized the junk-code pattern of this product: it pushes values onto the stack and then jumps by popping them back, which makes the decompiler mis-identify functions, because the decompiler tends to treat a push followed by a jump as a function call, while in fact the push is only used to compute register values. By pushing and popping in a way that defeats decompilation yet keeps the stack balanced, it then executes the genuinely useful instructions. The case above contains only two genuinely useful instructions.

The real assembly instructions can be extracted with a script or manually. After extraction you can reverse the code and find that it decrypts the JNI_OnLoad function. JNI_OnLoad then decrypts another ELF file from a blob of data, but at this point that ELF file still cannot be correctly disassembled. The code that follows continues to fix up the ELF's data: it decompresses the text section from the new ELF file and extracts a key from it to decrypt the rotext section. Finally, the real DEX-unpacking routine is decrypted, like this:

[Figure: the decrypted routine that unpacks the DEX]

The above process is actually a packer for an ELF file: the newly decrypted and fixed-up ELF is the real decryption routine of the DEX packer, and this one is neither obfuscated nor packed. By reversing it, you can see that it takes a piece of padding data appended to the original DEX to obtain the parameters needed for decryption and decompression; decrypting and decompressing the whole padding data yields the real DEX file. Of course, the ELF also contains some anti-debugging and anti-analysis code. Since we are doing static analysis, we can ignore that part; if you attach a debugger to do dynamic analysis of the process, e.g. to dump it, you'll have to consider how to bypass those anti-debugging techniques.

The above case is an instance of dynamic DEX loading. Although consolidation services vary in technical details, including decryption algorithms, junk-code patterns and ELF packing, this case is fairly representative of how most consolidation services that dynamically load DEX decrypt and release it, and of the static-reversing ideas used to break them. It is still only a glimpse, because consolidation services also frequently change their decryption algorithms and consolidation methods.

At the same time, some consolidation services do not dynamically load the whole DEX file; instead they modify it dynamically at run time. Under this mechanism the consolidated DEX file contains part of the accurate information from the original DEX, but the protected code is hidden by other means. Beyond that, there are combinations of the two methods, which will be covered later.

To sum up, a consolidated Android application essentially hides the real DEX file and adds protection features to prevent reversing. As shown above, reversing its unpacking algorithm through static analysis alone takes time. Moreover, different consolidation services use different algorithms, and each frequently changes its algorithm and technique, so a static-reversing unpacking technique fails within a short period of time. Meanwhile, consolidation services also employ Android application protection measures other than dynamic DEX loading. We'll only touch on them here, because this part could fill a post of its own.

The first class is integrity checking. It includes checks on the application's own integrity, for instance examining the checksum of the DEX file in memory, or the application certificate, to determine whether it has been repacked or injected with code. It also includes checks on the environment: checking certain device files to detect emulators, using ptrace or the process status to determine whether it is being debugged, or hooking specific functions to prevent the code memory from being accessed or dumped.
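One concrete form of the debugger check mentioned above, a common Android/Linux technique rather than code from any particular product, is reading the TracerPid line of /proc/self/status: a nonzero value means some process (a debugger, or a dumper using ptrace) is attached. A sketch of the parsing step, taking the status text as a string so it can be exercised off-device:

```cpp
#include <cassert>
#include <string>

// Parse the TracerPid value out of /proc/<pid>/status text.
// Returns -1 if the line is missing. On a real device the caller would
// read the file for its own pid; taking a string here keeps it testable.
int tracerPidFrom(const std::string& statusText) {
    const std::string key = "TracerPid:";
    std::string::size_type pos = statusText.find(key);
    if (pos == std::string::npos) return -1;
    return std::stoi(statusText.substr(pos + key.size()));  // skips the tab
}
```

A packer that sees a nonzero TracerPid typically kills the process or corrupts its own state.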

The second class is code obfuscation. Obfuscation normally requires modifying source code or byte code, with the purpose of making the semantics harder for an analyst to understand. Common modifications include renaming variables, methods or classes, and encrypting constant strings. Others use the Java reflection mechanism to invoke methods, insert junk instructions or dead code to scramble the application's control flow, replace the original basic instructions with more complex operations, or break up the control flow through JNI methods.

We define the third class as anti-analysis or code-hiding techniques, whose purpose is to use all kinds of methods to keep application code from being exposed and analyzed. The most popular is the general encryption protection for DEX described above, along with dynamic self-modification at run time: decrypting and executing code only when execution reaches a certain class or method, and possibly restoring parts of the Dalvik data structures afterwards to make analysis harder. Some anti-analysis techniques rely on little tricks, for example exploiting bugs in static analysis tools or turning their parsing features against them, including manifest cheating, APK pseudo-encryption, hiding methods inside the DEX file, or inserting illegal or non-existent classes to crash the analysis tool.

0x03 Thoughts on Unpacking

There are currently two common types of unpacking methods for consolidated applications. One is static reverse analysis; its shortcoming is obvious: it is difficult to carry out and cannot keep up with algorithm changes. The other is unpacking based on memory dumps. This method first requires bypassing all anti-debugging measures as well as newly emerging anti-dumping tricks; for example, modified DEX headers defeat searching by enumeration, and dynamically altered Dalvik data structures destroy the DEX in memory. These techniques force you to do more manual repair work, guided by the observed consolidation features, even after you have dumped the DEX files.

Therefore we propose this general auto-unpacking method based on dynamic analysis. Regardless of the particular consolidation implementation, it sidesteps anti-debugging techniques while requiring little repair work.

First of all, our unpacking target is the DEX file inside Android applications, so we chose to instrument Android by directly modifying the Dalvik source code. Since the DEX file is interpreted by Dalvik, all of its true behaviour is exposed to Dalvik. Dalvik has multiple interpretation modes, among which there is a C++-based portable mode, while the other modes are written in platform-specific assembly. To implement the instrumentation, once we detect that the app to be unpacked is being interpreted, we first switch the interpretation mode to portable (in source file dalvik/vm/interp/Interp.cpp). The benefit of directly modifying the execution environment is that it makes the unpacking behaviour harder to detect; compared with methods such as attaching a debugger, it is far more transparent. Another advantage of working inside the interpreter is that we don't need to care at which stage the consolidated app loads, initializes and decrypts its code: we can fetch the real data and behaviour directly at run time. The instrumentation code is placed at the switch point where Dalvik interprets each instruction (dalvik/vm/mterp/out/InterpC-portable.cpp), so the unpacking operation can be performed at any instruction during execution, which also handles consolidated apps that decrypt code only as it is executed. Finally, a source-based modification can run on real devices: the Android source builds perfectly for all Nexus phones, so we don't have to confront the consolidated app's emulator-detection tricks.
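The triggering logic just described can be sketched as follows. Everything here is a simplified, hypothetical mirror of the idea; the names (kTargetPackage, onInstruction, performUnpack) are ours, and the real hook would sit at the instruction-switch point in dalvik/vm/mterp/out/InterpC-portable.cpp.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch of the per-instruction hook: when the interpreter is
// executing the target package, trigger the unpacking routine exactly once.
static const std::string kTargetPackage = "com.example.packedapp";  // assumed name
static bool gUnpacked = false;

static void performUnpack() {
    // In the real implementation this would walk the Dalvik structures and
    // dump or rebuild the DEX; here it only records that unpacking ran.
    gUnpacked = true;
}

// Conceptually called at every interpreted instruction.
// Returns true if this call triggered the unpack.
bool onInstruction(const std::string& processName) {
    if (gUnpacked || processName != kTargetPackage)
        return false;
    performUnpack();
    return true;
}
```

On a real device the process name would come from /proc/self/cmdline; everything else about this flow is an assumption made for illustration.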

The essence of unpacking is to obtain the app's actual behaviour, so the instrumentation reads the Dalvik data structures in memory that reflect the real code being executed. When an instruction is executed, we can get the method it belongs to, a Method structure. Each executing method contains the class object clazz it belongs to (source file dalvik/vm/oo/Object.h); clazz contains a pDvmDex object (dalvik/vm/DvmDex.h), whose pDexFile structure (dalvik/libdex/DexFile.h) represents the DEX file. That is to say, once execution reaches the current method, curMethod->clazz->pDvmDex->pDexFile yields the DEX file structure this method belongs to. This structure holds all the in-memory information of the DEX file as it is interpreted, and the real DEX can be recovered by parsing this DexFile structure.
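The pointer chain can be shown with minimal stand-ins for the structures named above, reduced to the single field each step needs; the real, much larger definitions live in the Dalvik source files just cited.

```cpp
#include <cassert>
#include <cstdint>

// Minimal mirrors of the Dalvik structures, keeping only the fields the
// traversal uses; the real types are in dalvik/vm/oo/Object.h,
// dalvik/vm/DvmDex.h and dalvik/libdex/DexFile.h.
struct DexFile     { const uint8_t* baseAddr; uint32_t fileSize; };
struct DvmDex      { DexFile* pDexFile; };
struct ClassObject { DvmDex* pDvmDex; };
struct Method      { ClassObject* clazz; };

// The traversal described in the text:
// curMethod->clazz->pDvmDex->pDexFile
DexFile* dexFileOf(const Method* curMethod) {
    return curMethod->clazz->pDvmDex->pDexFile;
}
```

Because every executing method carries this chain, the DEX structure is reachable from any instruction, with no need to search memory for it.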

0x04 A Simple Unpacking Implementation

Our first thought was to look for an existing program that interprets Dalvik byte code but takes the in-memory DexFile structure as input, is implemented in C/C++ directly on top of the source code, and does not read a static DEX file as input like most static reversing tools. A search showed that the Android source already provides DexDump (dalvik/dexdump/DexDump.cpp), which satisfies this need. We slightly modified its code and inserted it into the interpreter as follows:

[Figure: the modified DexDump code inserted into the interpreter]

We use it to read the DexFile; by default this code is executed in the app's main Activity. The main Activity can be obtained from AndroidManifest.xml, because the entry-point class in that file is not hidden. We found this handles most consolidated apps and recovers the real code that they hide in the DEX file. The output is as follows:

[Figure: DexDump output of the recovered code]

But this method has an obvious disadvantage: the output is a textual form of the Dalvik byte code. On the one hand it cannot be converted back to Java; on the other hand, text is not convenient for further, more complex analysis. Our real goal is a complete DEX file.

0x05 A Perfect Unpacking Implementation

Normally, as a last step, many other unpacking tools directly take pDexFile->baseAddr (or pDvmDex->memMap) as the starting address and dump memory of the whole file size to recover the complete DEX file. However, we found that for some consolidated apps the code dumped this way still doesn't contain the real code. This is because at run time part of the real data of the DEX file is modified and mapped outside the file's contiguous memory region, as shown in the figure below. A DEX file is supposed to occupy one contiguous memory region when loaded; it is then parsed and filled into the structures Dalvik needs for execution, and the index structures should point into contiguous data blocks. A consolidated app, however, may make changes: for example, modify the data in the header, allocate new non-contiguous memory to store data, and make those index entries point to the newly allocated blocks. A direct dump then cannot produce the complete DEX file.

[Figure: DEX data relocated out of the contiguous mapping by the packer]
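The naive dump described above amounts to trusting the header and copying one contiguous range. A sketch, run here against a fabricated in-memory "DEX" rather than a real mapping: per the DEX format, file_size sits at byte offset 0x20 of the header, so the dumper reads it from there and copies that many bytes from baseAddr, which is exactly what misses any data the packer relocated.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Read the file_size field from a DEX header (byte offset 0x20 in the format).
uint32_t dexHeaderFileSize(const uint8_t* baseAddr) {
    uint32_t size;
    std::memcpy(&size, baseAddr + 0x20, sizeof(size));
    return size;
}

// The naive dump: copy [baseAddr, baseAddr + file_size) as one block.
// Data the packer moved outside this range is silently lost.
std::vector<uint8_t> naiveDump(const uint8_t* baseAddr) {
    uint32_t size = dexHeaderFileSize(baseAddr);
    return std::vector<uint8_t>(baseAddr, baseAddr + size);
}
```

Note that the dumper also breaks if the packer has zeroed or shrunk file_size in the header, another trick discussed later in the article.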

We aim to recover the original DEX file in a uniform manner and don't want to perform shell-specific repairs afterwards, because that would lead us into the same dilemma as statically reversing each consolidation algorithm. Therefore we built a better implementation on top of the simple one above, which we call DEX file reassembly. The idea is simple: obtain all the Dalvik data structures the interpreter needs while executing the application (these structures are all real, since they are what actually gets interpreted), then reassemble them into a new DEX file. As the figure above shows, even if the memory is not contiguous, we don't have to worry about what was done to the originally mapped memory: we directly fetch each non-contiguous piece of data and recombine the pieces into a new DEX file according to the format rules. The first step is to obtain each Dalvik data structure accurately. To ensure accuracy, we acquire them exactly the way the interpreter does (see the dexGetXXXX methods in DexFile.h), because the same data block in a DEX file can be reached in several ways; character strings, for instance, can be reached through offsets read from the file header or through the stringId table. Normally all of these paths are valid, but a consolidated app may damage some of them. What it cannot do is destroy the data used at run time, because once that data is broken, the application stops working. The specific acquisition paths are shown in the figure:

[Figure: acquisition paths for each Dalvik data structure]

We walk through the pointers and offsets in every array (e.g. pStringIds, pProtoIds, ..., pClassDefs), fetch each item, and collect the contents into large sections (e.g. stringData, typeList, ..., ClassData, Code). After acquisition, several issues need attention during rewriting. First is the arrangement of the captured data: we follow the order of the map item type codes in dalvik/libdex/DexFile.h. This arrangement requires adjusting the offsets inside each item to their new values, for instance stringDataOff, parametersOff, interfacesOff, classDataOff and codeOff. Then, the values in the DexHeader and MapList structures are computed and set by us instead of taking the original values; some fixed values, such as those in the header, we set based on our knowledge of the format. Finally, we must account for differences between the in-memory representation and the DEX file format; for example, some data items are ULEB128-encoded in the file but plain ints in memory. Beyond that, pay attention to 4-byte alignment and to the *_idx_diff fields of the encoded_field and encoded_method formats.

https://source.android.com/devices/tech/dalvik/dex-format.html
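For example, the ULEB128 encoding mentioned above (as defined in the DEX format documentation linked) stores an unsigned int in 7-bit groups, low bits first, with the high bit of each byte flagging a continuation. A minimal encoder/decoder pair:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// ULEB128 as used by the DEX format: little-endian base-128 groups,
// the top bit of each byte set when more bytes follow.
std::vector<uint8_t> encodeUleb128(uint32_t value) {
    std::vector<uint8_t> out;
    do {
        uint8_t byte = value & 0x7f;
        value >>= 7;
        if (value != 0) byte |= 0x80;  // continuation bit
        out.push_back(byte);
    } while (value != 0);
    return out;
}

uint32_t decodeUleb128(const std::vector<uint8_t>& bytes) {
    uint32_t result = 0;
    int shift = 0;
    for (uint8_t byte : bytes) {
        result |= uint32_t(byte & 0x7f) << shift;
        shift += 7;
        if ((byte & 0x80) == 0) break;  // last byte of the value
    }
    return result;
}
```

So the in-memory int 128 becomes the two file bytes 0x80 0x01, while values below 0x80 are stored as a single byte; the reassembler must apply this conversion to every such field it writes out.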

While reassembling the data we ignored some blocks, such as all the data structures related to annotations: their structure is extremely complex, and ignoring them does not affect the application's real behaviour.

0x06 Experiment and Discovery

After modifying the code, we recompile the libdvm module and write the newly generated libdvm.so into the system directory /system/lib/ to overwrite the original library. Our experimental devices are a Galaxy Nexus running Android 4.3 and a Nexus 4 running Android 4.4.2. We then took a simple application, submitted it to every online consolidation service, retrieved the consolidated versions and unpacked them. In our experiments this method recovered the original DEX file from almost every consolidated app. Below are some findings, mainly about the different self-protection tricks the consolidated apps adopt. The results here are all DexDump text, since this format shows the details of some protection measures more clearly; all of them can be recovered directly into DEX files.

[Figures: recovered DEX files whose magic number or header fields had been removed]

The above two cases show that some consolidated apps remove the magic number to hide the DEX file in memory, so that searching for DEX files turns up nothing. Besides, some modify the size field of the header or clear all field offsets in the header. Because our method recalculates the header, the reassembled DEX is not affected.
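Recalculating the header includes its checksum field: per the DEX format, it is an Adler-32 over everything after the magic and the checksum itself, i.e. from byte offset 12 to the end of the file. A minimal sketch with a hand-rolled Adler-32 (in practice zlib's adler32() would do):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Plain Adler-32 (RFC 1950): two running sums modulo 65521.
uint32_t adler32(const uint8_t* data, size_t len) {
    const uint32_t MOD_ADLER = 65521;
    uint32_t a = 1, b = 0;
    for (size_t i = 0; i < len; ++i) {
        a = (a + data[i]) % MOD_ADLER;
        b = (b + a) % MOD_ADLER;
    }
    return (b << 16) | a;
}

// The DEX checksum skips the 8-byte magic and the 4-byte checksum field.
uint32_t dexChecksum(const uint8_t* dex, size_t fileSize) {
    return adler32(dex + 12, fileSize - 12);
}
```

The signature field (SHA-1 over the bytes after the signature) is recomputed the same way; a reassembler that sets both never depends on whatever mangled values the packer left behind.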

[Figures: DexDump output for further protected samples]

Other than that, some consolidated apps insert additional classes to break normal reversing tools. For example, this class manages to make dex2jar fail:

[Figure: the inserted class that makes dex2jar fail]

Some packers change codeOff to a negative value, so that the code does not lie within the memory range of the mapped file. Our method directly fetches the code and rewrites it to its normal position.

[Figure: recovered code of a method whose codeOff was negative]
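The trick works because a code item is resolved as baseAddr plus offset; with the offset reinterpreted as a signed value, the code lives before the mapped file yet is still reachable at run time. A sketch of the resolution step, with the buffers fabricated purely for illustration:

```cpp
#include <cassert>
#include <cstdint>

// Resolve a code item as baseAddr + codeOff, honoring a codeOff that is
// negative when reinterpreted as signed: the code then sits *before* the
// mapped DEX file, outside any naive dump of [baseAddr, baseAddr + size).
const uint8_t* resolveCodeItem(const uint8_t* baseAddr, uint32_t codeOff) {
    return baseAddr + (int32_t)codeOff;
}
```

Since the reassembler fetches the code item through this resolved pointer and then writes it at a fresh positive offset in the new file, the negative value simply disappears from the output.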

Other packers rewrite methods, moving the real code into a new method that is decrypted before execution and wiped afterwards. For cases like that, since our code is instrumented at the point where each method is invoked, all we need to do is move the unpacking point to where the method is executed, and unpacking recovers the code.

[Figure: recovered code of a method decrypted only during execution]

Beyond the above cases, we found that some consolidated apps hook the write function in the process space: if the content being written looks like specific data (e.g. a DEX header), or if the write reads from the memory region of the mapped DEX file, the write operation is made to fail. Some consolidated apps split the original DEX file into several DEX files, or set specific data items such as debug_info_off to bogus values that are dynamically changed back to the correct values at run time. Some packers additionally obfuscate the code of the original application.
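The content check in such a write hook can be as simple as matching the DEX magic at the start of the buffer. A sketch of just the detection predicate (the hooking mechanism itself, e.g. PLT or inline hooking, is out of scope here); `dex\n035\0` is the magic of the classic DEX version:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Return true if the buffer being written starts with a DEX header magic.
// A protective write() hook would fail the call when this matches.
bool writeLooksLikeDex(const uint8_t* buf, size_t len) {
    static const uint8_t kDexMagic[8] =
        {0x64, 0x65, 0x78, 0x0a, 0x30, 0x33, 0x35, 0x00};  // "dex\n035\0"
    return len >= sizeof(kDexMagic) &&
           std::memcmp(buf, kDexMagic, sizeof(kDexMagic)) == 0;
}
```

A dumper can evade such a check trivially, for instance by writing the header last or XOR-masking the output, which is part of why these checks remain a cat-and-mouse game.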

(Note: the above cases are not all from the latest versions; current consolidation products are not guaranteed to behave the same.)

0x07 Discussion and Thought

First of all, our method still has limitations. For one, as explained under Research Object, we only handle the encryption protection of DEX files and do no de-obfuscation work. Second, our method is based on dynamic analysis and inherits its limitations; for example, if a piece of encrypted code is decrypted only when execution reaches it, but the method containing it is never triggered, our method cannot decrypt its code. Finally, although the approach is hard for consolidated apps to detect, any tool implementing it will inevitably have fingerprints of its own, which a consolidated app could leverage to fight back.

Finally, I want to discuss better consolidation ideas for the Android platform. Breaking consolidation on Android is comparatively easy, and harder, safer consolidation solutions do exist; however, on a smartphone platform you inevitably have to weigh the performance and compatibility of the solution. Taking these factors together, I think the trends and practice of consolidation protection mainly focus on the following.

One is that obfuscation and packing on Android can be combined. From an attacker's perspective, a powerful obfuscation is more effective than packing the protection logic, but designing a good obfuscation scheme is very difficult. Currently, domestic consolidation products do not modify or obfuscate the original code, for fear of the compatibility problems that come with changing it; I think this is a point for development. Some excellent foreign tools do work on deep obfuscation: DexProtector, for example, offers both packing and obfuscation, so even if unpacking succeeds, you still face obfuscated code that is hard to understand.

Besides, I think partial consolidation is stronger than whole-app consolidation in terms of security. As in an earlier case, a method that decrypts itself only while running, and is encrypted or removed whenever it is not, effectively turns the low coverage rate of dynamic execution into a defence.

Third, for better consolidation, the process should be performed during development rather than, as now, after it. Some of today's consolidation SDKs are a good practice in this direction: during development, sensitive operations go through the interface of a security library. Whether in performance or in effect, this can dramatically improve consolidation as a whole. A developer who is familiar with the business knows exactly which parts of the code need protection, and in fact only a small portion of an application's logic does. Narrowing the scope of consolidation improves performance, and a single security library can be given targeted protection for better effect; compared with consolidating the entire APK, it also simplifies compatibility testing.

Another idea is to use native code, especially for critical program logic. Native code is harder to reverse than Java, let alone after obfuscation or packing, and it can actually improve performance, which makes it a win-win choice. This extends to how to deeply protect the native code in Android applications: if sensitive operations are protected by deeply obfuscated native code, the attack cost will surely rise.

At last, I think the trend of consolidation protection is to rely less on tricks. It is not meaningful, for instance, to build protection on bugs in static analysis tools or in the system's APK parsing; consolidation should focus more on the structure of the whole scheme rather than on little tricks.
