Design of x86 Emulator for Generic Unpacking

Design of x86 Emulator for Generic Unpacking. Chandra Prakash (chandrap@sunbelt-software.com). The problem: a large number of detections are still based on a static signature, e.g., MD5 or CRC32. Malware has evolved to evade signature-based detection through the use of packers.


Presentation Transcript


  1. Design of x86 Emulator for Generic Unpacking Chandra Prakash (chandrap@sunbelt-software.com)

  2. The problem • A large number of detections are still based on a static signature, e.g., MD5 or CRC32 • Malware has evolved to evade signature-based detection through the use of packers

  3. The problem, contd… • It is possible to write a custom unpacking routine for each packer • Cryptanalysis or X-Ray can also be used • But the number of packers, and of variations within each packer type, is too large, e.g., the current version range for UPX is 1.x–3.x and for FSG is 1.x–2.x • Moreover, packing can be applied in several recursive layers

  4. A Solution - Emulation • Due to the nature of the problem, a general-purpose solution is desirable • Emulation provides a “fairly” general-purpose solution, which gives rise to the term Generic Unpacking

  5. What is Emulation? • Wikipedia definition is pretty clear • “An emulator duplicates (provides an emulation of) the functions of one system using a different system, so that the second system behaves like (and appears to be) the first system. This focus on exact reproduction of external behavior is in contrast to simulation, which can concern an abstract model of the system being simulated, often considering internal state.”

  6. Emulation – where else is it used? • Supporting cross-platform applications • Controlled and secure execution of untrusted applications • And, of course, dynamic behavioral analysis of malware and detection of packed malware via generic unpacking • Etc.

  7. Emulation – to what degree? • Full emulation – Emulate everything; Application as well as the Operating System • E.g., VMWare and VirtualPC • Application Only - Emulate application level instruction set and System Call interface • E.g., Wow64, Win32 emulation on 64-bit Windows • Our emulator for Generic Unpacking is Application Only

  8. Emulator Components • A software implementation of the subset of the hardware, operating system and application environment needed for running an application. • The hardware components include: the CPU, registers, interrupt vector table. • The operating system components include: PE loader, virtual memory manager, structured exception handling (SEH). • The application environment includes: input parameter and environment variable support, heap, stack, process environment block (PEB), thread information block (TIB), and function hooks for spoofing execution references into system DLLs

  9. Emulator Components

  10. Emu Components - PE Loader • The very first step in a target’s emulation • Create a memory-mapped image as per the Windows PE specification. • Calculate the virtual mapped size • Allocate a contiguous buffer based on the virtual mapped size, then copy the PE headers and section data into aligned sections • Fix imports of the primary module • Fix relocations
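
The size calculation and section copy above can be sketched as follows. This is a minimal illustration, not the presenter's loader: the `Section` record, the `map_image` helper and the default 0x1000 section alignment are assumptions made for the example, and import and relocation fixing are left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Section:
    """Hypothetical, pre-parsed view of one PE section header."""
    virtual_address: int  # RVA at which the section is mapped
    virtual_size: int     # size of the section in memory (assumed > 0)
    raw_data: bytes       # bytes stored for the section in the file


def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def map_image(headers: bytes, sections: List[Section],
              section_alignment: int = 0x1000) -> bytearray:
    """Build a contiguous, zero-filled buffer laid out like the mapped image."""
    end = len(headers)
    for s in sections:
        end = max(end, s.virtual_address + s.virtual_size)
    image = bytearray(align_up(end, section_alignment))
    image[:len(headers)] = headers
    for s in sections:
        # Raw data beyond the virtual size is not mapped.
        data = s.raw_data[:s.virtual_size]
        image[s.virtual_address:s.virtual_address + len(data)] = data
    return image
```

A real loader would read these values from the optional header and section table and would also have to tolerate malformed values, as discussed later in the talk.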

  11. Emu Components - Registers • There are eight 32-bit general purpose registers (EAX, EBX, ECX, EDX, EBP, ESP, ESI, EDI) • Six 16-bit segment registers (CS, SS, DS, ES, FS, GS), DR0-DR3, DR6, DR7 hardware debug registers • EFLAGS and EIP registers • Added benefit to also provide support for FPU instructions and extensions to x86 architecture, such as MMX, SSE, SSE2, SSE3[10] and 3DNow! instructions

  12. Emu Components - CPU • Fetch instructions from the virtual memory address space of the target • Decode instruction; find instruction type, get operands • Execute instruction; calculate results and store • Move on to the next instruction as indicated by EIP
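
The fetch, decode, execute cycle can be illustrated with a toy interpreter. This is a sketch under strong assumptions: the `ToyCPU` class is hypothetical, it models only EAX and EIP, and it understands just four one-operand-free encodings (NOP `90`, INC EAX `40`, MOV EAX,imm32 `B8`, RET `C3`), treating RET as the end of the run.

```python
class ToyCPU:
    """Hypothetical four-instruction x86 subset, for illustration only."""

    def __init__(self, code: bytes, base: int = 0x401000):
        self.code = code
        self.base = base      # virtual address of code[0]
        self.eip = base
        self.eax = 0
        self.running = True
        self.executed = 0

    def fetch(self, address: int, size: int) -> bytes:
        """Read size bytes of target memory at a virtual address."""
        offset = address - self.base
        chunk = self.code[offset:offset + size]
        if offset < 0 or len(chunk) != size:
            raise MemoryError(f"fetch outside mapped code at {address:#x}")
        return chunk

    def step(self) -> None:
        opcode = self.fetch(self.eip, 1)[0]           # fetch
        if opcode == 0x90:                            # NOP
            self.eip += 1
        elif opcode == 0x40:                          # INC EAX
            self.eax = (self.eax + 1) & 0xFFFFFFFF
            self.eip += 1
        elif opcode == 0xB8:                          # MOV EAX, imm32
            self.eax = int.from_bytes(self.fetch(self.eip + 1, 4), "little")
            self.eip += 5
        elif opcode == 0xC3:                          # RET: stop the toy run
            self.running = False
        else:
            raise NotImplementedError(f"opcode {opcode:#04x}")
        self.executed += 1

    def run(self) -> None:
        while self.running:
            self.step()
```

EFLAGS updates, the remaining registers and memory operands are deliberately omitted; a full decoder also has to handle prefixes, ModR/M and SIB bytes.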

  13. Emu Components – Interrupt Handling • INT N generates interrupt, with N range as 0-255 • Execution of INT N results in a software exception in the application • From user mode only a subset of these are allowed, all others result in access violation exception

  14. Emu Components – Interrupt Handling…contd

  15. Emu Components – Virtual Memory Manager • Manages the virtual memory used by the target at the very lowest level • Maintains memory regions • Each region consists of a contiguous sequence of pages, e.g., the PE image region • Each page has its own allocation and protection characteristics • Allocation types include reserved, committed and free • Protection types include read, write, execute etc. • An access violation is generated when a memory reference is not compatible with the allocation and protection type of the region
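
A minimal sketch of per-page protection checks follows. The `VirtualMemory` class, the flag values and the `AccessViolation` exception are assumptions for illustration; only committed pages are modelled, so a reserved or free page behaves like an absent one.

```python
PAGE_SIZE = 0x1000
READ, WRITE, EXECUTE = 1, 2, 4   # illustrative flags, not the Win32 PAGE_* values


class AccessViolation(Exception):
    """Raised when an access is incompatible with the page state."""


class VirtualMemory:
    def __init__(self):
        self.pages = {}   # page number -> [protection flags, page bytes]

    def commit(self, address: int, size: int, protection: int) -> None:
        """Commit every page touched by [address, address + size)."""
        first = address // PAGE_SIZE
        last = (address + size - 1) // PAGE_SIZE
        for page in range(first, last + 1):
            self.pages[page] = [protection, bytearray(PAGE_SIZE)]

    def _page(self, address: int, access: int) -> bytearray:
        entry = self.pages.get(address // PAGE_SIZE)
        if entry is None or not (entry[0] & access):
            raise AccessViolation(f"access {access} denied at {address:#x}")
        return entry[1]

    def read_byte(self, address: int) -> int:
        return self._page(address, READ)[address % PAGE_SIZE]

    def write_byte(self, address: int, value: int) -> None:
        self._page(address, WRITE)[address % PAGE_SIZE] = value & 0xFF
```

In an emulator the raised exception would be converted into an emulated access violation and handed to the SEH machinery rather than propagated to the host.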

  16. SEH handling • SEH is most commonly used to obfuscate the execution path through deliberate generation and handling of software exceptions. • Typically used instructions are: • Single step (INT1) and breakpoint (INT3) instructions • DIV/IDIV and INTO instructions, which generate arithmetic divide and integer overflow exceptions.

  17. Stack • The stack is a contiguous memory region that serves among other things as a memory work area for parameters passed in function calls and SEH chain. • There exists one stack for each thread. • It is implemented in an inverted manner so that it grows in the direction of decreasing memory address. • The stack parameters, e.g., base, limit, address of top level exception handler frame, should be appropriately set in TIB
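
A downward-growing stack can be sketched as below. The `Stack` class and its default addresses are hypothetical; the point is only that a push decrements ESP and that the base (highest address) and limit (lowest address) are the values an emulator would mirror into the TIB.

```python
import struct


class Stack:
    """Hypothetical 32-bit stack that grows toward lower addresses."""

    def __init__(self, base: int = 0x130000, size: int = 0x10000):
        self.base = base            # highest address, mirrored in TIB StackBase
        self.limit = base - size    # lowest address, mirrored in TIB StackLimit
        self.memory = bytearray(size)
        self.esp = base             # an empty stack starts at the base

    def push(self, value: int) -> None:
        if self.esp - 4 < self.limit:
            raise OverflowError("stack overflow")
        self.esp -= 4               # grow toward lower addresses
        struct.pack_into("<I", self.memory, self.esp - self.limit,
                         value & 0xFFFFFFFF)

    def pop(self) -> int:
        if self.esp + 4 > self.base:
            raise IndexError("stack underflow")
        (value,) = struct.unpack_from("<I", self.memory, self.esp - self.limit)
        self.esp += 4
        return value
```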

  18. Heap • The heap enables efficient memory allocations of much finer granularity than the page-granular allocations of the VirtualAlloc call. • To support Win32 heap-related calls made by the target, e.g., HeapAlloc, HeapFree, etc., a simulation of them needs to be provided. • The heap is implemented as a wrapper around page-granular memory allocation calls.

  19. Thread Information Block (TIB) • For each thread there is a TIB structure, whose address can be read from FS:[18h] in that thread. +0x000 ExceptionList : Ptr32 _EXCEPTION_REGISTRATION_RECORD +0x004 StackBase : Ptr32 Void +0x008 StackLimit : Ptr32 Void +0x00c SubSystemTib : Ptr32 Void +0x010 FiberData : Ptr32 Void +0x010 Version : Uint4B +0x014 ArbitraryUserPointer : Ptr32 Void +0x018 Self : Ptr32 _NT_TIB • The first field, ExceptionList, contains the address of the top level exception handler frame, represented by an EXCEPTION_REGISTRATION_RECORD structure. • StackBase and StackLimit contain the upper bound (highest address) and lower bound (lowest address) of the thread’s stack, respectively. • The address of the PEB can be obtained from FS:[30h]
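
The field offsets listed above can be turned into a small builder that fills in the bytes an emulator would place at the FS segment base. This is a sketch: `build_tib` and the sample addresses are assumptions, and only the fields named on the slide plus the PEB pointer at offset 30h are populated.

```python
import struct

END_OF_SEH_CHAIN = 0xFFFFFFFF   # marker stored when no handler is registered


def build_tib(tib_address: int, stack_base: int, stack_limit: int,
              peb_address: int,
              exception_list: int = END_OF_SEH_CHAIN) -> bytes:
    """Return the first 0x34 bytes of a 32-bit TIB/TEB as laid out above."""
    tib = bytearray(0x34)
    struct.pack_into("<I", tib, 0x00, exception_list)  # ExceptionList
    struct.pack_into("<I", tib, 0x04, stack_base)      # StackBase (highest address)
    struct.pack_into("<I", tib, 0x08, stack_limit)     # StackLimit (lowest address)
    struct.pack_into("<I", tib, 0x18, tib_address)     # Self, read via FS:[18h]
    struct.pack_into("<I", tib, 0x30, peb_address)     # PEB pointer, read via FS:[30h]
    return bytes(tib)


# Sample values only; the PEB address matches the one seen in Example 3.
tib_bytes = build_tib(tib_address=0x7FFDE000, stack_base=0x130000,
                      stack_limit=0x12E000, peb_address=0x7FFDF000)
```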

  20. Process Environment Block (PEB) • For each user mode process there is one PEB • Some of the important fields accessed by malware are: BeingDebugged, ImageBaseAddress, and the InLoadOrderModuleList, InMemoryOrderModuleList and InInitializationOrderModuleList fields of PEB_LDR_DATA • The IsDebuggerPresent Win32 API simply returns the value of the BeingDebugged field of the PEB. Malware uses this to detect a debugger’s presence as one of its anti-debugging tricks +0x002 BeingDebugged : UChar //In PEB • The sorted list of modules is maintained in three different LIST_ENTRY type data structures in PEB_LDR_DATA +0x00c InLoadOrderModuleList : _LIST_ENTRY +0x014 InMemoryOrderModuleList : _LIST_ENTRY +0x01c InInitializationOrderModuleList : _LIST_ENTRY

  21. Function hooks • In an application-only emulator, any system call made by malware into a dependent system module like kernel32.dll is intercepted and a corresponding spoofed implementation is provided • Some of the functions include: LoadLibraryA/W, GetProcAddress, GetModuleHandleA/W, VirtualAlloc, VirtualFree, HeapAlloc, HeapFree, GetVersionExA/W etc. • A default un-emulated function hook should also be provided, which gets called when an un-implemented import function is encountered
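
One way to organise such hooks is a lookup table with a default fallback. The sketch below is illustrative only: `HookTable`, `UnemulatedCall`, the spoofed handler and the fake module base are hypothetical names and values, and real hooks would read their arguments from the emulated stack instead of taking Python arguments.

```python
class UnemulatedCall(Exception):
    """Raised by the default hook; an emulator can treat it as a stop condition."""


class HookTable:
    def __init__(self):
        self.hooks = {}   # (dll name in lower case, function name) -> handler

    def register(self, dll: str, function: str, handler) -> None:
        self.hooks[(dll.lower(), function)] = handler

    def dispatch(self, dll: str, function: str, *args):
        handler = self.hooks.get((dll.lower(), function))
        if handler is None:
            return self.default_hook(dll, function)
        return handler(*args)

    def default_hook(self, dll: str, function: str):
        raise UnemulatedCall(f"{dll}!{function}")


FAKE_KERNEL32_BASE = 0x77E80000   # hypothetical load address in the emulator
MAIN_IMAGE_BASE = 0x00400000      # hypothetical base of the emulated target


def spoofed_get_module_handle_a(name):
    """Spoofed GetModuleHandleA: NULL returns the main image, unknown names 0."""
    if name is None:
        return MAIN_IMAGE_BASE
    return FAKE_KERNEL32_BASE if name.lower() == "kernel32.dll" else 0


hooks = HookTable()
hooks.register("kernel32.dll", "GetModuleHandleA", spoofed_get_module_handle_a)
```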

  22. Stop Conditions • Ideally the emulator should be stopped at the OEP (original entry point) • Finding the exact OEP in a generic way is non-trivial • Typical conditions, other than explicit termination initiated by the target, are: • Encountering an un-emulated system call in a dependent module. • An unhandled exception for which no SEH handler was found. These exceptions include invalid memory read, write or execute, divide by zero, and integer overflow.

  23. Stop Conditions…Contd • Encountering an un-emulated or illegal instruction. • A configured timeout. • Maximum number of instructions being reached. • Attempt to load a dll that could not be located. • Too many dlls being loaded by the target in explicit load module.
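
The stop conditions from these two slides can be gathered into a single driver loop. This is a sketch, assuming a hypothetical `step` callback that executes one instruction and returns either `None` (keep going) or a `StopReason`; the default limits are arbitrary.

```python
import time
from enum import Enum, auto


class StopReason(Enum):
    TARGET_EXIT = auto()
    UNEMULATED_CALL = auto()
    UNHANDLED_EXCEPTION = auto()
    ILLEGAL_INSTRUCTION = auto()
    TIMEOUT = auto()
    INSTRUCTION_LIMIT = auto()
    DLL_NOT_FOUND = auto()
    TOO_MANY_DLLS = auto()


def run(step, max_instructions: int = 1_000_000, timeout_seconds: float = 5.0):
    """Call step() until it reports a stop reason or a budget is exhausted.

    Returns (reason, number of instructions executed).
    """
    deadline = time.monotonic() + timeout_seconds
    for executed in range(max_instructions):
        if time.monotonic() > deadline:
            return StopReason.TIMEOUT, executed
        reason = step()
        if reason is not None:
            return reason, executed + 1
    return StopReason.INSTRUCTION_LIMIT, max_instructions
```

Checking the clock on every instruction is the simplest form; a production loop would more likely check it every few thousand instructions.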

  24. Emulator fine tuning due to malware unique characteristics • Practical constraints due to performance optimizations and undocumented features allow only a limited implementation of the emulator. • Once the core emulator system is ready, developing a robust emulator is an iterative process, driven by minor fine tuning for the unique characteristics of supported packers and the symptoms exhibited by the malware test-bed. • The examples that follow describe some of the cases encountered with malware samples that led to improvements in our emulator. • The cases described in these examples are by no means complete!

  25. Example 1 – Setting Initial Stack 0041C25A CALL 0041C25F 0041C25F PUSH EBP 0041C260 MOV EBX,DWORD PTR SS:[ESP+8] 0041C264 MOV EBP,DWORD PTR SS:[ESP+4] 0041C268 SUB DWORD PTR SS:[ESP+4],1A4AF • At address 0041C260, the MOV instruction references an address ([ESP+8]) at the top of initial stack. • This address is the return address after the CALL instruction in kernel32.dll that “calls” the malware entry point. • The return address actually ends up calling ExitProcess.

  26. Example 2 – Module load address alignment 004A1584 MOV EBX,DWORD PTR SS:[ESP+24] ; EBX=77E8141A 004A1588 AND EBX,FFE00000 ; EBX=77E00000 . . . 004A16C4 ADD EBX,10000 004A16CA JE SHORT 004A16F7 004A16CC CMP WORD PTR DS:[EBX],5A4D 004A16D1 JNZ SHORT 004A16C4 • At 004A16C4 the EBX value is incremented by the system allocation granularity (10000h, i.e. 64 KB). • At 004A16CC the WORD at the address in EBX is compared with 5A4D (ASCII ‘MZ’), which is the start marker of a PE image. • If the marker is not found at the address in EBX, the JNZ at 004A16D1 loops back to 004A16C4 to try the next candidate address.
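
The scan performed by this listing can be restated in a few lines. This is a sketch of the malware's loop, not emulator code: `find_image_base` and the `read_word` callback are hypothetical names, and the wraparound check mirrors the JE at 004A16CA.

```python
def find_image_base(read_word, address_inside_module: int,
                    mask: int = 0xFFE00000, step: int = 0x10000):
    """Walk upward in 64 KB steps until a word equal to 'MZ' (5A4Dh) is found.

    read_word(address) must return the 16-bit little-endian value at address.
    Returns the matching address, or None if the 32-bit counter wraps to zero.
    """
    candidate = address_inside_module & mask           # AND EBX,FFE00000
    while True:
        candidate = (candidate + step) & 0xFFFFFFFF    # ADD EBX,10000
        if candidate == 0:                             # JE: wrapped around
            return None
        if read_word(candidate) == 0x5A4D:             # CMP WORD PTR [EBX],5A4D
            return candidate
```

For the emulator this means that spoofed system modules have to be mapped at addresses aligned to the allocation granularity, with a valid MZ header at the base, or a scan like this never terminates successfully.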

  27. Example 3 – Startup Register Values 31428200 PUSH ED01C390 31428205 MOV EAX,ESP 31428207 CALL EAX 0012FFC0 NOP 0012FFC1 RETN 31428209 XCHG EAX,EBX ; EAX=7FFDF000, EBX=0012FFC0 3142820A POP EBX • At 31428209 EBX is referenced; its value, which the code above has not modified since startup, is equal to the PEB address of the program

  28. Example 4 – Handling DLL emulation • For the correct emulation of a dll, before the entry point function DllMain gets called, its input parameters must be set in the stack as in Windows. BOOL WINAPI DllMain( HINSTANCE hinstDLL, DWORD fdwReason, LPVOID lpvReserved );

  29. Example 5 – Setting register values before calling SEH handler 0048C093 CMP AL,4 0048C095 JNZ SHORT 0048C09B 0048C097 NOP 0048C098 NOP 0048C099 RETN • The SEH handler’s second instruction, at 0048C095, is a conditional jump whose outcome depends on the value of AL, which is compared at 0048C093. • In real Windows, EAX is set to zero just before the SEH handler gets control. • Therefore, before the SEH handler gets control, EAX and the other registers should be set up as they are set in Windows.

  30. Example 6 – Setting top level exception handler in SEH 004141EF SUB EDX,EDX 004141F1 MOV EAX,DWORD PTR FS:[EDX] 004141F4 MOV ESP,DWORD PTR DS:[EAX] 004141F6 POP DWORD PTR FS:[EDX] 004141F9 POP EAX 004141FA POP EBP 004141FB RETN • Windows also registers another handler on top before the application handler gets control • The malware had already placed a return address on the stack that gets executed after the RETN at 004141FB • At 004141F4 it skips over the top level SEH handler and positions ESP at the SEH frame for this handler • At 004141F6 the two top SEH handlers are torn down, and after 004141FB execution resumes at the location specified by ESP, which was last updated at 004141FA

  31. Example 7 – Check for BeingDebugged field in PEB 3142821B MOV EAX, DWORD PTR FS:[18] 31428220 MOV EAX, DWORD PTR DS:[EAX+30] 31428223 MOVZX EAX, BYTE PTR DS:[EAX+2] 31428227 CMP EAX, 0 3142822A JNZ SHORT 3142826E 3142822C CALL 31428231 31428231 POP EBP • At 3142821B address of TIB is obtained which is used to get address of PEB at 31428220 • At 31428223 BeingDebugged field of PEB is checked to evaluate the condition of the branch instruction at 3142822A
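
The pointer chain followed by this listing can be replayed against a model of emulated memory. This is a sketch: `FlatMemory` is a hypothetical byte-addressed store, and the addresses used for the TIB and PEB are sample values.

```python
class FlatMemory:
    """Hypothetical sparse byte store standing in for emulated memory."""

    def __init__(self):
        self.cells = {}

    def write(self, address: int, data: bytes) -> None:
        for i, b in enumerate(data):
            self.cells[address + i] = b

    def read(self, address: int, size: int) -> bytes:
        return bytes(self.cells.get(address + i, 0) for i in range(size))

    def read_dword(self, address: int) -> int:
        return int.from_bytes(self.read(address, 4), "little")


def being_debugged(mem: FlatMemory, fs_base: int) -> int:
    tib = mem.read_dword(fs_base + 0x18)   # MOV EAX, FS:[18]
    peb = mem.read_dword(tib + 0x30)       # MOV EAX, [EAX+30]
    return mem.read(peb + 2, 1)[0]         # MOVZX EAX, BYTE PTR [EAX+2]


TIB, PEB = 0x7FFDE000, 0x7FFDF000          # sample addresses
mem = FlatMemory()
mem.write(TIB + 0x18, TIB.to_bytes(4, "little"))   # TIB.Self
mem.write(TIB + 0x30, PEB.to_bytes(4, "little"))   # pointer to the PEB
mem.write(PEB + 2, b"\x00")                        # BeingDebugged = 0
```

With BeingDebugged left at zero the JNZ at 3142822A is not taken, which is the path an unpacker normally wants the sample to follow.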

  32. Example 8 – Check for loader lists in PEB 0044D0A5 MOV EAX,DWORD PTR FS:[30] 0044D0AB TEST EAX,EAX 0044D0AD JS SHORT 0044D0BB 0044D0AF MOV EAX,DWORD PTR DS:[EAX+C] 0044D0B2 MOV ESI,DWORD PTR DS:[EAX+1C] 0044D0B5 LODS DWORD PTR DS:[ESI] • At 0044D0A5 the PEB is referenced. • At 0044D0AF, 0044D0B2 and 0044D0B5, PEB_LDR_DATA, InInitializationOrderModuleList and InInitializationOrderModuleList.Flink respectively are referenced • The malware happens to be referencing the kernel32.dll load information in its dependent module list sorted on initialization order

  33. Example 9 – Reference to Thread Local Storage 004033FA MOV EAX,DWORD PTR DS:[4503D4] 00403400 TEST CL,CL 00403402 JNZ SHORT 0040341A 00403404 MOV EDX,DWORD PTR FS:[2C] 0040340B MOV EAX,DWORD PTR DS:[EDX+EAX*4] 0040340E RETN • At 00403404 the beginning of thread local storage pointer array value in FS:[2Ch] is copied over in EDX • The next instruction at 0040340B returns in EAX value of a TLS pointer as indexed by previous value in EAX

  34. Example 10 – Normalizing malformed PEs in loader • All Win32 PE executables are expected to follow the PE format specification in the strictest sense • Yet many malware samples do not conform to these formal guidelines and are still allowed to run by the Windows loader. • In general, the emulator should relax its constraints on these kinds of aberrations and load a malware sample as long as the Windows loader accepts it

  35. Example 10 – Normalizing malformed PEs in loader…contd

  36. Emulator Performance Optimizations • In plain emulation, instructions are executed in software • Plain emulation is hundreds of times slower than native execution • It is not well suited for malware that requires emulation of hundreds of millions of instructions

  37. Emulator Performance Optimizations – Dynamic Binary Translation (DBT) • Frequently executed instructions, e.g., decryption loop, are translated into native instructions • Repeat execution of same set of instructions above a certain threshold causes their translated counterpart to be executed • DBT is only about ten times slower than native execution

  38. Some more DBT details • Code is partitioned into a sequence of Basic Blocks (BB) • Each BB is self contained and does not contain any branch instructions • For each BB, a corresponding translation into native instructions is obtained • There is a performance hit at the time of translation, but it is incurred only once
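
The threshold-and-cache idea behind DBT can be sketched without emitting machine code. Everything here is an assumption for illustration: basic blocks are lists of hypothetical `(op, register, immediate)` tuples, and "translation" generates a single Python function as a stand-in for native code generation.

```python
def interpret(block, regs) -> None:
    """Slow path: decode and execute each operation on every run."""
    for op, reg, imm in block:
        if op == "add":
            regs[reg] = (regs[reg] + imm) & 0xFFFFFFFF
        elif op == "xor":
            regs[reg] ^= imm
        else:
            raise NotImplementedError(op)


def translate(block):
    """One-time cost: turn the whole block into one generated function."""
    lines = ["def translated(regs):"]
    for op, reg, imm in block:
        if op == "add":
            lines.append(f"    regs[{reg!r}] = (regs[{reg!r}] + {imm}) & 0xFFFFFFFF")
        elif op == "xor":
            lines.append(f"    regs[{reg!r}] ^= {imm}")
        else:
            raise NotImplementedError(op)
    lines.append("    return None")
    namespace = {}
    exec("\n".join(lines), namespace)
    return namespace["translated"]


class BlockCache:
    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.counts = {}       # block address -> times interpreted
        self.translated = {}   # block address -> generated function

    def execute(self, address: int, block, regs) -> None:
        fast = self.translated.get(address)
        if fast is not None:
            fast(regs)         # hot block: run the translated version
            return
        self.counts[address] = self.counts.get(address, 0) + 1
        if self.counts[address] >= self.threshold:
            self.translated[address] = translate(block)
        interpret(block, regs)
```

A real translator must also invalidate cached blocks when the target overwrites its own code, which packers do routinely.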

  39. Page fault handler based unpacking • All memory writes from a packed program are monitored from the kernel until an execute is issued in one of the modified, monitored memory regions • Page fault handler based unpacking yields the maximum speed improvement, as the malware is allowed to run natively on the host machine and in that sense does not require any kind of emulation. • But its implementation is discouraged, as it requires unconventional modification of the page fault interrupt handler in the kernel and may not even work on 64-bit Vista because of PatchGuard protection.
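
The write-then-execute bookkeeping at the heart of this technique can be sketched independently of how the events are captured. The `WriteThenExecuteMonitor` class is hypothetical and says nothing about the kernel side; it only shows the rule that execution landing in a previously written page signals unpacked code.

```python
PAGE_SIZE = 0x1000


class WriteThenExecuteMonitor:
    """Tracks written pages and flags execution inside any of them."""

    def __init__(self):
        self.dirty_pages = set()

    def on_write(self, address: int, size: int = 1) -> None:
        first = address // PAGE_SIZE
        last = (address + size - 1) // PAGE_SIZE
        self.dirty_pages.update(range(first, last + 1))

    def on_execute(self, address: int) -> bool:
        """Return True when control reaches a page the program has modified."""
        return address // PAGE_SIZE in self.dirty_pages
```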

  40. Thank You
