
Eureka: A Framework for Enabling Static Analysis on Malware


Presentation Transcript


  1. Eureka: A Framework for Enabling Static Analysis on Malware MARS.MTC.SRI.COM

  2. Motivation • Malware landscape is diverse and constantly evolving • Large botnets • Diverse propagation vectors, exploits, C&C • Capabilities: backdoors, keylogging, rootkits, logic bombs, time-bombs • Malware is not about script kiddies anymore, it’s real business • Manual reverse-engineering is close to impossible • Need automated techniques to extract system logic, interactions and side effects

  3. Dynamic vs. Static Malware Analysis • Dynamic Analysis • Techniques that profile actions of the binary at runtime • Better track record to date • CWSandbox, TTAnalyze • Only provides a partial “effects-oriented profile” of malware potential • Static Analysis • Can provide complementary insights • Potential for more comprehensive assessment

  4. Malware Evasions and Obfuscations • To defeat signature-based detection schemes • Polymorphism, metamorphism: started appearing in viruses of the ’90s, primarily to defeat AV tools • To defeat dynamic malware analysis • Anti-debugging, anti-tracing, anti-memory dumping • VMM detection, emulator detection • To defeat static malware analysis • Encryption (packing) • API and control-flow obfuscations • Anti-disassembly

  5. System Goals • Desiderata for a Static Analysis Framework • Unpack over 90% of contemporary malware • Handle most if not all packers • Deobfuscate API references • Automate identification of capabilities • Provide feedback on unpacking success • Simplify and annotate call graphs to illustrate interactions between key logical blocks

  6. The Eureka Framework • Novel unpacking technique based on coarse-grained execution tracing • Heuristic-based and statistics-based unpacking • Implements several techniques to handle obfuscated API references • Multiple metrics to evaluate unpacking success • Annotated call graphs provide a bird’s-eye view of system interaction

  7. The Eureka Workflow [workflow diagram] Packed binary → disassembly (IDA Pro) → packed .ASM → statistics-based evaluator (unpack evaluation). Malware syscalls traced in a VM feed Eureka’s unpacker; heuristic-based offline analysis of the syscall trace picks a favorable execution point for dumping. Unpacked binary → disassembly (IDA Pro) → unpacked .ASM → Eureka’s API resolver (control- and data-flow analysis) → un-obfuscated .ASM → detailed call graph → statistics-based evaluator → annotated call graphs.

  8. Coarse-grained Execution Monitoring • Generalized unpacking principle • Execute the binary until it has sufficiently revealed itself • Dump the process execution image for static analysis • Monitoring execution progress • Eureka employs a Windows driver that hooks the SSDT (System Service Dispatch Table) • Callback invoked on each NTDLL system call • Filtering based on the malware process PID

  9. Related Work • PolyUnpack (Royal et al., ACSAC 2006) • Static model using program analysis • Fine-grained execution tracking detects execution steps outside the model • Renovo (Kang et al., WORM 2007) • Fine-grained execution tracking using QEMU • Dumping trigger: execution of newly written code • OmniUnpack (Martignoni et al., ACSAC 2007) • Coarse-grained monitoring using page-level protection mechanisms

  10. Design Space • Evasions: (1) multiple packing, (2) partial-code-revealing packers, (3) VM detection, (4) emulator detection

  11. Heuristic-based Unpacking • How do you determine when to dump? • Heuristic #1: Dump as late as possible (NtTerminateProcess) • Heuristic #2: Dump when the program generates errors (NtRaiseHardError) • Heuristic #3: Dump when the program forks a child process (NtCreateProcess) • Issues • Weak adversarial model, too simple to evade… • Doesn’t work well for packed non-malware programs
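A minimal sketch of how these dump triggers could be checked offline against the syscall trace (in Python); the one-"<pid> <syscall>"-per-line trace format and the function name are assumptions for illustration, not Eureka’s actual interface:

# Dump-trigger heuristics over a syscall trace (sketch; trace format assumed).
TRIGGER_SYSCALLS = {
    "NtTerminateProcess",   # Heuristic #1: dump as late as possible
    "NtRaiseHardError",     # Heuristic #2: dump when the program errors out
    "NtCreateProcess",      # Heuristic #3: dump when a child process is forked
}

def should_dump(trace_path: str, malware_pid: int) -> bool:
    """Return True once the traced malware process issues a dump-trigger syscall."""
    with open(trace_path) as trace:
        for line in trace:
            parts = line.split()                     # expected: "<pid> <syscall> ..."
            if (len(parts) >= 2 and parts[0].isdigit()
                    and int(parts[0]) == malware_pid
                    and parts[1] in TRIGGER_SYSCALLS):
                return True
    return False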

  12. Statistics-based Unpacking • Observations • Statistical properties of packed executables differ from unpacked executables • As malware executes, the code-to-data ratio increases • Complications • Code and data sections are interleaved in PE executables • Data directories (import tables) look similar to data but are often found in code sections • Properties of data sections vary with packers

  13. Statistics-based Unpacking (2) • Our Approach • Model statistical properties of unpacked code • Volume of unpacked code must strictly increase • Estimating unpacked code • N-gram analysis to look for frequent instructions • We use bi-grams (2-grams) because x86 opcodes are 1 or 2 bytes • Extracted subroutine code from 9 benign executables • FF 15 (call), FF 75 (push), E8 _ _ _ FF (call), E8 _ _ _ 00 (call)
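A sketch of the bi-gram estimate of unpacked code (in Python): count occurrences of the opcode pairs listed above over a memory snapshot. This is plain byte counting, not a real disassembly, and the exact windowing Eureka uses is not specified here.

def count_code_bigrams(image: bytes) -> int:
    """Count bi-grams that are frequent in benign x86 subroutine code."""
    count = 0
    for i in range(max(len(image) - 4, 0)):
        if image[i:i + 2] in (b"\xff\x15", b"\xff\x75"):         # call [addr] / push [ebp+disp]
            count += 1
        elif image[i] == 0xE8 and image[i + 4] in (0x00, 0xFF):  # near call with small +/- displacement
            count += 1
    return count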

  14. Statistics-based Unpacking (3)

  15. Statistics-based Unpacking (4) • Feasibility test • Corpus of executables (pre- and post-unpacking) obtained with the heuristic unpacker • 1090 executables: 125 originally unpacked, 965 unpacked • Simple bi-gram counting was able to distinguish 922 of the 965 unpacked executables (95% success rate)

  16. STOP Algorithm • STOP: Statistical Test for Online unPacking • Online algorithm for determining the dumping trigger • Simple hypothesis test for a change in mean • Null hypothesis: the mean bigram count has not increased • Assumption: bigram counts are normally distributed with prior mean μ0 • If (μ1 – μ0) / σ1 > 1.645, we reject the null hypothesis with a confidence level of 0.95 • The test is repeated to determine the beginning and the end of unpacking
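A sketch of the change-in-mean check exactly as stated on the slide (in Python); whether STOP normalizes by σ1 or by the standard error σ1/√n, and how the sampling windows are chosen, are details assumed away here.

import statistics

def mean_increased(samples: list[int], prior_mean: float, z_critical: float = 1.645) -> bool:
    """One-sided test: reject 'the mean bigram count has not increased' at 0.95 confidence."""
    if len(samples) < 2:
        return False                      # not enough observations to estimate sigma
    mu1 = statistics.mean(samples)
    sigma1 = statistics.stdev(samples)
    if sigma1 == 0:
        return mu1 > prior_mean           # degenerate case: constant samples
    return (mu1 - prior_mean) / sigma1 > z_critical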

  17. API Resolution • User-level malware programs require system calls to perform malicious actions • Use the Win32 API to access user-level libraries • Obfuscations impede malware analysis using IDA Pro or OllyDbg • Packers use non-standard linking and loading of DLLs • Obfuscated API resolution

  18. Standard API Resolution [diagram: the import table entry X names KERNEL32.OpenFile; KERNEL32.DLL is loaded at base B and exports OpenFile at RVA R; the IAT entry at X holds B+R; CALL F reaches the thunk F: JMP [X], and indirect calls use CALL [X] directly to reach OpenFile’s entry point] • API calls • Calls to various user-level DLLs linked by the Windows linker/loader • Legitimate executables have an import table • The import table is used to fill the IAT with virtual addresses at run time

  19. Standard API Resolution Imports in the IAT are identified by IDA by looking at the import table

  20. API Obfuscation by Packers [diagram: the thunk F: JMP [X] and CALL F remain, but the IAT entry X is empty because the import table has been removed] • The import table is removed • The IAT is not filled in by the linker and loader • The unpacker fills in the IAT (or a similar data structure) by itself • Hard to identify the corresponding API call in the executable

  21. Identifying APIs by Address [diagram: KERNEL32.DLL is loaded at 7C800000 and exports OpenFile at RVA R; the IAT entry X holds 7C810332, the entry point of OpenFile reached via CALL [X]] • For each DLL, build a relative- and absolute-address database • The default “image address” is the base address • Calculate the corresponding virtual address for each exported API • Match addresses used in calls against the database
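A sketch of building that per-DLL address database with the third-party pefile library (pefile is an assumption; the slides do not name a parser): each export’s RVA is added to the DLL’s preferred image base, and call targets observed in the dump are then looked up in the resulting map.

import os
import pefile  # third-party PE parser, used here only to read export tables

def export_address_db(dll_paths: list[str]) -> dict[int, str]:
    """Map each exported API's default virtual address to 'DLL!Name'."""
    db = {}
    for path in dll_paths:
        pe = pefile.PE(path)
        if not hasattr(pe, "DIRECTORY_ENTRY_EXPORT"):
            continue                                   # DLL with no export table
        base = pe.OPTIONAL_HEADER.ImageBase            # default "image address"
        dll = os.path.basename(path)
        for exp in pe.DIRECTORY_ENTRY_EXPORT.symbols:
            if exp.name:                               # skip ordinal-only exports
                db[base + exp.address] = f"{dll}!{exp.name.decode()}"
    return db

# Usage: look up a call target such as 0x7C810332 from 'CALL [X]' in the map, e.g.
# export_address_db([r"C:\Windows\System32\kernel32.dll"]).get(0x7C810332)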

  22. Handling DLL Load Obfuscations [diagram: KERNEL32.DLL is mapped at a non-default base, so the IAT entry holds 21810332, i.e. the new base plus OpenFile’s RVA 10332] • Intercept dynamic loading at arbitrary addresses • Look for NtOpenSection and NtMapViewOfSection in the trace • Search for DLL headers in memory during dumping • Can even identify DLL code that is copied to an arbitrary location
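A sketch of the in-memory header search (in Python): scan a raw process dump for the DOS ‘MZ’ magic and confirm the PE signature at e_lfanew, which is enough to spot DLL images mapped or copied to arbitrary addresses. The sanity bound on e_lfanew is an assumption.

def find_pe_headers(dump: bytes) -> list[int]:
    """Return offsets of candidate PE images inside a raw memory dump."""
    offsets = []
    pos = dump.find(b"MZ")
    while pos != -1:
        if pos + 0x40 <= len(dump):
            e_lfanew = int.from_bytes(dump[pos + 0x3C:pos + 0x40], "little")
            sig = pos + e_lfanew
            if 0 < e_lfanew < 0x1000 and sig + 4 <= len(dump) and dump[sig:sig + 4] == b"PE\x00\x00":
                offsets.append(pos)
        pos = dump.find(b"MZ", pos + 1)
    return offsets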

  23. Handling Thunks [example: a thunk that JMPs to IsDebuggerPresent] • Identify subroutines consisting of a single JMP instruction • Treat any call to such a subroutine as an API call
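A sketch of the thunk pass over a textual IDA-style listing (in Python; the proc/endp line patterns below are assumptions about the listing format, not Eureka’s implementation):

import re

def find_thunks(asm_lines: list[str]) -> dict[str, str]:
    """Map subroutines whose body is a single JMP to their JMP target (the API)."""
    thunks, current, body = {}, None, []
    for raw in asm_lines:
        line = raw.strip()
        if m := re.match(r"(\w+)\s+proc\b", line):
            current, body = m.group(1), []            # subroutine start
        elif current and re.match(r"\w+\s+endp\b", line):
            if len(body) == 1 and body[0].startswith("jmp"):
                thunks[current] = body[0].split(None, 1)[-1]   # e.g. ds:IsDebuggerPresent
            current = None                             # subroutine end
        elif current and line:
            body.append(line.lower())
    return thunks

# Any 'call sub_X' where sub_X is in the returned map can then be treated as an API call.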

  24. Using Dataflow Analysis [example: a def-use chain resolving a register-based indirect call to GetEnvironmentStringsW] • Identify register-based indirect calls

  25. Handling Dynamic Pointer Updates [example: dword_41e304 has no static value to look up the API; a def of dword_41e308 is found, so look for a probable call to GetProcAddress earlier; the def-use chain links that call to the indirect call site] • Identify register-based indirect calls
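A very rough sketch of this def-use idea over a textual listing (in Python): when a pointer such as dword_41e308 is defined from eax right after a call to GetProcAddress, take the API-name string pushed before that call as the probable target. The adjacency requirement and the operand patterns are simplifying assumptions.

import re

def resolve_dynamic_pointers(asm_lines: list[str]) -> dict[str, str]:
    """Map run-time-filled pointers (e.g. dword_41e308) to probable API names."""
    lines = [l.strip().lower() for l in asm_lines]
    resolved = {}
    for i, line in enumerate(lines):
        m = re.match(r"mov\s+(dword_\w+),\s*eax", line)          # def of the pointer
        if not m or i == 0 or "getprocaddress" not in lines[i - 1]:
            continue                                             # eax must come from GetProcAddress
        for prev in reversed(lines[:i - 1]):                     # walk back to the pushed name string
            if name := re.match(r"push\s+offset\s+a(\w+)", prev):
                resolved[m.group(1)] = name.group(1)
                break
    return resolved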

  26. Evaluation Metrics • Measuring analyzability • Code-to-data ratio • Use the disassembler to separate code and data • Most successfully unpacked malware has a code-to-data ratio over 50% • API resolution success • Percentage of API calls that have been resolved out of the set of all call sites • A higher percentage implies the malware is more amenable to static analysis
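A tiny sketch of the two metrics (in Python); whether the 50% threshold refers to code/data or to code/(code+data), and how byte and call-site counts are taken from the disassembler, are assumptions here.

def analyzability_metrics(code_bytes: int, data_bytes: int,
                          resolved_calls: int, total_call_sites: int) -> dict[str, float]:
    """Code-to-data ratio and API resolution success as rough analyzability scores."""
    return {
        "code_to_data_ratio": code_bytes / max(data_bytes, 1),             # assumed form of the ratio
        "api_resolution_rate": resolved_calls / max(total_call_sites, 1),  # resolved fraction of call sites
    }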

  27. Graph Generation • Call graph simplification • Most malware contains hundreds of functions • Remove nodes without API references, connecting their inbound and outbound edges • Micro-ontology labeling • Bird’s-eye view of a malware instance • Translate API functions into categories based on functionality • Categories based on Microsoft’s classifications • Common Filesystem, Random, Time, Registry, Socket, File Management
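A sketch of the simplification and labeling steps using networkx (the library choice, the node-attribute layout, and the tiny ontology table are assumptions): function nodes that reference no API are removed with their callers rewired to their callees, and the surviving nodes are labeled with ontology categories.

import networkx as nx

# Hypothetical micro-ontology: API name -> functional category.
ONTOLOGY = {
    "CreateFileW": "Filesystem", "RegOpenKeyExW": "Registry",
    "connect": "Socket", "GetSystemTime": "Time", "CryptGenRandom": "Random",
}

def simplify_and_label(g: nx.DiGraph) -> nx.DiGraph:
    """Drop API-free function nodes (rewiring callers to callees) and label the rest."""
    g = g.copy()
    for node in [n for n, d in g.nodes(data=True) if not d.get("apis")]:
        for pred in list(g.predecessors(node)):
            for succ in list(g.successors(node)):
                g.add_edge(pred, succ)               # preserve reachability through the removed node
        g.remove_node(node)
    for _, data in g.nodes(data=True):
        data["categories"] = sorted({ONTOLOGY.get(a, "Other") for a in data.get("apis", [])})
    return g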

  28. Storm Worm Case Study Storm Worm: Bird’s Eye View (Semi-manually generated)

  29. Storm Worm Case Study (2) Control Flow Graph: eDonkey Handler

  30. Eureka Ontology Graph

  31. Experimental Evaluation • Evaluation using three different datasets • Goat (packed benign executable) dataset • 15 common packers • Provides ground truth for what packer is used and what is expected after unpacking • Spam malware corpus • Honeynet malware corpus

  32. Goat Dataset

  33. Goat Dataset

  34. Evaluation (ASPack)

  35. Evaluation (MoleBox)

  36. Evaluation (Armadillo)

  37. Spam Corpus Evaluation • Evaluation of a corpus of 481 executables • Binaries collected at spam traps • 470 executables successfully unpacked (over 97% success) • 401 unpacked simply using heuristic unpacker • Rest unpacked using statistical hypothesis test • Most API references were successfully deobfuscated

  38. Spam Corpus Evaluation (2)

  39. Spam Corpus Evaluation (3)

  40. Honeynet Corpus Evaluation • Evaluation of a corpus of 435 executables • Binaries collected at SRI honeynet • 178 out of 435 packed with Themida (only partially analyzable) • Analysis of the 257 non-Themida binaries • 20 did not execute on Win XP • Eureka unpacks 228 / 237 remaining binaries • High API resolution rates on unpacked binaries

  41. Honeynet Corpus Evaluation (2) *Includes all binaries except those packed with Themida

  42. Honeynet Corpus Evaluation (3) *Includes all binaries except those packed with Themida

  43. Runtime Performance
