‘Supervised Automation’ for Malware Variant Generation: Theoretical and Practical Implications

Rachit Mathur, Research Scientist, McAfee
January 5, 2020
18th EICAR Annual Conference, 9th–12th May 2009, Berlin, Germany

Agenda
• Introduction & malware growth
• Supervised automation
• Comparison with metamorphism
• Real-world examples
• Detection challenges
• Conclusions & future work
• Questions

Malware Growth – All known samples +180%

Malware Growth – Families vs Variants

Rogue AV Unique Binaries Discovered

Sample Count Explosion
• Lots of variants per family
• New variants are released even before a signature for the previous ones is released
• Money-motivated, organized malware gangs producing ‘professional products’
• Pose serious detection challenges
• Changes are difficult to anticipate
• Short-term, per-family proactive detection is the minimum requirement
• Authors use bleeding-edge technology, e.g. Conficker (crypto algorithms) and the MBR rootkit (stealth techniques)
• Evading detection is the primary motive

Morphing Malware
• Not the traditional polymorphic or metamorphic malware
• Do not carry the mutator
• Delivered through the cloud (server-side): drive-by downloads, social engineering, self-updating malware
• Binaries change often
• Now adopted by all kinds of malware: backdoors, PWS, AdClickers, proxies, worms, etc.
• Morphing services: Tibs-Packed (Storm worm, downloaders, uploaders, spam bots, backdoors, etc.) and FakeAV-looking downloaders, backdoors and worms
• Human-supervised, automated variant generation system

Supervised-Automation
• Supervised Automation (SA) is a semi-automated method of generating malware variants with sporadic human intervention
• Loosely related to the concept of metamorphism
• Not based on any particular malware family

Supervised Automation
[Figure: SA workflow under human supervision; each stage passes the binary along with its Info]
• Input: malicious binary and info, B
• Select and apply encryption (ADD, SUB, XOR, ROT, RC4), with a loop back to re-encrypt, giving E(B)
• Select and apply morphing (dead code insertion, junk code insertion, CFG obfuscation, instruction substitution, decryption key obfuscation, geometric fuzzification), giving M(E(B))
• Black-box signature extraction
• Release to the world

Supervised-Automation
• Can generate any number of new variants at the desired frequency
• The motive is to evade detection, not to ‘blindly’ generate variants
• Different patterns of operation observed in Tibs-Packed, FakeAV and GamePWS trojans

SA vs. Metamorphism
• Generally speaking, virus detection is undecidable
• Solutions for specific subcases have been proposed
• Let us see which existing results from comparable technology apply to SA
• Purely automatic variant generation, i.e. metamorphism, has been studied

SA vs. Metamorphism
• Do not carry the engine
• Transformation logic is not self-contained
• Transformation rules are not constant
• No feedback loop
• Transformations are not limited
• Anti-debugging, anti-disassembly, anti-emulation: anti-analysis
[Figure: metamorphic engine cycle: locate own code → decode → analyze → transform]

Normalization based approach
• Transformation rules are modelled as Term Rewriting Systems (TRS) and related to formal grammars
• Proving equivalence between two programs w.r.t. a rewriting system reduces to the famous word problem
• Undecidable in general, unless the TRS is confluent and terminating
• Some approximation-based approaches have been proposed
• Example rules:
  Unconditional: mov edi, 0x04  ⇔  push ecx; mov ecx, 0x04; mov edi, ecx; pop ecx
  If eax is not live: push 0x04  ⇔  mov eax, 0x04; push eax
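The two rewrite rules on this slide can be applied in the normalizing direction, from the longer form to the shorter one, so that variants map onto a common form. The Python sketch below is a toy. It assumes a tiny x86 subset (mov, push and pop with plain register or immediate operands) and a simple forward scan for the "not live" condition. The rule encoding, the `live_out` parameter and all function names are hypothetical; they are not the tooling behind this talk. Every rule shortens the program, so rewriting terminates. Confluence is not proved.

```python
from typing import AbstractSet, Callable, Dict, List, Optional, Set, Tuple

# Toy normalizer sketch: rule encoding and helper names are hypothetical.
Instr = Tuple[str, ...]              # e.g. ("mov", "edi", "0x04")
Env = Dict[str, str]                 # pattern variable ("?r") -> concrete operand
Guard = Callable[[Env, List[Instr], int, AbstractSet[str]], bool]

def is_imm(operand: str) -> bool:
    """Toy immediate test: hex literal or decimal digits."""
    return operand.startswith("0x") or operand.isdigit()

def reads_writes(ins: Instr) -> Tuple[Set[str], Set[str]]:
    """Operands read / written by a tiny x86 subset (plain operands only)."""
    op, *args = ins
    if op == "mov":
        return {args[1]}, {args[0]}
    if op == "push":
        return {args[0]}, set()
    if op == "pop":
        return set(), {args[0]}
    return set(args), set()          # unknown opcode: conservatively assume reads

def live_after(reg: str, prog: List[Instr], start: int, live_out: AbstractSet[str]) -> bool:
    """True if `reg` may be read before being overwritten from prog[start] onward."""
    for ins in prog[start:]:
        reads, writes = reads_writes(ins)
        if reg in reads:
            return True
        if reg in writes:
            return False
    return reg in live_out           # caller states what is live at the end

def match(pattern: List[Instr], window: List[Instr]) -> Optional[Env]:
    """Bind '?'-variables so that pattern == window, or return None."""
    env: Env = {}
    for p_ins, ins in zip(pattern, window):
        if len(p_ins) != len(ins):
            return None
        for p, x in zip(p_ins, ins):
            if p.startswith("?"):
                if env.setdefault(p, x) != x:
                    return None
            elif p != x:
                return None
    return env

# Rules oriented from the longer form to the shorter (normal) form.
RULES: List[Tuple[List[Instr], List[Instr], Guard]] = [
    # push t; mov t, imm; mov r, t; pop t  =>  mov r, imm
    ([("push", "?t"), ("mov", "?t", "?imm"), ("mov", "?r", "?t"), ("pop", "?t")],
     [("mov", "?r", "?imm")],
     lambda env, prog, end, live: is_imm(env["?imm"]) and env["?r"] != env["?t"]
     and "esp" not in (env["?r"], env["?t"])),
    # mov r, imm; push r  =>  push imm, only if r is not live afterwards
    ([("mov", "?r", "?imm"), ("push", "?r")],
     [("push", "?imm")],
     lambda env, prog, end, live: is_imm(env["?imm"]) and not live_after(env["?r"], prog, end, live)),
]

def normalize(prog: List[Instr], live_out: AbstractSet[str] = frozenset()) -> List[Instr]:
    """Apply RULES until none fires; terminates because every rule shrinks the program."""
    prog = list(prog)
    changed = True
    while changed:
        changed = False
        for lhs, rhs, guard in RULES:
            for i in range(len(prog) - len(lhs) + 1):
                env = match(lhs, prog[i:i + len(lhs)])
                if env is not None and guard(env, prog, i + len(lhs), live_out):
                    prog[i:i + len(lhs)] = [tuple(env[x] if x.startswith("?") else x for x in t)
                                            for t in rhs]
                    changed = True
                    break
            if changed:
                break
    return prog
```

Take the expanded variant `push ecx; mov ecx, 0x04; mov edi, ecx; pop ecx; mov eax, 0x04; push eax` with nothing live at the end. It normalizes to `mov edi, 0x04; push 0x04`, which is the same form as the compact variant, so the two variants can then be compared directly.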

Normalization based approach
[Figure: rule systems RS1, RS2, RS3 in use over time]
• Multiple TRS: bad news for some solutions
• Q: Do multiple TRS really make a difference?
• Same worst case for a ‘well-designed’ system
• But multiple TRS do make things worse
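The following is one way to see why combining rule systems can hurt normalization. It is an abstract string-rewriting toy and is not taken from the slides. RS1 = {ab → c} and RS2 = {bd → e} are each terminating and have no critical pairs, so each is confluent on its own. Their union, however, rewrites `abd` to two different normal forms. The names `one_step` and `normal_forms` are illustrative only, and the exhaustive search assumes the rule set terminates.

```python
from typing import Iterator, List, Set, Tuple

Rule = Tuple[str, str]   # (left-hand side, right-hand side)

def one_step(word: str, rules: List[Rule]) -> Iterator[str]:
    """Yield every word reachable from `word` by exactly one rewrite."""
    for lhs, rhs in rules:
        start = word.find(lhs)
        while start != -1:
            yield word[:start] + rhs + word[start + len(lhs):]
            start = word.find(lhs, start + 1)

def normal_forms(word: str, rules: List[Rule]) -> Set[str]:
    """All irreducible words reachable from `word`; assumes the rules terminate."""
    seen, stack, result = {word}, [word], set()
    while stack:
        w = stack.pop()
        successors = list(one_step(w, rules))
        if not successors:
            result.add(w)
        for s in successors:
            if s not in seen:
                seen.add(s)
                stack.append(s)
    return result

# Abstract toy rule systems, not taken from the slides.
RS1: List[Rule] = [("ab", "c")]
RS2: List[Rule] = [("bd", "e")]
```

`normal_forms("abd", RS1)` gives `{"cd"}` and `normal_forms("abd", RS2)` gives `{"ae"}`. `normal_forms("abd", RS1 + RS2)` gives both, so a normalizer built for the combined rules no longer has a unique canonical form to compare against.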

Approaches
• Approaches that are agnostic of rule systems can be useful against such systems
• Smart byte-based detection schemes
• Normalization based on general optimization techniques, and program-semantics-based detection methods
• Behaviour-based detection may be useful today
• Emulation-based techniques have been proposed earlier to identify detectable behaviours, but emulation has a host of well-known problems

Example – Storm worm
• Locate the start address of the encrypted data and the size/end of the data
• Calculate key(s): key[i]
• Apply key(s)
• Transfer control to the decrypted code
[Figure annotations: start of encrypted code; add, rotate; fake call returns -1; end of encrypted code]
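The "add, rotate" decryptor can be approached from the analyst's side with cryptanalysis-style key recovery. Given a few bytes of expected plaintext at a known offset, the key and rotation count can be brute-forced instead of emulating the loop. This Python sketch assumes a single constant 8-bit key, with a left rotation applied after the addition. The slide's key[i] suggests the real keys may vary per position, so treat `decrypt_byte`, `find_keys` and the rotation direction as hypothetical.

```python
from typing import List, Tuple

# Hypothetical scheme: single constant key, add then rotate left, per byte.

def rol8(x: int, r: int) -> int:
    """Rotate an 8-bit value left by r bits."""
    r &= 7
    return ((x << r) | (x >> (8 - r))) & 0xFF

def decrypt_byte(c: int, key: int, rot: int) -> int:
    """Add the key (mod 256), then rotate left."""
    return rol8((c + key) & 0xFF, rot)

def decrypt(data: bytes, key: int, rot: int) -> bytes:
    return bytes(decrypt_byte(c, key, rot) for c in data)

def find_keys(data: bytes, known: bytes, offset: int = 0) -> List[Tuple[int, int]]:
    """Every (key, rot) pair that maps data[offset:] onto the known plaintext."""
    window = data[offset:offset + len(known)]
    if len(window) < len(known):
        return []
    return [(key, rot)
            for key in range(256)
            for rot in range(8)
            if all(decrypt_byte(c, key, rot) == p for c, p in zip(window, known))]
```

In a real sample, the known plaintext might be a fixed header or a string the family is known to contain. With 256 keys × 8 rotations there are only 2,048 candidates per offset, so the search is cheap compared with emulating a decryptor that may include anti-emulation tricks such as the fake call.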


Example – Storm worm
[Figure: release schedule. The Base Variant (BV) is encrypted with Algorithm A, B, C, ..., N to give EBV1, EBV2, EBV3, ...; each is then combined with K and with M1, M2, ..., Mn (M11, M12, ..., M1n, M21, ..., M2n), and the resulting variants are released over Day 1, Day 2, ..., Day o]
• High-, medium- and low-frequency changes

Example – DNSChanger
• Uses obfuscated calls
• [Figure annotation: possible call targets]
• Rules can be conditional

Example – PWS dll
• Rules change often
• Constructs strings such as HBXYXND-0109-NEW, WM_HOOKEX_RK, Explorer.exe and act=getpos&account=%s
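Suppose the DLL builds such strings one character at a time in a stack buffer. This is an assumption, since the slides only say the strings are constructed. An analyst could then rebuild the strings from the observed byte stores. The Python sketch below assumes those stores have already been extracted, for example by emulation, as hypothetical `(offset, byte)` pairs. `recover_strings` and `min_len` are illustrative names.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Assumes (offset, byte) stores were already extracted; the format is hypothetical.

def recover_strings(stores: Iterable[Tuple[int, int]], min_len: int = 4) -> List[str]:
    """Rebuild printable ASCII runs from single-byte stores into one buffer."""
    mem: Dict[int, int] = {}
    for off, val in stores:
        mem[off] = val & 0xFF              # a later store to the same offset wins
    strings: List[str] = []
    current: List[str] = []
    prev: Optional[int] = None
    for off in sorted(mem):
        b = mem[off]
        printable = 0x20 <= b < 0x7F
        # A gap in offsets or a non-printable byte (e.g. a NUL terminator) ends a run.
        if prev is None or off != prev + 1 or not printable:
            if len(current) >= min_len:
                strings.append("".join(current))
            current = []
        if printable:
            current.append(chr(b))
        prev = off
    if len(current) >= min_len:
        strings.append("".join(current))
    return strings
```

Sorting by offset undoes any shuffling of the store order. Once recovered, strings such as `WM_HOOKEX_RK` can serve as detection-worthy strings even when the code that builds them changes.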

Example – FakeAV
• Junk code
• Variable renaming
• Register liveness
• The second one is reversed
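Two of the FakeAV differences listed here, junk code and register (variable) renaming, can be factored out before variants are compared. The Python sketch below assumes 32-bit x86 instructions given as tuples of plain operands. It drops a small, hypothetical list of no-op idioms (`nop`, `mov r, r`, `xchg r, r`) and renames general-purpose registers in order of first use. It does not handle memory operands, liveness or the reversed ordering of the second sample.

```python
from typing import Dict, List, Tuple

Instr = Tuple[str, ...]

# General-purpose registers that may be renamed; esp/ebp are kept as frame registers.
RENAMABLE = {"eax", "ebx", "ecx", "edx", "esi", "edi"}

def is_junk(ins: Instr) -> bool:
    """Hypothetical list of 32-bit x86 no-op idioms."""
    op, *args = ins
    if op == "nop":
        return True
    return (op in ("mov", "xchg") and len(args) == 2
            and args[0] == args[1] and args[0] in RENAMABLE)

def canonicalize(prog: List[Instr]) -> List[Instr]:
    """Drop junk, then rename registers to r0, r1, ... in order of first use."""
    mapping: Dict[str, str] = {}
    out: List[Instr] = []
    for ins in prog:
        if is_junk(ins):
            continue
        op, *args = ins
        new_args = []
        for a in args:
            if a in RENAMABLE:
                a = mapping.setdefault(a, f"r{len(mapping)}")
            new_args.append(a)
        out.append((op, *new_args))
    return out
```

For example, `mov eax, 0x10; add eax, ebx; push eax` and `nop; mov esi, 0x10; xchg ecx, ecx; add esi, ecx; push esi` both canonicalize to `mov r0, 0x10; add r0, r1; push r0`.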

Detection Challenges
• Virus authors want to evade detection, and to stay undetected once a machine is compromised
• An AV update should detect the ‘current’ variant, i.e. be somewhat ‘proactive’
• Able to detect all automatically generated variants up until the next human-driven update
• Resistant to non-functional changes

Signatures
• Goal: find ‘enough’ evidence to detect and classify a file for practical purposes without generating any false positives
• Generic
• Reliable: no false positives
• “my virus botnet, attack ms08-067 ping”

Signatures
• Simple byte-sequence-based signatures are not useful
• Hash-based
• Detection-worthy strings
• Detection-worthy code sequences
• Multiple sets of wildcard-based byte sequences at various locations that remain constant
• Emulation
• Decryption- or cryptanalysis-based
• The presence of a technique can itself be used for detection
• Geometry-based
• A combination provides the right balance
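The following is a minimal Python sketch of the "multiple sets of wildcard-based byte sequences at various locations" idea. Each signature part is a hex pattern with `??` wildcards at an expected offset. An optional `slack` lets a part float a few bytes, and a file matches only if every part matches. The signature format, the offsets and the `slack` parameter are hypothetical and do not reflect any specific AV engine's syntax.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical signature format: (expected offset, hex pattern with ?? wildcards).
Pattern = List[Optional[int]]          # None marks a wildcard byte

def parse_pattern(text: str) -> Pattern:
    """Parse e.g. '8B 45 ?? 33 C9' into [0x8B, 0x45, None, 0x33, 0xC9]."""
    return [None if tok == "??" else int(tok, 16) for tok in text.split()]

def matches_at(data: bytes, offset: int, pattern: Pattern) -> bool:
    if offset < 0 or offset + len(pattern) > len(data):
        return False
    return all(p is None or data[offset + i] == p for i, p in enumerate(pattern))

def part_matches(data: bytes, offset: int, pattern: Pattern, slack: int) -> bool:
    """Try the pattern at every offset within +/- slack of the expected one."""
    return any(matches_at(data, o, pattern) for o in range(offset - slack, offset + slack + 1))

def scan(data: bytes, signature: Sequence[Tuple[int, str]], slack: int = 0) -> bool:
    """A file is detected only if every part of the signature matches."""
    return all(part_matches(data, off, parse_pattern(pat), slack) for off, pat in signature)

SIG = [(2, "8B 45 ?? 33 C9"), (8, "E8 ?? ?? ?? ?? C3")]
```

Requiring several independent parts is meant to keep false positives down. The wildcards absorb bytes that change per variant, such as keys or offsets. `slack` gives limited tolerance to small shifts, but junk-code insertion between parts can still defeat fixed offsets.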

Conclusions & Future Work
• Stakes are getting bigger, with increasingly critical, sensitive, high-value information at risk
• Virus authors are adopting cutting-edge research concepts and innovation skills
• More automation and a better understanding of ‘correct’ transformation techniques are expected
• It would be interesting to formalize some results in the realm of SA-based malware
• Detection solutions that are agnostic of rewrite systems need to be investigated
• It will also be interesting to see how behaviour evolution materializes in reality; any forward-looking research in that area is very relevant


Thank You! (Danke!) Suggestions & Questions: Email: Rachit_Mathur@avertlabs.com
