Diversity Algorithms for Worrisome Software and Networks (DAWSON)

Diversity Algorithms for Worrisome Software and Networks(DAWSON) James Just, GITI Karl Levitt & Jeff Rowe, UC Davis

The DAWSON Team • Global InfoTek, Inc (GITI) • James Just , Bruce Wilner, Mark Cornwell • UC Davis • Karl Levitt, Hao Chen, Zhendong Su, Jeff Rowe • Ivan Balepin, Allen Ting, Ebrima Ceesay, Tufan Demir • R. Sekar (consultant, SUNY)

The Problem Space • Increasing monoculture in operating systems and key applications create opportunity for massive common mode failures • Highly susceptible to viruses and worms • Spread widely and quickly (or slowly) • Intrusion tolerant systems difficult to build • Expensive, large hw/sw footprints, a priori knowledge of attack modalities • Need significant diversity of spares -- even if intrusion tolerance is affordable approach, practical limits exist on diversity of spares • N-version programming is costly • Limited commercial diversity of similar apps • Foreseeable cyber-risks dominated by static, durable executable monoculture

Synthetic Diversity Approaches • Source code: • Easiest area to introduce diversity via compilers • Complicates distribution and updating • Significant market penetration problems with software vendors • Binary (executable) code: • More difficult to introduce diversity • Disassembly is not exact science • Runtime and randomized modification is harder still • Can be implemented by organizations and users who need the benefits • Binary code with annotations (hints)

DAWSON Approach • Runtime randomized transformations of executable • Preserve functionality of executable modules (e.g., dll) • Transform binary code, machine addresses, names, etc • Can use annotations to facilitate • Random “cryptographic” keys will cause unique transformations on each application restart • Similar approach for network protocols • Low overhead transforms (runtime performance) • Goal: Beat program metric by 10X for large fraction of exploit space • 100 functional equivalents with no more than 3 susceptible to same exploit as baseline code for most exploits • Policies can determine which type of transform is favored

Evaluation • Internal testing with • Fabricated applications with known vulnerabilities and exploits • Real applications with known vulnerabilities and exploits • Independent test team?

Impacts • Introducing spatial and temporal diversity to common Windows applications will reduce the effective size of world’s largest computer monoculture • Size and breadth of transform space vs vulnerability space • Extent of penetration • Counter-measures • Fine-grained diversity mechanisms are key enabler for intrusion tolerance • Fourth generation, secure, survivable systems • Easily integrated with other SRS components.

Transition and Future Work • Standard research publications • Integrate with follow-on projects/products • If successful • Possible GITI commercial product • Possible open source “toolkit” approach • Microsoft or other software vendors • Military users • More definitive plans as project develops

Software: Concept to Implementation Software Specification Source Code Executable Code Linker Loader Machine-level Code Machine Code Specification

Illustrative Common Techniques & AssumptionsSegment of Code Red 1 Disassembly1 1. Relative Virtual Addressing is Windows name for identifying specific memory addresses in executable files without hardcoding (offset to virtual memory load location). 01 loc_4B4: ; CODE XREF: DO_RVA+26Dj 02 mov esi, esp 03 mov ecx, [ebp-198h]; set ecx with the data segment pointer 04 push ecx; push data segment (pointer of function to load) 05 mov edx, [ebp-1CCh]; get current RVA base offset 06 push edx; push module handle(base loaded address) 07 call dword ptr [ebp-190h]; call GetProcAddress 2. This calls dll’s specific API (GetProcAddress) via IAT. GetProcAddress is one of two key functions for the exploit. LoadLibraryA (not shown) is the other. 3. After this, GetProcAddress and LoadLibraryA are used to load kernel32.DLL, infocomm.DLL and WS2_32.DLLto access the file system, open network sockets and send and receive network packets. 1 Excerpted from Code Red Disassembly Analysis by Ryan Permeh, Marc Maiffret. http://www.eeye.com/html/advisories/codered.zip.

Many Assumptions Made • Breaking the assumptions mechanically • Re-arrange the run-time stack • Permute the addresses in the jump table • Change the machine code (table transformation) • Change the interpretation of (encrypt) filenames • Change the order of parameters for system calls • Encrypt file name parameters to system calls • Rename ports for network connections • Put return pointers on a separate stack

Current Attack Problem Software Specification Source Code Executable Code Known V-Spec Vulnerability Linker Loader Attack Machine-level Code Machine Code Specification

Breaking the V-Spec Software Specification Transform Specifications Source Code Executable Code Known V-Spec Vulnerability?? Transformer Linker Loader Machine-level Code Machine-level Code Machine-level Code Machine-level Code Machine-level Code Machine-level Code Machine-level Code X Attack Machine-level Code Machine-level Code Machine-level Code Machine-level Code Machine-level Code Doesn’t Match Machine-level Code Machine-level Code Machine Code Specification V-Spec Unknown Until Load-Time

Transform Techniques in Literature* • Obfuscation • Layout obfuscation (scramble identifiers, remove comments, change formats) • Control flow obfuscations (Statement grouping, ordering, computation, opaque constructs) • Data obfuscation (Storage, encoding, grouping, ordering) • Preventative transformations (prevent decompilers from operating by exploiting weaknesses) • Inherent (aliases, variable or bogus dependencies, opaqueness side effect & difficulty) • Targeted • Source code • N-version programming • Functional-behavior preserving diversity in components used (e.g., different encryption algorithms, different scales for data such as Celsius or Fahrenheit) • Semantics preserving source code transformations • Place sensitive data (such as function and data pointer) below the starting address of any buffer • Variable ordering • Equivalent instructions • Variable compilation --Variable internal names, padding and addresses, linking orders • Insertion of opaque constructs or other dead code to change memory layout • Binary code • Address transformation on binary code • Randomize base address of memory regions (Stack, Heap, DLL, routines/static data in executable) • Permute order of variable/routines (Local variables in stack frame, static variables, routines in shared libraries or routines in executables) • Introduce random gaps between objects (Padding in stack frames, between successive malloc allocation requests, between variables in the static area; Gaps within routines and add jump instructions to skip over gaps) • System resource, system call, or DLL name/address transformation • Instruction set transformation *References at end of document

Effectiveness of Transforms Diversity is not a panacea for achieving cyber-security. There are many other ways to penetrate a system

Diversity System Architecture Overview Execution Space Input Key Generator Policy Monitor Original Program Input Key Transformed Code Transformer Loader Analyzer Wrapper Annotation

Untranslation Wrapper Diversity System Functional Architecture Response to normal user inputs are translated & untranslated so they work Modified loader transforms original stored program and generates wrapper that retranslates external calls Attacker User Inputs Other System Resources Original Program Modified Loader Random Key Transformed In-memory program Some attacks fail because assumed vulnerability is gone Other attacks fail because injected commands are wrong Annotation File

DAWSON Project Schedule & Milestones • Baseline Tasks • 1. Requirement Refinement • 2. Program. Code Diversity • 3. Protocol Diversity • 4. Integration • 5. T&E • Program Mgt. • Prototype • Optional Task • - Self-Monitoring FY04 FY05 FY06 Q2 Q3 Q4 Q1 Q2 Q3 Q4 Q1 Q2 Q3 Q4 1 2 3

Creating heterogeneous environments for malicious code Karl Levitt, Hao Chen, Zhendong Su, Jeff Rowe, Ivan Balepin, Allen Ting, Ebrima Ceesay, Tufan Demir UC Davis Computer Security Lab

Why do we want heterogeneity? • Homogeneity is a natural requirement • As for software: development, distribution, maintenance, data sharing, etc. • Fixed protocols for communications and interoperability • The real challenge is in achieving both • Create a heterogeneous vulnerability environment • Keep all the benefits of homogeneity.

Host level Heterogeneity • Microsoft Windows • Most prevalent platform • Many security flaws • Defense against Scripted Attacks • Windows forms a wide homogeneous environment for large scale attacks. • Single attack implementation compromises all machines with the vulnerable service. • Buffer overflow is the most common vulnerability in widespread attacks.

Windows Execution Environment • Windows executables must call API functions for any significant tasks • System calls are not accessible by a user program. • All API functions are provided in DLLs. • Load address of API functions is not known until the program loads • Load address of API functions varies from host to host

How does DLL system work? 80000000 stack kernel32.dll 20000000 LoadLibraryA() 77E9D961 IAT 010031A0 77E9D961 `LoadLibraryA’ .text 77E80000 Call *010031A0 01001000 heap 00070000 65D60000 00000000

Permute IAT 80000000 stack Call *010031A0 kernel32.dll 20000000 77E9D961 LoadLibraryA() IAT `LoadLibraryA’ 010031A0 77E90332 .text `GetProcAddress’ 77E9D961 0100308C 77E90332 GetProcAddress() 77E80000 Call *010031A0 Call *0100308C 01001000 heap 00070000 65D60000 00000000

Case study: Code Red stack Injected code kernel32.dll 77E9D961 LoadLibraryA() 20000000 .text EAT `LoadLibraryA’ 77E9D961 01001000 heap `KERNEL32’ 77E80000 00070000 00000000

Case study: SQL Sapphire stack Injected code kernel32.dll 77E9D961 LoadLibraryA() 20000000 .text sqlsort.dll 01001000 IAT 77E9D961 heap 77E80000 00070000 00000000

Case study: MS Blaster PEB stack Injected code `KERNEL32’ 77E80000 20000000 .text kernel32.dll LoadLibraryA() 77E9D961 01001000 heap EAT `LoadLibraryA’ 77E9D961 00070000 77E80000 00000000

Operand hijacking 80000000 PEB stack Injected code 20000000 kernel32.dll LoadLibraryA() 77E9D961 IAT .text 0100308C 77E9D961 77E80000 Call *0100308C 01001000 heap 00070000 65D60000 00000000

Heterogeneous Network Protocols • Many common attacks rely upon security flaws in network protocols. • Invariant properties of protocols are the foundation of network communications. • How can network protocols be diversified while still maintaining the critical invariant requirements.

Approach • Use a state-machine specification of protocols to express the required invariant properties. • Transitions between states, representing protocol messages, can be modified as long as a path to the subsequent state is maintained. • Variation in the modifications can create a diverse collection of network protocols whose protocol messages may differ, but with the required invariant properties needed for functionality.

Reallocate Released T2 Timeout Request arrive Bind Relinquish Offer Confirm Allocate IP Bind Confirm IP Prepare Offer Waiting Expire Binding Release END Request Selected INIT INIT Bound Bound Selecting ReplyWait Renew/InfoReq Renew/InfoReq Bind_Dec/Timeout ReplyTimeOut Rejected/Timeout No Available IP Example: The DHCP Protocol DHCP Server FSM DHCP Client FSM

A DHCP Protocol Variation • Introduce two new states: S1 and C1. • Introduce two new transitions: Snew and Cnew. • Standard DHCP protocol messages take the server and client from the new state to the invariant state preserving protocol functionality. • Attacks depending upon forged AllocateIP messages will fail since the attacker assumes that server is in the IP_prepare state Reallocate Released S1 Allocate IP Snew Request arrive IP Prepare Expire END Selected INIT Bound ReplyWait Renew/InfoReq Rejected/Timeout No Available IP T2 Timeout C1 Offer Confirm Cnew Bind Relinquish Bind Confirm Offer Waiting Binding Release Request INIT Selecting Bound Renew/InfoReq Bind_Dec/Timeout ReplyTimeOut

Research Topics • Evaluation of diversity approach for known large-scale attacks. • Evaluation for hypothesized attacks. • Analyze complexity of diversified system • How does it affect normal system operations • Does it increase the attacker’s work factor • Prototypes • Heterogeneous host-based Windows environment • Heterogeneous network environments

Related research • Source code checker • Flawfinder, ITS4, PScan, Splint, etc. • Compiler techniques • StackGuard, StackShield • Non-executable stack • OverflowGuard • SecureStack • Service pack for Windows (will be available in ‘04) • Run-time protection • Baratloo et. al. • Code Obfuscation

References • Lee Badger, Larry D'Anna, Doug Kilpatrick, Brian Matt, Andrew Reisse, Tom Van Vleck. “Self-Protecting Mobile Agents Obfuscation Techniques Evaluation Report,” Network Associates Laboratories, Report #01-036, Nov 30, 2001, updated March 22, 2002. • [Balzer-99] R. Balzer, N. Goldman. Mediating Connectors. Proceedings of the 19th IEEE International Conference on Distributed Computing Systems, Austin, Texas, May 31-June 4, 1999, IEEE Computer Society Press 73-77 • Boaz Barak, Oded Goldreich, Russell Impagaliazzo, Steven Rudich, Amit Sahai, Salil Vadhan, and Ke Yang. “On the (im)possibility of obfuscating programs.” In J. Kilian, editor, Advances in Cryptology-CRYPTO ‘01, Lecture Notes in Computer Science. Springer-Verlag. • Elena Gabriela Barrantes, David H. Ackley, Stephanie Forrest, Trek S. Palmer, Darko Stefanovic and Dino Dai Zovi, “Randomized instruction set emulation to disrupt binary code injection attacks,” 10th ACM Conference on Computer and Communications Security, Washington DC, October 27-31, 2003. • Sandeep Bhatkar, Daniel C. DuVarney, and R. Sekar, “Address Obfuscation: An Efficient Approach to Combat a Broad Range of Memory Error Exploits,” 12th USENIX Security Symposium, August 2003. • M. Chew, D. Song. “Mitigating Buffer Overflows by Operating System Randomization,” Technical Report CMU-CS-02-197. • C. Collberg, C. Thomborson, and D. Low. “A Taxonomy of Obfuscating Transformations”. Technical Report 148, Department of Computer Science, University of Auckland, July 1997. • C. Collberg, C. Thomborson, and D. Low. “Manufacturing Cheap, Resilient, and Stealthy Opaque Constructs” Department of Computer Science, University of Auckland. ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL'98). January 1998 • C. Collberg, C. Thomborson, D. Low. “Breaking Abstractions and Unstructuring Data Structures”, Proceedings of the 1998 International Conference on Computer Languages, pages 28-38. IEEE Computer Society Press. May 1998. • Larry D’Anna, Brian Matt, Andrew Reisse, Tom Van Vleck, Steve Schwab, Patrick LeBlanc, “Self-Protecting Mobile Agents Obfuscation Report - Final report,” Network Associates Laboratories, Report #03-015, June 30, 2003 • Hiroaki Etoh and Kunikazu Yoda. Protecting from stack smashing attacks. Published on World-WideWeb at URL http://www.trl.ibm.com/projects/security/ssp/main.html, June 2000. • Stephanie Forrest, Anil Somayaji, and David H. Ackley. “Building diverse computer systems.” In 6th Workshop on Hot Topics in Operating Systems, pages 67-72, Los Alamitos, CA, 1997. IEEE Computer Society Press. • James E. Just, et al., “Learning Unknown Attacks A Start,” Recent Advances in Intrusion Detection, 5th International Symposium, Zurich, Switzerland, October 16-18, 2002, Proceedings, A. Wespi, G. Vigner, and L. Deri, (Eds.), Springer, Lecture Notes in Computer Science. • Gaurav S. Kc, Angelos D. Keromytis, Vassilis Prevelakis, “Countering Code-Injection Attacks with Instruction-Set Randomization,” 10th ACM Conference on Computer and Communications Security, Washington DC, October 27-31, 2003. • Cullen Linn, Saumya Debray, “Obfuscation of Executable Code to Improve Resistance to Static Disassembly,” ACM Conference on Computer and Communications Security, Washington DC, October 27-31, 2003. • D. L. Lough. A Taxonomy of Computer Attacks with Applications to Wireless Networks. PhD Thesis, Virginia Polytechnic and State University, Blackburg, VA. • Douglas Low, Java Control Flow Obfuscation, MS Thesis, Univ. Auckland, 3 June 1998 • Pax. Published on World-Wide Web at URL http://pageexec.virtualave.net, 2001. • Ryan Permeh, Marc Maiffret, Code Red Disassembly Analysis, eEye Digital Security, http://www.eeye.com/html/advisories/codered.zip. • Chenxi Wang, “A Security Architecture for Survivability Mechanisms.” PhD thesis, University of Virginia, October 2000. • Gregory Wroblewski, “General Method of Program Code Obfuscation,” PhD Dissertation, Wroclaw University of Technology, Institute of Engineering Cybernetics, 2002. • Jun Xu, Z. Kalbarczyk and R. K. Iyer. Transparent Runtime Randomization for Security. Proc. of 22nd Symposium on Reliable

Diversity Algorithms for Worrisome Software and Networks (DAWSON)