Software Obfuscation

Software Obfuscation Anirban Majumdar University of Trento anirban@disi.unitn.it

Previous Talk (Mariano) • The state of the art in computer security research. • The problem of mobile code and malicious hosts.

Previous Talk (Mariano) • The problem of software piracy and malicious reverse engineering … • A 2005 BSA report puts the loss to software firms due to piracy at USD 34 bn per year.

Research Problem: Software Protection • Valuable software is distributed in highly portable intermediate-language formats: MSIL for .NET, bytecode for Java. • Intermediate code can be reverse engineered (RE): non-malicious RE for testing, integration, extending, …; malicious RE for attacks on security, piracy, Trojan horse insertion, … • How can we “harden” software so that it resists reverse engineering?

Existing Protection Techniques • Hardware dongle: not viable; retrofit hardware is needed. • Server-side execution: requires a high-bandwidth, always-on connection, and raises additional network security problems (authorisation and authentication need to be taken care of). • Encryption: a chicken-and-egg conundrum … the decryption routine is visible. Works only if the entire decryption/execution takes place in hardware.

Talk Outline • Define obfuscation. • Provide taxonomy of obfuscations with examples. • Play two obfuscation games to defeat reverse engineering. • Future of obfuscation.

What is obfuscation? • It is a software protection technique. • Transforms the application into one that is functionally identical to the original but is more difficult to reverse engineer. • Can never completely protect an application from malicious reverse engineering. • Given sufficient time and resources, an adversary can reverse engineer any obfuscated code.

Potential application domains • Good ones … • Obscure program logic. • Hide ownership information (e.g. watermarks, discussed by Mariano). • Bad ones … • Development of polymorphic viruses or of code that contains an obfuscated malicious payload. • Code plagiarism!

Defining Obfuscation • Let P → P’ be a transformation from source program P to target program P’. • P → P’ is an obfuscating transformation if P and P’ have the same observable behaviour; i.e. the following two conditions hold (Collberg and Thomborson): • If P fails to terminate or terminates with an error, then P’ may or may not terminate. • Otherwise, P’ must terminate and produce the same output as P. • Two important conditions that need to be preserved: • functionality – the obfuscated program should have the same input/output behaviour as the input program (semantics-preserving transformation), and • unintelligibility – the obfuscated program should be unintelligible to the adversary in some sense.
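The "same observable behaviour" condition can be illustrated with a minimal sketch. The functions `original` and `obfuscated` are hypothetical stand-ins for P and P’, and the check only samples inputs, so it gives evidence of equal behaviour rather than a proof.

```python
def original(n):
    """P: sum of the first n positive integers."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


def obfuscated(n):
    """P': same output, computed through an encoded accumulator t = 2*total + 1."""
    t = 1
    for i in range(1, n + 1):
        t += 2 * i
    return (t - 1) // 2


def same_observable_behaviour(p, p_prime, inputs):
    """Return True if p and p_prime agree on every sampled input."""
    return all(p(x) == p_prime(x) for x in inputs)
```

Both functions terminate on every non-negative integer, so only the second condition (same output) is exercised here.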

Quality of Software Obfuscation Evaluated according to four criteria: • Potency: How much obscurity it adds to the program (we can use software complexity metrics to determine this). • Resilience: How difficult it is to break for an automatic deobfuscator (combination of programmer effort and deobfuscator effort). • Stealth: How well obfuscated code blends in with the rest of the program (context-sensitive metric). • Cost: How much computational overhead (time/space penalty) it adds to the obfuscated application (this can be measured but is probably the least important evaluation criterion).
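As a rough sketch of how potency might be estimated with a complexity metric: the ratio E(P’)/E(P) − 1 used below is the form usually attributed to Collberg et al., while the metric itself (a crude branch count) and the two source strings are assumptions made for illustration.

```python
import ast

# Node types counted as decision points (an illustrative choice).
BRANCH_NODES = (ast.If, ast.While, ast.For, ast.IfExp)


def complexity(source):
    """Crude McCabe-style metric: 1 + number of branching nodes."""
    tree = ast.parse(source)
    return 1 + sum(isinstance(node, BRANCH_NODES) for node in ast.walk(tree))


def potency(original_src, obfuscated_src):
    """Relative increase in complexity caused by the transformation."""
    return complexity(obfuscated_src) / complexity(original_src) - 1


ORIGINAL = """
def f(n):
    s = 0
    for i in range(n):
        s += i
    return s
"""

# Same behaviour: i*(i+1) is always even, so the else branch is dead.
OBFUSCATED = """
def f(n):
    s = 0
    for i in range(n):
        if (i * (i + 1)) % 2 == 0:
            s += i
        else:
            s -= 1
    return s
"""
```

A positive value means the transformed program scores as more complex under the chosen metric; it says nothing about resilience or stealth.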

Goals of obfuscation … • Ideal obfuscator (Ehud Barak, PhD, 2004): • Should simulate the “black box” property. • Fails if there exists at least one program that cannot be obfuscated by this method; i.e. an adversary can learn something from an examination of the obfuscated version of this program that cannot be learned by merely executing the program repeatedly. • Practical obfuscator (what we have now): • Use transforms such that the resources required for undoing them are too expensive for attackers.

Taxonomy of Obfuscations • Layout obfuscation: Changes or removes useful information from the IL without affecting real instructions. E.g. comment stripping, identifier renaming. • Data Obfuscation: Targets data and data structures in the program. E.g. changing data encoding, splitting/merging arrays. • Control-flow obfuscation: Affects the control-flow within the code. E.g. Reordering statements, introducing dummy control-flow.

Layout Obfuscation • Changes or removes useful information from the IL without affecting real instructions. E.g. comment stripping, identifier renaming. • Used in commercial obfuscators like DashO for Java and Dotfuscator for MSIL … both from PreEmptive Corp.
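Identifier renaming and comment stripping can be sketched on Python source using the standard `ast` module. This is only an illustration of the idea, not how DashO or Dotfuscator work; the class `RenameLocals`, the helper `rename`, and the sample function `area` are hypothetical.

```python
import ast


class RenameLocals(ast.NodeTransformer):
    """Replace chosen identifiers with meaningless names."""

    def __init__(self, names):
        self.mapping = {name: f"v{i}" for i, name in enumerate(names)}

    def visit_Name(self, node):
        node.id = self.mapping.get(node.id, node.id)
        return node

    def visit_arg(self, node):
        node.arg = self.mapping.get(node.arg, node.arg)
        return node


def rename(source, names):
    """Return source with the given identifiers renamed."""
    tree = RenameLocals(names).visit(ast.parse(source))
    return ast.unparse(tree)  # ast.unparse requires Python 3.9+


SOURCE = '''
def area(width, height):
    # comments are lost when the code is regenerated from the syntax tree
    result = width * height
    return result
'''

renamed = rename(SOURCE, ["width", "height", "result"])
```

The instructions are unchanged; only names and comments, which carry meaning for a human reader, are removed.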

Data Obfuscations • Variable Encoding
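A minimal sketch of a variable encoding, assuming a linear encoding i_enc = C1*i + C2; the constants and both functions are hypothetical and chosen only for illustration.

```python
def count_original():
    i = 1
    visited = []
    while i < 10:
        visited.append(i)
        i += 1
    return visited


C1, C2 = 8, 3  # hypothetical encoding constants: i_enc = C1 * i + C2


def count_encoded():
    i_enc = C1 * 1 + C2                     # encodes i = 1
    visited = []
    while i_enc < C1 * 10 + C2:             # encodes i < 10
        visited.append((i_enc - C2) // C1)  # decode only where the value is used
        i_enc += C1                         # encodes i += 1
    return visited
```

The plain value of `i` never appears in a variable; every operation on it is rewritten to work on the encoded form.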

Data Obfuscations • Variable splitting and merging • Arrays can be split into several sub-arrays, two or more arrays can be merged into one bigger array, folded so as to increase the number of dimensions, or flattened to decrease the number of dimensions.
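Array splitting and folding can be sketched as follows. The even/odd split and the row-major fold are just two possible layouts, and all function names are hypothetical.

```python
def split(a):
    """Split a into two sub-arrays holding the even- and odd-indexed elements."""
    return a[0::2], a[1::2]


def get_split(even, odd, i):
    """Read logical element i from the split representation."""
    return even[i // 2] if i % 2 == 0 else odd[i // 2]


def fold(a, cols):
    """Fold a 1-D array into a 2-D array with `cols` columns (row-major)."""
    return [a[r:r + cols] for r in range(0, len(a), cols)]


def get_folded(m, cols, i):
    """Read logical element i from the folded representation."""
    return m[i // cols][i % cols]
```

Every access `a[i]` in the original program would be rewritten to go through the corresponding accessor, so the original array never exists as a single object.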

Control-flow Obfuscations • Aggregation/De-Aggregation: The original control-flow logic is disturbed by coalescing unrelated methods or splitting related methods, e.g. DOJ (Design Obfuscator for Java). Method inlining, outlining, cloning, and loop transformations also fall in this class. • Ordering: This category performs reordering operations on statements, loops, and expressions to disturb the locality of related information. • Spurious Computations: This type of obfuscation modifies the real control-flow by adding spurious computation blocks, e.g. opaque predicates.

The branch dispatcher model [Wang 2001 PhD]
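A rough illustration of the general idea behind a dispatcher-based layout (often called control-flow flattening). It is not a reproduction of Wang's construction; the block labels and the choice of `gcd` as the example are assumptions.

```python
def gcd_structured(a, b):
    while b != 0:
        a, b = b, a % b
    return a


def gcd_flattened(a, b):
    state = 0                  # hypothetical block labels 0..2
    while True:                # the dispatcher: picks the next block from `state`
        if state == 0:         # block 0: loop test
            state = 1 if b != 0 else 2
        elif state == 1:       # block 1: loop body
            a, b = b, a % b
            state = 0
        elif state == 2:       # block 2: exit
            return a
```

All blocks sit at the same nesting level, so the original loop structure is no longer visible in the control-flow graph; recovering it requires tracking the values taken by `state`.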

Opaque Predicates • An opaque predicate (P): • a conditional expression → thus called a predicate • its value is known to the obfuscator, • its value is difficult for the adversary to deduce (by statically analysing the code) → thus called opaque • The opacity property of predicates determines the resilience of control-flow transformations, i.e. the more opaque a predicate, the greater the difficulty in determining its outcome by static analysis.

Opaque Predicates • T/ F –  always evaluates to T/F (Opaquely T/F Predicate) • ? – may sometimes evaluate to T and sometimes to F. (Opaquely Unknown Predicate)

Embedding of opaque predicates (Dummy Code insertion)
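Dummy code insertion might look like the sketch below. The predicate `v*(v+1) % 2 == 0` is an illustrative choice that is always true for integers; the functions are hypothetical.

```python
def total_original(values):
    total = 0
    for v in values:
        total += v
    return total


def total_obfuscated(values):
    total = 0
    for v in values:
        # Opaquely true: v*(v+1) is a product of consecutive integers, hence even.
        if (v * (v + 1)) % 2 == 0:
            total += v
        else:
            total -= 3 * v + 1  # dummy code, never executed
    return total
```

A predicate this simple is easy for an analyser to resolve; it only shows where the dummy branch goes.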

Embedding of opaque predicates (Loop condition extension)
    i = 1;
    while (i < 100) { … i++; }
Can be transformed into:
    i = 1; j = 100;
    while ((i < 100) && (j*j*(j+1)*(j+1)%4 == 0)^T) { … i++; j = j*i+3; }
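The extended loop can be checked against the original with a short Python transcription. The added conjunct is opaquely true because j(j+1) is even, so its square is divisible by 4. The `trace` list is an assumption standing in for the elided loop body.

```python
def loop_original():
    trace = []
    i = 1
    while i < 100:
        trace.append(i)  # stands in for the elided loop body
        i += 1
    return trace


def loop_extended():
    trace = []
    i = 1
    j = 100
    # The second conjunct always holds: j*(j+1) is even, so its square is 0 mod 4.
    while i < 100 and (j * j * (j + 1) * (j + 1)) % 4 == 0:
        trace.append(i)
        i += 1
        j = j * i + 3
    return trace
```

Python integers do not overflow; in a language with fixed-width integers the behaviour of `j` on overflow would need separate consideration.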

Opaque Predicates based on aliasing • Aliasing occurs when two variables refer to the same memory location. • In the presence of aliasing, precise inter-procedural static analysis is intractable. • This intractability property of pointer aliasing can be used to construct opaque predicates. • The construction is based on the fact that it is impossible for approximate static analysers to detect all aliases all of the time. • The basic idea: • Construct a dynamic data structure and maintain a set of pointers into it. • Make opaque predicates from these pointers. • Insert code for manipulating these pointer locations, yet maintain the invariant condition.
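The basic idea can be sketched with two disjoint circular lists. The invariant assumed here is that `f` always points into the first ring and `g` into the second, so `f is not g` is opaquely true however the pointers are moved. The ring sizes, the random moves, and all names are hypothetical.

```python
import random


class Node:
    def __init__(self):
        self.next = self


def make_ring(size):
    """Build a circular linked list and return one of its nodes."""
    nodes = [Node() for _ in range(size)]
    for k, node in enumerate(nodes):
        node.next = nodes[(k + 1) % size]
    return nodes[0]


def advance(p, steps):
    """Move pointer p forward by `steps` nodes within its own ring."""
    for _ in range(steps):
        p = p.next
    return p


def run(rounds, seed=0):
    rng = random.Random(seed)
    f = make_ring(5)  # f always points into the first ring
    g = make_ring(7)  # g always points into the second ring
    outcomes = []
    for _ in range(rounds):
        f = advance(f, rng.randint(0, 10))
        g = advance(g, rng.randint(0, 10))
        outcomes.append(f is not g)  # opaquely true: the rings are disjoint
    return outcomes
```

To decide the predicate statically, an analyser has to establish that no sequence of moves can make the two pointers alias.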


Opaque Predicates based on concurrency • Parallel programs are more difficult to analyse than their sequential counterparts because of their interleaving semantics. • Parallel semantics can be incorporated in an otherwise sequential program using threads. • If asynchronous events dictate the scheduling policy of threads, a large amount of nondeterminism may be generated which can be used to construct opaque predicates.
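One way this could look, as a sketch under assumptions: several threads update shared state in a schedule-dependent order while preserving an invariant (here `a + b == 100`), and the predicate tests that invariant. The class `Shared`, the step sizes, and the invariant are all hypothetical.

```python
import threading


class Shared:
    def __init__(self):
        self.lock = threading.Lock()
        self.a = 0
        self.b = 100  # invariant: a + b == 100


def mover(shared, rounds, step):
    """Repeatedly shift `step` from b to a; the invariant holds after each update."""
    for _ in range(rounds):
        with shared.lock:
            shared.a += step
            shared.b -= step


def predicate(shared):
    """Opaquely true: the individual values depend on scheduling, their sum does not."""
    with shared.lock:
        return shared.a + shared.b == 100


def run():
    shared = Shared()
    threads = [threading.Thread(target=mover, args=(shared, 1000, s)) for s in (1, 3, 7)]
    for t in threads:
        t.start()
    samples = [predicate(shared) for _ in range(200)]
    for t in threads:
        t.join()
    return samples, shared.a
```

The values of `a` and `b` seen at any sampling point vary from run to run, which is what a static analysis of the interleavings has to cope with.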

Obfuscatory Strength Evaluation through Reverse Engineering • We do not know, in practice, how to arbitrarily generate sufficiently hard obfuscated problem instances such that all program analysis techniques would fail (e.g. give imprecise, unanalysable results, or be unscalable, run out of memory, crash, or never terminate). • What sort of analysis tools are useful for the automated understanding of the code obfuscated with “computationally intractable” transforms? • Can general purpose program analysis tools be used to assess the obfuscatory strength of aliasing transforms or do we need to develop customised analysis tools instead? • Can we guarantee that all general tools of that category (and its improved versions) can “crack” any general instance of code obfuscated with a particular obfuscation?

Program Slicing • A reverse engineering technique often used to aid program comprehension. • A slice consists of the program parts that potentially affect the values computed at a particular point. • We will restrict ourselves just to backwards slices and output statements.

Experimental Design • We would like to restrict the usefulness of slicing for program comprehension. • Use CodeSurfer to slice our programs. • We slice our unobfuscated program and use this information to create obfuscations that are targeted to restrict the effectiveness of slicing.

Adding dependencies • We consider the nodes from the SDG (system dependence graph) that are left behind after slicing – we call such nodes the orphans. • Add in obfuscations that create dependencies between the slicing variable and the variables contained within the orphans.
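Slices and orphans can be sketched on a toy dependence table. This is a simplified, flow-insensitive approximation and not what CodeSurfer computes. The table `PROGRAM` describes a small hypothetical program that computes a sum `x` and a product `y`.

```python
# Each statement: (text, variables defined, variables used, controlling statement or None)
PROGRAM = {
    1: ("i = 0",        {"i"}, set(),      None),
    2: ("x = 0",        {"x"}, set(),      None),
    3: ("y = 1",        {"y"}, set(),      None),
    4: ("while i < n:", set(), {"i", "n"}, None),
    5: ("i = i + 1",    {"i"}, {"i"},      4),
    6: ("x = x + i",    {"x"}, {"x", "i"}, 4),
    7: ("y = y * i",    {"y"}, {"y", "i"}, 4),
    8: ("out(x)",       set(), {"x"},      None),
    9: ("out(y)",       set(), {"y"},      None),
}


def backward_slice(program, criterion):
    """Statements that may affect `criterion` via data or control dependences.

    Flow-insensitive: a use of v is taken to depend on every definition of v.
    """
    in_slice, worklist = set(), [criterion]
    while worklist:
        s = worklist.pop()
        if s in in_slice:
            continue
        in_slice.add(s)
        _, _, uses, control = program[s]
        if control is not None:
            worklist.append(control)
        for t, (_, defs, _, _) in program.items():
            if defs & uses:
                worklist.append(t)
    return in_slice


def orphans(program, criterion):
    """Statements left behind after slicing on `criterion`."""
    return set(program) - backward_slice(program, criterion)
```

Here the statements that only concern `x` are the orphans of the slice on `out(y)`; an obfuscation that makes `y` appear to depend on `x` would pull them into the slice.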

A Particular Example As a running example, we will use the following method, which calculates the sum and product of the first n positive integers. [Figure: the method, with the backwards slice from out(y) highlighted.] The goal is to include the orphans in the slice for y.
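A possible Python rendering of such a method is given below. It is a hypothetical reconstruction, not the code from the slides; the `out` parameter is an assumption so that the outputs can be captured. The comments mark the backwards slice from `out(y)` as worked out by hand from data and control dependences.

```python
def sumprod(n, out=print):
    i = 0         # in the slice for out(y)
    x = 0         # orphan
    y = 1         # in the slice
    while i < n:  # in the slice
        i += 1    # in the slice
        x += i    # orphan
        y *= i    # in the slice
    out(x)        # orphan
    out(y)        # slicing criterion
```

For example, `sumprod(5)` outputs the sum 15 followed by the product 120.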

Inserting a Bogus Predicate We can add an opaquely true (or false) predicate so that y appears to depend on x. We use the relationship: [Figures: the relationship; the full method; the method with the slice.]
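A sketch of the idea on a hypothetical sum-and-product method. The predicate `x*(x+1) % 2 == 0` is an assumed stand-in, always true for integers, and not necessarily the relationship used on the slides.

```python
def sumprod_bogus(n, out=print):
    i = 0
    x = 0
    y = 1
    while i < n:
        i += 1
        x += i
        if (x * (x + 1)) % 2 == 0:  # opaquely true, but it mentions x
            y *= i
        else:
            y = x                   # never executed; makes y appear to depend on x
    out(x)
    out(y)
```

Because the update of `y` is now control dependent on a test of `x`, a slicer has to include the definitions of `x` in the slice for `out(y)`.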

Creating a Variable Encoding We can transform y so that the definition of y seems to depend on x. Whenever we define x, we also have to define y. [Figures: the obfuscation; the obfuscation with the slice.]
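A sketch of one such encoding on a hypothetical sum-and-product method, assuming y is stored as `y_enc = y + x`. The choice of encoding is an assumption and not necessarily the one on the slides.

```python
def sumprod_encoded(n, out=print):
    i = 0
    x = 0
    y_enc = 1 + x                # encodes y = 1
    while i < n:
        i += 1
        y_enc = (y_enc - x) * i  # decode with the old x and apply y = y * i
        x += i                   # x is redefined ...
        y_enc = y_enc + x        # ... so y must be re-encoded with the new x
    out(x)
    out(y_enc - x)               # decode y for output
```

Every definition of `x` forces a new definition of `y_enc`, so the definitions of `x` become part of the slice for the second output.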

Adding a variable into the loop • We can add a new variable j into the loop that depends on x and y: • Change the guard. • Initialise j so that the loop invariant is maintained. [Figures: the new loop; the new loop with the slice.]
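A sketch on a hypothetical sum-and-product method, assuming the invariant `j == x + y`; the invariant and the update rule for `j` are illustrative choices, not taken from the slides.

```python
def sumprod_extra(n, out=print):
    i = 0
    x = 0
    y = 1
    j = x + y                    # establishes the invariant j == x + y
    while i < n and j == x + y:  # extended guard; the second conjunct always holds
        i += 1
        j = j + i + y * (i - 1)  # equals (x + i) + (y * i), the sum after the updates below
        x += i
        y *= i
    out(x)
    out(y)
```

The guard now reads `x`, `y` and `j`, so everything in the loop body is control dependent on the definitions of `x`.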

Another example Consider the program wc, which counts the number of lines (nl), words (nw) and characters (nc) in a file. [Figure: the backwards slice from nl.] Our goal is to include the orphans in the slice.

An Example Obfuscation As an obfuscation, we add a bogus predicate (that is always false) to create dependencies.
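A sketch of this on a simplified word counter. The function is a hypothetical reconstruction that works on a string rather than a file, and the predicate `nc*nc % 4 == 2` is an assumed opaquely false test (squares are 0 or 1 modulo 4).

```python
def wc(text):
    nl = nw = nc = 0
    in_word = False
    for ch in text:
        nc += 1
        if ch == "\n":
            nl += 1
        if ch in " \n\t":
            in_word = False
        elif not in_word:
            in_word = True
            nw += 1
        # Bogus predicate, always false: a square is never 2 modulo 4.
        if (nc * nc) % 4 == 2:
            nl = nw + nc  # never executed; makes nl appear to depend on nw and nc
    return nl, nw, nc
```

The dead assignment gives `nl` apparent data dependences on `nw` and `nc`, which pulls the word-counting statements into the slice for `nl`.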
