
Lecture 16


laban

Presentation Transcript


  1. Lecture 16 Software Reverse Engineering

  2. Grading • Algorithm for deciding your final grades: • Final score: 10% class participation + 40% homework + 50% project • Rank the list: around 50% will receive an A (subject to change); the rest will receive a B • Project grading: • Signup (2%) + Proposal (10%) + Mid-point check (8%): 20% • Overall score: 80% • Presentation (10%) • Documents (30%) • Quality of your (part of the) work (40%): your individual score • One-person group: you can do less work, but the quality should be good

  3. Previous Question

  4. A Roadmap for Today • Software reverse engineering: an introduction • Static approaches: a case study • Dynamic approaches: a case study • Reverse engineering tools

  5. What is Software Reverse Engineering • Determining the structure or behavior of software by building static or dynamic models • The process of analyzing a subject system to create representations of the system at a higher level of abstraction [Chikofsky90] • Goals: • Understand malware: security • Understand legacy code: software maintenance • Input: source code or binary code • Output: invariants, architecture, API rules, … • The code is not changed

  6. Why Reverse Engineering? • Software maintenance is "modification of a software to correct faults, to improve performance or to adapt to a changed environment" (ANSI/IEEE Std 729). • Software maintenance accounts for 50% to 90% of total costs in the software life cycle. • Reverse engineering is part of the maintenance process and can facilitate it: through reverse engineering, cost can be reduced and value can be added.

  7. Applications of Reverse Engineering • Program comprehension, visualization • Software reuse • Document • Design discovery • Software verification • Modify software • Change of the environment • Redesign the software

  8. Software Reverse Engineering Overall Approaches • Two general techniques: static and dynamic analysis • Static analysis: search source code • Dynamic analysis: running programs with given input, and collect and analyze runtime information • Two steps: • Collect info • Compilation phases (from source code) • Profilers, logs, debuggers • Abstract info and build models • Mining understandable, high-level models

  9. Static models • Based on code structure, dependency, architecture • Example models: • Class diagrams • Design patterns • Dependency graphs at the levels of components, functions and variables • Contracts • Aspects

  10. Static Approaches • Static: processes only the source code and does not execute the program • Advantages: • No executables required • No input needed • Types of static analyses (some of them done in compilers) • Control and data flow analysis • Type checking: types and the set of operations associated with each type • Dependency analysis • Slicing and dicing (different ways to partition the software)
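As a rough illustration of dependency analysis done purely statically, the sketch below parses source text with Python's standard `ast` module and records which functions each top-level function calls by name. The sample program in `SOURCE` and the helper `call_dependencies` are hypothetical, introduced only for this example; nothing is executed, which is the point of the static approach.

```python
import ast

# Hypothetical program under analysis; it is parsed, never run.
SOURCE = '''
def parse(text):
    return text.split()

def count(text):
    return len(parse(text))

def report(text):
    print(count(text))
'''

def call_dependencies(source):
    """Map each top-level function to the sorted names it calls directly."""
    tree = ast.parse(source)
    deps = {}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            callees = set()
            for sub in ast.walk(node):
                # Only plain-name calls such as f(x); method calls like
                # text.split() are ignored in this simplified sketch.
                if isinstance(sub, ast.Call) and isinstance(sub.func, ast.Name):
                    callees.add(sub.func.id)
            deps[node.name] = sorted(callees)
    return deps

DEPS = call_dependencies(SOURCE)
```

Ignoring method calls keeps the sketch short but also shows where static analysis loses precision: resolving `text.split()` would need type information that the syntax tree alone does not carry.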

  11. Case study: Static Control-Flow Analysis for Reverse Engineering of UML Sequence Diagrams • Existing work: UML class diagram and UML sequence diagram • Tools: Together ControlCenter by Borland and Eclipse UML by Omondo • UML sequence diagram: • Software understanding • used for testing - interactions among collaborating objects

  12. A Graph Representation for a Program • Control flow graph (CFG): <N, E> • N: a set of statements in a program • E: control transfers between two statements • Example code: bar(); s = (char*)malloc(80); x[10] = '\0'; if (strlen(t) < 8) strcpy(s,t); else strcat(x,t); • CFG nodes: 1: bar() • 2: s = (char*)malloc(80) • 3: x[10] = '\0' • 4: strlen(t) < 8 • 5: strcpy(s,t) • 6: strcat(x,t) • CFG edges: 1 -> 2, 2 -> 3, 3 -> 4, 4 -> 5 (yes), 4 -> 6 (no)
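The CFG on this slide can be written down as a plain adjacency map. The sketch below is one possible encoding, assuming the node numbering 1 to 6 shown on the slide and edges read off the example code; the helper `entry_to_exit_paths` is a hypothetical name and only handles acyclic graphs like this one.

```python
# Node labels follow the numbering on the slide.
CFG_NODES = {
    1: "bar()",
    2: "s = (char*)malloc(80)",
    3: "x[10] = '\\0'",
    4: "strlen(t) < 8",
    5: "strcpy(s,t)",
    6: "strcat(x,t)",
}

# E: control transfers; node 4 branches to 5 (yes) and 6 (no).
CFG_EDGES = {1: [2], 2: [3], 3: [4], 4: [5, 6], 5: [], 6: []}

def entry_to_exit_paths(edges, start):
    """Enumerate every path from start to a node without successors."""
    pending = [[start]]
    complete = []
    while pending:
        path = pending.pop()
        successors = edges[path[-1]]
        if not successors:
            complete.append(path)
        for nxt in successors:
            if nxt not in path:  # guard against revisiting a node
                pending.append(path + [nxt])
    return sorted(complete)

PATHS = entry_to_exit_paths(CFG_EDGES, 1)
```

The two resulting paths correspond to the two outcomes of the `strlen(t) < 8` test.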

  13. Case study: Static Control-Flow Analysis for Reverse Engineering of UML Sequence Diagrams • Two challenges: • How should a CFG be mapped to UML? • Is UML 2.0 expressive enough to specify the discovered control flows? • Sequence diagram: objects and message exchange • Four types of control primitives: • Opt • Loop • Alt • Break

  14. Analysis Algorithm

  15. Control Flow Analysis • Control flow analysis • Find a branch node (alt/opt edges) • Find a merge point • Find the header of the loop • Find all of the loop exit edges • Exceptional flow is not considered, because any program point can potentially throw an asynchronous exception in Java
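The four steps listed on this slide can be sketched on a small graph. Everything below is an illustrative assumption rather than the case study's actual algorithm: the CFG is hypothetical (an if/else followed by a loop), the merge point is taken to be the immediate post-dominator of the branch node, and loop headers are found as targets of DFS back edges.

```python
# Hypothetical CFG: 2 is an if/else, 5 is a loop header, 7 is the exit.
CFG = {1: [2], 2: [3, 4], 3: [5], 4: [5], 5: [6, 7], 6: [5], 7: []}
ENTRY, EXIT = 1, 7

def branch_nodes(cfg):
    """Nodes with more than one outgoing edge."""
    return sorted(n for n, succs in cfg.items() if len(succs) > 1)

def back_edges(cfg, entry):
    """Edges whose target is still on the DFS stack; targets are loop headers."""
    on_stack, visited, found = set(), set(), []
    def dfs(n):
        visited.add(n)
        on_stack.add(n)
        for s in cfg[n]:
            if s in on_stack:
                found.append((n, s))
            elif s not in visited:
                dfs(s)
        on_stack.discard(n)
    dfs(entry)
    return found

def post_dominators(cfg, exit_node):
    """Iterate pdom(n) = {n} | intersection of pdom(s) over successors s."""
    pdom = {n: set(cfg) for n in cfg}
    pdom[exit_node] = {exit_node}
    changed = True
    while changed:
        changed = False
        for n in cfg:
            if n == exit_node:
                continue
            new = {n} | set.intersection(*(pdom[s] for s in cfg[n]))
            if new != pdom[n]:
                pdom[n], changed = new, True
    return pdom

def merge_point(pdom, node):
    """Closest strict post-dominator of node."""
    return max(pdom[node] - {node}, key=lambda d: len(pdom[d]))

def loop_exit_edges(cfg, tail, header):
    """Edges leaving the natural loop of the back edge tail -> header."""
    preds = {n: [] for n in cfg}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].append(n)
    loop, work = {header}, [tail]
    while work:
        n = work.pop()
        if n not in loop:
            loop.add(n)
            work.extend(preds[n])
    return [(n, s) for n in sorted(loop) for s in cfg[n] if s not in loop]

BRANCHES = branch_nodes(CFG)
BACK = back_edges(CFG, ENTRY)
MERGE_OF_2 = merge_point(post_dominators(CFG, EXIT), 2)
EXITS = loop_exit_edges(CFG, *BACK[0])
```

In this graph the branch at node 2 merges at node 5, which would map to an alt fragment, while the back edge into node 5 would map to a loop fragment with a single exit edge.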

  16. Design Decisions for Analysis • Tradeoffs between precision and size of the sequence diagram: mapping with replication is fully precise

  17. Design Decisions for Analysis • Not precise, no replication

  18. Dependency Analysis

  19. Dependency Graph

  20. Dependency Graph

  21. Applications of Dependency Graphs • Security check • Guidance for refactoring • Regression Testing

  22. Summary for Static Techniques • Advantages and disadvantages of static approaches: • No executable and no input needed • Potentially imprecise: e.g., infeasible paths • References: • Case study: static control-flow analysis for reverse engineering of UML sequence diagrams • Dependency: combining slicing and constraint solving for validation of measurement software

  23. Abstracting the dynamic model • Finding behavior patterns, repeating sequences of events • E.g. socket protocol, secure API sequences • Using static abstractions • E.g. representing interactions between high-level software elements in sequence diagrams • Dynamic information is combined with the high-level static model

  24. Dynamic models • Finding out the run-time behaviour of software • debugger, profiler, source code instrumentation • Visualisation: • scenarios (sequence diagrams) • State diagrams • (hierarchical) graphs
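Source code instrumentation can be sketched with a decorator that logs each call as a (caller, callee) event, which is the raw material for a scenario or sequence diagram. The decorator `traced`, the global `TRACE`, and the socket-like functions are all hypothetical stand-ins chosen for this example.

```python
import functools

TRACE = []          # recorded (caller, callee) events, in call order
_stack = ["main"]   # names of the currently active traced functions

def traced(func):
    """Instrument func so each call is appended to TRACE."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        TRACE.append((_stack[-1], func.__name__))
        _stack.append(func.__name__)
        try:
            return func(*args, **kwargs)
        finally:
            _stack.pop()
    return wrapper

@traced
def open_socket():
    return "sock"

@traced
def send(sock, data):
    return len(data)

@traced
def close(sock):
    return None

@traced
def upload(data):
    sock = open_socket()
    sent = send(sock, data)
    close(sock)
    return sent

RESULT = upload("hello")
```

Each recorded pair corresponds to one message arrow in a sequence diagram; the trace only covers the behavior exercised by this one run.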

  25. Other Information can be Found Using Dynamic Approaches • Object creation and related dependencies • Dynamic binding, polymorphism • Method calls (virtual calls and function pointers) • Looking for dead code/reachability analysis • Memory management • Performance and related problems • Concurrency

  26. Case Study: dynamic analysis to find program invariants • Program invariant: a property that holds at a certain point or points of a program • Dynamic invariant detection: runs a program, observes the values that the program computes, and reports the properties that were true over the observed executions • Types of invariants • Constant • Non-zero • Range: a < x < b • Linear: y = ax+b • Ordering: a than b ……
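A toy version of dynamic invariant detection can be written in a few lines. The sketch below is not Daikon's implementation: it checks a handful of the invariant templates listed on the slide (constant, non-zero, range, linear, ordering) against observed samples, reports ranges with inclusive bounds, and skips constant variables in pairwise checks. The sample data and function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def linear_relation(xs, ys):
    """Return (a, b) with y == a*x + b on every sample, or None."""
    other = next(((x, y) for x, y in zip(xs, ys) if x != xs[0]), None)
    if other is None:
        return None
    a = Fraction(other[1] - ys[0], other[0] - xs[0])
    b = ys[0] - a * xs[0]
    return (a, b) if all(y == a * x + b for x, y in zip(xs, ys)) else None

def infer_invariants(samples):
    """Report properties that held over all observed samples."""
    names = sorted(samples[0])
    columns = {v: [s[v] for s in samples] for v in names}
    found, varying = [], []
    for v in names:
        values = columns[v]
        if len(set(values)) == 1:            # constant
            found.append(f"{v} == {values[0]}")
            continue
        varying.append(v)
        if all(x != 0 for x in values):      # non-zero
            found.append(f"{v} != 0")
        found.append(f"{min(values)} <= {v} <= {max(values)}")  # range
    for x, y in combinations(varying, 2):
        xs, ys = columns[x], columns[y]
        if all(a <= b for a, b in zip(xs, ys)):      # ordering
            found.append(f"{x} <= {y}")
        elif all(a >= b for a, b in zip(xs, ys)):
            found.append(f"{x} >= {y}")
        fit = linear_relation(xs, ys)                # linear
        if fit is not None:
            found.append(f"{y} == {fit[0]} * {x} + {fit[1]}")
    return found

# Hypothetical values observed at one program point over four executions.
SAMPLES = [
    {"i": 0, "n": 1, "size": 10},
    {"i": 1, "n": 3, "size": 10},
    {"i": 2, "n": 5, "size": 10},
    {"i": 3, "n": 7, "size": 10},
]
INVARIANTS = infer_invariants(SAMPLES)
```

The reported properties are only as good as the samples: one additional observation with `n == 0` would invalidate the non-zero, ordering, and linear results.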

  27. Case Study: dynamic analysis to find program invariants • Use of the invariants: • Generate test inputs, predict incompatibilities of component integrations, repairing inconsistent data structures, check correctness • Reference: http://pag.csail.mit.edu/daikon

  28. A stack example
      Fields:
        Object[] theArray;    // Array that contains the stack elements.
        int topOfStack;       // Index of top element. -1 if stack is empty
      Methods:
        void push(Object x)   // Insert x
        void pop()            // Remove most recently inserted item
        Object top()          // Return most recently inserted item
        Object topAndPop()    // Remove and return most recently inserted item
        boolean isEmpty()     // Return true if empty; else false
        boolean isFull()      // Return true if full; else false
        void makeEmpty()      // Remove all items

  29. Steps to Run Daikon to Infer Invariants for Stack • Create a simple test class: StackArTester • Daikon instruments the code and analyzes the resulting execution traces • It outputs procedure pre/post-conditions and also object invariants that hold at every public method entry and exit

  30. Daikon Output for the Stack Example
      Object invariants for StackAr
        this.theArray != null
        this.theArray.getClass() == java.lang.Object[].class
        this.topOfStack >= -1
        this.topOfStack <= this.theArray.length - 1
        this.theArray[0..this.topOfStack] elements != null
        this.theArray[this.topOfStack+1..] elements == null
      Pre-conditions for the StackAr constructor
        capacity >= 0
      Post-conditions for the StackAr constructor
        orig(capacity) == this.theArray.length
        this.topOfStack == -1
        this.theArray[] elements == null
      Post-conditions for the isFull method
        this.theArray == orig(this.theArray)
        this.theArray[] == orig(this.theArray[])
        this.topOfStack == orig(this.topOfStack)
        (return == false) <==> (this.topOfStack < this.theArray.length - 1)
        (return == true) <==> (this.topOfStack == this.theArray.length - 1)
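To see what the reported object invariants mean operationally, the sketch below re-checks four of them after every operation on a small array-backed stack. This is a hypothetical Python port written for illustration, not the Java StackAr class that Daikon analyzed; the names `the_array`, `top_of_stack`, and `object_invariants_hold` are assumptions.

```python
class StackAr:
    """Hypothetical Python port of an array-backed, fixed-capacity stack."""

    def __init__(self, capacity):
        self.the_array = [None] * capacity
        self.top_of_stack = -1

    def is_empty(self):
        return self.top_of_stack == -1

    def is_full(self):
        return self.top_of_stack == len(self.the_array) - 1

    def push(self, x):
        if self.is_full():
            raise OverflowError("stack is full")
        self.top_of_stack += 1
        self.the_array[self.top_of_stack] = x

    def top_and_pop(self):
        if self.is_empty():
            return None
        item = self.the_array[self.top_of_stack]
        self.the_array[self.top_of_stack] = None
        self.top_of_stack -= 1
        return item

def object_invariants_hold(stack):
    """Check the object invariants listed on the slide (except getClass)."""
    top = stack.top_of_stack
    return (stack.the_array is not None
            and -1 <= top <= len(stack.the_array) - 1
            and all(e is not None for e in stack.the_array[:top + 1])
            and all(e is None for e in stack.the_array[top + 1:]))

def run_scenario():
    """Run a fixed operation sequence, checking invariants after each step."""
    stack = StackAr(3)
    results = [object_invariants_hold(stack)]
    operations = [("push", "a"), ("push", "b"), ("pop", None), ("push", "c"),
                  ("push", "d"), ("pop", None), ("pop", None), ("pop", None),
                  ("pop", None)]
    for name, arg in operations:
        if name == "push":
            stack.push(arg)
        else:
            stack.top_and_pop()
        results.append(object_invariants_hold(stack))
    return all(results)
```

Pushing `None` would break the "elements != null" property even though the stack still works, which illustrates that dynamically inferred invariants describe the observed executions rather than every legal use of the class.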

  31. Daikon Internal design • Grammar of variables: global, input, parameters, return • Grammar of predicates: (75 templates) • conditional predicate • supplied template • Program points: entry and exit

  32. Daikon Internal Structures • Instrumenters (language dependent) • Inference engine (generate-and-check algorithm) • Test a set of parameters against traces • Assume all invariants are possible and then exclude the ones that are contradicted by the observed values • Optimizations • Equal variables • Dynamically constant variables • Suppress weaker invariants • Variable hierarchy

  33. Summary for Dynamic Techniques • Need a set of good test cases • Scalability challenges • Precise techniques

  34. Reverse engineering for OO software • Dynamic behavior may be hard to detect from a static model (creating and deleting objects, garbage collection, dynamic binding, …) -> this emphasises dynamic modelling • Pure object languages support encapsulation (classes, packages, …) -> helps in static reverse engineering -> increases usability of metrics • The OO paradigm supports the use of design patterns -> reusability applications (pattern recognition)

  35. Tools • Rigi (University of Victoria, Canada) • http://www.rigi.csc.uvic.ca/ • a research prototype that represents an open and public domain reverse engineering tool • user programmable • analysis for: C, C++, COBOL, PL/AS, LaTeX • SNIFF+ (TakeFive Software) • a software development environment that also provides reverse engineering capabilities

  36. Tools • McCabe’s Visual Reengineering Toolset and Visual Quality Toolset • various views • software metrics (complexity and structuredness) • shown as specific colors on the views • Logiscope (CS Verilog) • reverse eng, code testing, static and dynamic testing, metrics • analysis for: C, C++, Java, ADA • ESW (Viasoft Inc.) • forward and reverse engineering (maintenance), metrics, testing

  37. Tools • Refine (Reasoning Systems Inc.) • an open and programmable tool that works in the Refinery environment • tools for generating source code parsers and conversion tools • features for analyzing and re-engineering code • analysis for: Ada, C, Cobol • Imagix4D (Imagix Corp.) • http://www.powersoftware.com/english/im/index.html • a closed tool that provides a large set of built-in functionalities • several views (also 3D) • analysis for: C/C++

  38. CodeCrawler • a reverse engineering tool that combines metrics and graphs to visualize OO systems

  39. Tools for OO languages • Produce a class diagram from code • Rational Rose (Rational Software Corp.) • Paradigm Plus (Computer Associates International) • OEW (Innovative Software GmbH) • Graphical Designer (Advanced Software Technologies Inc.) • Domain Objects (Domain Objects Inc.) • COOL:Jex (Sterling Software Inc.) • Fujaba (Paderborn University) • ...

  40. Wild & Crazy Ideas • How good does software need to be? • As a consumer, you would feel comfortable taking an airplane with a failure rate of: A: 0 • B: 0.000001 and below • C: 0.001 and below • D: 0.01 and below • What about stock software, mobile phones, …
