Program Analysis, Representation, and Transformation

Program Analysis

  • Extracting information from a software system in order to present abstractions of it, or to answer questions about it

  • Static Analysis: Examines the source code

  • Dynamic Analysis: Examines the system as it is executing

COSC6431


What are we looking for?

  • Depends on our goals and the system

    • In almost any language, we can find out information about variable usage

    • In an OO environment, we can find out which classes use other classes, which classes form the base of an inheritance hierarchy, etc.

    • We can also find potentially dead code: blocks of code that can never be executed when the program runs

    • Typically, the information extracted is in terms of entities and relationships

Entities

  • Entities are the individual elements that live in the system, together with the attributes associated with them.

    Some examples:

    • Classes, along with information about their superclass, their scope, and ‘where’ in the code they exist.

    • Methods/functions, along with their return type, parameter list, etc.

    • Variables, along with their types, whether or not they are static, etc.

Relationships

  • Relationships are interactions between the entities in the system.

    Relationships include:

    • Classes inheriting from one another.

    • Methods in one class calling the methods of another class, and methods within the same class calling one another.

    • One variable referencing another variable.

Information format

  • Many different formats are in use

  • Simple but effective: RSF

        inherit TRIANGLE SHAPE

  • TA is an extension of RSF that includes a schema

        $INSTANCE SHAPE Class

  • GXL is an XML-based extension of TA

    • Blow-up factor of 10 or more makes it rather cumbersome
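To illustrate why RSF is simple but effective, here is a minimal C++ sketch that reads RSF-style facts (one `relation subject object` triple per line, as in `inherit TRIANGLE SHAPE`) and answers a query over them. The helper names `parseRsf` and `subjectsOf`, and the extra facts in the usage example, are hypothetical; real RSF files may use quoting and other details this sketch ignores.

```cpp
#include <sstream>
#include <string>
#include <vector>

// One RSF fact: a relation name followed by two entity names.
struct Fact {
    std::string relation;
    std::string subject;
    std::string object;
};

// Hypothetical helper: parse whitespace-separated triples, one per line.
// Lines that do not contain exactly three tokens are skipped.
std::vector<Fact> parseRsf(const std::string& text) {
    std::vector<Fact> facts;
    std::istringstream lines(text);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream tokens(line);
        Fact f;
        std::string extra;
        if ((tokens >> f.relation >> f.subject >> f.object) && !(tokens >> extra)) {
            facts.push_back(f);
        }
    }
    return facts;
}

// Hypothetical query: every subject related to `object` by `relation`,
// e.g. every class that inherits from SHAPE.
std::vector<std::string> subjectsOf(const std::vector<Fact>& facts,
                                    const std::string& relation,
                                    const std::string& object) {
    std::vector<std::string> result;
    for (const Fact& f : facts) {
        if (f.relation == relation && f.object == object) {
            result.push_back(f.subject);
        }
    }
    return result;
}
```

For example, after parsing `inherit TRIANGLE SHAPE` and `inherit CIRCLE SHAPE`, the call `subjectsOf(facts, "inherit", "SHAPE")` returns TRIANGLE and CIRCLE in file order. Because every fact has the same three-column shape, queries like this need no schema, which is also what TA adds on top.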

Static Analysis

  • Involves parsing the source code

  • Usually creates an Abstract Syntax Tree

  • Borrows heavily from compiler technology but stops before code generation

  • Requires a grammar for the programming language

  • Can be very difficult to get right

CppETS

  • CppETS is a benchmark for C++ extractors

  • It consists of a collection of C++ programs that pose various problems commonly found in parsing and reverse engineering

  • Static analysis research tools typically get about 60% of the problems right

Example program

#include <iostream>

class Hello {
public:
    Hello();
    ~Hello();
};

Hello::Hello()
{ std::cout << "Hello, world.\n"; }

Hello::~Hello()
{ std::cout << "Goodbye, cruel world.\n"; }

int main() {
    Hello h;
    return 0;
}

Example Q&A

  • How many member methods are in the Hello class? Answer: Two, the constructor (Hello::Hello()) and destructor (Hello::~Hello()).

  • Where are these member methods used? Answer: The constructor is called implicitly when an instance of the class is created (here, at the declaration of h in main()). The destructor is called implicitly when execution leaves the scope of the instance (here, when main() returns).

Static analysis in IDEs

  • High-level languages lend themselves better to static analysis needs

    • EiffelStudio automatically creates BON diagrams of the static structure of Eiffel systems

    • Rational Rose does the same with UML and Java

  • Unfortunately, most legacy systems are not written in either of these languages

Static analysis pipeline

[Figure: static analysis pipeline]

Source code → Parser → Abstract Syntax Tree → Fact extractor → Fact base

Fact base → Clustering algorithm, Visualizer, Metrics tool

Dynamic Analysis

  • Provides information about the run-time behaviour of software systems, e.g.

    • Component interactions

    • Event traces

    • Concurrent behaviour

    • Code coverage

    • Memory management

  • Can be done with a profiler or a debugger

Instrumentation

  • Augments the subject program with code that transmits events to a monitoring application, or writes relevant information to an output file

  • A profiler can be used to examine the output file and extract relevant facts from it

  • Instrumentation affects the execution speed and storage space requirements of the system
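The sketch below shows one simple form of source-level instrumentation in C++, assuming the instrumentation is inserted by hand rather than by an annotator tool: a hypothetical RAII guard, `TraceScope`, records an enter event when an instrumented function starts and an exit event when it returns. Events are appended to an in-memory log (`traceLog`, also hypothetical) that stands in for the output file a profiler would read.

```cpp
#include <string>
#include <utility>
#include <vector>

// Event log written by the instrumentation (stands in for an output file).
std::vector<std::string> traceLog;

// Hypothetical RAII guard: logs "enter" on construction and "exit" on destruction.
class TraceScope {
public:
    explicit TraceScope(std::string name) : name_(std::move(name)) {
        traceLog.push_back("enter " + name_);
    }
    ~TraceScope() { traceLog.push_back("exit " + name_); }
    TraceScope(const TraceScope&) = delete;
    TraceScope& operator=(const TraceScope&) = delete;

private:
    std::string name_;
};

// Subject program functions, each augmented with one line of instrumentation.
int square(int x) {
    TraceScope trace("square");
    return x * x;
}

int sumOfSquares(int a, int b) {
    TraceScope trace("sumOfSquares");
    return square(a) + square(b);
}
```

Calling `sumOfSquares(2, 3)` logs six events (enter/exit for `sumOfSquares` and for each call to `square`); those extra writes on every call are a small-scale instance of the speed and storage overhead mentioned above.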

Instrumentation process

[Figure: instrumentation process]

Source code + Annotation script → Annotator → Annotated program → Compiler → Instrumented executable

Dynamic analysis pipeline

[Figure: dynamic analysis pipeline]

Instrumented executable → CPU → Dynamic analysis data → Profiler → Fact base

Fact base → Clustering algorithm, Visualizer, Metrics tool

Non-instrumented approach

  • One can also use debugger log files to obtain dynamic information

  • Disadvantage: Limited amount of information provided

  • Advantage: Less intrusive approach, more accurate performance measurements

Dynamic analysis issues

  • Ensuring good code coverage is a key concern

  • A comprehensive test suite is required to ensure that all paths in the code will be exercised

  • Results may not generalize to future executions

Static vs. Dynamic

  • Static analysis

    • Reasons over all possible behaviours (general results)

    • Conservative and sound

    • Challenge: Choose good abstractions

  • Dynamic analysis

    • Observes a small number of behaviours (specific results)

    • Precise and fast

    • Challenge: Select representative test cases

Program Representation

  • Fundamental issue in re-engineering

    • Provides means to generate abstractions

    • Provides input to a computational model for analyzing and reasoning about programs

    • Provides means for translation and normalization of programs

Key questions

  • What are the strengths and weaknesses of various representations of programs?

  • What levels of abstraction are useful?

Representation schemes

  • Chosen based on objectives and tasks to be performed. Popular ones are:

    • Abstract syntax trees

    • Control Flow Graphs

    • Data Flow Graphs

    • Structure Charts

    • Module Dependency Graphs

Abstract Syntax Trees

  • A translation of the source text in terms of operands and operators

  • Omits superficial details, such as comments and whitespace

  • All necessary information to generate further abstractions is maintained

AST production

  • Four necessary elements to produce an AST:

    • Lexical analyzer (turn input strings into tokens)

    • Grammar (turn tokens into a parse tree)

    • Domain Model (defines the nodes and arcs allowable in the AST)

    • Linker (annotates the AST with global information, e.g. data types, scoping etc.)

AST example

  • Input string: 1 + /* two */ 2

  • Parse tree:

          +
         / \
        1   2

  • AST (without global info):

        Add
          arg1 → int 1
          arg2 → int 2
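To make the first steps of AST production concrete, here is a minimal C++ sketch for the example input: a lexer that discards whitespace and `/* ... */` comments, and a small hand-written parser for the assumed grammar `expr := INT ('+' INT)*` that builds `Add` nodes with `arg1`/`arg2` children. Only integer literals and `+` are handled, and the domain-model and linker steps are left out; all names are illustrative.

```cpp
#include <cctype>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Lexical analysis: turn the input into tokens, dropping whitespace and comments.
std::vector<std::string> lex(const std::string& src) {
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        if (std::isspace(static_cast<unsigned char>(src[i]))) {
            ++i;
        } else if (src.compare(i, 2, "/*") == 0) {
            std::size_t end = src.find("*/", i + 2);
            if (end == std::string::npos) throw std::runtime_error("unterminated comment");
            i = end + 2;  // skip the whole comment
        } else if (std::isdigit(static_cast<unsigned char>(src[i]))) {
            std::size_t start = i;
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) ++i;
            tokens.push_back(src.substr(start, i - start));
        } else if (src[i] == '+') {
            tokens.push_back("+");
            ++i;
        } else {
            throw std::runtime_error("unexpected character");
        }
    }
    return tokens;
}

// AST node: either an int literal or an Add with arg1/arg2.
struct Node {
    std::string kind;  // "int" or "Add"
    int value = 0;     // used when kind == "int"
    std::unique_ptr<Node> arg1, arg2;
};

// Hand-written parser for expr := INT ('+' INT)*, building a left-associative tree.
std::unique_ptr<Node> parse(const std::vector<std::string>& tokens) {
    std::size_t pos = 0;
    auto literal = [&]() {
        if (pos >= tokens.size() || tokens[pos] == "+") throw std::runtime_error("expected int");
        auto n = std::make_unique<Node>();
        n->kind = "int";
        n->value = std::stoi(tokens[pos++]);
        return n;
    };
    std::unique_ptr<Node> tree = literal();
    while (pos < tokens.size() && tokens[pos] == "+") {
        ++pos;
        auto add = std::make_unique<Node>();
        add->kind = "Add";
        add->arg1 = std::move(tree);
        add->arg2 = literal();
        tree = std::move(add);
    }
    if (pos != tokens.size()) throw std::runtime_error("trailing tokens");
    return tree;
}

// Print the AST in prefix form, e.g. Add(int 1, int 2).
std::string show(const Node& n) {
    if (n.kind == "int") return "int " + std::to_string(n.value);
    return "Add(" + show(*n.arg1) + ", " + show(*n.arg2) + ")";
}
```

`show(*parse(lex("1 + /* two */ 2")))` yields `Add(int 1, int 2)`: the comment never reaches the parser, which is why it does not appear in the tree.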

Control Flow Graphs

  • Offer a way to eliminate variations in control statements by providing a normalized view of the possible flow of execution of a program

  • To produce a CFG:

    • AST of the program

    • Decomposition of the program into basic blocks

    • Basic semantics on the control statements of the language
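One common way to decompose a program into basic blocks is the leader rule: the first instruction, every jump target, and every instruction immediately after a jump start a new block. The sketch below applies that rule to a hypothetical instruction format (`Instr`, with `jmp`/`br` ops and an index-valued `target`) and then adds CFG edges: one to each jump target, plus a fall-through edge unless the block ends in an unconditional jump. It assumes every target is a valid instruction index.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical instruction: "jmp" (unconditional), "br" (conditional), or anything else.
struct Instr {
    std::string op;
    int target = -1;  // index of the jump target for jmp/br, -1 otherwise
};

struct Cfg {
    std::vector<std::size_t> blockStart;                 // first instruction of each block
    std::map<std::size_t, std::set<std::size_t>> succ;   // block index -> successor blocks
};

Cfg buildCfg(const std::vector<Instr>& code) {
    // 1. Find the leaders.
    std::set<std::size_t> leaders;
    if (!code.empty()) leaders.insert(0);
    for (std::size_t i = 0; i < code.size(); ++i) {
        if (code[i].op == "jmp" || code[i].op == "br") {
            leaders.insert(static_cast<std::size_t>(code[i].target));
            if (i + 1 < code.size()) leaders.insert(i + 1);
        }
    }
    Cfg cfg;
    cfg.blockStart.assign(leaders.begin(), leaders.end());

    // Map each leader to the index of the block it starts.
    std::map<std::size_t, std::size_t> blockOf;
    for (std::size_t b = 0; b < cfg.blockStart.size(); ++b) blockOf[cfg.blockStart[b]] = b;

    // 2. Add edges based on the last instruction of each block.
    for (std::size_t b = 0; b < cfg.blockStart.size(); ++b) {
        std::size_t end = (b + 1 < cfg.blockStart.size()) ? cfg.blockStart[b + 1] : code.size();
        const Instr& last = code[end - 1];
        cfg.succ[b];  // make sure every block has an entry, even with no successors
        if (last.op == "jmp" || last.op == "br") {
            cfg.succ[b].insert(blockOf.at(static_cast<std::size_t>(last.target)));
        }
        if (last.op != "jmp" && end < code.size()) {
            cfg.succ[b].insert(b + 1);  // fall-through edge
        }
    }
    return cfg;
}
```

For a simple loop such as `0: x = 0; 1: br 4; 2: x = x + 1; 3: jmp 1; 4: return`, the blocks start at instructions 0, 1, 2 and 4, and the block holding the `br` has two successors: the loop body and the exit.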

Data Flow Graphs

  • Focus mostly on the exchange of information between program components, i.e. basic blocks, functions, modules

  • To produce a DFG:

    • AST of the program

    • Decomposition of the program into basic blocks (or a coarser-grained level)

    • Annotations on uses and definitions of variables
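Once uses and definitions are annotated, the data-flow edges inside one basic block can be found by linking each use of a variable to its most recent definition. The sketch below does this for a hypothetical `Stmt` record listing the variables each statement defines and uses. Across branches, a full DFG would instead need a reaching-definitions analysis over the CFG, which is not shown.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// A statement annotated with the variables it defines and uses.
struct Stmt {
    std::vector<std::string> defs;
    std::vector<std::string> uses;
};

// A data-flow edge: statement `from` defines `var`, statement `to` uses it.
struct Edge {
    std::size_t from;
    std::size_t to;
    std::string var;
    bool operator==(const Edge& o) const {
        return std::tie(from, to, var) == std::tie(o.from, o.to, o.var);
    }
};

// Link every use to the latest preceding definition within one basic block.
// Uses with no definition in the block (e.g. parameters) produce no edge.
std::vector<Edge> defUseEdges(const std::vector<Stmt>& block) {
    std::map<std::string, std::size_t> lastDef;
    std::vector<Edge> edges;
    for (std::size_t i = 0; i < block.size(); ++i) {
        for (const std::string& v : block[i].uses) {
            auto it = lastDef.find(v);
            if (it != lastDef.end()) edges.push_back({it->second, i, v});
        }
        // Record definitions after the uses, so "a = a * b" reads the earlier a.
        for (const std::string& v : block[i].defs) lastDef[v] = i;
    }
    return edges;
}
```

For `a = 1; b = a + 2; a = a * b; c = a`, it produces the edges 0→1 (a), 0→2 (a), 1→2 (b) and 2→3 (a).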

Structure charts

  • Represent data and control information in a concise and compact form

  • To produce a structure chart:

    • The CFG of the program

    • The DFG of the program

Module Dependency Graphs

  • The most common way to represent data coupling and data dependencies between program and system entities

  • To produce an MDG:

    • Structure chart of the program

    • Information on parameter passing between procedures and functions

    • Containment information
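As a rough illustration of how containment information turns lower-level facts into module-level dependencies, the sketch below lifts function-level call facts to module pairs and weights each edge by the number of calls. Using calls as a stand-in for the structure chart, and omitting the parameter-passing details, are simplifications made for this sketch; the function and module names in the example are hypothetical.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

using ModuleEdge = std::pair<std::string, std::string>;  // (from module, to module)

// Lift function-level calls to module-level dependencies using containment facts.
// Calls between functions of the same module are not recorded as dependencies.
std::map<ModuleEdge, int> moduleDependencies(
        const std::map<std::string, std::string>& containedIn,          // function -> module
        const std::vector<std::pair<std::string, std::string>>& calls)  // caller -> callee
{
    std::map<ModuleEdge, int> mdg;
    for (const auto& [caller, callee] : calls) {
        const std::string& from = containedIn.at(caller);
        const std::string& to = containedIn.at(callee);
        if (from != to) ++mdg[ModuleEdge(from, to)];  // edge weight = number of calls
    }
    return mdg;
}
```

With `main` in module `app`, `parse` and `lex` in `parser`, and `log` in `util`, the calls main→parse, parse→lex, main→log, parse→log and lex→log give three module edges: app→parser (1), app→util (1) and parser→util (2).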

Program Transformation

  • A program is a structured object with semantics

  • Structure allows us to transform a program

  • Semantics allow us to compare programs and decide on the validity of transformations

Program Transformation

  • The act of changing one program into another (from a source language to a target language)

  • Used in many areas of software engineering:

    • Compiler construction

    • Software visualization

    • Documentation generation

    • Automatic software renovation

Application examples

  • Converting to a new language dialect

  • Migrating from a procedural language to an object-oriented one, e.g. C to C++

  • Adding code comments

  • Requirement upgrading, e.g. using 4 digits for years instead of 2 (Y2K)

  • Structural improvements, e.g. changing GOTOs to control structures

  • Pretty printing

Simple program transformation

  • Modify all arithmetic expressions to reduce the number of parentheses using the formula: (a+b)*c = a*c + b*c

        x := (2+5)*3

    becomes

        x := 2*3 + 5*3
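A minimal C++ sketch of this transformation on an expression tree, assuming a hypothetical `Expr` type with only numbers, `+` and `*`. The rewrite (a+b)*c → a*c + b*c is applied bottom-up wherever a product has a sum as its left operand, and the printer emits parentheses only where precedence requires them, so the effect on the parentheses is visible.

```cpp
#include <memory>
#include <string>

// Hypothetical expression tree: a number, a sum, or a product.
struct Expr {
    char op;    // 'n' for a number, otherwise '+' or '*'
    int value;  // used when op == 'n'
    std::shared_ptr<Expr> left, right;
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr num(int v) { return std::make_shared<Expr>(Expr{'n', v, nullptr, nullptr}); }
ExprPtr add(ExprPtr l, ExprPtr r) { return std::make_shared<Expr>(Expr{'+', 0, l, r}); }
ExprPtr mul(ExprPtr l, ExprPtr r) { return std::make_shared<Expr>(Expr{'*', 0, l, r}); }

// Apply (a+b)*c => a*c + b*c everywhere, bottom-up.
ExprPtr distribute(const ExprPtr& e) {
    if (e->op == 'n') return e;
    ExprPtr l = distribute(e->left);
    ExprPtr r = distribute(e->right);
    if (e->op == '*' && l->op == '+') {
        // The new products may again have a sum on the left, so distribute them too.
        return add(distribute(mul(l->left, r)), distribute(mul(l->right, r)));
    }
    return e->op == '+' ? add(l, r) : mul(l, r);
}

// Print with the minimum parentheses: only a sum used as an operand of * needs them.
std::string show(const ExprPtr& e) {
    if (e->op == 'n') return std::to_string(e->value);
    if (e->op == '+') return show(e->left) + " + " + show(e->right);
    auto operand = [](const ExprPtr& x) {
        return x->op == '+' ? "(" + show(x) + ")" : show(x);
    };
    return operand(e->left) + "*" + operand(e->right);
}
```

For x := (2+5)*3, `show(distribute(e))` gives `2*3 + 5*3`. The rule as stated only covers a sum on the left; c*(a+b) would need a second, symmetric rule.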

Two types of transformations

  • Translation

    • Source and target language are different

    • Semantics remain the same

  • Rephrasing

    • Source and target language are the same

    • Goal is to improve some aspect of the program such as its understandability or performance

    • Semantics might change

Translation

  • Program synthesis

    • Lowers the level of abstraction, e.g. compilation

  • Program migration

    • Transform to a different language

  • Reverse Engineering

    • Raises the level of abstraction, e.g. create architectural descriptions from the source code

  • Program Analysis

    • Reduces the program to one aspect, e.g. control flow

Rephrasing

  • Program normalization

    • Decreases syntactic complexity (desugaring), e.g. algebraic simplification of expressions

  • Program optimization

    • Improves performance, e.g. inlining, common-subexpression and dead code elimination

Rephrasing

  • Program refactoring

    • Improves the design by restructuring while preserving the functionality

  • Program obfuscation

    • Deliberately makes the program harder to understand

  • Software renovation

    • Fixes bugs such as Y2K

Transformation tools

  • There are many transformation tools

  • Program-Transformation.org lists 90 of them

  • Most are based on term rewriting

  • Other solutions use functional programming, lambda calculus, etc.

Term rewriting

  • The process of simplifying symbolic expressions (terms) by means of a Rewrite System, i.e. a set of Rewrite Rules.

  • A Rewrite Rule is of the form lhs → rhs, where lhs and rhs are term patterns

Example Rewrite System

0 + x → x

s(x) + y → s(x + y)

(x + y) + z → x + (y + z)

Under these rewrite rules, the term

((s(s(a)) + s(b)) + c)

will be rewritten as

s(s(a + s(b + c)))
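The rewrite system above is small enough to execute directly. The C++ sketch below (type and function names are illustrative) represents terms as trees, tries the three rules at the root in the order listed, and uses an innermost strategy: normalize the subterms first, then rewrite at the root until no rule applies. For this system the overlapping rules can be checked to be joinable, so the normal form should not depend on the strategy chosen.

```cpp
#include <memory>
#include <string>
#include <utility>

// A term is the constant 0, a variable, s(t), or t1 + t2.
struct Term {
    enum Kind { Zero, Var, Succ, Plus };
    Kind kind;
    std::string name;               // variable name (Var only)
    std::shared_ptr<const Term> a;  // argument of Succ, left operand of Plus
    std::shared_ptr<const Term> b;  // right operand of Plus
};
using TermPtr = std::shared_ptr<const Term>;

TermPtr mkTerm(Term::Kind k, std::string n, TermPtr a, TermPtr b) {
    return std::make_shared<Term>(Term{k, std::move(n), std::move(a), std::move(b)});
}
TermPtr mkZero() { return mkTerm(Term::Zero, "", nullptr, nullptr); }
TermPtr mkVar(const std::string& n) { return mkTerm(Term::Var, n, nullptr, nullptr); }
TermPtr mkSucc(TermPtr x) { return mkTerm(Term::Succ, "", std::move(x), nullptr); }
TermPtr mkPlus(TermPtr x, TermPtr y) { return mkTerm(Term::Plus, "", std::move(x), std::move(y)); }

// Try the three rules, in order, at the root of t; return nullptr if none matches.
TermPtr rewriteRoot(const TermPtr& t) {
    if (t->kind != Term::Plus) return nullptr;
    const TermPtr& x = t->a;
    const TermPtr& y = t->b;
    if (x->kind == Term::Zero) return y;                              // 0 + x -> x
    if (x->kind == Term::Succ) return mkSucc(mkPlus(x->a, y));        // s(x) + y -> s(x + y)
    if (x->kind == Term::Plus) return mkPlus(x->a, mkPlus(x->b, y));  // (x + y) + z -> x + (y + z)
    return nullptr;
}

// Innermost strategy: normalize the subterms, then rewrite at the root
// and normalize the result again, until no rule applies.
TermPtr normalize(const TermPtr& t) {
    TermPtr current = t;
    if (t->kind == Term::Succ) current = mkSucc(normalize(t->a));
    if (t->kind == Term::Plus) current = mkPlus(normalize(t->a), normalize(t->b));
    if (TermPtr next = rewriteRoot(current)) return normalize(next);
    return current;
}

// Print a term, parenthesizing a sum only when it is an operand of another sum.
std::string show(const TermPtr& t) {
    auto operand = [](const TermPtr& x) {
        return x->kind == Term::Plus ? "(" + show(x) + ")" : show(x);
    };
    switch (t->kind) {
        case Term::Zero: return "0";
        case Term::Var:  return t->name;
        case Term::Succ: return "s(" + show(t->a) + ")";
        case Term::Plus: return operand(t->a) + " + " + operand(t->b);
    }
    return "";
}
```

Applied to ((s(s(a)) + s(b)) + c), it yields s(s(a + s(b + c))): every s on a left operand is moved outward, but no rule moves an s out of a right operand, so the s that came from s(b) stays inside.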

TXL

  • A generalized source-to-source translation system

  • Uses a context-free grammar to describe the structures to be transformed

  • Rule specification uses a by-example style

  • Has been used to process billions of lines of code for Y2K purposes

TXL programs

  • TXL programs consist of two parts:

    • Grammar for the input language

    • Transformation Rules

  • Let’s look at some examples…


% Part I. Syntax specification
define program
        [expression]
end define

define expression
        [term]
    |   [expression] [addop] [term]
end define

define term
        [primary]
    |   [term] [mulop] [primary]
end define

define primary
        [number]
    |   ( [expression] )
end define

define addop
        '+
    |   '-
end define

define mulop
        '*
    |   '/
end define

Calculator.Txl - Grammar


% Part 2. Transformation rules
rule main
    replace [expression]
        E [expression]
    construct NewE [expression]
        E [resolveAddition] [resolveSubtraction] [resolveMultiplication]
          [resolveDivision] [resolveParentheses]
    where not
        NewE [= E]
    by
        NewE
end rule

rule resolveAddition
    replace [expression]
        N1 [number] + N2 [number]
    by
        N1 [+ N2]
end rule

rule resolveSubtraction …

rule resolveMultiplication …

rule resolveDivision …

rule resolveParentheses
    replace [primary]
        ( N [number] )
    by
        N
end rule

Calculator.Txl - Rules
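For readers more familiar with C++ than TXL, the sketch below mimics the structure of Calculator.Txl on an explicit expression tree: each `resolve` pass plays the role of one resolve rule, and `evaluate` repeats the passes until nothing changes, like `rule main` with its `where not NewE [= E]` guard. This is an analogy, not a translation: parentheses are implicit in the tree, so `resolveParentheses` has no counterpart, and integer arithmetic with no division-by-zero check is assumed. All names are hypothetical.

```cpp
#include <initializer_list>
#include <memory>

// Hypothetical expression tree for the calculator language.
struct Calc {
    char op;     // 'n' for a number, otherwise '+', '-', '*' or '/'
    long value;  // used when op == 'n'
    std::shared_ptr<Calc> l, r;
};
using CalcPtr = std::shared_ptr<Calc>;

CalcPtr number(long v) { return std::make_shared<Calc>(Calc{'n', v, nullptr, nullptr}); }
CalcPtr node(char op, CalcPtr l, CalcPtr r) { return std::make_shared<Calc>(Calc{op, 0, l, r}); }

// Analogue of one rule such as resolveAddition: wherever both operands of an
// `op` node are numbers, replace the node by the computed number (in place).
// Returns true if anything changed.
bool resolve(CalcPtr& e, char op) {
    if (e->op == 'n') return false;
    bool changed = resolve(e->l, op);
    changed = resolve(e->r, op) || changed;
    if (e->op == op && e->l->op == 'n' && e->r->op == 'n') {
        long a = e->l->value, b = e->r->value;
        long v = op == '+' ? a + b : op == '-' ? a - b : op == '*' ? a * b : a / b;
        e = number(v);
        return true;
    }
    return changed;
}

// Analogue of rule main: apply all resolve passes, and repeat until the
// expression no longer changes.
CalcPtr evaluate(CalcPtr e) {
    bool changed = true;
    while (changed) {
        changed = false;
        for (char op : {'+', '-', '*', '/'}) changed = resolve(e, op) || changed;
    }
    return e;
}
```

For 2 * (3 + 4) - 10 / 5, the first round folds 3 + 4, 2 * 7 and 10 / 5, the second round folds 14 - 2 to give 12, and the third round changes nothing, so the loop stops.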


% Form the dot product of two vectors,
% e.g., (1 2 3).(3 2 1) => 10
define program
        ( [repeat number] ) . ( [repeat number] )
    |   [number]
end define

rule main
    replace [program]
        ( V1 [repeat number] ) . ( V2 [repeat number] )
    construct Zero [number]
        0
    by
        Zero [addDotProduct V1 V2]
end rule

% A function, not a rule: a rule would keep re-applying to its own result
function addDotProduct V1 [repeat number] V2 [repeat number]
    deconstruct V1
        First1 [number] Rest1 [repeat number]
    deconstruct V2
        First2 [number] Rest2 [repeat number]
    construct ProductOfFirsts [number]
        First1 [* First2]
    replace [number]
        N [number]
    by
        N [+ ProductOfFirsts] [addDotProduct Rest1 Rest2]
end function

DotProduct.Txl
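The same computation as a C++ sketch, for comparison: `addDotProduct` adds the product of the first elements to the accumulator and continues with the rest of both vectors, and `dotProduct` starts it from zero, as `rule main` does with `Zero`. An index stands in for deconstructing the vectors, and stopping at the end of the shorter vector mirrors a failed `deconstruct`, which leaves N unchanged.

```cpp
#include <cstddef>
#include <vector>

// Mirrors addDotProduct: add the product of the first remaining elements to
// the accumulator n, then continue with the rest of both vectors.
long addDotProduct(long n, const std::vector<long>& v1, const std::vector<long>& v2,
                   std::size_t i = 0) {
    if (i >= v1.size() || i >= v2.size()) return n;  // nothing left to deconstruct
    return addDotProduct(n + v1[i] * v2[i], v1, v2, i + 1);
}

// Mirrors rule main: start the accumulation from zero.
long dotProduct(const std::vector<long>& v1, const std::vector<long>& v2) {
    return addDotProduct(0, v1, v2);
}
```

As in the TXL comment, `dotProduct({1, 2, 3}, {3, 2, 1})` computes 1*3 + 2*2 + 3*1 = 10.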

Sort.Txl

% Sort.Txl - simple numeric bubble sort
define program
    [repeat number]
end define

rule main
    replace [repeat number]
        N1 [number] N2 [number] Rest [repeat number]
    where
        N1 [> N2]
    by
        N2 N1 Rest
end rule
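Because a TXL rule keeps re-applying until its pattern matches nowhere, this single swap rule is enough to sort the sequence. A C++ sketch of that fixed-point behaviour, assuming a left-to-right search for the first out-of-order pair (TXL's exact search order is not modelled here); the function names are hypothetical:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// One application of the rule: swap the first adjacent pair N1 N2 with N1 > N2.
// Returns false when the pattern matches nowhere.
bool swapFirstInversion(std::vector<int>& v) {
    for (std::size_t i = 0; i + 1 < v.size(); ++i) {
        if (v[i] > v[i + 1]) {
            std::swap(v[i], v[i + 1]);
            return true;
        }
    }
    return false;
}

// Like a TXL rule: keep applying the replacement until there is no match left.
std::vector<int> sortLikeTxl(std::vector<int> v) {
    while (swapFirstInversion(v)) {
    }
    return v;
}
```

Each swap removes exactly one inversion, so the loop terminates after as many swaps as the input has inversions, the same quadratic cost as a bubble sort.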

Other TXL constructs

compounds
    -> :=
end compounds

keys
    var procedure exists inout out
end keys

function isAnAssignmentTo X [id]
    match [statement]
        X := Y [expression]
end function

www.txl.ca

  • Guided Tour

  • Many examples

  • Reference manual

  • Download TXL for many platforms
