
Reverse Architecting

Arie van Deursen

Outline
  • Legacy systems
  • Reverse architecting
  • Architecture exploration
    • Extraction
    • Abstraction
    • Presentation
  • Evaluation
Motivation
  • Multi-channel distribution
    • Web enable existing applications
  • Due diligence / QA
    • Company merger
  • Helping software immigrants
  • Estimating new functionality

Documentation: at best out of date

Legacy Systems

Definition:

  • Any information system that significantly resists evolution to meet new and changing business requirements

Characteristics

  • Large
  • Geriatric
  • Outdated languages
  • Outdated databases
  • Isolated
Software Volume
  • Capers Jones software size estimate:
    • 700,000,000,000 lines of code
    • (7 × 10^9 function points)
    • (1 fp ~ 110 lines of code)
  • Total number of programmers:
    • 10,000,000
    • 40% new development, 45% enhancements, 15% repair
    • (2020: 30%, 55%, 15%)
Reverse Architecting: Motivation
  • Architecture description lost or outdated
  • Obtain advantages of an explicit architecture:
    • Stakeholder communication
    • Explicit design decisions
    • Transferable abstraction
  • Architecture conformance checking
  • Quality attribute analysis
Software Architecture

Structure(s) of a system, which comprise

  • the software components
  • the externally visible properties of those components
  • and the relationships among them
Architectural Structures
  • Module structure
  • Data model structure
  • Process structure
  • Call structure
  • Type structure
  • GUI flow
  • ...
The 4 + 1 View Model
  • Logical view
  • Development view
  • Use case view
  • Physical view
  • Process view

Extract & compare!

Reverse Engineering
  • The process of analyzing a subject system with two goals in mind:
    • to identify the system's components and their interrelationships; and,
    • to create representations of the system in another form or at a higher level of abstraction.

Decompilation

Reverse Architecting

Reengineering
  • The examination and alteration of a subject system
  • to reconstitute it in a new form
  • and the subsequent implementation of that new form

Beyond analysis -- actually improve.

Program Understanding
  • the task of building mental models of an underlying software system
  • at various abstraction levels, ranging from
    • models of the code itself to
    • ones of the underlying application domain,
  • for software maintenance, evolution, and reengineering purposes

50% of maintenance effort!!

Cognitive Processes
  • Building a mental model
  • Top down / bottom up / opportunistic
  • Generate and validate hypotheses
  • Chunking: create higher structures from chunks of low-level information
  • Cross referencing: understand relationships
Supporting Program Understanding
  • Architects build up mental models:
    • various abstractions of software system
    • hierarchies for varying levels of detail
    • graph-like structures for dependencies
  • How can we support this process?
    • infer a number of predefined abstractions
    • enrich system’s source code with abstractions
    • let architect explore result
Architecture Exploration
  • Lesson from compiler construction: split processing into separate stages
  • Goal: translate source code into a form that can easily be processed by humans
  • Similarity with compilers: translate source code into a form that can be processed by machines
    • parsing turns source code into intermediate form
    • optimisation improves intermediate form
    • code generation emits the machine code
Architecture Exploration

[Figure: pipeline connecting artifacts, repository and results through the steps extract, query and view]


  • Extract source models from system artifacts
  • Query/manipulate to infer new knowledge
  • Present different views on results
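
The three steps above can be sketched as a tiny pipeline. This is only an illustration under stated assumptions, not the tooling behind the talk: the fact format `(relation, source, target)`, the function names, and the sample programs `MAIN`, `PAY` and `PRINT` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical artifacts: program name -> programs it calls.
ARTIFACTS = {"MAIN": ["PAY", "PRINT"], "PAY": ["PRINT"], "PRINT": []}


def extract(artifacts):
    """Extract: turn artifacts into a repository of (relation, source, target) facts."""
    return [("calls", caller, callee)
            for caller, callees in artifacts.items()
            for callee in callees]


def query(repository, relation):
    """Query: select one relation and index its facts by source."""
    index = defaultdict(set)
    for rel, source, target in repository:
        if rel == relation:
            index[source].add(target)
    return dict(index)


def view(result):
    """View: render a query result as sorted text lines."""
    return [f"{source} -> {', '.join(sorted(targets))}"
            for source, targets in sorted(result.items())]


repository = extract(ARTIFACTS)
lines = view(query(repository, "calls"))
print("\n".join(lines))
```

Keeping the repository as plain facts means new queries and new views can be added without touching the extractor.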
Source Model Extraction
  • Derive information from system artifacts
    • variable usage, call graphs, file dependencies, database access, …
  • Challenges
    • Accurate & complete results
    • Flexible: easy to write and adapt
    • Robust: deal with irregularities in input
Grammar Challenges
  • Syntax errors, language dialects, local idioms, missing parts, embedded languages, preprocessing
  • Additional problem: grammar availability
    • process languages without grammar (e.g. undisclosed proprietary languages)
    • development of full grammar is expensive (Cobol: 1500 productions, 4-5 months)
Processing Artifacts
  • Syntactical analysis
    • generate / hand-code / reuse parser
  • Lexical analysis
    • tools like Perl, grep, Awk or LSME, MultiLex
    • generally easier to develop

               accurate  complete  flexible  robust
  syntactical     +         +         –         –
  lexical         –         –         +         +
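
A lexical extractor of the kind listed above can be only a few lines long. The sketch below uses regular expressions to pull PERFORM edges out of Cobol text; the patterns, the `NOT_A_TARGET` keyword list and the function name are assumptions made for illustration, and the sample text is taken from the example later in this deck.

```python
import re

SOURCE = """\
R210-INITIAL SECTION.
    MOVE INITIALS TO A00-NAME-PART.
    PERFORM R300-COMPOSE-NAME.
R300-COMPOSE-NAME SECTION.
    PERFORM UNTIL N100 > A00-MAX
    IF A00-FILLED = N100
"""

SECTION = re.compile(r"^\s*([A-Z0-9-]+)\s+SECTION\.")
PERFORM = re.compile(r"\bPERFORM\s+([A-Z0-9-]+)")
# Words that can follow PERFORM without naming a section (not exhaustive).
NOT_A_TARGET = {"UNTIL", "VARYING", "WITH", "TEST"}


def perform_edges(source):
    """Return (performing section, performed section) pairs found line by line."""
    edges = []
    current = None
    for line in source.splitlines():
        header = SECTION.match(line)
        if header:
            current = header.group(1)
            continue
        for target in PERFORM.findall(line):
            if current is not None and target not in NOT_A_TARGET:
                edges.append((current, target))
    return edges


edges = perform_edges(SOURCE)
print(edges)
```

The keyword filter shows the trade-off in the table: the extractor is quick to write and tolerant of odd input, but without it `PERFORM UNTIL` would be reported as a call to a section named `UNTIL`.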
Island Grammars
  • Grammar containing:
    • detailed productions for constructs of interest
    • liberal productions that catch remainder
  • Islands: accuracy & completeness
  • Water: robustness
[Figure: one input shown with its parse tree under a “standard” grammar and its parse tree under an island grammar]

  • Accept larger language:
    • catch dialects, syntax errors, embedded languages, …

[Figure: language L drawn inside the larger language L_island]

  • Often smaller grammar
    • can share productions
    • can have different structure

[Figure: island grammars Gi and Gi’ drawn in relation to the full grammar GL]


Example (Water)

lexical syntax
  ~[] -> Water {avoid}

context-free syntax
  Water -> Part
  Part* -> Input

Water is “fall-back”

Example (Program Calls)

lexical syntax
  ~[] -> Water {avoid}
  [A-Z][A-Z0-9]* -> Id

context-free syntax
  Water -> Part
  Part* -> Input
  “CALL” Id -> Call
  Call -> Part

Water is “fall-back”
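
The effect of the grammar above can be imitated in a few lines of Python. This is a rough sketch and not an SDF parser: it works on whitespace-separated tokens rather than characters, and the function names and sample input are hypothetical. A `CALL` followed by an identifier becomes an island; everything else falls back to water.

```python
import re

# Same shape as the Id production: an upper-case letter, then letters or digits.
ID = re.compile(r"[A-Z][A-Z0-9]*")


def parse(text):
    """Return the input as parts: ("Call", id) islands and ("Water", token) otherwise."""
    tokens = text.split()
    parts = []
    i = 0
    while i < len(tokens):
        is_call = (tokens[i] == "CALL"
                   and i + 1 < len(tokens)
                   and ID.fullmatch(tokens[i + 1]) is not None)
        if is_call:
            parts.append(("Call", tokens[i + 1]))
            i += 2
        else:
            # Fall-back: anything that is not a recognised island is water.
            parts.append(("Water", tokens[i]))
            i += 1
    return parts


def calls(text):
    """Keep only the islands of interest."""
    return [value for kind, value in parse(text) if kind == "Call"]


SAMPLE = "MOVE A TO B. CALL PAY01 %%broken syntax here%% CALL PRT02 STOP RUN."
print(calls(SAMPLE))
```

Because water accepts any token, the malformed fragment in the sample does not stop the parse, which is the robustness the island approach is after.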

Query and Manipulate
  • Goals:
    • infer new knowledge & abstractions
    • filter information
  • Example structures:
    • Perform graph
    • Call graph (OI, PVL)
    • Screen flow
    • Batch job
    • Subsystem dbs

In search of more abstraction
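
As one illustration of inferring new knowledge and then filtering it, the sketch below computes everything reachable in a call graph and selects the programs that depend on one component. The graph, the program names and the function name are hypothetical.

```python
def reachable(call_graph, start):
    """Return all programs reachable from start through one or more calls."""
    seen = set()
    stack = list(call_graph.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(call_graph.get(node, ()))
    return seen


# Hypothetical call graph: caller -> set of callees.
CALL_GRAPH = {
    "MENU": {"ORDERS", "REPORT"},
    "ORDERS": {"DBACCESS"},
    "REPORT": {"DBACCESS", "PRINT"},
    "DBACCESS": set(),
    "PRINT": set(),
}

# Infer: the transitive closure of the call relation.
closure = {program: reachable(CALL_GRAPH, program) for program in CALL_GRAPH}
# Filter: programs that depend, directly or indirectly, on DBACCESS.
users_of_db = sorted(p for p, targets in closure.items() if "DBACCESS" in targets)
print(users_of_db)
```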

Combining Data & Functionality
  • Cluster analysis
    • technique for finding groups in data
    • Relies on metrics to compare distance between data items
  • Concept analysis
    • also a technique for finding groups in data
    • Relies on maximal subsets of data items sharing a set of features
Cluster Analysis
  • Calculate distance (similarity) number between all data items (record fields)
  • Use clustering to find hierarchy
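
The two steps above can be sketched as follows. The choices here are assumptions rather than the method used in the talk: the distance is the Jaccard distance between the sets of programs using each field, the clustering is single-linkage agglomerative, and the usage table is made up.

```python
from itertools import combinations

# Hypothetical usage table: record field -> programs that use it.
USAGE = {
    "Name": {"P1"},
    "Title": {"P1"},
    "Zipcode": {"P4"},
    "Street": {"P3", "P4"},
    "City": {"P2", "P4"},
}


def jaccard_distance(a, b):
    """0.0 for identical usage sets, 1.0 for disjoint ones."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def single_linkage(features):
    """Merge the two closest clusters until one is left; return the merges in order."""
    def linkage(pair):
        left, right = pair
        return min(jaccard_distance(features[x], features[y])
                   for x in left for y in right)

    clusters = [frozenset([item]) for item in sorted(features)]
    merges = []
    while len(clusters) > 1:
        left, right = min(combinations(clusters, 2), key=linkage)
        distance = linkage((left, right))
        clusters = [c for c in clusters if c not in (left, right)]
        clusters.append(left | right)
        merges.append((distance, left | right))
    return merges


merges = single_linkage(USAGE)
for distance, cluster in merges:
    print(f"{distance:.2f}", sorted(cluster))
```

The list of merges, read from the smallest distance to the largest, is the information a dendrogram draws.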
Dendrogram

[Figure: dendrogram built up step by step over the fields Name, Title, Initial, Prefix, Number, Nb-Ext, Zipcode, City and Street; distance axis from 0 to 1; one step is annotated “Distance is 1”]
Dendrogram from Real Data

[Figure: dendrogram with distance axis from 0 to 2 over the fields Amount, OfficeName, BankCity, IntAccount, OfficeType, PaymentKind, RelationNr, ChangeDate, Account, MortSeqNr, MortNr, TitleCd, Prefix, Initial, Name, ZipCd, CountyCd, StreetNr, City and Street]

Concept Analysis
  • Relies on maximal subsets of data items sharing a set of features
  • Concept analysis finds a lattice
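
A brute-force way to enumerate the concepts of a small feature table is sketched below. It is an illustration only: the table is an assumed reading of which program (P1 to P4) uses which field, and the enumeration over all feature subsets is exponential, so it suits toy inputs rather than real systems.

```python
from itertools import combinations

# Assumed feature table: item (field name) -> features (programs using it).
USES = {
    "Name": {"P1"}, "Title": {"P1"}, "Initial": {"P1"}, "Prefix": {"P1"},
    "Number": {"P4"}, "Nb-Ext": {"P4"}, "Zipcode": {"P4"},
    "Street": {"P3", "P4"}, "City": {"P2", "P4"},
}


def concepts(table):
    """Return every (extent, intent) pair: a maximal item set sharing a feature set."""
    all_features = set().union(*table.values())
    found = set()
    for size in range(len(all_features) + 1):
        for subset in combinations(sorted(all_features), size):
            wanted = set(subset)
            # Extent: all items that have every wanted feature.
            extent = frozenset(i for i, feats in table.items() if wanted <= feats)
            if extent:
                # Intent: all features shared by every item in the extent.
                intent = frozenset(set.intersection(*(table[i] for i in extent)))
            else:
                intent = frozenset(all_features)
            found.add((extent, intent))
    return found


lattice = concepts(USES)
for extent, intent in sorted(lattice, key=lambda c: -len(c[0])):
    print(sorted(intent), sorted(extent))
```

With this table the result has six concepts: a top containing all fields, a bottom containing all programs, and four in between.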
Concept Lattice

[Figure: lattice built step by step from a table of items (field names) against a set of features (programs P1 P2 P3 P4)]

  • top: All Variables
  • P1: Name, Title, Initial, Prefix
  • P4: Number, Nb-Ext, Zipcode, Street, City
  • P2 P4: City
  • P3 P4: Street
  • bottom

[Figure: concept lattice; labels: Many fields, Progr. nrs, Concept, Fields, One field]

System Views
  • Grouping method based on feature table
  • Metrics or subset based
  • Find alternative system views:
    • Kruchten’s logical view
    • Object-based view on procedural code
    • Starting point for “objectification”
  • Keep “human in the loop”
Types
  • A type describes a set of possible values
  • A type groups variables
  • A type encapsulates representation
  • Parameter types provide interfaces
  • Types provide component connectors

Types are architectural structures

But types are already available...
  • Not in a legacy language like Cobol:
    • Data division declares variables + structure
    • No separation between type/variable.
    • Repeated structure per variable.
    • No enumeration types, no ranges.
    • No parameters for sections
  • Similar problems with other legacy languages
Automatic Type Inference
  • Group variables based on usage
  • Initially:
    • Each variable unique primitive type
  • From statements infer equivalencies:
    • Assignment v := e
    • Comparison e1 > e2
    • Computation e1 + e2
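
The grouping step can be sketched with a union-find structure. This is a minimal illustration, not the inference tool described in the talk: the class and method names are hypothetical, comparisons are treated as equivalences, and an assignment is recorded as a subtype edge from source to target, which is the reading the Cobol example in this deck uses for its MOVE statement.

```python
class TypeInference:
    """Start with one type per variable, then merge types based on usage."""

    def __init__(self):
        self.parent = {}       # union-find forest over variable names
        self.subtypes = set()  # (source, target) pairs from assignments

    def find(self, variable):
        """Return the representative of the variable's type class."""
        self.parent.setdefault(variable, variable)
        while self.parent[variable] != variable:
            self.parent[variable] = self.parent[self.parent[variable]]
            variable = self.parent[variable]
        return variable

    def equate(self, a, b):
        """Comparison or computation: both operands get the same type."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def assign(self, source, target):
        """Assignment: record that source's type is a subtype of target's."""
        self.find(source)
        self.find(target)
        self.subtypes.add((source, target))

    def classes(self):
        """Return the type classes as sorted lists of variable names."""
        groups = {}
        for variable in list(self.parent):
            groups.setdefault(self.find(variable), set()).add(variable)
        return sorted(sorted(group) for group in groups.values())


inference = TypeInference()
inference.assign("INITIALS", "A00-NAME-PART")  # MOVE INITIALS TO A00-NAME-PART
inference.equate("N100", "A00-MAX")            # N100 > A00-MAX
inference.equate("A00-FILLED", "N100")         # A00-FILLED = N100
print(inference.classes())
```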
Example

DATA DIVISION.
01 PERSON.
   03 INITIALS PIC X(05).
   03 NAME     PIC X(27).
   03 STREET   PIC X(18).
01 TAB000.
   03 A00-NAME-PART.
      05 A00-POS PIC X(01) OCCURS 40.
   03 A00-MAX    PIC S9(03) COMP-3 VALUE 40.
   03 A00-FILLED PIC S9(03) COMP-3 VALUE 0.
01 N000.
   03 N100 PIC S9(03) COMP-3 VALUE 0.
...
PROCEDURE DIVISION.
R210-INITIAL SECTION.
    MOVE INITIALS TO A00-NAME-PART.
    PERFORM R300-COMPOSE-NAME.
R300-COMPOSE-NAME SECTION.
    ...
    PERFORM UNTIL N100 > A00-MAX
    ...
    IF A00-FILLED = N100
    ...
Inferred from the example:
  • N100, A00-MAX and A00-FILLED are equivalent (they are compared in PERFORM UNTIL N100 > A00-MAX and IF A00-FILLED = N100)
  • INITIALS is a subtype of A00-NAME-PART (MOVE INITIALS TO A00-NAME-PART)

System Level Types
  • Propagate types across modules
  • Calls
  • Database operations
  • File I/O
  • Include files / copybooks
  • Lift type dependencies to package level
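
Lifting to package level can be sketched as a mapping over dependency pairs. The program names, the package assignment and the function name below are hypothetical; the only idea taken from the slide is that program-level dependencies are replaced by the packages that contain them.

```python
def lift(dependencies, package_of):
    """Turn (program, program) dependencies into (package, package) dependencies."""
    lifted = set()
    for source, target in dependencies:
        source_package = package_of[source]
        target_package = package_of[target]
        # Dependencies inside one package disappear at this level.
        if source_package != target_package:
            lifted.add((source_package, target_package))
    return lifted


# Hypothetical assignment of programs to packages.
PACKAGE_OF = {
    "ORD01": "orders",
    "ORD02": "orders",
    "CUS01": "customers",
    "PRT01": "printing",
}
# Hypothetical program-level type dependencies.
TYPE_DEPENDENCIES = {
    ("ORD01", "CUS01"),
    ("ORD02", "CUS01"),
    ("ORD01", "ORD02"),
    ("CUS01", "PRT01"),
}

package_level = lift(TYPE_DEPENDENCIES, PACKAGE_OF)
print(sorted(package_level))
```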
Type Inference Case Study (I)
  • 100,000 lines Cobol / CICS system
  • First parameter of all batch programs:
    • program-fields
    • info required for restart and error recovery
    • literals in subroutine field: all programs
  • First parameter of all online programs:
    • dfhcommarea
    • mapped to appropriate record --> type
Type Inference Case Study (II)
  • Programs with integer parameter
    • Used as enumeration type
    • Value represents function to be performed
    • Program as package
  • Parameter links
    • Formal parameters of same type
    • RA31.6 = RA36.4
  • Relations between copybooks
Presentation of Results

Presentation Desiderata
  • Show multiple structures
  • Show relationships between structures
  • Multiple levels of abstraction
    • Zoom in, zoom out
  • Visual as well as textual information
    • Graph visualization
  • Browsing and searching
Presenting Architectures Using Hypertext
  • Hyperlinked pages for system elements
  • Multiple structures, multiple views
  • Backbone: system hierarchy, sources
  • Abstractions become additional navigation structures
  • Text & clickable graphs
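
A minimal sketch of such hyperlinked pages, assuming one HTML page per program with links to the programs it calls and is called by. The call graph, the page naming scheme and the function names are hypothetical.

```python
from html import escape

# Hypothetical call graph: caller -> set of callees.
CALL_GRAPH = {"MAIN": {"PAY", "PRINT"}, "PAY": {"PRINT"}, "PRINT": set()}


def page_name(program):
    """File name of the page that documents a program."""
    return f"{program}.html"


def links(names):
    """Render program names as a list of hyperlinks to their pages."""
    if not names:
        return "<p>none</p>"
    items = "".join(
        f'<li><a href="{escape(page_name(name))}">{escape(name)}</a></li>'
        for name in names)
    return f"<ul>{items}</ul>"


def render_page(program, call_graph):
    """One page per program, cross-linked in both directions."""
    callees = sorted(call_graph.get(program, ()))
    callers = sorted(p for p, targets in call_graph.items() if program in targets)
    return (f"<h1>{escape(program)}</h1>"
            f"<h2>Calls</h2>{links(callees)}"
            f"<h2>Called by</h2>{links(callers)}")


pages = {page_name(p): render_page(p, CALL_GRAPH) for p in CALL_GRAPH}
print(pages["MAIN.html"])
```

Each further abstraction (types, subsystems, copybooks) would add another section of links to the same pages.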
Types of navigation
  • Vertical browsing
    • supported by hierarchical structures
    • zoom into more detailed level
      • system  subsystem  program  …  source
  • Horizontal browsing
    • supported by graph-like structures
    • find related on same abstraction level
      • called programs, variables of same type, etc.
Presentation Challenges
  • Handling abstractions not visible in code
  • Giving abstractions a meaningful name
    • e.g., name for inferred type
  • Defining starting points for browsing
    • lists of types, programs, copybooks, words, literals
    • add cross-cutting hyperlinks on all levels
Advanced Documentation Generation
  • DocGen
    • Provide technical documentation
    • Used for all ABN AMRO Cobol sources
    • Customizable product line
  • TypeExplorer
    • Include inferred types as navigation structure
    • Advance level of abstraction
Tool Sets
  • Tools: Rigi (Victoria), Bauhaus (Stuttgart), Dali (SEI), Portable Bookshelf (Toronto), DocGen (Amsterdam)
  • Capabilities: Extract, Query, Abstract, Present, Visualize, Browse, Search
SWARM / WCRE 2001
  • The UML
  • Rationale recovery
  • Pattern-oriented software architecture
  • Architecture description languages
  • Dynamic analysis
  • Software product lines
  • Software architecture “user’s guide”
Summary
  • Extract, abstract, present
  • Multiple structures
  • Zoom in/out, switch abstraction levels
  • Browse / hypertext
  • Compiler construction technology
  • Active area of research
  • Experiment in your projects
Further Reading (I)
  • A. van Deursen and T. Kuipers. Identifying Objects using Cluster & Concept Analysis. ICSE’99.
  • A. van Deursen and T. Kuipers. Building Documentation Generators. ICSM’99.
  • A. van Deursen and L. Moonen. Exploring Legacy Systems Using Types. WCRE’00.
  • A. van Deursen. Software Architecture Recovery and Modeling. WCRE 2001 workshop report. Applied Computing Review, ACM, 2002.
Further Reading (II)

www.cwi.nl/~arie/papers/

www.cwi.nl/~arie/swarm2001/

www.program-transformation.org