
Reverse Architecting

Reverse Architecting. Arie van Deursen. Outline. Legacy systems Reverse architecting Architecture exploration Extraction Abstraction Presentation Evaluation. Motivation. Multi-channel distribution Web enable existing applications Due diligence / QA Company merger


Presentation Transcript


  1. Reverse Architecting Arie van Deursen

  2. Outline • Legacy systems • Reverse architecting • Architecture exploration • Extraction • Abstraction • Presentation • Evaluation

  3. Motivation • Multi-channel distribution • Web-enable existing applications • Due diligence / QA • Company merger • Helping software immigrants • Estimating new functionality • Documentation: at best out of date

  4. Legacy Systems Definition: • Any information system that significantly resists evolution • to meet new and changing business requirements Characteristics • Large • Geriatric • Outdated languages • Outdated databases • Isolated

  5. Software Volume • Capers Jones software size estimate: • 700,000,000,000 lines of code • (7 × 10^9 function points) • (1 fp ≈ 110 lines of code) • Total number of programmers: • 10,000,000 • 40% new development, 45% enhancements, 15% repair • (2020: 30%, 55%, 15%)

  6. Legacy By Example

  7. Reverse Architecting: Motivation • Architecture description lost or outdated • Obtain advantages of expl. arch.: • Stakeholder communication • Explicit design decisions • Transferable abstraction • Architecture conformance checking • Quality attribute analysis

  8. Software Architecture • Structure(s) of a system, which comprise • the software components • the externally visible properties of those components • and the relationships among them

  9. Architectural Structures • Module structure • Data model structure • Process structure • Call structure • Type structure • GUI flow • ...

  10. The 4 + 1 View Model • Logical view • Development view • Use case view • Physical view • Process view • Extract & compare!

  11. Reverse Engineering • The process of analyzing a subject system with two goals in mind: • to identify the system's components and their interrelationships; and, • to create representations of the system in another form or at a higher level of abstraction. • (Slide labels: Decompilation; Reverse Architecting)

  12. Reengineering • The examination and alteration of a subject system • to reconstitute it in a new form • and the subsequent implementation of that new form Beyond analysis -- actually improve.

  13. Reengineering

  14. Program Understanding • the task of building mental models of an underlying software system • at various abstraction levels, ranging from • models of the code itself to • ones of the underlying application domain, • for software maintenance, evolution, and reengineering purposes 50% of maintenance effort!!

  15. Cognitive Processes • Building a mental model • Top down / bottom up / opportunistic • Generate and validate hypotheses • Chunking: create higher structures from chunks of low-level information • Cross referencing: understand relationships

  16. Supporting Program Understanding • Architects build up mental models: • various abstractions of the software system • hierarchies for varying levels of detail • graph-like structures for dependencies • How can we support this process? • infer a number of predefined abstractions • enrich the system’s source code with abstractions • let the architect explore the result

  17. Architecture Exploration • Lesson from compiler construction: split processing into separate stages • Similarity with compilers: translate source code into a form that can be processed by machines • parsing turns source code into intermediate form • optimisation improves intermediate form • code generation emits the machine code • Goal: translate source code into a form that can easily be processed by humans

  18. Architecture Exploration • Extract source models from system artifacts • Query/manipulate to infer new knowledge • Present different views on results • (Diagram labels: artifacts, repository, results; extract, query, view)
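The extract / query / view staging can be sketched in a few lines of Python. This is a minimal illustration, not the tooling used in the presentation: the artifact contents, the `CALL` convention, and the function names `extract`, `query`, and `view` are all assumptions made for the example.

```python
from collections import defaultdict

def extract(artifacts):
    """Extract stage: derive (caller, callee) facts from source text."""
    facts = set()
    for name, text in artifacts.items():
        words = text.split()
        for i, word in enumerate(words[:-1]):
            if word == "CALL":
                facts.add((name, words[i + 1]))
    return facts

def query(repository):
    """Query stage: infer fan-in (number of distinct callers) per program."""
    callers = defaultdict(set)
    for caller, callee in repository:
        callers[callee].add(caller)
    return {callee: len(who) for callee, who in callers.items()}

def view(results):
    """View stage: render the query results as sorted text lines."""
    return [f"{name}: called by {count}" for name, count in sorted(results.items())]

# Hypothetical artifacts: program name -> (simplified) source text.
artifacts = {
    "MAIN": "CALL READER CALL REPORT",
    "REPORT": "CALL READER",
}
repository = extract(artifacts)
lines = view(query(repository))
```

Keeping the repository as a plain set of facts means new queries and views can be added without touching the extractor, which is the point of splitting the stages.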

  19. Source Model Extraction • (Diagram labels: artifacts, repository, results; extract, query, view)

  20. Source Model Extraction • Derive information from system artifacts • variable usage, call graphs, file dependencies, database access, … • Challenges • Accurate & complete results • Flexible: easy to write and adapt • Robust: deal with irregularities in input

  21. Grammar Challenges • Syntax errors • Language dialects • Local idioms • Missing parts • Embedded languages • Preprocessing • Additional problem: grammar availability • process languages without grammar (e.g. undisclosed proprietary languages) • development of full grammar is expensive (Cobol: 1500 productions, 4-5 months)

  22. Processing Artifacts • Syntactical analysis • generate / hand-code / reuse parser • Lexical analysis • tools like perl, grep, Awk or LSME, MultiLex • generally easier to develop • Trade-off table: • syntactical: accurate +, complete +, flexible -, robust - • lexical: accurate -, complete -, flexible +, robust +
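A small lexical extractor shows both sides of the trade-off. The sketch below assumes Cobol-like input in which `COPY name` pulls in a copybook; the pattern, the sample source, and the function name `copy_dependencies` are illustrative assumptions, not part of the original slides.

```python
import re

# Matches "COPY" followed by an upper-case name; deliberately naive.
COPY_PATTERN = re.compile(r"\bCOPY\s+([A-Z][A-Z0-9-]*)")

def copy_dependencies(source):
    """Return the names that follow COPY, scanning line by line."""
    found = []
    for line in source.splitlines():
        found.extend(COPY_PATTERN.findall(line))
    return found

# Hypothetical input: one real COPY, one inside a comment line,
# one unrelated statement, and one COPY with its period missing.
source = """\
       COPY CUSTREC.
      * COPY OLDREC. (disabled)
       MOVE A TO B.
       COPY ADDRREC
"""
deps = copy_dependencies(source)
```

The scan is robust (the missing period and the unrelated statement cause no failure) and took a few lines to write, but it is not accurate: the commented-out `OLDREC` is reported as a dependency as well.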

  23. Island Grammars • Grammar containing: • detailed productions for constructs of interest • liberal productions that catch remainder • Islands: accuracy & completeness • Water: robustness

  24. Island Grammars • Grammar containing: • detailed productions for constructs of interest • liberal productions that catch remainder • (Figure labels: input; parse tree, “standard” grammar; parse tree, island grammar)

  25. Island Grammars • Grammar containing: • detailed productions for constructs of interest • liberal productions that catch remainder • Accept larger language: • catch dialects, syntax errors, embedded languages, … • (Figure labels: L, L_island)

  26. Island Grammars • Grammar containing: • detailed productions for constructs of interest • liberal productions that catch remainder • Often smaller grammar • can share productions • can have different structure • (Figure labels: G_L, G_i, G_i’)

  27. Example (Water) • lexical syntax: ~[] -> Water {avoid} • context-free syntax: Water -> Part; Part* -> Input • Water is the “fall-back”

  28. Example (Program Calls) • lexical syntax: ~[] -> Water {avoid}; [A-Z][A-Z0-9]* -> Id • context-free syntax: Water -> Part; Part* -> Input; “CALL” Id -> Call; Call -> Part • Water is the “fall-back”
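The idea behind the program-calls grammar can be imitated in Python. This is a rough sketch under simplifying assumptions: water is taken per whitespace-separated token rather than per character, the `{avoid}` preference is emulated by always trying the island first, and the names `parse`, `parts`, and `calls` are invented for the example.

```python
import re
from typing import List, Tuple

# Id: [A-Z][A-Z0-9]*, anchored so the whole token must match.
ID = re.compile(r"[A-Z][A-Z0-9]*\Z")

def parse(text: str) -> List[Tuple[str, str]]:
    """Split input into Parts: ("Call", id) islands or ("Water", token)."""
    tokens = text.split()
    parts = []
    i = 0
    while i < len(tokens):
        # Try the island first: "CALL" Id -> Call.
        if tokens[i] == "CALL" and i + 1 < len(tokens) and ID.match(tokens[i + 1]):
            parts.append(("Call", tokens[i + 1]))
            i += 2
        else:
            # Fall back to water: anything else is accepted and skipped over.
            parts.append(("Water", tokens[i]))
            i += 1
    return parts

# Hypothetical input containing unknown syntax and a malformed call.
parts = parse("MOVE X TO Y CALL PRINT01 %%garbage## CALL 123")
calls = [name for kind, name in parts if kind == "Call"]
```

Every input is accepted because water catches whatever the island productions do not recognise; only the well-formed `CALL PRINT01` ends up as an island.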

  29. Query and Manipulate • (Diagram labels: artifacts, repository, results; extract, query, view)

  30. Query and Manipulate • Goals: • infer new knowledge & abstractions • filter information • Example structures: • Perform graph • Call graph (OI, PVL) • Screen flow • Batch job • Subsystem dbs • In search of more abstraction
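Both goals, inferring new knowledge and filtering, can be shown on a call relation stored as a set of pairs. The sketch below is illustrative only; the program names and the helpers `reachable` and `restrict` are assumptions, not taken from the slides.

```python
from collections import deque

def reachable(calls, start):
    """Infer: every program reachable from start via the call relation."""
    successors = {}
    for caller, callee in calls:
        successors.setdefault(caller, set()).add(callee)
    seen, queue = set(), deque([start])
    while queue:
        for callee in successors.get(queue.popleft(), ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen

def restrict(calls, programs):
    """Filter: keep only the calls between the given programs."""
    return {(a, b) for a, b in calls if a in programs and b in programs}

# Hypothetical extracted call facts: (caller, callee).
calls = {
    ("MAIN", "ORDERS"),
    ("ORDERS", "DBIO"),
    ("MAIN", "REPORT"),
    ("BATCH", "DBIO"),
}
from_main = reachable(calls, "MAIN")
subsystem = restrict(calls, {"MAIN", "ORDERS", "REPORT"})
```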

  31. Combining Data & Functionality • Cluster analysis • technique for finding groups in data • Relies on metrics to compare distance between data items • Concept analysis • for finding groups too • Relies on maximal subsets of data items sharing a set of features

  32. Cluster Analysis • Calculate distance (similarity) number between all data items (record fields) • Use clustering to find hierarchy
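The two steps can be sketched as follows, assuming each record field is described by the set of programs that use it. The Jaccard distance, the single-linkage rule, and the usage table are choices made for this example; the slides do not say which metric or linkage was used.

```python
from itertools import combinations

def distance(a, b):
    """Jaccard distance between two program sets (0 = same usage, 1 = disjoint)."""
    return 1 - len(a & b) / len(a | b)

def cluster(usage):
    """Single-linkage agglomerative clustering.

    Returns the merge steps as (distance, cluster_a, cluster_b), i.e. the
    information a dendrogram would draw from bottom to top.
    """
    def linkage(pair):
        a, b = pair
        return min(distance(usage[x], usage[y]) for x in a for y in b)

    clusters = [frozenset([field]) for field in sorted(usage)]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=linkage)
        merges.append((linkage((a, b)), sorted(a), sorted(b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges

# Hypothetical usage table: field -> programs that use it.
usage = {
    "Name": {"P1", "P2"},
    "Title": {"P1", "P2"},
    "Street": {"P3", "P4"},
    "City": {"P3", "P4"},
    "Zipcode": {"P4"},
}
merges = cluster(usage)
```

Fields used by exactly the same programs merge at distance 0, and the name fields and address fields only join at distance 1, which is where a dendrogram would suggest cutting to obtain two candidate records.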

  33. Dendrogram (scale 0 to 1) • Fields: Name, Title, Initial, Prefix

  34. Dendrogram (scale 0 to 1) • Fields: Name, Title, Initial, Prefix, Number, Nb-Ext, Zipcode

  35. Dendrogram (scale 0 to 1) • Fields: Name, Title, Initial, Prefix, Number, Nb-Ext, Zipcode • Distance is 1

  36. Dendrogram (scale 0 to 1) • Fields: Name, Title, Initial, Prefix, Number, Nb-Ext, Zipcode, City • Distance is 1

  37. Dendrogram (scale 0 to 1) • Fields: Name, Title, Initial, Prefix, Number, Nb-Ext, Zipcode, City, Street

  38. Dendrogram (scale 0 to 1) • Fields: Name, Title, Initial, Prefix, Number, Nb-Ext, Zipcode, City, Street

  39. Dendrogram (scale 0 to 1) • Fields: Name, Title, Initial, Prefix, Number, Nb-Ext, Zipcode, City, Street

  40. Dendrogram from Real Data (scale 0 to 2) • Fields: Amount, OfficeName, BankCity, IntAccount, OfficeType, PaymentKind, RelationNr, ChangeDate, Account, MortSeqNr, MortNr, TitleCd, Prefix, Initial, Name, ZipCd, CountyCd, StreetNr, City, Street

  41. Concept Analysis • Relies on maximal subsets of data items sharing a set of features • Concept analysis finds a lattice
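A brute-force sketch makes "maximal subsets of items sharing a set of features" concrete. It assumes items are field names and features are the programs using them; the table below is hypothetical, and enumerating all feature subsets is only practical for tiny inputs (real tools use dedicated lattice construction algorithms).

```python
from itertools import combinations

def concepts(table):
    """Return all formal concepts (extent, intent) of an item -> features table.

    extent: maximal set of items sharing every feature in the intent
    intent: maximal set of features shared by every item in the extent
    """
    all_features = set().union(*table.values())
    found = set()
    for size in range(len(all_features) + 1):
        for chosen in combinations(sorted(all_features), size):
            extent = frozenset(i for i, f in table.items() if set(chosen) <= f)
            if extent:
                intent = frozenset.intersection(*(frozenset(table[i]) for i in extent))
            else:
                # No item has all chosen features: this is the bottom concept.
                intent = frozenset(all_features)
            found.add((extent, intent))
    return found

# Hypothetical table: field -> programs that use it.
table = {
    "Name": {"P1"},
    "Title": {"P1"},
    "Street": {"P3", "P4"},
    "City": {"P3", "P4"},
    "Zipcode": {"P4"},
}
lattice = concepts(table)
```

Ordering the resulting concepts by inclusion of their extents gives the lattice: the top concept holds all fields with no shared program, the bottom holds no fields, and the concepts in between are candidate groupings.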

  42. Concept Lattice (figure) • Labels, in slide order: set of features; set of items (field names); P1, P2, P3, P4; top; All Variables; bottom

  43. Concept Lattice (figure) • Labels, in slide order: P1, P4; Name, Title, Initial, Prefix, Number, Nb-Ext, Zipcode, Street, City; P1, P2, P3, P4; top; All Variables; bottom

  44. Concept Lattice (figure) • Labels, in slide order: P1; Name, Title, Initial, Prefix; P3, P4; P2, P4; Street, City; P1, P2, P3, P4; top; All Variables; P4; Number, Nb-Ext, Zipcode, Street, City; bottom

  45. Concept Lattice (figure) • Labels, in slide order: P1; Name, Title, Initial, Prefix; P2, P4; P3, P4; City, Street; P1, P2, P3, P4; top; All Variables; P4; Number, Nb-Ext, Zipcode, Street, City; bottom

  46. (Figure labels: Many fields; Progr. nrs; Concept; Fields; One field)

  47. System Views • Grouping method based on feature table • Metrics or subset based • Find alternative system views: • Kruchten’s logical view • Object-based view on procedural code • Starting point for “objectification” • Keep “human in the loop”

  48. Types • A type describes a set of possible values • A type groups variables • A type encapsulates representation • Parameter types provide interfaces • Types provide component connectors • Types are architectural structures

  49. But types are already available... • Not in a legacy language like Cobol: • Data division declares variables + structure • No separation between type/variable. • Repeated structure per variable. • No enumeration types, no ranges. • No parameters for sections • Similar problems with other legacy languages

  50. Automatic Type Inference • Group variables based on usage • Initially: • Each variable unique primitive type • From statements infer equivalencies: • Assignment v := e • Comparison e1 > e2 • Computation e1 + e2
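The inference rule can be sketched with a union-find structure: every variable starts in its own type, and each statement merges the types of its operands. As a simplification the sketch assumes both operands are plain variables rather than arbitrary expressions; the variable names and the helper `infer_types` are made up for the example.

```python
def infer_types(variables, statements):
    """Group variables into inferred types based on how they are used together."""
    parent = {v: v for v in variables}  # initially: each variable its own type

    def find(v):
        # Follow parent links to the representative, halving paths on the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for _kind, left, right in statements:
        # Assignment, comparison and computation all imply type equivalence.
        parent[find(left)] = find(right)

    groups = {}
    for v in variables:
        groups.setdefault(find(v), set()).add(v)
    return sorted(sorted(group) for group in groups.values())

# Hypothetical statements, reduced to (kind, operand, operand).
statements = [
    ("assign", "TOTAL", "AMOUNT"),    # TOTAL := AMOUNT
    ("compare", "AMOUNT", "LIMIT"),   # AMOUNT > LIMIT
    ("compute", "ZIP-A", "ZIP-B"),    # ZIP-A + ZIP-B
]
variables = ["TOTAL", "AMOUNT", "LIMIT", "ZIP-A", "ZIP-B", "NAME"]
types = infer_types(variables, statements)
```

`NAME` is never used together with another variable, so it keeps its own primitive type, while the three amount variables collapse into a single inferred type.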
