Automated Software Reuse

Jonathan Phillips 7/19/2002

Overview • Introduction • History of Reuse: methods, current research • Examples of automated reuse • Problem definition • Proposed solution • Examples • Future work

Introduction • Definition of software reuse: using existing software artifacts during the construction of a new software system. • The types of artifacts that can be reused are not limited to source code fragments; they may include design structures, module-level implementation structures, specifications, documentation, transformations, and so on. [Krueger, 1992]

History of Reuse • Software Engineering Field • NATO Software Engineering Conference 1968 • Birthplace of Software Engineering Field • Software Reuse • Mass Produced Software Components [McIlroy, 1968]

Reuse Methods • High-Level Languages • C, Ada, Lisp, Smalltalk • Reuse assembly language constructs • Manual programmer reuse (code scavenging) • Code fragments or components • Designs, architectures

Reuse Methods cont. • Other common methods: source code components, software schemas, application generators • For full details see Software Reuse, Krueger 1992

Drawbacks of Reuse Methods • Most methods rely on manual reuse • Component reuse, code scavenging • Retrieval systems • Design for reuse • Most research devoted to understanding reuse, its applicability, refining methodologies • Lack of automation

Automated Reuse • Case-based reasoning (CBR) • Software built from retrieved and adapted solution plans • Problems represented with abstract operators • Derivational analogy • Uses the CBR paradigm • Problems/solutions stored as derivational traces that contain decision information from previous problem-solving episodes

CBR Example: HCBR [Smyth et al.] • Plant-control software • Hierarchy of concrete tasks/operations • Abstraction hierarchy used to represent concrete operators • Solution plans built abstractly, then concretized
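The "built abstractly, then concretized" idea can be sketched as a table from abstract tasks to ordered concrete operators, with each concrete operator numbered per type as in the HCBR solution figure. The task names come from the HCBR example, but the table itself and the `concretize` helper are hypothetical; this is not HCBR's actual knowledge base or refinement procedure.

```python
from collections import Counter

# Hypothetical abstraction hierarchy: abstract task -> ordered concrete operators.
# The task names follow the HCBR plant-control example; the mappings are assumed.
HIERARCHY: dict[str, list[str]] = {
    "Collect": ["Move", "Engage"],
    "Deliver": ["Move", "Disengage"],
    "Park": ["Switch", "Move"],
    "Unload": ["Move", "Align", "Release"],
}


def concretize(abstract_plan: list[str]) -> list[str]:
    """Refine an abstract plan into numbered concrete operator instances.

    Instances are numbered per operator type (Move-1, Move-2, ...), mirroring
    the labels used in the HCBR solution figure.
    """
    counts: Counter = Counter()
    concrete: list[str] = []
    for task in abstract_plan:
        for op in HIERARCHY[task]:
            counts[op] += 1
            concrete.append(f"{op}-{counts[op]}")
    return concrete


print(concretize(["Collect", "Deliver", "Park"]))
# ['Move-1', 'Engage-1', 'Move-2', 'Disengage-1', 'Switch-1', 'Move-3']
```

Note that `HIERARCHY` is fixed when the code is written, which is exactly the kind of static hierarchy listed as a hindrance for HCBR.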

HCBR Example [Figure: retrieval and adaptation]

HCBR Example cont. [Figure: abstract task hierarchy and concrete task hierarchy. Abstract tasks include Load Task, Position-Change Task, Location-Change Task, Level-Change Task, Content Transfer Task, Junction-Change Task, Track-Change Task, and Transfer Task; concrete operators include Load, Unload, Move, Lift, Collect, Deliver, Cross, Align, Engage, Disengage, Switch, Park, and Release. Tasks and operators carry parameters such as :vehicle, :content, :source-location, :target-location, :source-container, :target-container, :speed, and :slowing-distance.]

HCBR Example cont. [Figure: solution plan made up of numbered operator instances: Unload-1, Collect-1, Park-1, Deliver-1, Move-1 to Move-6, Lift-1 to Lift-3, Cross-1 to Cross-4, Align-1, Align-2, Switch-1, Engage-1, Engage-2, Disengage-1, Disengage-2, Release-1, Release-2.]

HCBR: Assets & Hindrances • Assets of interest: • Abstract and concrete hierarchies => hierarchical problem solving • Elimination of sub-solution conflict • Hindrances of interest: • Limited type and number of concrete operators • Fixed abstraction hierarchy • Together these limit the system to a single domain

Derivational Analogy: Automated Programmer for Unix (APU) • Creates Unix scripts • Problems described using Lisp-style logic • Concept dictionary • Rule base contains programming “rules” • Replays the derivation of similar problems to create a new solution

APU Example • Concept dictionary: contains all objects relevant to the domain, arranged in ISA and CONTAINED hierarchies [Figure: abstraction hierarchy for predicates and functions (nodes include predicate, optimally-frequent, most-frequent, least-frequent, rel-operator, <, >, =, subsumed, contained, owned, belongs, occurs, descendant, member) and CONTAINED hierarchy for objects (nodes include object, directory, file, line, word, char)]
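A CONTAINED hierarchy can be modeled as a map from each object type to the types it directly contains, plus a transitive query. This sketch assumes the chain directory > file > line > word > char, which the object names in the concept dictionary suggest but which is our assumption rather than APU's actual data; `CONTAINS` and `contained_in` are hypothetical names.

```python
# Hypothetical CONTAINED hierarchy: each object type -> types it directly contains.
# The chain directory > file > line > word > char is an assumption based on the
# object names in the APU concept dictionary, not APU's actual data.
CONTAINS: dict[str, list[str]] = {
    "directory": ["file"],
    "file": ["line"],
    "line": ["word"],
    "word": ["char"],
}


def contained_in(inner: str, outer: str) -> bool:
    """True if objects of type `inner` occur (transitively) inside type `outer`."""
    stack = list(CONTAINS.get(outer, []))
    seen: set[str] = set()
    while stack:
        t = stack.pop()
        if t == inner:
            return True
        if t not in seen:
            seen.add(t)
            stack.extend(CONTAINS.get(t, []))
    return False


print(contained_in("word", "file"))   # True: words occur in lines, lines in files
print(contained_in("file", "word"))   # False
```

A query like this is what lets a rule reason that, for example, words in a file can be reached by first breaking the file into lines.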

APU Example cont.
Rule Base • Three rule types: strategies, problem-solving rules, domain-specific rules • Strategy examples: divide-and-conquer, recursively-solve, loop-over-objects • Problem-solving rule: to count the number of objects of type A, map the objects of type A into objects of type B, count the number of objects of type B, and apply the inverse mapping relation to the output of the count. • Unix-specific rule for searching for a pattern in a file:
TODO: (SET (?l :line) :SUCHTHAT (and (occurs ?l ?f) (occurs ?pat ?l)))
FILTER: (fixed-string? ?pat)
PLAN: (COMMAND (fgrep ?pat ?f))
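The Unix rule pairs a goal (TODO), an applicability test (FILTER), and a plan. A minimal sketch of trying such a rule, assuming a hypothetical `Rule` record and goal name, and treating "fixed string" as "contains no regular-expression metacharacters"; it only builds the command text and does not reproduce APU's pattern matcher.

```python
import re
import shlex
from dataclasses import dataclass
from typing import Callable, Optional

# Characters that turn a grep pattern into a regular expression. Treating any of
# them as "not a fixed string" is our assumption about APU's fixed-string? test.
_REGEX_META = re.compile(r"[.\[\]*^$\\+?(){}|]")


def is_fixed_string(pat: str) -> bool:
    return _REGEX_META.search(pat) is None


@dataclass
class Rule:
    goal: str                          # goal the rule achieves (TODO)
    accepts: Callable[[dict], bool]    # applicability test on bindings (FILTER)
    plan: Callable[[dict], str]        # command built from the bindings (PLAN)


# Hypothetical encoding of the fgrep rule above.
search_rule = Rule(
    goal="search-pattern-in-file",
    accepts=lambda b: is_fixed_string(b["?pat"]),
    plan=lambda b: f"fgrep {shlex.quote(b['?pat'])} {shlex.quote(b['?f'])}",
)


def apply_rule(rule: Rule, goal: str, bindings: dict) -> Optional[str]:
    """Return the rule's command if it matches the goal and passes its filter."""
    if goal == rule.goal and rule.accepts(bindings):
        return rule.plan(bindings)
    return None


print(apply_rule(search_rule, "search-pattern-in-file", {"?pat": "error", "?f": "log.txt"}))
# fgrep error log.txt
```

When the filter fails (the pattern is a real regular expression), some other rule, such as one planning a `grep` command, would have to be tried. Current GNU grep documents `fgrep` as obsolescent in favor of the equivalent `grep -F`.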

APU Example cont.
INPUT: ?f :file
OUTPUT: ?s :set
PRECONDITION: true
POSTCONDITION: (= ?s (SET (?w :word ?n :integers) :SUCH-THAT (and (occurs ?w ?f) (= ?n (linenumber ?w ?f))) (Tuple ?w ?n)))
• No Unix rule (operator) can do this • No problem-solving rule applies • Two strategies to choose from: 1) Divide-Vertically-and-Conquer rule 2) Divide-Horizontally-and-Conquer rule
Choose the Divide-Vertically-and-Conquer rule:
TODO: (SET (?x :text-object ?y :text-object) :SUCH-THAT ?conds (Tuple ?x ?y))
PLAN: (TEMPLATE (DIVIDE-VERTICALLY-&-CONQUER :ARGUMENTS (?x ?y) :CONDS ?conds))
The divide-vertically-and-conquer template is expanded to:
(ACHIEVE (SET (?x :type) :SUCH-THAT R1(?x))) > file1
(TEMPLATE (WHILE-LOOP :INPUT file1 :LOOP-VAR ?var1 :LOOP-BODY (ACHIEVE R2(?var1, ?y)) :OUTPUT file2))
(ACHIEVE (paste file1 file2))
R1(?x) is instantiated to (occurs ?w ?f) and R2(?var1, ?y) to (= ?n (linenumber ?var1 ?f)). Thus the while-loop template is expanded to:
WHILE read ?var1
DO
  # replace newlines by spaces
  (ACHIEVE (= ?n (linenumber ?var1 ?f))) | tr -s '\012' ' ' >> file2
DONE < file1
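The divide-vertically-and-conquer template can be read as a higher-order function: compute the x column with R1, compute each y from its x with R2, then paste the two columns side by side. The sketch below assumes in-memory text in place of file1/file2, and its function names are hypothetical; instantiated with the R1 and R2 above, it computes the word/line-number set from the problem statement.

```python
def divide_vertically_and_conquer(r1, r2, source):
    """Compute the x column with r1, the y column with r2, then 'paste' them."""
    xs = r1(source)                   # like (ACHIEVE (SET (?x ...) :SUCH-THAT R1(?x))) > file1
    ys = [r2(x, source) for x in xs]  # like the WHILE-LOOP over file1 that writes file2
    return list(zip(xs, ys))          # like (ACHIEVE (paste file1 file2))


def unique_words(text: str) -> list[str]:
    """R1 instantiated to (occurs ?w ?f): the distinct words, sorted."""
    return sorted(set(text.split()))


def line_numbers(word: str, text: str) -> list[int]:
    """R2 instantiated to (= ?n (linenumber ?w ?f)): 1-based lines containing word."""
    return [n for n, line in enumerate(text.splitlines(), start=1) if word in line.split()]


text = "the cat\nthe dog\n"
print(divide_vertically_and_conquer(unique_words, line_numbers, text))
# [('cat', [1]), ('dog', [2]), ('the', [1, 2])]
```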

APU Example cont.
Thus the solution plan is generated and must be filled in. The subtasks (ACHIEVE statements) generated by this plan are:
1. Compute a list of unique words occurring in file ?f
2. Compute a list of line numbers of the words in file ?f
3. Paste the two lists together
Through sub-goal decomposition, further sub-goals are formulated:
4. Compute a list of words occurring in file ?f
5. Remove duplicates from a list of words
6. Replace space by newline in file ?f
7. Replace newline by newline in file ?f
8. Replace tab by newline in file ?f
9. Compute a regular expression for a word in a file
10. Compute the line number of a regular expression in a file
Each remaining sub-goal except 9 has a Unix operator associated with it, so it is not decomposed further. The system has no rule for formulating a regular expression for a word, so the user is prompted to provide one. Assuming the user can, the system uses the operators associated with each sub-goal to produce the following final result.
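The expansion just described (use the operator attached to a sub-goal if there is one, otherwise decompose, otherwise ask the user) can be sketched as a recursive function. The goal names, the `OPERATORS` and `DECOMPOSITIONS` tables, and the `ask_user` callback are hypothetical stand-ins for APU's rule base, not its actual contents.

```python
from typing import Callable

# Hypothetical sub-goals that map directly onto Unix operators.
OPERATORS: dict[str, str] = {
    "space-to-newline": "tr -s ' ' '\\012'",
    "sort-words": "sort",
    "remove-duplicates": "uniq",
}

# Hypothetical decomposition rules: goal -> ordered sub-goals.
DECOMPOSITIONS: dict[str, list[str]] = {
    "unique-words": ["list-words", "remove-duplicates"],
    "list-words": ["space-to-newline", "sort-words"],
}


def expand(goal: str, ask_user: Callable[[str], str]) -> list[str]:
    """Expand a goal into operator steps: operator, else decompose, else ask."""
    if goal in OPERATORS:
        return [OPERATORS[goal]]
    if goal in DECOMPOSITIONS:
        steps: list[str] = []
        for sub in DECOMPOSITIONS[goal]:
            steps.extend(expand(sub, ask_user))
        return steps
    return [ask_user(goal)]   # no rule applies: the user must supply this step


def placeholder(goal: str) -> str:
    """Stand-in for prompting the user."""
    return f"[user-supplied step for {goal}]"


print(" | ".join(expand("unique-words", placeholder)))
# tr -s ' ' '\012' | sort | uniq
print(expand("regex-for-word", placeholder))
# ['[user-supplied step for regex-for-word]']
```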

APU Example cont. • Final result:
tr -s ' ' '\012' < ?f |
tr -s ' ' '\012' |
sort |
uniq > /tmp/file622
while read ?v624
do
  grep -n [regular-expression] ?f |
  awk -F: '{print $1}' |
  tr '\012' ' ' >> /tmp/file623
done < /tmp/file622
paste /tmp/file622 /tmp/file623

APU: Assets & Hindrances • Assets of interest: • Concept dictionary and rule base • Derivational analogy • Generative programming ability • User input • Hindrances of interest: • Rule base and concept dictionary are static

Problem Description • To automate software reuse in a typical high-level-language programming domain while overcoming the limitations of previous manual and automated reuse systems. • Sub-problems: • Create a system that is not limited to a fixed set of operators during problem solving. Current reuse systems are limited in scope by their static operator sets. • Give the system domain-independent behavior so that, as it gains knowledge, it applies to an ever-expanding set of problems. This contrasts with the examples seen, each of which applies to only one domain. • Develop a method of Dynamic Abstraction so that the system can efficiently reuse newly learned code. Such a feature is needed to introduce the new operators that new domains require.

Problem Description: Dynamic Abstraction • Requires a learning algorithm that can change more than the case base. • The abstract operators the system handles must be updated to use new concrete operators that require an abstraction not previously offered. • Abstract and concrete hierarchies must be reorganized (extended) as new operators require. • The translation algorithm that abstracts the problem representation must recognize any new operators introduced to the system, so that they can be used to describe problems (tentative).
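One way to make the hierarchy itself learnable is to store it as mutable data rather than fixed code, so new abstract nodes and concrete operators can be attached at run time. A minimal sketch; the class, its methods, and the operator names are hypothetical, and the SORT / SORT_DATA_TYPE pair echoes the learning discussion later in this talk.

```python
from collections import defaultdict
from typing import Optional


class AbstractionHierarchy:
    """A mutable tree of abstract operators, with concrete operators attached."""

    def __init__(self) -> None:
        self.children = defaultdict(list)   # abstract operator -> sub-abstractions
        self.concrete = defaultdict(list)   # abstract operator -> concrete operators

    def add_abstraction(self, name: str, parent: Optional[str] = None) -> None:
        """Register an abstract operator, optionally beneath an existing one."""
        self.children.setdefault(name, [])
        if parent is not None and name not in self.children[parent]:
            self.children[parent].append(name)

    def add_concrete(self, abstraction: str, operator: str) -> None:
        """Attach a (possibly newly learned) concrete operator."""
        if operator not in self.concrete[abstraction]:
            self.concrete[abstraction].append(operator)

    def operators_under(self, abstraction: str) -> list[str]:
        """All concrete operators reachable below an abstraction, depth first."""
        ops = list(self.concrete.get(abstraction, []))
        for child in self.children.get(abstraction, []):
            ops.extend(self.operators_under(child))
        return ops


h = AbstractionHierarchy()
h.add_abstraction("SORT")
h.add_concrete("SORT", "sort_strings")
# Later, after similar problems recur, a new abstraction is learned:
h.add_abstraction("SORT_DATA_TYPE", parent="SORT")
h.add_concrete("SORT_DATA_TYPE", "sort_students_by_last_name")
print(h.operators_under("SORT"))
# ['sort_strings', 'sort_students_by_last_name']
```

Retrieval that searches "everything under SORT" then picks up the learned operator without any change to the retrieval code itself.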

Solution Overview • The exact form and operation of the solution have not yet been determined • Five sub-systems must be implemented to achieve the total solution: • Problem representation • Abstraction scheme • Retrieval system • Adaptation/Generation system • Learning and system update algorithm

Solution Algorithm • The tentative solution algorithm is as follows:
add(problems, problem_description)
while not_empty(problems)
    cur_prob := remove_front(problems)
    abstract_p := abstract(cur_prob)
    case_p := retrieve(abstract_p)
    if empty(case_p) and not decomposable(cur_prob)
        append(solution_plan, generate_solution(cur_prob))
    else if empty(case_p)
        add_front(problems, decompose(cur_prob))
    else
        append(solution_plan, adapt(case_p))
    end if
end while
compose_solution(solution_plan)
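The tentative algorithm can be rendered as a runnable worklist sketch. Since the slides leave every sub-system undetermined, each one is passed in as a function; `None` stands for an empty retrieval, and `adapt` is assumed to receive the current problem as well as the case so it can instantiate variable names. The toy stand-ins in the usage are hypothetical.

```python
from collections import deque
from typing import Callable


def solve(problem, *, abstract: Callable, retrieve: Callable, decomposable: Callable,
          decompose: Callable, adapt: Callable, generate_solution: Callable,
          compose_solution: Callable):
    """Worklist version of the tentative solution algorithm."""
    problems = deque([problem])
    solution_plan = []
    while problems:
        cur_prob = problems.popleft()            # take (and remove) the front problem
        case_p = retrieve(abstract(cur_prob))    # None means no case was retrieved
        if case_p is None and not decomposable(cur_prob):
            solution_plan.append(generate_solution(cur_prob))
        elif case_p is None:
            # Put the sub-problems at the front, keeping their order.
            problems.extendleft(reversed(list(decompose(cur_prob))))
        else:
            solution_plan.append(adapt(case_p, cur_prob))
    return compose_solution(solution_plan)


# Toy stand-ins (hypothetical) for the student-record example.
CASES = {"RETRIEVE": "getlines", "PRINT": "print_lines"}
SUBPROBLEMS = {"SORT_FILE": ["RETRIEVE", "SORT", "PRINT"]}

plan = solve(
    "SORT_FILE",
    abstract=lambda p: p,
    retrieve=CASES.get,
    decomposable=lambda p: p in SUBPROBLEMS,
    decompose=lambda p: SUBPROBLEMS[p],
    adapt=lambda case, p: case,
    generate_solution=lambda p: f"generated({p})",
    compose_solution=list,
)
print(plan)  # ['getlines', 'generated(SORT)', 'print_lines']
```

Removing the front problem on each iteration is what guarantees termination once every leaf problem is either retrieved or generated.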

Problem Representation • Requirements: • ease of use and understanding, expressive power, easily abstracted • Possibilities: • First-order logic • Abstract operators • Specifications (formal/informal) • Some combination of the above (promising) • Other

Abstraction Scheme • Requirements: • abstraction algorithm separate from structure, modifiable structure (tree[s]), expressive (for use in problem representations) • Possibilities: • Abstraction hierarchy • Rule base • Combination • Other

Retrieval System • Requirements: • small, expressive • must incorporate new knowledge (can the representation of cases be made independent of the abstraction?) • easily adaptable components • distance/similarity measure • Possibilities: • Derivational traces, models (Britanik, Marefat), constraint matching (all undecided)

Adaptation/Generation System • Requirements: • able to solve problems from first principles • able to accept user input • uses the abstraction hierarchy so that past experience is useful • Possibilities: • Use the rule base for generation • Others still to be identified

Learning/Update Algorithm • Requirements: • Grow case base similar to other approaches (learn only new cases) • Recognize novel problems in order to update systems • Have knowledge of concrete/abstract hierarchies in order to make updates • Identify common operators that require the abstraction hierarchy to be changed • Possibilities: • Ideas?

Example of Desired Reuse • Consider the example of sorting a file of student records. Each line of the file holds a student's first name, last name, and GPA, and the file is to be sorted by last name. Assuming an APU-like problem definition structure, the problem might be expressed as:
INPUT: ?f :file
OUTPUT: ?g :file
PRECONDITION: (and (?l :line) (?l has_elements(?fn :string ?ln :string ?m :float)))
POSTCONDITION: (and (?g contains (elements_of (?f))) (sorted (elements_of (?g)) wrt(?ln)))
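The POSTCONDITION can also be read as an executable check on a candidate output. This sketch assumes whitespace-separated lines of the form `first last gpa` and reads "?g contains the elements of ?f" as "?g is a permutation of the lines of ?f"; the function names and sample records are illustrative.

```python
from collections import Counter


def parse_record(line: str) -> tuple[str, str, float]:
    """PRECONDITION: a line has elements (?fn :string ?ln :string ?m :float)."""
    fn, ln, m = line.split()
    return fn, ln, float(m)


def satisfies_postcondition(f_lines: list[str], g_lines: list[str]) -> bool:
    """?g contains exactly the elements of ?f, sorted with respect to ?ln."""
    same_elements = Counter(f_lines) == Counter(g_lines)
    last_names = [parse_record(line)[1] for line in g_lines]
    return same_elements and last_names == sorted(last_names)


f = ["Ada Lovelace 3.9", "Alan Turing 4.0", "Grace Hopper 3.8"]
g = ["Grace Hopper 3.8", "Ada Lovelace 3.9", "Alan Turing 4.0"]
print(satisfies_postcondition(f, g))  # True
print(satisfies_postcondition(f, f))  # False: not sorted by last name
```

A check of this kind could let the system validate a composed solution against the specification before storing it as a new case.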

Reuse Example • This example is difficult because of the nature of the file data: we are sorting records, not individual fields. Thus we need either a mechanism to sort objects like the line object used here or a rule that shows the system how to sort such objects. • Other concerns include how much knowledge the system currently has. Has it solved sorting problems before? Has it sorted non-primitive data types before? For the sake of example, we will choose the knowledge the system has so as to illustrate its desired attributes.

Reuse Ex. Problem decomposition • To display as much of the system's behavior as possible, assume that no case is retrieved for the initial problem. • Following the algorithm, the initial problem is decomposed into three sub-problems, which are entered into the problems list: • 1) Retrieve lines of ?f • 2) Sort the lines by last name • 3) Print the sorted lines to ?g

Retrieving lines from file • The first sub-problem encountered is “retrieve lines of ?f” • The “retrieve line” structure of the problem leads the abstraction system to the RETRIEVE abstraction: RETRIEVE(?l :line ?f :file) • Assume that several cases are retrieved from the case base: getline(file, string), getlines(file, vector), getnlines(file, n, vector) • These cases retrieve one line, all the lines, and a given number of lines from a file, respectively. Here the system knows that all the lines of the file are to be retrieved, so the retrieve operation returns the second function. • Lastly, the adapt operation instantiates the function with the correct variable names for this particular problem. • This partial solution to the retrieve sub-problem is added to solution_plan and the solution process continues.
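Selecting among the three retrieved cases can be sketched as a lookup on how many lines the problem requires. The Python stand-ins return their results instead of filling an output parameter, and their bodies, the `CASES` index, and `select_case` are assumptions, since the slides give only the signatures.

```python
import io
from typing import Callable, TextIO


def getline(file: TextIO) -> str:
    """Case 1: retrieve one line."""
    return file.readline().rstrip("\n")


def getlines(file: TextIO) -> list[str]:
    """Case 2: retrieve all the lines."""
    return [line.rstrip("\n") for line in file]


def getnlines(file: TextIO, n: int) -> list[str]:
    """Case 3: retrieve at most n lines."""
    return [line.rstrip("\n") for _, line in zip(range(n), file)]


# Hypothetical index from "how many lines are required" to the matching case.
CASES: dict[str, Callable] = {"one": getline, "all": getlines, "n": getnlines}


def select_case(requirement: str) -> Callable:
    return CASES[requirement]


buf = io.StringIO("Ada Lovelace 3.9\nAlan Turing 4.0\n")
lines = select_case("all")(buf)   # the problem asks for all lines of ?f
print(lines)  # ['Ada Lovelace 3.9', 'Alan Turing 4.0']
```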

Sorting lines • The next problem in the problems list is sorting the lines. • At this point the system knows that it has a list of lines and must sort them by the second field, ln (last name). • Assume that the system abstracts the sorting-lines problem and no matching case is returned, so the problem must be decomposed. • It is decomposed into generating a list of type Student and then sorting the list of Student objects. • Now assume that the system cannot find a past case that converts a list of strings to a list of a non-primitive data type. So the problem is further decomposed into defining a data type and then creating a list of objects of that type. • The Student data type is abstracted as DATA_TYPE(string, string, float), because nothing else is known about it.

Data-type retrieval • There may be several system-defined data types with such member variables, so a distance measure must be used to differentiate between them. • Consider the example data types and their interfaces given below:
public class Student {
    string first_name;
    string last_name;
    float gpa;
    bool full_time;
    public Student();
}
public class Employee {
    string first_name;
    string last_name;
    float pay_raise_percentage;
    public Employee();
    public Employee(string, string, float);
}

Data-type retrieval • At first glance the Student type seems the obvious choice, but a closer look shows that Employee is better, because it offers a constructor taking all three variables as parameters. This class needs the fewest adaptations, so it is chosen. • The only differences between the desired type and the retrieved one are the class and variable names (e.g. pay_raise_percentage becomes gpa), which are easily adapted; renaming is not strictly necessary but is done for the user's clarity. Had Student been chosen instead, its extra boolean variable full_time would have to be dropped and a three-argument constructor added. • Retrieved lines can now be entered into Student records. Next, a list of Student objects must be created from the list of strings. For brevity, assume the algorithm iterates on this sub-problem and generates a solution that loops through the list of lines, adding a corresponding Student object to a new list.
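The "fewest adaptations" criterion can be approximated by a distance measure that counts member additions and removals and charges for a missing constructor over the required fields. The unit weights and the `TypeCase` record are assumptions for illustration; with them, Employee scores closer than Student, matching the choice above.

```python
from dataclasses import dataclass, field


@dataclass
class TypeCase:
    name: str
    members: list[tuple[str, str]]                               # (type, name)
    constructors: list[list[str]] = field(default_factory=list)  # parameter types


def adaptation_cost(case: TypeCase, wanted: list[str]) -> int:
    """Rough count of edits to turn `case` into DATA_TYPE(*wanted).

    Renaming members is treated as free, as in the text; adding or dropping a
    member, or writing a missing constructor, costs 1 each (assumed weights).
    """
    remaining = [t for t, _ in case.members]
    cost = 0
    for t in wanted:
        if t in remaining:
            remaining.remove(t)
        else:
            cost += 1        # member must be added
    cost += len(remaining)   # extra members must be dropped
    if wanted not in case.constructors:
        cost += 1            # constructor over all wanted fields must be written
    return cost


student = TypeCase("Student",
                   [("string", "first_name"), ("string", "last_name"),
                    ("float", "gpa"), ("bool", "full_time")],
                   [[]])
employee = TypeCase("Employee",
                    [("string", "first_name"), ("string", "last_name"),
                     ("float", "pay_raise_percentage")],
                    [[], ["string", "string", "float"]])

wanted = ["string", "string", "float"]
best = min([student, employee], key=lambda c: adaptation_cost(c, wanted))
print(best.name)  # Employee (cost 0, versus 2 for Student)
```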

Sorting objects • The next sub-problem in the problems list is to sort a list of Student objects. • This is abstracted as SORT(list, Student, ln), which we take to mean “sort a list of Student records by last name”. • Assume that no such operator exists, since the Student type is newly defined, and so no matching cases are found. The system cannot decompose the problem any further, so it must generate a solution from scratch. • Assume the Rule Base contains a problem-solving rule for sorting objects: if there is no sorting algorithm for objects A, map objects A to objects B for which a sorting algorithm exists, sort, and apply the inverse mapping. • No such mapping can be retrieved from the system, so the user is asked whether he/she can define one. Assume a competent programmer for whom this is well within reach, who answers yes. • The system completes the sub-problem using a string sorting function plus the mapping and inverse mapping functions, which the user will define later. This solves the sorting-lines problem.
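The problem-solving rule used here (map objects A to objects B that can already be sorted, sort, then apply the inverse mapping) is essentially the decorate-sort-undecorate idiom. A sketch assuming a hypothetical `Student` record; the tab-separated key format is our assumption of a mapping a user might supply, chosen so that plain string order sorts by last name and the mapping can be inverted exactly.

```python
from dataclasses import dataclass


@dataclass
class Student:
    first_name: str
    last_name: str
    gpa: float


def to_key_string(s: Student) -> str:
    """Mapping A -> B: last name first, so plain string order sorts by last name."""
    return f"{s.last_name}\t{s.first_name}\t{s.gpa!r}"


def from_key_string(k: str) -> Student:
    """Inverse mapping B -> A."""
    last, first, gpa = k.split("\t")
    return Student(first, last, float(gpa))


def sort_students(students: list[Student]) -> list[Student]:
    keys = [to_key_string(s) for s in students]   # map
    keys.sort()                                    # reuse an existing string sort
    return [from_key_string(k) for k in keys]      # inverse map


roster = [Student("Alan", "Turing", 4.0), Student("Ada", "Lovelace", 3.9),
          Student("Grace", "Hopper", 3.8)]
print([s.last_name for s in sort_students(roster)])  # ['Hopper', 'Lovelace', 'Turing']
```

Because the tab character sorts below every printable character, a shorter last name sorts before a longer one that extends it (for example "Lee" before "Leeds"). In idiomatic Python the same effect is usually obtained directly with `sorted(students, key=lambda s: s.last_name)`.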

Printing • The third sub-problem is easy to abstract: PRINT(list, file) • This abstracts the concept of printing a list of objects to a file, one per line. • The abstraction retrieves several functions for printing to a file, many of which are suitable. • Assume one such operator is used and that adaptation produces a loop that writes each Student to the file as a line containing first_name, last_name, and gpa. • Now all of the problems have been solved and all that remains is to compose the solution.

Solution Composition • We have seen the system develop a plan that can be paraphrased as follows: 1. Get all of the lines from the input file and store them in a list 2. Store every string as a Student object in a new list 3. Map Student objects to strings in a list 4. Sort the list of Student-strings 5. Map Student-strings back to Students 6. Print the Student list to the output file • This is an intuitive plan, one a programmer might devise, but it must still be encoded as a program. The plan clearly has gaps; we assume they can be filled by further sub-problems and sub-goals arising from the preconditions of the operators used, or from richer problem descriptions or abstractions than those described here. Either way, the final result should be an easily executed program that does what was asked. • All that remains in our example is a discussion of the system's dynamic learning methods
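For concreteness, one way the six-step plan might be encoded is shown below. It is a hand-written sketch under the same assumptions as the example (whitespace-separated `first last gpa` lines), not output of the proposed system; steps 3 to 5 are collapsed into Python's key-based sort, which plays the role of the mapping and inverse mapping.

```python
from dataclasses import dataclass


@dataclass
class Student:
    first_name: str
    last_name: str
    gpa: float


def sort_student_lines(lines: list[str]) -> list[str]:
    # 1. The lines of the input file, stored in a list (blank lines skipped).
    lines = [line.strip() for line in lines if line.strip()]
    # 2. Store every string as a Student object in a new list.
    students = []
    for line in lines:
        first, last, gpa = line.split()
        students.append(Student(first, last, float(gpa)))
    # 3-5. Map to last-name keys, sort, and map back; the key-based sort
    #      stands in for the explicit mapping/inverse-mapping pair.
    students.sort(key=lambda s: s.last_name)
    # 6. Format each Student as an output line.
    return [f"{s.first_name} {s.last_name} {s.gpa}" for s in students]


def sort_student_file(in_path: str, out_path: str) -> None:
    """Thin file wrapper: read ?f, write the sorted records to ?g."""
    with open(in_path) as f:
        result = sort_student_lines(f.readlines())
    with open(out_path, "w") as g:
        g.writelines(line + "\n" for line in result)
```

Keeping the file handling in a thin wrapper leaves the composed logic testable on in-memory lists.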

Issues • Filling in holes in adaptation structure: • Generating preconditions and sub-goals • Defining when rules are used as opposed to using past cases • How past cases are used in adaptation

Learning (onwards to dynamic abstraction) • The operations and data type retrieved and adapted in this problem were mostly trivial, with the exception of the sorting scheme developed; i.e., the operators are not new, except perhaps the overall file-sorting solution, which might be added to the sorting hierarchy of operations. • Perhaps the most interesting solution derived is the sorting of system- or user-defined data types. A problem-solving rule was used because no operators were available. However, as more similar problems are encountered, the system should be able to add such a SORT operation to the abstraction hierarchy, perhaps as SORT_DATA_TYPE, with concrete operators beneath it that define the data type and the required mapping.

Future Work • Continue experimentation with reuse examples to determine the desired attributes and potential pitfalls of the system. • Begin to develop representation, abstraction, retrieval, adaptation, and learning algorithms.
