
Component Search and Retrieval






Presentation Transcript


  1. Component Search and Retrieval Advanced Reuse Seminars Eduardo Cruz

  2. Information Retrieval - 1948 • Structured Documents • Unstructured Documents • No software documentation standard • Semi-Structured Documents Calvin Northrup Mooers

  3. Mooers' Law: “An information retrieval system will tend not to be used whenever it is more painful and troublesome for a customer to have information than for him not to have it,” 1959 Calvin Northrup Mooers

  4. Mass Produced Software Components [McIlroy, 1968]

  5. “software industry is weakly founded, and that one aspect of this weakness is the absence of a software components subindustry” [McIlroy, 1968]

  6. “The storage and retrieval of software assets is nothing but a specialized form of information storage and retrieval” [Mili, 1998]

  7. Software Library • Browsing – Inspecting without a predefined criterion • Retrieval – Satisfy a predefined matching criterion

  8. Classification Scheme • Facet-based • Better than hierarchical classification • Manual classification different facets • Automatic classification • Controlled Vocabulary • Semantic information • Uncontrolled Vocabulary • Big software libraries • Little or no descriptors
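
A minimal sketch of facet-based retrieval over a controlled vocabulary. The facet names (`function`, `object`, `medium`), the asset names, and the in-memory library are hypothetical and chosen only for illustration; they do not come from the slides.

```python
# Hypothetical facet descriptors; facet names and terms are illustrative only.
LIBRARY = {
    "QuickSort.java": {"function": "sort", "object": "array", "medium": "memory"},
    "MergeFiles.java": {"function": "merge", "object": "file", "medium": "disk"},
    "SortFile.java": {"function": "sort", "object": "file", "medium": "disk"},
}


def facet_search(library, **query):
    """Return the assets whose descriptor matches every facet in the query."""
    return sorted(
        name
        for name, facets in library.items()
        if all(facets.get(facet) == term for facet, term in query.items())
    )


print(facet_search(LIBRARY, function="sort"))                 # ['QuickSort.java', 'SortFile.java']
print(facet_search(LIBRARY, function="sort", medium="disk"))  # ['SortFile.java']
```

Each facet narrows the result independently, so a query may fix one facet or several. A single hierarchical path does not offer that freedom.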

  9. Recall and Precision • High Precision – Most retrieved elements are relevant • High Recall – Few elements left behind • Spreading Activation (Relaxed Search) – Related matches are retrieved • Coverage – The average number of assets that are visited over the total size of the library
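
A short sketch of how precision and recall could be computed for a single query. The asset identifiers and the relevance judgments are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Precision = relevant retrieved / retrieved; recall = relevant retrieved / relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 assets retrieved, 3 of them relevant,
# out of 6 relevant assets in the whole library.
retrieved = ["a1", "a2", "a3", "a7"]
relevant = ["a1", "a2", "a3", "a4", "a5", "a6"]
p, r = precision_recall(retrieved, relevant)
print(p, r)  # 0.75 0.5
```

Here most retrieved assets are relevant (high precision), but half of the relevant assets are left behind (moderate recall).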

  10. Asset Representation • Library representation is made in full knowledge of the artifact. User representation is made in ignorance of the artifact • Asset representation is purposefully abstract to capture important features while overlooking minor or irrelevant details • The term “asset surrogate” is used in the retrieval literature

  11. Asset retrieval Goals • Exact retrieval – Black box reuse • Approximate retrieval – White box reuse • Generative modification – Reusing the design • Compositional modification – using building blocks of the retrieved asset

  12. Information usually not included • Interface description • Non-functional requirements • Interoperability

  13. Situational Model × System Model • Component retrieval model [Lucrédio et al., 2004]

  14. “Repository representation is made in full knowledge of the artifact at hand” “User representation is made in ignorance of the artifact” [Mili, 1998]

  15. Scott Henninger

  16. Tools

  17. Component Search Tools • Web: Delphi Search Engine, Ispey, CSourceSearch.net (2004), Gonzui, SourceBank, Koders (2004), Codase (2005) • Applications: Agora (1998), Codebroker (2002), Koders Enterprise (2004), Maracatu (2005)

  18. Delphi Search Engine

  19. Ispey.com

  20. SPARS-J – (2003) • Filter

  21. SourceBank Filter

  22. CSourceSearch.Net – (2004)

  23. Koders.com – (2004)

  24. CODASE – Launched Sep 9, 2005 • Multiple Search Options • Example Searches • Browsing • “…based on the number of people in your company, starting from $5,000 USD”

  25. CODASE - Browsing

  26. Other Tools

  27. AGORA - Location and Indexing (1998) [figure labels: Internet, JavaBeansAgent, JavaBeansIntrospector, Index, AltaVista Search Index Server, Filter, AltaVista Query Server, Web Server]

  28. Component Rank (1998) • Graph G: nodes v, edges e • Weight w • Distribution ratio d • Example graph with nodes V1, V2, V3: D12 = 0.5, D13 = 0.5, D23 = 1, D31 = 1; weights shown in the figure: 0.4 and 0.2
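
A minimal sketch of a weight computation by repetition on the three-node example from this slide. It assumes that a ratio such as D12 is the share of V1's weight passed along an edge from V1 to V2; the edge directions are inferred from the subscripts and are not stated in the transcript. The function name and the iteration count are illustrative. This is a plain repeated redistribution of weights, not necessarily the exact procedure of the Component Rank system.

```python
def component_rank(ratios, iterations=100):
    """Repeatedly redistribute node weights along edges until they settle.

    ratios[(src, dst)] is the distribution ratio: the share of src's weight
    passed to dst. Ratios leaving each node are assumed to sum to 1.
    """
    nodes = sorted({node for edge in ratios for node in edge})
    weights = {node: 1.0 / len(nodes) for node in nodes}  # start from equal weights
    for _ in range(iterations):
        updated = {node: 0.0 for node in nodes}
        for (src, dst), ratio in ratios.items():
            updated[dst] += weights[src] * ratio
        weights = updated
    return weights


# Distribution ratios taken from the slide; directions are an assumption.
ratios = {
    ("V1", "V2"): 0.5,
    ("V1", "V3"): 0.5,
    ("V2", "V3"): 1.0,
    ("V3", "V1"): 1.0,
}
ranks = component_rank(ratios)
print({node: round(weight, 3) for node, weight in ranks.items()})
# {'V1': 0.4, 'V2': 0.2, 'V3': 0.4}
```

Under these assumptions the weights settle at 0.4, 0.2 and 0.4, which agrees with the values 0.4 and 0.2 that appear in the figure.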

  29. “Classes defining data structures and their containers are highly ranked”

  30. Clustered Component Graph • Original nodes: V1 to V7 • V1 ≡ V4, V2 ≡ V6 • Clustered nodes: V’14, V’26, V’3, V’5, V7

  31. NO MORE MULTIPLE DISCONNECTED COMPONENTS [figure: component graph with nodes V1 to V7]

  32. Component Rank System Architecture (.java file ≡ component) • INPUT • (1) Similarity Measurement • (2) Clustering • (3) Use Relation Extraction • (4) Component Graph Construction • (5) Component Rank Computation by Repetition • (6) De-Clustering to Original Component Graph • OUTPUT: Order of Weights ≡ Component Rank of .java files

  33. Simple Copied Components [figure: component graphs over A, B, X, Y and their copies A’, B’, X’, Y’; legend: Copied Components, Other Components; panels: Non-clustered Component Graph, Clustering Before Weight Computation, Clustering After Weight Computation; weights shown: 1/4, 1/4, 1/4, 1/4 and 1/6, 1/3, 1/6, 1/3]

  34. DO NOT COUNT SIMPLY DUPLICATED COMPONENTS

  35. Copied AND MODIFIED Components [figure: component graphs over A, B, C, X, Y and the copies A’, B’, C’, X’, Y’; legend: Copied and Modified Components, Original Components, Other Components; panels: Non-clustered Component Graph, Clustering Before Weight Computation (label appears twice); weights shown: 2/5, 1/5, 1/5, 1/5, 1/5, 1/3, 1/5, 1/6, 1/6, 1/6]

  36. Beyond Searching and Browsing • Searching and browsing • Require users to initiate the information seeking process • Information access and Information Delivery

  37. CodeBroker – (2001) • Component repositories are often so large that software developers cannot learn about all of the components • Component repositories are not static • New components added • Old components updated • Context-Aware browsing

  38. • May not have sufficient knowledge about the reuse repository • May perceive that reuse costs more than developing from scratch • May not be able to use the repository by formulating a proper query • May not be able to understand the found components

  39. L4: Entire Information Space • Information Islands • Belief • Vaguely Known • Well Known • Unknown components

  40. CodeBroker • L4: Entire Information Space • L3: Belief • L2: Vaguely Known • L1: Well Known • Information Use: L1 – Use by Memory; L2 – Use by Recall; L3 – Use by Anticipation; L4 – Use by Delivery • Already Known Components • Task Relevant Information • Irrelevant Components

  41. Program Aspects • Concept • Formal • Informal • Indentation, comments, identifier names (semantic) • Executability • Code • Constraint environment • Signature

  42. Information delivery • Feedback • After execution of the action • Feedforward • Affects the execution of the action

  43. Information delivery • Interruptive • Noninterruptive

  44. Latent Semantic Analysis (LSA) • Synonymy • Polysemy • “Text documents and queries are represented as vectors in the semantic space, based on the words contained and the similarity between a query and a document is determined by the distance of their respective vectors”
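
A minimal sketch of the vector comparison step described in the quote: documents and a query become term vectors, and similarity is measured between those vectors. It uses cosine similarity on raw term counts, not full LSA; LSA would first project the vectors into a reduced semantic space (typically via a truncated singular value decomposition), which is what lets it cope with synonymy and polysemy. The component names and descriptions are hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Represent a text as a bag-of-words vector of term counts."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank(query, documents):
    """Order document names by decreasing similarity to the query."""
    q = term_vector(query)
    scores = {name: cosine_similarity(q, term_vector(text)) for name, text in documents.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical component descriptions (e.g. drawn from comments and identifiers).
documents = {
    "Stack.java": "push pop element stack",
    "Sorter.java": "sort array element compare",
    "FileReader.java": "read file line buffer",
}
print(rank("sort array", documents)[0])  # Sorter.java
```

Because this sketch matches literal terms only, a query such as "order list" would score zero against `Sorter.java`; that gap is the synonymy problem the slide lists.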

  45. Comments • Signature • Discourse model • User model

  46. Koders Enterprise – (2004)

  47. M.A.R.A.C.A.T.U. – Modern Architecture for Retrieving All Components At The Universe (2005)
