
A Framework for Examining Topical Locality in Object-Oriented Software

2012 IEEE International Conference on Computer Software and Applications. p76004546 江怡岑, P76004685 王于庭.


Presentation Transcript


  1. A Framework for Examining Topical Locality in Object-Oriented Software 2012 IEEE International Conference on Computer Software and Applications p76004546 江怡岑 P76004685 王于庭

  2. OUTLINE • Introduction • Background & Related work • Framework • Dataset and Experimental Procedure • Static analysis results • Conclusions

  3. INTRODUCTION • Program comprehension is a key developer activity during software maintenance. • Topic models rely on lexical information to identify topics that are semantically related to high-level domain concepts. • LSI (latent semantic indexing) • LDA (latent Dirichlet allocation)

  4. INTRODUCTION • While topics reflect semantic relatedness, it is believed that humans evolve spatial cognition strategies to navigate the code base. • For object-oriented (OO) systems built on the principle of encapsulation, the entities should be spatially organized in a way that reflects the topics of the software.

  5. INTRODUCTION • The tenet of “topical locality” • Spatial relatedness entails semantic relatedness. • The tenet is so basic that in many cases it is not mentioned. • When the tenet is mentioned, its validity is not measured explicitly. • Our goal is to measure the extent to which this key tenet holds for OO systems. • We propose a framework to examine to what extent three relationships of topical locality hold in large-scale open-source projects.

  6. BACKGROUND and Related Work • A. Way-finding in Code Base • B. Relating Spatial and Semantic Cues • C. Topical Locality Applied in Software Engineering Tools

  7. BACKGROUND and Related Work • A. Way-finding in Code Base • A developer comprehending a code base can therefore be thought of as continually trying to answer way-finding questions. • Moonen has examined way-finding in software and extended the concept of legibility to software.

  8. BACKGROUND and Related Work • B. Relating Spatial and Semantic Cues • We are interested in the interplay of different cues so that they can be effectively synthesized. • We focus on the relationship between two types of cues. • Spatial. • Semantic. • Spatial + Semantic = “topical locality” • Software entities should be neither randomly named nor randomly placed. • Source code entities should be spatially organized to reflect the semantics of the software.

  9. BACKGROUND and Related Work • C. Topical Locality Applied in Software Engineering Tools • The idea of topical locality plays an important role in building a number of software engineering tools. • Survey three tools • Code Indexers • Code Visualizers • Code Summarizers

  10. BACKGROUND and Related Work • Code Indexers • An indexer takes source code and generates profiles of the code for later searching. • Should header comments be indexed? • We want to address how well the name and header comments represent the target code entity’s topic.

  11. BACKGROUND and Related Work

  12. BACKGROUND and Related Work • Code Visualizers • Once a relevant code line is located, its surroundings provide valuable contextual information for the developer. • Examining the topical locality of a contiguous fragment allows us to assess to what extent the code line indicates the topic of its surroundings.

  13. BACKGROUND and Related Work

  14. BACKGROUND and Related Work • Code Summarizers • A summarizer generates a snapshot of the source code in order to reduce the cost for developers to read and understand the staggering amount of software repository information. • Our contribution is to measure the degree of topical locality of the snapshot.

  15. BACKGROUND and Related Work

  16. FRAMEWORK overview • Framework Overview

  17. FRAMEWORK research questions • Research questions • RQ1 : Which better conveys the class body’s topic: class name, header comments, or a combination of both? • RQ2 : Can a code line indicate the topic of its surroundings? • RQ3 : Can a contiguous code fragment serve as a snapshot of the entire class?

  18. FRAMEWORK method • The independent variables identify spatial relationships. • The dependent variable is semantic relatedness. • Three measures: • TFIDF cosine similarity • query term probability • document overlap • We treat source code as documents. • Each measure outputs a score in the range [0, 1].

  19. FRAMEWORK three measures (1/3) • TFIDF scheme – text mining model • $q_i = tf_i(Q) \times idf_i$ • $w_i = tf_i(W) \times idf_i$ • $tf_i$ refers to the term frequency of $term_i$ • $idf_i$ is the inverse document frequency, $idf_i = \log_2(t + 1/df_i)$, where $t$ is the total number of documents in the corpus and $df_i$ is the number of documents in which $term_i$ occurs.
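
The weighting above can be turned into a similarity score roughly as follows. This is a minimal sketch, not the authors' implementation: the slide does not show the cosine step, so the standard cosine of the two weight vectors is assumed, and the slide's idf expression is read here as log2((t + 1) / df_i), which is only one possible grouping. The function names and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def idf_weights(corpus):
    """Compute an idf value per term from a corpus given as a list of token lists."""
    t = len(corpus)  # total number of documents
    df = Counter(term for doc in corpus for term in set(doc))
    # ASSUMPTION: the slide's formula is read as log2((t + 1) / df_i).
    return {term: math.log2((t + 1) / n) for term, n in df.items()}


def tfidf_vector(tokens, idf):
    """Weight each term by term frequency times idf (q_i or w_i on the slide)."""
    tf = Counter(tokens)
    return {term: count * idf.get(term, 0.0) for term, count in tf.items()}


def tfidf_cosine(query, target, idf):
    """Standard cosine similarity between the TFIDF vectors of Q and W."""
    q = tfidf_vector(query, idf)
    w = tfidf_vector(target, idf)
    dot = sum(q[term] * w[term] for term in q.keys() & w.keys())
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    norm_w = math.sqrt(sum(v * v for v in w.values()))
    if norm_q == 0.0 or norm_w == 0.0:
        return 0.0
    return dot / (norm_q * norm_w)


if __name__ == "__main__":
    # Hypothetical toy corpus: each "document" is a tokenized code entity.
    corpus = [["parse", "file"], ["parse", "token"], ["draw", "shape"]]
    idf = idf_weights(corpus)
    print(tfidf_cosine(["parse", "file"], ["parse", "token"], idf))
```

Because all weights are non-negative under this reading, the score stays in [0, 1], which matches the range stated for the three measures.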

  20. FRAMEWORK three measures (2/3) • Query term probability • measures the likelihood of a term in the query/source being present in the target document.
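
The slide gives no formula for this measure, so the following is only one plausible reading: the fraction of query term occurrences that also appear in the target document. The function name and the example tokens are hypothetical.

```python
def query_term_probability(query, target):
    """Fraction of query tokens that occur at least once in the target.

    ASSUMPTION: this simple hit ratio stands in for the measure named on
    the slide, whose exact definition is not shown there.
    """
    if not query:
        return 0.0
    target_terms = set(target)
    hits = sum(1 for term in query if term in target_terms)
    return hits / len(query)


if __name__ == "__main__":
    # Hypothetical tokens from a class name (query) and a class body (target).
    print(query_term_probability(["open", "file", "reader"],
                                 ["file", "reader", "buffer", "close"]))
```

Unlike the cosine measure, this ratio is asymmetric: swapping query and target generally changes the score.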

  21. FRAMEWORK three measures (3/3) • Document overlap • a set-based measure that quantifies the amount of overlap between two documents Q and W
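
The slide does not state which set-based formula is used. As an illustration only, the sketch below uses the Jaccard index, a common set-overlap measure, as a stand-in; the function name is hypothetical.

```python
def document_overlap(query, target):
    """Set-based overlap between documents Q and W.

    ASSUMPTION: the Jaccard index |Q ∩ W| / |Q ∪ W| is used here as a
    stand-in; the slide does not give the exact formula.
    """
    q_terms, w_terms = set(query), set(target)
    union = q_terms | w_terms
    if not union:
        return 0.0
    return len(q_terms & w_terms) / len(union)


if __name__ == "__main__":
    print(document_overlap(["a", "b", "c"], ["b", "c", "d"]))
```

Term frequency is ignored by construction, so this measure complements the two frequency-sensitive measures above.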

  22. Dataset and Experimental Procedure • LOC : the lines of code • COM : the lines of comments • CCs : the number of classes

  23. Dataset and Experimental Procedure • Use a source code indexer to process the code base of the selected projects. • The indexing process results in profiles that store partial but important information from the source code. • We calculate the three semantic relatedness measures (TFIDF-Cos, Prob, and Overlap) based on the profiles.

  24. RQ1 • Can class name (N) and/or header comment (H) convey the topic of the class body (B)? • Calculate the lexical similarity for (N,B), (H,B), (NH,B)

  25. RQ2 • Can a code line indicate the topic of its surroundings? • For a randomly selected code line (L), we take a contiguous code fragment of 30 lines as its surroundings (S) and select from the same file another 30-line contiguous code fragment (R). • Compare the lexical similarity of (L,S) with that of (L,R). • Only classes with at least 70 LOC are considered.
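
The sampling step for RQ2 might look like the sketch below. Several details are assumptions not stated on the slide: S is taken as a 30-line window roughly centered on L and containing it, R is any other 30-line window that does not contain L (it may still partly overlap S), and the function name is hypothetical. With at least 70 lines, a window that excludes L always exists.

```python
import random

FRAGMENT = 30   # fragment length from the slide
MIN_LOC = 70    # minimum class size from the slide


def sample_line_and_fragments(lines, rng):
    """Pick a random line L, its surroundings S, and a random fragment R."""
    n = len(lines)
    if n < MIN_LOC:
        raise ValueError("class is too short for RQ2 sampling")
    li = rng.randrange(n)
    # ASSUMPTION: S is centered on L, clamped to the file boundaries.
    s_start = min(max(li - FRAGMENT // 2, 0), n - FRAGMENT)
    # ASSUMPTION: R is any 30-line window that does not contain L.
    candidates = [r for r in range(n - FRAGMENT + 1)
                  if not (r <= li < r + FRAGMENT)]
    r_start = rng.choice(candidates)
    surroundings = lines[s_start:s_start + FRAGMENT]
    random_fragment = lines[r_start:r_start + FRAGMENT]
    return lines[li], surroundings, random_fragment


if __name__ == "__main__":
    demo = [f"stmt_{i};" for i in range(80)]  # hypothetical class body
    line, s, r = sample_line_and_fragments(demo, random.Random(0))
    print(line, len(s), len(r))
```

The similarity of (L,S) and (L,R) would then be computed with each of the three measures and compared across many sampled lines.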

  26. RQ3 • Can a contiguous code fragment serve as a snapshot of the entire class? • From a code search perspective, the lexical similarity of the snapshot should indicate the topical closeness of the classes. • Randomly select a term w (‘data’ in Fig. 4) to act as the query keyword. The snapshot is extracted as a 30-line contiguous code fragment. • Only classes with at least 60 LOC are considered.

  27. Static Analysis Results • RQ1 : Name vs. Header • RQ2 : Code Line and Surroundings • RQ3 : Contiguous Fragment as a Snapshot • Threats to Validity

  28. RQ1 : Name vs. Header • NH is the closest to B in most cases, except for MegaMek when measured by TFIDF, where N-B is larger than H-B and NH-B. => MegaMek classes do not have useful header comments.

  29. RQ1 : Name vs. Header • Least Significant Difference (LSD) multiple comparison test: a test that places combinations significantly different from the others in separate groups, and allocates the best combination to ‘group A’. • The result classifies NH-B into ‘group A’, indicating that the similarity score of NH-B is significantly higher than those of N-B and H-B. • We conclude that if the class contains useful header comments, then it is important to combine the header comments with the class name in order to convey the topic of the class body.

  30. RQ2 : Code Line and Surroundings • A code line indicates the topic of its surroundings more than it indicates the topic of a random code fragment.

  31. RQ3: Contiguous Fragment as a Snapshot • We calculate the Pearson correlation coefficient, which is a parametric statistic that shows the correlation between two variables. • From the viewpoint of distinguishing the topics of different classes, a contiguous code fragment can serve as a snapshot of the entire class.
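
For reference, the Pearson coefficient can be computed as in the sketch below. Which two score lists are paired is an assumption here (for example, similarity scores obtained from snapshots versus scores obtained from the entire classes); the slide does not list the variables, and the function name is hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0.0 or dev_y == 0.0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (dev_x * dev_y)


if __name__ == "__main__":
    # Hypothetical scores: snapshot-based vs. whole-class-based similarity.
    snapshot_scores = [0.10, 0.35, 0.50, 0.80]
    class_scores = [0.15, 0.30, 0.55, 0.75]
    print(pearson(snapshot_scores, class_scores))
```

A coefficient near 1 would mean snapshot-based scores rank classes much as whole-class scores do, which is the sense in which a fragment can "serve as a snapshot".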

  32. Threats to Validity • Construct validity: the selection of a 30-line contiguous, non-empty, comments-inclusive code fragment for addressing RQ2 and RQ3. • Empty lines contribute little spatial or semantic information. Including all comments is a choice influenced by RQ1. • Internal validity: using three measures derived from different mathematical models diminishes the measurement bias. • External validity: this analysis may not generalize to other software projects.

  33. Conclusions • In this paper, we contributed a novel experimental framework for testing the tenet of “topical locality” and applied the framework to provide empirical evidence of topical locality in large-scale OO systems. • Our future work includes carrying out more empirical studies to examine other instances of topical locality. • It is important to integrate theoretical understanding and empirical findings to enhance practical tool support for software developers.
