Multi-Abstraction Concern Localization

Presentation Transcript


  1. Multi-Abstraction Concern Localization Tien-Duy B. Le, Shaowei Wang, and David Lo School of Information Systems Singapore Management University

  2. Motivation • Concern Localization • Locating code units that match a textual description • Textual descriptions: bug reports or feature requests • Code units: source code of classes or methods • Documents are compared based on the words (IR) or topics (topic modeling) they contain → compared at one level of abstraction, i.e., the word/topic level

  3. Motivation • A word can be abstracted at multiple levels of abstraction • Example: Eindhoven (Level 1) → North Brabant (Level 2) → Netherlands (Level 3) → Western Europe → … → European Continent (Level N)

  4. Multi-Abstraction Concern Localization • [Diagram: a bug report or feature request and the source code are each represented at abstraction Levels 1 through N, and the two representations are compared level by level]

  5. Multi-Abstraction Concern Localization • Locating code units that match a textual description • By comparing documents at multiple abstraction levels • By leveraging multiple topic models • 3 main components: • Text preprocessing • Hierarchy creation • Multi-abstraction retrieval technique

  6. Overall framework • [Diagram: concerns and the method corpus are preprocessed; hierarchy creation builds an abstraction hierarchy (Level 1, Level 2, …, Level N); the abstraction hierarchy plus a standard retrieval technique drive multi-abstraction retrieval, which outputs ranked methods per concern]

  7. Hierarchy Creation • We apply Latent Dirichlet Allocation (LDA) a number of times • LDA (with default settings) accepts: • the number of topics K • a set of documents • LDA returns: • K topics, each a distribution over words • the probability of each topic t appearing in each document d
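To make these inputs and outputs concrete, here is a minimal sketch of a single LDA application using the gensim library; the toy corpus, topic count, and library choice are illustrative assumptions, not the authors' exact setup.

```python
# Illustrative sketch of one LDA application with gensim (assumed library).
from gensim import corpora, models

# Toy preprocessed corpus: each document is a list of tokens.
docs = [["buffer", "overflow", "parser"],
        ["parser", "token", "stream"],
        ["overflow", "stack", "trace"]]

K = 2  # number of topics (input to LDA)
dictionary = corpora.Dictionary(docs)
bow_corpus = [dictionary.doc2bow(d) for d in docs]

lda = models.LdaModel(bow_corpus, num_topics=K, id2word=dictionary)

# K topics, each a distribution over words
for t in range(K):
    print(lda.show_topic(t))  # [(word, probability), ...]

# Probability of each topic t appearing in document d
for d, bow in enumerate(bow_corpus):
    print(d, lda.get_document_topics(bow, minimum_probability=0.0))
```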

  8. Hierarchy Creation • Each application of LDA creates a topic model with K topics, which are assigned to the documents • Each topic model corresponds to one abstraction level • An abstraction hierarchy of height L (height = number of topic models) is created by L applications of LDA
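A sketch of how such a hierarchy could be built by applying LDA once per level follows; the per-level topic counts are assumptions for illustration, not the paper's configuration.

```python
# Sketch: build an abstraction hierarchy of height L with L LDA applications.
from gensim import corpora, models

def build_hierarchy(docs, topic_counts):
    """docs: list of token lists; topic_counts: K for each level (length L)."""
    dictionary = corpora.Dictionary(docs)
    bow_corpus = [dictionary.doc2bow(d) for d in docs]
    # One topic model per abstraction level; the hierarchy's height equals
    # the number of topic models, i.e. len(topic_counts).
    models_per_level = [models.LdaModel(bow_corpus, num_topics=k, id2word=dictionary)
                        for k in topic_counts]
    return models_per_level, dictionary

# e.g. three levels with 50, 100, and 150 topics (assumed counts):
# hierarchy, dictionary = build_hierarchy(preprocessed_methods, [50, 100, 150])
```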

  9. Multi-Abstraction Vector Space Model • Multi-Abstraction Vector Space Model (VSM) = standard VSM + abstraction hierarchy • In the standard Vector Space Model, a document is represented as a vector of weights • Each element corresponds to a word • Its value is the word's weight: term frequency-inverse document frequency (tf-idf)
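For reference, a minimal sketch of the standard VSM baseline, using scikit-learn's tf-idf vectorizer and cosine similarity; the library choice and toy data are assumptions, not the paper's implementation.

```python
# Sketch of standard VSM: tf-idf vectors compared by cosine similarity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

methods = ["parse token stream buffer",
           "render widget layout",
           "stack trace overflow"]
bug_report = "buffer overflow when parsing token stream"

vectorizer = TfidfVectorizer()
method_vectors = vectorizer.fit_transform(methods)   # one tf-idf vector per method
query_vector = vectorizer.transform([bug_report])    # tf-idf vector for the concern

# Rank methods by similarity to the concern description
scores = cosine_similarity(query_vector, method_vectors)[0]
ranking = scores.argsort()[::-1]
print(ranking, scores[ranking])
```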

  10. Multi-Abstraction Vector Space Model • We extend document vectors • Added elements: the topics of the topic models in the abstraction hierarchy • Their values are the probabilities of the topics appearing in the document • Example: • The word-based document vector has length 10 • The abstraction hierarchy has 3 topic models of sizes 50, 100, and 150 • The extended document vector has size 10 + (50 + 100 + 150) = 310
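A sketch of this vector extension, using the dimensions from the slide's example (10 word weights plus 50 + 100 + 150 topic probabilities); the random arrays are stand-ins for real tf-idf weights and LDA document-topic distributions.

```python
# Sketch: concatenate the tf-idf vector with per-level topic probabilities.
import numpy as np

def extend_vector(tfidf_vector, topic_distributions):
    """tfidf_vector: word weights; topic_distributions: one probability
    vector per topic model in the abstraction hierarchy."""
    return np.concatenate([tfidf_vector] + list(topic_distributions))

tfidf_vector = np.random.rand(10)                              # 10 word weights
levels = [np.random.dirichlet(np.ones(k)) for k in (50, 100, 150)]
extended = extend_vector(tfidf_vector, levels)
print(extended.shape)                                          # (310,)
```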

  11. Experiments • Dataset: • 285 AspectJ faulty versions extracted from iBugs • Evaluation Metric: • Mean Average Precision (MAP)
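For reference, a small sketch of how MAP could be computed over ranked methods per concern; the method names are hypothetical and this is not the authors' evaluation script.

```python
# Sketch of Mean Average Precision (MAP): average precision per concern,
# then the mean across all concerns.
def average_precision(ranked_methods, relevant):
    hits, precisions = 0, []
    for i, m in enumerate(ranked_methods, start=1):
        if m in relevant:
            hits += 1
            precisions.append(hits / i)   # precision at each relevant hit
    return sum(precisions) / len(relevant) if relevant else 0.0

def mean_average_precision(rankings, relevant_sets):
    aps = [average_precision(r, rel) for r, rel in zip(rankings, relevant_sets)]
    return sum(aps) / len(aps)

# e.g. two concerns with ranked methods and their ground-truth buggy methods
print(mean_average_precision([["m1", "m2", "m3"], ["m4", "m5"]],
                             [{"m2"}, {"m4", "m5"}]))   # 0.75
```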

  12. Empirical Result • The MAP improvement with an abstraction hierarchy of height 4 (H4) is 19.36% • MAP improves as the height of the abstraction hierarchy increases

  13. Empirical Result • [Chart: number of concerns with various improvements] → The improvements are positive for most of the concerns

  14. Conclusion • We propose a multi-abstraction concern localization framework • We also propose a multi-abstraction vector space model • Our experiments on 285 AspectJ bugs show that MAP improvement is up to 19.36%

  15. Future work • Extend experiments by investigating: • Different numbers of topics in each level of the hierarchy • Different hierarchy heights • Different topic models

  16. Future work • Analyze the effects of document lengths: • For different numbers of topics • For different hierarchy heights • Experiment with Panichella et al.'s method [1] to infer good LDA configurations for our approach • [1] A. Panichella, B. Dit, R. Oliveto, M. Di Penta, D. Poshyvanyk, and A. De Lucia. How to Effectively Use Topic Models for Software Engineering Tasks? An Approach Based on Genetic Algorithms. ICSE 2013.

  17. Thank you! Questions? Comments? Advice? {btdle.2012, shaoweiwang.201, davidlo}@smu.edu.sg