Improving Automatic Abbreviation Expansion within Source Code to Aid in Program Search Tools

Zak Fry


Outline

  • Problem and Motivation

  • Automatically Identifying Abbreviation Expansions

  • A Scoped Approach

  • Analysis and Refinement: iScope

  • Evaluations

  • Conclusions


Maintenance Tasks

  • 60-90% of software lifecycle

  • Problem: identifying where the relevant code is, i.e., where changes need to be made

  • Code to perform a certain task can be very scattered

  • Causes difficulty for current maintenance search tools


Challenges - Coding Practices

  • Identifier names important for code documentation and understanding

  • Problem: Programmers’ use of abbreviations in code

    • Frequency of occurrence

      • character, integer, string

    • Complex inheritance – long class names

      • SecureMessageServiceClientMessageImpl

  • Negates usefulness of identifier names and complicates program understanding


Abbreviations and Maintenance Tools

  • Problem: Search based maintenance tools rely on natural language

    • Abbreviations change the natural language

  • Search Term: “distributed hash”

    dht = (DHTPlugin)dht_pi.getPlugin();
    Thread t = new AEThread( "DHTTrackerPlugin:init" ) {
        public void runSupport() {
            try {
                if ( dht.isEnabled() ) {
                    log.log( "DDB Available" );
                }
            } catch( Throwable e ) {
                log.log( "DDB Failed", e );
            }
            ...
        }
    }


Automatically Identifying Abbreviation Expansions

  • First, how do we identify candidates for expansion?

    • Non-dictionary words

  • Abbreviation

    • Short form

  • Expansion

    • Long form
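
A minimal sketch of the candidate-identification step: split each identifier into words and flag the ones that are not dictionary words. The class name, the splitting regex, and the tiny stand-in dictionary are assumptions for illustration; the slides do not say how identifiers are split or which dictionary is used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper: finds non-dictionary words inside identifiers. */
final class CandidateFinder {
    // Stand-in dictionary (assumption); a real tool would load a full English word list.
    private static final Set<String> DICTIONARY =
            Set.of("get", "plugin", "tracker", "is", "enabled", "thread", "log", "available");

    /** Splits on underscores, digits, and camelCase boundaries, then lowercases each piece. */
    static List<String> splitIdentifier(String identifier) {
        List<String> words = new ArrayList<>();
        String[] parts = identifier.split(
                "[_\\d]+|(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])");
        for (String part : parts) {
            if (!part.isEmpty()) {
                words.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return words;
    }

    /** Returns the words of the identifier that are not in the dictionary. */
    static List<String> candidates(String identifier) {
        List<String> result = new ArrayList<>();
        for (String word : splitIdentifier(identifier)) {
            if (!DICTIONARY.contains(word)) {
                result.add(word);
            }
        }
        return result;
    }
}
```

With this sketch, `DHTTrackerPlugin` splits into `dht`, `tracker`, `plugin`, and only `dht` is flagged as a short form that needs a long form.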


Types of Non-Dictionary Words


State of the Art

  • Lawrie, Feild, and Binkley

    • Abbreviation Expansion

    • Problem:

      • Lack of precision

      • No support for choosing between multiple matches


Scoped Approach

  • How to choose between multiple possible long forms:

    • By manual inspection, we found that correct long forms are more likely to be found in certain locations

    • Also, correctly identifying the long forms for certain types of abbreviations is easier than for others


Order of Types


Order of Program Context


General Algorithm

[Figure: general algorithm; surviving labels: Acronym, Prefix]
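
The transcript keeps only the labels Acronym and Prefix from this slide, so the exact matching patterns are not recoverable. As a hedged sketch of what the two kinds of match could look like, assuming a prefix short form is a proper prefix of one word and an acronym takes the first letter of each word in a phrase (class and method names are hypothetical):

```java
import java.util.List;

/** Hypothetical matcher for two short-form types named on the slide. */
final class ShortFormMatcher {
    /** Prefix match: the short form is a proper prefix of a single word, e.g. "str" and "string". */
    static boolean isPrefixMatch(String shortForm, String word) {
        return word.length() > shortForm.length() && word.startsWith(shortForm);
    }

    /** Acronym match: letter i of the short form is the first letter of word i of the phrase. */
    static boolean isAcronymMatch(String shortForm, List<String> phrase) {
        if (phrase.size() < 2 || shortForm.length() != phrase.size()) {
            return false;
        }
        for (int i = 0; i < phrase.size(); i++) {
            String word = phrase.get(i);
            if (word.isEmpty() || word.charAt(0) != shortForm.charAt(i)) {
                return false;
            }
        }
        return true;
    }
}
```

Under these assumptions, `dht` is an acronym match for the phrase "distributed hash table", which is the expansion a search for "distributed hash" would need.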


Multiple Matches

  • We assume one best candidate though multiple might be present at the same level of scope

  • If multiple matches:

    • Examine frequencies

    • Stem long forms and reexamine frequencies

    • Broaden Scope and reexamine frequencies

    • Most frequent expansion
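
The first step of the tie-breaking list can be sketched as a frequency count over the long-form occurrences found at the current scope. This is an illustration under assumptions, not the authors' implementation: the class name is hypothetical, and an empty result is used here to signal that the caller should move on to the next step (stemming, then broadening the scope, then the most frequent expansion).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical chooser among several matching long forms at one scope level. */
final class FrequencyChooser {
    /**
     * Returns the single most frequent long form among the occurrences,
     * or empty when there are no occurrences or the top count is tied.
     */
    static Optional<String> uniqueMostFrequent(List<String> occurrences) {
        Map<String, Integer> counts = new HashMap<>();
        for (String longForm : occurrences) {
            counts.merge(longForm, 1, Integer::sum);
        }
        String best = null;
        int bestCount = 0;
        boolean tie = false;
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            int count = entry.getValue();
            if (count > bestCount) {
                best = entry.getKey();
                bestCount = count;
                tie = false;
            } else if (count == bestCount) {
                tie = true; // another long form is equally frequent
            }
        }
        return (best == null || tie) ? Optional.empty() : Optional.of(best);
    }
}
```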


Most Frequent Expansion (MFE)

  • If still no ideal candidate is found:

    • We mined long forms from the 1.5 million LOC Java 5 code base

    • Return most frequent long form as last resort
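
One way to picture the MFE fallback is a table built once from a mined corpus and consulted only when scoping finds nothing. The sketch below assumes the mining step yields (short form, long form) pairs; the class and method names are hypothetical and the mining itself is not shown.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical last-resort table: short form -> counts of long forms seen in a mined corpus. */
final class MostFrequentExpansion {
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    /** Records one mined (short form, long form) pair. */
    void record(String shortForm, String longForm) {
        counts.computeIfAbsent(shortForm, key -> new HashMap<>())
              .merge(longForm, 1, Integer::sum);
    }

    /** Returns the most frequently recorded long form, or null if the short form was never seen. */
    String lookup(String shortForm) {
        Map<String, Integer> forms = counts.get(shortForm);
        if (forms == null) {
            return null;
        }
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> entry : forms.entrySet()) {
            if (entry.getValue() > bestCount) {
                best = entry.getKey();
                bestCount = entry.getValue();
            }
        }
        return best; // on equal counts, which long form wins is unspecified in this sketch
    }
}
```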


Evaluation of Scoped Approach

  • 250 abbreviations from 5 subject programs

  • Gold standard developed by human developer inspecting the code manually

  • Implemented LFB (the Lawrie, Feild, and Binkley technique) according to its description

    • Except combination words – due to missing database

[Chart: accuracy results]


Analysis and Refinement - iScope

  • Analyzed results and found 3 major sources of problems

  • Developed iScope by addressing these 3 major problem areas


Order of Scoping

  • Problem:

    • Scoped approach ordering: examine every context for an abbreviation type then go to next type

      • Investigating broader contexts for one type before even the narrowest context for another type is likely to yield incorrect matches

Insight: Context is more sensitive than type

Solution: Check each type at each context level, then go to next context level (switch order)
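
The switch in ordering amounts to swapping the nesting of two loops. The sketch below only lists the order in which (type, context) pairs would be checked. The two abbreviation types come from the General Algorithm slide; the context level names are placeholders, since the actual ordered lists of types and contexts did not survive in this transcript.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of the Scope vs. iScope search order. */
final class SearchOrder {
    enum AbbrevType { ACRONYM, PREFIX }

    // Placeholder context levels (assumption), ordered from narrowest to broadest.
    enum Context { NARROWEST, BROADER, BROADEST }

    /** Scope ordering: exhaust every context for one type before trying the next type. */
    static List<String> typeMajor() {
        List<String> order = new ArrayList<>();
        for (AbbrevType type : AbbrevType.values()) {
            for (Context context : Context.values()) {
                order.add(type + "@" + context);
            }
        }
        return order;
    }

    /** iScope ordering: try every type at one context level before broadening the context. */
    static List<String> contextMajor() {
        List<String> order = new ArrayList<>();
        for (Context context : Context.values()) {
            for (AbbrevType type : AbbrevType.values()) {
                order.add(type + "@" + context);
            }
        }
        return order;
    }
}
```

In the type-major order, an acronym is searched for in the broadest context before a prefix is tried in even the narrowest one, which is the behavior the slide identifies as a source of incorrect matches.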


Single Letter Abbreviations

  • Problem:

    • Developers use single letter abbreviations differently than multiple letter abbreviations

    • A large subset are actually semantically meaningless

    • Single letters are matched very easily, especially because prefix matching is greedy

Reader r = new BufferedReader()

Insight: Based on manual inspection, we found that meaningful single letter short forms were identifiers whose long forms were also their type name

Solution: Limit contextual scope to type only
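
A sketch of the type-only rule, assuming the declared type name is split on camelCase boundaries and the first word starting with the identifier's letter is taken as the long form. The slide does not specify how multi-word type names are handled, so that choice, like the class name, is an assumption.

```java
import java.util.Locale;
import java.util.Optional;

/** Hypothetical expander for single-letter identifiers, restricted to the declared type name. */
final class SingleLetterExpander {
    /**
     * Expands a one-letter identifier to the first word of its declared type
     * that starts with that letter; returns empty in every other case.
     */
    static Optional<String> expand(String identifier, String declaredType) {
        if (identifier.length() != 1) {
            return Optional.empty();
        }
        char letter = Character.toLowerCase(identifier.charAt(0));
        for (String word : declaredType.split("(?<=[a-z])(?=[A-Z])")) {
            String lower = word.toLowerCase(Locale.ROOT);
            if (!lower.isEmpty() && lower.charAt(0) == letter) {
                return Optional.of(lower);
            }
        }
        return Optional.empty();
    }
}
```

For the slide's example, `expand("r", "Reader")` yields `reader`, while a single letter that matches no word of its type name is left unexpanded instead of being prefix-matched against surrounding text.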


Hyper-Common Abbreviation

Problem: Some abbreviations are used so often in code that the long form rarely co-occurs with them, which leads to incorrect expansions based on coincidental matches

Solution: Mine a small set of extremely common abbreviations and use as a preprocessing step
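
The preprocessing step can be pictured as a fixed lookup consulted before any scoped search. The mined list itself is not preserved in this transcript, so the three entries below are illustrative assumptions based on the words the Challenges slide names as frequently abbreviated (character, integer, string); the class name is hypothetical.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical preprocessing lookup for hyper-common abbreviations. */
final class HyperCommonLookup {
    // Illustrative entries only (assumption); the real list was mined from code.
    private static final Map<String, String> HYPER_COMMON = Map.of(
            "str", "string",
            "int", "integer",
            "char", "character");

    /** Returns the fixed expansion if the short form is hyper-common, otherwise empty. */
    static Optional<String> preExpand(String shortForm) {
        return Optional.ofNullable(HYPER_COMMON.get(shortForm));
    }
}
```

Only when `preExpand` returns empty would the short form go on to the scoped search.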


Mined List of Hyper-Common Abbreviations


Evaluations

  • Is our method accurate enough to be useful?

    • Reevaluation of previous experiment

  • Does abbreviation expansion help maintenance tasks?

    • Simple Search

    • Concern Location Task


1. Reevaluation of Previous Test

  • Based on our previous experimental methodology and metrics, how much improvement was made from Scope to iScope?

  • Modified goldset based on new assumptions – single letter abbreviations


1. Reevaluation of Previous Test - Results

Compare LFB with Scope and iScope using non combinational word (NCW) accuracy values

Compare JavaMFE, ProgMFE, Scope, and iScope using the total accuracy values


2. Simple Search Evaluation

  • When abbreviations are expanded in software, how many more search results are returned than without expansion?

  • Focus: Recall

    • Not missing important results – want as many potentially relevant results as possible

  • Metric: Percent increase in results

    • P.I. = (Raw returned results with expansion / Raw returned results without expansion) × 100% - 100%
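
As a worked instance of the metric, assuming the ratio is expressed as a percentage before 100% is subtracted (class and method names are hypothetical):

```java
/** Hypothetical helper computing the percent increase in returned results. */
final class PercentIncrease {
    /** Percent increase of results returned with expansion over results returned without it. */
    static double percentIncrease(int withExpansion, int withoutExpansion) {
        if (withoutExpansion <= 0) {
            throw new IllegalArgumentException("need at least one result without expansion");
        }
        return 100.0 * withExpansion / withoutExpansion - 100.0;
    }
}
```

For example, 150 results with expansion against 100 without gives a P.I. of 50%, and equal counts give 0%.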


2. Simple Search Evaluation (cont)

  • Subjects: 215 concerns (Eaddy et al.), annotated by 3 people each, for a total of 645 queries

    • Developed independent of the idea of abbreviation expansion – many queries might not be affected by abbreviation expansion at all

  • “Match”: if any word in the query matches any word in the method, the method is considered a match and returned as a result
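
The match rule is a set-overlap test. A minimal sketch, assuming the query and each method have already been reduced to sets of lowercase words (the class name is hypothetical):

```java
import java.util.Collections;
import java.util.Set;

/** Hypothetical simple search: a method matches if it shares any word with the query. */
final class SimpleSearch {
    static boolean matches(Set<String> queryWords, Set<String> methodWords) {
        return !Collections.disjoint(queryWords, methodWords);
    }
}
```

A method whose words are only `dht` and `plugin` does not match the query "distributed hash"; once `dht` is expanded so that the method's words include `distributed`, `hash`, and `table`, it does.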


2. Simple Search Evaluation - Results

  • Smaller increase with iScope, because it produces fewer single-letter abbreviation false positives

  • Ideally, this means the quality of the results is better

    • Examined in experiment 3


3. Evaluation with Concern Location

  • Concern location task: identification of methods that are deemed to be relevant for the given search term

  • How much increase in effectiveness can be gained from expanding abbreviations in source code when performing concern location tasks?


3. Evaluation Methodology

  • Tools: Latent Semantic Indexing (LSI) and Log Entropy-based concern location

    • Goals: Attempt to calculate similarity values based on location and frequency of potential query matches

  • Subjects: same as previous experiment


3. Methodology (cont)

  • Metric: Mean Average Precision (MAP)

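    • Precision: # true positives / total # of returned results (true positives + false positives)

    • MAP:

      • For each query, record the precision at every new true positive, going down the ranked list of returned results

      • Average those precision values per query, then take the mean over all queries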

    • Attempts to reward highly ranked true positives
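
The MAP computation described above can be sketched as follows. Each ranked result list is represented as booleans (true for a true positive), which is an assumption about the data shape, and the per-query average is taken over the true positives that appear in the list, following the slide's wording. Other definitions divide by the total number of relevant items instead. Class and method names are hypothetical.

```java
import java.util.List;

/** Hypothetical MAP calculator over ranked result lists. */
final class MeanAveragePrecision {
    /** Average of the precision values measured at each true positive in one ranked list. */
    static double averagePrecision(List<Boolean> rankedRelevance) {
        int hits = 0;
        double precisionSum = 0.0;
        for (int i = 0; i < rankedRelevance.size(); i++) {
            if (rankedRelevance.get(i)) {
                hits++;
                precisionSum += (double) hits / (i + 1); // precision at rank i + 1
            }
        }
        return hits == 0 ? 0.0 : precisionSum / hits;
    }

    /** Mean of the per-query average precision values. */
    static double meanAveragePrecision(List<List<Boolean>> queries) {
        if (queries.isEmpty()) {
            return 0.0;
        }
        double total = 0.0;
        for (List<Boolean> query : queries) {
            total += averagePrecision(query);
        }
        return total / queries.size();
    }
}
```

A true positive at rank 1 contributes a precision of 1.0, while the same true positive at rank 2 contributes only 0.5, which is how the metric rewards highly ranked true positives.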


3. Concern Location Tasks - Results


Conclusions

  • Abbreviation expansion is proven to be helpful in maintenance tools and processes

  • The iScope approach improves upon Scope, and greatly upon the state of the art


Future Work

  • Further refinement of expansion process to achieve highest possible accuracy

  • Full integration into maintenance tool

  • Extension into other programming languages


Acknowledgments

  • Emily Hill and Haley Boyd

  • Dr. Vijay K. Shanker and Dr. Lori Pollock


Questions?


Inherent Inaccuracy

Problem: Additional errors in code not generalizable into solvable problems

Insight: There will always be inherent error when developing automatic systems for non-standard input

