
Assieme: Finding and Leveraging Implicit References in a Web Search Interface for Programmers



  1. Assieme: Finding and Leveraging Implicit References in a Web Search Interface for Programmers. Raphael Hoffmann, James Fogarty, Daniel S. Weld. University of Washington, Seattle. UIST 2007

  2. Programmers Use Search • To identify an API • To seek information about an API • To find examples of how to use an API. Example task: “Programmatically output an Acrobat PDF file in Java.”
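For context, a minimal sketch of the kind of solution this task calls for, assuming the open-source iText library (com.lowagie.text), one API a programmer might find for it:

import java.io.FileOutputStream;

import com.lowagie.text.Document;
import com.lowagie.text.Paragraph;
import com.lowagie.text.pdf.PdfWriter;

// Minimal sketch: write a one-paragraph PDF with the open-source iText
// library (com.lowagie.text); requires the iText JAR on the classpath.
public class PdfExample {
    public static void main(String[] args) throws Exception {
        Document document = new Document();           // an empty PDF document
        PdfWriter.getInstance(document, new FileOutputStream("output.pdf"));
        document.open();
        document.add(new Paragraph("Hello, PDF!"));   // add some content
        document.close();                             // flush and finish the file
    }
}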

  3. Example: General Web Search Interface

  4. Example: Code-Specific Web Search Interface

  5. Problems • Information is dispersed: tutorials, the API itself, documentation, pages with samples • It is difficult and time-consuming to locate the required pieces, get an overview of alternatives, judge the relevance and quality of results, and understand dependencies • Many page visits are required

  6. With Assieme we … • Designed a new Web search interface • Developed the inference needed to support it

  7. Outline • Motivation • What Programmers Search For • The Assieme Search Engine • Inferring Implicit References • Using Implicit References for Scoring • Evaluation of Inference & User Study • Discussion & Conclusion

  8. Six Learning Barriers faced by Programmers (Ko et al. 04) • Design barriers — What to do? • Selection barriers — What to use? • Coordination barriers — How to combine? • Use barriers — How to use? • Understanding barriers — What is wrong? • Information barriers — How to check?

  9. Examining Programmer Web Queries • Objective: see what programmers search for • Dataset: 15 million queries and click-through data, a random sample of MSN queries from 05/06 • Procedure: extract query sessions containing ‘java’ (2,529 sessions), manually examine the queries and define regex filters, and build an informal taxonomy of query sessions

  10. Examining Programmer Web Queries

  11. Examining Programmer Web Queries • 64.1% are descriptive queries, e.g. “java JSP current date” (selection barrier) • 35.9% contain a package, type, or member name, e.g. “java SimpleDateFormat” (use barrier) • 17.9% contain terms like “example”, “using”, “sample code”, e.g. “using currentdate in jsp” (coordination barrier)
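The talk does not show the study's actual regex filters; the following sketch only illustrates what filters for these categories might look like (all patterns are assumptions):

import java.util.regex.Pattern;

// Illustrative sketch only: the study's actual filters are not given,
// so these patterns are assumptions about their general shape.
public class QueryFilters {
    // Sessions whose queries mention 'java'
    static final Pattern JAVA =
        Pattern.compile("\\bjava\\b", Pattern.CASE_INSENSITIVE);
    // Queries containing a package, type, or member name
    // (dotted names like java.util.HashMap, or CamelCase like SimpleDateFormat)
    static final Pattern API_NAME =
        Pattern.compile("([a-z_$][\\w$]*\\.)+[\\w$]+|\\b[A-Z][a-z]+[A-Z]\\w*");
    // Queries signaling a search for example code
    static final Pattern EXAMPLE =
        Pattern.compile("\\b(example|using|sample code)\\b", Pattern.CASE_INSENSITIVE);
}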

  12. Assieme • Relevance indicated by # uses • Summaries show documentation, example code, referenced types, links to related info, and required libraries

  13. Challenges. How do we put the right information on the interface? • Get all programming-related data • Interpret the data and infer relationships

  14. Outline • Motivation • What Programmers Search For • The Assieme Search Engine • Inferring Implicit References • Using Implicit References for Scoring • Evaluation of Inference & User Study • Discussion & Conclusion

  15. Assieme’s Data … is crawled using existing search engines • Pages with code examples (~2,360,000): queried Google on “java ±import ±class …” • JavaDoc pages (~480,000): queried Google on “overview-tree.html …” • JAR files (~79,000): downloaded library files for all projects on Sun.com, Apache.org, Java.net, SourceForge.net

  16. The Assieme Search Engine … infers 2 kinds of implicit references among JAR files, pages with code examples, and JavaDoc pages • Uses of packages, types, and members • Matches of packages, types, and members
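For concreteness, a minimal data-model sketch of these two reference kinds; the class and field names are illustrative, not Assieme's actual schema:

// Illustrative data model; names are assumptions, not Assieme's schema.
class Use {                  // a code sample on some page uses an API element
    String pageUrl;          // page the code sample was extracted from
    String qualifiedName;    // e.g. "java.util.HashMap.clear()"
}

class Match {                // a JAR file or JavaDoc page defines that element
    String source;           // JAR file or JavaDoc URL
    String qualifiedName;    // the defined package/type/member
}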

  17. Extracting Code Samples. Challenges: unclear segmentation, code in a different language (C++), distracting terms, ‘…’ in code, line numbers

  18. Extracting Code Samples
 Remove HTML commands, but preserve line breaks:
<html><head><title></title></head><body>A simple example:<br><br>1: import java.util.*;<br>2: class c {<br>3: HashMap m = new HashMap();<br>4: void f() { m.clear(); }<br>5: }<br><br><a href=“index.html”>back</a></body></html>
which yields:
A simple example:
1: import java.util.*;
2: class c {
3: HashMap m = new HashMap();
4: void f() { m.clear(); }
5: }
back
 Remove some distracters by heuristics (here, the line numbers):
A simple example:
import java.util.*;
class c {
HashMap m = new HashMap();
void f() { m.clear(); }
}
back
 Launch an (error-tolerant) Java parser at every line break (separately parse for types, methods, and sequences of statements)
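A minimal sketch of the first two steps, assuming simple regex heuristics (Assieme's exact rules are not specified in the talk); the final, error-tolerant parsing step would use a real Java parser and is omitted here:

// Sketch of steps 1-2, using assumed regex heuristics.
public class CodeExtractor {
    // Step 1: drop HTML commands, but turn <br> into real line breaks first.
    static String stripHtml(String html) {
        return html.replaceAll("(?i)<br\\s*/?>", "\n")   // preserve line breaks
                   .replaceAll("<[^>]+>", "")            // remove remaining tags
                   .replace("&lt;", "<")                 // undo common entities
                   .replace("&gt;", ">")
                   .replace("&amp;", "&");
    }

    // Step 2: heuristic distracter removal, e.g. leading "4:" line numbers.
    static String stripLineNumbers(String text) {
        return text.replaceAll("(?m)^\\s*\\d+:\\s?", "");
    }
}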

  19. Resolving External Code References. A naïve approach of finding term matches does not work:
1 import java.util.*;
2 class c {
3 HashMap m = new HashMap();
4 void f() { m.clear(); }
5 }
The reference to java.util.HashMap.clear() on line 4 is only detectable by considering several lines.  Use a compiler to identify unresolved names

  20. Resolving External Code References • Index the packages/types/members in JAR files (java.util.HashMap, java.util.HashMap.clear(), …) • Compile the sample and look up unresolved names in the index • Greedily pick the best JARs and put them on the classpath • Utility function: # covered references (and JAR popularity)
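A sketch of the greedy selection step, assuming a precomputed index from JAR files to the fully qualified names they define; the method names and the popularity tie-breaker are illustrative:

import java.util.*;

// Greedy classpath construction: repeatedly pick the JAR covering the
// most still-unresolved references, breaking ties by an (assumed)
// popularity score, until nothing more can be covered.
public class JarResolver {
    static List<String> pickJars(Set<String> unresolved,
                                 Map<String, Set<String>> jarToNames,
                                 Map<String, Double> popularity) {
        List<String> classpath = new ArrayList<>();
        Set<String> remaining = new HashSet<>(unresolved);
        while (!remaining.isEmpty()) {
            String best = null;
            long bestCovered = 0;
            for (Map.Entry<String, Set<String>> jar : jarToNames.entrySet()) {
                long covered = jar.getValue().stream()
                                  .filter(remaining::contains).count();
                if (covered > bestCovered
                        || (best != null && covered == bestCovered && covered > 0
                            && popularity.getOrDefault(jar.getKey(), 0.0)
                               > popularity.getOrDefault(best, 0.0))) {
                    best = jar.getKey();
                    bestCovered = covered;
                }
            }
            if (best == null) break;          // no JAR covers any remaining name
            classpath.add(best);              // put it on the classpath
            remaining.removeAll(jarToNames.get(best));
        }
        return classpath;
    }
}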

  21. Scoring • Existing techniques: documents modeled as weighted term frequencies, hypertext link analysis (PageRank) • These do not work well for code, because: JAR files (binary code) provide no context, source code contains few relevant keywords, and structure in code is important for relevance

  22. Using Implicit References to Improve Scoring • Assieme exploits structure on Web pages (HTML hyperlinks) and structure in code (code references)

  23. Scoring targets: Web pages and APIs (packages/types/members)

  24. Scoring Web Pages • Use fully qualified references (java.util.HashMap) and adjust term weights • Filter pages by references • Favor pages with accompanying text. Scoring APIs • Use text on documentation pages and on pages with code samples that reference the API (~ anchor text) • Weight APIs by # incoming references (~ PageRank)
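A toy version of the API-weighting idea; the talk says only that APIs are weighted by the number of incoming references, so the log scaling here is an assumption:

import java.util.Map;

// Toy sketch: score an API symbol by how many pages reference it in
// extracted code, analogous to counting incoming hyperlinks. The log
// scaling is an assumption; the actual function is not given.
public class ApiScorer {
    static double score(String api, Map<String, Integer> incomingRefs) {
        return Math.log(1 + incomingRefs.getOrDefault(api, 0));
    }
}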

  25. Outline • Motivation • What Programmers Search For • The Assieme Search Engine • Inferring Implicit References • Using Implicit References for Scoring • Evaluation of Inference & User Study • Discussion & Conclusion

  26. Evaluating Code Extraction and Reference Resolution … on 350 hand-labeled pages from Assieme’s data. Code Extraction • Recall 96.9%, Precision 50.1% • False positives: C, C#, JavaScript, PHP, FishEye/diff pages • After filtering pages without references, precision rises to 76.7%. Reference Resolution • Recall 89.6%, Precision 86.5% • False positives: FishEye and diff pages • False negatives: incomplete code samples

  27. User Study: Assieme vs. Google vs. Google Code Search. Participants • 9 graduate and undergraduate students in Computer Science. Design • 40 search tasks based on queries in the logs, e.g. the query “socket java” became “Write a basic server that communicates using Sockets” • Find code samples (and required libraries) • 4 blocks of 10 tasks: 1 for training + 1 per interface

  28. User Study – Task Time • F(1,258) = 5.74, p ≈ .017 (significant) • F(1,258) = 1.91, p ≈ .17 (not significant)

  29. User Study – Solution Quality. Rating scale: 0 = seriously flawed, .5 = generally good but fell short in a critical regard, 1 = fairly complete • F(1,258) = 55.5, p < .0001 (significant) • F(1,258) = 6.29, p ≈ .013 (significant)

  30. User Study – # Queries Issued • F(1,259) = 6.85, p ≈ .001 (significant) • F(1,259) = 9.77, p ≈ .002 (significant)

  31. Outline • Motivation • What Programmers Search For • The Assieme Search Engine • Inferring Implicit References • Using Implicit References for Scoring • Evaluation of Inference & User Study • Discussion & Conclusion

  32. Discussion & Conclusion • Assieme is a novel Web search interface • Programmers obtain better solutions, using fewer queries, in the same amount of time • Using Google, subjects visited 3.3 pages per task; using Assieme, only 0.27 pages but 4.3 previews • The ability to quickly view code samples changed participants’ strategies

  33. Thank You. Raphael Hoffmann, Computer Science & Engineering, University of Washington, raphaelh@cs.washington.edu. James Fogarty, Computer Science & Engineering, University of Washington, jfogarty@cs.washington.edu. Daniel S. Weld, Computer Science & Engineering, University of Washington, weld@cs.washington.edu. This material is based upon work supported by the National Science Foundation under grant IIS-0307906, by the Office of Naval Research under grant N00014-06-1-0147, by SRI International under CALO grant 03-000225, and by the Washington Research Foundation / TJ Cable Professorship.
