
What Happens After the Search? User Interface Ideas for Information Retrieval Results

This article explores user interface ideas for enhancing information retrieval results. It discusses the role of graphics in displaying retrieval results and provides examples of techniques such as TileBars and Scatter/Gather. The article also addresses challenges with short queries and offers strategies for improving user understanding of result sets.



Presentation Transcript


  1. What Happens After the Search? User Interface Ideas for Information Retrieval Results Marti A. Hearst Xerox PARC

  2. Search is only part of the Information Analysis Process [slide diagram labels: Goals, Workspace, Repositories]

  3. Outline A. Background: Search and Short Queries The role of Graphics in Retrieval Results B. Throw light on Retrieval Results by Showing Context 1. Context of query terms in docs (TileBars) 2. Inter-document context (Scatter/Gather) C. Initial attempts at Evaluation D. Conclusions

  4. Search Results (Scope of this work) • “Ad hoc” searches • Unanticipated, not tried previously • As opposed to filtering, monitoring • External collections • Personal information spaces probably require special consideration • Naïve users • As opposed to intermediaries • Full text, general document collections

  5. Search Goal Types ANSWER A QUESTION • How old is Bob Dole? • Who wrote Dante’s Inferno? FIND A PARTICULAR DOCUMENT • Dan Rose’s Home Page • IRS Form 1040 Schedule D MAKE AN INFORMED OPINION / ASSESS TIMELY SITUATION • What are the tradeoffs and side effects of this treatment? • Should I wait for the new CD technology? • How will Apple’s new CEO affect sales next quarter? GET BACKGROUND INFORMATION • How to plant annual bulbs in Northern California • What aspects of 3D compact disk technology are patented?

  6. What is the Goal of the Search? Different goal types require different collections and different search techniques and different retrieval result display strategies E.g., a question should receive an answer rather than a document • Focus of this work: • General, ill-defined queries • General collections • Naïve, or inexperienced, users

  7. Problems with Short Queries TOO MANY DOCUMENTS • If only a few words supplied, there is little basis upon which to decide how to order the documents • The fewer words there are, the less they serve to mutually disambiguate one another

  8. Why Short Queries? THE USERS DON’T KNOW • What they want • How to express what they want • How what they want is expressed in the collection LONG QUERIES CAN BACKFIRE • If ANDing terms, get empty results • If ORing terms, get matches on useless subsets of terms • If using Similarity Search, can’t specify important terms • Some search engines can’t handle long queries
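The AND/OR failure modes listed above can be sketched with toy data (the documents and query here are hypothetical, not from the talk): ANDing every term of a long query returns nothing, while ORing matches on useless subsets of terms.

```python
# Hypothetical collection: each document is its set of index terms.
docs = {
    "d1": {"apple", "ceo", "sales"},
    "d2": {"apple", "pie", "recipe"},
    "d3": {"quarterly", "sales", "report"},
}
query = {"apple", "ceo", "sales", "quarter", "forecast"}

# AND: a document must contain every query term.
and_hits = [d for d, terms in docs.items() if query <= terms]
# OR: a document matches if it shares any term with the query.
or_hits = [d for d, terms in docs.items() if query & terms]

print(and_hits)  # [] -- no document contains all five terms
print(or_hits)   # ['d1', 'd2', 'd3'] -- every document matches some subset
```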

  9. Balancing Text and Graphics [figure: plot of R1, g1(t)] Graphics and animation are very useful for summarizing complex data However, text content is difficult to graph THE CHALLENGE: HOW TO COMBINE GRAPHICAL AND TEXTUAL REPRESENTATIONS USEFULLY?

  10. “Fixing” Short Queries:Help Users Understand Result Sets TWO APPROACHES, FROM TWO DIRECTIONS Context of Query Terms (within documents) Inter-document Context Show info about many docs simultaneously

  11. Showing Context of Query Terms • Existing Approaches: • Lists of titles + ranks • This augmented with other meta-information • This augmented with how often each search term occurred • Graphical display of which subset of query terms occurred

  12. Brief Summaries

  13. List Query Terms

  14. Idea: Show Which Terms Occur How Often [figure: one symbol per term, A–D] • Problem: Which words did what? • Solution: One symbol per term Term B was most frequent, followed by Term A and Term D. Term C did not appear.

  15. Represent Document Structure [figure: grid of document segments 1–5 by terms A–D] • Recognize the structure of the document • Represent this structure graphically • Simultaneously display representation of query term frequencies and doc structure • Term distribution becomes explicit • Many docs’ info can be seen simultaneously
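The display idea of the last two slides can be sketched as a text rendering (hypothetical function and data; the real TileBars system draws shaded rectangles): split each document into segments, give each query topic one row, and darken a cell as the topic's frequency in that segment rises.

```python
def tilebar(doc_id, rows):
    """Render a text-mode bar: rows maps topic -> per-segment frequency."""
    shades = " .:#"  # darker character = higher frequency
    lines = [f"Doc {doc_id}"]
    for topic, freqs in rows.items():
        cells = "".join(shades[min(f, len(shades) - 1)] for f in freqs)
        lines.append(f"  {topic:>12} |{cells}|")
    return "\n".join(lines)

# Illustrative data: one document, five segments, two query topics.
print(tilebar("d17", {"osteoporosis": [3, 2, 0, 0, 1],
                      "treatment":    [0, 1, 2, 3, 0]}))
```

A row like `|#:  .|` shows at a glance that the topic dominates the opening segments, which a rank number alone cannot convey.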

  16. Add Structure to the Query Problem: Only room for a few terms Solution: Structure the Query • A list of Topics • Can be category labels, lists of synonyms, etc. • Translated into Conjunctive Normal Form • User doesn’t need to know this • No need for special syntax • Allows for a variety of ranking algorithms • Creates a feedback loop between query structure and graphical representation
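The translation described on this slide is mechanical: AND across topics, OR within each topic's synonym list. A minimal sketch with illustrative topics (the vocabulary is hypothetical, not from the original system):

```python
# Each topic is a list of synonyms / category labels supplied by the user.
topics = [["osteoporosis", "bone loss"],
          ["treatment", "therapy", "prevention"]]

# Conjunctive Normal Form: conjunction of disjunctions,
# built without the user ever typing Boolean syntax.
cnf = " AND ".join("(" + " OR ".join(t) + ")" for t in topics)
print(cnf)
# (osteoporosis OR bone loss) AND (treatment OR therapy OR prevention)
```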

  17. Graphical Landscapes [figure: 2-D landscape with documents labeled A–E] Problems: • No Titles! • No Frequencies! • No Document Lengths! • No Nearness/Overlap! • Each document classified only one way!

  18. “Fixing” Short QueriesOther Techniques • Term Expansion • Relevance Feedback • Category Information

  19. Short Queries:Imprecisely Understood Goals A tack to be pursued in future: Identify the goal type. Then • Suggest a relevant collection • Suggest a search strategy • Suggest links to sources of expertise • Create information sources tailored to the goal type A more general, but less powerful tack: Provide useful descriptions of the space of retrieved information

  20. Dealing with Short Queries • Using Text Analysis to find Context • Finding a Useful Mix of Text and Graphics • TileBars Query-document context Shows structure of document and query Compact: many docs compared at once • Scatter/Gather Clustering Inter-document Context Shows summary information textually Uses state/animation for relationships among clusters • Add simple structure to the Query Format • Future work: incorporate into Workspace / SenseMaking environment (e.g., Information Visualizer, Card et al.)
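The Scatter/Gather loop summarized above can be sketched in miniature: cluster the result set, show a textual summary per cluster, and let the user gather interesting clusters and re-scatter. This toy version uses naive single-pass assignment to seed documents by term overlap, not the Buckshot/Fractionation algorithms of the real system; all names and data are illustrative.

```python
from collections import Counter

# Hypothetical retrieved documents, each reduced to a term list.
docs = {
    "d1": ["star", "wars", "film"],
    "d2": ["film", "review", "star"],
    "d3": ["stellar", "astronomy", "star"],
    "d4": ["astronomy", "telescope"],
}

def overlap(a, b):
    return len(set(a) & set(b))

def scatter(doc_ids, seeds):
    """Assign each document to the seed it shares the most terms with."""
    clusters = {s: [] for s in seeds}
    for d in doc_ids:
        best = max(seeds, key=lambda s: overlap(docs[d], docs[s]))
        clusters[best].append(d)
    return clusters

def summarize(cluster):
    """Most frequent terms: the textual cluster summary shown to the user."""
    counts = Counter(t for d in cluster for t in docs[d])
    return [t for t, _ in counts.most_common(2)]

clusters = scatter(docs, seeds=["d1", "d3"])
for seed, members in clusters.items():
    print(seed, members, summarize(members))
```

Gathering would mean keeping, say, the astronomy cluster's members and scattering them again with fresh seeds, progressively narrowing the result set.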

  21. Background:A Brief History of IR • Card Catalogs: Boolean Search on title words and subject codes • Abstracts and Newswire: Vector Space Model, Probabilistic Ranking, and “Soft” Boolean • Full Text (NIST TREC): Vector Space and Probabilistic Methods on very long queries • WWW: Boolean+ Search on Short Queries

  22. Naïve Users Write Short Queries • 88% of queries on the THOMAS system (Congressional bills) used <= 3 words (Croft et al. 95) • Average query length is 7 words on the MEAD news system (Lu and Keefer 94) • Most systems perform poorly on short queries on full-text collections (compared to long queries) (Jing and Croft 94, Voorhees 94)

  23. The Vector Space Model • Represent each document as a term vector • If term does not appear in doc, value = 0 • Otherwise, record frequency or weight of term • Represent the query as a similar term vector • Compare the query vector to every document vector • Usually some variation on the inner product • Various strategies for different aspects of normalization Probabilistic models: approximately the same idea, but try to predict the relevance of a document given a query
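The vector-space comparison on this slide reduces to a few lines: build term-frequency vectors, then take a normalized inner product (cosine). This sketch uses raw frequencies and toy documents; real systems layer weighting schemes (e.g. tf-idf) and other normalizations on top.

```python
import math
from collections import Counter

def vectorize(text):
    """Term vector: frequency per term, 0 implied for absent terms."""
    return Counter(text.lower().split())

def cosine(q, d):
    dot = sum(q[t] * d[t] for t in q)  # inner product over query terms
    norm = math.sqrt(sum(v * v for v in q.values())) * \
           math.sqrt(sum(v * v for v in d.values()))
    return dot / norm if norm else 0.0

query = vectorize("bone loss treatment")
docs = ["treatment of bone loss in older patients",
        "new compact disk technology patents"]
scores = sorted(((cosine(query, vectorize(d)), d) for d in docs), reverse=True)
print(scores[0][1])  # the bone-loss document ranks first
```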

  24. Conclusions In a general search situation: • We can organize large retrieval results collections for user viewing and manipulation (Scatter/Gather) • We can show, compactly and informatively, the patterns of distributions of query terms in retrieved documents (TileBars) • We need still more powerful ways to reveal context and structure of retrieval results • Future: get a better understanding of the user goals in order to build better interfaces

  25. Term Overlap • Problem: Several query terms appear… • … but have nothing to do with one another. Out, damned spot! … … … Throw physics to the dogs, I’ll none of it. … … He has kill’d me, Mother. Run away, I pray you!
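The overlap problem on this final slide is why per-segment displays like TileBars help: both query terms may match a document even though they never appear near each other. A crude window test over the slide's example text (the `near` helper and window size are illustrative, not from the talk) makes the distinction:

```python
text = ("Out, damned spot! ... Throw physics to the dogs, I'll none of it. "
        "... He has kill'd me, Mother. Run away, I pray you!")
words = text.lower().split()

def near(words, a, b, window=5):
    """True if some occurrence of a and b fall within `window` tokens."""
    pos_a = [i for i, w in enumerate(words) if a in w]
    pos_b = [i for i, w in enumerate(words) if b in w]
    return any(abs(i - j) <= window for i in pos_a for j in pos_b)

print(near(words, "physics", "dogs"))  # True: same clause
print(near(words, "spot", "mother"))   # False: unrelated passages
```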
