SIMS 296a-3: Aids for Source Selection

Presentation Transcript


  1. SIMS 296a-3: Aids for Source Selection. Carol Butler, Fall ’98

  2. Outline • IA Interfaces • Design Principles • Aids for Source Selection • SavvySearch • HITS • Kohonen maps • Implications for New Research

  3. IA Interface should help User: • Express information needs and/or formulate queries. • Select among available sources. • Understand search results. From: User Interfaces and Visualization, by Marti A. Hearst, 1998.

  4. IA Interface should allow User to: • Reassess goals and adjust search strategy. • Follow trails with unanticipated results. • Monitor the progress of a search strategy. • Use output of one action as input to the next. From: User Interfaces and Visualization, by Marti A. Hearst, 1998.

  5. Role of Visualization: • Communicate more rapidly and effectively. • Techniques • icons and color highlighting • brushing and linking • panning and zooming • focus-plus-context • animation • Interactivity From: User Interfaces and Visualization, by Marti A. Hearst, 1998.

  6. “Visualization of inherently abstract information is more difficult, and visualization of textually represented information is especially challenging.” From: User Interfaces and Visualization, by Marti A. Hearst, 1998.

  7. Starting Points for Search • Lists of sources (Lexis-Nexis) • Overviews • Clusters • Category Hierarchies/Subject Codes • Co-citation Links • Examples • Automatic source selection

  8. Last Week’s Readings • Overviews via Category Hierarchies • HIBROWSE (Pollitt 97) • Cat-A-Cone (Hearst 97)

  9. Today’s Readings • Automatic Source Selection • SavvySearch (Howe & Dreilinger 97) • Overviews via co-citation hyperlinks • HITS (Kleinberg et al. 97) • Overviews via clusters • Kohonen maps (Chen et al. 97)

  10. SavvySearch • Addresses problems with meta-search engines. • reduce burden on user … but • may waste computational and Web resources • Carefully selects search engines likely to return useful results.

  11. Options provided by interface • Sources and types of information. • Treatment of query terms. • Display of results. • Interface language. • View interface.

  12. Query Processing • Reasoning about available resources • modify concurrency (number of search engines queried in parallel) • network load estimates (lookup table, time) • local CPU load (UNIX uptime command) • Ranking search engines • learned associations between search engines and query terms (stored in a meta-index) • recent data on performance
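The resource-reasoning step on slide 12 can be sketched roughly as follows. This is an illustration only: the function name `choose_concurrency`, the `max_engines` default, and every threshold are assumptions made for the example, since the slides do not give SavvySearch's actual rules.

```python
def choose_concurrency(cpu_load, network_load, max_engines=5):
    """Pick how many search engines to query in parallel.

    cpu_load:     1-minute load average, as printed by the UNIX `uptime` command.
    network_load: estimate in [0, 1], e.g. read from a time-of-day lookup table.
    All thresholds below are hypothetical, chosen only for illustration.
    """
    concurrency = max_engines
    # Back off when the local machine is busy.
    if cpu_load > 2.0:
        concurrency -= 2
    elif cpu_load > 1.0:
        concurrency -= 1
    # Back off further when the network is expected to be congested.
    if network_load > 0.5:
        concurrency -= 1
    # Always query at least one engine.
    return max(1, concurrency)
```

On a Unix system the CPU figure could come from `os.getloadavg()[0]`, which reports the same 1-minute average that `uptime` prints.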

  13. Meta-Index • No Results • search engine failed to return links • reduces confidence that this engine is appropriate for particular query • effectiveness values are reduced • Visits • number of links explored by user • indicates user found some links to be interesting and increases confidence
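A minimal sketch of how such a meta-index might be kept, assuming a simple additive score per (query term, search engine) pair. The class name `MetaIndex`, its methods, and the `penalty` and `reward` parameters are hypothetical; the slides name the two signals (No Results, Visits) but not the update formula.

```python
from collections import defaultdict


class MetaIndex:
    """Learned associations between query terms and search engines."""

    def __init__(self):
        # effectiveness[term][engine] -> score; a missing entry means no evidence yet.
        self.effectiveness = defaultdict(lambda: defaultdict(float))

    def record_no_results(self, engine, terms, penalty=1.0):
        # The engine returned no links: lower its score for every query term.
        for term in terms:
            self.effectiveness[term][engine] -= penalty

    def record_visits(self, engine, terms, visits, reward=1.0):
        # The user followed `visits` links: raise the engine's score for every term.
        for term in terms:
            self.effectiveness[term][engine] += reward * visits

    def rank(self, engines, terms):
        # Sum each engine's scores over the query terms; best engine first.
        def score(engine):
            return sum(self.effectiveness.get(t, {}).get(engine, 0.0) for t in terms)

        return sorted(engines, key=score, reverse=True)
```

Engines with no recorded evidence score zero, so they rank between engines that have been rewarded and engines that have been penalized for the query terms.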

  14. Future Development • Meta-search will need to be personalized and embedded in other systems. • Experimental version divides search into categories, with separate sets of rules for creating a search plan. • Web Indexes • Web Directories • Usenet News • Software • People • Reference • Entertainment • Technical Reports

  15. Hyperlink-Induced Topic Search (HITS) • System for locating authoritative web sources • Two premises: • Implicit annotation provided by creators of hyperlinks contains sufficient information to infer a notion of “authority.” • Sufficiently broad topics contain embedded communities of hyperlinked pages.

  16. HITS • Two types of pages • Authorities • highly referenced pages on the topic • Hubs • pages that “point” to many of the authorities • Mutually reinforcing relationships • Starts from a user-supplied query

  17. HITS method • Base set of pages returned by search engine • Add pages that point to, or are pointed to by, any page in base set • Assign each page a hub weight h(p) and authority weight a(p) (initialize to 1) • For each page: • Replace a(p) by the sum of the h()’s of all pages pointing to it • Replace h(p) by the sum of the a()’s of all pages pointed to by it • Repeat
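The update loop on slide 17 can be sketched as below. The names `hits`, `links`, and `iterations` are illustrative, a fixed iteration count stands in for "Repeat", and the normalization step is an addition that the slide does not mention; it is included here only so the weights stay bounded. Building the base set from a search engine is outside the sketch.

```python
import math


def _normalize(weights):
    # Scale the weights so their squares sum to 1 (left unchanged if all are zero).
    norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
    return {page: w / norm for page, w in weights.items()}


def hits(links, iterations=20):
    """Compute hub and authority weights.

    links: dict mapping each page to the pages it points to
           (assumed to describe the expanded base set already).
    Returns (hub, auth), two dicts of page -> weight.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hub = {p: 1.0 for p in pages}   # h(p), initialized to 1
    auth = {p: 1.0 for p in pages}  # a(p), initialized to 1

    for _ in range(iterations):
        # a(p) <- sum of h(q) over all pages q pointing to p
        new_auth = {p: 0.0 for p in pages}
        for q, targets in links.items():
            for p in targets:
                new_auth[p] += hub[q]
        # h(p) <- sum of a(q) over all pages q that p points to,
        # using the authority weights just computed
        new_hub = {p: sum(new_auth[q] for q in links.get(p, ())) for p in pages}
        auth = _normalize(new_auth)
        hub = _normalize(new_hub)
    return hub, auth
```

For example, with `links = {"h1": ["a1", "a2"], "h2": ["a1"]}`, page `a1` ends up with the largest authority weight and `h1` with the largest hub weight, which reflects the mutually reinforcing relationship described on slide 16.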

  18. HITS results • Broad topics tend to have robust structure • astrophysics • Michael Jordan • Generalizes topics not sufficiently broad • Dennis Ritchie • Density of linkage on a topic influences authority/hub structure • English literature vs. German literature • Web-centric topics • cryptography • Commercialization • tennis

  19. Future Development • Study temporal evolution of communities on the Web. • Combining text and the structure of hyperlinks. • text within the link anchor (<a href>) • text near hyperlink • CLEVER project at IBM Almaden Research Center

  20. Automatically Generated Concept Space (Kohonen map and ET-Space Thesaurus) • IR users need: • Working knowledge of the system where the information is stored • how to navigate • how info is categorized or organized • Knowledge of the subject of interest • particularly the vocabulary of the subject domain

  21. Browsing vs. Searching • Browsing • users rely on mental models • embedded digression problem • Searching • content-based • two basic approaches • keyword search • combined keyword search and categorization • vocabulary differences problem

  22. User Aids for Browsing • Directories • categories limited in granularity • categories limited in timeliness • creating categories is manual, slow, and cumbersome • Kohonen self-organizing map (SOM) • generates clusters of important concepts
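As a rough illustration of how a Kohonen SOM groups similar inputs onto nearby map nodes, here is a minimal pure-Python sketch. It assumes each document has already been turned into a numeric vector (for example, term weights); the function names, grid layout, Gaussian neighborhood, and linear decay schedule are all choices made for this example and are not taken from Chen et al.

```python
import math
import random


def make_som(rows, cols, dim, seed=0):
    # One weight vector per grid node, randomly initialized in [0, 1).
    rng = random.Random(seed)
    return {(r, c): [rng.random() for _ in range(dim)]
            for r in range(rows) for c in range(cols)}


def best_matching_unit(som, x):
    # The node whose weight vector is closest (Euclidean distance) to x.
    return min(som, key=lambda node: math.dist(som[node], x))


def train_step(som, x, lr, sigma):
    # Pull the best-matching node and its grid neighbors toward x.
    br, bc = best_matching_unit(som, x)
    for (r, c), w in som.items():
        grid_d2 = (r - br) ** 2 + (c - bc) ** 2
        h = math.exp(-grid_d2 / (2 * sigma ** 2))  # Gaussian neighborhood
        for i in range(len(w)):
            w[i] += lr * h * (x[i] - w[i])


def train(som, data, epochs=50, lr0=0.5, sigma0=1.5):
    # Learning rate and neighborhood radius shrink linearly over the epochs.
    for t in range(epochs):
        frac = 1 - t / epochs
        lr = lr0 * frac
        sigma = max(sigma0 * frac, 0.1)
        for x in data:
            train_step(som, x, lr, sigma)
```

After training, calling `best_matching_unit(som, vector)` for each document gives its map position; documents that land on the same or adjacent nodes form the clusters a browsing interface would label and display.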

  23. Concept “Landscapes” • Built using Kohonen Feature Maps (Xia Lin, H.C. Chen) • [Figure: concept map with regions labeled Disease, Pharmacology, Anatomy, Legal, Hospitals] Slide by Marti Hearst.

  24. User Aids for Searching • Query expansion • Relevance feedback • Multidimensional scaling • metric similarity modeling • latent semantic indexing • Thesauri use • incorporating existing thesauri • automatic thesaurus generation

  25. Automatic Thesaurus Generation • Statistical co-occurrence • Cluster analysis further groups terms • Chen et al. • document collection • automatic indexing • co-occurrence analysis • associative retrieval • ET-Space Webpage
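The co-occurrence step can be sketched as below, assuming documents have already been indexed into lists of terms. The weight used here (the fraction of documents containing term a that also contain term b) is a simplified stand-in and not the weighting used by Chen et al.; the function name `cooccurrence_thesaurus` and the `top_k` parameter are likewise hypothetical.

```python
from collections import Counter
from itertools import permutations


def cooccurrence_thesaurus(docs, top_k=3):
    """Suggest related terms for each term from document-level co-occurrence.

    docs: iterable of term lists, one list per indexed document.
    Returns a dict: term -> list of (related_term, weight), strongest first.
    """
    term_df = Counter()  # number of documents containing each term
    pair_df = Counter()  # number of documents containing both terms of an ordered pair
    for doc in docs:
        terms = set(doc)
        term_df.update(terms)
        pair_df.update(permutations(sorted(terms), 2))

    thesaurus = {}
    for (a, b), n_ab in pair_df.items():
        # Asymmetric weight: how often b appears in the documents that contain a.
        thesaurus.setdefault(a, []).append((b, n_ab / term_df[a]))
    for a in thesaurus:
        # Strongest association first; ties broken alphabetically.
        thesaurus[a].sort(key=lambda item: (-item[1], item[0]))
        del thesaurus[a][top_k:]
    return thesaurus
```

For associative retrieval, the entries for a user's query term would be offered as suggested terms, which matches the use described on slide 26: refining an initially too broad search.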

  26. Experiment with Yahoo • Browsing tested with Kohonen SOM • subjects who started with Yahoo were less successful in repeating the task with the SOM than vice versa • useful more for broad exploring than for searching • Searching tested with AGT • suggested terms came from web pages • most useful in further refining an initially too broad search

  27. Future Development • Effects of different information sources • cohesion • consistent with user’s mental model • User Interface design • flexibility • spelling errors and typos • pan-zoom • help screens or instructions (or more intuitive design, or both)

  28. Review and Discussion • Overviews • Category Labels • when docs stored “inside” categories, users cannot create queries based on combinations of categories • display of hierarchies takes up large amounts of screen space • tightly coupled with queries? • Other starting points

  29. Overviews in the User Interface • Unsupervised Groupings • Clustering • Kohonen Feature Maps • Supervised Categories • Yahoo! • Superbook • HiBrowse • Cat-a-Cone • Combinations • DynaCat • SONIA

  30. Category Labels (from Hearst slide) • Advantages: • Interpretable • Capture summary information • Describe multiple facets of content • Domain dependent, and so descriptive • Disadvantages • Do not scale well (for organizing documents) • Domain dependent, so costly to acquire • May mis-match users’ interests

  31. Other Starting Points Approaches • Co-citation Links • Examples, Guided Tours

  32. Review and Discussion (cont.) • Interface Design • Visualization • textual vs. 2D spatial representation • Search Strategies • integration with non-search parts of process (reading, annotating, analysis) • Evaluation Methodology