
Lecture 23: Interfaces for Information Retrieval

Prof. Ray Larson & Prof. Marc Davis · UC Berkeley SIMS · Tuesday and Thursday 10:30 am - 12:00 pm · Fall 2002 · http://www.sims.berkeley.edu/academics/courses/is202/f02/ · SIMS 202: Information Organization and Retrieval


Presentation Transcript


  1. Prof. Ray Larson & Prof. Marc Davis UC Berkeley SIMS Tuesday and Thursday 10:30 am - 12:00 pm Fall 2002 http://www.sims.berkeley.edu/academics/courses/is202/f02/ Lecture 23: Interfaces for Information Retrieval SIMS 202: Information Organization and Retrieval

  2. Lecture Overview • Review and Continuation • Web Search Engines and Algorithms • Interfaces for Information Retrieval • Introduction to HCI • Why Interfaces Don’t Work • Early Visions: Memex Credit for some of the slides in this lecture goes to Marti Hearst

  3. Lecture Overview • Review and Continuation • Web Search Engines and Algorithms • Interfaces for Information Retrieval • Introduction to HCI • Why Interfaces Don’t Work • Early Visions: Memex Credit for some of the slides in this lecture goes to Marti Hearst

  4. Search Engines • Crawling • Indexing • Querying

  5. Web Search Engine Layers • From the description of the FAST search engine by Knut Risvik: http://www.infonortics.com/searchengines/sh00/risvik_files/frame.htm

  6. Standard Web Search Engine Architecture (diagram): crawl the web → check for duplicates, store the documents (DocIds) → create an inverted index → search engine servers match the user query against the inverted index → show results to the user

  7. Google Web Search Architecture More detailed architecture (Brin & Page 98) Only covers the preprocessing in detail, not the query serving

  8. Indexes for Web Search Engines • Inverted indexes are still used, even though the web is so huge • Some systems partition the indexes across different machines • Each machine handles different parts of the data • Other systems duplicate the data across many machines • Queries are distributed among the machines • Most do a combination of these
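The core data structure named on this slide can be sketched in a few lines. This is a single-machine toy (the partitioning and replication the slide describes would be layered on top); all names here are illustrative, not from any real engine:

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the sorted list of doc IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def query_and(index, terms):
    """Conjunctive (AND) query: intersect the posting lists of all terms."""
    postings = [set(index.get(t, [])) for t in terms]
    return sorted(set.intersection(*postings)) if postings else []

docs = {1: "web search engines", 2: "search the web", 3: "inverted index"}
index = build_inverted_index(docs)
```

A partitioned deployment would split either the term space or the document space of `index` across machines and merge partial results at query time.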

  9. Search Engine Querying • In this example, the data for the pages is partitioned across machines, and each partition is allocated multiple machines to handle the queries • Each row can handle 120 queries per second • Each column can handle 7M pages • To handle more queries, add another row • From the description of the FAST search engine by Knut Risvik: http://www.infonortics.com/searchengines/sh00/risvik_files/frame.htm

  10. Querying: Cascading Allocation of CPUs • A variation on this that produces a cost-savings: • Put high-quality/common pages on many machines • Put lower quality/less common pages on fewer machines • Query goes to high quality machines first • If no hits found there, go to other machines
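The cascade above reduces to a simple fall-through loop. A minimal sketch, with `search_fn` standing in for whatever matching a tier's machines actually perform (all names are assumptions for illustration):

```python
def cascaded_query(query, tiers, search_fn):
    """Try tiers in order from highest to lowest quality; only fall
    through to the next (cheaper) tier when the current one has no hits."""
    for tier in tiers:
        hits = search_fn(query, tier)
        if hits:
            return hits
    return []

def contains(query, docs):
    """Toy matcher: doc IDs whose text contains the query string."""
    return [doc_id for doc_id, text in docs.items() if query in text]

# Tier 0: high-quality/common pages on many machines; tier 1: the rest
tiers = [{1: "toyota official site"},
         {2: "toyota fan page", 3: "misc forum"}]
```

The cost saving comes from most queries being answered by the small first tier, so the long tail of pages needs far less replicated hardware.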

  11. Google • Google maintains the world's largest Linux cluster (10,000 servers) • These are partitioned between index servers and page servers • Index servers resolve the queries (massively parallel processing) • Page servers deliver the results of the queries • Over 3 billion web pages are indexed and served by Google

  12. Search Engine Indexes • Starting Points for Users include • Manually compiled lists • Directories • Page “popularity” • Frequently visited pages (in general) • Frequently visited pages as a result of a query • Link “co-citation” • Which sites are linked to by other sites?

  13. Starting Points: What is Really Being Used? • Today's search engines combine these methods in various ways • Integration of directories • Today most web search engines integrate categories into the results listings • Lycos, MSN, Google • Link analysis • Google uses it; others are also using it • Words on the links seem to be especially useful • Page popularity • Many use DirectHit’s popularity rankings

  14. Web Page Ranking • Varies by search engine • Pretty messy in many cases • Details usually proprietary and fluctuating • Combining subsets of: • Term frequencies • Term proximities • Term position (title, top of page, etc) • Term characteristics (boldface, capitalized, etc) • Link analysis information • Category information • Popularity information
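Since the slide notes these combinations are proprietary and fluctuating, the following is a purely hypothetical linear combination of such signals; the signal names and weights are invented for illustration only:

```python
def score_page(signals, weights):
    """Combine per-page ranking signals with a weighted linear sum.
    (Real engines use proprietary, more complex combinations.)"""
    return sum(weights[name] * value for name, value in signals.items())

# Invented weights: link analysis counts most, then title match, then tf
weights = {"tf": 1.0, "in_title": 2.0, "link_score": 3.0}
page_signals = {"tf": 0.4, "in_title": 1.0, "link_score": 0.5}
# score = 1.0*0.4 + 2.0*1.0 + 3.0*0.5 = 3.9
```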

  15. Ranking: Hearst ‘96 • Proximity search can help get high-precision results when the query has more than one term • Combine Boolean and passage-level proximity • Shows significant improvements when retrieving the top 5, 10, 20, or 30 documents • Results reproduced by Mitra et al. 98 • Google uses something similar
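A minimal illustration of passage-level proximity (not Hearst's actual algorithm): require every query term to appear, with some set of occurrences falling within a small token window:

```python
import itertools

def proximity_match(text, terms, window=10):
    """Boolean AND plus passage-level proximity: True if all terms occur
    and some combination of their positions spans fewer than `window` tokens."""
    tokens = text.lower().split()
    positions = {t: [i for i, tok in enumerate(tokens) if tok == t]
                 for t in terms}
    if any(not occ for occ in positions.values()):
        return False  # Boolean AND fails: a term is missing entirely
    return any(max(combo) - min(combo) < window
               for combo in itertools.product(*positions.values()))
```

Documents passing this filter would be ranked ahead of those matching only the Boolean condition, which is where the precision gain comes from.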

  16. Ranking: Link Analysis • Assumptions: • If the pages pointing to this page are good, then this is also a good page • The words on the links pointing to this page are useful indicators of what this page is about • References: Page et al. 98, Kleinberg 98

  17. Ranking: Link Analysis • Why does this work? • The official Toyota site will be linked to by lots of other official (or high-quality) sites • The best Toyota fan-club site probably also has many links pointing to it • Lower-quality sites do not have as many high-quality sites linking to them

  18. Ranking: PageRank • Google uses PageRank • Assume page A has pages T1...Tn pointing to it (i.e., citations). The parameter d is a damping factor, set between 0 and 1 (usually 0.85). C(A) is the number of links going out of page A. The PageRank of page A is: • PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)) • Note that the PageRanks form a probability distribution over web pages, so the sum of all web pages' PageRanks will be one
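The formula on this slide can be computed by straightforward iteration. A rough sketch (not Google's production code), where `links` maps each page to the list of pages it links out to:

```python
def pagerank(links, d=0.85, iterations=50):
    """Iterate PR(A) = (1-d) + d * sum(PR(Ti)/C(Ti)) toward its fixed point."""
    pr = {page: 1.0 for page in links}
    for _ in range(iterations):
        new_pr = {}
        for page in links:
            # Ti: pages that link to `page`; C(Ti): their out-degree
            incoming = (q for q in links if page in links[q])
            new_pr[page] = (1 - d) + d * sum(pr[q] / len(links[q])
                                             for q in incoming)
        pr = new_pr
    return pr

# Tiny illustrative web: a <-> b, and c links only to a
ranks = pagerank({"a": ["b"], "b": ["a"], "c": ["a"]})
```

Page `a` ends up ranked highest (it has the most incoming rank), while `c`, with no inlinks at all, bottoms out at 1-d = 0.15.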

  19. PageRank • Similar to calculations used in scientific citation analysis (e.g., Garfield et al.) and social network analysis (e.g., Wasserman et al.) • Similar to other work on ranking (e.g., the hubs and authorities of Kleinberg et al.) • Computation is an iterative algorithm and converges to the principal eigenvector of the link matrix

  20. Lecture Overview • Review and Continuation • Web Search Engines and Algorithms • Interfaces for Information Retrieval • Introduction to HCI • Why Interfaces Don’t Work • Early Visions: Memex Credit for some of the slides in this lecture goes to Marti Hearst

  21. “Drawing the Circles”

  30. Human-Computer Interaction (HCI) • Human • The end-users of a program • The others in the organization • Computer • The machines the programs run on • Interaction • The users tell the computers what they want • The computers communicate results

  31. What is HCI? (diagram) • Humans • Technology • Task Design • Organizational & Social Issues

  32. Shneiderman on HCI • Well-designed interactive computer systems • Promote • Positive feelings of success • Competence • Mastery • Allow users to concentrate on their work, exploration, or pleasure, rather than on the system or the interface

  33. Design Guidelines • Set of design rules to follow • Apply at multiple levels of design • Are neither complete nor orthogonal • Have psychological underpinnings (ideally)

  34. Shneiderman’s Design Principles • Provide informative feedback • Permit easy reversal of actions • Support an internal locus of control • Reduce working memory load • Provide alternative interfaces for expert and novice users

  35. Provide Informative Feedback • About: • The relationship between query specification and documents retrieved • Relationships among retrieved documents • Relationships between retrieved documents and metadata describing collections

  36. Reduce Working Memory Load • Provide mechanisms for keeping track of choices made during the search process • Allow users to: • Return to temporarily abandoned strategies • Jump from one strategy to the next • Retain information and context across search sessions • Provide browsable information that is relevant to the current stage of the search process • Related terms or metadata • Search starting points (e.g., lists of sources, topic lists)

  37. Interfaces For Expert And Novice Users • Simplicity vs. power tradeoffs • “Scaffolded” user interface • How much information to show the user? • Number and complexity of user operations • Variants of operations • Inner workings of system itself • System history • Example: • Television remote control

  38. User Differences • Abilities, preferences, predilections • Spatial ability • Memory • Reasoning abilities • Verbal aptitudes • Personality differences • Age, gender, ethnicity, class, sexuality, culture, education • Modality preferences/restrictions • Vision, audition, speech, gesture, haptics, locomotion

  39. Nielsen’s Usability Slogans • Your best guess is not good enough • The user is always right • The user is not always right • Users are not designers • Designers are not users • Less is more • Details matter (from Nielsen’s “Usability Engineering”)

  40. Who Builds UIs? • A team of specialists (ideally) • Graphic designers • Interaction / interface designers • Technical writers • Marketers • Test engineers • Software engineers

  41. How to Design and Build UIs (cycle: Design → Prototype → Evaluate) • Task analysis • Rapid prototyping • Evaluation • Implementation • Iterate at every stage!

  42. Task Analysis • Observe existing work practices • Create examples and scenarios of actual use • Try out new ideas before building software

  43. Rapid Prototyping • Build a mock-up of design • Low fidelity techniques • Paper sketches • Cut, copy, paste • Video segments • Interactive prototyping tools • Visual Basic, HyperCard, Director, etc. • UI builders • NeXT, etc.

  44. Evaluation Techniques • Qualitative vs. quantitative methods • Qualitative (non-numeric, discursive, ethnographic) • Focus groups • Interviews • Surveys • User observation • Participatory design sessions • Quantitative (numeric, statistical, empirical) • User testing • System testing

  45. Qualitative Questions • User experience • User preferences • User recommendations • “Design dialogue”

  46. Quantitative Questions • Precision • Recall • Time required to learn the system • Time required to achieve goals on benchmark tasks • Error rates • Retention of the use of the interface over time
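The first two metrics on this slide are the standard IR measures: for a retrieved set R and a relevant set Rel, precision is |R ∩ Rel| / |R| and recall is |R ∩ Rel| / |Rel|. A minimal sketch:

```python
def precision_recall(retrieved, relevant):
    """Compute (precision, recall) for one query's result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall

# Retrieved docs 1-4; only 2 and 4 are relevant, and relevant doc 5 was missed
p, r = precision_recall([1, 2, 3, 4], [2, 4, 5])
```

Here precision is 2/4 = 0.5 (half of what was returned is relevant) and recall is 2/3 (one relevant document was missed).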

  47. Information Visualization • Utility • Inherently visual data • Making the abstract concrete • Making the invisible visible • Techniques • Icons • Color highlighting • Brushing and linking • Panning and zooming • Focus-plus-context • Magic lenses • Animation

  48. Lecture Overview • Review and Continuation • Web Search Engines and Algorithms • Interfaces for Information Retrieval • Introduction to HCI • Why Interfaces Don’t Work • Early Visions: Memex Credit for some of the slides in this lecture goes to Marti Hearst

  49. Why Interfaces Don’t Work • Because… • We still think of using the interface • We still talk of designing the interface • We still talk of improving the interface • “We need to aid the task, not the interface to the task.” • “The computer of the future should be invisible.”

  50. Norman on Design Priorities • The user: what does the person really need to have accomplished? • The task: analyze the task. How can the job best be done, taking into account the whole setting in which it is embedded, including the other tasks to be accomplished, the social setting, the people, and the organization? • As much as possible, make the task dominate; make the tools invisible. • Then get the interaction right: make the right things visible, exploit affordances and constraints, provide the proper mental models, and so on. These are the rules of good design for the user, written about many, many times in many, many places.
