INFO624 -- Week 9: Effective Information Retrieval. Dr. Xia Lin, Assistant Professor, College of Information Science and Technology, Drexel University
Effective Information Retrieval • System’s perspectives • Fast indexing and retrieval algorithms • Inverted indexing, tree structures, hash tables • Semantic indexing and mapping • Subject indexing • Latent semantic indexing • Intelligent information retrieval • Knowledge representation • Logical inferences
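Of the fast indexing structures listed above, the inverted index is the easiest to show in a few lines. The sketch below is a minimal in-memory version, assuming naive whitespace tokenization; Python's `dict` is itself a hash table, so it also hints at the hash-table option. The sample documents and the function names `build_inverted_index` and `boolean_and` are hypothetical, and a production index would typically also store term frequencies and positions and compress its postings lists.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the set of IDs of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():  # naive whitespace tokenization
            index[term].add(doc_id)
    return index

def boolean_and(index, query):
    """Return the IDs of documents that contain every query term."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    if not postings:
        return set()
    return set.intersection(*postings)

# Hypothetical sample collection.
docs = {
    1: "relevance feedback improves retrieval",
    2: "user profiles support retrieval",
    3: "relevance of user profiles",
}
index = build_inverted_index(docs)
print(sorted(boolean_and(index, "relevance user")))  # [3]
```

Answering the query only touches the postings for "relevance" and "user"; documents that share no query term are never examined.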
Effective Information Retrieval • User’s perspectives • Iteration • Relevance Feedback • User Profiles • Graphical Display of Search Results • Browsing/Interactive Searching • We can’t change the user. We should make the system adapt to the user’s needs.
Iteration • Most searches need to be done iteratively • From the user’s point of view • The first query often does not retrieve what the user wants • The user needs to see the output of previous queries to construct the next query • The user often needs to reformulate his/her information needs after reading or browsing the search results.
Iteration – User’s strategies • Modify queries repeatedly based on some goals • Starting with high precision • Use a specific query first • Broaden queries to include more relevant documents • "pearl growing" • Starting with high recall • Use a very broad query • Improve precision gradually • "onion peeling" • Starting with known items • Find documents similar to the known items • Browsing/interactive searching
Iteration – System’s strategies • If the system can “learn” from the user’s activities, it can likely retrieve results that better meet the user’s needs. • Relevance feedback • User profiles • The system should provide better output representations to help the user • Browse • Conduct interactive searches
Relevance Feedback • Feedback: The user provides information that the system can use to modify its next search or next display • Relevance Feedback: • Users let the system know • which documents are relevant to their information needs • which concepts or terms are related to their information needs • what weights they would like the system to put on each relevant document/term
Relevance Feedback – System’s Strategy • The system should invite the user to select relevant documents/terms from the retrieved results before the second retrieval is conducted • The system should use the information from the user's feedback to conduct the next search.
Designing IR Systems with Relevance Feedback • Collect relevance feedback through • Binary judgments vs. rating scales • Positive and negative feedback • Apply relevance feedback to • Query • Profile • Document • Retrieval algorithm
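The slide lists where feedback can be applied but not how. One classic way to apply positive and negative feedback to the query (not named in these slides) is Rocchio's formula, q' = α·q + β·(mean of relevant document vectors) − γ·(mean of non-relevant document vectors). A minimal sketch, assuming term-weight vectors stored as Python dicts; the values α = 1.0, β = 0.75, γ = 0.15 are illustrative defaults, not values from the course, and the sample vectors are made up:

```python
def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return a modified query vector (term -> weight).

    query and every document in relevant / nonrelevant are dicts mapping
    term -> weight; relevant and nonrelevant are lists of such dicts.
    """
    terms = set(query)
    for doc in relevant + nonrelevant:
        terms.update(doc)
    new_query = {}
    for term in terms:
        weight = alpha * query.get(term, 0.0)
        if relevant:  # positive feedback: move toward the relevant documents
            weight += beta * sum(d.get(term, 0.0) for d in relevant) / len(relevant)
        if nonrelevant:  # negative feedback: move away from the non-relevant ones
            weight -= gamma * sum(d.get(term, 0.0) for d in nonrelevant) / len(nonrelevant)
        if weight > 0:  # terms ending with zero or negative weight are dropped
            new_query[term] = weight
    return new_query

query = {"browsing": 1.0}
relevant = [{"browsing": 1.0, "navigation": 1.0}, {"navigation": 1.0, "hypertext": 1.0}]
nonrelevant = [{"browsing": 1.0, "shopping": 1.0}]
new_query = rocchio(query, relevant, nonrelevant)
print({t: round(w, 3) for t, w in sorted(new_query.items())})
# {'browsing': 1.225, 'hypertext': 0.375, 'navigation': 0.75}
```

The user judged two documents relevant and one non-relevant; "navigation" and "hypertext" enter the query from the positive feedback, while "shopping" is pushed below zero by the negative feedback and dropped.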
User Profiles • User profiles • Information about the user’s information needs that an IR system can use to modify its search process • Simple user profiles • A list of terms that the user selects to represent his/her information needs • A list of terms with weights
Extended user profiles • More complex term structures • Information use patterns • Levels of interest • User’s background information • User’s browsing behaviors • Which pages the user visited last week, last month, … • From which page to which page …
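The browsing-behavior items in an extended profile can be collected from an ordinary visit log. A minimal sketch, assuming a hypothetical log of (timestamp, page) pairs in chronological order; `browsing_profile`, the seven-day window, and the sample log are illustrative choices:

```python
from collections import Counter
from datetime import datetime, timedelta

def browsing_profile(log, now, days=7):
    """Summarize recent browsing into visit counts and page-to-page moves.

    log: list of (timestamp, page) tuples in chronological order.
    Only visits made within the last `days` days are counted.
    """
    recent = [page for ts, page in log if now - ts <= timedelta(days=days)]
    visits = Counter(recent)                        # which pages, how often
    transitions = Counter(zip(recent, recent[1:]))  # from which page to which page
    return visits, transitions

# Hypothetical browsing log.
log = [
    (datetime(2024, 2, 20, 9, 0), "home"),
    (datetime(2024, 3, 4, 10, 0), "home"),
    (datetime(2024, 3, 4, 10, 1), "results"),
    (datetime(2024, 3, 4, 10, 2), "doc1"),
    (datetime(2024, 3, 5, 14, 0), "results"),
    (datetime(2024, 3, 5, 14, 1), "doc1"),
]
visits, transitions = browsing_profile(log, now=datetime(2024, 3, 8))
print(transitions.most_common(1))  # [(('results', 'doc1'), 2)]
```

The February visit falls outside the window, so only last week's behavior shapes the profile; widening `days` gives the "last month" view.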
Use of User Profiles • Selective Dissemination of Information (SDI) • The system regularly runs the search to get any new information that matches the user’s profiles. • The user can set up several profiles • Once they are set up, the queries are always the same. • The user can set the frequency of the update searches.
SDI • Advantages of SDI • Automatic retrieval of new information for the user • Set up a profile once, use it for retrieval many times. • The user can change the profiles or the search frequency as needed. • Disadvantages of SDI • The query based on the profile is static • Timing problems • Information in need is information indeed. • Something I am very interested in may not arrive at the time I want to read it.
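An SDI run can be sketched as a scheduled job that scores only the documents added since the previous run against a stored weighted profile. In this sketch the scoring rule (sum of the weights of the profile terms found in a document), the threshold, and all names and sample data are assumptions. The profile itself never changes between runs, which is exactly the "static query" disadvantage listed above.

```python
from datetime import date

def profile_score(profile, text):
    """Sum the weights of the profile terms that appear in the text."""
    words = set(text.lower().split())
    return sum(w for term, w in profile.items() if term in words)

def sdi_run(profile, documents, last_run, threshold=1.0):
    """Return titles of documents added since last_run that match the profile.

    documents: list of (title, added_on, text) tuples.
    """
    hits = []
    for title, added_on, text in documents:
        if added_on > last_run and profile_score(profile, text) >= threshold:
            hits.append(title)
    return hits

# Hypothetical profile and newly arrived documents.
profile = {"visualization": 1.0, "browsing": 0.8, "retrieval": 0.5}
documents = [
    ("Old survey", date(2001, 1, 10), "browsing and retrieval"),
    ("New VIRI paper", date(2001, 3, 2), "visualization of retrieval results"),
    ("New agents paper", date(2001, 3, 5), "software agents for retrieval"),
]
print(sdi_run(profile, documents, last_run=date(2001, 2, 1)))  # ['New VIRI paper']
```

The old survey would match the profile but was already delivered before the last run; the agents paper is new but scores only 0.5.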
Using profiles during the search • Modify the query • When the user sends a query, the system automatically adds some terms from the user’s profiles to the query. • When the user sends a query, the system checks whether each query term is in the user’s profile; if it is, the system increases that term’s weight. • Organize the search results • When the user sends a query, the system uses the profile information to organize the search results (e.g., by clustering or ranking)
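The two query-modification strategies above can be sketched together, assuming a simple weighted-term profile. The function name `personalize_query`, the boosted weight of 2.0, and the limit of two added terms are hypothetical tuning choices:

```python
def personalize_query(query_terms, profile, boosted_weight=2.0, max_added=2):
    """Turn the terms a user typed into a weighted query using his/her profile.

    query_terms: list of terms typed by the user.
    profile: dict mapping term -> weight (a simple weighted user profile).
    """
    query = {}
    for term in query_terms:
        # A typed term that is also in the profile gets a higher weight.
        query[term] = boosted_weight if term in profile else 1.0
    # Add the highest-weighted profile terms the user did not type,
    # keeping their profile weights.
    extras = sorted((t for t in profile if t not in query),
                    key=lambda t: profile[t], reverse=True)
    for term in extras[:max_added]:
        query[term] = profile[term]
    return query

profile = {"visualization": 0.9, "hypertext": 0.6, "agents": 0.3}
print(personalize_query(["browsing", "hypertext"], profile))
# {'browsing': 1.0, 'hypertext': 2.0, 'visualization': 0.9, 'agents': 0.3}
```

As long as profile weights stay below 1.0, added profile terms can refine the ranking without outweighing the terms the user actually typed.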
Browsing • Browsing is an act of human information seeking • a mental process of identifying and choosing information • a dynamic process that varies over time and depends on intermediate results • part of the process of decision making, problem solving, etc.
Browsing for Information Retrieval • A kind of searching process in which the initial search criteria or goals are only partly defined • general-purpose web browsing • The art of not knowing what one wants until one finds it • visual recognition • content recognition
Browsing for Information Retrieval • A learning activity that emphasizes structures and interactive processes • exploratory • movements based on feedback • A process of finding and navigating in an unknown or unfamiliar information space • becoming aware of new contents • finding unexpected results
Search or Browse? • Would you like to search using a search engine, or would you like to browse from page to page (or through a hierarchy)? • What does it depend on?
Factors of browsing • Purposes • Fact retrieval • Concept formation or interpretation • Current awareness • Tasks • Well-defined tasks • Ill-defined tasks • number of items to browse
Factors of browsing • Individual characteristics • Motivation • Experience and knowledge • Cognitive styles • Context • Subject disciplines • Organizational schemes • Nature of text/information • Medium • Does the system support browsing?
IR Systems that support browsing • Good navigation tools • Easy to move from one item to another • Links • good structures • fast access • Easy to back track • Correct any errors • make new selections
IR Systems that support browsing • Good displays • easy to read • meaningful orders of retrieval results • graphical presentation • Meaningful content organization • contextual hierarchical structures • Grouping of related items • Contextual landmarks
“why just browse when you can fly?” • HotSauce is an innovative 3D fly-through interface for navigating information spaces. It was developed, largely as a one-man effort, by Ramanathan V. Guha while at Apple Research in the mid-1990s. HotSauce was a specific 3D spatialization of the Meta Content Framework (MCF) also developed by Guha.
Why Surf alone? • What if you had an assistant always looking ahead for you [when browsing the web]…. • The assistant could warn you if the page was irrelevant, could alert you if that link or some other link merited your attention. • The assistant could save you time and frustration. CACM, 44(8), p. 71, 2001
Information Agents • Software that applies user profiles, dynamically and intelligently, to search tasks • Searches distributed, possibly heterogeneous information resources on the user’s behalf • Gathers and integrates search results using artificial intelligence techniques • Accepts the user’s feedback and uses it to modify the user profiles and search strategies
Architecting Browsable Websites • Design site structures • Metaphor Exploration • Organizational metaphors • Functional metaphors • Visual metaphors • Define Navigation • Global navigation • Local navigation • Design Document
Interactive Systems • “When an interactive system is well-designed, the interface almost disappears, enabling users to concentrate on their work, exploration, or pleasure.” • Ben Shneiderman
Design Principles • Offer informative feedback • Relationships between the query and the retrieved documents • Relationships among retrieved documents • Relationships between metadata and documents • Reduce working memory load • Keep track of choices made during the search process • Allow users to return to temporarily abandoned strategies or jump from one strategy to another • Retain information and context across search sessions.
Provide alternative interfaces for novice and expert users. • Simplicity vs. power
Output Presentation for Search Engines • Two major issues • What information to present? • How to organize the output items? • Information in the output display • Traditional databases • Document reference numbers (unique number) • Citations (author, title, source) • Document surrogate (citation plus abstract and/or indexing terms) • Full text
On the web • Title, URL • First few sentences/related sentences/summaries • Dates / page sizes • Degree of relevance • Special links • “find similar ones” • Types of links • Related categories
What other information might you wish to have in the retrieval output? • Citations (or links from this document)? • Critique or evaluation? • Access information (how many times it was accessed in the last 6 months)? • Links to this document? • Author contact information? • Why were the documents retrieved?
Output organization • Linear • a list of documents • listed by • best match • alphabetical order • dates • order of selected fields (authors, titles, web sites)
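Each linear ordering listed above amounts to sorting the same result list by a different key. A minimal sketch; the result records, their field names, and the `linear_display` function are hypothetical:

```python
from datetime import date

# Hypothetical retrieved items.
results = [
    {"title": "Doc A", "score": 0.72, "date": date(2001, 5, 1), "site": "b.edu"},
    {"title": "Doc B", "score": 0.85, "date": date(1999, 1, 15), "site": "a.org"},
    {"title": "Doc C", "score": 0.91, "date": date(2000, 6, 30), "site": "c.com"},
]

# Each ordering: (sort key, whether to sort in descending order).
ORDERINGS = {
    "best match": (lambda r: r["score"], True),            # highest score first
    "alphabetical": (lambda r: r["title"].lower(), False),
    "date": (lambda r: r["date"], True),                    # newest first
    "site": (lambda r: r["site"], False),                   # a selected field
}

def linear_display(results, ordering="best match"):
    """Return result titles as a flat list in the requested order."""
    key, descending = ORDERINGS[ordering]
    return [r["title"] for r in sorted(results, key=key, reverse=descending)]

for name in ORDERINGS:
    print(f"{name}: {linear_display(results, name)}")
```

The same three items come out in four different orders, yet none of these lists can show how Doc A relates to Doc C, which is the limitation of linear displays.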
Linear display • Practical and most popular • easy to generate • users know how to use it • Does not show relationships among documents! • Document relationships are more complex than a linear order
Hierarchical display • Separate data into different levels or branches • Branches can be expanded/collapsed. • Show more data in less space • Show the organization of the data
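To make expand/collapse concrete, here is a minimal text-only sketch in which a collapsed branch shows only its label and item count, so more data fits in less space. The category tree, its labels, and the `render` function are hypothetical, and a real interface would draw widgets rather than strings.

```python
def render(node, expanded, depth=0):
    """Return the visible lines of a category tree as indented text.

    node: (label, children) tuple, where children is a list of such tuples.
    expanded: set of labels whose branches are currently expanded.
    """
    label, children = node
    is_open = label in expanded
    if not children:
        line = f"  {label}"                      # leaf: nothing to expand
    elif is_open:
        line = f"- {label}"                      # expanded branch
    else:
        line = f"+ {label} ({len(children)})"    # collapsed: show item count
    lines = ["  " * depth + line]
    if children and is_open:
        for child in children:
            lines.extend(render(child, expanded, depth + 1))
    return lines

tree = ("Results", [
    ("Browsing", [("Doc 1", []), ("Doc 2", [])]),
    ("Visualization", [("Doc 3", []), ("Doc 4", []), ("Doc 5", [])]),
])
print("\n".join(render(tree, expanded={"Results", "Browsing"})))
```

Expanding "Visualization" is just adding its label to `expanded` and rendering again; the tree data itself does not change.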
Graphical displays • Show more complex relationships • Use location, colors, dimensions, etc., to represent documents, terms, or concepts. • Provide more interactive functions
What is IV (Information Visualization)? • The use of computer-supported, interactive, visual representations of abstract data • System-centered view • to assist navigation in large information spaces • to reveal complex information structures • User-centered view • to amplify cognition
IV and IR • Both need to process a large amount of information • Both are tools to assist the cognitive process of finding, learning, and understanding information. • Both face the challenge of “uncertainty” • Neither is an “exact science” • Both are subject to human interpretation.
VIRI -- Visual Information Retrieval Interfaces • 2-dimensional graphical display • use graphical objects (icons, dots, etc.) to represent documents • use spatial relationships (position and distance) to indicate document relationships • use colors to group/differentiate documents • use animation to assist interaction
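One simple way to turn document-term relationships into 2-D positions, similar in spirit to reference-point displays such as VIBE, is to pin each key term to a point on the screen and place every document at the weighted average of the points of the terms it contains. The anchor positions, term weights, and the `place_documents` name below are assumptions for illustration:

```python
def place_documents(anchors, docs):
    """Place each document at the weighted average of its terms' anchor points.

    anchors: dict term -> (x, y) screen position chosen for that term.
    docs: dict doc_id -> dict term -> weight.
    """
    positions = {}
    for doc_id, weights in docs.items():
        shared = {t: w for t, w in weights.items() if t in anchors and w > 0}
        total = sum(shared.values())
        if total == 0:
            continue  # no anchor terms: the document cannot be placed
        x = sum(anchors[t][0] * w for t, w in shared.items()) / total
        y = sum(anchors[t][1] * w for t, w in shared.items()) / total
        positions[doc_id] = (x, y)
    return positions

# Hypothetical anchor terms and document term weights.
anchors = {"browsing": (0.0, 0.0), "visualization": (10.0, 0.0), "agents": (5.0, 8.0)}
docs = {
    "d1": {"browsing": 1.0},
    "d2": {"browsing": 1.0, "visualization": 1.0},
    "d3": {"visualization": 1.0, "agents": 3.0},
}
print(place_documents(anchors, docs))
# {'d1': (0.0, 0.0), 'd2': (5.0, 0.0), 'd3': (6.25, 6.0)}
```

A document dominated by one term lands near that term's point (d3 sits closer to "agents" than to "visualization"), and documents with similar term profiles land near each other, which is the spatial cue the slide describes.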
Concept Visualization • AltaVista LiveTopic • HiBrowse Interface • SemioMap • Hyperbolic Trees • Visual Thesaurus • Visual Concept Explorer
Topic Maps • Highwire: http://www.highwire.org