
What is Information Retrieval (IR)?


Presentation Transcript


  1. What is Information Retrieval (IR)? Adapted from UCB Course SIMS 202 and IIT Course on IR

  2. What is information retrieval? • Gathering information from one or more sources based on a need • Major assumption: that the information exists • Broad definition of information • Sources of information • Other people • Archived information (libraries, maps, etc.) • Web • Radio, TV, etc.

  3. Information retrieved • Impermanent information • Conversation • Documents • Text • Video • Files • Etc.

  4. The information acquisition process • Know what you want and go get it • Ask questions of information sources as needed (queries) - SEARCH • Have information sent to you on a regular basis based on some predetermined information need • Push/pull models (pull: searching when the need arises; push: having information delivered based on a standing need)

  5. What IR assumes • Information is stored (or available) • A user has an information need • An automated system exists from which information can be retrieved • Why an automated system? • The system works!!

  6. What IR is usually not about • IR usually deals with unstructured data • Retrieval from databases is usually not considered IR • Database querying assumes that the data is in a standardized format • Transforming all information (news articles, web sites, etc.) into a database format is difficult for large data collections

  7. What an IR system should do • Store/archive information • Provide access to that information • Answer queries with relevant information • Stay current • WISH list • Understand the user’s queries • Understand the user’s need • Act as an assistant

  8. How good is the IR system? Measures of performance based on what the system returns: • Relevance • Coverage • Recency • Functionality (e.g. query syntax) • Speed • Availability • Usability • Time/ability to satisfy user requests
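
Relevance heads this list, and the standard way to quantify it is with precision and recall over what the system returned versus what a user judged relevant. A minimal sketch follows; the document IDs and the judged-relevant set are made-up illustrations, not data from any real system.

```python
def precision_recall(retrieved, relevant):
    """Precision and recall for a single query.

    retrieved: ordered list of document IDs the system returned
    relevant:  set of document IDs a user/assessor judged relevant
    """
    hits = sum(1 for doc in retrieved if doc in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: 5 documents returned, 3 of them judged relevant.
retrieved = ["d1", "d7", "d3", "d9", "d4"]
relevant = {"d1", "d3", "d4", "d8"}
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.60 recall=0.75
```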

  9. How do IR systems work? Algorithms implemented in software: • Gathering methods • Storage methods • Indexing • Retrieval • Interaction
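
As a concrete illustration of the indexing and retrieval methods listed above, here is a minimal sketch of an inverted index: each term maps to the set of documents containing it, and a Boolean AND query is answered by intersecting those sets. The tokenizer and the toy documents are assumptions chosen for illustration, not the design of any particular system.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Naive tokenizer: lowercase, keep alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Inverted index: term -> set of IDs of documents containing the term."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def boolean_and(index, query):
    """Return the documents that contain every query term."""
    postings = [index.get(term, set()) for term in tokenize(query)]
    return set.intersection(*postings) if postings else set()

docs = {
    "d1": "Hippos wallow in the zoo pond",
    "d2": "The zoo opens at nine",
    "d3": "Wild hippos live in rivers",
}
index = build_index(docs)
print(boolean_and(index, "hippos zoo"))  # {'d1'}
```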

  10. Memex (Vannevar Bush, 1945)

  11. Some IR History • Roots in the scientific “Information Explosion” following WWII • Interest in computer-based IR from mid 1950’s • H.P. Luhn at IBM (1958) • Probabilistic models at Rand (Maron & Kuhns) (1960) • Boolean system development at Lockheed (‘60s) • Vector Space Model (Salton at Cornell 1965) • Statistical Weighting methods and theoretical advances (‘70s) • Refinements and Advances in application (‘80s) • User Interfaces, Large-scale testing and application (‘90s) • Then came the web and search engines and everything changed

  12. Existing IR System: Search Engine

  13. A Typical Web Search Engine [diagram: a Crawler gathers pages from the Web and passes them to the Indexer, which builds the Index; Users pose queries through the Interface to the Query Engine, which answers them from the Index]

  14. Crawlers • Web crawlers (spiders) gather information (files, URLs, etc.) from the web. • Primitive IR systems
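
A hedged sketch of what "gathering information from the web" can look like in code: fetch a page, pull out its links, and enqueue unvisited URLs breadth-first. Only the Python standard library is used; the seed URL, page limit, and delay are illustrative assumptions, and a real crawler would also respect robots.txt, deduplicate content, and throttle per host.

```python
import re
import time
import urllib.request
from urllib.parse import urljoin
from collections import deque

def crawl(seed, max_pages=10, delay=1.0):
    """Breadth-first crawl starting from `seed`; returns {url: html}."""
    seen, pages = {seed}, {}
    frontier = deque([seed])
    while frontier and len(pages) < max_pages:
        url = frontier.popleft()
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                html = resp.read().decode("utf-8", errors="replace")
        except Exception:
            continue  # skip pages that fail to download
        pages[url] = html
        # Extract href targets and resolve them against the current page.
        for link in re.findall(r'href="([^"]+)"', html):
            absolute = urljoin(url, link)
            if absolute.startswith("http") and absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
        time.sleep(delay)  # politeness delay between requests
    return pages

# pages = crawl("https://example.com")  # hypothetical seed URL
```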

  15. Finding Out About (FOA) (reference: R. Belew) • Three phases: • Asking of a question (the Information Need) • Construction of an answer (IR proper) • Assessment of the answer (Evaluation) • Part of an iterative process

  16. What is different about IR from other areas, say Computer Science? • Many problems have a right answer • How much money did you make last year? • IR problems usually don’t • Find all documents relevant to “hippos in a zoo”

  17. IR is an Iterative Process [diagram: the searcher moves repeatedly between Goals, a Workspace, and Repositories]

  18. Query Parse User’s Information Need text input

  19. [Pipeline diagram, document side: the Collections are Pre-processed and built into the Index]

  20. [Pipeline diagram, combined: the Query, parsed from the user’s text input, is compared against the Index built from the pre-processed Collections in a Rank or Match step]
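
The Rank or Match step can be realized in many ways; one classic choice, the vector space model mentioned in the history slide, scores each document by the cosine similarity between its TF-IDF vector and the query's. A minimal sketch under that assumption, with a toy collection invented for illustration:

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def tfidf_vectors(docs):
    """Per-document TF-IDF vectors represented as {term: weight} dicts."""
    n = len(docs)
    df = Counter(t for text in docs.values() for t in set(tokenize(text)))
    idf = {t: math.log(n / d) for t, d in df.items()}
    vectors = {}
    for doc_id, text in docs.items():
        tf = Counter(tokenize(text))
        vectors[doc_id] = {t: c * idf[t] for t, c in tf.items()}
    return vectors, idf

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def rank(query, docs):
    """Rank documents by cosine similarity to the query vector."""
    vectors, idf = tfidf_vectors(docs)
    q = Counter(tokenize(query))
    q_vec = {t: c * idf.get(t, 0.0) for t, c in q.items()}
    scores = {d: cosine(q_vec, v) for d, v in vectors.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

docs = {
    "d1": "hippos wallow in the zoo pond",
    "d2": "the zoo opens at nine",
    "d3": "wild hippos live in rivers",
}
print(rank("hippos in a zoo", docs))  # d1 should come out on top
```

Production engines swap the raw log IDF and cosine for tuned variants (BM25 is a common example), but the shape of the computation stays the same.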

  21. [Pipeline diagram, complete: as in the previous slide, with a Query Reformulation loop that feeds results back into a revised query]
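
The Query Reformulation loop may be manual (the user rewords the query) or automatic. The slide does not name a method, so as one hedged example, the classic Rocchio relevance-feedback update nudges the query vector toward documents the user marked relevant and away from those marked non-relevant:

```python
def rocchio(query_vec, relevant_vecs, nonrelevant_vecs,
            alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio update: q' = alpha*q + beta*mean(relevant) - gamma*mean(nonrelevant).

    Vectors are {term: weight} dicts, e.g. the TF-IDF vectors from the
    ranking sketch above. The alpha/beta/gamma defaults are commonly
    quoted textbook values, not something prescribed by this slide.
    """
    def add(acc, vec, scale):
        for term, weight in vec.items():
            acc[term] = acc.get(term, 0.0) + scale * weight

    new_query = {}
    add(new_query, query_vec, alpha)
    for vec in relevant_vecs:
        add(new_query, vec, beta / len(relevant_vecs))
    for vec in nonrelevant_vecs:
        add(new_query, vec, -gamma / len(nonrelevant_vecs))
    # Drop terms whose weight went non-positive; they no longer help the query.
    return {t: w for t, w in new_query.items() if w > 0}

# Hypothetical feedback: the user marked d1 relevant and d2 non-relevant.
q = {"hippos": 0.41, "zoo": 0.41}
d1 = {"hippos": 0.41, "zoo": 0.41, "pond": 1.10}
d2 = {"zoo": 0.41, "opens": 1.10, "nine": 1.10}
print(rocchio(q, [d1], [d2]))  # 'pond' enters the query; 'opens'/'nine' do not
```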

  22. Question Asking • Person asking = “user” • In a frame of mind, a cognitive state • Aware of a gap in their knowledge • May not be able to fully define this gap • Paradox of Finding Out About something: • If the user knew the question to ask, there would often be no work to do • “The need to describe that which you do not know in order to find it” (Roland Hjerppe) • Query • External expression of this ill-defined state

  23. Question Answering • Consider - question answerer is human. • Can they translate the user’s ill-defined question into a better one? • Do they know the answer themselves? • Are they able to verbalize this answer? • Will the user understand this verbalization? • Can they provide the needed background? • Consider - answerer is a computer system.

  24. Assessing the Answer • How well does it answer the question? • Complete answer? Partial? • Background Information? • Hints for further exploration? • How relevant is it to the user? • Introduce notion of relevance.

  25. IR is usually a dialog • The exchange doesn’t end with the first answer • The user can recognize elements of a useful answer • Questions and understanding change as the process continues.

  26. A sketch of a searcher… “moving through many actions towards a general goal of satisfactory completion of research related to an information need.” (after Bates 89) [diagram: the searcher’s path through successive queries Q0, Q1, Q2, Q3, Q4, Q5]

  27. Berry-picking model Berry-picking is a greedy search: grab what you can see or what is nearby • The query is continually shifting • New information may yield new ideas and new directions • The information need • is not satisfied by a single, final retrieved set • is satisfied by a series of selections and bits of information found along the way.

  28. Information Seeking Behavior • Two parts of the process: • search and retrieval • analysis and synthesis of search results

  29. Search Tactics and Strategies • Search Tactics • Bates 79 • Search Strategies • Bates 89 • O’Day and Jeffries 93

  30. Tactics vs. Strategies • Tactic: short term goals and maneuvers • operators, actions • Strategy: overall planning • link a sequence of operators together to achieve some end

  31. Restricted Form of the IR Problem • The system has available only pre-existing, “canned” text passages. • Its response is limited to selecting from these passages and presenting them to the user. • It must select, say, 10 or 20 passages out of millions or billions!
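
Picking 10 or 20 passages out of millions or billions is, mechanically, a top-k selection over scored candidates. A minimal sketch under that reading: keep only the k best with a bounded heap instead of sorting the whole collection. The scoring function here is a stand-in; any of the matching methods sketched earlier could plug in.

```python
import heapq

def top_k(doc_ids, score, k=10):
    """Return the k highest-scoring document IDs without a full sort."""
    return heapq.nlargest(k, doc_ids, key=score)

# Hypothetical corpus and stand-in scorer (a real system would score by
# cosine/TF-IDF or similar against the query).
corpus = (f"d{i}" for i in range(1_000_000))
score = lambda doc_id: (hash(doc_id) % 1000) / 1000.0
print(top_k(corpus, score, k=10))
```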

  32. Information Retrieval • Revised Task Statement: Build a system that retrieves documents that users are likely to find relevant to their queries. • This set of assumptions underlies the field of Information Retrieval.

  33. Structure of an IR System (adapted from Soergel, p. 19) [diagram] • Storage line: Documents & data → Indexing (Descriptive and Subject) → Storage of documents → Store 2: Document representations • Search line: Interest profiles & Queries → Formulating the query in terms of descriptors → Storage of profiles → Store 1: Profiles / Search requests • The two lines meet at Comparison/Matching, which yields Potentially Relevant Documents • Rules of the game = Rules for subject indexing + Thesaurus (consisting of a Lead-In Vocabulary and an Indexing Language)

  37. Measures of performance • How good is that IR system? • BUDLITE SEARCH – never fills you up.

  38. Is IR Knowledge Creation? • If what is collected is indexed and used.
