SIMS 296a-3: Current Topics in Information Access

Marti Hearst, Fall ’98


  1. SIMS 296a-3: Current Topics in Information Access Marti Hearst Fall ’98

  2. Today • Introductions • Goals and Course Requirements • Administrivia • Topics • What is Information Access • Current Topics (an outline) • Intro to IA

  3. Goals • Become expert on the state-of-the-art in timely topics related to information access • Begin getting research results.

  4. Course Requirements • To get S/U credit for the class • Lead two discussions • Do the readings • Attend the meetings

  5. Course Requirements • To get a grade in the class • Do the above • Do one of the following (optionally with the help of a faculty member and/or another student): • Write a publishable survey paper on an emerging area of information access. • Do research that should lead to a publishable research paper on a new idea, method, analysis, or vision statement for an emerging area of information access. • Implement and/or evaluate code to further an information access research project.

  6. Administrivia • Sign up sheet • Readings • Other questions?

  7. Outline • What is Information Access? • Goals, Tasks, Types of data • Standard Information Retrieval • Assumptions, Techniques, Evaluation • Current Topics • Candidate topics

  8. What is Information Access? • Information Access: • The process by which users use information technology to seek, organize, and understand information. • Focus: information expressed as text.

  9. Information Retrieval • Task Statement: Build a system that retrieves documents that users are likely to find relevant to their queries. • This set of assumptions underlies the field of Information Retrieval.

  10. Information Retrieval Assumptions • The system has available only pre-existing, “canned” text passages. • Its response is limited to selecting from these passages and presenting them to the user. • It must select, say, 10 or 20 passages out of millions or billions!

  11. Top 10 Research Issues for IR: What do people want from IR? • By Bruce Croft, DLIB Magazine, Nov 95 • Based on observations from work on public-domain systems, including: • THOMAS • American Memory Project (Library of Congress) • The order of importance does not correspond to many IR researchers’ priorities. • The same can be said for AI researchers.

  12. Top 10 Research Issues for IR • Bruce Croft, DLIB Magazine, Nov 95. In descending order of importance. • Integrated Solutions • Distributed IR • Efficient, Flexible Indexing and Retrieval • “Magic” (Effective Vocabulary Expansion) • Interfaces and Browsing • Routing and Filtering • Effective Retrieval • Multimedia Retrieval • Information Extraction • Relevance Feedback

  13. Other Issues • Mundane issues are important • Spelling Correction • Fast display of initial results • Less important but more interesting from many researchers’ points of view: (Bruce Croft, DLIB Magazine, Nov 95) • Multilingual IR • Data Mining (in text databases) • Text Categorization

  14. Matching Tasks, Collections, and Search Systems • Typical WWW search is not the whole picture. • Different information needs require: • different collections • different search systems and strategies • Compare: • general WWW • newswire and magazines • medical journal articles

  15. Match Task and Search Type • WWW Tasks: (from www.cnet.com/Content/Reviews/Compare/Seach/ss1a.html) • Find how-to pages for Doom. • Purchase plane tickets and hotel for a trip to Java. • Find the top five all-time scoring leaders in the national hockey league. • Find a recipe for potato latkes. • Find the tide tables for Maui. • Characteristics: • Timely, specific, found via help from human agents and in well-known resources before the WWW.

  16. Match Task and Search Type • Newswire & Magazine Tasks: (from the TREC collection) • Find articles on research into cures for osteoporosis. • Find articles on the effects of recycling of tires on the environment. • Find information on jail and prison overcrowding and how inmates are forced to cope with those conditions. • Find discussion of an existing or proposed insurance plan (governmental, commercial or individual) and the coverage it provides for long term care confinements in an institution. • Characteristics: • Complex combinations of topics. • Research-oriented • Either timely or retrospective

  17. Match Task and Search Type • MEDLINE Tasks: (From OHSUMED, medir.ohsu.edu/pub/ohsumed) • Are there adverse effects on lipids when progesterone is given with estrogen replacement therapy? • Pathophysiology and treatment of disseminated intravascular coagulation. • Reviews on subdurals in the elderly. • Effectiveness of etidronate in treating hypercalcemia of malignancy. • Characteristics • Research-oriented • Technical • Cause and Effect, Implications

  18. The Problem of Information Access • Main problem: • Computers can’t understand natural language. • Therefore: • Information access systems must guide users to information of interest by approximate methods. • General common methods: • word match • topic directories
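The word-match method named on slide 18 can be illustrated with a minimal sketch. It assumes a tiny in-memory collection (three task statements borrowed from the slides) and a hypothetical helper, `word_match_rank`, that scores each passage by a raw count of matching query words. This is an illustration of the general idea only, not a method taken from the course; practical systems add term weighting and normalization, which are not shown.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def word_match_rank(query, passages, k=10):
    """Return up to k passages ranked by raw query-word matches.

    Hypothetical helper for illustration: the score is an unweighted
    count of query-word occurrences, and passages with no match are dropped.
    """
    query_words = set(tokenize(query))
    scored = []
    for index, passage in enumerate(passages):
        counts = Counter(tokenize(passage))
        score = sum(counts[word] for word in query_words)
        if score > 0:
            scored.append((score, index))
    # Highest score first; ties keep the original collection order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [passages[index] for _, index in scored[:k]]


# Toy collection: task statements quoted from the slides above.
passages = [
    "Find a recipe for potato latkes.",
    "Find the tide tables for Maui.",
    "Find articles on research into cures for osteoporosis.",
]
top = word_match_rank("tide tables", passages, k=2)
```

Here `top` holds only the Maui passage, because it is the only one containing either query word. The `k` cutoff mirrors the assumption on slide 10 that the system presents a short list selected from a much larger collection.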

  19. Why Text is Tough • Abstract concepts difficult to represent (AI-Complete) • “Countless” combinations of subtle, abstract relationships among concepts • Many ways to represent similar concepts: space ship, flying saucer, UFO, figment of imagination • Concepts are difficult to visualize • High dimensionality: tens or hundreds of thousands of features

  20. Why Text is Tough • I saw Pathfinder on Mars with a telescope. • Pathfinder photographed Mars. • The Pathfinder photograph mars our perception of a lifeless planet. • The Pathfinder photograph from Ford has arrived. • The Pathfinder forded the river without marring its paint job.
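The five sentences on slide 20 can be run through a plain word matcher to show why surface matching is only approximate. The sketch below is an assumption-laden illustration: the `matches` helper and the case-folding tokenizer are hypothetical choices made here, not something the slides prescribe.

```python
import re

# The five example sentences from slide 20.
sentences = [
    "I saw Pathfinder on Mars with a telescope.",
    "Pathfinder photographed Mars.",
    "The Pathfinder photograph mars our perception of a lifeless planet.",
    "The Pathfinder photograph from Ford has arrived.",
    "The Pathfinder forded the river without marring its paint job.",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def matches(query, sentence):
    """True when every query word occurs in the sentence after case folding."""
    return set(tokenize(query)) <= set(tokenize(sentence))


# Zero-based indices of the sentences that match the query.
hits = [i for i, s in enumerate(sentences) if matches("Pathfinder Mars", s)]

# Every distinct word is one feature in a bag-of-words representation.
vocabulary = sorted({word for s in sentences for word in tokenize(s)})
```

With this tokenizer, `hits` is `[0, 1, 2]`: the third sentence matches even though "mars" is a verb there, while the fifth is missed because "marring" is a different surface form. Even these five short sentences yield a few dozen distinct features in `vocabulary`, which hints at how quickly dimensionality grows for a real collection.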

  21. Outline • What is Information Access? • Goals, Tasks, Types of data • Standard Information Retrieval • Assumptions, Techniques, Evaluation • Current Topics • Candidate topics • User Interfaces • Quality Assessment • Text Data Mining • Student suggestions

  22. Tools for Information Access • User Interfaces (information visualization) • Information Access (information retrieval) • Language and Task Analysis • Content Analysis

  23. Current Topics • User Interfaces • Incorporating “personal” information • Automated “Agents” vs. User Initiated Steps • Support for the dynamic process of information access • How to organize large search results • Categories, clusters, combinations of these • Question Answering • Others?

  24. Current Topics • Quality Assessment • Issues: • How to define quality • Rating methods • Different fields (medicine, business) • Techniques • Visitation patterns and times • “Social” techniques • Link structure (co-citation patterns) • Link structure + content
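As a rough illustration of the "link structure (co-citation patterns)" technique listed on slide 24, the sketch below counts how often two pages are linked to by the same citing page. The slide does not specify a method; the function name `cocitation_counts`, the page names, and the toy link graph are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(links):
    """Count, for each pair of pages, how many pages link to both.

    `links` maps a citing page to the pages it links to. Hypothetical
    helper for illustration; pairs are stored in sorted order.
    """
    counts = Counter()
    for targets in links.values():
        # Deduplicate so a page that links twice to a target counts once.
        for a, b in combinations(sorted(set(targets)), 2):
            counts[(a, b)] += 1
    return counts


# Made-up link graph: p1, p2, p3 are citing pages; a, b, c are cited pages.
links = {
    "p1": ["a", "b", "c"],
    "p2": ["a", "b"],
    "p3": ["b", "c"],
}
counts = cocitation_counts(links)
```

In this toy graph the pairs `("a", "b")` and `("b", "c")` are each co-cited twice and `("a", "c")` once. A higher count is one possible signal that two pages are related, which could then be combined with content, as the "link structure + content" bullet suggests.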

  25. Current Topics • Text Data Mining • Visualizing the contents of large text collections • Automatically discovering associations within text collections • Discovering useful patterns • Spotting anomalies • *Finding chains of associated information • *I have a proposal for this

  26. Current Topics • Cognitive modeling/AI techniques • Your idea goes here:

  27. For Next Time • Do background reading • Think about which topics to pursue • I will present more background information
