
Survey



Presentation Transcript


  1. Survey Jaehui Park 2008. 07. 17.

  2. Introduction • Members • Jung-Yeon Yang, Jaehui Park, Sungchan Park, Jongheum Yeon • We are interested in • Issues in Information Retrieval • Crawling, indexing, searching and ranking methods • How to process multi-term queries in information retrieval environments • Examples: • Today • US Today • Today Weather • Paris Today Weather -> Multi-term queries express more complex information needs than single-term queries.

  3. Main Topic • Long Queries in Keyword Search • Keywords: • Compound Query, Evidence Combination, Phrasal Query, Multi-term Query, Multiple Keyword Search, Multiword Unit, and so on • Issues • Proximity or distance • Syntactic structure (order) • Semantics • NLP remedies • …

  4. Proximity • An intuitive concept for processing multi-term queries • Readings • Term Proximity Scoring for Keyword-Based Retrieval Systems • [ECIR 2003] Yves Rasolofo and Jacques Savoy • Efficiency vs. Effectiveness in Terabyte-Scale Information Retrieval • [TREC 2005] Stefan Büttcher and Charles L. A. Clarke • Efficient Text Proximity Search • [SPIRE 2007] Ralf Schenkel et al. • Why Bigger Windows Are Better Than Smaller Ones • [TR-UM 1997] Ron Papka and James Allan • …

  5. Term Proximity Scoring for Keyword-Based Retrieval Systems • Yves Rasolofo and Jacques Savoy • European Colloquium on IR Research (ECIR) 2003, LNCS 2633 • 2008. 07. 17. • Presented by Jaehui Park

  6. Introduction • Phrases, term proximity or term distance in IR • Focus on adding a word-pair scoring module • Okapi probabilistic model + proximity measurement • Previous work • Salton & McGill [1983] • Generating statistical phrases based on word co-occurrence • Fagan [1987] • Considering syntactic relations or syntactic structures • Mitra et al. [1997] • “Once a good basic ranking scheme is used, the use of phrases does not have a major effect on precision at high ranks” • Arampatzis et al. [2000] • The lack of success when using NLP techniques in IR • Hawking & Thistlewaite [1996] • The use of proximity scoring within the PADRE system (Z-mode method)

  7. Okapi • Okapi [Robertson & Spärck Jones 1976] • Ranks documents according to their relevance to a given search query, based on the probabilistic retrieval model • Considers • Term frequency • Document length • The weight for a given term ti in document d
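The weight formula on this slide was an image and did not survive extraction. Assuming it is the textbook Okapi BM25 document-side weight (the exact form and constants used in the talk are not recoverable here), it would read:

```latex
w_d(t_i) = \frac{(k_1 + 1)\, tf_i}{K + tf_i},
\qquad
K = k_1 \left[ (1 - b) + b \cdot \frac{dl}{avdl} \right]
```

Here tf_i is the frequency of t_i in d, dl is the length of d, avdl is the average document length in the collection, and k_1 and b are tuning constants. K is where the document-length factor listed on the slide enters: longer-than-average documents get a larger K and therefore a smaller weight for the same tf_i.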

  8. Okapi • Okapi [Robertson & Spärck Jones 1976] (continued) • The weight for the term ti within a query • The retrieval status value (for a document according to a query)
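The query-term weight and the retrieval status value (RSV) on this slide were also lost in extraction. The Python sketch below assumes the standard BM25 forms: a document weight (k1 + 1)·tf / (K + tf) with K = k1·((1 − b) + b·dl/avdl), a query weight ((k3 + 1)·qtf) / (k3 + qtf) multiplied by a smoothed Robertson/Spärck Jones IDF log((n − df + 0.5) / (df + 0.5)), and an RSV that sums their product over the query terms found in the document. The function names and the defaults k1 = 1.2, b = 0.75, k3 = 1000 are illustrative assumptions, not values taken from the talk.

```python
import math

def okapi_doc_weight(tf, dl, avdl, k1=1.2, b=0.75):
    # K scales k1 by the document's length relative to the collection average
    K = k1 * ((1 - b) + b * dl / avdl)
    return (k1 + 1) * tf / (K + tf)

def okapi_query_weight(qtf, df, n, k3=1000.0):
    # Query-term saturation times a smoothed Robertson/Sparck Jones IDF (assumed form)
    idf = math.log((n - df + 0.5) / (df + 0.5))
    return (k3 + 1) * qtf / (k3 + qtf) * idf

def okapi_rsv(query_tf, doc_tf, dl, avdl, df, n):
    """Retrieval status value: sum of w_d * w_q over query terms found in the document.

    query_tf and doc_tf map term -> frequency, df maps term -> document frequency,
    and n is the number of documents in the collection.
    """
    score = 0.0
    for term, qtf in query_tf.items():
        tf = doc_tf.get(term, 0)
        if tf == 0:
            continue  # terms absent from the document contribute nothing
        score += okapi_doc_weight(tf, dl, avdl) * okapi_query_weight(qtf, df[term], n)
    return score
```

With dl equal to avdl and tf = 1, okapi_doc_weight returns exactly 1.0 under these defaults, which makes it easy to sanity-check the length normalization by hand.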

  9. Term Proximity Weighting • Improving retrieval performance by using term proximity scoring • Assumptions • If a document contains sentences with at least two query terms, the probability that this document is relevant is greater. • The closer the query terms are, the higher the probability of relevance. • Objective • Assign more importance to keywords whose occurrences lie a short distance apart.

  10. Term Proximity Weighting • 1. Expand the request (query) with keyword pairs extracted from the query’s wording • 2. Compute a term pair instance weight, the inverse square of the distance between the two terms • “information retrieval” (distance 1): 1.0 • “the retrieval of medical information” (distance 3): 0.11 (1/9)
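Steps 1 and 2 can be sketched in Python as follows. The sketch builds every unordered pair of distinct query terms and scores each co-occurrence in a document by 1/d², where d is the token distance between the two terms; this reproduces both slide examples (distance 1 gives 1.0, distance 3 gives 1/9) and ignores word order, as the second example requires. The window max_dist = 5, the choice to score every pair of occurrences inside that window, and the function names are assumptions for illustration.

```python
from itertools import combinations

def query_pairs(query_terms):
    # Step 1: expand the query with every unordered pair of distinct query terms
    unique = sorted(set(query_terms))
    return list(combinations(unique, 2))

def pair_instance_weights(doc_tokens, ti, tj, max_dist=5):
    """Step 2: return 1 / d**2 for each occurrence pair of ti and tj within max_dist.

    max_dist is an assumed window; pairs farther apart are ignored.
    """
    pos_i = [k for k, tok in enumerate(doc_tokens) if tok == ti]
    pos_j = [k for k, tok in enumerate(doc_tokens) if tok == tj]
    weights = []
    for p in pos_i:
        for r in pos_j:
            d = abs(p - r)  # order-independent distance
            if 0 < d <= max_dist:
                weights.append(1.0 / d ** 2)
    return weights
```

For example, pair_instance_weights("the retrieval of medical information".split(), "information", "retrieval") returns a single weight of about 0.111, matching the slide.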

  11. Term Proximity Weighting • 3. Sum the instance weights of each term pair • 4. Compute the contribution of all term pairs occurring in the document • 5. Compute the final retrieval status value
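Steps 3 to 5 can be sketched as below, following our reading of Rasolofo and Savoy: the summed instance weights of a pair are saturated with the same Okapi-style formula used for a term frequency, each pair is weighted by the smaller of its two query-term weights, and the resulting proximity score is added to the baseline Okapi RSV. Treat the min() combination, the k1 and b defaults, and all function names as assumptions rather than the paper's exact definitions.

```python
def pair_doc_weight(tpi_values, dl, avdl, k1=1.2, b=0.75):
    # Step 3: sum the instance weights of one term pair in this document
    acc = sum(tpi_values)
    # Saturate the sum like an Okapi term frequency, with length normalization K
    K = k1 * ((1 - b) + b * dl / avdl)
    return (k1 + 1) * acc / (K + acc)

def tp_rsv(pair_tpis, query_weights, dl, avdl):
    """Step 4: proximity score of a document.

    pair_tpis maps (ti, tj) -> list of instance weights found in the document;
    query_weights maps term -> query-side weight.
    """
    total = 0.0
    for (ti, tj), tpis in pair_tpis.items():
        if not tpis:
            continue  # pair never co-occurs within the window
        # Assumption: a pair is credited with the weaker of its two query-term weights
        total += pair_doc_weight(tpis, dl, avdl) * min(query_weights[ti], query_weights[tj])
    return total

def final_rsv(okapi_score, proximity_score):
    # Step 5: proximity evidence is added on top of the baseline Okapi score
    return okapi_score + proximity_score
```

Because the pair weight saturates, many distant co-occurrences cannot outweigh the baseline indefinitely, and a single adjacent pair (instance weight 1.0) in an average-length document contributes exactly the smaller query-term weight under these defaults.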

  12. Experiments • Test Collections • TREC-8 documents (528,155 docs) • Financial Times, Federal Register, Foreign Broadcast Information Service, LA Times • TREC-9 and TREC-10 (1,692,096 docs)

  13. Experiments Evaluation

  14. Experiments Evaluation

  15. Experiments Evaluation

  16. Conclusion • The impact of a new term proximity algorithm on retrieval effectiveness for keyword-based systems was examined. • Improves the ranking of documents in which query term pairs occur within a given distance constraint • The term proximity scoring approach • Improves precision after only a few documents have been retrieved
