
Modern Information Retrieval



Presentation Transcript


  1. Modern Information Retrieval Chapter 3 Retrieval Evaluation

  2. The most common measures of system performance are time and space
     • an inherent tradeoff between the two
     • data retrieval: time and space, e.g. for indexing
     • information retrieval: the precision of the answer set is also important

  3. evaluation considerations • query with/without feedback • query interface design • real data/synthetic data • real life/laboratory environment • repeatability and scalability

  4. recall and precision • recall: the fraction of the relevant documents that have been retrieved • precision: the fraction of the retrieved documents that are relevant
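As an aside not on the slide, here is a minimal Python sketch of these two definitions, assuming the relevant set and the retrieved answer set are available as plain sets of document ids (the ids below are made up for illustration):

```python
def precision_recall(relevant, retrieved):
    """Precision and recall for a single query.

    relevant  -- set of document ids judged relevant (the set R)
    retrieved -- set of document ids returned by the system (the answer set A)
    """
    if not retrieved:
        return 0.0, 0.0
    hits = len(relevant & retrieved)                     # |R ∩ A|
    precision = hits / len(retrieved)                    # retrieved docs that are relevant
    recall = hits / len(relevant) if relevant else 0.0   # relevant docs that were retrieved
    return precision, recall


# 3 relevant documents, 5 retrieved, 2 of the retrieved are relevant
print(precision_recall({"d3", "d56", "d129"}, {"d3", "d56", "d9", "d25", "d84"}))
# -> (0.4, 0.6666666666666666)
```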

  5. can we precisely compute precision? can we precisely compute recall?

  6. precision versus recall curve: a standard evaluation strategy

  7. interpolation procedure for generating the 11 standard recall levels
     • example: Rq = {d3, d56, d129}, a query with only three relevant documents
     • interpolated precision at the j-th standard recall level r_j, where j is in {0, 1, 2, …, 10}:
       P(r_j) = max{ P(r) : r_j ≤ r ≤ r_{j+1} }, where P(r) is a known precision at recall level r
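A minimal Python sketch of this procedure, assuming a ranked result list and the relevant set Rq are given; a precision/recall point is recorded each time a relevant document is seen, and the points are then interpolated to the 11 standard levels. For simplicity the sketch uses the common convention of taking, at each standard level, the maximum observed precision at any recall greater than or equal to that level:

```python
def interpolated_precision_11pt(ranking, relevant):
    """Interpolated precision at the 11 standard recall levels 0.0, 0.1, ..., 1.0.

    ranking  -- list of document ids in the order they were retrieved
    relevant -- set of document ids judged relevant (Rq)
    """
    # Observed (recall, precision) pairs, one per relevant document seen
    points = []
    hits = 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))

    # Interpolate: max observed precision at any recall >= the standard level
    curve = []
    for j in range(11):
        r_j = j / 10
        candidates = [p for r, p in points if r >= r_j]
        curve.append(max(candidates) if candidates else 0.0)
    return curve


# Rq = {d3, d56, d129}; suppose they appear at ranks 1, 3 and 10
ranking = ["d3", "d9", "d56", "d25", "d84", "d7", "d11", "d48", "d90", "d129"]
print(interpolated_precision_11pt(ranking, {"d3", "d56", "d129"}))
```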

  8. to evaluate the retrieval strategy over all test queries, the precisions at each recall level are averaged
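A one-function sketch of this averaging step, building on the per-query curves produced above:

```python
def average_11pt_curve(curves):
    """Average per-query 11-point precision curves, level by level.

    curves -- list of 11-element lists, one per test query
    """
    n_queries = len(curves)
    return [sum(curve[j] for curve in curves) / n_queries for j in range(11)]
```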

  9. another approach: compute average precision at given relevant document cutoff values • advantages?
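The slide does not fix the exact convention, so the sketch below shows one reading: precision is recorded at the rank where the k-th relevant document is seen, for a few illustrative cutoff values k. One possible answer to the slide's "advantages?" prompt is that no interpolation is required, since every measurement point corresponds to a position actually reached in the ranking:

```python
def precision_at_relevant_cutoffs(ranking, relevant, cutoffs=(5, 10, 15, 20, 30, 50, 100)):
    """Precision at the rank where the k-th relevant document is seen.

    Returns {k: precision}, with None when fewer than k relevant documents
    appear anywhere in the ranking.
    """
    result = {k: None for k in cutoffs}
    hits = 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            if hits in result:
                result[hits] = hits / rank
    return result
```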

  10. single-value summaries for each query
      • average precision at seen relevant documents
        • example in Figure 3.2
        • favors systems that retrieve relevant documents quickly
        • a system can do well here and still have poor overall recall
      • R-precision, where R is the total number of relevant documents
        • examples in Figures 3.2 and 3.3
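Minimal sketches of the two single-value summaries, using the same ranking/relevant-set conventions as the earlier snippets:

```python
def average_precision_at_seen_relevant(ranking, relevant):
    """Average of the precision values observed each time a relevant document is seen.

    Note the average is taken over the relevant documents actually seen, which is
    why a system can score well here and still have poor overall recall.
    """
    precisions = []
    hits = 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def r_precision(ranking, relevant):
    """Precision at rank R, where R is the total number of relevant documents."""
    R = len(relevant)
    if R == 0:
        return 0.0
    return sum(1 for doc in ranking[:R] if doc in relevant) / R
```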

  11. precision histogram
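The slide gives only the title; in the chapter this histogram compares two algorithms query by query through the difference of their R-precision values, RP_A/B(i) = RP_A(i) - RP_B(i). A sketch under that reading (the input lists are assumed to be aligned by query number):

```python
def r_precision_histogram(rp_a, rp_b):
    """Per-query R-precision differences between algorithms A and B.

    A positive bar means A did better on that query, a negative bar means B did.
    """
    return [a - b for a, b in zip(rp_a, rp_b)]
```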

  12. combining recall and precision • the harmonic mean • it assumes a high value only when both recall and precision are high
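The formula itself does not appear in the transcript; the harmonic mean of the recall r(j) and precision P(j) at the j-th document in the ranking is F(j) = 2 / (1/r(j) + 1/P(j)). A sketch:

```python
def f_harmonic(recall, precision):
    """Harmonic mean F = 2 / (1/r + 1/P); defined as 0 when either value is 0."""
    if recall == 0 or precision == 0:
        return 0.0
    return 2 / (1 / recall + 1 / precision)
```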

  13. the E measure • b=1, complement of the harmonic mean • b>1, the user is more interested in precision • b<1, the user is more interested in recall
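The E measure is likewise only named in the transcript; in the chapter it is E(j) = 1 - (1 + b^2) / (b^2/r(j) + 1/P(j)), so b = 1 gives exactly 1 - F(j). A sketch:

```python
def e_measure(recall, precision, b=1.0):
    """E = 1 - (1 + b^2) / (b^2/r + 1/P); with b = 1 this is the complement of F."""
    if recall == 0 or precision == 0:
        return 1.0
    return 1 - (1 + b ** 2) / (b ** 2 / recall + 1 / precision)
```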

  14. user-oriented measures

  15. coverage ratio: the fraction of the documents known to the user to be relevant that have been retrieved • a high coverage means the system finds most of the relevant documents the user expected to see

  16. novelty ratio: the fraction of the relevant documents retrieved that were previously unknown to the user • a high novelty means the system reveals many relevant documents the user was not aware of

  17. relative recall: the ratio between the number of relevant documents found and the number of relevant documents the user expected to find • relative recall = (number of relevant documents found) / (number of relevant documents the user expected to find) • when the relative recall equals 1 (the user has found enough relevant documents), the user stops searching

  18. recall effort: the ratio between the number of relevant documents the user expected to find and the number of documents examined in an attempt to find them
      • research in IR has long lacked a solid formal framework
      • it has also lacked robust and consistent testbeds and benchmarks
      • the Text REtrieval Conference (TREC) was created to address this
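A combined sketch of the four user-oriented ratios from slides 15-18. The argument names are illustrative, the relevant sets are assumed to be plain Python sets, and zero-division guards are omitted for brevity:

```python
def coverage_ratio(known_relevant, retrieved):
    """Fraction of the documents the user already knew to be relevant that were retrieved."""
    return len(known_relevant & retrieved) / len(known_relevant)


def novelty_ratio(relevant_retrieved, known_relevant):
    """Fraction of the relevant documents retrieved that were previously unknown to the user."""
    return len(relevant_retrieved - known_relevant) / len(relevant_retrieved)


def relative_recall(relevant_found, relevant_expected):
    """Relevant documents found, divided by the number the user expected to find."""
    return relevant_found / relevant_expected


def recall_effort(relevant_expected, documents_examined):
    """Relevant documents the user expected to find, divided by the documents examined to find them."""
    return relevant_expected / documents_examined
```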

  19. retrieval techniques evaluated
      • methods using automatic thesauri
      • sophisticated term weighting
      • natural language techniques
      • relevance feedback
      • advanced pattern matching
      document collection
      • over 1 million documents
      • newspapers, patents, etc.
      topics
      • stated in natural language
      • conversion into system queries is done by the system

  20. relevant documents
      • the pooling method: for each topic, collect the top k documents returned by each participating system and have human assessors judge their relevance
      the benchmark tasks
      • ad hoc task
      • filtering task
      • Chinese
      • cross languages
      • spoken document retrieval
      • high precision
      • very large collection
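A minimal sketch of the pooling step described above: the union of the top-k documents from every participating run forms the pool that the human assessors judge (the pool depth k and the run format are assumptions for illustration):

```python
def build_pool(runs, k=100):
    """Build the assessment pool for one topic.

    runs -- list of ranked document-id lists, one per participating system
    k    -- pool depth: how many top documents to take from each run
    """
    pool = set()
    for ranking in runs:
        pool.update(ranking[:k])
    return pool  # only these documents are judged by the assessors
```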

  21. evaluation measures • summary table statistics: number of documents retrieved, number of relevant documents retrieved, number of relevant documents not retrieved, etc. • recall-precision averages • document level averages: average precision at seen relevant documents • average precision histogram
