CS 533 – 5 min. Presentations M. Sami Arpa Enes Taylan

Presentation Transcript


  1. CS 533 – 5 min. Presentations M. Sami Arpa Enes Taylan Amit Singhal, Chris Buckley, and Mandar Mitra. 1996. Pivoted document length normalization. In Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval (SIGIR '96). ACM, New York, NY, USA, 21-29. DOI=10.1145/243199.243206 http://doi.acm.org/10.1145/243199.243206

  2. Pivoted Document Normalization Subject: Automatic information retrieval systems work with documents of varying lengths in a text collection.

  3. Pivoted Document Normalization Problem: Long documents have an advantage in retrieval over short documents because of: - Higher term frequencies - More terms

  4. Pivoted Document Normalization Previous Solutions: Document length normalization, which aims to retrieve documents of all lengths fairly: - Cosine normalization - Maximum tf normalization - Byte length normalization
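The three normalization methods listed above can be sketched roughly as follows. This is a minimal illustration, not code from the presentation: the function names are hypothetical, and the constant 0.5 in the maximum tf variant is an assumption based on the augmented tf factor commonly used in the SMART system.

```python
import math


def cosine_norm(weights):
    """Cosine normalization factor: Euclidean length of the term-weight vector."""
    return math.sqrt(sum(w * w for w in weights))


def max_tf_weights(tfs, alpha=0.5):
    """Maximum tf normalization: scale each tf by the largest tf in the document.

    alpha=0.5 is an assumed default (augmented tf), not stated in the slides.
    """
    max_tf = max(tfs)
    return [alpha + (1.0 - alpha) * tf / max_tf for tf in tfs]


def byte_length_norm(text):
    """Byte length normalization factor: size of the document in bytes."""
    return len(text.encode("utf-8"))
```

In each case a document's term weights are divided by (or rescaled with) a quantity that grows with document length, which offsets the higher term frequencies and larger vocabularies of long documents.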

  5. Pivoted Document Normalization Problem with Previous Solutions: The probability of retrieval and the probability of relevance, viewed as functions of document length, have different slopes because of the normalization factor.

  6. Pivoted Document Normalization New approach: Pivoted Document Normalization

  7. Pivoted Document Normalization Likelihood of relevance and retrieval: - Order the documents in a collection by length - Divide them into several equal-sized “bins” - Compute the probability that a randomly selected relevant/retrieved document belongs to a given bin.
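The binning steps above can be sketched as below. This is a simplified illustration under stated assumptions: `bin_probabilities` and its arguments are hypothetical names, a single set of relevant (or retrieved) document ids stands in for the full evaluation data, and each bin is represented by its median document length.

```python
def bin_probabilities(doc_lengths, selected, num_bins):
    """Estimate P(bin | document is in `selected`) for each length bin.

    doc_lengths: dict mapping doc id -> document length
    selected:    set of doc ids that are relevant (or retrieved)
    Returns a list of (median_length_of_bin, probability) pairs.
    """
    # Step 1: order documents by length.
    ordered = sorted(doc_lengths, key=lambda d: doc_lengths[d])
    # Step 2: split into roughly equal-sized bins (ceiling division).
    size = -(-len(ordered) // num_bins)
    total = len(selected)
    result = []
    for start in range(0, len(ordered), size):
        bin_docs = ordered[start:start + size]
        # Step 3: fraction of the selected documents that fall in this bin.
        hits = sum(1 for d in bin_docs if d in selected)
        median_len = doc_lengths[bin_docs[len(bin_docs) // 2]]
        result.append((median_len, hits / total if total else 0.0))
    return result
```

Running it once with the relevant set and once with the retrieved set gives two curves over document length that can then be compared.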

  8. Pivoted Document Normalization Pivoted Normalization Scheme: - “The probability of retrieval of a document is inversely related to the normalization factor.” - To increase the chance that certain documents are retrieved, decrease their normalization factor, and vice versa.

  9. Pivoted Document Normalization Method: - Use a previous normalization method (such as cosine or byte size) to initially retrieve some documents. - Find a tilting amount from the previous normalization.
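As a rough sketch of the tilting idea, the cited Singhal et al. paper expresses the pivoted factor as a linear function of the old one: (1 - slope) * pivot + slope * old normalization. The function and parameter names below are hypothetical, and the numeric values are invented purely for illustration.

```python
def pivoted_norm(old_norm, pivot, slope):
    """Tilt the old normalization factor around the pivot point.

    At old_norm == pivot the factor is unchanged; with slope < 1 the factor
    rises for documents below the pivot and falls for documents above it.
    """
    return (1.0 - slope) * pivot + slope * old_norm


# Illustrative values only (assumed, not taken from the presentation).
pivot, slope = 100.0, 0.5
short_doc = pivoted_norm(50.0, pivot, slope)   # larger than the old factor
long_doc = pivoted_norm(200.0, pivot, slope)   # smaller than the old factor
```

Because the probability of retrieval is inversely related to the normalization factor, a slope below 1 lowers the retrieval chance of documents on the short side of the pivot and raises it for those on the long side.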

  10. Pivoted Document Normalization Method:

  11. Pivoted Document Normalization Results:

  12. Pivoted Document Normalization Conclusion: - If documents of different lengths are retrieved with equal chances, retrieval effectiveness increases. - The pivoted normalization technique could make previously developed normalization techniques more powerful.

  13. Thank you.
