
Distributed Information Retrieval (Jamie Callan)


Presentation Transcript


  1. Distributed Information Retrieval, Jamie Callan. Amir Rahimzadeh Ilkhechi, Yağız Salor, Mustafa İlker Saraç, Hakan Sözer

  2. Motivation • The single database model can be successful if most of the important or valuable information on a network can be copied easily. However, information that cannot be copied is not accessible under the single database model. Information that is proprietary, that costs money, or that a publisher wishes to control carefully is essentially invisible to the single database model.

  3. Solution • The alternative to the single database model is a multi-database model, in which the existence of multiple text databases is modeled explicitly.
  [Figure: Single-DB Model vs. Multi-DB Model; in the multi-DB model a central DB holds descriptions of the private DBs (Private DB 1, Private DB 2).]

  4. Multi-Database Model
  • Resource Description: The contents of each text database must be described.
  • Resource Selection: Given an information need and a set of resource descriptions, a decision must be made about which database(s) to search.
  • Resource Merging: Integrating the ranked lists returned by each database into a single coherent ranked list.
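The three steps above can be read as a pipeline: describe every database once, select a few databases per query, search them, and merge their ranked lists. The Python sketch below only illustrates that control flow; every name and type signature in it is hypothetical and not part of the original slides.

    # Illustrative outline of the multi-database search pipeline.
    # All names (describe, select, search, merge, multi_db_search) are
    # hypothetical; the slides name the three steps, not an API.
    from typing import Callable, Dict, List, Tuple

    ResourceDescription = Dict[str, int]      # term -> frequency in one database
    RankedList = List[Tuple[str, float]]      # (document id, score), best first

    def multi_db_search(
        query: List[str],
        databases: Dict[str, List[str]],      # database name -> its documents
        describe: Callable[[List[str]], ResourceDescription],
        select: Callable[[List[str], Dict[str, ResourceDescription]], List[str]],
        search: Callable[[List[str], List[str]], RankedList],
        merge: Callable[[Dict[str, RankedList]], RankedList],
    ) -> RankedList:
        """Describe each database, pick promising ones for the query,
        search only those, and merge the per-database ranked lists."""
        descriptions = {name: describe(docs) for name, docs in databases.items()}
        chosen = select(query, descriptions)
        per_db = {name: search(query, databases[name]) for name in chosen}
        return merge(per_db)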

  5. Resource Description
  • Approach: A simple and robust solution is to represent each database by a description consisting of the words that occur in the database and their frequencies of occurrence, or statistics derived from those frequencies. Such a description is called a unigram language model.
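In code, such a resource description is just a table of term frequencies, optionally turned into probabilities. A minimal sketch, assuming naive whitespace tokenization and no smoothing (both simplifications not discussed on the slide):

    from collections import Counter
    from typing import Dict, List

    def unigram_language_model(documents: List[str]) -> Dict[str, float]:
        """Describe one database by its unigram language model: the probability
        of each word is its frequency over the total number of word occurrences."""
        counts = Counter()
        for doc in documents:
            counts.update(doc.lower().split())    # naive whitespace tokenization
        total = sum(counts.values())
        return {word: freq / total for word, freq in counts.items()}

    # Example: a tiny two-document database.
    model = unigram_language_model(["aaa bbb ccc", "aaa bbb bbb"])
    print(model["bbb"])    # 3 of 6 word occurrences -> 0.5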

  6. Resource Selection
  • The major part of this resource selection problem is ranking resources by how likely they are to satisfy the information need.
  • The approach is to apply the techniques of document ranking to the problem of resource ranking, using variants of tf.idf approaches. One advantage is that the same query can be used to rank resources and to rank documents.
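Concretely, each resource description can be treated as one large "document" and scored against the query. The sketch below uses a generic tf.idf-style score purely as an illustration; it is not the specific resource-ranking formula (such as CORI) from Callan's work.

    import math
    from typing import Dict, List, Tuple

    def rank_resources(
        query: List[str],
        descriptions: Dict[str, Dict[str, int]],   # database name -> {term: frequency}
    ) -> List[Tuple[str, float]]:
        """Rank databases with a simple tf.idf-style score, treating each
        resource description as one big document."""
        n_dbs = len(descriptions)
        scored = []
        for name, desc in descriptions.items():
            total = sum(desc.values())
            score = 0.0
            for term in query:
                tf = desc.get(term, 0) / total if total else 0.0
                # df = number of databases whose description contains the term
                df = sum(1 for d in descriptions.values() if term in d)
                idf = math.log((n_dbs + 1) / (df + 1))   # smoothed idf
                score += tf * idf
            scored.append((name, score))
        return sorted(scored, key=lambda kv: kv[1], reverse=True)

    # Example: choose databases for the query ["bbb", "ddd"].
    print(rank_resources(["bbb", "ddd"], {
        "db1": {"aaa": 2, "bbb": 3, "ccc": 1},
        "db2": {"bbb": 1, "ddd": 4, "eee": 2},
        "db3": {"aaa": 1, "ccc": 2, "ddd": 1},
    }))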

  7. Resource Merging
  • Solutions include:
  • computing normalized scores (illustrated in the sketch below)
  • estimating normalized scores
  • merging based on unnormalized scores
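Raw scores from different databases are generally on different scales, which is why merging on unnormalized scores is the weakest option. As one illustration of the "computing normalized scores" option, the sketch below applies per-database min-max normalization before re-sorting; the slide does not prescribe this particular formula.

    from typing import Dict, List, Tuple

    def merge_results(
        per_db: Dict[str, List[Tuple[str, float]]],   # database name -> (doc id, score)
    ) -> List[Tuple[str, float]]:
        """Merge per-database ranked lists by min-max normalizing each
        database's scores into [0, 1], then sorting the union globally."""
        merged = []
        for db_name, results in per_db.items():
            if not results:
                continue
            scores = [s for _, s in results]
            lo, hi = min(scores), max(scores)
            for doc_id, score in results:
                norm = (score - lo) / (hi - lo) if hi > lo else 1.0
                merged.append((f"{db_name}/{doc_id}", norm))
        return sorted(merged, key=lambda kv: kv[1], reverse=True)

    # Example: two databases whose scores live on very different scales.
    print(merge_results({
        "db1": [("d1", 12.0), ("d2", 7.5)],
        "db2": [("d9", 0.82), ("d4", 0.41)],
    }))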

  8. Results
  • Accuracy
  • of unigram language models
  • of resource rankings
  • of document rankings
  • Testbeds
  • Summary statistics for three distributed IR testbeds

  9. Conclusion & Summary
  • Resource description: unigram language models
  • Resource selection: tf.idf-style ranking
  • Resource merging: computing normalized scores
