Tutorial#3

rasia
Presentation Transcript


  1. Tutorial#3

  2. Retrieval models Retrieval models match a query against documents in order to: • separate the documents into relevant and non-relevant classes • rank the documents by relevance. Common retrieval models: • Boolean model • Vector space model (VSM) • Probabilistic models

  3. Boolean model • The Boolean model is the most common exact-match model. • Queries are logical expressions whose operands are document features. • In the pure Boolean model, retrieved documents are not ranked.
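To make the exact-match behaviour concrete, here is a minimal Python sketch of Boolean retrieval over an inverted index. The toy collection, the queries and the function names (`build_inverted_index`, `postings`) are hypothetical; AND, OR and NOT map onto set intersection, union and difference, and the result is an unordered set, in line with pure Boolean retrieval returning no ranking.

```python
import re


def build_inverted_index(docs):
    """Map each lowercase word to the set of document IDs that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for term in re.findall(r"[a-z]+", text.lower()):
            index.setdefault(term, set()).add(doc_id)
    return index


def postings(index, term):
    """Posting set for a term; a term absent from the collection matches nothing."""
    return index.get(term, set())


# Hypothetical toy collection, for illustration only.
docs = {
    "D1": "information retrieval models",
    "D2": "boolean retrieval",
    "D3": "vector space models",
}
index = build_inverted_index(docs)
all_ids = set(docs)

# retrieval AND NOT boolean -> intersection with the complement of "boolean"
and_not = postings(index, "retrieval") & (all_ids - postings(index, "boolean"))
# vector OR boolean -> union
either = postings(index, "vector") | postings(index, "boolean")

print(sorted(and_not))  # ['D1']
print(sorted(either))   # ['D2', 'D3']
```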

  4. Example {D7} OR ({D1, D2, D5} AND {D2, D4, D5, D6, D8}) = {D7} OR {D2, D5} = {D2, D5, D7}
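The example can be checked with Python sets. The transcript does not show which query terms produced these posting lists, so the names `t1`, `t2` and `t3` are hypothetical; the sketch assumes the query has the form t1 OR (t2 AND t3), with AND evaluated first.

```python
# Posting sets from the slide. The term names t1, t2, t3 are hypothetical:
# the transcript does not say which query terms produced these lists.
t1 = {"D7"}
t2 = {"D1", "D2", "D5"}
t3 = {"D2", "D4", "D5", "D6", "D8"}

# AND is set intersection and is evaluated first ...
intermediate = t2 & t3
# ... then OR (set union) combines it with t1.
answer = t1 | intermediate

print(sorted(intermediate))  # ['D2', 'D5']
print(sorted(answer))        # ['D2', 'D5', 'D7']
```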

  5. Vector space model (VSM) • Documents and queries are represented as vectors: $d_j = (w_{1,j}, w_{2,j}, \dots, w_{t,j})$ and $q = (w_{1,q}, w_{2,q}, \dots, w_{t,q})$, where $t$ is the number of terms. • Each dimension corresponds to a separate term. If a term occurs in the document, its value in the vector is non-zero.
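As a minimal sketch of this representation, assuming raw term counts as the weights and a hypothetical four-term vocabulary, each text becomes a list with one position per vocabulary term, non-zero only where that term occurs:

```python
import re
from collections import Counter


def to_vector(text, vocabulary):
    """Raw term-count vector for `text`, one dimension per vocabulary term."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    # Counter returns 0 for missing keys, so absent terms give zero entries.
    return [counts[term] for term in vocabulary]


vocabulary = ["boolean", "model", "vector", "space"]  # hypothetical vocabulary (t = 4)

d1 = to_vector("Vector space model", vocabulary)
q = to_vector("Boolean model", vocabulary)
print(d1)  # [0, 1, 1, 1]
print(q)   # [1, 1, 0, 0]
```

Raw counts are only the simplest choice of weight; the next slides replace them with tf-idf.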

  6. Vector space model (VSM) • Several ways of computing these values, also known as (term) weights, have been developed. One of the best-known schemes is tf-idf weighting:

  7. (tf-idf) weighting
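The formula from this slide did not survive in the transcript. One widely used variant sets $w_{i,j} = tf_{i,j} \times \log(N / df_i)$, where $N$ is the number of documents and $df_i$ is the number of documents containing term $i$; the tutorial's exact variant may differ (for example in tf normalization or log base). A sketch of that variant over a hypothetical term-document count matrix:

```python
import math


def tf_idf(tdm):
    """Weight a count matrix (rows = terms, columns = documents) as
    w[i][j] = tf[i][j] * log(N / df[i]), using the natural log."""
    n_docs = len(tdm[0])
    weighted = []
    for row in tdm:
        df = sum(1 for tf in row if tf > 0)          # documents containing the term
        idf = math.log(n_docs / df) if df else 0.0   # unused terms get weight 0
        weighted.append([tf * idf for tf in row])
    return weighted


# Hypothetical 2-term x 3-document count matrix.
counts = [
    [2, 0, 1],  # occurs in 2 of 3 documents
    [1, 1, 1],  # occurs in every document, so idf = log(3/3) = 0
]
weights = tf_idf(counts)
print([[round(w, 3) for w in row] for row in weights])  # [[0.811, 0.0, 0.405], [0.0, 0.0, 0.0]]
```

A term that occurs in every document gets weight 0, which is exactly what the idf factor is meant to do: such a term cannot discriminate between documents.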

  8. Vector space model (VSM)
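The content of this slide is also missing from the transcript. Ranking in the VSM is usually done with cosine similarity between the query vector and each document vector, $\cos(d_j, q) = \frac{d_j \cdot q}{\lVert d_j \rVert \, \lVert q \rVert}$. Assuming that is the measure intended here, a minimal sketch (the function names and toy vectors are hypothetical):

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length weight vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(doc_vectors, query_vector):
    """Return (doc_id, score) pairs sorted by decreasing cosine similarity."""
    scores = [(doc_id, cosine(vec, query_vector)) for doc_id, vec in doc_vectors.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


docs = {"A": [1, 0, 1], "B": [0, 1, 0], "C": [1, 1, 1]}
print([doc_id for doc_id, _ in rank(docs, [1, 0, 1])])  # ['A', 'C', 'B']
```

Dividing by both norms keeps long documents from winning just because they contain more terms.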

  9. Example • Documents: D0: 'How to Bake Bread Without Recipes', D1: 'The Classic Art of Viennese Pastry', D2: 'Numerical Recipes: The Art of Scientific Computing', D3: 'Breads, Pastries, Pies and Cakes : Quantity Baking Recipes', D4: 'Pastry: A Book of Best French Recipe' • Keywords: ['bak', 'recipe', 'bread', 'cake', 'pastr', 'pie']

  10. This generates a term-document matrix of 6 terms × 5 documents.
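The 6 × 5 matrix itself is not in the transcript. The sketch below rebuilds a raw-count version under one stated assumption: a title word counts toward a keyword when it starts with that stem (so 'Baking' matches 'bak' and 'Pastries' matches 'pastr'). The matrix on the slide may instead hold tf-idf weights, and `count_keyword` is a hypothetical helper.

```python
import re

documents = {
    "D0": "How to Bake Bread Without Recipes",
    "D1": "The Classic Art of Viennese Pastry",
    "D2": "Numerical Recipes: The Art of Scientific Computing",
    "D3": "Breads, Pastries, Pies and Cakes : Quantity Baking Recipes",
    "D4": "Pastry: A Book of Best French Recipe",
}
keywords = ["bak", "recipe", "bread", "cake", "pastr", "pie"]


def count_keyword(text, stem):
    """Assumption: a word matches a keyword stem if it starts with that stem."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(1 for w in words if w.startswith(stem))


# Rows = keywords (terms), columns = documents D0..D4: a 6 x 5 count matrix.
tdm = [[count_keyword(documents[d], k) for d in sorted(documents)] for k in keywords]
for k, row in zip(keywords, tdm):
    print(f"{k:<6}", row)
```

Under this assumption D3 contains every keyword, while D1 and D2 each contain only one.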

  11. Query: "baking bread" • The query is represented over the same 6 terms and compared against the 6 terms × 5 documents matrix.
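Continuing the example under the same assumptions (prefix matching of stems, the tf-idf variant $tf \times \log(N/df)$ and cosine similarity, none of which the transcript confirms), the query "baking bread" maps to the stems bak and bread, and the documents can be ranked like this:

```python
import math
import re

keywords = ["bak", "recipe", "bread", "cake", "pastr", "pie"]
doc_ids = ["D0", "D1", "D2", "D3", "D4"]

# Raw-count TDM for the example titles, assuming prefix matching of stems.
tdm = [
    [1, 0, 0, 1, 0],  # bak
    [1, 0, 1, 1, 1],  # recipe
    [1, 0, 0, 1, 0],  # bread
    [0, 0, 0, 1, 0],  # cake
    [0, 1, 0, 1, 1],  # pastr
    [0, 0, 0, 1, 0],  # pie
]

# idf_i = log(N / df_i); every keyword occurs in at least one title here.
n_docs = len(doc_ids)
idf = [math.log(n_docs / sum(1 for tf in row if tf > 0)) for row in tdm]


def query_vector(text):
    """tf-idf query vector: count words starting with each stem, times that stem's idf."""
    words = re.findall(r"[a-z]+", text.lower())
    return [sum(1 for w in words if w.startswith(k)) * idf[i] for i, k in enumerate(keywords)]


def cosine(u, v):
    """Cosine similarity of two weight vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# tf-idf document vectors: column j of the TDM, weighted term by term.
doc_vectors = {
    d: [tdm[i][j] * idf[i] for i in range(len(keywords))]
    for j, d in enumerate(doc_ids)
}
q = query_vector("baking bread")
scores = {d: cosine(vec, q) for d, vec in doc_vectors.items()}
ranking = sorted(doc_ids, key=lambda d: scores[d], reverse=True)
print(ranking)  # ['D0', 'D3', 'D1', 'D2', 'D4']
```

D0 and D3 are the only titles containing the query stems, so they come first. D0 scores higher because its vector has fewer other non-zero terms diluting the match; the remaining documents tie at 0 and keep their original order.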

  12. VSM Implementation • VSMranker.java ranks documents for a query. • It provides functions on which different user interfaces can be built. • Stand-alone usage requires the document and query TDMs: java -cp ../java VSMranker cacm.tdm query.tdm 7 • This retrieves the top 7 documents for the CACM queries.
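The source of VSMranker.java and the layout of the .tdm files are not shown, so the following is only a hypothetical Python analogue of what the command does: given document vectors and several query vectors over the same terms, return the k best documents per query (k = 7 in the command above). The in-memory dictionaries stand in for the TDM files, and `top_k` is a hypothetical name.

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity of two weight vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_k(doc_vectors, query_vectors, k):
    """Map each query ID to the IDs of its k most similar documents, best first."""
    results = {}
    for qid, qvec in query_vectors.items():
        # heapq.nlargest returns the k highest-scoring document IDs, best first.
        results[qid] = heapq.nlargest(k, doc_vectors, key=lambda d: cosine(doc_vectors[d], qvec))
    return results


# Hypothetical in-memory stand-ins for the document and query TDMs.
docs = {"A": [1, 0], "B": [0, 1], "C": [1, 1]}
queries = {"q1": [1, 0], "q2": [0, 1]}
print(top_k(docs, queries, 2))  # {'q1': ['A', 'C'], 'q2': ['B', 'C']}
```

Using `heapq.nlargest` rather than sorting all scores is a small design choice that matters when the collection is large and k is small.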

  13. Ex#3 (to be solved during the tutorial)

  14. References: • http://www.ccs.neu.edu/home/jaa/CSG339.06F/Lectures/vector.pdf • http://www.ccs.neu.edu/home/jaa/CSG339.06F/Lectures/boolean.pdf
