CES 514 Data Mining March 11, 2010 Lecture 5: scoring, term weighting, vector space model (Ch 6). Model for ranking documents. Ranked retrieval Zones, parameters in querying Scoring documents Term frequency Collection statistics Weighting schemes Vector space scoring.
Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.
Lecture 5: scoring, term weighting, vector space model (Ch 6)
Given a Boolean query q and a document d, weighted zone scoring assigns to the pair (q, d) a score in the interval [0, 1], by computing a linear combination of zone scores, where each zone of the document contributes a Boolean value.
Consider a set of documents each of which has ℓ
zones. Let g1, . . . , gℓ ∈ [0, 1] such that
g1 + … + gl = 1. For 1 ≤ i ≤ ℓ, let si be the
Boolean score denoting a match between q and the ith zone.
Weighted score =
Will turn out the base of the log is immaterial.
for q, d length-normalized.