Discussion class 6

Ranking Algorithms



Discussion Classes

Format:

Question

Ask a member of the class to answer

Provide opportunity for others to comment

When answering:

Give your name. Make sure that the TA hears it.

Stand up

Speak clearly so that the whole class can hear


Question 1: Inverted Document Frequency (IDF)

In class, I first introduced Salton's original term weighting, known as Inverted Document Frequency:

$w_{ik} = f_{ik} / d_k$

The reading gives Sparck Jones's term weighting, Inverted Document Frequency (IDF):

$\mathrm{IDF}_i = \log_2(N / n_i) + 1$

or

$\mathrm{IDF}_i = \log_2(\mathit{maxn} / n_i) + 1$

What is the relationship between these alternatives?


Q1 (continued): Definitions of Terms

$w_{ik}$: weight given to term $k$ in document $i$

$f_{ik}$: frequency with which term $k$ appears in document $i$

$d_k$: number of documents that contain term $k$

$N$: number of documents in the collection

$n_i$: total number of occurrences of term $i$ in the collection

$\mathit{maxn}$: maximum frequency of any term in the collection
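A minimal Python sketch of the three weightings on a toy collection may help when comparing them. It takes the symbols exactly as defined above ($n_i$ as the total number of occurrences of term $i$, $\mathit{maxn}$ as the largest such total). The toy documents and the function names (`salton_weight`, `idf_n`, `idf_maxn`) are hypothetical. Many texts instead take $n_i$ to be the number of documents containing term $i$; under that reading only the construction of `totals` would change.

```python
import math
from collections import Counter

# Hypothetical toy collection: each document is a list of terms.
docs = [
    ["ranking", "algorithms", "ranking"],
    ["term", "weighting", "ranking"],
    ["zipf", "law", "term"],
    ["pagerank", "links"],
]

N = len(docs)                                          # N: number of documents
doc_freq = Counter(t for d in docs for t in set(d))    # d_k: documents containing term k
totals = Counter(t for d in docs for t in d)           # n_i: total occurrences of term i
maxn = max(totals.values())                            # maxn: largest n_i in the collection

def salton_weight(doc_index: int, term: str) -> float:
    """w_ik = f_ik / d_k; assumes the term occurs somewhere in the collection."""
    f_ik = docs[doc_index].count(term)
    return f_ik / doc_freq[term]

def idf_n(term: str) -> float:
    """IDF_i = log2(N / n_i) + 1."""
    return math.log2(N / totals[term]) + 1

def idf_maxn(term: str) -> float:
    """IDF_i = log2(maxn / n_i) + 1."""
    return math.log2(maxn / totals[term]) + 1

for term in sorted(totals):
    print(f"{term:10s} n_i={totals[term]}  "
          f"IDF(N)={idf_n(term):.3f}  IDF(maxn)={idf_maxn(term):.3f}")
print("w for 'ranking' in document 0:", salton_weight(0, "ranking"))
```

Printing both IDF columns side by side for every term is a quick way to see how the two variants differ across the vocabulary.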


Question 2: Within-Document Frequency

(a) Why does term weighting using within-document frequency improve ranking?

(b) Why is it necessary to normalize within-document frequency?

(c) Explain Croft's normalization:

$\mathit{cfreq}_{ij} = K + (1 - K)\,\dfrac{\mathit{freq}_{ij}}{\mathit{maxfreq}_j}$

(d) How does Salton and Buckley's recommended term weighting fit with Croft's normalization?
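Croft's normalization can be sketched in a few lines of Python. Here $\mathit{freq}_{ij}$ is read as the frequency of term $i$ in document $j$ and $\mathit{maxfreq}_j$ as the maximum frequency of any term in document $j$, by analogy with the definitions in Question 3. The function name, the sample frequencies, and the value $K = 0.3$ are illustrative assumptions, not taken from the reading.

```python
def croft_normalize(freq_ij: int, maxfreq_j: int, K: float) -> float:
    """cfreq_ij = K + (1 - K) * freq_ij / maxfreq_j.

    freq_ij:   frequency of term i in document j
    maxfreq_j: maximum frequency of any term in document j
    K:         constant in [0, 1]; for 0 <= freq_ij <= maxfreq_j the
               result always lies between K and 1
    """
    if maxfreq_j <= 0:
        raise ValueError("maxfreq_j must be positive")
    return K + (1 - K) * freq_ij / maxfreq_j

# Hypothetical within-document term frequencies.
doc = {"ranking": 6, "idf": 3, "zipf": 1}
maxfreq = max(doc.values())
K = 0.3  # illustrative value only
for term, f in doc.items():
    print(term, round(croft_normalize(f, maxfreq, K), 3))
```

Raising $K$ shrinks the spread between frequent and infrequent terms within a document: $K = 1$ ignores within-document frequency entirely, and $K = 0$ reduces to plain division by the maximum frequency.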


Question 3: Salton/Buckley Recommendation

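$\mathrm{similarity}(Q, D) = \dfrac{\sum_{i=1}^{t} (w_{iq} \times w_{ij})}{\sqrt{\sum_{i=1}^{t} w_{iq}^2 \times \sum_{i=1}^{t} w_{ij}^2}}$

where

$w_{iq} = \left(0.5 + \dfrac{0.5\,\mathit{freq}_{iq}}{\mathit{maxfreq}_q}\right) \times \mathrm{IDF}_i$

and

$w_{ij} = \mathit{freq}_{ij} \times \mathrm{IDF}_i$

$\mathit{freq}_{iq}$: frequency of term $i$ in query $q$

$\mathit{maxfreq}_q$: maximum frequency of any term in query $q$

$\mathrm{IDF}_i$: IDF of term $i$ in the entire collection

$\mathit{freq}_{ij}$: frequency of term $i$ in document $j$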
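A minimal Python sketch of this similarity, assuming the query and document are given as hypothetical term-frequency dictionaries and the IDF values are supplied as a precomputed dictionary (how they are computed is left open here). Terms missing from the IDF table receive weight 0, which is an assumption of this sketch rather than part of the recommendation.

```python
import math

def query_weights(query_freq: dict, idf: dict) -> dict:
    """w_iq = (0.5 + 0.5 * freq_iq / maxfreq_q) * IDF_i."""
    maxfreq_q = max(query_freq.values())
    return {t: (0.5 + 0.5 * f / maxfreq_q) * idf.get(t, 0.0)
            for t, f in query_freq.items()}

def doc_weights(doc_freq: dict, idf: dict) -> dict:
    """w_ij = freq_ij * IDF_i."""
    return {t: f * idf.get(t, 0.0) for t, f in doc_freq.items()}

def similarity(query_freq: dict, doc_freq: dict, idf: dict) -> float:
    """Weighted query-document similarity as in the formula above."""
    wq = query_weights(query_freq, idf)
    wd = doc_weights(doc_freq, idf)
    # Numerator: only terms present in both query and document contribute.
    dot = sum(w * wd[t] for t, w in wq.items() if t in wd)
    # Denominator: square root of the product of the summed squared weights.
    norm = math.sqrt(sum(w * w for w in wq.values()) * sum(w * w for w in wd.values()))
    return dot / norm if norm > 0 else 0.0

# Hypothetical IDF table and term frequencies.
idf = {"ranking": 1.5, "algorithm": 2.0, "zipf": 3.0}
query = {"ranking": 2, "algorithm": 1}
doc = {"ranking": 3, "zipf": 1}
print(round(similarity(query, doc, idf), 4))
```

The denominator is the product of the lengths of the two weight vectors, so with non-negative weights the score lies between 0 (no shared terms) and 1.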


Question 4: Zipf's Law

"... significant performance improvement using ... the inverted document frequency ... that is based on Zipf's distribution ..."

What has Zipf's law to do with IDF?


Question 5: Probabilistic Models

The section on probabilistic models is rather unsatisfactory because it relies on a mathematical foundation that has been left out.

Can you summarize the basic ideas?


Question 6: TF.IDF Compared with Google PageRank

(a) TF.IDF and PageRank are based on fundamentally different considerations. What are the fundamental differences?

(b) Under which circumstances would you expect each to excel?

