Exploring the Similarity Space


M. Yağmur Şahin

Çağlar Terzi

Arif Usta

Introduction
  • What similarity calculations should be used?
    • For each type of query
    • For each type of document
    • For each type of desired performance
  • Is there a “silver bullet” for measurement?
  • To find the answer
    • Q-expression (8-position string)
    • Tested by extending the mg database system
    • Experiments in the TREC environment
Similarity Measure
  • Recall – Precision
  • TREC Conference
  • A range of sources is used
    • Van Rijsbergen [1979]
    • Salton and McGill [1983]
    • Salton [1989]
    • Frakes and Baeza-Yates [1992]
  • Extends the earlier work of Salton and Buckley [1988]
Combining functions
  • Combining functions correspond to
    • importance of each term in the document,
    • importance of that term in the query,
    • length or weight of the document,
    • length of the query
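
As an illustration of how these four ingredients can be combined, here is a minimal sketch of the familiar cosine measure. It is offered as one well-known example of a combining function, not necessarily one of the variants evaluated in the presentation. The function name `cosine_similarity` and the dictionary representation of term weights are assumptions made for this example.

```python
import math


def cosine_similarity(query_weights, doc_weights):
    """Cosine of the angle between two sparse term-weight vectors.

    Both arguments map a term to its weight (hypothetical representation).
    """
    # Only terms present in both the query and the document contribute.
    shared = set(query_weights) & set(doc_weights)
    dot = sum(query_weights[t] * doc_weights[t] for t in shared)

    # Vector lengths play the role of "query length" and "document length".
    q_len = math.sqrt(sum(w * w for w in query_weights.values()))
    d_len = math.sqrt(sum(w * w for w in doc_weights.values()))
    if q_len == 0 or d_len == 0:
        return 0.0
    return dot / (q_len * d_len)


query = {"similarity": 1.0, "space": 1.0}
doc = {"similarity": 2.0, "measure": 1.0}
score = cosine_similarity(query, doc)
```

The numerator uses the document-term and query-term importance, while the denominator normalises by the two lengths, so each of the four items in the list appears once.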
Term Weight
  • Inverse Document Frequency (IDF)
  • Three different term-weighting rules from Salton and Buckley [1988]
  • Document-term and query-term weight
    • Either one, both, or neither of them can be used
Relative Term Frequency
  • TF
  • TF-IDF
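    • $w_{d,t} = r_{d,t} \cdot w_t$, where $r_{d,t}$ is the relative frequency of term $t$ in document $d$ and $w_t$ is the term weight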
  • Salton and Buckley [1988] described three different RTF formulations
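
The TF-IDF product can be sketched as follows. The slides do not give the exact formulas, so the logarithmic forms used for the term weight and the relative term frequency are assumptions chosen because they are common in the literature. The names `idf`, `relative_tf` and `document_term_weight` are hypothetical.

```python
import math


def idf(num_docs, doc_freq):
    """Term weight: rarer terms score higher (assumed form: log(1 + N / f_t))."""
    return math.log(1 + num_docs / doc_freq)


def relative_tf(term_count):
    """Relative term frequency (assumed form: 1 + ln f_dt, or 0 if absent)."""
    return 1 + math.log(term_count) if term_count > 0 else 0.0


def document_term_weight(term_count, num_docs, doc_freq):
    """Document-term weight as the product of relative TF and term weight."""
    return relative_tf(term_count) * idf(num_docs, doc_freq)


# A term occurring 3 times in a document, and in 10 of 1000 documents overall.
weight = document_term_weight(term_count=3, num_docs=1000, doc_freq=10)
```

Swapping in a different `relative_tf` or `idf` while keeping the product unchanged is the kind of variation that the alternative formulations represent.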
Q-Expression
  • 8-position string
    • BB-ACB-BAA
Experiments
  • Aim: find the best combination
  • Exhaustive enumeration
    • [AB][BDI]-[AB][CEF][BDIK]-[AB][ACE]A
    • 720 possibilities
  • 5-10 minutes CPU time per mechanism
  • 2-4 seconds per query per collection
  • Total: 4 weeks
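
The exhaustive enumeration can be sketched by expanding the bracketed pattern into every concrete Q-expression. This is a minimal illustration, and the helper `expand_pattern` is a hypothetical name. Expanding the pattern exactly as transcribed gives 2·3·2·3·4·2·3 = 864 strings rather than the 720 quoted on the slide. The transcript does not say which combinations were excluded, so the sketch does not try to reproduce that figure.

```python
from itertools import product

PATTERN = "[AB][BDI]-[AB][CEF][BDIK]-[AB][ACE]A"


def expand_pattern(pattern):
    """Expand literal characters and [..] choice sets into all concrete strings."""
    slots = []
    i = 0
    while i < len(pattern):
        if pattern[i] == "[":
            # A bracketed set offers one choice per enclosed letter.
            close = pattern.index("]", i)
            slots.append(pattern[i + 1:close])
            i = close + 1
        else:
            # A literal character (letter or hyphen) has exactly one choice.
            slots.append(pattern[i])
            i += 1
    return ["".join(choice) for choice in product(*slots)]


candidates = expand_pattern(PATTERN)
```

Each string in `candidates` names one similarity mechanism to evaluate. The example Q-expression `BB-ACB-BAA` is among them.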
Experiments
  • 6 experimental domains
    • 3 sets of queries
      • Title, narrative, full
    • 2 sets of collections
      • Ap2wsj2 (Newspaper articles)
      • Fr2ziff2 (Non-newspaper articles)
  • 3 effectiveness measures
    • 11-point recall-precision average, averaged over the query set
    • precision-at-20, averaged over the query set
    • reciprocal rank of the first relevant document retrieved, averaged over the query set
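
The three effectiveness measures can be sketched for a single query as below. These follow the standard textbook definitions, which are assumed to match what the experiments used. All function names are hypothetical. `ranking` is the list of retrieved document ids in rank order and `relevant` is the set of ids judged relevant.

```python
def precision_at_k(ranking, relevant, k=20):
    """Fraction of the top k results that are relevant (divides by k itself)."""
    return sum(1 for d in ranking[:k] if d in relevant) / k


def reciprocal_rank(ranking, relevant):
    """1 / rank of the first relevant document, or 0 if none is retrieved."""
    for rank, d in enumerate(ranking, start=1):
        if d in relevant:
            return 1.0 / rank
    return 0.0


def eleven_point_average(ranking, relevant):
    """Mean interpolated precision at recall levels 0.0, 0.1, ..., 1.0."""
    if not relevant:
        return 0.0
    points = []  # (recall, precision) observed at each relevant document
    hits = 0
    for rank, d in enumerate(ranking, start=1):
        if d in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))
    total = 0.0
    for i in range(11):
        level = i / 10
        # Interpolated precision: best precision at any recall >= level.
        total += max((p for r, p in points if r >= level), default=0.0)
    return total / 11


def mean_over_queries(metric, runs):
    """Average a per-query metric over (ranking, relevant) pairs."""
    return sum(metric(ranking, relevant) for ranking, relevant in runs) / len(runs)


ranking = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}
rr = reciprocal_rank(ranking, relevant)
```

Averaging each measure with `mean_over_queries` gives the per-query-set figures described in the list.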
Conclusion
  • No particular measure really stood out; indeed, no measure worked consistently well across all of the queries in a query set
  • No component or weighting scheme was shown to be consistently valuable across all of the experimental domains
  • Better performance could be obtained by choosing a similarity measure to suit each individual query
    • IMPLAUSIBLE!