The Google Similarity Distance

The Google Similarity Distance. Presenter: Chien-Hsing Chen. Authors: Rudi L. Cilibrasi, Paul M.B. Vitanyi. 2007, TKDE. Outline: Motivation, Objective, NGD, Experiments, Conclusions, Personal Opinion.

Presentation Transcript
The Google Similarity Distance

Presenter: Chien-Hsing Chen

Authors: Rudi L. Cilibrasi, Paul M.B. Vitanyi

2007, TKDE


Outline

  • Motivation

  • Objective

  • NGD

  • Experiments

  • Conclusions

  • Personal Opinion


Motivation

  • Designing structures capable of manipulating knowledge is very costly

  • So is having knowledgeable human experts enter high-quality content into these structures

  • These efforts are long-running and large-scale


Objective

  • The authors develop a method that uses only the name of an object to obtain knowledge about the similarity of objects

    • By contrast, regular FCA (formal concept analysis), as used in ontology work, derives similarity from objects and their attributes


The Google Similarity Distance

Kolmogorov complexity


The Google Similarity Distance

  • NGD(horse, rider) = 0.443

    • “horse”: 46,700,000 pages

    • “rider”: 12,200,000 pages

    • “horse, rider” (both terms): 2,630,000 pages

    • N = 8,058,044,651 indexed pages

NGD(pepsi, cola) = 0.797

NGD(賓拉登 “Bin Laden”, 攻擊 “attack”) = 0.64

NGD(horse, rider) = 0.898

NGD(book, drink) = 0.694

NGD(web, network) = 0.2768
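The NGD formula itself does not survive in this transcript. As a rough sketch, assuming the usual definition NGD(x, y) = (max(log f(x), log f(y)) − log f(x, y)) / (log N − min(log f(x), log f(y))), the horse/rider page counts above can be checked as follows. The function name `ngd` and its parameter names are illustrative, not taken from the slides.

```python
import math


def ngd(fx, fy, fxy, n):
    """Normalized Google Distance from raw page counts.

    fx, fy -- number of pages containing each term alone
    fxy    -- number of pages containing both terms
    n      -- total number of indexed pages
    (Assumed definition; the formula is not shown in the transcript.)
    """
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))


# Page counts quoted on the slide for "horse" and "rider".
horse_rider = ngd(46_700_000, 12_200_000, 2_630_000, 8_058_044_651)
print(round(horse_rider, 3))  # 0.443
```

The base of the logarithm cancels in the ratio, so natural logarithms give the same value as base 2.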


Applications and Experiments

  • Hierarchical Clustering

  • Given a set of objects in a space equipped with a distance measure, the distance matrix has as its entries the pairwise distances between the objects.
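A minimal sketch of building such a pairwise distance matrix is shown below. The helper `distance_matrix`, the stand-in measure `toy_dist`, and the sample names are all hypothetical; in the setting of the slides the distance would be NGD, and the clustering step that consumes the matrix is not shown here.

```python
def distance_matrix(names, dist):
    """Return a symmetric matrix of pairwise distances between names."""
    n = len(names)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = dist(names[i], names[j])
            # Distances are symmetric, so fill both halves at once.
            matrix[i][j] = d
            matrix[j][i] = d
    return matrix


def toy_dist(a, b):
    """Placeholder distance (NOT NGD): relative difference in string length."""
    return abs(len(a) - len(b)) / max(len(a), len(b))


names = ["ab", "abcd", "a"]  # hypothetical object names
matrix = distance_matrix(names, toy_dist)
for name, row in zip(names, matrix):
    print(name, row)
```

Only the upper triangle is computed, which halves the number of distance evaluations; that matters when each evaluation requires search-engine queries.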


Applications and Experiments

  • Hierarchical Clustering

  • Dataset: 17th-century painters


Applications and Experiments

  • SVM-NGD Learning

  • The authors use six anchor words a1, …, a6 to convert each of the 40 training words w1, …, w40 into a 6-dimensional training vector v1, …, v40.

  • The entry vj,i of vj = (vj,1, …, vj,6) is defined as vj,i = NGD(wj, ai), for 1 ≤ j ≤ 40 and 1 ≤ i ≤ 6.
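The conversion from words to training vectors can be sketched as below. This is an illustration only: `to_feature_vector`, `build_training_vectors`, and `fake_ngd` are hypothetical names, the word and anchor labels are placeholders, and `fake_ngd` returns made-up values in place of real NGD scores. Training the SVM on the resulting vectors is left out.

```python
def to_feature_vector(word, anchors, ngd):
    """Map one word to the vector (NGD(word, a1), ..., NGD(word, ak))."""
    return [ngd(word, anchor) for anchor in anchors]


def build_training_vectors(words, anchors, ngd):
    """Map every training word to its anchor-distance vector."""
    return [to_feature_vector(word, anchors, ngd) for word in words]


def fake_ngd(word, anchor):
    """Placeholder for a real NGD lookup; values are invented."""
    return (int(word[1:]) + int(anchor[1:])) / 100


anchors = [f"a{i}" for i in range(1, 7)]   # six anchor labels a1..a6
words = [f"w{j}" for j in range(1, 41)]    # forty training labels w1..w40
vectors = build_training_vectors(words, anchors, fake_ngd)
print(len(vectors), len(vectors[0]))  # 40 6
```

Each row of `vectors` corresponds to one training word and each column to one anchor, matching the indexing of vj,i above.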



Comparison to WordNet semantics

  • 100 semantic categories are selected at random from the WordNet database

    • For each category, an SVM is trained on 50 labeled training samples

      Positive examples come from WordNet; the others come from a dictionary

    • Each experiment uses six anchors in total: three from WordNet and three from the dictionary

    • The test set consists of 20 new examples

    • 100 experiments are run in total

  • The authors ignore false negatives


Conclusion

  • The WordNet knowledge base was created over the course of decades by paid human experts.

  • Google has already indexed more than 8 billion pages and shows no signs of slowing down.

    • The figure of 8 billion indexed pages is an estimate from 2004.


Opinion

  • Advantage

    • The Google search engine has recently gained recognition as a basis for similarity measures.

  • Drawback

    • The choice of anchors and the accuracy measure (which ignores false negatives)

    • NGD is not novel in itself; it is a straightforward demonstration

  • Application