The Google Similarity Distance

Presenter: Chien-Hsing Chen
Authors: Rudi L. Cilibrasi, Paul M.B. Vitanyi
Published in IEEE Transactions on Knowledge and Data Engineering (TKDE), 2007

Outline
  • Motivation
  • Objective
  • NGD
  • Experiments
  • Conclusions
  • Personal Opinion
Motivation
  • Designing structures capable of manipulating knowledge is very costly
  • So is having knowledgeable human experts enter high-quality content into those structures
  • Such efforts are long-running and large scale
Objective
  • The authors develop a method that uses only the names of objects to obtain knowledge about their similarity
    • By comparison, regular FCA (formal concept analysis), as used in ontology work, derives similarity from objects and their attributes
NGD (normalized Google distance)
  • Kolmogorov complexity
  • NGD(horse, rider) ≈ 0.443
    • “horse”: 46,700,000 pages
    • “rider”: 12,200,000 pages
    • “horse” and “rider” together: 2,630,000 pages
    • N = 8,058,044,651 indexed pages

  • NGD(pensi, cola) = 0.797
  • NGD(賓拉登, 攻擊) = 0.64 (the Chinese terms for "bin Laden" and "attack")
  • NGD(horse, rider) = 0.898
  • NGD(book, drink) = 0.694
  • NGD(web, network) = 0.2768
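
The horse and rider figure of 0.443 can be reproduced from the page counts listed above. The sketch below uses the NGD definition from the Cilibrasi and Vitanyi paper, NGD(x, y) = (max{log f(x), log f(y)} − log f(x, y)) / (log N − min{log f(x), log f(y)}), where f counts the pages containing the term(s). The function name `ngd` and its parameter names are chosen here for illustration and do not come from the slides.

```python
import math


def ngd(f_x, f_y, f_xy, n):
    """Normalized Google Distance computed from page counts.

    f_x, f_y : number of pages containing term x, term y
    f_xy     : number of pages containing both terms
    n        : total number of indexed pages

    NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y))
                / (log N - min(log f(x), log f(y)))
    """
    log_x, log_y, log_xy = math.log(f_x), math.log(f_y), math.log(f_xy)
    return (max(log_x, log_y) - log_xy) / (math.log(n) - min(log_x, log_y))


# Page counts for "horse" and "rider" as listed on the slide.
horse_rider = ngd(46_700_000, 12_200_000, 2_630_000, 8_058_044_651)
print(round(horse_rider, 3))  # 0.443
```

The base of the logarithm cancels in the ratio, so natural logarithms give the same value as base 2. A page count of zero would make `math.log` raise a `ValueError`, so real use would need to handle terms that never co-occur.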

Applications and Experiments
  • Hierarchical Clustering
  • Given a set of objects in a space equipped with a distance measure, the distance matrix has the pairwise distances between the objects as its entries.
  • Dataset: 17th-century painters
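
As a rough illustration of the distance-matrix idea, the sketch below builds a symmetric matrix from a pairwise distance function and then groups the objects. It is an assumption-laden stand-in: the paper builds its trees with a quartet method, whereas this sketch substitutes plain single-linkage agglomerative clustering, and the object names and distances are invented toy values rather than NGD results for the painters dataset.

```python
from itertools import combinations


def distance_matrix(items, dist):
    """Return a symmetric matrix of pairwise distances with a zero diagonal."""
    n = len(items)
    matrix = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        matrix[i][j] = matrix[j][i] = dist(items[i], items[j])
    return matrix


def single_linkage(matrix, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    Stand-in for the paper's quartet-tree method, not a reimplementation.
    """
    clusters = [[i] for i in range(len(matrix))]

    def gap(pair):
        # Cluster distance = smallest distance between any two members.
        a, b = pair
        return min(matrix[i][j] for i in clusters[a] for j in clusters[b])

    while len(clusters) > n_clusters:
        a, b = min(combinations(range(len(clusters)), 2), key=gap)
        clusters[a] += clusters[b]  # a < b, so deleting b keeps a valid
        del clusters[b]
    return [sorted(c) for c in clusters]


# Invented toy distances, purely for illustration (not values from the paper).
TOY = {("a", "b"): 0.2, ("a", "c"): 0.9, ("a", "d"): 0.8,
       ("b", "c"): 0.85, ("b", "d"): 0.9, ("c", "d"): 0.3}


def toy_dist(x, y):
    return TOY[(x, y)] if (x, y) in TOY else TOY[(y, x)]


items = ["a", "b", "c", "d"]
matrix = distance_matrix(items, toy_dist)
groups = single_linkage(matrix, 2)
print(groups)  # [[0, 1], [2, 3]]
```

In practice `toy_dist` would be replaced by a function that returns the NGD of two search terms.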
  • SVM-NGD Learning
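  • The authors use six anchor words $a_1, \dots, a_6$ to convert each of the 40 training words $w_1, \dots, w_{40}$ into a 6-dimensional training vector, giving $v_1, \dots, v_{40}$.
  • Entry $v_{j,i}$ of $v_j = (v_{j,1}, \dots, v_{j,6})$ is defined as $v_{j,i} = \mathrm{NGD}(w_j, a_i)$, for $1 \le j \le 40$ and $1 \le i \le 6$.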
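
The conversion from words to training vectors can be sketched as follows. This is a minimal illustration under stated assumptions: `to_feature_vectors` and `fake_ngd` are hypothetical names, the words and anchors are made up, and `fake_ngd` is a placeholder so the example runs offline. A real run would pass a function that computes NGD from search-engine page counts and would then hand the vectors to an SVM trainer.

```python
def to_feature_vectors(words, anchors, ngd):
    """Map each word w_j to the vector (NGD(w_j, a_1), ..., NGD(w_j, a_k))."""
    return [[ngd(word, anchor) for anchor in anchors] for word in words]


def fake_ngd(word, anchor):
    """Placeholder distance for the sketch; NOT a real NGD value."""
    return abs(len(word) - len(anchor)) / 10


words = ["horse", "rider"]  # hypothetical training words
anchors = ["a", "bb", "ccc", "dddd", "eeeee", "ffffff"]  # six hypothetical anchors
vectors = to_feature_vectors(words, anchors, fake_ngd)

# One 6-dimensional vector per training word.
print(len(vectors), len(vectors[0]))  # 2 6
```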
Comparison to WordNet semantics
  • 100 semantic categories were randomly selected from the WordNet database
    • For each category, an SVM is trained on 50 labeled training samples
    • Positive examples come from WordNet; the others come from a dictionary
    • Each experiment uses six anchors in total: three from WordNet and three from a dictionary
    • The trained SVM is tested on 20 new examples
    • 100 experiments are run in total
  • The authors ignore false negatives
Conclusions
  • This knowledge base was created over the course of decades by paid human experts.
  • Google has already indexed more than 8 billion pages and shows no signs of slowing down.
    • The 8-billion-page estimate dates from 2004.
Personal Opinion
  • Advantage
    • The Google search engine has recently become well regarded as a basis for measuring similarity.
  • Drawback
    • How the anchors are determined, and an accuracy measure that ignores false negatives
    • NGD is not a novel idea but a straightforward demonstration
  • Application