Semantic Similarity Methods in WordNet and Their Application to Information Retrieval on the Web

Giannis Varelas

Epimenidis Voutsakis

Paraskevi Raftopoulou

Euripides G.M. Petrakis

Evangelos Milios


Semantic Similarity

  • Semantic Similarity relates to computing the conceptual similarity between terms which are not lexicographically similar

    • e.g., “car” and “automobile”

  • Map two terms to an ontology and compute their relationship in that ontology


Objectives

  • We investigate several Semantic Similarity Methods and we evaluate their performance

    • http://www.ece.tuc.gr/similarity

  • We propose the Semantic Similarity Retrieval Model (SSRM) for computing similarity between documents containing semantically similar but not necessarily lexicographically similar terms

    • http://www.ece.tuc.gr/intellisearch


Ontologies

  • Tools of information representation on a subject

  • Hierarchical categorization of terms from general to most specific terms

    • object  artifact  construction  stadium

  • Domain Ontologies representing knowledge of a domain

    • e.g., MeSH medical ontology

  • General Ontologies representing common sense knowledge about the world

    • e.g., WordNet
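
A hierarchical categorization of this kind can be sketched as a parent map. This is only an illustration, not how WordNet or MeSH are stored: the chain object → artifact → construction → stadium comes from the slide, while the sibling term "bridge" and the helper name `path_to_root` are assumptions added for the example.

```python
# Toy Is-A hierarchy: each term points to its more general parent.
# Only the chain object -> artifact -> construction -> stadium is from the
# slide; "bridge" is a hypothetical sibling added for illustration.
PARENT = {
    "artifact": "object",
    "construction": "artifact",
    "stadium": "construction",
    "bridge": "construction",
}


def path_to_root(term):
    """Return the chain from a term up to the most general term."""
    chain = [term]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


# Print the chain from general to specific.
print(" -> ".join(reversed(path_to_root("stadium"))))
```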


WordNet

  • A vocabulary and a thesaurus offering a hierarchical categorization of natural language terms

  • More than 100,000 terms

  • An ontology of natural language terms

  • Nouns, verbs, adjectives and adverbs are grouped into synonym sets (synsets)

  • Synsets represent terms or concepts

    • stadium, bowl, arena, sports stadium – (a large structure for open-air sports or entertainments)


WordNet Hierarchies

  • The synsets are also organized into senses

  • Senses: Different meanings of the same term

  • The synsets are related to other synsets higher or lower in the hierarchy by different types of relationships, e.g.:

    • Hyponym/Hypernym (Is-A relationships)

    • Meronym/Holonym (Part-Of relationships)

  • Nine noun and several verb Is-A hierarchies
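
The relation between terms, synsets and senses can be sketched with a small in-memory structure. The stadium synset and its gloss are taken from the earlier slide; the second synset for "bowl", its gloss, the hypernym names and the function `senses` are assumptions made up for illustration and do not reproduce the actual WordNet database format.

```python
# Each synset groups synonymous terms, carries a gloss, and links to a
# more general synset through an Is-A (hypernym) relation.
SYNSETS = [
    {
        "terms": ["stadium", "bowl", "arena", "sports stadium"],
        "gloss": "a large structure for open-air sports or entertainments",
        "hypernym": "construction",
    },
    {
        # Hypothetical second sense of "bowl", added for illustration.
        "terms": ["bowl"],
        "gloss": "a round vessel that is open at the top",
        "hypernym": "vessel",
    },
]


def senses(term):
    """Return every synset that contains the term: one synset per sense."""
    return [s for s in SYNSETS if term in s["terms"]]


for sense in senses("bowl"):
    print(sense["gloss"], "| is a:", sense["hypernym"])
```

A term with several senses, such as "bowl" here, maps to several synsets, which is why word sense disambiguation matters before any similarity is computed.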


A Fragment of the WordNet Is-A Hierarchy


Semantic Similarity Methods

  • Map terms to an ontology and compute their relationship in that ontology

  • Four main categories of methods:

    • Edge counting: path length between terms

    • Information content: as a function of their probability of occurrence in a corpus

    • Feature based: similarity between their properties (e.g., definitions) or based on their relationships to other similar terms

    • Hybrid: combine the above ideas
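
As a minimal sketch of the edge-counting idea, the path length between two terms can be computed by walking up to a shared ancestor in an Is-A tree. The taxonomy below is a toy: placing "conveyance" and "ceramic" under a common parent named "instrumentality" is an assumption for the example, as are the function names.

```python
# Toy Is-A tree (child -> parent). The parent "instrumentality" shared by
# "conveyance" and "ceramic" is assumed for illustration.
PARENT = {
    "artifact": "object",
    "instrumentality": "artifact",
    "conveyance": "instrumentality",
    "ceramic": "instrumentality",
    "vehicle": "conveyance",
}


def ancestors(term):
    """Map each ancestor of term (including itself) to its distance in edges."""
    dist, d = {}, 0
    while term is not None:
        dist[term] = d
        term = PARENT.get(term)
        d += 1
    return dist


def path_length(a, b):
    """Shortest number of Is-A edges between a and b, or None if unrelated."""
    up_a, up_b = ancestors(a), ancestors(b)
    common = set(up_a) & set(up_b)
    if not common:
        return None
    return min(up_a[c] + up_b[c] for c in common)


print(path_length("conveyance", "ceramic"))
print(path_length("vehicle", "ceramic"))
```

A shorter path means more similar terms; the individual methods differ in how they turn this length into a score.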


Example

  • Edge counting distance between “conveyance” and “ceramic” is 2

  • An information content method would associate the two terms with their common subsumer and with their probabilities of occurrence in a corpus
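
The information-content idea can be sketched as follows, assuming the widely used Resnik and Lin formulations: the information content of a concept is the negative log of its probability, and a concept's frequency includes the frequencies of everything below it. The toy taxonomy and all corpus counts are hypothetical.

```python
import math

# Toy Is-A tree (child -> parent) and hypothetical corpus counts per term.
PARENT = {
    "artifact": "object",
    "instrumentality": "artifact",
    "conveyance": "instrumentality",
    "ceramic": "instrumentality",
    "vehicle": "conveyance",
}
COUNTS = {"object": 10, "artifact": 8, "instrumentality": 6,
          "conveyance": 20, "ceramic": 4, "vehicle": 12}


def chain(term):
    """Term followed by all of its ancestors, most specific first."""
    out = []
    while term is not None:
        out.append(term)
        term = PARENT.get(term)
    return out


# A concept's frequency counts its own occurrences plus its descendants'.
FREQ = {}
for term, n in COUNTS.items():
    for anc in chain(term):
        FREQ[anc] = FREQ.get(anc, 0) + n
TOTAL = FREQ["object"]  # the root subsumes every occurrence


def ic(term):
    """Information content: -log of the probability of the concept."""
    return -math.log(FREQ[term] / TOTAL)


def subsumer(a, b):
    """Most specific concept that is an ancestor of both terms."""
    up_b = set(chain(b))
    return next(c for c in chain(a) if c in up_b)


def resnik(a, b):
    """Resnik similarity: information content of the common subsumer."""
    return ic(subsumer(a, b))


def lin(a, b):
    """Lin similarity: subsumer information relative to the terms' own."""
    denom = ic(a) + ic(b)
    return 2 * resnik(a, b) / denom if denom else 1.0


print(subsumer("conveyance", "ceramic"), round(lin("conveyance", "ceramic"), 3))
```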


Semantic Similarity on WordNet

  • The most popular methods are evaluated

  • All methods are applied to a set of 38 term pairs

  • Their similarity values are correlated with scores obtained by humans

  • The higher the correlation of a method, the better the method is
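
The evaluation step can be sketched as a correlation between method scores and human ratings. The slides do not say which correlation coefficient is used; Pearson's coefficient is assumed here, and the four score pairs are made-up placeholders rather than the 38 pairs of the experiment.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical ratings for four term pairs (not data from the study).
human_scores = [3.9, 3.0, 1.2, 0.4]      # e.g. on a 0-4 scale
method_scores = [0.95, 0.70, 0.30, 0.05]  # similarity values in [0, 1]

print(round(pearson(human_scores, method_scores), 3))
```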


Evaluation


Observations

  • Edge counting/Info. Content methods work by exploiting structure information

  • Good methods take the position of the terms into account

  • Higher similarity for terms which are close together but lower in the hierarchy e.g., [Li et al. 2003]

  • Information Content is measured on WordNet rather than on a corpus [Seco2002]

  • Similarity only for nouns and verbs

  • No taxonomic structure for other parts of speech
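
Two of these observations can be made concrete with a short sketch. It assumes the commonly cited forms of the two measures: for [Li et al. 2003], similarity decays exponentially with path length and grows with the depth of the common subsumer; for the WordNet-based information content of [Seco2002], a concept with many hyponyms carries little information. The default parameter values and the function names are assumptions of this example, not figures taken from the slides.

```python
import math


def li_similarity(path_len, subsumer_depth, alpha=0.2, beta=0.6):
    """Depth-aware edge counting: exp(-alpha * l) * tanh(beta * h).

    path_len (l) is the shortest path between the two terms and
    subsumer_depth (h) is the depth of their common subsumer.
    The alpha and beta defaults are assumed values.
    """
    return math.exp(-alpha * path_len) * math.tanh(beta * subsumer_depth)


def intrinsic_ic(num_hyponyms, total_concepts):
    """Information content from taxonomy structure alone, no corpus needed.

    A leaf (no hyponyms) gets 1.0; the root, which subsumes every other
    concept, gets 0.0.
    """
    return 1.0 - math.log(num_hyponyms + 1) / math.log(total_concepts)


# Same path length, but the deeper pair scores higher.
print(li_similarity(2, 1) < li_similarity(2, 5))
```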


http://www.ece.tuc.gr/similarity


Semantic Similarity Retrieval Model (SSRM)

  • Classic retrieval models retrieve documents with the same query terms

  • SSRM will retrieve documents which also contain semantically similar terms

  • Queries and documents are initially assigned tf·idf weights

  • q = (q1, q2, …, qN), d = (d1, d2, …, dN)
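
The initial weighting can be sketched as below. Many tf·idf variants exist and the slides do not say which one is used; this example assumes raw term frequency multiplied by log(N / document frequency), with whitespace tokenization and three made-up documents.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, one tf-idf weight vector per document)."""
    n = len(docs)
    tokenized = [d.lower().split() for d in docs]
    # Document frequency: in how many documents each term appears.
    df = Counter(t for toks in tokenized for t in set(toks))
    vocab = sorted(df)
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append([tf[t] * math.log(n / df[t]) for t in vocab])
    return vocab, vectors


# Hypothetical mini-collection.
vocab, vectors = tfidf_vectors(["car race", "automobile race", "stadium"])
for term, weight in zip(vocab, vectors[0]):
    print(term, round(weight, 3))
```

In this representation the first two documents share weight only on "race"; "car" and "automobile" occupy different dimensions, which is the gap SSRM is meant to close.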


SSRM

  • Query term re-weighting

    • similar terms reinforce each other

  • Query term expansion with synonyms and similar terms

  • Document similarity
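
The three steps can be sketched as follows. The formulas from the original slides are not preserved in this transcript, so this is one plausible reading rather than the authors' exact definitions: the cutoff `min_sim`, the way expanded terms are weighted, the normalised document similarity and every similarity value in `SIM` are assumptions of this example.

```python
# Hypothetical term-to-term similarities in [0, 1]; unlisted pairs count as 0.
SIM = {
    frozenset(["car", "automobile"]): 1.0,
    frozenset(["car", "vehicle"]): 0.8,
}


def sim(a, b):
    return 1.0 if a == b else SIM.get(frozenset([a, b]), 0.0)


def reweight(query, min_sim):
    """Each query term gains weight from other, similar query terms."""
    return {
        i: w + sum(wj * sim(i, j) for j, wj in query.items()
                   if j != i and sim(i, j) >= min_sim)
        for i, w in query.items()
    }


def expand(query, candidates, min_sim):
    """Add candidate terms that are similar enough to some query term."""
    expanded = dict(query)
    for c in candidates:
        if c in query:
            continue
        gain = sum(w * sim(c, j) for j, w in query.items()
                   if sim(c, j) >= min_sim)
        if gain > 0:
            expanded[c] = gain
    return expanded


def doc_similarity(query, doc):
    """Similarity-weighted match over all query/document term pairs."""
    num = sum(qw * dw * sim(i, j)
              for i, qw in query.items() for j, dw in doc.items())
    den = sum(qw * dw for qw in query.values() for dw in doc.values())
    return num / den if den else 0.0


query = {"car": 1.0}
print(expand(query, ["automobile", "vehicle", "stadium"], min_sim=0.7))
print(doc_similarity(query, {"automobile": 2.0}))
```

A document containing only "automobile" shares no term with the query "car", yet it receives a non-zero score because the two terms are semantically similar.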


Query Term Expansion


Observations

  • Specification of T?

  • Large T may lead to topic drift

  • Word sense disambiguation for expanding with the correct sense

  • Expansion with co-occurring terms?

    • SVD, local/global analysis

  • Semantic similarity between terms of different parts of speech?

  • Work with compound terms (phrases)


Evaluation of SSRM

  • SSRM is evaluated through IntelliSearch, a system for information retrieval on the WWW

  • 1.5 million Web pages with images

  • Images are described by surrounding text

  • The problem of image retrieval is transformed into a problem of text retrieval
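
Describing an image by its surrounding text can be sketched with Python's standard `html.parser`. How the evaluated system actually extracts descriptions is not stated in the slides; the word window, the use of the `alt` attribute and the names `ImageTextCollector` and `describe_images` are assumptions of this example.

```python
from html.parser import HTMLParser


class ImageTextCollector(HTMLParser):
    """Collect page text and remember where each image occurs in it."""

    def __init__(self):
        super().__init__()
        self.words = []   # text words in document order (scripts not filtered)
        self.images = []  # (src, alt text, position in self.words)

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            a = dict(attrs)
            self.images.append(
                (a.get("src", ""), a.get("alt") or "", len(self.words)))

    def handle_data(self, data):
        self.words.extend(data.split())


def describe_images(html, window=5):
    """Map each image src to its alt text plus the words around it."""
    parser = ImageTextCollector()
    parser.feed(html)
    parser.close()
    descriptions = {}
    for src, alt, pos in parser.images:
        nearby = parser.words[max(0, pos - window):pos + window]
        descriptions[src] = (alt + " " + " ".join(nearby)).strip()
    return descriptions


page = ('<p>The new stadium opened today</p>'
        '<img src="s.jpg" alt="arena">'
        '<p>Fans filled every seat</p>')
print(describe_images(page, window=2))
```

Each description can then be indexed and queried like any other text document.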


http://www.ece.tuc.gr/intellisearch


Methods

  • Vector Space Model (VSM)

  • SSRM

  • Each method is represented by a precision/recall plot

  • Each point is the average precision/recall over 20 queries

  • 20 queries from the list of the most frequent Google image queries
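
One way to obtain such averaged points is sketched below. The slides do not spell out the procedure; this example assumes that precision and recall are measured at fixed answer-set sizes and then averaged over the queries. The two runs are made-up data, not results from the experiments.

```python
def precision_recall_at(ranked, relevant, k):
    """Precision and recall of the top-k answers for a single query."""
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / k, hits / len(relevant)


def average_curve(runs, cutoffs):
    """One (precision, recall) point per cutoff, averaged over all queries.

    runs is a list of (ranked answer list, set of relevant answers).
    """
    points = []
    for k in cutoffs:
        pr = [precision_recall_at(ranked, rel, k) for ranked, rel in runs]
        avg_p = sum(p for p, _ in pr) / len(pr)
        avg_r = sum(r for _, r in pr) / len(pr)
        points.append((avg_p, avg_r))
    return points


# Two hypothetical queries with their ranked answers and relevance judgments.
runs = [
    (["a", "b", "c", "d"], {"a", "c"}),
    (["x", "y", "z", "w"], {"y"}),
]
print(average_curve(runs, cutoffs=[2, 4]))
```

Plotting these points for each retrieval method gives one precision/recall curve per method.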


Experimental Results


MeSH and MedLine

  • MeSH: ontology for medical and biological terms by the N.L.M.

    • 22,000 terms

  • MedLine: the premier bibliographic medical database of N.L.M.

    • 13 million references


Evaluation on MedLine


Conclusions

  • Semantic similarity methods approximate the human notion of similarity, reaching a correlation of up to 83%

  • SSRM exploits this information for improving the performance of retrieval

  • SSRM can work with any semantic similarity method and any ontology


Future Work

  • Experimentation with more data sets (TREC) and ontologies

  • Extend SSRM to work with

    • Compound terms

    • More parts of speech (e.g., adverbs)

    • Co-occurring terms

    • More term relationships in WordNet

    • More elaborate methods for specification of thresholds


Try our system on the Web

  • Semantic Similarity System: http://www.ece.tuc.gr/similarity

  • SSRM: http://www.ece.tuc.gr/intellisearch
