Learning taxonomic relations from heterogeneous evidence



Philipp Cimiano

Aleksander Pivk

Lars Schmidt-Thieme

Steffen Staab

ECAI 2004



Purpose

  • To examine whether taxonomic relations can be learned by considering various sources of evidence

  • Main aims:

    • To gain insight into the behavior of different approaches to learning taxonomic relations

    • To provide a first step towards combining these different approaches

    • To establish a baseline model for further research



Introduction

  • Taxonomies or conceptual hierarchies are useful in many NLP applications.

  • However, developing suitable ontologies is time-consuming.

  • Methods for automatically acquiring ontological knowledge are therefore needed.

  • The approach proposed in this paper learns taxonomic relations (is-a relations) by considering four different sources of evidence:

    • Hearst patterns matched in a large corpus

    • Hearst patterns matched on the World Wide Web

    • WordNet

    • The ‘vertical relations’ heuristic



Introduction

  • Goal:

    • Learning is-a relations in the tourism domain

  • Training Corpus:

    • Domain-specific:

      • http://www.lonelyplanet.com

      • http://www.all-inall.de

    • General:

      • British National Corpus

  • The ontology for evaluation:

    • A tourism reference ontology modeled by an ontology engineer.

    • A few abstract concepts were removed.

    • 272 concepts, 225 direct is-a relations, and 636 non-direct is-a relations



Hearst Patterns

  • Lexico-syntactic patterns proposed by Hearst (1992).

    • N such as N1, N2,…

    • such N as N1, N2,…

    • N1, N2,… and other N

    • N, (especially | including) N1, N2,…

  • From these patterns, is-a(Ni, N) can be derived for each Ni.

  • The number of Hearst-pattern matches between each pair of terms is recorded and normalized to the range [0, 1].

  • Different thresholds on this normalized value are tested experimentally.
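The pattern matching, normalization and thresholding described above can be sketched in Python as follows. This is a simplified illustration, not the authors' implementation: it assumes a noun phrase is a single lowercase word (a real system would use a POS tagger or chunker), and it assumes one plausible normalization, dividing the count for (t1, t2) by the total number of matches in which t1 occurs as the hyponym. The names `count_hearst_matches`, `normalize` and `is_a_relations` are hypothetical.

```python
import re
from collections import Counter

# Assumption: a noun phrase (NP) is one lowercase word that is not a list
# connective; a real system would use a POS tagger or NP chunker instead.
NP = r"\b(?!(?:and|or|other)\b)[a-z]+\b"
# A list of NPs: "a", "a, b", "a and b", "a, b, and c", ...
NP_LIST = rf"{NP}(?:, {NP})*(?:,? (?:and|or) {NP})?"

# Simplified versions of the four patterns on the slide.
# Each entry is (regex, True if the hypernym N is captured first).
PATTERNS = [
    (re.compile(rf"({NP}) such as ({NP_LIST})"), True),                     # N such as N1, N2, ...
    (re.compile(rf"such ({NP}) as ({NP_LIST})"), True),                     # such N as N1, N2, ...
    (re.compile(rf"({NP_LIST}) and other ({NP})"), False),                  # N1, N2, ... and other N
    (re.compile(rf"({NP}),? (?:especially|including) ({NP_LIST})"), True),  # N, (especially | including) N1, ...
]


def split_list(np_list):
    """Split 'a, b, and c' into ['a', 'b', 'c']."""
    return [t for t in re.split(r",? (?:and|or) |, ", np_list) if t]


def count_hearst_matches(text):
    """Count how often each (hyponym, hypernym) pair is matched by a pattern."""
    counts = Counter()
    lowered = text.lower()
    for regex, hypernym_first in PATTERNS:
        for m in regex.finditer(lowered):
            if hypernym_first:
                hypernym, hyponyms = m.group(1), m.group(2)
            else:
                hyponyms, hypernym = m.group(1), m.group(2)
            for hyponym in split_list(hyponyms):
                counts[(hyponym, hypernym)] += 1
    return counts


def normalize(counts):
    """Assumed normalization: divide the count for (t1, t2) by the total number
    of matches in which t1 occurs as hyponym, giving scores in [0, 1]."""
    totals = Counter()
    for (hyponym, _), n in counts.items():
        totals[hyponym] += n
    return {pair: n / totals[pair[0]] for pair, n in counts.items()}


def is_a_relations(scores, threshold):
    """Derive is-a(t1, t2) for every pair whose score reaches the threshold."""
    return {pair for pair, s in scores.items() if s >= threshold}


text = ("We visited cities such as Rome, Paris and Berlin. "
        "Hostels, campsites and other accommodations are cheap. "
        "Such cities as Rome attract tourists. "
        "Travellers love capitals, including Rome and Madrid.")
scores = normalize(count_hearst_matches(text))
print(sorted(is_a_relations(scores, threshold=0.5)))
```

Here "rome" is matched twice with "cities" and once with "capitals", so with a threshold of 0.5 the pair ('rome', 'cities') (score 2/3) is kept while ('rome', 'capitals') (score 1/3) is dropped; raising or lowering the threshold trades recall against precision.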





WordNet

  • WordNet is not an “unstructured” source of evidence.

  • However, it is general and domain-independent.

  • A term may have several senses, so there may be more than one hypernym path between two terms.

  • Two different strategies are used:

    • Normalizing over all hypernym paths between the two terms

    • Considering only the most frequent sense of t1
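One plausible reading of the two strategies is sketched below over a small hypothetical hypernym graph, not real WordNet data. The all-paths score is assumed to be the fraction of hypernym paths (from all senses of t1) on which t2 appears; the first-sense score is assumed to be 1 if t2 is a hypernym of t1's first, most frequent sense and 0 otherwise. The exact formulas in the paper may differ, and all names here are illustrative.

```python
# Hypothetical toy data standing in for WordNet. Senses are listed most
# frequent first; a sense may have several direct hypernyms.
SENSES = {
    "hotel": ["hotel#1"],
    "club": ["club#1", "club#2"],
}
HYPERNYMS = {
    "hotel#1": ["building#1", "accommodation#1"],
    "club#1": ["association#1"],
    "club#2": ["building#1"],
    "association#1": ["organization#1"],
    "organization#1": ["entity#1"],
    "building#1": ["structure#1"],
    "structure#1": ["entity#1"],
    "accommodation#1": ["entity#1"],
    "entity#1": [],
}


def term(sense):
    """Map a sense id such as 'club#2' back to its term 'club'."""
    return sense.split("#")[0]


def hypernym_paths(sense):
    """All paths from a sense up to a root, one per chain of hypernyms."""
    parents = HYPERNYMS.get(sense, [])
    if not parents:
        return [[sense]]
    return [[sense] + path for parent in parents for path in hypernym_paths(parent)]


def on_path(t2, path):
    """True if some sense of t2 lies above the starting sense on the path."""
    return any(term(s) == t2 for s in path[1:])


def is_a_all_paths(t1, t2):
    """Strategy 1 (assumed): fraction of all hypernym paths of t1 containing t2."""
    paths = [p for s in SENSES.get(t1, []) for p in hypernym_paths(s)]
    if not paths:
        return 0.0
    return sum(on_path(t2, p) for p in paths) / len(paths)


def is_a_first_sense(t1, t2):
    """Strategy 2 (assumed): 1.0 if t2 is a hypernym of t1's first sense."""
    senses = SENSES.get(t1, [])
    if not senses:
        return 0.0
    return 1.0 if any(on_path(t2, p) for p in hypernym_paths(senses[0])) else 0.0


print(is_a_all_paths("hotel", "building"))    # 0.5 (one of two paths)
print(is_a_first_sense("hotel", "building"))  # 1.0
print(is_a_all_paths("club", "building"))     # 0.5 (only the second sense)
print(is_a_first_sense("club", "building"))   # 0.0
```

The two strategies disagree exactly where a term is ambiguous or has several hypernyms. With NLTK's WordNet interface, `Synset.hypernym_paths()` could supply real paths in place of the toy graph.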





‘Vertical Relations’-Heuristic

  • Given terms t1 and t2, if t2 matches the head of t1 and t1 is additionally modified by certain terms or adjectives, the relation is-a(t1, t2) is derived.

  • Example: is-a_HEURISTIC(international conference, conference)
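A minimal sketch of the heuristic, assuming English right-headed compounds so that the head of a term is its final word(s). It accepts any preceding modifier, whereas the slide restricts the rule to "certain terms or adjectives"; that restriction is not specified here, so the function names and the matching rule are illustrative only.

```python
def vertical_is_a(t1, t2):
    """Heuristic is-a(t1, t2): t1 ends with all words of t2 and has at least
    one extra modifier in front (assumes English right-headed compounds)."""
    w1, w2 = t1.lower().split(), t2.lower().split()
    return 0 < len(w2) < len(w1) and w1[-len(w2):] == w2


def vertical_relations(terms):
    """Apply the heuristic to every ordered pair of candidate terms."""
    return {(a, b) for a in terms for b in terms if vertical_is_a(a, b)}


terms = ["conference", "international conference", "conference room", "room"]
print(sorted(vertical_relations(terms)))
# [('conference room', 'room'), ('international conference', 'conference')]
```

Matching only at the end of the term is what keeps "conference room" from being classified as a conference.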



World Wide Web

  • The Google API (http://www.google.com/apis/) is used to count the matches of certain Hearst patterns on the Web.

  • For a pair (t1, t2), the numbers of Google hits for all patterns are summed and then normalized by dividing by the number of hits returned for t1.
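The normalization above can be sketched as follows. The hit counter is passed in as a function so that the sketch does not depend on any particular search API; the exact query strings (phrase queries instantiating the Hearst patterns, without plural handling) are assumptions, as are the function names and the fake hit counts used in the example.

```python
from typing import Callable, List


def pattern_queries(t1: str, t2: str) -> List[str]:
    """Phrase queries instantiating Hearst patterns for is-a(t1, t2)."""
    return [
        f'"{t2} such as {t1}"',
        f'"such {t2} as {t1}"',
        f'"{t1} and other {t2}"',
        f'"{t2} especially {t1}"',
        f'"{t2} including {t1}"',
    ]


def web_score(t1: str, t2: str, hits: Callable[[str], int]) -> float:
    """Sum the hits over all pattern queries for (t1, t2) and divide by the
    number of hits for t1 alone; returns 0.0 if t1 itself has no hits."""
    t1_hits = hits(f'"{t1}"')
    if t1_hits == 0:
        return 0.0
    return sum(hits(q) for q in pattern_queries(t1, t2)) / t1_hits


# Made-up hit counts standing in for a real search engine.
FAKE_HITS = {
    '"hotel"': 1000,
    '"accommodation such as hotel"': 30,
    '"hotel and other accommodation"': 20,
}


def fake_hits(query: str) -> int:
    return FAKE_HITS.get(query, 0)


print(web_score("hotel", "accommodation", fake_hits))  # 0.05
```

Dividing by the hits for t1 keeps very frequent terms from dominating simply because they appear on more pages.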





Combining Evidences
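The combination strategy is only characterized in the conclusion as simple. As an illustration, the sketch below assumes one simple option: averaging each pair's normalized score over all sources (counting 0.0 where a source gives no evidence) and thresholding the mean. The function name and the scores in the example are made up; this is not necessarily the strategy used in the paper.

```python
from statistics import mean
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]


def combine(evidence: Dict[str, Dict[Pair, float]], threshold: float):
    """Average each pair's score over all sources (0.0 where a source is
    silent) and keep the pairs whose mean reaches the threshold."""
    pairs: Set[Pair] = set()
    for scores in evidence.values():
        pairs.update(scores)
    combined = {p: mean(scores.get(p, 0.0) for scores in evidence.values())
                for p in pairs}
    accepted = {p for p, s in combined.items() if s >= threshold}
    return accepted, combined


# Made-up per-source scores, each already normalized to [0, 1].
evidence = {
    "hearst": {("hotel", "accommodation"): 0.8},
    "web": {("hotel", "accommodation"): 0.4, ("hotel", "building"): 0.3},
    "wordnet": {("hotel", "building"): 0.5},
    "heuristic": {},
}
accepted, combined = combine(evidence, threshold=0.25)
print(accepted)  # {('hotel', 'accommodation')}
```

A weighted mean or a learned classifier could replace the plain average; the latter runs into the large number of negative examples mentioned in the conclusion.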



Conclusion and Further Work

  • A simple combination strategy improves the results.

  • Further work remains to determine whether other sources of evidence could be integrated into this approach.

  • It may prove useful to consider only domain-specific text collections instead of a general corpus such as the BNC, and only domain-related pages on the World Wide Web.

  • Determining the optimal strategy for combining the different approaches remains a challenge.

  • Applying machine learning techniques for this purpose requires coping with the large number of negative examples.

