Lexical Semantics and Ontologies Tutorial at the ACL/HCSnet 2006 Advanced Program in Natural Language Processing

Melbourne 2006



Paul Buitelaar

Language Technology Lab &

Competence Center Semantic Web

DFKI GmbH

Saarbrücken, Germany

Overview
  • Day 1: Words and Meanings
    • Human language as a system
    • How do words relate to each other
  • Day 2: Words and Object Descriptions
    • Human language as a means of representation
    • How do words represent objects in the/a world
Day 1 - Introduction
  • Words and Meanings
    • Synsets and Senses
      • Lexical Semantics in WordNet
    • Related Senses
      • Generative Lexicon and CoreLex
    • Domains and Senses
      • Tuning WordNet to a Domain
Words and Meanings
  • Lexical Semantics in WordNet
  • Generative Lexicon and CoreLex
  • Tuning WordNet to a Domain

WordNet
  • Lexical Semantic Resource
    • Semantic Lexicon
      • Maps words to meanings (senses)
    • Lexical Database
      • Machine readable (has a formal structure)
  • Freely available
    • http://wordnet.princeton.edu/
WordNet - Origins

In 1985 a group of psychologists and linguists at Princeton University undertook to develop a lexical database …

The initial idea was to provide an aid to use in searching dictionaries conceptually, rather than merely alphabetically …

WordNet … instantiates hypotheses based on results of psycholinguistic research …

… expose such hypotheses to the full range of the common vocabulary

In anomic aphasia, there is a specific inability to name objects. When confronted with an apple, say, patients may be unable to utter “apple,” even though they will reject such suggestions as shoe or banana, and will recognize that apple is correct when it is provided. (Caramazza/Berndt 1978)

Miller, George A., Richard Beckwith, Christiane Fellbaum, Derek Gross and Katherine J. Miller. “Introduction to WordNet: an on-line lexical database.” In: International Journal of Lexicography 3 (4), 1990, pp. 235-244.

Synsets
  • WordNet is organized around word meaning (not word forms as with traditional lexicons)
    • Word meaning is represented by “synsets”
    • Synset is a “Set of Synonyms”
  • Example
    • {board, plank}
      • Piece of lumber
    • {board, committee}
      • Group of people
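As a minimal sketch of the synset idea, assuming a toy inventory that holds only the two synsets from this slide (the names `SYNSETS` and `senses_of` are illustrative and not part of any WordNet API), the senses of a word can be read off as the synsets that contain it:

```python
# Toy synset inventory copied from the slide; real WordNet is far larger.
SYNSETS = {
    frozenset({"board", "plank"}): "piece of lumber",
    frozenset({"board", "committee"}): "group of people",
}


def senses_of(word):
    """Return the glosses of every synset that contains the word, sorted."""
    return sorted(gloss for members, gloss in SYNSETS.items() if word in members)


print(senses_of("board"))  # member of two synsets, hence two senses
print(senses_of("plank"))  # member of one synset, hence one sense
```

A word that appears in no synset simply gets an empty list of senses.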
Synset Hierarchy
  • Synsets are organized in hierarchies
    • Defines:
      • generalization (hypernymy)
      • specialization (hyponymy)
  • Example (hypernymy reads upward, hyponymy reads downward)

{entity}
  {whole, unit}
    {building material}
      {lumber, timber}
        {board, plank}
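The hierarchy can be sketched as a child-to-parent map. This is a toy illustration, not WordNet's API: the five synsets come from the slide, the links are treated as direct even though the slide may skip intermediate nodes, and the names `HYPERNYM`, `hypernym_path` and `is_hyponym_of` are made up for this example.

```python
# Child -> parent links for the slide's example chain (assumed to be direct).
HYPERNYM = {
    "{board, plank}": "{lumber, timber}",
    "{lumber, timber}": "{building material}",
    "{building material}": "{whole, unit}",
    "{whole, unit}": "{entity}",
}


def hypernym_path(synset):
    """Follow hypernym links upward until a synset without a parent is reached."""
    path = [synset]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return path


def is_hyponym_of(specific, general):
    """True if `general` lies strictly above `specific` in the hierarchy."""
    return general in hypernym_path(specific)[1:]


print(" -> ".join(hypernym_path("{board, plank}")))
```

Generalization walks the path upward; specialization is the same relation read in the opposite direction.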

Synsets and Senses
  • Synsets represent word meaning
    • Words that occur in several synsets have a corresponding number of meanings (senses)
  • Example
(Other) WordNet Relations
  • Synonymy
    • Similar in meaning
  • Hypernymy/Hyponymy
    • Generalization and Specialization
  • Meronymy
    • Part-of
      • e.g. study, bathroom, ... are meronyms of house
  • Antonymy
    • Opposite in meaning
      • e.g. warm is an antonym of cold
Words and Meanings
  • Lexical Semantics in WordNet
  • Generative Lexicon and CoreLex
  • Tuning WordNet to a Domain

Systematic Polysemy
  • Homonymy
    • bank
      • embankment: We walked along the bank of the Charles river.
      • institution: Did he have an account at the HBU bank?
  • Systematic Polysemy
    • school
      • group (of people): The school went for an outing.
      • (learning) process: School starts at 8.30.
      • organization: The school was founded in 1910.
      • building: The school has a new roof.

Semantic or Pragmatic?

[Figure: two panels, Semantic Analysis and Pragmatic Analysis. In each panel the lexical item "school" (Lexical Items of the Language) is connected to Obj1, Obj2, Obj3 and Obj4 (Objects in the World).]
Underspecified Discourse Referents
  • Anaphora Resolution
    • [A long book heavily weighted with military technicalities]NP:event-physical_object-content, in this edition it is neither so long (event) nor so technical (content) as it was originally.
  • Metonymy
    • The Boston office called
      • office > person
      • person part-of office
  • Bridging
    • Peter bought a car. The engine runs well.
      • engine part-of car
    • The Boston office called. They asked for a new price.
      • office > person
Generative Lexicon Theory

Type Coercion
  • I began the book
    • book > event
    • event ‘has-relation-with’ book
    • read is-a event

  • multifaceted representation of lexical semantics
    • reflecting systematic / regular / logical polysemy
Generative Lexicon Theory

Qualia Structure (Pustejovsky 1995)

  • Formal: inheritance (is-a / hyponymy)
    • book formal: artifact, communication, …
  • Constitutive: modification (part-of / meronymy)
    • book constitutive: section, …
  • Telic: purpose (“what is the object used for”)
    • book telic: read, …
  • Agentive: causality (“how did the object come about”)
    • book agentive: write, …
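A qualia structure can be sketched as a small record, and type coercion as a lookup into it. The following is only an illustration under stated assumptions: the `Qualia` class, the `LEXICON` table and `coerce_to_event` are hypothetical names, the values are the ones listed for book, and drawing the event readings from the telic and agentive roles is one common reading of Generative Lexicon theory rather than a definitive implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Qualia:
    formal: list = field(default_factory=list)        # is-a
    constitutive: list = field(default_factory=list)  # part-of
    telic: list = field(default_factory=list)         # purpose
    agentive: list = field(default_factory=list)      # origin


# Values copied from the slide's "book" example.
LEXICON = {
    "book": Qualia(
        formal=["artifact", "communication"],
        constitutive=["section"],
        telic=["read"],
        agentive=["write"],
    )
}


def coerce_to_event(noun):
    """Candidate events when a verb such as 'begin' expects an event argument."""
    qualia = LEXICON[noun]
    return qualia.telic + qualia.agentive


print(coerce_to_event("book"))  # ['read', 'write']
```

Under this sketch, "I began the book" is read as beginning to read or to write the book.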

CoreLex (Buitelaar 1998)
  • Automatic Qualia Structure Acquisition
    • CoreLex is an attempt to automatically acquire underspecified lexical semantic representations that reflect systematic polysemy
    • These representations can be viewed as shallow Qualia Structures
  • Sense Distribution in WordNet
    • Systematic polysemy can be empirically studied in WordNet by observing sense distributions

>> If more than two words share the same sense distribution (i.e. have the same set of senses), then this may indicate a pattern of systematic polysemy (adapted from Apresjan 1973)
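The heuristic above can be sketched as a grouping step. This is a simplified illustration, not the CoreLex implementation: the table `BASIC_TYPES` is hand-built from a few nouns that the tutorial lists under its example classes, "hammer" is an invented single-type noun, and the function name is hypothetical.

```python
from collections import defaultdict

# Noun -> set of basic types. Hand-built toy data (see the lead-in).
BASIC_TYPES = {
    "book": {"artifact", "communication"},
    "journal": {"artifact", "communication"},
    "novel": {"artifact", "communication"},
    "cent": {"possession", "quantity_definite"},
    "penny": {"possession", "quantity_definite"},
    "shilling": {"possession", "quantity_definite"},
    "hammer": {"artifact"},  # invented: a noun with a single basic type
}


def polysemous_classes(basic_types, min_members=3):
    """Group nouns by sense distribution; keep multi-type groups that are
    shared by more than two nouns."""
    groups = defaultdict(list)
    for noun, types in basic_types.items():
        groups[frozenset(types)].append(noun)
    return {
        " ".join(sorted(types)): sorted(nouns)
        for types, nouns in groups.items()
        if len(types) > 1 and len(nouns) >= min_members
    }


for name, nouns in polysemous_classes(BASIC_TYPES).items():
    print(name, "->", nouns)
```

Nouns with a single basic type are not polysemous at this level and are therefore dropped.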

Systematic Polysemous Classes

book
  1. {publication} => artifact
  2. {product, production} => artifact
  3. {fact} => communication
  4. {dramatic_composition, dramatic_work} => communication
  5. {record} => communication
  6. {section, subdivision} => communication
  7. {journal} => artifact

Systematic Polysemous Class: “artifact communication”

amulet annals armband arrow article ballad bauble beacon bible birdcall blank blinker boilerplate book bunk cachet canto catalog catalogue chart chevron clout compact compendium convertible copperplate copy cordon corker ... guillotine homophony horoscope indicator journal laurels lay ledger loophole marker memorial nonsense novel obbligato obelisk obligato overture pamphlet pastoral paternoster pedal pennant phrase platform portrait prescription print puzzle radiogram rasp recap riddle rondeau … statement stave stripe talisman taw text tocsin token transcription trophy trumpery wand well whistle wire wrapper yardstick

From WordNet to CoreLex

[Figure: diagram linking Noun 1 … Noun n, basic types, and Systematic Polysemous Class 1 … Systematic Polysemous Class n]
Other Examples

“animal natural_object”

alligator broadtail chamois ermine lapin leopard muskrat ...

“natural_object plant”

algarroba almond anise baneberry butternut candlenut cardamon ...

“action artifact group_social”

artillery assembly band church concourse dance gathering institution ...

“action attribute event psychological”

appearance concentration decision deviation difference impulse outrage …

“possession quantity_definite”

cent centime dividend gross penny real shilling

Representation and Interpretation
  • “Dotted Types” (Pustejovsky)
    • Lexical types are either simple (human, artifact, ...) or complex (information AND physical_object)
    • Can be represented with a “dotted type”, e.g.

information • physical_object

    • In (Cooper 2005) interpreted as a record type (a delicious lunch can take forever):
Related Work
  • Apresjan 1973
    • Regular Polysemy.
  • Nunberg & Zaenen 1992
    • Systematic polysemy in lexicology and lexicography.
  • Bill Dolan 1994
    • Word Sense Ambiguation: Clustering Related Senses.
  • Copestake & Briscoe 1996
    • Semi-productive polysemy and sense extension.
  • Peters, Peters & Vossen 1998
    • Automatic Sense Clustering in EuroWordNet.
  • Tomuro 1998
    • Semi-Automatic Induction of Systematic Polysemy from WordNet.
Words and Meanings
  • Lexical Semantics in WordNet
  • Generative Lexicon and CoreLex
  • Tuning WordNet to a Domain

Reducing Ambiguity
  • WordNet has too many senses …
  • Reduce Ambiguity
    • Cluster related senses (CoreLex)
    • Tune WordNet to an application domain
Domains and Senses

Domains determine Sense Selection, e.g.

  • English: cell
    • prison cell in the Politics/Law domain
    • living cell in the Biomedical domain
  • English: tissue
    • living tissue in the Biomedical domain
    • cloth in the Fashion domain
  • German: Probe
    • test in the Biomedical domain
    • rehearsal in the Theater domain

>> Compute Domain-Specific Sense

Approaches
  • Subject Codes
    • Domain codes are in the dictionary
  • Topic Signatures
    • Compute (domain-specific) context models from dictionary definitions, domain corpora, web resources
  • Tuning of WordNet to a domain
    • Top Down: Cucchiarelli & Velardi, 1998
    • Bottom Up: Buitelaar & Sacaleanu, 2001
    • Related recent work: McCarthy et al, 2004; Chan & Ng, 2005; Mohammad & Hirst, 2006
Subject Codes
  • Subject Codes (as used in LDOCE) indicate a domain in which a word is used in a particular sense
  • Examples (2600 codes)
    • Sub-Field Codes
      • MDZP (Medicine:Physiology)
    • Code Combinations
      • MLCO (Meteorology+Building) e.g. lightning conductor
      • MLUF (Meteorology+Europe+France) e.g. Mistral
Adding Subject Codes to WordNet
  • Grouping Synsets together across POS

MEDICINE
  Nouns: doctor#1, hospital#1
  Verbs: operate#7

  • Grouping Synsets together across Sub-Hierarchies

SPORT
  life_form#1: athlete#1
  physical_object#1: game_equipment#1
  act#2: sport#1
  location#1: playing_field#1

Magnini B. & Cavaglià G. Integrating Subject Field Codes into WordNet. In: Proceedings of LREC 2000.

WordNet DOMAINS

Bernardo Magnini, Carlo Strapparava, Giovanni Pezzuli, and Alfio Gliozzo. Using domain information for word sense disambiguation. In: Proceedings of the SENSEVAL2 workshop 2001.

WSD with Subject Codes
  • Match between set of words in the context of the ambiguous word and the set of words (“neighborhoods”) in the definitions + sample sentences of all senses that share a Subject Code

bank: Economics

bank: Medicine and Biology

Guthrie J. A. & Guthrie I. & Wilks Y. & Aidinejad H. Subject Dependent Co-Occurrence and Word Sense Disambiguation In: Proceedings of ACL 1991.
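The matching step can be sketched as a set overlap. This is a deliberately simplified illustration and not the procedure of the cited paper: the neighborhood word lists are invented (the slide's actual lists did not survive in this transcript), raw overlap counts stand in for any co-occurrence weighting, and `pick_subject_code` is a hypothetical name.

```python
# Hypothetical neighborhoods: words taken to occur in the definitions and
# sample sentences of all senses that share one subject code.
NEIGHBORHOODS = {
    "Economics": {"money", "account", "loan", "deposit", "interest"},
    "Medicine and Biology": {"blood", "organ", "store", "donor", "tissue"},
}


def pick_subject_code(context_words, neighborhoods=NEIGHBORHOODS):
    """Return the subject code whose neighborhood shares most words with the context."""
    context = {word.lower() for word in context_words}
    return max(neighborhoods, key=lambda code: len(context & neighborhoods[code]))


print(pick_subject_code("he opened an account to deposit money".split()))
print(pick_subject_code("the blood bank needs a donor".split()))
```

With ties, `max` keeps the first code in the table, so a real system would need an explicit tie-breaking rule.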

Topic Signatures from the Web
  • Construct Topic Signatures for WordNet synsets/senses
    • Retrieve document collections from the web and use queries constructed for each WordNet sense, e.g.

(boy AND (altar boy OR ball boy OR … OR male person)
     AND NOT (man OR … OR broth of a boy OR son OR … OR mama’s boy OR black))

Agirre E. & Ansa O. & Hovy E. & Martinez D. Enriching very large ontologies using the WWW In: Proc. of the Ontology Learning Workshop ECAI 2000
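Query construction of this shape can be sketched as string assembly. The function below is an assumption-laden illustration: `sense_query` is a hypothetical helper, the quoting of multi-word cues is added here for readability, and only cue words visible on the slide are used (the elided ones are left out rather than guessed).

```python
def sense_query(word, own_cues, other_cues):
    """Build a boolean query: the word plus any cue of the target sense,
    excluding cues that belong to the word's other senses."""

    def quote(term):
        # Quote multi-word cues so they read as phrases (an added convention).
        return f'"{term}"' if " " in term else term

    include = " OR ".join(quote(term) for term in own_cues)
    exclude = " OR ".join(quote(term) for term in other_cues)
    return f"({word} AND ({include}) AND NOT ({exclude}))"


print(sense_query(
    "boy",
    ["altar boy", "ball boy", "male person"],
    ["man", "broth of a boy", "son", "mama's boy", "black"],
))
```

The documents retrieved for each sense would then serve as the collection from which that sense's topic signature is computed.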

Top Down Tuning – Cucchiarelli & Velardi
  • Automatically find the best set of (WordNet) senses that:
    • “… represent at best the semantics of the domain”
    • “[has the] … ‘right’ level of abstraction, so as to mediate between over-ambiguity and generality”
    • “… [is] balanced …, i.e. words should be evenly distributed among categories”

Alessandro Cucchiarelli, Paola Velardi Finding a domain-appropriate sense inventory for semantically tagging a corpus. Natural Language Engineering 4/4, p.325-344, Dec. 1998.

Methods Used
  • Create alternative sets of balanced categories by use of an adapted version of the Hearst/Schütze algorithm
  • Apply a scoring function to find the best set, with parameters:
    • Generality
      • Highest possible level of generalization with a small number of categories is preferred
    • Discrimination Power
      • Different senses lead to different categories
    • (Domain) Coverage
      • Words in the domain corpus that are represented by the selected categories
    • Average Ambiguity
      • Ambiguity reduction is measured by the inverse of the average ambiguity of all words
Balanced Categories - Hearst/Schütze
  • Reduce WordNet noun hierarchy to a set of 726 disjoint categories, each consisting of a relatively large number of synsets and of an average size, with as small a variance as possible
  • Group categories together into a set of 106 super-categories according to mutual co-occurrence in a training corpus
  • Measure the frequency of categories on domain corpora

[Figures: category frequencies for two example texts, the United States Constitution and Genesis]

Hearst M. & Schütze H. Customizing a Lexicon to Better Suit a Computational Task In: Proceedings ACL SIGLEX Workshop 1993

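Generality

Generality of Category Set Ci: 1/DM(Ci)

DM(Ci) is the average distance between the categories of Ci and the topmost synsets.

Example with Ci = {Ci1, Ci2}, where the per-category average distances are (4 + 3) / 2 = 3.5 and 3 / 1 = 3:

DM(Ci) = (3.5 + 3) / 2 = 3.25

[Figure: the categories Ci1 and Ci2 (general synsets) and their paths to the topmost synsets]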
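The worked numbers can be checked with a few lines of code. This is a sketch under one assumption: the path lengths 4 and 3 are assigned to Ci1 and the single path of length 3 to Ci2, which is how the figure residue reads but is not stated explicitly. The function names are illustrative.

```python
def mean_distance(category_distances):
    """DM(Ci): average over categories of each category's mean distance
    to the topmost synsets."""
    per_category = [sum(ds) / len(ds) for ds in category_distances.values()]
    return sum(per_category) / len(per_category)


def generality(category_distances):
    """Generality of a category set: 1 / DM(Ci)."""
    return 1.0 / mean_distance(category_distances)


# Path lengths from the slide; the assignment to Ci1 and Ci2 is assumed.
example = {"Ci1": [4, 3], "Ci2": [3]}
print(mean_distance(example))  # 3.25
print(generality(example))
```

A category set that sits closer to the top of the hierarchy has a smaller DM and therefore a higher generality score.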

Discrimination Power

Discrimination Power of Category Set Ci:

(Nc(Ci) - Npc(Ci))/ Nc(Ci)

where Nc(Ci) is the number of words that reach at least one category of Ci and Npc(Ci) is the number of words that have at least two senses that reach the same category cij of Ci

Ci = {Ci1, Ci2, Ci3, Ci4}

[Figure: domain words w1, w2 and w3, their senses, and the general synsets (categories Ci1 to Ci4) that these senses reach]

Coverage & Average Ambiguity

Coverage of Category Set Ci: Nc(Ci)/W

where Nc(Ci) is the number of words that reach at least one category in Ci

Inverse of Average Ambiguity of Category Set Ci: 1/A(Ci)

where Nc(Ci) is the number of words that reach at least one category in Ci and, for each word w in this set, Cwj(Ci) is the number of categories in Ci that the word reaches
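Discrimination power, coverage and the inverse of average ambiguity can be computed together from one table of words and the categories their senses reach. The sketch below rests on assumptions that the transcript does not confirm: W is taken to be the number of words in the domain corpus, and A(Ci) is taken to be the mean number of distinct categories reached per covered word (the slide's formula for A did not survive). The toy data and all names are invented.

```python
from collections import Counter


def metrics(word_senses, total_words):
    """word_senses: word -> list holding, per sense, the category reached or None.
    total_words: W, assumed to be the number of words in the domain corpus."""
    covered = {}
    for word, senses in word_senses.items():
        cats = [c for c in senses if c is not None]
        if cats:
            covered[word] = cats
    nc = len(covered)  # Nc(Ci): words reaching at least one category
    # Npc(Ci): words with at least two senses reaching the same category
    npc = sum(1 for cats in covered.values() if max(Counter(cats).values()) >= 2)
    # A(Ci), assumed: mean number of distinct categories per covered word
    avg_ambiguity = sum(len(set(cats)) for cats in covered.values()) / nc
    return {
        "discrimination": (nc - npc) / nc,
        "coverage": nc / total_words,
        "inverse_ambiguity": 1 / avg_ambiguity,
    }


toy = {
    "w1": ["Ci1", "Ci2"],  # two senses, two different categories
    "w2": ["Ci3", "Ci3"],  # two senses collapse into one category
    "w3": ["Ci4", None],   # one sense reaches no category
    "w4": [None],          # not covered at all
}
print(metrics(toy, total_words=4))
```

In the toy data only w2 has two senses that land in the same category, so it is the one word that lowers discrimination power.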

Best Category Set (WSJ)

Top Down categories for the financial domain, based on the Wall Street Journal

Sense Selection with WSJ Set

Senses for stock - kept by domain tuning on the Wall Street Journal

Senses for stock - discarded by domain tuning on the Wall Street Journal

Bottom Up Tuning – Buitelaar & Sacaleanu
  • Ranking of WordNet synsets according to a domain-specific corpus
    • Compute term relevance against reference corpus
    • Compute synset relevance according to term relevance (where term = synonym in synset)
    • Ranking can be used in WSD (similar to usage of ‘most frequent heuristic’)

Paul Buitelaar, Bogdan Sacaleanu Ranking and Selecting Synsets by Domain Relevance In: Proceedings of WordNet and Other Lexical Resources: Applications, Extensions and Customizations, NAACL 2001 Workshop, June 3/4 2001

TFIDF

The word is more important if it appears several times in a target document

The word is more important if it appears in fewer documents

  • tf(w): term frequency (number of word occurrences in a document)
  • df(w): document frequency (number of documents containing the word)
  • N: number of all documents
  • tfIdf(w): relative importance of the word in the document
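The slide's formula is missing from this transcript, so the sketch below assumes the common definition tfIdf(w) = tf(w) · log(N / df(w)). The three-document corpus is invented for illustration.

```python
import math
from collections import Counter


def tfidf(word, document, corpus):
    """tf(w) * log(N / df(w)) for a tokenized document in a tokenized corpus."""
    tf = Counter(document)[word]
    df = sum(1 for doc in corpus if word in doc)
    if tf == 0 or df == 0:
        return 0.0
    return tf * math.log(len(corpus) / df)


# Invented toy corpus of three tokenized documents.
corpus = [
    ["cell", "membrane", "cell", "protein"],
    ["prison", "cell", "guard"],
    ["protein", "membrane", "lipid"],
]

print(tfidf("cell", corpus[0], corpus))   # frequent here, but found in 2 of 3 documents
print(tfidf("lipid", corpus[2], corpus))  # rare across the corpus
```

A word that occurs in every document gets a score of zero, since log(N / N) = 0.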

Term and Synset Relevance
  • Term Relevance
    • Relevance Score of Synset Members

where t represents the term, d the domain, N is the total number of domains

  • Synset Relevance
    • Cumulated Relevance Score for a Synset
Extended Synset Relevance
  • Lexical Coverage
    • Take Length of the Synset Into Account

[Gefängniszelle, Zelle] ("prison cell")

[Zelle] ("living cell")

  • Hyponyms
    • Take Hyponyms Into Account

[Zelle,Gefängniszelle,Todeszelle]

[Zelle,Körperzelle,Pflanzenzelle]
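Term relevance, synset relevance and the hyponym extension can be sketched together. The relevance formulas are not preserved in this transcript, so the code assumes a tf-idf-style score over domains (term frequency in the domain times log of N over the number of domains containing the term) and sums it over synset members. The frequency table is invented, the function names are hypothetical, and the lexical-coverage adjustment is not modelled.

```python
import math

# Invented term frequencies per domain corpus.
DOMAIN_TF = {
    "biomedical": {"Zelle": 40, "Körperzelle": 5},
    "law": {"Zelle": 3, "Gefängniszelle": 6},
    "theater": {"Probe": 12},
}

PRISON_CELL = ("Gefängniszelle", "Zelle")
LIVING_CELL = ("Zelle",)

# Hyponym terms as shown on the slide.
HYPONYMS = {
    PRISON_CELL: ["Todeszelle"],
    LIVING_CELL: ["Körperzelle", "Pflanzenzelle"],
}


def term_relevance(term, domain):
    """Assumed score: tf(t, d) * log(N / number of domains containing t)."""
    tf = DOMAIN_TF[domain].get(term, 0)
    if tf == 0:
        return 0.0
    df = sum(1 for freqs in DOMAIN_TF.values() if term in freqs)
    return tf * math.log(len(DOMAIN_TF) / df)


def synset_relevance(synset, domain, use_hyponyms=False):
    """Cumulated relevance of the synset members, optionally with hyponyms."""
    terms = list(synset) + (HYPONYMS.get(synset, []) if use_hyponyms else [])
    return sum(term_relevance(term, domain) for term in terms)


def best_sense(domain):
    return max([PRISON_CELL, LIVING_CELL],
               key=lambda s: synset_relevance(s, domain, use_hyponyms=True))


print(best_sense("biomedical"))
print(best_sense("law"))
```

In the invented biomedical data the two senses of Zelle tie on their synset members alone; the hyponym Körperzelle is what separates them.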

Related Recent Work
  • Diana McCarthy, Rob Koeling, Julie Weeds, and John Carroll
    • Finding predominant senses in untagged text. In Proc. of ACL 2004.
  • Chan, Yee Seng and Ng, Hwee Tou (2005)
    • Word Sense Disambiguation with Distribution Estimation. Proc. of IJCAI 2005.
  • Mohammad, Saif and Hirst, Graeme.
    • Determining word sense dominance using a thesaurus. Proc. of EACL 2006.
Day 2 - Introduction
  • Words and Object Descriptions
    • Semantics on the Semantic Web
      • Semantic Web, Ontologies and Natural Language Processing
    • The Lexical Semantic Web
      • Knowledge Representation as Word Meaning
    • A Lexicon Model for Ontologies
      • Enriching Ontologies with Linguistic Information
Words and Object Descriptions
  • Semantics on the Semantic Web
  • The “Lexical Semantic Web”
  • A Lexicon Model for Ontologies

Formal Interpretation - Knowledge Markup

[Figure: Semantic Web, with the components Knowledge Markup and Ontologies]

Turns the Web into a Knowledge Base

Enables Semantic Web Services …

… and Intelligent Man-Machine Interface

[Figure, final build: Semantic Web Services and Intelligent Man-Machine Interface added to Knowledge Markup and Ontologies]

Resource Description Framework (RDF)

[Figure: RDF graph for the node node1 with three properties]
  • name: DFKI GmbH
  • www: http://www.dfki.de
  • location: Kaiserslautern

RDF : XML-based Representation

<?xml version='1.0' ?>
<rdf:RDF
    xmlns:rdf="… rdf-syntax-ns#"
    xmlns:rdfs="… rdf-schema#"
    xmlns="http://example.org">
  <rdf:Description rdf:nodeID="node1">
    <name>DFKI GmbH</name>
    <location>Kaiserslautern</location>
    <www rdf:resource="http://www.dfki.de" />
  </rdf:Description>
</rdf:RDF>
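To show how such a description maps back to the graph, the following sketch reads the RDF/XML with Python's standard `xml.etree.ElementTree` and lists the triples. Two things are supplied here rather than taken from the slide: the full W3C RDF namespace URI (the slide elides it) and a trailing slash on the example namespace. The helper `triples` is a hypothetical name, and a real application would more likely use a dedicated RDF library.

```python
import xml.etree.ElementTree as ET

# Full RDF namespace URI, supplied here; the slide abbreviates it.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

RDF_XML = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://example.org/">
  <rdf:Description rdf:nodeID="node1">
    <name>DFKI GmbH</name>
    <location>Kaiserslautern</location>
    <www rdf:resource="http://www.dfki.de"/>
  </rdf:Description>
</rdf:RDF>
"""


def triples(rdf_xml):
    """Return (subject, predicate, object) tuples for each rdf:Description."""
    root = ET.fromstring(rdf_xml.strip())
    result = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = "_:" + desc.get(f"{{{RDF_NS}}}nodeID")
        for prop in desc:
            predicate = prop.tag.split("}", 1)[1]  # drop the namespace part
            resource = prop.get(f"{{{RDF_NS}}}resource")
            obj = resource if resource is not None else (prop.text or "").strip()
            result.append((subject, predicate, obj))
    return result


for triple in triples(RDF_XML):
    print(triple)
```

Literal values (name, location) come from element text, while the www property points to another resource through `rdf:resource`.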

RDF Schema (RDFS)

Representation of classes and properties

[Figure: class diagram with the classes Person, Student, Teacher and Course, the relations is-a (twice), enrolledIn and teaches, and the property name with value type rdf:Literal]

Web Ontology Language (OWL)
  • OWL adds further modelling vocabulary on top of RDFS, e.g.
    • Class equivalence
    • Property types (data vs. object property)
  • Based on Description Logics, three versions
    • OWL Lite
    • OWL DL
    • OWL Full
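OWL

Extended knowledge representation

[Figure: class diagram with the classes Person, Student, Teacher and Course, the relations is-a (twice), enrolledIn and teaches, the property name with value type rdf:Literal, and an added disjoint relation]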

XML – RDF – RDFS - OWL

[Figure: layer diagram with the column headings Syntax and Semantics; recoverable labels:]
  • XML
  • XML Schema: Data Types
  • Namespaces: Interpretation Context
  • RDF
  • RDF Schema: Formalization (Class Definition, Properties)
  • OWL: Formalization (extended Class Definition, Properties, Property Types)
Ontologies – What they are
  • Ontology refers to an engineering artifact
    • a specific vocabulary used to describe a certain reality
    • a set of explicit assumptions regarding the intended meaning of the vocabulary
  • An Ontology is
    • an explicit specification of a conceptualization [Gruber 93]
    • a shared understanding of a domain of interest [Uschold/Gruninger 96]
Ontologies – Why you need them
  • Make domain assumptions explicit
    • Easier to exchange domain assumptions
    • Easier to understand and update legacy data
  • Separate domain knowledge from operational knowledge
    • Re-use domain and operational knowledge separately
  • A community reference for applications
  • Shared understanding of what particular information means
Applications of Ontologies
  • NLP
    • Information Extraction, e.g. Buitelaar et al. 06, Mädche, Staab & Neumann 00, Nedellec, Rebholz
    • Information Retrieval (Semantic Search), e.g. WebKB (Martin et al. 00), OntoSeek (Guarino et al. 99), Ontobroker (Decker et al. 99)
    • Question Answering, e.g. Harabagiu, Schlobach & de Rijke, Aqualog (Lopez and Motta 04)
    • Machine Translation, e.g. Nirenburg et al. 04, Beale et al. 95, Hovy, Knight
  • Other
    • Business Process Modeling, e.g. Uschold et al. 98
    • Digital Libraries, e.g. Amann & Fundulaki 99
    • Information Integration, e.g. Kashyap 99; Wiederhold 92
    • Knowledge Management (incl. Semantic Web), e.g. Fensel 01, Staab & Schnurr 00; Sure et al. 00, Abecker et al. 97
    • Software Agents, e.g. Gluschko et al. 99; Smith & Poulter 99
    • User Interfaces, e.g. Kesseler 96
Ontologies and Their Relatives

[Figure: diagram relating ontologies to similar resources; labels as they appear in the transcript:]
  • General logical constraints
  • Formal isa
  • Thesauri
  • Catalogs
  • Glossaries & Terminologies
  • Axioms: Disjoint/Inverse…
  • Semantic Networks
  • Formal Instance

Thesauri – Examples : EuroVoc
  • EuroVoc
    • covers terminology in all of the official EU languages
    • for all fields (27) that concern the EU institutions, e.g. politics, trade, law, science, energy, agriculture

MT   3606 natural and applied sciences
UF   gene pool
     genetic resource
     genetic stock
     genotype
     heredity
BT1  biology
BT2  life sciences
NT1  DNA
NT1  eugenics
RT   genetic engineering (6411)

Thesauri – Examples : MeSH
  • MeSH (Medical Subject Headings)
    • organized by terms (~ 250,000) that correspond to medical subjects
    • for each term syntactic, morphological or semantic variants are given

MeSH Heading: Databases, Genetic
Entry Term: Genetic Databases
Entry Term: Genetic Sequence Databases
Entry Term: OMIM
Entry Term: Online Mendelian Inheritance in Man
Entry Term: Genetic Data Banks
Entry Term: Genetic Data Bases
Entry Term: Genetic Databanks
Entry Term: Genetic Information Databases
See Also: Genetic Screening

Semantic Networks - Examples : UMLS
  • Unified Medical Language System
    • integrates linguistic, terminological and semantic information
    • Semantic Network consists of 134 semantic types and 54 relations between types

Pharmacologic Substance affects Pathologic Function

Pharmacologic Substance causes Pathologic Function

Pharmacologic Substance complicates Pathologic Function

Pharmacologic Substance diagnoses Pathologic Function

Pharmacologic Substance prevents Pathologic Function

Pharmacologic Substance treats Pathologic Function

Semantic Networks - Examples : GO
  • GO (Gene Ontology)
    • Aligns descriptions of gene products in different databases, including plant, animal and microbial genomes
    • Organizing principles are molecular function, biological process and cellular component

Accession: GO:0009292
Ontology: biological process
Synonyms: broad: genetic exchange
Definition: In the absence of a sexual life cycle, the processes involved in the introduction of genetic information to create a genetically different individual.
Term Lineage:
  all : all (164142)
    GO:0008150 : biological process (115947)
      GO:0007275 : development (11892)
        GO:0009292 : genetic transfer (69)

Ontologies – Example II

[Figure: example ontology (further labels: F-Logic, Ontology, similar). Design: Philipp Cimiano]
  • Classes: Geographical Entity (GE), Inhabited GE, Natural GE, city, country, mountain, river
  • Instances: Zugspitze, Neckar, Germany, Stuttgart, Berlin
  • Relations: is-a, instance_of, located_in, flow_through, capital_of
  • Attributes: height (m) with value 2962, length (km) with value 367

Ontologies for NLP
  • Information Retrieval
    • Query Expansion
  • Machine Translation
    • Interlingua
  • Information Extraction
    • Template Definition
    • Semantic Integration
  • Question Answering
    • Question Analysis
    • Answer Selection
Information Extraction
  • Class-based Template Definition
    • Allows for Reasoning over Extracted Templates with Respect to the Ontology (see e.g. [Nedellec and Nazarenko 2005] for discussion)
  • Semantic Integration
    • Extraction from Heterogeneous Sources (Text, Tables and other Semi-Structured Data, Image Captions) – SmartWeb [Buitelaar et al. 06]
    • Multi-Document Information Extraction – ArtEquAKT [Alani et al. 2003]
Question Answering
  • Question Analysis
    • Ontology/WordNet-based Semantic Question Interpretation (e.g. [Pasca and Harabagiu 01])
  • Answer Selection
    • Ontology/WordNet-based Reasoning for Answer Type-Checking
      • Ontology of Events [Sinha and Narayanan 05]
      • Geographical Ontology, WordNet [Schlobach & de Rijke 04]
      • WordNet [Pasca and Harabagiu 01]
  • Ontology-based Question Answering
    • Derive Answers from a Knowledge Base (e.g. Aqualog [Lopez & Motta 04])
Ontology Life Cycle
  • Populate: Knowledge Base Generation
  • Validate: Consistency Checks
  • Create/Select: Development and/or Selection
  • Evolve: Extension, Modification
  • Deploy: Knowledge Retrieval
  • Maintain: Usability Tests

NLP in the Ontology Life Cycle
  • Ontology Population: Information Extraction
  • KB Retrieval: Question Answering
  • Ontology Learning: Text Mining

Ontology Learning

[Figure: layered diagram; layers as listed in the transcript. Design: Philipp Cimiano]
  • General Axioms
  • Axiom Schemata
  • Relation Hierarchy
  • Relations
  • Concept Hierarchy
  • Concept Formation
  • (Multilingual) Synonyms
  • Terms

Words and Object Descriptions
  • Semantics on the Semantic Web
  • The “Lexical Semantic Web”
  • A Lexicon Model for Ontologies

Dictionary: Words and Senses
  • Represent interpretations of words through senses, very much like classes that are assigned to a word, e.g.

article

1. An individual thing or element of a class…

2. A particular section or item of a series in a written document…

3. A non-fictional literary composition that forms an independent part of a publication…

4. The part of speech used to indicate nouns and to specify their application

5. A particular part or subject; a specific matter or point

(as provided by http://dictionary.reference.com/)

Ontology: Classes and Labels - I
  • Ontologies assign labels (i.e. words) to a given class
  • In the COMMA ontology on document management the class article corresponds to sense 2 (‘section of a written document’):

http://pauillac.inria.fr/cdrom/ftp/ocomma/comma.rdfs

Ontology Classes and Labels - II
  • In the GOLD ontology on linguistics, the class label article corresponds to sense 4 (‘part of speech’):

http://emeld.org/gold

The Meaning of Director - I

The Semantic Web can be viewed as a large, distributed dictionary (or rather a semantic lexicon) in which we can look up the meaning of words, e.g. director

… as a ‘role’ (AgentCities ontology)

http://www-agentcities.doc.ic.ac.uk/ontology/shows.daml

The Meaning of Director - II

… as ‘head of a program’ (University Benchmark ontology)

http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl

Exploring the Lexical Semantic Web
  • Collect ontologies
    • OntoSelect
  • Analyse the use of class/property labels
  • Treat class/property labels as lexical entries
    • Normalize
    • Organize by language
Ontology Collection
  • OntoSelect
    • Web Monitor on DAML, RDFS, OWL Files
    • Download, Analyze and Store Included Information and Metadata
      • Class and Property Labels
      • Multilingual Information
      • Included Ontologies
    • Ontology Ranking and Selection Functionalities

http://olp.dfki.de/OntoSelect

Words and Object Descriptions
  • Semantics on the Semantic Web
  • The “Lexical Semantic Web”
  • A Lexicon Model for Ontologies

Ontologies – Example III (continued)

[Figure: classes Student, Staff, University, Campus and “Fakultät”; relations studies_at, works_at, located_at and is_part_of]
Ontologies – Example III (continued)

[Figure: classes Student, Staff, University, Campus and “Fakultät”; relations studies_at, works_at, located_at and is_part_of; language-specific term relations added:]
  • has_German_term: Fakultät
  • has_Dutch_term: Faculteit
  • has_US_English_term: School

Ontologies – Example III (continued)

[Figure: University, “Fakultät” and the relation is_part_of; a has_term relation to the class Term; term instances (instance_of), each with a language:]
  • Fakultät: language DE
  • faculteit: language NL
  • school: language EN-US
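The step from language-specific relations to a single has_term relation with first-class Term objects can be sketched as follows. This is an illustration only, not the LingInfo model: the `Term` dataclass, the `HAS_TERM` table and both lookup functions are hypothetical names, and the data is limited to the three terms shown on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    text: str
    language: str


# Class -> terms, following the slide; "Fakultät" is the class label used there.
HAS_TERM = {
    "Fakultät": [
        Term("Fakultät", "DE"),
        Term("faculteit", "NL"),
        Term("school", "EN-US"),
    ],
}


def terms_for(cls, language):
    """Terms that express an ontology class in the given language."""
    return [t.text for t in HAS_TERM.get(cls, []) if t.language == language]


def classes_for(text, language):
    """Reverse lookup: classes that a word in a given language can denote."""
    return [
        cls for cls, terms in HAS_TERM.items()
        if any(t.text == text and t.language == language for t in terms)
    ]


print(terms_for("Fakultät", "NL"))
print(classes_for("school", "EN-US"))
```

Because each term is an object of its own, adding a language means adding an instance rather than a new relation such as has_Dutch_term.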

Semiotic Triangle
  • Ogden & Richards, 1923
  • based on Structural Linguistics studies (de Saussure, 1916)
  • adopted in Knowledge Representation (e.g. Sowa, 1984)
LingInfo Model – Simplified

Design: Michael Sintek

LingInfo Instances - Example

Fußballspielers

“of the football player”

Conclusions
  • WordNet: Appropriate Use may include
    • Introduction of underspecified senses (sense grouping)
    • Tuning to a domain
  • The “Lexical Semantic Web”
    • The Semantic Web (and Web 2.0) is a potentially rich resource for (formal) lexical semantics
    • Mining such resources for lexical semantics (i.e. compilation of a distributed semantic lexicon) has only just started
    • Ontologies to be extended with linguistic/lexical information