Lexical Semantics and Ontologies Tutorial at the ACL/HCSnet 2006 Advanced Program in Natural Language Processing


Paul Buitelaar

Language Technology Lab &

Competence Center Semantic Web

DFKI GmbH

Saarbrücken, Germany



Overview

  • Day 1: Words and Meanings

    • Human language as a system

    • How do words relate to each other

  • Day 2: Words and Object Descriptions

    • Human language as a means of representation

    • How do words represent objects in the/a world



Day 1 - Introduction

  • Words and Meanings

    • Synsets and Senses

      • Lexical Semantics in WordNet

    • Related Senses

      • Generative Lexicon and CoreLex

    • Domains and Senses

      • Tuning WordNet to a Domain


Words and Meanings

  • Lexical Semantics in WordNet

  • Generative Lexicon and CoreLex

  • Tuning WordNet to a Domain



WordNet

  • Lexical Semantic Resource

    • Semantic Lexicon

      • Maps words to meanings (senses)

    • Lexical Database

      • Machine readable (has a formal structure)

  • Freely available

    • http://wordnet.princeton.edu/



WordNet - Origins

In 1985 a group of psychologists and linguists at Princeton University undertook to develop a lexical database …

The initial idea was to provide an aid to use in searching dictionaries conceptually, rather than merely alphabetically …

WordNet … instantiates hypotheses based on results of psycholinguistic research …

… expose such hypotheses to the full range of the common vocabulary

In anomic aphasia, there is a specific inability to name objects. When confronted with an apple, say, patients may be unable to utter “apple,” even though they will reject such suggestions as shoe or banana, and will recognize that apple is correct when it is provided. (Caramazza/Berndt 1978)

Miller, George A., Richard Beckwith, Christiane Fellbaum, Derek Gross and Katherine J. Miller. “Introduction to WordNet: an on-line lexical database.” In: International Journal of Lexicography 3 (4), 1990, pp. 235–244.



Synsets

  • WordNet is organized around word meaning (not word forms as with traditional lexicons)

    • Word meaning is represented by “synsets”

    • Synset is a “Set of Synonyms”

  • Example

    • {board, plank}

      • Piece of lumber

    • {board, committee}

      • Group of people
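The synset idea can be illustrated with a minimal Python sketch. It uses only the two synsets from the example above as a toy inventory; the names `SYNSETS`, `senses` and `synonyms` are illustrative and not part of any WordNet API.

```python
# Toy sense inventory built from the two example synsets (not real WordNet data).
SYNSETS = {
    frozenset({"board", "plank"}): "piece of lumber",
    frozenset({"board", "committee"}): "group of people",
}

def senses(word):
    """Return the glosses of all synsets that contain the word."""
    return sorted(gloss for synset, gloss in SYNSETS.items() if word in synset)

def synonyms(word):
    """Return every other word that shares at least one synset with the word."""
    shared = {w for synset in SYNSETS if word in synset for w in synset}
    return sorted(shared - {word})

print(senses("board"))    # one gloss per synset that contains "board"
print(synonyms("board"))  # words that co-occur with "board" in a synset
```

A word that occurs in several synsets, such as board, gets one sense per synset.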



Synset Hierarchy

  • Synsets are organized in hierarchies

    • Defines:

      • generalization (hypernymy)

      • specialization (hyponymy)

  • Example

    {entity}

    {whole, unit}

    {building material}

    {lumber, timber}

    {board, plank}

    Moving up this chain follows hypernymy; moving down follows hyponymy.
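As a rough sketch, the example chain can be stored as child-to-parent links and walked in either direction. The dictionary below copies only the five synsets from the example, so it is a simplification rather than the full WordNet hierarchy; the function names are hypothetical.

```python
# Hypernym links copied from the example chain (child -> parent).
HYPERNYM = {
    "board, plank": "lumber, timber",
    "lumber, timber": "building material",
    "building material": "whole, unit",
    "whole, unit": "entity",
}

def hypernym_path(synset):
    """Follow hypernym links upward until no more general synset is found."""
    path = [synset]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return path

def hyponyms(synset):
    """Return the direct hyponyms, i.e. the inverse of the hypernym link."""
    return sorted(child for child, parent in HYPERNYM.items() if parent == synset)

print(" -> ".join(hypernym_path("board, plank")))
print(hyponyms("lumber, timber"))
```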



Hierarchies (WordNet 1.7)



Hierarchy Example (WordNet 2.1)



Synsets and Senses

  • Synsets represent word meaning

    • Words that occur in several synsets have a corresponding number of meanings (senses)

  • Example



WordNet 2.1



(Other) WordNet Relations

  • Synonymy

    • Similar in meaning

  • Hypernymy/Hyponymy

    • Generalization and Specialization

  • Meronymy

    • Part-of

      • e.g. study, bathroom, ... meronym house

  • Antonymy

    • Opposite in meaning

      • e.g. warm antonym cold


Words and Meanings

  • Lexical Semantics in WordNet

  • Generative Lexicon and CoreLex

  • Tuning WordNet to a Domain



Systematic Polysemy

  • Homonymy

    • bank

      embankment: We walked along the bank of the Charles River.

      institution: Did he have an account at the HBU bank?

  • Systematic Polysemy

    • school

      group (of people): The school went for an outing.

      (learning) process: School starts at 8.30.

      organization: The school was founded in 1910.

      building: The school has a new roof.


Semantic or Pragmatic?

[Figure: two panels, Semantic Analysis and Pragmatic Analysis, each linking the lexical item school (Lexical Items of the Language) to Obj1, Obj2, Obj3 and Obj4 (Objects in the World)]



Underspecified Discourse Referents

  • Anaphora Resolution

    • [A long book heavily weighted with military technicalities]NP:event-physical_object-content, in this edition it is neither so long (event) nor so technical (content) as it was originally.

  • Metonymy

    • The Boston office called

      • office > person

      • person part-of office

  • Bridging

    • Peter bought a car. The engine runs well.

      • engine part-of car

    • The Boston office called. They asked for a new price.

      • office > person



Generative Lexicon Theory

Type Coercion

I began the book

book > event

event ‘has-relation-with’ book

read is-a event

  • multifaceted representation of lexical semantics

    • reflecting systematic / regular / logical polysemy



Generative Lexicon Theory

Qualia Structure (Pustejovsky 1995)

  • Formal: inheritance (is-a / hyponymy)

    • book formal artifact, communication, …

  • Constitutive: modification (part-of / meronymy)

    • book constitutive section, …

  • Telic: purpose (“what is the object used for”)

    • book telic read, …

  • Agentive: causality (“how did the object come about”)

    • book agentive write, …
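One way to picture a qualia structure is as a record with four role slots. The sketch below fills the slots for book with the values listed above; the `Qualia` class and the `candidate_events` helper are hypothetical names, and using the telic and agentive roles to supply event readings for “I began the book” is a simplified reading of type coercion, not a full implementation of the theory.

```python
from dataclasses import dataclass, field

@dataclass
class Qualia:
    """Four qualia roles; each slot holds a list of related concepts."""
    formal: list = field(default_factory=list)
    constitutive: list = field(default_factory=list)
    telic: list = field(default_factory=list)
    agentive: list = field(default_factory=list)

# Values for "book" as given in the qualia structure example.
book = Qualia(
    formal=["artifact", "communication"],
    constitutive=["section"],
    telic=["read"],
    agentive=["write"],
)

def candidate_events(noun):
    """Simplified coercion: events associated with the noun (purpose, origin)."""
    return noun.telic + noun.agentive

# "I began the book" -> began reading / began writing the book
print(candidate_events(book))
```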



CoreLex (Buitelaar 1998)

  • Automatic Qualia Structure Acquisition

    • CoreLex is an attempt to automatically acquire underspecified lexical semantic representations that reflect systematic polysemy

    • These representations can be viewed as shallow Qualia Structures

  • Sense Distribution in WordNet

    • Systematic polysemy can be empirically studied in WordNet by observing sense distributions

      >> If more than two words share the same sense distribution (i.e. have the same set of senses), then this may indicate a pattern of systematic polysemy (adapted from Apresjan 1973)
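The sense-distribution heuristic can be sketched as a simple grouping step. The input maps each noun to the set of basic types of its senses; the class memberships follow examples given in this tutorial, except for stone, which is an assumed monosemous filler. The function name and the `min_words` parameter are illustrative.

```python
from collections import defaultdict

# Noun -> basic types of its senses (toy data; "stone" is an assumption).
BASIC_TYPES = {
    "book": {"artifact", "communication"},
    "novel": {"artifact", "communication"},
    "pamphlet": {"artifact", "communication"},
    "leopard": {"animal", "natural_object"},
    "ermine": {"animal", "natural_object"},
    "muskrat": {"animal", "natural_object"},
    "cent": {"possession", "quantity_definite"},
    "penny": {"possession", "quantity_definite"},
    "stone": {"natural_object"},
}

def systematic_classes(basic_types, min_words=3):
    """Group nouns by identical sense distribution; keep groups shared by
    more than two words, following the heuristic adapted from Apresjan."""
    groups = defaultdict(list)
    for word, types in basic_types.items():
        if len(types) > 1:  # a single basic type is not polysemous
            groups[" ".join(sorted(types))].append(word)
    return {cls: sorted(ws) for cls, ws in groups.items() if len(ws) >= min_words}

for cls, words in systematic_classes(BASIC_TYPES).items():
    print(cls, "->", words)
```

With this toy input, the two-word group for cent and penny falls below the threshold and is dropped, which shows why the size of the word list matters for what counts as a pattern.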



Systematic Polysemous Classes

book

  1. {publication} => artifact

  2. {product, production} => artifact

  3. {fact} => communication

  4. {dramatic_composition, dramatic_work} => communication

  5. {record} => communication

  6. {section, subdivision} => communication

  7. {journal} => artifact

Systematic Polysemous Class

“artifact communication”

amulet annals armband arrow article ballad bauble beacon bible birdcall blank blinker boilerplate book bunk cachet canto catalog catalogue chart chevron clout compact compendium convertible copperplate copy cordon corker ... guillotine homophony horoscope indicator journal laurels lay ledger loophole marker memorial nonsense novel obbligato obelisk obligato overture pamphlet pastoral paternoster pedal pennant phrase platform portrait prescription print puzzle radiogram rasp recap riddle rondeau … statement stave stripe talisman taw text tocsin token transcription trophy trumpery wand well whistle wire wrapper yardstick


From WordNet to CoreLex

[Figure: nouns (Noun 1 … Noun n) are mapped to basic types, and combinations of basic types form systematic polysemous classes (Class 1 … Class n)]



Other Examples

“animal natural_object”

alligator broadtail chamois ermine lapin leopard muskrat ...

“natural_object plant”

algarroba almond anise baneberry butternut candlenut cardamon ...

“action artifact group_social”

artillery assembly band church concourse dance gathering institution ...

“action attribute event psychological”

appearance concentration decision deviation difference impulse outrage …

“possession quantity_definite”

cent centime dividend gross penny real shilling



CoreLex vs. WordNet



Representation and Interpretation

  • “Dotted Types” (Pustejovsky)

    • Lexical types are either simple (human, artifact, ...) or complex (information AND physical_object)

    • Can be represented with a “dotted type”, e.g.

      information · physical_object

    • In (Cooper 2005) interpreted as a record type (a delicious lunch can take forever):



Related Work

  • Apresjan 1973

    • Regular Polysemy.

  • Nunberg & Zaenen 1992

    • Systematic polysemy in lexicology and lexicography.

  • Bill Dolan 1994

    • Word Sense Ambiguation: Clustering Related Senses.

  • Copestake & Briscoe 1996

    • Semi-productive polysemy and sense extension.

  • Peters, Peters & Vossen 1998

    • Automatic Sense Clustering in EuroWordNet.

  • Tomuro 1998

    • Semi-Automatic Induction of Systematic Polysemy from WordNet.


Words and Meanings

  • Lexical Semantics in WordNet

  • Generative Lexicon and CoreLex

  • Tuning WordNet to a Domain



Reducing Ambiguity

  • WordNet has too many senses …

  • Reduce Ambiguity

    • Cluster related senses (CoreLex)

    • Tune WordNet to an application domain



Domains and Senses

Domains determine Sense Selection, e.g.

  • English: cell

    • prison cell in the Politics/Law domain

    • living cell in the Biomedical domain

  • English: tissue

    • living tissue in the Biomedical domain

    • cloth in the Fashion domain

  • German: Probe

    • test in the Biomedical domain

    • rehearsal in the Theater domain

      >> Compute Domain-Specific Sense



Approaches

  • Subject Codes

    • Domain codes are in the dictionary

  • Topic Signatures

    • Compute (domain-specific) context models from dictionary definitions, domain corpora, web resources

  • Tuning of WordNet to a domain

    • Top Down: Cucchiarelli & Velardi, 1998

    • Bottom Up: Buitelaar & Sacaleanu, 2001

    • Related recent work: McCarthy et al, 2004; Chan & Ng, 2005; Mohammad & Hirst, 2006



Subject Codes

  • Subject Codes (as used in LDOCE) indicate a domain in which a word is used in a particular sense

  • Examples (2600 codes)

    • Sub-Field Codes

      • MDZP (Medicine:Physiology)

    • Code Combinations

      • MLCO (Meteorology+Building) e.g. lightning conductor

      • MLUF (Meteorology+Europe+France) e.g. Mistral



Adding Subject Codes to WordNet

  • Grouping Synsets together across POS

    MEDICINE

      Nouns: doctor#1, hospital#1

      Verbs: operate#7

  • Grouping Synsets together across Sub-Hierarchies

    SPORT

      life_form#1: athlete#1

      physical_object#1: game_equipment#1

      act#2: sport#1

      location#1: playing_field#1

Magnini B. & Cavaglià G. Integrating Subject Field Codes into WordNet. In: Proceedings LREC 2000



WordNet DOMAINS

Bernardo Magnini, Carlo Strapparava, Giovanni Pezzuli, and Alfio Gliozzo. Using domain information for word sense disambiguation. In: Proceedings of the SENSEVAL2 workshop 2001.



WSD with Subject Codes

  • Match between set of words in the context of the ambiguous word and the set of words (“neighborhoods”) in the definitions + sample sentences of all senses that share a Subject Code

bank: Economics

bank: Medicine and Biology

Guthrie J. A. & Guthrie I. & Wilks Y. & Aidinejad H. Subject Dependent Co-Occurrence and Word Sense Disambiguation In: Proceedings of ACL 1991.
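The matching step can be sketched as a word-overlap count between the context of the ambiguous word and the neighborhood of each subject code. The neighborhoods below are invented for illustration and are not taken from LDOCE; `best_subject_code` is a hypothetical helper, not the authors' implementation.

```python
# Hypothetical neighborhoods for two subject codes of "bank" (invented words).
NEIGHBORHOODS = {
    "Economics": {"money", "account", "loan", "deposit", "interest"},
    "Medicine and Biology": {"blood", "organ", "donor", "tissue", "storage"},
}

def best_subject_code(context_words, neighborhoods):
    """Pick the subject code whose neighborhood overlaps most with the context."""
    context = {w.lower() for w in context_words}
    overlaps = {code: len(context & words) for code, words in neighborhoods.items()}
    return max(overlaps, key=overlaps.get), overlaps

code, overlaps = best_subject_code(
    "he opened an account at the bank to deposit money".split(), NEIGHBORHOODS
)
print(code, overlaps)
```

Ties and empty overlaps are not handled specially here; in this sketch `max` simply returns the first code with the highest count.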



Topic Signatures from the Web

  • Construct Topic Signatures for WordNet synsets/senses

    • Retrieve document collections from the web and use queries constructed for each WordNet sense, e.g.

( boy AND ( altar boy OR ball boy OR … OR male person )

  AND NOT ( man OR … OR broth of a boy OR son OR … OR mama’s boy OR black ) )

Agirre E. & Ansa O. & Hovy E. & Martinez D. Enriching very large ontologies using the WWW In: Proc. of the Ontology Learning Workshop ECAI 2000



Top Down Tuning – Cucchiarelli & Velardi

  • Automatically find the best set of (WordNet) senses that:

    • “… represent at best the semantics of the domain”

    • “[has the] … ‘right’ level of abstraction, so as to mediate between over-ambiguity and generality”

    • “… [is] balanced …, i.e. words should be evenly distributed among categories”

Alessandro Cucchiarelli, Paola Velardi Finding a domain-appropriate sense inventory for semantically tagging a corpus. Natural Language Engineering 4/4, p.325-344, Dec. 1998.



Methods Used

  • Create alternative sets of balanced categories by use of an adapted version of the Hearst/Schütze algorithm

  • Apply a scoring function to find the best set, with parameters:

    • Generality

      • Highest possible level of generalization with a small number of categories is preferred

    • Discrimination Power

      • Different senses lead to different categories

    • (Domain) Coverage

      • Words in the domain corpus that are represented by the selected categories

    • Average Ambiguity

      • Ambiguity reduction is measured by the inverse of the average ambiguity of all words



Balanced Categories - Hearst/Schütze

  • Reduce WordNet noun hierarchy to a set of 726 disjoint categories, each consisting of a relatively large number of synsets and of an average size, with as small a variance as possible

  • Group categories together into a set of 106 super-categories according to mutual co-occurrence in a training corpus

  • Measure the frequency of categories on domain corpora

Example texts: United States Constitution, Genesis

Hearst M. & Schütze H. Customizing a Lexicon to Better Suit a Computational Task In: Proceedings ACL SIGLEX Workshop 1993


Generality

Generality of Category Set Ci: 1 / DM(Ci)

DM(Ci) is the average distance between the categories of Ci and the topmost synsets.

Example: Ci = {Ci1, Ci2}

  • Ci1: (4 + 3) / 2 = 3.5

  • Ci2: 3 / 1 = 3

  • DM(Ci) = (3.5 + 3) / 2 = 3.25

[Figure: the categories Ci1 and Ci2 shown as general synsets below the topmost synsets]
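The worked example can be reproduced in a few lines. The sketch assumes that the figures 4 and 3 are the distances from Ci1 to two topmost synsets and that 3 is the single distance from Ci2; that reading of the figure, like the function name, is an assumption.

```python
from statistics import mean

def mean_distance(category_set):
    """DM(Ci): average over categories of their mean distance to the top."""
    return mean([mean(distances) for distances in category_set.values()])

# Distances read off the example (assumed interpretation of the figure).
C_i = {"Ci1": [4, 3], "Ci2": [3]}

dm = mean_distance(C_i)
generality = 1 / dm
print(dm, generality)
```

A category set that sits closer to the top of the hierarchy has a smaller DM and therefore a higher generality score.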



Discrimination Power

Discrimination Power of Category Set Ci:

(Nc(Ci) - Npc(Ci))/ Nc(Ci)

where Nc(Ci) is the number of words that reach at least one category of Ci and Npc(Ci) is the number of words that have at least two senses that reach the same category cij of Ci

Ci = {Ci1, Ci2, Ci3, Ci4}

[Figure: domain words w1, w2 and w3 linked through their senses to the categories Ci1 to Ci4, which are general synsets]
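A small sketch of the discrimination power score follows. Each word is mapped to the list of categories reached by its senses, one entry per sense; the three-word data set is invented and does not reproduce the links in the figure.

```python
def discrimination_power(word_senses):
    """(Nc - Npc) / Nc, where word_senses maps a word to the categories
    reached by its senses (one list entry per sense that reaches a category)."""
    covered = [cats for cats in word_senses.values() if cats]
    n_c = len(covered)
    # Npc: words with at least two senses that land in the same category.
    n_pc = sum(1 for cats in covered if len(cats) != len(set(cats)))
    return (n_c - n_pc) / n_c if n_c else 0.0

# Invented example: two senses of w2 collapse into the same category Ci2.
WORD_SENSES = {
    "w1": ["Ci1", "Ci2"],
    "w2": ["Ci2", "Ci2"],
    "w3": ["Ci3", "Ci4"],
}

print(discrimination_power(WORD_SENSES))
```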



Coverage & Average Ambiguity

Coverage of Category Set Ci: Nc(Ci)/W

where Nc(Ci) is the number of words that reach at least one category in Ci

Inverse of Average Ambiguity of Category Set Ci: 1/A(Ci)

where Nc(Ci) is the number of words that reach at least one category in Ci, and for each word w in this set, Cwj(Ci) is the number of categories in Ci reached by w
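Coverage and average ambiguity can be sketched as follows. Two points are assumptions, since the transcript does not spell them out: W is taken to be the total number of words in the domain vocabulary, and A(Ci) is taken to be the mean number of categories reached per covered word. The data set is invented.

```python
def coverage(word_categories, vocabulary_size):
    """Nc(Ci) / W: share of the vocabulary that reaches at least one category."""
    n_c = sum(1 for cats in word_categories.values() if cats)
    return n_c / vocabulary_size

def inverse_average_ambiguity(word_categories):
    """1 / A(Ci), with A assumed to be the mean number of categories
    reached per covered word."""
    reached = [len(cats) for cats in word_categories.values() if cats]
    return len(reached) / sum(reached)

# Invented example: w4 reaches no category of Ci.
WORD_CATEGORIES = {
    "w1": {"Ci1", "Ci2"},
    "w2": {"Ci2"},
    "w3": {"Ci3", "Ci4"},
    "w4": set(),
}

print(coverage(WORD_CATEGORIES, vocabulary_size=4))
print(inverse_average_ambiguity(WORD_CATEGORIES))
```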



Best Category Set (WSJ)

Top Down categories for the financial domain, based on the Wall Street Journal



Sense Selection with WSJ Set

Senses for stock - kept by domain tuning on the Wall Street Journal

Senses for stock - discarded by domain tuning on the Wall Street Journal



Bottom Up Tuning – Buitelaar & Sacaleanu

  • Ranking of WordNet synsets according to a domain-specific corpus

    • Compute term relevance against reference corpus

    • Compute synset relevance according to term relevance (where term = synonym in synset)

    • Ranking can be used in WSD (similar to usage of ‘most frequent heuristic’)

Paul Buitelaar, Bogdan Sacaleanu Ranking and Selecting Synsets by Domain Relevance In: Proceedings of WordNet and Other Lexical Resources: Applications, Extensions and Customizations, NAACL 2001 Workshop, June 3/4 2001



TFIDF

  • A word is more important if it appears several times in a target document

  • A word is more important if it appears in fewer documents

  • tf(w): term frequency (number of word occurrences in a document)

  • df(w): document frequency (number of documents containing the word)

  • N: number of all documents

  • tfIdf(w): relative importance of the word in the document
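The quantities above combine into a score in several common ways. The sketch below uses the widely used variant tf(w) · log(N / df(w)); the formula shown on the original slide is not preserved in this transcript, so this particular weighting is an assumption, and the three toy documents are invented.

```python
import math
from collections import Counter

def tfidf(documents):
    """Return one {word: score} dict per document, using tf * log(N / df)."""
    n = len(documents)
    df = Counter(word for doc in documents for word in set(doc))
    return [
        {w: tf * math.log(n / df[w]) for w, tf in Counter(doc).items()}
        for doc in documents
    ]

docs = [
    "the cell divides and the cell grows".split(),
    "the prisoner left the cell".split(),
    "the market grows".split(),
]
scores = tfidf(docs)
for word, score in sorted(scores[0].items(), key=lambda kv: -kv[1]):
    print(f"{word:8s} {score:.3f}")
```

A word that occurs in every document, such as the, gets a score of zero under this weighting because log(N / N) = 0.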



Term and Synset Relevance

  • Term Relevance

    • Relevance Score of Synset Members

      where t represents the term, d the domain, N is the total number of domains

  • Synset Relevance

    • Cumulated Relevance Score for a Synset



Extended Synset Relevance

  • Lexical Coverage

    • Take Length of the Synset Into Account

      [Gefängniszelle, Zelle] ("prison cell")

      [Zelle] ("living cell")

  • Hyponyms

    • Take Hyponyms Into Account

      [Zelle,Gefängniszelle,Todeszelle]

      [Zelle,Körperzelle,Pflanzenzelle]
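The ranking idea, including the hyponym extension, can be sketched as below. The relevance formulas are not preserved in this transcript, so the sketch assumes a tf-idf style score over domains, log(1 + tf) · log(N / df), summed over the members of a (hyponym-extended) synset. The term counts are invented and all function names are hypothetical.

```python
import math

# Invented term frequencies per domain corpus.
TERM_FREQ = {
    "medicine": {"Zelle": 40, "Körperzelle": 5},
    "law": {"Zelle": 3, "Gefängniszelle": 8},
    "sports": {},
}

def term_relevance(term, domain, term_freq):
    """Assumed score: log(1 + tf in domain) * log(N domains / domains with term)."""
    tf = term_freq[domain].get(term, 0)
    df = sum(1 for counts in term_freq.values() if term in counts)
    if tf == 0 or df == 0:
        return 0.0
    return math.log(1 + tf) * math.log(len(term_freq) / df)

def synset_relevance(synset, domain, term_freq):
    """Cumulated relevance of all terms in the (extended) synset."""
    return sum(term_relevance(t, domain, term_freq) for t in synset)

def rank_synsets(synsets, domain, term_freq):
    return sorted(synsets, key=lambda s: synset_relevance(s, domain, term_freq),
                  reverse=True)

# The two senses of Zelle, extended with hyponyms as in the example.
PRISON = ("Zelle", "Gefängniszelle", "Todeszelle")
LIVING = ("Zelle", "Körperzelle", "Pflanzenzelle")

print(rank_synsets([PRISON, LIVING], "medicine", TERM_FREQ)[0])
print(rank_synsets([PRISON, LIVING], "law", TERM_FREQ)[0])
```

Without the hyponyms, both senses would share only the term Zelle and receive the same score in every domain, which is the motivation for the extension.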



Experiment – Medical Domain



Related Recent Work

  • Diana McCarthy, Rob Koeling, Julie Weeds, and John Carroll

    • Finding predominant senses in untagged text. In Proc. of ACL 2004.

  • Chan, Yee Seng and Ng, Hwee Tou (2005)

    • Word Sense Disambiguation with Distribution Estimation. Proc. of IJCAI 2005.

  • Mohammad, Saif and Hirst, Graeme.

    • Determining word sense dominance using a thesaurus. Proc. of EACL 2006.



Day 2 - Introduction

  • Words and Object Descriptions

    • Semantics on the Semantic Web

      • Semantic Web, Ontologies and Natural Language Processing

    • The Lexical Semantic Web

      • Knowledge Representation as Word Meaning

    • A Lexicon Model for Ontologies

      • Enriching Ontologies with Linguistic Information


Words and Object Descriptions

  • Semantics on the Semantic Web

  • The “Lexical Semantic Web”

  • A Lexicon Model for Ontologies



Web Consists of Non-Interpreted Data

Web

Text

Images

Tables

DBs



Interpretation through Markup - Categories

Web

Markup



Interpretation through Markup – User Tags

“Web 2.0”

Markup



Formal Interpretation - Knowledge Markup

Semantic Web

Knowledge

Markup

Ontologies



Turns the Web into a Knowledge Base

Knowledge

Markup

Ontologies



Enables Semantic Web Services …

Semantic

Web Services

Knowledge

Markup

Ontologies



… and Intelligent Man-Machine Interface

Semantic

Web Services

Knowledge

Markup

Ontologies

Intelligent

Man-Machine Interface


Semantic Web Layer Cake



Resource Description Framework (RDF)

[Figure: RDF graph with three statements about node1]

  • node1 name “DFKI GmbH”

  • node1 www http://www.dfki.de

  • node1 location “Kaiserslautern”



RDF : XML-based Representation

<?xml version="1.0" ?>
<rdf:RDF
    xmlns:rdf="… rdf-syntax-ns#"
    xmlns:rdfs="… rdf-schema#"
    xmlns="http://example.org">
  <rdf:Description rdf:nodeID="node1">
    <name>DFKI GmbH</name>
    <location>Kaiserslautern</location>
    <www rdf:resource="http://www.dfki.de" />
  </rdf:Description>
</rdf:RDF>
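To see how such a document maps back to statements, the following sketch reads a complete version of the example with Python's standard XML parser and prints subject, predicate and object. It is a hand-rolled reader for this one pattern, not a general RDF/XML parser. The RDF namespace URI is the standard W3C one; the trailing slash on the default namespace and the function name `triples` are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Complete variant of the example document (default namespace is illustrative).
DOC = """<?xml version='1.0'?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://example.org/">
  <rdf:Description rdf:nodeID="node1">
    <name>DFKI GmbH</name>
    <location>Kaiserslautern</location>
    <www rdf:resource="http://www.dfki.de"/>
  </rdf:Description>
</rdf:RDF>"""

def triples(rdf_xml):
    """Yield (subject, predicate, object) for each property element."""
    root = ET.fromstring(rdf_xml)
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}nodeID")
        for prop in desc:
            predicate = prop.tag.split("}")[-1]  # drop the namespace part
            obj = prop.get(f"{{{RDF_NS}}}resource") or prop.text
            yield (subject, predicate, obj)

for triple in triples(DOC):
    print(triple)
```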



RDF Schema (RDFS)

Representation of classes and properties

[Figure: class diagram]

  • Student is-a Person

  • Teacher is-a Person

  • Student enrolledIn Course

  • Teacher teaches Course

  • Person name rdfs:Literal
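What a class hierarchy buys can be shown with a small inheritance sketch. It reads the diagram as Student and Teacher being subclasses of Person, with enrolledIn defined on Student, teaches on Teacher and name on Person; that reading of the figure and the helper names are assumptions.

```python
# Schema as read from the diagram (assumed assignment of properties to classes).
SUBCLASS_OF = {"Student": "Person", "Teacher": "Person"}
DOMAIN = {"enrolledIn": "Student", "teaches": "Teacher", "name": "Person"}

def superclasses(cls):
    """Follow is-a links upward and return all more general classes."""
    result = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        result.append(cls)
    return result

def applicable_properties(cls):
    """Properties defined on the class itself or inherited from a superclass."""
    classes = [cls] + superclasses(cls)
    return sorted(p for p, domain in DOMAIN.items() if domain in classes)

print(applicable_properties("Student"))
print(applicable_properties("Person"))
```

The point of the sketch is that name, defined once on Person, becomes available for both Student and Teacher through the is-a links.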



RDFS : XML-based Representation



Web Ontology Language (OWL)

  • OWL adds further modelling vocabulary on top of RDFS, e.g.

    • Class equivalence

    • Property types (data vs. object property)

  • Based on Description Logics, three versions

    • OWL Lite

    • OWL DL

    • OWL Full



OWL

Extended knowledge representation

[Figure: class diagram with Student, Teacher, Person and Course, the is-a links, the properties enrolledIn, teaches and name (with rdfs:Literal), extended with a disjoint axiom]



OWL : XML-based Representation


XML – RDF – RDFS – OWL

[Figure: layers arranged from syntax to semantics]

  • XML: Namespaces (Interpretation Context)

  • XML Schema: Data Types

  • RDF, RDF Schema: Formalization of Class Definition, Properties

  • OWL: Formalization of extended Class Definition, Properties, Property Types



Ontologies – What they are

  • Ontology refers to an engineering artifact

    • a specific vocabulary used to describe a certain reality

    • a set of explicit assumptions regarding the intended meaning of the vocabulary

  • An Ontology is

    • an explicit specification of a conceptualization [Gruber 93]

    • a shared understanding of a domain of interest [Uschold/Gruninger 96]



Ontologies – Why you need them

  • Make domain assumptions explicit

    • Easier to exchange domain assumptions

    • Easier to understand and update legacy data

  • Separate domain knowledge from operational knowledge

    • Re-use domain and operational knowledge separately

  • A community reference for applications

  • Shared understanding of what particular information means



Applications of Ontologies

  • NLP

    • Information Extraction, e.g. Buitelaar et al. 06, Mädche, Staab & Neumann 00, Nedellec, Rebholz

    • Information Retrieval (Semantic Search), e.g. WebKB (Martin et al. 00), OntoSeek (Guarino et al. 99), Ontobroker (Decker et al. 99)

    • Question Answering, e.g. Harabagiu, Schlobach & de Rijke, Aqualog (Lopez and Motta 04)

    • Machine Translation, e.g. Nirenburg et al. 04, Beale et al. 95, Hovy, Knight

  • Other

    • Business Process Modeling, e.g. Uschold et al. 98

    • Digital Libraries, e.g. Amann & Fundulaki 99

    • Information Integration, e.g. Kashyap 99; Wiederhold 92

    • Knowledge Management (incl. Semantic Web), e.g. Fensel 01, Staab & Schnurr 00; Sure et al. 00, Abecker et al. 97

    • Software Agents, e.g. Gluschko et al. 99; Smith & Poulter 99

    • User Interfaces, e.g. Kesseler 96


Ontologies and Their Relatives

[Figure: related kinds of resources and features, with the labels Catalogs, Glossaries & Terminologies, Thesauri, Semantic Networks, Formal is-a, Formal Instance, Axioms (Disjoint/Inverse …), General logical constraints]



Thesauri – Examples : EuroVoc

  • EuroVoc

    • covers terminology in all of the official EU languages

    • for all fields (27) that concern the EU institutions, e.g. politics, trade, law, science, energy, agriculture

MT 3606 natural and applied sciences

UF gene pool

genetic resource

genetic stock

genotype

heredity

BT1 biology

BT2 life sciences

NT1 DNA

NT1 eugenics

RT genetic engineering (6411)



Thesauri – Examples : MeSH

  • MeSH (Medical Subject Headings)

    • organized by terms (~ 250,000) that correspond to medical subjects

    • for each term syntactic, morphological or semantic variants are given

MeSH Heading Databases, Genetic

Entry Term Genetic Databases

Entry Term Genetic Sequence Databases

Entry Term OMIM

Entry Term Online Mendelian Inheritance in Man

Entry Term Genetic Data Banks

Entry Term Genetic Data Bases

Entry Term Genetic Databanks

Entry Term Genetic Information Databases

See Also Genetic Screening



Semantic Networks - Examples : UMLS

  • Unified Medical Language System

    • integrates linguistic, terminological and semantic information

    • Semantic Network consists of 134 semantic types and 54 relations between types

Pharmacologic Substance affects Pathologic Function

Pharmacologic Substance causes Pathologic Function

Pharmacologic Substance complicates Pathologic Function

Pharmacologic Substance diagnoses Pathologic Function

Pharmacologic Substance prevents Pathologic Function

Pharmacologic Substance treats Pathologic Function



Semantic Networks - Examples : GO

  • GO (Gene Ontology)

    • Aligns descriptions of gene products in different databases, including plant, animal and microbial genomes

    • Organizing principles are molecular function, biological process and cellular component

Accession: GO:0009292

Ontology: biological process

Synonyms: broad: genetic exchange

Definition: In the absence of a sexual life cycle, the processes involved in the introduction of genetic information to create a genetically different individual.

Term Lineage:

all : all (164142)

GO:0008150 : biological process (115947)

GO:0007275 : development (11892)

GO:0009292 : genetic transfer (69)



Ontologies – Example I


Ontologies – Example II

[Figure: a small geographical ontology; further labels in the figure: F-Logic, Ontology, similar]

  • Classes (linked by is-a): Geographical Entity (GE), Inhabited GE, Natural GE, city, country, mountain, river

  • Relations: flow_through, capital_of, located_in

  • Attributes: height (m), length (km)

  • Instances (linked by instance_of): Zugspitze (2962), Neckar (367), Germany, Stuttgart, Berlin

Design: Philipp Cimiano



Ontologies for NLP

  • Information Retrieval

    • Query Expansion

  • Machine Translation

    • Interlingua

  • Information Extraction

    • Template Definition

    • Semantic Integration

  • Question Answering

    • Question Analysis

    • Answer Selection



Information Extraction

  • Class-based Template Definition

    • Allows for Reasoning over Extracted Templates with Respect to the Ontology (see e.g. [Nedellec and Nazarenko 2005] for discussion)

  • Semantic Integration

    • Extraction from Heterogeneous Sources (Text, Tables and other Semi-Structured Data, Image Captions) – SmartWeb [Buitelaar et al. 06]

    • Multi-Document Information Extraction – ArtEquAKT [Alani et al. 2003]



Question Answering

  • Question Analysis

    • Ontology/WordNet-based Semantic Question Interpretation (e.g. [Pasca and Harabagiu 01])

  • Answer Selection

    • Ontology/WordNet-based Reasoning for Answer Type-Checking

      • Ontology of Events [Sinha and Narayanan 05]

      • Geographical Ontology, WordNet [Schlobach & de Rijke 04]

      • WordNet [Pasca and Harabagiu 01]

  • Ontology-based Question Answering

    • Derive Answers from a Knowledge Base (e.g. Aqualog [Lopez & Motta 04])


Ontology Life Cycle

  • Populate: Knowledge Base Generation

  • Validate: Consistency Checks

  • Create/Select: Development and/or Selection

  • Evolve: Extension, Modification

  • Deploy: Knowledge Retrieval

  • Maintain: Usability Tests


NLP in the Ontology Life Cycle

  • Ontology Population: Information Extraction

  • KB Retrieval: Question Answering

  • Ontology Learning: Text Mining


Ontology Learning

Layers, listed from top to bottom:

  • General Axioms

  • Axiom Schemata

  • Relation Hierarchy

  • Relations

  • Concept Hierarchy

  • Concept Formation

  • (Multilingual) Synonyms

  • Terms

Design: Philipp Cimiano


Words and Object Descriptions

  • Semantics on the Semantic Web

  • The “Lexical Semantic Web”

  • A Lexicon Model for Ontologies



Dictionary: Words and Senses

  • Represent interpretations of words through senses, very much like classes that are assigned to a word, e.g.

    article

    1. An individual thing or element of a class…

    2. A particular section or item of a series in a written document…

    3. A non-fictional literary composition that forms an independent part of a publication…

    4. The part of speech used to indicate nouns and to specify their application

    5. A particular part or subject; a specific matter or point

    (as provided by http://dictionary.reference.com/)



Ontology: Classes and Labels - I

  • Ontologies assign labels (i.e. words) to a given class

  • In the COMMA ontology on document management the class article corresponds to sense 2 (‘section of a written document’):

http://pauillac.inria.fr/cdrom/ftp/ocomma/comma.rdfs


Ontology: Classes and Labels - II

  • In the GOLD ontology on linguistics, the class label article corresponds to sense 4 (‘part of speech’):

http://emeld.org/gold



The Meaning of Director - I

The Semantic Web can be viewed as a large, distributed dictionary (or rather a semantic lexicon) in which we can look up the meaning of words, e.g. director

… as a ‘role’ (AgentCities ontology)

http://www-agentcities.doc.ic.ac.uk/ontology/shows.daml



The Meaning of Director - II

… as ‘head of a program’ (University Benchmark ontology)

http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl



Exploring the Lexical Semantic Web

  • Collect ontologies

    • OntoSelect

  • Analyse the use of class/property labels

  • Treat class/property labels as lexical entries

    • Normalize

    • Organize by language



Ontology Collection

  • OntoSelect

    • Web Monitor on DAML, RDFS, OWL Files

    • Download, Analyze and Store Included Information and Metadata

      • Class and Property Labels

      • Multilingual Information

      • Included Ontologies

    • Ontology Ranking and Selection Functionalities

      http://olp.dfki.de/OntoSelect



OntoSelect



Multilinguality on the Semantic Web



Multilingual Labels



“Lexical Semantic Ambiguity”


Words and Object Descriptions

  • Semantics on the Semantic Web

  • The “Lexical Semantic Web”

  • A Lexicon Model for Ontologies



Ontologies – Example III


Ontologies – Example III (continued)

[Figure: ontology fragment with the classes Student, University, Campus, Staff and “Fakultät”, connected by the relations studies_at, located_at, works_at and is_part_of]

Ontologies – Example III (continued)

[Figure: the same ontology fragment (Student, University, Campus, Staff, “Fakultät”; studies_at, located_at, works_at, is_part_of), with terms attached to the class “Fakultät”]

  • has_German_term: Fakultät

  • has_Dutch_term: Faculteit

  • has_US_English_term: School


Ontologies – Example III (continued)

[Figure: “Fakultät” is_part_of University; “Fakultät” has_term Term; the terms are instances (instance_of) of Term, each with a language]

  • Fakultät: language DE

  • faculteit: language NL

  • school: language EN-US
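The step from language-specific properties to a generic has_term relation can be sketched with a small data model. The three terms and language codes come from the example; the `Term` class and the two lookup functions are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Term:
    """A lexical term with its language, attached to a class via has_term."""
    label: str
    language: str

# Class "Fakultät" with its terms, as in the example.
HAS_TERM = {
    "Fakultät": [
        Term("Fakultät", "DE"),
        Term("faculteit", "NL"),
        Term("school", "EN-US"),
    ],
}

def label_for(cls, language):
    """Return the label of a class in the given language, or None."""
    for term in HAS_TERM.get(cls, []):
        if term.language == language:
            return term.label
    return None

def classes_for(label):
    """Reverse lookup: which classes can a word refer to?"""
    return sorted(cls for cls, terms in HAS_TERM.items()
                  if any(t.label == label for t in terms))

print(label_for("Fakultät", "NL"))
print(classes_for("school"))
```

Adding a further language then means adding one more `Term` instance rather than a new property such as has_German_term.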



Semiotic Triangle

  • Ogden & Richards, 1923

  • based on Structural Linguistics studies (de Saussure, 1916)

  • adopted in Knowledge Representation (e.g. Sowa, 1984)



LingInfo Model – Simplified

Design: Michael Sintek



LingInfo Model


LingInfo Instances - Example

Fußballspielers

“of the football player”



LingInfo Predicate-Arg Structure

Design: Anette Frank



Conclusions

  • WordNet: Appropriate Use may include

    • Introduction of underspecified senses (sense grouping)

    • Tuning to a domain

  • The “Lexical Semantic Web”

    • The Semantic Web (and Web 2.0) is a potentially rich resource for (formal) lexical semantics

    • Mining such resources for lexical semantics (i.e. compilation of a distributed semantic lexicon) has only just started

    • Ontologies to be extended with linguistic/lexical information

