
Shalini Gupta - 07305R02

Apoorv Sharma - 07305913

Chirag Patel - 07305909

Shitanshu Verma - 07305037

Ontology Learning


Issue

There is a lot of information

Its current representation renders it uninterpretable for machines

Consequence: most of the information remains undiscovered

Big and popular search engines are able to search only 3-4% of the total information on the web.

What is needed?

Improved machine intelligence.

Make machines read, understand, use, and modify information.

With minimal human intervention.

To Achieve It?

Enable machines




Maintain their knowledge representation

What is an ontology?

A representation format that conceptualizes a domain

Captures classes, instances, attributes, and relationships

Provides a sound semantic ground for machine-understandable descriptions of digital content

Is used in various fields such as SE and AI

Is represented using languages such as OWL

What is ontology learning?

Process of



ontologies from sources such as

Documents in natural language

with the help of




The flow

Initial ontology is given

Information sources are given

Machines work over the data sources to

enrich the ontology

Once enriched

consistency check is done


Terms related to the process

Ontology enrichment

Improving an existing ontology

Ontology population

Creating a new ontology or adding new concepts to it

Inconsistency resolution

Resolving inconsistencies that come up while acquiring ontologies

Enrichment of Ontology

Term Identification

Taxonomy Extraction

Non taxonomical relationship extraction

Enrichment of Ontology

Term Identification

Identify important terms in the text

Taxonomy Extraction

Identify taxonomic relationships between the identified terms

Non taxonomical relationship extraction

Identify other relationships


Ontology learning

ontology enrichment

term identification

taxonomy extraction

non taxonomic relationship extraction

Term Identification: Basics

Everything is a concept.

An object, an idea, or a thing.

A term lexicalizes a concept.

A word or multi-word string that conveys 'a single meaning' within a given community

e.g. company, Paris, man, cellphone, Red Hat, car parking

Goal: Find out representative concepts.

Term Identification: Steps


Term Recognition: Find the terms.

Term Classification: Cluster the terms that are the same.

Term Mapping: Link the terms to well-defined concepts of referent data sources.

Various techniques exist for every step.

Term Identification: Tokenizing

Different combinations of linguistic techniques have been used to accomplish this step


Scan the text in order to identify boundaries of words and complex expressions

Term Identification: Tokenizing

Remove stop words such as 'a', 'the', 'of', 'with'

E.g. Check of the Electrical Bonding of External Composite Panels with a CORAS Resistivity-Continuity Test Set

Terms: Check, Electrical Bonding, External Composite Panels, CORAS Resistivity-Continuity Test Set.

Generally nouns are considered as candidate concepts
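
The stop-word step above can be sketched as follows. This is a minimal illustration, assuming whitespace tokenization and only the four stop words listed on the slide; the function name `candidate_terms` is hypothetical, and the example sentence is taken to end in "Test Set", as the term list does. Runs of words between stop words are kept as candidate terms.

```python
STOP_WORDS = {"a", "the", "of", "with"}  # only the stop words named on the slide


def candidate_terms(text, stop_words=STOP_WORDS):
    """Split text at stop words and return the remaining word runs as candidate terms."""
    terms, current = [], []
    for token in text.split():
        if token.lower() in stop_words:
            # A stop word closes the current run of words, if there is one.
            if current:
                terms.append(" ".join(current))
                current = []
        else:
            current.append(token)
    if current:
        terms.append(" ".join(current))
    return terms


sentence = ("Check of the Electrical Bonding of External Composite Panels "
            "with a CORAS Resistivity-Continuity Test Set")
print(candidate_terms(sentence))
```

A real pipeline would add part-of-speech tagging so that only noun phrases survive as candidate concepts.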

Term Identification: Importance of a term

TF-IDF technique can be used to find the important keywords [6]

a balanced measure stating that a word is more important if it appears several times in a target document and at the same time it appears rarely in other documents.

Seed-concepts can be used from existing ontologies.
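
A minimal sketch of the TF-IDF idea, assuming one common variant (relative term frequency multiplied by the logarithm of the inverse document frequency); other weightings exist. The function name `tf_idf` and the toy documents are illustrative assumptions, not taken from the slides.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: score} dict per document."""
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return scores


# Hypothetical toy corpus, already tokenized.
docs = [
    ["hotel", "room", "hotel", "price"],
    ["room", "price", "city"],
    ["city", "museum", "price"],
]
scores = tf_idf(docs)
print(max(scores[0], key=scores[0].get))  # the highest-weighted term of the first document
```

A term that occurs in every document ("price" here) gets weight zero, which matches the intuition that it does not distinguish the target document.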

Term Identification: Importance of a term

Multi-word terms

The C/NC-value method [5] takes into account:

(1) the frequency of occurrence,

(2) the frequency of occurrence as a sub-string of other candidate terms,

(3) the number of candidate terms containing the given term as a sub-string,

(4) the number of words contained in the candidate term

Relevant terms can also be determined by the mutual cohesiveness of their words, measured using Mutual Expectation
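
The four statistics listed for the C-value part of the method can be combined as in the sketch below. It assumes the usual C-value formulation (log2 of the term length times the frequency, reduced by the average frequency of the longer candidates that contain the term) and leaves out the NC-value context weighting. The function name and the frequencies are hypothetical.

```python
import math


def c_value(freq):
    """freq: {candidate term as a tuple of words: frequency}. Returns {term: C-value}."""
    def contains(longer, shorter):
        # True if `shorter` occurs as a contiguous word sequence inside `longer`.
        n = len(shorter)
        return len(longer) > n and any(
            longer[i:i + n] == shorter for i in range(len(longer) - n + 1)
        )

    scores = {}
    for term, f in freq.items():
        # Frequencies of the longer candidate terms that contain this term.
        nesting = [f_b for other, f_b in freq.items() if contains(other, term)]
        if nesting:
            f = f - sum(nesting) / len(nesting)
        scores[term] = math.log2(len(term)) * f
    return scores


# Hypothetical candidate terms with made-up corpus frequencies.
freq = {
    ("youth", "hostel"): 6,
    ("cheap", "youth", "hostel"): 2,
    ("youth", "hostel", "price"): 2,
}
cv = c_value(freq)
print(cv[("youth", "hostel")])
```

Because of the log2 factor, single-word candidates always score zero in this form, which is why the measure is used for multi-word terms.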

Term Identification: Morphological Analysis

Use of morphological knowledge of a word [9]

A technique which identifies a word-stem from a full word-form

To identify small domain-specific units

Studies patterns of word formation and attempts to formulate rules using the word structure.

e.g. In the biomedical domain a word ending in “-ofilous” or “-itis” is very probably a bio-molecule or a medical term

Advantage: Can identify “background terms” even with low frequency of appearance
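
A suffix rule of this kind can be sketched in a few lines. The two suffixes are the ones given in the example above; the function name is hypothetical, and a real system would use a much longer, curated or learned list of morphological rules.

```python
# Suffixes from the biomedical example above; purely illustrative.
DOMAIN_SUFFIXES = ("ofilous", "itis")


def is_candidate_by_suffix(word, suffixes=DOMAIN_SUFFIXES):
    """Flag a word as a candidate domain term if it ends in a known suffix."""
    return word.lower().endswith(suffixes)


print(is_candidate_by_suffix("Arthritis"))
print(is_candidate_by_suffix("hotel"))
```

The check does not depend on how often the word occurs, which is the advantage noted above for low-frequency terms.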

Term Identification: Named Entity Recognition

Recognition of

Person, location, and organization names as single complex entities

Complex date and time expressions

Percentages and monetary values

E.g. 'Merrill Lynch'

The next step associates single words or complex expressions with the concepts

E.g. 'Merrill Lynch' is related to the concept 'organization'

Identifying Relationships

  • More information for later steps

  • Dependency Relations:

    • Between the word and its neighbours, the mind perceives connections, the totality of which forms the structure of the sentence

    • Structural connections establish dependency relations between the words

Deriving Relationships from Dependency Relations

Syntactic dependency relations coincide closely with semantic relations [3]

e.g. France Telecom in Paris offers the new DSL technology.

Dependency relations would give a link between France Telecom (organization) and Paris (city)

From this we can derive a semantic relationship between organization and city

Term Identification

Identifying Relationships

Taxonomic Relationships

Non-Taxonomic Relationships

Taxonomy Construction

Hierarchy of concepts

Inclusion relations provide a tree view of the ontology and imply inheritance between super-concepts and sub-concepts.

E.g. 'Living being' is a super-concept and 'mammal' is a sub-concept.

In an ontology, the root node is the most general concept for the domain of interest.

Discovering taxonomic relations

Based on lexico-syntactic patterns

Can find inclusion relation between concepts through a simple pattern matching on a set of documents

E.g. the pattern "NP such as NP, NP, ..., and NP"

applied to "... by authors such as Herrick, Goldsmith, and Shakespeare" gives

hyponym("author", Herrick)

hyponym("author", Goldsmith)

hyponym("author", Shakespeare)
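
A minimal sketch of matching the "such as" pattern with a regular expression. It assumes every noun phrase is a single word and does no lemmatization, so the hypernym comes back in its surface form ("authors"); a real system would match over parsed noun phrases. The function name is hypothetical.

```python
import re

# "X such as A, B, and C" or "X such as A"; each NP is approximated by one word.
SUCH_AS = re.compile(r"(\w+) such as ((?:\w+, )*\w+,? and \w+|\w+)")


def hyponyms_such_as(sentence):
    """Return (hypernym, hyponym) pairs found by the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(sentence):
        hypernym = match.group(1)
        # Split the enumeration at commas and at the final "and".
        for hyponym in re.split(r",\s*(?:and\s+)?|\s+and\s+", match.group(2)):
            pairs.append((hypernym, hyponym))
    return pairs


print(hyponyms_such_as("works by authors such as Herrick, Goldsmith, and Shakespeare"))
```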

Discovering new patterns

The idea is to use a pattern learner to generate new patterns

Generated patterns then can be used in order to generate new information (new inclusion relations), as well as to assess the validity of extracted information

E.g. we can generate new patterns like

NP is NP

NP, NP,..., and other NP

NP, especially NP, NP,..., and NP

From the pattern NP such NP as NP, NP,..., and NP

Algorithm for finding new patterns

1. Decide on a lexical relation, R, that is of interest, e.g. "group/member" or a hyponym relation like (author, Shakespeare).

2. Gather a list of terms/instances for which this relation holds.

3. Find places in the corpus where these terms/instances occur syntactically near one another and record the environment.

4. Find new patterns using these environments.

5. Once a new pattern has been positively identified, use it to gather more instances of the target relation and go to Step 2.
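
The part of this loop that records environments and proposes patterns can be sketched as below. This is only an illustration under strong assumptions: seed terms are single lowercase words, "syntactically near" is approximated by a maximum number of tokens between the two terms, and the function name, seed pairs and sentences are all hypothetical.

```python
import re
from collections import Counter


def candidate_patterns(seed_pairs, sentences, max_gap=3):
    """Count the token sequences that occur between known (X, Y) pairs."""
    counts = Counter()
    for sentence in sentences:
        tokens = re.findall(r"\w+|[^\w\s]", sentence.lower())
        for x, y in seed_pairs:
            if x in tokens and y in tokens:
                i, j = tokens.index(x), tokens.index(y)
                gap = tokens[min(i, j) + 1:max(i, j)]
                if 0 < len(gap) <= max_gap:
                    # Keep track of which term came first in the sentence.
                    first, second = ("X", "Y") if i < j else ("Y", "X")
                    counts[f"{first} {' '.join(gap)} {second}"] += 1
    return counts


# Hypothetical seeds (hypernym X, hyponym Y) and corpus sentences.
seeds = [("author", "shakespeare"), ("author", "goldsmith")]
corpus = [
    "Shakespeare is an author of plays.",
    "Goldsmith is an author too.",
    "Shakespeare, like any author, revised his work.",
]
patterns = candidate_patterns(seeds, corpus)
print(patterns.most_common(1))
```

Frequent environments are the candidates a person (or a validation step) would inspect before accepting them as new patterns.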

Multi-word concepts

A concept may be represented by multi-word terms

A concept 'A' is a hyponym of a concept 'B' if

A has more tokens than B

all the tokens of B are present in A

both terms have the same head

E.g. the concepts 'private customer' and 'business customer' are hyponyms of the concept 'customer'
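
The three conditions translate directly into a small check. The sketch assumes whitespace tokenization and takes the head of a term to be its last token, which is a simplification that holds for English noun compounds like the ones in the example; the function name is hypothetical.

```python
def is_multiword_hyponym(a, b):
    """True if term `a` is a hyponym of term `b` under the three token rules."""
    tokens_a, tokens_b = a.lower().split(), b.lower().split()
    more_tokens = len(tokens_a) > len(tokens_b)
    all_present = all(token in tokens_a for token in tokens_b)
    same_head = tokens_a[-1] == tokens_b[-1]  # assumption: head = last token
    return more_tokens and all_present and same_head


print(is_multiword_hyponym("private customer", "customer"))
print(is_multiword_hyponym("customer service", "customer"))
```

The second call is rejected by the head condition: 'customer service' contains 'customer' but its head is 'service'.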

Mining non-taxonomic relations

Relationships other than is-a relationships

E.g. Linguistic processing may find that the word 'cost' occurs frequently with the words 'hotel', 'guest house', 'youth hostel' in sentences like 'Costs at the youth hostel are $20 per night'

Relations (cost, hotel), (cost, guest house) and (cost, youth hostel) exist

The discovery algorithm finds support and confidence measures for these pairs, as well as for relationships at higher levels of abstraction, such as between accommodation and costs

Finding non-taxonomic relations

Based on basic Association Rule Algorithm [3]

Basic Association Rule Algorithm


Given a set of transactions, T

Each transaction has a set of items, i1, i2, ..., in

Goal: Compute association rules of the form i1→i2

Trick: Exploits the fact that many items appear together, so the occurrence of one implies the occurrence of another with a high probability (confidence)

Association Rule Mining

E.g. consider the transactions

(bread, butter, jam, chips)

(bread, butter, jam, ketchup)


(bread, butter, jam, chips)


E.g. bread → butter, jam

Support = n(X ∪ Y)/N

E.g. Support = 3/5

Confidence = n(X ∪ Y)/n(X)

E.g. Confidence = 3/4
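
A minimal sketch of the two measures for the rule bread → butter, jam. Only three of the five transactions implied by N = 5 survive in the transcript, so the two marked "assumed" below are hypothetical fill-ins chosen to be consistent with the stated 3/5 and 3/4; the function name is also illustrative.

```python
def support_confidence(transactions, x, y):
    """Support and confidence of the rule X -> Y over a list of item sets."""
    x, y = set(x), set(y)
    n_x = sum(1 for t in transactions if x <= t)          # transactions containing X
    n_xy = sum(1 for t in transactions if (x | y) <= t)   # transactions containing X and Y
    support = n_xy / len(transactions)
    confidence = n_xy / n_x if n_x else 0.0
    return support, confidence


transactions = [
    {"bread", "butter", "jam", "chips"},
    {"bread", "butter", "jam", "ketchup"},
    {"bread", "chips"},                     # assumed: not in the transcript
    {"bread", "butter", "jam", "chips"},
    {"milk", "chips"},                      # assumed: not in the transcript
]
support, confidence = support_confidence(transactions, {"bread"}, {"butter", "jam"})
print(support, confidence)
```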


1. Extend each transaction to include the ancestor of a particular item

E.g. include the word 'Accommodation' in the transactions containing word 'guest house'

2. Determine association rules of the form Xk→Yk where |Xk| = 1 and |Yk| = 1

3. Determine confidence for all rules that exceed the user-determined support

4. Prune the rules subsumed by ancestral rules

E.g. if we found 2 rules, (cost, accommodation) and (cost, hotel), we prune the latter rule (cost, hotel)
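
The four steps can be sketched as follows. This is an illustration rather than the authors' implementation: the taxonomy is assumed to be a child-to-parent dictionary, rules between an item and its own ancestor are skipped as uninformative, and the function name, thresholds and transactions are hypothetical.

```python
from itertools import permutations


def generalized_rules(transactions, parent, min_support, min_confidence):
    """Mine X -> Y rules (single items) over transactions extended with ancestors."""
    def ancestors(item):
        found = set()
        while item in parent:
            item = parent[item]
            found.add(item)
        return found

    # Step 1: extend each transaction with the ancestors of its items.
    extended = [set(t) | {a for i in t for a in ancestors(i)} for t in transactions]
    n = len(extended)
    items = set().union(*extended)

    # Steps 2 and 3: one-item rules that pass the support and confidence thresholds.
    rules = {}
    for x, y in permutations(items, 2):
        if x in ancestors(y) or y in ancestors(x):
            continue  # an item trivially co-occurs with its own ancestor
        n_x = sum(1 for t in extended if x in t)
        n_xy = sum(1 for t in extended if x in t and y in t)
        support, confidence = n_xy / n, n_xy / n_x
        if support >= min_support and confidence >= min_confidence:
            rules[(x, y)] = (support, confidence)

    # Step 4: drop a rule if a rule between ancestors of its items was also found.
    def subsumed(x, y):
        return any(
            (ax, ay) in rules and (ax, ay) != (x, y)
            for ax in ancestors(x) | {x}
            for ay in ancestors(y) | {y}
        )

    return {rule: stats for rule, stats in rules.items() if not subsumed(*rule)}


# Hypothetical data in the spirit of the accommodation example.
parent = {"hotel": "accommodation", "guest house": "accommodation",
          "youth hostel": "accommodation"}
transactions = [{"cost", "hotel"}, {"cost", "guest house"},
                {"cost", "youth hostel"}, {"cost", "hotel"}]
rules = generalized_rules(transactions, parent, min_support=0.5, min_confidence=0.5)
print(sorted(rules))
```

With these inputs the rule between cost and hotel passes both thresholds but is pruned in favour of the more general rule between cost and accommodation.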

Statistics-based Extraction of Taxonomic Relations [12][13]

Uses hierarchical clustering.

Groups similar terms in a bottom-up fashion

Uses the cosine similarity function

The cosine measure or normalized correlation coefficient between two vectors x and y is given by cos(x, y) = (x · y) / (|x| |y|), where x · y = Σ x_i y_i and |x| = sqrt(Σ x_i²)

Computation of similarity function

The similarity matrix is given by

Hotel vector=(0,14,7,4,6)

Accommodation vector=(14,0,11,2,5)

cos(Hotel, Accommodation) = (0*14 + 14*0 + 7*11 + 4*2 + 6*5) / (sqrt(297) * sqrt(346)) = 115 / 320.6 ≈ 0.36
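
The cosine of the two vectors above can be checked with a few lines of code. The sketch uses the standard definition (dot product divided by the product of the Euclidean norms); the function name is illustrative.

```python
import math


def cosine(x, y):
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


hotel = (0, 14, 7, 4, 6)
accommodation = (14, 0, 11, 2, 5)
print(round(cosine(hotel, accommodation), 2))
```

In bottom-up clustering, the pair of terms (or clusters) with the highest such similarity is merged first.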

Case study: Web-based Ontology Learning with ISOLDE

ISOLDE (Information System for Ontology Learning and Domain Exploration) produces a domain ontology from a base ontology

Uses the following

An unsupervised named entity recognition system

Web resources like DWDS, Wikipedia and Wiktionary.

Analysis steps used by ISOLDE

Named-entity recognition (NER)

Uses a domain-specific corpus, a base ontology and a general-purpose NER system (SProUT, see Drozdzynski et al. 2004) to find instances for the classes in the base ontology.

Linguistic pattern analysis

Extracts class candidates from the context of the instances found in step 1 by use of lexico-syntactic patterns

Collecting web-based knowledge

Collects information on and between the extracted class candidates from online resources and integrates it into a new or extended taxonomy/ontology

Stage-wise Examples

After step 1 we get named entities such as Ballack and Munich from the soccer corpus

In the second step we find class candidates for the named entities from the sentences in the corpus and then filter the domain-specific candidates using the χ² method

E.g. 'Ballack, the best midfielder in the German national team' gives 'midfielder' as the class candidate of Ballack.

In the third step we search the web for the class candidates; the Wikipedia definition of midfielder is

A midfielder is a player whose position of play is midway between the attacking strikers and the defenders

Example contd..

  • We learn the relation 'midfielder is a player' (taxonomic relationship)

  • Relevance factor χ²

  • χ² = Σ (O_ij − E_ij)² / E_ij, where O holds the observed frequencies and E the expected frequencies

  • O matrix for striker

Issues in Learning

human understandable vs machine understandable

learning higher-degree relations

mapping to high level ontology

evaluation benchmark

incremental ontology learning

multi agent learning

Application of ontology

is ubiquitous in information systems [2]

improving the performance of information retrieval and reasoning

making data between different applications interoperable

ontology-type semantic descriptions of behaviors and services allow software agents in a multi-agent system to better coordinate themselves


References

[1] Elias Zavitsanos, Georgios Paliouras, George Vouros. Ontology Learning and Evaluation: A Survey. Technical Report, 2006.

[2] Nicolas Weber, Paul Buitelaar. Web-based Ontology Learning with ISOLDE. DFKI GmbH - Language Technology Lab, Saarbrücken, Germany, 2006.

[3] Alexander Maedche and Steffen Staab. Mining Ontologies from Text, 2000.

[4] Alexander Maedche, Viktor Pekar, and Steffen Staab. Ontology Learning Part One - On Discovering Taxonomic Relations from the Web, 2003.

[5] K. Frantzi, S. Ananiadou, and H. Mima. Automatic recognition of multi-word terms: The C-value/NC-value method. 3(2):115–130, 2000.

[6] G. Salton, A. Wong, and C.S. Yang. A vector space model for automatic indexing. Communications of the ACM, 18(11):613–620, 1975.

[7] D.I. Moldovan and R.C. Girju. An interactive tool for the rapid development of knowledge bases. International Journal on Artificial Intelligence Tools (IJAIT), 10(1-2), 2001.

[8] J.D. Cohen. Highlights: Language and domain independent automatic indexing terms for abstracting. Journal of the American Society for Information Science, 46(3):162–174, 1995.

[9] U. Heid. A linguistic bootstrapping approach to the extraction of term candidates from German text. Terminology, 5(2):161–181, 1998.

[10] L.M. Iwanska, N. Mata, and K. Kruger. Fully Automatic Acquisition of Taxonomic Knowledge from Large Corpora of Texts, pages 335–345. MIT/AAAI Press, 2000.

[11] J.U. Kietz, A. Maedche, and R. Volz. A Method for Semi-Automatic Ontology Acquisition from a Corporate Intranet. Juan-Les-Pins, France, 2000.

[12] A. Maedche, V. Pekar, and S. Staab. Ontology learning part one - on discovering taxonomic relations from the web. In Proceedings of the Web Intelligence conference. Springer Verlag, 2002.

[13] Vincent Schickel-Zuber, Boi Faltings. Using hierarchical clustering for learning the ontologies used in recommendation systems. KDD 2007: 599-608.

[14] A. Maedche and S. Staab. Discovering Conceptual Relations from Text. In Proceedings of ECAI 2000, IOS Press, Amsterdam, 2000.