Shalini Gupta - 07305R02

Apoorv Sharma - 07305913

Chirag Patel - 07305909

Shitanshu Verma - 07305037

Ontology Learning


Issue

There is a lot of information

but its current representation renders it uninterpretable for machines

Consequence

most of the information remains undiscovered

Big and popular search engines are able to search only 3-4% of the total information on the web.


What is needed?

Improved machine intelligence.

Make machines read, understand, use, and modify information.

With minimal human intervention.


To Achieve It

Enable machines to

populate

enrich

evaluate

maintain

their knowledge representation


What is an ontology

A representation format that conceptualizes a domain

Captures classes, instances, attributes, and relationships

Provides a sound semantic ground for machine-understandable descriptions of digital content

Is used in various fields such as SE and AI

Is represented using languages such as OWL


What is ontology learning

The process of

preparing

updating

ontologies from sources such as

documents in natural language

with the help of

dictionaries

thesauri

etc.


Ontology learning

Environment


The flow

An initial ontology is given

Information sources are given

Machines work over the data sources to

enrich the ontology

Once enriched

a consistency check is done

the ontology is evaluated


Terms related to the process

Ontology enrichment

Improving an existing ontology

Ontology population

Creating a new ontology or adding new concepts to an existing one

Inconsistency resolution

Resolving inconsistencies that come up while acquiring ontologies


Enrichment of Ontology

Term Identification

identifying important terms in the text

Taxonomy Extraction

identifying taxonomic relationships between the identified terms

Non-taxonomic Relationship Extraction

identifying other relationships


Review

Ontology learning

ontology enrichment

term identification

taxonomy extraction

non-taxonomic relationship extraction


Term Identification: Basics

Everything is a concept.

An object, an idea, or a thing.

A term lexicalizes a concept.

A word or multi-word string that conveys 'a single meaning' within a given community

e.g. company, Paris, man, cellphone, Red Hat, car parking

Goal: Find the representative concepts.


Term Identification: Steps

Steps:

Term Recognition: Find the terms.

Term Classification: Cluster the terms that are the same.

Term Mapping: Link the terms to well-defined concepts in reference data sources.

Various techniques exist for every step.


Term Identification: Tokenizing

Various combinations of linguistic techniques have been used to accomplish this step

Tokenizing

Scan the text to identify the boundaries of words and complex expressions


Term Identification: Tokenizing

Remove stop words like 'a', 'the', 'of', 'with'

E.g. Check of the Electrical Bonding of External Composite Panels with a CORAS Resistivity-Continuity Test Set

Terms: Check, Electrical Bonding, External Composite Panels, CORAS Resistivity-Continuity Test Set.

Generally, nouns are considered candidate concepts
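
The stop-word step can be sketched in a few lines of Python. This is only an illustration, not the method of any particular tool: the function name `candidate_terms` and the tokenizing regular expression are assumptions, and the example sentence is written with the trailing 'Set' that appears in the term list.

```python
import re

# The stop words listed on the slide; a real system would use a longer list.
STOP_WORDS = {"a", "the", "of", "with"}

def candidate_terms(text):
    """Tokenize the text, then cut the token stream at every stop word."""
    # Keep hyphenated words such as "Resistivity-Continuity" as one token.
    tokens = re.findall(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*", text)
    terms, current = [], []
    for token in tokens:
        if token.lower() in STOP_WORDS:
            if current:
                terms.append(" ".join(current))
                current = []
        else:
            current.append(token)
    if current:
        terms.append(" ".join(current))
    return terms

sentence = ("Check of the Electrical Bonding of External Composite Panels "
            "with a CORAS Resistivity-Continuity Test Set")
print(candidate_terms(sentence))
```

Splitting at stop words is a crude stand-in for noun-phrase chunking; a part-of-speech tagger would be needed to keep only the nouns.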


Term Identification: Importance of a term

The TF-IDF technique can be used to find the important keywords [6]

A balanced measure: a word is more important if it appears several times in a target document and, at the same time, appears rarely in other documents.

Seed concepts can be taken from existing ontologies.
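
A minimal sketch of TF-IDF scoring, assuming one common variant (relative term frequency multiplied by log(N / document frequency)); the three toy documents are hypothetical and only serve to show the effect.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: score} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores

# Hypothetical toy corpus, already tokenized.
docs = [
    "hotel room hotel price".split(),
    "room price city".split(),
    "city museum price".split(),
]
scores = tf_idf(docs)
top_term = max(scores[0], key=scores[0].get)
print(top_term, scores[0])
```

In the first document 'hotel' scores highest because it is frequent there and absent elsewhere, while 'price' scores zero because it occurs in every document.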


Term Identification: Importance of a term

Multi-word terms

The C/NC-value method [5] scores a candidate term using:

(1) its frequency of occurrence,

(2) its frequency of occurrence as a sub-string of other candidate terms,

(3) the number of candidate terms containing it as a sub-string,

(4) the number of words it contains.

Relevant terms can also be determined from the mutual cohesiveness of their words, using Mutual Expectation
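
The four factors combine in the C-value score of [5]: log2(|a|) · f(a) for a term a that is not nested in longer candidates, and log2(|a|) · (f(a) − average frequency of the longer candidates containing a) otherwise. The sketch below is a simplified reading of that formula; the helper names and the frequency counts are hypothetical.

```python
import math

def contains(longer, shorter):
    """True if the word sequence `shorter` occurs contiguously inside `longer`."""
    n = len(shorter)
    return len(longer) > n and any(
        longer[i:i + n] == shorter for i in range(len(longer) - n + 1)
    )

def c_value(freq):
    """freq maps candidate terms (tuples of words) to corpus frequencies."""
    result = {}
    for term, frequency in freq.items():
        # Factor (3): the candidate terms that contain this term.
        longer_terms = [t for t in freq if contains(t, term)]
        if longer_terms:
            # Factor (2): average frequency of the containing terms.
            nested = sum(freq[t] for t in longer_terms) / len(longer_terms)
            frequency = frequency - nested
        # Factors (1) and (4): frequency weighted by the term length.
        result[term] = math.log2(len(term)) * frequency
    return result

# Hypothetical counts for two candidate terms.
freq = {
    ("external", "composite", "panels"): 4,
    ("composite", "panels"): 6,
}
values = c_value(freq)
print(values)
```

Note that log2(1) = 0, so single-word candidates always score zero in this form; the method is aimed at multi-word terms.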


Term Identification: Morphological Analysis

Uses morphological knowledge about a word [9]

A technique that identifies a word stem from a full word form

Used to identify small domain-specific units

Studies patterns of word formation and attempts to formulate rules using the word structure.

e.g. In the biomedical domain a word ending in “-ofilous” or “-itis” is very probably a bio-molecule or a medical term

Advantage: Can identify “background terms” even with a low frequency of appearance


Term Identification: Named Entity Recognition

Recognition of

person, location, and organization names as single complex entities

complex date and time expressions

percentages and monetary values

E.g. 'Merrill Lynch'

The next step associates single words or complex expressions with concepts

e.g. 'Merrill Lynch' is related to the concept 'organization'


Identifying Relationships

  • Provides more information for the later steps

  • Dependency Relations:

    • Between the word and its neighbours, the mind perceives connections, the totality of which forms the structure of the sentence

    • Structural connections establish dependency relations between the words


Deriving Relationships from Dependency Relations

Syntactic dependency relations coincide closely with semantic relations [3]

e.g. France Telecom in Paris offers the new DSL technology.

Dependency relations link France Telecom (organization) and Paris (city)

From this we can derive a semantic relationship between organization and city


Ontology learning

Term Identification

Identifying Relationships

Taxonomic Relationships

Non-Taxonomic Relationships


Taxonomy Construction

Hierarchy of concepts

Inclusion relations provide a tree view of the ontology and imply inheritance between super-concepts and sub-concepts.

E.g. 'Living being' is a super-concept and 'mammal' is a sub-concept.

In an ontology, the root node is the most general concept for the domain of interest.


Discovering taxonomic relations

Based on lexico-syntactic patterns

Inclusion relations between concepts can be found through simple pattern matching on a set of documents

E.g. NP such as NP, NP,..., and NP

...works by authors such as Herrick, Goldsmith, and Shakespeare

hyponym("author", "Herrick")

hyponym("author", "Goldsmith")

hyponym("author", "Shakespeare")
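
The 'such as' pattern can be approximated with a regular expression. This is a rough sketch under strong assumptions: every noun phrase is a single word, and the hypernym is singularized by stripping a trailing 's'. A real system would match over parsed noun phrases. The names `SUCH_AS` and `hearst_such_as` are made up for the example.

```python
import re

# "<hypernym> such as <item>, <item>, and <item>", single-word NPs only.
SUCH_AS = re.compile(r"(\w+) such as ((?:\w+, )*(?:and )?\w+)")

def hearst_such_as(text):
    """Return (hypernym, hyponym) pairs found by the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(text):
        hypernym = match.group(1).lower()
        if hypernym.endswith("s"):
            hypernym = hypernym[:-1]  # naive singularization
        # Split the enumeration at commas and at the final "and".
        for item in re.split(r",\s*(?:and\s+)?|\s+and\s+", match.group(2)):
            pairs.append((hypernym, item))
    return pairs

text = "works by authors such as Herrick, Goldsmith, and Shakespeare"
print(hearst_such_as(text))
```

For the example sentence this yields the three (author, name) pairs listed above.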


Discovering new patterns

The idea is to use a pattern learner to generate new patterns

The generated patterns can then be used to extract new information (new inclusion relations), as well as to assess the validity of already extracted information

E.g. from the pattern NP such as NP, NP,..., and NP we can generate new patterns like

NP is NP

NP, NP,..., and other NP

NP, especially NP, NP,..., and NP


Algorithm for finding new patterns

1. Decide on a lexical relation R that is of interest, e.g. "group/member" or a hyponym relation like (author, Shakespeare).

2. Gather a list of terms/instances for which this relation holds.

3. Find places in the corpus where these terms/instances occur syntactically near one another and record the environment.

4. Find new patterns from the recorded environments.

5. Once a new pattern has been positively identified, use it to gather more instances of the target relation and go to Step 2.
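
The step of recording the environments of known pairs can be sketched as follows. This is a simplification: 'syntactically near' is approximated by a window of at most 30 characters, plurals are handled by an optional 's', and the seed pairs and sentences are hypothetical. Frequent contexts are candidates for new patterns that a person would still have to confirm.

```python
import re
from collections import Counter

def link_pattern(first, second):
    # Allow up to 30 characters between the two terms (an arbitrary window).
    return re.compile(r"\b" + first + r"\b(.{1,30}?)\b" + second + r"\b",
                      re.IGNORECASE)

def candidate_patterns(sentences, seed_pairs):
    """Count the contexts that link known (hypernym, hyponym) pairs."""
    contexts = Counter()
    for hypernym, hyponym in seed_pairs:
        general = re.escape(hypernym) + "s?"  # crude plural handling
        specific = re.escape(hyponym)
        for sentence in sentences:
            match = link_pattern(general, specific).search(sentence)
            if match:
                contexts["HYPER" + match.group(1) + "HYPO"] += 1
            match = link_pattern(specific, general).search(sentence)
            if match:
                contexts["HYPO" + match.group(1) + "HYPER"] += 1
    return contexts

# Hypothetical corpus and seed list.
sentences = [
    "He admired authors such as Shakespeare.",
    "Poets such as Herrick are still read.",
    "Shakespeare and other authors wrote for the stage.",
]
seeds = [("author", "Shakespeare"), ("poet", "Herrick")]
patterns = candidate_patterns(sentences, seeds)
print(patterns.most_common())
```

Here the known pattern 'HYPER such as HYPO' is found twice and the new candidate 'HYPO and other HYPER' once.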


Multi-word concepts

A concept may be represented by a multi-word term

A concept 'A' is a hyponym of a concept 'B' if

A has more tokens than B

all the tokens of B are present in A

both terms have the same head

E.g. the concepts 'private customer' and 'business customer' are hyponyms of the concept 'customer'
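
The three conditions translate directly into a small check. The sketch assumes that the head of an English noun phrase is its last token, which is a simplification; the function name `is_hyponym` is made up for the example.

```python
def is_hyponym(a, b):
    """Apply the three token-based conditions to terms a and b."""
    a_tokens, b_tokens = a.lower().split(), b.lower().split()
    return (
        len(a_tokens) > len(b_tokens)                    # A has more tokens
        and all(tok in a_tokens for tok in b_tokens)     # tokens of B are in A
        and a_tokens[-1] == b_tokens[-1]                 # same head (last token)
    )

print(is_hyponym("private customer", "customer"))   # True
print(is_hyponym("customer service", "customer"))   # False: different head
```

The second call shows why the head condition matters: 'customer service' contains 'customer' but is a kind of service, not a kind of customer.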


Mining non-taxonomic relations

Relationships other than is-a relationships

E.g. linguistic processing may find that the word 'cost' frequently co-occurs with the words 'hotel', 'guest house', and 'youth hostel' in sentences like 'Costs at the youth hostel are $20 per night'

The relations (cost, hotel), (cost, guest house), and (cost, youth hostel) exist

The discovery algorithm computes support and confidence measures for these pairs, as well as for relationships at higher levels of abstraction, such as the one between accommodation and cost


Finding non-taxonomic relations

Based on the basic association rule algorithm [3]

Basic Association Rule Algorithm

Given

a set of transactions, T

each transaction is a set of items i1, i2, ..., in

Goal: compute association rules of the form i1 → i2

Trick: exploit the fact that many items appear together, so the occurrence of one implies the occurrence of another with high probability (confidence)


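Association Rule Mining

E.g. consider the transactions

(bread, butter, jam, chips)

(bread, butter, jam, ketchup)

(ketchup, chips)

(bread, butter, jam, chips)

(bread, rice)

E.g. the rule X → Y with X = {bread} and Y = {butter, jam}

Support = n(X ∪ Y) / N, where n(X ∪ Y) is the number of transactions containing every item of X and Y, and N is the total number of transactions

E.g. Support = 3/5

Confidence = n(X ∪ Y) / n(X), where n(X) is the number of transactions containing every item of X

E.g. Confidence = 3/4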
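
The two measures can be checked on the five transactions with a few lines of Python. The helper names `count`, `support`, and `confidence` are chosen for the example; exact fractions are used so that the output matches the hand calculation.

```python
from fractions import Fraction

transactions = [
    {"bread", "butter", "jam", "chips"},
    {"bread", "butter", "jam", "ketchup"},
    {"ketchup", "chips"},
    {"bread", "butter", "jam", "chips"},
    {"bread", "rice"},
]

def count(itemset):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)

def support(x, y):
    return Fraction(count(x | y), len(transactions))

def confidence(x, y):
    return Fraction(count(x | y), count(x))

x, y = {"bread"}, {"butter", "jam"}
print(support(x, y), confidence(x, y))  # 3/5 3/4
```

Bread appears in four transactions, and three of those also contain butter and jam, which gives the confidence of 3/4.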


Algorithm

1. Extend each transaction to include the ancestors of each of its items

E.g. include the word 'accommodation' in the transactions containing the word 'guest house'

2. Determine association rules of the form Xk → Yk where |Xk| = 1 and |Yk| = 1

3. Determine the confidence of all rules that exceed the user-defined support

4. Prune the rules subsumed by ancestral rules

E.g. if we found 2 rules, (cost, accommodation) and (cost, hotel), we prune the latter rule (cost, hotel)
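
The four steps can be sketched end to end on a toy example. Everything here is illustrative: the taxonomy in `PARENT`, the transactions, and the support threshold are hypothetical, and the pruning rule is the simple one stated above (drop a rule whenever a rule over its ancestors was also found).

```python
from itertools import permutations

# Hypothetical taxonomy: child concept -> parent concept.
PARENT = {
    "hotel": "accommodation",
    "guest house": "accommodation",
    "youth hostel": "accommodation",
}

def ancestors(item):
    found = []
    while item in PARENT:
        item = PARENT[item]
        found.append(item)
    return found

def mine_rules(transactions, min_support):
    # Step 1: extend each transaction with the ancestors of its items.
    extended = [set(t) | {a for item in t for a in ancestors(item)}
                for t in transactions]
    n = len(extended)
    items = sorted(set().union(*extended))
    rules = {}
    # Steps 2 and 3: single-item rules x -> y above the support threshold.
    for x, y in permutations(items, 2):
        if y in ancestors(x) or x in ancestors(y):
            continue  # an item trivially co-occurs with its own ancestor
        both = sum(1 for t in extended if x in t and y in t)
        if both / n >= min_support:
            rules[(x, y)] = both / sum(1 for t in extended if x in t)

    # Step 4: prune rules for which a rule over ancestors also exists.
    def subsumed(x, y):
        return any((ax, ay) in rules and (ax, ay) != (x, y)
                   for ax in [x] + ancestors(x)
                   for ay in [y] + ancestors(y))

    return {rule: conf for rule, conf in rules.items() if not subsumed(*rule)}

transactions = [
    {"cost", "hotel"},
    {"cost", "guest house"},
    {"cost", "youth hostel"},
    {"cost", "hotel"},
    {"city", "museum"},
]
rules = mine_rules(transactions, min_support=0.3)
print(rules)
```

With these inputs the rule (cost, hotel) passes the support threshold but is pruned in favour of (cost, accommodation), mirroring the example above.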


Statistics-based Extraction of Taxonomic Relations [12][13]

Uses hierarchical clustering.

Groups similar terms in a bottom-up fashion

Uses the cosine similarity function

The cosine measure, or normalized correlation coefficient, between two vectors x and y is given by

$\cos(x, y) = \frac{x \cdot y}{\lVert x \rVert \, \lVert y \rVert} = \frac{\sum_i x_i y_i}{\sqrt{\sum_i x_i^2}\,\sqrt{\sum_i y_i^2}}$


Algorithm


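Computation of similarity function

The similarity matrix is given by

Hotel vector = (0, 14, 7, 4, 6)

Accommodation vector = (14, 0, 11, 2, 5)

$\cos(\text{Hotel}, \text{Accommodation}) = \frac{0 \cdot 14 + 14 \cdot 0 + 7 \cdot 11 + 4 \cdot 2 + 6 \cdot 5}{\sqrt{297}\,\sqrt{346}} = \frac{115}{\sqrt{297}\,\sqrt{346}} \approx 0.36$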
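
A short sketch of the cosine measure applied to the two vectors, using the standard definition (dot product divided by the product of the Euclidean norms) over all five components; the function name `cosine` is chosen for the example.

```python
import math

def cosine(x, y):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

hotel = (0, 14, 7, 4, 6)
accommodation = (14, 0, 11, 2, 5)
similarity = cosine(hotel, accommodation)
print(round(similarity, 3))
```

In bottom-up clustering this value would be computed for every pair of terms, and the most similar pair merged first.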


Case study: Web-based Ontology Learning with ISOLDE

ISOLDE (Information System for Ontology Learning and Domain Exploration) produces a domain ontology from a base ontology

It uses the following

An unsupervised named entity recognition system

Web resources like DWDS, Wikipedia and Wiktionary.


Analysis steps used by ISOLDE

1. Named-entity recognition (NER)

uses a domain-specific corpus, a base ontology and a general-purpose NER system (SProUT, see Drozdzynski et al. 2004) to find instances for the classes in the base ontology.

2. Linguistic pattern analysis

extracts class candidates from the context of the instances found in step 1 by use of lexico-syntactic patterns

3. Collecting web-based knowledge

collects information on and between the extracted class candidates from online resources and integrates it into a new or extended taxonomy/ontology


Architecture


Stage-wise Examples

After step 1 we get named entities such as 'Ballack' and 'Munich' from the soccer corpus

In the second step we find the class candidates for the named entities from the sentences in the corpus and then filter the domain-specific candidates using the χ² method

E.g. 'Ballack, the best midfielder in the German national team' gives 'midfielder' as the class candidate of 'Ballack'.

In the third step we search the web for the class candidates; the Wikipedia definition of 'midfielder' is

A midfielder is a player whose position of play is midway between the attacking strikers and the defenders
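
A definition sentence of this shape can be mined with a simple 'An X is a Y' pattern. The sketch below is an assumption about how such a step could look, not ISOLDE's actual implementation; it only captures a single-word genus term, and the names `DEFINITION` and `is_a_from_definition` are made up for the example.

```python
import re

# "A/An <term> is a/an <genus> ..." at the start of a definition sentence.
DEFINITION = re.compile(r"^An? (?P<term>[\w ]+?) is an? (?P<genus>\w+)",
                        re.IGNORECASE)

def is_a_from_definition(sentence):
    """Return (term, genus) if the sentence matches the pattern, else None."""
    match = DEFINITION.match(sentence)
    if match is None:
        return None
    return (match.group("term").lower(), match.group("genus").lower())

definition = ("A midfielder is a player whose position of play is midway "
              "between the attacking strikers and the defenders")
print(is_a_from_definition(definition))  # ('midfielder', 'player')
```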


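Example contd.

  • We learn the relation 'midfielder is a player' (a taxonomic relationship)

  • Relevance factor χ²

  • $\chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$, where O holds the observed and E the expected frequencies

  • O matrix for striker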
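
A χ² relevance score compares observed counts O with the counts E expected if a term were equally common everywhere. The sketch below uses the standard Pearson statistic on a 2×2 table; the table layout and the counts are hypothetical, since the original matrix is not available here.

```python
def chi_square(observed):
    """Pearson chi-square statistic for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected count under independence of row and column.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (o - expected) ** 2 / expected
    return statistic

# Hypothetical counts.
# Rows: the candidate term / all other terms.
# Columns: domain corpus / reference corpus.
observed = [[30, 10],
            [70, 90]]
print(chi_square(observed))  # 12.5
```

A high value indicates that the term is distributed very differently in the domain corpus and the reference corpus, which is the signal used to keep domain-specific candidates.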


Issues in Learning

human-understandable vs. machine-understandable

learning higher-degree relations

mapping to a high-level ontology

evaluation benchmarks

incremental ontology learning

multi-agent learning


Application of ontology

Ontologies are ubiquitous in information systems [2]

improving the performance of information retrieval and reasoning

making data interoperable between different applications

ontology-type semantic descriptions of behaviors and services allow software agents in a multi-agent system to coordinate themselves better


References

[1] Elias Zavitsanos, Georgios Paliouras, George Vouros, Ontology Learning and Evaluation: A Survey, Technical Report, 2006.

[2] Nicolas Weber, Paul Buitelaar, Web-based Ontology Learning with ISOLDE, DFKI GmbH - Language Technology Lab, Saarbrücken, Germany, 2006.

[3] Alexander Maedche and Steffen Staab, Mining Ontologies from Text, 2000.

[4] Alexander Maedche, Viktor Pekar, and Steffen Staab, Ontology Learning Part One-On Discovering Taxonomic Relations from the Web, 2003.


[5] K. Frantzi, S. Ananiadou, and H. Mima. Automatic recognition of multi-word terms: The C-value/NC-value method. International Journal on Digital Libraries, 3(2):115–130, 2000.

[6] G. Salton, A. Wong, and C.S. Yang. A vector space model for automatic indexing. Communications of the ACM, 18(11):613–620, 1975.

[7] D.I. Moldovan and R.C. Girju. An interactive tool for the rapid development of knowledge bases. International Journal on Artificial Intelligence Tools (IJAIT), 10(1-2), 2001


[8] J.D. Cohen. Highlights: Language and domain independent automatic indexing terms for abstracting. Journal of the American Society for Information Science, 46(3):162–174, 1995.

[9] U. Heid. A linguistic bootstrapping approach to the extraction of term candidates from german text. Terminology, 5(2):161–181, 1998.

[10] L.M. Iwanska, N. Mata, and K. Kruger. Fully Automatic Acquisition of Taxonomic Knowledge from Large Corpora of Texts, pages 335–345. MIT/AAAI Press, 2000.


[11] J.U. Kietz, A. Maedche, and R. Volz. A Method for Semi-Automatic Ontology Acquisition from a Corporate Intranet. Juan-Les-Pins, France, 2000.

[12] A. Maedche, V. Pekar, and S. Staab. Ontology learning part one - on discovering taxonomic relations from the web. In Proceedings of the Web Intelligence conference. Springer Verlag, 2002.

[13] Vincent Schickel-Zuber, Boi Faltings. Using hierarchical clustering for learning the ontologies used in recommendation systems. KDD 2007: 599-608.

[14] A. Maedche and S. Staab. Discovering Conceptual Relations from Text. In Proceedings of ECAI 2000, IOS Press, Amsterdam, 2000.


Thank You