TANGO: Table ANalysis for Generating Ontologies


TANGO Table ANalysis for Generating Ontologies. Yuri A. Tijerino*, David W. Embley*, Deryle W. Lonsdale* and George Nagy** * Brigham Young University ** Rensselaer Polytechnic Institute. List of contents. Motivation Applications Table understanding Concept matching


TANGO: Table ANalysis for Generating Ontologies

Yuri A. Tijerino*,

David W. Embley*,

Deryle W. Lonsdale* and

George Nagy**

* Brigham Young University

** Rensselaer Polytechnic Institute



List of contents

  • Motivation

  • Applications

  • Table understanding

  • Concept matching

  • Ontology merging/growing

  • Example

  • Future direction



Motivation

  • Semi-automated ontological engineering through Table Analysis for Generating Ontologies (TANGO)

  • Keyword search and link analysis are not enough to find information in tables

  • Structure in tables can lead to domain knowledge, which includes concepts, relationships and constraints (ontologies)

  • Tables on the web, created for human use, can lead to robust domain ontologies



TANGO Applications

  • Extraction ontologies (generation)

  • Data integration

  • Semantic web

  • Multiple-source query processing

  • Document image analysis for documents that contain tables



Table understanding

  • What is a table?

  • Why table normalization?

  • What is table understanding?

  • What is mini-ontology generation?


Table understanding: What is a table?

  • “…a two-dimensional assembly of cells used to present information…”

    • Lopresti and Nagy

  • Normalized tables (row-column format)

  • Small tables intended for human use, either on paper (read using OCR) or electronic (marked up)


Table understanding: What is table normalization?

Table normalization means taking any table and producing a standard row-column table in which every data cell contains an expanded value and type information.

[Figure: a raw table and the corresponding normalized table]

Table understanding: Information useful for normalization

  • Captions – in vicinity of table (above, below etc)

  • Footnotes – on annotated column labels or data cells

  • Embedded information – in rows, columns or cells (e.g., $, %, (1,000), billions, etc.)

  • Links to other views of the table, possibly with new information
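A minimal sketch of how embedded information could be expanded into a value plus type information. The function name `expand_value`, the `SCALE` table and the type labels are hypothetical, chosen only for illustration; the slides do not describe the actual normalization code.

```python
import re

# Hypothetical scale words that may appear in a cell or in its column header.
SCALE = {"thousand": 1e3, "thousands": 1e3,
         "million": 1e6, "millions": 1e6,
         "billion": 1e9, "billions": 1e9}

# Optional "$", a number with optional commas/decimals, optional "%" or word suffix.
NUMBER = re.compile(r"^\$?\s*(\d[\d,]*(?:\.\d+)?)\s*(%|[A-Za-z]+)?$")

def expand_value(cell, header=""):
    """Return (expanded value, type name) for one raw cell string."""
    text = cell.strip()
    if text in {"", "?"}:
        return None, "unknown"
    m = NUMBER.match(text)
    if m is None:
        return text, "string"
    value = float(m.group(1).replace(",", ""))
    suffix = (m.group(2) or "").lower()
    # A scale word may sit next to the number or in the column header.
    for word in [suffix] + re.findall(r"[a-z]+", header.lower()):
        if word in SCALE:
            value *= SCALE[word]
            break
    if text.startswith("$") or "$" in header:
        kind = "dollar amount"
    elif suffix == "%" or "%" in header:
        kind = "percentage"
    else:
        kind = "number"
    return value, kind
```

For example, `expand_value("21", header="GDP/PPP ($ billions)")` uses the header to recover both the scale and the type of the cell.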



What is table understanding?

  • Normalize table

  • Take a table as an input and produce standard records in the form of attribute-value pairs as output

  • Discover constraints among columns

  • Understand the data values

Example output record:

{<Country: Afghanistan>, <GDP/PPP: $21,000,000,000>, <GDP/PPP per capita: $800>, <Real-growth rate: ?>, <Inflation: ?>}

Constraints among columns (the left-most column is the primary key):

{has(Country, GDP/PPP), has(Country, GDP/PPP per capita), has(Country, Real-growth rate*), has(Country, Inflation*)}

Data values understood from data frames:

  • Country names

  • Dollar amount

  • Percentage
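The step from a normalized table to attribute-value records and has(...) relationships can be sketched as follows. This is an illustration only: `to_records` and `has_relationships` are hypothetical names, and reading the asterisk as "value may be unknown" is an assumption, since the slides do not define it.

```python
UNKNOWN = "?"  # marker used for unknown values in the example record

def to_records(header, rows):
    """Turn each row of a normalized table into attribute-value pairs."""
    return [dict(zip(header, row)) for row in rows]

def has_relationships(header, rows):
    """Relate the left-most column (assumed primary key) to every other column."""
    key = header[0]
    keys = [row[0] for row in rows]
    if len(set(keys)) != len(keys):
        raise ValueError("left-most column is not a key")
    rels = []
    for i, attr in enumerate(header[1:], start=1):
        # Assumption: '*' marks a column in which some value is unknown.
        optional = any(row[i] == UNKNOWN for row in rows)
        rels.append(f"has({key}, {attr}{'*' if optional else ''})")
    return rels

header = ["Country", "GDP/PPP", "GDP/PPP per capita", "Real-growth rate", "Inflation"]
rows = [["Afghanistan", "$21,000,000,000", "$800", "?", "?"]]
records = to_records(header, rows)
relationships = has_relationships(header, rows)
```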


Example: Creating a domain ontology

[Figure: domain ontology diagram. Concepts: Geopolitical Entity, Name, Location, Longitude, Latitude, Time, Country, City. Relationship labels: "names", "has", "Has GMT", "Duration between Time zones". Annotations: "Latitude and longitude designates location", "Distances", "Includes procedural knowledge", "Has associated data frames".]


Example: Table understanding to mini-ontology generation

[Figure: a table and the mini-ontology generated from it. Concepts: Agglomeration, Population, Country, Continent.]


Example: Concept matching to ontology merging

[Figure: the mini-ontology and the domain ontology before merging, and the merge results. Concepts shown: Longitude, Latitude, Agglomeration, Population, Country, Continent, Name, Geopolitical Entity, Location, Time, City. Labels: "Latitude and longitude designates location", "names", "has", "Has GMT", "Merge", "Results".]



Concept matching

  • We use exhaustive concept matching techniques to match concepts from different mini-ontologies, including:

    • Lexical and Natural Language Processing

    • Value Similarity

    • Value Features

    • Data Frame Comparison

    • Constraints



Concept Matching (Lexical & NLP)

  • Lexical

    • Direct comparisons (substring/superstring)

    • WordNet (Synonyms, Word Senses, Hypernyms/Hyponyms)

  • Natural Language Processing

    • Phrases in column headers

    • Footnotes (for columns, rows, values)

    • Explanations of symbols, rows, columns

    • Titles and subtitles
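The direct substring/superstring comparison can be sketched in a few lines. `normalize` and `lexical_match` are hypothetical helper names; WordNet lookups and the NLP steps are not covered here.

```python
import re

def normalize(label):
    """Lower-case a label and keep only alphanumeric tokens."""
    return " ".join(re.findall(r"[a-z0-9]+", label.lower()))

def lexical_match(a, b):
    """Classify how label a relates to label b by direct string comparison."""
    na, nb = normalize(a), normalize(b)
    if not na or not nb:
        return "none"
    if na == nb:
        return "equal"
    if na in nb:
        return "substring"    # a is contained in b
    if nb in na:
        return "superstring"  # a contains b
    return "none"
```

With the column labels from the examples, `lexical_match("City", "City/town")` reports a substring match, while `lexical_match("Latitude", "Longitude")` reports none.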



Concept Matching (Value Similarity)

  • Compute overlap for string values comparing data sets

  • Compute overlap for numeric values comparing Gaussian Probability Distributions

  • Compute similarity of numeric values using regression


Concept Matching (Value Similarity)

Real-world example

  • Total of 193 cells in A

  • Total of 267 cells in B

  • 77 fields in B not in A

  • 3 fields in A not in B

  • 190 total matches

  • Proportion of matches with respect to A = 190/193 = 98%

  • Proportion of matches with respect to B = 190/267 = 71%

[Figure: overlap of value sets A and B, with regions labeled "In A not in B" and "In B not in A"]
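The proportions in this example follow from plain set arithmetic. The sketch below reproduces the counts with synthetic stand-in values (integers, not the real cell contents, which the slides do not list); `overlap` is a hypothetical helper name.

```python
def overlap(a_values, b_values):
    """Compare two columns of string values as sets."""
    a, b = set(a_values), set(b_values)
    common = a & b
    return {
        "matches": len(common),
        "only_a": len(a - b),
        "only_b": len(b - a),
        "wrt_a": len(common) / len(a) if a else 0.0,
        "wrt_b": len(common) / len(b) if b else 0.0,
    }

# Synthetic stand-ins sized like the example: 193 values in A, 267 in B, 190 shared.
a = [str(i) for i in range(193)]
b = [str(i) for i in range(3, 270)]
result = overlap(a, b)
```

The asymmetry between the two proportions is the useful signal: nearly all of A appears in B, but B holds many values that A lacks.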


Concept Matching (Value Similarity)

Gaussian PDF

  • Total of 170 cells in A

  • Total of 240 cells in B

  • 50 fields in B not in A

  • 2 fields in A not in B

  • 168 total matches

  • Proportion of matches with respect to A = 168/170 = 99%

  • Proportion of matches with respect to B = 168/240 = 70%

[Figure: overlap of value sets A and B, with regions labeled "In A not in B" and "In B not in A"]
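For numeric columns, one way to compare Gaussian probability distributions is the overlapping coefficient of two normal distributions fitted to the values. The slides do not say which measure TANGO uses, so the choice below is an assumption; it relies on `statistics.NormalDist` from the Python standard library (3.8 or later), and `gaussian_overlap` is a hypothetical name.

```python
from statistics import NormalDist

def gaussian_overlap(a_values, b_values):
    """Fit a normal distribution to each column and return their overlap in [0, 1].

    Each column needs at least two values with non-zero spread.
    """
    dist_a = NormalDist.from_samples(a_values)
    dist_b = NormalDist.from_samples(b_values)
    return dist_a.overlap(dist_b)

same = gaussian_overlap([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
far_apart = gaussian_overlap([1, 2, 3], [1001, 1002, 1003])
```

Identical columns give an overlap of 1, and columns on very different scales give an overlap close to 0.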



Concept Matching (Value Features)

  • We can also compute similarities from value characteristics such as:

    • Character/numeric length, ratio

    • Numeric values mean, variance, standard deviation
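These characteristics are cheap to compute per column. A minimal sketch, with hypothetical function names and feature keys, and with population variance and standard deviation chosen as an assumption:

```python
import statistics

def text_features(values):
    """Length and digit-ratio features for a column of cell strings."""
    strings = [str(v) for v in values]
    lengths = [len(s) for s in strings]
    total = sum(lengths)
    digits = sum(ch.isdigit() for s in strings for ch in s)
    return {
        "mean_length": statistics.mean(lengths),
        "digit_ratio": digits / total if total else 0.0,
    }

def numeric_features(numbers):
    """Mean, variance and standard deviation for a column of numbers."""
    return {
        "mean": statistics.mean(numbers),
        "variance": statistics.pvariance(numbers),
        "stdev": statistics.pstdev(numbers),
    }
```

Two columns whose feature values are close are candidates for the same concept, even when their headers differ.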



Concept Matching (Data frames)

  • Snippets of real-world knowledge about data (type, length, nearby keywords, patterns [as in regexps], functional, etc)

  • We have used data frames to

    • Recognize data types

    • Include recognizers for values (dates, times, longitude, latitude, countries, cities, etc)

    • Provide conversion routines

    • Match headers, labels, footnotes and values

    • Compose or split columns (e.g., addresses)
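A data frame's value recognizers can be approximated with regular expressions. The patterns, type names and the 0.8 threshold below are hypothetical and much simpler than real data frames, which also carry keywords, conversion routines and other knowledge.

```python
import re
from collections import Counter

# Hypothetical recognizers, one regular expression per data type.
RECOGNIZERS = {
    "dollar amount": re.compile(r"^\$\d{1,3}(,\d{3})*(\.\d+)?$"),
    "percentage": re.compile(r"^-?\d+(\.\d+)?%$"),
    "latitude": re.compile(r"^\d{1,2}(\.\d+)?\s*[NS]$"),
    "longitude": re.compile(r"^\d{1,3}(\.\d+)?\s*[EW]$"),
}

def recognize(value):
    """Return the first data type whose pattern matches the value, else None."""
    for name, pattern in RECOGNIZERS.items():
        if pattern.match(value.strip()):
            return name
    return None

def column_type(values, threshold=0.8):
    """Assign a type to a column if enough of its values are recognized as it."""
    counts = Counter(recognize(v) for v in values)
    counts.pop(None, None)
    if not counts:
        return None
    name, hits = counts.most_common(1)[0]
    return name if hits / len(values) >= threshold else None
```

Two columns recognized as the same type, for example both as dollar amounts, give evidence that they describe the same concept.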



Concept Matching (Constraints)

  • Keys in tables (as well as nonkeys)

  • Functional relationships

  • 1-1, 1-*, *-1 or *-* correspondences

  • Subset/superset of value sets

  • Unknown and null values
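Keys, functional relationships and correspondences can all be checked directly against the records. The sketch below uses hypothetical helper names and a small illustrative table; the "x-y" cardinality labels follow the convention that "*-1" means many values of the first column map to one value of the second.

```python
def is_key(rows, col):
    """A column is a key if no value repeats."""
    values = [row[col] for row in rows]
    return len(set(values)) == len(values)

def determines(rows, x, y):
    """True if each value of column x is paired with exactly one value of column y."""
    seen = {}
    for row in rows:
        if seen.setdefault(row[x], row[y]) != row[y]:
            return False
    return True

def correspondence(rows, x, y):
    """Classify the correspondence between columns x and y."""
    forward, backward = determines(rows, x, y), determines(rows, y, x)
    if forward and backward:
        return "1-1"
    if forward:
        return "*-1"
    if backward:
        return "1-*"
    return "*-*"

# Illustrative rows, not taken from the slides.
rows = [
    {"City": "Tokyo", "Country": "Japan"},
    {"City": "Osaka", "Country": "Japan"},
    {"City": "Paris", "Country": "France"},
]
```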



Ontology merging/growing

  • Direct merge (no conflicts)

    • Use results of matching phase to find similar concepts in ontologies (e.g., data value similarities, data frames, NLP, etc)

  • Conflict resolution

    • Interactively identify evidence and counter evidence of functional relationships among mini-ontologies using constraint resolution

  • IDS (Issues, Default strategy, Suggestions) interaction with a human knowledge engineer

    • Issues – identify

    • Default strategy – apply

    • Suggestions – make
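A direct merge with no conflicts can be pictured as renaming matched concepts and taking the union of relationships. This is a sketch under strong assumptions: ontologies are reduced to (subject, relationship, object) triples, the triples and the match below are hypothetical, and conflict resolution and IDS interaction are left out.

```python
def direct_merge(target, source, matches):
    """Merge source into target, renaming source concepts that matched a target concept.

    target, source: sets of (subject, relationship, object) triples.
    matches: dict mapping a source concept name to the matched target concept name.
    """
    renamed = {
        (matches.get(subj, subj), rel, matches.get(obj, obj))
        for subj, rel, obj in source
    }
    return set(target) | renamed

# Hypothetical triples and match result, for illustration only.
target = {("City", "has", "Location")}
source = {("Agglomeration", "has", "Population")}
merged = direct_merge(target, source, {"Agglomeration": "City"})
```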


Example: Another mini-ontology generation

[Figure: a second table and the mini-ontology generated from it. Concepts: Longitude, Latitude, Place Name, Elevation, State, USGS Quad, Place, Area, Country, City/town, Lake, Reservoir, Mine.]


Example: Another mini-ontology generation

[Figure: the second mini-ontology (Longitude, Latitude, Place Name, Elevation, State, USGS Quad, Place, Area, Country, City/town, Lake, Reservoir, Mine) about to be merged with the growing ontology (Longitude, Latitude, Population, Name, Geopolitical Entity, Location, Time, Continent, Country, Agglomeration, City). Labels: "Merge", "Latitude and longitude designates location", "names", "has", "has GMT".]


Example: Concept Mapping to Ontology Merging

[Figure: the merged ontology. Concepts: Longitude, Latitude, Population, Name, Geopolitical Entity, Geopolitical Entity with population, Location, Time, Elevation, State, USGS Quad, Place, Area, Country, Continent, Agglomeration, City/town, Lake, Reservoir, Mine. Labels: "Latitude and longitude designates location", "names", "has", "has GMT".]



Future direction

  • Start with multiple tables (or URLs) and generate mini-ontologies

  • Identify most suitable mini-ontologies to merge by calculating which tables have most overlap of concepts
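Choosing which mini-ontologies to merge first could be sketched as a pairwise overlap score. Jaccard similarity over exact concept names is an assumption made here for simplicity; the concept sets are read off the example slides, and the dictionary keys are hypothetical labels.

```python
from itertools import combinations

def jaccard(a, b):
    """Share of concepts two ontologies have in common."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def best_pair(ontologies):
    """Return the names of the two ontologies with the highest concept overlap."""
    return max(
        combinations(sorted(ontologies), 2),
        key=lambda pair: jaccard(ontologies[pair[0]], ontologies[pair[1]]),
    )

ontologies = {
    "geo": {"Longitude", "Latitude", "Name", "Geopolitical Entity",
            "Location", "Time", "Country", "City"},
    "agglomerations": {"Agglomeration", "Population", "Country", "Continent"},
    "places": {"Longitude", "Latitude", "Place Name", "Elevation", "State",
               "USGS Quad", "Place", "Area", "Country", "City/town",
               "Lake", "Reservoir", "Mine"},
}
pair = best_pair(ontologies)
```

A fuller version would score overlap with the concept matching techniques above rather than exact name equality, so that "City" and "City/town" would also count as shared.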

  • Generate multiple domain ontologies

  • Integrate with form-based data extraction tools (smarter Web search engines)

