Design and Creation of Ontologies for Environmental (Multimedia) Information Retrieval *. Vipul Kashyap National Library of Medicine kashyap@nlm.nih.gov Workshop on Science and the Semantic Web October 24, 2002. * Work done by the author when at MCC and LSDIS Lab, UGA. Outline.

Design and Creation of Ontologies for Environmental (Multimedia) Information Retrieval*

Vipul Kashyap
National Library of Medicine
kashyap@nlm.nih.gov
Workshop on Science and the Semantic Web
October 24, 2002

* Work done by the author when at MCC and LSDIS Lab, UGA

Outline
  • Ontologies for Information Retrieval: The InfoSleuth System
  • The Ontology Design Process:
    • “Reverse Engineering” from a database schema
    • Ontology refinement based on user queries
    • Using a data dictionary and Thesaurus
  • Ontology-based Multimedia Information Retrieval
    • Information Extraction from Textual Data
    • Information Extraction from Image Data
  • Conclusions and Future Work
Ontologies for Information Retrieval: The InfoSleuth System

[Architecture diagram; labels as extracted]
  • KQML/OKBC
  • agents
  • Ontology-based retrieval query
  • Image Database: features, patterns, semantic objects
  • Document Database, e.g., Verity
  • Structured Database, e.g., Oracle

A Multimedia GIS Query using an ontological model

[Ontology diagram; labels as extracted]
  • Entities: Region, Fire
  • Relationship: isLocatedNear
  • Attributes: county, block, area, population, spatial_location, land_cover, containment, name

Get me all regions (blocks, counties) having a population greater than 500 and area greater than 50 acres, having an urban land cover, and such that all the nearby fires have excellent containment

select county, block, spatial_location
from region
where area > 50 and population > 500
and land_cover = “urban”
and region.isLocatedNear.containment = “excellent”
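
The query above can be read as a filter over Region objects that follows the isLocatedNear link to Fire objects. The following is a minimal sketch of that reading, not the InfoSleuth implementation. The class layout, the assignment of attributes to Region and Fire, and the sample data are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Fire:
    name: str
    containment: str


@dataclass
class Region:
    county: str
    block: str
    spatial_location: str
    area: float        # acres
    population: int
    land_cover: str
    is_located_near: List[Fire] = field(default_factory=list)


def matches(region: Region) -> bool:
    """Apply the query predicates; the path expression is read as 'all nearby fires'."""
    return (
        region.area > 50
        and region.population > 500
        and region.land_cover == "urban"
        and all(f.containment == "excellent" for f in region.is_located_near)
    )


def run_query(regions: List[Region]) -> List[Tuple[str, str, str]]:
    """Project the selected columns for every matching region."""
    return [(r.county, r.block, r.spatial_location) for r in regions if matches(r)]


# Hypothetical sample data, invented for this sketch.
regions = [
    Region("Travis", "B1", "30.3N,97.7W", 120.0, 900, "urban",
           [Fire("F1", "excellent")]),
    Region("Hays", "B2", "30.0N,98.0W", 80.0, 700, "urban",
           [Fire("F2", "excellent"), Fire("F3", "poor")]),
    Region("Bastrop", "B3", "30.1N,97.3W", 40.0, 1000, "urban", []),
]
result = run_query(regions)
```

Note that `all(...)` is true for a region with no nearby fires; whether such regions should qualify is a design choice the slide does not settle.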

Ontologies for Information Retrieval
  • Provide a concise, uniform, declarative description of semantic information
  • Independent of the syntactic representations and conceptual models of the underlying information bases
  • Domain models provide wider access by supporting multiple world views on the same underlying data
  • EDEN ontology defined in the context of the InfoSleuth system:
    • important and crucial to capture elements of environmental information
Sources for Ontology construction
  • Pre-existing Database Schemas
    • data directed component
  • Collection of a representative set of queries, possibly parameterized based on the application user interface
    • application directed component
  • Thesauri and Vocabularies (e.g., EEA Thesaurus)
    • knowledge directed component
  • Ontology = knowledge-based middle ground between applications and data !!!
The Ontology Design Process

[Flowchart with two parts, “Ontology from Database Schema” and “Ontology from Queries”; step labels as extracted]
  • Abstract details from Database Schema
  • Choose new Database Schema
  • Determine Relationships
  • Determine entities and attributes
  • Group information, Analyze foreign keys and dependencies
  • Drop entities and attributes
  • Implement and Test
  • Evaluate Ontology
  • Add new entities and attributes
  • Add new subclasses and superclasses
  • Choose new query
  • No more queries

Environmental Databases
  • CERCLIS 3
    • http://www.epa.gov/enviro/html/cerclis/
  • ITT
  • HAZDAT
    • http://www.atsdr.cdc.gov/hazdat.html
  • ERPIMS
    • http://ns1.ktc.com/personal/larnold/erpims.htm
  • Basel Convention Database
    • http://www.unep.ch/basel
Grouping Information in Multiple Tables

Database Schema
  • Site: site_id (PK), site_name, site_ifms_ssid_code, site_rcra_id, site_epa_id
  • Site_Characteristic: site_id (PK, FK to Site), rsic_code (PK, FK to Ref_Sic), sc_date
  • Ref_Sic: rsic_code (PK), rsic_code_desc
  • Site_Alias: site_id (PK, FK to Site), site_alias_id (PK), sa_name

Ontology
  • Site: code, name, date, alias_name, description
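
One way to picture the grouping step is as a join that presents the four tables as a single Site entity. The sketch below uses an in-memory SQLite database. The correspondence between ontology attributes and columns (code from rsic_code, date from sc_date, alias_name from sa_name, description from rsic_code_desc) is an assumption, the sample rows are invented, and some Site columns are omitted for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript("""
CREATE TABLE Site (site_id INTEGER PRIMARY KEY, site_name TEXT);
CREATE TABLE Ref_Sic (rsic_code TEXT PRIMARY KEY, rsic_code_desc TEXT);
CREATE TABLE Site_Characteristic (
    site_id   INTEGER REFERENCES Site(site_id),
    rsic_code TEXT REFERENCES Ref_Sic(rsic_code),
    sc_date   TEXT,
    PRIMARY KEY (site_id, rsic_code));
CREATE TABLE Site_Alias (
    site_id       INTEGER REFERENCES Site(site_id),
    site_alias_id INTEGER,
    sa_name       TEXT,
    PRIMARY KEY (site_id, site_alias_id));
-- Hypothetical sample rows
INSERT INTO Site VALUES (1, 'Example Landfill');
INSERT INTO Ref_Sic VALUES ('4953', 'Refuse systems');
INSERT INTO Site_Characteristic VALUES (1, '4953', '1997-08-01');
INSERT INTO Site_Alias VALUES (1, 1, 'Old Dump');
""")

# Assumed attribute mapping from the schema to the ontology's Site entity.
SITE_VIEW = """
SELECT sc.rsic_code      AS code,
       s.site_name       AS name,
       sc.sc_date        AS "date",
       sa.sa_name        AS alias_name,
       rs.rsic_code_desc AS description
FROM Site s
LEFT JOIN Site_Characteristic sc ON sc.site_id = s.site_id
LEFT JOIN Ref_Sic rs             ON rs.rsic_code = sc.rsic_code
LEFT JOIN Site_Alias sa          ON sa.site_id = s.site_id
"""

rows = [dict(r) for r in conn.execute(SITE_VIEW)]
```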

Identifying Relationships

Database Schema
  • Site: site_id (PK), site_name, site_ifms_ssid_code, site_rcra_id, site_epa_id
  • Action: site_id (PK, FK to Site), rat_code (PK, FK to ref_action_type), act_code_id (PK)
  • Ref_action_type: rat_code (PK), rat_name, rat_def
  • Waste_Src_Media_Contaminated: wsmrc_nmbr (PK), site_id (PK, FK to Action), rat_code (FK to Action), act_code_id (FK to Action)
  • Remedial_Response: site_id, act_code_id, rat_code

Ontology
  • Entities: Site, RemedialResponse, Contaminant
  • Attribute: actionName
  • Relationship: PerformedAt
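
Foreign keys are a natural starting point for proposing relationships. The sketch below is a simplified illustration, not the procedure used by the author. It lists each (referencing table, referenced table) pair from the schema above as a candidate relationship. Naming a candidate, for example calling the Action-to-Site link PerformedAt, is left to the designer, and that particular pairing is an assumption.

```python
from typing import Dict, List, Tuple

# Table -> {foreign-key column: referenced table}, transcribed from the schema above.
FOREIGN_KEYS: Dict[str, Dict[str, str]] = {
    "Site": {},
    "Ref_action_type": {},
    "Action": {"site_id": "Site", "rat_code": "Ref_action_type"},
    "Waste_Src_Media_Contaminated": {
        "site_id": "Action",
        "rat_code": "Action",
        "act_code_id": "Action",
    },
}


def candidate_relationships(fks: Dict[str, Dict[str, str]]) -> List[Tuple[str, str]]:
    """Return sorted (referencing_table, referenced_table) pairs, one per distinct target."""
    pairs = {(table, target) for table, cols in fks.items() for target in cols.values()}
    return sorted(pairs)


candidates = candidate_relationships(FOREIGN_KEYS)

# Hypothetical naming chosen by a designer; the endpoints are an assumption.
RELATIONSHIP_NAMES = {("Action", "Site"): "PerformedAt"}
named = [(RELATIONSHIP_NAMES.get(pair, "unnamed"), pair) for pair in candidates]
```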

Ontology refinement based on user queries
  • Addition of New Attributes
    • At NPL sites with a land use category of INDUSTRIAL, what is the cleanup level range for LEAD ….
    • Add an attribute landUseCategory to the entity Site in the ontology
  • Addition of new Relationships
    • What is the range of concentrations where ARSENIC is a contaminant of concern in the SURFACE SOIL at NPL sites?
    • Add a relationship HasContaminant between the entities Site and Contaminant in the ontology
  • Addition of class-subclass relationships and new entities
    • How many Superfund sites are in Edison County, New Jersey?
    • Add an entity SuperFundSite as a subclass of Site in the ontology
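
The three kinds of refinement listed above can be sketched as operations on a small in-memory ontology. The `Ontology` and `Entity` classes below are hypothetical helpers written for this illustration. Only the names landUseCategory, HasContaminant, Site, Contaminant and SuperFundSite come from the slide.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


@dataclass
class Entity:
    name: str
    attributes: Set[str] = field(default_factory=set)
    superclass: Optional[str] = None


@dataclass
class Ontology:
    entities: Dict[str, Entity] = field(default_factory=dict)
    # Each relationship is stored as (name, source entity, target entity).
    relationships: Set[Tuple[str, str, str]] = field(default_factory=set)

    def add_entity(self, name: str, superclass: Optional[str] = None) -> None:
        if superclass is not None and superclass not in self.entities:
            raise KeyError(f"unknown superclass: {superclass}")
        self.entities[name] = Entity(name, superclass=superclass)

    def add_attribute(self, entity: str, attribute: str) -> None:
        self.entities[entity].attributes.add(attribute)

    def add_relationship(self, name: str, source: str, target: str) -> None:
        for entity in (source, target):
            if entity not in self.entities:
                raise KeyError(f"unknown entity: {entity}")
        self.relationships.add((name, source, target))


# Starting point: entities already derived from the database schema.
onto = Ontology()
onto.add_entity("Site")
onto.add_entity("Contaminant")

# Refinements driven by the three example queries.
onto.add_attribute("Site", "landUseCategory")
onto.add_relationship("HasContaminant", "Site", "Contaminant")
onto.add_entity("SuperFundSite", superclass="Site")
```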
Using a data dictionary (EDR) to enhance the ontology

[Diagram; labels as extracted]
  • Site (attribute: state)
  • Map: coding_scheme1, coding_scheme2, coding_scheme3
  • StateName, StateCode, StateAbbr
  • Example values: { “Texas”, “California” }, { “TX”, “CA” }

select * from Site where state = ‘TX’ or state = ‘California’
select coding_scheme1 from Map where coding_scheme3 = ‘TX’
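
The Map table lets a query written with one coding of a state match rows stored with another. The sketch below uses an in-memory SQLite database. It assumes that coding_scheme1 holds the state name, coding_scheme2 a numeric state code and coding_scheme3 the abbreviation. The helper functions and sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Map (coding_scheme1 TEXT, coding_scheme2 TEXT, coding_scheme3 TEXT);
INSERT INTO Map VALUES ('Texas', '48', 'TX'), ('California', '06', 'CA');
CREATE TABLE Site (site_name TEXT, state TEXT);
-- Hypothetical rows that mix state codings
INSERT INTO Site VALUES ('Site A', 'TX'), ('Site B', 'California'),
                        ('Site C', 'Texas'), ('Site D', 'CA');
""")


def to_state_name(value: str) -> str:
    """Translate any known coding of a state to coding_scheme1 (assumed: full name)."""
    row = conn.execute(
        "SELECT coding_scheme1 FROM Map "
        "WHERE ? IN (coding_scheme1, coding_scheme2, coding_scheme3)",
        (value,),
    ).fetchone()
    return row[0] if row else value  # unknown values pass through unchanged


def sites_in(state: str) -> list:
    """Return site names whose stored state denotes the same state as the argument."""
    wanted = to_state_name(state)
    rows = conn.execute("SELECT site_name, state FROM Site ORDER BY site_name").fetchall()
    return [name for name, stored in rows if to_state_name(stored) == wanted]
```

With this normalization, `sites_in("TX")` and `sites_in("Texas")` return the same sites.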

Enhancing the Ontology by using a Thesaurus

Thesaurus entry:
abandoned site
THEME POLLUTION
BT land setup
NT disused military site

[Ontology diagram; class labels as extracted]
  • LandSetup
  • Site
  • AbandonedSite
  • DisusedMilitarySite
  • SuperfundSite
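
A thesaurus entry of this shape can be turned into class-subclass pairs by reading BT (broader term) as a superclass and NT (narrower term) as a subclass. The sketch below assumes one tag per line and a camel-case naming convention like the class labels on the slide. The parser is a hypothetical illustration, not a reader for any particular thesaurus file format.

```python
from typing import List, Tuple

ENTRY = """abandoned site
THEME POLLUTION
BT land setup
NT disused military site"""


def camel(term: str) -> str:
    """'abandoned site' -> 'AbandonedSite'."""
    return "".join(word.capitalize() for word in term.split())


def subclass_pairs(entry: str) -> List[Tuple[str, str]]:
    """Return (subclass, superclass) pairs implied by the BT/NT lines of one entry."""
    lines = [line.strip() for line in entry.splitlines() if line.strip()]
    term = camel(lines[0])
    pairs = []
    for line in lines[1:]:
        tag, _, value = line.partition(" ")
        if tag == "BT":      # broader term: the entry is a subclass of it
            pairs.append((term, camel(value)))
        elif tag == "NT":    # narrower term: it is a subclass of the entry
            pairs.append((camel(value), term))
    return pairs


pairs = subclass_pairs(ENTRY)
```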

Information Extraction from Text and Multimedia Data

[Ontology diagram; labels as extracted]
  • Entities: Region, Fire
  • Relationship: isLocatedNear
  • Attributes: county, block, area, population, spatial_location, land_cover, containment, name

Get me all regions (blocks, counties) having a population greater than 500 and area greater than 50 acres, having an urban land cover, and such that all the nearby fires have excellent containment

select county, block, spatial_location
from region
where area > 50 and population > 500
and land_cover = “urban”
and region.isLocatedNear.containment = “excellent”

Information Extraction from Textual Data

[Ontology diagram; labels as extracted]
  • Entities: Region, Fire
  • Relationship: isLocatedNear
  • Attributes: containment, county, block, state

containment = “excellent”

<ACCRUE>(<SENTENCE>(<AND>(<NUMBER>(X), X < 25),
                    <WORD>(%), <WORD>(active)),
         <PHRASE>(full, containment,, <STEM>(was), expected)
         <PHRASE>(the, fire, <STEM>(is), contained))

<ACCRUE>(<SENTENCE>(<PHRASE>(<OR>(New, Las, San), [region.county]),
                    <OR>(county, block, state)))

<PARAGRAPH>(FIRE, REGION)

Mapping “domain specific” model elements to media specific metadata
  • county(x,y) gets mapped to:
    • word(x), phrase(x), accrue(<list-of-subtrees>)
  • containment(x, “excellent”) gets mapped to:
    • sentence(<set-of-words>), stem(x), accrue(<list-of-subtrees>)
  • isLocatedNear(x, y) gets mapped to:
    • paragraph(x,y)
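
The mappings above can be sketched as small functions that assemble topic-expression strings from ontology predicates. The operator spelling follows the expressions shown in these slides. The helper names and the choice of which operators to nest are assumptions for illustration, and no search-engine API is called.

```python
def op(name: str, *args: str) -> str:
    """Render one operator in the <NAME>(arg, arg, ...) notation used on the slides."""
    return f"<{name}>({', '.join(args)})"


def county_expr(prefixes: list) -> str:
    """county(x, y): a sentence containing a place-name phrase and the word 'county'."""
    phrase = op("PHRASE", op("OR", *prefixes), "[region.county]")
    return op("ACCRUE", op("SENTENCE", phrase, "county"))


def is_located_near_expr(x: str, y: str) -> str:
    """isLocatedNear(x, y): both sub-expressions occur in the same paragraph."""
    return op("PARAGRAPH", x, y)


county = county_expr(["New", "Las", "San"])
near = is_located_near_expr("FIRE", "REGION")
```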
Mapping SQL queries to Topic Expressions

select county from region
where isLocatedNear.containment = “excellent”

<PARAGRAPH>(
  <ACCRUE>(<SENTENCE>(<AND>(<NUMBER>(X), X < 25),
                      <WORD>(%), <WORD>(active)),
           <PHRASE>(full, containment,, <STEM>(was), expected)
           <PHRASE>(the, fire, <STEM>(is), contained)),
  <ACCRUE>(<SENTENCE>(<PHRASE>(<OR>(New, Las, San), [region.county]),
                      county))
)

Limitations of Current Indexing Technologies: “selection operation”

select county from region

<ACCRUE>(<SENTENCE>(<PHRASE>(<OR>(New, Las, San), WILDCARD),
                    <OR>(county, block, state)))

=> post-processing of patterns returned (WILDCARD as place-holder)

Problem:
  • WILDCARD may match a lot of words in the same sentence
  • WILDCARD may match different words in different sentences

Using NLP and statistical techniques
  • WILDCARD matches a number of words in the same sentence

    • Yeltsin was appointed the Prime Minister when sleeping
    • Candidate matches: the (article), Prime Minister (noun), when (conjunction), sleeping (verb)
    • => Use part-of-speech tagging to reduce the number of possibilities
  • WILDCARD matches different words in different sentences
    • Yeltsin was appointed Prime Minister
    • Yeltsin was appointed President
    • => Use frequency statistics to give a level of confidence
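
Both ideas can be sketched in a few lines. Part-of-speech filtering keeps only the candidates of the wanted category, and relative frequency across sentences gives a rough confidence for each filler. The tag lookup table below is a hand-made stand-in for a real tagger, and the list of observed fillers is invented. Both are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical tag lookup standing in for a real part-of-speech tagger.
POS: Dict[str, str] = {
    "the": "article",
    "Prime Minister": "noun",
    "President": "noun",
    "when": "conjunction",
    "sleeping": "verb",
}


def filter_by_pos(candidates: List[str], wanted: str = "noun") -> List[str]:
    """Keep only WILDCARD candidates whose tag matches the expected category."""
    return [c for c in candidates if POS.get(c) == wanted]


def confidence(fillers: List[str]) -> Dict[str, float]:
    """Relative frequency of each filler across the sentences in which it was matched."""
    counts = Counter(fillers)
    total = sum(counts.values())
    return {filler: n / total for filler, n in counts.items()}


# Case 1: several candidates within one sentence.
kept = filter_by_pos(["the", "Prime Minister", "when", "sleeping"])

# Case 2: different fillers across sentences (invented observations).
scores = confidence(["Prime Minister", "Prime Minister", "President", "Prime Minister"])
```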
Definition Support

INCIDENT MANAGEMENT SITUATION REPORT
Friday August 1, 1997 - 0530 MDT
NATIONAL PREPAREDNESS LEVEL II

CURRENT SITUATION: Alaska continues to experience large fire activity. Additional fires ha
staffed for structure protection.

SIMELS, Galena District, BLM. This fire is on the east side of the Innoko Flats, between Galena
The fire is active on the southern perimeter, which is burning into a continuous stand of black s
fire has increased in size, but was not mapped due to thick smoke. The slopover on the eastern
35% contained, while protection of the historic cabin continues.

CHINIKLIK MOUNTAIN, Galena District, BLM. A Type II Incident Management Team (Weh
assigned to the Chiniklik fire. The fire is contained. Major areas of heat have been mopped up.
contained. Major areas of heat have been mopped-up. All crews and overhead will mop-up wher
burned beyond the meadows. No flare-ups occurred today. Demobilization is planned for this we
depending on the results of infrared scanning.

Phrase: SIMELS, Galena District, BLM.
Slot: fire.name
Value: SIMELS
Structure: <name> , <place> , <unit> .
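
The structure `<name> , <place> , <unit> .` can be approximated with a regular expression that fills slots from lines of the report. This is a minimal sketch. The pattern, the assumption that fire names are written in capitals, and the slot names `fire.place` and `fire.unit` are illustrative choices. Only `fire.name` appears on the slide.

```python
import re
from typing import Dict, Optional

# <name> , <place> , <unit> .  with the name assumed to be in capitals.
PATTERN = re.compile(r"^(?P<name>[A-Z][A-Z ]+), (?P<place>[^,]+), (?P<unit>[^,.]+)\.")


def extract_slots(line: str) -> Optional[Dict[str, str]]:
    """Return slot values if the line starts with the pattern, else None."""
    m = PATTERN.match(line)
    if m is None:
        return None
    return {
        "fire.name": m.group("name"),
        "fire.place": m.group("place"),  # hypothetical slot name
        "fire.unit": m.group("unit"),    # hypothetical slot name
    }


first = extract_slots("SIMELS, Galena District, BLM. This fire is on the east side")
second = extract_slots("CHINIKLIK MOUNTAIN, Galena District, BLM. A Type II Incident")
none = extract_slots("The fire is contained. Major areas of heat have been mopped up.")
```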
MIDAS*: Information Extraction from Multimedia Data

Query: Get me all regions (blocks, counties) having a population greater than 500 and area greater than 50 acres having an urban land cover

select county, block, area, population, spatial_location, land_cover
from region
where area > 50
and population > 500
and land_cover = ‘urban’
and relief = ‘moderate’

*Media Independent DomAin Specific correlation

Get me all regions (counties, blocks) having 50 < population < 100, 25 < area < 50 and low density urban area land cover ...

  • Media-independent correlation across domain-specific metadata
  • Correlation across image and structured data at an intensional domain level

[Diagram; labels as extracted]
  • SQL queries to structured data (Census DB)
  • SQL Gateway to textual data (TIGER/Line DB)
  • Image Processing routines for Image Data
  • Fields shown: Population, Area, Boundaries, Land cover, Relief

Mapping “domain specific” model elements to media specific metadata
  • contained(<concept>, <image>) gets mapped to:
    • latitude/longitude, image-coordinates
    • bounding box of region
    • image type: LULC, DEM
  • land_cover(x, “low density urban”) gets mapped to:
    • percentage(<pixel-color>, <bounding-box>)
  • relief(x, “moderate”) gets mapped to:
    • standard-deviation(<pixel-value>, <bounding-box>)
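
The two image-level measures can be sketched over small grids. `percentage` counts the share of pixels with a given class value inside a bounding box of a LULC image, and `standard_deviation` measures the spread of elevation values inside a bounding box of a DEM. The grid encoding, the (top, left, bottom, right) box convention and the sample values are assumptions. The thresholds that would turn these numbers into "low density urban" or "moderate" are not given in the slides and are not guessed here.

```python
from statistics import pstdev
from typing import List, Tuple

Grid = List[List[int]]
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), end-exclusive


def cells(grid: Grid, box: Box) -> List[int]:
    """Flatten the pixel values that fall inside the bounding box."""
    top, left, bottom, right = box
    return [v for row in grid[top:bottom] for v in row[left:right]]


def percentage(grid: Grid, value: int, box: Box) -> float:
    """Percentage of pixels in the box whose class or colour code equals `value`."""
    inside = cells(grid, box)
    return 100.0 * sum(1 for v in inside if v == value) / len(inside)


def standard_deviation(grid: Grid, box: Box) -> float:
    """Population standard deviation of the pixel values in the box."""
    return pstdev(cells(grid, box))


# Hypothetical 4x4 images: LULC holds class codes, DEM holds elevations.
LULC = [[1, 1, 2, 2],
        [1, 1, 2, 2],
        [3, 3, 2, 2],
        [3, 3, 3, 3]]
DEM = [[10, 10, 12, 12],
       [14, 14, 12, 12],
       [11, 11, 11, 11],
       [11, 11, 11, 11]]

urban_share = percentage(LULC, 1, (0, 0, 2, 4))   # top two rows
relief_spread = standard_deviation(DEM, (0, 0, 2, 2))
```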
Need for characterization of Domain Vocabularies

Another source for domain ontology construction:
  • Classification Standards

[Two hierarchy diagrams; labels as extracted]
  • Geological Region
    • Water: Reservoirs, Lakes, Streams and Canals
    • Urban: Industrial, Residential, Commercial
    • Forest Land: Evergreen, Deciduous, Mixed
  • Geological Region
    • State, County, Rural Area, City, Tract, Block Group, Block
Conclusions and Future Work
  • Role of semantic content in handling data/information overload
    • Domain Specific ontologies: an approach for capturing semantic content
  • Design and construction of domain ontologies
    • labor intensive, time consuming, difficult endeavor
    • Re-use readily available information: schemas, queries, data dictionaries, thesauri
      • minimize the involvement of the domain expert
  • Metadata is the key for MultiMedia Information Retrieval
    • Use an expanded notion of metadata as schema and a declarative SQL-like query language
    • Pragmatic incorporation of NLP/Image+Speech+Video Processing/Computer Vision techniques
    • Exploit synergy across multiple media for better precision and performance
  • Extrapolate this technique into other domains:
    • Medical and Bio-Informatics
    • Telecommunication
    • IP networks (use of CIM information model by DMTF)
  • Ontology Extraction from Textual Data:
    • Clustering techniques to identify central concepts and taxonomic relationships
    • NLP techniques to identify concept associations
    • Consensus analysis techniques to establish ontologies