### Data Integration under the Schema Tuple Query Assumption

Michael Minock

The University of Umeå, Sweden

Michael Minock ([email protected])

Introduction
• Problem:
  • Queries may be over information that is not (yet) covered by the data integration system
    • "List museums in Vienna or Bratislava holding paintings by Klimt or Picasso."
  • A purely extensional response misleads
• Solution:
  • Give the available extension, but contextualize it with intensional descriptions of coverage
    • Certain: "The following are all the museums in Vienna that hold paintings by Picasso: …"
    • Possible: "The following museums in Vienna do not provide inventory records, so they may have paintings by Klimt: …"
    • Incomplete: "There is no information for museums in Bratislava."

Approach
• LAV (Local as View) architecture
  • user queries and data source descriptions restricted to schema tuple queries in L (or Q)
  • currently sources must contain complete and correct views
  • broker mediates user query over sources and supplies a mixed extensional/intensional response
• Use 'algebraic' properties of L (or Q) to derive:
  • query plan (using cache)
  • logical descriptions of certain, uncertain and incomplete sets
• Exploit subsumption properties for:
  • query simplification
  • natural language generation

The Schema Tuple Query Languages L (and Q)
• Assumptions:
  • L: tuple relational queries of the form: [formula not preserved in the transcript]
  • Q: [definition not preserved in the transcript]
• Properties:
  • L and Q are decidable for satisfiability
  • Unlike [symbol not preserved in the transcript], Q is closed over negation
  • May calculate difference and intersection and decide containment, equivalence and disjointness for queries built using L and Q
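To make the listed properties concrete, here is a minimal sketch on a deliberately tiny fragment. It is not the actual L or Q: it assumes a single relation, attribute constraints of the form IN / NOT_IN over unbounded string domains, and no existential subformulas. All function names are hypothetical. Note how the difference of two conjunctions comes out as a union of conjunctions, which hints at why closure over negation needs more than plain conjunctive queries.

```python
# Toy fragment (an assumption, not the slides' L or Q):
# a query is a dict  attribute -> ("in" | "not_in", frozenset of values).

def meet(c1, c2):
    """Conjoin two constraints on the same attribute."""
    (k1, s1), (k2, s2) = c1, c2
    if k1 == "in" and k2 == "in":
        return ("in", s1 & s2)
    if k1 == "in":
        return ("in", s1 - s2)
    if k2 == "in":
        return ("in", s2 - s1)
    return ("not_in", s1 | s2)

def negate(c):
    """Negate a single attribute constraint."""
    kind, values = c
    return ("not_in" if kind == "in" else "in", values)

def intersect(q1, q2):
    """Conjunction of two queries."""
    out = dict(q1)
    for attr, c in q2.items():
        out[attr] = meet(out[attr], c) if attr in out else c
    return out

def satisfiable(q):
    """Unsatisfiable only if some attribute must lie in the empty set."""
    return all(not (kind == "in" and not values) for kind, values in q.values())

def difference(q1, q2):
    """q1 AND NOT q2, returned as a list (union) of satisfiable conjunctions."""
    pieces = []
    for attr, c in q2.items():
        piece = intersect(q1, {attr: negate(c)})
        if satisfiable(piece):
            pieces.append(piece)
    return pieces

def contained_in(q1, q2):
    """q1 is contained in q2 iff nothing of q1 is left after removing q2."""
    return not difference(q1, q2)

def disjoint(q1, q2):
    return not satisfiable(intersect(q1, q2))
```

For instance, subtracting a source that covers Vienna museums named Albertina or MAK from a query over Vienna and Bratislava leaves two pieces: the Bratislava part, and the part whose name is NOT_IN the covered names.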

Example: Art museum domain
• QUERY: "List museums in Vienna or Bratislava holding paintings by Klimt or Picasso."

Schema:

Artist(id, name, country, DOB, DOD)

Museum(id, name, address, city, country)

Painting(id, title, year, artistId)

HasPainting(museumId, paintingId)

Data sources (from the slide's figure): Central European Museums; MAK Inventory; Picasso Locator; Albertina Inventory

Example: Input Expressions …

User query:

    (m Museum
      (IN m city ("Vienna" "Bratislava"))
      (+ (y1 y2 y3)
         (HasPainting y1)(Painting y2)(Artist y3)
         (= m id y1 museumId)(= y1 paintingId y2 id)(= y2 artistId y3 id)
         (IN y3 name ("Klimt" "Picasso"))))

Picasso Locator:

    (h HasPainting
      (+ (y1 y2)
         (Painting y1)
         (Artist y2)
         (= h paintingId y1 id)
         (= y1 artistId y2 id)
         (= y2 name "Picasso")))

Central European Museums:

    (m Museum
      (IN m city ("Vienna" "Prague" "Berlin" …)))

MAK Inventory:

    (h HasPainting
      (+ (y1)
         (Museum y1)
         (= h museumId y1 id)
         (= y1 name "MAK")
         (= y1 city "Vienna")))

Albertina Inventory:

    (h HasPainting
      (+ (y1)
         (Museum y1)
         (= h museumId y1 id)
         (= y1 name "Albertina")
         (= y1 city "Vienna")))
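Expressions in this parenthesized notation are easy to read mechanically. The following is a minimal sketch of a reader that turns one expression into nested Python lists. It only recovers structure and assigns no meaning to `+`, `=` or `IN`; the function names are hypothetical and this is not the parser used by the system described in the slides. String literals keep their double quotes so they stay distinguishable from symbols.

```python
import re

# A token is a double-quoted string, a single parenthesis, or a bare symbol.
TOKEN = re.compile(r'"[^"]*"|[()]|[^\s()"]+')

def tokenize(text):
    return TOKEN.findall(text)

def parse(text):
    """Parse exactly one parenthesized expression into nested lists."""
    tokens = tokenize(text)
    pos = 0

    def read():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while True:
                if pos >= len(tokens):
                    raise SyntaxError("missing ')'")
                if tokens[pos] == ")":
                    pos += 1
                    return items
                items.append(read())
        if tok == ")":
            raise SyntaxError("unexpected ')'")
        return tok

    result = read()
    if pos != len(tokens):
        raise SyntaxError("trailing tokens after expression")
    return result
```

A reader like this also rejects unbalanced input, which is useful when expressions are copied by hand.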

Example: Output Expressions …

Certain:

    (m Museum
      (= m city "Vienna")
      (+ (y1 y2 y3)
         (HasPainting y1)(Painting y2)(Artist y3)
         (= m id y1 museumId)(= y1 paintingId y2 id)
         (= y2 artistId y3 id)(= y3 name "Picasso")))

    (m Museum
      (= m city "Vienna")
      (IN m name ("Albertina" "MAK"))
      (+ (y1 y2 y3)
         (HasPainting y1)(Painting y2)(Artist y3)
         (= m id y1 museumId)(= y1 paintingId y2 id)
         (= y2 artistId y3 id)(= y3 name "Klimt")))

Uncertain:

    (m Museum
      (= m city "Vienna")
      (NOT_IN m name ("Albertina" "MAK"))
      (+ (y1 y2 y3)
         (HasPainting y1)(Painting y2)(Artist y3)
         (= m id y1 museumId)(= y1 paintingId y2 id)(= y2 artistId y3 id)
         (= y3 name "Klimt")))

Incomplete:

    (m Museum
      (= m city "Bratislava")
      (+ (y1 y2 y3)
         (HasPainting y1)(Painting y2)(Artist y3)
         (= m id y1 museumId)(= y1 paintingId y2 id)(= y2 artistId y3 id)
         (IN y3 name ("Klimt" "Picasso"))))
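One plausible reading of how the three descriptions arise can be sketched with plain set operations. The slides do not spell out the broker's procedure, so the rule used here is an assumption: a part of the query is certain where both the museum tuples and the holdings are covered by some source, uncertain where the museums are covered but their holdings are not, and incomplete where the museums themselves are not covered. The finite domains, the `<other>` stand-in for any other museum name, and all identifiers are hypothetical.

```python
from itertools import product

# Hypothetical finite stand-ins for the attribute domains.
CITIES = ("Vienna", "Bratislava", "Prague")
MUSEUM_NAMES = ("Albertina", "MAK", "<other>")   # "<other>" = any other name
ARTISTS = ("Klimt", "Picasso", "Schiele")

CELLS = set(product(CITIES, MUSEUM_NAMES, ARTISTS))

def select(pred):
    """All (city, museum name, artist) cells satisfying a predicate."""
    return {cell for cell in CELLS if pred(*cell)}

def partition(query, museums_known, holdings_known):
    """Split a query into certain, uncertain and incomplete parts (assumed rule)."""
    certain = query & museums_known & holdings_known
    uncertain = (query & museums_known) - holdings_known
    incomplete = query - museums_known
    return certain, uncertain, incomplete

# User query: Vienna or Bratislava, Klimt or Picasso.
query = select(lambda c, n, a: c in ("Vienna", "Bratislava") and a in ("Klimt", "Picasso"))

# Central European Museums lists museums for Vienna and Prague (among others),
# but not for Bratislava.
museums_known = select(lambda c, n, a: c in ("Vienna", "Prague"))

# Picasso Locator covers all Picasso holdings; the MAK and Albertina
# inventories cover everything held by those two Vienna museums.
holdings_known = select(
    lambda c, n, a: a == "Picasso" or (c == "Vienna" and n in ("Albertina", "MAK"))
)

certain, uncertain, incomplete = partition(query, museums_known, holdings_known)
```

Under these assumptions the uncertain part is exactly the Vienna museums not named Albertina or MAK with Klimt, and the incomplete part is everything in Bratislava, matching the shape of the output expressions.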

Example: To Natural Language
• QUERY: "List museums in Vienna or Bratislava holding paintings by Klimt or Picasso."
• Certain:
  • "Museums in Vienna named 'Albertina' or 'MAK' that have paintings by Klimt."
  • "Museums in Vienna that have paintings by Picasso."
• Uncertain:
  • "Museums in Vienna not named 'Albertina' or 'MAK' that have paintings by Klimt."
• Incomplete:
  • "Museums in Bratislava that have paintings by Picasso or Klimt."
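The phrases above follow a regular pattern, which a fixed template can reproduce. This is only a minimal sketch for this one query shape; the approach in the slides relies on subsumption properties and a phrasal lexicon, neither of which is modeled here. The function names and parameters are hypothetical.

```python
def or_list(items, quote=""):
    """Join items as 'a', 'a or b', or 'a, b or c', optionally quoting each."""
    quoted = [f"{quote}{item}{quote}" for item in items]
    if len(quoted) == 1:
        return quoted[0]
    return ", ".join(quoted[:-1]) + " or " + quoted[-1]

def describe(cities, artists, named=None, not_named=None):
    """Render a museum description of the shape used in this example."""
    parts = ["Museums in " + or_list(cities)]
    if named:
        parts.append("named " + or_list(named, "'"))
    if not_named:
        parts.append("not named " + or_list(not_named, "'"))
    parts.append("that have paintings by " + or_list(artists))
    return " ".join(parts) + "."

print(describe(["Vienna"], ["Klimt"], not_named=["Albertina", "MAK"]))
```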

Pros and cons of L and Q
• Pros
  • May represent n-ary relations
  • Direct translation to SQL!
  • Some negation
  • General cyclic queries
  • Example: "The artists without paintings in a museum in the country of their origin."
• Cons
  • No projection!
  • Certain quantifier prefixes prohibited
  • Example: "The artists with paintings in all of the museums in the country of their origin."
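The SQL translation can be illustrated on the running museum query. The sketch below assumes that the `+` form in the expression notation denotes an existentially quantified subformula, which maps to an `EXISTS` subquery, and that the query returns whole `Museum` tuples (`SELECT m.*`), in line with the absence of projection. The rows are invented placeholders purely for illustration and say nothing about real museum holdings.

```python
import sqlite3

# Tables follow the schema given in the art museum example.
DDL = """
CREATE TABLE Artist (id INTEGER PRIMARY KEY, name TEXT, country TEXT, DOB TEXT, DOD TEXT);
CREATE TABLE Museum (id INTEGER PRIMARY KEY, name TEXT, address TEXT, city TEXT, country TEXT);
CREATE TABLE Painting (id INTEGER PRIMARY KEY, title TEXT, year INTEGER, artistId INTEGER);
CREATE TABLE HasPainting (museumId INTEGER, paintingId INTEGER);
"""

# Assumed translation of the user query: the tuple variable m ranges over
# Museum, and the existential part becomes an EXISTS subquery.
QUERY = """
SELECT m.*
FROM Museum AS m
WHERE m.city IN ('Vienna', 'Bratislava')
  AND EXISTS (
    SELECT 1
    FROM HasPainting AS y1, Painting AS y2, Artist AS y3
    WHERE m.id = y1.museumId
      AND y1.paintingId = y2.id
      AND y2.artistId = y3.id
      AND y3.name IN ('Klimt', 'Picasso'))
ORDER BY m.id
"""

def run_example():
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    # Placeholder data, invented for this sketch.
    conn.executemany("INSERT INTO Artist (id, name) VALUES (?, ?)",
                     [(1, "Klimt"), (2, "Picasso"), (3, "Other Artist")])
    conn.executemany("INSERT INTO Museum (id, name, city) VALUES (?, ?, ?)",
                     [(1, "Museum A", "Vienna"), (2, "Museum B", "Bratislava"),
                      (3, "Museum C", "Vienna"), (4, "Museum D", "Prague")])
    conn.executemany("INSERT INTO Painting (id, title, artistId) VALUES (?, ?, ?)",
                     [(1, "Painting 1", 1), (2, "Painting 2", 2),
                      (3, "Painting 3", 3), (4, "Painting 4", 2)])
    conn.executemany("INSERT INTO HasPainting (museumId, paintingId) VALUES (?, ?)",
                     [(1, 1), (2, 4), (3, 3), (4, 2)])
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows

rows = run_example()
```

With this placeholder data the query returns the full tuples for Museum A and Museum B; Museum C holds no Klimt or Picasso, and Museum D is outside the requested cities.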

Next 'STEP'…
• STEP 1.0 (Schema Tuple Expression Processor)
• Incomplete and/or incorrect source views
• Real applications

Architecture components (from the slide's diagram): Datasource Descriptions; Phrasal Lexicon; Cache DB; Broker; NLG; Differencing Engine/Simplifier; L2DomainCalculus; SPASS theorem prover