# Data Integration under the Schema Tuple Query Assumption

## Data Integration under the Schema Tuple Query Assumption

Michael Minock

The University of Umeå, Sweden

Michael Minock (mjm@cs.umu.se)

### Introduction

• Problem:

  • Queries may be over information that is not (yet) covered by the data integration system

  • “List museums in Vienna or Bratislava holding paintings by Klimt or Picasso.”

  • A purely extensional response misleads: the user cannot tell which parts of the query the sources actually cover

• Solution:

  • Give the available extension, but contextualize it with intensional descriptions of coverage

  • Certain: “The following are all the museums in Vienna that hold paintings by Picasso: …”

  • Possible: “The following museums in Vienna do not provide inventory records, so they may have paintings by Klimt: …”

  • Incomplete: “There is no information for museums in Bratislava.”

### Approach

• LAV (Local as View) architecture

  • User queries and data source descriptions are restricted to schema tuple queries in L (or Q)

  • Currently, sources must contain complete and correct views

  • A broker mediates the user query over the sources and supplies a mixed extensional/intensional response

• Use ‘algebraic’ properties of L (or Q) to derive:

  • A query plan (using the cache)

  • Logical descriptions of the certain, uncertain and incomplete sets

• Exploit subsumption properties for:

  • Query simplification

  • Natural language generation

### The Schema Tuple Query Languages L (and Q)

• Assumptions:

  • L: Tuple relational queries of the form:

  • Q:

• Properties:

  • L and Q are decidable for satisfiability

  • Unlike L, Q is closed over negation

  • Difference and intersection can be calculated, and containment, equivalence and disjointness decided, for queries built using L and Q
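
The decision problems in the last bullet follow from decidable satisfiability in the usual way, assuming (as the bullets suggest) that intersections and differences of queries stay inside the decidable language. A sketch of the reductions, where $q_1$ and $q_2$ are queries over the same tuple variable and the symbols $\sqsubseteq$ and $\equiv$ are notation introduced here, not taken from the slides:

```latex
\begin{align*}
q_1 \sqsubseteq q_2 \;&\iff\; q_1 \wedge \neg q_2 \ \text{is unsatisfiable} \\
q_1 \equiv q_2 \;&\iff\; q_1 \sqsubseteq q_2 \ \text{and}\ q_2 \sqsubseteq q_1 \\
q_1, q_2 \ \text{disjoint} \;&\iff\; q_1 \wedge q_2 \ \text{is unsatisfiable}
\end{align*}
```

Here $q_1 \wedge \neg q_2$ is the difference of the two queries and $q_1 \wedge q_2$ their intersection, so each test amounts to one or two satisfiability checks.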

### Example: Art museum domain

• QUERY: “List museums in Vienna or Bratislava holding paintings by Klimt or Picasso.”

• Schema:

Artist(id, name, country, DOB, DOD)

Museum(id, name, address, city, country)

Painting(id, title, year, artistId)

HasPainting(museumId, paintingId)

• Data sources: Central European Museums, MAK Inventory, Picasso Locator, Albertina Inventory

### Example: Input Expressions …

(m Museum
  (IN m city ("Vienna" "Bratislava"))
  (+ (y1 y2 y3)
     (HasPainting y1)(Painting y2)(Artist y3)
     (= m id y1 museumId)(= y1 paintingId y2 id)(= y2 artistId y3 id)
     (IN y3 name ("Klimt" "Picasso"))))

(h HasPainting
  (+ (y1 y2)
     (Painting y1)
     (Artist y2)
     (= h paintingId y1 id)
     (= y1 artistId y2 id)
     (= y2 name "Picasso")))

(m Museum
  (IN m city
      ("Vienna" "Prague"
       "Berlin" …)))

(h HasPainting
  (+ (y1)
     (Museum y1)
     (= h museumId y1 id)
     (= y1 name "MAK")
     (= y1 city "Vienna")))

(h HasPainting
  (+ (y1)
     (Museum y1)
     (= h museumId y1 id)
     (= y1 name "Albertina")
     (= y1 city "Vienna")))
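
The expressions above are written as parenthesized lists, so they can be read with a very small s-expression parser. The following is a minimal sketch, not the reader used by the system on the slides: the concrete syntax is only known from these examples, and the names `TOKEN`, `parse` and `QUERY` are introduced here for illustration.

```python
import re

# Tokens: "(", ")", a double-quoted string, or a bare symbol such as m, IN, =, +.
TOKEN = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')


def parse(text):
    """Parse one s-expression into nested Python lists of token strings."""
    tokens = TOKEN.findall(text)
    pos = 0

    def read():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while pos < len(tokens) and tokens[pos] != ")":
                items.append(read())
            if pos >= len(tokens):
                raise ValueError("missing closing parenthesis")
            pos += 1  # consume the matching ")"
            return items
        if tok == ")":
            raise ValueError("unexpected closing parenthesis")
        return tok  # symbol or quoted string (quotes are kept)

    expr = read()
    if pos != len(tokens):
        raise ValueError("trailing tokens after expression")
    return expr


# The user query from the slide, copied as text.
QUERY = '''
(m Museum
  (IN m city ("Vienna" "Bratislava"))
  (+ (y1 y2 y3)
     (HasPainting y1)(Painting y2)(Artist y3)
     (= m id y1 museumId)(= y1 paintingId y2 id)(= y2 artistId y3 id)
     (IN y3 name ("Klimt" "Picasso"))))
'''

tree = parse(QUERY)
```

In the parsed tree, `tree[0]` and `tree[1]` give the tuple variable and its relation, and the remaining elements are the conditions. Because the parser rejects trailing tokens, it also catches unbalanced parentheses in a pasted expression.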

### Example: Output Expressions …

Certain:

(m Museum
  (= m city "Vienna")
  (+ (y1 y2 y3)
     (HasPainting y1)(Painting y2)(Artist y3)
     (= m id y1 museumId)(= y1 paintingId y2 id)
     (= y2 artistId y3 id)(= y3 name "Picasso")))

(m Museum
  (= m city "Vienna")
  (IN m name ("Albertina" "MAK"))
  (+ (y1 y2 y3)
     (HasPainting y1)(Painting y2)(Artist y3)
     (= m id y1 museumId)(= y1 paintingId y2 id)
     (= y2 artistId y3 id)(= y3 name "Klimt")))

Uncertain:

(m Museum
  (= m city "Vienna")
  (NOT_IN m name ("Albertina" "MAK"))
  (+ (y1 y2 y3)
     (HasPainting y1)(Painting y2)(Artist y3)
     (= m id y1 museumId)(= y1 paintingId y2 id)(= y2 artistId y3 id)
     (= y3 name "Klimt")))

Incomplete:

(m Museum
  (= m city "Bratislava")
  (+ (y1 y2 y3)
     (HasPainting y1)(Painting y2)(Artist y3)
     (= m id y1 museumId)(= y1 paintingId y2 id)(= y2 artistId y3 id)
     (IN y3 name ("Klimt" "Picasso"))))
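
To see why the query splits into these three parts, the following toy sketch hand-codes the coverage implied by the example source descriptions and classifies each (city, museum, artist) combination. It is not the algebraic derivation over L and Q described on the slides. The coverage sets, the `"<other>"` placeholder for any museum without an inventory source, and the function names are all assumptions made for illustration.

```python
from itertools import product

# Assumed coverage, read off the example source descriptions.
# Cities whose museums are listed by a source (the slide's list continues with "…").
MUSEUM_CITIES = {"Vienna", "Prague", "Berlin"}
# (museum name, city) pairs for which a complete inventory source exists.
INVENTORY_MUSEUMS = {("MAK", "Vienna"), ("Albertina", "Vienna")}
# Artists for whom a source lists every museum holding.
LOCATOR_ARTISTS = {"Picasso"}


def classify(city, museum, artist):
    """Return 'certain', 'uncertain' or 'incomplete' for one combination."""
    if city not in MUSEUM_CITIES:
        return "incomplete"  # no source even lists the museums of this city
    if (museum, city) in INVENTORY_MUSEUMS or artist in LOCATOR_ARTISTS:
        return "certain"     # holdings are fully known for this combination
    return "uncertain"       # museum is known, but its holdings are not


def decompose(cities, museums, artists):
    """Group every combination asked for by the query by its classification."""
    parts = {"certain": [], "uncertain": [], "incomplete": []}
    for cell in product(cities, museums, artists):
        parts[classify(*cell)].append(cell)
    return parts


parts = decompose(
    ["Vienna", "Bratislava"],
    ["Albertina", "MAK", "<other>"],  # "<other>": any museum without an inventory
    ["Klimt", "Picasso"],
)
```

Under these assumptions the only uncertain combination is Klimt in Vienna museums other than Albertina and MAK, and everything in Bratislava is incomplete, which matches the three groups of output expressions.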

### Example: To Natural Language

• QUERY: “List museums in Vienna or Bratislava holding paintings by Klimt or Picasso.”

• Certain:

  • “Museums in Vienna that have paintings by Picasso.”

  • “Museums in Vienna named ‘Albertina’ or ‘MAK’ that have paintings by Klimt.”

• Uncertain:

  • “Museums in Vienna not named ‘Albertina’ or ‘MAK’ that have paintings by Klimt.”

• Incomplete:

  • “Museums in Bratislava that have paintings by Picasso or Klimt.”

### Pros and cons of L and Q

• Pros

  • May represent n-ary relations

  • Direct translation to SQL!

  • Some negation

  • General cyclic queries

    Example: “The artists without paintings in a museum in the country of their origin.”

• Cons

  • No projection!

  • Certain quantifier prefixes prohibited

    Example: “The artists with paintings in all of the museums in the country of their origin.”
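
To illustrate the “direct translation to SQL” point, here is one plausible SQL rendering of the example user query, run against an in-memory SQLite database. The slides do not show the generated SQL, so the translation is an assumption: the free tuple variable becomes the outer `FROM` table, and the existentially quantified variables `y1`, `y2`, `y3` become an `EXISTS` subquery. The column types and all inserted rows are hypothetical test data.

```python
import sqlite3

# Schema from the art museum example; column types are assumed.
SCHEMA = """
CREATE TABLE Artist (id INTEGER PRIMARY KEY, name TEXT, country TEXT, DOB TEXT, DOD TEXT);
CREATE TABLE Museum (id INTEGER PRIMARY KEY, name TEXT, address TEXT, city TEXT, country TEXT);
CREATE TABLE Painting (id INTEGER PRIMARY KEY, title TEXT, year INTEGER, artistId INTEGER);
CREATE TABLE HasPainting (museumId INTEGER, paintingId INTEGER);
"""

# Assumed translation of the user query: whole Museum tuples are returned.
QUERY = """
SELECT m.*
FROM Museum AS m
WHERE m.city IN ('Vienna', 'Bratislava')
  AND EXISTS (
    SELECT 1
    FROM HasPainting AS y1, Painting AS y2, Artist AS y3
    WHERE m.id = y1.museumId
      AND y1.paintingId = y2.id
      AND y2.artistId = y3.id
      AND y3.name IN ('Klimt', 'Picasso'))
ORDER BY m.id
"""


def run_example():
    """Load hypothetical rows and return the museums matching the query."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO Artist VALUES (?, ?, ?, ?, ?)",
        [(1, "Klimt", None, None, None),
         (2, "Picasso", None, None, None),
         (3, "Schiele", None, None, None)],
    )
    conn.executemany(
        "INSERT INTO Museum VALUES (?, ?, ?, ?, ?)",
        [(1, "Museum A", None, "Vienna", None),
         (2, "Museum B", None, "Vienna", None),
         (3, "Museum C", None, "Prague", None)],
    )
    conn.executemany(
        "INSERT INTO Painting VALUES (?, ?, ?, ?)",
        [(10, "Painting X", None, 1),   # by Klimt
         (11, "Painting Y", None, 3),   # by Schiele
         (12, "Painting Z", None, 2)],  # by Picasso
    )
    conn.executemany(
        "INSERT INTO HasPainting VALUES (?, ?)",
        [(1, 10), (2, 11), (3, 12)],
    )
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


rows = run_example()
```

The query selects `m.*` rather than individual columns, which mirrors the “no projection” restriction: an answer is a set of whole tuples from one relation.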

### Next ’STEP’…

• STEP 1.0 (Schema Tuple Expression Processor)

• Incomplete and/or incorrect source views

• Real applications

Architecture diagram components: Datasource Descriptions, Phrasal Lexicon, Cache DB, Broker, NLG, Differencing Engine/Simplifier, L2DomainCalculus, SPASS theorem prover
