An Abstract Framework for Generating Maximal Answers to Queries

Sara Cohen, Yehoshua Sagiv

ICDT 2005


Outline

  • Motivation

  • Queries and Databases

  • Answers and Semantics

  • Graph Properties


The Problem

  • In many different domains, we are given the option to query some source of information

  • Usually, the user only gets results if the query can be completely answered (satisfied)

  • In many domains, this is not appropriate, e.g.,

    • The user is not familiar with the database

    • The database does not contain complete information

    • There is a mismatch between the ontology of the user and that of the database

    • The query is a “search” that is not expected to be correct


Search for buses from “Haifa-Technion” to “Ben Gurion Airport”


Search for buses to “Ben Gurion Airport”


Must choose From and To destinations


What Do Users Need?

  • Users need a way to get interesting partial answers to their queries, especially if a complete answer does not exist

  • These partial answers should contain maximal information

  • Main Problems:

    • What should be the semantics of partial answers?

    • How can all partial answers be efficiently computed?


Previous Work

  • Many solutions have been proposed for these problems

    • the solutions differ according to the problem domain

  • Examples:

    • Full disjunctions: Galindo-Legaria (94), Rajaraman, Ullman (96), Kanza, Sagiv (03)

    • Queries with incomplete answers over semistructured data: Kanza, Nutt, Sagiv (99)

    • FleXPath: Amer-Yahia, Lakshmanan, Pandit (04)

    • Interconnections: Cohen, Kanza, Sagiv (03)


Our Contribution

  • In the past, for each semantics considered, the query evaluation problem had to be studied anew. In this paper, we:

    • Present a general framework for defining semantics for partial answers

    • Show that the framework is general enough to cover most previously studied semantics

    • Show that the query evaluation problem can be solved once within this framework and the solution reused for new semantics

    • Obtain results that improve upon previous evaluation algorithms

    • Present the relationship between this problem and the maximal P-subgraph problem


Databases

  • Databases are modeled as data graphs: (V, E, r, l_V, l_E)

    • r: an optional designated root

    • l_V: labels on the vertices

    • l_E: labels on the edges

  • Note:

    • Nodes correspond to data items

    • Even databases that do not have an inherent graph structure can be modeled as graphs, e.g., relational databases
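
As a rough illustration, a data graph (V, E, r, l_V, l_E) might be held in a small Python class. This is a sketch under assumptions: the class name `DataGraph`, its field names and the choice of directed edges are ours, not the paper's.

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable, Optional, Set, Tuple

Node = Hashable
Edge = Tuple[Node, Node]


@dataclass
class DataGraph:
    """Hypothetical encoding of a data graph (V, E, r, l_V, l_E)."""
    vertices: Set[Node] = field(default_factory=set)
    edges: Set[Edge] = field(default_factory=set)                 # directed (assumption)
    root: Optional[Node] = None                                   # r: optional root
    vertex_labels: Dict[Node, str] = field(default_factory=dict)  # l_V
    edge_labels: Dict[Edge, str] = field(default_factory=dict)    # l_E

    def add_vertex(self, v: Node, label: str) -> None:
        self.vertices.add(v)
        self.vertex_labels[v] = label

    def add_edge(self, u: Node, v: Node, label: str = "") -> None:
        # Both endpoints must already be vertices of the graph.
        if u not in self.vertices or v not in self.vertices:
            raise KeyError("both endpoints must be added as vertices first")
        self.edges.add((u, v))
        self.edge_labels[(u, v)] = label


g = DataGraph(root=0)
g.add_vertex(0, "University")
g.add_vertex(1, "Name")
g.add_edge(0, 1)
print(len(g.vertices), len(g.edges), g.vertex_labels[g.root])  # 2 1 University
```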


XML as a Data Graph

[Figure: an XML document about a university drawn as a data graph; node labels include University, Dept, Faculty, Professor, Lecturer, Name and Teaches, and values include Technion, Computer Science, Biology, Avi Levy, Chana Israeli, Databases, Bioinformatics and Molecular Biology]
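
The XML modeling above can be sketched with Python's standard `xml.etree.ElementTree` module: each element becomes a vertex labeled with its tag, and non-blank element text becomes a leaf vertex. The XML fragment and the function name `xml_to_data_graph` are hypothetical, and treating text as separate leaf vertices is only one possible encoding.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Set, Tuple

# Hypothetical XML fragment, loosely modeled on the figure above.
XML = """<University>
  <Name>Technion</Name>
  <Dept>
    <Name>Biology</Name>
    <Faculty>
      <Lecturer><Name>Chana Israeli</Name><Teaches>Molecular Biology</Teaches></Lecturer>
    </Faculty>
  </Dept>
</University>"""


def xml_to_data_graph(text: str) -> Tuple[Dict[int, str], Set[Tuple[int, int]], int]:
    """Return (vertex labels, parent -> child edges, root id).

    Each element becomes a vertex labeled with its tag; non-blank element
    text becomes a leaf vertex labeled with that text. Tail text is ignored.
    """
    root_elem = ET.fromstring(text)
    labels: Dict[int, str] = {}
    edges: Set[Tuple[int, int]] = set()

    def new_vertex(label: str) -> int:
        vid = len(labels)
        labels[vid] = label
        return vid

    root_id = new_vertex(root_elem.tag)
    stack: List[Tuple[ET.Element, int]] = [(root_elem, root_id)]
    while stack:
        elem, vid = stack.pop()
        if elem.text and elem.text.strip():
            edges.add((vid, new_vertex(elem.text.strip())))
        for child in elem:
            child_id = new_vertex(child.tag)
            edges.add((vid, child_id))
            stack.append((child, child_id))
    return labels, edges, root_id


labels, edges, root = xml_to_data_graph(XML)
print(labels[root], len(labels), len(edges))  # University 12 11
```

Because the document is a tree, the resulting data graph has one edge fewer than it has vertices.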


Relational Database as a Data Graph

[Figure: a relational database drawn as a data graph; each tuple is a node labeled with its relation (C = Climates, A = Accommodations, S = Sites) and its values]

  • Climates (C): (Canada, diverse), (UK, temperate), (USA, temperate)

  • Accommodations (A): (UK, London, Plaza), (Canada, Montreal, Hilton), (Canada, Toronto, Ramada)

  • Sites (S): (UK, London, Buckingham), (USA, NY, Metropolitan)
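
A minimal sketch of the relational case, assuming each tuple becomes one node labeled with its relation initial and its values (the helper name `tuples_as_nodes` is hypothetical). The slides show no edges between tuples, so none are created here.

```python
from typing import Dict, List, Tuple

# The tuples listed above, keyed by relation initial
# (C = Climates, A = Accommodations, S = Sites).
RELATIONS: Dict[str, List[Tuple[str, ...]]] = {
    "C": [("Canada", "diverse"), ("UK", "temperate"), ("USA", "temperate")],
    "A": [("UK", "London", "Plaza"), ("Canada", "Montreal", "Hilton"),
          ("Canada", "Toronto", "Ramada")],
    "S": [("UK", "London", "Buckingham"), ("USA", "NY", "Metropolitan")],
}


def tuples_as_nodes(relations: Dict[str, List[Tuple[str, ...]]]):
    """Turn every tuple into a data-graph node labeled (relation, values)."""
    return [(rel, values) for rel, rows in relations.items() for values in rows]


nodes = tuples_as_nodes(RELATIONS)
print(len(nodes), nodes[0])  # 8 ('C', ('Canada', 'diverse'))
```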


Queries

  • Queries are modeled as query graphs: (V, E, r, C_V, C_E, s)

    • r: an optional designated root

    • C_V: vertex constraints (basically, a Boolean function on vertices)

    • C_E: edge constraints (basically, a Boolean function on pairs of vertices)

    • s: a structural constraint, one of the letters C, R or N, which defines the required structure of answers (connected, rooted or none)

  • Note: Nodes correspond to query variables
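
A query graph (V, E, r, C_V, C_E, s) differs from a data graph mainly in carrying constraints instead of labels. The sketch below represents C_V and C_E as Python callables; the class `QueryGraph`, its field names and the validation rules in `__post_init__` are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional, Set, Tuple

VertexConstraint = Callable[[Any], bool]     # database node -> bool
EdgeConstraint = Callable[[Any, Any], bool]  # pair of database nodes -> bool


@dataclass
class QueryGraph:
    """Hypothetical encoding of a query graph (V, E, r, C_V, C_E, s)."""
    vertices: Set[str]
    edges: Set[Tuple[str, str]]
    vertex_constraints: Dict[str, VertexConstraint]          # C_V
    edge_constraints: Dict[Tuple[str, str], EdgeConstraint]  # C_E
    structure: str = "N"        # s: "C" (connected), "R" (rooted) or "N" (none)
    root: Optional[str] = None  # r: optional designated root

    def __post_init__(self) -> None:
        if self.structure not in {"C", "R", "N"}:
            raise ValueError("structural constraint must be C, R or N")
        if self.structure == "R" and self.root is None:
            raise ValueError("a rooted query needs a designated root")
        if set(self.edge_constraints) != self.edges:
            raise ValueError("every query edge needs exactly one edge constraint")


# Two query nodes joined on their first attribute (hypothetical example).
q = QueryGraph(
    vertices={"q1", "q2"},
    edges={("q1", "q2")},
    vertex_constraints={"q1": lambda d: d[0] == "C", "q2": lambda d: d[0] == "A"},
    edge_constraints={("q1", "q2"): lambda d1, d2: d1[1][0] == d2[1][0]},
    structure="C",
)
print(q.edge_constraints[("q1", "q2")](("C", ("UK", "temperate")), ("A", ("UK", "London", "Plaza"))))
```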


XML Query as a Graph

  • Returns faculty members from the Biology Department

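[Figure: the query graph, with structural constraint Rooted]

  • Vertex constraints: label = University (the root); label = Dept and ContainsText(Biology); label = Faculty; label = Name

  • Edge constraints: the Dept node is a descendant of the University node; the Faculty node is a child of the Dept node; the Name node is a grandchild of the Faculty node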
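
The XML query's constraints can be read as predicates over elements. This sketch uses `xml.etree.ElementTree` with a child-to-parent map, since elements do not store their parents; the query-node names `u`, `d`, `f`, `n`, the XML fragment, and the reading of ContainsText as a substring test over all nested text are hypothetical. The Rooted structural constraint is not checked here.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML fragment; only its shape matters here.
DOC = ET.fromstring(
    "<University><Dept>Biology<Faculty><Professor><Name>Avi Levy</Name>"
    "</Professor></Faculty></Dept></University>"
)
# Elements do not know their parents, so build a child -> parent map.
PARENT = {child: parent for parent in DOC.iter() for child in parent}


def is_child(parent, child) -> bool:
    return PARENT.get(child) is parent


def is_grandchild(ancestor, node) -> bool:
    p = PARENT.get(node)
    return p is not None and PARENT.get(p) is ancestor


def is_descendant(ancestor, node) -> bool:
    p = PARENT.get(node)
    while p is not None:
        if p is ancestor:
            return True
        p = PARENT.get(p)
    return False


def contains_text(elem, word: str) -> bool:
    # itertext() yields all text inside the element, including nested elements.
    return word in "".join(elem.itertext())


vertex_constraints = {
    "u": lambda e: e.tag == "University",
    "d": lambda e: e.tag == "Dept" and contains_text(e, "Biology"),
    "f": lambda e: e.tag == "Faculty",
    "n": lambda e: e.tag == "Name",
}
edge_constraints = {("u", "d"): is_descendant, ("d", "f"): is_child, ("f", "n"): is_grandchild}

# One candidate assignment of query nodes to elements.
assignment = {"u": DOC, "d": DOC.find("Dept"), "f": DOC.find("Dept/Faculty"), "n": DOC.find(".//Name")}
ok_vertices = all(vertex_constraints[q](e) for q, e in assignment.items())
ok_edges = all(c(assignment[a], assignment[b]) for (a, b), c in edge_constraints.items())
print(ok_vertices, ok_edges)
```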


Join Query as a Graph

  • C ⋈ A ⋈ S

Structural Constraint: Connected

  • Vertex constraints: q1 belongs to C; q2 belongs to A; q3 belongs to S

  • Edge constraints: (q1, q2): C.Country = A.Country; (q1, q3): C.Country = S.Country; (q2, q3): A.Country = S.Country and A.City = S.City
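
For the join query, the vertex and edge constraints become predicates over tuples encoded as (relation, values). The attribute positions (Country first, City second) are an assumption about the schemas, which the slides do not spell out.

```python
# Vertex constraints: "Belongs to: R".
vertex_constraints = {
    "q1": lambda d: d[0] == "C",
    "q2": lambda d: d[0] == "A",
    "q3": lambda d: d[0] == "S",
}
# Edge constraints; d[1] holds the tuple values (Country first, City second).
edge_constraints = {
    ("q1", "q2"): lambda c, a: c[1][0] == a[1][0],                       # C.Country = A.Country
    ("q1", "q3"): lambda c, s: c[1][0] == s[1][0],                       # C.Country = S.Country
    ("q2", "q3"): lambda a, s: a[1][0] == s[1][0] and a[1][1] == s[1][1],  # Country and City
}

binding = {
    "q1": ("C", ("UK", "temperate")),
    "q2": ("A", ("UK", "London", "Plaza")),
    "q3": ("S", ("UK", "London", "Buckingham")),
}
satisfied = (all(vertex_constraints[q](d) for q, d in binding.items())
             and all(c(binding[x], binding[y]) for (x, y), c in edge_constraints.items()))
print(satisfied)  # True
```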


Assignment Graphs

  • Assignment graphs are used to compactly represent assignments of query nodes to database nodes

  • Basically, the assignment graph for Q and D, written QD, has:

    • a node (q,d) for each pair q ∈ Q and d ∈ D such that d satisfies the constraint on q

    • an edge ((q,d), (q’,d’)) if there is an edge (q,q’) in Q and (d,d’) satisfies the constraint on (q,q’)

    • possibly a root (details omitted)
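
A brute-force sketch of building QD, assuming database nodes are encoded as (relation, values) tuples and constraints as Python callables; the function name `assignment_graph` is hypothetical and the optional root is omitted. For the join query over the tuples of the relational example it yields 8 nodes and 6 edges.

```python
from itertools import product

# Database nodes: (relation, tuple values).
D = [
    ("C", ("UK", "temperate")), ("C", ("Canada", "diverse")), ("C", ("USA", "temperate")),
    ("A", ("UK", "London", "Plaza")), ("A", ("Canada", "Toronto", "Ramada")),
    ("A", ("Canada", "Montreal", "Hilton")),
    ("S", ("UK", "London", "Buckingham")), ("S", ("USA", "NY", "Metropolitan")),
]
VC = {"q1": lambda d: d[0] == "C", "q2": lambda d: d[0] == "A", "q3": lambda d: d[0] == "S"}
EC = {
    ("q1", "q2"): lambda c, a: c[1][0] == a[1][0],    # C.Country = A.Country
    ("q1", "q3"): lambda c, s: c[1][0] == s[1][0],    # C.Country = S.Country
    ("q2", "q3"): lambda a, s: a[1][:2] == s[1][:2],  # same Country and City
}


def assignment_graph(vc, ec, db):
    """Nodes (q, d) where d satisfies C_V(q); edges where (d, d') satisfies C_E(q, q')."""
    nodes = {(q, d) for q, d in product(vc, db) if vc[q](d)}
    edges = {
        ((q, d), (q2, d2))
        for (q, q2), cond in ec.items()
        for d, d2 in product(db, db)
        if (q, d) in nodes and (q2, d2) in nodes and cond(d, d2)
    }
    return nodes, edges


nodes, edges = assignment_graph(VC, EC, D)
print(len(nodes), len(edges))  # 8 6
```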

[Figure: the assignment graph QD for the join query over the relational database, built up step by step. Its nodes are (q1, c1), (q1, c2), (q1, c3), (q2, a1), (q2, a2), (q2, a3), (q3, s1) and (q3, s2), where c1 = (UK, temperate), c2 = (Canada, diverse) and c3 = (USA, temperate) are Climates tuples; a1 = (UK, London, Plaza), a2 = (Canada, Toronto, Ramada) and a3 = (Canada, Montreal, Hilton) are Accommodations tuples; and s1 = (UK, London, Buckingham) and s2 = (USA, NY, Metropolitan) are Sites tuples]


Partial Assignment

  • A partial assignment is any subgraph of QD that does not contain two different nodes (q,d) and (q,d’)

    • otherwise, it would map the query node q to two different database nodes

  • We can distinguish special types of partial assignments:

    • vertex complete: every query node appears in the partial assignment

    • edge complete: every edge constraint between query variables in the partial assignment holds

    • structurally consistent: the partial assignment satisfies the query’s structural constraint
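
The three special kinds of partial assignments can be checked directly. This sketch works on a tiny, hypothetical assignment graph (`QD_EDGES`) for a query whose structural constraint is C, so structural consistency is checked as connectivity only; the rooted case is not covered, and ignoring edge direction for connectivity is an assumption.

```python
from collections import deque

# Tiny hypothetical assignment graph; nodes are (query node, database node) pairs.
Q_NODES = {"q1", "q2", "q3"}
Q_EDGES = {("q1", "q2"), ("q1", "q3"), ("q2", "q3")}
QD_EDGES = {
    (("q1", "c1"), ("q2", "a1")), (("q1", "c1"), ("q3", "s1")),
    (("q2", "a1"), ("q3", "s1")), (("q1", "c2"), ("q2", "a2")),
}


def is_partial_assignment(sub):
    """No query node may be mapped to two different database nodes."""
    qs = [q for q, _ in sub]
    return len(qs) == len(set(qs))


def vertex_complete(sub):
    return {q for q, _ in sub} == Q_NODES


def edge_complete(sub):
    """Every query edge between two assigned query nodes must be an edge of QD.
    Assumes `sub` is already a partial assignment, so dict(sub) loses nothing."""
    where = dict(sub)
    return all(((q, where[q]), (q2, where[q2])) in QD_EDGES
               for q, q2 in Q_EDGES if q in where and q2 in where)


def connected(sub):
    """Structural constraint C: the induced subgraph is connected (directions ignored)."""
    sub = set(sub)
    if not sub:
        return True
    adj = {v: set() for v in sub}
    for u, v in QD_EDGES:
        if u in sub and v in sub:
            adj[u].add(v)
            adj[v].add(u)
    start = next(iter(sub))
    seen, todo = {start}, deque([start])
    while todo:
        for w in adj[todo.popleft()] - seen:
            seen.add(w)
            todo.append(w)
    return seen == sub


full = {("q1", "c1"), ("q2", "a1"), ("q3", "s1")}
print(is_partial_assignment(full), vertex_complete(full), edge_complete(full), connected(full))
```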


Example

[Figure: three partial assignments drawn in the assignment graph over the nodes (q1, c1), (q1, c2), (q1, c3), (q2, a1), (q2, a2), (q2, a3), (q3, s1) and (q3, s2); each is checked for being vertex complete, edge complete and structurally consistent]


Semantics

  • All partial assignments for Q over D that satisfy the vertex and edge constraints are encoded in QD

  • A semantics defines which subgraphs of the assignment graph (i.e., which partial assignments) are in fact answers, e.g.,

    • Sves allows all partial assignments that are vertex complete, edge complete and structurally consistent

    • Ses allows all partial assignments that are edge complete and structurally consistent

    • Ss allows all partial assignments that are structurally consistent

  • Usually, we are only interested in maximal partial assignments
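
A semantics can be viewed as a predicate on subgraphs of QD, and the maximal answers are the subgraphs that satisfy it and are not strictly contained in another such subgraph. The brute-force sketch below enumerates every subset, so it is exponential and only meant to make the definition concrete; the names `maximal_answers` and `toy` are hypothetical.

```python
from itertools import combinations


def all_subsets(nodes):
    nodes = list(nodes)
    return [frozenset(c) for k in range(len(nodes) + 1) for c in combinations(nodes, k)]


def maximal_answers(nodes, semantics):
    """Brute force: answers under `semantics` not strictly contained in another answer."""
    answers = [s for s in all_subsets(nodes) if semantics(s)]
    return [a for a in answers if not any(a < b for b in answers)]


def toy(s):
    """Hypothetical semantics: an answer may not contain both x and y."""
    return not {"x", "y"} <= s


print(sorted(sorted(a) for a in maximal_answers(["x", "y", "z"], toy)))  # [['x', 'z'], ['y', 'z']]
```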


Example: Join

Using semantics Sves we get the natural join

[Figure: the maximal Sves answer highlighted in the assignment graph]


Example: Join “becomes” a Full Disjunction

Using semantics Ses we get the full disjunction

[Figure: the maximal Ses answers highlighted in the assignment graph]
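
A brute-force sketch can reproduce both examples on the join query. The node names c1 to s2 follow the assignment-graph figure, with the pairing of names to tuples being our reading of that figure; the attribute positions (Country first, then City) are assumed, and enumerating all 256 subsets of the 8 nodes is only viable for toy inputs.

```python
from itertools import combinations, product

# Database nodes: (relation, values).
D = {
    "c1": ("C", ("UK", "temperate")), "c2": ("C", ("Canada", "diverse")),
    "c3": ("C", ("USA", "temperate")),
    "a1": ("A", ("UK", "London", "Plaza")), "a2": ("A", ("Canada", "Toronto", "Ramada")),
    "a3": ("A", ("Canada", "Montreal", "Hilton")),
    "s1": ("S", ("UK", "London", "Buckingham")), "s2": ("S", ("USA", "NY", "Metropolitan")),
}
BELONGS_TO = {"q1": "C", "q2": "A", "q3": "S"}  # vertex constraints
EC = {
    ("q1", "q2"): lambda x, y: x[0] == y[0],    # C.Country = A.Country
    ("q1", "q3"): lambda x, y: x[0] == y[0],    # C.Country = S.Country
    ("q2", "q3"): lambda x, y: x[:2] == y[:2],  # A.Country = S.Country and A.City = S.City
}

# Assignment graph QD, with undirected edges stored as frozensets.
NODES = {(q, d) for q, d in product(BELONGS_TO, D) if D[d][0] == BELONGS_TO[q]}
EDGES = {frozenset({(q, d), (q2, d2)}) for (q, q2), ok in EC.items()
         for d, d2 in product(D, D)
         if (q, d) in NODES and (q2, d2) in NODES and ok(D[d][1], D[d2][1])}


def partial(s):
    qs = [q for q, _ in s]
    return len(qs) == len(set(qs))


def vertex_complete(s):
    return {q for q, _ in s} == set(BELONGS_TO)


def edge_complete(s):
    where = dict(s)  # safe because s is checked to be a partial assignment first
    return all(frozenset({(q, where[q]), (q2, where[q2])}) in EDGES
               for q, q2 in EC if q in where and q2 in where)


def connected(s):
    if not s:
        return True
    start = next(iter(s))
    seen, todo = {start}, [start]
    while todo:
        v = todo.pop()
        for w in s - seen:
            if frozenset({v, w}) in EDGES:
                seen.add(w)
                todo.append(w)
    return seen == s


def s_ves(s):
    return vertex_complete(s) and edge_complete(s) and connected(s)


def s_es(s):
    return edge_complete(s) and connected(s)


def maximal_answers(semantics):
    subs = (frozenset(c) for k in range(len(NODES) + 1) for c in combinations(NODES, k))
    answers = [s for s in subs if partial(s) and semantics(s)]
    return [a for a in answers if not any(a < b for b in answers)]


def names(answers):
    return sorted(sorted(d for _, d in a) for a in answers)


print(names(maximal_answers(s_ves)))  # [['a1', 'c1', 's1']]
print(names(maximal_answers(s_es)))   # [['a1', 'c1', 's1'], ['a2', 'c2'], ['a3', 'c2'], ['c3', 's2']]
```

Under Sves only the single complete join tuple survives; under Ses the dangling Canada and USA combinations also appear, as in a full disjunction.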


Other Examples

  • Queries with incomplete answers over semistructured data: Kanza, Nutt, Sagiv (PODS 99)

    • Weak semantics is modeled by Ses; or-semantics is modeled by Ss

  • FleXPath: Amer-Yahia, Lakshmanan, Pandit (SIGMOD 04)

    • Modeled by Ses

  • Interconnections: Cohen, Kanza, Sagiv (03)

    • Complete interconnection can be modeled by Ses; Reachable interconnection can be modeled by Ss


Semantics are a type of Graph Property

  • A graph property P is a set of graphs, e.g.,

    • is a clique

    • is a bipartite graph

  • For every Q and D, a semantics defines a set of graphs (these graphs are subgraphs of QD)

  • Therefore, semantics are a type of graph property
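
Since a graph property is a set of graphs, in code it can be represented by its membership test. A minimal sketch for the two examples above, assuming undirected simple graphs given as a vertex set plus a set of two-element frozensets (this representation is our choice):

```python
from itertools import combinations


def is_clique(g) -> bool:
    vs, es = g
    return all(frozenset(p) in es for p in combinations(vs, 2))


def is_bipartite(g) -> bool:
    """Two-color each connected component by depth-first search."""
    vs, es = g
    color = {}
    for s in vs:
        if s in color:
            continue
        color[s] = 0
        stack = [s]
        while stack:
            v = stack.pop()
            for w in vs:
                if frozenset((v, w)) in es:
                    if w not in color:
                        color[w] = 1 - color[v]
                        stack.append(w)
                    elif color[w] == color[v]:
                        return False
    return True


triangle = ({1, 2, 3}, {frozenset(e) for e in [(1, 2), (2, 3), (1, 3)]})
square = ({1, 2, 3, 4}, {frozenset(e) for e in [(1, 2), (2, 3), (3, 4), (4, 1)]})
print(is_clique(triangle), is_bipartite(triangle), is_clique(square), is_bipartite(square))
# True False False True
```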


Hereditary Graph Properties and their Variants

  • There are several interesting types of graph properties that have been studied in graph theory

  • A graph property P is hereditary if every induced subgraph of a graph in P is also in P (e.g., is a clique, is a forest)

  • A graph property P is connected-hereditary if every connected induced subgraph of a graph in P is also in P (e.g., is a tree)

  • Rooted-hereditary properties can be defined similarly
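
The definitions can be tested by brute force on a single graph: check whether every nonempty (connected) induced subgraph still has the property. This gives only a necessary condition: it can refute heredity on a given graph, not prove it for all graphs. The helper names are hypothetical; the example shows that "is a tree" fails the hereditary test on a three-vertex path but passes the connected-hereditary one.

```python
from itertools import combinations


def induced(g, keep):
    vs, es = g
    keep = frozenset(keep)
    return keep, frozenset(e for e in es if e <= keep)


def is_connected(g):
    vs, es = g
    if not vs:
        return True
    start = next(iter(vs))
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for e in es:
            if v in e:
                for w in e - seen:
                    seen.add(w)
                    stack.append(w)
    return seen == vs


def is_tree(g):
    vs, es = g
    return len(vs) >= 1 and is_connected(g) and len(es) == len(vs) - 1


def closed_on(g, prop, connected_only=False):
    """True if every nonempty induced subgraph of g (optionally only the
    connected ones) also satisfies prop."""
    vs = list(g[0])
    for k in range(1, len(vs) + 1):
        for keep in combinations(vs, k):
            h = induced(g, keep)
            if connected_only and not is_connected(h):
                continue
            if not prop(h):
                return False
    return True


path = (frozenset({1, 2, 3}), frozenset({frozenset({1, 2}), frozenset({2, 3})}))
print(is_tree(path), closed_on(path, is_tree), closed_on(path, is_tree, connected_only=True))
# True False True
```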


Semantics are usually Hereditary

  • Most semantics for partial answers considered in the past are hereditary (in some sense), i.e., subgraphs of a partial answer are also partial answers

  • Many semantics require connectivity of results (e.g., full disjunctions)

  • Some require answers to be rooted (e.g., FleXPath)


Maximal P-Subgraph Problem

  • Given a graph property P and a graph G, the maximal P-subgraph problem is to find all maximal induced subgraphs of G that have property P

  • Therefore, the problem of finding all maximal answers for a query over a database, under a given semantics, is a special case of the maximal P-subgraph problem
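
A direct, exponential-time sketch of the maximal P-subgraph problem: enumerate vertex subsets, keep those whose induced subgraph has property P, and return the maximal ones. It is meant only to pin down the definition; the function name and the example graph are hypothetical.

```python
from itertools import combinations


def maximal_p_subgraphs(vertices, edges, prop):
    """All vertex sets S such that G[S] has `prop` and no strictly larger
    vertex set with the property contains S (brute force)."""
    vertices = list(vertices)
    good = []
    for k in range(len(vertices) + 1):
        for keep in combinations(vertices, k):
            s = frozenset(keep)
            if prop(s, {e for e in edges if e <= s}):  # induced edges only
                good.append(s)
    return [s for s in good if not any(s < t for t in good)]


def is_clique(vs, es):
    # In a simple graph, the induced subgraph is complete iff it has n(n-1)/2 edges.
    return len(es) == len(vs) * (len(vs) - 1) // 2


# Hypothetical graph: triangle 1-2-3 plus vertex 4 attached to 3.
V = {1, 2, 3, 4}
E = {frozenset(e) for e in [(1, 2), (2, 3), (1, 3), (3, 4)]}
print(sorted(sorted(s) for s in maximal_p_subgraphs(V, E, is_clique)))  # [[1, 2, 3], [3, 4]]
```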


Efficient Query Evaluation

  • There are efficient algorithms that find all maximal P-subgraphs for hereditary, connected-hereditary and rooted-hereditary properties

    • Efficient in terms of the input and the output (i.e., incremental polynomial time)

  • Use these algorithms to find maximal query answers, e.g., to find full disjunctions, weak answers, or-answers, etc.

    • Improves upon previous results
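
The slide does not spell out the algorithms. For hereditary properties, one generic extend-and-branch scheme, of the kind used to enumerate maximal independent sets, can be sketched as follows: greedily extend to a maximal set, then for each found set S and vertex v not in S, extend every maximal P-subset of S ∪ {v} that contains v. This is not claimed to be the paper's algorithm; its subroutine here is brute force, so the sketch is efficient only when that subroutine is, and connected- or rooted-hereditary properties would need adjustments.

```python
from itertools import combinations


def extend(s, vertices, prop):
    """Greedily add vertices while prop holds. For a hereditary prop one pass
    suffices: if s + {v} fails, every superset of s plus v fails as well."""
    s = set(s)
    for v in vertices:
        if v not in s and prop(s | {v}):
            s.add(v)
    return frozenset(s)


def maximal_subsets_containing(base, v, prop):
    """Brute-force subroutine: maximal P-sets inside base | {v} that contain v."""
    pool = list(base)
    good = [frozenset(c) | {v} for k in range(len(pool) + 1)
            for c in combinations(pool, k) if prop(frozenset(c) | {v})]
    return [t for t in good if not any(t < u for u in good)]


def enumerate_maximal(vertices, prop):
    vertices = list(vertices)
    first = extend(set(), vertices, prop)
    found, queue = {first}, [first]
    while queue:
        s = queue.pop()
        for v in vertices:
            if v in s:
                continue
            for t in maximal_subsets_containing(s, v, prop):
                m = extend(t, vertices, prop)
                if m not in found:
                    found.add(m)
                    queue.append(m)
    return found


# Hypothetical graph: triangle 1-2-3 plus vertex 4 attached to 3; property: is a clique.
E = {frozenset(e) for e in [(1, 2), (2, 3), (1, 3), (3, 4)]}


def is_clique(s):
    return all(frozenset(p) in E for p in combinations(s, 2))


print(sorted(sorted(s) for s in enumerate_maximal({1, 2, 3, 4}, is_clique)))  # [[1, 2, 3], [3, 4]]
```

Each new maximal set shares strictly more vertices with any not-yet-found maximal set than some already-found one, which is why the queue eventually reaches every maximal P-subgraph.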


Conclusion

  • Presented an abstract framework

  • Can model many different types of queries, databases and semantics in the framework

  • Semantics in the framework are graph properties

  • Solve the maximal P-subgraph problem once and reuse it to find maximal query answers


Future Work

  • It is convenient to define ranking functions and return answers in ranking order

  • How/when can this be done in our framework?

  • Note: From the modeling it is immediately apparent that ranking cannot always be performed efficiently

    • Deciding whether there is a maximal P-subgraph with at least k nodes is NP-complete for nontrivial hereditary and connected-hereditary graph properties (Yannakakis, STOC 78)


Thank you! Questions?