Loading in 5 sec....

Full Disjunctions : Polynomial-Delay Iterators in ActionPowerPoint Presentation

Full Disjunctions : Polynomial-Delay Iterators in Action

- By
**ranee** - Follow User

- 54 Views
- Uploaded on

Download Presentation
## PowerPoint Slideshow about ' Full Disjunctions : Polynomial-Delay Iterators in Action' - ranee

**An Image/Link below is provided (as is) to download presentation**

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript

### Full Disjunctions:Polynomial-Delay Iterators in Action

Contents

Itzhak Fadida

Yaron Kanza

Technion Israel

Technion Israel

University of Toronto

Canada

Benny Kimelfeld

Yehoshua Sagiv

Hebrew University

Israel

Hebrew University

Israel

VLDB 2006

Seoul, Korea

Computing Full Disjunctions

- The full disjunction is a relational operator that maximally combines data from several relations
- It extends the natural join by allowing incompleteness
- It extends the binaryouterjoin to many relations

- This paper presents algorithms and optimizations for computing full disjunctions
- Theoretically, full disjunctions are more tractable than previously known
- Practically, a significant improvement over the state-of-art, an iterator-like evaluation

Contents

- Full Disjunctions
- Complexity

- Contributions
- Algorithms
- Algorithm NLOJ for Tree-Structured Schemes
- Algorithm PDelayFD for General Schemes
- Algorithm BiComNLOJ − Main Algorithm

- Experimental Results
- Conclusion

Contents

- Full Disjunctions
- Complexity

- Contributions
- Algorithms
- Algorithm NLOJ for Tree-Structured Schemes
- Algorithm PDelayFD for General Schemes
- Algorithm BiComNLOJ − Main Algorithm

- Experimental Results
- Conclusion

The Natural Join Misses Information

Climates

Accommodations

Sites

Bahamas is not in Sites, so the natural join misses it

ClimatesAccommodationsSites

The Natural Join Misses Information

Empty space means nullvalue

Climates

Accommodations

Bahamas is not in Sites, so the natural join misses it

Mouth Logan is not in a city, hence missed

ClimatesAccommodationsSites

The Natural Join Misses Information

A looser notion of join is needed—one that enables joining tuples from some of the tables

Climates

Accommodations

Bahamas is not in Sites, so the natural join misses it

Mouth Logan is not in a city, hence missed

ClimatesAccommodationsSites

The Natural Join Operator

A tuple of the join corresponds to a set of tuples from the source relations

Climates

Accommodations

Sites

Join consistent

Connected

No Cartesian product

Complete

One tuple from each relation

ClimatesAccommodationsSites

Join-Consistent Sets of Tuples

A set T of tuples is join-consistent if every two tuples of T are join-consistent

Two tuples t1 and t2 are join-consistent

if for every common attribute A:

1. t1[A] and t2[A] are non-null

2.t1[A] = t2[A]

Connected Sets of Tuples

A set of tuples is connected if its join graph is connected

The join graph of a setT of tuples:

- The nodes are the tuples of T
- An edge between every two tuples with a common attribute

Natural Join (w/o Cartesian Product)

T is join consistent

1.

JCC

2.

3.

T is connected

No Cartesian product

T is complete

One tuple from each relation

Each tuple of the result corresponds to a

set T of tuples from the source relations

FullDisjunction (Galindo-Legaria 1994)

JCC

2.

3.

3.

T is connected

No Cartesian product

T is complete

One tuple from each relation

T is maximal

Not properly contained in any JCC set

Each tuple of the result corresponds to a

set T of tuples from the source relations

T is join consistent

1.

The Outerjoin Operator

R1R2

The natural joinR1 R2 and, in addition,

all dangling tuplespadded with nulls

Theouterjoinof two relations R1andR2

Combining Relations using Outerjoins

The outerjoin operator is not associative

For more than two relations, the result depends on the order in which the outerjoin is applied

In general, outerjoins cannot maximally combine relations (no matter what order is used)

Outerjoin is not suitable for combining more than two relations!

Contents

- FullDisjunctions
- Complexity

- Contributions
- Algorithms
- Algorithm NLOJ for Tree-Structured Schemes
- Algorithm PDelayFD for General Schemes
- Algorithm BiComNLOJ − Main Algorithm

- Experimental Results
- Conclusion

Efficiency of Evaluation

The full-disjunction operator (as well as other operators

like the Cartesian product or the natural join) can generate

an exponential(in the input size)number of tuples

Polynomial running time is not a suitable yardstick

The usual notion:

Polynomial time in the combined size of the input and the output

History of Algorithms for Full Disjunctions

Source

Time

Databases

g-acyclic

RU96

O(n+F2)

KS03

O(n5N2F2)

general

O(n3NF2)“incremental polynomial”

CS05

general

This paper:linear dependence on F

number of relations

number of tuples in the DB

number of tuples in the FD

F is typically very large Can be exponential in the size of the database

n:

N:

F:

Polynomial Delay

time

One way to obtain an evaluation with a running time

linear in the output is to devise an algorithm that acts

as an iterator with an efficient next() operator, that is,

An enumeration algorithm that

runs with polynomial delay

An enumeration algorithm runs with polynomial delay if the time between every two successive answers is polynomial in the size of the input

Other Benefits of Polynomial Delay

- Incremental evaluation
- First tuples are generated quickly
- Full disjunctions are large, yet the user need not wait for the whole result to be generated

- Suitable for Web applications, where users expect to get the first few pages quickly
- In addition, the user can decide anytime that enough information has been shown

- First tuples are generated quickly
- Enable parallel query processing
- While one processor generates the FD tuples, other processors apply further processing

Contents

- Full Disjunctions
- Complexity

- Contributions
- Algorithms
- Algorithm NLOJ for Tree-Structured Schemes
- Algorithm PDelayFD for General Schemes
- Algorithm BiComNLOJ − Main Algorithm

- Experimental Results
- Conclusion

Main Contributions

Substantial improvement over the state-of-art

is proved theoretically and experimentally

1.First algorithm for computing full disjunctions withpolynomial delay

2. First algorithm for computing full disjunctions in time linear in the output

3. A general optimizationtechniquefor computing full disjunctions

Division into biconnected components

Contents

- FullDisjunctions
- Complexity

- Contributions
- Algorithms
- Algorithm NLOJ for Tree-Structured Schemes
- Algorithm PDelayFD for General Schemes
- Algorithm BiComNLOJ − Main Algorithm

- Experimental Results
- Conclusion

Our Algorithms

Combine

Algorithm NLOJ

Tree Schemes

Algorithm PDelayFD

GeneralSchemes

Division into Biconnected Components

Optimization

Algorithm BiComNLOJ

Main Algorithm− GeneralSchemes

- Full Disjunctions
- Complexity

- Contributions
- Algorithms
- Algorithm NLOJ for Tree-Structured Schemes
- Algorithm PDelayFD for General Schemes
- Algorithm BiComNLOJ − Main Algorithm

- Experimental Results
- Conclusion

Tree Schemes

R1

R5

R2

R3

R6

R7

R4

Scheme graphs w/o cycles

In the scheme graph, the relation schemes are the nodes and there is an edge between every two schemes with one or more common attributes

Left-Deep Sequence of Outerjoins

AlgorithmNLOJ (Nested Loop OuterJoin)

R: a set of relations with a tree scheme

R1,…,Rn: a connected-prefix order of R

Proposition:

FD(R) = (…((R1R2) R3) …) Rn

1. Compute a connected-prefix order of R

2. Apply outerjoins in a left-deep order

Connected-Prefix Order of Relations

Aconnected-prefixorder of relations:

Each prefix forms a (connected) subtree

R1

R5

R2

R3

R6

R7

R4

R1

R3

R2

R7

R4

R5

R6

Achieving Polynomial Delay

R2

R3

Rn-1

Rn

Already exponential size!

AlgorithmNLOJ (Nested Loop OuterJoin)

1. Compute a connected-prefix order of R

2. Apply outerjoins in a left-deep order

R1

…

Problem: exp. delay

Solution: use iterators

Iterators

To obtain polynomial delay, we use iterators

- Operate on top of an enumeration algorithm
- Implement next() by controlling the execution

Algorithm

Iterator

next()

Outerjoins are not Always Applicable

It is not always possible to formulate a

full disjunction as a left-deep sequence

of outerjoins

Rajaraman and Ullman[PODS 96]:

Some full disjunctions cannot be formulated as expressions of outerjoins (i.e., with arbitrary placement of parentheses)

Contents

- Full Disjunctions
- Complexity

- Contributions
- Algorithms
- AlgorithmNLOJfor Tree-Structured Schemes
- Algorithm PDelayFDforGeneralSchemes
- Algorithm BiComNLOJ − Main Algorithm

- Experimental Results
- Conclusion

About the Algorithm

- Unlike NLOJ, the next algorithm, PDelayFD, is applicable to all schemes (and not just trees)
- Algorithm PDelayFD has a polynomial delay, but the delay islargerthan that of NLOJ
- Nevertheless, PDelayFD by itself is a significant improvement over the state-of-art

Shifting a Maximal JCC Tuple Set T

t-shifting T:

T

1.Add t to T

2.Extractmax. JCC subset containing t

3. Extend to a maximal JCC set

t-shift of T

t

t

t

Algorithm PDelayFD

Theorem:

Validate that the t-shift is not already in Q orC

1. Generate a max. JCC set T0

2.Insert T0 into Q

PDelayFD(R) computes

FD(R)with polynomial delay

C

Q

Repeat until Q is empty:

1. Move some T from Q toC

2.Print the join ofT, padded with nulls

3.Insert into Qa t-shift of Tfor all

tuples t in the database

…

Output:

Contents

- Full Disjunctions
- Complexity

- Contributions
- Algorithms
- AlgorithmNLOJfor Tree-Structured Schemes
- AlgorithmPDelayFD for General Schemes
- AlgorithmBiComNLOJ− Main Algorithm

- Experimental Results
- Conclusion

NLOJ vs. PDelayFD

R3

R3

R2

R7

R2

R7

R8

R10

R1

R8

R10

R1

R6

R4

R5

R9

R6

R4

R5

R9

R3

R2

R7

R8

R10

R1

R6

R4

R5

R9

?

PDelayFD

NLOJ

- Shorter delays
- Less space
- Simpler to impl.

Our approach: divide and conquer

Biconnected Components

R1

R2

R5

R4

R3

R6

R8

R7

R9

R1

R5

R2

R3

R8

R6

R7

R4

Biconnected component:

A maximal subset Bof relations, s.t. the scheme graph has

two (or more) disjoint paths between every two relations ofB

Left-Deep Sequence of Outerjoins

R: a set of relations

Theorem:

There exists an (efficiently computable) order B1,…,Bk of the biconnected components ofR, s.t.

FD(R) = (…((FD(B1) FD(B2)) …) FD(Bk)

Optimized Algorithm:

1. Compute the biconnected components of R

2. Compute the full disjunction of each component

3. Apply outerjoins in a suitable order

BiComNLOJ: a Naïve Attempt

Iterator

Iterator

Iterator

Each FD(Bi) can be exponential in the input

1.DivideRinto biconnected components

→B1,…Bk in a suitable order

Non-polynomial delay!

2.ComputeFD(B1),…,FD(Bk)

− using PDelayFD

3. Using NLOJ, compute

(…((FD(B1) FD(B2)) …) FD(Bk)

Solution:

Retaining Polynomial Delay: 1st Problem

R2

R6

For simplification, assume only two components

R1

R3

R5

R7

R4

R8

B1

B2

- After generating a tuple t of FD(B1), we need to generate all tuples of FD(B2) that can join t
- Non-polynomial delay if all of FD(B2) is computed for finding these tuples!
- Solution:
- PDelayFD can be modified so that it generates only those tuples of FD(B2) that can join t

Details in the proceedings…

Retaining Polynomial Delay: 2nd Problem

R2

R6

For simplification, assume only two components

R1

R3

R5

R7

R4

R8

B1

B2

- The last step is to generate all tuples of FD(B2) that cannot be joined with tuples of FD(B1)
- However, this task is by itself NP-hard!
- Solution: When generating all tuples of FD(B2) that can be joined with some tuple of FD(B1), we collect enough information for generating the remaining tuples of FD(B2)

Details in the proceedings…

Contents

- Full Disjunctions
- Complexity

- Contributions
- Algorithms
- AlgorithmNLOJfor Tree-Structured Schemes
- AlgorithmPDelayFD for General Schemes
- Algorithm BiComNLOJ − Main Algorithm

- Experimental Results
- Conclusion

Experimental Setting

Implementation

R6

R2

R2

R7

R2

R3

R5

R7

R9

R5

R1

R3

R7

R9

R1

R3

R6

R4

R8

R10

R5

R4

R9

R6

R8

R10

R8

R10

R1

R4

Scheme S2

Scheme S1

Scheme S3

Algorithms:PDelayFD,BiComNLOJ(main)

IncrementalFD(CS05, state-of-art)

PosgreSQL(open source)

HW: Pentium4, 1.6GHZ, 512MB RAM

- Syntheticdata(randomly generated)
- Fixed schemes

State-of-Art vs. Main Algorithm

Scheme 1

Scheme 2

Scheme 3

IncrementalFD

(state of art, CS05)

BiComNJOJ

our main algorithm

Average Delay

(msec)

Number of Tuples in each Relation

BiComNLOJ is a substantial improvement over the state-of-art

Division into Biconnected Components

Scheme 1

Scheme 2

Scheme 3

PDelayFD

(no division to b.c.c.)

BiComNJOJ

our main algorithm

Average Delay

(msec)

Number of Tuples in each Relation

Division reduces delays

(amount depends on the scheme)

Behavior of Delay

Measure the delay before each generated tuple

IncrementalFD

(state of art, CS05)

BiComNJOJ

our main algorithm

Delay (msec)

Tuple Number

While IncrementalFDhas a slowdown, the delay of BiComNLOJ remains almost constant

Contents

- Full Disjunctions
- Complexity

- Contributions
- Algorithms
- AlgorithmNLOJfor Tree-Structured Schemes
- AlgorithmPDelayFD for General Schemes
- Algorithm BiComNLOJ − Main Algorithm

- Experimental Results
- Conclusion

Summary

Full Disjunction:

An associative extension of the outerjoin operator to an arbitrary number of relations

3 Algorithms for computing FD:

PDelayFD

Polynomial-DelayFull Disjunction

General Schemes

NLOJ

Nested-Loop Outerjoin

Tree-Structured Schemes

BiComNLOJ

Combine first 2, deploy div. into biconnected components

General Schemes

Contributions

- Substantial improvement of evaluation time over the state-of-art
- Proved theoretically and experimentally

- Full disjunctions can be computed with polynomial delay and in time linear in the output size
- Optimization techniques for computing FDs
- Implementation within PostgreSQL (ongoing…)
- Incorporating our algorithms into an SQL optimizer
- E.g., some operators can be pushed through the FD
- Not discussed here, appears in the proceedings…

Questions?

Download Presentation

Connecting to Server..