beyond relations as sets multisets complex objects sequences and streams
Download
Skip this Video
Download Presentation
Beyond Relations as Sets: Multisets, Complex Objects, Sequences, and Streams

Loading in 2 Seconds...

play fullscreen
1 / 51

Beyond Relations as Sets: Multisets - PowerPoint PPT Presentation


  • 266 Views
  • Uploaded on

Beyond Relations as Sets: Multisets, Complex Objects, Sequences, and Streams. Equivalences Among Relational Expressions; Aho, Sagiv, and Ullman, SIAM J. Computing, 1979 Four Views of Complex Objects: A Sophisticate\'s Introduction; Hull, LNCS 361 Nested Relations and Complex Objects, 1987

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Beyond Relations as Sets: Multisets' - Pat_Xavi


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
beyond relations as sets multisets complex objects sequences and streams

Beyond Relations as Sets: Multisets, Complex Objects, Sequences, and Streams

Equivalences Among Relational Expressions; Aho, Sagiv, and Ullman, SIAM J. Computing, 1979

Four Views of Complex Objects: A Sophisticate\'s Introduction; Hull, LNCS 361 Nested Relations and Complex Objects, 1987

SRQL: Sorted Relational Query Language; Ramakrishnan et al., SSDBM 1998

Raghu Ramakrishnan

what s there beyond relations as sets
What’s there Beyond Relations As Sets?
  • Relations as multisets (real SQL)
  • Object id (Object DBMSs)
  • Nested structure (Object-Relational DBMSs)
  • Partial structure (e.g., XML)
  • Sequences (SQL:1999)
  • Temporal databases (versions, “valid” time)
  • Streams (data feeds, alert systems)
    • data/query duality
relations as multisets bags
Relations-As-Multisets (Bags)
  • What if the number of copies of a tuple matters?
    • E.g., Find the average salary
    • In fact, a relation is a multiset in SQL, by default.
  • The results for conjunctive query containment over relations-as-sets no longer hold!
    • Containment: No longer in NP (Chaudhuri-Vardi); equivalence is still in NP!
    • Containment of unions of conjunctive queries: Undecidable (Ioannidis-Ramakrishnan)

Lesson: Small changes in the data model can have a big impact!

querying sequences in sql 1999
Querying Sequences in SQL:1999
  • Trend analysis is difficult to do in SQL-92:
    • Find the % change in monthly sales
    • Find the top 5 product by total sales
    • Find the trailing n-day moving average of sales
    • The first two queries can be expressed with difficulty, but the third cannot even be expressed in SQL-92 if n is a parameter of the query.
  • The WINDOW clause in SQL:1999 allows us to write such queries over a table viewed as a sequence (implicitly, based on user-specified sort keys)
motivation

Stocks(date, symbol, close)

Trans(cust, symbol, date, shares)

Motivation
  • Find the trailing weekly moving average for each stock.
  • For each date in ‘97, find the ten cheapest stocks.
  • Compute the % change of each stock during ‘97, and then find stocks in the top 5% (those that changed the most).
  • For each week, find the stock in which Joe had the most invested.
sequences
Sequences
  • Simple sequence: A relation (logically) sorted on a key; key attributes are called sequencing attributes.
  • Composite sequence:A relation that is first partitioned usinggrouping attributes, then sorted within each partition bysequencing attributes.

ord(t): Ordinal # of tuple within its group

sequence algebra
Sequence Algebra
  • Relational algebra, extended to work over multisets of tuples, is the foundation for SQL.
  • We extend the relational operators to work on sorted relations.
  • In addition, we define some new operators over sequences. (Only one of these new operators is necessary; the rest are for convenience.)
extending relational algebra
Extending Relational Algebra
  • The operators are: s, p, x, u, -, distinct

scan use conditions over ord attr of input

  • To extend these ops to work on sorted relations, we must define the grouping and sequencing attributes of the result.
    • We do this simply: the grouping and sequencing attribute lists are empty for the output of the above ops. I.e., the result is unordered.
extending the join operator
Extending the Join Operator
  • Since cross-product (x) does not propagate the ord values of the input table to its output, we must define join as a primitive op!
    • Join ( ): Can refer to ord values of inputs.
    • Left-outer join ( ): Left tuples with no matching right tuple appear in result with NULLs in right tuple attrs.
    • Result has empty grouping / sequencing lists.
creating a sequence
Creating a Sequence
  • The sequencing operator y is our fundamental extension to rel algebra.

Applied to a table R,

with grouping attrs g and sequencing attrs s, it returns the corresponding composite sequence.

shiftall operator d
ShiftAll Operator D
  • D(R,I) joins each tuple t in sequence R with all tuples t’ in the same group such that ord(t’) = ord(t)+I.
    • If such t’ does not exist, use NULLs.
    • Can be defined as left-outer join followed by sequencing; included for convenience.
    • Can be extended to use a fixed ordinal such as FIRST or LAST, and also to align each tuple t with more than one other tuple t’.
shift operator d
Shift Operator d
  • Modifies each tuple t of R by adding the value of the sequencing attributes of tuple t’ (whose ord value is shifted by I).
  • Differs from ShiftAll (followed by projection of unwanted columns) if R has duplicates in the sequencing columns.
  • Like ShiftAll, Shift can be defined in terms of left-outer join and sequencing.
slide15

SELECT S.g,S.t,S.v,SHIFTALL(S,-1).v,SHIFT(S,1).t

FROM R GROUP BY g, SEQUENCE BY t AS S

  • SHIFTALL gives access to all attrs of the tuple(s) at the shifted ordinal position.
  • SHIFT gives the sequencing attr values of the tuple at the shifted ord position.
slide16

SELECT t, AVG(v) OVER 0 TO 1

FROM R SEQUENCE BY t

POSITIONAL

AGGREGATE

VALUE-BASED

AGGREGATE

SELECT t, AVG(v) OVER VALUES 0 TO 1

FROM R SEQUENCE BY t

srql queries

SELECT expr-list

FROM table-list

WHERE predicate

GROUP BY expr-list

SEQUENCE BY expr-list

WITH predicate

HAVING predicate

ORDER BY expr-list

SRQL Queries
  • FROM clause can contain expressions involving sequencing; first evaluate these and find cross-prod.
  • Then apply selections in WHERE clause, and sort as per GROUP BY and SEQUENCE BY clauses to produce a sequence.
  • For each tuple (and each agg expr in SELECT), identify window and apply agg fn.
  • Apply HAVING clause, finally ORDER BY.
slide18

SELECT V.name, Q.name

FROM Volcano AS V, Quake SEQUENCE BY time AS Q

WHERE Q.time <= V.time AND

(SHIFT(Q,1).time > V.time OR

SHIFT(Q,1).time IS NULL)

AND Q.magnitude > 7

  • For each volcano eruption where the most recent quake was > 7, find the name of this quake:
  • Using some syntactic sugar:

SELECT V.name, Q.name

FROM Volcano AS V, Quake SEQUENCE BY time AS Q

WHERE Q.time PRECEDES= V.time AND

AND Q.magnitude > 7

aggregate functions
Aggregate Functions
  • As in SQL, we need to extend the algebra to allow the use of aggregate functions like MIN and SUM.
    • Must define “window” for each application of an agg fn.
    • In SQL, window is a partition created by GROUP BY; agg op applied to each partition, and yields one value per partition.
  • In SRQL, for each agg fn, can have a different window for each tuple.
window aggregate op w
Window Aggregate Op w
  • w (R, p, f, V):
    • R is a sequence, f is an agg fn, V is an attr of R, p is a selection predicate; result is a sequence with same grouping/seq lists as R
    • For each input tuple t, output contains

< t, f(pv ( st.g=R.g and p(t) (R) ) ) >

  • I.e., for each tuple t of R, apply p(t) to find its “window” within the group of t, apply agg fn f to the multiset of V-values in window, and include with t in result.
slide21

SELECT t, AVG(v) OVER 0 TO 1

FROM R SEQUENCE BY t

POSITIONAL

AGGREGATE

VALUE-BASED

AGGREGATE

SELECT t, AVG(v) OVER VALUES 0 TO 1

FROM R SEQUENCE BY t

slide22

SELECT day, AVG(profit) OVER 0 TO 1

FROM Sales

SEQUENCE BY day

WITH vol>100

  • Find 2-day moving average of the profit made on sales volume > 100:
slide23

SELECT day, AVG(profit) OVER 0 TO 1

FROM Sales

SEQUENCE BY day

WHERE vol>100

  • Considering only days with sales volume > 100, find 2-day moving average of the profit:
  • Placing selection in WHERE clause affects window for positional aggregation!
slide24

SELECT product, day, AVG(vol) OVER 0 TO 1

FROM Sales

GROUP BY product

SEQUENCE BY day

  • Find the 2-day moving average of volume sold for each product:
  • In effect, creates a sequence by day for each product, and computes the moving average over each of these sequences.
  • Observe how this generalizes SQL’s GROUP BY: illustrates power of composite sequences and aggregation.
the window clause
The WINDOW Clause

SELECT L.state, T.month, AVG(S.sales) OVER W AS movavg

FROM Sales S, Times T, Locations L

WHERE S.timeid=T.timeid AND S.locid=L.locid

WINDOW W AS (PARTITION BY L.state

ORDER BY T.month

RANGE BETWEEN INTERVAL `1’ MONTH PRECEDING

AND INTERVAL `1’ MONTH FOLLOWING)

  • Let the result of the FROM and WHERE clauses be “Temp”.
  • (Conceptually) Temp is partitioned according to the PARTITION BY clause.
    • Similar to GROUP BY, but the answer has one row for each row in a partition, not one row per partition!
  • Each partition is sorted according to the ORDER BY clause.
  • For each row in a partition, the WINDOW clause creates a “window” of nearby (preceding or succeeding) tuples.
    • Can be value-based, as in example, using RANGE
    • Can be based on number of rows to include in the window, using ROWS clause
  • The aggregate function is evaluated for each row in the partition using the corresponding window.
    • New aggregate functions that are useful with windowing include RANK (position of a row within its partition) and its variants DENSE_RANK, PERCENT_RANK, CUME_DIST.
top n queries
Top N Queries
  • If you want to find the 10 (or so) cheapest cars, it would be nice if the DB could avoid computing the costs of all cars before sorting to determine the 10 cheapest.
    • Idea: Guess at a cost c such that the 10 cheapest all cost less than c, and that not too many more cost less. Then add the selection cost<c and evaluate the query.
      • If the guess is right, great, we avoid computation for cars that cost more than c.
      • If the guess is wrong, need to reset the selection and recompute the original query.
top n queries27
Top N Queries

SELECT P.pid, P.pname, S.sales

FROM Sales S, Products P

WHERE S.pid=P.pid AND S.locid=1 AND S.timeid=3

ORDER BY S.sales DESC

OPTIMIZE FOR 10 ROWS

  • OPTIMIZE FOR construct is not in SQL:1999!
  • Cut-off value c is chosen by optimizer.

SELECT P.pid, P.pname, S.sales

FROM Sales S, Products P

WHERE S.pid=P.pid AND S.locid=1 AND S.timeid=3

AND S.sales > c

ORDER BY S.sales DESC

streams are forever
Streams are Forever …
  • So no notion of query over “entire” stream
    • Queries must operate over parts of a stream
    • How do you identify “parts”?
      • Common answer: windows of various kinds
  • Continuous queries
    • Many queries of interest being continually evaluated over a given stream
      • What do you index? The collection of queries or the streaming data?
      • Duality of data/queries!
  • Rate-oriented optimizations
    • Can query evaluation/monitoring keep up with arrival rate?
    • If not, how do we achieve a good trade-off?
  • Stream statistics
    • Incremental computation of statistics
    • Can actually maintain stats over “entire” stream, but also over “groups” of tuples (window or value-based)
complex objects31
Complex Objects

Arise as a generalization of earlier models; in particular, we study them as extensions of the relational model.

Features:

Richer structuring facilities. e.g. context-dependent information (john, {joanna, goanna})

Precursor to XML’s semi-structured model

Data sharing (in models with object-identity, e.g., LDM)

Inheritance

Two streams of Complex Objects research:

Emphasis on structure of objects—our focus!

Emphasis on models for expressing rich semantics

look out for
Look Out For …
  • Treatment of identity
    • Value-based or object-based?
  • Use of sets
    • Restrictions make a big difference!
  • Also, Type vs. Class.

Starting Point

The relational model:

    • Only one basic domain of constants (D)
    • Only one type constructor: R:[A1: D, A2:D, …, An:D]
    • The only ‘objects’ are tuples.
complex objects33
Complex Objects

Types are constructed as follows:

  • Basic: Basic types T1,T2, …, Tn are given. An object of type Ti is an element of (the given) domain Di.
  • Record:If T1,T2, …, Tk are types, then T= [L1: T1,L2:T2, …,Lk:Tk] is also a type. An object of type T is an element of DOM(T1) DOM(T2)  ... DOM(Tk). (This set of possible values is DOM(T))
  • Set: If T1 is a type, then T = L:{T1} is also a type. An object of type T is a finite subset of DOM(T1). (What is DOM(T)?)

T

T

The Li’s are labels, or field names, drawn from a separate domain. No label is used twice in the same type.

L1

LK

L

T1 TK

T1

example

#5, 10k, [John, 30], [Joe, 1] [Jill, 30]

Toy,

#7, 20k, [Jack, 32],

Gumbo,

#7, 50k, [Jack, 32],

Example

Instance of a type: A set of objects of that type.

Example:

Basic types: int, char_str

Person = [Name: char_str, Age: int]

Emp = [EID: int, Sal: int, Who:person, Family:{Person}]

Dept = [Dname:char_str, Demps:{Emp}]

Instance of Dept:

update examples

274-7473390-2577

Syne, Nancy,

232-2323289=9887

Energy, John

Wefix, John,

URaft, Andy,

939-9983

Update Examples

Corp

Phones

CName

Owner

Phone

Insert [Lang, Sylvie, {989-2232}] at Corp

Delete 390-2577 at Phones – ‘Energy’ tuple is modified

Delete WeFix at CName – ‘WeFix’ tuple is deleted

Replace Andy by Sabra at Owner – Modifies the tuple

Delete Andy at OwnerInsert Sabra (into the person instance)Not ≡ to the ‘replace’ in 4.

nfnf relations pnf
NFNF Relations, PNF

NFNF Relation Types

T = [T1, T2, …, Tk], and

(a) S= {S1}  S1 = [ , …, ] (i.e., S1 is a record type)

(b) S = [S1, S2, …, Sm],  Si = {Si’} (i.e., a set type), or a basic type.

for all types S used in defining Ti’s.

Straightforward to obtain ‘equivalent’ NFNF type for any given type T.

Partitioned , or nested, normal form

A type T is in PNF if  type S used to define T: If S = [S1, S2, …, Sm], and S1, S2, …, Sk , k  m, are basic types, then S1, S2, …, Sk includes a key for S.

(Must hold, recursively, at every level of nesting!)

examples of nfnf

Bach, Dieter, Joseph, Jinyi

DeWitt, Jeff, Raghu

Bach, Dieter, Joseph, Jinyi

DeWitt, Jeff, Raghu

Bach, Jeff

Bach, DeWitt

AT&T

Sperry, Genentech

Examples of NFNF

UW-CS,

“Essentially all NFNF relationship in practice are in PNF.”

e.g.

UW-CS,

UW-CS, Theory,

UW-CS, DB,

UW-CS,

Bach/Jeff is one (of several such) combination(s) that can teach a course involving both theory and database.

But

, ……

UW-CS,

Med_Info_Sys,

Neither Sperry nor Genentech by itself can do the job, but together, they can.

Med_Info_Sys,

relational calculus trc

A variable that is not ‘bound’ through  or 

Relational Calculus (TRC)
  • Query language for FLAT tables.
  • Atomic Formula
    • R(S) – S is a tuple in R
    • d θ S.L or S.Lθd or S.LθS1.L1
  • Formula
    • Any atomic formula
    • p, pq, pq, pq
    • S (P(S)) – S is a variable
    • S (P(S)) – S is a variable
  • Query
    • {u | p(u)}, where u is the only ‘free variable’ in p.
query language extended relational calculus

R(S)

  • S  O.L
  • d θ O.L orO.L θ d or O.L θ O1.L1
  • O.L = {u|(u, t1, …, tn)}

The set of all u s.t. is true. u, t1, … tn are free variables of . If O is a variable, it is distinct. This is the biggie.

S is a tuple in R

S is in (the set) O.L

Comparisons , θ is =, <, > etc.

Query Language (Extended Relational Calculus)

Formulas built using , , , , , from atoms of the form:

example transitive closure

Anc = {p | t (almost(t)  p t}

Example: Transitive Closure

OK(t) =  s [s t ( p(Par(p)  s.1 = p.1)  q(Par(q)  s.2 = q.2))]

trans(t) =  p  q[p t  q t  OK(t)  p.2 = q.1  ( r(r t  r.1 = p.1  r.2 = q.2))]

contains(t) =  p [ Par(p)  p t]

almost(t) = OK(t)  trans(t)  contains(t)

Intersection of models

Note: The variable t takes on relations as values.

This underscores the PNF restriction—the intermediate relations are not in PNF. (Note that t is really t.1—a tuple whose only field takes on relations as values.) This query can’t be expressed if all relations are to be in PNF.

miscellaneous remarks

a1 b1 c1

c2

b2 c3

a2 b3 c4

b1 c5

Miscellaneous Remarks

1. PNF condition: [A, {[B, {C}]}]

a1 b1 c1

c2

b2 c3

a2 b3 c4

2. Transitive closure example

variables taking tuple values vs. variables taking relation values(Also, example is due to Vardi; similarly, domain independence is due to Dipaola.)

What if this is b3?

1 nf relational calculus
1 NF relational Calculus

e.g. {t |  R(t)}

An expression {t | ψ(t)} is SAFE if:

1. Whenever ψ(t) is true, each component (field) of t is in DOM(ψ).

2. For each sub formula of ψ of the form  u((u)), if any component of u is not in DOM(ψ), then  (u) is true for all values of the other free variables in .

3. For each sub formula of ψ of the form  u((u)), if  (u) is true for any values of the other free variables in , then each component of u is in DOM(ψ).

  • What is DOM(ψ)?
  • What does “safety” give us?
nfnf query languages algebra
NFNF Query Languages (Algebra)
  • We’ve already seen the calculus, which is as expressive as Datalog.
  • NFNF Algebra
    • The usual operators (Suitably generalized)
    • , ,, , , 
    • And some fancy additions:

(nest),  (unnest), (powerset)

  • Main Result
    • Safe calculus  Algebra
operators nest
Operators - Nest 
  • Let R = [R1, …, Rn] = [S1, …, Sk, T1, …, Tm]
  • Let I be an instance of R.
  • S=(S1…Sk) (I) : “Group S”
    • { u |  r  I (

(u.Ti = r.Ti, 1i  m)

(u[S] = {q[ S1, …, Sk] | q  I  (q.Ti = r.Ti, 1  i  m)} ) )

}

operators nest45

Phones

Emps

BName

Owner

{P.No.}

{[Ename, Ejob]}

288-2323

GmbH

Nancy

Karl

Researcher

232-4232

232-2345

Lucy

Designer

E.Inc.

John

Jim

Translator

877-2323

John

Explorer

W.Fix

John

Gail

Attorney

Operators - Nest 

What is : υallmine= (Bname, Phones, Emps) (Business)?

example46

Owner

Allmine

{(Bname, {P.No.}, {[Ename, Ejob]})}

Nancy

GmbH

288-2323

Karl

Researcher

232-4232

John

E.Inc.

232-2345

Lucy

Designer

877-2323

Jim

Translator

John

Explorer

W.Fix

Gail

Attorney

Example

What is : υallmine= (Bname, Phones, Emps) (Business)?

more on nest
More on Nest
  • Can you say anything about the key fields of a nested relation?
  • Is υ commutative?Let R = [A, B, C], I be an instance of R.

Is υD=(B) (υE=(C) (I)) = υE=(C) (υD=(B) (I)) ?

Find a (counter) example.

Both expressions have the same type:

⊛ [⊗[A, ⊛[⊗[B]], ⊛[⊗[C]]]]

operators unnest
Operators – Unnest 
  • Let R = {[R1, …, S, …,Rn]} where S = {[S1, …, Sm]}
  • Let I be an instance of R.

Then S(I) has type {[R1, …, S1, …, Sm, …,Rn]}

S(I) = {t |  r  I ( (t.Ri = r.Ri, 1  i  n) 

t [S1, …, Sm]  r.S)

}

example cont

(Phones){P.No.}

Owner

Phone

(Phones){P.No.}

Owner

Owner

Nancy

288-2323

Nancy

288-2323

Nancy

288-2323

232-4232

Nancy

232-4232

232-4232

John

232-2345

John

232-2345

John

232-2345

877-2323

John

877-2323

John

877-2323

John

Example – Cont.

Consider projected= owner, phones (business)

Another instance of projected

Projected

phones(Projected)

operators nest vs unnest
Operators – Nest vs. Unnest
  • υ and  are (almost) inverses. Let I be an instance of type R.
    • When is (Ri) (υ(Ri)I) = I?
    • What happens when:

1. The unnested field has different (set-)values for the same values in the other columns?

2. The unnested field has the empty set?

    • Can (2) happen following a nest on the same column?
    • When is υ(Ri) ((Ri)I) = I?
operators powerset
Operators – Powerset 

Let I be an instance of R.

(I) is of type {[R]}, and contains a tuple for each subset of I.

Express transitive closure using the NFNF algebra!

Theorem:

NFNF algebra is equivalent in expressive power to (safe) NFNF calculus.

ad