##### Computing Full Disjunctions


Yaron Kanza, Yehoshua Sagiv
The Selim and Rachel Benin School of Engineering and Computer Science, The Hebrew University of Jerusalem

**Preliminary Notations**
• Given a set of relations $r_1, \dots, r_n$ with schemes $R_1, \dots, R_n$, respectively:
  • We denote by $t_{ij}$ the $j$-th tuple of $r_i$
  • For $X \subseteq R_i$, we denote by $t_{ij}[X]$ the projection of $t_{ij}$ on $X$
• Next, we give some preliminary definitions

**Scheme Graph**
[Figure: an example scheme graph over the relation schemes Acted-in, Actors, Movies and Actors-that-Directed]
• Two distinct schemes $R_i$ and $R_j$ are connected if $R_i \cap R_j$ is non-empty
• The scheme graph of $R_1, \dots, R_n$ consists of
  • A node for each scheme $R_i$
  • An edge between $R_i$ and $R_j$ if $R_i$ and $R_j$ are connected

**Connected Relation Schemes**
[Figure: connected relation schemes (Acted-in, Movies, Actors) next to unconnected relation schemes (Movies, Actors)]
• Relation schemes $R_{i_1}, \dots, R_{i_m}$ are connected if their scheme graph is connected
• Tuples $t_{i_1j_1}, \dots, t_{i_mj_m}$ from $m$ distinct relations are connected if the relation schemes of these relations are connected

**Join Consistent Tuples**
• Two tuples $t_{i_1j_1}$ and $t_{i_2j_2}$ are join consistent if $t_{i_1j_1}[R_{i_1} \cap R_{i_2}] = t_{i_2j_2}[R_{i_1} \cap R_{i_2}]$
• $m$ tuples from $m$ distinct relations are join consistent if every pair of connected tuples among them is join consistent

**Universal Tuple**
• A universal tuple $u$ is defined over all the attributes in $R_1 \cup \dots \cup R_n$ and consists of null and non-null values
• We denote by $\hat{u}$ the non-null portion of $u$
• A universal tuple $u$ is called an integrated tuple if there are $m$ connected and join-consistent tuples $t_{i_1j_1}, \dots, t_{i_mj_m}$ such that $\hat{u}$ is the natural join of $t_{i_1j_1}, \dots, t_{i_mj_m}$

**Maximal Universal Tuple**
• A universal tuple $u$ subsumes a universal tuple $v$ if $u$ is equal to $v$ on all the non-null attributes of $v$ (i.e., $u$ can be created from $v$ by replacing some null values with non-null values)
• In a given set $D$, a tuple $u$ is maximal if no tuple in $D$ other than $u$ subsumes $u$

**A Full Disjunction**
• The full disjunction of $r_1, \dots, r_n$ is the set of all maximal integrated tuples that can be generated from $m$ tuples of $r_1, \dots, r_n$

**Acyclic Scheme**
• Given a set of schemes $R_1, \dots, R_n$, their scheme hypergraph consists of
  • A node for each attribute that appears in some $R_i$
  • For each $R_i$ ($1 \le i \le n$), a hyperedge that includes the attributes of $R_i$
• α-acyclic scheme hypergraph:
  • All the hyperedges can be removed by a sequence of ear removals
• γ-acyclic scheme hypergraph:
  • The Bachman diagram of the scheme hypergraph is acyclic

**Product Graph**
• Given a query $Q$ and a database $D$, the product of $Q$ and $D$ is a graph such that
  • The nodes are pairs consisting of a node of $Q$ and a node of $D$
  • There is an edge between two nodes if the two nodes of $Q$ and the two nodes of $D$ are connected by edges with the same label in $Q$ and in $D$, respectively
  • The root is the pair of the root of $Q$ and the root of $D$

[Figure: an example database and query. In the database, the root 1 has movie edges to nodes 2 and 3 and an actor edge to node 4; the movies are "Zelig" (1983, language English) and "Antz" (1998); the actor has the name "Woody Allen" and the date of birth 1/12/1935; director and filmography item edges connect the actor and the movies. The query has the variables v1, v2, v3 and w1–w4, with movie, actor, title, language, name, date of birth, director and filmography item edges.]
The product of the query and the database is the following graph.

[Figure: the part of the product graph reachable from the root (V1, 1): movie edges to (V2, 2) and (V2, 3); an actor edge to (V3, 4); title edges from (V2, 2) to (w1, 5) and from (V2, 3) to (w1, 8); a language edge from (V2, 2) to (w2, 6); name and date of birth edges from (V3, 4) to (w3, 10) and (w4, 11); a director edge between (V2, 2) and (V3, 4); and filmography item edges between (V3, 4) and both movie nodes]
There are additional nodes that are not reachable from the root.

**Matching as a Subgraph of the Product Graph**
• For a subgraph $G$ of the product graph, consider the following conditions:
  1. $G$ has no repeated variables
  2. $G$ contains the root
  3. Each node in $G$ is reachable from the root
  4. $G$ preserves the constraints (edges) of the query
• If $G$ satisfies conditions 1–3, it is an OR-matching graph
• If $G$ satisfies conditions 1–4, it is a weak-matching graph

[Figure: the subgraph of the product graph with the nodes (V1, 1), (V2, 2), (V3, 4), (w1, 5), (w2, 6), (w3, 10) and (w4, 11)]
An OR-matching graph. It is also a weak-matching graph.

[Figure: the subgraph of the product graph with the nodes (V1, 1), (V2, 3), (V3, 4), (w1, 8), (w3, 10) and (w4, 11)]
Another OR-matching graph. It is not a weak-matching graph, since the "director" edge of the query is not preserved.

**Matching Graphs**
• Each OR-matching graph represents an OR-matching (and each weak-matching graph represents a weak matching)
• An OR-matching can be represented by many OR-matching graphs, but all these graphs have the same set of nodes and differ only in their edges (the same is true for weak matchings and weak-matching graphs)

**Intuition**
• For DAG queries, matching graphs are constructed by adding edges according to the query constraints
• The extensions are simply ordered by a topological sort of the query nodes
• For cyclic queries, neither a simple traversal over the query nor a simple traversal over the database will work
• Instead, we use a stratum traversal over the matching graph

[Figure: the product graph with its edges divided into stratum 1, stratum 2, stratum 3, …]
Dividing the edges into strata.

**Stratum Traversal**
• A stratum traversal is an ordered list that
  • Starts with the edges of stratum 1
  • Followed by the edges of stratum 2
  • …
  • Followed by the edges of stratum $n$
• We only look at the first $n$ strata, where $n$ is the size of the query
• The order of the edges in each stratum is unimportant
• There can be multiple occurrences of the same edge in different strata

**Computing the OR-Matching Graphs**
• A set of OR-matching graphs is created
• We extend each OR-matching graph in the set by adding edges according to the stratum traversal
• Initially, the set includes a single graph that consists only of the root of the product graph
• In each extension step, we try to add the current edge to the graphs that were produced so far, and this may cause
  • The creation of a new graph that replaces the extended graph
  • The creation of a new graph that is added to the set of graphs in addition to the existing graphs
  • No change to the set of graphs

**Adding an Edge**
• After each addition of an edge, subsumed matching graphs are removed, to avoid an exponential blowup
• There are six cases that should be handled
• The cases of extending a graph by an edge are described next

[Figures: each case is illustrated by a small graph rooted at (V1, O1) and an edge that is being added to it]
1. The target of the added edge is a pair that includes the root of Q but not the root of D: no change is made
2. The graph already includes the added edge: no change is made
3. The graph does not include the source of the added edge: no change is made
4. The graph includes the source of the added edge and no node with the variable of the target: the edge is added to the graph, and the new graph replaces the existing graph
5. The graph already includes the source and the target of the added edge, but not the added edge itself: the edge is added to the graph, and the new graph replaces the existing graph
6. The graph includes the source of the added edge, but also includes a different node with the same variable as the target of the added edge: a new graph with the added edge is created and added to the set, without replacing the existing graph; in the new graph, the node with that variable is replaced by the target (in the slide's example, (V2, O2) is replaced by (V2, O4)), and nodes that are no longer reachable from the root are erased

**Applying the Algorithm to the Movies Example**
[Figure: twelve steps of the algorithm on the movies example, showing the set of matching graphs after each edge of the stratum traversal is added; in the last steps, graphs that are subsumed by another matching graph are removed]

**The OR-Matchings**
[Figure: the two resulting OR-matching graphs, one containing (V2, 2) and the director edge and one containing (V2, 3), shown next to the product graph]

**Computing Maximal Weak-Matching Graphs**
• To compute maximal weak-matching graphs, the same algorithm is used with a slight change:
  • After each addition of an edge, the nodes that cause a query constraint not to be preserved are removed (along with the edges that contain these nodes)
  • Nodes that become unreachable from the root as a result of this deletion are also removed

**The Algorithm Computes Weak Queries in Polynomial Time**
• Theorem: Given a query $Q$ and a database $D$, the revised algorithm terminates with the set of maximal weak-matching graphs of $Q$ w.r.t. $D$. The runtime of the algorithm is $O(q^3 d m^2)$, where $q$ is the size of the query, $d$ is the size of the database and $m$ is the size of the result.
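The definitions on the slides from Preliminary Notations through A Full Disjunction can be checked on small inputs with a brute-force sketch. This is not the algorithm the presentation is about: it enumerates every combination of tuples, so it is exponential in the number of relations. As assumptions, each relation is a list of Python dicts, a null is represented by an absent attribute, a relation's scheme is taken to be the keys of its tuples, and the sample data and function names are hypothetical.

```python
from itertools import combinations, product

def connected(schemes):
    """Schemes (attribute sets) are connected if their scheme graph is connected."""
    seen, stack = {0}, [0]
    while stack:
        i = stack.pop()
        for j, other in enumerate(schemes):
            if j not in seen and schemes[i] & other:   # non-empty intersection = edge
                seen.add(j)
                stack.append(j)
    return len(seen) == len(schemes)

def join_consistent(tuples):
    """Every two tuples agree on their common attributes (pairs with none agree trivially)."""
    return all(t1[a] == t2[a]
               for t1, t2 in combinations(tuples, 2)
               for a in t1.keys() & t2.keys())

def subsumes(u, v):
    """u subsumes v if u equals v on every non-null (present) attribute of v."""
    return all(a in u and u[a] == v[a] for a in v)

def full_disjunction(relations):
    """Brute force: join tuples from every subset of relations, keep the maximal results."""
    integrated = []
    for m in range(1, len(relations) + 1):
        for chosen in combinations(relations, m):
            for tuples in product(*chosen):            # one tuple from each chosen relation
                if connected([set(t) for t in tuples]) and join_consistent(tuples):
                    joined = {}
                    for t in tuples:
                        joined.update(t)               # natural join of consistent tuples
                    integrated.append(joined)
    result = []
    for u in integrated:
        if u not in result and not any(v != u and subsumes(v, u) for v in integrated):
            result.append(u)
    return result

# Hypothetical sample relations
movies = [{"title": "Zelig", "year": 1983}, {"title": "Antz", "year": 1998}]
acted_in = [{"actor": "Woody Allen", "title": "Zelig"}]
actors = [{"actor": "Woody Allen", "born": 1935}]

fd = full_disjunction([movies, acted_in, actors])
for row in fd:
    print(row)
```

Here the Antz tuple survives on its own because it joins consistently with no other tuple, while every partial Zelig tuple is subsumed by the three-way join.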
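α-acyclicity, defined on the Acyclic Scheme slide by ear removal, can be tested with the GYO reduction, a standard equivalent formulation: repeatedly delete attributes that occur in a single hyperedge and hyperedges contained in another hyperedge. The sketch assumes schemes are given as Python sets of attribute names; it does not test γ-acyclicity, and the function name is hypothetical.

```python
def is_alpha_acyclic(schemes):
    """GYO reduction: alpha-acyclic iff the hypergraph reduces to at most one (empty) hyperedge."""
    edges = [set(s) for s in schemes]
    changed = True
    while changed:
        changed = False
        # Rule 1: remove attributes that occur in exactly one hyperedge.
        for e in edges:
            lonely = {a for a in e if sum(a in f for f in edges) == 1}
            if lonely:
                e -= lonely
                changed = True
        # Rule 2: remove one hyperedge contained in another hyperedge.
        # Together, the two rules strip away ears one at a time.
        for i, e in enumerate(edges):
            if any(j != i and e <= f for j, f in enumerate(edges)):
                del edges[i]
                changed = True
                break
    return len(edges) <= 1

# Movies(title, year), Acted-in(actor, title), Actors(actor, born) form a path.
print(is_alpha_acyclic([{"title", "year"}, {"actor", "title"}, {"actor", "born"}]))
```

A triangle of schemes such as {a, b}, {b, c}, {c, a} gets stuck after no rule applies and is reported as cyclic; adding a scheme {a, b, c} that covers the triangle makes it α-acyclic.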
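The product graph and the four conditions on the Matching as a Subgraph of the Product Graph slide can be sketched on a hand encoding of the movies example. The encoding is hypothetical: node numbers follow the slides' figure, the year edges are omitted because the query does not use them, and the directions of the director and filmography item edges are assumed. Condition 4 is read here as "for every query edge whose two endpoints are matched, the corresponding edge exists in the product graph"; that reading is an assumption.

```python
from collections import deque

# Hypothetical encoding of the movies example; edges are (source, label, target).
# The directions of "director" and "filmography item" are assumptions.
Q_EDGES = [
    ("v1", "movie", "v2"), ("v1", "actor", "v3"),
    ("v2", "title", "w1"), ("v2", "language", "w2"),
    ("v3", "name", "w3"), ("v3", "date of birth", "w4"),
    ("v2", "director", "v3"), ("v3", "filmography item", "v2"),
]
D_EDGES = [
    (1, "movie", 2), (1, "movie", 3), (1, "actor", 4),
    (2, "title", 5), (2, "language", 6), (3, "title", 8),
    (4, "name", 10), (4, "date of birth", 11),
    (2, "director", 4), (4, "filmography item", 2), (4, "filmography item", 3),
]

def product_graph(q_edges, q_root, d_edges, d_root):
    """Pair every query edge with every database edge that carries the same label."""
    edges = {((qs, ds), label, (qt, dt))
             for (qs, label, qt) in q_edges
             for (ds, d_label, dt) in d_edges if label == d_label}
    return (q_root, d_root), edges

def reachable(root, edges):
    """Nodes reachable from root by following edges forward (BFS)."""
    seen, queue = {root}, deque([root])
    while queue:
        node = queue.popleft()
        for (s, _, t) in edges:
            if s == node and t not in seen:
                seen.add(t)
                queue.append(t)
    return seen

def is_matching_graph(nodes, edges, root, q_edges, p_edges, weak=False):
    """Conditions 1-3 (OR-matching graph); with weak=True also condition 4."""
    if not edges <= p_edges or any(s not in nodes or t not in nodes for (s, _, t) in edges):
        return False                                   # not a subgraph of the product graph
    variables = [var for (var, _) in nodes]
    if len(variables) != len(set(variables)):          # 1: no repeated variables
        return False
    if root not in nodes or reachable(root, edges) != nodes:  # 2 and 3
        return False
    if weak:                                           # 4: matched query edges are preserved
        assign = dict(nodes)                           # variable -> database node
        for (x, label, y) in q_edges:
            if (x in assign and y in assign and
                    ((x, assign[x]), label, (y, assign[y])) not in p_edges):
                return False
    return True

root, P = product_graph(Q_EDGES, "v1", D_EDGES, 1)
g1_nodes = {("v1", 1), ("v2", 2), ("v3", 4), ("w1", 5), ("w2", 6), ("w3", 10), ("w4", 11)}
g2_nodes = {("v1", 1), ("v2", 3), ("v3", 4), ("w1", 8), ("w3", 10), ("w4", 11)}
g1_edges = {e for e in P if e[0] in g1_nodes and e[2] in g1_nodes}
g2_edges = {e for e in P if e[0] in g2_nodes and e[2] in g2_nodes}
print(is_matching_graph(g1_nodes, g1_edges, root, Q_EDGES, P, weak=True))
print(is_matching_graph(g2_nodes, g2_edges, root, Q_EDGES, P, weak=True))
```

With this encoding, the first subgraph satisfies all four conditions, while the second satisfies only conditions 1–3, because node 3 has no director edge to node 4.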
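The OR-matching procedure from the slides (stratum traversal, the six cases of adding an edge, and removal of subsumed graphs after every step) can be sketched as follows. Several details are assumptions not fixed by the slides: stratum k is taken to be the edges leaving nodes reached from the root by walks of exactly k-1 edges, n is taken to be the number of query nodes, edges inside a stratum are sorted only for determinism, and a graph counts as subsumed when another graph contains all of its nodes and edges. The product graph is a hypothetical hand encoding of the movies example, with assumed directions for the director and filmography item edges.

```python
from collections import deque

# Hypothetical encoding of the movies product graph; edges are (source, label, target).
ROOT = ("v1", 1)
PRODUCT = {
    (ROOT, "movie", ("v2", 2)), (ROOT, "movie", ("v2", 3)), (ROOT, "actor", ("v3", 4)),
    (("v2", 2), "title", ("w1", 5)), (("v2", 2), "language", ("w2", 6)),
    (("v2", 3), "title", ("w1", 8)), (("v3", 4), "name", ("w3", 10)),
    (("v3", 4), "date of birth", ("w4", 11)), (("v2", 2), "director", ("v3", 4)),
    (("v3", 4), "filmography item", ("v2", 2)), (("v3", 4), "filmography item", ("v2", 3)),
}

def reachable(root, edges):
    seen, queue = {root}, deque([root])
    while queue:
        node = queue.popleft()
        for (s, _, t) in edges:
            if s == node and t not in seen:
                seen.add(t)
                queue.append(t)
    return seen

def stratum_traversal(root, edges, n):
    """Assumed definition: stratum k = edges leaving nodes reached by walks of length k-1."""
    order, level = [], {root}
    for _ in range(n):
        stratum = sorted(e for e in edges if e[0] in level)  # sorted only for determinism
        order.extend(stratum)
        level = {e[2] for e in stratum}
    return order

def extend(graph, edge, root):
    """Apply the six cases for adding one edge to one (nodes, edges) graph."""
    nodes, edges = graph
    s, _, t = edge
    if t[0] == root[0] and t != root:       # case 1: Q's root paired with a non-root of D
        return [graph]
    if edge in edges or s not in nodes:     # cases 2 and 3: no change
        return [graph]
    if t in nodes:                          # case 5: both endpoints present, add the edge
        return [(nodes, edges | {edge})]
    clash = [v for v in nodes if v[0] == t[0]]
    if not clash:                           # case 4: new variable, replace the graph
        return [(nodes | {t}, edges | {edge})]
    # case 6: keep the old graph; in a copy, t replaces the node with the same variable
    old = clash[0]
    kept = {e for e in edges if old not in (e[0], e[2])} | {edge}
    live = reachable(root, kept)            # erase nodes no longer reachable from the root
    new_nodes = (nodes - {old}) | {t}
    new = (frozenset(v for v in new_nodes if v in live),
           frozenset(e for e in kept if e[0] in live))
    return [graph, new]

def remove_subsumed(graphs):
    unique = list(dict.fromkeys(graphs))    # drop exact duplicates, keep order
    return [g for g in unique
            if not any(h != g and g[0] <= h[0] and g[1] <= h[1] for h in unique)]

def or_matching_graphs(root, edges, n):
    graphs = [(frozenset({root}), frozenset())]
    for edge in stratum_traversal(root, edges, n):
        step = []
        for g in graphs:
            step.extend(extend(g, edge, root))
        graphs = remove_subsumed(step)
    return graphs

result = or_matching_graphs(ROOT, PRODUCT, n=7)   # n = number of query nodes (assumption)
for nodes, _ in result:
    print(sorted(nodes))
```

On this input the sketch ends with two graphs, one containing (v2, 2) together with the director edge and one containing (v2, 3), which agrees with the two OR-matchings on The OR-Matchings slide. Because the director and filmography item edges form a cycle, the same edges recur in every stratum after the first, which is why the traversal is bounded by n.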