Distributed Query-Sub-Query Presented by Noam Pettel 29/5/05


Presentation Transcript


  1. Distributed Query-Sub-Query Presented by Noam Pettel 29/5/05

  2. Motivation • Optimization of query evaluation in a peer-to-peer environment • Development of a distributed algorithm based on Query-Sub-Query technique for optimization of Datalog queries in a peer-to-peer environment • Implementation of the algorithm using the Active XML system

  3. Outline • Datalog • Query-Sub-Query (QSQ) • Distributed Query-Sub-Query (dQSQ) • Implementation using AXML • Using dQSQ for Petri Nets

  4. Outline • Datalog • Query-Sub-Query (QSQ) • Distributed Query-Sub-Query (dQSQ) • Implementation using AXML • Using dQSQ for Petri Nets

  5. Example • Input: the parent(x,y) relation (figure: family tree over Alice, Joyce, Nancy, Ruth, Lois, Andy, Mark) • We are interested in the ancestor(x,y) relation • Typical query: “Give me all the ancestors of Andy”

  6. Relational Database • A database composed of relations (tables) • Stores only explicit information • Example relations: parent(x,y), anc(x,y) (figure: family tree over Alice, Joyce, Nancy, Ruth, Lois, Andy, Mark)

  7. Deductive Database • Explicit information: parent(x,y) • Rules that enable inferences based on the stored data • Datalog program: anc(x,y) :- parent(x,y). anc(x,y) :- anc(x,z), parent(z,y). • In each rule, the atom to the left of :- is the head and the atoms to its right form the body; the second rule is recursive • Logical reading: ∀x,y (anc(x,y) ← parent(x,y)) and ∀x,y,z (anc(x,y) ← anc(x,z), parent(z,y))
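
The two anc rules on this slide can be read as a recipe for a bottom-up fixpoint computation. The following is a minimal sketch of that reading, not the presenter's implementation. The `PARENT` facts are an assumption: the transcript names the people in the family tree but not its edges, so the edges below are hypothetical and Nancy is left out.

```python
# Hypothetical parent facts: the slides do not give the edges of the family
# tree, so these pairs are assumptions made for illustration only.
PARENT = {
    ("Alice", "Joyce"),
    ("Joyce", "Lois"),
    ("Joyce", "Ruth"),
    ("Lois", "Mark"),
    ("Ruth", "Andy"),
}


def ancestors(parent):
    """Naive bottom-up evaluation of the two anc rules until a fixpoint."""
    anc = set(parent)  # anc(x,y) :- parent(x,y)
    while True:
        # anc(x,y) :- anc(x,z), parent(z,y)
        derived = {(x, y) for (x, z) in anc for (p, y) in parent if p == z}
        if derived <= anc:  # nothing new: fixpoint reached
            return anc
        anc |= derived


ANC = ancestors(PARENT)
# "Give me all the ancestors of Andy"
ANCESTORS_OF_ANDY = sorted(x for (x, y) in ANC if y == "Andy")
print(ANCESTORS_OF_ANDY)
```

Note that this evaluation materializes the whole anc relation, including tuples that are irrelevant to the query, which is the cost that QSQ is meant to avoid.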

  8. Outline • Datalog • Query-Sub-Query (QSQ) • Distributed Query-Sub-Query (dQSQ) • Implementation using AXML • Using dQSQ for Petri Nets

  9. Query Evaluation • Query: q(y) :- anc(“Joyce”,y) • Goal: Compute the query with minimal data materialization (figure: family tree over Alice, Joyce, Nancy, Ruth, Lois, Andy, Mark)

  10. QSQ • Query-Sub-Query (QSQ) is a known technique for optimizing Datalog queries • QSQ rewrites the Datalog program according to the given query • QSQ is based on two main notions: binding patterns and supplementary relations

  11. Binding Patterns • Program: anc(x,y) :- parent(x,y). anc(x,y) :- anc(x,z), parent(z,y). q(y) :- anc(“Joyce”,y). • For each relation, adorned versions of the relation based on the bindings of the variables are considered • For example, the adorned versions of anc are anc_bb, anc_bf, anc_fb, and anc_ff

  12. Binding Patterns • Adorned program: anc_bf(x,y) :- parent(x,y). anc_bf(x,y) :- anc_bf(x,z), parent(z,y). q(y) :- anc_bf(“Joyce”,y). • In the adornment bf, b marks an argument bound to a constant and f marks a free argument • The same relation may appear with different adornments in the Datalog program • Different adornments of the same relation are treated as different relations during the QSQ computation
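
As an illustration of how an adornment is read off an atom, here is a small sketch. It assumes a hypothetical encoding in which constants are written as double-quoted strings and every other argument is a variable; the optional `bound` set stands for variables already bound by the rule head or by earlier body atoms. The function names are made up for this example.

```python
def adornment(args, bound=frozenset()):
    """Binding pattern of an atom: 'b' for a constant or an already bound
    variable, 'f' for a free variable.

    Assumed encoding: constants are double-quoted strings such as '"Joyce"'.
    """
    return "".join(
        "b" if arg.startswith('"') or arg in bound else "f" for arg in args
    )


def adorn(relation, args, bound=frozenset()):
    """Name of the adorned relation, e.g. anc with ("Joyce", y) -> anc_bf."""
    return f"{relation}_{adornment(args, bound)}"


# The query atom anc("Joyce", y): first argument bound, second free.
QUERY_ATOM = adorn("anc", ['"Joyce"', "y"])
# The body atom anc(x, z) of the recursive rule, with x bound by the head.
BODY_ATOM = adorn("anc", ["x", "z"], bound={"x"})
print(QUERY_ATOM, BODY_ATOM)
```

Both atoms come out as anc_bf, which matches the single adornment used for anc on the slide.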

  13. Supplementary Relations • For each adorned relation and each position in the body of a rule, we define a supplementary relation to accumulate the bindings relevant to that position • Adorned program, with the supplementary relation of each body position in brackets: anc_bf(x,y) :- [sup_10(x)] parent(x,y) [sup_11(x,y)]. anc_bf(x,y) :- [sup_20(x)] anc_bf(x,z) [sup_21(x,z)] parent(z,y) [sup_22(x,y)]. q(x) :- anc_bf(“Joyce”,x). • QSQ rewriting of the program: sup_10(x) :- in_anc_bf(x). sup_11(x,y) :- sup_10(x), parent(x,y). anc_bf(x,y) :- sup_11(x,y). sup_20(x) :- in_anc_bf(x). sup_21(x,z) :- sup_20(x), anc_bf(x,z). sup_22(x,y) :- sup_21(x,z), parent(z,y). anc_bf(x,y) :- sup_22(x,y).

  14. QSQ Example (figure: family tree over Alice, Joyce, Nancy, Ruth, Lois, Andy, Mark, giving parent(x,y)) • Rule anc_bf(x,y) :- parent(x,y): sup_10(x) = {Joyce}; sup_11(x,y) = {(Joyce, Lois), (Joyce, Ruth)} • Rule anc_bf(x,y) :- anc_bf(x,z), parent(z,y): sup_20(x) = {Joyce}; sup_21(x,z) = {(Joyce, Lois), (Joyce, Ruth), (Joyce, Mark), (Joyce, Andy)}; sup_22(x,y) = {(Joyce, Mark), (Joyce, Andy)} • Query q(y) :- anc_bf(“Joyce”,y): query result = {Lois, Ruth, Mark, Andy}
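
The supplementary-relation rules can be evaluated directly. The sketch below iterates them to a fixpoint for the input binding Joyce. It is an illustration under assumptions, not the presenter's code: the `PARENT` edges are hypothetical (the transcript does not give them), and the rule that feeds new bindings into in_anc_bf is omitted because, in this program, the recursive call reuses the same bound value.

```python
# Hypothetical parent facts; only the people's names come from the slides.
PARENT = {
    ("Alice", "Joyce"),
    ("Joyce", "Lois"),
    ("Joyce", "Ruth"),
    ("Lois", "Mark"),
    ("Ruth", "Andy"),
}


def qsq_anc_bf(parent, constant):
    """Iterate the rewritten rules until anc_bf stops growing."""
    in_anc_bf = {constant}  # input binding taken from the query
    anc_bf = set()
    while True:
        sup_10 = set(in_anc_bf)
        sup_11 = {(x, y) for x in sup_10 for (p, y) in parent if p == x}
        sup_20 = set(in_anc_bf)
        sup_21 = {(x, z) for x in sup_20 for (a, z) in anc_bf if a == x}
        sup_22 = {(x, y) for (x, z) in sup_21 for (p, y) in parent if p == z}
        derived = sup_11 | sup_22  # anc_bf :- sup_11 and anc_bf :- sup_22
        if derived <= anc_bf:
            return {"sup_11": sup_11, "sup_21": sup_21,
                    "sup_22": sup_22, "anc_bf": anc_bf}
        anc_bf |= derived


RESULT = qsq_anc_bf(PARENT, "Joyce")
ANSWER = sorted(y for (_, y) in RESULT["anc_bf"])
print(ANSWER)
```

With these assumed facts, no tuple starting with Alice is ever materialized, which is the point of pushing the query binding into the evaluation.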

  15. Properties of QSQ • QSQ evaluations have nice properties: • They compute the correct answer to the query • They materialize only a minimal set of tuples • They are guaranteed to terminate

  16. Outline • Datalog • Query-Sub-Query (QSQ) • Distributed Query-Sub-Query (dQSQ) • Implementation using AXML • Using dQSQ for Petri Nets

  17. Distributed Environment • Three peers: R hosting r, a; S hosting s, b; T hosting t, c • Centralized Datalog program: r1: r(x,y) :- a(x,y). r2: r(x,y) :- s(x,z), t(z,y). r3: s(x,y) :- r(x,y), b(y,z). r4: t(x,y) :- c(x,y). • Distribution of the program between the 3 peers: r1: r@R(x,y) :- a@R(x,y). r2: r@R(x,y) :- s@S(x,z), t@T(z,y). r3: s@S(x,y) :- r@R(x,y), b@S(y,z). r4: t@T(x,y) :- c@T(x,y). • The rules at peer P are the rules where P is the peer of the head
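
The placement rule on this slide (a rule lives at the peer of its head) is easy to state in code. The sketch below uses a hypothetical encoding of rules r1 to r4 as (head, body) pairs of `relation@peer` strings with the variables dropped; the helper names are made up for this example.

```python
# Rules r1..r4 from the slide, encoded as (head, body) with variables dropped.
RULES = [
    ("r@R", ["a@R"]),         # r1
    ("r@R", ["s@S", "t@T"]),  # r2
    ("s@S", ["r@R", "b@S"]),  # r3
    ("t@T", ["c@T"]),         # r4
]


def peer_of(atom):
    """Peer name of an atom written as relation@peer."""
    return atom.split("@")[1]


def rules_by_peer(rules):
    """Assign each rule to the peer of its head."""
    hosted = {}
    for head, body in rules:
        hosted.setdefault(peer_of(head), []).append((head, body))
    return hosted


def remote_relations(rules):
    """For each peer, the remote relations its rules must call."""
    remote = {}
    for head, body in rules:
        here = peer_of(head)
        remote.setdefault(here, set()).update(
            atom for atom in body if peer_of(atom) != here
        )
    return remote


HOSTED = rules_by_peer(RULES)
REMOTE = remote_relations(RULES)
print(REMOTE)
```

The `REMOTE` map shows where requests cross peer boundaries: R depends on S and T, S depends on R, and T is purely local.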

  18. Naïve Distributed Evaluation • Activation of remote relations: to evaluate r2: r@R(x,y) :- s@S(x,z), t@T(z,y), peer R sends a request to S and a request to T, and each of them sends back a response • AXML and Web Services make it very easy!

  19. Termination Detection • We need to detect when the system reaches a fixpoint • Fixpoint is reached when no new facts can be derived at any peer • Termination detection is a standard problem in distributed computing

  20. Termination Detection • The model: communication is asynchronous, and each message eventually arrives and is acknowledged • At some point, the site that started the query decides to check for termination • It calls all the sites that it directly invoked and asks them if they have completed • These sites contact the sites they invoked, and so on…

  21. Termination Detection • A site answers positively if all of the following hold: • It is idle (cannot produce more data) • All the data it has sent has been acknowledged • All its successors believe the computation terminated
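
The three conditions above can be sketched as a recursive check. This is a simplified, synchronous snapshot made for illustration: the `Site` class and its fields are hypothetical, the successors are assumed to form a tree (a cycle would make the recursion loop), and a real asynchronous protocol also has to cope with sites becoming active again while the check is in progress.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Site:
    """Hypothetical view of a peer for the termination check."""
    name: str
    idle: bool = True   # cannot produce more data
    unacked: int = 0    # messages sent but not yet acknowledged
    successors: List["Site"] = field(default_factory=list)

    def believes_terminated(self) -> bool:
        """Answer positively only if all three conditions hold."""
        return (
            self.idle
            and self.unacked == 0
            and all(s.believes_terminated() for s in self.successors)
        )


T = Site("T")
S = Site("S", unacked=1)           # S still waits for one acknowledgement
R = Site("R", successors=[S, T])   # R started the query and invoked S and T

BEFORE_ACK = R.believes_terminated()
S.unacked = 0                      # the acknowledgement arrives
AFTER_ACK = R.believes_terminated()
print(BEFORE_ACK, AFTER_ACK)
```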

  22. Termination Detection • Program: r1: r@R(x,y) :- a@R(x,y). r2: r@R(x,y) :- s@S(x,z), t@T(z,y). r3: s@S(x,y) :- r@R(x,y), b@S(y,z). r4: t@T(x,y) :- c@T(x,y). • Build a graph to represent the distributed Datalog program (figure: graph over the relations r, a, s, t, b, c) • Recursions result in cycles in the graph • Use a spanning tree of the graph in order to decide termination

  23. Distributed QSQ Rewriting • For each rule: The peer in the head of the rule starts the rewriting • When a remote relation is encountered, the peer delegates the remainder of the rule to the remote peer in charge of that relation

  24. Distributed QSQ Rewriting • Centralized: r_bf(x,y) :- [sup_0(x)] s_bf(x,z) [sup_1(x,z)] t_bf(z,y) [sup_2(x,y)], rewritten as: sup_0(x) :- in_r_bf(x). sup_1(x,z) :- sup_0(x), s(x,z). sup_2(x,y) :- sup_1(x,z), t_bf(z,y). r_bf(x,y) :- sup_2(x,y). • Distributed: r_bf@R(x,y) :- [sup_0@R(x)] s_bf@S(x,z) [sup_1@S(x,z)] t_bf@T(z,y) [sup_2@T(x,y)] • R computes sup_0@R(x) :- in_r_bf@R(x) • R sends to S: sup_2@S(x,y) :- sup_0@R(x), s_bf@S(x,z), t_bf@T(z,y)
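
One way to picture the delegation is to track where each supplementary relation ends up and which peer hands the remainder of the rule to which. The sketch below follows the placement shown in the figure labels (sup_0 at the head's peer, sup_i at the peer of the i-th body atom). It is an interpretation for illustration: the function names are hypothetical, and the variables and the rule text sent between peers are not modelled.

```python
def place_supplementary(head_peer, body_peers):
    """sup_0 lives at the head's peer; sup_i at the peer of the i-th body atom."""
    peers = [head_peer] + list(body_peers)
    return [f"sup_{i}@{peer}" for i, peer in enumerate(peers)]


def handoffs(head_peer, body_peers):
    """Pairs (from, to) where the remainder of the rule is delegated."""
    peers = [head_peer] + list(body_peers)
    return [(a, b) for a, b in zip(peers, peers[1:]) if a != b]


# r_bf@R(x,y) :- s_bf@S(x,z), t_bf@T(z,y)
PLACEMENT = place_supplementary("R", ["S", "T"])
HANDOFFS = handoffs("R", ["S", "T"])
print(PLACEMENT, HANDOFFS)
```

Each peer only needs to know the next remote relation in the rule body, which is consistent with the rewriting being performed locally, without global knowledge.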

  25. Distributed QSQ Rewriting • The rewriting is performed locally at each peer, without any global knowledge • Once the QSQ rewriting is complete, the QSQ computation starts; it proceeds as in the centralized case, except that it calls remote services

  26. Outline • Datalog • Query-Sub-Query (QSQ) • Distributed Query-Sub-Query (dQSQ) • Implementation using AXML • Using dQSQ for Petri Nets

  27. Why Active XML? • AXML is a natural choice • An AXML document contains both explicit and implicit data, just like in Datalog • Example document at peer R for r@R(x,y) :- s@S(x,z), t@T(z,y): <r> <t> <x>1</x> <y>2</y> </t> <t> <x>1</x> <y>3</y> </t> <sc>… • The <t> elements are the explicit data; the implicit data come from calls to continuous services at S and T

  28. Implementation Steps • Given a distributed Datalog program and a query: • Transform the Datalog program to distributed QSQ • Transform the distributed QSQ to Active XML • Run! • Detect termination

  29. Outline • Datalog • Query-Sub-Query (QSQ) • Distributed Query-Sub-Query (dQSQ) • Implementation using AXML • Using dQSQ for Petri Nets

  30. Article “Diagnosis of Asynchronous Discrete Event Systems: Datalog to the Rescue!” S. Abiteboul, Z. Abrams, S. Haar, T. Milo PODS, June 2005

  31. Datalog & P2P • Deductive databases were a hot topic in the late 80s • Research in this area led to beautiful results, with little industrial impact • Years later, with networks everywhere, recursive data management is becoming more essential • Datalog and QSQ are becoming hot again!

  32. Abstract • Diagnosis of distributed telecommunication systems • The problem can be modeled by Datalog • Can benefit from dQSQ

  33. Petri Nets (figure labels: place, marked place, transition, alarm symbol) • The marked places model the current state of the peer • A transition node is enabled iff all its parent nodes are marked • An enabled transition can fire and yield a new Petri net • If a transition fires, its alarm symbol is reported to the supervisor • For example, if transition (i) fires, the marking moves from places 1,7 to places 2,3
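
The firing rule can be sketched in a few lines. The example models only transition (i) from the slide, with parent places 1 and 7 and child places 2 and 3. The `Transition` type is hypothetical, and the alarm symbol "b" attached to it is an assumption, since the transcript does not say which symbol transition (i) emits.

```python
from typing import FrozenSet, NamedTuple, Tuple


class Transition(NamedTuple):
    """Hypothetical encoding of a transition node."""
    name: str
    alarm: str                 # symbol reported to the supervisor on firing
    parents: FrozenSet[int]    # places that must be marked
    children: FrozenSet[int]   # places marked after firing


def enabled(marking: FrozenSet[int], t: Transition) -> bool:
    """A transition is enabled iff all its parent places are marked."""
    return t.parents <= marking


def fire(marking: FrozenSet[int], t: Transition) -> Tuple[FrozenSet[int], str]:
    """Move the marking from the parents to the children; return the alarm."""
    if not enabled(marking, t):
        raise ValueError(f"transition {t.name} is not enabled")
    return (marking - t.parents) | t.children, t.alarm


# Transition (i): the alarm symbol "b" is an assumption for illustration.
TRANSITION_I = Transition("i", "b", frozenset({1, 7}), frozenset({2, 3}))
NEW_MARKING, REPORTED = fire(frozenset({1, 7}), TRANSITION_I)
print(sorted(NEW_MARKING), REPORTED)
```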

  34. The Problem • The supervisor receives an alarm sequence (a1,p1),(a2,p2),…,(an,pn), where ai is an alarm symbol and pi is the peer that emitted the alarm • Due to asynchronous communication, we cannot guarantee that alarms sent by different peers appear in the order they were emitted • We can only assume that the order of alarms is kept for each individual peer • Goal: Find an explanation for a given alarm sequence
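
The ordering assumption can be made concrete: a received sequence is acceptable as long as, peer by peer, it preserves the order in which that peer emitted its alarms. The sketch below checks exactly that. The `EMITTED` table is a hypothetical input made up for this example, and the function names are not from the source.

```python
def per_peer(sequence):
    """Project an alarm sequence onto each peer, preserving order."""
    projected = {}
    for alarm, peer in sequence:
        projected.setdefault(peer, []).append(alarm)
    return projected


def consistent(received, emitted):
    """True iff the received sequence keeps every peer's emission order."""
    return per_peer(received) == emitted


# Hypothetical emissions: p1 emitted b then c, and p2 emitted a.
EMITTED = {"p1": ["b", "c"], "p2": ["a"]}

OK_1 = consistent([("b", "p1"), ("a", "p2"), ("c", "p1")], EMITTED)
OK_2 = consistent([("a", "p2"), ("b", "p1"), ("c", "p1")], EMITTED)
BAD = consistent([("c", "p1"), ("b", "p1"), ("a", "p2")], EMITTED)
print(OK_1, OK_2, BAD)
```

The first two sequences differ only in how alarms from different peers interleave, so both are acceptable; the third swaps two alarms of the same peer and is rejected.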

  35. Example • The set of shaded nodes in Figure 2 is a diagnosis for the alarm sequence (b,p1), (a,p2), (c,p1)

  36. From Petri Nets to dQSQ • Petri Nets can be modeled by Datalog and dQSQ • A set of relations and rules is defined at each peer • Each peer builds its own Datalog program using local information only, even if it has transitions to other peers

  37. From Petri Nets to dQSQ • Here is a small part of the Datalog rules…

  38. From Petri Nets to AXML • Translation steps from Petri Nets to Active XML: Petri Net → (PNet2Datalog) → Datalog → (Datalog2QSQ) → QSQ → (QSQ2AXML) → AXML

  39. The End
