
Performance Guarantees for Distributed Reachability Queries


Presentation Transcript


  1. Performance Guarantees for Distributed Reachability Queries

  2. Outline
     Partial evaluation
     Distributed query evaluation with performance guarantees
     • Querying distributed real-life graphs
     • Real-life graphs are often fragmented/distributed
     • Distributed reachability queries
     • Distributed bounded reachability queries
     • Distributed regular reachability queries
     • Distributed reachability with MapReduce
     • Experimental study
     • Conclusion
     Yinghui Wu, VLDB 2012

  3. Distributed Real-life Graphs
     Real-life graphs are purposely or naturally distributed
     • Real-life graphs are distributed
     • Geo-distributed, e.g., data centers
     • Decentralization, e.g., social networks
     • Distributed entity and personal information

  4. Distributed querying methods
     • Federated/centralized graph database
     • collect and link graph fragments
     • query the centralized graph
     Callout: construction and maintenance cost
     [Figure: centralized querying; the fragments are collected into one graph, and query Q is evaluated on it to return Q(G)]

  5. Distributed querying methods
     • Graph exploration strategy
     • Master node and slave node
     • Predefined graph partition and query execution plan
     Callout: no bounds on visit numbers and data shipment
     [Figure: the master node sends query Q along a query plan to slave nodes, which exchange intermediate results; the result is Q(G)]

  6. Querying a distributed social network
     • Centralized method? Graph exploration?
     • Using partial evaluation to obtain performance guarantees
     [Figure: a social network spread over three data centers DC1, DC2 and DC3, with query Q using the regular expression (DB*∪HR*). Nodes shown: Mat "HR", Fred "HR", Ann "CTO", Emmy "HR", Walt "HR", Ben "MK", Dan "DB", Jack "MK", Bill "DB", Mark "FA", Ross "HR", Pat "SE", Tom "AI"]

  7. Partial evaluation
     • Partial evaluation (a.k.a. program specialization)
     • given a function f(s, d) and part of its input, e.g., s, specializes f(s, d) w.r.t. s
     • conducts only the part of f's computation that depends on s
     • generates a residual function f'(d)
     • For graph queries? Given a fragment Fi, Q(Fi, G) is specialized to a residual query Q'(G)
     Partial evaluation: generating partial answers
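As a rough illustration of program specialization in general (not taken from the slides), the sketch below specializes a power function with respect to its static exponent. The names `power` and `specialize_power` are hypothetical; the residual function is built by generating source code once the static input is known.

```python
from typing import Callable


def power(n: int, x: float) -> float:
    """General function f(s, d): n is the static input s, x the dynamic input d."""
    result = 1.0
    for _ in range(n):
        result *= x
    return result


def specialize_power(n: int) -> Callable[[float], float]:
    """Hypothetical specializer: performs the part of the computation that
    depends only on n (unrolling the loop) and returns a residual f'(x)."""
    body = " * ".join(["x"] * n) if n > 0 else "1.0"
    # The generated source mentions only the dynamic input x.
    return eval(f"lambda x: {body}")


cube = specialize_power(3)  # residual function, equivalent to lambda x: x * x * x
```

The residual function agrees with the general one on every dynamic input, which is the property the distributed algorithms rely on when a fragment plays the role of the known input.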

  8. Distributed graphs and graph queries
     • Distributed graph
     • graph fragmentation F = (F, Gf)
     • fragment graph Gf
     • Reachability query Qr(s, t)
     • Bounded reachability query Qbr(s, t, l)
     • Regular reachability (path) query Qrr(s, t, R), where R ::= ε | a | RR | R∪R | R*
     • Examples: Qr(Ann, Mark); Qbr(Ann, Mark, 5); Qrr(Ann, Mark, (DB*∪HR*))
     [Figure: the social network split into fragments F1, F2 and F3, with callouts marking a fragment, an in-node of F1, a virtual node of F1, a cross edge, and the fragment graph Gf]

  9. Distributed graph querying framework
     Setting: a coordinating site Sc and a set of graph fragments F1, …, Fn
     • Distributing at Sc: post Q to the fragments
     • Local evaluation: each fragment partially evaluates Q, producing Q(Fi)
     • Assembling at Sc: the partial answers Q(Fi) are combined into Q(G)
     Applying partial evaluation to graph querying

  10. Distributed reachability queries
      • Performance guarantees: over a fragmentation F = (F, Gf) of a graph G, reachability queries can be evaluated
        (a) in O(|Vf||Fm|) time,
        (b) by visiting each site only once, and
        (c) with the total network traffic bounded by O(|Vf|^2),
        where Gf = (Vf, Ef) and Fm is the largest fragment in F.
      • A distributed reachability evaluation algorithm, DisReach
      • Coordinator Sc posts qr(s, t) to each fragment site in F
      • Each site locally evaluates qr(s, t) in parallel and produces a partial answer as a set of Boolean equations
      • Sc collects and assembles the partial results

  11. Distributed reachability: partial evaluation
      • Locally evaluate each qr(v, t) on Fi in parallel:
      • for each in-node v' in Fi, decide whether v' reaches t; introduce a Boolean variable Xv' for each v'
      • Partial answer to qr(v, t): a set of Boolean formulas, each a disjunction of the variables of the nodes v' that v can reach:
        qr(v, t) = Xv1' or … or Xvn', where Xv' = qr(v', t)
      Partial evaluation by introducing Boolean variables
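The local evaluation step can be sketched as follows. This is a minimal illustration under assumed data structures, not the authors' code: a fragment is given as an adjacency dict `adj`, a set `in_nodes` and a set `virtual_nodes`, and each equation is stored either as `True` or as a frozenset of virtual-node variables (a disjunction). The function name `local_eval` is hypothetical.

```python
from collections import deque


def local_eval(adj, in_nodes, virtual_nodes, t):
    """For each in-node v, return X_v as True (v reaches t inside the fragment)
    or as a frozenset of virtual nodes v' meaning X_v = OR of X_v'."""
    equations = {}
    for v in in_nodes:
        seen = {v}
        queue = deque([v])
        reaches_t = (v == t)
        deps = set()
        while queue:
            x = queue.popleft()
            for y in adj.get(x, ()):
                if y in seen:
                    continue
                seen.add(y)
                if y == t:
                    reaches_t = True
                if y in virtual_nodes:
                    deps.add(y)      # y is stored at another site: do not expand it
                else:
                    queue.append(y)
        equations[v] = True if reaches_t else frozenset(deps)
    return equations


# Toy fragment (hypothetical): a -> b, b -> x, b -> y, where x and y are virtual nodes.
example = local_eval({"a": ["b"], "b": ["x", "y"]}, {"a"}, {"x", "y"}, "t")
```

An empty frozenset encodes `false`: the in-node reaches neither t nor any virtual node within its fragment.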

  12. Distributed reachability: assembling
      • Collect the Boolean equation set at coordinator Sc
      • Solve the Boolean equation system over a dependency graph (annotated O(|Vf|) on the slide)
      • qr(s, t) is true iff Xs = true at Sc
      Example equation system:
        Xs = Xv
        Xv = Xv'' or Xv'
        Xv' = Xt
        Xv'' = false
        Xt = true
      Partial evaluation by introducing Boolean variables
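One way to solve such an equation system is a graph search over the dependency graph: Xs is true exactly when some variable whose equation is `true` is reachable from Xs. The sketch below assumes equations are encoded as `True` or as a set of variable names (a disjunction, with the empty set meaning `false`); the function name `solve` and the variable names are hypothetical.

```python
def solve(equations, s):
    """Return the value of X_s in the least solution of the equation system."""
    stack, seen = [s], {s}
    while stack:
        x = stack.pop()
        rhs = equations.get(x, frozenset())  # unknown variables count as false
        if rhs is True:
            return True
        for y in rhs:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return False


# The example system from the slide, with v1 standing for v' and v2 for v''.
slide_system = {
    "s": {"v"},
    "v": {"v2", "v1"},
    "v1": {"t"},
    "v2": set(),
    "t": True,
}
answer = solve(slide_system, "s")
```

Because each variable is visited at most once, cyclic dependencies between fragments terminate and evaluate to `false` unless a `true` equation is reachable.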

  13. Distributed reachability queries: example
      • Dispatch Q to fragments (at Sc)
      • Partial evaluation: generating Boolean equations (at Fi)
      • Assembling: solving the equation system (at Sc)
      [Figure: coordinator Sc posts Q to fragments F1, F2 and F3 of the social network; the query asks whether Ann reaches Mark]

  14. Distributed bounded reachability queries
      • Dispatch Q to fragments (at Sc)
      • Partial evaluation: generating equations (at Fi), with variables denoting numeric values
      • Assembling: solving the equation system (at Sc) over a weighted dependency graph
      [Figure: coordinator Sc posts Q to fragments F1, F2 and F3 of the social network; the query is from Ann to Mark]

  15. Distributed bounded reachability queries
      Performance guarantees for distributed bounded reachability
      • Performance guarantees: bounded reachability queries can be evaluated with the same performance guarantees as reachability queries.
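The slides mention numeric variables and a weighted dependency graph without giving details, so the following is only a plausible sketch of the assembling step. It assumes each site reports, for a variable X_v, an optional local distance from v to t plus distances from v to its reachable virtual nodes, and that the coordinator runs Dijkstra's algorithm over these weights. The encoding and the name `solve_bounded` are assumptions.

```python
import heapq
import math


def solve_bounded(equations, s, bound):
    """equations[v] = (local_dist_to_t or None, {virtual_node: dist}).
    Returns True iff the shortest distance from s to t is at most `bound`."""
    dist = {s: 0}
    heap = [(0, s)]
    best = math.inf
    while heap:
        d, x = heapq.heappop(heap)
        if d > dist.get(x, math.inf) or d >= best:
            continue  # stale heap entry, or cannot improve the best distance
        local, deps = equations.get(x, (None, {}))
        if local is not None:
            best = min(best, d + local)
        for y, w in deps.items():
            nd = d + w
            if nd < dist.get(y, math.inf):
                dist[y] = nd
                heapq.heappush(heap, (nd, y))
    return best <= bound


# Hypothetical numeric equations: the shortest s-to-t distance is 2 + 1 + 1 = 4.
numeric_system = {
    "s": (None, {"v": 2}),
    "v": (None, {"v1": 1, "v2": 1}),
    "v1": (None, {"t": 1}),
    "v2": (None, {}),
    "t": (0, {}),
}
```

With this encoding, a query Qbr(s, t, 5) over `numeric_system` holds and Qbr(s, t, 3) does not.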

  16. Distributed regular reachability queries
      Automaton representation for queries
      • Performance guarantees: over a fragmentation F = (F, Gf) of a graph G, regular reachability queries qrr(s, t, R) can be evaluated
        (a) in O((|Vf|^2 + |Fm|)|R|^2) time,
        (b) by visiting each site only once, and
        (c) with the total network traffic bounded by O(|R|^2 |Vf|^2),
        where Gf = (Vf, Ef) and Fm is the largest fragment in F.
      • Query automaton Gq(R) of R: <Vq, Eq, Lq, us, ut>

  17. Query automaton
      • A node v is a match of state uv in Gq(R) iff (1) they have the same label, and (2) there is a path ρ from v to t and a path ρ' from uv to ut such that ρ and ρ' induce the same label sequence
      • Given a graph G, qrr(s, t, R) over G is true if and only if s is a match of us in Gq(R)
      [Figure: the query automaton of Q, with labels including Ann, DB, HR and FA, shown beside graph nodes Mat "HR", Fred "HR", Emmy "HR", Walt "HR", Mark "FA", Ross "HR" and Tom "AI"]
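On a single, non-distributed graph, the match condition can be checked by a breadth-first search over pairs (graph node, automaton state). The sketch below is an illustration under assumptions: the automaton for (DB*∪HR*) is written out by hand, the start state `us` is taken to match only s and the final state `ut` only t, and the small graphs in the usage lines are invented for the example.

```python
from collections import deque

# Hand-written automaton for (DB* ∪ HR*); the state names are hypothetical.
Q_EDGES = {"us": ["DB", "HR", "ut"], "DB": ["DB", "ut"], "HR": ["HR", "ut"]}
Q_LABEL = {"DB": "DB", "HR": "HR"}


def regular_reach(adj, label, s, t, q_edges=Q_EDGES, q_label=Q_LABEL,
                  us="us", ut="ut"):
    """True iff s matches us, i.e. some path from s to t spells a word of R."""
    def compatible(v, u):
        if u == us:
            return v == s
        if u == ut:
            return v == t
        return label.get(v) == q_label.get(u)

    start = (s, us)
    seen, queue = {start}, deque([start])
    while queue:
        v, u = queue.popleft()
        if (v, u) == (t, ut):
            return True
        for v2 in adj.get(v, ()):
            for u2 in q_edges.get(u, ()):
                if (v2, u2) not in seen and compatible(v2, u2):
                    seen.add((v2, u2))
                    queue.append((v2, u2))
    return False


labels = {"Ann": "CTO", "Walt": "HR", "Mat": "HR", "Bill": "DB",
          "Ross": "HR", "Mark": "FA"}
hr_chain = {"Ann": ["Walt"], "Walt": ["Mat"], "Mat": ["Mark"]}      # HR, HR
mixed_chain = {"Ann": ["Bill"], "Bill": ["Ross"], "Ross": ["Mark"]}  # DB, HR
```

The first chain satisfies the query because its intermediate labels form HR HR, while the second mixes DB and HR and is rejected.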

  18. Distributed regular query evaluation: algorithm

  19. Distributed regular query evaluation: partial evaluation
      • For each node v in Fi, assign v.rvec: a vector of O(|Vq|) Boolean formulas; each entry v.rvec[u] denotes whether v matches state u
      • Introduce a Boolean variable X(v', w) for each virtual node v' of Fi and each state w in Vq, denoting whether v' matches w
      • Partial answer to qrr(s, t, R): the vectors of Boolean formulas of the in-nodes of Fi
      [Figure: in-nodes v1, v2, … each carry a vector of formulas f11 … f1k, f21 … f2k, …; virtual node v' carries variables X(v', w) for states vq, …, wq of the query automaton]
      Partial evaluation by introducing Boolean variables
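A possible shape for this local step is sketched below, assuming the fragment is an adjacency dict, that labels of virtual nodes are known locally, and that each entry rvec[(v, u)] is stored either as `True` or as a frozenset of (virtual node, state) variables. Incompatible entries are simply omitted (treated as false). All names and the encoding are assumptions rather than the paper's data structures.

```python
from collections import deque


def local_rvec(adj, label, in_nodes, virtual_nodes, s, t,
               q_edges, q_label, us="us", ut="ut"):
    """Compute rvec[(v, u)] for every in-node v and compatible state u."""
    def compatible(v, u):
        if u == us:
            return v == s
        if u == ut:
            return v == t
        return label.get(v) == q_label.get(u)

    states = set(q_edges) | {w for ws in q_edges.values() for w in ws}
    rvec = {}
    for v in in_nodes:
        for u in states:
            if not compatible(v, u):
                continue
            seen, queue = {(v, u)}, deque([(v, u)])
            is_true, deps = (v, u) == (t, ut), set()
            while queue:
                x, p = queue.popleft()
                for y in adj.get(x, ()):
                    for q in q_edges.get(p, ()):
                        if (y, q) in seen or not compatible(y, q):
                            continue
                        seen.add((y, q))
                        if (y, q) == (t, ut):
                            is_true = True
                        if y in virtual_nodes:
                            deps.add((y, q))   # variable X(y, q), resolved elsewhere
                        else:
                            queue.append((y, q))
            rvec[(v, u)] = True if is_true else frozenset(deps)
    return rvec


# Hypothetical fragment: Ann -> Walt -> Mat, with Mat a virtual node.
q_edges = {"us": ["DB", "HR", "ut"], "DB": ["DB", "ut"], "HR": ["HR", "ut"]}
q_label = {"DB": "DB", "HR": "HR"}
rvec = local_rvec({"Ann": ["Walt"], "Walt": ["Mat"]},
                  {"Ann": "CTO", "Walt": "HR", "Mat": "HR"},
                  {"Ann"}, {"Mat"}, "Ann", "Mark", q_edges, q_label)
```

Here the only entry produced for Ann is the one for the start state, and it depends on the single variable X(Mat, HR).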

  20. Distributed regular query evaluation: assembling
      • Collects the partial results as a set of Boolean formulas
      • Constructs a dependency graph: a node vd for each in-node and each entry of its formula vector, labeled with the Boolean formula, and an edge for each dependency
      • Checks whether vd(s, us) can reach vd(t, ut) in the dependency graph, where vd(t, ut) = true
      [Figure: dependency graph with nodes vd(s, us), vd(v1, vq), vd(v2, vq), vd(v', w) and vd(t, ut), built from the formula vectors f11 … f1k of the in-nodes]
      Partial evaluation by introducing Boolean variables

  21. Distributed Regular Reachability Evaluation: Example
      • Dispatch Q to fragments (at Sc)
      • Partial evaluation: generating a set of Boolean equations, as vectors of Boolean formulas (at Fi)
      • Assembling: solving the equation system by testing reachability in the dependency graph (at Sc)
      [Figure: coordinator Sc posts Q to fragments F1, F2 and F3 of the social network]
      Distributed regular reachability query evaluation

  22. Distributed Reachability with MapReduce
      • The coordinator generates the query automaton Gq and partitions graph G into K fragments, each passed as a key/value pair (i, <Fi, Gq>)
      • Map function: local evaluation upon (i, <Fi, Gq>); generates <1, rvset>
      • Reduce function: assembles the collected partial results and writes <0, ans> to the distributed file system
      [Figure: processing path. Mappers 1 … k receive (1, <F1, Gq>) … (k, <Fk, Gq>), annotated O(|Fm|); they emit (1, rvset1) … (1, rvsetk) to the reducer, annotated O(|R|^2 |Vf|^2); the reducer outputs <0, ans>; overall annotation O(|Fm| + |R|^2 |Vf|^2)]
      Partial evaluation properly fits in the MapReduce framework
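The map/reduce split can be imitated in plain Python. The sketch below is a simplification under stated assumptions: it handles plain reachability rather than regular reachability (so the value carries a query (s, t) instead of Gq), it runs sequentially instead of on a MapReduce framework, and `map_fn`, `reduce_fn`, `run_job` and the fragment dict layout are all hypothetical.

```python
from collections import deque


def map_fn(key, value):
    """Local evaluation on one fragment; emits <1, rvset> so one reducer sees all."""
    fragment, (s, t) = value
    starts = set(fragment["in_nodes"])
    if s in fragment["nodes"]:
        starts.add(s)                      # treat the source like an in-node
    rvset = {}
    for v in starts:
        seen, queue = {v}, deque([v])
        reaches_t, deps = (v == t), set()
        while queue:
            x = queue.popleft()
            for y in fragment["adj"].get(x, ()):
                if y in seen:
                    continue
                seen.add(y)
                if y == t:
                    reaches_t = True
                if y in fragment["virtual_nodes"]:
                    deps.add(y)
                else:
                    queue.append(y)
        rvset[v] = True if reaches_t else frozenset(deps)
    yield 1, rvset


def reduce_fn(key, values, query):
    """Assembling: merge the equation sets and search the dependency graph."""
    s, _ = query
    equations = {}
    for rvset in values:
        equations.update(rvset)
    stack, seen = [s], {s}
    while stack:
        x = stack.pop()
        rhs = equations.get(x, frozenset())
        if rhs is True:
            return 0, True
        for y in rhs:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return 0, False


def run_job(fragments, query):
    """Sequential stand-in for the framework's map, shuffle and reduce phases."""
    shuffled = {}
    for i, fragment in enumerate(fragments, start=1):
        for k, v in map_fn(i, (fragment, query)):
            shuffled.setdefault(k, []).append(v)
    return [reduce_fn(k, vs, query) for k, vs in shuffled.items()]


# Hypothetical three-fragment chain: a -> b -> c | c -> d -> e | e -> t.
FRAGMENTS = [
    {"nodes": {"a", "b"}, "adj": {"a": ["b"], "b": ["c"]},
     "in_nodes": set(), "virtual_nodes": {"c"}},
    {"nodes": {"c", "d"}, "adj": {"c": ["d"], "d": ["e"]},
     "in_nodes": {"c"}, "virtual_nodes": {"e"}},
    {"nodes": {"e", "t"}, "adj": {"e": ["t"]},
     "in_nodes": {"e"}, "virtual_nodes": set()},
]
```

Emitting every partial result under the same key mirrors the slide's <1, rvset> pairs: a single reducer receives all equation sets and performs the assembling.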

  23. Experimental Evaluation
      • Experimental setting
      • Real-life datasets
      • Synthetic data: larger random graphs following the densification law
      • Algorithms:
      • disReach, disReachn and disReachm
      • disDist and disDistn
      • disRPQ, disRPQn and disRPQd
      • MRdRPQ

  24. Distributed reachability
      • Efficiency and scalability
      • disReach outperforms centralized and message-passing approaches
      [Figure: efficiency and scalability plots; annotations read "20% and 6%", "9% of disReachn" and "three thousand visits over 4 fragments"]

  25. Distributed regular reachability
      • Efficiency and network traffic
      • disRPQ takes much less time and communication cost
      [Figure: time and traffic plots; annotations read "Time: 60% of disRPQn" and "Traffic: at most 25% and 3%"]

  26. Distributed regular reachability (cont.)
      • Scalability
      • disRPQ scales well with the number of fragments; it takes less time over more fragments

  27. Performance of MapReduce implementation
      • Efficiency and scalability
      • Takes less time with more mappers
      • Scales well with the size of fragments
      • Takes more time over more complex queries
      • Partial evaluation works well in the MapReduce model

  28. Conclusion
      Partial evaluation based distributed query evaluation
      • Distributed reachability querying
      • Partial evaluation based distributed evaluation
      • Reachability, bounded reachability and regular reachability queries
      • Performance guarantees
      • Partial evaluation can be naturally conducted as MapReduce
      • Future work
      • Distributed evaluation for other queries, e.g., graph pattern matching using simulation
      • Combining partial evaluation and incremental computation

  29. Performance Guarantees for Distributed Reachability Queries
      Thank you!
