Towards Efficient Query Processing on Massive Evolving Graphs (C-Big2012)

Towards Efficient Query Processing on Massive Evolving Graphs (C-Big2012). Arash Fard, Amir Abdolrashidi, Lakshmish Ramaswamy and John A. Miller, UGA. Presentation by: Charith Wickramaarachchi. Time Evolving Graph: a paradigm for modeling dynamic relationships in networks.

Presentation Transcript


  1. Towards Efficient Query Processing on Massive Evolving Graphs (C-Big2012) • Arash Fard, Amir Abdolrashidi, Lakshmish Ramaswamy and John A. Miller, UGA • Presentation by: Charith Wickramaarachchi

  2. Time Evolving Graph • Paradigm for modeling dynamic relationships in networks • TEG: a series of snapshots of a graph that evolves over time • Web graph • Relationship structure of social networks • Communication flow networks • Evolution history of genome families

  3. TEG and Scalability • Additional Dimension – Time • New queries • Historical • Inverse temporal • Continuous • Data volume • Indexing

  4. Overview • Data distribution strategies for TEGs • Answering reachability queries • Subgraph queries in large TEGs

  5. TEG Distribution • Objectives • Improve node utilization • Minimize the communication cost • Strategies • Random distribution • Improves node utilization • High communication • Connected subgraph distribution • Low communication • Low node utilization
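
The trade-off between the two strategies can be made concrete with a small sketch. It assumes communication cost is approximated by the edge cut (edges whose endpoints land on different workers) and node utilization by the number of vertices per worker; "connected subgraph distribution" is approximated by keeping each weakly connected component on one worker. All function names are hypothetical helpers, not part of the original work.

```python
import random
from collections import defaultdict

def edge_cut(edges, owner):
    """Communication proxy: edges whose endpoints sit on different workers."""
    return sum(1 for u, v in edges if owner[u] != owner[v])

def worker_loads(owner, num_workers):
    """Vertices per worker, a rough proxy for node utilization."""
    loads = [0] * num_workers
    for w in owner.values():
        loads[w] += 1
    return loads

def random_distribution(vertices, num_workers, seed=0):
    """Assign every vertex to a uniformly random worker."""
    rng = random.Random(seed)
    return {v: rng.randrange(num_workers) for v in vertices}

def component_distribution(vertices, edges, num_workers):
    """Keep each weakly connected component on one worker (components dealt round-robin)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    owner, comp = {}, 0
    for start in vertices:
        if start in owner:
            continue
        owner[start] = comp % num_workers
        stack = [start]
        while stack:  # DFS over the undirected view of the graph
            x = stack.pop()
            for y in adj[x]:
                if y not in owner:
                    owner[y] = comp % num_workers
                    stack.append(y)
        comp += 1
    return owner

vertices = list(range(8))
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (6, 7)]
by_component = component_distribution(vertices, edges, 2)
print(edge_cut(edges, by_component), worker_loads(by_component, 2))  # 0 [6, 2]
by_random = random_distribution(vertices, 2)
print(edge_cut(edges, by_random), worker_loads(by_random, 2))
```

The component-based assignment cuts no edges but leaves one worker with three times the vertices of the other; the random assignment spreads vertices more evenly at the price of cut edges.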

  6. Types of Algorithms • High communication, low computation • PageRank, HCC: min-cut partitioning • Low communication • SSSP: random distribution • Dynamic nature of the graph • Additions and deletions of nodes • Repartitioning cost • Data transfer cost • Re-wiring cost • Data node configuration • More partitions than compute nodes (partition: CC) • Smaller partitions • Smaller stragglers
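
Why more, smaller partitions shrink stragglers can be illustrated with a toy scheduler. This sketch assumes a greedy longest-processing-time-first policy (a swapped-in heuristic, not one named on the slide) and that a superstep finishes when its most loaded worker does; `assign_partitions` is a hypothetical helper.

```python
import heapq

def assign_partitions(partition_sizes, num_workers):
    """Greedy LPT: hand the largest remaining partition to the least-loaded worker.
    Returns the total work per worker; the maximum is the superstep's finishing time."""
    heap = [(0, w) for w in range(num_workers)]  # (current load, worker id)
    heapq.heapify(heap)
    loads = [0] * num_workers
    for size in sorted(partition_sizes, reverse=True):
        load, w = heapq.heappop(heap)
        loads[w] = load + size
        heapq.heappush(heap, (loads[w], w))
    return loads

coarse = assign_partitions([5, 5, 2], num_workers=2)            # 3 partitions
fine = assign_partitions([3, 2, 2, 2, 1, 1, 1], num_workers=2)  # same 12 units, 7 partitions
print(coarse, max(coarse))  # [7, 5] 7
print(fine, max(fine))      # [6, 6] 6
```

With the same total work, finer partitions let the scheduler even out the loads, so the slowest worker (the straggler) finishes earlier.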

  7. Reachability Queries in TEGs • {G1, G2, …, Gq, …, Gr}: snapshots of TEG G • Diff(Gq, Gq-1): changes between snapshots Gq and Gq-1 • Vertex addition • Edge addition • Reach(v,w,q): TRUE if w is reachable from v in snapshot Gq, FALSE otherwise
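
One way to store a TEG is the first snapshot plus the diffs between consecutive snapshots. The sketch below assumes that representation, with diff operations encoded as hypothetical tuples `("add_vertex", v)`, `("add_edge", u, v)` and `("del_edge", u, v)`; `EvolvingGraph` and `reach` are illustrative names, and `reach` answers Reach(v,w,q) by materializing Gq and running a BFS.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class EvolvingGraph:
    """Hypothetical TEG store: snapshot G1 plus one diff per later snapshot.
    diffs[i] is Diff(G_{i+2}, G_{i+1}): the operations that turn G_{i+1} into G_{i+2}."""
    base_vertices: set
    base_edges: set
    diffs: list = field(default_factory=list)

    def snapshot(self, q):
        """Materialize G_q (1-indexed) by replaying diffs in chronological order."""
        if not 1 <= q <= len(self.diffs) + 1:
            raise ValueError(f"no snapshot G{q}")
        vertices, edges = set(self.base_vertices), set(self.base_edges)
        for diff in self.diffs[: q - 1]:
            for op, *args in diff:
                if op == "add_vertex":
                    vertices.add(args[0])
                elif op == "add_edge":
                    vertices.update(args)
                    edges.add(tuple(args))
                elif op == "del_edge":
                    edges.discard(tuple(args))
        return vertices, edges

def reach(teg, v, w, q):
    """Reach(v, w, q): True iff w is reachable from v in G_q (on-demand BFS)."""
    vertices, edges = teg.snapshot(q)
    if v not in vertices or w not in vertices:
        return False
    succ = {}
    for a, b in edges:
        succ.setdefault(a, []).append(b)
    seen, frontier = {v}, deque([v])
    while frontier:
        x = frontier.popleft()
        if x == w:
            return True
        for y in succ.get(x, []):
            if y not in seen:
                seen.add(y)
                frontier.append(y)
    return False

teg = EvolvingGraph(
    base_vertices={"A", "B"},
    base_edges={("A", "B")},
    diffs=[
        [("add_vertex", "C"), ("add_edge", "B", "C")],  # G1 -> G2
        [("del_edge", "A", "B")],                        # G2 -> G3
    ],
)
print([reach(teg, "A", "C", q) for q in (1, 2, 3)])  # [False, True, False]
```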

  8. Reachability Queries in Static Graphs • Pre-indexing • O(1) query time: precomputed spanning tree • High indexing time • Index table • On-demand traversal • O(M+N) per query • Limitations for TEGs • High indexing cost: each snapshot must be indexed separately • High storage overhead • Low benefit-to-cost ratio
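
The contrast between pre-indexing and on-demand traversal can be sketched as follows. Note an assumption: the slide's O(1) index is based on a precomputed spanning tree, whereas this sketch uses the simplest possible index table (the full reachable set of every vertex), which makes the indexing cost easy to see. Function names are hypothetical.

```python
from collections import deque

def successors(edges):
    succ = {}
    for a, b in edges:
        succ.setdefault(a, []).append(b)
    return succ

def bfs_reachable(succ, source):
    """On-demand traversal: every vertex reachable from source, O(M + N) per call."""
    seen, frontier = {source}, deque([source])
    while frontier:
        x = frontier.popleft()
        for y in succ.get(x, []):
            if y not in seen:
                seen.add(y)
                frontier.append(y)
    return seen

def build_index(vertices, edges):
    """Index table: one BFS per vertex, so O(N * (M + N)) time and up to N^2 entries.
    Lookups are then O(1) on average, but the table is valid for one snapshot only."""
    succ = successors(edges)
    return {v: bfs_reachable(succ, v) for v in vertices}

def index_entries(index):
    """Storage proxy: total number of (source, reachable vertex) pairs."""
    return sum(len(reach_set) for reach_set in index.values())

vertices, edges = {"A", "B", "C"}, {("A", "B"), ("B", "C")}
index = build_index(vertices, edges)
print("C" in index["A"], "A" in index["C"])          # True False
print("C" in bfs_reachable(successors(edges), "A"))  # True (no index, one traversal)
print(index_entries(index))                          # 6
```

For a TEG with r snapshots, indexing every snapshot would repeat `build_index` r times and store r such tables, which is the indexing-cost and storage problem the slide lists.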

  9. Approach • Interval-based indexing

  10. Approach • Steps (assume the query is Reach(u,v,q) where q > p and Gp is indexed) • What is Reach(u,v,p)? • Does Diff(Gp,Gq) change that answer? • Naïve approach: process Diff(Gp,Gq) in chronological order • A better approach: first check whether the changes can impact the reachability at all
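
A minimal sketch of the naïve approach follows. It assumes that "interval-based indexing" keeps every k-th snapshot (k is a hypothetical parameter), and for simplicity the kept "index" is just the materialized edge set of that snapshot rather than a real reachability index. A query for Gq starts from the closest kept snapshot p ≤ q, replays Diff(Gp,Gq) in chronological order, and then traverses. Diffs are lists of hypothetical `("add", u, v)` / `("del", u, v)` edge operations.

```python
from collections import deque

def apply_diff(edges, diff):
    """Apply one diff (list of ('add' | 'del', u, v) edge operations) to a copy of an edge set."""
    edges = set(edges)
    for op, u, v in diff:
        if op == "add":
            edges.add((u, v))
        else:
            edges.discard((u, v))
    return edges

def reachable(edges, u, v):
    """BFS from u; True iff v is reached."""
    succ = {}
    for a, b in edges:
        succ.setdefault(a, []).append(b)
    seen, frontier = {u}, deque([u])
    while frontier:
        x = frontier.popleft()
        if x == v:
            return True
        for y in succ.get(x, ()):
            if y not in seen:
                seen.add(y)
                frontier.append(y)
    return False

def build_interval_index(base_edges, diffs, k):
    """Keep snapshots G1, G(1+k), G(1+2k), ...; diffs[i] turns G(i+1) into G(i+2)."""
    indexed, edges = {1: set(base_edges)}, set(base_edges)
    for q, diff in enumerate(diffs, start=2):
        edges = apply_diff(edges, diff)
        if (q - 1) % k == 0:
            indexed[q] = set(edges)
    return indexed

def naive_reach(indexed, diffs, u, v, q):
    """Start from the closest kept snapshot p <= q and replay Diff(Gp, Gq) in order."""
    p = max(s for s in indexed if s <= q)
    edges = indexed[p]
    for diff in diffs[p - 1 : q - 1]:
        edges = apply_diff(edges, diff)
    return reachable(edges, u, v)

diffs = [[("add", "B", "C")], [("del", "A", "B")], [("add", "A", "C")]]
idx = build_interval_index({("A", "B")}, diffs, k=2)  # keeps G1 and G3
print([naive_reach(idx, diffs, "A", "C", q) for q in (1, 2, 3, 4)])  # [False, True, False, True]
```

Every query pays for replaying all diffs since p plus a full traversal, even when none of the changes touch u or v; this is the cost the better approach tries to avoid.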

  11. Approach • Reach(A,H,3) • Add(E,F): related? • Add(B,E) & Add(F,G) & Add(E,F)

  12. Observations • If Reach(u,v,p) = true • The diffs need processing only if the diff stack contains at least one Delete(x,y) where (x,y) is an edge on a path from u to v in Gp • If Reach(u,v,p) = false • The diffs need processing only if the diff stack contains at least one Add(x,y) where • x is reachable from u • v is reachable from y
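
The two observations translate into a cheap pre-check that decides whether the diffs can change the answer at all. This is a sketch under stated assumptions: only edge additions and deletions are modeled, and for the "false" case reachability is evaluated on Gp plus every added edge, a conservative choice that still catches chains of additions such as Add(B,E) & Add(E,F) & Add(F,G) from the previous example. The graph below is hypothetical, since the slide's figure is not available; `must_process_diffs` is an illustrative name.

```python
from collections import deque

def reachable_set(edges, source, reverse=False):
    """Vertices reachable from source (or, with reverse=True, vertices that reach source)."""
    succ = {}
    for a, b in edges:
        if reverse:
            a, b = b, a
        succ.setdefault(a, []).append(b)
    seen, frontier = {source}, deque([source])
    while frontier:
        x = frontier.popleft()
        for y in succ.get(x, []):
            if y not in seen:
                seen.add(y)
                frontier.append(y)
    return seen

def must_process_diffs(edges_p, ops, u, v):
    """Return (Reach(u, v, p), whether Diff(Gp, Gq) could change that answer).
    ops is the diff stack as ('add' | 'del', x, y) edge operations."""
    fwd = reachable_set(edges_p, u)
    if v in fwd:
        # True at p: only deleting an edge (x, y) with u ~> x and y ~> v in Gp can matter.
        bwd = reachable_set(edges_p, v, reverse=True)
        return True, any(op == "del" and x in fwd and y in bwd for op, x, y in ops)
    # False at p: only additions can matter. Gp plus all added edges over-approximates Gq,
    # so a chain of additions is detected even if no single one connects u and v.
    added = {(x, y) for op, x, y in ops if op == "add"}
    union = set(edges_p) | added
    fwd_u = reachable_set(union, u)
    bwd_v = reachable_set(union, v, reverse=True)
    return False, any(x in fwd_u and y in bwd_v for x, y in added)

gp = {("A", "B"), ("G", "H")}
chain = [("add", "B", "E"), ("add", "E", "F"), ("add", "F", "G")]
print(must_process_diffs(gp, chain, "A", "H"))                # (False, True)
print(must_process_diffs(gp, [("add", "C", "D")], "A", "H"))  # (False, False)
print(must_process_diffs(gp, [("del", "G", "H")], "A", "B"))  # (True, False)
print(must_process_diffs(gp, [("del", "A", "B")], "A", "B"))  # (True, True)
```

When the second value is False, Reach(u,v,q) equals Reach(u,v,p) and no diff needs replaying; when it is True, the query falls back to processing the diffs.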

  13. Graph Pattern Matching • Subgraph isomorphism • Bijective mapping between the query graph Q(Vq,Eq) and a subgraph G’(V’,E’) of the target graph G • There exists a bijection f : V’ → Vq • such that for all v’, w’ in V’: (v’,w’) in E’ ↔ (f(v’),f(w’)) in Eq • Simulation • G(V,E) matches Q(Vq,Eq) if there exists a relation R ⊆ Vq × V such that • (u,u’) in R → u and u’ have the same label • For all u in Vq there is a u’ in V with (u,u’) in R • For all (u,u’) in R and all (u,v) in Eq there is a (u’,v’) in E with (v,v’) in R
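
The simulation definition suggests a simple fixpoint computation of the largest relation R: start from all label-compatible pairs and repeatedly drop pairs (u,u’) for which some query child of u has no matching child of u’. The sketch below is a naive, centralized version for illustration only (more efficient algorithms exist); `max_simulation` and its dictionary-based inputs are hypothetical.

```python
def max_simulation(q_labels, q_edges, g_labels, g_edges):
    """Largest R ⊆ Vq × V satisfying the simulation conditions, by naive fixpoint refinement.
    Returns {query vertex: set of matching data vertices}, or None if some query vertex has no match."""
    q_children = {u: set() for u in q_labels}
    for a, b in q_edges:
        q_children[a].add(b)
    g_children = {v: set() for v in g_labels}
    for a, b in g_edges:
        g_children[a].add(b)
    # Start from every label-compatible pair (u, v).
    sim = {u: {v for v in g_labels if g_labels[v] == q_labels[u]} for u in q_labels}
    changed = True
    while changed:
        changed = False
        for u in q_labels:
            for v in list(sim[u]):
                # v must have, for every query child u2 of u, some child in sim[u2].
                if any(not (g_children[v] & sim[u2]) for u2 in q_children[u]):
                    sim[u].discard(v)
                    changed = True
    if any(not matches for matches in sim.values()):
        return None
    return sim

q_labels = {"a": "P", "b": "Q"}
g_labels = {1: "P", 2: "Q", 3: "P", 4: "Q"}
sim = max_simulation(q_labels, [("a", "b")], g_labels, [(1, 2)])
print({u: sorted(m) for u, m in sim.items()})  # {'a': [1], 'b': [2, 4]}
```

Unlike subgraph isomorphism, simulation is not injective and does not require matched vertices to form one connected image: vertex 4 matches b only because b has no children.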

  14. Vertex-Centric Approach • Input graph G(V,E,l) • Query Q(Vq,Eq,lq) • Output M: the maximum match in G for Q • Uses GPS features • Master for global operations

  15. Vertex-Centric Approach • 1st: Master broadcasts the query • 2nd: Each vertex whose label matches a vertex in Q is flagged • S: set of matched vertices (note: a vertex v in G can be matched to two vertices in Q) • Each vertex keeps a set of lists of labels for its possible children • If # of outgoing edges < size of any list of children: remove • Send id to children • 3rd: Children reply with id, label • 4th: If the set of received child labels is a superset of the matched children's labels in Q, keep; else remove and pass the removal report to parents • 5th: Remove the reported child from the child list and check validity in S; if no longer valid, remove yourself from S and report to parents • Next: go to 5th (repeat until no more removals)
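
The superstep scheme can be emulated in a single process to see how removal reports propagate. This is a hypothetical sketch, not GPS code: message passing is replaced by direct reads of neighbor state, and the out-degree test in step 2 is interpreted as "fewer outgoing edges than distinct child labels required". `vertex_centric_simulation` is an illustrative name; it returns, for each data vertex, the query vertices it still matches.

```python
from collections import defaultdict

def vertex_centric_simulation(q_labels, q_edges, g_labels, g_edges):
    """Single-process emulation of the superstep scheme (hypothetical; not GPS code).
    match[v] holds the query vertices that data vertex v may still simulate."""
    q_children = defaultdict(set)
    for a, b in q_edges:
        q_children[a].add(b)
    children, parents = defaultdict(set), defaultdict(set)
    for a, b in g_edges:
        children[a].add(b)
        parents[b].add(a)

    def needed_labels(u):
        return {q_labels[c] for c in q_children[u]}

    # Supersteps 1-2: Q is known to every vertex; flag label-equal query vertices,
    # dropping a candidate when v has fewer out-edges than distinct child labels required.
    match = {
        v: {u for u, ql in q_labels.items()
            if ql == lab and len(children[v]) >= len(needed_labels(u))}
        for v, lab in g_labels.items()
    }
    # Supersteps 3-4: children "reply" with their labels; keep u only if they cover Q's child labels.
    for v in g_labels:
        child_labels = {g_labels[c] for c in children[v]}
        match[v] = {u for u in match[v] if needed_labels(u) <= child_labels}
    # Superstep 5, repeated: re-check candidates against the children's current candidate sets;
    # a vertex whose set shrinks sends a removal report to its parents, which re-check next round.
    active = set(g_labels)
    while active:
        next_active = set()
        for v in active:
            kept = {u for u in match[v]
                    if all(any(u2 in match[c] for c in children[v]) for u2 in q_children[u])}
            if kept != match[v]:
                match[v] = kept
                next_active |= parents[v]
        active = next_active
    return match

result = vertex_centric_simulation({"a": "P", "b": "Q"}, [("a", "b")],
                                   {1: "P", 2: "Q", 3: "P", 4: "Q"}, [(1, 2)])
print({v: sorted(m) for v, m in result.items()})  # {1: ['a'], 2: ['b'], 3: [], 4: ['b']}
```

Because vertices only ever remove candidates, and only when a necessary condition fails, the loop terminates; S corresponds to the vertices whose `match` set is non-empty at the end.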

  16. Conclusion • TEG processing: an emerging research area with many applications • Need for new partitioning techniques and graph query techniques • Do TEG processing applications benefit more from an EDA-based model than from the traditional query processing model?
