Towards Efficient Query Processing on Massive Evolving Graphs (C-Big2012)

Arash Fard, Amir Abdolrashidi, Lakshmish Ramaswamy and John A. Miller

UGA

Presentation by: Charith Wickramaarachchi

Time Evolving Graph
  • Paradigm for modeling dynamic relationships in networks.
  • TEG: a series of snapshots of a graph that evolves over time.
    • Web graph
    • Relationship structure of social networks
    • Communication flow networks
    • Evolution history of genome families
TEG and Scalability
  • Additional Dimension – Time
    • New queries
      • Historical
      • Inverse temporal
      • Continuous
  • Data volume
  • Indexing
Overview
  • Data distribution strategies for TEGs
  • Answering reachability queries
  • Subgraph queries in large TEGs
TEG distribution
  • Objectives
    • Improve node utilization
    • Minimize the communication cost
  • Strategies
    • Random distribution
      • Improves node utilization
      • High communication cost
    • Connected subgraph distribution
      • Low communication cost
      • Low node utilization
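
The trade-off between the two strategies can be illustrated with a minimal sketch. The toy graph, the function names and the BFS-based grouping below are assumptions made for illustration; the slides do not specify how the connected subgraphs are chosen.

```python
import random
from collections import deque

def random_partition(adj, k, seed=0):
    """Assign every vertex to one of k partitions uniformly at random."""
    rng = random.Random(seed)
    return {v: rng.randrange(k) for v in adj}

def connected_partition(adj, k):
    """Grow partitions by BFS so that neighbours tend to share a partition."""
    target = -(-len(adj) // k)  # ceiling of len(adj) / k
    part, current, size = {}, 0, 0
    for start in adj:
        if start in part:
            continue
        queue = deque([start])
        while queue:
            v = queue.popleft()
            if v in part:
                continue
            if size >= target and current < k - 1:
                current, size = current + 1, 0  # open the next partition
            part[v] = current
            size += 1
            queue.extend(w for w in adj[v] if w not in part)
    return part

def cut_edges(adj, part):
    """Count undirected edges whose endpoints sit in different partitions."""
    return sum(1 for v in adj for w in adj[v] if v < w and part[v] != part[w])

# Hypothetical toy graph: two triangles joined by the single edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
print("random cut:   ", cut_edges(adj, random_partition(adj, 2)))
print("connected cut:", cut_edges(adj, connected_partition(adj, 2)))
```

The number of cut edges stands in for communication cost here. Node utilization is not modelled; it would depend on how the workload spreads over the partitions at run time.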
  • Type of algorithm
    • High communication, low computation
      • PageRank, HCC: min-cut partitioning
    • Low communication
      • SSSP: random distribution
  • Dynamic Nature of Graph
    • Additions and deletions of nodes.
    • Repartitioning cost
      • Data transfer cost.
      • Re-wiring cost
  • Data node configuration
    • More partitions than compute nodes (Partition: CC)
      • Smaller sized partitions
        • Small stragglers
Reachability queries in TEGs
  • {G1, G2, …, Gq, …, Gr}: snapshots of the TEG G
  • Diff(Gq, Gq-1): changes between snapshots Gq and Gq-1
    • Vertex addition
    • Edge addition
  • Reach(v, w, q): TRUE/FALSE
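
One way to picture this notation is a base snapshot plus a list of diffs. The sketch below is an assumption-laden illustration: the class `TEG`, its methods and the directed-edge representation are hypothetical, and edge deletions are included because later slides mention them.

```python
from collections import deque

class TEG:
    """Snapshots G1..Gr stored as the edge set of G1 plus one diff per step."""

    def __init__(self, base_edges):
        self.base = set(base_edges)  # directed edges of G1
        self.diffs = []              # diffs[i] turns G(i+1) into G(i+2)

    def add_snapshot(self, added=(), deleted=()):
        self.diffs.append((set(added), set(deleted)))

    def snapshot(self, q):
        """Rebuild the edge set of Gq by replaying the first q-1 diffs."""
        edges = set(self.base)
        for added, deleted in self.diffs[:q - 1]:
            edges = (edges - deleted) | added
        return edges

def reach(teg, v, w, q):
    """Reach(v, w, q): is w reachable from v in snapshot Gq?"""
    adj = {}
    for a, b in teg.snapshot(q):
        adj.setdefault(a, []).append(b)
    seen, queue = {v}, deque([v])
    while queue:
        x = queue.popleft()
        if x == w:
            return True
        for y in adj.get(x, []):
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return False

teg = TEG([("A", "B")])
teg.add_snapshot(added=[("B", "C")])    # G2 = G1 plus edge B -> C
teg.add_snapshot(deleted=[("A", "B")])  # G3 = G2 minus edge A -> B
print(reach(teg, "A", "C", 1), reach(teg, "A", "C", 2), reach(teg, "A", "C", 3))
```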
Reachability Queries in Static Graphs
  • Pre-indexing
    • O(1) query time: precomputed spanning tree
    • High indexing time
    • Index table
  • On-demand traversal
    • O(M+N)
  • Limitations for TEGs
    • High indexing cost: each snapshot needs its own index
    • High storage overhead
    • Low cost-benefit ratio
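
The two options for a static graph can be contrasted in a short sketch. As an assumption, the index here is a plain table of reachable sets (a transitive closure), used as a simple stand-in for the spanning-tree index named on the slide; all function names are hypothetical.

```python
from collections import deque

def reachable_from(adj, source):
    """Breadth-first traversal: O(M + N) for M edges and N vertices."""
    seen, queue = {source}, deque([source])
    while queue:
        x = queue.popleft()
        for y in adj.get(x, ()):
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return seen

def reach_on_demand(adj, u, v):
    """No index: traverse the graph for every query."""
    return v in reachable_from(adj, u)

def build_index(adj):
    """Pre-indexing: one traversal per vertex, paid once up front."""
    vertices = set(adj) | {w for ws in adj.values() for w in ws}
    return {s: reachable_from(adj, s) for s in vertices}

def reach_indexed(index, u, v):
    """Expected O(1) lookup in the index table."""
    return v in index.get(u, ())

adj = {"A": ["B"], "B": ["C"], "D": ["A"]}  # hypothetical directed graph
index = build_index(adj)
print(reach_on_demand(adj, "A", "C"), reach_indexed(index, "D", "C"))
```

For a TEG, `build_index` would have to run again for every snapshot, which is the indexing and storage cost the limitations above refer to.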
Approach
  • Interval-based indexing
Approach
  • Steps (assume a query Reach(u,v,q) where q > p and Gp is indexed)
    • Is Reach(u,v,p) true?
    • Does Diff(Gp,Gq) change that result?
  • Naïve approach: process Diff(Gp,Gq) in chronological order
  • A better approach: first ask whether the changes affect reachability at all
Approach
  • Reach(A,H,3)
    • Add(E,F): is it related?
    • Add(B,E) & Add(F,G) & Add(E,F)
Observations
  • If Reach(u,v,p) = true on the indexed snapshot Gp
    • Need to process diffs only if the diff stack contains at least one Delete(x,y) where (x,y) is an edge on a path from u to v in Gp
  • If Reach(u,v,p) = false on the indexed snapshot Gp
    • Need to process diffs only if the diff stack contains at least one Add(x,y)
      • x is reachable from u
      • v is reachable from y
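
These observations can be read as a filter that decides whether the diff stack must be replayed at all. The sketch below is one possible reading, under several assumptions: edges are directed, the reachable sets are computed by BFS as a stand-in for the index on Gp, the two Add conditions may be met by different Add operations, and the names (`closure`, `reach_with_diffs`) and the toy snapshot are hypothetical.

```python
from collections import deque

def to_adj(edges, reverse=False):
    adj = {}
    for a, b in edges:
        if reverse:
            a, b = b, a
        adj.setdefault(a, set()).add(b)
    return adj

def closure(adj, start):
    """Set of vertices reachable from start (including start)."""
    seen, queue = {start}, deque([start])
    while queue:
        x = queue.popleft()
        for y in adj.get(x, ()):
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return seen

def reach_with_diffs(edges_p, diffs, u, v):
    """Answer Reach(u, v, q) from Gp and Diff(Gp, Gq).

    Returns (answer, replayed); replayed tells whether the diffs were processed.
    """
    fwd = closure(to_adj(edges_p), u)                # reachable from u in Gp
    bwd = closure(to_adj(edges_p, reverse=True), v)  # vertices that reach v in Gp
    reach_p = v in fwd
    if reach_p:
        # Only a deleted edge lying on some u-to-v path can break reachability.
        relevant = any(op == "del" and x in fwd and y in bwd for op, x, y in diffs)
    else:
        # A new path must leave the region reachable from u and enter the
        # region that reaches v, each through some added edge.
        adds = [(x, y) for op, x, y in diffs if op == "add"]
        relevant = any(x in fwd for x, _ in adds) and any(y in bwd for _, y in adds)
    if not relevant:
        return reach_p, False
    edges_q = set(edges_p)
    for op, x, y in diffs:  # naive fallback: chronological replay
        if op == "add":
            edges_q.add((x, y))
        else:
            edges_q.discard((x, y))
    return v in closure(to_adj(edges_q), u), True

# Hypothetical Gp in which A reaches B and G reaches H, but A does not reach H.
gp = {("A", "B"), ("G", "H")}
diffs = [("add", "B", "E"), ("add", "F", "G"), ("add", "E", "F")]
print(reach_with_diffs(gp, diffs, "A", "H"))
print(reach_with_diffs(gp, [("add", "X", "Y")], "A", "H"))
```

The filter is only a necessary condition: when it fires, the sketch still falls back to replaying every diff, so the answer stays correct even if the filter is triggered by changes that turn out not to matter.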
Graph Pattern Matching
  • Subgraph Isomorphism
    • Bijective mapping between the query graph Q(Vq,Eq) and a subgraph G'(V',E') of the target graph G.
    • There exists a bijection f: V' → Vq
    • For all v', w' in V', with vq = f(v') and wq = f(w'): (v',w') in E' ↔ (vq,wq) in Eq
  • Simulation
    • G(V,E) matches Q(Vq,Eq) if there exists a relation R ⊆ Vq × V such that
      • for all (u,u') in R, u and u' have the same label
      • for all u in Vq there is a u' in V with (u,u') in R
      • for all (u,u') in R and every edge (u,v) in Eq there is an edge (u',v') in E with (v,v') in R
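
Graph simulation can be computed by starting from all label matches and repeatedly discarding candidates that violate the edge condition. The following is a minimal centralized sketch of that fixed-point idea, not the algorithm from the presentation; the function name and the toy inputs are assumptions.

```python
def graph_simulation(q_edges, q_label, g_edges, g_label):
    """Return {query vertex: set of matching graph vertices}, or {} if no match."""
    g_children = {}
    for a, b in g_edges:
        g_children.setdefault(a, set()).add(b)
    # Start with every graph vertex that carries the same label.
    sim = {u: {x for x, lab in g_label.items() if lab == lu}
           for u, lu in q_label.items()}
    changed = True
    while changed:
        changed = False
        for u, v in q_edges:
            # x can keep matching u only if some child of x still matches v.
            keep = {x for x in sim[u] if g_children.get(x, set()) & sim[v]}
            if keep != sim[u]:
                sim[u] = keep
                changed = True
    if any(not candidates for candidates in sim.values()):
        return {}  # some query vertex has no match left
    return sim

# Hypothetical query a -> b and a small labelled data graph.
q_edges, q_label = [("a", "b")], {"a": "A", "b": "B"}
g_edges, g_label = [(1, 2)], {1: "A", 2: "B", 3: "A", 4: "B"}
print(graph_simulation(q_edges, q_label, g_edges, g_label))
```

Vertex 3 is dropped because it has no child labelled B, while vertex 4 stays because the query vertex b has no outgoing edges to satisfy.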
Vertex Centric approach
  • Graph (V,E,l)
  • Query Q(Vq,Eq,lq)
  • Output M: the maximum match in G for Q
  • Use GPS features
    • Master for global operations
Vertex Centric approach
  • 1st: The master broadcasts the query.
  • 2nd: Each vertex whose label also appears in Q gets flagged.
    • S: set of matched nodes (note that a vertex v in G can be matched to two vertices in Q).
    • Each vertex keeps a set of lists of labels for its possible children.
    • If the number of outgoing edges is smaller than the length of any list of children, remove that match.
    • Send id to children.
  • 3rd: Children reply with their id and label.
  • 4th: If the received child labels are a superset of the matched children labels in Q, keep the match; otherwise remove it. Pass the removal report to parents.
  • 5th: Remove the child list and check for validity in S. If not valid, remove yourself from S and report to parents.
  • Next: Go to the 5th step.
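
The steps above can be approximated by a synchronous, superstep-style loop in plain Python. This is a rough sketch under stated assumptions: it does not use the GPS API, it skips the degree-based pruning of the 2nd step, children report the query vertices they still match rather than raw labels, and all names are hypothetical.

```python
def vertex_centric_simulation(q_edges, q_label, g_edges, g_label):
    """Return {graph vertex: set of matched query vertices}, or {} if no match."""
    children = {v: set() for v in g_label}
    parents = {v: set() for v in g_label}
    for a, b in g_edges:
        children[a].add(b)
        parents[b].add(a)
    q_children = {u: set() for u in q_label}
    for a, b in q_edges:
        q_children[a].add(b)

    # 1st and 2nd steps: the query is known everywhere; flag label matches.
    match = {v: {u for u, lab in q_label.items() if lab == g_label[v]}
             for v in g_label}

    active = set(g_label)  # every vertex evaluates once (3rd and 4th steps)
    while active:
        updates = {}
        for v in active:
            # Query vertices still matched by at least one child of v.
            child_matches = set().union(*(match[c] for c in children[v]))
            keep = {u for u in match[v] if q_children[u] <= child_matches}
            if keep != match[v]:
                updates[v] = keep
        match.update(updates)
        # 5th step: removal reports wake up the parents for the next superstep.
        active = {p for v in updates for p in parents[v]}

    # Global check by the "master": every query vertex needs some match.
    if set().union(*match.values()) != set(q_label):
        return {}
    return match

q_edges, q_label = [("a", "b")], {"a": "A", "b": "B"}
g_edges, g_label = [(1, 2)], {1: "A", 2: "B", 3: "A", 4: "B"}
print(vertex_centric_simulation(q_edges, q_label, g_edges, g_label))
```

The loop stops when a superstep produces no removal reports, which corresponds to repeating the 5th step until nothing changes.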
Conclusion
  • TEG processing: an emerging research area with many applications
  • Need for new partitioning and graph query techniques
  • Do TEG processing applications benefit more from an EDA-based model than from a traditional query processing model?