One Torus to Rule Them All: Multi-dimensional Queries in P2P Systems

Presentation Transcript


  1. One Torus to Rule Them All: Multi-dimensional Queries in P2P Systems Prasanna Ganesan Beverly Yang Hector Garcia-Molina Stanford University

  2. Motivation • P2P Systems • Dynamic set of nodes • Dynamic data distributed over nodes • No centralization • Traditionally: Simple point queries over data • New P2P applications desire multi-dimensional queries • Photo Sharing: Find all labels for photos in a geographical area • Multi-player games: Find all objects in an area

  3. Problem • Devise a P2P system to store relation R with: • Efficient tuple insertion/deletion • Efficient node join/leave • Minimize #messages • Efficient multi-dimensional range queries • Minimize #nodes processing query • Load balance across nodes • A parallel DB on steroids

  4. Challenge 1: Partitioning Problem • Partition data with • Locality: Keep nearby tuples on same node • Load balance: Equal #tuples on all nodes • Complications • Dynamic data • Dynamic nodes

  5. Challenge 2: Routing Problem • Route query/insert/delete to relevant nodes • No centralization! • Replicated directory too expensive! • Trade-off between cost of query and cost of maintaining routing structure

  6. Roadmap • Two Different Approaches • SCRAP: Space-filling curves with Range Partitions • MURK: Multi-dimensional Rectangulation with kd-trees • Comparing the two approaches

  7. SCRAP Partitioning • Two-Step Process • Map data to 1-d with space-filling curve • E.g., <110011,010101> becomes 101100011011
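The slide does not name the space-filling curve, but its example is reproduced exactly by plain bit interleaving (a Z-order curve), taking one bit from the first coordinate, then one from the second, from the most significant bit down. A minimal sketch under that assumption; the function name `interleave_bits` and the fixed bit width are illustrative, not taken from the talk:

```python
def interleave_bits(x, y, bits):
    """Interleave the bits of x and y (x's bit first) into one 1-d key.

    Assumes both coordinates fit in `bits` bits. This is Z-order
    interleaving, chosen because it matches the slide's example.
    """
    key = 0
    for i in range(bits - 1, -1, -1):      # most significant bit first
        key = (key << 1) | ((x >> i) & 1)  # next bit of the first coordinate
        key = (key << 1) | ((y >> i) & 1)  # next bit of the second coordinate
    return key


# The slide's example: <110011, 010101> becomes 101100011011
key = interleave_bits(0b110011, 0b010101, bits=6)
print(format(key, "012b"))
```

Tuples that share high-order bits in every dimension end up with a common key prefix, which is what lets a 1-d range partition keep many nearby tuples together.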

  8. SCRAP Partitioning (2) • Step 2: Range-partition the 1-d data • Preserves locality!
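Range partitioning of the 1-d keys can be sketched as a sorted list of split points, with each node owning one contiguous key range. The representation below (a plain list of boundaries, nodes numbered 0..n-1, and the names `owner_of` and `nodes_for_range`) is an assumption for illustration; in the actual system no single node holds the full boundary list.

```python
from bisect import bisect_right


def owner_of(key, boundaries):
    """Return the index of the node whose range contains `key`.

    `boundaries` is sorted; boundaries[i] is the smallest key owned by
    node i + 1, and node 0 owns every key below boundaries[0].
    """
    return bisect_right(boundaries, key)


def nodes_for_range(lo, hi, boundaries):
    """Nodes that must see a 1-d range query [lo, hi] (inclusive)."""
    return list(range(owner_of(lo, boundaries), owner_of(hi, boundaries) + 1))


boundaries = [1000, 2000, 3000]                 # four nodes, hypothetical split points
print(owner_of(0b101100011011, boundaries))     # key 2843 falls in node 2's range
print(nodes_for_range(900, 2100, boundaries))   # range spans nodes 0, 1 and 2
```

Because each node holds a contiguous key interval, a 1-d range query touches a run of consecutive nodes rather than a scattered set.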

  9. Load Balancing with SCRAP • Adjust partitions when unbalanced • Adjust boundary with neighbor • Migrate to new area • Guarantees: All loads within factor 4.24. Constant tuple movements per insert/delete [GBGM04]

  10. Query Routing • Map multi-dim query to set of 1-d ranges • Send each 1-d range query to relevant node • Use a linked list to interconnect nodes • Add “skip” pointers for fast routing • O(log n) messages for routing/node joins/leaves
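The first routing step, mapping a multi-dimensional query to a set of 1-d ranges, can be sketched for two dimensions as a recursive quadrant decomposition. This assumes the Z-order bit interleaving used above (first coordinate's bit first); the slides do not spell out the procedure, and the names `z_key` and `z_ranges` are hypothetical.

```python
def z_key(x, y, bits):
    """Z-order key: interleave bits of x and y, x's bit first."""
    key = 0
    for i in range(bits - 1, -1, -1):
        key = (key << 1) | ((x >> i) & 1)
        key = (key << 1) | ((y >> i) & 1)
    return key


def z_ranges(qx_lo, qx_hi, qy_lo, qy_hi, bits):
    """Cover the inclusive query rectangle with sorted, merged 1-d key ranges."""
    ranges = []

    def visit(x0, y0, size):
        x1, y1 = x0 + size - 1, y0 + size - 1
        if x1 < qx_lo or x0 > qx_hi or y1 < qy_lo or y0 > qy_hi:
            return                              # cell is disjoint from the query
        if qx_lo <= x0 and x1 <= qx_hi and qy_lo <= y0 and y1 <= qy_hi:
            lo = z_key(x0, y0, bits)            # aligned cell = one contiguous key run
            hi = lo + size * size - 1
            if ranges and ranges[-1][1] + 1 == lo:
                ranges[-1] = (ranges[-1][0], hi)  # merge with the previous run
            else:
                ranges.append((lo, hi))
            return
        half = size // 2
        # Visit the four quadrants in Z-order so ranges come out sorted.
        for dx, dy in ((0, 0), (0, half), (half, 0), (half, half)):
            visit(x0 + dx, y0 + dy, half)

    visit(0, 0, 1 << bits)
    return ranges


print(z_ranges(0, 1, 0, 3, bits=2))   # left half of a 4x4 grid: one range
print(z_ranges(1, 2, 1, 2, bits=2))   # centre 2x2 square: four separate ranges
```

The second call illustrates the locality issue raised later in the talk: a small rectangle that straddles quadrant boundaries breaks into several disjoint 1-d ranges, each of which may be routed to a different node.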

  11. Roadmap • Two Different Approaches • SCRAP: Space-filling curves with Range Partitions • MURK: Multi-dimensional Rectangulation with kd-trees • Comparing the two approaches

  12. MURK • Intuition: Partition data in native space into “Rectangles” • a la kd-trees

  13. Kd-tree Interpretation • Nodes form leaves of kd-tree • Node Join: Split existing leaf • Node leave • Sibling takes over • If no sibling, find someone in sibling sub-tree
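A node join under the kd-tree interpretation can be sketched as splitting one leaf rectangle in two, with the joining node taking over one half. The slides only say that an existing leaf is split; the choices below (cycling the split dimension with depth, cutting at the median tuple, and the `Leaf` class and `split_leaf` function themselves) are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Leaf:
    lo: tuple                 # lower corner of the rectangle (inclusive)
    hi: tuple                 # upper corner of the rectangle (exclusive)
    depth: int = 0            # depth in the kd-tree, used to pick the split dimension
    points: list = field(default_factory=list)


def split_leaf(leaf):
    """Split a leaf into two children; the joining node would own one of them."""
    dim = leaf.depth % len(leaf.lo)             # assumed: cycle through dimensions
    values = sorted(p[dim] for p in leaf.points)
    if values:
        cut = values[len(values) // 2]          # assumed: cut at the median tuple
    else:
        cut = (leaf.lo[dim] + leaf.hi[dim]) / 2  # empty leaf: cut at the midpoint
    left_hi = tuple(cut if d == dim else v for d, v in enumerate(leaf.hi))
    right_lo = tuple(cut if d == dim else v for d, v in enumerate(leaf.lo))
    left = Leaf(leaf.lo, left_hi, leaf.depth + 1,
                [p for p in leaf.points if p[dim] < cut])
    right = Leaf(right_lo, leaf.hi, leaf.depth + 1,
                 [p for p in leaf.points if p[dim] >= cut])
    return left, right


root = Leaf((0, 0), (8, 8), 0, [(1, 5), (2, 1), (6, 3), (7, 7)])
old_node, new_node = split_leaf(root)
print(old_node.lo, old_node.hi, old_node.points)
print(new_node.lo, new_node.hi, new_node.points)
```

Splitting at the median balances load at join time only; it does nothing to rebalance if the data distribution shifts afterwards, which is the open question the next slide raises.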

  14. MURK Properties • Locality: Rectangulation gives better locality than SCRAP • Load Balance • OK if data distribution is static • Unclear (???) if data distribution is dynamic

  15. Routing Queries • Build a grid of nodes • Adjacent nodes link to each other • Analogous to linked list in higher dimensions • Problems • Node managing large space has many neighbors! • Routing on grid is too slow. Need skip pointers • Not easy to add skip pointers (see paper)

  16. Evaluation • Datasets • Uniform: 32-bit ints drawn at random • Skewed: Photo Co-ords from real collection • Nodes join one at a time to build network • Evaluate • Locality: #nodes that process a query • Routing: #messages transmitted per query

  17. Dimensionality vs. Locality • #nodes = 8192 • Ideal locality = 1 • [Plot; x-axis: Dimensionality]

  18. Selectivity vs. Locality

  19. Network Size vs. Routing Cost • [Plot; x-axis: Network Size]

  20. Conclusions • SCRAP • Simple partitioning and routing • Excellent load balance • Issue: Space-filling curve offers poor locality • MURK • Much better locality than SCRAP • Routing still ok • Load balance is more complex and heuristic 

  21. More Information • Load Balancing, Range Queries and P2P • “Online Balancing of Range-Partitioned Data with Applications to P2P Systems”, VLDB 2004 • “Distributed Balanced Tables: Not Making a Hash of it All”, Stanford Tech Report • Google: “Prasanna Ganesan” • More work on P2P • Google: “Stanford Peers”
