one torus to rule them all multi dimensional queries in p2p systems n.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
One Torus to Rule Them All: Multi-dimensional Queries in P2P Systems PowerPoint Presentation
Download Presentation
One Torus to Rule Them All: Multi-dimensional Queries in P2P Systems

Loading in 2 Seconds...

play fullscreen
1 / 21

One Torus to Rule Them All: Multi-dimensional Queries in P2P Systems - PowerPoint PPT Presentation


  • 93 Views
  • Uploaded on

One Torus to Rule Them All: Multi-dimensional Queries in P2P Systems. Prasanna Ganesan Beverly Yang Hector Garcia-Molina Stanford University. Motivation. P2P Systems Dynamic set of nodes Dynamic data distributed over nodes No centralization

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'One Torus to Rule Them All: Multi-dimensional Queries in P2P Systems' - deon


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
one torus to rule them all multi dimensional queries in p2p systems

One Torus to Rule Them All: Multi-dimensional Queries in P2P Systems

Prasanna Ganesan

Beverly Yang

Hector Garcia-Molina

Stanford University

motivation
Motivation
  • P2P Systems
    • Dynamic set of nodes
    • Dynamic data distributed over nodes
    • No centralization
    • Traditionally : Simple point queries over data
  • New P2P applications desire multi-dimensional queries
    • Photo Sharing: Find all labels for photos in a geographical area
    • Multi-player games: Find all objects in an area
problem
Problem

Devise P2P system to store relation R with:

  • Efficient tuple insertion/deletion
  • Efficient node join/leave
    • Minimize #messages
  • Efficient multi-dimensional range queries
    • Minimize #nodes processing query
  • Load balance across nodes

A parallel DB on steroids

challenge 1 partitioning problem
Challenge 1: Partitioning Problem
  • Partition data with
    • Locality: Keep nearby tuples on same node
    • Load balance: Equal #tuples on all nodes
  • Complications
    • Dynamic data
    • Dynamic nodes
challenge 2 routing problem
Challenge 2: Routing Problem
  • Route query/insert/delete to relevant nodes
    • No centralization!
    • Replicated directory too expensive!
    • Trade-off between cost of query and cost of maintaining routing structure
roadmap
Roadmap
  • Two Different Approaches
    • SCRAP: Space-filling curves with Range Partitions
    • MURK: Multi-dimensional Rectangulation with kd-trees
  • Comparing the two approaches
scrap partitioning
SCRAP Partitioning
  • Two-Step Process
    • Map data to 1-d with space-filling curve
      • E.g., <110011,010101> becomes 101100011011
scrap partitioning 2
Scrap Partitioning (2)

2. Range partition 1-d data

  • Preserves locality!
load balancing with scrap
Load Balancing with SCRAP
  • Adjust partitions when unbalanced
    • Adjust boundary with neighbor
    • Migrate to new area
    • Guarantees: All loads within factor 4.24. Constant tuple movements per insert/delete [GBGM04]
query routing
Query Routing
  • Map multi-dim query to set of 1-d ranges
  • Send each 1-d range query to relevant node
  • Use a linked list to interconnect nodes
    • Add “skip” pointers for fast routing
    • O(log n) messages for routing/node joins/leaves
roadmap1
Roadmap
  • Two Different Approaches
    • SCRAP: Space-filling curves with Range Partitions
    • MURK: Multi-dimensional Rectangulation with kd-trees
  • Comparing the two approaches
slide12
MURK
  • Intuition: Partition data in native space into “Rectangles”
    • a la kd-trees
kd tree interpretation
Kd-tree Interpretation
  • Nodes form leaves of kd-tree
  • Node Join: Split existing leaf
  • Node leave
    • Sibling takes over
    • If no sibling, find someone in sibling sub-tree
murk properties
Murk Properties
  • Locality: Rectangulation better than SCRAP
  • Load Balance
    • Ok if data distribution is static
    • ??? If data distribution is dynamic
routing queries
Routing Queries
  • Build a grid of nodes
    • Adjacent nodes link to each other
    • Analogous to linked list in higher dimensions
  • Problems
    • Node managing large space has many neighbors!
    • Routing on grid is too slow. Need skip pointers
    • Not easy to add skip pointers (see paper)
evaluation
Evaluation
  • Datasets
    • Uniform: 32-bit ints drawn at random
    • Skewed: Photo Co-ords from real collection
  • Nodes join one at a time to build network
  • Evaluate
    • Locality: #nodes that process a query
    • Routing: #messages transmitted per query
dimensionality vs locality
Dimensionality vs. Locality

#nodes = 8192. #Ideal Locality =1

Dimensionality

conclusions
Conclusions
  • SCRAP
    • Simple partitioning and routing
    • Excellent load balance
    • Issue: Space-filling curve offers poor locality
  • MURK
    • Much better locality than SCRAP
    • Routing still ok
    • Load balance is more complex and heuristic 
more information
More Information
  • Load Balancing, Range Queries and P2P
    • “Online Balancing of Range-Partitioned Data with Applications to P2P Systems”, VLDB 2004
    • “Distributed Balanced Tables: Not Making a Hash of it All”, Stanford Tech Report
    • Google: “Prasanna Ganesan”
  • More work on P2P
    • Google: “Stanford Peers”