One Torus to Rule Them All: Multi-dimensional Queries in P2P Systems

One Torus to Rule Them All: Multi-dimensional Queries in P2P Systems Prasanna Ganesan Beverly Yang Hector Garcia-Molina Stanford University

Motivation • P2P Systems • Dynamic set of nodes • Dynamic data distributed over nodes • No centralization • Traditionally : Simple point queries over data • New P2P applications desire multi-dimensional queries • Photo Sharing: Find all labels for photos in a geographical area • Multi-player games: Find all objects in an area

Problem Devise P2P system to store relation R with: • Efficient tuple insertion/deletion • Efficient node join/leave • Minimize #messages • Efficient multi-dimensional range queries • Minimize #nodes processing query • Load balance across nodes A parallel DB on steroids

Challenge 1: Partitioning Problem • Partition data with • Locality: Keep nearby tuples on same node • Load balance: Equal #tuples on all nodes • Complications • Dynamic data • Dynamic nodes

Challenge 2: Routing Problem • Route query/insert/delete to relevant nodes • No centralization! • Replicated directory too expensive! • Trade-off between cost of query and cost of maintaining routing structure

Roadmap • Two Different Approaches • SCRAP: Space-filling curves with Range Partitions • MURK: Multi-dimensional Rectangulation with kd-trees • Comparing the two approaches

SCRAP Partitioning • Two-Step Process • Map data to 1-d with space-filling curve • E.g., <110011,010101> becomes 101100011011

Scrap Partitioning (2) 2. Range partition 1-d data • Preserves locality!

Load Balancing with SCRAP • Adjust partitions when unbalanced • Adjust boundary with neighbor • Migrate to new area • Guarantees: All loads within factor 4.24. Constant tuple movements per insert/delete [GBGM04]

Query Routing • Map multi-dim query to set of 1-d ranges • Send each 1-d range query to relevant node • Use a linked list to interconnect nodes • Add “skip” pointers for fast routing • O(log n) messages for routing/node joins/leaves

Roadmap • Two Different Approaches • SCRAP: Space-filling curves with Range Partitions • MURK: Multi-dimensional Rectangulation with kd-trees • Comparing the two approaches

MURK • Intuition: Partition data in native space into “Rectangles” • a la kd-trees

Kd-tree Interpretation • Nodes form leaves of kd-tree • Node Join: Split existing leaf • Node leave • Sibling takes over • If no sibling, find someone in sibling sub-tree

Murk Properties • Locality: Rectangulation better than SCRAP • Load Balance • Ok if data distribution is static • ??? If data distribution is dynamic

Routing Queries • Build a grid of nodes • Adjacent nodes link to each other • Analogous to linked list in higher dimensions • Problems • Node managing large space has many neighbors! • Routing on grid is too slow. Need skip pointers • Not easy to add skip pointers (see paper)

Evaluation • Datasets • Uniform: 32-bit ints drawn at random • Skewed: Photo Co-ords from real collection • Nodes join one at a time to build network • Evaluate • Locality: #nodes that process a query • Routing: #messages transmitted per query

Dimensionality vs. Locality #nodes = 8192. #Ideal Locality =1 Dimensionality

Selectivity vs. Locality

Network Size vs. routing Cost Network Size

Conclusions • SCRAP • Simple partitioning and routing • Excellent load balance • Issue: Space-filling curve offers poor locality • MURK • Much better locality than SCRAP • Routing still ok • Load balance is more complex and heuristic 

More Information • Load Balancing, Range Queries and P2P • “Online Balancing of Range-Partitioned Data with Applications to P2P Systems”, VLDB 2004 • “Distributed Balanced Tables: Not Making a Hash of it All”, Stanford Tech Report • Google: “Prasanna Ganesan” • More work on P2P • Google: “Stanford Peers”

One Torus to Rule Them All: Multi-dimensional Queries in P2P Systems

One Torus to Rule Them All: Multi-dimensional Queries in P2P Systems

Presentation Transcript

One Dimensional Magnetic Systems

One Ring to Rule Them All!!

Association Rule Mining Multi Level And Multi Dimensional Association Rule Mining

One Newsfeed to Rule Them All

The one business card to rule them all.

Aegir Hosting System One Drupal to Rule Them All

The One Carbon Ring…. To Rule them All!

One ring to rule them all CIDOC CRM

CMIS: One ECM API to Rule Them All

One control to rule them all Michael Welzl

One Bib to Rule Them All – SUNY One Bib / Shared Catalog Project

One Phone to Rule Them All

Approximate range selection queries in P2P systems

One model to rule them all?

One Editor To Rule Them All

One mask to group them all, One code to find them, One file to store them all,

One Team to Rule Them All presenting ROBO RAGGINS

Caching Multi-dimensional Queries Using Chunks

Multi-dimensional Queries in P2P Systems

One ring to rule them all CIDOC CRM

One App to Rule Them All: How Remote-First Teams Succeed