Complex Queries in DHT-based Peer-to-Peer Networks. Matthew Harren, Joe Hellerstein, Ryan Huebsch, Boon Thau Loo, Scott Shenker, Ion Stoica. [email protected] UC Berkeley, CS Division. IPTPS 3/8/02.

Complex Queries in DHT-based Peer-to-Peer Networks

Matthew Harren, Joe Hellerstein,

Ryan Huebsch, Boon Thau Loo,

Scott Shenker, Ion Stoica

[email protected]

UC Berkeley, CS Division

IPTPS 3/8/02


Outline

  • Contrast P2P & DB systems

  • Motivation

  • Architecture

    • DHT Requirements

    • Query Processor

  • Current Status

  • Future Research


Uniting DHTs and Query Processing…

  • DHT

    • CAN, Chord, Tapestry, Pastry

  • Query Processor

    • SQL, Relational Data, Predicates, Joins, Group By, Aggregation



P2P + DB = ?

  • P2P Database? No!

    • ACID transactional guarantees do not scale, nor does the everyday user want ACID semantics

    • Far too heavyweight a solution for the everyday user

  • Query Processing on P2P!

    • Both P2P and DBs do data location and movement

    • Can be naturally unified (lessons in both directions)

    • P2P brings scalability & flexibility

    • DB brings relational model & query facilities


P2P Query Processing (Simple) Example

SELECT song, size, server…
FROM album, song
WHERE album.ID = song.albumID AND album.name = 'Rubber Soul'

  • Filesharing+

  • Keyword searching is ONE canned SQL query

  • Imagine what else you could do!


P2P Query Processing (Simple) Example

SELECT song, size, server…
FROM album-ngrams AN, song
WHERE AN.ID = song.albumID AND AN.ngram IN <list of search ngrams>
GROUP BY AN.ID
HAVING COUNT(AN.ngram) >= <# of ngrams in search>

  • Filesharing+

  • Keyword searching is ONE canned SQL query

  • Imagine what else you could do!

    • Fuzzy Searching, Resource Discovery, Enhanced DNS
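
The n-gram query above can be mimicked in a few lines of ordinary code. What follows is a minimal single-process sketch of one reading of that query, not the project's implementation: the helper names (ngrams, build_ngram_table, search), the choice of 3-character n-grams, and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict


def ngrams(text, n=3):
    """Return the set of character n-grams of a lowercased string."""
    s = text.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def build_ngram_table(albums, n=3):
    """Build (ngram, album_id) rows: a stand-in for the album-ngrams relation."""
    return [(g, album_id) for album_id, name in albums.items() for g in ngrams(name, n)]


def search(album_ngrams, songs, query, n=3):
    """Return the songs whose album name contains every n-gram of the query."""
    wanted = ngrams(query, n)
    counts = defaultdict(int)
    for g, album_id in album_ngrams:
        if g in wanted:                      # AN.ngram IN <list of search ngrams>
            counts[album_id] += 1            # GROUP BY AN.ID, COUNT(AN.ngram)
    # HAVING COUNT(AN.ngram) >= <# of ngrams in search>
    hits = {a for a, c in counts.items() if c >= len(wanted)}
    # Join with song on AN.ID = song.albumID
    return [s for s in songs if s["albumID"] in hits]


# Hypothetical sample data
albums = {1: "Rubber Soul", 2: "Revolver"}
songs = [
    {"song": "Drive My Car", "albumID": 1},
    {"song": "Taxman", "albumID": 2},
]
table = build_ngram_table(albums)
matches = search(table, songs, "rubber")
```

Lowering the HAVING threshold below the number of search n-grams would be one way to get the fuzzy matching mentioned above; the sketch keeps the strict threshold from the slide.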


What this project IS and IS NOT about…

  • IS NOT ABOUT: Absolute Performance

    • In most situations a centralized solution could be faster…

  • IS ABOUT: Decentralized Features

    • No administrator, anonymity, shared resources, tolerates failures, resistant to censorship…

  • IS NOT ABOUT: Replacing RDBMS

    • Centralized solutions still have their place for many applications (commercial records, etc.)

  • IS ABOUT: Research synergies

    • Unifying/morphing design principles and techniques from DB and NW communities


General Architecture

  • Based on Distributed Hash Tables (DHT) to get many good networking properties

  • A query processor is built on top

  • Note: the data is stored separately from the query engine, not a standard DB practice!


DHT – API

  • Basic API

    • publish(RID, object)

    • lookup(RID)

    • multicast(object)

  • NOTE: Applications can only fetch-by-name… a very limited query language!
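
To make the three calls concrete, here is a toy single-process stand-in for a DHT. It is only a sketch under stated assumptions: the class name ToyDHT, the fixed node count, and the hash-modulo placement are illustrative choices (real DHTs such as CAN or Chord use their own routing and placement schemes), and multicast is modeled as simple delivery to every node's inbox.

```python
import hashlib


class ToyDHT:
    """Single-process stand-in for a DHT: each RID is hashed to one of a fixed set of nodes."""

    def __init__(self, num_nodes=4):
        self.stores = [dict() for _ in range(num_nodes)]   # one local store per node
        self.inboxes = [[] for _ in range(num_nodes)]      # multicast deliveries per node

    def _owner(self, rid):
        """Pick the node index responsible for a RID (assumed placement: hash mod node count)."""
        digest = hashlib.sha1(str(rid).encode("utf-8")).digest()
        return int.from_bytes(digest, "big") % len(self.stores)

    def publish(self, rid, obj):
        """Store the object at the node that owns the RID."""
        self.stores[self._owner(rid)][rid] = obj

    def lookup(self, rid):
        """Fetch by name: return the object stored under the RID, or None."""
        return self.stores[self._owner(rid)].get(rid)

    def multicast(self, obj):
        """Deliver the object to every node."""
        for inbox in self.inboxes:
            inbox.append(obj)


dht = ToyDHT()
dht.publish("song:42", {"song": "Michelle", "size": 3100})
found = dht.lookup("song:42")
dht.multicast({"query": "hypothetical query plan"})
```

Note that the only way to read data back is lookup on an exact RID, which is the fetch-by-name limitation the slide points out.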


DHT – API Enhancements I

  • Basic API

    • publish(namespace, RID, object)

    • lookup(namespace, RID)

    • multicast(namespace, object)

  • Namespaces: subsets of the ID space for logical and physical data partitioning


DHT – API Enhancements II

  • Additions

    • lscan(namespace) – retrieve the data stored locally from a particular namespace

    • newData(namespace) – receive a callback when new data is inserted into the local store for the namespace

  • This violates the abstraction of location independence

  • Why necessary? Parallel scanning of base relations

  • Why acceptable? Access is limited to reading; applications cannot control the location of data
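
The namespace-aware calls and the two local additions can be sketched as follows. This is an illustrative toy, not the project's code: the class names, the hash-modulo placement, and the callback signature (rid, object) are assumptions, and newData is spelled new_data to follow Python naming.

```python
import hashlib


class NamespacedNode:
    """Local store of one node, partitioned by namespace."""

    def __init__(self):
        self.store = {}        # namespace -> {rid: object}
        self.callbacks = {}    # namespace -> list of callables

    def insert(self, namespace, rid, obj):
        """Store a tuple locally and notify anyone registered via new_data."""
        self.store.setdefault(namespace, {})[rid] = obj
        for callback in self.callbacks.get(namespace, []):
            callback(rid, obj)

    def lscan(self, namespace):
        """Return only the data stored locally for this namespace (read-only view)."""
        return list(self.store.get(namespace, {}).items())

    def new_data(self, namespace, callback):
        """Register a callback fired when new data lands in the local store."""
        self.callbacks.setdefault(namespace, []).append(callback)


class NamespacedDHT:
    """Toy DHT whose placement depends on both namespace and RID (assumed scheme)."""

    def __init__(self, num_nodes=4):
        self.nodes = [NamespacedNode() for _ in range(num_nodes)]

    def _owner(self, namespace, rid):
        key = f"{namespace}/{rid}".encode("utf-8")
        index = int.from_bytes(hashlib.sha1(key).digest(), "big") % len(self.nodes)
        return self.nodes[index]

    def publish(self, namespace, rid, obj):
        self._owner(namespace, rid).insert(namespace, rid, obj)

    def lookup(self, namespace, rid):
        return self._owner(namespace, rid).store.get(namespace, {}).get(rid)


dht = NamespacedDHT()
arrivals = []
for node in dht.nodes:
    node.new_data("song", lambda rid, obj: arrivals.append(rid))
for i in range(20):
    dht.publish("song", i, {"song": f"track {i}"})

# Parallel scan of a base relation: every node scans only its own fragment.
fragments = [node.lscan("song") for node in dht.nodes]
```

Taken together, the per-node fragments cover the whole relation exactly once, which is what makes a parallel scan possible without a global fetch-by-name loop.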


Query Processor (QP) Architecture

  • QP is just another application as far as the DHT is concerned… DHT objects = QP tuples

  • User applications can use QP to query data using a subset of SQL

    • Select

    • Project

    • Joins

    • Group By / Aggregate

  • Data can be metadata (for a file-sharing type application) or entire records; the mechanisms are the same


Indexes. The lifeblood of a database engine.

  • DHT’s mapping of RID/Object is equivalent to an index

  • Additional indexes are created by adding another key/value pair with the key being the value of the indexed field(s) and value being a ‘pointer’ to the object (the RID or primary key)

  • Primary Index (figure): the DHT maps PKey → Data in the primary namespace

  • Secondary Index (figure): the DHT maps Key → Ptr (the PKey) in a separate index namespace (Index NS)
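
A secondary index built this way is just a second set of key/value pairs. The sketch below assumes a tiny in-memory stand-in for the DHT (TinyDHT) and hypothetical namespace names ("song" for the primary data, "song.albumID" for the index); none of these names come from the original slides.

```python
class TinyDHT:
    """In-memory stand-in for a namespaced DHT; several objects may share one key."""

    def __init__(self):
        self.data = {}   # (namespace, key) -> list of objects

    def publish(self, namespace, key, obj):
        self.data.setdefault((namespace, key), []).append(obj)

    def lookup(self, namespace, key):
        return list(self.data.get((namespace, key), []))


def insert_song(dht, song):
    """Publish a tuple under its primary key, plus one index entry pointing back to it."""
    dht.publish("song", song["id"], song)                      # primary index: PKey -> Data
    dht.publish("song.albumID", song["albumID"], song["id"])   # secondary index: Key -> Ptr


def songs_by_album(dht, album_id):
    """Two-step access: index lookup yields pointers, then fetch each tuple by primary key."""
    result = []
    for pkey in dht.lookup("song.albumID", album_id):
        result.extend(dht.lookup("song", pkey))
    return result


dht = TinyDHT()
insert_song(dht, {"id": "s1", "albumID": 1, "song": "Drive My Car"})
insert_song(dht, {"id": "s2", "albumID": 1, "song": "Michelle"})
insert_song(dht, {"id": "s3", "albumID": 2, "song": "Taxman"})
on_album_1 = songs_by_album(dht, 1)
```

The index entry stores only the pointer, so reading through the index costs one extra lookup per matching tuple.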


Relational Algorithms

  • Selection/Projection

  • Join Algorithms

    • Symmetric Hash

      • Use lscan on tables R & S. Republish tuples in a temporary namespace using the join attributes as the RID. Nodes in the temporary namespace perform mini-joins locally as tuples arrive and forward results to the requestor.

    • Fetch Matches

      • If there is an index on the join attribute(s) for one table (say R), use lscan for the other table (say S) and then issue a lookup for each S tuple, probing for matches in R.

    • Semi-Join like algorithms

    • Bloom-Join like algorithms

  • Group-By (Aggregation)
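
The two join algorithms described above, Symmetric Hash and Fetch Matches, can be sketched in a single process as follows. This is an assumption-laden illustration rather than PIER's code: the names JoinNode, rehash, symmetric_hash_join and fetch_matches_join are hypothetical, node placement uses Python's built-in hash modulo a fixed node count, and the index on R is passed in as a plain lookup function.

```python
from collections import defaultdict


class JoinNode:
    """State held by one node of the temporary join namespace."""

    def __init__(self):
        self.r_side = defaultdict(list)   # join key -> R tuples seen so far
        self.s_side = defaultdict(list)   # join key -> S tuples seen so far

    def on_tuple(self, side, key, tup):
        """Mini-join: remember the arriving tuple, probe the other side, emit (r, s) pairs."""
        mine, other = (self.r_side, self.s_side) if side == "R" else (self.s_side, self.r_side)
        mine[key].append(tup)
        matches = other.get(key, [])
        return [(tup, m) if side == "R" else (m, tup) for m in matches]


def rehash(fragments, side, key_field):
    """Turn per-node lscan results into (side, join key, tuple) messages to republish."""
    return [(side, t[key_field], t) for fragment in fragments for t in fragment]


def symmetric_hash_join(arrivals, num_nodes=4):
    """Route each republished tuple to the node owning its join key; collect the results."""
    nodes = [JoinNode() for _ in range(num_nodes)]
    results = []
    for side, key, tup in arrivals:
        results.extend(nodes[hash(key) % num_nodes].on_tuple(side, key, tup))
    return results


def fetch_matches_join(s_fragments, s_key, lookup_r):
    """Scan S locally and probe an existing index on R's join attribute for each S tuple."""
    return [(r, s) for fragment in s_fragments for s in fragment for r in lookup_r(s[s_key])]


# Hypothetical data: R = album, S = song, joined on album.ID = song.albumID
albums = [{"ID": 1, "name": "Rubber Soul"}, {"ID": 2, "name": "Revolver"}]
songs = [
    {"albumID": 1, "song": "Drive My Car"},
    {"albumID": 1, "song": "Michelle"},
    {"albumID": 3, "song": "Orphan"},
]
arrivals = rehash([albums[:1], albums[1:]], "R", "ID") + rehash([songs], "S", "albumID")
joined = symmetric_hash_join(arrivals)

album_index = {a["ID"]: [a] for a in albums}
fetched = fetch_matches_join([songs], "albumID", lambda key: album_index.get(key, []))
```

Because each node keeps both sides and probes on every arrival, the symmetric variant produces the same result set whatever order the tuples arrive in, which suits a network where delivery order is not controlled.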


Interesting note…

  • The state of the join is stored in the DHT store

    • Rehashed data is automatically re-routed to the proper node if the coordinate space is adjusted

    • When a node splits (to accept a new node into the network), its data is also split; this includes previously delivered rehashed tuples

  • Allows graceful re-organization of the network without interfering with ongoing operations


Where we are…

  • A real, working implementation of our Query Processor (currently named PIER) on top of a CAN simulator

  • Initial work studying and analyzing algorithms… nothing really ground-breaking… YET!

  • Analyzing the design space and which problems seem most interesting to pursue


Where to go from here?

  • Common Issues:

    • Caching – Both at DHT and QP levels

    • Using Replication – for speed and fault tolerance (both in data and computation)

    • Security

  • Database Issues:

    • Pre-computation of (intermediate) results

    • Continuous queries/alerters

    • Query optimization (Is this like network routing?)

    • More algorithms; distributed DBMSs have more tricks

    • Performance Metrics for P2P QP Systems

  • What are the new apps the system enables?

