
PIER



Presentation Transcript


  1. PIER

  2. Presentation overview • PIER: core functionality and design principles • Distributed join example • CAN high-level functions • Application, PIER, DHT/CAN in detail • Distributed join / operations in detail • Simulation results • Conclusion

  3. What is PIER? • Peer-to-Peer Information Exchange and Retrieval • A distributed query engine designed for widely distributed environments • A "general data retrieval system" that can use any data source • PIER internally uses a relational data format • Read-only system

  4. PIER overview (architecture diagram; overlay based on CAN): Tier 1: Applications (network monitoring, other user apps). Tier 2: PIER (query optimizer, core relational execution engine, catalog manager, wrapper). Tier 3: DHT (overlay routing, storage manager).

  5. Relaxed Consistency • Brewer states that a distributed data system can have only two of the following three properties: • (C)onsistency • (A)vailability • Tolerance of network (P)artitions • PIER prioritizes A and P and sacrifices C, i.e. it returns best-effort results • Detailed in the distributed join part

  6. Scalability • Scalability: the amount of work scales with the number of nodes • The network can grow easily • Robust • PIER doesn't require a priori allocation of resources

  7. Data sources • Data remains in its original source • Could be anything: a file system, a live feed from a process, etc. • Wrappers or gateways have to be provided to turn source data into tuples

  8. Standard schemas • Design goal for the application layer: adopt the schemas of popular software (e.g. tcpdump) as de facto standard relations • Pro: bypasses the standardization process • Con: limited by current applications
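To make the wrapper idea above concrete, here is a hypothetical Python sketch of a tuple wrapper for a tcpdump-style live feed; the line format and field names are illustrative assumptions, not PIER's actual wrapper code.

import time

# Assumed illustrative packet schema: (timestamp, src_ip, dst_ip, length)
def tcpdump_wrapper(line):
    """Parse one hypothetical 'src > dst : length' line from a live feed
    into a relational tuple; real wrappers depend on the actual source format."""
    src, _, rest = line.partition(" > ")
    dst, _, length = rest.partition(" : ")
    return (time.time(), src.strip(), dst.strip(), int(length))

# Example: one line of the feed becomes one tuple that PIER can query
print(tcpdump_wrapper("10.0.0.1 > 10.0.0.2 : 1500"))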

  9. PIER is independent of the DHT; currently it uses CAN. • Currently using multicast; other strategies are possible (further explained in the DHT part).

  10. PIER storage overview

  11. Presentation overview • PIER: core functionality and design principles • Distributed join example • CAN high-level functions • Application, PIER, DHT/CAN in detail • Distributed join / operations in detail • Simulation results • Conclusion

  12. DHT based distributed join: example • Distributed execution of relational database query operations, e.g. join, is the core functionality of the PIER system. • A distributed join is a relational database join performed to some degree in parallel by a number of processors (machines) that hold different parts of the relations being joined. • Perspective: generic, intelligent "keyword" search built on distributed database query operations (e.g. Google-like). • The following example illustrates what PIER can do, and thus its main purpose, by means of a distributed join based on a DHT. Details of how, and of which layer does what, are provided later.
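As a rough illustration of the idea (not PIER's actual implementation), the Python sketch below rehashes the tuples of R and S on the join key into a shared temporary namespace, so tuples that can join end up in the same bucket; a plain in-memory dict stands in for the real DHT, and the relation layouts are assumptions.

from collections import defaultdict

# Toy stand-in for the DHT: namespace -> resourceID -> list of items
dht = defaultdict(lambda: defaultdict(list))

def put(namespace, resource_id, item):
    dht[namespace][resource_id].append(item)

def rehash_join(local_r, local_s, key_r, key_s, temp_ns="tmp_join"):
    """Each node publishes its local tuples keyed by the join attribute;
    matching then happens bucket by bucket (here locally, for illustration)."""
    for t in local_r:
        put(temp_ns, t[key_r], ("R", t))
    for t in local_s:
        put(temp_ns, t[key_s], ("S", t))
    results = []
    for bucket in dht[temp_ns].values():
        rs = [t for tag, t in bucket if tag == "R"]
        ss = [t for tag, t in bucket if tag == "S"]
        results += [r + s for r in rs for s in ss]
    return results

R = [(1, "alice"), (2, "bob")]          # (cpr, name)
S = [(1, "Main St"), (3, "Elm St")]     # (cpr, address)
print(rehash_join(R, S, key_r=0, key_s=0))   # -> [(1, 'alice', 1, 'Main St')]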

  13. DHT based distributed join example: relational database join proper (1/2)

  14. DHT based distributed join example: relational database join proper (2/2)

  15. DHT based distributed join example: (1/9)

  16. DHT based distributed join example: (2/9)

  17. DHT based distributed join example: (3/9)

  18. DHT based distributed join example: (4/9)

  19. DHT based distributed join example: (5/9)

  20. DHT based distributed join example: (6/9)

  21. DHT based distributed join example: (7/9)

  22. DHT based distributed join example: (8/9)

  23. DHT based distributed join example: (9/9)

  24. Presentation overview • PIER: core functionality and design principles • Distributed join example • CAN high-level functions • Application, PIER, DHT/CAN in detail • Distributed join / operations in detail • Simulation results • Conclusion

  25. CAN • CAN is a DHT • The basic operations on CAN are insertion, lookup and deletion of (key,value) pairs • Each CAN node stores a chunk (called a zone) of the entire hash table

  26. Every node is responsible for a small region of "adjacent" keys (its zone) in the table • Requests (insert, lookup, delete) for a particular key are routed by intermediate CAN nodes towards the CAN node that holds the key

  27. The design centers around a virtual d-dimensional Cartesian coordinate space on a d-torus • The coordinate space is completely virtual and has no relation to any physical coordinate system • The keyspace is probably SHA-1 • A key k1 is mapped onto a point p1 using a uniform hash function • (k1,v1) is stored at the node Nx that owns the zone containing p1

  28. Retrieving a value for a given key • Apply the deterministic hash function to map the key onto a point P, then retrieve the corresponding value from the node owning P • One hash function per dimension • get(key) -> value: lookup(key) -> ip address; ip.retrieve(key) -> value

  29. Storing (key,value) • The key is deterministically mapped onto a point P in the coordinate space using a uniform hash function • The corresponding (key,value) pair is then stored at the node that owns the zone within which the point P lies • put(key,value): lookup(key) -> ip address; ip.store(key,value)
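A minimal Python sketch, under simplified assumptions (a toy 2-dimensional 16x16 space and toy Node objects), of how the get and put operations on slides 28 and 29 decompose into hash-to-point, lookup, and a direct retrieve/store on the owning node:

import hashlib

DIM = 2      # number of CAN dimensions (assumption for this sketch)
SPACE = 16   # coordinates lie in [0, 16) per dimension, as in the diagrams

def hash_to_point(key):
    """One deterministic hash per dimension maps a key onto a point P."""
    point = []
    for d in range(DIM):
        h = hashlib.sha1(f"{d}:{key}".encode()).digest()
        point.append(int.from_bytes(h, "big") % SPACE)
    return tuple(point)

class Node:
    """Toy CAN node owning a rectangular zone [lo, hi) of the space."""
    def __init__(self, lo, hi):
        self.lo, self.hi, self.table = lo, hi, {}
    def owns(self, p):
        return all(self.lo[d] <= p[d] < self.hi[d] for d in range(DIM))
    def store(self, key, value):
        self.table[key] = value
    def retrieve(self, key):
        return self.table.get(key)

def lookup(nodes, point):
    """Stand-in for CAN routing: find the node whose zone contains the point."""
    return next(n for n in nodes if n.owns(point))

def put(nodes, key, value):
    lookup(nodes, hash_to_point(key)).store(key, value)

def get(nodes, key):
    return lookup(nodes, hash_to_point(key)).retrieve(key)

nodes = [Node((0, 0), (8, 16)), Node((8, 0), (16, 16))]   # two zones
put(nodes, "k1", "v1")
print(get(nodes, "k1"))   # -> v1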

  30. Routing • Each CAN node maintains a routing table that holds the IP address and virtual coordinate zone of each of its immediate neighbors in the coordinate space • Using its neighbor coordinate set, a node routes a message towards its destination by simple greedy forwarding to the neighbor with coordinates closest to the destination coordinates
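A sketch of the greedy forwarding rule just described: at each hop, forward to the neighbor whose zone coordinates are closest to the destination, with distance measured on the torus. The routing-table representation (IP address -> zone-center coordinates) and the 16x16 space are assumptions of this sketch, chosen to match the diagram on the next slide.

SIZE = 16   # side length of the toy 2-d torus coordinate space

def torus_distance(a, b, size=SIZE):
    """Euclidean-style distance that wraps around in each dimension."""
    return sum(min(abs(a[i] - b[i]), size - abs(a[i] - b[i])) ** 2
               for i in range(len(a))) ** 0.5

def greedy_next_hop(neighbors, destination):
    """neighbors: dict of ip -> zone-center coordinates (the routing table).
    Returns the IP of the neighbor closest to the destination point."""
    return min(neighbors, key=lambda ip: torus_distance(neighbors[ip], destination))

routing_table = {"10.0.0.2": (4, 12), "10.0.0.3": (12, 4), "10.0.0.4": (12, 12)}
print(greedy_next_hop(routing_table, (15, 14)))   # -> 10.0.0.4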

  31. [Diagram: routing example in a 16x16 coordinate space, key = (15,14)] • Node maintains a routing table with its neighbors • Follow the straight-line path through the Cartesian space

  32. Node Joining • New node must find a node already in the CAN • Next, using the CAN routing mechanisms, it must find a node whose zone will be split • Neighbors of the split zone must be notified so that routing can include the new node

  33. CAN: construction, step 1) The new node discovers some node "I" already in the CAN

  34. CAN: construction, step 2) The new node picks a random point (x,y) in the space

  35. CAN: construction, step 3) I routes to (x,y) and discovers node J

  36. CAN: construction, step 4) J's zone is split in half; the new node owns one half
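A small sketch of step 4 under the same toy zone representation used earlier in these notes: J's rectangular zone is cut in half along one dimension and the upper half is handed to the new node. Splitting along the longest side is an assumption here; the slide does not say how CAN picks the dimension.

def split_zone(lo, hi):
    """Split the rectangular zone [lo, hi) in half along its longest
    dimension; J keeps the lower half, the new node gets the upper half."""
    d = max(range(len(lo)), key=lambda i: hi[i] - lo[i])
    mid = (lo[d] + hi[d]) / 2
    j_hi = list(hi); j_hi[d] = mid          # J's new upper bound
    new_lo = list(lo); new_lo[d] = mid      # new node's lower bound
    return (lo, tuple(j_hi)), (tuple(new_lo), tuple(hi))

# J owned the whole (0,0)-(16,16) space; after the split each node owns half
print(split_zone((0, 0), (16, 16)))
# -> (((0, 0), (8.0, 16)), ((8.0, 0), (16, 16)))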

  37. Node departure • A node explicitly hands over its zone and the associated (key,value) database to one of its neighbors • In case of network failure this is handled by a takeover algorithm • Problem: the takeover mechanism does not regenerate the lost data • Solution: every node keeps a backup of its neighbours' data

  38. Presentation overview • PIER: core functionality and design principles • Distributed join example • CAN high-level functions • Application, PIER, DHT/CAN in detail • Distributed join / operations in detail • Simulation results • Conclusion

  39. Querying the Internet with PIER (PIER = Peer-to-Peer Information Exchange and Retrieval)

  40. What is a DHT? • Take an abstract ID space, and partition among a changing set of computers (nodes) • Given a message with an ID, route the message to the computer currently responsible for that ID • Can store messages at the nodes • This is like a “distributed hash table” • Provides a put()/get() API

  41. [Diagram: given a message with an ID, route the message to the computer currently responsible for that ID; key = (15,14) in a 16x16 coordinate space]

  42. Lots of effort is put into making DHTs better: • Scalable (thousands to millions of nodes) • Resilient to failure • Secure (anonymity, encryption, etc.) • Efficient (fast access with minimal state) • Load balanced • etc.

  43. [Diagram: a declarative query is compiled into a query plan and executed over the overlay network (based on CAN) on top of the physical network] SELECT R.cpr, R.name, S.address FROM R, S WHERE R.cpr = S.cpr

  44. Applications • Any distributed relational database application • Network monitoring • Feasible applications: intrusion detection, fingerprint queries, CPU load monitoring

  45. DHTs • Implemented with CAN (Content Addressable Network) • Each node is identified by a rectangle (zone) in the d-dimensional space • A key is hashed to a point and stored in the corresponding node • A routing table of neighbours is maintained: O(d) state per node

  46. DHT Design • Routing layer: mapping for keys (dynamic as nodes leave and join) • Storage manager: stores the DHT-based data • Provider: storage access interface for the higher levels

  47. DHT – Routing • The routing layer maps a key to the IP address of the node currently responsible for that key. It provides exact lookups and calls back the higher levels when the set of keys owned by the local node has changed. • Routing layer API: asynchronous: lookup(key) -> ipaddr; synchronous (local node): join(landmarkNode), leave(), locationMapChange()
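The routing-layer API above written out as a Python abstract interface; the method names follow the slide, while the callback form of the asynchronous lookup is an assumption of this sketch.

from abc import ABC, abstractmethod

class RoutingLayer(ABC):
    """Routing layer API from the slide: an asynchronous lookup plus
    synchronous local-node calls and a key-set change callback."""

    @abstractmethod
    def lookup(self, key, callback):
        """Asynchronous: resolve key to the responsible node's IP address,
        delivered via callback(ipaddr)."""

    @abstractmethod
    def join(self, landmarkNode):
        """Synchronous, local node: join the overlay via a known landmark node."""

    @abstractmethod
    def leave(self):
        """Synchronous, local node: leave the overlay gracefully."""

    @abstractmethod
    def locationMapChange(self, handler):
        """Register a handler called when the set of keys this node owns changes."""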

  48. DHT – Storage • The Storage Manager stores and retrieves records, which consist of key/value pairs. Keys are used to locate items and can be any supported data type or structure. • Storage Manager API: store(key, items), retrieve(key) -> items (structure), remove(key)
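A minimal in-memory sketch of the Storage Manager API (store / retrieve / remove), assuming a key may hold several items as the plural "items" on the slide suggests.

from collections import defaultdict

class StorageManager:
    """Toy storage manager: each key maps to the items stored under it."""
    def __init__(self):
        self._data = defaultdict(list)
    def store(self, key, item):
        self._data[key].append(item)
    def retrieve(self, key):
        return list(self._data.get(key, []))
    def remove(self, key):
        self._data.pop(key, None)

sm = StorageManager()
sm.store("r1", ("1", "alice"))
print(sm.retrieve("r1"))   # -> [('1', 'alice')]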

  49. DHT – Provider (1) • The Provider ties the routing and storage manager layers together and provides an interface to the higher levels • Each object in the DHT has a namespace, resourceID and instanceID • DHT key = hash1(namespace, resourceID), ..., hashN(namespace, resourceID): one hash per dimension; on CAN the keyspace is 0..2^160 • namespace: application or group of objects, table or relation • resourceID: primary key or any attribute (object) • instanceID: integer, to separate items with the same namespace and resourceID • Lifetime: item storage duration (adheres to the principle of relaxed consistency) • CAN's mapping of resourceID/object is equivalent to an index
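A hedged sketch of the key construction described above: one hash per CAN dimension over (namespace, resourceID). Using SHA-1 and the 0..2^160 range follows the earlier "keyspace is probably SHA-1" slide and is an assumption here, as is the choice of two dimensions.

import hashlib

def dht_key(namespace, resourceID, dimensions=2):
    """One hash per dimension over (namespace, resourceID); the tuple of
    per-dimension values is the point/key handed to CAN."""
    point = []
    for d in range(dimensions):
        digest = hashlib.sha1(f"{d}|{namespace}|{resourceID}".encode()).digest()
        point.append(int.from_bytes(digest, "big"))   # value in 0 .. 2**160 - 1
    return tuple(point)

# e.g. the key for relation "packets" with primary key "10.0.0.1:443"
print(dht_key("packets", "10.0.0.1:443"))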

  50. DHT – Provider (2) Provider API: get(namespace, resourceID) -> item; put(namespace, resourceID, item, lifetime); renew(namespace, resourceID, instanceID, lifetime) -> bool; multicast(namespace, resourceID, item); lscan(namespace) -> items (structure/iterator); newData(namespace, item)
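The Provider API above sketched as a Python interface, with the slide's arrows turned into type annotations; the concrete types and the handling of instanceIDs and lifetimes are assumptions, since the slide only lists the signatures.

from abc import ABC, abstractmethod
from typing import Any, Iterator

class Provider(ABC):
    """Provider API from the slide, tying the routing and storage layers together."""

    @abstractmethod
    def get(self, namespace: str, resourceID: str) -> Any: ...
    @abstractmethod
    def put(self, namespace: str, resourceID: str, item: Any, lifetime: float) -> None: ...
    @abstractmethod
    def renew(self, namespace: str, resourceID: str, instanceID: int, lifetime: float) -> bool: ...
    @abstractmethod
    def multicast(self, namespace: str, resourceID: str, item: Any) -> None: ...
    @abstractmethod
    def lscan(self, namespace: str) -> Iterator[Any]: ...
    @abstractmethod
    def newData(self, namespace: str, item: Any) -> None: ...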
