
Querying The Internet With PIER



Presentation Transcript


  1. Querying The Internet With PIER Nitin Khandelwal

  2. Motivation • Inject a degree of distribution into databases • Internet-scale systems vs. hundred-node systems • Large-scale applications requiring database functionality

  3. Applications • P2P Databases: highly distributed and available data • Network Monitoring: intrusion detection, fingerprint queries

  4. Design Principles • Relaxed Consistency: sacrifice consistency in favour of availability and partition tolerance • Organic Scaling: growth with deployment • Natural Habitats for Data: data remains in its original format, with a DB interface • Standard Schemas: achieved through common software

  5. DHTs • Implemented with CAN (Content Addressable Network). • Each node is identified by a hyper-rectangle (zone) in d-dimensional space. • A key is hashed to a point and stored at the node whose zone contains that point. • Each node maintains a routing table of its neighbours, with O(d) entries.
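
The key-to-node mapping on this slide can be illustrated with a minimal sketch. It is not PIER's or CAN's actual code: the names `key_to_point`, `Zone` and `owner`, the use of SHA-256, and the unit-cube coordinate space are all assumptions made for illustration, and the lookup scans a global list of zones instead of routing hop by hop through neighbours as a real CAN would.

```python
import hashlib


def key_to_point(key: str, d: int = 4) -> tuple:
    """Hash a key to a point in the d-dimensional unit cube [0, 1)^d."""
    if not 1 <= d <= 8:
        raise ValueError("this sketch supports 1 <= d <= 8")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Take 4 bytes of the 32-byte digest per coordinate and scale to [0, 1).
    return tuple(
        int.from_bytes(digest[4 * i:4 * i + 4], "big") / 2**32
        for i in range(d)
    )


class Zone:
    """A node's hyper-rectangle: the half-open interval [lo, hi) per dimension."""

    def __init__(self, node_id, lows, highs):
        self.node_id = node_id
        self.lows = tuple(lows)
        self.highs = tuple(highs)

    def contains(self, point) -> bool:
        return all(lo <= x < hi
                   for lo, x, hi in zip(self.lows, point, self.highs))


def owner(zones, key: str, d: int):
    """Return the id of the node whose zone contains the key's point."""
    point = key_to_point(key, d)
    for zone in zones:
        if zone.contains(point):
            return zone.node_id
    raise LookupError("zones do not cover the key's point")


# Hypothetical 2-dimensional space split into four quadrant zones.
quadrants = [
    Zone("A", (0.0, 0.0), (0.5, 0.5)),
    Zone("B", (0.5, 0.0), (1.0, 0.5)),
    Zone("C", (0.0, 0.5), (0.5, 1.0)),
    Zone("D", (0.5, 0.5), (1.0, 1.0)),
]
print(owner(quadrants, "R:42", d=2))
```

Because the zones partition the space, every key has exactly one owner, and when a node joins or leaves only the affected zones need to be split or merged.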

  6. DHT Design • Routing Layer: maps keys to nodes (the mapping is dynamic as nodes leave and join) • Storage Manager: stores DHT-based data • Provider: storage access interface for higher levels

  7. Provider • Couples the routing and storage layers • namespace: relation • resourceId: primary key • namespace + resourceId >> key • instanceId: distinguishes objects with the same namespace and resourceId • lifetime: item storage duration • LScan, Multicast, Newdata
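
The naming scheme on this slide can be sketched as follows. This is only an illustration under stated assumptions: `dht_key`, `LocalStore`, and the `put`/`get` method names are hypothetical, SHA-1 is an arbitrary choice of hash, and the store is a single in-memory dictionary standing in for one node's storage manager. The sketch shows how namespace and resourceId together form the key, how instanceId separates items that share a key, and how lifetime makes items expire.

```python
import hashlib
import time


def dht_key(namespace: str, resource_id: str) -> int:
    """Combine namespace and resourceId into a single DHT key."""
    # A separator byte keeps ("R", "42") distinct from ("R4", "2").
    data = f"{namespace}\x00{resource_id}".encode("utf-8")
    return int.from_bytes(hashlib.sha1(data).digest(), "big")


class LocalStore:
    """Hypothetical per-node store: key -> {instance_id: (item, expiry)}."""

    def __init__(self):
        self._items = {}

    def put(self, namespace, resource_id, instance_id, item, lifetime, now=None):
        now = time.time() if now is None else now
        key = dht_key(namespace, resource_id)
        self._items.setdefault(key, {})[instance_id] = (item, now + lifetime)

    def get(self, namespace, resource_id, now=None):
        """Return all unexpired items stored under (namespace, resourceId)."""
        now = time.time() if now is None else now
        key = dht_key(namespace, resource_id)
        bucket = self._items.get(key, {})
        live = {iid: (item, expiry)
                for iid, (item, expiry) in bucket.items() if expiry > now}
        # Drop expired entries so stale soft state does not accumulate.
        if live:
            self._items[key] = live
        else:
            self._items.pop(key, None)
        return [item for item, _ in live.values()]
```

An item that is not re-published before its lifetime runs out simply disappears.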

  8. PIER Query Processor • Operators: selection, projection, join, grouping, aggregation • Operators push and pull data • Relaxed consistency and the reachable snapshot: - Ideally, a query works with the data on nodes reachable at the time the query is issued. - Instead, each node uses the arrival of the query multicast message as its snapshot time.

  9. Join Algorithm • R, S: relations • Nr, Ns: relation namespaces • Nq: DHT-based temporary table • Symmetric Hash Join: - Rehashes both relations on the join attribute - Each node scans its local tuples and copies them into a new namespace Nq • Fetch Matches: - One relation (S) is already hashed on the join attribute - Selections on non-join attributes of S cannot be pushed into the DHT
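
The probe-and-insert logic that a symmetric hash join runs at each node can be sketched as below. It is a single-process illustration, not PIER's implementation: the function name, the `(relation, tuple)` stream format, and the dictionary-based tuples are assumptions, and the stream stands in for rehashed tuples arriving in Nq at one node.

```python
from collections import defaultdict


def symmetric_hash_join(stream, join_key):
    """Join tuples from R and S as they arrive, in any interleaving.

    stream yields (relation, tup) pairs with relation "R" or "S";
    join_key(tup) extracts the join attribute. Yields (r, s) pairs.
    """
    tables = {"R": defaultdict(list), "S": defaultdict(list)}
    for relation, tup in stream:
        other = "S" if relation == "R" else "R"
        k = join_key(tup)
        # Insert into this relation's hash table...
        tables[relation][k].append(tup)
        # ...then probe the other relation's table for matches seen so far.
        for match in tables[other].get(k, ()):
            yield (tup, match) if relation == "R" else (match, tup)


arrivals = [
    ("R", {"id": 1, "a": "x"}),
    ("S", {"id": 1, "b": "y"}),
    ("S", {"id": 2, "b": "z"}),
    ("R", {"id": 2, "a": "w"}),
    ("R", {"id": 3, "a": "v"}),
]
results = list(symmetric_hash_join(arrivals, lambda t: t["id"]))
```

Because every tuple is both inserted and probed, each matching pair is produced exactly once no matter which side arrives first, which suits a setting where tuples trickle in from many nodes.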

  10. Join Rewriting • Aimed at lowering bandwidth utilization • Symmetric semi-join: - Local projections to resourceId + join keys - Symmetric Hash Join on the two projections - Global Fetch Matches join using the resourceIds of R and S • Bloom joins (hashed semi-join): - A Bloom filter is a hash-based bit-vector - Local Bloom filters are published into temporary namespaces - Filters are OR-ed and multicast to the opposite relation’s nodes
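
A Bloom filter and the OR-ing step can be sketched as follows. The class, its parameters (`num_bits`, `num_hashes`) and the use of SHA-256 to derive bit positions are assumptions chosen for illustration; the slides do not say how PIER builds its filters.

```python
import hashlib


class BloomFilter:
    """Bit-vector membership summary: no false negatives, some false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as the bit-vector

    def _positions(self, value):
        # Derive num_hashes bit positions by salting the value with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{value}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value) -> None:
        for p in self._positions(value):
            self.bits |= 1 << p

    def __contains__(self, value) -> bool:
        return all((self.bits >> p) & 1 for p in self._positions(value))

    def union(self, other: "BloomFilter") -> "BloomFilter":
        """OR two filters built with the same parameters."""
        if (self.num_bits, self.num_hashes) != (other.num_bits, other.num_hashes):
            raise ValueError("filters must share num_bits and num_hashes")
        merged = BloomFilter(self.num_bits, self.num_hashes)
        merged.bits = self.bits | other.bits
        return merged


# Two hypothetical R nodes summarise their local join keys...
node1, node2 = BloomFilter(), BloomFilter()
for k in (1, 2, 3):
    node1.add(k)
for k in (3, 4):
    node2.add(k)
r_filter = node1.union(node2)

# ...and an S node rehashes only the tuples whose key passes the filter.
s_tuples = [{"id": 2}, {"id": 4}, {"id": 99}]
to_rehash = [t for t in s_tuples if t["id"] in r_filter]
```

Tuples that pass the filter may still fail to join, since false positives are possible, so the join itself must still check the keys; the filter only cuts down how many tuples are shipped.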

  11. Workload Parameters  • CAN configuration: d = 4 • R 10 times larger than S • Constants provide 50% selectivity • f(x,y) evaluated after the join • 90% of R tuples match a tuple in S • Result tuples are 1KB each • Symmetric hash join used

  12. Simulation Setup • Up to 10,000 nodes • Network cross-traffic, CPU and memory utilization ignored • Data shipped from source to computation node for every query operation • Topology 1: fully connected network with 100 ms latency and 10 Mbps links • Topology 2: GT-ITM transit-stub topology (similar results)

  13. Join Algorithms • Infinite bandwidth (to observe the impact of propagation delay alone) • 1024 data and computation nodes • Core join algorithms: complete faster • Rewrites: - Bloom filter: two multicasts - Semi-join: two CAN lookups

  14. Join Algorithms -- 2 • Limited Bandwidth • Symmetric Hash Join: - Rehashes both tables • Semi Joins: - Transfer only matching tuples • At 40% selectivity, bottleneck switches from computation nodes to query sites

  15. Conclusions • Scalability of PIER derives from relaxed design principles: - adoption of soft state - dilated snapshot semantics • Limitation: only equality predicates • Directions: - Pushdown of selections into the DHT - Caching and replication of DHT data - Catalog Manager: stringent consistency and availability requirements.

  16. Sophia: An Information Plane Nitin Khandelwal

  17. Shared Information Plane • Distributed system running throughout the network. - Collects information about network elements: local state (load, memory usage) and local perspective (reachability of other nodes) - Evaluates statements (questions) about the state - Reacts according to the conclusions, e.g. killing a misbehaving service

  18. Challenges • Information is widely distributed and dynamic • Statements are formulated at run time, not a priori • Centralized analysis is not practical: push the analysis to the nodes (into the network)

  19. Approach • Use a logic programming model - The system is dynamic and distributed, hence temporal and positional logic • Why? - Expressivity: intuitive to make statements about the state of the system - Performance: logic expressions can be transformed for efficient evaluation, and partial results can be cached

  20. Time and Position in the Language • Every term in the system has an environment containing time and location • Eval( bandwidth( env (at(node(Node), time(Time), Time > 1032445465, BwVar), BwVar > 40000))

  21. Performance • Aggressive Caching: - Evaluation results are cached - Sometimes latency is more important than freshness - Time environment used to control freshness • Scheduling: - Results are pre-scheduled to be available when and where they may be needed. - The cache can be refreshed with fresh values
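
The idea of using time to control freshness can be sketched in Python as below. It only illustrates the trade-off named on the slide and is not Sophia's mechanism or syntax: `TimedCache`, `evaluate`, the `max_age` parameter and the string term names are all hypothetical.

```python
class TimedCache:
    """Cache of evaluation results, each tagged with its evaluation time."""

    def __init__(self):
        self._entries = {}  # term -> (value, evaluated_at)

    def store(self, term, value, evaluated_at):
        self._entries[term] = (value, evaluated_at)

    def lookup(self, term, now, max_age):
        """Return (True, value) if a result no older than max_age exists."""
        if term in self._entries:
            value, evaluated_at = self._entries[term]
            if now - evaluated_at <= max_age:
                return True, value
        return False, None


def evaluate(cache, term, now, max_age, compute):
    """Answer from the cache when fresh enough, otherwise recompute."""
    hit, value = cache.lookup(term, now, max_age)
    if hit:
        return value
    value = compute()  # e.g. read a sensor on the node
    cache.store(term, value, now)
    return value
```

A caller that values latency passes a large `max_age` and usually gets a cached answer; a caller that values freshness passes a small one and pays for re-evaluation.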

  22. Evaluation Planning • Given an expression, plan: - where (close to the data) - when (the time at which dependencies are resolved) - what to evaluate • Logic expressions can be transformed at run time

  23. Extensibility • Users can add new functionality at run time • Capabilities: protect modules, grant and revoke privileges. cap569354(Val) :- read sensor. cap435456(Val) :- cap569354(Val). bandwidth(Val) :- cap435456(Val). • Module Protection: all predicates are transformed into capabilities, shared through a master key capability • Danger in caching: different interfaces

  24. PIER and Sophia • Sophia: the location of code execution is explicit in the language and can itself be computed in the course of evaluation. • PIER: details of query execution are left to the underlying implementation to optimize. • Consequence: Sophia queries are more sophisticated, since both the user and the system participate in evaluation planning.
