
Query Processing and Networking Infrastructures


Presentation Transcript


  1. Query Processing and Networking Infrastructures Day 2 of 2 Joe Hellerstein UC Berkeley September 27, 2002

  2. Outline • Day 1: Query Processing Crash Course • Intro • Queries as indirection • How do relational databases run queries? • How do search engines run queries? • Scaling up: cluster parallelism and distribution • Day 2: Research Synergies w/Networking • Queries as indirection, revisited • Useful (?) analogies to networking research • Some of our recent research at the seams • Some of your research? • Directions and collective discussion

  3. Indirections

  4. Standard: Spatial Indirection • Allows referent to move without changes to referrers • Doesn’t matter where the object is, we find it. • Alternative: copying • Works if updates are managed carefully, or don’t exist

  5. Temporal Indirection • Asynchronous communication is indirection in time • Doesn’t matter when the object arrives, you find it • Analogy to space • Sender ↔ referrer • Recipient ↔ referent

  6. Generalizing • Indirection in Space • x-to-one or x-to-many? • Physical or Logical mapping? • Indirection in Time • Persistence model: storage or re-xmission • Persistence role: sender or receiver

  7. Indirection in Space, Redux • One-to-one, one-to-many, many-to-many? • Standard relational issue • E.g. virtual address is many-to-one • E.g. email distribution list is one-to-many • Physical or logical? • Physical: mapping table • E.g. page tables, mailing lists, DNS, multicast group lists • Logical • E.g. queries, subscriptions, interests

  8. Indirection in Time, Redux • Persistence model: storage or re-xmission • Storage: e.g. DB, heap, stack, NW buffer, mail queue • Re-xmission: e.g. polling, retries • “Joe is so persistent” • Persistence of put or get • Put: e.g. DB insert, email, retry • Get: e.g. subscription, polling

  9. Examples: Storage Systems • Virtual Memory System • Space: 1-to-1, physical • Time: synchronous (no indirection) • Database System • Space: many-to-many, logical • Time: synchronous (no indirection) • Broadcast Disks • Space: 1-to-1 • Time: re-xmitted put

  10. Examples: Split-Phase APIs • Polling • Space: no indirection • Time: re-xmitted get • Callbacks • Space: no indirection • Time: stored get • Active Messages • Space: no indirection • Time: stored get • App stores a get with the putter, which tags it onto outgoing messages
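
A minimal sketch (Python, not from the talk) of the polling-vs-callback distinction above: a callback is a get stored with the putter, while polling re-transmits the get. The Producer class and its method names are hypothetical.

    import queue
    import time

    class Producer:
        def __init__(self):
            self.items = queue.Queue()   # storage, for pollers
            self.callbacks = []          # "stored gets", for callbacks

        def register(self, cb):
            self.callbacks.append(cb)    # store a get with the putter

        def put(self, item):
            self.items.put(item)         # polling path: item persists in storage
            for cb in self.callbacks:    # callback path: each stored get fires now
                cb(item)

    def poll(producer, retries=5, interval=0.1):
        """Re-transmitted get: re-issue the request until data appears."""
        for _ in range(retries):
            try:
                return producer.items.get_nowait()
            except queue.Empty:
                time.sleep(interval)
        return None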

  11. Examples: Communication • Email • Space: One-to-many, physical • Mapping is one-to-many, delivery is one-to-one (copies) • Time: stored put • Multicast • Space: One-to-many, physical • Both mapping and delivery are one-to-many • Time: roughly synchronous?

  12. Examples: Distributed APIs • RPC • Space: 1-to-1, physical • Can be 1-to-many • Time: synchronous (no indirection) • Messaging systems • Space: 1-to-1, physical • Often 1-to-many • Time: depends! • Transactional messaging is stored put • Exactly-once transmission guaranteed • Other schemes are re-xmitted put • At least once transmission. Idempotency of message becomes important!

  13. Examples: Logic-based APIs • Publish-Subscribe • Space: one-to-many, logical • Time: stored receiver • Tuplespaces • Space: one-to-many, logical • Time: stored sender

  14. Indirection Summary • 2 binary indirection variables for space, 2 for time • Can have indirection in one without the other • Leads to 24 indirection options • 16 joint space/time indirections, 4 space-only, 4 time-only • And few lessons about the tradeoffs! • Note: issues here in performance and SW engineering and … • E.g. “Are tuplespaces better than pub/sub?” • Not a unidimensional question!
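
To make the option space concrete, here is a sketch (Python; the field names are illustrative, not from the talk) that encodes the two binary space variables and two binary time variables as a record, and classifies a few of the examples from the preceding slides:

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Indirection:
        space_mapping: Optional[str]     # "1-to-1", "1-to-many", "many-to-many", or None
        space_kind: Optional[str]        # "physical" or "logical"
        time_persistence: Optional[str]  # "storage" or "re-xmission"
        time_role: Optional[str]         # "put" (sender) or "get" (receiver)

    examples = {
        "virtual memory": Indirection("1-to-1", "physical", None, None),
        "database":       Indirection("many-to-many", "logical", None, None),
        "email":          Indirection("1-to-many", "physical", "storage", "put"),
        "polling":        Indirection(None, None, "re-xmission", "get"),
        "pub/sub":        Indirection("1-to-many", "logical", "storage", "get"),
        "tuplespace":     Indirection("1-to-many", "logical", "storage", "put"),
    }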

  15. Rendezvous • Indirection on both sender and receiver side • In time and/or space on each side • Most general: neither sender nor receiver know where or when rendezvous will happen! • Each chases a reference for where • Each must persist for when

  16. Join as Rendezvous • Recall the pipelined hash join • Combine all blue and gray tuples that match • A batch rendezvous • In space: the data items were not stored in a fixed location, but copied into the hash table • In time: both sides persist their puts in the join algorithm via storage • A hint of things to come: • In parallel DBs, the hash table is content-addressed (via the exchange routing function) • What if the hash table is distributed? • If a tuple in the join is doing a “get”, then is there a distinction between sender/recipient? Between query and data?
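
For reference, a minimal Python sketch of the pipelined (symmetric) hash join being recalled here, assuming a single interleaved stream tagged with the side each tuple comes from. Both sides persist via storage, so matching pairs rendezvous in the hash tables whichever side arrives first:

    from collections import defaultdict

    def symmetric_hash_join(stream, key_left, key_right):
        """stream yields ('L', tup) or ('R', tup) in any interleaving."""
        tables = {'L': defaultdict(list), 'R': defaultdict(list)}
        keys = {'L': key_left, 'R': key_right}
        for side, tup in stream:
            other = 'R' if side == 'L' else 'L'
            k = keys[side](tup)
            tables[side][k].append(tup)      # persist: the "put" half of the rendezvous
            for match in tables[other][k]:   # probe: the "get" half
                yield (tup, match) if side == 'L' else (match, tup)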

  17. Some resonances • We said that query systems are an indirection mechanism. • Logical, many-to-many, but synchronous • Query-response • And some dataflow techniques inside query engines seem to provide useful indirection mechanisms • If we add a network into the picture, life gets very interesting • Indirection in space very useful • Indirection in time is critical • Rendezvous is a basic operation

  18. More Resonance

  19. More Interaction: CS262 Experiment w/ Eric Brewer • Merge OS & DBMS grad class, over a year • Eric/Joe, point/counterpoint • Some tie-ins were obvious: • memory mgmt, storage, scheduling, concurrency • Surprising: QP and networks go well side by side • E.g. eddies and TCP Congestion Control • Both use back-pressure and simple Control Theory to “learn” in an unpredictable dataflow environment

  20. Scout • Paths are the key to a comm-centric OS • “Making Paths Explicit in the Scout Operating System”, David Mosberger and Larry L. Peterson. OSDI ’96. • [Figure 3: Example Router Graph]

  21. CLICK • A NW router is a query plan! • With a twist: flow-based context • An opportunity for “autonomous” query optimization
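
A toy illustration of the claim, not Click’s actual element set: a router pipeline composed of small iterator elements, exactly the way a query executor composes operators (classify ≈ select, TTL decrement ≈ project, route lookup ≈ join against a routing table):

    def classify(packets):               # like a Select operator
        for p in packets:
            if p.get("proto") == "ip":
                yield p

    def decrement_ttl(packets):          # like a Project/map operator
        for p in packets:
            p = dict(p, ttl=p["ttl"] - 1)
            if p["ttl"] > 0:
                yield p

    def lookup_route(packets, table):    # like a join against a routing table
        for p in packets:
            p["next_hop"] = table.get(p["dst"], "default")
            yield p

    def router(packets, table):
        # compose elements exactly as a query executor composes iterators
        return lookup_route(decrement_ttl(classify(packets)), table)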

  22. Revisiting a NW Classic with DB Goggles

  23. Clark & Tennenhouse, SIGCOMM ‘90 • Architectural Considerations for a New Generation of Protocols • Love it for two reasons • Tries to capture the essence of what networks do • Great for people who need the 10,000-foot view! • I’m a fan of doing this (witness last week) • Tries to move the community up the food chain • Resonances everywhere!!

  24. C&T Overview (for amateurs like me) • Core function of protocols: data xfer • Data Manipulation • buffer, checksum, encryption, xfer to/from app space, presentation • Transfer Control • flow/congestion ctl, detecting transmission problems, acks, muxing, timestamps, framing

  25. C&T’s Wacky Ideas (Data Modeling! Query Opt! Exchange!) • Thesis: nets are good at xfer control, not so good at data manipulation • Some C&T wacky ideas for better data manipulation • Xfer semantic units, not packets (ALF) • Auto-rewrite layers to flatten them (ILP) • Minimize cross-layer ordering constraints • Control delivery in parallel via packet content

  26. DB People Should Be Experts! • BUT… remember: • Basic Internet assumption: “a network of unknown topology and with an unknown, unknowable and constantly changing population of competing conversations” (Van Jacobson) • Spoils the whole optimize-then-execute architecture of query optimization • What happens when the environment changes faster than the query runs, i.e. d(environment)/dt outpaces query length?? • What about the competing conversations? • How do we handle the unknown topology? • What about partial failure? • Ideally, we’d like: • the semantics and optimization of DB dataflow • with the agility and efficiency of NW dataflow

  27. The Cosmic Convergence • DATABASE RESEARCH (Data Models, Query Opt, Data Scalability): • Adaptive Query Processing • Continuous Queries, Streams • P2P Query Engines • Sensor Query Engines • XML Routing • NETWORKING RESEARCH (Adaptivity, Federated Control, Geo-Scalability): • Router Toolkits • Content Addressing and DHTs • Directed Diffusion

  28. What does the QP perspective add? • In terms of high-level languages? • In terms of a reusable set of operators? • In terms of optimization opportunities? • In terms of batch-I/O tricks? • In terms of approximate answers? • A “safe” route to Active Networks? • Not computationally complete • Optimizable and reconfigurable -- data independence applies • Fun to be had here! • Addressing a few fronts at Berkeley…

  29. Some of our work at the seams • Starting with centralized engine for remote data sets and streams • Telegraph: eddies, SteMs, FLuX • “Deep Web”, filesharing systems, sensor streams • More recently, querying sensor networks • TinyDB/TAG: in-network queries • And DHT-based overlay networks • PIER

  30. Telegraph Overview

  31. Telegraph: An Adaptive Dataflow System • Themes: Adaptivity and Sharing • Adaptivity encapsulated in operators • Eddies for order of operations • State Modules (SteMs) for transient state • FLuX for parallel load-balance and availability • Work- and state-sharing across flows • Unlike traditional relational schemes, try to share physical structures Franklin, Hellerstein, Hong and students (to follow)

  32. Telegraph Architecture • Request Parsing, Metadata: SQL, XML, Explicit Dataflows; Catalog • Modules: • Online Query Processing: Join, Select, Project, Group Aggregate, Transitive Closure, DupElim • Adaptive Routing and Optimization: Juggle, Eddy, SteM, FLuX • Ingress: File Reader, Sensor Proxy, P2P Proxy, TeSS • Inter-Module Comm and scheduling (Fjords)

  33. Continuous Adaptivity: Eddies • A little more state per tuple • Ready/done bits (extensible a la Volcano/Starburst) • Minimal state in Eddy itself • Queue + parameters being learned • Decisions: which tuple in queue to which operator • Query processing = dataflow routing!! Ron Avnur

  34. Two Key Observations • Break the set-oriented boundary • Usual DB model: algebra expressions: (R ⋈ S) ⋈ T • Common DB implementation: pipelining operators! • Subexpressions needn’t be materialized • Typical implementation is more flexible than algebra • We can reorder in-flight operators • Don’t rewrite graph. Impose a router • Graph edge = absence of routing constraint • Observe operator consumption/production rates • Consumption: cost. Production: cost*selectivity • Could break these down per values of tuples • So fun! • Simple, incremental, general • Brings all of query optimization online • And hence a bridge to ML, Control Theory, Queuing Theory
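
A minimal eddy sketch under simplifying assumptions: operators are filter predicates, a per-tuple "done" set plays the role of the done bits, and routing is weighted toward operators observed to drop more tuples. (The real eddy learns routes with lottery scheduling; this weighted choice is a stand-in.)

    import random

    def eddy(tuples, ops):
        # per-operator stats: [tuples seen, tuples passed] -> observed selectivity
        stats = {op: [1, 1] for op in ops}
        for tup in tuples:
            done, alive = set(), True
            while alive and len(done) < len(ops):
                ready = [op for op in ops if op not in done]
                # favor low-selectivity ops: they reject tuples early
                weights = [1.0 - stats[op][1] / stats[op][0] + 0.01 for op in ready]
                op = random.choices(ready, weights=weights)[0]
                stats[op][0] += 1
                if op(tup):
                    stats[op][1] += 1
                    done.add(op)        # the "done bit" for this operator
                else:
                    alive = False       # tuple rejected; drop it
            if alive:
                yield tup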

  35. State Modules (SteMs) • Goal: Further adaptivity through competition • Multiple mirrored sources (AMs) • Handle rate changes, failures, parallelism • Multiple alternate operators • Join = Routing + State • SteM operator manages tradeoffs • State Module unifies caches, rendezvous buffers, join state • Competitive sources/operators share building/probing SteMs • Join algorithm hybridization! • Eddies + SteMs tackle the full (single-site) query optimization problem online Vijayshankar Raman, Amol Deshpande
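
A sketch of “Join = Routing + State”: a SteM is just build/probe over shared state, and the eddy (not shown) supplies the routing. The class and method names here are illustrative:

    from collections import defaultdict

    class SteM:
        def __init__(self, key_fn):
            self.key_fn = key_fn
            self.table = defaultdict(list)

        def build(self, tup):              # store one half of the rendezvous
            self.table[self.key_fn(tup)].append(tup)

        def probe(self, key):              # the other half arrives later
            return self.table.get(key, [])

    # Two SteMs replace a join operator: the eddy routes an R-tuple to
    # build(stem_r) and then probe(stem_s), and symmetrically for S-tuples.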

  36. FLuX: Fault-tolerant, Load-balancing eXchange • Continuous/long-running flows need high availability • Big flows need parallelism • Adaptive load-balancing req’d • FLuX operator: Exchange plus… • Adaptive flow partitioning (River) • Transient state replication & migration • Replication & checkpointing for SteMs • Note: set-based, not sequence-based! • Needs to be extensible to different ops: • Content-sensitivity • History-sensitivity • Dataflow semantics • Optimize based on edge semantics • Networking tie-in again: At-least-once delivery? Exactly-once delivery? In/out of order? Mehul Shah
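
The content-sensitivity point fits in a few lines: a hashed Exchange must keep each key on one partition, while a content-insensitive edge admits River-style load balancing. Using queue length as the load signal is an assumption for illustration:

    def route(tup, key_fn, queues, content_sensitive=True):
        """Partition one tuple across consumer queues, Exchange-style."""
        n = len(queues)
        if content_sensitive:
            # content-addressed: the same key must always reach the same partition
            queues[hash(key_fn(tup)) % n].append(tup)
        else:
            # content-insensitive: send to the least-loaded consumer (River-style)
            min(queues, key=len).append(tup)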

  37. Continuously Adaptive Continuous Queries (CACQ) • Continuous Queries clearly need all this stuff! • Natural application of Telegraph infrastructure • 4 Ideas in CACQ: • Use eddies to allow reordering of ops • But one eddy will serve for all queries • Queries are data: join with Grouped Filter • A la stored get! • This idea extended in PSoup (Chandrasekaran & Franklin) • Explicit tuple lineage • Mark each tuple with per-op ready/done bits • Mark each tuple with per-query completed bits • Joins via SteMs, shared across all queries • Note: mixed-lineage tuples in a SteM. I.e. shared state is not shared algebraic expressions! • Delete a tuple from the flow only if it matches no query Sam Madden, Mehul Shah, Vijayshankar Raman, Sirish Chandrasekaran
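
A sketch of the “queries are data” idea: a grouped filter indexes many standing queries’ predicates, so each arriving tuple is matched against the whole set at once, i.e. a stored get. Equality predicates only here as a simplification; the names are illustrative:

    from collections import defaultdict

    class GroupedFilter:
        def __init__(self, attr):
            self.attr = attr
            self.queries = defaultdict(set)   # predicate value -> query ids

        def subscribe(self, query_id, value):
            self.queries[value].add(query_id)  # the query is stored as data

        def match(self, tup):
            """Return ids of all standing queries this tuple satisfies."""
            return self.queries.get(tup[self.attr], set())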

  38. Sensor QP: TinyDB/TAG

  39. Wireless Sensor Networks • A spectrum of devices, from Palm devices (Linux) down to Smart Dust motes (TinyOS) • Varying degrees of power and network constraints • Fun is on the small side! • Our current platform: Mica and TinyOS • 4 MHz Atmel CPU, 4 KB RAM, 40 kbit radio, 512 KB EEPROM, 128 KB Flash • Sensors: temp, light, accelerometer, magnetometer, mic, etc. • Wireless, single-ported, multi-hop ad hoc network • Spanning-tree communication through “root”

  40. TinyDB • A query/trigger engine for motes • Declarative (SQL-like) language for optimizability • Data independence arguments in spades here! • Non-programmers can deal with it • Lots of challenges at the seams of queries and routing • Query plans over dynamic multi-hop network • With power and bandwidth consumption as key metrics Sam Madden (w/Hellerstein, Hong, Franklin)

  41. Query Focus: Hierarchical Aggregation • Aggregation natural in sensornets • The “big picture” typically interesting • Aggregation can smooth noise and loss • E.g. signal-processing aggs like wavelets • Provides data reduction • Power/network reduction: in-network aggregation • Hierarchical version of parallel aggregation • Tricky design space • power vs. quality • topology selection • value-based routing • dynamic environment requires adaptivity

  42. TinyDB Sample Apps • Habitat Monitoring: what is the average humidity in the populated petrel burrows on Great Duck Island right now? • Smart Office: find me the conference rooms that have been reserved but unoccupied for 5 minutes. • Home Automation: lower blinds when light intensity is above a threshold.

  43. Performance in SensorNets • Power consumption • Communication >> Computation • METRIC: radio wake time • Send > Receive • METRIC: messages generated • “Run for 5 years” vs. “Burn power for critical events” vs. “Run my experiment” • Bandwidth Constraints • Internal >> External • Volume >> surface area • Result Quality • Noisy sensors • Discrete sampling of continuous phenomena • Lossy communication channel

  44. TinyDB • SQL-like language for specifying continuous queries and triggers • Schema management, etc. • Proxy on desktop, small query engine per mote • Plug and play (query snooping) • To keep the engine “tiny”, use an eddy-style arch • One explicit copy of each iterator’s code image • Adaptive dataflow in network Alpha available for download on SourceForge

  45. Some of the Optimization Issues • Extensible Aggregation API: • Init(), Iter(), SplitFlow(), Close() • Properties • Amount of intermediate state • Duplicate sensitivity • Monotonicity • Exemplary vs. Summary • Hypothesis Testing • Snooping and Suppression • Compression, Presumption, Interpolation Generally, QP and NW issues intertwine!
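
One plausible reading of that API, sketched in Python for AVG, whose intermediate state is the classic (sum, count) pair. SplitFlow() is omitted, and the recursive tree walk is an illustrative stand-in for the real in-network merge up the routing tree from slide 41:

    def avg_init(value):
        return (value, 1)                 # partial state: (sum, count)

    def avg_iter(a, b):
        return (a[0] + b[0], a[1] + b[1])  # merge two partial states

    def avg_close(state):
        return state[0] / state[1]         # final answer at the root

    def aggregate(node, children, readings):
        """Merge this node's reading with its children's partial states."""
        state = avg_init(readings[node])
        for child in children.get(node, []):
            state = avg_iter(state, aggregate(child, children, readings))
        return state

    # e.g. root 0 with children 1 and 2:
    # avg_close(aggregate(0, {0: [1, 2]}, {0: 20.0, 1: 22.5, 2: 21.0}))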

  46. PIER: Querying the Internet

  47. Querying the Internet • As opposed to querying over the Internet • Have to deal with Internet realities • Scale, dynamics, federated admin, partial failure, etc. • Standard distributed DBs won’t work • Applications • Start with real-time, distributed network monitoring • Traffic monitoring, intrusion/spam detection, software deployment detection (e.g. via TBIT), etc. • Use PIER’s SQL as a workload generator for networks? • Virtual “tables” determine load produced by each site • “Queries” become a way of specifying site-to-site communication • Move to infect the network more deeply? • E.g. Indirection schemes like i3, rendezvous mechanisms, etc. • Overlays only?

  48. And p2p QP, Obviously • Gnutella done right • And it’s so easy! :-) • Crawler-free web search • Bring WYGIWIGY queries to the people • Ranking, recommenders, etc. • Got to be more fun here • If p2p takes off in a big way, queries have to be a big piece • Why p2p DB, anyway? • No good reason I can think of! :-) • Focus on the grassroots nature of p2p • Schema integration and transactions and … ?? • No! Work with what you got! Query the data that’s out there • Nothing complicated for users will fly • Avoid the “DB” word: P2P QP, not P2P DB

  49. Approach: Leverage DHTs • “Distributed Hash Tables” • Family of distributed content-routing schemes • CAN, CHORD, Pastry, Tapestry, etc. • Internet scale “hash table” • A la wide-area, adaptive Exchange routing table • With some notion of storage • Leverage DHTs aggressively • As distributed indexes on stored data • As state modules for query processing • E.g. use DHTs as the hash tables in a hash join • As rendezvous points for exchanging info • E.g. Bloom Filters
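
A sketch of “use DHTs as the hash tables in a hash join”: both relations publish tuples under their join keys, so matches rendezvous at whichever node owns the key. The dict-backed DHT and its multimap put/get are stand-ins for a real CAN/Chord API:

    from collections import defaultdict

    class MockDHT:
        def __init__(self):
            self.store = defaultdict(list)   # key -> values, network-wide

        def put(self, key, value):
            self.store[key].append(value)    # really: route to hash(key)'s owner

        def get(self, key):
            return self.store[key]

    def dht_join(dht, relation, side, key_fn):
        """Symmetric: publish each tuple, then probe for the other side."""
        other = 'R' if side == 'L' else 'L'
        for tup in relation:
            k = key_fn(tup)
            dht.put((k, side), tup)
            for match in dht.get((k, other)):
                yield (tup, match) if side == 'L' else (match, tup)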

  50. PIER: P2P Information Exchange and Retrieval • Relational-style query executor • With front-ends for SQL and catalogs • Standard and continuous queries • With access to DHT APIs • Currently CAN and Chord, working on Tapestry • A common DHT API would help • Currently simulating queries running on tens of thousands of nodes • Look ma, it scales! • Widest-scale relational engine ever, looks feasible • Most of the simulator code will live on in the implementation • On Millennium and PlanetLab this fall/winter Ryan Huebsch and Boon Thau Loo (w/Hellerstein, Shenker, Stoica)
