Streaming Data, Continuous Queries, and Adaptive Dataflow

Presentation Transcript

  1. Streaming Data, Continuous Queries, and Adaptive Dataflow
     Michael Franklin, UC Berkeley
     NRC, June 2002

  2. Data Stream Processing
     • Networked data streams are central to current and future computing.
     • Existing data management and query processing infrastructure is lacking:
       • Adaptability
       • Continuous and incremental processing
       • Work sharing at large scale
       • Resource scalability: from “smart dust” up to clusters and grids.
     • XML provides additional opportunities.

  3. Example 1: “Transactional Flows”
     • E-Commerce, clickstream, swipestream, logs…
     • Network Monitoring
     • B2B and Enterprise apps
     • Supply-Chain, CRM, ERP
     • (Quasi) real-time flow of events and data
     • Must manage these flows to drive business processes.
     • Mine flows to create and adjust business rules.
     • Can also “tap into” flows for on-line analysis.

  4. Example 2: Information Dissemination
     • Document creation or a crawler initiates a flow of data towards users.
     • Profiles are aggregated back towards the data.
     [Diagram labels: Data Sources, User Profiles, Filtered Data, Users]

  5. Example 3: Sensor Nets
     • Tiny (or not so tiny) devices measure the physical world.
     • Berkeley “motes”, Smart Dust, Smart Tags, …
     • Many monitoring applications
     • Transportation, Seismic, Energy, Military…
     • Form dynamic ad hoc networks.
     • Aggregate and communicate streams of values.
     • Not one way: can actuate to affect or actively monitor the environment.

  6. Common Features
     • Centrality of Dataflow and Data Routing
     • Architecture is focused on data movement
     • Moving streams of data through code in a network
     • Volatility of the environment
     • Dynamic resources & topology, partial failures
     • Long-running (never-ending?) tasks
     • Potential for user interaction during the flow
     • Large Scale: users, data, resources, …
     • Resource Constraints
     • Bandwidth, memory, processing, battery, …
     • Time and human attention

  7. In The Beginning
     [Diagram labels: Query, Result, Index, Data]

  8. Pub Sub/CQ/Filtering
     [Diagram labels: Data, Result, Index, Queries]
     • Effectively processes all queries simultaneously.
     • Shares work for common sub-expressions.
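
The Pub Sub/CQ/Filtering slide describes indexing the queries rather than the data, so that each arriving item is checked against all standing queries at once. The following is a minimal sketch of that idea, not the system shown in the talk. It assumes queries are conjunctions of equality predicates; the class `QueryIndex` and its methods are hypothetical names introduced only for illustration.

```python
from collections import defaultdict


class QueryIndex:
    """Hypothetical index over standing queries (conjunctions of equalities)."""

    def __init__(self):
        self.by_predicate = defaultdict(set)  # (field, value) -> query ids
        self.predicate_count = {}             # query id -> number of predicates

    def register(self, query_id, predicates):
        # predicates: dict mapping field name -> required value.
        if not predicates:
            raise ValueError("this sketch assumes at least one predicate")
        self.predicate_count[query_id] = len(predicates)
        for field, value in predicates.items():
            self.by_predicate[(field, value)].add(query_id)

    def match(self, record):
        # Each (field, value) pair of the record is looked up once; the hit
        # is shared by every query that contains that predicate.
        hits = defaultdict(int)
        for field, value in record.items():
            for query_id in self.by_predicate.get((field, value), ()):
                hits[query_id] += 1
        # A query matches when all of its predicates were satisfied.
        return sorted(q for q, n in hits.items()
                      if n == self.predicate_count[q])


index = QueryIndex()
index.register("q1", {"symbol": "MSFT"})
index.register("q2", {"symbol": "MSFT", "side": "buy"})
index.register("q3", {"symbol": "IBM"})
print(index.match({"symbol": "MSFT", "side": "buy", "price": 10}))
```

Because `q1` and `q2` share the predicate `symbol = MSFT`, that predicate is evaluated once per record rather than once per query, which is the work sharing the slide refers to.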

  9. Telegraph/PSoup: Query & Data Duality
     [Diagram labels: Result, Data, Index, Index, Queries, Data]

  10. Telegraph/PSoup: Query & Data Duality
      [Diagram labels: Result, Query, Index, Index, Queries, Data]

  11. PSoup – Query Invocation
      • PSoup continuously maintains materialized views over streaming data and queries.
      • Data is returned to the user when a query is invoked.
      • Invocation requires applying “windows” to precomputed results.
      • Adaptive approach allows the system to continuously absorb new data and new queries without recompilation.
      • Lots of issues to study:
        • Query indexing, spilling to disk, bulk processing
        • Other semantics and interaction models (e.g., alerts)
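
The query and data duality on the PSoup slides can be illustrated with a small sketch: new data is probed against stored queries, new queries are probed against stored data, and a window is applied only when a query is invoked. This is an illustrative toy under those assumptions, not PSoup's actual implementation. The class `DualStore`, its method names, and the use of plain Python predicates and numeric timestamps are all hypothetical.

```python
class DualStore:
    """Toy store that treats data and queries symmetrically (hypothetical)."""

    def __init__(self):
        self.data = []      # list of (timestamp, record)
        self.queries = {}   # query id -> predicate function
        self.results = {}   # query id -> list of (timestamp, record) matches

    def add_data(self, timestamp, record):
        # New data probes the stored queries.
        self.data.append((timestamp, record))
        for query_id, predicate in self.queries.items():
            if predicate(record):
                self.results[query_id].append((timestamp, record))

    def add_query(self, query_id, predicate):
        # A new query probes the stored data, so it also sees older records.
        self.queries[query_id] = predicate
        self.results[query_id] = [(t, r) for t, r in self.data if predicate(r)]

    def invoke(self, query_id, now, window):
        # Invocation only applies a time window to precomputed matches.
        return [r for t, r in self.results[query_id] if now - window < t <= now]


store = DualStore()
store.add_data(1, {"temp": 30})
store.add_query("hot", lambda r: r["temp"] > 25)
store.add_data(5, {"temp": 20})
store.add_data(8, {"temp": 40})
print(store.invoke("hot", now=10, window=5))
```

Neither `add_data` nor `add_query` requires rebuilding anything, which mirrors the slide's point about absorbing new data and new queries without recompilation. A real system would index both sides instead of scanning them linearly.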

  12. Stream Processing Research Agenda
      • Need continuously-adaptive processing.
      • Need appropriate data model & query language.
      • Window semantics: input and output
      • Notification semantics & thresholds
      • Approximation, satisficing, and QoS
      • Must be driven by user needs and context
      • Adapt to available resources & time constraints
      • Integration & interaction with “pooled” data.
      • Time travel, archiving, “normal” databases
      • Structured, semi-structured, and unstructured data; XML etc.
      • Sensor-sensitive processing.
      • Metrics and Benchmarks (challenge problems).
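
As one concrete reading of the window and notification items on the research agenda slide, the sketch below keeps a time-based sliding window over a stream and reports when the windowed average crosses a threshold. It is only an example of one possible semantics, not a proposal from the talk. The class `SlidingAverage` and its parameters are hypothetical, and timestamps are assumed to arrive in non-decreasing order with a positive window width.

```python
from collections import deque


class SlidingAverage:
    """Average over the last `width` time units, with a threshold alert."""

    def __init__(self, width, threshold):
        self.width = width
        self.threshold = threshold
        self.items = deque()   # (timestamp, value), oldest first
        self.total = 0.0

    def push(self, timestamp, value):
        self.items.append((timestamp, value))
        self.total += value
        # Evict values that have fallen out of the window (t - width, t].
        while self.items[0][0] <= timestamp - self.width:
            _, old = self.items.popleft()
            self.total -= old
        average = self.total / len(self.items)
        # The second element is the notification: True when over threshold.
        return average, average > self.threshold


monitor = SlidingAverage(width=10, threshold=50)
for t, v in [(0, 40), (5, 70), (12, 10)]:
    print(t, monitor.push(t, v))
```

Other choices are equally plausible, such as count-based windows, or notifying only on the transition across the threshold rather than on every update. Which of these fits is exactly the kind of semantic question the slide raises.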

  13. Conclusions
      • Dataflow and streaming are central to many emerging application areas.
      • Solutions require a mixture of database and networking approaches:
        • adaptivity and tolerance of partial failure
        • exploitation of user, app, and data semantics
      • A new infrastructure is needed for solving these problems.
      • Duality of Data and Queries
      • Currently a topic of major interest in the research community.