Telegraph: An Adaptive Global-Scale Query Engine Joe Hellerstein
Scenarios • Ubiquitous computing: more than clients! • sensors and their data feeds are key • smart dust, biomedical (MEMS sensors) • each consumer good records (mis)use • disposable computing • video from surveillance cameras, broadcasts, etc. • Global Data Federation • all the data is online – what are we waiting for? • The plumbing is coming • XML/HTTP, etc. give LCD communication • but how do you query robustly over many sites in the wide area?
There’s a Data Flood Coming • What does it look like? • Never ends: interactivity required • Big: data reduction/aggregation is key • Unpredictable: this scale of devices and nets will not behave nicely
The Telegraph Query Engine • Key technologies • Interactive Control • interactivity with early answers • online aggregation for data reduction • Continuously adaptive flow optimization • massively parallel, adaptive dataflow via Rivers and Eddies
CONTROL: Continuous Output, Navigation & Transformation with Refinement On Line • Data-intensive jobs are long-running. How to give early answers and interactivity? • online interactivity over feeds: data “juggle” • online query processing algs: ripple joins • statistical estimators, and their performance implications • Appreciate interplay of massive data processing, stats, and UIs
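The "early answers" idea can be illustrated with a running aggregate that reports an estimate and an error bar while the data is still streaming in. The following is a minimal sketch, not the CONTROL estimators themselves: it assumes rows arrive in random order, uses a plain normal-approximation confidence interval, and the names `online_avg`, `z`, and `report_every` are hypothetical.

```python
import math
import random


def online_avg(stream, z=1.96, report_every=1000):
    """Yield (rows_seen, running_mean, half_width) as rows arrive.

    Assumes the stream is in random order, so the rows seen so far
    behave like a random sample of the full input.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations (Welford's method)
    for x in stream:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n > 1 and n % report_every == 0:
            sample_var = m2 / (n - 1)
            # Approximate 95% interval half-width when z = 1.96.
            yield n, mean, z * math.sqrt(sample_var / n)


# Synthetic data, for illustration only.
random.seed(0)
data = [random.gauss(50.0, 10.0) for _ in range(10_000)]
estimates = list(online_avg(data))

for rows, est, half_width in estimates:
    print(f"{rows:>6} rows: avg ~ {est:.2f} +/- {half_width:.2f}")
```

Each report is usable on its own, and the interval narrows as more rows are seen, so a user can stop the query once the answer is precise enough.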
CONTROL: Continuous Output and Navigation Technology with Refinement On Line
River • We built the world’s fastest sorting machine • On the “NOW”: 100 Sun workstations + SAN • But it only beat the record under ideal conditions! • River: performance adaptivity for data flows on clusters • simplifies management and programming • perfect for sensor-based streams
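Why performance adaptivity matters on a cluster can be shown with a toy simulation: a static, equal split of the work is held back by the slowest node, while letting each node pull the next item when it becomes free shifts work toward the faster nodes. This is an illustrative model only, not River's actual mechanism or API; the function names and service times are assumptions.

```python
import heapq


def simulate_pull_queue(n_items, service_times):
    """Each consumer takes the next item as soon as it is free.

    service_times[i] is the time consumer i needs per item.
    Returns (items handled per consumer, overall finish time).
    """
    counts = [0] * len(service_times)
    # Heap of (time at which the consumer becomes free, consumer index).
    free_at = [(0.0, i) for i in range(len(service_times))]
    heapq.heapify(free_at)
    finish = 0.0
    for _ in range(n_items):
        t, i = heapq.heappop(free_at)
        counts[i] += 1
        done = t + service_times[i]
        finish = max(finish, done)
        heapq.heappush(free_at, (done, i))
    return counts, finish


def simulate_static_split(n_items, service_times):
    """Every consumer gets an equal share, decided up front.

    Assumes n_items divides evenly among the consumers.
    """
    share = n_items // len(service_times)
    return max(share * s for s in service_times)


# Two healthy nodes and one node running four times slower.
speeds = [1.0, 1.0, 4.0]
counts, adaptive_finish = simulate_pull_queue(90, speeds)
static_finish = simulate_static_split(90, speeds)
print("items per node:", counts)
print("adaptive finish:", adaptive_finish, "static finish:", static_finish)
```

In this model the slow node simply ends up with fewer items, so one misbehaving machine no longer dictates the completion time of the whole job.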
Eddy • How to order and reorder operators over time • based on performance, economic/admin feedback • Vs. River: • River optimizes each operator “horizontally” • Eddies optimize a pipeline “vertically”
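Reordering operators over time can be sketched as a router that tracks how often each operator passes a tuple and sends each new tuple to the currently most selective operator first. This is a simplified greedy policy chosen for illustration, not the routing policy used by Eddies in Telegraph; the class, its fields, and the example predicates are hypothetical.

```python
class Eddy:
    """Routes each tuple through filter operators in an adaptive order."""

    def __init__(self, operators):
        # operators: dict mapping a name to a predicate(tuple) -> bool
        self.operators = operators
        self.seen = {name: 0 for name in operators}
        self.passed = {name: 0 for name in operators}

    def pass_rate(self, name):
        # Smoothed estimate; an operator with no history starts at 0.5.
        return (self.passed[name] + 1) / (self.seen[name] + 2)

    def route(self, tup):
        """Return True if the tuple survives every operator."""
        remaining = set(self.operators)
        while remaining:
            # Try the operator most likely to reject the tuple first.
            name = min(sorted(remaining), key=self.pass_rate)
            remaining.remove(name)
            self.seen[name] += 1
            if not self.operators[name](tup):
                return False
            self.passed[name] += 1
        return True


eddy = Eddy({
    "even": lambda x: x % 2 == 0,
    "small": lambda x: x < 10,  # passes early tuples, rejects later ones
})
results = [x for x in range(1000) if eddy.route(x)]
print(results)
print(eddy.seen)
```

Here the "small" filter starts out passing everything, so "even" runs first; once the data shifts and "small" begins rejecting tuples, the router moves it to the front without any re-planning step.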
Telegraph: Putting it Together • Scalable, adaptive dataflow infrastructure. Apps include… • sensor nets • massively parallel and wide-area query engines • net appliances: chaining xform8n/aggreg8n/etc. proxies • any unpredictable dataflow scenario • Technology: a marriage of… • CONTROL, River & Eddy • Many research questions here • E.g. how to combine River and Eddy adaptivity • E.g. how to tune Eddies for statistical performance goals • Combinations of browse/query/mine at UI • Storage management to handle new hardware realities
Integration with Endeavour • Give • Be data-intensive backbone to diverse clients • Be replication dataflow engine for OceanStore • Telegraph Storage Manager provides storage (xactional/otherwise) for OceanStore • Provide platform for data-intensive “tacit info mining” • Take • Leverage OceanStore to manage distributed metadata, security • Leverage protocols out of TinyOS for sensors
Additional Slides • For use in questions, etc.
Connectivity & Heterogeneity • Lots of folks working on data format translation, parsing • we will borrow, not build • currently using JDBC & Cohera Net Query • commercial tool, donated by Cohera Corp. • gateways XML/HTML (via HTTP) to ODBC/JDBC • we may write “Teletalk” gateways from sensors • Heterogeneity • never a simple problem • CONTROL project developed interactive, online data transformation tool: Potter’s Wheel