
High Fan-in

HiFi Systems: Network-Centric Query Processing for the Physical World

Michael Franklin

UC Berkeley

2.13.04

Introduction
  • Continuing improvements in sensor devices
    • Wireless motes
    • RFID
    • Cellular-based telemetry
  • Cheap devices can monitor the environment at a high rate.
  • Connectivity enables remote monitoring at many different scales.
  • Widely different concerns at each of these levels and scales.
Plan of Attack
  • Motivation/Applications/Examples
  • Characteristics of HiFi Systems
  • Foundational Components
    • TelegraphCQ
    • TinyDB
  • Research Issues
  • Conclusions
RFID - Retail Scenario
  • “Smart Shelves” continuously monitor item addition and removal.
  • Info is sent back through the supply chain.
“Extranet” Information Flow

[Diagram: item data flows from Retailer A and Retailer B through an Aggregation/Distribution Service to Manufacturer C and Manufacturer D.]

M2M - Telemetry/Remote Monitoring
  • Energy Monitoring - Demand Response
  • Traffic
  • Power Generation
  • Remote Equipment
Time-Shift Trend Prediction
  • National companies can exploit East Coast/ West Coast time differentials to optimize West Coast operations.
Virtual Sensors
  • Sensors don’t have to be physical sensors.
  • Network Monitoring algorithms for detecting viruses, spam, DoS attacks, etc.
  • Disease outbreak detection

HiFi System Properties
  • High Fan-In, globally-distributed architecture.
  • Large data volumes generated at edges.
    • Filtering and cleaning must be done there.
  • Successive aggregation as you move inwards.
    • Summaries/anomalies continually, details later.
  • Strong temporal focus.
  • Strong spatial/geographic focus.
  • Streaming data and stored data.
  • Integration within and across enterprises.

One View of the Design Space

[Diagram: a time-scale axis from seconds to years. Filtering/Cleaning/Alerts sit at the seconds end (on-the-fly processing); Monitoring, Time-series, and Data mining over recent history sit in the middle (combined stream/disk processing); Archiving with provenance and schema evolution sits at the years end (disk-based processing).]


Another View of the Design Space

[Diagram: a geographic-scope axis from local to global. Filtering/Cleaning/Alerts occur near several readers (local); Monitoring, Time-series, and Data mining over recent history occur at regional centers; Archiving with provenance and schema evolution occurs at the central office (global).]


One More View of the Design Space

[Diagram: degree of detail versus aggregate data volume. Filtering/Cleaning/Alerts perform duplicate elimination over hours of history; Monitoring/Time-series capture interesting events over days of history; Data mining and Archiving keep trends and an archive over years of history.]

TelegraphCQ: Monitoring Data Streams
  • Streaming Data
    • Network monitors
    • Sensor Networks
    • News feeds
    • Stock tickers
  • B2B and Enterprise apps
    • Supply-Chain, CRM, RFID
    • Trade Reconciliation, Order Processing, etc.
  • (Quasi) real-time flow of events and data
  • Must manage these flows to drive business (and other) processes.
  • Can mine flows to create/adjust business rules or to perform on-line analysis.
TelegraphCQ (Continuous Queries)
  • An adaptive system for large-scale shared dataflow processing.
  • Based on an extensible set of operators:

1) Ingress (data access) operators

      • Wrappers, File readers, Sensor Proxies

2) Non-blocking data processing operators

      • Selections (filters), XJoins, …

3) Adaptive routing operators

      • Eddies, STeMs, FLuX, etc.
  • Operators connected through “Fjords”
    • A queue-based framework unifying push & pull (see the sketch below).
    • Fjords will also allow us to easily mix and match streaming and stored data sources.
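To make the Fjords idea concrete, here is a minimal Python sketch of a queue that unifies push and pull (illustrative only; the class and method names are invented, not the actual Fjords API). Streaming sources push tuples in, stored sources are pulled on demand, and the downstream operator sees one non-blocking interface either way.

from collections import deque

class Fjord:
    """Illustrative stand-in for a Fjords-style queue (hypothetical API):
    push sources enqueue asynchronously, pull sources are polled lazily,
    and consumers use one non-blocking get() either way."""
    def __init__(self, pull_source=None):
        self.buf = deque()
        self.pull_source = pull_source        # callable returning a tuple or None

    def push(self, tup):                      # called by streaming (push) sources
        self.buf.append(tup)

    def get(self):                            # called by the downstream operator
        if self.buf:                          # prefer already-pushed tuples
            return self.buf.popleft()
        if self.pull_source is not None:      # otherwise poll the pull source
            return self.pull_source()
        return None                           # nothing available; never block

# Example: a selection operator reads from the same queue whether the tuple
# was pushed by a sensor proxy or pulled from a stored table scan.
stored = iter([{"id": 1, "mag": 3}, {"id": 2, "mag": 9}])
q = Fjord(pull_source=lambda: next(stored, None))
q.push({"id": 7, "mag": 12})                  # a streaming tuple arrives

while (t := q.get()) is not None:
    if t["mag"] > 5:                          # the selection (filter) operator
        print(t)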

Extreme Adaptivity

[Spectrum of adaptivity, from static to per-tuple: static plans (current DBMS); late binding (dynamic, parametric, competitive); inter-operator (query scrambling, mid-query re-optimization); intra-operator (XJoin, DPHJ, convergent QP); per tuple (Eddies, CACQ, PSoup); ???]
  • The per-tuple end of this spectrum is the region we are exploring in the Telegraph project.
  • Traditional query optimization depends on statistical knowledge of the data and a stable environment.

The streaming world has neither.


Adaptivity Overview [Avnur & Hellerstein 2000]

[Diagram: a static dataflow sends every tuple through operators A → B → C → D in a fixed order; an eddy instead routes each tuple individually among operators A, B, C, and D.]

  • How to order and reorder operators over time?
  • Traditionally, use performance, economic/admin feedback
  • won’t work for never-ending queries over volatile streams
  • Instead, use adaptive record routing.
    • Reoptimization = change in routing policy (see the sketch below)
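A toy sketch of per-tuple routing in the spirit of an eddy (Python, illustrative; the filter operators and the weight-based policy are assumptions, not the actual CACQ/TelegraphCQ policy). Each tuple visits the remaining operators in an adaptively chosen order, and operators that reject many tuples are favored earlier, so reoptimization is nothing more than a change in the routing weights.

import random

# Commutative filter "operators"; any visit order gives the same answer.
ops = {
    "A": lambda t: t["temp"] > 20,
    "B": lambda t: t["room"] == "lab",
    "C": lambda t: t["light"] < 400,
}
weights = {name: 1.0 for name in ops}          # routing policy state

def eddy(tup):
    """Route one tuple through all operators, picking the next operator
    adaptively; reward operators that reject tuples (high selectivity)."""
    done = set()
    while len(done) < len(ops):
        remaining = [n for n in ops if n not in done]
        name = random.choices(remaining, [weights[n] for n in remaining])[0]
        if not ops[name](tup):
            weights[name] += 0.1               # it filtered: route to it earlier next time
            return None                        # tuple rejected
        done.add(name)
    return tup                                 # tuple passed every operator

stream = [{"temp": 25, "room": "lab", "light": 300},
          {"temp": 15, "room": "lab", "light": 500}]
print([eddy(t) for t in stream])               # second tuple is filtered out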

The TelegraphCQ Architecture

[Diagram: the TelegraphCQ Front End (Listener, Parser, Planner, Mini-Executor, Proxy, Catalog) exchanges query plans, eddy control messages, and query results with TelegraphCQ Back Ends through shared-memory queues and a shared-memory buffer pool. Each Back End runs a CQEddy over operator modules, scans, and Split operators; the TelegraphCQ Wrapper ClearingHouse hosts wrappers for external sources and accesses the disk. A single CQEddy can encode multiple queries.]

The StreaQuel Query Language

SELECT projection_list
FROM from_list
WHERE selection_and_join_predicates
ORDERED BY
TRANSFORM … TO
WINDOW … BY

  • Target language for TelegraphCQ
  • Windows can be applied to individual streams
  • Window movement is expressed using a “for loop” construct in the “transform” clause
  • We’re not completely happy with our syntax at this point.
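Purely as an illustration of the clause skeleton above (the stream, attributes, and window arguments are invented, and this is not exact StreaQuel syntax), a sliding-window query might read:

SELECT COUNT(*)
FROM rfidReadings r                 -- hypothetical stream
WHERE r.action = 'removed'
ORDERED BY r.time
TRANSFORM window TO window + 1 min  -- window movement: the "for loop" construct slides the window
WINDOW r BY 5 min                   -- a five-minute window over r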
Current Status - TelegraphCQ
  • System developed by modifying PostgreSQL.
  • Initial Version released Aug 03
    • Open Source (PostgreSQL license)
    • Shared joins with windows and aggregates
    • Archived/unarchived streams
    • Next major release planned this summer.
  • Initial users include
    • Network monitoring project at LBL (Netlogger)
    • Intrusion detection project at Eurecom (France)
    • Our own project on Sensor Data Processing
    • Class projects at Berkeley, CMU, and ???

Visit http://telegraph.cs.berkeley.edu for more information.

TinyDB

SELECT MAX(mag)
FROM sensors
WHERE mag > thresh
SAMPLE PERIOD 64ms

[Diagram: an application issues queries and triggers to TinyDB, which executes them over the sensor network and streams data back.]

  • Query-based interface to sensor networks
  • Developed on TinyOS/Motes
  • Benefits
    • Ease of programming and retasking
    • Extensible aggregation framework
    • Power-sensitive optimization and adaptivity
  • Sam Madden (Ph.D. Thesis) in collaboration with Wei Hong (Intel).

http://telegraph.cs.berkeley.edu/tinydb

Declarative Queries in Sensor Nets
  • Many sensor network applications can be described using query language primitives.
    • Potential for tremendous reductions in development and debugging effort.

SELECT nestNo, light

FROM sensors

WHERE light > 400

EPOCH DURATION 1s

“Report the light intensities of the bright nests.”



Aggregation Query Example

[Diagram: regions with AVG(sound) > 200 highlighted.]

“Count the number of occupied nests in each loud region of the island.”

SELECT region, CNT(occupied), AVG(sound)
FROM sensors
GROUP BY region
HAVING AVG(sound) > 200
EPOCH DURATION 10s
Query Language (TinySQL)

SELECT <aggregates>, <attributes>

[FROM {sensors | <buffer>}]

[WHERE <predicates>]

[GROUP BY <exprs>]

[SAMPLE PERIOD <const> | ONCE]

[INTO <buffer>]

[TRIGGER ACTION <command>]
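As one instance of this template (attribute and buffer names are hypothetical, but the clauses follow the grammar above), a query could average the temperature per room every two seconds and log the result into a buffer:

SELECT roomNo, AVG(temp)
FROM sensors
WHERE light > 100
GROUP BY roomNo
SAMPLE PERIOD 2s
INTO roomTemps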


Sensor Queries @ 10000 Ft

[Diagram: a query is disseminated down the sensor-network routing tree; the root forwards it to {A,B,C,D,E,F}, and interior nodes forward it to their subtrees, e.g. {B,D,E,F} and {D,E,F}.]

(Almost) All Queries are Continuous and Periodic

  • Written in SQL
  • With extensions for:
    • Sample rate
    • Offline delivery
    • Temporal Aggregation


In-Network Processing: Aggregation

SELECT COUNT(*) FROM sensors

[Animation across several slides: five sensor nodes (1-5) arranged in a routing tree evaluate the query epoch by epoch. In each interval, a node adds its own reading to the partial counts received from its children and forwards a single partial count to its parent; by the last interval the root holds the full COUNT(*) for the epoch. A table tracks the partial counts per sensor and per interval.]

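The idea behind the animation can be sketched in a few lines of Python (illustrative; the tree shape mirrors the five-node example, but the code is not TinyDB's). Each node adds its own reading to the partial counts reported by its children and sends a single partial count upward, so the root finishes the epoch with COUNT(*) while every node transmits only one small message.

# Routing tree for the five-node example; node 1 is the root.
children = {1: [2, 3], 2: [4], 3: [5], 4: [], 5: []}

def partial_count(node):
    """Partial state for COUNT(*): one for the node's own reading plus
    the partial counts sent up by its children."""
    return 1 + sum(partial_count(c) for c in children[node])

# One epoch: the root's partial state is the final answer.
print(partial_count(1))   # -> 5, one short message per node per epoch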
In-Network Aggregation: Example Benefits

Simulation setup: 2500 nodes in a 50x50 grid, tree depth ≈ 10, ≈ 20 neighbors per node.


Taxonomy of Aggregates
  • TinyDB insight: classify aggregates according to various functional properties
    • Yields a general set of optimizations that can be applied automatically (see the sketch below)
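A rough sketch of how such a classification can drive optimization (Python; the property labels follow the published TinyDB/TAG-style taxonomy as best I recall, so treat the exact entries as an approximation rather than the slide's list). Aggregates with small, mergeable partial state can be combined in-network, while holistic ones must ship raw values to the root.

# Illustrative classification: partial-state class, duplicate sensitivity,
# and whether the aggregate is exemplary (picks a value) or summary.
AGGREGATES = {
    "COUNT":  ("distributive", True,  "summary"),
    "SUM":    ("distributive", True,  "summary"),
    "MIN":    ("distributive", False, "exemplary"),
    "MAX":    ("distributive", False, "exemplary"),
    "AVG":    ("algebraic",    True,  "summary"),    # carries (sum, count)
    "MEDIAN": ("holistic",     True,  "exemplary"),  # needs all values
}

def push_into_network(agg):
    """Only aggregates with constant-size, mergeable partial state are
    combined in-network; holistic ones are evaluated at the root."""
    state_class, _, _ = AGGREGATES[agg]
    return state_class in ("distributive", "algebraic")

print([a for a in AGGREGATES if push_into_network(a)])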
Current Status - TinyDB
  • System built on top of TinyOS (~10K lines of embedded C code). Latest release: 9/2003.
  • Several deployments including redwoods at UC Botanical Garden

[Deployment: motes placed up a 36m redwood at 33m (node 111), 32m (110), 30m (109, 108, 107), 20m (106, 105, 104), and 10m (103, 102, 101).]

Visit http://telegraph.cs.berkeley.edu/tinydb for more information.


Ursa - A HiFi Implementation

[Diagram: Ursa-Minor (TinyDB-based) at the edges, a mid-tier (???), and Ursa-Major (TelegraphCQ with archiving) at the interior.]
  • Current effort toward building an integrated infrastructure that spans large scales in:
    • Time
    • Geography
    • Resources
TelegraphCQ/TinyDB Integration
  • Fjords [Madden & Franklin 02] provide the dataflow plumbing necessary to use TinyDB as a data stream.
  • Main issues revolve around what to run where.
    • TCQ is a query processor
    • TinyDB is also a query processor
    • Optimization criteria include: total cost, response time, answer quality, answer likelihood, power conservation on motes, …
  • Project ongoing; should be working by summer.
  • Related work: Gigascope work at AT&T
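A toy version of one such what-to-run-where decision (all numbers and the cost model are made up for illustration; the real criteria listed above also include response time, answer quality, and answer likelihood). It compares the mote-side energy of pushing a selective filter into TinyDB against shipping every reading up to TelegraphCQ.

def mote_energy(n, sel, push_filter, e_msg=20.0, e_pred=0.1):
    """Energy spent on the motes per epoch (hypothetical units): evaluating
    the predicate locally costs e_pred per reading; every transmitted
    reading costs e_msg."""
    sent = n * (sel if push_filter else 1.0)
    eval_cost = n * e_pred if push_filter else 0.0
    return sent * e_msg + eval_cost

n, sel = 1000, 0.05                    # hypothetical workload: 5% of readings pass
plans = {p: mote_energy(n, sel, p) for p in (True, False)}
best = min(plans, key=plans.get)
print(plans, "-> push the filter to the motes" if best else "-> filter in TelegraphCQ")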
TCQ-based Overlay Network
  • TCQ is primarily a single-node system
    • Flux operators [Shah et al 03] support cluster-based processing.
  • Want to run TCQ at each internal node.
  • Primary issue is support for wide-area temporal and geographic aggregation.
    • In an adaptive manner, of course
  • Currently under design.
  • Related work: Astrolabe, IRISNet, DBIS, …
Querying the Past, Present, and Future
  • Need to handle archived data
    • Adaptive compression can reduce processing time.
    • Historical queries
    • Joins of Live and Historical Data
    • Deal with late-arriving detail info
  • Archiving Storage Manager - A Split-stream SM for stream and disk-based processing.
  • Initial version of new SM running.
  • Related Work: Temporal and Time-travel DBs
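A minimal sketch of the split-stream idea described above (Python; the class and file format are invented, not the actual Archiving Storage Manager interface). Each arriving tuple is pushed to live query processing immediately and appended to an on-disk archive so that historical queries, and joins of live with historical data, can replay it later.

import json, time

class SplitStreamSM:
    """Toy split-stream storage manager (hypothetical design): every tuple
    goes both to live query processing and to an append-only archive."""
    def __init__(self, path, live_consumers):
        self.path = path
        self.archive = open(path, "a")
        self.live_consumers = live_consumers   # callables fed in real time

    def append(self, tup):
        tup = dict(tup, ts=time.time())
        for consume in self.live_consumers:    # streaming side: push now
            consume(tup)
        self.archive.write(json.dumps(tup) + "\n")   # disk side: keep history
        self.archive.flush()

    def replay(self):
        """Re-read the archive, e.g. to join live with historical data."""
        with open(self.path) as f:
            for line in f:
                yield json.loads(line)

sm = SplitStreamSM("/tmp/archive.jsonl", [lambda t: print("live:", t)])
sm.append({"sensor": 101, "light": 420})
print(list(sm.replay())[-1])                   # the archived (historical) copy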
XML, Integration, and Other Realities
  • Eventually need to support XML
  • Must integrate with existing enterprise apps.

In many areas, standardization is well underway.

  • Augmenting moving data
  • Related Work: YFilter [Diao & Franklin 03], Mutant Queries [Papadimos et al. OGI], 30+ years of data integration research, 10+ years of XML research, …

High Fan-in → High Fan-out


Conclusions
  • Sensors, RFIDs, and other data collection devices enable real-time enterprises.
  • These will create high fan-in systems.
  • Can exploit recent advances in streaming and sensor data management.
  • Lots to do!