StreamBase Systems. Stream Processing Overview. Dr. Stan Zdonik, Co-Founder March 14, 2006. Agenda. Problem Space and Landscape Case Scenarios Technical Approaches to CEP What is required of a Stream Processing Engine Emphasis on StreamSQL Future Directions for the Community. Investors.

Presentation Transcript

StreamBase Systems

Stream Processing Overview

Dr. Stan Zdonik, Co-Founder

March 14, 2006

Agenda
  • Problem Space and Landscape
  • Case Scenarios
  • Technical Approaches to CEP
  • What is required of a Stream Processing Engine
    • Emphasis on StreamSQL
  • Future Directions for the Community

Investors

Partners

StreamBase at a Glance

  • Founded in 2003 by Dr. Mike Stonebraker (Ingres, Illustra)
  • Initial research prototype at MIT, Brown, & Brandeis (2001).
  • Boston-based company, with offices in NY, Washington, DC, & Europe
  • Financial backing by tier-one venture capital firms
  • Solid, growing customer base
  • Do for real-time data what relational database and SQL do for stored data

Use Case: Running VWAP

  • Scenario:
    • Every minute for every stock I am trading:
      • Calculate VWAP (vol. weighted avg. price) for my trades & all trades
      • Alert whenever my personal trading execution is inferior to market
  • Solution:
    • 5 StreamBase operators, 30 min to build
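The VWAP scenario above can be sketched in plain Python. This is a batch illustration, not the five-operator StreamBase application from the slide; the tuple layout `(symbol, ts_seconds, price, volume)`, the function names, and the reading of "inferior" as "bought above (or sold below) the market VWAP" are all assumptions made for the example.

```python
from collections import defaultdict

def vwap_per_minute(trades):
    """Return {(symbol, minute): VWAP} for (symbol, ts_seconds, price, volume) tuples."""
    notional = defaultdict(float)  # sum of price * volume per (symbol, minute)
    volume = defaultdict(float)    # sum of volume per (symbol, minute)
    for symbol, ts, price, vol in trades:
        key = (symbol, int(ts // 60))
        notional[key] += price * vol
        volume[key] += vol
    return {k: notional[k] / volume[k] for k in notional if volume[k] > 0}

def inferior_executions(my_trades, all_trades, side="buy"):
    """List (symbol, minute, my_vwap, market_vwap) where my execution was worse.

    Assumption: for buys, "inferior" means my VWAP is above the market VWAP;
    for sells, below it.
    """
    mine = vwap_per_minute(my_trades)
    market = vwap_per_minute(all_trades)
    alerts = []
    for key, my_vwap in sorted(mine.items()):
        mkt = market.get(key)
        if mkt is None:
            continue
        worse = my_vwap > mkt if side == "buy" else my_vwap < mkt
        if worse:
            alerts.append((key[0], key[1], my_vwap, mkt))
    return alerts

# Hypothetical data: two market trades in minute 0, one of them mine.
all_trades = [("IBM", 0, 80.0, 100), ("IBM", 30, 82.0, 100)]
my_trades = [("IBM", 30, 82.0, 100)]
alerts = inferior_executions(my_trades, all_trades)
```

A streaming engine would emit the per-minute result as each window closes rather than scanning a finished list.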
[Figure: Example of IP intrusion detection with StreamBase. Operators shown: group by IP prefix with sum (e.g. 18.31.0.*), filtered on sum > T1; group by source IP with count (e.g. 18.31.0.89), filtered on count > T1; group by source IP with count of distinct protocols (http, dns, ssh), filtered on count > T2; a join on source IP.]

Use Case: Intrusion Detection

  • Client Scenario
    • Need to identify unusual patterns in IP connections
  • Solution
    • Implement sophisticated filtering & monitoring to drive real-time alerting
    • Immediate termination of suspicious user access
  • Delivery
    • Process, analyze, & act on 50k msgs/sec
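As a rough sketch of the filtering described for this use case, the following Python counts connections and distinct protocols per source IP over one window and joins the two filtered sets on source IP. The thresholds `t1` and `t2` stand in for T1 and T2 from the diagram; the input layout `(source_ip, protocol)` and the function name are assumptions, and the IP-prefix branch is left out for brevity.

```python
from collections import defaultdict

def suspicious_sources(connections, t1, t2):
    """connections: iterable of (source_ip, protocol) seen in one window."""
    conn_count = defaultdict(int)   # group by source IP; count
    protocols = defaultdict(set)    # group by source IP; distinct protocols
    for src, proto in connections:
        conn_count[src] += 1
        protocols[src].add(proto)
    busy = {ip for ip, n in conn_count.items() if n > t1}         # filter count > T1
    varied = {ip for ip, p in protocols.items() if len(p) > t2}   # filter count > T2
    return sorted(busy & varied)    # join on source IP

# Hypothetical window of traffic.
window = [
    ("18.31.0.89", "http"),
    ("18.31.0.89", "dns"),
    ("18.31.0.89", "ssh"),
    ("10.0.0.1", "http"),
]
flagged = suspicious_sources(window, t1=2, t2=2)
```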
[Figure: operators shown: input (unit#, x, y); Lookup (unit#), producing (unit#, x, y, enemy?); filter on (x, y) across line and enemy; Count with Window = 1 min; filter on Count > 3.]

Use Case: Battalion Monitoring
  • Client Scenario
    • Government contractor required filtering of data and reports from reconnaissance aircraft of friendly and enemy activity
    • Determine positions of friendly vs enemy troops, tanks, aircraft in real-time
  • Solution
    • Critical alerting established to pinpoint any/every enemy movement

Example of combat military monitoring of friendly and enemy forces in real-time with StreamBase
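The operators in this use case (lookup, filter, one-minute count, threshold) can be approximated with the Python sketch below. It assumes time-ordered reports of the form `(ts_seconds, unit, x, y)`, treats "across line" as `x > line_x`, and counts crossing reports rather than distinct units; those choices, like all the names, are illustrative and not taken from the slides.

```python
from collections import deque

def crossing_alerts(reports, enemy_units, line_x, window_s=60, threshold=3):
    """Return timestamps at which more than `threshold` enemy crossings
    were reported within the last `window_s` seconds."""
    recent = deque()  # timestamps of recent enemy crossing reports
    alerts = []
    for ts, unit, x, y in reports:
        # Lookup + filter: keep only enemy units that are across the line.
        if unit not in enemy_units or x <= line_x:
            continue
        recent.append(ts)
        # Evict reports that have fallen out of the time window.
        while recent[0] <= ts - window_s:
            recent.popleft()
        if len(recent) > threshold:
            alerts.append(ts)
    return alerts

# Hypothetical reports: four enemy crossings within a minute, then a late one.
enemy = {"E1", "E2", "E3", "E4"}
reports = [
    (0, "E1", 1.0, 0.0),
    (10, "E2", 1.0, 0.0),
    (20, "E3", 1.0, 0.0),
    (30, "E4", 1.0, 0.0),
    (40, "F1", 1.0, 0.0),   # friendly unit, ignored
    (100, "E1", 2.0, 0.0),  # earlier crossings have expired by now
]
fired = crossing_alerts(reports, enemy, line_x=0.0)
```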

CEP/Stream Processing Marketplace
  • The high end
    • ~100K messages/second
    • ~1 msec latency
  • Anything will work at the low end
    • 1 message/day: Use pencil & paper
    • 1 message/hour: Use spreadsheet
    • 1 message/minute: Use favorite app server, RDBMS and/or enterprise middleware

[Figure: Processing Complexity (simple events to complex events) plotted against Processing Speed (human speed, seconds to minutes, to machine speed, msec). Regions labeled: Stream Processing Engines (StreamSQL); Conventional Architectures.]

Technical Approaches to CEP
  • Custom code
    • Almost everybody does this today
    • Nobody wants to continue to do this going forward
    • Replacing this with commercial off-the-shelf (COTS) infrastructure will fuel an explosion in exploitation of increasingly ubiquitous real-time data
  • Your favorite rule engine
  • StreamSQL stream processing engine
Rule 1: Keep the Data Moving

To achieve low latency, process data in-stream, without first storing and retrieving it

[Figure: In-stream Processing versus Traditional Data Processing. In-stream: event data passes through the StreamBase application and produces alerts and actions. Traditional: updates go to memory and disk, and queries are run against them.]

  • Low latency
    • No waiting
    • Results delivered in-flight
Rule 2: Query Paradigm (StreamSQL)

What is StreamSQL?

  • StreamSQL extends conventional SQL with time windows for key functions (e.g. joining, querying, aggregating data)
      • Streams do not have “end of table”
      • Optimal approach for unifying processing of real-time and stored data
  • SQL is a good paradigm
      • For analytics
      • And filtering
      • “Gold standard” for stored data

Use querying mechanism to find output events of interest or compute analytics on real-time and historical data

[Figure: data values plotted by arrival time, 3:01.00 through 3:09.50.]

StreamSQL Programming Paradigm
  • Time window-based computations, statistics
  • Extensibility
    • User-defined functions and aggregates
    • Custom Java or C++ operators
    • Modules for reusability
  • Stores state
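To make "time window-based computations" concrete, here is a minimal Python sketch of a sliding time-window average. It is not StreamSQL and does not reflect how StreamBase implements windows; the class name, the use of seconds for timestamps, and the sample values are assumptions for illustration.

```python
from collections import deque

class SlidingWindowAverage:
    """Average of the values that arrived within the last `width_s` seconds."""

    def __init__(self, width_s):
        self.width_s = width_s
        self.items = deque()  # (timestamp, value) pairs currently in the window
        self.total = 0.0

    def add(self, ts, value):
        """Add one event (timestamps must be non-decreasing) and return the current average."""
        self.items.append((ts, value))
        self.total += value
        # Evict events older than the window; the newest event always stays.
        while self.items[0][0] <= ts - self.width_s:
            _, old = self.items.popleft()
            self.total -= old
        return self.total / len(self.items)

# Hypothetical one-minute window over three events.
w = SlidingWindowAverage(60)
a1 = w.add(0, 10.0)
a2 = w.add(30, 20.0)
a3 = w.add(70, 30.0)  # the event at t=0 has left the window
```

Because a stream has no "end of table", the aggregate is reported per event (or per window) instead of once at the end.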
Integrating Real Time and Stored State

Produce the split-adjusted price of every security in a feed over several days (stock can split more than once)

Two feeds:

Tick (symbol, price, volume, date, time)

Splits (symbol, date, time, split_factor)

StreamSQL solution for Real-Time and Stored Data

Stored table: Store (symbol, factor)

Feeds: Tick and Split

_________________________________________

UPDATE Store
(SET factor = factor * S.split_factor)
FROM Split S
WHERE symbol = S.symbol

SELECT T.symbol, price = T.price * S.factor,
       T.volume, T.date, T.time
FROM Tick T, Store S
WHERE S.symbol = T.symbol

(Each statement mixes a stream and a table.)

StreamSQL Solution

...or a four-box application in the StreamBase GUI

Some programmers prefer textual notation; some prefer GUI. Take your pick.

[Figure: four-box application. Tick (symbol, price, volume, date, time) reads Store (Symbol, Factor) to compute T.price * S.factor; Splits (symbol, date, time, split_factor) writes Store with factor * S.split_factor.]

Characteristics of Example
  • Storage of (perhaps lots of) state
  • Decision making based on a mix of stored state and real time computation

StreamSQL has a single programming paradigm for both kinds of data.

Not necessarily true for other technical approaches.
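The split-adjustment example can be simulated in a few lines of Python to show the interplay of stored state and real-time data. The dictionary `store` plays the role of the Store table; the event tuples, the function name, and the default factor of 1.0 for symbols not yet in the table are assumptions made for this sketch, which follows the slide's formulas (`factor * split_factor` and `price * factor`) as written.

```python
def split_adjusted(events):
    """events: time-ordered ("split", symbol, split_factor)
    or ("tick", symbol, price, volume) tuples."""
    store = {}  # stands in for the stored table Store(symbol, factor)
    out = []
    for event in events:
        if event[0] == "split":
            # Write path: UPDATE Store SET factor = factor * split_factor
            _, symbol, split_factor = event
            store[symbol] = store.get(symbol, 1.0) * split_factor
        else:
            # Read path: SELECT price = T.price * S.factor
            _, symbol, price, volume = event
            out.append((symbol, price * store.get(symbol, 1.0), volume))
    return out

# Hypothetical feed: a stock that splits 2-for-1 twice.
feed = [
    ("tick", "IBM", 40.0, 100),
    ("split", "IBM", 2.0),
    ("tick", "IBM", 20.0, 100),
    ("split", "IBM", 2.0),
    ("tick", "IBM", 10.0, 50),
]
adjusted = split_adjusted(feed)
```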

What About Pattern-Matching?
  • Example: Find IBM ticks over 80 followed by at least two ticks under 80.

CREATE STREAM TickTriples AS
SELECT symbol, T1.price AS price1, T2.price AS price2,
       T3.price AS price3
FROM Ticks T1 -> Ticks T2 -> Ticks T3
WHERE T1.symbol = T2.symbol AND T2.symbol = T3.symbol;

SELECT *
FROM TickTriples
WHERE price1 > 80 AND price2 < 80 AND price3 < 80 AND
      symbol = "IBM";

Regular expression (pattern matching) is the same in any technology!!!
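For comparison, the same pattern can be written as a small Python scan. This sketch assumes that `->` means "immediately followed by" among ticks of the same symbol, which the slides do not spell out; the function name and the tick layout `(symbol, price)` are likewise illustrative.

```python
def over_then_two_under(ticks, symbol="IBM", level=80):
    """ticks: time-ordered (symbol, price) tuples.

    Returns (price1, price2, price3) for each run of three consecutive ticks
    of `symbol` where the first is above `level` and the next two are below it.
    """
    last = []     # the most recent (up to three) prices for `symbol`
    matches = []
    for sym, price in ticks:
        if sym != symbol:
            continue
        last = (last + [price])[-3:]
        if len(last) == 3 and last[0] > level and last[1] < level and last[2] < level:
            matches.append(tuple(last))
    return matches

# Hypothetical tick stream with an unrelated symbol interleaved.
ticks = [("IBM", 81), ("MSFT", 25), ("IBM", 79), ("IBM", 78), ("IBM", 77)]
triples = over_then_two_under(ticks)
```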

Performance – StreamSQL
  • Internal query plan (think of it as our graphical workflow notation)
  • For any event, we know exactly what processing happens next
  • As a result, we can optimize the plan
StreamSQL Advantages
  • Superior performance
  • Easy programmability (and maintainability)
  • One notation for real-time and stored data
  • Includes regular expression evaluation
  • Closer to basis for standardization
    • FROM clause can mix stored tables and streams
    • Add time windows to SQL
    • Add stream disorder to SQL
Rule 3: Handle Delayed, Missing, & Out-of-Order Data
  • Ability to time-out individual calculations or computations
  • Ability to merge streams and plug gaps from one with valid value in another
  • Bounded sort operation (BSORT)
  • Outer-join

Make provision for handling data which is late or delayed, missing, or out-of-sequence
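One way to picture a bounded sort is a fixed-size reorder buffer. The Python sketch below is a generic illustration of that idea, not the StreamBase BSORT operator itself; `capacity`, the `(ts, payload)` event layout, and the function name are assumptions. An event that arrives more than `capacity` positions late would still come out of order.

```python
import heapq

def bounded_sort(events, capacity):
    """Reorder (ts, payload) events using a buffer of at most `capacity` items."""
    heap = []
    for seq, (ts, payload) in enumerate(events):
        # `seq` breaks ties so payloads are never compared directly.
        heapq.heappush(heap, (ts, seq, payload))
        if len(heap) > capacity:
            ts_out, _, payload_out = heapq.heappop(heap)
            yield ts_out, payload_out
    # End of input: flush whatever is still buffered, in timestamp order.
    while heap:
        ts_out, _, payload_out = heapq.heappop(heap)
        yield ts_out, payload_out

# Hypothetical stream where the event with ts=2 arrives one position late.
ordered = list(bounded_sort([(1, "a"), (3, "c"), (2, "b"), (4, "d")], capacity=2))
```

The buffer size trades latency for tolerance: a larger buffer repairs more disorder but holds each event longer before emitting it.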

Rule 4: Generate Predictable Outcomes
  • Two distinct runs of the system with the same input should yield the same output (deterministic).
  • Ensure calculations performed on one time-series record do not interfere with calculations performed on another

Process time-series records (tuples) in a consistent manner

[Figure labels: Real-time Feeds; Embedded local storage; Remote process; Data store; Alerts, Actions.]

Rule 5: Process Streaming or Stored Data

Store and access current or historical state information, preferably using a familiar standard such as SQL

  • Interfaces:
    • Embedded in-process DB for low latency, low overhead
    • Standards such as ODBC, JDBC to external databases
  • Ability to test trading algorithms on historical data, then switch seamlessly to live feed
If a failure occurs (hardware, operating system, software), the streaming application must fail over to a back-up and keep running

[Figure: Primary and Secondary servers, each shown with Market Data input, a Checkpoint, and Alerts and Actions output.]

Rule 6: Guarantee Data Safety & Availability
  • Restarting and recovering from a log for real-time processing is not practical.
  • Better idea: A tandem-style approach for streaming data
Rule 7: Partition & Scale Automatically
  • Easily split application without custom-coding
  • Multi-threading:
    • To utilize multi-CPU (Multi-core) hardware
    • Avoid blocking for external events and maintain low latency

Split an application over multiple processors or machines for scalability, without developer having to write low-level code
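A common way to split a stream across processors is to hash a key such as the stock symbol, so that all events for one key land on the same partition and keep their relative order. The Python sketch below illustrates that general technique; it is not a description of how StreamBase partitions applications, and the function names and event layout are assumptions.

```python
import zlib

def partition_for(key, n_partitions):
    """Map a string key to a partition index with a stable hash (CRC-32)."""
    return zlib.crc32(key.encode("utf-8")) % n_partitions

def split_by_key(events, n_partitions):
    """Distribute (key, payload) events into `n_partitions` ordered lists."""
    parts = [[] for _ in range(n_partitions)]
    for key, payload in events:
        parts[partition_for(key, n_partitions)].append((key, payload))
    return parts

# Hypothetical events keyed by symbol.
events = [("IBM", 1), ("MSFT", 2), ("IBM", 3)]
parts = split_by_key(events, 4)
ibm_part = parts[partition_for("IBM", 4)]
```

CRC-32 is used here instead of Python's built-in `hash` because the latter is randomized per process for strings, which would break agreement between machines.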

Rule 8: Process & Respond Instantaneously
  • Ensure high availability, stored/real-time processing, handling stream imperfections all work concurrently with low latency
  • Test rigorously with simulated and live feeds
  • Monitor latency and processing speed in messages/second

Run all 7 rules in-process at tens to hundreds of thousands of messages/second with low latency

[Figure labels: Client Applications; Input Stream; Output Stream; Messaging/Transport System; StreamBase Application; StreamBase Server; Operating System; Hardware.]

Stream Processing Engine Architecture

The StreamBase Server

Infrastructure Capabilities:

  • 10k-500K+ msgs/sec
  • High availability
  • 64 bit addressing
  • Supports clusters & blade configurations via application & data partitioning

Functional Capabilities:

  • Implements StreamSQL
  • Multi-threaded with real-time scheduling
  • Multiple options for managing stored data
  • Insertion of custom logic & analytics to the data stream
  • Adapters to external data sources & messaging systems

Integrated environment for building, testing, deploying

Integrated Development Environment

  • Eclipse-based IDE
  • Drag-and-connect with workflow orientation
  • Built-in load simulation for easy testing
  • Stream Record/Playback
  • Custom C++ or Java operators
  • Debugger & performance monitor
Future Directions for the Community
  • Standard vocabulary and vernacular:
    • E.g. “events,” “CEP,” “stream processing,” “pattern-matching”
  • Education and visibility around category:
    • Analyst reports
    • Broader market education
  • Technical standards:
    • Benchmarks: Performance, scalability
    • Languages: StreamSQL or extended SQL
  • Research:
    • Approximation
    • Distributed processing
    • Self-adaptive
    • Sensor applications
    • Scientific applications

London Office: 107-111 Fleet Street, London EC4A 2AB, United Kingdom. +44 (0)20 7936 9050

Corporate Headquarters: 181 Spring Street, Lexington, Massachusetts 02421. +1 866 STRMBAS, +1 866 787 6227, +1 781 761 0800

Reston, Virginia Office: 11921 Freedom Drive, Suite 550, Reston, VA 20190. +1 703 608 6958

New York City Office: 220 West 42nd Street, 20th Floor, New York, New York 10036. +1 866 STRMBAS, +1 866 787 6227

Enterprise-class stream processing software designed to transform real-time complex events into actionable intelligence

Thank You