StreamBase Systems

Stream Processing Overview

Dr. Stan Zdonik, Co-Founder

March 14, 2006

Agenda

  • Problem Space and Landscape
  • Case Scenarios
  • Technical Approaches to CEP
  • What is required of a Stream Processing Engine
    • Emphasis on StreamSQL
  • Future Directions for the Community



StreamBase at a Glance

  • Founded in 2003 by Dr. Mike Stonebraker (Ingres, Illustra)
  • Initial research prototype at MIT, Brown, & Brandeis (2001).
  • Boston-based company, with offices in NY, Washington, DC, & Europe
  • Financial backing by tier-one venture capital firms
  • Solid, growing customer base
  • Goal: do for real-time data what relational databases and SQL do for stored data

Use Case: Running VWAP

  • Scenario:
    • Every minute for every stock I am trading:
      • Calculate VWAP (volume-weighted average price) for my trades & all trades
      • Alert whenever my personal trading execution is inferior to the market
  • Solution:
    • 5 StreamBase operators, 30 min to build
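
To make the scenario concrete, here is a minimal Python sketch of the same calculation. It is not the five-operator StreamBase application; it works on a finite list rather than a live feed, and it assumes each trade is a hypothetical tuple (minute, symbol, price, volume, is_mine) and that "inferior" means buying at a higher VWAP than the market.

```python
from collections import defaultdict


def vwap(pv_sum, vol_sum):
    """Volume-weighted average price, or None when no volume traded."""
    return pv_sum / vol_sum if vol_sum else None


def vwap_alerts(trades):
    """Return (minute, symbol, my_vwap, market_vwap) where my VWAP exceeds the market's.

    Assumption: trades are (minute, symbol, price, volume, is_mine) tuples and
    the trader is buying, so a higher VWAP than the market is "inferior".
    """
    # Per (minute, symbol): [all price*volume, all volume, my price*volume, my volume]
    acc = defaultdict(lambda: [0.0, 0, 0.0, 0])
    for minute, symbol, price, volume, is_mine in trades:
        a = acc[(minute, symbol)]
        a[0] += price * volume
        a[1] += volume
        if is_mine:
            a[2] += price * volume
            a[3] += volume

    alerts = []
    for (minute, symbol), (all_pv, all_vol, my_pv, my_vol) in sorted(acc.items()):
        market, mine = vwap(all_pv, all_vol), vwap(my_pv, my_vol)
        if mine is not None and mine > market:
            alerts.append((minute, symbol, mine, market))
    return alerts
```

A streaming engine would emit each minute's result as the window closes instead of collecting everything first; the grouping and comparison logic is the same.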

Use Case: Intrusion Detection

[Figure: Example of IP intrusion detection with StreamBase. Operator labels: group by IP prefix (sum > T1); group by source IP (count > T1); group by source IP (count > T2); group by source IP, count distinct protocol (http, dns, ssh).]

  • Client Scenario
    • Need to identify unusual patterns in IP connections
  • Solution
    • Implement sophisticated filtering & monitoring to drive real-time alerting
    • Immediate termination of suspicious user access
  • Delivery
    • Process, analyze, & act on 50k msgs/sec
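
One of the checks in this use case, counting distinct protocols per source IP, can be sketched in Python as follows. This is an illustrative assumption about how such a check might look, not the StreamBase application itself: the tuple layout (timestamp, source_ip, protocol), the tumbling 60-second window, and the `max_protocols` threshold are all hypothetical.

```python
from collections import defaultdict


def suspicious_sources(connections, window_secs=60, max_protocols=2):
    """Return sorted (window_start, source_ip) pairs that used more than
    max_protocols distinct protocols within one tumbling window.

    Assumption: connections are (timestamp_secs, source_ip, protocol) tuples.
    """
    protocols = defaultdict(set)
    for ts, source_ip, protocol in connections:
        # Assign each connection to the tumbling window that contains it.
        window_start = int(ts // window_secs) * window_secs
        protocols[(window_start, source_ip)].add(protocol)
    return sorted(key for key, seen in protocols.items() if len(seen) > max_protocols)
```

A source that touches http, dns, and ssh inside one window is flagged, while a source that spreads two protocols over two windows is not.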
Use Case: Battalion Monitoring

[Figure: battalion monitoring flow. Labels: (unit#, x, y); Lookup (unit#); (unit#, x, y, enemy?); (x, y) across line and enemy; Window = 1 min; Count > 3.]
  • Client Scenario
    • A government contractor needed to filter data and reports from reconnaissance aircraft on friendly and enemy activity
    • Determine positions of friendly vs. enemy troops, tanks, and aircraft in real time
  • Solution
    • Critical alerting established to pinpoint any/every enemy movement

Example of combat military monitoring of friendly and enemy forces in real-time with StreamBase
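
The labels in this use case (look up whether a unit is enemy, test whether its position is across a line, count within a one-minute window, alert when the count exceeds 3) suggest a pipeline like the Python sketch below. The details are assumptions for illustration: reports are (timestamp_secs, unit, x, y) tuples in time order, "across the line" is modeled as x > line_x, and the lookup table is a plain set of enemy unit numbers.

```python
from collections import deque


def line_crossing_alerts(reports, enemy_units, line_x, window_secs=60, threshold=3):
    """Return (timestamp, count) whenever more than `threshold` enemy sightings
    across the line fall inside the trailing window.

    Assumptions: reports arrive in timestamp order; "across the line" means x > line_x.
    """
    recent = deque()  # timestamps of qualifying sightings still inside the window
    alerts = []
    for ts, unit, x, y in reports:
        # Lookup + filter: keep only enemy units that are across the line.
        if unit not in enemy_units or x <= line_x:
            continue
        recent.append(ts)
        # Evict sightings that have slid out of the window.
        while recent[0] <= ts - window_secs:
            recent.popleft()
        if len(recent) > threshold:
            alerts.append((ts, len(recent)))
    return alerts
```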

CEP/Stream Processing Marketplace
  • The high end
    • ~100K messages/second
    • ~1 msec latency
  • Anything will work at the low end
    • 1 message/day: Use pencil & paper
    • 1 message/hour: Use spreadsheet
    • 1 message/minute: Use favorite app server, RDBMS and/or enterprise middleware

[Figure: chart of Processing Complexity against Processing Speed, ranging from human speed (seconds to minutes) to machine speed. Labeled region: Stream Processing Engines.]

Technical Approaches to CEP
  • Custom code
    • Almost everybody does this today
    • Nobody wants to continue to do this going forward
    • Replacing this with commercial off-the-shelf (COTS) infrastructure will fuel an explosion in exploitation of increasingly ubiquitous real-time data
  • Your favorite rule engine
  • StreamSQL stream processing engine
Rule 1: Keep the Data Moving

To achieve low latency, process data in-stream, without first storing and retrieving it

[Figure: In-stream Processing compared with Traditional Data Processing. Labels: Event Data; StreamBase Application; Alerts, Actions.]


  • Low latency
    • No waiting
    • Results delivered in-flight
Rule 2: Query Paradigm (StreamSQL)

What is StreamSQL?

  • StreamSQL extends conventional SQL with time windows for key functions (e.g. joining, querying, aggregating data)
      • Streams do not have “end of table”
      • Optimal approach for unifying processing of real-time and stored data
  • SQL is a good paradigm
      • For analytics
      • And filtering
      • “Gold standard” for stored data

Use querying mechanism to find output events of interest or compute analytics on real-time and historical data

StreamSQL Programming Paradigm

[Figure: timeline of data values plotted by arrival time, from 3:01.00 to 3:09.50.]

  • Time window-based computations, statistics
  • Extensibility
    • User-defined functions and aggregates
    • Custom Java or C++ operators
    • Modules for reusability
  • Stores state
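
As an illustration of a time window-based computation, the Python sketch below keeps a running average over the trailing N seconds of a stream. It is a generic sliding-window aggregate, not StreamSQL syntax or a StreamBase API; the class name and the rule that a tuple expires once it is `window_secs` or more older than the newest tuple are assumptions.

```python
from collections import deque


class SlidingWindowAverage:
    """Average of the values that arrived within the trailing window_secs seconds."""

    def __init__(self, window_secs):
        self.window_secs = window_secs
        self.items = deque()  # (timestamp, value) pairs currently in the window
        self.total = 0.0

    def add(self, ts, value):
        """Add a tuple (assumed to arrive in timestamp order) and return the new average."""
        self.items.append((ts, value))
        self.total += value
        # Expire tuples that are window_secs or more older than the newest one.
        while self.items[0][0] <= ts - self.window_secs:
            _, old = self.items.popleft()
            self.total -= old
        return self.total / len(self.items)
```

Because a stream has no "end of table", the aggregate is re-emitted on every arrival rather than once at the end of a scan.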
Integrating Real-Time and Stored State

Produce the split-adjusted price of every security in a feed over several days (a stock can split more than once)

Two feeds:

Tick (symbol, price, volume, date, time)

Splits (symbol, date, time, split_factor)

StreamSQL solution for Real-Time and Stored Data

Stored table: Store (symbol, factor)

Feeds: Tick and Splits

Stream and Table (for each Splits tuple S, update the matching row of Store):

SET factor = factor * S.split_factor
WHERE symbol = S.symbol

Stream and Table (join each Tick tuple T with its row in Store):

SELECT T.symbol, price = T.price * S.factor, T.volume, T.time
FROM Tick T, Store S
WHERE S.symbol = T.symbol
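
The same logic can be sketched in Python to show how the stored table and the two feeds interact. This is an illustration under stated assumptions rather than StreamBase code: the two feeds are merged into one time-ordered list of hypothetical ("split", symbol, split_factor) and ("tick", symbol, price, volume) events, and a symbol with no row in the table is assumed to have a factor of 1.0.

```python
def split_adjusted(events):
    """Return (symbol, adjusted_price, volume) for every tick in a merged event list.

    Assumption: events are time-ordered ("split", symbol, split_factor) or
    ("tick", symbol, price, volume) tuples.
    """
    store = {}  # stored table Store: symbol -> cumulative split factor
    out = []
    for event in events:
        if event[0] == "split":
            _, symbol, split_factor = event
            # Mirrors: SET factor = factor * S.split_factor WHERE symbol = S.symbol
            store[symbol] = store.get(symbol, 1.0) * split_factor
        else:
            _, symbol, price, volume = event
            # Mirrors: price = T.price * S.factor, joined on symbol
            out.append((symbol, price * store.get(symbol, 1.0), volume))
    return out
```

Because the factor is multiplied on every split, a stock that splits more than once is handled without extra logic.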

StreamSQL Solution

...or a four-box application in the StreamBase GUI

Some programmers prefer textual notation; some prefer a GUI. Take your pick.

[Figure: four-box StreamBase application. Labels: Tick (symbol, price, volume, date, time); T.price * S.factor; (Symbol, Factor); Splits (symbol, date, time, split_factor); factor * S.split_factor.]


Characteristics of Example
  • Storage of (perhaps lots of) state
  • Decision making based on a mix of stored state and real time computation

StreamSQL has a single programming paradigm for both kinds of data.

Not necessarily true for other technical approaches.

What About Pattern-Matching?
  • Example: Find IBM ticks over 80 followed by at least two ticks under 80.


SELECT symbol, T1.price AS price1, T2.price AS price2, T3.price AS price3
FROM Ticks T1 -> Ticks T2 -> Ticks T3
WHERE T1.symbol = T2.symbol AND T2.symbol = T3.symbol;

FROM TickTriples
WHERE price1 > 80 AND price2 < 80 AND price3 < 80 AND symbol = "IBM";

Regular expression (pattern matching) is the same in any technology!!!
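
For comparison, here is a minimal Python sketch of the same pattern. It assumes that `->` means "immediately followed by the next tick for the same symbol", which the slide does not spell out, and it uses a hypothetical (symbol, price) tuple layout. Like the query above, it matches three consecutive ticks.

```python
from collections import defaultdict, deque


def tick_triples(ticks, symbol="IBM", level=80):
    """Return (symbol, price1, price2, price3) for each run of three consecutive
    ticks of `symbol` where the first is above `level` and the next two are below.

    Assumption: ticks are (symbol, price) tuples in arrival order.
    """
    last = defaultdict(lambda: deque(maxlen=3))  # last three prices per symbol
    matches = []
    for sym, price in ticks:
        window = last[sym]
        window.append(price)
        if sym == symbol and len(window) == 3:
            p1, p2, p3 = window
            if p1 > level and p2 < level and p3 < level:
                matches.append((sym, p1, p2, p3))
    return matches
```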

Performance – StreamSQL
  • Internal query plan (think of it as our graphical workflow notation)
  • For any event, we know exactly what processing happens next
  • As a result, we can optimize the plan
StreamSQL Advantages
  • Superior performance
  • Easy programmability (and maintainability)
  • One notation for real-time and stored data
  • Includes regular expression evaluation
  • Closer to basis for standardization
    • FROM clause can mix stored tables and streams
    • Add time windows to SQL
    • Add stream disorder to SQL
Rule 3: Handle Delayed, Missing, & Out-of-Order Data
  • Ability to time-out individual calculations or computations
  • Ability to merge streams and plug gaps from one with valid value in another
  • Bounded sort operation (BSORT)
  • Outer-join

Make provision for handling data that is late or delayed, missing, or out of sequence
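
A bounded sort can be illustrated with a small reorder buffer. The Python sketch below is a generic technique built on a min-heap, offered as an assumption about what a bounded sort does; it is not a description of how StreamBase's BSORT operator is implemented, and `buffer_size` is a hypothetical parameter.

```python
import heapq


def bounded_sort(tuples, buffer_size):
    """Yield tuples in sorted order as long as no tuple arrives more than
    buffer_size positions late. Memory use is bounded by buffer_size."""
    heap = []
    for item in tuples:
        heapq.heappush(heap, item)
        if len(heap) > buffer_size:
            # The buffer is full: release the smallest tuple seen so far.
            yield heapq.heappop(heap)
    # End of input (for a finite test stream): drain what is left in order.
    while heap:
        yield heapq.heappop(heap)
```

The buffer size is the trade-off: a larger buffer tolerates more disorder but delays every output tuple, and a tuple that arrives later than the buffer allows is emitted out of order.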

Rule 4: Generate Predictable Outcomes
  • Two distinct runs of the system with the same input should yield the same output (deterministic).
  • Ensure calculations performed on one time-series record do not interfere with calculations done on another

Process time-series records (tuples) in a consistent manner

Rule 5: Process Streaming or Stored Data

[Figure: stored-data options. Labels: Real-time Feeds; Alerts, Actions; Remote process; Embedded local storage; Data store.]

Store and access current or historical state information, preferably using a familiar standard such as SQL

  • Interfaces:
    • Embedded in-process DB for low latency, low overhead
    • Standards such as ODBC, JDBC to external databases
  • Ability to test trading algorithms on historical data, then switch seamlessly to live feed
Rule 6: Guarantee Data Safety & Availability

If a failure occurs (hardware, operating system, software), the streaming application must fail over to a back-up and keep running

  • Restarting and recovering from a log for real-time processing is not practical.
  • Better idea: A tandem-style approach for streaming data
Rule 7: Partition & Scale Automatically
  • Easily split application without custom-coding
  • Multi-threading:
    • To utilize multi-CPU (Multi-core) hardware
    • Avoid blocking for external events and maintain low latency

Split an application over multiple processors or machines for scalability, without the developer having to write low-level code
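
Key-based partitioning is one common way to split a stream across processors. The Python sketch below illustrates that idea only; it is an assumption for illustration, not how StreamBase partitions applications. Tuples are hypothetical (symbol, payload) pairs, and `zlib.crc32` is used instead of the built-in `hash` so that a key maps to the same partition on every run.

```python
import zlib


def partition_for(key, num_partitions):
    """Map a string key to a partition index that is stable across runs."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_stream(tuples, num_partitions):
    """Split (symbol, payload) tuples into per-partition lists, keeping all
    tuples for one symbol in the same partition and in arrival order."""
    parts = [[] for _ in range(num_partitions)]
    for symbol, payload in tuples:
        parts[partition_for(symbol, num_partitions)].append((symbol, payload))
    return parts
```

Keeping every tuple for a given key on one partition preserves per-key ordering, so windowed computations for that key still see a consistent sequence.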

Rule 8: Process & Respond Instantaneously
  • Ensure that high availability, stored/real-time processing, and handling of stream imperfections all work concurrently with low latency
  • Test rigorously—simulated and live feeds
  • Monitor latency and processing speed in messages/second

Run all 7 rules in-process at tens to hundreds of thousands of messages/second with low latency


Stream Processing Engine Architecture

[Figure: layered architecture, shown twice. Labels: Client Applications; Input Stream; Output Stream; StreamBase Application; Messaging/Transport System; StreamBase Server; Operating System.]

The StreamBase Server

Infrastructure Capabilities:

  • 10K-500K+ msgs/sec
  • High availability
  • 64-bit addressing
  • Supports clusters & blade configurations via application & data partitioning

Functional Capabilities:

  • Implements StreamSQL
  • Multi-threaded with real-time scheduling
  • Multiple options for managing stored data
  • Insertion of custom logic & analytics to the data stream
  • Adapters to external data sources & messaging systems

Integrated environment for building, testing, deploying

Integrated Development Environment

  • Eclipse-based IDE
  • Drag-and-connect with workflow orientation
  • Built-in load simulation for easy testing
  • Stream Record/Playback
  • Custom C++ or Java operators
  • Debugger & performance monitor
Future Directions for the Community
  • Standard vocabulary and vernacular:
    • E.g. “events,” “CEP,” “stream processing,” “pattern-matching”
  • Education and visibility around category:
    • Analyst reports
    • Broader market education
  • Technical standards:
    • Benchmarks: Performance, scalability
    • Languages: StreamSQL or extended SQL
  • Research:
    • Approximation
    • Distributed processing
    • Self-adaptive
    • Sensor applications
    • Scientific applications
Thank You

Enterprise-class stream processing software designed to transform real-time complex events into actionable intelligence

Corporate Headquarters: 181 Spring Street, Lexington, Massachusetts 02421; +1 866 STRMBAS (+1 866 787 6227); +1 781 761 0800

New York City Office: 220 West 42nd Street, 20th Floor, New York, New York 10036; +1 866 STRMBAS (+1 866 787 6227)

Reston, Virginia Office: 11921 Freedom Drive, Suite 550, Reston, VA 20190; +1 703 608 6958

London Office: 107-111 Fleet Street, London EC4A 2AB, United Kingdom; +44 (0)20 7936 9050