Applying Control Theory to Stream Processing Systems


Wei Xu (xuw@cs.berkeley.edu)

Bill Kramer (kramer@lbl.gov)

Joe Hellerstein (hellers@us.ibm.com)

TCQ drops tuples silently if result queue is full

[Diagram: Data Source, Input Buffer, TCQ (complex internal structure)]

- Data source does not provide an accurate data rate

- TCQ node drops tuples when the result queue fills up

[Diagram: Source, Buffer, TCQ, Result Q]

- Provide an accurate data source
- Get the actual data rate

- Regulate queue length on the TCQ node
- Prevent dropping tuples
- Maximize throughput (and adapt when disturbances happen)

[Block diagram: Queue Length Monitor, Controller, Controlled Data Source, Output Rate]

PI Controller

P Controller

P Controller with Pre-compensation

PI Controller
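The P and PI controllers named above can be sketched in a few lines. This is a minimal illustration of the textbook discrete-time control laws, not the implementation used in this work. The gains (kp, ki), the plant parameters (a, b), and the reference value are hypothetical, and the sketch ignores actuator limits such as a data rate that cannot go negative.

```python
from dataclasses import dataclass


def p_control(kp: float, reference: float, measured: float) -> float:
    """Proportional control: u(k) = kp * e(k), with e(k) = reference - measured."""
    return kp * (reference - measured)


@dataclass
class PIController:
    """Proportional-integral control: u(k) = kp * e(k) + ki * sum of e(0..k)."""
    kp: float
    ki: float
    integral: float = 0.0  # running sum of errors

    def update(self, reference: float, measured: float) -> float:
        error = reference - measured
        self.integral += error
        return self.kp * error + self.ki * self.integral


def simulate(controller_step, a=0.5, b=1.0, reference=100.0, steps=200):
    """Run a controller against the assumed plant y(k+1) = a*y(k) + b*u(k).

    a, b and reference are illustrative values, not measurements.
    Returns the final output y (for example, a queue length).
    """
    y = 0.0
    for _ in range(steps):
        u = controller_step(reference, y)
        y = a * y + b * u
    return y


if __name__ == "__main__":
    y_p = simulate(lambda r, y: p_control(0.3, r, y))
    y_pi = simulate(PIController(kp=0.3, ki=0.2).update)
    print(f"P controller settles near {y_p:.2f}")
    print(f"PI controller settles near {y_pi:.2f}")
```

With these illustrative numbers the P controller settles well below the reference, while the integral term of the PI controller removes the steady-state error. Pre-compensation is another way to address that offset for a P controller.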


- One of my implementations... What happened?

[Block diagram: Source, Buffer, TCQ, Result Q; Controlled Output Thread (code reuse); Queue Length Controller; signals: Desired Queue Length, Data Rate to TCQ, Actual Queue Length]

[Plot: output y from simulation; axes: queue length vs. time]

Model evaluation – Making the system operate in the desired range

[Plot: data rate vs. free space, with the non-linear range marked]

Easy for data source, but queue length ..

Many small disturbances in a Java program

Incremental garbage collection

P Controller

PI Controller

- Advantages of feedback control
- Make system more robust under disturbance
- Treat complex systems as black boxes
- Cope with the system characteristics instead of having to change them

- Encourage reporting system statistics
- Implementation is easy and has theoretical guarantees

- Load balancer
- Smaller sample time to reduce disturbance caused by Java GC?
- Controller on scheduling of system shared by multiple streams

- Problems and Motivation
- Controller design
- Result
- Discussion

[Diagram: Data Source, Input Buffer (tuple blocks), Load Splitter (routing logic), TCQ Nodes; signals: tuples, queue length]

- Operation of Load Splitter
- Arriving blocks wait in Input Buffer
- Tuples are routed to balance TCQ queue lengths
- Stop routing if queue length is too large to avoid tuple discards
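
The three steps above can be sketched as a small routing loop. This is a hedged illustration only: the shortest-queue policy, the function name route_tuples, and the max_queue_length threshold are assumptions made for the example, and the slides do not say how the Load Splitter actually chooses a node.

```python
from collections import deque


def route_tuples(input_buffer: deque, queue_lengths: list, max_queue_length: int) -> list:
    """Route waiting tuples to the TCQ node with the shortest queue.

    input_buffer:      tuples waiting in the Input Buffer (consumed from the left)
    queue_lengths:     current queue length of each TCQ node (updated in place)
    max_queue_length:  hypothetical threshold; routing stops once every node
                       has reached it, so no tuple is sent to a full queue

    Returns a list of (tuple, node_index) assignments.
    """
    assignments = []
    while input_buffer:
        # Pick the node with the shortest queue (assumed balancing policy).
        node = min(range(len(queue_lengths)), key=queue_lengths.__getitem__)
        if queue_lengths[node] >= max_queue_length:
            break  # even the shortest queue is too long: leave tuples buffered
        item = input_buffer.popleft()
        queue_lengths[node] += 1
        assignments.append((item, node))
    return assignments


if __name__ == "__main__":
    buffer = deque(range(10))
    lengths = [3, 0]
    routed = route_tuples(buffer, lengths, max_queue_length=5)
    print(len(routed), "tuples routed;", len(buffer), "still waiting; lengths:", lengths)
```

Tuples that are not routed stay in the input buffer rather than being dropped, which is the behavior the last bullet asks for.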

Revised

We know y(k), and we know what we want y(k+1) to be. Use the transfer function to solve for u(k)…
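
As a sketch of that step, assume the first-order model from these slides, y(k+1) = a y(k) + b u(k). Solving for the input gives u(k) = (y_target - a y(k)) / b. The function name and the numbers in the usage lines are illustrative assumptions, not values from the experiments.

```python
def solve_control_input(a: float, b: float, y_now: float, y_target: float) -> float:
    """Return u(k) such that a*y_now + b*u(k) equals y_target.

    Assumes the first-order model y(k+1) = a*y(k) + b*u(k) with b != 0.
    """
    if b == 0:
        raise ValueError("b must be non-zero: the input has no effect on the output")
    return (y_target - a * y_now) / b


if __name__ == "__main__":
    # Illustrative values only.
    a, b = 0.5, 2.0
    y_now, y_target = 10.0, 20.0
    u = solve_control_input(a, b, y_now, y_target)
    print("u(k) =", u)
    print("predicted y(k+1) =", a * y_now + b * u)
```

This one-step inversion relies entirely on the model being accurate, so model error or disturbances show up directly in the output.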

(Expected result: accuracy and disturbance) -- to be done

y(k+1) = a y(k) + b u(k)

Regression
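
One way to obtain a and b by regression is ordinary least squares on logged input/output samples. The sketch below solves the 2x2 normal equations directly. It assumes ordinary least squares, which the slides do not specify; the function name and the synthetic data are made up for illustration.

```python
def fit_first_order(y: list, u: list) -> tuple:
    """Least-squares estimate of (a, b) in y(k+1) = a*y(k) + b*u(k).

    y holds outputs y(0..N), u holds inputs u(0..N-1), so len(y) == len(u) + 1.
    """
    if len(y) != len(u) + 1:
        raise ValueError("expected len(y) == len(u) + 1")
    n = len(u)
    # Sums that make up the normal equations.
    syy = sum(y[k] * y[k] for k in range(n))
    suu = sum(u[k] * u[k] for k in range(n))
    syu = sum(y[k] * u[k] for k in range(n))
    sy1y = sum(y[k + 1] * y[k] for k in range(n))
    sy1u = sum(y[k + 1] * u[k] for k in range(n))
    det = syy * suu - syu * syu
    if det == 0:
        raise ValueError("data does not excite the system enough to identify a and b")
    a = (sy1y * suu - sy1u * syu) / det
    b = (syy * sy1u - syu * sy1y) / det
    return a, b


if __name__ == "__main__":
    # Synthetic data generated from known (made-up) parameters.
    true_a, true_b = 0.8, 0.5
    u = [1.0, 2.0, 0.0, 3.0, 1.0, 4.0]
    y = [0.0]
    for uk in u:
        y.append(true_a * y[-1] + true_b * uk)
    print(fit_first_order(y, u))
```

The input sequence needs to vary during data collection; a constant input can make the normal equations ill-conditioned or singular, leaving a and b poorly determined.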

Model evaluation – A data rate that makes it operate in the linear range