High Performance Algorithms for Multiple Streaming Time Series

High Performance Algorithms for Multiple Streaming Time Series Xiaojian Zhao Advisor: Dennis Shasha Department of Computer Science Courant Institute of Mathematical Sciences New York University Jan. 10 2006

Roadmap: • Motivation • Incremental Uncooperative Time Series Correlation • Incremental Matching Pursuit (MP) (optional) • Future Work and Conclusion

Motivation (1) • Financial time series streams are watched closely by millions of traders. • “Which pairs of stocks were correlated with a value of over 0.9 for the last three hours? Report this information every half hour” (Incremental pairwise correlation) • “How to form a portfolio consisting of a small set of stocks which replicates the market? Update it every hour” (Incremental matching pursuit)

Motivation (2) • As processors speed up, algorithmic efficiency no longer matters … one might think. • True if problem sizes stay same. But they don’t. • As processors speed up, sensors improve • Satellites spewing out more data a day • Magnetic resonance imagers give higher resolution images, etc.

High performance incremental algorithms • Incremental Uncooperative Time Series Correlation • Monitor and report the correlation information among all time series incrementally (e.g. every half hour) • Improve the efficiency from quadratic to super-linear • Incremental Matching Pursuit (MP) • Monitor and report the approximation vectors of matching pursuit incrementally (e.g. every hour) • Improve the efficiency significantly

Incremental Uncooperative Time Series Correlation

Problem statement: • Detect and report the correlation incrementally and rapidly • Extend the algorithm into a general engine • Apply it in practical application domains

Correlated! Correlated! Online detection of high correlation

Pearson correlation and Euclidean distance • Normalized Euclidean distance  Pearson correlation • Normalization • dist2=2(1- correlation) • From now on, we will not differentiate between correlation and Euclidean distance

Naïve approach: pairwise correlation • Given a group of time series, compute the pairwise correlation • Time O(WN2),where • N : number of streams • W: window size (e.g. in the past one hour) Let’s see high performance algorithms!

Technical review Framework: GEMINI Tools: Data Reduction Techniques • Deterministic Orthogonal vs. Randomized • Fourier Transform, Wavelet Transform, and Random Projection Target: Various Data • Cooperative vs. Uncooperative

GEMINI Framework* Data reduction, e.g. DFT, DWT, SVD * Faloutsos, C., Ranganathan, M. & Manolopoulos, Y. (1994). Fast subsequence matching in time-series databases,. SIGMOD, 1994

GEMINI: an example • Objective: find the nearest neighborhood (L2-norm) of each time series. • Compute the Fourier Transform over each of them, e.g. X and Y; yield two coefficient vectors Xf and Yf • Xf=(a1, a2, …ak) and Yf=(b1, b2, …bk) • Original distance vs. coefficient distance (Parseval's Theorem) • Because, for some data types, energy concentrates on first a few frequency components, coefficient distance can work as a very good filter and at the same time guarantee no false negatives • They may be stored in a tree or grid structure

DFT on random walk

Review: DFT/DWT vs. Random Projection Fourier Transform, Wavelet Transform and SVD • A set of orthogonal base (deterministic) • Based on Parseval's Theorem Random Projection • A set of random base (non-deterministic) • Based on Johnson-Lindenstrauss (JL) Lemma Orthogonal Base Random Base

Review: Random Projection: Intuition • You are walking in a sparse forest and you are lost. • You have an outdated cell phone without a GPS (w/o latitude&altitude). • You want to know if you are close to your friend. • You identify yourself at 100 meters from Bestbuy and 200 meters from a silver building etc. • If your friend is at similar distances from several of these landmarks, you might be close to one another. • Random projections are analogous to these distances to landmarks.

Random Projection inner product sketches time series random vector Sketch: A vector of output returned by random projection

Review: Sketch Guarantees * Johnson-Lindenstrauss ( JL) Lemma: • For any and any integer n, let k be a positive integer such that • Then for any set V of n points in , there is a map such that for all • Further this map can be found in randomized polynomial time • W.B.Johnson and J.Lindenstrauss. “Extensions of Lipshitz mapping into hilbert space”. Contemp. Math.,26:189-206,1984

Empirical study : sketch approximation Time series length=256 and sketch size=30

Empirical study : sketch approximation

Empirical study: sketch distance/real distance Sketch=30 Sketch=1000 Sketch=80

Data classification • Cooperative • Time series exhibiting a fundamental degree of regularity, allowing them to be represented by the first few coefficients in the spectral space with little loss of information • Example: Stock Price (random walk) • Tools: Fourier Transform, Wavelet Transform, SVD • Uncooperative • Time series whose energy is not concentrated in only a few frequency components, e.g. • Example: Stock Return (= ) • Tool: Random Projection

DFT on random walk and white noise Cooperative Uncooperative

Approximation Power: SVD Distance vs. Sketch Distance • Note: SVD is superior to DFT and DWT in approximation power. • But all of them are all bad for uncooperative data. • Here sketch size = 32 and SVD coefficient number =30

Our new algorithm* • The big picture of the system • Structured random vector (New) • Compute sketch by structured convolution (New) • Optimize in the parameter space (New) • Empirical study • Richard Cole, Dennis Shasha and Xiaojian Zhao. “Fast Window Correlations Over Uncooperative Time Series”. SIGKDD 2005

Big Picture time series 1 time series 2 time series 3 … time series n … sketch 1 sketch 2 … sketch n … Correlatedpairs Random Projection Grid structure Data Reduction Filtering

Our objective reminded • Monitor and report the correlation periodically e.g. “every half hour” • We chose Random Projection as a means to reduce the data dimension • The time series needs to be looked at in a time window. • This time window should slide forward as time goes on.

Definitions: Sliding window and Basic window Basic window (bw) Time point Stock 1 Stock 2 Stock 3 …… Stock n Sliding window size=8 Basic window size=2 Sliding window (sw) Time axis Example: Every half hour (bw) report the correlation of the last three hours (sw)

Random vector and naïve random projection • Choose randomly sw random numbers to form a random vector R=(r1, r2, r3, r4, r5, r6, r7, r8, r9, r10, r11, r12) • Inner product starts from each data point Xsk1=(x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12)*R Xsk2=(x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13)*R Xsk3=(x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13 x14)*R …… • We improve it in two ways • Partition a random vector of length sw into several basic windows • Use convolution instead of inner product

How to construct a random vector Construct a random vector of 1/-1 of length sw. Suppose sliding window size=12, and basic window size=4 The random vector within a basic window is A control vector A final complete random vector for a sliding window may look like: (1 1 -1 1; -1 -1 1 -1; 1 1 -1 1) Here Rbw=(1 1 -1 1) b=(1 -1 1) Rbw -Rbw Rbw

Naive algorithm and hope for improvement r=( 1 1 -1 1 ; -1 -1 1 -1; 1 1 -1 1 ) x=(x1 x2 x3 x4; x5 x6 x7 x8; x9 x10 x11 x12) • There is redundancy in the second dot product given the first one. • We will eliminate the repeated computation to save time dot product xsk=r*x= x1+x2-x3+x4-x5-x6+x7-x8+x9+x10-x11+x12 With new data point arrival, this operation will be done again r= ( 1 1 -1 1 ; -1 -1 1 -1; 1 1 -1 1 ) x’=(x5 x6 x7 x8 ; x9 x10 x11 x12; x13 x14 x15 x16) * xsk=r*x’= x5+x6-x7+x8-x9-x10+x11+x12+x13+x14+x15- x16

Our algorithm • All the operations are over the basic window; • Pad with |bw-1| zeros, then convolve with Xbw conv1:(1 1 -1 1 0 0 0) (x1,x2,x3,x4) conv2:(1 1 -1 1 0 0 0) (x5,x6,x7,x8) conv3:(1 1 -1 1 0 0 0) (x9,x10,x11,x12) x4 x4+x3 Animation shows convolution in action: -x4+x3+x2 1 1 -1 1 0 0 0 x4-x3+x2+x1 x1 x2 x3 x4 x1 x2 x3 x4 x1 x2 x3 x4 x1 x2 x3 x4 x1 x2 x3 x4 x1 x2 x3 x4 x1 x2 x3 x4 x3-x2+x1 x2-x1 x1

Our algorithm: example First Convolution Second Convolution Third Convolution x8 x8+x7 x6+x7-x8 x5+x6-x7+x8 x5-x6+x7 x6-x5 x5 x12 x12-x11 x10+x11-x12 x9+x10-x11+x12 x9-x10+x11 x10-x9 x9 x4 x4+x3 x2+x3-x4 x1+x2-x3+x4 x1-x2+x3 x2-x1 x1 + + xsk1= (x1+x2-x3+x4)-(x5+x6-x7+x8)+(x9+x10-x11+x12) xsk2=(x2+x3-x4+x5)-(x6+x7-x8+x9)+(x10+x11-x12+x13)

Our algorithm: example sk1=(x1+x2-x3+x4) sk5=(x5+x6-x7+x8) sk9=(x9+x10-x11+x12) xsk1= (x1+x2-x3+x4)-(x5+x6-x7+x8)+(x9+x10-x11+x12)b= ( 1 -1 1) First sliding window sk2=(x2+x3-x4) + (x5)sk6=(x6+x7-x8) + (x9)sk10=(x10+x11-x12) + (x13)Then sum up and we have xsk2=(x2+x3-x4+x5)-(x6+x7-x8+x9)+(x10+x11-x12+x13) b=( 1 -1 1) Second sliding window (Sk1 Sk5 Sk9)*(b1 b2 b3) * is inner product

x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 1 1 –1 1 1 1 –1 1 1 1 –1 1 x5+x6-x7+x8 x9+x10-x11+x12 x1+x2-x3+x4 Basic window version • Or if time series are highly correlated between two consecutive data points, we may compute the sketch every basic window. • That is, we update the sketch for each time series only when data of a complete basic window arrive. No convolution, only inner product.

Overview of our new algorithm • The projection of a sliding window is decomposed into operations over basic windows • Each basic window is convolved/inner product with each random vector only once • We may provide the sketches starting from each data point or starts from the beginning of each basic window. • There is no redundancy.

Performance comparison • Naïve algorithm For each datum and random vector (1) O(|sw|) integer additions • Pointwise version Asymptotically for each datum and random vector (1) O(|sw|/|bw|) integer additions (2) O(log |bw|) floating point operations (use FFT in computing convolutions) • Basic window version Asymptotically for each datum and random vector O(|sw|/|bw|2) integer additions

Big picture revisited time series 1 time series 2 time series 3 … time series n … sketch 1 sketch 2 … sketch n … Correlatedpairs Random Projection Grid structure Filtering So far we reduce the data dimension efficiently. Next, how can it be used as a filter?

How to use the sketch distance as a filter • Naive method: compute the sketch distance: • Being close by sketch distance are likely to be close by originaldistance (JL Lemma) • Finally any close data pair will be double checked with the original data.

Use the sketch distance as a filter • But we do not use it, why? Expensive. • Since we still have to do the pairwise comparison between each pair of stocks which is , k is the size of the sketches, e.g. typically 30, 40, etc • Let’s see our new strategy

Our method: sketch unit distance Given sketches: We have If f distance chunks have we may say where: f: 30%, 40%, 50%, 60% …c: 0.8, 0.9, 1.1…

Further: sketch groups We may compute the sketch group: Remind us of a grid structure For example If f sketch groups have we may say

Grid structure • To avoid checking all pairs, we can use a grid structure and look in the neighborhood, this will return a super set of highly correlated pairs. • The data labeled as “potential” will be double checked using the raw data vectors.

Optimization in parameter space • How to choose the parameters g, c, f, N? • We will choose the best one to be applied to the practical data. But how? --- an engineering problem • Combinatorial Design (CD) • Bootstrapping N: total number of the sketches g: group size c: the factor of distance f: the fraction of groups which are necessary to claim that two time series are close enough Now, Let’s put all together.

X Y Z Inner product with random vectors r1,r2,r3,r4,r5,r6

Grid structure

Empirical study: various data sources • Cstr: Continuous stirred tank reactor • Fortal_ecg: Cutaneous potential recordings of a pregnant woman • Steamgen: Model of a steam generator at Abbott Power Plant in Champaign IL • Winding: Data from a test setup of an industrial winding process • Evaporator: Data from an industrial evaporator • Wind: Daily average wind speeds for 1961-1978 at 12 synoptic meteorological stations in the Republic of Ireland • Spot_exrates: The spot foreign currency exchange rates • EEG: Electroencepholgram

Empirical study:performance comparison Sliding window=3616, basic window=32 and sketch size=60

Section conclusion • How to perform data reduction over uncooperative time series efficiently in contrast to well-established methods for cooperative data • How to cope with middle-size sketch vectors systematically. • Sketch vector partition, grid structure • Parameter space optimization by combinatorial design and bootstrapping • Many ideas can be extended to other applications

Incremental Matching Pursuit (MP)

High Performance Algorithms for Multiple Streaming Time Series

High Performance Algorithms for Multiple Streaming Time Series

Presentation Transcript

Streaming Algorithms

A Symbolic Representation of Time Series, with Implications for Streaming Algorithms

Streaming Algorithms

Efficient Query Filtering for Streaming Time Series

Streaming Algorithms for Geometric Problems

Streaming Algorithms for Robust, Real-Time Detection of DDoS Attacks

Streaming Algorithms for Geometric Problems

High-Performance Computing 12.2: Parallel Algorithms

Streaming Algorithms

Sequencing algorithms for multiple machines

High Performance Correlation Techniques For Time Series

Clustering of Streaming Time Series is Meaningless

Sequencing algorithms for multiple machines

Streaming Algorithms for Geometric Problems

Streaming Algorithms for Geometric Problems

Efficient Query Filtering for Streaming Time Series

Streaming Pattern Discovery in Multiple Time-Series