High Performance Correlation Techniques For Time Series Xiaojian Zhao Department of Computer Science Courant Institute of Mathematical Sciences New York University 25 Oct. 2004
Roadmap
Section 1: Introduction
• Motivation
• Problem Statement
Section 2: Background
• GEMINI framework
• Random Projection
• Grid Structure
• Some Definitions
• Naive Method and Yunyue's Approach
Section 3: Sketch-based StatStream
• Efficient Sketch Computation
• Sketch Technique as a Filter
• Parameter Selection
• Grid Structure
• System Integration
Section 4: Empirical Study
Section 5: Future Work
Section 6: Conclusion
Motivation
• Stock price streams
• The New York Stock Exchange (NYSE)
• 50,000 securities (streams); 100,000 ticks (trade and quote)
• Pairs Trading, a.k.a. Correlation Trading
• Query: "Which pairs of stocks were correlated with a value of over 0.9 for the last three hours?"
"XYZ and ABC have been correlated with a correlation of 0.95 for the last three hours. Now XYZ and ABC become less correlated as XYZ goes up and ABC goes down. They should converge back later. I will sell XYZ and buy ABC…"
Online Detection of High Correlation
[Figure: two price streams moving together, flagged "Correlated!" at the points where high correlation is detected online]
Why speed is important
• As processors speed up, algorithmic efficiency no longer matters… one might think.
• True if problem sizes stayed the same, but they don't.
• As processors speed up, sensors improve: satellites spew out a terabyte a day, magnetic resonance imagers give higher-resolution images, etc.
Problem Statement
• Detect and report correlations rapidly and accurately
• Expand the algorithm into a general-purpose engine
• Apply it in many practical application domains
Big Picture
time series 1, time series 2, … time series n → [Random Projection] → sketch 1, sketch 2, … sketch n → [Grid structure] → correlated pairs
GEMINI framework*
DFT, DWT, etc.
* Faloutsos, C., Ranganathan, M. & Manolopoulos, Y. (1994). Fast subsequence matching in time-series databases. In Proceedings of the ACM SIGMOD Int'l Conference on Management of Data, Minneapolis, MN, May 25-27, pp. 419-429.
Goals of GEMINI framework
• High performance: operating on small synopses instead of raw data saves time, e.g. in distance computations
• No false negatives: distances in the feature space shrink (lower-bound) the corresponding distances in the raw data space, so a filter on feature-space distance never discards a truly close pair
Random Projection: Intuition
• You are walking in a sparse forest and you are lost.
• You have an outdated cell phone without a GPS.
• You want to know if you are close to your friend.
• You identify yourself as 100 meters from the pointy rock, 200 meters from the giant oak, etc.
• If your friend is at similar distances from several of these landmarks, you might be close to one another.
• The sketches are the set of distances to landmarks.
How to make a Random Projection*
• Sketch pool: a list of random vectors drawn from a stable distribution (like the landmarks)
• Project the time series onto the space spanned by these random vectors
• The Euclidean distance (correlation) between two time series is approximated by the distance between their sketches, with a probabilistic guarantee
* W. B. Johnson and J. Lindenstrauss. "Extensions of Lipschitz mappings into a Hilbert space". Contemp. Math., 26:189-206, 1984.
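To make this concrete, here is a minimal, hedged sketch (not the talk's code; the sizes, names, and the ±1 sketch pool are assumptions) of how sketch distances approximate true distances:

```python
import numpy as np

rng = np.random.default_rng(0)

sw, k = 256, 30          # sliding-window length, sketch size (assumed)
x = rng.standard_normal(sw)
y = rng.standard_normal(sw)

# Sketch pool: k random +/-1 vectors (the "landmarks").
R = rng.choice([-1.0, 1.0], size=(k, sw))

# A sketch is the vector of inner products with the random vectors.
xsk, ysk = R @ x, R @ y

# The scaled sketch distance approximates the true Euclidean distance.
d_true = np.linalg.norm(x - y)
d_sketch = np.linalg.norm(xsk - ysk) / np.sqrt(k)
print(d_true, d_sketch)   # close with high probability
```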
Random Projection
[Figure: raw time series X and Y are each combined, via inner products, with the same random vectors (the rocks and buildings of the analogy); the resulting sketches record each series' relative distances, i.e. its current position relative to the landmarks]
Sketch Guarantees
Note: sketches do not approximate individual time series windows; they help make comparisons between them.
Johnson-Lindenstrauss Lemma:
• For any $0 < \epsilon < 1$ and any integer $n$, let $k$ be a positive integer such that $k \ge 4(\epsilon^2/2 - \epsilon^3/3)^{-1} \ln n$
• Then for any set $V$ of $n$ points in $\mathbb{R}^d$, there is a map $f: \mathbb{R}^d \rightarrow \mathbb{R}^k$ such that for all $u, v \in V$: $(1-\epsilon)\|u-v\|^2 \le \|f(u)-f(v)\|^2 \le (1+\epsilon)\|u-v\|^2$
• Further, this map can be found in randomized polynomial time
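For a feel of the numbers, a small hedged helper (my addition, not the talk's) evaluating the lemma's bound on k:

```python
import math

# Required sketch size k for n points at distortion epsilon,
# using the constant from the lemma above.
def jl_dimension(n: int, eps: float) -> int:
    return math.ceil(4 * math.log(n) / (eps ** 2 / 2 - eps ** 3 / 3))

print(jl_dimension(50_000, 0.2))   # about 2500 dimensions
```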
Sketches: Random Projection
Why do we use sketches (random projections)? To reduce the dimensionality!
For example: if the original time series x has length 256, we may represent it with a sketch vector of length 30.
This is the first step toward removing "the curse of dimensionality".
Achlioptas's lemma*
Dimitris Achlioptas proved the following. Let $P$ be an arbitrary set of $n$ points in $\mathbb{R}^d$, represented as an $n \times d$ matrix $A$. Given $\epsilon, \beta > 0$, let $k_0 = \frac{4 + 2\beta}{\epsilon^2/2 - \epsilon^3/3} \log n$. For integer $k \ge k_0$, let $R$ be a $d \times k$ random matrix with entries $R(i,j) = r_{ij}$, where the $\{r_{ij}\}$ are independent random variables drawn from either one of the following two probability distributions:
$r_{ij} = \begin{cases} +1 & \text{with probability } 1/2 \\ -1 & \text{with probability } 1/2 \end{cases}$ or $r_{ij} = \sqrt{3} \times \begin{cases} +1 & \text{with probability } 1/6 \\ 0 & \text{with probability } 2/3 \\ -1 & \text{with probability } 1/6 \end{cases}$
Let $E = \frac{1}{\sqrt{k}} A R$ and let $f: \mathbb{R}^d \rightarrow \mathbb{R}^k$ map the $i$-th row of $A$ to the $i$-th row of $E$. Then with probability at least $1 - n^{-\beta}$, for all $u, v \in P$: $(1-\epsilon)\|u-v\|^2 \le \|f(u)-f(v)\|^2 \le (1+\epsilon)\|u-v\|^2$
* Dimitris Achlioptas, "Database-friendly Random Projections", Proceedings of the Twentieth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems.
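A minimal illustration of Achlioptas's database-friendly distribution (a sketch under assumed names and sizes, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(1)

# Sparse projection matrix: entries are sqrt(3) * {+1 w.p. 1/6,
# 0 w.p. 2/3, -1 w.p. 1/6}; two thirds of the matrix is zero.
def achlioptas_matrix(d: int, k: int) -> np.ndarray:
    entries = rng.choice([1.0, 0.0, -1.0], size=(d, k), p=[1/6, 2/3, 1/6])
    return np.sqrt(3.0) * entries

n, d, k = 100, 256, 64
A = rng.standard_normal((n, d))                 # n points as rows
E = A @ achlioptas_matrix(d, k) / np.sqrt(k)    # projected points
# Pairwise distances among rows of E approximate those among rows of A
# within a (1 +/- epsilon) factor, with high probability.
```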
Definition: Sketch Distance
The sketch distance between two time series is the (suitably scaled) Euclidean distance between their sketch vectors; by the lemmas above, it approximates the true distance between the series.
Note: DFT and DWT distances are analogous. For those measures, the difference between the original vectors is approximated by the difference between the first Fourier/wavelet coefficients of those vectors.
Empirical Study: sketch distance vs. real distance
[Figure: sketch distance plotted against real distance for sketch sizes 30, 80, and 1000]
Correlation and Distance
• There is a direct relationship between Euclidean distance and Pearson correlation
• After normalization (zero mean, unit norm): dist² = 2(1 - correlation), as checked below
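A quick numerical check of this identity (illustrative code; the variable names are assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
x, y = rng.standard_normal(200), rng.standard_normal(200)

def znorm(s: np.ndarray) -> np.ndarray:
    # Normalize to zero mean and unit norm.
    s = s - s.mean()
    return s / np.linalg.norm(s)

xn, yn = znorm(x), znorm(y)
corr = np.dot(xn, yn)                   # Pearson correlation
dist2 = np.sum((xn - yn) ** 2)          # squared Euclidean distance
assert np.isclose(dist2, 2 * (1 - corr))
```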
How to compute the correlation efficiently?
Goal: find the most highly correlated stream pairs over sliding windows
• Naive method
• StatStream method
• Our method
Naïve Approach
• Space and time cost
• Space O(N), time O(N²·sw)
• N: number of streams
• sw: size of the sliding window
• Let's look at the StatStream approach next (the naive baseline is sketched below)
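A hedged sketch of that naive baseline (assumed names; np.corrcoef hides, but still pays, the O(N²·sw) work):

```python
import numpy as np

# Correlate every pair of streams over the current sliding window.
def naive_correlated_pairs(windows: np.ndarray, threshold: float):
    # windows: N x sw array, one sliding window per stream
    N = len(windows)
    C = np.corrcoef(windows)            # all-pairs Pearson correlation
    return [(i, j) for i in range(N) for j in range(i + 1, N)
            if C[i, j] > threshold]
```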
Definitions: Sliding window and Basic window
[Figure: n stock streams (Stock 1 … Stock n) along a time axis, partitioned into basic windows of 2 time points each; a sliding window of size 8 spans 4 consecutive basic windows]
StatStream Idea
• Use the Discrete Fourier Transform (DFT) to approximate correlation, as in the GEMINI approach discussed earlier (a small digest sketch follows below)
• Every two minutes (the "basic window size"), update the DFT for each time series over the last hour (the "sliding window size")
• Use a grid structure to filter out unlikely pairs
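A hedged illustration of such a DFT digest (not StatStream's actual code; the coefficient count and names are assumptions):

```python
import numpy as np

# Keep only the first few Fourier coefficients of each window.
def dft_digest(window: np.ndarray, n_coefs: int = 10) -> np.ndarray:
    return np.fft.fft(window)[:n_coefs]

def digest_distance(a: np.ndarray, b: np.ndarray, n: int) -> float:
    # By Parseval, the distance over a coefficient prefix lower-bounds
    # the true Euclidean distance: filtering gives no false negatives.
    return np.sqrt(np.sum(np.abs(a - b) ** 2) / n)
```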
StatStream: stream synoptic data structure
[Figure: a sliding window divided into basic windows along the time axis; each basic window keeps digests (sum and DFT coefficients) per time series]
Problem not yet solved
• DFT approximates price-like data very well, but gives a poor approximation for returns: (today's price - yesterday's price)/yesterday's price.
• A return series is more like white noise, which contains all frequency components.
• DFT approximates the data using only the first n (e.g. 10) coefficients, which is insufficient in the case of white noise.
Big Picture Revisited
time series 1, time series 2, … time series n → [Random Projection] → sketch 1, sketch 2, … sketch n → [Grid structure] → correlated pairs
Random Projection: inner product between each data vector and the random vectors
How to compute the sketch efficiently
We will not compute the inner product from scratch at each data point, because that computation is expensive. A new strategy, developed in joint work with Richard Cole, is used to compute the sketch. Here each random variable is drawn from {+1, -1} with probability 1/2 each.
How to construct the random vector:
Given a time series $x = (x_1, x_2, \ldots)$, compute its sketch for a sliding window of size sw = 12. Partition the window into smaller basic windows of size bw = 4. The random vector within a basic window is R, and a control vector b determines which basic windows are multiplied by -1 and which by +1 (why? wait…). With R = (1 1 -1 1) and b = (1 -1 1), the final complete random vector is
(1 1 -1 1; -1 -1 1 -1; 1 1 -1 1)
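This construction is a Kronecker product; a minimal sketch (assumed names) reproducing the vector above:

```python
import numpy as np

bw_vec = np.array([1, 1, -1, 1])   # random +/-1 within a basic window
b = np.array([1, -1, 1])           # control vector across basic windows

# The Kronecker product repeats bw_vec once per basic window, signed by b.
r = np.kron(b, bw_vec)
print(r)   # [ 1  1 -1  1 -1 -1  1 -1  1  1 -1  1]
```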
Naive algorithm and hope for improvement
r = (1 1 -1 1; -1 -1 1 -1; 1 1 -1 1)
x = (x1 x2 x3 x4; x5 x6 x7 x8; x9 x10 x11 x12)
Dot product: xsk = r·x = x1+x2-x3+x4 -x5-x6+x7-x8 +x9+x10-x11+x12
When new data points arrive, such operations are done again:
r = (1 1 -1 1; -1 -1 1 -1; 1 1 -1 1)
x′ = (x5 x6 x7 x8; x9 x10 x11 x12; x13 x14 x15 x16)
xsk′ = r·x′ = x5+x6-x7+x8 -x9-x10+x11-x12 +x13+x14-x15+x16
• There is redundancy in the second dot product given the first one.
• We will eliminate the repeated computation to save time.
Our algorithm (pointwise version)
Convolve each basic window with the corresponding random vector after padding with |bw| zeros:
conv1: (1 1 -1 1 0 0 0 0) ⊛ (x1, x2, x3, x4)
conv2: (1 1 -1 1 0 0 0 0) ⊛ (x5, x6, x7, x8)
conv3: (1 1 -1 1 0 0 0 0) ⊛ (x9, x10, x11, x12)
Sliding (1 1 -1 1) across (x1, x2, x3, x4) produces the partial dot products, in order:
x4, x4+x3, x2+x3-x4, x1+x2-x3+x4, x1-x2+x3, x2-x1, x1
Our algorithm: example
First convolution (x1…x4): x4, x4+x3, x2+x3-x4, x1+x2-x3+x4, x1-x2+x3, x2-x1, x1
Second convolution (x5…x8): x8, x8+x7, x6+x7-x8, x5+x6-x7+x8, x5-x6+x7, x6-x5, x5
Third convolution (x9…x12): x12, x12+x11, x10+x11-x12, x9+x10-x11+x12, x9-x10+x11, x10-x9, x9
Our algorithm: example
First sliding window, with b = (1 -1 1):
sk1 = (x1+x2-x3+x4), sk5 = (x5+x6-x7+x8), sk9 = (x9+x10-x11+x12)
xsk1 = (x1+x2-x3+x4) - (x5+x6-x7+x8) + (x9+x10-x11+x12)
Second sliding window:
sk2 = (x2+x3-x4) + (x5), sk6 = (x6+x7-x8) + (x9), sk10 = (x10+x11-x12) + (x13)
Summing up gives
xsk2 = (x2+x3-x4+x5) - (x6+x7-x8+x9) + (x10+x11-x12+x13)
In general, the sketch is the inner product (sk1 sk5 sk9)·(b1 b2 b3). (A runnable sketch of this trick follows below.)
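Here is a runnable version of the whole trick under the slide's toy sizes (sw = 12, bw = 4; the names and helper structure are my assumptions, verified against direct dot products):

```python
import numpy as np

bw_vec = np.array([1.0, 1.0, -1.0, 1.0])   # random vector per basic window
b = np.array([1.0, -1.0, 1.0])             # control vector
x = np.arange(1.0, 17.0)                   # stand-in data x1..x16

bw = len(bw_vec)
basic = x.reshape(-1, bw)                  # four basic windows of size 4

# One convolution per basic window yields every partial dot product of
# bw_vec with windows that overlap that basic window.
convs = [np.convolve(v, bw_vec[::-1]) for v in basic]

def sketch(start: int) -> float:
    """Inner product of the full random vector with the sliding window
    of size 12 beginning at data point index `start`."""
    w, j = divmod(start, bw)
    total = 0.0
    for i, sign in enumerate(b):           # one term per basic window
        part = convs[w + i][bw - 1 + j]    # partial sum within window w+i
        if j > 0:                          # plus the spill into the next one
            part += convs[w + i + 1][j - 1]
        total += sign * part
    return total

r = np.kron(b, bw_vec)                     # full random vector, for checking
assert sketch(0) == r @ x[0:12]            # first sliding window (xsk1)
assert sketch(1) == r @ x[1:13]            # second sliding window (xsk2)
```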
Our algorithm
• The projection of a sliding window is decomposed into operations over basic windows
• Each basic window is convolved with each random vector only once
• We can produce sketches incrementally, starting from each data point
• There is no redundant computation
Jump by a basic window (basic window version)
[Figure: the random vector (1 1 -1 1 | 1 1 -1 1 | 1 1 -1 1) aligned with x1…x12, giving the per-basic-window dot products x1+x2-x3+x4, x5+x6-x7+x8, x9+x10-x11+x12]
• If the time series are highly correlated between two consecutive data points, we may compute the sketch only once per basic window rather than at every data point.
• That is, we update the sketch of each time series only when the data of a complete basic window has arrived.
Online Version
• Take the basic window version, for instance
• Review: to have the same baseline, we normalize the time series within each sliding window
• Challenge: the normalization of the time series changes with each new basic window
Online Version
• The incremental nature of the computation requires updating the average and variance whenever a new basic window arrives
• Do we have to recompute the normalization, and thus the sketch, whenever a new basic window arrives?
• Of course not. Otherwise our algorithm would degenerate into the trivial computation.
Online Version
Then how? After some mathematical manipulation, we claim that we only need to store and maintain the following quantities (see the bookkeeping sketch below):
• the sum over the whole sliding window
• the sum of the square of each datum in the sliding window
• the sum over each basic window
• the sum of the square of each datum in each basic window
• the dot product of the random vector with each basic window
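A hedged sketch of this bookkeeping (class and function names are assumptions, not StatStream's API):

```python
import numpy as np

# Per-basic-window digest: its sum, sum of squares, and dot products
# with the random vectors. Sliding-window statistics are then rebuilt
# from |sw|/|bw| digests instead of from raw data.
class BasicWindowDigest:
    def __init__(self, data: np.ndarray, rand_vecs: np.ndarray):
        self.total = data.sum()              # sum over the basic window
        self.sq_total = (data ** 2).sum()    # sum of squares over it
        self.dots = rand_vecs @ data         # one dot per random vector

def sliding_window_stats(digests, sw: int):
    """Mean and standard deviation of the sliding window covered by
    `digests`, used to renormalize without touching raw data."""
    s = sum(d.total for d in digests)
    sq = sum(d.sq_total for d in digests)
    mean = s / sw
    return mean, np.sqrt(sq / sw - mean ** 2)
```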
Performance comparison
• Naïve algorithm: for each datum and random vector, O(|sw|) integer additions
• Pointwise version: asymptotically, for each datum and random vector, (1) O(|sw|/|bw|) integer additions, (2) O(log |bw|) floating point operations (using the FFT to compute convolutions)
• Basic window version: asymptotically, for each basic window and random vector, (1) O(|sw|/|bw|) integer additions, (2) O(|bw|) floating point operations
Sketch distance filter quality
• We may use the sketch distance to filter out unlikely data pairs
• How accurate is it?
• How does it compare to the DFT and DWT distances in approximation ability?
Empirical Study: sketch distance compared to DFT and DWT distances
• Data length = 256
• DFT: the first 14 DFT coefficients are used in the distance computation
• DWT: the db2 wavelet is used, with 16 coefficients
• Sketch: 64 random vectors are used
Use the sketch distance as a filter
• We may report a pair as a candidate whenever its sketch distance is within a factor c of the distance threshold
• c could be 1.2 or larger, to reduce the number of false negatives
• Finally, every candidate pair is double-checked against the raw data (a sketch of this filter follows below)
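A hedged sketch of this filter rule (assumed names; it presumes unit-norm normalized windows so that dist² = 2(1 - corr) gives the threshold, and its pairwise loop is exactly the cost discussed on the next slide):

```python
import numpy as np

def candidate_pairs(sketches: np.ndarray, corr_threshold: float,
                    k: int, c: float = 1.2):
    # Distance threshold implied by the correlation threshold.
    d_threshold = np.sqrt(2 * (1 - corr_threshold))
    n = len(sketches)
    cands = []
    for i in range(n):
        for j in range(i + 1, n):
            d_sk = np.linalg.norm(sketches[i] - sketches[j]) / np.sqrt(k)
            if d_sk <= c * d_threshold:     # keep; verify on raw data later
                cands.append((i, j))
    return cands
```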
Use the sketch distance as a filter
• But we will not use this filter directly. Why? It is expensive:
• we would still have to do a pairwise comparison between every pair of stocks, which is O(N²·k), where k is the size of the sketches.