
Time Series Filtering



Presentation Transcript


  1. Time Series Filtering Given a time series T, a set of candidates C, and a distance threshold r, find all subsequences in T that are within distance r of any of the candidates in C. [Figure: a long time series with the matching subsequences highlighted, alongside twelve numbered candidate sequences.]

  2. Filtering vs. Querying Querying: given a single query (template), find its best match in a database. Filtering: given a set of candidate sequences, find all matching subsequences in a time series. [Figure: side-by-side illustration of database querying (best match to one query) versus filtering (all matches to a candidate set).]

  3. Euclidean Distance Metric Given two time series Q = q1…qn and C = c1…cn, their Euclidean distance is defined as: D(Q, C) = sqrt( Σ_{i=1..n} (qi − ci)² ). [Figure: two time series Q and C plotted over the range 0–100, with the pointwise differences indicated.]
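The distance above can be written directly as a short helper. A minimal sketch (the function name `euclidean_dist` is our own, not from the slides):

```python
import math

def euclidean_dist(q, c):
    """Euclidean distance between two equal-length time series."""
    assert len(q) == len(c)
    return math.sqrt(sum((qi - ci) ** 2 for qi, ci in zip(q, c)))
```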

  4. Early Abandon During the computation, if the running sum of the squared differences between corresponding pairs of data points exceeds r², we can safely stop the calculation: the final distance is guaranteed to exceed r. [Figure: Q and C plotted over the range 0–100, with the calculation abandoned partway through.]
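A sketch of the early-abandoning distance computation (the name `early_abandon_dist` and the `None` return convention for an abandoned calculation are our assumptions):

```python
import math

def early_abandon_dist(q, c, r):
    """Euclidean distance between q and c, or None if the running
    sum of squared differences exceeds r**2, at which point the
    calculation is safely abandoned."""
    r2 = r * r
    total = 0.0
    for qi, ci in zip(q, c):
        total += (qi - ci) ** 2
        if total > r2:
            return None  # abandoned: the distance certainly exceeds r
    return math.sqrt(total)
```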

  5. Classic Approach Individually compare each candidate sequence to the query using the early-abandoning algorithm. [Figure: the time series and the twelve candidate sequences, compared one by one.]

  6. Wedge Given candidate sequences C1, …, Ck, we can form two new sequences U and L: Ui = max(C1i, …, Cki) and Li = min(C1i, …, Cki). Together they form the smallest possible bounding envelope that encloses sequences C1, …, Ck. We call the combination of U and L a wedge, and denote it W = {U, L}. A lower-bounding measure can then be defined between an arbitrary query Q and the entire set of candidate sequences contained in a wedge W. [Figure: candidates C1 and C2, the envelopes U and L of the wedge W, and a query Q against the wedge.]
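A sketch of building a wedge and the LB_Keogh-style lower bound it supports: the query contributes to the bound only where it falls outside the envelope, so the result can never exceed the true distance to any enclosed candidate. The function names here are our own:

```python
import math

def build_wedge(candidates):
    """Form the bounding envelope W = {U, L} of equal-length candidates:
    U_i is the elementwise max, L_i the elementwise min."""
    U = [max(vals) for vals in zip(*candidates)]
    L = [min(vals) for vals in zip(*candidates)]
    return U, L

def lb_wedge(q, U, L):
    """Lower bound on the distance from query q to every candidate
    enclosed in the wedge (LB_Keogh-style): only the parts of q
    outside the envelope contribute."""
    total = 0.0
    for qi, ui, li in zip(q, U, L):
        if qi > ui:
            total += (qi - ui) ** 2
        elif qi < li:
            total += (li - qi) ** 2
    return math.sqrt(total)
```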

  7. Generalized Wedge • Use W(1,2) to denote that a wedge is built from sequences C1 and C2. • Wedges can be hierarchically nested. For example, W((1,2),3) consists of W(1,2) and C3. [Figure: C1, C2, C3 (equivalently the trivial wedges W1, W2, W3), the merged wedge W(1,2), and the nested wedge W((1,2),3).]
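Nesting two wedges reduces to taking the elementwise max of the upper envelopes and min of the lower ones, which yields the envelope enclosing every candidate in both. A sketch (the name `merge_wedges` and the `(U, L)` tuple representation are our assumptions):

```python
def merge_wedges(w1, w2):
    """Merge two wedges, each given as a (U, L) pair, into the nested
    wedge that encloses both, e.g. W(1,2) and W3 into W((1,2),3)."""
    (U1, L1), (U2, L2) = w1, w2
    U = [max(a, b) for a, b in zip(U1, U2)]  # elementwise upper envelope
    L = [min(a, b) for a, b in zip(L1, L2)]  # elementwise lower envelope
    return U, L
```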

  8. H-Merge • Compare the query to the wedge using LB_Keogh. • If the LB_Keogh computation early abandons (the lower bound already exceeds r), we are done: no candidate in the wedge can match. • Otherwise, individually compare each candidate sequence to the query using the early-abandoning algorithm. [Figure: the time series and candidate set from the classic approach, now screened through a wedge first.]
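The two-stage step above can be sketched as one self-contained function for a single (unnested) wedge; the name `h_merge_filter` and the return convention are our assumptions, and the real algorithm would recurse through the wedge hierarchy rather than fall straight back to individual candidates:

```python
def h_merge_filter(q, candidates, r):
    """Filter one subsequence q against a candidate set: first test a
    wedge lower bound; only if it does not exceed r, compare q to each
    candidate with early abandoning. Returns candidates within r of q."""
    U = [max(vals) for vals in zip(*candidates)]
    L = [min(vals) for vals in zip(*candidates)]
    r2 = r * r
    # Stage 1: lower bound against the wedge; abandon as soon as it exceeds r^2.
    lb2 = 0.0
    for qi, ui, li in zip(q, U, L):
        if qi > ui:
            lb2 += (qi - ui) ** 2
        elif qi < li:
            lb2 += (li - qi) ** 2
        if lb2 > r2:
            return []  # no candidate in the wedge can be within r
    # Stage 2: early-abandoning comparison with each individual candidate.
    matches = []
    for c in candidates:
        total = 0.0
        for qi, ci in zip(q, c):
            total += (qi - ci) ** 2
            if total > r2:
                break
        else:
            matches.append(c)
    return matches
```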

  9. Hierarchical Clustering Candidates C1, …, C5 (equivalently the trivial wedges W1, …, W5) are merged bottom-up: at K = 5 each candidate is its own wedge; merging yields W(2,5), then W(1,4), then W((2,5),3), and finally W(((2,5),3),(1,4)) at K = 1. Which wedge set should we choose? [Figure: dendrogram over the five candidates showing the wedge sets at each level from K = 5 down to K = 1.]

  10. Which Wedge Set to Choose? • Test all K wedge sets on a representative sample of the data. • Choose the wedge set that performs best.

  11. Upper Bound on H-Merge • The wedge-based approach is efficient when comparing a set of time series to a large batch dataset. • But what about streaming time series? • Streaming algorithms are limited by their worst case; being efficient on average does not help. • In the worst case, a subsequence forces the comparison down through the wedge hierarchy to the individual candidates. [Figure: the wedges W(1,2) and W((1,2),3) against a worst-case subsequence.]

  12. Triangle Inequality If dist(W((2,5),3), W(1,4)) ≥ 2r, then a subsequence within r of one wedge must, by the triangle inequality, be at least r away from the other. Hence the lower bound cannot fail (fall below r) on both wedges: if W(1,4) fails, W((2,5),3) is guaranteed to prune, and vice versa. [Figure: the wedge hierarchy from K = 5 to K = 1, with the distance between W((2,5),3) and W(1,4) compared against 2r.]
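The pruning test is a one-liner once the pairwise wedge distances are precomputed. A sketch under the assumption that all distances involved are true metric (Euclidean) distances; the function name is hypothetical:

```python
def can_skip_other_wedge(dist_between_wedges, dist_q_to_w1, r):
    """Triangle-inequality pruning: if the query is within r of wedge 1
    and the two wedges are at least 2r apart, the query is at least r
    from wedge 2, so wedge 2 need not be examined at all."""
    return dist_q_to_w1 < r and dist_between_wedges >= 2 * r
```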

  13. Experimental Setup • Datasets: ECG, Stock, and Audio. • We measure the number of computational steps used by the following methods: brute force; brute force with early abandoning (classic); our approach (H-Merge); our approach with a random wedge set (H-Merge-R).

  14. Experimental Results: ECG Dataset • Batch time series: 650,000 data points (half an hour of ECG signals) • Candidate set: 200 time series of length 40 • r = 0.5 [Bar chart: number of steps (scale ×10⁶) for brute force, classic, H-Merge-R, and H-Merge.]

  15. Experimental Results: Stock Dataset • Batch time series: 2,119,415 data points • Candidate set: 337 time series of length 128 • r = 4.3 [Bar chart: number of steps (scale ×10¹⁰) for brute force, classic, H-Merge-R, and H-Merge.]

  16. Experimental Results: Audio Dataset • Batch time series: 46,143,488 data points (one hour of sound) • Candidate set: 68 time series of length 101 • r = 4.14 • Sliding window: 11,025 points (1 second) • Step: 5,512 points (0.5 second) [Bar chart: number of steps (scale ×10⁷) for brute force, classic, H-Merge-R, and H-Merge.]

  17. Experimental Results: Sorting • Wedge of length 1,000 • Random walk time series of length 65,536
