
# Time Series Sequence Matching - PowerPoint PPT Presentation

Time Series Sequence Matching. Jiaqin Wang, CMPS 565. Papers: “Fast Subsequence Matching in Time-Series Databases”, Christos Faloutsos, M. Ranganathan, Yannis Manolopoulos; “Skyline Index for Time Series Data”, Quanzhong Li, Ines Fernando Vega Lopez, Bongki Moon. Types of time series sequences.



### Time Series Sequence Matching

Jiaqin Wang

CMPS 565

• “Fast Subsequence Matching in Time-Series Databases”, Christos Faloutsos, M. Ranganathan, Yannis Manolopoulos

• “Skyline Index for Time Series Data”, Quanzhong Li, Ines Fernando Vega Lopez, Bongki Moon

• Financial, marketing area

• Stock prices

• Sales numbers

• Scientific databases

• Weather data

• Environmental data

• Whole matching

• Data sequences and the query sequence have the same length

• Subsequence matching

• The query sequence and the data sequences have different lengths

• Given N sequences with the same length l

• Use a feature extraction function to convert each sequence into an n-dimensional value

• DFT

• n-dimensional value (Q1, Q2, …, Qn)

• Most energy in first few coefficients

• Keep first few coefficients

• Reduce dimensions of sequence

• Map each sequence as an n-dimensional point into the feature space

• Only take the first 2 coefficients

• Organize these points into an R-tree

• Used for indexing and searching

• Incoming query sequence

• Use DFT to convert it to a feature point

• Map the query feature point into the feature space

• Find the points whose distance to the query point is within tolerance e

• Consider them similar
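The whole-matching steps above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes NumPy, uses a linear scan in place of the R-tree, and the names `dft_features` and `whole_match` are hypothetical. With an orthonormal DFT, the distance between truncated feature vectors never exceeds the true Euclidean distance, so the filter step cannot dismiss a real match; a refinement step then removes false alarms.

```python
import numpy as np

def dft_features(seq, k=2):
    """Map a sequence to a real vector built from its first k DFT coefficients."""
    # norm="ortho" makes the transform distance-preserving (Parseval).
    coeffs = np.fft.fft(np.asarray(seq, dtype=float), norm="ortho")[:k]
    return np.concatenate([coeffs.real, coeffs.imag])

def whole_match(data, query, eps, k=2):
    """Return indices of equal-length sequences within distance eps of the query."""
    q = np.asarray(query, dtype=float)
    q_feat = dft_features(q, k)
    hits = []
    for i, seq in enumerate(data):
        s = np.asarray(seq, dtype=float)
        # Filter: feature distance is a lower bound on the true distance.
        # (Assumption: a linear scan stands in for the R-tree range query.)
        if np.linalg.norm(dft_features(s, k) - q_feat) > eps:
            continue
        # Refine: check the true distance to discard false alarms.
        if np.linalg.norm(s - q) <= eps:
            hits.append(i)
    return hits
```

In a real system the feature points would be stored in a spatial index so that the filter step does not have to touch every sequence.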

• Discrete Fourier Transform (DFT)

• Keep the first few (2-3) coefficients

• The first few coefficients contain most of the energy of the sequence

• TS1(0.05,3)

• TS2(0.01,12)

• ……

• The distance e < minimum query distance

• A collection of N sequences, each possibly of a different length

• A query Q with tolerance e

• Find all sequences Si (1 <= i <= N), along with the correct offsets k, such that the subsequence Si[k:k+Len(Q)-1] matches the query sequence: D(Q, Si[k:k+Len(Q)-1]) <= e
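As a point of reference, the problem definition above can be written directly as a brute-force sequential scan. This is only a baseline sketch that assumes Euclidean distance for D and uses a hypothetical function name; it is what the indexed method is meant to avoid. Note that offsets here are 0-based and Python slices exclude the end index, so `s[k:k + m]` corresponds to S[k:k+Len(Q)-1] in the notation above.

```python
import numpy as np

def sequential_scan(sequences, query, eps):
    """Return (sequence index, offset) pairs where the subsequence is within eps of the query."""
    q = np.asarray(query, dtype=float)
    m = len(q)
    matches = []
    for i, seq in enumerate(sequences):
        s = np.asarray(seq, dtype=float)
        # Try every offset at which a window of length m still fits.
        for k in range(len(s) - m + 1):
            if np.linalg.norm(s[k:k + m] - q) <= eps:
                matches.append((i, k))
    return matches
```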

• Assume the minimum query length is w

• Use a sliding window of size w and place it on the data sequence at every possible offset

• Extract the features of the window at each offset and map each feature vector as a point into the feature space

• Sliding window on the sequence from offset 0 to Len(S)-w (Len(S)-w+1 windows in total)

• The length of window is w


• The series of points in the feature space forms a curve (a trail)
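The sliding-window step can be sketched as follows. This assumes NumPy and the same DFT-based features as in whole matching; the function name `window_trail` and the default of two coefficients are illustrative choices, not taken from the slides.

```python
import numpy as np

def window_trail(seq, w, k=2):
    """Slide a window of length w over seq and return one feature point per offset."""
    s = np.asarray(seq, dtype=float)
    points = []
    # Offsets 0 .. len(s) - w, i.e. len(s) - w + 1 windows in total.
    for offset in range(len(s) - w + 1):
        coeffs = np.fft.fft(s[offset:offset + w], norm="ortho")[:k]
        # Store real and imaginary parts so each point is a real vector.
        points.append(np.concatenate([coeffs.real, coeffs.imag]))
    return np.array(points)
```

Consecutive windows overlap in all but one sample, so consecutive feature points tend to lie close together, which is why the result looks like a continuous trail.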

• R-tree

• Storing every point individually in the R-tree is inefficient

• Divide the trail into sub-trails using minimum bounding rectangles (MBRs)

• Combine small MBRs

• Get the index information

• Group a fixed number of points into each MBR

• Group a variable number of points into each MBR

• One greedy algorithm

• Number of disk accesses

• Cost function

• Average cost function

• Assign the first point of the trail to a sub-trail

• For each successive point

• If it increases the average cost of the current sub-trail

• Then start another sub-trail

• Else include this point in the current sub-trail
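The greedy procedure above can be sketched as below. The slides do not state the cost function, so this sketch assumes one: the estimated number of disk accesses for an MBR is taken as the product of its side lengths, each padded by 0.5, and the average cost is that estimate divided by the number of points in the sub-trail. Both function names are hypothetical.

```python
import numpy as np

def mbr_cost(points):
    """Assumed cost estimate for the MBR of the points: product of (side length + 0.5)."""
    pts = np.asarray(points, dtype=float)
    sides = pts.max(axis=0) - pts.min(axis=0)
    return float(np.prod(sides + 0.5))

def split_trail(trail):
    """Greedily split a trail into sub-trails, returned as lists of point indices."""
    trail = np.asarray(trail, dtype=float)
    if len(trail) == 0:
        return []
    sub_trails = [[0]]  # the first point opens the first sub-trail
    for idx in range(1, len(trail)):
        current = sub_trails[-1]
        avg_now = mbr_cost(trail[current]) / len(current)
        avg_with = mbr_cost(trail[current + [idx]]) / (len(current) + 1)
        if avg_with > avg_now:
            # Adding the point raises the average cost: start a new sub-trail.
            sub_trails.append([idx])
        else:
            current.append(idx)
    return sub_trails
```

For example, a trail with three nearby points followed by two far-away points is split into two sub-trails, one per cluster, so neither MBR has to cover the empty space between them.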

• “Skyline Index for Time Series Data”, Quanzhong Li, Ines Fernando Vega Lopez, Bongki Moon

• What is APCA (Adaptive Piecewise Constant Approximation)?

• Limitation of APCA

• Internal overlap in MBRs

• Skyline Bounding Region (SBR)

• N time series data objects of length l

• Specify 2-dimensional regions by top and bottom skylines

• Many approaches

• Equal-length constant-valued segments

• Variable-length constant-valued segments

• ASBR will cover the original SBR
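A small sketch may help make the SBR and its approximation concrete. It assumes NumPy, takes the top and bottom skylines as the pointwise maximum and minimum over a group of equal-length series, and uses the equal-length constant-valued segment variant; the function names are hypothetical. Taking the maximum of the top skyline and the minimum of the bottom skyline within each segment is what guarantees that the approximation covers the original region.

```python
import numpy as np

def skyline_bounding_region(group):
    """Return (top, bottom) skylines: pointwise max and min over the group."""
    g = np.asarray(group, dtype=float)
    return g.max(axis=0), g.min(axis=0)

def approximate_sbr(top, bottom, n_segments):
    """Approximate the skylines with equal-length constant-valued segments."""
    top = np.asarray(top, dtype=float)
    bottom = np.asarray(bottom, dtype=float)
    bounds = np.linspace(0, len(top), n_segments + 1).astype(int)
    a_top = np.empty_like(top)
    a_bottom = np.empty_like(bottom)
    for start, end in zip(bounds[:-1], bounds[1:]):
        if start == end:
            continue  # more segments than points: skip empty segments
        # Segment-wise max/min keeps the approximation outside the original region.
        a_top[start:end] = top[start:end].max()
        a_bottom[start:end] = bottom[start:end].min()
    return a_top, a_bottom
```

Fewer segments give a more compact description of the region at the price of a looser bound.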

• R-Tree based Skyline index

• Internal node

• Approximate SBR (ASBR)

• Pointer to child node

• Leaf node

• Pointer to time series data

Thank You