### The Landmark Model: An Instance Selection Method for Time Series Data

C.-S. Perng, S. R. Zhang, and D. S. Parker

Instance Selection and Construction for Data Mining, Chapter 7, pp. 113-130

Cho, Dong-Yeon

Introduction

- Complexity
- Patterns: continuous time series segments with particular features
- The reflection of events in time series is better represented by patterns.
- The complexity of processing patterns
- The number of all possible segments for a time series of length N is N(N+1)/2.
- A simple inspection of each of these segments takes O(N^3) time in total.

- Good instance selection algorithms are especially helpful here, since they can greatly reduce complexity by reducing the volume of data.
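The segment count can be checked with a minimal sketch. The function names below are illustrative and not from the presentation; the point is only that enumerating every contiguous segment yields N(N+1)/2 segments, and that touching every point of every segment grows cubically with N.

```python
def count_segments(n: int) -> int:
    """Number of contiguous, non-empty segments of a series of length n."""
    return n * (n + 1) // 2


def enumerate_segments(series):
    """Yield every contiguous segment of the series.

    There are O(N^2) segments and inspecting one costs O(length),
    so inspecting all of them costs O(N^3) overall.
    """
    n = len(series)
    for start in range(n):
        for end in range(start + 1, n + 1):
            yield series[start:end]


series = [1, 2, 3, 4, 3]
segments = list(enumerate_segments(series))
# Total number of points visited when every segment is inspected once.
total_points = sum(len(segment) for segment in segments)
```

For this five-point series there are 15 segments, and inspecting all of them visits 35 points, already seven times the length of the raw data.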

- Similarity Model
- Euclidean distance does not match human intuition.
- Example: 1,2,3,4,3 and 3,4,5,6,5 have the same shape, yet their Euclidean distance is not zero.

- Previous works
- None of these proposed techniques supports a similarity model that can both capture the similarity and support efficient pattern querying of time series.

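The two example sequences from the slide make the mismatch concrete. The sketch below (helper name `euclidean` is illustrative) computes their Euclidean distance and then the distance after removing a constant offset of 2, which is the only difference between them.

```python
import math


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


a = [1, 2, 3, 4, 3]
b = [3, 4, 5, 6, 5]

# Every point differs by 2, so the distance is sqrt(5 * 2**2) = sqrt(20).
d_raw = euclidean(a, b)

# Shifting b down by the constant offset makes the two sequences identical.
b_shifted = [y - 2 for y in b]
d_shifted = euclidean(a, b_shifted)
```

The raw distance is about 4.47 although a person would call the two shapes identical; once the offset is removed, the distance drops to zero.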

- Pattern Representation
- Two formats for temporal association rules to verify the cause-effect relation
- Forward association: C1,…,Cn → E1,…,Em
- Backward association: C1,…,Cn ← E1,…,Em

- Association rules can either be formulated as hypotheses and verified with data, or be discovered by a data mining process.
- It is still not clear what kinds of segments can represent events.
- What is the basic vocabulary for spelling out association rules?


- Noise Removal and Data Smoothing
- Commonly-used smoothing techniques, such as moving averages, often lag or miss the most significant peaks and bottoms.
- These peaks and bottoms can be very meaningful, and smoothing or removing them can lose a great deal of information.

- Little previous work takes smoothing as an integral part of the process of pattern definition, index construction, and query processing.

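A small sketch shows how a moving average treats a peak. The trailing-window variant and the toy series below are assumptions chosen for illustration; the presentation does not specify a particular moving-average scheme.

```python
def moving_average(series, window):
    """Trailing moving average.

    The first window-1 outputs average over the shorter prefix that is
    available, so the output has the same length as the input.
    """
    smoothed = []
    for i in range(len(series)):
        start = max(0, i - window + 1)
        chunk = series[start:i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


series = [1, 1, 1, 10, 1, 1, 1]
smoothed = moving_average(series, 3)
```

The single peak of height 10 at index 3 becomes a plateau of height 4 spread over indices 3 to 5: its magnitude is lost and its effect lingers after the peak has passed.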

The Landmark Data Model and Similarity Model

- The Landmark Concept
- Episodic memory: humans and animals depend on landmarks to organize their spatial memory.
- Landmarks: (times, events)
- Using landmarks instead of the raw data for processing
- A point is an N-th order landmark of a curve if the N-th order derivative is 0 at that point.
- First-order landmarks are local maxima and local minima; second-order landmarks are inflection points.

- Tradeoff
- The more different types of landmarks in use, the more accurately a time series will be represented.
- Using fewer landmarks will result in storage savings and smaller index trees.
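For sampled data there is no derivative to evaluate, so a common stand-in is a sign change between consecutive differences. The sketch below extracts first-order landmarks that way. Keeping the two endpoints and ignoring flat plateaus are assumptions made here for simplicity, and `first_order_landmarks` is a hypothetical helper name.

```python
def first_order_landmarks(series):
    """Return (index, value) pairs for the local maxima and minima.

    A point counts as a local extremum when the difference to its left
    and the difference to its right have opposite signs. Endpoints are
    kept (an assumption) so the curve can be sketched from the landmarks.
    """
    n = len(series)
    if n == 0:
        return []
    landmarks = [(0, series[0])]
    for i in range(1, n - 1):
        left = series[i] - series[i - 1]
        right = series[i + 1] - series[i]
        if left * right < 0:  # slope changes sign: local max or min
            landmarks.append((i, series[i]))
    if n > 1:
        landmarks.append((n - 1, series[n - 1]))
    return landmarks


series = [1, 2, 3, 4, 3, 5, 2]
landmarks = first_order_landmarks(series)
```

Here the points at indices 1 and 2 lie on a monotone rise and are dropped, while the turning points at indices 3, 4, and 5 are kept.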

- Stock market data
- Almost half of the record
- The normalized error is reasonably small when the curve is reconstructed from the landmarks.
- The more volatile the time series, the less significant the higher-order landmarks.

- Smoothing
- Minimal Distance/Percentage Principle (MDPP)
- A minimal distance D and a minimal percentage P
- Remove landmarks $(x_i, y_i)$ and $(x_{i+1}, y_{i+1})$ if $x_{i+1} - x_i < D$ and $\frac{|y_{i+1} - y_i|}{(|y_i| + |y_{i+1}|)/2} < P$.

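A minimal single-pass sketch of MDPP smoothing follows. It assumes the removal condition is "time gap below D and amplitude change below P relative to the mean magnitude of the two landmarks"; the greedy left-to-right scan and the function name `mdpp` are also assumptions, not details taken from the presentation.

```python
def mdpp(landmarks, D, P):
    """Drop adjacent landmark pairs that are close in time and amplitude.

    landmarks: list of (x, y) pairs sorted by x.
    D: minimal distance in time; P: minimal percentage as a fraction.
    Assumed condition: x1 - x0 < D and |y1 - y0| < P * (|y0| + |y1|) / 2.
    """
    kept = []
    i = 0
    while i < len(landmarks):
        if i + 1 < len(landmarks):
            (x0, y0), (x1, y1) = landmarks[i], landmarks[i + 1]
            mean_magnitude = (abs(y0) + abs(y1)) / 2
            close_in_time = (x1 - x0) < D
            # Multiplied form avoids dividing by zero when both y are 0.
            small_change = abs(y1 - y0) < P * mean_magnitude
            if close_in_time and small_change:
                i += 2  # remove both landmarks of the pair
                continue
        kept.append(landmarks[i])
        i += 1
    return kept


landmarks = [(0, 10.0), (5, 20.0), (6, 19.5), (9, 25.0), (15, 8.0)]
smoothed = mdpp(landmarks, D=3, P=0.05)
```

In this toy input the small dip from 20.0 to 19.5 one time step apart is treated as noise and removed, while the overall rise to 25.0 and the fall to 8.0 are preserved exactly, with no lag.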

- The effect of the MDPP

- Normalized error generated by the MDPP and DFT

- Transformations
- Six kinds of transformations
- Shifting: $SH_k(f)$ such that $SH_k(f(t)) = f(t) + k$, where $k$ is a constant.
- Uniform Amplitude Scaling: $UAS_k(f)$ such that $UAS_k(f(t)) = k f(t)$, where $k$ is a constant.
- Uniform Time Scaling: $UTS_k(f)$ such that $UTS_k(f(t)) = f(kt)$, where $k$ is a positive constant.
- Uniform Bi-scaling: $UBS_k(f)$ such that $UBS_k(f(t)) = k f(t/k)$, where $k$ is a positive constant.
- Time Warping: $TW_g(f)$ such that $TW_g(f(t)) = f(g(t))$, where $g$ is positive and monotonically increasing.
- Non-uniform Amplitude Scaling: $NAS_g(f)$ such that $NAS_g(f(t)) = g(t)$, where for every $t$, $g'(t) = 0$ if and only if $f'(t) = 0$.


- The more transformations a similarity model includes, the more powerful it is.

- These transformations can be composed to form new transformations.
- The composition order is flexible:
- The composition is idempotent:

- Two time series are defined to be similar if they differ only by a transformation.
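The first five transformations can be sketched as higher-order functions that take a series `f` (modeled as a Python callable of time) and return the transformed series. The function names are illustrative. Non-uniform amplitude scaling is left out because it replaces `f` with an arbitrary `g` subject to a condition on derivatives, which a short numeric sketch cannot check.

```python
def shift(f, k):
    """SH_k: add a constant k to every value."""
    return lambda t: f(t) + k


def uniform_amplitude_scale(f, k):
    """UAS_k: multiply every value by a constant k."""
    return lambda t: k * f(t)


def uniform_time_scale(f, k):
    """UTS_k: compress or stretch time by a positive constant k."""
    return lambda t: f(k * t)


def uniform_bi_scale(f, k):
    """UBS_k: scale amplitude and time together by a positive constant k."""
    return lambda t: k * f(t / k)


def time_warp(f, g):
    """TW_g: re-time the series through a monotonically increasing g."""
    return lambda t: f(g(t))


samples = [1, 2, 3, 4, 3]
f = lambda t: samples[t]
shifted = shift(f, 2)
shifted_samples = [shifted(t) for t in range(len(samples))]

square = lambda t: t * t
# Transformations compose like ordinary functions.
composed = shift(uniform_amplitude_scale(square, 3), 1)
```

Applying `shift(f, 2)` to the sequence 1,2,3,4,3 yields 3,4,5,6,5, so under a similarity model that includes shifting the two sequences count as similar.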

- Landmark Similarity
- Dissimilarity measure
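- Given two sequences of landmarks $L = \langle L_1, \ldots, L_n \rangle$ and $L' = \langle L'_1, \ldots, L'_n \rangle$, where $L_i = (x_i, y_i)$ and $L'_i = (x'_i, y'_i)$, the distance between the $k$-th landmarks is defined by a time component and an amplitude component.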
- The distance between the two sequences is
- We define


- A landmark similarity measure is a binary relation on time series segments defined by a 5-tuple $LSM = \langle D, P, T, \epsilon_{time}, \epsilon_{amp} \rangle$.

- Given two time series sequences $s_1$ and $s_2$, let $L_1$ and $L_2$ be their landmark sequences after MDPP(D, P) smoothing.
- $(s_1, s_2) \in LSM$ if and only if $|L_1| = |L_2|$ and there exist two parameterized transformations $T_1$ and $T_2$ of $T$ whose dissimilarity satisfies $\delta_{time}(T_1(L_1), T_2(L_2)) < \epsilon_{time}$ and $\delta_{amp}(T_1(L_1), T_2(L_2)) < \epsilon_{amp}$.

Data Representation

- Family of Time Series Segments
- Equivalent under the six transformations
- Replacing naïve landmark coordinates with various features of landmarks that are invariant under these transformations
- $F = \{y, h, v, hr, vr, vhr, pv\}$, where $h_i = x_i - x_{i-1}$, $v_i = y_i - y_{i-1}$, $hr_i = h_{i+1}/h_i$, $vr_i = v_{i+1}/v_i$, $vhr_i = v_i/h_i$, and $pv_i = v_i/y_i$

- Invariant features under transformations

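The feature definitions above can be computed directly from a landmark sequence. The sketch below does so and then checks one invariance that follows from the algebra: the ratio `vr` is unchanged when the amplitudes are shifted and uniformly scaled, while the raw `y` values are not. The helper name, the dictionary layout, and the sample landmarks are assumptions; the code also assumes no `h`, `v`, or `y` used as a divisor is zero.

```python
def landmark_features(landmarks):
    """Compute the features y, h, v, hr, vr, vhr, pv of a landmark list.

    landmarks: list of (x, y) pairs sorted by x. Features defined from
    landmark i and i-1 are stored at list position i-1.
    """
    xs = [x for x, _ in landmarks]
    ys = [y for _, y in landmarks]
    n = len(landmarks)
    h = [xs[i] - xs[i - 1] for i in range(1, n)]  # time gaps
    v = [ys[i] - ys[i - 1] for i in range(1, n)]  # amplitude changes
    return {
        "y": ys,
        "h": h,
        "v": v,
        "hr": [h[j + 1] / h[j] for j in range(len(h) - 1)],
        "vr": [v[j + 1] / v[j] for j in range(len(v) - 1)],
        "vhr": [v[j] / h[j] for j in range(len(h))],
        "pv": [v[j] / ys[j + 1] for j in range(len(v))],  # v_i / y_i
    }


original = [(0, 1.0), (3, 4.0), (4, 3.0), (5, 5.0), (6, 2.0)]
# Shift by 2, then scale amplitudes by 3; time coordinates are untouched.
transformed = [(x, 3.0 * (y + 2.0)) for x, y in original]

features_original = landmark_features(original)
features_transformed = landmark_features(transformed)
```

Comparing `features_original["vr"]` with `features_transformed["vr"]` gives the same values, which is why indexing such features rather than raw coordinates lets one query match a whole family of segments.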

Conclusion

- Landmark Model
- An instance selection system for time series
- This integrates similarity measures, data representation and smoothing techniques in a single framework.
- Minimal Distance/Percentage Principle (MDPP): The smoothing method for the Landmark Model

- This also supports a generalized similarity model which can ignore differences corresponding to six transformations.
- Intuitive to humans
