The Landmark Model: An Instance Selection Method for Time Series Data

C.-S. Perng, S. R. Zhang, and D. S. Parker

Instance Selection and Construction for Data Mining, Chapter 7, pp. 113-130

Cho, Dong-Yeon

Introduction

• Complexity

• Patterns: continuous time series segments with particular features

• The reflection of events in time series is better represented by patterns.

• The complexity of processing patterns

• The number of all possible segments for a time series of length N is N(N+1)/2.

• A simple inspection of each of these segments takes O(N^3) time.

• Good instance selection algorithms are especially helpful here, since they can greatly reduce complexity by reducing the volume of data.
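The segment count and the cubic cost can be checked on a toy series. The following is a minimal sketch; the function names `count_segments` and `enumerate_segments` are illustrative and do not come from the slides, and "simple inspection" is assumed to mean touching every point of every segment once.

```python
def count_segments(n: int) -> int:
    """Number of contiguous, non-empty segments of a series of length n."""
    return n * (n + 1) // 2


def enumerate_segments(series):
    """Yield every contiguous segment as a (start, end, values) triple."""
    n = len(series)
    for start in range(n):
        for end in range(start + 1, n + 1):
            yield start, end, series[start:end]


series = [1, 2, 3, 4, 3]
segments = list(enumerate_segments(series))

# Touching every point of every segment costs the sum of all segment
# lengths, which equals N(N+1)(N+2)/6 and therefore grows as O(N^3).
total_work = sum(len(values) for _, _, values in segments)

print(len(segments), count_segments(len(series)))  # 15 15
print(total_work)                                  # 35
```

Doubling N roughly multiplies the total work by eight, which is why reducing the number of points before pattern processing pays off.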

• Similarity Model

• Euclidean distance does not match human intuition.

• Example: 1,2,3,4,3 and 3,4,5,6,5 have the same shape (the second is the first shifted up by 2), yet their Euclidean distance is nonzero.

• Previous works

• None of these proposed techniques supports a similarity model that can both capture the similarity and support efficient pattern querying of time series.
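The mismatch between Euclidean distance and intuition can be made concrete with the two example sequences. This is only an illustrative sketch: mean removal (`remove_offset`) is a stand-in chosen here to show that the two shapes coincide once a vertical shift is ignored; it is not the similarity model proposed in the chapter.

```python
import math


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def remove_offset(seq):
    """Subtract the mean so that a constant vertical shift disappears."""
    mean = sum(seq) / len(seq)
    return [value - mean for value in seq]


a = [1, 2, 3, 4, 3]
b = [3, 4, 5, 6, 5]

raw = euclidean(a, b)                                    # sqrt(5 * 2**2) = sqrt(20)
aligned = euclidean(remove_offset(a), remove_offset(b))  # zero up to rounding

print(raw, aligned)
```

The raw distance is about 4.47 even though the two curves look identical to a human reader, while the offset-free distance is zero up to floating-point rounding.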

• Pattern Representation

• Two formats for temporal association rules to verify the cause-effect relation

• Forward association: C1,…,Cn ⇒ E1,…,Em

• Backward association: C1,…,Cn ⇐ E1,…,Em

• Association rules can either be formulated as hypotheses and verified with data, or be discovered by a data mining process.

• It is still not clear what kinds of segments can represent events.

• What is the basic vocabulary for spelling out association rules?

• Noise Removal and Data Smoothing

• Commonly-used smoothing techniques, such as moving averages, often lag or miss the most significant peaks and bottoms.

• These peaks and bottoms can be very meaningful, and smoothing or removing them can lose a great deal of information.

• Little previous work takes smoothing as an integral part of the process of pattern definition, index construction, and query processing.

The Landmark Data Model and Similarity Model

• The Landmark Concept

• Episodic memory: humans and animals depend on landmarks in organizing their spatial memory

• Landmarks: (times, events)

• Using landmarks instead of the raw data for processing

• A point is an N-th order landmark of a curve if the N-th order derivative is 0 at that point.

• Local maxima and local minima are first-order landmarks; inflection points are second-order landmarks.

• The more different types of landmarks in use, the more accurately a time series will be represented.

• Using fewer landmarks will result in storage savings and smaller index trees.
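For sampled data the derivative condition has to be approximated. The sketch below picks first-order landmarks (local maxima and minima) by looking for a sign change in consecutive differences. The function name `first_order_landmarks`, the decision to keep both endpoints, and the decision to ignore flat plateaus are assumptions made for illustration, not details from the slides.

```python
def first_order_landmarks(ys):
    """Return (index, value) pairs at local maxima and minima.

    Discrete stand-in for "first derivative is 0": the slope changes
    sign around the point. Endpoints are kept so the series can still
    be sketched from its landmarks; flat plateaus are not reported.
    """
    if len(ys) < 3:
        return list(enumerate(ys))
    landmarks = [(0, ys[0])]
    for i in range(1, len(ys) - 1):
        left = ys[i] - ys[i - 1]
        right = ys[i + 1] - ys[i]
        if left * right < 0:  # slope changes sign: a peak or a bottom
            landmarks.append((i, ys[i]))
    landmarks.append((len(ys) - 1, ys[-1]))
    return landmarks


prices = [1, 2, 3, 4, 3, 2, 3, 5, 4]
print(first_order_landmarks(prices))
# [(0, 1), (3, 4), (5, 2), (7, 5), (8, 4)]
```

Nine raw points shrink to five landmarks here, which illustrates the storage saving mentioned above.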

• Stock market data

• Almost half of the record

• The normalized error is reasonably small when the curve is reconstructed from the landmarks.

• The more volatile the time series, the less significant the higher-order landmarks.

• Smoothing

• Minimal Distance/Percentage Principle (MDPP)

• A minimal distance D and a minimal percentage P

• Remove landmarks (x_i, y_i) and (x_{i+1}, y_{i+1}) if x_{i+1} - x_i < D and |y_{i+1} - y_i| / ((|y_i| + |y_{i+1}|) / 2) < P.
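A minimal sketch of MDPP(D, P) smoothing follows. It assumes the removal test is x_{i+1} - x_i < D together with |y_{i+1} - y_i| / ((|y_i| + |y_{i+1}|) / 2) < P. That test, the function name `mdpp`, and the single left-to-right pass are assumptions for illustration rather than a transcription of the authors' procedure.

```python
def mdpp(landmarks, min_distance, min_percentage):
    """One left-to-right pass of MDPP(D, P) over (x, y) landmarks.

    Assumed test: the adjacent pair (x0, y0), (x1, y1) is dropped when
    x1 - x0 < D and |y1 - y0| / ((|y0| + |y1|) / 2) < P.
    """
    kept = []
    i = 0
    while i < len(landmarks):
        if i + 1 < len(landmarks):
            (x0, y0), (x1, y1) = landmarks[i], landmarks[i + 1]
            mean_abs = (abs(y0) + abs(y1)) / 2
            close_in_time = (x1 - x0) < min_distance
            # Both values being 0 means no movement at all, so it counts as small.
            small_move = mean_abs == 0 or abs(y1 - y0) / mean_abs < min_percentage
            if close_in_time and small_move:
                i += 2  # skip both landmarks of the insignificant wiggle
                continue
        kept.append(landmarks[i])
        i += 1
    return kept


landmarks = [(0, 10.0), (5, 20.0), (6, 19.8), (7, 20.1), (12, 12.0)]
print(mdpp(landmarks, min_distance=3, min_percentage=0.05))
# [(0, 10.0), (7, 20.1), (12, 12.0)]
```

In the example the pair (5, 20.0), (6, 19.8) is one time unit apart and moves by about 1%, so both points are removed, while the large moves on either side survive.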

• Transformations

• Six kinds of transformations

• Shifting: SHk(f) such that SHk(f(t))=f(t)+k where k is a constant.

• Uniform Amplitude Scaling: UASk(f) such that UASk(f(t))=kf(t) where k is a constant.

• Uniform Time Scaling: UTSk(f) such that UTSk(f(t))=f(kt) where k is a positive constant.

• Uniform Bi-scaling: UBSk(f) such that UBSk(f(t))=kf(t/k) where k is a positive constant.

• Time Warping: TWg(f) such that TWg(f(t))=f(g(t)) where g is positive and monotonically increasing.

• Non-uniform Amplitude Scaling: NASg(f) such that NASg(f(t))=g(t) where for every t, g´(t)=0 if and only if f´(t)=0.
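The six transformations can be written as higher-order functions that take a curve f and return the transformed curve. This is a sketch with hypothetical function names. For non-uniform amplitude scaling it uses one sufficient construction, g(t) = h(f(t)) with h strictly monotone and h' never 0, which guarantees g'(t) = 0 exactly when f'(t) = 0; the definition above allows other choices of g as well.

```python
import math


def shifting(f, k):                      # SH_k: f(t) + k
    return lambda t: f(t) + k


def uniform_amplitude_scaling(f, k):     # UAS_k: k * f(t)
    return lambda t: k * f(t)


def uniform_time_scaling(f, k):          # UTS_k: f(k * t), k > 0
    return lambda t: f(k * t)


def uniform_bi_scaling(f, k):            # UBS_k: k * f(t / k), k > 0
    return lambda t: k * f(t / k)


def time_warping(f, g):                  # TW_g: f(g(t)), g positive and increasing
    return lambda t: f(g(t))


def non_uniform_amplitude_scaling(f, h):
    # Assumed construction: g(t) = h(f(t)) with h strictly monotone, h' != 0.
    return lambda t: h(f(t))


def square(t):
    return t * t


print(shifting(square, 3)(2))                   # 7
print(uniform_amplitude_scaling(square, 2)(2))  # 8
print(uniform_time_scaling(square, 2)(2))       # 16
print(uniform_bi_scaling(square, 2)(2))         # 2.0
print(time_warping(square, math.exp)(0))        # 1.0
print(non_uniform_amplitude_scaling(square, math.exp)(2))
```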

• Landmark Similarity

• Dissimilarity measure

• Given two sequences of landmarks L= L1,…,Ln and L´= L´1,…,L´n where Li=(xi, yi) and L´i=(x´i, y´i), the distance between the k-th landmark is defined by where

• The distance between the two sequences is

• We define

• Given two time series sequences s1 and s2, let L1 and L2 be the landmark sequences after MDPP(D, P) smoothing.

• (s1, s2)LMS if and only if |L1|=|L2| and there exist two parameterized transformations T1 and T2 of T whose dissimilarity satisfies time(T1(L1), T2(L2)) < time and amp(T1(L1), T2(L2)) < amp.

Data Representation

• Family of Time Series Segments

• Equivalent under the six transformations

• Replacing naïve landmark coordinates with various features of landmarks that are invariant under these transformations

• F = {y, h, v, hr, vr, vhr, pv}, where

• h_i = x_i - x_{i-1}

• v_i = y_i - y_{i-1}

• hr_i = h_{i+1} / h_i

• vr_i = v_{i+1} / v_i

• vhr_i = v_i / h_i

• pv_i = v_i / y_i

• Invariant features under transformations
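The feature definitions above translate directly into code, and a few invariances can be checked numerically. This is a sketch under stated assumptions: `landmark_features` and `same` are hypothetical helper names, and the input is assumed to have strictly increasing x values and no zero v or y values, so no division by zero occurs. Only three invariances that follow directly from the definitions are checked.

```python
import math


def landmark_features(landmarks):
    """Compute h, v, hr, vr, vhr and pv from a list of (x, y) landmarks.

    h, v, vhr and pv start at the second landmark (they need a
    predecessor); hr and vr also need a successor, so they are shorter.
    """
    xs = [x for x, _ in landmarks]
    ys = [y for _, y in landmarks]
    h = [xs[i] - xs[i - 1] for i in range(1, len(xs))]
    v = [ys[i] - ys[i - 1] for i in range(1, len(ys))]
    return {
        "h": h,
        "v": v,
        "hr": [h[i + 1] / h[i] for i in range(len(h) - 1)],
        "vr": [v[i + 1] / v[i] for i in range(len(v) - 1)],
        "vhr": [vi / hi for vi, hi in zip(v, h)],
        "pv": [v[i] / ys[i + 1] for i in range(len(v))],
    }


def same(a, b):
    """Element-wise floating-point comparison of two feature lists."""
    return len(a) == len(b) and all(math.isclose(p, q) for p, q in zip(a, b))


base = [(0, 2.0), (2, 6.0), (5, 3.0), (6, 9.0)]
shifted = [(x, y + 10.0) for x, y in base]    # shifting with k = 10
scaled = [(x, 3.0 * y) for x, y in base]      # amplitude scaled by 3
stretched = [(2.0 * x, y) for x, y in base]   # landmark x-coordinates doubled

f0 = landmark_features(base)
print(same(f0["v"], landmark_features(shifted)["v"]))     # v ignores shifting
print(same(f0["vr"], landmark_features(scaled)["vr"]))    # vr ignores amplitude scaling
print(same(f0["hr"], landmark_features(stretched)["hr"])) # hr ignores time scaling
print(same(f0["pv"], landmark_features(shifted)["pv"]))   # pv does change under shifting
```

Indexing by ratios rather than raw coordinates is what lets a query match segments that differ only by one of these transformations.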

Conclusion

• Landmark Model

• An instance selection system for time series

• This integrates similarity measures, data representation and smoothing techniques in a single framework.

• Minimal Distance/Percentage Principle (MDPP): The smoothing method for the Landmark Model

• This also supports a generalized similarity model which can ignore differences corresponding to six transformations.

• Intuitive to humans