
The Landmark Model: An Instance Selection Method for Time Series Data

C.-S. Perng, S. R. Zhang, and D. S. Parker

Instance Selection and Construction for Data Mining, Chapter 7, pp. 113-130

Cho, Dong-Yeon

Introduction

  • Complexity

    • Patterns: continuous time series segments with particular features

    • The reflection of events in time series is better represented by patterns.

    • The complexity of processing patterns

      • The number of all possible segments for a time series of length N is N(N+1)/2.

      • A simple inspection of all of these segments takes O(N^3) time in total.

    • Good instance selection algorithms are especially helpful here, since they can greatly reduce complexity by reducing the volume of data.
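The two counts above can be checked by brute force. The following is a minimal sketch, assuming that "inspecting" a segment means scanning each of its points once; the function names are illustrative and not from the chapter.

```python
def count_segments(n: int) -> int:
    """Count contiguous segments of a length-n series by enumerating (start, end) pairs."""
    return sum(1 for start in range(n) for end in range(start, n))


def total_inspection_cost(n: int) -> int:
    """Total points visited if every segment is scanned once (assumed cost model)."""
    return sum(end - start + 1 for start in range(n) for end in range(start, n))


for n in (5, 10, 100):
    # Number of segments matches the closed form N(N+1)/2.
    assert count_segments(n) == n * (n + 1) // 2
    # Summing the segment lengths gives N(N+1)(N+2)/6, which grows as N^3.
    assert total_inspection_cost(n) == n * (n + 1) * (n + 2) // 6
```

Under this cost model the cubic growth comes from scanning roughly N^2/2 segments whose average length is proportional to N.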

  • Similarity Model

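    • Euclidean distance does not match human intuition.

      • Example: 1,2,3,4,3 and 3,4,5,6,5 have the same shape (the second is the first shifted up by 2), yet their Euclidean distance is nonzero.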

    • Previous works

      • None of these proposed techniques supports a similarity model that can both capture the similarity and support efficient pattern querying of time series.
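The mismatch between Euclidean distance and intuition can be made concrete with the two sequences from the slide. This is a small sketch; the third series `c` is a hypothetical example added here for contrast and does not come from the chapter.

```python
import math


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


a = [1, 2, 3, 4, 3]
b = [3, 4, 5, 6, 5]   # same shape as a, shifted up by 2
c = [1, 2, 3, 4, 5]   # hypothetical: keeps rising where a turns down

d_ab = euclidean(a, b)   # sqrt(5 * 2**2) = sqrt(20)
d_ac = euclidean(a, c)   # only the last point differs, by 2

# After undoing the shift, a and b coincide exactly.
b_unshifted = [value - 2 for value in b]
d_after_shift = euclidean(a, b_unshifted)
```

Here `c` ends up closer to `a` than `b` does, even though `b` has the same peak-then-drop shape as `a` and `c` does not.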

  • Pattern Representation

    • Two formats for temporal association rules to verify the cause-effect relation

      • Forward association: C1,…,Cn → E1,…,Em

      • Backward association: C1,…,Cn ← E1,…,Em

    • Association rules can either be formulated as hypotheses and verified with data, or be discovered by a data mining process.

    • It is still not clear what kinds of segments can represent events.

      • What is the basic vocabulary for spelling out association rules?

  • Noise Removal and Data Smoothing

    • Commonly-used smoothing techniques, such as moving averages, often lag or miss the most significant peaks and bottoms.

      • These peaks and bottoms can be very meaningful, and smoothing or removing them can lose a great deal of information.

    • Little previous work takes smoothing as an integral part of the process of pattern definition, index construction, and query processing.
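The lagging and flattening effect of a moving average can be seen on a toy series. The sketch below assumes a trailing (causal) window of 3 points; the series and the window size are illustrative choices, not data from the chapter.

```python
def moving_average(series, window):
    """Trailing moving average; early outputs average only the points seen so far."""
    smoothed = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


series = [1, 2, 3, 4, 10, 4, 3, 2, 1]   # a single sharp peak at index 4
smoothed = moving_average(series, 3)

raw_peak_index = series.index(max(series))
smoothed_peak_index = smoothed.index(max(smoothed))
```

In this example the raw peak of 10 at index 4 becomes a smoothed maximum of 6.0 at index 5, so the peak is both delayed and much lower. A centered window would remove the delay but would still flatten the peak.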

The Landmark Data Model and Similarity Model

  • The Landmark Concept

    • Episodic memory: humans and animals depend on landmarks to organize their spatial memory.

    • Landmarks: (times, events)

      • Using landmarks instead of the raw data for processing

      • A point is an N-th order landmark of a curve if the N-th order derivative is 0 at that point.

      • First-order landmarks are local maxima and local minima; second-order landmarks are inflection points.

    • Tradeoff

      • The more different types of landmarks in use, the more accurately a time series will be represented.

      • Using fewer landmarks will result in storage savings and smaller index trees.
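For sampled data, first-order landmarks (local maxima and minima) can be approximated by looking for sign changes in the discrete slope. The following is a minimal sketch under stated assumptions: the endpoints are always kept, and flat plateaus are ignored. Neither choice is taken from the chapter.

```python
def first_order_landmarks(ys):
    """Return (index, value) pairs at the endpoints and at local maxima/minima."""
    if len(ys) < 2:
        return list(enumerate(ys))
    landmarks = [(0, ys[0])]                   # assumption: keep the first point
    for i in range(1, len(ys) - 1):
        left_slope = ys[i] - ys[i - 1]
        right_slope = ys[i + 1] - ys[i]
        if left_slope * right_slope < 0:       # slope changes sign: peak or bottom
            landmarks.append((i, ys[i]))
    landmarks.append((len(ys) - 1, ys[-1]))    # assumption: keep the last point
    return landmarks


series = [1, 2, 3, 4, 3, 2, 5, 6, 4]
landmarks = first_order_landmarks(series)
```

On this series, 9 raw points reduce to 5 landmarks, which illustrates the storage side of the tradeoff described above.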

  • Stock Market Data

    • Almost half of the record

    • The normalized error is reasonably small when the curve is reconstructed from the landmarks.

    • The more volatile the time series, the less significant the higher-order landmarks.

  • Smoothing

    • Minimal Distance/Percentage Principle (MDPP)

      • A minimal distance D and a minimal percentage P

      • Remove landmarks (x_i, y_i) and (x_{i+1}, y_{i+1}) if x_{i+1} - x_i < D and |y_{i+1} - y_i| / ((|y_i| + |y_{i+1}|)/2) < P.
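MDPP smoothing can be sketched as a single left-to-right pass over the landmark sequence. The removal test used here is an assumption: two adjacent landmarks are dropped when they are closer than D in time and their amplitude difference, relative to their mean absolute value, is below P. This form should be checked against the chapter before being relied on, and a fuller implementation may need to re-examine neighbors after each removal.

```python
def mdpp(landmarks, D, P):
    """Single-pass sketch of MDPP(D, P) smoothing over (x, y) landmarks."""
    kept = []
    i = 0
    while i < len(landmarks):
        if i + 1 < len(landmarks):
            (x0, y0), (x1, y1) = landmarks[i], landmarks[i + 1]
            mean_magnitude = (abs(y0) + abs(y1)) / 2
            close_in_time = (x1 - x0) < D
            # Guard against division by zero when both values are 0.
            small_change = mean_magnitude > 0 and abs(y1 - y0) / mean_magnitude < P
            if close_in_time and small_change:
                i += 2          # drop both landmarks of the insignificant wiggle
                continue
        kept.append(landmarks[i])
        i += 1
    return kept


# A small dip (15.0 -> 14.8) in the middle of a rise from 10 to 20.
raw = [(0, 10.0), (4, 15.0), (5, 14.8), (10, 20.0)]
smoothed = mdpp(raw, D=3, P=0.05)
```

With D = 3 and P = 0.05, the dip is removed and only the start and end of the rise remain.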

  • Transformations

    • Six kinds of transformations

      • Shifting: SH_k(f) such that SH_k(f(t)) = f(t) + k, where k is a constant.

      • Uniform Amplitude Scaling: UAS_k(f) such that UAS_k(f(t)) = k f(t), where k is a constant.

      • Uniform Time Scaling: UTS_k(f) such that UTS_k(f(t)) = f(kt), where k is a positive constant.

      • Uniform Bi-scaling: UBS_k(f) such that UBS_k(f(t)) = k f(t/k), where k is a positive constant.

      • Time Warping: TW_g(f) such that TW_g(f(t)) = f(g(t)), where g is positive and monotonically increasing.

      • Non-uniform Amplitude Scaling: NAS_g(f) such that NAS_g(f(t)) = g(t), where for every t, g'(t) = 0 if and only if f'(t) = 0.
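Five of the six transformations are simple enough to write as higher-order functions. The sketch below treats a time series as a function of continuous time; the function names are illustrative, and Non-uniform Amplitude Scaling is omitted because it replaces f by a different function g subject to a derivative condition rather than by a formula in f.

```python
import math


def shifting(f, k):
    """SH_k: add a constant k to every value."""
    return lambda t: f(t) + k


def uniform_amplitude_scaling(f, k):
    """UAS_k: multiply every value by a constant k."""
    return lambda t: k * f(t)


def uniform_time_scaling(f, k):
    """UTS_k: rescale the time axis by a positive constant k."""
    return lambda t: f(k * t)


def uniform_bi_scaling(f, k):
    """UBS_k: scale time and amplitude together by a positive constant k."""
    return lambda t: k * f(t / k)


def time_warping(f, g):
    """TW_g: re-time f through a positive, monotonically increasing g."""
    return lambda t: f(g(t))


f = math.sin
peak = math.pi / 2                               # sin has a maximum of 1 here

shifted = shifting(f, 3.0)                       # same peak time, height 4
stretched = uniform_time_scaling(f, 2.0)         # peak moves to pi/4
biscaled = uniform_bi_scaling(f, 2.0)            # peak moves to pi, height 2
warped = time_warping(f, lambda t: t ** 2)       # increasing for t > 0
```

In each case the peak survives as a peak: only its time, its height, or both change.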

  • Landmark Similarity

    • Dissimilarity measure

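      • Given two sequences of landmarks L = ⟨L_1, …, L_n⟩ and L' = ⟨L'_1, …, L'_n⟩, where L_i = (x_i, y_i) and L'_i = (x'_i, y'_i), the distance between the k-th landmarks has a time component and an amplitude component.

      • The distance between the two sequences likewise has a time component and an amplitude component, aggregated over all n landmarks.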

  • Given two time series s_1 and s_2, let L_1 and L_2 be their landmark sequences after MDPP(D, P) smoothing.

  • (s_1, s_2) ∈ LMS if and only if |L_1| = |L_2| and there exist two parameterized transformations T_1 and T_2 from the transformation set T such that both the time dissimilarity and the amplitude dissimilarity between T_1(L_1) and T_2(L_2) are below their respective thresholds.

Data Representation

  • Family of Time Series Segments

    • Equivalent under the six transformations

      • Replacing naïve landmark coordinates with various features of landmarks that are invariant under these transformations

      • F = {y, h, v, hr, vr, vhr, pv}, where

        • h_i = x_i - x_{i-1}

        • v_i = y_i - y_{i-1}

        • hr_i = h_{i+1} / h_i

        • vr_i = v_{i+1} / v_i

        • vhr_i = v_i / h_i

        • pv_i = v_i / y_i

    • Invariant features under transformations
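The feature set F can be computed directly from a landmark sequence, and invariance can then be checked numerically for a few transformations. This is a sketch on a hypothetical four-landmark sequence; it assumes consecutive landmarks differ in both x and y and that no y value is 0, so that every ratio is defined.

```python
def landmark_features(landmarks):
    """Compute the features y, h, v, hr, vr, vhr, pv from (x, y) landmarks."""
    xs = [x for x, _ in landmarks]
    ys = [y for _, y in landmarks]
    n = len(landmarks)
    h = [xs[i] - xs[i - 1] for i in range(1, n)]      # h_i = x_i - x_{i-1}
    v = [ys[i] - ys[i - 1] for i in range(1, n)]      # v_i = y_i - y_{i-1}
    return {
        "y": ys,
        "h": h,
        "v": v,
        "hr": [h[j + 1] / h[j] for j in range(len(h) - 1)],
        "vr": [v[j + 1] / v[j] for j in range(len(v) - 1)],
        "vhr": [vj / hj for vj, hj in zip(v, h)],
        "pv": [v[i - 1] / ys[i] for i in range(1, n)],
    }


base_landmarks = [(0, 2.0), (2, 6.0), (6, 4.0), (8, 8.0)]   # hypothetical data
base = landmark_features(base_landmarks)

# Shifting by 2, amplitude scaling by 2, and stretching the time axis by 2
# (the last corresponds to uniform time scaling with k = 1/2).
shifted = landmark_features([(x, y + 2.0) for x, y in base_landmarks])
scaled = landmark_features([(x, 2.0 * y) for x, y in base_landmarks])
stretched = landmark_features([(2 * x, y) for x, y in base_landmarks])

unchanged_by_shift = [name for name in base if base[name] == shifted[name]]
unchanged_by_scale = [name for name in base if base[name] == scaled[name]]
unchanged_by_stretch = [name for name in base if base[name] == stretched[name]]
```

On this example, hr and vr are unchanged by all three transformations, while each of the other features changes under at least one of them. Choosing the features to index therefore determines which transformations a query ignores.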

Conclusion

  • Landmark Model

    • An instance selection system for time series

    • This integrates similarity measures, data representation and smoothing techniques in a single framework.

      • Minimal Distance/Percentage Principle (MDPP): The smoothing method for the Landmark Model

    • This also supports a generalized similarity model which can ignore differences corresponding to six transformations.

    • The model is intuitive to humans.