Time Series Shapelets: A New Primitive for Data Mining

Lexiang Ye and Eamonn Keogh

University of California, Riverside



Classification

  • Classification

    • Huge interest in time series

    • Extensive applications

  • Nearest Neighbor (NN) classifier

    • Most accurate (in extensive empirical tests)

    • Robust

    • Simple
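To make the nearest-neighbor baseline concrete, here is a minimal 1-NN sketch. It assumes equal-length series and plain Euclidean distance (the slides do not name a distance measure), and the names `euclidean` and `nn_classify` are illustrative only.

```python
import math
from typing import Hashable, Optional, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length time series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nn_classify(query: Sequence[float],
                train_series: Sequence[Sequence[float]],
                train_labels: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the label of the stored series closest to the query (1-NN).

    Every query is compared with every stored training series, so the whole
    training set must be kept in memory and scanned for each classification.
    """
    best_label, best_dist = None, math.inf
    for series, label in zip(train_series, train_labels):
        d = euclidean(query, series)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label


train = [[0.0, 0.0, 0.0, 0.0], [1.0, 2.0, 1.0, 0.0]]
labels = ["flat", "bump"]
print(nn_classify([0.9, 1.8, 1.1, 0.1], train, labels))  # bump
```

Note that the loop touches the entire training set for every query, which is where the time and space cost of the NN classifier comes from.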



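Drawbacks of the NN

  • High time and space complexity: every query is compared against the entire stored training set

  • Results are not interpretable: the classifier gives no insight into why an object belongs to a class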



Solution

  • Shapelets

    • Time series subsequences that are maximally representative of a class

    • Related to distinguishing substring selection

    • Related to probe design (computational biology)



Motivating example



[Figure: leaves of false nettles and stinging nettles; a shapelet dictionary containing one shapelet, I; and a leaf decision tree that compares the distance to shapelet I with the threshold 5.1, its yes/no branches leading to the two classes (0 and 1)]
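The leaf decision tree asks how closely the best-matching part of a series resembles the shapelet. A minimal sketch, assuming the subsequence distance is the smallest Euclidean distance over all same-length windows and that the "yes" branch is taken on a strict less-than; the function names, the toy shapelet and the class labels are hypothetical, and only the threshold 5.1 comes from the figure.

```python
import math
from typing import Sequence


def subsequence_dist(series: Sequence[float], shapelet: Sequence[float]) -> float:
    """Smallest Euclidean distance between the shapelet and any same-length
    window of the series, i.e. the distance at the best-matching location."""
    m, n = len(series), len(shapelet)
    if n > m:
        raise ValueError("shapelet is longer than the series")
    return min(
        math.sqrt(sum((series[start + i] - shapelet[i]) ** 2 for i in range(n)))
        for start in range(m - n + 1)
    )


def stump_classify(series, shapelet, threshold, yes_label, no_label):
    """One-node shapelet tree: take the 'yes' branch when the series contains
    a match closer than the threshold, the 'no' branch otherwise."""
    if subsequence_dist(series, shapelet) < threshold:
        return yes_label
    return no_label


# Hypothetical shapelet and series; only the threshold 5.1 is from the figure.
shapelet = [0.0, 1.0, 0.0]
series = [5.0, 5.0, 0.0, 1.0, 0.0, 5.0]
print(stump_classify(series, shapelet, 5.1, "class 0", "class 1"))  # class 0
```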



Brute-Force Algorithm



Extract subsequences of all possible lengths

[Figure: subsequences of every length are extracted from each time series into a pool of shapelet candidates]
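A minimal sketch of how the brute-force candidate pool could be enumerated, assuming a user-chosen minimum and maximum length (`min_len` and `max_len` are illustrative parameters, and `generate_candidates` is a hypothetical name). Using a generator avoids materializing millions of candidates at once.

```python
from typing import Iterator, List, Sequence, Tuple


def generate_candidates(
    dataset: Sequence[Sequence[float]], min_len: int, max_len: int
) -> Iterator[Tuple[int, int, List[float]]]:
    """Yield (series index, start position, subsequence) for every
    subsequence whose length lies in [min_len, max_len]."""
    for idx, series in enumerate(dataset):
        for length in range(min_len, max_len + 1):
            # A series of length m has m - length + 1 windows of this length.
            for start in range(len(series) - length + 1):
                yield idx, start, list(series[start:start + length])


data = [[1.0, 2.0, 3.0, 4.0]]
for cand in generate_candidates(data, 2, 3):
    print(cand)
```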



Testing the utility of a candidate shapelet

  • Utility measure: information gain

  • Arrange the time series objects by their distance to the candidate

  • Find the optimal split point (the one that maximizes information gain)

  • Pick the candidate achieving the best utility as the shapelet

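[Figure: time series objects arranged on a line by their distance to the candidate, starting from 0, with the split point marked]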
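The steps above can be sketched as follows, assuming the threshold is placed midway between neighboring distances (our choice; the slides only say "optimal split point"). Entropy is computed from `collections.Counter`, so the same code handles more than two classes. The names `entropy` and `best_split` are illustrative.

```python
import math
from collections import Counter
from typing import Hashable, Sequence, Tuple


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_split(distances: Sequence[float],
               labels: Sequence[Hashable]) -> Tuple[float, float]:
    """Order objects by distance to a candidate and return (gain, threshold)
    for the split point with the highest information gain."""
    order = sorted(range(len(distances)), key=lambda i: distances[i])
    d = [distances[i] for i in order]
    y = [labels[i] for i in order]
    n, total = len(y), entropy(y)
    best_gain, best_threshold = 0.0, math.nan
    for k in range(1, n):  # split between sorted positions k-1 and k
        if d[k] == d[k - 1]:
            continue  # no threshold can separate tied distances
        left, right = y[:k], y[k:]
        gain = total - len(left) / n * entropy(left) - len(right) / n * entropy(right)
        if gain > best_gain:
            best_gain, best_threshold = gain, (d[k - 1] + d[k]) / 2
    return best_gain, best_threshold


print(best_split([0.5, 1.0, 4.0, 5.0], ["a", "a", "b", "b"]))  # (1.0, 2.5)
```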



Problem

  • The total number of candidates is huge

  • Example: Trace dataset

    • 200 instances, each of length 275

    • 7,480,200 shapelet candidates

    • Brute-force search takes approximately three days
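The candidate count is simple arithmetic: a series of length m has m - l + 1 windows of length l, so the pool holds |D| × Σ (m - l + 1) candidates, summed over the allowed lengths l. The figure of 7,480,200 is reproduced with lengths from 3 up to the full length 275; that minimum length of 3 is our inference from the arithmetic, not something stated on the slide. `count_candidates` is a hypothetical helper.

```python
def count_candidates(num_series: int, series_len: int, min_len: int, max_len: int) -> int:
    """Number of subsequences with lengths min_len..max_len in a dataset of
    equal-length series: each series has (series_len - l + 1) windows of length l."""
    per_series = sum(series_len - l + 1 for l in range(min_len, max_len + 1))
    return num_series * per_series


print(count_candidates(200, 275, 3, 275))  # 7480200
```

Allowing every length from 1 instead would give 200 × 275 × 276 / 2 = 7,590,000 candidates.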



Speedup

  • Distance calculations from time series objects to shapelet candidates are the most expensive part

  • Reduce the time in two ways

    • Distance Early Abandon (known idea)

    • Admissible Entropy Pruning (novel idea)
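Distance early abandoning can be sketched as below, assuming Euclidean distance. While the windows of a series are scanned, the running sum of squared differences is compared with the best match so far, and a window is dropped as soon as it can no longer win. Comparing squared sums avoids a square root per window. This is an illustrative implementation with a hypothetical name, not the authors' code.

```python
import math
from typing import Sequence


def subsequence_dist_early_abandon(series: Sequence[float],
                                   shapelet: Sequence[float]) -> float:
    """Minimum Euclidean distance between the shapelet and any window of the
    series, abandoning a window once its partial sum of squared differences
    reaches the smallest complete sum found so far."""
    m, n = len(series), len(shapelet)
    best_sq = math.inf
    for start in range(m - n + 1):
        acc = 0.0
        for i in range(n):
            acc += (series[start + i] - shapelet[i]) ** 2
            if acc >= best_sq:
                break  # this window cannot beat the best match; stop early
        else:
            best_sq = acc  # inner loop finished without abandoning: new best
    return math.sqrt(best_sq)


print(subsequence_dist_early_abandon([0.0, 10.0, 0.0, 2.0], [1.0, 1.0]))
```

In the usage line, the second window is abandoned after reaching the first window's squared sum of 82, and the third window wins with a squared sum of 2.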



Admissible Entropy Pruning

  • Information Gain

    • Traditional evaluation measure in decision trees

    • Easily generalized to the multi-class problem

  • Admissible entropy pruning reduces the number of distance calculations


[Figure: stinging nettles and false nettles arranged by distance to a candidate shapelet, starting from 0; two arrangements are annotated with information gain I = 0.42 and I = 0.29]
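One way to obtain an admissible (never underestimating) bound on information gain is sketched below for two classes. It is our own formulation and may differ in detail from the paper's procedure. For every split among the objects whose distances are already known, the remaining objects of each class are sent wholly to one side or the other, and the best resulting gain is kept. If even this optimistic gain cannot beat the best candidate so far, the remaining distance calculations for that candidate can be skipped. `gain_upper_bound` and `can_prune` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple


def _entropy(counts: Sequence[int]) -> float:
    """Shannon entropy (bits) of a vector of class counts."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)


def _gain(left: Sequence[int], right: Sequence[int]) -> float:
    """Information gain of splitting objects into the given left/right class counts."""
    total = [a + b for a, b in zip(left, right)]
    n = sum(total)
    if n == 0:
        return 0.0
    return (_entropy(total)
            - sum(left) / n * _entropy(left)
            - sum(right) / n * _entropy(right))


def gain_upper_bound(seen: Sequence[Tuple[float, int]], unseen: Tuple[int, int]) -> float:
    """Optimistic information gain for two classes labelled 0 and 1.

    seen: (distance, label) pairs already computed for this candidate.
    unseen: number of class-0 and class-1 objects whose distance is unknown.
    """
    ordered = sorted(seen)
    best = 0.0
    for k in range(len(ordered) + 1):  # the first k seen objects go left
        left: List[int] = [0, 0]
        right: List[int] = [0, 0]
        for i, (_, label) in enumerate(ordered):
            if i < k:
                left[label] += 1
            else:
                right[label] += 1
        # Weighted entropy is concave in the per-side counts, so the best
        # placement of the unseen objects is one of these four extremes.
        for zero_left in (True, False):
            for one_left in (True, False):
                lc = [left[0] + unseen[0] * zero_left, left[1] + unseen[1] * one_left]
                rc = [right[0] + unseen[0] * (not zero_left),
                      right[1] + unseen[1] * (not one_left)]
                best = max(best, _gain(lc, rc))
    return best


def can_prune(seen, unseen, best_so_far: float) -> bool:
    """True if the candidate cannot beat the best gain found so far."""
    return gain_upper_bound(seen, unseen) <= best_so_far


seen = [(1.0, 0), (2.0, 1), (3.0, 0), (4.0, 1)]
print(can_prune(seen, (0, 0), 0.42))  # True: the bound is about 0.31
```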


[Figure: classification with the shapelet dictionary (shapelet I) and the leaf decision tree (threshold 5.1), separating false nettles from stinging nettles]
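Classifying a new object means walking down the shapelet tree and computing one subsequence distance per node. The sketch below assumes a simple node layout (shapelet, threshold, yes/no children) chosen for illustration; the `Node` class is hypothetical, and the shapelets, thresholds and labels in the usage lines are made up.

```python
import math
from dataclasses import dataclass
from typing import List, Union


def subsequence_dist(series: List[float], shapelet: List[float]) -> float:
    """Smallest Euclidean distance between the shapelet and any window of the
    series (assumes the shapelet is no longer than the series)."""
    n = len(shapelet)
    return min(
        math.sqrt(sum((series[s + i] - shapelet[i]) ** 2 for i in range(n)))
        for s in range(len(series) - n + 1)
    )


@dataclass
class Node:
    """Internal node of a shapelet decision tree (hypothetical structure)."""
    shapelet: List[float]
    threshold: float
    yes: "Tree"  # followed when the distance is below the threshold
    no: "Tree"   # followed otherwise


Tree = Union[Node, str]  # a leaf is just a class label


def classify(tree: Tree, series: List[float]) -> str:
    """Walk from the root, testing one shapelet per node, until a leaf is reached."""
    while isinstance(tree, Node):
        d = subsequence_dist(series, tree.shapelet)
        tree = tree.yes if d < tree.threshold else tree.no
    return tree


tree = Node([0.0, 1.0, 0.0], 0.5, "class 0",
            Node([2.0, 2.0], 0.5, "class 1", "class 2"))
print(classify(tree, [5.0, 2.0, 2.0, 5.0]))  # class 1
```

Only the shapelets along one root-to-leaf path are ever compared with the new object, which is why classification with a shapelet tree is cheap.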



EXPERIMENTAL EVALUATION



Performance Comparison

[Figure: left, running time in seconds (up to 5×10^5) of the Brute Force and Pruning algorithms versus |D|, the number of objects in the database; right, accuracy (0.80 to 1.00) versus |D|, compared with the currently best published accuracy of 91.1%]



Projectile Points



[Figure: Arrowhead decision tree with two shapelets, I (Clovis, threshold 11.24) and II (Avonlea, threshold 85.47), separating three classes (0, 1 and 2); shapelet dictionary plotted over positions 0 to 400]



Wheat Spectrography

[Figure: one sample from each class, plotted over positions 0 to 1200]



[Figure: Wheat decision tree using six shapelets (I to VI) to separate seven classes (0 to 6), with the shapelet dictionary plotted over positions 0 to 300]



The Gun/NoGun Problem

[Figure: Gun decision tree with a single shapelet, I, and threshold 38.94, separating Gun from No Gun; shapelet dictionary plotted over positions 0 to 100]



Gait Analysis


Reduces the sensitivity to alignment

[Figure: right-toe and left-toe trajectories plotted over positions 0 to 300; Walk decision tree with a single shapelet, I, identifying Normal Walk]
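Why shapelets are less sensitive to alignment can be shown with a toy example: the shapelet distance takes the minimum over all window positions, so a pattern is found wherever it occurs, whereas whole-series Euclidean distance compares fixed positions. The data below is invented for illustration and the helper names are hypothetical.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Whole-series Euclidean distance (compares position i with position i)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def subsequence_dist(series: Sequence[float], shapelet: Sequence[float]) -> float:
    """Best-match distance of the shapelet anywhere in the series."""
    n = len(shapelet)
    return min(
        math.sqrt(sum((series[s + i] - shapelet[i]) ** 2 for i in range(n)))
        for s in range(len(series) - n + 1)
    )


bump = [0.0, 3.0, 0.0]
early = [0.0, 3.0, 0.0, 0.0, 0.0, 0.0]  # pattern near the start
late = [0.0, 0.0, 0.0, 0.0, 3.0, 0.0]   # same pattern, shifted right

print(euclidean(early, late))         # sqrt(18), about 4.24
print(subsequence_dist(early, bump))  # 0.0
print(subsequence_dist(late, bump))   # 0.0
```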



Conclusions

  • Interpretable results

  • More accurate and robust

  • Significantly faster at classification



Thank You 

Questions?

  • All of the datasets are free to download: http://www.cs.ucr.edu/~lexiangy/shapelet.html

  • Code available upon request

