
### Time Series Shapelets: A New Primitive for Data Mining

Lexiang Ye and Eamonn Keogh

University of California, Riverside

Classification

- Classification
  - Huge interest in time series
  - Extensive applications
- Nearest Neighbor
  - Most accurate (in extensive empirical tests)
  - Robust
  - Simple

Drawbacks of Nearest Neighbor (NN)

- High time and space complexity
- Results are not interpretable

Solution

- Shapelets
- Shapelets are time series subsequences that are maximally representative of a class
- Distinguishing substring selection
- Probe design (computational biology)
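Matching a shapelet against a whole time series needs a distance between a short subsequence and a longer series. A minimal sketch, assuming plain Euclidean distance with no normalization (the slides do not say which distance or preprocessing is used); the function name `subsequence_dist` is illustrative:

```python
import math


def subsequence_dist(series, shapelet):
    """Smallest Euclidean distance between `shapelet` and any
    same-length window of `series` (assumed distance measure)."""
    n, m = len(series), len(shapelet)
    if m > n:
        raise ValueError("shapelet must not be longer than the series")
    best_sq = math.inf
    for start in range(n - m + 1):
        # Squared distance between the shapelet and the window at `start`.
        sq = sum((series[start + i] - shapelet[i]) ** 2 for i in range(m))
        best_sq = min(best_sq, sq)
    return math.sqrt(best_sq)
```

A series that contains the shapelet exactly has distance 0; the further its best-matching window is from the shapelet, the larger the value.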

[Figure: leaf classification example with two classes, stinging nettles and false nettles. The shapelet dictionary holds one shapelet, I, with distance threshold 5.1. The leaf decision tree has a single node, I, whose yes/no branches lead to leaves 0 and 1.]

Testing the utility of a candidate shapelet

- Information gain
  - Arrange the time series objects by their distance to the candidate
  - Find the optimal split point
  - Pick the candidate with the best utility as the shapelet
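The steps above can be sketched as follows: sort the objects by their distance to the candidate, try each gap between consecutive distances as a split point, and keep the split with the highest information gain. This is a sketch, not the authors' code; placing the threshold at the midpoint between two neighbors is an assumption, and the names `entropy` and `best_split` are illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def best_split(distances, labels):
    """Return (information_gain, threshold) of the best split point."""
    pairs = sorted(zip(distances, labels), key=lambda p: p[0])
    ordered = [label for _, label in pairs]
    n = len(ordered)
    base = entropy(ordered)
    best_gain, best_threshold = 0.0, None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal distances
        left, right = ordered[:i], ordered[i:]
        gain = base - (i / n) * entropy(left) - ((n - i) / n) * entropy(right)
        if gain > best_gain:
            best_gain = gain
            # Assumption: put the threshold halfway between the neighbors.
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
    return best_gain, best_threshold
```

The candidate whose best split yields the largest gain over all candidates would then be chosen as the shapelet.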

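[Figure: time series objects arranged on a line by distance from the candidate (origin at 0), with the split point marked.]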

Problem

- Total number of candidates
- Trace dataset
  - 200 instances, each of length 275
  - 7,480,200 shapelet candidates
  - Approximately three days to evaluate
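The candidate count can be checked with a few lines of arithmetic. The slide does not state which subsequence lengths are considered; the sketch below assumes every length from 3 up to the full series length of 275, which happens to reproduce the figure on the slide. The function name `count_candidates` is illustrative.

```python
def count_candidates(num_series, series_len, min_len, max_len):
    """Count all subsequences with length in [min_len, max_len]
    across `num_series` series of equal length `series_len`."""
    per_series = sum(series_len - length + 1
                     for length in range(min_len, max_len + 1))
    return num_series * per_series


# Assumed length range 3..275 (not stated on the slide).
print(count_candidates(200, 275, 3, 275))  # 7480200
```

Each series contributes 273 + 272 + ... + 1 = 37,401 subsequences under this assumption, and 200 × 37,401 = 7,480,200.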


Speedup

- Distance calculations from time series objects to shapelet candidates are the most expensive part
- Reduce the time in two ways
  - Distance Early Abandon (known idea)
  - Admissible Entropy Pruning (novel idea)
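Distance early abandon can be illustrated as follows: while accumulating the squared distance for one window, stop as soon as the partial sum reaches the best distance found so far, since that window can no longer win. This is a sketch under the assumption of unnormalized Euclidean distance; the function name is illustrative.

```python
import math


def subsequence_dist_early_abandon(series, shapelet):
    """Minimum Euclidean distance from `shapelet` to any window of
    `series`, abandoning a window once it cannot beat the best so far."""
    n, m = len(series), len(shapelet)
    if m > n:
        raise ValueError("shapelet must not be longer than the series")
    best_sq = math.inf
    for start in range(n - m + 1):
        acc = 0.0
        for i in range(m):
            diff = series[start + i] - shapelet[i]
            acc += diff * diff
            if acc >= best_sq:
                break  # partial sum already too large: abandon this window
        else:
            # Loop finished without abandoning, so acc < best_sq.
            best_sq = acc
    return math.sqrt(best_sq)
```

The result is identical to the exhaustive computation; only the amount of work per window changes.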

Admissible Entropy Pruning

- Information Gain
  - The traditional split evaluation measure in decision trees
  - Easily generalized to the multi-class problem
- Pruning reduces the number of distance calculations
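One way to picture entropy pruning: after computing distances for only some objects, assume the best case for the rest by placing all unseen objects of each class at one extreme end of the distance line, and compute the information gain of that optimistic arrangement. If even this gain cannot beat the best candidate found so far, the remaining distance calculations can be skipped. The sketch below is an interpretation of the slide, not the authors' implementation; all function names and the `remaining` mapping (class label to number of unseen objects) are assumptions.

```python
import math
from collections import Counter
from itertools import product


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def best_gain(distances, labels):
    """Highest information gain over all split points."""
    pairs = sorted(zip(distances, labels), key=lambda p: p[0])
    ordered = [label for _, label in pairs]
    n = len(ordered)
    base = entropy(ordered)
    best = 0.0
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal distances
        gain = (base - (i / n) * entropy(ordered[:i])
                - ((n - i) / n) * entropy(ordered[i:]))
        best = max(best, gain)
    return best


def optimistic_gain(distances, labels, remaining):
    """Best-case gain if unseen objects land at the most helpful end.

    `remaining` maps each class label to its count of unseen objects.
    """
    near = min(distances, default=0.0) - 1.0  # position before all seen objects
    far = max(distances, default=0.0) + 1.0   # position after all seen objects
    classes = [cls for cls, count in remaining.items() if count > 0]
    bound = 0.0
    # Try every assignment of each unseen class to the near or far end.
    for ends in product((near, far), repeat=len(classes)):
        d, l = list(distances), list(labels)
        for cls, end in zip(classes, ends):
            d += [end] * remaining[cls]
            l += [cls] * remaining[cls]
        bound = max(bound, best_gain(d, l))
    return bound


def can_prune(distances, labels, remaining, best_so_far):
    """True if the candidate cannot beat the best gain seen so far."""
    return optimistic_gain(distances, labels, remaining) <= best_so_far
```

With two classes this tries both ways of sending the unseen objects to opposite ends (plus the two same-end cases, which are harmless); with more classes the number of arrangements grows as 2^k, so a practical implementation would want something smarter.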

Classification

[Figure: the leaf shapelet dictionary (shapelet I, distance threshold 5.1) and the leaf decision tree, used to classify stinging nettles versus false nettles. The tree's yes/no branches lead to leaves 0 and 1.]
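Classifying with a shapelet decision tree amounts to walking the tree: at each node, compute the distance from the query series to that node's shapelet and branch on whether it falls within the threshold. A minimal sketch, assuming unnormalized Euclidean distance and a nested-dict tree layout; the shapelet values, threshold, and class labels in the usage example are made up for illustration and are not taken from the slides.

```python
import math


def subsequence_dist(series, shapelet):
    """Minimum Euclidean distance from `shapelet` to any window of `series`."""
    n, m = len(series), len(shapelet)
    best_sq = math.inf
    for start in range(n - m + 1):
        sq = sum((series[start + i] - shapelet[i]) ** 2 for i in range(m))
        best_sq = min(best_sq, sq)
    return math.sqrt(best_sq)


def classify(series, node):
    """Follow yes/no branches until a leaf (any non-dict value) is reached."""
    while isinstance(node, dict):
        within = subsequence_dist(series, node["shapelet"]) <= node["threshold"]
        node = node["yes"] if within else node["no"]
    return node


# Hypothetical one-node tree; values are illustrative only.
tree = {
    "shapelet": [0.0, 5.0, 0.0],
    "threshold": 1.0,
    "yes": "class A",
    "no": "class B",
}
print(classify([0.0, 0.0, 5.0, 0.0, 0.0], tree))  # class A
```

Only one distance computation per tree node is needed at classification time, in contrast to comparing the query against every training object.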

Performance Comparison

[Figure: two plots against |D|, the number of objects in the database. One shows running time in seconds (axis from 0 to 5 × 10^5) for Brute Force and Pruning. The other shows accuracy (axis from 0.80 to 1.00), with the currently best published accuracy of 91.1% marked.]

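[Figure: shapelet dictionary and decision tree for a problem with classes Avonlea and Clovis. Shapelet I (Clovis) is labeled 11.24 and shapelet II (Avonlea) is labeled 85.47. The decision tree has nodes I and II and leaves 0, 1, and 2.]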

[Figure: Wheat Decision Tree. The shapelet dictionary holds six shapelets, I through VI (vertical axis 0.0 to 0.4, horizontal axis 0 to 300). The decision tree has leaves 0 through 6.]

The Gun/NoGun Problem

[Figure: Gun Decision Tree for the classes Gun and No Gun. The shapelet dictionary holds one shapelet, I (No Gun), labeled 38.94. The tree has a single node, I, with leaves 0 and 1.]

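Reduces the sensitivity of alignment

[Figure: Walk Decision Tree built from left toe and right toe series. The shapelet dictionary holds one shapelet, I (Normal Walk), labeled 144.075; the tree has a single node, I, with leaves 0 and 1. A chart on a 0 to 1.0 axis shows the values 0.909, 0.902, 0.860, and 0.535; the labels for these values are not recoverable from the transcript.]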

Conclusions

- Interpretable results
- More accurate and robust
- Significantly faster at classification

Questions?

- All of the datasets are free to download: http://www.cs.ucr.edu/~lexiangy/shapelet.html
- Code available upon request
