Time Series Shapelets: A New Primitive for Data Mining

Lexiang Ye and Eamonn Keogh, University of California, Riverside

Classification
• Huge interest in time series classification
• Extensive applications
• Nearest Neighbor classification
  • Most accurate (in extensive empirical tests)
  • Robust
  • Simple

Drawbacks of NN
• Time and space complexity: the entire training set must be stored and compared against every query
• Results are not interpretable
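A minimal sketch of the one-nearest-neighbor baseline, assuming Euclidean distance between equal-length series (the slides do not name the distance measure). It makes the first drawback concrete: the whole training set is kept in memory and scanned for every query. The function names are illustrative.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length series."""
    if len(a) != len(b):
        raise ValueError("series must have equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nn_classify(train: Sequence[Sequence[float]], labels: Sequence[int],
                query: Sequence[float]) -> int:
    """Return the label of the training series nearest to `query`.

    Every stored training series is compared with the query, so one query
    costs O(n * m) time and the model itself is the O(n * m) training set.
    """
    best_label, best_dist = labels[0], math.inf
    for series, label in zip(train, labels):
        d = euclidean(series, query)
        if d < best_dist:
            best_dist, best_label = d, label
    return best_label


train = [[0.0, 0.0, 1.0, 0.0], [1.0, 1.0, 0.0, 1.0]]
labels = [0, 1]
print(nn_classify(train, labels, [0.1, 0.0, 0.9, 0.0]))  # → 0
```

Nothing is learned beyond storing the data, which is why the output offers no explanation of *why* a query received its label.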

Solution
• Shapelets
  • Shapelets are time series subsequences that are maximally representative of a class
• Distinguishing substring selection
• Probe design (computational biology)
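To compare a shapelet with a longer time series, one common choice is the smallest distance between the shapelet and any same-length window of the series. A minimal sketch, assuming plain Euclidean distance with no z-normalization of the windows; `subsequence_dist` is a hypothetical name, not the authors' code.

```python
import math
from typing import Sequence


def subsequence_dist(series: Sequence[float], shapelet: Sequence[float]) -> float:
    """Smallest Euclidean distance between `shapelet` and any window of
    `series` that has the same length as the shapelet."""
    m, n = len(series), len(shapelet)
    if n == 0 or n > m:
        raise ValueError("shapelet must be non-empty and no longer than series")
    best = math.inf
    for start in range(m - n + 1):
        window = series[start:start + n]
        d = math.sqrt(sum((w - s) ** 2 for w, s in zip(window, shapelet)))
        best = min(best, d)
    return best


series = [0.0, 0.0, 3.0, 4.0, 0.0, 0.0]
print(subsequence_dist(series, [3.0, 3.0]))  # → 1.0 (best window is [3.0, 4.0])
```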

Motivating example

[Figure: leaf outlines of false nettles and stinging nettles; a shapelet dictionary containing shapelet I; and a leaf decision tree with a single node testing shapelet I at threshold 5.1, with branches yes → class 0 and no → class 1]

Brute-Force Algorithm

• Extract subsequences of all possible lengths into a pool of candidates
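A sketch of the candidate-generation step, assuming each candidate is recorded as (series index, start offset, values) and that the caller chooses the length range; the `candidates` generator is illustrative, not the authors' implementation.

```python
from typing import Iterator, List, Sequence, Tuple


def candidates(dataset: Sequence[Sequence[float]], min_len: int,
               max_len: int) -> Iterator[Tuple[int, int, List[float]]]:
    """Yield (series index, start offset, values) for every subsequence of
    every series whose length lies in [min_len, max_len]."""
    for idx, series in enumerate(dataset):
        # Lengths longer than the series itself are skipped.
        for length in range(min_len, min(max_len, len(series)) + 1):
            for start in range(len(series) - length + 1):
                yield idx, start, list(series[start:start + length])


pool = list(candidates([[1.0, 2.0, 3.0]], min_len=2, max_len=3))
print(len(pool))  # → 3: [1.0, 2.0], [2.0, 3.0], [1.0, 2.0, 3.0]
```

Using a generator keeps memory flat: the brute-force search can score each candidate as it is produced instead of materializing the whole pool.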

Testing the utility of a candidate shapelet
• Information gain
  • Arrange the time series objects by their distance to the candidate
  • Find the optimal split point
• Pick the candidate achieving the best utility as the shapelet
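The utility test can be sketched as follows, assuming the distances from every object to one candidate are already computed. Objects are sorted by distance (an orderline), each gap between adjacent distinct distances is tried as a split point, and the split with the highest information gain wins. Placing the threshold midway between neighbors is an assumption. A brute-force search would call this once per candidate and keep the candidate with the largest gain.

```python
import math
from collections import Counter
from typing import Sequence, Tuple


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_split(distances: Sequence[float],
               labels: Sequence[int]) -> Tuple[float, float]:
    """Return (information gain, threshold) of the best split point.

    Objects are sorted by their distance to one candidate; each gap between
    adjacent distinct distances is a possible split point. Returns
    (0.0, nan) if all distances are equal and no split is possible.
    """
    order = sorted(zip(distances, labels))
    dists = [d for d, _ in order]
    labs = [lab for _, lab in order]
    n = len(labs)
    base = entropy(labs)
    best_gain, best_thr = -math.inf, math.nan
    for i in range(1, n):
        if dists[i] == dists[i - 1]:
            continue  # no threshold separates equal distances
        left, right = labs[:i], labs[i:]
        gain = base - (i / n) * entropy(left) - ((n - i) / n) * entropy(right)
        if gain > best_gain:
            best_gain, best_thr = gain, (dists[i - 1] + dists[i]) / 2
    if best_gain == -math.inf:
        return 0.0, math.nan
    return best_gain, best_thr


print(best_split([1.0, 2.0, 8.0, 9.0], [0, 0, 1, 1]))  # → (1.0, 5.0)
```

Because `entropy` counts arbitrary labels, the same function scores multi-class problems without change.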

Candidates Pool Problem
• The total number of candidates is very large
• Trace dataset
  • 200 instances, each of length 275
  • 7,480,200 shapelet candidates
  • Brute-force search takes approximately three days
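The candidate count follows from simple arithmetic: a series of length m has m - l + 1 subsequences of length l. The sketch below reproduces the 7,480,200 figure under the assumption, inferred from the arithmetic rather than stated on the slide, that candidate lengths run from 3 to 275; lengths 1 to 275 would give 7,590,000.

```python
def count_candidates(num_series: int, length: int,
                     min_len: int, max_len: int) -> int:
    """Count subsequences with lengths in [min_len, max_len] across
    `num_series` series that all have the same `length`.

    A series of length m has m - sub_len + 1 subsequences of length sub_len.
    """
    per_series = sum(length - sub_len + 1
                     for sub_len in range(min_len, max_len + 1))
    return num_series * per_series


print(count_candidates(200, 275, 3, 275))  # → 7480200, matching the slide
print(count_candidates(200, 275, 1, 275))  # → 7590000
```

The count grows quadratically in series length, which is why a brute-force scan that computes a distance for every candidate against every object becomes so slow.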

Speedup
• Distance calculations from time series objects to shapelet candidates are the most expensive part
• Reduce the time in two ways:
  • Distance Early Abandon (known idea)
  • Admissible Entropy Pruning (novel idea)
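Distance early abandon can be sketched as a variant of the sliding-window distance: while summing squared differences for one window, stop as soon as the partial sum reaches the smallest squared distance found so far, because that window can no longer win. A minimal sketch, assuming Euclidean distance; the function name is hypothetical. The result equals the exhaustive computation, only with fewer arithmetic operations.

```python
import math
from typing import Sequence


def subsequence_dist_early_abandon(series: Sequence[float],
                                   shapelet: Sequence[float]) -> float:
    """Smallest Euclidean distance between `shapelet` and any same-length
    window of `series`, abandoning a window once it cannot beat the best."""
    m, n = len(series), len(shapelet)
    if n == 0 or n > m:
        raise ValueError("shapelet must be non-empty and no longer than series")
    best_sq = math.inf  # squared distance of the best window seen so far
    for start in range(m - n + 1):
        acc = 0.0
        for k in range(n):
            diff = series[start + k] - shapelet[k]
            acc += diff * diff
            if acc >= best_sq:
                break  # partial sum already too large: abandon this window
        else:
            best_sq = acc  # window completed, so acc < best_sq
    return math.sqrt(best_sq)


print(subsequence_dist_early_abandon([0.0, 0.0, 3.0, 4.0, 0.0, 0.0], [3.0, 3.0]))  # → 1.0
```

Comparing squared sums avoids a square root per window; only the final answer is rooted.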

Admissible Entropy Pruning

• Information Gain
  • Traditional evaluation in decision trees
  • Easily generalized to the multi-class problem
• Goal: reduce the number of distance calculations
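One way to realize admissible entropy pruning for two classes, following our reading of the technique: after distances are computed for only some objects, push the still-unknown objects to the two ends of the orderline in whichever class-to-end assignment is more favorable, and compute the best information gain of that optimistic orderline. If even this bound cannot beat the best gain found so far, the remaining distance calculations for the candidate are skipped. The premise that this placement bounds the final gain is taken from the technique's description, not proven here; all names are hypothetical, and ties between distances are ignored, which can only raise the bound.

```python
import math
from typing import Sequence, Tuple


def entropy2(a: int, b: int) -> float:
    """Entropy (in bits) of a set holding a objects of class 0, b of class 1."""
    n = a + b
    h = 0.0
    for c in (a, b):
        if c:
            h -= (c / n) * math.log2(c / n)
    return h


def best_gain(ordered_labels: Sequence[int]) -> float:
    """Best information gain over every split of an orderline whose 0/1
    labels are already sorted by distance (ties between distances ignored)."""
    n = len(ordered_labels)
    ones = sum(ordered_labels)
    base = entropy2(n - ones, ones)
    best, left_ones = 0.0, 0
    for i in range(1, n):
        left_ones += ordered_labels[i - 1]
        right_ones = ones - left_ones
        gain = (base
                - (i / n) * entropy2(i - left_ones, left_ones)
                - ((n - i) / n) * entropy2((n - i) - right_ones, right_ones))
        best = max(best, gain)
    return best


def optimistic_gain(known: Sequence[Tuple[float, int]],
                    unknown0: int, unknown1: int) -> float:
    """Optimistic gain of a candidate from a partial orderline.

    `known` holds (distance, label) pairs already computed; unknown0 and
    unknown1 count objects of each class whose distance is still unknown.
    They are pushed to opposite ends, trying both assignments.
    """
    middle = [label for _, label in sorted(known)]
    zeros_left = [0] * unknown0 + middle + [1] * unknown1
    ones_left = [1] * unknown1 + middle + [0] * unknown0
    return max(best_gain(zeros_left), best_gain(ones_left))


def should_prune(known: Sequence[Tuple[float, int]], unknown0: int,
                 unknown1: int, best_so_far: float) -> bool:
    """True if this candidate cannot beat the best gain found so far."""
    return optimistic_gain(known, unknown0, unknown1) <= best_so_far


known = [(1.0, 0), (1.5, 1), (2.0, 0), (2.5, 1)]  # a poorly separating candidate
print(should_prune(known, 1, 1, best_so_far=0.9))  # → True
print(should_prune(known, 1, 1, best_so_far=0.3))  # → False
```

The check is cheap relative to a distance calculation, so it can be rerun after each new distance and stop the candidate as early as possible.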

[Figure: orderlines of stinging nettles and false nettles objects, annotated with information gain values I = 0.42 and I = 0.29]

Classification
[Figure: the shapelet dictionary (shapelet I) and leaf decision tree (threshold 5.1; yes → class 0, no → class 1) used to classify false nettles and stinging nettles]
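Classifying a new object with a single-node shapelet tree takes one distance computation and one comparison. A sketch, assuming the node asks whether the sliding-window Euclidean distance to shapelet I is below the threshold 5.1 shown on the slide and that the yes branch leads to class 0; the shapelet values in the example are made up for illustration.

```python
import math
from typing import Sequence


def subsequence_dist(series: Sequence[float], shapelet: Sequence[float]) -> float:
    """Smallest Euclidean distance between `shapelet` and any same-length
    window of `series` (raises ValueError if the series is too short)."""
    n = len(shapelet)
    return min(
        math.sqrt(sum((series[s + k] - shapelet[k]) ** 2 for k in range(n)))
        for s in range(len(series) - n + 1)
    )


def classify(series: Sequence[float], shapelet: Sequence[float],
             threshold: float = 5.1) -> int:
    """One-node shapelet tree: distance below threshold -> class 0 (yes
    branch), otherwise class 1 (no branch)."""
    return 0 if subsequence_dist(series, shapelet) < threshold else 1


shapelet_i = [1.0, 2.0, 3.0, 2.0, 1.0]  # made-up values standing in for shapelet I
print(classify([0.0, 0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0], shapelet_i))  # → 0
print(classify([10.0] * 8, shapelet_i))  # → 1
```

Compared with nearest neighbor, only the shapelet and its threshold are stored, and the decision can be explained by pointing at the matching window.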

EXPERIMENTAL EVALUATION

Performance Comparison
[Figure: two panels plotted against |D|, the number of objects in the database (10 to 320). One panel shows running time in seconds (axis up to 5 × 10^5) for Brute Force and Pruning; the other shows classification accuracy (0.80 to 1.00), with the currently best published accuracy of 91.1% marked.]

Projectile Points

[Figure: shapelet dictionary with shapelet I (Clovis, threshold 11.24) and shapelet II (Avonlea, threshold 85.47); Arrowhead decision tree with nodes I and II and leaves 0, 1 and 2]

Wheat Spectrography
[Figure: one sample from each class]

[Figure: shapelet dictionary with shapelets I to VI, and the Wheat decision tree whose internal nodes test shapelets I to VI and whose leaves are classes 0 to 6]

The Gun/NoGun Problem
[Figure: example No Gun and Gun series; shapelet dictionary with shapelet I (No Gun, threshold 38.94); Gun decision tree with a single node testing shapelet I and leaves 1 and 0]

Gait Analysis

• Shapelets reduce sensitivity to alignment
[Figure: left toe and right toe traces; shapelet dictionary with shapelet I (Normal Walk, threshold 144.075); Walk decision tree with a single node testing shapelet I and leaves 0 and 1]

Conclusions
• Interpretable results
• More accurate/robust
• Significantly faster at classification

Thank You
Questions?
• All of the datasets are free to download: http://www.cs.ucr.edu/~lexiangy/shapelet.html
• Code available upon request
