
Finding Time Series Motifs on Disk-Resident Data. Abdullah Mueen and Dr. Eamonn Keogh (UC Riverside); Nima Bigdely-Shamlo (Swartz Center for Computational Neuroscience, UCSD). Outline: Motivation; Time Series Motif; DAME: Disk-Aware Motif Enumeration; Performance Evaluation.

Presentation Transcript

Finding Time Series Motifs on Disk-Resident Data

Abdullah Mueen, Dr. Eamonn Keogh

UC Riverside

Nima Bigdely-Shamlo

Swartz Center for Computational Neuroscience, UCSD

Outline
  • Motivation
    • Time Series Motif
  • DAME: Disk-Aware Motif Enumeration
  • Performance Evaluation
    • Speedup and Efficiency
  • Case Studies
    • Motifs in Brain-Computer Interfaces
    • Motifs in Image Database
  • Conclusion
Sequence Motif
  • A repeated pattern in a sequence.
  • A pattern can be approximately similar.
    • Mismatches are allowed.
  • Patterns can overlap.

[Figure: examples of motif types. A sequence motif in the DNA string GACATAATAACCAGCTATCTGCTCGCATCGCCGCGACATAGCT, a motion motif, a structural motif, and a time series motif.]

Time Series Motif
  • A repeated pattern in a time series.
  • Exact motif:
    • The most similar pair under Euclidean distance.
  • Non-overlapping.
  • Euclidean distance (between normalized segments):
    • Beats most similarity measures on large datasets.
    • Allows early abandoning.
    • Obeys the triangle inequality:
      • d(P,Q) ≥ |d(P,R) - d(Q,R)|
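The three properties listed above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names, the use of the population standard deviation, and the handling of constant segments are all assumptions made for the example.

```python
import math

def z_normalize(segment):
    """Rescale a segment to zero mean and unit (population) standard deviation."""
    n = len(segment)
    mean = sum(segment) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in segment) / n)
    if std == 0.0:
        return [0.0] * n  # assumption: a constant segment maps to all zeros
    return [(v - mean) / std for v in segment]

def early_abandon_distance(p, q, best_so_far=math.inf):
    """Euclidean distance between p and q.

    Returns math.inf as soon as the running sum shows that the distance
    cannot be smaller than best_so_far (early abandoning).
    """
    limit = best_so_far ** 2
    total = 0.0
    for a, b in zip(p, q):
        total += (a - b) ** 2
        if total >= limit:
            return math.inf  # abandoned: this pair cannot beat best_so_far
    return math.sqrt(total)

def reference_lower_bound(d_pr, d_qr):
    """Triangle-inequality lower bound on d(P, Q) from distances to a reference R."""
    return abs(d_pr - d_qr)
```

If `reference_lower_bound` already meets or exceeds the best distance found so far, the pair (P, Q) can be skipped without computing its actual distance.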

Motif Discovery in Disk-Resident Datasets
  • Large datasets
    • Light Curves of Stars.
    • Performance Counters of Data Centers.
  • Pseudo time series dataset
    • “80 million Tiny Images”
  • Database of normalized subsequences
    • An hour-long EEG trace generates over one million normalized subsequences.
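As a rough sketch of how one long trace turns into a database of normalized subsequences, the generator below slides a window of length `m` over a series and z-normalizes each window. The function name and the window length are hypothetical; the slides do not give the subsequence length or sampling rate used for the EEG data.

```python
import math

def normalized_subsequences(series, m):
    """Yield every length-m sliding window of series, z-normalized.

    A series with n samples yields n - m + 1 subsequences.
    """
    for start in range(len(series) - m + 1):
        window = series[start:start + m]
        mean = sum(window) / m
        std = math.sqrt(sum((v - mean) ** 2 for v in window) / m)
        if std == 0.0:
            yield [0.0] * m  # assumption: constant windows map to all zeros
        else:
            yield [(v - mean) / std for v in window]
```

Because the number of subsequences grows with the length of the trace, long recordings quickly produce more subsequences than fit comfortably in main memory, which is the setting the slides address.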
[Figure: a set of 2D points, numbered 1 to 24, shown in a geometric view and in a disk view where the points are stored in blocks.]

[Figure: geometric view, projected view, and disk view of the point set. The projected view is a linear representation of the points in sorted order; 0 is the reference point.]
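One plausible reading of the linear representation is that each point is keyed by its distance to the reference point and the points are then sorted by that key and packed into blocks. The slide only says "in sorted order", so the sort key, the function names, and the block size below are assumptions for illustration.

```python
import math

def linear_order(points, reference):
    """Return (distance_to_reference, index) pairs sorted by distance."""
    keyed = [(math.dist(p, reference), i) for i, p in enumerate(points)]
    keyed.sort()
    return keyed

def split_into_blocks(ordered, block_size):
    """Cut the sorted list into consecutive blocks of at most block_size items."""
    return [ordered[i:i + block_size] for i in range(0, len(ordered), block_size)]
```

With this layout, points that are close in the linear order have similar distances to the reference point, so the triangle-inequality bound between them is small, while points in far-apart blocks have a large bound.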

[Figure: geometric, projected, and disk views of the point set, with the labels Best 1 and Best 2.]

Divide the point set into two partitions and solve each subproblem.
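The divide step can be sketched in memory as a recursion over the sorted list. The slide states only that the point set is split into two partitions; the way cross-partition pairs are checked below (pruning with the difference of reference distances) is an assumption based on the triangle inequality, and `best_pair` is a hypothetical name.

```python
import math

def best_pair(ordered):
    """Find the closest pair in a list of (distance_to_reference, point)
    tuples that is sorted by distance_to_reference.

    Returns (distance, point_a, point_b).
    """
    n = len(ordered)
    if n < 2:
        return (math.inf, None, None)
    if n == 2:
        a, b = ordered[0][1], ordered[1][1]
        return (math.dist(a, b), a, b)

    mid = n // 2
    left, right = ordered[:mid], ordered[mid:]
    # Solve the two subproblems and keep the better answer as best-so-far.
    best = min(best_pair(left), best_pair(right), key=lambda t: t[0])

    # Check pairs that span the two partitions, nearest to the boundary first.
    for d_a, a in reversed(left):
        if right[0][0] - d_a >= best[0]:
            break  # every remaining cross pair has a lower bound >= best
        for d_b, b in right:
            if d_b - d_a >= best[0]:
                break  # lower bound only grows further along the order
            d = math.dist(a, b)
            if d < best[0]:
                best = (d, a, b)
    return best
```

The input is expected to be sorted by distance to the reference point, for example `sorted((math.dist(p, ref), p) for p in points)`.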

[Figure: geometric, projected, and disk views with the best-so-far distance (bsf) marked. Blocks of interest: the inner ring is the region for blocks 5 and 6, and the outer ring is the region for blocks 3 and 4.]

[Figure: the loaded blocks and the block pairs (3,5), (3,6), (4,5), and (4,6), with the bsf distance marked.]
  • Block 3 and block 6 do not overlap, so no comparison is made.
  • The other three block pairs need 9, 1, and 1 comparisons.
  • 11 comparisons are made instead of 9*16 = 144.
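The saving comes from two levels of pruning: whole block pairs are skipped when their ranges of reference distances are further apart than bsf, and inside a loaded pair only points whose reference distances differ by less than bsf are compared. The sketch below illustrates that idea under the assumption that each block is a list of `(distance_to_reference, point)` tuples sorted by distance; the function names are hypothetical and the code does not reproduce the exact counts on the slide.

```python
import math

def blocks_may_contain_better_pair(block_a, block_b, bsf):
    """True if some pair across the two blocks could be closer than bsf.

    block_a must come before block_b in the linear order.
    """
    gap = block_b[0][0] - block_a[-1][0]
    return gap < bsf

def compare_block_pair(block_a, block_b, bsf):
    """Compare points across two loaded blocks.

    Returns (updated bsf, number of actual distance computations).
    """
    comparisons = 0
    for d_a, a in block_a:
        for d_b, b in block_b:
            if abs(d_a - d_b) >= bsf:
                continue  # pruned by the triangle-inequality lower bound
            comparisons += 1
            bsf = min(bsf, math.dist(a, b))
    return bsf, comparisons
```

A smaller bsf prunes more pairs, which is why a good best-so-far from the subproblems matters before block pairs are examined.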

Speedup

[Figure: speedup illustration with memory and disk.]

Performance Evaluation

[Figure: two plots of running time (seconds in DAME_Motif, ×10³), broken down into Total, CPU, and I/O. Axis labels: motif length, # of time series, and # of blocks.]

Case Study 1: Brain-Computer Interfaces

[Figure: image credited to Biosemi, Inc., with Target and Non-Target labels.]

[Figure: IC 17, Motif 1. Epochs plotted against latency (-1000 to 1000), before and after target presentation.]

Case Study 1: Brain-Computer Interfaces

[Figure: target trials and non-target trials after spatial filtering (ICA), plotted against time (ms).]

[Figure: normalized IC activity for Segment 1, Motif 1, and Segment 2, together with the distance to Motif 1, plotted against time (ms).]

Case Study 2: Image Motifs
  • The concatenated color histogram of each image is treated as a pseudo time series.
  • Each time series has length 256*3 = 768.
  • 80 million tiny images at 32×32 resolution.
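A minimal sketch of turning one image into such a pseudo time series follows. The length 256*3 = 768 comes from the slide; the channel order (red, green, blue), the input format, and the function name are assumptions, and any normalization applied afterwards is not shown.

```python
def color_histogram_series(pixels):
    """Build a length-768 pseudo time series from an image.

    pixels: iterable of (r, g, b) tuples with integer values in 0..255.
    The result is the red, green, and blue histograms concatenated.
    """
    series = [0] * (256 * 3)
    for r, g, b in pixels:
        series[r] += 1          # red histogram: positions 0..255
        series[256 + g] += 1    # green histogram: positions 256..511
        series[512 + b] += 1    # blue histogram: positions 512..767
    return series
```

For a 32×32 image the three histograms each sum to 1,024, so two pixel-identical images always map to the same series.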

80 Million Tiny Images: collected by Antonio Torralba, Rob Fergus, and William T. Freeman at MIT.
Case Study 2: Image Motifs
  • DAME processed the first 40 million time series in about 6.5 days.
  • DAME found 3,836,902 images that have at least one duplicate.
    • 1,719,443 unique images.
  • 542,603 images have near duplicates at a distance of less than 0.1.

[Figure: example pairs of duplicate images and near-duplicate images, labeled with their image IDs.]
Conclusion
  • DAME: the first exact motif discovery algorithm that finds motifs in disk-resident data.
  • DAME scales to massive datasets on the order of millions of time series.
  • DAME successfully finds motifs in EEG traces and image databases.
Example of Multidimensional Motif

[Figure: motion motif. Top view of the dance floor and the trajectories of the dancers.]

Dance motions are taken from the CMU Motion Capture Database.
Example of Worst Case Scenario

[Figure: points 1 to 18 and the reference point r.]
Multiple References for Ordering

[Figure: lower bounds plotted against actual distances, showing linear bounds and planar bounds, with the larger and smaller gaps marked. A second panel shows reference points r1 and r2, the x and y axes, and the rotational axis.]
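The labels on this slide suggest two kinds of lower bound built from several reference points. The sketch below is one interpretation, not taken from the slides: the linear bound takes the largest triangle-inequality bound over all references, and the planar bound places each point in a half-plane using its distances to two references r1 and r2 (x along the line through r1 and r2, y the distance from that line). Rotating a point around that line changes neither of its reference distances, so the distance between the two planar positions can never exceed the actual distance. All names are hypothetical.

```python
import math

def linear_bound(p_dists, q_dists):
    """Largest |d(P, r_i) - d(Q, r_i)| over all reference points r_i."""
    return max(abs(a - b) for a, b in zip(p_dists, q_dists))

def planar_coordinates(d1, d2, ref_gap):
    """Place a point in the half-plane defined by two references.

    d1, d2: distances from the point to r1 and r2.
    ref_gap: distance between r1 and r2 (must be positive).
    Returns (x, y) with r1 at the origin and r2 at (ref_gap, 0).
    """
    x = (d1 * d1 - d2 * d2 + ref_gap * ref_gap) / (2.0 * ref_gap)
    y = math.sqrt(max(d1 * d1 - x * x, 0.0))  # clamp tiny negative round-off
    return x, y

def planar_bound(p_dists, q_dists, ref_gap):
    """Lower bound on d(P, Q) from distances to two reference points."""
    px, py = planar_coordinates(p_dists[0], p_dists[1], ref_gap)
    qx, qy = planar_coordinates(q_dists[0], q_dists[1], ref_gap)
    return math.hypot(px - qx, py - qy)
```

Under this construction the planar bound is at least as large as either single-reference bound, so it leaves a smaller gap to the actual distance at the cost of storing two reference distances per point.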