# Time Series Data Analysis - II

Yaji Sripada


## Time Series Data Analysis - II

### In this lecture you learn

• Structural representations of time series

• SAX

• Computing SAX

• Data analysis using SAX

• Visualization using SAX

Dept. of Computing Science, University of Aberdeen

### Introduction

• Time series exhibit an internal structure

• Elements of this structure have domain specific meanings

• E.g. a scuba dive is composed of

• one or more descent segments,

• one or more bottom segments and

• finally one or more ascent segments in that order

• These segments have specific meaning in the domain of scuba diving

• The structural elements of a time series are usually approximations (abstractions) of the original data

• Experts in any domain reason in terms of these abstractions and not in terms of the original time series

• Understanding time series = understanding their structure


### Several structural representations

• Time series can be represented in terms of

• Linear segments (we already saw this last week)

• Aggregate Approximations (will study in this lecture)

• Non-linear segments (Not in this course)

• Wavelets (involve complex mathematics – not in this course)

• And many more

• The primary motivation behind creating the above structural representations is time series data mining


### Which structure is the most useful?

• All these structural representations are useful

• Some may be used more in some application domains than in others

• A good representation exhibits meaningful structure

• But meaning is attributed to a structure based on domain knowledge and user tasks

• So select a representation that makes this meaningful structure easy to compute

• Our approach to selecting the right representation

• From domain knowledge acquisition (KA) we learn which trends and patterns are meaningful

• Select one or more representations that facilitate the computation of required trends and patterns


### Symbolic Aggregate Approximation (SAX)

• SAX is a recently developed symbolic representation of time series that is claimed to make pattern computation easy

• http://www.cs.ucr.edu/~eamonn/SAX.htm is the main SAX page

• We introduced this representation in the last lecture

• We study how to create this representation in this lecture because it allows

• Novel data analysis of time series and

• Novel visualization of time series

• We will briefly study data analysis and visualization with SAX

• The above link has all the required details for further study


### Creating SAX

• Input

• Real valued time series (blue curve)

• Output

• Symbolic representation of the input time series (red string)

• Process

• First convert the input series into piecewise aggregate approximation (PAA) representation (grey steps)

• Then convert the PAA into a string of symbols (red string)

Figure: the input series (blue curve), its PAA (grey steps) and the resulting SAX string ‘baabccbc’ (red string)


### Example Data

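• The example series has n = 14 values (as they appear in the sublists of the PAA slides): 4.2, 9.2, 14.8, 15, 17, 18, 19.7, 20, 20.8, 21.3, 21.6, 20.6, 16.9, 12.8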

### Creating PAA

• Normalize the input time series

• Subtract the mean from each value and divide the result by the standard deviation

• Divide input time series of length n into w portions of equal length

• w is the parameter that controls the length of PAA and therefore the length of SAX

• If w is large you have a detailed (fine) PAA and a detailed SAX

• If w is small you have an abstract (coarse) PAA and an abstract SAX

• Choice of w should be based on the application requirements
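The normalization step can be sketched in a few lines of Python. This is a minimal sketch, not the lecture's own code: the function name `z_normalize` is hypothetical, and the use of the sample standard deviation (dividing by n - 1) is an assumption, since the slides do not say which convention is used.

```python
from statistics import mean, stdev

def z_normalize(series):
    """Shift a series to mean 0 and scale it to unit standard deviation."""
    mu = mean(series)
    sigma = stdev(series)  # sample standard deviation (n - 1): an assumption
    if sigma == 0:
        raise ValueError("a constant series cannot be normalized")
    return [(x - mu) / sigma for x in series]

# The 14 example values, as they appear in the sublists of the PAA slides
series = [4.2, 9.2, 14.8, 15, 17, 18, 19.7, 20, 20.8, 21.3, 21.6, 20.6, 16.9, 12.8]
normalized = z_normalize(series)
```

After this step the series has mean 0 and standard deviation 1, which is what later allows the normal distribution to be used for symbol mapping.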


### Creating PAA (2)

• Two cases

• n/w is a whole number

• Simple case of each portion having n/w number of values from the input time series

• n/w is a fraction

• Complicated case because you cannot assign an equal whole number of values from the input series to each of the w portions

• Our example data has n = 14

• If w = 3, then n/w is a fraction

• The length of each portion is 14/3 = 4.66667

• Each portion should have 4.66667 values from the original time series


### Creating PAA (3)

• We use the following scheme to achieve 4.6667 values in each portion

• The following is the list of indexes of the 14 values in the input series

1 2 3 4 5 6 7 8 9 10 11 12 13 14

• The first portion will have values at 1, 2, 3, and 4

• We need 0.6667 more to complete this portion

• We achieve this by inserting 0.6667 times the 5th value

• The remaining 0.3333 times the 5th value is inserted into the second portion


### Creating PAA (4)

• Using the above scheme our three lists are

• 4.2, 9.2, 14.8, 15 and 0.6667*17

• 0.3333*17, 18, 19.7, 20, 20.8, 0.3333*21.3

• 0.6667*21.3, 21.6, 20.6, 16.9, 12.8

• (Note: here we have shown the values from the un-normalized input series)

• Each of the above sublists holds an equal share (4.6667 values) of the input series

• Next, for each sublist compute the average (mean): the weighted sum of its entries divided by 4.6667

• In our case, three sublists will each have an average value

• PAA is simply a vector of these average values

• {avg1, avg2, avg3}

• {-0.9338,0.53135,0.34767} for our example (using normalized values)
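The fractional scheme from the last two slides can be sketched as follows. This is an illustrative sketch under stated assumptions: the function name `paa` is hypothetical, and it is run here on the un-normalized values so that the weights match the sublists shown above, so its output is a vector of raw averages rather than the normalized figures quoted on the slide.

```python
from fractions import Fraction

def paa(series, w):
    """Piecewise aggregate approximation with fractional portion boundaries."""
    n = len(series)
    size = Fraction(n, w)  # exact portion length, e.g. 14/3
    averages = []
    for k in range(w):
        start, end = k * size, (k + 1) * size
        total = 0.0
        for i, value in enumerate(series):
            # Share of value i (occupying [i, i + 1)) that falls in this portion
            overlap = min(end, i + 1) - max(start, i)
            if overlap > 0:
                total += float(overlap) * value
        averages.append(total / float(size))
    return averages

raw = [4.2, 9.2, 14.8, 15, 17, 18, 19.7, 20, 20.8, 21.3, 21.6, 20.6, 16.9, 12.8]
averages = paa(raw, 3)
print(averages)  # approximately [11.6857, 19.5571, 18.45]
```

The 5th value (17) contributes with weight 2/3 to the first portion and 1/3 to the second, exactly as in the sublists. Passing a normalized series instead of `raw` gives a normalized PAA vector; the exact figures then depend on the standard deviation convention used.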


### Properties of PAA

• PAA is simple to compute (as can be seen from the previous slides)

• Achieves dimensionality reduction

• From 14 values our input series is reduced to 3 values

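• Suitably scaled distances computed on the PAA never exceed the corresponding distances on the input series

• This is the lower bounding distance property

• So two series that are far apart in PAA are guaranteed to be far apart in the original data

• Very useful property for a structural representation

• Allows data analysis to be performed on the approximate representation rather than the original series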
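The lower bounding property can be checked numerically. The sketch below assumes the simple case where n/w is a whole number and uses the scaling factor sqrt(n/w) from the PAA literature, which the slides do not state; all function names are hypothetical.

```python
import math
import random

def paa_equal(series, w):
    """PAA for the simple case where len(series) is a multiple of w."""
    n = len(series)
    if n % w != 0:
        raise ValueError("this sketch only handles whole-number n/w")
    step = n // w
    return [sum(series[i:i + step]) / step for i in range(0, n, step)]

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def paa_distance(x, y, w):
    """Distance between the PAA vectors, scaled by sqrt(n / w)."""
    return math.sqrt(len(x) / w) * euclidean(paa_equal(x, w), paa_equal(y, w))

random.seed(0)
x = [random.gauss(0, 1) for _ in range(12)]
y = [random.gauss(0, 1) for _ in range(12)]
# The PAA distance never exceeds the true distance (small tolerance for rounding)
assert paa_distance(x, y, 3) <= euclidean(x, y) + 1e-12
```

Because the approximate distance never overestimates the true one, a search that discards candidates whose PAA distance is already too large cannot discard a true match.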


### Symbol Mapping

• In this step, each average value from the PAA vector is replaced by a symbol from an alphabet

• An alphabet size a of 5 to 8 is recommended

• a,b,c,d,e

• a,b,c,d,e,f

• a,b,c,d,e,f,g

• a,b,c,d,e,f,g,h

• Given an average value we need a symbol

• This is achieved by using the normal distribution from statistics

• Because our input series is normalized we can use the standard normal distribution (mean 0, standard deviation 1) as the data model

• We divide the area under the normal distribution into ‘a’ equal sized areas where a is the alphabet size

• Each such area is bounded by breakpoints
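The breakpoints that cut the standard normal curve into a equal areas are its quantiles at 1/a, 2/a, ..., (a-1)/a. A minimal sketch using Python's standard library (the function name `breakpoints` is hypothetical):

```python
from statistics import NormalDist

def breakpoints(a):
    """Return the a - 1 cut points that split N(0, 1) into a equal-area regions."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [std_normal.inv_cdf(k / a) for k in range(1, a)]

for a in (3, 4, 5):
    print(a, [round(b, 2) for b in breakpoints(a)])
```

For a = 3 this gives cut points of about -0.43 and 0.43, so each of the three symbols is equally likely for normally distributed data.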


### Symbol mapping - breakpoints

• Breakpoints for different alphabet sizes can be structured as a lookup table

• When a=3

• Average values below -0.43 are replaced by ‘A’

• Average values between -0.43 and 0.43 are replaced by ‘B’

• Average values above 0.43 are replaced by ‘C’

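• Using this table with a = 3, the PAA vector {-0.9338, 0.53135, 0.34767} maps to ‘ACB’

• The string ‘ADD’ corresponds to a = 5, whose breakpoints are -0.84, -0.25, 0.25 and 0.84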
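The lookup can be sketched with a small breakpoint table and a binary search. The table below holds the standard normal breakpoints rounded to two decimals for a = 3, 4 and 5; the names `BREAKPOINTS` and `to_sax` are hypothetical.

```python
from bisect import bisect_right
import string

# Breakpoints of the standard normal distribution, rounded to two decimals
BREAKPOINTS = {
    3: [-0.43, 0.43],
    4: [-0.67, 0.0, 0.67],
    5: [-0.84, -0.25, 0.25, 0.84],
}

def to_sax(paa_values, a):
    """Replace each PAA average by the letter of the region it falls in."""
    cuts = BREAKPOINTS[a]
    return "".join(string.ascii_uppercase[bisect_right(cuts, v)] for v in paa_values)

paa_vector = [-0.9338, 0.53135, 0.34767]  # PAA of the example series
print(to_sax(paa_vector, 3))  # ACB
print(to_sax(paa_vector, 5))  # ADD
```

The same PAA vector yields different words for different alphabet sizes, so the alphabet size has to be recorded alongside any SAX string.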


### SAX Computation – in pictures

Figure: a time series (horizontal axis from 0 to 120) discretized into the SAX string ‘baabccbc’

This slide is taken from Eamonn’s tutorial on SAX


### Data Analysis using SAX

• A general approach is to convert time series into SAX

• Use SAX representations to train Markov models (details not covered here) on normal data

• The model captures the probabilities of normal patterns

• The trained models are then used to test incoming data for known and unknown patterns
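Since the lecture leaves out the details of the Markov models, the following is only one possible reading: a first-order Markov chain over SAX symbols, trained on strings from normal data and used to score new strings by their average log-likelihood. The function names, the training strings and the add-one smoothing are all assumptions made for illustration.

```python
import math

def train_markov(words, alphabet, smoothing=1.0):
    """Estimate symbol-to-symbol transition probabilities from SAX strings."""
    counts = {s: {t: smoothing for t in alphabet} for s in alphabet}
    for word in words:
        for prev, nxt in zip(word, word[1:]):
            counts[prev][nxt] += 1
    model = {}
    for s, row in counts.items():
        total = sum(row.values())
        model[s] = {t: c / total for t, c in row.items()}
    return model

def avg_log_likelihood(model, word):
    """Average log-probability per transition; lower means more unusual."""
    steps = list(zip(word, word[1:]))
    return sum(math.log(model[p][n]) for p, n in steps) / len(steps)

normal_words = ["baabccbc", "baabccbb", "baabcccc"]  # hypothetical normal data
model = train_markov(normal_words, "abc")
familiar = avg_log_likelihood(model, "baabccbc")
unusual = avg_log_likelihood(model, "cacacaca")
print(familiar > unusual)  # the unseen pattern scores lower
```

A string whose score falls well below the scores seen on normal data can then be flagged as an unknown pattern; where to set that threshold is an application decision.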


### Visualization using SAX

• Given a SAX representation

• count the frequencies of patterns (substrings) of the required length and

• use them to color code a mosaic for visualizing the time series

• For example, given ‘baabccbc’ as the SAX representation

• We calculate the frequencies of substrings of length 1 and represent them in a mosaic

• Visualizations for substrings of length > 1 are possible (please refer to the SAX site)

• The mosaic is built in three steps: mark frequencies, normalize, color code cells
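The counting and normalizing steps can be sketched as below. The function name `substring_frequencies` is hypothetical, and dividing by the largest count is an assumed normalization; the resulting values in [0, 1] would then be mapped to cell colors by whatever plotting tool is used.

```python
from collections import Counter

def substring_frequencies(word, length=1):
    """Count every substring of the given length and scale counts to [0, 1]."""
    counts = Counter(word[i:i + length] for i in range(len(word) - length + 1))
    peak = max(counts.values())
    return {pattern: count / peak for pattern, count in sorted(counts.items())}

cells = substring_frequencies("baabccbc")
print(cells)  # 'a' occurs 2 times, 'b' and 'c' 3 times each
```

Calling the same function with `length=2` counts the overlapping pairs `ba`, `aa`, `ab` and so on, which is the basis for the larger mosaics.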


### Summary

• Structural representations help in understanding time series through

• Data analysis + Visualization

• SAX is claimed to be a landmark representation of time series

• Symbolic and therefore allows use of discrete data structures and their corresponding algorithms for analysis

• Also helps with visualization
