Time Series Data Analysis - II

Yaji Sripada

In this lecture you learn
• Structural representations of time series
• SAX
• Computing SAX
• Data analysis using SAX
• Visualization using SAX

Dept. of Computing Science, University of Aberdeen

Introduction
• Time series exhibit an internal structure
• Elements of this structure have domain specific meanings
• E.g. a scuba dive is composed of
• one or more descent segments,
• one or more bottom segments and
• finally one or more ascent segments in that order
• These segments have specific meaning in the domain of scuba diving
• The structural elements of a time series are usually approximations (abstractions) of the original data
• Experts in any domain reason in terms of these abstractions and not in terms of the original time series
• Understanding time series = understanding their structure

Several structural representations
• Time series can be represented in terms of
• Linear segments (we already saw this last week)
• Aggregate Approximations (will study in this lecture)
• Non-linear segments (Not in this course)
• Wavelets (involve complex mathematics – not in this course)
• And many more
• The primary motivation behind creating the above structural representations is time series data mining

Which structure is the most useful?
• All these structural representations are useful
• Some may be used more in certain application domains than in others
• A good representation exhibits meaningful structure
• But meaning is attributed to a structure based on domain knowledge and user tasks
• So select a representation that makes the meaningful structure easy to compute
• Our approach to selecting the right representation
• From domain knowledge acquisition (KA) we learn which trends and patterns are meaningful
• Select one or more representations that make the required trends and patterns easy to compute

Symbolic Aggregate Approximation (SAX)
• SAX is a recently developed symbolic representation of time series that is claimed to make pattern computation easy
• http://www.cs.ucr.edu/~eamonn/SAX.htm is the main SAX page
• We introduced this representation in the last lecture
• We study how to create this representation in this lecture because it allows
• Novel data analysis of time series and
• Novel visualization of time series
• We will briefly study data analysis and visualization with SAX
• The above link has all the required details for further study

Creating SAX
• Input
• Real valued time series (blue curve)
• Output
• Symbolic representation of the input time series (red string)
• Process
• First convert the input series into a piecewise aggregate approximation (PAA) representation (grey steps)
• Then convert the PAA into a string of symbols (red string)

[Figure: input series (blue curve), its PAA (grey steps) and the resulting SAX string (red): baabccbc]

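Example Data
• 14 values (un-normalized): 4.2, 9.2, 14.8, 15, 17, 18, 19.7, 20, 20.8, 21.3, 21.6, 20.6, 16.9, 12.8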

Creating PAA
• Normalize the input time series
• Subtract the mean from each value and divide the result by the standard deviation
• Divide input time series of length n into w portions of equal length
• w is the parameter that controls the length of PAA and therefore the length of SAX
• If w is large you have a detailed (fine) PAA and a detailed SAX
• If w is small you have an abstract (coarse) PAA and an abstract SAX
• Choice of w should be based on the application requirements
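
The normalization step can be sketched in Python as follows. This is a minimal illustration rather than the lecture's own code: the function name z_normalize is hypothetical, and using the sample standard deviation (dividing by n - 1) is an assumption, since the slides do not say which convention they use.

```python
from statistics import mean, stdev

def z_normalize(series):
    """Rescale a series to zero mean and unit standard deviation."""
    mu = mean(series)
    sigma = stdev(series)  # assumption: sample standard deviation (n - 1)
    if sigma == 0:
        raise ValueError("a constant series cannot be normalized")
    return [(x - mu) / sigma for x in series]

# The 14-value example series used in this lecture (un-normalized)
data = [4.2, 9.2, 14.8, 15, 17, 18, 19.7, 20, 20.8,
        21.3, 21.6, 20.6, 16.9, 12.8]
normalized = z_normalize(data)
```

After this step the values are centred on zero, which is what later allows the normal distribution to be used for symbol mapping.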

Creating PAA (2)
• Two cases
• n/w is a whole number
• Simple case of each portion having n/w number of values from the input time series
• n/w is a fraction
• Complicated case because you cannot assign an equal whole number of values from the input series to each of the w equal-sized portions
• Our example data has n = 14
• If w = 3, then n/w is a fraction
• The length of each portion is 14/3 = 4.66667
• Each portion should have 4.66667 values from the original time series
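
The simple case, where n/w is a whole number, might look like the sketch below. The name paa_whole is hypothetical; the function deliberately refuses the fractional case, which needs the weighting scheme described next.

```python
def paa_whole(series, w):
    """PAA for the simple case where len(series) divides evenly by w."""
    n = len(series)
    if n % w != 0:
        raise ValueError("n/w is a fraction; fractional weighting is needed")
    size = n // w  # number of values in each portion
    return [sum(series[i:i + size]) / size for i in range(0, n, size)]

data = [4.2, 9.2, 14.8, 15, 17, 18, 19.7, 20, 20.8,
        21.3, 21.6, 20.6, 16.9, 12.8]
coarse = paa_whole(data, 2)  # 2 portions of 7 values each
fine = paa_whole(data, 7)    # 7 portions of 2 values each
```

With n = 14, w = 2 and w = 7 fall into the simple case, while w = 3 raises the error because 14/3 is a fraction.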

Creating PAA (3)
• We use the following scheme to achieve 4.6667 values in each portion
• The following is the list of indexes of the 14 values in an input series

1 2 3 4 5 6 7 8 9 10 11 12 13 14

• The first portion will have values at 1, 2, 3, and 4
• We need 0.6667 more to complete this portion
• We achieve this by inserting 0.6667 times the 5th value
• The remaining 0.3333 times the 5th value is inserted into the second portion

Creating PAA (4)
• Using the above scheme our three lists are
• 4.2, 9.2, 14.8, 15 and 0.6667*17
• 0.3333*17, 18, 19.7, 20, 20.8, 0.3333*21.3
• 0.6667*21.3, 21.6, 20.6, 16.9, 12.8
• (Note: here we have shown the values from the un-normalized input series)
• Each of the above sublists holds an equal share (4.6667 values) of the input series
• Next for each of the sublists compute the average (mean)
• In our case, three sublists will each have an average value
• PAA is simply a vector of these average values
• {avg1, avg2, avg3}
• {-0.9338,0.53135,0.34767} for our example (using normalized values)
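
One way to sketch the fractional scheme in Python is shown below. It is an illustration under assumptions, not the lecture's implementation: the function name paa is hypothetical, and instead of multiplying by 0.6667 and 0.3333 it conceptually repeats every value w times and cuts the result into w chunks of n entries, which gives the same weights using whole-number indexing.

```python
def paa(series, w):
    """Piecewise aggregate approximation that also handles fractional n/w.

    Each value is treated as w equal slices; portion i takes slices
    i*n .. (i+1)*n - 1, so a value on a boundary is shared between two
    neighbouring portions in the right proportion.
    """
    n = len(series)
    return [
        sum(series[k // w] for k in range(i * n, (i + 1) * n)) / n
        for i in range(w)
    ]

data = [4.2, 9.2, 14.8, 15, 17, 18, 19.7, 20, 20.8,
        21.3, 21.6, 20.6, 16.9, 12.8]
averages = paa(data, 3)  # roughly [11.69, 19.56, 18.45] on the un-normalized data
```

For w = 3 the first portion receives the first four values in full plus two thirds of the fifth, exactly as in the sublists above. Applied to a normalized series, the exact figures depend on how the standard deviation is computed, so they need not match the quoted PAA vector digit for digit.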

Properties of PAA
• PAA is simple to compute (as can be seen from the previous slides)
• Achieves dimensionality reduction
• From 14 values our input series is reduced to 3 values
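• The (suitably scaled) distance between two PAA vectors never exceeds the Euclidean distance between the original series
• This is the lower bounding distance property: series that are far apart in PAA are guaranteed to be far apart in the original data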
• Very useful property for a structural representation
• Allows data analysis to be performed on the approximate representation rather than the original series
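
The lower bounding property can be checked numerically with a small sketch. The two series x and y are made-up illustration data, the helper names are hypothetical, and the sketch covers only the simple case where n/w is a whole number; the PAA distance is scaled by sqrt(n/w) as in the usual PAA formulation.

```python
import math

def paa_whole(series, w):
    """PAA for the simple case where len(series) divides evenly by w."""
    size = len(series) // w
    return [sum(series[i:i + size]) / size
            for i in range(0, len(series), size)]

def paa_distance(x, y, w):
    """Distance between two series computed on their PAA vectors."""
    n = len(x)
    return math.sqrt(n / w) * math.dist(paa_whole(x, w), paa_whole(y, w))

# Hypothetical series, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 0.0, 4.0, 1.0, 7.0, 3.0]

true_distance = math.dist(x, y)        # Euclidean distance on the raw series
lower_bound = paa_distance(x, y, 3)    # distance on the 3-value PAA vectors
```

Here lower_bound comes out smaller than true_distance, so any pair that is already too far apart in PAA can be discarded without looking at the original series.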

Symbol Mapping
• In this step, each average value from the PAA vector is replaced by a symbol from an alphabet
• An alphabet size, a, of 5 to 8 is recommended
• a,b,c,d,e
• a,b,c,d,e,f
• a,b,c,d,e,f,g
• a,b,c,d,e,f,g,h
• Given an average value we need a symbol
• This is achieved by using the normal distribution from statistics
• Because our input series is normalized we can use normal distribution as the data model
• We divide the area under the normal distribution into ‘a’ equal sized areas where a is the alphabet size
• Each such area is bounded by breakpoints
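
The breakpoints can be derived rather than looked up. The sketch below uses NormalDist from the Python standard library; the function name breakpoints is hypothetical.

```python
from statistics import NormalDist

def breakpoints(a):
    """Return the a - 1 cut points that split the standard normal
    distribution into a regions of equal area (probability 1/a each)."""
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    return [standard_normal.inv_cdf(i / a) for i in range(1, a)]

cuts_3 = breakpoints(3)  # about -0.43 and 0.43
cuts_5 = breakpoints(5)  # about -0.84, -0.25, 0.25 and 0.84
```

Because every region has the same area, each symbol is equally likely to occur when the normalized data really does follow a normal distribution.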

Symbol mapping - breakpoints
• Breakpoints for different alphabet sizes can be structured as a lookup table
• When a=3
• Average values below -0.43 are replaced by ‘A’
• Average values between -0.43 and 0.43 are replaced by ‘B’
• Average values above 0.43 are replaced by ‘C’
• Using the lookup table with a=5 (breakpoints -0.84, -0.25, 0.25 and 0.84), SAX for our input series is ‘ADD’; with a=3 it would be ‘ACB’
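
The symbol mapping can be sketched with a binary search over the breakpoints. The function names are hypothetical, lowercase letters are used for the symbols, and sending a value that falls exactly on a breakpoint to the higher symbol is an assumption, since the slides do not define that case.

```python
from bisect import bisect_right
from statistics import NormalDist
from string import ascii_lowercase

def breakpoints(a):
    """Cut points splitting the standard normal into a equal-area regions."""
    return [NormalDist().inv_cdf(i / a) for i in range(1, a)]

def to_sax(paa_values, a):
    """Replace each PAA average by the symbol of the region it falls in."""
    cuts = breakpoints(a)
    # bisect_right counts how many breakpoints lie at or below the value,
    # which is the index of the symbol for that region
    return "".join(ascii_lowercase[bisect_right(cuts, v)] for v in paa_values)

paa_vector = [-0.9338, 0.53135, 0.34767]  # PAA of the example series
word_3 = to_sax(paa_vector, 3)  # 'acb'
word_5 = to_sax(paa_vector, 5)  # 'add'
```

The same PAA vector yields different strings for different alphabet sizes, so the alphabet size must be fixed before SAX strings are compared.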

SAX Computation – in pictures

[Figure: a time series (x-axis 0 to 120) with its PAA segments mapped to the symbols a, b and c; the resulting SAX string is baabccbc]

This slide taken from Eamonn’s Tutorial on SAX

Data Analysis using SAX
• A general approach is to convert time series into SAX
• Use SAX representations to train Markov models (details not here) on normal data
• The model captures the probabilities of normal patterns
• The trained models are then used to test incoming data for known and unknown patterns
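
As a rough illustration of this approach, the sketch below trains a first-order Markov chain on SAX strings and scores new strings against it. Everything here is an assumption made for illustration: the slides give no details of the models used, the training strings are made up, the function names are hypothetical, and add-one smoothing is just one simple way to avoid zero probabilities.

```python
import math
from collections import Counter, defaultdict

def train_markov(words, alphabet):
    """Estimate symbol-to-symbol transition probabilities from SAX strings."""
    counts = defaultdict(Counter)
    for word in words:
        for prev, nxt in zip(word, word[1:]):
            counts[prev][nxt] += 1
    model = {}
    for prev in alphabet:
        # add-one smoothing so unseen transitions keep a small probability
        total = sum(counts[prev].values()) + len(alphabet)
        model[prev] = {nxt: (counts[prev][nxt] + 1) / total for nxt in alphabet}
    return model

def avg_log_likelihood(model, word):
    """Average log-probability per transition; needs at least two symbols."""
    steps = list(zip(word, word[1:]))
    return sum(math.log(model[p][n]) for p, n in steps) / len(steps)

# Hypothetical SAX strings standing in for "normal" data
normal_words = ["baabccbc", "baabccbb", "baabcbbc"]
model = train_markov(normal_words, "abc")

score_known = avg_log_likelihood(model, "baabccbc")
score_unknown = avg_log_likelihood(model, "acacacac")
```

A string whose transitions were common in the training data gets a higher score than one made of transitions the model has never seen, which is the basis for flagging unknown patterns.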

Visualization using SAX
• Given a SAX representation
• count the frequencies of patterns (substrings) of the required length and
• use them to color code a mosaic for visualizing time series
• For example, given ‘baabccbc’ as the SAX representation
• We calculate the frequencies of substrings of length 1 and represent them in a mosaic
• Visualizations for substrings of length > 1 are possible (please refer to the SAX site)

[Figure: the three steps of building the mosaic: mark frequencies, normalize, color code cells]
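
The counting and normalizing steps can be sketched as follows. The function names are hypothetical, and dividing by the largest count is an assumed normalization; the slides do not say which one is used. Mapping the resulting values to actual colors is left out.

```python
from collections import Counter

def pattern_frequencies(sax_word, length=1):
    """Count every substring of the given length in a SAX string."""
    return Counter(
        sax_word[i:i + length] for i in range(len(sax_word) - length + 1)
    )

def normalize(freqs):
    """Scale counts to the range 0..1 (assumption: divide by the largest count)."""
    peak = max(freqs.values())
    return {pattern: count / peak for pattern, count in freqs.items()}

freqs = pattern_frequencies("baabccbc")      # a: 2, b: 3, c: 3
shades = normalize(freqs)                    # one value per mosaic cell
pairs = pattern_frequencies("baabccbc", 2)   # substrings of length 2
```

Each normalized value would then pick the color of one mosaic cell, so two series can be compared at a glance by comparing their mosaics.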

Summary
• Structural representations help in understanding time series through
• Data analysis + Visualization
• SAX is claimed to be a landmark representation of time series
• Symbolic and therefore allows use of discrete data structures and their corresponding algorithms for analysis
• Also helps with visualization
