The Automatic Explanation of Multivariate Time Series (MTS)



Allan Tucker

The Problem - Data

  • Datasets which are Characteristically:

    • High Dimensional MTS

    • Large Time Lags

    • Changing Dependencies

    • Little or No Available Expert Knowledge


The Problem - Requirement

  • Lack of Algorithms to Assist Users in Explaining Events that:

    • Model Complex MTS Data

    • Are Learnable from Data with Little or No User Intervention

    • Remain Transparent Throughout the Learning and Explaining Process


Contribution to Knowledge

  • Using a Combination of Evolutionary Programming (EP) and Bayesian Networks (BNs) to Overcome Issues Outlined

  • Extending Learning Algorithms for BNs to Dynamic Bayesian Networks (DBNs) with Comparison of Efficiency

  • Introduction of an Algorithm for Decomposing High Dimensional MTS into Several Lower Dimensional MTS


Contribution to Knowledge (Continued)

  • Introduction of New EP-Seeded GA Algorithm

  • Incorporating Changing Dependencies

  • Application to Synthetic and Real-World Chemical Process Data

  • Transparency Retained Throughout Each Stage


Framework

[Figure: framework diagram. Box labels: Pre-processing, Data Preparation, Variable Groupings, Model Building, Search Methods, Evaluation, Synthetic Data, Real Data, Changing Dependencies, Explanation]


Key Technical Points 1: Comparing Adapted Algorithms

  • New Representation

  • K2/K3 [Cooper and Herskovits]

  • Genetic Algorithm [Larranaga]

  • Evolutionary Algorithm [Wong]

  • Branch and Bound [Bouckaert]

  • Log Likelihood / Description Length

  • Publications:

    • International Journal of Intelligent Systems, 2001


Key Technical Points 2: Grouping

  • A Number of Correlation Searches

  • A Number of Grouping Algorithms

  • Designed Metrics

  • Comparison of All Combinations

  • Synthetic and Real Data

  • Publications:

    • IDA99

    • IEEE Transactions on Systems, Man, and Cybernetics, 2001

    • Expert Systems 2000


Key Technical Points 3: EP-Seeded GA

  • Approximate Correlation Search Based on the One Used in Grouping Strategy

  • Results Used to Seed Initial Population of GA

  • Uniform Crossover

  • Specific Lag Mutation

  • Publications:

    • Genetic and Evolutionary Computation Conference 1999 (GECCO99)

    • International Journal of Intelligent Systems, 2001

    • IDA2001
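The seeding, uniform crossover, and lag mutation steps listed on this slide can be illustrated with a minimal Python sketch. It assumes each individual is a fixed-length list of (a, b, l) triples and that "Specific Lag Mutation" means redrawing only the lag of a triple; neither is confirmed by the slides. The function names and the example triples are hypothetical. The values 10, 0.1, and 60 echo GAPopSize, LMR, and MaxT from the other slides, with LMR read as the lag mutation rate as an assumption.

```python
import random

def seed_population(ep_list, pop_size, chrom_len, rng):
    """Build each individual from triples drawn (without repeats) from the EP list."""
    return [rng.sample(ep_list, chrom_len) for _ in range(pop_size)]

def uniform_crossover(p1, p2, rng):
    """Swap the genes at each position between the parents with probability 0.5."""
    c1, c2 = [], []
    for g1, g2 in zip(p1, p2):
        if rng.random() < 0.5:
            g1, g2 = g2, g1
        c1.append(g1)
        c2.append(g2)
    return c1, c2

def lag_mutation(chrom, max_lag, rate, rng):
    """Assumed operator: redraw only the lag of a triple with probability `rate`."""
    return [
        (a, b, rng.randint(1, max_lag)) if rng.random() < rate else (a, b, l)
        for a, b, l in chrom
    ]

rng = random.Random(0)
# Made-up (a, b, l) triples standing in for the final EPList.
ep_list = [(0, 1, 2), (2, 3, 5), (1, 4, 1), (3, 0, 7), (4, 2, 3)]
population = seed_population(ep_list, pop_size=10, chrom_len=3, rng=rng)
child1, child2 = uniform_crossover(population[0], population[1], rng)
mutant = lag_mutation(child1, max_lag=60, rate=0.1, rng=rng)
```

Because the mutation touches only the lag, the variable pair found by the correlation search is preserved while its time offset is refined.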


Key Technical Points 4: Changing Dependencies

  • Dynamic Cross Correlation Function for Analysing MTS

  • Extend Representation and Introduce a Heuristic Search: Hidden Controller Hill Climb (HCHC)

    • Hidden Variables to Model State of the System

    • Search for Structure and Hidden States Iteratively


Future Work

  • Parameter Estimation

  • Discretisation

  • Changing Dependencies

  • Efficiency

  • New Datasets

    • Gene Expression Data

    • Visual Field Data


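DBN Representation

[Figure: example DBN over variables a0 to a4 across time slices t-4, t-3, t-2, t-1, t. Nodes shown: a0(t), a1(t), a2(t-2), a2(t), a3(t-4), a3(t-2), a3(t), a4(t-3), a4(t). Links are listed as triples: (3,1,4), (4,2,3), (2,3,2), (3,0,2), (3,4,2).]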
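The triples on this slide appear to encode each link as (parent, child, lag): for example, (3,1,4) matches the nodes a3(t-4) and a1(t). That reading is an assumption inferred from the node labels, and the helper names below are hypothetical. A minimal Python sketch of decoding such a list:

```python
from typing import List, Tuple

# Assumed reading of the slide: (parent, child, lag).
Triple = Tuple[int, int, int]

def describe_links(triples: List[Triple]) -> List[str]:
    """Render each triple as a lagged parent pointing at the child at time t."""
    return [f"a{p}(t-{lag}) -> a{c}(t)" for p, c, lag in triples]

def max_lag(triples: List[Triple]) -> int:
    """Largest time lag used by any link in the network."""
    return max(lag for _, _, lag in triples)

dbn = [(3, 1, 4), (4, 2, 3), (2, 3, 2), (3, 0, 2), (3, 4, 2)]
links = describe_links(dbn)
deepest = max_lag(dbn)
```

Under this reading, the five triples reproduce exactly the lagged nodes drawn in the figure (a3(t-4), a4(t-3), a2(t-2), a3(t-2)), and the largest lag of 4 matches the earliest time slice shown.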


Sample DBN Search Results

N = 5, MaxT = 10

N = 10, MaxT = 60


Grouping

[Figure: grouping pipeline. One High Dimensional MTS (A) → 1. Correlation Search (EP) → List R of (a, b, lag) triples, indexed 1, 2, ... R → 2. Grouping Algorithm (GGA) → Groups G, e.g. {0,3}, {1,4,5}, {2} → Several Lower Dimensional MTS]
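The last step of this slide, turning one high dimensional MTS into several lower dimensional MTS once the groups G are known, can be sketched in a few lines of Python. This covers only the decomposition, not the correlation search or the grouping algorithm itself. The function name and the toy data are hypothetical, and the series is assumed to be stored as one row per time point.

```python
def split_by_groups(mts, groups):
    """Return one sub-series per group, keeping only that group's variables.

    mts: list of rows, one row per time point, one column per variable.
    groups: iterable of sets of variable indices.
    """
    return [[[row[v] for v in sorted(g)] for row in mts] for g in groups]

# Toy series: 4 time points, 6 variables; the value encodes (time, variable).
mts = [[10 * t + v for v in range(6)] for t in range(4)]
groups = [{0, 3}, {1, 4, 5}, {2}]  # the example groups shown on the slide
parts = split_by_groups(mts, groups)
```

Each element of `parts` is a smaller MTS that can be modelled separately, which is what keeps the later DBN search tractable.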


Sample Grouping Results

Original Synthetic MTS Groupings:

  • 0 1 2
  • 3 4 5 6 7
  • 8 9 10 11 12
  • 13 14 15 16 17 18 19 20 21 22
  • 23 24 25 26 27 28 29 30 31 32
  • 33 34 35 36 37 38 39 40 41 42
  • 43 44 45 46 47 48 49
  • 50 51 52 53 54 55
  • 56 57 58
  • 59 60

Groupings Discovered from Synthetic Data:

  • 0 6
  • 1
  • 2
  • 3 4 5 7
  • 8
  • 9 10
  • 11 12
  • 13
  • 14 15 20 21 22
  • 16 17 18 19
  • 23 24 25 26 27 28 29 30 31 32
  • 33 34 35 36 37 38 39 40 41 42
  • 43 44 45 46 47 48 49
  • 50 51 52 53 54 55
  • 56 57 58
  • 59 60

[Figure: Sample of Variables from a Discovered Oil Refinery Data Group]


Parameter Estimation

  • Simulate Random Bag (Vary R, s and c, e)

  • Calculate Mean and SD for Each Distribution (the Probability of Selecting e from s)

  • Test for Normality (Lilliefors’ Test)

  • Symbolic Regression (GP) to Determine the Function for Mean and SD from R, s and c (e will be Unknown)

  • Place Confidence Limits on the P(Number of Correlations Found e)


EP-Seeded GA

[Figure: EP-Seeded GA. The EP produces the Final EPList, with entries 0, 1, 2, ... EPListSize, each a triple (a,b,l). The list seeds the Initial GAPopulation, with individuals 0, 1, 2, ... GAPopsize, each a list of triples ((a,b,l),(a,b,l)…(a,b,l)). The GA then produces the DBN.]


EP-Seeded GA Results

N = 10, MaxT = 60

N = 20, MaxT = 60



[Explanation table with columns Time and Explanation]

Time points: t, t-1, t-11, t-13, t-16, t-20, t-60

  • P(TT in state_0) = 1.0
  • P(TGF in state_0) = 1.0
  • P(BPF in state_3) = 1.0
  • P(TGF in state_3) = 1.0
  • P(TT in state_1) = 0.446
  • P(SOT in state_0) = 0.314
  • P(C2% in state_0) = 0.279
  • P(T6T in state_0) = 0.347
  • P(RinT in state_0) = 0.565


Changing Dependencies

[Figure: A/M_GB and TGF plotted as Variable Magnitude against Time (Minutes)]


Dynamic Cross-Correlation Function


Hidden Variable - OpState

[Figure: DBN fragment with nodes a0(t-4), a2(t-1), a3(t-2), a2(t) and the hidden variable OpState2, across time slices t-4, t-3, t-2, t-1, t]


Hidden Controller Hill Climb

[Figure: HCHC loop over < DBN_List > and < Segment_Lists >. Steps shown: Update Segment_Lists through Op_State Parameter Estimation; Score; Update DBN_List through DBN Structure Search]


HCHC Results - Oil Refinery Data


HCHC Results - Synthetic Data

Generate Data from Several DBNs

Append each Section of Data Together to Form One MTS with Changing Dependencies

Run HCHC
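The first two steps (generate data from several DBNs, then append the sections) can be illustrated with a small Python sketch. The generator below is a deliberately simple stand-in and not the one used in the presentation: it assumes binary variables, at most one parent per child, (parent, child, lag) triples, and a child that copies its lagged parent unless flipped by noise. All names and the two example networks are hypothetical.

```python
import random

def generate_segment(links, n_vars, length, max_lag, noise, rng):
    """Generate `length` rows of binary data from a list of (parent, child, lag) links.

    A child copies its lagged parent and is flipped with probability `noise`;
    variables without a parent are uniform random bits. If a child appears in
    several links, only the last one is used (a simplification).
    """
    parents = {c: (p, lag) for p, c, lag in links}
    # Random warm-up rows so that lagged lookups are defined from the start.
    rows = [[rng.randint(0, 1) for _ in range(n_vars)] for _ in range(max_lag)]
    for t in range(max_lag, max_lag + length):
        row = []
        for v in range(n_vars):
            if v in parents:
                p, lag = parents[v]
                val = rows[t - lag][p]
                if rng.random() < noise:
                    val = 1 - val
            else:
                val = rng.randint(0, 1)
            row.append(val)
        rows.append(row)
    return rows[max_lag:]  # drop the warm-up rows

rng = random.Random(0)
dbn_a = [(0, 1, 2)]  # a1(t) depends on a0(t-2)
dbn_b = [(2, 1, 3)]  # a1(t) depends on a2(t-3)
segment_a = generate_segment(dbn_a, n_vars=3, length=200, max_lag=3, noise=0.0, rng=rng)
segment_b = generate_segment(dbn_b, n_vars=3, length=200, max_lag=3, noise=0.0, rng=rng)
mts = segment_a + segment_b  # one MTS whose dependencies change at row 200
```

With `noise=0.0` the dependency is exact within each section, so a search method that recovers a0 at lag 2 in the first half and a2 at lag 3 in the second half has found the change point.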


[Explanation table with columns Time and Explanation]

Time points: t, t-1, t-3, t-5, t-6, t-9

  • P(OpState1 is 0) = 1.0
  • P(a1 is 0) = 1.0
  • P(a0 is 0) = 1.0
  • P(a2 is 1) = 1.0
  • P(OpState1 is 0) = 1.0
  • P(a1 is 1) = 1.0
  • P(a0 is 0) = 1.0
  • P(a2 is 1) = 1.0
  • P(a2 is 0) = 0.758
  • P(OpState0 is 0) = 0.519
  • P(a0 is 0) = 0.968
  • P(OpState0 is 0) = 0.720
  • P(a0 is 1) = 0.778
  • P(a2 is 0) = 0.545
  • P(a0 is 1) = 0.517


[Explanation table with columns Time and Explanation]

Time points: t, t-1, t-3, t-5, t-6, t-7, t-9

  • P(OpState1 is 4) = 1.0
  • P(a1 is 0) = 1.0
  • P(a0 is 0) = 1.0
  • P(a2 is 1) = 1.0
  • P(OpState1 is 4) = 1.0
  • P(a1 is 1) = 1.0
  • P(a0 is 0) = 1.0
  • P(a2 is 1) = 1.0
  • P(a2 is 1) = 0.570
  • P(a0 is 0) = 0.506
  • P(OpState2 is 3) = 0.210
  • P(a2 is 1) = 0.974
  • P(OpState2 is 4) = 0.222
  • P(a2 is 0) = 0.882
  • P(a0 is 1) = 0.549


Process Diagram

[Figure: process diagram. Variable labels: TGF, %C3, TT, T6T, PGM, PGB, SOTT11, SOFT13, RINT, C11/3T, T36T, AFT, FF, RBT, BPF, %C2]


Typical Discovered Relationships

[Figure: discovered relationships drawn over the process diagram. Variable labels: TGF, %C3, PGM, TT, T6T, PGB, SOTT11, SOFT13, RINT, C11/3T, T36T, AFT, FF, RBT, BPF, %C2]


Parameters

DBN Search

             GA             EP
PopSize      100            10
MR           0.1            0.8
CR           0.8            ---
Gen          Based on FC    Based on FC

Correlation Search

c - Approx. 20% of s

R - Approx. 2.5% of s

Grouping GA

             Synth. 1    Synth. 2-6            Oil
PopSize      150         100                   150
CR           0.8         0.8                   0.8
MR           0.1         0.1                   0.1
Gen          150         100 (1000 for GPV)    150


Parameters (Continued)

EP-Seeded GA

c - Approx. 20% of s

EPListSize - Approx. 2.5% of s

GAPopSize - 10

MR - 0.1

CR - 0.8

LMR - 0.1

Gen - Based on FC

HCHC

                  Oil       Synthetic
DBN_Iterations    1×10^6    5000
Winlen            1000      200
Winjump           500       50