
Allan Tucker - Birkbeck College Stephen Swift - Brunel University Nigel Martin - Birkbeck College

Grouping Multivariate Time Series Variables: Applications to Chemical Process and Visual Field Data. Allan Tucker - Birkbeck College Stephen Swift - Brunel University Nigel Martin - Birkbeck College Xiaohui Liu - Brunel University. Introduction.





Presentation Transcript


  1. Grouping Multivariate Time Series Variables: Applications to Chemical Process and Visual Field Data Allan Tucker - Birkbeck College Stephen Swift - Brunel University Nigel Martin - Birkbeck College Xiaohui Liu - Brunel University

  2. Introduction • Present a methodology to group Multivariate Time Series (MTS) variables • An MTS is a series of observations of several variables recorded over time • Test on two real-world applications • Grouping - partitioning a set of objects into a number of mutually exclusive subsets • Many grouping problems, if not all, are NP-Hard
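
To give a sense of why exhaustive search over groupings is impractical, the number of ways to partition n objects into mutually exclusive, non-empty subsets is the Bell number B(n). The sketch below computes it with the Bell triangle. The function name `bell_number` is illustrative and does not come from the presentation.

```python
def bell_number(n):
    """Number of ways to partition a set of n objects into non-empty groups."""
    row = [1]  # first row of the Bell triangle
    for _ in range(n):
        # Each new row starts with the last entry of the previous row;
        # every further entry adds the entry to its left and the one above-left.
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[0]


if __name__ == "__main__":
    for n in (3, 8, 10):
        print(n, bell_number(n))
```

Eight variables, as in the chromosome example later in the deck, already admit 4140 distinct groupings, and the count grows very quickly with the number of variables.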

  3. MTS Example

  4. Grouping MTS - Introduction • Desirable to model MTS as a group of several smaller dimensional MTS • Decompose MTS into several smaller dimensional MTS based on dependencies in data • Large number of dependencies because one variable may affect another after a certain time lag

  5. Grouping MTS - Methodology • Input: One High Dimensional MTS (X) • 1. Correlation Search (EP) produces Q, a list of Qlen triples: (xa, xb, lag), (xc, xd, lag), ..., (xe, xf, lag) • 2. Grouping Algorithm (GGA) takes Q and produces G, e.g. {{0,3}, {1,4,5}, {2}} • Output: Several Lower Dimensional MTS

  6. Correlation Search • Spearman’s Rank Correlation used • Entire search space is too large to search exhaustively • Invalid triples: • Autocorrelations • Duplicates irrespective of direction where lag = 0, e.g. (xi, xj, 0) and (xj, xi, 0) • Evolutionary Programming approach found to be the most efficient
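
As a rough illustration of the ingredients named on this slide, the sketch below computes Spearman's rank correlation between two variables at a given time lag and checks whether a candidate triple is valid. It is not the authors' implementation: the function names, the layout of `series` as a list of per-variable lists, and the convention that a lag shifts the second variable forward in time are all assumptions, and the Evolutionary Programming search itself is not shown.

```python
from statistics import mean


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def lagged_spearman(series, i, j, lag):
    """Correlate variable i at time t with variable j at time t + lag (lag >= 0).

    The direction of the lag is an assumed convention.
    """
    x, y = series[i], series[j]
    n = len(x) - lag
    return spearman(x[:n], y[lag:lag + n])


def is_valid_triple(i, j, lag):
    """Reject autocorrelations; at lag 0 keep only one direction of each pair."""
    if i == j:
        return False
    if lag == 0 and i > j:
        return False
    return True
```

A search procedure would only score triples for which `is_valid_triple` returns `True`, which avoids evaluating both (xi, xj, 0) and (xj, xi, 0).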

  7. Grouping Genetic Algorithm - Representation and Operators • Previously compared and contrasted different GA representations and operators • Falkenauer’s Crossover & Mutation ensure Schema Theory holds for grouping problems • Example: Group 0 = {0, 3, 4}, Group 1 = {1, 2, 6}, Group 2 = {5, 7} • Chromosome: 0 1 1 0 0 2 1 2 : 0 1 2
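
The chromosome on this slide can be read as two parts separated by a colon: one group label per variable, followed by the list of groups in use. The sketch below decodes that string into explicit groups. The function name `decode_chromosome` and the string format are assumptions made for illustration, and the crossover and mutation operators are not shown.

```python
def decode_chromosome(chromosome):
    """Turn 'labels : groups' into a dict mapping group id -> list of variables."""
    object_part, group_part = chromosome.split(":")
    # Position k in the object part holds the group label of variable k.
    assignments = [int(label) for label in object_part.split()]
    groups = {int(label): [] for label in group_part.split()}
    for variable, group in enumerate(assignments):
        groups[group].append(variable)
    return groups


if __name__ == "__main__":
    print(decode_chromosome("0 1 1 0 0 2 1 2 : 0 1 2"))
    # prints {0: [0, 3, 4], 1: [1, 2, 6], 2: [5, 7]}
```

The decoded groups match the Group 0, Group 1 and Group 2 memberships shown on the slide.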

  8. Grouping- The Grouping Metric Properties • If Q is empty, then fitness maximised when each variable is in a separate group • If Q contains all pairings of variables (the entire search space), then fitness maximised when all variables in the same group • If data is from mixed set of MTS, fitness maximised when variables in the same group have as many correlations as possible in Q and variables in different groups have as few correlations as possible in Q
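
The slide lists properties of the metric but not its formula. The sketch below is a hypothetical scoring function, not the authors' metric, chosen only because it satisfies the first two properties and rewards the behaviour described in the third: each pair of variables in the same group scores +1 if the pair appears in Q and -1 otherwise. The name `grouping_score` and the representation of Q as a list of (a, b, lag) triples are assumptions.

```python
from itertools import combinations


def grouping_score(groups, q_triples):
    """Hypothetical fitness: +1 per same-group pair in Q, -1 per same-group pair not in Q."""
    # Lags are ignored here; a pair counts as correlated if it appears in Q at any lag.
    q_pairs = {frozenset((a, b)) for a, b, _lag in q_triples}
    score = 0
    for group in groups:
        for a, b in combinations(group, 2):
            score += 1 if frozenset((a, b)) in q_pairs else -1
    return score


if __name__ == "__main__":
    variables = [0, 1, 2, 3]
    all_pairs = [(a, b, 0) for a, b in combinations(variables, 2)]
    print(grouping_score([[0], [1], [2], [3]], []))    # empty Q, all separate
    print(grouping_score([[0, 1, 2, 3]], all_pairs))   # full Q, one group
```

With an empty Q every same-group pair is penalised, so singleton groups score highest. When Q contains every pairing, each same-group pair is rewarded, so a single group scores highest.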

  9. Oil Refinery Data • Oil Refinery Process in Scotland • Data recorded every minute • Hundreds of variables • Years of data available on repository • Selected 50 interrelated variables over 10000 time points • Large Time Lags (up to 120 minutes between some variables)

  10. Visual Field Data • The interval between tests is about 6 months • Typically, 76 points are measured • Values range between 60 (very good) and 0 (blind) • The number of tests can range between 10 and 44 • [Figure: visual field map of the right eye, with each point labelled by its nerve fibre bundle and the usual position of the blind spot marked B]

  11. Oil Refinery Data - Results (1) • Very rapid generation of Groups (seconds) • 3 major groups discovered, 2 relating to the upper and lower trays of the column • Most of the single variables appear noisy • Used as a method for pre-processing data before model building where time is short

  12. Oil Refinery Data - Results (2)

  13. Visual Field Data - Results (1) - Patient Group Comparison • Patients are ordered on Average Sensitivity: Patient 1 has the lowest and Patient 82 the highest • Graph goes from light (BRHC) to dark (TLHC)

  14. Visual Field Data - Results (2) • High Sensitivity implies similar groups • Small groups in general • Points in the eye will be associated with similar nerve fibre bundles • Low Sensitivity implies dissimilar groups • Large groups in general • Different areas of the visual field may be deteriorating

  15. Conclusions • Decomposing large, high-dimensional MTS is a challenging task • Proposed methodology very encouraging • Oil Refinery Data: 3 relatively independent sub-systems rapidly identified • Visual Field Data: Discovered groups offer ideal starting point for modelling as a VAR process

  16. Future Work • Experimenting with new datasets • Gene Expression Data • EEG Data • Determining the ideal Parameters • e.g. Qlen is very influential on final groupings • Combining the two stages - correlation search and grouping into one incremental process

  17. Acknowledgements • Engineering and Physical Sciences Research Council, UK • Moorfields Eye Hospital, UK • Honeywell Technology Centre, USA • Honeywell Hi-Spec Solutions, UK • BP-Amoco, UK
