Making Time: Pseudo Time-Series for the Temporal Analysis of Cross-Section Data

Emma Peeling, Allan Tucker

Centre for Intelligent Data Analysis

Brunel University

West London

- Studies often involve data sampled from a cross-section of a population
- Especially in biological and medical studies
- Collecting medical information on patients suffering from a particular disease and controls (healthy)

- Essentially these studies show a “snapshot” of the disease process

- Many processes are inherently temporal in nature
- Previously healthy people can develop a disease over time going through different stages of severity
- If we want to model the development of such processes, we usually require longitudinal data

[Figure: timeline from onset through disease progression, contrasting a longitudinal study with a cross-section study]

- In this presentation we explore:
- Ordering data based upon Minimum Spanning Trees & PQ-Trees (Rifkin et al. 2000)
- Treating this ordered data as “Pseudo Time-Series”
- Using Pseudo Time-Series to build temporal models
- Test using a dynamic Bayesian network model for classifying:
- Medical Data
- Gene Expression Data

- Multidimensional scaling (MDS) can be used to visualise distances between data points and pathways
- Here we use classic MDS
- Metric-based – Euclidean Distance
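
As a rough illustration of classic MDS on Euclidean distances, the sketch below double-centres the squared distance matrix and keeps the leading eigenvectors. The function name `classical_mds` and the toy `samples` are assumptions made for this example, not taken from the presentation.

```python
import numpy as np

def classical_mds(points, n_components=2):
    """Classical (Torgerson) MDS from Euclidean distances between rows of `points`."""
    X = np.asarray(points, dtype=float)
    # Squared Euclidean distance between every pair of samples.
    diff = X[:, None, :] - X[None, :, :]
    D2 = (diff ** 2).sum(axis=-1)
    n = D2.shape[0]
    # Double-centre the squared distances to obtain a Gram matrix.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ D2 @ J
    # Keep the eigenvectors belonging to the largest eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_components]
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale

# Hypothetical toy data: four samples with three features each.
samples = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [2.0, 1.0, 0.0]]
embedding = classical_mds(samples, n_components=2)
```

Each row of `embedding` gives the low-dimensional coordinates of one sample, which can then be plotted to inspect distances and candidate pathways.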

- A minimum spanning tree (MST) connects all nodes in a graph
- Its links have minimal total weight

[Figure: a weighted graph and its MST]
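
One way to build such a tree is Prim's algorithm on the complete graph whose edge weights are Euclidean distances between samples. The sketch below is a minimal version under that assumption; the slides do not say which MST algorithm was used, and the names `minimum_spanning_tree` and `points` are hypothetical.

```python
import math

def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns a list of (i, j, weight) edges linking all points.
    """
    n = len(points)
    # best[j] = (weight, i): cheapest known link from the tree to node j.
    best = {j: (math.dist(points[0], points[j]), 0) for j in range(1, n)}
    edges = []
    while best:
        # Attach the node that is currently cheapest to reach.
        j = min(best, key=lambda k: best[k][0])
        weight, i = best.pop(j)
        edges.append((i, j, weight))
        # The new node may offer cheaper links to the remaining nodes.
        for k in best:
            d = math.dist(points[j], points[k])
            if d < best[k][0]:
                best[k] = (d, j)
    return edges

# Hypothetical toy data: four samples in two dimensions.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (2.0, 1.0)]
mst_edges = minimum_spanning_tree(points)
```

For n samples the result always has n - 1 edges, and no other spanning tree over the same distances has a smaller total weight.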

- PQ-Trees are used to encode partial orderings on variables
- P nodes: children can be in any order
- Q nodes: children order can only be reversed
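
To make the P-node and Q-node rules concrete, the sketch below enumerates every leaf ordering that a small PQ-tree permits. The nested-tuple representation `("P", [...])` / `("Q", [...])` and the function name `frontiers` are assumptions chosen for brevity; leaves may be any non-tuple value.

```python
from itertools import permutations, product

def frontiers(node):
    """Yield every leaf ordering permitted by a PQ-tree given as nested tuples."""
    if not isinstance(node, tuple):
        yield [node]  # a leaf
        return
    kind, children = node
    if kind == "P":
        # P node: children may appear in any order.
        arrangements = permutations(children)
    else:
        # Q node: children keep their order or are reversed as a whole.
        arrangements = [tuple(children), tuple(reversed(children))]
    for arrangement in arrangements:
        for parts in product(*(list(frontiers(c)) for c in arrangement)):
            yield [leaf for part in parts for leaf in part]

# Hypothetical tree: a Q node whose middle child is a P node over three leaves.
tree = ("Q", [1, ("P", [2, 3, 4]), 5])
orderings = list(frontiers(tree))
```

Here the Q node contributes 2 arrangements and the P node 6, so the tree encodes 12 orderings in total; leaf 1 can never sit between leaves 2 and 5.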

- Dynamic Bayesian network classifiers (DBNCs) are used to calculate P(C | X_t, X_{t-1})
- Here, we use the DBNC to model the Pseudo Time-Series for classifying data
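
The sketch below shows what estimating P(C | X_t, X_{t-1}) can look like for discretised data. It is a plain frequency table with additive smoothing rather than a full dynamic Bayesian network, so it should be read as an illustration of the conditional distribution only. The function names, the smoothing parameter `alpha`, and the toy sequence are all assumptions.

```python
from collections import Counter

def fit_counts(states, labels):
    """Count (x_prev, x_t, c) triples along an ordered sequence of discrete states."""
    counts = Counter()
    for t in range(1, len(states)):
        counts[(states[t - 1], states[t], labels[t])] += 1
    return counts

def posterior(counts, classes, x_prev, x_t, alpha=1.0):
    """Estimate P(C | X_t = x_t, X_{t-1} = x_prev) with additive smoothing."""
    raw = {c: counts[(x_prev, x_t, c)] + alpha for c in classes}
    total = sum(raw.values())
    return {c: value / total for c, value in raw.items()}

# Hypothetical discretised pseudo time-series with one class label per sample.
states = ["low", "low", "mid", "mid", "high", "high"]
labels = ["healthy", "healthy", "healthy", "disease", "disease", "disease"]
counts = fit_counts(states, labels)
probs = posterior(counts, ["healthy", "disease"], x_prev="mid", x_t="high")
```

Because the counts depend on which sample precedes which, the ordering imposed by the pseudo time-series directly shapes the classifier.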

- In Summary:
1: Input: Cross-section data

2: Construct weighted graph and MST

3: Construct PQ tree from MST

4: Derive Pseudo Time-Series from PQ-tree using hill-climb search on P-nodes to minimise sequence length

5: Build DBNC model using pseudo temporal ordering of samples

6: Output: Temporal model of cross-section data
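
Step 4 can be sketched as a hill-climb that keeps any change to the ordering that shortens the sequence, where sequence length is taken to be the sum of Euclidean distances between consecutive samples. As a simplifying assumption, the moves below are swaps of neighbouring samples; the method in the slides instead restricts moves to reorderings of P-node children in the PQ-tree. All names are hypothetical.

```python
import math

def sequence_length(order, points):
    """Sum of Euclidean distances between consecutive samples in the ordering."""
    return sum(math.dist(points[a], points[b]) for a, b in zip(order, order[1:]))

def hill_climb(order, points):
    """Swap neighbouring samples for as long as some swap shortens the sequence."""
    order = list(order)
    best = sequence_length(order, points)
    improved = True
    while improved:
        improved = False
        for i in range(len(order) - 1):
            candidate = order[:]
            candidate[i], candidate[i + 1] = candidate[i + 1], candidate[i]
            length = sequence_length(candidate, points)
            if length < best - 1e-12:  # accept strict improvements only
                order, best, improved = candidate, length, True
    return order, best

# Hypothetical toy data: five samples on a line, starting from a shuffled order.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
order, length = hill_climb([0, 2, 1, 3, 4], points)
```

Like any hill-climb, this stops at a local minimum, so the ordering it returns depends on the starting point.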

- B-Cell Microarray Data
- 3 classes of B-Cell data
- A number of patients
- Pre-ordered into expert pseudo time-series

- Visual Field Test Data
- One large cross-section study
- Healthy and Glaucomatous eyes
- One longitudinal study for testing the models

- Plots show
- discovered path in 3D
- Classification of B-Cell data in 2D

- Plot shows mean accuracy and variance over Cross-Validation with repeats

- Ordering Sequence length
- Biologist = 512.0506:
- 1-26
- PQ-tree = 528.9907:
- 1-6,7,9,8,11,10,12-18,26,19,21,20,22-25
- PQ-tree and hill-climb = 521.1865:
- 1-18,26,19-25
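
The orderings above use a compact range notation. A small helper such as the one below (the name `expand_ordering` is an assumption) expands them into explicit sample indices, which makes it easy to check that each is a permutation of samples 1 to 26 and to see where they depart from the biologist's ordering.

```python
def expand_ordering(spec):
    """Expand a compact ordering such as '1-18,26,19-25' into a list of sample indices."""
    order = []
    for part in spec.split(","):
        if "-" in part:
            start, end = (int(x) for x in part.split("-"))
            order.extend(range(start, end + 1))
        else:
            order.append(int(part))
    return order

biologist = expand_ordering("1-26")
pq_tree = expand_ordering("1-6,7,9,8,11,10,12-18,26,19,21,20,22-25")
pq_hill = expand_ordering("1-18,26,19-25")

# Positions (0-based) where the hill-climbed ordering differs from the biologist's.
moved = [i for i, (a, b) in enumerate(zip(biologist, pq_hill)) if a != b]
```

In the hill-climbed ordering the only departure is sample 26, which is placed between samples 18 and 19.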

- Plots show
- Path found for VF data in 3D
- Classification of VF data in 2D

- Plot shows mean accuracy and variance over Train / Test data with repeats

- Semi-Supervised Methods
- Some datapoints are labelled with classes
- These are used to assist classification of others in an incremental manner
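
The slides do not specify a particular semi-supervised algorithm. As one possible reading of "incremental", the sketch below repeatedly takes the unlabelled point closest to any labelled point and copies that point's label. The function name `propagate_labels` and the toy data are assumptions.

```python
import math

def propagate_labels(points, labels):
    """Incrementally label unlabelled points (label None), nearest-to-labelled first."""
    labels = list(labels)
    while None in labels:
        best = None  # (distance, unlabelled index, label to copy)
        for i, label_i in enumerate(labels):
            if label_i is not None:
                continue
            for j, label_j in enumerate(labels):
                if label_j is None:
                    continue
                d = math.dist(points[i], points[j])
                if best is None or d < best[0]:
                    best = (d, i, label_j)
        if best is None:
            break  # nothing is labelled, so there is nothing to propagate
        labels[best[1]] = best[2]
    return labels

# Hypothetical toy data: only the two end points carry a class label.
points = [(0.0,), (1.0,), (2.0,), (8.0,), (9.0,), (10.0,)]
initial = ["healthy", None, None, None, None, "disease"]
final = propagate_labels(points, initial)
```

Newly labelled points immediately become sources for later ones, so labels spread outward from the initially labelled samples.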

- A pseudo multivariate time-series (MTS) imposes an order on the data as well as a distance between data points
- Allows for the prediction of future states

- Cross-section data usually captures only a snapshot of a process
- Longitudinal data is usually needed to model its temporal nature
- Here we use ordering methods to create Pseudo Time-Series models
- Early results on medical and biological data are promising

- Dealing with outliers in dataspace
- Multiple trajectories (e.g. in VF data)
- Normalisation (rather than discretisation)
- Combining a number of longitudinal and cross-section studies

- Thanks to:
- David Garway-Heath, Moorfields Eye Hospital, London
- Paul Kellam, University College London