### Making Time: Pseudo Time-Series for the Temporal Analysis of Cross-Section Data

Emma Peeling, Allan Tucker

Centre for Intelligent Data Analysis

Brunel University

West London

Cross-Section Data

- Studies often involve data sampled from a cross-section of a population
- Especially in biological and medical studies
- For example, collecting medical information on patients suffering from a particular disease and on healthy controls

- Essentially these studies show a “snapshot” of the disease process

Cross-Section Data

- Many processes are inherently temporal in nature
- Previously healthy people can develop a disease over time going through different stages of severity
- If we want to model the development of such processes, we usually require longitudinal data

Pseudo Time-Series Models

- In this presentation we explore:
- Ordering data based upon Minimum Spanning Trees & PQ-Trees (Rifkin et al. 2000)
- Treating this ordered data as “Pseudo Time-Series”
- Using Pseudo Time-Series to build temporal models
- Testing with a dynamic Bayesian network model for classifying:
- Medical Data
- Gene Expression Data

Multi-Dimensional Scaling

- Can be used to visualise distance between data points and pathways
- Here we use classic MDS
- Metric-based – Euclidean Distance
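
As a reminder of what classic (Torgerson) MDS computes, the standard construction from a Euclidean distance matrix is sketched below. The symbols $D^{(2)}$, $J$, $B$, $E_k$, $\Lambda_k$ and $Y$ are notation introduced here for illustration and do not come from the slides.

```latex
% D^{(2)}: n x n matrix of squared Euclidean distances between the n samples
J = I_n - \tfrac{1}{n}\,\mathbf{1}\mathbf{1}^{\top}, \qquad
B = -\tfrac{1}{2}\, J\, D^{(2)} J, \qquad
Y = E_k\, \Lambda_k^{1/2}
```

Here $\Lambda_k$ is the diagonal matrix of the $k$ largest eigenvalues of $B$ and $E_k$ holds the corresponding eigenvectors; the rows of $Y$ are the low-dimensional coordinates ($k = 2$ or $k = 3$ for plots of the kind described in this presentation).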

PQ-Tree

- PQ-Trees are used to encode partial orderings on variables
- P nodes: children can be in any order
- Q nodes: the order of the children can only be reversed
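
To make the P-node and Q-node rules concrete, here is a minimal sketch that enumerates every leaf ordering a small PQ-tree permits. The tuple-based representation and the helper names (`leaf`, `p_node`, `q_node`, `orderings`) are assumptions made for this illustration, not part of the original work.

```python
from itertools import permutations, product

# Hypothetical tuple-based encoding of a PQ-tree, for illustration only.
def leaf(label):
    return ("leaf", label)

def p_node(*children):
    return ("P", children)

def q_node(*children):
    return ("Q", children)

def orderings(node):
    """Yield every leaf ordering that the PQ-tree rooted at `node` permits."""
    kind, payload = node
    if kind == "leaf":
        yield (payload,)
        return
    if kind == "P":
        # P node: children can be in any order.
        arrangements = permutations(payload)
    else:
        # Q node: children keep their order, or the order is reversed.
        arrangements = (payload, payload[::-1])
    for arrangement in arrangements:
        # Combine every admissible ordering of each child subtree.
        child_orderings = [list(orderings(child)) for child in arrangement]
        for parts in product(*child_orderings):
            yield tuple(label for part in parts for label in part)

# A Q node whose middle child is a P node over samples 2 and 3.
tree = q_node(leaf(1), p_node(leaf(2), leaf(3)), leaf(4))
for ordering in sorted(set(orderings(tree))):
    print(ordering)
```

The example tree admits four orderings: samples 2 and 3 may swap freely (P node), while the outer sequence can only be read forwards or backwards (Q node).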

Dynamic Bayesian Network Classifiers

- DBNCs are used to calculate: $P(C \mid X_t, X_{t-1})$
- Here, we use the DBNC to model the Pseudo Time-Series for classifying data
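
The quantity P(C | X_t, X_{t-1}) can be illustrated with a deliberately simplified stand-in: a single discretised variable and a smoothed frequency table over consecutive pairs in the pseudo-temporal order. This is not the DBNC structure or learning procedure used in the presentation; the function name `fit_conditional` and the smoothing constant `alpha` are assumptions made for this sketch.

```python
from collections import Counter

def fit_conditional(xs, cs, alpha=1.0):
    """Estimate P(C | X_t, X_{t-1}) by smoothed counting.

    xs: discretised observations, already in pseudo-temporal order
    cs: class label of each observation
    alpha: Laplace smoothing constant (an assumption of this sketch)
    """
    classes = sorted(set(cs))
    counts = Counter()
    for t in range(1, len(xs)):
        # Count (current value, previous value, current class) triples.
        counts[(xs[t], xs[t - 1], cs[t])] += 1

    def posterior(x_t, x_prev):
        weights = {c: counts[(x_t, x_prev, c)] + alpha for c in classes}
        total = sum(weights.values())
        return {c: w / total for c, w in weights.items()}

    return posterior

# Toy pseudo time-series: value 1 mostly co-occurs with the "disease" class.
xs = [0, 0, 1, 1, 1, 0]
cs = ["healthy", "healthy", "disease", "disease", "disease", "healthy"]
posterior = fit_conditional(xs, cs)
print(posterior(1, 1))
```

A real DBNC would factorise the joint distribution over many variables; the table above only shows what conditioning on both the current and the previous observation means.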

Pseudo Time-Series Models

- In summary:
1: Input: cross-section data
2: Construct a weighted graph and its minimum spanning tree (MST)
3: Construct a PQ-tree from the MST
4: Derive a Pseudo Time-Series from the PQ-tree, using hill-climb search on the P-nodes to minimise sequence length
5: Build a DBNC model using the pseudo-temporal ordering of the samples
6: Output: temporal model of the cross-section data
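
The first half of this pipeline can be sketched in a few lines. The code below builds the MST of the complete Euclidean graph with Prim's algorithm and then reads an ordering off the tree by walking its longest path (the "backbone") and visiting side branches as they are met. That traversal is a simplification standing in for the PQ-tree construction, and all function names are hypothetical.

```python
import math

def prim_mst(points):
    """Return MST adjacency lists for the complete Euclidean graph."""
    n = len(points)
    adj = {i: [] for i in range(n)}
    # best[k] = (cheapest known edge weight into the tree, tree endpoint)
    best = {i: (math.dist(points[0], points[i]), 0) for i in range(1, n)}
    while best:
        j = min(best, key=lambda k: best[k][0])
        weight, parent = best.pop(j)
        adj[parent].append((j, weight))
        adj[j].append((parent, weight))
        for k in best:
            d = math.dist(points[j], points[k])
            if d < best[k][0]:
                best[k] = (d, j)
    return adj

def farthest_path(adj, start):
    """Return the tree path from `start` to the node farthest from it."""
    stack = [(start, None, 0.0, [start])]
    far_dist, far_path = 0.0, [start]
    while stack:
        u, parent, dist, path = stack.pop()
        if dist > far_dist:
            far_dist, far_path = dist, path
        for v, w in adj[u]:
            if v != parent:
                stack.append((v, u, dist + w, path + [v]))
    return far_path

def pseudo_time_order(points):
    """Order samples along the MST backbone (simplified, no PQ-tree)."""
    adj = prim_mst(points)
    end = farthest_path(adj, 0)[-1]
    backbone = farthest_path(adj, end)  # longest path in the MST
    on_backbone = set(backbone)
    order = []

    def visit(u, parent):
        order.append(u)
        for v, _ in adj[u]:
            if v != parent and v not in on_backbone:
                visit(v, u)  # side branch hanging off the backbone

    for u in backbone:
        visit(u, None)
    return order

# Shuffled samples lying on a line: the ordering recovers the line.
points = [(3, 0), (0, 0), (1, 0), (4, 0), (2, 0)]
order = pseudo_time_order(points)
print([points[i] for i in order])
```

In the approach described above, the side branches are where P-nodes arise, and the hill-climb search then chooses among the permitted orderings; the sketch simply takes branches in the order they are encountered.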

The Datasets

- B-Cell Microarray Data
- 3 classes of B-Cell data
- A number of patients
- Pre-ordered into expert pseudo time-series

- Visual Field Test Data
- One large cross-section study
- Healthy and Glaucomatous eyes
- One longitudinal study for testing the models

B-Cell: MDS & Pseudo Time-Series

- Plots show
- discovered path in 3D
- Classification of B-Cell data in 2D

B-Cell Accuracy

- Plot shows mean accuracy and variance over Cross-Validation with repeats

Expert Knowledge

- Sequence length of each ordering:
- Biologist = 512.0506: 1-26
- PQ-tree = 528.9907: 1-6, 7, 9, 8, 11, 10, 12-18, 26, 19, 21, 20, 22-25
- PQ-tree and hill-climb = 521.1865: 1-18, 26, 19-25
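
The comparison above can be illustrated with a small sketch. It assumes that sequence length means the sum of Euclidean distances between consecutive samples in an ordering, and it uses an unconstrained hill-climb over adjacent swaps. The presentation restricts its search to the P-nodes of the PQ-tree, so this is a looser stand-in, and the function names are hypothetical.

```python
import math

def sequence_length(points, order):
    """Sum of Euclidean distances between consecutive samples in `order`."""
    return sum(math.dist(points[a], points[b])
               for a, b in zip(order, order[1:]))

def hill_climb(points, order):
    """Keep applying adjacent swaps for as long as they shorten the sequence."""
    order = list(order)
    best = sequence_length(points, order)
    improved = True
    while improved:
        improved = False
        for i in range(len(order) - 1):
            order[i], order[i + 1] = order[i + 1], order[i]
            length = sequence_length(points, order)
            if length < best - 1e-12:
                best, improved = length, True
            else:
                # The swap did not help, so undo it.
                order[i], order[i + 1] = order[i + 1], order[i]
    return order, best

# Four samples on a line, with the middle two initially out of order.
points = [(0, 0), (1, 0), (2, 0), (3, 0)]
start = [0, 2, 1, 3]
print(sequence_length(points, start))
print(hill_climb(points, start))
```

Like any hill-climb, this can stop in a local minimum, which is consistent with the hill-climbed ordering above being shorter than the raw PQ-tree ordering but still longer than the biologist's.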

Visual Field: MDS & Pseudo Time-Series

- Plots show
- Path found for VF data in 3D
- Classification of VF data in 2D

VF Accuracy

- Plot shows mean accuracy and variance over Train / Test data with repeats

Related Work

- Semi-Supervised Methods
- Some datapoints are labelled with classes
- These are used to assist classification of others in an incremental manner

- Pseudo MTS imposes an order on the data as well as a distance between data
- Allows for the prediction of future states

Conclusions

- Cross-section data usually captures only a snapshot of a process
- Longitudinal data is usually needed to model the temporal nature of a process
- Here we use ordering methods to create Pseudo Time-Series models
- Early results on medical and biological data are promising

Future Work

- Dealing with outliers in dataspace
- Multiple trajectories (e.g. in VF data)
- Normalisation (rather than discretisation)
- Combining a number of longitudinal and cross-section studies

Acknowledgements

- Thanks to:
- David Garway-Heath, Moorfields Eye Hospital, London
- Paul Kellam, University College London
