
Making Time: Pseudo Time-Series for the Temporal Analysis of Cross-Section Data

Emma Peeling, Allan Tucker

Centre for Intelligent Data Analysis

Brunel University

West London



Cross-Section Data

  • Studies often involve data sampled from a cross-section of a population

  • Especially in biological and medical studies

    • e.g. collecting medical information on patients suffering from a particular disease and on healthy controls

  • Essentially, these studies capture a “snapshot” of the disease process



Cross-Section Data

  • Many processes are inherently temporal in nature

  • Previously healthy people can develop a disease over time, passing through different stages of severity

  • To model the development of such processes, we usually require longitudinal data



Cross-Section vs Longitudinal

[Figure: Longitudinal Study vs Cross-Section Study, labelled Onset and Disease Progression]



Pseudo Time-Series Models

  • In this presentation we explore:

    • Ordering data based upon Minimum Spanning Trees & PQ-Trees (Rifkin et al. 2000)

    • Treating this ordered data as “Pseudo Time-Series”

    • Using Pseudo Time-Series to build temporal models

    • Testing the approach with a dynamic Bayesian network classifier on:

      • Medical Data

      • Gene Expression Data



Multi-Dimensional Scaling

  • Can be used to visualise the distances between data points and the pathways through them

  • Here we use classical MDS

    • Metric-based, using Euclidean distance
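As a rough illustration of classical MDS with Euclidean distance, the sketch below double-centres the squared distance matrix and takes its leading eigenvectors as coordinates. It is a stand-alone sketch, not the authors' code: `euclidean` and `classical_mds` are hypothetical names, and power iteration with deflation stands in for a proper eigensolver so that only the Python standard library is needed.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classical_mds(points, dims=2, iters=500, seed=0):
    """Classical (Torgerson) MDS: embed points so Euclidean distances are preserved.

    Power iteration with deflation replaces a full eigensolver (an assumption
    made to keep the sketch dependency-free).
    """
    n = len(points)
    # Squared pairwise Euclidean distances.
    d2 = [[euclidean(p, q) ** 2 for q in points] for p in points]
    # Double centring: B = -1/2 * J D^2 J, where J = I - (1/n) * ones.
    row_mean = [sum(r) / n for r in d2]
    col_mean = [sum(d2[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row_mean) / n
    b = [[-0.5 * (d2[i][j] - row_mean[i] - col_mean[j] + grand)
          for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        v = [rng.random() - 0.5 for _ in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # nothing left to embed in this direction
            v = [x / norm for x in w]
        # Rayleigh quotient: eigenvalue belonging to the converged vector.
        lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][k] = scale * v[i]
        # Deflate so the next pass finds the next-largest eigenvector.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords
```

For points that already lie on a line, `classical_mds([(0, 0), (3, 4), (6, 8)], dims=1)` places them 5 units apart along a single axis with the middle point at 0; the direction of that axis is arbitrary.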



Minimum Spanning Tree

  • Connects all nodes in the graph without forming any cycles

  • Chooses the links so that their total weight is minimal

[Figure: panels “Weighted Graph” and “MST”]
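Prim's algorithm is one standard way to build the MST of the complete graph whose edge weights are the distances between samples. The sketch below is an assumption about how step 2 of the pipeline could be carried out, not the authors' implementation; `prim_mst` is a hypothetical name and the input is a precomputed distance matrix.

```python
import math


def prim_mst(dist):
    """Prim's algorithm on a complete weighted graph given as a distance matrix.

    Returns (parent, child, weight) edges of a minimum spanning tree.
    Runs in O(n^2), which suits dense graphs built from all pairwise distances.
    """
    n = len(dist)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known link from the tree to each node
    parent = [-1] * n
    best[0] = 0.0          # grow the tree from node 0
    edges = []
    for _ in range(n):
        # Add the cheapest node that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, dist[parent[u]][u]))
        # Update the cheapest links to nodes still outside the tree.
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u
    return edges
```

For four one-dimensional samples at 0, 1, 3 and 6, the tree simply links neighbours along the line, with total weight 6.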



PQ-Tree

  • PQ-trees are used to encode partial orderings of items (here, the samples)

  • P-nodes: children can be arranged in any order

  • Q-nodes: the order of the children is fixed, but can be reversed
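To make the P/Q distinction concrete, the hypothetical sketch below represents a PQ-tree with two small classes and enumerates every leaf ordering the tree permits. It does not build the tree from an MST, and the names `PNode`, `QNode` and `orderings` are illustrative only. The number of orderings grows factorially with P-node size, so exhaustive enumeration like this is only practical for small trees.

```python
from dataclasses import dataclass
from itertools import permutations, product
from typing import Iterator, Tuple


@dataclass
class PNode:
    """Children may appear in any order."""
    children: list


@dataclass
class QNode:
    """Children keep their order, which may only be reversed as a whole."""
    children: list


def orderings(node) -> Iterator[Tuple[int, ...]]:
    """Yield every leaf ordering permitted by a PQ-tree (leaves are sample indices)."""
    if isinstance(node, int):
        yield (node,)
        return
    if isinstance(node, PNode):
        child_orders = permutations(node.children)
    else:
        child_orders = [node.children, node.children[::-1]]
    for order in child_orders:
        # Combine one ordering of each child subtree, in this child order.
        sub = [list(orderings(child)) for child in order]
        for parts in product(*sub):
            yield tuple(leaf for part in parts for leaf in part)
```

For example, a Q-node over sample 0, a P-node {1, 2} and sample 3 permits exactly four orderings: 0 1 2 3, 0 2 1 3 and their two reversals.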



Dynamic Bayesian Network Classifiers

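  • DBNCs are used to calculate $P(C \mid X_t, X_{t-1})$, the probability of class C given the observations at the current and previous time points

  • Here, we use a DBNC to model the pseudo time-series and classify the data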
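The slide does not give the network structure, so the sketch below assumes one simple discrete structure: every variable at time t has two parents, the class C and the same variable at time t-1. Under that assumption $P(C \mid X_t, X_{t-1}) \propto P(C) \prod_i P(X^i_t \mid X^i_{t-1}, C)$, estimated here from counts over consecutive pairs of the pseudo time-series. The names `train_dbnc` and `posterior`, the Laplace smoothing, and the rule that a transition takes the class of its later sample are all hypothetical choices.

```python
import math
from collections import Counter


def train_dbnc(sequence, labels, n_values):
    """Count transitions X^i_{t-1} -> X^i_t per class along a pseudo time-series.

    sequence: discretised samples (tuples of ints in range(n_values)) in pseudo-time order.
    labels:   class of each sample; each transition is credited to the later sample's class.
    """
    class_counts, trans, ctx = Counter(), Counter(), Counter()
    for t in range(1, len(sequence)):
        c = labels[t]
        class_counts[c] += 1
        for i, (prev, cur) in enumerate(zip(sequence[t - 1], sequence[t])):
            trans[(c, i, prev, cur)] += 1
            ctx[(c, i, prev)] += 1
    return class_counts, trans, ctx, n_values


def posterior(model, x_prev, x_cur):
    """Return P(C | X_t = x_cur, X_{t-1} = x_prev) for every class seen in training."""
    class_counts, trans, ctx, n_values = model
    total = sum(class_counts.values())
    log_score = {}
    for c, count in class_counts.items():
        s = math.log(count / total)  # log prior P(C)
        for i, (prev, cur) in enumerate(zip(x_prev, x_cur)):
            # Laplace-smoothed P(X^i_t = cur | X^i_{t-1} = prev, C = c)
            s += math.log((trans[(c, i, prev, cur)] + 1) / (ctx[(c, i, prev)] + n_values))
        log_score[c] = s
    # Normalise in log space to avoid underflow with many variables.
    top = max(log_score.values())
    weights = {c: math.exp(s - top) for c, s in log_score.items()}
    z = sum(weights.values())
    return {c: w / z for c, w in weights.items()}
```

On a toy single-variable series in which healthy (H) samples stay at level 0 and diseased (D) samples climb from 0 to 2, a 0 to 0 transition comes out as more likely healthy and a 1 to 2 transition as more likely diseased.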



Pseudo Time-Series Models

  • In Summary:

    1: Input: Cross-section data

    2: Construct weighted graph and MST

    3: Construct PQ-tree from MST

    4: Derive Pseudo Time-Series from PQ-tree using hill-climb search on P-nodes to minimise sequence length

    5: Build DBNC model using pseudo temporal ordering of samples

    6: Output: Temporal model of cross-section data
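Step 4 scores an ordering by its sequence length. Assuming this is the total Euclidean distance between consecutive samples (the slides do not define it), a minimal hill-climb over P-node reorderings might look like the sketch below. It simplifies the PQ-tree to a fixed ordering plus hypothetical `p_blocks`: slices whose entries may be swapped freely, as if they were the leaf children of one P-node. `sequence_length` and `hill_climb` are illustrative names.

```python
import math


def sequence_length(order, points):
    """Total Euclidean distance travelled when visiting the samples in `order`."""
    return sum(math.dist(points[a], points[b]) for a, b in zip(order, order[1:]))


def hill_climb(order, points, p_blocks):
    """Greedy hill-climb over pairwise swaps inside P-node blocks.

    p_blocks: (start, end) slices of `order` assumed to be freely permutable.
    Keeps any swap that shortens the sequence; stops when no swap helps.
    """
    order = list(order)
    best = sequence_length(order, points)
    improved = True
    while improved:
        improved = False
        for start, end in p_blocks:
            for i in range(start, end):
                for j in range(i + 1, end):
                    order[i], order[j] = order[j], order[i]
                    length = sequence_length(order, points)
                    if length < best - 1e-12:
                        best, improved = length, True
                    else:
                        order[i], order[j] = order[j], order[i]  # undo the swap
    return order, best
```

On five evenly spaced one-dimensional points, starting from 0, 3, 2, 1, 4 with positions 1 to 3 free, the search reaches 0, 1, 2, 3, 4 and cuts the length from 8 to 4. Like any hill-climb it stops at a local optimum, so it can miss the shortest ordering the PQ-tree permits.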



The Datasets

  • B-Cell Microarray Data

    • 3 classes of B-Cell data

    • A number of patients

    • Pre-ordered into expert pseudo time-series

  • Visual Field Test Data

    • One large cross-section study

    • Healthy and glaucomatous eyes

    • One longitudinal study for testing the models



B-Cell: MDS & Pseudo Time-Series

  • Plots show:

    • The discovered path in 3D

    • The classification of B-Cell data in 2D



B-Cell Accuracy

  • Plot shows mean accuracy and variance over repeated cross-validation



Expert Knowledge

  • Sequence length of each ordering of the 26 samples:

    • Biologist = 512.0506: 1-26

    • PQ-tree = 528.9907: 1-6, 7, 9, 8, 11, 10, 12-18, 26, 19, 21, 20, 22-25

    • PQ-tree and hill-climb = 521.1865: 1-18, 26, 19-25
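To see how far each ordering departs from the biologist's, the orderings listed above can be expanded and compared position by position. This small sketch does that using only the orderings on the slide; counting differing positions is just one crude measure chosen for illustration, and `expand` and `displaced` are hypothetical helpers.

```python
def expand(spec):
    """Expand notation such as '1-6,7,9,8' into [1, 2, 3, 4, 5, 6, 7, 9, 8]."""
    order = []
    for part in spec.split(","):
        if "-" in part:
            lo, hi = (int(x) for x in part.split("-"))
            order.extend(range(lo, hi + 1))
        else:
            order.append(int(part))
    return order


def displaced(spec, reference):
    """Count positions where an ordering differs from the reference ordering."""
    order = expand(spec)
    assert sorted(order) == sorted(reference), "not a permutation of the same samples"
    return sum(a != b for a, b in zip(order, reference))


ORDERINGS = {
    "Biologist": "1-26",
    "PQ-tree": "1-6,7,9,8,11,10,12-18,26,19,21,20,22-25",
    "PQ-tree and hill-climb": "1-18,26,19-25",
}
expert = expand(ORDERINGS["Biologist"])
for name, spec in ORDERINGS.items():
    print(f"{name}: {displaced(spec, expert)} of {len(expert)} positions differ")
```

This reports 11 of 26 positions differing for the PQ-tree ordering and 8 for the hill-climbed one, while the biologist's ordering still has the shortest sequence length of the three (512.0506).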



Visual Field: MDS & Pseudo Time-Series

  • Plots show

    • Path found for VF data in 3D

    • Classification of VF data in 2D



VF Accuracy

  • Plot shows mean accuracy and variance over repeated train / test runs



Related Work

  • Semi-Supervised Methods

    • Some datapoints are labelled with classes

    • These are used to assist classification of others in an incremental manner

  • Pseudo MTS (multivariate time-series) impose an ordering on the data as well as a distance between data points

  • This allows future states to be predicted



Conclusions

  • Cross-section data usually captures a snapshot of a process

  • Longitudinal data is usually needed to model its temporal nature

  • Here we use ordering methods to create Pseudo Time-Series models

  • Early results on medical and biological data are promising



Future Work

  • Dealing with outliers in the data space

  • Multiple trajectories (e.g. in VF data)

  • Normalisation (rather than discretisation)

  • Combining a number of longitudinal and cross-section studies



Multiple Trajectories



Acknowledgements

  • Thanks to:

    • David Garway-Heath, Moorfields Eye Hospital, London

    • Paul Kellam, University College London
