Making Time: Pseudo Time-Series for the Temporal Analysis of Cross-Section Data

Emma Peeling, Allan Tucker

Centre for Intelligent Data Analysis

Brunel University

West London

Cross-Section Data
  • Studies often involve data sampled from a cross-section of a population
  • Especially in biological and medical studies
    • Collecting medical information on patients suffering from a particular disease and controls (healthy)
  • Essentially these studies show a “snapshot” of the disease process
Cross-Section Data
  • Many processes are inherently temporal in nature
  • Previously healthy people can develop a disease over time going through different stages of severity
  • If we want to model the development of such processes, we usually require longitudinal data
Cross-Section vs Longitudinal

[Figure: Longitudinal Study vs Cross-Section Study, drawn against Disease Progression from Onset]

Pseudo Time-Series Models
  • In this presentation we explore:
    • Ordering data based upon Minimum Spanning Trees & PQ-Trees (Rifkin et al. 2000)
    • Treating this ordered data as “Pseudo Time-Series”
    • Using Pseudo Time-Series to build temporal models
    • Test using a dynamic Bayesian network model for classifying:
      • Medical Data
      • Gene Expression Data
Multi-Dimensional Scaling
  • Can be used to visualise distance between data points and pathways
  • Here we use classic MDS
    • Metric-based – Euclidean Distance
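
The slides do not spell out the MDS computation. As a reminder, the textbook formulation of classic (Torgerson) MDS on Euclidean distances is sketched below; the symbols n (number of samples), k (target dimension, e.g. 2 or 3 for plotting), J, B, V and Y are notation introduced here, not taken from the presentation.

```latex
\begin{align}
d_{ij} &= \lVert x_i - x_j \rVert_2, \qquad D^{(2)}_{ij} = d_{ij}^{2} \\
J &= I_n - \tfrac{1}{n}\,\mathbf{1}\mathbf{1}^{\top} \\
B &= -\tfrac{1}{2}\, J\, D^{(2)} J \\
B &= V \Lambda V^{\top}, \qquad Y = V_k\, \Lambda_k^{1/2}
\end{align}
```

Here V_k and Λ_k hold the k largest eigenvalues of B and their eigenvectors, and the rows of Y are the low-dimensional coordinates used for plotting.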
Minimum Spanning Tree
  • Connects all nodes in graph
  • Uses the set of links whose total weight is minimal

[Figure: a weighted graph and its MST]
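
A minimal sketch of building an MST over samples, assuming the graph is complete and each link is weighted by the Euclidean distance between two samples (the slides do not state the weighting). It uses Prim's algorithm; the function name and the toy `samples` are hypothetical.

```python
import math

def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph over `points`.

    Returns a list of (i, j, weight) edges, in the order they were added.
    """
    n = len(points)
    if n == 0:
        return []
    # best[j] = (weight, i): cheapest known link from tree node i to outside node j
    best = {j: (math.dist(points[0], points[j]), 0) for j in range(1, n)}
    edges = []
    while best:
        # Pull in the outside node that is cheapest to attach to the tree.
        j = min(best, key=lambda k: best[k][0])
        w, i = best.pop(j)
        edges.append((i, j, w))
        # The new tree node may offer cheaper links to the remaining nodes.
        for k in best:
            d = math.dist(points[j], points[k])
            if d < best[k][0]:
                best[k] = (d, j)
    return edges

# Toy data (assumed): four samples in 2D.
samples = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (2.0, 3.0)]
mst = minimum_spanning_tree(samples)
```

For n samples the result always has n - 1 edges, which is what "connects all nodes" amounts to for a tree.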

PQ-Tree
  • PQ-Trees are used to encode partial orderings on variables
  • P nodes: children can be in any order
  • Q nodes: children order can only be reversed
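
The P and Q rules above can be illustrated by enumerating every leaf ordering a small PQ-tree permits. This is only a sketch of the data structure: the class names are hypothetical, and it does not show how a PQ-tree is derived from an MST.

```python
from itertools import permutations, product

def _concatenate(children):
    """Yield every leaf ordering obtained by keeping `children` in this order."""
    for parts in product(*(list(child.orderings()) for child in children)):
        yield tuple(label for part in parts for label in part)

class Leaf:
    def __init__(self, label):
        self.label = label

    def orderings(self):
        yield (self.label,)

class PNode:
    """Children may appear in any order."""
    def __init__(self, children):
        self.children = list(children)

    def orderings(self):
        for perm in permutations(self.children):
            yield from _concatenate(perm)

class QNode:
    """Children keep their order, which may only be reversed as a whole."""
    def __init__(self, children):
        self.children = list(children)

    def orderings(self):
        yield from _concatenate(self.children)
        yield from _concatenate(self.children[::-1])

# Toy tree (assumed): a Q-node whose middle child is a P-node over samples 2 and 3.
tree = QNode([Leaf(1), PNode([Leaf(2), Leaf(3)]), Leaf(4)])
allowed = set(tree.orderings())
```

In this toy tree, samples 2 and 3 may swap freely, while samples 1 and 4 must stay at opposite ends, so only four of the 24 possible orderings are allowed.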
Dynamic Bayesian Network Classifiers
  • DBNCs are used to calculate P(C | X_t, X_{t-1}): the probability of class C given the current and previous observations
  • Here, we use the DBNC to model the Pseudo Time-Series for classifying data
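
To make the quantity P(C | X_t, X_{t-1}) concrete, here is a rough count-based estimate over one discretised variable along a pseudo time-series. It is not the network structure used in the presentation, which the slides do not give; the function names, the add-alpha smoothing, and the toy states and labels are all assumptions.

```python
from collections import Counter

def fit_transition_counts(sequence):
    """Count (class_t, state_{t-1}, state_t) triples.

    `sequence` is a list of (state, class_label) pairs in pseudo-temporal order.
    """
    counts = Counter()
    for (prev_x, _), (x, c) in zip(sequence, sequence[1:]):
        counts[(c, prev_x, x)] += 1
    return counts

def posterior(counts, classes, prev_x, x, alpha=1.0):
    """Estimate P(C | X_t = x, X_{t-1} = prev_x) with add-alpha smoothing."""
    scores = {c: counts[(c, prev_x, x)] + alpha for c in classes}
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}

# Toy pseudo time-series (assumed): discretised severity state and class label.
ordered = [(0, "healthy"), (0, "healthy"), (1, "healthy"),
           (1, "disease"), (2, "disease"), (2, "disease")]
counts = fit_transition_counts(ordered)
p = posterior(counts, ["healthy", "disease"], prev_x=2, x=2)
predicted = max(p, key=p.get)
```

The estimate depends on the order of the samples, which is why the pseudo-temporal ordering has to be constructed before a model of this kind can be fitted.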
Pseudo Time-Series Models
  • In Summary:

1: Input: Cross-section data

2: Construct weighted graph and MST

3: Construct PQ tree from MST

4: Derive Pseudo Time-Series from PQ-tree using hill-climb search on P-nodes to minimise sequence length

5: Build DBNC model using pseudo temporal ordering of samples

6: Output: Temporal model of cross-section data
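
Step 4 can be sketched as follows, under two assumptions the slides do not confirm: that "sequence length" is the sum of Euclidean distances between consecutive samples, and, as a simplification, that any two positions may be swapped (the method described restricts the search to reorderings of P-node children). The function names and toy data are hypothetical.

```python
import math

def sequence_length(points, order):
    """Sum of Euclidean distances between consecutive samples in `order`."""
    return sum(math.dist(points[a], points[b]) for a, b in zip(order, order[1:]))

def hill_climb(points, order):
    """Greedy search: keep any pairwise swap that shortens the sequence."""
    order = list(order)
    best = sequence_length(points, order)
    improved = True
    while improved:
        improved = False
        for i in range(len(order) - 1):
            for j in range(i + 1, len(order)):
                order[i], order[j] = order[j], order[i]
                candidate = sequence_length(points, order)
                if candidate < best - 1e-12:
                    best = candidate
                    improved = True
                else:
                    # Not an improvement: undo the swap.
                    order[i], order[j] = order[j], order[i]
    return order, best

# Toy data (assumed): four samples on a line, with two of them out of order.
points = [(0.0,), (1.0,), (2.0,), (3.0,)]
start = [0, 2, 1, 3]
start_length = sequence_length(points, start)
final_order, final_length = hill_climb(points, start)
```

Hill-climbing only guarantees a local minimum, so the result can depend on the starting order.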

The Datasets
  • B-Cell Microarray Data
    • 3 classes of B-Cell data
    • A number of patients
    • Pre-ordered into expert pseudo time-series
  • Visual Field Test Data
    • One large cross-section study
    • Healthy and Glaucomatous eyes
    • One longitudinal study for testing the models
B-Cell: MDS & Pseudo Time-Series
  • Plots show
    • discovered path in 3D
    • Classification of B-Cell data in 2D
B-Cell Accuracy
  • Plot shows mean accuracy and variance over repeated cross-validation
Expert Knowledge
  • Sequence length of each ordering:
    • Biologist: 512.0506 (order 1-26)
    • PQ-tree: 528.9907 (order 1-6,7,9,8,11,10,12-18,26,19,21,20,22-25)
    • PQ-tree and hill-climb: 521.1865 (order 1-18,26,19-25)
Visual Field: MDS & Pseudo Time-Series
  • Plots show
    • Path found for VF data in 3D
    • Classification of VF data in 2D
VF Accuracy
  • Plot shows mean accuracy and variance over Train / Test data with repeats
Related Work
  • Semi-Supervised Methods
    • Some datapoints are labelled with classes
    • These are used to assist classification of others in an incremental manner
  • Pseudo MTS imposes an order on the data as well as a distance between data
  • Allows for the prediction of future states
Conclusions
  • Cross-section data usually captures only a snapshot of a process
  • Longitudinal data is usually needed to model the temporal nature of a process
  • Here we use ordering methods to create Pseudo Time-Series models
  • Early results on medical and biological data are promising
Future Work
  • Dealing with outliers in dataspace
  • Multiple trajectories (e.g. in VF data)
  • Normalisation (rather than discretisation)
  • Combining a number of longitudinal and cross-section studies
Acknowledgements
  • Thanks to:
    • David Garway-Heath, Moorfields Eye Hospital, London
    • Paul Kellam, University College London