My first 100 Tb of data

STATISTICAL METHODS FOR NEW TECHNOLOGY WORKING GROUP

Ciprian M. Crainiceanu

Johns Hopkins University

http://www.biostat.jhsph.edu/smnt


Members of the group

  • Key personnel

    • C.M. Crainiceanu, B.S. Caffo, A.-M. Staicu, S. Greven, D. Ruppert, C.-Z. Di

  • Senior Students

    • V. Zipunnikov, J.-A. Goldsmith

  • Other statisticians (>20)

  • Scientific collaborators

    • Direct collaboration

    • Solving important scientific problems

    • Diverse scientific applications


Scientific Collaborators

  • Susan Bassett – fMRI, Alzheimer’s

  • Danny Reich – DTI, DCE-MRI, MS

  • Brian Schwartz – lead exposure, VBM, DTI, white matter imaging

  • Stewart Mostofsky – fMRI, rsfcMRI, Autism, ADHD, Tourette’s

  • Naresh Punjabi – EEG, sleep, sleep diseases

  • Dzung Pham / Pilou Bazin – Cortical shape, thickness, lesion detection, MS

  • Dean Wong – PET, fMRI substance abuse

  • Susan Resnick – BLSA

  • Jerry Prince – BLSA, ADNI

  • Jim Pekar, Peter Van Zijl – 7T MRI, fMRI, rsfcMRI preprocessing, scanner physics

  • Christos Davatzikos – RAVENS

  • Susumu Mori – DTI, tractography

  • Dana Boatman – ECoG, EEG, epilepsy

  • Graham Redgrave – fMRI, DTI, Huntington’s, anorexia/bulimia

  • Tudor Badea, Bruno Jedynak – Neuron classification, morphometry, 3D structure and shape

  • Tom Glass – Gizmos

  • Merck – EEG, neuroimaging

  • Pfizer – imaging biomarkers?


Observational Studies 2.0


Longitudinal Functional Principal Component Analysis (LFPCA)

  • I=1000, J=4, D=100: 15 min

  • I=1000, J=8, D=200: 70 min

    Greven, Crainiceanu, Caffo, Reich, 2010. LFPCA, EJS, to appear


A simple regression formula

  • Data compression via longitudinal PCA

  • MoM estimators of covariance matrices, smoothing

  • Need: all covariance operators

  • Solution: regress $Y_{ij}(d)\,Y_{ik}(d')$ on $1$, $T_{ik}$, $T_{ij}$, $T_{ik}T_{ij}$, $\delta_{jk}$
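
The regression in the last bullet can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. It assumes a model that the slide does not spell out: demeaned curves of the form Y_ij(d) = X_i0(d) + X_i1(d) T_ij + U_ij(d), so that the expected product Y_ij(d) Y_ik(d') is linear in 1, T_ik, T_ij, T_ik T_ij and the indicator of j = k, with the covariance operators as coefficients. The function name `mom_covariances`, the output keys, and the simulated data are all hypothetical, and the smoothing step mentioned on the slide is omitted.

```python
import numpy as np

def mom_covariances(Y, T):
    """Method-of-moments sketch (assumed model, see lead-in).

    Y: array (I, J, D) of demeaned curves; T: array (I, J) of visit times.
    Regresses every product Y_ij(d) * Y_ik(d') on
    [1, T_ik, T_ij, T_ij * T_ik, 1{j == k}] by ordinary least squares.
    """
    I, J, D = Y.shape
    design, response = [], []
    for i in range(I):
        for j in range(J):
            for k in range(J):
                design.append([1.0, T[i, k], T[i, j],
                               T[i, j] * T[i, k], float(j == k)])
                # Entry [d, d'] of the outer product is Y_ij(d) * Y_ik(d').
                response.append(np.outer(Y[i, j], Y[i, k]).ravel())
    X = np.asarray(design)      # (I*J*J, 5)
    R = np.asarray(response)    # (I*J*J, D*D)
    coef, *_ = np.linalg.lstsq(X, R, rcond=None)
    K0, K01, K10, K1, KU = (c.reshape(D, D) for c in coef)
    return {"K0": K0, "K01": K01, "K10": K10, "K1": K1, "KU": KU}

# Simulated data under the assumed model (hypothetical sizes).
rng = np.random.default_rng(0)
I, J, D = 200, 4, 5
T = rng.uniform(-1.0, 1.0, size=(I, J))
X0 = rng.normal(size=(I, 1, D))        # subject-specific random intercepts
X1 = rng.normal(size=(I, 1, D))        # subject-specific random slopes
U = 0.5 * rng.normal(size=(I, J, D))   # visit-specific deviations
Y = X0 + X1 * T[:, :, None] + U

est = mom_covariances(Y, T)
```

One regression returns all five D-by-D surfaces at once, which is why the slide can call it simple. For large D the D*D response columns would not be formed explicitly; the sketch ignores that scaling concern.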


Variance explained (FA, 3 years of longitudinal data)


Longitudinal Penalized Functional Regression


LPFR: recipe and ingredients


PASAT/MD (corpus callosum), PD (corticospinal tract)


Functional regression

  • No paper on longitudinal functional regression

  • No paper published with this data structure

  • Longitudinal extensions are not “simple”

  • Technical details are hard without the correct “recipe” for known and published “ingredients”

  • No available method that scales up

    Goldsmith, Feder, Crainiceanu, Caffo, Reich, 2010. PFR, JCGS, to appear

    Goldsmith, Crainiceanu, Caffo, Reich, 2010. LPFR, to appear?


Population Value Decomposition (PVD)


PVD

$Y_i = P V_i D + E_i$

  • $P$ is $T \times A$

  • $D$ is $B \times F$

  • $V_i$ is $A \times B$

  • $A \ll T$, $B \ll F$


Singular Value Decomposition (SVD) summarizes variance

[Figure: for one subject, the subject-specific data matrix (time by frequency) is factored by SVD into eigenvariates, a diagonal matrix, and eigenfrequencies.]
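
As a small illustration of the single-subject step, the following sketch applies NumPy's SVD to a simulated time-by-frequency matrix. The matrix sizes, the rank, and the noise level are hypothetical. The columns of `U` play the role of the eigenvariates, the rows of `Vt` the eigenfrequencies, and the squared singular values give the share of variance captured by each component.

```python
import numpy as np

rng = np.random.default_rng(1)
n_time, n_freq, rank = 60, 40, 3   # hypothetical sizes

# Simulated subject-specific data: low-rank signal plus small noise.
Y = (rng.normal(size=(n_time, rank)) @ rng.normal(size=(rank, n_freq))
     + 0.01 * rng.normal(size=(n_time, n_freq)))

# Thin SVD: Y = U @ diag(s) @ Vt.
U, s, Vt = np.linalg.svd(Y, full_matrices=False)

# Cumulative proportion of variance explained by the leading components.
var_explained = np.cumsum(s**2) / np.sum(s**2)

# Low rank approximation that keeps only the first `rank` components.
Y_low = (U[:, :rank] * s[:rank]) @ Vt[:rank]
```

Because the simulated signal has rank 3, `var_explained[rank - 1]` is very close to 1 here; with real data the truncation level would be chosen from this curve.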


Default PVD

[Figure: flow diagram. Start with the subject-specific data; an SVD of each subject's matrix gives eigenvariates and eigenfrequencies (low rank approximation); these are stacked across subjects and a second SVD gives the population decomposition; the original subject-specific data are then projected onto the population bases.]

Caffo BS, Crainiceanu CM, Verduzco G, Joel SE, Mostofsky SH, Bassett SS, Pekar JJ. Two-Stage decompositions for the analysis of functional connectivity for fMRI with application to Alzheimer’s disease risk. NeuroImage (In Press).
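
The two-stage procedure outlined on this slide can be sketched as follows. This is a minimal reading of the diagram under stated assumptions, not the authors' code: the function name `default_pvd`, the truncation levels `L`, `A`, `B`, and the simulated data are hypothetical, and the same truncation level `L` is used for every subject for simplicity.

```python
import numpy as np

def default_pvd(Ys, L, A, B):
    """Sketch of a default PVD for a list of (T x F) matrices Ys.

    Stage 1: SVD of each subject's matrix, keeping L left and right vectors.
    Stage 2: SVD of the stacked vectors gives population bases
             P (T x A) and D (B x F).
    Finally each subject's data are projected: V_i = P' Y_i D'.
    """
    lefts, rights = [], []
    for Y in Ys:
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        lefts.append(U[:, :L])       # subject-level eigenvariates
        rights.append(Vt[:L].T)      # subject-level eigenfrequencies
    # Population bases from the vectors stacked across subjects.
    P = np.linalg.svd(np.hstack(lefts), full_matrices=False)[0][:, :A]
    D = np.linalg.svd(np.hstack(rights), full_matrices=False)[0][:, :B].T
    Vs = [P.T @ Y @ D.T for Y in Ys]  # each V_i is A x B
    return P, Vs, D

# Simulated population sharing common bases (hypothetical sizes).
rng = np.random.default_rng(2)
n_subj, n_time, n_freq, A, B = 10, 50, 30, 2, 2
P0 = rng.normal(size=(n_time, A))
D0 = rng.normal(size=(B, n_freq))
Ys = [P0 @ rng.normal(size=(A, B)) @ D0
      + 0.01 * rng.normal(size=(n_time, n_freq)) for _ in range(n_subj)]

P, Vs, D = default_pvd(Ys, L=2, A=A, B=B)

# Relative reconstruction error of P V_i D, pooled over subjects.
num = sum(np.linalg.norm(Y - P @ V @ D) ** 2 for Y, V in zip(Ys, Vs))
den = sum(np.linalg.norm(Y) ** 2 for Y in Ys)
rel_error = np.sqrt(num / den)
```

After the decomposition each subject is summarized by an A-by-B matrix `V_i` instead of a T-by-F matrix, which is where the compression comes from when A and B are much smaller than T and F.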


Population eigenimages


  • Currently:

    • Deploying PVD to the 1000 Functional Connectomes Project

    • http://www.nitrc.org/projects/fcon_1000/

    • Comparing rsfcMRI in stroke versus normal subjects


HD-MFPCA/RAVENS Images


Multilevel Functional Principal Component Analysis (MFPCA)


MFPCA


HD-MFPCA


HD-MFPCA, Step 1


HD-MFPCA, Step 2


Main message, backed by 100 Tb of data

  • Eventually, good tech makes it into observational studies and clinical trials

  • Longitudinal/Multilevel FDA is the natural next step in FDA

  • Data is changing the way we do business: availability, size, complexity

  • Likely: funding will be based much more on relevance than on technical ability

