My first 100 Tb of data

STATISTICAL METHODS FOR NEW TECHNOLOGY WORKING GROUP

Ciprian M. Crainiceanu

Johns Hopkins University

http://www.biostat.jhsph.edu/smnt


Members of the group

  • Key personnel

    • C.M. Crainiceanu, B.S. Caffo, A.-M. Staicu, S. Greven, D. Ruppert, C.-Z. Di

  • Senior Students

    • V. Zipunnikov, J.-A. Goldsmith

  • Other statisticians (>20)

  • Scientific collaborators

    • Direct collaboration

    • Solving important scientific problems

    • Diverse scientific applications


Scientific Collaborators

  • Susan Bassett – fMRI, Alzheimer’s

  • Danny Reich – DTI, DCE-MRI, MS

  • Brian Schwartz – lead exposure, VBM, DTI, white matter imaging

  • Stewart Mostofsky – fMRI, rsfcMRI, autism, ADHD, Tourette's

  • Naresh Punjabi – EEG, sleep, sleep diseases

  • Dzung Pham / Pilou Bazin – Cortical shape, thickness, lesion detection, MS

  • Dean Wong – PET, fMRI, substance abuse

  • Susan Resnick – BLSA

  • Jerry Prince – BLSA, ADNI

  • Jim Pekar, Peter Van Zijl – 7T MRI, fMRI, rsfcMRI preprocessing, scanner physics

  • Christos Davatzikos – RAVENS

  • Susumu Mori – DTI, tractography

  • Dana Boatman – ECoG, EEG, epilepsy

  • Graham Redgrave – fMRI, DTI, Huntington’s, anorexia/bulimia

  • Tudor Badea, Bruno Jedynak – Neuron classification, morphometry, 3D structure and shape

  • Tom Glass – Gizmos

  • Merck – EEG, neuroimaging

  • Pfizer – imaging biomarkers?



Longitudinal Functional Principal Component Analysis (LFPCA)

  • I=1000, J=4, D=100: 15’

  • I=1000, J=8, D=200: 70’

    Greven, Crainiceanu, Caffo, Reich, 2010. LFPCA, EJS, to appear


A simple regression formula

  • Data compression via longitudinal PCA

  • MoM estimators of covariance matrices, smoothing

  • Need: all covariance operators

  • Solution: regress Y_ij(d) Y_ik(d') on 1, T_ik, T_ij, T_ik T_ij, δ_jk
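
The regression in the last bullet can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the curves are already demeaned, skips the smoothing step, and reads the last regressor as the indicator that j = k. The function name `moment_regression`, the coefficient labels, and the simulated data are all hypothetical. Under the usual random intercept / random slope reading of the model (which the slide does not spell out), the five fitted D × D coefficient surfaces would correspond to the covariance operators the slide says are needed.

```python
import numpy as np

def moment_regression(Y, T):
    """Method-of-moments sketch: regress products of curves on functions of time.

    Y: array (I, J, D) of demeaned curves, subject i, visit j, grid point d.
    T: array (I, J) of visit times.
    Returns a dict of D x D coefficient surfaces, one per regressor.
    """
    I, J, D = Y.shape
    design, products = [], []
    for i in range(I):
        for j in range(J):
            for k in range(J):
                # Regressors: 1, T_ik, T_ij, T_ik * T_ij, indicator(j == k)
                design.append([1.0, T[i, k], T[i, j],
                               T[i, j] * T[i, k], float(j == k)])
                # Response: all products Y_ij(d) * Y_ik(d'), flattened over (d, d')
                products.append(np.outer(Y[i, j], Y[i, k]).ravel())
    X = np.asarray(design)        # (I*J*J, 5)
    R = np.asarray(products)      # (I*J*J, D*D)
    # One least-squares fit handles every (d, d') pair at once.
    coef, *_ = np.linalg.lstsq(X, R, rcond=None)
    names = ("intercept", "T_ik", "T_ij", "T_ik*T_ij", "delta_jk")
    return {name: coef[c].reshape(D, D) for c, name in enumerate(names)}

# Simulated example (sizes are arbitrary and far smaller than on the slides).
rng = np.random.default_rng(0)
I, J, D = 60, 4, 5
T = rng.uniform(-1.0, 1.0, size=(I, J))
X0 = rng.normal(size=(I, 1, D))             # subject-specific intercept curve
X1 = rng.normal(size=(I, 1, D))             # subject-specific slope curve
Y = X0 + T[:, :, None] * X1 + 0.1 * rng.normal(size=(I, J, D))
est = moment_regression(Y, T)
```

Because every ordered pair of visits (j, k) enters the fit, the intercept surface comes out symmetric and the `T_ik` surface is the transpose of the `T_ij` surface, which is a cheap sanity check on any implementation.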






Functional regression

  • No paper on longitudinal functional regression

  • No paper published with this data structure

  • Longitudinal extensions are not “simple”

  • Technical details are hard without the correct “recipe” for known and published “ingredients”

  • No available method that scales up

    Goldsmith, Feder, Crainiceanu, Caffo, Reich, 2010. PFR, JCGS, to appear

    Goldsmith, Crainiceanu, Caffo, Reich, 2010. LPFR, to appear?



Population Value Decomposition (PVD)

Y_i = P V_i D + E_i

  • P is T × A

  • D is B × F

  • V_i is A × B

  • A << T, B << F
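
To make the dimension bookkeeping concrete, the sketch below builds one subject's matrix from the model and counts stored numbers with and without the decomposition. The sizes, the number of subjects, and the helper name `pvd_storage` are hypothetical values chosen for illustration; they do not come from the slides.

```python
import numpy as np

def pvd_storage(n_subjects, T, F, A, B):
    """Count stored numbers: raw T x F matrices versus PVD components."""
    raw = n_subjects * T * F
    # P (T x A) and D (B x F) are shared; each subject keeps only V_i (A x B).
    pvd = T * A + B * F + n_subjects * A * B
    return raw, pvd

rng = np.random.default_rng(0)
T, F, A, B = 200, 150, 5, 4                # hypothetical sizes with A << T, B << F
P = rng.normal(size=(T, A))                # population basis, T x A
D = rng.normal(size=(B, F))                # population basis, B x F
V_i = rng.normal(size=(A, B))              # subject-specific coefficients, A x B
E_i = 0.01 * rng.normal(size=(T, F))       # residual, T x F
Y_i = P @ V_i @ D + E_i                    # subject-specific data, T x F

raw, pvd = pvd_storage(100, T, F, A, B)
print(Y_i.shape, raw, pvd)
```

With these numbers, 100 subjects need 3,000,000 values in raw form and 3,600 in PVD form, since only the A × B matrix grows with the number of subjects.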


Singular Value Decomposition (SVD) summarizes variance

[Figure: for one subject, the subject-specific time × frequency data matrix is factored by SVD into eigenvariates, a diagonal matrix, and eigenfrequencies.]
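
As a small illustration of how the SVD summarizes variance, the squared singular values can be turned into cumulative shares of the total sum of squares. This is a generic sketch on simulated data; the function name `variance_explained` and the matrix sizes are assumptions made for the example.

```python
import numpy as np

def variance_explained(Y):
    """Cumulative share of the total sum of squares captured by each rank."""
    s = np.linalg.svd(Y, compute_uv=False)   # singular values, largest first
    energy = s ** 2                          # these sum to the squared Frobenius norm
    return np.cumsum(energy) / energy.sum()

rng = np.random.default_rng(1)
time_len, n_freq = 120, 40                   # hypothetical time x frequency size
Y = rng.normal(size=(time_len, 2)) @ rng.normal(size=(2, n_freq))   # exactly rank 2
Y_noisy = Y + 0.05 * rng.normal(size=Y.shape)
share = variance_explained(Y_noisy)
print(share[:3])
```

For data that are close to low rank, the first few entries of `share` are already near 1, which is what justifies keeping only a few eigenvariates and eigenfrequencies per subject.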


Default PVD

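[Figure: flow diagram of the default PVD. Start with the SVD of each subject's data (eigenvariates and eigenfrequencies), keep a low rank approximation, stack the retained vectors across subjects, apply a second SVD to obtain the population decomposition, then project the original subject-specific data onto the population bases.]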

Caffo BS, Crainiceanu CM, Verduzco G, Joel SE, Mostofsky SH, Bassett SS, Pekar JJ. Two-Stage decompositions for the analysis of functional connectivity for fMRI with application to Alzheimer’s disease risk. NeuroImage (In Press).
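
The two-stage procedure can be sketched as follows. This is a minimal reading of the steps named on the slide, not the authors' code: the function name `default_pvd`, the rank arguments `L`, `R`, `A`, `B`, and the simulated data are assumptions, and choices such as how many subject-level vectors to keep are left to the caller.

```python
import numpy as np

def default_pvd(Ys, L, R, A, B):
    """Two-stage SVD sketch of a population value decomposition.

    Ys: list of T x F subject matrices.
    L, R: number of left / right singular vectors kept per subject.
    A, B: number of population-level left / right basis vectors.
    Returns P (T x A), D (B x F) and the list of V_i (A x B).
    """
    left, right = [], []
    for Y in Ys:
        # Stage 1: subject-level SVD, truncated to a low rank approximation.
        U, _, Vt = np.linalg.svd(Y, full_matrices=False)
        left.append(U[:, :L])
        right.append(Vt[:R, :].T)
    U_stack = np.hstack(left)      # T x (n * L), stacked across subjects
    V_stack = np.hstack(right)     # F x (n * R), stacked across subjects
    # Stage 2: SVD of the stacked vectors gives the population bases.
    P = np.linalg.svd(U_stack, full_matrices=False)[0][:, :A]
    D = np.linalg.svd(V_stack, full_matrices=False)[0][:, :B].T
    # Project the original data onto the population bases.
    Vs = [P.T @ Y @ D.T for Y in Ys]
    return P, D, Vs

# Simulated check: data generated from Y_i = P0 V_i D0 + small noise.
rng = np.random.default_rng(2)
n, T, F, A, B = 10, 40, 30, 3, 2
P0 = np.linalg.qr(rng.normal(size=(T, A)))[0]        # orthonormal columns
D0 = np.linalg.qr(rng.normal(size=(F, B)))[0].T      # orthonormal rows
signals = [P0 @ rng.normal(size=(A, B)) @ D0 for _ in range(n)]
Ys = [S + 1e-3 * rng.normal(size=(T, F)) for S in signals]

P, D, Vs = default_pvd(Ys, L=2, R=2, A=A, B=B)
num = sum(np.linalg.norm(S - P @ V @ D) ** 2 for S, V in zip(signals, Vs))
den = sum(np.linalg.norm(S) ** 2 for S in signals)
rel_err = np.sqrt(num / den)
print(rel_err)
```

In this toy setting the projection recovers the noiseless signal almost exactly. The per-subject SVDs are independent of one another, so that stage can be run in parallel or one subject at a time, which matters when the full data do not fit in memory.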



  • Currently:

    • Deploying PVD to the 1000 Functional Connectomes Project

    • http://www.nitrc.org/projects/fcon_1000/

    • Comparing rsfcMRI in stroke versus normal subjects








Main message, backed by 100Tb of data

  • Eventually, good technology makes it into observational studies and clinical trials

  • Longitudinal/Multilevel FDA is the natural next step in FDA

  • Data is changing the way we do business: availability, size, complexity

  • Likely: funding will be based much more on relevance than on technical ability

