Inference of Signaling Networks Using Quantitative Morphological Signatures:

Inference of Signaling Networks Using Quantitative Morphological Signatures: Parallel Computing Framework
Oaz Nir
18.337J
May 13, 2008

Challenges in Describing Signaling Networks
1. Connectivity
2. Flow of Signaling Information
3. Subcellular Distribution of Local Networks

Cell Morphology = Signaling State
[Figure: cell morphologies and the GTPase activation shown with them. Panel labels: “stress fibers” (Rho activation), “ruffles/lamellipodia” (Rac activation), “spiky” (Cdc42 activation), normal perimeter.]
Active Rho Morphological Signature = F1, F2, F3, …, Fn
Active Rac Morphological Signature = F1, F2, F3, …, Fn
Active Cdc42 Morphological Signature = F1, F2, F3, …, Fn

Understanding how Signaling Networks that Regulate Morphology are Organized and How Information Flows through these Networks
[Figure: network diagram of RhoGTPases and their GEFs and GAPs. Labeled groups: Transmembrane Receptors, Polarity Complexes, GPCRs/G-proteins, Adhesion Structures, Actin regulators, Lipid Regulators, MT regulators, Actin/MT coordinators.]

Acquiring Morphological Signatures from Complex Images
1. Cell Culturing + GFP +/- dsRNA +/- Gene overexpression
2. Image Acquisition (GFP)
3. CellSegmenter
Output: normalized feature values for each cell (cell “x1”, condition “a1”; cell “x2”, condition “a1”; …) across N Treatment Conditions (TCs)
[Figure: image channels DAPI, GFP, F-Actin; GFP cell segment with feature panels Ruffle Area, Edge, Process Area, Drainage Area, Half Mass fr. Centroid, Half Mass fr. Boundary, Gaussian Fit, Low Smooth/Best Ellipse Fit, High Smooth/Best Ellipse Fit; example values x=0.358 y=0.357 =-0.248]

Raw Morphological Data and Data Reduction
145 phenotypic features
[Figure: All features (axes Feature x, Feature y, Feature n; classes A and B), Reduce Dimensionality, Classifier (test) (axes Feature a, Feature b, Feature c; classes A and B)]

Using Feature Graphs to Model Single-cell Distributions
• Create structures that allow inference of signaling pathways
• Utilize single-cell data
• Linear correlations are fast and easy to compute
• Graphs on the same vertex set are comparable by various algorithms from graph theory
For each dsRNA treatment, define a graph as follows:
• Draw a vertex for each feature (neural network classifier)
• Draw an edge if the correlation between the corresponding features among single-cell data exceeds a threshold
[Figure: example feature graphs on features i, j, k. dsRNA x, feature values (i,j,k): 3i,3j,-4k and -3i,-3j,-4k. dsRNA y, feature values (i,j,k): 4i,4j,-4k and 2i,2j,-2k.]

Inference Based on Feature Graphs
Intuition for Inference Based on Feature Graphs
[Figure: signaling network over genes A, B, C.] This is the unknown signaling network we will infer. For this slide, assume we know the signaling network ahead of time.
Question: What is the relationship between feature graphs of genes in a signaling pathway?
• RNAi Gene A and RNAi Gene B ([F1, F2, F4] and [F1, F2, F3]): expect each feature graph to have a relatively large number of edges
• RNAi Gene C ([F1, F2]): expect feature graph to have a relatively small number of edges. Feature graph is approximately the intersection of the feature graphs for RNAi A and RNAi B.
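
The intersection intuition can be checked with plain set operations once each feature graph is stored as an edge set. The sketch below is illustrative only: the edge sets are hypothetical (the slide lists features, not edges, and which of A or B carries F3 versus F4 is an arbitrary choice here), and the Jaccard score is one assumed way to quantify "approximately the intersection".

```python
def jaccard(edges_1, edges_2):
    """Jaccard similarity of two edge sets (1.0 means identical)."""
    union = edges_1 | edges_2
    return len(edges_1 & edges_2) / len(union) if union else 1.0


# Hypothetical edge sets on a shared vertex set of features.
fg_a = {("F1", "F2"), ("F1", "F4"), ("F2", "F4")}   # RNAi gene A
fg_b = {("F1", "F2"), ("F1", "F3"), ("F2", "F3")}   # RNAi gene B
fg_c = {("F1", "F2")}                               # RNAi gene C

shared = fg_a & fg_b            # edges present in both A and B
print(shared == fg_c)           # True
print(jaccard(fg_c, shared))    # 1.0
```

With real data the match would not be exact, so a similarity score with a cutoff is more realistic than strict equality.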

Focus on Details of Feature Graph Construction
Drosophila Data Set:
• N = 250 Treatment Conditions (TCs)
• C = 50 cells per TC
• F = 150 features per cell (future data sets will be larger)
For each dsRNA treatment (TC), define a graph as follows:
• Draw a vertex for each feature (neural network classifier)
• Draw an edge if the correlation between the corresponding features among single-cell data exceeds a threshold
For each feature graph (FG), we need to compute a linear correlation over C = 50 data points for all F*(F-1)/2 = 150*149/2 pairs of features. Since there are N = 250 TCs, there are a total of N*F*(F-1)/2 linear correlations to compute.
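
For a sense of scale, the counts above can be evaluated directly. The variable names are only for illustration; the numbers come from the slide.

```python
from math import comb

F = 150   # features per cell
C = 50    # cells (data points) per treatment condition
N = 250   # treatment conditions (TCs)

pairs_per_graph = comb(F, 2)                # F*(F-1)/2 feature pairs per FG
total_correlations = N * pairs_per_graph    # across all TCs

print(pairs_per_graph)      # 11175
print(total_correlations)   # 2793750
```

Each of those roughly 2.8 million correlations is computed over only C = 50 points, so the individual units of work are small.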

Focus on Details of Feature Graph Construction
Drosophila Data Set: N = 250 Treatment Conditions (TCs), C = 50 cells per TC, F = 150 features per cell
For each FG, need to compute linear correlation for C = 50 data points for all F*(F-1)/2 = 150*149/2 pairs of features.
How to compute all pairwise correlations efficiently?
• matmul of FxC and CxF
• matmul of Fx1 and 1xF
• Computation is dominated by matmul of FxC and CxF
**Matlab built-in “corr” does not work with ppeval
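
One plausible reading of the two matrix products is the identity cov = X·Xᵀ/C − μ·μᵀ, followed by division by the outer product of the standard deviations. The sketch below follows that reading in plain Python. It is an assumption about the formulation, not the original Matlab code, and the helper names are hypothetical. It also assumes no feature is constant, since that would give a zero standard deviation.

```python
from math import sqrt


def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def corr_matrix(X):
    """All pairwise Pearson correlations for an F x C data matrix X."""
    F, C = len(X), len(X[0])
    mu = [[sum(row) / C] for row in X]            # F x 1 column of feature means
    second = matmul(X, transpose(X))              # (F x C)(C x F): dominant cost
    mean_outer = matmul(mu, transpose(mu))        # (F x 1)(1 x F)
    cov = [[second[i][j] / C - mean_outer[i][j] for j in range(F)]
           for i in range(F)]
    sd = [[sqrt(cov[i][i])] for i in range(F)]    # F x 1 standard deviations
    sd_outer = matmul(sd, transpose(sd))          # (F x 1)(1 x F)
    return [[cov[i][j] / sd_outer[i][j] for j in range(F)] for i in range(F)]


# Three features measured on four cells: rows 0 and 1 rise together, row 2 falls.
X = [[1, 2, 3, 4],
     [2, 4, 6, 8],
     [4, 3, 2, 1]]
R = corr_matrix(X)
```

The F x F product costs on the order of F²·C multiplications, while the outer products cost only F², which matches the slide's remark that the FxC by CxF matmul dominates.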

Parallelize in Dimension of TCs
Speed-up?
[Figure: parallel vs. serial comparison]

What if the TCs Have Different Numbers of Cells (C)?

[Figure: serial vs. parallel comparison for TCs with different numbers of cells]

Summary and Conclusions
• Feature graph construction depends on computation of numerous linear correlations
• Parallelization was implemented
• But speed-ups were not realized (why not?)
• In fact, slower because of time required to move data to/from the server
• Speed-ups are realized for *very* large data sets because the server can handle larger data more smoothly than a typical PC. But this is not due to parallelization, rather due to hard drive usage.
• Why didn’t parallelization result in gains in speed?
  • Interactive Supercomputing doesn’t preallocate matrices in Matlab
  • Structure of problem?
  • Coding?

Acknowledgments: Chris Bakal, Bonnie Berger, John Aach, Norbert Perrimon, George Church
