Scalable graph analytics for metagenomics and metaproteomics

Ananth Kalyanaraman, HPCBio lab (ananth@eecs.wsu.edu), Associate Professor, School of EECS, Washington State University, Pullman, WA.


Presentation Transcript


  1. Scalable graph analytics for metagenomics and metaproteomics
     Ananth Kalyanaraman @ HPCBio lab (ananth@eecs.wsu.edu), Associate Professor, School of EECS, Washington State University, Pullman, WA
     Research areas: parallel algorithms, computational biology/bioinformatics, graph algorithms, string algorithms, parallel architectures
     Environmental microbial community analytics
     • Applications:
       • bioenergy alternatives
       • human health
       • environmental monitoring
       • soil and forest ecology
       • ocean microbiology
       • …
     NGS Funding relevance: DNA, RNA, protein, mass spec/peptide
     • Data scale:
       • #studies: >350
       • #samples: >2,500
       • #genic/ORF reads: >100M+
       • …
     Image courtesy: www.genomesonline.org
     Workshop on Future Computing Platforms to Accelerate Next-Gen Sequencing (NGS) Applications, May 19, 2013, held in conjunction with IPDPS'13, Boston, MA

  2. Some graph-theoretic problems in environmental microbial community analytics
     • Problems:
       • Network construction
       • Clustering
       • Community annotation
       • Network comparison
       • Heterogeneity
       • …
     • Parallelism:
       • mostly rudimentary/ad hoc in standard workflows
       • Distributed memory: MPI, MapReduce
       • Intra-node: multicore, GPUs
     • Some challenges:
       • inherits graph-related challenges and choice of architectures
       • availability of networks/inference
       • data integration
       • low sampling, species diversity
       • qualitative metrics
       • automated workflows
       • …
     • Source data:
       • Protein/ORF sequence homology
       • Mass spectral library construction
       • Interaction networks (gene, protein)

  3. Graphs are pervasive in Computational Biology
     [Figure: diagram relating biological data and tasks to graph formulations. Data and task labels: reads, genome, comparative genomics, gene, motifs, mRNA, search, database, phylogenetic tree, protein, population genomics, protein families. Graph formulation labels: string graphs; clique; pattern matching; probabilistic graph models; trees, DAGs, TSP, ML; classical network analysis; comparative network analysis.]
     SIAM CSE'13, Boston, MA
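
Slide 2 lists network construction and clustering as problems, with protein/ORF sequence homology as one source of data. The sketch below illustrates the general idea on toy data; it is not the method used in the talk. It assumes pairwise similarity scores are already available (the ORF names, the scores, and the 0.8 threshold are all made up for illustration), keeps an edge for every pair at or above the threshold, and uses connected components as a deliberately simple stand-in for a real clustering method.

```python
from collections import defaultdict


def build_homology_graph(scores, threshold):
    """Build an undirected graph with an edge for each pair scoring >= threshold.

    `scores` maps (sequence_a, sequence_b) pairs to a similarity score.
    Both the input format and the threshold are illustrative assumptions.
    """
    graph = defaultdict(set)
    for (a, b), score in scores.items():
        # Touch both keys so every sequence becomes a vertex, even if isolated.
        graph[a]
        graph[b]
        if score >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_components(graph):
    """Group vertices into connected components with an iterative depth-first search."""
    seen = set()
    clusters = []
    for start in list(graph):
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        members = []
        while stack:
            node = stack.pop()
            members.append(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        clusters.append(sorted(members))
    return sorted(clusters)


# Hypothetical similarity scores between five ORFs (toy data, not from the talk).
toy_scores = {
    ("orf1", "orf2"): 0.92,
    ("orf2", "orf3"): 0.85,
    ("orf3", "orf4"): 0.30,
    ("orf4", "orf5"): 0.88,
}

toy_graph = build_homology_graph(toy_scores, threshold=0.8)
toy_clusters = connected_components(toy_graph)
print(toy_clusters)
```

On this toy input the weak orf3/orf4 score falls below the threshold, so the graph splits into two groups. At the data scale quoted on slide 1 (over 100M reads), a single-process version like this would not be practical, which is why the slides point to distributed-memory and intra-node parallelism.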
