Statistical Tool for Identifying Sequence Variations Correlating with Virus Phenotype

Statistical Tool for Identifying Sequence Variations that Correlate with Virus Phenotypic Characteristics in the Virus Pathogen Resource (ViPR) Brett E. Pickett1, Douglas S. Greer1, Yun Zhang1, Liwei Zhou2, Sanjeev Kumar2, Sam Zaremba2, Chris Larsen3, Edward B. Klem2, Richard H. Scheuermann1 1J. Craig Venter Institute, San Diego, CA; 2Northrop Grumman Health Solutions, Rockville MD; 3Vecna Technologies, Greenbelt MD. Introduction Protein Ortholog Search Metadata-driven Comparative Genomics Statistical Analysis • The Virus Pathogen Database and Analysis Resource (ViPR, www.viprbrc.org), sponsored by the National Institute of Allergy and Infectious Diseases, serves as a single publicly accessible repository of integrated datasets and analysis tools for 14 different virus families--including Herpesviridae. ViPR supports wet-bench virology research focusing on the development of diagnostics, prophylactics, vaccines, and treatments for these pathogens1. • The usefulness of the ViPR system can be demonstrated by using a scientific use case. Here we examine the sequence variation existing within the glycoprotein D (gD) protein in Human Herpesvirus-1 (HHV-1) and Human Herpesvirus-2 (HHV-2). • ViPRSupports 14 Virus Families • ViPR IntegratesData from Many Sources • GenBank sequence records, gene annotations, and strain metadata • Protein Databank (PDB) 3D protein structures • Immune epitopes from the Immune Epitope Database (IEDB) • Clinical data • Host Factor Data generated from the NIAID Systems Biology projects and the ViPR-funded Driving Biological Projects • UniProtKB protein annotations • Gene Ontology (GO) classifications • Additional data derived from computational algorithms • ViPR Provides Analysis and Visualization Tools • Multiple Sequence Alignment • Phylogenetic Tree Construction • Sequence Polymorphism Analysis • Metadata-driven Comparative Genomics Statistical Analysis • Genome Annotator • Gbrowse Genome Viewer • Sequence Format Conversion • BLAST Sequence Similarity Search • 3D Protein Structure Visualization • Sequence Feature Variant Types • Ortholog Group Assignments • ViPR enables you to store and share data and results through the ViPR Workbench • ViPR groups viral proteins together based on their predicted orthology within a virus taxon to facilitate gene/protein search, gene function inference, and virus evolution research. These orthologous groups can then be queried intuitively. • A search for orthologs of the US6 gene, which codes for Glycoprotein D (gD), was performed. Non-redundant HHV-1 and HHV-2 sequences were selected for more in-depth analysis. • ViPR uses an automated pipeline to identify sequence variations that significantly differ between groups of strains. • Flaviviridae • Hepeviridae • Herpesviridae • Paramyxoviridae • Picornaviridae • Arenaviridae • Bunyaviridae • Caliciviridae • Coronaviridae • Filoviridae • Poxviridae • Reoviridae • Rhabdoviridae • Togaviridae Figure 5: Results from Metadata-driven Comparative Analysis Tool for Sequences (Meta-CATS). Shows abridged output using meta-CATS to compare HHV-1 and HHV-2 with residues located within experimentally-determined positive B-Cell epitopes (underlined) found in ViPR. Figure 2: Screenshots of the Ortholog Group Component. Users can search for orthologs using various criteria (left) and then browse the results according to ortholog group (right). Of the 493 gD protein orthologs predicted by ViPR, 39 (HHV-1) and 25 (HHV-2) non-redundant sequences were included in this analysis. 3D Protein Structure Viewer • ViPR provides multiple data types for viewing on a 3D structure. Phylogenetic Tree • ViPR can generate phylogenetic trees from search results, multiple sequence alignment, working set, or custom sequences via upload. Figure 6: 3D Protein Structure Viewer in ViPR. A display of a 3D protein structure for HHV-1 glycoprotein D complexed with Nectin-1. Residue 48 (cyan) and an epitope comprising residues 77-87 (green) are highlighted (PDB ID: 3U82). Summary Figure 3: A Phylogenetic Tree Reconstruction of HHV-1 and HHV-2 sequences. The distance-based FastME algorithm, implemented in ViPR was used to generate a phylogenetic tree of all 64 HHV-1 (red) and HHV-2 (blue) amino acid sequences. • ViPR can assist in various comparative genomics analyses. As an example use case, we identified 2 significant sequence variations that: • Have diverged through speciation between HHV-1 and HHV-2 • Overlap with known B-Cell epitopes • Could vary in response to external pressure(s) while retaining the ability to bind and enter host cells following speciation • In conclusion, theViPR resource combines a powerful database with integrated bioinformatics tools to perform computational analyses and assist in hypothesis generation. The uniqueness of ViPR lies in: • integrating data from various sources • capturing unique data on the host response to virus infection • combining necessary tools to perform analytical workflows • allowing data sharing and storage with collaborators Multiple Sequence Alignment • Multiple sequence alignment (MSA) can be calculated directly from: search results, a working set, or custom uploaded sequences. Acknowledgements We would like to thank the primary data providers for the data that was used throughout this study. We also recognize the scientific and technical personnel responsible for supporting and developing ViPR, which has been wholly supported with federal funds from the NIH/NIAID (N01AI2008038 and N01AI40041 to R.H.S.). Figure 4: Alignment of gD Amino Acid Sequences HHV-1 (white) and HHV-2 (gray) gD sequences show a high degree of divergence towards the N-terminus of the protein. Blue arrows highlight a subset of significant positions. Figure 1: A screenshot of the ViPRhomepage The ViPR homepage is the portal used to access the various types of data and advanced functionality for any supported virus family. References 1 Pickett, B.E., et al. (2012) ViPR: an open bioinformaticsdatabaseandanalysisresource for virologyresearch. Nucl. Acids Res. 40(D1): D593-D598.

Statistical Tool for Identifying Sequence Variations Correlating with Virus Phenotype

Statistical Tool for Identifying Sequence Variations Correlating with Virus Phenotype

Presentation Transcript

Introduction to introduction to introduction to … Optimization

INTRODUCTION/ INTRODUCTION

Introduction

INTRODUCTION

Introduction

Introduction