North Carolina Bioinformatics Grid
Thom H. Dunning, Jr.
HPCC Division, MCNC
Chemistry, University of North Carolina

Genomics: A Compute- & Data-Intensive Science
* from TimeLogic

Data Explosion: Rapid Growth of GenBank
• Growth of GenBank
  • Number of base pairs increasing dramatically (exponentially)
  • Growth in 2002 due to additions in just 21 days!
[Chart axis: No. Gbases]

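The slide describes GenBank's growth as exponential but the chart values did not survive. As a minimal sketch of what "exponential" implies, the helper below estimates a doubling time from any two size measurements and projects forward. The function names are hypothetical and no GenBank figures are assumed; the caller supplies them.

```python
import math


def doubling_time(size_start: float, size_end: float, elapsed: float) -> float:
    """Doubling time of a quantity assumed to grow exponentially.

    size_start and size_end are two measurements (e.g. Gbases) taken
    `elapsed` time units apart. The result is in the same time units.
    """
    if size_start <= 0 or size_end <= size_start or elapsed <= 0:
        raise ValueError("need positive, strictly growing measurements")
    # N(t) = N0 * 2**(t / T)  =>  T = t * ln 2 / ln(N(t) / N0)
    return elapsed * math.log(2) / math.log(size_end / size_start)


def project(size_now: float, doubling: float, horizon: float) -> float:
    """Projected size after `horizon` time units at a fixed doubling time."""
    return size_now * 2 ** (horizon / doubling)
```

A quantity that quadruples in two years, for instance, has a doubling time of one year under this model. Real database growth need not follow a single exponential, so any projection is only as good as that assumption.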
Data Explosion: Number and Diversity of Databases
Nucleic Acids Research, 2002, Vol. 30, No. 1
Table 1. Molecular Biology Database Collection
• Major Public Sequence Repositories
  • DNA Data Bank of Japan (DDBJ), http://www.ddbj.nig.ac.jp: All known nucleotide and protein sequences
  • …
• Varied Biomedical Content
  • …
  • VirOligo, http://viroligo.okstate.edu: Virus-specific oligonucleotides for PCR and …
• 333 Databases

Computing Explosion: Assembly and Analysis of Genomic Data
• Celera Genomics: Assembling the Genome
  • Compaq Alpha Clusters
  • Number of processors: ~750
  • Peak performance: 1 teraops
• NuTech Sciences: Mining the Genome
  • IBM p640 System
  • Number of processors: ~5,000
  • Peak performance: 7½ teraops
  • Total memory: 2½ terabytes
  • Total disk storage: 50 terabytes

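To put the two systems on a common footing, the quoted totals can be divided by the processor counts. The sketch below does that arithmetic using only the figures on the slide. The class and function names are hypothetical, the processor counts are the slide's approximate values, and decimal units (1 tera = 1000 giga) are an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    name: str
    processors: int      # approximate count quoted on the slide
    peak_teraops: float  # quoted peak performance, in teraops

    def gigaops_per_processor(self) -> float:
        # Assumes decimal prefixes: 1 teraops = 1000 gigaops.
        return self.peak_teraops * 1000 / self.processors


def per_processor_gb(total_terabytes: float, processors: int) -> float:
    """Share of a total (memory or disk) per processor, in gigabytes."""
    return total_terabytes * 1000 / processors


celera = Cluster("Celera Genomics (Compaq Alpha clusters)", 750, 1.0)
nutech = Cluster("NuTech Sciences (IBM p640 system)", 5000, 7.5)

if __name__ == "__main__":
    for c in (celera, nutech):
        print(f"{c.name}: {c.gigaops_per_processor():.2f} gigaops per processor")
    print(f"NuTech memory per processor: {per_processor_gb(2.5, 5000):.1f} GB")
    print(f"NuTech disk per processor: {per_processor_gb(50, 5000):.1f} GB")
```

On these numbers the per-processor peak is roughly 1.33 gigaops for the Celera clusters and 1.5 gigaops for the NuTech system, so the larger aggregate figure comes mainly from the processor count. The NuTech totals work out to about 0.5 GB of memory and 10 GB of disk per processor.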
Genomics: Meeting the Information Challenge
• Data Storage
• Network
• Grid Middleware
• Computers

North Carolina Research and Education Network
• NCREN3
  • Increased bandwidth
  • Increased reliability
  • Increased resiliency
[Map labels: Elizabeth City, Winston Salem, Boone, Greensboro, Rocky Mount, RTP, Asheville, Greenville, Fayetteville, Cullowhee, Charlotte, Pembroke, Morehead City, Wilmington]
[RTP RPoP labels: NCCU, Duke, NCSU, Qwest, MCNC, NCSU Centennial Campus, UNC-CH]

Grid Technologies
• Major New Computing Technology
  • Under development since mid-1990s
• Distinguishing Characteristics
  • “Middleware” to support efficient resource sharing in a distributed, heterogeneous computing and data storage environment
  • Focus on use of large-scale computing and data storage
• Some Major Grid Efforts
  • NASA IPG: Testbed linking selected NASA centers
  • DataGrid: International Grid being developed for high-energy physics (CERN)

Grid Technologies (cont’d)
• Some Major Grid Efforts (cont’d)
  • GriPhyN: Research in Grid technologies for physics applications (Argonne, Florida)
  • e-Science Grid: Major effort in UK to develop a Grid infrastructure for science and engineering research
  • BIRN: Data Grid focused on neuroimaging data (UCSD, SDSC)

North Carolina Genomics and Bioinformatics Consortium
• Goal
  • Provide a venue for Consortium members to share information and resources, plan strategic initiatives, and form alliances
• Distributed Across North Carolina
  • Concentration in Research Triangle, but extends across all of North Carolina
• Diverse Goals and Expertise
  • Human health, including animal models; agriculture and forestry; evolutionary biology basic research; tool development

Overall NC BioGrid Architecture
• Grid-aware, -enabled bioinformatics applications: BioApp #1, BioApp #2, BioApp #3, …
• Grid Middleware: Globus, Legion, …
• Network: NCREN3
• Computing and Data Resources: NCSC plus Members’ Computing Centers

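The architecture slide reads as a layered stack, with applications at the top and computing and data resources at the bottom. A minimal sketch of that layering as a data structure follows. The `Layer` class, the `STACK` list, and the `layer_below` helper are hypothetical names introduced for illustration, and the top-to-bottom ordering is an interpretation of the diagram rather than something the slide states explicitly.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Layer:
    name: str
    components: List[str]  # examples named on the slide; "…" entries omitted


# Ordered top (closest to the user) to bottom (physical resources).
STACK: List[Layer] = [
    Layer("Applications", ["BioApp #1", "BioApp #2", "BioApp #3"]),
    Layer("Grid Middleware", ["Globus", "Legion"]),
    Layer("Network", ["NCREN3"]),
    Layer("Computing and Data Resources",
          ["NCSC", "Members' Computing Centers"]),
]


def layer_below(name: str) -> Optional[str]:
    """Name of the layer a given layer relies on, or None at the bottom."""
    names = [layer.name for layer in STACK]
    index = names.index(name)  # raises ValueError for an unknown layer
    return names[index + 1] if index + 1 < len(names) else None


if __name__ == "__main__":
    print(" -> ".join(layer.name for layer in STACK))
```

Under this reading, a Grid-enabled application talks only to the middleware, which in turn reaches the shared resources over NCREN3.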
NC BioGrid Project
• Two Phases
  • Testbed Phase: test existing middleware, resolve issues, prepare detailed plan (12–18 months)
  • Production Phase: create and operate NC BioGrid
• Funding for Testbed from MCNC
• Project Manager
  • Phil Emer, MCNC, Chief Architect/NC BioGrid
• Project Oversight
  • MCNC Board of Directors
  • HPCC Advisory Board
  • NC BioGrid Technical Advisory Group