
CARE Center Informatics Subcommittee

Background and Progress Report for the 10/20/06 Conference Call, Marcia Nizzari


Presentation Transcript


  1. CARE Center Informatics Subcommittee: Background and Progress Report for the 10/20/06 Conference Call (Marcia Nizzari)

  2. Background Information: CARE Informatics, GAP (Genetic Analysis Platform), and the Informatics Team

  3. CaRE Center Informatics
     • Builds on the existing Genetic Analysis Platform
       • Operational for 2+ years
       • Genotyping and resequencing
       • Code base successfully reused
     • CaRE Center enhancements:
       • Data sharing strategy
       • Phenotype/trait thesaurus, meta thesaurus
       • Customizable analytic pipelines

  4. GAP by the Numbers…
     • 510,000 lines of working source code
     • Very large databases; one of the largest tables has 640 M rows
     • Standard industry metrics (SLOCCount) estimate that this code base would have required:
       • $18M to develop (actual: ~$4.5M)
       • A staff of 40 for 3.5 years (actual: an average of 12/yr)
     • Pretty good deal!
     • Truly a world-class informatics team
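The SLOCCount figures on slide 4 are consistent with the Basic COCOMO (organic mode) model that SLOCCount applies by default. The sketch below assumes SLOCCount's default coefficients, average salary ($56,286/year), and overhead multiplier (2.4); the slide does not say which settings were used, so treat this as a reconstruction rather than the original calculation.

```python
"""Reconstruct the SLOCCount-style estimate for GAP (Basic COCOMO, organic mode).

Coefficients, salary, and overhead are SLOCCount's defaults (assumed, not stated on the slide).
"""

KSLOC = 510.0           # 510,000 lines of source code, in thousands
ANNUAL_SALARY = 56_286  # assumed: SLOCCount default average salary, USD/year
OVERHEAD = 2.4          # assumed: SLOCCount default overhead multiplier


def cocomo_basic_organic(ksloc: float) -> dict:
    """Return effort, schedule, average staff, and cost for a code base of `ksloc` KSLOC."""
    effort_pm = 2.4 * ksloc ** 1.05            # effort in person-months
    schedule_months = 2.5 * effort_pm ** 0.38  # calendar schedule in months
    return {
        "effort_person_years": effort_pm / 12,
        "schedule_years": schedule_months / 12,
        "avg_staff": effort_pm / schedule_months,  # average developers = effort / schedule
        "cost_usd": (effort_pm / 12) * ANNUAL_SALARY * OVERHEAD,
    }


if __name__ == "__main__":
    est = cocomo_basic_organic(KSLOC)
    print(f"Effort:   {est['effort_person_years']:.0f} person-years")
    print(f"Schedule: {est['schedule_years']:.1f} years")
    print(f"Staff:    {est['avg_staff']:.0f} developers")
    print(f"Cost:     ${est['cost_usd'] / 1e6:.1f}M")
```

With 510 KSLOC this gives roughly 139 person-years of effort over about 3.5 years, an average of about 40 developers, and a cost of about $18.8M, in line with the slide's estimates.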

  5. GAP by the #’s (cont’d.)
     • User statistics:
       • 222 logins for the system (internal & external)
       • Nina has trained (since 12/05):
         • 47 people individually
         • 25 small-group sessions
     • Jira statistics:
       • 1,722 issues logged since Jira went into production
       • 1,330 resolved
       • 392 open/in progress
     • Daniel Mirel is the champion Jira user

  6. GAP by the #’s (cont’d.)
     • Informatics staff statistics:
       • Total software development experience: ~200 years
       • Degrees held:
         • 1 PhD (neuroscience)
         • 6 master’s degrees: 4 comp sci, 1 molbio, 1 manufacturing
         • 4 engineering bachelor’s degrees: 2 EEs, 1 ChemEng, 1 SoftEng
         • 3 biochem/molbio/biology bachelor’s degrees
         • 4 comp sci bachelor’s degrees
         • 1 physics bachelor’s degree

  7. User Workflow in Software (diagram)
     • Inputs: samples & clinical information (Biological Sample Platform); genome sequence & genetic variation (NCBI, HapMap DCC, dbSNP, purchased SNPs from Celera)
     • Plan experiment in Project Management
     • Execute genotyping experiment (Genotyping Pipeline (ESP)) or resequencing experiment (Resequencing Pipeline)
     • Genetic Analysis: perform analysis and loop back to the next round of experiment planning

  8. CARE Association Study Workflow (diagram)
     • Analysis: GenePattern + custom analysis tools
     • Production: Sample Mgt, Project Mgt, Genotyping
     • Steps: upload samples, peds, individuals, phenotypes; create experiments (samples × features); design and execute experiments; QC/curate results; data compile; summarize/filter (PLINK); association & statistics; custom algorithms, viewers
     • Data stores and services: Sample DB, Project DB, Feature DB, LIMS DBs, Data Vault, web services
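The "association & statistics" step in this workflow can be illustrated with the simplest single-SNP test: a 1-degree-of-freedom allelic chi-square comparing allele counts in cases and controls. PLINK's basic `--assoc` test is of this kind, but the function below is a from-scratch sketch, not PLINK's code; the function names and the (AA, Aa, aa) genotype-count input are assumptions for illustration.

```python
"""Sketch: 1-df allelic chi-square test for a single SNP (cases vs. controls)."""
import math
from typing import Tuple

Genotypes = Tuple[int, int, int]  # counts of (AA, Aa, aa)


def allele_counts(genotypes: Genotypes) -> Tuple[int, int]:
    """Convert genotype counts to (A, a) allele counts; each person carries two alleles."""
    aa, ab, bb = genotypes
    return 2 * aa + ab, ab + 2 * bb


def allelic_chi_square(cases: Genotypes, controls: Genotypes) -> Tuple[float, float]:
    """Return (chi2, p_value) for the 2x2 table of allele counts by case/control status."""
    table = [allele_counts(cases), allele_counts(controls)]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("both groups and both alleles must be observed")

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    # A 1-df chi-square is a squared standard normal, so P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    chi2, p = allelic_chi_square(cases=(30, 50, 20), controls=(20, 50, 30))
    print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

In the example, cases carry 110 A and 90 a alleles and controls the reverse, giving chi-square = 4.0 and p ≈ 0.0455.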

  9. Thesauri, Meta Thesaurus for CARE: Conceptual Architecture (diagram)
     • Controlled vocabulary
     • Constraints: one ontology, specified at either the Group or the Project level
     • Phenotype Component: phenotype capture and validation
     • Base Component: phenotype inquiry
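The one-ontology constraint and the thesaurus/meta-thesaurus layering on this slide can be sketched as follows. Everything here, including the class names, the synonym-to-preferred-term mapping, and the concept IDs, is a hypothetical illustration; the slide does not describe how the CARE phenotype components actually store their vocabularies.

```python
"""Hypothetical sketch: thesaurus, meta-thesaurus, and the one-ontology constraint."""
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Thesaurus:
    """One ontology (controlled vocabulary): synonyms mapped onto preferred terms."""
    name: str
    terms: Dict[str, str] = field(default_factory=dict)  # lowercased synonym -> preferred term

    def add_term(self, preferred: str, *synonyms: str) -> None:
        for word in (preferred, *synonyms):
            self.terms[word.lower()] = preferred

    def normalize(self, term: str) -> Optional[str]:
        return self.terms.get(term.strip().lower())


@dataclass
class MetaThesaurus:
    """Links preferred terms from different ontologies to shared concept IDs."""
    concepts: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def link(self, ontology: Thesaurus, preferred: str, concept_id: str) -> None:
        self.concepts[(ontology.name, preferred)] = concept_id

    def concept_for(self, ontology: Thesaurus, term: str) -> Optional[str]:
        preferred = ontology.normalize(term)
        return None if preferred is None else self.concepts.get((ontology.name, preferred))


def validate_phenotype(term: str,
                       group_ontology: Optional[Thesaurus] = None,
                       project_ontology: Optional[Thesaurus] = None) -> str:
    """Capture-time validation: exactly one ontology, from the Group or the Project."""
    if (group_ontology is None) == (project_ontology is None):
        raise ValueError("specify one ontology, at either the Group or the Project level")
    ontology = group_ontology if group_ontology is not None else project_ontology
    preferred = ontology.normalize(term)
    if preferred is None:
        raise ValueError(f"{term!r} is not in controlled vocabulary {ontology.name!r}")
    return preferred
```

For instance, after `add_term("Hypertension", "high blood pressure", "HTN")`, capturing "htn" against a project-level ontology returns "Hypertension", while supplying both a Group and a Project ontology is rejected.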

  10. CARE Progress Report
     • PhenoMall functionality:
       • Rapid enhancement of capture function
       • Metadata
       • Mapping of all CARE phenotypes looks good
       • Major enhancements for pheno inquiry
     • Informatics goals of the pilot:
       • Figure out how far up the controlled vocabulary/thesaurus stack we need to go
       • What curation tools are needed?
       • Requirements gathering beyond the pilot
     • Awaiting the decision on data sharing…

  11. Deliverables
     • NIH Application/System Security Plan
       • Two major revisions: July 17th and Oct 16th
       • Security officer at NHLBI is Cindy Walczak
     • When the data sharing model is decided:
       • Research technologies and approaches; make a recommendation to the subcommittee
       • Spec/design, with review by the subcommittee
       • Working pilot
     • Need to discuss when to demo: at the Feb meeting in Bethesda?

  12. Security Considerations

  13. Security Layers - General
     • There are at least three levels:
       • MIT firewalls
         • Penetration testing, Tripwire, packet monitoring, etc.
       • Broad
         • New Cisco firewalls
         • Route to host servers
         • Explicit allows only
         • Wireless access goes out to the MIT firewall
         • Open jacks go to the Broad firewall
       • CARE Center application itself

  14. The World / MIT / The Broad Institute: Firewalls (diagram)
     • Components: Internet "cloud", MIT, two Cisco ASA 5540 firewalls, core router, Host A, Host B, hosts on LIMS and on servers
     • RADIUS DB: used for authentication for VPN access
     • Access rules for subnets: explicit allows only (e.g., allow a host on LIMS to talk to a host on a server); a host must be in the list to be permitted access
     • Allow rules (explicit allows): http = 80 -> host; ssh = 22 -> host; https = 443 (SSL)
     • Unregistered 10.10 domain: open jacks, wireless
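The "explicit allows only" policy on these slides amounts to default-deny: a connection passes only if some rule names its source subnet, destination host, and port. The sketch below models that logic with Python's standard `ipaddress` module; it is not Cisco ASA configuration syntax, and the addresses are documentation-range placeholders (RFC 5737), not the real network.

```python
"""Hypothetical sketch of explicit-allow (default-deny) access rules; not Cisco ASA syntax."""
import ipaddress
from dataclasses import dataclass
from typing import Iterable, Union

IPAddress = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]
IPNetwork = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


@dataclass(frozen=True)
class AllowRule:
    src_net: IPNetwork   # source subnet allowed to connect
    dst_host: IPAddress  # the single destination host the rule opens
    port: int            # e.g. 80 (http), 22 (ssh), 443 (https/SSL)


def is_permitted(rules: Iterable[AllowRule], src: str, dst: str, port: int) -> bool:
    """Allow only if some rule matches exactly; anything not listed is denied."""
    src_ip = ipaddress.ip_address(src)
    dst_ip = ipaddress.ip_address(dst)
    return any(src_ip in r.src_net and dst_ip == r.dst_host and port == r.port
               for r in rules)


# Placeholder addresses from the documentation ranges, not real hosts.
RULES = [
    AllowRule(ipaddress.ip_network("192.0.2.0/24"), ipaddress.ip_address("198.51.100.10"), port)
    for port in (80, 22, 443)
]
```

Here ssh from 192.0.2.5 to 198.51.100.10 is permitted, while the same host on port 3306, or any source outside 192.0.2.0/24, is denied because no rule lists it.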

  15. Security Layers - Application
     • Genetic Analysis Platform application security:
       • Role-based security
       • Passwords that expire
       • Audit trails that track user activity
     • Detailed information is available in the NIH Application/System Security Plan for the CARE Center
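The three application-level controls on slide 15 (role-based security, expiring passwords, audit trails) can be combined in a few lines. This is a hypothetical sketch: the role names, permissions, and 90-day password lifetime are assumptions, since the slide gives no specifics and the details live in the NIH security plan.

```python
"""Hypothetical sketch: role-based permissions, expiring passwords, and an audit trail."""
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, FrozenSet, List, Tuple

PASSWORD_MAX_AGE = timedelta(days=90)  # assumed policy; the slide gives no value

# Assumed role -> permission mapping, for illustration only.
ROLE_PERMISSIONS: Dict[str, FrozenSet[str]] = {
    "lab_technician": frozenset({"view_samples", "edit_samples"}),
    "scientist": frozenset({"view_samples", "run_analysis"}),
}


@dataclass
class User:
    name: str
    roles: FrozenSet[str]
    password_changed: datetime


@dataclass
class AccessControl:
    # Each entry: (timestamp, user name, requested permission, granted?)
    audit_log: List[Tuple[datetime, str, str, bool]] = field(default_factory=list)

    def check(self, user: User, permission: str, now: datetime) -> bool:
        """Grant only if the password is current and some role carries the permission."""
        password_expired = now - user.password_changed > PASSWORD_MAX_AGE
        has_role = any(permission in ROLE_PERMISSIONS.get(role, frozenset())
                       for role in user.roles)
        granted = not password_expired and has_role
        self.audit_log.append((now, user.name, permission, granted))  # log every attempt
        return granted
```

In this sketch an expired password simply denies the request; a real system would more likely force a password change. Every attempt, granted or not, is recorded in the audit log.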

  16. Summary: Issues/Questions
     • Scope of phenotype-related enhancements
     • Group/Project structure for the CaRE Center
     • CaRE user visibility into the Process Dashboard/LIMS
     • Data release model decision
     • Data Enclave scenarios and security
     • User training and documentation
     • Analysis methodology
     • System and security training

  17. Security for Production & Analysis (diagram)
     • Users in JAAS domain: BSP Lab Technician, CaRE Scientist, CaRE Cohort Technician, Broad Lab Technician, Coordinator
     • Biological Samples Platform: BSP Security Context (sample collection)
     • Project Management: Proj Mgt Security Context (project)
     • Analysis Pipelines: CaRE Analysis Security Context (scope based on rules of the Data Enclave; could cover multiple Projects)
     • Process/LIMS: Lab Security Context (cross-project)
     • Objects: Groups, Projects, Grants, Panels, Feature Sets, Sample Sets
     • Shareable objects: Peds, Individuals, Phenotypes, Samples, Features; LSIDs
     • Databases: PIPS DB, Feature DB
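Slide 17 gives each component its own security context with a different scope: a sample collection for BSP, a single project for Project Management, a cross-project scope for the lab, and an analysis scope derived from Data Enclave rules that may span several projects. The sketch below models only that scoping idea; it does not use the JAAS API, and the context names, IDs, and enclave-rule format are hypothetical.

```python
"""Hypothetical sketch of scoped security contexts (illustrative only; not the JAAS API)."""
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional


@dataclass(frozen=True)
class SecurityContext:
    name: str
    scope: Optional[FrozenSet[str]]  # IDs the context covers; None = cross-project (unrestricted)

    def permits(self, scope_id: str) -> bool:
        return self.scope is None or scope_id in self.scope


def enclave_analysis_context(enclave_rules: Dict[str, bool]) -> SecurityContext:
    """Analysis context whose scope comes from (hypothetical) Data Enclave rules:
    every project the rules mark as shareable, which may be more than one."""
    shared = frozenset(project for project, allowed in enclave_rules.items() if allowed)
    return SecurityContext("CaRE Analysis", shared)


bsp = SecurityContext("BSP", frozenset({"collection-A"}))          # one sample collection
proj_mgt = SecurityContext("Proj Mgt", frozenset({"project-1"}))  # one project
lab = SecurityContext("Lab", None)                                 # cross-project
analysis = enclave_analysis_context({"project-1": True, "project-2": True, "project-3": False})
```

Here the analysis context reaches project-1 and project-2 but not project-3, while the lab context reaches every project.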

  18. Postlude

  19. How Users Can Help
     • Specify! We need things nailed down…
     • The classic specification:
       • Genesis 6:14–16 (NKJV): "(14) Make yourself an ark of gopherwood; make rooms in the ark, and cover it inside and outside with pitch. (15) And this is how you shall make it: The length of the ark shall be three hundred cubits, its width fifty cubits, and its height thirty cubits. (16) You shall make a window for the ark, and you shall finish it to a cubit from above; and set the door of the ark in its side. You shall make it with lower, second, and third decks."
     • We live in the world of 0’s and 1’s!

  20. Informatics Development Team
     • Jason Carey, Kristian Cibulskis, Michael Dinsmore, Tim Fennell, George Grant, Bob Handsaker, Nina Lapchyk, Pei Lin, James Nemesh (CH), Huy Nguyen, Howard Rafal, Greg Rushton, Dennis Ryan, David Tefft, Alex Thomson, Ellen Winchester, Alec Wysoker
     • Names in bold have significant time allocated to CARE Center activity.
