The Future of Biomedical Informatics Barry Smith University at Buffalo http://ontology.buffalo.edu/smith
Biomedical Informatics Needs Data • The Problem of Local Coding Schemes • NIH Policies for Data Reusability and the Growth of Clinical Research Consortia • Is SNOMED the Solution? • The Gene Ontology • The OBO Foundry • The National Center for Biomedical Ontology • Ontology in Buffalo
Biomedical Informatics Needs Data • Four sides of the equation of translational medicine • Biological data + clinical data • Access + usability
Problems of gaining access to clinical data • privacy, security, liability • incentives (value of data ...) • costs (training ...)
Making data (re-)usable through standards • Standards provide • common structure and terminology • single data source for review (less redundant data) • Standards allow • use of common tools and techniques • common training • single validation of data
Problems with standards • Not all standards are of equal quality • Once a bad standard is set in stone you are creating problems for your children and for your children’s children • Standards, especially bad standards, have costs
Biomedical Informatics Needs Data • The Problem of Local Coding Schemes • NIH Policies for Data Reusability and the Growth of Clinical Research Consortia • Is SNOMED the Solution? • The Gene Ontology • The OBO Foundry • The National Center for Biomedical Ontology • Ontology in Buffalo
Multiple kinds of data in multiple kinds of silos • Lab / pathology data • Clinical trial data, including regulatory data • Electronic Health Record data • Patient histories (free text) • Medical imaging • Microarray data • Protein chip data • Flow cytometry • Mass spectrometry data • Genotype / SNP data • Mouse data, fly data, chicken data ...
How to find your data? How to find other people’s data? How to reason with data when you find it? How to work out what data you do not have? How to understand the significance of your own data from 3 years ago?
Biomedical Informatics Needs Data • The Problem of Local Coding Schemes • NIH Policies for Data Reusability and the Growth of Clinical Research Consortia • Is SNOMED the Solution? • The Gene Ontology • The OBO Foundry • The National Center for Biomedical Ontology • Ontology in Buffalo
Sharing Research Data: Investigators submitting an NIH application seeking $500,000 or more in direct costs in any single year are expected to include a plan for data sharing or state why this is not possible (http://grants.nih.gov/grants/policy/data_sharing).
Program Announcement (PA) Number: PAR-07-425 • Title: Data Ontologies for Biomedical Research (R01) • NIH Blueprint for Neuroscience Research (http://neuroscienceblueprint.nih.gov/) • National Cancer Institute (NCI) (http://www.cancer.gov) • National Center for Research Resources (NCRR) (http://www.ncrr.nih.gov/) • National Eye Institute (NEI) (http://www.nei.nih.gov/) • National Heart Lung and Blood Institute (NHLBI) (http://www.nhlbi.nih.gov) • National Human Genome Research Institute (NHGRI) (http://www.genome.gov) • National Institute on Alcohol Abuse and Alcoholism (NIAAA) (http://www.niaaa.nih.gov/) • National Institute of Biomedical Imaging and Bioengineering (NIBIB) (http://www.nibib.nih.gov/) • National Institute of Child Health and Human Development (NICHD) (http://www.nichd.nih.gov/) • National Institute on Drug Abuse (NIDA) (http://www.nida.nih.gov/) • National Institute of Environmental Health Sciences (NIEHS) (http://www.niehs.nih.gov/) • National Institute of General Medical Sciences (NIGMS) (http://www.nigms.nih.gov/) • National Institute of Mental Health (NIMH) (http://www.nimh.nih.gov/) • National Institute of Neurological Disorders and Stroke (NINDS) (http://www.ninds.nih.gov/) • National Institute of Nursing Research (NINR) (http://www.ninr.nih.gov) • Release/Posted Date: August 3, 2007 • Letters of Intent Receipt Date(s): December 18, 2007, August 18, 2008, December 22, 2009, and August 21, 2009 for the four separate receipt dates.
Purpose. Optimal use of informatics tools and resources [data sets] depends upon explicit understandings of concepts related to the data upon which they compute. This is typically accomplished by a tool or resource adopting a formal controlled vocabulary and ontology ... that describes objects and the relationships between those objects in a formal way. ... this FOA solicits Research Project Grant (R01) applications from institutions/organizations that propose to develop an ontology that will make it possible for software to understand how two or more existing data sets relate to each other.
Currently, there is no convenient way to map the knowledge that is contained in one data set to that in another data set, primarily because of differences in language and structure. • ... in some areas there are emerging standards. Examples include: • the Unified Medical Language System (UMLS), • the Gene Ontology, http://www.geneontology.org/, • the work supported by the caBIG project (https://cabig.nci.nih.gov/workspaces/VCDE/), • ontologies listed at the Open Biomedical Ontology web site (http://obo.sourceforge.net/).
This FOA will support limited awards, each of which focuses on integrating information between two (or a few very closely related) data sets in a single subject domain. The hope is that the developed vocabularies and ontologies will serve as nucleation points for other researchers in the area to build upon by adopting and extending the vocabularies and ontologies developed under this FOA. Applicants are expected to identify and adopt emerging standards (such as those listed above) whenever possible. Applicants are also strongly encouraged to federate their data under appropriate infrastructures when possible. One potential infrastructure is provided by the Biomedical Informatics Research Network (http://www.nbirn.net). The caBIG infrastructure (http://cabig.cancer.gov) is another well-established infrastructure that researchers should consider.
NIH anticipates that once important data sets in a topical area have been unified, others in that area will adopt the emerging standard. The nucleation points should be able to interact with each other, e.g. through the use of tools that are made freely available to the research community, such as those created by the National Center for Biomedical Ontology (NCBO) (http://bioontology.org/) or by caBIG.
Another determinant of ontology acceptance is the degree to which the ontology conforms to best practices governing ontology design and construction. Criteria have been developed, and are undergoing empirical validation, by the Vocabulary and Common Data Element Work Group of caBIG. Other criteria have been specified by the OBO Foundry (http://obofoundry.org/). In this FOA, the applicant should specify the criteria with which the ontology will conform and the reasons that those criteria are relevant to the data sets being integrated by the proposed ontology.
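The integration task described in the FOA excerpts above, relating two data sets that use different local codes, can be sketched roughly as follows. This is only an illustration under stated assumptions, not anything the FOA specifies: the local codes (`HB_LOW`, `anaemia-1`, ...), the shared identifiers (`SHARED:0001`, ...), and the function name `integrate` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical mappings from two local coding schemes to shared term identifiers.
# None of these codes come from a real vocabulary.
LAB_TO_SHARED = {"HB_LOW": "SHARED:0001", "GLU_HIGH": "SHARED:0002"}
CLINIC_TO_SHARED = {
    "anaemia-1": "SHARED:0001",
    "hyperglyc": "SHARED:0002",
    "frac-arm": "SHARED:0003",
}


def integrate(records_a, map_a, records_b, map_b):
    """Group records from two sources under shared term identifiers.

    Each record is a (patient_id, local_code) pair. Records whose local
    code has no mapping are returned separately so the gap is visible.
    """
    merged = defaultdict(list)
    unmapped = []
    sources = (("A", records_a, map_a), ("B", records_b, map_b))
    for source, records, mapping in sources:
        for patient_id, code in records:
            shared = mapping.get(code)
            if shared is None:
                unmapped.append((source, patient_id, code))
            else:
                merged[shared].append((source, patient_id))
    return dict(merged), unmapped


lab_records = [("p1", "HB_LOW"), ("p2", "GLU_HIGH"), ("p3", "K_LOW")]
clinic_records = [("p1", "anaemia-1"), ("p4", "frac-arm")]
merged, unmapped = integrate(
    lab_records, LAB_TO_SHARED, clinic_records, CLINIC_TO_SHARED
)
```

Here `merged["SHARED:0001"]` brings together the lab and clinic records for patient `p1`, while `unmapped` lists the one lab code (`K_LOW`) that has no shared term yet. Reporting unmapped codes rather than dropping them is a design choice that speaks to the earlier question of how to work out what data you do not have.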
Growth of Clinical and Translational Research Consortia Examples: • PharmGKB • caBIG • BIRN – Biomedical Informatics Research Network • BIRN Ontology Task Force
Biomedical Informatics Needs Data • The Problem of Local Coding Schemes • NIH Policies for Data Reusability and the Growth of Clinical Research Consortia • Is SNOMED the Solution? • The Gene Ontology • The OBO Foundry • The National Center for Biomedical Ontology • Ontology in Buffalo
medical records SNOMED codes
The Systematized Nomenclature of Medicine • built by College of American Pathologists • now maintained by International Health Terminology Standards Development Organisation • access via Virginia Tech SNOMED CT® Browser http://snomed.vetmed.vt.edu/ • (semi-) Open Source
SNOMED often includes non-perspicuous terms FullySpecifiedName: Coordination observable (observable entity) FullySpecifiedName: Coordination (observable entity)
and more: • Self-control behavior: aggression (observable entity) • Physical activity target light exercise (finding) is a type of physical activity finding (finding)
odd bunchings • European is a ethnic group • Other European in New Zealand (ethnic group) is a ethnic group • Mixed ethnic census group is a ethnic group • Flathead is a ethnic group
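One way to make such bunchings visible is to list the siblings that share a parent in the "is a" hierarchy. The sketch below uses only the four child/parent pairs shown on this slide; the pair-list representation and the function name `siblings_by_parent` are assumptions made for illustration and do not reflect how SNOMED is actually distributed or queried.

```python
from collections import defaultdict

# Child/parent pairs taken from the slide above (the "is a" relation).
IS_A = [
    ("European", "ethnic group"),
    ("Other European in New Zealand", "ethnic group"),
    ("Mixed ethnic census group", "ethnic group"),
    ("Flathead", "ethnic group"),
]


def siblings_by_parent(is_a_pairs):
    """Return {parent: [children...]}, preserving input order."""
    children = defaultdict(list)
    for child, parent in is_a_pairs:
        children[parent].append(child)
    return dict(children)


siblings = siblings_by_parent(IS_A)
for parent, kids in siblings.items():
    print(parent, "<-", "; ".join(kids))
```

Printing the siblings side by side puts a continental grouping, a census category, and a single tribe at the same level, which is the oddity the slide points to.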
Poor modular development • No clear strategy for improvement • Difficult to use for coding • A tax on world health information technology?
SNOMED embraces only some of the multiple kinds of siloed data • Lab / pathology data • Electronic Health Record data • Patient histories • Clinical trial data, including regulatory data • Medical imaging • Microarray data • Protein chip data • Flow cytometry • Mass spectrometry data • Genotype / SNP data • Mouse data, fly data, chicken data ...
Biomedical Informatics Needs Data • The Problem of Local Coding Schemes • NIH Policies for Data Reusability and the Growth of Clinical Research Consortia • Is SNOMED the Solution? • The Gene Ontology • The OBO Foundry • The National Center for Biomedical Ontology • Ontology in Buffalo
The Gene Ontology • Open Source • Cross-Species • Impressive annotation resource • Impressive policies for maintenance
How to do Biology across the Genome? MKVSDRRKFEKANFDEFESALNNKNDLVHCPSITLFESIPTEVRSFYEDEKSGLIKVVKFRTGAMDRKRSFEKVVISVMVGKNVKKFLTFVEDEPDFQGGPISKYLIPKKINLMVYTLFQVHTLKFNRKDYDTLSLFYLNRGYYNELSFRVLERCHEIASARPNDSSTMRTFTDFVSGAPIVRSLQKSTIRKYGYNLAPYMFLLLHVDELSIFSAYQASLPGEKKVDTERLKRDLCPRKPIEIKYFSQICNDMMNKKDRLGDILHIILRACALNFGAGPRGGAGDEEDRSITNEEPIIPSVDEHGLKVCKLRSPNTPRRLRKTLDAVKALLVSSCACTARDLDIFDDNNGVAMWKWIKILYHEVAQETTLKDSYRITLVPSSDGISLLAFAGPQRNVYVDDTTRRIQLYTDYNKNGSSEPRLKTLDGLTSDYVFYFVTVLRQMQICALGNSYDAFNHDPWMDVVGFEDPNQVTNRDISRIVLYSYMFLNTAKGCLVEYATFRQYMRELPKNAPQKLNFREMRQGLIALGRHCVGSRFETDLYESATSELMANHSVQTGRNIYGVDFSLTSVSGTTATLLQERASERWIQWLGLESDYHCSFSSTRNAEDV sequence of X chromosome in baker’s yeast
MKVSDRRKFEKANFDEFESALNNKNDLVHCPSITLFESIPTEVRSFYEDEKSGLIKVVKFRTGAMDRKRSFEKVVISVMVGKNVKKFLTFVEDEPDFQGGPIPSKYLIPKKINLMVYTLFQVHTLKFNRKDYDTLSLFYLNRGYYNELSFRVLERCHEIASARPNDSSTMRTFTDFVSGAPIVRSLQKSTIRKYGYNLAPYMFLLLHVDELSIFSAYQASLPGEKKVDTERLKRDLCPRKPIEIKYFSQICNDMMNKKDRLGDILHIILRACALNFGAGPRGGAGDEEDRSITNEEPIIPSVDEHGLKVCKLRSPNTPRRLRKTLDAVKALLVSSCACTARDLDIFDDNNGVAMWKWIKILYHEVAQETTLKDSYRITLVPSSDGISLLAFAGPQRNVYVDDTTRRIQLYTDYNKNGSSEPRLKTLDGLTSDYVFYFVTVLRQMQICALGNSYDAFNHDPWMDVVGFEDPNQVTNRDISRIVLYSYMFLNTAKGCLVEYATFRQYMRELPKNAPQKLNFREMRQGLIALGRHCVGSRFETDLYESATSELMANHSVQTGRNIYGVDSFSLTSVSGTTATLLQERASERWIQWLGLESDYHCSFSSTRNAEDVVAGEAASSNHHQKISRVTRKRPREPKSTNDILVAGQKLFGSSFEFRDLHQLRLCYEIYMADTPSVAVQAPPGYGKTELFHLPLIALASKGDVEYVSFLFVPYTVLLANCMIRLGRRGCLNVAPVRNFIEEGYDGVTDLYVGIYDDLASTNFTDRIAAWENIVECTFRTNNVKLGYLIVDEFHNFETEVYRQSQFGGITNLDFDAFEKAIFLSGTAPEAVADAALQRIGLTGLAKKSMDINELKRSEDLSRGLSSYPTRMFNLIKEKSEVPLGHVHKIRKKVESQPEEALKLLLALFESEPESKAIVVASTTNEVEELACSWRKYFRVVWIHGKLGAAEKVSRTKEFVTDGSMQVLIGTKLVTEGIDIKQLMMVIMLDNRLNIIELIQGVGRLRDGGLCYLLSRKNSWAARNRKGELPPKEGCITEQVREFYGLESKKGKKGQHVGCCGSRTDLSADTVELIERMDRLAEKQATASMSIVALPSSFQESNSSDRYRKYCSSDEDSNTCIHGSANASTNASTNAITTASTNVRTNATTNASTNATTNASTNASTNATTNASTNATTNSSTNATTTASTNVRTSATTTASINVRTSATTTESTNSSTNATTTESTNSSTNATTTESTNSNTSATTTASINVRTSATTTESTNSSTSATTTASINVRTSATTTKSINSSTNATTTESTNSNTNATTTESTNSSTNATTTESTNSSTNATTTESTNSNTSAATTESTNSNTSATTTESTNASAKEDANKDGNAEDNRFHPVTDINKESYKRKGSQMVLLERKKLKAQFPNTSENMNVLQFLGFRSDEIKHLFLYGIDIYFCPEGVFTQYGLCKGCQKMFELCVCWAGQKVSYRRIAWEALAVERMLRNDEEYKEYLEDIEPYHGDPVGYLKYFSVKRREIYSQIQRNYAWYLAITRRRETISVLDSTRGKQGSQVFRMSGRQIKELYFKVWSNLRESKTEVLQYFLNWDEKKCQEEWEAKDDTVVVEALEKGGVFQRLRSMTSAGLQGPQYVKLQFSRHHRQLRSRYELSLGMHLRDQIALGVTPSKVPHWTAFLSMLIGLFYNKTFRQKLEYLLEQISEVWLLPHWLDLANVEVLAADDTRVPLYMLMVAVHKELDSDDVPDGRFDILLCRDSSREVGEMKVSDRRKFEKANFDEFESALNNKNDLVHCPSITLFESIPTEVRSFYEDEKSGLIKVVKFRTGAMDRKRSFEKVVISVMVGKNVKKFLTFVEDEPDFQGGPIPSKYLIPKKINLMVYTLFQVHTLKFNRKDYDTLSLFYLNRGYYNELSFRVLERCHEIASARPNDSSTMRTFTDFVSGAPIVRSLQKSTIRKYGYNLAPYMFLLLHVDELSIFSAYQASLPGEKKVDTERLKRDLCPRKPIE
IKYFSQICNDMMNKKDRLGDILHIILRACALNFGAGPRGGAGDEEDRSITNEEPIIPSVDEHGLKVCKLRSPNTPRRLRKTLDAVKALLVSSCACTARDLDIFDDNNGVAMWKWIKILYHEVAQETTLKDSYRITLVPSSDGISLLAFAGPQRNVYVDDTTRRIQLYTDYNKNGSSEPRLKTLDGLTSDYVFYFVTVLRQMQICALGNSYDAFNHDPWMDVVGFEDPNQVTNRDISRIVLYSYMFLNTAKGCLVEYATFRQYMRELPKNAPQKLNFREMRQGLIALGRHCVGSRFETDLYESATSELMANHSVQTGRNIYGVDSFSLTSVSGTTATLLQERASERWIQWLGLESDYHCSFSSTRNAEDVVAGEAASSNHHQKISRVTRKRPREPKSTNDILVAGQKLFGSSFEFRDLHQLRLCYEIYMADTPSVAVQAPPGYGKTELFHLPLIALASKGDVEYVSFLFVPYTVLLANCMIRLGRRGCLNVAPVRNFIEEGYDGVTDLYVGIYDDLASTNFTDRIAAWENIVECTFRTNNVKLGYLIVDEFHNFETEVYRQSQFGGITNLDFDAFEKAIFLSGTAPEAVADAALQRIGLTGLAKKSMDINELKRSEDLSRGLSSYPTRMFNLIKEKSEVPLGHVHKIRKKVESQPEEALKLLLALFESEPESKAIVVASTTNEVEELACSWRKYFRVVWIHGKLGAAEKVSRTKEFVTDGSMQVLIGTKLVTEGIDIKQLMMVIMLDNRLNIIELIQGVGRLRDGGLCYLLSRKNSWAARNRKGELPPKEGCITEQVREFYGLESKKGKKGQHVGCCGSRTDLSADTVELIERMDRLAEKQATASMSIVALPSSFQESNSSDRYRKYCSSDEDSNTCIHGSANASTNASTNAITTASTNVRTNATTNASTNATTNASTNASTNATTNASTNATTNSSTNATTTASTNVRTSATTTASINVRTSATTTESTNSSTNATTTESTNSSTNATTTESTNSNTSATTTASINVRTSATTTESTNSSTSATTTASINVRTSATTTKSINSSTNATTTESTNSNTNATTTESTNSSTNATTTESTNSSTNATTTESTNSNTSAATTESTNSNTSATTTESTNASAKEDANKDGNAEDNRFHPVTDINKESYKRKGSQMVLLERKKLKAQFPNTSENMNVLQFLGFRSDEIKHLFLYGIDIYFCPEGVFTQYGLCKGCQKMFELCVCWAGQKVSYRRIAWEALAVERMLRNDEEYKEYLEDIEPYHGDPVGYLKYFSVKRREIYSQIQRNYAWYLAITRRRETISVLDSTRGKQGSQVFRMSGRQIKELYFKVWSNLRESKTEVLQYFLNWDEKKCQEEWEAKDDTVVVEALEKGGVFQRLRSMTSAGLQGPQYVKLQFSRHHRQLRSRYELSLGMHLRDQIALGVTPSKVPHWTAFLSMLIGLFYNKTFRQKLEYLLEQISEVWLLPHWLDLANVEVLAADDTRVPLYMLMVAVHKELDSDDVPDGRFDILLCRDSSREVGE
what cellular component? what molecular function? what biological process?
A strategy for translational medicine • Sjöblom T, et al. analyzed 13,023 genes in 11 breast and 11 colorectal cancers • Using functional information captured by GO for given gene product types, they identified 189 as being mutated at significant frequency and thus as providing targets for diagnostic and therapeutic intervention. Science. 2006 Oct 13;314(5797):268-74.
http://ontologist.com • GO widely used • Sjöblom T, et al. analyzed 13,023 genes in 11 breast and 11 colorectal cancers • Using functional information captured by GO for given gene product types, they identified 189 as being mutated at significant frequencies and thus as providing targets for diagnostic and therapeutic intervention. Science. 2006 Oct 13;314(5797):268-74.
Benefits of GO • links people to data • links data together • across species (human, mouse, yeast, fly ...) • across granularities (molecule, cell, organ, organism, population) • links medicine to biological science
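How shared GO terms link data across species can be sketched with a small index from term to species to gene products. This is a minimal illustration under stated assumptions: the GO identifiers are real (the three aspect roots, plus GO:0006281 DNA repair, GO:0005634 nucleus, GO:0016301 kinase activity), but the gene names (`geneH1`, `geneM1`, ...) and the annotations themselves are hypothetical placeholders, not curated GO annotations.

```python
from collections import defaultdict

# Root terms for the three GO aspects (real GO identifiers).
GO_ASPECTS = {
    "GO:0005575": "cellular_component",
    "GO:0003674": "molecular_function",
    "GO:0008150": "biological_process",
}

# Hypothetical annotations: (species, gene product, GO term).
# The gene names are invented placeholders for illustration only.
ANNOTATIONS = [
    ("human", "geneH1", "GO:0006281"),  # DNA repair
    ("mouse", "geneM1", "GO:0006281"),
    ("yeast", "geneY1", "GO:0006281"),
    ("human", "geneH2", "GO:0005634"),  # nucleus
    ("fly", "geneF1", "GO:0016301"),    # kinase activity
]


def index_by_term(annotations):
    """Build {go_term: {species: [gene products]}}."""
    index = defaultdict(lambda: defaultdict(list))
    for species, gene, term in annotations:
        index[term][species].append(gene)
    return {term: dict(by_species) for term, by_species in index.items()}


def terms_shared_across_species(index, min_species=2):
    """Return GO terms annotated in at least `min_species` species."""
    return sorted(t for t, sp in index.items() if len(sp) >= min_species)


index = index_by_term(ANNOTATIONS)
shared = terms_shared_across_species(index)
```

With these placeholder annotations, `shared` contains only `GO:0006281`, the one term used for human, mouse, and yeast gene products. The common identifier is what allows the three records to be retrieved together.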
Biomedical Informatics Needs Data • The Problem of Local Coding Schemes • NIH Policies for Data Reusability and the Growth of Clinical Research Consortia • Is SNOMED the Solution? • The Gene Ontology • The OBO Foundry • The National Center for Biomedical Ontology • Ontology in Buffalo
2003: a shared portal for (so far) 58 ontologies (low regimentation) • http://obo.sourceforge.net • NCBO BioPortal
OBO Foundry Coordinators • Lewis (Berkeley) • Ashburner (Cambridge) • Mungall (Berkeley) • Smith (Buffalo)
The goal: all biological (biomedical) research data should cumulate to form a single, algorithmically processible whole • http://obofoundry.org
FOUNDRY CRITERIA • The ontology is open and available to be used by all. • The ontology is in, or can be instantiated in, a common formal language. • The developers of the ontology agree in advance to collaborate with developers of other OBO Foundry ontologies where domains overlap.
CRITERIA • UPDATE: The developers of each ontology commit to its maintenance in light of scientific advance, and to soliciting community feedback for its improvement. • ORTHOGONALITY: They commit to working with other Foundry members to ensure that, for any particular domain, there is community convergence on a single controlled vocabulary.
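The orthogonality criterion can be given a rough operational reading: look for terms that more than one ontology defines. The sketch below is a crude proxy under stated assumptions, not an OBO Foundry tool or procedure: it compares normalized labels only, and the ontology names (`anatomy_onto`, `cell_onto`, `process_onto`) and their term lists are hypothetical.

```python
from collections import defaultdict


def overlapping_labels(ontologies):
    """Return {normalized label: sorted ontology names} for every label
    that appears in more than one ontology.

    `ontologies` maps an ontology name to an iterable of term labels.
    Labels are compared after trimming whitespace and lowercasing.
    """
    owners = defaultdict(set)
    for name, labels in ontologies.items():
        for label in labels:
            owners[label.strip().lower()].add(name)
    return {
        label: sorted(names)
        for label, names in owners.items()
        if len(names) > 1
    }


# Hypothetical ontologies and term labels, for illustration only.
candidates = {
    "anatomy_onto": ["Cell", "nucleus", "liver"],
    "cell_onto": ["cell", "neuron"],
    "process_onto": ["cell division"],
}
overlaps = overlapping_labels(candidates)
```

Here `overlaps` flags `cell` as defined by both `anatomy_onto` and `cell_onto`, the kind of case where developers would need to agree on a single home for the term. Matching on labels alone misses synonyms and flags homonyms, so a result like this is a starting point for discussion rather than a verdict.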
Consequences • OBO Foundry is serving as a benchmark for improvements in discipline-focused terminology resources • yielding calibration of existing terminologies and data resources and alignment of different views
Mature OBO Foundry ontologies (now undergoing reform) • Cell Ontology (CL) • Chemical Entities of Biological Interest (ChEBI) • Foundational Model of Anatomy (FMA) • Gene Ontology (GO) • Phenotypic Quality Ontology (PaTO) • Relation Ontology (RO) • Sequence Ontology (SO)
Ontologies being built to satisfy Foundry principles ab initio • Common Anatomy Reference Ontology (CARO) • Ontology for Biomedical Investigations (OBI) • Protein Ontology (PRO) • RNA Ontology (RnaO) • Subcellular Anatomy Ontology (SAO)
Ontologies in planning phase • Biobank/Biorepository Ontology (BrO, part of OBI) • Environment Ontology (EnvO) • Immunology Ontology (ImmunO) • Infectious Disease Ontology (IDO)
Biomedical Informatics Needs Data • The Problem of Local Coding Schemes • NIH Policies for Data Reusability and the Growth of Clinical Research Consortia • Is SNOMED the Solution? • The Gene Ontology • The OBO Foundry • The National Center for Biomedical Ontology • Ontology in Buffalo