
Data Science in Biomedical Informatics Education

Data Science in Biomedical Informatics Education. Accelerate discovery and advance health through data-driven research. Valerie Florance, PhD, Associate Director for Extramural Programs, National Library of Medicine. AMIA 2018. florancev@mail.nih.gov


Presentation Transcript


  1. Data Science in Biomedical Informatics Education • Accelerate discovery and advance health through data-driven research • Valerie Florance, PhD, Associate Director for Extramural Programs, National Library of Medicine • AMIA 2018 • florancev@mail.nih.gov

  2. Context for the Panel’s Topic • NLM strategic plan, Goal 3 • NLM’s history of evolving university-based training for research careers in biomedical informatics • Examples of curriculum topics from NLM and BD2K data science training grants • NIH Strategic Plan for Data Science, Goal 4 • Recent reports on ‘core concepts’ in data science

  3. NLM Strategic Plan 2017-2027

  4. NLM Strategic Plan: A Platform for Biomedical Discovery and Data-Powered Health • Objective 3.1: Expand and enhance research training for biomedical informatics and data science. • Important skills for all researchers include being prepared to “compute in context;” extract meaning and insight from aggregations of data; and create new ways to analyze, visualize, mine, and integrate data and information. • Specialists must be developed to meet the challenge of enabling dynamic, real-time curation at the scale and complexity that the future of biomedical data and information holds. • https://www.nlm.nih.gov/pubs/plan/lrp17/NLM_StrategicReport2017_2027.html

  5. Evolution of NLM’s Informatics Training Programs • 1970s: Health Computer Science • 1980s: Clinical Informatics • 1990s: Clinical + early Bioinformatics • 2000-06: Clinical + Bioinformatics + early Imaging + early Public Health Informatics • 2007-11: Clinical + Bioinformatics + Imaging + Public Health Informatics + early Clinical Research Informatics • 2012-16: Health care, translational bioinformatics, public health informatics, clinical research informatics, dental informatics • 2017-21: Health care, translational bioinformatics, public health informatics, clinical research informatics, data science, environmental exposure informatics

  6. NLM T15 Training Programs FOA 2016 • Graduates of the NLM-supported programs should be able to conduct original basic or applied research at the intersection of computer, statistical and information sciences with one or more biomedical application domains. Successful graduates of these programs will be prepared for research-oriented roles in academic institutions, not-for-profit research institutes, governmental and public health agencies, pharmaceutical and software companies, and health care organizations. • The proposed training in a required core curriculum should include informatics and data science principles and concepts, quantitative methods, such as biostatistics and applied mathematics, concepts of computer science, engineering, information sciences and/or other relevant fields, and instruction on the design of rigorous, reproducible research studies in biomedical informatics and data science.

  7. Data science as a thread through biomedical informatics focus areas in NLM training programs • Health care/clinical informatics: precision medicine • Translational bioinformatics: health effects of environment, mining large scale genome-phenome data sets; intelligent tools for curation, visualization and analysis of biomedical big data • Clinical research informatics: big data analytics, biostatistics, in-silico trials, merging and mining large disparate data sets that mix images, text and data • Public health informatics: health literacy, health effects of climate change, big data analysis for population health

  8. 16 NLM University-based Training Programs • Columbia • Stanford • Vanderbilt • U Pittsburgh • OHSU • Harvard • Rice • U Washington • UC San Diego • U Wisconsin-Madison • SUNY Buffalo • U Colorado • UNC-Chapel Hill • Yale • U Utah • Indiana IUPUI

  9. Data Science in the Curriculum in NLM training programs • Core data science course content (examples) • Data structures and algorithms • Statistical reasoning and data analytics • Data-driven medicine • Mathematical techniques to analyze data and test hypotheses • Symbolic methods, including ontologies and standards • Reproducibility and Responsible Conduct of Research • Ethics, privacy, data security, data sharing, current topics

  10. BD2K T32 Data Science Training: 14 active programs • “To … fully utilize the vast amount of heterogeneous biomedical Big Data there must be an increase in the number of individuals: (1) trained in developing tools, methods, and analyses to make Big Data useful, and (2) knowledgeable about how to use the tools, methods, and analyses.” • Training requirements • intersection of these three scientific areas: computer science/informatics, statistics/mathematics, and biomedical science; • extensive coursework in both advanced statistical and computational techniques; • training faculty from all three of these scientific disciplines who will work collaboratively across disciplines as co-mentors of trainees; • team approach to solving data-intensive biomedical problems, while also nurturing the skills necessary to be an independent investigator in Big Data Science.

  11. Responsible Conduct of Research (RCR) training in BD2K T32 Programs • Program 1: Standard Responsible Conduct of Research topics plus CITI modules on records-based research and genetic research in human populations. Data privacy, confidentiality, disclosure limitation, privacy-preserving security and data mining • Program 2: Nine standard RCR components plus research misconduct, case studies, research ethics

  12. NIH Strategic Plan for Data Science • Data science: interdisciplinary field of inquiry in which quantitative and analytical approaches, processes, and systems are developed and used to extract knowledge and insights from increasingly large and/or complex sets of data (from the glossary) • Goal 1: Support a Highly Efficient and Effective Biomedical Research Data Infrastructure • Goal 2: Promote Modernization of the Data-Resource Infrastructure • Goal 3: Support the Development and Dissemination of Advanced Data Management, Analytics and Visualization Tools • Goal 4: Enhance Workforce Development for Biomedical Data Science • Goal 5: Enact Appropriate Policies to Promote Stewardship and Sustainability • https://datascience.nih.gov/sites/default/files/NIH_Strategic_Plan_for_Data_Science_Final_508.pdf

  13. NIH Strategic Plan for Data Science • Implementation Tactics for Objective 4.1 • Develop data-science training programs for NIH staff • Launch the NIH Data Fellows program: NIH will recruit a cohort of data scientists and others with expertise in areas such as project management, systems engineering, and computer science from the private sector and academia for short-term (1-3 yr) national service sabbaticals. These NIH Data Fellows will be embedded within a range of high-profile, transformative NIH projects such as All of Us, the Cancer Moonshot, and the BRAIN initiative…. • https://datascience.nih.gov/sites/default/files/NIH_Strategic_Plan_for_Data_Science_Final_508.pdf

  14. NIH Strategic Plan for Data Science • Implementation Tactics for Objective 4.2 • Enhance quantitative and computational training for undergraduates, graduate students, and postdoctoral fellows. • Enable the development of curricula and other resources toward enhancing rigor and reproducibility of data science-based approaches. • Promote training of data scientists in biomedical research areas. • Improve the education of students on NIH training grants by enriching content in Responsible Conduct of Research requirements with information about secure and ethical data use. • In keeping with the National Library of Medicine’s (NLM) strategic plan, “A Platform for Biomedical Discovery and Data-powered Health,” NIH will partner with institutions to engage librarians and information specialists in finding new paths in areas such as library science that have the potential to enrich the data-science ecosystem for biomedical research.

  15. Resources: Core Skills for biomedical data science • White paper developed by NLM staff to supplement the NLM strategic plan • Kaggle survey of over 16K self-identified data scientists • Data science skills taught in BD2K training programs • Desired skills in data science-related job ads • Core skills: • General biomedical subject matter knowledge • Programming language expertise (e.g., R or Python) • Predictive analytics, modeling and machine learning • Team science and scientific communications • Responsible data stewardship • (Zaringhalam, Federer and Huerta. NLM Office of Strategic Initiatives) [no date – 2017]

  16. Resources: Data Science for Undergraduates. Opportunities and Options (Final Report. NASEM: 2018) • “Today the term ‘data scientist’ typically describes a knowledge worker who uses complex and massive data resources characteristic of this new era. However, data science is a broader concept involving principles for data collection, storage, integration, analysis, inference, communication and ethics….” • “…one can envision a variety of models for data science instruction, including discipline-centered data science courses offered by specific academic departments..., large introductory data science courses serving the campus-wide student body,... online courses, bootcamps and other innovative approaches.” • NASEM doi.org/10.17226/25104, p. 8

  17. Foundational skills for data science: Data Acumen (NASEM) • The ability to make good judgments and decisions with data • Mathematical foundations, computational thinking, statistical thinking • Data management and curation, data description and visualization, data modeling and assessment • Workflow and reproducibility • Communication and teamwork • Domain-specific considerations • Ethical problem solving • NASEM doi.org/10.17226/25104 (p. 22, finding 2.3)

  18. NLM Training for Research Careers in Biomedical Informatics and Data Science • NLM supports 200 pre- and postdoctoral trainees at 16 universities, producing graduates who are prepared for research careers in 5 broad areas of biomedical informatics and data science; short-term training positions are available at 11 of the universities, and environmental exposure informatics training at 6 of them
