
FDA Data Innovation Lab and Predictive Analytics Meetup

Dr. Brand Niemann, Director and Senior Data Scientist/Data Journalist, Semantic Community. http://semanticommunity.info/ http://www.meetup.com/Federal-Big-Data-Working-Group/

Presentation Transcript


  1. FDA Data Innovation Lab and Predictive Analytics Meetup. Dr. Brand Niemann, Director and Senior Data Scientist/Data Journalist, Semantic Community. http://semanticommunity.info/ http://www.meetup.com/Federal-Big-Data-Working-Group/ http://semanticommunity.info/Data_Science/Federal_Big_Data_Working_Group_Meetup October 6, 2014

  2. Agenda • 6:30 p.m. Welcome and Introduction: Report on Recent Meeting with Dr. Taha Kass-Hout, FDA’s First Chief Health Informatics Officer (CHIO), and FDA Data Science Data Publication Tutorial: • Interest in our Meetup on openFDA, July 7th • Keynote at AFCEA Bethesda’s Health IT Day, December 2nd • 7:00 p.m. Brooke Aker, Big Data Lens: Predictive Analytics for openFDA and Other Examples • 7:45 p.m. Brief Member Introductions and Inter-American Development Bank Open Data Portal and FDA Examples • 8:30 p.m. Open Discussion • 8:45 p.m. Networking • 9:00 p.m. Depart

  3. Dr. Taha Kass-Hout, FDA’s First Chief Health Informatics Officer • Dr. Taha Kass-Hout is the Chief Health Informatics Officer of FDA: taha@fda.hhs.gov | @DrTaha_FDA • Dr. Jeffrey Shuren is Director of FDA's Center for Devices and Radiological Health: jeff.shuren@fda.hhs.gov

  4. OpenFDA • OpenFDA is a new initiative to provide unprecedented access to FDA data and to highlight projects in the public and private sectors that use these data to further scientific research, educate the public, and save lives. • OpenFDA is an initiative of FDA’s Office of Informatics and Technology Innovation to provide a new level of access to a number of high-value public FDA datasets via RESTful APIs and structured raw file downloads. Currently, the project is in an early-development stage, with an alpha release of two datasets planned for spring 2014 and a larger public release later in the year. Additionally, openFDA will provide a platform for the community to interact with each other and with FDA domain experts, with the goal of spurring innovation around FDA data and creating new partnerships and opportunities between the public and private sectors (emphasis mine). • Presidential Innovation Fellow: Sean Herron is a Presidential Innovation Fellow serving at FDA: sean.herron@fda.hhs.gov | @seanherron http://www.hhs.gov/idealab/innovate/openfda/
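The RESTful APIs mentioned above are queried with plain HTTPS GET requests. The sketch below only builds query URLs, assuming openFDA's public conventions of an `https://api.fda.gov/<category>/<dataset>.json` endpoint and the `search`, `count`, `limit`, and `api_key` query parameters. The helper names `build_openfda_url` and `fetch_json` are hypothetical, and the set of endpoints available in 2014 was smaller than today's.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

OPENFDA_BASE = "https://api.fda.gov"  # public openFDA API host


def build_openfda_url(endpoint, search=None, count=None, limit=None, api_key=None):
    """Build a GET URL for an openFDA endpoint such as "drug/event".

    Hypothetical helper: only the common query parameters are handled.
    """
    params = {}
    if search is not None:
        params["search"] = search  # Lucene-style query, e.g. "field:term"
    if count is not None:
        params["count"] = count  # field to tally instead of returning records
    if limit is not None:
        params["limit"] = limit  # maximum number of records returned
    if api_key is not None:
        params["api_key"] = api_key  # optional key for higher rate limits
    url = f"{OPENFDA_BASE}/{endpoint}.json"
    if params:
        # Keep ":" readable; spaces are encoded as "+".
        url += "?" + urlencode(params, safe=":")
    return url


def fetch_json(url):
    """Download and decode one openFDA response (requires network access)."""
    with urlopen(url) as response:
        return json.load(response)


if __name__ == "__main__":
    url = build_openfda_url(
        "drug/event",
        search="patient.drug.openfda.brand_name:aspirin",
        limit=1,
    )
    print(url)
```

Passing that URL to `fetch_json` should return a JSON object whose `results` list holds the matching reports; only the `urlopen` call touches the network, which keeps the URL-building logic easy to check offline.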

  5. OpenFDA History • OpenFDA is the first innovation created by Taha Kass-Hout, MD, MS, upon joining FDA as the first Chief Health Informatics Officer in March 2013. • Dr. Kass-Hout launched the project by obtaining a Presidential Innovation Fellow to focus on policy and programmatic issues in July 2013. • In August 2013, a research and development contract was awarded to Iodine, Inc. to build the site. • The public cloud environment was determined in September 2013, and Dr. Kass-Hout’s team solicited agency and user input into policies, first-priority datasets, and desirable technical characteristics of openFDA. • In December 2013, FDA established the Office of Informatics and Technology Innovation (OITI) under Dr. Kass-Hout’s leadership. • OpenFDA launched in Beta mode on June 2, 2014. • By September 2014, medical device reports, enforcement reports, and drug adverse event reports were available. • There were over 4.5 million data calls, over 40,000 visitors to openFDA from all over the world, dozens of press articles, and several websites that use openFDA in their own public offerings. • During Fiscal Year 2015, additional datasets and harmonization will be added. https://open.fda.gov/about/

  6. Making public FDA datasets more accessible • Caution: • We're in beta! openFDA is a beta research project and not for clinical use. We may limit or otherwise restrict your access to the API in line with our Terms of Service. Need help? Try StackExchange. • Purposes: • Open data for easier and better access to FDA datasets. APIs, raw data, and documentation for high-value public datasets. • Open source code and documentation. Shared on GitHub for community contribution. • Open community to share examples, apps, and ideas. Developers, researchers, and FDA on GitHub, StackExchange, and Twitter. https://open.fda.gov/

  7. OpenFDA Updates • Introducing openFDA • Taha Kass-Hout | 04 Mar 2014 • FDA's Path Forward for Open Data and Next Generation Sequencing * • Taha Kass-Hout | 06 Mar 2014 (See next slides) • Ten Things to Know About Drug Adverse Events * • Sean Herron | 02 Jun 2014 (See next slides) • OpenFDA: Innovative Initiative Opens Door to Wealth of FDA’s Publicly Available Data • Taha Kass-Hout | 02 Jun 2014 • OpenFDA Provides Ready Access to Recall Data • Taha Kass-Hout | 08 Aug 2014 • Providing Easy Public Access to Prescription Drug, Over-the-Counter Drug, and Biological Product Labeling • Taha Kass-Hout | 18 Aug 2014 • Providing Easy Access to Medical Device Reports Submitted to FDA since the Early 1990s • Taha Kass-Hout | Jeffrey Shuren | 19 Aug 2014 https://open.fda.gov/updates/

  8. FDA's Path Forward for Open Data and Next Generation Sequencing • Utility NGS (Next Generation Sequencing) in the Internet cloud: FDA is facing growing NGS needs for processing internal genome sequencing data as well as NGS data from industry submissions. The NGS initiative is planning and developing a cloud-based Big Data platform and analytics for robust, secure, and controlled data storage, analysis, and collaboration, and potentially for sharing public-access genome sequencing information. • NGS is a Big Data Initiative. https://open.fda.gov/update/fda-path-forward-for-open-data-and-next-generation-sequencing/

  9. Ten Things to Know About Drug Adverse Events • 1. Start with the examples • 2. Know the limitations • 3. Know why the data is sometimes messy • 4. Make sure you check out the reference • 5. Learn the Lucene query syntax • 6. Don’t forget about count • 7. Use the openfda fields! • 8. Use .exact to count for phrases • 9. Beware of null values • 10. Watch for changes • We’ll be adding additional data to this endpoint whenever a new Quarterly Data File is posted. • My Note: I used the bulk data downloads! https://open.fda.gov/update/ten-things-to-know-about-adverse-events/
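Tips 6, 8, and 9 can be illustrated offline. In openFDA, a count query such as `https://api.fda.gov/drug/event.json?count=patient.reaction.reactionmeddrapt.exact` tallies whole reaction terms (the `.exact` suffix keeps multi-word phrases intact instead of counting individual words) and returns a `results` list of `term`/`count` pairs. The sketch below parses such a response and reads nested report fields that may be missing. The helper names `top_terms` and `get_field` are hypothetical, and the sample payload's numbers are made up for illustration.

```python
import json


def top_terms(count_response_text, n=5):
    """Return the n largest (term, count) pairs from an openFDA count response."""
    payload = json.loads(count_response_text)
    # Count responses carry {"results": [{"term": ..., "count": ...}, ...]};
    # an error payload has no "results", so fall back to an empty list.
    results = payload.get("results", [])
    pairs = [(row["term"], row["count"]) for row in results]
    pairs.sort(key=lambda pair: pair[1], reverse=True)
    return pairs[:n]


def get_field(record, dotted_path, default=None):
    """Follow a dotted path such as "patient.patientsex" through nested dicts.

    Adverse event reports often omit fields, so any missing key (or a
    non-dict value along the way) yields `default` instead of raising KeyError.
    """
    current = record
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Made-up sample shaped like a count response (not real openFDA numbers).
SAMPLE = '{"results": [{"term": "NAUSEA", "count": 12}, {"term": "HEADACHE", "count": 30}]}'

if __name__ == "__main__":
    print(top_terms(SAMPLE, n=1))  # [('HEADACHE', 30)]
    print(get_field({"patient": {}}, "patient.patientsex", default="unknown"))  # unknown
```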

  10. Data Science Data Publications for Big Data Analytics • New Government Data Science Best Practices: • Digital Government Strategy • Open Research Data Policy • Agency: HHS IdeaLab, NIH Data Commons, FDA Innovation Lab • White House NITRD Big Data Initiative and NSF Agency Strategic Plan: Data Science, Data Infrastructure, and Data Publications • New Government Data Science Publication Examples: • Federal Data Center Consolidation 2014 • Performance.gov • FDA Data and FDA Data Innovation Lab • National Science Board Science & Engineering Indicators

  11. Data Science Data Publication for Federal Data Center Consolidation 2014: Data Journalism • In 2011 and 2012, I published three stories on the Federal Data Center Consolidation Initiative about the poor quality and incompleteness of its data. It was one of the first non-federal applications of analytics I did after leaving government service. I decided to revisit the data for this talk and was pleased to find that its quality and completeness had improved considerably, so I imported the new spreadsheet into Spotfire and explored the results in multiple dynamically linked, adjacent visualizations. • Of the 3,665 data centers now in the data set, only 976 have been closed since the beginning of the program, and 2,689 are yet to be closed in 2014-2015! The vast majority of these (2,254) belong to the Department of Agriculture. Spreadsheet
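The counts above can be cross-checked with a few lines of arithmetic. This sketch assumes that "these" refers to the 2,689 data centers still to be closed; the constant names are illustrative.

```python
TOTAL = 3665       # data centers in the 2014 data set
CLOSED = 976       # closed since the program began
REMAINING = 2689   # still to be closed in 2014-2015
USDA = 2254        # remaining centers belonging to the Department of Agriculture

# The closed and remaining counts should account for every data center.
assert CLOSED + REMAINING == TOTAL

closed_share = CLOSED / TOTAL    # fraction already closed
usda_share = USDA / REMAINING    # Agriculture's fraction of the remaining centers

print(f"closed: {closed_share:.1%}")                          # closed: 26.6%
print(f"Agriculture share of remaining: {usda_share:.1%}")    # Agriculture share of remaining: 83.8%
```

Under that reading, roughly a quarter of the inventoried centers had closed, and Agriculture accounted for about five in six of those still open.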

  12. Data Science Data Publication for Federal Data Center Consolidation 2014: Data Visualization Web Player

  13. Data Science Data Publications for FDA: Data Science Data Mining Process • Recall the OpenFDA Knowledge Base for previous visualization and analytics: • Brooke Aker, Biplab Pal, and Brand Niemann. • Mined HealthData.gov for FDA data and built linked data spreadsheets (17) for Spotfire: • See next slides. • Mined the FDA Site Map for data: • Found two: Data Standards and FDA Drug Approvals & Databases. • Downloaded and inventoried files (41) (ZIP, CSV & XLS) for Spotfire. • Used for the FDA Data Innovation Lab Visualization Gallery.
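The "downloaded and inventoried files" step can be sketched as counting files by extension and recording what each ZIP archive contains, since one archive may bundle several data files. This is a minimal standard-library sketch; the function names and folder layout are hypothetical and not the author's actual workflow, which was built around spreadsheets and Spotfire.

```python
import zipfile
from collections import Counter
from pathlib import Path


def count_extensions(file_names):
    """Tally file names by lowercase extension, e.g. {".csv": 2, ".zip": 1}."""
    counts = Counter()
    for name in file_names:
        suffix = Path(name).suffix.lower()
        counts[suffix or "(none)"] += 1
    return counts


def inventory_folder(folder):
    """Walk a download folder and return (extension counts, ZIP member lists)."""
    files = [p for p in Path(folder).rglob("*") if p.is_file()]
    zip_members = {}
    for path in files:
        if path.suffix.lower() == ".zip" and zipfile.is_zipfile(path):
            with zipfile.ZipFile(path) as archive:
                # Record each archive's contents, since one ZIP may hold several files.
                zip_members[path.name] = archive.namelist()
    return count_extensions(p.name for p in files), zip_members
```

Calling `inventory_folder("downloads")` on a (hypothetical) download folder returns both the per-type tally and the archive contents, which is the raw material for an inventory worksheet.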

  14. Data Science for OpenFDA: MindTouch Knowledge Base http://semanticommunity.info/Data_Science/Data_Science_for_OpenFDA

  15. Data Science for FDA Data: Excel Spreadsheet Data Ecosystem • FDA @ HealthData.gov • Summary FDA • FDA Site Map • FDA-TRACK • FDA Glossary • FDA-TRACK Research Glossary • FDA Drug Approvals & Databases • Summary All • Holdren Memo Agencies • HealthData.gov Subject 09172014 • HealthData.gov Agency 09172014 • HealthData.gov Date 09172014 • HealthData.gov Year 09172014 • HealthData.gov Period 09172014 • HealthData.gov Spatial 09172014 • HealthData.gov Start 09172014 • HealthData.gov Media 09172014 http://semanticommunity.info/@api/deki/files/30746/HHSFDA.gov.xlsx?origin=mt-web

  16. Data Science Data Publication: FDA Data in Spotfire • Cover Page-Performance Analytics: FDA TRACK • Content Analytics: Summary Statistics • Content Analytics: HealthData.gov Statistics 09172014 • Content Analytics: FDA @ HealthData.gov • Network Analytics: FDA Glossary & Site Map • Data Analytics: FDA Drug Approvals & Databases

  17. Cover Page-Performance Analytics: FDA TRACK. My Note: Most programs do not have a Strategic Plan! Web Player

  18. Content Analytics: Summary Statistics. My Note: Of the 5 HHS agencies that come under the Holdren Memo, CDC and FDA have by far the most data sets, and in almost equal numbers! Web Player

  19. Content Analytics: HealthData.gov Statistics 09172014. My Note: See how few of these data sets are in readily usable media! Web Player
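The HealthData.gov statistics rest on simple tallies of catalog records by one metadata field at a time (as in the Subject, Agency, Media, and other worksheets listed earlier). A minimal sketch, assuming catalog records are dicts with hypothetical `agency` and `media` keys; the set of formats treated as "readily usable" is an illustrative assumption, not an official definition.

```python
from collections import Counter

# Illustrative assumption: media types counted as "readily usable" for analysis.
READY_FORMATS = {"csv", "json", "xml", "xls", "xlsx", "api"}


def tally(records, field, missing="(blank)"):
    """Count catalog records by one metadata field, treating blanks as `missing`."""
    return Counter(record.get(field) or missing for record in records)


def ready_share(records):
    """Fraction of records whose media (case-insensitive) is in READY_FORMATS."""
    if not records:
        return 0.0
    ready = sum(
        1 for record in records
        if (record.get("media") or "").lower() in READY_FORMATS
    )
    return ready / len(records)
```

Each `tally(records, field)` call corresponds to one per-field worksheet, and `ready_share` turns the media tally into a single number that can be compared across agencies.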

  20. Content Analytics: FDA @ HealthData.gov. My Note: A Dashboard to the FDA Dashboards! Web Player

  21. Network Analytics: FDA Glossary & Site Map. My Note: The FDA Site Map and Glossary as a Linked Data Network! Web Player
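One simple way to treat a glossary and a site map as a linked data network is to draw an edge whenever a glossary term appears in a page title, then rank terms by how many pages they connect to. The sketch below uses naive case-insensitive substring matching; that linking rule, the function names, and the sample terms and titles are assumptions for illustration, not necessarily how the Spotfire network was built.

```python
from collections import defaultdict


def link_terms_to_pages(terms, page_titles):
    """Map each glossary term to the site map titles that mention it."""
    edges = defaultdict(set)
    for term in terms:
        needle = term.lower()
        for title in page_titles:
            if needle in title.lower():  # naive substring match
                edges[term].add(title)
    return dict(edges)


def degree_ranking(edges):
    """Sort terms by number of linked pages, most connected first (ties alphabetical)."""
    return sorted(edges, key=lambda term: (-len(edges[term]), term))


if __name__ == "__main__":
    # Hypothetical glossary terms and site map titles.
    terms = ["Recall", "Adverse Event", "Biologic"]
    titles = [
        "Recalls, Market Withdrawals, & Safety Alerts",
        "Reporting an Adverse Event",
        "Adverse Event Reporting System",
        "Drug Approvals and Databases",
    ]
    edges = link_terms_to_pages(terms, titles)
    print(degree_ranking(edges))  # ['Adverse Event', 'Recall']
```

Terms that match no page (here "Biologic") simply drop out of the network, which is one way to spot glossary entries that the site map does not cover.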

  22. Data Analytics: FDA Drug Approvals & Databases. My Note: An inventory to prioritize further data science data publication work! Web Player

  23. FDA Data Innovation Lab Visualization Gallery: Spreadsheet Inventory. My Note: This inventory is updated as one drills down into the data sets! http://semanticommunity.info/@api/deki/files/30746/HHSFDA.gov.xlsx?origin=mt-web

  24. FDA Data Innovation Lab Visualization Gallery: File Folder. My Note: Some folders contain multiple files!

  25. Suggestions • Help the FDA Data Innovation Lab with data publication gallery and wall posters. • Help the FDA Data Innovation Lab with their Open Data Lab Day. • Organize Joint Meetups and promote use of the FDA Data Innovation Lab. • Help form Data Science Teams to work on FDA big data problems.

  26. Open Data Portal for the Inter-American Development Bank: Comments • Another good meeting last night. • Thank you for organizing this meetup, very helpful! Special thanks to Brand for all the info you shared. I'm looking forward to future ones! • Terrific and innovative data visualizations can make a big impact indeed. • This week was very good - exposure to interesting beta products (Semantic Insights, this week) as well as new approaches to visualization techniques are always things to which I look forward. When I get to see an illustration of the concept of "cognitive load" in visualizations the way it was shown in this session (with Sankey diagrams), it makes it an even better session. Great stuff! And I get to play around with a new data set - even better! http://www.meetup.com/Federal-Big-Data-Working-Group/events/206366842/

  27. Open Data Portal for the Inter-American Development Bank: Annette Hester • Thanks for hosting me last night. It was a pleasure to share ideas with such a knowledgeable group. • We would be delighted if you or any in your group took time to understand the database and compare it to traditional graphs and other visualizations. As I mentioned, the easiest way to do so would be using the first data graph, Energy Flows (http://www.iadb.org/eic/database). It is a Sankey Graph with a twist. You can find similar products at: • http://www.iea.org/Sankey/ • https://flowcharts.llnl.gov/ • www.energyliteracy.com • http://www.sankey-diagrams.com/tag/ghg/ • And if you google energy flow charts you will find quite a variety. • The more I look at energy data and what we have published, the better I feel about our database. I look forward to the results of your investigation. Please do keep in touch… and do feel free to post this note on the meetup website.
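In a Sankey diagram such as Energy Flows, each link's width is proportional to the quantity flowing from a source node to a target node, so preparing data for one mostly means aggregating source-to-target quantities. A minimal sketch, assuming input rows of `(source, target, value)`; the function names and the sample rows in the usage note are hypothetical, and the numbers are not IDB data.

```python
from collections import defaultdict


def sankey_links(rows):
    """Aggregate (source, target, value) rows into unique weighted links."""
    totals = defaultdict(float)
    for source, target, value in rows:
        totals[(source, target)] += value
    return [
        {"source": s, "target": t, "value": v}
        for (s, t), v in sorted(totals.items())
    ]


def node_balance(links):
    """Inflow minus outflow per node; pass-through nodes net to about zero (losses aside)."""
    balance = defaultdict(float)
    for link in links:
        balance[link["source"]] -= link["value"]
        balance[link["target"]] += link["value"]
    return dict(balance)
```

For example, two rows of `("Oil", "Transport", 5.0)` and `("Oil", "Transport", 3.0)` collapse into one link of width 8.0. Checking `node_balance` on intermediate nodes is a quick way to catch flows that do not add up before drawing the diagram.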

  28. Open Data Portal for the Inter-American Development Bank: Energy Flows Visualization http://www.iadb.org/en/topics/energy/energy-innovation-center/flow-institutional-data,8879.html?view=v11

  29. Data Science for IDB Data: MindTouch Knowledge Base • My Initial Data Science Data Publication: • How was the data collected? • Where is the data stored? • What are the results? • Why should we believe them? The broader context and constructive critique http://semanticommunity.info/Data_Science/Data_Science_for_IDB_Data

  30. Data Science for IDB Data: IDB and Semantic Community Spreadsheets. Integrated Spreadsheet http://www.iadb.org/en/topics/energy/energy-innovation-center/flow-institutional-data,8879.html?view=v11# http://semanticommunity.info/@api/deki/files/30751/Brazil_2011.csv?origin=mt-web

  31. Data Science for IDB Data: Spotfire Data Publication. Compared to What? Web Player

  32. Inter-American Development Bank Open Data Portal Examples, Etc. • Please post your interest in providing a visualization example(s) and explanations to our Meetup site • Also feel free to use the FDA data or any other data you are working with in visualizations and explanations. • NSB Science & Engineering Indicators • FDA Data Innovation Lab Visualization Gallery • This is your time to shine!
