
Data Science for the National Big Data R&D Initiative

Data Science for the National Big Data R&D Initiative. Dr. Brand Niemann Director and Senior Data Scientist/Data Journalist Semantic Community http://semanticommunity.info/ http://www.meetup.com/Federal-Big-Data-Working-Group/


Presentation Transcript


  1. Data Science for the National Big Data R&D Initiative Dr. Brand Niemann Director and Senior Data Scientist/Data Journalist Semantic Community http://semanticommunity.info/ http://www.meetup.com/Federal-Big-Data-Working-Group/ http://semanticommunity.info/Data_Science/Federal_Big_Data_Working_Group_Meetup October 5, 2014

  2. Background
     • I received an email that called attention to this important Federal Register Notice and the Knowledge Base for this story evolved to include:
     • Request for Input (RFI)-National Big Data R&D Initiative
     • Email and Web
     • The National Big Data R&D Initiative
     • PDF
     • GovLoop (Web and PDF)
     • The Journey to Big Data & Analytics Adoption
     • The Foundation For Data Innovation: The Enterprise Data Hub Report
     • Access and Use of NASA and Other Federal Earth Science Data
     • Email and Wiki
     • Big Data in Materials Research and Development
     • Web and PDF

  3. Data Science Knowledge Base in MindTouch Data Science for the National Big Data R&D Initiative

  4. Data Science Data Publication: EPA Open Data Policy Inventory
     • Project Open Data to Find EPA Public Excel File
     • Start at: http://project-open-data.github.io/
     • Scroll down to find Example Data Hubs: http://project-open-data.github.io/data-hubs/
     • Scroll down to find Environmental Protection Agency: http://www.epa.gov/data/
     • See EPA's Public Excel file and download it: http://www2.epa.gov/sites/production/files/2013-12/usepa-pdl4odp-nov-2013-final.xlsx
     • Answers to Data Science Publication Questions:
     • How was the data collected? We do not know yet.
     • Where is the data stored? Excel Spreadsheet
     • What are the results? Not Provided – We will do some analytics in Spotfire
     • Why should we believe them? We do not know yet.
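
The four-question check on this slide can be sketched as a small checklist. This is only an illustrative sketch, not the presenter's tooling: the `DataPublication` class and its method names are hypothetical, and the recorded answers simply mirror what the slide states (only the storage question has an answer so far).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# The four data science publication questions, as listed on the slide.
QUESTIONS = (
    "How was the data collected?",
    "Where is the data stored?",
    "What are the results?",
    "Why should we believe them?",
)


@dataclass
class DataPublication:
    """Hypothetical record of answers to the four questions for one dataset."""

    name: str
    # Maps a question to its answer; None means "not known / not provided".
    answers: Dict[str, Optional[str]] = field(default_factory=dict)

    def unanswered(self) -> List[str]:
        """Return the questions with no answer recorded, in slide order."""
        return [q for q in QUESTIONS if not self.answers.get(q)]

    def is_complete(self) -> bool:
        """True only when all four questions have an answer."""
        return not self.unanswered()


# Answers as given on the slide for the EPA Public Excel file.
epa_inventory = DataPublication(
    name="EPA Open Data Policy Inventory",
    answers={
        "How was the data collected?": None,   # "We do not know yet"
        "Where is the data stored?": "Excel Spreadsheet",
        "What are the results?": None,         # "Not Provided"
        "Why should we believe them?": None,   # "We do not know yet"
    },
)

if __name__ == "__main__":
    for question in epa_inventory.unanswered():
        print("Unanswered:", question)
    print("Complete:", epa_inventory.is_complete())
```

Treating "not known" and "not provided" as the same missing state is a simplification chosen for this sketch; a fuller record might distinguish them.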

  5. EPA Public Excel File My Note: These links are not visible. See next slide. http://semanticommunity.info/@api/deki/files/31049/usepa-pdl4odp-nov-2013-final.xlsx

  6. EnviroAtlas: Link to Summary My Note: Link to Resource Not Available https://edg.epa.gov/metadata/catalog/search/resource/details.page?uuid=%7b0068AF7C-DE6C-4510-806A-BC35F333FE24%7d

  7. Data Science Analytics in Spotfire Web Player

  8. Some Results and Next Steps
     • Semantic Community has provided a Response to the Request for Input (RFI)-National Big Data R&D Initiative, citing the Federal Big Data Working Group Meetup as an example.
     • We have built a knowledge base of examples from government (NASA/EPA Workshop), academia (NAS Workshop), and industry (GovLoop - Cloudera and IBM) to highlight a key differentiator, namely data science data publications that answer four key questions.
     • We have used the Open Data Agency Data Hubs to find and use the EPA Public Excel File to illustrate this.
     • The result is that the EPA Public Excel File is insufficient to answer the four questions needed to produce a Data Science Data Publication, but the analytics have led us to the EPA EnviroAtlas “big data” sets (209) that can be used once the data have been reformatted for ease of use.
