Data Science for DataBay. DataBay "Reclaim the Bay" Innovation Challenge: August 1-3, 2014, Smithsonian Environmental Research Center, 647 Contees Wharf Rd, Edgewater, MD 21037. http://databay.splashthat.com Dr. Brand Niemann, Director and Senior Data Scientist, Semantic Community, and Dr. Joan Aron, Principal, Aron Consulting. August 3, 2014. http://semanticommunity.info/Data_Science/Data_Science_for_DataBay
So take a look at the new data catalogue & let us know what you think • Can you work with this data? • Yes, especially the 5 sites that provide spreadsheet downloads. Another 5 sites require a separate inventory, two sites require browsing many data sets to make a selection, one site does not appear to provide the actual data, and one site requires the user to have ArcGIS software. • Are you encountering any issues with the datasets? • Two sites return error messages and one site requires ArcGIS software. • Are there relevant datasets or websites which are missing? • Probably, but there is so much to work with and so little time that this can come later. • What other information would you like to see? • What I have done here as a data scientist to begin the data mining process: see the next two slides.
DataBay Bibliography Catalogue http://semanticommunity.info/@api/deki/files/30279/Data-Bay-Bibliography-1.xlsx
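The access review in the answers above can be sketched as a small tally over the catalogue. This is only an illustrative sketch: it assumes the spreadsheet has been exported to CSV with two hypothetical columns, `site` and `access`, which are not taken from the actual workbook, and the sample rows and access labels are made up.

```python
import csv
import io
from collections import Counter

# Hypothetical access label treated as immediately usable; the wording
# paraphrases the review above and is not a column value from the catalogue.
USABLE_NOW = {"spreadsheet download"}


def tally_access(csv_text):
    """Count catalogue rows per access label (column names are assumed)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["access"].strip().lower() for row in reader)


def usable_sites(csv_text):
    """Return the names of sites whose access label is in USABLE_NOW."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row["site"]
        for row in reader
        if row["access"].strip().lower() in USABLE_NOW
    ]


# Made-up rows for illustration only; not taken from the actual catalogue.
SAMPLE = """site,access
Example Site A,Spreadsheet download
Example Site B,Requires ArcGIS
Example Site C,Error message
Example Site D,Spreadsheet download
"""

if __name__ == "__main__":
    print(tally_access(SAMPLE))
    print(usable_sites(SAMPLE))
```

A tally like this makes it easy to decide which sites to start with and which need a separate inventory first.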
Data Mining Process Standard. Source: Data Science for Business (2013) at http://shop.oreilly.com/product/0636920028918.do
Federal Big Data Working Group Meetup • The Fourth Paradigm of Science (1): • Fourth Paradigm. Data-intensive science that exploits large volumes of data in new ways for scientific exploration, such as the International Virtual Observatory Alliance in astronomy. • The Fourth Question of Big Data for Science (2): • How was the data collected? • Where is the data stored? • What are the data results? • Does the data story persuade? • Data Science Data Publications: • In General: Open Government and Non-Government Research Data in Data FAIRports (Findable, Accessible, Interoperable, and Reusable) or Commons (e.g. NIH BIG DATA Program) • Specifically: Chesapeake Bay Program and EPA EnviroAtlas. (1) Bell, G., Hey, T., & Szalay, A. (2009) Beyond the data deluge, Science 323, 6 March 2009, pp. 1297-1298. (2) de Waard, Anita (2014) About Stories, that Persuade With Data, Federal Big Data Working Group Meetup, 20 May, 41 slides.
Data Science for DataBay: Knowledge Base in MindTouch (Wiki) http://semanticommunity.info/Data_Science/Data_Science_for_DataBay
Data Science for DataBay: Data Commons in MindTouch (Wiki) http://semanticommunity.info/Data_Science/Data_Science_for_DataBay
Data Science for DataBay: Data Commons in Spotfire (Business Intelligence) 1. My Note: Read Instructions to Execute These Dynamically Linked Visualizations.
Data Science for DataBay: Data Commons in Spotfire (Business Intelligence) 2. My Note: Read Instructions to Execute These Dynamically Linked Visualizations.
Some Conclusions and Next Steps • We formed a team of a senior data scientist and a senior environmental scientist, both members of the Federal Big Data Working Group Meetup. • We looked at the new data catalogue & let you know what we thought about using it. • We have built and deployed Knowledge Base, Data FAIRport-Data Commons, and Business Intelligence applications on the Semantic Data Web. • Our in-depth data science work for the Chesapeake Bay Program and EPA EnviroAtlas continues.