Why Doesn't EPA Have a Self-Contained Statistical Unit?: A Tribute to Doug Engelbart Dr. Brand Niemann Director and Senior Data Scientist Semantic Community http://semanticommunity.info/ AOL Government Blogger http://breakinggov.com/author/brand-niemann/ July 8, 2013
A Tribute to Doug Engelbart http://www.dougengelbart.org/
Preface • Doug Engelbart had a strong influence on my professional work for the US Government: • It started with his participation in our Federal CIO Council Interagency Collaboration Expedition Workshops with Wikis. • It continued with my building a Dynamic Knowledge Repository for OMB after his Bootstrapping Innovation - Putting Vision to Practice Paradigm. • It finished with an invitation to visit his home and drive him to his doctor for a checkup.
Purpose • Add another building block to my Dynamic Knowledge Repository Ecosystem as a tribute to Doug Engelbart. • Use the recent 5th Principles and Practices For A Federal Statistical Agency as the core of an expert knowledge base. • Answer the question: after all this time and effort, why doesn't the US EPA have a self-contained statistical unit? • Show what can be done with US EPA and Scotland’s environment data in visualizations that the US EPA, OMB, and Scotland want.
Some of My Principles and Practices • Start With the End in Mind (Stephen Covey) • A good visualization depends more on the data and its creator than the tool (Edward Tufte) • Tool Wars Can Impede the Use of Content Management and Visualizations for Decision Making (Brand Niemann) • Encourage all tools to support interoperability (reuse) and “treat all content as data” (Dominic Sale) • A Well-designed Spreadsheet That Can be “Dragged and Dropped” Onto a Tool That Creates Statistics and Visualizations in the Public and Private Clouds is the “Killer App” (Brand Niemann) • This is why I used Silver Spotfire at the US EPA and now for European, Japanese, and US applications, but this can be done with other tools – they just take longer in my experience.
A Well-designed Spreadsheet http://www.scotland.gov.uk/Resource/0040/00400791.xls
Scotland’s Environment: Homepage My Note: It starts with finding the statistics and their metadata and then producing a data story supported by data products. This is what a data scientist/data journalist does! http://www.environment.scotland.gov.uk/default.aspx
Scotland’s Environment: Trends and Indicators http://www.environment.scotland.gov.uk/trends_and_indicators.aspx
The Scottish Government Environmental Statistics http://www.scotland.gov.uk/Topics/Statistics/Browse/Environment
“Drag and Drop” Onto a Tool Open File Open From Library Add Data Tables Add On-Demand Data Table Add Data Connection
Creates Statistics and Visualizations in the Public and Private Cloud
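My Note: The step from a well-designed spreadsheet to statistics can be sketched in a few lines of code. This is only a minimal illustration, assuming the spreadsheet has been exported as CSV with one header row, one record per row, and one variable per column. The sample data, the column names (`region`, `year`, `value`), and the `summarize` helper are all hypothetical and are not taken from the Scottish or EPA files, and this is not how Spotfire itself works.

```python
import csv
import io
import statistics

# Hypothetical stand-in for a well-designed spreadsheet exported as CSV:
# one header row, one record per row, one variable per column.
SAMPLE_CSV = """region,year,value
North,2011,10.0
North,2012,12.0
South,2011,20.0
South,2012,26.0
"""


def summarize(csv_text, group_col, value_col):
    """Group rows by group_col and return basic statistics for value_col."""
    groups = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups.setdefault(row[group_col], []).append(float(row[value_col]))
    return {
        name: {
            "n": len(values),
            "mean": statistics.mean(values),
            "min": min(values),
            "max": max(values),
        }
        for name, values in groups.items()
    }


if __name__ == "__main__":
    for name, stats in summarize(SAMPLE_CSV, "region", "value").items():
        print(name, stats)
```

Because the layout is tidy, the same function works on any column pair without reshaping the data first, which is the point of designing the spreadsheet well before choosing a tool.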
Get a Data Story Idea • In the 5th Principles and Practices For A Federal Statistical Agency, under Principal Statistical Agencies it says: • This section provides information—primarily from agency websites (see Appendix E) and OMB publications—on 13 of the 14 members of the ICSP, excluding only the Office of Environmental Information in the Environmental Protection Agency, which is not a self-contained statistical unit. The information provided for the 13 agencies includes origins, authorizing legislation or other authority, status of head (presidential appointee, career senior executive service official), budget and full-time permanent staffing levels in 2012 (see U.S. Office of Management and Budget, 2012b: Table 1 and App. B), and principal programs. The agencies are discussed in alphabetical order.
Add Your Personal Experience • I worked in EPA's Environmental Statistics Division and compiled a knowledge base of their activities. Earlier I worked in the EPA Center for Environmental Statistics, which tried to become a Bureau of Environmental Statistics, and produced an EPA Ontology State of the Environment Report. • While working in the EPA Center for Environmental Statistics, I helped produce the EPA Guide to Selected National Environment Statistics in the US Government and the Guide to Global Environmental Statistics. I received the EPA Bronze Medal for the former in 1993.
Add Your Personal Opinion • Congress never allowed EPA to have a Bureau of Environmental Statistics, and the Office of Environmental Information in the Environmental Protection Agency would never allow the Environmental Statistics Division to become a self-contained statistical unit. So I decided to spend the rest of my EPA career being a data scientist, applying my statistics and data architecture expertise to analyzing and visualizing as many EPA and government data sets as possible using the premier tool based on S-Plus and Spotfire, called Spotfire by TIBCO. • This turned out to be very visionary because now the statistical agencies (e.g. Census) and OMB are actively looking to apply state-of-the-art tools to provide a lot of federal data to analysts and empower them to use a visualization tool to derive new understandings. See: • http://semanticommunity.info/Data_Science/Free_Data_Visualization_and_Analysis_Tools
Bring In More Ideas and Data Sets My Note: This article contains links to data sets that I am using. http://blog.epa.gov/science/2013/06/epa-scientists-presented-open-science-at-white-house/
EPA Scientists Used These Data Sets My Note: These are the data sets and metadata in the article. http://epa.gov/comptox/
EPA Provides These Open Data Sets My Note: I am mining these data sets. http://www2.epa.gov/open
EPA Just Received Recognition For Their GeoPlatform • Recent Tweet: EPA GeoPlatform got a @ComputerWorld award for collaboration: http://www.eiseverywhere.com/ehome/49069/83917/?& … • https://twitter.com/DruidSmith/status/351786541049331712 • This is an opportunity to make it even more collaborative (reusable) and Digital Government Strategy Compliant!
US EPA Environmental Dataset Gateway Download My Note: This is difficult for the public to use and not “content as data”. https://edg.epa.gov/data/
EDG Well-Designed Spreadsheet My Note: This is a Linked Open Data version of the EPA’s Geospatial Data that supports faceted search! http://semanticommunity.info/@api/deki/files/24897/EPAOpenGovernmentData.xlsx
EDG Visualizations: Bar Charts My Note: One can use this to assess Agency performance and prioritize data analyses. https://silverspotfire.tibco.com/us/library#/users/bniemann/Public?EPAOpenGovernmentData-Spotfire
EDG Visualizations: Map Chart My Note: Dynamically linked adjacent visualizations. https://silverspotfire.tibco.com/us/library#/users/bniemann/Public?EPAOpenGovernmentData-Spotfire
Build a Knowledge Base in MindTouch My Note: This is Digital Government Strategy Compliant! http://semanticommunity.info/CNSTAT/Principles_and_Practices_for_a_Federal_Statistical_Agency
Build a Knowledge Base Index in a Spreadsheet My Note: This is Linked Open Data and makes unstructured content structured so “all content is data” and federated search can be done across everything! http://semanticommunity.info/@api/deki/files/24897/EPAOpenGovernmentData.xlsx
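My Note: As a rough sketch of why an index in spreadsheet form supports faceted search, the example below treats each row as one piece of content and each column as a facet. The rows, column names, example.org URLs, and the `facet_counts` and `faceted_search` helpers are hypothetical placeholders, not the contents of the EPAOpenGovernmentData.xlsx file or any MindTouch feature.

```python
from collections import Counter

# Hypothetical index rows: one row per piece of content, one column per facet.
INDEX = [
    {"title": "Document A", "source": "Agency X", "topic": "Air",
     "url": "http://example.org/a"},
    {"title": "Document B", "source": "Agency X", "topic": "Water",
     "url": "http://example.org/b"},
    {"title": "Document C", "source": "Agency Y", "topic": "Air",
     "url": "http://example.org/c"},
]


def facet_counts(rows, facet):
    """Count how many rows fall under each value of one facet column."""
    return Counter(row[facet] for row in rows)


def faceted_search(rows, **facets):
    """Return the rows whose columns match every requested facet value."""
    return [
        row for row in rows
        if all(row.get(column) == value for column, value in facets.items())
    ]


if __name__ == "__main__":
    print(facet_counts(INDEX, "topic"))
    for row in faceted_search(INDEX, source="Agency X", topic="Air"):
        print(row["title"], row["url"])
```

Once every document, web page, and data set has a row in the same index, one filter can run across all of them, which is the sense in which the content has become data.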
Some Conclusions and Recommendations • Doug Engelbart knew how to work with people and technology. • The recent 5th Principles and Practices For A Federal Statistical Agency contains core subject matter expertise for working with government data to support decision making. • The US EPA and many other government agencies do not have “self-contained statistical units”, but they can make better use of visualizations of their data to support decision making, as Scotland does. • Start With the End in Mind, Avoid Tool Wars, and Develop Well-designed Spreadsheets That Can be “Dragged and Dropped” Onto a Tool That Creates Statistics and Visualizations in the Public and Private Clouds.