Providing Consistent Corporate Information
David Smith
Kevin Donovan
November 8, 2011
EPA Corporate Data
• Background
• Consistency
• EPA’s Facility Registry System
• Linked Open Data
• Discussion
Background
• EPA uses parent company data in
  • Small business analyses
  • Enforcement and compliance cases
• Parent company data is reported to EPA by several different programs, including the Toxics Release Inventory (TRI)
• In the last several years, the financial community, NGOs, and environmental and academic organizations have expressed interest in data collected by EPA
Background – TRI Program
• Requires annual reports from facilities that meet reporting thresholds for the manufacture, processing, or other use of toxic chemicals
• Approximately 20,000 facilities report annually, providing information such as chemical releases and recycling totals
• Facilities are also required to report their parent company name and Dun & Bradstreet number
• The large number of facilities, annual reporting, and corporate parent information make TRI a rich source of data for researchers
Consistency
• Facilities often submit names with small variations
  • The Dow Chemical Company vs. Dow Chemical Co
  • JR Simplot vs. J R Simplot
Consistency – Naming Conventions Help to Eliminate Variability
• Corporate initials (Zurn Industries PLC vs Zurn Industries)
• Misspellings (Taiheiyo Cement vs Taihieyo Cement)
• Hyphenation (Trans-Matic vs TransMatic)
• Spacing (JR Simplot vs J R Simplot)
• Finally, companies involved in mergers and acquisitions are updated so the Parent Company list only includes the US domestic parent company.
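The kinds of variation listed on this slide can be flagged automatically for review. The slides do not say how variants were actually detected, so the following is only a minimal sketch under assumptions: the helper names `compact_key` and `likely_same_parent` and the 0.9 similarity threshold are hypothetical, and the comparison uses Python's standard `difflib` rather than any EPA tool.

```python
from difflib import SequenceMatcher


def compact_key(name: str) -> str:
    """Uppercase the name and keep only letters and digits.

    This removes differences in hyphenation, spacing, and punctuation.
    """
    return "".join(ch for ch in name.upper() if ch.isalnum())


def likely_same_parent(a: str, b: str, threshold: float = 0.9) -> bool:
    """Return True when two submitted names look like variants of one name.

    The threshold is an illustrative assumption, not a value from the slides.
    """
    key_a, key_b = compact_key(a), compact_key(b)
    if key_a == key_b:
        # Hyphenation and spacing variants collapse to the same key.
        return True
    # Misspellings and dropped suffixes leave keys that are nearly equal.
    return SequenceMatcher(None, key_a, key_b).ratio() >= threshold


if __name__ == "__main__":
    print(likely_same_parent("Trans-Matic", "TransMatic"))
    print(likely_same_parent("Taiheiyo Cement", "Taihieyo Cement"))
    print(likely_same_parent("Dow Chemical", "JR Simplot"))
```

A match from a check like this is best treated as a candidate for human review, since unrelated companies can have similar names.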
Consistency – Naming Conventions Help to Eliminate Variability
• Eliminate all periods and commas
• Use all CAPITALS
• Replace commonly used acronyms and abbreviations such as:
  • Replace AND with &
  • Replace LIMITED with LTD
  • Replace CORPORATION with CORP
  • Replace ASSOCIATION with ASSOC
  • Replace LIMITED LIABILITY COMPANY with LLC
  • Replace COMPANY with CO
  • Replace LIMITED LIABILITY CO. with LLC
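The conventions on this slide translate fairly directly into a small normalization routine. The sketch below is one possible reading of them, not EPA's implementation: the function name `standardize_name` is hypothetical, and applying the longer phrases first and matching on whole words are assumptions made so that, for example, LIMITED LIABILITY COMPANY is not partially rewritten.

```python
import re

# Replacement rules from the slide. Longer phrases come first (an assumption)
# so that "LIMITED LIABILITY COMPANY" is handled before "COMPANY" or "LIMITED".
REPLACEMENTS = [
    ("LIMITED LIABILITY COMPANY", "LLC"),
    ("LIMITED LIABILITY CO", "LLC"),  # the period is already removed by then
    ("CORPORATION", "CORP"),
    ("ASSOCIATION", "ASSOC"),
    ("COMPANY", "CO"),
    ("LIMITED", "LTD"),
    ("AND", "&"),
]


def standardize_name(raw: str) -> str:
    """Apply the slide's naming conventions to one submitted parent name."""
    name = raw.upper()                # use all capitals
    name = re.sub(r"[.,]", "", name)  # eliminate all periods and commas
    for phrase, abbreviation in REPLACEMENTS:
        # Whole-word match so that, e.g., ANDERSON is not changed by the AND rule.
        name = re.sub(rf"\b{re.escape(phrase)}\b", abbreviation, name)
    return " ".join(name.split())     # collapse any repeated whitespace


if __name__ == "__main__":
    for submitted in ["Smith and Jones Corporation", "GE, CO.", "G.E."]:
        print(submitted, "->", standardize_name(submitted))
```

These rules alone do not reconcile every variant (spacing differences such as JR vs J R remain), so they would likely be combined with manual review.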
Consistency

Submitted Parent Name (2008)      # of Facilities
G.E.                              1
GE                                35
GE.                               79
GE, CO.                           1
GENERAL ELECTRIC                  1
GENERAL ELECTRIC CO               21
GENERAL ELECTRIC COR              1
NA                                1

Standardized Parent Name (2008)   # of Facilities
GENERAL ELECTRIC CO (GE CO)       140
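The roll-up on this slide can be reproduced as a simple aggregation over the submitted counts. This is a sketch under assumptions: the alias table and the `roll_up` helper are hypothetical, and the NA row is mapped by hand because the slide's total of 140 includes it without saying how it was resolved.

```python
from collections import Counter

STANDARD_NAME = "GENERAL ELECTRIC CO (GE CO)"

# Submitted parent names and facility counts for 2008, as shown on the slide.
SUBMITTED_COUNTS = {
    "G.E.": 1,
    "GE": 35,
    "GE.": 79,
    "GE, CO.": 1,
    "GENERAL ELECTRIC": 1,
    "GENERAL ELECTRIC CO": 21,
    "GENERAL ELECTRIC COR": 1,
    "NA": 1,  # assumed to have been resolved by a reviewer, not by a naming rule
}

# Hypothetical alias table: every variant above maps to the standardized name.
ALIASES = {name: STANDARD_NAME for name in SUBMITTED_COUNTS}


def roll_up(counts: dict, aliases: dict) -> dict:
    """Sum facility counts under each standardized parent name."""
    totals = Counter()
    for submitted_name, facilities in counts.items():
        # Names with no alias are kept unchanged.
        totals[aliases.get(submitted_name, submitted_name)] += facilities
    return dict(totals)


if __name__ == "__main__":
    print(roll_up(SUBMITTED_COUNTS, ALIASES))
```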
FRS Overview
• Facility Registry System
• FRS is a data aggregator
• FRS performs integration, validation, and QA across more than 30 federal databases and more than 50 state, territory, and tribal databases
• FRS contains information on nearly 2.8 million facilities
What FRS Does
• Provides a more complete, robust, holistic view of facility information, facilitating cross-media analyses:
  • Community-based initiatives
  • Environmental justice analyses
  • NEPA assessments
  • Emergency response
  • Other mission needs (TMDL program, climate change analysis, etc.)
• FRS improves program facility data validity from 40% to 95% through QA and by selecting the best contact and location information from multiple data sources
• Allows EPA and the public, academic, and investment communities to evaluate compliance with environmental regulations
U.S. Environmental Protection Agency
Collection Mandates
• Degree of ownership required to be reported to EPA under several statutes
• Where the requirements do not capture the ultimate US-based parent company, we estimate the aggregated LOE and the incremental burden on regulated facilities of regulatory changes that would require them to report their ultimate US-based parent company
Collaborative Ecosystem?
http://opencorporates.com/
• In the US, legal entities are typically created through filings with state governments
Collaborative Ecosystem?
http://sunlightfoundation.com/sixdegrees/
Ecosystem -> Cloud
• Interoperability is needed
Contact:
Dave Smith, Office of Information Collection
Smith.DavidG@epa.gov
202-566-0797
Kevin Donovan, TRI Program
Donovan.Kevin-E@epa.gov
202-566-0676
Interoperability
http://www.heppnetz.de/ontologies/goodrelations/v1
• Some models exist, e.g. the GoodRelations cookbook for e-commerce in the Linked Data space
Possibilities?
• Can agencies, the private sector, and others participate collaboratively, each contributing the data they collect and then consuming it as a whole?
Regulatory Drivers
• Concept: We have information from programs like TRI, which collect corporate / organizational entity names.
• From these programs, we can also populate some additional items, such as regulatory interest (in keeping with the January 18 White House memo), as well as effective dates and other fields which we can derive.
• Can we then ask questions like “What companies are impacted by 40 CFR 372?” or “What regulations impact company ‘X’?”
• We can publish this as Linked Open Data; we could also use it to dynamically update a Wiki page.
• Strawman model for how one might connect an organizational entity to regulatory information, for example to assess cumulative regulatory burden
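The two questions on this slide can be illustrated with a toy set of subject-predicate-object statements, the basic shape of Linked Data. This is a sketch only: the `subjectTo` predicate, the helper names, and every row of data below are illustrative assumptions, not published EPA data or an established vocabulary, and a real deployment would more likely use RDF and SPARQL.

```python
# Illustrative triples: (organizational entity, predicate, regulation).
# All rows are made-up examples for the sketch, not actual regulatory findings.
TRIPLES = [
    ("GENERAL ELECTRIC CO", "subjectTo", "40 CFR 372"),
    ("DOW CHEMICAL CO", "subjectTo", "40 CFR 372"),
    ("DOW CHEMICAL CO", "subjectTo", "EXAMPLE RULE A"),
    ("EXAMPLE HOLDINGS LLC", "subjectTo", "EXAMPLE RULE A"),
]


def companies_impacted_by(regulation: str, triples=TRIPLES) -> list:
    """Answer: what companies are impacted by a given regulation?"""
    return sorted(
        subject
        for subject, predicate, obj in triples
        if predicate == "subjectTo" and obj == regulation
    )


def regulations_impacting(company: str, triples=TRIPLES) -> list:
    """Answer: what regulations impact a given company?"""
    return sorted(
        obj
        for subject, predicate, obj in triples
        if predicate == "subjectTo" and subject == company
    )


if __name__ == "__main__":
    print(companies_impacted_by("40 CFR 372"))
    print(regulations_impacting("DOW CHEMICAL CO"))
```

Counting the regulations returned for one company gives a rough first proxy for the cumulative regulatory burden the strawman model is meant to assess.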
Additional References to Legal Entity
• Examination of statutes in the Code of Federal Regulations (CFR) to find where and how ownership is defined and how it is required to be reported for each statute.