Open DATA METI: All Content As Big Data

Dr. Brand Niemann, Director and Senior Enterprise Architect – Data Scientist, Semantic Community (http://semanticommunity.info/); AOL Government Blogger (http://gov.aol.com/bloggers/brand-niemann/). March 15, 2013.

Presentation Transcript


  1. Open DATA METI: All Content As Big Data. Dr. Brand Niemann, Director and Senior Enterprise Architect – Data Scientist, Semantic Community http://semanticommunity.info/ AOL Government Blogger http://gov.aol.com/bloggers/brand-niemann/ March 15, 2013 http://semanticommunity.info/A_Japan_METI_Open_Data_Dashboard/Open_DATA_METI

  2. Preface: All the work with Data Catalogs does not really help with data integration.
     • Question from Brand Niemann:
        • Does this deal with the data elements themselves in the data sets, so you can search for data elements that you want to integrate with other data elements and find their definitions (metadata) to know if they are the same or similar enough to be semantically integrated?
     • Answer from John Erickson, Director, Web Science Operations, Tetherless World Constellation (RPI):
        • No. DCAT deals with the initial problems of where dataset catalogs and datasets themselves are from and what they contain. Loosely speaking, it does for catalogs and datasets what Dublin Core did for publications: it provides a succinct vocabulary that providers can rely on for describing their datasets, and consumers can rely on for finding. DCAT has already been used as the basis for the schema.org "datasets" extension as a way to make discovery of datasets easier using popular search engines.
        • Articulating the actual vocabularies used in published datasets is waaaay beyond the scope of DCAT, in part because DCAT is not restricted to datasets published as linked data. Some work including http://healthdata.tw.rpi.edu are looking at ways to communicate standard vocabularies used in published linked data...
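
To make the scope of that answer concrete, here is a minimal sketch of a DCAT-style dataset description written as a JSON-LD-like Python dictionary. The property names (dcat:Dataset, dct:title, dcat:landingPage, dcat:distribution, dcat:downloadURL, dcat:mediaType) come from the DCAT vocabulary. The title and the two URLs are the ones that appear on slides 13 and 16 of this deck; the media type and the helper `describes_columns` are assumptions added for illustration. The point is what is absent: nothing in the record describes the data elements (columns) inside the spreadsheet.

```python
import json

# Catalog-level description of one dataset, in the spirit of DCAT.
# The media type value is an illustrative assumption.
dataset = {
    "@context": {
        "dcat": "http://www.w3.org/ns/dcat#",
        "dct": "http://purl.org/dc/terms/",
    },
    "@type": "dcat:Dataset",
    "dct:title": "General Energy Statistics (FY 2011)",
    "dcat:landingPage": "http://datameti.go.jp/data/dataset/statistics_sougouenergy_2011",
    "dcat:distribution": [
        {
            "@type": "dcat:Distribution",
            "dcat:downloadURL": "http://www.enecho.meti.go.jp/info/statistics/jukyu/resource/xls/2011fysokuhou.xls",
            "dcat:mediaType": "application/vnd.ms-excel",
        }
    ],
}


def describes_columns(record):
    """Hypothetical check: does the record mention column-level metadata at all?"""
    return "column" in json.dumps(record).lower()


print(json.dumps(dataset, indent=2))
print("column-level metadata present:", describes_columns(dataset))
```

The record tells a consumer where the file is and what kind of file it is, which is enough for discovery but not for deciding whether two columns in two files mean the same thing.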

  3. Preface: Big Data Spells New Architecture. "The data warehouse does what it does well and is not going to go anywhere. But it is not architected very well for the future. Our job, as IT, revolves entirely around one thing -- data integration." http://www.computerweekly.com/news/2240179544/Big-data-spells-new-architectures

  4. Preface ‘Big Data is the new software’ http://www.forbes.com/sites/jonbruner/2012/04/04/tim-oreilly-on-the-future-of-location-the-guy-with-the-most-data-wins/ http://radar.oreilly.com/2007/12/google-admits-data-is-the-inte.html

  5. Preface: “New Digital Government Strategy is treating all content as data.”
     • Dominic Sale:
        • Introduced as OMB Chief of Data Analytics & Reporting at the Big Data Technology Symposium, March 13, 2013.
        • Said “new Digital Government Strategy is treating all content as data.”
        • Dominic Sale joined OMB’s Office of E-Government and Information Technology in 2008 as a portfolio manager for several government-wide IT initiatives. At OMB, Dominic played a lead role in implementing and operating major initiatives such as the IT Dashboard, and he is currently heavily involved in implementing the Federal CIO’s 25-Point IT Management Reforms. Prior to arriving at OMB, Dominic began his Federal career as a program analyst in the OCIO at the Department of Transportation. In his prior life as a contractor at both BAE Systems and BearingPoint, Dominic managed EA, capital planning and security initiatives at DOL, NLRB, FDA, and Census. He has also worked on a variety of federal programs, at agencies such as the IRS, US Postal Service, US Mint, US Patent and Trademark Office, and the National Park Service.
     http://semanticommunity.info/Big_Data_Symposia#Speaker_Bio_for_Dominic_Sale

  6. My Process • Open DATA METI Web Site to MindTouch Knowledge Base to an Excel Spreadsheet • Open DATA METI Data Set List by File Type to an Excel Spreadsheet • Open DATA METI Data Sets by Metadata to an Excel Spreadsheet • Import the Above (3) and Selected Open DATA METI Data Sets Into Spotfire • Get Visualizations and Beginning of a Unified Big Data Architecture and Ecosystem for Big Data Integration
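
The process above (catalog pages to spreadsheets, then a merged view) can be sketched in a few lines of plain Python. This is not the author's Spotfire workflow, only an illustration of the two operations it implies: tallying the data set list by file type and joining it to the metadata sheet on a shared key. All rows, column names (`dataset_id`, `file_type`, `group`) and values below are hypothetical placeholders; the real spreadsheets may be laid out differently.

```python
from collections import Counter

# Hypothetical rows standing in for Spreadsheet 2 (data set list by file type)
# and Spreadsheet 3 (data sets by metadata). Not real METI data.
dataset_list = [
    {"dataset_id": "ds-a", "file_type": "XLS"},
    {"dataset_id": "ds-a", "file_type": "PDF"},
    {"dataset_id": "ds-b", "file_type": "XLS"},
]
metadata = [
    {"dataset_id": "ds-a", "group": "group-1"},
    {"dataset_id": "ds-b", "group": "group-2"},
]


def count_by(rows, field):
    """Count rows per distinct value of one field (a simple facet)."""
    return Counter(row.get(field) for row in rows)


def merge_on(left, right, key):
    """Left join: copy the matching right-hand fields onto each left-hand row."""
    index = {row[key]: row for row in right}
    return [{**row, **index.get(row[key], {})} for row in left]


merged = merge_on(dataset_list, metadata, "dataset_id")
file_type_counts = count_by(merged, "file_type")
group_counts = count_by(merged, "group")

print(merged)
print(file_type_counts)
print(group_counts)
```

Once the catalog is in this tabular form, every metadata field becomes something that can be counted and filtered, which is what the merged spreadsheet provides to a visualization tool.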

  7. Open DATA METI: WordPress & CKAN About DATA METI: Home Terms of use Privacy Policy Notation of credit Partners leverage DATA METI Inquiry API API Documentation Section: Tag Statistics Revision Site administrator http://datameti.go.jp/

  8. Open DATA METI: MindTouch Knowledge Base with Well-Defined URLs http://semanticommunity.info/A_Japan_METI_Open_Data_Dashboard/Open_DATA_METI

  9. Open DATA METI: Excel Spreadsheet 1: Knowledge Base http://semanticommunity.info/@api/deki/files/21577/METI2013.xlsx

  10. Open DATA METI: Data Set List Drill Down on These 19 http://datameti.go.jp/data/

  11. Open DATA METI: Excel Spreadsheet 2: Data Set List http://semanticommunity.info/@api/deki/files/21577/METI2013.xlsx

  12. Open DATA METI: Comprehensive Energy Statistics http://datameti.go.jp/data/group/statistics_sougouenergy

  13. Open DATA METI: General Energy Statistics (FY 2011). Some Have Lots of Files. Source of Data. http://datameti.go.jp/data/dataset/statistics_sougouenergy_2011

  14. Open DATA METI: Source http://www.enecho.meti.go.jp/info/statistics/jukyu/index.htm

  15. Open DATA METI: Link to Excel Spreadsheet. Link to Spreadsheet. My Comment: This is too many clicks to get to the actual data! http://datameti.go.jp/data/dataset/statistics_sougouenergy_2011/resource/b707e1d2-bd3d-483a-ab83-65e081c6daab
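
Since the site runs on CKAN and slide 7 lists an API, the click path could in principle be shortened by asking the catalog for a data set's resource links directly. The sketch below uses CKAN's `package_show` action, which returns a JSON object with `success` and a `result` containing a `resources` list. The API base path on datameti.go.jp is an assumption (it was not verified and may differ by CKAN version), the function names are hypothetical, and the canned `sample` response is hand-built from the URLs on the slides rather than fetched from the server.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed API base; the actual path on datameti.go.jp may differ.
API_BASE = "http://datameti.go.jp/data/api/3/action"


def package_show_url(dataset_id, api_base=API_BASE):
    """Build the package_show request URL for one data set."""
    return f"{api_base}/package_show?{urlencode({'id': dataset_id})}"


def resource_links(response):
    """Extract (format, url) pairs from a package_show-style response dict."""
    if not response.get("success"):
        return []
    resources = response.get("result", {}).get("resources", [])
    return [(r.get("format", ""), r.get("url", "")) for r in resources]


def fetch_resource_links(dataset_id):
    """Network call; defined for completeness but not executed in this sketch."""
    with urlopen(package_show_url(dataset_id)) as resp:
        return resource_links(json.load(resp))


# Hand-built response shaped like package_show output, using the slide URLs.
sample = {
    "success": True,
    "result": {
        "resources": [
            {
                "format": "XLS",
                "url": "http://www.enecho.meti.go.jp/info/statistics/jukyu/resource/xls/2011fysokuhou.xls",
            }
        ]
    },
}

print(package_show_url("statistics_sougouenergy_2011"))
print(resource_links(sample))
```

One request per data set replaces the group page, data set page, and resource page that a browser user has to click through.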

  16. Open DATA METI: Excel Spreadsheet http://www.enecho.meti.go.jp/info/statistics/jukyu/resource/xls/2011fysokuhou.xls

  17. Open DATA METI: Excel Spreadsheet in Spotfire. Needs reformatting and language translation. Beginning of a Unified Data Architecture and Ecosystem for Data Integration using the View Data function in Spotfire 5. https://silverspotfire.tibco.com/us/library#/users/bniemann/Public?AOpenDATAMETI-Spotfire.dxp
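
The "reformatting and language translation" step for column headers can be approximated with a lookup table applied before the data is loaded into an analysis tool. This is a minimal sketch under stated assumptions: the header map is hypothetical (the actual headers in 2011fysokuhou.xls were not checked), and `translate_headers` is an illustrative helper, not part of Spotfire or any library. It strips all whitespace, including the full-width spaces often used to pad Japanese headers, and leaves unknown headers untranslated.

```python
# Hypothetical Japanese-to-English header map; the real spreadsheet's
# column headers may differ from these examples.
HEADER_MAP = {
    "石炭": "Coal",
    "原油": "Crude oil",
    "天然ガス": "Natural gas",
    "電力": "Electricity",
}


def translate_headers(headers, mapping=HEADER_MAP):
    """Strip whitespace from each header, then translate the ones we know.

    Unknown headers are returned whitespace-stripped but otherwise as-is,
    so nothing is silently dropped.
    """
    cleaned = ["".join(h.split()) for h in headers]
    return [mapping.get(h, h) for h in cleaned]


# "石　炭" contains a full-width space (U+3000); "合計" is not in the map.
print(translate_headers(["石　炭", " 原油", "合計"]))
```

Keeping the map in one place also gives a record of which data elements were matched across files, which is the kind of element-level metadata the catalog itself does not carry.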

  18. Open DATA METI: Excel Spreadsheet 3: Data Sets Metadata http://semanticommunity.info/@api/deki/files/21577/METI2013.xlsx

  19. Open DATA METI: Excel Spreadsheets 1-3 in Spotfire https://silverspotfire.tibco.com/us/library#/users/bniemann/Public?AOpenDATAMETI-Spotfire.dxp

  20. Open DATA METI: Excel Spreadsheet 4: Merged Data Sets http://semanticommunity.info/@api/deki/files/21577/METI2013.xlsx

  21. Open DATA METI: Merged Data Sets in Spotfire https://silverspotfire.tibco.com/us/library#/users/bniemann/Public?AOpenDATAMETI-Spotfire.dxp

  22. Summary
     • Preface:
        • All the work with Data Catalogs does not really help with data integration.
        • Big Data Spells New Architecture.
        • Big Data is the new software.
        • New Digital Government Strategy is treating all content as data.
     • The Open DATA METI Data Catalog has been turned into data in spreadsheets and statistical visualizations in Spotfire.
     • This simplifies the complex WordPress & CKAN interface, which requires lots of extra mouse clicks and provides no faceted search.
     • Google Chrome provides Japanese language translation of the metadata, but not of the data columns in the spreadsheets.
     • This process provides the beginning of a Unified Data Architecture and Ecosystem for Data Integration using the View Data function in Spotfire 5.
