
3 Round Stones: All Content As Big Data

3 Round Stones: All Content As Big Data. Dr. Brand Niemann Director and Senior Enterprise Architect – Data Scientist Semantic Community http://semanticommunity.info/ AOL Government Blogger http://gov.aol.com/bloggers/brand-niemann/ March 15, 2013 http://semanticommunity.info/3_Round_Stones.


Presentation Transcript


  1. 3 Round Stones: All Content As Big Data Dr. Brand Niemann Director and Senior Enterprise Architect – Data Scientist Semantic Community http://semanticommunity.info/ AOL Government Blogger http://gov.aol.com/bloggers/brand-niemann/ March 15, 2013 http://semanticommunity.info/3_Round_Stones

  2. Awarded Top Semantic Technology Startup http://semanticweb.com/3-round-stones-named-%E2%80%9Ctop-semantic-technology-start-up%E2%80%9D-at-semantic-tech-business-conference_b29646

  3. Linked Data Book by David Wood, et al. http://www.meetup.com/Northern-Virginia-Semantic-Web-Meetup/events/104544852/

  4. Current US Government Semantic Web Strategy • Data.gov Advocates RDFa 1.1 Lite for Semantic Web Strategy. • See Comment From Owen Ambur on Next Slide. • I believe there is a better way to handle this, which I showed the W3C eGov Special Interest Group on January 21st and have recommended for the reintroduction of the Data Act to the 113th Congress. • Create a Semantic Index of Strong Relationships (SR) in RDF Format in a Spreadsheet. • See next slide for example (spreadsheet and words) • Integrate That With Other Spreadsheets and Relational Databases in An Interoperability Interface (e.g. Dashboard) That Can Be Searched. • Essentially: • Computer Scientists Use RD2RDF (James Hendler) • Data Scientists Use SR2Excel2RDF (Brand Niemann)

  5. Comment From Owen Ambur • OMB's official guidance to agencies on implementation of section 10 of the GPRA Modernization Act (GPRAMA) says they may use XML, JSON, spreadsheets or CSVs in order to meet the requirement to publish their strategic and performance plans and reports in machine-readable format... but not PDF or HTML -- at least not without "enhanced structural elements".[1] • I couldn't help but chuckle at how [1] is a PDF. I get your point, however, which I think reinforces mine, that there is no US federal policy that prefers RDFa 1.1 over HTML Microdata for publishing metadata in HTML. • [1] RDFa Lite 1.1, W3C Recommendation, June 7, 2012, Manu Sporny, editor, see http://www.w3.org/TR/rdfa-lite/ • Source: Owen Ambur, December 18, 2012, W3C eGov Mailing List. My Note: Former Co-Chair of the Federal XML Working Group.

  6. International Linked Open Data Strategy: Linked Open Data Cloud Data My Question: Is it easy to add columns for who links to whom? Answer: Not in a single table. SPARQL can't do cross-tabulation (Richard Cyganiak). http://semanticommunity.info/@api/deki/files/8824/=VIVO.xlsx
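A minimal sketch of how the "who links to whom" cross-tabulation could be built outside the query, once link pairs have been exported as rows. The dataset names and link counts below are invented for illustration and are not taken from the VIVO spreadsheet; `cross_tabulate` is a hypothetical helper, not part of any tool named in the slides.

```python
from collections import defaultdict

# Hypothetical link counts between datasets: (source, target, number_of_links).
# These values are made up; a real export would supply them.
links = [
    ("DBpedia", "GeoNames", 3),
    ("DBpedia", "VIVO", 1),
    ("VIVO", "DBpedia", 2),
]

def cross_tabulate(rows):
    """Pivot (source, target, count) rows into a source-by-target matrix."""
    # Every dataset appears as both a row and a column, in sorted order.
    names = sorted({name for source, target, _ in rows for name in (source, target)})
    counts = defaultdict(int)
    for source, target, count in rows:
        counts[(source, target)] += count
    # First row is the header; each following row is one source dataset.
    table = [["source"] + names]
    for source in names:
        table.append([source] + [counts[(source, target)] for target in names])
    return table

for row in cross_tabulate(links):
    print("\t".join(str(cell) for cell in row))
```

The result is one table with a column per target dataset, which can be pasted into a spreadsheet next to the original link list.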

  7. International Linked Open Data: Comments to David Wood • The Linked Open Data Cloud is not actually “linked data”. • RDF at Data.gov is not linked data. • The analytical and statistical communities view Data.gov and Linked Open Data as “IT projects”. • Former Census Bureau Director Robert Groves. • Conventional tools can do linked data and data integration. • Spotfire Information Designer, Informatica, Information Builders, etc. http://manning.com/dwood/LinkedData_MEAP_ch1.pdf http://semanticommunity.info/AOL_Government/Exploiting_Linked_Data_with_BI_Tools

  8. Our Semantic Web Strategy for Data: Simple Explanation • One Table: • Two Columns • Example: Column 1: Section and Column 2: URL • Note: A Column 3: Description could be in the URL • Example: See Slide 18 • Three Columns: • Example: Column 1: Subject, Column 2: Object, and Column 3: Predicate • Note: This is the Semantic Web’s Linked Open Data Cloud as Linked Open Data for Network Analytics! • Example: See Slide 18 • Four Columns: • Examples: Column 1: Subject, Column 2: Attribute, Column 3: From, and Column 4: To, or Column 1: City, Column 2: Country, Column 3: Longitude, and Column 4: Latitude • Note: This is the format for Spotfire’s Network Analytics Module developed for the CIA • Example: See Next Slide and Semantic Medline
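As a rough illustration of the three-column case, the sketch below reads a spreadsheet export in the slide's column order (Subject, Object, Predicate) and writes each row as an N-Triples line, reordering to subject, predicate, object. The URIs, the `term` and `rows_to_ntriples` helpers, and the sample rows are assumptions made for this example; it is not the author's SR2Excel2RDF process, and it only escapes backslashes and double quotes in literals.

```python
import csv
import io

# Hypothetical spreadsheet export, using the slide's column order:
# Subject, Object, Predicate. All example.org URIs are made up for illustration.
SHEET = """Subject,Object,Predicate
http://example.org/plant/1,http://example.org/state/AR,http://example.org/vocab/locatedIn
http://example.org/plant/1,Arkansas Nuclear One,http://www.w3.org/2000/01/rdf-schema#label
"""

def term(value):
    """Render a cell as an IRI if it looks like one, otherwise as a plain literal."""
    if value.startswith(("http://", "https://")):
        return f"<{value}>"
    # Minimal escaping only: backslash and double quote.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'

def rows_to_ntriples(text):
    """Turn Subject/Object/Predicate rows into N-Triples lines (subject predicate object .)."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        f"<{row['Subject']}> <{row['Predicate']}> {term(row['Object'])} ."
        for row in reader
    ]

for line in rows_to_ntriples(SHEET):
    print(line)
```

Keeping the table as the source of truth and generating RDF from it means the same rows can feed both a spreadsheet tool and a triple store.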

  9. Our Semantic Web Strategy for Data: Spotfire Network Analytics http://semanticommunity.info/AOL_Government/Social_Media_-_Six_Degrees_of_Separation_and_Now_Even_Less

  10. Edge and Node Tables To create a new network visualization it is necessary to provide an edge data table. It is optional to add a node data table since the application can generate a node table from your edge table as soon as you have made the necessary settings for the edges. The edge table must contain at least two columns, but usually more than two columns are needed for the network graph to give any useful insight into the data. The table should also contain a meaningful relation between the columns. For example, persons travelling to or from cities, or friendship relationships.
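The idea of generating a node table from an edge table can be sketched as follows. This is an illustration of the concept only, not how Spotfire does it internally; the travel rows, the column names other than From and To, and the `node_table` helper are hypothetical.

```python
# Hypothetical edge table: people travelling from one city to another.
edges = [
    {"Subject": "Alice", "Attribute": "travels", "From": "Paris", "To": "Oslo"},
    {"Subject": "Bob", "Attribute": "travels", "From": "Oslo", "To": "Rome"},
    {"Subject": "Carol", "Attribute": "travels", "From": "Paris", "To": "Rome"},
]

def node_table(edge_rows, source="From", target="To"):
    """Derive one node row per distinct endpoint, with its in/out edge counts."""
    nodes = {}
    for row in edge_rows:
        # The source endpoint gains an outgoing edge, the target an incoming one.
        for column, key in ((source, "out_degree"), (target, "in_degree")):
            node = nodes.setdefault(
                row[column], {"node": row[column], "out_degree": 0, "in_degree": 0}
            )
            node[key] += 1
    return sorted(nodes.values(), key=lambda n: n["node"])

for node in node_table(edges):
    print(node)
```

Because the node table is derived, only the edge table has to be maintained by hand; extra node attributes (for example longitude and latitude) could be joined on afterwards.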

  11. My Process • Linked Data Web Sites to MindTouch Knowledge Base and to an Excel Spreadsheet • Linked Data Nuclear Power Plants Demo Application to MindTouch Knowledge Base and to an Excel Spreadsheet • Other Nuclear Power Plant Data Sources (2) to an Excel Spreadsheet • Import the Above (5) Into Spotfire • Get Visualizations and Beginning of a Unified Big Data Architecture and Ecosystem for Big Data Integration

  12. Linked Data Book Web Site http://manning.com/dwood/ and http://manning.com/dwood/LinkedData_MEAP_ch1.pdf

  13. Linked Data Book in MindTouch My Note: Every Section, Figure, and Code Listing Has a well-defined URL! http://semanticommunity.info/3_Round_Stones#Book

  14. Knowledge Base Attachments My Note: This is similar to Callimachus attachments. http://semanticommunity.info/3_Round_Stones

  15. Callimachus Linked Open Data Demonstrations http://demo.3roundstones.net/rdf/2012/nuclear/schema/index.xhtml?view

  16. Callimachus jQuery Data Tables Example of Nuclear Power Plants http://demo.3roundstones.net/rdf/2012/datatable/index.xhtml?view

  17. Arkansas Nuclear One http://demo.3roundstones.net/diverted;http://usepa.3roundstones.net/facilities/110028034721?view

  18. Knowledge Base in MindTouch to Excel Spreadsheet Entity Extraction in Progress From MindTouch Mashup to Excel Spreadsheet in Triple Format – Recall Slide 8 – to Build Strong Relationships. http://semanticommunity.info/@api/deki/files/23420/3RoundStonesLODDemos.xlsx

  19. Use Other Nuclear Power Plant Data Sources Data.gov: Appa (Operating Rx- data.gov).xls PowerReactorStatusForLast365Days.xls http://www.nrc.gov/info-finder/reactor/ano1.html

  20. 3 Round Stones: Five Excel Spreadsheets in Spotfire My Note: See Beginning of Unified Data Architecture & Ecosystem Also Photo Images Linked Data. https://silverspotfire.tibco.com/ViewAnalysis.aspx?file=/users/bniemann/Public/3RoundStones-Spotfire

  21. Summary • The New Digital Government Strategy of treating all content as data has been applied to the 3 Round Stones Web content and Callimachus Demo. • The Callimachus Demo has been turned into data in spreadsheets and statistical visualizations in Spotfire 5. • This simplifies the complex Callimachus interface, which requires lots of extra mouse clicks and provides no faceted search. • There are other nuclear power plant data and metadata sources that should be, and have been, included. • This process provides the beginning of a Unified Data Architecture and Ecosystem for Data Integration using the View Data function in Spotfire 5.

  22. Post Meetup Comments • US EPA’s data problems are systemic and not technological (I know because I was there for 30 years and was their first data architect and data scientist). • I have produced over 50 EPA Data Science Products and used Spotfire 5 to integrate 30 or so of EPA’s major data sets for the 2011 EPA Apps for the Environment Challenge. • I helped design Data.gov, implemented a more semantic version while on detail to them, and helped the Japan METI start Open Government Data. • Be Informed is the most advanced semantic technology (ontology & rules) in the world, but they do not call it that for business reasons. • Semantic Medline is the “killer semantic web app” for the Federal Government that our Data Science Team is moving to the new Cray Graph Computer. • At the Health Datapalooza 2012, Dr. Bill Frist (Eminent Heart Surgeon and Former Senate Majority Leader) described the exciting work that he is involved in to improve the outcomes of heart transplant surgery by individualizing the treatment of patients who reject the normal organ transplant medications due to genetic factors. • I volunteer to show how to make 3 Round Stones: All Content As Big Data using the new Digital Government Strategy and our Semantic Web Strategy for Data.
