
Linked Open Data and Next Generation Science

Linked Open Data and Next Generation Science. Deborah L. McGuinness Tetherless World Senior Constellation Chair Professor of Computer and Cognitive Science Rensselaer Polytechnic Institute, Troy, NY & CEO McGuinness Associates, Latham, NY.


Presentation Transcript


  1. Linked Open Data and Next Generation Science Deborah L. McGuinness Tetherless World Senior Constellation Chair Professor of Computer and Cognitive Science Rensselaer Polytechnic Institute, Troy, NY & CEO McGuinness Associates, Latham, NY Earth System Information Partners, Madison Wisconsin, July 18, 2012

  2. Background I • Access to data is exploding, with open government data and numerous agencies publishing data and providing service access, or at least FOIA access • Citizen interest and contributions are increasing – data gathering (e.g., bird observations), reviewing (e.g., Galaxy Zoo), compute cycles (e.g., SETI), … • Arguably, more large science problems (large in both data volume and breadth of subject area) need addressing – these go beyond what a single research team can easily solve

  3. Background II • Semantic Technologies – technological support for encoding meaning in a form computers can understand and manipulate – are maturing and increasing in usage • Computational encodings of meaning can be used to help integrate, link, validate, filter, … essentially to make smarter, more context-aware applications • Semantic Technologies enable linking data … and linked data provides a way of connecting and traversing information, nodes, graphs, webs, …

  4. Take Home Message (early) • Linked Data is usable now by any project • Linked Data and Semantic Technologies can help in forming and connecting large, distributed, evolving efforts such as many earth and space science projects • In the rest of the talk: • Brief intro to Linked Data and Semantic Technologies through examples • Discussion about what we might do now and strive for in the future

  5. Linked Data • Linked Data is quite simple and follows principles set out by Berners-Lee in http://www.w3.org/DesignIssues/LinkedData.html • Use URIs as names for things • Use HTTP URIs so that people can look up those names • When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL) • Include links to other URIs so that they can discover more things • Introduction by examples and then discussion
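
The four principles on this slide can be illustrated with a small sketch. It is not taken from the talk: the `example.org` URIs, the `WEB` dictionary (a stand-in for real HTTP lookups), and the function names are all assumptions made for illustration.

```python
# Hypothetical sketch of the Linked Data principles, with no network access.
# WEB stands in for the real Web: each HTTP URI maps to the triples a server
# might return when that URI is looked up. All URIs and facts are invented.
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
LOCATED_IN = "http://example.org/vocab/locatedIn"  # assumed property

WEB = {
    "http://example.org/id/troy": [
        ("http://example.org/id/troy", RDFS_LABEL, "Troy"),
        ("http://example.org/id/troy", LOCATED_IN, "http://example.org/id/new-york"),
    ],
    "http://example.org/id/new-york": [
        ("http://example.org/id/new-york", RDFS_LABEL, "New York"),
    ],
}


def look_up(uri):
    """Stand-in for an HTTP GET that returns RDF triples about `uri`."""
    return WEB.get(uri, [])


def follow_links(start, max_hops=2):
    """Collect labels by following links from `start`, up to `max_hops` hops."""
    seen = {start}
    frontier = [start]
    labels = {}
    for _ in range(max_hops + 1):
        next_frontier = []
        for uri in frontier:
            for subject, predicate, obj in look_up(uri):
                if predicate == RDFS_LABEL:
                    labels[subject] = obj
                elif obj.startswith("http://") and obj not in seen:
                    # The object is itself a URI: a link to discover more things.
                    seen.add(obj)
                    next_frontier.append(obj)
        frontier = next_frontier
    return labels
```

Calling `follow_links("http://example.org/id/troy")` starts from one name and discovers the linked resource by lookup alone, which is the point of the last principle.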

  6. Population Sciences Grid Goals • Convey complex health-related information to consumer and public health decision makers for community health impact • Inform the development of future research opportunities effectively utilizing cyberinfrastructure for cancer prevention and control. McGuinness, D., Shaikh, A., Lebo, T., Ding, L., Courtney, P., McCusker, J., Moser,. Morgan, G.D., Tatalovich, Z., Willis, G., Contractor, N., and Hesse, B. 2012. Towards Semantically-Enabled Next Generation Community Health Information Portals: The PopSciGrid Pilot. In Proceedings of Hawaii International Conference on System Sciences 2012

  7. Semantic Web Perspective on Initial Project Goals • How can semantic technologies be used to integrate, present, and analyze data for a wide range of users? • Can tools allow lay people to build their own demos and support public usage and accurate interpretation? • How do we facilitate collaboration and “viral” applications? • Within PopSciGrid: • Which policies (taxation, smoking bans, etc.) are correlated with health and health care costs? • What data should be displayed to help scientists and lay people evaluate related questions? • What data might be presented so that people choose to make (positive) behavior changes? • What does the data show? Why should someone believe that? • What are appropriate follow-up questions to support actionability?

  8. What is an Ontology? A spectrum of increasing formality: Catalog/ID → Terms/glossary → Thesauri (“narrower term” relation) → Informal is-a → Formal is-a → Formal instance → Frames (properties) → Value restrictions → Disjointness, inverse, part-of, … → General logical constraints. Sources: Ontologies Come of Age, McGuinness, 2001; from AAAI Panel 99 – McGuinness, Welty, Uschold, Gruninger, Lehmann; plus basis of Ontologies Come of Age – McGuinness, 2003
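
To make two of the labels on this slide concrete, "formal is-a" and "disjointness", here is a minimal sketch. The class names and the dictionary encoding are assumptions chosen for illustration. A real ontology would be written in a language such as OWL rather than in Python dictionaries.

```python
# Hypothetical toy ontology: a strict subclass hierarchy plus one disjointness
# axiom. Class names are invented for illustration only.
SUBCLASS_OF = {
    "Riesling": "WhiteWine",
    "WhiteWine": "Wine",
    "RedWine": "Wine",
    "Wine": "Beverage",
}
DISJOINT = {frozenset({"WhiteWine", "RedWine"})}


def ancestors(cls):
    """Return every superclass of `cls`, nearest first (is-a is transitive)."""
    found = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        found.append(cls)
    return found


def is_a(cls, other):
    """Formal is-a: `cls` is `other` or one of its transitive subclasses."""
    return cls == other or other in ancestors(cls)


def consistent(asserted_types):
    """Return False if the asserted types imply membership in disjoint classes."""
    implied = set()
    for cls in asserted_types:
        implied.add(cls)
        implied.update(ancestors(cls))
    return not any(pair <= implied for pair in DISJOINT)
```

With a formal is-a relation, `is_a("Riesling", "Beverage")` follows by inference rather than by lookup. With disjointness, asserting that something is both a `Riesling` and a `RedWine` can be flagged as a contradiction.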

  9. Inference Web: Making Data Transparent and Actionable Using Semantic Technologies • How and when does it make sense to use smart system results & how do we interact with them? Application areas: (Mobile) Intelligent Agents; Knowledge Provenance in Virtual Observatories; NSF Interops: SONET, SSIII – Sea Ice; Intelligence Analyst Tools; Hypothesis Investigation / Policy Advisors

  10. Foundations: Web Layer Cake Visualization APIs S2S Govt Data Inference Web, Proof Markup Language, W3C Provenance Working group formal model, W3C incubator group, … Inference Web IW Trust, Air + Trust DL, KIF, CL, N3Logic Ontology repositories (Ontolingua), Ontology Evolution env: Chimaera, Semantic eScience Ontologies, MANY other ontologies OWL 1 & 2 WG Edited main OWL Docs, quick reference, OWL profiles (OWL RL), Earlier languages: DAML, DAML+OIL, Classic RIF WG AIR accountability tool SPARQL WG, earlier QL – OWL-QL, Classic’ QL, … Govt metadata search Linked Open Govt Data SPARQL to XQuery translator RDFS materialization (Billion triple winner) Transparent Accountable Datamining Initiative (TAMI)

  11. Foundations: Linked Data Cloud

  12. PopSciGrid Workflow [Figure: workflow diagram. Labels: data sources Ban coverage and CHSI 2009; steps archive, CSV2RDF4LOD Direct, CSV2RDF4LOD Enhance, SemDiff, integrate, publish, visualize; edges labeled “derive”.]
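
The "Direct" conversion step named in this workflow turns tabular data into RDF. The sketch below shows the general idea only and does not reproduce the CSV2RDF4LOD tool. The `BASE` namespace, the URI layout, the function names, and the placeholder rows are all assumptions.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical namespace and placeholder rows; neither comes from the talk.
BASE = "http://example.org/popscigrid/"
SAMPLE_CSV = "state,year,value\nAA,2000,1\nBB,2001,2\n"


def literal(value):
    """Format a plain string as an N-Triples literal, escaping special characters."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def csv_to_ntriples(text, dataset="ban-coverage"):
    """Naive conversion: one subject per row, one predicate per column."""
    lines = []
    reader = csv.DictReader(io.StringIO(text))
    for row_number, row in enumerate(reader, start=1):
        subject = f"<{BASE}{dataset}/row/{row_number}>"
        for column, value in row.items():
            predicate = f"<{BASE}vocab/raw/{quote(column)}>"
            lines.append(f"{subject} {predicate} {literal(value)} .")
    return lines
```

Every cell becomes a string literal here. An "Enhance" pass would presumably add datatypes, shared vocabularies, and links to other datasets, which is what makes the result useful as linked data rather than merely RDF-shaped.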

  13. PopSciGrid Example State View Extensible Mashups via Linked Data • Diverse datasets from NIH • Potentially linking to other content (e.g. “unemployment rate”) Accountable Mashups via Provenance • Annotate datasets used in demos • Feed users’ comments back to gov contact (e.g. %) • Annotation capabilities coming (and more)

  14. PopSciGrid II

  15. Reflections Successful but…. • What if we could allow data experts to build their own demos? • What if we could allow non-subject matter experts to function as subject-literate staff? • What if team members could interchange roles (and thus make contributions in other areas)? • What technological infrastructure is required? • Claim: all of this is being done now – and it is starting to scale and growing more accessible

  16. Updates and Motivations from a Computer Science Perspective Old: • Raw conversions • Per-dataset vocabularies • Custom queries • Custom data management code • Limited use because of Google Visualization licenses • State-level data New: • Enhanced conversions • Vocabulary reuse • Generic queries • Re-usable data management code • Unlimited use of new open source visualization toolkit • State and county-level data

  17. County average life expectancy (Summary Measures of Health)

  18. Why Did I Show a Population Science Project and a Water Project? Questions and goals are similar: • What’s happening with x? – health of a country, water quality and other parts of an ecosystem, climate changes • What intervention strategies are being tested? • What policies are correlated with factors under investigation? • And: why should people believe the outcome?

  19. What’s happening with the climate and how will it affect the U.S.? National Climate Assessment 2013 30 chapters, 240 authors A “Highly Influential Scientific Assessment” Why should I believe it? GCIS presenting the provenance of the report itself, the key messages of the report, including traceable accounts of the >500 technical inputs from reports, papers, models, datasets, observations, etc. See Global Change Provenance Representation in the Global Change Information System (GCIS) Curt.Tilmes@nasa.gov

  20. SemantEco/SemantAqua • Enable/Empower citizens & scientists to explore pollution sites, facilities, regulations, and health impacts along with provenance. • Demonstrates semantic monitoring possibilities. • Map presentation of analysis • Explanations and Provenance available. http://was.tw.rpi.edu/swqp/map.html and http://aquarius.tw.rpi.edu/projects/semantaqua Screenshot callouts: • Map view of analyzed results • Explanation of pollution • Possible health effect of contaminant (from EPA) • Filtering by facet to select type of data • Link for reporting problems • Now joint with USGS resource managers; expanded to endangered species; now more virtual observatory style
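
The monitoring idea on this slide, flagging pollution against regulations while keeping explanations and provenance, can be sketched as follows. This is an illustrative assumption and not the SemantEco implementation. The class names, field names, and any values passed in are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Regulation:
    """A hypothetical contaminant limit together with where it came from."""
    contaminant: str
    limit_mg_per_l: float
    source: str


@dataclass(frozen=True)
class Measurement:
    """A hypothetical water measurement together with where it came from."""
    site: str
    contaminant: str
    value_mg_per_l: float
    source: str


def find_violations(measurements, regulations):
    """Return one finding per measurement that exceeds its regulatory limit.

    Each finding carries an explanation and the sources of both the
    measurement and the rule, so a reader can ask "why should I believe this?".
    """
    limits = {rule.contaminant: rule for rule in regulations}
    findings = []
    for m in measurements:
        rule = limits.get(m.contaminant)
        if rule is not None and m.value_mg_per_l > rule.limit_mg_per_l:
            findings.append({
                "site": m.site,
                "contaminant": m.contaminant,
                "explanation": (
                    f"{m.value_mg_per_l} mg/L exceeds limit "
                    f"{rule.limit_mg_per_l} mg/L"
                ),
                "provenance": [m.source, rule.source],
            })
    return findings
```

In a semantic implementation the threshold test would more likely be expressed as an ontology class definition and evaluated by a reasoner. The explicit loop here only shows what such a classification computes.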

  21. System Architecture Virtuoso access

  22. Semantic Web Methodology Originally developed for VSTO, now in SSIII, SESDI, SESF, OOI … McGuinness, Fox, West, Garcia, Cinquini, Benedict, Middleton. The Virtual Solar-Terrestrial Observatory: A Deployed Semantic Web Application Case Study for Scientific Research. Proc. 19th Conf. on Innovative Applications of Artificial Intelligence (IAAI-07), http://www.vsto.org

  23. Reflections • What began as semantic water quality monitoring is now SemantEco – ecological and environmental monitoring in support of ecosystem analysis • Now includes endangered species and related health impacts; working with USGS to prototype a resource manager dashboard • Expanding to include citizen science reporting on water on mobile platforms • Now working with SONet, Santa Barbara County LTER, and CUAHSI to integrate other related scientific observations • Current focus use case: ecological researcher • Find relevant data (within and outside DataONE) by region, timeframe, chemical, measurement dimension, species • Currently the background ontology is relatively simple and aims more at discovery and integration • Semantic Sea Ice project aimed at helping arctic ice researchers find and evaluate data in support of understanding the state of ice in the arctic • These technologies span the spectrum of supporting discovery, integration, analysis, and ultimately prediction

  24. Discussion • Semantic Technologies and Linked Data are powering a wide array of applications – many in Big Science, Team Science, or at least interdisciplinary science • Tools and methodologies are ready for use • We love to partner in these areas • What do you need or want from linked data and semantic technologies? Questions? - dlm @ cs . rpi . edu

  25. Extra

  26. RDF Data Cube Vocabulary • For publishing multi-dimensional data, such as statistics, on the web in such a way that it can be linked to related data sets and concepts using RDF • Compatible with the cube model that underlies SDMX (Statistical Data and Metadata eXchange) • Also compatible with: • SKOS, SCOVO, VoID, FOAF, Dublin Core Terms • Integrated with the LOGD data conversion infrastructure • Integrated with other tooling like Stats2RDF
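
As a rough sketch of what a single Data Cube observation looks like as triples, consider the helper below. The `qb:` namespace, `qb:Observation`, and `qb:dataSet` come from the vocabulary itself. The `example.org` namespace, the URI layout, the dimension and measure names, and the function name are assumptions. A complete cube would also declare a `qb:DataSet` with a data structure definition, which is omitted here.

```python
# Terms from the RDF Data Cube vocabulary and RDF itself.
QB = "http://purl.org/linked-data/cube#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical namespace for the dataset's own dimensions, measures, and codes.
EX = "http://example.org/stats/"


def observation_triples(obs_id, dataset_id, dimensions, measure, value):
    """Build the triples for one observation: its type, dataset, dimensions, measure."""
    obs = f"{EX}obs/{obs_id}"
    triples = [
        (obs, RDF_TYPE, f"{QB}Observation"),
        (obs, f"{QB}dataSet", f"{EX}dataset/{dataset_id}"),
    ]
    # Each dimension locates the observation within the cube.
    for name, member in sorted(dimensions.items()):
        triples.append((obs, f"{EX}dimension/{name}", f"{EX}code/{member}"))
    # The measure carries the observed value itself.
    triples.append((obs, f"{EX}measure/{measure}", value))
    return triples
```

For example, `observation_triples("o1", "life-expectancy", {"area": "county-a", "period": "2000"}, "years", 1.0)` yields five triples, with placeholder values throughout. Because the dimension values are URIs and not strings, they can be linked to related data sets and concepts, as the slide describes.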

  27. Foundations: The Tetherless World Constellation Linked Open Government Data Portal TWC LOGD Convert Query/ Access LOGD SPARQL Endpoint Community Portal • RDF • RSS • JSON • XML • HTML • CSV • … Create Enhance Data.gov deployment

  28. Directions • Incorporation of TWC data Quality Facts label (Zednik et al) • Use of DataFAQs automated data quality framework (Lebo et al) • Additional provenance inclusion / usage (Inference / Provenance Web) • Annotation / Collaboration facilities (Michaelis et al) • Other data sets? Or exposition of other parameters? • Partners in additional topic areas

  29. Enabling Subject Area Exploration and Hypothesis Generation • What factors influence prevalence (and under what conditions)? • Within smoking, should we focus on prevalence, packs sold, quit rate, hospital admission diagnosis, other? • What is prevalence (definition)? And how is it measured (overall / in this data set)? • What are the conditions under which the data was obtained (date, sample set, extenuating conditions, …)? • What other data might we include? And how might we show that data? • What should be represented? And how should it be manipulated? • What tools and services do people benefit from to explore? Encode? Act?

  30. Semantic Advisor … providing insights to similar situations in travel. Semantically-enabled advisors utilize: • Ontologies • Reasoning • Social • Mobile • Provenance • Context. Patton, McGuinness, et al. tw.rpi.edu/web/project/Wineagent

  31. Semantic Sommelier • Previous versions used ontologies to infer descriptions of wines for meals and query for wines • New version uses • Context: GPS location, local restaurants and wine lists, user preferences • Social input: Twitter, Facebook, Wiki, mobile, … • Sources vary in quality, and contradictions exist • Maintenance is an issue … however, new models are emerging
