
Getting Smart About Data: Better Stewards for Virtual Observatories

This presentation explores the importance of data stewardship in virtual observatories and the need for integration, transparency, and collaboration. Prof. Peter Fox discusses the challenges and solutions in accessing and sharing scientific data worldwide.


Presentation Transcript


  1. Why We Need To Get Smart About Data To Be Better Stewards: Making Smarter Virtual Observatories • Prof. Peter Fox (pfox@cs.rpi.edu, @taswegian, #twcrpi), Tetherless World Constellation Chair, Earth and Environmental Science / Computer Science / Cognitive Science / IT and Web Science, Rensselaer Polytechnic Institute, Troy, NY USA, and the Deep Carbon Observatory Data Science Team • IGARSS 2015, Milan, Italy, July 28, 2015 • http://tw.rpi.edu/web/doc/IGARSS2015_Milan_Fox20150728_TU3.Y2 or http://bit.ly/1D50rQE

  2. Data Science Team + Hao, Kaleo, Stephen, Anusha, Jun, Mengyu, Chengcong, Harsha, Dan, …

  3. What to expect… • The Virtual Observatory – in brief • Ecosystems and stewarding • Systems v. Frameworks -> Platforms • Mediation • Deep Carbon Observatory • Integration, Transparency and Collaboration • The smarts.. • Where we are headed

  4. Working premise :== Mission Statement • Scientists – actually ANYONE – should be able to access a global, distributed knowledge base of scientific data and information that: • appears to be integrated • appears to be locally available • is in a language (written, programming, or science) that is understandable and can be shared • Data intensive – volume, complexity, mode, scale, heterogeneity, … in an OPEN WORLD

  5. Experience • Ecosystem metaphor – how to steward? [Diagram labels: Data, Information, Knowledge, Creation, Gathering, Presentation, Organization, Integration, Conversation, Context]

  6. Producers Consumers Experience • Not just curator, i.e. producer to consumer [Diagram labels: Data, Information, Knowledge, Creation, Gathering, Presentation, Organization, Integration, Conversation, Context]

  7. Stewardship in the ecosystem! • Many elements, and we still do not have sufficient information models (and meaning) of how they inter-relate – a massive stewardship challenge • Accountability, Collaboration, Identity, Explanation, Justification, Verifiability, Proof, Trust, Citability, Integratability, ‘Transparency’ -> Translucency, ‘Provenance’

  8. Framework v. systems v. platforms • Rough definitions • Systems have very well-defined entry and exit points. A user tends to know when they are using one. Options for extensions are limited and usually require engineering • Frameworks have many entry and use points. A user often does not know when they are using one. Extension points are part of the design • Platforms ~ arise from frameworks
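
The distinction above between fixed entry/exit points and designed-in extension points can be illustrated with a minimal Python sketch. The `Framework` class, the extension-point name "annotate", and the record fields are hypothetical, chosen only for illustration; nothing here is taken from the DCO or Tetherless World software.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
Extension = Callable[[Record], Record]


class Framework:
    """Toy framework: named extension points are part of the design."""

    def __init__(self) -> None:
        # Maps an extension-point name to the callables registered on it.
        self._extensions: Dict[str, List[Extension]] = {}

    def register(self, point: str, extension: Extension) -> None:
        """Attach an extension to a named point without changing the core."""
        self._extensions.setdefault(point, []).append(extension)

    def run(self, point: str, record: Record) -> Record:
        """Pass a record through every extension registered on a point."""
        for extension in self._extensions.get(point, []):
            record = extension(record)
        return record


def add_license(record: Record) -> Record:
    # Hypothetical extension: adds a license field to a metadata record.
    return {**record, "license": "CC-BY"}


framework = Framework()
framework.register("annotate", add_license)
annotated = framework.run("annotate", {"title": "Example dataset"})
```

A system in the slide's sense would instead expose a single fixed function, and adding the license step would mean re-engineering that function.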

  9. High-level framework architectures

  10. Core and Framework Semantics - Multi-tiered interoperability Mediation! Mediation! Mediation!

  11. Mediation • 6th Generation Guess: Smart Text Agents, Smart Data Agents, Relationship/Association Rules, Cognitive Collaboration • All these generations of mediation are in effect as we collaborate • From: C. Borgman, 2008, NSF Cyberlearning Report, Illustration by Roy Pea and Jillian C. Wallis

  12. Deep Carbon Observatory (DCO) … • “We are dedicated to achieving transformational understanding of carbon’s chemical and biological roles in Earth.” www.deepcarbon.net

  13. Collaboration and Integration needs … • “Enable DCO team leaders to create new groups and associate a number of content types --- documents, discussions, blog posts, tasks, links, and bibliographic entries --- with the group, as well as simple event management (a private event calendar for the group) and embedding of external services (e.g. and esp. Google Calendar)” … more… (data, publications, projects)… stewarding a Knowledge Network … and a Virtual Organization (> 1000+ people = more)

  14. Decadal goals = Discovery science Global community of ‘Carbon scientists’ contributing to the Deep Earth Computer (data legacy) comprising: • Global Earth Mineral Laboratory • Inventory of Deep Fluids • Global Volcano Gas Emissions • Census of Deep Microbial Life • State of High Pressure and Temperature Carbon and Related Materials • Global Inventory of Diamonds with Inclusions • 7 others…

  15. TW-SPARQL Application: Dynamic, Stylized Menu Generation (using Drupal host) • Menus based on parameterization of page • See “Recent Findings” and “Projects” below • Note also expanded view “>”
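
As a rough illustration of menus driven by page parameters, the Python sketch below fills a SPARQL query template from a parameter and renders result rows as an HTML list. It is not the TW-SPARQL module itself: the function names, the query shape, and the example URIs are all assumptions, and the query is only built as a string, not sent to any endpoint.

```python
from html import escape
from string import Template
from typing import Iterable, Tuple

# Hypothetical query template: list labelled items of one RDF type.
MENU_QUERY = Template("""\
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?item ?label WHERE {
  ?item a <$type_uri> ;
        rdfs:label ?label .
}
ORDER BY ?label
LIMIT $limit
""")


def build_menu_query(type_uri: str, limit: int = 5) -> str:
    """Fill the template from page parameters, rejecting unsafe URIs."""
    if any(ch in type_uri for ch in '<>"{}|\\^` '):
        raise ValueError(f"not a safe IRI: {type_uri!r}")
    return MENU_QUERY.substitute(type_uri=type_uri, limit=int(limit))


def render_menu(title: str, rows: Iterable[Tuple[str, str]]) -> str:
    """Render (uri, label) result rows as a small HTML menu."""
    items = "".join(
        f'<li><a href="{escape(uri, quote=True)}">{escape(label)}</a></li>'
        for uri, label in rows
    )
    return f"<h3>{escape(title)}</h3><ul>{items}</ul>"


# Example with made-up values standing in for page parameters and results.
query = build_menu_query("http://example.org/ontology#Project", limit=3)
menu_html = render_menu("Projects", [("http://example.org/p1", "Fluids & Gases")])
```

Keeping the query in a template means the same page component can produce a "Recent Findings" or a "Projects" menu by changing only its parameters.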

  16. DCO Data Science Platform = DCVO • CKAN • VIVO • GHS – Handle.net

  17. Stewardship of data-information-knowledge • deepcarbon.net • info.deepcarbon.net • data.deepcarbon.net • dx.deepcarbon.net

  18. VIVO Extension: Dataset deposit in attached data repository • Includes multi-level metadata collection • Includes persistent identifier (DCO-ID generation) • Includes interaction with dedicated repository OR accepts third-party deposit details [Flowchart labels: Begin; Need DCO-ID? (YES/NO); Generate & register DCO-ID (unique suffix, blank URL); Revise metadata; Data deposit (YES/NO); External data (YES/NO); Collect CKAN metadata & generate URL; Review DCO-ID & CKAN metadata; Revise CKAN metadata; Deposit in CKAN & generate URL to data; Add URL (to data in external repository); URL to the downloadable data; Update DCO-ID (map the DCO-ID to CKAN URL); Update DCO-ID record; End. Outcomes: DCO-ID & DCO-ID metadata; Deposited DCO data or URL to external data; Object without data URL]
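
The deposit workflow above can be approximated in a short Python sketch. It is a simplification under stated assumptions, not the DCO implementation: the flowchart's exact branch order cannot be recovered from the transcript, the review and revise loops are omitted, and the identifier format, the `repository.example` URL, and all function names are hypothetical.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional

_suffixes = count(1)  # hypothetical source of unique DCO-ID suffixes


@dataclass
class DcoRecord:
    dco_id: str
    metadata: dict
    data_url: Optional[str] = None  # stays blank until a data URL is known


def register_dco_id(metadata: dict) -> DcoRecord:
    """Generate and register an identifier with a unique suffix and blank URL."""
    return DcoRecord(dco_id=f"DCO-ID/{next(_suffixes):06d}", metadata=dict(metadata))


def deposit(metadata: dict, data: Optional[bytes] = None,
            external_url: Optional[str] = None) -> DcoRecord:
    """Walk the three outcomes named on the slide for one dataset."""
    record = register_dco_id(metadata)
    if data is not None:
        # Data deposited in the attached repository (CKAN on the slide);
        # this URL is a placeholder for whatever the repository returns.
        suffix = record.dco_id.split("/")[-1]
        record.data_url = f"https://repository.example/dataset/{suffix}"
    elif external_url is not None:
        # Third-party deposit: only the URL to the external data is recorded.
        record.data_url = external_url
    # Otherwise the object keeps its identifier but has no data URL.
    return record


deposited = deposit({"title": "Sample A"}, data=b"1,2,3")
external = deposit({"title": "Sample B"}, external_url="https://example.org/b.csv")
metadata_only = deposit({"title": "Sample C"})
```

The three calls correspond to the slide's outcomes: deposited data, a URL to external data, and an object without a data URL.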

  19. DCO Ontology http://deepcarbon.net/dco_datasets Click on Title: “DCO Ontology” or https://deepcarbon.net//dco_dataset_summary?uri=http://info.deepcarbon.net/individual/n5989

  20. State to date… • Knowledge network – implements both the collaboration and the integration, reporting implements the transparency • It’s being USED • Many means of population • User generation • Machine generation • Contributing these enhancements back to open-source communities (CKAN, VIVO) July 2014

  21. July 2015

  22. Thus… progress in VOs • Integrative – semantics • Transparent – semantics • Collaborative – semantics • Stewardship! • Yep – semantics • And cognition • This is where we are headed

  23. Thank you • pfox@cs.rpi.edu and the DCO Data Science Team • @taswegian #twcrpi • http://tw.rpi.edu • http://tw.rpi.edu/web/project/DCO-DS • http://deepcarbon.net

  24. Modern informatics enables a new scale-free framework approach • Use cases • Stakeholders • Distributed authority • Access control • Ontologies • Maintaining Identity

  25. vivoweb.org • VIVO - represents academic research communities • Every person, organization, or other data entity in VIVO has a unique identifier • VIVO enables the discovery of research and scholarship across disciplines at one institution or across many • Records are both human-readable and machine-readable • VIVO Extension - we’ve extended (yes, ontologies) VIVO to the science network – datasets, instruments, sites, etc.

  26. Collaboration tools • Group Based Collaboration • Group data deposit and reporting • Listings of group content • Group management and messaging • Listings of group documents

  27. Group bibliography • Group shared calendar • Group task management • Group membership • Group event management
