
Lessons About Sustainability Learned from the Open Science Data Cloud

Robert Grossman, University of Chicago & Open Cloud Consortium. 500 users compute over data, with 150 active each month. Users consume between 1,000 and 100,000+ core hours per month, all on the same open source software stack.


Presentation Transcript


  1. Lessons About Sustainability Learned from the Open Science Data Cloud
     Robert Grossman, University of Chicago & Open Cloud Consortium

  2. 500 users that compute over data
     • 150 active each month
     • Users utilize between 1,000 and 100,000+ core hours per month
     • Same open source software stack
     • Open Science Data Cloud
         • Operated by NFP Open Cloud Consortium
         • 6 PB (pan science)
         • Started in 2009
     • Bionimbus Protected Data Cloud
         • Operated by University of Chicago
         • 1+ PB controlled access cancer data
         • Started in 2013

  3. Useful Models Today
     • Organizations and projects can contribute through a “condo model.”
     • Modest fees can be charged to those communities that can pay, including commercial entities.
     • PIs can add charges in their grants for data infrastructure, which are paid when they contribute their data to the infrastructure.
     • PIs can add charges in their grants which are paid when they make significant use of data infrastructure.
     • PIs can add charges in their grants to contribute capital expenditure (cap exp) to publicly funded repositories and infrastructure.
     • Federal agencies can provide grants to support publicly funded data infrastructure.
     • There are very important differences between small, medium and large scale data infrastructure.

  4. Lessons Learned / Principles
     • Pay in at constant capital expenditure (cap exp), despite the fact that no one wants to pay and that there is an expectation that data infrastructure should be free.
     • Permanent IDs with queryable metadata are essential.
     • Portability of data and data environment (at scale) is critical so researchers can liberate their data.
     • Peering of Tier 1 data repositories and infrastructure and interoperability of public data infrastructure are critical, but don’t get ahead of reference implementations.
     • Essential to keep costs down:
         • Efficiency gained through medium scale infrastructure
         • Open source software
         • Use of infrastructure management and automation tools
         • Misconceptions about the costs / tradeoffs of public cloud infrastructure
