Exploring Cloud Control for Long-Term Data Preservation and Resilience
This presentation discusses the challenges and opportunities of using cloud storage for long-term data preservation. While cloud technology is popular, concerns remain about its reliability and geographical storage locations. Key topics include the resilience of cloud systems, replication strategies, and the simplicity of cloud APIs like Amazon S3. We also explore the possibility of local network utilization for cloud expansion, addressing single points of failure, and promoting flexibility in object storage. Ongoing feasibility studies and modularization efforts will be highlighted.
Presentation Transcript
P2N: Cloud Control David Tarrant davetaz@ecs.soton.ac.uk Ben O’Steen benjamin.osteen@ouls.ox.ac.uk
Problem • Everyone loves the cloud • No one in this room would use it as their primary storage • Would anyone use it as a long-term preservation storage solution?
More Questions • Does the cloud do backup/replication/multi-site replication? • Where are my files stored (geographically)? • What is the long-term pricing strategy of the cloud?
Influences • Simple cloud API • High resilience and distribution of resources • Transparent expansion • Low barrier to entry
The API • Amazon S3 • PUT, GET, POST, HEAD, DELETE • HTTP has all the tools we need!
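To make the point about HTTP verbs concrete, here is a minimal sketch of how the five methods could map onto object operations. It is an in-memory stand-in, not the P2N code: the class name `ObjectStore`, the `handle` signature, and the status codes chosen are all assumptions for illustration, and the use of POST for metadata updates follows the later Flexibility slide rather than Amazon's own definition of POST.

```python
class ObjectStore:
    """In-memory stand-in for a storage node (hypothetical, not P2N itself)."""

    METHODS = {"PUT", "GET", "POST", "HEAD", "DELETE"}

    def __init__(self):
        self._objects = {}  # key -> (data bytes, metadata dict)

    def handle(self, method, key, body=b"", metadata=None):
        """Dispatch one request and return (status, metadata, body)."""
        if method not in self.METHODS:
            return 405, {}, b""  # verb not supported
        if method == "PUT":  # create or overwrite an object
            self._objects[key] = (body, dict(metadata or {}))
            return 200, {}, b""
        if key not in self._objects:
            return 404, {}, b""  # every other verb needs an existing object
        data, meta = self._objects[key]
        if method == "GET":  # data plus metadata
            return 200, dict(meta), data
        if method == "HEAD":  # metadata only, never a body
            return 200, dict(meta), b""
        if method == "POST":  # assumed here: update metadata, keep the data
            meta.update(metadata or {})
            return 200, dict(meta), b""
        del self._objects[key]  # only DELETE remains
        return 204, {}, b""
```

A caller would issue `store.handle("PUT", "report.pdf", data)` and later `store.handle("HEAD", "report.pdf")` to check the object without transferring it. Keeping the whole interface to one dispatch function is the appeal of the S3 style: any HTTP client can talk to it.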
High Resilience & Distribution • Erasure coding (Honeycomb & RAID) • More efficient than replication • Resilience of BitTorrent • Nodes in the network are geographically aware
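The efficiency claim can be illustrated with the simplest erasure code, a single XOR parity shard in the style of RAID 5. This is a sketch under stated assumptions, not the scheme P2N or Honeycomb uses (production systems typically use codes that survive several losses). The function names `encode` and `recover` are hypothetical.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list:
    """Split data into k equal shards and append one XOR parity shard."""
    shard_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(shard_len * k, b"\0")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    parity = bytes(shard_len)  # all zeros to start
    for shard in shards:
        parity = xor_bytes(parity, shard)
    return shards + [parity]


def recover(shards: list, length: int) -> bytes:
    """Rebuild the original data when at most one shard is None."""
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one lost shard")
    if missing:
        present = [s for s in shards if s is not None]
        rebuilt = bytes(len(present[0]))
        for s in present:  # XOR of the survivors equals the lost shard
            rebuilt = xor_bytes(rebuilt, s)
        shards = list(shards)
        shards[missing[0]] = rebuilt
    return b"".join(shards[:-1])[:length]  # drop parity and padding
```

With `k = 4`, five shards on five nodes survive the loss of any one node at 1.25 times the original size, whereas surviving one loss by plain replication needs a second full copy, or 2 times the size.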
Transparent Expansion • Nodes can be added to the network arbitrarily • Network re-distributes data for even spread
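The slides do not say how the network decides which objects move when a node joins. One well-known way to get an even spread with little data movement is rendezvous (highest-random-weight) hashing, sketched below purely as an assumption. The function names and the `N1` to `N6` labels are illustrative.

```python
import hashlib


def owner(key: str, nodes: list) -> str:
    """Return the node with the highest hash score for this key."""
    def score(node: str) -> int:
        digest = hashlib.sha256(f"{node}:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return max(nodes, key=score)


def placement(keys: list, nodes: list) -> dict:
    """Map every key to its owning node."""
    return {key: owner(key, nodes) for key in keys}


def moved_keys(keys: list, old_nodes: list, new_nodes: list) -> dict:
    """Return {key: new owner} for keys whose owner changed."""
    before = placement(keys, old_nodes)
    after = placement(keys, new_nodes)
    return {k: after[k] for k in keys if before[k] != after[k]}
```

Calling `moved_keys(keys, ["N1", "N2", "N3", "N4", "N5"], ["N1", "N2", "N3", "N4", "N5", "N6"])` returns only keys that now belong to `N6`. Existing nodes never trade objects among themselves, so adding a node costs roughly one sixth of the data in transfers rather than a full reshuffle.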
Low Barrier to Entry • Provide a node • Full machine • Spare space on an existing machine
The P2N N1 N2 N3 N4 N5 N6
The P2N N N1 N2 N3 N4 N5 N6
The P2N Single Point of Failure? N N N1 N2 N3 N4 N5 N6
The P2N Single Point of Failure? N2 N1 N3 N4 N5 N6
Institutional Distribution N4 N1 N5 N2 N6 N3
Flexibility • Object level granularity • Basic metadata support (through POST, HEAD) • Object reporting, available via HEAD (single object) or GET (network report) • Extensions to S3 API without breaking core functionality.
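As a rough illustration of the two reporting levels, the sketch below builds the headers a HEAD on one object might carry and the body a GET on the network might return. Everything here is hypothetical: the catalogue structure, the function names, and the `p2n-` header names are invented for the example. The only borrowed convention is S3's `x-amz-meta-` prefix for user-defined metadata, which is one way an extension could avoid colliding with core headers.

```python
import json


def object_report(catalogue: dict, key: str) -> dict:
    """Headers a HEAD request for one object might return (names assumed)."""
    nodes = sorted(catalogue[key])
    return {
        "x-amz-meta-p2n-node-count": str(len(nodes)),
        "x-amz-meta-p2n-nodes": ",".join(nodes),
    }


def network_report(catalogue: dict) -> str:
    """JSON body a GET on the network itself might return (shape assumed)."""
    pieces_per_node = {}
    for nodes in catalogue.values():
        for node in nodes:
            pieces_per_node[node] = pieces_per_node.get(node, 0) + 1
    report = {"objects": len(catalogue), "pieces_per_node": pieces_per_node}
    return json.dumps(report, sort_keys=True)
```

Here `catalogue` maps each object key to the list of nodes holding a piece of it, for example `{"thesis.pdf": ["N1", "N4"]}`. A client that only understands plain S3 would simply ignore the extra headers.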
Progress so far • Feasibility study has been done • Now re-modularising the core • P2N1 – Localised Network (Spare space) • P2N2 – Thumper Network (200 TB+)
Thank You P2N: Cloud Control David Tarrant davetaz@ecs.soton.ac.uk Ben O’Steen benjamin.osteen@ouls.ox.ac.uk Preserv.org.uk Repository Preservation and Interoperability