OceanStore: An Infrastructure for Global-Scale Persistent Storage

OceanStore: An Infrastructure for Global-Scale Persistent Storage. John Kubiatowicz, David Bindel, Yan Chen, Steven Czerwinski, Patrick Eaton, Dennis Geels, Ramakrishna Gummadi, Sean Rhea, Hakim Weatherspoon, Westley Weimer, Chris Wells, Ben Zhao.

Presentation Transcript


  1. OceanStore: An Infrastructure for Global-Scale Persistent Storage • John Kubiatowicz, David Bindel, Yan Chen, Steven Czerwinski, Patrick Eaton, Dennis Geels, Ramakrishna Gummadi, Sean Rhea, Hakim Weatherspoon, Westley Weimer, Chris Wells, Ben Zhao • A few slides have been borrowed from the authors’ presentations

  2. Vision • What is OceanStore? • “a utility infrastructure to span the globe and provide continuous access to persistent information” Source: Berkeley OceanStore Website

  3. Vision • What is OceanStore? • “a utility infrastructure to span the globe and provide continuous access to persistent information” • data • all kinds of information • desktop, laptop, palmtop • cars, cellular phones, other devices • futuristic: embedded in environment

  4. Vision • What is OceanStore? • “a utility infrastructure to span the globe and provide continuous access to persistent information” • persistence • devices can be rebooted, lost, replaced • reliable, durable data (“deep archival” will last forever) • automatic maintenance

  5. Vision • What is OceanStore? • “a utility infrastructure to span the globe and provide continuous access to persistent information” • connectivity • even to tiniest devices, possibly intermittent • variable bandwidth, latency • availability • uniform access, comparable to LAN-based networked storage • fault-tolerant, DoS-tolerant

  6. Vision • What is OceanStore? • “a utility infrastructure to span the globe and provide continuous access to persistent information” • scale • geographically distributed • 10^10 users • 10^14 files / objects

  7. Questions about information: • Where is persistent information stored? • 20th-century tie between location and content outdated • In world-scale system, locality is key • How is it protected? • Can disgruntled employee of ISP sell your secrets? • Can’t trust anyone (how paranoid are you?) • Can we make it indestructible? • Want our data to survive “the big one”! • Highly resistant to hackers (denial of service) • Wide-scale disaster recovery • Is it hard to manage? • Worst failures are human-related • Want automatic (introspective) diagnosis and repair

  8. First Observation: Want Utility Infrastructure • Mark Weiser from Xerox: Transparent computing is the ultimate goal • Computers should disappear into the background • In the context of storage: • Don’t want to worry about backup • Don’t want to worry about obsolescence • Need lots of resources to make data secure and highly available, BUT don’t want to own them • Outsourcing of storage already becoming popular • Pay monthly fee and your “data is out there”

  9. Utility-based Infrastructure • Service provided by confederation of companies • Monthly fee paid to one service provider • Companies buy and sell capacity from each other • [Figure labels: Canadian OceanStore, Sprint, AT&T, IBM, Pac Bell]

  10. Target applications • Email • Group calendar, contacts • Distributed design tools • Computer Supported Cooperative Work • Digital libraries • Distributed/shared repositories

  11. Assumptions • Untrusted infrastructure • a small number of servers may crash or leak information • most of the servers function correctly • a financially “responsible party” for the servers ensures integrity • but only clients are trusted with cleartext • Nomadic data • data divorced from location • flows freely within the storage infrastructure • promiscuous caching: “anywhere, anytime” • location important for performance • dynamic system tuning through introspection

  12. System overview • persistent object • GUID: 160-bit SHA-1 hash • secure identification – globally unique and unforgeable • 2^80 unique objects before collisions (birthday paradox) • floating object replicas: independent of location • encrypted data • read • try fast probabilistic replica search (Bloom filter) • fall back to slower deterministic search (Tapestry) • write • update with predicates [as in Bayou – what is Bayou?] • creates new version
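The "before collisions" figure on this slide can be checked with the standard birthday approximation. The worked estimate below is a sketch that assumes GUIDs behave like uniformly random 160-bit values; the symbols N (number of possible GUIDs) and n (number of objects) are introduced here for illustration and are not from the slides.

```latex
% Probability of at least one collision among n GUIDs drawn from N = 2^{160} values
P(\text{collision}) \approx 1 - e^{-n^{2}/(2N)}

% Setting P = 1/2 and solving for n:
n \approx \sqrt{2 \ln 2 \cdot N} = \sqrt{2 \ln 2} \cdot 2^{80} \approx 1.18 \times 2^{80}
```

So a collision becomes likely only once roughly 2^80 objects exist, which is far above the 10^14 objects the deck targets.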

  13. What is Bayou? • The Bayou System (Xerox PARC) is a platform of replicated, highly available, variable-consistency databases on which collaborative applications can be built. • It caters to portable devices with intermittent connections.

  14. System overview • application interface • sessions: sequence of read/writes • session guarantees [Bayou] • loose consistency levels, ACID • active and archival forms • active: latest version, with update handle • archive: erasure coded read-only version • dynamic optimization • object location • degree of replication

  15. Tentative Updates: Epidemic Dissemination

  16. Committed Updates: Multicast Dissemination

  17. naming • self-certifying path names (Mazières) • object GUID = hash of owner key and readable name • create hierarchies using “directory” objects • read restriction • through client encryption of data • write restriction, access control • associate ACLs with object, respected by servers
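A minimal sketch of the naming rule "object GUID = hash of owner key and readable name", using SHA-1 as stated on the System overview slide. The function name `object_guid`, the length-prefixed byte layout, and the sample key bytes are assumptions made for illustration; the slides do not specify how the key and name are encoded before hashing.

```python
import hashlib


def object_guid(owner_public_key: bytes, name: str) -> str:
    """Return a 160-bit GUID as 40 hex characters (hypothetical encoding)."""
    h = hashlib.sha1()
    # Assumption: length-prefix the key so that different (key, name) pairs
    # can never produce the same concatenated byte string.
    h.update(len(owner_public_key).to_bytes(4, "big"))
    h.update(owner_public_key)
    h.update(name.encode("utf-8"))
    return h.hexdigest()


# Usage: the same owner key and name always map to the same GUID, so anyone
# holding the key and the name can recompute the GUID and check it.
key = b"example-owner-public-key"  # placeholder bytes, not a real key
guid = object_guid(key, "/home/alice/report.txt")
print(len(guid) * 4, "bits")
```

Because the GUID is derived from the owner's key, a different owner cannot claim the same readable name without producing a different GUID, which is the sense in which such names are self-certifying.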

  18. addressing • address an object by its GUID • message: GUID, random number, small predicate • route to closest GUID replica matching predicate • combines data location and routing: • no central name service to attack • save one round-trip for location discovery • routing • fast, probabilistic search algorithm • slow, deterministic search algorithm

  19. routing • fast, probabilistic search algorithm • Bloom filter • probabilistic set membership test using bit vector • bit vector generated from n hashes of each set element • filter is union (OR) of all bit vectors • attenuated Bloom filter • array of d Bloom filters • i-th Bloom filter is union of all <i-hop nodes • slow, deterministic algorithm • Tapestry
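The Bloom filter bullets above can be sketched as follows. This is an illustrative implementation under stated assumptions, not OceanStore's code: the class name `BloomFilter`, the helper `closest_level`, the default sizes, and the way hash functions are derived from SHA-1 are all hypothetical. The attenuated filter is modelled simply as a list of d filters, where the filter at index i summarizes objects stored at increasing distance.

```python
import hashlib
from typing import Iterator, List, Optional


class BloomFilter:
    """Probabilistic set membership test backed by a bit vector."""

    def __init__(self, num_bits: int = 64, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit vector stored as a Python integer

    def _positions(self, item: str) -> Iterator[int]:
        # Assumption: derive independent hashes by salting SHA-1 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha1(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: str) -> bool:
        # True may be a false positive; False is always correct.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

    def union(self, other: "BloomFilter") -> "BloomFilter":
        # The union of two filters is the bitwise OR of their bit vectors.
        merged = BloomFilter(self.num_bits, self.num_hashes)
        merged.bits = self.bits | other.bits
        return merged


def closest_level(attenuated: List[BloomFilter], guid: str) -> Optional[int]:
    """Return the first level whose filter may contain guid, else None."""
    for level, bloom in enumerate(attenuated):
        if bloom.might_contain(guid):
            return level
    return None
```

A node would keep one such attenuated filter per outgoing link and forward a query along the link reporting the smallest level; when every level answers "no", the query falls back to the deterministic search.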

  20. addressing and routing • [Figure labels: deterministic, probabilistic]

  21. Attenuated Bloom Filter

  22. updates • Updates based on versioning and conflict resolution • i.e. no locking • update: actions with predicates • commit – apply action of first true predicate • abort – no true predicates • conflict resolution on encrypted data • possible predicates: • compare-version, compare-size, compare-block, search • possible actions: • replace-block, insert-block, delete-block, append
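The update model on this slide (a list of predicate and action pairs, where the action of the first true predicate is applied and the update aborts if none is true) can be sketched as below. All names (`ObjectVersion`, `compare_version`, `append_block`, `apply_update`) are hypothetical, and the sketch works on plaintext blocks for readability, whereas the slide notes that conflict resolution happens on encrypted data.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class ObjectVersion:
    version: int
    blocks: List[bytes]


Predicate = Callable[[ObjectVersion], bool]
Action = Callable[[ObjectVersion], List[bytes]]


def compare_version(expected: int) -> Predicate:
    """Predicate: true when the object is at the expected version."""
    return lambda obj: obj.version == expected


def append_block(block: bytes) -> Action:
    """Action: return a new block list with one block appended."""
    return lambda obj: obj.blocks + [block]


def apply_update(
    obj: ObjectVersion, update: List[Tuple[Predicate, Action]]
) -> Optional[ObjectVersion]:
    """Commit the action of the first true predicate, or abort with None."""
    for predicate, action in update:
        if predicate(obj):
            # A commit creates a new version; the old one is left untouched.
            return ObjectVersion(obj.version + 1, action(obj))
    return None  # abort: no predicate was true


# Usage: the append only commits if nobody else has updated version 1.
v1 = ObjectVersion(version=1, blocks=[b"hello"])
v2 = apply_update(v1, [(compare_version(1), append_block(b"world"))])
print(v2)
```

Since nothing is locked, two clients racing on the same version simply produce one commit and one abort, and the aborted client can retry against the new version.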

  23. archival • produced when objects idle • use erasure codes (redundant fragmentation) • simplest example: parity bit • need any (n-1) out of n fragments • interleaved Reed-Solomon codes, Tornado codes • fragmentation improves reliability • “deep archival storage” • sweeper processes ensure replication sustained over time • fragmentation improves performance
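The "simplest example: parity bit" bullet can be made concrete with a single XOR parity fragment, which tolerates the loss of any one of the n fragments. This is a sketch of that simplest case only, not of the Reed-Solomon or Tornado codes named on the slide; the function names are hypothetical and all data fragments are assumed to have equal length.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data_fragments: List[bytes]) -> List[bytes]:
    """Append one parity fragment: the XOR of all data fragments."""
    parity = reduce(xor_bytes, data_fragments)
    return data_fragments + [parity]


def recover(fragments: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild the data from n fragments of which at most one is None."""
    missing = [i for i, frag in enumerate(fragments) if frag is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one lost fragment")
    if missing:
        present = [frag for frag in fragments if frag is not None]
        # XOR of the surviving fragments equals the lost fragment.
        rebuilt = reduce(xor_bytes, present)
        idx = missing[0]
        fragments = fragments[:idx] + [rebuilt] + fragments[idx + 1:]
    return fragments[:-1]  # drop the parity fragment


# Usage: lose one fragment and still read the data back.
stored = encode([b"ab", b"cd", b"ef"])
stored[1] = None
print(recover(stored))
```

Codes such as Reed-Solomon generalize this idea so that more than one lost fragment can be tolerated.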

  24. Erasure Codes • Can be implemented with simple parity bits or with generalized Reed-Solomon codes.

  25. Floating Replica and Deep Archival Coding • [Figure: three full copies, each with a version list (Ver1: 0x34243, Ver2: 0x49873, Ver3: …) and conflict resolution logs; labels: Floating Replica, Erasure-coded Fragments]

  26. dynamic optimization (introspection) • observation modules • collect and summarize information • incrementally update system database • optimization modules • periodically process the observation database • cluster recognition: group related objects • replica management: maintain replica number and location • periodic migration: work-home-work-home… • maintenance: routing, dissemination, availability, durability
