OceanStore: An Infrastructure for Global-Scale Persistent Storage

John Kubiatowicz, David Bindel, Yan Chen, Steven Czerwinski, Patrick Eaton, Dennis Geels, Ramakrishna Gummadi, Sean Rhea, Hakim Weatherspoon, Westley Weimer, Chris Wells, Ben Zhao


A few slides have been borrowed from the authors’ presentations


Vision

  • What is OceanStore?

    • “a utility infrastructure to span the globe and provide continuous access to persistent information”

Source: Berkeley OceanStore Website


  • data

    • all kinds of information

    • desktop, laptop, palmtop

    • cars, cellular phones, other devices

    • futuristic: embedded in environment


  • persistence

    • devices can be rebooted, lost, replaced

    • reliable, durable data (“deep archival” will last forever)

    • Automatic maintenance


  • connectivity

    • even to tiniest devices, possibly intermittent

    • variable bandwidth, latency

  • availability

    • uniform access, comparable to LAN-based networked storage

    • fault-tolerant, DoS-tolerant


  • scale

    • geographically distributed

    • 10^10 users

    • 10^14 files / objects


    Questions about information

    • Where is persistent information stored?

      • 20th-century tie between location and content outdated

      • In world-scale system, locality is key

    • How is it protected?

      • Can disgruntled employee of ISP sell your secrets?

      • Can’t trust anyone (how paranoid are you?)

    • Can we make it indestructible?

      • Want our data to survive “the big one”!

      • Highly resistant to hackers (denial of service)

      • Wide-scale disaster recovery

    • Is it hard to manage?

      • Worst failures are human-related

      • Want automatic (introspective) diagnosis and repair


    First Observation: Want Utility Infrastructure

    • Mark Weiser from Xerox: Transparent computing is the ultimate goal. Computers should disappear into the background

    • In the context of storage:

      • Don’t want to worry about backup

      • Don’t want to worry about obsolescence

      • Need lots of resources to make data secure and highly available, BUT don’t want to own them

    • Outsourcing of storage already becoming popular

      • Pay monthly fee and your “data is out there”


    Utility-based Infrastructure

    • Service provided by confederation of companies

    • Monthly fee paid to one service provider

    • Companies buy and sell capacity from each other

    [Figure: confederation of storage providers; labels: Canadian OceanStore, Sprint, AT&T, Pac Bell, IBM]


    Target applications

    • Email

    • Group calendar, contacts

    • Distributed design tools

    • Computer Supported Cooperative Work

    • Digital libraries

    • Distributed/shared repositories


    Assumptions

    • Untrusted infrastructure

      • a small number of servers may crash or leak information

      • most of the servers function correctly

      • a financially “responsible party” for the servers ensures integrity

      • but only clients are trusted with cleartext

    • Nomadic data

      • data divorced from location

      • flows freely within the storage infrastructure

      • promiscuous caching: “anywhere, anytime”

      • location important for performance

      • dynamic system tuning through introspection


    System overview

    • persistent object

      • GUID: 160-bit SHA-1 hash

        • secure identification – globally unique and unforgeable

        • 2^80 unique objects before collisions (birthday paradox)

      • floating object replicas: independent of location

      • encrypted data

    • read

      • try fast probabilistic replica search (Bloom filter)

      • fallback to slower deterministic search (Tapestry)

    • write

      • update with predicates (as in Bayou)

      • creates new version
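
The collision figure for a 160-bit GUID can be checked with the standard birthday-bound approximation. The following is a minimal Python sketch; the function names `guid` and `collision_probability` are illustrative and not part of OceanStore.

```python
import hashlib
import math

GUID_BITS = 160  # SHA-1 digest size, as stated on the slide


def guid(data: bytes) -> str:
    """Return a 160-bit identifier as 40 hex characters (SHA-1 of the input)."""
    return hashlib.sha1(data).hexdigest()


def collision_probability(num_objects: float, bits: int = GUID_BITS) -> float:
    """Birthday-bound approximation: p ~= 1 - exp(-n^2 / 2^(bits + 1))."""
    return -math.expm1(-(num_objects ** 2) / 2.0 ** (bits + 1))


print(len(guid(b"example object")) * 4)   # 160 (bits)
print(collision_probability(1e14))        # negligible for 10**14 objects
print(collision_probability(2.0 ** 80))   # about 0.39
```

Under this approximation, collisions only become likely once the number of objects approaches 2^80, which is far beyond the scale the system targets.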


    What is Bayou?

    The Bayou system (Xerox PARC) is a platform of replicated, highly available, variable-consistency databases on which collaborative applications can be built. It caters to portable devices with intermittent connections.


    System overview

    • application interface

      • sessions: sequence of read/writes

      • session guarantees [Bayou]

        • from loose consistency levels up to ACID

    • active and archival forms

      • active: latest version, with update handle

      • archive: erasure coded read-only version

    • dynamic optimization

      • object location

      • degree of replication


    Tentative Updates: Epidemic Dissemination


    Committed Updates: Multicast Dissemination


    naming

    • self-certifying path names (Mazières)

      • object GUID = hash of owner key and readable name

    • create hierarchies using “directory” objects

    • read restriction

      • through client encryption of data

    • write restriction, access control

      • associate ACLs with object, respected by servers
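
The self-certifying naming rule (GUID = hash of owner key and readable name) can be sketched as follows. This is an illustration under stated assumptions: the length prefix, the byte encoding, and the function names `object_guid` and `verify_name` are choices of this sketch, not the encoding OceanStore actually uses.

```python
import hashlib


def object_guid(owner_public_key: bytes, readable_name: str) -> str:
    """Hash the owner's key together with the human-readable name."""
    h = hashlib.sha1()
    # Length prefix keeps (key, name) pairs unambiguous; an assumption of this sketch.
    h.update(len(owner_public_key).to_bytes(4, "big"))
    h.update(owner_public_key)
    h.update(readable_name.encode("utf-8"))
    return h.hexdigest()


def verify_name(guid: str, owner_public_key: bytes, readable_name: str) -> bool:
    """Anyone holding the key and the name can recompute and check the GUID."""
    return object_guid(owner_public_key, readable_name) == guid


key = b"placeholder-public-key-bytes"  # hypothetical key material
g = object_guid(key, "/home/alice/calendar")
print(verify_name(g, key, "/home/alice/calendar"))  # True
print(verify_name(g, b"some-other-key", "/home/alice/calendar"))  # False
```

Because the GUID depends on the owner's key, a different owner cannot claim the same name without producing a different GUID, so no central naming authority is needed to check the binding.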


    addressing

    • address an object by its GUID

      • message: GUID, random number, small predicate

      • route to closest GUID replica matching predicate

      • combines data location and routing:

        • no central name service to attack

        • save one round-trip for location discovery

    • routing

      • fast, probabilistic search algorithm

      • slow, deterministic search algorithm


    routing

    • fast, probabilistic search algorithm

      • Bloom filter

        • probabilistic set membership test using bit vector

        • m-bit vector with bits set by k hashes of each set element

        • filter is union (OR) of all bit vectors

      • attenuated Bloom filter

        • array of d Bloom filters

        • i-th Bloom filter is the union of the filters of all nodes i hops away

    • slow, deterministic algorithm

      • Tapestry
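
The probabilistic search structures above can be sketched in a few lines of Python. This is a minimal illustration, not OceanStore's implementation: the parameters `num_bits` and `num_hashes`, the use of indexed SHA-1 as the hash family, and the method `closest_level` are assumptions of the sketch, and it assumes level i of the attenuated filter summarizes objects reachable i hops away.

```python
import hashlib


class BloomFilter:
    """Probabilistic set membership: false positives possible, no false negatives."""

    def __init__(self, num_bits: int = 256, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as the bit vector

    def _positions(self, item: str):
        # Derive num_hashes bit positions by hashing the item with an index prefix.
        for i in range(self.num_hashes):
            digest = hashlib.sha1(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

    def union(self, other: "BloomFilter") -> "BloomFilter":
        # The union of two filters is the bitwise OR of their bit vectors.
        merged = BloomFilter(self.num_bits, self.num_hashes)
        merged.bits = self.bits | other.bits
        return merged


class AttenuatedBloomFilter:
    """Array of d Bloom filters; levels[i - 1] summarizes objects i hops away."""

    def __init__(self, depth: int, num_bits: int = 256, num_hashes: int = 3):
        self.levels = [BloomFilter(num_bits, num_hashes) for _ in range(depth)]

    def closest_level(self, guid: str):
        """Return the smallest hop count whose filter matches, or None."""
        for hops, level in enumerate(self.levels, start=1):
            if guid in level:
                return hops
        return None


abf = AttenuatedBloomFilter(depth=3)
abf.levels[1].add("guid-a")          # object known to be two hops away
print(abf.closest_level("guid-a"))   # 2
```

A query that returns None at every level is the case where the lookup would fall back to the slower deterministic search.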


    addressing and routing

    [Figure labels: deterministic, probabilistic]


    Attenuated Bloom Filter


    updates

    • Updates based on versioning and conflict resolution

      • i.e. no locking

      • update: actions with predicates

        • commit – apply action of first true predicate

        • abort – no true predicates

    • conflict resolution on encrypted data

      • possible predicates:

        • compare-version, compare-size, compare-block, search

      • possible actions:

        • replace-block, insert-block, delete-block, append
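
The commit and abort rule for predicate-based updates can be sketched as below. This is a simplified model under stated assumptions: blocks are treated as opaque byte strings (standing in for ciphertext), and the types `ObjectVersion`, `Predicate`, `Action` and the function `apply_update` are hypothetical names for illustration only.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class ObjectVersion:
    version: int
    blocks: List[bytes] = field(default_factory=list)  # opaque (encrypted) blocks


Predicate = Callable[[ObjectVersion], bool]
Action = Callable[[ObjectVersion], List[bytes]]


def compare_version(expected: int) -> Predicate:
    return lambda obj: obj.version == expected


def append(block: bytes) -> Action:
    return lambda obj: obj.blocks + [block]


def replace_block(index: int, block: bytes) -> Action:
    def act(obj: ObjectVersion) -> List[bytes]:
        blocks = list(obj.blocks)  # copy so the old version stays intact
        blocks[index] = block
        return blocks
    return act


def apply_update(obj: ObjectVersion,
                 update: List[Tuple[Predicate, Action]]) -> Tuple[ObjectVersion, bool]:
    """Commit the action of the first true predicate; abort if none is true."""
    for predicate, action in update:
        if predicate(obj):
            return ObjectVersion(obj.version + 1, action(obj)), True
    return obj, False


v1 = ObjectVersion(1, [b"block-0"])
v2, committed = apply_update(v1, [(compare_version(1), append(b"block-1"))])
print(committed, v2.version)   # True 2
_, committed = apply_update(v2, [(compare_version(1), append(b"stale"))])
print(committed)               # False: the writer saw an outdated version
```

No lock is taken at any point: a writer working from a stale version simply fails its predicate and the update aborts, while each commit produces a new version and leaves the previous one unchanged.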


    archival

    • produced when objects idle

    • use erasure codes (redundant fragmentation)

      • simplest example: parity bit

        • need any (n-1) out of n fragments

      • interleaved Reed-Solomon codes, Tornado codes

    • fragmentation improves reliability

      • “deep archival storage”

      • sweeper processes ensure replication sustained over time

    • fragmentation improves performance
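
The parity-bit case, where any n-1 of n fragments suffice, can be sketched with a single XOR parity fragment. This is the simplest erasure code only and not the Reed-Solomon or Tornado codes named above; the function names `encode` and `recover` are illustrative, and the sketch assumes all data fragments have equal length.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data_fragments: List[bytes]) -> List[bytes]:
    """Append one parity fragment: the XOR of all (equal-length) data fragments."""
    parity = reduce(xor_bytes, data_fragments)
    return list(data_fragments) + [parity]


def recover(fragments: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild the data from n fragments of which at most one is missing (None)."""
    missing = [i for i, frag in enumerate(fragments) if frag is None]
    if len(missing) > 1:
        raise ValueError("single parity tolerates only one lost fragment")
    fragments = list(fragments)
    if missing:
        present = [frag for frag in fragments if frag is not None]
        # XOR of all surviving fragments equals the missing one.
        fragments[missing[0]] = reduce(xor_bytes, present)
    return fragments[:-1]  # drop the parity fragment


stored = encode([b"ocea", b"nsto", b"re!!"])
stored[1] = None                      # lose any one fragment
print(b"".join(recover(stored)))      # b'oceanstore!!'
```

Codes such as Reed-Solomon generalize this idea so that the data survives the loss of several fragments rather than just one.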


    Erasure Codes

    Simple parity bits or generalized Reed-Solomon codes can be used to implement erasure coding.


    Floating Replica and Deep Archival Coding

    [Figure: three floating replicas, each labeled with a full copy, a version list (Ver1: 0x34243, Ver2: 0x49873, Ver3: …) and conflict resolution logs; erasure-coded fragments]


    dynamic optimization (introspection)

    • observation modules

      • collect and summarize information

      • incrementally update system database

    • optimization modules

      • periodically process the observation database

      • cluster recognition: group related objects

      • replica management: maintain replica number and location

      • periodic migration: work-home-work-home…

      • maintenance: routing, dissemination, availability, durability

