OceanStore: An Infrastructure for Global-Scale Persistent Storage

John Kubiatowicz, David Bindel, Yan Chen, Steven Czerwinski, Patrick Eaton, Dennis Geels, Ramakrishna Gummadi, Sean Rhea, Hakim Weatherspoon, Westley Weimer, Chris Wells, Ben Zhao

A few slides have been borrowed from the authors’ presentations

Vision
  • What is OceanStore?
    • “a utility infrastructure to span the globe and provide continuous access to persistent information”

Source: Berkeley OceanStore Website

  • data
    • all kinds of information
    • desktop, laptop, palmtop
    • cars, cellular phones, other devices
    • futuristic: embedded in environment
  • persistence
    • devices can be rebooted, lost, replaced
    • reliable, durable data (“deep archival” will last forever)
    • Automatic maintenance
  • connectivity
    • even to the tiniest devices, possibly intermittent
    • variable bandwidth, latency
  • availability
    • uniform access, comparable to LAN-based networked storage
    • fault-tolerant, DoS-tolerant
  • scale
    • geographically distributed
    • 10^10 users
    • 10^14 files / objects
Questions about information:
  • Where is persistent information stored?
    • the 20th-century tie between location and content is outdated
    • in a world-scale system, locality is key
  • How is it protected?
    • Can a disgruntled employee of your ISP sell your secrets?
    • Can’t trust anyone (how paranoid are you?)
  • Can we make it indestructible?
    • Want our data to survive “the big one”!
    • Highly resistant to hackers (denial of service)
    • Wide-scale disaster recovery
  • Is it hard to manage?
    • Worst failures are human-related
    • Want automatic (introspective) diagnosis and repair

First Observation: Want Utility Infrastructure
  • Mark Weiser (Xerox PARC): transparent computing is the ultimate goal; computers should disappear into the background
  • In the context of storage:
    • Don’t want to worry about backup
    • Don’t want to worry about obsolescence
    • Need lots of resources to make data secure and highly available, BUT don’t want to own them
    • Outsourcing of storage already becoming popular
  • Pay a monthly fee and your “data is out there”

Utility-based Infrastructure
  • Service provided by a confederation of companies
  • Monthly fee paid to one service provider
  • Companies buy and sell capacity from each other

[Figure: OceanStore as a confederation of providers (labels: Canadian OceanStore, Sprint, AT&T, IBM, Pac Bell)]

Target applications
  • Email
  • Group calendar, contacts
  • Distributed design tools
  • Computer-Supported Cooperative Work
  • Digital libraries
  • Distributed/shared repositories

Assumptions
  • Untrusted infrastructure
    • a small number of servers may crash or leak information
    • most servers function correctly
    • a financially “responsible party” for the servers ensures their integrity
    • but only clients are trusted with cleartext
  • Nomadic data
    • data is divorced from location
    • it flows freely within the storage infrastructure
    • promiscuous caching: “anywhere, anytime”
    • location matters for performance
    • dynamic system tuning through introspection

System overview
  • persistent object
    • GUID: 160-bit SHA-1 hash
      • secure identification – globally unique and unforgeable
      • about 2^80 objects before a collision becomes likely (birthday paradox)
    • floating object replicas: independent of location
    • encrypted data
  • read
    • try fast probabilistic replica search (Bloom filter)
    • fall back to slower deterministic search (Tapestry)
  • write
    • update with predicates [as in Bayou – what is Bayou?]
    • creates new version
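The 2^80 figure comes from the birthday bound: among k random 160-bit GUIDs, the probability of any collision is roughly 1 - exp(-k^2 / 2^161). The short Python sketch below evaluates that approximation. The function name is hypothetical, and it assumes SHA-1 outputs behave like uniformly random values.

```python
import math


def guid_collision_probability(num_objects: int, bits: int = 160) -> float:
    """Birthday-bound estimate of P(at least one collision) among
    num_objects GUIDs drawn uniformly from a 2**bits space:
        p ~= 1 - exp(-k**2 / (2 * 2**bits))
    Assumes hash outputs act like independent uniform values."""
    exponent = num_objects ** 2 / (2.0 * 2.0 ** bits)
    # expm1 keeps precision when the exponent is tiny (p close to 0)
    return -math.expm1(-exponent)


if __name__ == "__main__":
    # 10^14 objects, the scale target from the Vision slides
    print(guid_collision_probability(10 ** 14))
    # 2^80 objects: a collision is now fairly likely (about 39%)
    print(guid_collision_probability(2 ** 80))
```

At 10^14 objects the estimate is on the order of 10^-21, so GUID collisions are negligible at the target scale; only near 2^80 objects does the risk become significant.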
What is Bayou?

The Bayou system (Xerox PARC) is a platform of replicated, highly available, variable-consistency databases on which collaborative applications can be built. It caters to portable devices with intermittent connectivity.

System overview
  • application interface
    • sessions: sequences of reads/writes
    • session guarantees [Bayou]
      • consistency levels ranging from loose to ACID
  • active and archival forms
    • active: latest version, with update handle
    • archive: erasure-coded, read-only version
  • dynamic optimization
    • object location
    • degree of replication
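Bayou defines four session guarantees (read your writes, monotonic reads, writes follow reads, monotonic writes). As a minimal illustration, the sketch below enforces only read-your-writes. The `Session` and `Replica` classes are hypothetical, and a single integer version stands in for the version vectors a real Bayou-style system would track.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Replica:
    name: str
    version: int  # newest object version this replica has applied


@dataclass
class Session:
    """Hypothetical client session enforcing read-your-writes.

    Simplification: one integer version replaces the version
    vectors a Bayou-style system would really use."""
    last_written: int = 0

    def record_write(self, committed_version: int) -> None:
        # Remember the newest version produced by this session's writes
        self.last_written = max(self.last_written, committed_version)

    def choose_replica(self, replicas: Sequence[Replica]) -> Optional[Replica]:
        # Only a replica that has already seen our writes may serve a read
        for replica in replicas:
            if replica.version >= self.last_written:
                return replica
        return None  # caller must wait or contact a fresher replica
```

A looser session could skip the check entirely, which is the kind of trade-off between loose consistency and stronger guarantees the slide refers to.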
Naming
  • self-certifying path names (Mazières)
    • object GUID = hash of owner key and readable name
  • create hierarchies using “directory” objects
  • read restriction
    • through client encryption of data
  • write restriction, access control
    • associate ACLs with objects; servers enforce them
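A self-certifying name can be checked by anyone who knows its inputs, with no trusted name server. The sketch below derives a GUID as a SHA-1 hash of the owner's key and the readable name. The function names and the length-prefixed byte encoding are assumptions made for illustration; they are not OceanStore's actual wire format.

```python
import hashlib


def object_guid(owner_public_key: bytes, readable_name: str) -> str:
    """Hypothetical self-certifying GUID: SHA-1 over the owner's key and
    the human-readable name. Each field is length-prefixed so that two
    different (key, name) pairs cannot hash the same byte string."""
    name_bytes = readable_name.encode("utf-8")
    h = hashlib.sha1()
    for part in (owner_public_key, name_bytes):
        h.update(len(part).to_bytes(4, "big"))
        h.update(part)
    return h.hexdigest()  # 160 bits -> 40 hex characters


def verify_guid(guid: str, owner_public_key: bytes, readable_name: str) -> bool:
    """Recompute the GUID from its inputs; a forged GUID will not match."""
    return guid == object_guid(owner_public_key, readable_name)
```

Because the owner's key is part of the hash, another user cannot create an object with the same readable name and obtain the same GUID.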
Addressing
  • address an object by its GUID
    • message: GUID, random number, small predicate
    • route to the closest replica of the GUID that satisfies the predicate
    • combines data location and routing:
      • no central name service to attack
      • save one round-trip for location discovery
  • routing
    • fast, probabilistic search algorithm
    • slow, deterministic search algorithm
Routing
  • fast, probabilistic search algorithm
    • Bloom filter
      • probabilistic set membership test using bit vector
      • each set element is hashed by k hash functions; each hash sets one bit of an m-bit vector
      • filter is union (OR) of all bit vectors
    • attenuated Bloom filter
      • array of d Bloom filters
      • i-th Bloom filter is the union of the filters of all nodes i hops away
  • slow, deterministic algorithm
    • Tapestry
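The fast probabilistic search can be sketched as follows, assuming each node keeps one attenuated Bloom filter per outgoing link and forwards a query along the link whose shallowest matching level is smallest. The class names, the SHA-1-with-salt hash family, and the default sizes (m = 256 bits, k = 4 hashes) are illustrative assumptions, not values from the original system.

```python
import hashlib
from typing import Dict, Iterator, Optional, Tuple


class BloomFilter:
    """Plain Bloom filter: k hash functions set bits in an m-bit vector.
    False positives are possible; false negatives are not."""

    def __init__(self, m_bits: int = 256, k_hashes: int = 4) -> None:
        self.m = m_bits
        self.k = k_hashes
        self.bits = 0  # the bit vector, stored in a Python int

    def _positions(self, item: str) -> Iterator[int]:
        # Derive k hash functions by salting SHA-1 with the index i
        for i in range(self.k):
            digest = hashlib.sha1(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest, "big") % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

    def union(self, other: "BloomFilter") -> None:
        # Merging two filters is a bitwise OR of their vectors
        self.bits |= other.bits


class AttenuatedBloomFilter:
    """Array of `depth` Bloom filters for one outgoing link; deeper
    levels summarize nodes farther away along that link."""

    def __init__(self, depth: int, m_bits: int = 256, k_hashes: int = 4) -> None:
        self.levels = [BloomFilter(m_bits, k_hashes) for _ in range(depth)]

    def nearest_level(self, guid: str) -> Optional[int]:
        # Shallowest level at which the object might be found
        for level, bloom in enumerate(self.levels):
            if bloom.might_contain(guid):
                return level
        return None


def choose_link(links: Dict[str, AttenuatedBloomFilter], guid: str) -> Optional[str]:
    """Pick the link reporting the closest possible replica. None means
    the fast search failed and the deterministic (Tapestry) search is used."""
    best: Optional[Tuple[int, str]] = None
    for link, filt in links.items():
        level = filt.nearest_level(guid)
        if level is not None and (best is None or level < best[0]):
            best = (level, link)
    return None if best is None else best[1]
```

A false positive only costs a wasted hop, which is why a deterministic fallback is still needed for correctness.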
Addressing and routing

[Figure: deterministic and probabilistic routing]

Updates
  • Updates based on versioning and conflict resolution
    • i.e., no locking
    • update: actions with predicates
      • commit – apply action of first true predicate
      • abort – no true predicates
  • conflict resolution on encrypted data
    • possible predicates:
      • compare-version, compare-size, compare-block, search
    • possible actions:
      • replace-block, insert-block, delete-block, append
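The commit/abort rule can be sketched as an ordered list of (predicate, actions) clauses: the first clause whose predicate holds has its actions applied, and if none holds, the update aborts. All names below are hypothetical. Blocks are treated as opaque bytes, so the equality tests would work the same on ciphertext. For simplicity, the object is mutated in place and its version counter bumped, instead of storing a separate new version.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple


@dataclass
class DataObject:
    version: int = 0
    blocks: List[bytes] = field(default_factory=list)  # opaque, possibly encrypted


Predicate = Callable[[DataObject], bool]
Action = Callable[[DataObject], None]


# Predicates (hypothetical signatures)
def compare_version(expected: int) -> Predicate:
    return lambda obj: obj.version == expected


def compare_size(num_blocks: int) -> Predicate:
    return lambda obj: len(obj.blocks) == num_blocks


def compare_block(index: int, expected: bytes) -> Predicate:
    return lambda obj: index < len(obj.blocks) and obj.blocks[index] == expected


# Actions (hypothetical signatures)
def replace_block(index: int, data: bytes) -> Action:
    def act(obj: DataObject) -> None:
        obj.blocks[index] = data
    return act


def append(data: bytes) -> Action:
    return lambda obj: obj.blocks.append(data)


def apply_update(obj: DataObject,
                 clauses: Sequence[Tuple[Predicate, Sequence[Action]]]) -> bool:
    """Commit: apply the actions of the first clause whose predicate holds.
    Abort: no predicate holds, so nothing changes. No locks are taken."""
    for predicate, actions in clauses:
        if predicate(obj):
            for action in actions:
                action(obj)
            obj.version += 1  # each commit yields a new version
            return True
    return False
```

For example, a client that read version 0 can send `[(compare_version(0), [append(b"x")])]`; if another writer got there first, the predicate fails and the update aborts instead of silently overwriting.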
Archival
  • produced when objects are idle
  • use erasure codes (redundant fragmentation)
    • simplest example: parity bit
      • need any (n-1) out of n fragments
    • interleaved Reed-Solomon codes, Tornado codes
  • fragmentation improves reliability
    • “deep archival storage”
    • sweeper processes ensure replication is sustained over time
    • fragmentation improves performance
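The parity-bit case from the slide can be made concrete: split the data into k fragments, add one XOR parity fragment, and any n - 1 of the n = k + 1 fragments rebuild the missing one. This is a minimal sketch with hypothetical function names. It assumes the original length is stored alongside the fragments so the zero padding can be dropped. Reed-Solomon or Tornado codes tolerate more than one loss, which this sketch does not attempt.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode_with_parity(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal fragments (zero-padded) plus one XOR
    parity fragment, giving n = k + 1 fragments in total."""
    frag_len = max(1, -(-len(data) // k))  # ceiling division
    padded = data.ljust(frag_len * k, b"\0")
    fragments = [padded[i * frag_len:(i + 1) * frag_len] for i in range(k)]
    return fragments + [reduce(xor_bytes, fragments)]


def recover(fragments: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild a single missing fragment (marked None): the XOR of the
    other n - 1 fragments equals the missing one."""
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) > 1:
        raise ValueError("single parity tolerates only one lost fragment")
    present = [f for f in fragments if f is not None]
    if missing:
        rebuilt = list(fragments)
        rebuilt[missing[0]] = reduce(xor_bytes, present)
        return rebuilt  # type: ignore[return-value]
    return present


def decode(fragments: List[bytes], original_length: int) -> bytes:
    # Drop the parity fragment and the zero padding
    return b"".join(fragments[:-1])[:original_length]
```

Spreading the fragments over independent servers is what makes this more durable than one full copy: losing any single server loses no data.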
Erasure Codes

Simple parity bits or generalized Reed-Solomon codes can be used to implement erasure coding.

Floating Replica and Deep Archival Coding

[Figure: a floating replica made of several full copies, each holding the version history (Ver1: 0x34243, Ver2: 0x49873, Ver3: …) and conflict-resolution logs, alongside erasure-coded fragments]

Dynamic optimization (introspection)
  • observation modules
    • collect and summarize information
    • incrementally update system database
  • optimization modules
    • periodically process the observation database
    • cluster recognition: group related objects
    • replica management: maintain replica number and location
    • periodic migration: work-home-work-home…
    • maintenance: routing, dissemination, availability, durability
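The split between observation and optimization modules can be illustrated with a toy replica-management loop. All names, the per-period decay factor, and the placement thresholds below are hypothetical. The observer keeps an incrementally updated, exponentially decayed access count per (object, region), and a periodic optimization step turns that summary into a list of regions that should hold replicas.

```python
from collections import defaultdict
from typing import DefaultDict, List


class AccessObserver:
    """Hypothetical observation module: an incrementally updated,
    exponentially decayed access count per (object GUID, region)."""

    def __init__(self, decay: float = 0.5) -> None:
        self.decay = decay
        self.counts: DefaultDict[str, DefaultDict[str, float]] = defaultdict(
            lambda: defaultdict(float)
        )

    def record(self, guid: str, region: str) -> None:
        self.counts[guid][region] += 1.0

    def age(self) -> None:
        # Run once per period so stale activity fades from the summary
        for per_region in self.counts.values():
            for region in per_region:
                per_region[region] *= self.decay


def plan_replicas(observer: AccessObserver, guid: str,
                  min_replicas: int = 2, threshold: float = 3.0) -> List[str]:
    """Hypothetical optimization step: place a replica in every region whose
    decayed count reaches `threshold`, then add the next busiest regions
    until at least `min_replicas` are chosen."""
    ranked = sorted(observer.counts[guid].items(), key=lambda kv: kv[1], reverse=True)
    chosen = [region for region, count in ranked if count >= threshold]
    for region, _ in ranked:
        if len(chosen) >= min_replicas:
            break
        if region not in chosen:
            chosen.append(region)
    return chosen
```

Decaying old counts lets replicas follow shifting usage, such as the work-home migration pattern above, without any manual tuning.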