Composable Consistency for Wide Area Replication

Sai Susarla

Advisor: Prof. John Carter


Overview

  • Goal: middleware support for wide area caching in diverse distributed applications

  • Key Hurdle: flexible consistency management

  • Our Solution: novel consistency interface/model - Composable Consistency

  • Benefit: supports broader set of sharing needs than existing models. Examples:

    • file systems, databases, directories, collaborative apps – wider variety than any existing consistency model can support

  • Demo Platform: novel P2P middleware data store - Swarm

Caching: Overview

  • The Idea: cache frequently used items locally for quick retrieval

  • Benefits

    • Within cluster: load-balancing, scalability

    • Across WAN: lower latency, improved throughput & availability

  • Applications

    • Data stored in one place, accessed from multiple locations

    • Examples:

      • File system: personal files, calendars, log files, software, …

      • Database: online shopping, inventory, auctions, …

      • Directory: DNS, LDAP, Active Directory, KaZaa, …

      • Collaboration: chat, multi-player games, meetings, …

Centralized Service


[Diagram: server cluster]

Proxy-based Caching


[Diagram: server cluster, caching proxy]

Caching: The Challenge

Applications have diverse consistency needs

Caching: The Problem

  • Consistency requirements are diverse

  • Caching is difficult over WANs

    • Variable delays, node failures, network partitions, admin domains, …

  • Thus, most WAN applications either:

    • Roll their own caching solution, or

    • Do not cache and live with the latency

      Can we do better?


"A consistency management system that provides

a small set of customizable consistency mechanisms

can efficiently satisfy the data sharing needs of

a wide variety of distributed applications."


  • Further Motivation

  • Application study → new taxonomy to classify application sharing needs

  • Composable Consistency (CC) model

    • Novel interface to express consistency semantics for each access

    • Small option set can express more diverse semantics

  • Evaluation

Existing Models are Inadequate

  • Provide a few packaged consistency semantics for specific needs:

    • e.g., optimistic/eventual, close-to-open, strong

  • Or, lack enough flexibility to support diverse needs

    • TACT (cannot express weak consistency or session semantics)

    • Bayou (cannot support strong consistency)

  • Or, leave consistency management burden on applications

    • e.g., Oceanstore, Globe

Existing Middleware is Inadequate

  • Existing middleware supports only specific sharing needs

    • Read-only data: PAST, BitTorrent

    • Rare write-sharing: file systems (NFS, Coda, Ficus …)

    • Master-slave (read-only) replication: storage vendors, MySQL

    • Scheduled (nightly) replication: storage and DB services

    • Read-write replication in a cluster: commercial DB vendors, Petal

Application Survey

40+ applications with diverse consistency needs

Survey Results

Found common issues, overlapping choices

  • Are parallel reads and writes OK?

  • How often should replicas synchronize?

  • Does update order matter?

  • What if some copies are inaccessible?

  • Can we exploit this commonality?

Composable Consistency: Novel interface to express consistency semantics

  • Concurrency control

  • Replica synchronization

  • Failure handling

  • View isolation

  • Update visibility

Example: Close-to-open (AFS)

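  • Concurrency control: allow parallel reads and writes

  • Replica synchronization: latest data guaranteed at open()

  • Failure handling: fail access when partitioned

  • View isolation: accept remote updates only at open()

  • Update visibility: reveal local updates to others only on close()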

Example: Eventual Consistency (Bayou)

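  • Concurrency control: allow parallel reads and writes

  • Replica synchronization: sync copies at most once every 10 minutes

  • Failure handling: syncing should not block or fail operations

  • View isolation: accept remote updates as they arrive

  • Update visibility: reveal local updates to others as they happen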
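
The two examples above each pick one setting along the same five dimensions. As a rough sketch of that idea, the Python below models an option vector as a small record and writes both examples as values of it. All names here (`CCOptions`, the enum members, `differing_options`) are hypothetical and are not Swarm's actual interface; encoding "latest data at open()" as a staleness bound of 0 seconds and "at most once every 10 minutes" as 600 seconds is likewise an assumption.

```python
from dataclasses import dataclass, fields
from enum import Enum


class Concurrency(Enum):
    CONCURRENT = "allow parallel reads and writes"
    EXCLUSIVE = "serialize conflicting accesses"


class FailureHandling(Enum):
    FAIL_WHEN_PARTITIONED = "fail access when partitioned"
    NEVER_BLOCK = "syncing must not block or fail operations"


class Isolation(Enum):
    NONE = "accept remote updates as they arrive"
    SESSION = "accept remote updates only at open()"


class Visibility(Enum):
    IMMEDIATE = "reveal local updates as they happen"
    ON_CLOSE = "reveal local updates only on close()"


@dataclass(frozen=True)
class CCOptions:
    """Hypothetical option vector: one field per consistency dimension."""
    concurrency: Concurrency
    max_staleness_s: float  # replica synchronization (assumed encoding)
    failure: FailureHandling
    isolation: Isolation
    visibility: Visibility


# Close-to-open (AFS-style), as listed on the slide.
CLOSE_TO_OPEN = CCOptions(
    Concurrency.CONCURRENT, 0.0, FailureHandling.FAIL_WHEN_PARTITIONED,
    Isolation.SESSION, Visibility.ON_CLOSE,
)

# Eventual consistency (Bayou-style), as listed on the slide.
EVENTUAL = CCOptions(
    Concurrency.CONCURRENT, 600.0, FailureHandling.NEVER_BLOCK,
    Isolation.NONE, Visibility.IMMEDIATE,
)


def differing_options(a: CCOptions, b: CCOptions) -> list:
    """Return the names of the dimensions on which two vectors differ."""
    return [f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)]
```

Comparing the two vectors shows that they share the concurrency setting and differ on the other four dimensions, which is the sense in which a model is "composed" from independent choices.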

Handling Conflicting Semantics

  • What if two sessions have different semantics?

    • If conflicting, block a session until conflict goes away (serialize)

    • Otherwise, allow them in parallel

  • Simple rules for checking conflicts (conflict matrix)

  • Examples:

    • Exclusive write vs. exclusive read vs. eventual write: serialize

    • Write-immediate vs. session-grain isolation: serialize

    • Write-immediate vs. eventual read: no conflict
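
A conflict check of this kind can be sketched as a pairwise predicate over session modes. The rules below are a toy chosen only so that the three examples above come out as stated; they are an assumption, not the actual conflict matrix, and `SessionMode` and `conflicts` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionMode:
    """Hypothetical summary of the options that matter for conflicts."""
    writes: bool             # the session may write
    exclusive: bool = False  # it asked for exclusive access
    immediate: bool = False  # a writer that reveals updates as they happen
    isolated: bool = False   # session-grain isolation: no remote updates mid-session


def conflicts(a: SessionMode, b: SessionMode) -> bool:
    """Return True if the two sessions must be serialized (toy rules)."""
    # Assumed rule 1: exclusive access cannot coexist with another writer.
    if (a.exclusive and b.writes) or (b.exclusive and a.writes):
        return True
    # Assumed rule 2: an immediate writer would break the other session's isolation.
    if (a.writes and a.immediate and b.isolated) or (b.writes and b.immediate and a.isolated):
        return True
    return False


exclusive_write = SessionMode(writes=True, exclusive=True)
exclusive_read = SessionMode(writes=False, exclusive=True)
eventual_write = SessionMode(writes=True)
eventual_read = SessionMode(writes=False)
write_immediate = SessionMode(writes=True, immediate=True)
isolated_session = SessionMode(writes=False, isolated=True)
```

A mediator would block the later session while `conflicts` returns True for any pair involving it, and admit it in parallel otherwise.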

Using Composable Consistency

  • Perform data access within a session, e.g.,

    • session_id = open(object, CC_option_vector);

    • read(session_id, buf);

    • write(session_id, buf);

      OR, update(session_id, incr_counter(value));

    • close(session_id);

  • Specify consistency semantics per-session at open() via the CC option vector

    • Concurrency control, replica synchronization, failure handling, view isolation and update visibility.

  • System enforces semantics by mediating each access
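
To make the session pattern concrete, here is a minimal single-process sketch in which a store mediates every read and write according to the options given at open(). `ToyStore` and its dictionary-based options are hypothetical stand-ins; only two of the five dimensions (view isolation and update visibility) are modeled, and nothing here reflects how Swarm is implemented.

```python
import itertools


class ToyStore:
    """In-memory stand-in for a store that mediates each access (hypothetical)."""

    def __init__(self):
        self._data = {}       # object name -> committed value
        self._sessions = {}   # session id -> per-session state
        self._ids = itertools.count(1)

    def open(self, obj, options):
        sid = next(self._ids)
        self._sessions[sid] = {
            "obj": obj,
            "options": options,
            "snapshot": self._data.get(obj),  # view fixed at open() if isolated
            "pending": None,                  # buffered write, if any
            "dirty": False,
        }
        return sid

    def read(self, sid):
        s = self._sessions[sid]
        if s["dirty"]:
            return s["pending"]               # a session sees its own writes
        if s["options"].get("isolation") == "session":
            return s["snapshot"]              # ignore remote updates mid-session
        return self._data.get(s["obj"])

    def write(self, sid, value):
        s = self._sessions[sid]
        if s["options"].get("visibility") == "on_close":
            s["pending"], s["dirty"] = value, True   # hold back until close()
        else:
            self._data[s["obj"]] = value             # reveal immediately
            s["snapshot"] = value

    def close(self, sid):
        s = self._sessions.pop(sid)
        if s["dirty"]:
            self._data[s["obj"]] = s["pending"]      # reveal buffered write
```

For example, a writer opened with `{"isolation": "session", "visibility": "on_close"}` keeps its update private until close(), while a reader opened with `{"isolation": "none", "visibility": "immediate"}` on the same object sees the new value as soon as the writer closes.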

Composable Consistency Benefits

  • Powerful: Small option set can express diverse semantics

  • Customizable: allows different semantics for each access

  • Effective: amenable to efficient WAN implementation

  • Benefit to middleware

    • Can provide read-write caching to a broader set of apps.

  • Benefit for an application

    • Can customize consistency to diverse and varying sharing needs

    • Can simultaneously enforce different semantics on the same data for different users

Swarm: A Middleware Providing CC

  • Swarm:

    • Shared file interface with CC options

    • Location-transparent page-grained file access

    • Aggressive P2P caching

    • Dynamic cycle-free replica hierarchy per file

  • Prototype implements CC (except causality & atomicity)

    • Per-file, per-replica and per-session consistency

  • Network economy (exploit nearby replicas)

  • Contention-aware replication (RPC vs caching)

  • Multi-level leases for failure resilience

Client-server BerkeleyDB Application

[Diagram: app users; app server with app logic]




BerkeleyDB Application using Swarm

[Diagram: app users; app server with app logic, RDB wrapper, RDB plugin, and Swarm server]





Caching Proxy App Server using Swarm

[Diagram: app users; two app servers, each with app logic, RDB wrapper, RDB plugin, and Swarm server]









Swarm-based Applications

  • SwarmDB: Transparent BerkeleyDB database replication across WAN

  • SwarmFS: wide area P2P read-write file system

  • SwarmProxy: Caching WAN proxies for an auction service with strong consistency

  • SwarmChat: Efficient message/event dissemination

    No single model can support the sharing needs of all these applications

SwarmDB: Replicated BerkeleyDB

  • Replication support built as wrapper library

  • Uses unmodified BerkeleyDB binary

  • Evaluated with five consistency flavors:

    • Lock-based updates, eventual reads

    • Master-slave writes, eventual reads

    • Close-to-open reads, writes

    • Staleness-bounded reads, writes

    • Eventual reads, writes

      Compared against BerkeleyDB-provided RPC version

  • Order-of-magnitude throughput gains over RPC by relaxing consistency
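
One of the flavors above, staleness-bounded reads, can be illustrated with a cache that refetches only when its copy is older than a bound. This is a sketch under assumptions: `BoundedStaleCache`, its `fetch` callable and the injectable `clock` are hypothetical, and it does not describe how SwarmDB enforces staleness.

```python
class BoundedStaleCache:
    """Serve a local copy unless it is older than max_staleness_s (hypothetical)."""

    def __init__(self, fetch, max_staleness_s, clock):
        self.fetch = fetch                  # callable returning the current remote value
        self.max_staleness_s = max_staleness_s
        self.clock = clock                  # callable returning the current time in seconds
        self._value = None
        self._fetched_at = None             # time of the last fetch, None if never fetched

    def read(self):
        now = self.clock()
        too_old = (
            self._fetched_at is None
            or now - self._fetched_at > self.max_staleness_s
        )
        if too_old:
            # Pay the remote round trip only when the bound would be violated.
            self._value = self.fetch()
            self._fetched_at = now
        return self._value
```

With a bound of 0.010 seconds, reads within 10 msec of the last fetch are served locally, which is why loosening the bound trades freshness for fewer wide area round trips.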

SwarmDB Evaluation

  • BerkeleyDB B-tree index replicated across N nodes

  • Nodes linked via 1 Mbps links to a common router, with 40 ms RTT to each other

  • Full-speed workload

    • 30% Writes: inserts, deletes, updates

    • 70% Reads: lookups, cursor scans

  • Varied # replicas from 1 to 48
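
The workload mix above can be approximated with a small generator. This is only a sketch: the 30/70 split and the operation names come from the slide, while the uniform choice among operations within each class, the seed, and the function name `make_workload` are assumptions.

```python
import random

WRITE_OPS = ["insert", "delete", "update"]
READ_OPS = ["lookup", "cursor_scan"]


def make_workload(n_ops, write_fraction=0.3, seed=0):
    """Return a reproducible list of operation names with the given write share."""
    rng = random.Random(seed)   # private generator so runs are repeatable
    ops = []
    for _ in range(n_ops):
        if rng.random() < write_fraction:
            ops.append(rng.choice(WRITE_OPS))   # assumed uniform within writes
        else:
            ops.append(rng.choice(READ_OPS))    # assumed uniform within reads
    return ops
```

Each replica in an experiment could be driven by its own seed so that the runs are repeatable but not identical across nodes.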

SwarmDB Write Throughput/replica

[Plot: write throughput per replica. Series: local SwarmDB server; 20 msec stale; 10 msec stale; RPC over WAN; master-slave writes, eventual reads; locking writes, eventual reads]

SwarmDB Query Throughput/replica

[Plot: query throughput per replica. Series: local SwarmDB server; 10 msec stale; RPC over WAN]


SwarmDB Results

  • Customizing consistency can improve WAN caching performance dramatically

  • App can enforce diverse semantics by simply modifying CC options

  • Updates & queries with different semantics possible

SwarmFS Distributed File System

  • Sample SwarmFS path

    • /swarmfs/swid:0x1234.2/home/sai/thesis.pdf

  • Performance Summary

    • Achieves >80% of local FS performance on Andrew Benchmark

    • More network-efficient than Coda for wide area access

    • Correctly supports fine-grain collaboration across WANs

    • Correctly supports file locking for RCS repository sharing

SwarmFS vs. Coda Roaming File Access

Network Economy

Coda-s always gets files from distant U1.

SwarmFS gets files from nearest copy.

SwarmFS vs. Coda Roaming File Access

P2P protocol more efficient

Coda-s writes files through to U1 for close-to-open semantics.

Swarm’s P2P pull-based protocol avoids this.

Hence, SwarmFS performs better for temporary files.

SwarmFS vs. Coda Roaming File Access

Eventual consistency inadequate

Coda-w: compile errors

  • Coda-w behaves incorrectly

    • 'make' skipped files

    • Linker found corrupt object files

  • Trickle reintegration pushed huge obj files to U1, clogging network link.

Evaluation Summary

  • SwarmDB: gains of customizable consistency

  • SwarmFS: network economy under write-sharing

  • SwarmProxy: strong consistency over WANs under varying contention

  • SwarmChat: update dissemination in real-time

    By employing CC, Swarm middleware data store can support diverse app needs effectively

Related Work

  • Flexible consistency models/interfaces

    • Munin, WebFS, Fluid Replication, TACT

  • Wide area caching solutions/middleware

    • File systems and data stores: AFS, Coda, Ficus, Pangaea, Bayou, Thor, …

    • Peer-to-peer systems: Napster, PAST, Farsite, Freenet, Oceanstore, BitTorrent, …

Future Work

  • Security and authentication

  • Fault-tolerance via first-class replication

Thesis Contributions

  • Survey of sharing needs of numerous applications

  • New taxonomy to classify application sharing needs

  • Composable consistency model based on taxonomy

  • Demonstrated CC model is practical and supports diverse applications across WANs effectively


  • Can a storage service provide effective WAN caching support for diverse distributed applications? YES

  • Key enabler: a novel, flexible consistency interface called Composable Consistency

  • Allows an application to customize consistency to diverse and varying sharing needs

  • Allows middleware to serve a broader set of apps effectively

Composing Master-slave

  • Master-slave replication

    • serialize updates

      • Concurrent mode writes (WR)

      • Serial update ordering (apply updates at central master)

    • eventual consistency for queries

      • Options mentioned earlier

  • Use: MySQL DB read-only replication across WANs
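
The composition above (updates ordered at a central master, eventual consistency for queries) can be sketched in a few lines. `Master` and `Replica` are hypothetical single-process classes; updates are modeled as plain functions on a value, and replicas catch up only when `sync()` is called. This illustrates the ordering idea, not Swarm's protocol.

```python
class Master:
    """Central master: fixes one serial order for all updates (hypothetical)."""

    def __init__(self):
        self.log = []              # update functions, in the order they were accepted

    def submit(self, update_fn):
        self.log.append(update_fn)
        return len(self.log)       # sequence number assigned to this update


class Replica:
    """Read replica that applies the master's log lazily (hypothetical)."""

    def __init__(self, master, initial=0):
        self.master = master
        self.value = initial
        self.applied = 0           # number of log entries applied so far

    def sync(self):
        # Apply only the updates not yet seen, in the master's order.
        for update_fn in self.master.log[self.applied:]:
            self.value = update_fn(self.value)
        self.applied = len(self.master.log)

    def query(self):
        return self.value          # may be stale until the next sync()
```

Because every replica replays the same log in the same order, two replicas that have synced to the same point always agree, even though a query between syncs can return an older value.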

A Swarm-based Chat Room


Sample chat client code:

    callback(handle, newdata)

    handle = sw_open(kid, "a+");
    sw_snoop(handle, callback);
    while (!done) {
        ...
        sw_write(handle, newdata);
    }

Chat transcript: WR mode, 0-second soft staleness, immediate visibility, no isolation

[Diagram: update propagation path]
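
The snoop-and-write pattern in the sample client can be mimicked in a single process. The sketch below is an assumption-laden stand-in: `ToyChatRoom`, `snoop` and `write` are hypothetical Python names that loosely mirror the sample's `sw_snoop` and `sw_write`, and there is no networking or replication here.

```python
class ToyChatRoom:
    """In-process stand-in for a shared, append-only chat transcript (hypothetical)."""

    def __init__(self):
        self.transcript = []
        self._callbacks = []

    def snoop(self, callback):
        # Register a callback to be invoked with each future update.
        self._callbacks.append(callback)

    def write(self, newdata):
        # Append the message, then notify every registered client right away,
        # which models immediate visibility with no isolation.
        self.transcript.append(newdata)
        for callback in self._callbacks:
            callback(self, newdata)


room = ToyChatRoom()
alice_screen, bob_screen = [], []
room.snoop(lambda handle, newdata: alice_screen.append(newdata))
room.snoop(lambda handle, newdata: bob_screen.append(newdata))
room.write("hello")
room.write("anyone there?")
```

After the two writes, both clients have received both messages in the order they were written.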