

Composable Consistency for Wide Area Replication

Sai Susarla

Advisor: Prof. John Carter



Overview

  • Goal: middleware support for wide area caching in diverse distributed applications

  • Key Hurdle: flexible consistency management

  • Our Solution: novel consistency interface/model - Composable Consistency

  • Benefit: supports a broader set of sharing needs than existing models. Examples:

    • file systems, databases, directories, collaborative apps – a wider variety than any single existing consistency model can support

  • Demo Platform: novel P2P middleware data store - Swarm


Caching: Overview

  • The Idea: cache frequently used items locally for quick retrieval

  • Benefits

    • Within cluster: load-balancing, scalability

    • Across WAN: lower latency, improved throughput & availability

  • Applications

    • Data stored in one place, accessed from multiple locations

    • Examples:

      • File system: personal files, calendars, log files, software, …

      • Database: online shopping, inventory, auctions, …

      • Directory: DNS, LDAP, Active Directory, KaZaa, …

      • Collaboration: chat, multi-player games, meetings, …


Centralized Service


[Figure: a centralized service running on a single server cluster]





Proxy-based Caching


[Figure: caching proxies placed near users, backed by the central server cluster]


Caching: The Challenge

Applications have diverse consistency needs


Caching: The Problem

  • Consistency requirements are diverse

  • Caching is difficult over WANs

    • Variable delays, node failures, network partitions, admin domains, …

  • Thus, most WAN applications either:

    • Roll their own caching solution, or

    • Do not cache and live with the latency

      Can we do better?



"A consistency management system that provides

a small set of customizable consistency mechanisms

can efficiently satisfy the data sharing needs of

a wide variety of distributed applications."



  • Further Motivation

  • Application study → new taxonomy to classify application sharing needs

  • Composable Consistency (CC) model

    • Novel interface to express consistency semantics for each access

    • Small option set can express more diverse semantics

  • Evaluation


Existing Models are Inadequate

  • Provide a few packaged consistency semantics for specific needs:

    • e.g., optimistic/eventual, close-to-open, strong

  • Or, lack enough flexibility to support diverse needs

    • TACT (cannot express weak consistency or session semantics)

    • Bayou (cannot support strong consistency)

  • Or, leave consistency management burden on applications

    • e.g., Oceanstore, Globe


Existing Middleware is Inadequate

  • Existing middleware supports only specific sharing needs:

    • Read-only data: PAST, BitTorrent

    • Rare write-sharing: file systems (NFS, Coda, Ficus …)

    • Master-slave (read-only) replication: storage vendors, MySQL

    • Scheduled (nightly) replication: storage and DB services

    • Read-write replication in a cluster: commercial DB vendors, Petal


Application Survey

40+ applications with diverse consistency needs


Survey Results

Found common issues, overlapping choices

  • Are parallel reads and writes OK?

  • How often should replicas synchronize?

  • Does update order matter?

  • What if some copies are inaccessible?

  • Can we exploit this commonality?


Composable Consistency: A Novel Interface to Express Consistency Semantics

Concurrency control

Replica synchronization

Failure handling

View Isolation

Update Visibility
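
To make the interface concrete, the sketch below (in C) shows how these five dimensions could be packaged into a per-session option vector. The type, field, and constant names are illustrative assumptions, not Swarm's actual declarations.

/* Illustrative CC option vector: one field per consistency dimension. */
typedef struct cc_options {
    enum { CC_EXCLUSIVE, CC_CONCURRENT } concurrency;                     /* concurrency control */
    int staleness_ms;                                                     /* replica synchronization: tolerable staleness */
    enum { CC_FAIL_IF_PARTITIONED, CC_PROCEED_DISCONNECTED } on_failure;  /* failure handling */
    enum { CC_ISOLATE_SESSION, CC_NO_ISOLATION } view;                    /* view isolation: when remote updates are seen */
    enum { CC_VISIBLE_ON_CLOSE, CC_VISIBLE_IMMEDIATELY } visibility;      /* update visibility */
} cc_options;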


Example: Close-to-open (AFS)

Allow parallel reads and writes

Latest data guaranteed at open()

Fail access when partitioned

Accept remote updates only at open()

Reveal local updates to others only on close()
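
With the illustrative option vector sketched earlier, close-to-open semantics might be written roughly as follows (all names remain hypothetical):

/* Close-to-open (AFS-style) expressed with the illustrative options. */
cc_options close_to_open = {
    .concurrency  = CC_CONCURRENT,             /* allow parallel reads and writes */
    .staleness_ms = 0,                         /* latest data guaranteed at open() */
    .on_failure   = CC_FAIL_IF_PARTITIONED,    /* fail the access when partitioned */
    .view         = CC_ISOLATE_SESSION,        /* accept remote updates only at open() */
    .visibility   = CC_VISIBLE_ON_CLOSE        /* reveal local updates only on close() */
};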


Example: Eventual Consistency (Bayou)

Allow parallel reads and writes

Sync copies at most once every 10 minutes

Syncing should not block or fail operations

Accept remote updates as they arrive

Reveal local updates to others as they happen
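
The same illustrative fields can capture Bayou-style eventual consistency; again, this is a sketch rather than the actual interface:

/* Eventual consistency (Bayou-style) with the illustrative options. */
cc_options eventual = {
    .concurrency  = CC_CONCURRENT,             /* allow parallel reads and writes */
    .staleness_ms = 10 * 60 * 1000,            /* sync copies at most once every 10 minutes */
    .on_failure   = CC_PROCEED_DISCONNECTED,   /* syncing must not block or fail operations */
    .view         = CC_NO_ISOLATION,           /* accept remote updates as they arrive */
    .visibility   = CC_VISIBLE_IMMEDIATELY     /* reveal local updates as they happen */
};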


Handling Conflicting Semantics

  • What if two sessions have different semantics?

    • If conflicting, block a session until conflict goes away (serialize)

    • Otherwise, allow them in parallel

  • Simple rules for checking conflicts (conflict matrix)

  • Examples:

    • Exclusive write vs. exclusive read vs. eventual write: serialize

    • Write-immediate vs. session-grain isolation: serialize

    • Write-immediate vs. eventual read: no conflict
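
A minimal sketch of the admission check, assuming the illustrative cc_options type from earlier and a hypothetical cc_conflicts() predicate that encodes the conflict matrix:

typedef struct cc_session {
    cc_options         opts;
    struct cc_session *next;
} cc_session;

enum { ADMIT_IN_PARALLEL, BLOCK_UNTIL_COMPATIBLE };

/* Conflict-matrix lookup (table contents omitted): returns nonzero if the two
 * option vectors conflict, e.g. exclusive write vs. exclusive read. */
int cc_conflicts(const cc_options *a, const cc_options *b);

/* Admit a new session only if it conflicts with no active session on the object. */
int admit_session(const cc_session *active, const cc_options *opts)
{
    for (const cc_session *s = active; s != NULL; s = s->next)
        if (cc_conflicts(opts, &s->opts))
            return BLOCK_UNTIL_COMPATIBLE;   /* serialize behind the conflicting session */
    return ADMIT_IN_PARALLEL;                /* compatible semantics run concurrently */
}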


Using Composable Consistency

  • Perform data access within a session e.g.,

    • session_id = open(object, CC_option_vector);

    • read(session_id, buf);

    • write(session_id, buf);

      OR, update(session_id, incr_counter(value));

    • close(session_id);

  • Specify consistency semantics per-session at open() via the CC option vector

    • Concurrency control, replica synchronization, failure handling, view isolation and update visibility.

  • System enforces semantics by mediating each access
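
Putting the calls together, a session might read as below. This follows the slide's pseudo-API (it is not compilable against a real header); account_object is a hypothetical shared object and close_to_open is the illustrative option vector sketched earlier.

/* One access session: semantics are chosen at open() and enforced on every call. */
char buf[4096];
int sid = open(account_object, close_to_open);   /* pick consistency for this session */
read(sid, buf);                                  /* read mediated by those semantics */
update(sid, incr_counter(100));                  /* ship the operation, not raw bytes */
close(sid);                                      /* local updates revealed per the options */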


Composable Consistency Benefits

  • Powerful: Small option set can express diverse semantics

  • Customizable: allows different semantics for each access

  • Effective: amenable to efficient WAN implementation

  • Benefit to middleware

    • Can provide read-write caching to a broader set of apps.

  • Benefit for an application

    • Can customize consistency to diverse and varying sharing needs

    • Can simultaneously enforce different semantics on the same data for different users




Swarm: A Middleware Providing CC

  • Swarm:

    • Shared file interface with CC options

    • Location-transparent page-grained file access

    • Aggressive P2P caching

    • Dynamic cycle-free replica hierarchy per file

  • Prototype implements CC (except causality & atomicity)

    • Per-file, per-replica and per-session consistency

  • Network economy (exploit nearby replicas)

  • Contention-aware replication (RPC vs caching)

  • Multi-level leases for failure resilience


Client-server BerkeleyDB Application

[Figure: app users connect to a central app server that runs the app logic and a BerkeleyDB database]


BerkeleyDB Application using Swarm

[Figure: app users connect to the app server; the app logic calls BerkeleyDB through an RDB wrapper, backed by a local Swarm server with an RDB plugin]

Caching Proxy App Server using Swarm

[Figure: two app server sites, each running app logic, an RDB wrapper, and a Swarm server with an RDB plugin; the Swarm servers keep the replicated database consistent across the WAN]

Swarm-based Applications

  • SwarmDB: Transparent BerkeleyDB database replication across WAN

  • SwarmFS: wide area P2P read-write file system

  • SwarmProxy: Caching WAN proxies for an auction service with strong consistency

  • SwarmChat: Efficient message/event dissemination

    No single model can support the sharing needs of all these applications


SwarmDB: Replicated BerkeleyDB

  • Replication support built as wrapper library

  • Uses unmodified BerkeleyDB binary

  • Evaluated with five consistency flavors:

    • Lock-based updates, eventual reads

    • Master-slave writes, eventual reads

    • Close-to-open reads, writes

    • Staleness-bounded reads, writes

    • Eventual reads, writes

      Compared against BerkeleyDB-provided RPC version

  • Order-of-magnitude throughput gains over RPC by relaxing consistency
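
One way to picture the wrapper is the sketch below: the unmodified local BerkeleyDB performs the B-tree operation, while a Swarm session governs how the update propagates. The swarmdb_put, sw_open, sw_log_update, and sw_close names and signatures are assumptions for illustration, not the actual SwarmDB code.

#include <db.h>    /* unmodified BerkeleyDB */

/* Apply a put locally and hand the update to Swarm for propagation under the
 * caller's CC options (e.g. locking writes, master-slave writes, or eventual). */
int swarmdb_put(DB *local_db, int repl_object, cc_options opts, DBT *key, DBT *data)
{
    int sid = sw_open(repl_object, &opts);                    /* start a consistency session */
    int ret = local_db->put(local_db, NULL, key, data, 0);    /* B-tree update, local only */
    if (ret == 0)
        sw_log_update(sid, key, data);                        /* let Swarm propagate the update */
    sw_close(sid);                                            /* session ends; semantics enforced */
    return ret;
}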


SwarmDB Evaluation

  • BerkeleyDB B-tree index replicated across N nodes

  • Nodes linked via 1 Mbps links to a common router, with 40 ms RTT to each other

  • Full-speed workload

    • 30% Writes: inserts, deletes, updates

    • 70% Reads: lookups, cursor scans

  • Varied # replicas from 1 to 48


SwarmDB Write Throughput/replica

[Graph: SwarmDB write throughput per replica vs. number of replicas, comparing a local SwarmDB server, 20 msec and 10 msec staleness bounds, master-slave writes with eventual reads, locking writes with eventual reads, and RPC over the WAN]


SwarmDB Query Throughput/replica

[Graph: SwarmDB query throughput per replica vs. number of replicas, comparing a local SwarmDB server, a 10 msec staleness bound, and RPC over the WAN]



SwarmDB Results

  • Customizing consistency can improve WAN caching performance dramatically

  • App can enforce diverse semantics by simply modifying CC options

  • Updates & queries with different semantics possible


SwarmFS Distributed File System

  • Sample SwarmFS path

    • /swarmfs/swid:0x1234.2/home/sai/thesis.pdf

  • Performance Summary

    • Achieves >80% of local FS performance on Andrew Benchmark

    • More network-efficient than Coda for wide area access

    • Correctly supports fine-grain collaboration across WANs

    • Correctly supports file locking for RCS repository sharing


SwarmFS: Distributed Development


Replica Topology


SwarmFS vs. Coda Roaming File Access

Network Economy

Coda-s always gets files from distant U1.

SwarmFS gets files from the nearest copy.


SwarmFS vs. Coda Roaming File Access

P2P protocol more efficient

Coda-s writes files through to U1 for close-to-open semantics.

Swarm’s P2P pull-based protocol avoids this.

Hence, SwarmFS performs better for temporary files.


SwarmFS vs. Coda Roaming File Access

Eventual consistency inadequate

Coda-w: compile errors

  • Coda-w behaves incorrectly

    • `make’ skipped files

    • the linker found corrupt object files

  • Trickle reintegration pushed huge object files to U1, clogging the network link.


Evaluation Summary

  • SwarmDB: gains of customizable consistency

  • SwarmFS: network economy under write-sharing

  • SwarmProxy: strong consistency over WANs under varying contention

  • SwarmChat: update dissemination in real-time

    By employing CC, the Swarm middleware data store can support diverse application needs effectively


Related Work

  • Flexible consistency models/interfaces

    • Munin, WebFS, Fluid Replication, TACT

  • Wide area caching solutions/middleware

    • File systems and data stores: AFS, Coda, Ficus, Pangaea, Bayou, Thor, …

    • Peer-to-peer systems: Napster, PAST, Farsite, Freenet, Oceanstore, BitTorrent, …


Future Work

  • Security and authentication

  • Fault-tolerance via first-class replication


Thesis Contributions

  • Survey of sharing needs of numerous applications

  • New taxonomy to classify application sharing needs

  • Composable consistency model based on taxonomy

  • Demonstrated CC model is practical and supports diverse applications across WANs effectively



Conclusions

  • Can a storage service provide effective WAN caching support for diverse distributed applications? YES

  • Key enabler: a novel, flexible consistency interface called Composable Consistency

  • Allows an application to customize consistency to diverse and varying sharing needs

  • Allows middleware to serve a broader set of apps effectively


SwarmDB Control Flow


Composing Master-slave

  • Master-slave replication

    • serialize updates

      • Concurrent mode writes (WR)

      • Serial update ordering (apply updates at central master)

    • eventual consistency for queries

      • Options mentioned earlier

  • Use case: MySQL read-only replication across WANs
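
A rough sketch of the composition using the illustrative option vector from earlier; fields not mentioned on this slide are arbitrary defaults, and serial ordering comes from funneling every update through the master replica rather than from a dedicated field.

/* Master-slave composed from CC options (illustrative names). */
void compose_master_slave(cc_options *write_opts, cc_options *read_opts)
{
    /* Updates: concurrent-mode writes (WR), serialized by applying them at the master. */
    write_opts->concurrency  = CC_CONCURRENT;
    write_opts->staleness_ms = 0;                     /* push each write to the master promptly */
    write_opts->on_failure   = CC_FAIL_IF_PARTITIONED;
    write_opts->view         = CC_NO_ISOLATION;
    write_opts->visibility   = CC_VISIBLE_IMMEDIATELY;

    /* Queries: the eventual-consistency options shown earlier. */
    *read_opts = eventual;
}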


Clustered BerkeleyDB


BerkeleyDB Proxy using Swarm


A Swarm-based Chat Room


/* Upcall: Swarm delivers newly written chat data through this callback. */
void callback(int handle, char *newdata) { display_message(newdata); }   /* display_message: hypothetical helper */

handle = sw_open(kid, "a+");        /* open the shared chat log */
sw_snoop(handle, callback);         /* register for notification of remote updates */
while (!done) {
    newdata = get_user_message();   /* hypothetical helper: read the user's next message */
    sw_write(handle, newdata);      /* append it; Swarm propagates per the session's CC options */
}

Sample Chat client code

Chat transcript: WR mode, 0 second soft staleness, immediate visibility, no isolation

Update propagation path

  • Login