
Scalability Terminology: Farms, Clones, Partitions, and Packs: RACS and RAPS

Bill Devlin, Jim Gray, Bill Laing, George Spix

Microsoft Research

Dec. 1999

Based on presentation by Hongwei Zhang, Ohio State

Outline


  • Introduction

  • Basic scalability terminology / techniques

  • Software requirements for these scalable systems

  • Cost/performance metrics to consider

  • Summary

Why Need to Scale?

  • Server systems must be able to start small

    • Small-size company (garage-scale) vs. international company (kingdom-scale)

  • System should be able to grow as demand grows, e.g.

    • eCommerce has made system growth more rapid & dynamic

    • ASPs (application service providers) also need dynamic growth

How to Scale?

  • Scale up - expanding a system by incrementally adding more devices to an existing node – CPUs, disks, NICs, etc.

    • inherently limited

  • Scale out – expanding the system by adding more nodes – convenient (computing capacity can be purchased incrementally), no theoretical scalability limit

    • slogans:

      • Buy computing by the slice

      • Build systems from CyberBricks

        • slice, CyberBrick: the fundamental building blocks of a scalable system

Basic terminology associated with scale-out

  • Ways to organize massive computation

    • Farm

    • Geoplex

  • Ways to scale a farm

    • Clone (RACS)

    • Partition (RAPS)

  • Pack

Farm, Geoplex

  • Farm - the collection of servers, applications and data at a particular site

    • features:

      • functionally specialized services (email, WWW, directory, database, etc.)

      • administered as a unit (common staff, management policies, facilities, networking)

  • Geoplex – a farm and its data duplicated at two or more sites

    • disaster protection

    • may be

      • active-active: all farms carry some of the load;

      • active-passive: one or more are hot-standbys (waiting for fail-over of corresponding active farms)

Clone


  • A replica of a server or a service

  • Allows load balancing

  • External to the clones - IP sprayer, like Cisco LocalDirector™

    • LocalDirector™ dispatches (sprays) requests to different nodes in the clone to achieve load balancing

  • Internal to the clones - IP sieve, like Network Load Balancing in Windows 2000

    • Every request arrives at every node in the clone, but each node accepts only a subset of these requests;

    • Distributed coordination among nodes
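The two load-balancing styles above can be sketched as follows. This is a minimal sketch: the node names and the hash-based accept rule are illustrative assumptions, not the actual LocalDirector or Windows 2000 NLB algorithms.

```python
# Sketch of external (sprayer) vs. internal (sieve) clone load balancing.
import hashlib

NODES = ["clone-a", "clone-b", "clone-c"]  # hypothetical clone names

class Sprayer:
    """External 'IP sprayer': one dispatcher picks a node per request."""
    def __init__(self, nodes):
        self.nodes = nodes
        self.next = 0

    def dispatch(self, request):
        node = self.nodes[self.next % len(self.nodes)]  # round-robin
        self.next += 1
        return node

def sieve_accepts(node_index, request, n_nodes=len(NODES)):
    # Internal 'IP sieve': every node sees every request, but only the node
    # whose index matches the request hash accepts it (no dispatcher needed,
    # but the nodes must agree on membership).
    h = int(hashlib.md5(request.encode()).hexdigest(), 16)
    return h % n_nodes == node_index

sprayer = Sprayer(NODES)
assignments = [sprayer.dispatch(f"req-{i}") for i in range(6)]
# assignments cycles clone-a, clone-b, clone-c, clone-a, ...
```

Either way, each request lands on exactly one clone; the difference is whether the choice is made by a box in front of the clones or by distributed agreement among them.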

  • RACS

    • DFN:

      • The collection of clones for a particular service is called a RACS (Reliable Array of Cloned Services).

    • Two types of RACS (Fig. 2)

      • Shared-nothing RACS

        • Each node duplicates all the storage locally

      • Shared-disk RACS (also called a cluster)

        • All the nodes (clones) share a common storage manager. Stateless servers at different nodes access a common backend storage server

    RACS (contd.)

    • Advantages of cloning and RACS

      • Offer both scalability and availability

        • Scalability: excellent ways to add processing power, network bandwidth, and storage bandwidth to a farm;

        • Availability: nodes can act as backups for one another: if one node fails, the others continue to offer service (probably with degraded performance)

        • Failures could be masked, if node- and application-failure detection mechanisms are integrated with the load-balancing system or with client applications

      • Easy to manage

        • Administrative operations on one service instance at one node could be replicated to all others.

    RACS (contd.)

    • Challenges

      • Shared-nothing RACS

        • not a good way to grow storage capacity: updates at one node must be applied to all other nodes’ storage

        • problematic for write-intensive services: all clones must perform all writes (no throughput improvement) and need subtle coordination

          • Shared-disk RACS could ameliorate (to some extent) this cost and complexity of cloned storage;

      • Shared-disk RACS

        • Storage server should be fault-tolerant for availability (only one copy of data)

        • Still require subtle algorithms to manage updates (such as cache validation, lock managers, transaction logs, etc.)
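The write-intensive challenge above can be made concrete with a small count of per-node work. This is a toy sketch under assumed names: the key format and placement function are inventions, but it shows why cloning gives no write throughput improvement while partitioning does.

```python
# Per-node write load for N logical updates: a shared-nothing clone fans
# every write out to every node, while a partitioned (RAPS-style) layout
# sends each write only to its owner node.

def hash_placement(key, n_nodes):
    # Toy placement rule: hash the key to pick the owning node.
    return sum(key.encode()) % n_nodes

def cloned_write_load(n_writes, n_nodes):
    # Shared-nothing RACS: every node must apply every update.
    return [n_writes] * n_nodes

def partitioned_write_load(writes, n_nodes):
    # Partitioned service: each write goes only to the node owning its key.
    load = [0] * n_nodes
    for key in writes:
        load[hash_placement(key, n_nodes)] += 1
    return load

writes = [f"mailbox-{i}" for i in range(900)]
print(cloned_write_load(len(writes), 3))       # [900, 900, 900]: 2700 ops
print(sum(partitioned_write_load(writes, 3)))  # 900 ops total, spread over 3 nodes
```

With three clones, 900 logical writes cost 2700 physical writes; partitioned, they cost 900, split across the nodes.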


    Partition

    • DFN:

      • To grow a service by duplicating the hardware and software but dividing the data among the nodes (Fig. 3).

    • Features

      • Only the software is cloned, data is divided among the nodes (unlike shared-nothing clone)

      • Transparent to applications

      • Simple partitioning has only one copy of data, thus not improving availability:

        • Geoplex to guard against loss of storage

        • More common: locally duplex (RAID 1) or parity-protect (RAID 5) the storage

    Partition (contd.)

    • Example

      • Typically, the application middleware partitions the data and workload by object:

        • Mail servers partition by mailboxes

        • Sales systems partition by customer accounts or product lines

    • Challenge

      • When a partition (node) is added, the data should be automatically repartitioned among the nodes to balance the storage and computational load.

      • The partitioning should automatically adapt as new data is added and as the load changes.
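The repartitioning challenge above can be sketched with a toy partition map for a mail service. The naming scheme and the modulo placement rule are assumptions for illustration, not a description of any real mail server.

```python
# Each mailbox is assigned to a node by hashing its name. Adding a node
# changes the owner of many mailboxes -- exactly the data movement that
# automatic repartitioning must carry out.

def owner(mailbox, n_nodes):
    return sum(mailbox.encode()) % n_nodes  # naive modulo placement

mailboxes = [f"user{i}@example.com" for i in range(1000)]

before = {m: owner(m, 3) for m in mailboxes}  # three partition nodes
after = {m: owner(m, 4) for m in mailboxes}   # a fourth node is added

moved = sum(1 for m in mailboxes if before[m] != after[m])
# With naive modulo placement roughly 3/4 of the mailboxes change owner;
# a consistent-hashing scheme would move only about 1/4 of them.
```

The design lesson is that the placement function matters: a scheme that minimizes data movement on node addition makes the automatic rebalancing the slide calls for far cheaper.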


    Pack

    • Purpose

      • To deal with hardware/software failure at a partition

    • DFN:

      • Each partition is implemented as a pack of two or more nodes that provide access to the storage (Fig. 3).

    Pack (contd.)

    • Two types of Pack

      • Shared-disk pack

        • All members of the pack may access all the disks in the pack;

        • Similar to shared-disk clone, except that the pack is serving just one part of the total database.

      • Shared-nothing pack

        • Each member of the pack serves just one partition of the disk pool during normal operation, but takes over a failed partition if that partition’s primary server fails;

    Shared-nothing Pack (contd.)

    • Two modes:

      • Active-active pack:

        • each member of the pack can have primary responsibility for one or more partitions;

        • When a node in the pack fails, the service of its partition migrates to another node of the pack.

      • Active-passive pack:

        • Just one node of the pack is actively serving requests while the other nodes are acting as hot-standbys
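The active-active failover behavior described above can be sketched in a few lines. The node and partition names are made up for the sketch; a real pack would also need failure detection and storage handoff, which are omitted here.

```python
# Minimal sketch of an active-active shared-nothing pack: each node is
# primary for some partitions and takes over a failed peer's partitions.

class Pack:
    def __init__(self, primaries):
        # primaries maps node name -> list of partitions it serves.
        self.primaries = {node: list(parts) for node, parts in primaries.items()}

    def fail(self, node):
        # Migrate the failed node's partitions to the surviving members.
        orphaned = self.primaries.pop(node)
        survivors = list(self.primaries)
        for i, part in enumerate(orphaned):
            self.primaries[survivors[i % len(survivors)]].append(part)

    def serving(self, partition):
        # Which node currently has primary responsibility for a partition?
        for node, parts in self.primaries.items():
            if partition in parts:
                return node
        return None

pack = Pack({"node-1": ["P1", "P2"], "node-2": ["P3", "P4"]})
pack.fail("node-1")
# node-2 now serves P1-P4: clients of P1 and P2 fail over instead of stopping.
```

An active-passive pack is the degenerate case where one node holds every partition and the standbys hold none until a failure.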


    RAPS

    • DFN:

      • The collection of nodes that support a packed-partitioned service is called a RAPS (Reliable Array of Partitioned Services).

    • Advantage

      • Provides both scalability and availability;

      • Better performance than RACS for write-intensive services.

    Summary (contd.)

    • Clones and RACS

      • For read-mostly applications with low consistency and modest storage requirements (<= 100 GB)

        • Web/file/security/directory servers

    • Partitions and RAPS

      • For update-intensive and large database applications (routing requests to specific partitions)

        • Email/instant messaging/ERP/record keeping


    Example

    • Multi-tier applications

      (Functional separation)

      • front-tier:

        • Web and firewall services (read mostly)

      • middle-tier:

        • File servers (read mostly)

      • data-tier:

        • SQL servers (update intensive)

    Example (contd.)

    • Load balancing and routing at each tier

      • Front-tier

        • IP-level load distribution scheme

      • Middle-tier

        • Data and process specific load steering, according to request semantics

      • Data-tier

        • Routing to the correct partition
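The per-tier routing above can be sketched as three dispatch functions. All node names and placement rules here are assumptions for illustration, not the products' actual algorithms.

```python
# Illustrative three-tier request routing: IP-level spraying at the front
# tier, semantic steering at the middle tier, partition routing at the
# data tier.

def front_tier(request_id, web_nodes, _counter=[0]):
    # Front tier: IP-level distribution (round-robin across web clones).
    node = web_nodes[_counter[0] % len(web_nodes)]
    _counter[0] += 1
    return node

def middle_tier(path, file_nodes):
    # Middle tier: steer by request semantics, here the top-level directory.
    top = path.strip("/").split("/")[0]
    return file_nodes[sum(top.encode()) % len(file_nodes)]

def data_tier(customer_id, n_partitions):
    # Data tier: route the update to the partition owning this customer key.
    return customer_id % n_partitions

web = ["web-1", "web-2"]
files = ["fs-1", "fs-2", "fs-3"]
hops = (front_tier(1, web), middle_tier("/mail/inbox/42", files), data_tier(4711, 4))
```

Note how the routing decision gets more content-aware at each tier: the front tier ignores the request body, the middle tier inspects it, and the data tier must know the partition map.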

    Software Requirements for Geoplex, Farms, RACS and RAPS

    (more of a wish list than a reflection of current tools and capabilities)

    • Be able to manage everything from a single remote console, treating RACS and RAPS as entities

      • Automated operation software to deal with “normal events” (summarizing, auditing, etc.) and to help the operator manage exceptional events (detects failures and orchestrates repair, etc.): reduce operator error (thus, enhancing site availability)

    Software requirements (contd.)

    • Both the software and hardware components must allow online maintenance and replacement

      • Tools to support versioned software deployment and staging across a site

    • Good tools to design user interfaces, services, and databases

    • Good tools to configure and then load balance the system as it evolves

    Software requirements (contd.)

    • RACS

      • Automatic replication of software and data to new nodes

      • Automatic request routing to load balance the work and to route around failures

      • Recognize repaired and new nodes

    • RAPS

      • Automatic routing of requests to nodes dedicated to serving a partition of the data (affinity routing)

      • Middleware to provide transparent partitioning and load balancing (an application-level service)

      • Manageability features similar to those of cloned systems (for packs)

    Price/Performance Metrics

    • Why need cloning/partitioning?

      • One cannot buy a single 60-billion-instructions-per-second processor or a single 100 TB server

      • So, at least some degree of cloning and partitioning is required

    • What is the right building block for a site?

    Right building block

    • Mainframe vendors

      • Mainframe is the right choice!

        • Their hardware and software offer high availability

        • Easier to manage their systems than to manage cloned PCs

      • But, mainframe prices are fairly high!

        • 3x to 10x more expensive

    Right building block (contd.)

    • Commodity servers and storage

      • Less costly to use inexpensive clones for CPU-intensive services, such as web service.

      • Commodity software is easier to manage than the traditional services that require skilled operators and administrators

    • Consensus

      • Much easier to manage homogeneous sites (all NT, all FreeBSD, etc.) than to manage heterogeneous sites

        • Stats: middleware (such as Netscape, IIS, Notes, Exchange) is where administrators spend most of their time


    Summary

    • Scalability technique

      • Replicate a service at many nodes

    • Simpler forms of replication

      • Duplicate both programs and data: RACS

    • For large databases or update-intensive services

      • Data partitioned: RAPS

      • Packs make partitions highly available

    • Against disaster

      • The entire farm is replicated to form a geoplex