Cs435 535 internet scale applications
1 / 34

CS435/535: Internet-Scale Applications - PowerPoint PPT Presentation

  • Updated On :

CS435/535: Internet-Scale Applications. Lecture 24 Service Backend Design: Storage/State Management: Local FS and Single Networked FS 4/20/2010. Admin. Please schedule two project meetings with instructors Project report due: May 11. Recap: Data Center Trend. Mega data centers

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
Download Presentation

PowerPoint Slideshow about 'CS435/535: Internet-Scale Applications' - richard_edik

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
Cs435 535 internet scale applications l.jpg

CS435/535: Internet-Scale Applications

Lecture 24Service Backend Design: Storage/State Management: Local FS and Single Networked FS


Admin l.jpg

  • Please schedule two project meetings with instructors

    • Project report due: May 11

Recap data center trend l.jpg
Recap: Data Center Trend

  • Mega data centers

    • Microsoft DC @ Chicago500,000+ servers

  • Virtualization at DC

    • 10 VMs per server ->5 mil. addresses

  • Large-scale app provider

    • All2all, internal comm.

    • multiple app.

Recap data center design l.jpg
Recap: Data Center Design

  • How to achieve high interconnect capacity?

  • How to design large L2 domain?

  • How to achieve application isolation?

Recap support for backend processing l.jpg
Recap: Support for Backend Processing

  • Programming model and workload/processing management

    • Means for rapidly installing a service’s code on a server

    • MapReduce/Dryad, virtual machines, disk images

  • Network infrastructure

    • Means for communicating with other servers, regardless of where they are in the data center(s)

    • Data center design

  • Storage management

    • Means for a server to store and access data

    • Distributed storage systems

Outline l.jpg

  • Admin.

  • Storage system

Context l.jpg

  • Some example data that we may store at the backend servers:

    • Web pages, inverted index

    • Content videos

    • Google map

    • User profiles, documents, emails, calendars, photos

    • Facebook feed of a user

    • Inventory

    • Shopping cart

    • User/device presence state

Context8 l.jpg

  • Storage design depends on what you store and what you want, e.g.,

    • Access pattern: e.g., mostly read or mostly write

    • Item size and # of items

    • Soft state vs hard state

    • Availability vs reliability

  • Storage system design is still evolving rapidly

  • Focus on problems/basic concepts, and sketch of solutions

Outline9 l.jpg

  • Admin.

  • Storage system

    • Review of design points in a local file system

Local fs name space management l.jpg
Local FS: Name Space Management System

  • Many notions:

    • name, dir, inode, block, file descriptor (bind to inode in Unix)

  • Impact of introducing file descriptor?

    • Performance: lookup file name only once

    • Have impact on semantics under concurrency:

Local fs 3 write problems l.jpg
Local FS: “3 Write Problems” System

  • When and how are the effects of write committed on the disk?

    • persistence vs performance

    • data structure/locality (write data or log)

  • How to handle the semantics of a file system operation that requires multiple disk writes?

    • Crash recovery and FS consistency

      • E.g., add a new block: mark block as not-free, add block # to inode

      • E.g., creating a new file modifies multiple disk blocks (allocating free inode, update list of free nodes, record name/inode as a directory entry, which may need to allocate a new block…)

  • How to/if handle concurrent write?

    • Performance vs consistency (lost updates or mixed data)

Outline13 l.jpg
Outline System

  • Admin.

  • Storage system

    • Review of local file system (FS)

    • Networked FS with one server

Single server networked storage system l.jpg
Single-Server Networked Storage System System

Application Logic/Server

Generally ->

A Single Storage Server (e.g., DB Server, File Server)

Brion Vibber, CTO, Wikipedia, 2008 slides

Example nfsv2 3 architecture l.jpg
Example: NFSv2/3 Architecture System

NFS client

NFS client

  • Design objectives

    • Compatible with local Unix FS semantics if possible

    • Acceptable performance

NFS server

Nfsv3 server procedures l.jpg
NFSv3 Server Procedures System







Example nfs messages for reading a file l.jpg
Example: NFS Messages for Reading a File System

Why does NFS use file handles?

Nfsv3 read l.jpg
NFSv3 READ System

  • Input:

    • struct READ3args { nfs_fh3 file; offset3 offset; count3 count; };

  • Output

    • struct READ3resok { post_op_attr file_attributes; count3 count; bool eof; opaque data<>; };

READ must specify offset, but typically we do not include offset in local FS access. Implicationof removing offset?

NFSv3 Design Decision: Stateless Server

What is file handler?

Nfs write l.jpg

  • Design question: spectrum of persistent strategies for a client to write to server

    • Not write from client to server until close

      • Typical: called close-to-open semantics, e.g., Linux client in kernels newer than 2.4.20 supports this mode

    • Write back (not persist, asynchronous)

      • Sends to server but does not ask server to sync to disk

    • Write through (synchronous, persist to disk)

Nfs write evolution l.jpg
NFS Write Evolution System

  • NFSv2

    • struct writeargs { fhandle file; unsigned beginoffset; unsigned offset; unsigned totalcount; nfsdata data; };

    • Two client modes

      • Close-to-open

      • Synchronous write (write through)

  • NFSv3: allows asynchronous write


Nfsv3 write l.jpg
NFSv3 WRITE System

  • Input:

    • enum stable_how { UNSTABLE = 0, DATA_SYNC = 1, FILE_SYNC = 2 }; struct WRITE3args { nfs_fh3 file; offset3 offset; count3 count; stable_how stable; opaque data<>; }; struct WRITE3resok { wcc_data file_wcc; count3 count; stable_how committed; writeverf3 verf; };

Nfsv3 write22 l.jpg
NFSv3 Write System

  • NFSv3 asynchronous write

    • Typical client behavior

      • unstable writes (for fast write)

      • COMMIT a group of writes later

  • Problem in a networked setting: server may crash and lose previous data but client does not know (no fate sharing)

    • Server veri is a cookie to allow a client to detect server crash/reboot; veri can be server start time

Nfsv3 write23 l.jpg
NFSv3 Write System

  • Write returns wcc_data:

  • Allows a client to detect changes between writes to support concurrency detection

struct wcc_data { pre_op_attr before; post_op_attr after; };

Nfsv3 create l.jpg

  • Input

    • enum createmode3 { UNCHECKED = 0, GUARDED = 1, EXCLUSIVE = 2 };

    • union createhow3 switch (createmode3 mode) { case UNCHECKED: case GUARDED: sattr3 obj_attributes; case EXCLUSIVE: createverf3 verf; // EXCLUSIVE means no duplicate}; struct CREATE3args { diropargs3 where; createhow3 how; }; struct CREATE3resok { post_op_fh3 obj; post_op_attr obj_attributes; wcc_data dir_wcc; };

What is verf: A combination of a client identifier, e.g., the client network address, and a unique number generated by the client, e.g., the RPC transaction identifier.

The server creates the file and stores verf in a stable storage before return.

Why verf?

Nfs exact once semantics l.jpg
NFS: Exact-Once-Semantics System

  • An issue not in local FS is lost/repeated requests

    • A request from client to server can be lost, and client retransmits request

    • Retransmissions can cause issues with non-idempotent requests (examples?)

  • NFS ties to provide EOS using a reply cache

    • Cache entry indexed by Source address + RPC transaction XID

Eos with reply cache l.jpg
EOS with Reply Cache System

  • Problem of reply cache:

    • True and complete EOS is not possible unless the server persists the reply cache in stable storage; but persists for each request is expensive

  • NFSv3 solution: allow client veri for selected calls to enforce persistence and identify request

Summary so far issues using one network file server l.jpg
Summary So Far: Issues Using One Network File Server System

  • Design issues not in or more important than local FS

    • Stateless or stateless server

    • Loss/repeated requests and EOS

    • Asynchronous write with server crash

Discussion l.jpg
Discussion System

  • What do you want to remove from/add to NFSv3?

Exercise how to append in nfsv3 l.jpg
Exercise: How to Append in NFSv3? System

  • App: append data in buf1 to file “/d1/f1”:

    • GET handler for /d1/f1

    • GETATTR to retrieve file size size;

    • write buf1 at file offset size

  • App: append data inbuf2 to file “/d1/f1”:

    • GET handler for /d1/f1

    • GETATTR to retrieve file size size;

    • write buf2 to at file offset size


What is the result?

Solution: Use a Lock (server) to synchronize

Nfsv4 4 1 stateless to state l.jpg
NFSv4/4.1: Stateless to State System

  • Integrating locking into the NFS protocol necessarily causes it to be stateful.

  • A NFSv4 server can grant a lock of any type

    • opens,

    • byte-range locks,

    • delegations (recallable guarantees by the server to the client that other clients will not reference or modify a particular file, until the delegation is returned), and

    • layouts

  • Server assigns a unique stateid that represents a set of locks (often a single lock) for the same file, of the same type, and sharing the same ownership characteristics

  • All stateids associated with a given client ID are associated with a common lease

  • Server guarantees exclusive access to a client until lease expires or client holding the lease is notified

Discussion31 l.jpg
Discussion System

  • Pros of using lease

  • Problems of using lease

Exercise design question l.jpg
Exercise: Design Question System

  • CS signup calendar:http://cs-www.cs.yale.edu/calendars/signup/signup.cgi?event=kurose.02.04.10.ev

Exact once semantics l.jpg
Exact-Once-Semantics System

  • Non-idempotent requests.

    • rename

  • Idempotent modifying requests.

    • WRITE operation. Repeated execution of the same WRITE has the same effect as execution of that WRITE a single time. Nevertheless, enforcing EOS for WRITEs and other idempotent modifying requests is necessary to avoid data corruption. Suppose a client sends WRITE A to a noncompliant server that does not enforce EOS, and receives no response, perhaps due to a network partition. The client reconnects to the server and re-sends WRITE A. Now, the server has outstanding two instances of A. The server can be in a situation in which it executes and replies to the retry of A, while the first A is still waiting in the server's internal I/O system for some resource. Upon receiving the reply to the second attempt of WRITE A, the client believes its WRITE is done so it is free to send WRITE B, which overlaps the byte-range of A. When the original A is dispatched from the server's I/O system and executed (thus the second time A will have been written), then what has been written by B can be overwritten and thus corrupted.

  • Idempotent non-modifying requests.