Distributed File Systems

Presentation Transcript
Distributed Systems
  • Introduction – advantages of distributed systems
  • 15. Structures – network types, design
  • 16. File Systems – naming, cache updating
  • 17. Coordination – event ordering, mutual exclusion

Multi-CPU Systems

[Figure: three hardware organizations: a shared-memory multiprocessor (CPUs sharing one memory), a message-passing multicomputer (CPU+memory nodes linked by an interconnect), and a wide-area distributed system (complete machines connected over the Internet).]

Tanenbaum, Modern Operating Systems, 2nd Ed., p. 505

Examples of Multi-CPU Systems
  • Multiprocessors – quad CPU PC
  • Multicomputer – 512 nodes in a room working on pharmaceutical modelling
  • Distributed System – Thousands of machines loosely cooperating over the Internet

Tanenbaum, p. 549

Interconnect Topologies

[Figure: interconnect topologies: single switch, ring, grid, double torus, cube, and hypercube, annotated with which topology has the smallest diameter and which has the most links.]

Tanenbaum, Modern Operating Systems, 2nd Ed., p. 528

Chapter 16: Distributed File Systems
  • Background
  • Naming and Transparency
  • Remote File Access
  • Stateful versus Stateless Service
  • File Replication
  • Example Systems
Background
  • Distributed file system (DFS) – a distributed implementation of the classical time-sharing model of a file system, where multiple users share files and storage resources.
  • A DFS manages a set of dispersed storage devices
    • Overall storage space is composed of different, remotely located, smaller storage spaces.
  • A component unit is the smallest set of files that can be stored on a single machine, independently of other units.
    • There is usually a correspondence between constituent storage spaces and sets of files.
DFS Parts
  • Service – software entity running on one or more machines and providing a particular type of function to a priori unknown clients.
  • Server – service software running on a single machine.
  • Client – process that can invoke a service using a set of operations that forms its client interface.

So a file system provides file services to clients.

DFS Features
  • Client Interface
    • A set of primitive file operations (create, delete, read, write).
  • Transparency
    • Local and remote files are indistinguishable
    • The multiplicity of servers and storage devices should be invisible to clients.
    • Response time is ideally comparable to that of a local file system
DFS Implementation
  • Various Implementations
    • Part of a distributed operating system, or
    • A software layer managing communication between conventional operating systems

[Figure: four machines with different hardware (Pentium, Macintosh, SPARC) and operating systems (Solaris, Windows, Mac OS, Linux), each running applications on top of a common middleware layer, connected by a network.]

Tanenbaum, p. 551

Naming and Transparency
  • Naming – mapping between logical and physical objects.
  • Multilevel mapping – abstraction of a file that hides the details of how and where on the disk the file is actually stored.
  • A transparent DFS hides the location in the network where the file is stored.
  • A file may be replicated at several sites:
    • the mapping returns the set of locations of the file’s replicas
    • both the existence of multiple copies and their locations are hidden.
Naming Structures
  • Location transparency – file name does not reveal the file’s physical storage location.

e.g. /server1/dir/dir2/x says that the file is located on server1, but it does not say where that server is located

    • File name still denotes a specific, although hidden, set of physical disk blocks.
    • Convenient way to share data.
    • Can expose correspondence between component units and machines.
    • However, if file x is large, the system might like to move x from server1 to server2, but the path name would change from /server1/dir/dir2/x to /server2/dir/dir2/x
Naming Structures
  • Location independence – file name does not need to be changed when the file’s physical storage location changes.
    • Better file abstraction.
    • Promotes sharing the storage space itself.
    • Separates the naming hierarchy from the storage-devices hierarchy, allowing file migration
    • Difficult to achieve; only a few experimental examples exist

(e.g. Andrew File System)

    • Even remote mounting will not achieve location independence, since it is not normally possible to move a file from one file group (the unit of mounting) to another, and still be able to use the old path name.
Naming Schemes — Three Main Approaches
  • Combination names:
    • Files named by combination of their host name and local name;
    • Guarantees a unique systemwide name.

e.g. host:local-name

  • Mounting file systems:
    • Attach remote directories to local directories, giving the appearance of a coherent directory tree;
    • Automount allows mounts to be done on demand
  • Global name structure
    • Total integration of the component file systems.
    • Spans all the files in the system.
    • Location-independent file identifiers link files to component units
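
As a rough illustration of the first and third approaches, the sketch below (Python, with hypothetical names and servers) resolves a combination name that embeds its host, and a location-independent name looked up through a separate mapping table that can change when the file migrates.

```python
# Hypothetical sketch: resolving a combination name (host:local-name) versus a
# location-independent name looked up in a mapping table.  Names and servers
# are illustrative, not taken from any real DFS.

# Location-independent names: the mapping can change when a file migrates,
# while the name the client uses stays the same.
location_map = {
    "/shared/reports/q1.txt": ("server2", "/vol7/reports/q1.txt"),
}

def resolve(name: str):
    """Return (server, local_path) for a file name."""
    if ":" in name:                       # combination name, e.g. "server1:/dir/x"
        host, local = name.split(":", 1)  # location is embedded in the name itself
        return host, local
    return location_map[name]             # location independent: consult the mapping

print(resolve("server1:/dir/dir2/x"))      # ('server1', '/dir/dir2/x')
print(resolve("/shared/reports/q1.txt"))   # ('server2', '/vol7/reports/q1.txt')
```
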
Types of Middleware
  • Document-based:
    • Each page has a unique address
    • Hyperlinks within each page point to other pages
  • File System based:
    • Distributed system looks like a local file system
  • Shared Object based
    • All items are objects, bundled with access procedures called methods
  • Coordination-based
    • The network appears as a large, shared memory
Document-based Middleware
  • Make a distributed system look like a giant collection of hyperlinked documents
    • E.g. hyperlinks on web pages.
  • Steps in accessing web page

http://www.acm.org/dl/faq.html

    • Browser asks DNS for IP address of www.acm.org
    • DNS replies with 199.222.69.151
    • Browser connects by TCP to Port 80 of 199.222.69.151
    • Browser requests file dl/faq.html
    • TCP connection is released
    • Browser displays all text in dl/faq.html
    • Browser fetches and displays all images in dl/faq.html
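
The Python sketch below walks through roughly the same sequence with raw sockets, using the URL from the slide: DNS lookup, TCP connection to port 80, request, release, display. Since most sites now redirect to HTTPS, the reply may simply be a redirect status line.

```python
# Sketch of the steps above using raw sockets (HTTP/1.1 over port 80).
import socket

host, path = "www.acm.org", "/dl/faq.html"

ip = socket.gethostbyname(host)                  # ask DNS for the IP address
with socket.create_connection((ip, 80)) as s:    # TCP connection to port 80
    request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
    s.sendall(request.encode("ascii"))           # request the file
    reply = b""
    while chunk := s.recv(4096):                 # read until the server closes
        reply += chunk
# connection released on leaving the with-block
print(reply.split(b"\r\n", 1)[0].decode())       # status line of the response
```
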
File-system based Middleware
  • Make a distributed system look like a great big file system
    • Single global file system, with users all over the world able to read and write files for which they have authorization

[Figure: two models of remote file service: the upload/download model (e.g. AFS), in which the client fetches the whole file and later sends the new version back to the server, and the remote access model (e.g. NFS), in which individual requests are sent to the server where the file remains.]
Remote File Access
  • Reduce network traffic by retaining recently accessed disk blocks in a cache, so that repeated accesses to the same information can be handled locally.
    • If needed data not already cached, a copy of data is brought from the server to the user.
    • Accesses are performed on the cached copy.
    • Files identified with one master copy residing at the server machine, but copies of (parts of) the file are scattered in different caches.
    • Cache-consistency problem – keeping the cached copies consistent with the master file.
    • Network virtual memory, with backing store at a remote server
Network Cache Location
  • Disk Cache
    • More reliable; cached data survive crashes.
  • Main-Memory Cache
    • Permit workstations to be diskless.
    • Data can be accessed more quickly.
    • Technology trend to bigger, less expensive memories.
    • Server caches (used to speed up disk I/O) are in main memory regardless of where user caches are located; using main-memory caches on the user machine permits a single caching mechanism for servers and users.

e.g. NFS has memory caching, optional disk cache

Cache Update Policy
  • Write-through policy – write data through to disk as soon as they are placed on any cache. Reliable, but poor performance.
  • Delayed-write policy – modifications written to the cache and then written through to the server later.
    • Fast – write accesses complete quickly.
    • Less reliable – unwritten data lost whenever a user machine crashes.
  • Update on flush from cache
    • But flushes happen at irregular intervals
  • Update on regular scan
    • Scan cache, flush blocks that have been modified since the last scan (NFS).
  • Write-on-close – write data back to the server when the file is closed (AFS).
    • Best for files that are open for long periods and frequently modified.
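
A toy sketch of the two basic policies (not any particular DFS; an in-memory dict stands in for the server's disk):

```python
# Write-through versus delayed-write caching.  Real DFS caches track dirty
# blocks per file and flush on scan, eviction, or close.
server_store = {}          # stands in for the server / master copy

class WriteThroughCache:
    def __init__(self):
        self.cache = {}
    def write(self, block, data):
        self.cache[block] = data
        server_store[block] = data        # propagated immediately: reliable, slow

class DelayedWriteCache:
    def __init__(self):
        self.cache = {}
        self.dirty = set()
    def write(self, block, data):
        self.cache[block] = data
        self.dirty.add(block)             # fast, but lost if this machine crashes
    def flush(self):                      # called on scan, eviction, or file close
        for block in self.dirty:
            server_store[block] = self.cache[block]
        self.dirty.clear()
```
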
Consistency
  • Is locally cached copy of the data consistent with the master copy? How to verify validity of cached data?
  • Client-initiated approach
    • Client initiates a validity check.
    • Server checks whether the local data are consistent with the master copy.
    • Check before every access, or at fixed time intervals
  • Server-initiated approach
    • Server records, for each client, the (parts of) files it caches.
    • When server detects a potential inconsistency, it reacts

e.g. When same file is open for read and write on different clients
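
A minimal sketch of a client-initiated check, assuming the server can report a file's last-modification time (paths and timestamps are made up):

```python
# Before using a cached copy, the client asks the server whether the file has
# changed since the copy was fetched.  NFS-style clients do something similar
# using file attributes and timeouts.
import time

server_mtime = {"/shared/x": 100.0}          # server's record of last modification

class CachedFile:
    def __init__(self, path, data, fetched_at):
        self.path, self.data, self.fetched_at = path, data, fetched_at

def is_valid(entry: CachedFile) -> bool:
    """Client-initiated check: valid if the master copy is not newer."""
    return server_mtime[entry.path] <= entry.fetched_at

copy = CachedFile("/shared/x", b"old contents", fetched_at=time.time())
server_mtime["/shared/x"] = time.time() + 1  # someone updates the master copy
print(is_valid(copy))                        # False: cached copy must be refetched
```
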

Caching and Remote Service
  • Caching
    • Faster, especially with locality in file accessing
    • Servers contacted only occasionally (rather than for each access).
    • Reduced server load and network traffic
    • Enhanced potential for scalability.
    • Lower network overhead, as data is transmitted in bigger chunks
  • Remote server method
    • Useful for diskless machines
    • Avoids cache-consistency problem
    • Inter-machine interface mirrors the local user-file-system interface
Stateful File Service
  • Mechanism.
    • Client opens a file.
    • Server fetches information about the file from its disk, stores it in its memory, and gives the client a connection identifier unique to the client and the open file.
    • Identifier is used for subsequent accesses until the session ends.
    • Server must reclaim the main-memory space used by clients who are no longer active.
  • Increased performance.
    • Fewer disk accesses.
    • Stateful server knows if a file was opened for sequential access and can thus read ahead the next blocks.
Stateless File Server
  • Mechanism
    • Each request self-contained.
    • No state information retained between requests.
    • Each request identifies the file and position in the file.
    • File open and close are local to the client
  • Design implications
    • Reliable, survives server crashes
    • Slower, with longer request messages
    • System-wide file names needed, to avoid name translation
    • Idempotent requests – repeating a request has the same effect as performing it once, so retried requests do no harm
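
A sketch contrasting the two styles of request, with made-up paths and no networking: the stateful server hands out a connection identifier at open time, while the stateless server expects every read to be self-contained and idempotent.

```python
FILES = {"/data/log": b"0123456789"}

class StatefulServer:
    def __init__(self):
        self.open_files = {}              # per-client state, lost if server crashes
        self.next_id = 0
    def open(self, path):
        self.next_id += 1
        self.open_files[self.next_id] = [path, 0]   # (path, current position)
        return self.next_id               # connection identifier for later requests
    def read(self, conn_id, nbytes):
        path, pos = self.open_files[conn_id]
        data = FILES[path][pos:pos + nbytes]
        self.open_files[conn_id][1] = pos + len(data)
        return data

class StatelessServer:
    def read(self, path, offset, nbytes):            # self-contained, idempotent
        return FILES[path][offset:offset + nbytes]   # repeating it is harmless
```
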
Recovery from Failures
  • Stateful server
    • Server failure – loses its volatile state
      • Restore state by recovery protocol in dialog with clients, or
      • Abort operations that were underway when the crash occurred.
    • Client failure
      • Server needs to be aware of client failures in order to reclaim space allocated to record the state of crashed client processes (orphan detection and elimination).
  • Stateless server
    • Server failure and recovery almost unnoticeable.
    • Newly refreshed server can respond to a self-contained request without any difficulty.
File Replication
  • Replicas of the same file on failure-independent machines.
    • Improves availability, shortens service time.
  • Replicated file name mapped to a particular replica.
    • Existence of replicas should be invisible to higher levels.
    • Replicas distinguished from one another by different lower-level names.
  • Updates – replicas of a file denote the same logical entity
    • thus an update to any replica must be reflected in all other replicas e.g. Locus OS.
  • Demand replication – reading a nonlocal replica causes it to be cached locally, thereby generating a new nonprimary replica.
    • Updates are made to the primary copy; the other replicas are invalidated (e.g. Ibis)
Andrew Distributed Computing Environment
  • History
    • under development since 1983 at Carnegie-Mellon University.
    • Name honours Andrew Carnegie and Andrew Mellon
  • Highly scalable;
    • the system is targeted to span over 5000 workstations.
  • Distinguishes between client machines (workstations) and dedicated server machines.
    • Servers and clients run slightly modified UNIX
    • Workstation LAN clusters interconnected by a WAN.
Andrew File System (AFS)
  • Clients are presented with a partitioned space of file names: a local name space and a shared name space.
  • Dedicated servers, called Vice, present the shared name space to the clients as a homogeneous, identical, and location transparent file hierarchy.
    • The local name space is the root file system of a workstation, from which the shared name space descends.
  • Workstations run the Virtue protocol to communicate with Vice, and are required to have local disks where they store their local name space.
    • Servers collectively are responsible for the storage and management of the shared name space.
AFS File Operations
  • Andrew caches entire files from servers.
    • A workstation interacts with Vice servers only during opening and closing of files.
  • Venus, the cache manager, runs locally on each workstation
    • Caches files from Vice when they are opened,
    • Stores modified copies of files back when they are closed.
    • Caches contents of directories and symbolic links, for path-name translation
  • Reading and writing bytes of a file
    • Done by the kernel without Venus intervention on the cached copy.
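
A sketch of this whole-file caching behaviour (Vice callbacks and other details omitted; names are illustrative): the server is contacted only at open and close, and reads and writes in between touch only the local copy.

```python
vice = {"/afs/u/notes.txt": b"version 1"}   # stands in for the Vice servers

class VenusLikeClient:
    def __init__(self):
        self.cache = {}
        self.dirty = set()
    def open(self, path):
        if path not in self.cache:
            self.cache[path] = vice[path]   # fetch the entire file on open
        return path
    def read(self, path):
        return self.cache[path]             # no server interaction
    def write(self, path, data):
        self.cache[path] = data             # no server interaction
        self.dirty.add(path)
    def close(self, path):
        if path in self.dirty:
            vice[path] = self.cache[path]   # store modified copy back on close
            self.dirty.discard(path)
```
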
Types of Middleware
  • Document-based: (e.g. Web)
    • Each page has a unique address
    • Hyperlinks within each page point to other pages
  • File System based: (e.g. NFS, AFS)
    • Distributed system looks like a local file system
  • Shared Object based (e.g. CORBA, Globe)
    • All items are objects, bundled with access procedures called methods
  • Coordination-based (e.g. Linda, Jini)
    • The network appears as a large, shared memory
Shared Object based Middleware
  • Objects
    • Everything is an object, a collection of variables bundled with access procedures called methods
    • Processes invoke methods to access the variables
  • Common Object Request Broker Architecture (CORBA)
    • Client processes on client machines can invoke operations on objects on (possibly) remote server machines
    • Object Request Brokers (ORBs) are interposed between client and server so that objects on different machines can find and invoke one another
  • Interface Definition Language (IDL)
    • Tells what methods the object exports,
    • Tells what parameter types each object expects
CORBA Model

[Figure: client code calls a client stub on top of the client ORB and operating system; on the server, a skeleton and object adapter invoke the server code on top of the server ORB and operating system; the two ORBs communicate over IIOP.]

Tanenbaum, p. 567

CORBA
  • Allows different client and server applications to communicate
    • e.g. a C++ program can use CORBA to access a COBOL database
  • ORB (Object Request Broker)
    • implements the interface specified by the IDL
    • ORB is on both client and server side
  • IIOP (Internet InterORB Protocol)
    • specifies how ORBs can communicate
  • Stub – client-side proxy code generated from the object’s IDL specification
  • Skeleton – Server-side procedure for IDL-spec’d object
  • Object adapter
    • wrapper that registers object,
    • generates object references,
    • activates the object
Remote Method Invocation

Procedure

  • Process creates CORBA object, receives its reference
  • Reference is available to be passed to other processes, or stored in an object database for lookup
  • Client process acquires a reference to the object
  • Client process marshals required parameters into a parcel
  • Client process contacts client ORB
  • Client ORB sends the parcel to the server ORB
  • Server ORB arranges for invocation of method on the object
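
A conceptual sketch of the marshalling and dispatch steps above, not the real CORBA API: JSON stands in for the marshalled parcel, and plain function calls stand in for the ORBs and IIOP transport.

```python
import json

class Account:                               # the remote object's implementation
    def __init__(self):
        self.balance = 0
    def deposit(self, amount):
        self.balance += amount
        return self.balance

objects = {"account-42": Account()}          # object reference -> object

def client_orb_send(parcel: bytes) -> bytes: # stands in for ORB + IIOP transport
    return server_orb_receive(parcel)

def server_orb_receive(parcel: bytes) -> bytes:
    request = json.loads(parcel)                         # unmarshal the parcel
    obj = objects[request["ref"]]
    result = getattr(obj, request["method"])(*request["args"])
    return json.dumps({"result": result}).encode()       # marshal the reply

# Client side: marshal the parameters and invoke through the (fake) ORB.
parcel = json.dumps({"ref": "account-42", "method": "deposit", "args": [100]}).encode()
reply = client_orb_send(parcel)
print(json.loads(reply)["result"])           # 100
```
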
Globe System
  • Scope
    • Scales to 1 billion users and 1 trillion objects

e.g. stock prices, sports scores

  • Method
    • Replicate object, spread load over replicas
    • Every globe object has a class object with its methods
    • The object interface is a table of pointers, each a <method pointer, state pointer> pair
    • State pointers can point to interfaces such as mailboxes, each with its own language or function

e.g. business mail, personal mail

e.g. languages such as C, C++, Java, assembly

Globe Object

[Figure: a Globe class object containing the mailbox methods (list, read, append, and delete messages), shared by two instances; each mailbox has its own state and its own interface used to access it.]
Accessing a Globe Object
  • Reading
    • Process looks it up and finds a contact address (e.g. IP address and port)
    • Security check, then object binding
    • Class object (code) loaded into caller’s address space
    • Instantiate a copy of its state
    • Process receives a pointer to its standard interface
    • Process invokes methods using the interface pointer
  • Writing
    • According to object replication policy:
    • Obtain a sequence number from the sequencer
    • Multicast a message containing the sequence number, operation name and parameters to all other processes bound to the object
    • Apply writes in order of sequence, to master, and update replicas
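
A sketch of this sequencer-based write path (a totally ordered update scheme; the names are illustrative and no real Globe code is shown): a sequencer hands out sequence numbers, the update is "multicast" as a plain loop, and every replica applies updates strictly in sequence order so all copies stay identical.

```python
import itertools

sequencer = itertools.count(1)               # central source of sequence numbers

class Replica:
    def __init__(self):
        self.state = {}
        self.pending = {}
        self.next_seq = 1
    def deliver(self, seq, key, value):
        self.pending[seq] = (key, value)
        while self.next_seq in self.pending:  # apply strictly in sequence order
            k, v = self.pending.pop(self.next_seq)
            self.state[k] = v
            self.next_seq += 1

replicas = [Replica(), Replica(), Replica()]

def replicated_write(key, value):
    seq = next(sequencer)                    # obtain a sequence number
    for r in replicas:                       # "multicast" to all bound processes
        r.deliver(seq, key, value)           # each applies writes in order

replicated_write("price", 10)
replicated_write("price", 12)
print([r.state for r in replicas])           # all replicas agree: {'price': 12}
```
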
Globe Object

[Figure: structure of a Globe object: an interface on top of a control subobject, together with semantic, replication, and security subobjects; a communication subobject exchanges messages via the operating system.]
Subobjects in a Globe Object
  • Control subobject
    • Accepts incoming invocations, distributes tasks
  • Semantics subobject
    • Actually does the work required by object interface; only part actually programmed by coder
  • Replication subobject
    • Manages object replication

(e.g. all active, or master-slave)

  • Security subobject – implements the security policy
  • Communication subobject – network protocols (e.g. IP v4)
Coordination-based Middleware
  • Linda
    • Developed at Yale, 1986
    • Users appear to share a big memory, known as tuple space
    • Processes on any machine can insert tuples into tuple space or remove tuples from tuple space
  • Publish/Subscribe, 1993
    • Processes connected by a broadcast network
    • Each process can be a producer of information, a consumer, or both
  • Jini
    • From Sun Microsystems, 1999
    • Self-contained Jini devices are plugged into a network, not a computer
    • Each device offers or uses services
Linda
  • tuples
    • Like a structure in C, pure data, with no associated methods

e.g. (“abc”, 2, 5)

(“matrix-1”, 1, 6, 3.14)

(“family”, “is-sister”, “Stephany”, “Roberta”)

  • Operations
    • Out – put a tuple into tuple space e.g. out(“abc”, 2, 5);
    • In – retrieve a tuple from tuple space e.g. in(“abc”, 2, ?i);
      • addressed by content rather than <name, address>
      • tuple space is searched for a match to the specified contents
      • Process is blocked until a match is found
    • Read – copy a matching tuple, but leave it in tuple space
    • Eval – evaluate the tuple’s parameters and put the resulting tuple into tuple space
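
A toy tuple space in Python illustrating out, in, and read with content-based matching (a real Linda "in" blocks until a match appears; this sketch does not):

```python
ANY = object()                               # stands in for a formal such as ?i

class TupleSpace:
    def __init__(self):
        self.tuples = []
    def out(self, *tup):                     # out("abc", 2, 5)
        self.tuples.append(tup)
    def _match(self, template, tup):
        return len(template) == len(tup) and all(
            t is ANY or t == v for t, v in zip(template, tup))
    def read(self, *template):               # copy a matching tuple, leave it in place
        return next(t for t in self.tuples if self._match(template, t))
    def in_(self, *template):                # "in" is a Python keyword, hence in_
        tup = self.read(*template)
        self.tuples.remove(tup)              # retrieve = copy and remove
        return tup

ts = TupleSpace()
ts.out("abc", 2, 5)
print(ts.in_("abc", 2, ANY))                 # ('abc', 2, 5) - matched by content
```
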
Publish/subscribe
  • Publishing
    • New information broadcast as a tuple on the network
    • Tuple has a subject line with multiple fields separated by periods
    • Processes can subscribe to certain subjects
  • Subscribing
    • The tuple daemon on each machine copies all broadcast tuples into its RAM
    • It inspects each subject line and forwards a copy to each interested process.

[Figure: publish/subscribe architecture: producers and consumers on LANs, each machine running a local daemon, connected by information routers across a WAN.]
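
A sketch of the daemon's forwarding logic, assuming subject lines are dotted strings and subscriptions are subject prefixes (all names are made up):

```python
subscriptions = []                           # (subject prefix, callback) pairs

def subscribe(prefix, callback):
    subscriptions.append((prefix, callback))

def daemon_receive(subject, payload):
    """Called for every tuple broadcast on the network."""
    for prefix, callback in subscriptions:
        if subject == prefix or subject.startswith(prefix + "."):
            callback(subject, payload)       # forward a copy to the subscriber

subscribe("sports.tennis", lambda s, p: print("tennis fan got:", s, p))
daemon_receive("sports.tennis.wimbledon", "match result")   # delivered
daemon_receive("stocks.nyse.ibm", "price update")           # ignored
```
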

Jini
  • Network-centric computing
    • An attempt to change from CPU-centric computing
    • Many self-contained Jini devices offer services to the others
    • e.g. Computer, cell phone, printer, palmtop, TV set, stereo
    • A loose confederation of devices, with no central administration
    • Code is written for the JVM (Java Virtual Machine)
  • Joining a Jini federation
    • Broadcasts a message asking for a lookup service
    • Uses the discovery protocol to find the service
    • Lookup service sends code to register the new device
    • Device acquires a lease to register for a fixed time
    • The registration proxy can be sent to other devices looking for service
Jini
  • JavaSpaces
    • Entries like Linda tuples, but strongly typed

e.g. Employee entry could have <string, integer, integer, boolean> to accommodate <name, department, telephone, works fulltime>

  • Operations
    • Write – put an entry into JavaSpace, specifying the lease time
    • Read – copy an entry that matches a template out of JavaSpace
    • Take – copy and remove an entry that matches a template
    • Notify – notify the caller when a matching entry is written
    • Transactions can be atomic, so multiple methods can be safely grouped – all or none will execute
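
The real API is Java (JavaSpaces); the Python sketch below only mimics the semantics of write, read, and take on a strongly typed entry, with None fields acting as wildcards in the template.

```python
from dataclasses import dataclass, fields
from typing import Optional

@dataclass
class Employee:                              # a typed entry, as in the example above
    name: Optional[str] = None
    department: Optional[int] = None
    telephone: Optional[int] = None
    fulltime: Optional[bool] = None

class Space:
    def __init__(self):
        self.entries = []
    def write(self, entry):                  # lease time omitted in this sketch
        self.entries.append(entry)
    def _matches(self, template, entry):
        return type(template) is type(entry) and all(
            getattr(template, f.name) is None
            or getattr(template, f.name) == getattr(entry, f.name)
            for f in fields(template))
    def read(self, template):                # copy a matching entry, leave it
        return next(e for e in self.entries if self._matches(template, e))
    def take(self, template):                # copy and remove a matching entry
        e = self.read(template)
        self.entries.remove(e)
        return e

space = Space()
space.write(Employee("Ana", 7, 5551234, True))
print(space.take(Employee(department=7)))    # matched by type and field values
```
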