Distributed Storage and Consistency
Storage moves into the net.
Factors: network delays, network cost, storage capacity/volume, administrative cost, network bandwidth.
Shared storage with scalable bandwidth and capacity.
Consolidate — multiplex — decentralize — replicate.
Reconfigure to mix-and-match loads and resources.
Storage Service Provider
Application Service Provider
Outsourcing: storage and/or applications as a service.
For ASPs (e.g., Web services), storage is just a component.
Goal: managed storage on demand for cross-disciplinary research.
Direct SAN access for “power clients” and NAS PoPs; other clients access through NAS.
Campus FC net
Campus IP net
Each SAN volume is managed by a single NAS PoP.
All access to each volume is mediated by its NAS PoP.
Campus FC net
Virtual Address Space
Processors issue memory ops in program order.
The switch is randomly set after each memory op, which ensures some serial order among all operations.
Easily implemented with shared bus.
For page-based DSM, weaker consistency models may be useful, but that’s for later.
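The switch model above can be simulated in a few lines. The following is a minimal sketch, not taken from the slides: the names `run_switch`, `dekker_trial`, and `count_forbidden` are hypothetical, and `rand()` stands in for the randomly set switch. It runs the classic two-processor test (P0: x=1; r0=y. P1: y=1; r1=x) and counts how often both loads return 0, an outcome that no serial order consistent with program order can produce.

```c
#include <assert.h>
#include <stdlib.h>

enum { X = 0, Y = 1 };                    /* two shared memory locations */
typedef enum { OP_STORE, OP_LOAD } op_kind;

typedef struct {
    op_kind kind;
    int addr;                             /* X or Y */
    int value;                            /* value to store (OP_STORE only) */
} mem_op;

typedef struct {
    const mem_op *prog;                   /* operations in program order */
    int len;
    int pc;                               /* index of the next op to issue */
    int last_load;                        /* result of the most recent load */
} processor;

/* The "switch": at each step pick one processor at random and perform
 * its next operation on memory as a single indivisible step. */
void run_switch(processor *p, int nproc, int mem[2], unsigned seed)
{
    assert(nproc > 0);
    srand(seed);
    for (;;) {
        int ready = 0;
        for (int i = 0; i < nproc; i++)
            if (p[i].pc < p[i].len)
                ready++;
        if (ready == 0)
            break;                        /* every program has finished */
        int pick = rand() % nproc;
        if (p[pick].pc >= p[pick].len)
            continue;                     /* this one is done; pick again */
        const mem_op *op = &p[pick].prog[p[pick].pc++];
        if (op->kind == OP_STORE)
            mem[op->addr] = op->value;
        else
            p[pick].last_load = mem[op->addr];
    }
}

/* One run of the two-processor test with the given random seed. */
void dekker_trial(unsigned seed, int *r0, int *r1)
{
    static const mem_op prog0[] = { { OP_STORE, X, 1 }, { OP_LOAD, Y, 0 } };
    static const mem_op prog1[] = { { OP_STORE, Y, 1 }, { OP_LOAD, X, 0 } };
    processor p[2] = { { prog0, 2, 0, -1 }, { prog1, 2, 0, -1 } };
    int mem[2] = { 0, 0 };
    run_switch(p, 2, mem, seed);
    *r0 = p[0].last_load;
    *r1 = p[1].last_load;
}

/* Count trials in which both loads saw 0 (forbidden under this model). */
int count_forbidden(unsigned trials)
{
    int bad = 0;
    for (unsigned s = 0; s < trials; s++) {
        int r0, r1;
        dekker_trial(s, &r0, &r1);
        if (r0 == 0 && r1 == 0)
            bad++;
    }
    return bad;
}
```

Under the switch model `count_forbidden` should stay at zero for any number of trials, because each processor's store always reaches memory before its own load is issued.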
mode        load (read)   store (write)
exclusive   yes           yes
RPC over UDP or TCP
RMI is “RPC in Java”, supporting Emerald-like distributed object references, invocation, and garbage collection, derived from SRC Modula-3 network objects [SOSP 93].
The registry provides a bootstrap naming service using URLs.
1: Naming.bind(URL, obj1)
2: stub1 = Naming.lookup(URL)
3: stub2 = stub1->method()
These slides were not discussed. I use them in CPS 210, the operating systems course. They provide useful background for the material on NFS.
shared block storage service (FC/SAN, Petal, NASD)
compatibility with NAS protocols
sharing, coordination, recovery
block allocation and layout
separate lock service
logging and recovery
storage service + lock manager
What does Frangipani need from Petal? How does Petal contribute to Frangipani’s *ility?
Could we build Frangipani without Petal?
Each volume is a set of directories and files; a host’s file tree is the set of directories and files visible to processes on a given host.
File trees are built by grafting volumes from different devices or from network servers.
In Unix, the graft operation is the privileged mount system call, and each volume is a filesystem.
syscall layer (file, uio, etc.)
Virtual File System (VFS)
VFS was an internal kernel restructuring with no effect on the syscall interface.
Incorporates object-oriented concepts: a generic procedural interface with multiple implementations.
Based on abstract objects with dynamic method binding by type...in C.
Other abstract interfaces in the kernel: device drivers, file objects, executable files, memory objects.
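Dynamic method binding in C is usually done with a per-type table of function pointers. The following is a minimal sketch of that idea, not the kernel's actual definitions: the operations `vop_fsname` and `vop_getsize`, the macros `VOP_FSNAME` and `VOP_GETSIZE`, the `*_vnode_init` helpers, and the field layouts of `struct inode` and `struct rnode` are all hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct vnode;

/* Per-filesystem method table: one function pointer per operation. */
struct vnodeops {
    const char *(*vop_fsname)(struct vnode *vp);
    long (*vop_getsize)(struct vnode *vp);
};

/* Generic vnode: a method table plus an opaque pointer to the
 * filesystem-specific object (e.g., an inode or an rnode). */
struct vnode {
    const struct vnodeops *v_op;
    void *v_data;
};

/* Generic code calls operations through macros that vector through
 * the table, so the callee is chosen by the vnode's type at run time. */
#define VOP_FSNAME(vp)  ((vp)->v_op->vop_fsname(vp))
#define VOP_GETSIZE(vp) ((vp)->v_op->vop_getsize(vp))

/* Toy local filesystem: private state lives in an inode. */
struct inode { long i_size; };
static const char *ufs_fsname(struct vnode *vp) { (void)vp; return "ufs"; }
static long ufs_getsize(struct vnode *vp)
{
    return ((struct inode *)vp->v_data)->i_size;
}
static const struct vnodeops ufs_vnodeops = { ufs_fsname, ufs_getsize };

/* Toy network filesystem: private state lives in an rnode. */
struct rnode { long r_cached_size; };
static const char *nfs_fsname(struct vnode *vp) { (void)vp; return "nfs"; }
static long nfs_getsize(struct vnode *vp)
{
    return ((struct rnode *)vp->v_data)->r_cached_size;
}
static const struct vnodeops nfs_vnodeops = { nfs_fsname, nfs_getsize };

void ufs_vnode_init(struct vnode *vp, struct inode *ip)
{
    vp->v_op = &ufs_vnodeops;
    vp->v_data = ip;
}

void nfs_vnode_init(struct vnode *vp, struct rnode *rp)
{
    vp->v_op = &nfs_vnodeops;
    vp->v_data = rp;
}

/* Filesystem-independent code: never looks inside v_data. */
long total_size(struct vnode **vps, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        assert(vps[i] != NULL && vps[i]->v_op != NULL);
        sum += VOP_GETSIZE(vps[i]);
    }
    return sum;
}
```

`total_size` works on any mix of vnodes without knowing which filesystem backs each one, which is the property that lets the layer above VFS stay unchanged when a new filesystem type is added.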
Each vnode has a standard file attributes struct.
Generic vnode points at a filesystem-specific struct (e.g., inode, rnode), seen only by the filesystem.
Each specific file system maintains a cache of its resident vnodes.
Vnode operations are macros that vector to filesystem-specific procedures.
vnode attributes (vattr)
type (VREG, VDIR, VLNK, etc.)
mode (9+ bits of permissions)
nlink (hard link count)
owner user ID
owner group ID
unique file ID
file size (bytes and blocks)
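Most of these attributes are visible from user space through the POSIX `lstat` call, which makes them easy to inspect. The following is a minimal sketch under that assumption: `struct my_vattr`, `path_getattr`, and the `VOTHER` value are hypothetical names for illustration, and the real kernel `vattr` has more fields than are listed here.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L           /* for lstat() and S_ISLNK() */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/stat.h>

/* File types named on the slide, plus a catch-all for everything else. */
enum vtype { VNON, VREG, VDIR, VLNK, VOTHER };

/* User-level mirror of the attribute list above. */
struct my_vattr {
    enum vtype         va_type;
    unsigned           va_mode;           /* permission bits */
    unsigned long      va_nlink;          /* hard link count */
    unsigned long      va_uid;            /* owner user ID */
    unsigned long      va_gid;            /* owner group ID */
    unsigned long long va_fileid;         /* unique file ID (inode number) */
    long long          va_size;           /* size in bytes */
    long long          va_blocks;         /* blocks allocated, as reported by
                                             st_blocks (commonly 512-byte units) */
};

/* Fill *va from the file named by path without following a final
 * symbolic link. Returns 0 on success, -1 if lstat() fails. */
int path_getattr(const char *path, struct my_vattr *va)
{
    struct stat st;

    assert(path != NULL && va != NULL);
    if (lstat(path, &st) != 0)
        return -1;

    if (S_ISREG(st.st_mode))
        va->va_type = VREG;
    else if (S_ISDIR(st.st_mode))
        va->va_type = VDIR;
    else if (S_ISLNK(st.st_mode))
        va->va_type = VLNK;
    else
        va->va_type = VOTHER;

    va->va_mode   = (unsigned)(st.st_mode & 07777);
    va->va_nlink  = (unsigned long)st.st_nlink;
    va->va_uid    = (unsigned long)st.st_uid;
    va->va_gid    = (unsigned long)st.st_gid;
    va->va_fileid = (unsigned long long)st.st_ino;
    va->va_size   = (long long)st.st_size;
    va->va_blocks = (long long)st.st_blocks;
    return 0;
}
```

Calling `path_getattr("/", &va)` on a Unix system should report type `VDIR`; the remaining fields depend on the host.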
vop_lookup (OUT vpp, name)
vop_create (OUT vpp, name, vattr)
vop_remove (vp, name)
vop_link (vp, name)
vop_rename (vp, name, tdvp, tvp, name)
vop_mkdir (OUT vpp, name, vattr)
vop_rmdir (vp, name)
vop_symlink (OUT vpp, name, vattr, contents)
vop_readdir (uio, cookie)
vop_getpages (page**, count, offset)
vop_putpages (page**, count, sync, offset)
VFS free list head
Active vnodes are reference-counted by the structures that hold pointers to them.
- system open file table
- process current directory
- file system mount points
Each specific file system maintains its own hash of vnodes (BSD).
- specific FS handles initialization
- free list is maintained by VFS
vget(vp): reclaim cached inactive vnode from VFS free list
vref(vp): increment reference count on an active vnode
vrele(vp): release reference count on a vnode
vgone(vp): vnode is no longer valid (file is removed)
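The interplay of the four calls above can be sketched as a toy model. This is a simplified illustration, not the BSD implementation: the real routines take additional arguments and handle locking, and the fields `v_usecount`, `v_onfree`, `v_gone`, `v_freenext` and the list head `vfs_freelist` are assumptions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

struct vnode {
    int v_usecount;                       /* number of active references */
    int v_onfree;                         /* 1 while on the VFS free list */
    int v_gone;                           /* 1 once the vnode is invalid */
    struct vnode *v_freenext;             /* next vnode on the free list */
};

struct vnode *vfs_freelist;               /* VFS free list head */

/* Unlink vp from the free list if it is there. */
static void freelist_remove(struct vnode *vp)
{
    struct vnode **pp = &vfs_freelist;
    while (*pp != NULL && *pp != vp)
        pp = &(*pp)->v_freenext;
    if (*pp == vp) {
        *pp = vp->v_freenext;
        vp->v_freenext = NULL;
        vp->v_onfree = 0;
    }
}

/* vget: take a reference, reclaiming a cached inactive vnode from the
 * free list if necessary. Returns -1 if the vnode is no longer valid. */
int vget(struct vnode *vp)
{
    if (vp->v_gone)
        return -1;
    if (vp->v_onfree)
        freelist_remove(vp);
    vp->v_usecount++;
    return 0;
}

/* vref: add a reference to a vnode that is already active. */
void vref(struct vnode *vp)
{
    assert(vp->v_usecount > 0);
    vp->v_usecount++;
}

/* vrele: drop a reference; the last release leaves the vnode inactive
 * but cached on the free list so a later vget can reclaim it. */
void vrele(struct vnode *vp)
{
    assert(vp->v_usecount > 0);
    if (--vp->v_usecount == 0 && !vp->v_gone) {
        vp->v_freenext = vfs_freelist;
        vfs_freelist = vp;
        vp->v_onfree = 1;
    }
}

/* vgone: the underlying file was removed, so the vnode must not be
 * reclaimed from the cache again. */
void vgone(struct vnode *vp)
{
    vp->v_gone = 1;
    if (vp->v_onfree)
        freelist_remove(vp);
}
```

The point of the split is that inactive vnodes stay cached: a `vrele` that drops the count to zero does not destroy the vnode, and only `vgone` prevents a later `vget` from reviving it.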
1. crossing mount points
2. obtaining root vnode (or current dir)
3. finding resident vnodes in memory
4. caching name->vnode translations
5. symbolic (soft) links
6. disk implementation of directories
7. locking/referencing to handle races
with name create and delete operations
vp = get vnode for / (rootdir)
vp = cvp;
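The traversal fragments above (start at the root or current directory, look up one component, then advance with `vp = cvp`) can be fleshed out as a toy loop. This is a minimal sketch over an in-memory tree, not kernel code: `struct node`, `dir_lookup`, and `toy_namei` are hypothetical names, and symbolic links, name caching, and locking (items 4, 5, and 7 in the list) are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAXKIDS 4

struct node {
    const char *name;
    struct node *kids[MAXKIDS];           /* directory entries, NULL-terminated */
    struct node *mounted;                 /* root of a volume mounted here, or NULL */
};

/* Look up one name of length len in directory dvp; this stands in
 * for the filesystem-specific lookup operation. */
static struct node *dir_lookup(struct node *dvp, const char *name, size_t len)
{
    for (int i = 0; i < MAXKIDS && dvp->kids[i] != NULL; i++) {
        const char *n = dvp->kids[i]->name;
        if (strlen(n) == len && memcmp(n, name, len) == 0)
            return dvp->kids[i];
    }
    return NULL;
}

/* Resolve path one component at a time. Returns NULL if any
 * component is missing. */
struct node *toy_namei(struct node *rootdir, struct node *cwd, const char *path)
{
    assert(rootdir != NULL && cwd != NULL && path != NULL);

    /* Absolute paths start at the root, relative ones at the current dir. */
    struct node *vp = (*path == '/') ? rootdir : cwd;

    while (*path != '\0') {
        while (*path == '/')              /* skip separators */
            path++;
        if (*path == '\0')
            break;
        size_t len = strcspn(path, "/");  /* length of the next component */
        struct node *cvp = dir_lookup(vp, path, len);
        if (cvp == NULL)
            return NULL;
        while (cvp->mounted != NULL)      /* cross mount points */
            cvp = cvp->mounted;
        vp = cvp;
        path += len;
    }
    return vp;
}
```

Crossing a mount point is just a substitution: when lookup lands on a directory with a volume grafted onto it, the walk continues from that volume's root instead.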