Implementing NFSv4 for HPSS: the GANESHA architecture
Philippe DENIEL, CEA/DAM
There are 3 goals in this presentation:
Describing the NFSv4 enhancements compared to NFSv2 and NFSv3
Describing the design and architecture of our implementation of the NFS protocols on top of HPSS: the GANESHA architecture
Showing how these new features can benefit the HPSS community
NFS v2 was known for being a "home made" protocol, designed by Sun Microsystems, in 1984
NFS v3 was discussed more widely, and several companies took part in the design of the protocol
NFSv4 is the result of an IETF working group, like TCP, UDP or IPv6. The design process started in 1997 and led to the publication of RFC 3530 in 2003.
NFS v4 is defined by RFC3530
NFSv4 is a standalone protocol; it requires no ancillary protocol
The Mount protocol, NLM and the NFS status monitor (statd) are no longer needed
Port 2049 is the only resource required by NFSv4
This value is written explicitly in the RFC
NFSv4 is not bound to Unix semantics
File system object attributes are exposed as self-describing bitmaps, not as a Unix-like structure
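The bitmap idea can be illustrated with a minimal sketch. It assumes the usual NFSv4 layout, where attribute number n is carried in 32-bit word n // 32 at bit n % 32; the attribute numbers are those listed in RFC 3530, while the function names are hypothetical and chosen only for this example.

```python
# Attribute numbers as defined in RFC 3530.
FATTR4_TYPE = 1
FATTR4_SIZE = 4
FATTR4_MODE = 33
FATTR4_OWNER = 36


def encode_bitmap(attr_numbers):
    """Return a list of 32-bit words with one bit set per requested attribute."""
    words = []
    for number in attr_numbers:
        word_index, bit = divmod(number, 32)
        # Grow the word list on demand so small requests stay small.
        while len(words) <= word_index:
            words.append(0)
        words[word_index] |= 1 << bit
    return words


def decode_bitmap(words):
    """Return the sorted attribute numbers whose bits are set."""
    return [
        index * 32 + bit
        for index, word in enumerate(words)
        for bit in range(32)
        if word & (1 << bit)
    ]


requested = encode_bitmap([FATTR4_TYPE, FATTR4_SIZE, FATTR4_MODE, FATTR4_OWNER])
```

A server that does not support an attribute simply leaves its bit clear in the reply, which is how a reduced attribute set (a FAT-like file system, for instance) can be described without a fixed structure.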
NFS v4 is firewall friendly
NFS v4 is not bound to Unix
NFSv4 is designed to work over high-latency, low-bandwidth networks
Semantics are not necessarily Unix-like
No link to Unix structures; information is managed as bitmaps
User/group managed as strings, not ids
NFSv4 can export file systems with reduced attributes (like PC FAT), or extended attributes
Access Control Lists are natively supported
The ACL model accommodates diverse ACL models (POSIX and NT)
Windows OPEN/CLOSE semantics are supported
Non-Latin characters are supported via UTF-8
NFSv4 will support minor versioning for protocol extensions
NFSv4 is ready for IPv6
NFSv4 is not dedicated to Unix clients
[Figure labels: uid = 6734 (Unix identifier), sid = 12-34-5678-9012 (Windows identifier)]
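Since users and groups travel as strings rather than numeric ids, clients with different local identifier schemes can refer to the same owner. The sketch below is only an illustration: the lookup tables, the user name "alice", the domain "example.org" and the fallback to "nobody" are all assumptions made for the example, not part of the presentation.

```python
# Hypothetical local account tables; a real system would query its directory.
UNIX_USERS = {6734: "alice"}
WINDOWS_USERS = {"12-34-5678-9012": "alice"}

DOMAIN = "example.org"  # assumed NFSv4 domain for this example


def to_owner_string(name):
    """Build the 'user@domain' string form carried on the wire."""
    return f"{name}@{DOMAIN}"


def owner_from_uid(uid):
    """Map a Unix uid to an owner string (unknown ids fall back to 'nobody')."""
    return to_owner_string(UNIX_USERS.get(uid, "nobody"))


def owner_from_sid(sid):
    """Map a Windows SID to an owner string (unknown ids fall back to 'nobody')."""
    return to_owner_string(WINDOWS_USERS.get(sid, "nobody"))
```

Both clients end up naming the same owner string, so neither needs to know the other's numeric identifier scheme.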
NFSv4 is firewall friendly
Only ports 2049/tcp and 2049/udp are used; no other port is required
NFSv4 is ONC/RPC based
RPCSEC_GSS is explicitly supported
Any security mechanism with a GSSAPI integration can be used with RPCSEC_GSS: krb5, LIPKEY, SPKM-3
NFSv4 is connection oriented
Connection based security (like SSL) is possible
RFC 3530 recommends using LIPKEY via GSSAPI and RPCSEC_GSS rather than SSL
NFSv4 compound requests are essentially lists of elementary operations to be performed on the server
The client can do many things in one call: the client/server dialog is reduced and becomes more flexible
The client can build the request that best fits its caching policy
Some elementary operations are dedicated to implementing cache validation on the client side (OP_VERIFY, OP_NVERIFY)
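A compound request and its use for cache validation can be sketched as follows. This is a toy model, not the wire protocol: operations are evaluated in order and evaluation stops at the first one that fails, and NVERIFY fails with NFS4ERR_SAME when the supplied attribute matches the server's value. The in-memory file table, the use of a full path in LOOKUP and the function name are simplifications assumed for the example.

```python
NFS4_OK = "NFS4_OK"
NFS4ERR_SAME = "NFS4ERR_SAME"
NFS4ERR_NOENT = "NFS4ERR_NOENT"
NFS4ERR_NOFILEHANDLE = "NFS4ERR_NOFILEHANDLE"

# Hypothetical server state: one file with a change counter and some data.
FILES = {"/data/report.txt": {"change": 7, "content": b"hello"}}


def run_compound(operations):
    """Evaluate (name, argument) pairs in order, stopping at the first error."""
    current = None  # stands in for the current filehandle
    results = []
    for name, argument in operations:
        status, value = NFS4_OK, None
        if name == "LOOKUP":
            if argument in FILES:
                current = argument
            else:
                status = NFS4ERR_NOENT
        elif current is None:
            status = NFS4ERR_NOFILEHANDLE
        elif name == "NVERIFY":
            # Fails when the client's cached attribute is still accurate.
            if FILES[current]["change"] == argument:
                status = NFS4ERR_SAME
        elif name == "READ":
            value = FILES[current]["content"]
        else:
            raise ValueError(f"operation not modelled: {name}")
        results.append((name, status, value))
        if status != NFS4_OK:
            break
    return results


def revalidate(cached_change):
    """One round trip: re-read the file only if it changed since it was cached."""
    return run_compound([
        ("LOOKUP", "/data/report.txt"),
        ("NVERIFY", cached_change),
        ("READ", None),
    ])
```

With an up-to-date cache the reply stops at NVERIFY and no data is transferred; with a stale cache the same single request also returns the new content.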
An NFSv4 client can handle a file locally, in its cache, for a given period of time (delegation mechanism)
NFSv4 could be very efficient at large scale through proxy caching
Proxies with policies similar to those of HTTP proxies can cache files accessed by a group of clients
The NFSv4 protocol is of interest for HPSS in several respects
Aggressive caching on both client and server sides reduces the number of requests sent to the HPSS systems
HPSS-specific information could be obtained, on a per-file-handle basis, through NFSv4 named attributes
Class of Service (set / get)
Storage Class related information (get only)
Migration state ( get only)
Site local attributes (set / get / create )
Native support of ACLs
Secured mount points could be used to safely share files between remote sites (Kerberos 5 and possibly SPKM-3 or LIPKEY).
Both TCP and UDP are supported as RPC transport layers
Fileset semantics and junction traversal are natively supported, with potential server indirection and security re-negotiation.
Non-Unix clients can be used (see Hummingbird’s NFS Maestro).
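The access rules listed above for HPSS-specific named attributes (set/get, get only, set/get/create) can be sketched as a small per-file table. The attribute names, the "site." prefix for site-local attributes and the class itself are hypothetical, chosen only to make the rules concrete.

```python
class HpssNamedAttributes:
    """Toy per-file named-attribute store enforcing the listed access rules."""

    # Hypothetical attribute names mapped to the operations they allow.
    FIXED_RULES = {
        "class_of_service": {"get", "set"},
        "storage_class": {"get"},
        "migration_state": {"get"},
    }
    SITE_PREFIX = "site."  # assumed naming convention for site-local attributes

    def __init__(self, initial):
        self._values = dict(initial)

    def _allowed(self, name):
        if name.startswith(self.SITE_PREFIX):
            return {"get", "set", "create"}
        return self.FIXED_RULES.get(name, set())

    def get(self, name):
        if "get" not in self._allowed(name) or name not in self._values:
            raise KeyError(name)
        return self._values[name]

    def set(self, name, value):
        allowed = self._allowed(name)
        # Updating an existing attribute needs "set"; a new one needs "create".
        needed = "set" if name in self._values else "create"
        if needed not in allowed:
            raise PermissionError(f"{needed} not allowed on {name}")
        self._values[name] = value


attrs = HpssNamedAttributes({
    "class_of_service": 1,
    "storage_class": "tape",
    "migration_state": "migrated",
})
attrs.set("class_of_service", 2)        # set / get
attrs.set("site.project", "archive-A")  # site-local: create is allowed
```

Attempting `attrs.set("storage_class", "disk")` raises `PermissionError`, since that attribute is get only.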
GANESHA is a generic NFS server design. It has several components:
RPC Layer : manages ONCRPC / RPCSEC_GSS requests
FSAL : File System Abstraction layer, provides access to the exported file system
Cache Inode layer and File Content layer: cache the metadata and information related to the managed FSAL objects
Admin Layer: generic interface for administrative operations on the daemon (stop, start, statistics, …)
ECL: External Control Layer, provides a way to administer the server from the outside, on a client/server basis.
Internal Logging / External Logging
Memory management: resources are allocated at server’s boot, and managed internally.
[Figure: GANESHA layer diagram. Labels: Dup Req Layer, External Control API, NFS V2 / V3, cache fs operations, File System Abstraction Layer, File Content layer]
The FSAL semantics are close to the NFSv4 semantics (to reduce structure conversion)
Cache Inode uses Red-Black Tree based Hash Tables, designed to manage 10^5 to 10^6 entries at the same time: the objective is to keep a few days of production in this cache.
The File Content cache will be preserved across server crash recovery
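The layering of a metadata cache over an abstract file system interface can be sketched as follows. This is not the GANESHA API: the class and method names are hypothetical, the backend is an in-memory stand-in for HPSS, plain dictionaries replace the Red-Black Tree based hash tables, and cache invalidation is left out.

```python
from abc import ABC, abstractmethod


class FSAL(ABC):
    """Minimal file system abstraction: the only thing upper layers talk to."""

    @abstractmethod
    def lookup(self, parent_handle, name):
        """Return the handle of `name` inside the directory `parent_handle`."""

    @abstractmethod
    def getattr(self, handle):
        """Return the attributes of the object designated by `handle`."""


class MemoryFSAL(FSAL):
    """In-memory backend standing in for a real one; counts backend calls."""

    def __init__(self, tree):
        self._tree = tree
        self.calls = 0

    def lookup(self, parent_handle, name):
        self.calls += 1
        return self._tree[parent_handle]["children"][name]

    def getattr(self, handle):
        self.calls += 1
        return dict(self._tree[handle]["attrs"])


class CacheInode:
    """Metadata cache placed in front of any FSAL implementation."""

    def __init__(self, fsal):
        self._fsal = fsal
        self._attrs = {}
        self._dirents = {}

    def lookup(self, parent_handle, name):
        key = (parent_handle, name)
        if key not in self._dirents:
            self._dirents[key] = self._fsal.lookup(parent_handle, name)
        return self._dirents[key]

    def getattr(self, handle):
        if handle not in self._attrs:
            self._attrs[handle] = self._fsal.getattr(handle)
        return self._attrs[handle]


tree = {
    "root": {"attrs": {"type": "dir"}, "children": {"f": "h1"}},
    "h1": {"attrs": {"type": "file", "size": 3}, "children": {}},
}
backend = MemoryFSAL(tree)
cache = CacheInode(backend)
handle = cache.lookup("root", "f")
first = cache.getattr(handle)
second = cache.getattr(handle)   # served from the cache
again = cache.lookup("root", "f")  # served from the cache
```

Because the cache only depends on the abstract interface, swapping the backend for another file system leaves the upper layers unchanged.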
1: The client sends a request to the server.
2: The dispatcher thread stores each request into a tbuf entry. V4 and V2/V3 requests are kept separate.
3: The request is decoded and waits in a tbuf entry (in the V4 Tbuf list or the V2/V3 Tbuf list).
4: An available worker thread picks up a waiting tbuf entry.
5: The requested operation is performed in HPSS.
6: The results of the request are sent back to the client by the worker thread.
[Figure: request flow diagram. Labels: V4 Tbuf list (V4 pending entries), V2/V3 Tbuf list (V2/V3 pending entries). Partially recovered annotations: "(based on nfs creds)", "One call per V4 bulk request", "calls to HPSS)".]
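The dispatcher and worker steps above can be sketched with standard threads and queues. Several points are assumptions made for the example: the function name, the use of one dedicated set of workers per queue, the shutdown by sentinel values, and the `backend` callable that stands in for the HPSS call.

```python
import queue
import threading


def serve(requests, backend, workers_per_queue=2):
    """Dispatch (request_id, nfs_version, payload) tuples and collect replies."""
    # Two pending lists: V4 and V2/V3 requests are kept separate.
    pending = {"v4": queue.Queue(), "v2v3": queue.Queue()}
    replies = {}
    replies_lock = threading.Lock()

    def worker(tbuf_list):
        while True:
            entry = tbuf_list.get()
            if entry is None:  # sentinel: no more work for this worker
                break
            request_id, payload = entry
            result = backend(payload)  # the operation done in the backend
            with replies_lock:
                replies[request_id] = result  # reply sent back by the worker

    threads = [
        threading.Thread(target=worker, args=(tbuf_list,))
        for tbuf_list in pending.values()
        for _ in range(workers_per_queue)
    ]
    for thread in threads:
        thread.start()

    # Dispatcher: store each request in the list matching its protocol version.
    for request_id, nfs_version, payload in requests:
        key = "v4" if nfs_version == 4 else "v2v3"
        pending[key].put((request_id, payload))

    # One sentinel per worker, queued after all real requests.
    for tbuf_list in pending.values():
        for _ in range(workers_per_queue):
            tbuf_list.put(None)
    for thread in threads:
        thread.join()
    return replies


result = serve([(1, 4, "read"), (2, 3, "getattr"), (3, 2, "lookup")], str.upper)
```

Keeping the dispatcher free of backend calls means a slow storage operation only blocks the worker handling it, not the intake of new requests.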
The FSAL layer is specific to the use of the NFS server with HPSS. It will be released under the terms of the HPSS licence
The other layers have no dependency on HPSS. They will be provided under the terms of the CeCILL licence (CEA’s free software licence). They can be freely used with other FSAL modules based on other file systems.
06/05: HPSS/FSAL, Cache Inode and File Content layers development almost complete
Summer 05: complete the integration with the RPC layers. The daemon should be NFSv2/NFSv3 capable and implement a subset of the NFSv4 protocol sufficient to be functional. Security support fully provided.
Autumn 05: delegation support and named attributes support to be added to the daemon. Validation of non-Unix clients
12/05: first full version of the product.
After that: other FSAL modules to be developed, including a specific “small files pack” FSAL.
Hypothesis: the small files are located in the same directory tree
A file containing a file system image (mountable on loopback) is created. The directory tree with the small files is stored in the file system internal to this file.
The file system image is stored in HPSS (as a big file), but is seen as an NFSv4 fileset, with an explicit export entry in the daemon configuration file
When accessed the first time, the big “fs image” file is stored in a cache dedicated to “fs image” caching.
Further operations on this fileset are performed inside the fs image (the NFS daemon can browse the fs image in user space)
HPSS sees only one big file; the small files exist only as seen by the NFS daemon inside the “fs image”.
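The “small files pack” idea can be sketched in a few lines. To stay self-contained, the example swaps the loopback-mountable file system image for a tar archive, which is only a stand-in: the point illustrated is that the backend stores one big object, the image is fetched once into a dedicated cache, and later accesses are resolved in user space. All names are hypothetical.

```python
import io
import tarfile


def pack_small_files(files):
    """Pack {path: bytes} into one archive, standing in for the fs image."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for path, data in sorted(files.items()):
            info = tarfile.TarInfo(name=path)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


class ImageCache:
    """Cache dedicated to images: each one is fetched from the backend once."""

    def __init__(self, fetch_image):
        self._fetch_image = fetch_image  # callable returning the image bytes
        self._images = {}
        self.fetch_count = 0

    def read(self, image_name, path):
        if image_name not in self._images:
            # First access: bring the whole big file into the cache.
            self._images[image_name] = self._fetch_image(image_name)
            self.fetch_count += 1
        # Later accesses browse the image in user space.
        image = io.BytesIO(self._images[image_name])
        with tarfile.open(fileobj=image, mode="r") as archive:
            return archive.extractfile(path).read()


image_bytes = pack_small_files({"a/x.txt": b"1", "a/y.txt": b"22"})
cache = ImageCache(lambda name: image_bytes)  # stands in for a backend read
x = cache.read("project.img", "a/x.txt")
y = cache.read("project.img", "a/y.txt")
```

The backend is asked for a single object regardless of how many small files are read afterwards.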
Other FSALs will be provided
FSAL for ext3fs and Reiserfs
FSAL for LUSTRE
FSAL implementing a user-space NFSv4 client to build an NFSv4 proxy (server on one side, client on the other)
Other GSSAPI qualities of protection will be supported as soon as Linux supports them.