Implementing NFSv4 for HPSS: the GANESHA architecture
Philippe DENIEL, CEA/DAM
email@example.com

Scope of this presentation
There are three goals in this presentation:
Describing the NFSv4 enhancements compared to NFSv2 and NFSv3
Describing the design and architecture of our implementation of the NFS protocols on top of HPSS: the GANESHA architecture
Showing how these new features can be interesting for the HPSS community

NFSv4 is an IETF protocol
NFSv3 was discussed a little more openly: several companies took part in the design of the protocol
NFSv4 is the result of an IETF working group, like TCP, UDP or IPv6. The design process started in 1997 and ended with the publication of RFC 3530 in 2003.

A more integrated protocol
NFSv4 is a standalone protocol; it requires no ancillary protocol
The Mount protocol, NLM and NSM (rpc.statd) are no longer needed
Port 2049 is the only network resource required by NFSv4
This value is written explicitly in the RFC
NFSv4 is not bound to Unix semantics
File system objects' attributes are exposed as self-described bitmaps, not as a Unix-like structure (see the sketch below)
NFSv4 is firewall friendly
NFSv4 is not bound to Unix
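
A minimal sketch of the self-described attribute bitmaps: in the bitmap4 type of RFC 3530, attribute number N is bit N%32 of 32-bit word N/32, so a client asks only for the attributes it wants. The attribute numbers below are from RFC 3530; the helper names are ours.

```c
#include <stdio.h>
#include <stdint.h>

/* Attribute numbers from RFC 3530. */
#define FATTR4_TYPE   1
#define FATTR4_SIZE   4
#define FATTR4_MODE   33
#define FATTR4_OWNER  36

#define BITMAP_WORDS  2   /* enough for attributes 0..63 */

static void bitmap_set(uint32_t *bm, unsigned attr)
{
    bm[attr / 32] |= (uint32_t)1 << (attr % 32);
}

static int bitmap_isset(const uint32_t *bm, unsigned attr)
{
    return (bm[attr / 32] >> (attr % 32)) & 1;
}

int main(void)
{
    uint32_t attrmask[BITMAP_WORDS] = {0, 0};

    /* Ask for type, size, mode and owner in a single GETATTR mask. */
    bitmap_set(attrmask, FATTR4_TYPE);
    bitmap_set(attrmask, FATTR4_SIZE);
    bitmap_set(attrmask, FATTR4_MODE);
    bitmap_set(attrmask, FATTR4_OWNER);

    printf("bitmap words: 0x%08x 0x%08x\n",
           (unsigned)attrmask[0], (unsigned)attrmask[1]);
    printf("mode requested? %s\n",
           bitmap_isset(attrmask, FATTR4_MODE) ? "yes" : "no");
    return 0;
}
```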

Cross-Platform interoperability
No link to Unix structures: information is managed as bitmaps
Users and groups are managed as "user@domain" strings, not as numeric ids (see the sketch after this list)
NFSv4 can export file systems with reduced attributes (like PC FAT) or with extended attributes
Access Control Lists are natively supported
The ACL model suits the needs of diverse ACL models (POSIX and NT)
Windows OPEN/CLOSE semantics are supported
Non-Latin characters are supported via UTF-8
NFSv4 will support minor versioning for protocol extensions
NFSv4 is ready for IPv6
NFSv4 is not dedicated to Unix clients
(diagram: the same user string maps to Uid = 6734 on a Unix client and to SID = 12-34-5678-9012 on a Windows client)
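
A minimal sketch of this identity handling: the protocol carries "user@domain" strings on the wire, and each client maps them to its own native ids. The lookup table and names below are hypothetical stand-ins for a real idmapping service (a daemon querying the local passwd database, LDAP, etc.).

```c
#include <stdio.h>
#include <string.h>

struct idmap_entry { const char *name; unsigned uid; };

/* Hypothetical local table standing in for a real idmapping service. */
static const struct idmap_entry idmap[] = {
    { "alice@example.org", 6734 },
    { "bob@example.org",   6735 },
};

/* Map an on-the-wire "user@domain" string to a local uid; -1 if unknown. */
static long map_owner_to_uid(const char *owner)
{
    size_t i;
    for (i = 0; i < sizeof(idmap) / sizeof(idmap[0]); i++)
        if (strcmp(idmap[i].name, owner) == 0)
            return (long)idmap[i].uid;
    return -1;
}

int main(void)
{
    const char *owner = "alice@example.org";
    printf("%s -> uid %ld\n", owner, map_owner_to_uid(owner));
    return 0;
}
```

A Windows client would run the same lookup against its SID database instead, which is what the diagram above illustrates.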

NFSv4 is security oriented
Only ports 2049/tcp and 2049/udp will be used; no other port is required
NFSv4 is ONC RPC based
RPCSEC_GSS is explicitly supported
Every security mechanism with a GSSAPI integration can be used with RPCSEC_GSS: krb5, LIPKEY, SPKM-3 (see the sketch below)
NFSv4 is connection oriented
Connection-based security (like SSL) is possible
RFC 3530 recommends not using SSL, but using LIPKEY via GSSAPI and RPCSEC_GSS instead
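
A sketch of what setting up a Kerberos-secured RPC handle looks like on the client side, assuming a TI-RPC stack that provides rpc_gss_seccreate() (e.g. Solaris, or libtirpc built with GSS support); the server name and principal are placeholders.

```c
#include <stdio.h>
#include <rpc/rpc.h>
#include <rpc/rpcsec_gss.h>

#define NFS_PROGRAM_NUM 100003
#define NFS_V4          4

int main(void)
{
    CLIENT *clnt = clnt_create("nfs-server.example.org",
                               NFS_PROGRAM_NUM, NFS_V4, "tcp");
    if (clnt == NULL) {
        clnt_pcreateerror("clnt_create");
        return 1;
    }

    /* krb5 mechanism with integrity protection; LIPKEY or SPKM-3 would
     * be selected by passing another mechanism name here. */
    clnt->cl_auth = rpc_gss_seccreate(clnt, "nfs@nfs-server.example.org",
                                      "kerberos_v5", rpc_gss_svc_integrity,
                                      NULL /* default QOP */, NULL, NULL);
    if (clnt->cl_auth == NULL) {
        fprintf(stderr, "rpc_gss_seccreate failed\n");
        return 1;
    }

    /* ... issue NFSv4 COMPOUND calls over the secured handle ... */
    auth_destroy(clnt->cl_auth);
    clnt_destroy(clnt);
    return 0;
}
```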

NFSv4 Caching capabilities
The client can do many things in one call: the client/server dialog is reduced and becomes more flexible (see the sketch below)
The client can perform the request that best fits its caching policy
Elementary operations are dedicated to cache validation on the client side (OP_VERIFY, OP_NVERIFY)
An NFSv4 client has the capability to handle a file locally, in its cache, for a given time period (delegation mechanism)
NFSv4 could be very efficient at large scale through proxy caching
Proxies with policies similar to HTTP proxies can cache files accessed by a pack of clients
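
A toy model of the COMPOUND mechanism behind "many things in one call": several elementary operations travel in a single RPC, and an NVERIFY can guard the rest of the request for cache validation. The opcode values follow RFC 3530; the structures are simplified stand-ins for the real XDR types.

```c
#include <stdio.h>
#include <stdint.h>

/* Opcode values from RFC 3530. */
enum { OP_PUTROOTFH = 24, OP_LOOKUP = 15, OP_NVERIFY = 17, OP_READ = 25 };

struct nfs_op { uint32_t opcode; const char *arg; /* simplified */ };

struct compound {
    const char   *tag;           /* free-form label for the request */
    uint32_t      minorversion;
    size_t        nops;
    struct nfs_op ops[8];
};

int main(void)
{
    /* One round trip: set the root FH, walk to the file, check the
     * cached attributes, and read the first bytes of data. */
    struct compound req = {
        "validate+read", 0, 5,
        { { OP_PUTROOTFH, NULL },
          { OP_LOOKUP,  "dir" },
          { OP_LOOKUP,  "file" },
          { OP_NVERIFY, "change != cached change attr" },
          { OP_READ,    "offset=0,count=4096" } }
    };

    for (size_t i = 0; i < req.nops; i++)
        printf("op %zu: opcode %u (%s)\n", i, req.ops[i].opcode,
               req.ops[i].arg ? req.ops[i].arg : "-");
    return 0;
}
```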

NFSv4 and its use with HPSS (1)
Aggressive caching on both client and server sides reduces the number of requests performed on the HPSS system
HPSS-specific information could be obtained, on a per-file-handle basis, via NFSv4 named attributes (see the sketch below):
Class of Service (set / get)
Storage Class related information (get only)
Migration state (get only)
Site-local attributes (set / get / create)
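
A sketch of what reading such an attribute could look like from a client OS that exposes NFSv4 named attributes as a hidden attribute directory (Solaris does, through the O_XATTR flag to openat()). The path and the attribute name "COS" are hypothetical; the actual names would be defined by the GANESHA/HPSS server.

```c
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    char buf[64];
    int file = open("/hpss/somefile", O_RDONLY);
    if (file < 0) { perror("open"); return 1; }

    /* Open the named attribute "COS" attached to the file. */
    int attr = openat(file, "COS", O_RDONLY | O_XATTR);
    if (attr < 0) { perror("openat(O_XATTR)"); return 1; }

    ssize_t n = read(attr, buf, sizeof(buf) - 1);
    if (n >= 0) {
        buf[n] = '\0';
        printf("Class of Service: %s\n", buf);
    }
    close(attr);
    close(file);
    return 0;
}
```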

NFSv4 and its use with HPSS (2)
Secured mount points could be used to safely share files between remote sites (Kerberos 5 and possibly SPKM-3 or LIPKEY)
Both TCP and UDP are supported as the RPC transport layer
Fileset semantics and junction traversal are natively supported, with potential server indirection and security re-negotiation
Non-Unix clients can be used (see Hummingbird's NFS Maestro)

The GANESHA Architecture
RPC Layer: manages ONC RPC / RPCSEC_GSS requests
FSAL: File System Abstraction Layer, provides access to the exported file system (see the sketch below)
Cache Inode layer and File Content layer: cache the metadata and the file content related to the managed FSAL objects
Admin Layer: generic interface for administrative operations on the daemon (stop, start, statistics, ...)
ECL: External Control Layer, provides a way to administer the server from the outside, on a client/server basis
Internal Logging / External Logging
Memory management: resources are allocated at the server's boot and managed internally
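
A minimal sketch of what a File System Abstraction Layer can look like: a table of function pointers that the upper layers call without knowing the backend. The names and signatures here are hypothetical, not GANESHA's actual FSAL API; a real HPSS FSAL would wrap the HPSS client library behind these entry points.

```c
#include <stdio.h>
#include <stddef.h>
#include <sys/stat.h>

typedef struct { unsigned char data[64]; } fsal_handle_t;

struct fsal_ops {
    int (*lookup)(const fsal_handle_t *dir, const char *name,
                  fsal_handle_t *out);
    int (*getattr)(const fsal_handle_t *obj, struct stat *st);
    ssize_t (*read)(const fsal_handle_t *obj, void *buf,
                    size_t len, off_t offset);
};

/* Stub backend standing in for an HPSS-based implementation. */
static int stub_lookup(const fsal_handle_t *d, const char *n, fsal_handle_t *o)
{ (void)d; (void)n; (void)o; return 0; }
static int stub_getattr(const fsal_handle_t *o, struct stat *st)
{ (void)o; st->st_size = 42; return 0; }
static ssize_t stub_read(const fsal_handle_t *o, void *b, size_t l, off_t off)
{ (void)o; (void)b; (void)off; return (ssize_t)l; }

static const struct fsal_ops hpss_fsal =
    { stub_lookup, stub_getattr, stub_read };

int main(void)
{
    fsal_handle_t root = {{0}}, obj;
    struct stat st;
    if (hpss_fsal.lookup(&root, "somefile", &obj) == 0 &&
        hpss_fsal.getattr(&obj, &st) == 0)
        printf("size = %lld\n", (long long)st.st_size);
    return 0;
}
```

Swapping the backend then means swapping the table, which is why the HPSS-independent layers can be reused with FSAL modules for other file systems.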

Architecture's objectives
Cache Inode uses Red-Black-Tree-based hash tables, designed to manage 10^5 to 10^6 entries at the same time; the objective is to keep a few days of production in this cache (see the sketch below)
The File Content cache will be preserved across a server crash recovery
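
A sketch of the "hash table of balanced trees" idea: entries are hashed into buckets and each bucket is a balanced binary tree, so even a crowded bucket keeps O(log n) lookups. POSIX tsearch() (commonly implemented as a red-black tree) stands in here for GANESHA's own RBT code; keys and sizes are illustrative.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <search.h>

#define NBUCKETS 1024

static void *buckets[NBUCKETS];   /* one tree root per bucket */

static unsigned hash_str(const char *s)
{
    unsigned h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h % NBUCKETS;
}

static int cmp(const void *a, const void *b)
{
    return strcmp((const char *)a, (const char *)b);
}

static void insert(const char *key)
{
    unsigned b = hash_str(key);
    if (tfind(key, &buckets[b], cmp) == NULL)
        tsearch(strdup(key), &buckets[b], cmp);
}

static int contains(const char *key)
{
    return tfind(key, &buckets[hash_str(key)], cmp) != NULL;
}

int main(void)
{
    insert("/hpss/dir/file1");
    insert("/hpss/dir/file2");
    printf("file1 cached? %d\n", contains("/hpss/dir/file1"));
    printf("file3 cached? %d\n", contains("/hpss/dir/file3"));
    return 0;
}
```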

NFS Daemon architecture
(diagram: request flow through the daemon; V4 and V2/V3 requests wait in separate tbuf lists of pending entries, the V4 entries being keyed on NFS credentials)
1: Client sends a request to the server
2: The dispatcher thread stores each request in a tbuf entry; V4 and V2/V3 requests are separated
3: The request is decoded and waits in a tbuf entry (V2/V3 and V4 requests are kept in separate lists)
4: An available worker thread picks up a waiting tbuf entry
5: The requested operation is performed in HPSS (one V4 bulk request can produce several calls to HPSS)
6: Results of the request are replied to the client by the worker thread
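
A minimal pthread sketch of the dispatcher/worker scheme above: a dispatcher enqueues incoming requests into per-protocol pending lists and a pool of workers drains them. Queue contents and the HPSS call are reduced to placeholders; real code would decode and execute the RPC here.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

struct tbuf { int reqid; struct tbuf *next; };

struct queue {
    struct tbuf    *head, *tail;
    pthread_mutex_t mtx;
    pthread_cond_t  cv;
};

static struct queue q_v4 =
    { NULL, NULL, PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER };
static struct queue q_v2v3 =
    { NULL, NULL, PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER };

static void enqueue(struct queue *q, int reqid)
{
    struct tbuf *e = malloc(sizeof(*e));
    e->reqid = reqid; e->next = NULL;
    pthread_mutex_lock(&q->mtx);
    if (q->tail) q->tail->next = e; else q->head = e;
    q->tail = e;
    pthread_cond_signal(&q->cv);
    pthread_mutex_unlock(&q->mtx);
}

static int dequeue(struct queue *q)
{
    pthread_mutex_lock(&q->mtx);
    while (q->head == NULL)
        pthread_cond_wait(&q->cv, &q->mtx);
    struct tbuf *e = q->head;
    q->head = e->next;
    if (q->head == NULL) q->tail = NULL;
    pthread_mutex_unlock(&q->mtx);
    int id = e->reqid;
    free(e);
    return id;
}

static void *worker(void *arg)
{
    struct queue *q = arg;
    for (;;) {
        int id = dequeue(q);                  /* step 4: pick a tbuf entry */
        printf("worker: HPSS call for request %d\n", id);  /* step 5 */
        /* step 6: the reply to the client would be sent here */
    }
    return NULL;
}

int main(void)
{
    pthread_t w1, w2;
    pthread_create(&w1, NULL, worker, &q_v4);
    pthread_create(&w2, NULL, worker, &q_v2v3);

    /* Dispatcher (step 2): separate V4 from V2/V3 requests. */
    for (int i = 0; i < 4; i++)
        enqueue(i % 2 ? &q_v4 : &q_v2v3, i);

    sleep(1);   /* let the workers drain the toy queues, then exit */
    return 0;
}
```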

GANESHA and HPSS
The other layers have no dependence on HPSS. They will be provided under the terms of the CeCILL licence (CEA's free software licence) and can be freely used with other FSAL modules based on other file systems.

What is available now?
Summer 05: complete the integration with the RPC layers. The daemon should be NFSv2/NFSv3 capable and implement part of the NFSv4 protocol, but enough to be functional. Security support fully provided.
Autumn 05: delegation support and named attribute support to be added to the daemon. Validation of non-Unix clients.
12/05: first full version of the product.
After that: other FSAL modules to be developed. A "small files pack" specific FSAL to be added.

Small file management: a possible solution
A file containing a file system image (mountable on loopback) is created. The directory tree with the small files is stored in the file system internal to this file.
The file system image is stored in HPSS (as one big file), but is seen as an NFSv4 fileset, with an explicit export entry in the daemon configuration file.
When accessed for the first time, the big "fs image" file is stored in a cache dedicated to "fs image" caching.
Further operations on this fileset are done internally to the fs image (the NFS daemon can browse the fs image in user space, as sketched below).
HPSS sees only one big file; the small files exist only as the NFS daemon sees them inside the "fs image".
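
A toy sketch of browsing such an image in user space: the daemon opens the big image file and resolves small files through an on-image directory table, without the kernel ever mounting the image. The "table at offset 0" layout and file names are invented for illustration; a real implementation would parse an actual file system format.

```c
#include <stdio.h>
#include <string.h>

struct dirent_img { char name[32]; long offset; long length; };

int main(void)
{
    FILE *img = fopen("fsimage.bin", "rb");
    if (!img) { perror("fopen"); return 1; }

    /* Read the (toy) directory table stored at the start of the image. */
    struct dirent_img table[128];
    size_t n = fread(table, sizeof(table[0]), 128, img);

    /* Resolve a small file by name, then read its bytes from the image. */
    for (size_t i = 0; i < n; i++) {
        if (strcmp(table[i].name, "small_file.txt") == 0) {
            char buf[256];
            size_t want = table[i].length < 255 ? (size_t)table[i].length : 255;
            fseek(img, table[i].offset, SEEK_SET);
            size_t got = fread(buf, 1, want, img);
            buf[got] = '\0';
            printf("small_file.txt: %s\n", buf);
        }
    }
    fclose(img);
    return 0;
}
```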