
WAFL Overview



Presentation Transcript


    1. WAFL Overview NetApp Spotlight Series

    2. 2 WAFL: Write Anywhere File Layout Filesystem for Improved Productivity Earlier we talked about disk capacity increasing but disk access times not. NetApp’s patented file system is called WAFL, which stands for Write Anywhere File Layout. WAFL always writes to the nearest available free block, as opposed to a preallocated location on disk. Conventional file systems, such as the Veritas File System (VxFS), NTFS, or the Berkeley Fast File System used in Solaris and HP-UX, all write to pre-allocated locations on disk, meaning that lots of disk seeking must occur. With WAFL we write out a stripe and hopefully don’t even move the heads to write the next stripe (if there is free space in the same cylinder). If not, it’s hopefully just one head click away. Quite simply, we minimize head seeking as much as possible.

    3. 3 Write Anywhere? Why do we do this? We are picking VERY carefully which blocks we write to. How does this affect efficiency?

    4. 4 WAFL Architecture Overview

    5. 5 WAFL uses integrated RAID4 RAID4 is similar to the better-known RAID5: RAID5: parity is distributed across all disks in the RAID group RAID4: parity is contained on a single disk in the RAID group Tradeoff with the single-parity-disk RAID4 model: CON: The parity disk can become the ‘hot spot’ or bottleneck of the RAID group, because every small write must also update parity on that one disk. RAID-3 typically uses a very small stripe width, sometimes as small as one byte per disk. The result: RAID-3 accesses all the disks in the group at one time, and can only execute one I/O request at a time. With RAID4, access to each disk becomes independent. The stripe size is sufficiently large that the majority of I/Os to the group will only affect a single disk. This allows the RAID-4 group to execute multiple I/O requests simultaneously (assuming they map to different member disks).
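
The parity mechanics behind this slide can be sketched in a few lines. This is a toy model (4-byte "blocks" instead of 4 KB, illustrative names), showing why a small write to one data disk always drags the parity disk along:

```python
# Sketch of RAID-4 parity: the parity block is the XOR of the data
# blocks in a stripe. Toy 4-byte "blocks"; real blocks are 4 KB.

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for i, byte in enumerate(b):
            out[i] ^= byte
    return bytes(out)

data_disks = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xff\x00\xff\x00"]
parity = xor_blocks(data_disks)

# Updating ONE data block forces a parity update too -- this is why
# the single parity disk becomes the hot spot under small random writes.
old, new = data_disks[1], b"\xaa\xbb\xcc\xdd"
new_parity = xor_blocks([parity, old, new])  # parity ^ old ^ new
data_disks[1] = new
assert new_parity == xor_blocks(data_disks)
```

Note the update path: the controller reads the old data and old parity, XORs both with the new data, and writes two disks. Under random small writes, the parity disk is written on every operation.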

    6. 6 WAFL eliminates the parity disk bottleneck WAFL overcomes the ‘classic’ parity-disk bottleneck through flexible write allocation policies: Writes any filesystem block to any disk location (data and metadata)* New data does not overwrite old data Allocates disk space for many client write operations at once, in a single new RAID-stripe write (parity is computed once per stripe, with no read-modify-write) Writes to stripes that are near each other Writes blocks to disk in any order
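
A minimal sketch of the full-stripe write described above (toy 4-byte blocks, illustrative names): because WAFL buffers many client writes and lays them out as a complete new stripe, parity is computed from the buffered data alone, with no read of old data or old parity, and the parity disk sees exactly one write per stripe, the same as every data disk:

```python
# Sketch: a full-stripe write computes parity from the buffered client
# writes alone -- no read-modify-write, so the parity disk does the
# same one write per stripe as each data disk.

from functools import reduce

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def full_stripe_write(buffered_blocks):
    """Lay out N buffered data blocks plus one parity block as a new stripe."""
    parity = reduce(xor, buffered_blocks)
    return buffered_blocks + [parity]  # one write per disk, parity included

stripe = full_stripe_write([b"AAAA", b"BBBB", b"CCCC"])
assert len(stripe) == 4                          # 3 data disks + 1 parity
assert reduce(xor, stripe) == bytes(4)           # data ^ parity == zeros
```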

    7. 7 Result: Minimal seeks and no bottleneck Berkeley Fast File System (FFS) Assigns blocks to fixed disk locations, as physically close together as possible on a single disk, optimized for single-file-at-a-time access Apply it to NFS and the disk heads fly about madly Write Anywhere File Layout (WAFL) Writes blocks anywhere it finds convenient, close to the disk heads’ current positions The previous version of a changed block is not overwritten (it’s either retained or marked free) WAFL then logically threads a single file’s current blocks by updating the block pointers; it’s easy to adjust the pointers in the “inode” The result: reduced disk seek/latency time*
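
The copy-on-write pointer threading on this slide can be sketched as follows. This is a toy model, not WAFL internals: a dict stands in for the disk, a counter stands in for "allocate near the heads", and the "inode" is just a list of block pointers:

```python
# Sketch of "write anywhere" copy-on-write: a changed block is written
# to a newly allocated location and only the inode's block pointer is
# updated; the old block stays put (marked free, or retained).

disk = {}        # block_number -> bytes
next_free = [0]  # toy allocator: hands out the "nearest" free block

def alloc_write(data):
    bno = next_free[0]
    next_free[0] += 1
    disk[bno] = data
    return bno

inode = {"blocks": [alloc_write(b"v1-block0"), alloc_write(b"v1-block1")]}

# Overwrite logical block 1: new physical block, then a pointer swap.
old_bno = inode["blocks"][1]
inode["blocks"][1] = alloc_write(b"v2-block1")

assert disk[old_bno] == b"v1-block1"             # old version untouched
assert disk[inode["blocks"][1]] == b"v2-block1"  # file sees new data
```

Because the old block survives unmodified, retaining it (rather than freeing it) is all a snapshot needs, which is the root of NetApp's snapshot mechanism.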

    8. 8 WAFL Combined with NVRAM WAFL uses NVRAM “consistency points” (NetApp’s flavor of journaling), assuring filesystem integrity and fast reboots. A CP flush to disk occurs once every 10 seconds, or when NVRAM reaches half full. NVRAM logging happens at the file-system operation level, not at the (more typical) block level. This assures self-consistent CP flushes to disk. No fsck!

    9. 9 General-purpose NV-RAM

    10. 10 NVRAM and memory – key points Main memory is the write cache The NVRAM is not the write cache; it is a redo log Once written, we never even look at it again, unless a controller fault occurs before a CP is complete, in which case we redo the operations in it “NVRAM-limited performance” is a myth: write throughput is limited by the disks or the controller Redo logging is very space-efficient: record only changed data, a big win for small writes
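
The space-efficiency point can be made concrete with a sketch of a redo log. This is an illustrative model, not the NVRAM format: each record carries just the operation (file, offset, new bytes), so a 3-byte write costs a few bytes of log, not a whole 4 KB dirty block:

```python
# Sketch of why redo logging is space-efficient: the log records the
# *operation*, not the dirtied block. Replay redoes operations after
# a fault, exactly as the slide describes.

def log_record(file_id, offset, data):
    """A redo record: just enough to replay the write after a crash."""
    return {"file": file_id, "offset": offset, "data": data}

def replay(records, files):
    """Redo each logged operation against in-memory file images."""
    for r in records:
        buf = files.setdefault(r["file"], bytearray())
        end = r["offset"] + len(r["data"])
        if len(buf) < end:
            buf.extend(b"\x00" * (end - len(buf)))
        buf[r["offset"]:end] = r["data"]

# A 3-byte write costs a ~3-byte record, not a full 4 KB block.
records = [log_record(7, 4096, b"abc")]
files = {}
replay(records, files)
assert bytes(files[7][4096:4099]) == b"abc"
```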

    11. 11 Seek Example in a SAN environment Assume 4K disk blocks, 2.5 ms for one seek+rotate, and an ideal 200 MB/sec FC path. 200 MB/sec FC bandwidth × 0.0025 sec = 0.5 MB worth of data blocks not sent on the channel during that seek. 0.5 MB × 1 block/4 KB = 128 blocks not sent. Therefore a 2.5 ms seek for just 1 block equates to a 128-block penalty. Conclusion: one seek every 128 blocks or fewer (~1%) wastes at least half of your FC bandwidth!
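
The arithmetic above checks out; here it is as a sketch (using binary megabytes, which is how the slide's 0.5 MB / 4 KB = 128 works out):

```python
# The slide's seek arithmetic: blocks of channel bandwidth lost per
# seek on an idealized 200 MB/s FC path with 4 KB blocks.

MB = 2**20
SEEK_S = 0.0025               # one seek + rotate: 2.5 ms
FC_BYTES_PER_S = 200 * MB     # idealized 200 MB/s FC path
BLOCK = 4 * 1024              # 4 KB block

bytes_unsent = FC_BYTES_PER_S * SEEK_S   # 0.5 MB of channel time lost
blocks_unsent = bytes_unsent / BLOCK
assert blocks_unsent == 128              # one seek costs ~128 blocks

# One seek per 128 sequential blocks (~1%) halves effective bandwidth:
# the seek time equals the transfer time of 128 blocks.
```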

    12. 12 The Protocol Overhead issue Unlike general-purpose file systems, WAFL has an intimate understanding of its underlying physical disk configuration. WAFL caches write operations that come in from the network, and then optimizes by performing multiple write operations all together within the same RAID-array stripe. The stripe is chosen based on its physical proximity to the location of the disk heads at the time of the operation. This behavior ensures that the single parity disk does not become a bottleneck within the system, as it typically would with a general-purpose file system. It also allows WAFL to achieve excellent write performance, since the disk heads never have to seek very far to write client data. Fragmentation is also not a significant issue with WAFL, as data belonging to the same file is always written to adjacent locations within the stripe.

    13. 13 The Protocol Overhead issue

    14. 14 Superior performance vs. Competition

    15. 15 Summary
