
Linux on zSeries Module 4: File Systems


lmatos

Presentation Transcript


  1. Linux on zSeries Module 4: File Systems

  2. Objectives • State two differences between hard links and symbolic links. • List four basic components of any file system. • State the purpose of file systems and metadata. • List the two categories of file systems and two characteristics of each. • Describe fsck and its role in conventional file systems. • Briefly describe use of the journal in the journaling file system. • Describe the goal of the ReiserFS file system and state its focus. • Briefly describe how ext3 provides an advantage in availability, data integrity, speed, and transition. • State the difference between regular journaling and ext3 journaling. • List the two types of journaling. • List and briefly describe the three data journaling modes. • List two benefits of tmpfs over disk based file systems.

  3. File systems • File systems store, retrieve and manipulate data. • File systems are needed to maintain an internal data structure that keeps all data organized and readily accessible. • Metadata is the term used to describe this internal data structure. • How metadata is set up differentiates between file systems and determines their performance characteristics. • The Linux file system driver is written specifically to manipulate metadata; it interacts with the metadata directly on behalf of the user.

  4. Basic concepts • Inodes

  5. Concepts • Links • Hard links • Can only be used within a single file system. • Can only point to files. • To add a link, create a directory entry whose inode number points to the inode, and increment the link count in the inode. • To delete/remove a filename, the kernel decrements the link count and deallocates the inode if this count becomes zero. • Symbolic links • They are simply files which contain a filename. • They can create cross-file system links. • They can point to any type of file. • Device special files • Do not use any space on the file system. • They are only an access point to the device driver. • Types of special files • Character special files: allow I/O operations in character mode. • Block special files: require data to be written in block mode via the buffer cache functions. • I/O requests on a special file are forwarded to a (pseudo) device driver. • The file is referenced by a major number, which identifies the device type, and a minor number, which identifies the unit.
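
The difference between the two link types can be observed from user space. The following is a minimal sketch, assuming a Linux system with Python 3 and a temporary directory on a file system that supports both link types; the file names are made up for illustration.

```python
import os
import tempfile


def link_demo():
    """Create a file, a hard link and a symbolic link, then remove the original name."""
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "data.txt")   # illustrative names
        hard = os.path.join(d, "hard.txt")
        soft = os.path.join(d, "soft.txt")
        with open(target, "w") as f:
            f.write("hello")

        os.link(target, hard)      # new directory entry pointing at the same inode
        os.symlink(target, soft)   # new file whose content is a path name

        result = {
            "same_inode": os.stat(target).st_ino == os.stat(hard).st_ino,
            "links_after_add": os.stat(target).st_nlink,
            # lstat looks at the symlink itself rather than what it points to
            "symlink_has_own_inode": os.lstat(soft).st_ino != os.stat(target).st_ino,
            "symlink_content": os.readlink(soft) == target,
        }

        os.remove(target)          # only decrements the link count
        result["links_after_remove"] = os.stat(hard).st_nlink
        with open(hard) as f:
            result["hard_link_still_readable"] = f.read() == "hello"
        # os.path.exists follows symlinks, so a dangling link reports False
        result["symlink_dangling"] = not os.path.exists(soft)
        return result
```

Removing the original name leaves the data reachable through the hard link, because the inode is only deallocated when its link count reaches zero, while the symbolic link is left pointing at a name that no longer exists.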

  6. Virtual File System layer (VFS) • VFS is an indirection layer in the kernel that deals with the file oriented system calls and calls the necessary functions in the physical file system code to do the I/O. • Indirection eases the integration and the use of several file system types. • When a process issues a file oriented system call, the kernel calls a function contained in the VFS. • The function handles the structure independent manipulations and redirects the call to a function contained in the physical file system code, which is responsible for handling the structure dependent operations. • Finally the file system code uses the buffer cache functions to request I/O on devices.
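
The indirection described above can be pictured as a dispatch table. This is a toy sketch in Python, not kernel code: the class names, the `read` method and the returned strings are all hypothetical, and the only point is that the caller talks to one layer, which picks the physical file system code by mount point.

```python
class Ext2Driver:
    """Stand-in for physical file system code (hypothetical)."""

    def read(self, path):
        return "ext2:" + path


class TmpfsDriver:
    """Stand-in for a second file system type (hypothetical)."""

    def read(self, path):
        return "tmpfs:" + path


class ToyVFS:
    """Structure-independent layer: resolves the mount point, then delegates."""

    def __init__(self):
        self.mounts = {}

    def mount(self, mountpoint, driver):
        self.mounts[mountpoint.rstrip("/") or "/"] = driver

    def _resolve(self, path):
        best = None
        for mp in self.mounts:
            prefix = mp if mp.endswith("/") else mp + "/"
            if path == mp or path.startswith(prefix):
                # the longest matching mount point wins
                if best is None or len(mp) > len(best):
                    best = mp
        if best is None:
            raise FileNotFoundError(path)
        relative = "/" + path[len(best):].lstrip("/")
        return self.mounts[best], relative

    def read(self, path):
        driver, relative = self._resolve(path)
        return driver.read(relative)   # structure-dependent work happens here


vfs = ToyVFS()
vfs.mount("/", Ext2Driver())
vfs.mount("/tmp", TmpfsDriver())
```

With this layout, `vfs.read("/etc/fstab")` is handled by the ext2 stand-in and `vfs.read("/tmp/a")` by the tmpfs stand-in, without the caller knowing which file system type is involved.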

  7. File system importance • File system recovery can add a significant amount of time to the recovery process. • Categories of file systems • Conventional • Contained within one physical device. • Byte-stream oriented: data from individual files are interleaved throughout the entire disk space managed by the file system. • Utilizes allocation map as to where data for each file resides. • Uses delayed write protocol which means recovery requires scanning the disk and reconstructing the file system. • Journaled • Keeps a record of all changes made to data held on file. • Recovery done quickly using this “journal” instead of restoring by hand.

  8. ext2 • ext2fs was released in Alpha version in January 1993. • It is based on the extfs code and was designed and implemented to fix some problems present in the first Extended File System. • ext2fs supports standard UNIX file types, regular files, directories, device special files and symbolic links, as well as managing file systems created on large partitions. • ext2fs provides long file names. • In addition to the standard UNIX features, ext2fs supports some extensions which are not usually present in UNIX file systems. • ext2fs attributes allow users to modify the kernel behavior when acting on a set of files. One can set attributes on a file or on a directory. In the latter case, new files created in the directory inherit these attributes. • Allows semantics to be selected at mount time. • Synchronous updates can be used in ext2fs. • ext2fs allows the administrator to choose the logical block size when creating the file system.

  9. Other file systems • NFS: network file system • Used for accessing remote file systems over the network. • Swap: swap file system • Not mounted; used to page out unused memory pages. • Procfs • Virtual file system that exists in memory. The kernel uses it to ensure key system info is available to programs through standard file operations. • Smbfs • Samba file system, allows file sharing with Windows clients.

  10. Fsck (file system check) • fsck aids the metadata driver. • When a Linux system boots, fsck starts up and scans all local file systems listed in the system's /etc/fstab file. • fsck ensures that the to-be-mounted file systems' metadata is in a usable state. • If an unexpected power failure or system lock-up prevents Linux from cleanly unmounting the file system, fsck detects this. • Upon reboot, fsck starts its scan, realizes that these file systems were not cleanly unmounted and makes a reasonable assumption that the file systems probably aren't ready to be seen by the Linux file system drivers. Chances are the metadata is messed up in some way.
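
Which file systems are checked at boot, and in what order, is commonly controlled by the sixth field of each /etc/fstab entry (the fsck pass number, where 0 means "do not check"). The sketch below parses a sample fstab text; the device names and entries are made up for illustration, and the parser ignores details such as escaped spaces in paths.

```python
SAMPLE_FSTAB = """\
# device      mountpoint  type   options   dump  pass
/dev/dasda1   /           ext3   defaults  1     1
/dev/dasdb1   /home       ext2   defaults  1     2
tmpfs         /dev/shm    tmpfs  size=32m  0     0
none          /proc       proc   defaults  0     0
"""


def fsck_order(fstab_text):
    """Return (mountpoint, type) pairs in the order fsck would visit them."""
    entries = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 3:
            continue
        # a missing sixth field is treated as 0, i.e. never checked
        passno = int(fields[5]) if len(fields) > 5 else 0
        if passno > 0:
            entries.append((passno, fields[1], fields[2]))
    return [(mountpoint, fstype) for _, mountpoint, fstype in sorted(entries)]
```

For the sample text, only the two disk-based entries are returned, root first; the tmpfs and proc entries have a pass number of 0 and are skipped.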

  11. Problem with fsck • fsck does exhaustive scan and sanity check on corrupted metadata, correcting any errors that it finds along the way. • After fsck fixes the file system, it is ready for use once again. • Problem: • You must scan a file system's entire metadata in order to ensure file system consistency. This is extremely time consuming, especially as file system size grows. • All this time, the Linux system is offline -- hurting availability and killing productivity.

  12. Solution to fsck problem : journaling • Journaling file systems solve this issue by adding a new data structure, the journal. • The journal is an on-disk structure. • Before the driver makes any changes to the metadata, it describes what it is about to do in an entry that is written to the journal. This creates a log that can later be used to check for consistency if the file system isn’t cleanly unmounted. • Then the driver modifies the metadata. • Therefore journaling file systems store data, metadata, and the journal. • Examples of file systems that use journaling: ReiserFS, XFS, JFS, ext3 and GFS.

  13. Journaling • fsck normally ignores the file system; it allows it to be mounted. • The Linux file system driver is the key to restoring the file system to a consistent state. • Checks to see if the file system is okay after it is mounted. • If the metadata needs to be fixed, it just looks at the journal (no exhaustive metadata scan). • Uses the chronological log of all metadata changes from the journal so that it can inspect exclusively the portions of the metadata that have been recently modified. • File system is brought back to a consistent state in seconds regardless of the file system size. • Journaling file systems add a higher level of reliability and faster recovery time.
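
The write-ahead idea behind the journal can be sketched with a toy model. This is an illustration only, not how any real driver is implemented: the class and its `crash_before_apply` flag are hypothetical, and each journal entry is assumed to reach the disk completely (real journals use commit records to detect partial entries).

```python
class ToyJournaledFS:
    """Toy model: journal first, then metadata; replay the journal after a crash."""

    def __init__(self):
        self.journal = []    # stands in for the on-disk journal
        self.metadata = {}   # stands in for the on-disk metadata

    def update(self, key, value, crash_before_apply=False):
        # 1. describe the intended change in the journal
        self.journal.append((key, value))
        if crash_before_apply:
            return           # simulated power failure: metadata not yet touched
        # 2. modify the metadata itself
        self.metadata[key] = value
        # 3. the change is in place, so the journal entry can be dropped
        self.journal.clear()

    def recover(self):
        """Replay pending entries; cost depends on journal length only."""
        replayed = len(self.journal)
        for key, value in self.journal:
            self.metadata[key] = value
        self.journal.clear()
        return replayed


fs = ToyJournaledFS()
fs.update("inode 12 size", 4096)
fs.update("inode 13 size", 100, crash_before_apply=True)
pending_before = len(fs.journal)
replayed = fs.recover()
```

Recovery here touches one entry, no matter how many keys the metadata holds, which is the reason recovery time is independent of file system size.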

  14. ReiserFS • A journaling file system designed and developed by Hans Reiser and his team of developers at Namesys. • Goal of system: create a single shared environment, or namespace, where applications can interact directly, efficiently, and powerfully. • Small file performance. • Other file systems like ext2 are good at storing lots of 20+ KB files, but not tons of very small files (well under 1 KB). • System performance and storage efficiency drop significantly with small files since space is allocated in either 1 or 4 KB chunks. • To resolve this problem, most systems store the files in a database above the file system, which requires building a layer on top of the file system. • A special-purpose solution means the file system isn’t meeting your needs and is causing you to add code to set up storage, caching mechanisms, and interfacing with the database library. • ReiserFS avoids this and its performance is 15-18 times faster than ext2 when handling files smaller than 1 KB in size.

  15. ReiserFS approach • Uses a specially optimized b* balanced tree (one per file system) to organize all file system data. • Host of features aimed specifically at improving small file performance. • Doesn't allocate storage space in fixed 1k or 4k blocks. • Instead, it allocates the exact size it needs. • Includes some special optimizations centered around tails (files and end portions of files smaller than a file system block).
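
The cost of fixed-size allocation for small files can be worked out with simple arithmetic. The sketch below assumes 100-byte files (an illustrative size, not taken from the slides) and ignores inode and tree overhead; it is not a model of the real ReiserFS on-disk layout.

```python
import math


def allocated_bytes(file_sizes, block_size):
    """Disk space used when every file occupies a whole number of blocks."""
    return sum(math.ceil(size / block_size) * block_size for size in file_sizes)


def tail_bytes(size, block_size):
    """Size of the tail: the final portion that does not fill a whole block."""
    return size % block_size


small_files = [100] * 1000                    # 1000 files of 100 bytes each
payload = sum(small_files)                    # bytes of actual file data
with_4k_blocks = allocated_bytes(small_files, 4096)
with_1k_blocks = allocated_bytes(small_files, 1024)
```

Under these assumptions 100,000 bytes of data occupy 4,096,000 bytes with 4 KB blocks and 1,024,000 bytes with 1 KB blocks, which is the gap that exact-size allocation and tail packing aim to close.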

  16. ReiserFS performance In order to increase performance, ReiserFS is able to store files inside the b*tree leaf nodes themselves, rather than storing the data somewhere else on the disk and pointing to it. Results: • Dramatically increases small file performance. Since the file data and the stat_data (inode) information are stored right next to each other, they can normally be read with a single disk I/O operation • ReiserFS is able to pack the tails together, saving a lot of space. In fact, a ReiserFS file system with tail packing enabled (the default) can store 6 percent more data than the equivalent ext2 file system, which is amazing in itself.

  17. ext3 file system • Designed by Dr. Stephen Tweedie. • Built on the framework of the existing ext2 file system. • Major difference: it supports journaling. • ext3 is a well-rounded file system. • Similar to ext2. • Does not match the small-file performance of ReiserFS. • But no unexpected performance or functionality hiccups either. • ext3 file system code was finally integrated into the official Linux kernel starting with the 2.4.15-pre2 release.

  18. Advantages of ext3 • Availability • With ext2, after an unclean system shutdown (unexpected power failure, system crash), each file system cannot be mounted until its consistency has been checked by the e2fsck program. • File systems that are several hundreds of gigabytes in size may take an hour or more to check, severely limiting availability. • ext3 does not require a file system check, even after an unclean system shutdown, unless there is a rare hardware failure case (e.g. hard drive failures). • Recovery time depends on the size of the journal used to maintain consistency, not on the size of the file system. • Data integrity • Stronger guarantees about data integrity in case of an unclean system shutdown. • You choose the type and level of protection that your data receives. • Speed • ext3 is often faster (higher throughput) than ext2 because ext3's journaling optimizes hard drive head motion. • You can choose from three journaling modes to optimize speed, optionally choosing to trade off some data integrity. Modes available are data=writeback, data=ordered, data=journal. • Easy transition

  19. Easy transition • On-disk format is identical to ext2. • Cleanly unmounted ext3 file system can be remounted as an ext2 file system with absolutely no problems. • Uses same metadata as ext2. • Possible to perform in-place ext2 to ext3 file system upgrades. • Transition is safe, reversible, and incredibly easy. • Don't need to back up and recreate your file systems from scratch.

  20. ext3 reliability • ext3 reliability is a benefit of metadata format • ext3 users gain access to a rock-solid fsck (file system check) tool. • An advantage of using a journaling file system is to avoid the need for an exhaustive fsck. • In case of flaky kernel, bad hard drive, or anything else unforeseen, it is good to have something to deal with corrupt metadata.

  21. Metadata-only journaling • ext3 handles journaling very differently from other journaling file systems. • The JFS driver journals metadata, but makes no provisions for journaling data. • With metadata-only journaling, file system metadata is secure; however, any recently modified data is not. • Problems with this method: If a file was being modified and the machine unexpectedly locked up, forcing a reboot: • File system metadata would be easily repaired, thanks to the metadata journal, and you wouldn't need to sit through fsck. • When the file is reloaded into a text editor, however, it will not simply be missing recent changes, but will contain a good amount of garbage and may even be completely unreadable.

  22. ext3 approach to journaling • Journaling code uses a special API called the Journaling Block Device layer, or JBD. • Designed to implement a journal on any kind of block device. • ext3 implements its journaling by "hooking in" to the JBD API. • ext3 file system code will inform JBD of modifications it is performing. • ext3 FS will also request permission from JBD before modifying data on disk. • Result: JBD can manage the journal on behalf of the ext3 file system driver. • A few important facts about the JBD-managed ext3 journal: • ext3's journal is stored in an inode (a file). • By storing the journal in an inode, ext3 is able to add the needed journal to the file system without requiring incompatible extensions to the ext2 metadata.

  23. Methods of implementing a journal • Logical journaling • Set up journal to store spans of bytes that need to be modified on the host file system. • Advantage: efficient storage of modifications by journal since it would only record the individual bytes that need to be modified and nothing more. • Example: used by XFS file system. • Physical Journaling • ext3 file system driver stores complete replicas of the modified blocks (either 1K, 2K, or 4K) in memory to track pending I/O operations. • JBD approach • Stores the complete modified file system blocks themselves. • JBD uses complete physical blocks as the underlying currency for implementing the journal. • Use of full blocks allows ext3 to perform some additional optimizations, such as "squishing" multiple pending I/O operations within a single block into the same in-memory data structure.
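
The trade-off between the two approaches can be illustrated by counting what each would record for the same set of writes. This is a simplified sketch: the 4 KB block size is an assumption, the function names are hypothetical, and per-record headers are ignored.

```python
BLOCK_SIZE = 4096  # assumed block size; ext3 also supports 1K and 2K


def logical_journal_bytes(writes):
    """Logical journaling: record only the byte spans that change."""
    return sum(len(data) for _, data in writes)


def physical_blocks(writes, block_size=BLOCK_SIZE):
    """Physical journaling: record every block touched, once per block."""
    blocks = set()
    for offset, data in writes:
        if not data:
            continue
        first = offset // block_size
        last = (offset + len(data) - 1) // block_size
        blocks.update(range(first, last + 1))
    return sorted(blocks)


def physical_journal_bytes(writes, block_size=BLOCK_SIZE):
    return len(physical_blocks(writes, block_size)) * block_size


# three pending writes as (byte offset, data); the last one straddles two blocks
writes = [(10, b"abc"), (100, b"de"), (4090, b"0123456789")]
```

The logical form records 15 bytes, while the physical form records two full blocks (8192 bytes); on the other hand, the first two writes fall in the same block and collapse into a single block record, which is the "squishing" optimization mentioned above.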

  24. ext3: data and metadata integrity • Provides both metadata and data journaling, offering two ways to avoid data corruption. • Originally, ext3 was designed to perform full data and metadata journaling. • In this mode (called "data=journal" mode), JBD journals all changes to the file system (both data and metadata). • A newer journaling mode (called "data=ordered" mode) provides the benefits of full journaling but without the performance penalty. • Journals metadata only. • Meanwhile the ext3 file system driver keeps track of the particular data blocks that correspond with each metadata update, grouping them into a single entity called a transaction. • When a transaction is applied to the file system proper, the data blocks are written to disk first; then the metadata changes are written to the journal.

  25. ext3 speed • Allows you to choose from one of three data journaling modes at file system mount time: • data=writeback mode • No form of data journaling at all. • Should give you the best ext3 performance under most conditions. • data=ordered • Only officially journals metadata, but it logically groups metadata and data blocks into a transaction. • For appending data to files provides all of the integrity guarantees of full data journaling. • For overwritten data and system crashes, the written region may be a combination of original and updated blocks. • Performs slightly slower than data=writeback file systems. • Performs significantly faster than full data journaling. • data=journal • Provides full data and metadata journaling. • All new data is written to the journal first, and then to its final location. In the event of a crash, the journal can be replayed, bringing both data and metadata into a consistent state. • Usually this is the slowest journaling mode of all, since data gets written to disk twice rather than once. • In certain situations where data needs to be read from and written to disk at the same time, ext3's data=journal mode actually turns out to have a major performance advantage in busy environments where interactive I/O performance needs to be maximized.
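
The three modes can be summarized as a small lookup table, together with the mount invocation that selects one. This is a sketch: the table only restates the slide above, the helper names are hypothetical, and the device and mount point passed in are placeholders. The command is built as a list and not executed.

```python
# What each mode journals, whether data reaches disk before the metadata
# commit, and how many times each data block is written.
MODES = {
    "writeback": {"journaled": ("metadata",), "data_first": False, "data_writes": 1},
    "ordered": {"journaled": ("metadata",), "data_first": True, "data_writes": 1},
    "journal": {"journaled": ("metadata", "data"), "data_first": True, "data_writes": 2},
}


def mount_command(device, mountpoint, mode):
    """Build (but do not run) a mount command selecting a journaling mode."""
    if mode not in MODES:
        raise ValueError("unknown ext3 data mode: " + mode)
    return ["mount", "-t", "ext3", "-o", "data=" + mode, device, mountpoint]


def data_bytes_written(payload_bytes, mode):
    """Bytes of file data hitting the disk, ignoring metadata and journal headers."""
    return payload_bytes * MODES[mode]["data_writes"]
```

For example, `mount_command("/dev/dasdb1", "/mnt/data", "journal")` yields the argument list for `mount -t ext3 -o data=journal /dev/dasdb1 /mnt/data`, and `data_bytes_written` shows why that mode is usually the slowest: every data block is written twice.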

  26. Elevator settings • Choosing elevator settings • Elevator is a generic algorithm used by most Linux block device drivers for scheduling block I/O. • ext3 may require smaller latency numbers. • Attempting to tune for maximum throughput at the expense of latency can actually decrease throughput while increasing latency. • Journaling all metadata changes also magnifies the effect of atime changes significantly. • Mount the file system with the noatime flag. • Set read latency (-r) to half of write latency (-w).

  27. Reasons for ext3 • ext3 is forward and backward compatible with ext2. • ext3 benefits from the long history of fixes and enhancements to the ext2 file system. • Anything added to ext2 can easily be extended to ext3 such as enabling access control lists for security parameters and HTrees that make directory operations extremely fast and highly scalable to very large directories. • ext3 provides and makes use of a generic journaling layer (jbd) which can be used in other contexts.

  28. Reasons for using ext3 with zSeries • Software or hardware faults can corrupt a file. • ext3 has broad cross-platform compatibility, working on 32- and 64-bit architectures, and on both little-endian and big-endian systems. • ext3 does not require extensive core kernel changes and requires no new system calls. • The e2fsck file system recovery program has a long and proven track record of successful data recovery.

  29. Tmpfs: Temporary file system • Virtual memory file system • Excellent RAM disk-like system available for Linux right now (2.4 kernel). • Tmpfs is like a ramdisk • Can use RAM, but also has option of using swap devices for storage. • Improvements over ramdisk • Tmpfs is a file system, not a block device. • Unlike ext3, ext2, JFS, and ReiserFS, which all exist on top of an underlying block device. • Can be directly mounted and used.

  30. Properties of tmpfs • Can use both RAM and swap. • Linux kernel's virtual memory resources come from both your RAM and swap devices. • VM subsystem in the kernel transparently (to the user) moves RAM pages to swap and vice-versa in order to allocate and manage the given resources to other parts of the system. • tmpfs file system requests pages from the VM subsystem to store files. • Tmpfs sits directly on top of VM (not block device). • Allows you to create file system with a simple mount command: • # mount tmpfs /mnt/tmpfs -t tmpfs

  31. Advantages of tmpfs over disk-based fs • Dynamic file system size • Initially it has a very small capacity. • As files are copied and created, the file system driver will allocate more VM and will dynamically increase the file system capacity as needed. • As files are removed, the tmpfs file system driver will dynamically shrink the size of the file system and free VM resources. • Speed • Since tmpfs resides completely in RAM, you benefit from almost instantaneous reads and writes. • During swaps performance is still excellent. • Swapped parts of the tmpfs are moved to RAM as soon as more free VM resources become available. • Ext3 is faster, however. • No persistence • tmpfs data is not preserved between reboots. • Makes it an excellent file system for holding data that you don't need to keep such as temporary files (those found in /tmp) and parts of the /var file system tree.

  32. Problems with tmpfs • Due to dynamic growth you can quickly run into low VM conditions or go so far as exhausting all your virtual memory. • With the 2.4.4 kernel, once you ran out of RAM or swap space, the kernel would immediately lock up. • With kernel 2.4.6 • When no more VM can be allocated you won't be able to write any new data to your tmpfs file system. • System becomes extremely sluggish and unresponsive: other processes on the system will be unable to allocate much more memory. • If the superuser can’t alleviate the problem, the kernel's last-resort mechanism to free memory will find the process hogging all the VM resources and kill it. • When tmpfs growth is to blame for VM exhaustion, this mechanism cannot kill the right process, since the memory is held by the file system rather than by a process.

  33. Solving growth problem • Specify a maximum upper bound for the file system size during the initial mount or remount. • Select the optimal maximum tmpfs size setting for your environment. • To find a good upper bound, monitor your system's swap usage during peak usage periods. • Specify an upper bound that's slightly less than the sum of all free swap and free RAM during these peak usage times. • Creating a tmpfs file system with a maximum size is easy. To create a new tmpfs file system with a maximum file system size of 32 MB, add this line to /etc/fstab: tmpfs /dev/shm tmpfs size=32m 0 0 • If you want to limit the file system size to 512 KB or 1 GB, you can specify size=512k and size=1g, respectively. • Can also limit the number of inodes.
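
The sizing rule above can be turned into a small calculation. This is a sketch under stated assumptions: the 10 percent safety margin is an arbitrary choice standing in for "slightly less", the helper names are hypothetical, and the free RAM and swap figures are expected to come from your own monitoring during peak usage.

```python
UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(text):
    """Convert a size option value such as '32m', '512k' or '1g' to bytes."""
    text = text.strip().lower()
    if text and text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)


def suggested_upper_bound(free_ram_bytes, free_swap_bytes, margin_percent=10):
    """Slightly less than free RAM plus free swap (margin is an assumption)."""
    total = free_ram_bytes + free_swap_bytes
    return total * (100 - margin_percent) // 100


def fstab_line(mountpoint, size):
    """Format an fstab entry in the same shape as the example in the slide."""
    return "tmpfs {} tmpfs size={} 0 0".format(mountpoint, size)
```

For instance, with 64 MB of free RAM and 128 MB of free swap at peak, `suggested_upper_bound` gives about 172 MB, which could then be rounded down to a convenient value for the `size=` option.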

  34. Mounts • Mounting on top of existing mountpoints • With a single command, your new tmpfs /tmp file system is mounted at /tmp, on top of the already-mounted partition. • Creates a stack of file systems. • Bind mounts • Allow you to mount all, or even part, of an already-mounted file system to another location, and have the file system accessible from both mountpoints at the same time (modifications will affect both). • Make it easy to make modifications to your file system layout. • You can share currently mounted /tmp file systems if you decide you want to use one from a new directory. • After a bind mount, if there are some directories that you don’t want to be usable from both locations, simply set the permissions of the individual directories/files. Although others’ files may map over, there will be no way to access them.
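
Both behaviours can be pictured with a toy mount table. This is only a model of the visible effect, not of the kernel's implementation: the class is hypothetical and each "file system" is just a Python dictionary of file names.

```python
from collections import defaultdict


class ToyMountTable:
    """Each mount point holds a stack; only the top file system is visible."""

    def __init__(self):
        self.stacks = defaultdict(list)

    def mount(self, mountpoint, fs):
        self.stacks[mountpoint].append(fs)

    def umount(self, mountpoint):
        return self.stacks[mountpoint].pop()

    def visible(self, mountpoint):
        stack = self.stacks.get(mountpoint)
        return stack[-1] if stack else None

    def bind(self, source, target):
        # the same file system object becomes reachable from a second place
        self.mount(target, self.visible(source))


table = ToyMountTable()
disk_tmp = {"old.txt": "on disk"}
ram_tmp = {}
table.mount("/tmp", disk_tmp)
table.mount("/tmp", ram_tmp)             # stacked on top; disk_tmp is hidden
hidden_while_stacked = "old.txt" not in table.visible("/tmp")

table.bind("/tmp", "/mnt/tmp")           # /mnt/tmp is an illustrative path
table.visible("/mnt/tmp")["new.txt"] = "written via bind mount"
seen_from_both = "new.txt" in table.visible("/tmp")

table.umount("/tmp")
visible_again = "old.txt" in table.visible("/tmp")
```

Unmounting the top entry reveals the file system underneath again, and a write through the bind location shows up at the original mount point because both refer to the same file system.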

  35. Conclusion • File systems store, retrieve and manipulate data. • They are composed of inodes, directories, links, and device special files and contain a VFS layer to allow new systems to be mounted. • Their structure, i.e. how they maintain metadata, is the key feature that sets them apart.

  36. Conclusion • Journaled file systems avoid the problems of conventional systems with fsck, by implementing a journal that logs all changes to metadata, making recovery easier and faster. • There are different file systems that focus on different options. ReiserFS focuses on small file performance and utilizes a b* balanced tree to organize these files for quick access without wasting storage space.

  37. Conclusion • ext3 is a great option for zSeries; it allows you all the features of ext2 plus more flexibility and functionality. • ext3 has advantages in availability, data integrity, speed, and easy transition. • ext3 implements journaling very differently; it can journal both metadata and data, so it offers stronger protection against data corruption. • ext3 implements a journaling block device layer that stores complete modified file system blocks themselves. • There are three modes: writeback, ordered, and journal; depending on your write latency and speed requirements, any can be chosen at mount time.

  38. Links • http://www-106.ibm.com/developerworks/edu/os-dw-linuxjfs-i.html • Link to register for a tutorial to install JFS on your Linux • http://www.zip.com.au/~akpm/linux/ext3/ installing ext3 learning • http://www-106.ibm.com/developerworks/library/l-fs.html
