
More on Disks and File Systems

CS-502 Operating Systems, Fall 2006

(Slides include materials from Operating System Concepts, 7th ed., by Silberschatz, Galvin, & Gagne and from Modern Operating Systems, 2nd ed., by Tanenbaum)

More on Disks and Files



Additional Topics

  • Mounting a file system

  • Mapping files to virtual memory

  • RAID – Redundant Array of Inexpensive Disks

  • Stable Storage

  • Log Structured File Systems

  • Linux Virtual File System


Summary of Reading Assignments in Silberschatz

  • Disks (general) – §12.1 to 12.6

  • File systems (general) – Chapter 11

    • Ignore §11.9, 11.10 for now!

  • RAID – §12.7

  • Stable Storage – §12.8

  • Log-structured File System – §11.8 & §6.9


    Mounting

    mount -t type device pathname

    • Attach device (which contains a file system of type type) to the directory at pathname

      • File system implementation for type gets loaded and connected to the device

      • Anything previously below pathname becomes hidden until the device is un-mounted again

      • The root of the file system on device is now accessed as pathname

  • E.g.,

    mount -t iso9660 /dev/cdrom /myCD
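The semantics above can be sketched with a toy mount table. The dictionary, function names, and longest-prefix lookup below are illustrative assumptions for the example, not how any real kernel stores its mount state:

```python
# Toy mount table (illustrative names; real kernels keep richer structures).
mounts = {"/": "ext3"}   # assumed root file system for the example

def mount(fstype, pathname):
    # Attach a file system at pathname; anything previously below is hidden.
    mounts[pathname] = fstype

def fs_for(path):
    # Resolve a path to the file system of its longest mounted prefix.
    prefixes = [m for m in mounts
                if path == m or path.startswith(m.rstrip("/") + "/")]
    return mounts[max(prefixes, key=len)]

mount("iso9660", "/myCD")          # mount -t iso9660 /dev/cdrom /myCD
print(fs_for("/myCD/readme.txt"))  # iso9660
print(fs_for("/home/user"))        # ext3
```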


    Mounting (continued)

    • The OS automatically mounts the devices listed in its mount table at initialization time

      • /etc/fstab in Linux

  • Type may be implicit in device

  • Users or applications may mount devices at run time, explicitly or implicitly — e.g.,

    • Insert a floppy disk

    • Plug in a USB flash drive


    Linux Virtual File System (VFS)

    • A generic file system interface provided by the kernel

    • Common object framework

      • superblock object: a specific, mounted file system

      • i-node object: a specific file in storage

      • d-entry object: a directory entry

      • file object: an open file associated with a process


    Linux Virtual File System (continued)

    • VFS operations

      • super_operations:

        • read_inode, sync_fs, etc.

      • inode_operations:

        • create, link, etc.

      • dentry_operations:

        • d_compare, d_delete, etc.

      • file_operations:

        • read, write, seek, etc.
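The dispatch idea behind these operation tables can be illustrated outside the kernel. The class and lambdas below are illustrative stand-ins for the C operation tables; only the table and operation names come from the slide:

```python
# Illustrative VFS-style dispatch: each file system supplies its own
# file_operations table, and the generic layer calls through it.
class FileOperations:
    def __init__(self, read, write):
        self.read = read      # per-file-system read implementation
        self.write = write    # per-file-system write implementation

filesystems = {
    "ext2": FileOperations(read=lambda path: f"ext2 read of {path}",
                           write=lambda path, data: len(data)),
    "iso9660": FileOperations(read=lambda path: f"iso9660 read of {path}",
                              write=lambda path, data: -1),  # read-only media
}

def vfs_read(fstype, path):
    # Generic entry point: same call, file-system-specific behavior.
    return filesystems[fstype].read(path)

print(vfs_read("ext2", "/home/notes.txt"))
print(filesystems["iso9660"].write("/myCD/x", b"data"))  # -1: cannot write
```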


    Linux Virtual File System (continued)

    • Individual file system implementations conform to this architecture.

    • May be linked to kernel or loaded as modules

    • Linux supports over 50 file systems in the official kernel

      • E.g., minix, ext, ext2, ext3, iso9660, msdos, nfs, smb, …


    Linux Virtual File System (continued)

    • A special file type — proc

      • Mounted as /proc

      • Provides access to kernel internal data structures as if those structures were files!

      • E.g., /proc/dmesg

    • There are several other special file types

      • Vary from one version/vendor to another

      • See Silberschatz, §11.2.3

      • Love, Linux Kernel Development, Chapter 12

      • SUSE Linux Administrator Guide, Chapter 20


    Questions?


    Mapping files to Virtual Memory

    • Instead of “reading” from disk into virtual memory, why not simply use file as the swapping storage for certain VM pages?

    • Called mapping

    • Page tables in kernel point to disk blocks of the file


    Memory-Mapped Files

    • Memory-mapped file I/O allows file I/O to be treated as routine memory access by mapping a disk block to a page in memory

    • A file is initially “read” using demand paging. A page-sized portion of the file is read from the file system into a physical page. Subsequent reads/writes to/from the file are treated as ordinary memory accesses.

    • Simplifies file access by allowing the application to simply access memory rather than being forced to use read() & write() calls to the file system.
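The same mechanism is available from user space; here is a minimal sketch using Python's mmap module (the file name is made up for the example):

```python
import mmap
import os

path = "mmap_demo.dat"                      # illustrative scratch file
with open(path, "wb") as f:
    f.write(b"hello, file system")          # create the backing file

with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), 0) as m:     # map the entire file
        first = m[:5]                       # a read is a memory access
        m[:5] = b"HELLO"                    # a write is a memory access too
        m.flush()                           # push dirty pages to the file

with open(path, "rb") as f:
    result = f.read()                       # the mapped write persisted
os.remove(path)
print(first, result[:5])                    # b'hello' b'HELLO'
```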


    Memory-Mapped Files (continued)

    • A tantalizingly attractive notion, but …

    • Cannot use C/C++ pointers within mapped data structure

    • Corrupted data structures likely to persist in file

      • Recovery after a crash is more difficult

  • Don’t really save anything in terms of

    • Programming energy

    • Thought processes

    • Storage space & efficiency


    Memory-Mapped Files (continued)

    Nevertheless, the idea has its uses

    • Simpler implementation of file operations

      • read(), write() are memory-to-memory operations

      • seek() is simply changing a pointer, etc…

      • Called memory-mapped I/O

    • Shared Virtual Memory among processes


    Shared Virtual Memory


    Shared Virtual Memory (continued)

    • Supported in

      • Windows XP

      • Apollo DOMAIN

      • Linux??

    • Synchronization is the responsibility of the sharing applications

      • OS retains no knowledge

      • Few (if any) synchronization primitives between processes in separate address spaces


    Questions?


    Problem

    • Question:

      • If the mean time to failure of a disk drive is 100,000 hours,

      • and if your system has 100 identical disks,

      • what is the mean time between drive replacements?

    • Answer:

      • 1,000 hours (i.e., 41.67 days ≈ 6 weeks)

    • That is:

      • You lose 1% of your data every 6 weeks!

    • But don’t worry – you can restore most of it from backup!
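The arithmetic behind the answer, as a quick check:

```python
# With n independent disks, failures arrive n times as often as for one disk.
mttf_hours = 100_000        # mean time to failure of one drive (from slide)
n_disks = 100

mean_hours_between_replacements = mttf_hours / n_disks
days = mean_hours_between_replacements / 24

print(mean_hours_between_replacements)  # 1000.0 hours
print(round(days, 2))                   # 41.67 days, about 6 weeks
```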


    Can we do better?

    • Yes, mirroring

      • Write every block twice, on two separate disks

      • Mean time between simultaneous failure of both disks is 57,000 years

    • Can we do even better?

      • E.g., use fewer extra disks?

      • E.g., get more performance?


    RAID – Redundant Array of Inexpensive Disks

    • Distribute a file system intelligently across multiple disks to

      • Maintain high reliability and availability

      • Enable fast recovery from failure

      • Increase performance


    “Levels” of RAID

    • Level 0 – non-redundant striping of blocks across disks

    • Level 1 – simple mirroring

    • Level 2 – striping of bytes or bits with ECC

    • Level 3 – Level 2 with parity, not ECC

    • Level 4 – Level 0 with parity block

    • Level 5 – Level 4 with distributed parity blocks


    RAID Level 0 – Simple Striping

    [Figure: four disks; stripe i goes to disk (i mod 4) – disk 0 holds stripes 0, 4, 8; disk 1 holds 1, 5, 9; disk 2 holds 2, 6, 10; disk 3 holds 3, 7, 11]

    • Each stripe is one or a group of contiguous blocks

    • Block/group i is on disk (imodn)

    • Advantage

      • Read/write n blocks in parallel; n times bandwidth

    • Disadvantage

      • No redundancy at all. System MTBF is 1/n × disk MTBF!
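The placement rule can be written down directly (4 disks and 12 stripes are just example numbers):

```python
def raid0_disk(block, n_disks):
    # RAID 0 placement: block/group i lives on disk (i mod n).
    return block % n_disks

# Round-robin layout of 12 stripes over 4 disks.
layout = {d: [b for b in range(12) if raid0_disk(b, 4) == d]
          for d in range(4)}
for disk, stripes in layout.items():
    print(f"disk {disk}: stripes {stripes}")
```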


    RAID Level 1 – Striping and Mirroring

    [Figure: eight disks in mirrored pairs; stripes 0–11 are striped across four disks, and each stripe is duplicated on a second, mirror disk]

    • Each stripe is written twice

      • Two separate, identical disks

  • Block/group i is on disks (i mod 2n) and ((i + n) mod 2n)

  • Advantages

    • Read/write n blocks in parallel; n times bandwidth

    • Redundancy: System MTBF = (Disk MTBF)² at twice the cost

    • Failed disk can be replaced by copying

  • Disadvantage

    • A lot of extra disks for much more reliability than we need


    RAID Levels 2 & 3

    • Bit- or byte-level striping

    • Requires synchronized disks

      • Highly impractical

  • Requires fancy electronics

    • For ECC calculations

  • Not used; academic interest only

  • See Silberschatz, §12.7.3 (pp. 471-472)


    Observation

    • When a disk or stripe is read incorrectly,

      we know which one failed!

    • Conclusion:

      • A simple parity disk can provide very high reliability

        • (unlike simple parity in memory)


    RAID Level 4 – Parity Disk

    [Figure: five disks – four data disks holding stripes 0–11 and one dedicated parity disk holding parity 0-3, parity 4-7, and parity 8-11]

    • parity 0-3 = stripe 0 xor stripe 1 xor stripe 2 xor stripe 3

    • n stripes plus parity are written/read in parallel

    • If any disk/stripe fails, it can be reconstructed from others

      • E.g., stripe 1 = stripe 0 xor stripe 2 xor stripe 3 xor parity 0-3

    • Advantages

      • n times read bandwidth

      • System MTBF = (Disk MTBF)² at 1/n additional cost

      • Failed disk can be reconstructed “on-the-fly” (hot swap)

      • Hot expansion: simply add n + 1 disks all initialized to zeros

    • However

      • Writing requires a read-modify-write of the parity stripe ⇒ only 1× write bandwidth.
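The XOR reconstruction on this slide can be checked in a few lines (the stripe contents are made-up bytes):

```python
def xor_blocks(blocks):
    # Byte-wise XOR of equal-length blocks: XOR of all data stripes yields
    # the parity, and XOR of the survivors plus parity yields a lost stripe.
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

stripes = [b"\x11\x22", b"\x33\x44", b"\x55\x66", b"\x77\x08"]
parity = xor_blocks(stripes)                  # parity 0-3

# Disk holding stripe 1 fails; rebuild it from the other three plus parity.
rebuilt = xor_blocks([stripes[0], stripes[2], stripes[3], parity])
print(rebuilt == stripes[1])  # True
```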


    RAID Level 5 – Distributed Parity

    [Figure: five disks holding data stripes 0–15 with the parity blocks interleaved; the parity block for each group lands on a different disk, so parity 0-3, parity 4-7, parity 8-11, and parity 12-15 rotate across the array]

    • Parity calculation is same as RAID Level 4

    • Advantages & Disadvantages – Same as RAID Level 4

    • Additional advantages

      • Avoids the parity-disk bottleneck

      • Some writes in parallel

    • Writing individual stripes (RAID 4 & 5)

      • Read existing stripe and existing parity

      • Recompute parity

      • Write new stripe and new parity


    RAID 4 & 5

    • Very popular in data centers

      • Corporate and academic servers

    • Built-in support in Windows XP and Linux

      • Connect a group of disks to fast SCSI port (320 MB/sec bandwidth)

      • OS RAID support does the rest!


    New Topic


    Incomplete Operations

    • Problem – how to protect against disk write operations that don’t finish

      • Power or CPU failure in the middle of a block

      • Related series of writes interrupted before all are completed

    • Examples:

      • Database update of charge and credit

      • RAID 1, 4, 5 failure between redundant writes


    Solution (part 1) – Stable Storage

    • Write everything twice to separate disks

      • Be sure 1st write does not invalidate previous 2nd copy

      • RAID 1 is okay; RAID 4/5 not okay!

      • Read blocks back to validate; then report completion

  • Reading both copies

    • If 1st copy okay, use it – i.e., newest value

    • If 2nd copy different, update it with 1st copy

    • If 1st copy is bad; use 2nd copy – i.e., old value
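A sketch of that read rule, with per-copy checksums standing in for the disk's ability to detect a bad block (the structure and names are illustrative):

```python
import zlib

def make_copy(data):
    # A block copy plus a checksum so an interrupted write is detectable.
    return {"data": data, "crc": zlib.crc32(data)}

def copy_ok(copy):
    return zlib.crc32(copy["data"]) == copy["crc"]

def stable_read(copy1, copy2):
    # 1st copy good: use it (the newest value); repair a stale 2nd copy.
    if copy_ok(copy1):
        if copy1["data"] != copy2["data"]:
            copy2 = make_copy(copy1["data"])
        return copy1["data"], copy2
    # 1st copy bad: fall back to the 2nd copy (the old value).
    return copy2["data"], copy2

new, old = make_copy(b"new"), make_copy(b"old")
value, repaired = stable_read(new, old)
print(value, repaired["data"])               # b'new' b'new'

torn = {"data": b"ne", "crc": new["crc"]}    # interrupted first write
value2, _ = stable_read(torn, old)
print(value2)                                # b'old'
```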


    Stable Storage (continued)

    • Crash recovery

      • Scan disks, compare corresponding blocks

      • If one is bad, replace with good one

      • If both good but different, replace 2nd with 1st copy

  • Result:

    • If 1st block is good, it contains latest value

    • If not, 2nd block still contains previous value

  • An abstraction of an atomic disk write of a single block

    • Uninterruptible by power failure, etc.


    What about more complex disk operations?

    • E.g., a file-create operation involves

      • Allocating free blocks

      • Constructing and writing i-node

        • Possibly multiple i-node blocks

      • Reading and updating directory

  • What if system crashes with the sequence only partly completed?

  • Answer: inconsistent data structures on disk


    Solution (Part 2) – Log-Structured File System

    • Make changes to cached copies in memory

    • Collect together all changed blocks

      • Including i-nodes and directory blocks

  • Write to log file (aka journal file)

    • A circular buffer on disk

    • Fast, contiguous write

  • Update log file pointer in stable storage

  • Offline: Play back log file to actually update directories, i-nodes, free list, etc.

    • Update playback pointer in stable storage
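A toy sequence of those steps; the dict-as-disk and variable names are purely illustrative:

```python
log = []          # stands in for the circular journal on disk
disk = {}         # stands in for i-nodes, directories, free list in place
committed = 0     # log-file pointer, as if kept in stable storage

def commit(dirty_blocks):
    # Collect the changed blocks and append them with one contiguous write.
    global committed
    log.append(dict(dirty_blocks))
    committed = len(log)              # update log pointer in stable storage

def playback():
    # Offline: replay the log into the real on-disk structures.
    global committed
    for record in log[:committed]:
        disk.update(record)
    del log[:committed]
    committed = 0                     # update playback pointer

commit({"inode 7": b"new i-node", "dir block 2": b"new entry"})
playback()
print(disk["inode 7"], disk["dir block 2"])
```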


    Transaction Data Base Systems

    • Similar techniques

      • Every transaction is recorded in log before recording on disk

      • Stable storage techniques for managing log pointers

      • Once a log entry is confirmed, the disk can be updated in place

      • After crash, replay log to redo disk operations


    Berkeley LFS — a slight variation

    • Everything is written to log

      • i-nodes point to updated blocks in log

      • i-node cache in memory updated whenever i-node is written

      • Cleaner daemon follows behind to compact log

  • Advantages:

    • LFS is always consistent

    • LFS performance

      • Much better than Unix file system for small writes

      • At least as good for reads and large writes

  • Tanenbaum, §6.3.8, pp. 428-430

  • Rosenblum & Ousterhout, Log-structured File System

  • Note: not same as Linux LFS (large file system)


    Example

    [Figure: before and after a log write – data blocks a, b, c and the i-node that points to them; the modified blocks and a new i-node are appended to the log, while the old blocks and old i-node remain in place until the log is cleaned]


    Summary of Reading Assignments in Silberschatz

    • Disks (general) – §12.1 to 12.6

    • File systems (general) – Chapter 11

      • Ignore §11.9, 11.10 for now!

  • RAID – §12.7

  • Stable Storage – §12.8

  • Log-structured File System – §11.8 & §6.9
