
CIS 402: File Management Techniques Chapter 3


ashlyn

Presentation Transcript


  1. CIS 402: File Management Techniques, Chapter 3 • Secondary Storage and System Software: • Magnetic Disks & Tapes • CD-ROMs • Issues in Data Management

  2. Part I: Disks • nature and limitations of disks • systems used to store and retrieve files • designing good file structures • arrange the data efficiently • minimize access costs for devices used by the system.

  3. Disks: An Overview • Direct Access Storage Devices (DASDs) • they make it possible to access the data directly. • disks are DASDs • Serial Devices • only serial access permitted • must traverse all data in order from the beginning to reach the desired data. • magnetic tapes are serial devices

  4. Types of Storage • Magnetic Disk Storage • Hard Disk: • High Capacity + Low Cost per bit. • Floppy Disk: • Cheap, but slow; holds little data. • zip disks: removable disk cartridges; hold larger amounts of data • Optical Disks • Magneto-optical • Holds a lot of data • can be reproduced cheaply • slow • Magnetic Tape Storage

  5. Magnetic Disk Storage • Data represented as magnetic spots • Magnetized spot = 1 • Absence of a magnetized spot = 0 • Read • Converts the magnetized data to electrical impulses • Write • Converts electrical impulses to magnetized spots on disk

  6. Disk Capacity • Size • MB: older hard disks, floppies • GB: current hard disks • TB: coming soon • What’s stored? • User documents • Software • Graphic images • Audio files • Video files

  7. Diskettes • Low capacity – small files • Portable • Flexible Mylar coated with metallic substance • Hard plastic jacket for protection • 3 ½ inch, 1.44 MB

  8. High-Capacity Portable Disks • Larger files fit (more of them, too) • Portable • High-capacity • 120 / 200 MB • Can read and write standard diskettes • Ex: Superdisk • Zip disk (Iomega Corp) • 250 MB • not compatible with 3 ½ inch diskettes • Also Jaz disk (2GB) and Peerless (10/20GB)

  9. Data Compression • Why use? • Squeeze big files onto small disks • Speed up data transfer of files • Goal: remove redundancy (minimize size) • Reduce to the minimal number of bits needed to store the data • Techniques • Remove all extra space characters • Substitute a smaller data string for a frequently occurring set of characters • Software uses a formula to determine how to compress • Different models are used based on content (text, image, etc.) • Must be decompressed to be used again • e.g. jpeg?
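
A minimal Python sketch of the compress-then-decompress round trip described on this slide. It uses the standard-library zlib module as a stand-in for whatever compression software the slide has in mind; the sample text is made up for illustration.

```python
import zlib

# Highly redundant input: the same phrase repeated many times.
text = ("the quick brown fox " * 200).encode("ascii")

# Compression removes the redundancy, so fewer bytes need to be stored or transferred.
packed = zlib.compress(text)

# The data must be decompressed before it can be used again.
restored = zlib.decompress(packed)

print(len(text), "bytes before,", len(packed), "bytes after")
```

The exact compressed size depends on the data and on the compression model; the point is only that redundant input shrinks and that the round trip is lossless for this kind of compressor.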

  10. Hard Disk • Various sizes • Portability • Generally non-portable • this is changing with advent of USB disks • still much larger than floppy or zip disk, but some are very small and still hold GBs of data • Rigid platter coated with metallic substance • Now available are external, portable hard disks

  11. Disk Pack • Read heads • Several platters • Airtight, sealed module • Mount disk pack on disk drive

  12. Disk Pack: Movable vs Fixed-Head Disks • Some disks have fixed heads • As many read/write heads as there are tracks on the platter • Track is selected electronically, which is much faster • Cost of the additional read/write heads is the limiting factor in production • Disks with an actuator are called movable-head disks • Actuator moves the (single) read/write head per platter to the appropriate track. This is a seek. • seeks take by far the longest time • require a physical movement of the read heads

  13. Disk Pack: Movable-Head Disks • Disk pack has set of access arms • Two read/write heads per arm • One reads top surface • One reads bottom surface • Access arms move together as a unit • Only one read/write head works at a time

  14. Data Destroyed: Head Crash

  15. Logical Layout of a Disk: Track • Concentric circles • Passes under read/write head as disk rotates • 1.44 MB diskette has 80 tracks on each surface • Numbered 0 to 79 • Each track stores the same amount of data? • not intuitive, since an outer track has a larger surface area than an inner one • true for the 3½” floppy

  16. Logical Layout of a Disk: Sector • Pie-shaped division of tracks • Holds a fixed number of bytes (historically 512 bytes)

  17. Logical Layout of a Disk: Organizing Tracks by Sectors • The Physical Placement of Sectors • Simple logical organization: • sectors on a track are adjacent, fixed-sized segments of a track that hold a file. • Physical reality • An attempt to read a series of sectors, in order, stored in adjacent sectors could be inefficient. Why? • After reading data, the disk controller could require time to process the data before reading can continue • In this situation, if sectors are organized in physical order, the controller has to wait for a partial spin of the disk to continue (rotational delay) • Solution: Interleave the sectors to minimize or eliminate this wait time. • Optimal: Next sector arrives at read head exactly when previous data is processed

  18. Sector Interleaving • No interleaving: this is 1:1 interleaving • Interleaving factor 5: sectors are skipped during processing of data; 5 revs to read a track • Since the early 1990s, controller speeds have improved to the degree that interleaving is no longer necessary.
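
One way to picture interleaving is to compute where each logical sector ends up on the track. The Python sketch below is an illustration under assumptions, not the layout of any particular controller: the function name interleave_layout is hypothetical, and the rule "step ahead by the interleave factor, and slide to the next free slot on a collision" is one common way to fill the track.

```python
def interleave_layout(sectors_per_track, factor):
    """Return a list where index = physical slot and value = logical sector number."""
    layout = [None] * sectors_per_track
    slot = 0
    for logical in range(sectors_per_track):
        # If the target slot is already taken, slide forward to the next free one.
        while layout[slot] is not None:
            slot = (slot + 1) % sectors_per_track
        layout[slot] = logical
        # The next logical sector goes `factor` slots further around the track.
        slot = (slot + factor) % sectors_per_track
    return layout


print(interleave_layout(8, 1))  # -> [0, 1, 2, 3, 4, 5, 6, 7]  (no interleaving)
print(interleave_layout(8, 2))  # -> [0, 4, 1, 5, 2, 6, 3, 7]
```

With a factor of 2, one physical sector passes under the head between consecutive logical sectors, which gives the controller that much time to process the data it has just read.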

  19. Organizing Tracks by Sectors: Clusters • Cluster • Adjacent sectors treated as a unit of storage • Fixed number of sectors • Minimum space allocated to a file • Files smaller than a cluster waste disk space • A file is treated as a series of clusters by the file manager. • Physical locations of the clusters of a file are held in the File Allocation Table (FAT)
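
Because the cluster is the minimum unit of allocation, the space a file occupies is rounded up to a whole number of clusters. A small Python sketch of that arithmetic follows; the function name and the default geometry (4 sectors per cluster, 512 bytes per sector) are assumptions chosen for illustration.

```python
def cluster_usage(file_bytes, sectors_per_cluster=4, bytes_per_sector=512):
    """Return (clusters allocated, bytes wasted in the last cluster)."""
    cluster_bytes = sectors_per_cluster * bytes_per_sector
    # Ceiling division: a partly filled cluster still counts as a whole cluster.
    clusters = -(-file_bytes // cluster_bytes)
    wasted = clusters * cluster_bytes - file_bytes
    return clusters, wasted


print(cluster_usage(100))   # -> (1, 1948)
print(cluster_usage(5000))  # -> (3, 1144)
```

A 100-byte file still ties up a full 2048-byte cluster, which is the wasted space the slide refers to.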

  20. Organizing Tracks by Sectors: Extents • An extent is a grouping of contiguous (by interleaving) clusters • extents are possible when there is enough free space that a file can be stored on contiguous clusters • a file contained within one extent can be accessed via only one seek • If one extent is not enough, then divide the file into more extents. • As the number of extents in a file increases, the file becomes more spread out on the disk, and the number of seeks required to access it increases. • How can the number of extents a file uses be reduced?
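
Given the list of cluster numbers that make up a file, the number of extents is the number of runs of consecutive clusters. The Python sketch below assumes the cluster list is already in file order; count_extents is a hypothetical helper, not part of any file system API.

```python
def count_extents(clusters):
    """Count runs of physically consecutive cluster numbers."""
    if not clusters:
        return 0
    extents = 1
    for previous, current in zip(clusters, clusters[1:]):
        # A gap (or a jump backwards) starts a new extent, which costs another seek.
        if current != previous + 1:
            extents += 1
    return extents


print(count_extents([5, 6, 7, 20, 21, 40]))  # -> 3
print(count_extents([8, 9, 10, 11]))         # -> 1
```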

  21. Access Time Components: Terminology • Seek time • Travel time for moving heads over track • Head switching • Activate correct head • Rotational delay • Waiting for sector to arrive under head (Avg ½ revolution) • Data transfer rate • Read/write bits on disk platter • Depends on density and rotational speed

  22. Logical Layout of a Disk: Cylinder • Disk drives with multiple platters have cylinders, which consist of the tracks that are directly above and below one another. • All the info on a single cylinder can be accessed without moving the arm that holds the read/write heads. • Store files across multiple platters in same cylinder • Access time reduced greatly • Moving this arm is called seeking. The arm movement is usually the slowest part of reading information from a disk.
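
To see why filling a cylinder before moving the arm reduces seeks, it helps to map a running sector number to a (cylinder, head, sector) triple. The Python sketch below uses the conventional cylinder-major ordering with all three values counted from 0; this numbering is an assumption for illustration and is not taken from the slides.

```python
def block_to_chs(block, heads, sectors_per_track):
    """Map a 0-based sector number to (cylinder, head, sector), all 0-based."""
    # All sectors of one cylinder are numbered before moving to the next cylinder.
    cylinder, rest = divmod(block, heads * sectors_per_track)
    head, sector = divmod(rest, sectors_per_track)
    return cylinder, head, sector


# Hypothetical drive: 4 recording surfaces, 10 sectors per track.
print(block_to_chs(0, 4, 10))   # -> (0, 0, 0)
print(block_to_chs(10, 4, 10))  # -> (0, 1, 0)  same cylinder: head switch only
print(block_to_chs(40, 4, 10))  # -> (1, 0, 0)  next cylinder: a seek is needed
```

Sectors 0 through 39 all live in cylinder 0 here, so reading them in order requires head switches but no arm movement.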

  23. Logical Layout of a Disk: Zone Recording • Assigns more sectors to tracks in outer zones • More sectors = more data storage available • Each sector has same # bytes

  24. Disk Drive: Read / Write Operation • Disks rotate • Access arm moves read/write head • Read / write operation begins and continues until complete • Data is transferred to/from memory

  25. Estimating Capacities and Space Needs • Track Capacity = sectors per track * bytes per sector • Cylinder Capacity = tracks per cylinder * track capacity • tracks/cylinder is the number of recording surfaces (two per platter when both sides are used) • Drive Capacity = number of cylinders * cylinder capacity
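
The three capacity formulas chain together directly. A short Python sketch follows; the geometry in the usage line is a made-up drive, chosen only to show the arithmetic.

```python
def drive_capacity(bytes_per_sector, sectors_per_track, tracks_per_cylinder, cylinders):
    """Apply the track, cylinder, and drive capacity formulas in turn (result in bytes)."""
    track_capacity = sectors_per_track * bytes_per_sector
    cylinder_capacity = tracks_per_cylinder * track_capacity
    return cylinders * cylinder_capacity


# Hypothetical drive: 512 bytes/sector, 63 sectors/track, 16 tracks/cylinder, 1024 cylinders.
print(drive_capacity(512, 63, 16, 1024))  # -> 528482304
```

Any one unknown can be recovered by dividing the drive capacity by the product of the other three values.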

  26. Exercises • Give the requested values. • If a disk has capacity 68,182,605,824 bytes (~64 GB): • How many bytes/sector are there if there are 127 sectors/track, 64 tracks/cylinder, and 8192 cylinders? • How many tracks/cylinder are there if there are 2048 bytes/sector, 127 sectors/track, and 4096 cylinders? • What is the capacity of a disk with 1024 bytes/sector, 255 sectors/track, 16 tracks/cylinder, and 512 cylinders?

  27. The Organization of Disks: Review • The information stored on a disk is stored on the surface of one or more platters. • Information is stored in successive tracks on the surface of the disk. • Each track is divided into a number of sectors • a sector is the smallest addressable portion of a disk.

  28. The Organization of Disks • When a read statement calls for a particular byte from a disk file, the computer’s operating system finds the correct platter, track and sector, reads the entire sector into a special area in memory called a buffer, and then finds the requested byte within that buffer.
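
The read path on this slide (locate the sector, read the whole sector into a buffer, pick the byte out of the buffer) can be sketched in a few lines of Python. The "disk" here is just a bytes object standing in for the platters, and SECTOR_SIZE, read_sector, and read_byte are hypothetical names used for illustration.

```python
SECTOR_SIZE = 512  # bytes per sector (the historical size; an assumption here)


def read_sector(disk, sector_number):
    """Return one whole sector; the sector is the smallest unit the disk can deliver."""
    start = sector_number * SECTOR_SIZE
    return disk[start:start + SECTOR_SIZE]


def read_byte(disk, offset):
    """Fetch a single byte by reading its entire sector into a buffer first."""
    sector_number, position_in_sector = divmod(offset, SECTOR_SIZE)
    buffer = read_sector(disk, sector_number)
    return buffer[position_in_sector]


# A fake 4-sector disk whose byte at position i holds the value i mod 256.
disk = bytes(i % 256 for i in range(4 * SECTOR_SIZE))
print(read_byte(disk, 1000))  # -> 232 (sector 1, position 488)
```

Even though only one byte was requested, 512 bytes were transferred into the buffer, so later requests for nearby bytes could be served from memory.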

  29. Disk Organization: Organizing Tracks by Sectors • The file can also be viewed as a series of clusters of sectors which represent a fixed number of (logically) contiguous sectors. • Once a cluster has been found on a disk, all sectors in that cluster can be accessed without requiring an additional seek. • The File Allocation Table ties logical sectors to the physical clusters they belong to.

  30. Storing Records on Disk • A record will rarely be of size equal to that of a sector • too large: each record is in at least two sectors • too small: each sector can contain parts of two records • one record/sector would waste space (internal fragmentation) • parts of two records/sector requires accessing multiple sectors • Solution?: Blocks

  31. Organizing by Block • Rather than being divided into sectors, the disk tracks may be divided into user-defined blocks. • the amount of data transferred in a single I/O operation is variable, according to software requirements (as opposed to the specifications of the hardware). • Blocks can be fixed or variable in length. Factors: • requirements of the file designer • capabilities of the operating system.

  32. Organizing by Block • Advantage: • blocks can be sized according to the size of records • no sector-spanning and fragmentation problems • The blocking factor indicates the number of records that are to be stored in each block in a file. • why put more than one record in a block? • Question: How can the operating system know what the designer has decided regarding their blocks? • There is a tradeoff between flexibility and disk space • A block is ordinarily accompanied by subblocks: • key-subblock • count-subblock.

  33. Non-Data Overhead • Disks always have some pieces of information stored both in predetermined places and also among the data • File Allocation Table • DOS: Limited number of entries • Unix: Not sure. See URL: http://aa11.cjb.net/tru64_unix_managers/1997/0440.html for computation of the limit. Appears to be system dependent.

  34. Non-Data Overhead: Preformatting • Store information at the beginning of each addressable unit • addresses • sector, block • track • condition: • usable • marked defective • gaps and synchronization markings • distinguish among addressable units

  35. Non-Data Overhead: Block-Addressing • Subblocks • Contain Extra Information • Count subblock • # bytes in this block • Key subblock • Key (search) of last record in block • allows locating proper block quickly • Blocking factor • number of records per block • Higher blocking factor means less disk space used for overhead • The amount of non-data space necessary for a block scheme is higher than for a sector-scheme.

  36. Effect of Blocking Factor on Availability of Storage • The higher the blocking factor, the fewer times non-data info must be placed in the file • Larger blocks and/or a higher blocking factor mean less space is needed to hold a file’s contents • But this also means more records/block, and a slower search within a block • Also, larger blocks can lead to more internal fragmentation, as anything left over on a track can’t be used (allocation is from tracks) • See example on pages 58-59 of the text
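
The effect of the blocking factor on non-data overhead is easy to tabulate. In the Python sketch below, the per-block overhead of 300 bytes is a made-up figure standing in for the count subblock, key subblock, and gaps; real values depend on the drive.

```python
def blocks_and_overhead(num_records, blocking_factor, overhead_per_block):
    """Return (number of blocks, total non-data bytes) for a file of num_records records."""
    # Ceiling division: a partly filled last block still needs its own overhead.
    blocks = -(-num_records // blocking_factor)
    return blocks, blocks * overhead_per_block


# 1000 records with a hypothetical 300 bytes of overhead per block.
for factor in (1, 10, 60):
    print(factor, blocks_and_overhead(1000, factor, 300))
# -> 1 (1000, 300000)
# -> 10 (100, 30000)
# -> 60 (17, 5100)
```

The sketch only counts overhead; it does not model the space lost when a large block does not fit in what remains of a track.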

  37. The Cost of a Disk Access • Seek Time is the time required to move the access arm to the correct cylinder. • Seeking moves the read/write heads to different tracks • Rotational Delay is the time it takes for the disk to rotate to the desired sector. • Transfer Time = (Number of Bytes Transferred / Number of Bytes on a Track) * Rotation Time

  38. The Cost of a Disk Access: Rotation Time • Rotation speed is given in a drive’s specs. • Suppose the rotation speed of disk X is 1000 rpm • 1000 rpm / 60 s/min = 16.6667 rps • Suppose there are 127 sectors/track • Then the time to read a sector is 1/127 of the time to complete a revolution: • 1/16.6667 = 0.06 seconds per revolution • 0.06/127 = 0.00047 seconds per sector, i.e. 0.47 msec • Formula: fraction of a track read * time to read the whole track
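
The rpm-to-time conversion on this slide can be checked with a short Python sketch. The function names are hypothetical; the figures are the slide's own (1000 rpm, 127 sectors per track).

```python
def ms_per_revolution(rpm):
    """Time for one full revolution, in milliseconds (60,000 ms per minute)."""
    return 60_000 / rpm


def ms_per_sector(rpm, sectors_per_track):
    """Time for one sector to pass under the head: 1/sectors_per_track of a revolution."""
    return ms_per_revolution(rpm) / sectors_per_track


print(ms_per_revolution(1000))             # -> 60.0
print(round(ms_per_sector(1000, 127), 2))  # -> 0.47
```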

  39. The Cost of a Disk Access: Time Computation • For each read, compute the sum of: • Seek time • use average seek time • Rotational delay • time between end of seek and start of read • Time to read the addressable unit • block, sector, cluster • Multiply this sum by the number of times the above sequence must occur

  40. Example • Find the amount of time necessary to read 435 clusters, where each cluster is 4 sectors and each track has 127 sectors (3 wasted at the end of each track). Assume average bunching (the number of clusters that can be read consecutively each time) is 3 clusters • Avg seek: 7 msec • Rotational delay: 2.5 msec • Rotation speed: 10000 rpm • Find the time to read a sector (using the previous slide): 166.67 rps is 6 msec/revolution, so 6 msec / 127 = 0.047 msec/sector • Each cluster thus needs ~0.19 msec to be read and, on average, 3 clusters are read together, so we have 435 / 3 = 145 seeks • Time for each seek-rotate-read: 7 ms + 2.5 ms + 0.19 * 3 ms = 10.07 ms • Time for 145 of these: 1460.15 ms = ~1.46 s • Note the comparatively small time required for the actual read compared to the time to get to the right spot on the disk! • 9.5 * 145 = ~1.38 s en route; 0.57 * 145 = ~0.083 s reading, so less than 6% of the time is spent reading • Compare this example to those on pages 61-62 of the text. • Try exercises 11 a, b, e, f on page 112 of the text
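
The whole example can be reproduced with a small Python function. This is a sketch of the slide's method (seek + rotational delay + transfer, repeated once per group of clusters); the function and parameter names are hypothetical. Because the code does not round the per-cluster time to 0.19 ms, its total differs from the slide's 1460.15 ms in the last digits but still comes to roughly 1.46 s.

```python
def read_time_ms(clusters, sectors_per_cluster, sectors_per_track, rpm,
                 avg_seek_ms, rotational_delay_ms, clusters_per_access):
    """Return (total time in ms, fraction of that time spent transferring data)."""
    sector_ms = 60_000 / rpm / sectors_per_track
    transfer_ms = sector_ms * sectors_per_cluster * clusters_per_access
    # One seek-rotate-read sequence per group of consecutively readable clusters.
    accesses = -(-clusters // clusters_per_access)
    per_access_ms = avg_seek_ms + rotational_delay_ms + transfer_ms
    return accesses * per_access_ms, transfer_ms / per_access_ms


total_ms, reading_fraction = read_time_ms(
    clusters=435, sectors_per_cluster=4, sectors_per_track=127, rpm=10000,
    avg_seek_ms=7, rotational_delay_ms=2.5, clusters_per_access=3)
print(round(total_ms / 1000, 2), "s total,", round(reading_fraction * 100, 1), "% reading")
```

Raising clusters_per_access (that is, storing the file in fewer, longer extents) is the parameter that cuts the total most, since it reduces the number of seeks.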

  41. Disk as Bottleneck • Networks are fast, and disks are still slow relative to networks. Result: Processes that access disk(s) are usually disk-bound • the network and the CPU often have to wait inordinate lengths of time for the disk to furnish data. • Solutions involve allowing the CPU to continue processing, whether on this task or others, during disk accesses, or eliminating disk accesses altogether

  42. Disk as Bottleneck • Solutions • Multiprogramming: CPU works on other jobs while waiting for the disk • Striping: splitting the parts of a file across several different drives, then letting the separate drives deliver parts of the file to the network simultaneously ==> Parallelism • RAID: Redundant Array of Independent Disks. Store each part of data on the same track in a different disk (if possible). Use cache to reassemble during read. • RAM disk ==> Simulate the behavior of the mechanical disk in memory. Avoid accessing disk completely • Disk Cache = large block of memory configured to contain pages of data from a disk. Check cache first. If not there, go to the disk and replace some page already in cache with the page from disk containing the data. Well-written look-ahead algorithms can put the correct data in cache before it is requested.
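
The disk-cache behaviour described on this slide (check the cache first, go to disk on a miss, replace a page already in the cache) can be sketched in Python as follows. The slide does not say which page to replace; evicting the least recently used page is an assumption made here, and the DiskCache class and its read_page callback are hypothetical names.

```python
from collections import OrderedDict


class DiskCache:
    """A tiny page cache; the replacement policy (LRU) is an assumption, not from the slides."""

    def __init__(self, read_page, capacity):
        self.read_page = read_page   # function that performs the (slow) disk read
        self.capacity = capacity     # maximum number of pages held in memory
        self.pages = OrderedDict()   # page number -> data, oldest use first
        self.disk_reads = 0

    def get(self, page_number):
        if page_number in self.pages:
            # Cache hit: no disk access; mark the page as most recently used.
            self.pages.move_to_end(page_number)
            return self.pages[page_number]
        # Cache miss: go to the disk.
        data = self.read_page(page_number)
        self.disk_reads += 1
        if len(self.pages) >= self.capacity:
            # Replace the least recently used page.
            self.pages.popitem(last=False)
        self.pages[page_number] = data
        return data


cache = DiskCache(read_page=lambda n: f"page-{n}", capacity=2)
for n in (1, 2, 1, 3, 2):
    cache.get(n)
print(cache.disk_reads)  # -> 4 (the second request for page 1 was served from memory)
```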
