
CS4432: Database Systems II

Presentation Transcript


  1. CS4432: Database Systems II Data Storage

  2. Storage in DBMSs • DBMSs manage large amounts of data • How does a DBMS store and manage large amounts of data? This has a significant impact on performance • Design decisions: What representations and data structures best support efficient manipulation of this data? • To understand why a DBMS applies specific strategies, we must first understand how disks work

  3. Disks and Files • DBMS stores information on (“hard”) disks. • Main memory is only for processing • This has major implications for DBMS design! • READ: transfer data from disk to main memory (RAM). • WRITE: transfer data from RAM to disk. • Both are high-cost operations, relative to in-memory operations, so must be planned carefully!

  4. DBMS vs. OS: Who’s in Control? • The DBMS is in control of managing its data • It knows more about the data’s structure • It knows more about the access patterns

  5. That is why a DBMS has its own Storage Manager & Buffer Manager

  6. Understanding Disks

  7. Storage Hierarchy (fastest to slowest)
  • Cache (all levels): avg. size 256 KB – 1 MB; read/write time ~10^-8 seconds; random access; smallest of all memory and also the most costly; usually on the same chip as the processor; easy to manage in single-processor environments, more complicated in multiprocessor systems.
  • Main Memory: avg. size 128 MB – 1 GB; read/write time 10^-7 to 10^-8 seconds; random access; becoming more affordable; volatile.
  • Secondary Storage (disk): avg. size 30 GB – 160 GB; read/write time ~10^-2 seconds; NOT random access; extremely affordable: $0.68/GB!!!; can be used for the file system, virtual memory, or raw data access; block-oriented (needs buffering).
  • Tertiary Storage (tape): avg. size gigabytes – terabytes; read/write time 10^1 – 10^2 seconds; NOT random access, or even remotely close; extremely affordable: pennies/GB!!!; not efficient for any real-time database purposes, but could be used in an offline processing environment.

  8. Storage Hierarchy

  9. Memory Hierarchy Summary [chart: typical capacity (bytes) vs. access time (sec) for cache, electronic main memory, electronic secondary storage, magnetic/optical disks, online tape, nearline tape & optical disks, and offline tape; capacity grows from roughly 10^3 to 10^15 bytes as access time grows from 10^-9 to 10^0 seconds]

  10. Memory Hierarchy Summary [chart: cost (dollars/MB) vs. access time (sec) for the same levels; cost falls from roughly 10^4 dollars/MB for cache to 10^-4 dollars/MB for offline tape as access time grows from 10^-9 to 10^0 seconds]

  11. Why Not Store Everything in Main Memory? • Costs too much. $100 will buy you either 16 GB of RAM or 360 GB of disk today. • Main memory is volatile. We want data to be saved between runs. (Obviously!) • Typical hierarchy: • Main memory (RAM) → Processing • Disks (secondary storage) → Persistent storage • Tapes & DVDs → Archival

  12. Motivation Consider the following algorithm:

  For each tuple r in relation R {
      Read the tuple r
      For each tuple s in relation S {
          Read the tuple s
          Append the entire tuple s to r
      }
  }

  What is the time complexity of this algorithm?

  13. Motivation • Complexity: this algorithm is O(n^2)! Is it always? • Yes, if we assume random access of data. • But hard disks are not efficient at random access! • Unless the data is organized efficiently on disk, this algorithm may perform much worse than the O(n^2) in-memory estimate suggests.
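  Both versions produce the same O(n^2) number of result tuples; what differs is how many times the relations are scanned from disk. Below is a minimal Python sketch (not from the slides; the relation contents, the block_size parameter, and the function names are illustrative) contrasting the tuple-at-a-time loop above with a block-at-a-time variant that scans S once per block of R instead of once per tuple of R:

  def nested_loop_join(R, S):
      # Tuple-at-a-time: conceptually re-reads all of S once per tuple of R.
      result = []
      for r in R:                       # |R| outer iterations
          for s in S:                   # |S| inner iterations each
              result.append(r + s)      # append the entire tuple s to r
      return result

  def block_nested_loop_join(R, S, block_size=100):
      # Block-at-a-time: S is scanned once per *block* of R, not once per
      # tuple, so the number of passes over S shrinks by a factor of block_size.
      result = []
      for i in range(0, len(R), block_size):
          r_block = R[i:i + block_size]     # one "disk block" of R held in memory
          for s in S:                       # a single scan of S for this block
              for r in r_block:
                  result.append(r + s)
      return result

  R = [(i,) for i in range(500)]
  S = [(j,) for j in range(500)]
  assert len(nested_loop_join(R, S)) == len(block_nested_loop_join(R, S)) == 250_000

  Both functions perform the same 250,000 in-memory appends; the difference is the disk access pattern, which is what the following slides on disks explain.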

  14. Disks: Some Facts • Data is stored and retrieved in units called disk blocks. • A disk block is typically 512 bytes to 4 KB or 8 KB. • Movement to and from main memory happens one block at a time: a whole block must be read or written.
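  Because the block is the unit of transfer, touching even a single byte means moving a whole block between disk and main memory. A minimal Python sketch of block-granularity access (the 4 KB block size and the function names are illustrative assumptions, not from the slides):

  BLOCK_SIZE = 4096  # a common disk block size (4 KB)

  def read_block(path, block_no):
      # Read one whole block: the unit of transfer is the block, not the byte.
      with open(path, "rb") as f:
          f.seek(block_no * BLOCK_SIZE)
          return f.read(BLOCK_SIZE)

  def write_block(path, block_no, data):
      # Write one whole block; data is padded/truncated to exactly BLOCK_SIZE.
      data = data[:BLOCK_SIZE].ljust(BLOCK_SIZE, b"\x00")
      with open(path, "r+b") as f:
          f.seek(block_no * BLOCK_SIZE)
          f.write(data)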

  15. Disk Components • Platter (2 surfaces)

  16. [diagram labels: virtual cylinder, disk head, cylinder, platter]

  17. Tracks Divided into Sectors [diagram: each track is divided into sectors separated by gaps; gaps take ≈ 10% of the track, sectors ≈ 90%]

  18. Movements • The arm moves in and out • Called seek time • Mechanical • The platter rotates • Called rotational latency • Mechanical

  19. Actual Disk

  20. Disk Controller [diagram: processor and memory connected through a disk controller to Disk 1 and Disk 2] • Controls the mechanical movement • Transfers the data from disks to memory • Does smart buffering and scheduling

  21. How big is the disk if? • There are 4 platters • There are 8192 tracks per surface • There are 256 sectors per track • There are 512 bytes per sector • Remember: 1 KB = 1024 bytes, not 1000! • Size = 2 surfaces/platter * number of platters * tracks * sectors * bytes per sector • Size = 2 * 4 * 8192 * 256 * 512 = 2^33 bytes • Size = 2^33 bytes = 2^3 * 2^30 bytes = 8 GB (dividing by 1024 bytes/KB, 1024 KB/MB, and 1024 MB/GB)
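  A quick way to double-check that arithmetic (a minimal sketch; the variable names are just illustrative):

  SURFACES_PER_PLATTER = 2
  PLATTERS = 4
  TRACKS_PER_SURFACE = 8192
  SECTORS_PER_TRACK = 256
  BYTES_PER_SECTOR = 512

  size_bytes = (SURFACES_PER_PLATTER * PLATTERS * TRACKS_PER_SURFACE
                * SECTORS_PER_TRACK * BYTES_PER_SECTOR)
  print(size_bytes)                  # 8589934592 == 2**33
  print(size_bytes / 1024**3, "GB")  # 8.0 GB, using 1 GB = 1024**3 bytes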

  22. Scale of Bytes

  23. More Disk Terminology • Rotation speed: the speed at which the disk rotates, e.g., 5400 RPM • Number of tracks: typically 10,000 to 15,000 • Bytes per track: ~10^5 bytes per track

  24. Big Question: What About Access Time? [diagram: the processor requests block X, and block X ends up in memory] • Time = disk controller processing time + disk delay (seek & rotation) + transfer time
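  A rough back-of-the-envelope sketch of that formula in Python. The 5400 RPM and ~10^5 bytes/track figures come from slide 23; the seek time, controller overhead, and block size below are assumed illustrative values, not from the slides:

  RPM = 5400                   # rotation speed (slide 23)
  BYTES_PER_TRACK = 100_000    # ~10^5 bytes per track (slide 23)
  BLOCK_SIZE = 4096            # assumed block size
  AVG_SEEK_S = 0.009           # assumed average seek time: 9 ms
  CONTROLLER_S = 0.001         # assumed controller processing time: 1 ms

  rotation_s = 60.0 / RPM                                  # one full rotation: ~11.1 ms
  avg_rotational_latency_s = rotation_s / 2                # on average, wait half a rotation
  transfer_s = rotation_s * BLOCK_SIZE / BYTES_PER_TRACK   # the block is a fraction of a track

  access_time = CONTROLLER_S + AVG_SEEK_S + avg_rotational_latency_s + transfer_s
  print(round(access_time * 1000, 2), "ms")                # roughly 16 ms for one 4 KB block

  The takeaway matches the motivation slides: a random block access costs on the order of milliseconds, versus the 10^-7 to 10^-8 seconds quoted earlier for main memory.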
