
Storage Systems CSE 598d, Spring 2007

OS Support for DB Management: DB File System. April 3, 2007. Mark Johnson.



  1. OS Support for DB Management: DB File System. April 3, 2007. Mark Johnson. Storage Systems, CSE 598d, Spring 2007.

  2. What is a database? It is a special-purpose application that uses the file system and provides a layer of abstraction on top of it. It organizes data logically according to the application's business logic.

  3. How do OS services help or hinder databases? Areas covered: buffer pool, file system, scheduling, process management, interprocess communication, and consistency control.

  4. Buffer Pool Management: Main memory is used as a cache for the file system. At the time of writing (1981, UNIX), the cache size was compiled into the OS. Is this still the case? Replacement uses an LRU strategy.

  5. Problems of LRU for DB: DB access is a combination of four patterns: (1) sequential access to blocks that are not rereferenced; (2) sequential access to blocks that will be cyclically rereferenced; (3) random access to blocks that will not be rereferenced; (4) random access to blocks with a nonzero probability of rereference.

  6. More LRU: LRU works well for only one of those patterns, random access with rereference. The DB should be able to control the replacement strategy, since it will likely know what the access pattern is going to be. Initial research showed that the miss ratio could be cut by 15% with a better cache policy.
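To make the cyclic-rereference problem concrete, here is a minimal simulation sketch. It is not from the slides or the paper: the `simulate` function, the trace, and the cache size are illustrative assumptions, and MRU is used only as one example of a policy a DBMS might choose for a looping scan.

```python
from collections import OrderedDict

def simulate(trace, capacity, policy="lru"):
    """Return the number of misses for a block trace under 'lru' or 'mru' eviction."""
    cache = OrderedDict()  # keys ordered from least to most recently used
    misses = 0
    for block in trace:
        if block in cache:
            cache.move_to_end(block)  # mark as most recently used
            continue
        misses += 1
        if len(cache) >= capacity:
            # LRU evicts the oldest entry; MRU evicts the newest one.
            cache.popitem(last=(policy == "mru"))
        cache[block] = True
    return misses

# Cyclic sequential scan: 5 blocks read 4 times, with room for only 4 blocks.
cyclic_trace = list(range(5)) * 4
lru_misses = simulate(cyclic_trace, capacity=4, policy="lru")
mru_misses = simulate(cyclic_trace, capacity=4, policy="mru")
```

On this trace LRU always evicts the block that is needed soonest, so every reference misses, while MRU keeps most of the loop resident. That gap is the kind of pattern knowledge the DBMS has and a general-purpose buffer pool does not.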

  7. Prefetching: A database knows exactly which data it is going to access next. The next access follows logical, not physical, order, so an OS prefetch would just have to get 'lucky'.

  8. Crash Recovery: Database writes are generally part of a transaction. A transaction stores an 'intention list', and the final page flush has to flush the entire list. This was not supported by the OS buffer manager. Similar to a journaled system?
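The intention-list idea can be sketched as follows. This is a toy model under stated assumptions: `TinyStore`, its in-memory `pages` and `log`, and the `crash_before_apply` flag are hypothetical stand-ins for disk pages, a durable log, and a crash, not any real buffer manager's interface.

```python
class TinyStore:
    """Hypothetical page store that commits via an intention list."""

    def __init__(self):
        self.pages = {}  # simulated on-disk pages: page_id -> bytes
        self.log = []    # simulated durable log of committed intention lists

    def commit(self, intentions, crash_before_apply=False):
        # Step 1: make the whole intention list durable as one unit.
        self.log.append(dict(intentions))
        if crash_before_apply:
            return  # simulate a crash after logging but before page writes
        # Step 2: apply the intended writes to the pages.
        self.pages.update(intentions)

    def recover(self):
        # Replay every committed intention list in order; replay is idempotent.
        for intentions in self.log:
            self.pages.update(intentions)

store = TinyStore()
store.commit({"p1": b"debit 10", "p2": b"credit 10"}, crash_before_apply=True)
pages_after_crash = dict(store.pages)
store.recover()
pages_after_recovery = dict(store.pages)
```

The point of the sketch is ordering: the list must reach stable storage as a unit before any page is overwritten, which is the control over flushing that the slide says the OS buffer manager did not offer.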

  9. Buffer Management Summary: General-purpose buffer pools are not good for databases. Application knowledge is needed to create a good buffer pool. Most DBMSs keep a private internal cache in user space.

  10. File System: Two approaches: (1) one big file (or several big files); (2) many small files representing logical structures. Note: Oracle on Windows uses the first strategy, big files.

  11. File System: The DB really wants the second strategy: logical file structure, directories, keyed names. This is not really optimally implemented.

  12. File System: Logical ordering does not imply physical ordering. DBMSs issue many logically sequential requests, which result in a lot of disk access when the blocks are not physically contiguous. A DBMS would prefer an extent-based system to lower the fragmentation rate.
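A small sketch of why contiguity matters for a logically sequential read. The block numbers and the helper names `count_seeks` and `extent_layout` are made up for illustration, and counting non-adjacent jumps is only a rough proxy for real disk cost.

```python
def count_seeks(physical_blocks):
    """Count the non-contiguous jumps when reading blocks in logical order."""
    seeks = 0
    for prev, cur in zip(physical_blocks, physical_blocks[1:]):
        if cur != prev + 1:
            seeks += 1
    return seeks

def extent_layout(extents):
    """Expand (start, length) extents into the physical block list they cover."""
    return [start + i for start, length in extents for i in range(length)]

# Physical location of logical blocks 0..7 under two hypothetical layouts.
fragmented = [17, 4, 52, 9, 30, 31, 8, 77]
extent_based = extent_layout([(100, 4), (200, 4)])

fragmented_seeks = count_seeks(fragmented)
extent_seeks = count_seeks(extent_based)
```

With two four-block extents the same eight-block scan needs a single jump between extents, whereas the block-at-a-time layout jumps on almost every read.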

  13. Tree-Structured FS: Three layers are needed: logical representation (i-node), user representation (files, directories), and DB representation (keys). It is very expensive to have three trees.

  14. Scheduling, Process Management, and Interprocess Communication: The simplest way is to have one OS process per DB user. The alternative is one 'server' process through which all requests are funneled. When a buffer pool access misses, it forces a task switch, which makes the first method very expensive.

  15. Critical Sections: These become a problem in the process-per-user scenario, which can have many critical sections used by several processes.

  16. Server Model: In contrast to process per user, there is one main process, fed by several subprocesses. This requires the DBMS to create its own scheduling model, which duplicates OS work.

  17. Scheduling Summary: Neither strategy is ideal. The best situation is to have special OS instructions or scheduling hints or, ideally, a DBMS scheduling class. Note: this assumes that the DB is probably the only major process on the machine, as the scheduling algorithm only allows for voluntary resource relinquishment.

  18. Consistency: OS locking granularity may not be fine enough for DB usage. Locks are needed for pages and records, along with application-level transaction support. Since this would require knowledge of the DB in the OS, the DB ends up duplicating OS-like functionality in user space.
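As a rough illustration of the user-space locking a DBMS ends up building, here is a minimal record-level lock table. The `LockTable` class, the `(page_id, record_id)` resource keys, and the non-blocking `acquire` are assumptions made for this sketch; a real lock manager would also handle waiting, deadlock, and lock hierarchies.

```python
class LockTable:
    """Hypothetical non-blocking lock table keyed by (page_id, record_id)."""

    def __init__(self):
        self.locks = {}  # resource -> (mode, set of transaction ids)

    def acquire(self, txn, resource, mode):
        """Try to lock resource in mode 'S' (shared) or 'X' (exclusive)."""
        held = self.locks.get(resource)
        if held is None:
            self.locks[resource] = (mode, {txn})
            return True
        held_mode, owners = held
        if owners == {txn}:
            # The sole owner may re-acquire, or upgrade from S to X.
            new_mode = "X" if "X" in (mode, held_mode) else "S"
            self.locks[resource] = (new_mode, owners)
            return True
        if mode == "S" and held_mode == "S":
            owners.add(txn)  # shared locks are compatible with each other
            return True
        return False  # conflict: the caller must wait or abort

    def release_all(self, txn):
        """Drop every lock held by txn, as at commit or abort."""
        for resource in list(self.locks):
            _, owners = self.locks[resource]
            owners.discard(txn)
            if not owners:
                del self.locks[resource]

table = LockTable()
got_x = table.acquire("T1", ("page7", 3), "X")        # T1 writes record 3
got_other = table.acquire("T2", ("page7", 4), "S")    # different record, same page
got_conflict = table.acquire("T2", ("page7", 3), "S") # conflicts with T1's X lock
table.release_all("T1")
got_after = table.acquire("T2", ("page7", 3), "S")    # free once T1 has finished
```

Two transactions can work on different records of the same page here, which a file-level or page-level lock would not allow.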

  19. Ordering Dependency: The OS must provide in-order execution, as many DB requests depend on one another.

  20. Bottom Line: A general-purpose OS is not ideal for running a database. A better solution would be a small OS with minimal services, with everything else implemented in the DB. The paper was written in 1981... and DBMSs still run on general-purpose OSs!

  21. Further Research: Database file system. Switch roles: use a database as the file system.

  22. Further Research: Since a DB provides many capabilities a file system would typically use, how would a DB perform as a file system? Text searching is not easily possible on a file system but is easy in a DB. Testing is difficult, since a DB already runs on a file system!
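One way to picture the role swap is a toy "file system" stored in a database table. This sketch uses Python's standard `sqlite3` module with an in-memory database; the `files` table, the helper names, and the `LIKE`-based search are illustrative choices, not something the slides propose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (path TEXT PRIMARY KEY, content TEXT)")

def write_file(path, content):
    """Create or overwrite the file at path."""
    conn.execute(
        "INSERT OR REPLACE INTO files (path, content) VALUES (?, ?)",
        (path, content),
    )

def read_file(path):
    """Return the file's content, or None if it does not exist."""
    row = conn.execute(
        "SELECT content FROM files WHERE path = ?", (path,)
    ).fetchone()
    return None if row is None else row[0]

def search(term):
    """Return the paths of files whose content contains term."""
    rows = conn.execute(
        "SELECT path FROM files WHERE content LIKE ? ORDER BY path",
        (f"%{term}%",),
    )
    return [r[0] for r in rows]

write_file("/notes/lru.txt", "LRU works poorly for cyclic scans")
write_file("/notes/locks.txt", "record level locks")
matches = search("cyclic")
```

Content search becomes a single query, which is the convenience the slide points to. The sketch says nothing about performance, and it shares the testing difficulty the slide raises, because SQLite itself normally sits on top of a file system.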
