
CS222P: Principles of Data Management, Lecture #2: Heap Files, Page Structure, Record Formats

Instructor: Chen Li



  1. CS222P: Principles of Data Management, Lecture #2: Heap Files, Page Structure, Record Formats. Instructor: Chen Li

  2. Today’s Topics
     • Files of records: heap files
     • Page formats
     • Record formats
     • Project 1 overview

  3. Next topic: Files of Records
     • A page or block is the right unit for I/O, but higher levels of the DBMS operate on records and thus want files of records.
     • FILE: a collection of pages, each containing a collection of records. It must support:
       • Insert (append), delete, and modify a record
       • Read a particular record (specified by its record id)
       • Scan all records (possibly with conditions on the records to be retrieved)
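The operations listed on this slide can be sketched as a small in-memory class. This is a toy, not the Project 1 API: the class name `ToyRecordFile`, its method names, and the `slots_per_page` parameter are assumptions made for illustration, and a real file would keep its pages as bytes on disk.

```python
from typing import Callable, Iterator, List, Optional, Tuple

RID = Tuple[int, int]  # (page number, slot number)


class ToyRecordFile:
    """In-memory stand-in for a file of records (hypothetical interface)."""

    def __init__(self, slots_per_page: int = 4) -> None:
        self.slots_per_page = slots_per_page
        # Each page is a list of slots; None marks a deleted record.
        self.pages: List[List[Optional[bytes]]] = []

    def insert(self, record: bytes) -> RID:
        for page_no, page in enumerate(self.pages):
            # Reuse a freed slot on this page if there is one.
            for slot_no, slot in enumerate(page):
                if slot is None:
                    page[slot_no] = record
                    return (page_no, slot_no)
            # Otherwise append to this page if it still has room.
            if len(page) < self.slots_per_page:
                page.append(record)
                return (page_no, len(page) - 1)
        # No existing page has room: allocate a new page.
        self.pages.append([record])
        return (len(self.pages) - 1, 0)

    def read(self, rid: RID) -> Optional[bytes]:
        page_no, slot_no = rid
        return self.pages[page_no][slot_no]

    def update(self, rid: RID, record: bytes) -> None:
        page_no, slot_no = rid
        self.pages[page_no][slot_no] = record

    def delete(self, rid: RID) -> None:
        page_no, slot_no = rid
        self.pages[page_no][slot_no] = None

    def scan(
        self, predicate: Callable[[bytes], bool] = lambda r: True
    ) -> Iterator[Tuple[RID, bytes]]:
        # Visit every live record, in no particular logical order.
        for page_no, page in enumerate(self.pages):
            for slot_no, record in enumerate(page):
                if record is not None and predicate(record):
                    yield (page_no, slot_no), record
```

Note that a record id returned by `insert` stays valid until that record is deleted, which is the property the page formats later in the lecture try to preserve.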

  4. Unordered (“Heap”) Files
     • The simplest file structure: it contains records in no particular (logical) order.
     • As the file grows and shrinks, disk pages are allocated and de-allocated.
     • To support record-level operations, we must keep track of:
       • the pages in a file
       • free space within and across pages
       • the records on a page
       • the fields within records
     • There are many alternatives for each.

  5. Heap File Implemented as a List
     [Figure: a header page heads two lists of data pages, one of full pages and one of pages with free space.]
     • The header page id and heap file name must be stored someplace. (Project 1 note: The OS filesystem can help…!)
     • Each page contains two extra “pointers” in this case.
     • Refinement: use several lists for different degrees of free space (to mention just one of many possibilities).

  6. Heap File Using a Page Directory
     [Figure: a DIRECTORY, starting at the header page, with entries for Data Page 1, Data Page 2, ..., Data Page N.]
     • Page entries can include the number of free bytes on each page.
     • The directory is itself a collection of pages; a linked list is just one possible implementation. (Note: Can also do extents!)
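A minimal sketch of a page directory that records free bytes per page, so that an insert can pick a page without reading the data pages themselves. The class name `PageDirectory`, the method `reserve`, and the 4096-byte page size are assumptions for illustration; a real directory would itself be stored in pages.

```python
from typing import List

PAGE_SIZE = 4096  # assumed page size in bytes


class PageDirectory:
    """Tracks the free bytes of each data page (hypothetical sketch)."""

    def __init__(self) -> None:
        # free_bytes[i] is the number of free bytes on data page i.
        self.free_bytes: List[int] = []

    def reserve(self, needed: int) -> int:
        """Return the number of a page with room, and charge it `needed` bytes."""
        if needed > PAGE_SIZE:
            raise ValueError("record does not fit in a single page")
        # First fit: take the first page whose entry shows enough space.
        for page_no, free in enumerate(self.free_bytes):
            if free >= needed:
                self.free_bytes[page_no] -= needed
                return page_no
        # No page has room: allocate a fresh one at the end of the file.
        self.free_bytes.append(PAGE_SIZE - needed)
        return len(self.free_bytes) - 1
```

First fit is only one possible policy; the same directory could equally support best fit or the several-free-space-lists refinement mentioned on the previous slide.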

  7. Project 1: PFM (PagedFileManager)

  8. Next: Page format

  9. Page Formats: Fixed Length Records
     [Figure: two page layouts. PACKED: slots 1..N stored contiguously, followed by free space, with the number of records (N) kept on the page. UNPACKED, BITMAP: slots 1..M, with a bitmap marking which slots are occupied and the number of slots (M) kept on the page.]
     • Record id = <page id, slot #>.
     • In the first (packed) alternative, records move around for free space management, so their RIDs change, which may be unacceptable!
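The unpacked alternative can be sketched as follows. The class name `BitmapPage` and its methods are hypothetical, and the bitmap is kept as a Python list of booleans rather than packed bits to keep the example short.

```python
from typing import List, Optional


class BitmapPage:
    """Fixed-length records in fixed slots, with an occupancy bitmap."""

    def __init__(self, record_size: int, num_slots: int) -> None:
        self.record_size = record_size
        self.num_slots = num_slots
        self.data = bytearray(record_size * num_slots)
        self.occupied: List[bool] = [False] * num_slots

    def insert(self, record: bytes) -> int:
        if len(record) != self.record_size:
            raise ValueError("record has the wrong length")
        # Take the first free slot according to the bitmap.
        for slot in range(self.num_slots):
            if not self.occupied[slot]:
                start = slot * self.record_size
                self.data[start:start + self.record_size] = record
                self.occupied[slot] = True
                return slot
        raise RuntimeError("page is full")

    def delete(self, slot: int) -> None:
        # Only the bitmap changes; no other record moves, so RIDs stay valid.
        self.occupied[slot] = False

    def read(self, slot: int) -> Optional[bytes]:
        if not self.occupied[slot]:
            return None
        start = slot * self.record_size
        return bytes(self.data[start:start + self.record_size])
```

In the packed layout, by contrast, a delete would shift a later record into the hole, which is what changes that record's slot number.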

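  10. Page Formats: Variable Length Records
      [Figure: page i holds records with Rid = (i,1), Rid = (i,2), ..., Rid = (i,N), with free space in the middle of the page. A SLOT DIRECTORY holds one (offset, length) entry per slot (example lengths 20, 16, 24), together with the number of slots (N) and a pointer to the free space (F).]
      • Can move records within the page without changing RIDs; as a result, this format is not so unattractive for fixed-length records either.
      • What if a record must move? Options: (1) leave a tombstone behind, or (2) refer to records by primary key (PKeys) rather than by RID.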

  11. ... Variable Length Records (cont.)
      [Figure: page i with RECORDS (i,1), (i,2), ..., (i,20) in one area and the SLOT DIRECTORY in the other.]
      • Two variable-sized areas growing towards each other (living within a one-page space budget!)
      • Other variations on these formats are possible as well:
        • Could track free-space holes with an offset-based list structure
        • Could use a different record format (e.g., PAX, which clusters values by field within the page rather than by record and then field)
        • ....
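A minimal sketch of a slotted page for variable-length records, showing that records can be compacted inside the page while slot numbers (and therefore RIDs) stay fixed. The class name `SlottedPage`, the tiny 64-byte page, and the assumed 4-byte cost per slot entry are all illustrative choices; the slot directory is kept as a Python list rather than serialized at the end of the page.

```python
from typing import List, Optional, Tuple


class SlottedPage:
    """Records grow from the front; the slot directory is budgeted from the back."""

    SLOT_BYTES = 4  # assumed on-page cost of one (offset, length) entry

    def __init__(self, size: int = 64) -> None:
        self.size = size
        self.data = bytearray(size)
        # One (offset, length) per slot; length -1 marks an empty slot.
        self.slots: List[Tuple[int, int]] = []
        self.free_start = 0  # where the free space begins

    def free_space(self) -> int:
        return self.size - self.free_start - self.SLOT_BYTES * len(self.slots)

    def insert(self, record: bytes) -> int:
        # Prefer an empty slot so the directory does not grow.
        slot_no = next((i for i, (_, n) in enumerate(self.slots) if n < 0), None)
        extra = 0 if slot_no is not None else self.SLOT_BYTES
        if len(record) + extra > self.free_space():
            raise RuntimeError("not enough free space on page")
        offset = self.free_start
        self.data[offset:offset + len(record)] = record
        self.free_start += len(record)
        if slot_no is None:
            self.slots.append((offset, len(record)))
            return len(self.slots) - 1
        self.slots[slot_no] = (offset, len(record))
        return slot_no

    def read(self, slot_no: int) -> Optional[bytes]:
        offset, length = self.slots[slot_no]
        if length < 0:
            return None
        return bytes(self.data[offset:offset + length])

    def delete(self, slot_no: int) -> None:
        offset, length = self.slots[slot_no]
        if length < 0:
            return
        # Slide the later records left to close the hole.
        tail = self.data[offset + length:self.free_start]
        self.data[offset:offset + len(tail)] = tail
        self.free_start -= length
        self.slots[slot_no] = (-1, -1)
        # Fix up offsets of the moved records; their slot numbers do not change.
        self.slots = [
            (o - length, n) if n >= 0 and o > offset else (o, n)
            for (o, n) in self.slots
        ]
```

After a delete, the surviving records have new offsets but the same slot numbers, so any RID held elsewhere in the system still resolves correctly.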

  12. Next: record formats

  13. Example
      CREATE TABLE Emp(id INT, gender CHAR(1), name VARCHAR(30), Salary float);

  14. Record Formats: Fixed Length
      [Figure: a record with fields F1, F2, F3, F4 of lengths L1, L2, L3, L4, starting at base address B. Address of F3 = B + L1 + L2.]
      • Information about field types is the same for all records in a file; it is stored in the system catalogs. (Note: Record field info in Project 1 is passed in “from above”…!)
      • Finding the i’th field of a record does not require scanning the record.
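The address arithmetic on this slide can be tried out with Python's `struct` module. The sketch below borrows the Emp table but, as an assumption made only to keep every record the same length, stores `name` as a padded 30-byte field instead of a true VARCHAR; the field lengths and the helper name `field_offset` are likewise illustrative.

```python
import struct

# Assumed fixed lengths in bytes: id INT, gender CHAR(1), name padded to 30, salary float.
FIELD_LENGTHS = [4, 1, 30, 4]
RECORD_FORMAT = "<i1s30sf"  # little-endian, no alignment padding


def field_offset(i: int) -> int:
    """Offset of field i (0-based) from the record's base address."""
    # Sum of the lengths of all earlier fields, e.g. L1 + L2 for the third field.
    return sum(FIELD_LENGTHS[:i])


record = struct.pack(RECORD_FORMAT, 7, b"F", b"Ada", 1234.5)

# Jump straight to the salary without looking at the other fields.
(salary,) = struct.unpack_from("<f", record, field_offset(3))

# The name field starts at L1 + L2 and is padded with zero bytes.
start = field_offset(2)
name = record[start:start + FIELD_LENGTHS[2]].rstrip(b"\x00")
```

Because the lengths come from the schema rather than from the record, the same offsets work for every record in the file.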

  15. Record Formats: Variable Length
      • Several alternative formats (# fields is fixed):
        • Fields delimited by special symbols: [Figure: values v1..v4 of fields F1..F4, each followed by a $ delimiter.]
        • Fields preceded by field lengths: [Figure: values v1..v4 of fields F1..F4, each preceded by its length L1..L4.]
      • Some thought questions for you:
        • (1) What’s true of the second format but not the first?
        • (2) What annoying disadvantage do both formats share?
        • (3) And, how do we know the field count in each case?
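To experiment with the thought questions, here is a small sketch of both formats. The function names, the choice of `$` as the delimiter byte, and the 2-byte little-endian length prefix are assumptions for illustration only.

```python
import struct
from typing import List

DELIM = b"$"


def encode_delimited(fields: List[bytes]) -> bytes:
    for f in fields:
        if DELIM in f:
            raise ValueError("field value contains the delimiter")
    # Each value is followed by the delimiter.
    return b"".join(f + DELIM for f in fields)


def decode_delimited(record: bytes) -> List[bytes]:
    # split() leaves one empty piece after the final delimiter; drop it.
    return record.split(DELIM)[:-1]


def encode_length_prefixed(fields: List[bytes]) -> bytes:
    # Each value is preceded by its length as a 2-byte unsigned integer.
    return b"".join(struct.pack("<H", len(f)) + f for f in fields)


def decode_length_prefixed(record: bytes) -> List[bytes]:
    fields = []
    pos = 0
    while pos < len(record):
        (length,) = struct.unpack_from("<H", record, pos)
        pos += 2
        fields.append(record[pos:pos + length])
        pos += length
    return fields
```

Both decoders walk the record from its first byte, and in this sketch neither record stores a field count: the decoders simply read until the bytes run out.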

  16. Record Formats: Variable Length (continued)
      • Variable-length fields with a directory:
        [Figure: a record holding the field count (4), then an array of field offsets (a.k.a. directory), then the values v1, v2, v3, v4 of fields F1..F4.]
      • This format:
        • (1) Offers direct access to the i'th field.
        • (2) Helps support efficient storage of null values. (Q: How?)
        • (3) Just requires a small directory overhead.
        • (4) Can even help with ALTER TABLE ADD COLUMN! (Q: How?)
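A minimal sketch of the directory format, assuming a 2-byte field count followed by one 2-byte end offset per field and then the values. The function names and the exact layout are illustrative assumptions, as is the choice to return `None` for a field number beyond the stored count.

```python
import struct
from typing import List, Optional


def encode_with_directory(fields: List[bytes]) -> bytes:
    n = len(fields)
    header_size = 2 + 2 * n  # field count + one end offset per field
    ends = []
    pos = header_size
    for f in fields:
        pos += len(f)
        ends.append(pos)  # offset just past the end of this field
    return struct.pack(f"<H{n}H", n, *ends) + b"".join(fields)


def get_field(record: bytes, i: int) -> Optional[bytes]:
    """Fetch field i (0-based) without scanning the earlier fields."""
    (n,) = struct.unpack_from("<H", record, 0)
    if i >= n:
        # The record was written with fewer fields than the reader expects.
        return None
    header_size = 2 + 2 * n
    if i == 0:
        start = header_size
    else:
        # A field starts where the previous one ends.
        (start,) = struct.unpack_from("<H", record, 2 + 2 * (i - 1))
    (end,) = struct.unpack_from("<H", record, 2 + 2 * i)
    return record[start:end]
```

Reading any field costs two directory lookups regardless of its position. Storing the count in the record is also what lets a reader notice that an older record has fewer fields than the current schema, which is one way to think about the ALTER TABLE question.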

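  17. Record Formats: Variable Length
      • More variations on a theme...
        • Addition of null flags: [Figure: field count (4), null flags (0000), offset directory, then the values v1, v2, v3, v4 of fields F1..F4.]
        • Inlining of fixed-size fields: [Figure: field count (4), null flags (0000), with the fixed-size fields (F1) and (F3) stored inline as v1 and v3, and the variable-size fields F2 and F4 stored as v2 and v4.]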
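The null-flag variation can be sketched by adding a bitmap in front of the offset directory. The layout assumed here (2-byte field count, one null bit per field, 2-byte end offsets, then values) and the function names are illustrative; a null field sets its bit and stores no value bytes.

```python
import struct
from typing import List, Optional


def encode_with_nulls(fields: List[Optional[bytes]]) -> bytes:
    n = len(fields)
    bitmap = bytearray((n + 7) // 8)  # one null bit per field
    header_size = 2 + len(bitmap) + 2 * n
    ends = []
    pos = header_size
    body = b""
    for i, f in enumerate(fields):
        if f is None:
            bitmap[i // 8] |= 1 << (i % 8)  # flag the field as null
        else:
            body += f
            pos += len(f)
        ends.append(pos)  # a null field ends where it starts
    return struct.pack("<H", n) + bytes(bitmap) + struct.pack(f"<{n}H", *ends) + body


def get_nullable_field(record: bytes, i: int) -> Optional[bytes]:
    (n,) = struct.unpack_from("<H", record, 0)
    if record[2 + i // 8] & (1 << (i % 8)):
        return None  # null flag is set
    dir_start = 2 + (n + 7) // 8
    header_size = dir_start + 2 * n
    if i == 0:
        start = header_size
    else:
        (start,) = struct.unpack_from("<H", record, dir_start + 2 * (i - 1))
    (end,) = struct.unpack_from("<H", record, dir_start + 2 * i)
    return record[start:end]
```

With an explicit flag, a null value and an empty string are distinguishable even though both occupy zero bytes in the value area.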

  18. Project 1: RecordBasedFileManager

  19. PAX format
      [Figure: Traditional Format vs. PAX Format page layouts.]
      • PAX partitions each page into minipages based on fields
      • Good caching behavior for “select fields from …”
      • Compression
      • www.pdl.cmu.edu/PDL-FTP/Database/pax.pdf
      • Column store (e.g., Vertica)
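A toy illustration of the minipage idea, using Python lists in place of real page bytes. The function names and the sample Emp rows are hypothetical; the point is only that the values of one field end up next to each other within the page, while a whole record can still be reassembled from its slot number.

```python
from typing import Any, List, Sequence, Tuple


def to_pax(rows: Sequence[Tuple[Any, ...]], num_fields: int) -> List[List[Any]]:
    """Regroup the records of one page into one minipage per field."""
    minipages: List[List[Any]] = [[] for _ in range(num_fields)]
    for row in rows:
        for field_no, value in enumerate(row):
            minipages[field_no].append(value)
    return minipages


def get_record(minipages: List[List[Any]], slot: int) -> Tuple[Any, ...]:
    """Reassemble one record by taking the same position from every minipage."""
    return tuple(minipage[slot] for minipage in minipages)


# Sample rows for Emp(id, gender, name, Salary); the data is made up.
rows = [(1, "F", "Ada", 100.0), (2, "M", "Bob", 90.0), (3, "F", "Cy", 80.0)]
page = to_pax(rows, num_fields=4)

# A query that needs only the names touches a single minipage.
names = page[2]
```

Unlike a full column store, the regrouping here happens within a single page, so all fields of a record still live on the same page.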
