CPSC 461 Final Review I


Presentation Transcript


  1. CPSC 461 Final Review I. Hessam Zakerzadeh, Dina Said

  2. 9.1) What is the most important difference between a disk and a tape?

  3. 9.1) What is the most important difference between a disk and a tape? Tapes are sequential devices that do not support direct access to a desired page. We must essentially step through all pages in order. Disks support direct access to a desired page.

  4. Exercise 11.4 Answer the following questions about Linear Hashing: 1. How does Linear Hashing provide an average-case search cost of only slightly more than one disk I/O, given that overflow buckets are part of its data structure?

  5. Linear Hashing
     • No directory
     • More flexibility with respect to the timing of bucket splits
     • Worse performance than Extendible Hashing if data is skewed
     • Uses a family of hash functions h_0, h_1, … such that h_i(v) = h(v) mod (2^i * N)
     • N is the initial number of buckets
     • If N is a power of 2 (N = 2^d0), then apply h and look at the last d_i bits, where d_i = d0 + i
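The hash-function family on this slide can be sketched in a few lines of Python. This is only an illustration: it assumes the base hash h is the identity on integer keys and N = 4 (the value used in the examples on the later slides), and the names `h_i` and `last_bits` are made up for this sketch.

```python
def h_i(i, v, n=4):
    """h_i(v) = h(v) mod (2^i * N), assuming h(v) = v for integer keys."""
    return v % ((2 ** i) * n)


def last_bits(v, d):
    """Return the integer formed by the last d bits of v."""
    return v & ((1 << d) - 1)


# With N = 4 = 2^2 we have d0 = 2, so h_i looks at the last d0 + i bits.
d0 = 2
for key in (43, 44, 50):
    for i in (0, 1, 2):
        assert h_i(i, key) == last_bits(key, d0 + i)
```

Each h_(i+1) doubles the range of h_i, which is what lets a split bucket hand roughly half of its entries to its split image.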

  6. Inserting a Data Entry in LH
     • Find the bucket by applying h_Level or h_(Level+1).
     • If the bucket to insert into is full:
         • Add an overflow page and insert the data entry.
         • (Maybe) split the Next bucket and increment Next.
     • Else simply insert the data entry into the bucket.

  7. Bucket Split
     • A split can be triggered by
         • the addition of a new overflow page
         • conditions such as space utilization
     • Whenever a split is triggered,
         • the Next bucket is split,
         • hash function h_(Level+1) redistributes entries between this bucket (say bucket number b) and its split image;
         • the split image is therefore bucket number b + N_Level, where N_Level = 2^Level * N is the number of buckets at the start of the round;
         • Next ← Next + 1.
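The insert and split rules above can be tried out with a toy in-memory version. This is a minimal sketch under stated assumptions, not a disk-based implementation: the class name `LinearHashIndex` and its methods are hypothetical, the base hash is assumed to be the identity on integer keys, each page is assumed to hold 4 entries, and a split is triggered only when a new overflow page is added.

```python
class LinearHashIndex:
    """Toy linear hashing index; each bucket is one Python list of keys."""

    def __init__(self, n=4, page_capacity=4):
        self.n = n                          # initial number of buckets (N)
        self.page_capacity = page_capacity  # entries per page (assumed)
        self.level = 0
        self.next = 0
        self.buckets = [[] for _ in range(n)]

    def _h(self, i, key):
        # Assumption: h is the identity, so h_i(key) = key mod (2^i * N).
        return key % ((2 ** i) * self.n)

    def bucket_of(self, key):
        b = self._h(self.level, key)
        if b < self.next:                   # bucket already split this round
            b = self._h(self.level + 1, key)
        return b

    def overflow_pages(self, b):
        extra = max(0, len(self.buckets[b]) - self.page_capacity)
        return -(-extra // self.page_capacity)  # ceiling division

    def insert(self, key):
        b = self.bucket_of(key)
        before = self.overflow_pages(b)
        self.buckets[b].append(key)
        if self.overflow_pages(b) > before:     # a new overflow page was added
            self._split()

    def _split(self):
        n_level = (2 ** self.level) * self.n    # buckets at start of the round
        b = self.next
        image = b + n_level                     # split image of bucket b
        assert image == len(self.buckets)
        self.buckets.append([])
        entries, self.buckets[b] = self.buckets[b], []
        for key in entries:                     # h_(Level+1) sends each key to b or image
            self.buckets[self._h(self.level + 1, key)].append(key)
        self.next += 1
        if self.next == n_level:                # every bucket split: round ends
            self.level += 1
            self.next = 0


index = LinearHashIndex()
for key in (32, 44, 36, 9, 25, 5, 14, 18, 10, 30, 31, 35, 7, 11):
    index.insert(key)
index.insert(43)  # bucket 3 gets an overflow page, so bucket 0 (Next) is split
print(index.level, index.next, index.buckets[0], index.buckets[4])
```

Note that the bucket that overflows (bucket 3 here) is not the one that gets split; the split always goes to the bucket that Next points at.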

  8. Example: Insert 44 (101100), 9 (01001). Level=0, Next=0, N=4.
     (The h1 and h0 columns are for illustration only.)

     h1    h0    Primary pages
     000   00    32* 44* 36*          (Next=0)
     001   01    9* 25* 5*
     010   10    14* 18* 10* 30*
     011   11    31* 35* 7* 11*

  9. Example: Insert 43 (101011). Level=0, N=4.
     (The h1 and h0 columns are for illustration only.)

     Before the insert (Next=0):

     h1    h0    Primary pages
     000   00    32* 44* 36*          (Next=0)
     001   01    9* 25* 5*
     010   10    14* 18* 10* 30*
     011   11    31* 35* 7* 11*

     After the insert (Level=0, Next=1):

     h1    h0    Primary pages        Overflow pages
     000   00    32*
     001   01    9* 25* 5*                                (Next=1)
     010   10    14* 18* 10* 30*
     011   11    31* 35* 7* 11*       43*
     100   00    44* 36*

  10. Example: End of a Round. Insert 50 (110010).

      Before the insert (Level=0, Next=3):

      h1    h0    Primary pages        Overflow pages
      000   00    32*
      001   01    9* 25*
      010   10    66* 18* 10* 34*
      011   11    31* 35* 7* 11*       43*                 (Next=3)
      100   00    44* 36*
      101   01    5* 37* 29*
      110   10    14* 30* 22*

      After the insert (Level=1, Next=0):

      h1    h0    Primary pages        Overflow pages
      000   00    32*                                      (Next=0)
      001   01    9* 25*
      010   10    66* 18* 10* 34*      50*
      011   11    43* 35* 11*
      100   00    44* 36*
      101   01    5* 37* 29*
      110   10    14* 30* 22*
      111   11    31* 7*

  11. Exercise 11.4 Answer the following questions about Linear Hashing: 1. How does Linear Hashing provide an average-case search cost of only slightly more than one disk I/O, given that overflow buckets are part of its data structure?

  12. • If we start with an index that has B buckets, all of the buckets are split in order during the round, one after the other.
      • A hash function is expected to distribute the search key values uniformly over all the buckets.
      • A split can be triggered by conditions such as space utilization, which reduces the length of the overflow chains.
      • Therefore, the number of overflow pages in a bucket is not expected to be more than 1.

  13. Exercise 11.4 Answer the following questions about Linear Hashing: Does Linear Hashing guarantee at most one disk access to retrieve a record with a given key value?

  14. Exercise 11.4 Answer the following questions about Linear Hashing: Does Linear Hashing guarantee at most one disk access to retrieve a record with a given key value? No. Overflow chains are part of the structure, so no such guarantee is provided.

  15. Exercise 11.4 Answer the following questions about Linear Hashing: If a Linear Hashing index using Alternative (1) for data entries contains N records, with P records per page and an average storage utilization of 80 percent, what is the worst-case cost for an equality search? Under what conditions would this cost be the actual search cost?

  16. With 80 percent storage utilization, each page holds 0.8 * P records on average. If all keys map to the same bucket, that bucket has N / (0.8 * P) pages, and an equality search may have to read all of them. This is the worst case.
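The worst-case figure can be checked with a short calculation. The sketch below is illustrative only: the function name `worst_case_search_ios` is made up, and the sample values N = 1,000,000 and P = 100 are assumptions chosen for the example, not values from the exercise.

```python
def worst_case_search_ios(n_records, records_per_page, utilization_pct=80):
    """Pages in one chain that holds all N records: ceil(N / (0.8 * P))."""
    # Integer-only ceiling division avoids floating-point rounding.
    return -((-n_records * 100) // (utilization_pct * records_per_page))


# Hypothetical sizes: one million records, 100 records per page.
print(worst_case_search_ios(1_000_000, 100))
```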

  17. Exercise 11.4 Answer the following questions about Linear Hashing: If the hash function distributes data entries over the space of bucket numbers in a very skewed (non-uniform) way, what can you say about the space utilization in data pages?

  18. Space utilization = Total number of buckets / Total number of pages
      If data is skewed, all records are mapped to the same bucket:
      • Suppose that we have m main pages.
      • All records will be mapped to bucket 0.
      • Each additional overflow page will cause a split.
      • Suppose we added n overflow pages to bucket 0 → we added n buckets.
      • Total number of buckets = n + 1
      • Total number of pages = m + n + n
      Space utilization = (n + 1) / (m + 2n) < 50% → very bad
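The ratio on this slide is easy to tabulate. The sketch below simply evaluates the slide's expression (n + 1) / (m + 2n); the function name and the choice m = 4 are assumptions made for illustration. Algebraically, the ratio is strictly below one half whenever m > 2.

```python
def skewed_space_utilization(m, n):
    """Evaluate the slide's ratio (n + 1) / (m + 2n) for m main pages, n overflow pages."""
    return (n + 1) / (m + 2 * n)


# Assumed m = 4 main pages; the ratio approaches 50% from below as n grows.
for n in (1, 10, 1000):
    print(n, round(skewed_space_utilization(4, n), 4))
```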

  19. Exercise 13.4

  20. File: 10*10^6 pages. Buffer: 320 pages. Average seek time 10 ms, average rotational delay 5 ms, transfer time 1 ms per page. The page is 4K.
      For Pass 0:
      No. of runs = Ceil(10*10^6 / 320) = 31250
      Read cost per run = 10 + 5 + 1*320
      Write cost per run = 10 + 5 + 1*320
      Total I/O cost = No. of runs * (Read cost + Write cost) = 31250 * 2 * (15 + 320) → cost of Pass 0
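The Pass 0 arithmetic can be reproduced directly. This is a small sketch using the numbers from the slide; the constant and function names are made up for the example, and all costs are in milliseconds.

```python
FILE_PAGES = 10 * 10**6
BUFFER_PAGES = 320
SEEK_MS, ROTATE_MS, TRANSFER_MS = 10, 5, 1


def sequential_io_ms(pages):
    """One seek plus one rotational delay, then 1 ms per page transferred."""
    return SEEK_MS + ROTATE_MS + TRANSFER_MS * pages


# Each run fills the whole buffer, so Pass 0 produces ceil(file / buffer) runs.
runs = -(-FILE_PAGES // BUFFER_PAGES)
# Every run is read once and written once as a single sequential transfer.
pass0_ms = runs * 2 * sequential_io_ms(BUFFER_PAGES)
print(runs, pass0_ms)
```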

  21. Total cost for subsequent merges = No. of passes * (Read cost + Write cost)
      No. of passes = ceil(log_ways(31250)) = ceil(ln 31250 / ln(No. of ways))
      Read/Write cost = No. of blocks * (10 + 5 + 1 * No. of pages per block)
      No. of blocks = Ceil(10*10^6 / No. of pages per block)
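The number of merge passes can also be counted without logarithms, by repeatedly merging groups of runs until one run remains. The sketch below is one way to do that; the function name `merge_passes` is an assumption of this example, and the logarithm formula from the slide is evaluated alongside as a cross-check.

```python
import math


def merge_passes(initial_runs, ways):
    """Count k-way merge passes needed to reduce initial_runs to a single run."""
    passes = 0
    runs = initial_runs
    while runs > 1:
        runs = -(-runs // ways)  # each pass merges groups of `ways` runs
        passes += 1
    return passes


for ways in (256, 4):
    by_log = math.ceil(math.log(31250) / math.log(ways))
    print(ways, merge_passes(31250, ways), by_log)
```

The loop uses only integer arithmetic, so it cannot be thrown off by floating-point rounding when the run count is an exact power of the merge width.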

  22. b) Create 256 ‘input’ buffers of 1 page each, create an ‘output’ buffer of 64 pages, and do 256-way merges.
      Total cost for subsequent merges = No. of passes * (Read cost + Write cost)
      No. of passes = ceil(ln 31250 / ln(No. of ways)) = ceil(ln 31250 / ln 256) = 2
      Read cost = No. of blocks * (10 + 5 + 1 * 1) = 10*10^6 * 16 = 16*10^7

  23. b), continued.
      No. of blocks = Ceil(10*10^6 / No. of pages per block) = Ceil(10*10^6 / 64) = 156250
      Write cost = No. of blocks * (10 + 5 + 1 * No. of pages per block) = 156250 * (15 + 64)

  24. b), continued.
      Total cost for subsequent merges = No. of passes * (Read cost + Write cost) = 2 * (16*10^7 + 156250 * (15 + 64))
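The total for part b) can be evaluated numerically. This sketch uses the figures from the slides (1-page reads, 64-page writes, 2 passes); the helper name `blocked_io_ms` is hypothetical, and costs are in milliseconds.

```python
def blocked_io_ms(total_pages, pages_per_block, seek_ms=10, rotate_ms=5, transfer_ms=1):
    """Cost of moving total_pages in blocks, paying seek + rotation once per block."""
    blocks = -(-total_pages // pages_per_block)  # ceiling division
    return blocks * (seek_ms + rotate_ms + transfer_ms * pages_per_block)


FILE_PAGES = 10 * 10**6
read_ms = blocked_io_ms(FILE_PAGES, 1)    # 256 input buffers of 1 page each
write_ms = blocked_io_ms(FILE_PAGES, 64)  # one 64-page output buffer
total_b_ms = 2 * (read_ms + write_ms)     # 2 merge passes
print(read_ms, write_ms, total_b_ms)
```

The read side dominates because every single page pays the full 15 ms of seek and rotational delay.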

  25. e) Create four ‘input’ buffers of 64 pages each, create an ‘output’ buffer of 64 pages, and do four-way merges.
      Total cost for subsequent merges = No. of passes * (Read cost + Write cost)
      No. of passes = ceil(ln 31250 / ln(No. of ways)) = ceil(ln 31250 / ln 4) = 8
      No. of blocks = Ceil(10*10^6 / No. of pages per block) = Ceil(10*10^6 / 64) = 156250
      Read/Write cost = No. of blocks * (10 + 5 + 1 * No. of pages per block) = 156250 * (15 + 64)
      Total cost = 8 * (2 * 156250 * (15 + 64))
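The part e) total works out as follows. The sketch below just evaluates the expression from the slide step by step, in milliseconds; the variable names are chosen for this example.

```python
FILE_PAGES = 10 * 10**6
PAGES_PER_BLOCK = 64
PASSES = 8

blocks = -(-FILE_PAGES // PAGES_PER_BLOCK)         # ceil(10^7 / 64)
cost_per_block_ms = 10 + 5 + 1 * PAGES_PER_BLOCK   # seek + rotation + transfer
one_way_ms = blocks * cost_per_block_ms            # read cost equals write cost
total_e_ms = PASSES * 2 * one_way_ms
print(blocks, one_way_ms, total_e_ms)
```

For comparison, the part b) expression 2 * (16*10^7 + 156250 * (15 + 64)) evaluates to 344,687,500 ms, so the four-way merge with 64-page blocks comes out cheaper even though it needs four times as many passes.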
