The Development of File Structures

Chin-Chen Chang, Ph.D., Chair Professor, Dept. of Information Engineering and Computer Science, Feng Chia University, Taiwan.


Presentation Transcript


  1. The Development of File Structures. Chin-Chen Chang, Ph.D., Chair Professor, Dept. of Information Engineering and Computer Science, Feng Chia University, Taiwan

  2. Fig.1 Disk organization

  3. Fig.2 Seven records in the file

  4. Fig.3 Index organization (figure text: "SIR can read")

  5. Fig.4 Index sequential file

  6. Fig.5 Index non-sequential file

  7. Fig.6 Inserting records in an index non-sequential file (Fig.2). Figure labels: Ant, Dinosaur, Kangaroo, Sheep

  8. Fig.7 Index sequential file with reserved overflow area

  9. Fig.8 Index sequential file with empty positions left among the data records

  10. Fig.9 Difference file

  11. Fig.10 Binary tree

  12. Fig.11 A simple file structure

  13. Fig.12 A simple chained file

  14. Fig.13 Chained file

  15. Fig.14 A multi-list organization

  16. Fig.15 A multi-list organization with cellular chains. Figure labels: Cell 0, Cell 1, Cell 2, Cell 3

  17. Fig.16 Inverted list

  18. Fig.17 Inverted list with indirect addressing

  19. Fig.18 A bucket-resolved inverted-list organization. Figure labels: Bucket 0 through Bucket 9

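  20. 1. Introduction
      • Definitions
        • Multi-attribute file system: a file system whose records are characterized by more than one attribute.
        • Partial-match queries: queries of the following form: retrieve all records where $A_{i_1} = a_{i_1}, A_{i_2} = a_{i_2}, \ldots, A_{i_j} = a_{i_j}$, that is, values are specified for some of the attributes and the remaining attributes are left unspecified.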

  21. Examples 1. Table 1(a)

  22. Table 1(b) ANB = 4

  23. 2. Table 2(a)

  24. Table 2(b) ANB = 2

  25. 3. Table 3(a)

  26. Table 3(b) ANB = 2.5

  27. Problem • Given a set of records, arrange the records so that the average number of buckets to be examined (ANB), over all possible partial-match queries, is minimized.

  28. 2. The String-homomorphism Hashing (SHH) Method
      Example
      $D_1 = D_2 = \{a, b, c, d\} = D$
      $BZ = 4 = 2^2 = z^N$
      Divide $D$ into $D'_1 = \{a, b\}$ and $D'_2 = \{c, d\}$.
      BK1 : $D'_1 \times D'_1 = \{(a, a), (a, b), (b, a), (b, b)\}$
      BK2 : $D'_1 \times D'_2 = \{(a, c), (a, d), (b, c), (b, d)\}$
      BK3 : $D'_2 \times D'_1 = \{(c, a), (c, b), (d, a), (d, b)\}$
      BK4 : $D'_2 \times D'_2 = \{(c, c), (c, d), (d, c), (d, d)\}$
      The ANB is the minimum. (Why?)
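
One way to probe the "why?" is to count buckets directly. The Python sketch below enumerates every partial-match query over the two attributes, counts how many buckets hold at least one matching record, and averages the counts. It rests on assumptions that are not in the slides: each attribute is either fixed to one value of D or left unspecified (`None`), all such queries (including the fully specified and the fully unspecified one) carry equal weight, and the name `average_buckets` and the row-wise comparison layout are purely illustrative.

```python
from fractions import Fraction
from itertools import product


def average_buckets(buckets, domains):
    """Average number of buckets examined over all partial-match queries.

    Assumption: a query fixes each attribute to a domain value or leaves it
    unspecified (None), and every such query is weighted equally.
    """
    choices = [list(d) + [None] for d in domains]
    total = count = 0
    for query in product(*choices):
        # A bucket must be examined if any of its records matches the query.
        total += sum(
            any(all(q is None or q == v for q, v in zip(query, rec)) for rec in bucket)
            for bucket in buckets
        )
        count += 1
    return Fraction(total, count)


D = "abcd"
halves = ["ab", "cd"]  # D'1 and D'2 from the slide

# BK1..BK4: one bucket per pair of halves.
shh = [list(product(h1, h2)) for h1 in halves for h2 in halves]
# Hypothetical comparison layout: one bucket per value of the first attribute.
rows = [[(x, y) for y in D] for x in D]

print(average_buckets(shh, [D, D]))   # 36/25
print(average_buckets(rows, [D, D]))  # 8/5
```

Under these assumptions the SHH layout averages 1.44 buckets per query, against 1.6 for the row-wise layout with the same bucket size.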

  29. Theorem [Rivest 1976]
      • Conditions
        (1) The domains $D_1, D_2, \ldots, D_N$ are identical: $D_1 = D_2 = \cdots = D_N = D$.
        (2) The bucket size is $z^N$, where $z$ is an integer and $|D| / z = p$ is an integer.
        (3) All of the possible records are present.

  30. Theorem [Rivest 1976] (Cont.)
      • Procedures
        (1) Divide $D$ into $p$ disjoint subsets $D'_1, D'_2, \ldots, D'_p$, where $|D'_i| = z$.
        (2) Store each set of records $D'_{i_1} \times D'_{i_2} \times \cdots \times D'_{i_N}$ into one bucket.
      ANB is minimized.

  31. Extension [Lin, Lee and Du 1979]
      Example
      $D_1 = \{a, b, c, d\}$, $D_2 = \{1, 2, 3, 4\}$, $z = 2$ and $N = 2$, i.e. $BZ = z^N = 2^2 = 4$.
      Divide $D_1$ into $D_{11} = \{a, b\}$ and $D_{12} = \{c, d\}$.
      Divide $D_2$ into $D_{21} = \{1, 2\}$ and $D_{22} = \{3, 4\}$.
      BK1 : $\{(a, 1), (a, 2), (b, 1), (b, 2)\}$
      BK2 : $\{(a, 3), (a, 4), (b, 3), (b, 4)\}$
      BK3 : $\{(c, 1), (c, 2), (d, 1), (d, 2)\}$
      BK4 : $\{(c, 3), (c, 4), (d, 3), (d, 4)\}$
      The ANB is still minimized.
      ※ Note: $D_1 \ne D_2$ but $|D_1| = |D_2|$.
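
The bucket construction used in the theorem and in this extension can be sketched in a few lines of Python. The function names `partition` and `shh_buckets` are hypothetical, and grouping consecutive values into the same subset is an assumption chosen to reproduce the example above; the slides do not prescribe which values share a subset.

```python
from itertools import product


def partition(domain, z):
    """Split a domain into consecutive disjoint subsets of size z."""
    values = list(domain)
    if len(values) % z != 0:
        raise ValueError("|D| / z must be an integer")
    return [values[i:i + z] for i in range(0, len(values), z)]


def shh_buckets(domains, z):
    """One bucket per combination of subsets; each bucket holds z**N records."""
    parts = [partition(d, z) for d in domains]
    return [list(product(*combo)) for combo in product(*parts)]


buckets = shh_buckets([["a", "b", "c", "d"], [1, 2, 3, 4]], z=2)
for number, bucket in enumerate(buckets, start=1):
    print(f"BK{number}: {bucket}")
```

With the two domains of the example and z = 2, this yields the four buckets BK1 to BK4 listed on the slide.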

  32. 3. The Multi-key Hashing (MKH) Method ■ Example

  33. $g_1(r) = 0$ if $18 \le r \le 19$, $= 1$ if $20 \le r \le 21$; $m_1 = 2$
      $g_2(r) = 0$ if $0 \le r \le 70$, $= 1$ if $71 \le r \le 100$; $m_2 = 2$
      $g_3(r) = 0$ if $\mathrm{A} \le r \le \mathrm{B}$, $= 1$ if $\mathrm{C} \le r \le \mathrm{D}$; $m_3 = 2$
      • Steps:
        (1) Choose a hash function $g_i : D_i \to \{0, 1, 2, \ldots, m_i - 1\}$.
        (2) Associate with each N-tuple $[s_1, s_2, \ldots, s_N]$ a bucket, $0 \le s_i \le m_i - 1$.
        (3) Assign the record $R = (a_1, a_2, \ldots, a_N)$ to bucket $[g_1(a_1), g_2(a_2), \ldots, g_N(a_N)]$.
      • Disadvantage
        • "Overflow" problem
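
As a minimal sketch of the three steps, the Python below encodes the hash functions from this slide and derives a bucket address for a record, plus the bucket addresses a partial-match query has to examine. The sample record and query are hypothetical, as are the helper names `bucket_address` and `buckets_to_examine`; `None` is assumed to mark an unspecified attribute.

```python
from itertools import product


def g1(r):
    """First attribute (integers 18..21), m1 = 2."""
    if 18 <= r <= 19:
        return 0
    if 20 <= r <= 21:
        return 1
    raise ValueError(f"{r!r} is outside D1")


def g2(r):
    """Second attribute (0..100), m2 = 2."""
    if 0 <= r <= 70:
        return 0
    if 71 <= r <= 100:
        return 1
    raise ValueError(f"{r!r} is outside D2")


def g3(r):
    """Third attribute (letters A..D), m3 = 2."""
    if "A" <= r <= "B":
        return 0
    if "C" <= r <= "D":
        return 1
    raise ValueError(f"{r!r} is outside D3")


HASHES = (g1, g2, g3)
SIZES = (2, 2, 2)  # m1, m2, m3


def bucket_address(record):
    """Step (3): hash each attribute value to get the bucket address."""
    return tuple(g(a) for g, a in zip(HASHES, record))


def buckets_to_examine(query):
    """Addresses a partial-match query must visit; None means unspecified."""
    axes = [range(m) if a is None else [g(a)] for g, a, m in zip(HASHES, query, SIZES)]
    return list(product(*axes))


print(bucket_address((19, 85, "C")))        # (0, 1, 1)
print(buckets_to_examine((20, None, "A")))  # [(1, 0, 0), (1, 1, 0)]
```

Nothing in this scheme limits how many records hash to the same address, which is where the overflow problem comes from.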

  34. 4. The Multi-dimensional Directory (MDD) Method
      ■ Example
      [Figure: records plotted as ╳ marks; horizontal axis D1 (skill-code), c1 to c10; vertical axis D2 (salary), 1000 to 2400; m1 = 4, m2 = 3]

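  35. • Steps:
        (1) Divide $D_1$ into $D_{11}, D_{12}, \ldots, D_{1m_1}$ s.t. each subspace $D_{1i_1} \times D_2 \times \cdots \times D_N$ (a 1st-degree cube) contains approximately the same number of records.
        (2) Divide each $D_{1i_1} \times D_2 \times \cdots \times D_N$ along $D_2$ into $m_2$ subspaces (2nd-degree cubes) s.t. each subspace contains approximately the same number of records. Continue in the same way through $D_N$ until the Nth-degree cubes are generated.
        (3) Assign each Nth-degree cube to a bucket.
        (4) Generate the corresponding directory.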
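
The partitioning steps can be sketched as repeated equal-count splitting, one attribute at a time. The Python below is an illustration under assumptions: the records are made-up (skill-code, salary) pairs, the names `split_equal` and `mdd_buckets` are hypothetical, ties are cut wherever the sorted order happens to fall, and the directory of step (4) is left out.

```python
def split_equal(items, parts):
    """Split a sorted list into `parts` consecutive chunks of near-equal size."""
    n = len(items)
    return [items[i * n // parts:(i + 1) * n // parts] for i in range(parts)]


def mdd_buckets(records, m):
    """Split on attribute 0 into m[0] cubes, each of those on attribute 1, and so on."""
    cubes = [list(records)]
    for axis, parts in enumerate(m):
        next_cubes = []
        for cube in cubes:
            cube.sort(key=lambda rec: rec[axis])
            next_cubes.extend(split_equal(cube, parts))
        cubes = next_cubes
    return cubes


# Hypothetical data: 24 (skill-code, salary) records.
records = [(i % 10 + 1, 1000 + 200 * (i * 7 % 8)) for i in range(24)]
cubes = mdd_buckets(records, m=(4, 3))  # m1 = 4, m2 = 3

print(len(cubes))               # 12
print({len(c) for c in cubes})  # {2}
```

Because the cut points follow the data rather than fixed hash ranges, every bucket ends up with about the same number of records.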

  36. 5. The Multi-key Sorting (MKS) Method
      • Sorting
        • Single-key sorting: $a_1, a_2, \ldots, a_L$ are sorted iff $a_1 \le a_2 \le \cdots \le a_L$ or $a_1 \ge a_2 \ge \cdots \ge a_L$ (or iff $|a_1 - a_2| + |a_2 - a_3| + \cdots + |a_{L-1} - a_L|$ is minimal).
        • Multi-key sorting: a set of records $R_1, R_2, \ldots, R_L$ is sorted iff $d(R_1, R_2) + d(R_2, R_3) + \cdots + d(R_{L-1}, R_L)$ is minimal.

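  37. Hamming distance
      • For records $R = (a_1, a_2, \ldots, a_N)$ and $R' = (a'_1, a'_2, \ldots, a'_N)$, the Hamming distance between $R$ and $R'$ is $d(R, R') = \sum_{i=1}^{N} \delta(a_i, a'_i)$, where $\delta(a_i, a'_i) = 0$ if $a_i = a'_i$ and $\delta(a_i, a'_i) = 1$ if $a_i \ne a'_i$.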
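
To make the definition of a multi-key-sorted sequence concrete, the following Python sketch computes the Hamming distance between records and the total distance along an ordering, then finds a minimal ordering by trying every permutation. The exhaustive search is only an illustration of the definition for a handful of records; it is not the MKS procedure itself, and the sample records and function names are hypothetical.

```python
from itertools import permutations


def hamming(r, s):
    """Number of attribute positions in which two records differ."""
    return sum(a != b for a, b in zip(r, s))


def path_cost(ordering):
    """Sum of Hamming distances between consecutive records."""
    return sum(hamming(r, s) for r, s in zip(ordering, ordering[1:]))


def brute_force_sort(records):
    """Ordering with minimal path cost; factorial time, tiny inputs only."""
    return min(permutations(records), key=path_cost)


records = [("a", 1), ("b", 2), ("a", 2), ("b", 1)]
best = brute_force_sort(records)

print(path_cost(records))  # 5
print(path_cost(best))     # 3
```

Since the number of orderings grows factorially with the number of records, an exhaustive search like this is impractical for real files.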

  38. Example 1

  39. Example 2
      Advantage: practical
      Disadvantage: cannot obtain the optimal solution

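  40. 6. Optimal Cartesian Product Files
      • Problem: minimize $\sum_{\emptyset \ne S \subsetneq \{1, \ldots, N\}} \prod_{i \in S} z_i$ (for $N = 3$: $z_1 + z_2 + z_3 + z_1 z_2 + z_1 z_3 + z_2 z_3$), where $z_i$ is the subdivision size of the $i$th domain.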

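  41. Example
      • Consider the case $d_1 = 8$, $d_2 = 4$, $d_3 = 9$ and $NB = 6$, so each bucket holds $8 \cdot 4 \cdot 9 / 6 = 48$ records and $z_1 z_2 z_3 = 48$. There are two feasible solutions:
        (1) $z_1 = 8$, $z_2 = 2$, $z_3 = 3$
        (2) $z_1 = 4$, $z_2 = 4$, $z_3 = 3$
        By (1), $z_1 + z_2 + z_3 + z_1 z_2 + z_1 z_3 + z_2 z_3 = 59$.
        By (2), $z_1 + z_2 + z_3 + z_1 z_2 + z_1 z_3 + z_2 z_3 = 51$.
        Therefore, we conclude that (2) is the optimum solution.
      • In this case, $m_1 = 8/4 = 2$, $m_2 = 4/4 = 1$, $m_3 = 9/3 = 3$. Divide $D_1$ into $D_{11}$ and $D_{12}$, keep $D_2$ as one subset, and divide $D_3$ into $D_{31}$, $D_{32}$ and $D_{33}$.
        BK1 : $D_{11} \times D_2 \times D_{31}$
        BK2 : $D_{11} \times D_2 \times D_{32}$
        BK3 : $D_{11} \times D_2 \times D_{33}$
        BK4 : $D_{12} \times D_2 \times D_{31}$
        BK5 : $D_{12} \times D_2 \times D_{32}$
        BK6 : $D_{12} \times D_2 \times D_{33}$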
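
The arithmetic of this example can be reproduced by exhaustive search. The Python sketch below assumes that a feasible solution is any tuple of subdivision sizes in which each z_i divides d_i and the product of the z_i equals the bucket size d1·d2·d3 / NB; the function names are hypothetical, and the cost is the sum of products over every nonempty proper subset of the z_i, which for three domains is the expression evaluated above.

```python
from itertools import combinations, product
from math import prod


def divisors(n):
    return [k for k in range(1, n + 1) if n % k == 0]


def cost(z):
    """Sum of products of z_i over every nonempty proper subset of indices."""
    n = len(z)
    return sum(
        prod(z[i] for i in subset)
        for size in range(1, n)
        for subset in combinations(range(n), size)
    )


def feasible_sizes(d, nb):
    """Subdivision sizes with z_i dividing d_i and prod(z) equal to the bucket size."""
    bucket_size = prod(d) // nb
    return [z for z in product(*(divisors(di) for di in d)) if prod(z) == bucket_size]


d, nb = (8, 4, 9), 6
candidates = feasible_sizes(d, nb)
for z in candidates:
    print(z, cost(z))  # (4, 4, 3) 51, then (8, 2, 3) 59

best = min(candidates, key=cost)
print(best, [di // zi for di, zi in zip(d, best)])  # (4, 4, 3) [2, 1, 3]
```

The search confirms that only two size tuples are feasible here and that (4, 4, 3) has the smaller cost.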

  42. Definitions
      1. A 2-tuple $(a_1, a_2)$ is called a minimal 2-tuple if, for every other 2-tuple $(a'_1, a'_2)$ where $a_1 a_2 = a'_1 a'_2$, $a_1 + a_2 < a'_1 + a'_2$.
      2. An N-tuple $(a_1, a_2, \ldots, a_N)$ is called a minimal N-tuple of $C$ if $a_1 a_2 \cdots a_N = C$ and, for $1 \le i, j \le N$, $(a_i, a_j)$ is a minimal 2-tuple.

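  43. Theorem
      • A CPF is an optimal CPF if the records of each bucket are of the form $D_{1i_1} \times D_{2i_2} \times \cdots \times D_{Ni_N}$, where the size of $D_{ji_j}$ is $z_j$ and the $z_i$'s satisfy:
        (1) $z_1 z_2 \cdots z_N = C$;
        (2) $d_i / z_i$ is an integer;
        (3) $(z_1, z_2, \ldots, z_N)$ is the only minimal N-tuple of $BZ$.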
