Hash File

Presentation Transcript


  1. Hash File Considerations

  2. Hashing - Hash File Considerations • Statistical Considerations • Record Distribution is important • Ideal - one record per location • Load Factor - How full the file is • Load Factor = r / (b * m) • r - number of records stored • b - bucket size • m - number of addresses
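
As a small worked example of the load factor on slide 2, the sketch below reads the formula as records stored divided by total record slots (bucket size times number of addresses). The function name and the sample numbers are illustrative assumptions, not taken from the slides.

```python
def load_factor(r: int, b: int, m: int) -> float:
    """Fraction of the file's record slots in use: r / (b * m).

    r: number of records stored
    b: bucket size (records per address)
    m: number of addresses
    """
    if b <= 0 or m <= 0:
        raise ValueError("bucket size and number of addresses must be positive")
    return r / (b * m)


# Hypothetical file: 200 addresses, 5 records per bucket, 750 records stored.
example = load_factor(r=750, b=5, m=200)
print(f"load factor = {example:.2f}")
```

A value near 1.0 means the file is almost full, so collisions and overflow become more likely.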

  3. Hashing - Statistical Considerations • Graphing Record Distribution • Frequency Distribution Graph • Y axis - records per address • X axis - RRP • Alternate Frequency Distribution Graph • Y axis - Number of addresses with x records • X axis - x records assigned • Example - (x DIV 5) MOD 4 • Data: 22, 1, 14, 56, 25, 13, 43, 62, 11
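
The two frequency distributions on slide 3 can be tabulated directly from the example hash and data. This is a minimal sketch; the variable names are assumptions, and integer division (`//`) stands in for DIV.

```python
from collections import Counter

NUM_ADDRESSES = 4  # the hash is MOD 4, so addresses are 0..3


def hash_address(x: int) -> int:
    """The slide's example hash: (x DIV 5) MOD 4."""
    return (x // 5) % NUM_ADDRESSES


keys = [22, 1, 14, 56, 25, 13, 43, 62, 11]

# First graph: records per address (one count per address).
records_per_address = [0] * NUM_ADDRESSES
for key in keys:
    records_per_address[hash_address(key)] += 1

# Alternate graph: how many addresses received exactly x records.
addresses_with_x_records = Counter(records_per_address)

for address, count in enumerate(records_per_address):
    print(f"address {address}: {count} record(s)")
for x in sorted(addresses_with_x_records):
    print(f"{addresses_with_x_records[x]} address(es) hold {x} record(s)")
```

For this data the distribution is uneven: address 0 receives four of the nine keys, which is the kind of clustering the graphs are meant to expose.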

  4. Hashing - Overall Guidelines • If possible, use uniformly distributed keys • Use a carefully designed hashing scheme • Do statistical studies if possible • Monitor performance • Should be computationally efficient • Tailor bucket size and load factor to the particular I/O device

  5. Hashing - Advantages • Flexibility • Adaptable to a variety of situations • Useful both for disk and memory based retrieval • Efficiency of record access • Can achieve O(1) access times

  6. Hashing - Disadvantages • No ordered record access by PK • Data (key set) dependency • Must be specifically tailored for each key distribution and form • If characteristics change, hashing scheme may need to change • Fixed upper limit on file size • Size determined at creation time • Must "rehash" to larger file if expansion needed • May need to redesign hash algorithm as well
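
The "rehash to a larger file" step on slide 6 can be sketched with an in-memory stand-in for the file. This assumes a simple key MOD m hash and unbounded buckets; `build_file` and `rehash` are hypothetical helper names, and a real file would also handle bucket capacity and overflow.

```python
def build_file(keys, m):
    """Distribute integer keys over m addresses using key MOD m."""
    buckets = [[] for _ in range(m)]
    for key in keys:
        buckets[key % m].append(key)
    return buckets


def rehash(buckets, new_m):
    """Read every record out of the old file and hash it into a new, larger one."""
    all_keys = [key for bucket in buckets for key in bucket]
    return build_file(all_keys, new_m)


keys = [22, 1, 14, 56, 25, 13, 43, 62, 11]
small_file = build_file(keys, 4)      # size fixed at creation time
large_file = rehash(small_file, 11)   # expansion requires touching every record
```

Every record is read and rewritten, which is why expansion of a statically sized hash file is expensive.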

  7. Hashing Considerations • Static vs. Dynamic Files • Static files • fixed key data • entire domain of keys known a priori (key set) • By experimentation, may be able to find a collision-free solution • Examples • Assembler OP code table • FAX group three compression table
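
For a static key set, the "experimentation" on slide 7 can be as simple as trying table sizes until no two keys share an address. The sketch below assumes a key MOD m hash and borrows the integer data from slide 3 as a stand-in key set; both choices are illustrative assumptions, and `smallest_collision_free_size` is a hypothetical helper name.

```python
def is_collision_free(keys, m):
    """True if key MOD m sends every key to a distinct address."""
    return len({key % m for key in keys}) == len(keys)


def smallest_collision_free_size(keys, limit=10_000):
    """Try table sizes from len(keys) upward; return the first with no collisions."""
    for m in range(len(keys), limit + 1):
        if is_collision_free(keys, m):
            return m
    return None  # nothing found within the search limit


static_keys = [22, 1, 14, 56, 25, 13, 43, 62, 11]
table_size = smallest_collision_free_size(static_keys)
print("collision-free table size:", table_size)
```

The search trades space for speed: the table it finds may be much larger than the key set, which is acceptable for small fixed tables such as an opcode table.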

  8. Hashing Considerations • Static vs. Dynamic Files • Dynamic files • Key set not known in advance • Patterns/samples of keys may be known • Collision-free solution not generally possible • Experimentation may be used to find a good hash algorithm and configuration • Hash algorithm technique • File size • Bucket size • Overflow strategy

  9. Hashing Considerations • Static vs. Dynamic Hashing • Static Hashing • file size fixed over life of file • must rebuild to make larger • Dynamic Hashing • file may expand and contract over time • called extendible hashing

  10. Hashing Considerations • Distribution of keys • May know some information about key distribution in advance • Complete set • patterns are predictable • completely unpredictable

  11. Hashing Considerations • Files versus arrays • Hashing suitable for both primary and secondary retrieval purposes. • Primary memory based systems • I/O time not a consideration • buckets not really helpful • Other factors gain in importance • Hash algorithm complexity • overflow technique

  12. Hashing Considerations • Hash Algorithms - general forms • Division • Division-remainder scheme is an example. • Choice of divisor is important • Should be prime relative to the file size. • Should not be a power of two. • Bad choices result in simple truncation, thus part of the key is simply discarded.
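
The truncation problem on slide 12 is easy to see in a small experiment. With a power-of-two divisor the remainder is just the low-order bits of the key, so keys that differ only in their high bits all collide. The divisors 16 and 17 and the sample keys are illustrative assumptions.

```python
def division_hash(key: int, divisor: int) -> int:
    """Division-remainder hashing: home address = key MOD divisor."""
    return key % divisor


# Keys that share the same low four bits (all multiples of 16).
keys = [16, 32, 48, 64, 80]

power_of_two_addresses = [division_hash(k, 16) for k in keys]  # divisor 2**4
prime_addresses = [division_hash(k, 17) for k in keys]         # prime divisor

print("MOD 16:", power_of_two_addresses)
print("MOD 17:", prime_addresses)
```

With divisor 16 every key lands on the same address because `key % 16` equals `key & 0xF`; the prime divisor spreads the same keys over distinct addresses.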

  13. Hashing Considerations • Hash Algorithms - general forms • Multiplication • Multiplicative techniques tend to use ALL of the information in the key (no truncation) • Mid-square technique is an example. • Compression, extraction, folding • Useful for large keys
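
Two of the techniques named on slide 13 can be sketched in a few lines. The versions below work on decimal digits, which is one common textbook formulation; the digit counts, chunk size, and function names are assumptions made for illustration.

```python
def mid_square(key: int, digits: int) -> int:
    """Square the key and take `digits` decimal digits from the middle of the result."""
    squared = str(key * key).zfill(digits)  # pad so short squares still have enough digits
    start = (len(squared) - digits) // 2
    return int(squared[start:start + digits])


def fold(key: int, chunk: int, m: int) -> int:
    """Split the key's decimal digits into chunks, add the chunks, and reduce MOD m."""
    text = str(key)
    pieces = [int(text[i:i + chunk]) for i in range(0, len(text), chunk)]
    return sum(pieces) % m


print(mid_square(1234, digits=3))            # 1234 squared is 1522756
print(fold(123456789, chunk=3, m=1000))      # 123 + 456 + 789 = 1368
```

Both functions let every digit of the key influence the address, which is the property the slide highlights for large keys.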

  14. Hashing Considerations • Hash Algorithms - general forms • Double Hashing • Rather than progressive overflow on collision, use a secondary hash function to generate a step length for the next probe • Helps reduce secondary clustering of linear probing with step size greater than one. • Non-linear, or random probing
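
A minimal sketch of the double hashing idea on slide 14, using an in-memory table with one record per address. The table size of 11 and the two hash functions are assumptions chosen so that the step length is never zero and, because the size is prime, every probe sequence visits all addresses.

```python
TABLE_SIZE = 11  # prime, so any step in 1..10 cycles through every address


def home_address(key: int) -> int:
    """Primary hash: where the key would ideally live."""
    return key % TABLE_SIZE


def step_length(key: int) -> int:
    """Secondary hash: key-dependent step, always in 1..TABLE_SIZE-1."""
    return 1 + (key % (TABLE_SIZE - 1))


def insert(table, key):
    """Place key at the first free address along its probe sequence; return that address."""
    home, step = home_address(key), step_length(key)
    for i in range(TABLE_SIZE):
        slot = (home + i * step) % TABLE_SIZE
        if table[slot] is None or table[slot] == key:
            table[slot] = key
            return slot
    raise RuntimeError("table is full")


def find(table, key):
    """Follow the same probe sequence; return the address or None if the key is absent."""
    home, step = home_address(key), step_length(key)
    for i in range(TABLE_SIZE):
        slot = (home + i * step) % TABLE_SIZE
        if table[slot] is None:
            return None
        if table[slot] == key:
            return slot
    return None


table = [None] * TABLE_SIZE
slots = [insert(table, key) for key in (22, 33, 44)]  # all three share home address 0
print(slots)
```

Keys with the same home address follow different probe sequences because their step lengths differ, so they do not pile up along one shared overflow chain. This sketch does not handle deletion, which would need tombstone markers.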

  15. Hashing Considerations • Hash Algorithms - general forms • Multi-Attribute hashing • Base the calculation for home address on more than the primary key attribute. • Useful if the primary key exhibits certain bad hashing attributes (clustering, etc.) • Example - use part number (PK) and distributor fields. • Extendible Hashing • See text
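
One way to picture the multi-attribute example on slide 15 is to combine the part number and distributor into a single byte string before hashing. The use of `zlib.crc32` as the mixing function, the separator, and the field values are all assumptions for illustration; the slides do not specify how the attributes are combined.

```python
import zlib


def multi_attribute_address(part_number: int, distributor: str, m: int) -> int:
    """Home address computed from both the primary key and a second attribute.

    CRC-32 is used here only as a convenient deterministic mixing function.
    """
    combined = f"{part_number}|{distributor}".encode("utf-8")
    return zlib.crc32(combined) % m


NUM_ADDRESSES = 101  # hypothetical file size

# Hypothetical records whose part numbers alone would cluster (consecutive values).
records = [(1000, "Acme"), (1001, "Acme"), (1000, "Globex"), (1001, "Globex")]
addresses = [multi_attribute_address(pn, dist, NUM_ADDRESSES) for pn, dist in records]
print(addresses)
```

A lookup must supply the same attributes that were used to compute the home address, so this approach suits files where the extra field is always known at retrieval time.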
