
Dr. Thomas Hicks Computer Science Department Trinity University



Presentation Transcript


  1. Hashing. Dr. Thomas Hicks, Computer Science Department, Trinity University

  2. Address Calculator

  3. Hashing attempts to accomplish Insertion, Deletion, and Searching in Constant Time. [Diagram: an Address Calculator maps an unsigned int key to a position between 0 and N in the table.] Can Be Done!

  4. Hashing Requires You To Make Two Important Decisions 1] What Hash Function To Use

  5. Hashing Is A Two Decision Application (1) The First Decision Is What Hash Function To Use: Hash Function - A function that converts an item into an integer suitable to index an array or a direct access file where the item is to be stored. We Are Going To Be Using Social Security Numbers In Our Hashing: 275-75-7575 = 275,757,575. The Hash Function Could Be Modulus: 275,757,575 MOD 20 + 1 = 16. [Diagram: a table of cells numbered 1 to 20, with 275,757,575 stored in its hashed cell.]
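The modulus hash on this slide can be sketched in a few lines of Python. This is a minimal illustration, not code from the presentation: the function names `ssn_to_int` and `modulus_hash` are hypothetical, and the table size of 20 is taken from the slide's example.

```python
def ssn_to_int(ssn: str) -> int:
    """Strip the dashes from an SSN such as '275-75-7575' and return it as an integer."""
    return int(ssn.replace("-", ""))


def modulus_hash(ssn: str, table_size: int = 20) -> int:
    """Map an SSN to a 1-based table position in the range 1..table_size."""
    return ssn_to_int(ssn) % table_size + 1


if __name__ == "__main__":
    # 275,757,575 MOD 20 + 1, the example worked on the slide
    print(modulus_hash("275-75-7575"))
```

The `+ 1` shifts the remainder (0 to 19) into the 1-based cell numbering the slides use.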

  6. HASH FUNCTION TECHNIQUE: MODULUS

  7. Social Security Numbers: 275-75-7575 = 275,757,575. How Many Of You Think We Could Organize Our Data In Such A Way That We Could Find Any SSN In 1 Look? [Diagram: the Address Calculator computes SSN % 999,999,999 and maps each SSN to a cell in a table numbered 0 to 999,999,999.]

  8. Hashing Is For Large Populations Of Data

  9. Hashing Is Designed For Large Collections Of Data! Would The Student Population Of UT Constitute A Large Collection Of Data? Yes. 50,000+ Is Generally Considered A Large Population Of Data.

  10. Hash Function = SSN % 999,999,999. How Many Of You Think This Would Be A Good Hash Function For UT? [Diagram: the Address Calculator maps each SSN to a cell in a table numbered 0 to 999,999,999.]

  11. Hashing Can Be One To One

  12. Hash Function = SSN % 999,999,999. This Would Not Be A Good Hash Function For UT.
     Loading Factor = 50,000 / 1,000,000,000 = 5 / 100,000 = 1 / 20,000 = 0.005%
     If Record Size = 2,000 Bytes, The Data Itself Would Require 50,000 x 2,000 = 100,000,000 Bytes (1/10 Gig) Of Hard Drive Space, But The Full Table Would Require 1,000,000,000 x 2,000 Bytes = 2,000 GB.
     How Would You Feel About Using 2,000 GB Of Space For Your Data?
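The space argument on this slide comes down to two small calculations, sketched below. The helper names are hypothetical, and the 2,000-byte record size is an assumption, chosen because it reproduces the 1/10 GB and 2,000 GB figures quoted on the slide.

```python
def loading_factor(items: int, cells: int) -> float:
    """Fraction of the table's cells that actually hold a record."""
    return items / cells


def table_bytes(cells: int, record_size: int) -> int:
    """Disk space needed when every cell reserves room for one record."""
    return cells * record_size


STUDENTS = 50_000          # approximate UT student population from the slide
CELLS = 1_000_000_000      # one cell per possible nine-digit SSN
RECORD_SIZE = 2_000        # ASSUMED bytes per record (see lead-in)

if __name__ == "__main__":
    print(f"loading factor: {loading_factor(STUDENTS, CELLS):.3%}")
    print(f"records alone:  {table_bytes(STUDENTS, RECORD_SIZE) / 1e9} GB")
    print(f"full table:     {table_bytes(CELLS, RECORD_SIZE) / 1e9} GB")
```

A one-to-one table finds any SSN in a single look, but almost all of the reserved space is never used.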

  13. Acceptable Hashing: At Least 80% Loading Factor

  14. Two Requirements Constitute Acceptable Hashing: 1] At Least 80% Loading Factor

  15. Loading The Hash Table Using Linear Probing As A Strategy For Handling Collisions (Generally A File But Could Be An Array)

  16. An Example Of A Perfect Hash Function: Could Use Modulus (%) To Distribute The Data

  17. Suppose Our Hash Function = SSN % 5 + 1. Suppose We Have 4 Social Security Numbers:
       454-13-3881 = 454,133,881 = 454133881
       460-27-3802 = 460,273,802 = 460273802
       450-27-3504 = 450,273,504 = 450273504
       456-66-2055 = 456,662,055 = 456662055
     4 Items In 5 Cells = 80% Loading Factor. 454,133,881 % 5 + 1 = ___?___ = 2. Take 30 Seconds & Fill In Some More Of The Data.
     Cell   SSN
       5    (empty)
       4    (empty)
       3    (empty)
       2    454133881
       1    (empty)

  18. All Will Not Always Work Out So Nicely! Hash Function = SSN % 5 + 1. Suppose We Have 4 Social Security Numbers:
       =MOD(454133881,5)+1 = 2
       =MOD(460273802,5)+1 = 3
       =MOD(450273504,5)+1 = 5
       =MOD(456662055,5)+1 = 1
     Cell   SSN          No. Searches To Find
       5    450273504    1
       4    (empty)
       3    460273802    1
       2    454133881    1
       1    456662055    1
     Average Search = Total Searches / # Items = 4 / 4 = 1

  19. "Average Search" Is Also Called "An Access Quotient" In Hashing: Average Search = Total Searches / # Items

  20. All Will Not Always Work Out So Nicely! Hash Function = SSN % 5 + 1. Suppose We Have These 4 Social Security Numbers:
       =MOD(454133881,5)+1 = 2
       =MOD(456662053,5)+1 = 4
       =MOD(450273806,5)+1 = 2
       =MOD(460273802,5)+1 = 3
     "Clash" - "Collision" - The result when two or more items in a Hash Table hash out to the same position.
     Cell   SSN
       5    (empty)
       4    456662053
       3    (empty)
       2    454133881
       1    (empty)
     450273806 ? (It Also Hashes To Cell 2, Which Is Already Occupied)
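One way to spot a clash before loading the table is to group the keys by their hash value. The sketch below does that for the four SSNs on this slide; `hash_ssn` and `find_collisions` are hypothetical helper names, not part of the presentation.

```python
from collections import defaultdict


def hash_ssn(ssn: int, table_size: int = 5) -> int:
    """The slide's hash function: SSN % table_size + 1 (cells numbered from 1)."""
    return ssn % table_size + 1


def find_collisions(keys, table_size: int = 5) -> dict:
    """Return {cell: [keys...]} for every cell that more than one key hashes to."""
    buckets = defaultdict(list)
    for key in keys:
        buckets[hash_ssn(key, table_size)].append(key)
    return {cell: ks for cell, ks in buckets.items() if len(ks) > 1}


if __name__ == "__main__":
    keys = [454133881, 456662053, 450273806, 460273802]
    print(find_collisions(keys))
```

For this key set the only shared cell is 2, which is the collision the slide highlights.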

  21. Hashing Is A Two Decision Application: (1) The First Decision Is What Hash Function To Use; (2) The Second Decision Is The Strategy For Handling Collisions. Example Of A Strategy For Handling Collisions: "Linear Probing" - Place The Item In The Next Available Cell (Go Up - Wrap If Necessary)

  22. All Will Not Always Work Out So Nicely! Hash Function = SSN % 5 + 1. Suppose We Have 4 Social Security Numbers:
       =MOD(454133881,5)+1 = 2
       =MOD(456662053,5)+1 = 4
       =MOD(450273806,5)+1 = 2
       =MOD(460273802,5)+1 = 3
     Linear Probing:
     Cell   SSN          No. Searches To Find
       5    (empty)
       4    456662053    1
       3    450273806    2
       2    454133881    1
       1    (empty)

  23. All Will Not Always Work Out So Nicely! Hash Function = SSN % 5 + 1. Suppose We Have 4 Social Security Numbers:
       =MOD(454133881,5)+1 = 2
       =MOD(456662053,5)+1 = 4
       =MOD(450273806,5)+1 = 2
       =MOD(460273802,5)+1 = 3
     Linear Probing:
     Cell   SSN          No. Searches To Find
       5    460273802    3
       4    456662053    1
       3    450273806    2
       2    454133881    1
       1    (empty)
     Average Search = Total Searches / # Items = 7 / 4 = 1.75
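The loading process on this slide can be reproduced with a short simulation. This is a sketch under stated assumptions: the table is a Python list with index 0 left unused so that cells run from 1 to 5 as on the slides, the function names are hypothetical, and the number of searches needed to find a key is taken to equal the number of cells examined when it was inserted (true as long as nothing is deleted).

```python
def hash_ssn(ssn: int, table_size: int) -> int:
    """The slide's hash function: SSN % table_size + 1."""
    return ssn % table_size + 1


def insert_linear_probing(table: list, key: int) -> int:
    """Place key in the next available cell, going up and wrapping if necessary.

    table[0] is unused; cells are table[1]..table[size].
    Returns the number of cells examined, including the one finally used.
    """
    size = len(table) - 1
    pos = hash_ssn(key, size)
    for probes in range(1, size + 1):
        if table[pos] is None:
            table[pos] = key
            return probes
        pos = pos % size + 1  # move up one cell; cell `size` wraps to cell 1
    raise OverflowError("hash table is full")


def load_table(keys, size: int):
    """Insert all keys and return (table, average search)."""
    table = [None] * (size + 1)
    probes = [insert_linear_probing(table, key) for key in keys]
    return table, sum(probes) / len(probes)


if __name__ == "__main__":
    keys = [454133881, 456662053, 450273806, 460273802]
    table, average = load_table(keys, 5)
    for cell in range(5, 0, -1):
        print(cell, table[cell])
    print("Average Search =", average)
```

With the keys inserted in the order listed, the third key needs 2 looks and the fourth needs 3, which gives the 7 / 4 average shown on the slide.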

  24. Acceptable Hashing: At Least 80% Loading Factor & Access Quotient Of 1.2 Or Better

  25. Two Requirements Constitute Acceptable Hashing: 1] At Least 80% Loading Factor 2] No More Than 1.2 Access Ratio (Average Search)
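The two requirements can be written as a single check. The sketch below is illustrative only: `is_acceptable` is a hypothetical name, and the thresholds are the 80% and 1.2 values stated on the slide.

```python
def is_acceptable(items: int, cells: int, total_searches: int,
                  min_loading: float = 0.80, max_access: float = 1.2) -> bool:
    """Check both requirements: loading factor >= 80% and access quotient <= 1.2."""
    loading = items / cells                 # fraction of cells in use
    access_quotient = total_searches / items  # average search
    return loading >= min_loading and access_quotient <= max_access


if __name__ == "__main__":
    # 4 items in 5 cells, 4 total searches (the collision-free example)
    print(is_acceptable(4, 5, 4))
    # 4 items in 5 cells, 7 total searches (the linear probing example)
    print(is_acceptable(4, 5, 7))
```

Both examples meet the loading factor requirement; only the collision-free one also meets the access quotient requirement, since 7 / 4 = 1.75 exceeds 1.2.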

  26. What Is A Hash Function? A Hash function is a function that converts an item into an integer suitable to index an array or a direct access file where the item is to be stored.

  27. What Are The Two Requirements For Acceptable Hashing? 1] At Least 80% Loading Factor & 2] Access Quotient Of 1.2 Or Better

  28. Hashing Requires You To Make Two Important Decisions 1] What Hash Function To Use 2] What Strategy Do I Use To Handle Collisions/Clashes

  29. How Good Is The Hash Function?

  30. What Did You Think Of The Hash Function: SSN % 5 + 1? Suppose We Have 4 Social Security Numbers:
       =MOD(454133881,5)+1 = 2
       =MOD(460273802,5)+1 = 3
       =MOD(450273504,5)+1 = 5
       =MOD(456662055,5)+1 = 1
     Cell   SSN          No. Searches To Find
       5    450273504    1
       4    (empty)
       3    460273802    1
       2    454133881    1
       1    456662055    1
     Average Search = 4 / 4 = 1. Absolutely Awesome!

  31. What Did You Think Of The Hash Function: SSN % 5 + 1? Suppose We Have 4 Social Security Numbers:
       =MOD(454133881,5)+1 = 2
       =MOD(456662053,5)+1 = 4
       =MOD(450273806,5)+1 = 2
       =MOD(460273802,5)+1 = 3
     Cell   SSN          No. Searches To Find
       5    460273802    3
       4    456662053    1
       3    450273806    2
       2    454133881    1
       1    (empty)
     Average Search = 7 / 4 = 1.75. Really Good: Only One Collision.

  32. How Good Is The Linear Probing Strategy For Handling The Collisions?

  33. What Did You Think Of The Strategy Selected To Handle Collisions: Linear Probing? Suppose We Have 4 Social Security Numbers:
       =MOD(454133881,5)+1 = 2
       =MOD(460273802,5)+1 = 3
       =MOD(450273504,5)+1 = 5
       =MOD(456662055,5)+1 = 1
     Cell   SSN          No. Searches To Find
       5    450273504    1
       4    (empty)
       3    460273802    1
       2    454133881    1
       1    456662055    1
     Average Search = 4 / 4 = 1. The Hash Function Was So Good It Did Not Matter!

  34. What Did You Think Of The Strategy Selected To Handle Collisions: Linear Probing? Suppose We Have 4 Social Security Numbers:
       =MOD(454133881,5)+1 = 2
       =MOD(456662053,5)+1 = 4
       =MOD(450273806,5)+1 = 2
       =MOD(460273802,5)+1 = 3
     Cell   SSN          No. Searches To Find
       5    460273802    3
       4    456662053    1
       3    450273806    2
       2    454133881    1
       1    (empty)
     Average Search = 7 / 4 = 1.75. Perhaps We Can Find A Better One?

  35. Suppose The Hash Function Did Not Distribute The Data Well! After all, the purpose of a good hash function is to randomize something that generally is not random (Part Name, Part No, etc.)

  36. Consider The Following Set Of Social Security Numbers. What Is The Least Random Part Of This Collection Of Numbers? First Digit = 4; First Two Digits Often = 45 or 46. It Is Often Easier To Find A Successful Hash Function If You Can Chop Off (TRUNCATE) The Least Random Portion(s).

  37. HASH FUNCTION TECHNIQUES: MODULUS, TRUNCATION

  38. TRUNCATE The First Two Digits - Then Mod + 1. Key: The Key Can Be A Combination Of More Than One Data Field In The Record (e.g. Maybe Combine The Last Name & The Phone Number). TRUNCATION: Chop Out/Remove The Non-Random Portion Of The Key Combination.
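"Truncate the first two digits, then mod + 1" might look like the sketch below. The function name `truncate_hash` and the `drop` parameter are hypothetical; the SSN and the table size of 5 are borrowed from the earlier slides for illustration.

```python
def truncate_hash(ssn: int, table_size: int, drop: int = 2) -> int:
    """Chop off the first `drop` digits of a nine-digit SSN, then apply mod + 1."""
    digits = f"{ssn:09d}"          # keep leading zeros so every SSN has 9 digits
    remaining = int(digits[drop:])  # e.g. 454133881 -> 4133881
    return remaining % table_size + 1


if __name__ == "__main__":
    print(truncate_hash(454133881, 5))
```

One caveat worth noting: when the table size divides 10,000,000 evenly (as 5 does), the dropped leading digits contribute a multiple of the table size, so truncation followed by modulus gives the same cell as modulus alone.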

  39. Suppose The Hash Function Did Not Distribute The Data Well! After all, the purpose of a good hash function is to randomize something that often is not random (Part Name, Part No, etc.)

  40. Consider The Following Set Of Social Security Numbers:
       464 + 133 + 881              456 + 662 + 055
       464 * 133 * 881              456 * 662 * 055
       464 * 133 - 881              456 * 662 - 055
       ABS(464 * (133 - 881))       ABS(456 * (662 + 055))
     Folding: Partitioning The Sequence Of Digits & Performing Mathematical Operations On The Subcomponents.
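Folding can be sketched as two steps: split the nine digits into three-digit parts, then combine the parts arithmetically. The helper names below are hypothetical, and only the sum and product variants from the slide are shown.

```python
def fold_parts(ssn: int, width: int = 3) -> list:
    """Partition a nine-digit SSN into consecutive groups of `width` digits."""
    digits = f"{ssn:09d}"
    return [int(digits[i:i + width]) for i in range(0, len(digits), width)]


def fold_sum(ssn: int) -> int:
    """Fold by adding the parts, e.g. 464 + 133 + 881."""
    return sum(fold_parts(ssn))


def fold_product(ssn: int) -> int:
    """Fold by multiplying the parts, e.g. 464 * 133 * 881."""
    result = 1
    for part in fold_parts(ssn):
        result *= part
    return result


if __name__ == "__main__":
    for ssn in (464133881, 456662055):
        print(fold_parts(ssn), fold_sum(ssn), fold_product(ssn))
```

The folded value would normally still be passed through a modulus (and + 1) to bring it into the table's range.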

  41. HASH FUNCTION TECHNIQUES: MODULUS, TRUNCATION, FOLDING

  42. Suppose We Have 10,000 Social Security Numbers. Might Folding Of Three Digit Combinations Be OK? 464 + 133 + 881 + 1 [Diagram: a table of cells numbered 1 to 10,000.]

  43. POOR SOLUTION - YUK! Your Hash Function Must Be Capable Of Generating All The Values (1 - 10,000) In Your Key Set. Might Folding Of Three Digit Combinations Be OK? 464 + 133 + 881 + 1
     Smallest SSN (Does Anyone Have It?) 000-00-0000 = 000 + 000 + 000 + 1 = 1
     Largest SSN 500-99-9999 = 500 + 999 + 999 + 1 = 2,499 (~2,500)
     [Diagram: the Address Calculator maps 000-00-0000 to cell 1 and 500-99-9999 to roughly cell 2,500 of a table numbered 1 to 10,000.]

  44. Use Common Sense & Your Knowledge Of Mathematics To Make Sure That All Values In The Hash Table/File Can Be Generated By Your Hash Function.
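A quick range check makes the problem with the three-part folding sum concrete. The sketch below is illustrative: `fold_sum_hash` is a hypothetical name, the 10,000-cell table and the 500-99-9999 upper SSN come from the slides, and the all-nines SSN is included only as an absolute upper bound on the sum.

```python
def fold_sum_hash(ssn: int) -> int:
    """Add the three 3-digit parts of a nine-digit SSN, then add 1."""
    digits = f"{ssn:09d}"
    return int(digits[0:3]) + int(digits[3:6]) + int(digits[6:9]) + 1


TABLE_SIZE = 10_000

smallest = fold_sum_hash(0)                  # 000-00-0000
largest_on_slide = fold_sum_hash(500_999_999)  # 500-99-9999
upper_bound = fold_sum_hash(999_999_999)     # no nine-digit key can exceed this

if __name__ == "__main__":
    print("smallest hash value:", smallest)
    print("largest hash value (slide's SSN range):", largest_on_slide)
    print("cells that can never be reached:", TABLE_SIZE - upper_bound)
```

Because no key can hash above the upper bound, most of a 10,000-cell table could never be filled, and the keys crowd into the low-numbered cells.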

  45. Strategies To Resolve Collisions: Adding Data With Linear Probing

  46. 20 Linear Probing Always 80% Loading Factor

  47. Add Enough Cells!

  48. Generate Hash Values

  49. Generate Hash Values

  50. Process
