1 / 24

Design & Analysis of Algorithms COMP 482 / ELEC 420 John Greiner

Design & Analysis of Algorithms COMP 482 / ELEC 420 John Greiner. Python Dictionaries. Same idea as Java/C++ hash map, C# dictionary, Perl hash, … H ow is this “magic” implemented?. d = { “snow” : 7 , “apple” : 3 , “white” : 20 } d[ “white” ] = 30 d[ “prince” ] = 16

cleta
Download Presentation

Design & Analysis of Algorithms COMP 482 / ELEC 420 John Greiner

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Design & Analysis of Algorithms COMP 482 / ELEC 420John Greiner

  2. Python Dictionaries Same idea as Java/C++ hash map, C# dictionary, Perl hash, … How is this “magic” implemented? d = {“snow” : 7, “apple” : 3, “white” : 20} d[“white”] = 30 d[“prince”] = 16 d[“apple”]

  3. Combination of Ideas • Hash table & hash function • Dynamic table & amortization To do: [CLRS] 11,17 #4

  4. Hash Tables & Hash Functions d = {“snow” : 7, “apple” : 3, “white” : 20} d[“white”] = 29 d[“prince”] = 16 d[“apple”] 0 Chains: 1 “snow” 3 hash “white”:29 2 “snow”:7, “prince”:16 3 “apple” 7 hash 4 5 “white” 2 hash 6 “apple”:3 7 “prince” 3 hash 8 9

  5. Access Time Depends on Chain Length ? What property do we want of our hash function? ? ? How long is each chain? ? ? How much time per access? ?

  6. Creating a Good Hash Function is Difficult Generally, just use those in libraries. key.__hash__() % table_size class string: def__hash__(self): if not self: return 0 # empty value = ord(self[0]) << 7 for char in self: value = c_mul(1000003, value) ^ ord(char) value = value ^ len(self) if value == -1: # reserved error code value = -2 return value

  7. Dynamic Table Motivation Typically, don’t know how much data we’ll have. • Want underlying hash table to grow, so average chain size is bounded. • Want to retain constant-time indexing. Focus on the latter goal first. • We’ll have to do a little extra to combine hash tables & dynamic tables.

  8. Adding Data when Dynamic Table is Full Must be contiguous for constant-time indexing. data data data data data data data data data data data data

  9. Adding Data when Dynamic Table is Full What’s wrong with just using space at end of array? data data data data data data data data data data data data other data other data That memory might already be in use.

  10. Adding Data when Dynamic Table is Full So, grab needed space elsewhere & copy everything. data data data data data copy data data data data data data data Double the space.

  11. Cost of a Series of Operations Initially: Table size = 5, table empty

  12. Cost of a Series of Operations Add 5 data items, cost = 5 1 2 3 4 5

  13. Cost of a Series of Operations Add 5 more data items, cost = 10 copy 1 1 2 2 3 3 4 4 5 5 6 7 8 9 10

  14. Cost of a Series of Operations Add 5 more data items, cost = 15 1 2 3 4 copy 1 5 2 6 3 7 4 8 5 9 6 10 7 11 8 12 9 13 10 14 15

  15. Cost of a Series of Operations Add 5 more data items, cost = 5 1 2 Total costs: Copying = 15 Adding = 20 Total = 35 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

  16. Cost of Copying is Proportional to Adding 1 2 3 Buy Add 1 Get Copy 2 free! 4 copy 1 5 2 6 3 7 4 8 5 9 6 10 7 8 Each copy costs 2 × #adds since previous copy. 9 10 Use expensive ops sufficiently infrequently.

  17. Amortized Cost Dynamic tables: O(1) amortized time to add data Cost of series of n operations = m  Each operation has amortized cost = m/n

  18. Dynamic Hash Tables d = {“snow” : 7, “apple” : 3, “white” : 20} d[“white”] = 29 d[“prince”] = 16 d[“apple”] “prince”:16 “apple”:3,“white”:29 Hash table “full enough”. Load factor = #items/size “snow”:7 hash hash “snow” “snow” 3 3 hash hash “apple” “apple” 2 7 hash hash “white”:29 “white” “white” 2 2 “snow”:7, “prince”:16 Must rehash all data. hash hash “prince” “prince” 1 3 “apple”:3

  19. Hash Table & Dynamic Table Odds & Ends Hash table size: prime or power-of-2? Chaining vs. open addressing Dynamic table expansion factor Dynamic table contraction Python’s dictionary/hash table implementation Hash tables: Dynamic tables: Python list … Memory heaps for GC Python dict. & set … When want static memory size.

  20. Multipop Amortized cost: ?

  21. Incrementing Binary Counter Another way to sum the costs?

  22. Amortized Analysis Approaches • Aggregate: • For all sequences of m ops, find maximum sum of actual costs. • Potentially difficult to know actual costs or bound well. • Result is sum/m. Sufficient for many commonly-used examples.

  23. Amortized Analysis Approaches • Accounting: • Computeactual costs ci of each kind of op. • Define accounting costs ĉiof each kind of op, such that can assign credits ĉi-ci consistently to data elts. • Can “overpay” on some ops, and use credits to “underpay” on other ops later. • Poor definitions lead to loose bounds. • O(max acct. cost.)

  24. Amortized Analysis Approaches • Potential: • Define potential function Φ(D), such that Φ(Di)≥ Φ(D0). • Essentially assigns “credits” to data structure, rather than operations. • Poor definition leads to loose bounds. • Calculate accounting costs ĉi=ci+ΔΦof each kind of op. • O(max acct. cost.) Complicated approach necessary for more complicated data structures.

More Related