## PowerPoint Slideshow about 'EEM 480' - muhammad-jasmi


Symbol Table

- Symbol tables are used by compilers to keep track of information about
- variables
- functions
- class names
- type names
- temporary variables
- etc.
- Typical symbol table operations are Insert, Delete, and Search
- It's a dictionary structure!

Symbol Table

- What kind of information is usually stored in a symbol table?
- type (int, short, long int, float, …)
- storage class (label, static symbol, external def, structure tag, …)
- size
- scope
- stack frame offset
- register
- We also need a way to keep track of reserved words.

Symbol Table

Where is a symbol table stored?

- array/linked list
- simple, but linear lookup time
- However, we may use a sorted array for reserved words, since they are generally few and known in advance.
- balanced tree
- O(log n) lookup time
- hash table
- most common implementation
- O(1) expected time for dictionary operations (amortized over table resizes)
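The dictionary view of a symbol table can be sketched in a few lines. This is only an illustration: the `SymbolInfo` fields and the `SymbolTable` class are hypothetical names chosen to mirror the attributes listed on the slides, and Python's built-in `dict` (itself a hash table) stands in for the underlying storage.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class SymbolInfo:
    # Hypothetical record; the fields follow the attributes named on the slides.
    type_name: str
    storage_class: str
    size: int
    scope: int


class SymbolTable:
    """Dictionary-style symbol table backed by Python's dict (a hash table)."""

    def __init__(self) -> None:
        self._entries: Dict[str, SymbolInfo] = {}

    def insert(self, name: str, info: SymbolInfo) -> None:
        self._entries[name] = info

    def search(self, name: str) -> Optional[SymbolInfo]:
        return self._entries.get(name)

    def delete(self, name: str) -> bool:
        # Returns True if the name was present and has been removed.
        return self._entries.pop(name, None) is not None


table = SymbolTable()
table.insert("count", SymbolInfo("int", "static symbol", 4, 0))
print(table.search("count"))
```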

Hashing

- Hashing depends on mapping keys to positions in a table called a hash table
- Hashing is a technique used for performing insertions, deletions and searches in constant average time

Hashing

- In this example, John maps to 3
- Phil maps to 4 …
- Problems:
- How will the mapping be done?
- What happens if two items map to the same place?

A Plan For Hashing

- Save items in a key-indexed table. Index is a function of the key.
- Hash function.
- Method for computing table index from key.
- Collision resolution strategy.
- Algorithm and data structure to handle two keys that hash to the same index.
- If there is no space limitation
- Trivial hash function with key as address.
- If there is no time limitation
- Trivial collision resolution = sequential search.
- Limitations on both time and space: hashing (the real world)

Hashing

- Hash tables
- use array of size m to store elements
- given key k (the identifier name), use a function h to compute index h(k) for that key
- collisions are possible
- two keys hash into the same slot.
- A good hash function
- is easy to compute
- avoids collisions (by breaking up patterns in the keys and uniformly distributing the hash values)

Hashing

- Nomenclature
- k is a key
- h(k) is the hash function
- m is the size of the hash table
- n is the number of keys in the hash table

What is Hash

- (From Wikipedia) Hash is an American dish consisting of a mixture of beef (often corned beef or roast beef), onions, potatoes, and spices that are mashed together into a coarse, chunky paste, and then cooked, either alone or with other ingredients.
- Is it related to our definition?
- Yes: hashing chops up any patterns in the keys so that the results are uniformly distributed

What is Hashing

Hashing is the transformation of a string of characters into a usually shorter fixed-length value or key that represents the original string. Hashing is used to index and retrieve items in a database because it is faster to find the item using the shorter hashed key than to find it using the original value. It is also used in many encryption algorithms.

Hashing

- When the key is a string, we generally use the ASCII values of its characters in some way:
- Examples for k = c1 c2 c3 ... cx
- h(k) = (c1·128^(x-1) + c2·128^(x-2) + ... + cx·128^0) mod m
- h(k) = (c1 + c2 + ... + cx) mod m
- h(k) = (h1(c1) + h2(c2) + ... + hx(cx)) mod m, where each hi is an independent hash function.
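The first two string hashes above can be sketched as follows. The function names are illustrative, and the polynomial version is evaluated with Horner's rule, reducing mod m at every step so that intermediate values stay small; that evaluation order is a choice made here, not something the slides prescribe.

```python
def polynomial_hash(key: str, m: int, base: int = 128) -> int:
    """h(k) = (c1*base^(x-1) + ... + cx*base^0) mod m, via Horner's rule."""
    h = 0
    for ch in key:
        h = (h * base + ord(ch)) % m
    return h


def additive_hash(key: str, m: int) -> int:
    """h(k) = (c1 + c2 + ... + cx) mod m."""
    return sum(ord(ch) for ch in key) % m


# Anagrams always collide under the additive hash, because addition
# ignores character order; the polynomial hash weights each position.
print(additive_hash("listen", 101), additive_hash("silent", 101))
print(polynomial_hash("listen", 101), polynomial_hash("silent", 101))
```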

Finding A Hash Function

- Goal: scramble the keys.
- Each table position equally likely for each key.
- Ex: Turkish citizenship ID number (Vatandaşlık Numarası) for 10000 people
- Bad: the whole number, since with only 10000 people most of that range will never be used
- Better: the last three digits, although every number is even
- Best: use the 2nd, 3rd, 4th, and 5th digits
- Ex: date of birth.
- Bad: first three digits of birth year.
- Better: birthday.
- Ex: phone numbers.
- Bad: first three digits.
- Better: last three digits.

Hash Function

Truncation

- Ignore part of the key and use the remaining part directly as the index.
- Example: if the keys are 8-digit numbers and the hash table has 1000 entries, then the first, fourth, and eighth digits could make up the hash function.
- Not a very good method: it does not distribute keys uniformly

Hash Function

Folding

- Break up the key into parts and combine them in some way
- Example: if the keys are 9-digit numbers, break up a key into three 3-digit numbers and add them up.
- Ex: ISBN 0-321-37319-7
- Split it into three parts: 321, 373, and 197
- Add them: 321 + 373 + 197 = 891; for a table of size 500, 891 mod 500 = 391
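A minimal sketch of folding, assuming the key is split left to right into fixed-width groups of decimal digits. The function name and the `part_digits` parameter are illustrative.

```python
def folding_hash(key: int, m: int, part_digits: int = 3) -> int:
    """Split the decimal digits of key into groups, add the groups, reduce mod m."""
    digits = str(key)
    parts = [int(digits[i:i + part_digits])
             for i in range(0, len(digits), part_digits)]
    return sum(parts) % m


# ISBN 0-321-37319-7 without its leading zero: 321 + 373 + 197 = 891
print(folding_hash(321373197, 500))  # 891 mod 500 = 391
```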

Hash Function

Middle square

- Compute k*k and pick some digits from the resulting number
- Example: given a 9-digit key k and a hash table of size 1000, pick three digits from the middle of the number k*k.
- Ex: for 175344387, take the middle digits 344; 344*344 = 118336, which gives 183 or 833
- Works fairly well in practice if the keys do not have many leading or trailing zeroes.
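The middle-square idea can be sketched as below. Which digits count as "the middle" is a convention; this sketch assumes the window starts at `(length - num_digits) // 2`, which selects 183 rather than 833 from 118336.

```python
def mid_square_hash(key: int, num_digits: int = 3) -> int:
    """Square the key and return num_digits digits from the middle of the square."""
    squared = str(key * key)
    start = max((len(squared) - num_digits) // 2, 0)
    return int(squared[start:start + num_digits])


print(mid_square_hash(344))  # 344 * 344 = 118336, middle window "183"
```

With `num_digits = 3` the result always falls in 0..999, which suits a table of size 1000.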

Hash Function

Division

- h(k)=k mod m
- Fast
- Not all values of m are suitable for this. For example, powers of 2 should be avoided, because then k mod m is just the least significant bits of k
- Good values for m are prime numbers.
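A small experiment illustrates why a power of 2 is a poor table size for the division method. The key set (multiples of 16) is an artificial example chosen to make the pattern obvious.

```python
def division_hash(key: int, m: int) -> int:
    return key % m


# Keys that share their low-order bits, e.g. multiples of 16.
keys = [16 * i for i in range(8)]

# With m = 16 (a power of 2), k mod m keeps only the low 4 bits,
# so every key lands in slot 0.
slots_pow2 = {division_hash(k, 16) for k in keys}

# With the prime m = 17, the same keys spread over 8 distinct slots.
slots_prime = {division_hash(k, 17) for k in keys}

print(sorted(slots_pow2), sorted(slots_prime))
```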

Hash Function

Multiplication

- h(k) = int(m * (k * c - int(k * c))), 0 < c < 1
- In English :
- Multiply the key k by a constant c, 0<c<1
- Take the fractional part of k * c
- Multiply that by m
- Take the floor of the result
- Unlike the division method, the value of m is not critical
- Some values of c work better than others
- A good value for c: (sqrt(5) - 1) / 2 ≈ 0.618, the value that reproduces the m = 1301 example

Hash Function

- Multiplication
- Example:
- Suppose the size of the table, m, is 1301.
- For k=1234, h(k)=850
- For k=1235, h(k)=353
- For k=1236, h(k)=1157
- For k=1237, h(k)=660
- For k=1238, h(k)=164
- For k=1239, h(k)=968
- For k=1240, h(k)=471
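The multiplication method can be sketched as follows. The constant c = (√5 − 1)/2 is an assumption: it is the commonly suggested value (attributed to Knuth), and with m = 1301 it yields, for instance, h(1234) = 850 and h(1235) = 353.

```python
import math

# Assumed constant: the fractional part of the golden ratio.
GOLDEN_C = (math.sqrt(5) - 1) / 2


def multiplication_hash(key: int, m: int, c: float = GOLDEN_C) -> int:
    """h(k) = floor(m * frac(k * c)), with 0 < c < 1."""
    fractional = (key * c) % 1.0
    return math.floor(m * fractional)


for k in range(1234, 1241):
    print(k, multiplication_hash(k, 1301))
```

Consecutive keys end up far apart in the table, which is the scattering the method is meant to provide.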

Hash Function

- Universal Hashing
- Worst-case scenario: the chosen keys all hash to the same slot.
- This can be avoided if the hash function is not fixed:
- Start with a collection of hash functions with the property that, for any given set of inputs, they will scatter the inputs well among the range of the function
- Select one at random and use that.
- Good performance on average: the probability that the randomly chosen hash function exhibits the worst-case behavior is very low.
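One concrete family with this property is the classic construction h(k) = ((a·k + b) mod p) mod m with a prime p larger than any key. The slides do not name a family, so the choice below, the prime 2^31 − 1, and the helper name `make_universal_hash` are all assumptions made for illustration.

```python
import random

PRIME = 2_147_483_647  # 2^31 - 1; keys are assumed to be smaller than this


def make_universal_hash(m: int, rng: random.Random):
    """Pick one function at random from the family ((a*k + b) mod p) mod m."""
    a = rng.randrange(1, PRIME)   # a must be nonzero
    b = rng.randrange(0, PRIME)

    def h(key: int) -> int:
        return ((a * key + b) % PRIME) % m

    return h


h = make_universal_hash(13, random.Random())
print([h(k) for k in range(10)])
```

Because a and b are drawn when the table is created, no fixed set of keys is bad for every run.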

When Collision Occurs...

- A collision occurs when more than one item is mapped to the same location
- Ex: n = 10, m = 10, h(k) = k mod 10
- 9 will be mapped to 9
- 769 will be mapped to 9
- In probability theory, the birthday problem or birthday paradox pertains to the probability that, in a set of randomly chosen people, some pair of them will have the same birthday. In a group of 23 (or more) randomly chosen people, there is more than 50% probability that some pair of them will have been born on the same day. For 57 or more people, the probability is more than 99%, reaching 100% once the number of people reaches 366 (ignoring leap years). The mathematics behind this problem leads to a well-known cryptographic attack called the birthday attack.
- When a collision occurs, an algorithm has to map the second, third, ..., n-th item to well-defined places in the table
- To read data back from the table, the same algorithm must be used to retrieve it.
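The birthday figures can be checked numerically. The sketch below assumes 365 equally likely days; passing a table size as `days` gives the chance that n uniformly hashed keys produce at least one collision.

```python
def collision_probability(n: int, days: int = 365) -> float:
    """Probability that at least two of n uniform choices out of `days` coincide."""
    p_distinct = 1.0
    for i in range(n):
        p_distinct *= max(days - i, 0) / days
    return 1.0 - p_distinct


for n in (10, 23, 57, 366):
    print(n, round(collision_probability(n), 4))
```

For hashing, the takeaway is that collisions appear long before the table is full, so a collision strategy is always needed.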

Resolving Collisions

Chaining

- Put all the elements that collide in a chain (list) attached to the slot.
- The hash table is an array of linked lists
- The load factor indicates the average number of elements stored in a chain. It could be less than, equal to, or larger than 1.

What is Load Factor?

- Given a hash table of size m, and n elements stored in it, we define the load factor of the table as λ = n/m
- The load factor gives us an indication of how full the table is.
- The possible values of the load factor depend on the method we use for resolving collisions.

Resolving Collisions: Chaining (cont'd)

- Chaining puts elements that hash to the same slot in a linked list

- Separate chaining: array of M linked lists.
- Hash: map key to integer i between 0 and M-1.
- Insert: put at front of ith chain.
- constant time
- Search: only need to search ith chain.
- proportional to length of chain

Chaining

- Insert/Delete/Lookup in expected O(1) time
- Keep the list doubly linked to facilitate deletions
- The O(1) expectation assumes that the chains are kept short.
- In the worst case, lookup time is linear.
- If the chains start becoming too long, the table must be enlarged and all the keys rehashed.
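A compact sketch of separate chaining with rehashing follows. It makes several simplifying assumptions: Python lists stand in for the linked chains (so front insertion is not truly constant time), the built-in `hash` is the hash function, and the table grows to `2m + 1` slots, which is not guaranteed to be prime. The class name and the `max_load` threshold are illustrative.

```python
class ChainedHashTable:
    def __init__(self, m: int = 11, max_load: float = 2.0) -> None:
        self._buckets = [[] for _ in range(m)]
        self._n = 0
        self._max_load = max_load

    def _chain(self, key):
        return self._buckets[hash(key) % len(self._buckets)]

    def load_factor(self) -> float:
        return self._n / len(self._buckets)

    def insert(self, key, value) -> None:
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:                 # key already present: overwrite
                chain[i] = (key, value)
                return
        chain.insert(0, (key, value))    # put at the front of the chain
        self._n += 1
        if self.load_factor() > self._max_load:
            self._rehash(2 * len(self._buckets) + 1)

    def search(self, key):
        for k, v in self._chain(key):    # only this one chain is scanned
            if k == key:
                return v
        return None

    def delete(self, key) -> bool:
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                self._n -= 1
                return True
        return False

    def _rehash(self, new_m: int) -> None:
        items = [item for chain in self._buckets for item in chain]
        self._buckets = [[] for _ in range(new_m)]
        for k, v in items:               # every key is hashed again
            self._chain(k).insert(0, (k, v))


table = ChainedHashTable()
for i in range(100):
    table.insert(f"id{i}", i)
print(table.search("id42"), round(table.load_factor(), 2))
```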

Chaining Performance

- Search cost is proportional to length of chain.
- Trivial: average length = N / M.
- Worst case: all keys hash to the same chain.
- Theorem. Let λ = N / M > 1 be the average length of a list, which is called the load factor.
- Average search cost: 1 + λ/2
- What is a good choice of M?
- M too large: too many empty chains.
- M too small: chains too long.
- Typical choice: λ = N / M ≈ 10, which gives constant-time search/insert.

Chaining Performance

- Analysis of successful search:
- Expected number e of elements examined during a successful search for key k = one more than the expected number of elements examined when k was inserted.
- It makes no difference whether we insert at the beginning or the end of the list.
- Take the average, over the n items in the table, of 1 plus the expected length of the chain to which the ith element was added.
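The average described in the last bullet can be written out as follows. This is a reconstruction of the standard textbook argument, not the slide's own formula; it assumes simple uniform hashing, so that when the i-th element is inserted its chain has expected length (i − 1)/m.

```latex
\begin{aligned}
e &= \frac{1}{n}\sum_{i=1}^{n}\left(1 + \frac{i-1}{m}\right)
   = 1 + \frac{1}{nm}\sum_{i=1}^{n}(i-1) \\
  &= 1 + \frac{1}{nm}\cdot\frac{n(n-1)}{2}
   = 1 + \frac{n-1}{2m}
   = 1 + \frac{\lambda}{2} - \frac{1}{2m},
   \qquad \lambda = \frac{n}{m}.
\end{aligned}
```

For large m the last term is negligible, which agrees with the average search cost of about 1 + λ/2.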

Open Addressing

Open addressing

- Store all elements within the table
- The space we save from the chain pointers is used instead to make the array larger.
- If there is a collision, probe the table in a systematic way to find an empty slot.
- If the table fills up, we need to enlarge it and rehash all the keys.

Open Addressing

- hash function: (h(k) + i) mod m for i = 0, 1, ..., m-1
- Insert: start with the location where the key hashed and do a sequential search for an empty slot.
- Search: start with the location where the key hashed and do a sequential search until you either find the key (success) or find an empty slot (failure).
- Delete: (lazy deletion) follow the same route, but mark the slot as DELETED rather than EMPTY; otherwise subsequent searches will fail.
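The three operations can be sketched as below, with lazy deletion implemented through two sentinel objects. The class name, the fixed table size, and the use of Python's built-in `hash` are assumptions; resizing is left out to keep the sketch short.

```python
EMPTY = object()     # slot has never been used
DELETED = object()   # slot held a key that was removed (lazy deletion)


class LinearProbingTable:
    def __init__(self, m: int = 11) -> None:
        self._slots = [EMPTY] * m

    def _probe(self, key):
        """Yield (h(k) + i) mod m for i = 0, 1, ..., m-1."""
        m = len(self._slots)
        start = hash(key) % m
        for i in range(m):
            yield (start + i) % m

    def insert(self, key) -> bool:
        first_free = None
        for idx in self._probe(key):
            slot = self._slots[idx]
            if slot is EMPTY:
                if first_free is None:
                    first_free = idx
                break                      # key cannot be further along
            if slot is DELETED:
                if first_free is None:
                    first_free = idx       # remember it, keep looking for key
            elif slot == key:
                return True                # already present
        if first_free is None:
            return False                   # table is full
        self._slots[first_free] = key
        return True

    def search(self, key) -> bool:
        for idx in self._probe(key):
            slot = self._slots[idx]
            if slot is EMPTY:
                return False               # an empty slot ends the search
            if slot is not DELETED and slot == key:
                return True
        return False

    def delete(self, key) -> bool:
        for idx in self._probe(key):
            slot = self._slots[idx]
            if slot is EMPTY:
                return False
            if slot is not DELETED and slot == key:
                self._slots[idx] = DELETED  # do not break the probe chain
                return True
        return False


t = LinearProbingTable(11)
for k in (3, 14, 25):      # all three hash to slot 3 when m = 11
    t.insert(k)
t.delete(14)
print(t.search(25))        # still reachable through the DELETED marker
```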

Hash Table without Linked-List

- Linear probing: array of size M.
- Hash: map key to integer i between 0 and M-1.
- Insert: put in slot i if free, if not try i+1, i+2, etc.
- Search: search slot i, if occupied but no match, try i+1, i+2, etc.
- Cluster.
- Contiguous block of items.
- Search through cluster using elementary algorithm for arrays.

Open Addressing: Linear Probing

- Advantage: very easy to implement
- Disadvantage: primary clustering
- Long sequences of used slots build up with gaps between them. Every insertion requires several probes and adds to the cluster.
- The average length of a probe sequence when inserting is approximately (1/2)(1 + 1/(1 - λ)^2), where λ is the load factor

Quadratic Probes

- Probe the table at slots (h(k) + i^2) mod m for i = 0, 1, 2, 3, ..., m-1

- Ease of computation:
- Not as easy as linear probing.
- Do we really have to compute a power?
- Clustering
- Primary clustering is avoided, since the probes are not sequential.

Search Quadratic Probing

- Probe sequence for hash value 3 in a table of size 16 (each result is taken mod 16):

3 + 0^2 = 3

3 + 1^2 = 4

3 + 2^2 = 7

3 + 3^2 = 12

3 + 4^2 = 3

3 + 5^2 = 12

3 + 6^2 = 7

3 + 7^2 = 4

3 + 8^2 = 3

3 + 9^2 = 4

3 + 10^2 = 7

3 + 11^2 = 12

3 + 12^2 = 3

3 + 13^2 = 12

3 + 14^2 = 7

3 + 15^2 = 4

Quadratic Probing

- Probe sequence for hash value 3 in a table of size 19 (each result is taken mod 19):

3 + 0^2 = 3

3 + 1^2 = 4

3 + 2^2 = 7

3 + 3^2 = 12

3 + 4^2 = 0

3 + 5^2 = 9

3 + 6^2 = 1

3 + 7^2 = 14

3 + 8^2 = 10

3 + 9^2 = 8
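Both probe sequences can be generated without computing any power, because consecutive squares differ by consecutive odd numbers: (i + 1)^2 − i^2 = 2i + 1. The sketch below uses that identity; the function name is illustrative.

```python
def quadratic_probe_sequence(home: int, m: int) -> list:
    """Slots (home + i^2) mod m for i = 0..m-1, computed incrementally."""
    sequence = []
    offset = 0   # holds i^2 mod m
    step = 1     # next odd number, 2i + 1
    for _ in range(m):
        sequence.append((home + offset) % m)
        offset = (offset + step) % m
        step += 2
    return sequence


print(sorted(set(quadratic_probe_sequence(3, 16))))  # distinct slots, m = 16
print(quadratic_probe_sequence(3, 19)[:10])          # first ten probes, m = 19
```

With m = 16 the probes keep cycling through only four slots (3, 4, 7, 12), while the prime size 19 reaches ten distinct slots, about half of the table.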

Quadratic Probing

- Disadvantage: secondary clustering:
- if h(k1) == h(k2), the probing sequences for k1 and k2 are exactly the same.
- Is this really bad?
- In practice, not so much
- It becomes an issue when the load factor is high.

Double Hashing

- The hash function is (h(k) + i·h2(k)) mod m
- In English: use a second hash function to obtain the next slot.
- The probing sequence is:
- h(k), h(k)+h2(k), h(k)+2h2(k), h(k)+3h2(k), ...
- Performance :
- Much better than linear or quadratic probing.
- Does not suffer from clustering
- BUT requires computation of a second function

Double Hashing

- The choice of h2(k) is important
- It must never evaluate to zero
- consider h2(k) = k mod 9 for k = 81: it evaluates to 0, so every probe lands on the same slot
- The choice of m is important
- If it is not prime, we may run out of alternate locations very fast.
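A short sketch shows both requirements at work. The second hash function used here, h2(k) = 1 + (k mod (m − 1)), is a common textbook choice and an assumption of this example; it always lies in 1..m−1 and therefore never evaluates to zero.

```python
def double_hash_sequence(key: int, m: int) -> list:
    """Probe sequence (h(k) + i*h2(k)) mod m for i = 0..m-1."""
    h1 = key % m
    h2 = 1 + (key % (m - 1))   # assumed second hash; never zero
    return [(h1 + i * h2) % m for i in range(m)]


# Prime table size: the step is coprime to m, so every slot is visited.
print(double_hash_sequence(81, 13))

# Non-prime table size: for key 3 the step is 4, which shares a factor
# with m = 12, so the probes cycle through only three slots.
print(sorted(set(double_hash_sequence(3, 12))))
```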

Rehashing

- Once the table is about 70% full, double the size of the hash table and rehash all the keys.
- Don’t forget to make the new size a prime number
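The resize rule can be sketched as two small helpers. The names are illustrative, the 0.7 threshold comes from the slide, and "double, then move up to the next prime" is one reasonable reading of the advice rather than the only one.

```python
def is_prime(n: int) -> bool:
    if n < 2:
        return False
    i = 2
    while i * i <= n:          # trial division is enough for table sizes
        if n % i == 0:
            return False
        i += 1
    return True


def needs_rehash(n: int, m: int, threshold: float = 0.7) -> bool:
    """True once the load factor n/m exceeds the threshold."""
    return n / m > threshold


def next_table_size(m: int) -> int:
    """Smallest prime that is at least twice the current size."""
    candidate = 2 * m
    while not is_prime(candidate):
        candidate += 1
    return candidate


print(needs_rehash(8, 11), next_table_size(11))
```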

Lempel-Ziv-Welch (LZW) Compression Algorithm

- Introduction to the LZW Algorithm
- Example 1: Encoding using LZW
- Example 2: Decoding using LZW
- LZW: Concluding Notes

Introduction to LZW

- As mentioned earlier, static coding schemes require some knowledge about the data before encoding takes place.
- Universal coding schemes, like LZW, do not require advance knowledge and can build such knowledge on-the-fly.
- LZW is the foremost technique for general purpose data compression due to its simplicity and versatility.
- It is the basis of many PC utilities that claim to “double the capacity of your hard drive”.
- LZW compression uses a code table, with 4096 as a common choice for the number of table entries.

Introduction to LZW (cont'd)

- Codes 0-255 in the code table are always assigned to represent single bytes from the input file.
- When encoding begins the code table contains only the first 256 entries, with the remainder of the table being blanks.
- Compression is achieved by using codes 256 through 4095 to represent sequences of bytes.
- As the encoding continues, LZW identifies repeated sequences in the data, and adds them to the code table.
- Decoding is achieved by taking each code from the compressed file, and translating it through the code table to find what character or characters it represents.

LZW Encoding Algorithm

    Initialize table with single character strings
    P = first input character
    WHILE not end of input stream
        C = next input character
        IF P + C is in the string table
            P = P + C
        ELSE
            output the code for P
            add P + C to the string table
            P = C
    END WHILE
    output code for P
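The encoding steps translate almost line for line into Python. This sketch assumes the input is a string whose characters all have code points below 256, uses a `dict` as the string table, and simply stops adding entries once `max_entries` is reached. The test string BABAABAAA is also an assumption, picked because it encodes to the code sequence used in the decompression example.

```python
def lzw_encode(data: str, max_entries: int = 4096) -> list:
    """Return the list of LZW codes for data (characters must be < 256)."""
    table = {chr(i): i for i in range(256)}   # single-character strings
    if not data:
        return []
    codes = []
    p = data[0]
    for c in data[1:]:
        if p + c in table:
            p = p + c                          # keep extending the match
        else:
            codes.append(table[p])
            if len(table) < max_entries:
                table[p + c] = len(table)      # next free code
            p = c
    codes.append(table[p])                     # flush the final match
    return codes


print(lzw_encode("BABAABAAA"))
```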

LZW Decompression

- The LZW decompressor creates the same string table during decompression.
- It starts with the first 256 table entries initialized to single characters.
- The string table is updated for each character in the input stream, except the first one.
- Decoding is achieved by reading codes and translating them through the code table being built.

LZW Decompression Algorithm

    Initialize table with single character strings
    OLD = first input code
    output translation of OLD
    C = first character of translation of OLD
    WHILE not end of input stream
        NEW = next input code
        IF NEW is not in the string table
            S = translation of OLD
            S = S + C
        ELSE
            S = translation of NEW
        output S
        C = first character of S
        add translation of OLD + C to the string table
        OLD = NEW
    END WHILE

Example 2: LZW Decompression 1

Example 2: Use LZW to decompress the output sequence of Example 1: <66><65><256><257><65><260>.
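Running the decompression algorithm on this sequence can be sketched as below. The function is an illustrative translation of the pseudocode; it initializes C from the first code so that a code missing from the table is handled even when it arrives as the second code.

```python
def lzw_decode(codes: list, max_entries: int = 4096) -> str:
    """Rebuild the original string from a list of LZW codes."""
    if not codes:
        return ""
    table = {i: chr(i) for i in range(256)}   # single-character strings
    old = codes[0]
    c = table[old]
    output = [table[old]]
    for new in codes[1:]:
        if new in table:
            s = table[new]
        else:
            # Code not yet in the table: it must be OLD's string plus
            # that string's own first character.
            s = table[old] + c
        output.append(s)
        c = s[0]
        if len(table) < max_entries:
            table[len(table)] = table[old] + c
        old = new
    return "".join(output)


print(lzw_decode([66, 65, 256, 257, 65, 260]))
```

The last code, 260, is not in the table when it is read, so it exercises the special case in the IF branch.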

LZW: Some Notes

- This algorithm compresses repetitive sequences of data well.
- Since the codewords are 12 bits, any single encoded character will expand the data size rather than reduce it.
- In this example, 72 bits are represented with 72 bits of data. After a reasonable string table is built, compression improves dramatically.
- Advantages of LZW over Huffman:
- LZW requires no prior information about the input data stream.
- LZW can compress the input stream in one single pass.
- Another advantage of LZW is its simplicity, allowing fast execution.

LZW: Limitations

- What happens when the dictionary gets too large (i.e., when all the 4096 locations have been used)?
- Here are some options usually implemented:
- Simply forget about adding any more entries and use the table as is.
- Throw the dictionary away when it reaches a certain size.
- Throw the dictionary away when it is no longer effective at compression.
- Clear entries 256-4095 and start building the dictionary again.
- Some clever schemes rebuild a string table from the last N input characters.
