Topic 20 huffman coding
This presentation is the property of its rightful owner.
Sponsored Links
1 / 63

Topic 20: Huffman Coding PowerPoint PPT Presentation


  • 49 Views
  • Uploaded on
  • Presentation posted in: General

Topic 20: Huffman Coding. The author should gaze at Noah, and ... learn, as they did in the Ark, to crowd a great deal of matter into a very small compass. Sydney Smith, Edinburgh Review. Compression. Storage of digital information measured in bits and bytes.

Download Presentation

Topic 20: Huffman Coding

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Topic 20 huffman coding

Topic 20: Huffman Coding

The author should gaze at Noah, and ... learn, as they did in the Ark, to crowd a great deal of matter into a very small compass.

Sydney Smith, Edinburgh Review


Compression

Compression

Storage of digital information measured in bits and bytes.

Store same information with less memory = compression

lossless and lossy compression

Is it really necessary?

Huffman Coding


Compression why bother

Compression - Why Bother?

Apostolos "Toli" Lerios

Facebook Engineer

Heads image storage group

jpeg images already compressed

look for ways to compress even more

1% less space = millions of dollars in savings

Huffman Coding


Little pipes and big pumps

Little Pipes and Big Pumps

Internet Access Cost

CPU Capability

$1,500 for a good computer

Intel i7 processor

Assume it lasts 3 years.

Memory bandwidth25.6 GB / sec= 2.6 * 1010 bytes / sec

7.6 * 1010 instructions / second

  • 40 Mbps roughly $40 per month.

  • 12 months * 3 years * $40 = $1,440

  • 10,000,000 bits /second= 1.25 * 106 bytes / sec

Huffman Coding


Little pipes and big pumps1

Little Pipes and Big Pumps

CPU

Data In


Encoding

Encoding

  • UTCS

  • 85 84 67 83

  • 0101 0101 0101 0100 0100 0011 0101 0011

  • 0V5V0V5V 0V5V0V5V0V5V0V5V 0V5V0V0V 0V5V0V0V 0V0V5V5V 0V5V0V5V 0V0V5V5V

  • what is a file? what is a file extension?

  • open a bitmap in a text editor

  • open a pdf in word

Huffman Coding


Purpose of huffman coding

Purpose of Huffman Coding

  • Proposed by Dr. David A. Huffman

    • A Method for the Construction of Minimum Redundancy Codes

    • Written in 1952

  • Applicable to many forms of data transmission

    • Our example: text files

    • still used in fax machines, mp3 encoding, others

Huffman Coding


The basic algorithm

The Basic Algorithm

  • Huffman coding is a form of statistical coding

  • Not all characters occur with the same frequency!

  • Yet in ASCII all characters are allocated the same amount of space

    • 1 char = 1 byte, be it e or x

Huffman Coding


The basic algorithm1

The Basic Algorithm

  • Any savings in tailoring codes to frequency of character?

  • Code word lengths are no longer fixed like ASCII or Unicode

  • Code word lengths vary and will be shorter for the more frequently used characters

Huffman Coding


The basic algorithm2

The Basic Algorithm

  • 1.Scan text to be compressed and tally occurrence of all characters.

  • 2.Sort or prioritize characters based on number of occurrences in text.

  • 3.Build Huffman code tree based on prioritized list.

  • 4.Perform a traversal of tree to determine all code words.

  • 5.Scan text again and create new file using the Huffman codes

Huffman Coding


Building a tree scan the original text

Building a TreeScan the original text

  • Consider the following short text

  • Eerie eyes seen near lake.

  • Count up the occurrences of all characters in the text

Huffman Coding


Building a tree scan the original text1

Building a TreeScan the original text

Eerie eyes seen near lake.

  • What characters are present?

E e r i space

y s n a r l k .

Huffman Coding


Building a tree scan the original text2

Char Freq. Char Freq. Char Freq.

E 1 y1 k1

e 8 s 2 .1

r 2 n 2

i 1 a2

space 4 l1

Building a TreeScan the original text

Eerie eyes seen near lake.

  • What is the frequency of each character in the text?

Huffman Coding


Building a tree prioritize characters

Building a TreePrioritize characters

  • Create binary tree nodes with character and frequency of each character

  • Place nodes in a priority queue

    • The lower the occurrence, the higher the priority in the queue

Huffman Coding


Building a tree

E

1

i

1

l

1

.

1

sp

4

e

8

k

1

y

1

a

2

n

2

r

2

s

2

Building a Tree

  • The queue after inserting all nodes

  • Null Pointers are not shown

Huffman Coding


Building a tree1

Building a Tree

  • While priority queue contains two or more nodes

    • Create new node

    • Dequeue node and make it left subtree

    • Dequeue next node and make it right subtree

    • Frequency of new node equals sum of frequency of left and right children

    • Enqueue new node back into queue

Huffman Coding


Building a tree2

E

1

i

1

l

1

.

1

sp

4

e

8

k

1

y

1

a

2

n

2

r

2

s

2

Building a Tree

Huffman Coding


Building a tree3

Building a Tree

l

1

.

1

sp

4

e

8

k

1

y

1

a

2

n

2

r

2

s

2

2

i

1

E

1

Huffman Coding


Building a tree4

Building a Tree

2

k

1

l

1

y

1

.

1

a

2

n

2

r

2

s

2

sp

4

e

8

E

1

i

1

Huffman Coding


Building a tree5

Building a Tree

2

y

1

.

1

a

2

n

2

r

2

s

2

sp

4

e

8

E

1

i

1

2

k

1

l

1

Huffman Coding


Building a tree6

Building a Tree

2

2

y

1

.

1

a

2

n

2

r

2

s

2

sp

4

e

8

k

1

l

1

E

1

i

1

Huffman Coding


Building a tree7

Building a Tree

2

a

2

n

2

r

2

s

2

2

sp

4

e

8

k

1

l

1

E

1

i

1

2

y

1

.

1

Huffman Coding


Building a tree8

Building a Tree

2

a

2

n

2

r

2

s

2

sp

4

e

8

2

2

y

1

.

1

k

1

l

1

E

1

i

1

Huffman Coding


Building a tree9

Building a Tree

r

2

s

2

2

sp

4

e

8

2

2

E

1

i

1

k

1

l

1

y

1

.

1

4

a

2

n

2

Huffman Coding


Building a tree10

Building a Tree

r

2

s

2

e

8

2

sp

4

2

4

2

y

1

.

1

a

2

n

2

E

1

i

1

k

1

l

1

Huffman Coding


Building a tree11

Building a Tree

e

8

4

2

2

2

sp

4

a

2

n

2

k

1

l

1

y

1

.

1

E

1

i

1

4

r

2

s

2

Huffman Coding


Building a tree12

Building a Tree

e

8

4

4

2

2

2

sp

4

a

2

n

2

r

2

s

2

k

1

l

1

y

1

.

1

E

1

i

1

Huffman Coding


Building a tree13

Building a Tree

e

8

4

4

2

sp

4

a

2

n

2

r

2

s

2

y

1

.

1

4

2

2

E

1

i

1

l

1

k

1

Huffman Coding


Building a tree14

Building a Tree

4

4

4

2

e

8

sp

4

2

2

a

2

n

2

r

2

s

2

y

1

.

1

E

1

i

1

l

1

k

1

Huffman Coding


Building a tree15

Building a Tree

4

4

4

e

8

2

2

a

2

n

2

r

2

s

2

E

1

i

1

l

1

k

1

6

sp

4

2

y

1

.

1

Huffman Coding


Building a tree16

Building a Tree

6

4

4

e

8

4

2

sp

4

2

2

r

2

s

2

a

2

n

2

y

1

.

1

E

1

i

1

l

1

k

1

What is happening to the characters with a low number of occurrences?

Huffman Coding


Building a tree17

Building a Tree

4

6

e

8

2

2

2

sp

4

y

1

.

1

E

1

i

1

l

1

k

1

8

4

4

r

2

s

2

a

2

n

2

Huffman Coding


Building a tree18

Building a Tree

4

6

8

e

8

2

2

2

sp

4

4

4

r

1

.

1

E

1

i

1

l

1

k

1

r

2

s

2

a

2

n

2

Huffman Coding


Building a tree19

Building a Tree

8

e

8

4

4

10

r

2

s

2

a

2

n

2

4

6

2

2

2

sp

4

E

1

i

1

l

1

k

1

y

1

.

1

Huffman Coding


Building a tree20

Building a Tree

8

10

e

8

4

4

4

6

2

2

2

r

2

s

2

a

2

n

2

sp

4

E

1

i

1

l

1

k

1

y

1

.

1

Huffman Coding


Building a tree21

Building a Tree

10

16

4

6

2

2

e

8

8

2

sp

4

E

1

i

1

l

1

k

1

y

1

.

1

4

4

r

2

s

2

a

2

n

2

Huffman Coding


Building a tree22

Building a Tree

10

16

4

6

e

8

8

2

2

2

sp

4

4

4

E

1

i

1

l

1

k

1

y

1

.

1

r

2

s

2

a

2

n

2

Huffman Coding


Building a tree23

Building a Tree

26

16

10

4

e

8

8

6

2

2

2

4

4

sp

4

E

1

i

1

l

1

k

1

y

1

.

1

r

2

s

2

a

2

n

2

Huffman Coding


Building a tree24

Building a Tree

  • After enqueueing this node there is only one node left in priority queue.

26

16

10

4

e

8

8

6

2

2

2

4

4

sp

4

E

1

i

1

l

1

k

1

y

1

.

1

r

2

s

2

a

2

n

2

Huffman Coding


Building a tree25

26

16

10

4

e

8

8

6

2

2

2

4

4

sp

4

E

1

i

1

l

1

k

1

y

1

.

1

r

2

s

2

a

2

n

2

Building a Tree

Dequeue the single node left in the queue.

This tree contains the new code words for each character.

Frequency of root node should equal number of characters in text.

Eerie eyes seen near lake. 4 spaces,26 characters total

Huffman Coding


Encoding the file traverse tree for codes

26

16

10

4

e

8

8

6

2

2

2

4

4

sp

4

E

1

i

1

l

1

k

1

y

1

.

1

r

2

s

2

a

2

n

2

Encoding the FileTraverse Tree for Codes

  • Perform a traversal of the tree to obtain new code words

  • left, append a 0 to code word

  • right append a 1 to code word

  • code word is only completed when a leaf node is reached

Huffman Coding


Encoding the file traverse tree for codes1

26

16

10

4

e

8

8

6

2

2

2

4

4

sp

4

E

1

i

1

l

1

k

1

y

1

.

1

r

2

s

2

a

2

n

2

Encoding the FileTraverse Tree for Codes

CharCode

E0000

i0001

k0010

l0011

y0100

.0101

space011

e10

a1100

n1101

r1110

s1111

Huffman Coding


Encoding the file

Rescan text and encode file using new code words

Eerie eyes seen near lake.

Encoding the File

CharCode

E0000

i0001

k0010

l0011

y0100

.0101

space011

e10

a1100

n1101

r1110

s1111

000010111000011001110010010111101111111010110101111011011001110011001111000010100101

Huffman Coding


Encoding the file results

Have we made things any better?

82bits to encode the text

ASCII would take 8 * 26 = 208 bits

Encoding the FileResults

000010111000011001110010010111101111111010110101111011011001110011001111000010100101

  • If modified code used 4 bits per

    character are needed. Total bits

    4 * 26 = 104. Savings not as great.

Huffman Coding


Decoding the file

Decoding the File

  • How does receiver know what the codes are?

  • Tree constructed for each text file.

    • Considers frequency for each file

    • Big hit on compression, especially for smaller files

  • Tree predetermined

    • based on statistical analysis of text files or file types

Huffman Coding


Decoding the file1

Once receiver has tree it scans incoming bit stream

0  go left

1  go right

1010001001111000111111

11011100001010

elk nay sir

eek a snake

eek kin sly

eek snarl nil

eel a snarl

26

16

10

4

e

8

8

6

2

2

2

4

4

sp

4

E

1

i

1

l

1

k

1

y

1

.

1

r

2

s

2

a

2

n

2

Decoding the File

Huffman Coding


Assignment hints

Assignment Hints

  • reading chunks not chars

  • header format

  • the pseudo eof character

  • the GUI

Huffman Coding


Assignment example

Assignment Example

  • "Eerie eyes seen near lake." will result in different codes than those shown in slides due to:

    • adding elements in order to PriorityQueue

    • required pseudo eof character (PEOF)

Huffman Coding


Assignment example1

Assignment Example

Char Freq. Char Freq. Char Freq.

E 1 y1 k1

e 8 s 2 .1

r 2 n 2 PEOF 1

i 1 a2

space 4 l1

Huffman Coding


Assignment example2

Assignment Example

e

8

.

1

E

1

i

1

k

1

l

1

y

1

PEOF

1

a

2

n

2

r

2

s

2

SP

4

Huffman Coding


Assignment example3

Assignment Example

e

8

i

1

k

1

l

1

y

1

PEOF

1

a

2

n

2

r

2

s

2

SP

4

2

.

1

E

1

Huffman Coding


Assignment example4

Assignment Example

e

8

i

1

k

1

l

1

y

1

PEOF

1

a

2

n

2

r

2

s

2

SP

4

2

.

1

E

1

Huffman Coding


Assignment example5

Assignment Example

2

e

8

l

1

y

1

PEOF

1

a

2

n

2

r

2

s

2

SP

4

2

i

1

k

1

.

1

E

1

Huffman Coding


Assignment example6

Assignment Example

2

e

8

PEOF

1

a

2

n

2

r

2

s

2

SP

4

2

2

i

1

k

1

l

1

y

1

.

1

E

1

Huffman Coding


Assignment example7

Assignment Example

2

e

8

n

2

r

2

s

2

SP

4

2

2

3

i

1

k

1

l

1

y

1

a

2

PEOF

1

.

1

E

1

Huffman Coding


Assignment example8

Assignment Example

2

s

2

SP

4

2

2

3

e

8

4

i

1

k

1

l

1

y

1

a

2

PEOF

1

.

1

E

1

n

2

r

2

Huffman Coding


Assignment example9

Assignment Example

2

SP

4

2

3

e

8

4

4

i

1

k

1

l

1

y

1

a

2

PEOF

1

2

s

2

n

2

r

2

.

1

E

1

Huffman Coding


Topic 20 huffman coding

4

4

e

8

7

4

2

s

2

n

2

r

2

2

2

SP

4

3

i

1

k

1

l

1

y

1

.

1

E

1

a

2

PEOF

1

Huffman Coding


Topic 20 huffman coding

7

8

e

8

4

2

2

4

4

SP

4

3

i

1

k

1

l

1

y

1

n

2

r

2

2

s

2

a

2

PEOF

1

.

1

E

1

Huffman Coding


Topic 20 huffman coding

11

8

e

8

7

4

4

4

2

2

n

2

r

2

2

s

2

SP

4

3

i

1

k

1

l

1

y

1

.

1

E

1

a

2

PEOF

1

Huffman Coding


Topic 20 huffman coding

11

16

e

8

8

7

4

4

4

2

2

SP

4

3

2

s

2

n

2

r

2

i

1

k

1

l

1

y

1

a

2

PEOF

1

.

1

E

1

Huffman Coding


Topic 20 huffman coding

27

11

16

e

8

8

7

4

4

4

2

2

SP

4

3

2

s

2

n

2

r

2

i

1

k

1

l

1

y

1

a

2

PEOF

1

.

1

E

1

Huffman Coding


Codes

Codes

value: 32, equivalent char: , frequency: 4, new code 011

value: 46, equivalent char: ., frequency: 1, new code 11110

value: 69, equivalent char: E, frequency: 1, new code 11111

value: 97, equivalent char: a, frequency: 2, new code 0101

value: 101, equivalent char: e, frequency: 8, new code 10

value: 105, equivalent char: i, frequency: 1, new code 0000

value: 107, equivalent char: k, frequency: 1, new code 0001

value: 108, equivalent char: l, frequency: 1, new code 0010

value: 110, equivalent char: n, frequency: 2, new code 1100

value: 114, equivalent char: r, frequency: 2, new code 1101

value: 115, equivalent char: s, frequency: 2, new code 1110

value: 121, equivalent char: y, frequency: 1, new code 0011

value: 256, equivalent char: ?, frequency: 1, new code 0100

Huffman Coding


  • Login