Data Structures

Week 6: Assignment #2 Problem

http://www.cs.hongik.ac.kr/~rhanha/rhanha_teaching.html/

Requirement
  • Encode a message using Huffman's algorithm
  • Use Min Heap as the priority queue
    • dynamic allocation
  • The input consists of strings
    • A string consists of alphabetic characters only
      • Upper case and lower case letters are treated as different characters
      • stored in a text file
      • given in separate lines
Requirement – cont’
  • Output
    • should be stored in a text file in the following format

Heap Traversal:
[character or string]...
Huffman Tree Traversal:
[character or string]...
character: frequency, code
.
.
.
the code for the message:

  • Due date
    • 2001/5/23 24:00

Encoding
  • Encode the message as a long bit string
    • assign a bit string code to each symbol of the alphabet
    • then, concatenate the individual codes of the symbols making up the message to produce an encoding for the message
Example#1

Symbol  Code
A       010
B       100
C       000
D       111

  • ABACCDA
    • 010100010000000111010
    • Three bits are used for each symbol
    • 21 bits are needed to encode the message
      • inefficient
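
The concatenation step can be sketched in Python as follows. This is a minimal illustration rather than part of the original slides: the function name `encode` and the dictionary name `FIXED3` are assumptions chosen for the example, and the table is the one from Example#1.

```python
# Code table from Example#1 (three bits per symbol).
FIXED3 = {"A": "010", "B": "100", "C": "000", "D": "111"}


def encode(message, codes):
    """Concatenate the code of each symbol in the message, left to right."""
    return "".join(codes[symbol] for symbol in message)


bits = encode("ABACCDA", FIXED3)
print(bits, len(bits))  # 010100010000000111010 21
```

Passing a different table to the same function gives the encodings of the later examples.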
Example#2

Symbol  Code
A       00
B       01
C       10
D       11

  • ABACCDA
    • 00010010101100
    • Two bits are used for each symbol
    • 14 bits are needed to encode the message
Example#3
  • ABACCDA
    • Each of the letters B and D appears only once in the message
    • The letter A appears three times
    • The letter A is assigned a shorter bit string than the letters B and D
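
Counting how often each letter appears is the first step toward choosing code lengths. A small sketch using `collections.Counter` from the Python standard library is shown below; the variable name `frequency` is an assumption made for the example.

```python
from collections import Counter

# Count the occurrences of each symbol in the message.
frequency = Counter("ABACCDA")

for symbol in sorted(frequency):
    print(symbol, frequency[symbol])
# A 3
# B 1
# C 2
# D 1
```

`Counter` compares characters exactly, so upper case and lower case letters are counted as different symbols, which matches the assignment requirement.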
Example#3 - cont’

Symbol  Code
A       0
B       110
C       10
D       111

  • ABACCDA
    • 0110010101110
    • Encoding of the message requires only 13 bits
      • more efficient
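
The bit counts of the three examples can be checked by weighting each code length by the frequency of its symbol. The sketch below does this for the message ABACCDA; the names `FREQUENCY`, `TABLES`, and `encoded_length` are assumptions introduced for illustration.

```python
# Symbol frequencies in the message ABACCDA.
FREQUENCY = {"A": 3, "B": 1, "C": 2, "D": 1}

# Code tables from Example#1, Example#2, and Example#3.
TABLES = {
    "Example#1": {"A": "010", "B": "100", "C": "000", "D": "111"},
    "Example#2": {"A": "00", "B": "01", "C": "10", "D": "11"},
    "Example#3": {"A": "0", "B": "110", "C": "10", "D": "111"},
}


def encoded_length(frequency, codes):
    """Total bits = sum over symbols of (frequency * code length)."""
    return sum(count * len(codes[symbol]) for symbol, count in frequency.items())


for name, codes in TABLES.items():
    print(name, encoded_length(FREQUENCY, codes))
# Example#1 21
# Example#2 14
# Example#3 13
```

For Example#3 the sum is 3*1 + 1*3 + 2*2 + 1*3 = 13, since the most frequent letter has the shortest code.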
Variable-Length Code
  • If variable-length codes are used
    • the code for one symbol must not be a prefix of the code for another
  • Example
    • Suppose the code for a symbol x, c(x), is a prefix of the code of another symbol y, c(y)
    • When c(x) is encountered in a left-to-right scan
      • it is unclear whether c(x) represents the symbol x or is the first part of c(y)
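
The prefix condition can be tested mechanically, and when it holds a left-to-right scan decodes the bit string without ambiguity. The following is a sketch under stated assumptions: `is_prefix_free`, `decode`, and the table `AMBIGUOUS` are hypothetical names and data invented for this example, while `PREFIX_FREE` is the table from Example#3.

```python
def is_prefix_free(codes):
    """Return True if no code is a prefix of another symbol's code."""
    words = sorted(codes.values())
    # In sorted order, a word that is a prefix of any other word is also
    # a prefix of the word that immediately follows it.
    return all(not later.startswith(earlier)
               for earlier, later in zip(words, words[1:]))


def decode(bits, codes):
    """Scan left to right, emitting a symbol whenever a full code is read."""
    symbol_of = {code: symbol for symbol, code in codes.items()}
    message, current = [], ""
    for bit in bits:
        current += bit
        if current in symbol_of:
            message.append(symbol_of[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete code")
    return "".join(message)


PREFIX_FREE = {"A": "0", "B": "110", "C": "10", "D": "111"}  # Example#3
AMBIGUOUS = {"A": "0", "B": "01", "C": "10"}  # hypothetical: "0" prefixes "01"

print(is_prefix_free(PREFIX_FREE))           # True
print(is_prefix_free(AMBIGUOUS))             # False
print(decode("0110010101110", PREFIX_FREE))  # ABACCDA
```

The greedy scan in `decode` is only safe for prefix-free tables; with `AMBIGUOUS`, the bits 01 could be read as B or as A followed by the start of another code.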
Optimal Encoding Scheme (1)

Symbol  Frequency
A       3
B       1
C       2
D       1

  • Find the two symbols that appear least frequently
  • These are B and D
  • Combine these two symbols into the single symbol BD
  • The frequency of this new symbol is the sum of the frequencies of its two symbols
  • The frequency of BD is 2
Optimal Encoding Scheme (2)

Symbol  Frequency
A       3
C       2
BD      2

  • Again choose the two symbols with smallest frequency
  • These are C and BD
  • Combine these two symbols into the single symbol CBD
  • The frequency of this new symbol is the sum of the frequencies of its two symbols
  • The frequency of CBD is 4
Optimal Encoding Scheme (3)

Symbol  Frequency
A       3
CBD     4

  • There are now only two symbols remaining
  • These are combined into the single symbol ACBD
  • The frequency of ACBD is 7

Symbol  Frequency
ACBD    7
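
The three combination steps can be reproduced with a short simulation on a plain list. This is only a sketch of the idea, not the heap-based method the assignment asks for: the function name `merge_steps` is an assumption, and ties between equal frequencies are broken by keeping the existing list order, which the slides do not specify.

```python
def merge_steps(frequency):
    """Repeatedly combine the two least frequent symbols; return each merge."""
    items = [(count, symbol) for symbol, count in frequency.items()]
    steps = []
    while len(items) > 1:
        # Stable sort by frequency only, so ties keep their current order
        # (an assumed tie-breaking rule).
        items.sort(key=lambda item: item[0])
        (f1, s1), (f2, s2) = items[0], items[1]
        merged = (f1 + f2, s1 + s2)
        steps.append(merged)
        items = items[2:] + [merged]
    return steps


print(merge_steps({"A": 3, "B": 1, "C": 2, "D": 1}))
# [(2, 'BD'), (4, 'CBD'), (7, 'ACBD')]
```

Sorting the whole list on every pass is simple but wasteful, which is one reason a min heap is the usual choice of priority queue here.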

Optimal Encoding Scheme (4)
  • ACBD (A and CBD)
    • assigned the codes 0 and 1
  • CBD (C and BD)
    • assigned the codes 10 and 11
  • BD (B and D)
    • assigned the codes 110 and 111
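
The same assignment can be written as a top-down walk over the combined symbols. In the sketch below the nested tuple `TREE` is an assumed representation of the splits ACBD = (A, CBD), CBD = (C, BD), BD = (B, D), and `assign_codes` is a hypothetical helper name.

```python
def assign_codes(tree, prefix=""):
    """Return {symbol: code}; a left branch adds '0', a right branch adds '1'."""
    if isinstance(tree, str):  # a leaf holds a single symbol
        return {tree: prefix}
    left, right = tree
    codes = assign_codes(left, prefix + "0")
    codes.update(assign_codes(right, prefix + "1"))
    return codes


# ACBD splits into A and CBD, CBD into C and BD, BD into B and D.
TREE = ("A", ("C", ("B", "D")))
print(assign_codes(TREE))
# {'A': '0', 'C': '10', 'B': '110', 'D': '111'}
```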
The Huffman’s Algorithm (7)

Huffman tree (each node shows its symbol and frequency; children are indented under their parent):

ACBD7
  A3
  CBD4
    C2
    BD2
      B1
      D1
The Huffman’s Algorithm (8)
  1. Build a min heap that contains a node for every symbol, with the frequency values as the keys
  2. Delete the two nodes with the smallest frequencies from the heap, concatenate the two symbols, add their frequencies, and put the resulting node back into the heap
  3. Make the two deleted nodes the children of the node of the concatenated symbol
     i.e., if s = s1 s2 is the symbol concatenated from s1 and s2, then s1 and s2 become the left child and right child of s
  4. Repeat steps 2 and 3 until only one node, the root of the Huffman tree, remains in the heap
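
The slides require a min heap as the priority queue but do not show one. A minimal array-backed sketch follows; the class name `MinHeap` and the method names `insert` and `delete_min` are assumptions made for this illustration, and a Python list stands in for the dynamic allocation the assignment asks for.

```python
class MinHeap:
    """Array-backed binary min heap of (key, value) pairs."""

    def __init__(self):
        self._items = []

    def __len__(self):
        return len(self._items)

    def insert(self, key, value):
        items = self._items
        items.append((key, value))
        child = len(items) - 1
        # Sift up while the new entry's key is smaller than its parent's.
        while child > 0:
            parent = (child - 1) // 2
            if items[child][0] >= items[parent][0]:
                break
            items[child], items[parent] = items[parent], items[child]
            child = parent

    def delete_min(self):
        items = self._items
        if not items:
            raise IndexError("delete_min from an empty heap")
        smallest = items[0]
        last = items.pop()
        if items:
            items[0] = last
            parent = 0
            # Sift down: swap with the smaller child until order is restored.
            while True:
                left, right = 2 * parent + 1, 2 * parent + 2
                least = parent
                if left < len(items) and items[left][0] < items[least][0]:
                    least = left
                if right < len(items) and items[right][0] < items[least][0]:
                    least = right
                if least == parent:
                    break
                items[parent], items[least] = items[least], items[parent]
                parent = least
        return smallest


heap = MinHeap()
for symbol, count in {"A": 3, "B": 1, "C": 2, "D": 1}.items():
    heap.insert(count, symbol)
keys = [heap.delete_min()[0] for _ in range(len(heap))]
print(keys)  # [1, 1, 2, 3]
```

Only the keys are compared, so the value stored with each key can be any object, such as a tree node.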
The Huffman’s Algorithm (9)
  • Once the Huffman tree is constructed
    • the code of any symbol can be constructed by starting at the leaf representing that symbol and climbing up to the root
    • the code is initialized to the empty string
    • each time a left branch is climbed
      • 0 is prepended to the code
    • each time a right branch is climbed
      • 1 is prepended to the code
The Huffman’s Algorithm (10)

VAR
  position[i] : a pointer to the leaf node of the ith symbol
  n : the number of symbols with nonzero frequency
  frequency[i] : the relative frequency of the ith symbol
  code[i] : the code assigned to the ith symbol
  p, p1, p2 : pointers to a min heap node or a Huffman tree node

Main Function
{
  initialization;
  count the frequency of each symbol within the message;

  // construct a node for each symbol
  for (i = 0; i < n; i++) {
    <p> = create a node with <frequency[i]>;
    position[i] = p;  // a pointer to the leaf containing the ith symbol
    insert <p> into Min heap;
  } // end for

The Huffman’s Algorithm (11)

  while (Min heap contains more than one item) {
    <p1> = delete Min heap;
    <p2> = delete Min heap;

    // combine p1 and p2 as branches of a single tree
    <p> = create a node with < info(p1) + info(p2) >;
    set <p1> to be left_child of huffman tree p;
    set <p2> to be right_child of huffman tree p;
    insert <p> into Min heap;
  } // end while

The Huffman’s Algorithm (12)

  // the tree is now constructed; use it to find codes
  <root> = delete Min heap;

  for (i = 0; i < n; i++) {
    p = position[i];
    code[i] = NULL;  // start with an empty code
    while (p != root) {
      // travel up to the root
      if (is left<p>)
        code[i] = 0 followed by code[i];
      else
        code[i] = 1 followed by code[i];
      <p> = move <p> to father node;
    } // end while
  } // end for
} // end main
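
The pseudocode above can be turned into a short runnable sketch. The version below is one possible reading, not the reference solution for the assignment: it uses Python's standard `heapq` module in place of a hand-written min heap, the names `Node` and `huffman_codes` are assumptions, and ties between equal frequencies are broken by insertion order, which the slides leave unspecified.

```python
import heapq
from collections import Counter
from itertools import count


class Node:
    """A Huffman tree node with child links and a parent link."""

    def __init__(self, symbol, frequency, left=None, right=None):
        self.symbol = symbol
        self.frequency = frequency
        self.left = left
        self.right = right
        self.parent = None
        for child in (left, right):
            if child is not None:
                child.parent = self


def huffman_codes(message):
    """Return {symbol: code} for the symbols of a non-empty message."""
    if not message:
        return {}
    order = count()  # assumed tie-breaker; also keeps heapq from comparing Nodes
    heap = []
    position = {}    # symbol -> leaf node, like position[i] in the pseudocode
    for symbol, freq in Counter(message).items():
        leaf = Node(symbol, freq)
        position[symbol] = leaf
        heapq.heappush(heap, (freq, next(order), leaf))

    # Combine the two least frequent nodes until one tree remains.
    while len(heap) > 1:
        _, _, p1 = heapq.heappop(heap)
        _, _, p2 = heapq.heappop(heap)
        p = Node(p1.symbol + p2.symbol, p1.frequency + p2.frequency, p1, p2)
        heapq.heappush(heap, (p.frequency, next(order), p))

    _, _, root = heapq.heappop(heap)

    # Climb from each leaf to the root, prepending 0 (left) or 1 (right).
    codes = {}
    for symbol, p in position.items():
        code = ""
        while p is not root:
            code = ("0" if p is p.parent.left else "1") + code
            p = p.parent
        codes[symbol] = code
    return codes


codes = huffman_codes("ABACCDA")
print(codes)  # {'A': '0', 'B': '110', 'C': '10', 'D': '111'}
print("".join(codes[s] for s in "ABACCDA"))  # 0110010101110
```

With this tie-breaking rule the result matches the table in Example#3. A message with only one distinct symbol yields an empty code for that symbol, because its leaf is already the root; a real submission would need to decide how to handle that case.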