730 likes | 866 Views
Today’s topics. Data Representation Memory organization Compression Upcoming Systems Connections Algorithms Reading Computer Science , Chapter 1-3. Main Memory Cells. Cell: A unit of main memory (typically 8 bits which is one byte)
E N D
Today’s topics Data Representation • Memory organization • Compression Upcoming • Systems • Connections • Algorithms Reading Computer Science, Chapter 1-3
Main Memory Cells • Cell: A unit of main memory (typically 8 bits which is one byte) • Most significant bit: the bit at the left (high-order) end of the conceptual row of bits in a memory cell • Least significant bit: the bit at the right (low-order) end of the conceptual row of bits in a memory cell
Main Memory Addresses • Address: A “name” that uniquely identifies one cell in the computer’s main memory • The names are actually numbers. • These numbers are assigned consecutively starting at zero. • Numbering the cells in this manner associates an order with the memory cells.
Memory Terminology • Random Access Memory (RAM): Memory in which individual cells can be easily accessed in any order • Dynamic Memory (DRAM): RAM composed of volatile memory
Measuring Memory Capacity • Kilobyte: 210 bytes = 1024 bytes • Example: 3 KB = 3 times1024 bytes • Sometimes “kibi” rather than “kilo” • Megabyte: 220 bytes = 1,048,576 bytes • Example: 3 MB = 3 times 1,048,576 bytes • Sometimes “megi” rather than “mega” • Gigabyte: 230 bytes = 1,073,741,824 bytes • Example: 3 GB = 3 times 1,073,741,824 bytes • Sometimes “gigi” rather than “giga”
Mass Storage • On-line versus off-line • Typically larger than main memory • Typically less volatile than main memory • Typically slower than main memory
Mass Storage Systems • Magnetic Systems • Disk • Tape • Optical Systems • CD • DVD • Flash Drives
Files • File: A unit of data stored in mass storage system • Fields and keyfields • Physical record versus Logical record • Buffer: A memory area used for the temporary storage of data (usually as a step in transferring the data)
Figure 1.12 Logical records versus physical records on a disk
Data Compression • Compression is a high-profile application • .zip, .mp3, .jpg, .gif, .gz, … • What property of MP3 was a significant factor in what made Napster work (why did Napster ultimately fail?) • Why do we care? • Secondary storage capacity doubles every year • Disk space fills up quickly on every computer system • More data to compress than ever before
More on Compression • What’s the difference between compression techniques? • .mp3 files and .zip files? • .gif and .jpg? • Lossless and lossy • Is it possible to compress (lossless) every file? Why? • Lossy methods • Good for pictures, video, and audio (JPEG, MPEG, etc.) • Lossless methods • Run-length encoding, Huffman, LZW, … 11 3 5 3 2 6 2 6 5 3 5 3 5 3 10
Text Compression • Input: String S • Output: String S • Shorter • S can be reconstructed from S
1 0 1 0 1 0 0 0 0 1 0 1 0 1 0 1 0 a a b b 0 1 e e c c d d Text Compression: Examples • “abcde” in the different formats • ASCII: 01100001011000100110001101100100… • Fixed: • 000001010011100 • Var: • 000110100110 • Encodings ASCII: 8 bits/character Unicode: 16 bits/character
13 7 3 3 s s * * 1 1 2 2 6 6 g g o o 4 4 3 3 3 3 2 2 2 2 p p e e h h r r 1 1 1 1 1 1 1 1 Huffman coding: go go gophers ASCII 3 bits Huffman g 103 1100111 000 00 o 111 1101111 001 01 p 112 1110000 010 1100 h 104 1101000 011 1101 e 101 1100101 100 1110 r 114 1110010 101 1111 s 115 1110011 110 101 sp. 32 1000000 111 101 • Encoding uses tree: • 0 left/1 right • How many bits? 37!! • Savings? Worth it?
Huffman Coding • D.A Huffman in early 1950’s • Before compressing data, analyze the input stream • Represent data using variable length codes • Variable length codes though Prefix codes • Each letter is assigned a codeword • Codeword is for a given letter is produced by traversing the Huffman tree • Property: No codeword produced is the prefix of another • Letters appearing frequently have short codewords, while those that appear rarely have longer ones • Huffman coding is optimal per-character coding method
Building a Huffman tree • Begin with a forest of single-node trees (leaves) • Each node/tree/leaf is weighted with character count • Node stores two values: character and count • There are n nodes in forest, n is size of alphabet? • Repeat until there is only one node left: root of tree • Remove two minimally weighted trees from forest • Create new tree with minimal trees as children, • New tree root's weight: sum of children (character ignored) • Does this process terminate? How do we get minimal trees? • Remove minimal trees, hummm……
R N C E F P U L S G T O D B A M I 11 1 2 1 2 4 2 2 5 4 1 3 5 3 3 6 3 2 Building a tree “A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS”
F A S E N G C B M U O R L P T D I 11 2 6 5 5 1 1 1 2 2 3 2 2 2 4 3 3 4 3 2 Building a tree “A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS”
L G M E N C F T S P U B R O D A I 11 2 1 1 1 3 5 6 2 5 2 3 2 2 3 2 3 3 2 4 4 3 Building a tree “A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS”
M E S F T C O P N A R L B D G U I 11 2 2 1 1 3 5 5 6 2 1 4 3 3 4 2 4 3 3 3 2 2 4 2 Building a tree “A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS”
D O N F S C T P E M R L A B G U I 11 2 2 2 1 1 6 5 5 3 2 1 4 4 3 4 2 3 4 3 4 3 3 4 2 2 Building a tree “A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS”
D N E F P C S U T M L A G B O R I 11 2 2 2 2 1 6 1 5 5 3 5 1 2 3 5 4 3 4 4 3 4 2 4 4 3 3 Building a tree “A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS”
P N F B C T U S E L M D G A O R I 11 2 2 2 2 1 6 1 5 5 3 6 1 3 4 6 2 4 2 4 4 4 4 5 5 3 3 3 Building a tree “A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS”
G F E B C P U R N S D M O A T L I 11 5 1 1 3 1 2 6 2 2 3 2 5 2 6 5 6 4 5 3 4 3 2 4 4 4 4 6 6 3 Building a tree “A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS”
U B M T O S G D L R P C A F N E I 11 2 3 8 2 2 2 2 1 3 1 1 5 5 6 3 3 3 4 8 6 4 6 6 6 5 4 5 4 4 4 2 Building a tree “A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS”
U B M T O S G D L R P C A F N E I 11 2 3 8 2 2 2 2 1 3 1 1 5 5 6 3 3 3 5 6 8 4 8 6 8 6 4 6 4 4 5 2 Building a tree “A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS”
O R T S M A B N E U F G C D P L I 11 10 10 4 2 2 1 2 1 3 2 3 5 3 5 3 6 1 6 2 3 4 4 5 5 6 6 4 8 8 6 8 8 2 Building a tree “A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS”
O L E S M N A R B T C P G U D F I 11 11 11 10 10 4 4 2 2 2 3 1 3 1 3 1 3 5 5 8 2 2 3 4 4 5 6 6 8 6 8 8 2 6 Building a tree “A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS”
O D M N S E F A L B P T U R G C I 12 11 11 10 12 10 11 1 4 1 3 1 3 2 2 3 2 2 4 3 5 5 2 4 4 5 6 8 8 6 8 8 6 2 3 Building a tree “A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS”
O G F N E S C M D A U B R T L P I 12 11 16 12 10 11 10 16 11 4 1 4 2 3 3 2 3 2 3 1 2 2 3 2 4 5 6 8 6 8 1 5 5 6 4 Building a tree “A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS”
O G F N E S C M D A U B R T L P I 16 11 21 12 21 11 10 16 12 4 1 4 2 3 3 2 3 2 3 1 2 2 3 2 4 5 6 8 6 8 1 5 5 6 4 Building a tree “A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS”
S M U A B T O E N F D C L P R G I 21 23 11 11 16 12 16 23 21 10 1 1 4 2 2 2 3 3 3 2 1 2 5 2 6 4 3 6 4 6 5 4 8 8 3 5 Building a tree “A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS”
S M U A B T O E N F D C L P R G I 21 23 11 11 37 12 16 37 23 10 1 1 4 2 2 2 3 3 3 2 1 2 5 2 6 4 3 6 4 6 5 4 8 8 3 5 Building a tree “A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS”
S M U A B T O E N F D C L P R G I 21 23 11 11 37 12 16 60 60 10 1 1 4 2 2 2 3 3 3 2 1 2 5 2 6 4 3 6 4 6 5 4 8 8 3 5 Building a tree “A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS”
Properties of Huffman coding • Want to minimize weighted path length L(T)of tree T • wi is the weight or count of each codeword i • di is the leaf corresponding to codeword i • How do we calculate character (codeword) frequencies? • Huffman coding creates pretty full bushy trees? • When would it produce a “bad” tree? • How do we produce coded compressed data from input efficiently?
O G S N E F M P C U B R T L D A I 12 11 37 60 10 21 16 23 11 4 1 4 2 3 3 2 2 3 3 1 2 2 2 3 4 5 6 8 6 8 1 5 5 6 4 Decoding a message 01100000100001001101
O G S N E F M P C U B R T L D A I 12 11 37 60 10 21 16 23 11 4 1 4 2 3 3 2 2 3 3 1 2 2 2 3 4 5 6 8 6 8 1 5 5 6 4 Decoding a message 1100000100001001101
O G S N E F M P C U B R T L D A I 12 11 37 60 10 21 16 23 11 4 1 4 2 3 3 2 2 3 3 1 2 2 2 3 4 5 6 8 6 8 1 5 5 6 4 Decoding a message 100000100001001101
O G S N E F M P C U B R T L D A I 12 11 37 60 10 21 16 23 11 4 1 4 2 3 3 2 2 3 3 1 2 2 2 3 4 5 6 8 6 8 1 5 5 6 4 Decoding a message 00000100001001101
T N C E F O U R P D G B A M S L I 60 37 23 11 21 10 11 12 16 2 4 6 4 5 5 3 2 3 1 2 3 3 1 4 5 6 8 6 8 2 1 2 2 3 4 Decoding a message 0000100001001101 G
T N C E F O U R P D G B A M S L I 60 37 23 11 21 10 11 12 16 2 4 6 4 5 5 3 2 3 1 2 3 3 1 4 5 6 8 6 8 2 1 2 2 3 4 Decoding a message 000100001001101 G
T N C E F O U R P D G B A M S L I 60 37 23 11 21 10 11 12 16 2 4 6 4 5 5 3 2 3 1 2 3 3 1 4 5 6 8 6 8 2 1 2 2 3 4 Decoding a message 00100001001101 G
T N C E F O U R P D G B A M S L I 60 37 23 11 21 10 11 12 16 2 4 6 4 5 5 3 2 3 1 2 3 3 1 4 5 6 8 6 8 2 1 2 2 3 4 Decoding a message 0100001001101 G
T N C E F O U R P D G B A M S L I 60 37 23 11 21 10 11 12 16 2 4 6 4 5 5 3 2 3 1 2 3 3 1 4 5 6 8 6 8 2 1 2 2 3 4 Decoding a message 100001001101 G