MATH 1020: Mathematics For Non-science Chapter 3.1: Information in a networked age Instructor: Dr. Ken Tsang Room E409-R9 Email: email@example.com
Transmitting Information • Binary codes • Encoding with parity-check sums • Data compression • Cryptography • Model the genetic code
The Challenges • Mathematical Challenges in the Digital Revolution • How to correct errors in data transmission • How to electronically send and store information economically • How to ensure security of transmitted data • How to improve Web search efficiency
Binary Codes • A binary code is a system for encoding data made up of 0’s and 1’s • Examples • Postnet (tall = 1, short = 0) • UPC (universal product code, dark = 1, light = 0) • Morse code (dash = 1, dot = 0) • Braille (raised bump = 1, flat surface = 0) • Yi-jing易经 (Yin=0, yang=1)
Binary Codes are Everywhere CD, MP3, and DVD players, digital TV, cell phones, the Internet, GPS system, etc. all represent data as strings of 0’s and 1’s rather than digits 0-9 and letters A-Z Whenever information needs to be digitally transmitted from one location to another, a binary code is used
Transmission Problems • What are some problems that can occur when data is transmitted from one place to another? • The two main problems are • transmission errors: the message sent is not the same as the message received • security: someone other than the intended recipient receives the message
Transmission Error Example Suppose you were looking at a newspaper ad for a job, and you see the sentence “must have bive years experience” We detect the error since we know that “bive” is not a word Can we correct the error? Why is “five” a more likely correction than “three”? Why is “five” a more likely correction than “nine”?
Another Example Suppose NASA is directing one of the Mars rovers by telling it which crater to investigate There are 16 possible signals that NASA could send, and each signal represents a different command NASA uses a 4-digit binary code to represent this information
Lost in Transmission The problem with this method is that if there is a single digit error, there is no way that the rover could detect or correct the error If the message sent was “0100” but the rover receives “1100”, the rover will never know a mistake has occurred This kind of error – called “noise” – occurs all the time
BASIC IDEA • The details of the techniques used in practice to protect information against noise are sometimes rather complicated, but the basic principles are easily understood. • The key idea is that in order to protect a message against noise, we should encode the message by adding some redundant information to it. • Then, even if the message is corrupted by noise, there will be enough redundancy in the encoded message to recover, or decode, the message completely.
Adding Redundancy to our Messages • To decrease the effects of noise, we add redundancy to our messages. • First method: repeat each digit multiple times, for example five times (0 is sent as 00000 and 1 as 11111). • The computer is then programmed to take any five-digit block received and decode it by majority rule.
Majority Rule • So, if we sent 00000, and the computer receives any of the following, it will still be decoded as 0: 00000, 10000, 01000, 00010, 00001, 11000, 10100, 10010, 10001, etc. • Notice that for the computer to decode incorrectly, at least three errors must be made.
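The five-fold repetition scheme with majority-rule decoding can be sketched as follows. This is a minimal illustration only; the function names `encode_repetition` and `decode_majority` are made up for this example and do not come from the slides.

```python
def encode_repetition(bits: str, n: int = 5) -> str:
    """Repeat every digit n times, e.g. '01' -> '0000011111'."""
    return "".join(b * n for b in bits)


def decode_majority(received: str, n: int = 5) -> str:
    """Split the received string into blocks of n digits and decode each
    block as whichever digit appears more often (majority rule)."""
    blocks = [received[i:i + n] for i in range(0, len(received), n)]
    return "".join("1" if block.count("1") > n // 2 else "0" for block in blocks)


print(encode_repetition("0"))                      # 00000
for received in ["00000", "10000", "11000", "10010"]:
    print(received, "->", decode_majority(received))   # each decodes to 0
print("11100", "->", decode_majority("11100"))         # three errors: decodes to 1
```

With two or fewer errors in a block the original digit is recovered; the last line shows that three errors are enough to fool the decoder.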
Independent Errors • Using the five-time repeats, and assuming the errors happen independently (and that any single digit is more likely to arrive correctly than not), three or more errors are less likely to occur than two or fewer. • Decoding to the message that was most likely sent is called maximum likelihood decoding.
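To see how unlikely three or more errors are, the chance of a wrong majority vote can be computed directly. The sketch below assumes each digit is flipped independently with probability p; the value p = 0.1 is a hypothetical figure chosen only for illustration, and `block_error_probability` is a made-up name.

```python
from math import comb


def block_error_probability(p: float, n: int = 5) -> float:
    """Probability that more than half of n independently transmitted
    digits are flipped, so that majority rule decodes incorrectly."""
    k_min = n // 2 + 1  # smallest number of errors that wins the vote
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))


p = 0.1  # hypothetical per-digit error probability
print(round(block_error_probability(p), 5))  # 0.00856
```

Under this assumption a single unprotected digit is wrong 10% of the time, while the five-fold repeated digit is decoded wrongly less than 1% of the time.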
Why don’t we use this? • Repetition codes have the advantage of simplicity, both for encoding and decoding • But, they are too inefficient! • In a five-fold repetition code, 80% of all transmitted information is redundant. • Can we do better? • Yes!
More Redundancy Another way to try to avoid errors is to send the same message twice This would allow the rover to detect the error, but not correct it (since it has no way of knowing if the error occurs in the first copy of the message or the second)
Parity-Check Sums • Sums of digits whose parities determine the check digits. • Even Parity – Even integers are said to have even parity. • Odd Parity – Odd integers are said to have odd parity. • Decoding • The process of translating received data into code words. • Example: Say the parity-check sums detect an error. • The received message is compared to each of the possible correct messages. This comparison uses the distance between two strings of equal length, that is, the number of positions in which the strings differ. • The one that differs in the fewest positions is chosen to replace the message in error. • In other words, the computer is programmed to automatically correct the error or choose the “closest” permissible answer.
Error Correction Over the past 40 years, mathematicians and engineers have developed sophisticated schemes to build redundancy into binary strings to correct errors in transmission! One example can be illustrated with Venn diagrams! Claude Shannon (1916-2001) “Father of Information Theory”
Computing the Check Digits Venn Diagrams [Figure: three intersecting circles whose seven regions are labeled I through VII] The original message is four digits long We will call these digits I, II, III, and IV We will add three new digits, V, VI, and VII Draw three intersecting circles as shown here Digits V, VI, and VII should be chosen so that each circle contains an even number of ones
A Hamming (7,4) code • A Hamming code of (n,k) means the message of k digits long is encoded into the code word of n digits. • The 16 possible messages: 0000 1010 0011 1111 0001 1100 1110 0010 1001 1101 0100 0110 1011 1000 0101 0111
Binary Linear Codes The error-correcting scheme we just saw is a special case of a Hamming code. These codes were first proposed in 1948 by Richard Hamming (1915-1998), a mathematician working at Bell Laboratories. Hamming was frustrated with losing a week’s worth of work due to an error that a computer could detect, but not correct.
Appending Digits to the Message The message we want to send is “0100” Digit V should be 1 so that the first circle has two ones Digit VI should be 0 so that the second circle has zero ones (zero is even!) Digit VII should be 1 so that the last circle has two ones Our message is now 0100101
Encoding those messages
Message  Codeword    Message  Codeword
0000     0000000     0110     0110010
0001     0001011     0101     0101110
0010     0010111     0011     0011100
0100     0100101     1110     1110100
1000     1000110     1101     1101000
1010     1010001     1011     1011010
1100     1100011     0111     0111001
1001     1001101     1111     1111111
Detecting and Correcting Errors Now watch what happens when there is a single digit error We transmit the message 0100101 and the rover receives 0101101 The rover can tell that the second and third circles have odd numbers of ones, but the first circle is correct So the error must be in the digit that is in the second and third circles, but not the first: that’s digit IV Since we know digit IV is wrong, there is only one way to fix it: change it from 1 to 0
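The circle check can be sketched in code. The circle memberships below are an assumption inferred from the worked examples on these slides (first circle: I, II, III, V; second: I, III, IV, VI; third: II, III, IV, VII), and the names `CIRCLES` and `correct_single_error` are made up for illustration.

```python
# Positions are 0-based: digits I..VII are indices 0..6.
# Assumed circle memberships, inferred from the worked examples.
CIRCLES = [(0, 1, 2, 4), (0, 2, 3, 5), (1, 2, 3, 6)]


def correct_single_error(word: str) -> str:
    """Find which circles contain an odd number of ones and flip the one
    digit that lies in exactly those circles (assumes at most one error)."""
    bits = [int(b) for b in word]
    odd = tuple(sum(bits[i] for i in circle) % 2 for circle in CIRCLES)
    if odd == (0, 0, 0):
        return word  # every circle is even: nothing to fix
    for pos in range(7):
        membership = tuple(int(pos in circle) for circle in CIRCLES)
        if membership == odd:
            bits[pos] ^= 1  # flip the offending digit
            break
    return "".join(map(str, bits))


print(correct_single_error("0101101"))  # 0100101
```

Each of the seven digits sits in a different combination of circles, which is why the pattern of odd circles points to exactly one digit.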
Try It! Encode the message 1110 using this method You have received the message 0011101. Find and correct the error in this message.
Extending This Idea This method only allows us to encode 4-bit messages (16 possible messages), which isn’t even enough to represent the alphabet! However, if we use more digits, we won’t be able to use the circle method to detect and correct errors We’ll have to come up with a different method that allows for more digits
Parity Check Sums The circle method is a specific example of a “parity check sum” The “parity” of a number is 1 if the number is odd and 0 if the number is even For example, digit V is 0 if I + II + III is even, and 1 if I + II + III is odd
Conventional Notation Instead of using Roman numerals, we’ll use a1 to represent the first digit of the message, a2 to represent the second digit, and so on We’ll use c1 to represent the first check digit, c2 to represent the second, etc.
Old Rules in the New Notation • Using this notation, our rules for our check digits become • c1 = 0 if a1 + a2 + a3 is even • c1 = 1 if a1 + a2 + a3 is odd • c2 = 0 if a1 + a3 + a4 is even • c2 = 1 if a1 + a3 + a4 is odd • c3 = 0 if a2 + a3 + a4 is even • c3 = 1 if a2 + a3 + a4 is odd
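These three rules translate directly into a short encoder, since "0 if even, 1 if odd" is the remainder after dividing by 2. A minimal sketch; the function name `encode_7_4` is made up for this example.

```python
def encode_7_4(message: str) -> str:
    """Append check digits c1, c2, c3 to a 4-digit binary message."""
    a1, a2, a3, a4 = (int(b) for b in message)
    c1 = (a1 + a2 + a3) % 2  # parity of a1 + a2 + a3
    c2 = (a1 + a3 + a4) % 2  # parity of a1 + a3 + a4
    c3 = (a2 + a3 + a4) % 2  # parity of a2 + a3 + a4
    return message + f"{c1}{c2}{c3}"


print(encode_7_4("0100"))  # 0100101
```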
An Alternative System If we want to have a system that has enough code words for the entire alphabet, we need to have 5 message digits: a1, a2, a3, a4, a5 We will also need more check digits to help us decode our message: c1, c2, c3, c4
Rules for the New System We can’t use the circles to determine the check digits for our new system, so we use the parity notation from before c1 is the parity of a1 + a2 + a3 + a4 c2 is the parity of a2 + a3 + a4 + a5 c3 is the parity of a1 + a2 + a4 + a5 c4 is the parity of a1 + a2 + a3 + a5
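The four rules for the 9-digit system can be sketched the same way, with each check digit computed as a sum modulo 2. The function name `encode_9_5` is made up for this example.

```python
def encode_9_5(message: str) -> str:
    """Append check digits c1..c4 to a 5-digit binary message."""
    a1, a2, a3, a4, a5 = (int(b) for b in message)
    c1 = (a1 + a2 + a3 + a4) % 2
    c2 = (a2 + a3 + a4 + a5) % 2
    c3 = (a1 + a2 + a4 + a5) % 2
    c4 = (a1 + a2 + a3 + a5) % 2
    return message + f"{c1}{c2}{c3}{c4}"


print(encode_9_5("01010"))  # 010100001
```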
Making the Code Using 5 digits in our message gives us 32 possible messages; we’ll use the first 26 to represent the letters of the alphabet On the next slide you’ll see the code itself, each letter together with the 9-digit code representing it
Using the Code Now that we have our code, using it is simple When we receive a message, we simply look it up on the table But what happens when the message we receive isn’t on the list? Then we know an error has occurred, but how do we fix it? We can’t use the circle method anymore
Beyond Circles Using this new system, how do we decode messages? Simply compare the (incorrect) message with the list of possible correct messages and pick the “closest” one What should “closest” mean? The distance between the two messages is the number of digits in which they differ
The Distance Between Messages • What is the distance between 1100101 and 1010101? • The messages differ in the 2nd and 3rd digits, so the distance is 2 • What is the distance between 1110010 and 0001100? • The messages differ in all but the 7th digit, so the distance is 6
Hamming Distance • Def: The Hamming distance between two vectors of a vector space is the number of components in which they differ, denoted d(u,v).
Hamming Distance • Ex. 1: The Hamming distance between v = [ 1 0 1 1 0 1 0 ] u = [ 0 1 1 1 1 0 0 ] d(u, v) = 4 • Notice: d(u,v) = d(v,u)
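The definition amounts to counting mismatched positions. A minimal sketch, writing the vectors as strings of 0's and 1's; `hamming_distance` is a made-up name.

```python
def hamming_distance(u: str, v: str) -> int:
    """Number of positions in which two equal-length strings differ."""
    if len(u) != len(v):
        raise ValueError("strings must have the same length")
    return sum(x != y for x, y in zip(u, v))


print(hamming_distance("1100101", "1010101"))  # 2
print(hamming_distance("1110010", "0001100"))  # 6
print(hamming_distance("1011010", "0111100"))  # 4
```

Swapping the two arguments gives the same count, which is the symmetry d(u,v) = d(v,u).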
Hamming weight of a Vector • Def: The Hamming weight of a vector is the number of nonzero components of the vector, denoted wt(u).
Hamming weight of a code • Def: The Hamming weight of a linear code is the minimum weight of any nonzero vector in the code.
Hamming Weight • The Hamming weights of v = [ 1 0 1 1 0 1 0 ] u = [ 0 1 1 1 1 0 0 ] w = [ 0 1 0 0 1 0 1 ] are: wt(v) = 4 wt(u) = 4 wt(w) = 3
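Both notions of weight can be checked with a few lines of code. The sketch below rebuilds the sixteen (7,4) code words from the check-digit rules given earlier in these slides and takes the smallest nonzero weight; the names `weight` and `encode_7_4` are made up for illustration.

```python
from itertools import product


def weight(v: str) -> int:
    """Hamming weight: the number of nonzero components."""
    return sum(ch != "0" for ch in v)


def encode_7_4(message: str) -> str:
    """(7,4) encoder using the check-digit rules from the slides."""
    a1, a2, a3, a4 = (int(b) for b in message)
    return message + f"{(a1 + a2 + a3) % 2}{(a1 + a3 + a4) % 2}{(a2 + a3 + a4) % 2}"


print(weight("1011010"), weight("0111100"), weight("0100101"))  # 4 4 3

# Weight of the code: minimum weight over all nonzero code words.
codewords = [encode_7_4("".join(bits)) for bits in product("01", repeat=4)]
code_weight = min(weight(c) for c in codewords if weight(c) > 0)
print(code_weight)  # 3
```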
Nearest-Neighbor Decoding The nearest neighbor decoding method decodes a received message as the code word that agrees with the message in the most positions
Trying it Out Suppose that, using our alphabet code, we receive the message 010100011 We can check and see that this message is not on our list How far away is it from the messages on our list?
Fixing the Error Since 010100001 was closest to the message that we received, we know that this is the most likely actual transmission We can look this corrected message up in our table and see that the transmitted message was (probably) “K” This might still be incorrect, but other errors can be corrected using context clues or check digits
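Nearest-neighbor decoding for the alphabet code can be sketched as below. The letter table itself is not reproduced in this text, so the assignment A = 00000, B = 00001, and so on in binary counting order is an assumption (it is consistent with 010100001 standing for “K”); the names `CODE` and `decode_nearest` are made up for illustration.

```python
import string


def encode_9_5(message: str) -> str:
    """9-digit encoder using the four parity rules from the slides."""
    a1, a2, a3, a4, a5 = (int(b) for b in message)
    c1 = (a1 + a2 + a3 + a4) % 2
    c2 = (a2 + a3 + a4 + a5) % 2
    c3 = (a1 + a2 + a4 + a5) % 2
    c4 = (a1 + a2 + a3 + a5) % 2
    return message + f"{c1}{c2}{c3}{c4}"


def hamming_distance(u: str, v: str) -> int:
    return sum(x != y for x, y in zip(u, v))


# Assumed letter assignment: A = 00000, B = 00001, ..., Z = 11001.
CODE = {encode_9_5(format(i, "05b")): letter
        for i, letter in enumerate(string.ascii_uppercase)}


def decode_nearest(received: str):
    """Return the closest code word and the letter it stands for."""
    best = min(CODE, key=lambda c: hamming_distance(c, received))
    return best, CODE[best]


print(decode_nearest("010100011"))  # ('010100001', 'K')
```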
Distances From 1010 110 • The distances between message “1010 110” and all possible code words:
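Such a list of distances can be generated mechanically. The sketch below assumes the code words are the sixteen (7,4) code words built from the check-digit rules earlier in these slides; `encode_7_4` and `hamming_distance` are made-up helper names.

```python
from itertools import product


def encode_7_4(message: str) -> str:
    """(7,4) encoder using the check-digit rules from the slides."""
    a1, a2, a3, a4 = (int(b) for b in message)
    return message + f"{(a1 + a2 + a3) % 2}{(a1 + a3 + a4) % 2}{(a2 + a3 + a4) % 2}"


def hamming_distance(u: str, v: str) -> int:
    return sum(x != y for x, y in zip(u, v))


received = "1010110"
codewords = [encode_7_4("".join(bits)) for bits in product("01", repeat=4)]
distances = {c: hamming_distance(c, received) for c in codewords}

for codeword, d in distances.items():
    print(codeword, d)

nearest = min(distances, key=distances.get)
print("nearest:", nearest)  # nearest: 1000110
```

Under that assumption exactly one code word lies at distance 1 from the received message, so nearest-neighbor decoding picks it.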