
Combinatorics


gsingleton


Presentation Transcript


  1. Combinatorics Problem I: How many N-bit strings contain at least 1 zero? Problem II: How many N-bit strings contain more than 1 zero? (Important to algorithm analysis.)
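A quick brute-force check of candidate answers (a sketch: the closed forms 2^N - 1 and 2^N - 1 - N come from excluding the all-ones string, and additionally the N strings with exactly one zero):

```python
# Enumerate all N-bit strings and count those with at least one / more than
# one zero, then compare against the closed-form answers for small N.
from itertools import product

def count_at_least_one_zero(n):
    return sum(1 for s in product("01", repeat=n) if s.count("0") >= 1)

def count_more_than_one_zero(n):
    return sum(1 for s in product("01", repeat=n) if s.count("0") > 1)

for n in range(1, 10):
    assert count_at_least_one_zero(n) == 2**n - 1       # all but 111...1
    assert count_more_than_one_zero(n) == 2**n - 1 - n  # also drop the n single-zero strings
```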

  2. Which areas of CS need all this? • Information Theory • Data Storage, Retrieval • Data Transmission • Encoding

  3. Data Compression Lossless compression: 25.888888888 => 25.[9]8 (the digit 8 repeated 9 times). Lossy compression: 25.888888888 => 26. Lossless compression exploits statistical redundancy. For example, in English the letter “e” is common, but “z” is not, and you never have a “q” followed by a “z”. Drawback: not universal; if there is no pattern, there is no compression. Lossy compression discards some information (e.g. JPEG images).

  4. Information (Shannon) Entropy quantifies, in the sense of an expected value, the information contained in a message. Example 1: a fair coin has an entropy of 1 [bit]. If the coin is not fair, then the uncertainty is lower (if asked to bet on the next outcome, we would bet preferentially on the most frequent result) => the Shannon entropy is lower than 1. Example 2: a long string of repeating characters: S = 0. Example 3: English text: S ~ 0.6 to 1.3. The source coding theorem: as the length of a stream of independent and identically-distributed random variable data tends to infinity, it is impossible to compress the data such that the code rate (average number of bits per symbol) is less than the Shannon entropy of the source, without information loss.

  5. Information (Shannon) Entropy Cont’d The source coding theorem: it is impossible to compress the data such that the code rate (average number of bits per symbol) is less than the Shannon entropy of the source, without information loss. (Holds in the limit of a long stream of independent and identically-distributed random data.)

  6. Problem: “Random Play” on your I-Touch works like this. When pressed once, it plays a random song from your library of N songs. The song just played is excluded from the library. Next time “Random Play” is pressed, it draws another song at random from the remaining N-1 songs. Suppose you have pressed “Random Play” k times. What is the probability you will have heard your one most favorite song? Combinatorics cont’d

  7. But first, let’s be ready to check your solution once we find it, using the “extreme case” test.


  9. The “Random Play” problem. • Tactics: get your hands dirty. • P(1) = 1/N. P(2) = ? Need to be careful: what if the song has not played 1st? What if it has? Becomes complicated for large N. Let’s compute the complement: the probability the song has NOT played. • Table of press # (k) vs. the probability the song does not play on that press: k = 1: 1 - 1/N; k = 2: 1 - 1/(N-1); …; k: 1 - 1/(N-k+1). • Key tactic: find the complementary probability P(not) = (1 - 1/N)(1 - 1/(N-1)) … (1 - 1/(N-k+1)); then P(k) = 1 - P(not). • Re-arrange: P(not) = (N-1)/N * (N-2)/(N-1) * (N-3)/(N-2) * … * (N-k)/(N-k+1) = (N-k)/N (the product telescopes). Thus, P = 1 - P(not) = k/N. • If you try to guess the solution, make sure your guess works for simple cases where the answer is obvious, e.g. k = 1, k = N. Also, P <= 1. • The very simple answer suggests that a simpler solution may be possible. Can you find it? Combinatorics cont’d
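The answer P = k/N can also be sanity-checked by simulation (a sketch; the choices N = 50, k = 10 and the trial count are arbitrary):

```python
# Monte Carlo check of P = k/N: draw k songs without replacement from a
# library of N and count how often song 0 (the favorite) is among them.
import random

def sim(N, k, trials=100_000, seed=1):
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if 0 in rng.sample(range(N), k))
    return hits / trials

estimate = sim(N=50, k=10)
assert abs(estimate - 10 / 50) < 0.01   # close to k/N = 0.2
```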

  10. The “clever” solution. [Diagram: the N slots of the shuffled play order, with the first k shaded and one slot marked “favorite”.] The favorite song is equally likely to land in any of the N positions of the random play order, and you hear it exactly when it lands among the first k positions, so P = k/N.

  11. What is DNA? • All organisms on this planet are made of the same type of genetic blueprint. • Within the cells of any organism is a substance called DNA which is a double-stranded helix of nucleotides. • DNA carries the genetic information of a cell. • This information is the code used within cells to form proteins and is the building block upon which life is formed. • Strands of DNA are long polymers of millions of linked nucleotides.

  12. Graphical Representation of inherent bonding properties of DNA

  13. DNA information content • How much data (MB?) is in the human genome?

  14. DNA in bits and bytes: • 1 bit = single “0” or “1”. That is 1 bit = 2 possibilities. • 2 bits = ?

  15. DNA in bits and bytes: • 1 bit = single “0” or “1”. That is 1 bit = 2 possibilities. • 2 bits = 2 x 2 = 4 possibilities: 00, 01, 10, 11. • 3 bits = ?

  16. DNA in bits and bytes: • 1 bit = single “0” or “1”. That is 1 bit = 2 possibilities. • 2 bits = 2 x 2 = 4 possibilities: 00, 01, 10, 11. • 3 bits = 2x2x2 = 8. • In general n bits = 2^n possibilities.

  17. DNA bits: • 4 possibilities for a single position (e.g. 1st letter) = 2 bits. That is 2 bits per ”letter”.

  18. DNA bits: • 4 possibilities for a single position (e.g. 1st letter) = 2 bits. That is 2 bits per letter.

  19. DNA bits: • 4 possibilities for a single position (e.g. 1st letter) = 2 bits. That is 2 bits per letter (bp). • A single byte = 8 bits (by definition). • That is 4 bp = 1 byte. • Human genome ~ 3×10^9 letters. • Human genome = 3×10^9 letters / 4 (letters/byte) ~ 750 MB. • Is this a lot or a little?
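The slide's arithmetic, spelled out (2 bits per base, 8 bits per byte, so 4 bases per byte):

```python
# Human genome size in bytes, from 2 bits per DNA letter.
genome_letters = 3 * 10**9
bits_per_letter = 2                      # 4 bases (A, T, G, C) -> log2(4) = 2 bits
bytes_total = genome_letters * bits_per_letter / 8
assert bytes_total == 750e6              # ~750 MB, as the slide says
```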

  20. 750 MB of DNA code • = a 15 min long movie clip (HD). • Does that make sense? • Difference between John and Jane?

  21. 750 MB of the DNA code • = a 15 min long movie clip (HD). • Does that make sense? • Difference between John and Jane? About 1% difference in DNA letters, that is about 7.5 MB of data = about 2 songs (or a single one by Adele).

  22. 750 MB of the DNA code • = a 15 min long movie clip (HD). • Difference between John and Jane? About 1% difference in DNA letters, that is about 7.5 MB of data = about 2 songs (or a single one by Adele). • Does not make sense. There must be more info stored someplace else.

  23. Problem: A DNA sequence contains only 4 letters (A, T, G and C). Short “words” made of K consecutive letters are the genetic code. Each word (called a “codon”) codes for a specific amino acid in proteins. For example, ATTTC is a 5-letter word. There are a total of 20 amino acids. Prove that a working genetic code based on a fixed K is degenerate, that is, there are amino acids which are coded for by more than one “word”. It is assumed that every word codes for an amino acid. Combinatorics Cont’d

  24. Solution. Start with a heuristic that almost always helps: simplify. Consider K = 1, K = 2, …
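Following that hint, a small loop over K makes the pigeonhole argument concrete (a sketch: with a fixed word length K there are 4^K codons, and 4^K is never exactly 20):

```python
# Fixed word length K gives 4**K possible codons.
# 4**1 = 4 and 4**2 = 16 are < 20 (too few words to cover all amino acids),
# so a working code needs K >= 3; but then 4**K >= 64 > 20, and since every
# word codes for some amino acid, by pigeonhole at least one amino acid
# must be coded by more than one word: the code is degenerate.
AMINO_ACIDS = 20
for K in range(1, 6):
    words = 4 ** K
    if words >= AMINO_ACIDS:        # enough words for a working code...
        assert words > AMINO_ACIDS  # ...but never exactly 20 => degenerate
```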

  25. How can you use the solution in an argument • For intelligent design • Against intelligent design

  26. Permutations • How many three-digit integers (decimal representation) are there if you cannot use a digit more than once?
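A brute-force check of this count against the product rule (9 choices for the nonzero first digit, then 9, then 8):

```python
# Count three-digit integers (100..999) whose three digits are all distinct.
count = sum(1 for n in range(100, 1000) if len(set(str(n))) == 3)
assert count == 9 * 9 * 8 == 648
```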

  27. P(n,r) • The number of ordered arrangements of r elements chosen from a set of n elements is given by P(n, r) = n! / (n - r)!
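A minimal sketch of this formula, cross-checked against the standard library (`math.perm`, Python 3.8+):

```python
# P(n, r) = n! / (n - r)!: ordered selections of r items out of n.
from math import factorial, perm

def P(n, r):
    return factorial(n) // factorial(n - r)

assert P(10, 3) == perm(10, 3) == 720   # 10 * 9 * 8
```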

  28. Theorem • Suppose we have n objects of k different types, with n_i identical objects of the i-th type (n_1 + n_2 + … + n_k = n). Then the number of distinct arrangements of those n objects is equal to n! / (n_1! n_2! … n_k!). Visualize: n_i identical balls of color i; total # of balls = n.

  29. Combinatorics Cont’d Example: What is the number of letter permutations of the word BOOBOO? The Mississippi formula: 6!/(2! 4!) = 15.

  30. Permutations and Combinations with Repetitions • How many distinct arrangements are there of the letters in the word MISSISSIPPI? Each of the 11 letters in this word has a unique color (the example shows only the 4 letters “I” having different colors). Each arrangement must still read the same word MISSISSIPPI.
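The Mississippi formula from the previous slide, as a small function that works for any word (a sketch):

```python
# Multinomial count of distinct letter arrangements: n! / (n_1! n_2! ... n_k!),
# where n_i is the multiplicity of the i-th distinct letter.
from math import factorial

def arrangements(word):
    counts = {}
    for ch in word:
        counts[ch] = counts.get(ch, 0) + 1
    n = factorial(len(word))
    for c in counts.values():
        n //= factorial(c)
    return n

assert arrangements("MISSISSIPPI") == 34650   # 11! / (1! 4! 4! 2!)
assert arrangements("BOOBOO") == 15           # 6! / (2! 4!), previous slide
```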

  31. The “shoot the hoops” wager. Google interview • You are offered one of two wagers: (1) you make one hoop in one throw, or (2) at least 2 hoops in 3 throws. You get $1000 if you succeed. Which wager would you choose?

  32. Warm-up Use our “test the solution” heuristic to immediately see that P(2/3) so calculated is wrong.

  33. The hoops wager. Find P(2/3): with p the probability of making a single throw, P(2/3) = 3p^2(1 - p) + p^3 = 3p^2 - 2p^3.

  34. The hoops wager. Compare P(1/1) and P(2/3) Notice: if p << 1, p is always greater than 3p^2 - 2p^3, therefore if you are a poor shot like myself, then you want to choose the 1st wager, 1 out of 1. At p = 1/2, P(1/1) = P(2/3). But at p > 1/2, 3p^2 - 2p^3 > p, and so, if you are LeBron James, you are more likely to get $1000 if you go with the 2nd wager, 2 out of 3.
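The comparison on this slide can be checked directly (a sketch; p = 0.2, 0.5, 0.8 are arbitrary sample shooting percentages):

```python
# Wager 1: P(1/1) = p.
# Wager 2: P(2/3) = exactly 2 makes or 3 makes in 3 independent throws
#          = 3*p^2*(1-p) + p^3 = 3p^2 - 2p^3.
def p_one_of_one(p):
    return p

def p_two_of_three(p):
    return 3 * p**2 * (1 - p) + p**3

assert p_one_of_one(0.2) > p_two_of_three(0.2)               # poor shot: wager 1
assert abs(p_one_of_one(0.5) - p_two_of_three(0.5)) < 1e-12  # break-even at p = 1/2
assert p_one_of_one(0.8) < p_two_of_three(0.8)               # good shot: wager 2
```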
