  1. http://www.eng.fsu.edu/~mpf EEL 4930 §6 / 5930 §5, Spring ‘06 Physical Limits of Computing Slides for a course taught by Michael P. Frank in the Department of Electrical & Computer Engineering

  2. Physical Limits of Computing Course Outline • I. Course Introduction: Moore’s Law vs. Modern Physics • II. Foundations: Required Background Material in Computing & Physics • III. Fundamentals: The Deep Relationships between Physics and Computation • IV. Core Principles: The two Revolutionary Paradigms of Physical Computation • V. Technologies Present and Future: Physical Mechanisms for the Practical Realization of Information Processing • VI. Conclusion • Currently I am working on writing up a set of course notes based on this outline, intended to someday evolve into a textbook. M. Frank, "Physical Limits of Computing"

  3. Part II. Foundations • This first part of the course quickly reviews some key background knowledge that you will need to be familiar with in order to follow the later material. • You may have seen some of this material before. • Part II is divided into two “chapters”: • Chapter II.A. The Theory of Information and Computation • Chapter II.B. Required Physics Background M. Frank, "Physical Limits of Computing"

  4. Chapter II.A. The Theory of Information and Computation • In this chapter of the course, we review a few important things that you need to know about: • §II.A.1. Combinatorics, Probability, & Statistics • §II.A.2. Information & Communication Theory • §II.A.3. The Theory of Computation M. Frank, "Physical Limits of Computing"

  5. Section II.A.2. The Theory of Information and Communication • This section is a gentle introduction to some of the basic concepts of information theory • also known as communication theory. • Sections: • (a) Basic Concepts of Information • (b) Quantifying Information • (c) Information vs. Entropy • (d) Communication Channels • Later in the course, we will describe Shannon’s famous theorems concerning the fundamental limits of channel capacity. • As well as some newer, more general quantum limits on classical and quantum communication. M. Frank, "Physical Limits of Computing"

  6. Subsection II.A.2.a: Basic Concepts of Information • Etymology of “Information” • Various Senses of “Information” • Information-Related Concepts

  7. Etymology of “Information” • Earliest historical usage in English (from Oxford English Dictionary): • The act of informing, • As in education, instruction, training. • “Five books come down from Heaven for information of mankind.” (1387) • Or a particular item of training, i.e., a particular instruction. • “Melibee had heard the great skills and reasons of Dame Prudence, and her wise informations and techniques.” (1386) • Derived by adding the action noun ending “–ation” (descended from Latin’s “–tio”) to the pre-existing verb to inform, • Meaning to give form (shape) to the mind; • to discipline, instruct, teach: • “Men so wise should go and inform their kings.” (1330) • And inform comes from Latin informare, derived from the noun forma (form), • Informare means to give form to, or to form an idea of. • Latin already contained the derived word informatio, • meaning concept or idea. • Note: The Greek words είδος (eídos) and μορφή (morphé), • Meaning form, or shape, • were famously used by Plato (& later Aristotle) in a technical philosophical sense, to denote the true identity or ideal essence of something. • We’ll see that our modern concept of physical information is not too dissimilar! M. Frank, "Physical Limits of Computing"

  8. Information: Our Definition • Information is that which distinguishes one thing (entity) from another. • It is (all or part of) an identification or description of the thing. • It is a specification of (some or all of) the thing’s properties or characteristics. • We can consider that every thing carries or embodies a complete description of itself. • It does this simply in virtue of its own being, its own existence. • In philosophy, this inherent description is called the entity’s form or constitutive essence. M. Frank, "Physical Limits of Computing"

  9. Specific Senses of Information • But, let us also take care to distinguish between the following uses of “information”: • A form or pattern of information: • An abstract configuration of information, as opposed to a specific instantiation. • Many separate instances of information contained in separate objects may have identical patterns, or content. • We may say that those instances are copies of each other. • An instance or copy of information: • A specific instantiation (i.e., as found in a specific entity) of some general form. • A holder or slot or location for storing information: • An indefinite or variable (mutable) instance of information (or place where instances may be) that may take on different forms at different times or in different situations. • A wraith (pulse? cloud?) of information: • A physical state or set of states, dynamically changing over time. • A moving, constantly-mutating instance of information, where the container of that information may even transition from one physical system to another. • A stream of information: • An indefinitely large instance of information, extended over time. • A piece of information: • All or a portion of a pattern, instance, wraith, or stream of information. • A nugget of information: • A piece of information, together with an associated semantic interpretation of that information. • A nugget is often implicitly a valuable, important fact (the meaning of a fact is a true statement). • An amount or quantity of information: • A quantification of how large a given pattern, instance, wraith, piece or nugget of information is. • Measured in logarithmic units, applied to the number of possible patterns. M. Frank, "Physical Limits of Computing"

  10. Information-related concepts • It will also be convenient to discuss the following: • A container or embodiment of information: • A physical system that contains some particular instance of, placeholder for, or pulse of information. (An embodiment contains nothing but that.) • A symbol or message: • A form or instance of information or its embodiment produced with the intent that it should convey some specific meaning, or semantic content. • A message is typically a compound object containing a number of symbols. • An interpretation or meaning of information: • A particular semantic interpretation of a form (pattern of information), tying it to potentially useful facts of interest. • May or may not be the originally intended meaning! • A representation of information: • An encoding of one pattern of information within some other (frequently larger) pattern. • The representation goes according to some particular language or code. • A subject of information: • An entity that is identified or described by a given pattern of information. • May be abstract or concrete, mathematical or physical. M. Frank, "Physical Limits of Computing"

  11. Information Concept Map [Concept-map diagram relating: Meaning (interpretation of information), Nugget (valuable piece of information), Quantity of information, Thing (subject or embodiment), Form (pattern of information), Instance (copy of a form), Wraith (dynamic body of information, changing cloud of states), and Physical entity or system; labeled relations include “has a,” “describes/identifies,” “represented by,” “interpreted to get,” “measures,” “instantiated by/in,” “forms/composes,” “contains, carries, embodies,” and “can move from one to another.”] M. Frank, "Physical Limits of Computing"

  12. Example: A Byte in a Register • The bit-sequence “01000001” is a particular form. • Suppose there is an instance of this particular pattern of bits in a certain machine register in the computer on my desk. • The register hardware is a physical system that is a container of this particular instance. • The physical subsystem delimited by the high and low ranges of voltage levels on the register’s storage nodes embodies this sequence of bits. • The register could hold other forms as well; it provides a holder that can contain an instance of any form that is a sequence of 8 bits. • When the register is erased later, the specific wraith of information that it contained will not be destroyed, but will only be released into the environment. • Although in an altered form which is scrambled beyond all hope of recognition. • The instance of 01000001 contained in the register at this moment happens to be intended to represent the letter “A” (which is another form, and is a symbol) • The meaning of this particular instance of the letter A is that a particular student’s grade in this class (say, Joe Smith’s) is an A. • The valuable nugget of information which is the fact that “Joe has an A” is also represented in this register. • The subject of this particular piece of information is Joe’s grade in the class. • The quantity of information contained in the machine register is Log[256] = 1 byte = 8 bits, because the slot could hold any of 256 different forms (bit patterns). • But in the context of my grading application, the quantity of information contained in the message “A” is only Log[5] ≈ 2.32 bits, since only 5 grades (A,B,C,D,F) are allowed. • The size of the form “A” is 8 bits in the context of the encoding being used. M. Frank, "Physical Limits of Computing"

  13. Subsection II.A.2.b: Quantifying Information • Capacity of Compound Holders • Logarithmic Information Measures • Indefinite Logarithms • Logarithmic Units

  14. Quantifying Information • One way to quantify forms of information is to try to count how many distinct ones there are. • Unfortunately, the number of all conceivable forms is infinite. • However, we can count the forms in particular finite subsets… • Consider a situation defined in such a way that a given information holder (in the context of that situation) can only take on forms that are chosen from among some definite, finite number N of possible distinct forms. • One way to try to characterize the informational size or capacity of the holder (the amount of information in it) would then be to simply specify the value of N, the number of forms it could have. • This would describe the amount of variability of its form. • However, the raw number N by itself does not seem to have the right mathematical properties to characterize the size of the holder in an intuitive way… • Intuition tells us that the capacity of holders should be additive… • E.g., it is intuitively clear that two pages of a book should be able to hold twice as much information as one page. M. Frank, "Physical Limits of Computing"

  15. Compound Holders • Consider a holder of information C that is composed by taking two separate and independent holders of information A, B, and considering them together as constituting a single compound holder of information. • Suppose now also that A has N possible forms, and that B has M possible forms. • Clearly then, due to the product rule of combinatorics, C as a whole has N·M possible distinct forms. • Each is obtained by assigning a form to A and a form to B independently. • But should the size of the holder C be the product of the sizes of A and B? • It would seem more natural to say sum, so that “the whole is the sum of the parts.” • How can we arrange for this to be true? [Diagram: Holder C, with N·M forms, composed of Holder A (N possible forms) and Holder B (M possible forms).] M. Frank, "Physical Limits of Computing"

  16. Measuring Information with Logarithms • Fortunately, we can convert the product to a sum by using logarithmic units for measuring information. • Due to the rule about logarithms that log(N·M) = log(N) + log(M). • So, if we declare that the size or capacity or amount of information in a holder of information is defined to be the logarithm of the number of different forms it can have, • Then we can say, the size of the compound holder C is the sum of the sizes of the sub-holders A and B that it comprises. • Only problem: What base do we use for the logarithm here? • Different bases would give different numeric answers. • Any base could be chosen by convention, but would be arbitrary. • A choice of a particular base amounts to choosing an information unit. • Arguably, the most elegant answer is: • Leave the base unspecified, and declare that an amount of information is not a number, but rather is a dimensioned indefinite logarithm quantity. M. Frank, "Physical Limits of Computing"
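As a quick numeric check of the additivity claim (an illustrative sketch, not from the slides; the holder sizes N = 4 and M = 8 and the choice of base-2 units are made up for the example):

```python
import math

# Two independent holders: A with N = 4 possible forms, B with M = 8.
N, M = 4, 8

cap_A = math.log2(N)         # capacity of A in bits: 2
cap_B = math.log2(M)         # capacity of B in bits: 3
cap_C = math.log2(N * M)     # compound holder C has N*M = 32 possible forms

# log(N*M) = log(N) + log(M): the whole is the sum of the parts.
assert math.isclose(cap_C, cap_A + cap_B)
print(cap_A, cap_B, cap_C)   # 2.0 3.0 5.0
```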

  17. Indefinite Logarithms • Definition. Indefinite logarithm. For any real number x>0, let the indefinite logarithm of x, written Log[x], be defined as: Log[x] = λb. log_b x (using lambda-calculus notation) • In other words, the value of Log[x] is a function object with one argument (b), where this function takes any value of the base b (>1) and returns the resulting value of log_b x. • E.g., Log[256] = λb. log_b 256 (a function of 1 argument), • So for example, Log[256](2) = 8 and Log[256](16) = 2 • Sums, negations, and scalar multiples of indefinite logarithm objects can be defined quite naturally, • by simply operating on the value of the lambda-function in the corresponding way. • See the paper “The Indefinite Logarithm, Logarithmic Units, and the Nature of Entropy” in the readings for details. M. Frank, "Physical Limits of Computing"
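A minimal Python sketch (my own illustration, not part of the course materials) of an indefinite-logarithm object: the quantity is stored as a function of the base b, and sums, negations, and scalar multiples simply operate on the lambda-function's value, as the slide describes. Evaluating the object at a particular base then amounts to choosing a unit.

```python
import math

class IndefLog:
    """An indefinite-logarithm quantity: a function of the base b > 1."""
    def __init__(self, fn):
        self.fn = fn                       # fn(b) -> value expressed in base-b units

    @classmethod
    def of(cls, x):
        """Log[x] = lambda b: log_b(x)."""
        return cls(lambda b: math.log(x, b))

    def __call__(self, b):
        return self.fn(b)

    def __add__(self, other):              # Log[x] + Log[y] = Log[x*y]
        return IndefLog(lambda b: self.fn(b) + other.fn(b))

    def __neg__(self):
        return IndefLog(lambda b: -self.fn(b))

    def __rmul__(self, c):                 # scalar multiple c * Log[x] = Log[x**c]
        return IndefLog(lambda b: c * self.fn(b))

L256 = IndefLog.of(256)
print(L256(2), L256(16))                       # 8.0 2.0
print((IndefLog.of(2) + IndefLog.of(5))(10))   # Log[10] evaluated in base 10: 1.0
```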

  18. Indefinite Logarithms as Curves • The object Log[N] can also be identified with the entire curve or graph (point set) {(b, log_b N) | b > 1}. • Note: The Log[1] curve is 0 everywhere, the Log[4] curve is twice as high as the Log[2] curve, and Log[10] is Log[2]+Log[5] and is also equal to (log_2 10)·Log[2] = 3.322·Log[2]… • In general, each curve is just some constant multiple of each other curve! [Plot: the curves Log[1], Log[2], Log[3], Log[4], and Log[10] versus the base b; larger values of the argument N give higher curves.] M. Frank, "Physical Limits of Computing"

  19. Indefinite Exponential • The inverse of the indefinite logarithm function could be called the indefinite exponential. • Definition. Indefinite exponential. Given any indefinite logarithm object L, the indefinite exponential of L, written Exp[L], is defined by Exp[L] :≡ b^L(b), where b > 0 may be any real number. • This definition is meaningful because all values of b will give the same result x, since for any b we have b^L(b) = b^(log_b x) = x, where x is the unique real number such that L = Log[x]. • Thus, Exp[Log[x]] = x and Log[Exp[L]] = L always. M. Frank, "Physical Limits of Computing"

  20. Logarithmic Quantities & Units • Theorem. Any given indefinite-logarithm quantity Log[x] is equal to a scalar multiple of any fixed indefinite-logarithm quantity, called a logarithmic unit L_u = Log[u], where u can be any standard chosen base >0, and where the scalar coefficient is log_u x. Or, in symbols, Log[x] = (log_u x)·L_u. • Example: Let the logarithmic unit b = L_2 = Log[2] be called the bit (binary digit). • Then Log[16] = 4·Log[2] = 4b (4 bits). M. Frank, "Physical Limits of Computing"

  21. Some Common Logarithmic Units • Decibel: dB = L_1.2589… = Log[10^0.1] • Binary digit or bit: b = L_2 = Log[2] • A.k.a. the octave when used to measure tonal intervals in music. • Neper or nat: n = L_e = Log[e] • A.k.a. Boltzmann’s constant k in physics, as we will see! • Octal digit: o = L_8 = 3·L_2 = Log[8] • Bel or decimal digit: d = L_10 = 10 dB = Log[10] • A.k.a. order of magnitude, power of ten, decade, Richter-scale point. • Nibble or hex digit: h = L_16 = 4·L_2 = Log[16] • Byte or octet: B = L_256 = 8·L_2 = Log[256] • Kilobit, or really kibibit: kb = Log[2^1,024] • Joule per kelvin: J/K = Log[10^(3.14558×10^22)] (roughly) • Units of physical entropy are equivalent to indefinite-logarithm units! M. Frank, "Physical Limits of Computing"

  22. Conversions Between Different Logarithmic Units • Suppose we are given a logarithmic quantity Q expressed as a multiple of some logarithmic unit L_a (that is, Q = c_a·L_a where c_a is a number), • and suppose that we wish to re-express Q in terms of some other logarithmic unit L_b, i.e., as Q = c_b·L_b. • The ratio between the two logarithmic units L_a = Log[a] and L_b = Log[b] is L_a/L_b = log_b a. • So, c_b = Q/L_b = c_a·L_a/L_b = c_a·log_b a. • And so Q = (c_a·log_b a)·L_b. • Example. How many nats are there in a byte? • The quantity to convert is 8 bits, Q = 1 B = 8 b = 8·L_2. • The answer should be in units of nats, n = L_e. • Thus, Q = (8·log_e 2)·L_e = (8·0.693) n = 5.55 nats. M. Frank, "Physical Limits of Computing"
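The conversion rule is easy to mechanize; here is a small illustrative sketch (the function name is my own):

```python
import math

def convert(c_a, a, b):
    """Re-express c_a units of Log[a] as a multiple of Log[b]: c_b = c_a * log_b(a)."""
    return c_a * math.log(a, b)

# How many nats (units of Log[e]) are in a byte (8 units of Log[2])?
nats = convert(8, 2, math.e)
print(nats)                          # ~5.545 nats
# Converting back recovers the original 8 bits.
print(convert(nats, math.e, 2))      # 8.0 bits
```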

  23. Capacity of a Holder of Information • After all of that, we can now make the following definition… • Definition. The (information) capacity C of (or the amount of information I contained in) a holder of information H that has N possible forms is defined to be the indefinite logarithm of N, that is, C_H = Log[N]. • Like any indefinite logarithm quantity, C_H can be expressed in any of the units previously discussed: bits, nats, etc. • Example. A memory chip manufacturer develops a memory technology in which each memory cell (capacitor) can reliably hold any of 16 distinguishable logic voltage levels. (For example, 0V, 0.1V, 0.2V, …, 1.5V.) Therefore, what is the information capacity of each cell (that is, of its logical macrostate, as defined by this set of levels), when expressed both in bits, and in k_B units (nats)? • Answer. N=16, so C_cell = Log[16] • Log[16] = (log_2 16)·Log[2] = 4·Log[2] = 4 bits • Log[16] = (log_e 16)·Log[e] = 2.77…·Log[e] ≈ 2.8 k_B M. Frank, "Physical Limits of Computing"

  24. Subsection II.A.2.c: Information and Entropy • Complexity of a Form • Optimal Encoding of a Form • Entropy of a Probability Distribution • Marginal & Conditional Entropy • Mutual Information

  25. Quantifying The Size of a Form • In the previous subsection, we managed to quantify the size or capacity of a holder of information, as the indefinite logarithm of the number of different forms that it could have. • But, that doesn’t tell us how we might quantify the size or complexity of a specific form, by itself. • What about saying “The size of a given form is the capacity of the smallest holder that could have that form?” • The problem with that definition is: • We can always imagine a holder constrained to only have that form and no other. • Its capacity would be Log[1] = 0. • So, with this definition the size of all forms would be 0. • Another idea: Let’s measure the size of a form in terms of the number of small, unit-size pieces of information that it takes to describe it. • To do this, we need some language that we can use to represent forms, e.g., an encoding of forms in terms of sequences of symbols. • Given a language, we can then say that the size or informational complexity K of a given form is the length of the shortest symbol string that represents it in our chosen language. • Its maximally compressed description. • At first, this definition seems pretty ambiguous, but… M. Frank, "Physical Limits of Computing"

  26. Why Complexity is Meaningful • In their algorithmic information theory, Kolmogorov and Chaitin showed that informational complexity is almost language-independent, up to a fixed (language-dependent) additive constant. • In the case of universal (Turing-complete) languages. • Also, whenever we have a probability distribution over a set of possible forms, Shannon showed us how to choose an encoding of the forms that minimizes the expected size of the codeword instance that is needed. • This choice of encoding then minimizes the expected complexity of the forms under the given distribution. • If such a probability distribution is available, we can then assume that the language has been chosen appropriately so as to minimize the expected length of the form’s shortest description. • We define this minimized expected complexity to be the entropy of the system under the given distribution. M. Frank, "Physical Limits of Computing"

  27. Definition of Entropy • The definition of entropy which we just stated is very important, and well worth repeating. • Definition. Entropy. Given a probability distribution over a set of forms, the entropy of that distribution is the expected form complexity, according to whatever encoding of forms yields the smallest expected complexity. • The definition can also be applied to define the entropy of information holders, wraiths, etc. that have given probability distributions over their possible forms. M. Frank, "Physical Limits of Computing"

  28. The Optimal Encoding • Suppose a specific form F has probability p. • Thus, it has improbability i = 1/p. • Note that this is the same probability that F would have if it were one of i equally-likely forms. • We saw earlier that a holder of information having i possible forms is characterized as containing a quantity of information Log[i]. • So, it seems reasonable to declare that the complexity K of the form F itself is, in fact, K(F) = Log[i] = Log[1/p] = −Log[p]. • This suggests that in the optimal encoding language, the description of the form F could be held in a holder of that capacity. • In his Mathematical Theory of Communication (1949) Claude Shannon showed that in fact this is exactly correct, • In an asymptotic limit where we permit ourselves to consider encodings in which many similar systems (whose forms are chosen from the same distribution) are described together. • Modern block-coding schemes (turbo codes, etc.) in fact closely approach Shannon’s ideal encoding efficiency. M. Frank, "Physical Limits of Computing"

  29. Optimal Encoding Example • Suppose a system has four forms A, B, C, D with the following probabilities: • p(A)=½, p(B)=¼, p(C)=p(D)=1/8. • Note that the probabilities sum to 1, as they must. • Then the corresponding improbabilities are: • i(A)=2, i(B)=4, i(C)=i(D)=8. • And the form sizes (log-improbabilities) are: • K(A) = Log[2] = 1 bit, K(B) = Log[4] = 2 Log[2] = 2 bits, K(C) = K(D) = Log[8] = 3 Log[2] = 3 bits. • Indeed, in this example, we can encode the forms using bit-strings of exactly these lengths, as follows: • A=0, B=10, C=110, D=111. • Note that this code is self-delimiting; • the symbols can be concatenated together without ambiguity. [Diagram: binary code tree; taking branch 0 or 1 at each node leads to the leaves A=0, B=10, C=110, D=111.] M. Frank, "Physical Limits of Computing"
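A quick Python check of this example (my own sketch, not from the slides): each codeword length equals −log2 p for its form, and the code is prefix-free, so a concatenated bit stream decodes unambiguously.

```python
import math

probs = {'A': 0.5, 'B': 0.25, 'C': 0.125, 'D': 0.125}
code  = {'A': '0', 'B': '10', 'C': '110', 'D': '111'}

# Each codeword length equals the form's log-improbability, -log2(p).
for sym, p in probs.items():
    assert len(code[sym]) == round(-math.log2(p))

def decode(bits, code):
    """Decode a concatenated bit string; works because the code is prefix-free."""
    rev, out, buf = {v: k for k, v in code.items()}, [], ''
    for bit in bits:
        buf += bit
        if buf in rev:
            out.append(rev[buf])
            buf = ''
    return ''.join(out)

print(decode('010110111', code))   # 'ABCD'
```

The expected codeword length, 0.5·1 + 0.25·2 + 2·(0.125·3) = 1.75 bits, matches the entropy of this distribution as defined on the next slide.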

  30. The Entropy Formula • Naturally, if we have a probability distribution over the possible forms F of some system (holder of information), • We can easily calculate the expected complexity K of the system’s form, which is the entropy S of the system. • This is possible since K itself is a random variable, • a function of the event that the system has a specific form F. • The entropy S of the system is then: S = E[K] = ∑_F p(F)·K(F) = ∑_F p(F)·Log[1/p(F)]. • We can also view this formula as a simple additive sum of the entropy contributions s = p·K = p·Log[1/p] arising from the individual forms. • The largest single contribution to entropy comes from individual forms that have probability p = 1/e, in which case s = Log[e]/e ≈ .531 bits. • The entropy formula is often credited to Shannon, but it was already known & was being used by Boltzmann in the 1800’s. M. Frank, "Physical Limits of Computing"
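The formula in code, as a small illustrative sketch (base-2 units; the example distribution is the one from the optimal-encoding slide above):

```python
import math

def entropy_bits(probs):
    """S = sum over forms of p * log2(1/p), in bits."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

print(entropy_bits([0.5, 0.25, 0.125, 0.125]))   # 1.75 bits

# The largest single contribution comes from a form with probability p = 1/e:
p = 1 / math.e
print(p * math.log2(1 / p))                      # ~0.531 bits
```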

  31. Visualizing the Contributions to Entropy in a Probability Distribution [Plot: the contribution s = p·Log[1/p] that a single form makes to the entropy, shown as a function of that form’s probability p.] M. Frank, "Physical Limits of Computing"

  32. Known vs. Unknown Information • We can consider the informational capacity I = Log[N] of a holder that is defined as having N possible forms as telling us the total amount of information that the holder contains. • Meanwhile, we can consider its entropy S = Log[i(f)] as telling us how much of the total information that it contains is unknown to us. • How much unknown information the holder contains, in the perspective specified by the distribution p(). • Since S ≤ I, we can also define the amount of known information (hereby dubbed extropy) contained in the holder as X = I − S. • Note that our probability distribution p() over the holder’s form could change (if we gain or lose knowledge about it), • Thus, the holder’s entropy S and extropy X may also change. • However, note that the total informational size of a given holder, I = Log[N] = X + S, always still remains a constant. • Entropy and extropy can be viewed as two forms of information, which can be converted to each other, but whose total amount is conserved. M. Frank, "Physical Limits of Computing"

  33. Information/Entropy Example • Consider a tetrahedral die which may lie on any of its 4 faces labeled 1,2,3,4: • We say that the answer to the question “Which side is down?” is a holder of information having 4 possible forms. • Thus, the total amount of information contained in this holder, and in the orientation of the physical die itself, is Log[4] = 2 bits. • Now, suppose the die is weighted so that p(1)=½, p(2)=¼, and p(3)=p(4)=1/8 for its post-throw state. • Then K(1)=1b, K(2)=2b, and K(3)=K(4)=3b. • The entropy of the holder is then S = 1.75 bits. • This much information remains unknown to us before we have taken a look at the thrown die. • The extropy (known information) is already X = 0.25 bits. • Exactly one-fourth of a bit’s worth of knowledge about the outcome is already expressed by this specific probability distribution p(). • This much information about the die’s state is already known to us even before we have looked at it. M. Frank, "Physical Limits of Computing"
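A numeric check of the die example (illustrative sketch):

```python
import math

p = {1: 0.5, 2: 0.25, 3: 0.125, 4: 0.125}

I = math.log2(len(p))                                 # total information: Log[4] = 2 bits
S = sum(q * math.log2(1 / q) for q in p.values())     # entropy (unknown information)
X = I - S                                             # extropy (known information)
print(I, S, X)                                        # 2.0 1.75 0.25
```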

  34. Holder=Variable, Form=Value, and Types of Events. • A holder corresponds to a variable V. • Also associated with a set of possible values {v1,v2,…}. • Meanwhile, a form corresponds to a value v of that variable. • A primitive event is a proposition that assigns a specific form v to a specific holder, V=v. • I.e., a specific value to a specific variable. • A compound event is a conjunctive proposition that assigns forms to multiple holders, • E.g., V=v, U=u, W=w. • A general event is a disjunctive set of primitive and/or compound events. • Essentially equivalent to a Boolean combination of assignment propositions. M. Frank, "Physical Limits of Computing"

  35. Four Concepts to Distinguish • A set corresponds to (“is” a): • System, state space, sample space, situation space, outcome space. • A partitioning of the set is a: • Subsystem, state variable, mutually exclusive & exhaustive (“mutex/ex”) set of events… • A section of the partitioning, or a subset of the set, is a: • Subsystem state, macrostate, value of variable, event, abstract proposition… • An individual element is: • System configuration, microstate, primitive event, complete outcome. M. Frank, "Physical Limits of Computing"

  36. Entropy of a Binary Variable • Below, little s of an individual form or probability denotes the contribution to the total entropy of a form with that probability. [Plot: s(p) for probabilities 0 ≤ p ≤ 1.] • Maximum s(p) = (1/e) nat = (lg e)/e bits = .531 bits @ p = 1/e = .368 M. Frank, "Physical Limits of Computing"

  37. Proof that a form with improbability e contributes the most to the entropy • Let’s find the slope of the s curve… • Take the derivative, using standard calculus: ds/dp = d/dp (p·Log[1/p]) = Log[1/p] + p·(d/dp)Log[1/p]. • But now, what’s the derivative of an indefinite logarithm quantity like Log[1/p]? • Let’s rewrite Log[1/p] as k ln(1/p) = −k ln p (where the constant k = L_e = Log[e] is the indefinite log of e), so then (d/dp)Log[1/p] = −k/p. • Plugging this into the earlier equation, we get ds/dp = Log[1/p] + p·(−k/p) = Log[1/p] − k. • Now just set this to 0 and solve for p: Log[1/p] = k = Log[e], so 1/p = e and p = 1/e. M. Frank, "Physical Limits of Computing"
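A quick numerical confirmation of the maximum (illustrative sketch):

```python
import math

s = lambda p: p * math.log2(1 / p)            # per-form entropy contribution, in bits
ps = [i / 100000 for i in range(1, 100000)]
p_best = max(ps, key=s)
print(p_best, s(p_best))                      # ~0.368 (= 1/e), ~0.531 bits
```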

  38. Joint Distributions over Two Holders • Let X, Y be two holders, each with many forms {x1, x2, …} and {y1, y2, …}. • Let xy represent the compound event X=x, Y=y. • Note: the set of all xys is a mutually exclusive and exhaustive set. • Suppose we have available a joint probability distribution p(xy) over the compound holder XY. • This then implies the reduced or marginal distributions p(x) = ∑_y p(xy) and p(y) = ∑_x p(xy). • We also thus have conditional probabilities p(x|y) and p(y|x), according to the usual definitions. • And we have mutual probability ratios r(xy) = p(xy)/(p(x)·p(y)). M. Frank, "Physical Limits of Computing"

  39. Joint, Marginal, Conditional Entropy and Mutual Information • The joint entropy S(XY) = Log[i(xy)]. • The (prior, marginal or reduced) entropy S(X) = S(p(x)) = Log[i(x)]. Likewise for S(Y). • The entropy of each subsystem, taken by itself. • Entropy is subadditive: S(XY) ≤ S(X) + S(Y). • The conditional entropy S(X|Y) = E_xy[S(p(x|y))] • The expected entropy after Y is observed. • Theorem: S(X|Y) = S(XY) − S(Y). Joint entropy minus that of Y. • The mutual information I(X:Y) = Log[r(xy)]. • We will also prove Theorem: I(X:Y) = S(X) − S(X|Y). • Thus the mutual information is the expected reduction of entropy in either subsystem as a result of observing the other. M. Frank, "Physical Limits of Computing"

  40. Conditional Entropy Theorem • The conditional entropy of X given Y is the joint entropy of XY minus the entropy of Y. • Proof sketch: S(X|Y) = E_xy[Log[1/p(x|y)]] = E_xy[Log[p(y)/p(xy)]] = E_xy[Log[1/p(xy)]] − E_xy[Log[1/p(y)]] = S(XY) − S(Y). M. Frank, "Physical Limits of Computing"

  41. Mutual Information is Mutual Reduction in Entropy • I(X:Y) = Log[r(xy)] = Log[p(xy)/(p(x)·p(y))] = S(X) + S(Y) − S(XY) = S(X) − S(X|Y). • And likewise, we also have I(X:Y) = S(Y) − S(Y|X), since the definition of I is symmetric. M. Frank, "Physical Limits of Computing"

  42. Visualization of Mutual Information • Let the total length of the bar below represent the total amount of entropy in the system XY. [Bar diagram: the full bar is the joint entropy S(XY); the segments S(X) (entropy of X) and S(Y) (entropy of Y) overlap, their overlap being the mutual information I(X:Y); the non-overlapping ends are the conditional entropies S(X|Y) and S(Y|X).] M. Frank, "Physical Limits of Computing"

  43. Example 1 • Suppose the sample space of primitive events consists of 5-bit strings B=b1b2b3b4b5. • Chosen at random with equal probability (1/32). • Let variable X=b1b2b3b4, and Y=b3b4b5. • Then S(X) = 4 bits, and S(Y) = 3 b. • Meanwhile S(XY) = 5 b. • Thus S(X|Y) = 2 b, and S(Y|X) = 1 b. • And so I(X:Y) = 2 b. M. Frank, "Physical Limits of Computing"

  44. Example 2 • Let the sample space A consist of the 8 letters {a,b,c,d,e,f,g,h}. (All equally likely.) • Let X partition A into x1={a,b,c,d} and x2={e,f,g,h}. • And Y partitions A into y1={a,b,e}, y2={c,f}, y3={d,g,h}. • Then we have: • S(X) = 1 bit. • S(Y) = 2(3/8 log 8/3) + (1/4 log 4) = 1.561278 bits • S(Y|X) = (1/2 log 2) + 2(1/4 log 4) = 1.5 bits. • I(X:Y) = 1.561278b − 1.5b = .061278 b. • S(XY) = 1b + 1.5b = 2.5 b. • S(X|Y) = 1b − .061278b = .938722 b. [Diagram: the 8 letters arranged in two rows (x1 = a b c d, x2 = e f g h), with the Y-blocks y1, y2, y3 outlined across them.] (Meanwhile, the total information content of the sample space = log 8 = 3 bits) M. Frank, "Physical Limits of Computing"
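These numbers are easy to verify mechanically; the following is my own sketch (not from the slides) that computes the marginal, joint, and conditional entropies and the mutual information directly from the two partitions:

```python
import math
from collections import Counter

letters = 'abcdefgh'                      # 8 equally likely outcomes
X = {c: 'x1' if c in 'abcd' else 'x2' for c in letters}
Y = {c: ('y1' if c in 'abe' else 'y2' if c in 'cf' else 'y3') for c in letters}

def entropy(labels):
    """Entropy in bits of the partition induced by a labeling of the outcomes."""
    counts = Counter(labels)
    n = sum(counts.values())
    return sum((c / n) * math.log2(n / c) for c in counts.values())

S_X  = entropy([X[c] for c in letters])                 # 1.0
S_Y  = entropy([Y[c] for c in letters])                 # ~1.5613
S_XY = entropy([(X[c], Y[c]) for c in letters])         # 2.5
print(S_X, S_Y, S_XY)
print('S(X|Y) =', S_XY - S_Y)                           # ~0.9387
print('S(Y|X) =', S_XY - S_X)                           # 1.5
print('I(X:Y) =', S_X + S_Y - S_XY)                     # ~0.0613
```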

  45. Effective Entropy? • In many situations, using the ideal Shannon compression may not be feasible in practice. • E.g., too few instances, block coding not available, no source model • Or, a short algorithmic description of a given form might exist, but it might be infeasible to compute it • However, given the following: • A holder with an associated set of forms • A probability distribution over the forms • A particular encoding strategy • E.g., an effective (short run-time) compression algorithm • we can define the effective entropy of the holder in this situation to be the expected compressed size of its encoded form, as compressed by the available algorithm. • This then is the definition of what the entropy can be considered to be “for practical purposes” given the capabilities in that situation. M. Frank, "Physical Limits of Computing"
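As an illustration of the idea (a sketch under my own assumptions, not a definition from the course), one could estimate an effective entropy as the expected compressed size produced by an off-the-shelf compressor such as zlib, standing in for "the available encoding strategy":

```python
import zlib, random

def effective_entropy_bits(sample_forms, encoder=lambda f: zlib.compress(f, 9)):
    """Expected compressed size (in bits) over a sample of forms drawn from the source."""
    sizes = [8 * len(encoder(form)) for form in sample_forms]
    return sum(sizes) / len(sizes)

# Toy source: 1000-byte strings that are either highly repetitive or random.
random.seed(0)
repetitive = [b'AB' * 500 for _ in range(10)]
noisy = [bytes(random.getrandbits(8) for _ in range(1000)) for _ in range(10)]
print(effective_entropy_bits(repetitive))   # small: the compressor finds the pattern
print(effective_entropy_bits(noisy))        # ~8000 bits or more: incompressible
```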

  46. Subsection II.A.2.d: Communication Channels • Shannon’s Paradigm • Channel Capacity • Shannon’s Theorems

  47. Communication Theory • Shannon’s paper “A Mathematical Theory of Communication” (1948) is the seminal work that established the field of “Communication Theory.” • a.k.a. “Information Theory.” • It deals with the theory of noiseless and noisy communication channels for transmitting messages consisting of sequences of symbols chosen from a probability distribution. • Where the “channel” can be any medium or process for communicating information through space and/or time. • Shannon proves (among other things) that every channel has a certain capacity for transmitting information, and that this capacity is related to the entropy of the source and channel probability distributions. • At rates less than the channel’s capacity, coding schemes exist that can transmit information with an arbitrarily small probability of error. M. Frank, "Physical Limits of Computing"

  48. Shannon’s Paradigm • A communication system is any system intended for communicating messages (nuggets of information) • Selected from among some set of possible messages. • Often, the set of possible messages must be astronomically large • In general, such a system will include the following six basic components: (1) Information Source, (2) Transmitter, (3) Channel, (4) Noise Source, (5) Receiver, (6) Destination. [Block diagram: Information Source → Message → Transmitter → Signal → Channel → Received Signal → Receiver → Message → Destination, with the Noise Source injecting noise into the Channel.] M. Frank, "Physical Limits of Computing"

  49. Discrete Noiseless Channels • A channel is simply any medium for the communication of signals, which carry messages. • Meaningful instances of information. • A discrete channel supports the communication of discrete signals consisting of sequences (or other kinds of patterns) made up of discrete (distinguishable) symbols. • There may be constraints on what sequences are allowed • If the channel is noiseless, we can assume that the signals are communicated exactly from transmitter to receiver. • Noisy channels will be addressed in a later part of the theory • The information transmission capacity C of a discrete noiseless channel can be defined as C = lim_{t→∞} Log[N(t)]/t, • where t is the duration of the signal (in time) and N(t) is the number of mutually distinguishable signals of duration t. • This is just the asymptotic information capacity of the channel (considered as a container of information) per unit time. M. Frank, "Physical Limits of Computing"
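For a concrete feel for this definition, here is an illustrative sketch (the constrained channel is my own example, not from the slides): a binary channel that forbids two 1s in a row. The count N(t) of allowed length-t signals satisfies a Fibonacci-like recurrence, and Log[N(t)]/t approaches log2 of the golden ratio, about 0.694 bits per symbol.

```python
import math

def count_signals(t):
    """Number of binary strings of length t containing no two consecutive 1s."""
    a, b = 2, 3            # N(1) = 2, N(2) = 3; thereafter N(t) = N(t-1) + N(t-2)
    for _ in range(t - 1):
        a, b = b, a + b
    return a

for t in (10, 100, 1000):
    print(t, math.log2(count_signals(t)) / t)   # approaches ~0.694 bits/symbol

print(math.log2((1 + math.sqrt(5)) / 2))        # log2(golden ratio) ≈ 0.6942
```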

  50. Ergodic Information Sources • In general, we can consider the information source to be producing a stream of information of unbounded length. • Even if the individual messages are short, we can always consider situations where there are unbounded sequences of such messages. • For the theory to apply, we must consider the source to be produced by an ergodic process. • This is a process for which all streams look statistically similar in the long run • In the limit of sufficiently long streams • A discrete ergodic process can be modeled by a Hidden Markov Model (HMM) with a unique stationary distribution. • An HMM is essentially just a Finite State Machine with nondeterministic transitions between states, and no input • But with output, which may be nondeterministic also • A stationary distribution is just a probability distribution over states that is a fixed point (an eigenvector with eigenvalue 1) of the HMM’s transition probability matrix. M. Frank, "Physical Limits of Computing"
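A minimal sketch of finding such a stationary distribution numerically (the 2-state transition matrix is an arbitrary illustration of my own):

```python
import numpy as np

# Row-stochastic transition matrix P: P[i, j] = probability of moving from state i to j.
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

# The stationary distribution pi satisfies pi @ P = pi, i.e. it is the
# eigenvalue-1 left eigenvector of P, normalized to sum to 1.
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmax(np.isclose(vals, 1.0))])
pi = pi / pi.sum()
print(pi)                      # ~[0.833, 0.167]
assert np.allclose(pi @ P, pi)
```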
