Exploring Basics of Information and Computation Theory

http://www.eng.fsu.edu/~mpf EEL 5930 sec. 5 / 4930 sec. 7, Spring ’05. Physical Limits of Computing. Slides for a course taught by Michael P. Frank in the Department of Electrical & Computer Engineering.

Module 2: Review of Basic Theory of Information & Computation • Probability • Information Theory • Computation Theory

Outline of this Module Topics covered in this module: • Probability and statistics: Some basic concepts • Some basic elements of information theory: • Various usages of the word “information” • Measuring information • Entropy and physical information • Some basic elements of the theory of computation: • Universality • Computational complexity • Models of computation M. Frank, "Physical Limits of Computing"

Review of Basic Probability and Statistics Background: Events, Probabilities, Product Rule, Conditional & Mutual Probabilities, Expectation, Variance, Standard Deviation

Probability • In statistics, an event E is any possible situation (occurrence, state of affairs) that might or might not be the actual situation. • The proposition P = “the event E occurred” (or will occur) could turn out to be either true or false. • The probability of an event E is a real number p in the range [0,1] which gives our degree of belief in the proposition P, i.e., the proposition that E will/did occur, where • The value p = 0 means that P is false with complete certainty, • The value p = 1 means that P is true with complete certainty, and • The value p = ½ means that the truth value of P is completely unknown. • That is, as far as we know, it is equally likely to be either true or false. • The probability p(E) is also the fraction of times that we would expect the event E to occur in a repeated experiment. • That is, on average, if the experiment could be repeated infinitely often, and if each repetition was independent of the others. • If the probability of E is p, then we would expect E to occur once for every 1/p independent repetitions of the experiment, on average. • We’ll call 1/p the improbability i of E.

Joint Probability • Let X and Y be events, and let XY denote the event that events X and Y both occur together (that is, “jointly”). • Then p(XY) is called the joint probability of X and Y. • Product rule: If X and Y are independent events, then p(XY) = p(X) · p(Y). • This follows from basic combinatorics. • It can also be considered a definition of what it means for X and Y to be independent.

Event Complements, Mutual Exclusivity, Exhaustiveness • For any event E, its complement ~E is the event that event E does not occur. • Complement rule: p(E) + p(~E) = 1. • Two events E and F are called mutually exclusive if it is impossible for E and F to occur together. • That is, p(EF) = 0. • Note that E and ~E are always mutually exclusive. • A set S = {E1, E2, …} of events is exhaustive if the event that some event in S occurs has probability 1. • Note that S = {E, ~E} is an exhaustive set of events. • Theorem: The sum of the probabilities of any exhaustive set S of mutually exclusive events is 1.

Conditional Probability • Let XY be the event that X and Y occur jointly. • Then the conditional probability of X given Y is defined by p(X|Y) :≡ p(XY) / p(Y). • It is the probability that X also occurs, given that Y occurs. • Bayes’ rule: p(X|Y) = p(X) · p(Y|X) / p(Y). [Figure: the space of possible outcomes, containing event X, event Y, and their overlap, event XY.]

Mutual Probability Ratio • The mutual probability ratio of X and Y is defined as r(XY) :≡ p(XY)/[p(X)p(Y)]. • Note that r(XY) = p(X|Y)/p(X) = p(Y|X)/p(Y). • I.e., r is the factor by which the probability of either X or Y gets boosted upon learning that the other event occurs. • WARNING: Some authors define the term “mutual probability” to be the reciprocal of our quantity r. • Don’t get confused! I call that “mutual improbability ratio.” • Note that for independent events, r = 1. • Whereas for dependent, positively correlated events, r > 1. • And for dependent, anti-correlated events, r < 1.
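
The definitions of conditional probability, Bayes’ rule, and the mutual probability ratio can be checked on a small example. The sketch below assumes a made-up joint distribution over two events X and Y (the numbers are illustrative only, not from the course) and uses exact fractions so the identities hold without rounding.

```python
from fractions import Fraction

# Hypothetical joint distribution over (X occurs?, Y occurs?).
# The four probabilities are illustrative assumptions and sum to 1.
joint = {
    (True, True): Fraction(3, 10),
    (True, False): Fraction(1, 10),
    (False, True): Fraction(2, 10),
    (False, False): Fraction(4, 10),
}

p_x = sum(p for (x, _), p in joint.items() if x)   # p(X)
p_y = sum(p for (_, y), p in joint.items() if y)   # p(Y)
p_xy = joint[(True, True)]                         # p(XY)

p_x_given_y = p_xy / p_y        # p(X|Y) := p(XY) / p(Y)
p_y_given_x = p_xy / p_x        # p(Y|X) := p(XY) / p(X)
r = p_xy / (p_x * p_y)          # r(XY) := p(XY) / [p(X) p(Y)]

print("p(X|Y) =", p_x_given_y)
print("p(Y|X) =", p_y_given_x)
print("r(XY)  =", r)
```

Here r comes out greater than 1, so under these assumed numbers X and Y are positively correlated: learning that one occurred boosts the probability of the other by the factor r.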

Expectation Values • Let S be an exhaustive set of mutually exclusive events Ei. • This is sometimes known as a “sample space.” • Let f(Ei) be any function of the events in S. • This is sometimes called a “random variable.” • The expectation value or “expected value” or norm of f, written Ex[f] or ⟨f⟩, is just the mean or average value of f(Ei), as weighted by the probabilities of the events Ei. • WARNING: The “expected value” may actually be quite unexpected, or even impossible to occur! • It’s not the ordinary English meaning of the word “expected.” • Expected values combine linearly: Ex[a·f + g] = a·Ex[f] + Ex[g].

Variance & Standard Deviation • The variance of a random variable f is σ²(f) = Ex[(f − Ex[f])²]. • The expected value of the squared deviation of f from the norm. (The squaring makes it non-negative.) • The standard deviation or root-mean-square (RMS) difference of f from its mean is σ(f) = [σ²(f)]^(1/2). • This is usually comparable, in absolute magnitude, to a typical value of f − Ex[f].
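
As a minimal sketch of these definitions, the code below computes Ex[f], σ²(f), and σ(f) for an assumed example: a fair six-sided die, with f equal to the face value. The die and the helper name `expectation` are illustrative choices, not part of the course material.

```python
from fractions import Fraction
import math

# Assumed sample space: the six faces of a fair die, each with probability 1/6.
probs = {face: Fraction(1, 6) for face in range(1, 7)}

def expectation(f, probs):
    """Ex[f]: the average of f(E_i), weighted by the probabilities p(E_i)."""
    return sum(p * f(event) for event, p in probs.items())

mean = expectation(lambda e: e, probs)                     # Ex[f]
variance = expectation(lambda e: (e - mean) ** 2, probs)   # Ex[(f - Ex[f])^2]
std_dev = math.sqrt(variance)                              # sigma(f)

print("Ex[f]    =", mean)
print("variance =", variance)
print("std dev  =", round(std_dev, 4))
```

The mean is 7/2, which is not a face of the die, so this also illustrates the warning that an “expected value” may be impossible to occur.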

The Theory of Information: Some Basic Concepts • Basic Information Concepts • Quantifying Information • Information and Entropy

Etymology of “Information” • Earliest historical usage in English (from Oxford English Dictionary): • The act of informing, • As in education, instruction, training. • “Five books come down from Heaven for information of mankind.” (1387) • Or a particular item of training, i.e., a particular instruction. • “Melibee had heard the great skills and reasons of Dame Prudence, and her wise informations and techniques.” (1386) • Derived by adding the action noun ending “–ation” (descended from Latin’s “–tio”) to the pre-existing verb to inform, • Meaning to give form (shape) to the mind; • to discipline, instruct, teach: • “Men so wise should go and inform their kings.” (1330) • And inform comes from Latin informare, derived from noun forma (form), • Informare means to give form to, or to form an idea of. • Latin also even already contained the derived word informatio, • meaning concept or idea. • Note: The Greek words είδος (eídos) and μορφή (morphé), • Meaning form, or shape, • were famously used by Plato (& later Aristotle) in a technical philosophical sense, to denote the true identity or ideal essence of something. • We’ll see that our modern concept of physical information is not too dissimilar!

Information: Our Definition • Information is that which distinguishes one thing (entity) from another. • It is all or part of an identification or description of the thing. • A specification of some or all of its properties or characteristics. • We can say that every thing carries or embodies a complete description of itself. • Simply in virtue of its own being; this is called the entity’s form or constitutive essence. • But, let us also take care to distinguish between the following: • A nugget of information (for lack of a better phrase): • A specific instantiation (i.e., as found in a specific entity) of some general form. • A cloud or stream of information: • A physical state or set of states, dynamically changing over time. • A form or pattern of information: • An abstract pattern of information, as opposed to a specific instantiation. • Many separate nuggets of information contained in separate objects may have identical patterns, or content. • We may say that those nuggets are copies of each other. • An amount or quantity of information: • A quantification of how large a given nugget, cloud, or pattern of information is. • Measured in logarithmic units, applied to the number of possible patterns.

Information-related concepts • It will also be convenient to discuss the following: • An embodiment of information: • The physical system that contains some particular nugget or cloud of information. • A symbol or message: • A nugget of information or its embodiment produced with the intent that it should convey some specific meaning, or semantic content. • A message is typically a compound object containing a number of symbols. • An interpretation of information: • A particular semantic interpretation of a form (pattern of information), tying it to potentially useful facts of interest. • May or may not be the intended meaning! • A representation of information: • An encoding of one pattern of information within some other (frequently larger) pattern. • According to some particular language or code. • A subject of information: • An entity that is identified or described by a given pattern of information. • May be abstract or concrete, mathematical or physical

Information Concept Map [Figure: concept map. Nodes: Meaning (interpretation of information); Quantity of information; Thing (subject or embodiment); Form (pattern of information); Cloud (dynamic body of information); Physical entity; Nugget (instance of a form). Edge labels: “Describes, identifies”; “Represented by”; “Interpreted to get”; “Measures size of”; “Measures”; “May be a”; “Instantiated by/in”; “Instantiates, has”; “Forms, composes”; “Contains, carries, embodies”; “Has a changing”.]

Quantifying Information • One way to quantify forms is to try to count how many distinct ones there are. • The number of all conceivable forms is not finite. • However… • Consider a situation defined in such a way that a given nugget (in the context of that situation) can only take on some definite number N of possible distinct forms. • One way to try to characterize the size of the nugget is then to specify the value of N. • This describes the amount of variability of its form. • However, N by itself does not seem to have the right mathematical properties to be used to describe the informational size of the nugget…

Compound Nuggets • Consider a nugget of information C formed by taking two separate and independent nuggets of information A, B, and considering them together as constituting a single compound nugget of information. • Suppose now also that A has N possible forms, and that B has M possible forms. • Clearly then, due to the product rule of combinatorics, C has N·M possible distinct forms. • Each is obtained by assigning a form to A and a form to B independently. • Would the size of the nugget C then be the product of the sizes of A and B? • It would seem more natural to say sum, so that “the whole is the sum of the parts.” [Figure: nugget C, with N·M forms, containing nugget A (N possible forms) and nugget B (M possible forms).]

Information & Logarithmic Units • We can convert the product to a sum by using logarithmic units. • Let us then define the informational size I of (or amount of information contained in) a nugget of information that has N possible forms as being the indefinite logarithm of N, that is, as I = log N. • With an unspecified base for the logarithm. • We can interpret indefinite-logarithm values as being inherently dimensional (not dimensionless pure-number) quantities. • Any numeric result is always (explicitly or implicitly) paired with a unit [log b] which is associated with the base b of the logarithm that is used. • The unit [log 2] is called the bit, the unit [log 10] is the decade or bel, [log 16] is sometimes called a nybble, and [log 256] is the byte. • Whereas, the unit [log e] (most widely used in physics) is called the nat. • The nat is also expressed as Boltzmann’s constant kB (e.g. in Joules/K) • A.k.a. the ideal gas constant R (frequently expressed in kcal/mol/K) • Unit conversion (log unit = number × log unit): [log a] = (log_b a) [log b] = (log_c a) [log c].
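
The additivity of logarithmic sizes and the conversion between log units can be sketched numerically. The function name `info_size` and the sizes N = 16 and M = 256 below are illustrative assumptions.

```python
import math

def info_size(n_forms, base=2.0):
    """Informational size I = log N, as a number of [log base] units."""
    return math.log(n_forms) / math.log(base)

N, M = 16, 256                 # assumed numbers of forms of nuggets A and B
size_a = info_size(N)          # size of A in bits
size_b = info_size(M)          # size of B in bits
size_c = info_size(N * M)      # compound nugget C has N*M forms

# The same quantity expressed in other logarithmic units.
size_c_nats = info_size(N * M, math.e)
size_c_nybbles = info_size(N * M, 16)
size_c_bytes = info_size(N * M, 256)

print(f"A: {size_a:.3f} bits, B: {size_b:.3f} bits, C: {size_c:.3f} bits")
print(f"C: {size_c_nats:.3f} nats = {size_c_nybbles:.3f} nybbles "
      f"= {size_c_bytes:.3f} bytes")
```

The number of forms multiplies (N·M) while the sizes add, which is the reason for measuring information logarithmically; changing the base only rescales the number in front of the unit.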

The Size of a Form • Suppose that in some situation, a given nugget has N possible forms. • Then the size of the nugget is I = log N. • Can we also say that this is the size of each of the nugget’s possible forms? • In a way, but we have to be a little bit careful. • We distinguish between two concepts: • The actual size I = log N of each form. • That is, given how the situation is described. • The entropy or compressed size S of each form. • Which we are about to define.

The Entropy of a Form • How can we measure the compressed size of an abstract form? • For this, we need a language that we can use to represent forms using concrete nuggets of linguistic information whose size we can measure. • We then say that the compressed size or entropy S of a form is the size of the smallest nugget of information representing it in our language. (Its most compressed description.) • At first, this seems pretty ambiguous, but… • In their algorithmic information theory, Kolmogorov and Chaitin showed that this quantity is even almost language-independent. • It is invariant up to a language-dependent additive constant. • That is, among computationally universal (Turing-complete) languages. • Also, whenever we have a probability distribution over forms, Shannon shows us how to choose an encoding that minimizes the expected size of the codeword nugget that is needed. • If a probability distribution is available, we assume a language chosen to minimize the expected size of the nugget representing the form. • We define the compressed size or entropy of the form to be the size of its description in this optimal language.

The Optimal Encoding • Suppose a specific form F has probability p. • Thus, improbability i = 1/p. • Note that this is the same probability that F would have if it were one of i equally-likely forms. • We saw earlier that a nugget of information having i possible forms is characterized as containing a quantity of information I = log i. • And the “actual” size of each form in that situation is the same, I. • If all forms are equally likely, their average compressed size can’t be any less. • So, it seems reasonable to declare that the compressed size S of a form F with probability p is the same as its actual size in this situation, that is, S(F) = log i = log (1/p) = −log p. • This suggests that in the optimal encoding language, the description of the form F would be represented in a nugget of that size. • In his Mathematical Theory of Communication (1949) Claude Shannon showed that in fact this is exactly correct, • So long as we permit ourselves to consider encodings in which many similar systems (whose forms are chosen from the same distribution) are described together. • Modern block-coding schemes in fact closely approach Shannon’s ideal encoding efficiency.

Optimal Encoding Example • Suppose a system has four forms A, B, C, D with the following probabilities: • p(A)=½, p(B)=¼, p(C)=p(D)=1/8. • Note that the probabilities sum to 1, as they must. • Then the corresponding improbabilities are: • i(A)=2, i(B)=4, i(C)=i(D)=8. • And the form sizes (log-improbabilities) are: • S(A) = log 2 = 1 bit, S(B) = log 4 = 2 log 2 = 2 bits, S(C) = S(D) = log 8 = 3 log 2 = 3 bits. • Indeed, in this example, we can encode the forms using bit-strings of exactly these lengths, as follows: • A=0, B=10, C=110, D=111. • Note that this code is self-delimiting; • the codewords can be concatenated together without ambiguity. [Figure: binary code tree. From the root, branch 0 ends at A and branch 1 continues; at the next node, branch 0 ends at B and branch 1 continues; at the last node, branch 0 ends at C and branch 1 ends at D.]
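
The self-delimiting property of this code can be demonstrated with a short sketch. The probabilities and codewords are the ones from the example above; the function names `encode` and `decode` and the sample message are illustrative assumptions.

```python
import math

# Probabilities and codewords taken from the example above.
probs = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}
code = {"A": "0", "B": "10", "C": "110", "D": "111"}

def encode(forms):
    """Concatenate the codewords for a sequence of forms."""
    return "".join(code[f] for f in forms)

def decode(bits):
    """Split a bit-string back into forms. This works because no codeword
    is a prefix of another, i.e., the code is self-delimiting."""
    inverse = {word: form for form, word in code.items()}
    forms, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in inverse:
            forms.append(inverse[buffer])
            buffer = ""
    if buffer:
        raise ValueError("bit-string ends in the middle of a codeword")
    return "".join(forms)

# Each codeword length should equal the form's log-improbability in bits.
sizes = {f: math.log2(1 / p) for f, p in probs.items()}
expected_length = sum(p * len(code[f]) for f, p in probs.items())

message = "ABACD"                      # arbitrary sample message
bits = encode(message)
print(bits, "->", decode(bits))
print("expected codeword length:", expected_length, "bits")
```

Decoding needs no separators between codewords, and the expected codeword length equals the probability-weighted average of the form sizes.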

Entropy Content of a Nugget • Naturally, if we have a probability distribution over the possible forms F of a nugget, • We can easily calculate the expected entropy S (expected compressed size) of the nugget’s form. • This is possible since S itself is a random variable, • a function of the event that the system has a specific form F. • The expected entropy S of the nugget’s form is then: S = Ex[S(F)] = ∑_F p(F) log i(F) = −∑_F p(F) log p(F). (Note the “−”!) • We usually drop the “expected,” and just call this the amount of entropy S contained in the nugget. • It is really the expected compressed size of the nugget.

Visualizing Boltzmann-Gibbs-Shannon Statistical Entropy

Known vs. Unknown Information • We can consider the informational size I = log N of a nugget that has N forms as telling us the total amount of information that the nugget contains. • Meanwhile, we can consider its entropy S = Ex[log i(f)] as telling us how much of the total information that it contains is unknown to us. • In the perspective specified by the distribution p(). • Since S ≤ I, we can also define the amount of known information (or extropy) in the nugget as X = I − S. • Note that our probability distribution p() over the nugget’s form could change (if we gain or lose knowledge about it), • Thus, the nugget’s entropy S and extropy X may also change. • However, note that the total informational size of a given nugget, I = log N = X + S, always still remains a constant. • Entropy and extropy can be viewed as two forms of information, which can be converted to each other, but whose total amount is conserved.

Information/Entropy Example • Consider a tetrahedral die which may lie on any of its 4 faces labeled 1,2,3,4: • We say that the answer to the question “Which side is up?” is a nugget of information having 4 possible forms. • Thus, the total amount of information contained in this nugget, and in the orientation of the physical die itself, is [log 4] = [2 bits]. • Now, suppose the die is weighted so that p(1)=½, p(2)=¼, and p(3)=p(4)=1/8 for its post-throw state. • Then S(1)=1b, S(2)=2b, and S(3)=S(4)=3b. • The expected entropy is then S = 1.75 bits. • This much information remains unknown before the die is thrown. • The extropy (known information) is then X = 0.25 bits. • Exactly one-fourth of a bit’s worth of knowledge about the outcome is already expressed by this specific probability distribution p().
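
The numbers in this example can be reproduced with a few lines of code. This is a minimal sketch in which the variable names I, S, and X follow the slide notation (total information, entropy, extropy); nothing else is assumed beyond the stated probabilities.

```python
import math

# Post-throw probabilities of the weighted tetrahedral die from the example.
probs = {1: 0.5, 2: 0.25, 3: 0.125, 4: 0.125}

# Total information: log of the number of possible forms, in bits.
I = math.log2(len(probs))

# Size of each form (its log-improbability), in bits.
form_size = {face: math.log2(1 / p) for face, p in probs.items()}

# Entropy: the expected form size, i.e. -sum p log p.
S = sum(p * form_size[face] for face, p in probs.items())

# Extropy: the known part of the total information.
X = I - S

print("I =", I, "bits; S =", S, "bits; X =", X, "bits")
```

If the die were fair, every form would have size 2 bits, S would equal I, and the extropy would be zero.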

Nugget=Variable, Form=Value, and Types of Events. • A nugget basically means a variable V. • Also associated with a set of possible values {v1,v2,…}. • Meanwhile, a form is basically a value v. • A primitive event is a proposition that assigns a specific form v to a specific nugget, V=v. • I.e., a specific value to a specific variable. • A compound event is a conjunctive proposition that assigns forms to multiple nuggets, • E.g., V=v, U=u, W=w. • A general event is a disjunctive set of primitive and/or compound events. • Essentially equivalent to a Boolean combination of assignment propositions.

Entropy of a Binary Variable • Below, little s of an individual form or probability denotes the contribution to the total entropy of a form with that probability, s(p) = −p log p. • Maximum s(p) = (1/e) nat = (lg e)/e bits ≈ 0.531 bits, at p = 1/e ≈ 0.368. [Figure: plot of the entropy of a binary variable and of the per-form contribution s(p) as functions of p.]
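
The stated maximum of the per-form contribution can be checked numerically. The sketch below assumes s(p) = −p log₂ p (in bits) and that the total entropy of a two-valued variable is s(p) + s(1 − p); the grid search is only a numerical check, not a proof.

```python
import math

def s(p):
    """Contribution -p*log2(p), in bits, of a form with probability p."""
    return 0.0 if p == 0 else -p * math.log2(p)

def binary_entropy(p):
    """Total entropy of a two-valued variable with probabilities p and 1-p."""
    return s(p) + s(1 - p)

# Locate the maximum of s on a fine grid of probabilities.
grid = [k / 10000 for k in range(10001)]
p_max = max(grid, key=s)

print("argmax of s(p) ~", p_max, " (1/e =", round(1 / math.e, 4), ")")
print("max of s(p)    ~", round(s(p_max), 4), "bits")
print("binary entropy at p = 1/2:", binary_entropy(0.5), "bit")
```

Note that the single-form contribution peaks at p = 1/e, while the total entropy of the binary variable peaks at p = ½, where it equals 1 bit.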

Joint Distributions over Two Nuggets • Let X, Y be two nuggets, each with many forms {x1, x2, …} and {y1, y2, …}. • Let xy represent the compound event X=x, Y=y. • Note: all xys are mutually exclusive and exhaustive. • Suppose we have available a joint probability distribution p(xy) over the nuggets X and Y. • This then implies the reduced or marginal distributions p(x) = ∑_y p(xy) and p(y) = ∑_x p(xy). • We also thus have conditional probabilities p(x|y) and p(y|x), according to the usual definitions. • And we have mutual probability ratios r(xy).

Joint, Marginal, Conditional Entropy and Mutual Information • The joint entropy S(XY) = Ex[log i(xy)]. • The (prior, marginal or reduced) entropy S(X) = S(p(x)) = Ex[log i(x)]. Likewise for S(Y). • The entropy of each nugget, taken by itself. • Entropy is subadditive: S(XY) ≤ S(X) + S(Y). • The conditional entropy S(X|Y) = Ex_y[S(p(x|y))]. • The expected entropy after Y is observed. • Theorem: S(X|Y) = S(XY) − S(Y). Joint entropy minus that of Y. • The mutual information I(X:Y) = Ex[log r(xy)]. • We will prove Theorem: I(X:Y) = S(X) − S(X|Y). • Thus the mutual information is the expected reduction of entropy in either variable as a result of observing the other.

Conditional Entropy Theorem • The conditional entropy of X given Y is the joint entropy of XY minus the entropy of Y: S(X|Y) = S(XY) − S(Y).
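
One way to see why the theorem holds is the following short derivation. It is a sketch that assumes the definitions given earlier (conditional entropy as the expectation over y of the entropy of p(x|y), and p(x|y) = p(xy)/p(y)), with sums running over all forms x, y that have nonzero probability.

```latex
\begin{align*}
S(X|Y) &= \sum_y p(y)\, S\big(p(x|y)\big)
        = -\sum_y p(y) \sum_x p(x|y) \log p(x|y) \\
       &= -\sum_{x,y} p(xy) \log \frac{p(xy)}{p(y)} \\
       &= -\sum_{x,y} p(xy) \log p(xy) + \sum_{y} p(y) \log p(y) \\
       &= S(XY) - S(Y).
\end{align*}
```

The third line uses the marginal p(y) = ∑_x p(xy) to collapse the sum over x in the second term.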

Mutual Information is Mutual Reduction in Entropy • I(X:Y) = S(X) − S(X|Y). • And likewise, we also have I(X:Y) = S(Y) − S(Y|X), since the definition is symmetric. • Also, I(X:Y) = S(X) + S(Y) − S(XY).
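
These identities follow from the definition I(X:Y) = Ex[log r(xy)] in a few steps. The derivation below is a sketch under the earlier definitions of r(xy), the marginals, and the conditional entropy theorem S(X|Y) = S(XY) − S(Y); sums run over forms with nonzero probability.

```latex
\begin{align*}
I(X{:}Y) &= \sum_{x,y} p(xy) \log r(xy)
          = \sum_{x,y} p(xy) \log \frac{p(x|y)}{p(x)} \\
         &= -\sum_{x} p(x) \log p(x) + \sum_{x,y} p(xy) \log p(x|y) \\
         &= S(X) - S(X|Y) \\
         &= S(X) + S(Y) - S(XY).
\end{align*}
```

Since r(xy) is symmetric in x and y, exchanging the roles of X and Y in the same steps gives I(X:Y) = S(Y) − S(Y|X).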

Visualization of Mutual Information • Let the total length of the bar below represent the total amount of entropy in the system XY. [Figure: a bar of total length S(XY), the joint entropy of X and Y, divided into three segments: S(X|Y), the conditional entropy of X given Y; the mutual information I(X:Y); and S(Y|X), the conditional entropy of Y given X. S(X), the entropy of X, spans the first two segments, and S(Y), the entropy of Y, spans the last two.]

Example 1 • Suppose the sample space of primitive events consists of 5-bit strings B=b1b2b3b4b5. • Chosen at random with equal probability (1/32). • Let variable X=b1b2b3b4, and Y=b3b4b5. • Then S(X) = 4 bits, and S(Y) = 3 b. • Meanwhile S(XY) = 5 b. • Thus S(X|Y) = 2 b, and S(Y|X) = 1 b. • And so I(X:Y) = 2 b.
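
The answers to this example (4, 3, 5, 2, 1, and 2 bits) can be confirmed by brute-force enumeration of all 32 strings. In this sketch the helper `entropy_bits` is an illustrative name; it assumes every primitive outcome in the list is equally likely.

```python
import math
from collections import Counter
from itertools import product

def entropy_bits(labels):
    """Entropy in bits of a variable, given its value on each of a list
    of equally likely primitive outcomes."""
    counts = Counter(labels)
    total = len(labels)
    return sum((c / total) * math.log2(total / c) for c in counts.values())

strings = list(product("01", repeat=5))   # all 5-bit strings b1..b5
xs = [b[0:4] for b in strings]            # X = b1 b2 b3 b4
ys = [b[2:5] for b in strings]            # Y = b3 b4 b5

S_X = entropy_bits(xs)
S_Y = entropy_bits(ys)
S_XY = entropy_bits(list(zip(xs, ys)))
S_X_given_Y = S_XY - S_Y                  # conditional entropy theorem
S_Y_given_X = S_XY - S_X
I_XY = S_X + S_Y - S_XY                   # mutual information

print(S_X, S_Y, S_XY, S_X_given_Y, S_Y_given_X, I_XY)
```

The mutual information of 2 bits corresponds to the two bits b3 and b4 that X and Y share.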

Example 2 • Let the sample space A consist of the 8 letters {a,b,c,d,e,f,g,h}. (All equally likely.) • Let X partition A into x1={a,b,c,d} and x2={e,f,g,h}. • & Y partitions A into y1={a,b,e}, y2={c,f}, y3={d,g,h}. • Then we have: • S(X) = 1 bit. • S(Y) = 2(3/8 log 8/3) + (1/4 log 4) = 1.561278 bits. • S(Y|X) = (1/2 log 2) + 2(1/4 log 4) = 1.5 bits. • I(X:Y) = 1.561278b − 1.5b = .061278 b. • S(XY) = 1b + 1.5b = 2.5 b. • S(X|Y) = 1b − .061278b = .938722 b. • (Meanwhile, the total information content of the sample space = log 8 = 3 bits.) [Figure: the eight letters arranged in two rows, a b c d and e f g h, with the X and Y partitions marked.]
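
The figures in this example can be recomputed directly from the two partitions. As before, this is a sketch with an illustrative helper `entropy_bits` that assumes equally likely primitive outcomes; the dictionaries `x_of` and `y_of` simply record which block of each partition a letter falls in.

```python
import math
from collections import Counter

def entropy_bits(labels):
    """Entropy in bits of a variable, given its value on each of a list
    of equally likely primitive outcomes."""
    counts = Counter(labels)
    total = len(labels)
    return sum((c / total) * math.log2(total / c) for c in counts.values())

letters = "abcdefgh"
# X partitions the letters into x1 = {a,b,c,d} and x2 = {e,f,g,h}.
x_of = {ch: ("x1" if ch in "abcd" else "x2") for ch in letters}
# Y partitions them into y1 = {a,b,e}, y2 = {c,f}, y3 = {d,g,h}.
y_of = {ch: name
        for name, block in {"y1": "abe", "y2": "cf", "y3": "dgh"}.items()
        for ch in block}

S_X = entropy_bits([x_of[ch] for ch in letters])
S_Y = entropy_bits([y_of[ch] for ch in letters])
S_XY = entropy_bits([(x_of[ch], y_of[ch]) for ch in letters])
S_Y_given_X = S_XY - S_X
S_X_given_Y = S_XY - S_Y
I_XY = S_X + S_Y - S_XY

print(f"S(X)={S_X:.6f}  S(Y)={S_Y:.6f}  S(XY)={S_XY:.6f}")
print(f"S(Y|X)={S_Y_given_X:.6f}  S(X|Y)={S_X_given_Y:.6f}  I(X:Y)={I_XY:.6f}")
```

Here the joint entropy (2.5 bits) is less than the 3 bits of the full sample space because X and Y together do not distinguish a from b, or g from h.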

Physical Information • Now, physical information is simply information that is contained in the state of a physical system or subsystem. • We may speak of a holder, pattern, amount, subject, embodiment, meaning, cloud or representation of physical information, as with information in general. • Note that all information that we can manipulate ultimately must be (or be represented by) physical information! • So long as we are stuck in the physical universe! • In our quantum-mechanical universe, there are two very different categories of physical information: • Quantum information is all the information that is embodied in the quantum state of a physical system. • Unfortunately, it can’t all be measured or copied! • Classical information is just a piece of information that picks out a particular measured state, once a “basis for measurement” is already given. • It’s the kind of information that we’re used to thinking about.

Objective Entropy? • In all of this, we have defined entropy as a somewhat subjective or relative quantity: • Entropy of a subsystem depends on an observer’s state of knowledge about that subsystem, such as a probability distribution. • Wait a minute… Doesn’t physics have a more objective, observer-independent definition of entropy? • Only insofar as there are “preferred” states of knowledge that are most readily achieved in the lab. • E.g., knowing of a gas only its chemical composition, temperature, pressure, volume, and number of molecules. • Since such knowledge is practically difficult to improve upon using present-day macroscale tools, it serves as a uniform standard. • However, in nanoscale systems, a significant fraction of the physical information that is present in one subsystem is subject to being known, or not, by another subsystem (depending on design). • Thus, how a nanosystem is designed & how we deal with information recorded at the nanoscale may vastly affect how much of the nanosystem’s internal physical information effectively is or is not entropy (for practical purposes).

Entropy in Compound Systems • When modeling a compound system C having at least two subsystems A and B, we can adopt either of (at least) two different perspectives: • The external perspective where we treat AB as a single system, and we (as modelers) have some probability distribution over its states. • This allows us to derive an entropy for the whole system. • The internal perspective in which we imagine putting ourselves “in the shoes” of one of the subsystems (say A), and considering its “state of knowledge” about B. • A may have more knowledge about B than we do. • We’ll see how to make the total expected entropy come out the same in both perspectives!

Beyond Statistical Entropy

Extra Slides (omitted from talk for lack of time)

Information Content of a Physical System • The (total amount of) information content I(A) of an abstract physical system A is the unknown information content of the mathematical object D used to define A. • If D is (or implies) only a set S of (assumed equiprobable) states, then we have: I(A) = U(S) = log |S|. • If D implies a probability distribution ℘:S over a set S (of distinguishable states), then: I(A) = U(℘:S) = −∑_i ℘_i log ℘_i. • We would expect to gain I(A) information if we measured A (using basis set S) to find its exact actual state s ∈ S. • Thus we say that amount I(A) of information is contained in A. • Note that the information content depends on how broad (how abstract) the system’s description D is!

Information Capacity & Entropy • The information capacity of a system is also the amount of information about the actual state of the system that we do not know, given only the system’s definition. • It is the amount of physical information that we can say is in the state of the system. • It is the amount of uncertainty we have about the state of the system, if we know only the system’s definition. • It is also the quantity that is traditionally known as the (maximum) entropy S of the system. • Entropy was originally defined as the ratio of heat to temperature. • The importance of this quantity in thermodynamics (the observed fact that it never decreases) was first noticed by Rudolf Clausius in 1850. • Today we know that entropy is, physically, really nothing other than (unknown, incompressible) information!

Known vs. Unknown Information • We, as modelers, define what we mean by “the system” in question using some abstract description D. • This implies some information content I(A) for the abstract system A described by D. • But, we will often wish to model a scenario in which some entity E (perhaps ourselves) has more knowledge about the system A than is implied by its definition. • E.g., scenarios in which E has prepared A more specifically, or has measured some of its properties. • Such E will generally have a more specific description of A and thus would quote a lower resulting I(A) or entropy. • We can capture this by distinguishing the information in A that is known by E from that which is unknown. • Let us now see how to do this a little more formally.

Subsystems (More Generally) • For a system A defined by a state set S, • any partition P of S into subsets can be considered a subsystem B of A. • The subsets in the partition P can be considered the “states” of the subsystem B. [Figure: a state set S partitioned in two different ways, labeled “One subsystem of A” and “Another subsystem of A.”] • In this example, the product of the two partitions forms a partition of S into singleton sets. We say that this is a complete set of subsystems of A. • In this example, the two subsystems are also independent.

Pieces of Information • For an abstract system A defined by a state set S, any subset T ⊆ S is a possible piece of information about A. • Namely it is the information “The actual state of A is some member of this set T.” • For an abstract system A defined by a probability distribution ℘:S, any probability distribution ℘′:S such that ℘ = 0 → ℘′ = 0 and U(℘′) < U(℘) is another possible piece of information about A. • That is, any distribution that is consistent with and more informative than A’s very definition.

Known Physical Information • Within any universe (closed physical system) W described by distribution ℘, we say entity E (a subsystem of W) knows a piece P of the physical information contained in system A (another subsystem of W) iff ℘ implies a correlation between the state of E and the state of A, and this correlation is meaningfully accessible to E. • Let us now see how to make this definition more precise. [Figure: the universe W, containing the entity (knower) E and the physical system A, linked by a correlation.]

What is a correlation, anyway? • A concept from statistics: • Two abstract systems A and B are correlated or interdependent when the entropy S(AB) of the combined system is less than S(A) + S(B). • I.e., something is known about the combined state of AB that cannot be represented as knowledge about the state of either A or B by itself. • E.g., A and B each have 2 possible states, 0 and 1. • They each have 1 bit of entropy. • But, we might also know that A=B, so the entropy of AB is 1 bit, not 2. (States 00 and 11.)

Known Information, More Formally • For a system defined by probability distribution ℘ that includes two subsystems A, B with respective state variables X, Y having mutual information I_℘(X:Y), • The total information content of B is I(B) = U(℘_Y). • The amount of information in B that is known by A is K_A(B) = I_℘(X:Y). • The amount of information in B that is unknown by A is U_A(B) = U(℘_Y) − K_A(B) = S(Y) − I(X:Y) = S(Y|X). • The amount of entropy in B from A’s perspective is S_A(B) = U_A(B) = S(Y|X). • These definitions are based on all the correlations that are present between A and B according to our global knowledge ℘. • However, a real entity A may not know, understand, or be able to utilize all the correlations that are actually present between it and B. • Therefore, generally more of B’s physical information will be effectively entropy, from A’s perspective, than is implied by this definition. • We will explore some corrections to this definition later. • Later, we will also see how to sensibly extend this definition to the quantum context.
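
These definitions can be applied to the earlier two-state example in which A = B is known. The sketch below assumes the joint distribution puts probability ½ on each of the states 00 and 11; the helper names `entropy_bits` and `marginal` are illustrative, and K_A(B) and U_A(B) are written `K_A_of_B` and `U_A_of_B`.

```python
import math
from collections import defaultdict

def entropy_bits(dist):
    """Entropy in bits of a distribution given as {outcome: probability}."""
    return sum(p * math.log2(1 / p) for p in dist.values() if p > 0)

def marginal(joint, index):
    """Marginal distribution of one component of a joint distribution."""
    out = defaultdict(float)
    for state, p in joint.items():
        out[state[index]] += p
    return dict(out)

# Assumed global knowledge: states (x, y) of (A, B), with A = B.
joint = {(0, 0): 0.5, (1, 1): 0.5}

S_X = entropy_bits(marginal(joint, 0))   # entropy of A taken alone
S_Y = entropy_bits(marginal(joint, 1))   # I(B) = U(p_Y), entropy of B alone
S_XY = entropy_bits(joint)               # entropy of the combined system AB

K_A_of_B = S_X + S_Y - S_XY              # known by A: mutual information I(X:Y)
U_A_of_B = S_Y - K_A_of_B                # unknown by A: S(Y|X)

print("S(A) =", S_X, " S(B) =", S_Y, " S(AB) =", S_XY)
print("K_A(B) =", K_A_of_B, " U_A(B) =", U_A_of_B)
```

Because S(AB) is 1 bit rather than 2, the systems are correlated, and under this idealized definition all of B’s single bit counts as known by A.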

Maximum Entropy vs. Entropy • Total information content I = maximum entropy S_max = logarithm of the number of states consistent with the system’s definition. • Unknown information U_A = entropy S_A (as seen by observer A). • Known information K_A = I − U_A = S_max − S_A (as seen by observer A). • Unknown information U_B = entropy S_B (as seen by observer B). [Figure: bar diagram comparing the total information content with the known and unknown portions as seen by observers A and B.]
