
Module #6 – Fundamental Physical Limits of Computing



Presentation Transcript


  1. Module #6 – Fundamental Physical Limits of Computing A Brief Survey

  2. Fundamental Physical Limits of Computing [Diagram relating thoroughly confirmed physical theories (e.g., Theory of Relativity, Quantum Theory, 2nd Law of Thermodynamics, Gravity), the universal facts they imply (e.g., Speed-of-Light Limit, Uncertainty Principle, Definition of Energy, Reversibility, Adiabatic Theorem), and the affected quantities in information processing: Communications Latency, Information Capacity, Information Bandwidth, Memory Access Times, Processing Rate, Energy Loss per Operation.]

  3. A Slightly More Detailed View

  4. The Speed-of-Light Limit on Information Propagation Velocity • What are its implications for future computer architectures?

  5. Implications for Computing • Minimum communications latency! • Minimum memory-access latency! • Need for processing-in-memory architectures! • Mesh-type topologies are optimally scalable! • Hillis, Vitanyi, Bilardi & Preparata • Together with the 3-dimensionality of space, this implies: • No network topology with ω(n³) connectivity (# of nodes reachable in n hops) is scalable! • Meshes with 2-3 dimensions are optimally scalable. • The precise number depends on reversible computing theory!

  6. How Bad it Is, Already • Consider a 3.2 GHz processor (off today’s shelf) • In 1 cycle, a signal can propagate at most: • c/(3.2 GHz) = 9.4 cm • For a 1-cycle round trip to cache memory & back: • The cache can be at most 4.7 cm away! • Electrical signals travel at ~0.5c in typical materials • In practice, a 1-cycle memory can be at most 2.34 cm away! • There are already logics in labs running at 100 GHz speeds! • E.g., RSFQ superconducting logic technology • 1-cycle round trips only within 1.5 mm! • Much smaller than a typical chip diameter! • As f increases, architectures must become increasingly local.
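
A quick sanity check of these numbers (a sketch in Python; the 0.5c figure is the slide's assumed signal speed in typical materials):

```python
c = 3.0e8  # speed of light in vacuum, m/s

def reach_per_cycle(f_hz, signal_speed=c):
    """One-way distance (m) a signal at `signal_speed` can cover in one clock period."""
    return signal_speed / f_hz

for f_hz, label in [(3.2e9, "3.2 GHz CPU"), (100e9, "100 GHz RSFQ logic")]:
    one_way = reach_per_cycle(f_hz)
    round_trip = one_way / 2          # a 1-cycle round trip halves the reachable distance
    practical = round_trip * 0.5      # ~0.5c signal speed in typical materials
    print(f"{label}: {one_way*100:.2f} cm/cycle, "
          f"round-trip limit {round_trip*100:.2f} cm, practical ~{practical*100:.2f} cm")
```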

  7. Latency Scaling w. Memory Size • The average time to randomly access any one of n bits of storage (accessible information) scales as Θ(n^(1/3)). • This will remain true in all future technologies! • Quantum mechanics gives a minimum size for bits • esp. assuming temperature & pressure are limited. • Thus n bits require a Θ(n)-volume region of space. • The minimum diameter of this region is Θ(n^(1/3)). • At lightspeed, random access takes Θ(n^(1/3)) time! • Assuming a non-negative-curvature region of spacetime. • Of course, specific memory technologies (or a suite of available technologies) may scale even worse than this! [Figure: a region holding n bits, of diameter Θ(n^(1/3))]
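
To see the cube-root scaling concretely, here is a rough sketch; the per-bit volume below is a hypothetical round number, not a figure from the slides:

```python
import math

c = 3.0e8            # m/s
BIT_VOLUME = 1e-27   # m^3 per bit -- hypothetical (1 nm)^3 cell, not a value from the slides

def min_access_time(n_bits, bit_volume=BIT_VOLUME):
    """Lower bound on random-access latency: radius of the smallest sphere
    holding n bits, divided by the speed of light -- Theta(n^(1/3))."""
    volume = n_bits * bit_volume
    radius = (3 * volume / (4 * math.pi)) ** (1 / 3)
    return radius / c

for n in (1e12, 1e15, 1e18, 1e21):
    print(f"n = {n:.0e} bits  ->  latency >= {min_access_time(n)*1e12:.2f} ps")
```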

  8. Scalability & Maximal Scalability • A multiprocessor architecture & accompanying performance model is scalable if: • it can be “scaled up” to arbitrarily large problem sizes, and/or arbitrarily large numbers of processors, without the predictions of the performance model breaking down. • An architecture (& model) is maximally scalable for a given problem if • it is scalable, and if no other scalable architecture can claim asymptotically superior performance on that problem • It is universally maximally scalable (UMS) if it is maximally scalable on all problems! • I will briefly mention some characteristics of architectures that are universally maximally scalable

  9. Shared Memory isn’t Scalable • Any implementation of shared memory requires communication between nodes. • As the # of nodes increases, we get: • extra contention for any shared BW • increased latency (inevitably). • We can hide communication delays to a limited extent, by latency hiding: • find other work to do during the latency delay slot. • But the amount of “other work” available is limited by node storage capacity, parallelizability of the set of running applications, etc.

  10. Unit-Time Message Passing Isn’t Scalable • Model: any node can pass a message to any other in a single constant-time interval (independent of the total number of nodes) • Same scaling problems as shared memory! • Even if we assume BW contention (traffic) isn’t a problem, the unit-time assumption is still a problem. • Not possible for all N, given the speed-of-light limit! • At minimum, Ω(N^(1/3)) time is needed asymptotically.

  11. Many Interconnect Topologies aren’t Scalable • Suppose we don’t require that a node can talk to any other in 1 time unit, but only to selected others. • Some such schemes still have scalability problems, e.g.: • hypercubes • binary trees, fat trees • butterfly networks • Any topology in which the number of unit-time hops needed to reach any one of N nodes grows more slowly than N^(1/3) is necessarily doomed to failure! • Caveat: except in negative-curvature spacetimes!

  12. Only Meshes (or subgraphs thereof) Are Scalable • See papers by Hillis, Vitanyi, Bilardi & Preparata • 1-D meshes • linear chain, ring, star (w. fixed # of arms) • 2-D meshes • square grid, hex grid, cylinder, 2-sphere, 2-torus, … • 3-D meshes • crystal-like lattices w. various symmetries • Caveat: • Scalability in the 3rd dimension is limited by energy/information I/O considerations! • Amorphous arrangements in 3D, with local communications, are also OK.

  13. Ideally Scalable Architectures • Claim: A 2- or 3-D mesh multiprocessor with a fixed-size memory hierarchy per node is an optimal scalable computer-system design (for any application). [Figure: a mesh interconnection network of processing nodes, each containing a CPU and a local memory hierarchy of optimal fixed size.]

  14. Landauer’s Principle • Low-level physics is reversible • i.e., the time-evolution of a state is bijective • deterministic looking backwards in time • as well as forwards • Physical information (like energy) is conserved • It cannot be created or destroyed, only reversibly rearranged and modified • This implies the 2nd Law of Thermodynamics: • Entropy (unknown info.) in a closed, unmeasured system can only increase (as we lose track of the state) • Irreversible bit “erasure” really just moves the bit into the surroundings, increasing entropy & heat

  15. Scaling in the 3rd Dimension? • Computing based on ordinary irreversible bit operations only scales in 3D up to a point. • Discarded information & the associated energy must be removed through the surface, and the energy flux is limited. • Even a single layer of circuitry in a high-performance CPU can barely be kept cool today! • Computing with reversible, adiabatic operations does better: • It scales in 3D, up to a point, • then with the square root of further increases in thickness, up to a point (it scales in 2.5 dimensions!) • It scales to a much larger thickness than irreversible computing!

  16. Universal Maximum Scalability • Existence proof for universally maximally scalable (UMS) architectures: • Physics itself is a universal maximally scalable “architecture” because any real computer is merely a special case of a physical system. • Obviously, no restricted class of real computers can beat the performance scalability of physical systems in general. • Unfortunately, physics doesn’t give us a very simple or convenient programming model. • Comprehensive expertise at “programming physics” means mastery of all physical engineering disciplines: chemical, electrical, mechanical, optical, etc. • We’d like an easier programming model than this!

  17. Simpler UMS Architectures • (I propose) any practical UMS architecture will have the following features: • Processing elements characterized by constant parameters (independent of # of processors) • Makes it easy to scale multiprocessors to large capacities. • Mesh-type message-passing interconnection network, arbitrarily scalable in 2 dimensions • w. limited scalability in 3rd dimension. • Processing elements that can be operated in an arbitrarily reversible way, at least, up to a point. • Enables improved 3-d scalability in a limited regime • (In long term) Have capability for quantum-coherent operation, for extra perf. on some probs.

  18. Limits on Amount of Information Content

  19. Some Quantities of Interest • We would like to know if there are limits on: • Information density • = Bits per unit volume • Affects physical size and thus propagation delay across memories and processors. Also affects cost. • Information flux • = Bits per unit area per unit time • Affects cross-sectional bandwidth, data I/O rates, rates of standard-information input & effective-entropy removal • Rate of computation • = Number of distinguishable-state changes per unit time • Affects rate of information processing achievable in individual devices

  20. Bit Density: No classical limit • In classical (continuum) physics, even a single particle has a real-valued position + momentum • All such states are considered physically distinct • Each position & momentum coordinate in general requires an infinite string of digits to specify: • x = 4.592181291845019587661625618991009… meters • p = 2.393492301938881726153514427394001… kg m/s • Even the smallest system contains an infinite amount of information! ⇒ No limit to bit density. • This picture is the basis for various analog computing models studied by some theoreticians. • Wee problem: Classical physics is dead wrong!

  21. The Quantum “Continuum” • In QM, there are still uncountably many describable states (mathematically possible wavefunctions) • It can theoretically take infinite info. to describe one • But not all this info has physical relevance! • States are only physically distinguishable when their state vectors are orthogonal. • States that are only indistinguishably different can only lead to indistinguishably different consequences (resulting states) • due to the linearity of quantum physics • There is no physical consequence from presuming an infinite # of bits in one’s wavefunction!

  22. Quantum Particle-in-a-Box • Uncountably many continuous wavefunctions? • No: we can express the wave as a vector over countably many orthogonal normal modes. • Fourier transform • High-frequency modes have higher energy (E = hf); a limit on average energy implies they have low probability.

  23. Ways of Counting States The entire field of quantum statistical mechanics is all about this, but here are some simple ways: • For a system w. a constant # of particles: • # of states = numerical volume of the position-momentum configuration space (phase space) • When measured in units where h=1. • Exactly approached in the macroscopic limit. • Unfortunately, # of particles is not usually constant! • Quantum field theory bounds: • Smith-Lloyd bound. Still ignores gravity. • General relativistic bounds: • Bekenstein bound, holographic bound.

  24. Smith-Lloyd Bound [Smith ’95, Lloyd ’00] • Based on counting modes of quantum fields. • S = entropy, M = mass (E = Mc²), V = volume, q = number of distinct particle types • The bound scales as S ∝ (q V E³)^(1/4). • Lloyd’s bound is tighter than Smith’s by a constant factor. • Note: • Maximum entropy density scales with only the 3/4 power of mass-energy density! • E.g., increasing entropy density by a factor of 1,000 requires increasing energy density by 10,000×.

  25. Whence this scaling relation? • Note that in the field-theory limit, S ∝ E^(3/4). • Where does the 3/4 power come from? • Consider a typical mode in the field spectrum, with wavelength λ • Note that the minimum size of a given wavelet is ~ its wavelength λ. • # of distinguishable wave-packet location states in a given volume ∝ 1/λ³ • Each such state carries just a little entropy • the occupation number of that state (# of photons in it) • ∝ 1/λ³ particles, each with energy ∝ 1/λ ⇒ energy ∝ 1/λ⁴ • S ∝ 1/λ³ and E ∝ 1/λ⁴ ⇒ S ∝ E^(3/4)
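
The same argument, restated compactly (λ = typical wavelength of a mode):

```latex
% Compact restatement of the counting argument above
\begin{align*}
  S &\propto \frac{V}{\lambda^{3}}
      && \text{(each distinguishable wave-packet location carries $\sim$1 unit of entropy)} \\
  E &\propto \frac{V}{\lambda^{3}} \cdot \frac{1}{\lambda} = \frac{V}{\lambda^{4}}
      && \text{(each quantum carries energy $\propto 1/\lambda$)} \\
  \Rightarrow\quad S &\propto E^{3/4} \quad \text{at fixed volume } V.
\end{align*}
```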

  26. Whence the distribution? • Could the use of more particles (with less energy per particle) yield greater entropy? • What frequency spectrum (power level, or particle-number density as a function of frequency) gives the largest # of states? • Note there is a minimum particle energy in a finite-sized box. • No: the Smith-Lloyd bound is based on the blackbody radiation spectrum. • We know this spectrum has the maximum info content among abstract states, b/c it’s the equilibrium state! • Empirically verified in hot ovens, etc.

  27. Examples w. Smith-Lloyd Bound • For systems at the density of water (1 g/cm³), composed only of photons: • Smith’s example: a 1 m³ box holds 6×10³⁴ bits • = 60 kb/Å³ • Lloyd’s example: a 1-liter “ultimate laptop”, 2×10³¹ b • = 21 kb/Å³ • Pretty high, but what’s wrong with this picture? • The example requires very high temperature + pressure! • Temperature around 1/2 billion Kelvins!! • Photonic pressure on the order of 10¹⁶ psi!! • “Like a miniature piece of the big bang.” -Lloyd • Probably not feasible to implement any time soon!

  28. More Normal Temperatures • Let’s pick a more reasonable temperature: 1356 K (the melting point of copper): • The entropy density of light is then only ~0.74 bits/μm³! • Less than the bit density in a DRAM today! • The bit size is comparable to the average wavelength of the optical-frequency light emitted by melting copper • Lesson: photons are not a viable nanoscale information-storage medium at ordinary temperatures. • They simply aren’t dense enough! • CPUs that do logic with optical photons can’t have their logic devices packed very densely.

  29. Entropy Density of Solids • Can easily be calculated from standard empirical thermochemical data. • E.g., see the CRC Handbook of Chemistry & Physics. • Obtain entropy by integrating heat capacity ÷ temperature as temperature increases… • Example result, for copper: • It has one of the highest entropy densities among pure elements at atmospheric pressure. • @ room temperature: 6 bits/atom, 0.5 b/Å³ • At its boiling point: 1.5 b/Å³ • (~10¹²× denser than its light! Related to conductivity?) • Cesium has one of the highest #bits/atom at room temperature, about 15. • But only 0.13 b/Å³ • Lithium has a high #bits/mass, 0.7 bits/amu.

  30. General-Relativistic Bounds • Note: the Smith-Lloyd bound does not take into account the effects of general relativity. • Earlier bound from Bekenstein, derived from black-hole physics: S ≤ 2πER/(ℏc) nats, where E = total energy of the system and R = radius of the system (minimum enclosing sphere) • The limit is only attained by black holes! • Black holes have 1/4 nat of entropy per square Planck length of surface (event-horizon) area. • Absolute minimum size of a nat: a square 2 Planck lengths on a side • A 1-m-radius black hole (mass ≈ Saturn’s) has an average entropy density of 4×10³⁹ b/Å³!
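
A numerical check of the side note, using standard values of the constants (entropy taken as A/(4ℓP²) nats):

```python
import math

G   = 6.674e-11     # gravitational constant, m^3 kg^-1 s^-2
c   = 2.998e8       # speed of light, m/s
l_P = 1.616e-35     # Planck length, m

r = 1.0                                  # horizon radius, m
mass = r * c**2 / (2 * G)                # from the Schwarzschild relation r = 2GM/c^2
area = 4 * math.pi * r**2
entropy_nats = area / (4 * l_P**2)       # 1/4 nat per square Planck length
entropy_bits = entropy_nats / math.log(2)
volume = (4 / 3) * math.pi * r**3
bits_per_angstrom3 = entropy_bits / volume * 1e-30   # 1 A^3 = 1e-30 m^3

print(f"mass ~ {mass:.1e} kg   (Saturn ~ 5.7e26 kg)")
print(f"entropy ~ {entropy_bits:.1e} bits, average density ~ {bits_per_angstrom3:.0e} b/A^3")
```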

  31. The Holographic Bound • Based on the Bekenstein black-hole bound. • The information content I within any surface of area A (independent of its energy content!) is I ≤ A/(2ℓP)² nats • ℓP is the Planck length (see the lecture on units) • This implies that any 3D object (of any size) is completely definable via a flat (2D) “hologram” on its surface having Planck-scale resolution. • This information is all entropy only in the case of a black hole with event horizon = that surface.

  32. Holographic Bound Example • The age of the universe is 13.7 Gyr ±1% [WMAP]. • The radius of the observed part would thus be 13.7 Glyr… • But, due to expansion, it is actually closer to 40 Glyr today. • The universe is “flat,” so Euclidean formulas apply: • The surface area of the observable universe is: A = 4πr² = 4π(40 Glyr)² = 1.80×10⁵⁴ m² • The volume of the observable universe is: V = (4/3)πr³ = (4/3)π(40 Glyr)³ = 2.27×10⁸⁰ m³ • Now we can calculate the universe’s total info. content, and its average information density: • I = A/(4ℓP²) = (πr²/ℓP²) nats = 1.72×10¹²³ nats = 2.49×10¹²³ b • I/V = 1.10×10⁴³ b/m³ ≈ 0.01 b/fm³ ≈ 1 b/(4.5 fm)³ • A proton is ~1 fm in radius, so this is within a couple of orders of magnitude of one bit per proton volume!
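
The same arithmetic as a short script (it takes the slide's ~40 Glyr radius and flat geometry as given):

```python
import math

l_P = 1.616e-35          # Planck length, m
lyr = 9.461e15           # one light-year, m
r   = 40e9 * lyr         # ~40 Glyr comoving radius, m (as assumed on the slide)

area      = 4 * math.pi * r**2
volume    = (4 / 3) * math.pi * r**3
info_nats = area / (4 * l_P**2)          # holographic bound
info_bits = info_nats / math.log(2)

print(f"A   = {area:.2e} m^2,  V = {volume:.2e} m^3")
print(f"I   = {info_nats:.2e} nats = {info_bits:.2e} bits")
print(f"I/V = {info_bits / volume:.2e} b/m^3 = {info_bits / volume * 1e-45:.3f} b/fm^3")
```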

  33. Do Black Holes Destroy Information? • Currently, it seems that no one completely understands exactly how information is preserved during black hole accretion, for later re-emission in the Hawking radiation. • Perhaps via infinite time dilation at event horizon? • Some researchers have claimed that black holes must be doing something irreversible in their interior (destroying information). • However, the arguments for this may not be valid. • Recent string theory calculations contradict this claim. • The issue seems not yet fully resolved, but I have many references on it if you’re interested.

  34. Implications of Info. Density Limits • There is a minimum size for a bit-device. • thus there is a minimum communication latency to randomly access a memory containing n bits • as we discussed earlier. • There is also a minimum cost per bit, if there is a minimum cost per unit of matter/energy. • Implications for communications bandwidth limits… • coming up

  35. Some Quantities of Interest • We would like to know if there are limits on: • Information density • = Bits per unit volume • Affects physical size and thus propagation delay across memories and processors. Also affects cost. • Information flux • = Bits per unit area per unit time • Affects cross-sectional bandwidth, data I/O rates, rates of standard-information input & effective entropy removal • Rate of computation • = Number of distinguishable-state changes per unit time • Affects rate of information processing achievable in individual devices

  36. Communication Limits • Latency (propagation-time delay) limit from earlier, due to speed of light. • Teaches us scalable interconnection technologies • Bandwidth (information rate) limits: • Classical information-theory limit (Shannon) • Limit, per-channel, given signal bandwidth & SNR. • Limits based on field theory (Smith/Lloyd) • Limit given only area and power. • Applies to I/O, cross-sectional bandwidths in parallel machines, and entropy removal rates.

  37. Hartley-Shannon Law • The maximum information rate (capacity) of a single wave-based communication channel is: C = B log₂(1 + S/N) • B = bandwidth of the channel, in frequency units • S = signal power level • N = noise power level • The law is not sufficiently powerful for our purposes! • It does not tell us how many effective channels are possible, given available power and/or area. • It does not give us any limit if we are allowed to increase bandwidth or decrease noise arbitrarily.
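
A small illustration of the law; the bandwidth and SNR values here are hypothetical, chosen only to show the formula in use:

```python
import math

def shannon_capacity(bandwidth_hz, snr_linear):
    """Hartley-Shannon capacity C = B * log2(1 + S/N), in bits per second."""
    return bandwidth_hz * math.log2(1 + snr_linear)

# Hypothetical channel: 1 GHz of bandwidth at 30 dB SNR (S/N = 1000)
print(f"C = {shannon_capacity(1e9, 1000) / 1e9:.2f} Gbps")
# C grows without bound as B or S/N grows, which is why this law alone
# gives no absolute limit -- that needs the field-theory bounds on the later slides.
```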

  38. Density & Flux • Note that any time you have: • a limit ρ on the density (per volume) of something, • & a limit v on its propagation velocity, • this automatically implies: • a limit F = ρv on the flux • by which I mean the amount per unit time per unit area • Note also that we always have the limit c on velocity! • At speeds near c we must account for relativistic effects • Slower velocities v < c may also be relevant: • electron saturation velocity, in various materials • velocity of air or liquid coolant in a cooling system • Thus a density limit ρ implies the flux limit F = ρc [Figure: flow at velocity v through a cross-section]

  39. Relativistic Effects • For normal matter (bound massive-particle states) moving at a velocity v approaching c: • Entropy density increases by a factor γ = 1/√(1−v²/c²) • due to relativistic length contraction • But energy density increases by a factor γ² • both length contraction & mass amplification! • ⇒ entropy density scales up only with the square root (1/2 power) of the energy density gained from high velocity • Note that light travels at c already, • & its entropy density scales with energy density to the 3/4 power. ⇒ Light wins as v → c, • if you want to maximize entropy flux per unit of energy flux.

  40. Max. Entropy Flux Using Light [Smith ’95] • Where: FS = entropy flux, FE = energy flux, σSB = Stefan-Boltzmann constant = π²kB⁴/(60ℏ³c²) • For blackbody radiation FE = σSB·T⁴ and FS = (4/3)σSB·T³, so FS ∝ FE^(3/4). • This is derived from the same field-theory arguments as the density bound. • Again, the blackbody spectrum optimizes the entropy flux given the energy flux • It is the equilibrium spectrum!

  41. Entropy Flux Examples • Consider a 10-cm-wide, flat, square wireless tablet with a 10 W power supply. • What’s its maximum rate of bit transmission? • Independent of the spectrum used, noise floor, etc. • Answer: • Energy flux 10 W/(2·(10 cm)²) (using both sides) • Smith’s formula gives 2.2×10²¹ bps • What’s the rate per square nanometer of surface? • Only ~109 kbps! (ISDN speed, in a 100 GHz CPU?) • 100 Gbps/nm² ⇒ nearly 1 GW of power! • Light is not informationally dense enough for high-bandwidth communication between densely packed nanometer-scale devices at reasonable power levels!
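
As a rough stand-in for Smith's formula (which isn't reproduced in this transcript), the ordinary blackbody relations FE = σT⁴ and FS = (4/3)σT³ land within a small factor of the slide's 2.2×10²¹ bps figure:

```python
import math

sigma = 5.670e-8     # Stefan-Boltzmann constant, W m^-2 K^-4
k_B   = 1.381e-23    # Boltzmann constant, J/K

def blackbody_entropy_flux_bits(energy_flux):
    """Entropy flux (bits / m^2 / s) of blackbody radiation carrying energy flux F_E (W/m^2):
    F_E = sigma*T^4, F_S = (4/3)*sigma*T^3, converted from J/(K m^2 s) to bits via k_B*ln(2)."""
    T = (energy_flux / sigma) ** 0.25
    return (4 / 3) * sigma * T**3 / (k_B * math.log(2))

area = 2 * 0.1**2                       # both faces of a 10 cm x 10 cm tablet, m^2
f_e  = 10.0 / area                      # 10 W spread over that area, W/m^2
rate_total   = blackbody_entropy_flux_bits(f_e) * area
rate_per_nm2 = blackbody_entropy_flux_bits(f_e) * 1e-18

print(f"total ~ {rate_total:.1e} bps;  per nm^2 ~ {rate_per_nm2 / 1e3:.0f} kbps")
```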

  42. Entropy Flux w. Atomic Matter • Consider liquid copper (~1.5 b/Å³) moving along at a leisurely 10 cm/s… • BW = 1.5×10²⁷ bps through the 10-cm-wide square! • A million times higher BW than with 10 W of light! • 150 Gbps/nm² entropy flux! • Plenty for nano-scale devices to talk to their neighbors • Most of this entropy is in the conduction electrons... • Less conductive materials have much less entropy • Lesson: • For maximum bandwidth density at realistic power levels, encode information using states of matter (electrons) rather than states of radiation (light). • Exercise: what is the kinetic-energy flux?
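
The copper figure is just flux = density × velocity over the cross-section; a minimal check:

```python
bits_per_A3 = 1.5                 # slide's entropy density for liquid copper, b/A^3
density     = bits_per_A3 * 1e30  # bits per m^3 (1 m^3 = 1e30 A^3)
velocity    = 0.10                # m/s
area        = 0.10 * 0.10         # 10 cm x 10 cm cross-section, m^2

bandwidth = density * velocity * area       # flux = density * velocity, times area
print(f"{bandwidth:.1e} bps total, {bandwidth / (area * 1e18) / 1e9:.0f} Gbps per nm^2")
```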

  43. Some Quantities of Interest • We would like to know if there are limits on: • Infropy density • = Bits per unit volume • Affects physical size and thus propagation delay across memories and processors. Also affects cost. • Infropy flux • = Bits per unit area per unit time • Affects cross-sectional bandwidth, data I/O rates, rates of standard-information input & effective entropy removal • Rate of computation • = Number of distinguishable-state changes per unit time • Affects rate of information processing achievable in individual devices

  44. Computation Speed Limits

  45. The Margolus-Levitin Bound • The maximum rate ν at which a system can transition between distinguishable (orthogonal) states is: ν ≤ 4(E − E₀)/h • where: • E = average energy (expectation value of energy over all states, weighted by their probability) • E₀ = energy of the lowest-energy or ground state of the system • h = Planck’s constant (converts energy to frequency) • Implication for computing: • A circuit node can’t switch between 2 logic states faster than this frequency determined by its energy. • (This is for pops; the rate for nops is half as great.)

  46. Example of Frequency Bound • Consider Lloyd’s 1-liter, 1 kg “ultimate laptop” • Total gravitating mass-energy E of 9×10¹⁶ J • Gives a limit of 5×10⁵⁰ bit-operations per second! • If the laptop contains 2×10³¹ bits (the photonic maximum), • each bit can change state at a frequency of 2.5×10¹⁹ Hz (25 EHz) • 12 billion times higher frequency than today’s 2 GHz Intel processors • 250 million times higher frequency than today’s 100 GHz superconducting logic • But, the Margolus-Levitin limit may be far from achievable in practice!
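
Reproducing the slide's numbers from the Margolus-Levitin formula (taking the 2×10³¹-bit photonic capacity from slide 27 as given):

```python
h = 6.626e-34       # Planck's constant, J*s
c = 2.998e8         # m/s

E = 1.0 * c**2                      # mass-energy of 1 kg, ~9e16 J
max_rate = 4 * E / h                # Margolus-Levitin: ~5e50 state changes per second
bits = 2e31                         # photonic bit capacity from the Smith-Lloyd example
per_bit = max_rate / bits

print(f"total rate   ~ {max_rate:.1e} ops/s")
print(f"per-bit rate ~ {per_bit:.1e} Hz  (~{per_bit / 2e9:.1e} x a 2 GHz processor)")
```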

  47. More Realistic Estimates • Most of the energy in complex stable structures is not accessible for computational purposes... • Tied up in the rest masses of atomic nuclei, • Which form anchor points for electron orbitals • mass & energy of “core” atomic electrons, • Which fill up low-energy states not involved in bonding, • & of electrons involved in atomic bonds • Which are needed to hold the structure together • Conjecture: Can obtain tighter valid quantum bounds on info. densities & state-transition rates by considering only the accessible energy. • Energy whose state-information is manipulable.

  48. More Realistic Examples • Suppose the following system is accessible: 1 electron confined to a (10 nm)³ volume, at an average potential of 10 V above the ground state. • Accessible energy: 10 eV • Accessible-energy density: 10 eV/(10 nm)³ • Maximum entropy in the Smith bound: 1.4 bits? • Not clear yet whether the bound is applicable to this case. • Maximum rate of change: 9.7 PHz • 5 million × typical frequencies in today’s CPUs • 100,000 × frequencies in today’s superconducting logics
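
The 9.7 PHz figure follows directly from the Margolus-Levitin bound applied to 10 eV of accessible energy:

```python
h  = 6.626e-34      # Planck's constant, J*s
eV = 1.602e-19      # joules per electron-volt

E_accessible = 10 * eV              # 10 eV of accessible energy
rate = 4 * E_accessible / h         # Margolus-Levitin bound on transition rate
print(f"max transition rate ~ {rate:.2e} Hz  (~{rate / 1e15:.1f} PHz)")
```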

  49. Summary of Fundamental Limits
