
Artificial Neural Networks (ANNs) are based around the backpropagation algorithm. The backpropagation algorithm allows you to perform gradient descent on a network of neurons. When we feed training data through an ANN, we use the backpropagation algorithm to tell us how the weights should change.

ANNs are good at inference problems. Biological Neural Networks (BNNs) are good at inference too. ANNs are built out of neurons. BNNs are built out of neurons too. It makes intuitive sense that ANNs and BNNs might be running similar algorithms.

There is just one problem: BNNs are physically incapable of running the backpropagation algorithm.

We do not know quite enough about biology to say it is impossible for BNNs to run the backpropagation algorithm. However, "a consensus has emerged that the brain cannot directly implement backprop, since to do so would require biologically implausible connection rules" [1].

The backpropagation algorithm has three steps.

Flow information forward through a network to compute a prediction.

Compute an error by comparing the prediction to a target value.

Flow the error backward through the network to update the weights.
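The three steps can be made concrete with a small sketch. The Python below is illustrative only and is not taken from the source. It assumes a hypothetical network with two inputs, two tanh hidden units, one linear output and a squared-error loss, and trains it on a single example by plain gradient descent. The function names forward, backprop and train, and the weight values, are choices made for this sketch.

```python
import math

def forward(W1, W2, x):
    """Step 1: flow information forward to compute a prediction."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    y = sum(w * hi for w, hi in zip(W2, h))
    return h, y

def backprop(W1, W2, x, target):
    """Return the loss and its gradients for Loss = 0.5 * (y - target)**2."""
    h, y = forward(W1, W2, x)
    # Step 2: compute an error by comparing the prediction to the target.
    err = y - target
    loss = 0.5 * err ** 2
    # Step 3: flow the error backward through the network (chain rule).
    grad_W2 = [err * hi for hi in h]
    grad_h = [err * w for w in W2]          # error reaching each hidden unit via W2
    grad_W1 = [[gh * (1.0 - hi ** 2) * xi for xi in x]   # tanh' = 1 - tanh**2
               for gh, hi in zip(grad_h, h)]
    return loss, grad_W1, grad_W2

def train(W1, W2, x, target, lr=0.1, steps=200):
    """Gradient descent on a single training example; returns the loss history."""
    losses = []
    for _ in range(steps):
        loss, g1, g2 = backprop(W1, W2, x, target)
        losses.append(loss)
        W1 = [[w - lr * g for w, g in zip(row, grow)] for row, grow in zip(W1, g1)]
        W2 = [w - lr * g for w, g in zip(W2, g2)]
    return W1, W2, losses

W1 = [[0.1, -0.2], [0.4, 0.3]]   # 2 hidden units x 2 inputs (hypothetical values)
W2 = [0.5, -0.6]                 # 1 output x 2 hidden units
x, target = [1.0, 0.5], 0.8
_, _, losses = train(W1, W2, x, target)
print(f"loss before: {losses[0]:.4f}, after: {losses[-1]:.6f}")
```

Note that grad_h reuses the forward weights W2. In this sketch the error really does travel back along the same connections that carried the prediction forward.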

The backpropagation algorithm requires information to flow forward and backward along the network. But biological neurons are one-directional. An action potential goes from the cell body down the axon to the axon terminals to another cell's dendrites. An action potential never travels backward from a cell's terminals to its body.

Hebbian theory

Predictive coding is the idea that BNNs generate a mental model of their environment and then transmit only the information that deviates from this model. Predictive coding considers error and surprise to be the same thing. Hebbian theory is a specific mathematical formulation of predictive coding.

Predictive coding is biologically plausible. It operates locally. There are no separate prediction and training phases which must be synchronized. Most importantly, it lets you train a neural network without sending action potentials backward.

Predictive coding is easier to implement in hardware. It is locally defined; it parallelizes better than backpropagation; it continues to function when you cut its substrate in half. (Corpus callosotomy, which severs the main connection between the brain's two hemispheres, is used to treat epilepsy.) Digital computers break when you cut them in half. Predictive coding is something evolution could plausibly invent.

Unification

The paper Predictive Coding Approximates Backprop Along Arbitrary Computation Graphs [1] "demonstrate[s] that predictive coding converges asymptotically (and in practice rapidly) to exact backprop gradients on arbitrary computation graphs using only local learning rules." The authors have unified predictive coding and backpropagation into a single theory of neural networks. Predictive coding and backpropagation are separate hardware implementations of what is ultimately the same algorithm.

There are two big implications of this.

This paper permanently fuses artificial intelligence and neuroscience into a single mathematical field.

This paper opens up possibilities for neuromorphic computing hardware.
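The convergence claim quoted above can be illustrated on a toy example. The following Python sketch is not the paper's code, and it rests on several assumptions:

- a scalar chain of tanh units;
- a squared-error loss;
- each layer's prediction held at its forward-pass value while the network relaxes, a simplification used in parts of the predictive coding literature and often called the fixed prediction assumption.

Under those assumptions, two things happen. The hidden value nodes settle by descending a sum of squared prediction errors. Each weight is then updated from purely local quantities. Together these yield the negative of the backprop gradient. All function names here are hypothetical.

```python
import math

def f(z):
    return math.tanh(z)

def df(z):
    # derivative of tanh
    return 1.0 - math.tanh(z) ** 2

def forward(ws, x):
    """Forward pass through a scalar chain: a[l+1] = tanh(ws[l] * a[l])."""
    a, z = [x], []
    for w in ws:
        z.append(w * a[-1])
        a.append(f(z[-1]))
    return a, z

def backprop_grads(ws, x, y):
    """dLoss/dws for Loss = 0.5 * (a[L] - y)**2, via a separate backward pass."""
    a, z = forward(ws, x)
    delta = a[-1] - y                  # dLoss/da[L]
    grads = [0.0] * len(ws)
    for l in reversed(range(len(ws))):
        dz = delta * df(z[l])          # dLoss/dz[l]
        grads[l] = dz * a[l]
        delta = dz * ws[l]             # dLoss/da[l], sent to the layer below
    return grads

def predictive_coding_updates(ws, x, y, lr=0.1, steps=1000):
    """Weight changes from predictive coding with fixed predictions (an assumption)."""
    a, z = forward(ws, x)
    L = len(ws)
    v = list(a)    # value nodes start at the forward-pass activations
    v[L] = y       # output node clamped to the target; input stays clamped to x
    for _ in range(steps):
        # prediction error of every node l >= 1 (predictions held at a[l])
        e = [0.0] + [v[l] - a[l] for l in range(1, L + 1)]
        # each hidden node descends the energy sum(e**2)/2 using only its own
        # error and the error of the node it predicts
        for l in range(1, L):
            v[l] += lr * (-e[l] + e[l + 1] * df(z[l]) * ws[l])
    e = [0.0] + [v[l] - a[l] for l in range(1, L + 1)]
    # local rule: error above x slope x activity below
    return [e[l + 1] * df(z[l]) * a[l] for l in range(L)]

ws, x, y = [0.5, -1.2, 0.8], 1.0, 0.3
print(backprop_grads(ws, x, y))
print(predictive_coding_updates(ws, x, y))  # the negatives of the line above, up to rounding
```

The learning rule is local in the following sense. Each weight update reads only three quantities: the error at the node above, the slope of that node's nonlinearity, and the activity below. In this sketch the backward flow of information comes from repeated local relaxation rather than from a dedicated backward pass.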

All C++20 core language features with examples Apr 2, 2021

Introduction

The story behind this article is very simple: I wanted to learn about the new C++20 language features and to have a brief summary of all of them on a single page. So, I decided to read all the proposals and create this “cheat sheet” that explains and demonstrates each feature. This is not a “best practices” kind of article; it serves only a demonstrational purpose. Most examples were inspired by or directly taken from the corresponding proposals; all credit goes to their authors and to the members of the ISO C++ committee for their work. Enjoy!

“Most mathematicians prove what they can, von Neumann proves what he wants”

It is indeed supremely difficult to effectively refute the claim that John von Neumann is likely the most intelligent person who has ever lived. By the time of his death in 1957 at the modest age of 53, the Hungarian polymath had not only revolutionized several subfields of mathematics and physics but also made foundational contributions to pure economics and statistics and played key roles in the invention of the atomic bomb, nuclear energy and digital computing.

Known now as “the last representative of the great mathematicians”, von Neumann’s genius was legendary even in his own lifetime. Stories and anecdotes about his brilliance, from Nobel Prize-winning physicists to world-class mathematicians, abound:

“You know, Herb, Johnny can do calculations in his head ten times as fast as I can. And I can do them ten times as fast as you can, so you can see how impressive Johnny is” — Enrico Fermi (Nobel Prize in Physics, 1938)

“One had the impression of a perfect instrument whose gears were machined to mesh accurately to a thousandth of an inch.” — Eugene Wigner (Nobel Prize in Physics, 1963)

“I have sometimes wondered whether a brain like von Neumann’s does not indicate a species superior to that of man” — Hans Bethe (Nobel Prize in Physics, 1967)

An émigré to America in 1933, von Neumann led a life famously dedicated to cognitive and creative pursuits, but also to the enjoyments of life. Twice married and wealthy, he loved expensive clothes, hard liquor, fast cars and dirty jokes, according to his friend Stanislaw Ulam. Almost involuntarily, his posthumous biographer Norman Macrae recounts, people took a liking to von Neumann, even those who disagreed with his conservative politics (Regis, 1992).

This essay aims to highlight some of the unbelievable feats of “Johnny” von Neumann’s mind. Happy reading!

Early years (1903–1921)

Neumann János Lajos (John Louis Neumann in English) was born (or “arrived”) on December 28th 1903 in Budapest, Hungary. Born to wealthy, non-observant Jewish bankers, he had an upbringing that can be described as privileged. His father held a doctorate in law, and he grew up in an 18-room apartment on the top floor above the Kann-Heller offices at 62 Bajcsy-Zsilinszky Street in Budapest (Macrae, 1992).

John von Neumann at age 7 (1910)

Child prodigy

“Johnny” von Neumann was a child prodigy. Even from a young age, there were strange stories of little John Louis’ abilities: dividing two eight-digit numbers in his head and conversing in Ancient Greek at age six (Henderson, 2007), proficiency in calculus at age eight (Nasar, 1998) and reading Emile Borel’s Théorie des Fonctions (“On some points in the theory of functions”) at age twelve (Leonard, 2010). Reportedly, von Neumann possessed an eidetic memory, and so was able to recall complete novels and pages of the phone directory on command. This enabled him to accumulate an almost encyclopedic knowledge of whatever he read, such as the history of the Peloponnesian Wars, the trial of Joan of Arc and Byzantine history (Leonard, 2010). A Princeton professor of the latter topic once stated that by the time Johnny was in his thirties, he had greater expertise in Byzantine history than the professor himself (Blair, 1957).

Left: John von Neumann at age 11 (1915) with his cousin Katalin Alcsuti. (Photo: Nicholas Vonneumann). Right: The Neumann brothers Miklós (1911–2011), Mihály (1907–1989) and János Lajos (1903–1957)

"One of his remarkable abilities was his power of absolute recall. As far as I could tell, von Neumann was able on once reading a book or article to quote it back verbatim; moreover, he could do it years later without hesitation. He could also translate it at no diminution in speed from its original language into English. On one occasion I tested his ability by asking him to tell me how A Tale of Two Cities started. Whereupon, without any pause, he immediately began to recite the first chapter and continued until asked to stop after about ten or fifteen minutes."- Excerpt, The Computer from Pascal to von Neumann by Herman Goldstine (1980)

Max, von Neumann’s unconventional father, would reportedly bring his workaday banking decisions home to the family and ask his children how they would have reacted to particular investment possibilities and balance-sheet risks (Macrae, 1992). Johnny was home-schooled until 1914, as was the custom in Hungary at the time. Starting at the age of 11, he was enrolled in the German-speaking Lutheran Gymnasium in Budapest. He would attend the high school until 1921, famously overlapping with the high school years of three other “Martians” of Hungary:

Leo Szilard (att. 1908–16 at Real Gymnasium), the physicist who conceived of the nuclear chain reaction and in late 1939 wrote the famous Einstein-Szilard letter for Franklin D. Roosevelt that resulted in the formation of the Manhattan Project that built the first atomic bomb

Eugene Wigner (att. 1913–21 at Lutheran Gymnasium), the 1963 Nobel Prize laureate in Physics who worked on the Manhattan Project, including the theory of the atomic nucleus, elementary particles and Wigner’s Theorem in quantum mechanics

Edward Teller (att. 1918–26 at Minta School), the “father of the hydrogen bomb”, an early member of the Manhattan Project and contributor to nuclear and molecular physics, spectroscopy and surface physics

Although the four were of similar ages and interests, they were very different people, as Macrae (1992) writes:

"The four Budapesters were as different as four men from similar backgrounds could be. They resembled one another only in the power of the intellects and in the nature of their professional careers. Wigner [...] is shy, painfully modest, quiet. Teller, after a lifetime of successful controversy, is emotional, extroverted and not one to hide his candle. Szilard was passionate, oblique, engagé, and infuriating. Johnny [...] was none of these. Johnny's most usual motivation was to try to make the next minute the most productive one for whatever intellectual business he had in mind."- Excerpt, John von Neumann by Norman Macrae (1992)

Still, the four would work together off and on as they all emigrated to America and got involved in the Manhattan Project.

By the time von Neumann enrolled in university in 1921, he had already written a paper with one of his tutors, Mikhail Fekete, on “A generalization of Fejér’s theorem on the location of the roots of a certain kind of polynomial” (Ulam, 1958). Fekete, along with László Rátz, had reportedly taken notice of von Neumann and begun tutoring him in university-level mathematics. According to Ulam, even at the age of 18, von Neumann was already recognized as a full-fledged mathematician. Of an early set theory paper written by a 16-year-old von Neumann, Abraham Fraenkel (of Zermelo-Fraenkel set theory fame) later stated (Ulam, 1958):

Letter from Abraham Fraenkel to Stanislaw Ulam

Around 1922-23, being then professor at Marburg University, I received from Professor Erhard Schmidt, Berlin [...] a long manuscript of an author unknown to me, Johann von Neumann, with the title Die Axiomatisierung der Mengenlehre, this being his eventual doctor dissertation which appeared in the Zeitschrift only in 1928 [...] I was asked to express my view since it seemed incomprehensible. I don't maintain that I understood anything, but enough to see that this was an outstanding work, and to recognize ex ungue leonem [the claws of the lion]. While answering in this sense, I invited the young scholar to visit me in Marburg, and discussed things with him, strongly advising him to prepare the ground for the understanding of so technical an essay by a more informal essay which could stress the new access to the problem and its fundamental consequences. He wrote such an essay under the title Eine Axiomatisierung der Mengenlehre and I published it in 1925.

In University (1921–1926)

As Macrae (1992) writes, there was never much doubt that Johnny would one day be attending university. Johnny’s father, Max, initially wanted him to follow in his footsteps and become a well-paid financier, worrying about the financial stability of a career in mathematics. However, encouraged by Hungarian mathematicians such as Lipót Fejér and Rudolf Ortvay, his father eventually acquiesced and decided to let von Neumann pursue his passions, financing his studies abroad.

Johnny, apparently in agreement with his father, decided initially to pursue a career in chemical engineering. As he didn’t have any knowledge of chemistry, it was arranged that he could take a two-year non-degree course in chemistry at the University of Berlin. He did, from 1921 to 1923, afterwards sitting for and passing the entrance exam to the prestigious ETH Zurich. Still interested in pursuing mathematics, he also simultaneously entered University Pázmány Péter (now Eötvös Loránd University) in Budapest as a Ph.D. candidate in mathematics. His Ph.D. thesis, officially written under the supervision of Fejér, regarded the axiomatization of Cantor’s set theory. As he was officially in Berlin studying chemistry, he completed his Ph.D. largely in absentia, only appearing at the University in Budapest at the end of each term for exams. While in Berlin, he collaborated with Erhard Schmidt on set theory and also attended courses in physics, including statistical mechanics taught by Albert Einstein. At ETH, starting in 1923, he continued both his studies in chemistry and his research in mathematics.

“Evidently, a Ph.D. thesis and examinations did not constitute an appreciable effort” — Eugene Wigner

Two portraits of John von Neumann (1920s)

In mathematics, he first studied Hilbert’s theory of consistency with German mathematician Hermann Weyl. He eventually graduated both as a chemical engineer from ETH and with a Ph.D. in mathematics, summa cum laude, from the University of Budapest in 1926, at 22 years old.

“There was a seminar for advanced students in Zürich that I was teaching and von Neumann was in the class. I came to a certain theorem, and I said it is not proved and it may be difficult. von Neumann didn’t say anything but after five minutes he raised his hand. When I called on him he went to the blackboard and proceeded to write down the proof. After that I was afraid of von Neumann” — George Pólya

From von Neumann’s Fellowship application to the International Education Board (1926)

His application to the Rockefeller-financed International Education Board for a six-month fellowship to continue his research at the University of Göttingen mentions Hungarian, German, English, French and Italian as spoken languages, and was accompanied by letters of recommendation from Richard Courant, Hermann Weyl and David Hilbert, three of the world’s foremost mathematicians at the time (Leonard, 2010).

In Göttingen (1926–1930)

The Auditorium Maximum at the University of Göttingen, 1935

Johnny traveled to Göttingen in the fall of 1926 to continue his work in mathematics under David Hilbert, likely the world’s foremost mathematician of that time. According to Leonard (2010), von Neumann was initially attracted to Hilbert’s stance in the debate over so-called metamathematics, also known as formalism, and this is what drove him to study under Hilbert. In particular, in his fellowship application, he wrote of his wish to conduct (Leonard, 2010)

"Research over the bases of mathematics and of the general theory of sets, especially Hilbert's theory of uncontradictoriness [...], [investigations which] have the purpose of clearing up the nature of antinomies of the general theory of sets, and thereby to securely establish the classical foundations of mathematics. Such research render it possible to explain critically the doubts which have arisen in mathematics"

Very much in both the vein and the language of Hilbert, von Neumann was likely referring to the fundamental questions posed by Georg Cantor regarding the nature of infinite sets starting in the 1880s. von Neumann, along with Wilhelm Ackermann and Paul Bernays, would eventually become Hilbert’s key assistants in the elaboration of his Entscheidungsproblem (“decision problem”), initiated in 1918. By the time he arrived in Göttingen, von Neumann was already well acquainted with the topic: in addition to his Ph.D. dissertation, he had already published two related papers while at ETH.

Set theory

John von Neumann wrote a cluster of papers on set theory and logic while in his twenties:

von Neumann (1923). His first set theory paper is entitled Zur Einführung der transfiniten Zahlen (“On the introduction of transfinite numbers”) and concerns Cantor’s 1897 definition of ordinal numbers as order types of well-ordered sets. In the paper, von Neumann introduces a new theory of ordinal numbers, which regards an ordinal as the set of the preceding ordinals (Van Heijenoort, 1970).

von Neumann (1925). His second set theory paper is entitled Eine Axiomatisierung der Mengenlehre (“An axiomatization of set theory”). It is the first paper that introduces what would later be known as the von Neumann-Bernays-Gödel set theory (NBG) and includes the first introduction of the concept of a class, defined using the primitive notions of functions and arguments. In the paper, von Neumann takes a stance in the foundations of mathematics debate, objecting to Brouwer and Weyl’s willingness to ‘sacrifice much of mathematics and set theory’, and to the logicists’ ‘attempts to build mathematics on the axiom of reducibility’. Instead, he argued for the axiomatic approach of Zermelo and Fraenkel, which, in von Neumann’s view, replaced vagueness with rigor (Leonard, 2010).

von Neumann (1926). His third paper, Az általános halmazelmélet axiomatikus felépitése (“The axiomatic construction of general set theory”), is his doctoral dissertation and contains the main points which would be published in German for the first time in his fifth paper.

von Neumann (1928). In his fourth set theory paper, entitled Die Axiomatisierung der Mengenlehre (“The Axiomatization of Set Theory”), von Neumann formally lays out his own axiomatic system. With its single page of axioms, it was the most succinct set of axioms for set theory developed at the time, and it formed the basis for the system later developed by Gödel and Bernays.

von Neumann (1928). His fifth paper on set theory, “Über die Definition durch transfinite Induktion und verwandte Fragen der allgemeinen Mengenlehre” (“On the Definition by Transfinite Induction and related questions of General Set Theory”), proves the possibility of definition by transfinite induction. In the paper, von Neumann also demonstrates the significance of axioms for the elimination of the paradoxes of set theory, proving that a class can be treated as a set without leading to contradictions if and only if its cardinality is not the same as the cardinality of all sets, which implies the axiom of choice (Leonard, 2010).

von Neumann (1929). In his sixth set theory paper, Über eine Widerspruchsfreiheitsfrage in der axiomatischen Mengenlehre (“On a question of consistency in axiomatic set theory”), von Neumann discusses the questions of relative consistency in set theory (Van Heijenoort, 1970).
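von Neumann's definition of an ordinal as the set of all preceding ordinals can be tried out directly for the finite ordinals. The Python sketch below is only an illustration and cannot reach transfinite ordinals such as ω. It builds 0 as the empty set and each successor as n ∪ {n} using frozensets. It then checks that every ordinal equals the set of its predecessors, and that the order between ordinals is simply set membership, with proper inclusion coinciding with it. The helper names successor and ordinal are chosen for this sketch.

```python
def successor(a: frozenset) -> frozenset:
    """von Neumann successor: a + 1 = a ∪ {a}."""
    return a | frozenset([a])

def ordinal(n: int) -> frozenset:
    """The finite von Neumann ordinal n, built up from the empty set."""
    o = frozenset()          # 0 = {}
    for _ in range(n):
        o = successor(o)
    return o

ords = [ordinal(n) for n in range(6)]

for n, o in enumerate(ords):
    # n is exactly the set {0, 1, ..., n-1} of the preceding ordinals
    assert o == frozenset(ords[:n])
    assert len(o) == n

# m < n holds exactly when m ∈ n, and also exactly when m is a proper subset of n
for m in range(6):
    for n in range(6):
        assert (m < n) == (ords[m] in ords[n]) == (ords[m] < ords[n])

print([len(o) for o in ords])  # [0, 1, 2, 3, 4, 5]
```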

Summarized, von Neumann’s main contribution to set theory is what would become the von Neumann-Bernays-Gödel set theory (NBG), an axiomatic set theory that is considered a conservative extension of the accepted Zermelo-Fraenkel set theory (ZFC). It introduced the notion of class (a collection of sets defined by a formula whose quantifiers range only over sets) and can define classes that are larger than sets, such as the class of all sets and the class of all ordinal numbers.

Left: John von Neumann in the 1920s. Right: von Neumann, J (1923). Zur Einführung der transfiniten Zahlen (“On the introduction of transfinite numbers”). Acta Litterarum ac Scientiarum Regiae Universitatis Hungaricae Francisco-Josephinae, sectio scientiarum mathematicarum, 1, pp. 199–208.

Inspired by the works of Georg Cantor, Ernst Zermelo’s 1908 axioms for set theory and the 1922 critiques of Zermelo’s set theory by Fraenkel and Skolem, von Neumann’s work provided solutions to some of the problems of Zermelo set theory, leading to the eventual development of Zermelo-Fraenkel set theory (ZFC). The problems he helped resolve include:

The problem of developing Cantor’s theory of ordinal numbers in Zermelo set theory. von Neumann redefined ordinals using sets that are well-ordered by the so-called ∈-relation.

The problem of finding a criterion identifying classes that are too large to be sets. von Neumann introduced the criterion that a class is too large to be a set if and only if it can be mapped onto the class of all sets.

Zermelo’s somewhat imprecise concept of a ‘definite propositional function’ in his axiom of separation. von Neumann formalized the concept with his functions, whose construction requires only finitely many axioms.

The problem of Zermelo’s foundations of the empty set and an infinite set, and iterating the axioms of pairing, union, power set, separation and choice to generate new sets. Fraenkel introduced an axiom to exclude sets. von Neumann revised Fraenkel’s formulation in his axiom of regularity to exclude non-well-founded sets.

Of course, following the critiques and further revisions of Zermelo’s set theory by Fraenkel, Skolem, Hilbert and von Neumann, a young mathematician by the name of Kurt Gödel presented a result in 1930 that would effectively end von Neumann’s efforts in formalist set theory, and indeed Hilbert’s formalist program altogether: his incompleteness theorem. von Neumann happened to be in the audience when Gödel first presented it:

"At a mathematical conference preceding Hilbert's address, a quiet, obscure young man, Kurt Gödel, only a year beyond his PhD, announced a result which would forever change the foundations of mathematics. He formalized the liar paradox, 'This statement is false', to prove roughly that for any effectively axiomatized consistent extension T of number theory (Peano arithmetic) there is a sentence s which asserts its own unprovability in T. John von Neumann, who was in the audience, immediately understood the importance of Gödel's incompleteness theorem. He was at the conference representing Hilbert's proof theory program and recognized that Hilbert's program was over. In the next few weeks von Neumann realized that by arithmetizing the proof of Gödel's first theorem, one could prove an even better one, that no such formal system T could prove its own consistency. A few weeks later he brought his proof to Gödel, who thanked him and informed him politely that he had already submitted the second incompleteness theorem for publication."- Excerpt, Computability: Turing, Gödel, Church and Beyond by Copeland et al. (2015)

One of Gödel’s lifelong supporters, von Neumann later stated that

“Gödel is absolutely irreplaceable. In a class by himself.”

By the end of 1927, von Neumann had published twelve major papers in mathematics. His habilitation (qualification to conduct independent university teaching) was completed in December of 1927, and he began lecturing as a Privatdozent at the University of Berlin in 1928 at the age of 25, the youngest Privatdozent ever elected in the university’s history in any subject.

"By the middle of 1927 it was clearly desirable for the young eagle Johnny to soar from Hilbert's nest. Johnny had spent his undergraduate years explaining what Hilbert had got magnificently right but was now into his postgraduate years where [he] had to explain what Hilbert had got wrong"- Excerpt, John von Neumann by Norman Macrae (1992)

Game theory

Around the same time he was making contributions to set theory, von Neumann also proved a theorem known as the minimax theorem for zero-sum games, which would later lay the foundation for the new field of game theory as a mathematical discipline. The minimax theorem may be summarized as follows:

The Minimax Theorem (von Neumann, 1928)

The minimax theorem provides the conditions that guarantee that the max-min inequality is also an equality, i.e. that every finite, zero-sum, two-person game has optimal mixed strategies.
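For the smallest interesting case, the theorem can be checked with a few lines of Python. The sketch below is an illustration, not from the source, and handles only 2×2 games. Using exact fractions, it computes two quantities:

- the row player's best guaranteed expected payoff, the maximum over p of the minimum over columns;
- the column player's best guaranteed cap, the minimum over q of the maximum over rows.

The minimum of two straight lines in p is concave with a single kink, so checking the endpoints and the crossing point is enough. For matching pennies the pure-strategy values differ (−1 versus 1), but the mixed values coincide, as the theorem promises. Larger games would need linear programming. The function names are hypothetical.

```python
from fractions import Fraction

def kink_points(u0, u1, v0, v1):
    """Candidate probabilities t in [0, 1]: the endpoints plus the point where
    the lines t*u0 + (1-t)*u1 and t*v0 + (1-t)*v1 cross, if that lies inside."""
    pts = [Fraction(0), Fraction(1)]
    denom = (u0 - u1) - (v0 - v1)
    if denom != 0:
        t = Fraction(v1 - u1, denom)
        if 0 <= t <= 1:
            pts.append(t)
    return pts

def maxmin(A):
    """Row player mixes: row 0 with probability p, row 1 with 1 - p."""
    def pay(p, j):
        return p * A[0][j] + (1 - p) * A[1][j]
    return max(min(pay(p, 0), pay(p, 1))
               for p in kink_points(A[0][0], A[1][0], A[0][1], A[1][1]))

def minmax(A):
    """Column player mixes: column 0 with probability q, column 1 with 1 - q."""
    def pay(q, i):
        return q * A[i][0] + (1 - q) * A[i][1]
    return min(max(pay(q, 0), pay(q, 1))
               for q in kink_points(A[0][0], A[0][1], A[1][0], A[1][1]))

def pure_values(A):
    """max-min and min-max when only pure strategies are allowed."""
    lower = max(min(row) for row in A)
    upper = min(max(A[0][j], A[1][j]) for j in range(2))
    return lower, upper

pennies = [[1, -1], [-1, 1]]              # payoffs to the row player
print(pure_values(pennies))               # (-1, 1): a gap without mixing
print(maxmin(pennies), minmax(pennies))   # 0 0

game = [[3, -1], [-2, 1]]
print(maxmin(game), minmax(game))         # 1/7 1/7
```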

The proof was published in Zur Theorie der Gesellschaftsspiele (“On the Theory of Games of Strategy”) in 1928. In collaboration with economist Oskar Morgenstern, von Neumann later published the definitive book on the subject, Theory of Games and Economic Behavior (1944), which treats both zero-sum and cooperative games.

Left: von Neumann, J. (1928). Zur Theorie der Gesellschaftsspiele (“On the Theory of Games of Strategy”). Right: First edition copy of Theory of Games and Economic Behavior (1944) by John von Neumann and Oskar Morgenstern (Photo: Whitmore Rare Books).

By the end of 1929, the number of major papers von Neumann had published had risen to 32, averaging almost one major paper per month. In 1929 he briefly became a Privatdozent at the University of Hamburg, where he found the prospects of becoming a professor to be better.

Quantum mechanics

In a shortlist von Neumann himself submitted to the National Academy of Sciences later in his life, he listed his work on quantum mechanics in Göttingen (1926) and Berlin (1927–29) as the “most essential”. The theory of quantum mechanics, largely devised the year before by Göttingen’s own twenty-three-year-old wunderkind Werner Heisenberg, was still hotly debated. In the same year von Neumann arrived, Erwin Schrödinger, then working from Switzerland, had rejected Heisenberg’s formulation as completely wrong (Macrae, 1992). As the story goes:

"In Johnny's early weeks at Göttingen in 1926, Heisenberg lectured on the difference between his and Schrödinger's theories. The aging Hilbert, professor of mathematics, asked his physics assistant, Lothar Nordheim, what on earth this young man Heisenberg was talking about. Nordheim sent to the professor a paper that Hilbert still did not understand. To quote Nordheim himself, as recorded in Heims's book: 'When von Neumann saw this, he cast it in a few days into elegant axiomatic form, much to the liking of Hilbert.' To Hilbert's delight, Johnny's mathematical exposition made much use of Hilbert's own concept of Hilbert space."- Excerpt, John von Neumann by Norman Macrae (1992)

Starting with the incident above, in the following years, von Neumann published a set of papers which would establish a rigorous mathematical framework for quantum mechanics, now known as the Dirac-von Neumann axioms. As Van Hove (1958) writes,

"By the time von Neumann started his investigations on the formal framework of quantum mechanics this theory was known in two different mathematical formulations: the 'matrix mechanics' of Heisenberg, Born and Jordan, and the 'wave mechanics' of Schrödinger. The mathematical equivalence of these formulations had been established by Schrödinger, and they had both been embedded as special cases in a general formalism, often called 'transformation theory', developed by Dirac and Jordan. This formalism, however, was rather clumsy and it was hampered by its reliance upon ill-defined mathematical objects, the famous delta-functions of Dirac and their derivatives. [...] [von Neumann] soon realized that a much more natural framework was provided by the abstract, axiomatic theory of Hilbert spaces and their linear operators."- Excerpt, Von Neumann's Contributions to Quantum Theory by Léon Van Hove (1958)

In the period from 1927 to 1931, von Neumann published five highly influential papers relating to quantum mechanics:

von Neumann (1927). Mathematische Begründung der Quantenmechanik (“Mathematical Foundation of Quantum Mechanics”) in Nachrichten von der Gesellschaft der Wissenschaften zu Göttingen, Mathematisch-Physikalische Klasse, pp. 1–57.

von Neumann (1927). Wahrscheinlichkeitstheoretischer Aufbau der Quantenmechanik (“Probabilistic Theory of Quantum Mechanics”) in Nachrichten von der Gesellschaft der Wissenschaften zu Göttingen, Mathematisch-Physikalische Klasse, pp. 245–272.

von Neumann (1927). Thermodynamik quantenmechanischer Gesamtheiten (“Thermodynamics of Quantum Mechanical Ensembles”) in Nachrichten von der Gesellschaft der Wissenschaften zu Göttingen, Mathematisch-Physikalische Klasse, pp. 273–291.

von Neumann (1930). Allgemeine Eigenwerttheorie Hermitescher Funktionaloperatoren (“General Eigenvalue Theory of Hermitian Functional Operators”) in Mathematische Annalen 102 (1), pp. 49–131.

von Neumann (1931). Die Eindeutigkeit der Schrödingerschen Operatoren (“The Uniqueness of Schrödinger Operators”) in Mathematische Annalen 104, pp. 570–578.

His basic insight, which neither Heisenberg, Bohr nor Schrödinger had, was, in the words of Paul Halmos, “that the geometry of the vectors in a Hilbert space has the same formal properties as the structure of the states of a quantum mechanical system” (Macrae, 1992). That is, von Neumann realized that a state of a quantum system could be represented by a point (a vector) of a complex Hilbert space that, in general, could be infinite-dimensional even for a single particle. In such a formal view of quantum mechanics, observable quantities such as position or momentum are represented as linear operators acting on the Hilbert space associated with the quantum system (Macrae, 1992). The uncertainty principle, for instance, is translated in von Neumann’s system into the non-commutativity of the two corresponding operators.
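A finite-dimensional example shows the idea in miniature. Position and momentum need an infinite-dimensional Hilbert space. The Python sketch below therefore assumes a two-dimensional (spin-½) system and uses the Pauli operators, which are not mentioned in the source. It does two things:

- checks that two observables fail to commute;
- for a few states, checks that the spreads of their measurement results obey the Robertson bound ΔA·ΔB ≥ |⟨[A, B]⟩|/2, the standard general form of the uncertainty relation.

The helper names are hypothetical.

```python
import cmath
import math

def matmul(A, B):
    """Product of two 2x2 complex matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def commutator(A, B):
    """[A, B] = AB - BA; zero exactly when the two operators commute."""
    AB, BA = matmul(A, B), matmul(B, A)
    return [[AB[i][j] - BA[i][j] for j in range(2)] for i in range(2)]

def expect(A, psi):
    """<psi|A|psi> for a normalized state vector psi."""
    return sum(psi[i].conjugate() * A[i][j] * psi[j] for i in range(2) for j in range(2))

def spread(A, psi):
    """Standard deviation of the outcomes of measuring the Hermitian operator A."""
    mean = expect(A, psi).real
    return math.sqrt(max(expect(matmul(A, A), psi).real - mean ** 2, 0.0))

# Pauli operators: observables of a spin-1/2 system (an assumption of this sketch).
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]

print(commutator(X, Y) == [[2j * z for z in row] for row in Z])  # True: [X, Y] = 2iZ, not zero

for theta, phi in [(0.0, 0.0), (0.7, 0.3), (2.0, 1.5)]:
    psi = [complex(math.cos(theta / 2)), cmath.exp(1j * phi) * math.sin(theta / 2)]
    lhs = spread(X, psi) * spread(Y, psi)
    rhs = abs(expect(commutator(X, Y), psi)) / 2
    print(f"theta={theta}: dX*dY = {lhs:.4f} >= {rhs:.4f}")
```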

In summary, von Neumann’s contributions to quantum mechanics were broadly two-fold, consisting of:

The mathematical framework of quantum theory, where states of the physical system are described by Hilbert space vectors and measurable quantities (such as position, momentum and energy) by unbounded Hermitian operators acting upon them; and

The statistical aspects of quantum theory. In the course of his formulation of quantum mechanics in terms of vectors and operators in Hilbert spaces, von Neumann also gave the basic rule for how the theory should be understood statistically (Van Hove, 1958). That is, the probability distribution of the results of measuring a given physical quantity on a system in a given quantum state is expressed by means of the vector representing the state and the spectral resolution of the operator representing the physical quantity.
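The statistical rule in the second point can be sketched for a toy two-level system. The Python below is an illustration under stated assumptions, not von Neumann's notation. It proceeds in three steps:

- takes a 2×2 Hermitian matrix as the observable (the code assumes, without checking, that the matrix is Hermitian);
- finds its spectral resolution, i.e. eigenvalues with orthonormal eigenvectors, in closed form;
- assigns each eigenvalue the probability |⟨e_k|ψ⟩|², the squared overlap of the state with the corresponding eigenvector.

As a consistency check, the probabilities sum to one and their weighted mean equals ⟨ψ|A|ψ⟩. The function names and the example matrix are hypothetical.

```python
import cmath
import math

def spectral_resolution(H):
    """Eigenvalues and orthonormal eigenvectors of a 2x2 Hermitian matrix
    [[a, b], [conj(b), d]]; the lower-left entry is assumed, not checked."""
    a, b, d = H[0][0].real, complex(H[0][1]), H[1][1].real
    if abs(b) < 1e-12:                       # already diagonal
        return [(a, [1.0, 0.0]), (d, [0.0, 1.0])]
    mean, half_gap = (a + d) / 2, math.hypot((a - d) / 2, abs(b))
    result = []
    for lam in (mean + half_gap, mean - half_gap):
        v = [b, lam - a]                     # solves (H - lam*I) v = 0 when b != 0
        norm = math.sqrt(abs(v[0]) ** 2 + abs(v[1]) ** 2)
        result.append((lam, [v[0] / norm, v[1] / norm]))
    return result

def outcome_probabilities(H, psi):
    """Born rule: P(lam_k) = |<e_k|psi>|^2 for each eigenpair (lam_k, e_k) of H."""
    return [(lam, abs(sum(complex(e).conjugate() * p for e, p in zip(vec, psi))) ** 2)
            for lam, vec in spectral_resolution(H)]

def expect(H, psi):
    """<psi|H|psi>, computed directly from the matrix."""
    return sum(complex(psi[i]).conjugate() * H[i][j] * psi[j]
               for i in range(2) for j in range(2))

A = [[1, 1j], [-1j, -1]]                                  # a Hermitian observable
psi = [math.cos(0.3), cmath.exp(0.7j) * math.sin(0.3)]    # a normalized state

probs = outcome_probabilities(A, psi)
for lam, p in probs:
    print(f"outcome {lam:+.4f} with probability {p:.4f}")
print(sum(p for _, p in probs))                           # 1, up to rounding
print(sum(lam * p for lam, p in probs), expect(A, psi).real)  # the two means agree
```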

First edition copy of Mathematische Grundlagen der Quantenmechanik (1932) by John von Neumann

His work on quantum mechanics was eventually collected in the highly influential 1932 book Mathematische Grundlagen der Quantenmechanik (“Mathematical Foundations of Quantum Mechanics”), considered the first rigorous and complete mathematical formulation of quantum mechanics.

“Quantum mechanics was very fortunate indeed to attract, in the very first years after its discovery in 1925, the interest of a mathematical genius of von Neumann’s stature. As a result, the mathematical framework of the theory was developed and the formal aspects of its entirely novel rules of interpretation were analysed by one single man in two years (1927–1929).” — Van Hove (1958)

Operator theory

Following his work in set theory and quantum mechanics, while still in Berlin, von Neumann next turned his attention to algebra, in particular operator theory, which concerns the study of linear operators on function spaces. The simplest examples are the differential and integral operators we all remember from calculus. von Neumann introduced the study of rings of operators through his invention of what are now known as von Neumann algebras, defined as

Definition of a von Neumann algebra

A von Neumann algebra is a *-algebra of bounded operators on a Hilbert space that is closed in the weak operator topology and contains the identity operator.

The work was published in the paper Zur Algebra der Funktionaloperationen und Theorie der normalen Operatoren (“On the Algebra of Functional Operations and Theory of Normal Operators”) in 1930.

In America

John von Neumann first travelled to America in October 1929. He was then still a Privatdozent at the University of Hamburg and had been invited to lecture on quantum theory at Princeton University. The visit led to an invitation to return as a visiting professor, which he did in the years 1930–33. The same year this tenure finished, Adolf Hitler came to power in Germany. This led von Neumann to abandon his academic posts in Europe altogether, saying of the Nazi regime:

“If these boys continue for two more years, they will ruin German science for a generation — at least”

By most accounts, von Neumann’s prediction proved true. The following year, the Nazi minister of education asked David Hilbert how mathematics was going at Göttingen, “now that it is free from the Jewish influence”. Hilbert is said to have replied:

“There is no mathematics in Göttingen anymore.”

At Princeton University (1930–1933)

The circumstances under which von Neumann (and a plethora of other first-rate mathematicians and physicists) found themselves in Princeton, New Jersey in the mid-1930s are by now well known.

In the case of von Neumann in particular, he was recruited by Princeton University professor Oswald Veblen, alongside his Lutheran high school contemporary Eugene Wigner. According to Wigner (Macrae, 1992), Princeton had decided to:

"...invite not a single person but at least two, who already knew each other, who wouldn't suddenly feel put on an island where they had no intimate contact with anybody. Johnny's name was of course well known by that time the world over, so they decided to invite Johnny von Neumann. They looked: who wrote articles with John von Neumann? They found: Mr. Wigner. So they sent a telegram to me also." - Excerpt, John von Neumann by Norman Macrae (1992)

And so von Neumann first came to Princeton in 1930 as a visiting professor. Regarding his work while there, von Neumann himself later in life especially highlighted his work on ergodic theory.

Ergodic theory

Ergodic theory is the branch of mathematics that studies the statistical properties of deterministic dynamical systems. Formally, ergodic theory is concerned with the states of dynamical systems with an invariant measure. Informally, think of how the planets move according to Newtonian mechanics in a solar system: the planets move, but the rule governing their motion remains constant. In two papers published in 1932, von Neumann made foundational contributions to the theory of such systems. These include von Neumann’s mean ergodic theorem, considered the first rigorous mathematical basis for the statistical mechanics of liquids and gases. The two papers are titled Proof of the Quasi-ergodic Hypothesis (1932) and Physical Applications of the Ergodic Hypothesis (1932).
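The central idea is that long-run time averages along a trajectory match averages over the whole space, and it can be illustrated numerically. The sketch below is an illustrative toy built on our own assumptions, not von Neumann's proof. It uses the rotation T(x) = x + α (mod 1) on the unit interval with the irrational angle α = √2 − 1, a map that is ergodic for the uniform measure. The observable is f(x) = x², whose space average is 1/3. The mean ergodic theorem asserts convergence in the L² (mean-square) sense, so the sketch measures the root-mean-square gap between time and space averages over many starting points.

```python
import math
from typing import Callable

# Irrational rotation angle: an illustrative assumption, not from the article.
ALPHA = math.sqrt(2) - 1


def rotate(x: float, k: int, alpha: float = ALPHA) -> float:
    """Apply T(x) = x + alpha (mod 1) k times, i.e. compute T^k(x)."""
    return (x + k * alpha) % 1.0


def ergodic_average(f: Callable[[float], float], x: float, n: int,
                    alpha: float = ALPHA) -> float:
    """Time average (1/n) * sum_{k=0}^{n-1} f(T^k x) along one trajectory."""
    return sum(f(rotate(x, k, alpha)) for k in range(n)) / n


def mean_square_error(f: Callable[[float], float], space_avg: float, n: int,
                      samples: int = 200) -> float:
    """Root-mean-square (L^2) gap between the time averages and the space
    average, estimated over evenly spaced starting points in [0, 1)."""
    total = 0.0
    for i in range(samples):
        gap = ergodic_average(f, i / samples, n) - space_avg
        total += gap * gap
    return math.sqrt(total / samples)


def square(x: float) -> float:
    """Observable f(x) = x^2; its integral over [0, 1) is 1/3."""
    return x * x


for n in (10, 100, 1000):
    print(n, mean_square_error(square, 1 / 3, n))
```

With a rational angle such as α = 1/2 the rotation is not ergodic. In that case the time averages depend on the starting point, and the printed gap would not shrink toward zero.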

A subfield of measure theory, ergodic theory, in other words, concerns the behavior of dynamical systems that are allowed to run for a long time. von Neumann’s ergodic theorem is one of the two most important theorems in the field, the other being Birkhoff’s (1931). According to Halmos (1958):

"The profound insight to be gained from [von Neumann's] paper is that the whole problem is essentially group-theoretic in character, and that, in particular, for the solvability of the problem of measure the ordinary algebraic concept of solvability of a group is relevant. Thus, according to von Neumann, it is the change of group that makes a difference, not the change of space; replacing the group of rigid motions by other perfectly reasonable groups we can produce unsolvable problems in R² and solvable ones in R³." - Excerpt, Von Neumann on Measure and Ergodic Theory by Paul R. Halmos (1958)

“If von Neumann had never done anything else, they would have been sufficient to guarantee him mathematical immortality” — Paul Halmos (1958)

At the Institute for Advanced Study

Following his three-year stay as a visiting professor at Princeton in the period 1930–33, von Neumann was offered a lifetime professorship on the faculty of the Institute for Advanced Study (IAS) in 1933. He was 30 years old. The offer came after the institute’s plan to appoint von Neumann’s former professor Hermann Weyl fell through (Macrae, 1992). The institute had been founded only three years earlier, and von Neumann became one of its first six professors. The others were J. W. Alexander, Albert Einstein, Marston Morse, Oswald Veblen and, eventually, Hermann Weyl.

Institute for Advanced Study in Princeton, New Jersey (Photo: Cliff Compton)

When he joined in 1933, the Institute was still housed in the math department of Princeton University’s Fine Hall. The Institute for Advanced Study was founded in 1930 by Abraham Flexner and funded by philanthropic money from Louis Bamberger and Caroline Bamberger Fuld. It was, and still is, an institution unlike any other. Inspired by Flexner’s experiences at Heidelberg University, All Souls College, Oxford and the Collège de France, the IAS has been described as

“A first-rate research institution with no teachers, no students, no classes, but only researchers protected from the vicissitudes and pressures of the outside world.” — Sylvia Nasar (1998)

In a matter of a few years in the early 1930s, the Institute for Advanced Study effectively inherited the University of Göttingen’s throne as the foremost center of the mathematical universe. (In 1939 it moved to its own campus and common room, Fuld Hall.) The dramatic and swift change has since become known as the “Great Purge” of 1933, as a number of top-rate academics fled Europe, fearing for their safety. Besides von Neumann and Wigner, they of course included Einstein (1933), Max Born (1933), and fellow Budapestians Leó Szilárd (1938) and Edward Teller (1933). Others were Edmund Landau (1927), James Franck (1933) and Richard Courant (1933).

Left: Photo of part of the faculty at the Institute for Advanced Study, including its most famous resident, Albert Einstein, with John von Neumann visible in the background. Right: Julian Bigelow, Herman Goldstine, J. Robert Oppenheimer and John von Neumann in front of MANIAC, the Institute for Advanced Study computer.

Geometry

While at the Institute for Advanced Study, von Neumann founded the field of continuous geometry, an analogue of complex projective geometry. In projective geometry the dimension of a subspace lies in the discrete set 0, 1, …, n; in continuous geometry it can instead be any element of the unit interval [0,1].

A continuous geometry is a lattice L with the following properties:
- L is modular
- L is complete
- The lattice operations satisfy a continuity property
- Every element in L has a complement
- L is irreducible, meaning the only elements with unique complements are 0 and 1
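The modularity and continuity conditions can be written out in standard lattice notation, together with the dimension function whose values fill the unit interval. The symbols are introduced here as an aid and are not taken from the article: ∨ is join, ∧ is meet and ≤ is the order. The up and down arrows denote chains converging to their supremum and infimum, respectively.

$$a \le c \;\Longrightarrow\; a \vee (b \wedge c) = (a \vee b) \wedge c \qquad \text{(modularity)}$$

$$a_i \uparrow a \;\Longrightarrow\; (a_i \wedge b) \uparrow (a \wedge b), \qquad a_i \downarrow a \;\Longrightarrow\; (a_i \vee b) \downarrow (a \vee b) \qquad \text{(continuity)}$$

$$d : L \to [0,1], \qquad d(0) = 0, \quad d(1) = 1, \quad d(a \vee b) + d(a \wedge b) = d(a) + d(b)$$

In the finite-dimensional projective case, d is the ordinary dimension divided by n, so it takes only the values 0, 1/n, …, 1. In the genuinely continuous case its range is all of [0,1].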

As with his result in ergodic theory, von Neumann published two papers on continuous geometry, one proving its existence and discussing its properties, and one providing examples:

von Neumann (1936). Continuous geometry. Proceedings of the National Academy of Sciences 22 (2) pp. 92–100.
von Neumann (1936). Examples of continuous geometries. Proceedings of the National Academy of Sciences 22 (2) pp. 101–108.

The Manhattan Project (1937–1945)

In addition to his academic pursuits, beginning in the mid-to-late 1930s, von Neumann developed an expertise in the science of explosions, phenomena that are very hard to model mathematically. In particular, von Neumann became a leading authority on the mathematics of shaped charges, explosive charges shaped to focus the energy of the explosion.

By 1937, according to Macrae, von Neumann had decided for himself that war was clearly coming. Although obviously suited for advanced strategic and operations work, he instead humbly applied to become a lieutenant in the reserve of the ordnance department of the U.S. Army. Membership in the Officers’ Reserve Corps would give him trouble-free access to various sorts of explosion statistics, which he thought would be fascinating (Macrae, 1992).

Left: The photo from von Neumann’s Los Alamos ID badge. Right: John von Neumann talking with Richard Feynman and Stanislaw Ulam in Los Alamos.

Needless to say, von Neumann’s main contributions to the atomic bomb would not be as a lieutenant in the reserve of the ordnance department. They lay instead in the concept and design of the explosive lenses needed to compress the plutonium core of the Fat Man weapon that was later dropped on Nagasaki.

As a member of the Manhattan Project in Los Alamos, New Mexico, von Neumann showed in 1944 that the pressure increase from the reflection of an explosion’s shock wave off solid objects was greater than previously believed, depending on the wave’s angle of incidence. The discovery led to the decision to detonate atomic bombs high above the target rather than on impact (Macrae, 1992). von Neumann was present at the Trinity test on July 16th, 1945, in the New Mexico desert, the first successful detonation of an atomic bomb.

Work on philosophy

von Neumann speaking at the American Philosophical Society in 1957. Photo: Alfred Eisenstaedt

Macrae (1992) makes the point that von Neumann was one of the foremost mathematicians of his lifetime, but that in many ways he should perhaps also be considered one of his era’s most important philosophers. Professor of philosophy John Dorling at the University of Amsterdam, in particular, highlights von Neumann’s contributions to the philosophy of mathematics (including set theory, number theory and Hilbert spaces), physics (especially quantum theory), economics (game theory), biology (cellular automata), computers and artificial intelligence.

His work on the latter two, computers and artificial intelligence (AI), began while he was in Princeton in the mid-1930s. There he first met the 24-year-old Alan Turing, who spent a year at the IAS in 1936–37. Turing began his career working in the same fields von Neumann had: set theory, logic and Hilbert’s Entscheidungsproblem. By the time he finished his Ph.D. at Princeton in 1938, Turing had extended the work of von Neumann and Gödel and introduced ordinal logic and the notion of relative computing. He augmented his previously devised Turing machines with so-called oracle machines, allowing the study of problems that lie beyond the capabilities of Turing machines. After Turing's Ph.D., von Neumann offered him a position as his postdoctoral research assistant, but Turing declined and instead travelled back to England (Macrae, 1992).

Work on computing

"After having been here for a month, I was talking to von Neumann about various kinds of inductive processes and evolutionary processes, and just as an aside he said, "Of course that's what Turing was talking about." And I said, "Who's Turing?". And he said, "Go look up Proceedings of the London Mathematical Society, 1937". The fact that there is a universal machine to imitate all other machines ... was understood by von Neumann and few other people. And when he understood it, then he knew what we could do." - Julian Bigelow. Excerpt, Turing's Cathedral by George Dyson (2012)

Although Turing left, von Neumann continued thinking about computers through the late 1930s and the war. Following his experiences working on the Manhattan Project, he was drawn into the ENIAC project at the Moore School of Electrical Engineering at the University of Pennsylvania during the summer of 1944. He had observed the large amounts of calculation needed to predict blast radii, plan bomb paths and break encryption schemes, and saw early on the need for substantial increases in computing power.

In 1945, von Neumann proposed a description of a computer architecture, now known as the von Neumann architecture, that includes the basic components of a modern electronic digital computer:

- A processing unit that contains an arithmetic logic unit and processor registers;
- A control unit that contains an instruction register and a program counter;
- A memory unit that can store data and instructions;
- External storage; and
- Input and output mechanisms.
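A minimal sketch of a toy machine in this style makes the stored-program idea concrete. Everything about it is hypothetical and chosen for illustration: the four opcodes, the encoding of an instruction as opcode × 100 + address, and the single accumulator register. It is not modeled on the EDVAC or the IAS machine. It does show the architecture's defining trait: instructions and data sit in the same memory. A control loop fetches the instruction at the program counter into an instruction register, decodes it and executes it.

```python
from typing import List

# Hypothetical opcodes for this toy machine (not from any real computer).
HALT, LOAD, ADD, STORE = 0, 1, 2, 3


def run(memory: List[int], max_steps: int = 1000) -> List[int]:
    """Execute a program stored in the same memory as its data."""
    pc = 0   # program counter (control unit)
    acc = 0  # accumulator register (processing unit)
    for _ in range(max_steps):
        ir = memory[pc]                    # fetch into the instruction register
        pc += 1
        opcode, address = divmod(ir, 100)  # decode: opcode * 100 + address
        if opcode == HALT:
            return memory
        elif opcode == LOAD:
            acc = memory[address]
        elif opcode == ADD:
            acc += memory[address]
        elif opcode == STORE:
            memory[address] = acc
        else:
            raise ValueError(f"unknown opcode {opcode} at address {pc - 1}")
    raise RuntimeError("step limit reached")


# Program in cells 0-3, data in cells 20-22: memory[22] = memory[20] + memory[21].
memory = [0] * 100
memory[0:4] = [LOAD * 100 + 20, ADD * 100 + 21, STORE * 100 + 22, HALT * 100]
memory[20], memory[21] = 7, 35
run(memory)
print(memory[22])  # 42
```

Because the program is just numbers in memory, a STORE aimed at cells 0 to 3 would rewrite the program itself. That flexibility is what distinguishes a stored-program design from machines wired or plugged for a fixed task.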

John von Neumann with the IAS machine, sometimes called the “von Neumann Machine”, stored in the basement of Fuld Hall from 1942–1951 (Photo: Alan Richards)

Also in 1945, in software engineering, von Neumann invented the so-called merge sort algorithm. It divides an array in half, sorts each half recursively and then merges the sorted halves. von Neumann himself wrote, in ink, the first sorting program for the EDVAC computer, 23 pages long. In a pioneering 1953 paper entitled Probabilistic Logics and the Synthesis of Reliable Organisms from Unreliable Components, von Neumann was the first to introduce stochastic computing. The idea was so groundbreaking that it could not be implemented for another decade or so (Petrovik & Siljak, 1962). Relatedly, von Neumann created the field of cellular automata through his rigorous mathematical treatment of the structure of self-replication, which preceded the discovery of the structure of DNA by several years.
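The divide, sort recursively, and merge steps of merge sort can be sketched in Python. This is a textbook top-down version written for illustration, not a reconstruction of von Neumann's 1945 EDVAC program. The function names merge_sort and merge are our own.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def merge(left: List[T], right: List[T]) -> List[T]:
    """Merge two already-sorted lists into one sorted list."""
    merged: List[T] = []
    i = j = 0
    while i < len(left) and j < len(right):
        # Taking from `left` on ties keeps equal items in their original order (stability).
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # One side is exhausted; whatever remains on the other side is already sorted.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def merge_sort(items: Sequence[T]) -> List[T]:
    """Return a new sorted list: split in half, sort each half recursively, merge."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    return merge(merge_sort(items[:mid]), merge_sort(items[mid:]))


print(merge_sort([38, 27, 43, 3, 9, 82, 10]))  # [3, 9, 10, 27, 38, 43, 82]
```

Each level of recursion does linear merging work, and there are about log₂ n levels, so the running time is O(n log n) regardless of the input order.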

Although influential in his own right, throughout his life von Neumann made sure to acknowledge that the central concept of the modern computer was due to Turing’s 1936 paper On Computable Numbers, with an Application to the Entscheidungsproblem (Fraenkel, 1972):

“von Neumann firmly emphasised to me, and to others I am sure, that the fundamental conception is owing to Turing — insofar as not anticipated by Babbage, Lovelace and others.” — Stanley Fraenkel (1972)

Consultancies

"The only part of your thinking we'd like to bid for systematically is that which you spend shaving: we'd like you to pass on to us any ideas that come to you while so engaged." - Excerpt, Letter from the Head of the RAND Corporation to von Neumann (Poundstone, 1992)

Throughout his career in America, von Neumann held a number of consultancies for various private, public and defense organizations. These included the National Defense Research Committee (NDRC), the Weapons Systems Evaluation Group (WSEG), the Central Intelligence Agency (CIA), the Lawrence Livermore National Laboratory (LLNL) and the RAND Corporation. He was also an advisor to the Armed Forces Special Weapons Project and a member of the General Advisory Committee of the Atomic Energy Commission and of the Scientific Advisory Group of the United States Air Force. In 1955 he became a commissioner of the Atomic Energy Commission (AEC).

Personality

Despite his many appointments, responsibilities and copious research output, von Neumann lived a rather unusual lifestyle for a mathematician. As described by Vonneuman and Halmos:

“Parties and nightlife held a special appeal for von Neumann. While teaching in Germany, von Neumann had been a denizen of the Cabaret-era Berlin nightlife circuit.” — Vonneuman (1987)

The parties at the von Neumanns’ house were frequent, and famous, and long. — Halmos (1958)

John von Neumann with his wife Klari Dan and their dog (Photo: Alan Richards)

His second wife, Klara, said that he could count everything except calories.

von Neumann also enjoyed Yiddish and dirty jokes, especially limericks (Halmos, 1958). He was a non-smoker, but at the IAS received complaints for regularly playing extremely loud German march music on the gramophone in his office, distracting those in neighboring offices, including Albert Einstein. In fact, von Neumann claimed to do some of his best work in noisy, chaotic environments such as in the living room of his house with the television blaring. Despite being a bad driver, he loved driving, often while reading books, leading to various arrests and accidents.

Von Neumann in the Florida Everglades in 1938 (Photo: Marina von Neumann Whitman)

As a thinker

Stanislaw Ulam, one of von Neumann’s close friends, described von Neumann’s mastery of mathematics as follows:

“Most mathematicians know one method. For example, Norbert Wiener had mastered Fourier transforms. Some mathematicians have mastered two methods and might really impress someone who knows only one of them. John von Neumann had mastered three methods: 1) A facility for the symbolic manipulation of linear operators, 2) An intuitive feeling for the logical structure of any new mathematical theory; and 3) An intuitive feeling for the combinatorial superstructure of new theories.”

Biographer Sylvia Nasar illustrates von Neumann’s own “thinking machine” with the following, now well-known anecdote regarding the so-called “two trains puzzle” (here told with two bicyclists and a fly):

Two bicyclists start twenty miles apart and head toward each other, each going at a steady rate of 10 m.p.h. At the same time, a fly that travels at a steady 15 m.p.h. starts from the front wheel of the southbound bicycle and flies to the front wheel of the northbound one, then turns around and flies to the front wheel of the southbound one again, and continues in this manner till he is crushed between the two front wheels. Question: what total distance did the fly cover? There are two ways to answer the problem. One is to calculate the distance the fly covers on each leg of its trips between the two bicycles and finally sum the infinite series so obtained. The quick way is to observe that the bicycles meet exactly an hour after they start so that the fly had just an hour for his travels; the answer must therefore be 15 miles. When the question was put to von Neumann, he solved it in an instant, and thereby disappointed the questioner: “Oh, you must have heard the trick before!” “What trick,” asked von Neumann, “all I did was sum the infinite series.” - Excerpt, A Beautiful Mind (Nasar, 1998)
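Both routes in the anecdote are easy to check. With the puzzle's numbers, the fly's first leg lasts 20 / (15 + 10) = 0.8 hours and covers 12 miles. Meanwhile the gap between the bicycles shrinks from 20 to 4 miles, one fifth of its size. Every later leg shrinks by the same factor, so the series is geometric: 12 + 12/5 + 12/25 + … = 12 / (1 − 1/5) = 15. The Python sketch below sums the legs one by one, the way von Neumann claimed to, and compares the result with the one-hour shortcut. The function names are our own.

```python
def fly_distance_by_series(gap: float = 20.0, bike_speed: float = 10.0,
                           fly_speed: float = 15.0, tol: float = 1e-12) -> float:
    """Sum the fly's legs one at a time until the remaining gap is negligible.
    Assumes the fly is faster than each bicycle."""
    total = 0.0
    while gap > tol:
        # The fly and the oncoming bicycle close the gap at fly_speed + bike_speed.
        leg_time = gap / (fly_speed + bike_speed)
        total += fly_speed * leg_time
        # During that leg, the two bicycles close the gap at 2 * bike_speed.
        gap -= 2 * bike_speed * leg_time
    return total


def fly_distance_by_shortcut(gap: float = 20.0, bike_speed: float = 10.0,
                             fly_speed: float = 15.0) -> float:
    """The quick way: the fly flies for exactly as long as the bicycles take to meet."""
    return fly_speed * gap / (2 * bike_speed)


print(fly_distance_by_series(), fly_distance_by_shortcut())
```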

As a supervisor

In the paper Szeged in 1934 (Lorch, 1993), Edgar R. Lorch describes his experience of working as an assistant to von Neumann in the 1930s, including his duties:

- Attending von Neumann’s lectures on operator theory, taking notes, completing unfinished proofs and circulating them to all American university libraries;
- Assisting von Neumann in his role as the editor of the Annals of Mathematics by reading through every manuscript accepted to the publication, underlining Greek letters in red and German letters in green, circling italics, writing notes to printers in the margins and going once per week to the printers in order to instruct them in the art of typesetting;
- Translating von Neumann’s numerous 100-page papers into English.

"His fluid line of thought was difficult for those less gifted to follow. He was notorious for dashing out equations on a small portion of the available blackboard and erasing expressions before students could copy them." - Excerpt, John von Neumann: As Seen by his Brother by N.A. Vonneuman (1987)

Later years

President Dwight D. Eisenhower (left) presenting John von Neumann (right) the Presidential Medal of Freedom in 1956

In 1955, von Neumann was diagnosed with what was likely either bone, pancreatic or prostate cancer (accounts differ on which diagnosis was made first). He was 51 years old. After two years of illness, which by the end confined him to a wheelchair, he died on the 8th of February 1957, at 53 years old. On his deathbed, he reportedly entertained his brother by reciting, word for word and by heart, the first few lines of each page of Goethe’s Faust (Blair, 1957).

He is buried at Princeton Cemetery in Princeton, New Jersey, alongside his lifelong friends Eugene Wigner and Kurt Gödel. A year before von Neumann’s death, Gödel wrote him a letter that has since been made public. The letter is discussed in detail by Hartmanis (1989) in his working paper The Structural Complexity Column. An excerpt is included below:

Letter from Kurt Gödel to von Neumann, March 20th 1956

Dear Mr. von Neumann: With the greatest sorrow I have learned of your illness. The news came to me as quite unexpected. Morgenstern already last summer told me of a bout of weakness you once had, but at that time he thought that this was not of any greater significance. As I hear, in the last months you have undergone a radical treatment and I am happy that this treatment was successful as desired, and that you are now doing better. I hope and wish for you that your condition will soon improve even more and that the newest medical discoveries, if possible, will lead to a complete recovery. [...] I would be very happy to hear something from you personally. Please let me know if there is something that I can do for you. With my best greetings and wishes, as well to your wife,

Sincerely yours, Kurt Gödel

P.S. I heartily congratulate you on the award that the American government has given to you

Interview on Television

Remarkably, there exists a video interview with von Neumann on the NBC show America’s Youth Wants to Know from the early 1950s.

For anyone interested in learning more about the life and work of John von Neumann, I especially recommend his friend Stanislaw Ulam’s 1958 essay John von Neumann 1903–1957 in the Bulletin of the American Mathematical Society 64 (3) pp. 1–49 and the book John von Neumann by Norman Macrae (1992).

This essay is part of a series of stories on math-related topics, published in Cantor’s Paradise, a weekly Medium publication. Thank you for reading!

Twenty years after an apparent anomaly in the behavior of elementary particles raised hopes of a major physics breakthrough, a new measurement has solidified them: Physicists at Fermi National Accelerator Laboratory near Chicago announced today that muons — elementary particles similar to electrons — wobbled more than expected while whipping around a magnetized ring.

The widely anticipated new measurement confirms the decades-old result, which made headlines around the world. Both measurements of the muon’s wobbliness, or magnetic moment, significantly overshoot the theoretical prediction, as calculated last year by an international consortium of 132 theoretical physicists. The Fermilab researchers estimate that the difference has grown to a level quantified as “4.2 sigma,” well on the way to the stringent five-sigma level that physicists need to claim a discovery.

Taken at face value, the discrepancy strongly suggests that unknown particles of nature are giving muons an extra push. Such a discovery would at long last herald the breakdown of the 50-year-old Standard Model of particle physics — the set of equations describing the known elementary particles and interactions.

“Today is an extraordinary day, long awaited not only by us but by the whole international physics community,” Graziano Venanzoni, one of the leaders of the Fermilab Muon g-2 experiment and a physicist at the Italian National Institute for Nuclear Physics, told the press.

However, even as many particle physicists are likely to be celebrating — and racing to propose new ideas that could explain the discrepancy — a paper published today in the journal Nature casts the new muon measurement in a dramatically duller light.

The paper, which appeared just as the Fermilab team unveiled its new measurement, suggests that the muon’s measured wobbliness is exactly what the Standard Model predicts.

In the paper, a team of theorists known as BMW present a state-of-the-art supercomputer calculation of the most uncertain term that goes into the Standard Model prediction of the muon’s magnetic moment. BMW calculates this term to be considerably larger than the value adopted last year by the consortium, a group known as the Theory Initiative. BMW’s larger term leads to a larger overall predicted value of the muon’s magnetic moment, bringing the prediction in line with the measurements.

If the new calculation is correct, then physicists may have spent 20 years chasing a ghost. But the Theory Initiative’s prediction relied on a different calculational approach that has been honed over decades, and it could well be right. In that case, Fermilab’s new measurement constitutes the most exciting result in particle physics in years.

“This is a very sensitive and interesting situation,” said Zoltan Fodor, a theoretical particle physicist at Pennsylvania State University who is part of the BMW team.

BMW’s calculation itself is not breaking news; the paper first appeared as a preprint last year. Aida El-Khadra, a particle theorist at the University of Illinois who co-organized the Theory Initiative, explained that the BMW calculation should be taken seriously, but that it wasn’t factored into the Theory Initiative’s overall prediction because it still needed vetting. If other groups independently verify BMW’s calculation, the Theory Initiative will integrate it into its next assessment.

Dominik Stöckinger, a theorist at the Technical University of Dresden who participated in the Theory Initiative and is a member of the Fermilab Muon g-2 team, said the BMW result creates “an unclear status.” Physicists can’t say whether exotic new particles are pushing on muons until they agree about the effects of the 17 Standard Model particles they already know about.

Regardless, there’s plenty of reason for optimism: Researchers emphasize that even if BMW is right, the puzzling gulf between the two calculations could itself point to new physics. But for the moment, the past 20 years of conflict between theory and experiment appear to have been replaced by something even more unexpected: a battle of theory versus theory.

Momentous Muons

The reason physicists have eagerly awaited Fermilab’s new measurement is that the muon’s magnetic moment, essentially the strength of its intrinsic magnetism, encodes a huge amount of information about the universe.

A century ago, physicists assumed that the magnetic moments of elementary particles would follow the same formula as larger objects. Instead they found that electrons rotate in magnetic fields twice as much as expected. Their “gyromagnetic ratio,” or “g-factor” — the number relating their magnetic moment to their other properties — seemed to be 2, not 1, a surprise discovery later explained by the fact that electrons are “spin-1/2” particles, which return to the same state after making two full turns rather than one.

For years, both electrons and muons were thought to have g-factors of exactly 2. But then in 1947, Polykarp Kusch and Henry Foley measured the electron’s g-factor to be 2.00232. The theoretical physicist Julian Schwinger almost immediately explained the extra bits: He showed that the small corrections come from an electron’s tendency to momentarily emit and reabsorb a photon as it moves through space.
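Schwinger's correction can be checked with a few lines of arithmetic. Standard notation, not used in the article, writes the deviation from 2 as the anomaly a = (g − 2)/2. Schwinger's one-photon term is a = α/2π, where α ≈ 1/137.036 is the fine-structure constant. The Python sketch below assumes that rounded value of α.

```python
import math

# Fine-structure constant, rounded (assumed value for this illustration).
ALPHA = 1 / 137.035999

# Schwinger's one-photon result for the anomaly a = (g - 2) / 2.
a_schwinger = ALPHA / (2 * math.pi)
g_schwinger = 2 * (1 + a_schwinger)

print(f"a = {a_schwinger:.7f}, g = {g_schwinger:.5f}")  # a = 0.0011614, g = 2.00232
```

This single term already reproduces Kusch and Foley's 2.00232 to the digits they reported. The rarer fluctuations described next refine the later decimal places.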

Many other fleeting quantum fluctuations happen as well. An electron or muon might emit and reabsorb two photons, or a photon that briefly becomes an electron and a positron, among countless other possibilities that the Standard Model allows. These temporary manifestations travel around with an electron or muon like an entourage, and all of them contribute to its magnetic properties. “The particle you thought was a bare muon is actually a muon plus a cloud of other things that appear spontaneously,” said Chris Polly, another leader of the Fermilab Muon g-2 experiment. “They change the magnetic moment.”

The rarer a quantum fluctuation, the less it contributes to the electron or muon’s g-factor. “As you go further into the decimal places you can see where suddenly the quarks start to appear for the first time,” said Polly. Further along are particles called W and Z bosons, and so on. Because muons are 207 times heavier than electrons, they’re about 207² (or 43,000) times more likely to conjure up heavy particles in their entourage; these particles therefore alter the muon’s g-factor far more than an electron’s. “So if you’re looking for particles that could explain the missing mass of the universe — dark matter — or you’re looking for particles of a theory called supersymmetry,” Polly said, “that’s where the muon has a unique role.”
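The 43,000 figure follows from a standard scaling argument, stated here as background rather than taken from the article. A heavy particle of mass M typically shifts a lepton's anomaly a = (g − 2)/2 in proportion to m_ℓ²/M², where m_ℓ is the lepton's mass. For the same heavy particle, the muon-to-electron ratio of the shifts is therefore

$$\frac{\delta a_\mu}{\delta a_e} \approx \left(\frac{m_\mu}{m_e}\right)^2 \approx 207^2 = 42{,}849 \approx 4.3 \times 10^4 .$$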

For decades, theorists have strived to calculate contributions to the muon’s g-factor coming from increasingly unlikely iterations of known particles from the Standard Model, while experimentalists measured the g-factor with ever-increasing precision. If the measurement were to outstrip the expectation, this would betray the presence of strangers in the muon’s entourage: fleeting appearances of particles beyond the Standard Model.

Muon magnetic moment measurements began at Columbia University in the 1950s and were picked up a decade later at CERN, Europe’s particle physics laboratory. There, researchers pioneered the measurement technique still used at Fermilab today.

High-speed muons are shot into a magnetized ring. As a muon whips around the ring, passing through its powerful magnetic field, the particle’s spin axis (which can be pictured as a little arrow) gradually rotates. Millionths of a second later, typically after speeding around the ring a few hundred times, the muon decays, producing an electron that flies into one of the surrounding detectors. The varying energies of electrons emanating from the ring at different times reveal how quickly the muon spins are rotating.
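In the idealized case, the rate at which the spins rotate relative to the muons' direction of travel is proportional to the very quantity being measured. Under simplifying assumptions that are ours, not the article's (a uniform magnetic field B and no electric-field or beam-dynamics corrections), this spin-minus-orbit precession frequency is ω_a = a_μ eB / m_μ, where a_μ = (g − 2)/2. The sketch plugs in a storage-ring field of about 1.45 tesla, an assumed value, and a rounded a_μ.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge in coulombs (exact SI value)
MUON_MASS = 1.883531627e-28  # muon mass in kg (CODATA, rounded)
A_MU = 0.00116592            # muon anomaly (g - 2) / 2, rounded


def anomalous_precession_hz(field_tesla: float, a_mu: float = A_MU) -> float:
    """Idealized spin precession frequency relative to the momentum, in Hz:
    omega_a = a_mu * e * B / m_mu (uniform B, no electric-field corrections)."""
    omega_a = a_mu * E_CHARGE * field_tesla / MUON_MASS  # radians per second
    return omega_a / (2 * math.pi)


f_a = anomalous_precession_hz(1.45)  # assumed field of about 1.45 T
print(f"{f_a / 1e3:.0f} kHz, i.e. one extra spin turn every {1e6 / f_a:.2f} microseconds")
```

Because ω_a is proportional to a_μ, timing this slow rotation precisely and measuring the field B precisely together pin down the g-factor's tiny departure from 2.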

In the 1990s, a team at Brookhaven National Laboratory on Long Island built a 50-foot-wide ring to fling muons around and began collecting data. In 2001, the researchers announced their first results, reporting 2.0023318404 for the muon’s g-factor, with some uncertainty in the final two digits. Meanwhile, the most comprehensive Standard Model prediction at the time gave the significantly lower value of 2.0023318319.

It instantly became the world’s most famous eighth-decimal-place discrepancy.

“Hundreds of newspapers covered it,” said Polly, who was a graduate student with the experiment at the time.

Brookhaven’s measurement overshot the prediction by nearly three times its supposed margin of error, known as a three-sigma deviation. A three-sigma gap is significant, unlikely to be caused by random noise or an unlucky accumulation of small errors. It strongly suggested that something was missing from the theoretical calculation, something like a dark matter particle or an extra force-carrying boson.

But unlikely sequences of events sometimes happen, so physicists require a five-sigma deviation between a prediction and a measurement to definitively claim a discovery.
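The sigma language translates into probabilities if one assumes, as a simplification, that the errors are Gaussian. The Python sketch below uses the one-sided tail probability of a standard normal distribution, the convention commonly quoted for the five-sigma discovery threshold. The function name is our own.

```python
import math


def one_sided_p_value(z: float) -> float:
    """P(Z > z) for a standard normal variable Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))


for z in (2.0, 3.0, 3.7, 4.2, 5.0):
    p = one_sided_p_value(z)
    print(f"{z:.1f} sigma: p = {p:.2e}, about 1 in {1 / p:,.0f}")
```

Under these assumptions, a three-sigma fluctuation has odds of roughly 1 in 740, while five sigma corresponds to roughly 1 in 3.5 million.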

Trouble With Hadrons

A year after Brookhaven’s headline-making measurement, theorists spotted a mistake in the prediction. A formula representing one group of the tens of thousands of quantum fluctuations that muons can engage in contained a rogue minus sign; fixing it in the calculation reduced the difference between theory and experiment to just two sigma. That’s nothing to get excited about.

But as the Brookhaven team accrued 10 times more data, their measurement of the muon’s g-factor stayed the same while the error bars around the measurement shrank. The discrepancy with theory grew back to three sigma by the time of the experiment’s final report in 2006. And it continued to grow, as theorists kept honing the Standard Model prediction for the g-factor without seeing the value drift upward toward the measurement.

The Brookhaven anomaly loomed ever larger in physicists’ psyches as other searches for new particles failed. Throughout the 2010s, the $20 billion Large Hadron Collider in Europe slammed protons together in hopes of conjuring up dozens of new particles that might complete the pattern of nature’s building blocks. But the collider found only the Higgs boson — the last missing piece of the Standard Model. Meanwhile, a slew of experimental searches for dark matter found nothing. Hopes for new physics increasingly rode on wobbly muons. “I don’t know if it is the last great hope for new physics, but it certainly is a major one,” Matthew Buckley, a particle physicist at Rutgers University, told me.

The original Muon g-2 experiment was constructed at Brookhaven National Laboratory on Long Island in the 1990s. Rather than build a new experiment from scratch, physicists used a series of barges and trucks to move the 700-ton electromagnetic ring down the Atlantic coast, across the Gulf of Mexico, and up the Mississippi, Illinois and Des Plaines rivers to the Fermi National Accelerator Laboratory in Illinois. Thousands of people came out to celebrate its arrival in July 2013.


Everyone knew that in order to cross the threshold of discovery, they would need to measure the muon’s gyromagnetic ratio again, and more precisely. So plans for a follow-up experiment got underway. In 2013, the giant magnet used at Brookhaven was loaded onto a barge off Long Island and shipped down the Atlantic Coast and up the Mississippi and Illinois rivers to Fermilab, where the lab’s powerful muon beam would let data accrue much faster than before. That and other improvements would allow the Fermilab team to measure the muon’s g-factor four times more accurately than Brookhaven had.

In 2016, El-Khadra and others started organizing the Theory Initiative, seeking to iron out any disagreements and arrive at a consensus Standard Model prediction of the g-factor before the Fermilab data rolled in. “For the impact of such an exquisite experimental measurement to be maximized, theory needs to get its act together, basically,” she said, explaining the reasoning at the time. The theorists compared and combined calculations of different quantum bits and pieces that contribute to the muon’s g-factor and arrived at an overall prediction last summer of 2.0023318362. That fell a hearty 3.7 sigma below Brookhaven’s final measurement of 2.0023318416.
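As a back-of-the-envelope check, the sketch below computes the size of that 3.7-sigma gap from the two rounded values quoted above. The "implied" one-sigma uncertainty is simply the gap divided by 3.7; it is an inference from rounded numbers, not the Theory Initiative's published error bar.

```python
from decimal import Decimal

# Values quoted in the text: the 2020 consensus prediction and Brookhaven's final measurement.
theory_g = Decimal("2.0023318362")
brookhaven_g = Decimal("2.0023318416")
quoted_tension_sigma = Decimal("3.7")

# Size of the disagreement in the g-factor itself.
gap = brookhaven_g - theory_g

# If the gap equals 3.7 standard deviations, the combined uncertainty it implies is gap / 3.7.
implied_sigma = gap / quoted_tension_sigma

# The anomalous moment a = (g - 2) / 2 is half of g's deviation from 2, so its gap is halved too.
anomaly_gap = gap / 2

print(f"gap in g:            {gap:.2E}")
print(f"implied 1-sigma:     {implied_sigma:.2E}")
print(f"gap in a = (g-2)/2:  {anomaly_gap:.2E}")
```

Using exact decimal arithmetic avoids floating-point noise when subtracting two numbers that agree to eight decimal places.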

But the Theory Initiative’s report was not the final word.

Uncertainty about what the Standard Model predicts for the muon’s magnetic moment stems entirely from the presence in its entourage of “hadrons”: particles made of quarks. Quarks feel the strong force (one of the three forces of the Standard Model), which is so strong it’s as if quarks are swimming in glue, and that glue is endlessly dense with other particles. The equation describing the strong force (and thus, ultimately, the behavior of hadrons) can’t be exactly solved.

That makes it hard to gauge how often hadrons pop up in the muon’s midst. The dominant scenario is the following: The muon, as it travels along, momentarily emits a photon, which morphs into a hadron and an antihadron; the hadron-antihadron pair quickly annihilate back into a photon, which the muon then reabsorbs. This process, called hadronic vacuum polarization, contributes a small correction to the muon’s gyromagnetic ratio starting in the seventh decimal place. Calculating this correction involves solving a complicated mathematical sum for each hadron-antihadron pair that can arise.

Uncertainty about this hadronic vacuum polarization term is the primary source of overall uncertainty about the g-factor. A small increase in this term can completely erase the difference between theory and experiment. Physicists have two ways to calculate it.

With the first method, researchers don’t even try to calculate the hadrons’ behavior. Instead, they simply translate data from other particle collision experiments into an expectation for the hadronic vacuum polarization term. “The data-driven approach has been refined and optimized over decades, and several competing groups using different details in their approaches have confirmed each other,” said Stöckinger. The Theory Initiative used this data-driven approach.

But in recent years, a purely computational method has been steadily improving. In this approach, researchers use supercomputers to solve the equations of the strong force at discrete points on a lattice instead of everywhere in space, turning the infinitely detailed problem into a finite one. This way of coarse-graining the quark quagmire to predict the behavior of hadrons “is similar to a weather forecast or meteorology,” Fodor explained. The calculation can be made ultra-precise by putting lattice points very close together, but this also pushes computers to their limits.
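To make the trade-off concrete, here is a deliberately simple one-dimensional toy, not lattice QCD: it approximates a known integral by sampling a function at evenly spaced lattice sites. The helper `lattice_integral` and the choice of `sin` on [0, π] are illustrative assumptions. The error shrinks as the spacing tightens, while the number of sites in a four-dimensional space-time lattice grows as the fourth power of the points per side, which hints at why precise lattice calculations strain supercomputers.

```python
import math

def lattice_integral(f, lo, hi, n):
    """Approximate the integral of f over [lo, hi] using n evenly spaced lattice sites (midpoint rule)."""
    a = (hi - lo) / n  # lattice spacing: distance between neighboring sites
    return a * sum(f(lo + (i + 0.5) * a) for i in range(n))

EXACT = 2.0  # the integral of sin(x) from 0 to pi, known in closed form

results = []
for n in (4, 16, 64, 256):
    error = abs(lattice_integral(math.sin, 0.0, math.pi, n) - EXACT)
    sites_4d = n ** 4  # cost proxy: sites in a 4D lattice with n points per side
    results.append((n, error, sites_4d))
    print(f"n={n:4d}  error={error:.1e}  sites in 4D={sites_4d:,}")
```

Each fourfold refinement cuts the error by roughly a factor of 16 here, but multiplies the 4D site count by 256.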

The 14-person BMW team — named after Budapest, Marseille and Wuppertal, the three European cities where most team members were originally based — used this approach. They made four chief innovations. First they reduced random noise. They also devised a way of very precisely determining scale in their lattice. At the same time, they more than doubled their lattice’s size compared to earlier efforts, so that they could study hadrons’ behavior near the center of the lattice without worrying about edge effects. Finally, they included in the calculation a family of complicating details that are often neglected, like mass differences between types of quarks. “All four [changes] needed a lot of computing power,” said Fodor.

The researchers then commandeered supercomputers in Jülich, Munich, Stuttgart, Orsay, Rome, Wuppertal and Budapest and put them to work on a new and better calculation. After several hundred million core hours of crunching, the supercomputers spat out a value for the hadronic vacuum polarization term. Their total, when combined with all other quantum contributions to the muon’s g-factor, yielded 2.00233183908. This is “in fairly good agreement” with the Brookhaven experiment, Fodor said. “We cross-checked it a million times because we were very much surprised.” In February 2020, they posted their work on the arxiv.org preprint server.

The Theory Initiative decided not to include BMW’s value in their official estimate for a few reasons. The data-driven approach has a slightly smaller error bar, and three different research groups independently calculated the same thing. In contrast, BMW’s lattice calculation was unpublished as of last summer. And although the result agrees well with earlier, less precise lattice calculations that also came out high, it hasn’t been independently replicated by another group to the same precision.

The Theory Initiative’s decision meant that the official theoretical value of the muon’s magnetic moment had a 3.7-sigma difference with Brookhaven’s experimental measurement. It set the stage for what has become the most anticipated reveal in particle physics since the Higgs boson in 2012.

The Revelations

A month ago, the Fermilab Muon g-2 team announced that they would present their first results today. Particle physicists were ecstatic. Laura Baudis, a physicist at the University of Zurich, said she was “counting the days until April 7,” after anticipating the result for 20 years. “If the Brookhaven results are confirmed by the new experiment at Fermilab,” she said, “this would be an enormous achievement.”

And if not — if the anomaly were to disappear — some in the particle physics community feared nothing less than “the end of particle physics,” said Stöckinger. The Fermilab g-2 experiment is “our last hope of an experiment which really proves the existence of physics beyond the Standard Model,” he said. If it failed to do so, many researchers might feel that “we now give up and we have to do something else instead of researching physics beyond the Standard Model.” He added, “Honestly speaking, it might be my own reaction.”

The 200-person Fermilab team revealed the result to themselves only six weeks ago in an unveiling ceremony over Zoom. Tammy Walton, a scientist on the team, rushed home to catch the show after working the night shift on the experiment, which is currently in its fourth run. (The new analysis covers data from the first run, which makes up 6% of what the experiment will eventually accrue.) When the all-important number appeared on the screen, plotted along with the Theory Initiative’s prediction and the Brookhaven measurement, Walton was thrilled to see it land higher than the former and pretty much smack dab on top of the latter. “People are going to be crazy excited,” she said.

Papers proposing various ideas for new physics are expected to flood arXiv in the coming days. Yet beyond that, the future is unclear. What was once an illuminating breach between theory and experiment has been clouded by a far foggier clash of calculations.

It’s possible that the supercomputer calculation will turn out to be wrong — that BMW overlooked some source of error. “We need to have a close look at the calculation,” El-Khadra said, stressing that it’s too early to draw firm conclusions. “It is pushing on the methods to get that precision, and we need to understand if the way they pushed on the methods broke them.”

That would be good news for fans of new physics.

Interestingly, though, even if the data-driven method is the approach with an unidentified problem under the hood, theorists have a hard time understanding what the problem could be other than unaccounted-for new physics. “The need for new physics would only shift elsewhere,” said Martin Hoferichter of the University of Bern, a leading member of the Theory Initiative.

Researchers who have been exploring possible problems with the data-driven method over the past year say the data itself is unlikely to be wrong. It comes from decades of ultraprecise measurements of 35 hadronic processes. But “it could be that the data, or the way it is interpreted, is misleading,” said Andreas Crivellin of CERN and other institutions, a coauthor (along with Hoferichter) of one paper studying this possibility.

It’s possible, he explained, that destructive interference happens to reduce the likelihood of the hadronic processes arising in certain electron-positron collisions, without affecting hadronic vacuum polarization near muons; then the data-driven extrapolation from one to the other doesn’t quite work. In that case, though, another Standard Model calculation that’s sensitive to the same hadronic processes gets thrown off, creating a different tension between the theory and data. And this tension would itself suggest new physics.

It’s tricky to resolve this other tension while keeping the new physics “elusive enough to not have been observed elsewhere,” as El-Khadra put it, yet it’s possible — for instance, by introducing the effects of hypothetical particles called vector-like leptons.

Thus the mystery swirling around muons might lead the way past the Standard Model to a more complete account of the universe after all. However things turn out, it’s safe to say that today’s news — both the result from Fermilab, as well as the publication of the BMW calculation in Nature — is not the end for particle physics.

Although the historical annual improvement of about 40% in central processing unit performance is slowing, the combination of CPUs packaged with alternative processors is improving at a rate of more than 100% per annum. These unprecedented and massive improvements in processing power combined with data and artificial intelligence will completely change the way we think about designing hardware, writing software and applying technology to businesses.

Every industry will be disrupted. You hear that all the time. Well, it’s absolutely true and we’re going to explain why and what it all means.

In this Breaking Analysis, we’re going to unveil some data that suggests we’re entering a new era of innovation where inexpensive processing capabilities will power an explosion of machine intelligence applications. We’ll also tell you what new bottlenecks will emerge and what this means for system architectures and industry transformations in the coming decade.

Is Moore’s Law really dead?

We’ve heard it hundreds of times in the past decade. EE Times, MIT Technology Review, CNET, SiliconANGLE and even industry associations that marched to the cadence of Moore’s Law have all written about it. But our friend and colleague Patrick Moorhead got it right when he said:

Moore’s Law, by the strictest definition of doubling chip densities every two years, isn’t happening anymore.

And that’s true. He’s absolutely correct. However, he qualified that statement with “by the strictest definition” for a reason: he’s smart enough to know that chipmakers are masters at figuring out workarounds.

Historical performance curves are being shattered

The graphic below is proof that the death of Moore’s Law by its strictest definition is irrelevant.

The fact is that the historical outcome of Moore’s Law is actually accelerating, quite dramatically. This graphic digs into the progression of Apple Inc.’s system-on-chip developments from the A9 through the A14 five-nanometer Bionic system on a chip.

The vertical axis shows operations per second and the horizontal axis shows time for three processor types: the CPU, measured in terahertz (the blue line, which you can hardly see); the graphics processing unit or GPU, measured in trillions of floating point operations per second (orange); and the neural processing unit or NPU, measured in trillions of operations per second (the exploding gray area).

Many folks will remember that historically, we rushed out to buy the latest and greatest personal computer because the newer models had faster cycle times, that is, more gigahertz. The outcome of Moore’s Law was that performance would double every 24 months or about 40% annually. CPU performance improvements have now slowed to roughly 30% annually, so technically speaking, Moore’s Law is dead.
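The conversion between a doubling period and an annual rate is simple compounding. This small sketch (the helper names are our own) shows that doubling every 24 months works out to roughly 41% a year, and that at 30% a year performance doubles only about every 32 months.

```python
import math

def annual_rate_from_doubling(months):
    """Annual growth rate implied by doubling once every `months` months."""
    return 2 ** (12 / months) - 1

def doubling_months_from_annual(rate):
    """Months needed to double at a constant annual growth `rate` (0.30 means 30%)."""
    return 12 * math.log(2) / math.log(1 + rate)

print(f"Doubling every 24 months: {annual_rate_from_doubling(24):.1%} per year")
print(f"30% per year: doubles every {doubling_months_from_annual(0.30):.1f} months")
```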

Apple’s SoC performance shatters the norm

Combined, the improvements in Apple’s SoC since 2015 have been on a pace of 118% annually for the three processor types shown above. The true figure is higher still, because the graphic doesn’t count the impact of the digital signal processors and accelerator components of the system.
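Compounding explains why the gap widens so quickly. The sketch below assumes a five-year span from the A9 (2015) to the A14 (2020) and compares 118% annual improvement with the roughly 40% historical CPU pace; the helper `compound` is illustrative.

```python
def compound(rate, years):
    """Total growth multiple after `years` at a constant annual `rate` (1.18 means 118%)."""
    return (1 + rate) ** years

YEARS = 5  # assumption: A9 (2015) to A14 (2020)

print(f"118%/yr for {YEARS} years: {compound(1.18, YEARS):.0f}x")
print(f" 40%/yr for {YEARS} years: {compound(0.40, YEARS):.1f}x")
```

At those rates the SoC gains roughly 49x over five years, versus roughly 5.4x at 40% a year.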

Apple’s A14 shown above on the right is quite amazing with its 64-bit architecture, multiple cores and alternative processor types. But the important thing is what you can do with all this processing power – in an iPhone! The types of AI continue to evolve from facial recognition to speech and natural language processing, rendering videos, helping the hearing impaired and eventually bringing augmented reality to the palm of your hand.

Quite incredible.

Processing goes to the edge – networks and storage become the bottlenecks

We recently reported Microsoft Corp. Chief Executive Satya Nadella’s epic quote that we’ve reached peak centralization. The graphic below paints a picture that is telling. We just shared above that processing power is accelerating at unprecedented rates. And costs are dropping like a rock. Apple’s A14 costs the company $50 per chip. Arm at its v9 announcement said that it will have chips that can go into refrigerators that will optimize energy use and save 10% annually on power consumption. They said that chip will cost $1: a buck to shave 10% off your electricity bill from the fridge.

Processing is plentiful and cheap. But look at where the expensive bottlenecks are: networks and storage. So what does this mean?

It means that processing is going to get pushed to the edge – wherever the data is born. Storage and networking will become increasingly distributed and decentralized, with custom silicon and processing power placed throughout the system and AI embedded to optimize workloads for latency, performance, bandwidth, security and other dimensions of value.

And remember, most of the data – 99% – will stay at the edge. We like to use Tesla Inc. as an example. The vast majority of data a Tesla car creates will never go back to the cloud. It doesn’t even get persisted. Tesla saves perhaps five minutes of data. But some data will connect occasionally back to the cloud to train AI models – we’ll come back to that.

But this picture above says if you’re a hardware company, you’d better start thinking about how to take advantage of that blue line, the explosion of processing power. Dell Technologies Inc., Hewlett Packard Enterprise Co., Pure Storage Inc., NetApp Inc. and the like are either going to start designing custom silicon or they’re going to be disrupted, in our view. Amazon Web Services Inc., Google LLC and Microsoft are all doing it for a reason, as are Cisco Systems Inc. and IBM Corp. As cloud consultant Sarbjeet Johal has said, “this is not your grandfather’s semiconductor business.”

And if you’re a software engineer, you’re going to be writing applications that take advantage of all the data being collected and bring to bear this immense processing power to create new capabilities like we’ve never seen before.

AI everywhere

Massive increases in processing power and cheap silicon will power the next wave of AI, machine intelligence, machine learning and deep learning.

We sometimes use artificial intelligence and machine intelligence interchangeably. This notion comes from our collaborations with author David Moschella. Interestingly, in his book “Seeing Digital,” Moschella says “there’s nothing artificial” about this:

There’s nothing artificial about machine intelligence just like there’s nothing artificial about the strength of a tractor.

It’s a nuance, but precise language can often bring clarity. We hear a lot about machine learning and deep learning and think of them as subsets of AI. Machine learning applies algorithms and code to data to get “smarter” – make better models, for example, that can lead to augmented intelligence and better decisions by humans, or machines. These models improve as they get more data and iterate over time.

Deep learning is a more advanced type of machine learning that uses multilayered neural networks and more complex math.

The right side of the chart above shows the two broad elements of AI. The point we want to make here is that much of the activity in AI today is focused on building and training models. And this is mostly happening in the cloud. But we think AI inference will bring the most exciting innovations in the coming years.

AI inference unlocks huge value

Inference is the deployment of the model, taking real-time data from sensors, processing data locally, applying the training that has been developed in the cloud and making micro-adjustments in real time.

Let’s take an example. We love car examples and observing Tesla is instructive and a good model as to how the edge may evolve. So think about an algorithm that optimizes the performance and safety of a car on a turn. The model takes inputs with data on friction, road conditions, angles of the tires, tire wear, tire pressure and the like. And the model builders keep testing and adding data and iterating the model until it’s ready to be deployed.

Then the intelligence from this model goes into an inference engine, which is a chip running software, that goes into a car and gets data from sensors and makes micro adjustments in real time on steering and braking and the like. Now as we said before, Tesla persists the data for a very short period of time because there’s so much data. But it can choose to store certain data selectively if needed to send back to the cloud and further train the model. For example, if an animal runs into the road during slick conditions, maybe Tesla persists that data snapshot, sends it back to the cloud, combines it with other data and further perfects the model to improve safety.
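The split between training in the cloud and inference at the edge can be sketched in a few lines. Everything here is hypothetical: the `SensorFrame` fields, the weights in `MODEL`, the braking formula and the "slick road plus obstacle" trigger are invented for illustration and say nothing about how Tesla’s software actually works. The point is the shape: a fixed, pre-trained model scores each sensor frame, only a short rolling window of frames is kept, and rare events are copied out for upload.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class SensorFrame:
    friction: float   # hypothetical estimate of tire-road friction, 0 (ice) to 1 (dry)
    steer_deg: float  # current steering angle in degrees
    obstacle: bool    # e.g. an animal detected in the road

# Hypothetical parameters "trained in the cloud" and shipped to the car.
MODEL = {"bias": 0.1, "slip": 0.6, "turn": 0.3}

class EdgeInference:
    def __init__(self, window=300):
        # Rolling window of recent frames; older frames fall off and are never stored.
        self.recent = deque(maxlen=window)
        # Rare snapshots kept to send back to the cloud for retraining.
        self.uploads = []

    def step(self, frame):
        self.recent.append(frame)
        # Inference: combine sensor inputs into a braking adjustment between 0 and 1.
        turn = min(abs(frame.steer_deg) / 30.0, 1.0)
        brake = MODEL["bias"] + MODEL["slip"] * (1.0 - frame.friction) + MODEL["turn"] * turn
        brake = min(max(brake, 0.0), 1.0)
        # Selective persistence: copy the window only for unusual events.
        if frame.obstacle and frame.friction < 0.3:
            self.uploads.append(list(self.recent))
        return brake

engine = EdgeInference(window=3)
print(engine.step(SensorFrame(friction=0.9, steer_deg=0.0, obstacle=False)))
```

Keeping the model fixed on the device and shipping back only rare snapshots mirrors the data flow the paragraph describes: most data is used once and discarded.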

This is just one example of thousands of AI inference use cases that will further develop in the coming decade.

AI value shifts from modeling to inferencing

This conceptual chart below shows percent of spend over time on modeling versus inference. And you can see some of the applications that get attention today and how these apps will mature over time as inference becomes more mainstream. The opportunities for AI inference at the edge and in the “internet of things” are enormous.

Modeling will continue to be important. Today’s prevalent modeling workloads in fraud, adtech, weather, pricing, recommendation engines and more will just keep getting better and better. But inference, we think, is where the rubber meets the road, as shown in the previous example.

And in the middle of the graphic we show the industries, which will all be transformed by these trends.

One other point on that: Moschella in his book explains why historically, vertical industries remained pretty stovepiped from each other. They each had their own “stack” of production, supply, logistics, sales, marketing, service, fulfillment and the like. And expertise tended to reside and stay within that industry and companies, for the most part, stuck to their respective swim lanes.

But today we see so many examples of tech giants entering other industries. Amazon entering grocery, media and healthcare, Apple in finance and EV, Tesla eyeing insurance: There are many examples of tech giants crossing traditional industry boundaries and the enabler is data. Auto manufacturers over time will have better data than insurance companies for example. DeFi or decentralized finance or platforms using the blockchain will continue to improve with AI and disrupt traditional payment systems — and on and on.

Hence we believe the oft-repeated bromide that no industry is safe from disruption.

Snapshot of AI in the enterprise

Last week we showed you the chart below from Enterprise Technology Research.

This data shows Net Score, or spending momentum, on the vertical axis. The horizontal axis is Market Share, or pervasiveness in the ETR data set. The red line at 40% is our subjective anchor; anything above 40% is really good in our view.
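ETR’s own tutorial, referenced at the end of this piece, is the authoritative source for how Net Score is computed. As it is usually described in these Breaking Analysis posts, it is the percentage of respondents adopting a platform new or increasing spend, minus the percentage decreasing spend or replacing it. The sketch below assumes that definition and uses hypothetical survey counts.

```python
def net_score(adding, increasing, flat, decreasing, replacing):
    """Assumed Net Score: % adopting or spending more minus % spending less or replacing."""
    total = adding + increasing + flat + decreasing + replacing
    return 100.0 * ((adding + increasing) - (decreasing + replacing)) / total

RED_LINE = 40.0  # the subjective "really good" anchor from the chart

# Hypothetical survey responses for one vendor (counts of respondents).
score = net_score(adding=20, increasing=40, flat=30, decreasing=7, replacing=3)
print(f"Net Score {score:.0f}%, above red line: {score > RED_LINE}")
```

Respondents holding spending flat count toward the total but not toward either side, which is why a vendor can have wide adoption and still a modest Net Score.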

Machine learning and AI are the No. 1 area of spending velocity and have been for a while, hence the four stars. Robotic process automation is increasingly an adjacency to AI and you could argue cloud is where all the machine learning action is taking place today and is another adjacency, although we think AI continues to move out of the cloud for the reasons we just described.

Enterprise AI specialists carve out positions

The chart below shows some of the vendors in the space that are gaining traction. These are the companies chief information officers and information technology buyers associate with their AI/ML spend.

This graph above uses the same Y/X coordinates – Spending Velocity on the vertical by Market Share on the horizontal axis, same 40% red line.

The big cloud players, Microsoft, AWS and Google, dominate AI and ML with the most presence. They have the tooling and the data. As we said, lots of modeling is going on in the cloud, but this will be pushed into remote AI inference engines that will have massive processing capabilities collectively. We are moving away from peak centralization and this presents great opportunities to create value and apply AI to industry.

Databricks Inc. is seen as an AI leader and stands out with a strong Net Score and a prominent Market Share. SparkCognition Inc. is off the charts in the upper left with an extremely high Net Score albeit from a small sample. The company applies machine learning to massive data sets. DataRobot Inc. does automated AI – they’re super high on the Y axis. Dataiku Inc. helps create machine learning-based apps. C3.ai Inc. is an enterprise AI company founded and run by Tom Siebel. You see SAP SE, Salesforce.com Inc. and IBM Watson just at the 40% line. Oracle is also in the mix with its autonomous database capabilities and Adobe Inc. shows as well.

The point is that these software companies are all embedding AI into their offerings. And incumbent companies that are trying not to get disrupted can buy AI from software companies. They don’t have to build it themselves. The hard part is how and where to apply AI. And the simple answer is: Follow the data.

Key takeaways

There’s so much more to this story, but let’s leave it there for now and summarize.

We’ve been pounding the table about the post-x86 era and the importance of volume in lowering the costs of semiconductor production, and today we’ve quantified something we haven’t seen much of: the actual performance improvements we’re seeing in processing today. Forget Moore’s Law being dead – that’s irrelevant. The original premise is being blown away this decade by SoC and the coming system on package designs. And with quantum computing on the horizon, who knows what the future holds for performance.

These trends are a fundamental enabler of AI applications and as is most often the case, the innovation is coming from consumer use cases; Apple continues to lead the way. Apple’s integrated hardware and software approach will increasingly move to the enterprise mindset. Clearly the cloud vendors are moving in that direction. You see it with Oracle Corp. too. It just makes sense that optimizing hardware and software together will gain momentum because there’s so much opportunity for customization in chips as we discussed last week with Arm Ltd.’s announcement – and it’s the direction new CEO Pat Gelsinger is taking Intel Corp.

One aside – Gelsinger may face massive challenges with Intel, but he’s right that semiconductor demand is increasing and there’s no end in sight.

If you’re an enterprise, you should not stress about inventing AI. Rather, your focus should be on understanding what data gives you competitive advantage and how to apply machine intelligence and AI to win. You’ll buy, not build AI.

Data, as John Furrier has said many times, is becoming the new development kit. He said that 10 years ago and it’s more true now than ever before:

Data is the new development kit.

If you’re an enterprise hardware player, you will be designing your own chips and writing more software to exploit AI. You’ll be embedding custom silicon and AI throughout your product portfolio and you’ll be increasingly bringing compute to data. Data will mostly stay where it’s created. Systems, storage and networking stacks are all being disrupted.

If you develop software, you now have incredible processing capabilities in the palm of your hand, and you’re going to write new applications to take advantage of this and use AI to change the world. You’ll have to figure out how to get access to the most relevant data, secure your platforms and innovate.

And finally, if you’re a services company you have opportunities to help companies trying not to be disrupted. These are many. You have the deep industry expertise and horizontal technology chops to help customers survive and thrive.

Privacy? AI for good? Those are whole topics on their own, extensively covered by journalists. We think for now it’s prudent to gain a better understanding of how far AI can go before we determine how far it should go and how it should be regulated. Protecting our personal data and privacy should be something that we most definitely care about – but generally we’d rather not stifle innovation at this point.

Also, check out this ETR Tutorial we created, which explains the spending methodology in more detail. Note: ETR is a separate company from Wikibon/SiliconANGLE. If you would like to cite or republish any of the company’s data, or inquire about its services, please contact ETR at legal@etr.ai.

Cerebras Systems has unveiled its new Wafer Scale Engine 2 processor with a record-setting 2.6 trillion transistors and 850,000 AI-optimized cores. It’s built for supercomputing tasks, and it’s the second time since 2019 that Los Altos, California-based Cerebras has unveiled a chip that is basically an entire wafer.

Chipmakers normally slice a wafer from a 12-inch-diameter ingot of silicon to process in a chip factory. Once processed, the wafer is sliced into hundreds of separate chips that can be used in electronic hardware.

But Cerebras, started by SeaMicro founder Andrew Feldman, takes that wafer and makes a single, massive chip out of it. Each piece of the chip, dubbed a core, is interconnected in a sophisticated way to other cores. The interconnections are designed to keep all the cores functioning at high speeds so the transistors can work together as one.

Twice as good as the CS-1

Above: Comparing the CS-1 to the biggest GPU.


In 2019, Cerebras could fit 400,000 cores and 1.2 trillion transistors on a wafer chip, the CS-1. It was built with a 16-nanometer manufacturing process. But the new chip is built with a high-end 7-nanometer process, meaning the width between circuits is seven billionths of a meter. With such miniaturization, Cerebras can cram a lot more transistors in the same 12-inch wafer, Feldman said. It cuts that circular wafer into a square that is eight inches by eight inches, and ships the device in that form.

“We have 123 times more cores and 1,000 times more memory on chip and 12,000 times more memory bandwidth and 45,000 times more fabric bandwidth,” Feldman said in an interview with VentureBeat. “We were aggressive on scaling geometry, and we made a set of microarchitecture improvements.”

Now Cerebras’ WSE-2 chip has more than twice as many cores and transistors. By comparison, the largest graphics processing unit (GPU) has only 54 billion transistors, 2.55 trillion fewer than the WSE-2. The WSE-2 also has 123 times more cores and 1,000 times more high-performance on-chip memory than GPU competitors. Many of the Cerebras cores are redundant in case one part fails.
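The headline comparisons are easy to check from the figures in the article. This sketch uses the first-generation WSE’s 1.2 trillion transistors and 400,000 cores, the WSE-2’s 2.6 trillion transistors and 850,000 cores, and 54 billion transistors for the largest GPU.

```python
# Figures quoted in the article.
WSE1 = {"transistors": 1.2e12, "cores": 400_000}
WSE2 = {"transistors": 2.6e12, "cores": 850_000}
LARGEST_GPU_TRANSISTORS = 54e9

core_ratio = WSE2["cores"] / WSE1["cores"]                    # generation-over-generation cores
transistor_ratio = WSE2["transistors"] / WSE1["transistors"]  # generation-over-generation transistors
gap = WSE2["transistors"] - LARGEST_GPU_TRANSISTORS           # WSE-2 lead over the largest GPU

print(f"cores: {core_ratio:.3f}x, transistors: {transistor_ratio:.2f}x")
print(f"WSE-2 minus largest GPU: {gap / 1e12:.2f} trillion transistors")
```

Both ratios come out just above 2, consistent with "more than twice," and the difference rounds to the 2.55 trillion quoted.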

“This is a great achievement, especially when considering that the world’s third largest chip is 2.55 trillion transistors smaller than the WSE-2,” said Linley Gwennap, principal analyst at The Linley Group, in a statement.

Feldman half-joked that this should prove that Cerebras is not a one-trick pony.

“What this avoids is all the complexity of trying to tie together lots of little things,” Feldman said. “When you have to build a cluster of GPUs, you have to spread your model across multiple nodes. You have to deal with device memory sizes and memory bandwidth constraints and communication and synchronization overheads.”
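Feldman’s point about spreading a model across nodes can be illustrated with a toy placement routine. This is a hypothetical sketch of pipeline-style model parallelism, not any vendor’s scheduler: it greedily packs consecutive layers onto devices with a fixed memory budget and counts the device boundaries that activations must cross. The layer sizes and the 40 GB budget are assumptions, and real systems also budget for activations, optimizer state and communication buffers.

```python
def partition_layers(layer_gb, device_gb):
    """Greedily place consecutive layers on devices so no device exceeds its memory budget."""
    devices = [[]]
    used = 0.0
    for i, size in enumerate(layer_gb):
        if size > device_gb:
            raise ValueError(f"layer {i} needs {size} GB; a device holds only {device_gb} GB")
        if used + size > device_gb:  # current device is full: start a new one
            devices.append([])
            used = 0.0
        devices[-1].append(i)
        used += size
    return devices

# Hypothetical model: 10 layers of 12 GB each, on devices with 40 GB of memory.
placement = partition_layers([12.0] * 10, device_gb=40.0)
hops = len(placement) - 1  # device boundaries crossed per forward pass
print(placement, hops)
```

Every extra device adds a boundary where data must move and devices must synchronize, which is the overhead a single wafer-scale chip is meant to avoid.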

The CS-2’s specs

Above: TSMC put the CS-1 in a chip museum.


The WSE-2 will power the Cerebras CS-2, which the company bills as the industry’s fastest AI computer, designed and optimized for 7 nanometers and beyond. Manufactured by contract manufacturer TSMC, the WSE-2 more than doubles all performance characteristics on the chip (the transistor count, core count, memory, memory bandwidth, and fabric bandwidth) over the first-generation WSE. The result is that on every performance metric, the WSE-2 is orders of magnitude larger and more performant than any competing GPU on the market, Feldman said.

TSMC put the first WSE-1 chip in a museum of innovation for chip technology in Taiwan.

“Cerebras does deliver the cores promised,” said Patrick Moorhead, an analyst at Moor Insights & Strategy. “What the company is delivering is more along the lines of multiple clusters on a chip. It does appear to give Nvidia a run for its money but doesn’t run raw CUDA. That has become somewhat of a de facto standard. Nvidia solutions are more flexible as well, as they can fit into nearly any server chassis.”

With every component optimized for AI work, the CS-2 delivers more compute performance at less space and less power than any other system, Feldman said. Depending on workload, from AI to high-performance computing, CS-2 delivers hundreds or thousands of times more performance than legacy alternatives, and it does so at a fraction of the power draw and space.

A single CS-2 replaces clusters of hundreds or thousands of graphics processing units (GPUs) that consume dozens of racks, use hundreds of kilowatts of power, and take months to configure and program. At only 26 inches tall, the CS-2 fits in one-third of a standard datacenter rack.
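The “one-third of a rack” figure follows from standard rack units. The sketch assumes a 1U height of 1.75 inches and a common 42U full-height rack; under those assumptions a 26-inch system occupies 15U, a little over a third of the rack.

```python
import math

RACK_UNIT_IN = 1.75  # height of one rack unit (1U), in inches
FULL_RACK_U = 42     # assumption: a common full-height rack

cs2_height_in = 26
cs2_units = math.ceil(cs2_height_in / RACK_UNIT_IN)  # round up to whole rack units
fraction_of_rack = cs2_units / FULL_RACK_U

print(f"{cs2_units}U, {fraction_of_rack:.0%} of a {FULL_RACK_U}U rack")
```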

“Obviously, there are companies and entities interested in Cerebras’ wafer-scale solution for large data sets,” said Jim McGregor, principal analyst at Tirias Research, in an email. “But, there are many more opportunities at the enterprise level for the millions of other AI applications and still opportunities beyond what Cerebras could handle, which is why Nvidia has the SuperPOD and Selene supercomputers.”

He added, “You also have to remember that Nvidia is targeting everything from AI robotics with Jetson to supercomputers. Cerebras is more of a niche platform. It will take some opportunities but will not match the breadth of what Nvidia is targeting. Besides, Nvidia is selling everything they can build.”

Lots of customers

Image: Comparing the new Cerebras chip to its rival, the Nvidia A100 (credit: Cerebras).

And the company has proven itself by shipping the first generation to customers. Over the past year, customers have deployed the Cerebras WSE and CS-1, including Argonne National Laboratory; Lawrence Livermore National Laboratory; Pittsburgh Supercomputing Center (PSC) for its Neocortex AI supercomputer; EPCC, the supercomputing center at the University of Edinburgh; pharmaceutical leader GlaxoSmithKline; Tokyo Electron Devices; and more. Customers praising the chip include those at GlaxoSmithKline and the Argonne National Laboratory.

Kim Branson, senior vice president at GlaxoSmithKline, said in a statement that the company has increased the complexity of the encoder models it generates while decreasing training time by 80 times. At Argonne, the chip is being used for cancer research and has reduced the experiment turnaround time on cancer models by more than 300 times.

“For drug discovery, we have other wins that we’ll be announcing over the next year in heavy manufacturing and pharma and biotech and military,” Feldman said.

The new chips will ship in the third quarter. Feldman said the company now has more than 300 engineers, with offices in Silicon Valley, Toronto, San Diego, and Tokyo.


Now is the time for a new programming paradigm. Previous generations of languages were called imperative, object-oriented, and then functional. The next generation of languages is now possible: Declarative languages, which have almost no bugs, and offer a 10:1 reduction in total lifecycle cost. For many graphical interactive or client/server applications, you can replace the entire development stack with one relatively simple tool.

The Beads language compiler is now available, free. It supports both Macintosh and Windows (sorry, no Linux yet). Please download the SDK, unzip it, and read the READ ME file to get started. There is a YouTube video showing how to get started on Macintosh or Windows. Linux users can run it under Wine, a compatibility layer that runs Windows programs on Linux.

The next generation computer language and toolchain, code-named "Beads", has the following goals:

1) Provide an alternative to Excel for business modeling and automation.
2) Make it easier for programs to be improved by someone other than the original author.
3) Provide a system that protects against programmer errors to the extent possible.
4) Offer a notation that is independent of hardware and operating systems, so that programs will last for decades.

Excel is clumsy and unreliable

Millions of businesses use Excel every day, but it is clumsy and difficult to audit. Beads offers businesses a way to automate business processes without a large investment.

Beads can build a complex product using one simple language

Never in history have programmers had to work in so many languages and frameworks at once. A typical project today might use HTML, CSS, JavaScript, Apache, MySQL, PHP, and perhaps multiple frameworks like jQuery or React. This is a complex set of tools that is costly and cumbersome to use. In Beads, you work in one language that is simple and direct.

Beads works both forwards and backwards

Authoring a program is not where the productivity problem in software lies. The real drawback of current tools is that when you look at the screen (the output) and want to go backwards into the source code to make some change, it is very difficult and time-consuming. The majority of time in conventional programming is spent in this backwards process, euphemistically called debugging. Beads has a unique ability to make the reverse linkage more direct, so that it is easier to figure out where in the source code a particular problem occurs.

Beads includes a database

In many languages, when it comes time to manipulate and store data, you use an external database. Beads' internal data structures, which resemble a graph database such as Neo4j, are so powerful and flexible that you don't normally need an external database system. This dramatically simplifies the programming task, as working with databases always makes things more complex.

Beads is robust

In many languages the slightest error in input data can cause a program to seriously malfunction. Beads has special rules of arithmetic and a robust mathematical model that make it extremely difficult to have a serious malfunction.


Code in Assembly for Apple Silicon with the AsmAttic app

Learning a little assembly language is not only good for the soul, but it has value for anyone wanting to deepen their understanding of a processor, those who want to read disassembled code such as security researchers, and anyone writing code in a higher-level language such as Objective-C or Swift. Although there are several good books about ARM assembly (see references), using it in Xcode apps is not well documented. Apple’s developer information is most helpful to those who already write assembly and are accustomed to its quirks.

This article is the first of what I hope will be a series to open up access to assembly coding for Apple Silicon Macs. Here I take you through building a simple app which wraps itself around four lines of ARM64 assembly code, and provides a platform for subsequent articles. To work through this, you’ll need an M1 Mac and Xcode 12.5 (free from the App Store), and I assume that you’re sufficiently familiar with that and Swift to be able to build a basic app in AppKit or similar.

The complete Xcode project is available here: asmattic

Start by creating a new project for a macOS app, which I’ve named AsmAttic. Set its Interface and Life Cycle to support your favourite model. In my case, that’s a conventional Storyboard with an AppKit App Delegate and Swift as its language. You’re welcome to use SwiftUI or anything else which you find straightforward.

In the project’s Build Settings, set Build Active Architecture Only to Yes for both Debug and Release, so that Xcode will build only the ARM version of the app. If you want to support Intel, you’ll need to add conditionals to ensure that the assembly is only built for and called by the ARM version.
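On the Swift side, one way to handle those conditionals is a small wrapper, sketched here under stated assumptions. multiplyAdd is a hypothetical helper. ASM_MULTADD is a hypothetical compilation condition you would define yourself, only for ARM builds that include asmmath.s. When that condition isn't set, the wrapper falls back to Swift's own fused multiply-add, so the sketch compiles on any architecture.

```swift
// multiplyAdd and ASM_MULTADD are hypothetical names used for illustration.
func multiplyAdd(_ a: Double, _ b: Double, _ c: Double) -> Double {
#if arch(arm64) && ASM_MULTADD
    // multadd is declared in asmmath.h and reaches Swift through the bridging header.
    return multadd(a, b, c)
#else
    // Portable fallback: computes c + a * b with a single rounding, like FMADD.
    return c.addingProduct(a, b)
#endif
}

print(multiplyAdd(2.0, 3.0, 4.0))
```

Using addingProduct as the fallback means both paths compute the same single-rounding result. An Intel build should therefore print the same values as the assembly version.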

Build your app a little interface window, here with three numeric input boxes, and a scrolling text output view, so that it can take three floating point numbers as input, and write a string containing the results. Add real code to perform the compiled equivalent of what you’re going to code in assembly, a double multiply and add.

My primary purpose in exploring ARM assembly is to look in greater detail at its floating point arithmetic. The instructions which I’m most interested in merge two arithmetic operations, multiply and add. They take three doubles, a, b and c, and calculate the result of (a * b) + c. They’re of particular interest to me because they round only once, at the end, and so reduce error compared with two separate operations. So that’s what this initial version of AsmAttic is going to perform in both Swift and assembly.
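To see why a single rounding can matter, here is a small Swift sketch that isn't part of AsmAttic. It uses Double.addingProduct, which Swift documents as adding a product to a value without intermediate rounding. The inputs are chosen purely for illustration. The exact product a * b is 1 - 2^-104, which rounds to exactly 1.0 as a double. The separate add therefore loses the result completely, while the fused version keeps it.

```swift
// Inputs chosen so that a * b is not exactly representable as a Double.
let eps = Double.ulpOfOne   // 2^-52
let a = 1.0 + eps
let b = 1.0 - eps
let c = -1.0

// Two operations: a * b = 1 - 2^-104 rounds to 1.0, then adding c gives 0.
let product = a * b
let unfused = product + c

// One operation: c + a * b computed without intermediate rounding,
// so the exact result, -2^-104, survives.
let fused = c.addingProduct(a, b)

print("unfused: \(unfused), fused: \(fused)")
```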

At this stage, wire up the window with code which performs that using Swift (see the completed code below for one solution). Test the app to ensure that it works without calling any assembly routines.

When you’re happy that’s working correctly, add a New File, selecting the Assembly type, and naming it asmmath.s. The code it contains is short and sweet:

    .global _multadd
    .align 4

That stores a set of registers, performs the FMADD operation on the three doubles, leaves the result in D0, restores the registers, and returns.

To be able to access that from Swift, you then need a C header file, asmmath.h, which contains just the following:

    #ifndef asmmath_h
    #define asmmath_h

    extern double multadd(double, double, double);

    #endif /* asmmath_h */

As with the other files here, ensure this is added to the project’s target.

In theory, Xcode should automatically generate a bridging header to enable your Swift code to call that assembly routine. In practice, I’ve not seen that happen, and have had to create one manually. To do that, add another Header file, this time named AsmAttic-Bridging-Header.h. Inside that, the key line is:

    #include "asmmath.h"

That line bridges between Swift and the C header. The header in turn declares multadd, the C name of the _multadd routine in assembly language; Apple platforms prefix C symbol names with an underscore.

The final step is to tell Xcode to use that bridging header in the project’s Build Settings. In the Swift Compiler – General section, set Objective-C Bridging Header to the path of your file relative to the project file. That path is typically something like AsmAttic/AsmAttic-Bridging-Header.h. If you get it wrong, Xcode will complain that it can’t find the bridging header, and builds will fail with that error.

Go back to your Swift code that handles the button press, and call the assembly routine using code such as

    let theRes2 = multadd(theA.doubleValue, theB.doubleValue, theC.doubleValue)

so that you can print theRes2 as its result in the output text.

My final code for the ViewController reads:

    import Cocoa

    class ViewController: NSViewController {

        @IBOutlet weak var no1Text: NSTextField!
        @IBOutlet weak var no1Formatter: NumberFormatter!
        @IBOutlet weak var no2Text: NSTextField!
        @IBOutlet weak var no2Formatter: NumberFormatter!
        @IBOutlet weak var no3Text: NSTextField!
        @IBOutlet weak var no3Formatter: NumberFormatter!
        @IBOutlet var outputText: NSTextView!

        override func viewDidLoad() {
            super.viewDidLoad()
            // Do any additional setup after loading the view.
        }

        override var representedObject: Any? {
            didSet {
                // Update the view, if already loaded.
            }
        }

        @IBAction func goButton(_ sender: Any) {
            // Only calculate when all three inputs parse as numbers.
            if let theA = self.no1Formatter.number(from: self.no1Text.stringValue) {
                if let theB = self.no2Formatter.number(from: self.no2Text.stringValue) {
                    if let theC = self.no3Formatter.number(from: self.no3Text.stringValue) {
                        // Multiply and add written in Swift.
                        let theRes1 = theA.doubleValue * theB.doubleValue + theC.doubleValue
                        // The same calculation using FMADD in the assembly routine.
                        let theRes2 = multadd(theA.doubleValue, theB.doubleValue, theC.doubleValue)
                        self.outputText.string = "In Swift \(theRes1), by assembler \(theRes2)\n"
                    }
                }
            }
        }
    }

Your app should now let you set the three variables, calculate the result both using Swift and the FMADD operation, and write the result to the output view.

Passing the three doubles to the assembly language routine and passing the result back relies on the calling convention, which passes the three values in registers D0 to D2, and returns the result in D0. In the next article I’ll look at those calling conventions, which are so crucial to success in assembly language.

References

Stephen Smith (2020) Programming with 64-Bit ARM Assembly Language, Apress, ISBN 978-1-4842-5880-4.
Daniel Kusswurm (2020) Modern Arm Assembly Language Programming, Apress, ISBN 978-1-4842-6266-5.

I’ve written about this before, but yesterday the NY Times published an interesting piece on the EUV lithography machines produced by ASML and how those machines really determine who can manufacture cutting-edge microchips.

As you probably know, there’s a concept called Moore’s Law, which observes that the number of transistors on a chip doubles roughly every two years, while the cost per transistor falls. And for the most part that has held true since the first microprocessors were introduced in the 1970s.
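The doubling rule can be written as N(t) = N0 × 2^(t/2), with t in years. Below is a minimal Swift sketch of that arithmetic. The function name and the starting count are hypothetical, and the sketch models only the idealized doubling, not any real chip's history.

```swift
import Foundation

/// Projected transistor count after `years`, if the count doubles every `doublingPeriod` years.
/// Illustrates the idealized doubling rule only; it is not a forecast.
func projectedTransistorCount(start: Double, years: Double, doublingPeriod: Double = 2.0) -> Double {
    start * pow(2.0, years / doublingPeriod)
}

// Hypothetical example: 1,000 transistors projected 10 years ahead is five doublings.
let projected = projectedTransistorCount(start: 1_000, years: 10)
print(projected)
```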

But cramming more and more transistors into the same physical space gets harder over time. With each successive generation of chips, the number of transistors packed into a square millimeter has to climb, and actually making that happen turns out to be massively difficult. In fact, it took expertise from companies around the world to create the world’s first EUV lithography machines. The machine itself is about the size and shape of a bus and costs more than $150 million.

Inside is a series of mirrors that reflect extreme ultraviolet light off a mask bearing an image of the chip, shrinking it down so that many copies can be printed onto a single silicon wafer. ASML partnered with the German optical company Zeiss to produce the high-end optics for the machines. But it turns out that even the best mirrors aren’t very reflective at the extreme ultraviolet wavelengths needed to produce the small features on the latest chips, so the light source has to be very bright to compensate. In the end, ASML settled on a design that sprays tiny droplets of molten tin. Those droplets are then hit with a powerful laser, which instantly turns them into a plasma that releases a lot of ultraviolet light.
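A quick back-of-the-envelope sketch shows why the losses at each mirror add up. The 70% reflectivity per mirror and the count of ten mirrors are illustrative assumptions, not figures from the article. Each reflection keeps only part of the light, so the surviving fraction is the reflectivity raised to the number of mirrors.

```swift
import Foundation

/// Fraction of light left after `mirrorCount` reflections,
/// each keeping the fraction `reflectivity` of the light that reaches it.
func throughput(reflectivity: Double, mirrorCount: Int) -> Double {
    pow(reflectivity, Double(mirrorCount))
}

// Illustrative assumptions only: about 70% reflectivity per mirror, ten mirrors.
let remaining = throughput(reflectivity: 0.7, mirrorCount: 10)   // 0.7^10 ≈ 0.028, under 3%
print("Fraction of light left after ten reflections: \(remaining)")
```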

To say it’s a complicated system is underselling it substantially. An IBM senior VP calls it the most complicated machine ever built by humans. ASML has only made about 100 of them and can only make a maximum of about 50 of them in a year. But thanks to the Trump administration, China can’t buy one.

The tool, which took decades to develop and was introduced for high-volume manufacturing in 2017, costs more than $150 million. Shipping it to customers requires 40 shipping containers, 20 trucks and three Boeing 747s.

The complex machine is widely acknowledged as necessary for making the most advanced chips, an ability with geopolitical implications. The Trump administration successfully lobbied the Dutch government to block shipments of such a machine to China in 2019, and the Biden administration has shown no signs of reversing that stance.

Manufacturers can’t produce leading-edge chips without the system, and “it is only made by the Dutch firm ASML,” said Will Hunt, a research analyst at Georgetown University’s Center for Security and Emerging Technology, which has concluded that it would take China at least a decade to build its own similar equipment. “From China’s perspective, that is a frustrating thing.”…

Since ASML introduced its commercial EUV model in 2017, customers have bought about 100 of them. Buyers include Samsung and TSMC, the biggest service producing chips designed by other companies. TSMC uses the tool to make the processors designed by Apple for its latest iPhones. Intel and IBM have said EUV is crucial to their plans.

“It’s definitely the most complicated machine humans have built,” said Darío Gil, a senior vice president at IBM.

It would probably take a decade and a trillion dollars for China to replicate the European, Japanese and American supply chain that produces the parts for the ASML EUV lithography machine. That’s a long time, and by the time they got it done the free world would have moved on to something even more advanced.

But one place that does have these machines is the world’s leading chipmaker, TSMC, which stands for Taiwan Semiconductor Manufacturing Company. So if you’re wondering why China is so hot to reunite Taiwan with the mainland, one reason may be that by taking over the island it would effectively seize control of the latest ASML machines, which it can’t buy and can’t produce on its own. It would be the greatest theft of advanced technology in China’s history. If China wants its technology to catch up with the rest of the world, an invasion of Taiwan is probably its best bet.

Of course, invading Taiwan would probably put an end to sales of new EUV machines to TSMC, but having TSMC under Chinese control could set back the rest of the world’s chip manufacturing by several years. China might not be able to catch up completely, but it could jump ahead several years and set the US back at the same time. That’s one reason the US has been looking into becoming less dependent on places like Taiwan for our high-tech manufacturing as we move forward.

Update: This primer on EUV lithography from Zeiss notes that a single image printed by the optical system contains about a terapixel of information. That’s equivalent to 2.4 million times the number of pixels in an HDTV.
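The HDTV comparison is worth a quick check. Assume a terapixel means 10^12 pixels and an HDTV frame is 1920 × 1080 (2,073,600 pixels). On that basis the sketch below gives a ratio of roughly 482,000 rather than 2.4 million. The Zeiss figure presumably rests on a different definition of the comparison, so treat the exact multiple with some caution.

```swift
// Assumptions: "terapixel" = 10^12 pixels, "HDTV" = one 1920 x 1080 frame.
let imagePixels = 1.0e12
let hdtvPixels = 1920.0 * 1080.0        // 2,073,600 pixels
let ratio = imagePixels / hdtvPixels    // roughly 482,000 frames' worth

print("A terapixel is about \(Int(ratio.rounded())) HDTV frames")
```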

One more mind-blowing stat. If you expanded an EUV mirror to the size of Germany, the largest bump on its surface would be 100 micrometers tall. That’s how perfect the optics that produce the chips in a modern iPhone are.
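To bring the Germany comparison back to real scale, the sketch below treats Germany's roughly 357,000 km² as a square. It also assumes a mirror 1 meter across, a size chosen purely for illustration. Scaling a 100-micrometer bump down by the same factor leaves a bump of well under a nanometer, around 0.17 nm.

```swift
// Illustrative assumptions: Germany's area of about 357,000 km^2 treated as a square,
// and a hypothetical mirror 1 meter across.
let germanyAreaKm2 = 357_000.0
let germanyWidthM = germanyAreaKm2.squareRoot() * 1_000     // about 597,500 m
let mirrorWidthM = 1.0
let scaleFactor = germanyWidthM / mirrorWidthM

let bumpAtGermanyScaleM = 100e-6                             // 100 micrometers
let bumpOnMirrorNm = bumpAtGermanyScaleM / scaleFactor * 1e9 // about 0.17 nm

print("Largest bump on the real mirror: about \(bumpOnMirrorNm) nm")
```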