Representing Representers and What They Represent

(Note change to less pretentious and more accessible title) • Kathryn Blackmond Laskey • George Mason University (GMU), Department of Systems Engineering and Operations Research • Krasnow Institute • QMind II

This talk is dedicated to the memory of journalist Danny Pearl, murdered in Pakistan in February 2002, and to the pioneering research of his father Judea Pearl. Judea Pearl’s research has the potential to create unprecedented advances in our ability to anticipate and prevent future terrorist incidents.

Representation • A representation consists of: • A representing system • A represented system • A mapping between the representing system and the represented system • Important properties of the represented system correspond to features in the representation • A conscious organism • Represents its environment and possibly itself to itself • Uses its representations to engage in adaptive behavior with respect to its environment • Sense • Recognize • Plan and act

[Figure: Real World and Representation, connected by arrows labeled Observations and Actions]

Science and Representation • Elements of a representation • Reality to represent • Space of possible representations of reality • Correspondence between aspects of reality and features in representation space • Important considerations • By whom is representation being used? • For what purpose? • How to measure how good it is? • Scientists study a phenomenon by • Building a representation of the phenomenon • Manipulating the representation • Comparing non-obvious features of the representation to corresponding features in reality • How do we study representation?

Representing Representation • Real world with real representation created by real conscious subsystem • Artificial world with simulated representation created by simulated conscious subsystem [Figure: arrows labeled Observations and Actions]

Physics, Representation and Learning • Cross-fertilization from physics to statistics and machine learning has created rapid progress • Recipe for creating a good learning algorithm • Represent the learning problem as a physical system in which “low action” or “low free energy” maps to good representation • Simulate the physical system on a computer • Let the simulation evolve according to (simulated) laws of physics • Presto! Out comes a good solution to your problem • The opposite direction: • Can ideas from learning theory give insights for a physics of consciousness?
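The recipe above can be sketched with simulated annealing, one well-known algorithm of this kind. This is a minimal illustration, not the method used in the talk: the energy landscape `toy_energy`, the proposal `gaussian_step`, and the cooling schedule are all hypothetical choices made for the example.

```python
import math
import random


def anneal(energy, propose, state, steps=5000, t_start=2.0, t_end=0.01, seed=0):
    """Minimize `energy` by simulating a system that is slowly cooled."""
    rng = random.Random(seed)
    cur, cur_e = state, energy(state)
    best, best_e = cur, cur_e
    for i in range(steps):
        # Geometric cooling from t_start down to t_end (an assumed schedule).
        temp = t_start * (t_end / t_start) ** (i / (steps - 1))
        cand = propose(cur, rng)
        cand_e = energy(cand)
        # Metropolis rule: always accept downhill moves, and accept uphill
        # moves with probability exp(-dE / T), which allows escape from
        # local minima while the temperature is still high.
        if cand_e <= cur_e or rng.random() < math.exp((cur_e - cand_e) / temp):
            cur, cur_e = cand, cand_e
            if cur_e < best_e:
                best, best_e = cur, cur_e
    return best, best_e


def toy_energy(x):
    # Hypothetical landscape: a quadratic bowl with several local minima.
    return x * x + 3.0 * math.sin(5.0 * x)


def gaussian_step(x, rng):
    # Hypothetical proposal: a small random perturbation of the current state.
    return x + rng.gauss(0.0, 0.5)


best_x, best_e = anneal(toy_energy, gaussian_step, state=4.0)
print(best_x, best_e)
```

"Low energy" plays the role of "good representation" here; the random uphill moves are the intrinsic randomness that keeps the search from freezing at a local optimum.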

Learners and Learnable Phenomena • Good learners • Loosely coupled local learners • Multi-resolution representations • Bias toward simple representations • Compose elements to form complex representations • Adjust appropriately to environmental feedback • Intrinsic randomness to bump out of locally but not globally optimal representations • Learnable systems • Repeated structure • Complexity built up out of simple pieces • Not too much randomness • A system capable of self-representation must be • Simple enough to exhibit learnable regularities • Complex enough to form and evolve representations of itself

20th Century Science • Physical reality • Wave function • Deterministic evolution punctuated by “jumps” • No consensus on • How and why “jumps” occur • How consciousness interacts with physical world [Figure: Real World and Representation, connected by arrows labeled Observations and Actions, marked with a “?”]

Stapp Theory of Consciousness • Timing of reduction and choice of operator occur by conscious choice • Efficacious conscious choice enters where physics currently lacks a theory • Comments on Stapp theory • Stapp does not demand that all state vector reductions involve conscious choice • Theory and experiment verify that macroscopic evolution of physical system can depend on choice and timing of reductions • Experimentally verified quantum Zeno effect is one potential mechanism by which conscious choice might operate • Stapp argues that operation of quantum Zeno effect is plausible in conditions occurring in brains
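The quantum Zeno effect mentioned above can be illustrated with the textbook two-level system. This is a sketch under stated assumptions, not a model of the brain and not Stapp's own calculation: the system is assumed to rotate away from its initial state by a total angle `total_angle`, and `n_measurements` equally spaced projective measurements ask "is the system still in the initial state?". Both names are introduced here for illustration.

```python
import math


def survival_probability(total_angle, n_measurements):
    """Probability that every measurement finds the system in its initial state.

    Between measurements the state rotates by total_angle / n_measurements,
    so each measurement succeeds with probability cos^2 of that step angle.
    """
    step = total_angle / n_measurements
    return math.cos(step) ** (2 * n_measurements)


# With a single measurement at the end, a rotation by pi/2 leaves the initial
# state almost surely; frequent measurements hold the system in place.
for n in (1, 10, 100, 1000):
    print(n, survival_probability(math.pi / 2, n))
```

The survival probability approaches 1 as the number of measurements grows, which is the sense in which the timing of reductions can steer macroscopic evolution.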

Paradigm Shift in Computing • Old paradigm: Algorithms running on Turing machines • Deterministic • Based on Boolean logic • New paradigm: Economy of software agents executing on a physical symbol system • Agents make decisions (deterministic or stochastic) to achieve objectives • “Program” is replaced by dynamic system evolving better solutions • Based on decision theory / game theory / stochastic processes • Hardware realizations of physical symbol systems • Physical systems minimize action • Decision theoretic systems maximize utility / minimize loss • Hardware realization of physical symbol system maps action to utility • Programming languages are replaced by specification / interaction languages • Software designer specifies goals, rewards and information flows • Unified theory spans sub-symbolic to cognitive levels • Old paradigm is limiting case of new paradigm

Decision Graph: An Example • Maria is visiting a friend when she suddenly begins sneezing. • “Oh dear, I’m getting a cold,” she thinks. “I had better not visit Grandma.” • Then she notices scratches on the furniture. She sighs in relief. “I’m not getting a cold! It’s only my cat allergy acting up!” • Plausible inference: the evidence for cat allergy “explains away” the sneezing, and a cold is no longer needed as an explanation • Does Maria have a “grandmother neuron”?

What Happened Under the Hood? • A decision graph is both a knowledge representation and a computational architecture • Represents knowledge about variables and their interactions • Modular elements with defined interconnections • Computation can exploit loosely coupled structure for efficiency • Parsimony • Probability distributions on 5 binary variables → 31-dimensional space • Probability distributions for Maria’s Bayesian network → 9-dimensional space • Learning about one variable affects likelihood of other variables • Evidence “flows” along the arcs • Bidirectional inference • Learn structure and probabilities as cases accumulate • The information update operation is called Bayes rule • Bayesian inference is belief dynamics • Within-case evidence accumulation • Cross-case learning
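The "explaining away" pattern in Maria's story can be reproduced by brute-force enumeration over a small Bayesian network. The sketch below is illustrative only: the graph structure (cat causes allergic reaction and scratches; cold and allergic reaction both cause sneezing) and every probability are assumptions made for the example, and they may not match the network or the parameter count shown on the slide.

```python
from itertools import product

# Hypothetical numbers, chosen only to make the effect visible.
P_COLD = 0.05
P_CAT = 0.05
P_ALLERGY_GIVEN_CAT = {True: 0.7, False: 0.01}
P_SCRATCH_GIVEN_CAT = {True: 0.6, False: 0.02}
# Keyed by (cold, allergy).
P_SNEEZE = {(False, False): 0.01, (True, False): 0.8,
            (False, True): 0.8, (True, True): 0.95}

NAMES = ("cold", "cat", "allergy", "scratch", "sneeze")


def bern(p, value):
    """Probability that a binary variable with P(True) = p takes `value`."""
    return p if value else 1.0 - p


def joint(cold, cat, allergy, scratch, sneeze):
    """Joint probability as a product of local (per-node) distributions."""
    return (bern(P_COLD, cold)
            * bern(P_CAT, cat)
            * bern(P_ALLERGY_GIVEN_CAT[cat], allergy)
            * bern(P_SCRATCH_GIVEN_CAT[cat], scratch)
            * bern(P_SNEEZE[(cold, allergy)], sneeze))


def posterior(query, evidence):
    """P(query = True | evidence), summing the joint over all 2^5 worlds."""
    num = den = 0.0
    for values in product([False, True], repeat=len(NAMES)):
        world = dict(zip(NAMES, values))
        if any(world[name] != val for name, val in evidence.items()):
            continue
        p = joint(**world)
        den += p
        if world[query]:
            num += p
    return num / den


p_sneeze_only = posterior("cold", {"sneeze": True})
p_with_scratches = posterior("cold", {"sneeze": True, "scratch": True})
print(P_COLD, p_sneeze_only, p_with_scratches)
```

Sneezing raises the belief in a cold well above its prior; seeing scratches then lowers it again, because the cat allergy now accounts for the sneezing. Evidence about scratches reaches the cold variable only through the shared effect, which is the bidirectional flow along arcs described above.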

Subjective Probability • P_S(E|B) is system’s degree of belief that E will occur given background information B • In subjectivist theory there is no one “correct” probability • Viewpoints vary on whether “objective probabilities” exist • Probability as belief dynamics • If new information N is added to background information B then belief in E changes to P_S(E|B&N) • Probability updating follows the dynamic equation known as Bayes rule: posterior odds ratio = prior odds ratio × likelihood ratio • Belief in E1 increases relative to E2 if N was more likely to co-occur with E1 than with E2
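The odds form of Bayes rule that the slide's labels refer to can be written out as follows. This is the standard textbook identity, stated in the slide's symbols, with P_S denoting the system's degree of belief; the equation image itself did not survive in the transcript, so the layout is a reconstruction.

```latex
\underbrace{\frac{P_S(E_1 \mid B \,\&\, N)}{P_S(E_2 \mid B \,\&\, N)}}_{\text{posterior odds ratio}}
=
\underbrace{\frac{P_S(E_1 \mid B)}{P_S(E_2 \mid B)}}_{\text{prior odds ratio}}
\times
\underbrace{\frac{P_S(N \mid E_1 \,\&\, B)}{P_S(N \mid E_2 \,\&\, B)}}_{\text{likelihood ratio}}
```

When the likelihood ratio exceeds 1, that is, when N was more probable under E1 than under E2, the odds shift toward E1, which is the last bullet's statement in equation form.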

Maria’s Continuing Saga… • Variation 1: • Tran is sneezing and saw scratches • Tran was recently exposed to a cold and probably is not allergy prone • Variation 2: • Tran saw scratches • Maria did not see scratches • Tran is in room with Maria • Variation 3: • Tran and Maria both are sneezing, are allergy prone, and saw scratches • Tran and Maria are a continent apart

Variation 1 • Add background variables to specialize model to different individuals • Still a “template model” with limited expressive power

Variation 2 • Decision graph has replicated sub-parts • Different kinds of entities (cats and people)

Variation 3 Done Wrong • Variation 2 model gets wrong answer if Maria and Tran are not near each other and both are near cats! • We need to be able to hypothesize additional cats if and when necessary • But is the cat dead or alive?

Variation 3 Done Right (…but what a mess!) • This model gets the right answer on all the variations

The Solution: Multi-Entity Decision Graphs • Specify model in pieces and let the computer compose them • First-order predicate calculus plus probabilities and decisions [Figure: model fragments: Spatial Fragment, Hypothesis Management Fragment, Cats & Allergies Fragment, Value Fragment, Colds & Time Fragment, Sneezing Fragment]

Representing Representation • Representation of “Real” World • Decisions and actions • When to take observation • Which question to ask • Predicted outcomes • Probability distribution for next observable • Values • Accurate prediction • Survival • “Real” World • Stochastic process • Time evolution governed by Schrödinger equation plus “quantum jumps” • No good theory for: • Timing of reductions • Which operator is applied [Figure: “Real” World and its Representation, connected by arrows labeled Observations and Actions]

Representation of “Player’s” Choice as Decision Graph [Figure: decision graph with nodes Ψ1, Ψ1+, Ψ2, Ψ2+, T1, T2, O1, O2, E1, E2, V1, V2; Schrödinger dynamics between two wave function reductions; an arc labeled “information influence”; two parts labeled “described in psychological terms”; node shapes distinguish decision, chance event, value, and deterministic event] • Information influences and value nodes are modeled by standard physics • Ψi = wave function before observation • Ψi+ = wave function after observation • Mi = measurement operation • Ti = time since last measurement • Ei = current experience • Vi = value to player • “Players” choose when to cause reduction events & operator to apply • “Players” evolve representations • Consistent with quantum mechanics • Schrödinger evolution between reductions • Dirac probabilities for selecting actual experience from possible experiences

When to Reduce? • Game theoretic semantics • Player’s utility function includes effort of applying operator and value of result • Choose reduction policy that maximizes player’s utility • Players interact and can affect each other’s utility • Evolutionary pressure for players who “like” policies conducive to survival • As time since last observation increases • Probability of “termination state” increases • Fatigue decreases • Components of uncertainty • Intrinsic stochasticity (“object level” uncertainty) • Lack of knowledge (“higher order” uncertainty) • Approximation error (“model uncertainty”) • As players evolve more complex and more accurate models • Forecasts become more accurate • Less higher order uncertainty • Better ability to control question asking • Players can learn to share information and effort
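The trade-off behind "choose reduction policy that maximizes player's utility" can be made concrete with a deliberately simple toy model. Everything in it is an assumption introduced for illustration and is not taken from the talk: each observation costs a fixed `effort`, the risk of reaching a termination state grows linearly with time since the last observation at rate `hazard_growth`, and the player picks a fixed observation interval.

```python
import math


def utility_rate(interval, effort, hazard_growth):
    """Average utility per unit time for a fixed observation interval.

    Toy assumptions: the effort of one observation is spread over the
    interval, and the hazard grows linearly since the last observation,
    so its time average over the interval is hazard_growth * interval / 2.
    """
    return -(effort / interval) - hazard_growth * interval / 2.0


def best_interval(effort, hazard_growth, candidates):
    """Pick the candidate interval with the highest utility rate."""
    return max(candidates, key=lambda t: utility_rate(t, effort, hazard_growth))


candidates = [0.1 * k for k in range(1, 101)]
chosen = best_interval(effort=1.0, hazard_growth=0.5, candidates=candidates)
# In this toy model the optimum is also available in closed form.
analytic = math.sqrt(2.0 * 1.0 / 0.5)
print(chosen, analytic)
```

Observing too often wastes effort, and observing too rarely lets the hazard build up; the best interval balances the two. Richer versions would let the policy depend on the player's current beliefs and on other players.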

Direction of Time • Second law of thermodynamics: time is direction of increasing physical entropy • “Learning universe” hypothesis: time is direction of increasing knowledge of players about the universe they inhabit • Can these arrows be reconciled? • Expansion of physical phase space • Contraction of information phase space

Communication • Learning can be faster when players exchange information • Communicating players exchange messages • Players can learn each other’s representations • Efficient communication: Player 1 expresses difference between Player 1’s knowledge and Player 2’s knowledge in language of Player 2’s representation • Mixed motives for information sharing • Communication respects laws of physics • Intrinsic randomness • Prevents “freezing” at local optima

Summary • Conscious agents construct representations • Conscious agents learn better representations over time • Common mathematics and algorithms for • Simulating physical systems • Learning complex representations • Many parameters • High degree of conditional independence (representation is restricted to low-dimensional subspace of all probability distributions) • High degree of self-similarity • Conscious subsystems of universe evolve to construct better representations of themselves and the world around them

To Think About • Where is the information technology revolution heading? • Silicon intelligence? • Symbiotic carbon/silicon intelligence? • “Earth consciousness”? • Assertion: It is important that we • Embed (decision theoretically) coherent logic and decision making rules in hardware of the systems we build • Base software architecture on decision and game theoretic semantics • Map the software dynamics properly to the physical dynamics • Understand the semantics of the knowledge representations we construct • Understand how they are realized in the physics • Understand the interface between the physics and the representation • Questions to address: • What base beliefs should we embed in hardware? • What base values should we embed in hardware? • How should we initialize decision rules? • It is better to have our eyes open and our minds engaged in considering the possibilities than to just let it happen to us

Speculative Question • Can “intelligent life” be modeled as an attractor in phase space? • “Players” stay alive by constructing accurate (enough) representations • Control exercised by “players” introduces nonlinearity • “Players” guide the system toward “edge of complexity” • Simple enough to learn • Complex enough to evolve learners • Learners cooperate to organize into mutually beneficial societies • Could we obtain a shadowing theorem? • Unconscious Schrödinger evolution moves away from the attractor under influence of local forces • Wave function reduction brings system back toward the attractor • Reduction event registers in consciousness of learners and increases their knowledge • Consciously adopting policies that keep us near attractor has survival value

Additional Speculative Questions • “Mixture distributions” over models of different dimensions are active research area • Bayes rule gives rise to “natural Occam’s razor” • Bias toward simple models (low-dimensional parameter space) • Dimensions are included as needed to explain observations • Tractable approximation of complex models • Algorithms imported from physics (e.g., variational methods, Markov Chain Monte Carlo) are being applied to learn very complex models • Can we be modeled as MCMC samplers learning a representation of the universe we live in? • If we average over models of different dimensions the parameter space is not a smooth manifold • Image: quantum foam • Might there be something to this image?
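The "natural Occam's razor" above can be seen in a two-model coin example. This is a standard textbook illustration, not an example from the talk: the simple model fixes the heads probability at 0.5 (no free parameter), the complex model puts a uniform prior on it (one free parameter), and the function names are introduced here for illustration.

```python
from math import factorial


def evidence_simple(n, k):
    """Marginal likelihood of one specific sequence with k heads in n tosses
    under the zero-parameter model (fair coin). It does not depend on k."""
    return 0.5 ** n


def evidence_complex(n, k):
    """Marginal likelihood of the same sequence when the heads probability
    has a uniform prior: the Beta integral k! (n - k)! / (n + 1)!."""
    return factorial(k) * factorial(n - k) / factorial(n + 1)


def bayes_factor(n, k):
    """Evidence ratio, complex over simple; values above 1 favor the
    model with the extra dimension."""
    return evidence_complex(n, k) / evidence_simple(n, k)


print(bayes_factor(10, 5))  # balanced data
print(bayes_factor(10, 9))  # lopsided data
```

With balanced data the ratio is below 1, so the simpler model wins even though the complex model can fit the data equally well; with lopsided data the ratio rises above 1 and the extra dimension is included because it is needed to explain the observations.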
