1 / 127

Leonid Perlovsky Visiting Scholar, Harvard University Technical Advisor, AFRL





Presentation Transcript


  1. COGNITIVE COMPUTATIONAL INTELLIGENCE for data mining, financial prediction, tracking, fusion, language, cognition, and cultural evolution. IASTED CI 2009, Honolulu, HI, 1:30 – 5:30 pm, Aug. 19. Leonid Perlovsky, Visiting Scholar, Harvard University; Technical Advisor, AFRL

  2. OUTLINE 1. Cognition and Logic 2. The Knowledge Instinct – Dynamic Logic 3. Language 4. Integration of cognition and language 5. Higher Cognitive Functions 6. Evolution of cultures 7. Future directions

  3. INTRODUCTION The mind

  4. PHYSICS AND MATHEMATICS OF THE MIND: RANGE OF CONCEPTS • Logic is sufficient to explain the mind • [Newell, “Artificial Intelligence”, 1980s] • No new specific mathematical concepts are needed: the mind is a collection of ad-hoc principles [Minsky, 1990s] • Specific mathematical constructs describe the multiplicity of mind phenomena • “first physical principles of mind” • [Grossberg, Zadeh, Perlovsky, …] • Quantum computation • [Hameroff, Penrose, Perlovsky, …] • New, yet unknown physical phenomena • [Josephson, Penrose]

  5. GENETIC ARGUMENTS FOR THE “FIRST PRINCIPLES” • Only ~30,000 genes in the human genome • Only about 2% difference between humans and apes • Say, 1% difference between human and ape minds • Only about 300 proteins • Therefore, the mind has to utilize a few inborn principles • If we count “a protein per concept” • If we count combinations: 300^300 ~ unlimited => all concepts and languages could have been genetically hardwired (!?!) • But languages and concepts are not genetically hardwired • Because they have to be flexible and adaptive

  6. COGNITION • Understanding the world • Perception • Simple objects • Complex situations • Integration of real-time signals and existing (a priori) knowledge • From signals to concepts • From less knowledge to more knowledge

  7. EXAMPLE • Example: “this is a chair, it is for sitting” • Identify objects • signals -> concepts • What in the mind help us do this? Representations, models, ontologies? • What is the nature of representations in the mind? • Wooden chairs in the world, but no wood in the brain

  8. VISUAL PERCEPTION • Neural mechanisms are well studied • Projection from retina to visual cortex (geometrically accurate) • Projection of memories-models from memory to visual cortex • Matching: sensory signals and models • The visual nerve has more feedback connections than feedforward ones • Matching involves complicated adaptation of models and signals • Difficulty • Associating signals with models • Many models (expected objects and scenes) • Many more combinations: models <-> pixels • Association + adaptation • To adapt, signals and models should be matched • To match, they should be adapted

  9. ALGORITHMIC DIFFICULTIES: A FUNDAMENTAL PROBLEM? • Cognition and language involve evaluating large numbers of combinations • Pixels -> objects -> scenes • Combinatorial Complexity (CC) • A general problem (since the 1950s) • Detection, recognition, tracking, fusion, situational awareness, language… • Pattern recognition, neural networks, rule systems… • The number of combinations of 100 elements is 100^100 • This number ~ the size of the Universe • > all the events in the Universe during its entire life

  10. COMBINATORIAL COMPLEXITY SINCE THE 1950s • CC has been encountered for over 50 years • Statistical pattern recognition and neural networks: CC of learning requirements • Rule systems and AI, in the presence of variability: CC of rules • Minsky, 1960s: Artificial Intelligence • Chomsky, 1957: language mechanisms are rule systems • Model-based systems with adaptive models: CC of computations • Chomsky, 1981: language mechanisms are model-based (rules and parameters) • Current ontologies and the “semantic web” are rule systems • Evolvable ontologies: the present challenge

  11. CC AND TYPES OF LOGIC • CC is related to formal logic • Law of excluded middle (or excluded third): every logical statement is either true or false • Gödel proved that logic is “illogical,” “inconsistent” (1930s) • CC is Gödel's “incompleteness” in a finite system • Multivalued logic eliminated the “law of excluded third” • Still the mathematics of formal logic • Excluded 3rd -> excluded (n+1)th • Fuzzy logic eliminated the “law of excluded third” • How to select “the right” degree of fuzziness? • The mind fits fuzziness for every statement at every step => CC • Logic pervades all algorithms and neural networks • Rule systems, fuzzy systems (degree of fuzziness), pattern recognition, neural networks (training uses logical statements)

  12. LOGIC VS. GRADIENT ASCENT • Gradient ascent maximizes without CC • Requires continuous parameters • How to take gradients along “association”? • “Data X(n) belongs (or does not belong) to object m” is a logical statement: discrete, non-differentiable • Models / ontologies require logic => CC • Multivalued logic does not lead to gradient ascent • Fuzzy logic uses continuous association variables, b • A new principle is needed to specify gradient ascent along fuzzy associations: dynamic logic

  13. DYNAMIC LOGIC • Dynamic Logic unifies formal and fuzzy logic • initial “vague or fuzzy concepts” dynamically evolve into “formal-logic or crisp concepts” • Dynamic logic • based on a similarity between models and signals • Overcomes CC • fast algorithms • Proven in neuroimaging experiments (Bar, 2006) • Initial representations-memories are vague • “close-eyes” experiment

  14. ARISTOTLE VS. GÖDEL: logic, forms, and language • Aristotle • Logic: a supreme way of argument • Forms: representations in the mind • Form-as-potentiality evolves into form-as-actuality • Logic is valid for actualities, not for potentialities (Dynamic Logic) • Language and thinking are closely linked • Language contains the necessary uncertainty • From Boole to Russell: formalization of logic • Logicians eliminated the uncertainty of language from logic • Hilbert: formalize the rules of mathematical proofs forever • Gödel (the 1930s) • Logic is not consistent • Any statement can be proved both true and false • (Illustration: Aristotle and Alexander the Great)

  15. OUTLINE • Cognition, complexity, and logic • Logic does not work, but the mind does • The Mind and Knowledge Instinct • Neural Modeling Fields and Dynamic Logic • Language • Integration of cognition and language • Higher Cognitive Functions • Future directions

  16. STRUCTURE OF THE MIND • Concepts • Models of objects, their relations, and situations • Evolved to satisfy instincts • Instincts • Internal sensors (e.g. sugar level in blood) • Emotions • Neural signals connecting instincts and concepts • e.g. a hungry person sees food all around • Behavior • Models of goals (desires) and muscle-movement… • Hierarchy • Concept-models and behavior-models are organized in a “loose” hierarchy

  17. THE KNOWLEDGE INSTINCT • Model-concepts always have to be adapted • lighting, surrounding, new objects and situations • even when there is no concrete “bodily” needs • Instinct for knowledge and understanding • Increase similarity between models and the world • Emotions related to the knowledge instinct • Satisfaction or dissatisfaction • change in similarity between models and world • Related not to bodily instincts • harmony or disharmony (knowledge-world): aesthetic emotion

  18. REASONS FOR PAST LIMITATIONS • Human intelligence combines conceptual understanding with emotional evaluation • A long-standing cultural belief that emotions are opposite to thinking and intellect • “Stay cool to be smart” • Socrates, Plato, Aristotle • Reiterated by founders of Artificial Intelligence [Newell, Minsky]

  19. Neural Modeling Fields (NMF) • A mathematical construct modeling the mind • Neural synaptic fields • A loose hierarchy • bottom-up signals, top-down signals • At every level: concepts, emotions, models, behavior • Concepts become input signals to the next level

  20. NEURAL MODELING FIELDS: basic two-layer mechanism, from signals to concepts • Bottom-up signals • Pixels or samples (from sensor or retina): x(n), n = 1, …, N • Top-down signals (concept-models): Mm(Sm, n), parameters Sm, m = 1, … • Models predict expected signals from objects • Goal: learn object-models and match them to signals (knowledge instinct)

  21. THE KNOWLEDGE INSTINCT • The knowledge instinct = maximize the similarity between signals and models • Similarity between signals and models, L • L = l({x}) = ∏n l(x(n)) • l(x(n)) = Σm r(m) l(x(n) | Mm(Sm, n)) • l(x(n) | Mm(Sm, n)) is a conditional similarity for x(n) given m • The {n} are not independent; M(n) may depend on n' • CC: expanded, L contains M^N items — all associations of pixels and models (LOGIC)

  22. SIMILARITY • Similarity as likelihood • l(x(n) | Mm(Sm, n)) = pdf(x(n) | Mm(Sm, n)), a conditional pdf for x(n) given m • e.g., Gaussian: pdf(X(n)|m) = G(X(n) | Mm, Cm) = (2π)^(−d/2) det(Cm)^(−1/2) exp(−½ Dmn^T Cm^(−1) Dmn); Dmn = X(n) − Mm(n) • Note: this is NOT the usual “Gaussian assumption” • Deviations from the models, D, are random — not the data X • Multiple models {m} can model any pdf, not one Gaussian model • Use for sets of data points • Similarity as information • l(x(n) | Mm(Sm, n)) = abs(x(n)) * pdf(x(n) | Mm(Sm, n)) • The mutual information in model m on data x(n) • L is the mutual information in all models about all data • e.g., Gaussian pdf(X(n)|m) = G(X(n) | Mm, Cm) • Use for continuous data (signals, images)
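The point that multiple Gaussian models together can represent a non-Gaussian pdf can be illustrated with a minimal 1-D sketch (the function names here are illustrative, not from the slides): a prior-weighted sum of conditional similarities forms a mixture density that a single Gaussian cannot match.

```python
import math

def gaussian_pdf(x, mean, var):
    """1-D Gaussian density G(x | mean, var); the d-dimensional form on
    the slide replaces var with a covariance matrix Cm."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def mixture_pdf(x, priors, means, variances):
    """pdf(x) = sum_m r(m) * G(x | Mm, Cm): several Gaussian models
    together can represent, e.g., a bimodal density."""
    return sum(r * gaussian_pdf(x, mu, v)
               for r, mu, v in zip(priors, means, variances))

# Two models with equal priors r(m) = 0.5: the mixture has peaks near
# -2 and +2, which no single Gaussian can reproduce.
p_peak = mixture_pdf(-2.0, [0.5, 0.5], [-2.0, 2.0], [1.0, 1.0])
p_valley = mixture_pdf(0.0, [0.5, 0.5], [-2.0, 2.0], [1.0, 1.0])
```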

  23. DYNAMIC LOGIC (DL): non-combinatorial solution • Start with a set of signals and unknown object-models • Any parameter values Sm • Associate signals (n) and models (m): • (1) f(m|n) = r(m) l(n|m) / Σm' r(m') l(n|m') • Improve the parameter estimates: • (2) Sm = Sm + α Σn f(m|n) [∂ln l(n|m)/∂Mm] [∂Mm/∂Sm] • Continue iterations (1)–(2). Theorem: NMF is a converging system — similarity increases on each iteration — aesthetic emotion is positive during learning
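A minimal 1-D sketch of iterating steps (1) and (2), assuming Gaussian conditional similarities and mean-only models so that ∂ln l(n|m)/∂Mm = (x(n) − Mm)/σ²; the function name, step-size normalization, and data are illustrative assumptions, not from the slides.

```python
import math

def dl_iterations(x, means, priors, var=1.0, alpha=0.5, n_iter=50):
    """Alternate (1) association and (2) parameter update for 1-D data
    and mean-only models Mm = Sm."""
    means = list(means)
    for _ in range(n_iter):
        # (1) association probabilities f(m|n) = r(m) l(n|m) / sum_m' ...
        f = []
        for xn in x:
            l = [r * math.exp(-(xn - mu) ** 2 / (2 * var))
                 for r, mu in zip(priors, means)]
            s = sum(l)
            f.append([li / s for li in l])
        # (2) gradient step on each mean; dividing by the total
        # association weight normalizes the step size (an assumption
        # made here for stability)
        for m in range(len(means)):
            grad = sum(f[n][m] * (x[n] - means[m]) / var for n in range(len(x)))
            weight = max(sum(f[n][m] for n in range(len(x))), 1e-9)
            means[m] += alpha * grad / weight
    return means
```

Starting from vague means between the clusters, the models sharpen onto the cluster centers, mirroring the slide's vague-to-crisp evolution.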

  24. OUTLINE • Cognition, complexity, and logic • Logic does not work, but the mind does • The Mind and Knowledge Instinct • Neural Modeling Fields and Dynamic Logic • Application examples • Language • Integration of cognition and language • Higher Cognitive Functions • Future directions

  25. APPLICATIONS • Many applications have been developed • Government • Medical • Commercial (about 25 companies use this technology) • Sensor signals processing and object recognition • Variety of sensors • Financial market predictions • Market crash on 9/11 predicted a week ahead • Internet search engines • Based on text understanding • Evolving ontologies for Semantic Web • Every application needs models • Future self-evolving models: integrated cognition and language

  26. APPLICATION 1 – CLUSTERING (data mining) • Find “natural” groups or clusters in data • Use Gaussian pdfs and simple models: l(n|m) = (2π)^(−d/2) det(Cm)^(−1/2) exp(−½ Dmn^T Cm^(−1) Dmn); Dmn = X(n) − Mm(n); Mm(n) = Mm — each model has just one parameter, Sm = Mm • This is clustering with a Gaussian Mixture Model • For complex l(n|m), derivatives can be taken numerically; for simple l(n|m), manually (a simplification, not essential) • Simplify the parameter-estimation equation for Gaussian pdfs and simple models: ∂ln l(n|m)/∂Mm = ∂(−½ Dmn^T Cm^(−1) Dmn)/∂Mm = Cm^(−1) Dmn (C is symmetric), so Mm = Mm + α Σn f(m|n) Cm^(−1) Dmn • In this case even simpler equations can be derived: samples in class m: Nm = Σn f(m|n); N = Σm Nm; rates (priors): rm = Nm / N; means: Mm = Σn f(m|n) X(n) / Nm; covariances: Cm = Σn f(m|n) Dmn Dmn^T / Nm • Simple interpretation: Nm, Mm, and Cm are weighted averages; the only difference from standard mean and covariance estimation is the weights f(m|n), the probabilities of class m • These are iterative equations — f(m|n) depends on the parameters; theorem: the iterations converge
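A 1-D sketch of these weighted-average equations (function and variable names are illustrative): each pass computes f(m|n) by Bayes rule, then re-estimates Nm, rm, Mm, and Cm as weighted averages.

```python
import math

def gmm_cluster(x, means, n_iter=30):
    """Iterate the slide's clustering equations for 1-D data: f(m|n),
    then Nm, rm = Nm/N, Mm, Cm as weighted averages."""
    M = len(means)
    r = [1.0 / M] * M      # priors rm
    C = [1.0] * M          # variances (1-D covariances)
    means = list(means)
    for _ in range(n_iter):
        # association probabilities f(m|n) via Bayes rule
        f = []
        for xn in x:
            l = [r[m] * math.exp(-(xn - means[m]) ** 2 / (2 * C[m]))
                 / math.sqrt(2 * math.pi * C[m]) for m in range(M)]
            s = sum(l)
            f.append([li / s for li in l])
        # weighted-average re-estimation
        for m in range(M):
            Nm = sum(f[n][m] for n in range(len(x)))
            r[m] = Nm / len(x)
            means[m] = sum(f[n][m] * x[n] for n in range(len(x))) / Nm
            C[m] = sum(f[n][m] * (x[n] - means[m]) ** 2
                       for n in range(len(x))) / Nm
            C[m] = max(C[m], 1e-3)  # guard against variance collapse
    return r, means, C
```

The variance floor is an implementation safeguard added here; it is not part of the slide's equations.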

  27. EXAMPLE 2: GMTI TRACKING AND DETECTION BELOW CLUTTER (figure: true tracks over a 1 km range / cross-range window; model states at initialization, after 2, 5, 9, and 12 iterations, and at convergence) • DL starts with uncertain knowledge and converges rapidly on the exact solution • 18 dB improvement

  28. TRACKING AND DETECTION BELOW CLUTTER (movie, same as above) • DL starts with uncertain knowledge and, similar to the human mind, does not sort through all possibilities but converges rapidly on the exact solution • 3 targets, 6 scans, signal-to-clutter S/C ~ −3.0 dB

  29. TRACKING EXAMPLE: complexity and improvement • Technical difficulty • Signal/Clutter = −3 dB; standard tracking requires 15 dB • Computations for standard hypothesis testing ~ 10^1700 — unsolvable • Solved by Dynamic Logic • Computations: 2×10^7 • Improvement: 18 dB

  30. CRAMER-RAO BOUND (CRB) • Can a particular set of models be estimated from a particular (limited) set of data? • The question is not trivial • A simple rule of thumb: N (data points) > 10 * S (parameters) • In addition, use your mind: is there enough information in the data? • CRB: the minimal estimation error (best possible estimation) for any algorithm, neural network, … • When there are many data points, the CRB is a good measure (= ML = NMF) • When there are few data points (e.g., financial prediction) it might be difficult to assess performance • Actual errors >> CRB • A simple, well-known CRB for averaging several measurements: st.dev(n) = st.dev(1)/√n • A complex CRB for tracking: • Perlovsky, L.I. (1997a). Cramer-Rao Bound for Tracking in Clutter and Tracking Multiple Objects. Pattern Recognition Letters, 18(3), pp. 283–288.
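The averaging bound st.dev(n) = st.dev(1)/√n can be checked by simulation; a quick Monte-Carlo sketch (names and trial counts are illustrative choices):

```python
import random
import statistics

def stdev_of_mean(n, trials=2000, seed=0):
    """Empirical standard deviation of the average of n unit-variance
    Gaussian measurements, estimated over many repeated trials."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(n))
             for _ in range(trials)]
    return statistics.stdev(means)

# Averaging 16 measurements should cut the deviation by about sqrt(16) = 4.
ratio = stdev_of_mean(1) / stdev_of_mean(16)
```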

  31. APPLICATION 3 FINDING PATTERNS IN IMAGES

  32. IMAGE PATTERN BELOW NOISE (figure: object image, and object image + clutter, in x–y coordinates)

  33. PRIOR STATE-OF-THE-ART: computational complexity • Multiple Hypothesis Testing (MHT) approach: try all possible ways of fitting the model to the data • For a 100 × 100 pixel image: 1 object — 10^10 computations; 2 objects — 10^20; 3 objects — 10^30

  34. NMF MODELS • Information similarity measure: ln l(x(n) | Mm(Sm, n)) = abs(x(n)) * ln pdf(x(n) | Mm(Sm, n)), n = (nx, ny) • Clutter concept-model (m = 1): pdf(X(n)|1) = r1 • Object concept-models (m = 2, …): pdf(x(n) | Mm(Sm, n)) = r2 G(X(n) | Mm(n, k), Cm), Mm(n, k) = n0 + a*(k², k) (note: k, K require no estimation)

  35. ONE PATTERN BELOW CLUTTER (figure, x–y image; SNR = −2.0 dB)

  36. DYNAMIC LOGIC WORKING • DL starts with uncertain knowledge and, similar to the human mind, converges rapidly on the exact solution • The object is invisible to the human eye • By integrating the data with the knowledge-model, DL finds an object below noise (figure: range vs. cross-range, in meters)

  37. MULTIPLE PATTERNS BELOW CLUTTER • Three objects in noise (figure: 3-object image, and 3-object image + clutter, in x–y coordinates) • SCR: object 1, −0.70 dB; object 2, −1.98 dB; object 3, −0.73 dB

  38. IMAGE PATTERNS BELOW CLUTTER (figure, panels a–h: dynamic logic iterations; see note text) • Logical complexity = M^N = 10^5000 — unsolvable; DL complexity = 10^7 • S/C improvement ~ 16 dB

  39. MULTIPLE TARGET DETECTION: DL WORKING EXAMPLE (figure, x–y image) • DL starts with uncertain knowledge and, similar to the human mind, does not sort through all possibilities like an MHT, but converges rapidly on the exact solution

  40. COMPUTATIONAL REQUIREMENTS COMPARED: Dynamic Logic (DL) vs. classical state-of-the-art Multiple Hypothesis Testing (MHT), based on a 100 × 100 pixel image • 1 object: 10^8 (DL) vs. 10^10 (MHT) • 2 objects: 2×10^8 vs. 10^20 • 3 objects: 3×10^8 vs. 10^30 • The previously un-computable (10^30) can now be computed (3×10^8) • This pertains to many complex information-finding problems
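The scaling in the table can be expressed directly; the exponents below simply encode the slide's figures (roughly 10^10 MHT hypotheses per object and 10^8 DL operations per object on a 100 × 100 image) rather than being derived here.

```python
def ops_compared(k):
    """Operation counts for k objects, using the slide's figures:
    MHT grows combinatorially, DL grows linearly in the number of models."""
    mht = 10 ** (10 * k)   # joint hypotheses over all k object models
    dl = k * 10 ** 8       # dynamic logic: linear in number of models
    return dl, mht

# 3 objects: 3x10^8 (DL) vs. 10^30 (MHT), as in the table above
dl3, mht3 = ops_compared(3)
```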

  41. APPLICATION 4 SENSOR FUSION Concurrent fusion, navigation, and detection below clutter

  42. SENSOR FUSION • The difficult part of sensor fusion is the association of data among sensors • Which sample in one sensor corresponds to which sample in another sensor? • If objects can be detected in each sensor individually • The problem of data association still remains • Sometimes it is solved through coordinate estimation • If 3-D coordinates can be estimated reliably in each sensor • Sometimes it is solved through tracking • If objects can be reliably tracked in each sensor => 3-D coordinates • If objects cannot be detected in each sensor individually • We have to find the best possible association among multiple samples • This is the most difficult case: concurrent detection, tracking, and fusion

  43. NMF/DL SENSOR FUSION • NMF/DL for sensor fusion requires no new conceptual development • Multiple-sensor data require multiple sensor models • Data: n -> (s, n); X(n) -> X(s, n) • Models: Mm(n) -> Mm(s, n) • pdf(n|m) is a product over sensors: pdf(n|m) = ∏s pdf(X(s, n)|m) • This is a standard probabilistic procedure — another sensor is like another dimension • pdf(m|n) -> pdf(m|s, n) • Note: this solves the difficult problem of concurrent detection, tracking, and fusion
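The product-over-sensors rule can be sketched in a few lines, assuming independent Gaussian per-sensor similarities; all names here are illustrative, not from the slides.

```python
import math

def fused_similarity(samples, model_means, var=1.0):
    """For each model m, compute pdf(n|m) as a product over sensors s of
    per-sensor Gaussian pdfs: samples[s] is sensor s's measurement and
    model_means[m][s] is what model m predicts for sensor s."""
    out = []
    for means in model_means:
        p = 1.0
        for xs, mu in zip(samples, means):
            p *= (math.exp(-(xs - mu) ** 2 / (2 * var))
                  / math.sqrt(2 * math.pi * var))
        out.append(p)
    return out

# Two sensors, two candidate models: the sample agrees with model 0's
# predictions in both sensors, so its fused similarity dominates.
ps = fused_similarity([0.1, -0.1], [[0.0, 0.0], [3.0, 3.0]])
```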

  44. Source: UAS Roadmap 2005-2030 UNCLASSIFIED

  45. CONCURRENT NAVIGATION, FUSION, AND DETECTION • Multiple-target detection and localization based on data from multiple micro-UAVs • A complex case • Detection requires fusion (it cannot be done with one sensor) • Fusion requires exact target-position estimation in 3-D • Target position can be estimated by triangulation from multiple views • This requires exact UAV positions • GPS is not sufficient • UAV position — by triangulation relative to known targets • Therefore target detection and localization is performed concurrently with UAV navigation and localization, and with the fusion of information from multiple UAVs • Unsolvable using standard methods; dynamic logic can solve it because its computational complexity scales linearly with the number of sensors and targets

  46. GEOMETRY: MULTIPLE TARGETS, MULTIPLE UAVS (figure) • UAV m: Xm = X0m + Vm t, Xm = (Xm, Ym, Zm) • UAV 1: X1 = X01 + V1 t, X1 = (X1, Y1, Z1)

  47. CONDITIONAL SIMILARITIES (pdf) FOR TARGET k • Data from UAV m, sample number n: wnm, where βnm = signature position and fnm = classification feature vector • Similarity for the data given target k: a pdf over signature position times a pdf over classification features (equations on the slide are not reproduced in this transcript) • Note: there is also a pdf for a single clutter component, pdf(wnm | k = 0), which is uniform in βnm and Gaussian in fnm

  48. DATA MODEL AND LIKELIHOOD SIMILARITY • Compute the parameters that maximize the log-likelihood • The total pdf of the data samples is the sum of the conditional pdfs over targets plus clutter (a mixture model) • Parameters: classification-feature parameters, UAV parameters, target parameters

  49. CONCURRENT PARAMETER ESTIMATION / SIGNATURE ASSOCIATION (NMF iterations) • Find the solution for the set of “best” parameters by iterating between parameter estimation and association-probability estimation (Bayes rule: the probability that sample wnm was generated by target k) • Note 1: bracket notation (on slide) • Note 2: proven to converge (cf. the EM algorithm) • Note 3: the minimum-MSE solution incorporates GPS measurements

  50. Sensor 1 (of 3): Models Evolve to Locate Target Tracks in Image Data
