
Heuristic Search in Program Space for the AGINAO Cognitive Architecture

Wojciech (Wojtek) Skaba, aginao@aginao.com


Presentation Transcript


  1. Heuristic Search in Program Space for the AGINAO Cognitive Architecture. Wojciech (Wojtek) Skaba, aginao@aginao.com

  2. [Diagram: system overview. Labels: Environment (E); Sensory; Actuators; Robot; AMD Geode 500 MHz (Linux); Powerful host: Intel i7-980X (6 cores); RL Agent; States; Actions.]

  3. Robot-host communication
     Sensory (up to 3 MB/sec):
     • raw visual data, YUV422, 640x480
     • auditory data, FFT preprocessed (4 microphones)
     • joint positions
     • other…
     Actuators (less than 100 kB/sec):
     • new joint positions, movements
     • speakers (2)
     • virtual fovea
     • other…
     The RL agent has no explicit (predefined) information concerning the nature of the sensory data, nor the actions performed by the actuators.

  4. Data flow model
     Generalized artificial neuron. Transfer function == PROGRAM.
     Concept node:
     • program code
     • I/O structure
     • type (atomic, evolved)
     • local static memory
     • priority
     • resources
     • expiration
     • list of actions, rewards
     • other
     [Diagram: network of nodes labeled Atomic sensory (2), Concept node, Evolved node (several), Atomic actuator.]

  5. Self-programming == managing concept network
     • constructing new concepts, their structure, especially the embedded program code
     • adding/removing concepts to/from the concept network
     • adding/removing individual links between concepts
     • evaluating concept network
     Reinforcement learning setting
     STATE → "active" concept, i.e. one returning an output value (vector)
     ACTION → next program to be executed on that output value
     • action selection is independent of the current value of the output
     • the executed program is embedded in the next concept the selected action points to, not in the concept returning the output value
     • for multi-input concepts (>1), execution will be postponed until all output values of the other concepts are available and valid
     [Diagram labels: program, action.]

  6. Concept network evaluation
     Concept node: long-term program code, stored on a mass storage device
     Runtime node: short-term executable of a concept in RAM, lifetime <100 ms
     • concepts are evaluated by launching their program code as a runtime node
     • input/output values are runtime specific, i.e. unique for different runtimes
     • multiple runtimes of the same concept (>1000) may coexist and be executed
     • terminated runtimes cause updates of RL value functions, links and concept nodes
     [Diagram: concept nodes, each with several runtime nodes.]

  7. Hermeneutic circle*
     [Diagram: cycle between concept nodes and runtimes. Labels: Ask for next action; Create new runtime; New created runtime; Execute runtime; Discard runtime, if execution has failed; Terminated runtime; Update RL values; Remove links; Remove concept; Add new concepts; Add new links; Repeat N times.]

  8. System design layers
     • Virtual Machine (PHYSICS): one accumulator (int), one IDX register (int), ZERO flag, MINUS flag, local static memory (int *), one output (int *), N>0 inputs (int *); now int == 16 bits.
     • VM Instruction Set (ELEMENTS): 50+ machine code instructions, fixed but customizable, resembling those of early 1980s microprocessors, plus special purpose instructions, like WAIT.
     • Concept Programs (CHEMISTRY): the Program Generator makes short programs, 4-7 instructions long. A heuristic search, applying 30+ rules, reduces the space of 10^20 programs to ~10^8 useful ones.
     • Executable runtimes (LIFE): sorting out useless concepts/links continues for programs: reporting fatal errors (e.g. out of range), running out of resources (infinite loop), correct but rarely utilized, useless in terms of RL, those that just didn't have luck.

  9. Internal program/data structure
     All concepts share and exchange data with the same common type int[n], i.e. a vector (array) of n integers. The int → int16_t (16-bit) in the current implementation.
     Example: single visual pixel, 12 bytes
       index:   0     1     2     3   4   5
       content: 5     row   col   Y   U   V
                size  coordinates value
     int *prog(const int *src_1, ..., const int *src_N) {
         static int memory[size];
         // program code here
     }
     Each program consists of:
     • N>0 const input vectors (cannot be overwritten), length known at runtime
     • single output vector, predefined max length, actual length is returned
     • optional local static memory, known size, shared among all runtimes
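
The pixel example above can be checked with a few lines of Python. This is only a sketch of the layout on the slide: the function names are hypothetical, and little-endian byte order is an assumption, since the slide does not state how the six 16-bit values are serialized.

```python
import struct

def pack_pixel(row, col, y, u, v):
    """Serialize one pixel as six int16 values: [5, row, col, Y, U, V].

    The leading 5 is the size field, i.e. the number of elements that follow.
    Little-endian order ("<") is an assumption, not taken from the slides.
    """
    payload = [row, col, y, u, v]
    return struct.pack("<6h", len(payload), *payload)

def unpack_vector(raw):
    """Read the size field, then return that many int16 elements."""
    (size,) = struct.unpack_from("<h", raw, 0)
    return list(struct.unpack_from(f"<{size}h", raw, 2))

raw = pack_pixel(row=120, col=305, y=200, u=128, v=90)
print(len(raw))            # 12, matching the "12 bytes" on the slide
print(unpack_vector(raw))  # [120, 305, 200, 128, 90]
```

Carrying the length inside the vector is what lets every concept consume any other concept's output without knowing its meaning in advance.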

  10. VM Diagram
      [Diagram: CPU with accumulator (acc, int), index register (idx, int) and flags Z and M; input vector(s), output vector and local static memory, each drawn as a size field followed by ints.]
      Listing 1:
        0000 MOVI IDX, 0002
        0003 MOVX A, var1[idx]
        0006 SAVI [00],A
        0009 RET
      Listing 2:
        0000 MOV A, var1[00]
        0005 ADD A, var2[00]
        0010 APPEND, A
        0011 JZ 0005
        0014 RET
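
A toy interpreter may help to read the second listing. It is a sketch under stated assumptions, not the AGINAO VM: the instruction semantics are guessed from the mnemonics (MOV loads an input element, ADD adds and sets the flags, APPEND pushes the accumulator to the output, JZ jumps when the ZERO flag is set), jump targets are list indices rather than byte addresses, and `run`, `to_int16` and `OutOfResources` are hypothetical names.

```python
def to_int16(x):
    """Wrap an integer into the signed 16-bit range."""
    return (x + 0x8000) % 0x10000 - 0x8000

class OutOfResources(Exception):
    """Raised when a program uses up its instruction budget."""

def run(program, inputs, budget=100):
    """Execute a list of (opcode, *args) tuples; return the output vector."""
    acc, zero, minus = 0, False, False   # accumulator, ZERO flag, MINUS flag
    out, pc = [], 0
    while True:
        if budget == 0:
            raise OutOfResources("instruction budget exhausted")
        budget -= 1
        op, *args = program[pc]
        pc += 1
        if op == "MOV":                  # acc <- inputs[src][k]; flags untouched (assumed)
            src, k = args
            acc = inputs[src][k]
        elif op == "ADD":                # acc <- acc + inputs[src][k]; flags updated
            src, k = args
            acc = to_int16(acc + inputs[src][k])
            zero, minus = acc == 0, acc < 0
        elif op == "APPEND":             # push the accumulator to the output vector
            out.append(acc)
        elif op == "JZ":                 # jump if the ZERO flag is set
            (target,) = args
            if zero:
                pc = target
        elif op == "RET":
            return out
        else:
            raise ValueError(f"unknown opcode {op!r}")

# Listing 2, with indices in place of byte addresses.
PROGRAM = [("MOV", 0, 0), ("ADD", 1, 0), ("APPEND",), ("JZ", 1), ("RET",)]

print(run(PROGRAM, [[3], [4]]))          # [7]
try:
    run(PROGRAM, [[0], [0]], budget=50)  # the sum stays zero, so JZ loops forever
except OutOfResources as exc:
    print("discarded:", exc)
```

Under these assumed semantics the same five instructions either finish at once or loop forever depending on the data, which is the situation a per-runtime instruction budget is meant to catch.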

  11. Noteworthy controls of runtimes
      • Expiration time: a real-time value of millisecond resolution, typically 100 ms since the runtime creation time. When the time passes, the runtime passes away, no matter what state it is currently in.
      • Priority: an individually assigned real number determining the order of execution relative to other runtimes currently awaiting in the priority queue.
      • Resources: an individually assigned real number determining the amount of system resources a runtime may exhaust. Basically, it denotes the number of VM instructions a runtime may perform. Useful for overcoming the halting problem.
      • Status: the state a runtime is in: PENDING, CREATED, EXECUTED, SLEEP, TERMINATED, EXITED, DISCARD (discussed below).
      • Hash: calculated from the unique ID of the concept whose runtime is to be launched and the IDs of the runtime(s) of all input(s); used to avoid repeated computation of the same combination of data and program.
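
Three of these controls (priority, expiration time and hash) can be combined in a small scheduler. The sketch below is illustrative only: `Scheduler`, `submit` and `next_runtime` are hypothetical names, time is passed in explicitly as milliseconds, and the deduplication key is simply the tuple of concept ID and input runtime IDs rather than whatever hash function AGINAO actually uses.

```python
import heapq
import itertools

class Scheduler:
    """Priority queue of runtimes with expiration and duplicate suppression."""

    def __init__(self):
        self._heap = []
        self._seen = set()               # keys of combinations already requested
        self._tie = itertools.count()    # tie-breaker for equal priorities

    def submit(self, concept_id, input_runtime_ids, priority, now_ms, lifetime_ms=100):
        """Queue a runtime unless the same program/data combination was seen before."""
        key = (concept_id, tuple(input_runtime_ids))
        if key in self._seen:
            return False
        self._seen.add(key)
        expires_ms = now_ms + lifetime_ms
        # heapq is a min-heap, so the priority is negated to pop the highest first.
        heapq.heappush(self._heap, (-priority, next(self._tie), expires_ms, concept_id))
        return True

    def next_runtime(self, now_ms):
        """Return the highest-priority runtime that has not expired, or None."""
        while self._heap:
            _, _, expires_ms, concept_id = heapq.heappop(self._heap)
            if now_ms < expires_ms:
                return concept_id
        return None

s = Scheduler()
print(s.submit("A", [1], priority=0.2, now_ms=0))    # True
print(s.submit("B", [2], priority=0.9, now_ms=0))    # True
print(s.submit("A", [1], priority=0.5, now_ms=10))   # False: same concept and inputs
print(s.next_runtime(now_ms=50))                     # B
print(s.next_runtime(now_ms=150))                    # None: A expired at 100 ms
```

Expired entries are dropped lazily when they reach the top of the heap, which keeps submission cheap at the cost of stale entries lingering in memory.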

  12. Runtime life cycle and state transitions*
      • Pending: a runtime is requested but awaiting all inputs to be available. This is the case for multiple-input concepts only. 70% of runtimes time out before completion.
      • Created: ready to run and awaiting in a priority queue. 50% time out before ever having a chance to be executed. Single-input concepts start in this state directly.
      • Executed: will return success or error. Context switching may put the runtime in the priority queue again.
      • Sleep: if a temporal instruction is encountered, execution is suspended until the rise time passes.
      • Terminated: the program completed computation and returned an output vector. This thread is supposed to breed more processing.
      • Exited: the program completed computation without any error, but exited due to, e.g., not finding a match. Further processing for this thread is aborted.
      • Discard: the program reported a fatal error or exhausted all resources. Actual memory releasing is postponed until all dependent processes unlock the runtime.
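
The life cycle can be written down as a transition table. The slide names the states but not every edge, so the table below is one plausible reading rather than the documented design: in particular, treating a timeout as a move to DISCARD and waking a sleeping runtime back into the queue (CREATED) are assumptions, and `ALLOWED` and `advance` are hypothetical names.

```python
from enum import Enum

class Status(Enum):
    PENDING = "pending"
    CREATED = "created"
    EXECUTED = "executed"
    SLEEP = "sleep"
    TERMINATED = "terminated"
    EXITED = "exited"
    DISCARD = "discard"

# Assumed edges; only the state names come from the slide.
ALLOWED = {
    Status.PENDING: {Status.CREATED, Status.DISCARD},    # inputs arrived, or timed out
    Status.CREATED: {Status.EXECUTED, Status.DISCARD},   # scheduled, or timed out
    Status.EXECUTED: {
        Status.CREATED,      # context switch back into the priority queue
        Status.SLEEP,        # temporal instruction such as WAIT
        Status.TERMINATED,   # returned an output vector
        Status.EXITED,       # finished without error but without a match
        Status.DISCARD,      # fatal error, resources exhausted, or timed out
    },
    Status.SLEEP: {Status.CREATED, Status.DISCARD},      # woken up, or timed out
    Status.TERMINATED: set(),
    Status.EXITED: set(),
    Status.DISCARD: set(),
}

def advance(current, target):
    """Return the new status, or raise if the transition is not in the table."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target

state = Status.PENDING
for step in (Status.CREATED, Status.EXECUTED, Status.TERMINATED):
    state = advance(state, step)
print(state.name)  # TERMINATED
```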

  13. Natural environment parallel
      Parallels:
      • Concepts ↔ Species
      • Runtimes ↔ Individuals of species
      • Links ↔ Dependencies between species
      Non-parallels:
      • Concepts do not evolve, i.e. the program code doesn't change during the concept lifetime.
      • A species (concept) may exist even if currently no single individual (runtime) is living.
      • Concept program code and structure are reusable. Two separate concepts placed at different locations of the network may share exactly the same "genotype".

  14. Processing temporal patterns*
      Special purpose instructions:
      • WAIT: suspend execution for N milliseconds
      • DELAY: compute the difference (ms) between the creation time of the executed runtime and the creation time of one of the inputs
      Application example: visual movement detecting concept
      Pseudocode:
        DELAY pixel 1
        DELAY pixel 2
        DELAY pixel 3
        check condition
        TERMINATE if fulfilled
        EXIT otherwise
      [Diagram: pixel 1 at pos. x1,y1; pixel 2 at pos. x2,y2; pixel 3 at pos. x3,y3; match.]
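
The pseudocode leaves "check condition" open. The sketch below fills it with one possible condition, purely as an assumption: the three pixels must have changed in order (pixel 1 first) and within a bounded gap of each other. The names `delay_ms`, `detect_movement` and `max_gap_ms` are hypothetical.

```python
def delay_ms(runtime_created_ms, input_created_ms):
    """DELAY: age of an input relative to the creation time of the executed runtime."""
    return runtime_created_ms - input_created_ms

def detect_movement(runtime_created_ms, pixel_created_ms, max_gap_ms=50):
    """Return "TERMINATE" if the pixels changed in sequence, "EXIT" otherwise.

    Assumed condition: pixel 1 is the oldest input, pixel 3 the newest, and
    consecutive changes are at most max_gap_ms apart.
    """
    delays = [delay_ms(runtime_created_ms, t) for t in pixel_created_ms]
    pairs = list(zip(delays, delays[1:]))
    in_order = all(older > newer for older, newer in pairs)
    close_enough = all(older - newer <= max_gap_ms for older, newer in pairs)
    return "TERMINATE" if in_order and close_enough else "EXIT"

print(detect_movement(100, [40, 60, 80]))   # TERMINATE: delays 60, 40, 20
print(detect_movement(100, [80, 60, 40]))   # EXIT: changes arrived in reverse order
```

Because only creation times are compared, the same program shape would match a temporal pattern in any modality, not just vision.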

  15. Vision processing example
      • A new image arrives from the robot to the host.
      • The new image is compared with the currently stored one.
      • If the pixel difference in YUV exceeds a predefined threshold, a vector is created: 5 row col Y U V.
      • An atomic sensory runtime is created, directly in TERMINATED state, with the pixel vector as output value.
      • The runtime a priori priority is set to the amplitude in luminance: Priority = abs(Yt1-Yt0).
      • The runtime is placed in a priority queue.
      [Diagram: runtime nodes holding int vectors.]
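
The steps above can be sketched as a frame-differencing loop. The slide does not say how the YUV difference is measured, so the sum of absolute channel differences used here is an assumption, as are the threshold value and the names `changed_pixels` and `queue`.

```python
import heapq

def changed_pixels(prev_frame, curr_frame, threshold=30):
    """Yield (priority, vector) for every pixel whose YUV change exceeds the threshold.

    Frames are lists of rows of (Y, U, V) tuples. The difference metric
    (sum of absolute channel differences) is an assumption.
    """
    for row, (prev_row, curr_row) in enumerate(zip(prev_frame, curr_frame)):
        for col, ((y0, u0, v0), (y1, u1, v1)) in enumerate(zip(prev_row, curr_row)):
            diff = abs(y1 - y0) + abs(u1 - u0) + abs(v1 - v0)
            if diff > threshold:
                priority = abs(y1 - y0)              # Priority = abs(Yt1-Yt0)
                yield priority, [5, row, col, y1, u1, v1]

prev_frame = [[(100, 128, 128), (50, 128, 128)]]
curr_frame = [[(100, 128, 128), (120, 128, 128)]]

queue = []
for priority, vector in changed_pixels(prev_frame, curr_frame):
    # Negated priority: heapq pops the smallest item, and the largest change should go first.
    heapq.heappush(queue, (-priority, vector))

print(queue)  # [(-70, [5, 0, 1, 120, 128, 128])]
```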

  16. Concept integration & pattern matching*
      [Diagram: concept nodes and runtime nodes connected in steps numbered 1 to 4.]

  17. Exploration/Exploitation & Intrinsic motivation
      Exploration → adding a new action (link and concept)
      P(Exploration) ~ Rconst / (Rconst + ∑Ri)
      Temporal difference learning
      No distinct consecutive states → learning may be performed concurrently
      Immediate reward
      [Diagram: concept nodes.]
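
The exploration rule reads naturally as a small function. This is a sketch assuming the garbled formula on the slide is the fraction Rconst / (Rconst + ∑Ri), with Ri taken to be the non-negative rewards of the links already attached to a concept; `exploration_probability` and `should_explore` are hypothetical names, and the slide states proportionality only, so using the ratio directly as a probability is a further assumption.

```python
import random

def exploration_probability(r_const, link_rewards):
    """Ratio R_const / (R_const + sum of R_i), used here directly as a probability."""
    return r_const / (r_const + sum(link_rewards))

def should_explore(r_const, link_rewards, rng=random):
    """Decide whether to add a new action (link and concept) instead of reusing one."""
    return rng.random() < exploration_probability(r_const, link_rewards)

print(exploration_probability(1.0, []))          # 1.0: a concept with no links always explores
print(exploration_probability(1.0, [1.0, 2.0]))  # 0.25: rewarding links reduce exploration
```

With this form, a concept whose existing actions already collect a lot of reward rarely spawns new ones, while an unrewarded concept keeps trying.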

  18. Immediate reward as intrinsic reward
      • immediate reward: independently calculated for each link/action
      • rewarding concept
      • non-rewarding concept: justified by cumulative reward
      Program outline: ... JMP COND, then EXIT (negative) or RET (positive)
      State space partitioning:
        p = Npos / (Npos + Nneg)
        r = -log2(p)
        ravg. = -p*log2(p)
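
A short numeric check of the three formulas, assuming Npos counts runs that ended in RET (positive) and Nneg counts runs that ended in EXIT (negative). The function name `immediate_reward` is hypothetical.

```python
import math

def immediate_reward(n_pos, n_neg):
    """Return (p, r, r_avg) for a concept with n_pos positive and n_neg negative exits."""
    if n_pos <= 0:
        raise ValueError("n_pos must be positive, otherwise log2(p) is undefined")
    p = n_pos / (n_pos + n_neg)
    r = -math.log2(p)        # reward for a single positive outcome, in bits
    r_avg = p * r            # equals -p*log2(p), the average reward per run
    return p, r, r_avg

print(immediate_reward(1, 3))   # (0.25, 2.0, 0.5)
print(immediate_reward(4, 0))   # p = 1: a concept that always matches earns no reward
```

A match that happens one time in four is worth 2 bits, but averaged over all runs it contributes 0.5 bits. The average -p*log2(p) vanishes at p = 1 and as p approaches 0, and peaks at p = 1/e, so concepts that partition the state space neither trivially nor too rarely are favored.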

  19. Comparison with other approaches
      Levin Search and similar attempts (naïve approach)
      • intractable, limited distinction between data and program, no continuous operation, etc.
      • OOPS (Schmidhuber) incremental learning: requires a list of problems. AGINAO approaches problems in the natural order of their complexity. PUnS (Schaul, Schmidhuber), Frontier Search (Sun, Schmidhuber), Self-improving program search (Kaiser).
      • Gödel machine (Schmidhuber): the other extreme. Is AGINAO an approximation?
      Evolutionary/Genetic programming
      • AGINAO does not match the definition: no fitness function, no population, no generations/mutations, etc. Extensions: ADF, reusable programs (Koza).
      • Evolutionary methods are not applicable for interaction with the environment (Sutton).
      Self-programming for the AGI: distinction from adaptivity/learning
      • must have a Turing-like machine built in on top, or its source code must be modifiable
      • programs must be automatically/randomly generated rather than human crafted
      • do not match criteria: LIDA (Franklin), HTM (Hawkins), NARS? (Wang), Soar (Laird)
      • match: Novamente (Goertzel), MOSES, automated program learning (Looks), VARIAC (Hall), Ikon Flux (Nivel, Thorisson)

  20. Summary: unique properties of AGINAO architecture
      • Robotic/embodied approach in a natural environment that constitutes optimally ordered training examples. No dealing with toy problems.
      • Temporal aspect of the concepts naturally embedded in the cognitive architecture. Spatial and temporal patterns are virtually indistinguishable by the learning engine.
      • Built-in processes of artificial economics, to cope with the danger of combinatorial explosion, to list: priority queue, expiration time, resource management.
      • Two-step self-programming: at the machine code level and at the concept level, for greater flexibility.
      • No explicit fitness measure. The learning is supposed to be based purely on information theory.
      • Instead of trying to learn how to solve a given problem, the system solves what can easily be learned. Only on the foundation of easily learned concepts would it proceed to the more challenging ones, eventually to encounter and attack the problems the designers would like it to solve.
