Sandbox Learning: Try without error?
Prof. Dr.-Ing. C. Müller-Schloer, Universität Hannover

Institut für Systems Engineering – System- und Rechnerarchitektur
Appelstraße 4, 30159 Hannover
cms@sra.uni-hannover.de
+49 (0)511 762 19730
Based on joint work with Hartmut Schmeck, University of Karlsruhe, and Theo Ungerer, University of Augsburg
Team: Jörg Hähner, Holger Prothmann, Fabian Rochner, Sven Tomforde

Outline
• Online learning and errors
• A first solution
• Organic Traffic Control
• Organic Network Control
• Open questions

Learning
Learning
• Observation of the world, update of a world model
• Acting in the world: try and error
• Reinforcement learning: reward/penalty assigned to an action → influences future decisions
• But: immediate real-world effects
Nature
• Collective level (genotype)
  • 4 bn. years
  • Huge populations
  • Redundancy (neglect of the individual)
• Individual level (phenotype)
  • Modification of behavior or preferences based on experience (try) …
  • … as long as the individual survives.

Learning in technical systems
Requirements
• Immediate reaction (even if sub-optimal)
• Guaranteed prevention of illegal actions (e.g. 4-way green)
• Adaptation and long-term improvement
• How long is long-term? Learning speed!
Example
• Learning traffic light controller
• Genetic algorithm with selection based on real-world evaluation
• Number of tries until a reasonable solution is found: 1000
• Assessment time constant (traffic): 15 minutes
• → 999 unsuitable tries
• → about 10 days
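The estimate in the example can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch that assumes the tries are evaluated one after another on the real junction; the variable names are illustrative.

```python
# Figures taken from the slide.
tries = 1000           # tries until a reasonable solution is found
minutes_per_try = 15   # assessment time constant for traffic

# Assumption: tries are evaluated sequentially in the real world.
total_minutes = tries * minutes_per_try
total_days = total_minutes / (60 * 24)
unsuitable_tries = tries - 1  # every try before the reasonable one

print(f"{total_minutes} minutes = {total_days:.1f} days")
print(f"{unsuitable_tries} unsuitable tries exposed to real traffic")
```

The 15 000 minutes come to a little more than ten days, during which real traffic is controlled by unsuitable settings.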

Generic 3-level architecture
User
• Definition of system objectives (LoS, …)
Layer 2
• Sandbox: off-line parameter optimization
• Evolutionary Algorithm (EA)
• Simulation-based evaluation
• Only legal parameter sets are sent to layer 1
Layer 1
• Immediate reaction
• Observer: situation classification
• Selection from legal parameter sets
• Might be suboptimal
SuOC (System under Observation/Control)
• Real world
• Sensors
• Actuators
[Diagram labels: User, objectives; Layer 2: Simulator, EA; Layer 1: Observer, LCS; SuOC: productive system, detector data, actuator settings]
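The division of labor between the two layers can be sketched in a few lines. This is a toy illustration, not the architecture's actual implementation: the fixed cycle length, the minimum green time, the cost function in `simulate` and all names are assumptions made for the example. Layer 2 evolves candidates against the stand-in simulator only, and layer 1 accepts only legal parameter sets and always answers immediately.

```python
import random

CYCLE = 60      # assumed fixed cycle length in seconds
MIN_GREEN = 10  # assumed minimum green time per direction


def is_legal(green_a):
    """Hypothetical legality check: both directions get a minimum green."""
    return MIN_GREEN <= green_a <= CYCLE - MIN_GREEN


def simulate(green_a, flows):
    """Toy stand-in for the simulator: flow-weighted squared red time."""
    flow_a, flow_b = flows
    return flow_a * (CYCLE - green_a) ** 2 + flow_b * green_a ** 2


def sandbox_optimize(flows, rng, generations=30, offspring=8):
    """Layer 2: elitist evolutionary search, evaluated in simulation only."""
    best = CYCLE // 2
    for _ in range(generations):
        candidates = [best + rng.randint(-5, 5) for _ in range(offspring)]
        legal = [c for c in candidates if is_legal(c)]
        best = min(legal + [best], key=lambda g: simulate(g, flows))
    return best


class Layer1:
    """Layer 1: immediate reaction using legal parameter sets only."""

    def __init__(self, default_green=CYCLE // 2):
        self.rules = {}
        self.default = default_green

    def classify(self, flows):
        # Observer: coarse situation classes (buckets of 200 vehicles/h).
        return tuple(f // 200 for f in flows)

    def react(self, flows):
        # Always answers at once; the answer may be suboptimal.
        return self.rules.get(self.classify(flows), self.default)

    def install(self, flows, green_a):
        if not is_legal(green_a):
            raise ValueError("illegal parameter set rejected")
        self.rules[self.classify(flows)] = green_a


rng = random.Random(0)
layer1 = Layer1()
flows = (900, 300)
first = layer1.react(flows)  # immediate default reaction
layer1.install(flows, sandbox_optimize(flows, rng))
later = layer1.react(flows)  # parameter set found in the sandbox
print(first, later)
```

In this sketch the real system never sees an unevaluated or illegal candidate: all trial and error happens inside `sandbox_optimize`.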

Example 1: Organic Traffic Control
Goals
• Network of adaptive learning traffic light controllers (TLCs).
• TLCs learn with some limited sensory horizon.
• TLCs cooperate to achieve a global goal (e.g. reduced avg. travel time).
• Explore possibilities/limitations of decentralized control systems.
Phase 1
• Single, isolated junction
Phase 2
• Collaborating TLCs
• Progressive signals (green wave, "Grüne Welle")

Traffic Control Architecture
User
• Definition of system objectives (LoS, …)
Layer 2
• Off-line parameter optimisation
• Evolutionary Algorithm (EA) evolves TLC parameters
• Simulation-based evaluation (AIMSUN)
Layer 1
• On-line parameter selection
• Observer monitors traffic
• Learning Classifier System (LCS) selects TLC parameters and learns rule quality
SuOC (System under Observation/Control)
• Control of traffic signals
• Industry-standard TLC
  • Fixed-time
  • Traffic-responsive
• Parameters determine performance
[Diagram labels: User, objectives; Layer 2: Simulator, EA; Layer 1: Observer, LCS; SuOC: Traffic Light Controller (TLC), detector data, signal settings]
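How an LCS on layer 1 might select parameters and learn rule quality can be sketched as follows. The slide does not specify the update rule; this sketch assumes the Widrow-Hoff update commonly used in XCS-type classifier systems, greedy selection without exploration, and hypothetical rule conditions, action names and reward values.

```python
from dataclasses import dataclass

BETA = 0.2  # assumed learning rate


@dataclass
class Rule:
    low: int                  # condition: lower bound of traffic flow (veh/h)
    high: int                 # condition: upper bound (exclusive)
    action: str               # name of a TLC parameter set (hypothetical)
    prediction: float = 0.0   # learned estimate of the rule's quality

    def matches(self, flow):
        return self.low <= flow < self.high


def select(rules, flow):
    """Pick the matching rule with the highest predicted quality."""
    matching = [r for r in rules if r.matches(flow)]
    return max(matching, key=lambda r: r.prediction) if matching else None


def update(rule, reward):
    """Widrow-Hoff update: move the prediction towards the observed reward."""
    rule.prediction += BETA * (reward - rule.prediction)


rules = [
    Rule(0, 500, "short_cycle", 10.0),
    Rule(0, 500, "long_cycle", 12.0),
    Rule(500, 2000, "long_cycle", 10.0),
]

chosen = select(rules, 300)
before = chosen.action
update(chosen, 0.0)  # the chosen parameter set performed poorly
after = select(rules, 300).action
print(before, after)
```

After one poor reward the prediction of the chosen rule drops below that of its competitor, so the next selection in the same situation switches to the other parameter set.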

Hamburg

OTC: Performance
[Figure: OTC performance during three consecutive days, compared with a manually designed reference]

Example 2: Organic Network Control
• Organic Control of data communication networks
• Control and management of network protocol clients in data communication networks
• Autonomous control system for each network entity
• Collaboration between neighbouring network entities

ONC: Motivation
• Network protocol configuration is static
• Goal: dynamic adaptation of network protocol parameter settings to a changing environment
• Client acts within large computer networks
  • Current network status has influence on the performance of the network protocol.
• Computer is used for different tasks simultaneously
  • Current usage of system resources has influence on the performance of the network protocol.

ONC: BitTorrent
• Current focus: BitTorrent (1)
• Tracker responsible for meeting of peers
• Fairness-based distribution
• Files are split into smaller parts ("chunks")
• Variable parameters (most important ones):
  • Delays
  • Intervals (choking, …)
  • Number of peers (minimum, maximum, initially from tracker, etc.)
  • Number of open connections
  • Chunk size
[Diagram labels: File, Chunk]
(1) Bram Cohen: "Incentives Build Robustness in BitTorrent", Proc. 1st Workshop on Economics of Peer-to-Peer Systems, Berkeley, 2003.
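The variable parameters listed above could be bundled into one parameter set that an optimizer hands to the client. The following is a minimal sketch under stated assumptions: the field names, the legality constraints and all numeric values are placeholders chosen for illustration and are not taken from the slides or from any particular BitTorrent client.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ProtocolConfig:
    """Hypothetical parameter set for a BitTorrent-like client."""
    choke_interval_s: float     # interval between choking decisions
    min_peers: int              # minimum number of peers
    max_peers: int              # maximum number of peers
    peers_from_tracker: int     # peers requested initially from the tracker
    max_open_connections: int   # number of open connections
    chunk_size_kib: int         # chunk size

    def is_legal(self):
        # Assumed constraints; a real client would define its own.
        return (
            self.choke_interval_s > 0
            and 0 < self.min_peers <= self.max_peers
            and 0 < self.peers_from_tracker <= self.max_peers
            and self.max_open_connections > 0
            and self.chunk_size_kib > 0
        )


# Placeholder values for illustration only.
cfg = ProtocolConfig(
    choke_interval_s=10.0,
    min_peers=20,
    max_peers=55,
    peers_from_tracker=50,
    max_open_connections=40,
    chunk_size_kib=256,
)
variant = replace(cfg, max_peers=10)  # violates min_peers <= max_peers

print(cfg.is_legal(), variant.is_legal())
```

Making the configuration immutable and checking legality in one place mirrors the requirement that only legal parameter sets reach the productive system.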

ONC architecture
User interface
• User defines system objectives
• E.g. download rate for BitTorrent or coverage rate for MANETs
Level 2
• Extend behavioral repertoire of level 1
• Off-line learning (protocol parameter sets)
Level 1
• Adapt SuOC parameters (rules)
• On-line learning (rule fitness)
SuOC
• System under Observation and Control
• Network protocol client
• E.g. BitTorrent client
[Diagram labels: objectives (download rate, etc.); Level 2: Simulator, Observer, EA; Level 1: Observer, LCS; SuOC: network protocol client, network data, protocol configuration]

ONC Evaluation: Off-line (1)
• Off-line optimisation: influence of number of peers

ONC Evaluation: On-line (2)
• Adaptation to background client usage profile

Open questions, future work (1/2)
Incongruent model
• Model adjustment
Abstraction of non-local environment
• Influence of neighboring nodes?
Verification
• Optimized parameter sets could be verified before they are installed on layer 1
State-less behavior → Multi-step LCS
• LCSs are stateless (stimulus – response)
• Learning of action sequences?
[Diagram labels: objectives; Layer 2: Simulator, EA; Layer 1: Observer, LCS; productive system]

Open questions, future work (2/2)
• So far: simulation of the local neighborhood with assumptions about the behavior of other nodes.
Communication between nodes
• Level 1: Increase learning performance by exchange of learnt rule sets: rule generalization?
• Level 2: Exchange of populations → distributed EA
Parallel "sandbox" world on layer 2
• Network-wide distributed simulation: synchronization? Convergence?
• Influence on real world?
• Analogy from human society: social discourse
[Diagram labels: two nodes, each with Layer 2 (Simulator, EA), Layer 1 (Observer, LCS) and a productive system]
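One common way to realize an exchange of populations between nodes is an island model with ring migration. The slides leave the mechanism open, so the following is only an illustrative sketch: the ring topology, the replace-the-worst policy and the function name `migrate` are assumptions.

```python
def migrate(islands, fitness, k=1):
    """Ring migration: each island's k best individuals replace the
    k worst individuals of the next island (higher fitness is better)."""
    n = len(islands)
    # Collect emigrants first so that migration is simultaneous.
    emigrants = [sorted(pop, key=fitness, reverse=True)[:k] for pop in islands]
    result = []
    for i, pop in enumerate(islands):
        survivors = sorted(pop, key=fitness, reverse=True)[:len(pop) - k]
        result.append(survivors + emigrants[(i - 1) % n])
    return result


# Toy populations: individuals are numbers, and fitness is the number itself.
islands = [[1, 2, 3], [10, 20, 30]]
after = migrate(islands, fitness=lambda x: x)
print(after)
```

Each node keeps running its own EA between migrations; how often to migrate and how to keep the distributed sandboxes synchronized are exactly the open questions raised on the slide.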

Thank you for your attention!
