Sandbox Learning: Try without error? Prof. Dr.-Ing. C. Müller-Schloer, Universität Hannover


Presentation Transcript

Sandbox Learning: Try without error?

Prof. Dr.-Ing. C. Müller-Schloer

Universität Hannover

Institut für Systems Engineering –

System- und Rechnerarchitektur

Appelstraße 4

30159 Hannover

[email protected]

+49 (0)511 762 19730

Based on joint work with Hartmut Schmeck, University of Karlsruhe, and Theo Ungerer, University of Augsburg

Team: Jörg Hähner, Holger Prothmann, Fabian Rochner, Sven Tomforde

Outline
  • Online learning and errors
  • A first solution
    • Organic Traffic Control
    • Organic Network Control
  • Open questions
Learning

Learning

  • Observation of the world, update of a world model
  • Acting in the world: Try & error
    • Reinforcement learning: Reward/penalty assigned to action → influences future decisions
    • But: Immediate real-world effects

Nature

  • Collective level (genotype)
    • 4 bn. years
    • Huge populations
    • Redundancy (neglect of the individual)
  • Individual level (phenotype)
    • Modification of behavior or preferences based on experience (try) …
    • … as long as the individual survives.
Learning in technical systems

Requirements

  • Immediate reaction (even if sub-optimal)
  • Guaranteed prevention of illegal actions (4-way green)
  • Adaptation and long-term improvement
  • How long is long-term? Learning speed!!

Example

  • Learning traffic light controller
    • Genetic algorithm with selection based on real-world evaluation
    • # tries until reasonable solution: 1000
    • Assessment time constant (traffic): 15 minutes
    • → 999 unsuitable tries
    • → 10 days
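
The figures in the example can be checked with a few lines of arithmetic. The sketch below only multiplies the two numbers given on the slide; the function name is chosen here for illustration.

```python
def learning_time_days(tries: int, minutes_per_try: float) -> float:
    """Wall-clock days needed when every try is evaluated in the real world."""
    return tries * minutes_per_try / (60 * 24)


# Values from the slide: 1000 tries, 15 minutes of traffic per assessment.
days = learning_time_days(tries=1000, minutes_per_try=15)
print(f"{days:.1f} days")  # 15000 minutes, i.e. 10.4 days
```

The result is slightly above the rounded 10 days quoted on the slide, and every one of the unsuitable tries would run on a real junction during that time.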
Generic 3-level architecture

User

  • Definition of system objectives (LoS, …)

Layer 2 (Simulator, EA)

  • Sandbox: Off-line parameter optimization
  • Evolutionary Algorithm (EA)
  • Simulation-based evaluation
  • Only legal parameter sets sent to level 1

Layer 1 (Observer, LCS)

  • Immediate reaction
  • Observer: Situation classification
  • Selection from legal parameter sets
    • might be suboptimal

SuOC: System under Observation/Control (productive system)

  • Real world
  • Sensors
  • Actuators

[Figure: block diagram with User, Layer 2 (Simulator, EA), Layer 1 (Observer, LCS) and SuOC (productive system); arrows labelled objectives, detector data and actuator settings.]
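
The division of labour between the two layers can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' implementation: the legality bounds, the cost function standing in for the simulator, and all names (`is_legal`, `simulate`, `layer2_optimize`, `Layer1`) are hypothetical.

```python
import random


def is_legal(params):
    # Hypothetical legality check: every value must stay within fixed bounds.
    return all(5 <= g <= 60 for g in params)


def simulate(params, situation):
    # Toy stand-in for a simulator; lower cost is better.
    return sum((g - situation) ** 2 for g in params)


def layer2_optimize(situation, rng, generations=30, pop_size=20):
    """Off-line EA in the sandbox; returns the best legal parameter set."""
    pop = [[rng.uniform(5, 60) for _ in range(2)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda p: simulate(p, situation))
        parents = pop[: pop_size // 2]
        children = []
        for p in parents:
            child = [g + rng.gauss(0, 2) for g in p]
            # Illegal children never leave the sandbox; keep a parent copy.
            children.append(child if is_legal(child) else list(p))
        pop = parents + children
    return min(pop, key=lambda p: simulate(p, situation))


class Layer1:
    """On-line selection from parameter sets already approved by layer 2."""

    def __init__(self, default):
        self.rules = {}          # situation -> legal parameter set
        self.default = default   # used until layer 2 has delivered anything

    def add_rule(self, situation, params):
        if is_legal(params):     # second guard: illegal sets are rejected
            self.rules[situation] = params

    def react(self, situation):
        # Immediate reaction: nearest known situation, possibly suboptimal.
        if not self.rules:
            return self.default
        nearest = min(self.rules, key=lambda s: abs(s - situation))
        return self.rules[nearest]
```

Layer 1 never waits for the optimizer: `react` always returns at once, and the trial and error happens only inside `layer2_optimize`.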

Example 1: Organic Traffic Control

Goals

  • Network of adaptive learning traffic light controllers (TLCs).
  • TLCs learn with some limited sensory horizon.
  • TLCs cooperate to achieve a global goal (e.g. reduced avg. travel time).
  • Explore possibilities/limitations of decentralized control systems.

Phase 1

    • Single, isolated junction

Phase 2

    • Collaborating TLCs
    • Progressive signals ("Grüne Welle", green wave)
Traffic Control Architecture

User

  • Definition of system objectives (LOS, …)

Layer 2 (Simulator, EA)

  • Off-line parameter optimisation
  • Evolutionary Algorithm (EA) evolves TLC parameters
  • Simulation-based evaluation (AIMSUN)

Layer 1 (Observer, LCS)

  • On-line parameter selection
  • Observer monitors traffic
  • Learning Classifier System (LCS) selects TLC parameters and learns rule quality

SuOC: System under Observation/Control (Traffic Light Controller, TLC)

  • Control of traffic signals
  • Industry-standard TLC
    • Fixed-time
    • Traffic-responsive
    • Parameters determine performance

[Figure: block diagram with User, Layer 2 (Simulator, EA), Layer 1 (Observer, LCS) and SuOC (Traffic Light Controller); arrows labelled objectives, detector data and signal settings.]
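
How a classifier system might select TLC parameters and learn rule quality can be sketched as follows. The slides do not state the update rule; the Widrow-Hoff style update and epsilon-greedy selection used here are common in XCS-like systems and are assumptions, as are all names and the interval-based rule condition.

```python
import random


class Rule:
    def __init__(self, low, high, params):
        self.low, self.high = low, high  # condition: traffic-flow interval
        self.params = params             # action: hypothetical TLC parameter set
        self.prediction = 0.0            # learned estimate of the reward

    def matches(self, flow):
        return self.low <= flow <= self.high


def select(rules, flow, rng, epsilon=0.1):
    """Pick a matching rule: mostly the best one, sometimes a random one."""
    match_set = [r for r in rules if r.matches(flow)]
    if not match_set:
        return None                      # no rule covers this situation
    if rng.random() < epsilon:
        return rng.choice(match_set)
    return max(match_set, key=lambda r: r.prediction)


def update(rule, reward, beta=0.2):
    """Move the rule's prediction a fraction beta towards the observed reward."""
    rule.prediction += beta * (reward - rule.prediction)
```

In this sketch every rule offered to `select` is assumed to come from layer 2, so exploration only chooses among parameter sets that have already passed the simulation.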

OTC: Performance

[Figure: OTC performance during three consecutive days; legend entry: manually designed reference.]

Example 2: Organic Network Control
  • Organic Control of Data Communication Networks
  • Control and management of network protocol clients in data communication networks
  • Autonomous control system for each network entity
  • Collaboration between neighbouring network entities
ONC: Motivation
  • Network protocol configuration is static
    • Goal: dynamic adaptation of network protocol parameter settings to a changing environment
  • Client acts within large computer networks
    • Current network status influences the performance of the network protocol.
  • Computer is used for different tasks simultaneously
    • Current usage of system resources influences the performance of the network protocol.
ONC: BitTorrent
  • Current focus: BitTorrent (1)
    • Tracker responsible for meeting of peers
    • Fairness-based distribution
    • Files are split into smaller parts ("chunks")
  • Variable parameters (most important ones):
    • Delays
    • Intervals (choking, …)
    • Number of peers (minimum, maximum, initially from tracker, etc.)
    • Number of open connections
    • Chunk size

[Figure: a file split into chunks.]

(1) "Incentives Build Robustness in BitTorrent": Bram Cohen, Proc. 1st Workshop on Economics of Peer-to-Peer Systems, Berkeley 2003.
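
The variable parameters listed above could be bundled into one parameter set that an optimizer evolves and a legality check filters. The sketch below is purely illustrative: field names, units and the legality conditions are assumptions and do not correspond to the options of any particular BitTorrent client.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BitTorrentParams:
    """Hypothetical parameter set for a BitTorrent client."""
    choke_interval_s: float      # interval between choking decisions
    min_peers: int
    max_peers: int
    max_open_connections: int
    chunk_size_kib: int

    def is_legal(self) -> bool:
        # Assumed sanity conditions; a real client would impose its own.
        return (
            self.choke_interval_s > 0
            and 0 < self.min_peers <= self.max_peers
            and self.max_open_connections > 0
            and self.chunk_size_kib > 0
        )


# Illustrative values only.
candidate = BitTorrentParams(10.0, 20, 50, 40, 256)
print(candidate.is_legal())
```

Making the set immutable (`frozen=True`) means a configuration that has passed the check cannot be altered afterwards.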

ONC architecture

User interface

  • User defines system objectives
    • E.g. download-rate for BitTorrent or coverage-rate for MANETs

Level 2 (Simulator, Observer, EA)

  • Extend behavioral repertoire of level 1
  • Off-line learning (protocol parameter sets)

Level 1 (Observer, LCS)

  • Adapt SuOC-parameters (rules)
  • On-line learning (rule fitness)

SuOC: System under Observation and Control

  • Network protocol client
  • E.g. BitTorrent Client

[Figure: block diagram with Level 2 (Simulator, Observer, EA), Level 1 (Observer, LCS) and the network protocol client; arrows labelled objectives (download-rate, etc.), network data and protocol configuration.]


Evaluation: Off-line (1)
  • Off-line optimisation: influence of number of peers
ONC Evaluation: On-line (2)
  • Adaptation to background client usage profile
Open questions, future work (1/2)

Incongruent model

  • Model adjustment

Abstraction of non-local environment

  • Influence of neighboring nodes?

Verification

  • Optimized parameter sets could be verified before being implemented in layer 1

State-less behavior → Multi-step LCS

  • LCSs are stateless (stimulus – response)
  • Learning of action sequences?

[Figure: architecture diagram with objectives, Layer 2 (Simulator, EA), Layer 1 (Observer, LCS) and the productive system.]

Open questions, future work (2/2)
  • So far: Simulation of local neighborhood with assumptions about the behavior of other nodes.

Communication between nodes

  • Level 1: Increase learning performance by exchange of learnt rule sets: Rule generalization?
  • Level 2: Exchange of populations → distributed EA
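
One common way to realise an exchange of populations is the island model with ring migration. The slides leave the scheme open, so the sketch below is an assumption rather than the proposed design; the function name and the choice of replacing each island's worst individuals are illustrative.

```python
def migrate(islands, fitness, k=1):
    """Ring migration between EA populations (higher fitness is better).

    Each island sends copies of its k best individuals to the next island,
    where they replace that island's k worst individuals.
    """
    best = [sorted(pop, key=fitness, reverse=True)[:k] for pop in islands]
    result = []
    for i, pop in enumerate(islands):
        incoming = best[(i - 1) % len(islands)]  # from the previous island
        survivors = sorted(pop, key=fitness, reverse=True)[: len(pop) - k]
        result.append(survivors + list(incoming))
    return result


# Two toy populations of numbers; fitness is the value itself.
print(migrate([[1, 2, 3], [4, 5, 6]], fitness=lambda x: x))
# → [[3, 2, 6], [6, 5, 3]]
```

Here each node would run its own EA between migrations, so only a few individuals cross the network per exchange.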

Parallel “sandbox” world on layer 2

  • Network-wide distributed simulation: Synchronization? Convergence?
  • Influence on real world?
  • Analogy from human society: social discourse

[Figure: two nodes side by side, each with Layer 2 (Simulator, EA), Layer 1 (Observer, LCS) and a productive system.]
