Class Project

  • Due at the end of finals week

  • Essentially anything you want, so long as it’s AI-related and I approve

  • Any programming language you want

  • In pairs or individually

  • Email me by Wednesday, November 3


Projects

  • Implementing k-NN to Classify Bedform Stability Fields

  • Blackjack Using Genetic Algorithms

  • Computer game players: Go, Checkers, Connect Four, Chess, Poker

  • Computer puzzle solvers: Minesweeper, mazes

  • Pac-Man with intelligent monsters

  • Genetic algorithms:

    • blackjack strategy

  • Automated 20-questions player

  • Paper on planning

  • Neural network spam filter

  • Learning neural networks via GAs


Projects (continued)

  • Training neural networks via backprop

  • Code decryptor using GAs

  • Box pushing agent (competing against an opponent)


What didn’t work as well

  • Games that were too complicated: Risk, Yahtzee, Chess, Scrabble, Battle Simulation

    • Students got too focused on making the game work

    • I sometimes had trouble running the game

    • The game was often incomplete

    • There wasn’t time to do enough AI

  • Problems that were too vague

    • Simulated ant colonies / genetic algorithms

    • Bugs swarming for heat (emergent intelligence never happened)

    • Finding paths through snow

  • AdaBoost on protein folding data

    • Couldn’t get boosting working correctly; more time was needed with small datasets (lots of time went into parsing the protein data)


Reinforcement Learning

  • Game playing: So far, we have told the agent the value of a given board position.

  • How can the agent learn which positions are important?

    • Play a whole bunch of games, and receive a reward at the end (positive or negative)

    • How do we determine the utility of states that aren’t terminal states?


The setup: Possible game states

  • Terminal states have a reward

  • Mission: estimate the utility of all possible game states


What is a state?

  • For chess: a state is the combination of your position on the board and the locations of your opponent’s pieces

    • Half of your transitions are controlled by you (your moves)

    • The other half of your transitions are probabilistic (they depend on the opponent)

  • For now, we assume all moves are probabilistic (with unknown probabilities)


Passive Learning

  • The agent learns by “watching”

  • There is a fixed probability of moving from one state to another


Sample Results

(The figure of sample results from the original slide is not preserved in this transcript.)


Technique #1: Naive Updating

  • Also known as the Least Mean Squares (LMS) approach

  • Starting at home, obtain a sequence of states ending in a terminal state

  • Utility of the terminal state = its reward

  • Loop back over all the other states in the sequence

    • Utility of state i = running average of the final rewards of all games that passed through state i
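
In code, this naive-updating loop might look like the following minimal sketch (assuming each finished game is available as a list of visited states plus a single terminal reward; all names here are hypothetical):

    # Naive (LMS) utility estimation: average the observed final rewards
    # over every state visited during each game.
    from collections import defaultdict

    def naive_update(episodes):
        totals = defaultdict(float)  # sum of final rewards per state
        counts = defaultdict(int)    # number of games through each state
        for states, reward in episodes:
            for s in set(states):
                totals[s] += reward
                counts[s] += 1
        # utility estimate = running average of observed rewards
        return {s: totals[s] / counts[s] for s in totals}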


Naive Updating Analysis

  • Works, but converges slowly

    • Must play lots of games

  • Ignores the fact that the utility of a state should depend on its successors


Technique #2: Adaptive Dynamic Programming

  • The utility of a state depends entirely on its successor states

    • If a state has one successor, its utility should be the same as the successor’s

    • If a state has multiple successors, its utility should be the expected value of the successors’ utilities


Finding the utilities

  • To find all the utilities, just solve the equations (written out below)

  • It’s a set of linear equations, and therefore solvable

  • The solution changes each iteration as you learn the transition probabilities

  • Completely intractable for large problems:

    • For a real game, it would mean finding the actual utilities of all states
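
The equations referred to above, in the standard passive-ADP form this slide describes, are U(i) = R(i) + Σ_j M_ij U(j), where M_ij is the probability of moving from state i to state j. A minimal sketch of solving them, assuming M and R have already been estimated as numpy arrays (all names hypothetical):

    # Passive ADP: U = R + M·U rearranges to (I - M)·U = R.
    # Rows of M for terminal states are assumed all zero, so a
    # terminal state's utility is simply its reward.
    import numpy as np

    def solve_utilities(M, R):
        n = len(R)
        return np.linalg.solve(np.eye(n) - M, R)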


Technique #3: Temporal Difference Learning

  • We want the utility to depend on successors, but we want to solve iteratively

  • Whenever you observe a transition from state i to state j, apply the update:

    U(i) ← U(i) + α [ R(i) + U(j) − U(i) ]

  • α = the learning rate

  • The difference between successive states is the “temporal difference”

  • Converges faster than naive updating
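
The update above is a one-liner in code; a minimal sketch with U and R as dicts of utility estimates and rewards (the names and the α value are hypothetical):

    # Passive TD update for an observed transition i -> j:
    # move U[i] toward the one-step sample R[i] + U[j].
    def td_update(U, R, i, j, alpha=0.1):
        U[i] += alpha * (R[i] + U[j] - U[i])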


Active Learning

  • The probability of going from one state to another now depends on the action taken

  • The ADP equations become: U(i) = R(i) + max_a Σ_j M^a_ij U(j), where M^a_ij is the probability of reaching state j from state i when action a is taken (see the sketch below)
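
A minimal, value-iteration-style sketch of these equations, assuming M is a list holding one estimated transition matrix per action and R is the reward vector (all names hypothetical):

    # Active ADP: repeatedly apply U <- R + max over actions of M[a]·U.
    import numpy as np

    def active_adp_utilities(M, R, iters=100):
        U = np.zeros(len(R))
        for _ in range(iters):
            # best expected successor utility over all actions
            U = R + np.max([Ma @ U for Ma in M], axis=0)
        return U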


Active Learning (continued)

  • Active learning with temporal difference learning works the same way (assuming you know where you’re going)

  • You also need to learn the transition probabilities in order to eventually decide where to go


Exploration: where should the agent go to learn utilities?

  • Suppose you’re trying to learn optimal game-playing strategies

    • Do you follow the best utility, in order to win?

    • Do you move around at random, hoping to learn more (and losing a lot in the process)?

  • Following the best utility all the time can get you stuck at an imperfect solution

  • Making random moves can lose a lot of games


Where should the agent go to learn utilities?

  • f(u, n) = the exploration function

    • It depends on the utility of a move (u) and the number of times the agent has tried it (n)

  • One possibility: instead of using the utility alone to decide where to go, use the optimistic estimate U+(i) = R(i) + max_a f( Σ_j M^a_ij U+(j), N(a, i) ), where N(a, i) counts how many times action a has been tried in state i

  • Try a move a bunch of times, then eventually settle
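
One concrete (hypothetical) choice of exploration function: stay optimistic about any move tried fewer than N_E times, then fall back on the learned utility:

    # Simple exploration function: optimistic until a move has been
    # tried N_E times. R_PLUS is an assumed optimistic reward bound.
    R_PLUS = 2.0
    N_E = 5

    def f(u, n):
        return R_PLUS if n < N_E else u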


Q-learning

  • An alternative approach to temporal difference learning

  • There is no need to learn the transition probabilities, which is sometimes considered more desirable

  • Instead, we learn the “quality” Q(i, a) of each (state, action) pair
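
A minimal sketch of the corresponding TD-style update, with Q stored as a dict keyed by (state, action) pairs (all names and the α value are hypothetical):

    # Q-learning update for an observed transition i -> j under action a:
    # move Q[(i, a)] toward R[i] plus the best quality reachable from j.
    def q_update(Q, R, i, a, j, actions, alpha=0.1):
        best_next = max(Q[(j, ap)] for ap in actions)
        Q[(i, a)] += alpha * (R[i] + best_next - Q[(i, a)])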


Generalization in Reinforcement Learning

  • Maintaining utilities for all seen states in a real game is intractable.

  • Instead, treat it as a supervised learning problem

  • Training set consists of (state, utility) pairs

    • Or, alternatively, (state, action, q-value) triples

  • Learn to predict utility from state

  • This is a regression problem, not a classification problem

    • Radial basis function neural networks (hidden nodes are Gaussians instead of sigmoids)

    • Support vector machines for regression

    • Etc…
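
As a minimal sketch of this regression setup, using support vector regression (one of the methods listed above); the feature vectors and utility values below are made-up placeholders:

    # Utility prediction as regression: fit on (state-features, utility)
    # pairs gathered during learning, then generalize to unseen states.
    from sklearn.svm import SVR

    X = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]   # feature vectors for states
    y = [0.2, -0.1, 0.05]                      # estimated utilities
    model = SVR().fit(X, y)
    print(model.predict([[0.25, 0.75]]))       # utility for an unseen state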


Other Applications

  • Applies to any situation where something must learn from reinforcement

  • Possible examples:

    • Toy robot dogs

    • Petz

    • That darn paperclip

    • “The only winning move is not to play”

