- 130 Views
- Uploaded on
- Presentation posted in: General

Class Project

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

- Due at end of finals week
- Essentially anything you want, so long as it’s AI related and I approve
- Any programming language you want
- In pairs or individual
- Email me by Wednesday, November 3

- Implementing Knn to Classify Bedform Stability Fields
- Blackjack Using Genetic Algorithms
- Computer game players:Go, Checkers, Connect Four, Chess, Poker
- Computer puzzle solvers: Minesweeper, mazes
- Pac-Man with intelligent monsters
- Genetic algorithms:
- blackjack strategy

- Automated 20-questions player
- Paper on planning
- Neural network spam filter
- Learning neural networks via GAs

- Solving neural networks via backprop
- Code decryptor using Gas
- Box pushing agent (competing against an opponent)

- Too complicated games: Risk, Yahtzee, Chess, Scrabble, Battle Simulation
- Got too focused in making game work
- I sometimes had trouble running the game
- Game was often incomplete
- Didn’t have time to do enough AI

- Problems that were too vague
- Simulated ant colonies / genetic algorithms
- Bugs swarming for heat (emergent intelligence never happened)
- Finding paths through snow

- AdaBoost on protein folding data
- Couldn’t get boosting working right, needed more time on small datasets (spent lots of time parsing protein data)

- Game playing: So far, we have told the agent the value of a given board position.
- How can agent learn which positions are important?
- Play whole bunch of games, and receive reward at end (+ or -)
- How to determine utility of states that aren’t ending states?

- Terminal states have reward
- Mission: Estimate utility of all possible game states

- For chess: state is a combination of position on board and location of opponents
- Half of your transitions are controlled by you (your moves)
- Other half of your transitions are probabilistic (depend on opponent)

- For now, we assume all moves are probabilistic (probabilities unknown)

- Agent learns by “watching”
- Fixed probability of moving from one state to another

- Also known as Least Mean Squares (LMS) approach
- Starting at home, obtain sequence of states to terminal state
- Utility of terminal state = reward
- loop back over all other states
- utility for state i = running average of all rewards seen for state i

- Works, but converges slowly
- Must play lots of games

- Ignores that utility of a state should depend on successor

- Utility of a state depends entirely on the successor state
- If a state has one successor, utility should be the same
- If a state has multiple successors, utility should be expected value of successors

- To find all utilities, just solve equations
- Set of linear equations, solveable
- Changes each iteration as you learn probabilities
- Completely intractable for large problems:
- For a real game, it means finding actual utilities of all states

- Want utility to depend on successors, but want to solve iteratively
- Whenever you observe a transition from i to j:

- a = learning rate
- difference between successive states = temporal difference
- Converges faster than Naive updating

- Probability of going from one state to another now depends on action
- ADP equations are now:

- Active Learning with Temporal Difference Learning: works the same way (assuming you know where you’re going)
- Also need to learn probabilities to eventually make decision on where to go

- Suppose you’re trying to learn optimal game playing strategies
- Do you follow best utility, in order to win?
- Do you move around at random, hoping to learn more (and losing lots in the process)?

- Following best utility all the time can get you stuck at an imperfect solution
- Following random moves can lose a lot

- f(u,n) = exploration function
- depends on utility of move (u), and number of times that agent has tried it (n)

- One possibility: instead of using utility to decide where to go, use
- Try a move a bunch of times, then eventually settle

- Alternative approach for temporal difference learning
- No need to learn probabilities: considered more desirable sometimes
- Instead, looking for “quality” of (state, action) pair

- Maintaining utilities for all seen states in a real game is intractable.
- Instead, treat it as a supervised learning problem
- Training set consists of (state, utility) pairs
- Or, alternatively, (state, action, q-value) triples

- Learn to predict utility from state
- This is a regression problem, not a classification problem
- Radial basis function neural networks (hidden nodes are Gaussians instead of sigmoids)
- Support vector machines for regression
- Etc…

- Applies to any situation where something is to learn from reinforcement
- Possible examples:
- Toy robot dogs
- Petz
- That darn paperclip
- “The only winning move is not to play”