
Genetic Algorithms (Evolutionary Computing)


Presentation Transcript


  1. Genetic Algorithms (Evolutionary Computing) • Genetic Algorithms are used to try to “evolve” the solution to a problem • Generate prototype solutions called chromosomes (individuals) • Backpack problem as example: • http://home.ksp.or.jp/csd/english/ga/gatrial/Ch9_A2_4.html • All individuals form the population • Generate new individuals by reproduction • Use a fitness function to evaluate individuals • Survival of the fittest: population has a fixed size • Individuals with higher fitness are more likely to reproduce (a minimal sketch of the whole loop follows below)
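
The loop below is a minimal sketch of these ideas on a toy backpack (knapsack) instance. The item weights and values, the population size, the mutation rate, and the fitness-proportional selection via random.choices are all illustrative choices, not taken from the slides.

```python
import random

# A small, made-up knapsack instance: (weight, value) pairs and a capacity.
ITEMS = [(12, 4), (2, 2), (1, 2), (1, 1), (4, 10)]
CAPACITY = 15
POP_SIZE = 20        # the population has a fixed size
GENERATIONS = 100

def fitness(chromosome):
    """Total value of the packed items, or 0 if the weight limit is exceeded."""
    weight = sum(w for (w, _), bit in zip(ITEMS, chromosome) if bit)
    value = sum(v for (_, v), bit in zip(ITEMS, chromosome) if bit)
    return value if weight <= CAPACITY else 0

population = [[random.randint(0, 1) for _ in ITEMS] for _ in range(POP_SIZE)]
for _ in range(GENERATIONS):
    # Individuals with higher fitness are more likely to reproduce.
    weights = [fitness(c) + 1 for c in population]   # +1 keeps zero-fitness individuals selectable
    parents = random.choices(population, weights=weights, k=POP_SIZE)
    children = []
    for p1, p2 in zip(parents[::2], parents[1::2]):
        cut = random.randrange(1, len(ITEMS))        # single-point cross-over
        children += [p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]]
    for child in children:                           # occasional single-gene mutation
        if random.random() < 0.1:
            child[random.randrange(len(child))] ^= 1
    # Survival of the fittest: keep the population at a fixed size.
    population = sorted(population + children, key=fitness, reverse=True)[:POP_SIZE]

best = max(population, key=fitness)
print(best, fitness(best))
```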

  2. Reproduction Methods • Mutation • Alter a single gene in the chromosome randomly to create a new chromosome • Example • Cross-over • Pick a random location within chromosome • New chromosome receives first set of genes from parent 1, second set from parent 2 • Example • Inversion • Reverse the chromosome
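
For bit-string chromosomes like the one in the backpack sketch above, each operator is only a few lines. This is one plausible reading of the slide; the names (mutate, crossover, invert) are chosen for illustration.

```python
import random

def mutate(chromosome):
    """Mutation: randomly alter a single gene to create a new chromosome."""
    child = chromosome[:]
    child[random.randrange(len(child))] = random.randint(0, 1)
    return child

def crossover(parent1, parent2):
    """Cross-over: pick a random cut point; the new chromosome takes the
    first set of genes from parent 1 and the second set from parent 2."""
    cut = random.randrange(1, len(parent1))
    return parent1[:cut] + parent2[cut:]

def invert(chromosome):
    """Inversion: reverse the chromosome."""
    return chromosome[::-1]

print(crossover([1, 1, 1, 1], [0, 0, 0, 0]))   # e.g. [1, 1, 0, 0]
```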

  3. Interpretation • Genetic algorithms try to solve a hill climbing problem • Method is parallelizable • The trick is in how you represent the chromosome • Tries to avoid local maxima by keeping many chromosomes at a time

  4. Another Example: Traveling Sales Rep Problem • How to represent a chromosome? • What effects does this have on crossover and mutation?

  5. TSP • Chromosome: Ordering of city numbers • (1 9 2 4 6 5 7 8 3) • What can go wrong with crossover? • To fix, use order crossover technique • Take two chromosomes, and take two random locations to cut • p1 = (1 9 2 | 4 6 5 7 | 8 3) • p2 = (4 5 9 | 1 8 7 6 | 2 3) • Goal: preserve as much as possible of the orderings in the chromosomes

  6. Order Crossover • p1 = (1 9 2 | 4 6 5 7 | 8 3) • p2 = (4 5 9 | 1 8 7 6 | 2 3) • New p1 will look like: • c1 = (x x x | 4 6 5 7 | x x) • To fill in c1, first produce ordered list of cities from p2, starting after cut, eliminating cities in c1 • 2 3 9 1 8 • Drop them into c1 in order • c1 = (2 3 9 4 6 5 7 1 8) • Do similarly in reverse to obtain • c2 = (3 9 2 1 8 7 6 4 5)
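
Here is a sketch of that order-crossover step. The cut positions (3 and 7, counting from zero) are fixed to match the example above, and the open positions are filled left to right from the front of the tour, which is what the slide's worked example does.

```python
def order_crossover(p1, p2, cut1, cut2):
    """Order crossover as in the worked example on this slide.

    The child keeps p1's segment between the two cuts.  The remaining
    positions are filled, left to right, with the missing cities taken
    in the order they appear in p2 starting just after the second cut."""
    n = len(p1)
    segment = p1[cut1:cut2]
    # Cities of p2 starting after the second cut (wrapping around),
    # skipping any city already present in the preserved segment.
    fillers = [p2[(cut2 + k) % n] for k in range(n)]
    fillers = [c for c in fillers if c not in segment]
    child = list(p1)
    fill = iter(fillers)
    for i in list(range(0, cut1)) + list(range(cut2, n)):
        child[i] = next(fill)
    return tuple(child)

p1 = (1, 9, 2, 4, 6, 5, 7, 8, 3)
p2 = (4, 5, 9, 1, 8, 7, 6, 2, 3)
print(order_crossover(p1, p2, 3, 7))  # (2, 3, 9, 4, 6, 5, 7, 1, 8) = c1
print(order_crossover(p2, p1, 3, 7))  # (3, 9, 2, 1, 8, 7, 6, 4, 5) = c2
```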

  7. Mutation & Inversion • What can go wrong with mutation? • What is wrong with inversion?

  8. Mutation & Inversion • Redefine mutation as picking two random spots in path, and swapping • p1 = (1 9 2 4 6 5 7 8 3) • c1 = (1 9 8 4 6 5 7 2 3) • Redefine inversion as picking a random middle section and reversing: • p1 = (1 9 2 | 4 6 5 7 8 | 3) • c1 = (1 9 2 | 8 7 5 6 4 | 3) • Another example: • http://home.online.no/~bergar/mazega.htm
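
Both redefined operators are a couple of lines. The positions are fixed below so the output reproduces the slide's examples; in practice they would be drawn at random.

```python
import random

def swap_mutation(tour, i=None, j=None):
    """Mutation for TSP: swap the cities at two (random) positions."""
    i = random.randrange(len(tour)) if i is None else i
    j = random.randrange(len(tour)) if j is None else j
    t = list(tour)
    t[i], t[j] = t[j], t[i]
    return tuple(t)

def inversion(tour, cut1=None, cut2=None):
    """Inversion for TSP: reverse a (random) middle section of the tour."""
    if cut1 is None or cut2 is None:
        cut1, cut2 = sorted(random.sample(range(len(tour) + 1), 2))
    return tour[:cut1] + tuple(reversed(tour[cut1:cut2])) + tour[cut2:]

p1 = (1, 9, 2, 4, 6, 5, 7, 8, 3)
print(swap_mutation(p1, 2, 7))   # (1, 9, 8, 4, 6, 5, 7, 2, 3)
print(inversion(p1, 3, 8))       # (1, 9, 2, 8, 7, 5, 6, 4, 3)
```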

  9. Reinforcement Learning • Game playing: So far, we have told the agent the value of a given board position. • How can an agent learn which board positions are important? • Play a whole bunch of games, and receive reward at end (+ or -) • How do you determine utility of states that aren’t terminal states?

  10. The setup: Possible game states • Terminal states have reward • Mission: Estimate utility of all possible game states

  11. Passive Learning • Agent learns by “watching” • Fixed probability of moving from one state to another

  12. Sample Results

  13. Technique #1: Naive Updating • Also known as Least Mean Squares (LMS) approach • Starting at home, obtain sequence of states to terminal state • Utility of terminal state = reward • Loop back over all other states • Utility for state i = running average of all rewards seen for state i
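
A sketch of that running-average bookkeeping, assuming (as the slide does) that every state in a run is credited with the single reward observed at the end. The state names are made up.

```python
from collections import defaultdict

def lms_update(utilities, counts, sequence, reward):
    """Naive (LMS) updating: after one complete run, fold the observed
    reward into a running average for every state visited on the run."""
    for state in sequence:
        counts[state] += 1
        # running average of all rewards seen for this state
        utilities[state] += (reward - utilities[state]) / counts[state]

utilities = defaultdict(float)
counts = defaultdict(int)
# Two hypothetical observed runs, ending in terminal rewards +1 and -1.
lms_update(utilities, counts, ["home", "A", "B"], reward=+1)
lms_update(utilities, counts, ["home", "A", "C"], reward=-1)
print(dict(utilities))   # {'home': 0.0, 'A': 0.0, 'B': 1.0, 'C': -1.0}
```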

  14. Naive Updating Analysis • Minimizes mean square error with respect to seen data • Works, but converges slowly • Must play lots of games • Ignores that utility of a state should depend on successor

  15. Technique #2: Adaptive Dynamic Programming • Utility of a state depends entirely on the successor state • If a state has one successor, utility should be the same • If a state has multiple successors, utility should be expected value of successors

  16. Finding the utilities • To find all utilities, just solve equations • This is done via dynamic programming • “Gold standard” – this gets you the right values instantly, no convergence or iteration • Completely intractable for large problems: • For a real game, it means finding actual utilities of all states • Assumes that you know Mij
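
In the usual passive-ADP formulation each utility satisfies U(i) = R(i) + Σ_j M_ij U(j), which is linear in the unknowns, so a small example can be solved in one shot. The three-state chain below is invented for illustration.

```python
import numpy as np

# Hypothetical 3-state example: states 0 and 1 are non-terminal, state 2 is
# terminal with reward +1.  M[i, j] is the probability of moving from i to j;
# the terminal state transitions nowhere, so its row is all zeros.
M = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.0, 0.0, 0.0]])
R = np.array([0.0, 0.0, 1.0])

# The equations U(i) = R(i) + sum_j M[i, j] * U(j) are linear in U,
# so solve (I - M) U = R directly -- no iteration, no convergence issues.
U = np.linalg.solve(np.eye(3) - M, R)
print(U)   # [1. 1. 1.] for this toy chain
```

For a real game the matrix has one row per reachable board position, which is exactly why the slide calls this approach intractable despite being the gold standard.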

  17. Technique 3: Temporal Difference Learning • Want utility to depend on successors, but want to solve iteratively • Whenever you observe a transition from i to j, update U(i) ← U(i) + α [ R(i) + U(j) − U(i) ] • α = learning rate • difference between successive states = temporal difference • Converges faster than naive updating
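
As a sketch, with utilities kept in a plain dict and no discounting (the slides never introduce a discount factor):

```python
def td_update(U, i, j, reward_i, alpha=0.1):
    """Temporal-difference update after observing a transition i -> j.
    The temporal difference reward_i + U(j) - U(i) measures how far U(i)
    is from what its observed successor suggests it should be."""
    u_i, u_j = U.get(i, 0.0), U.get(j, 0.0)
    U[i] = u_i + alpha * (reward_i + u_j - u_i)

U = {"terminal": 1.0}
td_update(U, "A", "terminal", reward_i=0.0, alpha=0.5)
print(U["A"])   # 0.5: U(A) moved halfway toward reward(A) + U(terminal)
```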

  18. Passive Learning in Unknown Environment • Unknown environment = transition probabilities unknown • Only affects technique 2, Adaptive Dynamic Programming • Iteratively: • Estimate transition probabilities based on what you’ve seen • Solve dynamic programming problem with best estimates so far
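
Estimating the transition probabilities really is just counting; something along these lines (state names invented):

```python
from collections import defaultdict

counts = defaultdict(lambda: defaultdict(int))   # counts[i][j] = times i -> j was observed

def record_transition(i, j):
    """Tally an observed transition so M can be re-estimated."""
    counts[i][j] += 1

def estimated_M(i, j):
    """Best estimate so far of M_ij: the fraction of departures from i that went to j."""
    total = sum(counts[i].values())
    return counts[i][j] / total if total else 0.0

record_transition("A", "B"); record_transition("A", "B"); record_transition("A", "C")
print(estimated_M("A", "B"))   # 0.666...
```

Each time the estimates change, the linear system from the previous slides can simply be re-solved with the new M.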

  19. Active Learning in an Unknown Environment • Probability of going from one state to another now depends on the action the agent chooses • The ADP equations change to include that choice of action (sketched below)
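
The equation image itself did not survive the transcript; in the standard active-ADP formulation the utilities satisfy U(i) = R(i) + max_a Σ_j M^a_ij U(j), where M^a_ij is the probability of reaching j from i under action a. Because of the max this is no longer a linear system, so the sketch below simply iterates the update on an invented two-action example.

```python
import numpy as np

# Hypothetical 3-state, 2-action example.  M[a][i, j] is the probability of
# landing in j after taking action a in state i; state 2 is terminal (+1).
M = {
    "left":  np.array([[0.0, 0.9, 0.1],
                       [0.9, 0.0, 0.1],
                       [0.0, 0.0, 0.0]]),
    "right": np.array([[0.0, 0.1, 0.9],
                       [0.1, 0.0, 0.9],
                       [0.0, 0.0, 0.0]]),
}
R = np.array([0.0, 0.0, 1.0])

# Iterate U(i) <- R(i) + max_a sum_j M^a_ij U(j) until it settles.
U = np.zeros(3)
for _ in range(100):
    U = R + np.max([Ma @ U for Ma in M.values()], axis=0)
print(U)   # approaches [1. 1. 1.] for this toy instance
```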

  20. Exploration: where should agent go to learn utilities? • Suppose you’re trying to learn optimal blackjack strategies • Do you follow best utility, in order to win? • Do you move around at random, hoping to learn more (and losing lots in the process)? • Following best utility all the time can get you stuck at an imperfect solution • Following random moves can lose a lot

  21. Where should agent go to learn utilities? • f(u,n) = exploration function • depends on utility of move, and number of times that agent has tried it • One possibility: • Try a move a bunch of times, then eventually settle
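
One common choice of exploration function, and plausibly the one the slide means by "try a move a bunch of times, then eventually settle", is to report an optimistic value until a move has been tried some minimum number of times. The two constants below are placeholders, not values from the slides.

```python
R_PLUS = 2.0   # optimistic estimate of the best possible reward (assumed)
N_E = 5        # how many tries before we trust the learned utility (assumed)

def exploration_value(u, n):
    """One possible f(u, n): pretend an under-explored move is worth R_PLUS
    until it has been tried N_E times, then fall back to its learned utility u."""
    return R_PLUS if n < N_E else u
```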

  22. Generalization in Reinforcement Learning • Maintaining utilities for all seen states in a real game is intractable. • Instead, treat it as a supervised learning problem • Training set consists of (state, utility) pairs • Learn to predict utility from state • This is a regression problem, not a classification problem • Can use neural network with multiple outputs
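
As a stand-in for the regression step, the sketch below fits a simple linear predictor of utility from hand-made state features; a neural network with a real-valued output would play the same role on a real game. The feature vectors and utilities are invented.

```python
import numpy as np

# Hypothetical training set: each row is a feature vector describing a board
# position, paired with the utility estimated for it by reinforcement learning.
X = np.array([[1.0, 0.2, 0.0],
              [1.0, 0.5, 1.0],
              [1.0, 0.9, 0.0],
              [1.0, 0.1, 1.0]])
y = np.array([0.1, 0.6, 0.8, 0.3])

# Least-squares fit of a linear function of the features (a regression problem,
# not a classification problem).
w, *_ = np.linalg.lstsq(X, y, rcond=None)

def predicted_utility(state_features):
    """Generalize: predict the utility of a state never seen during play."""
    return state_features @ w
```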

  23. Other applications • Applies to any situation where an agent can learn from reinforcement • Possible examples: • Toy robot dogs • Petz • That darn paperclip • “The only winning move is not to play”
