
Machine Learning CS 165B Spring 2012








  1. Machine Learning, CS 165B, Spring 2012

  2. Course outline • Introduction (Ch. 1) • Concept learning (Ch. 2) • Decision trees (Ch. 3) • Ensemble learning • Neural Networks (Ch. 4) • Linear classifiers • Support Vector Machines • Bayesian Learning (Ch. 6) • Genetic Algorithms (Ch. 9) • Instance-based Learning (Ch. 8) • Clustering • Computational learning theory (Ch. 7)

  3. Genetic Algorithms - History • Pioneered by John Holland in the 1970s • Became popular in the late 1980s • Based on ideas from Darwinian evolution • Can be used to solve a variety of problems that are not easy to solve using other techniques • Particularly well suited for hard problems where little is known about the underlying search space • Widely used in business, science and engineering

  4. Classes of Search Techniques • Enumerative techniques: DFS, BFS, Dynamic Programming • Guided random search techniques: Hill Climbing, Simulated Annealing, Tabu Search, Evolutionary Algorithms • Evolutionary Algorithms include Genetic Algorithms and Genetic Programming

  5. Background: Biological Evolution • Biological analogy for learning • Lamarck: • Species adapt over time • Pass this adaptation to offspring • Darwin: • Consistent, heritable variation among individuals in a population • Natural selection of the fittest • Mendel: • A mechanism for inheriting traits • genotype → phenotype mapping (“code”) • Epigenetics: • Non-genetic inheritance

  6. Evolution in the real world • Each cell of a living thing contains chromosomes - strings of DNA • Each chromosome contains a set of genes - blocks of DNA • Each gene determines some aspect of the organism (like eye colour) • A collection of genes is sometimes called a genotype • A collection of aspects (like eye colour) is sometimes called a phenotype • Reproduction involves recombination of genes from parents and then small amounts of mutation (errors) in copying • The fitness of an organism is how much it can reproduce before it dies • Evolution based on “survival of the fittest”

  7. Start with a Dream… • Suppose you have a problem • You don’t know how to solve it • What can you do? • Can you use a computer to somehow find a solution for you? • This would be nice! Can it be done?

  8. A dumb solution A “blind generate and test” algorithm: Repeat Generate a random possible solution Test the solution and see how good it is Until solution is good enough
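The "blind generate and test" loop can be sketched in a few lines of Python. The bitstring encoding and the counting-ones fitness used here are illustrative assumptions, not part of the slide:

```python
import random

def blind_search(fitness, good_enough, length, max_tries=100000):
    # Repeat: generate a random candidate, test it, until it is good enough
    for _ in range(max_tries):
        candidate = "".join(random.choice("01") for _ in range(length))
        if fitness(candidate) >= good_enough:
            return candidate
    return None  # gave up: the space was too large for blind sampling
```

For a short bitstring this works; the next slide explains why it fails as soon as the space of possible solutions grows.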

  9. Can we use this dumb idea? • Sometimes - yes: • if there are only a few possible solutions • and you have enough time • then such a method could be used • For most problems - no: • many possible solutions • with no time to try them all • so this method can not be used

  10. A “less-dumb” idea (GA) Generate a set of random solutions Repeat Test each solution in the set (rank them) Remove some bad solutions from the set Duplicate some good solutions; make small changes to some of them Until best solution is good enough

  11. GA(Fitness, Fitness_threshold, p, r, m) Fitness: assigns an evaluation score Fitness_threshold: termination criterion p: number of hypotheses in a generation r: fraction of population to be replaced m: mutation rate • Initialize: P ← Generate p random hypotheses • Evaluate: for each h in P, compute Fitness(h) • While max_h Fitness(h) < Fitness_threshold • Create a new generation P (5 steps: Select, Crossover, Mutate, Update, Evaluate) • Return the hypothesis in P that has the highest Fitness

  12. Create a New Generation PS • Select: Probabilistically select (1−r)p members of P to be the non-replaced part of PS • Add rp members to these “survivors” • Crossover: Probabilistically select rp/2 pairs of hypotheses from P. For each pair ⟨h1, h2⟩, produce two offspring by applying the Crossover operator. Add all offspring to PS • Mutate: Invert a randomly selected bit in m% of random members of PS • (Apply other operators, e.g., Invert: “switch” portions of each hypothesis) • Update: P ← PS • Evaluate: for each h in P, compute Fitness(h)
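The GA(Fitness, Fitness_threshold, p, r, m) procedure and the generation step above can be sketched as runnable Python. For brevity, Select keeps the top of the fitness ranking rather than sampling probabilistically, Mutate flips one bit per chosen member, and the counting-ones (OneMax) fitness in the demo is an assumption for illustration:

```python
import random

def genetic_algorithm(fitness, threshold, p, r, m, length, max_gens=200):
    # GA(Fitness, Fitness_threshold, p, r, m) on fixed-length bit strings
    pop = ["".join(random.choice("01") for _ in range(length)) for _ in range(p)]
    for _ in range(max_gens):
        scored = sorted(pop, key=fitness, reverse=True)
        if fitness(scored[0]) >= threshold:
            break
        # Select: keep (1-r)p members as the non-replaced part
        # (truncation instead of probabilistic selection, for brevity)
        survivors = scored[:int((1 - r) * p)]
        # Crossover: rp offspring from random survivor pairs (single-point)
        children = []
        while len(survivors) + len(children) < p:
            a, b = random.sample(survivors, 2)
            cut = random.randint(1, length - 1)
            children += [a[:cut] + b[cut:], b[:cut] + a[cut:]]
        pop = (survivors + children)[:p]
        # Mutate: flip one random bit in a fraction m of the members
        for k in range(p):
            if random.random() < m:
                i = random.randrange(length)
                pop[k] = pop[k][:i] + ("1" if pop[k][i] == "0" else "0") + pop[k][i + 1:]
    return max(pop, key=fitness)

# Demo: maximize the number of 1 bits in a 20-bit string
best = genetic_algorithm(lambda h: h.count("1"), threshold=20,
                         p=40, r=0.6, m=0.05, length=20)
```

The later slides on encoding, operators, and selection refine each of these steps.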

  13. How to encode a hypothesis/solution? • Obviously this depends on the problem! • GA’s often encode solutions as fixed length “bitstrings” (e.g. 101110, 111111, 000101) • Each bit represents some aspect of the proposed solution to the problem • For GA’s to work, we need to be able to “test” any string and get a “score” indicating how “good” that solution is

  14. Representing hypotheses • Hypothesis representation specifics: • “1”s in all attribute value positions implies “don’t care” • Single “1”: only one attribute value possible • “0”s in all positions: no attribute values possible • Can represent rule-sets using concatenation of rules • Can construct symbolic strings (computer programs) instead of bit strings

  15. Fitness landscapes

  16. Representing hypotheses about concepts • Hypotheses: represent conjunctive rules as fixed-length bit strings • Enumerate the attributes and their possible values • One approach: for each value a of attribute A, use 1 if A=a is possible, 0 if not • Multiple 1s within one attribute mean ∨ • Different attributes are joined by ∧ • Same encoding for the target (postcondition) • Example: with Outlook ∈ {Sunny, Overcast, Rain} and Wind ∈ {Strong, Weak}, the rule IF Wind = Strong THEN PlayTennis = Yes encodes Outlook as 111 (don’t care), Wind as 10, and the target as 10

  17. Typical operators Initial strings: 11101001000 and 00001010101 • Single-point crossover (mask 11111000000): offspring 11101010101 and 00001001000 • Two-point crossover (mask 00111110000): offspring 11001011000 and 00101000101 • Uniform crossover (mask 10011010011): offspring 10001000100 and 01101011001 • Point mutation: 11101001000 → 11101011000
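These operators can be written directly from the masks. A Python sketch; the optional point/mask arguments are my addition so the slide's examples can be reproduced deterministically:

```python
import random

def single_point(p1, p2, i=None):
    # exchange everything after a single crossover point
    if i is None:
        i = random.randint(1, len(p1) - 1)
    return p1[:i] + p2[i:], p2[:i] + p1[i:]

def two_point(p1, p2, i=None, j=None):
    # exchange the segment between two crossover points
    if i is None or j is None:
        i, j = sorted(random.sample(range(1, len(p1)), 2))
    return p1[:i] + p2[i:j] + p1[j:], p2[:i] + p1[i:j] + p2[j:]

def uniform_crossover(p1, p2, mask=None):
    # each offspring bit comes from the parent chosen by the mask
    if mask is None:
        mask = "".join(random.choice("01") for _ in p1)
    c1 = "".join(a if m == "1" else b for a, b, m in zip(p1, p2, mask))
    c2 = "".join(b if m == "1" else a for a, b, m in zip(p1, p2, mask))
    return c1, c2

def point_mutation(s):
    # flip one randomly chosen bit
    i = random.randrange(len(s))
    return s[:i] + ("1" if s[i] == "0" else "0") + s[i + 1:]
```

For instance, `single_point("11101001000", "00001010101", 5)` reproduces the single-point offspring shown above.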

  18. Selecting the most fit hypotheses • Fitness-proportionate selection: • Also called “roulette wheel selection” • Crowding problem possible under proportionate selection: one genetic structure becomes dominant, giving low diversity in the population • Alternatives to simple proportionate selection • Tournament selection (for a more diverse population): • Pick h1, h2 at random with uniform probability • With probability p, select the more fit • Rank selection: • Sort all hypotheses by fitness • Probability of selection is proportional to rank, not fitness
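The three selection schemes can be sketched as follows; the function names and the p=0.8 tournament default are illustrative assumptions:

```python
import random

def roulette(pop, fitness):
    # fitness-proportionate ("roulette wheel") selection
    total = sum(fitness(h) for h in pop)
    r = random.uniform(0, total)
    acc = 0.0
    for h in pop:
        acc += fitness(h)
        if acc >= r:
            return h
    return pop[-1]

def tournament(pop, fitness, p=0.8):
    # pick two at random; with probability p keep the more fit
    h1, h2 = random.sample(pop, 2)
    better, worse = (h1, h2) if fitness(h1) >= fitness(h2) else (h2, h1)
    return better if random.random() < p else worse

def rank_select(pop, fitness):
    # selection probability proportional to rank, not raw fitness
    ranked = sorted(pop, key=fitness)          # worst first, so ranks 1..n
    return random.choices(ranked, weights=range(1, len(ranked) + 1))[0]
```

Tournament and rank selection blunt the crowding problem because a very fit individual's advantage is capped by its rank or its win probability, not its raw fitness.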

  19. Example: GABIL • Representation: encoding rule sets as bit strings: IF a1=T ∧ a2=F THEN c=T; IF a2=T THEN c=F encodes as (a1 a2 c | a1 a2 c) = 10 01 1 | 11 10 0 • Encode multiple rules: length of hypothesis grows with number of rules • Fitness based on predictive correctness of hypotheses: Fitness(h) = (correct(h))² • Genetic operators: need to accommodate variable-length rule sets • Two-point crossover • Standard mutation

  20. Two-Point Crossover • Variable-length rule sets: parents may contain different numbers of rules, each rule encoded as an (a1 a2 c) block such as 10 01 1 11 10 0 • Result must be a well-formed (WF) bitstring hypothesis • Idea: find corresponding positions at which to do crossover • Choose the number of crossover points • Choose points in the first parent randomly • Choose points in the second parent so as to give WF rules

  21. Crossover with variable-length bitstrings • Two (initially equal-length) hypotheses (a1 a2 c | a1 a2 c): h1: 10 01 1 11 10 0 h2: 01 11 0 10 01 0 • Choose crossover points for h1, e.g., after bits 1 and 8 • Now restrict crossover points in h2 to those that produce bitstrings with well-defined semantics, e.g., ⟨1, 3⟩, ⟨1, 8⟩, ⟨6, 8⟩ • Example: if we choose ⟨1, 3⟩, the result is h3: 11 10 0 h4: 00 01 1 11 11 0 10 01 0

  22. GABIL results • 92% correctness in basic form • Can extend to many variants • Add new genetic operators, also applied probabilistically: • AddAlternative: generalize constraint on ai by changing a 0 to 1 • DropCondition: generalize constraint on ai by changing every 0 to 1 • Performance improves to 95% • Add two bits to determine whether to allow these operators: (a1 a2 c | a1 a2 c | AA DC) = 01 11 0 10 01 0 1 0 • The learning strategy also evolves! • r = 0.6, m = 0.001, p = 100–1000 • Performance of GABIL comparable to symbolic rule/tree learning methods

  23. Example: Traveling Salesman Problem (TSP) The traveling salesman must visit every city in his territory exactly once and then return to the starting point; given the cost of travel between all cities, how should he plan his itinerary for minimum total cost of the entire tour? TSP is NP-Complete

  24. TSP solution by GA A vector v = (i1 i2 … in) represents a tour (v is a permutation of {1, 2, …, n}) Fitness f of a solution is the inverse cost of the corresponding tour Initialization: use either some heuristics or a random sample of permutations of {1, 2, …, n} Selection: fitness-proportionate

  25. TSP Crossover Build offspring by copying a sub-sequence of the tour from one parent, then preserving the relative order of the remaining cities from the other parent, while keeping the tour feasible Example: p1 = (1 2 3 4 5 6 7 8 9) and p2 = (4 5 2 1 8 7 6 9 3) First, the segments between the cut points are copied into the offspring: o1 = (x x x 4 5 6 7 x x) and o2 = (x x x 1 8 7 6 x x)

  26. TSP Crossover Next, starting from the second cut point of one parent, the cities from the other parent are copied in the same order. The sequence of the cities in the second parent is 9 – 3 – 4 – 5 – 2 – 1 – 8 – 7 – 6. After removal of the cities already in the first offspring, we get 9 – 3 – 2 – 1 – 8. This sequence is placed in the first offspring, o1 = (2 1 8 4 5 6 7 9 3), and similarly in the second, o2 = (3 4 5 1 8 7 6 9 2)
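The two-step procedure above can be written compactly; in this sketch the cut points are passed explicitly so the worked example can be checked:

```python
def tsp_crossover(p1, p2, cut1, cut2):
    # keep p1's segment between the cut points, then fill the remaining
    # slots with p2's cities in the order they appear starting after the
    # second cut point (skipping cities already copied)
    n = len(p1)
    child = [None] * n
    segment = p1[cut1:cut2]
    child[cut1:cut2] = segment
    order = p2[cut2:] + p2[:cut2]            # p2's cities from the second cut
    fill = [city for city in order if city not in segment]
    slots = list(range(cut2, n)) + list(range(cut1))  # open slots, same order
    for slot, city in zip(slots, fill):
        child[slot] = city
    return child
```

With p1 = (1 2 3 4 5 6 7 8 9), p2 = (4 5 2 1 8 7 6 9 3) and cuts after positions 3 and 7, this reproduces the offspring on the slide.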

  27. TSP Inversion The sub-string between two randomly selected points in the path is reversed Example: (1 2 3 4 5 6 7 8 9) is changed into (1 2 7 6 5 4 3 8 9) Such simple inversion guarantees that the resulting offspring is a legal tour
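Inversion is a one-liner on a list-of-cities tour; positions are 0-based in this sketch:

```python
def inversion(tour, i, j):
    # reverse the sub-path between positions i and j (inclusive);
    # every city still appears exactly once, so the result is a legal tour
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
```

Reversing positions 2 through 6 of (1 2 3 4 5 6 7 8 9) gives the slide's example (1 2 7 6 5 4 3 8 9).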

  28. Hypothesis space search by GA • Local minima not much of a problem • Big jumps in search space • Crowding as a problem • Various strategies to overcome over-representation of successful individuals • Tournament/rank selection • Fitness sharing • Fitness of an individual reduced by the presence of similar individuals

  29. Population evolution and schemas • How to characterize the evolution of the population in a GA? • Schema: string containing 0, 1, * (don’t care) • E.g. 10**0* • Many instances of a schema: 101101, 100000, … • An individual represents many schemas • 0010 represents 16 schemas • Characterize the population by • m(s, t) = number of instances of schema s in the population at time t • Schema theorem characterizes E(m(s, t+1)) in terms of • m(s, t) • Operators of the genetic algorithm • Selection • Recombination • Mutation
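Both counts used here are easy to compute: a 4-bit individual instantiates 2⁴ = 16 schemas (each position is either its own bit or *), and m(s, t) counts matches against the don't-care pattern. A small sketch (function names are mine):

```python
from itertools import product

def schemas_of(individual):
    # every schema an individual represents: each position is its bit or '*'
    return {"".join(c) for c in product(*((b, "*") for b in individual))}

def m_count(s, population):
    # m(s, t): number of instances of schema s in the population at time t
    return sum(all(sc in ("*", b) for sc, b in zip(s, h)) for h in population)
```

For example, 101101 and 100000 are both instances of 10**0*, so m_count("10**0*", …) counts both.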

  30. Characterizing Population Change • m(s, t) = number of instances of schema s in the population at time t • Find the expected value E(m(s, t+1)) in terms of • m(s, t) • breeding parameters • Schema theorem provides a lower bound on E(m(s, t+1)) • First consider the case of selection only • Then add the effects of • crossover • mutation

  31. Selection only • f(h) = fitness of hypothesis (bit string) h f̄(t) = average fitness of the population of n at time t m(s, t) = number of instances of schema s at time t û(s, t) = average fitness of instances of s at time t n = size of population p_t = set of bit strings at time t (i.e., the population) • Probability of selecting h in one selection step: Pr(h) = f(h) / (n f̄(t)) • Probability in one selection of getting an instance of s: Pr(h ∈ s) = Σ_{h ∈ s ∩ p_t} f(h) / (n f̄(t)) = û(s, t) m(s, t) / (n f̄(t))

  32. Schema Theorem • After n selections (an entire new generation): E[m(s, t+1)] = (û(s, t) / f̄(t)) m(s, t) • BUT crossover/mutation may reduce the number of instances of s, giving the lower bound: E[m(s, t+1)] ≥ (û(s, t) / f̄(t)) m(s, t) (1 − p_c d(s)/(l−1)) (1 − p_m)^{o(s)} where • p_c = probability of the single-point crossover operator • p_m = probability of the mutation operator • l = length of the bit strings • o(s) = number of defined (non-“*”) bits in s • d(s) = distance between the leftmost and rightmost defined bits in s

  33. Interpretation of the Schema theorem • Interpret E as • Proportional to the average fitness of the schema • Inversely proportional to the fitness of the average individual • More fit schemata tend to grow in influence under selection, crossover, and mutation, especially if they: • Have a small number of defined bits (o(s)) • Have close bundles of defined bits (d(s)) • GA keeps good chunks together: • GAs explore the search space by short, low-order schemata which, subsequently, are used for information exchange during crossover • Building block hypothesis: A genetic algorithm seeks near-optimal performance through the juxtaposition of short, low-order, high-performance schemata, called the building blocks

  34. Genetic Programming • Analogous ideas to the basic GA, but the elements are programs • Fitness: executing the program on a set of training data • Maintain the population size • Genetic algorithm: evolution of programs • Example: programs as (parse) trees, e.g., a tree for sin(x) + √(x² + y)

  35. Crossover [Figure: crossover of two program parse trees: a randomly chosen subtree of one parent is exchanged with a subtree of the other]
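Subtree crossover swaps a randomly chosen subtree of one parent with one from the other. A minimal sketch, representing programs as nested tuples; this representation and the helper names are assumptions for illustration:

```python
import random

# Programs as nested tuples: (operator, operand, ...) with string leaves,
# e.g. ("+", ("sin", "x"), ("^", "x", "2")) for sin(x) + x^2

def all_paths(tree, path=()):
    # yield the path (tuple of child indices) to every node in the tree
    yield path
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from all_paths(child, path + (i,))

def get_node(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def replace_node(tree, path, new):
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace_node(tree[i], path[1:], new),) + tree[i + 1:]

def subtree_crossover(t1, t2):
    # swap a random subtree of t1 with a random subtree of t2
    p1 = random.choice(list(all_paths(t1)))
    p2 = random.choice(list(all_paths(t2)))
    return (replace_node(t1, p1, get_node(t2, p2)),
            replace_node(t2, p2, get_node(t1, p1)))
```

Because trees are immutable tuples, the parents are left intact and both offspring are new well-formed programs.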

  36. Example: Block Problem [Figure: blocks U, N, I, E, A, R, V, L on the table, S on the stack] • Goal: spell the word UNIVERSAL • Need an appropriate representation • Key issue that determines efficiency of learning • Need • Representations of terminal arguments • “natural” representation, if possible • Primitive functions

  37. Example: Block Problem • Terminals: • CS (current stack): • Name of the top block on the stack, or F (False) if there is no current stack • TB (top correct block): • Name of the topmost correct block on the stack (it and the blocks below it are in correct order) • NN (next necessary): • Name of the next block needed above TB in the stack, or F if no more blocks are needed

  38. Primitive Functions • (MS x) (move to stack): • if block x is on the table, move x to the top of the stack and return the value T. Otherwise, do nothing and return the value F • (MT x) (move to table): • if block x is in stack, move block at top of stack to table and return the value T. Otherwise, return value F • (EQ x y) (equal): • return T if x equals y, and return F otherwise • (NOT x) (not x): • return T if x=F, else return F • (DU x y) (do until): • execute expression x repeatedly until expression y returns the value T

  39. Learned Program • Trained on 166 test problems • Using a population of 300 programs, found a solution after 10 generations: (EQ (DU (MT CS) (NOT CS)) (DU (MS NN) (NOT NN))) • (EQ x y) equal • (DU x y) do until • (MT x) move to table • CS current stack • (NOT x) not x • (MS x) move to stack • NN next necessary
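The primitives from slide 38 and this learned program can be checked in a few dozen lines. A sketch, assuming a list-based stack (top at the end), a set for the table, and Python thunks for DU's lazily evaluated arguments; the `Blocks` class and helper names are mine, not from the original work:

```python
GOAL = list("UNIVERSAL")

class Blocks:
    def __init__(self, stack, table):
        self.stack = list(stack)      # bottom ... top
        self.table = set(table)

    def _correct(self):               # length of the correct prefix of the stack
        n = 0
        while n < len(self.stack) and self.stack[n] == GOAL[n]:
            n += 1
        return n

    def CS(self):                     # current stack: top block, or F (False)
        return self.stack[-1] if self.stack else False

    def TB(self):                     # top correct block, or F
        n = self._correct()
        return self.stack[n - 1] if n else False

    def NN(self):                     # next necessary block, or F
        n = self._correct()
        return GOAL[n] if n < len(GOAL) else False

    def MS(self, x):                  # move block x from the table to the stack
        if x in self.table:
            self.table.remove(x)
            self.stack.append(x)
            return True
        return False

    def MT(self, x):                  # if x is in the stack, move top block to table
        if x in self.stack:
            self.table.add(self.stack.pop())
            return True
        return False

def EQ(a, b):
    return a == b

def DU(x, y, limit=50):
    # do-until: execute thunk x repeatedly until thunk y returns True
    while not y() and limit > 0:
        x()
        limit -= 1
    return True

def learned_program(w):
    # (EQ (DU (MT CS) (NOT CS)) (DU (MS NN) (NOT NN)))
    return EQ(DU(lambda: w.MT(w.CS()), lambda: not w.CS()),
              DU(lambda: w.MS(w.NN()), lambda: not w.NN()))
```

The first DU unstacks everything onto the table; the second stacks the next necessary block until none is needed, which spells UNIVERSAL from any starting configuration of the nine blocks.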

  40. Other examples of genetic programs • Design of electronic filter circuits • Discovery of circuits competitive with best human design • Image classification

  41. Summary of evolutionary programming • Conduct randomized, hill-climbing search through hypotheses • Analogy to biological evolution • Learning as an optimization problem (optimize fitness) • Can be parallelized easily
