
Project 2: Classification Using Genetic Programming

Artificial Intelligence. Project 2: Classification Using Genetic Programming. 2008. 10. 27. Kim, MinHyeok, mhkim@bi.snu.ac.kr, Biointelligence Laboratory.


Presentation Transcript


  1. Artificial Intelligence Project 2: Classification Using Genetic Programming 2008. 10. 27 Kim, MinHyeok mhkim@bi.snu.ac.kr Biointelligence Laboratory

  2. Contents • Project outline • Description of the data set • Genetic Programming • Brief overview • Fitness function & Selection methods • Classification with GP (in this project) • Guide to writing reports • Style & contents • Submission guide / Marking scheme (C) 2008, SNU Biointelligence Laboratory

  3. Outline • Goal • Understand Genetic Programming (GP) more deeply • Practice researching and writing a paper • Forest Fires problem (classification) • Predict whether a fire occurs or not • Using Genetic Programming • Estimating several statistics on the dataset • Data set • A variation of the ‘Forest Fires data set’ • http://archive.ics.uci.edu/ml/datasets/Forest+Fires

  4. Forest Fires Data Set • Description • Database of 517 samples • You can use at most 500 samples for training • 17 samples for prediction • 12 attributes plus a class label • X, Y, month, day, FFMC, DMC, DC, ISI, temp, RH, wind, rain, label • Integer or real values • Label (Class) • Two classes • 0 : a fire does not occur • 1 : a fire occurs

  5. Brief Summary of GP • A kind of evolutionary algorithm • Each individual (program) is represented as a tree structure • You need to set up the following elements for a GP run • The set of terminals (input attributes, the class variable, constants) • The set of functions (numerical / conditional operators) • The fitness measure • The algorithm parameters • population size, maximum number of generations • crossover rate and mutation rate • maximum depth of GP trees, etc. • The method for designating a result and the criterion for terminating a run.
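To make the tree representation concrete, here is a minimal sketch in Python. It assumes a nested-tuple encoding and a small arithmetic function set. The names `FUNCTIONS`, `evaluate`, `depth` and `size` are illustrative and do not come from the slides or from any of the recommended libraries.

```python
import operator

# Hypothetical function set: name -> (arity, implementation).
FUNCTIONS = {
    "+": (2, operator.add),
    "-": (2, operator.sub),
    "*": (2, operator.mul),
}

def evaluate(tree, sample):
    """Recursively evaluate a GP tree on one sample (dict: attribute -> value)."""
    if isinstance(tree, tuple):           # internal node: (function, child, ...)
        name, *children = tree
        _, func = FUNCTIONS[name]
        return func(*(evaluate(child, sample) for child in children))
    if isinstance(tree, str):             # terminal: an input attribute
        return sample[tree]
    return tree                           # terminal: a numeric constant

def depth(tree):
    """Depth of a tree; a lone terminal has depth 0."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(depth(child) for child in tree[1:])

def size(tree):
    """Number of nodes (functions plus terminals) in a tree."""
    if not isinstance(tree, tuple):
        return 1
    return 1 + sum(size(child) for child in tree[1:])

# Example individual: FFMC + (ISI * 2.0)
example_tree = ("+", "FFMC", ("*", "ISI", 2.0))
example_value = evaluate(example_tree, {"FFMC": 90.0, "ISI": 5.0})
```

The depth and size helpers are the quantities a maximum-depth parameter or a tree-size plot would rely on.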

  6. GP Flowchart [Figure: flowchart; panel labels: GA loop, GP loop]

  7. Initialization • Maximum initial depth of trees D_max is set. • Full method (each branch has depth = D_max): • nodes at depth d < D_max randomly chosen from function set F • nodes at depth d = D_max randomly chosen from terminal set T • Grow method (each branch has depth ≤ D_max): • nodes at depth d < D_max randomly chosen from F ∪ T • nodes at depth d = D_max randomly chosen from T • Common GP initialisation: ramped half-and-half, where the grow and full methods each deliver half of the initial population
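The full and grow methods can be sketched as follows. This is an illustrative Python sketch, not code from any of the recommended libraries. The function set, the terminal set and all names are assumptions, and ramping the depth limit from 1 up to the maximum is the usual reading of "ramped" rather than something the slide spells out.

```python
import random

FUNCTION_ARITY = {"+": 2, "-": 2, "*": 2}             # hypothetical function set F
TERMINALS = ["FFMC", "ISI", "temp", "RH", "wind", 1.0]  # hypothetical terminal set T

def random_tree(max_depth, method, rng, depth=0):
    """Build one random tree with the 'full' or 'grow' method."""
    at_limit = depth == max_depth
    if method == "full":
        pick_terminal = at_limit          # functions only, until the depth limit
    else:
        # grow: draw uniformly from F ∪ T below the limit, from T at the limit
        share_t = len(TERMINALS) / (len(TERMINALS) + len(FUNCTION_ARITY))
        pick_terminal = at_limit or rng.random() < share_t
    if pick_terminal:
        return rng.choice(TERMINALS)
    name = rng.choice(list(FUNCTION_ARITY))
    children = [random_tree(max_depth, method, rng, depth + 1)
                for _ in range(FUNCTION_ARITY[name])]
    return (name, *children)

def tree_depth(tree):
    """Depth of a tree; a lone terminal has depth 0."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(tree_depth(child) for child in tree[1:])

def ramped_half_and_half(pop_size, max_depth, rng):
    """Alternate full and grow while cycling the depth limit over 1..max_depth."""
    population = []
    for i in range(pop_size):
        method = "full" if i % 2 == 0 else "grow"
        limit = 1 + (i // 2) % max_depth
        population.append(random_tree(limit, method, rng))
    return population

population = ramped_half_and_half(pop_size=10, max_depth=3, rng=random.Random(0))
```

Passing an explicit `random.Random` instance keeps runs reproducible, which helps when comparing parameter settings in the report.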

  8. Fitness Functions • Relative squared error • The number of outputs that are within a given percentage of the correct value • You can also try other well-defined fitness functions suited to the problem
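The formulas for these two measures did not survive in the transcript, so the following Python sketch uses their common textbook definitions as an assumption: relative squared error as the squared error divided by the squared error of always predicting the mean target, and a hit count with a percentage tolerance. The function and parameter names are illustrative.

```python
def relative_squared_error(predictions, targets):
    """Squared error of the predictions relative to a predict-the-mean baseline.

    Lower is better; 0.0 means a perfect fit. Assumes the targets are not
    all identical (otherwise the denominator is zero).
    """
    mean_target = sum(targets) / len(targets)
    numerator = sum((p - t) ** 2 for p, t in zip(predictions, targets))
    denominator = sum((t - mean_target) ** 2 for t in targets)
    return numerator / denominator

def hits_within_tolerance(predictions, targets, tolerance_pct):
    """Count outputs within tolerance_pct percent of the correct value.

    Higher is better. For a target of 0 the allowed deviation is 0, so only
    an exact match counts as a hit.
    """
    hits = 0
    for p, t in zip(predictions, targets):
        if abs(p - t) <= abs(t) * tolerance_pct / 100.0:
            hits += 1
    return hits
```

Because the class labels here are 0 and 1, a percentage tolerance is degenerate for class 0, which is one reason a threshold-based accuracy may be a more natural fitness for this project.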

  9. Selection methods (1/2) • Fitness proportional (roulette wheel) selection • The roulette wheel can be constructed as follows. • Calculate the total fitness of the population. • Calculate the selection probability p_k for each chromosome v_k. • Calculate the cumulative probability q_k for each chromosome v_k.

  10. Procedure: Proportional_Selection • Generate a random number r from the range [0, 1]. • If r ≤ q_1, then select the first chromosome v_1; else, select the kth chromosome v_k (2 ≤ k ≤ pop_size) such that q_{k-1} < r ≤ q_k.
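The wheel construction and the selection procedure can be sketched together in Python. The sketch assumes non-negative fitness values where higher is better and a positive total; the names `build_wheel` and `proportional_selection` are illustrative, and indices are 0-based rather than the 1-based k of the slides.

```python
import random
from itertools import accumulate

def build_wheel(fitnesses):
    """Return the cumulative probabilities q_k for a list of fitness values."""
    total = sum(fitnesses)                       # total fitness of the population
    probabilities = [f / total for f in fitnesses]   # p_k for each chromosome
    cumulative = list(accumulate(probabilities))     # q_k = p_1 + ... + p_k
    cumulative[-1] = 1.0                         # guard against rounding drift
    return cumulative

def proportional_selection(cumulative, rng):
    """Spin the wheel once and return the 0-based index of the chosen chromosome."""
    r = rng.random()
    for k, q_k in enumerate(cumulative):
        if r <= q_k:                             # first k with q_{k-1} < r <= q_k
            return k
    return len(cumulative) - 1

wheel = build_wheel([1.0, 1.0, 2.0])
chosen = proportional_selection(wheel, random.Random(0))
```

If the fitness is an error measure to be minimised, it has to be transformed into a "higher is better" score before the wheel is built.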

  11. Selection methods (2/2) • Tournament selection • Tournament size q • Ranking-based selection • 2 ≤ […] ≤ POP_SIZE • 1 ≤ η⁺ ≤ 2 and η⁻ = 2 − η⁺ • Elitism • To preserve n good solutions until the next generation
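A minimal Python sketch of the three ideas on this slide follows. It assumes higher fitness is better and that the ranking scheme is standard linear ranking, where the best individual gets an expected number of copies `eta_plus` and the worst gets `2 - eta_plus`. That reading, and all function names, are assumptions for illustration.

```python
import random

def tournament_selection(fitnesses, q, rng):
    """Draw q distinct contestants at random and return the index of the fittest."""
    contestants = rng.sample(range(len(fitnesses)), q)
    return max(contestants, key=lambda i: fitnesses[i])

def linear_ranking_probabilities(n, eta_plus):
    """Selection probability per rank (rank 0 = best, rank n-1 = worst).

    Assumes n >= 2 and 1 <= eta_plus <= 2, with eta_minus = 2 - eta_plus.
    """
    eta_minus = 2.0 - eta_plus
    return [(eta_plus - (eta_plus - eta_minus) * rank / (n - 1)) / n
            for rank in range(n)]

def elites(population, fitnesses, n):
    """Return the n fittest individuals, to be copied unchanged into the next generation."""
    order = sorted(range(len(population)),
                   key=lambda i: fitnesses[i], reverse=True)
    return [population[i] for i in order[:n]]

winner = tournament_selection([0.2, 0.9, 0.5, 0.1], q=2, rng=random.Random(0))
rank_probs = linear_ranking_probabilities(n=5, eta_plus=1.5)
```

A larger tournament size q, or an `eta_plus` closer to 2, increases selection pressure, which is a natural axis for the optional selection-method experiments.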

  12. Classification with GP (in this project) • Function regression • Search for a function f(x) s.t. • f(x) ≥ threshold t when y = 1 • f(x) < threshold t when y = 0 • Converting to a Boolean value [Figure: example GP trees; node labels: IF, >, <, =, ∧, ∨, ¬, +, f(x), t, rain, RH, wind, FFMC, ISI, 0, 1, 50]
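The function-regression approach turns the real-valued output of an evolved function into a class label by comparing it with the threshold t. A small Python sketch of that conversion, with illustrative names and an assumed default threshold of 0:

```python
def classify(f_value, threshold=0.0):
    """Map a GP output f(x) to a class: 1 (fire) if f(x) >= t, else 0 (no fire)."""
    return 1 if f_value >= threshold else 0

def accuracy(f_values, labels, threshold=0.0):
    """Fraction of samples whose thresholded output matches the true label."""
    correct = sum(classify(f, threshold) == y
                  for f, y in zip(f_values, labels))
    return correct / len(labels)

# Hypothetical outputs of one evolved function on four samples.
outputs = [2.3, -0.7, 0.0, -1.5]
labels = [1, 0, 1, 1]
score = accuracy(outputs, labels)
```

An accuracy computed this way could itself serve as a fitness measure, as an alternative to the regression-style measures.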

  13. What to do for the experiment? • Select a library that implements GP • You can find various libraries written in C++/Java/Matlab • See the list of recommended libraries on the next page • Write your own code for the experiment • Check the libraries' sample code and tutorials for a quick start • Add comments to explain the flow of your program • Caution • Running GP can take a long time

  14. Recommended Libraries for GP • C++ • GPLib: http://www.cs.bham.ac.uk/~cmf/GPLib/index.html • Java • JGAP: http://jgap.sourceforge.net/ • ECJ: http://cs.gmu.edu/~eclab/projects/ecj/ • Matlab toolbox • GPLAB: http://gplab.sourceforge.net/ • More References • Implementations section in Wiki – Genetic Programming: http://en.wikipedia.org/wiki/Genetic_programming

  15. Report Style • English only!! • Scientific journal style • How to Write A Paper in Scientific Journal Style and Format • http://abacus.bates.edu/~ganderso/biology/resources/writing/HTWsections.html

  16. Report Contents (1/3) • System description • Programming language and runtime environment used • Result tables • Analysis & discussion (Very Important!!)

  17. Report Contents (2/3) • Graph • Avg., Max. Fitness versus Generation • Tree size versus Generation
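To produce these graphs, the per-generation statistics have to be recorded during the run. The sketch below shows one way to do that in Python, assuming trees are stored as nested tuples and that higher fitness is better. The names are illustrative, and a GP library will usually expose its own statistics hooks instead.

```python
def tree_size(tree):
    """Number of nodes in a nested-tuple tree: (function, child, ...) or a terminal."""
    if not isinstance(tree, tuple):
        return 1
    return 1 + sum(tree_size(child) for child in tree[1:])

def generation_stats(population, fitnesses):
    """Summarise one generation for the fitness and tree-size plots."""
    return {
        "avg_fitness": sum(fitnesses) / len(fitnesses),
        "max_fitness": max(fitnesses),
        "avg_size": sum(tree_size(t) for t in population) / len(population),
    }

# One row per generation; the list can then be handed to any plotting tool.
history = []
history.append(generation_stats([("+", "x", 1.0), "x"], [0.5, 1.0]))
```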

  18. Report Contents (3/3) • Basic experiments • Changing the crossover and mutation parameters • Various function sets: arithmetic, numerical • Optional experiments • Various selection methods • Depth limitation • Population size, number of generations • Comparison to Neural Network • … • References

  19. Submission Guide • Due date: Nov. 19 (Wed) 18:00 • Submit both ‘hardcopy’ and ‘email’ • Hardcopy submission to the office (301-417) • E-mail submission to mhkim@bi.snu.ac.kr • Subject: [AI Project1 Report] Student number, Name • Report + your source code with comments + executable file(s) • Length: the report should be at most 12 pages. • We are NOT interested in accuracy or your programming skill, but in your creativity and research ability. • If you are not a C.S. major, you may team up with a C.S. major student (use the class board to find a partner and notify the TA (bhkim@bi.snu.ac.kr) of your team by Nov. 5)

  20. Marking Scheme • 5 points for programming • 5 points for result prediction • 30 points for experiment & analysis • 15 pts for experiments, 15 pts for analysis • 10 points for report • Late work • −10% per day • Maximum 7 days

  21. Q&A

  22. Test Data
