Artificial Intelligence Project 2: Classification Using Genetic Programming 2008. 10. 27 Kim, MinHyeok mhkim@bi.snu.ac.kr Biointelligence Laboratory
Contents • Project outline • Description on the data set • Genetic Programming • Brief overview • Fitness function & Selection methods • Classification with GP (in this project) • Guide to writing reports • Style & contents • Submission guide / Marking scheme (C) 2008, SNU Biointelligence Laboratory
Outline • Goal • Gain a deeper understanding of Genetic Programming (GP) • Practice researching and writing a paper • Forest Fires problem (classification) • Predict whether a fire occurs or not • Using Genetic Programming • Estimating several statistics on the dataset • Data set • Variation of the ‘Forest Fires data set’ • http://archive.ics.uci.edu/ml/datasets/Forest+Fires
Forest Fires Data Set • Description • Database of 517 samples • At most 500 samples may be used for training • 17 samples for prediction • 12 input attributes plus a class label • X, Y, month, day, FFMC, DMC, DC, ISI, temp, RH, wind, rain, label • Integer or real values • Label (Class) • Two classes • 0: a fire does not occur • 1: a fire occurs
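As a rough illustration of the 500/17 split, the sketch below parses rows and separates training samples from the held-out ones. The slides do not specify the file layout, so comma-separated numeric rows in the column order listed above, an optional header line, and "the last 17 rows are held out" are all assumptions; `load_samples` and `split` are hypothetical helper names.

```python
import csv

# Column order as listed on the slide; the last column is the class label.
COLUMNS = ["X", "Y", "month", "day", "FFMC", "DMC", "DC", "ISI",
           "temp", "RH", "wind", "rain", "label"]

def load_samples(lines):
    """Parse an iterable of comma-separated lines into (features, label) pairs.

    Assumes every attribute is already numeric, as the slide states.
    """
    samples = []
    for row in csv.reader(lines):
        if not row or row[0].strip() == "X":  # skip blank lines and a header
            continue
        values = [float(v) for v in row]
        features = dict(zip(COLUMNS[:-1], values[:-1]))
        samples.append((features, int(values[-1])))
    return samples

def split(samples, n_train=500):
    """Use the first n_train samples for training and hold out the rest."""
    return samples[:n_train], samples[n_train:]
```

With a file on disk, `load_samples(open(path))` would work the same way, since an open text file is an iterable of lines.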
Brief Summary of GP • A kind of evolutionary algorithm • Each individual (program) is represented as a tree • You need to set up the following elements for a GP run • The set of terminals (input attributes, the class variable, constants) • The set of functions (numerical / condition operators) • The fitness measure • The algorithm parameters • population size, maximum number of generations • crossover rate and mutation rate • maximum depth of GP trees, etc. • The method for designating a result and the criterion for terminating a run
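To make the tree representation concrete, here is a minimal sketch of a function set, a terminal set, and a recursive evaluator. The nested-tuple encoding, the choice of operators, the protected division, and every value in `PARAMS` are illustrative assumptions, not settings prescribed by the project.

```python
import operator

def protected_div(a, b):
    """Division that returns 1.0 instead of failing when b is (near) zero."""
    return a / b if abs(b) > 1e-9 else 1.0

# Function set: operator name -> two-argument implementation (assumed choice).
FUNCTIONS = {"+": operator.add, "-": operator.sub,
             "*": operator.mul, "/": protected_div}

# Terminal set: a few input attributes; numeric constants may also be leaves.
TERMINALS = ["FFMC", "ISI", "temp", "RH", "wind", "rain"]

# Algorithm parameters; the numbers are arbitrary placeholders.
PARAMS = {"population_size": 200, "max_generations": 50,
          "crossover_rate": 0.9, "mutation_rate": 0.05, "max_depth": 6}

def evaluate(tree, sample):
    """Evaluate a tree given as (op, child, child), a variable name, or a constant."""
    if isinstance(tree, tuple):
        op, *children = tree
        return FUNCTIONS[op](*(evaluate(child, sample) for child in children))
    if isinstance(tree, str):
        return sample[tree]      # look up the attribute value
    return tree                  # numeric constant

def depth(tree):
    """Depth of a tree; a single terminal has depth 0."""
    if isinstance(tree, tuple):
        return 1 + max(depth(child) for child in tree[1:])
    return 0
```

For example, `("+", "FFMC", ("*", "wind", 2.0))` encodes FFMC + wind × 2 and has depth 2.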
GP Flowchart • [Figure: flowchart contrasting the GA loop with the GP loop]
Initialization • The maximum initial tree depth Dmax is set. • Full method (each branch has depth = Dmax): • nodes at depth d < Dmax are chosen randomly from the function set F • nodes at depth d = Dmax are chosen randomly from the terminal set T • Grow method (each branch has depth ≤ Dmax): • nodes at depth d < Dmax are chosen randomly from F ∪ T • nodes at depth d = Dmax are chosen randomly from T • Common GP initialization: ramped half-and-half, where the grow and full methods each deliver half of the initial population
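The full and grow methods can be sketched as below, using nested tuples for trees. The small function and terminal sets are placeholders, and cycling the depth limit from 1 to Dmax in `ramped_half_and_half` is one common way to "ramp"; the slide only states that each method supplies half of the population.

```python
import random

FUNCTIONS = {"+": 2, "-": 2, "*": 2}            # name -> arity (assumed set F)
TERMINALS = ["FFMC", "ISI", "RH", "wind", "rain"]  # assumed set T

def full(d_max, rng, d=0):
    """Full method: functions above depth d_max, terminals exactly at d_max."""
    if d == d_max:
        return rng.choice(TERMINALS)
    name = rng.choice(list(FUNCTIONS))
    return (name,) + tuple(full(d_max, rng, d + 1) for _ in range(FUNCTIONS[name]))

def grow(d_max, rng, d=0):
    """Grow method: pick from F ∪ T above depth d_max, from T at d_max."""
    if d == d_max:
        return rng.choice(TERMINALS)
    choice = rng.choice(list(FUNCTIONS) + TERMINALS)
    if choice in FUNCTIONS:
        return (choice,) + tuple(grow(d_max, rng, d + 1)
                                 for _ in range(FUNCTIONS[choice]))
    return choice

def depth(tree):
    """Depth of a tree; a single terminal has depth 0."""
    if isinstance(tree, tuple):
        return 1 + max(depth(child) for child in tree[1:])
    return 0

def ramped_half_and_half(pop_size, d_max, rng):
    """Alternate full and grow while cycling the depth limit over 1..d_max."""
    population = []
    for i in range(pop_size):
        limit = 1 + (i // 2) % d_max
        method = full if i % 2 == 0 else grow
        population.append(method(limit, rng))
    return population
```

Passing an explicit `random.Random(seed)` as `rng` keeps runs reproducible, which helps when comparing parameter settings in the report.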
Fitness Functions • Relative squared error • The number of outputs that are within a given percentage of the correct value • You can also try other well-defined fitness functions suited to the problem
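The two measures might look like the sketch below. The exact formulas from the original slide are not available here, so this uses one common definition of relative squared error (squared error divided by the squared error of always predicting the mean target) and treats the tolerance as a free parameter `tolerance_pct`; both are assumptions. Note that a lower relative squared error is better, while a higher hit count is better.

```python
def relative_squared_error(predicted, target):
    """Sum of squared errors, normalized by the error of predicting the mean."""
    mean_t = sum(target) / len(target)
    numerator = sum((p - t) ** 2 for p, t in zip(predicted, target))
    denominator = sum((t - mean_t) ** 2 for t in target)
    if denominator == 0:
        raise ValueError("all targets are identical; the measure is undefined")
    return numerator / denominator

def hits(predicted, target, tolerance_pct):
    """Count outputs within tolerance_pct percent of the correct value."""
    return sum(1 for p, t in zip(predicted, target)
               if abs(p - t) <= abs(t) * tolerance_pct / 100.0)
```

With 0/1 class labels, a percentage tolerance around a target of 0 collapses to an exact match, so the hit count is more natural for real-valued targets than for the labels in this project.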
Selection methods (1/2) • Fitness proportional (roulette wheel) selection • The roulette wheel can be constructed as follows. • Calculate the total fitness for the population. • Calculate the selection probability p_k for each chromosome v_k. • Calculate the cumulative probability q_k for each chromosome v_k.
Procedure: Proportional_Selection • Generate a random number r from the range [0,1]. • If r ≤ q_1, then select the first chromosome v_1; else, select the kth chromosome v_k (2 ≤ k ≤ pop_size) such that q_{k-1} < r ≤ q_k.
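The wheel construction and the selection procedure can be sketched as follows. This assumes non-negative fitness values where larger is better and a positive total; `build_wheel`, `spin`, and `proportional_select` are hypothetical names, and indices are 0-based rather than 1-based as on the slide.

```python
import random
from bisect import bisect_left
from itertools import accumulate

def build_wheel(fitnesses):
    """Return the cumulative probabilities q_k for the given fitness values."""
    total = sum(fitnesses)                      # total fitness
    probs = [f / total for f in fitnesses]      # selection probabilities p_k
    cumulative = list(accumulate(probs))        # cumulative probabilities q_k
    cumulative[-1] = 1.0                        # guard against rounding error
    return cumulative

def spin(cumulative, r):
    """Return the 0-based index k with q_{k-1} < r <= q_k."""
    return bisect_left(cumulative, r)

def proportional_select(population, fitnesses, rng):
    """Pick one individual with probability proportional to its fitness."""
    return population[spin(build_wheel(fitnesses), rng.random())]
```

`bisect_left` finds the first cumulative value that is at least r, which is exactly the condition in the procedure. If the fitness measure is an error to be minimized, it has to be transformed first, for example by inverting it, before it can be used on the wheel.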
Selection methods (2/2) • Tournament selection • Tournament size q • Ranking-based selection • 2 ≤ … ≤ POP_SIZE • 1 ≤ η⁺ ≤ 2 and η⁻ = 2 − η⁺ • Elitism • To preserve n good solutions until the next generation
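Tournament selection and elitism are short enough to sketch directly. The version below assumes that larger fitness is better and that tournament contenders are drawn without replacement; both choices, and the function names, are assumptions rather than requirements of the project.

```python
import random

def tournament_select(population, fitnesses, q, rng):
    """Draw q distinct individuals at random and return the fittest of them."""
    contenders = rng.sample(range(len(population)), q)
    best = max(contenders, key=lambda i: fitnesses[i])
    return population[best]

def elites(population, fitnesses, n):
    """Return the n fittest individuals, to be copied unchanged to the next generation."""
    ranked = sorted(range(len(population)),
                    key=lambda i: fitnesses[i], reverse=True)
    return [population[i] for i in ranked[:n]]
```

A larger tournament size q increases selection pressure: with q equal to the population size the best individual always wins, while q = 1 reduces to uniform random selection.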
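Classification with GP (in this project) • Function regression • Search for a function f(x) s.t. • f(x) ≥ threshold t when y = 1 • f(x) < threshold t when y = 0 • Converting to a Boolean value • [Figure: example trees; an IF node over the test f(x) > t with outcomes 1 and 0, and a Boolean expression tree using ∧, ¬, ∨, >, <, =, + over rain, RH, wind, FFMC, ISI and the constants 0 and 50]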
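The thresholding step can be sketched as below: an evolved function f is turned into a classifier by comparing its output with t, and accuracy is the fraction of samples classified correctly. The default threshold of 0 and the stand-in function in the usage note are illustrative assumptions; in the project, f would be the evolved GP tree.

```python
def classify(f_value, t=0.0):
    """Map a real-valued output to a class: 1 if f(x) >= t, otherwise 0."""
    return 1 if f_value >= t else 0

def accuracy(f, samples, t=0.0):
    """Fraction of (x, y) samples for which the thresholded output equals y."""
    correct = sum(1 for x, y in samples if classify(f(x), t) == y)
    return correct / len(samples)
```

For instance, with the made-up function `f = lambda x: x["temp"] - 0.3 * x["RH"]`, calling `accuracy(f, samples)` on a list of `(attribute_dict, label)` pairs returns the share of correct predictions, which could also serve directly as a fitness measure.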
What to do for the experiment? • Select a library that implements GP • You can find various libraries written in C++/Java/Matlab • See the list of recommended libraries on the next page • Write your own code for the experiment • Check the sample code and tutorials of the libraries for a quick start • Add comments to explain the flow of your program • Caution • Running GP may take a long time
Recommended Libraries for GP • C++ • GPLib: http://www.cs.bham.ac.uk/~cmf/GPLib/index.html • Java • JGAP: http://jgap.sourceforge.net/ • ECJ: http://cs.gmu.edu/~eclab/projects/ecj/ • Matlab toolbox • GPLAB: http://gplab.sourceforge.net/ • More References • Implementations section in Wiki – Genetic Programming: http://en.wikipedia.org/wiki/Genetic_programming
Report Style • English only! • Scientific journal style • ‘How to Write a Paper in Scientific Journal Style and Format’ • http://abacus.bates.edu/~ganderso/biology/resources/writing/HTWsections.html
Report Contents (1/3) • System description • Programming language and runtime environment used • Result tables • Analysis & discussion (very important!)
Report Contents (2/3) • Graphs • Average and maximum fitness versus generation • Tree size versus generation
Report Contents (3/3) • Basic experiments • Varying the crossover and mutation parameters • Various function sets: arithmetic, numerical • Optional experiments • Various selection methods • Depth limitation • Population size, number of generations • Comparison with a neural network • … • References
Submission Guide • Due date: Nov. 19 (Wed) 18:00 • Submit both a hard copy and an e-mail • Hard-copy submission to the office (301-417) • E-mail submission to mhkim@bi.snu.ac.kr • Subject: [AI Project1 Report] Student number, Name • Report + your source code with comments + executable file(s) • Length: the report should be at most 12 pages. • We are NOT interested in accuracy or programming skill, but in your creativity and research ability. • If your major is not C.S., a team project with a C.S. major student is possible (use the class board to find a partner, and notify the TA (bhkim@bi.snu.ac.kr) of your team by Nov. 5)
Marking Scheme • 5 points for programming • 5 points for result prediction • 30 points for experiments & analysis • 15 pts for experiments, 15 pts for analysis • 10 points for report • Late work • −10% per day • Maximum 7 days
QnA