1 / 18

with Gene Expression Programming

Statistical Learning for Event Selection. with Gene Expression Programming. Outline. Introduction to evolutionary computation Gene Expression Programming (GEP) Application of GEP for Event Selection Conclusion. Evolutionary Computation.

dalia
Download Presentation

with Gene Expression Programming

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Statistical Learning for Event Selection with Gene Expression Programming

  2. Outline • Introduction to evolutionary computation • Gene Expression Programming (GEP) • Application of GEP for Event Selection • Conclusion Liliana Teodorescu

  3. Evolutionary Computation • Evolutionary computationsimulates thenatural evolutionon a computer process leading to maintenance or increase of a population ability to survive and reproduce in a specific environment quantitatively measured by evolutionary fitness • Goal of natural evolution - to generate a population of individuals with increasing fitness • Goal of evolutionary computation - to generate a set of solutions (to a problem) of increasing quality Liliana Teodorescu

  4. Terminology • Individual– candidate solution to a problem decoding encoding • Chromosome– representation of the candidate solution • Gene– constituent entity of the chromosome • Population– set of individuals/chromosomes • Fitness function – representation of how good a candidate solution is • Genetic operators– operators applied on chromosomes in order to create genetic variation (other chromosomes) Liliana Teodorescu

  5. Evolutionary Algorithms Natural evolution simulation -core of theevolutionary algorithms: optimisation algorithms(iteratively improve the quality of the solutions until anoptimal/feasible solutionis found) Basic evolutionary algorithm Run Start Initial population creation (randomly) • Problem definition • Encoding of the candidate solution • Fitness definition • Run • Decoding the best fitted chromosome = solution Fitness evaluation (of each chromosome) yes Terminate? Stop New generation no Selection of individuals (proportional with fitness) Reproduction (genetic operators) Replacement of the current population with the new one Liliana Teodorescu

  6. Classes of Evolutionary Algorithms • Genetic Algorithms (GA) (J. H. Holland, 1975) • Genetic Programming (GP) (J. R. Koza, 1992) • Gene Expression Programming (GEP)(C. Ferreira, 2001) Main differences • Encoding method • Reproduction method Liliana Teodorescu

  7. Genetic Algorithms • Encoding • Chromosome- fixed-length binary string (common technique) • Gene - each bit of the string chromosome genes 1 0 0 1 1 0 1 1 • Reproduction Recombination (crossover) – exchanges parts of two chromosomes (usual rate 0.7) Point choosen randomly 0 1 1 0 0 1 1 1 1 1 Mutation – changes the gene value (usual rate 0.001-0.0001) Point choosen randomly 1 0 0 1 1 1 0 0 1 0 Liliana Teodorescu

  8. Genetic Programming GP search for the computer program to solve the problem, not for the solution to the problem. Computer program - any computing language (in principle) - LISP (List Processor) (in practice) LISP- highly symbol-oriented Graphical representation of S-expression Mathematical expression S-expression functions (+,*) and terminals (a,b,c) - (-(*ab)c) a*b-c c * a b • Encoding Chromosome:S-expression - variable length => more flexibility - sintax constraints => invalid expressions produced in the evolution process must be eliminated => waste of CPU • Reproduction Recombination (crossover) and Mutation(usualy) Liliana Teodorescu

  9. Gene Expression Programming • search for the computer program that solve the problem (as GP) • works with two entities: chromosomesand expression trees Encoding Candidate solutionrepresented by an expression tree (ET) (similar with GP tree) ET encoded in a chromosome: read ET from left to right and from top to bottom Q * Q*-+abcd Q means sqrt + - a b c d • Decoding the chromosome(translates the chromosome in an ET) • first line of ET (root) – first element of the chromosome • next line of ET – as many arguments needed by the element in • the previous line Liliana Teodorescu

  10. Gene Expression Programming (cont.) Chromosome– has one or more genes of equal length Gene– head:contains both functions and terminals (length h) - tail: contains only terminals (length t) n – number of arguments of the function with the highest number of arguments t=h(n-1)+1 e.g. set of functions: Q,*,/,-,+ set of terminals: a,b n=2; h=15 (choosen) =>t =16 => length of gene=15+16=31 * + b - a a Q *b+a-aQab+//+b+babbabbbababbaaa a ET ends before the end of the gene! Liliana Teodorescu

  11. Gene Expression Programming (cont.) Reproduction Genetic operatorsapplied on chromosomsnot on ET => always produce sintactically correct structures! • Recombination • Mutation • Transposition – a part of the chromosome moved to another part of the same chromosome e.g. Mutation: Q replaced with * *b+a-a*ab+//+b+babbabbbababbaaa *b+a-aQab+//+b+babbabbbababbaaa * * + + b b - - a a a Q a * a b a Liliana Teodorescu

  12. GEP in HEP This is the first try! • GEP for event selection • cuts/selection criteria finding • classification problem (signal/background classification) • statistical learning approach • Data samples: • Monte-Carlo simulation from BaBar experiment • Ks production in e+e- (~10 GeV) • 4000-5000 training events (for classification rule extraction) • 4000-5000 test events (others than training events) • limitations imposed by APS 3.0 • Software resources • APS 3.0 (Automatic Problem Solver) - commercial package (Windows based) - www.gepsoft.com - function finding, classification, time series analysis Liliana Teodorescu

  13. Input Parameters Functions and constants to be used in the classification rules (cuts type rule) - AND, AND1 (x>=0 and y>=0 => 1 else 0), AND2 (x<0 and y<0 => 1 else 0) - <, >, <=, >=, =, != - floating point constants (-10,10) Data– variables usually used in a cut based analysis for selection - doca (distance of closest approach) - RXY, |RZ| (region around interaction point) - |cos(hel)| - SFL (Signed Flight Length) - Fsig (Flight Significance) - Pchi (2 probability of the vertex) - Mass (KS reconstructed mass) GEP parameters - fitness function: number of hits (events correctly classified) - gene length (head = 1-5) - no. of chromosomes per generation: 100 - no. of generations per run: 500-2000 - genetic operators rates: mutation 0.044, inversion 0.1, transposition 0.3, recombination 0.1 Liliana Teodorescu

  14. Analysis 5 Variables - doca (distance of closest approach) - RXY, |RZ| (region around interaction point) - |cos(hel)| - SFL (Signed Flight Length) Training sample Total = 3985 events S = 3390 events B = 595 events No. of genes = 1, Head length =2 Liliana Teodorescu

  15. Standard cut analysis Liliana Teodorescu

  16. Complex chromosomes Liliana Teodorescu

  17. Performance Parameters ~92% ~8% Liliana Teodorescu

  18. Conclusions GEP allows • fast identification of powerful cuts • signal/background separation of ~90% accuracy for both low and high background samples • Can it go beyond that? • Can it develop more complex selection criteria? • GEP • is still in the R&D phase • needs software development -> underway Liliana Teodorescu

More Related