

Genetic Programming for Mining DNA Chip data from Cancer Patients

W.B. Langdon & B.F. Buxton

Genetic Programming and Evolvable Machines, 5(3): 251-257, September 2004

Presenter

John Dynan


Why Genetic Programming?

  • Applies the principles of Darwinism to AI

  • Allows natural selection of the fittest models

  • Iterative process that evolves numerous solutions

  • Similar to the biology of genetics

  • Addresses the overfitting issue found in other approaches

  • DNA arrays with limited data sets (<100 tissues)

  • Predictive nature of low-expression genes

  • Disease treatment and prevention


What is Genetic Programming (GP)?

  • Replicates Genetic Process:

    • Crossover(recombination)

    • Duplication

    • Mutation

    • Production

    • Deletion

    • DNA string of elements (A, C, G, U=T)
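
A minimal sketch of how two of the operators listed above (crossover and mutation) might act on evolved programs. It assumes programs are stored as nested Python tuples `(function, left, right)` with gene names or constants as leaves. The representation, the helper names (`subtrees`, `replace_at`, `crossover`, `mutate`) and the use of two gene IDs as example terminals are illustrative assumptions, not the authors' implementation.

```python
import random

# A program is a nested tuple (function, left, right); leaves are terminals.
# Hypothetical representation chosen for illustration only.

def subtrees(tree, path=()):
    """Yield (path, subtree) for every node; a path is a tuple of child indices."""
    yield path, tree
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from subtrees(child, path + (i,))

def replace_at(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace_at(tree[i], path[1:], new),) + tree[i + 1:]

def crossover(a, b, rng):
    """Subtree crossover: graft a random subtree of b onto a random point of a."""
    path_a, _ = rng.choice(list(subtrees(a)))
    _, sub_b = rng.choice(list(subtrees(b)))
    return replace_at(a, path_a, sub_b)

def mutate(tree, terminals, rng):
    """Point mutation: replace one randomly chosen leaf with a random terminal."""
    leaves = [p for p, t in subtrees(tree) if not isinstance(t, tuple)]
    return replace_at(tree, rng.choice(leaves), rng.choice(terminals))

rng = random.Random(0)
parent_a = ("add", "U41737_at", ("mul", 2.0, "U08998_at"))
parent_b = ("sub", "U08998_at", 42.0)
child = crossover(parent_a, parent_b, rng)
mutant = mutate(parent_a, ["U41737_at", "U08998_at", 1.0], rng)
```

Both operators return new trees and leave the parents untouched, which keeps a population of parents reusable across several matings.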


GP Crossover


Biological Genetic Crossover


What it is not

  • K-means clustering

  • Heuristic Combination of fixed Rules

  • Single set of features

  • Sequential learning process for features

  • Optimal solution

  • Controlled Feature Deletion or Addition


History

  • Extension (at Stanford) of Holland's (1975) genetic algorithms work:

    • Structures are programs

    • Syntax trees

    • Nodes

      • Functions (Mul, Add, Div, Sub, Exp, ...)

      • Terminals (attributes, gene expression, ...)

  • GP is a search over terminals and functions
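
To make the function/terminal split concrete, here is a small sketch of evaluating a syntax tree against one sample. The tuple encoding, the `evaluate` helper, the protected division and the placeholder gene names are assumptions made for illustration. Only the four arithmetic functions are included.

```python
import operator

def protected_div(x, y):
    """Division that returns 1.0 when the divisor is zero (a common GP convention)."""
    return x / y if y != 0 else 1.0

# Function set: internal nodes of the tree.
FUNCTIONS = {
    "Add": operator.add,
    "Sub": operator.sub,
    "Mul": operator.mul,
    "Div": protected_div,
}

def evaluate(node, sample):
    """Recursively evaluate a syntax tree against one sample.

    node   -- a number (constant), a string (terminal: gene name), or a
              tuple (function_name, child, child)
    sample -- dict mapping gene name to its expression value
    """
    if isinstance(node, tuple):
        name, *children = node
        return FUNCTIONS[name](*(evaluate(c, sample) for c in children))
    if isinstance(node, str):
        return sample[node]
    return float(node)

# Tree for: gene_a + 2 * gene_b  (gene names are placeholders)
tree = ("Add", "gene_a", ("Mul", 2, "gene_b"))
value = evaluate(tree, {"gene_a": 10.0, "gene_b": -3.0})
print(value)  # 4.0
```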


Syntax Tree


µarray Problem

  • Pomeroy Data Set (url)

  • 7129 Gene Expressions

  • 60 Patients

    • 39 Survivors (Cancer Tissues)

    • 21 Terminal (Non Cancer)

  • Compare with Pomeroy: K = 5 and 8 genes


Pomeroy Data Set Snippet

  • Brain_MD_30 Brain_MD_31 Brain_MD_32 Brain_MD_33 Brain_MD_34 Brain_MD_35

  • Brain_MD_36 Brain_MD_37 Brain_MD_38 Brain_MD_39 Brain_MD_40 Brain_MD_41

  • Brain_MD_42 Brain_MD_43 Brain_MD_44 Brain_MD_45 Brain_MD_46 Brain_MD_47

  • Brain_MD_48 Brain_MD_49 Brain_MD_50 Brain_MD_51 Brain_MD_52 Brain_MD_53

  • Brain_MD_54 Brain_MD_55 Brain_MD_56 Brain_MD_57 Brain_MD_58 Brain_MD_59

  • Brain_MD_60

  • U08998_at TAR RNA binding protein (TRBP) mRNA 206.0 55.0 106.0 323.0 209.0 88.0

  • 179.0 -493.0 -40.0 60.0 -200.0 312.0 -26.0 -234.0 127.0 10.0 135.0 -72.0

  • 46.0 -77.0 50.0 375.0 -252.0 -189.0 -112.0 -931.0 193.0 -125.0 -1244.0 -470.0

  • -683.0 -261.0 -18.0 -90.0 -3.0 -57.0 -201.0 50.0 -197.0 -141.0 -353.0 -132.0

  • -408.0 -262.0 20.0 239.0 -232.0 -593.0 -443.0 6.0 -316.0 116.0 -7.0 169.0

  • -260.0 -137.0 17.0 100.0 -954.0 -353.0

  • U41737_at Pancreatic beta cell growth factor (INGAP) mRNA 15.0 -87.0 11.0 173.0 177.0

  • -105.0 35.0 13.0 53.0 8.0 25.0 28.0 21.0 61.0 -8.0 75.0 24.0

  • -135.0 55.0 162.0 139.0 22.0 -89.0 13.0 -177.0 -384.0 45.0 -38.0 -38.0

  • -136.0 -152.0 -42.0 -85.0 -31.0 70.0 -76.0 -74.0 -50.0 29.0 -81.0 145.0

  • 42.0 -79.0 25.0 18.0 -20.0 44.0 -78.0 192.0 -66.0 -73.0 -39.0 57.0

  • -122.0 -90.0 25.0 -10.0 -80.0 -306.0 -3.0

  • 60 2 1

  • # class0 class1

  • 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

  • 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

  • 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


Method

  • Each individual consists of 5 trees (mating pools)

  • 60-fold (N = 60) cross-validation generates 60 random models

  • The 60-fold run is repeated 10 times

  • 600 predictive patient survival models

  • If Tree(i = 1..5) > 0, the GP model predicts positive (node)

  • Genetic modifications in trees 1 and 2

  • Trees may specialize (tissue)

  • Program fitness: (pos/neg) accuracy > 0.5
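
The bookkeeping behind these numbers can be sketched as follows: 60 leave-one-out folds repeated 10 times give 600 models, and a fitness that averages accuracy on positive and negative cases sits at 0.5 for a trivial predictor. Reading "(Pos/Neg) Accuracy" as balanced accuracy is an assumption, as are the function names and the choice of which class counts as positive. The 21/39 label split mirrors the class-label line in the data snippet.

```python
def leave_one_out(n):
    """Yield (train_indices, test_index) for n-fold leave-one-out validation."""
    for held_out in range(n):
        yield [i for i in range(n) if i != held_out], held_out

def balanced_accuracy(labels, predictions):
    """Mean of the accuracy on positive cases and on negative cases."""
    pos = [p for y, p in zip(labels, predictions) if y == 1]
    neg = [p for y, p in zip(labels, predictions) if y == 0]
    tpr = sum(1 for p in pos if p == 1) / len(pos)
    tnr = sum(1 for p in neg if p == 0) / len(neg)
    return (tpr + tnr) / 2

N_PATIENTS, N_REPEATS = 60, 10
n_models = sum(1 for _ in range(N_REPEATS) for _ in leave_one_out(N_PATIENTS))
print(n_models)  # 600

# A constant "always positive" predictor scores exactly 0.5, so a useful
# program must exceed that threshold.
labels = [1] * 21 + [0] * 39
print(balanced_accuracy(labels, [1] * 60))  # 0.5
```

Averaging the two per-class rates, rather than using raw accuracy, keeps the 39-versus-21 class imbalance from rewarding a model that always predicts the majority class.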


GP Conditions

  • Terminals (µarray data)

  • Functions (+, -, /, *, exp, <, >, ...)

  • Fitness measurement (data)

  • Program control (loop, time)

  • Termination (generations)


GP DNA Parameters


GP 1st/2nd Data Mining

  • 600 GP models

  • 6970 of 7129 Attributes in GP Models

  • 404 Genes in ten or more GP Models

  • 404 Genes were used in 2nd GP run

  • Two genes appear in more than 100 GP models

    • U08998 - 182 GP models

    • U41737 - 193 GP models
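
The two-pass selection described above amounts to counting how many evolved models use each gene and keeping those above a cutoff (ten or more models in the slides). A minimal sketch with toy data follows. The helper names and the toy model list are hypothetical, and `gene_x` / `gene_y` are placeholders.

```python
from collections import Counter

def gene_usage(models):
    """Count in how many models each gene appears (each model counted once)."""
    counts = Counter()
    for genes in models:
        counts.update(set(genes))
    return counts

def frequent_genes(models, min_models):
    """Return the genes used by at least min_models models."""
    return {g for g, n in gene_usage(models).items() if n >= min_models}

# Toy data: each inner list holds the genes referenced by one evolved model.
toy_models = [
    ["U08998_at", "U41737_at"],
    ["U08998_at", "gene_x", "gene_x"],
    ["U41737_at", "U08998_at"],
    ["gene_y"],
]
usage = gene_usage(toy_models)
print(usage["U08998_at"])                     # 3
print(sorted(frequent_genes(toy_models, 2)))  # ['U08998_at', 'U41737_at']
```

With the real run, `frequent_genes(models, 10)` would correspond to the reduced attribute set passed to the second GP run.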


Gene Biology

  • Genes NOT highly expressed

  • Not found in Pomeroy's K-means cluster analysis

  • U08998_at

    • TAR RNA binding protein - promotes cancer

    • TARBP1 GeneCard

  • U41737_at

    • Pancreatic beta cell growth factor

    • REG3A GeneCard


Gene Frequency 2nd GP


Final GP

  • Limited number of functions

  • Single IF statements (<, >, ≥, ≤)

  • Random generation of functions and genes

  • 60-fold cross-validation repeated 10 times: accuracy = 68%

  • 147 of 192 were incorrect predictors

  • 39 of 192 were correct two-gene predictors


Two Gene Profile


Two Gene Outcome

  • Survived / predicted correct - TP

  • Failed treatment / predicted wrong - FP

  • Survived / predicted wrong - FN

  • Failed treatment / predicted correct - TN

  • Darkened points are poor predictors

  • GP model predictor:

  • -42 < U41737_at + 2*U08998_at
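
As a worked example, the two-gene rule can be applied to the first eight values of each gene row in the Pomeroy data snippet. This sketch assumes the two rows are column-aligned (the same patient at the same position in both rows). The `predict` helper is illustrative, and the slides do not state which outcome a `True` result stands for.

```python
def predict(u41737, u08998):
    """Evaluate the slide's rule: -42 < U41737_at + 2 * U08998_at."""
    return -42 < u41737 + 2 * u08998

# First eight values of each gene row in the data snippet, assumed to be
# column-aligned (same patient at the same position in both rows).
U08998 = [206.0, 55.0, 106.0, 323.0, 209.0, 88.0, 179.0, -493.0]
U41737 = [15.0, -87.0, 11.0, 173.0, 177.0, -105.0, 35.0, 13.0]

predictions = [predict(b, a) for a, b in zip(U08998, U41737)]
print(predictions)
# [True, True, True, True, True, True, True, False]
```

For the eighth patient the weighted sum is 13.0 + 2 * (-493.0) = -973.0, which falls below the -42 threshold, so the rule returns `False`.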


Limitations

  • Extensive computer resources (exponential)

  • NP solution

  • Only a heuristic (not provably optimal) solution

  • Replicating the random selection process with different genetic change rates can produce different results


Bioinformatics

  • Allows the selection of low-expression genes into the predictive model

  • New information can be harvested by repeating execution of the GP

  • The 5 tree members can be isolated members of different organ tissues

  • Disease treatment, prediction and cure



