Genetic Programming for Mining DNA Chip Data from Cancer Patients

W.B. Langdon & B.F. Buxton. Genetic Programming and Evolvable Machines, 5 (3): 251-257, September 2004. Presenter: John Dynan.



Why Genetic Programming?

  • Applies the principles of Darwinism to AI

  • Allows natural selection of the fittest models

  • Iterative process that evolves numerous solutions

  • Similar to the biology of genetics

  • Addresses the overfitting issue found in other approaches

  • DNA arrays with limited data sets (<100 tissues)

  • Predictive nature of low-expression genes

  • Disease, treatment and prevention


What is Genetic Programming (GP)?

  • Replicates genetic processes:

    • Crossover (recombination)

    • Duplication

    • Mutation

    • Production

    • Deletion

    • DNA string of Elements (A,C,G,U=T)


GP Cross Over
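
Crossover in GP is usually described as swapping subtrees between two parent programs. The following is a minimal sketch of that idea, not the presenters' or the paper's implementation. It assumes programs are stored as nested tuples of the form (function name, left, right) with gene names as terminals; the helper names `subtrees`, `replace_at`, `crossover` and `size` are hypothetical.

```python
import random

# Assumed representation: a program is either a terminal (a gene name string)
# or a tuple (function_name, left_child, right_child).

def subtrees(tree, path=()):
    """Yield (path, subtree) for every node; a path is a tuple of child indices."""
    yield path, tree
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from subtrees(child, path + (i,))

def replace_at(tree, path, new_subtree):
    """Return a copy of tree with the node at path replaced by new_subtree."""
    if not path:
        return new_subtree
    i = path[0]
    return tree[:i] + (replace_at(tree[i], path[1:], new_subtree),) + tree[i + 1:]

def crossover(parent_a, parent_b, rng):
    """Swap one randomly chosen subtree of parent_a with one of parent_b."""
    path_a, sub_a = rng.choice(list(subtrees(parent_a)))
    path_b, sub_b = rng.choice(list(subtrees(parent_b)))
    child_a = replace_at(parent_a, path_a, sub_b)
    child_b = replace_at(parent_b, path_b, sub_a)
    return child_a, child_b

def size(tree):
    """Number of nodes (functions plus terminals) in a program."""
    return sum(1 for _ in subtrees(tree))

parent_a = ("Add", "U08998_at", ("Mul", "U41737_at", "U08998_at"))
parent_b = ("Sub", ("Div", "U41737_at", "U08998_at"), "U41737_at")
child_a, child_b = crossover(parent_a, parent_b, random.Random(0))
```

Because the two subtrees are exchanged rather than copied, the children together contain exactly as many nodes as the parents did.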



What it is not

  • Clustering K-means

  • Heuristic Combination of fixed Rules

  • Single set of features

  • Sequential learning process for features

  • Optimal solution

  • Controlled Feature Deletion or Addition


History

  • Extension (Koza, Stanford) of Holland's (1975) genetic algorithms work:

    • Structures are programs

    • Syntax Trees

    • Nodes

      • Functions (Mul, Add, Div, Sub, Exp, ...)

      • Terminals (attributes, gene expression, ...)

  • GP is a search for Terminals and Functions


Syntax Tree
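
A syntax tree of functions and terminals can be evaluated recursively. The sketch below is illustrative only: the nested-tuple encoding, the `FUNCTIONS` table, the `evaluate` helper and the protected division are assumptions, and the example tree is invented. The two expression values are the first entries of the U08998_at and U41737_at rows in the data snippet further down.

```python
# Function set, using the names listed on the History slide (Exp omitted for brevity).
FUNCTIONS = {
    "Add": lambda a, b: a + b,
    "Sub": lambda a, b: a - b,
    "Mul": lambda a, b: a * b,
    # Assumption: "protected" division that returns 1.0 when the divisor is 0,
    # a common GP convention so that every tree yields a number.
    "Div": lambda a, b: a / b if b != 0 else 1.0,
}

def evaluate(tree, sample):
    """Evaluate a nested-tuple syntax tree against one sample's expression values."""
    if isinstance(tree, tuple):            # internal node: (function, child, child)
        name, *args = tree
        return FUNCTIONS[name](*(evaluate(arg, sample) for arg in args))
    if isinstance(tree, str):              # terminal: a gene expression attribute
        return sample[tree]
    return float(tree)                     # terminal: a numeric constant

# Hypothetical tree encoding U41737_at + 2 * U08998_at.
tree = ("Add", "U41737_at", ("Mul", 2.0, "U08998_at"))
sample = {"U08998_at": 206.0, "U41737_at": 15.0}
value = evaluate(tree, sample)             # 15.0 + 2.0 * 206.0 = 427.0
```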


µarray Problem

  • Pomeroy Data Set (url)

  • 7129 Gene Expressions

  • 60 Patients

    • 39 Survivors (Cancer Tissues)

    • 21 Terminal (Non Cancer)

  • Compare with K=5 and 8 genes (Pomeroy)


Pomeroy Data Set Snippet

  • Brain_MD_30 Brain_MD_31 Brain_MD_32 Brain_MD_33 Brain_MD_34 Brain_MD_35

  • Brain_MD_36 Brain_MD_37 Brain_MD_38 Brain_MD_39 Brain_MD_40 Brain_MD_41

  • Brain_MD_42 Brain_MD_43 Brain_MD_44 Brain_MD_45 Brain_MD_46 Brain_MD_47

  • Brain_MD_48 Brain_MD_49 Brain_MD_50 Brain_MD_51 Brain_MD_52 Brain_MD_53

  • Brain_MD_54 Brain_MD_55 Brain_MD_56 Brain_MD_57 Brain_MD_58 Brain_MD_59

  • Brain_MD_60

  • U08998_at TAR RNA binding protein (TRBP) mRNA 206.0 55.0 106.0 323.0 209.0 88.0

  • 179.0 -493.0 -40.0 60.0 -200.0 312.0 -26.0 -234.0 127.0 10.0 135.0 -72.0

  • 46.0 -77.0 50.0 375.0 -252.0 -189.0 -112.0 -931.0 193.0 -125.0 -1244.0 -470.0

  • -683.0 -261.0 -18.0 -90.0 -3.0 -57.0 -201.0 50.0 -197.0 -141.0 -353.0 -132.0

  • -408.0 -262.0 20.0 239.0 -232.0 -593.0 -443.0 6.0 -316.0 116.0 -7.0 169.0

  • -260.0 -137.0 17.0 100.0 -954.0 -353.0

  • U41737_at Pancreatic beta cell growth factor (INGAP) mRNA 15.0 -87.0 11.0 173.0 177.0

  • -105.0 35.0 13.0 53.0 8.0 25.0 28.0 21.0 61.0 -8.0 75.0 24.0

  • -135.0 55.0 162.0 139.0 22.0 -89.0 13.0 -177.0 -384.0 45.0 -38.0 -38.0

  • -136.0 -152.0 -42.0 -85.0 -31.0 70.0 -76.0 -74.0 -50.0 29.0 -81.0 145.0

  • 42.0 -79.0 25.0 18.0 -20.0 44.0 -78.0 192.0 -66.0 -73.0 -39.0 57.0

  • -122.0 -90.0 25.0 -10.0 -80.0 -306.0 -3.0

  • 60 2 1

  • # class0 class1

  • 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

  • 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

  • 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


Method

  • Each individual consists of 5 trees (mating pools)

  • N = 60 fold cross-validation generates 60 random models

  • The N = 60 fold run is repeated 10 times

  • 600 predictive patient survival models

  • If Tree(i=1..5) > 0, the GP model is positive (node)

  • Genetic modifications in trees 1 and 2

  • Trees may specialize (tissue)

  • Program fitness (Pos/Neg): accuracy > 0.5
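
The bullets above can be sketched in code under some stated assumptions. With 60 samples, 60 folds means each run holds out exactly one patient, and 10 repeats give the 600 models. The slide does not say how the five tree outputs are combined, so the sketch assumes the model is positive when their sum exceeds 0. Trees are stood in for by plain Python callables, and the names `predict`, `accuracy` and `leave_one_out_runs` are hypothetical.

```python
def predict(trees, sample):
    """Assumed rule: the model is positive when the summed tree outputs exceed 0."""
    return sum(tree(sample) for tree in trees) > 0

def accuracy(trees, samples, labels):
    """Fraction of samples whose predicted class matches the 0/1 label."""
    hits = sum(predict(trees, s) == bool(y) for s, y in zip(samples, labels))
    return hits / len(samples)

def leave_one_out_runs(n_samples=60, repeats=10):
    """Yield (repeat, held_out_index, training_indices) for every GP run."""
    for repeat in range(repeats):
        for held_out in range(n_samples):
            train = [i for i in range(n_samples) if i != held_out]
            yield repeat, held_out, train

runs = list(leave_one_out_runs())          # 60 folds x 10 repeats = 600 runs

# Toy individual with five stand-in trees and two invented samples.
toy_trees = [lambda s: s["g1"], lambda s: -1.0,
             lambda s: 0.0, lambda s: 0.0, lambda s: 0.0]
toy_samples = [{"g1": 5.0}, {"g1": 0.5}]
toy_labels = [1, 0]
toy_accuracy = accuracy(toy_trees, toy_samples, toy_labels)
```

Each run would evolve one model on its 59 training indices and test it on the held-out patient; the evolution step itself is not shown here.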


GP Conditions

  • Terminals (µarray data)

  • Functions (+, -, /, *, exp, <, >, ...)

  • Fitness measurement (data)

  • Program control (loop, time)

  • Termination (generations)



GP 1st/2nd Data Mining

  • 600 GP models

  • 6970 of 7129 Attributes in GP Models

  • 404 Genes in ten or more GP Models

  • 404 Genes were used in 2nd GP run

  • Two genes in more than 100 GP models

    • U08998 – 182 GP Models

    • U41737 – 193 GP Models
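
The gene selection between the first and second GP runs amounts to counting how many models use each gene and keeping those above a threshold. A minimal sketch of that counting step follows. It assumes each model is summarized as the list of gene names it references; the input is toy data, `GENE_X_at` is an invented identifier, and the helper names are hypothetical.

```python
from collections import Counter

def gene_usage(models):
    """Count, for each gene, the number of models that use it at least once."""
    counts = Counter()
    for genes_in_model in models:
        counts.update(set(genes_in_model))   # set(): one count per model
    return counts

def select_genes(counts, min_models=10):
    """Keep genes that appear in at least min_models models."""
    return sorted(gene for gene, n in counts.items() if n >= min_models)

# Toy input: 12 models using both genes, 3 models using one gene plus a rare one.
models = [["U08998_at", "U41737_at"]] * 12 + [["U08998_at", "GENE_X_at"]] * 3
counts = gene_usage(models)
selected = select_genes(counts)              # genes carried into the next run
```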


Gene Biology

  • Genes NOT highly expressed

  • Not found in Pomeroy K-means cluster analysis

  • U08998_at

    • TAR RNA binding protein – promotes cancer

    • TARBP1 GeneCard

  • U41737_at

    • Pancreatic beta cell growth

    • REG3A GeneCard


Gene Frequency 2nd GP


Final GP

  • Limited number of functions

  • Single IF statements (<, >, ≥, ≤)

  • Random generation of functions and genes

  • N = 60 fold, repeated 10 times: accuracy = 68%

  • 147 of 192 were incorrect predictors

  • 39 of 192 were correct two-gene predictors



Two Gene Outcome

  • Survived / Predicted Correct – TP

  • Failed Treatment / Predicted Wrong – FP

  • Survived / Predicted Wrong – FN

  • Failed Treatment / Predicted Correct – TN

  • Darkened points are poor predictors

  • GP Model predictor:

  • -42 < U41737_at + 2*U08998_at
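
The two-gene predictor and the four outcome categories can be written out as a short sketch. The legend above treats "survived" as the positive class; that the inequality being true means a positive (survival) prediction is an assumption, since the slide does not state the direction. The expression values and outcomes below are invented toy data, and the function names are hypothetical.

```python
def two_gene_rule(u41737, u08998):
    """Rule from the slide; True is ASSUMED to mean 'predicted to survive'."""
    return -42 < u41737 + 2 * u08998

def confusion(predictions, survived):
    """Tally TP/FP/FN/TN with 'survived' as the positive class."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for predicted, truth in zip(predictions, survived):
        if truth:
            counts["TP" if predicted else "FN"] += 1
        else:
            counts["FP" if predicted else "TN"] += 1
    return counts

# Toy (U41737_at, U08998_at) pairs and toy outcomes, for illustration only.
expression = [(15.0, 206.0), (-384.0, -931.0), (10.0, -30.0), (0.0, 0.0)]
survived = [True, False, True, False]
predictions = [two_gene_rule(a, b) for a, b in expression]
outcome = confusion(predictions, survived)
```

For the first pair the rule evaluates -42 < 15 + 2 * 206 = 427, which holds, so that toy patient is predicted positive.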


Limitations

  • Extensive computer resources (exponential)

  • NP solution

  • Only a heuristic solution, not a guaranteed optimum

  • Repeating the random selection process with different genetic evolutionary change rates can give different results


Bioinformatics

  • Allows the selection of low-expression genes into the predictive model

  • New information can be harvested by repeating execution of GP

  • The 5 tree members can be isolated members of different organ tissues

  • Disease treatment, prediction and cure

