Genetic Programming for Mining DNA Chip Data from Cancer Patients

W.B. Langdon & B.F. Buxton

Genetic Programming and Evolvable Machines, 5 (3): 251-257, September 2004

Presenter: John Dynan



Why Genetic Programming?

  • Applies the principles of Darwinism to AI

  • Allows natural selection of the fittest models

  • Iterative process that evolves numerous solutions

  • Similar to the biology of genetics

  • Addresses the overfitting issue found in other approaches

  • DNA arrays with limited data sets (<100 tissues)

  • Predictive nature of low-expression genes

  • Disease treatment and prevention


What is Genetic Programming (GP)?

  • Replicates genetic processes:

    • Crossover (recombination)

    • Duplication

    • Mutation

    • Production

    • Deletion

    • DNA string of elements (A, C, G, T; U replaces T in RNA)


GP Crossover
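The crossover diagram from this slide did not survive extraction. As a stand-in, here is a minimal sketch of subtree crossover, assuming programs are stored as nested Python lists of the form `[function, child, child]` with strings as terminals. The representation, the helper names, and the terminals `g1`, `g2`, `g3` are illustrative assumptions, not the authors' implementation.

```python
import copy
import random

def subtree_paths(tree, path=()):
    """Return the path (tuple of child indices) to every node in the tree."""
    paths = [path]
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):
            paths.extend(subtree_paths(child, path + (i,)))
    return paths

def get_subtree(tree, path):
    """Follow a path of child indices down from the root."""
    for i in path:
        tree = tree[i]
    return tree

def replace_subtree(tree, path, new_subtree):
    """Return a copy of tree with the node at path replaced by new_subtree."""
    if not path:
        return copy.deepcopy(new_subtree)
    result = copy.deepcopy(tree)
    node = result
    for i in path[:-1]:
        node = node[i]
    node[path[-1]] = copy.deepcopy(new_subtree)
    return result

def crossover(parent_a, parent_b, rng):
    """Swap one randomly chosen subtree between two parents; parents are not modified."""
    path_a = rng.choice(subtree_paths(parent_a))
    path_b = rng.choice(subtree_paths(parent_b))
    sub_a = get_subtree(parent_a, path_a)
    sub_b = get_subtree(parent_b, path_b)
    child_a = replace_subtree(parent_a, path_a, sub_b)
    child_b = replace_subtree(parent_b, path_b, sub_a)
    return child_a, child_b

parent_a = ["Add", "U41737_at", ["Mul", "2", "U08998_at"]]
parent_b = ["Sub", ["Div", "g1", "g2"], "g3"]  # g1..g3 are hypothetical terminals
child_a, child_b = crossover(parent_a, parent_b, random.Random(0))
```

Because the two subtrees are exchanged rather than copied, the total number of nodes across both children equals the total across both parents.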



What it is not

  • K-means clustering

  • Heuristic combination of fixed rules

  • Single set of features

  • Sequential learning process for features

  • Optimal solution

  • Controlled feature deletion or addition


History

  • Extension (Koza, Stanford) of Holland's (1975) genetic algorithms work:

    • Structures are programs

    • Syntax trees

    • Nodes

      • Functions (Mul, Add, Div, Sub, Exp, ...)

      • Terminals (attributes, gene expression, ...)

  • GP is a search over terminals and functions


Syntax Tree
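The tree figure from this slide is missing. The sketch below shows one way a syntax tree of functions and terminals can be represented and evaluated, assuming nested tuples for function nodes, strings for gene terminals, and numbers for constants. The protected division is a common GP convention assumed here, not something the slides state. The two expression values are the first column of the data snippet in this deck.

```python
import operator

def protected_div(a, b):
    # Assumed convention: return 1.0 instead of raising on division by zero.
    return a / b if b != 0 else 1.0

FUNCTIONS = {
    "Add": operator.add,
    "Sub": operator.sub,
    "Mul": operator.mul,
    "Div": protected_div,
}

def evaluate(node, expression):
    """Recursively evaluate a syntax tree against one patient's expression values."""
    if isinstance(node, tuple):          # function node: (name, left, right)
        name, left, right = node
        return FUNCTIONS[name](evaluate(left, expression),
                               evaluate(right, expression))
    if isinstance(node, str):            # terminal: look up a gene's expression
        return expression[node]
    return float(node)                   # terminal: numeric constant

tree = ("Add", "U41737_at", ("Mul", 2.0, "U08998_at"))
patient = {"U08998_at": 206.0, "U41737_at": 15.0}
value = evaluate(tree, patient)  # 15.0 + 2.0 * 206.0 = 427.0
```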


µarray Problem

  • Pomeroy Data Set (url)

  • 7129 gene expressions

  • 60 patients

    • 39 survivors

    • 21 terminal (treatment failures)

  • Compare with Pomeroy's result (K = 5, 8 genes)


Pomeroy Data Set Snippet

  • Sample columns (Brain_MD_30 to Brain_MD_60 shown):

    Brain_MD_30 Brain_MD_31 Brain_MD_32 Brain_MD_33 Brain_MD_34 Brain_MD_35
    Brain_MD_36 Brain_MD_37 Brain_MD_38 Brain_MD_39 Brain_MD_40 Brain_MD_41
    Brain_MD_42 Brain_MD_43 Brain_MD_44 Brain_MD_45 Brain_MD_46 Brain_MD_47
    Brain_MD_48 Brain_MD_49 Brain_MD_50 Brain_MD_51 Brain_MD_52 Brain_MD_53
    Brain_MD_54 Brain_MD_55 Brain_MD_56 Brain_MD_57 Brain_MD_58 Brain_MD_59
    Brain_MD_60

  • U08998_at TAR RNA binding protein (TRBP) mRNA (60 values):

    206.0 55.0 106.0 323.0 209.0 88.0
    179.0 -493.0 -40.0 60.0 -200.0 312.0 -26.0 -234.0 127.0 10.0 135.0 -72.0
    46.0 -77.0 50.0 375.0 -252.0 -189.0 -112.0 -931.0 193.0 -125.0 -1244.0 -470.0
    -683.0 -261.0 -18.0 -90.0 -3.0 -57.0 -201.0 50.0 -197.0 -141.0 -353.0 -132.0
    -408.0 -262.0 20.0 239.0 -232.0 -593.0 -443.0 6.0 -316.0 116.0 -7.0 169.0
    -260.0 -137.0 17.0 100.0 -954.0 -353.0

  • U41737_at Pancreatic beta cell growth factor (INGAP) mRNA (60 values):

    15.0 -87.0 11.0 173.0 177.0
    -105.0 35.0 13.0 53.0 8.0 25.0 28.0 21.0 61.0 -8.0 75.0 24.0
    -135.0 55.0 162.0 139.0 22.0 -89.0 13.0 -177.0 -384.0 45.0 -38.0 -38.0
    -136.0 -152.0 -42.0 -85.0 -31.0 70.0 -76.0 -74.0 -50.0 29.0 -81.0 145.0
    42.0 -79.0 25.0 18.0 -20.0 44.0 -78.0 192.0 -66.0 -73.0 -39.0 57.0
    -122.0 -90.0 25.0 -10.0 -80.0 -306.0 -3.0

  • Class labels (60 samples, 2 classes):

    60 2 1
    # class0 class1
    1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
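The class-label part of the snippet can be read with a few lines of code. This is a minimal sketch that assumes the layout visible in the snippet: a header line `60 2 1` (taken here to mean sample count, class count, and a trailing 1), a `#` line naming the classes, then one label per sample. The function name `parse_cls` is hypothetical, and the label rows are rebuilt from their counts (21 ones, then 21 and 18 zeros) rather than typed out.

```python
from collections import Counter

def parse_cls(text):
    """Parse the class-label text into (class_names, labels), checking the header counts."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    n_samples, n_classes, _ = (int(tok) for tok in lines[0].split())
    class_names = lines[1].lstrip("#").split()
    labels = [int(tok) for line in lines[2:] for tok in line.split()]
    if len(labels) != n_samples:
        raise ValueError(f"expected {n_samples} labels, found {len(labels)}")
    if len(class_names) != n_classes:
        raise ValueError(f"expected {n_classes} class names, found {len(class_names)}")
    return class_names, labels

# Label rows as in the snippet: 21 ones, then 21 + 18 = 39 zeros.
cls_text = "\n".join([
    "60 2 1",
    "# class0 class1",
    " ".join(["1"] * 21),
    " ".join(["0"] * 21),
    " ".join(["0"] * 18),
])

class_names, labels = parse_cls(cls_text)
counts = Counter(labels)
```

The resulting counts, 21 and 39, match the 21/39 split of patients given on the µarray Problem slide; which label corresponds to which outcome is not stated in the snippet.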


Method

  • Each individual consists of 5 trees (mating pools)

  • N = 60-fold cross-validation generates 60 random models

  • The N = 60-fold run is repeated 10 times

  • 60 × 10 = 600 predictive patient survival models

  • If Tree(i = 1..5) > 0, the GP model is positive (node)

  • Genetic modifications in trees 1 and 2

  • Trees may specialize (by tissue)

  • Program fitness: (Pos/Neg) accuracy > 0.5
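The counts on this slide can be checked with a short sketch. With 60 patients, 60-fold cross-validation holds out one patient per fold, and ten repeats give 600 runs. How the five tree outputs are combined is not spelled out on the slide; the `predict` function below assumes they are summed and the model is positive when the sum exceeds zero, which is an assumption made for illustration only.

```python
N_PATIENTS = 60
N_REPEATS = 10

def leave_one_out_folds(n_patients):
    """Yield (train_indices, held_out_index); with n folds == n patients, each fold holds out one."""
    for held_out in range(n_patients):
        train = [i for i in range(n_patients) if i != held_out]
        yield train, held_out

# One GP run (one evolved model) per fold, per repeat.
runs = [
    (repeat, held_out)
    for repeat in range(N_REPEATS)
    for _train, held_out in leave_one_out_folds(N_PATIENTS)
]

def predict(tree_outputs):
    """ASSUMPTION: the five tree outputs are summed; positive if the sum is > 0."""
    if len(tree_outputs) != 5:
        raise ValueError("an individual is expected to have 5 trees")
    return sum(tree_outputs) > 0

first_train, first_held_out = next(leave_one_out_folds(N_PATIENTS))
example = predict([1.0, -0.5, 0.2, -0.1, 0.0])  # sum is about 0.6, so positive
```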


GP Conditions

  • Terminals (µarray data)

  • Functions (+, -, /, *, exp, <, >, ...)

  • Fitness measurement (data)

  • Program control (loop, time)

  • Termination (generations)
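The five conditions above map naturally onto the inputs of a generational loop. The following is a toy sketch, not the configuration used in the paper: individuals are reduced to a single `(function, terminal, terminal)` triple, selection is a binary tournament, variation is a one-position mutation, and the fitness simply counts matches against a made-up `TARGET`. All names and parameter values are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

Individual = Tuple[str, str, str]  # (function, left terminal, right terminal)

@dataclass
class GPConditions:
    terminals: Sequence[str]                # inputs, e.g. gene expression names
    functions: Sequence[str]                # operators allowed at internal nodes
    fitness: Callable[[Individual], float]  # higher is better
    population_size: int                    # program control
    max_generations: int                    # termination criterion

def random_individual(cond, rng):
    return (rng.choice(cond.functions),
            rng.choice(cond.terminals),
            rng.choice(cond.terminals))

def mutate(ind, cond, rng):
    """Replace one randomly chosen position with a fresh random symbol."""
    parts = list(ind)
    pos = rng.randrange(3)
    parts[pos] = rng.choice(cond.functions if pos == 0 else cond.terminals)
    return tuple(parts)

def run(cond, rng):
    population = [random_individual(cond, rng) for _ in range(cond.population_size)]
    best = max(population, key=cond.fitness)
    for _generation in range(cond.max_generations):
        offspring = []
        for _slot in range(cond.population_size):
            a, b = rng.choice(population), rng.choice(population)
            parent = a if cond.fitness(a) >= cond.fitness(b) else b  # binary tournament
            offspring.append(mutate(parent, cond, rng))
        population = offspring
        best = max(population + [best], key=cond.fitness)  # keep the best seen so far
    return best

TARGET = ("+", "g1", "g2")  # toy target, for illustration only
cond = GPConditions(
    terminals=["g1", "g2", "g3"],
    functions=["+", "-", "*", "/"],
    fitness=lambda ind: sum(x == y for x, y in zip(ind, TARGET)),
    population_size=20,
    max_generations=30,
)
best = run(cond, random.Random(1))
```

Stopping after a fixed number of generations is the simplest termination rule; a time budget or a fitness threshold would slot into the same loop.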



GP 1st/2nd Data Mining

  • 600 GP models

  • 6970 of 7129 attributes appear in GP models

  • 404 genes appear in ten or more GP models

  • These 404 genes were used in the 2nd GP run

  • Two genes appear in more than 100 GP models:

    • U08998: 182 GP models

    • U41737: 193 GP models
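The gene-selection step between the two GP runs amounts to counting how many models use each gene and keeping those above a threshold. Here is a minimal sketch under the assumption that each model is available as a list of the gene names it references; the three tiny models and the genes `g3` and `g4` are made up, and the threshold of 2 stands in for the slide's "ten or more".

```python
from collections import Counter

def gene_usage(models):
    """Count, for each gene, the number of models that use it at least once."""
    usage = Counter()
    for genes in models:
        usage.update(set(genes))  # set(): a gene counts once per model
    return usage

def select_genes(usage, min_models):
    """Return the genes that appear in at least min_models models, sorted by name."""
    return sorted(gene for gene, n in usage.items() if n >= min_models)

# Hypothetical models; each lists the gene terminals found in its trees.
models = [
    ["U08998_at", "U41737_at", "U08998_at"],
    ["U41737_at", "g3"],
    ["U08998_at", "g4"],
]
usage = gene_usage(models)
selected = select_genes(usage, min_models=2)
```

Applied to the 600 first-pass models with `min_models=10`, the same counting would yield the reduced gene list fed to the second run.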


Gene Biology

  • Genes NOT highly expressed

  • Not found in Pomeroy's K-means cluster analysis

  • U08998_at

    • TAR RNA binding protein: promotes cancer

    • TARBP1 GeneCard

  • U41737_at

    • Pancreatic beta cell growth factor (INGAP)

    • REG3A GeneCard


Gene Frequency, 2nd GP


Final GP

  • Limited number of functions

  • Single IF statements (<, >, ≥, ≤)

  • Random generation of functions and genes

  • N = 60-fold, repeated 10 times: accuracy = 68%

  • 147 of 192 were incorrect predictors

  • 39 of 192 were correct two-gene predictors



Two Gene Outcome

  • Survived / predicted correctly: TP

  • Failed treatment / predicted wrongly: FP

  • Survived / predicted wrongly: FN (diamond marker)

  • Failed treatment / predicted correctly: TN

  • Darkened points are poor predictors

  • GP model predictor:

  • -42 < U41737_at + 2*U08998_at
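The two-gene predictor and the four outcome categories can be sketched as follows. Two things are assumed here because the slide does not state them: that a true inequality means "predicted to survive", and the three `survived` outcomes, which are invented to exercise the counting. The expression values are columns 1, 2 and 8 of the data snippet earlier in this deck.

```python
def two_gene_rule(u41737, u08998):
    """Evaluate the slide's inequality: -42 < U41737_at + 2 * U08998_at."""
    return -42 < u41737 + 2 * u08998

def confusion(predicted_survival, survived):
    """Tally outcomes using the slide's convention (positive = survived)."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for predicted, actual in zip(predicted_survival, survived):
        if predicted and actual:
            counts["TP"] += 1   # survived, predicted correctly
        elif predicted and not actual:
            counts["FP"] += 1   # failed treatment, predicted wrongly
        elif not predicted and actual:
            counts["FN"] += 1   # survived, predicted wrongly
        else:
            counts["TN"] += 1   # failed treatment, predicted correctly
    return counts

# (U41737_at, U08998_at) pairs from snippet columns 1, 2 and 8.
patients = [(15.0, 206.0), (-87.0, 55.0), (13.0, -493.0)]
predictions = [two_gene_rule(u41737, u08998) for u41737, u08998 in patients]

# ASSUMPTION: made-up outcomes, only to demonstrate the tally.
survived = [True, False, False]
counts = confusion(predictions, survived)
accuracy = (counts["TP"] + counts["TN"]) / len(patients)
```

For the third pair the sum is 13 + 2 × (-493) = -973, which is below -42, so the rule evaluates to false.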


Limitations

  • Extensive computer resources (exponential)

  • NP solution

  • Only a heuristically optimal solution

  • Replications of the random selection process with various genetic evolutionary change rates can cause different results


Bioinformatics

  • Allows low-expression genes to be selected into the predictive model

  • New information can be harvested by repeating execution of GP

  • The 5 tree members can be isolated members of different organ tissues

  • Disease treatment, prediction and cure

