Bayesian Models for Gene Expression With DNA Microarray Data
Joseph G. Ibrahim, Ming-Hui Chen, and Robert J. Gray
Presented by Yong Zhang
Goals
• To build a model that compares normal and tumor tissues and identifies the genes that best distinguish between the tissue types.
• To develop model assessment techniques for judging the fit of a class of competing models.
Outline
• General Model
• Gene Selection Algorithm
• Prior Distributions
• L measure (model assessment)
• Example
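Data structure
• $x$: the expression level for a given gene
• $c_0$: the threshold value at which a gene is considered not expressed
• Let $p = P(x = c_0)$. Then
$$x = \begin{cases} c_0 & \text{with probability } p, \\ c_0 + y & \text{with probability } 1 - p, \end{cases}$$
where $y$ is the continuous part of $x$.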
• $j = 1, 2$ indexes the tissue type (normal vs. tumor)
• $i = 1, 2, \ldots, n_j$ indexes the $i$th individual
• $g = 1, \ldots, G$ indexes the $g$th gene
• $x_{jig}$: the gene expression mixture random variable for the $j$th tissue type, the $i$th individual, and the $g$th gene
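As a minimal sketch of this data structure, the Python snippet below draws expression levels that sit exactly at the threshold with probability p and otherwise exceed it by a positive continuous part. The lognormal form of the continuous part, the convention x = c0 + y, and every name and numeric value (`sample_expression`, a threshold of 100.0, and so on) are illustrative assumptions, not values taken from the paper.

```python
import math
import random

def sample_expression(p, c0, mu, sigma, rng):
    """Draw one expression level x from the point-mass mixture.

    With probability p the gene is "not expressed" and x equals c0.
    Otherwise x = c0 + y, where log(y) ~ N(mu, sigma^2) (assumed form).
    """
    if rng.random() < p:
        return c0
    y = math.exp(rng.gauss(mu, sigma))  # lognormal continuous part, y > 0
    return c0 + y

# Hypothetical settings, chosen only for illustration.
rng = random.Random(0)
C0 = 100.0
draws = [sample_expression(0.3, C0, 2.0, 0.5, rng) for _ in range(10000)]

# Empirical share of draws at the threshold, which estimates p.
frac_at_threshold = sum(1 for d in draws if d == C0) / len(draws)
```

The share of draws equal to the threshold estimates p, and every other draw lies strictly above the threshold.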
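The General Model
• Assume $x_{jig} = c_0$ with probability $p_{jg}$ and $x_{jig} = c_0 + y_{jig}$ with probability $1 - p_{jg}$, where $\log(y_{jig}) \mid \mu_{jg}, \sigma_{jg}^2 \sim N(\mu_{jg}, \sigma_{jg}^2)$
• $\delta_{jig} = 1(x_{jig} = c_0)$
• $p_{jg} = P(x_{jig} = c_0) = P(\delta_{jig} = 1)$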
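• $\theta = (\mu, \sigma^2, p)$
• Data: $D = (x_{111}, \ldots, x_{2,n_2,G}, \delta)$
• Likelihood function for $\theta$:
$$L(\theta \mid D) = \prod_{j=1}^{2} \prod_{i=1}^{n_j} \prod_{g=1}^{G} p_{jg}^{\delta_{jig}} (1 - p_{jg})^{1 - \delta_{jig}} \left[ \frac{1}{\sqrt{2\pi}\,\sigma_{jg}\, y_{jig}} \exp\left\{ -\frac{(\log y_{jig} - \mu_{jg})^2}{2\sigma_{jg}^2} \right\} \right]^{1 - \delta_{jig}},$$
where $y_{jig} = x_{jig} - c_0$ is the continuous part of $x_{jig}$.
In order to find which genes best discriminate between the normal and tumor tissues, let
$$\psi_{jg} = E(x_{jig} \mid \theta) = c_0\, p_{jg} + (1 - p_{jg})\left(c_0 + e^{\mu_{jg} + \sigma_{jg}^2/2}\right).$$
Then we set $\xi_g = \psi_{2g} / \psi_{1g}$, so that we can use $\xi_g$ to judge them.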
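The likelihood and the comparison quantity can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' code: it assumes the continuous part y = x - c0 is lognormal with parameters mu and sigma2, that the per-tissue mean is psi = c0 p + (1 - p)(c0 + exp(mu + sigma2 / 2)), and that the ratio puts tissue type 2 (tumor) in the numerator. All function names are hypothetical.

```python
import math

def cell_log_likelihood(xs, c0, p, mu, sigma2):
    """Log-likelihood contribution of one (tissue, gene) cell.

    Assumes x = c0 with probability p, and otherwise x = c0 + y with
    log(y) ~ N(mu, sigma2). Requires 0 < p < 1 and sigma2 > 0.
    """
    total = 0.0
    for x in xs:
        if x == c0:
            # delta = 1: the gene is not expressed for this individual.
            total += math.log(p)
        else:
            # delta = 0: lognormal log-density of y = x - c0.
            y = x - c0
            resid = math.log(y) - mu
            total += (math.log(1.0 - p)
                      - math.log(y)
                      - 0.5 * math.log(2.0 * math.pi * sigma2)
                      - resid * resid / (2.0 * sigma2))
    return total

def mean_expression(c0, p, mu, sigma2):
    """psi = E(x | theta) under the assumed mixture with a lognormal part."""
    return c0 * p + (1.0 - p) * (c0 + math.exp(mu + sigma2 / 2.0))

def expression_ratio(c0, tumor, normal):
    """xi = psi_tumor / psi_normal; each argument is a (p, mu, sigma2) tuple."""
    return mean_expression(c0, *tumor) / mean_expression(c0, *normal)
```

The full log-likelihood is the sum of `cell_log_likelihood` over both tissue types and all genes. A ratio near 1 indicates similar mean expression in the two tissue types.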
Prior Distributions
• $\sigma_{jg}^2 \sim \text{Inverse Gamma}(a_{j0}, b_{j0})$
• $\mu_{j0} \sim N(m_{j0}, v_{j0}^2)$, $j = 1, 2$
• $b_{j0} \sim \text{gamma}(q_{j0}, t_{j0})$
• $e_{jg} \sim N(u_{j0}, k_{j0} w_{j0}^2)$
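To make the hierarchy concrete, here is a minimal Python sketch that takes one draw from the priors listed above for a single tissue type. The parameterizations are assumptions: the gamma is read as shape and rate, the inverse gamma as the reciprocal of a gamma variable with shape a and rate b, and the second argument of each normal as a variance. The function name and the hyperparameter values in the example call are hypothetical.

```python
import random

def draw_from_priors(a, q, t, m, v2, u, k, w2, rng):
    """One joint draw from the listed priors for a single tissue type.

    Assumed parameterizations (not confirmed by the slides):
      b      ~ gamma(shape=q, rate=t)
      sigma2 ~ inverse gamma(shape=a, scale=b), drawn as 1 / gamma(a, rate=b)
      mu0    ~ N(m, v2), with v2 a variance
      e      ~ N(u, k * w2), with k * w2 a variance
    """
    # random.gammavariate takes (shape, scale), so a rate r becomes scale 1 / r.
    b = rng.gammavariate(q, 1.0 / t)
    sigma2 = 1.0 / rng.gammavariate(a, 1.0 / b)
    mu0 = rng.gauss(m, v2 ** 0.5)
    e = rng.gauss(u, (k * w2) ** 0.5)
    return {"b": b, "sigma2": sigma2, "mu0": mu0, "e": e}

# Hypothetical hyperparameter values, for illustration only.
draw = draw_from_priors(a=3.0, q=2.0, t=1.0, m=0.0, v2=10.0,
                        u=0.0, k=2.0, w2=1.0, rng=random.Random(42))
```

Drawing b first and then the variance given b mirrors the two-level structure, in which the scale of the inverse gamma is itself random.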
Gene Selection Algorithm
1) For each gene, compute $\xi_g$ and $\gamma_g = P(\xi_g > 1 \mid D)$.
2) Select a "threshold" value, say $r_0$, to decide which genes are different. If $\gamma_g \ge r_0$ or $\gamma_g \le 1 - r_0$, the $g$th gene is declared different.
3) Once the $g$th gene is declared different, set $\mu_{1g} \ne \mu_{2g}$; otherwise set $\mu_{1g} = \mu_{2g} \equiv \mu_g$, where $\mu_g$ is treated as unknown.
4) Create several submodels using several values of $r_0$.
5) Use the L measure to decide which submodel is best (the one with the smallest L measure).
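The selection step can be sketched in Python as follows. The sketch assumes that posterior draws of a per-gene ratio statistic are already available from an MCMC run, that the selection probability is the posterior probability that this ratio exceeds 1, and that a gene is declared different when this probability is at least r0 or at most 1 - r0. These rules are assumptions made for illustration, and the function names and toy draws are hypothetical.

```python
def prob_ratio_exceeds_one(ratio_draws):
    """Monte Carlo estimate of P(ratio > 1 | D) from posterior draws."""
    return sum(1 for v in ratio_draws if v > 1.0) / len(ratio_draws)

def select_genes(ratio_draws_by_gene, r0):
    """Indices of genes declared different at threshold r0 (assumed rule)."""
    selected = []
    for g, draws in enumerate(ratio_draws_by_gene):
        prob = prob_ratio_exceeds_one(draws)
        if prob >= r0 or prob <= 1.0 - r0:
            selected.append(g)
    return selected

# Toy posterior draws for three genes (illustrative numbers only).
toy_draws = [
    [1.5, 1.2, 1.3, 1.1],    # ratio clearly above 1
    [0.9, 1.1, 1.05, 0.95],  # ratio straddles 1
    [0.5, 0.6, 0.7, 0.4],    # ratio clearly below 1
]

# One candidate submodel per threshold value.
submodels = {r0: select_genes(toy_draws, r0) for r0 in (0.5, 0.9)}
```

Each threshold yields one candidate submodel, and the candidates can then be compared with a model assessment criterion.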
The properties of this approach
• Model the gene expression level as a mixture random variable.
• Use a lognormal model for the continuous part of the mixture.
• Use the L measure statistic for evaluating models.
L measure for model assessment
• It relies on the notion of an imaginary replicate experiment.
• Let $z = (z_{111}, \ldots, z_{2,n_2,G})$ denote future values of a replicate experiment.
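• The L measure is the expected squared Euclidean distance between $x$ and $z$:
$$L = E\left[(z - x)'(z - x) \mid D\right].$$
• A more general version is
$$L(\nu) = \sum_{j=1}^{2}\sum_{i=1}^{n_j}\sum_{g=1}^{G} \mathrm{Var}(z_{jig} \mid D) + \nu \sum_{j=1}^{2}\sum_{i=1}^{n_j}\sum_{g=1}^{G} \left\{E(z_{jig} \mid D) - x_{jig}\right\}^2,$$
which reduces to the first formula when $\nu = 1$.
• The right-hand side of the last formula can be computed by MCMC.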
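A minimal Python sketch of the Monte Carlo computation follows. It assumes the criterion takes the common form "sum of posterior predictive variances plus a weight nu times the sum of squared differences between predictive means and observed values", and that draws of the replicate vector z are already available from an MCMC run. The function name `l_measure` and the argument layout are hypothetical.

```python
def l_measure(z_draws, x, nu=1.0):
    """Monte Carlo estimate of the assumed L measure.

    z_draws: list of replicate vectors, one per MCMC iteration.
    x:       observed data vector, same length as each replicate.
    nu:      weight on the squared-bias term; nu = 1 gives the expected
             squared Euclidean distance between x and z.
    """
    m = len(z_draws)
    total_var = 0.0
    total_bias2 = 0.0
    for k, x_k in enumerate(x):
        column = [draw[k] for draw in z_draws]
        mean_k = sum(column) / m
        # Variance with divisor m, so nu = 1 matches the average
        # squared distance over the draws exactly.
        var_k = sum((v - mean_k) ** 2 for v in column) / m
        total_var += var_k
        total_bias2 += (mean_k - x_k) ** 2
    return total_var + nu * total_bias2
```

Among competing submodels fitted to the same data, the one with the smallest value would be preferred.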
• For 1–4 and 6, the generation is straightforward.
• For 5, we can use an adaptive rejection algorithm (Gilks and Wild, 1992), because the corresponding conditional posterior densities are log-concave.
Discussion
• The model development and prior distributions in this paper can easily be extended to handle three or more tissue types.
• More general classes of priors
• The gene selection criteria