Bayesian Models for Gene Expression With DNA Microarray Data

Joseph G. Ibrahim, Ming-Hui Chen, and Robert J. Gray
Presented by Yong Zhang

Goals:
• To build a model that compares normal and tumor tissues and finds the genes that best distinguish between tissue types.
• To develop model assessment techniques for assessing the fit of a class of competing models.

Outline
• General Model
• Gene Selection Algo.
• Prior Distributions
• L measures (assessment)
• Example

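Data structure
• $x$: the expression level for a given gene
• $c_0$: the threshold value at which a gene is considered not expressed
• Let $p = P(x = c_0)$; then
$$x = \begin{cases} c_0 & \text{with probability } p \\ c_0 + y & \text{with probability } 1 - p \end{cases}$$
where $y$ is the continuous part of $x$.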

• $j = 1, 2$ indexes the tissue type (normal vs. tumor)
• $i = 1, 2, \ldots, n_j$ indexes the $i$th individual
• $g = 1, \ldots, G$ indexes the $g$th gene
• $x_{jig}$: the gene expression mixture random variable for the $j$th tissue type, the $i$th individual and the $g$th gene

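The General Model
• Assume the continuous part $y_{jig} = x_{jig} - c_0$ is lognormal: $\log y_{jig} \sim N(\mu_{jg}, \sigma^2_{jg})$
• $\delta_{jig} = 1(x_{jig} = c_0)$
• $p_{jg} = P(x_{jig} = c_0) = P(\delta_{jig} = 1)$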
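The mixture structure above can be illustrated with a small simulation. This is only a sketch: the function name `simulate_expression` and all numeric values (`c0=100.0`, `p=0.3`, `mu=5.0`, `sigma=1.0`) are made up for illustration and do not come from the presentation; it assumes the lognormal continuous part described in the deck.

```python
import numpy as np

def simulate_expression(n, c0, p, mu, sigma, rng):
    """Draw n values of x for one gene and tissue type.

    x equals c0 with probability p (gene not expressed),
    otherwise x = c0 + y with y lognormal(mu, sigma).
    """
    not_expressed = rng.random(n) < p                # delta = 1 when x == c0
    y = rng.lognormal(mean=mu, sigma=sigma, size=n)  # continuous part, y > 0
    x = np.where(not_expressed, c0, c0 + y)
    return x, not_expressed.astype(int)

# Hypothetical parameter values, chosen only for the demonstration.
rng = np.random.default_rng(0)
x, delta = simulate_expression(n=1000, c0=100.0, p=0.3, mu=5.0, sigma=1.0, rng=rng)
print("fraction at threshold:", delta.mean())
```

The indicator `delta` plays the role of $\delta_{jig}$: it is 1 exactly when the simulated value sits at the threshold.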

=(,2,p) • Data D=(x111,…x2,n2,G, ) • Likelihood function for : L(|D)= In order to findwhich genes best discriminate between the normal and tumor tissues, let

Then we set such that we can use g to judge them.
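As a hedged illustration of the likelihood, the sketch below evaluates the log-likelihood contribution of one gene in one tissue type under the mixture model with a lognormal continuous part. The function name `log_likelihood` is hypothetical, observations are assumed independent given the parameters, and `p` must lie strictly between 0 and 1.

```python
import numpy as np

def log_likelihood(x, c0, p, mu, sigma2):
    """Log-likelihood of one gene/tissue sample under the mixture model.

    Each observation contributes log(p) if x == c0, and otherwise
    log(1 - p) plus the lognormal log-density of y = x - c0.
    """
    x = np.asarray(x, dtype=float)
    delta = (x == c0)                      # indicator of "not expressed"
    n_at_c0 = int(delta.sum())
    log_y = np.log(x[~delta] - c0)         # log of the continuous part
    ll = n_at_c0 * np.log(p) + (x.size - n_at_c0) * np.log1p(-p)
    # Lognormal log-density: normal density of log(y) with the 1/y Jacobian term.
    ll += np.sum(-0.5 * np.log(2 * np.pi * sigma2)
                 - log_y
                 - (log_y - mu) ** 2 / (2 * sigma2))
    return float(ll)

# Toy data: two values at the threshold, one expressed value.
ll = log_likelihood([100.0, 100.0, 100.0 + np.e], c0=100.0, p=0.5, mu=1.0, sigma2=1.0)
print(ll)
```

The full likelihood over all tissue types and genes would, under the same independence assumption, be the sum of such terms over $j$ and $g$.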

Prior Distributions
• $\sigma^2_{jg} \sim \text{Inverse Gamma}(a_{j0}, b_{j0})$
• $\mu_{j0} \sim N(m_{j0}, v_{j0}^2)$, $j = 1, 2$
• $b_{j0} \sim \text{Gamma}(q_{j0}, t_{j0})$
• $e_{jg} \sim N(u_{j0}, k_{j0} w_{j0}^2)$
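The hierarchical priors listed above can be sampled with NumPy as a rough sketch for one tissue type $j$. Several things here are assumptions, not statements from the presentation: the Inverse Gamma and Gamma distributions are taken in the shape/rate parameterization, the function name `draw_priors` is hypothetical, and all hyperparameter values are made up.

```python
import numpy as np

def draw_priors(G, a0, q0, t0, m0, v0, u0, k0, w0, rng):
    """Draw one set of prior values for a single tissue type.

    Assumed parameterizations (not confirmed by the slides):
      b0     ~ Gamma(shape=q0, rate=t0)
      sigma2 ~ Inverse Gamma(shape=a0, scale=b0), one per gene
      mu0    ~ N(m0, v0^2)
      e      ~ N(u0, k0 * w0^2), one per gene
    """
    b0 = rng.gamma(shape=q0, scale=1.0 / t0)
    # If T ~ Gamma(shape=a0, rate=b0), then 1/T ~ Inverse Gamma(a0, b0).
    sigma2 = 1.0 / rng.gamma(shape=a0, scale=1.0 / b0, size=G)
    mu0 = rng.normal(m0, v0)
    e = rng.normal(u0, np.sqrt(k0) * w0, size=G)
    return {"b0": b0, "sigma2": sigma2, "mu0": mu0, "e": e}

# Hypothetical hyperparameters for illustration only.
rng = np.random.default_rng(1)
pri = draw_priors(G=5, a0=3.0, q0=2.0, t0=1.0, m0=0.0, v0=10.0,
                  u0=0.0, k0=1.0, w0=2.0, rng=rng)
print(pri["sigma2"])
```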

Gene Selection Algo.
1) For each gene $g$, compute the gene-specific statistics.
2) Select a "threshold" value, say $r_0$, against which each gene's statistic is compared to decide which genes are different.
3) Once the $g$th gene is declared different, set $\mu_{1g} \neq \mu_{2g}$; otherwise set $\mu_{1g} = \mu_{2g} \equiv \mu_g$, where $\mu_g$ is treated as unknown.
4) Create several submodels using several values of $r_0$.
5) Use the L measure to decide which submodel is best (smallest L measure).
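The search over thresholds in steps 2 to 5 can be sketched as a simple loop. Everything specific here is hypothetical: `score` stands in for whatever per-gene statistic is used, the rule `score >= r0` is an assumed form of the threshold criterion, and `l_measure_of` is a placeholder callable that would, in practice, fit the submodel and return its L measure.

```python
import numpy as np

def select_submodel(score, r0_grid, l_measure_of):
    """Return (r0, L, different) for the submodel with the smallest L measure.

    score        : per-gene statistic (hypothetical stand-in)
    r0_grid      : candidate threshold values
    l_measure_of : callable mapping a boolean 'different' mask to an L measure
    """
    best = None
    for r0 in r0_grid:
        different = score >= r0      # assumed form of the threshold rule
        L = l_measure_of(different)  # fit the submodel and assess it
        if best is None or L < best[1]:
            best = (r0, L, different)
    return best

# Toy stand-in for model fitting: counts disagreements with a known pattern.
score = np.array([0.1, 0.5, 0.9, 0.95])
truth = np.array([False, False, True, True])
toy_l = lambda different: 1.0 + float(np.sum(different != truth))

best_r0, best_L, best_mask = select_submodel(score, [0.2, 0.7, 0.99], toy_l)
print(best_r0, best_L)
```

In a real analysis each call to `l_measure_of` would involve an MCMC run for the corresponding submodel, so the grid of $r_0$ values is typically kept small.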

The properties of this approach
• Model the gene expression level as a mixture random variable.
• Use a lognormal model for the continuous part of the mixture.
• Use the L measure statistic for evaluating models.

L measure for model assessment
• It relies on the notion of an imaginary replicate experiment.
• Let $z = (z_{111}, \ldots, z_{2,n_2,G})$ denote future values of a replicate experiment.

The L measure is the expected squared Euclidean distance between $x$ and $z$,
$$L = E\left[(z - x)'(z - x) \mid D\right].$$
A more general version is also defined. The right-hand side of the last formula can be obtained by MCMC.
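As a sketch of how the L measure might be estimated from MCMC output, the function below takes draws of the replicate data $z$ from the posterior predictive distribution and combines predictive variances with squared deviations of the predictive means from the observed data. It assumes the generalized form commonly written in the L-measure literature, $L(\nu) = \sum \mathrm{Var}(z \mid D) + \nu \sum (E(z \mid D) - x)^2$, which may differ in detail from the formula on the slide. The name `l_measure` and the toy data are hypothetical.

```python
import numpy as np

def l_measure(x, z_draws, nu=1.0):
    """Monte Carlo estimate of the L measure.

    x       : observed data, shape (n_obs,)
    z_draws : posterior predictive draws, shape (n_draws, n_obs)
    nu      : weight on the squared-bias term; nu = 1 gives the expected
              squared Euclidean distance between x and z
    """
    x = np.asarray(x, dtype=float)
    z = np.asarray(z_draws, dtype=float)
    pred_mean = z.mean(axis=0)   # estimate of E(z | D)
    pred_var = z.var(axis=0)     # estimate of Var(z | D), ddof=0
    return float(pred_var.sum() + nu * ((pred_mean - x) ** 2).sum())

# Toy draws standing in for real MCMC output.
rng = np.random.default_rng(2)
x_obs = np.array([1.0, 2.0, 3.0])
z_draws = rng.normal(loc=[1.1, 1.8, 3.2], scale=0.5, size=(2000, 3))
print(l_measure(x_obs, z_draws, nu=1.0))
```

With `nu=1.0` the estimate coincides with the average over draws of the squared distance between each replicate and the observed data.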

Computational Algo. (MCMC)
• For steps 1–4 and 6 of the sampler, the generation is straightforward.
• For step 5, we can use an adaptive rejection algorithm (Gilks and Wild, 1992) because the corresponding conditional posterior densities are log-concave.

Discussion
• The model development and prior distributions in this paper can be easily extended to handle three or more tissue types.
• More general classes of priors
• The gene selection criteria
