Part II – with interactions of genes in mind Min-Te Chao 2002/10/28

So far, all methods are one-gene-at-a-time
• At first these methods are simple and intuitive, but they soon become complicated.
• E.g., Efron has to use a tricky logistic regression to estimate the prior density, which is not easy.
The general problem with microarray data is that, although the setup resembles regression, the “design matrix” is never of full column rank.
In the setup

Y = X \beta + error

X is n by p, with n<100, p>1000.

I have seen a case with n=7, but p>6000.
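The rank claim can be checked numerically. The following is a minimal sketch: the matrix is simulated Gaussian noise (an assumption, not microarray data), sized like the n=7, p>6000 case. Its rank cannot exceed n, so X'X (p by p) is singular and least squares has no unique solution.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 7, 6000                      # sizes taken from the case mentioned above
X = rng.normal(size=(n, p))         # simulated design matrix (assumption)

# The rank of an n-by-p matrix is at most min(n, p) = n here.
rank = np.linalg.matrix_rank(X)
print("rank of X:", rank, "columns:", p)
```

Since the rank is far below p, at most n of the p coefficients can be identified from the data without further assumptions.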

Let us say there is a way to “do the statistical problem” (say, with traditional methods) with a smaller p, say p = p_1 = 3 or 30, depending on the value of n we have.
• Let us assume a model with the first p_1 parameters only (the other betas are all 0, say)
With our traditional method, we may find the likelihood function, with n observations and p_1 parameters
• We then go through the textbook method to do inference about the selected p_1 parameters
• and obtain an estimate of the p_1-dim parameter (together with an sd or p-value)
Repeat the procedure B times, each time with a “simple random sample without replacement of size p_1” from the p genes in the problem.

In this way we change an unsolvable problem (in our classical statistical sense) into B problems, all of which can be done with traditional methods.
• It is very time-consuming, but sometimes it works
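The repeated-subsample idea can be sketched as follows. This is an illustration under stated assumptions, not the method of any particular paper: the "traditional method" is taken to be ordinary least squares without an intercept, the data are simulated, and the function name `random_subset_fits` is hypothetical.

```python
import numpy as np

def random_subset_fits(X, y, p1, B, seed=0):
    """Fit B small least-squares problems, each on p1 randomly chosen columns."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    fits = []
    for _ in range(B):
        # Simple random sample without replacement of size p1 from the p genes
        genes = rng.choice(p, size=p1, replace=False)
        Xs = X[:, genes]
        beta, _, _, _ = np.linalg.lstsq(Xs, y, rcond=None)
        resid = y - Xs @ beta
        sigma2 = resid @ resid / (n - p1)          # residual variance estimate
        se = np.sqrt(sigma2 * np.diag(np.linalg.inv(Xs.T @ Xs)))
        fits.append((genes, beta, se))
    return fits

# Simulated example (assumption): p > n, and only gene 0 affects y.
rng = np.random.default_rng(1)
n, p = 30, 40
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] + rng.normal(size=n)
fits = random_subset_fits(X, y, p1=3, B=200)
```

Each of the B fits is a full-rank problem (n > p_1), so standard errors and p-values are available in the usual way; the cost is B separate fits.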
Lo, Shaw-Hwa and Tian Zheng (2002). Backward haplotype transmission association algorithm – a fast multi-marker screening method

To appear: Human Heredity

Instead of genes, they use markers.
• p markers, n patients
• For each patient, we have data from the father and the mother
• So we have n pieces of parents–child data.

They pick out r markers at a time, r << p
• A statistic T(r) is constructed, which measures the “amount of information” in an n-patient, r-marker subproblem
• Markers in this subproblem are deleted one by one, the least important one first, until all markers left are important
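The deletion step can be sketched as below. The score is only a stand-in for T(r), not the statistic from the paper: it sums n_cell^2 * (mean outcome in cell - overall mean)^2 over the genotype cells defined by the chosen markers. The 0/1 marker coding, the toy data, and the names `cell_score` and `backward_screen` are all assumptions made for illustration.

```python
import numpy as np

def cell_score(G, y, markers):
    """Stand-in score: sum over genotype cells of n_cell^2 * (cell mean - overall mean)^2."""
    if len(markers) == 0:
        return 0.0
    # Encode each subject's 0/1 pattern on the chosen markers as an integer cell id
    cells = G[:, markers] @ (1 << np.arange(len(markers)))
    n_cell = np.bincount(cells)
    sum_y = np.bincount(cells, weights=y)
    occupied = n_cell > 0
    dev = sum_y[occupied] / n_cell[occupied] - y.mean()
    return float(np.sum(n_cell[occupied] ** 2 * dev ** 2))

def backward_screen(G, y, markers):
    """Delete markers one at a time while some deletion raises the score."""
    current = list(markers)
    score = cell_score(G, y, current)
    while len(current) > 1:
        # Score every set obtained by deleting one marker from the current set
        trials = [(cell_score(G, y, [m for m in current if m != d]), d)
                  for d in current]
        best_score, drop = max(trials)
        if best_score <= score:      # every deletion hurts: all remaining are kept
            break
        current.remove(drop)         # least important marker goes first
        score = best_score
    return sorted(current)

# Toy data (assumption): the outcome depends on markers 0 and 1 only through
# their interaction (XOR), so neither has a marginal effect on its own.
rng = np.random.default_rng(0)
G = rng.integers(0, 2, size=(200, 10))
y = (G[:, 0] ^ G[:, 1]).astype(float)
kept = backward_screen(G, y, [0, 1, 2, 3])
print(kept)
```

In this toy setting the two interacting markers survive together, because deleting either one destroys the signal while deleting a noise marker only merges cells with the same outcome.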

This gives us group 1 of important markers.
• We do the same thing for another subset of r markers and get group 2 of important markers, and so on.
• Do it B times, with B pretty large, say 5000
• More specifically, markers whose return frequencies (the number of runs in which a marker is retained) exceed the 3rd quartile plus 1.8 times the IQR are selected (about 3.1 sd from the mean if the frequencies are roughly normal)
• This gives a type I error of about 10^{-3}.
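The arithmetic behind the cutoff can be verified directly, assuming return frequencies are approximately normal. The sketch below also applies the rule to made-up counts; the function name `select_by_return_frequency` and the simulated counts are hypothetical.

```python
from statistics import NormalDist
import numpy as np

def select_by_return_frequency(counts, k=1.8):
    """Return indices of markers whose count exceeds Q3 + k * IQR, and the cutoff."""
    counts = np.asarray(counts, dtype=float)
    q1, q3 = np.percentile(counts, [25, 75])
    cutoff = q3 + k * (q3 - q1)
    return np.flatnonzero(counts > cutoff), cutoff

# Check the rule on a standard normal: where does Q3 + 1.8 * IQR fall?
z = NormalDist()
q3_z = z.inv_cdf(0.75)
iqr_z = q3_z - z.inv_cdf(0.25)
cutoff_z = q3_z + 1.8 * iqr_z        # about 3.1 sd above the mean
tail = 1.0 - z.cdf(cutoff_z)         # upper-tail probability, about 1e-3

# Made-up return counts (assumption): mostly noise, two markers planted high.
rng = np.random.default_rng(0)
counts = rng.normal(100, 10, size=1000).round()
counts[[3, 7]] = 200
selected, cutoff = select_by_return_frequency(counts)
```

Using quartiles rather than the sample mean and sd keeps the cutoff from being inflated by the few markers with very high return counts.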
The difficult part of the problem is to formulate a likelihood function for the r selected markers.
• The next problem is to derive a test statistic, together with its properties.

But these are problem-specific…

It is the generality of the setup that is important.
• Because it considers r markers at a time, the likelihood function is with respect to the r selected markers. If there is any interaction between 2 or 3 markers, this process has the potential to pick it up.
All known methods, data mining or not, for the analysis of microarray-type data are ad hoc and rather primitive.
• The amount of theory is limited.
• These methods will tend to become statistical in nature eventually, because assessing risk is still a very important factor in scientific work.
Subject-matter relevance is the key
• Other keys: good data; other scientists; effective computation; don’t wait