Part II – with interactions of genes in mind, Min-Te Chao, 2002/10/28
slide1
Part II

– with interactions of genes in mind

Min-Te Chao

2002/10/28

slide2
So far, all methods are one-gene-at-a-time
  • At first these methods are simple and intuitive; then they become complicated.
  • E.g., Efron has to use a tricky logistic regression to estimate the prior density, which is not easy.
slide3
The general problem with microarray data is that, although the setup resembles regression, the “design matrix” is never of full rank.
slide4
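In the setup

$Y = X\beta + \varepsilon$,

X is n × p, with n < 100 and p > 1000. Because rank(X) ≤ n < p, the p × p matrix X'X is singular, so least squares cannot single out one β.

I have seen a case with n = 7 but p > 6000.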
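A small NumPy check of the rank problem, as a sketch on simulated Gaussian data (the sizes n = 7 and p = 50 are illustrative assumptions, far smaller than a real array): any direction in the null space of X can be added to β without changing the fitted values, so the data cannot distinguish the two.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 7, 50                       # illustrative sizes: far fewer arrays (n) than genes (p)
X = rng.standard_normal((n, p))    # simulated n-by-p design matrix
beta = rng.standard_normal(p)      # an arbitrary "true" coefficient vector

rank = np.linalg.matrix_rank(X)    # rank(X) <= min(n, p) = n < p
_, _, Vt = np.linalg.svd(X)        # full SVD: rows of Vt beyond the rank span the null space of X
v = Vt[-1]                         # one null-space direction (unit length)

# beta and beta + 5 v give identical fitted values, so the data cannot tell them apart
same_fit = np.allclose(X @ beta, X @ (beta + 5 * v))
print(rank, same_fit)
```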

slide5
Suppose there is a way to “do the statistical problem” (say, with traditional methods) with a smaller p, say p = p_1 = 3 or 30, depending on the value of n we have.
  • Let us assume a model with the first p_1 parameters only (say, all other betas are 0)
slide6
With a traditional method, we can write down the likelihood function with n observations and p_1 parameters
  • Then we go through the textbook method to make inferences about the selected p_1 parameters,
  • and obtain an estimator of the p_1-dimensional parameter (together with a standard deviation or p-value)
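The textbook step can be sketched for the simplest case, assuming a Gaussian linear model fitted by ordinary least squares; the slides do not fix a model, and `fit_subset` and the simulated data are hypothetical. A p-value would then come from a t distribution with n − p_1 degrees of freedom, which is left out here to avoid a SciPy dependency.

```python
import numpy as np

def fit_subset(X, y, cols):
    """OLS on the columns `cols` only; every other beta is treated as 0.

    Returns (estimates, standard errors, t statistics). Needs n > len(cols).
    """
    Xs = X[:, cols]                                    # n-by-p_1 sub-design
    n, p1 = Xs.shape
    beta_hat, *_ = np.linalg.lstsq(Xs, y, rcond=None)  # least-squares (= Gaussian ML) estimate
    resid = y - Xs @ beta_hat
    sigma2 = resid @ resid / (n - p1)                  # unbiased estimate of the error variance
    cov = sigma2 * np.linalg.inv(Xs.T @ Xs)            # estimated Var(beta_hat) = sigma^2 (X'X)^{-1}
    se = np.sqrt(np.diag(cov))
    return beta_hat, se, beta_hat / se

# Simulated example: n = 20 arrays, p = 100 genes, only genes 3 and 7 matter.
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 100))
y = 2.0 * X[:, 3] - 1.0 * X[:, 7] + 0.1 * rng.standard_normal(20)
est, se, t = fit_subset(X, y, [3, 7, 42])              # p_1 = 3, a solvable sub-problem
```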
slide7
Repeat the procedure B times, each time with a “simple random sample without replacement of size p_1” from the p genes in the problem.
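The repeated random-subset procedure might look like the sketch below, assuming the same least-squares fit on each sub-problem. The slides do not say how the B sets of estimates are combined, so the hypothetical `random_subset_fits` only collects them per gene; averaging them in the usage lines is one simple, assumed choice.

```python
import numpy as np

def random_subset_fits(X, y, p1, B, seed=0):
    """Fit y on B random subsets of p1 genes (sampling without replacement).

    Returns a dict: gene index -> list of OLS estimates from the draws that included it.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    estimates = {j: [] for j in range(p)}
    for _ in range(B):
        cols = rng.choice(p, size=p1, replace=False)               # SRS without replacement, size p1
        beta_hat, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)  # small, solvable sub-problem
        for j, b in zip(cols, beta_hat):
            estimates[int(j)].append(float(b))
    return estimates

# Simulated usage: n = 30 arrays, p = 200 genes, only gene 0 affects y.
rng = np.random.default_rng(1)
X = rng.standard_normal((30, 200))
y = 3.0 * X[:, 0] + 0.1 * rng.standard_normal(30)
est = random_subset_fits(X, y, p1=5, B=2000)
means = {j: float(np.mean(v)) for j, v in est.items() if v}   # assumed combining rule: average
top = max(means, key=lambda j: abs(means[j]))
print(top, round(means[top], 2))
```

Each of the B fits is cheap, but B must be large enough for every gene to appear many times (here about B·p_1/p = 50 draws per gene), which is where the computing time goes.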

slide8
In this way we turn an unsolvable problem (in the classical statistical sense) into B problems, each of which can be handled by traditional methods
  • It is very time-consuming, but sometimes it works
slide9
Lo, Shaw-Hwa and Tian Zheng (2002). Backward haplotype transmission association algorithm – a fast multi-marker screening method.

To appear in Human Heredity.

slide10
Instead of genes, they use markers.
  • p markers, n patients
  • For each patient, we have data from the father and the mother
  • So we have n pieces of parents–child data.

slide12
They pick out r markers at a time, r << p
  • A statistic T(r) is constructed, which measures the “amount of information” in an n-patient, r-marker subproblem
  • Markers in this subproblem are deleted one by one, least important first, until all markers left are important
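A generic version of the backward deletion step, as a hedged sketch: the real T(r) is problem-specific, so `toy_T` is a made-up standardized sum, and the deletion rule (drop the marker whose removal leaves the largest T; stop once no deletion raises T) is an assumption rather than the exact rule of Lo and Zheng. `backward_screen` is a hypothetical name.

```python
from math import sqrt
from typing import Callable, Iterable

def backward_screen(markers: Iterable[int], T: Callable[[frozenset], float]) -> frozenset:
    """Delete markers one at a time, least important first.

    Assumed rule: the least important marker is the one whose removal leaves the
    largest T; stop once no single deletion raises T.
    """
    current = frozenset(markers)
    while len(current) > 1:
        best_T, drop = max((T(current - {m}), m) for m in current)
        if best_T <= T(current):     # no deletion helps: treat the remaining markers as important
            break
        current = current - {drop}
    return current

# Toy example: markers 2 and 5 carry signal, the other eight carry none.
weights = {m: (3.0 if m in {2, 5} else 0.0) for m in range(10)}

def toy_T(S: frozenset) -> float:
    """Hypothetical stand-in for T(r): total signal divided by sqrt(subset size)."""
    return sum(weights[m] for m in S) / sqrt(len(S))

print(sorted(backward_screen(range(10), toy_T)))
```

With this toy T, deleting a no-signal marker always raises T, while deleting a signal marker lowers it, so the screen stops at markers 2 and 5.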

slide13
This gives us group 1 of important markers.
  • We do the same for another subset of r markers and get group 2 of important markers, and so on.
  • Do this B times, with B fairly large, say 5000
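The repetition over B random subsets reduces to counting how often each marker comes back. In this sketch `screen` stands for any r-marker screening step (for example the backward deletion described in the slides), and `screening_frequencies` and `toy_screen` are hypothetical names; `toy_screen` simply keeps two "signal" markers.

```python
import random
from collections import Counter
from typing import Callable, Sequence

def screening_frequencies(p: int, r: int, B: int,
                          screen: Callable[[Sequence[int]], Sequence[int]],
                          seed: int = 0) -> Counter:
    """Run the r-marker screen on B random subsets and count how often each marker returns."""
    rng = random.Random(seed)
    counts = Counter({m: 0 for m in range(p)})   # start at 0 so never-returned markers are listed
    for _ in range(B):
        subset = rng.sample(range(p), r)         # r markers drawn without replacement
        counts.update(screen(subset))            # add this round's group of "important" markers
    return counts

signal = {2, 5}

def toy_screen(subset: Sequence[int]) -> list[int]:
    """Hypothetical stand-in for the screen: keeps only the signal markers present."""
    return [m for m in subset if m in signal]

counts = screening_frequencies(p=100, r=10, B=5000, screen=toy_screen)
print(counts.most_common(3))
```

Here a signal marker returns whenever it is drawn, about B·r/p = 500 times out of 5000.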
slide14
Combine all markers together; those with the highest frequencies are selected.
  • More specifically, markers whose return frequencies exceed the third quartile plus 1.8 times the IQR are selected (if the frequencies are roughly normal, this cutoff is about 3.1 sd above the mean)
  • This corresponds to a type I error of about 10^{-3}.
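The selection rule and its calibration can be checked with the Python standard library. `select_markers` is a hypothetical helper applying Q3 + 1.8 × IQR; the second part assumes the frequencies are roughly normal, in which case Q3 sits 0.674 sd above the mean and the IQR is 1.349 sd, giving a cutoff near 3.10 sd and a one-sided tail probability near 10^{-3}.

```python
from statistics import NormalDist, quantiles

def select_markers(freq: dict[int, int], k: float = 1.8) -> list[int]:
    """Keep markers whose return frequency exceeds Q3 + k * IQR of all frequencies."""
    q1, _, q3 = quantiles(freq.values(), n=4)   # quartiles of the frequency distribution
    cutoff = q3 + k * (q3 - q1)
    return sorted(m for m, f in freq.items() if f > cutoff)

# Calibration under a normal approximation for the frequencies (an assumption):
std_normal = NormalDist()
q3_sd = std_normal.inv_cdf(0.75)                # Q3 is about 0.674 sd above the mean
cut_sd = q3_sd + 1.8 * (2 * q3_sd)              # IQR = 2 * 0.674 = 1.349 sd, cutoff about 3.10 sd
tail = 1 - std_normal.cdf(cut_sd)               # one-sided tail probability, about 0.00096
print(f"cutoff = {cut_sd:.2f} sd, tail = {tail:.5f}")
```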
slide15
The difficult part of the problem is to formulate a likelihood function for the r selected markers.
  • The next problem is to derive a test statistic, together with its properties.

But these are problem-specific…

slide16
It is the generality of the setup that is important.
  • Because it considers r markers at a time, the likelihood function is with respect to the r selected markers. If there is any interaction among 2 or 3 markers, this process has the potential to pick it up
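A toy linear-model illustration of why looking at several markers jointly matters, assuming two markers coded −1/+1 in a balanced design and a response that is their pure product (the setup is made up for illustration, not the likelihood used by Lo and Zheng): each marker alone shows zero correlation with the response, but a model containing both and their product recovers the effect exactly.

```python
import numpy as np

# Balanced 2x2 design: two markers coded -1/+1, every combination repeated 5 times.
x1 = np.repeat([-1.0, -1.0, 1.0, 1.0], 5)
x2 = np.repeat([-1.0, 1.0, -1.0, 1.0], 5)
y = x1 * x2                                    # pure interaction, no main effects

# One-at-a-time view: each marker alone is uncorrelated with y.
r1 = np.corrcoef(x1, y)[0, 1]
r2 = np.corrcoef(x2, y)[0, 1]

# Joint view: intercept, both markers, and their product in one model.
D = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
coef, *_ = np.linalg.lstsq(D, y, rcond=None)   # coef is approximately [0, 0, 0, 1]
print(r1, r2, np.round(coef, 6))
```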
slide18
All known methods, data mining or not, for the analysis of microarray-type data are ad hoc and rather primitive.
  • The amount of theory is limited.
  • These methods will likely become statistical in nature eventually, because assessment of risk is still a very important factor in scientific work
slide19
Subject-matter relevance is the key
  • Other keys:

good data

other scientists

effective computation

don’t wait
