Modeling Gene Interactions in Disease
CS 686 Bioinformatics
Some Definitions
Data mining: extracting hidden patterns and useful information from large data sets. Examples: clustering, machine learning.
Should not be:
"Torturing data until it confesses ... and if you torture it enough, it will confess to anything" - Jeff Jonas, IBM
1) $y_i = b_0 + b_1 A_i + \varepsilon_i, \quad i = 1, \dots, n$
2) $y_i = b_0 + b_2 B_i^2 + \varepsilon_i, \quad i = 1, \dots, n$
where $b_0$, $b_1$, $b_2$ are parameters and $\varepsilon_i$ is the error term.
In both of these examples, the disease is modeled as linear in the parameters, although model 2 is quadratic in the variable B.
Given a sample, we estimate the parameters (e.g., by least squares) to arrive at the linear regression model:
$y_i = b_0 + b_1 x_{i1} + b_2 x_{i2} + \dots + b_p x_{ip} + \varepsilon_i, \quad i = 1, \dots, n$
For each unit increase in $x_{ip}$, $y_i$ is expected to increase by $b_p$, holding the other variables fixed.
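To make the estimation step concrete, here is a minimal sketch of a least-squares fit for a model that is linear in its parameters but quadratic in a variable, such as y = b0 + b2·B². The helper name `fit_least_squares`, the genotype coding 0/1/2, and the noise-free values are assumptions made up for illustration; they are not data from the course.

```python
import numpy as np

def fit_least_squares(X, y):
    """Return [b0, b1, ..., bp] minimizing the sum of squared residuals."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # treat a single predictor as one column
    # Prepend a column of ones so the first coefficient is the intercept b0.
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

# Hypothetical genotype codes for locus B (illustrative only).
B = np.array([0, 1, 2, 0, 1, 2, 1, 2], dtype=float)
# Noise-free response generated with b0 = 0.5 and b2 = 2.0 (assumed values).
y = 0.5 + 2.0 * B**2

# The predictor handed to the linear fit is B squared, not B itself.
coef = fit_least_squares(B**2, y)
print(coef)
```

Because the squared term is computed before fitting, the problem stays an ordinary linear least-squares problem in b0 and b2.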
where $x_B$ and $x_C$ are measured binary indicator variables, the regression coefficients β and γ represent main effects, and i represents the interaction.
Marginal penetrance: the probability of disease given the genotype at one locus, irrespective of the genotype at the other locus. Ex: P(D | A = Aa), irrespective of what value B has.
Table II. Penetrance values for combinations of genotypes from two single nucleotide polymorphisms exhibiting interactions in the absence of independent main effects

Genotype                  AA (0.25)   Aa (0.50)   aa (0.25)   Marginal penetrance (B)
BB (0.25)                 0           1           0           0.5
Bb (0.50)                 1           0           1           0.5
bb (0.25)                 0           1           0           0.5
Marginal penetrance (A)   0.5         0.5         0.5

Genotype frequencies are given in parentheses.
The last row and the last column give the marginal penetrance values for the A and B genotypes, respectively.
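The marginal values in Table II can be checked by weighting each penetrance by the frequency of the genotype at the other locus. The sketch below assumes the genotypes at A and B are independent in the population, so the frequencies in parentheses can be used directly as weights. The variable names are illustrative.

```python
import numpy as np

# Penetrance P(D | B genotype, A genotype) from Table II.
# Rows: BB, Bb, bb. Columns: AA, Aa, aa.
penetrance = np.array([
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
])

# Genotype frequencies from the table (given in parentheses).
freq_a = np.array([0.25, 0.50, 0.25])  # AA, Aa, aa
freq_b = np.array([0.25, 0.50, 0.25])  # BB, Bb, bb

# Assumption: genotypes at the two loci are independent.
marginal_b = penetrance @ freq_a  # P(D | B genotype), averaged over A
marginal_a = freq_b @ penetrance  # P(D | A genotype), averaged over B
prevalence = freq_b @ penetrance @ freq_a  # overall P(D)

print(marginal_a, marginal_b, prevalence)
```

Every marginal penetrance equals 0.5, so neither locus shows an effect when examined alone, even though the joint genotype fully determines disease status in this table.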