Modeling Gene Interactions in Disease

CS 686 Bioinformatics

PowerPoint Slideshow about 'Modeling Gene Interactions in Disease' - lars-patrick


Presentation Transcript
Some Definitions
  • Data mining: extracting hidden patterns and useful information from large data sets. Examples: clustering, machine learning.

Should not be:

"Torturing data until it confesses ... and if you torture it enough, it will confess to anything"  - Jeff Jonas, IBM

  • Machine learning: the ability of a program to learn from experience. Examples: neural networks, decision trees, rule-based methods, multifactor dimensionality reduction (MDR).
Methods
  • Regression methods: modeling the relationship between a dependent variable and one or more independent variables.
  • Data mining methods: Search the space of possible models efficiently. Better with non-linear and high-dimensional data, or data with many potential interactions.
  • Exhaustive Search: search all possible models for the best one.
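To make the cost of exhaustive search concrete, the following is a minimal sketch that enumerates every candidate model of a given interaction order and counts how fast that space grows. The variable names (`snp1`, ...) and the helper `candidate_models` are hypothetical and used only for illustration; here a "model" is assumed to mean a subset of variables considered jointly.

```python
from itertools import combinations
from math import comb


def candidate_models(variables, order):
    """Return every subset of `variables` of size `order` (one candidate model each)."""
    return combinations(variables, order)


# Hypothetical variable names, for illustration only.
snps = ["snp1", "snp2", "snp3", "snp4"]
pairs = list(candidate_models(snps, 2))
print(pairs)       # every two-variable model
print(len(pairs))  # number of two-variable models

# The number of models grows quickly with the number of variables.
for n in (10, 100, 1000):
    print(n, "variables:", comb(n, 2), "pairs,", comb(n, 3), "triples")
```

With 1000 variables there are already 499500 pairs to evaluate, which is why data mining methods that search the space selectively become attractive.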
Linear regression
  • Models the outcome as a linear combination of the parameters (but not necessarily of the independent variables).
  • Ex: Let y = incidence of disease, with n data points and independent variables A, B.

1) $y_i = b_0 + b_1 A_i + \varepsilon_i$, $i = 1, \dots, n$

2) $y_i = b_0 + b_2 B_i^2 + \varepsilon_i$, $i = 1, \dots, n$

where $b_0$, $b_1$, $b_2$ are parameters and $\varepsilon_i$ is the error term.

In both of these examples, the disease is modeled as linear in the parameters, although the second is quadratic in the variable B.

Linear regression

Given a sample, we estimate the parameters (for example, by least squares) to arrive at the fitted linear regression model [1].
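As an illustration of least-squares estimation, here is a minimal sketch that fits the second example model, which is quadratic in B but linear in the parameters. The data are hypothetical and generated without noise so that the recovered parameters are easy to check; NumPy's `numpy.linalg.lstsq` is one of several ways to solve the problem.

```python
import numpy as np

# Hypothetical, noise-free data generated from y = 1 + 2 * B^2.
B = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * B**2

# Design matrix: an intercept column and a B^2 column.
# The model is quadratic in B but linear in the parameters b0 and b2.
X = np.column_stack([np.ones_like(B), B**2])

# Ordinary least squares: minimize the sum of squared residuals.
coef, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)
b0_hat, b2_hat = coef
print(b0_hat, b2_hat)
```

Because the squared term is just another column of the design matrix, the same solver handles both example models.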

Multiple regression
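  • Relates the outcome to a linear combination of several predictor variables.
  • Ex: Let y = incidence of disease, with n data points and independent variables $x_1, \dots, x_p$.

$y_i = b_0 + b_1 x_{i1} + b_2 x_{i2} + \dots + b_p x_{ip} + \varepsilon_i$, $i = 1, \dots, n$

Best-fit line: $\hat{y}_i = \hat{b}_0 + \hat{b}_1 x_{i1} + \hat{b}_2 x_{i2} + \dots + \hat{b}_p x_{ip}$

For each unit increase in $x_{ip}$, $\hat{y}_i$ is expected to increase by $\hat{b}_p$, holding the other predictors fixed.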
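The interpretation of a coefficient as the expected change per unit increase can be checked numerically. The sketch below uses hypothetical, noise-free data with two predictors; the helper `predict` is illustrative and not part of any library.

```python
import numpy as np

# Hypothetical predictors (columns x1, x2) and a noise-free outcome
# generated from y = 0.5 + 1.5 * x1 - 2.0 * x2.
x = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 3.0], [4.0, 2.0]])
y = 0.5 + 1.5 * x[:, 0] - 2.0 * x[:, 1]

# Prepend an intercept column and fit by least squares.
X = np.column_stack([np.ones(len(x)), x])
b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)


def predict(row):
    """Fitted value for one observation (illustrative helper)."""
    return b_hat[0] + row @ b_hat[1:]


base = np.array([1.0, 1.0])
bumped = base + np.array([1.0, 0.0])  # raise x1 by one unit, keep x2 fixed
change = predict(bumped) - predict(base)
print(change, b_hat[1])  # the change in the fitted value matches the x1 coefficient
```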

Logistic regression[1]
  • Often used when the outcome is binary. Relates the log-odds of an event to a linear combination of predictor variables. Ex:
  • $\ln\left(\frac{p}{1 - p}\right) = \alpha + \beta x_B + \gamma x_C + i\,x_B x_C$,

where $x_B$ and $x_C$ are measured binary indicator variables, the regression coefficients $\beta$ and $\gamma$ represent main effects, and $i$ represents the interaction.
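With two binary indicators there are only four combinations, so the four coefficients can be read directly off the four disease probabilities. The following is a minimal sketch of that calculation; the probabilities are hypothetical values chosen for illustration, and the helper names are not from any library.

```python
import math


def logit(p):
    """Log-odds of a probability p."""
    return math.log(p / (1.0 - p))


# Hypothetical P(disease | x_B, x_C) for the four indicator combinations.
p = {(0, 0): 0.10, (1, 0): 0.20, (0, 1): 0.20, (1, 1): 0.60}

alpha = logit(p[(0, 0)])                               # baseline log-odds
beta = logit(p[(1, 0)]) - alpha                        # main effect of x_B
gamma = logit(p[(0, 1)]) - alpha                       # main effect of x_C
interaction = logit(p[(1, 1)]) - alpha - beta - gamma  # the i term


def predicted_probability(x_b, x_c):
    """Invert the log-odds model to recover a probability."""
    log_odds = alpha + beta * x_b + gamma * x_c + interaction * x_b * x_c
    return 1.0 / (1.0 + math.exp(-log_odds))


for key in p:
    print(key, round(predicted_probability(*key), 3))
print("interaction term:", round(interaction, 3))
```

A nonzero interaction term means the log-odds when both indicators are 1 differs from what the two main effects alone would predict.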

Other statistical methods [1]
  • Bayesian model selection: a statistical approach incorporating both prior distributions for parameters and observed data into the model.
  • Maximum likelihood: a statistical method used to make inferences about the combination of parameter values resulting in the highest probability of obtaining the observed data.
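As a small worked example of maximum likelihood, the sketch below estimates a single disease probability from hypothetical counts by scanning a grid of candidate values and keeping the one with the highest binomial log-likelihood. The counts and the grid resolution are assumptions made for illustration; in practice the maximum is usually found analytically or with a numerical optimizer.

```python
import math


def log_likelihood(p, cases, n):
    """Binomial log-likelihood of p (the constant binomial coefficient is dropped)."""
    return cases * math.log(p) + (n - cases) * math.log(1.0 - p)


# Hypothetical data: 30 affected individuals out of 100.
cases, n = 30, 100

# Candidate parameter values 0.001, 0.002, ..., 0.999.
grid = [k / 1000 for k in range(1, 1000)]
p_hat = max(grid, key=lambda p: log_likelihood(p, cases, n))
print(p_hat)
```

The grid maximum coincides with the sample proportion cases / n, which is the known closed-form maximum likelihood estimate for a binomial probability.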
Modeling Terminology[1]
  • Saturated: a statistical model that is as full as possible (saturated) with parameters.
  • Entropy: a measure of the uncertainty associated with a random variable.
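For a discrete random variable, entropy is commonly measured with the Shannon formula $H = -\sum_k p_k \log_2 p_k$. The following minimal sketch assumes that definition and applies it to genotype frequencies of 0.25, 0.50, and 0.25; the function name `entropy_bits` is illustrative.

```python
import math


def entropy_bits(probabilities):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


genotype_freqs = [0.25, 0.50, 0.25]  # frequencies of three genotypes at one locus
genotype_entropy = entropy_bits(genotype_freqs)
coin_entropy = entropy_bits([0.5, 0.5])  # a fair coin, for comparison
print(genotype_entropy, coin_entropy)
```

The genotype distribution carries 1.5 bits of uncertainty, compared with 1 bit for a fair coin.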
Modeling Terminology[1]
  • Cross-validation: partitioning a data set into n subsets, then using each subset in turn as the test set while using the other n-1 to train.
  • Overfitting: a model that provides a good fit to a specific data set but generalizes poorly.
  • Marginal effects: the effects of one parameter averaged over the possible values taken by other parameters.
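The cross-validation partitioning described above can be sketched in a few lines. This version splits sample indices into contiguous folds without shuffling, which is a simplifying assumption; real analyses usually shuffle or stratify first. The function name `k_fold_splits` is hypothetical.

```python
def k_fold_splits(n_samples, n_folds):
    """Yield (train_indices, test_indices), using each fold once as the test set."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [
        n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
        for i in range(n_folds)
    ]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


splits = list(k_fold_splits(10, 3))
for train, test in splits:
    print("test:", test, "train:", train)
```

Every sample appears in exactly one test set, so each observation is used for evaluation once and for training in the remaining folds.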

Marginal Effects [2]

Marginal penetrance: the probability of disease given the genotype at one locus, irrespective of the genotype at the other. Ex: P(D | A = Aa), irrespective of the value of B.

Table II. Penetrance values for combinations of genotypes from two single nucleotide polymorphisms exhibiting interactions in the absence of independent main effects

Genotype                 AA (0.25)   Aa (0.50)   aa (0.25)   Marginal penetrance B
BB (0.25)                0           1           0           0.5
Bb (0.50)                1           0           1           0.5
bb (0.25)                0           1           0           0.5
Marginal penetrance A    0.5         0.5         0.5

Genotype frequencies are given in parentheses. The final column and the final row give the marginal penetrance values for the B and A genotypes, respectively.
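The marginal penetrance values in Table II can be reproduced by averaging each row or column over the genotype frequencies at the other locus. The sketch below assumes the genotypes at the two loci are independent, so that the frequency of a combination is the product of the two single-locus frequencies; the variable names are illustrative.

```python
# Genotype frequencies from Table II.
freq_a = {"AA": 0.25, "Aa": 0.50, "aa": 0.25}
freq_b = {"BB": 0.25, "Bb": 0.50, "bb": 0.25}

# Penetrance P(D | A genotype, B genotype) from Table II.
penetrance = {
    ("AA", "BB"): 0, ("Aa", "BB"): 1, ("aa", "BB"): 0,
    ("AA", "Bb"): 1, ("Aa", "Bb"): 0, ("aa", "Bb"): 1,
    ("AA", "bb"): 0, ("Aa", "bb"): 1, ("aa", "bb"): 0,
}

# Average over the other locus, weighting by its genotype frequencies
# (assumes the two loci are independent).
marginal_a = {
    a: sum(freq_b[b] * penetrance[(a, b)] for b in freq_b) for a in freq_a
}
marginal_b = {
    b: sum(freq_a[a] * penetrance[(a, b)] for a in freq_a) for b in freq_b
}
print(marginal_a)
print(marginal_b)
```

Every marginal value equals 0.5, so neither locus shows an effect on its own even though the joint penetrance depends strongly on the combination of genotypes.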

Weka [3]
  • A collection of visualization tools and algorithms for data analysis and predictive modeling.
  • Preprocessing tools for reading data in a variety of formats and transforming it.
  • Classification algorithms include regression, neural networks, support vector machines, and decision trees. Displays include ROC curves.
  • Clustering: k-means, expectation maximization.
  • Visualization includes scatter plots and bar graphs.
References
  • [1] Cordell, 2009. Detecting gene–gene interactions that underlie human diseases. Nature Reviews Genetics.
  • [2] McKinney et al., 2006. Machine Learning for Detecting Gene-Gene Interactions: A Review. Biomedical Genomics and Proteomics.
  • [3] Weka site: http://www.cs.waikato.ac.nz/ml/weka