Intro. Bioinformatics

Intro. Bioinformatics. Spencer Muse, NCSU Statistics; Hamid Ashrafi, NCSU Horticultural Science; Fred Wright, NCSU Statistics/Biological Sciences. Block 1: DNA Sequence Analysis, 8/17 – 9/21, Spencer Muse. Genetics Basics. Basic Concepts. For most organisms DNA is the genetic material.


Presentation Transcript


  1. Intro. Bioinformatics • Spencer Muse, NCSU Statistics • Hamid Ashrafi, NCSU Horticultural Science • Fred Wright, NCSU Statistics/Biological Sciences • Block 1: DNA Sequence Analysis, 8/17 – 9/21, Spencer Muse

  2. Genetics Basics

  3. Basic Concepts For most organisms • DNA is the genetic material • DNA composes chromosomes • Chromosomes are found in the nucleus of cells • Chromosomes are inherited by offspring from parents

  4. DNA, the Genetic Material • DNA is a chain of nucleotides, or bases. • DNA has 4 different nucleotides: • A: adenine • C: cytosine • G: guanine • T: thymine • In RNA, U (uracil) takes the place of T. • DNA is often found in a double-stranded form. A pairs with T, C pairs with G:
       ATGCTACTTCACTGA
       |||||||||||||||
       TACGATGAAGTGACT
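The pairing rule on this slide (A with T, C with G) can be sketched in a few lines of Python. The names `PAIRS` and `complement` are illustrative and not part of the original slides; the function returns the partner strand in the same left-to-right order as the slide shows it.

```python
# Watson-Crick pairing rules from the slide: A pairs with T, C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand: str) -> str:
    """Return the base-by-base partner strand, in the same left-to-right order."""
    return "".join(PAIRS[base] for base in strand.upper())

top = "ATGCTACTTCACTGA"
print(top)
print("|" * len(top))      # one bar per base pair, as drawn on the slide
print(complement(top))
```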

  5. Genes A gene is a small region of a chromosome (and is thus simply a string of nucleotides). ATGCTACTTCACTGA

  6. The Genetic Code • Protein-coding genes are composed of triplets of nucleotides called codons. • Each codon encodes one of 20 possible amino acids or a stop signal. • Chains of amino acids form proteins.
       ATG CTA CTT CAC TGA
       Met Leu Leu His Stop
        M   L   L   H   *
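As a minimal sketch of reading a gene codon by codon, the snippet below splits a DNA string into triplets and looks each one up. The table is deliberately partial: it contains only the five codons shown on the slide (a full genetic code table has 64 entries), and the names `CODON_TABLE`, `split_codons`, and `translate` are hypothetical.

```python
# Partial codon table: only the codons that appear on the slide.
# Each entry maps a DNA codon to (three-letter name, one-letter symbol).
CODON_TABLE = {
    "ATG": ("Met", "M"),
    "CTA": ("Leu", "L"),
    "CTT": ("Leu", "L"),
    "CAC": ("His", "H"),
    "TGA": ("Stop", "*"),
}

def split_codons(dna: str) -> list[str]:
    """Cut a DNA string into consecutive triplets, dropping any incomplete tail."""
    usable = len(dna) - len(dna) % 3
    return [dna[i:i + 3] for i in range(0, usable, 3)]

def translate(dna: str) -> str:
    """Translate DNA into one-letter amino acid symbols ('*' marks a stop codon)."""
    return "".join(CODON_TABLE[codon][1] for codon in split_codons(dna.upper()))

gene = "ATGCTACTTCACTGA"
print(" ".join(split_codons(gene)))
print(" ".join(CODON_TABLE[c][0] for c in split_codons(gene)))
print(translate(gene))
```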

  7. Central Dogma
       DNA      ATG CTA CTT CAC TGA
                  (transcription)
       RNA      AUG CUA CUU CAC UGA
                  (translation)
       Protein  M   L   L   H
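The two steps on this slide can be sketched as two small functions. This assumes the DNA shown is the coding strand, so transcription amounts to replacing T with U; the RNA codon table is again limited to the codons on the slide, and `transcribe` and `translate_rna` are illustrative names.

```python
# RNA codons from the slide only; '*' marks the stop codon.
RNA_CODONS = {"AUG": "M", "CUA": "L", "CUU": "L", "CAC": "H", "UGA": "*"}

def transcribe(coding_strand: str) -> str:
    """DNA coding strand -> RNA: same sequence with U in place of T."""
    return coding_strand.upper().replace("T", "U")

def translate_rna(rna: str) -> str:
    """RNA -> protein: read codons in order and stop at the first stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = RNA_CODONS[rna[i:i + 3]]
        if amino_acid == "*":
            break  # the stop codon ends the protein and is not part of it
        protein.append(amino_acid)
    return "".join(protein)

dna = "ATGCTACTTCACTGA"
rna = transcribe(dna)
print(rna)
print(translate_rna(rna))
```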

  8. Anatomy of a Gene [Figure: gene diagram labeling the promoter, exons, and introns]

  9. DNA to RNA to Protein

  10. Keywords • DNA • RNA • Nucleotide • Base • Transcription • Translation • Intron • Exon • RNA polymerase • Promoter • Chromosome • Gene • Protein • Amino acid • Splicing • Nucleus

  11. Phenotypes • Which genes affect a phenotype? • Relating genetic variation to phenotypic variation

  12. SNPs • Single Nucleotide Polymorphisms • Very dense SNP maps are currently being produced (1,000,000+ in humans) • Fast, cheap to score

  13. Gene Expression Profiling using DNA Microarrays • Each spot corresponds to a single human gene • Signal color and intensity reveal changes in gene activity [Figure: microarray image with spots labeled Plasminogen activator inhibitor-2 and HMG CoA reductase]

  14. Other Markers • SSRs (Simple Sequence Repeats; microsatellites) • RFLPs (Restriction Fragment Length Polymorphisms) • SSCPs (Single-Strand Conformation Polymorphisms)

  15. Statistics Overview

  16. Overview • Random Variables and Probability • Probability Distributions • Parameter Estimation • Hypothesis Testing • Likelihood • Conditional Probability • Stochastic Processes • Inference for Stochastic Processes

  17. Probability The probability of a particular event occurring is the frequency of that event over a very long series of repetitions. • P(tossing a head) = 0.50 • P(rolling a 6) = 0.167 • P(average age in a population sample is greater than 21) = 0.25
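The long-run frequency idea can be illustrated by simulation. The sketch below tosses a simulated fair coin and reports the fraction of heads for increasing numbers of tosses; the function name `head_frequency` and the fixed seed are assumptions made for reproducibility, not part of the slides.

```python
import random

def head_frequency(n_tosses: int, seed: int = 0) -> float:
    """Fraction of heads in n_tosses simulated tosses of a fair coin."""
    rng = random.Random(seed)  # private generator so results are repeatable
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

# The observed frequency should settle near 0.50 as the series gets longer.
for n in (10, 1_000, 100_000):
    print(n, head_frequency(n))
```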

  18. Random Variables A random variable is a quantity that cannot be measured or predicted with absolute accuracy.

  19. Probability Distributions • The distribution of a random variable describes the possible values of the variable and the probabilities of each value. • For discrete random variables, the distribution can be enumerated; for continuous ones we describe the distribution with a function.
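For a discrete random variable, "enumerating the distribution" means listing every possible value with its probability. A minimal sketch, using the binomial distribution with illustrative values n = 4 and p = 0.5 (these numbers are not from the slides):

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 4, 0.5  # illustrative values: number of heads in 4 fair coin tosses
distribution = {k: binomial_pmf(k, n, p) for k in range(n + 1)}

for value, probability in distribution.items():
    print(value, probability)

# The probabilities of all possible values must add up to 1.
print("total:", sum(distribution.values()))
```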

  20. Examples of Distributions [Figure: binomial and normal distributions]

  21. Parameter Estimation One of the primary goals of statistical inference is to estimate unknown parameters. For example, using a sample taken from the target population, we might estimate the population mean using several different statistics: the sample mean, the sample median, or the sample mode. Different statistics have different sampling properties.
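The three statistics named on this slide can be computed with Python's standard `statistics` module. The sample of ages below is made up for illustration; note how the single large value pulls the mean above the median.

```python
import statistics

# Hypothetical sample of ages drawn from some target population.
sample = [19, 21, 21, 22, 24, 25, 36]

sample_mean = statistics.mean(sample)      # sensitive to the outlier (36)
sample_median = statistics.median(sample)  # middle value of the sorted sample
sample_mode = statistics.mode(sample)      # most frequent value

print("mean:", sample_mean)
print("median:", sample_median)
print("mode:", sample_mode)
```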

  22. Hypothesis Testing A second goal of statistical inference is testing the validity of hypotheses about parameters using sample data. For example, consider testing the null hypothesis $H_0: p = 0.5$ against the alternative $H_A: p > 0.5$. If the observed frequency is much greater than 0.5, we should reject the null hypothesis in favor of the alternative hypothesis. How do we decide what “much greater” is?
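One common way to make "much greater" precise is a p-value: the probability, if the null hypothesis were true, of seeing a result at least as extreme as the one observed. The sketch below assumes the null hypothesis is a success probability of 0.5 with a one-sided alternative, and uses made-up data (60 successes in 100 trials); `upper_tail_p_value` is a hypothetical helper name.

```python
from math import comb

def upper_tail_p_value(successes: int, n: int, p0: float = 0.5) -> float:
    """P(X >= successes) when X ~ Binomial(n, p0), i.e. under the null hypothesis."""
    return sum(
        comb(n, k) * p0**k * (1 - p0) ** (n - k)
        for k in range(successes, n + 1)
    )

# Made-up data: 60 successes in 100 trials, observed frequency 0.60.
p_value = upper_tail_p_value(60, 100)
print("p-value:", p_value)
print("reject at the 5% level:", p_value < 0.05)
```

A small p-value means the observed frequency would be unusual if the true value were 0.5; the 5% cutoff used here is a convention, not something fixed by the slides.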

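  23. Likelihood For our purposes, it is sufficient to define the likelihood function as $L(\theta) = \Pr(\text{data} \mid \theta)$, the probability of the observed data viewed as a function of the unknown parameter $\theta$. Analyses based on the likelihood function are well-studied, and usually have excellent statistical properties.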

  24. Maximum Likelihood Estimation The maximum likelihood estimate of an unknown parameter is defined to be the value of that parameter that maximizes the likelihood function: $\hat{\theta} = \arg\max_{\theta} L(\theta)$. We say that $\hat{\theta}$ is the maximum likelihood estimate of $\theta$.

  25. Example: Binomial Probability If $X \sim \text{Binomial}(n, p)$, then $L(p) = \binom{n}{x} p^{x} (1-p)^{n-x}$. Some simple calculus shows that the MLE of $p$ is $\hat{p} = x/n$, the frequency of “successes” in our sample of size n. If we had been unable to do the calculus, we could still have found the MLE by plotting the likelihood: [Figure: plot of the likelihood]
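The "plot the likelihood and look for the peak" approach can be imitated numerically by evaluating the binomial log-likelihood on a grid of candidate values and keeping the best one. This is a minimal sketch with made-up data (7 successes in 20 trials); `log_likelihood` and `grid_mle` are illustrative names, and the grid resolution is an arbitrary choice.

```python
from math import comb, log

def log_likelihood(p: float, x: int, n: int) -> float:
    """Log of the binomial likelihood for x successes in n trials."""
    return log(comb(n, x)) + x * log(p) + (n - x) * log(1 - p)

def grid_mle(x: int, n: int, steps: int = 1000) -> float:
    """Candidate value of p on an evenly spaced grid with the highest log-likelihood."""
    grid = [i / steps for i in range(1, steps)]  # excludes p = 0 and p = 1
    return max(grid, key=lambda p: log_likelihood(p, x, n))

x, n = 7, 20  # made-up data
print("grid search MLE:", grid_mle(x, n))
print("closed form x/n:", x / n)
```

Working with the log of the likelihood does not move the location of the maximum and avoids very small numbers when n is large.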

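  26. Likelihood Ratio Tests Consider testing the hypothesis: $H_0: \theta \in \Theta_0$ versus $H_A: \theta \in \Theta$. The likelihood ratio test statistic is: $\Lambda = \dfrac{\max_{\theta \in \Theta_0} L(\theta)}{\max_{\theta \in \Theta} L(\theta)}$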

  27. Distribution of the Likelihood Ratio Test Statistic Under quite general conditions, $-2 \ln \Lambda \sim \chi^2_{n-1}$, where $\Lambda$ is the likelihood ratio test statistic and n-1 is the difference between the number of free parameters in the two hypotheses.
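A small worked case may help. The sketch below assumes a binomial setting where the null hypothesis fixes the success probability at 0.5 and the alternative leaves it free, so the two hypotheses differ by one free parameter and the reference distribution is chi-square with 1 degree of freedom. For 1 degree of freedom the upper tail probability equals `erfc(sqrt(x / 2))`, which keeps the example within the standard library. The data and the name `binomial_lrt` are made up.

```python
from math import erfc, log, sqrt

def binomial_lrt(x: int, n: int, p0: float = 0.5) -> tuple[float, float]:
    """Return (-2 ln Lambda, approximate p-value) for H0: p = p0 vs. p unrestricted."""
    p_hat = x / n  # maximum likelihood estimate under the alternative

    def loglik(p: float) -> float:
        # Terms with a zero count are skipped so that log(0) is never evaluated.
        total = 0.0
        if x > 0:
            total += x * log(p)
        if n - x > 0:
            total += (n - x) * log(1 - p)
        return total

    statistic = max(0.0, -2.0 * (loglik(p0) - loglik(p_hat)))
    p_value = erfc(sqrt(statistic / 2.0))  # chi-square tail, 1 degree of freedom
    return statistic, p_value

statistic, p_value = binomial_lrt(60, 100)  # made-up data
print("test statistic:", statistic)
print("approximate p-value:", p_value)
```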

  28. Conditional Probability The conditional probability of event A given that event B has happened is $P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$, defined when $P(B) > 0$.
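The definition can be checked by direct enumeration on a fair six-sided die. In this illustrative example (not from the slides), A is "the roll is a 6" and B is "the roll is even"; exact fractions are used so no rounding is involved.

```python
from fractions import Fraction

OUTCOMES = range(1, 7)  # faces of a fair six-sided die, all equally likely

def prob(event) -> Fraction:
    """Probability of an event, given as a true/false function of the outcome."""
    favorable = sum(1 for outcome in OUTCOMES if event(outcome))
    return Fraction(favorable, len(OUTCOMES))

def conditional(a, b) -> Fraction:
    """P(A | B) = P(A and B) / P(B); requires P(B) > 0."""
    return prob(lambda o: a(o) and b(o)) / prob(b)

def is_six(outcome: int) -> bool:
    return outcome == 6

def is_even(outcome: int) -> bool:
    return outcome % 2 == 0

print("P(six)        =", prob(is_six))                 # 1/6
print("P(six | even) =", conditional(is_six, is_even))  # 1/3
```

Knowing that the roll is even removes three of the six outcomes, which is why the probability of a 6 rises from 1/6 to 1/3.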

  29. Stochastic Processes A stochastic process is a series of random variables measured over time. Values in the future typically depend on current values. • Closing value of the stock market • Annual per capita murder rate • Current temperature

  30. ACGGTTACGGATTGTCGAA  t = 0
      ACaGTTACGGATTGTCGAA  t = 1
      ACaGTTACGGATgGTCGAA  t = 2
      ACcGTTACGGATgGTCGAA  t = 3
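A sequence changing over time like this can be simulated as a simple stochastic process. The sketch below makes strong simplifying assumptions that are not from the slides: exactly one substitution per time step, every site equally likely to change, and every replacement base equally likely. Changed sites are written in lowercase, following the slide's convention.

```python
import random

BASES = "ACGT"

def mutate_once(seq: str, rng: random.Random) -> str:
    """Substitute one randomly chosen site with a different base (shown in lowercase)."""
    pos = rng.randrange(len(seq))
    current = seq[pos].upper()
    new_base = rng.choice([b for b in BASES if b != current])
    return seq[:pos] + new_base.lower() + seq[pos + 1:]

def simulate(seq: str, steps: int, seed: int = 0) -> list[str]:
    """Return the sequence at t = 0, 1, ..., steps; each state depends only on the previous one."""
    rng = random.Random(seed)
    history = [seq]
    for _ in range(steps):
        history.append(mutate_once(history[-1], rng))
    return history

for t, state in enumerate(simulate("ACGGTTACGGATTGTCGAA", 3)):
    print(state, " t =", t)
```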

  31. Inference for Stochastic Processes We often need to make inferences that involve the changes in molecular genetic sequences over time. Given a model for the process of sequence evolution, likelihood analyses can be performed.
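As a deliberately simplified illustration of likelihood-based inference on sequences, suppose each site independently differs between two time points with the same unknown probability p. Under that assumed model (it is not stated in the slides) the number of differing sites is binomial, so the MLE of p is the observed proportion of differing sites. The sequences are the t = 0 and t = 3 states from the previous slide; `observed_difference` is a hypothetical helper.

```python
def observed_difference(seq_a: str, seq_b: str) -> tuple[int, int]:
    """Return (number of differing sites, number of sites compared), ignoring case."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    differences = sum(a != b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return differences, len(seq_a)

start = "ACGGTTACGGATTGTCGAA"  # t = 0 on the slide
end = "ACcGTTACGGATgGTCGAA"    # t = 3 on the slide

differences, sites = observed_difference(start, end)
p_hat = differences / sites    # binomial MLE: frequency of differing sites
print(differences, "of", sites, "sites differ; estimated p =", p_hat)
```

This simple count understates the amount of change whenever a site is hit more than once, as happens at the third position on the slide (G to a to c). Fuller models of sequence evolution account for such repeated substitutions.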
