Lecture 19: Association Studies II

Lecture 19: Association Studies II. Date: 10/29/02. Topics: finish case-control, TDT, relative risk. Review: Case-Control – Derivation VIII. Correction: Case-Control – Hypothesis Testing. Recall that the trait allele frequencies are set in stone to calculate the trait prevalence K.

Presentation Transcript


  1. Lecture 19: Association Studies II • Date: 10/29/02 • Finish case-control • TDT • Relative Risk

  2. REVIEW Case-Control – Derivation VIII

  3. CORRECTION Case-Control – Hypothesis Testing • Recall that the trait allele frequencies are set in stone to calculate the trait prevalence K. • Model 1 (HWE, no LE): There are 2n distinct haplotypes, thus there are 2n – 2 degrees of freedom. • Restricted Model 0 (HWE, LE): There are n distinct alleles, thus there are n – 1 degrees of freedom. • 2(lnL1 – lnL0), with (2n – 2) – (n – 1) = n – 1 degrees of freedom, tests for LE under the assumption of HWE. • Calculate the MLE for Model 1 with a modified EM.
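
The degrees-of-freedom count on slide 3 can be written out as a short worked equation. This is a sketch under the usual large-sample assumption that the likelihood ratio statistic is approximately chi-square; L_1 and L_0 denote the maximized likelihoods under Model 1 and the restricted Model 0, and the name G is borrowed from slide 5.

```latex
G = 2\left(\ln L_1 - \ln L_0\right), \qquad
\mathrm{df} = \underbrace{(2n - 2)}_{\text{Model 1}} - \underbrace{(n - 1)}_{\text{Model 0}} = n - 1,
\qquad G \;\overset{\text{approx.}}{\sim}\; \chi^2_{\,n-1} \ \text{under LE.}
```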

  4. Estimating Genetic Parameters • h = (p1, p2, f11, f12, f22) are the genetic parameters underlying the theoretical distribution of genotypes in the case-control approach. • When the genetic model, and thus h, is unknown, one resorts to contingency tables. • Can the data be used to estimate h?

  5. Estimating Genetic Parameters • One could estimate the haplotype frequencies h1i, h2i simultaneously with the genetic parameters h. • Then, 2[lnL(h1i, h2i, h) – lnL(qi, h)] is a statistic for testing linkage equilibrium without conditioning on known genetic parameters. • However, the G statistic above has an unknown distribution: under linkage equilibrium the marker locus and disease locus are independent, so L(qi, h) does not actually depend on h.

  6. Spurious Associations (4.6.4) • Population subdivision, or any of the other causes of linkage disequilibrium we discussed last time, can cause spurious associations, i.e. linkage disequilibrium not caused by tight linkage. • Population subdivision is probably the most common source of spurious associations. • Other sources of spurious association cannot be accommodated so easily, except to know your population and know what is greater than “normal” association in this population.

  7. Population Subdivision – Identifying Subpopulations • Identify subpopulations where matings occur randomly. These are subpopulations which will differ in trait and marker allele frequencies. Sometimes, a priori information is available about subpopulations in which these allele frequencies differ. • Often subdivide by ethnicity, location, religion, social class, and age.

  8. Population Subdivision - Sampling Designs • Sample only from one identified subdivision. • Match case and control by subdivision. • In complex traits, there may be multiple loci associated with a disease, and these loci may vary between subpopulations. Which sampling scheme do you recommend?

  9. Hidden Population Stratification • One cannot anticipate all sources of spurious association. • Internal checks may indicate presence of remaining spurious association. • Test HWE on individual markers. • Test markers on different chromosomes for spurious association. • Trait loci that associate tightly with multiple distant markers are a sign of trouble.
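
One of the internal checks listed on slide 9, testing HWE at an individual marker, can be sketched as follows. This is a minimal illustration, not the lecture's own procedure: it assumes a biallelic marker, uses a Pearson chi-square goodness-of-fit test with 1 degree of freedom, and the function name and genotype counts are made up for the example.

```python
import math

def hwe_chi_square(n_AA, n_Aa, n_aa):
    """Chi-square goodness-of-fit test of HWE at one biallelic marker.

    The genotype counts are hypothetical inputs, not lecture data.
    """
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)          # estimated frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_AA, n_Aa, n_aa)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # 3 genotype classes - 1 - 1 estimated allele frequency = 1 df.
    # For 1 df the chi-square upper tail equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical sample with a deficit of heterozygotes.
stat, p_value = hwe_chi_square(n_AA=30, n_Aa=40, n_aa=30)
print(stat, p_value)
```

A deficit of heterozygotes like the one in this made-up sample is the pattern expected when subpopulations with different allele frequencies are pooled.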

  10. Using Families – Removing Spurious Association • The effect of spurious association can be removed by comparing the chromosomes of affected children to their relatives. • The most common relative to use? Parents. • This does NOT mean that we are returning to family-based linkage analysis. As you will see, we still use information from multiple generations of recombination.

  11. Moving to Biallelic Model • [Figure: two panels, labeled "linkage disequilibrium" and "linkage equilibrium"]

  12. TDT – Assumptions • Depends on the presence of linkage disequilibrium at the population level. • Assumes random mating.

  13. TDT – Genetic Model • [Diagram: marker locus A and trait locus D, separated by recombination fraction q] • Allele Frequencies: P(A) = pA, P(a) = 1 – pA, P(D) = pD, P(d) = 1 – pD • Linkage Disequilibrium: DAD = hAD – pApD

  14. TDT – Haplotype Frequencies

  15. TDT – The Test • Assume we randomly sample affected individuals and then genotype each individual and his/her two parents for marker A. • Take those families where the parents are heterozygous for the marker. • Record the data as transmitted and nontransmitted alleles. A table as shown on the next slide is typically used.

  16. TDT – The Table • [Table: transmitted versus nontransmitted marker alleles] • N is the number of affected children sampled.

  17. TDT – Filling the Table • Parents: Aa × Aa • Affected child: AA • n12 += _____ • n21 += _____

  18. TDT – Filling the Table • Parents: Aa × Aa • Affected child: Aa • n12 += _____ • n21 += _____
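
The bookkeeping behind the two exercises above can be sketched in a few lines. This is an illustration under stated assumptions: genotypes are two-character strings, the marker is biallelic, and the function name is hypothetical. It returns how many heterozygous parents transmitted A and how many transmitted a; which of these maps to n12 and which to n21 depends on the table layout on slide 16.

```python
def tdt_increments(father, mother, child, allele="A"):
    """Count transmissions from heterozygous parents to one affected child.

    Returns (number of heterozygous parents that transmitted `allele`,
             number of heterozygous parents that transmitted the other allele).
    """
    for f_allele in father:
        for m_allele in mother:
            # Find a parental transmission pattern that yields the child's genotype.
            if sorted(f_allele + m_allele) == sorted(child):
                sent_allele = sent_other = 0
                for parent, passed in ((father, f_allele), (mother, m_allele)):
                    if parent[0] != parent[1]:      # only heterozygous parents count
                        if passed == allele:
                            sent_allele += 1
                        else:
                            sent_other += 1
                return sent_allele, sent_other
    raise ValueError("child genotype is inconsistent with the parents")

print(tdt_increments("Aa", "Aa", "AA"))   # both parents passed on A
print(tdt_increments("Aa", "Aa", "Aa"))   # one parent passed A, the other a
```

For a biallelic marker the counts do not depend on which consistent transmission pattern is picked, so taking the first match is enough here.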

  19. TDT – Statistic

  20. TDT – Derivation • [Table: nontransmitted versus transmitted alleles] • Under H0 the expected frequencies are equal.

  21. TDT – Example • Search for Insulin-Dependent Diabetes Mellitus (IDDM) (Spielman et al. 1993). • 94 families were included in the study. • 62 families had heterozygous parents at a marker on chromosome 11 with possible alleles “1” and “X”, giving 124 heterozygous parents. • 78 “1” alleles were transmitted to affected children. • 124 – 78 = 46 “X” alleles were transmitted to affected children.

  22. TDT – Example (cont)
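
The counts from the IDDM example can be plugged into the TDT statistic as a quick check. The formula on slide 19 did not survive in this transcript, so this sketch assumes the McNemar-type form from Spielman et al. (1993), (b – c)² / (b + c), compared against a chi-square distribution with 1 degree of freedom; the function and variable names are illustrative.

```python
import math

def tdt_statistic(b, c):
    """TDT chi-square statistic (assumed McNemar form) and its 1-df p-value.

    b: heterozygous parents transmitting one allele to the affected child
    c: heterozygous parents transmitting the other allele
    """
    stat = (b - c) ** 2 / (b + c)
    # For 1 df the chi-square upper tail equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Counts from slide 21: 78 "1" alleles and 46 "X" alleles transmitted.
stat, p_value = tdt_statistic(78, 46)
print(round(stat, 2), p_value)
```

With these counts the statistic is 1024 / 124, roughly 8.26, which exceeds the 1-df critical value of 3.84 at the 5% level.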

  23. TDT - Power • How do we calculate the power of a TDT test? Make assumptions

  24. TDT – Power (cont) • Statistical power is given by

  25. TDT – Power (cont) • Power increases with sample size (number of affected children). • Power increases as the recombination fraction decreases. • Power increases as linkage disequilibrium in the population increases. • Power increases as trait allele frequency decreases (trait is rare). • Power is only slightly affected by marker allele frequencies.

  26. TDT – Power Compared • TDT has lower power than a simple test for linkage disequilibrium in a random population sample. • TDT loses power by ignoring some of the data (only heterozygous parents considered) and because homozygous parents provide much information about linkage disequilibrium. • Why is TDT used then?

  27. TDT – Advantages • TDT is a test for linkage and linkage disequilibrium, not just linkage disequilibrium. • Linkage disequilibrium from non-linkage sources can only change the genotypes of the parents. • TDT tests transmission from heterozygous parents, and only linkage can produce a significant result. • TDT can also detect segregation distortion at the marker locus, which is another reason to check marker alleles for segregation distortion.

  28. TDT – Advantages (cont) • [Figure: haplotype transmission diagrams for the "unlinked" and "linked" cases, with haplotypes Ad, AD, and aD]

  29. Relative Risk Method • Analogous to the general disequilibrium test on a random population sample when the trait or marker is dominant or recessive (two genotype classes are indistinguishable). • Observe two independent groups, defined by their marker genotype. • Determine the risk of being affected conditional on group, P(affected | marker group). • Then, the relative risk is the ratio of these risks between the two groups: P(affected | group 1) / P(affected | group 2).
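
The relative risk described above can be illustrated with a small calculation. This is a sketch only: the group sizes and affected counts are invented, and the function name is hypothetical.

```python
def relative_risk(affected_1, total_1, affected_2, total_2):
    """Ratio of P(affected | marker group 1) to P(affected | marker group 2)."""
    risk_1 = affected_1 / total_1   # estimated risk in marker group 1
    risk_2 = affected_2 / total_2   # estimated risk in marker group 2
    return risk_1 / risk_2

# Hypothetical data: 30 of 100 affected in group 1, 10 of 100 in group 2.
rr = relative_risk(30, 100, 10, 100)
print(rr)
```

A relative risk near 1 means the marker groups carry the same risk, which is what independence of marker and trait predicts.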

  30. Relative Risk – Data

  31. Relative Risk – Statistic

  32. Relative Risk – Conditional Probabilities

  33. Relative Risk – Null Distribution

  34. Relative Risk – Statistical Test • Chi-squared test for independence on the table. • Likelihood ratio test: 2 degrees of freedom

  35. Haplotype Relative Risk • Parents: AB × BC • Affected child: BB • case genotype: _____ • control genotype: _____

  36. Haplotype-Based HRR (HHRR) • Focus on alleles rather than genotypes. • There are two transmitted and two non-transmitted alleles in every pair of parents with one affected offspring. • Treat the two allele samples as independent case-control samples.

  37. HHRR – II • Parents: AB × BC • Affected child: BB • case alleles: _____ • control alleles: _____
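
The split into transmitted ("case") and non-transmitted ("control") alleles used by HRR and HHRR can be sketched as follows. The assumptions here are that genotypes are two-character strings and that the first consistent transmission pattern is taken; the function name is hypothetical.

```python
def split_alleles(father, mother, child):
    """Split the four parental alleles into transmitted and non-transmitted.

    Returns (transmitted, nontransmitted), each a list [from_father, from_mother].
    """
    for i, f_allele in enumerate(father):
        for j, m_allele in enumerate(mother):
            # Find a transmission pattern that reproduces the child's genotype.
            if sorted(f_allele + m_allele) == sorted(child):
                transmitted = [f_allele, m_allele]
                nontransmitted = [father[1 - i], mother[1 - j]]
                return transmitted, nontransmitted
    raise ValueError("child genotype is inconsistent with the parents")

# The family from the slide: parents AB and BC, affected child BB.
case_alleles, control_alleles = split_alleles("AB", "BC", "BB")
print(case_alleles, control_alleles)
```

In this family the two transmitted alleles form the case sample and the two alleles left behind form the control sample, which HHRR then treats as independent case-control allele samples.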

  38. HHRR – III

  39. HRR & HHRR • Most powerful when linkage is 0. • Both assume random mating when they assume the parents provide an independent control genotype or alleles. • HHRR is more powerful than TDT because it uses information from homozygous parents. • HHRR is a valid test statistic for DAD = 0 and q = 0.
