    Presentation Transcript
    1. Genomewide Association Studies

    2. Genomewide Association Studies • 1. History • Linkage vs. Association • Power/Sample Size • 2. Human Genetic Variation: SNPs • 3. Direct vs. Indirect Association • Linkage Disequilibrium • 4. SNP selection, Coverage, Study Designs • 5. Genotyping Platforms • 6. Early (recent) GWA Studies

    3. Risch and Merikangas 1996 • Sample Size for Association < Sample Size for Linkage

    4. Risch and Merikangas 1996

    5. Sample Size Required • Linkage Analysis with affected sib pairs • Transmission Disequilibrium Test (TDT) • TDT with affected sib pairs

    6. Affected Sib Pair Linkage Analysis • 2 siblings/family • Both sibs affected • IBD at the marker locus • Expect 50% on average

    7. Identity By Descent [Figure: number of alleles shared IBD by a sibling pair (2, 1, 1, or 0), illustrated with alleles A and a]

    8. Identity By Descent • Expected number of alleles IBD = 2*25% + 1*50% + 0*25% = 1 allele = 50% sharing
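
    The 25% / 50% / 25% split used above can be checked by enumerating which parental allele each sibling inherits. The sketch below is illustrative only; the variable names are assumptions and it assumes each parental allele is transmitted with equal probability.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Each sibling inherits one of two paternal alleles (0 or 1)
# and one of two maternal alleles (0 or 1).
transmissions = list(product([0, 1], repeat=2))  # (paternal, maternal)

# Count how many alleles the two siblings share IBD in each of the
# 16 equally likely transmission combinations.
counts = Counter()
for sib1, sib2 in product(transmissions, repeat=2):
    shared = int(sib1[0] == sib2[0]) + int(sib1[1] == sib2[1])
    counts[shared] += 1

total = sum(counts.values())
distribution = {k: Fraction(v, total) for k, v in counts.items()}
expected_alleles = sum(k * p for k, p in distribution.items())
expected_sharing = expected_alleles / 2  # proportion of the 2 alleles

print(distribution)
print(expected_alleles, expected_sharing)
```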

    9. Risch and Merikangas 1996

    10. Sample Size Calculation • Exposure Frequency • Effect Size • Identity By Descent (IBDM) • Sample Size Required

    11. Sample Size Calculation • Exposure Frequency • Effect Size • Identity By Descent (IBDM) • Sample Size Required • High IBD sharing • Low IBD sharing

    12. TDT • Transmitted alleles vs. non-transmitted alleles [Figure: trio with parental genotypes M1M2 and M2M2 and child genotype M1M2]

    13. TDT • Transmitted alleles vs. non-transmitted alleles • $\mathrm{TDT} = \frac{(n_{12} - n_{21})^2}{n_{12} + n_{21}}$ • Asymptotically $\chi^2$ with 1 degree of freedom

    14. TDT • Transmitted alleles vs. non-transmitted alleles [Figure: trio with parental genotypes M1M2 and M2M2 and child genotype M1M2]

    15. TDT • For this one Trio: $\mathrm{TDT} = \frac{(1 - 0)^2}{1 + 0} = 1$ • p-value = 0.32

    16. TDT • For one hundred Trios: $\mathrm{TDT} = \frac{(60 - 35)^2}{60 + 35} = 6.58$ • p-value = 0.01
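
    A minimal sketch of the TDT calculation from the preceding slides, using only the Python standard library. The function names are hypothetical, and reading `n12` and `n21` as the two transmission counts from heterozygous parents is an assumption about the slide's notation. The p-value uses the identity that the upper tail of a chi-square with 1 degree of freedom equals `erfc(sqrt(x / 2))`.

```python
import math


def tdt_statistic(n12: int, n21: int) -> float:
    """TDT statistic (n12 - n21)^2 / (n12 + n21).

    n12, n21: counts of the two kinds of transmission from heterozygous
    parents (assumed reading of the slide's notation).
    """
    if n12 + n21 == 0:
        raise ValueError("no informative transmissions")
    return (n12 - n21) ** 2 / (n12 + n21)


def chi2_1df_pvalue(stat: float) -> float:
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


# Single trio from the slides: one informative transmission.
one_trio = tdt_statistic(1, 0)
print(one_trio, round(chi2_1df_pvalue(one_trio), 2))

# p-value for the statistic quoted for one hundred trios.
print(round(chi2_1df_pvalue(6.58), 2))
```

    With a single trio the statistic cannot exceed 1, so the p-value cannot fall below about 0.32 however strong the true effect is.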

    17. Risch and Merikangas 1996 TDT

    18. Linkage • Good for Large Effect Sizes • Genomewide Association • Good for Modest Effect Sizes • Not good for rare disease alleles

    19. Two Hypotheses • Common Disease-Common Variant • Common variants • Small to modest effects • Rare Variant • Rare variants • Larger effects

    20. Allele Frequency and Sample Size

    21. GWA Issues • Cost • Sample Size • Effect Size • Disease Allele Frequency • Multiple Testing • SNP selection • How many? • Which SNPs? • Available Genotyping Platforms

    22. Types of Variants • Single Nucleotide Polymorphism (SNP) • Insertion/Deletion (indel) • Microsatellite or Short Tandem Repeat (STR)

    23. What is a SNP?
        Chromosome 1: TTCAGTCAGATCCTAGCCC
                      AAGTCAGTCTAGGATCGGG
        Chromosome 2: TTCAGTCAGATCCCAGCCC
                      AAGTCAGTCTAGGGTCGGG
        SNP: the two chromosomes differ at one base pair (T/A vs. C/G)
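
    As an illustration, the SNP in the example sequences can be located by comparing the two chromosomes position by position. This is a sketch for equal-length, already aligned sequences only; the function name is an assumption, and real variant calling works from sequencing reads rather than two clean strings.

```python
def find_mismatches(seq1: str, seq2: str) -> list[tuple[int, str, str]]:
    """Return (0-based position, base in seq1, base in seq2) for each difference."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned and of equal length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(seq1, seq2)) if a != b]


# Top strands of the two chromosomes shown on the slide.
chromosome_1 = "TTCAGTCAGATCCTAGCCC"
chromosome_2 = "TTCAGTCAGATCCCAGCCC"

mismatches = find_mismatches(chromosome_1, chromosome_2)
print(mismatches)
```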

    24. What is an insertion/deletion?
        Chromosome 1: TTCAGTCAGATCCTAGCCC
                      AAGTCAGTCTAGGATCGGG
        Chromosome 2: TTCAGTCAGATCCCTAGCCC
                      AAGTCAGTCTAGGGATCGGG
        Insertion/Deletion: chromosome 2 carries one extra base pair (C/G)

    25. What is a microsatellite?
        Chromosome 1: TTCACAGCAGCAGCAGAGCCC
                      AAGTGTCGTCGTCGTCTCGGG
        Chromosome 2: TTCACAGCAGCAGAGCCC
                      AAGTGTCGTCGTCTCGGG
        3 vs. 4 trinucleotide repeats
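
    The repeat counts in the microsatellite example can be reproduced by finding the longest run of a repeated motif. Treating `CAG` as the repeat unit is an assumption made for this sketch, since the slide does not name the motif, and the function name is hypothetical.

```python
import re


def longest_repeat_run(sequence: str, motif: str) -> int:
    """Return the largest number of consecutive copies of motif in sequence."""
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    runs = [len(m.group(0)) // len(motif) for m in pattern.finditer(sequence)]
    return max(runs, default=0)


# Top strands of the two chromosomes shown on the slide.
chromosome_1 = "TTCACAGCAGCAGCAGAGCCC"
chromosome_2 = "TTCACAGCAGCAGAGCCC"

repeats_1 = longest_repeat_run(chromosome_1, "CAG")
repeats_2 = longest_repeat_run(chromosome_2, "CAG")
print(repeats_1, repeats_2)
```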

    26. Relative frequency of each type of variant

    27. The Number of SNPs in the Human Genome

    28. How many SNPs? • 6 billion humans • 12 billion chromosomes • 1% frequency SNP • 120 million copies of the minor allele
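
    The arithmetic on this slide, written out as a small sketch. The function name and the choice to pass the frequency as a whole percentage are assumptions made to keep the calculation in exact integers.

```python
def minor_allele_copies(n_people: int, frequency_percent: int) -> int:
    """Copies of the minor allele in a population of diploid individuals."""
    chromosomes = 2 * n_people  # two copies of each autosome per person
    return chromosomes * frequency_percent // 100


copies = minor_allele_copies(6_000_000_000, 1)
print(f"{copies:,}")
```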

    29. Ethnic/Racial Variation in SNP frequency

    30. Rare SNPs across populations

    31. How many of these SNPs have we found? • dbSNP: http://www.ncbi.nlm.nih.gov/projects/SNP/ • 10,430,753 SNPs • 4,868,126 are “validated”

    32. What Risch and Merikangas proposed: • 5 genetic polymorphisms per gene • 100,000 genes (1996) • = 500,000 genotypes per subject • Candidate Gene Study Design • All genes are candidates • Direct or Sequence-based approach • Causal variant is one of the variants tested

    33. Direct vs. Indirect • Sequence-based vs. Map-based

    34. Indirect Association relies on LD Decay • Variants that are close will have high LD • Variants that are far apart will have low LD • Indirect Association is a form of Positional Cloning

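    35. LD Decay • $E(D_t) = D_1 (1-\theta)^t$, where $D_t$ is the current amount of LD, $D_1$ is the initial amount of LD, $\theta$ is the recombination fraction, and $t$ is the number of generations • If $\theta = 0.5$, LD decays at a rate of 50% per generation • If $\theta < 0.5$, LD decay is slower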
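
    A short sketch of the decay formula on this slide, with the recombination fraction passed as `theta`. The function and parameter names are assumptions, and the starting value of 0.25 is an arbitrary illustrative number, not data from the presentation.

```python
def expected_ld(d_initial: float, theta: float, generations: int) -> float:
    """Expected LD after a number of generations: d_initial * (1 - theta)^t."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5]")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return d_initial * (1.0 - theta) ** generations


# Unlinked loci (theta = 0.5): LD halves every generation.
print(expected_ld(0.25, 0.5, 1))
print(expected_ld(0.25, 0.5, 3))

# Tightly linked loci (theta = 0.01): LD persists far longer.
print(round(expected_ld(0.25, 0.01, 100), 4))
```

    The slow decay for small recombination fractions is what lets a genotyped marker stand in for a nearby untyped variant.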

    36. LD Decay over time

    37. Observed LD Decay

    38. Linkage Disequilibrium • $r^2 = \frac{(p_{AB}\, p_{ab} - p_{Ab}\, p_{aB})^2}{p_A\, p_a\, p_B\, p_b}$ • Haplotypes: AB, ab, Ab, aB
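
    A minimal sketch of the r² calculation from the four haplotype frequencies. The function name and the example frequencies are assumptions chosen for illustration; the allele frequencies are derived as marginal sums of the haplotype frequencies.

```python
def r_squared(p_AB: float, p_Ab: float, p_aB: float, p_ab: float) -> float:
    """r^2 between two biallelic loci from their haplotype frequencies."""
    total = p_AB + p_Ab + p_aB + p_ab
    if abs(total - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    # Allele frequencies are the marginal sums of haplotype frequencies.
    p_A, p_a = p_AB + p_Ab, p_aB + p_ab
    p_B, p_b = p_AB + p_aB, p_Ab + p_ab
    denominator = p_A * p_a * p_B * p_b
    if denominator == 0:
        raise ValueError("both loci must be polymorphic")
    d = p_AB * p_ab - p_Ab * p_aB
    return d * d / denominator


print(r_squared(0.5, 0.0, 0.0, 0.5))      # complete LD
print(r_squared(0.25, 0.25, 0.25, 0.25))  # linkage equilibrium
print(round(r_squared(0.4, 0.1, 0.1, 0.4), 2))
```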

    39. Indirect Association and LD • Sample size required for Direct Association, n • Sample size for Indirect Association = n / r² • For r² = 0.8, increase is 25% • For r² = 0.5, increase is 100%
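
    The sample size inflation on this slide can be sketched as follows. The function names are hypothetical, and the direct-association sample size of 1,000 is an arbitrary illustrative value.

```python
def indirect_sample_size(n_direct: float, r2: float) -> float:
    """Sample size for indirect association: n / r^2."""
    if not 0.0 < r2 <= 1.0:
        raise ValueError("r2 must lie in (0, 1]")
    return n_direct / r2


def percent_increase(r2: float) -> float:
    """Percentage increase in sample size relative to direct association."""
    return (indirect_sample_size(1.0, r2) - 1.0) * 100.0


for r2 in (0.8, 0.5):
    print(r2, round(indirect_sample_size(1000, r2)), round(percent_increase(r2)))
```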

    40. Coverage • Percent of all SNPs captured by genotyped SNPs • More genotyped SNPs = better coverage

    41. Diminishing Marginal Returns (Wang and Todd 2003) [Figure labels: r² = 0.5, r² = 0.8, 600,000 SNPs, 1,500,000 SNPs]

    42. Number of SNPs needed to capture all SNPs • Depends on: • Population studied • Minor allele frequency of causal SNP • Level of LD (r²) used as a cutoff • 1.4 million selected SNPs for: • Caucasians/Asians • Minor allele frequency of 5% and above • r² = 0.8

    43. The HapMap Project • Initial Goal: • 600,000 SNPs for indirect association • LD information between SNPs • Phase 1: 1 million SNPs • Phase 2: additional 2.9 million SNPs

    44. HapMap • 270 subjects • 45 Chinese • 45 Japanese • 90 Yoruban and 90 European-American • 30 Trios • 2 parents, 1 child

    45. HapMap • SNPs from dbSNP were genotyped • Looked for 1 every 5kb • SNP Validation • Polymorphic • Frequency • Haplotype Estimation • Haplotype tagging SNPs

    46. Haplotype Tagging

    47. Two approaches • Positional cloning • Expand LD mapping to entire genome • Tool: HapMap SNPs • Candidate gene or Gene-based • Expand the number of genes to all genes • 25,000 genes • Tools: jSNPs, SeattleSNPs, NIEHS SNPs

    48. Genome-wide Association • LD Based • Gene Based

    49. Potentially Functional Regions of a Gene [Figure labels: cis regulator (?), promoter, amino acid coding, RNA processing, transcription regulation]

    50. Comparison of Gene-based and Positional Cloning Designs • Positional Cloning • Agnostic (no biological knowledge needed) • Regulatory regions • SNP sets currently incomplete • Expensive • Gene-based • Efficient: Fewer SNPs need to be genotyped • May miss regulatory regions • Not all SNPs are known