Explore innovative methods to combine various marker densities and datasets for more accurate genomic evaluation. Learn about tracing inheritance, haplotype probabilities with few or more markers, recent program revisions, coding of alleles and segments, and successful haplotyping measures.
Topics • Methods to combine different marker densities and datasets • More markers: 500,000 simulation • More animals: 3,000 marker subset • More breeds: multi-trait markers • More traits: same genotype cost
Methods to Trace Inheritance • Few markers • Pedigree needed • Prob (paternal or maternal alleles inherited) computed within families • Many markers • Can find matching DNA segments without pedigree • Prob (haplotypes are identical) mostly near 0 or 1 if segments contain many markers
Haplotype Probabilities with Few Markers (12 SNP / chromosome)
Haplotype Probabilities with More Markers (50 SNP / chromosome)
Haplotyping Program: findhap.f90 • Begin with population haplotyping • Divide chromosomes into segments, ~250 SNP / segment • List haplotypes by genotype match • Similar to FastPhase, IMPUTE, or long range phasing • End with pedigree haplotyping • Detect crossover, fix noninheritance • Impute nongenotyped ancestors
Recent Program Revisions • Improved imputation and reliability • Changes since January 2010 • Use known haplotype if second is unknown • Use current instead of base frequency • Combine parent haplotypes if crossover is detected • Begin search with parent or grandparent haplotypes • Store 2 most popular progeny haplotypes • Simulated crossover rate increased
Coding of Alleles and Segments • Genotypes • 0 = BB, 1 = AB or BA, 2 = AA • 3 = B_, 4 = A_, 5 = __ (missing) • Allele frequency used for missing • Haplotypes • 0 = B, 1 = not known, 2 = A • Segment inheritance (example) • Son has haplotype numbers 5 and 8 • Sire has haplotype numbers 8 and 21 • Son got haplotype number 5 from dam
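The coding scheme on this slide can be illustrated with a short Python sketch. The function names (`encode_genotype`, `haplotype_from_dam`) are hypothetical and are not taken from findhap.f90; only the numeric codes and the son/sire example come from the slide.

```python
# Genotype codes from the slide: 0-2 count the A alleles when both alleles
# are known; 3-5 mark calls where one or both alleles are missing.
GENOTYPE_CODES = {
    "BB": 0, "AB": 1, "BA": 1, "AA": 2,
    "B_": 3, "A_": 4, "__": 5,
}

def encode_genotype(calls):
    """Convert two-letter allele calls into the numeric genotype codes."""
    return [GENOTYPE_CODES[c] for c in calls]

def haplotype_from_dam(son, sire):
    """Return the son's haplotype number that the sire cannot explain.

    son and sire are (number, number) pairs for one segment.
    Returns None when the sire shares neither or both distinct haplotypes,
    because the origin is then ambiguous in this simplified sketch.
    """
    a, b = son
    a_in_sire, b_in_sire = a in sire, b in sire
    if a_in_sire and not b_in_sire:
        return b
    if b_in_sire and not a_in_sire:
        return a
    if a == b and a_in_sire:
        return a  # son carries two copies; one came from each parent
    return None

print(encode_genotype(["BB", "AB", "AA", "__"]))  # [0, 1, 2, 5]
print(haplotype_from_dam((5, 8), (8, 21)))        # 5
```

The second call reproduces the slide's example: haplotype 8 is shared with the sire, so haplotype 5 must have come from the dam.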
Most Frequent Haplotypes: 1st segment of chromosome 15
 1  5.16%  022222222020020022002020200020000200202000022022222202220
 2  4.37%  022020220202200020022022200002200200200000200222200002202
 3  4.36%  022020022202200200022020220000220202200002200222200202220
 4  3.67%  022020222020222002022022202020000202220000200002020002002
 5  3.66%  022222222020222022020200220000020222202000002020220002022
 6  3.65%  022020022202200200022020220000220202200002200222200202222
 7  3.51%  022002222020222022022020220200222002200000002022220002220
 8  3.42%  022002222002220022022020220020200202202000202020020002020
 9  3.24%  022222222020200000022020220020200202202000202020020002020
10  3.22%  022002222002220022002020002220000202200000202022020202220
For efficiency, store haplotypes just once. The most frequent haplotype in Holsteins had 4,316 copies = .0516 * 41,822 animals * 2 chromosomes each
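Storing each haplotype once and giving every animal a pair of haplotype numbers might look like the following Python sketch. The function `intern_haplotypes` and the toy animals are hypothetical; only the copy-count arithmetic uses figures from the slide.

```python
def intern_haplotypes(animal_haps):
    """Store each distinct haplotype string once.

    animal_haps maps an animal ID to its two haplotype strings.
    Returns (table, coded): table lists the distinct haplotypes, and coded
    maps each animal to a pair of indices into that table.
    """
    index_of = {}
    table = []
    coded = {}
    for animal, pair in animal_haps.items():
        numbers = []
        for hap in pair:
            if hap not in index_of:
                index_of[hap] = len(table)
                table.append(hap)
            numbers.append(index_of[hap])
        coded[animal] = tuple(numbers)
    return table, coded

# Toy data (made up): two animals sharing one haplotype.
table, coded = intern_haplotypes({"a": ("0220", "0202"), "b": ("0220", "0220")})
print(table)  # ['0220', '0202']
print(coded)  # {'a': (0, 1), 'b': (0, 0)}

# Copy count from the slide: frequency * animals * 2 chromosomes each.
copies = round(0.0516 * 41_822 * 2)
print(copies)  # 4316
```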
Population Haplotyping Steps • Put first genotype into haplotype list • Check next genotype against list • Do any homozygous loci conflict? • If haplotype conflicts, continue search • If match, fill any unknown SNP with homozygote • 2nd haplotype = genotype minus 1st haplotype • Search for 2nd haplotype in rest of list • If no match in list, add to end of list • Sort list to put frequent haplotypes 1st
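The steps above can be sketched in Python as follows. This is a simplified reading of the slide, not the findhap.f90 implementation: it assumes every genotype is fully called (codes 0, 1, 2 only), treats code 1 in a stored haplotype as "not known", and starts the search for the second haplotype at the position of the first match. All function names are hypothetical.

```python
def conflicts(target, hap):
    """True if a known locus in target disagrees with a known allele in hap.
    Code 1 (heterozygous genotype or unknown allele) never conflicts."""
    return any(t != 1 and h != 1 and t != h for t, h in zip(target, hap))

def fill_unknown(target, hap):
    """Fill unknown alleles (1) in hap wherever target is known (0 or 2)."""
    return [t if h == 1 and t != 1 else h for t, h in zip(target, hap)]

def subtract(genotype, hap):
    """2nd haplotype = genotype minus 1st haplotype, locus by locus."""
    return [g if g != 1 else (1 if h == 1 else 2 - h)
            for g, h in zip(genotype, hap)]

def find_or_add(target, haps, counts, start=0):
    """Return the index of the first non-conflicting haplotype at or after
    start, or append target to the end of the list if none matches."""
    for i in range(start, len(haps)):
        if not conflicts(target, haps[i]):
            haps[i] = fill_unknown(target, haps[i])
            counts[i] += 1
            return i
    haps.append(list(target))
    counts.append(1)
    return len(haps) - 1

def build_haplotype_list(genotypes):
    """One pass over the genotypes, then sort the most frequent first."""
    haps, counts = [], []
    for genotype in genotypes:
        first = find_or_add(genotype, haps, counts)
        second = subtract(genotype, haps[first])
        find_or_add(second, haps, counts, start=first)
    order = sorted(range(len(haps)), key=lambda i: -counts[i])
    return [(haps[i], counts[i]) for i in order]

# Toy genotypes (made up) for a 4-SNP segment.
toy = [[0, 2, 2, 0], [0, 1, 2, 1], [0, 0, 2, 2], [0, 0, 2, 2]]
result = build_haplotype_list(toy)
print(result)  # [([0, 0, 2, 2], 5), ([0, 2, 2, 0], 3)]
```

The counts sum to 8, two haplotypes for each of the four toy animals, and sorting moves the haplotype found later in the pass to the front because it is more frequent.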
Check New Genotype Against List: 1st segment of chromosome 15
Search for 1st haplotype that matches genotype:
       022112222011221022021110220010110212202000102020120002021
5.16%  022222222020020022002020200020000200202000022022222202220
4.37%  022020220202200020022022200002200200200000200222200002202
4.36%  022020022202200200022020220000220202200002200222200202220
3.67%  022020222020222002022022202020000202220000200002020002002
3.66%  022222222020222022020200220000020222202000002020220002022
Get 2nd haplotype by removing 1st from genotype:
       022002222002220022022020220020200202202000202020020002020
3.65%  022020022202200200022020220000220202200002200222200202222
3.51%  022002222020222022022020220200222002200000002022220002220
3.42%  022002222002220022022020220020200202202000202020020002020
3.24%  022222222020200000022020220020200202202000202020020002020
3.22%  022002222002220022002020002220000202200000202022020202220
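The search on this slide can be reproduced with the strings shown. The Python sketch below is illustrative only: the helper names are hypothetical, and it assumes every allele in the listed haplotypes is known, so "genotype minus haplotype" reduces to 2 * genotype - haplotype at each locus (0 = B, 2 = A).

```python
# Haplotypes in frequency order, copied from the slide.
HAPLOTYPES = [
    "022222222020020022002020200020000200202000022022222202220",  # 5.16%
    "022020220202200020022022200002200200200000200222200002202",  # 4.37%
    "022020022202200200022020220000220202200002200222200202220",  # 4.36%
    "022020222020222002022022202020000202220000200002020002002",  # 3.67%
    "022222222020222022020200220000020222202000002020220002022",  # 3.66%
    "022020022202200200022020220000220202200002200222200202222",  # 3.65%
    "022002222020222022022020220200222002200000002022220002220",  # 3.51%
    "022002222002220022022020220020200202202000202020020002020",  # 3.42%
    "022222222020200000022020220020200202202000202020020002020",  # 3.24%
    "022002222002220022002020002220000202200000202022020202220",  # 3.22%
]
GENOTYPE = "022112222011221022021110220010110212202000102020120002021"

def is_compatible(genotype, hap):
    """A haplotype matches if it agrees at every homozygous locus."""
    return all(g == "1" or g == h for g, h in zip(genotype, hap))

def second_haplotype(genotype, hap):
    """Remove the first haplotype from the genotype, locus by locus."""
    return "".join(str(2 * int(g) - int(h)) for g, h in zip(genotype, hap))

# First compatible haplotype in frequency order.
first = next(i for i, h in enumerate(HAPLOTYPES) if is_compatible(GENOTYPE, h))
second = second_haplotype(GENOTYPE, HAPLOTYPES[first])
# Look for the second haplotype from the first match onward.
second_index = HAPLOTYPES.index(second, first)

print(first + 1, second_index + 1)  # 5 8
```

The first four haplotypes each conflict with the genotype at a homozygous locus, so the 5th (3.66%) is the first match, and the remainder is exactly the 8th haplotype (3.42%).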
Simulated 500K Tests • How many 500K genotypes needed? • Is computation affordable? • Two subsets of mixed 500K and 50K: • Of 33,414 HO, only 1,406 (young) had 500K • Also bulls > 99% reliability, total 3,726 • Linkage generated in base population • Efficient and similar to autoregressive • Linkage affects gain from more markers
Measures of Haplotyping Success • Does estimated = true genotype? • Does estimated = true linkage for adjacent heterozygous markers? • Does estimated = true paternity? • How many alleles remain missing? • What is the error rate (Druet, 2010)? • What is corr²(estimated, true genotype)? • Are resulting GEBVs reliable?
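Two of these measures, the percentage of genotypes estimated correctly and the squared correlation between estimated and true genotypes, can be computed as in the Python sketch below. The function names and the toy genotype vectors are made up for illustration; the slides do not say how the reported figures were calculated in detail.

```python
def percent_correct(estimated, true):
    """Percentage of loci where the estimated genotype equals the true one."""
    matches = sum(e == t for e, t in zip(estimated, true))
    return 100.0 * matches / len(true)

def squared_correlation(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

# Toy genotypes (made up), coded 0, 1, 2 as on the coding slide.
true_geno = [0, 1, 2, 2, 1, 0, 2, 1]
est_geno = [0, 1, 2, 1, 1, 0, 2, 2]

pct = percent_correct(est_geno, true_geno)
r2 = squared_correlation(est_geno, true_geno)
print(pct)  # 75.0
print(r2)
```

The two measures can diverge: the correlation penalizes errors by their size and depends on allele frequency, while the percentage correct counts every wrong call equally.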
Imputation Summary • 1,406 young animals genotyped at 500K • REL gain 0.8% vs. 1.4% with all 500K • Imputation better if ancestors also genotyped • Could genotype additional reference bulls instead of re-genotyping bulls already done • 32,008 animals imputed from 50K • 10% SNP known before, 93% after • 97-98% of 500K genotypes correct • .839 squared correlation (estimated, true genotype)
Multi-Breed Genomic Evaluation • Treat allele effects as independent, same, or correlated, using data of • 5,331 purebred Holsteins, • 1,361 purebred Jerseys, and • 506 purebred Brown Swiss
Protein Yield R² • Optimum correlation was .3 with 43K markers, and would be larger with more markers
Fewer Markers, More Animals • Half of young animals assigned 3K • Proven bulls, cows all had 43K • Dams imputed using 43K and 3K • Half of ALL animals assigned 3K • Could 3K reference animals help? • 10,000 proven bulls yet to genotype • Should cows with 3K be predictors?
Correlations² of 3K and PA with 43K (genotyped ancestors had 43K) • Consistent gains across traits • Reliability gain from progeny with 3K was 79-87% of gain from 43K • Gain % = [Corr(3K,43K)² - Corr(PA,43K)²] / [1 - Corr(PA,43K)²] • Large benefits for smaller cost
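The gain formula on this slide can be written as a small Python function. The function name and the two input correlations are hypothetical values chosen for illustration, not results from the study; the factor of 100 is added so the result is expressed as a percentage.

```python
def gain_percent(corr_3k, corr_pa):
    """Share of the achievable 43K gain over parent average (PA) that
    3K genotypes recover, as a percentage.

    corr_3k: Corr(3K, 43K), correlation of 3K-based with 43K-based values
    corr_pa: Corr(PA, 43K), correlation of parent average with 43K-based values
    """
    r2_3k = corr_3k ** 2
    r2_pa = corr_pa ** 2
    return 100.0 * (r2_3k - r2_pa) / (1.0 - r2_pa)

# Hypothetical inputs (assumed, not from the slides).
gain = gain_percent(0.95, 0.70)
print(round(gain, 1))  # 80.9
```

The denominator is the largest possible improvement over parent average, so the ratio is 0 when 3K adds nothing beyond PA and 100 when 3K matches 43K exactly.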
Conclusions - 1 • Missing genotypes can be filled easily • Population and pedigree haplotyping can both process long segments efficiently • Imputing 500,000 SNP for 33,414 Holsteins required 3 Gbyte memory, 3 CPU hours • Haplotyping implemented for April 2010 routine U.S. evaluation • Several recent improvements to accuracy • Ready to include lower or higher density genotypes in evaluations
Conclusions - 2 • More markers improved reliability < 2% • 1,406 high density genotypes sufficient • 32,008 other animals imputed from 50K to 500K in simulation • Fewer markers can decrease cost • More animals can greatly increase reliability and selection differential • Multi-breed model improves reliability only slightly (< 1%) at current density
Acknowledgments • Katie Olson computed the multi-breed genomic evaluation • Mel Tooker assisted with graphics and computation • Bob Schnabel helped improve marker locations on the map