Using Genomic Data to Improve Dairy Cattle Genetic Evaluations
Acknowledgments • Genotyping and DNA extraction: • USDA Bovine Functional Genomics Lab, U. Missouri, U. Alberta, GeneSeek, Genetics & IVF Institute, Genetic Visions, and Illumina • Computing: • AIPL staff (Mel Tooker, Leigh Walton) • Funding: • National Research Initiative grants • 2006-35205-16888, 2006-35205-16701 • Agricultural Research Service • Holstein and Jersey breed associations • Contributors to Cooperative Dairy DNA Repository (CDDR)
CDDR Contributors • National Association of Animal Breeders (NAAB, Columbia, MO) • ABS Global (DeForest, WI) • Accelerated Genetics (Baraboo, WI) • Alta (Balzac, AB, Canada) • Genex (Shawano, WI) • New Generation Genetics (Fort Atkinson, WI) • Select Sires (Plain City, OH) • Semex Alliance (Guelph, ON, Canada) • Taurus-Service (Mehoopany, PA)
Genetic Markers: Changing Goals, Past and Future • Determine if major genes exist (few) • Estimate sparse marker effects • Only within family analysis • Find causative mutations (DGAT1, ABCG2) • Estimate dense effects across families • Implement routine predictions • Increase REL with more genotypes • Decrease cost with a selected SNP subset
Old Genetic Terms • Predicted transmitting ability and parent average • PTA required progeny or own records • PA included only parent data • Genomics blurs the distinction • Reliability • REL of PA could not exceed 50% because of Mendelian sampling • Genomics can predict the other 50% • REL limit at birth theoretically 99%
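The 50% ceiling on parent average reliability follows from standard selection index algebra rather than from anything specific to this study. As a brief sketch, assuming unrelated, non-inbred parents and writing the parent average as the mean of the sire and dam breeding value estimates:

```latex
\mathrm{PA} = \tfrac{1}{2}\left(\mathrm{EBV}_{\text{sire}} + \mathrm{EBV}_{\text{dam}}\right),
\qquad
\mathrm{REL}_{\mathrm{PA}} = \tfrac{1}{4}\left(\mathrm{REL}_{\text{sire}} + \mathrm{REL}_{\text{dam}}\right)
\le \tfrac{1}{4}(1 + 1) = 0.5
```

Even with perfectly known parents, the other half of the genetic variance is Mendelian sampling, which pedigree alone cannot predict.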
New Genetic Terms • Genomic relationships and inbreeding • Actual genes in common (G) vs. expected genes in common (A) • Wright’s correlation matrix or Henderson’s numerator relationship (covariance) matrix • Average relationship to population • Expected future inbreeding (EFI) from A • Genomic future inbreeding (GFI) from G • Daughter merit vs. son merit (X vs. Y)
Genomic Studies at Beltsville • 174 markers, 1068 bulls, 8 sires • Illinois, Israel, and AIPL • 1991-1999 • 367 markers, 1415 bulls, 10 sires • GEML, AIPL, Illinois, and Israel • 1995-2004 • 38,416 markers, 19,105 animals • BFGL, AIPL, Missouri, Canada, and Illumina • Oct 2007-Dec 2008
SNP Edits & Counts • SNP available (Illumina SNP50 BeadChip): 58,336 • Insufficient average number of beads: 1,389 • Unscorable SNP: 4,360 • Monomorphic in Holsteins: 5,734 • Minor allele frequency (MAF) of <5%: 6,145 • Not in Hardy-Weinberg equilibrium: 282 • Highly correlated: 2,010 • Used for genomic prediction: 38,416
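The edit counts on this slide are sequential removals, so they should subtract down to the number of SNP used for prediction. The sketch below checks that arithmetic and illustrates what a frequency-based edit might look like. The function names, the 0/1/2 allele-count coding, and the use of `None` for a missing call are assumptions made for illustration, not the AIPL implementation.

```python
# Counts copied from the slide; each edit removes SNP from the available set.
available = 58336
removed = {
    "insufficient_beads": 1389,
    "unscorable": 4360,
    "monomorphic": 5734,
    "maf_below_5pct": 6145,
    "not_in_hwe": 282,
    "highly_correlated": 2010,
}
remaining = available - sum(removed.values())


def minor_allele_freq(calls):
    """MAF for one SNP; calls are allele counts 0/1/2, None if missing (assumed coding)."""
    scored = [c for c in calls if c is not None]
    if not scored:
        return None
    p = sum(scored) / (2 * len(scored))
    return min(p, 1 - p)


def classify_snp(calls, min_maf=0.05):
    """Label a SNP using only the frequency-based edits from the slide."""
    maf = minor_allele_freq(calls)
    if maf is None:
        return "unscorable"
    if maf == 0:
        return "monomorphic"
    if maf < min_maf:
        return "low_maf"
    return "keep"


print(remaining)
print(classify_snp([0, 1, 2, 1]))
```

The bead-count, Hardy-Weinberg, and correlation edits need additional information (intensity data, genotype class counts, pairwise SNP comparisons) and are left out of this sketch.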
Animal Genotype Edits • Require 90% call rate of SNP / animal • Check parent-progeny pair for conflicting homozygotes • If many conflicts or if parent not genotyped, check all genotyped animals for possible parent • Check maternal grandsire (MGS) for expected relationship • Check heterozygous SNP on X (only females)
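The call-rate and parent-progeny checks can be sketched as below. A conflict is counted when parent and progeny are opposite homozygotes at the same SNP, which cannot happen under correct parentage apart from genotyping error. The 0/1/2 coding, `None` for a missing call, and the `max_conflict_rate` threshold are hypothetical; the slide only says "many conflicts" without giving a cutoff.

```python
def call_rate(genotype):
    """Fraction of SNP with a call; None marks a missing genotype (assumed coding)."""
    return sum(g is not None for g in genotype) / len(genotype)


def homozygote_conflicts(parent, progeny):
    """Count SNP where parent and progeny are opposite homozygotes (0 vs. 2)."""
    return sum(
        1
        for a, b in zip(parent, progeny)
        if a is not None and b is not None and {a, b} == {0, 2}
    )


def passes_edits(parent, progeny, min_call_rate=0.90, max_conflict_rate=0.01):
    """Apply the 90% call-rate edit and a hypothetical conflict-rate cutoff."""
    if call_rate(progeny) < min_call_rate:
        return False
    both = sum(1 for a, b in zip(parent, progeny) if a is not None and b is not None)
    if both == 0:
        return False
    return homozygote_conflicts(parent, progeny) / both <= max_conflict_rate


# Toy genotypes for illustration only.
sire = [0, 2, 1, 2, 0, None, 1, 2, 0, 1]
calf = [0, 1, 1, 2, 1, 1, 2, 2, 0, 0]
wrong_calf = [2, 0, 1, 0, 2, 1, 1, 0, 2, 1]

print(passes_edits(sire, calf), passes_edits(sire, wrong_calf))
```

When a pair fails, the same conflict count can be computed against every other genotyped animal to search for a plausible parent, as the slide describes.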
Repeatability of Genotypes • 2 laboratories genotyped the same 46 bulls • SNP scored the same by both labs • About 1% missing genotypes per lab • Mean of 37,624 out of 38,416 SNP (98% same) • Range across animals of 20 to 2,244 SNP missing • SNP conflict (<0.003%, or 99.997% concordance) • Mean of 0.9 SNP error per 38,416 • Range of 0 to 7 SNP
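A minimal sketch of how two laboratories' calls for the same animal might be compared, followed by a check of the percentages quoted on the slide. The function name and the 0/1/2 coding with `None` for missing calls are assumptions for illustration.

```python
def compare_labs(lab_a, lab_b):
    """Compare two call lists for one animal; None marks a missing call."""
    both = [(a, b) for a, b in zip(lab_a, lab_b) if a is not None and b is not None]
    conflicts = sum(1 for a, b in both if a != b)
    concordance = (len(both) - conflicts) / len(both) if both else None
    return {"scored_by_both": len(both), "conflicts": conflicts, "concordance": concordance}


# Toy example: 6 SNP, one missing in each lab, one conflicting call.
result = compare_labs([0, 1, 2, None, 1, 2], [0, 1, 2, 1, None, 1])
print(result)

# Figures from the slide.
same_fraction = 37624 / 38416          # SNP scored the same by both labs
conflict_pct = 0.9 / 38416 * 100       # mean SNP errors per animal, as a percentage
print(round(same_fraction, 2), conflict_pct < 0.003)
```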
Genotype Data for Elevation: Chromosome 1 1000111220020012111011112111101111001121100020122002220111 1202101200211122110021112001111001011011010220011002201101 1200201101020222121122102010011100011220221222112021120120 2010020220200002110001120201122111211102201111000021220200 0221012020002211220111012100111211102112110020102100022000 2201000201100002202211022112101121110122220012112122200200 0200202020122211002222222002212111121002111120011011101120 0202220001112011010211121211102022100211201211001111102111 2110211122000101101110202200221110102011121111011202102102 1211011022122001211011211012022011002220021002110001110021 1021101110002220020221212110002220102002222121221121112002 0110202001222222112212021211210110012110110200220002001002 0001111011001211021212111201010121202210101011111021102112 2111111212111210110120011111021111011111220121012121101022 202021211222120222002121210121210201100111222121101
Genotype Data from Inbred Bull: Chromosome 24 of Megastar 1021222101021021011102110112112211211002202000222020002020220 0000220020222202202000020020222222000020222200000220200002002 2002000000222200022220000000000020222022002000222020222220002 2022222222200002002202022202000200022000000002202220000002200 2020002222002020020020202220222222220222020002022022022220202 2202020202200022002220220022200000220200002002002000200222220 0022220202002220022202000020200000022222020200002002002222000 2022022220022000222202200222202020002202202222002220022000200 2202000002200220222000022000022000222202002222000220020020202 2020002220002220022202202200000220220020020020220002000222202 2002220020220200222202220000020220002020020202000220022000002 2022200202220200022002000200022002002000200220222220022022000 2000020002000020220020220200200002220000222002000200222000022 0220020022002202202020202020200022202000220200202202220220000 2020200002020200022222200222200020022022220000020220020200202 022022020200002000200220220002200
Close Inbreeding (F = 14.7%): Double Grandson of Aerostar • Figure: pedigree of Megastar with Aerostar on both sides, shown for chromosome 24
Differences in G and A • G = genomic and A = traditional relationships • Detected clones, identical twins, and duplicate samples • Detected incorrect DNA samples • Detected incorrect pedigrees • Identified correct source of DNA by genomic relationships with other animals
3 Formulas to Compute G • Sum products of genotypes (g) adjusted for allele frequency (p) • $G1_{jk} = \sum_i (g_{ij} - p_i)(g_{ik} - p_i) / [2 \sum_i p_i (1 - p_i)]$ • Or individually weighted by p • $G2_{jk} = \sum_i (g_{ij} - p_i)(g_{ik} - p_i) / [2 p_i (1 - p_i)]$ • Or scaled by intercept ($b_0$) and regression ($b_1$) on A, using p = 0.5 • $G3_{jk} = [\sum_i (g_{ij} - 0.5)(g_{ik} - 0.5) - b_0] / b_1$
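The first two formulas can be sketched in plain Python as below. Several choices here are assumptions, not taken from the slides: genotypes are coded as allele counts 0/1/2 (as in the genotype strings shown earlier), so centering subtracts 2p rather than the p written on the slide; allele frequencies are estimated from the genotyped animals themselves; and the per-SNP weighted version is averaged over the number of SNP so that its diagonal is on the same scale as the first. The third formula is omitted because it needs the pedigree matrix A to estimate the intercept and regression.

```python
def allele_freqs(genos):
    """Estimate p_i for each SNP from a list of 0/1/2 genotype rows (one row per animal)."""
    n = len(genos)
    return [sum(row[i] for row in genos) / (2 * n) for i in range(len(genos[0]))]


def centered(genos, p):
    """Subtract the expected allele count 2*p_i from each genotype (assumed coding)."""
    return [[row[i] - 2 * p[i] for i in range(len(p))] for row in genos]


def g1(genos, p):
    """Cross-products of centered genotypes divided by 2 * sum of p(1-p)."""
    z = centered(genos, p)
    scale = 2 * sum(pi * (1 - pi) for pi in p)
    return [[sum(a * b for a, b in zip(zj, zk)) / scale for zk in z] for zj in z]


def g2(genos, p):
    """Each SNP weighted by 1 / [2 p(1-p)], then averaged over SNP (assumption)."""
    z = centered(genos, p)
    m = len(p)
    w = [1 / (2 * pi * (1 - pi)) for pi in p]  # requires polymorphic SNP only
    return [[sum(zj[i] * zk[i] * w[i] for i in range(m)) / m for zk in z] for zj in z]


# Toy data: animals a and b are duplicates, c is their opposite at two SNP.
genos = [[0, 1, 2, 1], [0, 1, 2, 1], [2, 1, 0, 1]]
p = allele_freqs(genos)
G1 = g1(genos, p)
G2 = g2(genos, p)
print(G1[0][1] == G1[0][0], G1[0][2] < 0)
```

In the toy data the duplicate pair has an off-diagonal equal to its diagonal, which is the pattern used on the previous slide to detect clones, identical twins, and duplicate samples.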
Compare A with 3 Formulas for G • Actual Holstein data • Diagonal = 1 + inbreeding
Summary of G Formulas for Genomic Inbreeding • Correlations ranked G3 > G1 > G2 in simulation vs. G2 > G1 > G3 with real data (opposite) • G2 and G1 biased down, G3 up • G1 and G2 can be adjusted toward A using b0 and b1, similar to G3 formula • After adjusting, mean G1 = 1.08 and G2 = 1.09 compared to G3 = 1.13 and A = 1.05 • G1 was unbiased in simulation using true rather than estimated frequencies
Genomic vs. Pedigree Inbreeding • Correlation = 0.68
Experimental Design • Holstein, Jersey, and Brown Swiss breeds • Data from 2003 used to predict independent data from 2008
Genomic Methods • Direct genomic evaluation • Evaluate genotyped animals by summing effects of 38,416 genetic markers (SNPs) • Combined genomic evaluation • Include phenotypes of non-genotyped ancestors by selection index • Transferred genomic evaluation • Propagate info from genotyped animals to non-genotyped relatives by selection index
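The direct genomic evaluation is described as a sum of marker effects. A minimal sketch of that sum, assuming genotypes coded 0/1/2 and a vector of already-estimated SNP effects (the numbers below are hypothetical):

```python
def direct_genomic_value(genotype, snp_effects):
    """Sum of estimated SNP effects weighted by the animal's allele counts (0/1/2)."""
    if len(genotype) != len(snp_effects):
        raise ValueError("genotype and snp_effects must have the same length")
    return sum(g * e for g, e in zip(genotype, snp_effects))


# Hypothetical effects for 3 SNP; the real evaluation sums over 38,416 SNP.
effects = [0.5, -0.25, 1.0]
young_bull = [2, 1, 0]
dgv = direct_genomic_value(young_bull, effects)
print(dgv)
```

The combined and transferred evaluations add information by selection index and are not covered by this sketch.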
Reliability Gain by Breed • Yield traits and NM$ of young bulls • Gain = increase above parent average reliability (~35%)
Reliability Gain by Breed • Health and type traits of young bulls
Reliability Gains for Proven Bulls • Proven bulls included in test had: • >10 daughters in August 2003 • >10% increase in reliability by 2008 • Numbers of bulls in test ranged from 104 to 735 across traits • Predicted the change in evaluation • Significant increase in R² (P < .001) for 26 of 27 traits
Value of Genotyping More SNP • 9,604 (10K), 19,208 (20K), and 38,416 (40K) SNP
Simulated Results: World Holstein Population • 15,197 older and 5,987 younger bulls in Interbull file • 40,000 SNPs and 10,000 QTLs • Provided timing, memory test • Reliability vs. parent average REL • REL = corr² (EBV, true BV) • 80% vs. 34% expected for young bulls • 72% vs. 30% observed in simulation
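In simulation the true breeding values are known, so reliability can be computed directly as the squared correlation between estimated and true values. A minimal sketch with toy numbers (the function name and data are illustrative only):

```python
def reliability(ebv, true_bv):
    """Squared Pearson correlation between estimated and true breeding values."""
    n = len(ebv)
    mean_e = sum(ebv) / n
    mean_t = sum(true_bv) / n
    sxy = sum((e - mean_e) * (t - mean_t) for e, t in zip(ebv, true_bv))
    sxx = sum((e - mean_e) ** 2 for e in ebv)
    syy = sum((t - mean_t) ** 2 for t in true_bv)
    return sxy * sxy / (sxx * syy)


# Toy values: correlation is 0.8, so reliability is 0.64.
rel = reliability([1, 2, 3, 4], [1, 3, 2, 4])
print(rel)
```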
Major Gene on Chromosome 18 • Net Merit, Productive Life, Calving Ease, Stature, Strength, Rump Width
X, Y, Pseudo-autosomal SNPs • Figure labels: 35 SNPs, 35 SNPs, 0 SNPs, 487 SNPs
SNPs on X Chromosome • Each animal has two evaluations: • Expected genetic merit of daughters • Expected genetic merit of sons • Difference is sum of effects on X • SD = .1 σG, smaller than expected • Correlation with sire’s daughter vs. son PTA difference was significant (P<.0001), regression close to 1.0
Linear and Nonlinear Predictions • Linear model • Infinitesimal alleles model: all SNP have normally distributed effects • Nonlinear models • Model A: all SNP have effects, but with a heavy-tailed prior distribution • Model B: some SNP have no effect, the rest are normally distributed • Model AB: some SNP have no effect, the rest have a heavy-tailed prior
Genetic Progress • Assume 60% REL for net merit • Sires mostly 2 instead of 6 years old • Dams of sons mostly heifers with 60% REL instead of cows with phenotype and genotype (66% REL) • Progress could increase by >50% • 0.37 vs. 0.23 genetic SD per year • Reduce generation interval more than accuracy
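The quoted rates can be read against the usual expression for annual genetic gain. The symbols i (selection intensity), r (accuracy, the square root of REL), and L (generation interval) are standard notation and do not appear on the slide; the second expression simply checks the stated increase.

```latex
\Delta G_{\text{year}} = \frac{i \, r \, \sigma_G}{L}, \qquad r = \sqrt{\mathrm{REL}},
\qquad
\frac{0.37 - 0.23}{0.23} \approx 0.61
```

The two rates therefore correspond to roughly a 61% increase, consistent with the ">50%" on the slide, and the gain comes from a smaller L outweighing a somewhat smaller r.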
Low Density SNP Chip • Choose 384 marker subset • SNP that best predict net merit • Parentage markers to be shared • Use for initial screening of cows • 40% benefit of full set for 10% cost • Could get larger benefits using haplotyping (Habier et al., 2008)
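One simple way to pick a marker subset is to rank SNP by the size of their estimated effect on net merit and keep the largest. This ranking rule is an assumption; the slide says only that the chosen SNP should "best predict net merit", and the parentage markers would be added separately.

```python
def select_snp_subset(snp_effects, n_markers=384):
    """Return indices of the n_markers SNP with the largest absolute estimated effects."""
    ranked = sorted(
        range(len(snp_effects)),
        key=lambda i: abs(snp_effects[i]),
        reverse=True,
    )
    return sorted(ranked[:n_markers])


# Hypothetical effects for 5 SNP, keeping the top 2.
subset = select_snp_subset([0.1, -0.9, 0.3, 0.0, -0.5], n_markers=2)
print(subset)
```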
Conclusions • 100X more markers allows MAS across rather than within families • 10X more bulls allows estimation of much smaller QTL effects (HO) • Reliability increases by tracing actual genes inherited instead of expected average from parents