Efficient Estimation of Breeding Values from Dense Genomic Data

Genomic Calculations
• Genotypes soon available from BFGL:
  • 50,000 SNPs / animal
  • 3,000 animals, many more possible
• Need efficient computing algorithms
• Traditional PTAs available from AIPL:
  • PTAs combine phenotypes and pedigree
  • SNP effects evaluated in second step using deregressed PTAs weighted by reliability

Genomic Computer Programs
• Simulate SNPs and QTLs
  • Compare SNP numbers, size of QTLs
• Calculate genomic EBVs
  • Use selection index, G instead of A
  • Use iteration on data for SNP effects
• Form haplotypes from genotypes?
  • Not tested yet, SNP regression used

Simulation Program
• Save memory by processing each chromosome separately
• 3,000 Holstein bulls to genotype
• 17,000 ancestors in pedigree file
• 1 billion genotypes (20,000 animals × 50,000 SNPs) simulated per replicate
• Only 150 million genotypes (3,000 bulls × 50,000 SNPs) stored for evaluation

Linear Estimates using Markers
• Selection index equations for EBV
  • $\hat{u} = \mathrm{Cov}(u, y)\,\mathrm{Var}(y)^{-1}\,(y - Xb)$
  • $\hat{u} = ZZ'\,[ZZ' + R]^{-1}\,(y - Xb)$
  • R has diagonals = (1 / Reliability) - 1
• BLUP equations for marker effects, sum to get EBV
  • $\hat{u} = Z\,[Z'R^{-1}Z + Ik]^{-1}\,Z'R^{-1}\,(y - Xb)$
  • $k = \mathrm{var}(u) / \mathrm{var}(m)$
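The two forms above can be checked against each other numerically. The following is a minimal NumPy sketch, not the program described in these slides. It assumes that ZZ' in the selection index form is scaled by 1/k so that it plays the role of G (the slide writes ZZ' without showing a scaling), that genotypes are coded as allele counts minus 2p, and that `r` stands in for (y - Xb). All sizes, frequencies, reliabilities and the value of k are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_animals, n_markers = 6, 20

# Hypothetical allele frequencies and centered genotype codes (assumption).
p = rng.uniform(0.2, 0.8, n_markers)
Z = rng.binomial(2, p, size=(n_animals, n_markers)) - 2 * p

rel = rng.uniform(0.5, 0.95, n_animals)   # made-up reliabilities
R = np.diag(1.0 / rel - 1.0)              # diagonals = (1 / Reliability) - 1
k = float(n_markers)                      # assumed value for var(u) / var(m)
r = rng.normal(size=n_animals)            # stands in for (y - Xb)

# Selection index: one animals x animals system, with G = ZZ'/k (assumed scaling).
G = Z @ Z.T / k
u_index = G @ np.linalg.solve(G + R, r)

# BLUP for marker effects: one markers x markers system, then sum to get EBV.
R_inv = np.diag(1.0 / np.diag(R))
lhs = Z.T @ R_inv @ Z + k * np.eye(n_markers)
m_hat = np.linalg.solve(lhs, Z.T @ R_inv @ r)
u_blup = Z @ m_hat

print(np.allclose(u_index, u_blup))
```

Under that scaling the agreement follows from the identity (Z'R⁻¹Z + kI)⁻¹Z'R⁻¹ = Z'(ZZ' + kR)⁻¹, so which form is cheaper depends only on whether animals or markers are more numerous.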

Non-linear vs Linear Models

Marker Effect Prior Distribution: Nonlinear Model

Iteration on Data
• Simple trick to reduce time from quadratic to linear in the number of SNPs
  • Sum coefficients x solutions once
  • Sum - diagonal = off-diagonals
• Janss and de Jong, 1999 conference
• Rediscovered by Legarra and Misztal
• Elements of Z are -p and (1 - p), where p is frequency of 2nd allele
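The "sum - diagonal = off-diagonals" step can be illustrated as follows. This is a sketch under stated assumptions rather than the cited authors' code: the function names, the weight vector `w` (standing in for the diagonal of R⁻¹) and all data are hypothetical. The first function builds the markers x markers coefficient matrix explicitly; the second gets the same off-diagonal sums from two passes over the genotypes.

```python
import numpy as np

def offdiag_sums_explicit(Z, w, m):
    """Quadratic in markers: build C = Z'WZ, then sum off-diagonal terms."""
    C = Z.T @ (w[:, None] * Z)
    return C @ m - np.diag(C) * m

def offdiag_sums_on_data(Z, w, m):
    """Linear in markers: sum coefficients x solutions once, subtract diagonal."""
    fitted = Z @ m                            # one pass over the genotypes
    total = Z.T @ (w * fitted)                # full row sums of C @ m
    diag = (w[:, None] * Z**2).sum(axis=0)    # diagonal elements of C
    return total - diag * m

rng = np.random.default_rng(1)
n_records, n_markers = 30, 50
p = rng.uniform(0.1, 0.9, n_markers)              # frequency of 2nd allele
has_second = rng.random((n_records, n_markers)) < p
Z = np.where(has_second, 1.0 - p, -p)             # elements are (1 - p) or -p
w = rng.uniform(0.5, 2.0, n_records)              # made-up weights
m = rng.normal(size=n_markers)                    # made-up current solutions

print(np.allclose(offdiag_sums_explicit(Z, w, m), offdiag_sums_on_data(Z, w, m)))
```

The second function never stores C, which is what keeps both time and memory linear in the number of markers.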

Computer Memory
• Inversion including G matrix
  • Animals x markers to hold genotypes
  • Animals² to hold elements of G
  • < 1 Gbyte for 50,000 SNPs, 3,000 bulls
• Iteration on genotype data
  • Markers + animals
  • < 0.1 Gbyte for 50,000 SNPs, 3,000 bulls
• Little memory required for either
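A back-of-envelope check of these bounds, as a sketch only. The storage sizes are assumptions not stated on the slide: 1 byte per genotype, 8 bytes per floating-point value, and genotypes streamed from disk rather than held in memory during iteration.

```python
def gbytes(n_elements, bytes_per_element):
    """Convert an element count to gigabytes (10**9 bytes)."""
    return n_elements * bytes_per_element / 1e9

animals, markers = 3_000, 50_000

# Inversion: genotypes (animals x markers) plus G (animals squared).
inversion = gbytes(animals * markers, 1) + gbytes(animals**2, 8)

# Iteration on data: one value per marker plus one per animal.
iteration = gbytes(markers + animals, 8)

print(f"inversion: {inversion:.3f} Gbyte, iteration: {iteration:.6f} Gbyte")
```

Both figures sit well inside the limits quoted above under these assumptions; wider genotype codes or extra work vectors would raise them.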

Computing Times
• Inversion including G matrix
  • Animals² x markers to form G matrix
  • Animals³ to invert selection index
  • 10 hours for 3,000 bulls, 50,000 SNPs
• Iteration on genotype data
  • Markers x animals x iterations
  • 16 hours for 1,000 iterations
  • 0.997 correlation with inversion
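The leading-order counts can be tabulated to see how each approach scales. This sketch ignores constant factors and disk I/O, so the counts do not translate directly into the hours quoted above; the function name and the doubling scenario are illustrative assumptions.

```python
def operation_counts(animals, markers, iterations):
    """Leading-order operation counts named on the slide (no constants, no I/O)."""
    return {
        "form_G": animals**2 * markers,
        "invert": animals**3,
        "iterate": markers * animals * iterations,
    }

base = operation_counts(3_000, 50_000, 1_000)
doubled = operation_counts(6_000, 50_000, 1_000)   # hypothetical: twice the bulls

# Growth factor of each step when the number of animals doubles.
growth = {name: doubled[name] / base[name] for name in base}
print(base)
print(growth)
```

Doubling the animals multiplies the cost of forming G by 4 and of inversion by 8, but the cost per iteration on data only by 2, assuming the number of iterations stays the same.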

Convergence with iteration on data
• Jacobi iteration
  • Use previous round coefficients x solutions
• Adaptive under-relaxation
  • Increase relax if convergence improving
  • Decrease relax (each round) if diverging
• Solution convergence reasonable
  • SD of change < 0.0001 after 350 rounds
  • SD of change < 0.000001 after 1700 rounds
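A minimal sketch of Jacobi iteration with adaptive under-relaxation on the marker equations (Z'WZ + kI)m = Z'Wy. The slide does not give the adjustment rule, so the starting relaxation (0.5), the factors (1.05 and 0.8), the bounds (0.05 to 1.0), and the use of root-mean-square change as the "SD of change" are all assumptions. The toy data deliberately has far more records than markers so the system is well conditioned; behavior with 50,000 markers will differ.

```python
import numpy as np

def jacobi_under_relaxed(Z, w, y, k, max_rounds=2000, tol=1e-8):
    """Solve (Z'WZ + kI) m = Z'Wy by Jacobi iteration on the data."""
    diag_c = (w[:, None] * Z**2).sum(axis=0)   # diagonal of Z'WZ
    m = np.zeros(Z.shape[1])
    relax, prev_change = 0.5, np.inf           # assumed starting relaxation
    for rounds in range(1, max_rounds + 1):
        resid = y - Z @ m                      # uses previous-round solutions only
        jacobi = (Z.T @ (w * resid) + diag_c * m) / (diag_c + k)
        step = relax * (jacobi - m)            # under-relaxed Jacobi update
        m = m + step
        change = np.sqrt(np.mean(step**2))     # RMS change, used as "SD of change"
        if change < tol:
            break
        if change < prev_change:
            relax = min(1.0, relax * 1.05)     # improving: relax more (assumed factor)
        else:
            relax = max(0.05, relax * 0.8)     # diverging: relax less (assumed factor)
        prev_change = change
    return m, rounds

rng = np.random.default_rng(2)
n_records, n_markers = 200, 10
p = rng.uniform(0.3, 0.7, n_markers)
Z = np.where(rng.random((n_records, n_markers)) < p, 1.0 - p, -p)
w = np.ones(n_records)                         # equal weights for the toy example
y = Z @ rng.normal(size=n_markers) + rng.normal(size=n_records)
k = 10.0                                       # made-up variance ratio

m_iter, rounds_used = jacobi_under_relaxed(Z, w, y, k)
m_direct = np.linalg.solve(Z.T @ (w[:, None] * Z) + k * np.eye(n_markers),
                           Z.T @ (w * y))
print(rounds_used, np.allclose(m_iter, m_direct, atol=1e-4))
```

Each round touches the genotypes twice (once for `Z @ m`, once for `Z.T @ ...`) and never forms the coefficient matrix, which matches the markers x animals x iterations cost.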

Potential Results
• Simulation of 50,000 SNPs, 100 QTLs
• Higher REL if major QTLs exist or >3,000 bulls genotyped; lower if more loci (>100) affect trait
• Reliability = accuracy²

Reliability from Genotyping
• Daughter equivalents
  • $DE_{Total} = DE_{PA} + DE_{Prog} + DE_{YD} + DE_{G}$
  • $DE_{G}$ is additional DE from genotype
  • $REL = DE_{Total} / (DE_{Total} + k)$
• Gains in reliability
  • $DE_{G}$ could be about 15 for Net Merit
  • More for traits with low heritability
  • Less for traits with high heritability
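The reliability formula can be worked through with a small example. This is an illustrative sketch: the function name is hypothetical, and the parent-average DE of 7 and k of 14 are made-up inputs, not values from the slides. Only the DE_G of 15 comes from the text above. Note that k here is the constant in the reliability formula, passed in as a plain number, and is not the marker variance ratio used earlier.

```python
def reliability(de_pa, de_prog, de_yd, de_g, k):
    """REL = DE_total / (DE_total + k), with DE_total summed from its four parts."""
    de_total = de_pa + de_prog + de_yd + de_g
    return de_total / (de_total + k)

# Hypothetical young bull: parent average only, no progeny or yield deviations.
k = 14.0
rel_before = reliability(de_pa=7.0, de_prog=0.0, de_yd=0.0, de_g=0.0, k=k)
rel_after = reliability(de_pa=7.0, de_prog=0.0, de_yd=0.0, de_g=15.0, k=k)

print(f"REL without genotype: {rel_before:.3f}")
print(f"REL with DE_G = 15:   {rel_after:.3f}")
```

With these made-up inputs, the daughter equivalents rise from 7 to 22 and reliability from 7/21 to 22/36.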

Conclusions
• Predictions from 50,000 SNPs using:
  • Selection index equations, or
  • Iteration on genotype data
• Predictions correlated by up to 0.9999
• Linear and nonlinear costs OK
• Convergence within 200 to 2500 rounds
• Nonlinear regression improved reliabilities
• Real data predictions available soon
