
Imputation

[Title graphic: rows of SNP genotype codes (0, 1, 2)]

Imputation
  • Based on splitting the genotype into individual chromosomes (maternal and paternal contributions)
  • Missing SNPs assigned by tracking inheritance from ancestors and descendants
  • Imputed genotypes of dams increase the predictor population
  • Genotypes from all chips merged by imputing SNPs not present
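Merging genotypes from different chips means placing every animal's calls on one combined SNP map and marking SNPs that its chip lacks as missing (code 5), so that imputation can fill them later. A minimal Python sketch, assuming each animal's calls are a dict keyed by hypothetical SNP names; it keeps first-seen SNP order, whereas a real pipeline would sort by chromosome and position.

```python
from typing import Dict, List

MISSING = 5  # genotype code for a SNP that was not called or not on the chip


def merged_snp_map(chips: List[List[str]]) -> List[str]:
    """Union of SNP names across chips, in first-seen order (a simplification)."""
    seen = set()
    snp_map = []
    for chip in chips:
        for snp in chip:
            if snp not in seen:
                seen.add(snp)
                snp_map.append(snp)
    return snp_map


def to_common_map(calls: Dict[str, int], snp_map: List[str]) -> List[int]:
    """One animal's genotype on the merged map; SNPs absent from its chip stay missing for imputation."""
    return [calls.get(snp, MISSING) for snp in snp_map]


# Hypothetical example: a low-density chip and a 50K-style chip sharing one SNP
low_density = ["snpA", "snpB"]
fifty_k = ["snpB", "snpC", "snpD"]
snp_map = merged_snp_map([low_density, fifty_k])
print(snp_map)                                         # ['snpA', 'snpB', 'snpC', 'snpD']
print(to_common_map({"snpA": 2, "snpB": 1}, snp_map))  # [2, 1, 5, 5]
```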
Terms
  • Genotype – Alleles on both chromosomes for all markers
  • Allele representation – A/B, or nucleotides A/C/G/T
  • Genotype representation – number of A alleles: 0, 1, 2; 5 = missing
  • Imputation – Determination of an allele from alleles of other markers and animals
  • Phasing – Separating a genotype into individual chromosomes and possibly assigning maternal or paternal origin
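The genotype representation defined above (number of A alleles, 5 for missing) can be sketched as a small Python function. The function name is hypothetical, and partially known genotypes are deliberately not handled here.

```python
from typing import Optional

MISSING = 5  # genotype code used when the genotype is missing


def genotype_code(allele1: Optional[str], allele2: Optional[str]) -> int:
    """Number of A alleles at one SNP (0, 1 or 2), or 5 if an allele is missing (None).

    Assumes the A/B allele representation; allele order does not matter,
    so AB and BA both give 1."""
    if allele1 is None or allele2 is None:
        return MISSING
    for allele in (allele1, allele2):
        if allele not in ("A", "B"):
            raise ValueError(f"expected 'A' or 'B', got {allele!r}")
    return int(allele1 == "A") + int(allele2 == "A")


print([genotype_code("B", "B"), genotype_code("A", "B"),
       genotype_code("A", "A"), genotype_code(None, "A")])  # [0, 1, 2, 5]
```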
Genotype for Elevation
  • Chromosome 1

10001112200200121110111121111011110011211000201220022201111202101200211122110021112001111001011011010220011002201101120020110102022212112210201001110001122022122211202112012020100202202000021100011202011221112111022011110000212202000221012020002211220111012100111211102112110020102100022000220100020110000220221102211210112111012222001211212220020002002020201222110022222220022121111210021111200110111011200202220001112011010211121211102022100211201211001111102111211021112200010110111020220022111010201112111101120210210212110110221220012110112110120220110022200210021100011100211021101110002220020221212110002220102002222121221121112002011020200122222211221202121121011001211011020022000200100200011110110012110212121112010101212022101010111110211021122111111212111210110120011111021111011111220121012121101022202021211222120222002121210121210201100111222121101

X chromosome
  • Bull

202220200002022220002020222020202

  • Cow

1201201212222010111022210210212022
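A bull carries a single X, so outside the pseudoautosomal region his X-chromosome SNPs should be called homozygous (0 or 2), as in the bull row above, while the cow shows heterozygous calls. A Python sketch of that check; treating every SNP as outside the pseudoautosomal region is an assumption, and the tolerance parameter is hypothetical.

```python
def heterozygous_count(genotype: str) -> int:
    """Count heterozygous calls (code 1) in a string of genotype codes."""
    return genotype.count("1")


def plausible_male_x(genotype: str, max_het: int = 0) -> bool:
    """A male's non-pseudoautosomal X SNPs should show no heterozygous calls;
    max_het is a hypothetical tolerance for genotyping errors."""
    return heterozygous_count(genotype) <= max_het


bull = "202220200002022220002020222020202"
cow = "1201201212222010111022210210212022"
print(plausible_male_x(bull), plausible_male_x(cow))  # True False
```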

findhap
  • Developed by Paul VanRaden
  • Divides chromosomes into segments
  • Uses successively shorter segments, typically in 3 runs
    • Long segments lock in haplotypes identical by descent
    • Shorter segments fill in missing SNPs
  • Separates each genotype into maternal and paternal haplotypes (phasing)
  • Builds haplotype library sorted by frequency
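The run structure above can be illustrated by computing segment boundaries for each run. This Python sketch covers only the segmentation, not the haplotyping itself; the default lengths of 2000, 600 and 200 SNPs are hypothetical values chosen for illustration, since the slide gives none.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[int, int]  # (first SNP index, one past the last SNP index)


def segment_bounds(n_snps: int, seg_len: int) -> List[Segment]:
    """Split a chromosome of n_snps SNPs into consecutive segments of seg_len SNPs;
    the last segment holds whatever remains."""
    return [(start, min(start + seg_len, n_snps)) for start in range(0, n_snps, seg_len)]


def runs(n_snps: int, seg_lengths: Sequence[int] = (2000, 600, 200)) -> List[List[Segment]]:
    """Segments for each run, from long (to lock in IBD haplotypes) to short
    (to fill in remaining missing SNPs). The default lengths are hypothetical."""
    return [segment_bounds(n_snps, length) for length in sorted(seg_lengths, reverse=True)]


for run, segments in enumerate(runs(2500), start=1):
    print(run, len(segments), segments[:2])
```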
findhap characteristics
  • Population haplotyping
    • Divides chromosomes into segments
    • Lists haplotypes by genotype match
    • Similar to fastPHASE, IMPUTE, or long-range phasing
  • Pedigree haplotyping
    • Detects crossovers; fixes noninheritance
    • Imputes genotypes of nongenotyped ancestors
Recent program revisions
  • Improved imputation and reliability
  • Changes since January 2010
    • Use known haplotype if 2nd is unknown
    • Use current instead of base frequency
    • Combine parent haplotypes if crossover is detected
    • Begin search with parent or grandparent haplotypes
    • Store 2 most popular progeny haplotypes
  • Decreased computing time by using previous haplotype library
Population haplotyping
  • Put 1st genotype into haplotype list
  • Check next genotype against list
    • Do any homozygous loci conflict?
      • If haplotype conflicts, continue search
      • If match, fill any unknown SNP with homozygote
      • 2nd haplotype = genotype minus 1st haplotype
      • Search for 2nd haplotype in rest of list
    • If no match in list, add to end of list
  • Sort list to put frequent haplotypes 1st
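The list-building steps above can be sketched in Python for one segment. This is a simplified, greedy illustration of the population-haplotyping idea, not the findhap code. It ignores pedigree, uses genotype codes 0 = BB, 1 = AB, 2 = AA, 5 = missing and haplotype codes 0 = B, 1 = not known, 2 = A, and all function names are hypothetical. One assumption: the 2nd-haplotype search starts at the 1st match itself, so a homozygous animal can count the same haplotype twice.

```python
from typing import List, Optional, Tuple

# Genotype codes: 0 = BB, 1 = AB, 2 = AA, 5 = missing.
# Haplotype codes: 0 = B, 1 = not known, 2 = A.
Haplotype = List[int]


def conflicts(alleles: List[int], hap: Haplotype) -> bool:
    """True if a known allele (0 or 2) in `alleles` disagrees with a known allele in `hap`.

    For a genotype the known alleles are its homozygous loci; for a haplotype
    they are every locus not coded 1."""
    return any(a in (0, 2) and h != 1 and a != h for a, h in zip(alleles, hap))


def find_match(alleles: List[int], library: List[Haplotype], start: int = 0) -> Optional[int]:
    """Index of the first library haplotype at or after `start` with no conflict."""
    for i in range(start, len(library)):
        if not conflicts(alleles, library[i]):
            return i
    return None


def fill(alleles: List[int], hap: Haplotype) -> Haplotype:
    """Fill unknown haplotype SNPs wherever `alleles` has a known allele (0 or 2)."""
    return [a if h == 1 and a in (0, 2) else h for a, h in zip(alleles, hap)]


def remainder(genotype: List[int], first: Haplotype) -> Haplotype:
    """2nd haplotype = genotype minus 1st haplotype; 1 where it cannot be determined."""
    second = []
    for g, h in zip(genotype, first):
        if g in (0, 2):
            second.append(g)        # homozygote: both alleles are the same
        elif g == 1 and h != 1:
            second.append(2 - h)    # heterozygote: the other allele
        else:
            second.append(1)        # missing genotype, or 1st allele unknown
    return second


def population_haplotyping(genotypes: List[List[int]]) -> List[Tuple[Haplotype, int]]:
    """Greedy list building for one segment; returns (haplotype, count), most frequent first."""
    library: List[Haplotype] = []
    counts: List[int] = []

    def add_or_count(alleles: List[int], start: int) -> int:
        i = find_match(alleles, library, start)
        if i is None:
            # No match in list: add to end (heterozygous or missing SNPs stay unknown)
            library.append([a if a in (0, 2) else 1 for a in alleles])
            counts.append(1)
            return len(library) - 1
        library[i] = fill(alleles, library[i])  # fill unknown SNPs from known alleles
        counts[i] += 1
        return i

    for g in genotypes:
        first = add_or_count(g, 0)
        second = remainder(g, library[first])
        # Entries before `first` conflict with g at a homozygous locus, and `second`
        # keeps every homozygous call of g, so the search can start at `first`.
        add_or_count(second, first)

    order = sorted(range(len(library)), key=lambda k: -counts[k])
    return [(library[k], counts[k]) for k in order]


print(population_haplotyping([[2, 1, 0], [2, 2, 0], [2, 0, 0], [2, 1, 0]]))
# [([2, 2, 0], 5), ([2, 0, 0], 3)]
```

Applied to the chromosome 15 list on the following slides, `find_match` skips the first four haplotypes and stops at the 3.66% haplotype, and `remainder` returns the 3.42% haplotype.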
Coding of alleles and segments
  • Genotypes
    • 0 = BB, 1 = AB or BA, 2 = AA
    • 3 = B_, 4 = A_, 5 = __ (missing)
    • Allele frequency used for missing
  • Haplotypes
    • 0 = B, 1 = not known, 2 = A
  • Segment inheritance (example)
    • Son has haplotype numbers 5 and 8
    • Sire has haplotype numbers 8 and 21
    • Son got haplotype number 5 from dam
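The codes and the segment-inheritance example above can be expressed as two small Python functions. This is a sketch with hypothetical function names; returning None when the sire carries both or neither of the progeny's haplotypes is a choice made here, not something the slide specifies.

```python
from typing import Optional, Tuple

UNKNOWN = 1  # haplotype code: 0 = B, 1 = not known, 2 = A


def genotype_from_haplotypes(h1: int, h2: int) -> int:
    """Combine two haplotype codes into the genotype code:
    0 = BB, 1 = AB/BA, 2 = AA, 3 = B_, 4 = A_, 5 = __ (missing)."""
    known = [h for h in (h1, h2) if h != UNKNOWN]
    if len(known) == 2:
        return (h1 + h2) // 2          # 0+0 -> 0, 0+2 -> 1, 2+2 -> 2
    if len(known) == 1:
        return 3 if known[0] == 0 else 4
    return 5


def haplotype_from_dam(progeny: Tuple[int, int], sire: Tuple[int, int]) -> Optional[int]:
    """Haplotype number the progeny received from its dam for one segment, or None
    when it cannot be decided (both or neither progeny haplotypes fit the sire)."""
    a, b = progeny
    candidates = set()
    if b in sire:          # b came from the sire, so a came from the dam
        candidates.add(a)
    if a in sire:          # a came from the sire, so b came from the dam
        candidates.add(b)
    return candidates.pop() if len(candidates) == 1 else None


print(haplotype_from_dam((5, 8), sire=(8, 21)))  # 5, as in the example above
```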
Most frequent haplotypes
  • 1st segment of chromosome 15
  • For efficiency, store haplotypes just once
  • Most frequent Holstein haplotype had 4,316 copies (0.0516 × 41,822 animals × 2 chromosomes each)

Rank, frequency, haplotype (0 = B, 2 = A):
1 5.16% 022222222020020022002020200020000200202000022022222202220
2 4.37% 022020220202200020022022200002200200200000200222200002202
3 4.36% 022020022202200200022020220000220202200002200222200202220
4 3.67% 022020222020222002022022202020000202220000200002020002002
5 3.66% 022222222020222022020200220000020222202000002020220002022
6 3.65% 022020022202200200022020220000220202200002200222200202222
7 3.51% 022002222020222022022020220200222002200000002022220002220
8 3.42% 022002222002220022022020220020200202202000202020020002020
9 3.24% 022222222020200000022020220020200202202000202020020002020
10 3.22% 022002222002220022002020002220000202200000202022020202220
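The frequencies in this list are copies divided by the number of chromosomes (2 per animal). A quick Python check of the Holstein figures quoted above; the helper name is hypothetical.

```python
def haplotype_frequency(copies: int, animals: int) -> float:
    """Share of all chromosome copies carrying the haplotype (2 chromosomes per animal)."""
    return copies / (2 * animals)


freq = haplotype_frequency(4316, 41822)
print(f"{freq:.2%}")                 # 5.16%
print(round(0.0516 * 41822 * 2))     # 4316
```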

Check new genotype against list
  • 1st segment of chromosome 15
    • Search for 1st haplotype that matches genotype

Genotype: 022112222011221022021110220010110212202000102020120002021

    • Get 2nd haplotype by removing 1st from genotype

2nd haplotype: 022002222002220022022020220020200202202000202020020002020

5.16% 022222222020020022002020200020000200202000022022222202220
4.37% 022020220202200020022022200002200200200000200222200002202
4.36% 022020022202200200022020220000220202200002200222200202220
3.67% 022020222020222002022022202020000202220000200002020002002
3.66% 022222222020222022020200220000020222202000002020220002022 (1st match)
3.65% 022020022202200200022020220000220202200002200222200202222
3.51% 022002222020222022022020220200222002200000002022220002220
3.42% 022002222002220022022020220020200202202000202020020002020 (2nd haplotype)
3.24% 022222222020200000022020220020200202202000202020020002020
3.22% 022002222002220022002020002220000202200000202022020202220

Recessive defect discovery
  • Check for homozygous haplotypes
    • Most haplotype blocks ~5 Mbp long
    • 7–90 homozygotes expected, but 0 observed
  • 5 of top 11 haplotypes confirmed as lethal
  • Fertility records (936–52,449) from carrier sire × carrier maternal grandsire (MGS) matings showed 3.0–3.7% lower conception rates
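One simple way to obtain an expected homozygote count is to sum, over genotyped animals, the Mendelian probability that each is homozygous given whether its sire and MGS carry the haplotype; a Poisson approximation then gives the chance of observing none. This Python sketch assumes carriers are heterozygous, the maternal granddam is a noncarrier, and no other ancestor carries the haplotype. It is an illustration under those assumptions, not necessarily the calculation behind the numbers on this slide.

```python
import math
from fractions import Fraction
from typing import Iterable, Tuple


def prob_homozygous(sire_carrier: bool, mgs_carrier: bool) -> Fraction:
    """Chance a progeny is homozygous for the haplotype under the assumptions above."""
    from_sire = Fraction(1, 2) if sire_carrier else Fraction(0)
    # MGS passes the haplotype to the dam with probability 1/2, the dam to the progeny with 1/2
    from_dam = Fraction(1, 4) if mgs_carrier else Fraction(0)
    return from_sire * from_dam


def expected_homozygotes(animals: Iterable[Tuple[bool, bool]]) -> float:
    """Sum of per-animal probabilities over (sire_carrier, mgs_carrier) pairs."""
    return float(sum((prob_homozygous(s, m) for s, m in animals), Fraction(0)))


def prob_none_observed(expected: float) -> float:
    """Poisson approximation to the chance of seeing 0 homozygotes when `expected` are expected."""
    return math.exp(-expected)


# Hypothetical: 56 genotyped animals from carrier sire x carrier MGS matings
expected = expected_homozygotes([(True, True)] * 56)
print(f"{expected:.1f} {prob_none_observed(expected):.4f}")  # 7.0 0.0009
```

Even at the low end of the expected range, observing zero homozygotes would be very unlikely if the haplotype had no effect, which is why such haplotypes are flagged as possible lethals.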
Traditional evaluations 3X/year
  • Yield
    • Milk, fat, protein, component percentages
  • Type
    • Stature, udder characteristics, feet and legs
  • Calving
    • Calving ease, stillbirth rate
  • Functional
    • Somatic cell score, productive life, fertility
Genomic prediction of progeny test

  • Timeline (years 0 to 5)
    • Select parents, transfer embryos to recipients
    • Calves born and DNA tested
    • Calves born from DNA-selected parents
    • Bull receives progeny test
  • Reduce generation interval from 5 to 2 yr

Benefit of genomics
  • Determine value of bull at birth
  • Increase selection accuracy
  • Reduce generation interval
  • Increase selection intensity
  • Increase rate of genetic gain
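Three of the benefits above (selection accuracy, generation interval, selection intensity) are the terms of the standard breeder's equation for genetic gain per year. The form below is the usual textbook expression, added for reference rather than taken from the slides; i is selection intensity, r the accuracy of selection, σ_A the additive genetic standard deviation, and L the generation interval in years.

```latex
\Delta G_{\text{year}} = \frac{i \, r \, \sigma_A}{L}
```

With i, r and σ_A held fixed, cutting L from 5 to 2 yr would multiply annual gain by 5/2 = 2.5. Genomic predictions for young bulls are typically less accurate than progeny-test evaluations, so r drops and the realized increase is smaller than that 2.5-fold figure.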
Genomic evaluation program
  • Identify animals to genotype
  • Send sample to genotyping laboratory
  • Genotype sample
  • Send genotype to evaluation center
  • Calculate genomic evaluation
  • Release monthly evaluation
Genomic data flow

[Diagram: flow of DNA samples, genotypes, nominations, pedigree data, genotype quality reports, and genomic evaluations among DHI herds, the DNA laboratory, AI organizations and breed associations, and CDCB]

Steps to prepare genotypes
  • Nominate animal for genotyping
  • Collect blood, hair, semen, nasal swab, or ear punch
    • Blood may not be suitable for twins (twins can be blood chimeras)
  • Extract DNA at laboratory
  • Prepare DNA and apply to beadchip
  • Amplify and hybridize DNA, a 3-day process
  • Read red/green intensities from chip and call genotypes from clusters
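The last step, calling genotypes from red/green intensity clusters, can be illustrated with a toy caller. This is a minimal Python sketch, assuming the red channel measures allele A and green allele B, and using fixed, hypothetical boundaries on the polar angle of the two intensities. Production software instead fits cluster positions for each SNP from many samples.

```python
import math

MISSING = 5


def call_genotype(red: float, green: float, ab_low: float = 0.25, ab_high: float = 0.75,
                  min_intensity: float = 0.2) -> int:
    """Call one SNP from two-channel intensities using fixed, hypothetical cluster boundaries.

    theta runs from 0 (all red, assumed allele A) to 1 (all green, assumed allele B)."""
    total = red + green
    if total < min_intensity:
        return MISSING                      # too little signal: no call
    theta = (2 / math.pi) * math.atan2(green, red)
    if theta < ab_low:
        return 2                            # AA cluster
    if theta > ab_high:
        return 0                            # BB cluster
    return 1                                # AB cluster


print([call_genotype(1.0, 0.05), call_genotype(0.9, 1.0), call_genotype(0.05, 1.1),
       call_genotype(0.05, 0.05)])  # [2, 1, 0, 5]
```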
What can go wrong
  • Inadequate DNA quality or quantity from sample
  • Genotype with many SNPs that cannot be determined (90% call rate required)
  • Parent-progeny conflicts
    • Pedigree error
    • Sample ID error (switched samples)
    • Laboratory error
    • Parent-progeny relationship detected that is not in pedigree
Parentage validation and discovery
  • Parent-progeny conflicts detected
    • Animal checked against all other genotypes
    • Conflict reported to breeds and requesters
    • Correct sire usually detected
  • MGS checked
    • 1 SNP at a time
    • Haplotype checking more accurate
  • Breeds moving to accept SNPs in place of microsatellites
Parent-progeny conflicts

  • Sire: conflicts = 0, tests = 10, conflict % = 0%
  • MGS: conflicts = 3, tests = 10, conflict % = 30.0%

[Chart: conflict % by relationship]

Parent-progeny conflicts
  • For animal
    • Pedigree wrong
    • Genotype unreliable (3K)
  • For SNP
    • SNP unreliable
    • Clustering needs adjustment

Parent  10212002101201211001020100100
Progeny 10202010100200221001120120220
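The duo check behind these conflicts counts SNPs where parent and progeny have opposite homozygotes (0 versus 2). A Python sketch applied to the parent and progeny strings above. Counting a "test" at every SNP where both animals are homozygous is an assumption, so the totals need not match the 10 tests in the chart.

```python
from typing import Tuple

HOMOZYGOUS = "02"  # genotype codes 0 = BB and 2 = AA


def duo_test(parent: str, progeny: str) -> Tuple[int, int]:
    """Count opposite-homozygote conflicts between a parent and its progeny.

    Returns (conflicts, tests); a test is counted at every SNP where both animals
    are homozygous, the only SNPs where a duo can show a conflict."""
    conflicts = tests = 0
    for p, o in zip(parent, progeny):
        if p in HOMOZYGOUS and o in HOMOZYGOUS:
            tests += 1
            if p != o:          # one animal is AA, the other BB
                conflicts += 1
    return conflicts, tests


conflicts, tests = duo_test("10212002101201211001020100100",
                            "10202010100200221001120120220")
print(conflicts, tests, f"{conflicts / tests:.1%}")  # 3 17 17.6%
```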
Detecting unreliable genotypes

[Chart: genotypes by parent-progeny conflicts (%), with an Accept region at low conflict rates and a Reject region for unreliable genotypes at high rates]

MGS detection
  • SNP conflict method (SNP)
    • Check if animal and MGS have opposite homozygotes (duo test)
    • If sire is genotyped, some heterozygous SNPs can be checked (trio test)
  • Common haplotype method (HAP)
    • After imputation of all loci, determine maternal contribution by removing paternal haplotype
    • Count maternal haplotypes in common with MGS
    • Remove haplotypes from MGS and check remaining against maternal great-grandsire (MGGS)
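The common haplotype method can be sketched on per-segment haplotype numbers, in the style of the segment-inheritance example earlier. In this Python sketch the function name and all haplotype numbers are hypothetical, and the animal's maternal haplotype is assumed to have already been found by removing the paternal one.

```python
from typing import Sequence, Tuple

Pair = Tuple[int, int]  # the two haplotype numbers an animal carries in one segment


def common_haplotype_check(maternal: Sequence[int], mgs: Sequence[Pair],
                           mggs: Sequence[Pair]) -> Tuple[int, int, int]:
    """Count maternal segments shared with the MGS, then check the rest against the MGGS.

    maternal[k] is the haplotype number the animal got from its dam in segment k.
    Returns (shared with MGS, remaining segments, remaining segments shared with MGGS)."""
    shared_mgs = [m in pair for m, pair in zip(maternal, mgs)]
    # Segments not explained by the MGS should trace to the maternal granddam,
    # whose sire is the MGGS.
    remaining = [(m, pair) for m, pair, hit in zip(maternal, mggs, shared_mgs) if not hit]
    shared_mggs = sum(m in pair for m, pair in remaining)
    return sum(shared_mgs), len(remaining), shared_mggs


# Hypothetical haplotype numbers for four segments
maternal = [5, 8, 3, 9]
mgs = [(5, 2), (7, 8), (1, 4), (6, 6)]
mggs = [(0, 0), (0, 0), (3, 11), (12, 13)]
print(common_haplotype_check(maternal, mgs, mggs))  # (2, 2, 1)
```

Because the dam passes on a recombinant of her sire's and dam's haplotypes, a correct MGS should match roughly half of the maternal segments; far fewer matches point to a wrong MGS.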
Results by breed

[Table: results by breed; *50K genotyped animals only]

Lab QC
  • Each SNP evaluated for
    • Call rate
    • Proportion heterozygous
    • Parent-progeny conflicts
  • Clustering investigated if SNP exceeds limits
  • Number of failing SNPs indicates genotype quality
  • Target <10 SNPs in each category
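The per-SNP checks above can be sketched in Python. Assumptions: genotypes are a list of animals, each a list of codes 0, 1, 2 or 5; a SNP fails the call-rate check when more than 10% of its calls are missing, following the "Low Call Rate SNP >10%" line in the QC report; the heterozygosity limit is hypothetical; and the per-SNP parent-progeny conflict check is omitted. The genotype-level call rate corresponds to the 90% requirement listed under what can go wrong.

```python
from typing import Dict, List

MISSING = 5


def snp_stats(genotypes: List[List[int]]) -> List[Dict[str, float]]:
    """Per-SNP call rate and proportion heterozygous; genotypes[animal][snp] uses codes 0, 1, 2, 5."""
    stats = []
    for j in range(len(genotypes[0])):
        calls = [g[j] for g in genotypes if g[j] != MISSING]
        call_rate = len(calls) / len(genotypes)
        het = calls.count(1) / len(calls) if calls else 0.0
        stats.append({"call_rate": call_rate, "het": het})
    return stats


def failing_snps(stats: List[Dict[str, float]], min_call_rate: float = 0.90,
                 max_het: float = 0.60) -> Dict[str, int]:
    """Number of SNPs failing each check; max_het is a hypothetical limit."""
    return {
        "low_call_rate": sum(s["call_rate"] < min_call_rate for s in stats),
        "high_heterozygosity": sum(s["het"] > max_het for s in stats),
    }


def genotype_call_rate(genotype: List[int]) -> float:
    """Share of one animal's SNPs that were called (not coded 5)."""
    return sum(code != MISSING for code in genotype) / len(genotype)


batch = [[0, 1, 2], [0, 1, 5], [2, 1, 5], [0, 1, 2]]   # 4 animals x 3 SNPs
fails = failing_snps(snp_stats(batch))
print(fails)                                  # {'low_call_rate': 1, 'high_heterozygosity': 1}
print(all(n < 10 for n in fails.values()))    # True: under the <10 target
print(genotype_call_rate([0, 1, 5]) >= 0.90)  # False: below a 90% call rate
```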
Automated QC reporting

6160 genotypes processed from LAB2013021811

PASS/FAIL,Count,Description
PASS,1,Parent Progeny Conflict SNP >2%
PASS,5,Low Call Rate SNP >10%
PASS,0,HWE SNP
PASS,0,Chips w/ >20 Conflicts
PASS,0.3,No Nomination %
PASS,0,Genotype Submitted with No Sample Sheet Row

Reliability of Holstein predictions

[Chart: reliability of Holstein predictions; *2011 deregressed value – 2007 genomic evaluation]

Ways to increase accuracy
  • Automatic addition of traditional evaluations of genotyped bulls when they are 5 yr old
  • Possible genotyping of 10,000 bulls with semen in repository
  • Collaboration with other countries
  • Use of more SNPs from HD chips
  • Full sequencing – identify causative mutations
Application to more traits
  • Animal’s genotype is good for all traits
  • Traditional evaluations required for accurate estimates of SNP effects
  • Traditional evaluations not currently available for heat tolerance or feed efficiency
  • Research populations could provide data for traits that are expensive to measure
  • Will resulting evaluations work in target population?
Impact on producers
  • Young-bull evaluations with accuracy of early 1st-crop evaluations
  • AI organizations marketing genomically evaluated young bulls
  • Genotype usually required to be a bull dam
  • Rate of genetic improvement likely to increase by up to 50%
  • AI organizations reducing progeny-test programs
Why genomics works for dairy cattle
  • Extensive historical data available
  • Well-developed genetic evaluation program
  • Widespread use of AI sires
  • Progeny-test programs
  • High-value animals worth the cost of genotyping
  • Long generation interval that can be reduced substantially by genomics
Council on Dairy Cattle Breeding – CDCB
  • CDCB assuming responsibility for receiving data and computing and delivering U.S. evaluations
  • USDA will continue research and development to improve evaluation system
  • CDCB and USDA employees located at USDA’s Beltsville Agricultural Research Center in Beltsville, Maryland