560 likes | 719 Views
Linkage analysis. Jan Hellemans. 6. Finding causal mutations. 2 opposing strategies sequence then select select then sequence Sequencing traditional Sanger sequencing only possible after selection Massively parallel sequencing possible prior to or after selection RNA sequencing
E N D
Linkage analysis Jan Hellemans 6
Finding causal mutations • 2 opposing strategies • sequence then select • select then sequence • Sequencing • traditional Sanger sequencing only possible after selection • Massively parallel sequencing possible prior to or after selection • RNA sequencing • exome sequencing • genome sequencing
Finding causal mutations • Selection • positional (prior to sequencing) • linkage analysis • GWAS • structural variations (e.g. microdeletions) • functional (prior to & after sequencing) • candidate genes selected based on known function or involvement in related disorders • filtering of variants based on functional predictions • overlap (after sequencing) • looking for genes / variants that occur in multiple independent patients • mostly a combination is used
Aims Interprete microsatellite results Add genotypes to pedigrees Create pedigree and genotype files Calculate and interprete LOD-scores Delineate linkage intervals Basic principles of linkage analysis Analyze other types of markers Association studies Learn how to work with specific pedigree programs
Preparations • Clearly define the phenotype • If not specific enough than you may analyze different disorders that can map to different genomic loci • LOD scores are additive • Find suitable families • larger is better • more patients is better • Collect genomic DNA from as much family members as possible • Determine the type of inheritance • Calculate the power to prove linkage with the available material (SLink – not part of this course)
Linkage analysis types • Directed linkage analysis • Evaluate linkage at a specific locus such as a candidate gene • Common approach: evaluate an intragenic, 5’ and 3’ markeroften microsattelites • Genome wide linkage analysis • Screen for linkage for markers spread across the entire genome • Microsatellites: ~400 markers spaced at about 10cM • SNP’s: 500k SNP array • Homozygosity mapping • Screen only affected individuals in inbred families • Select homozygous markers (typically SNP markers) • Very efficient technology • Fine mapping • Some linked markers are known, but the borders of the linkage interval still need to be defined
Exercise – Part 1 • 2 inbred families with a recessive disorder • With a homozygosity mapping based on 500k SNP arrays 2 candidate regions could be identified • Chromosome 4 • Patient 1 homozygous for • 6.052Mb - 14.488Mb • 21.008Mb – 37.477Mb • Patient 2 homozygous for • 11.186Mb – 37.219Mb • Task: find microsatellite markers to confirm linkage
Find additional flanking markers • Find physical position of marker in NCBI > UniSTS • NCBI map viewer: http://www.ncbi.nlm.nih.gov/mapview/ • Go to Homo sapiens and to the wright chromosome • Maps & options: show • DeCode, Généthon & Marshfield (genetic maps) • Genes • Set region: e.g. 2Mb up- and downstream of your marker • Click ‘Data as table view’ • Click on STS behind a marker to see its details • Select markers that • locate to only 1 genomic location • have a PCR product with an extended size rangeone size not polymorphic
Exercise – Part 1 > possible solution • Markers in 1st candidate region • D4S3017 (21.078Mb) • D4S3044 (25.189Mb) • D4S1618 (33.857Mb) • D4S3350 (33.857Mb) • D4S2988 (36.889Mb) • Markers in 2nd candidate region • D4S1582 (10.311Mb) • D4S2906 (12.321Mb) • D4S2944 (13.141Mb) • D4S1602 (14.059Mb) • D4S2960 (15.437Mb) • Order primers & analyze them on all family members
Microsatellites > basics • Repeats of short sequences (e.g. 2bp)NNNNAC(AC)nACNNNN • Number of repeats is variable (instable sequence) • Number of repeats determines the allele • Number of repeats corresponds to specific length of PCR product: • allel 1: NNNNACACACACACNNNN (5*AC 18bp) • allel 2: NNNNACACACACACACNNNN (6*AC 20bp) • allel 3: NNNNACACACACACACACNNNN (7*AC 22bp) • ... • Determine length to know the allele (sequencer)
Microsatellites > determine size • Use internal size standard (other color) 220bp 230bp 225bp
Microsatellites > heterozygotes 220bp 230bp 223bp 225bp
Microsatellites > stutter peaks • Repeats are difficult to copy polymerase slips • Some amplicons have 1 repeat lessa few even loose multiple repeats • Small repeats are more prone to slippage and show more pronounced stutter peaks • Largest product is the correct one • Distance between peaks = length of a repeat
Microsatellites > stutter peaks allelic peak 1st stutter peak 2nd stutter peak
Microsatellites > stutter peaks • Allelic peaks are the heighest • Stutter peaks are lower A1 A2
Microsatellites > +A peaks • Taq polymerase tends to add an extra A at the 3’ end • Variable degree of products with or without this extra A • Do not confuse with stutter peaks (only 1bp difference) allelic peak allelic peak + A 1st stutter peak 1st stutter peak + A 2nd stutter peak 2nd stutter peak + A
Microsatellites > mutliplex • Combine multiple markers in a single analysis ($$$) • Different size range • Multicolor • Commercial kits: e.g. 16 markers / lane
Genotyping pedigrees • Screen one or multiple markers for some or all family members • For every marker: • Make a list of all occuring allele sizes • Due to technical variation on sizing the same allele can have a slightly different size in different measurements (-0.4bp _ +0.4bp). Give all alleles within this range the same allele number • Add the allele numbers to the pedigree at the corresponding individual/marker combination • Find the wright phase • Advanced software like GeneMapper can generate tables with allele numbers for every sample / marker • Advanced pedigree programs like Progeny can store genotype information for family members • Verify inheritance
Exercise – Part 2 • Genotype 3 markers in all available individuals of 2 families • Pedigrees & microsatellite plots inExercisePart2-GenotypingData.pdf • Add allele numbers for the 3 markers to the pedigree • Interprete the genotyped pedigrees: linked?
Exercise – Part 2 > Conclusions • D4S1582 • Mendelian error can not be interpreted • D4S2944 • Linked • D4S3017 • Not-linked: unaffected individuals with the same genotype as a patient
EasyLinkage EasyLinkage = UI for linkage analysis http://genetik.charite.de/hoffmann/easyLINKAGE/index.html#start Bioinformatics. 2005 Feb 1;21(3):405-7 PMID: 15347576 Bioinformatics. 2005 Sep 1;21(17):3565-7 PMID: 16014370 Interface for many linkage analysis programs Input Pedigree file (linkage format) Genotype file(s) Marker information (already provided for popular markers) Settings
Pedigree file Naming requirements for EasyLinkage:p_xxx.pro e.g. p_SMMD.pro Format: Tab delimited text file 1 individual per row Columns: 1 family ID 2 person ID 3 father ID 4 mother ID 5 sex (1=male, 2=female, 0=unknown) 6 affection status (1=unaffected, 2=affected, 0=unknown) 7 DNA availability (optional, relevant for power calculations) 8 liability class (to be provided if multiple liability classes are used)
Genotype files Person ID’s have to match exactly with those provided in the pedigree file Naming requirements for EasyLinkage:MarkerName_xxx.abi e.g. D1S1609_SMMD.abi Format: Tab delimited text file 1 individual per row Columns (for microsatellite based analysis): 1 marker (same as in file name and matching a marker in an available marker set) 2 custom information (content doesn’t matter, but column must be present) 3 individual ID (match person ID in pedigree file) 4 & 5 genotypes for 2 alleles (unknown=0)
Marker information Contains information on the chromosome and position of every marker Already available for a number of commercial SNP-arrays and for the microsatellite markers from Genethon Marshfield DeCode Custom marker sets can be created (see manual)
EasyLinkage settings Choose a program: FastLink Parametric, single-point SuperLink Parametric, single-/multipoint SPLink Nonparametric, single-point Genehunter Nonpara-/parametric, single-/multipoint Genehunter Plus Nonpara-/parametric, single-/multipoint Genehunter MOD Nonpara-/parametric, single-/multipoint Genehunter Imprinting Nonpara-/parametric, single-/multipoint GeneHunter TwoLocus Parametric, two-locus, single-/multipoint Merlin Nonpara-/parametric, single-/multipoint SimWalk Nonparametric, single-/multipoint Allegro Nonpara-/parametric, single-/multipoint & simulation, single-/multi-point PedCheck Mendelian error check FastSLink Simulation, single-/multi-point
EasyLinkage settings Parametric <-> non-parametric Single point <-> multipoint Frequency of the disease allele Penetrance vectors (wt/wt, wt/mt, mt/mt) Standard dominant: 0 1 1 Standard recessive: 0 0 1 Reduced penetrance: replace 1 by penetrance (e.g. 0.9) Phenocopy: replace 0 by percentage of phenocopy (e.g. 0.1) Example: 0.01 0.9 0.991% chance to show a similar phenotype despite a normal genotype90% chance to show the phenotype when 1 mutant allele (dominant with incomplete penetrance)99% likelihood to present with the phenotype if both alleles are mutant
Evaluate calculated LOD-scores Maximum LOD-scores can be seen in EasyLinkage Details about LOD-scores at different recombination fractions can be found in text files generated by EasyLinkage process in Excel (generate graphs, ...) Standard rules for LOD-scores >3 significant linkage 2<LOD<3 suggestive linkage -2<LOD<2 uninformative <-2 significant absence of linkage