Linkage analysis

Jan Hellemans

6


Finding causal mutations

  • 2 opposing strategies

    • sequence then select

    • select then sequence

  • Sequencing

    • traditional Sanger sequencing only possible after selection

    • Massively parallel sequencing possible prior to or after selection

      • RNA sequencing

      • exome sequencing

      • genome sequencing


Finding causal mutations

  • Selection

    • positional (prior to sequencing)

      • linkage analysis

      • GWAS

      • structural variations (e.g. microdeletions)

    • functional (prior to & after sequencing)

      • candidate genes selected based on known function or involvement in related disorders

      • filtering of variants based on functional predictions

    • overlap (after sequencing)

      • looking for genes / variants that occur in multiple independent patients

    • mostly a combination is used


Exome sequencing


Linkage analysis

Aims

Interpret microsatellite results

Add genotypes to pedigrees

Create pedigree and genotype files

Calculate and interpret LOD-scores

Delineate linkage intervals

Basic principles of linkage analysis

Analyze other types of markers

Association studies

Learn how to work with specific pedigree programs


Starting linkage analysis


Preparations

  • Clearly define the phenotype

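    • If the phenotype is not specific enough, you may end up analyzing different disorders that map to different genomic loci

    • LOD scores are additive across families, so mixing in families with a different disorder lowers the combined score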

  • Find suitable families

    • larger is better

    • more patients is better

  • Collect genomic DNA from as many family members as possible

  • Determine the type of inheritance

  • Calculate the power to prove linkage with the available material (SLink – not part of this course)


Linkage analysis types

  • Directed linkage analysis

    • Evaluate linkage at a specific locus such as a candidate gene

    • Common approach: evaluate an intragenic, a 5’ and a 3’ marker (often microsatellites)

  • Genome wide linkage analysis

    • Screen for linkage for markers spread across the entire genome

    • Microsatellites: ~400 markers spaced at about 10cM

    • SNPs: 500k SNP array

  • Homozygosity mapping

    • Screen only affected individuals in inbred families

    • Select homozygous markers (typically SNP markers)

    • Very efficient technology

  • Fine mapping

    • Some linked markers are known, but the borders of the linkage interval still need to be defined
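The idea behind homozygosity mapping can be sketched in a few lines of Python: walk along the SNPs of an affected individual and report runs of consecutive homozygous markers. This is only an illustration. The function name `homozygous_runs`, the toy genotypes and the `min_markers` threshold are assumptions, and real analyses use far longer runs and tolerate genotyping errors and missing calls.

```python
def homozygous_runs(genotypes, min_markers=3):
    """Find runs of consecutive homozygous markers.

    genotypes: list of (position_mb, allele1, allele2), sorted by position.
    Returns a list of (start_mb, end_mb, n_markers), one entry per run.
    """
    runs, current = [], []
    for position, allele1, allele2 in genotypes:
        if allele1 == allele2:
            current.append(position)
            continue
        # a heterozygous marker ends the current run
        if len(current) >= min_markers:
            runs.append((current[0], current[-1], len(current)))
        current = []
    if len(current) >= min_markers:
        runs.append((current[0], current[-1], len(current)))
    return runs

# Toy data (hypothetical SNP calls, positions in Mb)
toy_snps = [
    (1.0, "A", "A"), (1.5, "G", "G"), (2.0, "C", "C"),
    (2.5, "A", "G"),
    (3.0, "T", "T"), (3.5, "T", "T"),
]
print(homozygous_runs(toy_snps))
```

Regions that are homozygous in all affected individuals of an inbred family are the candidates to follow up.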


Exercise – Part 1

  • 2 inbred families with a recessive disorder

  • Homozygosity mapping based on 500k SNP arrays identified 2 candidate regions

  • Chromosome 4

  • Patient 1 homozygous for

    • 6.052Mb - 14.488Mb

    • 21.008Mb – 37.477Mb

  • Patient 2 homozygous for

    • 11.186Mb – 37.219Mb

  • Task: find microsatellite markers to confirm linkage
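If the same locus is assumed to be involved in both families, the regions worth confirming are those that are homozygous in both patients. A minimal Python sketch of that intersection, using the coordinates listed above (the function name `shared_regions` is made up for this example):

```python
def shared_regions(regions_a, regions_b):
    """Return the (start, end) intervals in Mb covered by both region lists."""
    shared = []
    for start_a, end_a in regions_a:
        for start_b, end_b in regions_b:
            start, end = max(start_a, start_b), min(end_a, end_b)
            if start < end:  # keep only non-empty overlaps
                shared.append((start, end))
    return sorted(shared)

# Homozygous stretches on chromosome 4 (Mb), taken from the exercise
patient1 = [(6.052, 14.488), (21.008, 37.477)]
patient2 = [(11.186, 37.219)]

for start, end in shared_regions(patient1, patient2):
    print(f"chr4: {start:.3f}Mb - {end:.3f}Mb")
```

Microsatellite markers inside and just outside these shared stretches are the natural choice for confirming linkage.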


    Find additional flanking markers

    • Find physical position of marker in NCBI > UniSTS

    • NCBI map viewer: http://www.ncbi.nlm.nih.gov/mapview/

    • Go to Homo sapiens and to the right chromosome

    • Maps & options: show

      • DeCode, Généthon & Marshfield (genetic maps)

      • Genes

    • Set region: e.g. 2Mb up- and downstream of your marker

    • Click ‘Data as table view’

    • Click on STS behind a marker to see its details

    • Select markers that

      • locate to only 1 genomic location

      • have a PCR product with an extended size range (only one size → not polymorphic)


    http://www.ncbi.nlm.nih.gov/projects/mapview



    Exercise – Part 1 > possible solution

    • Markers in 1st candidate region

      • D4S3017 (21.078Mb)

      • D4S3044 (25.189Mb)

      • D4S1618 (33.857Mb)

      • D4S3350 (33.857Mb)

      • D4S2988 (36.889Mb)

    • Markers in 2nd candidate region

      • D4S1582 (10.311Mb)

      • D4S2906 (12.321Mb)

      • D4S2944 (13.141Mb)

      • D4S1602 (14.059Mb)

      • D4S2960 (15.437Mb)

    • Order primers & analyze them on all family members


    Analyzing microsatellite data


    Microsatellites > basics

    • Repeats of short sequences (e.g. 2bp): NNNNAC(AC)nACNNNN

    • Number of repeats is variable (unstable sequence)

    • Number of repeats determines the allele

    • Number of repeats corresponds to specific length of PCR product:

      • allele 1: NNNNACACACACACNNNN (5*AC → 18bp)

      • allele 2: NNNNACACACACACACNNNN (6*AC → 20bp)

      • allele 3: NNNNACACACACACACACNNNN (7*AC → 22bp)

      • ...

    • Determine length to know the allele (sequencer)
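The relation between repeat number and product length in the toy alleles above is simple arithmetic. A short Python sketch, where the function names and the 4bp flanks are just the toy values from the example and not properties of a real marker:

```python
def product_length(n_repeats, repeat_unit="AC", flank_5="NNNN", flank_3="NNNN"):
    """Length (bp) of a PCR product with n_repeats copies of repeat_unit."""
    return len(flank_5) + n_repeats * len(repeat_unit) + len(flank_3)

def repeats_from_length(length, repeat_unit="AC", flank_5="NNNN", flank_3="NNNN"):
    """Inverse calculation: number of repeats implied by a product length."""
    repeat_bp = length - len(flank_5) - len(flank_3)
    if repeat_bp < 0 or repeat_bp % len(repeat_unit) != 0:
        raise ValueError(f"{length}bp is not a whole number of repeats")
    return repeat_bp // len(repeat_unit)

for n in (5, 6, 7):
    print(n, "repeats ->", product_length(n), "bp")
```

In practice the flanking length depends on where the primers sit, so only the size differences between alleles (multiples of the repeat length) carry over to real data.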


    Microsatellites > determine size

    • Use internal size standard (other color)

    [Figure: microsatellite plot with peaks labeled 220bp, 230bp and 225bp]


    Microsatellites > heterozygotes

    [Figure: microsatellite plot of a heterozygote with peaks labeled 220bp, 230bp, 223bp and 225bp]


    Microsatellites > stutter peaks

    • Repeats are difficult to copy → polymerase slips

    • Some amplicons have 1 repeat less; a few even lose multiple repeats

    • Small repeats are more prone to slippage and show more pronounced stutter peaks

    • Largest product is the correct one

    • Distance between peaks = length of a repeat


    Microsatellites > stutter peaks

    allelic peak

    1st stutter peak

    2nd stutter peak


    Microsatellites > stutter peaks

    • Allelic peaks are the highest

    • Stutter peaks are lower

    A1

    A2


    Microsatellites > stutter peaks

    A1

    A2


    Microsatellites > +A peaks

    • Taq polymerase tends to add an extra A at the 3’ end

    • Variable degree of products with or without this extra A

    • Do not confuse with stutter peaks (only 1bp difference)

    allelic peak

    allelic peak + A

    1st stutter peak

    1st stutter peak + A

    2nd stutter peak

    2nd stutter peak + A
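To see how stutter and +A peaks combine, the expected peak positions for a single allele can be tabulated. The Python sketch below assumes `allele_size` is the full-length product without the extra A and uses a dinucleotide repeat by default. The function name and the example size of 225bp are illustrative only.

```python
def expected_peaks(allele_size, repeat_len=2, n_stutter=2):
    """Map peak labels to expected sizes (bp) for one allele."""
    peaks = {}
    for i in range(n_stutter + 1):
        label = "allelic peak" if i == 0 else f"stutter peak {i}"
        size = allele_size - i * repeat_len  # each stutter lacks one more repeat
        peaks[label] = size
        peaks[label + " + A"] = size + 1     # extra A adds 1bp
    return peaks

for label, size in expected_peaks(225).items():
    print(f"{size}bp  {label}")
```

Stutter peaks are spaced by the repeat length, whereas a +A peak sits 1bp above its partner, which is how the two can be told apart.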


    Microsatellites > complex plots (stutter & +A)

    A1

    A2


    Microsatellites > multiplex

    • Combine multiple markers in a single analysis ($$$)

      • Different size range

      • Multicolor

      • Commercial kits: e.g. 16 markers / lane


    Microsatellite plots examples


    Genotyping pedigrees


    • Screen one or multiple markers for some or all family members

    • For every marker:

      • Make a list of all occurring allele sizes

      • Due to technical variation in sizing, the same allele can have a slightly different size in different measurements (-0.4bp to +0.4bp). Give all sizes within this range the same allele number

      • Add the allele numbers to the pedigree at the corresponding individual/marker combination

      • Find the right phase

    • Advanced software like GeneMapper can generate tables with allele numbers for every sample / marker

    • Advanced pedigree programs like Progeny can store genotype information for family members

    • Verify inheritance
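The binning step (same allele number for sizes that differ only by sizing noise) can be sketched as follows. This is a simplified illustration, not what GeneMapper does: the function name, the rule "start a new allele when a size lies more than twice the tolerance above the first size of the current bin" and the measured sizes are all assumptions.

```python
def assign_allele_numbers(sizes, tolerance=0.4):
    """Group measured sizes (bp) into alleles; return {size: allele_number}.

    Two measurements of one allele can each deviate by +/- tolerance,
    so they may differ from each other by up to 2 * tolerance.
    """
    numbers = {}
    allele = 0
    bin_start = None
    for size in sorted(set(sizes)):
        if bin_start is None or size - bin_start > 2 * tolerance:
            allele += 1       # gap too large: start a new allele
            bin_start = size
        numbers[size] = allele
    return numbers

# Hypothetical sizes measured for one marker across a family
measured = [222.9, 223.1, 225.0, 225.2, 229.1]
print(assign_allele_numbers(measured))
```

For dinucleotide repeats true alleles are 2bp apart, so a window of 0.8bp keeps neighbouring alleles separate.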


    Exercise – Part 2

    • Genotype 3 markers in all available individuals of 2 families

    • Pedigrees & microsatellite plots in ExercisePart2-GenotypingData.pdf

    • Add allele numbers for the 3 markers to the pedigree

    • Interpret the genotyped pedigrees: linked?


    Family 1


    Family 2


    Exercise – Part 2 > Conclusions

    • D4S1582

      • Mendelian error → cannot be interpreted

    • D4S2944

      • Linked

    • D4S3017

      • Not-linked: unaffected individuals with the same genotype as a patient
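A Mendelian error such as the one found for D4S1582 means that a child's genotype cannot be explained by one allele from each parent. The check for a single child and marker can be sketched in Python as below. The function name and the example genotypes are hypothetical, and dedicated tools handle whole pedigrees with untyped members.

```python
def is_mendelian_consistent(child, father, mother):
    """True if the child's alleles can come one from each parent.

    Genotypes are (allele, allele) tuples of allele numbers;
    0 means unknown and is treated as compatible with anything.
    """
    def can_transmit(parent, allele):
        return allele == 0 or 0 in parent or allele in parent

    first, second = child
    return ((can_transmit(father, first) and can_transmit(mother, second))
            or (can_transmit(father, second) and can_transmit(mother, first)))

print(is_mendelian_consistent((1, 2), (1, 3), (2, 4)))  # one allele from each parent
print(is_mendelian_consistent((1, 1), (1, 3), (2, 4)))  # mother cannot transmit allele 1
```

When such an inconsistency shows up, the usual suspects are a sizing or binning mistake, a sample swap or an incorrect pedigree.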


    Calculate LOD scores


    EasyLinkage

    EasyLinkage = UI for linkage analysis

    http://genetik.charite.de/hoffmann/easyLINKAGE/index.html#start

    Bioinformatics. 2005 Feb 1;21(3):405-7, PMID: 15347576

    Bioinformatics. 2005 Sep 1;21(17):3565-7, PMID: 16014370

    Interface for many linkage analysis programs

    Input

    Pedigree file (linkage format)

    Genotype file(s)

    Marker information (already provided for popular markers)

    Settings


    Pedigree file

    Naming requirements for EasyLinkage: p_xxx.pro, e.g. p_SMMD.pro

    Format:

    Tab delimited text file

    1 individual per row

    Columns:

    1: family ID
    2: person ID
    3: father ID
    4: mother ID
    5: sex (1=male, 2=female, 0=unknown)
    6: affection status (1=unaffected, 2=affected, 0=unknown)
    7: DNA availability (optional, relevant for power calculations)
    8: liability class (to be provided if multiple liability classes are used)
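As an illustration of this layout, the Python sketch below builds the first six columns for a made-up nuclear family and formats them as tab-delimited text. The family, the IDs and the file name `p_DEMO.pro` are hypothetical. Coding founders with father and mother ID 0 is an assumption based on the usual linkage-format convention, so check the EasyLinkage manual before relying on it.

```python
# Hypothetical nuclear family, for illustration only.
# Columns: family ID, person ID, father ID, mother ID, sex, affection status
pedigree_rows = [
    (1, 1, 0, 0, 1, 1),  # father, unaffected (parent ID 0: assumed founder coding)
    (1, 2, 0, 0, 2, 1),  # mother, unaffected
    (1, 3, 1, 2, 1, 2),  # son, affected
    (1, 4, 1, 2, 2, 1),  # daughter, unaffected
]

def to_tab_delimited(rows):
    """Format rows as tab-delimited text, one individual per line."""
    return "".join("\t".join(str(field) for field in row) + "\n" for row in rows)

pedigree_text = to_tab_delimited(pedigree_rows)
print(pedigree_text, end="")

# To save it under the naming scheme described above (hypothetical file name):
# with open("p_DEMO.pro", "w") as handle:
#     handle.write(pedigree_text)
```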


    Genotype files

    Person IDs have to match exactly with those provided in the pedigree file

    Naming requirements for EasyLinkage: MarkerName_xxx.abi, e.g. D1S1609_SMMD.abi

    Format:

    Tab delimited text file

    1 individual per row

    Columns (for microsatellite based analysis):

    1: marker (same as in file name and matching a marker in an available marker set)
    2: custom information (content doesn’t matter, but column must be present)
    3: individual ID (match person ID in pedigree file)
    4 & 5: genotypes for the 2 alleles (unknown=0)
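A genotype file for one marker can be produced in the same way. In the Python sketch below the marker name is taken from the exercise, while the allele numbers, the person IDs, the placeholder used for the custom column and the file name `D4S2944_DEMO.abi` are invented for illustration.

```python
def genotype_rows(marker, genotypes, custom="-"):
    """Build rows: marker, custom information, individual ID, allele 1, allele 2."""
    return [(marker, custom, person_id, allele1, allele2)
            for person_id, (allele1, allele2) in genotypes.items()]

def to_tab_delimited(rows):
    """Format rows as tab-delimited text, one individual per line."""
    return "".join("\t".join(str(field) for field in row) + "\n" for row in rows)

# Hypothetical allele numbers per person ID; (0, 0) = not genotyped
genotypes = {1: (1, 2), 2: (2, 3), 3: (1, 3), 4: (0, 0)}

genotype_text = to_tab_delimited(genotype_rows("D4S2944", genotypes))
print(genotype_text, end="")

# To save it under the naming scheme described above (hypothetical file name):
# with open("D4S2944_DEMO.abi", "w") as handle:
#     handle.write(genotype_text)
```

The person IDs used here must be the same as in the pedigree file, otherwise the genotypes cannot be matched to individuals.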


    Marker information

    Contains information on the chromosome and position of every marker

    Already available for a number of commercial SNP-arrays and for the microsatellite markers from

    Genethon

    Marshfield

    DeCode

    Custom marker sets can be created (see manual)


    EasyLinkage settings

    Choose a program:

    FastLink: Parametric, single-point
    SuperLink: Parametric, single-/multipoint
    SPLink: Nonparametric, single-point
    GeneHunter: Nonpara-/parametric, single-/multipoint
    GeneHunter Plus: Nonpara-/parametric, single-/multipoint
    GeneHunter MOD: Nonpara-/parametric, single-/multipoint
    GeneHunter Imprinting: Nonpara-/parametric, single-/multipoint
    GeneHunter TwoLocus: Parametric, two-locus, single-/multipoint
    Merlin: Nonpara-/parametric, single-/multipoint
    SimWalk: Nonparametric, single-/multipoint
    Allegro: Nonpara-/parametric, single-/multipoint & simulation, single-/multipoint
    PedCheck: Mendelian error check
    FastSLink: Simulation, single-/multipoint


    EasyLinkage settings

    Parametric <-> non-parametric

    Single point <-> multipoint

    Frequency of the disease allele

    Penetrance vectors (wt/wt, wt/mt, mt/mt)

    Standard dominant: 0 1 1

    Standard recessive: 0 0 1

    Reduced penetrance: replace 1 by penetrance (e.g. 0.9)

    Phenocopy: replace 0 by the phenocopy rate (e.g. 0.1)

    Example: 0.01 0.9 0.99

    1% chance to show a similar phenotype despite a normal genotype

    90% chance to show the phenotype when 1 allele is mutant (dominant with incomplete penetrance)

    99% chance to show the phenotype if both alleles are mutant
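The substitution rules for penetrance vectors can be written out as a small helper. This Python sketch only encodes the two rules above (replace 1 by the penetrance, replace 0 by the phenocopy rate). The function name is made up, and it uses a single penetrance value, so it cannot produce vectors such as 0.01 0.9 0.99 in which heterozygotes and homozygotes differ.

```python
def penetrance_vector(mode, penetrance=1.0, phenocopy=0.0):
    """Return penetrances for (wt/wt, wt/mt, mt/mt) under a simple model."""
    if mode == "dominant":
        return (phenocopy, penetrance, penetrance)
    if mode == "recessive":
        return (phenocopy, phenocopy, penetrance)
    raise ValueError("mode must be 'dominant' or 'recessive'")

print(penetrance_vector("dominant"))                   # standard dominant
print(penetrance_vector("recessive"))                  # standard recessive
print(penetrance_vector("dominant", penetrance=0.9))   # reduced penetrance
print(penetrance_vector("recessive", phenocopy=0.01))  # with phenocopies
```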


    Evaluate calculated LOD-scores

    Maximum LOD-scores can be seen in EasyLinkage

    Details about LOD-scores at different recombination fractions can be found in text files generated by EasyLinkage → process in Excel (generate graphs, ...)

    Standard rules for LOD-scores

    LOD > 3 → significant linkage
    2 < LOD < 3 → suggestive linkage
    -2 < LOD < 2 → uninformative
    LOD < -2 → significant absence of linkage


    Interpreting LOD plots
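To get a feeling for the shape of a LOD curve, the simplest textbook case can be computed by hand: for phase-known, fully informative meioses with k recombinants out of n, LOD(θ) = log10[θ^k (1-θ)^(n-k) / 0.5^n]. The Python sketch below implements only this special case and applies the standard thresholds. Real pedigrees need the likelihood programs listed earlier. The counts are hypothetical, and how values exactly on a threshold are classified is an assumption.

```python
import math

def lod_score(recombinants, meioses, theta):
    """Two-point LOD for phase-known, fully informative meioses."""
    if not 0 <= theta <= 0.5:
        raise ValueError("theta must be between 0 and 0.5")
    if theta == 0:
        # any recombinant excludes theta = 0 completely
        return float("-inf") if recombinants else meioses * math.log10(2)
    return (recombinants * math.log10(theta)
            + (meioses - recombinants) * math.log10(1 - theta)
            + meioses * math.log10(2))

def classify_lod(lod):
    """Apply the standard rules (boundary handling is an assumption)."""
    if lod > 3:
        return "significant linkage"
    if lod > 2:
        return "suggestive linkage"
    if lod >= -2:
        return "uninformative"
    return "significant absence of linkage"

# Hypothetical data: (recombinants, informative meioses) for two families
families = [(0, 6), (0, 5)]
for theta in (0.0, 0.1, 0.2, 0.3, 0.4, 0.5):
    total = sum(lod_score(k, n, theta) for k, n in families)  # LOD scores are additive
    print(f"theta={theta:.1f}  LOD={total:.2f}  {classify_lod(total)}")
```

Without recombinants the curve peaks at θ = 0 and falls to 0 at θ = 0.5. A single recombinant pushes the score at θ = 0 to minus infinity and moves the maximum to a higher θ.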


    Exercise – Part 3

    Generate one pedigree file containing all family members of both families (use global IDs)

    Generate a genotype file for each of the tested markers

    Run SuperLink analysis with the right settings

    Evaluate results


    Exercise – Part 3 > Results


    Strengthen the evidence

    • Analyze more family members

    • Analyze more families

    • Analyze flanking markers

      • Look for more informative markers that result in higher LOD-scores

      • A series of flanking markers allows for multipoint linkage analysis

      • A series of linked markers gives more confidence (subjective)

      • Flanking markers can also be used to fine-map the linkage interval


    Determine the linkage interval

    [Figure: markers ordered along the chromosome and labeled NL, NL, NL, ?, ?, L, L, ..., L, L, L, ?, ?, NL, NL, with the candidate region indicated]
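One way to read such a marker row is sketched below in Python. It assumes that L means linked, NL not linked and ? uninformative, and that the interval is bounded on each side by the nearest NL marker flanking the block of linked markers. The labels' meaning, the function name and the marker data are assumptions made for this illustration.

```python
def linkage_interval(markers):
    """Return the NL markers that border the block of linked markers.

    markers: list of (name, position_mb, status) sorted by position,
    with status 'L', 'NL' or '?'. Assumes one contiguous linked block.
    A border of None means no NL marker was typed on that side.
    """
    linked = [i for i, marker in enumerate(markers) if marker[2] == "L"]
    if not linked:
        return None
    first, last = linked[0], linked[-1]
    left = next((m for m in reversed(markers[:first]) if m[2] == "NL"), None)
    right = next((m for m in markers[last + 1:] if m[2] == "NL"), None)
    return left, right

# Hypothetical markers and positions
toy_markers = [
    ("M1", 1.0, "NL"), ("M2", 2.0, "?"), ("M3", 3.0, "L"),
    ("M4", 4.0, "L"), ("M5", 5.0, "?"), ("M6", 6.0, "NL"),
]
left, right = linkage_interval(toy_markers)
print(f"candidate region lies between {left[0]} ({left[1]}Mb) and {right[0]} ({right[1]}Mb)")
```

Uninformative markers cannot exclude a region, so the borders are placed at the not-linked markers. Typing additional markers between the last L and the first NL narrows the interval.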


    Exercise 2: find the linkage interval


    Post linkage

    • Create a list of all the genes within the linkage interval

      • NCBI map viewer

      • UCSC (also for non-coding RNAs)

    • Evaluate known gene functions for relevance to the investigated phenotype

    • Sequence genes

      • Start with those that seem the most relevant to the disorder

      • Start with the coding regions

      • Screen the entire region with capture sequencing

    • Finding a mutation and proving its causality is the ultimate proof

