
Genetic analysis of human disorders

Genetic analysis of human disorders. Tom Scerri. Basic ideas in genetics and linkage analysis. Why do we think genes cause disease? Family aggregation: family pedigrees with many affected individuals. Caveat: shared environmental influences may cause familial aggregation.


Presentation Transcript


  1. Genetic analysis of human disorders. Tom Scerri. Basic ideas in genetics and linkage analysis.

  2. Why do we think genes cause disease? • Family aggregation: • Family pedigrees with many affected individuals. • Caveat: • shared environmental influences may cause familial aggregation: • e.g. living close to an area of pollution • Large twin-based studies.

  3. Large family pedigree. Example 1: Norwegian family. Adapted from Fagerheim et al. (1999; Journal of Medical Genetics)

  4. Large family pedigree. Example 2: Finnish family. [Pedigree figure; key: Unaffected, Unknown, Dyslexic.] Adapted from Hannula-Jouppi et al. (2005; PLoS Genetics)

  5. Large family pedigree. Example 3: Dutch family. [Pedigree figure; key: Unaffected, Unknown, Dyslexic; two individuals marked “?”.] Adapted from Kovel et al. (2006; Journal of Medical Genetics)

  6. Twin Studies • Twins share 100% of their environment. • Identical twins also share 100% of their genes. • Non-identical twins share on average 50% of their genes. [Figure: two families, each with mother and father; one with identical twins and one with non-identical twins; children labelled child 1 to child 4.]

  7. Twin Studies • Identical twins: 68% concordance • Non-identical twins: 38% concordance

  8. MENDELIAN (single gene): complete penetrance. MULTIFACTORIAL (many genes + environment): incomplete penetrance, environmental factors, genetic heterogeneity.

  9. Phenotypes (or traits) • Categorical or dichotomous • yes/no or present/absent or affected/unaffected • diseases (e.g. cystic fibrosis) • disorders (e.g. dyslexia) • traits (e.g. taste perception of phenylthiocarbamide (PTC), flower colour) • often monogenic • Quantitative or continuous • range of values, often normally distributed • Height, weight • Intelligence, or reading ability • often polygenic • small effects from multiple genes • often modified by environmental influences • e.g. diet, education

  10. Penetrance versus Expressivity • Penetrance • The proportion of people carrying a disease-causing allele who are affected • Can be complete or partial • Expressivity • The extent to which an allele “displays or expresses” its effect • May depend on other genes and/or the environment [Figure: complete penetrance versus partial penetrance; polydactyly in cats, Danforth, Journal of Heredity (1947).]

  11. Phenocopies, Heterogeneity and Oligogenicity • Phenocopy • Affected individuals carrying the ‘normal’ variant of a gene • Heterogeneity • Variants in distinct genes that may result in the same phenotype (e.g. in different families). • Oligogenicity • Variants from several distinct genes acting together to create a phenotype [Figure: carriers of high risk variant versus carriers of low risk (‘normal’) variant; Genes A to D mapped to Phenotypes X, Y and Z.]

  12. Familiality versus Heritability • Familiality • the extent to which a ‘trait’ passes down through generations • genetic • environmental • Heritability (H^2) • the proportion of phenotypic variation that is attributable to genetic variation • Phenotype (P) = Genotype (G) + Environment (E) • Var(P) = Var(G) + Var(E) (simplistic model) • H^2 = Var(G) / Var(P)
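The variance decomposition on slide 12 can be illustrated with a small simulation. This is only a sketch under the slide's simplistic model (P = G + E, with G and E independent); the function name, sample size and standard deviations are assumptions chosen for illustration, not values from the lecture.

```python
import random
import statistics


def broad_sense_heritability(genetic_values, environmental_values):
    """H^2 = Var(G) / Var(P), with P = G + E for each individual."""
    phenotypes = [g + e for g, e in zip(genetic_values, environmental_values)]
    return statistics.pvariance(genetic_values) / statistics.pvariance(phenotypes)


# Hypothetical population: Var(G) is about 4 and Var(E) is about 1.
rng = random.Random(1)
n = 20000
g = [rng.gauss(0.0, 2.0) for _ in range(n)]
e = [rng.gauss(0.0, 1.0) for _ in range(n)]

h2 = broad_sense_heritability(g, e)
print(round(h2, 2))  # expected to be close to 4 / (4 + 1) = 0.8
```

Because G and E are drawn independently, Var(P) is approximately Var(G) + Var(E), which is why the estimate lands near 0.8.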

  13. The Human Genome • 23 pairs of chromosomes (23 from mother, 23 from father): • each made of DNA • each contains hundreds of genes [Figure: a human; a human cell.]

  14. Genetic Inheritance - simplistic model I (correct). [Figure: mother, father, child 1, child 2, child 3; key: female, male.]

  15. Genetic Inheritance - simplistic model II (wrong). [Figure: mother, father, child 1, child 2, child 3; key: female, male.]

  16. Genetic Inheritance - recombination (correct). [Figure: mother, father, child 1, child 2, child 3; key: female, male.]

  17. Linkage Analysis • Principle: Identify regions of the genome co-segregating with disease in affected individuals. [Figure: Family 1, Family 2, Family 3, ..., Family 300, each with mother, father, child 1 and child 2; a region is marked “Linked to disease”.]

  18. Types of Linkage Analysis • Parametric linkage analysis • Must define precise model of inheritance • e.g. dominant, recessive • gene allele frequency • penetrance of alleles • Suitable for Mendelian phenotypes • Non-parametric linkage analysis • Model free • Suitable for complex disorders • Requires larger samples for comparable power • Looks for chromosomal regions shared by affected individuals

  19. IBS versus IBD • Identity by state (IBS) • Two alleles that appear the same (e.g. a1) • Not necessarily from the same ancestor • Identity by descent (IBD) • Two alleles that appear the same (e.g. a1) • They must be IBS • Derived from the same ancestor • Requires parental information • Used for affected sibling pair (ASP) analysis [Figure: two pedigrees; parents a1a2 and a1a3 with children a1a3 and a1a2; parents a1a2 and a2a3 with children a1a3 and a1a2.]
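The distinction on slide 19 can be made concrete with the two pedigrees from the figure. The sketch below assumes the genotypes read as parents first, then children; which parent is the mother is not recoverable from the transcript and does not affect the result. The function names are hypothetical.

```python
from itertools import product


def ibs_count(genotype_a, genotype_b):
    """Number of alleles (0, 1 or 2) that two genotypes share by state."""
    remaining = list(genotype_b)
    shared = 0
    for allele in genotype_a:
        if allele in remaining:
            remaining.remove(allele)
            shared += 1
    return shared


def possible_origins(mother, father, child):
    """All (maternal slot, paternal slot) pairs consistent with the child's genotype."""
    return [
        (i, j)
        for i, j in product(range(2), repeat=2)
        if sorted((mother[i], father[j])) == sorted(child)
    ]


def possible_ibd_counts(mother, father, child_1, child_2):
    """Set of IBD counts compatible with the parental and child genotypes."""
    counts = set()
    for m1, f1 in possible_origins(mother, father, child_1):
        for m2, f2 in possible_origins(mother, father, child_2):
            counts.add(int(m1 == m2) + int(f1 == f2))
    return counts


sib_1, sib_2 = ("a1", "a3"), ("a1", "a2")
ibs = ibs_count(sib_1, sib_2)
ibd_first = possible_ibd_counts(("a1", "a2"), ("a1", "a3"), sib_1, sib_2)
ibd_second = possible_ibd_counts(("a1", "a2"), ("a2", "a3"), sib_1, sib_2)
print(ibs, ibd_first, ibd_second)
```

In both pedigrees the siblings share a1 by state. In the first, each sibling must have received a1 from a different parent, so the shared allele is not IBD; in the second, both copies come from the same parental a1, so it is.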

  20. Affected Sibling Pairs • Given a random chromosomal locus, siblings will be expected to share 0, 1 or 2 haplotypes IBD with frequencies 0.25, 0.5 or 0.25 respectively. • Given a chromosomal locus ‘linked’ to a disease, i.e. a disease allele (A) is on a haplotype carried by affected individuals, siblings will share 0, 1 or 2 haplotypes IBD with frequencies: • 0.0, 0.5 or 0.5 respectively, if dominant • 0.0, 0.0 or 1.0 respectively, if recessive [Figure: assume affection status of both parents not known. Parents AB and CD with a sibling AC: second sibling AC = 2 IBD (0.25), AD or BC = 1 IBD (0.50), BD = 0 IBD (0.25). Further dominant and recessive example pedigrees with genotypes AB, CD, AB, AD, AC, AC, AD, AA, AA.]
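The sharing frequencies on slide 20 can be checked by enumerating every equally likely pair of transmissions. This is a minimal sketch, assuming full penetrance, no phenocopies, and the parental genotypes suggested by the figure (AB and CD for the random and dominant cases, AB and AD for the recessive case); the function name and the `is_affected` parameter are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def sib_pair_ibd_distribution(mother, father, is_affected=lambda genotype: True):
    """Return [P(0 IBD), P(1 IBD), P(2 IBD)] for two siblings.

    Each child inherits one of the two haplotypes of each parent with equal
    probability. Only transmissions whose genotype satisfies `is_affected`
    are kept, which conditions on both siblings being affected.
    """
    transmissions = [
        (i, j)
        for i, j in product(range(2), repeat=2)
        if is_affected((mother[i], father[j]))
    ]
    counts = Counter()
    for (m1, f1), (m2, f2) in product(transmissions, repeat=2):
        # Haplotypes are IBD when both siblings received the same parental slot.
        counts[int(m1 == m2) + int(f1 == f2)] += 1
    total = sum(counts.values())
    return [Fraction(counts[k], total) for k in (0, 1, 2)]


random_locus = sib_pair_ibd_distribution("AB", "CD")
dominant = sib_pair_ibd_distribution("AB", "CD", lambda g: "A" in g)
recessive = sib_pair_ibd_distribution("AB", "AD", lambda g: g == ("A", "A"))
print(random_locus, dominant, recessive)
```

Tracking parental slots rather than allele labels matters in the recessive case, where both parents carry a haplotype labelled A but the two copies are distinct.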

  21. Affected Sib Pair Analysis • Non-parametric • i.e. it is “model free” (no need to define model) • Collect lots of nuclear families: • Two parents • Affected sibling pairs • Genotype lots of markers: • Preferably polymorphic, e.g. microsatellites • Look for deviations from the 0.25, 0.5, 0.25 frequencies of 0, 1 or 2 IBD. • Caveats with complex disorders: • Partial penetrance and varied expressivity. • Susceptibility loci are not always necessary or sufficient to cause disease. • A single chromosomal region may not be shared by all affected sib pairs. • Can lead to large candidate regions. • Programs such as Merlin, GENEHUNTER and MAPMAKER/SIBS can derive nonparametric lod scores
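One simple way to "look for deviations" from the 0.25, 0.5, 0.25 frequencies is a chi-square goodness-of-fit test on the observed sharing counts at a marker. The sketch below is not how Merlin, GENEHUNTER or MAPMAKER/SIBS compute their statistics; it assumes IBD status is known without ambiguity for every pair, and the counts are invented for illustration.

```python
import math


def asp_chi_square(n0, n1, n2):
    """Goodness-of-fit test of IBD sharing counts against 0.25 / 0.5 / 0.25.

    Returns the chi-square statistic (2 degrees of freedom) and its p-value.
    For 2 degrees of freedom the survival function is exactly exp(-x / 2).
    """
    total = n0 + n1 + n2
    observed = (n0, n1, n2)
    expected = (0.25 * total, 0.5 * total, 0.25 * total)
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.exp(-statistic / 2.0)
    return statistic, p_value


# Hypothetical counts of affected sib pairs sharing 0, 1 and 2 haplotypes IBD.
null_stat, null_p = asp_chi_square(25, 50, 25)
linked_stat, linked_p = asp_chi_square(10, 50, 40)
print(null_stat, null_p)
print(linked_stat, linked_p)
```

Excess sharing of 2 haplotypes with a deficit of 0 sharing, as in the second set of counts, is the pattern expected near a susceptibility locus.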

  22. Quantitative Trait Loci (QTL) Mapping • Non-parametric (or model-free) linkage methods • Hence, do not necessarily require: • allele/gene frequencies • rates of penetrance • mode of inheritance • assumption of monogenic inheritance • Can incorporate or better handle heterogeneity and oligogenicity • Therefore more suitable for complex genetic traits • Method 1: Haseman-Elston Linkage Analysis • Method 2: Variance Components (VC) Linkage Analysis

  23. Basic Haseman-Elston Linkage Analysis • Squared trait differences for sib-pairs are regressed on IBD allele-sharing • For a locus that does not influence trait levels, the sib-pair IBD will not be correlated with their squared trait-differences. [Plot; axis label: squared trait difference.]

  24. Basic Haseman-Elston Linkage Analysis • Squared trait differences for sib-pairs are regressed on IBD allele-sharing • For a locus that does influence trait levels, the sib-pair IBD will be negatively correlated with their squared trait-differences. [Plot; axis label: squared trait difference.]

  25. Basic Haseman-Elston Linkage Analysis • Squared trait differences for sib-pairs are regressed on IBD allele-sharing • For a locus that influences trait levels, the sib-pair IBD will be negatively correlated with their squared trait-differences. [Plots for Locus 1 to Locus 6; axis label: squared trait difference.]
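The regression described on slides 23 to 25 can be sketched with simulated sib pairs. The simulation model is an assumption made for illustration only: each inherited haplotype contributes a normally distributed effect to the trait, IBD status is known exactly, and all names and parameter values are hypothetical.

```python
import random


def simulate_pair(rng, qtl_sd, env_sd):
    """Return (proportion of haplotypes shared IBD, squared trait difference)."""
    shared_m = rng.random() < 0.5  # same maternal haplotype inherited?
    shared_f = rng.random() < 0.5  # same paternal haplotype inherited?
    m1, f1 = rng.gauss(0, qtl_sd), rng.gauss(0, qtl_sd)
    m2 = m1 if shared_m else rng.gauss(0, qtl_sd)
    f2 = f1 if shared_f else rng.gauss(0, qtl_sd)
    trait_1 = m1 + f1 + rng.gauss(0, env_sd)
    trait_2 = m2 + f2 + rng.gauss(0, env_sd)
    return (shared_m + shared_f) / 2, (trait_1 - trait_2) ** 2


def regression_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def haseman_elston_slope(qtl_sd, env_sd, n_pairs=5000, seed=0):
    rng = random.Random(seed)
    pairs = [simulate_pair(rng, qtl_sd, env_sd) for _ in range(n_pairs)]
    return regression_slope([p[0] for p in pairs], [p[1] for p in pairs])


slope_linked = haseman_elston_slope(qtl_sd=1.0, env_sd=1.0)
slope_unlinked = haseman_elston_slope(qtl_sd=0.0, env_sd=1.0)
print(slope_linked, slope_unlinked)
```

Under this model the slope is clearly negative when the locus affects the trait and close to zero when it does not, matching the contrast between slides 23 and 24.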

  26. VC Linkage Analysis • Dissects the genetic variation within the quantitative trait. • Advantage - large sibships or entire pedigrees can be simultaneously analysed. • Advantage - all phenotypic variability is considered. • Disadvantage - Computationally intensive. • Uses maximum-likelihood estimation • A statistical method for fitting a statistical model to data • Provides estimates for the parameters • Takes a fixed set of data (i.e. genotypes, phenotypes and pedigree structure) and derives the model parameters that produce the distribution most likely to have resulted in the observed data • Trait variability is partitioned into major-gene, polygenic and environmental factors. • Linkage analysis compares null hypothesis (no major gene effect) to the alternative hypothesis (where the major gene component can vary freely).
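The comparison of the null and alternative hypotheses on slide 26 is a likelihood ratio, which linkage programs commonly report as a LOD score (the base-10 logarithm of the ratio). A minimal sketch of the conversion follows; the log-likelihood values are invented for illustration and do not come from any real analysis.

```python
import math


def lod_score(loglik_null, loglik_alternative):
    """LOD = log10(L_alternative / L_null), from natural-log likelihoods."""
    return (loglik_alternative - loglik_null) / math.log(10)


def likelihood_ratio_statistic(loglik_null, loglik_alternative):
    """Twice the difference in natural-log likelihoods."""
    return 2.0 * (loglik_alternative - loglik_null)


# Hypothetical maximised log-likelihoods at one chromosomal position.
loglik_null = -1234.5         # no major gene effect
loglik_alternative = -1227.6  # major gene component estimated freely

lod = lod_score(loglik_null, loglik_alternative)
lrt = likelihood_ratio_statistic(loglik_null, loglik_alternative)
print(round(lod, 2), round(lrt, 2))
```

The two quantities carry the same information: the likelihood ratio statistic equals the LOD score multiplied by 2 ln(10), roughly 4.6.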

  27. Example comparing two methods: Chromosome 18 Linkage Results [Figure: Haseman-Elston analysis versus variance components analysis for UK Sample 1, UK Sample 2 and UK Sample 3; centromere marked on each panel; key: CCI, CCN, Olson, Read, Spell, Spoon.]

  28. Merlin • Performs NPL analysis: • Qualitative • Quantitative (an extension of the HE method) • Uses 3 specific input files: • Ped file (.ped) • Dat file (.dat) • Map file (.map)

  29. Merlin: ped file • Tab-delimited • Describes pedigree structure • Each row represents a different individual • 5 mandatory columns on left side: • Family ID (numeric, unique between families) • Individual ID (numeric, unique within family) • Father ID • Mother ID • Sex of individual (1 = male, 2 = female) [Figure: example ped file for Family 1 with Mother [1], Father [2], Child [3], Child [4]; the mother and father are founders. The header row is visible only for this lecture and must not be used in reality.]

  30. Merlin: ped file • Subsequent columns contain: • genotype information • Two consecutive integers per marker • One for each allele • Else, X = missing allele • phenotype information • Qualitative (affection status) • 1 = unaffected • 2 = affected • 0 = missing phenotype • Quantitative • Numeric values • X = missing phenotype [Figure: Family 1 with Mother [1], Father [2], Child [3], Child [4].]
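The column layout from slides 29 and 30 can be illustrated by parsing a small ped file for Family 1. The rows below are invented for illustration (one affection-status column followed by one marker); coding the parent IDs of founders as 0 follows Merlin's convention and is not stated on the slides. The function and field names are hypothetical.

```python
# Columns: family, individual, father, mother, sex, affection, allele 1, allele 2
PED_TEXT = (
    "1\t1\t0\t0\t2\t1\t1\t2\n"  # mother [1]: founder, female, unaffected
    "1\t2\t0\t0\t1\t1\t3\t4\n"  # father [2]: founder, male, unaffected
    "1\t3\t2\t1\t1\t2\t1\t3\n"  # child [3]: male, affected
    "1\t4\t2\t1\t2\t2\t1\t4\n"  # child [4]: female, affected
)


def parse_ped_line(line):
    """Split one tab-delimited ped row into its mandatory and data columns."""
    fields = line.rstrip("\n").split("\t")
    family, individual, father, mother, sex = fields[:5]
    return {
        "family": family,
        "individual": individual,
        "father": father,
        "mother": mother,
        "sex": {"1": "male", "2": "female"}[sex],
        "is_founder": father == "0" and mother == "0",
        "data": fields[5:],  # described by the .dat file
    }


people = [parse_ped_line(line) for line in PED_TEXT.splitlines()]
founders = [p["individual"] for p in people if p["is_founder"]]
children = [p["individual"] for p in people if not p["is_founder"]]
print(founders, children)
```

Note that the file itself carries no header row, so the meaning of everything after the fifth column depends entirely on the accompanying .dat file.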

  31. Merlin: ped file • Can be massive, e.g.: • Families • Many siblings • Half-relatives • Multigenerational • Markers and/or phenotypes: • Tens, hundreds, thousands or even millions! • Requires a .dat file to describe the user-definable columns

  32. Merlin: dat file • Tab delimited file • Describes columns from 6th onwards. • Each row: • describes a subsequent column (starting from the 6th) • contains two columns: • 1st = nature of .ped file column • A = affection status • T = quantitative trait • M = genetic marker (actually corresponds to 2 columns from ped file) • 2nd = alphanumeric name of the phenotype or genetic marker

  33. Merlin: map file • Tab delimited file • Describes the positions of genetic markers present in the .dat file • First row is a header row: • CHROMOSOME, MARKER, POSITION • Each subsequent row gives the: • chromosome number of the marker • name of the marker • genetic position of the marker (in centiMorgans)
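Since the three files must agree with one another, a quick consistency check before running Merlin can catch simple mistakes. The sketch below is not a replacement for pedstats; the trait name, marker name and position are hypothetical, and only the column rules from slides 29 to 33 are assumed.

```python
DAT_TEXT = "A\taffection\nM\tmarker_1\n"
MAP_TEXT = "CHROMOSOME\tMARKER\tPOSITION\n18\tmarker_1\t42.0\n"
PED_ROW = "1\t3\t2\t1\t1\t2\t1\t3"


def read_dat(text):
    """Return a list of (column type, name) pairs, one per .dat row."""
    return [tuple(line.split("\t")) for line in text.splitlines() if line]


def expected_ped_columns(dat_rows):
    """5 mandatory columns, then 2 columns per marker and 1 per A or T entry."""
    return 5 + sum(2 if kind == "M" else 1 for kind, _name in dat_rows)


def read_map(text):
    """Return {marker name: (chromosome, position in cM)}, skipping the header."""
    rows = [line.split("\t") for line in text.splitlines() if line]
    return {name: (chrom, float(pos)) for chrom, name, pos in rows[1:]}


dat_rows = read_dat(DAT_TEXT)
marker_map = read_map(MAP_TEXT)
dat_markers = {name for kind, name in dat_rows if kind == "M"}

n_expected = expected_ped_columns(dat_rows)
n_found = len(PED_ROW.split("\t"))
unmapped = dat_markers - set(marker_map)
print(n_expected, n_found, unmapped)
```

A mismatch between the expected and found column counts, or a non-empty set of unmapped markers, points to a .dat file that does not describe the .ped file correctly.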

  34. Exercise 1a: Using Merlin to perform NPL analysis • Data and 1st lecture available here: • www.well.ox.ac.uk/~clicker/Bologna/Lecture1/ • Merlin website available here: • http://www.sph.umich.edu/csg/abecasis/merlin/tour/linkage.html • Or simply Google “Merlin Linkage” • Follow the Merlin “Linkage” tutorial using the ASP example files. • Understand input files • Make sure to check for data integrity using pedstats: • Check for family connectivity • Perform NPL and VC linkage analyses • Tip: use the “--pdf” option to see graph of output

  35. Exercise 1b: Using Merlin to perform NPL analysis • Click on “Regression” on the left-hand menu • Perform “regression-based” linkage analysis with: • ASP example data files • chr18 data set: • 50+ microsatellites (majority named d18s###) • 3 quantitative phenotypes: • Read_T_2003 • Spell_T_2003 • Spoon_Resid_2003 • Contains 3 bugs that need fixing • Tip: use the “--pdf” option to see graph of output
