Population Genomics

Population Genomics- A View - PowerPoint PPT Presentation



Population Genomics: Friend or Foe? Tim Shank, 4/2/03, Woods Hole Oceanographic Institution. Topics: Genome Projects; Microbial Genomics; Genome-Genome Interactions; Comparative Genomics; Population Genomics; Functional Genomics.

Presentation Transcript

Population Genomics

Friend or Foe?

Tim Shank

4/2/03

Woods Hole Oceanographic Institution


  • Genome Projects

  • Microbial Genomics

  • Genome-Genome Interactions

  • Comparative Genomics


Population Genomics: A View

  • Population Genomics

  • Genomics Projects

  • Microbial Genomics

  • Functional Genomics

  • Pharmacogenomics


Population Genomics: definitions

  • The study of forces that determine patterns of DNA variation in populations (Michel Veuille, European Consortium)

  • Field of genomics that links complex genotypes and phenotypes by comparing the flow of genotypic and phenotypic information in breeding and natural populations (Andrew Benson, U. Neb)

  • Genomic variation within species permitting the construction of detailed linkage maps using polymorphic markers and, through crossing experiments between individuals with different phenotypes, the identification of genes responsible for phenotypic variation (e.g., disease susceptibility, drug toxicity) (Andrew Clark, PSU)


Questions in Marine Population Genetics

  • What role do larval retention and stepping stone habitats play in species maintenance?

  • Does the pattern of colonization and mode of dispersal affect the retention of genetic diversity in marine animals?

  • Characterization of genetic relationships of populations is important for understanding:

    • Genetic management of protected or threatened populations (e.g. Jones et al. 2002)

    • Historical migrations and connectivity of populations (e.g. Eizirik et al. 2001)

    • Kin selection and social behavior (e.g. Morin et al. 1994)

    • Mating systems (e.g. Engh et al. 2002)

    • Dispersal, temporal and spatial genetic structure (e.g. Goodisman & Crozier 2001)


    Dispersal Models

    • Continuous populations

      • Isolation-by-distance

    • Discrete populations

      • Stepping-stone

      • Island model


    FST approaches

    Wright (1951) [The genetical structure of populations. Ann. Eugen. 15:323-354.] noted that the following relationship holds when populations reach an equilibrium between genetic drift and migration:

    $$F_{ST} \approx \frac{1}{4Nm + 1}$$

    where N is the variance effective population size of the average population, and m is the average proportion of immigrants in each population.

    Problem: the useful parameter space is for FST values between 0.1 and 0.4

    Nm is a virtual number
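
As a rough illustration of why the useful range of FST is narrow, the sketch below inverts Wright's island-model relationship, commonly written FST ≈ 1/(4Nm + 1), to get Nm from an observed FST. The function names are illustrative only and are not from the presentation; the calculation assumes drift-migration equilibrium.

```python
def fst_from_nm(nm: float) -> float:
    """Expected F_ST at drift-migration equilibrium (Wright's island model)."""
    return 1.0 / (4.0 * nm + 1.0)


def nm_from_fst(fst: float) -> float:
    """Invert Wright's relationship to get Nm from an observed F_ST."""
    if not 0.0 < fst <= 1.0:
        raise ValueError("F_ST must lie in the interval (0, 1]")
    return (1.0 / fst - 1.0) / 4.0


# Small F_ST values map to large, poorly resolved Nm values,
# while F_ST above roughly 0.4 implies well under one migrant per generation.
for fst in (0.01, 0.1, 0.2, 0.4):
    print(f"F_ST = {fst:.2f} -> Nm = {nm_from_fst(fst):.2f}")
```

Because the curve is steep near FST = 0, small sampling errors in low FST estimates translate into large changes in the inferred Nm.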


    The giant tubeworm, Riftia pachyptila

    [Figure: map of sampled vent sites (labels: Guaymas, 21°, 13°, 11°, 9°, 2°, East Pacific Rise, Galapagos Rift) and a plot of Fst-based migration rate against distance (km; axis ticks at 100, 1000 and 10,000).]

    Reject expectations of "island model"

    Consistent with stepping-stone model

    Inference: a species with more limited dispersal abilities

    Black et al. 1994. Gene flow among vestimentiferan tube worm (Riftia pachyptila) populations from hydrothermal vents of the Eastern Pacific. Marine Biology 120: 33-39.


    Molecular Toolkit: markers for inferring population structure and gene flow

    • Allozymes

      • multiple, independent, codominant loci; relatively easy; low cost

      • need to freeze samples; state characters

    • RFLPs

      • variation in restriction fragment lengths

      • polymorphic due to restriction site mutation

    • mtDNA

      • relatively easy; maternally inherited; effectively haploid; non-recombining; modest cost; amenable to genealogical analysis

      • linked loci and pseudoreplication

    • nuclear DNA sequences

      • amenable to genealogical analysis

      • diploid; recombination; start-up time may be considerable

    • AFLPs

      • can get 100s of loci relatively easily

      • dominance; recombination; state characters; mutation models not available

    • minisatellites

      • repeats of 10-40 bp units

      • polymorphic due to unequal crossing over


    Molecular Toolkit: markers for inferring population structure and gene flow

    • DNA microsatellites

      • Repeat unit 2-3 bp; nuclear; can get dozens of loci relatively easily; method of choice for parentage

      • recombination; state characters; start-up time is great; issues of homoplasy in geographical studies; mutation must be taken into account in gene flow models

    • Single-Nucleotide Polymorphisms (SNPs)

      • Most simple form and most common source of genetic polymorphism in most genomes.

      • large amount of sequencing effort in nonmodel organisms

      • Violation of the analytical assumption of independence among marker loci

    • Sequence Tagged Sites (STSs) (physical marker)

      • A short DNA segment that occurs only once in the genome and whose exact location and order of bases are known. (They can be used as primers for PCR reaction).

      • Very labor intensive; very few loci

    • Expressed Sequence Tags (ESTs) (physical marker)

      • Short (100-300 bp) part of a cDNA which can be used to fish the rest of the gene out of the chromosome by matching base pairs with part of the gene.

      • large amount of sequencing effort


    Molecular Markers: Random Amplified Polymorphic DNA (RAPD), AP-PCR

    • PCR-based method

    Target Sequence

    = arbitrary primer (e.g. ggcattactc)

    • High Variability: Probably due to mutations in priming sequences

    Amplify regions between priming sites by polymerase chain reaction

    Analyze PCR products by agarose gel electrophoresis.

    Marker is dominant (presence/absence of band).

    No prior sequence knowledge required

    Many variations on the theme (e.g., RAMP, ISSR)
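
Because a dominant marker only reveals band presence or absence, heterozygotes cannot be scored directly. A common workaround, sketched below, assumes Hardy-Weinberg proportions so that the frequency of band-absent individuals equals q². The function name and the counts are hypothetical, chosen only for illustration.

```python
import math


def null_allele_frequency(n_absent: int, n_total: int) -> float:
    """Estimate the band-absent (recessive) allele frequency q.

    Assumes Hardy-Weinberg proportions, so freq(band absent) = q**2.
    """
    if n_total <= 0 or not 0 <= n_absent <= n_total:
        raise ValueError("need 0 <= n_absent <= n_total and n_total > 0")
    return math.sqrt(n_absent / n_total)


# Hypothetical gel scores: 16 of 100 individuals lack the band.
q = null_allele_frequency(n_absent=16, n_total=100)
p = 1.0 - q
print(f"q (band absent) = {q:.2f}, p (band present) = {p:.2f}")
```

The estimate is only as good as the Hardy-Weinberg assumption, which is one reason dominant markers are listed with caveats in the toolkit slides.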


    Amplified Fragment Length Polymorphism (AFLP)

    • Polymorphism based on gain or loss of restriction site, or selective bases

    • Technically demanding and expensive

    • Many markers generated, mostly dominant

    • More reliable than RAPD, less so than SSR

    • No prior sequence knowledge required


    Single-Strand Conformational Polymorphism (SSCP)

    1. Amplify Target Sequence

    2. Denature product with heat and formamide

    3. Analyze on native (nondenaturing) polyacrylamide gel

    4. Base sequence determines 3-dimensional conformation

    • Highly sensitive to DNA sequence: can detect single base changes

    • Simple process but can be difficult to repeat


    Denaturing Gradient Gel Electrophoresis (DGGE)

    1. Amplify Target Sequence

    2. Run product on gel with denaturing gradient (parallel or perpendicular to direction gel runs)

    3. Product begins denaturing at a certain point, depending on base sequence: greatly retards migration and allows discrimination of alleles based on small sequence differences

    4. Denaturing gradient gels can be difficult to produce: use perpendicular gradient to identify optimal conditions, move to CDGE: constant denaturant gel electrophoresis


    Cleaved Amplified Polymorphic Sequence (CAPS)

    1. Amplify Target Sequence

    2. Cut with a restriction enzyme that differentiates alleles

    3. Alleles can be differentiated by size based on loss or gain of restriction site; may be able to analyze on agarose gel

    [Figure: restriction site (X) shown on Allele 1 and Allele 2.]

    • Fairly simple analysis (cutting can be a hassle)

    • Requires sequence information from several alleles (or luck)
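
The CAPS steps above can be mimicked in silico. The following is a minimal sketch, assuming two made-up amplicon sequences that differ by one base inside an EcoRI site (GAATTC, cut after the first G); the `digest` helper is illustrative and not part of any library.

```python
def digest(seq: str, site: str, cut_offset: int) -> list:
    """Return fragment lengths after cutting a linear amplicon at every `site`.

    `cut_offset` is the position inside the recognition site where the
    enzyme cuts (1 for EcoRI, which cuts G^AATTC).
    """
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


# Hypothetical alleles: allele 2 carries a point mutation that destroys the site.
allele_1 = "ATGCCGTA" + "GAATTC" + "GGCTTAACGT"
allele_2 = "ATGCCGTA" + "GAGTTC" + "GGCTTAACGT"

print(digest(allele_1, "GAATTC", 1))  # two fragments: the site is present
print(digest(allele_2, "GAATTC", 1))  # one fragment: the site is lost
```

On a gel, the cut allele would appear as two shorter bands and the uncut allele as a single full-length band, which is what makes the marker codominant.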


    Allele Discrimination via Quantitative PCR (TaqMan)


    Microsatellites (Simple Sequence Repeats)

    “…reiterated short sequences [of DNA] tandemly arrayed, with variations in copy number accounting for a profusion of distinguishable alleles” - (Avise 1994)

    Locations:

    - Nuclear DNA

    - Chloroplast


    Microsatellite Types

    • Dinucleotide

      • Animals - CA

      • Plants - TA, GA

    • Trinucleotide

      • GTG, CAG, and AAT

      • Related to disease and cancers

    • Tetranucleotide

      • GATA/GACA

      • Highly polymorphic
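
Microsatellite alleles differ in how many times the motif is repeated in tandem. As a small sketch (the sequences and the helper name are invented for illustration), the repeat count of the longest uninterrupted run can be read off with a regular expression:

```python
import re


def longest_repeat_run(seq: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of `motif` in `seq`."""
    pattern = re.compile(f"(?:{re.escape(motif.upper())})+")
    runs = [len(m.group(0)) // len(motif) for m in pattern.finditer(seq.upper())]
    return max(runs, default=0)


# Two hypothetical alleles at a (CA)n locus with identical flanking sequence.
allele_short = "GGTA" + "CA" * 12 + "TTGC"
allele_long = "GGTA" + "CA" * 15 + "TTGC"

print(longest_repeat_run(allele_short, "CA"))
print(longest_repeat_run(allele_long, "CA"))
```

In practice alleles are scored by PCR fragment length rather than by sequence, so this only illustrates what the length difference represents.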


    Microsatellite Uses

    • Population Genetics

      • Gene flow

      • Stock Structure

    • Genetic Probes

      • Larvae

      • Gut contents

      • Scat

      • Source populations

    • Pedigree Maps

    • Understanding Diseases


    Microsatellite Advantages

    • Highly Polymorphic

    • Codominant

    • In every organism examined to date

    • Very abundant

    • Random spacing in the genome

    • Can find same loci in closely related species

    • Easy and reliable scoring

    • Highly sensitive

    • Neutral markers


    Microsatellite Disadvantages

    • Expensive

    • Time consuming

    • Several loci are needed to obtain sufficient statistical power

    • Current analysis methods do not distinguish between changes in flanking regions and changes within the microsatellite region

    • Different rates of evolution at different loci


    Mutation Mechanisms

    • Slippage in DNA at Replication (Slip-Strand Mispairing, SSM)

      • increases or decreases the repeat by one unit

      • most supporting evidence

    • Recombination

      • Unequal crossing over (UCO)

      • Gene conversion


    Microsatellite Mutations

    • 10^-3 to 10^-6 events per locus per generation (point mutation: 10^-9 to 10^-10)

    • Varies by

      • repeat type

      • base composition of the repeat

      • taxonomic group

      • length of the allele

    • most common - addition or deletion of a single repeat

      • occasionally 2 to several repeats

    • strong evidence that the number of repeats is limited


    Mutation Models

    • Infinite Allele Model (IAM)

      • gain or loss of any number of repeats and always results in an allelic state not present in the population

    • Stepwise Mutation Model (SMM)

      • gain or loss of a single repeat

    • Two-Phase Model (TPM)

      • gain or loss of X repeats

    • K-allele Model (KAM)

      • Intermediate step in the IAM (IAM = KAM with infinite K)

      • K possible allelic states
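
To make the difference between the stepwise and two-phase models concrete, here is a minimal simulation sketch. The probability of a single-step change (`p_single`), the maximum jump size (`max_step`) and the floor on repeat number are illustrative assumptions, not values from the presentation.

```python
import random


def mutate_smm(repeats: int, rng: random.Random, min_repeats: int = 1) -> int:
    """Stepwise Mutation Model: gain or lose exactly one repeat unit."""
    return max(min_repeats, repeats + rng.choice((-1, 1)))


def mutate_tpm(repeats: int, rng: random.Random, p_single: float = 0.9,
               max_step: int = 5, min_repeats: int = 1) -> int:
    """Two-Phase Model: mostly single steps, occasionally a larger jump."""
    if rng.random() < p_single:
        size = 1
    else:
        size = rng.randint(2, max_step)  # assumed range for multi-repeat changes
    return max(min_repeats, repeats + rng.choice((-1, 1)) * size)


rng = random.Random(42)  # fixed seed so the run is repeatable
allele = 20
smm_changes = [mutate_smm(allele, rng) - allele for _ in range(10)]
tpm_changes = [mutate_tpm(allele, rng) - allele for _ in range(10)]
print("SMM step sizes:", smm_changes)
print("TPM step sizes:", tpm_changes)
```

Under the SMM every change is plus or minus one repeat, so two alleles of the same length need not be identical by descent, which is the homoplasy issue noted for microsatellites earlier.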


    Creating A Microsatellite-Enriched Library

    [Flow diagram: DNA extraction; genomic DNA; digestion; add linkers; PCR; DNA library.]


    Enriching Microsat Library

    [Flow diagram: DNA library; hybridize to beads (CACA/GTGT); PCR; microsatellite-enriched DNA library.]


    Microsatellite Library Screening

    [Flow diagram: cloning; plasmid preps; isolated plasmids; enzyme digest; check insert size; blots/hybridizations (dot blot hybridizations).]


    References

    www.biotech.ufl.edu/WorkshopsCourses/mm_manual.htm

    Avise, J.C. 1994. Molecular Markers, Natural History and Evolution. Chapman and Hall, New York. 511 pp.

    Balloux, F. and N. Lugon-Moulin. 2002. The estimate of population differentiation with microsatellite markers. Molecular Ecology. 11: 155-165.

    Goldstein, D.B. and C. Schlötterer (Editors). 1999. Microsatellites: Evolution and Applications. Oxford University Press, Oxford, 352 pp.

    Jarne, P and P.J.L. Lagoda. 1996. Microsatellites, from molecules to populations and back. Trends in Ecology and Evolution 11(10): 424-429.

    Slatkin, M. 1995. A measure of population subdivision based on microsatellite allele frequencies. Genetics 139: 457-462.


    Fluorescent Labeling of Microsatellites

    • Acrylamide gel with 5 microsatellite loci and internal size standard

    • Simultaneous analysis of a dozen loci


    Comparing “Genomic” Methods for Population Studies

    * Depends on cost of restriction enzymes employed


    All population genetic/genomic markers are vulnerable to violations of assumptions: linkage equilibrium, Mendelian inheritance, neutrality.

    Linkage disequilibrium: alleles at different loci are found together more or less often than expected based on their frequencies (and location in the genome).

    Goldstein and Weale 2001. Population genomics: linkage disequilibrium holds the key. Current Biology 11:576-579
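
The definition of linkage disequilibrium can be written as a short calculation. This sketch uses the standard coefficient D = p(AB) - p(A)p(B) and the squared correlation r² for two biallelic loci; the frequencies are hypothetical.

```python
def linkage_disequilibrium(p_ab: float, p_a: float, p_b: float):
    """Return (D, r2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A (locus 1) and allele B (locus 2)
    """
    d = p_ab - p_a * p_b
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    r2 = d * d / denom if denom > 0 else float("nan")
    return d, r2


# Hypothetical population: A and B each at 50%, but found together 40% of the time.
d, r2 = linkage_disequilibrium(p_ab=0.40, p_a=0.5, p_b=0.5)
print(f"D = {d:.3f}, r^2 = {r2:.3f}")

# At linkage equilibrium the haplotype frequency is just the product, so D = 0.
d_eq, r2_eq = linkage_disequilibrium(p_ab=0.25, p_a=0.5, p_b=0.5)
print(f"D = {d_eq:.3f}, r^2 = {r2_eq:.3f}")
```

A nonzero D means the two markers carry partly redundant information, which is why treating linked SNPs as independent overstates statistical power.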


    Population Genomics Research

    • Understanding population structure, historical migrations, and gene flow among populations (e.g. SNP density distribution, coalescent approaches)

      • Need relatively moderate polymorphism, low cost per sample

      • mtDNA, Microsatellites, SNPs

    • Understanding current gene flow and mating systems by direct methods (e.g., maternity analysis, paternity analysis)

      • Need high polymorphism, codominance, repeatability, low cost per sample

      • Microsatellites, SNPs

    • Pharmacogenomics: polymorphism-based approaches for the discovery and development of new medications; translating polymorphisms into “new genomic medicine”*

      • Need rapid, low-cost, repeatable way to distinguish alleles

      • screening large numbers of individuals; SNPs and Sequencing

        *New York Times, Nov. 2002


    • Two main hypotheses for human evolution:

      • “Recent African origin” hypothesis: modern humans originated in Africa 100-200k years ago, and spread

      • “Multi-regional” hypothesis: modern humans evolved in different parts of the world

    • mtDNA favored the out-of-Africa hypothesis but lacked statistical support for deep African branches

    [Figure: neighbor-joining phylogram based on complete mtDNA genome sequences (excluding D-loop). Values from 1000 bootstrap replicates shown on nodes. Asterisk refers to the MRCA of the youngest clade containing both African and non-African individuals.]

    • 53 human mtDNA sequences (16,500 bp)

    • examined timing of evolutionary events

    • mtDNA evolving in a “clocklike” fashion

    • Linkage disequilibrium not evident

    • 3 deepest branches lead exclusively to sub-Saharan mtDNAs

    • Note star-like vs. deep-branching topology: larger Ne or longer genetic history in Africa; bottleneck in non-Africans


    Exodus from Africa began about 100,000 years ago

    Divergence of Africans and non-Africans occurred 52,000 ± 28,000 years ago

    mtDNA mismatch distributions for Africans and non-Africans

    • Individuals of African origin show a ragged distribution

    consistent with constant population size

    • Individuals of non-African origin show a bell-shaped distribution

    strongly suggests a recent population expansion

    Mismatch distributions of pairwise nucleotide

    differences between a) African and b) non-African
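
A mismatch distribution is simply the histogram of nucleotide differences over all pairs of aligned sequences. The sketch below computes it for a toy alignment; the four sequences are invented and far shorter than the mtDNA genomes discussed here.

```python
from collections import Counter
from itertools import combinations


def mismatch_distribution(sequences):
    """Count sequence pairs by their number of pairwise nucleotide differences."""
    counts = Counter()
    for s1, s2 in combinations(sequences, 2):
        if len(s1) != len(s2):
            raise ValueError("sequences must be aligned to equal length")
        diffs = sum(a != b for a, b in zip(s1, s2))
        counts[diffs] += 1
    return dict(sorted(counts.items()))


# Toy aligned sequences (hypothetical).
toy_alignment = ["AAAA", "AAAT", "AATT", "TTTT"]
print(mismatch_distribution(toy_alignment))
```

With real data, a smooth single-peaked histogram is read as a signature of population expansion and a ragged one as a signature of long-term stable size, as the slide summarizes.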


    Human genome mining to produce 507,152 high-confidence SNP candidates as a uniform resource for describing nucleotide diversity and regional variation within and between human populations


    So What’s a SNP?

    • A mutation that causes a single base change is known as a Single Nucleotide Polymorphism (SNP)

    • SNPs are the simplest form and most common source of genetic polymorphism in the human genome

      • 90% of all human DNA polymorphisms; 1 SNP per 1000 bp; 1.42 million SNPs mapped

    • SNP Haplotype is a particular pattern of sequential SNPs (or alleles) found on a single chromosome

      • Microarrays, mass spectrometry and sequencing are all used to accomplish grouping or blocking of SNPs= haplotyping

    • Haplotype Determination Problem- find all haplotypes given a genome and all identified SNPs (algorithm development)
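
The difficulty behind haplotype determination is that an unphased genotype is compatible with several haplotype pairs. The sketch below enumerates them by brute force, assuming biallelic SNPs coded 0 (homozygous reference), 1 (heterozygous) and 2 (homozygous alternate); the coding and the function name are illustrative choices, not a published algorithm.

```python
from itertools import product


def compatible_haplotype_pairs(genotype):
    """List the unordered haplotype pairs consistent with an unphased genotype."""
    if any(g not in (0, 1, 2) for g in genotype):
        raise ValueError("genotype codes must be 0, 1 or 2")
    het_sites = [i for i, g in enumerate(genotype) if g == 1]
    pairs = set()
    for phase in product((0, 1), repeat=len(het_sites)):
        # Homozygous sites are fixed: code 0 -> allele 0, code 2 -> allele 1.
        h1 = [g // 2 for g in genotype]
        h2 = list(h1)
        # Heterozygous sites get complementary alleles on the two haplotypes.
        for site, allele in zip(het_sites, phase):
            h1[site], h2[site] = allele, 1 - allele
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


# Two heterozygous sites give two possible phasings.
for pair in compatible_haplotype_pairs((1, 2, 1)):
    print(pair)
```

With k heterozygous sites there are 2^(k-1) candidate pairs, so exhaustive enumeration quickly becomes impractical, which is why the slide flags algorithm development.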


    Approaches to SNP Discovery and Genotyping

    Many and numerous! (Reviewed by Pui-Yan Kwok, Annu. Rev. Genomics Hum. Genet. 2001. 2:235-258)

    SNP discovery can be based on expressed sequence tags (ESTs), genomic restriction fragments, aligned BAC sequences, random shotgun clone sequences, or overlapping genomic clone sequences.

    • Parallel genotyping of SNPs using generic high-density oligonucleotide tag arrays

      • Fan et al. (2000) Genome Research 10:853-860. (see Stickney et al. 2002 for zebrafish SNP arraying)

      • PCR + single base extension with chimeric primers and allele-specific (labeled) dideoxy NTPs, then hybridized to arrays containing thousands of preselected 20-mer oligonucleotide tags

    • Polymorphism ratio sequencing: a new approach for SNP discovery and genotyping

      • Blazej et al. (2003) Genome Research 13:287-293.

      • Dideoxy-terminator extension ladders generated from a single sample and reference template are labeled with fluorescent dyes and coinjected into a separation capillary for comparison of relative signal intensities.

    • A novel method for SNP detection using a new duplex-specific nuclease from crab hepatopancreas

      • Shagin et al. (2002) Genome Research 12:1935-1942.

      • “Duplex Specific Nuclease Preference”: SNP region amplified, template, signal probe, and matched duplexes are then cleaved by DSN to generate sequence-specific fluorescence


    GenBank has a dbSNP

    One year ago: dbSNP had 2,842,021 SNP submissions total

    Today, 2003, dbSNP has 6,250,820 submissions for human

    1,368,805 submissions for mosquito

    197,414 submissions for mouse

    2,031 submissions for zebrafish

    It is possible to search dbSNP by BLAST comparisons to a target sequence


    The SNP Consortium is an alliance of pharmaceutical and computer companies managed by Lincoln Stein at Cold Spring Harbor Lab.

    “The SNP Consortium Ltd. is a non-profit foundation organized for the purpose of providing public genomic data. Its mission is to develop up to 300,000 SNPs distributed evenly throughout the human genome and to make the information related to these SNPs available to the public without intellectual property restrictions. The project started in April 1999 and is anticipated to continue until the end of 2001.”



    We describe a map of 1.42 million single nucleotide polymorphisms (SNPs) distributed throughout the human genome, providing an average density on available sequence of one SNP every 1.9 kilobases. These SNPs were primarily discovered by two projects: The SNP Consortium and the analysis of clone overlaps by the International Human Genome Sequencing Consortium. The map integrates all publicly available SNPs with described genes and other genomic features. We estimate that 60,000 SNPs fall within exon (coding and untranslated regions), and 85% of exons are within 5 kb of the nearest SNP. Nucleotide diversity varies greatly across the genome, in a manner broadly consistent with a standard population genetic model of human history. This high-density SNP map provides a public resource for defining haplotype variation across the genome, and should help to identify biomedically important genes for diagnosis and therapy.


    • Looked for mismatches; SNPs if PolyBayes probability was 0.80

    • Built a set of pairwise sequence alignments by analyzing the overlapping regions of large insert clones

    • SNP marker density grouped by overlapping regions

    • Modeled the marker density distribution


    Marker density distributions predicted under competing population genetic models

    No demographic history

    Poisson distribution driven by mutation rate

    Distribution of polymorphic sites profoundly impacted

    Increased pop size yields abundance of new lineages with more mutation

    Decreased pop size raises likelihood of relatedness resulting in

    over-representation of sequence identity

    Collapse followed by a phase of recent population recovery

    Evaluated degree of fit between observed density distribution and

    probability predicted using the log likelihood of the data for a given model

    r indicates the per nucleotide, per generation recombination rate
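
Under the simplest case listed above (no demographic history), SNP counts per region follow a Poisson distribution, and candidate models are compared by the log likelihood of the observed counts. The sketch below shows that calculation for a plain Poisson model only; the window counts are invented, and the demographic models in the study are more elaborate than this.

```python
import math


def poisson_log_likelihood(counts, rate: float) -> float:
    """Log likelihood of observed SNP counts per window under Poisson(rate)."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return sum(k * math.log(rate) - rate - math.lgamma(k + 1) for k in counts)


# Hypothetical SNP counts in eight equal-length windows.
snp_counts = [0, 1, 1, 2, 0, 3, 1, 0]
mle_rate = sum(snp_counts) / len(snp_counts)  # the sample mean maximizes the likelihood

for rate in (0.5, mle_rate, 2.0):
    print(f"rate = {rate:.2f}: log L = {poisson_log_likelihood(snp_counts, rate):.3f}")
```

An observed distribution with too many empty windows or too many dense ones relative to the Poisson expectation is the kind of misfit that points toward a history of changing population size.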


    Superior fit of the modeled parameters (with or without recombination) suggests a severe, 2- to 7-fold collapse of population size 40,000 years (1600 generations) ago, followed by a modest recovery.

    [Figure: % of successful trials for each model, at each data fraction; assessments based on the amount of data required for rejection by χ² test.]

    Interestingly, the fit between observations and best-fitting models decays with more data.


    History of the inbred laboratory mouse

    • Compared the C57BL/6J mouse genome sequence with 59 finished segments of the 129/Sv inbred strain

    • Discovered nearly 70,000 SNPs in blocks of high SNP density (40 SNPs per 10 kb) separated by blocks of low density (0.5 SNPs per 10 kb)

    • Surveyed panels of inbred mouse strains to find that distinct SNP haplotypes were shared among common inbred populations

    • Surveys of wild strains showed that 67% of each of the inbred genomes is derived from European mice and 33% from Asian mice


    How about other organisms? Or new ‘model’ organisms: organisms that exemplify phenomena not well studied in human/worm/mouse?

    Three-Spined Sticklebacks

    • morphological evolution

    • populations isolated after last glaciation, have diverged morphologically and in sequence (CAn microsatellites)

    • strategy: cross benthic and limnetic fish; intercross F1s, follow morphological traits and polymorphisms in F2s

    • see Peichel et al (2001) The genetic architecture of divergence between threespine stickleback species. Nature 414: 901-5.


    Stickleback genetic map (Woods et al. 2000)

    • 227 polymorphisms

    • 1 SNP marker per 4 cM

    • took ~4 person-years

    • now mapping genetic basis of morphological variations


    First zebrafish SNP map

    5 months ago

    2102 SNPs for mutation mapping

    Hundreds of SNPs on single array

    Stickney et al. 2002 Rapid mapping of zebrafish mutations

    with SNPs and oligonucleotide microarrays. Genome Res.

    12: 1929-1934.

    Vertical lines = 25 linkage groups

    Red dots correspond to SNPs represented on the oligonucleotide microarray

    Zebrafish

    Genes

    Postlethwait et al. 1994 A genetic linkage map for zebrafish. Science 264: 699-703.

    Woods et al. 2000 A comparative map of zebrafish genome. Genome Research 10: 1903-1914.

    Geisler et al. 1999 A radiation hybrid map of the zebrafish genome. Nature Genetics 23: 86-89.

    Microsatellites

    Shimoda et al. 1999 Zebrafish genetic map with 2000 microsatellite markers. Genomics 58: 219-232.


    Population Genomics Research

    • Understanding population structure, historical migrations, and gene flow among populations (e.g. SNP density distribution, coalescent approaches)

      • Need moderate polymorphism, low cost per sample

      • Allozymes, mtDNA, RAPDs, Microsatellites, AFLPs, RFLPs, SNPs

    • Understanding current gene flow and mating systems by direct methods (e.g., maternity analysis, paternity analysis)

      • Need high polymorphism, codominance, repeatability, low cost per sample

      • Microsatellites, SNPs

    • Pharmacogenomics: polymorphism-based approaches for the discovery and development of new medications; translating polymorphisms into “new genomic medicine”*

      • Need rapid, low-cost, repeatable way to distinguish alleles

      • screening large numbers of individuals; SNPs and Sequencing

        *New York Times, Nov. 2002


    Inferring Pairwise Relationships with SNPs (in Your Favorite Metazoan)

    (Glaubitz, Rhodes, and DeWoody 2003. Molecular Ecology 12: 1039-1047)

    Problem: Need to determine genetic relationships in populations without known pedigrees

    Goal: To assess known pairwise relationships via single nucleotide polymorphisms where parallel microsatellite results are already available.

    Recent advances in microarray technology permit genotyping of large numbers of individuals at 100s to 1000s of SNP loci (reviewed by Kwok 2001). This could be big!

    Need to know whether SNPs equal or exceed the power of practical numbers of microsatellite loci in estimating relationships.

    Microsatellites are the current method of choice among close kin within a population, but the number of independently segregating microsatellite markers is limited.

    SNPs may provide a large number of segregating loci with a large number of alleles at even frequencies


    Glaubitz et al. 2003

    Computer simulations designed to evaluate the ability of SNPs to discriminate a variety of (pairwise) relationships likely to occur in natural populations, with comparisons to microsatellites from Blouin et al. 1996

    • SNPs segregate independently; ideal genome with 20 autosomes, 5 SNPs per chromosome; 10,000 individuals with random genotypes

    • Constructed 5 categories of relationship types

    • Constructed an array of pedigrees

    • Estimated pairwise relatedness at a single locus (r1)

    • Evaluated the performance of 100 simulated SNPs by estimating the misclassification rate of relationships


    • Illustrates that different pairwise relationships can have different amounts of inherent variance in relatedness

    • The parent-offspring (PO) and unrelated (U) relationships have 0 inherent variance (share one or no alleles)

    • Full sibs (FS) have the largest variance; second-order relatives cannot be distinguished from each other via estimation of r

    100 independently segregating SNPs determined parent-offspring pairs as well as about 16 or fewer microsatellite loci when both parents are unknown.

    Even under the optimistic scenario of 100 independent loci, results show little promise for discriminating higher order relationships on the basis of pairwise relatedness.

    Microsatellite approaches are still better…
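
The point that parent-offspring pairs share one allele at every locus suggests a simple exclusion check, sketched below. This is not the relatedness estimator used in the study; it is a minimal illustration that ignores genotyping error and null alleles, and the genotypes are invented.

```python
def compatible_parent_offspring(ind_a, ind_b) -> bool:
    """True if two multilocus genotypes share at least one allele at every locus.

    Each individual is a sequence of (allele, allele) tuples, one per locus.
    Passing the check does not prove a parent-offspring relationship; failing
    it (in the absence of genotyping error) excludes one.
    """
    if len(ind_a) != len(ind_b):
        raise ValueError("individuals must be typed at the same loci")
    return all(set(g1) & set(g2) for g1, g2 in zip(ind_a, ind_b))


# Hypothetical SNP genotypes at three loci.
parent = [("A", "G"), ("C", "C"), ("T", "A")]
offspring = [("A", "A"), ("C", "T"), ("A", "A")]
stranger = [("G", "G"), ("T", "T"), ("T", "T")]

print(compatible_parent_offspring(parent, offspring))
print(compatible_parent_offspring(parent, stranger))
```

With biallelic SNPs many unrelated pairs pass at any single locus by chance, which is one way to see why far more SNP loci than microsatellite loci are needed for the same resolving power.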


    My two cents:

    Based on 1) the assumption of independence among the sampled SNP loci, and 2) that the microsatellites themselves are independent (not linked)

    Conclusion:

    “SNPs have limited potential for the delineation of genealogical relationships…”

    • In the absence of a linkage map, the number of microsatellite or SNP loci scored must be increased to compensate for the loss of information as a result of nonindependence between markers

    • An alternative to using independently segregating SNPs is to use independently segregating haplotypes, with each haplotype defined by a cluster of tightly linked SNPs (e.g., Heaton et al. 2002 sequenced regions around 32 cattle SNPs; additional 183 polymorphic sites; and more haplotypes for better resolution)


    To take full advantage of the “vast” abundance of SNPs in metazoan genomes and their potential automation, we will need analytical methods that account for tight genetic linkage (McPeek and Sun 2000) and known recombination frequencies.

    Until then, SNP population genomics will likely only be used on model organisms.


    Population Genomics Research

    • Understanding population structure, historic migrations, and gene flow among populations (e.g., Fst, coalescent approaches)

      • Need moderate polymorphism, low cost per sample

      • Allozymes, mtDNA, RAPDs, Microsatellites, AFLPs, RFLPs, SNPs

    • Understanding current gene flow and mating systems by direct methods (e.g., maternity analysis, paternity analysis)

      • Need high polymorphism, codominance, repeatability, low cost per sample

      • Microsatellites, allozymes

    • Pharmacogenomics: polymorphism-based approaches for the discovery and development of new medications; translating polymorphisms into “new genomic medicine”*

      • Need rapid, low-cost, repeatable way to distinguish alleles

      • screening large numbers of individuals; SNPs and Sequencing

        *New York Times, Nov. 2002
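
As a rough illustration of the Fst statistic mentioned above, the following sketch computes Fst = (Ht - Hs) / Ht for one biallelic locus. It assumes equal-sized subpopulations and uses made-up allele frequencies; the function names are hypothetical, and real analyses would typically use an estimator that corrects for sample size.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity 2p(1 - p) at a biallelic locus with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst(subpop_freqs):
    """Fst = (Ht - Hs) / Ht for one biallelic locus.

    subpop_freqs: allele frequency in each subpopulation (equal weights assumed).
    """
    n = len(subpop_freqs)
    p_bar = sum(subpop_freqs) / n
    h_t = expected_heterozygosity(p_bar)  # heterozygosity of the pooled population
    h_s = sum(expected_heterozygosity(p) for p in subpop_freqs) / n  # mean within subpopulations
    if h_t == 0:
        return 0.0  # locus is fixed everywhere, so there is no variation to partition
    return (h_t - h_s) / h_t


# Illustrative (made-up) allele frequencies in two subpopulations
print(fst([0.5, 0.5]))  # identical frequencies: no structure
print(fst([0.2, 0.8]))  # moderately diverged
print(fst([0.0, 1.0]))  # fixed for alternative alleles
```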


    Pharmacogenomics l.jpg
    Pharmacogenomics

    • The use of DNA sequence information to measure and predict the reaction of individuals to drugs.

    • Pharmacogenetics is the study of this variation at the level of a single gene, while pharmacogenomics studies variation at the genome wide level.

    • Individual response to drugs varies greatly, and this variation is genetically determined.

    • It is possible to measure many thousands of SNPs simultaneously in a small blood sample from a patient

    • Can compare “genotypes” for SNP markers linked to virtually any trait


    Slide62 l.jpg

    Evolving Paradigm for Discovery of Genetic Polymorphisms associated with aberrant drug disposition or effects

    More discoveries through polymorphisms in candidate genes (metabolism; transport; targets of candidate medication)

    Observed phenotype - family studies - inherited basis


    New drug targets expected from the human genome project l.jpg
    New Drug Targets Expected from the Human Genome Project

    [Bar chart: Number of Drug Targets (axis 0 to 12,000). Cumulative Number of Targets Known Today: approx. 500. New Targets Expected from Human Genome Project: 5,000–10,000.]

    Source: Drews J. Nat Biotechnol 1996;14.


    The gene for l.jpg
    The Gene for…


    Disease genes discovered l.jpg
    Disease Genes Discovered

    • For 1100 genes at least one disease-related mutation has been identified


    Clinical disorders and gene mutations l.jpg
    Clinical disorders and gene mutations

    • Different mutations in the same gene can give rise to more or less distinct disorders, so total number of diseases for which there are known mutations is ~1500


    Functional classifications l.jpg
    Functional Classifications

    • Disease genes classed by function and their relative representations


    Some diseases involve polygenic effects l.jpg
    Some Diseases Involve Polygenic Effects

    • There are a number of classic “genetic diseases” caused by mutations of a single gene

      • Huntington’s, Cystic Fibrosis, Tay-Sachs, PKU, etc.

    • There are also many diseases that are the result of the interactions of many genes:

      • Asthma, Heart disease, Cancer

    • Each of these genes may be considered to be a risk factor for the disease.

    • Groups of SNP markers may be associated with a disease without determining mechanism


    Slide69 l.jpg

    Gene Product-Drug Interaction

    • There are proteins that chemically activate or inactivate drugs.

    • Other proteins can directly enhance or block a drug's activity.

    • There are also genes that control side effects.


    Some examples l.jpg
    Some Examples

    • 10% of African Americans have polymorphic alleles of Glucose-6-phosphate dehydrogenase that lead to haemolytic anemia when they are given the anti-malarial drug primaquine.


    Slide71 l.jpg

    Succinylcholine Toxicity

    • 0.04% of individuals are homozygous for alleles of pseudocholinesterase that are unable to inactivate the muscle relaxant drug succinylcholine, leading to respiratory paralysis.
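
A quick worked example shows what a 0.04% homozygote frequency implies. The sketch assumes Hardy-Weinberg equilibrium and pools all non-inactivating alleles into a single class; both are simplifying assumptions, not claims from the slide, and the function name is hypothetical.

```python
import math


def hardy_weinberg_from_homozygotes(homozygote_freq):
    """Infer allele and carrier frequencies from the homozygote frequency q^2.

    Assumes Hardy-Weinberg equilibrium and a single pooled variant allele class.
    Returns (q, 2pq): the variant allele frequency and the heterozygote frequency.
    """
    q = math.sqrt(homozygote_freq)
    p = 1.0 - q
    return q, 2.0 * p * q


q, carriers = hardy_weinberg_from_homozygotes(0.0004)  # 0.04% homozygotes
print(f"variant allele frequency q = {q:.3f}")
print(f"heterozygous carriers 2pq = {carriers:.2%}")
```

Under these assumptions the variant allele frequency is about 2%, and heterozygous carriers are roughly a hundred times more common than affected homozygotes.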


    Slide72 l.jpg

    Isoniazid Metabolism

    • There are many polymorphic alleles of the N-acetyltransferase (NAT2) gene with reduced (or accelerated) ability to inactivate the drug isoniazid.

      • Some individuals develop peripheral neuropathy in reaction to this drug

      • Some alleles of the NAT2 gene are also associated with susceptibility to various forms of cancer


    Cytochrome p450 l.jpg
    Cytochrome P450

    • ~10% of the Caucasian population is homozygous for alleles of the Cytochrome P450 gene CYP2D6 that do not metabolize the hypertension drug debrisoquine, which can lead to dangerous vascular hypotension.


    Slide74 l.jpg
    ACE

    • Patients homozygous for an allele with a deletion in intron 16 of the gene for angiotensin-converting enzyme (ACE) showed no benefit from the hypertension drug enalapril while other patients benefit.


    Slide75 l.jpg

    Collect Drug Response Data

    • These drug response phenotypes are associated with a set of specific gene alleles.

    • Identify populations of people who show specific responses to a drug.

    • In early clinical trials, it is possible to identify people who react well and react poorly.


    Slide76 l.jpg

    Make Genetic Profiles

    • Scan these populations with a large number of SNP markers.

    • Find markers linked to drug response phenotypes.
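
The marker scan described above can be sketched as a per-SNP test of association between allele counts and drug response. This is a minimal illustration using a Pearson chi-square test on a 2x2 table; the SNP names and counts are invented, and a real scan over many markers would also need a multiple-testing correction, which is omitted here.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (1 d.f., no continuity correction) for a 2x2 table.

    a, b: counts of allele 1 and allele 2 among good responders
    c, d: counts of allele 1 and allele 2 among poor responders
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 0.0  # degenerate table: no test possible
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


CRITICAL_P05 = 3.841  # chi-square critical value for p = 0.05 with 1 degree of freedom

# Hypothetical allele counts: (good_allele1, good_allele2, poor_allele1, poor_allele2)
snp_counts = {
    "snp_1": (60, 40, 58, 42),
    "snp_2": (80, 20, 45, 55),
}

for snp, counts in snp_counts.items():
    stat = chi_square_2x2(*counts)
    flag = "candidate marker" if stat > CRITICAL_P05 else "no evidence of association"
    print(f"{snp}: chi-square = {stat:.2f} ({flag})")
```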


    Use the profiles l.jpg
    Use the Profiles

    • Genetic profiles of new patients can then be used to prescribe drugs more effectively & avoid adverse reactions.

    • Can also speed clinical trials by testing on those who are likely to respond well.


    Major pharmacogenetics approaches in post genomic era l.jpg
    Major pharmacogenetics approaches in post-genomic era

    • Identifying SNP variations in the genome and populations

    • Study of differential gene expression

      • Chips with mRNAs from different tissue types or normal and diseased tissue

      • Can detect expression of a target gene among 50,000-300,000 transcripts on a microarray

      • Simultaneously monitoring expression of every gene in any tissue will be possible

    • Detecting new metabolic disease pathways

      • Based on comparisons with other model organisms


    Slide79 l.jpg

    Micro-Array technology to analyze gene expression

    The principle behind this is to look at differences in gene expression when variables are changed, e.g., yeast cells grown in the presence of EtOH: which genes are turned on or off in response to that change in the environment?

    Another variable could be normal versus diseased tissue

    Pool the cDNAs


    Slide80 l.jpg

    The cDNAs are hybridized to microarrays on which every gene that has been cloned is present [the DNA is spotted on the microslides and each spot corresponds to DNA from a different gene]

    If a particular gene is expressed, then it will be present and labelled in the cDNA pool. It can then hybridize to the spot on the plate corresponding to that particular gene


    Slide81 l.jpg

    The results from such an experiment look like this, where the color of each spot tells you something about that gene's expression and drug therapy optimization.


    Slide82 l.jpg

    The data can then be analyzed and sorted into tables that show which genes are expressed in response to the stimulus and which are turned off

    This sort of experiment can be done with any collection of RNAs that you want to compare. It is particularly useful for comparing a ‘normal’ state to a mutant or disease state, e.g., it tells you which genes are turned on in cancerous cells and may give you a clue as to how cancer works
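
Sorting array data into "turned on" and "turned off" tables can be sketched with a log2 ratio of treated to control signal. The gene names, intensities, and the two-fold threshold below are all illustrative assumptions; real analyses add normalization and replicate-based statistics.

```python
import math


def classify_expression(control, treated, threshold=1.0):
    """Label a gene by log2(treated / control).

    threshold=1.0 means a two-fold change; both intensities must be positive.
    """
    ratio = math.log2(treated / control)
    if ratio >= threshold:
        return "on"
    if ratio <= -threshold:
        return "off"
    return "unchanged"


# Hypothetical spot intensities: gene -> (control, treated)
intensities = {
    "gene_A": (100.0, 800.0),
    "gene_B": (400.0, 50.0),
    "gene_C": (200.0, 210.0),
}

# Build one table per category, as in the sorted tables described above
tables = {"on": [], "off": [], "unchanged": []}
for gene, (control, treated) in intensities.items():
    tables[classify_expression(control, treated)].append(gene)

for label, genes in tables.items():
    print(f"{label}: {genes}")
```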


    Link gene expression to genome sequence l.jpg
    Link Gene Expression to Genome Sequence

    • Identify promoter and 5' sequence for a group of co-expressed genes.

    • Scan for known transcription factor binding sites.

    • Predict new regulatory sites based on common sequence elements.
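
The scanning step above can be sketched as a search of promoter sequences for a known consensus motif. The sequences and gene names are invented, and `TATAWAW` is used only as an illustrative TATA-box-like consensus (W = A or T); production tools generally score position weight matrices rather than matching a fixed pattern.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]", "S": "[CG]", "R": "[AG]", "Y": "[CT]", "N": "[ACGT]",
}


def motif_to_regex(motif):
    """Compile an IUPAC consensus into a regex; the lookahead allows overlapping hits."""
    body = "".join(IUPAC[base] for base in motif.upper())
    return re.compile("(?=(" + body + "))")


def scan_promoters(promoters, motif):
    """Return {gene: [0-based start positions of motif hits]} for each promoter."""
    pattern = motif_to_regex(motif)
    return {
        gene: [m.start() for m in pattern.finditer(seq.upper())]
        for gene, seq in promoters.items()
    }


# Hypothetical promoter sequences for a group of co-expressed genes
promoters = {
    "gene_A": "GGCTATAAAAGGC",
    "gene_B": "CCGGCCGGTTAACC",
}

print(scan_promoters(promoters, "TATAWAW"))
```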


    Diagnostic arrays l.jpg
    Diagnostic arrays

    -Examples of factors showing variability that could be detected on arrays

    -Provide information of status of SNPs and gene expression profiles


    Pharmacogenomics the future l.jpg
    Pharmacogenomics - The Future

    • Ultimate goal is to personalize drug treatment regimes

    • $

    • Faster clinical trials

    • $

    • Fewer drug side effects

    • $

    • Identify how genetic factors interact to affect variation in drug outcomes

      • Inactivation or activation by oxidation by cytochrome P450s

      • Clearance from bloodstream through kidney

      • Target sensitivity

      • Toxicity

      • Heterogeneity of disease mechanisms


    Slide86 l.jpg

    Pharmacogenomics - The Future…continued

    • Mutations in coding sequences will probably only play a small role in disease susceptibility between individuals

    • Variations affecting splicing and gene regulation will play a greater role

    • We know very little about the importance that variations in regulatory and intronic sequences have and how they differ between populations

    • Issues:

      • associating sequence variations with heritable phenotypes

      • how genotypes affect common diseases, drug responses, and other complex phenotypes


    Slide87 l.jpg

    Booming Population Databases

    ScienceNews Focus

    The promise is to deliver “personalized” medicine

