Sequence diversity in evolution and crop improvement
Sequence Diversity in Evolution and Crop Improvement. Teosinte. Maize Landraces. Inbreds/Hybrids. Sherry Flint-Garcia Research Geneticist USDA-ARS MU Division of Plant Sciences. Photos courtesy J. Doebley. Sequence Diversity. Evolution: What are the forces that cause evolution?


Sequence Diversity in Evolution and Crop Improvement

Teosinte

Maize Landraces

Inbreds/Hybrids

Sherry Flint-Garcia

Research Geneticist

USDA-ARS

MU Division of Plant Sciences

Photos courtesy J. Doebley


Sequence Diversity

  • Evolution:

    • What are the forces that cause evolution?

    • Speciation & hybridization

    • Uncovering evolutionary history

  • Crop Improvement:

    • The teosinte-maize story


The Four Forces of Evolution

  • Mutation -- spontaneous changes in the DNA of gametes. Prerequisite to all other evolution.

  • Natural Selection -- genetically based differences in survival or reproduction that lead to genetic change in a population.

  • Gene flow -- movement of genes between populations. In plants this can be accomplished by pollen or seed dispersal.

  • Genetic drift -- random changes in gene frequency. This is very important in small populations.
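
Why drift matters most in small populations can be illustrated with a minimal simulation. This is only a sketch under a simple Wright-Fisher-style assumption (random sampling of gene copies each generation, no selection, mutation, or migration); the function name, population sizes, and starting frequency are illustrative choices, not values from the slides.

```python
import random

def wright_fisher(pop_size, p0, generations, seed=None):
    """Track one allele's frequency under pure drift in a diploid population."""
    rng = random.Random(seed)
    n_copies = 2 * pop_size  # 2N gene copies in a diploid population
    p = p0
    freqs = [p]
    for _ in range(generations):
        # Every gene copy in the next generation is drawn at random from the current pool.
        count = sum(1 for _ in range(n_copies) if rng.random() < p)
        p = count / n_copies
        freqs.append(p)
    return freqs

small = wright_fisher(pop_size=10, p0=0.5, generations=50, seed=1)
large = wright_fisher(pop_size=1000, p0=0.5, generations=50, seed=1)
print("N=10:  ", [round(f, 2) for f in small[::10]])
print("N=1000:", [round(f, 2) for f in large[::10]])
```

In runs like this the small population typically wanders much further from its starting frequency (and may fix or lose the allele), while the large population stays close to it.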


Mutation: Generation of New Alleles

  • Mutations are the result of mistakes in DNA replication, exposure to UV or to some chemicals (mutagens) and other causes.

  • Point mutations

    • changing one nucleotide to another

    • e.g., C-->T


Sickle Cell Anemia

A single point mutation causes a dramatic change in phenotype.


Other types of mutations

  • Indels

    • insertions/deletions

    • Can cause frameshifts (when the length is not a multiple of three), which usually lead to premature stop codons

  • Gene duplication

    • May lead to new functions

  • Chromosomal mutations

    • Inversions, translocations, deletions

  • Polyploidy

    • Very common in plants

    • May lead to new species in one step


Most point mutations have no effect or almost no effect. Why?

Most of the genome seems to be ‘junk’ -- at least it doesn’t code for proteins.

Many mutations within the protein-coding regions of genes don’t change the amino acid specified, i.e., there is redundancy in the genetic code.

For example, 6 different codons specify the amino acid leucine.
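
The redundancy can be checked directly against the standard genetic code. The sketch below builds the 64-codon table in the conventional T, C, A, G ordering and lists the leucine codons; the helper `synonymous_neighbors` is a hypothetical name introduced here to count how many single-base changes to a codon are silent.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

leucine_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "L")
print(len(leucine_codons), "leucine codons:", leucine_codons)

def synonymous_neighbors(codon):
    """Count single-nucleotide changes to `codon` that leave the amino acid unchanged."""
    silent = 0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                silent += CODON_TABLE[mutant] == CODON_TABLE[codon]
    return silent

print(synonymous_neighbors("CTT"), "of the 9 possible point mutations in CTT are silent")
```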


Natural Selection

  • Peppered moth (Biston betularia) evolution during the industrial revolution in England

  • Early 1800s = pre-industrial

    • Tree bark was white

    • Almost all moths were of the typica form

  • 1895 = Industrial Era

    • Tree bark was covered in black soot

    • 98% of moths were of the carbonaria form

  • Today = Clean Air laws enforced

    • Prevalence of the carbonaria form is declining

[Photos: ‘typica’ form and ‘carbonaria’ form]


Brassica oleracea


Gene Flow

  • Tends to homogenize populations.

  • Rates of gene flow depend on the spatial arrangement of populations.

“Directional” movement of alleles

Migration occurs at random among a group of equivalent populations.
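
The homogenizing effect of gene flow can be sketched with two populations that exchange migrants. This is a minimal illustration, assuming two equal-sized populations that swap a fraction `m` of their gene pools each generation; the starting frequencies and migration rate are made-up values.

```python
def exchange_migrants(p1, p2, m, generations):
    """Allele frequencies in two equal-sized populations that swap a fraction m each generation."""
    history = [(p1, p2)]
    for _ in range(generations):
        # Each population keeps (1 - m) of its own gene pool and receives m from the other.
        p1, p2 = (1 - m) * p1 + m * p2, (1 - m) * p2 + m * p1
        history.append((p1, p2))
    return history

history = exchange_migrants(0.9, 0.1, m=0.05, generations=40)
for gen in range(0, 41, 10):
    a, b = history[gen]
    print(f"generation {gen:2d}: p1={a:.3f}  p2={b:.3f}  difference={a - b:.3f}")
```

Under this model the difference between the two populations shrinks by a factor of (1 - 2m) every generation while the average frequency stays the same.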


Migration along a linear set of populations

Populations are continuous.



Founder effect: Gene flow and genetic drift are responsible for the limited genetic variation on islands, relative to mainland populations.


Speciation and Hybridization

  • Speciation – how do new species arise?

  • What is a species, anyway?

  • Most species were originally described by their morphology.

  • The Problem: Convergence

    • Similar features in unrelated organisms due to evolution of traits that “work” in similar environments


Convergent structures in the ocotillo (left) from the American Southwest, and in Alluaudia (right) from Madagascar.



Speciation and Hybridization

  • Biological Species Concept (BSC)

    • Based on reproductive compatibility

    • Natural spatial, temporal, and morphological discontinuities generally correspond to fertility barriers

  • The Problem: In plants, many named species can hybridize.



Most dandelions are asexual. So the biological species concept (BSC) doesn’t apply.

How can you name species depending on who can mate with whom when the organisms do not mate at all?!



Scarlet and Black oaks can hybridize and inhabit the same range -- but they have different microhabitat preferences so hybridization is rare.


These pines can also hybridize, but they shed their pollen at different times of the season.


Speciation by Hybridization

Hybridization often shows how difficult it is to apply the BSC to plants.

The hybrid in this case is a new species. The rearrangements of its chromosomes make it infertile with either parent.



As the climate becomes drier the desert splits the range of this hypothetical tree species. This reduces gene flow between the now isolated populations and sets the stage for speciation.


  • Evolution of species that are geographically separated. Genetic drift plays a significant role.

  • “Edge effect” where evolution of reproductive barriers occurs between neighboring populations. Requires considerable selection pressure.

  • Establishment of a new population with a different ecological niche within the same geographical range of the parental population.


Uncovering Evolutionary History

  • Taxonomy vs. Systematics

  • Estimating Phylogeny

    • Distance Methods

    • Maximum Parsimony Methods

    • Maximum Likelihood Methods


Taxonomy vs. Systematics

  • Taxonomy

    • Discovering

    • Describing

    • Naming

    • Classifying

  • Systematics

    • Figuring out the evolutionary relationships of species

    • Summarizing the evolutionary history of a group


Plant Taxonomy

  • taxon - any group at any rank

  • corn = common name

  • kingdom Plantae (Viridiplantae)

  • division (phylum) Anthophyta

  • class Liliopsida

  • order Commelinales

  • family Poaceae

  • genus Zea (always capitalized)

  • species Zea mays (the specific epithet is never capitalized)


Plant Systematics

  • A phylogenetic tree is used to illustrate systematic relationships

  • Modern taxonomic groups generally correspond to clades on a phylogenetic tree (i.e., a cladogram)

  • Example: phylogenetic tree of the grass family

Mathews et al. 2000 American Journal of Botany


Angiosperm Phylogeny Group Tree

“Dicots” are not a monophyletic group.


Data Types that can be used to Estimate a Phylogeny

  • Cross Compatibility

    • Uses the ‘Biological Species Concept’

  • Morphological

    • Continuous traits

    • Meristic (countable) traits

  • Cytological

    • Chromosome number

    • Chromosome features

    • Pairing in hybrids

  • Molecular data

    • Secondary chemicals

    • Proteins

    • DNA

    • Allele frequencies at many loci (isozymes, SSR)

    • DNA sequences, considered as a whole

    • DNA sequences, considered site-by-site


Maximum Parsimony (Minimum Evolution) Methods

  • Prefer the evolutionary pathway that requires the smallest number of mutational events.

  • Most effective when examining sequences with strong similarity

  • Underlying premises:

    • Mutations are exceedingly rare events.

    • The more unlikely events a model invokes, the less likely the model is to be correct.


Using only trait 1 …

Species     Trait 1   Trait 2   Trait 3   Trait 4   Trait 5
Species 1   0         1.2       red       A         T
Species 2   0         3.4       blue      G         C
Species 3   1         3.5       red       A         T
Species 4   1         4.0       red       A         T
Species 5   1         2.8       blue      G         T

[Tree: sp1 and sp2 are separated from sp3, sp4, and sp5 by a single 0<->1 change in trait 1.]

  • Traits must have discrete character states.

  • Must have same character state in at least 2 taxa.

But traits 3 & 4 disagree with trait 1.

[Tree: sp2 and sp5 are separated from sp1, sp3, and sp4 by the changes Red<->blue (trait 3) and A<->G (trait 4).]

[Figure: two alternative trees for species 1 to 5, with the character states of traits 1, 3, and 4 (0/1, Red/Blue, A/G) mapped onto them. One tree requires 4 substitutions; the other requires 5.]

Distance-based approaches

Compare each taxon to every other taxon to estimate a “distance matrix”. Distances are then ‘clustered’ to estimate a phylogenetic tree.

       Sp1   Sp2   Sp3   Sp4   Sp5
Sp1    0     d12   d13   d14   d15
Sp2          0     d23   d24   d25
Sp3                0     d34   d35
Sp4                      0     d45
Sp5                            0

Distance-based approaches

Compare each taxon to every other taxon to estimate a “distance matrix”.

Example: DNA sequence considered as a whole

              10         20         30         40         50
Sp1: GTGCTGCACG GCTCAGTATA GCATTTACCC TTCCATCTTC AGATCCTGAA
Sp2: ACGCTGCACG GCTCAGTGCG GTGCTTACCC TCCCATCTTC AGATCCTGAA
Sp3: GTGCTGCACG GCTCGGCGCA GCATTTACCC TCCCATCTTC AGATCCTATC
Sp4: GTATCACACG ACTCAGCGCA GCATTTGCCC TCCCGTCTTC AGATCCTAAA
Sp5: GTATCACATA GCTCAGCGCA GCATTTGCCC TCCCGTCTTC AGATCTAAAA

       Sp1   Sp2   Sp3   Sp4   Sp5
Sp1    0     9     8     12    15
Sp2          0     11    15    18
Sp3                0     10    13
Sp4                      0     5
Sp5                            0

Distance-based approaches

Distances are then ‘clustered’ to estimate a phylogenetic tree.

Example: UPGMA algorithm (Unweighted Pair-Group Method using Arithmetic means)

       Sp1   Sp2   Sp3   Sp4   Sp5
Sp1    0     9     8     12    15
Sp2          0     11    15    18
Sp3                0     10    13
Sp4                      0     5
Sp5                            0

The smallest distance is identified, the average of the two combined taxa is calculated, and the matrix is recalculated. This iteration is repeated.

[Tree so far: Sp4 and Sp5 are joined (distance 5), each on a branch of length 2.5.]

Distance-based approaches

       Sp1   Sp2   Sp3   4-5
Sp1    0     9     8     13.5
Sp2          0     11    16.5
Sp3                0     11.5
4-5                      0

[Tree so far: Sp4 and Sp5 joined on branches of length 2.5; Sp1 and Sp3 joined on branches of length 4.]

Distance-based approaches

       Sp2   1-3   4-5
Sp2    0     10    16.5
1-3          0     12.5
4-5                0

[Tree so far: Sp4 and Sp5 joined on branches of length 2.5; Sp1 and Sp3 joined on branches of length 4; Sp2 joins the 1-3 cluster at a branch length of 5.]

Distance-based approaches

        1-2-3   4-5
1-2-3   0       12.5
4-5             0

[Final tree: Sp4 and Sp5 joined on branches of length 2.5; Sp1 and Sp3 joined on branches of length 4; Sp2 joins the 1-3 cluster at 5; the 1-2-3 and 4-5 clusters join on branches of length 6.5.]

Maximum Likelihood Methods

  • Best suited for DNA and protein sequence data

  • Requires a model of evolution

  • Each nucleotide/amino acid substitution has an associated likelihood

  • A function is derived to represent the likelihood of the data given the tree, branch-lengths and additional parameters

  • The likelihood function is maximized (equivalently, the negative log-likelihood is minimized)


[Figure: the candidate trees for taxa 1 to 4. Tree 1 is drawn with base T at the root (L0 = 0.25), bases T and G at the two internal nodes, and bases T, T, A, G at the tips. The branch from T to G contributes 10^-6 and the branch from G to A contributes 2 x 10^-6.]

Based on a model of nucleotide substitution (matrix of transitions and transversions):

      A           C           G           T
A     1           10^-6       2 x 10^-6   10^-6
C     10^-6       1           10^-6       2 x 10^-6
G     2 x 10^-6   10^-6       1           10^-6
T     10^-6       2 x 10^-6   10^-6       1

1: ACGCGTTGGG
2: ACGCGTTGGG
3: ACGCAATGAA
4: ACACAGGGAA

L(Tree 1) = L0 x L1 x L2 x L3 x L4 x L5 x L6 = 5 x 10^-13

Consider every possible base assignment to each node and calculate the likelihood

[Figure: Tree 1 (base T at the root) and Tree 2 (base C at the root), each with L0 = 0.25, bases T and G at the two internal nodes, and bases T, T, A, G at the tips.]

1: ACGCGTTGGG
2: ACGCGTTGGG
3: ACGCAATGAA
4: ACACAGGGAA

L(Tree 1) = L0 x L1 x L2 x L3 x L4 x L5 x L6 = 5 x 10^-13

L(Tree 2) = L0 x L1 x L2 x L3 x L4 x L5 x L6 = 1 x 10^-18

  • Repeat for each node assignment, and for each site in the alignment.

  • The likelihood of that unrooted tree at a site is the sum over all of these individual node assignments.

  • Repeat for each unrooted tree and choose the tree with the highest likelihood.
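
The two likelihood values can be reproduced from the substitution matrix. The sketch below assumes the layout implied by the slide: a root node, two internal nodes, taxa 1 and 2 (both T at this site) below the first internal node, and taxa 3 and 4 (A and G) below the second. The function names are illustrative.

```python
import math
from itertools import product

BASES = "ACGT"
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}

def branch_term(parent, child):
    """Per-branch value from the slide's matrix: 1 if unchanged, 2e-6 transition, 1e-6 transversion."""
    if parent == child:
        return 1.0
    return 2e-6 if (parent, child) in TRANSITIONS else 1e-6

def assignment_likelihood(root, left, right, tips, prior=0.25):
    """Likelihood of one site for one assignment of bases to the root and two internal nodes."""
    value = prior * branch_term(root, left) * branch_term(root, right)
    for node, pair in zip((left, right), tips):
        for tip in pair:
            value *= branch_term(node, tip)
    return value

tips = (("T", "T"), ("A", "G"))  # observed bases for taxa (1, 2) and (3, 4) at this site
tree1 = assignment_likelihood("T", "T", "G", tips)
tree2 = assignment_likelihood("C", "T", "G", tips)
# Site likelihood for this topology: sum over all 4 x 4 x 4 node assignments.
site_total = sum(assignment_likelihood(r, l, rt, tips) for r, l, rt in product(BASES, repeat=3))
print(f"L(Tree 1) = {tree1:.1e}")
print(f"L(Tree 2) = {tree2:.1e}")
print(f"sum over all node assignments = {site_total:.3e}")
```

Multiplying the per-site totals across the alignment (or summing their logarithms) would give the likelihood of the whole topology.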


The Teosinte-Maize Story

6000 – 10,000 years ago

  • The practical side of sequence diversity

  • PLANT BREEDING!

  • Sequence Diversity in Teosinte

  • Sequence Diversity in Maize

  • Selection During Domestication and Improvement


Sequence Diversity and Plant Breeding

  • Genetic diversity within a crop species is the raw material for current plant breeding

  • Genetic diversity is the insurance policy to enable plant breeders to adapt crops to changing environments


The Problem

[Figure: corn yield in bushels per acre by year, covering open pollinated varieties, double cross hybrids, and single cross hybrids.]

  • To what degree is limited genetic diversity inhibiting genetic improvement in corn?


Two Views of the Problem

  • “Most of the corn germplasm in use in the USA today is derived from mixtures of only two major races [out of ~ 300 races total] (Wallace and Brown, 1956). The simplest means of correcting this situation and of increasing the genetic diversity of this important crop is to introduce unrelated sources of germplasm” (Brown and Goodman, 1977, Races of Corn, in Corn and Corn Improvement)

  • [From a project comparing sequence diversity in 21 genes of nine U.S. inbred lines with 16 diverse maize landraces] “We found that our sample of [U.S.] inbreds contained a level of [SNP] diversity that was 77% the level of diversity in our landrace sample.” (Tenaillon et al., 2001, PNAS, 98:9161-9166)


Sequence Diversity in Maize

  • How has selection shaped sequence diversity in maize?

    • Survey SNPs from ~1800 genes in diverse maize and teosinte germplasm

    • Screen 4000 candidate genes for evidence of selection

  • Practical Goal: identify genes exhibiting selection

    • Domestication, agronomic improvement, and local adaptation


[Figure: allele frequencies in teosinte, landraces, and modern inbreds (separated by domestication and plant breeding) for an unselected gene, a domestication gene, and an improvement gene.]


Can we develop genomic screens to identify genes that have undergone selection?

1. Invariant SSR approach

2. Direct Sequencing Approach

What proportion of genomic sequences that have low allelic diversity among inbreds result from selection for domestication?

Contrast sequence diversity among teosintes, landraces, and inbreds


Screening SSR primers against 12 inbred lines

  • 1,772 total SSRs

    • 1,053 were polymorphic (Class I)

    • 719 were invariant (Class II)

[Figure: invariant SSR primers]


Invariant SSR Screening

[Figure: two panels, Non-Class II and Class II, each showing teosinte (6), landrace (5), and US inbred samples.]

  • 470 invariant SSR primer sets

    • 321 monomorphic throughout

    • 60 polymorphic in both exotics and teosintes

    • 14 polymorphic only in exotics

    • 75 polymorphic only in teosintes (Class II-E)

Vigouroux et al. 2002. PNAS 99: 9650


Analysis of Class II-E SSRs

  • 31 Class I SSRs and 44 Class II-E SSRs

  • 44 teosinte and 45 landrace accessions

  • Tested for selection (loss of diversity)

  • 0 Class I SSRs showed evidence of selection

  • 15 Class II-E SSRs showed evidence of selection

  • Extrapolated back to the 1772 total SSRs:

  • “1.4% genes have been selected”


Direct Sequencing Approach

  • Purpose: to develop a SNP resource for the maize community

  • Result: a LOT of data!!!


Distribution of SNP Haplotypes (patterns)

[Figure: distribution of unigenes by number of haplotypes, from conserved to diverse.]

  • 470 maize Unigenes in 14 maize lines

  • Mean haplotype # = 4.46

  • > 80% of unigenes have 2 to 7 haplotypes

  • For each gene, a few haplotypes account for much of the diversity


Are genes with low inbred diversity enriched for domestication and improvement candidates? (Masanori Yamasaki, post-doc in McMullen Lab)

[Figure: Tripsacum, teosinte, landraces, inbreds]

  • 36 genes with no diversity among a 14-inbred set

  • Sequenced same region in 16 landraces, 16 teosintes, and a Tripsacum dactyloides sample.

  • Test for selection on inbreds, landraces and teosintes compared to four neutral genes.


Selection Tests for 33 (of 36) Genes

5 genes were significant in both the inbreds and the landraces (evidence for domestication genes).

7 genes were significant in the inbreds but not the landraces (evidence for improvement genes).

1 additional gene was classified as either domestication or improvement depending on the test.

13 out of 33 genes = 39% !!

Yamasaki et al. submitted


Selection on a Genomic Scale

  • Sequenced 774 maize unigenes in 14 maize inbreds and 16 teosinte accessions

  • Tested for selection using coalescent simulations

  • Result: 2-4% had experienced artificial selection

  • Assume 59,000 genes in maize

  • 59,000 x 2% ≈ 1200 selected genes

Wright et al. 2005 Science 308: 1310


Where are we going with this?

  • Before genomics, 11 genes had been identified as selected by population genetic approaches.

  • By sequencing ~1000 genes, we have ~50 novel candidates.

  • We need:

  • 1. to completely sequence the maize genome to identify ALL genes.

  • 2. to resequence all remaining genes in multiple maize inbreds and teosinte accessions.

About 1140 more selected genes remain to be found!
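
The figure of roughly 1140 remaining genes appears to follow from the numbers on the preceding slides. The arithmetic below is a reconstruction under that assumption: the 2% lower bound applied to 59,000 genes, rounded to the nearest hundred as on the slide, minus the 11 previously known genes and the ~50 new candidates.

```python
genes_in_maize = 59_000        # gene count assumed on the slide
selected_low = genes_in_maize * 2 // 100    # 2% lower bound
selected_high = genes_in_maize * 4 // 100   # 4% upper bound

known_before_genomics = 11
novel_candidates = 50          # "~50" on the slide

# The slide rounds the lower bound to the nearest hundred before subtracting.
rounded_low = round(selected_low, -2)
remaining = rounded_low - known_before_genomics - novel_candidates

print(f"2% to 4% of {genes_in_maize} genes: {selected_low} to {selected_high}")
print(f"rounded lower bound: {rounded_low}; still to be identified: about {remaining}")
```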


Signatures of Selection

  • If selected genes were important in past improvement, continued manipulation might contribute to future gain.

  • If selected genes suffered a loss of diversity because of selection, they are prime candidates for introgressive breeding from wild relatives.

  • Hypothesis: manipulation of the expression of domestication and improvement genes will alter key agronomic traits


Selection for Amino Acid Content?

[Figure: amino acid content of teosinte, landraces, and maize. Axes: % of kernel weight and % of total AA. Categories: total AA and the individual amino acids.]

  • Four genes that show evidence of selection are involved in amino acid biosynthesis


Selection for Amino Acid Content?

  • Are there more genes in amino acid pathways that have been selected?

  • Sequenced 16 genes in 28 maize inbreds, 16 teosinte, and 2 Tripsacum.

  • Result: we found 4 genes that may have been selected during domestication/improvement.


The Ultimate Selection Project

[Figure: teosinte; B73 (inbred line); B73 with knockout in selected gene; B73 with teosinte allele of selected gene.]




Molecular Diversity:

[Figure: aligned sequences from inbred lines B73, CO159, GT119, Tx501, Tx303, Mo17, Mp708, IHO, and T218, with an insertion, a deletion, a conserved region, a SNP, and an InDel marked.]

  • SNP: Single nucleotide polymorphism

  • InDel: Insertion/deletion

SNPs and InDels are used as markers for genetic analysis.

