
Sequence Diversity in Evolution and Crop Improvement

Teosinte

Maize Landraces

Inbreds/Hybrids

Sherry Flint-Garcia

Research Geneticist

USDA-ARS

MU Division of Plant Sciences

Photos courtesy J. Doebley



Sequence Diversity

  • Evolution:

    • What are the forces that cause evolution?

    • Speciation & hybridization

    • Uncovering evolutionary history

  • Crop Improvement:

    • The teosinte-maize story



The Four Forces of Evolution

  • Mutation -- spontaneous changes in the DNA of gametes. Prerequisite to all other evolution.

  • Natural Selection -- genetically-based differences in survival or reproduction that lead to genetic change in a population.

  • Gene flow -- movement of genes between populations. In plants this can be accomplished by pollen or seed dispersal.

  • Genetic drift -- random changes in gene frequency. Its effects are strongest in small populations.
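Why drift matters most in small populations can be illustrated with a Wright-Fisher simulation. This is a minimal sketch under assumed, hypothetical settings (population sizes of 10 and 200, a starting frequency of 0.5, 50 generations, 100 replicates; none of these come from the presentation): each generation, 2N gene copies are drawn at random from the previous generation's allele frequency, with no mutation, selection, or migration.

```python
import random


def wright_fisher(n_diploid, p0, generations, rng):
    """Allele-frequency trajectory under pure genetic drift.

    Each generation, 2N gene copies are sampled with replacement from the
    previous generation's allele frequency p.
    """
    copies = 2 * n_diploid
    p = p0
    trajectory = [p]
    for _ in range(generations):
        count = sum(rng.random() < p for _ in range(copies))
        p = count / copies
        trajectory.append(p)
        if p in (0.0, 1.0):  # allele lost or fixed: drift can no longer change p
            break
    return trajectory


def fraction_fixed_or_lost(n_diploid, replicates=100, generations=50, seed=1):
    """Fraction of replicate populations in which the allele was lost or fixed."""
    rng = random.Random(seed)
    ends = [wright_fisher(n_diploid, 0.5, generations, rng)[-1] for _ in range(replicates)]
    return sum(p in (0.0, 1.0) for p in ends) / replicates


small = fraction_fixed_or_lost(n_diploid=10)
large = fraction_fixed_or_lost(n_diploid=200)
print(f"N=10: {small:.2f} lost or fixed the allele; N=200: {large:.2f}")
```

Most of the small populations lose all variation at the locus within 50 generations, while the large ones barely move from the starting frequency.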



Mutation: Generation of New Alleles

  • Mutations result from errors in DNA replication, exposure to UV light or certain chemicals (mutagens), and other causes.

  • Point mutations

    • changing one nucleotide to another

    • e.g., C-->T


Sickle Cell Anemia

A single point mutation causes a dramatic change in phenotype.


Other types of mutations

  • Indels

    • insertions/deletions

    • In coding regions, indels that are not a multiple of three bases cause frame-shifts, usually leading to premature stop codons

  • Gene duplication

    • May lead to new functions

  • Chromosomal mutations

    • Inversions, translocations, deletions

  • Polyploidy

    • Very common in plants

    • May lead to new species in one step


Most point mutations have no effect or almost no effect. Why?

Most of the genome seems to be ‘junk’ -- at least it doesn’t code for proteins.

Many mutations within the protein-coding regions of genes don’t change the amino acid specified; i.e., there is redundancy in the genetic code.

For example, 6 different codons specify the amino acid leucine.
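The redundancy of the genetic code can be checked directly. This sketch builds the standard genetic code (NCBI translation table 1) from its conventional 64-letter compact string; the helper names `codons_for` and `is_synonymous` are hypothetical, chosen for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in the order TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codons_for(amino_acid):
    """All codons that specify a one-letter amino acid code ('*' = stop)."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


def is_synonymous(codon, position, new_base):
    """True if changing the base at `position` (0-2) leaves the amino acid unchanged."""
    mutant = codon[:position] + new_base + codon[position + 1:]
    return CODON_TABLE[codon] == CODON_TABLE[mutant]


print(codons_for("L"))                # ['CTA', 'CTC', 'CTG', 'CTT', 'TTA', 'TTG']
print(is_synonymous("CTT", 2, "C"))   # True: CTC still codes for leucine
print(is_synonymous("CTT", 0, "A"))   # False: ATT codes for isoleucine
```

Third-position changes are the ones most often synonymous, which is one reason many point mutations in coding regions are silent.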



Natural Selection

  • Peppered moth (Biston betularia) evolution during the Industrial Revolution in England

  • Early 1800s = pre-industrial

    • Tree bark was white

    • Almost all moths were of the typica form

  • 1895 = Industrial Era

    • Tree bark was covered in black soot

    • 98% of moths were of the carbonaria form

  • Today = Clean Air laws enforced

    • Prevalence of the carbonaria form is declining

[Photos: ‘typica’ form and ‘carbonaria’ form]
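The rise of the carbonaria form can be idealized with a textbook one-locus selection model. This sketch assumes a haploid model (a simplification: moths are diploid and carbonaria is dominant) in which the favored allele has relative fitness 1 + s; the values of s and the starting frequency are hypothetical, not estimates for Biston betularia.

```python
def select(p, s):
    """One generation of haploid selection.

    The favored allele (frequency p) has relative fitness 1 + s and the other
    allele 1, so mean fitness is 1 + s*p and p' = p(1 + s) / (1 + s*p).
    """
    return p * (1 + s) / (1 + s * p)


def trajectory(p0, s, generations):
    """Allele frequencies over successive generations, starting from p0."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(select(freqs[-1], s))
    return freqs


# Hypothetical values: a rare dark allele (1%) with a 50% fitness advantage on sooty bark.
dark = trajectory(p0=0.01, s=0.5, generations=50)
print([round(p, 3) for p in dark[::10]])
```

In this model the allele's odds, p / (1 - p), grow by a factor of 1 + s every generation, so even a rare allele can approach fixation within a few dozen generations; reversing the sign of s (cleaner bark) sends it back down.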


Brassica oleracea




Gene Flow

  • Tends to homogenize populations.

  • Rates of gene flow depend on the spatial arrangement of populations.

“Directional” movement of alleles.

Migration occurs at random among a group of equivalent populations.
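Random migration among a group of equivalent populations is usually called the island model. This minimal deterministic sketch (the migration rate m = 0.1 and the starting allele frequencies are hypothetical) shows how gene flow homogenizes populations: each generation, a fraction m of every population's gene pool is replaced by migrants carrying the pooled allele frequency.

```python
def migrate(freqs, m):
    """One generation of island-model gene flow.

    A fraction m of each population's gene pool is replaced by migrants drawn
    from the pooled allele frequency (the mean over all populations).
    """
    p_bar = sum(freqs) / len(freqs)
    return [(1 - m) * p + m * p_bar for p in freqs]


# Hypothetical allele frequencies in four equivalent populations.
freqs = [0.0, 0.2, 0.9, 1.0]
for _ in range(100):
    freqs = migrate(freqs, m=0.1)
print([round(p, 4) for p in freqs])  # every population converges on the starting mean, 0.525
```

The mean frequency never changes, while each population's deviation from it shrinks by a factor of (1 - m) per generation, which is the homogenizing effect described above.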


Migration along a linear set of populations. Populations are continuous.



Founder effect: limited gene flow and genetic drift are responsible for the reduced genetic variation on islands, relative to mainland populations.
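The sampling step behind a founder effect can be sketched in a few lines. Everything here is hypothetical: a mainland gene pool with 5 common and 15 rare alleles at one locus, and an island founded by 3 diploid individuals (6 gene copies) drawn at random.

```python
import random


def founder_sample(mainland_alleles, n_founders, rng):
    """Gene copies carried by n_founders diploid founders (2 per individual).

    Copies are drawn at random from the mainland gene pool, so rare alleles
    are easily left behind.
    """
    return [rng.choice(mainland_alleles) for _ in range(2 * n_founders)]


rng = random.Random(42)
# Hypothetical mainland gene pool: 5 common alleles (20 copies each) and 15 rare ones (1 copy each).
mainland = [f"a{i}" for i in range(5)] * 20 + [f"r{i}" for i in range(15)]
founders = founder_sample(mainland, n_founders=3, rng=rng)
print(len(set(mainland)), "alleles on the mainland;", len(set(founders)), "among the founders")
```

Six gene copies can carry at most six of the twenty mainland alleles, and drift in the small island population then erodes that reduced variation further.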



Speciation and Hybridization

  • Speciation – how do new species arise?

  • What is a species, anyway?

  • Most species were originally described by their morphology.

  • The Problem: Convergence

    • Similar features in unrelated organisms due to evolution of traits that “work” in similar environments


Convergent structures in the ocotillo (left) from the American Southwest, and in Alluaudia (right) from Madagascar.



Nectar feeders have converged on this hovering long-tongued morphology.



Speciation and Hybridization

  • Biological Species Concept (BSC)

    • Based on reproductive compatibility

    • Natural spatial, temporal, and morphological discontinuities generally correspond to fertility barriers

  • The Problem: In plants, many named species can hybridize.



Most dandelions are asexual. So the biological species concept (BSC) doesn’t apply.

How can you name species depending on who can mate with whom when the organisms do not mate at all?!


Scarlet and black oaks can hybridize and inhabit the same range -- but they have different microhabitat preferences, so hybridization is rare.


These pines can also hybridize, but they shed their pollen at different times of the season.


Speciation by Hybridization

Hybridization often shows how difficult it is to apply the BSC to plants.

The hybrid in this case is a new species. The rearrangements of its chromosomes make it infertile with either parent.

[Figure: two parent species and their hybrid]



As the climate becomes drier the desert splits the range of this hypothetical tree species. This reduces gene flow between the now isolated populations and sets the stage for speciation.


Evolution of species that are geographically separated. Genetic drift plays a significant role.

“Edge effect” where evolution of reproductive barriers occurs between neighboring populations. Requires considerable selection pressure.

Establishment of a new population with a different ecological niche within the same geographical range as the parental population.



Uncovering Evolutionary History

  • Taxonomy vs. Systematics

  • Estimating Phylogeny

    • Distance Methods

    • Maximum Parsimony Methods

    • Maximum Likelihood Methods



Taxonomy vs. Systematics

  • Taxonomy

    • Discovering

    • Describing

    • Naming

    • Classifying

  • Systematics

    • Figuring out the evolutionary relationships of species

    • Summarize the evolutionary history of a group


Plant Taxonomy

  • taxon - any group at any rank

  • corn = common name

  • kingdom: Plantae (Viridiplantae)

  • division (phylum): Anthophyta

  • class: Liliopsida

  • order: Commelinales

  • family: Poaceae

  • genus: Zea

  • species: Zea mays

Genus name: always capitalized. Species epithet: never capitalized.


Plant Systematics

  • A phylogenetic tree is used to illustrate systematic relationships

  • Modern taxonomic groups generally correspond to clades on a phylogenetic tree (i.e., a cladogram)

  • Example: phylogenetic tree of the grass family

Mathews et al. 2000 American Journal of Botany


Angiosperm Phylogeny Group Tree: “Dicots” are not a monophyletic group.


Data Types that can be used to Estimate a Phylogeny

  • Cross Compatibility

    • Uses the ‘Biological Species Concept’

  • Morphological

    • Continuous traits

    • Meristic (countable) traits

  • Cytological

    • Chromosome number

    • Chromosome features

    • Pairing in hybrids

  • Molecular data

    • Secondary chemicals

    • Proteins

    • DNA

    • Allele frequencies at many loci (isozymes, SSR)

    • DNA sequences, considered as a whole

    • DNA sequences, considered site-by-site


Maximum Parsimony (Minimum Evolution) Methods

  • Preference is given to the pathway (tree) that requires the smallest number of mutational events.

  • Most effective when examining sequences with strong similarity

  • Underlying premises:

    • Mutations are exceedingly rare events.

    • The more unlikely events a model invokes, the less likely the model is to be correct.


Using only trait 1 …

[Figure: a table of five traits scored in Species 1-5 (colour: red/blue; a 0/1 character; continuous measurements such as 1.2 and 3.4; nucleotide states), with the tree that groups the species by trait 1.]

Traits must have discrete character states.

A character state must be shared by at least 2 taxa to be informative.


But traits 3 & 4 disagree with trait 1.

[Figure: the same character table and tree, with the red<->blue and A<->G changes mapped onto it.]


  • Every possible tree is considered individually for each informative site (computationally intensive).

  • After all informative sites have been considered, the tree that invokes the smallest total number of substitutions is the most parsimonious.

[Figure: two alternative trees for Species 1-5 with character states (blue/red, 0/1, G/A) at the tips; one requires 4 substitutions, the other 5.]
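Counting the minimum number of substitutions a given tree requires is commonly done with Fitch's algorithm; this sketch uses it as a standard stand-in, not necessarily the procedure behind the slides. The 3-site, 4-taxon alignment and the trees are hypothetical, and `n_unrooted_trees` shows why scoring "every possible tree" becomes computationally intensive: there are (2n - 5)!! unrooted binary trees for n taxa.

```python
from typing import Dict, Set, Tuple, Union

# A rooted binary tree: a taxon name (leaf) or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch's algorithm for one site: (candidate ancestral states, substitutions)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:  # the two children can agree: no extra change needed
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1  # one substitution required


def parsimony_score(tree: Tree, alignment: Dict[str, str]) -> int:
    """Minimum total number of substitutions, summed over all sites."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {taxon: seq[i] for taxon, seq in alignment.items()})[1]
        for i in range(n_sites)
    )


def n_unrooted_trees(n_taxa: int) -> int:
    """Number of distinct unrooted binary trees, (2n - 5)!!, for n >= 3 taxa."""
    count = 1
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


# Hypothetical alignment and candidate trees.
alignment = {"sp1": "AGT", "sp2": "AGC", "sp3": "GAT", "sp4": "GAC"}
tree_a = (("sp1", "sp2"), ("sp3", "sp4"))
tree_b = (("sp1", "sp3"), ("sp2", "sp4"))
print(parsimony_score(tree_a, alignment), parsimony_score(tree_b, alignment))  # 4 5
print(n_unrooted_trees(5), n_unrooted_trees(10))  # 15 2027025
```

Tree A is the more parsimonious of the two. Because the number of candidate trees grows so quickly, real parsimony programs use heuristic tree searches instead of scoring every tree.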


Distance-based approaches

Compare each taxon to every other taxon to estimate a “distance matrix”. Distances are then ‘clustered’ to estimate a phylogenetic tree.

        Sp1    Sp2    Sp3    Sp4    Sp5
Sp1     0
Sp2     d12    0
Sp3     d13    d23    0
Sp4     d14    d24    d34    0
Sp5     d15    d25    d35    d45    0


Distance-based approaches

Compare each taxon to every other taxon to estimate a “distance matrix”.

Example: DNA sequence considered as a whole (distance = number of sites that differ between two aligned 50-bp sequences; bases shown in blocks of 10)

Sp1: GTGCTGCACG GCTCAGTATA GCATTTACCC TTCCATCTTC AGATCCTGAA
Sp2: ACGCTGCACG GCTCAGTGCG GTGCTTACCC TCCCATCTTC AGATCCTGAA
Sp3: GTGCTGCACG GCTCGGCGCA GCATTTACCC TCCCATCTTC AGATCCTATC
Sp4: GTATCACACG ACTCAGCGCA GCATTTGCCC TCCCGTCTTC AGATCCTAAA
Sp5: GTATCACATA GCTCAGCGCA GCATTTGCCC TCCCGTCTTC AGATCTAAAA

        Sp1    Sp2    Sp3    Sp4    Sp5
Sp1     0
Sp2     9      0
Sp3     8      11     0
Sp4     12     15     10     0
Sp5     15     18     13     5      0
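The distance matrix for this example can be computed directly from the aligned sequences. This sketch uses the raw count of differing sites, which reproduces the slide's numbers; it applies no correction for multiple substitutions at the same site (as models such as Jukes-Cantor would), and the function names are hypothetical.

```python
from itertools import combinations

# Aligned 50-bp sequences from the example.
SEQUENCES = {
    "Sp1": "GTGCTGCACGGCTCAGTATAGCATTTACCCTTCCATCTTCAGATCCTGAA",
    "Sp2": "ACGCTGCACGGCTCAGTGCGGTGCTTACCCTCCCATCTTCAGATCCTGAA",
    "Sp3": "GTGCTGCACGGCTCGGCGCAGCATTTACCCTCCCATCTTCAGATCCTATC",
    "Sp4": "GTATCACACGACTCAGCGCAGCATTTGCCCTCCCGTCTTCAGATCCTAAA",
    "Sp5": "GTATCACATAGCTCAGCGCAGCATTTGCCCTCCCGTCTTCAGATCTAAAA",
}


def differences(a, b):
    """Number of aligned sites at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def distance_matrix(seqs):
    """Distance for every pair of taxa, keyed by (taxon_i, taxon_j)."""
    return {(i, j): differences(seqs[i], seqs[j]) for i, j in combinations(seqs, 2)}


for (i, j), d in distance_matrix(SEQUENCES).items():
    print(f"d({i},{j}) = {d}")
```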


Distance-based approaches

Distances are then ‘clustered’ to estimate a phylogenetic tree.

Example: UPGMA algorithm (Unweighted Pair-Group Method using Arithmetic means)

The smallest distance is identified and those two taxa are joined; the distance from the new group to every other taxon is the average of the distances to its members, and the matrix is recalculated. This iteration is repeated.

Step 1: the smallest distance is Sp4-Sp5 = 5, so Sp4 and Sp5 are joined, each on a branch of length 5/2 = 2.5.


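Distance-based approaches

Step 2: recalculated matrix (4-5 = the Sp4-Sp5 group; e.g., Sp1 to 4-5 = (12 + 15)/2 = 13.5)

        Sp1    Sp2    Sp3    4-5
Sp1     0
Sp2     9      0
Sp3     8      11     0
4-5     13.5   16.5   11.5   0

The smallest distance is now Sp1-Sp3 = 8, so Sp1 and Sp3 are joined, each on a branch of length 4.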


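Distance-based approaches

Step 3: recalculated matrix (1-3 = the Sp1-Sp3 group)

        Sp2    1-3    4-5
Sp2     0
1-3     10     0
4-5     16.5   12.5   0

The smallest distance is now Sp2 to 1-3 = 10, so Sp2 joins the 1-3 group at a height of 10/2 = 5 (Sp2 branch = 5; Sp1 and Sp3 branches = 4 each; internal branch leading to the 1-3 group = 1).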


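Distance-based approaches

Step 4: final matrix

          1-2-3    4-5
1-2-3     0
4-5       13.83    0

UPGMA averages all six pairwise distances between the two groups: (12 + 15 + 15 + 18 + 10 + 13)/6 = 83/6 ≈ 13.83, so the root is placed at a height of ≈ 6.92. The final tree joins ((Sp1:4, Sp3:4):1, Sp2:5) with (Sp4:2.5, Sp5:2.5).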
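The UPGMA steps above can be reproduced with a short sketch. It is a minimal illustration, not a production implementation: ties are broken arbitrarily, node heights are half the joining distance (UPGMA assumes a constant rate of evolution, i.e., a molecular clock), and cluster labels such as "Sp1-Sp3" are hypothetical names chosen for readability.

```python
from itertools import combinations


def upgma(dist):
    """UPGMA clustering.

    dist maps frozenset({a, b}) -> distance for every pair of taxa. Returns the
    final tree (nested tuples) and the merges as (cluster_a, cluster_b, height).
    """
    taxa = sorted({t for pair in dist for t in pair})
    clusters = {t: (t, 1) for t in taxa}  # label -> (subtree, number of taxa)
    d = dict(dist)
    merges = []
    while len(clusters) > 1:
        # 1. Find the closest pair of clusters.
        closest = min((frozenset(p) for p in combinations(clusters, 2)), key=lambda p: d[p])
        a, b = sorted(closest)
        (tree_a, n_a), (tree_b, n_b) = clusters.pop(a), clusters.pop(b)
        merges.append((a, b, d[closest] / 2))
        # 2. Distance to the new cluster: size-weighted average of its members' distances,
        #    i.e. the mean of all taxon-to-taxon distances between the two clusters.
        new = f"{a}-{b}"
        for other in clusters:
            d[frozenset((new, other))] = (
                n_a * d[frozenset((a, other))] + n_b * d[frozenset((b, other))]
            ) / (n_a + n_b)
        clusters[new] = ((tree_a, tree_b), n_a + n_b)
    [(tree, _size)] = clusters.values()
    return tree, merges


# Distance matrix from the example (number of differing sites).
PAIRS = {
    ("Sp1", "Sp2"): 9, ("Sp1", "Sp3"): 8, ("Sp1", "Sp4"): 12, ("Sp1", "Sp5"): 15,
    ("Sp2", "Sp3"): 11, ("Sp2", "Sp4"): 15, ("Sp2", "Sp5"): 18,
    ("Sp3", "Sp4"): 10, ("Sp3", "Sp5"): 13, ("Sp4", "Sp5"): 5,
}
tree, merges = upgma({frozenset(pair): value for pair, value in PAIRS.items()})
for a, b, height in merges:
    print(f"join {a} + {b} at height {height:.2f}")
```

With this matrix the sketch joins Sp4 + Sp5 at 2.5, Sp1 + Sp3 at 4, Sp2 with the Sp1-Sp3 group at 5, and places the root at 83/12 ≈ 6.92.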



Maximum Likelihood Methods

  • Best suited for DNA and protein sequence data

  • Requires a model of evolution

  • Each nucleotide/amino acid substitution has an associated likelihood

  • A function is derived to represent the likelihood of the data given the tree, branch-lengths and additional parameters

  • The function is maximized (equivalently, the negative log-likelihood is minimized)


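Based on a model of nucleotide substitution (transitions and transversions), each branch contributes the probability of the change along it:

       A          C          G          T
A      1          10^-6      2x10^-6    10^-6
C      10^-6      1          10^-6      2x10^-6
G      2x10^-6    10^-6      1          10^-6
T      10^-6      2x10^-6    10^-6      1

Alignment:

1: ACGCGTTGGG

2: ACGCGTTGGG

3: ACGCAATGAA

4: ACACAGGGAA

[Figure: Tree 1 for one site, with tip states T, T, A, G. The root is assigned T (prior L0 = 0.25) and the two internal nodes T and G; each branch contributes L1-L6 from the matrix, e.g., 10^-6 for the T->G branch and 2 x 10^-6 for the G->A branch.]

L(Tree 1) = L0 x L1 x L2 x L3 x L4 x L5 x L6 = 5 x 10^-13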


Consider every possible base assignment to each internal node and calculate the likelihood.

[Figure: Tree 1 and Tree 2, each drawn with tip states T, T, A, G and one assignment of bases to the root (prior L0 = 0.25) and internal nodes, with branch likelihoods L1-L6.]

Repeat for each node assignment and each site in the alignment.

The likelihood of a site on a given unrooted tree is the sum over all node assignments; the likelihood of the whole alignment is the product of the site likelihoods.

Repeat for each unrooted tree and choose the tree with the highest likelihood.

L(Tree 1) = L0 x L1 x L2 x L3 x L4 x L5 x L6 = 5 x 10^-13 (for the node assignment shown)

L(Tree 2) = L0 x L1 x L2 x L3 x L4 x L5 x L6 = 1 x 10^-18 (for the node assignment shown)
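The slide's calculation can be reproduced for a single alignment column. This sketch assumes (a reading of the figure, not stated explicitly on the slide) that Tree 1 groups taxa 1 and 2 against 3 and 4, and Tree 2 groups 1 with 3 and 2 with 4. It uses the slide's illustrative substitution probabilities, with the no-change probability rounded to 1 as on the slide, and brute-forces all 4^3 base assignments to the root and two internal nodes. Real maximum-likelihood software instead uses Felsenstein's pruning algorithm, substitution models with estimated branch lengths, and combines log-likelihoods across all sites.

```python
from itertools import product

BASES = "ACGT"
PURINES, PYRIMIDINES = set("AG"), set("CT")


def p_change(a, b):
    """Per-branch substitution probability, using the slide's illustrative values.

    No change is rounded to 1; a transition (A<->G, C<->T) is 2e-6;
    a transversion is 1e-6.
    """
    if a == b:
        return 1.0
    is_transition = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return 2e-6 if is_transition else 1e-6


def assignment_likelihood(root, left, right, tips):
    """L0 x L1 x ... x L6 for a rooted tree ((t1, t2), (t3, t4)) with bases
    `left` and `right` assigned to the two internal nodes."""
    t1, t2, t3, t4 = tips
    return (0.25  # L0: prior probability of the root base
            * p_change(root, left) * p_change(root, right)
            * p_change(left, t1) * p_change(left, t2)
            * p_change(right, t3) * p_change(right, t4))


def site_likelihood(tips):
    """Sum over all 4^3 base assignments to the root and the two internal nodes."""
    return sum(assignment_likelihood(r, l, rt, tips) for r, l, rt in product(BASES, repeat=3))


# One alignment column in which taxa 1-4 carry T, T, A, G.
tree1 = ("T", "T", "A", "G")  # assumed topology ((1, 2), (3, 4))
tree2 = ("T", "A", "T", "G")  # assumed topology ((1, 3), (2, 4)): same column, tips regrouped
print(assignment_likelihood("T", "T", "G", tree1))  # approximately 5e-13, the slide's value
print(site_likelihood(tree1), site_likelihood(tree2))
```

Summed over all node assignments, Tree 1 has the higher likelihood at this site, because its best assignments need only two changes, one of them a (more probable) transition.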


The Teosinte-Maize Story

[Figure: teosinte to maize, 6,000 - 10,000 years ago]

  • The practical side of sequence diversity

  • PLANT BREEDING!

  • Sequence Diversity in Teosinte

  • Sequence Diversity in Maize

  • Selection During Domestication and Improvement



Sequence Diversity and Plant Breeding

  • Genetic diversity within a crop species is the raw material for current plant breeding

  • Genetic diversity is the insurance policy to enable plant breeders to adapt crops to changing environments


The Problem

[Figure: corn yield (bushels per acre) by year, across the open-pollinated variety, double-cross hybrid, and single-cross hybrid eras.]

  • To what degree is limited genetic diversity inhibiting genetic improvement in corn?



Two Views of the Problem

  • “Most of the corn germplasm in use in the USA today is derived from mixtures of only two major races [out of ~ 300 races total] (Wallace and Brown, 1956). The simplest means of correcting this situation and of increasing the genetic diversity of this important crop is to introduce unrelated sources of germplasm” (Brown and Goodman, 1977, Races of Corn, in Corn and Corn Improvement)

  • [From a project comparing sequence diversity in 21 genes of nine U.S. inbred lines with 16 diverse maize landraces] “We found that our sample of [U.S.] inbreds contained a level of [SNP] diversity that was 77% the level of diversity in our landrace sample.” (Tenaillon et al., 2001, PNAS, 98:9161-9166)



Sequence Diversity in Maize

  • How has selection shaped sequence diversity in maize?

    • Survey SNPs from ~1800 genes in diverse maize and teosinte germplasm

    • Screen 4000 candidate genes for evidence of selection

  • Practical Goal: identify genes exhibiting selection

    • Domestication, agronomic improvement, and local adaptation


[Figure: allele frequencies in teosinte, then (after domestication) in landraces, then (after plant breeding) in modern inbreds, for an unselected gene, a domestication gene, and an improvement gene.]



Can we develop genomic screens to identify genes that have undergone selection?

1. Invariant SSR approach

2. Direct Sequencing Approach

What proportion of genomic sequences that have low allelic diversity among inbreds result from selection for domestication?

Contrast sequence diversity among teosintes, landraces, and inbreds
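One common way to contrast sequence diversity among groups is nucleotide diversity (π): the average number of differences per site between pairs of sequences. This sketch uses made-up toy sequences and is not the method or data of the studies cited here; a ratio of inbred to teosinte (or landrace) π is one simple way to express a comparison such as the 77% figure quoted earlier.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Nucleotide diversity (pi): mean number of pairwise differences per site.

    Assumes aligned, equal-length sequences without gaps or missing data.
    """
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    total = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs)
    return total / (len(pairs) * length)


# Hypothetical toy samples for one gene, for illustration only.
teosinte = ["ACGTACGT", "ACGAACTT", "TCGTACGA", "ACCTAGGT"]
inbreds = ["ACGTACGT", "ACGTACGT", "ACGAACGT", "ACGTACGT"]
ratio = nucleotide_diversity(inbreds) / nucleotide_diversity(teosinte)
print(f"inbred diversity is {ratio:.0%} of the teosinte level")
```

A gene that is far less diverse in inbreds (or landraces) than in teosinte is a candidate for selection, although a formal test must also account for the genome-wide loss of diversity caused by the domestication bottleneck.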


Invariant SSR Primers

Screening SSR primers against 12 inbred lines:

  • 1,772 total SSRs

    • 1,053 were polymorphic (Class I)

    • 719 were invariant (Class II)


Invariant SSR Screening

[Figure: a non-Class II and a Class II SSR scored in teosinte (6), landrace (5), and US inbred samples.]

  • 470 invariant SSR primer sets

    • 321 monomorphic throughout

    • 60 polymorphic in both exotics and teosintes

    • 14 polymorphic only in exotics

    • 75 polymorphic only in teosintes (Class II-E)

Vigouroux et al. 2002. PNAS 99: 9650



Analysis of Class II-E SSRs

  • 31 Class I SSRs and 44 Class II-E SSRs

  • 44 teosinte and 45 landrace accessions

  • Tested for selection (loss of diversity)

  • 0 Class I SSRs showed evidence of selection

  • 15 Class II-E SSRs showed evidence of selection

  • Extrapolated back to the 1772 total SSRs:

  • “1.4% genes have been selected”



Direct Sequencing Approach

  • Purpose: to develop a SNP resource for the maize community

  • Result: a LOT of data!!!


Distribution of SNP Haplotypes (patterns)

[Figure: distribution of haplotype number per gene, from conserved to diverse genes.]

470 maize unigenes in 14 maize lines

Mean haplotype number = 4.46

More than 80% of unigenes have 2 to 7 haplotypes

For each gene, a few haplotypes account for much of the diversity
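Counting haplotypes amounts to counting distinct sequences (or SNP patterns) among the lines surveyed for each gene. This sketch uses hypothetical SNP strings, not data from the survey, and assumes fully homozygous lines with no missing data; the mean over genes is one plausible way a figure like "mean haplotype number" could be summarized.

```python
from collections import Counter


def haplotype_counts(alleles_by_line):
    """Count distinct haplotypes (identical SNP patterns) for one gene.

    alleles_by_line maps a line name to its sequence or SNP string for that gene.
    """
    return Counter(alleles_by_line.values())


def mean_haplotype_number(genes):
    """Average number of distinct haplotypes per gene."""
    return sum(len(haplotype_counts(g)) for g in genes) / len(genes)


# Hypothetical SNP strings for one gene in six lines.
gene = {"line1": "ACGT", "line2": "ACGT", "line3": "ACGA",
        "line4": "TCGA", "line5": "ACGT", "line6": "ACGA"}
counts = haplotype_counts(gene)
print(len(counts), "haplotypes;", counts.most_common())
```

Here three haplotypes occur, and the most common one is carried by half the lines, the kind of pattern summarized by "a few haplotypes account for much of the diversity".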


Are genes with low inbred diversity enriched for domestication and improvement candidates? (Masanori Yamasaki, post-doc in McMullen Lab)

[Figure: Tripsacum, teosinte, landraces, inbreds]

36 genes with no diversity among a 14-inbred set

Sequenced the same region in 16 landraces, 16 teosintes, and a Tripsacum dactyloides sample.

Tested for selection in inbreds, landraces, and teosintes compared to four neutral genes.



Selection Tests for 33 (of 36) Genes

5 genes were significant in both the inbreds and the landraces (evidence for domestication genes).

7 genes were significant in the inbreds but not the landraces (evidence for improvement genes).

1 additional gene was classified as either domestication or improvement depending on the test.

13 out of 33 genes = 39% !!

Yamasaki et al. submitted



Selection on a Genomic Scale

  • Sequenced 774 maize unigenes in 14 maize inbreds and 16 teosinte accessions

  • Tested for selection using coalescent simulations

  • Result: 2-4% had experienced artificial selection

  • Assume 59,000 genes in maize

  • 59,000 x 2% ≈ 1,200 selected genes

Wright et al. 2005 Science 308: 1310


Where are we going with this?

  • Before genomics, 11 genes had been identified as selected by population genetic approaches.

  • By sequencing 1000 genes, we have ~50 novel candidates.

  • We need:

    • 1. to completely sequence the maize genome to identify ALL genes.

    • 2. to resequence all remaining genes in multiple maize inbreds and teosinte accessions.

(~1,140 more selected genes to find!)



Signatures of Selection

  • If selected genes were important in past improvement, continued manipulation might contribute to future gain.

  • If selected genes suffered a loss of diversity because of selection, they are prime candidates for introgressive breeding from wild relatives.

  • Hypothesis: manipulation of the expression of domestication and improvement genes will alter key agronomic traits


Selection for Amino Acid Content?

[Figure: amino acid content of teosinte, landraces, and maize: total amino acids (% of kernel weight) and individual amino acids such as lysine, methionine, and tryptophan (% of total amino acids).]

  • Four genes that show evidence of selection are involved in amino acid biosynthesis



Selection for Amino Acid Content?

  • Are there more genes in amino acid pathways that have been selected?

  • Sequenced 16 genes in 28 maize inbreds, 16 teosintes, and 2 Tripsacum samples.

  • Result: we found 4 genes that may have been selected during domestication/improvement.


The Ultimate Selection Project

[Figure: teosinte; B73 (inbred line); B73 with a knockout in a selected gene; B73 with the teosinte allele of a selected gene.]



Molecular Diversity:

[Figure: aligned sequences of one region from inbreds B73, CO159, GT119, Tx501, Tx303, Mo17, Mp708, IHO, and T218, marking a conserved region, SNPs, an insertion, and a deletion.]

SNP: Single nucleotide polymorphism

InDel: Insertion/deletion

SNPs and InDels are used as markers for genetic analysis.

