A comparison of algorithms for identification of specimens using DNA barcodes: examples from gymnosperms

Damon P. Little and Dennis Wm. Stevenson

Cullman Program for Molecular Systematic Studies

The New York Botanical Garden, Bronx, New York


Why is DNA barcoding useful?

  • (1) Non–specialists can identify specimens (e.g., customs inspectors, ethnobotanists).

  • (2) Morphologically deficient or incomplete specimens can be identified (e.g., powders).


Application to conservation

  • Cycadopsida:

  • all 305 species are protected by CITES (Convention on International Trade in Endangered Species)

  • 5 genera are appendix I

  • 6 genera are appendix II*

Cycas machonie


nrITS 2

((GTGCTCGGGC and TCTCGCACTG) and not CGCCTCCCCT)

CGCCTCCCCT

Lepidozamia hopei

Encephalartos ferox

CITES appendix II

CITES appendix I
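The two motif expressions on this slide read as boolean presence/absence rules over 10 bp strings. A minimal sketch of how such rules could be evaluated is below. The function names are hypothetical, and the transcript does not preserve which rule belongs to which species or CITES appendix, so the rules are left unlabelled.

```python
# Sketch only: evaluates the two motif expressions shown on the slide.
# Function names are hypothetical; the rule-to-species pairing is not
# recoverable from the transcript, so none is asserted here.

def contains(seq: str, motif: str) -> bool:
    """True if the motif occurs anywhere in the sequence (case-insensitive)."""
    return motif.upper() in seq.upper()


def rule_a(seq: str) -> bool:
    """((GTGCTCGGGC and TCTCGCACTG) and not CGCCTCCCCT)"""
    return (
        contains(seq, "GTGCTCGGGC") and contains(seq, "TCTCGCACTG")
    ) and not contains(seq, "CGCCTCCCCT")


def rule_b(seq: str) -> bool:
    """CGCCTCCCCT"""
    return contains(seq, "CGCCTCCCCT")


if __name__ == "__main__":
    # Synthetic sequence, not real nrITS 2 data.
    toy = "AAGTGCTCGGGCTTTCTCGCACTGAA"
    print(rule_a(toy), rule_b(toy))
```

Any sequence containing CGCCTCCCCT satisfies the second rule and fails the first, so the two rules are mutually exclusive.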


Selection of the barcode locus


Loci used for barcoding

  • nuclear:

  • rDNA: 26S, 18S, ITS 1, ITS 2

  • mitochondrial:

  • COI

  • chloroplast:

  • trnH-psbA, rbcL


Consortium for the Barcode of Life (CBOL)

  • cpDNA: matK, rpoC1, rpoB, YCF5, accD, ndhJ

  • Edinburgh (UK) => Podocarpus, Araucaria, Asterella, Anastrophyllum

  • Instituto de Biologia UNAM (Mexico) => Agave

  • Kew (UK) => Conostylis, Pinus, Equisetum, Dactylorhiza

  • National Biodiversity Institute (South Africa) => Encephalartos, Mimetes

  • Natural History Museum (Denmark) => Hordeum, Scalesia, Crocus

  • Natural History Museum (UK) => Tortella, Ptychomniaceae, Asplenium

  • New York Botanical Garden (USA) => Elaphoglossum, Cupressus, Labordia

  • Universidad de los Andes (Colombia) => Lauraceae

  • University of Cape Town (South Africa) => Anastrophyllum, Bryum

  • Universidade Estadual de Feira de Santana (Brazil) => Laelia, Cattleya


Measuring precision and accuracy


Test data sets

  • gymnosperm nuclear ribosomal internal transcribed spacer 2 (nrITS 2)

  • 1,037 sequences

  • 413 species

  • 71 genera

  • gymnosperm plastid encoded maturase K (matK)

  • 522 sequences

  • 334 species

  • 75 genera


Pairwise divergence


Hierarchical clustering


Alignment


Hierarchical clustering

  • reference databases:

  • aligned with MUSCLE 3.52

  • query sequence:

  • aligned to the reference database using MUSCLE (“-profile” option)

  • parsimony (TNT 1.0):

  • (1) 200 iteration ratchet holding 1 tree

  • (2) SPR holding 1 tree

  • neighbor joining (PHYLIP 3.63):

  • Jukes–Cantor distance (returns 1 tree)

  • identification scored using “Least Inclusive Clade”
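For readers unfamiliar with the Jukes–Cantor distance used in the neighbor joining step, here is a minimal sketch of the textbook formula, d = -(3/4) ln(1 - 4p/3), where p is the proportion of differing sites. It is not PHYLIP's implementation. Skipping sites with gaps or ambiguity codes is an assumption made here for simplicity.

```python
import math


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Assumption: sites where either sequence has a gap or ambiguity
    code are skipped rather than scored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            if x != y:
                diffs += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def jukes_cantor(a: str, b: str) -> float:
    """Textbook Jukes-Cantor distance: -(3/4) * ln(1 - 4p/3)."""
    p = p_distance(a, b)
    if p >= 0.75:
        # The correction is undefined at or beyond saturation.
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    print(jukes_cantor("ACGTACGT", "ACGTACGA"))
```

The corrected distance is always at least as large as the raw proportion of differences, because it accounts for multiple substitutions at the same site.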


Will and Rubinoff (2004)...

  • identification ambiguity due to tree shape

  • Fitch (1971) optimization of group membership variables


Least Inclusive Clade
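The transcript names the "Least Inclusive Clade" scoring rule without defining it. The sketch below shows one plausible reading, which is an assumption: find the smallest clade that contains the query plus at least one reference sequence, and report the species in that clade. The nested-tuple tree encoding, the function names, and the `species_of` mapping are all hypothetical.

```python
# Sketch of one plausible reading of "Least Inclusive Clade" scoring.
# Trees are nested tuples of leaf names (hypothetical encoding).

def leaves(tree):
    """All leaf names under a node."""
    if isinstance(tree, str):
        return [tree]
    out = []
    for child in tree:
        out.extend(leaves(child))
    return out


def least_inclusive_clade(tree, query):
    """Reference leaves in the smallest clade that contains the query."""
    if isinstance(tree, str):
        return None
    # If the query hangs directly off this node, this node is the clade.
    for child in tree:
        if child == query:
            return [leaf for leaf in leaves(tree) if leaf != query]
    # Otherwise descend into the child that contains the query.
    for child in tree:
        if not isinstance(child, str) and query in leaves(child):
            return least_inclusive_clade(child, query)
    return None


def identify(tree, query, species_of):
    """Species names found in the least inclusive clade around the query."""
    members = least_inclusive_clade(tree, query)
    if members is None:
        return set()
    return {species_of[m] for m in members}


if __name__ == "__main__":
    species_of = {"a1": "sp. A", "a2": "sp. A", "b1": "sp. B"}
    tree = ((("query", "a1"), "a2"), "b1")
    print(identify(tree, "query", species_of))
```

Under this reading, a result with more than one species is an ambiguous identification.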


Clustering with nrITS 2 and matK


Clustering time (s)

N = 29; 3.06 GHz Intel Pentium 4; 1 GB of RAM; Ubuntu Linux 5.04 (Hoary Hedgehog)


Similarity methods

  • BLASTn (version 2.2.10)

  • BLAT (version 32)

  • megaBLAST (version 2.2.10)

  • default parameters

  • best match(es) taken as ID
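The "best match(es) taken as ID" rule can be sketched as below. The hit list is a hypothetical, already-parsed list of (reference name, score) pairs. The slides do not say which score was used for ranking, so the score here is a generic "higher is better" number.

```python
# Sketch: pick every reference tied for the best score.
# The (name, score) input format is an assumption for illustration.

def best_matches(hits):
    """Return all reference names tied for the highest score, sorted."""
    hits = list(hits)
    if not hits:
        return []
    top = max(score for _, score in hits)
    return sorted(name for name, score in hits if score == top)


if __name__ == "__main__":
    toy_hits = [("ref_1", 410.0), ("ref_2", 410.0), ("ref_3", 395.5)]
    print(best_matches(toy_hits))
```

Returning every tied reference keeps ambiguous identifications visible rather than picking one arbitrarily.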


Similarity methods with nrITS 2 and matK


Similarity time (s)

N = 29; 3.06 GHz Intel Pentium 4; 1 GB of RAM; Ubuntu Linux 5.04 (Hoary Hedgehog)


Combination methods (cf. BOLD–ID):

  • (1) get the top 100 BLAST hits

  • (2) align with MUSCLE

  • (a) 200 iteration ratchet holding 1 tree

  • (b) SPR holding 1 tree

  • (c) neighbor joining with Jukes–Cantor distances


Combination methods with nrITS 2 and matK


Combination time (s)

N = 29; 3.06 GHz Intel Pentium 4; 1 GB of RAM; Ubuntu Linux 5.04 (Hoary Hedgehog)


Diagnostic methods


DNA–BAR (DasGupta et al. 2005):

each sequence and its reverse complement (separated by 50 “N” symbols)

degenbar

presence/absence matrix of “distinguishers” up to 50 bp long
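The input preparation described above (each sequence joined to its reverse complement by 50 "N" symbols) can be sketched as follows. The function names are hypothetical and only unambiguous bases plus N are handled. This illustrates the preprocessing step only, not the degenbar program itself.

```python
# Sketch of the input preparation step: sequence + 50 N's + reverse complement.
# Only A, C, G, T and N are translated; other IUPAC codes are left unchanged.

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def with_reverse_complement(seq: str, spacer_len: int = 50) -> str:
    """Join a sequence and its reverse complement with a run of N symbols."""
    return seq + "N" * spacer_len + reverse_complement(seq)


if __name__ == "__main__":
    print(with_reverse_complement("AACG", spacer_len=5))
```

The spacer presumably keeps substrings from spanning the junction between the two strands, though the slides do not state the reason.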



matrix of distinguishers

query + PERL script

C. arizonica 1 matches = 582

C. arizonica 2 matches = 582

C. lusitanica 1 matches = 582

ID = the reference sequence(s) with the greatest number of matching presence/absence scores
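The scoring rule above can be sketched as a comparison of presence/absence profiles. This is a minimal illustration, not the authors' PERL script: the distinguishers are a hypothetical list of short strings, reverse complements are ignored, and the sequences are synthetic.

```python
# Sketch: score a query against references by counting distinguishers whose
# presence/absence state agrees. All inputs below are made-up toy data.

def profile(seq, distinguishers):
    """Presence (True) or absence (False) of each distinguisher in seq."""
    return tuple(d in seq for d in distinguishers)


def identify(query, references, distinguishers):
    """Return (best reference names, score per reference).

    The score is the number of distinguishers with the same
    presence/absence state in the query and in the reference.
    """
    if not references:
        return [], {}
    q = profile(query, distinguishers)
    scores = {}
    for name, seq in references.items():
        r = profile(seq, distinguishers)
        scores[name] = sum(a == b for a, b in zip(q, r))
    best = max(scores.values())
    return sorted(n for n, s in scores.items() if s == best), scores


if __name__ == "__main__":
    refs = {"ref1": "AACGGT", "ref2": "TTTGGT"}
    print(identify("AACGGTA", refs, ["AAC", "GGT", "TTT"]))
```

Ties are returned together, which matches the "reference sequence(s)" wording: in the slide's own example, three references score 582 each.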


DNA–BAR... distinguisher matrix


Diagnostic methods: DOME ID

  • reference database (via PERL and MySQL):

  • (1) all sequence strings of 10 nucleotides offset by 5 nucleotides were extracted from the reference sequences

  • (2) each string was classified as diagnostic (unique to a particular species) or non–diagnostic

  • (3) diagnostic strings were inserted into the diagnostic barcode database

[figure: example 10-nucleotide strings extracted from the reference sequences]



diagnostic barcode database

query + MySQL + PERL script

C. arizonica matches = 43

ID = the reference sequence(s) with the greatest number of matching presence/absence scores
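The DOME ID steps (extract 10-nucleotide strings at offsets of 5, keep those unique to one species, then count query matches) can be sketched as follows. An in-memory dictionary stands in for the MySQL database, the function names are hypothetical, and scanning the query at every offset is an assumption. The slides do not say how the query is scanned.

```python
from collections import defaultdict

# Sketch of the DOME ID idea with a dict in place of MySQL.
# Query scanning at every offset is an assumption, not stated in the slides.


def windows(seq, size=10, step=5):
    """Strings of `size` nucleotides taken every `step` nucleotides."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, step)]


def build_diagnostic_db(references):
    """Map each diagnostic string to the single species that contains it.

    `references` is an iterable of (species, sequence) pairs. A string is
    diagnostic if it was extracted from sequences of exactly one species.
    """
    owners = defaultdict(set)
    for species, seq in references:
        for w in windows(seq):
            owners[w].add(species)
    return {w: next(iter(s)) for w, s in owners.items() if len(s) == 1}


def identify(query, db, size=10):
    """Count diagnostic strings found in the query, per species."""
    counts = defaultdict(int)
    for i in range(len(query) - size + 1):
        species = db.get(query[i:i + size])
        if species is not None:
            counts[species] += 1
    if not counts:
        return [], {}
    best = max(counts.values())
    return sorted(s for s, c in counts.items() if c == best), dict(counts)


if __name__ == "__main__":
    refs = [("sp1", "AAAAACCCCCGGGGG"), ("sp2", "AAAAACCCCCTTTTT")]
    db = build_diagnostic_db(refs)
    print(identify("TCCCCCGGGGGA", db))
```

In this toy example the string AAAAACCCCC occurs in both species, so it is non-diagnostic and never enters the database.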


Diagnostic methods: ATIM

PERL script

presence/absence matrix of all possible 10 bp combinations [1,048,576 motifs]
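The figure of 1,048,576 motifs is 4^10, the number of distinct 10 bp strings over A, C, G and T. The sketch below shows one way to index those motifs and record which are present in a sequence. It stores the result as a sparse set of column indices rather than a full matrix row. Scanning every offset and skipping windows with ambiguous bases are assumptions made here.

```python
# Sketch: map each 10 bp motif to a column index in [0, 4**10) and record
# which motifs occur in a sequence. Not the authors' PERL script.

K = 10
_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def motif_index(motif: str) -> int:
    """Base-4 encoding of a motif made of A, C, G and T."""
    idx = 0
    for base in motif:
        idx = idx * 4 + _CODE[base]
    return idx


def present_motifs(seq: str, k: int = K) -> set:
    """Column indices of all k-mers present in seq.

    Assumption: windows containing anything other than A, C, G or T
    are skipped.
    """
    present = set()
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if all(base in _CODE for base in window):
            present.add(motif_index(window))
    return present


if __name__ == "__main__":
    print(4 ** K)
    print(sorted(present_motifs("ACGTACGTACG")))
```

A sequence of length n contains at most n - 9 of the 1,048,576 motifs, so each row of the full matrix is almost entirely absences.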



1,048,576 character presence/absence matrix

TNT (parsimony ratchet)

reference tree (strict consensus)


query + 1,048,576 character presence/absence matrix + reference tree (positive constraint)

TNT (TBR hold 20)

identification scored using “Least Inclusive Clade”


Diagnostic methods with nrITS 2 and matK


Diagnostic time (s)

N = 29; 3.06 GHz Intel Pentium 4; 1 GB of RAM; Ubuntu Linux 5.04 (Hoary Hedgehog)


DAWG I “training” dataset


Conclusions

  • all methods are relatively precise

  • => expect accuracy to approximate precision

  • observed accuracy of species level identification is lower

  • => failure of the algorithms to correspond to species delimitations (haplotypes are shared among species, or the haplotypes of one species are more similar to those of other species)

  • => for accurate identification, the reference database must contain virtually all haplotypes

  • none of the methods performed particularly well

  • => computer time

  • => BLAST (BLAT and megaBLAST too)

  • => DNA–BAR


Acknowledgments

brilliant insights &tc:

K. Cameron

C. Chaboo

T. Dikow

C. Martin

R. Meier

M. Mundry

money:

Cullman Program for Molecular Systematic Studies

DIMACS/NSF
