GenBank
Huge amounts of data, easily accessible
Rate of growth of phylogenetic knowledge
Number of papers with “molecular” and “phylogeny” in Web of Science
Number of studies in TreeBASE
Why have a phylogeny database?
Archive data and trees (repeat old analyses with new tools)
Hemideina maori (weta)
18 TreeBASE names = 1 real name
3 TreeBASE names = 1 real name
Physeter catodon (Sperm Whale)
TreeBASE and GenBank list harp seals under two different names; only ITIS knows that they are the same thing
but “tree surfing” won’t find them
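The name-matching problem on these slides can be sketched as a synonym lookup. This is a minimal illustration only, not how TreeBASE, GenBank, or ITIS work internally. The table `SYNONYMS` and the helpers `accepted_name` and `same_taxon` are hypothetical, and the two harp seal names (and which one is treated as accepted) are an assumption about the names the slide refers to.

```python
# Hypothetical synonym table: every known name maps to one accepted name.
# ASSUMPTION: these are the two harp seal names the slide has in mind,
# and the choice of accepted name is illustrative.
SYNONYMS = {
    "Phoca groenlandica": "Pagophilus groenlandicus",
    "Pagophilus groenlandicus": "Pagophilus groenlandicus",
}


def accepted_name(name, synonyms=SYNONYMS):
    """Return the accepted name for a taxon name, or the cleaned name if unknown."""
    cleaned = " ".join(name.split())  # collapse stray whitespace
    return synonyms.get(cleaned, cleaned)


def same_taxon(a, b, synonyms=SYNONYMS):
    """True if two names resolve to the same accepted name."""
    return accepted_name(a, synonyms) == accepted_name(b, synonyms)
```

With such a table, a query under either name would resolve to the same taxon before the two databases are compared; without it, the two records look unrelated.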
Fig. 1. The “data availability matrix” for green plant protein sequences from GenBank (release 132). A set of 130304 sequences from 14667 species was clustered into 61117 groups of homologous proteins by a combination of BLAST and single-linkage clustering (using the program Blastclust from the NCBI Blast toolkit: http://www.ncbi.nlm.nih.gov/BLAST/). A column represents a protein or protein family; a row represents one of the species in the dataset; and a dot indicates the existence of a sequence for that species and protein. Species are sorted vertically by their number of sequences; the most-represented species (Arabidopsis thaliana) is at the top. Proteins are sorted horizontally by the number of taxa for which they have been sequenced; the most heavily sequenced gene (rbcL) is on the right. This figure shows the most heavily sampled corner of the data availability matrix; the remainder of the matrix is even sparser.
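The construction described in the caption can be sketched roughly as follows. This is not Blastclust: it assumes the pairwise BLAST hits are already available as a list of linked sequence pairs, and the names `single_linkage_clusters`, `availability_matrix`, and the toy data are all hypothetical.

```python
from collections import defaultdict


def single_linkage_clusters(items, links):
    """Group items so that any two joined by a chain of links share a cluster."""
    parent = {x: x for x in items}

    def find(x):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for x in items:
        groups[find(x)].add(x)
    return list(groups.values())


def availability_matrix(clusters, species_of):
    """Rows: species, most sequences first. Columns: clusters, most taxa last."""
    seq_counts = defaultdict(int)
    for species in species_of.values():
        seq_counts[species] += 1
    rows = sorted(seq_counts, key=lambda sp: -seq_counts[sp])
    cols = sorted(clusters, key=lambda c: len({species_of[s] for s in c}))
    matrix = [[any(species_of[s] == sp for s in c) for c in cols] for sp in rows]
    return rows, matrix


# Toy data (ASSUMPTION): five sequences from three species, three BLAST hits.
species_of = {"s1": "A", "s2": "A", "s3": "B", "s4": "A", "s5": "C"}
links = [("s1", "s3"), ("s3", "s5"), ("s2", "s4")]
clusters = single_linkage_clusters(species_of, links)
rows, matrix = availability_matrix(clusters, species_of)
```

Each `True` in `matrix` corresponds to a dot in the figure. Single linkage merges two groups as soon as any one pair of sequences is linked, which is why a single chain of hits is enough to pull s1, s3, and s5 together.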
Lienhard & Smithers (2002)
[courtesy of Kevin Johnson]
Problem: too few of the data sets and trees published in journals make it into databases
Text, data, trees locked up in paper and PDF