Protein Classification: A comparison of function inference techniques. Why do we need automated classification? Sequencing a genome is only the first step. Between 35% and 50% of the proteins in sequenced genomes have no assigned function.
ProtoMap: Automatic Classification of Protein Sequences, a Hierarchy of Protein Families, and Local Maps of the Protein Space (1999)
Clustering is done iteratively.
Start with a threshold of E < 10^-100
Cluster and increase threshold by a factor of 10^5
Sublinear threshold prevents the collapse of sequence space
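The iterative scheme above can be sketched roughly as follows. This is a minimal illustration, not ProtoMap's actual procedure: it assumes the schedule runs from E < 10^-100 up to E < 10^0 in steps of 10^5, and it swaps in plain single-linkage merging (union-find) for ProtoMap's real merge criterion. The names `threshold_schedule` and `cluster_iteratively`, and the edge format `(i, j, log10 E-value)`, are hypothetical.

```python
def threshold_schedule(start_exp=-100, step=5, stop_exp=0):
    """Return E-value exponents -100, -95, ..., 0 (threshold grows by 10^5 per round).

    The stopping point of 10^0 is an assumption made for this sketch.
    """
    return list(range(start_exp, stop_exp + 1, step))


def cluster_iteratively(n_items, edges, exponents):
    """Merge items level by level as the E-value threshold is relaxed.

    edges: list of (i, j, log10_evalue) pairwise similarities.
    Returns {exponent: sorted list of clusters} for every threshold level.
    Single-linkage merging is a simplification, not ProtoMap's criterion.
    """
    parent = list(range(n_items))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    levels = {}
    for exp in exponents:
        for i, j, log_e in edges:
            if log_e < exp:  # E < 10^exp, compared in log space
                parent[find(i)] = find(j)
        groups = {}
        for item in range(n_items):
            groups.setdefault(find(item), []).append(item)
        levels[exp] = sorted(groups.values())
    return levels


if __name__ == "__main__":
    toy_edges = [(0, 1, -120), (1, 2, -97), (3, 4, -50)]  # made-up scores
    result = cluster_iteratively(5, toy_edges, threshold_schedule())
    for exp in (-100, -95, -45):
        print(f"E < 10^{exp}: {result[exp]}")
```

Comparing in log space keeps the thresholds exact integers and avoids handling very small floating-point values directly. Because clusters formed at a strict threshold are carried into later rounds, the output records a hierarchy: each level can only merge clusters from the level before it.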
Functional Classification of Proteins for the Prediction of Cellular Function from a Protein-Protein Interaction Network (2003)
PRODISTIN: Present problems in clustering by biochemical function
ProtoMap: Can create undesired connections among unrelated groups
P(linking to node i)
γ = 2.5 for ProDom
γ = 1.7 for Pfam