The Protein Identifier Cross-Reference (PICR) service
Overview
The problem…
What is PICR?
Access via web and web services
The problem…
Protein list A (DB Search vs. IPI)
- No direct comparison of the results is possible.
- The two groups used different protein databases to report their results.
We need the PICR tool to make a direct comparison.
Merging datasets to a common identifier space
Finding all aliases/synonyms for an identifier
(data integration – submissions!)
Mapping from secondary IDs to more recent primary IDs
Preparing data sets for specific tools
Querying in various primary databases
(data format requirements)
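The first use case, merging datasets into a common identifier space, can be sketched as follows. This is a minimal illustration under stated assumptions, not PICR's implementation: the table `TO_COMMON`, every accession in it, and the helper names `to_common_space` and `compare_lists` are hypothetical placeholders standing in for the mappings a cross-reference service would return.

```python
# Hypothetical mapping from source accessions to a common identifier space.
# In practice these pairs would come from a cross-reference service;
# the identifiers below are made-up placeholders, not real mappings.
TO_COMMON = {
    "LISTA_1": "COMMON_1",
    "LISTA_2": "COMMON_2",
    "LISTB_1": "COMMON_1",
    "LISTB_3": "COMMON_3",
}


def to_common_space(accessions, mapping):
    """Return (set of mapped identifiers, list of unmapped accessions)."""
    mapped, unmapped = set(), []
    for acc in accessions:
        if acc in mapping:
            mapped.add(mapping[acc])
        else:
            unmapped.append(acc)
    return mapped, unmapped


def compare_lists(list_a, list_b, mapping):
    """Compare two protein lists after mapping both to the common space."""
    a, unmapped_a = to_common_space(list_a, mapping)
    b, unmapped_b = to_common_space(list_b, mapping)
    return {
        "shared": a & b,
        "only_a": a - b,
        "only_b": b - a,
        "unmapped": unmapped_a + unmapped_b,
    }


if __name__ == "__main__":
    print(compare_lists(["LISTA_1", "LISTA_2"],
                        ["LISTB_1", "LISTB_3", "LISTB_9"],
                        TO_COMMON))
```

Keeping the unmapped accessions separate matters here: an identifier that cannot be mapped is not evidence that the protein is absent from the other list.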
The basic problem: the same protein sequence is referred to by multiple accession numbers assigned by multiple databases.
No universal identifier scheme
Redundant databases – multiple identifiers for the same sequence in the same database
Unstable identifiers (e.g. GI numbers)
Obsolete and deleted identifiers (hypothetical proteins)
Different production cycles for major databases
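One way to make the basic problem concrete is to key records on the sequence itself rather than on any accession, so that every accession pointing at the same sequence ends up in one group. The sketch below is an assumption made for illustration only: the SHA-256 digest, the record layout, and the names `sequence_key` and `group_by_sequence` are hypothetical and are not a description of how PICR stores or matches sequences.

```python
import hashlib
from collections import defaultdict


def sequence_key(sequence):
    """Digest of a normalized sequence (whitespace removed, upper-cased)."""
    normalized = "".join(sequence.split()).upper()
    return hashlib.sha256(normalized.encode("ascii")).hexdigest()


def group_by_sequence(records):
    """Group (database, accession, sequence) records that share a sequence."""
    groups = defaultdict(list)
    for database, accession, sequence in records:
        groups[sequence_key(sequence)].append((database, accession))
    return dict(groups)


if __name__ == "__main__":
    # Made-up records: two databases, two accessions, one identical sequence.
    records = [
        ("DB_ONE", "ACC_0001", "MKTAYIAKQR"),
        ("DB_TWO", "ACC_9999", "mktay iakqr"),
        ("DB_ONE", "ACC_0002", "MSTNPKPQRK"),
    ]
    for key, accessions in group_by_sequence(records).items():
        print(key[:12], accessions)
```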
Tools exist, but they are limited in important ways: in their database and species coverage, and in their usability and availability.
BLAST functionality for protein fragments
Limit search by taxonomy (pessimistic)
Submit accessions OR sequences (FASTA), with a 500-entry interactive limit (no batch limit)
Choose to return all mappings or only active ones
Select output format
Select one or many databases to map to in one request
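For input lists longer than the 500-entry interactive limit, one client-side option is to split the list into submissions that each fit under the limit. The sketch below shows only that splitting step. `chunk_entries` and `INTERACTIVE_LIMIT` are hypothetical names introduced here, not part of PICR, and no request is sent.

```python
INTERACTIVE_LIMIT = 500  # interactive entry limit quoted in the feature list


def chunk_entries(entries, size=INTERACTIVE_LIMIT):
    """Split a list of accessions or sequences into lists of at most `size`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [entries[i:i + size] for i in range(0, len(entries), size)]


if __name__ == "__main__":
    # Placeholder accessions, generated only to exercise the helper.
    accessions = [f"ACC{i:05d}" for i in range(1201)]
    for n, chunk in enumerate(chunk_entries(accessions), start=1):
        print(f"submission {n}: {len(chunk)} entries")
```

Since the batch route has no such limit, splitting is only relevant when the interactive form is used.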
Wein et al., NAR, 2012