1 / 1

UniProt Non-redundant Reference Cluster (UniRef) Databases

www.uniprot.org. UniProt Non-redundant Reference Cluster (UniRef) Databases.

bevis
Download Presentation

UniProt Non-redundant Reference Cluster (UniRef) Databases

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. www.uniprot.org UniProt Non-redundant Reference Cluster (UniRef) Databases UniProt Reference Clusters (UniRef), UniRef100, UniRef90 and UniRef50 are automatically generated from UniProt Knowledgebase and selected UniParc records. The databases provide complete coverage of sequence space while hiding redundant sequences from view. The non-redundancy allows faster sequence similarity searches by using UniRef90 and UniRef50 UniProtKB Sequences UniProtKB Isoform Sequences Selected UniParc Sequences from ENSEMBL, RefSeq and PDB databases String Comparison: Identifying sub-fragments and identical sequences UniRef100 Identical sequences and sub-fragments with 11 or more residues are placed into a single record CD-HIT computation: Clustering UniRef100 representative sequences at 90% level UniRef90 40% size Reduction UniRef90 Members of related UniRef100s at 90% level form a UniRef90 cluster. The representative is selected based on the quality of the entry, name, organism and sequence length. Title and identifier are derived from the representative sequence. CD-HIT computation: Clustering UniRef90 representative sequences at 50% level UniRef50 Members of related UniRef90s at 50% level form a UniRef90 cluster. The representative is selected based on the quality of the entry, name, organism and sequence length. Title and identifier are derived from the representative sequence. UniRef50 65% size Reduction Generating data files for distribution XML file <?xml version="1.0" encoding="ISO-8859-1" ?> <UniRef90 xmlns="http://uniprot.org/uniref" <entry id="UniRef90_P00439" updated="2006-05-16"> <name>Phenylalanine-4-hydroxylase related cluster</name> <representativeMember> <dbReference type="UniProtKB ID" id="PH4H_HUMAN"> <property type="UniProtKB accession" value="P00439"/> <property type="UniProtKB accession" value="Q16717"/> <property type="UniProtKB accession" value="Q8TC14"/> <property type="UniRef100 ID" value="UniRef100_P00439"/> <property type="protein name" value="Phenylalanine-4-hydroxylase"/> <property type="source organism" value="Homo sapiens (Human)"/> <property type="NCBI taxonomy" value="9606"/> <property type="length" value="452"/> </dbReference> <sequence length="452" checksum="018F00EBBBDDCE2F"> MSTAVLENPGLGRKLSDFGQETSYIEDNCNQNGAISLIFSLKEEVGALAKVLRLFEENDV NLTHIESRPSRLKKDEYEFFTHLDKRSLPALTNIIKILRHDIGATVHELSRDKKKDTVPW FPRTIQELDRFANQILSYGAELDADHPGFKDPVYRARRKQFADIAYNYRHGQPIPRVEYM EEEKKTWGTVFKTLKSLYKTHACYEYNHIFPLLEKYCGFHEDNIPQLEDVSQFLQTCTGF RLRPVAGLLSSRDFLGGLAFRVFHCTQYIRHGSKPMYTPEPDICHELLGHVPLFSDRSFA QFSQEIGLASLGAPDEYIEKLATIYWFTVEFGLCKQGDSIKAYGAGLLSSFGELQYCLSE KPKLLPLELEKTAIQNYTVTEFQPLYYVAESFNDAKEKVRNFAATIPRPFSVRYDPYTQR IEVLDNTQQLKILADSINSEIGILCSALQKIK </sequence> </representativeMember> FASTA file UniRef Release >UniRef90_P00439 Phenylalanine-4-hydroxylase related cluster MSTAVLENPGLGRKLSDFGQETSYIEDNCNQNGAISLIFSLKEEVGALAKVLRLFEENDV NLTHIESRPSRLKKDEYEFFTHLDKRSLPALTNIIKILRHDIGATVHELSRDKKKDTVPW FPRTIQELDRFANQILSYGAELDADHPGFKDPVYRARRKQFADIAYNYRHGQPIPRVEYM EEEKKTWGTVFKTLKSLYKTHACYEYNHIFPLLEKYCGFHEDNIPQLEDVSQFLQTCTGF RLRPVAGLLSSRDFLGGLAFRVFHCTQYIRHGSKPMYTPEPDICHELLGHVPLFSDRSFA QFSQEIGLASLGAPDEYIEKLATIYWFTVEFGLCKQGDSIKAYGAGLLSSFGELQYCLSE KPKLLPLELEKTAIQNYTVTEFQPLYYVAESFNDAKEKVRNFAATIPRPFSVRYDPYTQR IEVLDNTQQLKILADSINSEIGILCSALQKIK UniRef Usages ●Speeding up similarity search ●Reducing bias in homology searches by providing more even sequence space ●Using the clusters for family classification ●Using the clusters to annotate EST and other sequence databases ●Using the clusters to check the consistency of UniProtKB annotations Swiss Institute of Bioinformatics (SIB) European Bioinformatics Institute (EMBL-EBI) Protein Information Resource (PIR) UniProt is mainly supported by the National Institutes of Health (NIH) grant 2 U01 HG02712-04. Additional support for the EBI's involvement in UniProt comes from the European Commission contract FELICS (021902) and from the NIH grant 5 P41 HG02273-06. UniProtKB/Swiss-Prot activities at the SIB are supported by the Swiss Federal Government through the Federal Office of Education and Science. PIR activities are also supported by the NIH grants for NIAID proteomic resource (HHSN266200400061C) and grid enablement (NCI-caBIG-ICR), and National Science Foundation grants for protein ontology (ITR-0205470) and BioTagger (IIS-0430743). Contact help@uniprot.org

More Related