
DATABASE

BIOCHEMISTRY


Presentation Transcript


  1. BIOLOGICAL DATABASES M. Prasad Naidu, MSc Medical Biochemistry, Ph.D.

  2. INTRODUCTION The database • must be maintained as a central shareable resource • should provide easy-to-use software to access the information (web-pages...) • has to be structurally organised and fully annotated to find the information needed • should not contain redundant information • should be error free

  3. Levels of protein sequence databases and structural organisation • Primary structure: sequence (e.g. AVILDRYFH), stored in primary databases • Secondary structure: motif or pattern (e.g. [AS]-X-[IL]2-[DE]), stored in secondary databases • Tertiary structure: domain (e.g. Rossmann fold, GTP-binding domain...), stored in structure databases
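The motif on this slide can be read as a regular expression. The sketch below is a minimal Python illustration, assuming that `X` stands for any residue and that the `2` means the `[IL]` group occurs twice in a row. The function name `find_motif` is made up for this example.

```python
import re

# Pattern from the slide: [AS]-X-[IL]2-[DE]
# Assumed reading: X = any residue, "2" = the preceding group repeated twice.
MOTIF = re.compile(r"[AS].[IL]{2}[DE]")


def find_motif(sequence):
    """Return (1-based start, matched text) for each non-overlapping hit."""
    return [(m.start() + 1, m.group()) for m in MOTIF.finditer(sequence)]


# The primary sequence shown on the slide.
print(find_motif("AVILDRYFH"))  # [(1, 'AVILD')]
```

Under that reading, the example sequence on the slide contains the motif at its first residue.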

  4. Different Types Of Databases • Primary Databases. • Composite Databases. • Secondary Databases.

  5. PRIMARY DATABASES • In the 1980s, the flood of sequence information created a need to store sequence data. • They contain sequence information. • E.g. nucleic acid: EMBL, GenBank, DDBJ; protein: PIR, MIPS, SWISS-PROT, TrEMBL, NRL-3D

  6. PIR • Developed at the National Biomedical Research Foundation (NBRF) in the 1960s by Margaret Dayhoff, to investigate evolutionary relationships between proteins. • Maintained by PIR-International, an association of macromolecular sequence data collection centres: • PIR at NBRF • Japan International Protein Information Database (JIPID) • Martinsried Institute for Protein Sequences (MIPS).

  7. Quality of PIR Database Has been split into 4 different sections ranked according to quality: • PIR1: fully classified and annotated entries • PIR2: includes preliminary entries (may include redundancy) • PIR3: includes unverified entries • PIR4: contains conceptual translations

  8. MIPS • Collects and processes sequence data for the PIR. • Also distributed with PATCHX, a supplement of unverified protein sequences from external sources.

  9. SWISS-PROT database • Produced by the Dept. of Medical Biochemistry at the University of Geneva and the EMBL in 1986. • Was transferred to the EBI in 1994. • Later moved to the Swiss Institute of Bioinformatics (SIB). • Has highly annotated entries, with descriptions of function, structure and post-translational modifications.

  10. Example of a flat file: SWISS-PROT Q14790
      ID   ICE8_HUMAN     STANDARD;      PRT;   479 AA.
      AC   Q14790; Q14791; Q14792; Q14793; Q14794;
      AC   Q14795; Q14796; Q15780; Q15806; Q9UQ81;
      AC   O14676;
      DT   01-NOV-1997 (Rel. 35, Created)
      DT   01-NOV-1997 (Rel. 35, Last sequence update)
      DT   01-OCT-2000 (Rel. 40, Last annotation update)
      DE   CASPASE-8 PRECURSOR (EC 3.4.22.-) (ICE-LIKE APOPTOTIC PROTEASE 5)
      DE   (MORT1-ASSOCIATED CED-3 HOMOLOG) (MACH) (FADD-HOMOLOGOUS
      DE   ICE/CED-3-LIKE PROTEASE) (FADD-LIKE ICE) (FLICE)
      DE   (APOPTOTIC CYSTEINE PROTEASE) (APOPTOTIC PROTEASE MCH-5) (CAP4).
      GN   CASP8 OR MCH5.
      Slide labels: ID = identification (PROTEIN_SOURCE); AC = accession number (because ID codes can change); DT = date of entry; DE = description; GN = gene name.
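Each line of the flat file starts with a two-letter line code, so an entry can be grouped by code before any field is interpreted. The sketch below is a minimal Python illustration that uses a few lines copied from the entry above. The helper names `group_by_line_code` and `accessions` are made up for this example and do not belong to any SWISS-PROT tool.

```python
from collections import defaultdict

# A few lines from the entry on the slide, one record line per row.
ENTRY = """\
ID   ICE8_HUMAN     STANDARD;      PRT;   479 AA.
AC   Q14790; Q14791; Q14792; Q14793; Q14794;
AC   Q14795; Q14796; Q15780; Q15806; Q9UQ81;
AC   O14676;
GN   CASP8 OR MCH5.
"""


def group_by_line_code(text):
    """Map each two-letter line code to the list of its line contents."""
    fields = defaultdict(list)
    for line in text.splitlines():
        code, content = line[:2], line[2:].strip()
        fields[code].append(content)
    return fields


def accessions(fields):
    """Flatten all AC lines into one list of accession numbers."""
    joined = " ".join(fields["AC"])
    return [acc.strip() for acc in joined.split(";") if acc.strip()]


fields = group_by_line_code(ENTRY)
entry_name = fields["ID"][0].split()[0]
acc_list = accessions(fields)
print(entry_name, acc_list[0], len(acc_list))  # ICE8_HUMAN Q14790 11
```

The entry name comes from the ID line and the accession numbers from the AC lines. The slide recommends accession numbers as the stable reference because ID codes can change.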

  11. OS   Homo sapiens (Human).
      OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
      OC   Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
      OX   NCBI_TaxID=9606;
      RN   [1]
      RP   SEQUENCE FROM N.A., AND ALTERNATIVE SPLICING.
      RC   TISSUE=Thymus, and B-cell;
      RX   MEDLINE=96279826; PubMed=8681376; [NCBI, ExPASy, EBI, Israel, Japan]
      RA   Boldin M.P., Goncharov T.M., Goltsev Y.V., Wallach D.;
      Slide labels: OS = organism species; OC = organism classification; RN onwards = references.

  12. Reference 2 and so on ...
      RT   "Involvement of MACH, a novel MORT1/FADD-interacting protease, in
      RT   Fas/APO-1- and TNF receptor-induced cell death.";
      RL   Cell 85:803-815(1996).
      RN   [2]
      RP   X-RAY CRYSTALLOGRAPHY (2.8 ANGSTROMS).
      RX   MEDLINE=99451259; PubMed=10508784; [NCBI, ExPASy, EBI, Israel, Japan]
      RA   Blanchard H., Kodandapani L., Mittl P.R.E., Di Marco S., Krebs J.F.,
      RA   Wu J.C., Tomaselli K.J., Gruetter M.G.;
      RT   "The three-dimensional structure of caspase-8: an initiator enzyme in
      RT   apoptosis.";
      RL   Structure 7:1125-1133(1999).

  13. Function
      CC   -!- FUNCTION: MOST UPSTREAM PROTEASE OF THE ACTIVATION CASCADE OF
      CC       CASPASES RESPONSIBLE FOR THE FAS-RECEPTOR MEDIATED (CD95) AND
      CC       TNFR-1 INDUCED CELL DEATH. BINDING TO THE ADAPTOR MOLECULE FADD
      CC       RECRUITS IT TO EITHER RECEPTORS. THE RESULTING AGGREGATE CALLED
      CC       THE DEATH-INDUCING SIGNALING COMPLEX (DISC) PERFORMS FLICE/MACH
      CC       PROTEOLYTIC ACTIVATION. THE ACTIVE DIMERIC ENZYME IS THEN
      CC       LIBERATED FROM THE DISC AND FREE TO ACTIVATE DOWNSTREAM APOPTOTIC
      CC       PROTEASES. PROTEOLYTIC FRAGMENTS OF THE N-TERMINAL PROPEPTIDE
      CC       (TERMED CAP3, CAP5 AND CAP6) ARE LIKELY RETAINED IN THE DISC.
      CC       CLEAVES
      Slide label: CC = comments.

  14. CC       AND ACTIVATES CASPASE-3, -4, -6, -7, -9, AND -10. MAY PARTICIPATE
      CC       IN THE GRANZYME B APOPTOTIC PATHWAYS. PROTEOLYTICALLY CLEAVES
      CC       POLY(ADP-RIBOSE) POLYMERASE(PARP). HYDROLYZES THE SMALL-MOLECULE
      CC       SUBSTRATE, AC-ASP-GLU-VAL-ASP-|-AMC. LIKELY TARGET FOR THE COWPOX
      CC       VIRUS CRMA DEATH INHIBITORY PROTEIN.
      CC   -!- SUBUNIT: HETERODIMER OF A 18 KDA (P18) AND A 10 KDA (P10)
      CC       SUBUNIT. INTERACTS WITH CFLAR.
      CC   -!- ALTERNATIVE PRODUCTS: 8 ISOFORMS; 1-ALPHA (SHOWN HERE),
      CC       2-ALPHA/MCH5-BETA, 3-ALPHA, 4-ALPHA, 1-BETA, 2-BETA, 3-BETA AND
      CC       4-BETA; ARE PRODUCED BY ALTERNATIVE SPLICING.
      Slide label: presence of subunits and of alternative proteins.

  15. CC   -!- TISSUE SPECIFICITY: ALPHA 1 AND BETA 1 ISOFORMS ARE EXPRESSED IN
      CC       A WIDE VARIETY OF TISSUES. HIGHEST EXPRESSION IN PERIPHERAL BLOOD
      CC       LEUKOCYTES, SPLEEN, THYMUS AND LIVER. BARELY DETECTABLE IN BRAIN,
      CC       TESTIS, AND SKELETAL MUSCLE.
      CC   -!- PTM: GENERATION OF THE SUBUNITS REQUIRES ASSOCIATION WITH THE
      CC       DISC, WHEREAS ADDITIONAL PROCESSING IS LIKELY DUE TO THE
      CC       AUTOCATALYTIC ACTIVITY OF THE ACTIVATED PROTEASE. GRANZYME B AND
      CC       CASPASE-10 CAN BE INVOLVED IN THESE PROCESSING EVENTS.
      CC   -!- SIMILARITY: BELONGS TO PEPTIDASE FAMILY C14; ALSO KNOWN AS THE
      CC       CASPASE FAMILY. CONTAINS 2 DEATH EFFECTOR DOMAINS (DED).
      Slide labels: tissue specificity; post-translational modifications; similarity.

  16. DR   EMBL; X98172; CAA66853.1; -. [EMBL / GenBank / DDBJ] [CoDingSequence]
      DR   EMBL; X98173; CAA66854.1; -. [EMBL / GenBank / DDBJ] [CoDingSequence]
      DR   EMBL; X98174; CAA66855.1; -. [EMBL / GenBank / DDBJ] [CoDingSequence]
      DR   PDB; 1QDU; PRELIMINARY. [ExPASy / RCSB]
      DR   SWISS-3DIMAGE; ICE8_HUMAN.
      DR   InterPro; IPR001875; DED.
      DR   Pfam; PF01335; DED; 2.
      DR   Pfam; PF00655; ICE_p10; 1.
      DR   Pfam; PF00656; ICE_p20; 1.
      DR   PROSITE; PS50207; CASPASE_P10; 1.
      DR   PROSITE; PS50208; CASPASE_P20; 1.
      DR   PROSITE; PS50168; DED; 2.
      Slide label: DR = database cross-references with accession numbers.

  17. DR   ProDom [Domain structure / List of seq. sharing at least 1 domain]
      DR   BLOCKS; Q14790.
      DR   DOMO; Q14790.
      DR   PROTOMAP; Q14790.
      DR   PRESAGE; Q14790.
      DR   DIP; Q14790.
      DR   SWISS-2DPAGE; GET REGION ON 2D PAGE.
      KW   Hydrolase; Thiol protease; Apoptosis; Zymogen; Alternative splicing;
      KW   3D-structure.
      Slide label: KW = keywords.

  18. Feature table
      FT   PROPEP        1    216
      FT   CHAIN       217    374       CASPASE-8 SUBUNIT P18.
      FT   PROPEP      375    384
      FT   CHAIN       385    479       CASPASE-8 SUBUNIT P10.
      FT   ACT_SITE    317    317
      FT   ACT_SITE    360    360
      FT   DOMAIN        2     80       DED 1.
      FT   DOMAIN      100    177       DED 2.
      FT   VARSPLIC    102    102       R -> RFHFCRMSWAEANSQCQTQSVPFWRRVDHLLIR
      FT                                (IN ISOFORM 4 ALPHA).
      FT   VARSPLIC                     MISSING (IN ISOFORM 2 ALPHA, ISOFORM 4
      FT                                ALPHA AND ISOFORM 4 BETA).
      FT   CONFLICT    285    285       D -> H (IN REF. 3 AND 5).
      FT   CONFLICT    294    294       E -> D (IN REF. 4).
      Slide labels: subunits; active site position; variant and sequence error.
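The PROPEP and CHAIN rows of the feature table give start and end positions, so one simple check is whether they account for every residue of the 479 AA entry. The sketch below is a minimal Python illustration using the four rows from the slide. The function names `parse_feature` and `tiles_sequence` are made up for this example, and whitespace-separated columns are an assumption about the layout.

```python
# PROPEP and CHAIN rows from the feature table on the slide.
FT_LINES = [
    "FT   PROPEP        1    216",
    "FT   CHAIN       217    374       CASPASE-8 SUBUNIT P18.",
    "FT   PROPEP      375    384",
    "FT   CHAIN       385    479       CASPASE-8 SUBUNIT P10.",
]


def parse_feature(line):
    """Split an FT line into (key, start, end, description)."""
    parts = line.split(None, 4)  # code, key, start, end, optional description
    key, start, end = parts[1], int(parts[2]), int(parts[3])
    description = parts[4] if len(parts) > 4 else ""
    return key, start, end, description


def tiles_sequence(features, length):
    """True if the features cover 1..length with no gaps or overlaps."""
    expected_start = 1
    for _, start, end, _ in sorted(features, key=lambda f: f[1]):
        if start != expected_start:
            return False
        expected_start = end + 1
    return expected_start == length + 1


features = [parse_feature(line) for line in FT_LINES]
for key, start, end, description in features:
    print(key, end - start + 1, description)
print(tiles_sequence(features, 479))  # True
```

The four segments are 216, 158, 10 and 95 residues long, which adds up to the 479 residues given on the ID line.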

  19. SQ   SEQUENCE   479 AA;  55391 MW;  7A5FEAA6B39B582F CRC64;
           MDFSRNLYDI GEQLDSEDLA SLKFLSLDYI PQRKQEPIKD ALMLFQRLQE KRMLEESNLS
           FLKELLFRIN RLDLLITYLN TRKEEMEREL QTPGRAQISA YRVMLYQISE EVSRSELRSF
           KFLLQEEISK CKLDDDMNLL DIFIEMEKRV ILGEGKLDIL KRVCAQINKS LLKIINDYEE
           FSKERSSSLE GSPDEFSNGE ELCGVMTISD SPREQDSESQ TLDKVYQMKS KPRGYCLIIN
           NHNFAKAREK VPKLHSIRDR NGTHLDAGAL TTTFEELHFE IKPHDDCTVE QIYEILKIYQ
           LMDHSNMDCF ICCILSHGDK GIIYGTDGQE APIYELTSQF TGLKCPSLAG KPKVFFIQAC
           QGDNYQKGIP VETDSEEQPY LEMDLSSPQT RYIPDEADFL LGMATVNNCV SYRNPAEGTW
           YIQSLCQSLR ERCPRGDDIL TILTEVNYEV SNKDDKKNMG KQMPQPTFTL RKKLVFPSD
      //
      The same entry can also be viewed in a Web-oriented layout via SWISS-PROT.
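The sequence block can be checked against the rest of the entry: its length should match the 479 AA on the SQ line, and the ACT_SITE positions 317 and 360 from the feature table can be looked up directly. The sketch below is a minimal Python illustration with the sequence copied from the slide. The helper names `read_sequence` and `residue` are made up for this example.

```python
# Sequence block copied from the SQ section of the slide.
SQ_BLOCK = """\
MDFSRNLYDI GEQLDSEDLA SLKFLSLDYI PQRKQEPIKD ALMLFQRLQE KRMLEESNLS
FLKELLFRIN RLDLLITYLN TRKEEMEREL QTPGRAQISA YRVMLYQISE EVSRSELRSF
KFLLQEEISK CKLDDDMNLL DIFIEMEKRV ILGEGKLDIL KRVCAQINKS LLKIINDYEE
FSKERSSSLE GSPDEFSNGE ELCGVMTISD SPREQDSESQ TLDKVYQMKS KPRGYCLIIN
NHNFAKAREK VPKLHSIRDR NGTHLDAGAL TTTFEELHFE IKPHDDCTVE QIYEILKIYQ
LMDHSNMDCF ICCILSHGDK GIIYGTDGQE APIYELTSQF TGLKCPSLAG KPKVFFIQAC
QGDNYQKGIP VETDSEEQPY LEMDLSSPQT RYIPDEADFL LGMATVNNCV SYRNPAEGTW
YIQSLCQSLR ERCPRGDDIL TILTEVNYEV SNKDDKKNMG KQMPQPTFTL RKKLVFPSD
"""


def read_sequence(block):
    """Join the blocks of ten residues into one continuous string."""
    return "".join(block.split())


def residue(sequence, position):
    """Return the residue at a 1-based position, as used in FT lines."""
    return sequence[position - 1]


seq = read_sequence(SQ_BLOCK)
print(len(seq), residue(seq, 317), residue(seq, 360))  # 479 H C
```

The two active site positions fall on a histidine and a cysteine, which fits the "Thiol protease" keyword on the KW line.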

  20. TrEMBL database • Designed as a supplement to SWISS-PROT. • Provides translations of all coding sequences. • Consists of 2 sections: • SP-TrEMBL, with entries that will be incorporated into SWISS-PROT after annotation • REM-TrEMBL, with entries that are not destined to be included in SWISS-PROT (synthetic sequences, conceptual translations,…). • Keeping these entries in TrEMBL means they do not compromise the quality of SWISS-PROT.

  21. NRL-3D database • Contains only protein sequences extracted from the Brookhaven Protein Databank (PDB), but includes: • bibliographic references and MEDLINE cross-references • secondary structure information • active and binding sites, modifications in the sequence • details on experimental method, resolution, R-factor,…

  22. Composite protein sequence databases • To render sequence searching more efficient. • To answer the question of which primary database is the "best" (which is the most up-to-date, which database to use,…).

  23. Some of the composite protein sequence databases available • NRDB: PDB, SWISS-PROT, PIR, GenPept, SWISS-PROT update, GenPept update • OWL: SWISS-PROT, PIR, GenBank, NRL-3D • MIPSX: PIR, MIPSOwn, MIPSTrn, MIPSH, PIRMOD, NRL-3D, SWISS-PROT, EMTrans, GBTrans, Kabat, PseqIP • SP+TrEMBL: SWISS-PROT, TrEMBL

  24. NRDB • NRDB (Non-Redundant Database) is built locally at the NCBI. • It is a composite of: - GenPept (GenBank CDS translations) - PDB sequences - SWISS-PROT update (updates of SWISS-PROT) - PIR - GenPept update (daily updates of GenPept). • Only identical copies are removed, so NRDB is non-identical rather than strictly non-redundant, and errors in the source databases can carry over. • NRDB is the database of the BLAST services.
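Building a composite from several sources can be sketched as a merge that keeps one record per distinct sequence. The Python below is only an illustration under a stated assumption: redundancy is taken to mean exact sequence identity, and the actual NRDB build procedure is not described in these slides. The function name `merge_non_identical` and the toy records are hypothetical.

```python
def merge_non_identical(sources):
    """Merge {source: {entry_id: sequence}}, keeping one record per sequence.

    Sources are visited in the order given, so earlier sources win.
    """
    merged = {}
    for source_name, entries in sources.items():
        for entry_id, sequence in entries.items():
            key = sequence.upper()
            if key not in merged:
                merged[key] = (source_name, entry_id)
    return merged


# Hypothetical toy records, not real database entries.
sources = {
    "SWISS-PROT": {"SP1": "AVILDRYFH", "SP2": "MKTAYIAK"},
    "PIR": {"PIR1": "AVILDRYFH", "PIR2": "MKTAYIAR"},
}

merged = merge_non_identical(sources)
for sequence, (source_name, entry_id) in merged.items():
    print(sequence, source_name, entry_id)
```

In this toy run the exact duplicate from PIR is dropped, while the two sequences that differ by a single residue are both kept. An exact-match merge of this kind therefore shrinks the search space without guaranteeing one record per protein.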

  25. OWL • Non-redundant protein sequence database. • Built at the University of Leeds in collaboration with the Daresbury Laboratory in Warrington. • Composite of: - SWISS-PROT - PIR - GenBank - NRL-3D.

  26. MIPSX • Merged database produced at the Max Planck Institute in Martinsried (Martinsried Institute for Protein Sequences). • Composite of: - PIR - MIPSOwn - MIPSTrn - MIPSH - PIRMOD - NRL-3D - SWISS-PROT - EMTrans - GBTrans.

  27. SWISS-PROT + TrEMBL • Database constructed at the EBI. • Composite of SWISS-PROT and TrEMBL. • Minimally redundant. • SRS (Sequence Retrieval System) is used to retrieve the information.

  28. THANK YOU
