
Protein localization & drug subcellular distribution ---- Data Integration

Presentation Transcript


  1. Lab Meeting of Rosania Lab
     Protein localization & drug subcellular distribution ---- Data Integration
     Jingyu Yu, 03/17/2008

  2. Things about Proteins
     • Sequence, structure, expression level, function, partners, location
     • Location is critical to understanding function.
     • Many groups work in this field using imaging, e.g., the Protein Subcellular Location Image Database: http://murphylab.web.cmu.edu/

  3. What are we asking?
     • Certain compounds/drugs bind specific proteins.
     • Cheminformatics, SAR/QSAR, CADD, etc.
     • Proteins are NOT distributed evenly in the cell.
     • Question: Does protein localization play a role in the subcellular distribution and pharmacokinetics (PK) of a drug/ligand? How? Can we model it?

  4. Starting Point
     • Map/crosslink these two databases:
       Ligand <-> Protein: Binding MOAD
       Protein <-> Localization: Organelle DB
     • Link between them: ?
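
The two-hop mapping on this slide amounts to chaining two lookups through the shared protein. Below is a minimal Python sketch, assuming toy in-memory dictionaries; the identifiers (LIG_A, 1abc, GENE1, ...) and the helper `ligand_locations` are hypothetical and do not reflect the real MOAD or Organelle DB export formats. The `pdb_to_gene` table stands in for the "?" link, which the later slides fill by sequence alignment.

```python
from collections import defaultdict

# Toy, hypothetical records; real MOAD / Organelle DB exports look different.
ligand_to_pdb = {"LIG_A": ["1abc"], "LIG_B": ["1abc", "2xyz"], "LIG_C": ["9zzz"]}
# The missing link: PDB entry -> best-matching Organelle DB gene (from BLAST).
pdb_to_gene = {"1abc": "GENE1", "2xyz": "GENE2"}
gene_to_location = {"GENE1": ["mitochondrion"], "GENE2": ["nucleus", "cytoplasm"]}


def ligand_locations(ligand_to_pdb, pdb_to_gene, gene_to_location):
    """Chain ligand -> PDB entry -> Organelle gene -> localization terms."""
    result = defaultdict(set)
    for ligand, pdb_ids in ligand_to_pdb.items():
        for pdb_id in pdb_ids:
            gene = pdb_to_gene.get(pdb_id)
            if gene is None:  # no alignment hit for this PDB entry
                continue
            result[ligand].update(gene_to_location.get(gene, []))
    return {ligand: sorted(locs) for ligand, locs in result.items()}


print(ligand_locations(ligand_to_pdb, pdb_to_gene, gene_to_location))
# -> {'LIG_A': ['mitochondrion'], 'LIG_B': ['cytoplasm', 'mitochondrion', 'nucleus']}
```

A ligand whose PDB entries never matched an Organelle DB gene (LIG_C here) drops out of the result entirely instead of appearing with an empty location list.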

  5. Organelle DB
     • Anuj Kumar Lab (MCDB, LSI)
     • Protein localization data
     • 30188 genes from 138 organisms, with emphasis on the major model systems (including human proteins)
     • Gene names obtained from the appropriate model organism database (e.g., SGD, MGI, FlyBase, WormBase, TAIR, etc.)

  6. Binding MOAD
     • Heather A. Carlson Lab (Medchem)
     • The largest database of protein-ligand complexes
     • All relevant entries from the Protein Data Bank (PDB)
     • 9837 entries, 3151 unique protein systems, and binding affinity data for 2950 complexes

  7. Data Overview
     A. Components of Organelle DB:
        1. Sequence data (FASTA)
        2. Protein information (Accession_ID, Standard Name, Systematic Name, etc.)
        3. Localization information (GO term, localization information)
     B. Components of Binding MOAD:
        1. Sequence data (FASTA)
        2. PDB file (structure information, etc.)
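
Both databases supply their sequences as FASTA text. A minimal reader sketch in plain Python (no Biopython) that yields header/sequence pairs and joins multi-line sequences; the function name `read_fasta` and the example records are hypothetical.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from FASTA text; multi-line sequences are joined."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical two-record example.
example = io.StringIO(">seq1 hypothetical protein\nMKT\nAYIA\n\n>seq2\nGGSL\n")
records = dict(read_fasta(example))
print(records)  # {'seq1 hypothetical protein': 'MKTAYIA', 'seq2': 'GGSL'}
```

Biopython's `SeqIO.parse(handle, "fasta")` does the same job with more validation, if adding that dependency is acceptable.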

  8. How to map/crosslink the two databases?
     • Structure-based methods: DALI, SSM, Combinatorial Extension
     • Sequence-based method: BLAST √

  9. Importance of sequence
     • The sequence of each protein determines where it is localized in the cell.
     • Subsequences (“motifs”) within a protein’s sequence are responsible for targeting it to one (or more) locations or organelles.
     • PSORT: predicts a probable localization site for a protein from its amino acid sequence alone (Kenta Nakai, 1991).
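
To make the motif idea concrete, here is a toy rule-based check for two well-characterized C-terminal signals: the KDEL/HDEL ER-retention signal and the SKL peroxisomal targeting signal (PTS1). This is not PSORT, which weighs many more sequence features; the rule table and the function name `scan_targeting_signals` are illustrative assumptions, and real PTS1 variants are more permissive than the literal SKL used here.

```python
import re

# Illustrative C-terminal targeting rules only (not PSORT):
#   KDEL / HDEL at the C-terminus -> retention in the ER lumen
#   SKL at the C-terminus         -> peroxisomal targeting signal 1 (PTS1)
C_TERMINAL_SIGNALS = {
    "ER lumen (KDEL/HDEL)": re.compile(r"[KH]DEL$"),
    "peroxisome (PTS1, SKL)": re.compile(r"SKL$"),
}


def scan_targeting_signals(sequence):
    """Return the sites whose C-terminal motif matches the end of the sequence."""
    seq = sequence.strip().upper().rstrip("*")  # drop a trailing stop symbol
    return [site for site, motif in C_TERMINAL_SIGNALS.items() if motif.search(seq)]


print(scan_targeting_signals("MAAGLLKDEL"))  # ['ER lumen (KDEL/HDEL)']
```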

  10. Sequence Alignment
      • Stand-alone BLAST (most people use the web-based search: http://www.ncbi.nlm.nih.gov/blast/Blast.cgi)
      • Set up a local database on Windows (UNIX would be faster) using the Organelle DB FASTA file.
      • Used the MOAD sequences (FASTA) as queries to search the Organelle database.
      • Over 30188 × 20270 pairwise alignments done in 4 hrs.
      • Output text file: 180 MB
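
The slide describes the legacy stand-alone BLAST of the time (formatdb/blastall). Below is a sketch of the same workflow with today's BLAST+ tools (`makeblastdb`, `blastp`), assuming hypothetical file names `organelle.fasta` and `moad.fasta`, an E-value cutoff of 1e-5 chosen only for illustration, and tabular output (`-outfmt 6`) in place of the plain-text report the original pipeline parsed. The helpers `blast_commands` and `run_or_print` are hypothetical.

```python
import shlex
import shutil
import subprocess


def blast_commands(organelle_fasta, moad_fasta, db_name="organelle_db",
                   out_file="moad_vs_organelle.tsv", evalue=1e-5):
    """Build BLAST+ command lines: index the Organelle DB, then query it with MOAD."""
    make_db = ["makeblastdb", "-in", organelle_fasta, "-dbtype", "prot",
               "-out", db_name]
    search = ["blastp", "-query", moad_fasta, "-db", db_name,
              "-evalue", str(evalue), "-outfmt", "6", "-out", out_file]
    return make_db, search


def run_or_print(commands):
    """Run each command if the tool is on PATH; otherwise just print it."""
    for cmd in commands:
        if shutil.which(cmd[0]) is None:
            print("BLAST+ not found, would run:", shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)


make_db, search = blast_commands("organelle.fasta", "moad.fasta")
print(shlex.join(make_db))
print(shlex.join(search))
```

Calling `run_or_print((make_db, search))` executes the two steps in order when BLAST+ is installed. For scale, 30188 × 20270 = 611,910,760 query-subject pairs; over 4 hours (14,400 s) that is roughly 4.2 × 10^4 pairs per second, which BLAST reaches because its seed-and-extend heuristic fully aligns only promising pairs.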

  11. Data processing
      • A C++ program was written to automate extracting the needed information from the 180 MB output text file.
      • Only the best match (highest identity and lowest E-value) was kept for each query.
      • The results were merged with the Organelle DB information.
      • Sample output:
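
The original extraction was done in C++ on BLAST's plain-text report. The Python sketch below shows the same selection step on BLAST+ tabular output (`-outfmt 6`, whose 12 default columns are listed in `FIELDS`, named here after BLAST+ format specifiers) and then merges each best hit with Organelle DB annotations. The ranking rule (lowest E-value first, higher percent identity as the tie-breaker), the sample rows, and the `organelle_info` dictionary are assumptions for illustration.

```python
import csv
import io

# Default columns of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tsv_handle):
    """Keep one hit per query: lowest E-value, ties broken by higher % identity."""
    best = {}
    for row in csv.reader(tsv_handle, delimiter="\t"):
        if not row:
            continue
        rec = dict(zip(FIELDS, row))
        rank = (float(rec["evalue"]), -float(rec["pident"]))
        query = rec["qseqid"]
        if query not in best or rank < best[query][0]:
            best[query] = (rank, rec)
    return {query: rec for query, (_, rec) in best.items()}


def merge_with_organelle(hits, organelle_info):
    """Attach (hypothetical) Organelle DB annotations to each best hit."""
    merged = []
    for query, rec in sorted(hits.items()):
        merged.append({"query": query, "subject": rec["sseqid"],
                       "pident": float(rec["pident"]),
                       "evalue": float(rec["evalue"]),
                       **organelle_info.get(rec["sseqid"], {})})
    return merged


# Hypothetical sample: two hits for q1, one for q2.
blast_tsv = io.StringIO(
    "q1\tg1\t45.0\t300\t160\t5\t1\t300\t10\t309\t1e-30\t250\n"
    "q1\tg2\t98.0\t300\t6\t0\t1\t300\t1\t300\t1e-150\t600\n"
    "q2\tg3\t30.0\t120\t82\t2\t5\t124\t20\t139\t1e-5\t60\n"
)
organelle_info = {"g2": {"localization": "mitochondrion"},
                  "g3": {"localization": "nucleus"}}
merged = merge_with_organelle(best_hits(blast_tsv), organelle_info)
for row in merged:
    print(row)
```

Rows are streamed one at a time, so memory grows with the number of queries rather than with the size of the output file.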

  12. Summary of alignment

  13. Benchmark of alignment
      • Randomly selected 10-30 PDB proteins from each similarity-level group.
      • PDB: http://www.rcsb.org/pdb/home/home.do
      • Looked up information about each PDB protein (function, source organism, etc.).
      • Identical at 100%; similar from 99% down to 20% similarity.
      • Differences are trivial at all levels (except for 2-12%).
      • Types of difference: different source organism; same function but different substrate; same family but different subtype.
      • Example at 20% similarity: PDB 1cy0 (DNA topoisomerase I) vs. Organelle 141797 (DNA topoisomerase III)
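
A sketch of the benchmark sampling: group best hits by percent identity, then draw a fixed number of PDB IDs at random from each group. The bin edges in `BIN_LOWER_EDGES`, the seed, the helper names, and the sample data are assumptions; the slide does not state the exact group boundaries. A seeded `random.Random` keeps the selection reproducible.

```python
import random
from collections import defaultdict

# Hypothetical similarity groups (lower edges, % identity); the slides do not
# give the exact boundaries used in the benchmark.
BIN_LOWER_EDGES = [100, 90, 70, 50, 20, 0]


def similarity_bin(pident):
    """Map a percent identity to the lower edge of its group."""
    for edge in BIN_LOWER_EDGES:
        if pident >= edge:
            return edge
    raise ValueError(f"identity out of range: {pident}")


def sample_per_bin(hits, k, seed=0):
    """hits: iterable of (pdb_id, pident). Draw up to k PDB IDs per group."""
    groups = defaultdict(list)
    for pdb_id, pident in hits:
        groups[similarity_bin(pident)].append(pdb_id)
    rng = random.Random(seed)  # fixed seed -> reproducible benchmark set
    return {edge: rng.sample(ids, min(k, len(ids)))
            for edge, ids in sorted(groups.items(), reverse=True)}


hits = [("1aaa", 100.0), ("1bbb", 100.0), ("1ccc", 95.5),
        ("1ddd", 22.0), ("1eee", 21.0), ("1fff", 20.0)]
print(sample_per_bin(hits, k=2, seed=42))
```

With k set anywhere from 10 to 30, as on the slide, a group smaller than k is returned whole.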

  14. About the Databases
      • No specific protein localization information is given in the PDB.
      • Function information for some proteins in Organelle DB is putative (inferred from sequence- or structure-based alignment) and should not be used in the benchmark.
      • Things to be done:
        • Identify how many proteins overlap, using ClustalW2
        • Extract ligand information from MOAD (automated by programming): SMILES -> structures -> ChemAxon/MOE
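
For the to-do item on extracting ligand information, one possible starting point is the PDB file itself: ligands appear as HETATM records, with the residue name in columns 18-20, the chain ID in column 22, and the residue number in columns 23-26 of the fixed-width format. A minimal sketch under that assumption; it skips water (HOH) and collapses atoms belonging to the same residue. The function name `ligand_residues` and the example lines are hypothetical, and MOAD's own ligand annotations may be a better source.

```python
import io


def ligand_residues(pdb_handle, skip=("HOH",)):
    """Collect unique (resname, chain, resseq) tuples from HETATM records.

    Uses the fixed PDB columns: residue name 18-20, chain ID 22,
    residue number 23-26 (1-based), i.e. slices [17:20], [21:22], [22:26].
    """
    found, seen = [], set()
    for line in pdb_handle:
        if not line.startswith("HETATM"):
            continue
        key = (line[17:20].strip(), line[21:22].strip(), line[22:26].strip())
        if key[0] in skip or key in seen:
            continue
        seen.add(key)
        found.append(key)
    return found


# Hypothetical fragment: one protein atom, two atoms of one ligand, one water.
example_pdb = io.StringIO(
    "ATOM      1  N   MET A   1\n"
    "HETATM 2001  C1  ATP A 501\n"
    "HETATM 2002  C2  ATP A 501\n"
    "HETATM 3001  O   HOH A 601\n"
)
print(ligand_residues(example_pdb))  # [('ATP', 'A', '501')]
```

HETATM records also cover ions and modified amino acids (e.g., selenomethionine, MSE), so a real pipeline would need a larger exclusion list before converting ligands to SMILES.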

  15. Perspective
      • Model the subcellular distribution of compounds based on the structures of ligand and protein, 1CellPK, and protein localization.
      • Challenges:
        1. No quantitative distribution data are available.
        2. If A and B are both mitochondrial proteins, do they have the same distribution across the outer membrane, inner membrane, and other compartments?

  16. Acknowledgement
      • Dr. Gus Rosania
      • Graduate students: Xinyuan Zhang, Jason Baik, Nan Zheng
