
NCBI Molecular Biology Resources

NCBI Molecular Biology Resources. A Field Guide part 2 (post intermission). September 30, 2004 ICGEB. PSI-BLAST Position-Specific Iterated BLAST. Mining for protein domains

mesperanza

Presentation Transcript


  1. NCBI Molecular Biology Resources A Field Guide part 2 (post intermission) September 30, 2004 ICGEB

  2. PSI-BLAST: Position-Specific Iterated BLAST • Mining for protein domains • Confirming relationships among related proteins

  3. Position Specific Substitution Rates • Weakly conserved serine • Active site serine

  4. Position Specific Score Matrix (PSSM)

                 A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V
     206 D   0  -2   0   2  -4   2   4  -4  -3  -5  -4   0  -2  -6   1   0  -1  -6  -4  -1
     207 G  -2  -1   0  -2  -4  -3  -3   6  -4  -5  -5   0  -2  -3  -2  -2  -1   0  -6  -5
     208 V  -1   1  -3  -3  -5  -1  -2   6  -1  -4  -5   1  -5  -6  -4   0  -2  -6  -4  -2
     209 I  -3   3  -3  -4  -6   0  -1  -4  -1   2  -4   6  -2  -5  -5  -3   0  -1  -4   0
     210 D  -2  -5   0   8  -5  -3  -2  -1  -4  -7  -6  -4  -6  -7  -5   1  -3  -7  -5  -6
     211 S   4  -4  -4  -4  -4  -1  -4  -2  -3  -3  -5  -4  -4  -5  -1   4   3  -6  -5  -3
     212 C  -4  -7  -6  -7  12  -7  -7  -5  -6  -5  -5  -7  -5   0  -7  -4  -4  -5   0  -4
     213 N  -2   0   2  -1  -6   7   0  -2   0  -6  -4   2   0  -2  -5  -1  -3  -3  -4  -3
     214 G  -2  -3  -3  -4  -4  -4  -5   7  -4  -7  -7  -5  -4  -4  -6  -3  -5  -6  -6  -6
     215 D  -5  -5  -2   9  -7  -4  -1  -5  -5  -7  -7  -4  -7  -7  -5  -4  -4  -8  -7  -7
     216 S  -2  -4  -2  -4  -4  -3  -3  -3  -4  -6  -6  -3  -5  -6  -4   7  -2  -6  -5  -5
     217 G  -3  -6  -4  -5  -6  -5  -6   8  -6  -8  -7  -5  -6  -7  -6  -4  -5  -6  -7  -7
     218 G  -3  -6  -4  -5  -6  -5  -6   8  -6  -7  -7  -5  -6  -7  -6  -2  -4  -6  -7  -7
     219 P  -2  -6  -6  -5  -6  -5  -5  -6  -6  -6  -7  -4  -6  -7   9  -4  -4  -7  -7  -6
     220 L  -4  -6  -7  -7  -5  -5  -6  -7   0  -1   6  -6   1   0  -6  -6  -5  -5  -4   0
     221 N  -1  -6   0  -6  -4  -4  -6  -6  -1   3   0  -5   4  -3  -6  -2  -1  -6  -1   6
     222 C   0  -4  -5  -5  10  -2  -5  -5   1  -1  -1  -5   0  -1  -4  -1   0  -5   0   0
     223 Q   0   1   4   2  -5   2   0   0   0  -4  -2   1   0   0   0  -1  -1  -3  -3  -4
     224 A  -1  -1   1   3  -4  -1   1   4  -3  -4  -3  -1  -2  -2  -3   0  -2  -2  -2  -3

     Serine scored differently in these two positions • Active site nucleophile
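A minimal sketch of how a position-specific lookup differs from a single substitution matrix, using only the two serine rows (positions 211 and 216) from the PSSM above. The names `AMINO_ACIDS`, `PSSM_ROWS`, and `pssm_score` are illustrative assumptions, not part of any NCBI tool.

```python
# Column order of the PSSM on the slide.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

# Two rows copied from the slide: position -> (query residue, scores per column).
PSSM_ROWS = {
    211: ("S", [4, -4, -4, -4, -4, -1, -4, -2, -3, -3,
                -5, -4, -4, -5, -1, 4, 3, -6, -5, -3]),
    216: ("S", [-2, -4, -2, -4, -4, -3, -3, -3, -4, -6,
                -6, -3, -5, -6, -4, 7, -2, -6, -5, -5]),
}


def pssm_score(position: int, residue: str) -> int:
    """Return the score for aligning `residue` at the given query position."""
    _, scores = PSSM_ROWS[position]
    return scores[AMINO_ACIDS.index(residue)]


if __name__ == "__main__":
    # The same query residue (S) is scored differently at the two positions.
    for position in (211, 216):
        for residue in "SAT":
            print(position, residue, pssm_score(position, residue))
```

At position 211 an alanine or threonine scores almost as well as serine, while at position 216 only serine scores positively. An ordinary substitution matrix would assign the same score to serine wherever it occurs.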

  5. >gi|113340|sp|P03958|ADA_MOUSE ADENOSINE DEAMINASE (ADENOSINE
     MAQTPAFNKPKVELHVHLDGAIKPETILYFGKKRGIALPADTVEELRNIIGMDKPLSLPGF
     VIAGCREAIKRIAYEFVEMKAKEGVVYVEVRYSPHLLANSKVDPMPWNQTEGDVTPDDVVD
     EQAFGIKVRSILCCMRHQPSWSLEVLELCKKYNQKTVVAMDLAGDETIEGSSLFPGHVEAY
     RTVHAGEVGSPEVVREAVDILKTERVGHGYHTIEDEALYNRLLKENMHFEVCPWSSYLTGA
     VRFKNDKANYSLNTDDPLIFKSTLDTDYQMTKKDMGFTEEEFKRLNINAAKSSFLPEEEKK

     PSI-BLAST E-value cutoff for PSSM

  6. RESULTS: Initial BLASTP. Same results as protein-protein BLAST.

  7. Results of First PSSM Search: other purine nucleotide metabolizing enzymes not found by ordinary BLAST.

  8. Just below threshold, another nucleotide metabolism enzyme: check to add to PSSM. Third PSSM Search: Convergence
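The iterate-until-convergence idea in the slides above can be sketched as a simple loop. This is only an illustration under stated assumptions: `iterate_until_convergence`, `toy_search`, the sequence names, the E-values, and the 0.005 inclusion threshold are all hypothetical, and the real PSI-BLAST rebuilds a PSSM from the included sequences at every round rather than consulting a fixed table.

```python
from typing import Callable, Dict, Set, Tuple


def iterate_until_convergence(
    search: Callable[[Set[str]], Dict[str, float]],
    query: str,
    threshold: float = 0.005,  # assumed inclusion E-value cutoff
    max_rounds: int = 10,
) -> Tuple[Set[str], int]:
    """Repeat the search until no new sequence passes the E-value threshold."""
    included = {query}
    for round_number in range(1, max_rounds + 1):
        hits = search(included)  # sequence id -> E-value
        newly_included = {
            seq_id for seq_id, e_value in hits.items() if e_value <= threshold
        } - included
        if not newly_included:
            return included, round_number  # converged
        included |= newly_included
    return included, max_rounds


def toy_search(included: Set[str]) -> Dict[str, float]:
    """Hypothetical stand-in for a PSSM search: more members, more remote hits."""
    hits = {"query": 0.0, "seq_A": 1e-50, "seq_X": 3.0}
    if "seq_A" in included:
        hits["seq_B"] = 1e-4
    if "seq_B" in included:
        hits["seq_C"] = 0.004
    return hits


if __name__ == "__main__":
    members, rounds = iterate_until_convergence(toy_search, "query")
    print(sorted(members), rounds)
```

In this toy run `seq_X` never enters the set because its E-value stays above the threshold. The web interface additionally lets the user tick such a near-threshold hit by hand, which the sketch does not model.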

  9. MegaBLAST

     > 1133045 gnl|UG|Hs#S1133045 qd43b11.x1 Homo sapiens cDNA, 3' end
     CATGTAAGCCATTTATTGGTTTGTTTTAAAAATATGTATTTTATTTATACATGAAGTTTG
     GTGAGAAGTGCTCGATTAGTTCAGACAACATCTGGCACTTGATGTCTGTCCTTCCCTCCT
     TTTTCCTACTCTCTTCTCCCCTCCTGCTGGTCATTGTGCAGTTCTGGAAATTAAAAAGGT
     GACAGCCAGGCTAAAAGCTAAGGGTTGGGTCTAGCTCACCTCCCACCCCCAACCACACCG
     TCTGCAGCCAGCCCCAGGCACCTGTCTCAAAGCTCCCGGGCTGTCCACACACACAAAAAC
     CACAGTCTCCTTCCGGCCAGCTGGGCTGGCAGCCCGACCTGC
     > 1141828 gnl|UG|Hs#S1141828 qv37f11.x1 Homo sapiens cDNA, 3' end
     GAGAAGACGACAGAAGGGGAGAAGAGAGTAGGAAAAAGGAGGGAAGGACAGACATCAAGT
     GCCAGATGTTGTCTGAACTAATCGAGCACTTCTCACCAAACTTCATGTATAAATAAAATA
     CATATTTTTAAAACAAACCAATAAATGGCTTACATCAAAAAAAAAAAAAAAAAAAAAAAA
     GTCGTATCGATGT
     > 1145899 gnl|UG|Hs#S1145899 qv33c06.x1 Homo sapiens cDNA, 3' end
     GAGAAGACGACAGAAGGGGAGAAGAGAGTAGGAAAAAGGAGGGAAGGACAGACATCAAGT
     GCCAGATGTTGTCTGAACTAATCGAGCACTTCTCACCAAACTTCATGTATAAATAAAATA
     CATATTTTTAAAACAAACCAATAAATGGCTTACATCAAAAAAAAAAAAAAAAAAAAAAAA
     GTCGTATCGATGT
     > 2291670 gnl|UG|Hs#S2291670 7e65f04.x1 Homo sapiens cDNA, 3' end
     TTTCATGTAAGCCATTTATTGGTTTGTTTTAAAAATATGTATTTTATTTATACATGAAGT
     TTGGTGAGAAGTGCTCGATTAGTTCAAACAACATCTGGCACTTGATGTCTGTCCTTCCCT
     CCTTTTTCCTACTCTCTTCTCCCCTCCTGCTGGTCATTGTGCAGTTCTGGAAATTAAAAA
     GGTGACAGCCAGGCTAAAAGCTAAGGGTTGGGTCTAGCTCACCTCCCACCCCCAACCACA
     CCGTCTGCAGCCAGCCCCAGGCACCTGTCTCAAAGCTCCCGGGCTGTCCACACACACAAA
     AACCACAGTCTCCTTCCGGCCAGCTGGGCTGGCAGCCCGACCTGCCTCCCAACCGCATTC
     CTGCCTGTGTAGCAGGCGGTGAGCACCCAGAAGGGGCACATACCTCTCCAAGCCTTGAAA
     GCAAAGCATGGAGATCTACAAAAATAGGATTTCCACTTGGAGAAATGTCGCTGGGACAGT

     AI217550 AI251192 AI254381 BE645079
     C:\seq\hs.4.fsa

  10. What is Discontiguous (Cross-species) MegaBLAST?

      W = 11, t = 16, coding:     1101101101101101
      W = 11, t = 16, non-coding: 1110010110110111
      W = 12, t = 16, coding:     1111101101101101
      W = 12, t = 16, non-coding: 1110110110110111
      W = 11, t = 18, coding:     101101100101101101
      W = 11, t = 18, non-coding: 111010010110010111
      W = 12, t = 18, coding:     101101101101101101
      W = 12, t = 18, non-coding: 111010110010110111
      W = 11, t = 21, coding:     100101100101100101101
      W = 11, t = 21, non-coding: 111010010100010010111
      W = 12, t = 21, coding:     100101101101100101101
      W = 12, t = 21, non-coding: 111010010110010010111

      Ma, B., Tromp, J., Li, M., "PatternHunter: faster and more sensitive homology search", Bioinformatics 2002 Mar;18(3):440-5
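The templates above can be read as masks: t is the template length, W is the number of 1s, and only positions marked 1 must match to seed an alignment. A minimal sketch of that reading follows; the helper name `spaced_seed` and the 16-base example windows are assumptions made for illustration, not MegaBLAST internals.

```python
# Coding template for W = 11, t = 16, copied from the slide.
CODING_W11_T16 = "1101101101101101"


def spaced_seed(window: str, template: str) -> str:
    """Keep only the bases of `window` that sit under a '1' in `template`."""
    if len(window) != len(template):
        raise ValueError("window and template must have the same length")
    return "".join(base for base, mask in zip(window, template) if mask == "1")


if __name__ == "__main__":
    template = CODING_W11_T16
    print(len(template), template.count("1"))  # t and W

    reference = "ATGGATCATGCTGAAG"
    # Same window with a substitution at the third base, a '0' (ignored) position.
    variant = "ATCGATCATGCTGAAG"

    # Both windows yield the same seed, so the mismatch does not block a hit.
    print(spaced_seed(reference, template))
    print(spaced_seed(variant, template))
```

In the coding templates the 0s fall mostly on every third position, which matches the idea that the third base of a codon is the most likely to differ between species.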

  11. Neighbors: Precomputed BLAST Nucleotide Protein Entrez Related Sequences produces a list of sequences sorted by BLAST score, but with no alignment details.

  12. Blink – Protein BLAST Alignments • Lists only 200 hits • List is nonredundant

  13. BLAST Databases: Non-redundant protein • nr (non-redundant protein sequences): GenBank CDS translations; NP_ RefSeqs; Outside Protein (PIR, Swiss-Prot, PRF); PDB (sequences from structures)

  14. BLAST Databases: Nucleic Acid • nr (nt): Traditional GenBank Divisions; NM_ and XM_ RefSeqs • dbest: EST Division • htgs: HTG division • gss: GSS division • chromosome: NC_ RefSeqs • wgs: whole genome shotgun

  15. Genomic BLAST • These pages provide customized nucleotide and protein databases for each genome • If a Map Viewer is available, the BLAST hits can be viewed on the maps

  16. What if Your Favorite Gene is not found in the latest genome build? POSSIBLE VARIANTS: • The gene does not exist; • It exists, but there is a problem with assembly; • It exists, but there is a problem with annotation

  17. An example: finding prestin in the human genome • We start with rat prestin, BLAST it against the human genome, and look for evidence that human prestin exists as well.

  18. >gi|12188917|emb|AJ303372.1|RNO303372 Rattus norvegicus
      ATGGATCATGCTGAAGAAAATGAAATTCCTGCAGAGATCAGAAGTACCTCGTGGAA
      GTCATCCGGTCCTCCAGGAGAGGCTGCACGTCAAGGACAAAGTCACAGACTCCATC
      GCAGGCATTCACGTGCACTCCTAAAAAAGTAAGAAACATCATCTACATGTTCTTGC
      TTGCCAGCATATAAATTCAAGGAGTATGTGCTGGGTGACTTGGTCTCGGGCATAAG
      AGCTCCCCCAAGGCTTAGCCTTCGCGATGCTGGCAGCTGTGCCTCCGGTGTTCGGC

      Searching the Human Genome • On for same species comparisons

  19. BLAST Results • Human Genome Database: 953 contigs, 2.9 billion letters • 16 hits to one contig

  20. Map Viewer: Genomic Context of BLAST Hits Genome Scan Models Contig GenBank Genes Mouse EST hits Human EST hits

  21. Human prestin: now appears in Build 34

  22. Now we can compare genes

  23. Three prestin genes: finally together!

  24. Same prestin, different assemblies

  25. Does homology imply a common biological function? • Not always; the existence of a common ancestor does not guarantee that a function has not been lost or acquired after divergence. An example: zeta-crystallin is a component of the transparent lens matrix of the vertebrate eye. Its homolog in E. coli is the metabolic enzyme quinone oxidoreductase.

  26. Text Entrez Sequence BLAST Structure VAST

  27. Structure similarity: No More BLASTing! • Three-dimensional structure is the most conserved feature during evolution; • One can still detect the existence of a common ancestor from structural similarity; • Spatial similarity is not calculated the same way as sequence similarity

  28. VAST: Structure Neighbors • Vector Alignment Search Tool • For each protein chain, locate SSEs (secondary structure elements), and represent them as individual vectors. [Figure: Human IL-4, SSEs labeled 1 to 6]
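A minimal sketch of what "represent SSEs as individual vectors" could mean geometrically: each element becomes a vector from its start point to its end point, and two elements can then be compared by the angle between them. The coordinates and the helper names `sse_vector` and `angle_degrees` are made up for illustration; this is not VAST's actual alignment or scoring procedure.

```python
import math
from typing import Tuple

Point = Tuple[float, float, float]


def sse_vector(start: Point, end: Point) -> Point:
    """Vector running from the first to the last residue of an SSE."""
    return (end[0] - start[0], end[1] - start[1], end[2] - start[2])


def angle_degrees(u: Point, v: Point) -> float:
    """Angle between two SSE vectors, in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cosine))


if __name__ == "__main__":
    # Hypothetical endpoints (in angstroms) for two helices.
    helix_1 = sse_vector((0.0, 0.0, 0.0), (10.0, 0.0, 0.0))
    helix_2 = sse_vector((12.0, 5.0, 0.0), (2.0, 5.0, 0.0))
    print(angle_degrees(helix_1, helix_2))  # antiparallel pair
```

Reducing each chain to a handful of vectors makes the comparison of two structures much smaller than a residue-by-residue comparison, which is one plausible reason for working with SSEs first.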

  29. VAST: Structure Neighbors

  30. Structure Neighbors in Cn3D C-Src kinase Human vs. Chicken SH3 SH2

  31. 3D Domain Neighbors Human C-Src Kinase (Tyr) vs. Chk1 kinase (Ser/Thr)

  32. NCBI is changing: from a sequence data storage facility to a one-stop shop with integrated databases of various kinds. You can be part of the future – work with us! Your expertise and data are indispensable.

  33. GenBank

  34. Refseq

  35. Entrez Gene

  36. Homologene database

  37. New generation of databases: an example

  38. Protein interaction database: a seed for future precomputed resources

  39. New databases: GenSAT

  40. PubChem

  41. Headache? Take Aspirin

  42. Aspirin has 432 neighbors

  43. Link to 3D protein structures

  44. PubCrawler – Update Alerting Service for PubMed and GenBank

  45. MedBlast: searching for articles related to a sequence.

  46. For More Information… • E-mail addresses: General Help: info@ncbi.nlm.nih.gov; BLAST: blast-help@ncbi.nlm.nih.gov • The (free!) NCBI Newsletter: http://www.ncbi.nih.gov/About/newsletter.html • The NCBI Handbook: follow the link from the NCBI Home Page • The NCBI Education Page: http://www.ncbi.nih.gov/Education/index.html
