
Module 3 Sequence and Protein Analysis (Using web-based tools)

Working with Pathogen Genomes - Uruguay 2008.

marcos

Presentation Transcript


  1. Module 3 Sequence and Protein Analysis (Using web-based tools) Working with Pathogen Genomes - Uruguay 2008

  2. Artemis & ACT: PSU projects. Organism → finished genome → annotated genome → database entry

  3. Annotation using Artemis: mapping domains in proteins

  4. Gene finders. Primary DNA sequence → preannotation (gene finders, BlastN, tRNAscan, BlastX, Dotter) → manual curation → repeats, rRNA, tRNA, pseudogenes, CDSs

  5. Gene finders. Primary DNA sequence → preannotation (gene finders, BlastN, tRNAscan, BlastX, Dotter) → manual curation → repeats, rRNA, tRNA, pseudogenes, CDSs. CDSs → FASTA, BlastP, Pfam, PROSITE, PSORT, SignalP, TMHMM → manual curation → annotated sequence
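The gene-finding step that produces candidate CDSs can be illustrated with a deliberately naive open reading frame (ORF) scan. This is a minimal sketch, not how the gene finders on the slide work (they use statistical models of coding sequence): it only reports forward-strand ATG-to-stop stretches in the three frames, using the standard genetic code's stop codons. The `find_orfs` name and the `min_codons` cut-off are hypothetical choices for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[int, int]]:
    """Return 0-based, end-exclusive (start, end) coordinates of ATG...stop
    open reading frames on the forward strand, in all three frames.

    min_codons counts codons including the stop codon; the default of 100
    is an arbitrary illustrative threshold (hypothetical)."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG after the previous stop opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # wait for the next ATG in this frame
    return sorted(orfs)


print(find_orfs("CCATGAAATTTTAGCC", min_codons=1))  # [(2, 14)]
```

A real scan would also cover the reverse complement, and every candidate would still go through the manual curation step shown on the slide.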

  6. Gene model annotation → protein function

  7. Annotation of protein-coding genes (from gene model to protein function): • Similarity search programs: local (BLAST) and global (FASTA) alignments, EST hits • Protein domains and motifs: InterPro (Pfam, PROSITE, SMART etc.) • Transmembrane / signal peptide prediction (TMHMM, SignalP, Phobius) • Base annotation on characterised proteins where possible (manually curated Swiss-Prot entries) • Read the literature (PubMed) • Use several lines of evidence!
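TMHMM and Phobius predict transmembrane segments with hidden Markov models. The signal they exploit, long runs of hydrophobic residues, can be sketched with a much cruder Kyte-Doolittle hydropathy window. This is a swapped-in illustration, not TMHMM's method; the 19-residue window and 1.6 threshold are the values commonly quoted from Kyte and Doolittle for spotting membrane-spanning segments, the sketch assumes only the 20 standard one-letter residues (anything else raises `KeyError`), and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 19) -> list[float]:
    """Mean hydropathy of each full-length window, indexed by window start."""
    values = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def candidate_tm_windows(protein: str, window: int = 19,
                         threshold: float = 1.6) -> list[int]:
    """Window start positions whose mean hydropathy exceeds the threshold."""
    return [i for i, score in enumerate(hydropathy_profile(protein, window))
            if score > threshold]


# A hypothetical test protein: a poly-leucine stretch flanked by lysines.
print(candidate_tm_windows("KKKKK" + "L" * 20 + "KKKKK"))
```

A window flagged this way is only a hint; signal peptides are also hydrophobic, which is one reason tools such as Phobius model both together.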

  8. Annotation of non-protein-coding genes (tRNAs, rRNAs, snRNAs, other ncRNAs). Structural conservation of ncRNAs! • Initial searches: BlastN, GC plots, tRNAscan, snoscan, others • Search in specialised databases: Rfam scan, microRNAdb etc. • Comparative ncRNA prediction tools: RNAz, EvoFold, QRNA etc. • Structure prediction of ncRNAs: Mfold, others • Use several lines of evidence!
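The GC plots listed among the initial searches can be computed with a simple sliding window. This is a minimal sketch, assuming a plain DNA string; the window and step sizes are arbitrary illustrative defaults and `gc_content_windows` is a hypothetical name. A window whose GC content departs from the genome background is only a candidate region and still needs the other lines of evidence on the slide.

```python
def gc_content_windows(seq: str, window: int = 100,
                       step: int = 10) -> list[tuple[int, float]]:
    """Return (window_start, GC fraction) pairs along the sequence.

    Characters other than G and C (including N) count as non-GC."""
    seq = seq.upper()
    points = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = (chunk.count("G") + chunk.count("C")) / window
        points.append((start, gc))
    return points


print(gc_content_windows("GGGGAAAA", window=4, step=4))  # [(0, 1.0), (4, 0.0)]
```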

  9. Statistical significance of database hits: E-values (expectation values). E-value = the number of alignments with an equivalent or higher score that you would expect to find by random chance in a search of that size. An E-value of 5 means that you would expect 5 alignments with an equivalent or higher score to occur by random chance. E-values are more reliable than % identity. Caution: repeat regions / non-curated protein sequences.
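Why the E-value depends on the size of the search can be made concrete with the Karlin-Altschul relation used by BLAST, written in terms of the normalised bit score S': E = m · n · 2^(-S'), where m is the query length and n the total database length. This is a simplified sketch: real BLAST uses edge-corrected effective lengths, so its reported E-values will differ somewhat. The helper names are hypothetical.

```python
import math


def evalue_from_bitscore(bit_score: float, query_length: int,
                         database_length: int) -> float:
    """Expected number of chance hits scoring >= bit_score: E = m * n * 2**(-S')."""
    return query_length * database_length * 2.0 ** (-bit_score)


def pvalue_from_evalue(evalue: float) -> float:
    """Probability of at least one chance hit when E hits are expected (Poisson)."""
    return 1.0 - math.exp(-evalue)


# The same bit score is expected twice as often by chance in a database
# twice as large, so its E-value doubles.
small_db = evalue_from_bitscore(40.0, 300, 1_000_000_000)
large_db = evalue_from_bitscore(40.0, 300, 2_000_000_000)
print(large_db / small_db)  # 2.0
```

For small E-values the two measures nearly coincide (P ≈ E); for large ones, such as the E-value of 5 above, P approaches 1.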

  10. Sequence similarity searching with BLAST (Basic Local Alignment Search Tool). Nucleotide queries: blastn: nucleotide query vs nucleotide database; blastx: nucleotide query translated in all 6 frames, compared to a protein database; tblastx: nucleotide query translated in all 6 frames vs a nucleotide database translated in all 6 frames. Protein queries: blastp: protein query vs protein database; tblastn: protein query vs a nucleotide database translated in all 6 frames. FASTA: provides sequence similarity and homology searching against nucleotide and protein databases using the FASTA programs. FASTA can be very specific when identifying long regions of low similarity, especially for highly diverged sequences.
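blastx and tblastx both depend on translating nucleotide sequence in all six reading frames (three on each strand). A minimal sketch of that step, assuming the standard genetic code (NCBI translation table 1) and an A/C/G/T sequence; codons containing other characters become `X`. Frame labels follow one common convention (+1 to +3 on the given strand, -1 to -3 on the reverse complement), which may not match every tool's numbering.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate codon by codon; '*' marks stops, 'X' unknown codons."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_translations(seq: str) -> dict[str, str]:
    reverse = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


for frame, protein in six_frame_translations("ATGGCCTAA").items():
    print(frame, protein)
```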

  11. FASTA (Global) BLAST (Local)
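The global/local distinction on this slide can be illustrated with the two classic dynamic-programming algorithms: Needleman-Wunsch (global: every residue of both sequences must be aligned) and Smith-Waterman (local: only the best-scoring subregion counts, because cell scores are floored at zero). This sketch computes scores only, with a linear gap penalty; the scoring values are arbitrary assumptions, and it is not how BLAST or FASTA are implemented (both use fast heuristics).

```python
def alignment_score(a: str, b: str, local: bool,
                    match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch (local=False) or Smith-Waterman (local=True) score."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    if not local:
        # Global: leading gaps are penalised, so the borders accumulate gap costs.
        for i in range(1, rows):
            H[i][0] = i * gap
        for j in range(1, cols):
            H[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diagonal, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if local:
                score = max(score, 0)  # Local: a poorly scoring prefix is dropped.
                best = max(best, score)
            H[i][j] = score
    # Global alignments must end in the bottom-right cell; local ones anywhere.
    return best if local else H[-1][-1]


print(alignment_score("AAAGCTAAA", "GCT", local=False))  # -9
print(alignment_score("AAAGCTAAA", "GCT", local=True))   # 3
```

The short query matches perfectly inside the longer sequence, so the local score is high, while the global score is dragged down by the six unavoidable end gaps.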

  12. Orthologues and paralogues. Human hemoglobin vs mouse hemoglobin: orthologues, which originate from speciation and usually have similar functions. Human hemoglobin vs human myoglobin: paralogues, which originate from gene duplication and often have diverged functions. Best tool to look for orthologues? BLAST or FASTA? FASTA!

  13. Functional assignment: alignments of modular proteins [figure: alignments of proteins built from domains A, B and C]

  14. HMMs WHAAAAT??? A hidden Markov model (HMM) is a statistical model in which the system being modelled is assumed to be a Markov process whose states are hidden: only outputs that depend on those states are observed, and the challenge is to infer the hidden states (and the model parameters) from the observable outputs. The fitted model can then be used for further analysis, for example in pattern-recognition applications. An HMM can be considered the simplest dynamic Bayesian network.
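The core computation when using an HMM is summing over every hidden state path that could have produced an observed sequence; the forward algorithm does this efficiently, one observation at a time. A minimal sketch with a hypothetical two-state toy model ("GC-rich" vs "AT-rich" DNA regions); all probabilities below are invented for illustration.

```python
def forward(observations, states, start_p, trans_p, emit_p):
    """Return P(observations | model), summed over all hidden state paths."""
    # alpha[s] = P(observations so far, current hidden state = s)
    alpha = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    for symbol in observations[1:]:
        alpha = {
            s: emit_p[s][symbol] * sum(alpha[r] * trans_p[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())


# Hypothetical toy model: two hidden region types emitting nucleotides.
STATES = ("GC", "AT")
START = {"GC": 0.5, "AT": 0.5}
TRANS = {"GC": {"GC": 0.9, "AT": 0.1}, "AT": {"GC": 0.1, "AT": 0.9}}
EMIT = {
    "GC": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "AT": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
}

# Probabilities of all 16 possible two-letter sequences must add up to 1.
total = sum(forward(a + b, STATES, START, TRANS, EMIT)
            for a in "ACGT" for b in "ACGT")
print(round(total, 10))  # 1.0
```

The hidden states are never observed directly; the model only says how likely each observed nucleotide string is once all paths are accounted for.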

  15. Aligned sequences ..HMPLKHRLHP.., ..RMPLKHRPHP.., ..GMRLKHRHHP.., ..PMGLKHAGHP.. → motif ..-MPLKHR-HP.. → profile HMM for the aligned motif, which can be used to search databases for proteins containing this motif
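Building a profile from the four aligned sequences on this slide can be sketched by estimating, for each alignment column, the probability of every amino acid (with a pseudocount so unseen residues are not impossible) and scoring a candidate by log-odds against a uniform background. Note that this keeps only the match-state emissions, so it is really a position-specific scoring matrix; a full profile HMM, as built by tools such as HMMER, also has insert and delete states with their own transition probabilities. The pseudocount and background values are illustrative assumptions.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MOTIF_ALIGNMENT = ["HMPLKHRLHP", "RMPLKHRPHP", "GMRLKHRHHP", "PMGLKHAGHP"]


def build_profile(alignment: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Per-column residue probabilities with additive pseudocounts."""
    profile = []
    for pos in range(len(alignment[0])):
        column = [seq[pos] for seq in alignment]
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        profile.append({aa: (column.count(aa) + pseudocount) / total
                        for aa in AMINO_ACIDS})
    return profile


def log_odds_score(profile: list[dict[str, float]], seq: str,
                   background: float = 1 / 20) -> float:
    """Sum of log2(P(residue | column) / P(residue | background))."""
    return sum(math.log2(profile[i][aa] / background) for i, aa in enumerate(seq))


profile = build_profile(MOTIF_ALIGNMENT)
# A sequence resembling the motif outscores an unrelated one.
print(log_odds_score(profile, "RMPLKHRFHP") > log_odds_score(profile, "AAAAAAAAAA"))  # True
```

A positive score means the sequence fits the profile better than random background; scanning a database means scoring every window of every sequence this way and ranking the hits.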

  16. ..-MPLKHR-HP.. → create HMM → search database with HMM → remote homology detection (hits such as ..RMPLKHRFHP.., ..PMPLKHRIHP.., ..HMPLKHDVHP.., ..YMDLKHELHP..). Search methods: • FASTA • BLAST • PSI-BLAST • HMM searches • HMM-HMM comparison: HHpred server http://toolkit.tuebingen.mpg.de/hhpred

  17. Input protein sequence → PSI-BLAST alignment and secondary structure prediction → HMM building → HMM-HMM comparison with secondary structure comparison → extremely sensitive remote homology detection → 3D structure modelling

  18. Module 3 exercises. Section A: • Sequence retrieval of a P. falciparum protein (cyclophilin) using SRS • BLAST and FASTA searches by cutting & pasting the sequence. Section B: • Exercise 1 Part I: search the PROSITE server by cutting & pasting the cyclophilin sequence • Exercise 1 Part II: Pfam server • Exercise 1 Part III: SMART server • Exercise 1 Part IV: InterPro server • Exercise 2: sequence retrieval of the P. falciparum PFC0125w protein using SRS; TMHMM v2.0 server; SignalP v3.0 server. Section C: • Other web resources
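For Exercise 1 Part I it helps to know how PROSITE patterns read. The syntax (elements separated by `-`, `x` for any residue, `[...]` for allowed residues, `{...}` for forbidden ones, `(n)` or `(n,m)` for repeats, `<` and `>` for N- and C-terminal anchors) maps directly onto regular expressions. A minimal converter sketch, assuming well-formed patterns; it does not handle `>` used inside brackets (e.g. `[G>]`), `prosite_to_regex` is a hypothetical helper rather than part of the PROSITE tools, and the example pattern is illustrative.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE pattern to a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):  # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):  # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        repeat = ""
        match = re.fullmatch(r"(.+?)\((\d+(?:,\d+)?)\)", element)
        if match:  # e.g. x(4) or x(2,4)
            element, repeat = match.group(1), "{" + match.group(2) + "}"
        if element == "x":
            core = "."
        elif element.startswith("{"):  # forbidden residues
            core = "[^" + element[1:-1] + "]"
        else:  # a single residue or an allowed set [..], same syntax in regex
            core = element
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


regex = prosite_to_regex("[AC]-x-V-x(4)-{ED}.")
print(regex)                               # [AC].V.{4}[^ED]
print(bool(re.search(regex, "AGVAAAAK")))  # True
```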
