
Bioinformatics II

Stephen Tsui, Biochemistry



  1. Bioinformatics II Stephen Tsui Biochemistry

  2. Sequence Alignment

  3. NCBI BLAST sequence alignment http://www.ncbi.nlm.nih.gov/BLAST/ Handbook

  4. NCBI BLAST sequence alignment http://www.ncbi.nlm.nih.gov/BLAST/
     • Different types of algorithms:
       BLASTn – BLAST search for nucleotide sequences
       BLASTp – BLAST search for protein sequences
       BLASTx – translates a nucleotide sequence and searches it against a protein database
       Bl2seq – BLAST alignment of two sequences
       BLAST for short and highly similar sequences
     Example: p53 mRNA; p53 protein
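The translation step behind BLASTx can be sketched in a few lines of Python. This is only an illustration of six-frame translation with the standard genetic code, not NCBI's implementation; the function names are hypothetical, and codons containing ambiguous bases are simply reported as "X".

```python
from itertools import product

# Standard genetic code, written in the usual T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Translate both strands in all three reading frames."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translations("ATGGCC").items():
        print(frame, protein)
```

Each of the six translated products would then be compared against the protein database, which is why a BLASTx hit is reported together with its reading frame.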

  5. Sequence Alignment
     DNA sequence alignment – identities only
     Protein sequence alignment – identities and similarities
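The difference between identities and similarities can be made concrete with a small counting sketch. The residue groups below are an assumption chosen for illustration only; BLAST itself counts a pair as a "positive" when its score in the chosen scoring matrix is positive, and the function name is hypothetical.

```python
# Hypothetical similarity groups, for illustration only. BLAST derives
# "positives" from the scoring matrix rather than from fixed groups.
SIMILAR_GROUPS = ["ILVM", "FYW", "KR", "DE", "NQ", "ST"]
GROUP_OF = {aa: idx for idx, group in enumerate(SIMILAR_GROUPS) for aa in group}


def compare_aligned(a, b, protein=True):
    """Return (identities, positives) for two aligned, equal-length strings.

    Columns containing a gap ('-') are skipped. For DNA (protein=False)
    only identities count, so both numbers are the same.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    identities = positives = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        if x == y:
            identities += 1
            positives += 1
        elif protein and x in GROUP_OF and GROUP_OF[x] == GROUP_OF.get(y):
            positives += 1
    return identities, positives


if __name__ == "__main__":
    print(compare_aligned("NKEVT", "QKEVS"))               # protein
    print(compare_aligned("ACGT", "ACGA", protein=False))  # DNA
```

In the protein example, N/Q and T/S are not identical but fall in the same assumed group, so they add to the positives without adding to the identities.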

  6. Parameters of Sequence Alignment
     Word size – the length of the short subsequences ("words") used to seed the search; for proteins the default is three letters. BLAST first builds a look-up table of all the words in the query sequence together with their "neighboring words", i.e., similar words. The sequence database is then scanned for these "hot spots". When a match is found, it is used to initiate gap-free and then gapped extensions of the word.
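The look-up and scan steps can be sketched as follows. This is a deliberately simplified illustration and not BLAST's actual code: it indexes exact words only (no "neighboring words"), and it extends a hit only while residues are identical, whereas BLAST scores the extension with a matrix. All function names are hypothetical.

```python
from collections import defaultdict


def build_word_table(query, word_size=3):
    """Map every word of length word_size in the query to its start positions."""
    table = defaultdict(list)
    for i in range(len(query) - word_size + 1):
        table[query[i:i + word_size]].append(i)
    return table


def find_hot_spots(query, subject, word_size=3):
    """Scan the subject and return (query_pos, subject_pos) for each word hit."""
    table = build_word_table(query, word_size)
    hits = []
    for j in range(len(subject) - word_size + 1):
        for i in table.get(subject[j:j + word_size], []):
            hits.append((i, j))
    return hits


def extend_ungapped(query, subject, i, j, word_size=3):
    """Extend a hit in both directions without gaps while residues match.

    Returns (query_start, subject_start, length) of the extended segment.
    """
    start_q, start_s = i, j
    while start_q > 0 and start_s > 0 and query[start_q - 1] == subject[start_s - 1]:
        start_q -= 1
        start_s -= 1
    end_q, end_s = i + word_size, j + word_size
    while end_q < len(query) and end_s < len(subject) and query[end_q] == subject[end_s]:
        end_q += 1
        end_s += 1
    return start_q, start_s, end_q - start_q


if __name__ == "__main__":
    query, subject = "ACDEFGH", "WWCDEFWW"
    for i, j in find_hot_spots(query, subject):
        print((i, j), extend_ungapped(query, subject, i, j))
```

A smaller word size produces more hot spots and therefore a more sensitive but slower search; a larger word size does the opposite.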

  7. Parameters of Sequence Alignment
     Gap – a space introduced into an alignment to compensate for insertions and deletions in one sequence relative to another.
     Gap penalty – to prevent the accumulation of too many gaps in an alignment, introducing a gap deducts a fixed amount (the gap score) from the alignment score. Extending the gap to cover additional nucleotides or amino acids is also penalized. Decreasing the gap penalty makes it more likely that gaps are incorporated into the alignment.
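The effect of the two gap parameters can be seen by scoring one fixed alignment under different settings. The sketch below assumes that a gap of length k costs gap_open + k * gap_extend; the match, mismatch and gap values are illustrative choices, not a statement of any program's defaults, and the function name is hypothetical.

```python
def score_alignment(a, b, match=1, mismatch=-3, gap_open=5, gap_extend=2):
    """Score two aligned strings ('-' marks a gap) with affine gap costs.

    Assumed convention: a gap of length k costs gap_open + k * gap_extend.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    prev_gap_row = None  # row that held a gap in the previous column, if any
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            row = "a" if x == "-" else "b"
            if row != prev_gap_row:
                score -= gap_open  # a new gap starts here
            score -= gap_extend    # every gap column pays the extension cost
            prev_gap_row = row
        else:
            score += match if x == y else mismatch
            prev_gap_row = None
    return score


if __name__ == "__main__":
    a, b = "ACGTACGT", "ACG--CGT"
    print(score_alignment(a, b))                            # stricter gap costs
    print(score_alignment(a, b, gap_open=2, gap_extend=1))  # more lenient gap costs
```

With the stricter settings the two-base gap outweighs the six matches and the alignment scores below zero; with the more lenient settings the same gapped alignment scores above zero, which is why lowering the gap penalty lets more gaps into reported alignments.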

  8. Parameters of Sequence Alignment
     Scoring matrices – the BLOSUM62 matrix is the default for most BLAST programs. Other matrices include BLOSUM45, BLOSUM80, PAM30 and PAM70.

  9. EBI
     • Other sequence alignment tools http://www.ebi.ac.uk/services/
       FASTA – can be very specific when identifying long regions of low similarity, especially for highly diverged sequences.
       MPsrch – uses an exhaustive algorithm, which is recognized as the most sensitive sequence comparison method available. As a consequence, MPsrch can identify hits in cases where BLAST and FASTA fail, and it also reports fewer false-positive hits.

  10. EBI Multiple sequence alignment tools
     • ClustalW http://www.ebi.ac.uk/clustalw/index.html
     Example
     >human_CRIP
     MPKCPKCNKEVYFAERVTSLGKDWHRPCLKCEKCGKTLTSGGHAEHEGKPYCNHPCYAAMFGPKGFGRGGAESHTFK
     >mouse_CRIP
     MPKCPKCDKEVYFAERVTSLGKDWHRPCLKCEKCGKTLTSGGHAEHEGKPYCNHPCYSAMFGPKGFGRGGAESHTFK
     >rat_CRIP
     MPKCPKCDKEVYFAERVTSLGKDWHRPCLKCEKCGKTLTSGGHAEHEGKPYCNHPCYSAMFGPKGFGRGGAESHTFK
     >frog_CRIP
     MPKCPKCQKEVYFAERVSSLGKDWHRPCLKCEKCSKTLTPGSHAEHEGKPYCNQPCYGALFGPKGFGRGGTESHSYK
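A multiple alignment is usually read column by column, and ClustalW-style output marks fully conserved columns with "*". The four CRIP sequences in this example all have the same length (77 residues), so a rough conservation line can be sketched without computing any gaps. This is an illustration only and does not run ClustalW; the function name is hypothetical.

```python
# The four example sequences from the slide, stacked without gaps.
SEQS = {
    "human_CRIP": "MPKCPKCNKEVYFAERVTSLGKDWHRPCLKCEKCGKTLTSGGHAEHEGKPYCNHPCYAAMFGPKGFGRGGAESHTFK",
    "mouse_CRIP": "MPKCPKCDKEVYFAERVTSLGKDWHRPCLKCEKCGKTLTSGGHAEHEGKPYCNHPCYSAMFGPKGFGRGGAESHTFK",
    "rat_CRIP": "MPKCPKCDKEVYFAERVTSLGKDWHRPCLKCEKCGKTLTSGGHAEHEGKPYCNHPCYSAMFGPKGFGRGGAESHTFK",
    "frog_CRIP": "MPKCPKCQKEVYFAERVSSLGKDWHRPCLKCEKCSKTLTPGSHAEHEGKPYCNQPCYGALFGPKGFGRGGTESHSYK",
}


def conservation_line(seqs):
    """Return '*' for columns where all sequences agree and ' ' elsewhere."""
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) != 1:
        raise ValueError("this sketch needs equal-length (pre-aligned) sequences")
    return "".join(
        "*" if len(set(column)) == 1 else " " for column in zip(*seqs.values())
    )


if __name__ == "__main__":
    for name, seq in SEQS.items():
        print(f"{name:<12}{seq}")
    print(" " * 12 + conservation_line(SEQS))
```

Sequences of different lengths would first need a real alignment step, which is the part ClustalW provides.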

  11. Structure Visualization
     Download structures from http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Structure and view them with free software, e.g.
     Cn3D - http://www.ncbi.nlm.nih.gov/Structure/CN3D/cn3d.shtml
     Rasmol - http://www.umass.edu/microbio/rasmol/
     Swiss-PDBviewer - http://au.expasy.org/spdbv/
     Example: Hepatitis B virus core protein

  12. Structure Prediction

  13. Secondary Structure Prediction
     Example: insulin receptor analyzed by PSIPRED http://bioinf.cs.ucl.ac.uk/psipred/psiform.html

  14. Tertiary Structure Analysis Tools
     • SWISS-MODEL – a fully automated protein structure homology-modeling server http://swissmodel.expasy.org/
     • SWISS-PDBViewer – an application with a user-friendly interface for analyzing several proteins at the same time. The proteins can be superimposed in order to deduce structural alignments and compare their active sites or any other relevant parts. http://ca.expasy.org/spdbv/

  15. Tertiary Structure Prediction
     BioInfoBank Meta Server – offers a gateway to well-benchmarked protein structure and function prediction methods. http://bioinfo.pl/meta/
     Examples: 3D-PSSM, mGenThreader

  16. EBI Building phylogenetic trees
     • Phylip algorithm http://www.ebi.ac.uk/clustalw/index.html
     Example
     >human_CRIP
     MPKCPKCNKEVYFAERVTSLGKDWHRPCLKCEKCGKTLTSGGHAEHEGKPYCNHPCYAAMFGPKGFGRGGAESHTFK
     >mouse_CRIP
     MPKCPKCDKEVYFAERVTSLGKDWHRPCLKCEKCGKTLTSGGHAEHEGKPYCNHPCYSAMFGPKGFGRGGAESHTFK
     >rat_CRIP
     MPKCPKCDKEVYFAERVTSLGKDWHRPCLKCEKCGKTLTSGGHAEHEGKPYCNHPCYSAMFGPKGFGRGGAESHTFK
     >frog_CRIP
     MPKCPKCQKEVYFAERVSSLGKDWHRPCLKCEKCSKTLTPGSHAEHEGKPYCNQPCYGALFGPKGFGRGGTESHSYK
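To give a feel for how a tree can be derived from sequences like these, the sketch below clusters the four CRIP sequences by average linkage (UPGMA) on simple p-distances. This is a stand-in chosen for brevity and is not the method the EBI server or PHYLIP applies; it also relies on the four sequences having equal length, so no alignment step is needed. All function names are hypothetical.

```python
from itertools import combinations

SEQS = {
    "human_CRIP": "MPKCPKCNKEVYFAERVTSLGKDWHRPCLKCEKCGKTLTSGGHAEHEGKPYCNHPCYAAMFGPKGFGRGGAESHTFK",
    "mouse_CRIP": "MPKCPKCDKEVYFAERVTSLGKDWHRPCLKCEKCGKTLTSGGHAEHEGKPYCNHPCYSAMFGPKGFGRGGAESHTFK",
    "rat_CRIP": "MPKCPKCDKEVYFAERVTSLGKDWHRPCLKCEKCGKTLTSGGHAEHEGKPYCNHPCYSAMFGPKGFGRGGAESHTFK",
    "frog_CRIP": "MPKCPKCQKEVYFAERVSSLGKDWHRPCLKCEKCSKTLTPGSHAEHEGKPYCNQPCYGALFGPKGFGRGGTESHSYK",
}


def p_distance(a, b):
    """Fraction of positions that differ between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(seqs):
    """Build a nested-tuple tree by repeatedly merging the closest clusters."""
    # Each cluster is a pair: (subtree, list of member names).
    clusters = [(name, [name]) for name in seqs]
    dist = {
        frozenset(pair): p_distance(seqs[pair[0]], seqs[pair[1]])
        for pair in combinations(seqs, 2)
    }

    def average_distance(c1, c2):
        total = sum(dist[frozenset((m, n))] for m in c1[1] for n in c2[1])
        return total / (len(c1[1]) * len(c2[1]))

    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


if __name__ == "__main__":
    print(upgma(SEQS))
```

On these sequences the mouse and rat entries are identical and merge first, the human sequence joins them next, and the frog sequence is the most distant branch.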

  17. Other Resources

  18. ExPASy Proteomics Server
     Proteomics tools – proteomics is a new and specialized area that demands extra effort in order to get satisfactory results.
     Mascot - http://www.matrixscience.com/
     Example: 1011.48 1373.65 1567.84 1589.81 1711.78 2400.31 2851.43

  19. ExPASy Proteomics Server
     Metabolic Pathways
     • Roche Applied Science’s Biochemical Pathways http://ca.expasy.org/tools/pathways/
     • Enzyme http://ca.expasy.org/enzyme/
     Example: Glucose-6-phosphate dehydrogenase

  20. Software for Building Phylogenetic Trees http://evolution.genetics.washington.edu/phylip/software.html
     In total, 274 phylogeny packages and 32 free servers are listed. Two of the most versatile and popular are PAUP and PHYLIP.

  21. Protein Functional Analysis
     • InterProScan – searches for protein domains using multiple algorithms. Identifying protein domains has important implications for protein function. http://www.ebi.ac.uk/InterProScan/
     Example: p53 protein

  22. Patent Search Engine
     • Patent abstracts http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-page+query+-libList+PATABS
     Example: The authentication of herbal Chinese medicines discovered by Prof. Shaw Pang-Chui

  23. Microarray Databases
     • ArrayExpress – a public repository for microarray data http://www.ebi.ac.uk/arrayexpress/
     Example: FHL2
