Bioinformatics II

Bioinformatics II Stephen Tsui Biochemistry

Sequence Alignment

NCBI BLAST sequence alignment http://www.ncbi.nlm.nih.gov/BLAST/ Handbook

NCBI BLAST sequence alignment http://www.ncbi.nlm.nih.gov/BLAST/
• Different types of algorithms:
  BLASTn – BLAST search for nucleotide sequences
  BLASTp – BLAST search for protein sequences
  BLASTx – translates a nucleotide sequence to search against a protein database
  Bl2seq – BLAST search for two sequences
  BLAST for short and highly similar sequences
Example: p53 mRNA; p53 protein

Sequence Alignment
DNA sequence alignment – reports identities only.
Protein sequence alignment – reports identities and similarities (different but chemically similar residues).
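The difference between identities and similarities can be sketched in a few lines of Python. This is a minimal illustration only: the function name `count_matches` and the residue groups in `SIMILAR_GROUPS` are assumptions made for this example, not BLAST's actual scoring, which uses a substitution matrix such as BLOSUM62. The two sequences are assumed to be already aligned and of equal length.

```python
# Toy groups of chemically similar amino acids. This grouping is an
# assumption for illustration, not a real substitution matrix.
SIMILAR_GROUPS = ["ILVM", "FYW", "KR", "DE", "ST"]


def count_matches(seq_a, seq_b, protein=True):
    """Count identical and similar positions in two pre-aligned sequences.

    Returns (identities, similarities). Similarities exclude identities
    and are only counted when protein=True.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    identities = 0
    similarities = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # gap columns are neither identical nor similar
        if a == b:
            identities += 1
        elif protein and any(a in g and b in g for g in SIMILAR_GROUPS):
            similarities += 1
    return identities, similarities


# Protein: K/R and S/T count as similar, E and V are identical.
print(count_matches("KEVS", "REVT"))
# DNA: only identities are counted.
print(count_matches("ACGT", "ACGA", protein=False))
```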

Parameters of Sequence Alignment
Word size – the length of the short subsequences ("words") used to seed the search; for proteins the default is three letters. BLAST works by first making a look-up table of all the "words" in the query sequence and their "neighboring words", i.e., similar words. The sequence database is then scanned for these "hot spots". When a match is identified, it is used to initiate gap-free and gapped extensions of the "word".
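The look-up table and database scan can be sketched as follows. This is a simplified illustration, not BLAST's implementation: it indexes exact words only and leaves out the "neighboring words" (which need a scoring matrix) and the extension step. The function names `build_word_table` and `find_seeds` are hypothetical.

```python
from collections import defaultdict


def build_word_table(query, word_size=3):
    """Map every word of length word_size in the query to its start positions."""
    table = defaultdict(list)
    for i in range(len(query) - word_size + 1):
        table[query[i:i + word_size]].append(i)
    return table


def find_seeds(query, subject, word_size=3):
    """Scan a database sequence for exact word hits ("hot spots").

    Returns a list of (query_pos, subject_pos, word) tuples. Each hit
    would be the starting point for an extension in a real search.
    """
    table = build_word_table(query, word_size)
    seeds = []
    for j in range(len(subject) - word_size + 1):
        word = subject[j:j + word_size]
        for i in table.get(word, []):
            seeds.append((i, j, word))
    return seeds


# The word "KCP" occurs at position 2 in both sequences.
print(find_seeds("MPKCPK", "AAKCPAA"))
```

A larger word size gives fewer, more specific seeds; a smaller one gives more seeds and a slower but more sensitive scan.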

Parameters of Sequence Alignment
A gap is a space introduced into an alignment to compensate for insertions and deletions in one sequence relative to another.
Gap penalty – to prevent the accumulation of too many gaps in an alignment, introducing a gap deducts a fixed amount (the gap score) from the alignment score. Extending the gap to encompass additional nucleotides or amino acids is also penalized. Decreasing the gap penalty increases the chance that gaps are incorporated into the alignment.
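The effect of the gap penalty on the alignment score can be sketched as below. The function name `score_alignment` and all numeric values (match, mismatch, gap opening and gap extension costs) are assumptions chosen for illustration, not BLAST defaults. In this sketch the first position of a gap costs `gap_open` and every further position costs `gap_extend`; adjacent gaps in either sequence are treated as one run.

```python
def score_alignment(seq_a, seq_b, match=1, mismatch=-1, gap_open=5, gap_extend=2):
    """Score two pre-aligned sequences in which '-' marks a gap position."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    in_gap = False
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            # Opening a gap costs more than extending an existing one.
            score -= gap_extend if in_gap else gap_open
            in_gap = True
        else:
            score += match if a == b else mismatch
            in_gap = False
    return score


# Same alignment, two penalty settings: the two-position gap costs
# 5 + 2 under the first setting and 1 + 1 under the second.
print(score_alignment("AC--GT", "ACTTGT", gap_open=5, gap_extend=2))
print(score_alignment("AC--GT", "ACTTGT", gap_open=1, gap_extend=1))
```

With the lower penalties the gapped alignment keeps a positive score, which is why decreasing the gap penalty makes gapped alignments more likely to be reported.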

Parameters of Sequence Alignment
Scoring matrices
The BLOSUM62 matrix is the default for most BLAST programs. Other matrices include BLOSUM45, BLOSUM80, PAM30 and PAM70.

EBI
• Other sequence alignment tools http://www.ebi.ac.uk/services/
  FASTA – can be very specific when identifying long regions of low similarity, especially for highly diverged sequences.
  MPsrch – uses an exhaustive algorithm, which is recognized as the most sensitive sequence comparison method available. As a consequence, MPsrch can identify hits in cases where BLAST and FASTA fail, and it also reports fewer false-positive hits.

EBI Multiple sequence alignment tools
• ClustalW http://www.ebi.ac.uk/clustalw/index.html
Example
>human_CRIP
MPKCPKCNKEVYFAERVTSLGKDWHRPCLKCEKCGKTLTSGGHAEHEGKPYCNHPCYAAMFGPKGFGRGGAESHTFK
>mouse_CRIP
MPKCPKCDKEVYFAERVTSLGKDWHRPCLKCEKCGKTLTSGGHAEHEGKPYCNHPCYSAMFGPKGFGRGGAESHTFK
>rat_CRIP
MPKCPKCDKEVYFAERVTSLGKDWHRPCLKCEKCGKTLTSGGHAEHEGKPYCNHPCYSAMFGPKGFGRGGAESHTFK
>frog_CRIP
MPKCPKCQKEVYFAERVSSLGKDWHRPCLKCEKCSKTLTPGSHAEHEGKPYCNQPCYGALFGPKGFGRGGTESHSYK
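To get a feel for what a multiple alignment of these four CRIP sequences shows, the sketch below parses the FASTA example and marks fully conserved columns with "*", in the style of the ClustalW conservation line. It does not perform an alignment: it assumes the four sequences line up without gaps because they all have the same length, which is a simplification. The helper names `parse_fasta` and `conservation_line` are hypothetical.

```python
FASTA_TEXT = """
>human_CRIP
MPKCPKCNKEVYFAERVTSLGKDWHRPCLKCEKCGKTLTSGGHAEHEGKPYCNHPCYAAMFGPKGFGRGGAESHTFK
>mouse_CRIP
MPKCPKCDKEVYFAERVTSLGKDWHRPCLKCEKCGKTLTSGGHAEHEGKPYCNHPCYSAMFGPKGFGRGGAESHTFK
>rat_CRIP
MPKCPKCDKEVYFAERVTSLGKDWHRPCLKCEKCGKTLTSGGHAEHEGKPYCNHPCYSAMFGPKGFGRGGAESHTFK
>frog_CRIP
MPKCPKCQKEVYFAERVSSLGKDWHRPCLKCEKCSKTLTPGSHAEHEGKPYCNQPCYGALFGPKGFGRGGTESHSYK
"""


def parse_fasta(text):
    """Parse FASTA text into a dict of {name: sequence}."""
    records = {}
    name = None
    for line in text.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def conservation_line(seqs):
    """Return '*' for columns where all sequences agree, ' ' otherwise."""
    return "".join("*" if len(set(col)) == 1 else " " for col in zip(*seqs))


records = parse_fasta(FASTA_TEXT)
for name, seq in records.items():
    print(f"{name:<12}{seq}")
print(f"{'':<12}{conservation_line(list(records.values()))}")
```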

Structure Visualization
Download the structures from http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Structure and view them with free software, e.g.
  Cn3D - http://www.ncbi.nlm.nih.gov/Structure/CN3D/cn3d.shtml
  Rasmol - http://www.umass.edu/microbio/rasmol/
  Swiss-PDBviewer - http://au.expasy.org/spdbv/
Example: Hepatitis B virus core protein

Structure Prediction

Secondary Structure Prediction
Example: insulin receptor analyzed by PSIPRED http://bioinf.cs.ucl.ac.uk/psipred/psiform.html

Tertiary Structure Analysis Tools
• SWISS-MODEL – a fully automated protein structure homology-modeling server http://swissmodel.expasy.org/
• SWISS-PDBViewer – an application with a user-friendly interface for analyzing several proteins at the same time. The proteins can be superimposed in order to deduce structural alignments and compare their active sites or any other relevant parts. http://ca.expasy.org/spdbv/

Tertiary Structure Prediction
BioInfoBank Meta Server – offers a gateway to well-benchmarked protein structure and function prediction methods. http://bioinfo.pl/meta/
Examples: 3D-PSSM, mGenThreader

EBI Building phylogenetic trees
• Phylip algorithm http://www.ebi.ac.uk/clustalw/index.html
Example
>human_CRIP
MPKCPKCNKEVYFAERVTSLGKDWHRPCLKCEKCGKTLTSGGHAEHEGKPYCNHPCYAAMFGPKGFGRGGAESHTFK
>mouse_CRIP
MPKCPKCDKEVYFAERVTSLGKDWHRPCLKCEKCGKTLTSGGHAEHEGKPYCNHPCYSAMFGPKGFGRGGAESHTFK
>rat_CRIP
MPKCPKCDKEVYFAERVTSLGKDWHRPCLKCEKCGKTLTSGGHAEHEGKPYCNHPCYSAMFGPKGFGRGGAESHTFK
>frog_CRIP
MPKCPKCQKEVYFAERVSSLGKDWHRPCLKCEKCSKTLTPGSHAEHEGKPYCNQPCYGALFGPKGFGRGGTESHSYK

Other Resources

ExPASy Proteomics Server
Proteomics tools – proteomics is a new and specialized area that demands extra effort to obtain satisfactory results.
Mascot - http://www.matrixscience.com/
Example: 1011.48 1373.65 1567.84 1589.81 1711.78 2400.31 2851.43

ExPASy Proteomics Server
Metabolic Pathways
• Roche Applied Science’s Biochemical Pathways http://ca.expasy.org/tools/pathways/
• Enzyme http://ca.expasy.org/enzyme/
Example: Glucose-6-phosphate dehydrogenase

Software for Building Phylogenetic Trees http://evolution.genetics.washington.edu/phylip/software.html
In total, 274 phylogeny packages and 32 free servers are listed. Two of the most versatile and popular are PAUP and PHYLIP.

Protein Functional Analysis
• InterProScan – searches for protein domains using multiple algorithms. Identifying a protein's domains has major implications for understanding its functions. http://www.ebi.ac.uk/InterProScan/
Example: p53 protein

Patent Search Engine
• Patent abstracts http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-page+query+-libList+PATABS
Example: The authentication of herbal Chinese medicines discovered by Prof. Shaw Pang-Chui

Microarray Databases
• ArrayExpress – a public repository for microarray data http://www.ebi.ac.uk/arrayexpress/
Example: FHL2
