Bioinformatics and Protein Sequence Analysis

Surabhi Agarwal

With the sequencing of a large number of proteins and the subsequent storage of the data, it has become easier for researchers to study proteins. These studies provide preliminary insights into the structural and functional aspects of proteins without conducting experiments.

Master Layout (Part 1)

This animation consists of 2 parts:
Part 1: Protein sequence alignment
Part 2: Alignment analysis and interpretations

Figure labels:
• Extract the newly determined amino acid sequence for your query peptide.
• Assess the significance of the result with its alignment score (Seq 1, Seq 2, Seq 3).

Definitions of the components: Part 1 – Protein sequence alignment

• Query Peptide: The unknown protein or peptide that is provided as input to the sequence analysis server. The sequence of this protein is determined before it is compared with other proteins for similarity.
• Relevant Algorithm: The sequence of logical steps used to compare the query peptide with other given protein sequences. The nature of the query ("Local" or "Global", "Pair-wise alignment" or "Multiple Sequence Alignment") determines the algorithm that is used.
• Local Alignment: "Local" alignment matches individual blocks of the protein sequences rather than their full length; a block ends where extending it no longer improves the match. The aim is to find the longest, best-matching blocks of similarity in the aligned sequences.
• Global Alignment: "Global" alignment is an end-to-end alignment of two or more sequences, in which gaps are introduced wherever they are needed to line the sequences up over their full length.
• Pair-wise sequence alignment: This procedure compares and aligns two given sequences. The comparison can be either global or local, and the quality of the alignment is judged by the alignment score.
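The difference between global and local alignment can be illustrated with a small dynamic-programming sketch. It is not the code behind any particular server; the scoring values (match +1, mismatch -1, linear gap -2) and the function names are assumptions chosen for readability, whereas real tools use a substitution matrix and affine gap costs.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch style score: both sequences aligned end to end."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def local_score(a, b, match=1, mismatch=-1, gap=-2):
    """Smith-Waterman style score: best-matching pair of sub-sequences."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below 0, so a poor region ends the block
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


if __name__ == "__main__":
    # Toy peptides: only the central "SLR" block is shared
    print(global_score("WWWSLRKK", "AAASLRGG"))
    print(local_score("WWWSLRKK", "AAASLRGG"))
```

The local score rewards only the shared block, while the global score also pays for the dissimilar ends, which is why the two modes suit different kinds of query.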

Definitions of the components: Part 1 – Protein sequence alignment

• Multiple Sequence Alignment: The end-to-end alignment of several given sequences that are provided to the search engine. Multiple alignment tends to introduce as few gaps as possible and finds regions of similarity shared by all the given sequences.
• Word-length: The minimum length of an amino acid sequence that needs to match exactly in order to initiate an alignment, which is then extended in either direction. The sensitivity and speed of the alignment depend on the word length provided by the user.
• Scoring Matrix: The matrix of values that is referred to when assigning a score to each aligned pair of residues. The matrix used for a BLAST search is selected according to the type of sequences one is searching with. The two common families are the PAM series and the BLOSUM series.
• PAM: PAM stands for Point Accepted Mutation. It is a log-odds scoring matrix constructed from the amino acid replacements observed in a set of closely related proteins. The PAM value defines the percentage of mutations that have been accepted in a given set of proteins: 1 PAM corresponds to a change at an average of 1% of amino acid positions.
• BLOSUM: This stands for "BLOcks SUbstitution Matrix" and is constructed from a set of distantly related proteins. BLOSUM provides a comprehensive biological insight into proteins when the evolutionary distance is not known beforehand. It is based on the relative frequencies of amino acid residues and the probabilities of their substitution in highly conserved blocks of residues in proteins that are evolutionarily distant.
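The word-length definition above can be sketched as a search for exact shared words ("seeds") between two sequences. This follows the simplified description given in these slides; the actual BLAST protein search also accepts similar, non-identical words, which this sketch does not model. The function names are hypothetical, and the two peptides are the opening residues of the abridged alignment shown in Step 2.

```python
from collections import defaultdict


def word_index(seq, k=3):
    """Map every word of length k to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def exact_seeds(query, subject, k=3):
    """Return (word, query_pos, subject_pos) for every exactly shared word."""
    subject_index = word_index(subject, k)
    seeds = []
    for i in range(len(query) - k + 1):
        word = query[i:i + k]
        for j in subject_index.get(word, []):
            seeds.append((word, i, j))
    return seeds


if __name__ == "__main__":
    seq1 = "LELEAQSLRRIAFFGVAMSTVATFVCIITVP"
    seq2 = "IAQETESLRKVAFFGIAVSTIATLTAIIAVP"
    for seed in exact_seeds(seq1, seq2, k=3):
        print(seed)
```

A larger word size yields fewer seeds, so the search runs faster but can miss weaker similarities; a smaller word size has the opposite effect.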

Definitions of the components: Part 1 – Protein sequence alignment

• Threshold: The threshold is a measure of the statistical significance of the results of an alignment study. It represents the expected number of matches that are allowed to occur by chance.
• Gap Penalty and Gap Extension: In an alignment of two or more given protein sequences, a gap is introduced where residues in one sequence have no counterpart in the other. The "Gap penalty" is the deduction from the overall alignment score when a gap is introduced, while the "Gap Extension" is the deduction for extending an already existing gap.
• Alignment Score: This is reported as the Bit Score and provides a comparative quantification of the quality of the alignment. The score increases with a higher number of residue matches and a lower number of mismatches. The alignment with the higher bit score is the better match.
• Percentage Identity: The percentage of amino acid residues that are identical when two sequences are compared.
• E-value: The E-value quantifies how likely it is that an alignment between two or more sequences arose by chance rather than being a biologically significant match. For a similarity search against a database, this value depends on the size of the database against which the sequence is compared. The closer the E-value is to zero, the higher the biological significance of the match.
• Hit: Each result of a search is called a "Hit", and the "best Hit" is the best result for that particular query.
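Two of the definitions above reduce to simple arithmetic, sketched here under stated assumptions. The gap cost assumes the affine convention in which a gap of k residues costs existence + k × extension; the percentage identity assumes the full alignment length as the denominator, although tools differ on this. Both function names are hypothetical.

```python
def gap_cost(length, existence=11, extension=1):
    """Deduction for one gap of `length` residues (assumed affine scheme)."""
    if length <= 0:
        return 0
    return existence + extension * length


def percent_identity(row_a, row_b):
    """Percentage of alignment columns holding the same residue in both rows."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    identical = sum(1 for x, y in zip(row_a, row_b) if x == y and x != "-")
    return 100.0 * identical / len(row_a)


if __name__ == "__main__":
    print(gap_cost(1))    # opening a single-residue gap
    print(gap_cost(10))   # one ten-residue gap
    print(percent_identity("ACD-", "ACDE"))
```

Under this scheme one long gap is much cheaper than many short ones, which reflects the idea that a single insertion or deletion event can span several residues.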

Step 1: Pair-wise sequence alignment for two given sequences - INPUT

Input panel (SEQUENCE DATABASE; ALIGNMENT ALGORITHM (BLAST)):
• Enter sequence 1: >gi|268576797|ref|XP_002643378.1| C. briggsae CBR-COL-186 protein [Caenorhabditis briggsae] MKSTEKKSTELDLELEAQSLRRIAFFGVAMSTVATFVCIITVPLAYNKMQQMQSNMIDQYMASARGIRVA …
• Enter sequence 2: >gi|6682|emb|CAA35955.1| collagen [Caenorhabditis elegans] MSEDLKQIAQETESLRKVAFFGIAVSTIATLTAIIAVPMLYNYMQHVQSSLQSEVEFCQHRSNGLWDEYK …
• Word Size: 3. Length of the initial set of amino acids that needs to be matched before alignment begins.
• Threshold: 10. Expected number of matches that are allowed to occur by chance.
• Gap penalty: Existence 11, Extension 1. Values deducted from the overall alignment score on introduction and extension of gaps.
• Scoring Matrix: BLOSUM62 (options: PAM30, BLOSUM62). The reference matrix used to assign scores to matches of residues.

Action: Schematic of the process of pair-wise alignment.

Description of the action: Follow the animation steps. Re-draw all figures. Show all definitions first by highlighting the parameter. Follow it with the input of the 2 sequences and the parameter values one by one. The drop-down after the scoring matrix should look like the drop-down lists seen on web pages. Click on the drop-down and show the BLOSUM62 matrix getting selected. Click on the BLAST tool.

Audio narration: Alignment algorithms are computer algorithms which take the 2 protein sequences and align them residue by residue. Here we depict an alignment between 2 given sequences. To align two sequences, enter them in the input boxes. We took the example of the CBR-COL-186 protein of Caenorhabditis briggsae and the collagen of Caenorhabditis elegans. The sequences are abridged for the purpose of the animation. To carry out the exact study, users can download the sequences corresponding to the Gene ID. Enter the parameters according to the nature of the query and the purpose of the search, and finally click on the BLAST tool.

Step 2: Pair-wise sequence alignment for two given sequences - OUTPUT

Outputs shown:
• BIT SCORE: 77.4 bits. Bit scores are the normalized scores obtained from the raw scores, based on the scoring matrix used in the algorithm.
• DOT-PLOT (Sequence 1 against Sequence 2): A graphical visualization of the two given sequences that shows approximate overlaps and so identifies regions of close similarity.
• PERCENTAGE IDENTITY: 34%. The percentage of residues which were identical in the two sequences.
• E-VALUE: 6e-19. The statistical measure of the biological significance. The closer the E-value is to 0, the higher the biological significance.
• ALIGNMENT: Shows the match or mismatch between each of the residues.

Sequence 1  LELEAQSLRRIAFFGVAMSTVATFVCIITVPLAYNKMQQMQSNMIDQYMASARGIRVARR
            +  E +SLR++AFFG+A+ST+AT   II VP+ YN MQ +QS++  +
Sequence 2  IAQETESLRKVAFFGIAVSTIATLTAIIAVPMLYNYMQHVQSSLQSE----------VEF

Gaps are introduced in sequence 2 due to the lack of similar residues in sequence 1.

Action: Shows the various output formats for pair-wise alignment.

Description of the action: Show the smaller image of the server with every output and definition coming out of it one at a time, as shown in the PowerPoint animation.

Audio narration: Pair-wise alignment with the help of the BLOSUM62 matrix gives various kinds of results. These are the alignment, alignment score, dot-plot, percentage identity and E-value. The raw score from the BLOSUM62 matrix is 189 and from the PAM30 matrix is 178. The bit score for the alignment of the exact same study is 77.4 using BLOSUM62 and 78.7 using PAM30. Therefore, the bit scores give a uniform and normalized measure of the overall quality of the alignment irrespective of the scoring system. The biological significance of this result is very high, as the E-value is very near to 0. For a more detailed study of the types of BLAST tools available, visit http://blast.ncbi.nlm.nih.gov/Blast.cgi

Source: http://blast.ncbi.nlm.nih.gov/Blast.cgi
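The normalization mentioned in the narration can be sketched with the standard Karlin-Altschul form, S' = (λS − ln K) / ln 2. The statistical parameters λ = 0.267 and K = 0.041 are values commonly published for BLOSUM62 with gap costs of existence 11 and extension 1; they do not appear in the slides and are an assumption here, as is the function name.

```python
import math


def bit_score(raw_score, lam, k):
    """Convert a raw alignment score to a bit score: (lam * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


if __name__ == "__main__":
    # Raw score 189 is the BLOSUM62 value quoted in the narration;
    # lam and k are assumed, commonly published BLOSUM62 (11/1) parameters.
    print(round(bit_score(189, lam=0.267, k=0.041), 1))
```

Because λ and K are specific to each scoring system, raw scores from different matrices (such as 189 and 178 here) map onto a common scale.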

Step 3: Pair-wise alignment of sequences against database - INPUT

Input panel (SEQUENCE DATABASE; ALIGNMENT ALGORITHM (BLAST)):
• Enter sequence 1: MSEDLKQIAQETESLRKVAFFGIAVSTIATLTAIIAVPMLYNYMQHVQSSLQSEVEFCQHRSNGLWDEYKRFQGVSGVEGRIKRDAYHRSLGVSGASRKARRQSYGNDAAVGGFGGSSGGSCCSCGSGAAGPAGSPGQDGAPGNDGAPGAPGNPGQDASEDQTAGPDSFCFDCPAGPPGPSGAPGQKGPSGAPGAPGQSGGAALPGPPGP
• Word Size: 3
• Threshold: 10
• Gap penalty: Existence 11, Extension 1
• SELECT DATABASE: PROTEIN (options: PROTEIN, NUCLEOTIDE, GENE, PROTEOME, GEO, EST, SNP)
• Scoring Matrix: PAM30 (options: PAM30, BLOSUM62)

Action: Schematic of the process of pair-wise alignment.

Description of the action: Follow the animation steps. Re-draw all figures. Show all definitions first by highlighting the parameter. Follow it with the input of 1 sequence. The drop-downs after "Select Database" and "Scoring Matrix" should look like the drop-down lists seen on web pages. Select "Protein" in the "Select Database" options box as shown in the animation. Follow this by inputting the parameter values one by one. Click on the drop-down against "Scoring Matrix" and show the PAM30 matrix. Click on the BLAST tool.

Audio narration: Alignment can also be done by matching a sequence against a related database of sequences in order to identify it. Input the unknown sequence, and then select the database against which the sequence is to be matched. Fill in the parameter values according to the purpose of the search and the nature of the query sequence. In this case we study the hits using the PAM30 scoring matrix. Click on the BLAST tool once all parameters have been entered.

Step 4: Pair-wise alignment of sequences against database - OUTPUT

Input panel as shown in the server image (SEQUENCE DATABASE; ALIGNMENT ALGORITHM (BLAST)):
• Enter sequence 1: MPSSVSWGILLLAGLCCLVPVSLAEDPQGDAAQKTDTSHHDQDHPTFNKITPNLAEFAFSLYRQLAHQSNSTNIFFSPVSIATAFAML
• Word Size: 3
• Threshold: 10
• Gap penalty: Existence 11, Extension 1
• SELECT DATABASE: PROTEIN, NUCLEOTIDE, GENE, PROTEOME, GEO, EST, SNP
• Scoring Matrix: BLOSUM (options: PAM, BLOSUM)

Outputs shown:
• IDENTIFICATION: GENE ID: 179452 col-13 | Collagen [Caenorhabditis elegans]. Identifies the protein sequence and the source organism for the unknown sequence.
• Percentage Identity: 100%. Percentage of residues exactly matching in the query sequence and the selected hit. The alignment shows 100% matching with the identified sequence.
• TOTAL SCORE: 624 bits. Measure of the quality of the alignment when compared to the bit scores of other hits of the search.
• E-Value: 1e-176. In the case of database searches, the E-value is found by multiplying the pair-wise E-value by the number of sequences in the database.
• Domain Identified (if any): The query is scanned to find domains from the Pfam database. If such a domain is identified, it is shown as part of the result. Pfam ID: pfam01484; Domain Name: Col_cuticle_N; Description: Nematode cuticle collagen N-terminal domain. (Domain graphic labels: 17, 69; ruler ticks: 1, 50, 100, 150, 200, 250, 300.)
• ALIGNMENT:

Query     MSEDLKQIAQETESLRKVAFFGIAVSTIATLTAIIAVPMLYNYMQHVQSSLQSEVEFCQH
          MSEDLKQIAQETESLRKVAFFGIAVSTIATLTAIIAVPMLYNYMQHVQSSLQSEVEFCQH
Database  MSEDLKQIAQETESLRKVAFFGIAVSTIATLTAIIAVPMLYNYMQHVQSSLQSEVEFCQH

Action: Shows the various output formats for pair-wise alignment.

Description of the action: Show the smaller image of the server with every output and definition coming out of it one at a time, as shown in the PowerPoint animation.

Audio narration: Pair-wise alignment gives various kinds of results. These are alignment views, alignment score, dot-plot, E-value and percentage identity, amongst many others. When compared to the bit scores of the other hits in the result, the bit score turns out to be the highest for collagen proteins in Caenorhabditis elegans.

Sources: http://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html; http://pfam.sanger.ac.uk/
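The dependence of the E-value on database size can be sketched with the usual relation E = m × n × 2^(−S'), where S' is the bit score, m the query length and n the total length of the database. This sketch ignores the effective-length corrections that real BLAST applies, and the lengths used below are hypothetical illustration values rather than figures from the slides.

```python
def e_value(bit_score, query_length, database_length):
    """Expected number of chance hits scoring at least `bit_score`."""
    return query_length * database_length * 2.0 ** (-bit_score)


if __name__ == "__main__":
    query_length = 300            # hypothetical query length in residues
    small_db = 1_000_000          # hypothetical database sizes in residues
    large_db = 100_000_000
    print(e_value(77.4, query_length, small_db))
    print(e_value(77.4, query_length, large_db))
```

The same bit score becomes less significant as the search space grows, which is why E-values from searches against different databases are not directly comparable.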

Step 5: Multiple Sequence Alignment - INPUT

Input panel (SEQUENCE DATABASE; MULTIPLE SEQUENCE ALIGNMENT (CLUSTAL-W); ADD MORE SEQUENCES):
• Enter sequence 1: >gi|268574584|ref|XP_002642271.1| Hypothetical protein CBG18259 [Caenorhabditis briggsae] MDEKQRLQAYRFVAYSAVTFSTVAVFSLCITLPLVYNYVDGIKTQINHEIKFCKHSARDIFAEVNHIRANPKNASRFARQAGYGTDEAVSGGS
• Enter sequence 2: >gi|32565788|ref|NP_871711.1| COLlagen family member (col-96) [Caenorhabditis elegans] MDEITRRNAYRFVAYSAVTFSVVAVFSLCITLPMVYNYVHGIKSQINHQISFCKHSARDIFSEVNHIRASPNNATLREKRQAGDCSGCCL
• Enter sequence 3: >gi|17559060|ref|NP_505677.1| COLlagen family member (col-13) [Caenorhabditis elegans] MSEDLKQIAQETESLRKVAFFGIAVSTIATLTAIIAVPMLYNYMQHVQSSLQSEVEFCQHRSNGLWDEYKRFQGVSGVEGRIKRDAYH
• Word Size: 3. The word size is the length of the initial seed set of amino acids, which needs to match exactly for the alignment to be extended in both directions.
• Window length: 10. The window length is the number of residues on either side of the initial matched sequence up to which the alignment will be extended.
• Gap penalty: Existence 11, Extension 1
• Score type: ABSOLUTE (options: ABSOLUTE, PERCENTAGE). Users can choose to see absolute scores for comparison or the percentage value of the scores.

Action: Schematic of the process of multiple sequence alignment.

Description of the action: Follow the animation steps. Enter the first 2 sequences. Click on "Add more sequences". Open the 3rd input box for entering the 3rd sequence. Show the input of the 3rd sequence. Show the input of the parameters. Select "Absolute" in the "Score Type" drop-down. The drop-down after the scoring matrix should look like the drop-down lists seen on web pages.

Audio narration: Multiple Sequence Alignment tools are used to compare the amino acid sequences of more than two proteins. The word size is the length of the seed set of amino acids, which needs to match exactly for the alignment to be extended in both directions. The window length is the number of residues on either side up to which the alignment will be extended. The gap penalty and extension hold the same meaning as in pair-wise alignment. For the scores, users can choose to see absolute scores for comparison or the percentage value of the scores.

Step 6: Multiple Sequence Alignment - OUTPUT

Input panel as shown in the server image (SEQUENCE DATABASE; MULTIPLE SEQUENCE ALIGNMENT (CLUSTAL-W)):
• Enter sequence 1: MPSSVSWGILLLAGLCCLVPVSLAEDPQGDAAQKTDTSHHDQDHPTFNKITP
• Enter sequence 2: MKLLKLTGFIFFLFFLTESLTLPTQPRDIENFNSTQKFIEDNIEYITIIAFAQYVQEA
• Word Size: 3
• Threshold: 10
• Gap penalty: Existence 11, Extension 1
• Scoring Matrix: BLOSUM

Outputs shown:
• MULTIPLE SEQUENCE ALIGNMENT: Text alignment of the query sequences.

sequence 1  MDE-----KQRLQAYRFVAYSAVTFSTVAVFSLCITLPLVYNYVDGIKTQ
sequence 2  MDE-----ITRRNAYRFVAYSAVTFSVVAVFSLCITLPMVYNYVHGIKSQ
sequence 3  MSEDLKQIAQETESLRKVAFFGIAVSTIATLTAIIAVPMLYNYMQHVQSS

• COLOR CODED ALIGNMENT (Sequence 1, Sequence 2, Sequence 3): Color coded alignment of the query sequences, with a mapping of colors to amino acid groups.
• ALIGNMENT SCORE: 5269. Alignment score which can be compared with other scores to measure the quality of the alignment.

Action: Shows the various output formats for multiple sequence alignment.

Description of the action: Show the smaller image of the server with every output coming out of it one at a time.

Audio narration: Multiple sequence alignment gives various kinds of results. The alignment view in text format displays the residue-wise matching for the input sequences. The color coded alignment gives a better graphical picture, as the amino acid residues are assigned colors based on their physico-chemical properties. Here we depict one of the many color codings available. The alignment score is an absolute term, as selected previously. It can be compared with other scores to measure the quality of the alignment. Users obtain an .output file with the summary of the result, .aln files which contain the text alignment and .dnd files which contain the distance-based information. For a detailed understanding of these outputs, kindly visit http://www.ebi.ac.uk/Tools/clustalw2/index.html

Source: http://www.ebi.ac.uk/Tools/es/cgi-bin/clustalw2/
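The idea behind the color coded view can be sketched by grouping residues by physico-chemical class and marking each column of the text alignment. The four groups below are an assumption modeled on one common Clustal-style coloring and are only one of many possible schemes; the marks ("*" for identical, ":" for same group) are likewise a simplification of what the server prints.

```python
# Assumed residue groups; other servers use different schemes
GROUPS = {
    "hydrophobic": "AVFPMILW",
    "acidic": "DE",
    "basic": "RK",
    "polar": "STYHCNGQ",
}
GROUP_OF = {aa: name for name, members in GROUPS.items() for aa in members}


def column_summary(rows):
    """Return one mark per column: '*' identical, ':' same group, ' ' otherwise."""
    marks = []
    for column in zip(*rows):
        groups = {GROUP_OF.get(aa) for aa in column}
        if "-" in column:
            marks.append(" ")
        elif len(set(column)) == 1:
            marks.append("*")
        elif len(groups) == 1 and None not in groups:
            marks.append(":")
        else:
            marks.append(" ")
    return "".join(marks)


if __name__ == "__main__":
    rows = [
        "MDE-----KQRLQAYRFVAYSAVTFSTVAVFSLCITLPLVYNYVDGIKTQ",
        "MDE-----ITRRNAYRFVAYSAVTFSVVAVFSLCITLPMVYNYVHGIKSQ",
        "MSEDLKQIAQETESLRKVAFFGIAVSTIATLTAIIAVPMLYNYMQHVQSS",
    ]
    for row in rows:
        print(row)
    print(column_summary(rows))
```

Columns marked ":" hold different residues of the same class, which is the kind of conservation that a color coded view makes visible at a glance.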

Master Layout (Part 2)

This animation consists of 2 parts:
Part 1: Protein sequence alignment
Part 2: Alignment analysis and interpretations

Figure labels:
• Phylogram representing evolutionary relationships
• Structural features that decide function
• Protein secondary structures

Definitions of the components: Part 2 – Alignment analysis and interpretations

• Computational Phylogenetic Predictions: Sequence alignment studies of proteins can reveal the conserved and variable residues between the sequences. Protein sequences that are derived from different organisms but have a high degree of similarity are assumed to come from the same ancestor. Such predictions, which can now be carried out computationally with the help of various algorithms, provide an insight into evolutionary processes.
• Phylogram: A phylogram is a pictorial representation that provides a visualization of evolutionary relationships, or phylogeny. In it, the lengths of the branches in the tree are proportional to the evolutionary distance.
• Cladogram: A cladogram is another form of pictorial representation that also gives a visual insight into evolutionary relationships, or phylogeny. Unlike in the phylogram, the branches of a cladogram are of equal length irrespective of the evolutionary distance.
• Maximum Parsimony: A method used for alignments which show very strong sequence similarity. It favors the tree that requires the fewest changes and is usually applied to fewer than twelve sequences.

Definitions of the components: Part 2 – Alignment analysis and interpretations

• Distance methods: These predict the evolutionary distance when some sequence variation is present, and can be used on a large number of sequences. As the distance between two sequences increases, the uncertainty of the alignment also increases.
• Maximum likelihood: This method is useful for predicting the evolutionary distance when sequence variability is high. It can be used for alignments with any amount of variability.
• Protein structure prediction: The three-dimensional structure of a protein is largely specified by its amino acid sequence. Protein structure, in particular secondary structure, can be predicted from the sequence with an accuracy of about 70-75%.
• Functional annotation: The function(s) of a protein can be predicted when it has well-described homologs. Gene Ontology terms (GO terms) provide a unique identification of the function that the gene is involved in. These functions are categorized at different levels of a functional hierarchy.
• Protein motif: A common pattern of residues in a set of protein sequences is known as a motif.
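The starting point of a distance method can be sketched as a table of pairwise distances computed from an alignment. The measure used here, the fraction of differing residues over columns where neither sequence has a gap (often called the p-distance), is one simple choice among several. The three toy sequences and the function names are hypothetical.

```python
def p_distance(row_a, row_b):
    """Fraction of differing residues over columns without a gap."""
    pairs = [(x, y) for x, y in zip(row_a, row_b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(alignment):
    """Return {(name_a, name_b): distance} for every pair of aligned rows."""
    names = list(alignment)
    return {
        (p, q): p_distance(alignment[p], alignment[q])
        for p in names
        for q in names
    }


if __name__ == "__main__":
    # Hypothetical toy alignment, already aligned to equal length
    toy = {
        "seqA": "MKWVTFISLL",
        "seqB": "MKWVTFLLLL",
        "seqC": "MKAVTYLSLL",
    }
    for pair, distance in distance_matrix(toy).items():
        print(pair, round(distance, 2))
```

Tree-building programs then cluster the sequences so that branch lengths reproduce these distances as closely as possible.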

Step 1: Phylogenetic analysis from alignment - INPUT

Input panel (SEQUENCE DATABASE; PHYLOGENETIC ANALYSIS (PHYLIP)):
• Enter a sequence alignment for 2 or more sequences:

Seq1 -------------- LLFLFSSAYSRGVFRRDTHK
Seq2 MKWVTFISLLFLFSSAYSRGVFRRDAH
Seq3 MKWVTFLLLLFVSGSAFSRGVFRREA

• Select a method: MAXIMUM PARSIMONY (options, with the description shown on hover):
  MAXIMUM PARSIMONY: used for sequences with highly conserved residues
  DISTANCE METHODS: used for sequences with moderately conserved residues
  MAXIMUM LIKELIHOOD: used for sequences with highly variable residues

Action: Schematic of the process of analysis of alignment.

Description of the action: Follow the animation steps. Show the description of each of the methods as the mouse hovers over them. Finally select the "Maximum Parsimony" method. The drop-down after the scoring matrix should look like the drop-down lists seen on web pages.

Audio narration: Multiple sequence alignment produces alignment files (.aln), which can be used to determine the evolutionary distances of a set of given protein sequences. This can be achieved by many server-based and stand-alone programs. The user needs to select the method for calculating the distance. Here we depict the usage of alignment files for phylogenetic analysis.

Step 2: Phylogenetic analysis from alignment - OUTPUT

Input panel as shown in the server image (SEQUENCE DATABASE; PHYLOGENETIC ANALYSIS (PHYLIP)):
• Enter a sequence alignment for 2 or more sequences: PGFPPLVAPEPDALCAAFQDN PNLPRLVRPEVDVMCTAFHDN PKLK-PDPNTLCDEFKADEKKF
• Select a method: MAXIMUM PARSIMONY

Outputs shown:
• DND FILES: ( seq 1:0.13525, Seq 2:0.09868, seq 3:0.09868); DND files give the distance measure of the aligned sequences from their common ancestral node.
• CLADOGRAM: Branching diagram depicting evolutionary relationships or phylogeny.
• PHYLOGRAM: A phylogram is a branching diagram depicting evolutionary relationships or phylogeny. In it, the lengths of the branches in the tree are proportional to the evolutionary distance.

Action: Schematic of the process of analysis of alignment.

Description of the action: Follow the animation steps. The server on the previous slide gives the following outputs.

Audio narration: The outputs from the analysis are the distance file, known as the DND file, and the cladogram and phylogram, which are evolutionary trees. In the DND file, there is a common node. The values against each sequence are its distance from the common node. Cladograms are the graphical representation of the branching during evolution of the proteins that were aligned. Cladograms do not represent the evolutionary distances or the common ancestral node. Phylograms also represent the evolutionary distance tree in a graphical format. In these, the branch lengths correspond to the evolutionary distance between the two proteins. All branches converge to a common ancestral root.
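The DND text shown on this slide is a parenthesized tree description with a branch length after each sequence name, and reading it can be sketched as follows. This sketch handles only a flat tree like the one on the slide, where every sequence hangs directly off the common node; nested trees would need a full tree parser. The function names are hypothetical.

```python
import re


def leaf_branch_lengths(tree_text):
    """Return {leaf name: distance from its parent node} from a DND string."""
    lengths = {}
    # A leaf entry looks like "name:length"; names may contain spaces
    for name, length in re.findall(r"([^(),:;]+):([0-9.eE+-]+)", tree_text):
        lengths[name.strip()] = float(length)
    return lengths


def path_distance(lengths, leaf_a, leaf_b):
    """Distance between two leaves of a flat tree via the common node."""
    return lengths[leaf_a] + lengths[leaf_b]


if __name__ == "__main__":
    dnd = "( seq 1:0.13525, Seq 2:0.09868, seq 3:0.09868);"
    lengths = leaf_branch_lengths(dnd)
    print(lengths)
    print(round(path_distance(lengths, "seq 1", "Seq 2"), 5))
```

In a phylogram these values set the drawn length of each branch, while a cladogram would draw all three branches the same length.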

Step 3: Structural and Functional prediction from alignment - INPUT

Input panel (SEQUENCE DATABASE; Structural and Functional prediction (MEME server)):
• Enter a sequence alignment for 2 or more sequences:

Seq 1 PGFPPLVAPEPDALCAAFQDN
Seq 2 PNLPRLVRPEVDVMCTAFHDN
Seq 3 PKLK-PDPNTLCDEFKADEKKF

• Range for width of the motifs to be found: 6-50
• Maximum number of motifs to be found: 3

Action: Schematic of the process for structural and functional analysis.

Description of the action: Follow the animation steps. Input the alignment. Input the parameters. Click on the server tool.

Audio narration: Alignment files can also be used for a variety of structural and functional analyses. Here we represent the functioning of such programs and servers by taking a simple example of protein motif prediction. The range of the width and the maximum number of motifs to be found are defined by the user.

Source: http://meme.sdsc.edu/meme4_4_0/intro.html

Step 4: Structural and Functional prediction from alignment - OUTPUT

Input panel as shown in the server image (SEQUENCE DATABASE; Structural and Functional prediction (MEME server)):
• Enter a sequence alignment for 2 or more sequences: PGFPPLVAPEPDALCAAFQDN PNLPRLVRPEVDVMCTAFHDN PKLK-PDPNTLCDEFKADEKKF
• Range for width of the motifs to be found: 6-50
• Maximum number of motifs to be found: 3

Outputs shown:
• Color coded block diagram for motifs: The block diagram of motif prediction is the schematic used to visualize the positions and kinds of motifs in the alignment of two or more sequences.
• Residue-wise sites for motifs: The color coded diagram shows the positions of the motifs in the text alignment of the compared sequences.

Action: Schematic of the process for structural and functional analysis.

Description of the action: Follow the animation steps. The server on the previous slide gives the following outputs.

Audio narration: The outputs obtained are:
1. Block diagram of protein motifs, which is the schematic used to visualize the positions and kinds of motifs in the alignment of two or more sequences. The color coding varies from server to server.
2. Sites of the blocks on a residue-by-residue basis.

Source: http://meme.sdsc.edu/meme4_4_0/intro.html

Step 5: Structural and Functional prediction from alignment - Further Analysis

Figure labels (pie chart quarters):
• Enzyme active sites (Subtilisin)
• Epitope prediction in antigens
• Finding trans-membrane domains
• Identify DNA binding residues

Action: Functions that can be predicted from sequence data.

Description of the action: The animator needs to re-draw all the images shown, as they have been retrieved from web resources. Show the pie chart. Highlight one quarter of it at a time and depict the diagram next to it while narrating it.

Audio narration: Once the protein motifs are detected, they can be used for further analysis, such as:
1. Epitope prediction
2. Active site determination
3. Determination of trans-membrane domains
4. Identification of DNA binding residues

Sources: http://qwickstep.com/search/the-active-site-of-an-enzyme.html, http://www.science.uva.nl/research/its/molsim/research/TMsignalling_lizhe/index.html, https://www.uzh.ch/oci/ssl-dir/group/files/14_roverview.jpg, http://medgadget.com/archives/2008/03/3d_imaging_of_bleomycindna_binding.html

Interactivity option 1: Find the evolutionary distance between insulin chain A of human and mouse

Tabs (listed here in the correct order; shown to the user in mixed order):
1. Input the term "insulin chain A" in the protein database of your choice.
2. Choose the protein sequences corresponding to insulin A.
3. Check the source organism for the protein sequence.
4. Store the FASTA sequences mentioned against human and mouse in separate locations.
5. Input the two sequences in a multiple alignment server.
6. Run the server to obtain output.
7. Check for the .aln file and input it into programs for finding phylogenetic distances, such as PHYLIP.
8. Check the .dnd file to find the evolutionary distance.

Results: All the tabs must be arranged in the right order.

Interactivity type: Arrange the steps in the order to be performed.

Options: Remove the step number from the bottom of the tab. Show all the steps in mixed order. The user must click on the tabs in order. If the user clicks on a tab which is not in the right order, flash a message saying "try again".

Interactivity option 2.a: Match the following

Tiles shown (mixed order):
• SIMILARITY BASED SCORING MATRIX
• PAM MATRIX
• EVOLUTIONARY TREE
• DOMAIN IDENTIFICATION
• MEASURE OF BIOLOGICAL SIGNIFICANCE
• PHYLOGRAM
• DISTANCE BASED SCORING MATRIX
• BIT SCORE
• MEASURE OF QUALITY OF ALIGNMENT, NORMALIZED ACCORDING TO SCORING MATRIX
• E-VALUE
• BLOSUM MATRIX
• BLAST RESULT LINKED TO PFAM

Results: Results on next slide.

Interactivity type: Match the left column to the right.

Options: Match the meaning of the parameter on the right to the name of the parameter on the left. If the matching is correct, turn the tab green, else flash "Try Again".

Interactivity option 2.b: Match the following

Correct matching:
• PAM MATRIX : SIMILARITY BASED SCORING MATRIX
• BLAST RESULT LINKED TO PFAM : DOMAIN IDENTIFICATION
• EVOLUTIONARY TREE : PHYLOGRAM
• MEASURE OF QUALITY OF ALIGNMENT, NORMALIZED ACCORDING TO SCORING MATRIX : BIT SCORE
• E-VALUE : MEASURE OF BIOLOGICAL SIGNIFICANCE
• BLOSUM MATRIX : DISTANCE BASED SCORING MATRIX

Results / Boundary/limits: Correct matching.

Interactivity type: Match the left column to the right.

Options: Match the meaning of the parameter on the right to the name of the parameter on the left. If the matching is correct, turn the tab green, else flash "Try Again".

Questionnaire

1. Which is a scoring matrix based on distantly related proteins?
Answers: a) PAM b) BLOSUM c) Both d) None

2. Which parameter signifies whether the match between two sequences is a chance alignment?
Answers: a) Word-length b) E-value c) Dot-plot d) None

3. Which evolutionary tree has branch lengths corresponding to the evolutionary distances?
Answers: a) Phylogram b) Cladogram c) Both d) None

4. Which is NOT a ClustalW output file extension?
Answers: a) .dnd b) .txt c) .aln d) .output

5. The phylogenetic method for the most variable sequences is:
Answers: a) Distance method b) Maximum Distance c) Maximum Parsimony d) Maximum Likelihood

Links for further reading

Reference websites:
http://blast.ncbi.nlm.nih.gov/Blast.cgi
http://www.ebi.ac.uk/Tools/clustalw2/index.html
http://www.pdb.org/pdb/home/home.do
http://expasy.org/sprot/
http://expasy.org/prosite/
http://pfam.sanger.ac.uk/
http://www.psc.edu/general/software/packages/phylip/

The following URLs are used for animations:
http://www.ncbi.nlm.nih.gov/
http://blast.ncbi.nlm.nih.gov/Blast.cgi
http://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html
http://pfam.sanger.ac.uk/
http://www.ebi.ac.uk/Tools/es/cgi-bin/clustalw2/
http://meme.sdsc.edu/meme4_4_0/intro.html
http://www.ebi.ac.uk/Tools/clustalw2/index.html
http://qwickstep.com/search/the-active-site-of-an-enzyme.html
http://www.science.uva.nl/research/its/molsim/research/TMsignalling_lizhe/index.html
https://www.uzh.ch/oci/ssl-dir/group/files/14_roverview.jpg
http://medgadget.com/archives/2008/03/3d_imaging_of_bleomycindna_binding.html

Books:
Bioinformatics: Sequence and Genome Analysis by David Mount
