
## Assessment of sequence alignment

Lecture 10

### Introduction

• The Dot plot Matrix visualisation matching tool:

• Basics of Dot plot

• Examples of Dot plot matching sequences

• Tandem repeats: self-matching

• Inverted repeats: genetic palindromes

### Sequence alignment Analysis

• In order to measure the degree of similarity between sequences, they must first be aligned to maximise the matching score (refer to lecture 11):

• Example 1

• `I am from Cork`

• `I am not from Cork`

• `*****      *`

• (6 matches out of 18, counting spaces; based on length of bottom string)

• Example 2

• `I am ----from Cork`

• `I am not from Cork`

• `*****    *********`

• (14 matches out of 18; based on length of bottom string)
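A minimal Python sketch of the column-by-column scoring used in the two examples. It assumes, as the examples do, that spaces count as characters, that `-` marks a gap and never scores, and that the score is out of the length of the bottom string. The gapped string is written as `I am ----from Cork` so that both rows are 18 characters long. Note that a strict column-by-column count of Example 1 also picks up the chance match of the "o" in "Cork" against the "o" in "from", so it gives 6 rather than only the leading "I am ". The name `count_matches` is illustrative, not from any library.

```python
def count_matches(top: str, bottom: str) -> int:
    """Count columns where the two aligned strings hold the same character.

    Gap characters ('-') never score; spaces do, as in the lecture examples.
    zip() stops at the shorter string, so unmatched overhang is ignored.
    """
    return sum(1 for a, b in zip(top, bottom) if a == b and a != "-")


bottom = "I am not from Cork"

# Example 1: no gaps, so the strings drift out of register after "I am ".
ex1 = count_matches("I am from Cork", bottom)
# Example 2: a 4-character gap brings "from Cork" back into register.
ex2 = count_matches("I am ----from Cork", bottom)

print(f"Example 1: {ex1} matches out of {len(bottom)}")  # Example 1: 6 matches out of 18
print(f"Example 2: {ex2} matches out of {len(bottom)}")  # Example 2: 14 matches out of 18
```

Inserting the gap more than doubles the score, which is why alignment must come before any similarity measurement.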

### The Dot plot

• A “better” way of doing this is to represent the two sequences as a table or matrix, in which one sequence labels the rows and the other the columns. The Dot plot Matrix is a visual way of seeing the alignment between two sequences:

• The first sequence (query sequence) represents the rows and the other sequence (subject sequence) represents the columns.

• Every element (row/column pair) is checked for a match, and if there is one the cell is marked.

• This will show all areas of both sequences where matches occur.
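The construction described above can be sketched in a few lines of Python. This is an illustrative version, not a standard tool: `dot_plot` and `render` are hypothetical helper names, the two sequences are made up, and a "match" simply means identical characters (no substitution scoring).

```python
def dot_plot(query: str, subject: str) -> list[list[bool]]:
    """Return a matrix whose cell [i][j] is True when query[i] == subject[j].

    Rows follow the query sequence and columns follow the subject sequence.
    """
    return [[q == s for s in subject] for q in query]


def render(query: str, subject: str, matrix: list[list[bool]]) -> str:
    """Draw the matrix as text: '*' marks a match, '.' marks no match."""
    lines = ["  " + " ".join(subject)]  # column header: the subject sequence
    for q, row in zip(query, matrix):
        lines.append(q + " " + " ".join("*" if hit else "." for hit in row))
    return "\n".join(lines)


query, subject = "GATTACA", "GATCACA"  # hypothetical example sequences
print(render(query, subject, dot_plot(query, subject)))
```

The run of `*` down the main diagonal shows the aligned region; it is broken at row 4, column 4, where the T/C substitution leaves an empty cell. The scattered off-diagonal dots are single-letter coincidences.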

### Dot plot

• Consider the following:

• Diagonal lines represent alignments (runs of matches)

• Horizontal (or vertical) offsets between aligned segments indicate that gaps are required (the gaps represent insertions/deletions)

• This has four “potential” aligned sequences:

• D->Y;

• H->N

• R->0

• 0->H

• The longest aligned sequences are:

• “THIS” and “SEQUENCE”

• “IS” would be treated as a gap

• The pink dots can represent noise (spurious alignments)

Adapted from Understanding Bioinformatics, p. 77.
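Reading a dot plot amounts to looking for unbroken diagonal runs and ignoring isolated dots. The sketch below assumes a simple rule, which is an illustration rather than the method used for the figure: a run of at least `min_len` consecutive matches on one diagonal counts as an aligned segment, and anything shorter is treated as noise. The sequences `THISSEQUENCE` and `THISISASEQUENCE` are hypothetical, chosen to mimic the “THIS”/“SEQUENCE” example.

```python
def diagonal_runs(query: str, subject: str, min_len: int = 3) -> list[tuple[int, int, int]]:
    """Find runs of consecutive matches along the diagonals of a dot plot.

    Returns (query_start, subject_start, length) for every run of at least
    min_len matches; shorter runs are discarded as noise.
    """
    runs = []
    n, m = len(query), len(subject)
    # Every cell on one diagonal shares the offset d = j - i.
    for d in range(-(n - 1), m):
        i = max(0, -d)
        length = 0
        while i < n and i + d < m:
            if query[i] == subject[i + d]:
                length += 1
            else:
                if length >= min_len:
                    runs.append((i - length, i - length + d, length))
                length = 0
            i += 1
        if length >= min_len:  # a run that reaches the edge of the matrix
            runs.append((i - length, i - length + d, length))
    return runs


query, subject = "THISSEQUENCE", "THISISASEQUENCE"  # hypothetical sequences
for q, s, n in diagonal_runs(query, subject):
    print(f"{query[q:q + n]}: query {q}, subject {s}, length {n}")
# THIS: query 0, subject 0, length 4
# SEQUENCE: query 4, subject 7, length 8
```

The two segments sit on different diagonals (offsets 0 and 3); the jump between them corresponds to the gap needed opposite the extra "ISA" in the subject.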

### Dot plot Matrix: purpose

• This allows us to visualise areas of “local alignment” as opposed to global alignment.

• One of the main purposes is to find matching domains/motifs. This is useful for many reasons, e.g. finding promoter (transcription factor) binding sites, or finding exons.

• For visualisation of a pairwise alignment, one sequence is plotted on the x-axis and the other on the y-axis.

### Dot Plot noise

This shows the effect of noise (a blue line has been inserted to highlight the alignment of interest). The figure on the left shows the SH2 sequence (sample files) plotted against itself. The one on the right has been filtered; in this case an alignment must be at least 10 residues long with a score of 3. Adapted from Understanding Bioinformatics, p. 77.
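The filter described above (an alignment must span a minimum number of residues and reach a minimum score) suggests a sliding-window filter. The sketch below assumes one simple version: each window of `window` residues along a diagonal is scored by counting identical pairs, and dots are kept only inside windows scoring at least `threshold`. The parameter names, the identity scoring and the toy sequence are all assumptions; many tools score windows with a substitution matrix instead, and the window here is shrunk to 4 so that a short sequence can show the effect.

```python
def filtered_dot_plot(query: str, subject: str, window: int, threshold: int) -> list[list[bool]]:
    """Keep a dot only if it lies in some diagonal window of `window`
    residues that contains at least `threshold` identical pairs."""
    n, m = len(query), len(subject)
    keep = [[False] * m for _ in range(n)]
    for i in range(n - window + 1):
        for j in range(m - window + 1):
            # Score the window that starts at (i, j) and runs down the diagonal.
            hits = [k for k in range(window) if query[i + k] == subject[j + k]]
            if len(hits) >= threshold:
                for k in hits:
                    keep[i + k][j + k] = True
    return keep


def count_dots(matrix: list[list[bool]]) -> int:
    """Number of marked cells in a dot plot matrix."""
    return sum(sum(row) for row in matrix)


seq = "GATTACA"  # hypothetical sequence, plotted against itself
raw = [[a == b for b in seq] for a in seq]
clean = filtered_dot_plot(seq, seq, window=4, threshold=3)
print("dots before filtering:", count_dots(raw))   # dots before filtering: 15
print("dots after filtering:", count_dots(clean))  # dots after filtering: 7
```

Only the main diagonal survives: every off-diagonal window of this toy sequence contains at most one chance match, so those dots are removed as noise.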

### Dot plot Matrix: imperfect match

• Some alignments require gaps to increase the matching score; the gaps represent insertion/deletion mutations

• The diagram shows that most of the two sequences are aligned. Breaks in the diagonal indicate areas of non-alignment: gaps or substitutions

Refer to saved web page
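Each aligned segment in a dot plot lies on a diagonal with offset `subject_start - query_start`, so a change of offset between consecutive segments gives the length of the insertion/deletion. A minimal sketch, assuming the segments have already been read off the plot as `(query_start, subject_start, length)` tuples; `find_indels` is a hypothetical helper and the segments below are worked out by hand for the “I am from Cork” example.

```python
def find_indels(segments: list[tuple[int, int, int]]) -> list[tuple[int, int]]:
    """Return (query_position, shift) for each change of diagonal.

    shift > 0: the subject has `shift` extra residues (a gap is needed in the query).
    shift < 0: the query has -shift extra residues (a gap is needed in the subject).
    """
    ordered = sorted(segments)
    indels = []
    for (q1, s1, _), (q2, s2, _) in zip(ordered, ordered[1:]):
        shift = (s2 - q2) - (s1 - q1)  # change in diagonal offset
        if shift != 0:
            indels.append((q2, shift))
    return indels


query, subject = "I am from Cork", "I am not from Cork"
segments = [(0, 0, 5), (5, 9, 9)]  # "I am " and "from Cork"
for q, s, n in segments:
    assert query[q:q + n] == subject[s:s + n]  # each segment is a genuine run of matches

for pos, shift in find_indels(segments):
    side = "subject" if shift > 0 else "query"
    print(f"{abs(shift)} extra character(s) in the {side} before query position {pos}")
# 4 extra character(s) in the subject before query position 5
```

The 4-character shift is exactly the `not ` that a 4-character gap in the query has to cover.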

### Dot plot for Tandem Repeats

• The human genome has many tandem repeats: short sequences of nucleic acids (bases) or amino acids that are repeated one after another. Repeats are ubiquitous in genomes and can comprise 50% of a genome (Richard 2008)

• They can be used as genealogical markers

• They can help identify specific regions of interest, e.g. introns

• They play a significant part in evolution (Gemayel et al. 2010)

• An example of a protein with multiple repeats is human mucin (Baxevanis 2005 p. 297)

### Tandem repeat dot plot

• To determine whether there are tandem repeats, the sequence is compared with itself (refer to table 1)

• The more diagonals, the more repeats

• The diagonals at the bottom left compare the start of the sequence with its end

• The unbroken main diagonal reflects the fact that both sequences are the same

• The lines are symmetrical about the main diagonal
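In a self-comparison, a tandem repeat whose unit is p residues long produces extra diagonals parallel to the main one, at offsets p, 2p, and so on. Because cell (i, j) matches exactly when cell (j, i) does, the plot is symmetrical and only the upper triangle needs scanning. A minimal sketch, assuming identity matching, a hypothetical `min_len` cut-off, and a made-up sequence holding four copies of CAG between flanking Ts.

```python
def repeat_offsets(seq: str, min_len: int = 3) -> list[int]:
    """Offsets d > 0 whose diagonal (cells j = i + d) in the self dot plot
    holds a run of at least min_len consecutive matches.

    The lower triangle mirrors the upper one, so it is not scanned.
    """
    offsets = []
    n = len(seq)
    for d in range(1, n):
        run = best = 0
        for i in range(n - d):
            run = run + 1 if seq[i] == seq[i + d] else 0
            best = max(best, run)
        if best >= min_len:
            offsets.append(d)
    return offsets


seq = "TT" + "CAG" * 4 + "TT"  # hypothetical sequence: four tandem copies of CAG
print(repeat_offsets(seq))  # [3, 6, 9]
```

The smallest offset (3) is the length of the repeat unit, and four copies give three parallel lines on each side of the main diagonal: the more copies, the more diagonals.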

### Tandem repeats (Example)

• The BRCA2 gene has a number of BRC repeats (each 39 residues long). The diagram shows two plots: one with noise (unfiltered) and the other (filtered) showing two repeating sequences. Adapted from Figure 4.3, Understanding Bioinformatics

### Genetic “Palindromes”

• A palindrome is a word that is spelt the same from right to left as from left to right; plotted against itself, it gives an “X”-shaped dot plot (try: eye, Navan, never odd or even ...)

• Remember that left to right is 5’ to 3’ on the primary strand, and right to left is 5’ to 3’ on the complementary strand. In other words, a genetic palindrome is a match between a strand and its reverse complement.

• There are 2 possible types of “genetic palindromes” [the difference from a word palindrome being that the left-to-right read is on one strand while the right-to-left read is on its complementary strand]:

• Restriction enzyme sites, such as that of EcoRI:

• 5’ GAATTC 3’

• 3’ CTTAAG 5’

• Inverted repeats

• On different segments, each repeat reads the same (GTGAG) but in opposite directions. An example is the promoter region for the CAP protein in the lac operon:

• 5’ GTGAGnnnCTCAC 3’

• 3’ CACTCnnnGAGTG 5’

• What will the dot plot for the above two sequences look like?

• Introduction to dot plots (figure 6 gives a more in-depth view of the different types of plots referred to above: alignment, alignment with gaps, tandem repeats, palindromes, ...)

• Inverted repeats and dot plots (a more advanced analysis of plots for inverted repeats)
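A minimal Python sketch of the checks behind the palindrome examples above. The function names are hypothetical; only the Watson-Crick pairs A/T and G/C are handled, and the CAP example is tested on its two GTGAG/CTCAC arms only, so the unspecified `nnn` spacer is ignored.

```python
# Watson-Crick complements; any other character is left unchanged.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Read the complementary strand 5' to 3' (complement, then reverse)."""
    return seq.translate(COMPLEMENT)[::-1]


def is_genetic_palindrome(site: str) -> bool:
    """True when a site reads the same 5'->3' on both strands (e.g. EcoRI)."""
    return site == reverse_complement(site)


def is_inverted_repeat(left_arm: str, right_arm: str) -> bool:
    """True when the right arm is the reverse complement of the left arm."""
    return right_arm == reverse_complement(left_arm)


def has_x_shape(word: str) -> bool:
    """A word plotted against itself always fills the main diagonal; it also
    fills the anti-diagonal (giving an X) exactly when it is a palindrome."""
    w = "".join(word.lower().split())  # ignore case and spaces
    n = len(w)
    return all(w[i] == w[n - 1 - i] for i in range(n))


print(reverse_complement("GAATTC"), is_genetic_palindrome("GAATTC"))  # GAATTC True
print(is_inverted_repeat("GTGAG", "CTCAC"))                          # True
print(has_x_shape("never odd or even"), has_x_shape("dot plot"))     # True False
```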

### Exam Question

• Describe, using a suitable example, how to construct a dot plot matrix for the alignment of DNA/AA sequences. (10 marks)

• Describe the significance of two types of repeating sequences found in DNA sequences (6 marks)

• Explain, using suitable examples, how the dot plot matrix can find the two types of repeating regions [what is plotted against what, and what will the dot plot look like?] (14 marks)

### References

• Baxevanis, A. D. (2005) Bioinformatics: a practical guide to the analysis of genes and proteins, chapter 11. Wiley.

• Klug, W. S. (2010) The essentials of genetics, 7th ed. Pearson Education.

• Gemayel, R. et al. (2010) Variable tandem repeats accelerate evolution of coding and regulatory sequences. Annu Rev Genet 44: 445–477.

• Richard, G. F. (2008) Comparative genomics and molecular dynamics of DNA repeats in eukaryotes. Microbiol Mol Biol Rev, Dec; 72(4): 686–727.