Biometrics
This presentation is the property of its rightful owner.
Sponsored Links
1 / 20

BIOMETRICS PowerPoint PPT Presentation


  • 102 Views
  • Uploaded on
  • Presentation posted in: General

BIOMETRICS. Module Code: CA641 Week 11- Pairwise Sequence Alignment. Outline. Why should one do sequence comparison ? Dynamic programming alignment algorithms global alignment (Needleman- Wunsch ) local alignment (Smith-Waterman ) Heuristic alignment algorithms : FASTA , BLAST

Download Presentation

BIOMETRICS

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Biometrics

BIOMETRICS

Module Code: CA641

Week 11- Pairwise Sequence Alignment


Outline

Outline

  • Whyshould one do sequence comparison?

  • Dynamic programming alignment algorithms

    • global alignment (Needleman-Wunsch)

    • local alignment (Smith-Waterman)

  • Heuristic alignment algorithms: FASTA, BLAST

    Acknowledgement: this presentation was created following Prof. LiviuCiortuz, “Computer Science” Faculty, “Al. I. Cuza” University, Iasi, Romania: http://profs.info.uaic.ro/~ciortuz/


Why should one do sequence comparison first success stories

Why should one do sequence comparison?First success stories

  • 1984 (Russell Doolittle): similarity between the newly discovered ν-sis onco-gene in monkeys and a previously known gene PDGF(platelet derived growth factor) involved in controlled growth of the cell, suggested that the onco-gene does the right job at the wrong time

  • 1989: cystic fibrosis disease — an abnormally high content of sodium in the mucus in lungs:

  • Biologists suggested that this is due to a mutant gene (still unknown at that time, but narrowed down to a 1MB region) on chromosome 7. Comparison with (then) known genes found similaritywith a gene that encodes ATP , the adenosine triphosphate binding protein, which spans the cell membrane as an ion transport channel. Cystic fibrosis was then proved to be caused by the deletionof a triplet in the CFTR(cystic fibrosis transport regulator) gene.


Model organism

Model organism

a non-human species - extensively studied to understand particular biological phenomena

used to research human disease

Escherichia coli

Drosophila melanogaster

Zea mays

Laboratory mice

Image source: http://en.wikipedia.org/wiki/Model_organism


How to rank alignments scoring systems notations

How to rank alignments: Scoring systems Notations

  • Consider 2 sequencesx and y of lengths n and m , respectively; xi is the i-th symbol (“residue”) in x and yj is the j -th symbol in y

  • For DNA, the alphabetis {A, C, G, T}. For proteins, the alphabet consists of 20 amino acids.


Sequence matching alignment

Sequence matching/alignment

  • Given two DNA sequences or proteins

    • X = x1x2x3..xm

    • Y = y1y2y3…yn,

      where xi, yi are from the alphabet Alf = {A, C, G, T}, what is their similarity?

  • Seq. Alignment= a way to arrange the sequences of DNA, RNA or protein, i.e. regions of similarity can be identified

  • Example of seq. alignment:

  • Gapsmay be opened to improve alignment


A dot plot example

A dot plot example


Gap penalties

Gap Penalties

  • Types of gap penalties:

    • Linear score: ϒ (g) = -gd

    • Affine score: ϒ (g) = -d – (g-1)e

      where:

    • d > 0 is the gap open penalty,

    • e> 0 is the gap extension penalty,

    • g N is the length of the gap.

  • Note:

  • Usually e < d, therefore long insertions and deletions will be penalised less than they would have been penalised by a linear gap cost.

    • Examples from this presentation use d = -8.


Dna substitution matrices

DNA substitution matrices

Simple

Used in genome alignments


Dynamic programming

Dynamic programming

  • Better alignments will have a higher score. Therefore, the purpose is to maximize the score in order to find the optimal alignment.

  • Sometimes scores are assigned by other means as costs or edit distances, when a minimum cost of an alignment is searched.

  • Both approaches have been used in biological sequence comparison.

  • Dynamic programming is applied to both cases: the difference is searching for a minimum or maximum score of the alignment.


The score matrix for the following examples from blosum50

The score matrix for the following examples (from BLOSUM50)

Examples from this presentation use d = -8.


Global alignment needleman wunsch algorithm 1970

Global Alignment:Needleman–Wunsch Algorithm (1970)

  • Problem: Find the optimal global alignment of two sequences x = x1..xn, and y = y1..ym.

  • Idea: Use dynamic programming.

  • Initialisation:

    F(0,0) =0, F(i,0)=-id, F(0,j) = -jd, for i=1..n, j=1..m

  • Recurrence:

    There are three ways in which the best score F(i,j) of an alignment up to xi, yj could be obtained:

    F(i,j)= max 

    Fill in the matrix F from top left to bottom right.

  • Optimal score: F(n,m).


Filling in the dynamics programming matrix

Filling in the dynamics programming matrix

-d

-d


Global alignment example computing the dynamic programming matrix

Global Alignment: ExampleComputing the dynamic programming matrix


Global alignment example cont d finding an optimal alignment

Global Alignment: Example (cont’d)Finding an optimal alignment


Local alignment smith waterman algorithm

Local AlignmentSmith–Waterman Algorithm

  • Problem: Find the best local alignment of two sequences x and y, allowing gaps.

  • Idea: Modify the global alignment algorithm in such a way that it will end in a local alignment whenever the score becomes negative..

  • Initialisation:

    F(0,0) =0, F(i,0)= F(0,j) = 0, for i=1..n, j=1..m

  • Recurrence:

    There are three ways in which the best score F(i,j) of an alignment up to xi, yj could be obtained:

    F(i,j)= max 

    Fill in the matrix F from top left to bottom right.

  • Optimal score: maxi, j F(i,j).


Biometrics

Note

For the local alignment algorithm to work,

  • Some (a,b) must be positive, otherwise the algorithm will not find any alignment at all;

  • The expected score of a random alignment should be negative, otherwise

    long matches between unrelated sequences will have the high score, just based on their length, and a true local alignment would be masked by a longer but incorrect one.


Local alignment example

Local Alignment: Example


Additional links

Additional links

  • http://thor.info.uaic.ro/~ciortuz/SLIDES/pairAlign.pdf


Biometric workshop page 11

Biometric Workshop – page 11

  • Q.3 Given 2 sequences

    GCCGAGCTAG

    CCTAGCAAG

  • Build the matrix of maximal similarity between them, according to Smith-Waterman algorithm (local alignment), using the scores: match = 1; mismatch = -1; gap = -3.

  • What is the optimal alignment?


  • Login