A Sequence Similarity Search Algorithm Based on a Probabilistic Interpretation of an Alignment Scoring System. Philipp Bucher and Kay Hofmann. Proc Int Conf Intell Syst Mol Biol. 1996;4:44-51. Goal. Modify Smith-Waterman (SW) algorithm such that it has a probabilistic interpretation.

A Sequence Similarity Search Algorithm Based on a Probabilistic Interpretation of an Alignment Scoring System

Philipp Bucher and Kay Hofmann

Proc Int Conf Intell Syst Mol Biol. 1996;4:44-51

Goal
  • Modify Smith-Waterman (SW) algorithm such that it has a probabilistic interpretation
Introduction 1
  • Goal: find a local alignment between a query sequence and a sequence in a database
  • Local similarity to find conserved domains
  • Conservation implies function
Introduction 2
  • The Smith-Waterman (SW) algorithm (dynamic programming) is the most sensitive algorithm for identifying local alignments between two sequences
  • Heuristic algorithms such as FASTA and BLAST are modifications or special cases of SW algorithm
  • Time complexity: O(m × n) for sequences of lengths m and n
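
To make the O(m × n) cost concrete, here is a minimal sketch of the Smith-Waterman recurrence. It assumes a toy match/mismatch score and a linear gap penalty (the parameter values are illustrative and not taken from the slides) and returns only the best local score, not the alignment itself.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b.

    Assumptions (not from the slides): toy match/mismatch scores
    and a linear gap penalty of `gap` per gapped position.
    """
    m, n = len(a), len(b)
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * (n + 1) for _ in range(m + 1)]
    best = 0
    for i in range(1, m + 1):        # m rows ...
        for j in range(1, n + 1):    # ... times n columns: O(m x n) cells
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                    # start a new local alignment here
                H[i - 1][j - 1] + s,  # extend with a matched pair
                H[i - 1][j] + gap,    # gap in b
                H[i][j - 1] + gap,    # gap in a
            )
            best = max(best, H[i][j])
    return best


print(smith_waterman_score("GAWGHE", "AWHE"))
```

Every cell of the (m + 1) × (n + 1) matrix is filled exactly once with constant work, which is where the quadratic running time comes from.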
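Definition
  • Sequences: $a = a_1 a_2 \ldots a_m$, $b = b_1 b_2 \ldots b_n$
  • $a_i, b_j \in S$, with $S$ containing $N$ elements
  • Alignment path $u = (x_1, y_1), (x_2, y_2), \ldots, (x_l, y_l)$
  • $x_{k+1} > x_k$, $y_{k+1} > y_k$, $x_k \le m$, $y_k \le n$
  • Example with $m = 8$, $n = 7$, $l = 6$:

    EGAWGHE-E
    P-AW-HEAE

    Matched residues: EAWHEE (from a), PAWHEE (from b)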
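
The definition can be checked against the example with a short sketch. The function name `alignment_path` and the use of `-` as the gap character are assumptions made for illustration; the code recovers a, b and the path u from the two gapped rows.

```python
def alignment_path(row_a, row_b, gap="-"):
    """Return (a, b, u) for two equal-length gapped alignment rows.

    u lists the 1-based positions (x_k, y_k) of the matched residue pairs.
    Hypothetical helper; the name and gap character are assumptions.
    """
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    x = y = 0
    u = []
    for ca, cb in zip(row_a, row_b):
        if ca != gap:
            x += 1  # advance position in a
        if cb != gap:
            y += 1  # advance position in b
        if ca != gap and cb != gap:
            u.append((x, y))  # column without a gap: a matched pair
    return row_a.replace(gap, ""), row_b.replace(gap, ""), u


a, b, u = alignment_path("EGAWGHE-E", "P-AW-HEAE")
print(len(a), len(b), len(u))  # m, n, l
print(u)
print("".join(a[x - 1] for x, _ in u), "".join(b[y - 1] for _, y in u))
```

Both coordinates increase strictly along u, as the definition requires.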

Scoring
  • Substitution matrix $s(a, b)$
  • Alignment score: $S_A(a, b, u) = S_M(a, b, u) + S_G(u)$
    • $S_M(a, b, u)$: sequence dependent
    • $S_G(u)$: sequence independent (gap score)
  • Gap weighting function $w(k)$
    • $w(k) = \alpha + \beta k$ for $k \ge 1$
    • $w(0) = 0$
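
As a worked example of $S_A = S_M + S_G$, the sketch below scores the alignment path from the Definition slide. Several things here are assumptions, not taken from the slides: the toy substitution function `s` (a real search would use a matrix such as BLOSUM), the values of alpha and beta, the treatment of $w(k)$ as a penalty that is subtracted, and the counting of skipped residues in a and in b between consecutive matched pairs as separate gaps.

```python
def w(k, alpha=3.0, beta=1.0):
    """Gap weighting function: w(0) = 0, w(k) = alpha + beta * k for k >= 1.
    alpha and beta are illustrative values."""
    return 0.0 if k == 0 else alpha + beta * k


def s(ra, rb):
    """Toy substitution score (assumption): +5 for identity, -2 otherwise."""
    return 5.0 if ra == rb else -2.0


def alignment_score(a, b, u):
    """Return (S_M, S_G, S_A) for sequences a, b and 1-based path u."""
    s_m = sum(s(a[x - 1], b[y - 1]) for x, y in u)
    s_g = 0.0
    for (x0, y0), (x1, y1) in zip(u, u[1:]):
        # residues skipped in a and in b between consecutive matched pairs
        s_g -= w(x1 - x0 - 1) + w(y1 - y0 - 1)
    return s_m, s_g, s_m + s_g


u = [(1, 1), (3, 2), (4, 3), (6, 4), (7, 5), (8, 7)]
print(alignment_score("EGAWGHEE", "PAWHEAE", u))
```

Note that `s_g` depends only on the path u, never on the residues, which is what makes the gap score sequence independent.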
Probabilistic Smith-Waterman (PSW) Algorithm
  • HMM: defines a probability distribution over the sequence space by means of a stochastic process involving a random walk through the model
  • ASS (alignment scoring system): defines a probability distribution over the space of sequence pairs by means of a stochastic process involving a random walk through an alignment path matrix

Null probability
  • Null model (the formula itself is not preserved in this transcript); its labelled terms:
    • Length distribution (same for ASS and null model)
    • Residue probability distribution over the alphabet S
    • Residue a
[Equation not preserved in this transcript; its labelled parts:]
  • Scoring function of local alignment: $S_A(a, b, u) = S_M(a, b, u) + S_G(u)$
  • Length normalizing function

Z is some logarithmic base that satisfies:

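Example alignment:

    GAWG--HE-
    AAW-RKHEE

  • Matched residues: GAWHE (in a), AAWHE (in b)
  • Unmatched residues: G (in a), RKE (in b)
  • Labelled terms of the accompanying equation: length of unmatched pairs, length of matched pairs, $P_0(a, b)$
  • $v_k$, $w_k$: unmatched residues in a and b, respectively
  • $x_k$, $y_k$: matched residues in a and b, respectively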
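
The split into matched and unmatched residues can be reproduced with a short sketch. The function name `split_alignment` and the `-` gap character are assumptions; the dictionary keys follow the slide's symbols (x, y for matched residues, v, w for unmatched residues in a and b).

```python
def split_alignment(row_a, row_b, gap="-"):
    """Split two gapped alignment rows into matched and unmatched residues.

    Hypothetical helper: x, y hold matched residues of a and b;
    v, w hold unmatched residues of a and b.
    """
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    x, y, v, w = [], [], [], []
    for ca, cb in zip(row_a, row_b):
        if ca != gap and cb != gap:
            x.append(ca)  # matched pair: one residue from each sequence
            y.append(cb)
        elif ca != gap:
            v.append(ca)  # residue of a opposite a gap
        elif cb != gap:
            w.append(cb)  # residue of b opposite a gap
    return {k: "".join(val) for k, val in zip("xyvw", (x, y, v, w))}


parts = split_alignment("GAWG--HE-", "AAW-RKHEE")
print(parts)
```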

Performance evaluation of PSW
  • BLAST (Blosum 62)
  • SSEARCH
    • Native SW
    • Blosum 45
    • Default gap weighting function
  • PSW
    • Blosum 45
    • Same gap weighting function as SSEARCH
  • Search the Swissprot protein database
  • Queries: taken from well-known protein families and domains
Typically 50-90% true positives retrieved for a single query sequence
  • % True positives affected by
    • Divergence of sequence family
    • Stringency of the significance criterion applied
  • Stringency of the criterion is determined by fixing the number of false positives accepted
    • Not appropriate if the status of sequences is not known in advance
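
The evaluation criterion above can be sketched as follows. This is one plausible reading, stated as an assumption: hits are accepted in decreasing score order until the number of false positives would exceed the fixed limit, and the result is the fraction of all known family members retrieved up to that point. The function name and the toy hit list are illustrative only.

```python
def true_positive_fraction(hits, total_true, max_false_pos):
    """Fraction of known family members retrieved at a fixed
    number of accepted false positives.

    hits: iterable of (score, is_true_member) pairs from one search.
    total_true: number of family members in the database.
    Assumes the true/false status of every hit is known in advance.
    """
    tp = fp = 0
    for score, is_true in sorted(hits, key=lambda h: h[0], reverse=True):
        if is_true:
            tp += 1
        else:
            fp += 1
            if fp > max_false_pos:
                break  # the cutoff lies just above this false positive
    return tp / total_true


# Toy search result: five family members exist, four appear among the hits.
hits = [(90, True), (80, True), (70, False), (60, True), (50, False), (40, True)]
for limit in (0, 1, 2):
    print(limit, true_positive_fraction(hits, total_true=5, max_false_pos=limit))
```

A limit of zero false positives is the most stringent criterion; relaxing the limit can only keep or raise the fraction retrieved.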
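Comparison
  • Equivalent performance of SSEARCH and PSW on GPC receptors, SH2-domain, SH3-domain
  • [Chart: percentage labels 5%, 9%, 14%, 14%, 26%, 33%, 54%, 54%, 53%; their assignment to methods and families is not recoverable from this transcript]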

Comparison II
  • Improved or equivalent performance of PSW over native SW
  • PSW is especially more sensitive under stringent criteria
Summary
  • Pairwise sequence alignments can be improved by interpreting a scoring system as a probabilistic model
  • Probabilistic interpretation gives higher sensitivity
  • Log-likelihood ratio eliminates scoring bias due to sequence length or choice of the scoring matrix
  • Facilitates optimization of gap weighting matrices
Advantages of PSW
  • No assumption about evolutionary relatedness is made
  • Therefore, any scoring matrix can be used