BNFO 240

Usman Roshan

Last time
  • Traceback for alignment
  • How to select the gap penalties?
  • Benchmark alignments
    • Structural superimposition
    • BAliBASE
Database searching
  • Suppose we have a set of 1,000,000 sequences
  • You have a query sequence q and want to find the m closest ones in the database---that means 1,000,000 pairwise alignments!
  • How to speed up pairwise alignments?
FASTA
  • FASTA was the first software for quick searching of a database
  • Introduced the idea of searching for k-mers
  • The k-mer search can be done quickly by preprocessing the database
BLAST
  • Key idea: search for k-mers (short matching substrings) quickly by preprocessing the database
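The preprocessing idea can be illustrated with a small k-mer index. This is a minimal sketch, assuming a tiny in-memory database; the function names `build_kmer_index` and `candidate_hits` are hypothetical, and this is not how FASTA or BLAST are actually implemented. It only shows why a lookup table avoids aligning the query against every sequence.

```python
from collections import defaultdict

def build_kmer_index(database, k):
    """Preprocess: map each k-mer to the (sequence id, offset) pairs where it occurs."""
    index = defaultdict(set)
    for seq_id, seq in enumerate(database):
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add((seq_id, i))
    return index

def candidate_hits(query, index, k):
    """Count k-mer matches between the query and each database sequence."""
    counts = defaultdict(int)
    for i in range(len(query) - k + 1):
        # .get() avoids creating empty entries for k-mers absent from the index
        for seq_id, _offset in index.get(query[i:i + k], ()):
            counts[seq_id] += 1
    # Most shared k-mers first; ties broken by sequence id
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

# Toy database (illustrative sequences only)
database = ["TAGGCCTT", "TAGCCCTTA", "ACACTTC", "GGCTT"]
index = build_kmer_index(database, k=3)
hits = candidate_hits("GGCCTT", index, k=3)
print(hits)
```

Only the sequences that share k-mers with the query would then be passed on to a full pairwise alignment.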

BLAST
  • This key idea can also be used for speeding up pairwise alignments when doing multiple sequence alignments

Biologically realistic scoring matrices
  • PAM and BLOSUM are most popular
  • PAM was developed by Margaret Dayhoff and co-workers in 1978 by examining 1572 mutations between 71 families of closely related proteins
  • BLOSUM is more recent and computed from blocks of sequences with sufficient similarity
PAM
  • We need to compute the probability transition matrix M which defines the probability of amino acid i converting to j
  • Examine a set of closely related sequences which are easy to align---for PAM 1572 mutations between 71 families
  • Compute probabilities of change and background probabilities by simple counting
PAM
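  • In this model the unit of evolution (1 PAM) is the amount of evolution that changes, on average, 1 in 100 amino acids
  • The scoring matrix entry S_ab is based on the ratio of M_ab to p_b, where p_b is the background probability of amino acid b; the score is usually taken as the logarithm of this ratio (a log-odds score)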
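The counting step and the ratio of M_ab to p_b can be sketched as follows. The counts are made-up toy numbers over three amino acids, not Dayhoff's data, and the helper name `log_odds_matrix` is hypothetical. The sketch assumes the common log-odds form 10 * log10(M_ab / p_b) and leaves out the rescaling of M to exactly one PAM unit.

```python
import math

def log_odds_matrix(counts, alphabet):
    """Build transition probabilities M, background p, and scores S by simple counting.

    counts[a][b] is the number of times a was observed aligned with b.
    """
    total = sum(counts[a][b] for a in alphabet for b in alphabet)
    # Background probability of b: share of all observations that are b
    p = {b: sum(counts[a][b] for a in alphabet) / total for b in alphabet}
    # Transition probability M[a][b]: share of a's observations that became b
    M = {}
    for a in alphabet:
        row_total = sum(counts[a][b] for b in alphabet)
        M[a] = {b: counts[a][b] / row_total for b in alphabet}
    # Log-odds score: positive when a -> b is seen more often than expected by chance
    S = {a: {b: 10 * math.log10(M[a][b] / p[b]) for b in alphabet} for a in alphabet}
    return M, p, S

# Toy counts (assumption: invented for illustration only)
counts = {
    "A": {"A": 90, "R": 8, "N": 2},
    "R": {"A": 8, "R": 80, "N": 12},
    "N": {"A": 2, "R": 12, "N": 86},
}
M, p, S = log_odds_matrix(counts, "ARN")
print(round(S["A"]["A"], 2), round(S["A"]["N"], 2))
```

Conserved pairs such as A with A come out positive and rare substitutions such as A with N come out negative, which is the behavior a scoring matrix needs.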

Multiple sequence alignment
  • “Two sequences whisper, multiple sequences shout out loud”---Arthur Lesk
  • Computationally very hard---NP-hard
Multiple sequence alignment

Unaligned sequences

GGCTT
TAGGCCTT
TAGCCCTTA
ACACTTC
ACTT

Aligned sequences

_G__GCTT_
TAGGCCTT_
TAGCCCTTA
A__CACTTC
A__C_CTT_

Conserved regions help us to identify functionality
Profiles
  • Before we see how to construct multiple alignments, how do we align two alignments?
  • Idea: summarize an alignment using its profile and align the two profiles
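One common way to summarize an alignment is a table of per-column symbol frequencies. The sketch below assumes a DNA alphabet with `_` as the gap character and uses a hypothetical helper name `profile`; it is one possible representation, not necessarily the one used in the course.

```python
from collections import Counter

def profile(alignment, alphabet="ACGT_"):
    """Return one dict per alignment column with the frequency of each symbol."""
    n_rows = len(alignment)
    prof = []
    # zip(*alignment) walks the columns; assumes all rows have equal length
    for col in zip(*alignment):
        counts = Counter(col)
        prof.append({s: counts[s] / n_rows for s in alphabet})
    return prof

# Two aligned rows (toy example)
alignment = ["TAGGCCTT_", "TAGCCCTTA"]
prof = profile(alignment)
print(prof[3])
```

Two profiles can then be aligned column against column with the same dynamic programming used for two sequences, scoring each pair of columns from their frequencies.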
Iterative alignment (heuristic for sum-of-pairs)
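  • Pick a random sequence s from the input set S of n sequences and remove it from S
  • Do (n-1) pairwise alignments of s against the remaining sequences and align s to the closest one t in S
  • Remove t from S and compute the profile of the alignment
  • While sequences remain in S
    • Do |S| pairwise alignments against the current profile and align it to the closest sequence t
    • Remove t from S and update the profile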
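The order in which this heuristic adds sequences can be sketched as below. It is a simplified illustration under stated assumptions: "closest" is measured by a Needleman-Wunsch score with arbitrary match, mismatch, and gap values; the profile step is replaced by a stand-in (best score against any sequence already chosen); and the start sequence is passed in rather than drawn at random so the result is reproducible. The names `nw_score` and `greedy_order` are hypothetical.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score (score only, no traceback)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]

def greedy_order(seqs, start=0):
    """Order in which sequences would be added to the growing alignment."""
    remaining = list(seqs)
    aligned = [remaining.pop(start)]
    while remaining:
        # Stand-in for a profile-vs-sequence alignment: take the remaining
        # sequence with the best score against anything already aligned
        t = max(remaining, key=lambda s: max(nw_score(s, x) for x in aligned))
        remaining.remove(t)
        aligned.append(t)
    return aligned

order = greedy_order(["TAGGCCTT", "ACTT", "TAGCCCTTA"], start=0)
print(order)
```

A fuller version would build and update a profile after each step and align the next sequence to that profile instead of to individual sequences.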
Iterative alignment
  • Once an alignment is computed, randomly divide it into two sub-alignments
  • Compute the profile of each sub-alignment and realign the two profiles
  • If the sum-of-pairs score of the new alignment is better than that of the previous one, keep it; otherwise try a different division, until a specified iteration limit is reached
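The sum-of-pairs score used in the acceptance test can be sketched as follows. The match, mismatch, and gap values are arbitrary placeholders, and scoring a gap paired with a gap as 0 is an assumption; the function name `sum_of_pairs` is hypothetical.

```python
from itertools import combinations

def sum_of_pairs(alignment, match=1, mismatch=-1, gap=-2, gap_char="_"):
    """Sum the column scores over every pair of rows in the alignment."""
    total = 0
    for col in zip(*alignment):  # assumes all rows have equal length
        for x, y in combinations(col, 2):
            if x == gap_char and y == gap_char:
                continue  # assumption: a gap paired with a gap scores 0
            elif x == gap_char or y == gap_char:
                total += gap
            elif x == y:
                total += match
            else:
                total += mismatch
    return total

old_score = sum_of_pairs(["A_", "A_", "AC"])
new_score = sum_of_pairs(["AC", "AC", "AC"])
keep_new = new_score > old_score
print(old_score, new_score, keep_new)
```

The refinement loop would compute this score before and after each realignment and keep the new alignment only when the score improves.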
Progressive alignment
  • Idea: perform profile alignments in the order dictated by a tree
  • Given a guide tree, do a post-order traversal and align sequences in that order
  • Widely used heuristic
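The post-order traversal can be sketched by listing the profile alignments a guide tree dictates. The sketch assumes a binary tree written as nested tuples with sequence names at the leaves; the tree, the names, and the helper `merge_order` are hypothetical, and the alignment step itself is left out.

```python
def merge_order(tree):
    """Post-order traversal of a binary guide tree given as nested tuples.

    Leaves are sequence names (str); internal nodes are (left, right) pairs.
    Returns the alignment steps as (left_leaves, right_leaves) in execution order.
    """
    steps = []

    def visit(node):
        if isinstance(node, str):
            return [node]
        left, right = node
        left_leaves = visit(left)
        right_leaves = visit(right)
        # Both children are finished, so their profiles can be aligned now
        steps.append((left_leaves, right_leaves))
        return left_leaves + right_leaves

    visit(tree)
    return steps

guide_tree = (("s1", "s2"), ("s3", ("s4", "s5")))
steps = merge_order(guide_tree)
for left, right in steps:
    print(left, "+", right)
```

Each step aligns the profile of the left group with the profile of the right group, so the final step at the root produces the full multiple alignment.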