
Coffee Shop

F91921025 黃仁暐

F92921029 戴志華

F92921041 施逸優

R93921142 吳於芳

R94921035 林與絜

Menu
  • Coffee Shop Opening
    • Why coffee shop?
  • Three Flavors
    • COFFEE
    • T-Coffee
    • 3DCoffee
  • Remarks
  • Recipes
Multiple Sequence Alignment
  • Multiple sequence alignment is one of the most important tools for analyzing biological sequences. Applications include:
    • structure prediction
    • phylogenetic analysis
    • function prediction
    • polymerase chain reaction (PCR) primer design.
Multiple Sequence Alignment
  • However, alignment accuracy is still not good enough, for two reasons:
    • difficult to evaluate the quality of a multiple alignment
    • algorithmically very hard to produce the optimal alignment
  • In order to increase the accuracy of multiple sequence alignment, we opened a coffee shop to share three kinds of coffee.
Before (drinking) COFFEE
  • Why comparative genomics?
    • To understand the process of evolution at the gross level and the local level
    • To translate DNA sequence data into proteins of known function
    • To find the meaning of conserved regions
  • E. coli, C. elegans, Drosophila, human…
    • How are they related?
[Figure: classification of genes by function in E. coli, nematode, Synechocystis (cyanobacteria), fruit fly, human, yeast, and Arabidopsis]

Adapted from “Principles of genome analysis and genomics” Fig. 7.5 (p.129), by S. B. Primrose and R. M. Twyman, 3rd edition

Comparative genomics vs. multiple sequence alignment
  • Alignment → conserved region
  • Conserved region → gene location
  • Evidence of evolution

http://www.public.iastate.edu/~semrich/compgen/

[Figure: the alignment between the chromosomes. A: human chromosome I; B: human chromosome II; C: human chromosome III. The chromosome III region 125-128 Mb is magnified 120X.]

http://gchelpdesk.ualberta.ca/news/02jun05/cbhd_news_02jun05.php

Our Flavors
  • COFFEE: A New Objective Function For Multiple Sequence Alignment.
    • C. Notredame, L. Holm and D.G. Higgins, Bioinformatics, Vol 14 (5) 407-422, 1998
  • T-Coffee: A novel method for multiple sequence alignments.
    • C. Notredame, D. Higgins, J. Heringa, Journal of Molecular Biology, Vol 302, pp 205-217, 2000
  • 3DCoffee: Combining Protein Sequences and Structures within Multiple Sequence Alignments.
    • O. O'Sullivan, K. Suhre, C. Abergel, D.G. Higgins, C. Notredame, Journal of Molecular Biology, Vol 340, pp 385-395, 2004

COFFEE
  • An objective function for multiple sequence alignments
    • Cédric Notredame, Liisa Holm and Desmond G. Higgins
  • SAGA with COFFEE score
Introduction
  • COFFEE - Consistency based Objective Function For alignmEnt Evaluation
  • An objective function, COFFEE score, is proposed to measure the quality of multiple sequence alignments
  • Optimize the COFFEE score of a multiple sequence alignment with the genetic algorithm package SAGA (Sequence Alignment Genetic Algorithm)
Overview of their method
  • Given
    • a set of sequences to be aligned
    • a library containing all pairwise alignments between them,
  • the COFFEE score reflects the level of consistency between a multiple sequence alignment and the library.
COFFEE score

$$\text{COFFEE score} = \frac{\sum_{i=1}^{N-1} \sum_{j=i+1}^{N} W_{i,j} \times \mathrm{SCORE}(A_{i,j})}{\sum_{i=1}^{N-1} \sum_{j=i+1}^{N} W_{i,j} \times \mathrm{LEN}(A_{i,j})}$$

with: $\mathrm{SCORE}(A_{i,j})$ = number of aligned pairs of residues that are shared between $A_{i,j}$ and the library.
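As a rough illustration of the COFFEE score, the sketch below computes the weighted ratio for a toy alignment. The slide only defines SCORE; treating LEN as the number of columns of the induced pairwise alignment, representing the library as sets of residue-index pairs, and the names `aligned_pairs`, `pairwise_length` and `coffee_score` are all assumptions made for this example, not the authors' implementation.

```python
from itertools import combinations


def aligned_pairs(row_a, row_b):
    """Residue-index pairs (pos_in_a, pos_in_b) aligned in two gapped rows."""
    pairs = set()
    ia = ib = 0
    for ca, cb in zip(row_a, row_b):
        if ca != "-" and cb != "-":
            pairs.add((ia, ib))
        if ca != "-":
            ia += 1
        if cb != "-":
            ib += 1
    return pairs


def pairwise_length(row_a, row_b):
    """Assumed LEN(A_ij): columns left after dropping gap-gap columns."""
    return sum(1 for ca, cb in zip(row_a, row_b) if not (ca == "-" and cb == "-"))


def coffee_score(msa, library, weights):
    """Weighted fraction of the alignment that agrees with the library.

    msa:      list of equal-length gapped strings
    library:  {(i, j): set of residue-index pairs} for i < j
    weights:  {(i, j): W_ij} for i < j
    """
    num = den = 0.0
    for i, j in combinations(range(len(msa)), 2):
        shared = aligned_pairs(msa[i], msa[j]) & library[(i, j)]
        num += weights[(i, j)] * len(shared)
        den += weights[(i, j)] * pairwise_length(msa[i], msa[j])
    return num / den


# Toy data (hypothetical): three short sequences and a hand-made library.
msa = ["AC-T", "ACGT", "A-GT"]
library = {
    (0, 1): {(0, 0), (1, 1), (2, 3)},
    (0, 2): {(0, 0), (1, 1), (2, 2)},  # (1, 1) is not reproduced by the MSA
    (1, 2): {(0, 0), (2, 1), (3, 2)},
}
weights = {pair: 1.0 for pair in library}
score = coffee_score(msa, library, weights)
```

With uniform weights the toy alignment shares 8 residue pairs with the library over 12 pairwise columns, so the score is 8/12.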
Using COFFEE in SAGA
  • SAGA iteratively generates multiple sequence alignments with higher COFFEE scores until the score can no longer be improved
  • SAGA follows the general principle of genetic algorithms
    • The notion of survival of the fittest
  • In each iteration, SAGA:
    • Evaluates the score of each alignment
    • Selects alignments: the fitter an alignment, the more likely it is to survive and produce offspring
    • Keeps surviving alignments unchanged, modifies them randomly (mutation), or combines them with another alignment (cross-over)
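The loop described above can be sketched as a generic genetic algorithm. This is not SAGA itself: individuals here are bit tuples rather than alignments, the fitness is a stand-in for the COFFEE score, and truncation selection, the `patience` stopping rule, and every function name are assumptions chosen to keep the example short.

```python
import random


def evolve(population, fitness, mutate, crossover, patience=20, seed=0):
    """Evolve until the best fitness has not improved for `patience` generations."""
    rng = random.Random(seed)
    size = len(population)
    best = max(population, key=fitness)
    stale = 0
    while stale < patience:
        ranked = sorted(population, key=fitness, reverse=True)
        survivors = ranked[: max(2, size // 2)]  # fitter individuals survive
        children = [ranked[0]]                   # the current best is kept unchanged
        while len(children) < size:
            parent = rng.choice(survivors)
            op = rng.choice(("keep", "mutate", "cross"))
            if op == "keep":
                children.append(parent)
            elif op == "mutate":
                children.append(mutate(parent, rng))
            else:
                children.append(crossover(parent, rng.choice(survivors), rng))
        population = children
        candidate = max(population, key=fitness)
        if fitness(candidate) > fitness(best):
            best, stale = candidate, 0
        else:
            stale += 1
    return best


def flip_one(ind, rng):
    """Mutation: flip one randomly chosen bit."""
    k = rng.randrange(len(ind))
    return ind[:k] + (1 - ind[k],) + ind[k + 1:]


def one_point(a, b, rng):
    """Cross-over: prefix of one parent joined to the suffix of the other."""
    k = rng.randrange(1, len(a))
    return a[:k] + b[k:]


# Toy problem: maximise the number of ones in a 12-bit tuple.
init_rng = random.Random(1)
start = [tuple(init_rng.randint(0, 1) for _ in range(12)) for _ in range(8)]
best = evolve(start, fitness=sum, mutate=flip_one, crossover=one_point)
```

Because the best individual is always carried over, the returned fitness can never be lower than the best fitness in the starting population.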
Results
  • SAGA
  • COFFEE function
  • COFFEE score & alignment accuracy
  • Optimization of COFFEE function
    • Effect of optimization
  • Comparison: COFFEE and others
    • Others: PRRP, ClustalW, PILEUP, SAGA MSA, SAM

(A lot of tables are coming up and they are rather dry, so please bear with us…)

Optimization
  • The COFFEE function was optimized by SAGA

[Tables: results using SAGA alignments; results using ClustalW alignments]

Comparison
  • Multiple alignments from SAGA-COFFEE and 5 other methods
    • PRRP, ClustalW, PILEUP, SAGA MSA, SAM
  • Performance of SAGA and ClustalW
  • Comparison with the other 5 methods
    • Even when SAGA-COFFEE does not give the best result, it is not far from the best
  • Lower identity level → better SAGA-COFFEE results
[Figure: ratio of correctly aligned residues, with regions marked "Better than PRRP" and "Worse than PRRP"]

  • Ratio of (E+H) residues correctly aligned
  • Better or worse alignment? SAGA-COFFEE vs. the others
  • There is NO such thing as an ideal method
COFFEE score and alignment accuracy

[Figure: E+H accuracy (%) plotted against average identity (%) and against COFFEE sequence score; annotated r=0.65]

  • The COFFEE score can be used to predict alignment accuracy
  • Accuracy can be predicted for >85% of the sequences (error ~ ±10%)
  • Average identity cannot predict alignment accuracy

Correlation between score and accuracy

  • Higher score → higher accuracy
  • SAGA produces more high-scoring sequences than ClustalW
T-Coffee
  • A novel method for multiple sequence alignments
    • C.Notredame, D. Higgins, J. Heringa
  • ClustalW with extended library
ClustalW

ClustalW is the core alignment strategy of T-Coffee. It follows the procedure below:

  • Pairwise Alignment: calculate distance matrix
  • Guide Tree
    • Unrooted Neighbor-Joining Tree
    • Rooted Neighbor-Joining Tree: guide tree with sequence weights
  • Progressive Alignment: align following the guide tree
Guide tree
  • Use the neighbor-joining method to build the guide tree from the distance matrix.
  • First construct an unrooted neighbor-joining tree, then convert it to a rooted neighbor-joining tree: the guide tree.
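To make the neighbor-joining step concrete, the sketch below performs one join on a small distance matrix: it picks the pair that minimises the standard Q criterion and then replaces that pair by its parent node. The matrix values and the function names `nj_pick_pair` and `nj_join` are illustrative assumptions; branch lengths, rooting, and ClustalW's sequence weights are left out.

```python
def nj_pick_pair(dist):
    """Return (i, j, q) for the pair with the smallest neighbor-joining Q value."""
    n = len(dist)
    row_sums = [sum(row) for row in dist]
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            # Q(i, j) = (n - 2) * d(i, j) - sum_k d(i, k) - sum_k d(j, k)
            q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
            if best is None or q < best[2]:
                best = (i, j, q)
    return best


def nj_join(dist, i, j):
    """Replace taxa i and j by their parent node, which becomes the last row/column."""
    keep = [k for k in range(len(dist)) if k not in (i, j)]
    # Distance from the new node u to each remaining taxon k.
    to_new = [(dist[i][k] + dist[j][k] - dist[i][j]) / 2 for k in keep]
    reduced = [[dist[a][b] for b in keep] + [to_new[x]] for x, a in enumerate(keep)]
    reduced.append(to_new + [0.0])
    return reduced


# Illustrative 5-taxon distance matrix (taxa a, b, c, d, e).
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
i, j, q = nj_pick_pair(dist)   # taxa a and b are joined first
reduced = nj_join(dist, i, j)  # 4x4 matrix over c, d, e and the new node
```

Repeating these two steps until only two nodes remain yields the unrooted tree; note that the pair with the smallest raw distance (d, e) is not the first pair joined here.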
Progressive Alignment: align following the guide tree

[Figure: guide tree with labels Seq1, Seq2, Seq3, Seq4, Seq5, Alignment 1, Alignment 2, Alignment 3, and Final alignment]

Progressive-alignment strategy
  • Pros
    • Faster and more space-efficient than computing all possible multiple alignments
  • Cons
    • May not find the optimal solution.
    • Errors made in the first alignments cannot be rectified later as the rest of the sequences are added in.

“Once a gap, always a gap!”

T-Coffee is an attempt to minimize that effect!

T-Coffee Algorithm
  • Generating a primary library of alignments
  • Derivation of the primary library weights
  • Combination of the libraries
  • Extending the library
  • Progressive alignment strategy
[Figure: the Lalign primary library (local) and the ClustalW primary library (global) are weighted and combined into the primary library, which is then extended into the extended library]

Extended Library

Weight(A-C-B) = min( Weight(A-C), Weight(B-C) ) = min( 77, 100 ) = 77

Weight(A-D-B) = min( Weight(A-D), Weight(B-D) ) = min( 100, 100 ) = 100
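The triplet weights above follow a simple rule: the weight of aligning A with B through an intermediate sequence is the smaller of the two primary weights involved. A minimal sketch, assuming the primary library is stored as a dictionary keyed by unordered sequence pairs (the name `triplet_weights` and this data layout are hypothetical):

```python
def triplet_weights(primary, a, b, intermediates):
    """Weight contributed to the pair (a, b) by each intermediate sequence."""
    result = {}
    for c in intermediates:
        w_ac = primary.get(frozenset((a, c)))
        w_bc = primary.get(frozenset((b, c)))
        if w_ac is not None and w_bc is not None:
            # A path a-c-b is only as reliable as its weakest link.
            result[c] = min(w_ac, w_bc)
    return result


# Primary weights taken from the slide.
primary = {
    frozenset(("A", "C")): 77,
    frozenset(("B", "C")): 100,
    frozenset(("A", "D")): 100,
    frozenset(("B", "D")): 100,
}
via = triplet_weights(primary, "A", "B", ["C", "D"])
```

In the T-Coffee paper these contributions are then summed with the direct A-B weight to give the extended weight; the direct weight is not given on this slide, so that sum is not computed here.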

Extended Library

SeqA: GARFIELD THE LAST FAT CAT

SeqB: GARFIELD THE FAST CAT

[Figure: alignment of SeqA and SeqB using the extended library]
[Figure: full T-Coffee pipeline. Lalign primary library (local) and ClustalW primary library (global) → weighting → primary library → extension → extended library → progressive alignment → multiple alignment information]

Complexity Analysis
  • Complexity of the whole procedure:

$O(N^2L^2) + O(N^3L) + O(N^3) + O(NL^2)$

  • $O(N^2L^2)$: computation of the pair-wise library
  • $O(N^3L)$: computation of the extended pair-wise library
  • $O(N^3)$: computation of the NJ tree
  • $O(NL^2)$: computation of the progressive alignment
  • N is the number of sequences and L is the length of the multiple alignment
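To get a feel for which term dominates, the sketch below simply evaluates the four terms for a given N and L, ignoring constant factors. The values N = 50 and L = 300 and the name `tcoffee_cost_terms` are illustrative assumptions only.

```python
def tcoffee_cost_terms(n, length):
    """Evaluate each complexity term (constant factors ignored)."""
    return {
        "pairwise library": n**2 * length**2,
        "library extension": n**3 * length,
        "NJ tree": n**3,
        "progressive alignment": n * length**2,
    }


terms = tcoffee_cost_terms(n=50, length=300)
dominant = max(terms, key=terms.get)
```

For these values the pairwise library term dominates; the library extension term only overtakes it once N exceeds L, since $N^3L > N^2L^2$ exactly when $N > L$.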
Experiment
  • Implementation environment
  • Result 1: Effect of combining local and global alignments without extension; effect of the library extension
  • Result 2: compared with other multiple sequence alignment methods
Implementation environment
  • Programming language: ANSI C
  • Hardware: LINUX platform with Pentium II processors (330 MHz).
  • Test case: BaliBase database of multiple sequence alignment
Result 1

Table 1: The effect of combining local and global alignments

Name  Global / Local / Extension      Cat1(81)  Cat2(23)  Cat3(4)  Cat4(12)  Cat5(11)  Total(141)  Significance
C     ClustalW pw / ... / ...         70.6      26.7      43.0     56.0      60.0      58.9        7.8
CE    ClustalW pw / ... / ex          77.1      33.6      47.6     64.8      75.9      66.3        17.7
L     ... / Lalign pw / ...           65.4      12.1      22.8     53.9      66.0      52.0        7.8
LE    ... / Lalign pw / ex            72.6      25.6      47.2     77.5      85.5      64.2        16.3
CL    ClustalW pw / Lalign pw / ...   76.2      32.0      48.3     76.2      74.6      66.5        12.1
CLE   ClustalW pw / Lalign pw / ex    80.6      37.1      52.9     83.2      88.6      72.0

Result 2

Table 2: T-Coffee compared with other multiple sequence alignment methods

Method    Cat1(81)  Cat2(23)  Cat3(4)  Cat4(12)  Cat5(11)  Total1(141)  Total2(141)  Significance
Dialign   71.0      25.2      35.1     74.7      80.4      61.5         57.3         11.3
ClustalW  78.5      32.2      42.5     65.7      74.3      66.4         58.6         26.2
Prrp      78.6      32.5      50.2     51.1      82.7      66.4         59.0         36.9
T-Coffee  80.6      37.1      52.9     83.2      88.6      72.0         68.6

3DCoffee
  • Combining protein sequences and structures within multiple sequence alignments
    • O. O'Sullivan, K Suhre, C. Abergel, D.G. Higgins, C. Notredame
  • T-Coffee with structure information
3DCoffee
  • Structural information can help to improve the quality of multiple sequence alignments
  • 3DCoffee
    • Combines protein sequences and structures
    • Is based on T-Coffee version 2.00
    • Uses a mixture of pairwise sequence alignments and pairwise structure comparison methods.
3DCoffee
  • Uses T-Coffee to compile
    • A primary library: a list of weighted pairs of residues.
    • An extended library: built using the column consistency relationships between all sequences
  • Adds structure information from
    • Fugue, SAP, LSQman
3DCoffee
  • Fugue: a threading method that aligns a protein sequence with a 3D structure
  • SAP: uses dynamic programming (DP) to compute a pairwise alignment based on a non-rigid structure superposition
  • LSQman: a rigid-body structure superposition package
3DCoffee
  • Set the weight of each new structure-based alignment to 100
    • which is the maximum weight in the primary library
  • Add the weighted alignments to the library
  • Carry out progressive alignment in the same way as T-Coffee
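One way to picture this step is as a dictionary merge: residue pairs taken from a structure-based alignment enter the library with weight 100. In the sketch below, the library layout (residue pairs keyed by sequence name and position), the function name `add_structure_pairs`, and the choice to sum weights when a pair is already present (borrowed from how T-Coffee combines duplicated pairs across libraries) are assumptions, not details given on the slide.

```python
def add_structure_pairs(library, structure_pairs, weight=100):
    """Return a new library with structure-derived residue pairs added."""
    merged = dict(library)
    for pair in structure_pairs:
        # Assumption: a pair already in the library has the weights summed.
        merged[pair] = merged.get(pair, 0) + weight
    return merged


# Hypothetical residue pairs: ((sequence, position), (sequence, position)).
library = {(("A", 3), ("B", 3)): 88}
structure_pairs = [(("A", 3), ("B", 3)), (("A", 4), ("B", 5))]
merged = add_structure_pairs(library, structure_pairs)
```

The merged library can then be extended and used for progressive alignment exactly as a sequence-only library would be.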
Remarks
  • COFFEE : An objective function for multiple sequence alignments
    • SAGA with COFFEE score
  • T-Coffee : A novel method for multiple sequence alignments
    • ClustalW with extended library
  • 3DCoffee : Combining protein sequences and structures within multiple sequence alignments
    • T-Coffee with structure information
Recipes
  • CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice.
    • Julie D. Thompson, Desmond G. Higgins and Toby J. Gibson, 1994
  • COFFEE: A New Objective Function For Multiple Sequence Alignment.
    • C. Notredame, L. Holm and D.G. Higgins, Bioinformatics, Vol 14 (5) 407-422, 1998
  • T-Coffee: A novel method for multiple sequence alignments.
    • C. Notredame, D. Higgins, J. Heringa, Journal of Molecular Biology, Vol 302, pp 205-217, 2000
  • 3DCoffee: Combining Protein Sequences and Structures within Multiple Sequence Alignments.
    • O. O'Sullivan, K. Suhre, C. Abergel, D.G. Higgins, C. Notredame, Journal of Molecular Biology, Vol 340, pp 385-395, 2004
Residue score
  • The sequence score is a global measurement
  • A residue is scored 9 when
    • >90% of the pairs it is involved in are also present in the reference library
  • Substitution classes are defined from the residue scores
    • Class 5 substitution → residue score ≥ 5
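A small sketch of how such a 0 to 9 residue score might be computed from pair counts. The slide only states the rule for a score of 9; extending it to equal 10% bins for the lower scores, the handling of exact bin boundaries, and the names `residue_score` and `in_class` are assumptions.

```python
def residue_score(shared_pairs, total_pairs):
    """Map the fraction of a residue's pairs found in the library to 0..9."""
    if total_pairs == 0:
        return 0
    # Integer arithmetic avoids floating-point surprises at bin boundaries.
    return min(9, (10 * shared_pairs) // total_pairs)


def in_class(score, substitution_class):
    """A residue qualifies for class k when its score is at least k."""
    return score >= substitution_class


high = residue_score(19, 20)  # 95% of pairs are in the library
mid = residue_score(10, 20)   # 50% of pairs are in the library
```

Under these assumptions the 95% residue scores 9 and the 50% residue scores 5, so both qualify for class 5 while only the first qualifies for class 9.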

5566677788888888899999877- - - - -66666666788888888887

vsdvprdlevvaatptslliswdap gslevvaatptslliswdap

Correct substitutions: SAGA > ClustalW

  • Lower accuracy: more false positives in SAGA alignments
High-scoring residues are aligned with high accuracy

Higher substitution category → fewer predictions