Protein Evolution: SARS coronavirus as an example

Lecture 3: Protein Families and Family Prediction Methods Prof. Chen Yu Zong Tel: 6874-6877 Email: csccyz@nus.edu.sg http://xin.cz3.nus.edu.sg Room 07-24, level 7, SOC1, National University of Singapore. Protein Evolution: SARS coronavirus as an example. SARS Coronavirus.


Lecture 3: Protein Families and Family Prediction Methods

Prof. Chen Yu Zong
Tel: 6874-6877
Email: csccyz@nus.edu.sg
http://xin.cz3.nus.edu.sg
Room 07-24, level 7, SOC1, National University of Singapore

SARS Coronavirus

A novel coronavirus identified as the cause of severe acute respiratory syndrome (SARS)

SARS Infection

How the SARS coronavirus enters a cell and reproduces

Protein Evolution

Generation of different species

Protein Families
  • Sequence alignment-based families.
    • Based on the principle of the sequence-structure-function relationship.
    • Derived by multiple sequence alignment
    • Database: PFAM (Nucleic Acids Res. 30:276-280)
  • Structure-based families.
    • Derived by visual inspection and comparison of structures
    • Database: SCOP (J. Mol. Biol. 247, 536-540)
  • Functional Families.
    • Databases:
      • G-protein coupled receptors: GPCRDB (Nucleic Acids Res. 29: 346-349), ORDB (Nucleic Acids Res. 30:354-360)
      • Nuclear receptors: NucleaRDB (Nucleic Acids Res. 29: 346-349)
      • Enzymes: BRENDA (Nucleic Acids Res. 30, 47-49)
      • Transporters: TC-DB (Microbiol Mol Biol Rev. 64:354-411)
      • Ligand-gated ion channels: LGICdb (Nucleic Acids Res. 29: 294-295)
      • Therapeutic targets: TTD (Nucleic Acids Res. 30, 412-415)
      • Drug side-effect targets: DART (Drug Safety 26: 685-690)
Protein Families

Sequence families ≠ Structural families ≠ Functional families

Sequence similar, structure different

Sequence different, structure similar

Sequence similar, function different (distantly related proteins)

Sequence different, function similar

Homework: find examples

Protein Family Prediction Methods

Sequence alignment-based families:

  • Multiple sequence alignment (HMM): HMMER (JMB 235, 1501-153; JMB 301, 173-190)

Structure-based families:

  • Visual inspection and comparison of structures

Functional Families.

  • Statistical learning methods:
    • Neural network: ProtFun (Bioinformatics, 19:635-642)
    • Support vector machines: SVMProt (Nucleic Acids Res., 31: 3692-3697)
Sequence Comparison as a Mathematical Problem:

Example:

Sequence a: ATTCTTGC

Sequence b: ATCCTATTCTAGC

Best alignment: ATTCTTGC is aligned against ATCCTATTCTAGC with a single gap.

Bad alignment: ATTCTTGC is split into AT, TCTT, and GC, with two gaps, against ATCCTATTCTAGC.

Many alignments can be constructed => which is the best?

How to rate an alignment?
  • Match: +8 (w(x, y) = 8, if x = y)
  • Mismatch: -5 (w(x, y) = -5, if x ≠ y)
  • Each gap symbol: -3 (w(-, x) = w(x, -) = -3)

C - - - T T A A C T
C G G A T C A - - T

+8 -3 -3 -3 +8 -5 +8 -3 -3 +8 = +12 (alignment score)
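The column-by-column scoring above can be checked with a few lines of Python. This is only a sketch: the function names `w` and `alignment_score` are chosen here for illustration, and the weights are the +8 / -5 / -3 values from the slide.

```python
def w(x: str, y: str) -> int:
    """Score one alignment column: +8 match, -5 mismatch, -3 if either symbol is a gap."""
    if x == "-" or y == "-":
        return -3
    return 8 if x == y else -5


def alignment_score(row_a: str, row_b: str) -> int:
    """Sum the column scores of two alignment rows of equal length."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have the same length")
    return sum(w(x, y) for x, y in zip(row_a, row_b))


# The alignment from the slide: C---TTAACT over CGGATCA--T
print(alignment_score("C---TTAACT", "CGGATCA--T"))  # 12
```

Any other candidate alignment of the same two sequences can be rated the same way, which is what makes it possible to ask which alignment is best.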

Alignment Graph

Sequence a: CTTAACT

Sequence b: CGGATCAT

[Figure: alignment graph with sequence b (C G G A T C A T) across the top and sequence a (CTTAACT) down the side]

C---TTAACT
CGGATCA--T

An optimal alignment: the alignment of maximum score
  • Let A = a1a2…am and B = b1b2…bn.
  • Si,j: the score of an optimal alignment between a1a2…ai and b1b2…bj
  • With proper initializations, Si,j can be computed as follows.
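Computing Si,j

Si,j = max { Si-1,j-1 + w(ai,bj), Si-1,j + w(ai,-), Si,j-1 + w(-,bj) }

[Figure: cell (i, j) of the alignment matrix is reached by a diagonal edge of weight w(ai,bj), a vertical edge of weight w(ai,-), and a horizontal edge of weight w(-,bj); the score of the optimal alignment is Sm,n]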
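A minimal dynamic-programming sketch of this computation follows. It assumes the +8 / -5 / -3 weights used in this lecture and initializes row 0 and column 0 with cumulative gap penalties; the function name `global_alignment_table` is hypothetical, and only scores are computed (no traceback).

```python
def global_alignment_table(a: str, b: str, match: int = 8,
                           mismatch: int = -5, gap: int = -3) -> list[list[int]]:
    """Return the table S where S[i][j] is the best score for a[:i] versus b[:j]."""
    m, n = len(a), len(b)
    S = [[0] * (n + 1) for _ in range(m + 1)]

    # Initialization: a prefix aligned against nothing costs one gap per symbol.
    for i in range(1, m + 1):
        S[i][0] = S[i - 1][0] + gap
    for j in range(1, n + 1):
        S[0][j] = S[0][j - 1] + gap

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diagonal = S[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            vertical = S[i - 1][j] + gap      # a_i aligned with a gap
            horizontal = S[i][j - 1] + gap    # b_j aligned with a gap
            S[i][j] = max(diagonal, vertical, horizontal)
    return S


S = global_alignment_table("CTTAACT", "CGGATCAT")
print(S[3][5])   # 5
print(S[7][8])   # 14, the optimal global alignment score
```

The table has (m + 1) × (n + 1) cells and each cell looks at three neighbours, so time and memory both grow as O(mn).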

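Initializations

[Figure: alignment matrix with sequence b (C G G A T C A T) across the top and sequence a (CTTAACT) down the side, showing the initialized first row and first column]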

S3,5 = ?

[Figure: alignment matrix with sequence b (C G G A T C A T) across the top and sequence a (CTTAACT) down the side]

Optimal score

CTTAAC-T
CGGATCAT

8 - 5 - 5 + 8 - 5 + 8 - 3 + 8 = 14

[Figure: alignment matrix with sequence b (C G G A T C A T) across the top and sequence a (CTTAACT) down the side]

Global Alignment vs. Local Alignment
  • global alignment: aligns the two sequences over their entire lengths
  • local alignment: aligns only the best-matching pair of substrings
An optimal local alignment
  • Si,j: the score of an optimal local alignment ending at ai and bj
  • With proper initializations, Si,j can be computed as follows:
    Si,j = max { 0, Si-1,j-1 + w(ai,bj), Si-1,j + w(ai,-), Si,j-1 + w(-,bj) }
Local alignment

Match: 8

Mismatch: -5

Gap symbol: -3

A-C-T
ATCAT

8 - 3 + 8 - 3 + 8 = 18 (the best score)

[Figure: local alignment matrix with sequence b (C G G A T C A T) across the top and sequence a (CTTAACT) down the side]
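The local variant can be sketched in the same way. The version below is a Smith-Waterman style recurrence, assuming the same +8 / -5 / -3 weights: every cell is floored at 0, row 0 and column 0 start at 0, and the answer is the largest value anywhere in the table. The function name `local_alignment_score` is hypothetical.

```python
def local_alignment_score(a: str, b: str, match: int = 8,
                          mismatch: int = -5, gap: int = -3) -> int:
    """Return the best local alignment score between a and b."""
    m, n = len(a), len(b)
    # Row 0 and column 0 stay at 0: a local alignment may start anywhere.
    S = [[0] * (n + 1) for _ in range(m + 1)]
    best = 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diagonal = S[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            vertical = S[i - 1][j] + gap
            horizontal = S[i][j - 1] + gap
            # The extra 0 lets the alignment restart instead of carrying a negative score.
            S[i][j] = max(0, diagonal, vertical, horizontal)
            best = max(best, S[i][j])
    return best


print(local_alignment_score("CTTAACT", "CGGATCAT"))  # 18
```

The result matches the A-C-T / ATCAT alignment shown above, which uses only the tail of each sequence.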

Multiple sequence alignment (MSA)
  • The multiple sequence alignment problem is to simultaneously align more than two sequences.

Seq1: GCTC

Seq2: AC

Seq3: GATC

GC-TC

A---C

G-ATC

How to score an MSA?
  • Sum-of-Pairs (SP-score): the score of the MSA is the sum of the scores of all pairwise alignments it induces.

Score(GC-TC, A---C, G-ATC) = Score(GC-TC, A---C) + Score(GC-TC, G-ATC) + Score(A---C, G-ATC)
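A small sketch of the SP-score follows. The slide does not state which pairwise weights to use, so two assumptions are made here: the +8 / -5 / -3 weights from the pairwise examples are reused, and a column pairing two gap symbols scores 0. The names `pair_score` and `sp_score` are hypothetical.

```python
from itertools import combinations


def pair_score(row_a: str, row_b: str, match: int = 8,
               mismatch: int = -5, gap: int = -3) -> int:
    """Score the pairwise alignment induced by two rows of an MSA."""
    total = 0
    for x, y in zip(row_a, row_b):
        if x == "-" and y == "-":
            continue              # assumption: a gap-gap column contributes 0
        if x == "-" or y == "-":
            total += gap
        elif x == y:
            total += match
        else:
            total += mismatch
    return total


def sp_score(rows: list[str]) -> int:
    """Sum-of-pairs score: add up the scores of every pair of rows."""
    return sum(pair_score(r1, r2) for r1, r2 in combinations(rows, 2))


msa = ["GC-TC", "A---C", "G-ATC"]
print(sp_score(msa))  # 12 under the assumptions above: (-3) + 18 + (-3)
```

With k sequences there are k(k-1)/2 pairs, so scoring a given MSA is cheap; the expensive part of the MSA problem is searching for the alignment that maximizes this score.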

Functional Classification by SVM
  • A protein is classified as either belonging (+) or not belonging (-) to a functional family.
  • By screening against all families, the function of this protein can be identified (example: SVMProt).
  • What is SVM? Support vector machines are a statistical machine learning method that learns from examples and classifies objects into one of two classes.
  • Advantages of SVM: it accommodates diverse class members, it uses sequence-derived physicochemical features as the basis for classification, and it is suitable for functional family classification.
SVM References
  • C. Burges, "A tutorial on support vector machines for pattern recognition", Data Mining and Knowledge Discovery, Kluwer Academic Publishers, 1998 (on-line).
  • R. Duda, P. Hart, and D. Stork, Pattern Classification, John Wiley, 2nd edition, 2001 (section 5.11, hard copy).
  • S. Gong et al., Dynamic Vision: From Images to Face Recognition, Imperial College Press, 2001 (sections 3.6.2, 3.7.2, hard copy).
  • Online lecture notes
Introduction to Machine Learning
  • Goal:
    • To “improve” (gaining knowledge, enhancing computing capability)
  • Tasks:
    • Forming concepts by data generalization.
    • Compiling knowledge into compact form
    • Finding useful explanations for valid concepts.
    • Clustering data into classes.
  • Reference:
    • Machine Learning in Molecular Biology Sequence Analysis.
  • Internet links:
    • http://www.ai.univie.ac.at/oefai/ml/ml-resources.html
Introduction to Machine Learning
  • Category:
    • Inductive learning.
      • Forming concepts from data without a lot of knowledge from domain (learning from examples).
    • Analytic learning.
      • Use of existing knowledge to derive new useful concepts (explanation based learning).
    • Connectionist learning.
      • Use of artificial neural networks to search for or represent concepts.
    • Genetic algorithms.
      • To search for the most effective concept by means of Darwin’s “survival of the fittest” approach.
Machine Learning Methods

Inductive learning: concept learning and example-based learning

[Figure: concept learning]

Machine Learning Methods

[Figure: analytic learning]

Machine Learning Methods

[Figure: genetic algorithms, with columns Pattern, Strength, and Classification]

SVM for Classification of Proteins

How to represent a protein?

  • Each sequence is represented by a specific feature vector assembled from encoded representations of tabulated residue properties:
    • amino acid composition
    • hydrophobicity
    • normalized Van der Waals volume
    • polarity
    • polarizability
    • charge
    • surface tension
    • secondary structure
    • solvent accessibility
  • Three descriptors, composition (C), transition (T), and distribution (D), are used to describe the global composition of each of these properties.

Nucleic Acids Res., 31: 3692-3697

SVM for Classification of Proteins

Descriptors for amino acid composition of protein:

C=(53.33, 46.67)

T=(51.72)

D=(3.33, 16.67, 40.0, 66.67, 96.67, 6.67, 26.67, 60.0, 76.67, 100.0)

Nucleic Acids Res., 31: 3692-3697
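To make the three descriptors concrete, the sketch below computes them for a two-class sequence. Several things here are assumptions rather than part of the slide: the 30-letter sequence over A and E is a hypothetical input chosen because it reproduces the C, T, and D values listed above, the function name `ctd_descriptors` is invented, and the 25%, 50%, and 75% ranks are rounded down.

```python
def ctd_descriptors(seq: str, classes: str):
    """Composition, transition, and distribution descriptors, all in percent.

    Assumes exactly two classes, each of which occurs at least once in seq.
    """
    n = len(seq)

    # C: percentage of positions occupied by each class.
    C = [round(100 * seq.count(c) / n, 2) for c in classes]

    # T: percentage of adjacent pairs in which the class changes.
    changes = sum(1 for x, y in zip(seq, seq[1:]) if x != y)
    T = round(100 * changes / (n - 1), 2)

    # D: relative position of the first, 25%, 50%, 75%, and 100% occurrence of each class.
    D = []
    for c in classes:
        positions = [i + 1 for i, x in enumerate(seq) if x == c]
        k = len(positions)
        ranks = [1] + [max(1, k * pct // 100) for pct in (25, 50, 75, 100)]
        D.extend(round(100 * positions[r - 1] / n, 2) for r in ranks)
    return C, T, D


# Hypothetical sequence: 16 residues of class A and 14 of class E.
seq = "AEAAAEAEEAAAAAEAEEEAAEEAEEEAAE"
C, T, D = ctd_descriptors(seq, "AE")
print(C)  # [53.33, 46.67]
print(T)  # 51.72
print(D)  # [3.33, 16.67, 40.0, 66.67, 96.67, 6.67, 26.67, 60.0, 76.67, 100.0]
```

For a property with more than two classes, the transition descriptor would need one value per pair of classes; the single-value form above only covers the two-class case shown on the slide.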