BINF6201/8201 Patterns in Protein Families 11-16-2009



Patterns in Protein Families



Problems for pairwise alignment for database searches

  • To predict the function of a new sequence, we typically identify its best hit in a well-annotated database such as Swiss-Prot using a pairwise alignment tool such as FASTA or BLAST, and then assign the annotated function of the best hit to the new sequence.
  • However, this approach can fail and produce incorrect functional predictions:
  • Sequence complexity affects the search results;
  • The sites are equally weighted, so the score may not reflect the similarity of the functions;
  • The annotated function of the best hit sequence could be incorrect.
  • Low complexity regions
  • Multi-domain proteins

Motifs are more powerful for predicting protein functions

  • These problems can be partially solved by using the sequence signatures (motifs) shared by the members of a protein family.

Motif 1

Motif 2

Motif 2

Motif 4


Motifs in a protein family

Conserved catalytic domain in the caspase-like superfamily

Catalytic motif 1

Catalytic motif 2


Representation of a motif








Consensus sequence


A collection of σ70 binding sites in E. coli

Regular expression


Frequency matrix

Position specific scoring matrix (PSSM)
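
The four representations listed above can all be derived from the same ungapped alignment. The following is a minimal sketch, not the slide's own procedure: the aligned sites are made-up DNA sequences, and the uniform background and pseudocount value are assumptions chosen only for illustration.

```python
import math
from collections import Counter

# Toy ungapped alignment of one motif (made-up sites, for illustration only).
alignment = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]

def frequency_matrix(seqs):
    """Per-column residue frequencies, one dict per alignment column."""
    n = len(seqs)
    return [{res: count / n for res, count in Counter(col).items()}
            for col in zip(*seqs)]

def consensus(freqs):
    """Most frequent residue in each column."""
    return "".join(max(col, key=col.get) for col in freqs)

def regex_from(freqs):
    """Regular expression listing every residue observed in each column."""
    parts = []
    for col in freqs:
        residues = "".join(sorted(col))
        parts.append(residues if len(residues) == 1 else f"[{residues}]")
    return "".join(parts)

def pssm(freqs, alphabet="ACGT", pseudocount=0.01):
    """Log-odds scores against a uniform background (an assumption)."""
    background = 1 / len(alphabet)
    total = 1 + pseudocount * len(alphabet)
    return [{res: math.log2((col.get(res, 0.0) + pseudocount) / total / background)
             for res in alphabet}
            for col in freqs]

freqs = frequency_matrix(alignment)
print(consensus(freqs))    # consensus sequence
print(regex_from(freqs))   # regular expression
print(pssm(freqs)[0])      # PSSM scores for the first column
```

The consensus keeps one residue per column, the regex keeps every observed residue, and the frequency matrix and PSSM additionally keep how often each residue occurs.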


Searching with regular expression (regex) of motifs

  • The regular expression of a motif is constructed from the multiple alignment of the sequences of the motif.



Search motifs using regexes

  • Examples of regex representations of motifs.
  • Each window of a sequence is compared with the regex, and the exact matches are returned as the hits.
  • This is more flexible than searching with the consensus sequence, but it is still not flexible enough, because only exact matches to the pattern are reported.
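
The window-by-window comparison can be sketched with Python's `re` module. The pattern and the sequence below are hypothetical and only illustrate the mechanics; the helper name `find_motif` is not from the slides.

```python
import re

def find_motif(pattern, sequence):
    """Return (start, matched_text) for every match, including overlapping ones."""
    # The zero-width lookahead lets the scan restart at every position,
    # which mimics sliding a window along the sequence one residue at a time.
    lookahead = re.compile(f"(?=({pattern}))")
    return [(m.start(), m.group(1)) for m in lookahead.finditer(sequence)]

# Hypothetical motif and protein fragment.
hits = find_motif("G[AS]G", "MKGAGSGLV")
print(hits)
```

A sequence that differs from the pattern at a single position is simply not reported, which is the lack of flexibility noted above.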

Search motifs using regexes

  • To increase the flexibility of regex searches, amino acids with similar physico-chemical properties can be allowed at the same position. Such patterns are called permissive regexes.


E-x-[EDQN]-x-K-[LIVM](2)-x-[KRH]-[LIVM](2)-x-[DNQE]-M-C-x(2)-Q-Y
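
A pattern in this PROSITE-style notation can be translated into an ordinary regular expression before searching. The converter below is a minimal sketch that handles only the elements used here (`x`, bracketed groups, `{...}` exclusions and `(n)` or `(n,m)` repeats); it is applied to the permissive pattern with both `[LIVM]` groups written in brackets, and the test sequence is made up.

```python
import re

def prosite_to_regex(pattern):
    """Translate a simple PROSITE-style pattern into Python regex syntax."""
    parts = []
    for element in pattern.split("-"):
        element = element.strip()
        # Split each element into its core and an optional (n) or (n,m) repeat.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            core = "."                    # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # any residue except those listed
        if low and high:
            core += "{" + low + "," + high + "}"
        elif low:
            core += "{" + low + "}"
        parts.append(core)
    return "".join(parts)

motif = "E-x-[EDQN]-x-K-[LIVM](2)-x-[KRH]-[LIVM](2)-x-[DNQE]-M-C-x(2)-Q-Y"
regex = prosite_to_regex(motif)
print(regex)
print(bool(re.search(regex, "EADAKLIAKVMADMCAAQY")))  # made-up matching sequence
```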


Searching with fingerprints

  • Several motifs, together called a fingerprint, can be compiled from the alignment of the members of a protein family, and a frequency matrix can be constructed for each motif.

Multiple alignment

Frequency profile


Searching with fingerprints

  • The frequency matrix or the resulting PSSM can be considered as a specialized scoring matrix of the fingerprint, while PAM and BLOSUM are general purpose scoring matrices.
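
To illustrate how a motif-specific matrix is used as a scoring matrix, the sketch below slides a small log-odds matrix along a sequence and reports the best-scoring window. The frequencies, the uniform background of 0.05 and the pseudocount are all hypothetical values chosen for the example, not data from the lecture.

```python
import math

# Hypothetical per-position frequencies for a 3-residue motif (illustration only).
freqs = [
    {"G": 0.9, "A": 0.1},
    {"K": 0.5, "R": 0.5},
    {"S": 0.7, "T": 0.3},
]
BACKGROUND = 0.05   # assumes 20 equally likely amino acids
PSEUDO = 0.001      # keeps unseen residues from giving log(0)

def log_odds(column, residue):
    """Log-odds score of one residue at one motif position."""
    return math.log2((column.get(residue, 0.0) + PSEUDO) / BACKGROUND)

def scan(sequence, freqs):
    """Score every window of motif width; return (score, start, window) of the best."""
    width = len(freqs)
    scores = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(log_odds(col, res) for col, res in zip(freqs, window))
        scores.append((score, start, window))
    return max(scores)

best = scan("MLGKSAAGRTW", freqs)
print(best)
```

Unlike PAM or BLOSUM, the score for a residue here depends on its position in the motif, so a window can score well even if it is not an exact match.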

The frequency matrix after 3 iterative searches of the Swiss-Prot database using the matrix of the fingerprint.

The frequency matrix after 3 iterative searches of the Swiss-Prot database using a PAM matrix.


Searching with fingerprints

  • The fingerprint (8 motifs) of the prion protein family is used to scan the human and chick prion protein sequences.
  • The human prion matches the fingerprint completely; the chick prion does not, but it still qualifies as a prion protein.
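
The idea of complete versus partial fingerprint matches can be sketched as follows. As a simplifying assumption, each motif is written as a regular expression and must occur after the previous hit, whereas real fingerprints are scored with frequency matrices; the three motifs and both sequences are invented for the example.

```python
import re

def fingerprint_hits(motifs, sequence):
    """Match motifs left to right; return the start of each hit, or None if missing."""
    positions = []
    cursor = 0  # each motif is searched for after the end of the previous hit
    for motif in motifs:
        m = re.compile(motif).search(sequence, cursor)
        if m:
            positions.append(m.start())
            cursor = m.end()
        else:
            positions.append(None)
    return positions

# Hypothetical 3-motif fingerprint and two made-up sequences.
motifs = ["GAG", "W[KR]", "C.C"]
full = fingerprint_hits(motifs, "MGAGLLWKPPCACD")
partial = fingerprint_hits(motifs, "MGAGLLPPCTCD")
print(full, partial)
```

A sequence that matches most motifs in the correct order can still be assigned to the family, which is what tolerating a partial match means in practice.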

Searching with blocks

  • Longer ungapped conserved regions, called blocks, can also be constructed for protein families.
  • A frequency matrix can be computed for a block, and each sequence can contribute differently to the score through sequence weights.
  • Highly similar sequences in the block are clustered and given smaller weights, while relatively distantly related sequences are given higher weights.
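
One simple way to realize such weighting is sketched below: sequences are grouped by single-linkage clustering on pairwise identity, and each sequence receives a weight of one divided by the size of its cluster. The 0.7 identity threshold, the weighting formula and the toy block are assumptions for illustration, not necessarily the scheme used to build published blocks.

```python
def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cluster_weights(block, threshold=0.7):
    """Single-linkage clustering; weight of a sequence = 1 / size of its cluster."""
    n = len(block)
    cluster = list(range(n))  # start with every sequence in its own cluster
    for i in range(n):
        for j in range(i + 1, n):
            if identity(block[i], block[j]) >= threshold:
                # Merge the cluster of j into the cluster of i.
                old, new = cluster[j], cluster[i]
                cluster = [new if c == old else c for c in cluster]
    sizes = {c: cluster.count(c) for c in set(cluster)}
    return [1 / sizes[c] for c in cluster]

# Toy block: three near-identical sequences and one distant relative.
block = ["ACDEF", "ACDEY", "ACDEF", "WCHKL"]
weights = cluster_weights(block)
print(weights)
```

With these weights the three similar sequences together count as much as the single divergent one when the frequency matrix is computed.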

Searching with block/profiles

  • The PSSMs of a few blocks of a protein family can be used to annotate members of the family.
  • In this case, gaps between blocks are considered, and gaps are also allowed within a block to make the search more flexible.

Analysis of G protein coupled receptors (GPCRs)

  • GPCRs are a very large group of proteins found in species from bacteria to mammals.

Have a very diverse spectrum of biological functions;

In vertebrates, GPCRs have undergone lineage-specific expansions through gene duplications, e.g., the human genome encodes more than 800 GPCR genes.

50% of marketed drugs target GPCRs;

Generate a revenue of $16 billion.

  • GPCRs carry out their functions by converting an extracellular signal into an intracellular signal.
  • A GPCR protein binds a signal molecule in the membrane domain or extracellular domain, and binds a specific G-protein via its intracellular domain.

Classification of GPCRs

  • Phylogenetic analysis suggests that GPCRs can be divided into three super-families (classes):
  • Rhodopsin-like
  • Secretin-like
  • Metabotropic glutamate receptor-like

Nature Reviews Drug Discovery 1, 599-608 (2002); doi:10.1038/nrd872


Diversity of the sequences of different GPCR classes

  • The sequence similarity between different GPCR super-families can be very low, e.g., bacterial rhodopsin and bovine rhodopsin share only 16% sequence identity.

The diversity of sequences of different GPCR classes

  • Although all GPCR classes possess a 7-transmembrane domain architecture, their 3-D structures can be quite different.
  • Therefore, it is still contentious whether all extant GPCRs evolved from the same ancestor or arose independently.


bovine rhodopsin


Sequence similarity of members of the same GPCR class

  • On the other hand, GPCRs in the same super-family may have very similar sequences, because they arose through recent lineage-specific gene expansions.
  • Because their sequences are so similar, it can be very challenging to predict their functions (orthologous relationships) by pairwise sequence comparison.

Example: two families of GPCRs, rhodopsins and opsins, control the light sensation of animals.

A rhodopsin binds a chromophore (retinal, a vitamin A derivative) and is responsible for vision in dim light.

Different opsins bind different chromophores and sense red, green and blue light, respectively.

Rhodopsins and opsins share 40% sequence similarity on average. However, opsins may have 98% sequence similarity.


Similarity of members of the same GPCR class

  • Pairwise sequence comparisons of rhodopsin and opsins can be misleading.

Green opsins of chick and goldfish are more similar to rhodopsins.

Blue and purple opsins in gecko and chameleon are more similar to rhodopsins.

Canonical green and red opsins


Similarity of members of the same GPCR class

  • BLAST fails to predict the function of the urotensin II receptor. According to the BLAST results, we might predict it to be a somatostatin or a galanin receptor.

Using motifs, fingerprints and domain signatures can help predict the function of a GPCR

  • Using the InterPro database for functional annotation.

The result of a search of InterPro with the human vasopressin 1A receptor sequence (V1AR_HUMAN)

  • PRINTS fingerprints PR00896 and PR00752 give the most specific annotations.

Using motifs, fingerprints and domain signatures can help predict the function of a GPCR

  • The PRINTS fingerprint of the urotensin family identifies the human urotensin receptor as a member, but that of the somatostatin family does not.

Analysis of G protein coupled receptors (GPCRs)

  • In general, signatures shared at different hierarchical levels of the super-family/family predict functions with different specificity.

Sub-family signature

Family signature

7 TM

Super-family signature