Protein function and classification


www.ebi.ac.uk/interpro

Hsin-Yu Chang

www.ebi.ac.uk


Greider and Blackburn discovered telomerase in 1984 and were awarded the Nobel Prize in 2009. Which model organism did they use for this study?

1. Tetrahymena

2. Saccharomyces cerevisiae

3. Mouse

4. Human


  • 1985: Discovery of telomerase (Greider and Blackburn)

  • 1989: Telomere hypothesis of cell senescence (Szostak)

  • 1995: Clone hTR

  • 1995/1997: Clone hTERT

  • 1997: Telomerase knockout mouse

  • 1998: Ectopic expression of telomerase in normal fibroblasts and epithelial cells bypasses the Hayflick limit

  • 1999/2000…: Telomerase/telomere dysfunctions and cancer

A single Tetrahymena cell has 40,000 telomeres, whereas a human cell only has 92.

Gilson and Ségal-Bendirdjian, Biochimie, 2010.


Therefore, protein classification could help scientists to gain information about protein functions.


In the lab, what do we usually do to analyse protein sequences and find out their functions?


What I used to do:

  • Protein BLAST

  • Publications: textbooks or papers

  • UniProt

  • PDB

  • Specialized protein databases such as SGD, the Human Protein Atlas, etc.


BLAST it?

  • Drawbacks:

    • Sometimes struggles with multi-domain proteins

    • Less useful for weakly similar sequences (e.g., divergent homologues)

  • Advantages:

    • Relatively fast

    • User friendly

    • Very good at recognising similarity between closely related sequences


Using BLAST to find clues to protein function: when it goes well


Pairwise alignment of two proteins: CD4 from two closely related species


Using BLAST to find clues to protein function: when it does not give you much information


Because BLAST performs local pairwise alignment, it:

  • Cannot encode the information found in a multiple sequence alignment, which shows you the conserved sites.


60S acidic ribosomal protein P0: multiple sequence alignment

Using pairwise alignment could miss out on conserved residues
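One way to see what a multiple sequence alignment adds is to score how conserved each column is. The sketch below is a minimal illustration only: the toy alignment and the function name `column_conservation` are invented for this example and are not taken from the ribosomal protein P0 alignment on the slide.

```python
from collections import Counter

def column_conservation(alignment):
    """Return, for each column, the fraction of sequences sharing the most common residue."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]  # ignore gap characters
        if not residues:
            scores.append(0.0)
            continue
        most_common_count = Counter(residues).most_common(1)[0][1]
        # Divide by the number of sequences so that gaps lower the score
        scores.append(most_common_count / len(column))
    return scores

# Toy alignment (hypothetical sequences, for illustration only)
toy_alignment = [
    "MKVLG",
    "MRVIG",
    "MQVAG",
    "MKV-G",
]

scores = column_conservation(toy_alignment)
conserved_columns = [i for i, s in enumerate(scores) if s == 1.0]
print(conserved_columns)
```

Comparing any two of these sequences on their own shows which positions agree between that pair, but only the column-wise view across all sequences shows which positions are invariant in the whole family.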


An alternative approach: protein signature search

  • Model the pattern of conserved amino acids at specific positions within a multiple sequence alignment

  • Use these models to infer relationships with the characterised sequences (from which the alignment was constructed)

  • This is the approach taken by protein signature databases


Three different protein signature approaches

  • Patterns: single motif methods

  • Fingerprints: multiple motif methods

  • Profiles & HMMs (hidden Markov models): full alignment methods


Patterns

Patterns are usually directed against functional sequence features such as active sites, binding sites, etc.

[Figure: sequence alignment → motif → pattern sequences → regular expression → pattern signature]

Pattern sequences:

ALVKLISG

AIVHESAT

CHVRDLSC

CPVESTIS

Regular expression:

[AC]-x-V-x(4)-{ED}

Pattern signature: PS00000
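A pattern such as `[AC]-x-V-x(4)-{ED}` maps closely onto an ordinary regular expression. The sketch below is a minimal translator, written for illustration and assuming only the most common PROSITE-style elements: `x` for any residue, `[...]` for allowed residues, `{...}` for forbidden residues, `(n)` or `(n,m)` for repeats, and `<` or `>` for anchoring to either end of the sequence. The function name `prosite_to_regex` is hypothetical, and this is not the scanner used by any member database.

```python
import re

def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression string."""
    pattern = pattern.strip().rstrip(".")
    anchored_start = pattern.startswith("<")
    anchored_end = pattern.endswith(">")
    pattern = pattern.strip("<>")
    parts = []
    for element in pattern.split("-"):
        element = element.strip()
        # Split off an optional repeat count such as (4) or (2,4)
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.groups()
        if core == "x":
            regex = "."                      # any residue
        elif core.startswith("[") and core.endswith("]"):
            regex = core                     # one of the listed residues
        elif core.startswith("{") and core.endswith("}"):
            regex = "[^" + core[1:-1] + "]"  # any residue except those listed
        else:
            regex = core                     # a single required residue
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(regex)
    return ("^" if anchored_start else "") + "".join(parts) + ("$" if anchored_end else "")

regex = prosite_to_regex("[AC]-x-V-x(4)-{ED}")
sequences = ["ALVKLISG", "AIVHESAT", "CHVRDLSC", "CPVESTIS"]
matches = [seq for seq in sequences if re.search(regex, seq)]
print(regex, matches)
```

All four pattern sequences from the slide match, while a sequence ending in E or D at the last position would be rejected by the forbidden-residue element.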


Patterns

  • Advantages:

    • Can anchor the match to the extremity of a sequence, e.g. <M-R-[DE]-x(2,4)-[ALT]-{AM}

    • Strict: a pattern with very little variability and forbidden residues can produce highly accurate matches

  • Drawbacks:

    • Simple but less flexible


Fingerprints: a multiple motif approach

[Figure: sequence alignment → define motifs (Motif 1, Motif 2, Motif 3) → motif sequences → weight matrices → fingerprint signature]

Fingerprint signature: PR00000


The significance of motif context

  • Identify small conserved regions in proteins

  • Several motifs characterise a family

  • Offer improved diagnostic reliability over single motifs by virtue of the biological context provided by motif neighbours

[Figure: motifs 1, 2 and 3 along a sequence, showing their order and the intervals between them]
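The idea that motif order and spacing add diagnostic power can be sketched in a few lines. This is a deliberately simplified illustration: real fingerprints score motifs with weight matrices, whereas here each motif is a plain regular expression, the search is greedy (it takes the first occurrence of each motif), and the motifs, sequences and the name `match_fingerprint` are all hypothetical.

```python
import re

def match_fingerprint(sequence, motifs, max_gap):
    """Return start positions of the motifs if all occur in order, else None.

    motifs  -- list of regular expressions, one per motif, in the expected order
    max_gap -- maximum number of residues allowed between consecutive motifs
    """
    positions = []
    search_from = 0
    previous_end = None
    for motif in motifs:
        hit = re.compile(motif).search(sequence, search_from)
        if hit is None:
            return None                      # a motif is missing or out of order
        if previous_end is not None and hit.start() - previous_end > max_gap:
            return None                      # motifs are too far apart
        positions.append(hit.start())
        previous_end = hit.end()
        search_from = hit.end()
    return positions

# Hypothetical motifs and sequences, for illustration only
motifs = ["GKT", "DE[AS]D", "HRIG"]
in_order = match_fingerprint("MAGKTLLAVDEADKKPHRIGRS", motifs, max_gap=5)
out_of_order = match_fingerprint("MAHRIGLLDEADKKGKTRS", motifs, max_gap=5)
too_far = match_fingerprint("MAGKTLLAVDEADKKPHRIGRS", motifs, max_gap=2)
print(in_order, out_of_order, too_far)
```

The second sequence contains all three motifs but in the wrong order, so it is rejected even though each motif on its own would have matched.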


Fingerprints

  • Good at modelling the often small differences between closely related proteins

  • Distinguish individual subfamilies within protein families, allowing functional characterisation of sequences at a high level of specificity


Profiles & HMMs

[Figure: sequence alignment → define coverage (whole protein or entire domain) → use entire alignment of domain or protein family → build model → profile or HMM signature]


Profiles

  • Start with a multiple sequence alignment

  • Amino acids at each position in the alignment are scored according to the frequency with which they occur

  • Scores are weighted according to evolutionary distance using a BLOSUM matrix

  • Good at identifying homologues
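The frequency-based scoring described above can be sketched as a small position-specific scoring matrix. Note the simplifications, which are assumptions of this example rather than how any member database builds its profiles: instead of BLOSUM-based weighting it uses a flat pseudocount, it assumes a uniform background frequency for the 20 amino acids, and it only scores ungapped sequences of the same length as the alignment. The alignment reuses the four short pattern sequences shown earlier in the presentation.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_profile(alignment, pseudocount=1.0):
    """Build a log-odds score for every amino acid at every alignment column."""
    background = 1.0 / len(AMINO_ACIDS)      # assumption: uniform background
    profile = []
    for column in zip(*alignment):
        counts = Counter(r for r in column if r in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile

def score_sequence(profile, sequence):
    """Sum the column scores for an ungapped sequence as long as the profile."""
    if len(sequence) != len(profile):
        raise ValueError("sequence length must equal the number of profile columns")
    return sum(column[aa] for column, aa in zip(profile, sequence))

alignment = ["ALVKLISG", "AIVHESAT", "CHVRDLSC", "CPVESTIS"]
profile = build_profile(alignment)
member_score = score_sequence(profile, "ALVKLISG")
unrelated_score = score_sequence(profile, "WWWWWWWW")
print(round(member_score, 2), round(unrelated_score, 2))
```

A sequence that resembles the alignment scores above zero, while a sequence made of residues never seen in any column scores below zero.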


HMMs

  • Start with a multiple sequence alignment

  • Amino acid frequencies at each position in the alignment, and the transition probabilities between positions, are encoded

  • Insertions and deletions are also modelled

  • Can model very divergent regions of alignment

  • Very good at identifying evolutionarily distant homologues
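To make the roles of emissions, transitions, insertions and deletions concrete, the sketch below counts them from a toy alignment. It is an illustration under stated assumptions, not a full profile HMM: columns with more than half gaps are treated as insert columns (an arbitrary threshold), transition counts are pooled over all positions rather than kept per position, no pseudocounts are added, and no search algorithm is implemented. The alignment and function names are hypothetical.

```python
from collections import Counter, defaultdict

def estimate_profile_hmm(alignment, max_gap_fraction=0.5):
    """Count emissions and state transitions for a simple profile HMM.

    Columns with at most max_gap_fraction gaps become match columns; the
    remaining columns are treated as insertions relative to the model.
    """
    n_seqs = len(alignment)
    is_match = [
        column.count("-") / n_seqs <= max_gap_fraction
        for column in zip(*alignment)
    ]
    emissions = defaultdict(Counter)     # match position -> residue counts
    transitions = Counter()              # (from_state, to_state) -> count
    for seq in alignment:
        previous = "START"
        position = 0                     # index of the current match column
        for residue, match_column in zip(seq, is_match):
            if match_column:
                position += 1
                if residue == "-":
                    state = "D"          # deletion: match column skipped
                else:
                    state = "M"
                    emissions[position][residue] += 1
            elif residue != "-":
                state = "I"              # insertion between match columns
            else:
                continue                 # gap in an insert column: no state
            transitions[(previous, state)] += 1
            previous = state
        transitions[(previous, "END")] += 1
    return is_match, emissions, transitions

def to_probabilities(counts):
    """Normalise a table of counts so that the values sum to 1."""
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()}

# Toy alignment (hypothetical sequences, for illustration only)
toy_alignment = ["MKV-G", "MRV-G", "M-VAG", "MKV-G"]
is_match, emissions, transitions = estimate_profile_hmm(toy_alignment)
print(is_match)
print(to_probabilities(emissions[2]))
print(dict(transitions))
```

The third sequence contributes one deletion (the gap in the second column) and one insertion (the extra A), which is the information a single pairwise alignment or a fixed-length pattern cannot capture.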



www.ebi.ac.uk/interpro


The aim of InterPro


What is InterPro?

  • InterPro is an integrated sequence analysis resource

  • It combines predictive models (known as signatures) from different databases to provide functional analysis of protein sequences by classifying them into families and predicting domains and important sites


Facts about InterPro

  • First release in 1999

  • 11 partner databases

  • Forms part of the automated system that adds annotation to UniProtKB/TrEMBL

  • Provides matches to over 80% of UniProtKB

  • Source of >60 million Gene Ontology (GO) mappings to >17 million distinct UniProtKB sequences

  • 50,000 unique visitors to the web site per month

  • >2 million sequences searched online per month, plus offline searches with the downloadable version of the software


[Figure: InterPro member databases (including HAMAP), grouped by focus (protein features (sites), functional annotation of families/domains, structural domains) and by method (patterns, fingerprints, profiles, hidden Markov models)]


InterPro signature integration process

  • Signatures are provided by member databases

  • They are scanned against the UniProt database to see which sequences they match

  • Curators manually inspect the matches before integrating the signatures into InterPro

  • Signatures representing the same entity are integrated together

  • Relationships between entries are traced, where possible

  • Curators add literature-referenced abstracts, cross-references to other databases, and GO terms


http://www.ebi.ac.uk/interpro/


Search using protein sequences

[Screenshots: search results, with the matched family and the entry type highlighted]


InterPro entry types

  • Family: proteins that share a common evolutionary origin, as reflected in their related functions, sequences or structure

  • Domain: distinct functional, structural or sequence units that may exist in a variety of biological contexts

  • Repeats: short sequences typically repeated within a protein

  • Sites: active site, binding site, conserved site, PTM sites


[Screenshots: an InterPro entry page, annotated with its type, name, identifier, contributing signatures, relationships, description, references and GO terms]
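The fields annotated on an entry page can be pictured as a simple record. The sketch below is only a way of organising those fields for illustration: the class name `InterProEntry`, its attribute names and the placeholder accession `IPR000000` are assumptions of this example and do not describe InterPro's actual data model or file formats. The signature accessions reuse the placeholders `PS00000` and `PR00000` from earlier slides.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class InterProEntry:
    identifier: str
    name: str
    entry_type: str                       # e.g. "Family", "Domain", "Repeat", "Site"
    contributing_signatures: List[str] = field(default_factory=list)
    description: str = ""
    references: List[str] = field(default_factory=list)
    go_terms: List[str] = field(default_factory=list)
    parent: Optional[str] = None          # identifier of the parent entry, if any
    children: List[str] = field(default_factory=list)

example = InterProEntry(
    identifier="IPR000000",               # placeholder, not a real accession
    name="Example domain",
    entry_type="Domain",
    contributing_signatures=["PS00000", "PR00000"],
    go_terms=["GO:0016020"],              # "membrane", as listed later in the slides
)
print(example)
```

Keeping the contributing signatures as a list reflects the point that one entry can integrate several signatures that represent the same entity.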


InterPro family and domain relationships


Family relationships in InterPro:

  • Interleukin-15/Interleukin-21 family

    • Interleukin-15

      • Interleukin-15, mammal

      • Interleukin-15, fish

      • Interleukin-15, avian


Relationships


InterPro relationships: domains

  • Protein kinase-like domain

    • Protein kinase catalytic domain

      • Tyrosine kinase catalytic domain

      • Serine/threonine kinase catalytic domain
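Parent/child relationships like these form a tree that is easy to walk programmatically. The sketch below encodes the protein kinase example as a child-to-parent mapping; the nesting is this example's reading of the slide (most general entry at the top), accessions are omitted, and the helper names `lineage` and `children` are hypothetical.

```python
# child -> parent, as read from the slide (names only; accessions omitted)
PARENT = {
    "Protein kinase catalytic domain": "Protein kinase-like domain",
    "Tyrosine kinase catalytic domain": "Protein kinase catalytic domain",
    "Serine/threonine kinase catalytic domain": "Protein kinase catalytic domain",
}

def lineage(entry, parent_of=PARENT):
    """Return the entry followed by its ancestors, most specific first."""
    path = [entry]
    while path[-1] in parent_of:
        path.append(parent_of[path[-1]])
    return path

def children(entry, parent_of=PARENT):
    """Return the direct children of an entry, sorted by name."""
    return sorted(child for child, parent in parent_of.items() if parent == entry)

tyrosine_lineage = lineage("Tyrosine kinase catalytic domain")
catalytic_children = children("Protein kinase catalytic domain")
print(" > ".join(reversed(tyrosine_lineage)))
print(catalytic_children)
```

A match to the most specific entry implies membership of every entry above it, which is why a hit to the tyrosine kinase catalytic domain can also be read as a protein kinase-like domain.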


A brief diversion into the Gene Ontology...


Gene Ontology

  • Unify the representation of gene and gene product attributes across species

  • Allow cross-species and/or cross-database comparisons


The Gene Ontology

  • A way to capture biological knowledge in a written and computable form

  • A set of concepts and their relationships to each other, arranged as a hierarchy

[Figure: a hierarchy running from less specific concepts at the top to more specific concepts at the bottom]

www.ebi.ac.uk/QuickGO


The Concepts in GO

1. Molecular Function

  • protein kinase activity

  • insulin receptor activity

2. Biological Process

  • Cell cycle

  • Microtubule cytoskeleton organisation

3. Cellular Component


GO:0006955 Immune response

GO:0016020 membrane


Summary

InterPro is a sequence analysis resource that classifies sequences into protein families and predicts important domains and sites

It uses protein signatures based on different methodologies from different member databases

Its member databases all have their particular niche or focus...

...but InterPro offers a combination of all their areas of expertise!


Why use InterPro?

  • Large amounts of manually curated data

    • 35,634 signatures integrated into 25,214 entries

    • Cites 38,877 PubMed publications

  • Large coverage of protein sequence space

  • Regularly updated

    • ~ 8 week release schedule

    • New signatures added

    • Scanned against latest version of UniProtKB


Caution

And one more thing…..

  • InterPro is a predictive protein signature database - results are predictions, and should be treated as such

  • InterPro entries are based on signatures supplied to us by our member databases

    • ....this means no signature, no entry!

We need your feedback!

  • Missing/additional references

  • Reporting problems

  • Requests

EBI support page.


The InterPro Team:

Hsin-Yu Chang

Alex Mitchell

Craig McAnulla

Siew-Yit Yong

Amaia Sangrador

Sarah Hunter

Gift Nuka

Sebastien Pesseat

Matthew Fraser

Maxim Scheremetjew

Louise Daugherty


Thank you!

www.ebi.ac.uk

Twitter: @emblebi

Facebook: EMBLEBI

YouTube: EMBLMedia

