Predicting Protein Function

[Diagram: DNA, RNA, protein]


Biochemical function (molecular function)

What does it do? Is it a kinase? A ligase?

Page 245


Function based on ligand binding specificity

What (or who) does it bind?

Page 245


Function based on biological process

What is it good for? Amino acid metabolism?

Page 245


Function based on cellular location

Where is it active? In the nucleolus? In the cytoplasm?

Page 245


Function based on cellular location

Where is the RNA/protein expressed? Brain? Testis?

Where is it underexpressed?

Page 245


GO (Gene Ontology) http://www.geneontology.org/

  • The GO project aims to develop three structured, controlled vocabularies (ontologies) that describe gene products in terms of their associated:

  • molecular functions (F)

  • biological processes (P)

  • cellular components (C)

An ontology is a description of the concepts and relationships that can exist for an agent or a community of agents.
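The three GO vocabularies can be pictured as three independent lists of terms attached to one gene product. The sketch below is only an illustration: the class `GeneProduct`, the protein name, and the chosen terms are hypothetical and are not taken from the GO database or from any GO software.

```python
from dataclasses import dataclass, field

# The three GO aspects named on the slide.
ASPECTS = {
    "F": "molecular function",
    "P": "biological process",
    "C": "cellular component",
}


@dataclass
class GeneProduct:
    """Hypothetical container for the GO-style annotations of one protein."""
    name: str
    annotations: dict = field(
        default_factory=lambda: {aspect: [] for aspect in ASPECTS}
    )

    def annotate(self, aspect, term):
        # Reject anything that is not one of the three ontologies.
        if aspect not in ASPECTS:
            raise ValueError(f"unknown aspect: {aspect!r}")
        self.annotations[aspect].append(term)


# Illustrative annotations for a made-up protein.
protein = GeneProduct("hypothetical_protein_1")
protein.annotate("F", "kinase activity")
protein.annotate("P", "amino acid metabolic process")
protein.annotate("C", "cytoplasm")

for aspect, terms in protein.annotations.items():
    print(ASPECTS[aspect], "->", terms)
```

Each aspect answers one of the questions raised earlier: what the protein does (F), what it is good for (P), and where it is active (C).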


Inferring protein function: the bioinformatics approach

  • Based on homology

  • Based on functional characteristics

  • “protein signature”


Homologous proteins

  • Rule of thumb: proteins are generally considered homologous if they are at least 25% identical over an alignment of more than 100 residues
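The rule of thumb above can be sketched as a small check on a pairwise alignment. This is a minimal sketch under stated assumptions: the two sequences are already aligned with `-` as the gap character, identity is computed over ungapped columns only (other definitions exist), and the function names are illustrative.

```python
def percent_identity(aligned_a, aligned_b):
    """Return (percent identity, number of ungapped columns) for two aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences have a residue.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0, 0
    identical = sum(1 for a, b in pairs if a == b)
    return 100.0 * identical / len(pairs), len(pairs)


def likely_homologous(aligned_a, aligned_b, min_identity=25.0, min_length=100):
    """Apply the slide's rule of thumb: >= 25% identity over more than 100 residues."""
    identity, length = percent_identity(aligned_a, aligned_b)
    return identity >= min_identity and length > min_length


# Tiny example: 3 of 4 ungapped columns are identical.
print(percent_identity("ACDE-", "ACDFG"))
```

The rule is only a heuristic; short alignments can reach 25% identity by chance, which is why the length condition is part of it.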


Homologous proteins

Proteins with a common evolutionary origin.

Orthologs - Proteins from different species that evolved by speciation.

Example: human hemoglobin vs. mouse hemoglobin

Paralogs - Proteins encoded within a given species that arose from one or more gene duplication events.

Example: human hemoglobin vs. human myoglobin


COGs: Clusters of Orthologous Groups of proteins

  • Each COG consists of individual orthologous proteins or orthologous sets of paralogs.

  • Orthologs typically have the same function, allowing transfer of functional information from one member to an entire COG.

Reference: Classification of conserved genes according to their homologous relationships (Koonin et al., NAR)



Inferring protein function based on the protein signature


The Protein Signature

  • Motif (or fingerprint):

    • a short, conserved region of a protein

    • typically 10 to 20 contiguous amino acid residues

  • Domain:

    • a region of a protein that can adopt a stable three-dimensional structure on its own


Protein Motifs

Protein motifs can be represented as a consensus or a profile.

         1                                                   50
ecblc    MRLLPLVAAA TAAFLVVACS SPTPPRGVTV VNNFDAKRYL GTWYEIARFD
vc       MRAIFLILCS V...LLNGCL G..MPESVKP VSDFELNNYL GKWYEVARLD
hsrbp    ~~~MKWVWAL LLLAAWAAAE RDCRVSSFRV KENFDKARFS GTWYAMAKKD

Consensus of the conserved block (alternative residues are listed beneath each position):

         G T W Y E I
           K     A V
                   M

Pattern: GXW[YF][EA][IVLM]
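The pattern on this slide maps directly onto a regular expression, which is essentially how pattern-based motif searches work. The sketch below assumes that `X` means "any residue" and that `.` and `~` in the alignment are gap or padding characters; the function name `find_motif` is illustrative.

```python
import re

# GXW[YF][EA][IVLM] written as a Python regular expression (X -> any residue).
MOTIF = re.compile(r"G.W[YF][EA][IVLM]")

# The three aligned sequences from the slide, gaps included.
SEQUENCES = {
    "ecblc": "MRLLPLVAAA TAAFLVVACS SPTPPRGVTV VNNFDAKRYL GTWYEIARFD",
    "vc":    "MRAIFLILCS V...LLNGCL G..MPESVKP VSDFELNNYL GKWYEVARLD",
    "hsrbp": "~~~MKWVWAL LLLAAWAAAE RDCRVSSFRV KENFDKARFS GTWYAMAKKD",
}


def find_motif(sequence):
    """Return (0-based start, matched text) for every motif hit in the ungapped sequence."""
    ungapped = re.sub(r"[^A-Z]", "", sequence.upper())  # drop spaces, '.', '~'
    return [(m.start(), m.group()) for m in MOTIF.finditer(ungapped)]


for name, seq in SEQUENCES.items():
    print(name, find_motif(seq))
```

Note that the reported positions refer to the ungapped sequences, so they differ from the alignment columns wherever a sequence contains gaps.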


Searching for Protein Motifs

- PROSITE: a database of protein patterns that can be searched by either regular expression patterns or sequence profiles.

- PHI-BLAST: searches for a specific protein sequence pattern together with local alignments surrounding the match.

- MEME: searches for common motifs in unaligned sequences.


Protein Domains

  • Domains can be considered as building blocks of proteins.

  • Some domains can be found in many proteins with different functions, while others are only found in proteins with a certain function.


DNA binding domain: zinc finger


Varieties of protein domains

  • Extending along the length of a protein

  • Occupying a subset of a protein sequence

  • Occurring one or more times

Page 228


Example of a protein with 2 domains: Methyl CpG binding protein 2 (MeCP2)

The protein includes a methylated DNA binding domain (MBD) and a transcriptional repression domain (TRD).

MeCP2 is a transcriptional repressor.


Result of an MeCP2 blastp search:

A methyl-binding domain shared by several proteins


Are proteins that share only a domain homologous?


Pfam

  • A database that contains a large collection of multiple sequence alignments of protein domains

  • Based on profile hidden Markov models (HMMs)


Profile HMM (Hidden Markov Model)

A profile HMM is a probabilistic model of a multiple sequence alignment (MSA), consisting of a number of interconnected states.

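[Figure: profile HMM for alignment columns 16-19, with match states M16-M19, insert states I16-I19 and delete states D16-D19. Transitions in the figure are labeled 100% or 50%.]

Alignment (columns 16 17 18 19):

  D R T R
  D R T S
  S - - S
  S P T R
  D R T R
  D P T S
  D - - S
  D - - S
  D - - S
  D - - R

Match-state emission probabilities:

  M16: D 0.8, S 0.2
  M17: P 0.4, R 0.6
  M18: T 1.0
  M19: R 0.4, S 0.6

Insert states I16-I19 are labeled X (any residue).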
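The numbers in this figure can be recovered by simple counting over the ten aligned sequences. The sketch below is a simplification, not how Pfam or any HMM package builds its models: it uses raw frequencies without pseudocounts, treats every column as a match column, and ignores insert states. The function names are illustrative.

```python
from collections import Counter

# The ten aligned sequences from the slide (columns 16 to 19).
ALIGNMENT = [
    "DRTR", "DRTS", "S--S", "SPTR", "DRTR",
    "DPTS", "D--S", "D--S", "D--S", "D--R",
]
FIRST_COLUMN = 16


def match_emissions(alignment):
    """Per-column residue frequencies, ignoring gaps (no pseudocounts)."""
    emissions = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]
        counts = Counter(residues)
        emissions.append({aa: n / len(residues) for aa, n in counts.items()})
    return emissions


def transitions_from_match(alignment, col):
    """Of the sequences with a residue in `col`, the fraction that continue to a
    residue (match) versus a gap (delete) in the next column."""
    in_match = [seq for seq in alignment if seq[col] != "-"]
    to_match = sum(1 for seq in in_match if seq[col + 1] != "-")
    fraction = to_match / len(in_match)
    return {"match": fraction, "delete": 1.0 - fraction}


for offset, probs in enumerate(match_emissions(ALIGNMENT)):
    print(f"M{FIRST_COLUMN + offset}:", probs)
print("M16 ->", transitions_from_match(ALIGNMENT, 0))
```

Half of the sequences have a gap in columns 17 and 18, which is where the 50% transitions out of M16 come from; the sequences that enter column 17 with a residue always continue to a residue in column 18, giving a 100% transition.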


Pfam

  • The Pfam database is based on two distinct classes of alignments:

    • Seed alignments, which are deemed to be accurate and are used to produce Pfam-A

    • Alignments derived by automatic clustering of SwissProt, which are less reliable and give rise to Pfam-B


Physical properties of proteins


DNA binding domains have a relatively high frequency of basic (positive) amino acids

GCN4:   MKDPAALKRARNTEAARRSSRARKLQRM

zif268: MERPYACPVESCDRRFSRSDELTRHIRIHT

myoD:   SKVNEAFETLKRCTSSNPNQRLPKVEILRNAIR
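One way to put a number on this observation is to compute the fraction of basic residues in a sequence. The sketch below counts lysine (K) and arginine (R) by default; whether to include histidine (H) is a judgment call, so it is left as an option. The function name is illustrative.

```python
BASIC = set("KR")  # lysine and arginine; histidine is optional (see below)


def basic_fraction(sequence, include_histidine=False):
    """Fraction of residues that are basic; non-letter characters are ignored."""
    residues = [c for c in sequence.upper() if c.isalpha()]
    if not residues:
        raise ValueError("empty sequence")
    basic = BASIC | {"H"} if include_histidine else BASIC
    return sum(1 for c in residues if c in basic) / len(residues)


# The zif268 fragment shown on the slide.
zif268 = "MERPYACPVESCDRRFSRSDELTRHIRIHT"
print(round(basic_fraction(zif268), 2))
print(round(basic_fraction(zif268, include_histidine=True), 2))
```

A value like this only becomes meaningful when compared against a background, for example the same fraction computed over a set of proteins that do not bind DNA.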


Transmembrane proteins have a unique hydrophobicity pattern
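A common way to expose such a pattern is a sliding-window hydropathy plot. The sketch below uses the Kyte-Doolittle hydropathy scale; the window of 19 residues and the threshold of 1.6 are conventional choices for spotting candidate membrane-spanning segments and are assumptions here, not values from the slides. The test sequence is synthetic.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Average hydropathy of every window of the given length along the sequence."""
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def has_candidate_tm_segment(sequence, window=19, threshold=1.6):
    """True if any window reaches the hydropathy threshold."""
    return any(value >= threshold for value in hydropathy_profile(sequence, window))


# Synthetic example: a hydrophobic stretch flanked by charged residues.
toy = "DEKR" * 3 + "L" * 19 + "DEKR" * 3
print(has_candidate_tm_segment(toy))
```

A peak in the profile marks only a candidate segment; dedicated transmembrane predictors use more information than average hydropathy.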


Physical properties of proteins

Many websites are available for the analysis of individual proteins, for example:

  • ExPASy

  • UCSC Proteome Browser

  • ProtoNet (HUJI)

The accuracy of the analysis programs is variable. Predictions based on the primary amino acid sequence (such as molecular weight prediction) are likely to be more trustworthy. For many other properties (such as phosphorylation sites), experimental evidence may be required rather than prediction algorithms.
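Molecular weight is trustworthy precisely because it follows directly from the sequence: sum the residue masses and add one water for the free termini. The sketch below uses approximate average residue masses in daltons; it ignores post-translational modifications and is not the code of any of the websites listed above.

```python
# Approximate average residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # added once for the N- and C-terminal groups


def molecular_weight(sequence):
    """Approximate average molecular weight (Da) of an unmodified peptide."""
    if not sequence:
        raise ValueError("empty sequence")
    return sum(RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER


print(round(molecular_weight("GG"), 2))
```

By contrast, a property such as a phosphorylation site depends on the cellular context, which a calculation from the sequence alone cannot capture.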

Page 236


Knowledge Based Approach

  • IDEA

    Find the common properties of a protein family (or any group of proteins of interest) that are unique to the group and distinguish it from all other proteins.

    Generate a model for the group and predict new members of the family that have similar properties.


Knowledge Based Approach

Basic Steps

1. Building a Model

  • Generate a dataset of proteins with a common function (e.g., DNA binding proteins)

  • Generate a control dataset (e.g., non-DNA binding proteins)

  • Calculate the properties that are characteristic of the protein family of interest for all the proteins in the data (both the DNA binding proteins and the non-DNA binding proteins)

  • Represent each protein in a set by a vector of calculated features and build a statistical model to separate the groups


Basic Steps

2. Predicting the function of a new protein

  • Calculate the properties for the new protein and represent them in a vector

  • Predict whether the tested protein belongs to the family
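The two steps can be sketched end to end with a deliberately simple model. Everything specific here is an assumption for illustration: the three features (fractions of basic, acidic and hydrophobic residues), the nearest-centroid classifier standing in for the unspecified "statistical model", and the two control sequences, which are made up. The family sequences are the DNA binding fragments shown earlier.

```python
def features(sequence):
    """Feature vector: fractions of basic, acidic and hydrophobic residues."""
    n = len(sequence)
    return [
        sum(sequence.count(aa) for aa in "KR") / n,
        sum(sequence.count(aa) for aa in "DE") / n,
        sum(sequence.count(aa) for aa in "AILMFVW") / n,
    ]


def centroid(vectors):
    """Component-wise mean of a list of feature vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def build_model(family, control):
    """Step 1: one centroid per group (a stand-in for a real statistical model)."""
    return {
        "family": centroid([features(s) for s in family]),
        "control": centroid([features(s) for s in control]),
    }


def predict(model, sequence):
    """Step 2: assign the new protein to the group with the nearest centroid."""
    vector = features(sequence)
    return min(model, key=lambda label: squared_distance(model[label], vector))


# DNA binding fragments from the earlier slide.
FAMILY = [
    "MKDPAALKRARNTEAARRSSRARKLQRM",
    "MERPYACPVESCDRRFSRSDELTRHIRIHT",
    "SKVNEAFETLKRCTSSNPNQRLPKVEILRNAIR",
]
# Hypothetical control sequences, invented for this example.
CONTROL = [
    "MDEELLSAVEDGDLEALLAAGADVNA",
    "GSDEEVLTAIEDAGFDPSLLESG",
]

model = build_model(FAMILY, CONTROL)
print(predict(model, "KRKRARRAKK"))
```

With only a handful of sequences and three features this is a toy; a realistic model would need a large, carefully chosen dataset and validation on proteins that were not used to build it.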


TEST CASE

Y14 - A protein sequence translated from an ORF (Open Reading Frame) obtained from the complete Drosophila genome

>Y14

PQRSVGWILFVTSIHEEAQEDEIQEKFCDYGEIKNIHLNLDRRTGFSKGYALVEYETHKQALAAKEALNGAEIMGQTIQVDWCFVKGG



Y14 DOES NOT BIND RNA


Databases and Tools for protein families and domains

  • InterPro - Integrated Resource of Protein Domains and Functional Sites

  • PROSITE - A database of protein families and domains

  • BLOCKS - BLOCKS db

  • Pfam - Protein families db (HMM derived)

  • PRINTS - Protein Motif fingerprint db

  • ProDom - Protein domain db (Automatically generated)

  • PROTOMAP - An automatic hierarchical classification of Swiss-Prot proteins

  • SBASE - SBASE domain db

  • SMART - Simple Modular Architecture Research Tool

  • TIGRFAMs - TIGR protein families db

