Predicting Protein Function
Download
1 / 39

protein - PowerPoint PPT Presentation


  • 175 Views
  • Uploaded on

Predicting Protein Function. protein. RNA. DNA. Biochemical function (molecular function). What does it do? Kinase??? Ligase???. Page 245. Function based on ligand binding specificity. What (who) does it bind ??. Page 245. Function based on biological process.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about ' protein' - aria


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

Biochemical function

(molecular function)

What does it do?

Kinase???

Ligase???

Page 245


Function based on

ligand binding specificity

What (who) does it bind ??

Page 245


Function based

on biological process

What is it good for ??

Amino acid metabolism?

Page 245


Function based on

cellular location

DNA

RNA

Where is it active??

Nucleolus ?? Cytoplasm??

Page 245


Function based on

cellular location

DNA

RNA

Where is the RNA/Protein Expressed ??

Brain? Testis?

Where it is under expressed??

Page 245


Go gene ontology http www geneontology org
GO (gene ontology)http://www.geneontology.org/

  • The GO project is aimed to develop three structured, controlled vocabularies (ontologies) that describe gene products in terms of their associated

  • molecular functions(F)

  • biological processes (P)

  • cellular components (C)

Ontology is a description of the concepts and relationships that can

exist for an agent or a community of agents


Inferring protein function bioinformatics approach
Inferring protein function Bioinformatics approach

  • Based on homology

  • Based on functional characteristics

  • “protein signature”


Homologous proteins
Homologous proteins

  • Rule of thumb:Proteins are homologous if 25% identical (length >100)


Proteins with a common evolutionary origin

Homologous proteins

Orthologs - Proteins from different species that evolved by speciation.

Hemoglobin human vsHemoglobin mouse

Paralogs - Proteins encoded within a given species that arose from one or

more gene duplication events.

Hemoglobin human vsMyoglobin human


Cogs c lusters of o rthologous g roups of proteins
COGs Clustersof Orthologous Groupsof proteins

> Each COG consists of individual orthologous proteins or orthologous sets of paralogs.

> Orthologs typically have the same function, allowing transfer of functional information from one member to an entire COG.

Refence: Classification of conserved genes according to their homologous relationships. (Koonin et al., NAR)

DATABASE



The Protein Signature

  • Motif (or fingerprint):

  • a short, conserved region of a protein

  • typically 10 to 20 contiguous amino acid residues

  • Domain:

  • A region of a protein that can adopt a 3 dimensional structure


Protein Motifs

Protein motifs can be represented as a consensus or a profile

1 50

ecblc MRLLPLVAAA TAAFLVVACS SPTPPRGVTV VNNFDAKRYL GTWYEIARFD

vc MRAIFLILCS V...LLNGCL G..MPESVKP VSDFELNNYL GKWYEVARLD

hsrbp ~~~MKWVWAL LLLAAWAAAE RDCRVSSFRV KENFDKARFS GTWYAMAKKD

GTWYEI

K AV

M

GXW[YF][EA][IVLM]


Searching for Protein Motifs

- ProSite a database of protein patterns that can be searched

by either regular expression patterns or sequence profiles.

- PHI BLASTSearching a specific protein sequence pattern

with local alignments surrounding the match.

-MEME searching for a common motifs in unaligned sequences


Protein domains
Protein Domains

  • Domains can be considered as building blocks of proteins.

  • Some domains can be found in many proteins with different functions, while others are only found in proteins with a certain function.


Dna binding domain zinc finger
DNA Binding domain Zinc-Finger


Varieties of protein domains

Extending along the length of a protein

Occupying a subset of a protein sequence

Occurring one or more times

Page 228


Example of a protein with 2 domains:

Methyl CpG binding protein 2 (MeCP2)

MBD

TRD

The protein includes a Methylated DNA Binding Domain

(MBD) and a Transcriptional Repression Domain (TRD).

MeCP2 is a transcriptional repressor.


Result of an MeCP2 blastp search:

A methyl-binding domain shared by several proteins



Pfam

  • > Database that contains a large collection of multiple sequence alignments of protein domains

  • Based on

  • Profile hidden Markov Models (HMMs).


Profile hmm hidden markov model
Profile HMM (Hidden Markov Model)

HMM is a probabilistic model of the MSA consisting

of a number of interconnected states

D19

D16

D17

D18

100%

delete

100%

16 17 18 19

50%

M16

M17

M18

M19

D R T R

D R T S

S - - S

S P T R

D R T R

D P T S

D - - S

D - - S

D - - S

D - - R

100%

100%

50%

Match

D 0.8

S 0.2

P 0.4

R 0.6

R 0.4

S 0.6

T 1.0

I16

I17

I18

I19

insert

X

X

X

X


Pfam

> Database that contains a large collection of multiple sequence alignments of protein domains

Based on

Profile Hidden Markov Models (HMMs).

  • > The Pfam database is based on two distinct classes of alignments

    • Seed alignments which are deemed to be accurate and used to produce Pfam A

    • -Alignments derived by automatic clustering of SwissProt, which are less reliable and give rise to Pfam B



Dna binding domains have relatively high frequency of basic positive amino acids
DNA binding domains have relatively high frequency of basic (positive) amino acids

MKD P A A LKRARN T E A A

RRS SRARKL QRM

GCN4

zif268

M E R P Y A C P V E S C D RR F

S R S D E L T RH I R I H T

S K V N E A F E T L KR C T S S N

P N Q R L P K V E I L R N A I R

myoD



Physical properties of proteins (positive) amino acids

Many websites are available for the analysis of

individual proteins for example:

EXPASY (ExPASy)

UCSC Proteome Browser

ProtoNet HUJI

The accuracy of the analysis programs are variable.

Predictions based on primary amino acid sequence

(such as molecular weight prediction) are likely to be

more trustworthy. For many other properties (such as

Phosphorylation sites), experimental evidence may be

required rather than prediction algorithms.

Page 236


Knowledge based approach
Knowledge Based Approach (positive) amino acids

  • IDEA

    Find the common properties of a protein family (or any group of proteins of interest)

    which are unique to the group and different from all the other proteins.

    Generate a model for the group and predict new members of the family which have similar properties.


Knowledge based approach1
Knowledge Based Approach (positive) amino acids

Basic Steps

1. Building a Model

  • Generate a dataset of proteins with a common function (DNA binding protein)

  • Generate a control dataset

  • Calculate the different properties which are characteristic of the protein family you are interested for all the proteins in the data (DNA binding proteins and the non-DNA binding proteins

  • Represent each protein in a set by a vector of calculated features and build a statistical model to split the groups


Basic Steps (positive) amino acids

2. Predicting the function of a new protein

  • Calculate the properties for a new protein

    And represent them in a vector

  • Predict whether the tested protein belongs to the family


Test case
TEST CASE (positive) amino acids

Y14 – A protein sequence translated from an ORF

(Open Reading Frame)

Obtained from the Drosophila complete Genome

>Y14

PQRSVGWILFVTSIHEEAQEDEIQEKFCDYGEIKNIHLNLDRRTGFSKGYALVEYETHKQALAAKEALNGAEIMGQTIQVDWCFVKG G


>Y14 (positive) amino acids

PQRSVGWILFVTSIHEEAQEDEIQEKFCDYGEIKNIHLNLDRRTGFSKGYALVEYETHKQALAAKEALNGAEIMGQTIQVDWCFVKG G

Y14 DOES NOT BIND RNA


Database and tools for protein families and domains
Database and Tools for protein families and domains (positive) amino acids

  • InterPro - Integrated Resources of Proteins Domains and Functional Sites

  • Prosite – A dadabase of protein families and domain

  • BLOCKS - BLOCKS db

  • Pfam - Protein families db (HMM derived)

  • PRINTS - Protein Motif fingerprint db

  • ProDom - Protein domain db (Automatically generated)

  • PROTOMAP - An automatic hierarchical classification of Swiss-Prot proteins

  • SBASE - SBASE domain db

  • SMART - Simple Modular Architecture Research Tool

  • TIGRFAMs - TIGR protein families db


ad