Predicting Protein Function
Protein, RNA, DNA
- Biochemical function (molecular function): What does it do? Kinase? Ligase? (Page 245)
- Function based on ligand binding specificity: What (or who) does it bind? (Page 245)
- Function based on biological process: What is it good for? Amino acid metabolism?
- Where is it active? Nucleolus? Cytoplasm?
- Where is the RNA/protein expressed? Where is it underexpressed?
An ontology is a description of the concepts and relationships that can
exist for an agent or a community of agents.
Orthologs - Proteins from different species that evolved by speciation.
Example: human hemoglobin vs. mouse hemoglobin
Paralogs - Proteins encoded within a given species that arose from one or
more gene duplication events.
Example: human hemoglobin vs. human myoglobin
> Each COG consists of individual orthologous proteins or orthologous sets of paralogs.
> Orthologs typically have the same function, allowing transfer of functional information from one member to an entire COG.
Reference: Classification of conserved genes according to their homologous relationships. (Koonin et al., NAR)
Protein motifs can be represented as a consensus or a profile
ecblc MRLLPLVAAA TAAFLVVACS SPTPPRGVTV VNNFDAKRYL GTWYEIARFD
vc MRAIFLILCS V...LLNGCL G..MPESVKP VSDFELNNYL GKWYEVARLD
hsrbp ~~~MKWVWAL LLLAAWAAAE RDCRVSSFRV KENFDKARFS GTWYAMAKKD
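To make the consensus/profile distinction concrete, here is a minimal sketch that builds both from the last 20 columns of the three aligned sequences above. The function names, the 0.5 majority threshold and the use of 'x' for positions without a majority residue are assumptions made for illustration, not part of the original slides.

```python
from collections import Counter

# Last 20 columns of the three aligned sequences shown above
# (this stretch contains no gap characters).
alignment = {
    "ecblc": "VNNFDAKRYLGTWYEIARFD",
    "vc":    "VSDFELNNYLGKWYEVARLD",
    "hsrbp": "KENFDKARFSGTWYAMAKKD",
}

def build_profile(seqs):
    """Return one dict per alignment column: residue -> relative frequency."""
    profile = []
    for column in zip(*seqs):
        counts = Counter(column)
        profile.append({aa: n / len(column) for aa, n in counts.items()})
    return profile

def consensus(profile, threshold=0.5):
    """Most frequent residue per column, or 'x' if none exceeds the threshold.

    The threshold and the 'x' placeholder are illustrative assumptions.
    """
    out = []
    for freqs in profile:
        aa, freq = max(freqs.items(), key=lambda item: item[1])
        out.append(aa if freq > threshold else "x")
    return "".join(out)

profile = build_profile(list(alignment.values()))
print(consensus(profile))   # one consensus character per column
print(profile[11])          # the profile keeps the T/K variation at this column
```

The consensus collapses each column to a single character, while the profile retains how often each residue occurs, which is why profiles are the more sensitive representation.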
- PROSITE: a database of protein patterns that can be searched by either regular expression patterns or sequence profiles.
- PHI-BLAST: searches for a specific protein sequence pattern together with the local alignments surrounding the match.
- MEME: searches for common motifs in unaligned sequences.
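Pattern searches of this kind reduce to regular-expression matching. The sketch below converts a PROSITE-style pattern into a Python regular expression and scans a sequence with it. It handles only the basic syntax elements (x, [..], {..}, (n), (n,m), and the < and > anchors). The pattern G-x-W-[FY] is a made-up example for illustration, not a quoted database entry, and the function name is hypothetical.

```python
import re

def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression.

    Simplified sketch: covers x, [ABC], {ABC}, repeats (n) / (n,m),
    and the '<' / '>' terminus anchors only.
    """
    regex = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):      # pattern anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # pattern anchored at the C-terminus
            suffix, element = "$", element[:-1]
        core, low, high = re.fullmatch(
            r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element
        ).groups()
        if core == "x":                  # any residue
            core = "."
        elif core.startswith("{"):       # any residue except those listed
            core = "[^" + core[1:-1] + "]"
        repeat = ""
        if low:
            repeat = "{" + low + ("," + high if high else "") + "}"
        regex.append(prefix + core + repeat + suffix)
    return "".join(regex)

# Hypothetical pattern, applied to a stretch of the ecblc sequence.
regex = prosite_to_regex("G-x-W-[FY]")
match = re.search(regex, "VNNFDAKRYLGTWYEIARFD")
print(regex, match.start(), match.group())
```

A regular expression either matches or it does not, whereas a profile search scores partial similarity. This is the practical difference between the two search modes.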
Extending along the length of a protein
Occupying a subset of a protein sequence
Occurring one or more times
Methyl CpG binding protein 2 (MeCP2)
The protein includes a Methylated DNA Binding Domain
(MBD) and a Transcriptional Repression Domain (TRD).
MeCP2 is a transcriptional repressor.
A methyl-binding domain shared by several proteins
A profile HMM is a probabilistic model of a multiple sequence alignment (MSA),
consisting of a number of interconnected states.
16 17 18 19
D R T R
D R T S
S - - S
S P T R
D R T R
D P T S
D - - S
D - - S
D - - S
D - - R
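The columns above show where the numbers in a profile HMM come from. As a rough sketch, the emission probabilities of each match state can be estimated from the residue counts in its column, and the fraction of gaps indicates how often that position is skipped (deleted). This uses plain counts only. Real profile HMM software also applies pseudocounts and sequence weighting, which are omitted here, and the function name is an assumption.

```python
from collections import Counter

# Alignment columns 16-19 from the table above; "-" marks a gap.
rows = ["DRTR", "DRTS", "S--S", "SPTR", "DRTR",
        "DPTS", "D--S", "D--S", "D--S", "D--R"]

def column_statistics(rows):
    """For each column return (emission frequencies, fraction of gaps).

    Emission frequencies are computed over non-gap residues only
    (simple maximum-likelihood counts, no pseudocounts).
    """
    stats = []
    for column in zip(*rows):
        residues = [aa for aa in column if aa != "-"]
        counts = Counter(residues)
        emissions = {aa: n / len(residues) for aa, n in counts.items()}
        gap_fraction = 1 - len(residues) / len(column)
        stats.append((emissions, gap_fraction))
    return stats

stats = column_statistics(rows)
for position, (emissions, gap_fraction) in zip(range(16, 20), stats):
    print(position, emissions, f"gaps={gap_fraction:.1f}")
```

Positions 17 and 18 are gapped in half of the sequences, so a model built from this alignment would give a substantial probability to skipping them.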
> Database that contains a large collection of multiple sequence alignments of protein domains
and their profile Hidden Markov Models (HMMs).
MKDPAALKRARNTEAA
RRSSRARKLQRM
MERPYACPVESCDRRF
SRSDELTRHIRIHT
SKVNEAFETLKRCTSSN
PNQRLPKVEILRNAIR
Physical properties of proteins: basic (positive) amino acids
Many websites are available for the analysis of
individual proteins, for example:
UCSC Proteome Browser
The accuracy of the analysis programs is variable.
Predictions based on the primary amino acid sequence
(such as molecular weight prediction) are likely to be
more trustworthy. For many other properties (such as
phosphorylation sites), experimental evidence may be
required rather than prediction algorithms.
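Molecular weight is a good example of a property that follows directly from the primary sequence: it is the sum of the residue masses plus one water molecule. The sketch below uses average residue masses from standard tables (rounded values) and ignores post-translational modifications. The function name and the example peptide are arbitrary choices for illustration.

```python
# Average residue masses in daltons (amino acid minus one water), rounded.
RESIDUE_MASS = {
    "G": 57.0519,  "A": 71.0788,  "S": 87.0782,  "P": 97.1167,
    "V": 99.1326,  "T": 101.1051, "C": 103.1388, "L": 113.1594,
    "I": 113.1594, "N": 114.1038, "D": 115.0886, "Q": 128.1307,
    "K": 128.1741, "E": 129.1155, "M": 131.1926, "H": 137.1411,
    "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # added once for the free N- and C-termini

def molecular_weight(sequence):
    """Approximate average molecular weight of an unmodified peptide chain."""
    sequence = sequence.upper()
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER_MASS

# Arbitrary three-residue peptide used only as an example.
print(round(molecular_weight("GAS"), 2))
```

No comparably simple rule exists for properties such as phosphorylation sites, which depend on context beyond the sequence itself.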
Find the common properties of a protein family (or any group of proteins of interest)
which are unique to the group and different from all the other proteins.
Generate a model for the group and predict new members of the family which have similar properties.
Basic Steps
1. Building a model
2. Predicting the function of a new protein
Represent the properties of each protein in a vector
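The two steps can be sketched as follows: each sequence is turned into a small feature vector, the model is the average vector of each group, and a new protein is assigned to the group whose average is closest. The slides do not say which features or which classifier were used. The three composition features, the nearest-centroid rule, the group labels and the toy sequences below are all invented for illustration.

```python
from math import dist

POSITIVE = set("KR")          # basic residues (histidine left out by assumption)
NEGATIVE = set("DE")          # acidic residues
HYDROPHOBIC = set("AVLIMFW")  # illustrative hydrophobic set

def features(sequence):
    """Represent a protein as a vector of residue-class fractions."""
    sequence = sequence.upper()
    n = len(sequence)
    return [
        sum(aa in POSITIVE for aa in sequence) / n,
        sum(aa in NEGATIVE for aa in sequence) / n,
        sum(aa in HYDROPHOBIC for aa in sequence) / n,
    ]

def build_model(groups):
    """Step 1: average the feature vectors of each labeled group."""
    model = {}
    for label, sequences in groups.items():
        vectors = [features(s) for s in sequences]
        model[label] = [sum(col) / len(vectors) for col in zip(*vectors)]
    return model

def predict(model, sequence):
    """Step 2: assign a new sequence to the nearest group centroid."""
    vector = features(sequence)
    return min(model, key=lambda label: dist(vector, model[label]))

# Toy training data: made-up sequences, not real proteins.
training = {
    "binder":     ["KRKRSAKRLK", "RRSKARKLQR"],
    "non-binder": ["DEELADEGLS", "AEDLVSDEGA"],
}
model = build_model(training)
print(predict(model, "KKRARNTRAA"))
```

A realistic model would use many more features and training examples, and would be checked against proteins of known function before being trusted on new ones.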
Y14: a protein sequence translated from an ORF
(Open Reading Frame)
obtained from the complete Drosophila genome
>Y14
Y14 DOES NOT BIND RNA