Identification of protein-protein binding motifs
Felipe Leal Valentim (felipe.lealvalentim@wur.nl), Aalt-Jan van Dijk (aaltjan.vandijk@wur.nl)
Plant Research International, Applied Bioinformatics
Protein-protein binding interfaces
Figure labels: interface, surface, core (structural residues), ligand binding site, DNA-binding site
• Properties:
• Exposed on the protein surface;
• Functionally/structurally important residues are more highly conserved.
Changing the specificity of the protein interaction [van Dijk AD et al., PLoS Comput Biol. 2010] - Sequence Motifs in MADS Transcription Factors Responsible for Specificity and Diversification of Protein-Protein Interaction
Protein-protein binding motifs
Figure label: interface
Protein-protein binding motifs
• Protein binding interfaces are composed of residues that are highly conserved and exposed on the surface;
• The interface can be represented by short sequence motifs, which are thought to be overrepresented in pairs of interacting proteins.
Identification of binding interfaces from structures
Figure labels: Protein 1, Protein 2, Complex 1-2; binding interface of Protein 1, binding interface of Protein 2; Arabidopsis Histidine Kinase 4; trans-zeatin
[Hubbard SJ, Thornton JM] Naccess V2.1.1 - Atomic Solvent Accessible Area Calculations
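A common way to derive interface residues from accessibility calculations is to compare each residue's solvent-accessible area in the free protein and in the complex. The sketch below assumes the per-residue areas have already been computed (for example with Naccess) and loaded into dictionaries; the function name, the data layout and the 1.0 Å² cutoff are illustrative assumptions, not values taken from the slides.

```python
from typing import Dict, Set, Tuple

Residue = Tuple[str, int]  # (chain id, residue number); hypothetical layout


def interface_residues(asa_free: Dict[Residue, float],
                       asa_complex: Dict[Residue, float],
                       min_loss: float = 1.0) -> Set[Residue]:
    """Return residues that lose more than min_loss (in square angstroms)
    of accessible area when the complex forms. The cutoff is an assumption."""
    interface = set()
    for residue, free_area in asa_free.items():
        bound_area = asa_complex.get(residue)
        if bound_area is None:
            continue  # residue missing from the complex calculation; skip it
        if free_area - bound_area > min_loss:
            interface.add(residue)
    return interface


# Toy values, not real Naccess output.
asa_free = {("A", 10): 85.0, ("A", 11): 40.2, ("A", 12): 0.0}
asa_complex = {("A", 10): 12.5, ("A", 11): 40.2, ("A", 12): 0.0}
print(interface_residues(asa_free, asa_complex))
```

Only residue 10 of chain A becomes buried in this toy input, so it is the only one reported as part of the interface.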
Sequence- and interactome-based pipeline to locate binding sites in Arabidopsis proteins
• Sequences -> evolutionary conservation;
• Sequences -> residue surface accessibility;
• Interactome -> overrepresented motifs.
The pipeline looks for motifs that are likely to be exposed on the surface, conserved across species, and overrepresented in pairs of interacting proteins.
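The overrepresentation criterion can be illustrated with a simple enrichment ratio: how often a motif pair occurs in interacting protein pairs compared with all possible pairs. This is a minimal sketch under stated assumptions, not the statistic used in the pipeline; the function names, the regular-expression motif representation and the toy sequences are all hypothetical.

```python
import re
from itertools import combinations


def pair_has_motifs(seq_a, seq_b, motif_a, motif_b):
    """True if one protein carries motif_a and the other motif_b (either order)."""
    forward = re.search(motif_a, seq_a) and re.search(motif_b, seq_b)
    reverse = re.search(motif_a, seq_b) and re.search(motif_b, seq_a)
    return bool(forward or reverse)


def enrichment(seqs, interactions, motif_a, motif_b):
    """Frequency of the motif pair among interacting pairs divided by its
    frequency among all protein pairs (a naive background model)."""
    def frequency(pairs):
        pairs = list(pairs)
        if not pairs:
            return 0.0
        hits = sum(pair_has_motifs(seqs[a], seqs[b], motif_a, motif_b)
                   for a, b in pairs)
        return hits / len(pairs)

    all_pairs = combinations(sorted(seqs), 2)
    interacting = {tuple(sorted(pair)) for pair in interactions}
    background = frequency(all_pairs)
    if background == 0:
        return float("nan")  # motif pair never occurs; ratio undefined
    return frequency(interacting) / background


# Toy sequences and interactions, invented for illustration only.
seqs = {"P1": "MKLVDLAA", "P2": "GGRWKEE", "P3": "AAAAAAA", "P4": "TTLVDLTT"}
interactions = [("P1", "P2"), ("P4", "P2")]
print(enrichment(seqs, interactions, "LVDL", "RWK"))
```

A ratio well above 1 suggests the motif pair is more frequent among interacting proteins than expected from the background; a real analysis would add a significance test and the conservation and accessibility filters listed above.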
Sequence- and interactome-based pipeline to locate binding sites in Arabidopsis proteins
Figure labels: IAA16, IAA11, IAA7, IAA2, IAA1, SHY2, TPL, IAA18
Assessment of the pipeline's performance
• Predicted motifs that are interface motifs: True Positives (TP)
• Predicted motifs that are non-interface motifs: False Positives (FP)
• Precision = TP/(TP + FP)
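The precision formula can be written out as a short function. This sketch assumes predicted motifs and known interface motifs are available as collections of identifiers; the function name and the identifiers are hypothetical.

```python
def precision(predicted, interface):
    """Precision = TP / (TP + FP) for a set of predicted motifs."""
    predicted, interface = set(predicted), set(interface)
    true_positives = len(predicted & interface)   # predicted and in an interface
    false_positives = len(predicted - interface)  # predicted but not in an interface
    if true_positives + false_positives == 0:
        raise ValueError("no predicted motifs; precision is undefined")
    return true_positives / (true_positives + false_positives)


# Toy identifiers, invented for illustration only.
predicted_motifs = {"m1", "m2", "m3", "m4"}
interface_motifs = {"m1", "m3", "m7"}
print(precision(predicted_motifs, interface_motifs))  # 0.5
```

Two of the four predicted motifs fall in a known interface, so the precision is 0.5.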
Assessment of the pipeline's performance
• Coverage: up to 42%, 22% and 42% for the human, yeast and Arabidopsis subsets, respectively.
• Precision: up to 58%, 96% and 100%.
Locating interaction binding sites in Arabidopsis sequences at a large scale – Overview
• Predicted motifs: 1498 interactions among 985 proteins (36% of the proteins in the interactome and ~5.5% of all Arabidopsis proteins)
• Validation and bioinformatics analysis
Comparison with single nucleotide polymorphism (SNP) data
• nsSNPs (protein sequence): 2.2% > nsSNPs (predicted protein-protein binding sites): 1.6%
• Intermolecular coevolution
• Functional constraints
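A comparison of this kind can be sketched by computing the fraction of positions carrying an nsSNP over the whole sequence and within the predicted binding sites. The slides do not state how the percentages were calculated, so reading them as per-position fractions is an assumption, as are the function name, the 1-based inclusive site intervals and the toy numbers.

```python
def nssnp_fractions(seq_length, snp_positions, binding_sites):
    """Return (fraction of all positions with an nsSNP,
               fraction of binding-site positions with an nsSNP).
    binding_sites is a list of 1-based inclusive (start, end) intervals."""
    site_positions = set()
    for start, end in binding_sites:
        site_positions.update(range(start, end + 1))
    snps = {p for p in snp_positions if 1 <= p <= seq_length}
    overall = len(snps) / seq_length
    in_sites = len(snps & site_positions) / len(site_positions) if site_positions else 0.0
    return overall, in_sites


# Toy data, invented for illustration only.
overall, in_sites = nssnp_fractions(
    seq_length=100,
    snp_positions={5, 30, 77},
    binding_sites=[(20, 69)],
)
print(overall, in_sites)  # 0.03 0.02
```

In this toy case the binding-site fraction is lower than the sequence-wide fraction, the same direction as the 2.2% versus 1.6% comparison on the slide.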
Comparison with annotation of amino acid mutagenesis
• Proteins with a predicted motif: n=985
• Proteins with amino acid mutagenesis annotation (UniProt): n=38
• 16 cases: predicted motifs overlap the mutated amino acid
Figure labels: protein sequence, protein-protein binding sites, DNA binding sites, other functionally important sites
Master's Project Proposal: Cross-species analysis of protein-protein binding motifs