Multiple Mapping Method with Multiple Templates (M4T):

Multiple Mapping Method with Multiple Templates (M4T): optimizing sequence-to-structure alignments and combining unique information from multiple templates. András Fiser, Department of Biochemistry and Seaver Center for Bioinformatics, Albert Einstein College of Medicine, Bronx, New York, USA.


Multiple Mapping Method with Multiple Templates (M4T): optimizing sequence-to-structure alignments and combining unique information from multiple templates

András Fiser

Department of Biochemistry and

Seaver Center for Bioinformatics

Albert Einstein College of Medicine

Bronx, New York, USA


Comparative protein structure modeling

Flowchart (START to END), with the technique discussed for each step in parentheses:

  • Template search (multiple templates)

  • Target-template alignment (Multiple Mapping Method)

  • Model building (loop and side chain modeling)

  • Model evaluation (statistical potential)


Why do we need sequence alignments?

  • Sequence vs. sequence: establishing residue equivalences between two proteins to locate conserved/variable regions

  • Sequence vs. databases: querying sequence databases

  • Sequence vs. structure: generating the input alignment for comparative modeling / threading


Ranking of models built on alternative alignments

Example:

Template: 1a6m; Target: 1spg, chain B; ~21% sequence identity

Target rows: CLW = ClustalW alignment, A2D = Align2D alignment.

Template    VLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKTEAEMKASEDLKKHGVTVLTALGAILKKK
Target CLW  DWTDAERAAIKALWGKIDVGEIGP--QALSRLLIVYPWTQRHFKGFGNISTNAAILGNAKVAEHGKTVMGGLDRAVQNM
Target A2D  DWTDAERAAIKALWGKI--DVGEIGPQALSRLLIVYPWTQRHFKGFGNISTNAAILGNAKVAEHGKTVMGGLDRAVQNM

Template    GHHEAELKPLAQSHATKHKIPIKYLEFISEAIIHVLHSRH-PGDFGADAQGAMNKALELFRKDIAAKYKELGY
Target CLW  DNIKNVYKQLSIKHSEKIHVDPDNFRLLGEIITMCVGAKFGPSAFTPEIHEAWQKFLAVVVSALGRQYH----
Target A2D  DNIKNVYKQLSIKHSEKIHVDPDNFRLLGEIITMCVGAKF-G---PSAFTPEIHEAWQKFLAVVVSALGRQYH

Problem: None of the currently available methods produces consistently superior results in all cases.
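The sequence identity quoted for the example can be computed directly from a pairwise alignment. The sketch below is a minimal illustration, not the procedure used in the presentation: the function name `percent_identity` is hypothetical, and the choice of denominator (columns where both sequences have a residue) is an assumption, since other conventions such as the shorter sequence length are also in use.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where both rows contain a residue.

    Assumption: gap columns are excluded from the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned rows must have equal length")
    # Keep only columns in which neither row has a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)
```

Because alternative alignments of the same pair place gaps differently, they can report different identities for the same two proteins.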


Alternative solutions vs. sequence similarity

Instead of relying on just one alignment method, one should combine results of several alternative techniques


Multiple Mapping Method

  • Idea:

    • Improve the accuracy of sequence-to-structure alignment by optimally splicing alternative inputs.

  • Three components:

    - Sampling

    - Algorithm

    - Scoring function


MMM scoring function: increasing the dimensionality of input information

Template    VLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKTEAEMKASEDLKKHGVTVLTALGAIL
Target CLW  DWTDAERAAIKALWGKIDVGEIGP--QALSRLLIVYPWTQRHFKGFGNISTNAAILGNAKVAEHGKTVMGGLDRAV

Template    KKKGHHEAELKPLAQSHATKHKIPIKYLEFISEAIIHVLHSRH-PGDFGADAQGAMNKALELFRKDIAAKYKELGY
Target CLW  QNMDNIKNVYKQLSIKHSEKIHVDPDNFRLLGEIITMCVGAKFGPSAFTPEIHEAWQKFLAVVVSALGRQYH----

Template    VLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKTEAEMKASEDLKKHGVTVLTALGAIL
Target A2D  DWTDAERAAIKALWGKI--DVGEIGPQALSRLLIVYPWTQRHFKGFGNISTNAAILGNAKVAEHGKTVMGGLDRAV

Template    KKKGHHEAELKPLAQSHATKHKIPIKYLEFISEAIIHVLHSRH-PGDFGADAQGAMNKALELFRKDIAAKYKELGY
Target A2D  QNMDNIKNVYKQLSIKHSEKIHVDPDNFRLLGEIITMCVGAKF-G---PSAFTPEIHEAWQKFLAVVVSALGRQYH

Each alternative mapping places a residue in a different structural environment of the template.

The “fitness” of each mapping is then assessed.


Multiple Mapping Method: Algorithm

Step 1: Identify variable regions from the consensus alignment of the input set

Step 2: Select the best-scoring variable segments and combine them with the core region of the alignment.

Example:

Template: 1a6m; Target: 1spg, chain B; 21% sequence identity

Template    VLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKTEAEMKASEDLKKHGVTVLTALGAILKKK
Target CLW  DWTDAERAAIKALWGKIDVGEIGP--QALSRLLIVYPWTQRHFKGFGNISTNAAILGNAKVAEHGKTVMGGLDRAVQNM
Target A2D  DWTDAERAAIKALWGKI--DVGEIGPQALSRLLIVYPWTQRHFKGFGNISTNAAILGNAKVAEHGKTVMGGLDRAVQNM

Template    GHHEAELKPLAQSHATKHKIPIKYLEFISEAIIHVLHSRH-PGDFGADAQGAMNKALELFRKDIAAKYKELGY
Target CLW  DNIKNVYKQLSIKHSEKIHVDPDNFRLLGEIITMCVGAKFGPSAFTPEIHEAWQKFLAVVVSALGRQYH----
Target A2D  DNIKNVYKQLSIKHSEKIHVDPDNFRLLGEIITMCVGAKF-G---PSAFTPEIHEAWQKFLAVVVSALGRQYH
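The two steps of the algorithm can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the published MMM implementation: alignments are represented as target-to-template residue mappings, variable regions are taken to be runs of target positions where the input mappings disagree, and each region is scored independently. All names (`to_mapping`, `variable_regions`, `splice`, `score_segment`) are hypothetical, and `score_segment` is a placeholder for whatever scoring function is used.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Target residue i -> template residue index, or None if aligned to a gap.
Mapping = List[Optional[int]]
ScoreFn = Callable[[Mapping, int, int], float]


def to_mapping(template_row: str, target_row: str, gap: str = "-") -> Mapping:
    """Convert two gapped alignment rows into a target->template mapping."""
    if len(template_row) != len(target_row):
        raise ValueError("alignment rows must have equal length")
    mapping: Mapping = []
    template_index = 0
    for t_char, q_char in zip(template_row, target_row):
        if q_char != gap:
            mapping.append(template_index if t_char != gap else None)
        if t_char != gap:
            template_index += 1
    return mapping


def variable_regions(mappings: Sequence[Mapping]) -> List[Tuple[int, int]]:
    """Step 1: half-open (start, end) runs where the input mappings disagree."""
    length = len(mappings[0])
    regions: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i in range(length):
        disagree = len({m[i] for m in mappings}) > 1
        if disagree and start is None:
            start = i
        elif not disagree and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, length))
    return regions


def splice(mappings: Sequence[Mapping], score_segment: ScoreFn) -> Mapping:
    """Step 2: keep the agreed core, take the best-scoring input per region."""
    result = list(mappings[0])  # core positions are identical in all inputs
    for start, end in variable_regions(mappings):
        best = max(mappings, key=lambda m: score_segment(m, start, end))
        result[start:end] = best[start:end]
    return result
```

Because every input agrees on the core flanking a variable region, a segment taken from any one input remains consistent with its neighbors after splicing.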


MMM example using ideal scoring function

Figure labels for the two variable segments:

  • Segment 1: CLUSTALW 4.6 Å, ALIGN2D 1.1 Å

  • Segment 2: CLUSTALW 2.6 Å, ALIGN2D 6.1 Å

Models compared with the experimental structure:

  • ClustalW, RMSD 2.0 Å

  • Align2D, RMSD 2.7 Å

  • MMM, RMSD 1.8 Å

Template    VLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKTEAEMKASEDLKKHGVTVLTALGAILKKK
Target MMM  DWTDAERAAIKALWGKI--DVGEIGPQALSRLLIVYPWTQRHFKGFGNISTNAAILGNAKVAEHGKTVMGGLDRAVQNM

Template    GHHEAELKPLAQSHATKHKIPIKYLEFISEAIIHVLHSRH-PGDFGADAQGAMNKALELFRKDIAAKYKELGY
Target MMM  DNIKNVYKQLSIKHSEKIHVDPDNFRLLGEIITMCVGAKFGPSAFTPEIHEAWQKFLAVVVSALGRQYH----
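The RMSD values on this slide measure how far model coordinates lie from the experimental structure. A minimal sketch of the calculation follows, assuming the two coordinate sets are already superposed and list equivalent atoms in the same order. The slide does not say which atoms were compared or how the structures were superposed, so both are assumptions here, and the function name `rmsd` is illustrative.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(model: Sequence[Point], reference: Sequence[Point]) -> float:
    """Root-mean-square deviation between two pre-superposed coordinate sets."""
    if len(model) != len(reference) or not model:
        raise ValueError("coordinate sets must be non-empty and equal in length")
    squared = sum(
        (xm - xr) ** 2 + (ym - yr) ** 2 + (zm - zr) ** 2
        for (xm, ym, zm), (xr, yr, zr) in zip(model, reference)
    )
    return math.sqrt(squared / len(model))
```

The result is in the same unit as the input coordinates (Å for PDB files).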


Multiple Mapping Method: scoring function (1)

A composite scoring function to assess the compatibility/fit of alternative variable segments in the template structural environment.

  • The composite scoring function consists of three mostly non-overlapping components.

    • Environment-specific substitution matrices (FUGUE [1]).

    • A scoring scheme based on a comparison (PHD vs. DSSP) of the secondary structure types (H3P2 [2]).

    • Statistically derived residue-residue contact energy (Rykunov and Fiser [3]).

[1] Shi et al., J. Mol. Biol. (2001) 310, 243-257

[2] Rice et al., J. Mol. Biol. (1997) 267, 1026-1038

[3] Rykunov & Fiser, Proteins (2007) 67, 559-568


MMM performance on 1400 pairs


MMM performance on 87 pairs, meta-servers

[Figure labels: ESyPred3D, Consensus]


Sampling vs. Scoring


Summary

  • Multiple Mapping Method optimally combines alternative alignments obtained from different methods or scoring functions:

    On a benchmark dataset of 6635 protein pair structural alignments, comparative models built using MMM alignments are on average approximately 0.3 Å more accurate over the whole sequence identity spectrum, and 0.5 Å more accurate in the <30% target-template sequence identity region, than the average accuracy of models built using the alternative input alignments (~3 Å and ~4 Å, respectively).


Optimally combining multiple templates


Selecting multiple templates

  • Templates are searched for with the target sequence using PSI-BLAST.

  • Hits are selected if their sequence overlap with the target is more than 60% of the actual SCOP domain length, or more than 75% of the PDB chain length when no SCOP classification is available.

  • Iterative clustering procedure identifies the most suitable templates to combine. Templates are selected or discarded according to a hierarchical selection procedure that accounts for

    • sequence identity between templates and target sequence,

    • sequence identity among templates,

    • crystal resolution of the templates,

    • contribution of templates to the target sequence (i.e. if a region is covered by several templates or by a single template only).
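The coverage criterion for accepting PSI-BLAST hits can be written as a small filter. This is a sketch of that one rule only, not of the hierarchical selection that follows it. The `Hit` record and its field names are hypothetical, and the strict "greater than" comparison is an assumption based on the wording of the slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hit:
    template_id: str                          # hypothetical identifier
    overlap_length: int                       # residues overlapping the target
    chain_length: int                         # length of the PDB chain
    scop_domain_length: Optional[int] = None  # None if no SCOP classification


def passes_coverage(hit: Hit,
                    scop_cutoff: float = 0.60,
                    chain_cutoff: float = 0.75) -> bool:
    """Accept a hit if it covers enough of the SCOP domain, or of the
    PDB chain when the hit has no SCOP classification."""
    if hit.scop_domain_length is not None:
        return hit.overlap_length > scop_cutoff * hit.scop_domain_length
    return hit.overlap_length > chain_cutoff * hit.chain_length
```

For example, a hit overlapping 70 residues of a 100-residue SCOP domain is kept, while the same overlap on an unclassified 100-residue chain falls below the 75% cutoff and is discarded.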


Single versus multiple templates

Using a dataset of 765 proteins with known structure, two sets of models were built: (1) using one template (best E-value hit; light bars), and (2) using multiple templates (grey bars).


And…increased coverage

Histogram of the difference in model length. Each query sequence is modeled using a single template and using multiple templates. The histogram shows the frequency of (Lm–Ls), where Lm is the length of the model built using multiple templates and Ls is the length of the model built using a single template.
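The quantity plotted in this histogram is simple to tabulate. The sketch below assumes the model lengths are available as (Lm, Ls) pairs, one per query. The function name and the optional binning are illustrative choices, not details taken from the presentation.

```python
from collections import Counter
from typing import Iterable, Tuple


def length_difference_histogram(pairs: Iterable[Tuple[int, int]],
                                bin_width: int = 1) -> Counter:
    """Count queries per (Lm - Ls) bin; each bin is labeled by its lower edge."""
    if bin_width < 1:
        raise ValueError("bin_width must be a positive integer")
    # Floor division keeps negative differences in the correct bin.
    return Counter(((lm - ls) // bin_width) * bin_width for lm, ls in pairs)
```

Positive bins correspond to queries for which the multiple-template model is longer, that is, where coverage increased.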


The x-ray structure, the model built with multiple templates, and the model built with a single template are shown in grey, red, and blue, respectively. The multiple-template model agrees much better with the x-ray structure in two exposed regions, A and B, than the model built using a single template.


Increased coverage

The x-ray structure, the model built with multiple templates, and the model built with a single template are shown in grey, red, and blue, respectively. Adding extra templates yields a longer model that includes an extra beta-turn-beta-turn region (20 amino acids), depicted in ribbon.


Acknowledgement

  • Lab members:

    • Dmitrij Rykunov

    • Rotem Rubinstein

    • J. Eduardo Fajardo

    • Carlos J. Madrid-Aliste

    • Veena Venkatagiriyappa

    • Joseph Dybas

    • Mario Pujato

    • Brajesh Rai

    • Narcis Fernandez-Fuentes

    • Elliot Sternberger


http://www.fiserlab.org/servers

