Biomolecular modeling building a 3d protein structure from its sequence
Download
1 / 96

Biomolecular Modeling: building a 3D protein structure from its sequence - PowerPoint PPT Presentation


Prof Shoba Ranganathan Dept. of Chemistry and Biomolecular Sciences, Macquarie University, Sydney, Australia & Dept of Biochemistry, Yong Loo Lin School of Medicine National University of Singapore (shoba@bic.nus.edu.sg). Biomolecular Modeling: building a 3D protein

Related searches for Biomolecular Modeling: building a 3D protein structure from its sequence

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha

Download Presentation

Biomolecular Modeling: building a 3D protein structure from its sequence

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Biomolecular modeling building a 3d protein structure from its sequence l.jpg

Prof Shoba RanganathanDept. of Chemistry and Biomolecular Sciences, Macquarie University, Sydney, Australia &Dept of Biochemistry, Yong Loo Lin School of MedicineNational University of Singapore(shoba@bic.nus.edu.sg)

Biomolecular Modeling:

building a 3D protein

structure from its sequence


Why protein structure l.jpg

Why protein structure?

  • In the factory of the living cell, proteins are the workers, performing a variety of tasks

  • Each protein adopts a particular folding pattern that determines its function

    • The 3D structure of a protein brings into close proximity residues that are far apart in the amino acid sequence


How does a protein fold l.jpg

How does a protein fold?

  • Most newly synthesized proteins fold without assistance!

  • Ribonuclease A: denatured protein could refold and recover its activity (C. Anfinsen -1966)

    • “Structure implies function”

      • The amino acid sequence encodes the protein’s structural information


Slide4 l.jpg

  • Understanding Protein Structure

  • A Quick Overview of Sequence Analysis

  • Finding a Structural Homologue

  • Template Selection

  • Aligning the Query Sequence to Template Structure(s)

  • Building the Model


The basics l.jpg

The basics

  • Proteins are linear heteropolymers: one or more polypeptide chains

    • Repeat units: 20 amino acid residues

      • Range from a few 10s-1000s

        • Three-dimensional shapes (“folds”) adopted vary enormously

          • Experimental methods: X-ray crystallography, electron microscopy and NMR (nuclear magnetic resonance)


The l amino acid l.jpg

The (L-)amino acid

N

C

O

O

R

Side chain = H,CH3,…

Backbone

Amino

C a

+

-

Carboxylate


The peptide bond l.jpg

The peptide bond


Coplanar atoms l.jpg

Coplanar atoms


Levels of protein structure l.jpg

Levels of protein structure

  • Zeroth: amino acid composition

  • Primary

    • This is simply the order of covalent linkages along the polypeptide chain, i.e. the sequence itself


Levels of protein structure10 l.jpg

Levels of protein structure

  • Secondary

    • Local organization of the protein backbone: a-helix, b-strand (which assemble into b-sheets), turn and interconnecting loop


Ramachandran phi psi plot l.jpg

Ramachandran / phi-psi plot

b-sheet

a-helix (left handed)

y

a-helix (right handed)

f


Levels of protein structure12 l.jpg

Levels of protein structure

  • Tertiary

    • packing of secondary structure elements into a compact spatial unit

    • “Fold” or domain – this is the level to which structure prediction is currently possible


Levels of protein structure13 l.jpg

Levels of protein structure

  • Quaternary

    • Assembly of homo- or heteromeric protein chains

    • Usually the functional unit of a protein, especially for enzymes


Structural classes l.jpg

Structural classes

All-a (helical)

All-b (sheet)


Slide15 l.jpg

Structural classes

a/b(parallelb-sheet)

a+b(antiparallelb-sheet)


Structural information l.jpg

Structural information

  • Protein Data Bank: maintained by the Research Collaboratory for Structural Bioinformatics

    • http://www.rcsb.org/pdb

    • > 45,744structures of proteins

    • Also contains structures of DNA, carbohydrates, protein-DNA complexes and numerous small ligand molecules.


The pdb data l.jpg

The PDB data

  • Text files

  • Each entry is identified by a unique 4-letter code: say 1emg

  • 1emg entry

    • Header information

    • Atomic coordinates in Å (1 Ångstrom = 1.0e-10 m)


Pdb header details l.jpg

PDB Header details

  • identifies the molecule, any modifications, date of release of PDB entry

  • organism, keywords, method

  • Authors, reference, resolution if X-ray structure

  • Sequence, x-reference to sequence databases

HEADER GREENFLUORESCENT PROTEIN 12-NOV-98 1EMG

TITLE GREEN FLUORESCENT PROTEIN (65-67 REPLACED BY CRO, S65T

TITLE 2 SUBSTITUTION, Q80R)

COMPND MOL_ID: 1;

COMPND 2 MOLECULE: GREEN FLUORESCENT PROTEIN;

COMPND 3 CHAIN: A;

COMPND 4 ENGINEERED: YES;

COMPND 5 MUTATION: 65 - 67 REPLACED BY CRO, S65T SUBSTITUTION, Q80R

COMPND 6 SUBSTITUTION;

COMPND 7 BIOLOGICAL_UNIT: MONOMER


The data itself l.jpg

The data itself

  • Coordinates for each heavy (non-hydrogen) atom from the first residue to the last

  • Any ligands (starting with HETATM) follow the biomacromolecule

  • O of water molecules (also HETATM) at the end

ATOM 1 N SER A 2 29.089 9.397 51.904 1.00 81.75

ATOM 2 CA SER A 2 27.883 10.162 52.185 1.00 79.71

ATOM 3 C SER A 2 26.659 9.634 51.463 1.00 82.64

ATOM 4 O SER A 2 26.718 8.686 50.686 1.00 81.02

ATOM 5 CB SER A 2 28.039 11.660 51.932 1.00 75.59

ATOM 6 OG SER A 2 27.582 12.038 50.639 1.00 43.28

-------

ATOM 1737 CD1 ILE A 229 39.535 21.584 52.346 1.00 41.62

TER 1738 ILE A 229


Structural families l.jpg

Structural Families

  • SCOP - Structural Classification Of Proteins

    • http://scop.mrc-lmb.cam.ac.uk/scop

  • FSSP – Family of Structurally Similar Proteins

    • http://www.ebi.ac.uk/dali/fssp/

  • CATH – Class, Architecture, Topology, Homology

    • http://www.biochem.ucl.ac.uk/bsm/cath


Structure comparison facts l.jpg

Structure comparison facts

  • Proteins adopt a limited number of topologies.

    • Homologous sequences show very similar structures, with strong conservation in secondary structural elements: variations in non-conserved regions.

      • In the absence of sequence homology, some folds are preferred by vastly different sequences.


Structure comparison facts22 l.jpg

Structure comparison facts

  • The “active site” (a collection of functionally critical residues) is remarkably conserved, even when the protein fold is different.

    • Structural models (especially those based on homology) provide insights into possible function for new proteins.

      • Implications for

        • protein engineering

        • ligand/drug design,

        • function assignment of genomic data.


Visualizing pdb information l.jpg

Visualizing PDB information

  • RASMOL: most popular, available for all platforms

    (Sayle et al, 2005)

    http://www.bernstein-plus-sons.com/software/rasmol

  • DeepView Swiss-PDBViewer: from Swiss-Prot

    (Guex & Peitsch, 1997)

    http://tw.expasy.org/spdbv/

  • Chemscape Chime Plug-in: for PC and Mac

    http://www.mdli.com/products/framework/chemscape

  • PyMOL: Very good, available for all platforms

    (DeLano, W.L. The PyMOL Molecular Graphics System, 2002)

    http://pymol.sourceforge.net


Rasmol views sh2 domain l.jpg

RASMOL views - SH2 domain

All-atom model

Space-filling model

Atom colors:NOCS


Rasmol views 1sha l.jpg

RASMOL views – 1sha

Ca Trace

Ribbon

Rainbow coloring:Nto C

Coloring:by structural units


Homologous folds l.jpg

Homologous folds

  • Hemoglobinand erythrocruorin: 31% sequence identity


Analogous folds l.jpg

Analogous folds

  • Hemoglobin and phycocyanin: 9% sequence identity


Surface properties l.jpg

Surface Properties

Cro repressor – DNA complex

  • Basic residues in blue

  • Acidic residues in red


Mapping functional regions l.jpg

Mapping Functional Regions

Immunoglobulin l light chain - dimer

  • Hydrophobhic residues in magenta

  • Hydrophilic and charged residues in cyan


Slide30 l.jpg

  • Understanding Protein Structure

  • A Quick Overview of Sequence Analysis

  • Finding a Structural Homologue

  • Template Selection

  • Aligning the Query Sequence to Template Structure(s)

  • Building the Model


Siblings and cousins l.jpg

Siblings and Cousins

  • Siblings or homologues: sequences with at least 30% sequence identity over an alignment length of at least 125 residues and conservation of function.

  • Cousins or paralogues: < 30% identity but with conservation of function

  • Both show structural conservation

  • Homologues located using a database search tool such as BLAST (free webserver): http://www.ncbi.nlm.nih.gov/BLAST

  • Paralogues require a more sensitive method such as PSI-BLAST


Multiple sequence alignment l.jpg

Multiple Sequence Alignment

Finding the best way to match the residues of

related sequences

  • Identical residues must be lined up

  • The rest should be arranged, based on

    • observed substitution in protein families

    • chemical similarity

    • charge similarity

  • Where it is impossible to get the residues to line up, the biological concept of insertion/deletion in invoked: the ‘gap’ in alignments


Msa methods l.jpg

MSA Methods

  • CLUSTALW / CLUSTALX (Thompson et al, 1997): freely available for all platforms and one of the best alignment programs

    http://www-igbmc.u-strasbg.fr/BioInfo/ClustalX/Top.html

  • MAXHOM (Sander & Schneider, 1991): alignment based on maximum homology; available via the PredictProtein webserver, free for academics

    http://cubic.bioc.columbia.edu/predictprotein/

  • MALIGN (Johnson et al, 1994): freely available UNIX program, based on the structural alignment of protein families

    http://www.abo.fi/fak/mnf/bkf/research/johnson/software.html


Alignment checks l.jpg

Alignment Checks

  • Conservation of functionally important residues: e.g. the catalytic triad (Asp-Ser-His) that are essential for serine proteinase activity

  • Line up of structurally important residues: e.g. cysteines forming disulfide bonds

  • Overall, maximizing the alignment of “like” residues

  • Completely conserved residues usually indicate some conserved structural or functional role, especially buried charges


Sequence motifs patterns l.jpg

Sequence Motifs & Patterns

  • From the analysis of the alignment of protein families

  • Conserved sequence features, usually associated with a specific function

  • PROSITE (Hulo et al, 2006) database for protein “signature” patterns: http://www.expasy.ch/prosite


Aligned sequence families l.jpg

Aligned Sequence Families

  • From alignments of homologous sequences:

    • PRINTS

    • PRODOM: http://www.toulouse.inra.fr/prodom.html

  • From Hidden Markov Model based methods:

    • PFAM: http://www.sanger.ac.uk/Pfam


Protein domains l.jpg

Protein Domains

  • Most proteins are composed of structural subunits called domains

  • A domain is a compact unit of protein structure, usually associated with a function.

  • It is usually a “fold” - in the case of monomeric soluble proteins.

  • A domain comprises normally only one protein chain: rare examples involving 2 chains are known.

  • Domains can be shared between different proteins: like a LEGO block


Protein architectures l.jpg

Protein Architectures

  • Beads-on-a-string: sequential location: tyrosine-protein kinase receptor TIE-1 (immunoglobulin, EGF, fibronectin type-3 and protein kinase).

  • Domain insertions: “plugged-in” - pyruvate kinase (1pyk)

  • SMART: smart.embl-heidelberg.de

    Simple Modular Architecture Retrieval Tool


Dissection into domains l.jpg

Dissection into Domains

  • A sequence, usually > 125 residues should be routinely checked to see how many domains are present.

  • Conserved Domain Architecture Retrieval Tool (CDART) uses information in Pfam and SMART to assign domains along a sequence

  • E.g. NP_002917 shows similarity to G-protein regulators:


Slide40 l.jpg

  • Understanding Protein Structure

  • A Quick Overview of Sequence Analysis

  • Finding a Structural Homologue

  • Template Selection

  • Aligning the Query Sequence to Template Structure(s)

  • Building the Model


Structural homologues l.jpg

Structural Homologues

  • BLASTP vs. PDB database or PSI-BLAST: look for 4-character PDB ID

    • E < 0.005

  • Domain coverage: at least 60% coverage is recommended

  • Gaps: we don’t want them. Choose between:

    • few gaps and reasonable similarity scores or

    • lots of gaps and high similarity scores?


Small proteins disulfide bonds l.jpg

Small Proteins: Disulfide bonds

  • BLAST-type methods may not locate homologues, if Conserved Domain search is not turned on.

  • Are the Cys residues conserved?

  • Gaps: where are they on the structure?

gnl|Pfam|pfam00095, wap, WAP-type (Whey Acidic Protein)

  • four-disulfide core'.

    CD-Length = 46 residues, 100.0% aligned

  • Score = 43.9 bits (102), Expect = 1e-06

  • Q:49 KAGFCPWNLLQMISSTGPCPMKIECSSDRECSGNMKCCNVDCVMTCTPP 97

  • D: 1 KPGVCPWVSISE---AGQCLELNPCQSDEECPGNKKCCPGSCGMSCLTP46


Metal binding domains l.jpg

Metal-binding domains

C2H2 Zinc Finger

  • 2 Cys & 2 His binding to Zinc

  • Not detected even by CD-search in BLAST

  • Detected by Pfam & SMART

  • Sequence Pattern:

    #-X-C-X(1-5)-C-X3-#-X5-#-X2-H-X(3-6)-[H/C]


Structure prediction methods l.jpg

Structure Prediction Methods

  • Secondary Structure Prediction: identify local structural elements such as helices, strands and loops.

  • > 75% accuracy achievable

    • PredictProtein or PHD

      http://cubic.bioc.columbia.edu/pp/

    • PSIPRED

      http://bioinf.cs.ucl.ac.uk/psipred/

    • SSPro

      http://promoter.ics.uci.edu/BRNN-PRED/


Folds from secondary structure predictions l.jpg

Folds from Secondary Structure Predictions

  • Assembling SSEs into folds is a combinatorial problem

  • Current methods depend on available structural data for mapping predictions:

    • FORREST http://abs.cit.nih.gov/foresst/foresst.html

    • TOPITS from the PHD server http://cubic.bioc.columbia.edu/pp


Tertiary structure prediction l.jpg

Tertiary Structure Prediction

  • Fold recognition/Threading: < 20% identity typically

  • Best results obtained by combining several database search and knowledge-based tools:

  • 3D-PSSM http://www.sbg.bio.ic.ac.uk/~3dpssm/

  • FUGUE http://www-cryst.bioc.cam.ac.uk/fugue/


Slide47 l.jpg

  • Understanding Protein Structure

  • A Quick Overview of Sequence Analysis

  • Finding a Structural Homologue

  • Template Selection

  • Aligning the Query Sequence to Template Structure(s)

  • Building the Model


One or many templates l.jpg

One or many templates?

  • Sequence similarity: extract template sequences and align with query: select the most similar structure

  • Completeness: Missing data?

REMARK 465 MISSING RESIDUES

REMARK 465 THE FOLLOWING RESIDUES WERE NOT LOCATED IN THE

REMARK 465 EXPERIMENT. (M=MODEL NUMBER; RES=RESIDUE NAME; C=CHAIN

REMARK 465 IDENTIFIER; SSSEQ=SEQUENCE NUMBER; I=INSERTION CODE.)

REMARK 465

REMARK 465 M RES C SSSEQI

REMARK 465 MET A 1

REMARK 465 THR A 230

REMARK 470 M RES CSSEQI ATOMS

REMARK 470 GLU A 5 OE2

REMARK 470 GLU A 6 CG CD OE1 OE2

REMARK 470 GLU A 17 OE1


One or many templates49 l.jpg

One or many templates?

  • X-ray or NMR?:

    • Lowest resolution X-ray structure

    • X-ray and then NMR

    • NMR average over assembly

  • One or many?:

    • Structure alignment of Ca atoms

    • If 2 templates are very close, keep only one

    • Keep templates that provide new information


Many templates l.jpg

Many templates

  • Sequence alignment from structure comparison of templates (SSA) can be different from a simple sequence alignment (SA).

  • For model building,

    • align templates structurally

    • extract the corresponding SSA


Slide51 l.jpg

  • Understanding Protein Structure

  • A Quick Overview of Sequence Analysis

  • Finding a Structural Homologue

  • Template Selection

  • Aligning the Query Sequence to the Template Structure(s)

  • Building the Model


Query template alignment l.jpg

Query - Template Alignment

  • >40% identity: any alignment method is OK

  • Below this, checks are essential.

    • Collect close sequence homologues (about 10) and align to query to get MSA (multiple sequence alignment)

    • Collect several structural templates (at least 5) and align them using structure comparison methods: extract the SSA (structural sequence alignment)

    • Align MSA to SSA using profile alignment

    • Extract query and selected template(s) from the final alignment – QTA.


Qta checks l.jpg

QTA Checks

  • Residue conservation checks

    • Functional regions

    • Patterns/motifs conserved?

  • Indels

    • Combine gaps separated by few residues

  • Editing the alignment

    • Move gaps from secondary structures to loops

    • Within loops, move gaps to loop ends, i.e. turnaround point of backbone


Qta checks54 l.jpg

QTA Checks

  • Residue conservation checks

    • Functional regions

    • Patterns/motifs conserved?

  • Indels

    • Combine gaps separated by few residues

  • Editing the alignment

    • Move gaps from secondary structures to loops

    • Within loops, move gaps to loop ends, i.e. turnaround point of backbone


Visual inspection of indels l.jpg

Visual Inspection of Indels

  • 2-residue deletion from sequence alignment

  • End-of-loop 2-residue deletion


Slide56 l.jpg

  • Understanding Protein Structure

  • A Quick Overview of Sequence Analysis

  • Finding a Structural Homologue

  • Template Selection

  • Aligning the Query Sequence to Template Structure(s)

  • Building the Model


Input for model building l.jpg

Input for Model Building

  • Query sequence

  • Template structure

    • Template sequence

  • Query-template sequence alignment


Methods available l.jpg

Methods Available

  • WHATIF (Vriend G, 1990) : "

    • High quality models where template is available

      • Indels not modelled

    • Side chain rotamers

    • In silico mutations

    • In silico disulfide bond creation


Methods available59 l.jpg

Methods Available

  • SWISS-MODEL (Schwede et al, 2003) : "

    • Automatic modeling mode with multiple templates

    • Query + template input

    • High Homology situations

    • DeepView for input file creation


Methods available60 l.jpg

Methods Available

  • MODELLER (Sali & Blundell, 1993) :

    • High quality models

    • Sequence alignment

    • Structure analysis/alignment

    • Multiple templates

    • Multiple chains

    • Ligand/cofactor present

  • ESyPred3D (uses MODELLER): "

    • QTAs from several methods & neural networks

    • http://www.fundp.ac.be/urbm/bioinfo/esypred/


Methods available61 l.jpg

Methods Available

  • ICM (Ruben et al, 1994) :

    • High quality models

    • Loop modelling

      • Multiple templates not possible

    • Sequence/Structure alignment/analysis

    • Ab initio peptide modeling

    • Secondary structure prediction

  • Geno3D (Combet et al, 2002) : "

    • Automated modelling

    • Distance geometry used for loops

    • http://geno


Methods available62 l.jpg

Methods Available

  • 3D-JIGSAW (Bates et al, 2001) : "

    • Automatic modeling mode

    • Interactive user mode to select templates

    • Multiple templates

    • Multidomain protein modeling


Methods available63 l.jpg

Methods Available

  • CPH-MODELS (Lund et al, 1997) : "

    • Fully automated

    • FASTA search for templates

    • Not validated


Automatic or manual mode l.jpg

Automatic or Manual Mode?

  • Automatic: High homology

  • Manual

    • Medium/Low homology

    • Template from structure prediction

    • Multiple templates

    • Multiple chains

    • Ligand present


How good is the model l.jpg

How good is the model?

Structural Quality Analysis

  • PROCHECK (Laskowski et al, 1993) :

  • WHATIF (Vriend G, 1990) : "

  • ERRAT (Colovos & Yeates, 1993) : "


Improving ill defined regions l.jpg

Improving ill-defined regions

  • Iterative model building

    • Rebuild or anneal bad regions

    • Check/edit alignment and rebuild

  • Molecular dynamics and/or Monte Carlo simulations

    • Compute intensive

    • Input files need to be set up

    • Optional


Molecular modeling protocol l.jpg

Molecular Modeling Protocol

  • Resources required

    • The query sequence  

    • Personal computer with internet connectivity

    • RASMOL/DeepView for PDB structure visualization

    • CLUSTALX sequence alignment software

    • Access to a UNIX workstation

    • MODELLER/ICM – UNIX software

    • WHATIF – UNIX/PC software

    • PROCHECK – UNIX or Windows software


Mm protocol input files l.jpg

MM Protocol – Input Files

Minimum requirement

  • Query sequence

  • Template structure

    • Template sequence

  • Query-Template alignment


Ex 1 high homology case l.jpg

Ex 1. High Homology Case

  • Human SOX9 WT - homologous to

    SRY (PDB: 1HRY) - 49% identity

    S9WT: ..AGAACAATGG.. highest

    SOXCORE:..GCAACAATCT.. least

  • Mutants (campomelic dysplasia):

    • F12L: No DNA binding

    • H65Y: Minimal binding

    • P70R: altered specificity; no SOXCORE

    • A19V: near WT but normal binding


1 sox9 models l.jpg

1. SOX9 Models

SOX9 & P70Rbased onSRY

  • WT & P70R models built

  • Ca overlay:

    WT-SRY ~ 0.72 Å

  • J. Biol. Chem. 274 (1999) 24023


Ex 1 sox9 dna models l.jpg

Ex 1. SOX9 + DNA Models

SOX9-WT

  • Observed disease-linked mutations mapped

  • Other residues in DNA-binding groove determined

SRY

SOX9-P70R


Ex 2 low homology situation l.jpg

Ex 2. Low Homology Situation

  • Pigments from reef-building corals: similar to Pocilloporin

  • fluoresce under UV and visible radiation

  • similar to the Green Fluorescent Protein - GFP (19.6% identity)

  • contain ‘QYG’ instead of ‘SYG’ in GFP, as proposed fluorophore


2 alignment of poc4 gfp l.jpg

2. Alignment of POC4 & GFP


2 poc4 model l.jpg

2. POC4 Model

  • Barrel ends open

  • C-ter not included

  • b-sheet OK

  • ‘QYG’ fits the site!

  • 26 residues within 5Å of QYG (only 19 in GFP)

  • Increased thermal stability

  • UV protection


Ex 3 small disulfide bonded protein complement factor h l.jpg

Ex 3. Small Disulfide-bonded Protein: Complement Factor H

C-ter

C4

C2

HV loop

C3

C1

N-ter

  • 20 tandem homologous units = SCRs (short consensus repeat) or “sushi” regions

  • Each SCR is ~ 60 aa:

    • conserved Y, P, G

    • 2 disulfide bridges:1-3 & 2-4

    • Linkers of 3-8 aa

  • Heparin binding SCRs: 7 (high affinity) & 20

  • Previous SCRs required for activity: minimum constructs are fH67 and fH18-20


3 sequence alignment of close functional homologues l.jpg

3. Sequence Alignment of Close Functional Homologues

Site A Site d Site B Site c


3 templates for fh scrs 6 7 l.jpg

3. Templates for fH SCRs 6-7

vcp4

hfH15

vcp4

vcp3

hfH16

hfH15

vcp3

hfH16

  • hfH SCRs 15&16 (fH1516; PDB ID: 1HFH)

  • Vaccinia virus complement control protein domains 3&4 (vcp34; PDB ID: 1VVC)

  • Orientations differ considerably

  • Vcp34 28% identical to hfH67 compared to hfH1516 (25%) !


3 query templates alignment l.jpg

3. Query-Templates alignment


3 hfh67 model l.jpg

3. hfH67 model

Sialic acid

Heparin

disaccharide repeat

hfH67

hfH1516


3 locating residues for mutation from model l.jpg

3. Locating residues for mutation from model

SCRs 6&7

SCRs 15&16

Pacific Symposium of Biocomputing 2000, 5:155


Ex 4 protein engineering l.jpg

Ex 4: Protein Engineering

  • Thermolysin-like protease unstable at high temperatures (> 40 ºC unlike trypsin)

  • Homology Model built

  • G8 & N60 suited for disulfide bond

  • Double Mutant functional at 92.5 ºC

J Mansfeld et al. Extreme Stabilization of a Thermolysin-like Protease by an Engineered Disulfide Bond” J. Biol. Chem. 1997 272: 11152-11156.


Ex 5 multiple chains human hand foot mouth disease virus capsid l.jpg

Ex 5. Multiple chains: Human Hand, Foot & Mouth Disease Virus capsid

  • 2000 outbreak of HFMD in Singapore: thousands of children affected – 4 deaths (The Lancet, 2000, 356, 1338)

  • Major etiological agent: EV71 (enterovirus group)

  • Neurological complications


5 ev71 genome structure l.jpg

5. EV71 genome structure

Capsid

Replication

  • 95/94% (RNA) homology coxsackievirus A16/B3

  • Only 1% difference between neurovirulent and non-neuro virulent isolates

  • Most variations in non-capsid regions

  • Within capsid regions, VP1 shows maximum variability relative to other Evs

  • Differences in capsid region 1: VP1 & VP2, 2: VP3

VP4 VP2 VP3 VP1 2A 2B 2C 3A,B 3C 3D

PolyA

VPg

5’ AUG

3’ UAG

Pan-enterovirus primers

EV71 specific primers


5 picornaviridae l.jpg

5. Picornaviridae

Icosahedral

Capsid


5 template hunt l.jpg

5. Template hunt

BLASTP against PDB sequences

  • VP1: 3 templates

    • 1BEV 38.7% (bovine enterovirus)

    • 1EAH 36.5% (poliovirus type 2 strain Lansing)

    • 1FPN 38.0% (human rhinovirus serotype 2)

  • VP2: 1BEV 56.7%

  • VP3: 1BEV 54.9%

  • VP4: 1BEV* 50.0%


5 fixing the vp1 alignment l.jpg

5. Fixing the VP1 Alignment

  • Structural alignment of templates: using VAST (Gibrat, Madej, & Bryant, 1996)

  • Extract corresponding sequence alignment

  • Match HFMDV VP1 to aligned templates using profile alignment in CLUSTALW


5 vp1 alignment to templates l.jpg

5. VP1 alignment to templates

3,10-helices

a-helices

b-strands

Pocket-factor binding residues


5 model building steps l.jpg

5. Model building steps

  • Build all 4 capsid proteins (VP1-VP4) together to ensure 3D fit

  • Use 1BEV alone for VP2-VP4

  • For VP1: use aligned 1BEV, 1EAH, 1FPN

  • Check model


5 round 1 vp1 vp2 vp3 vp4 l.jpg

5. Round 1: VP1,VP2,VP3,VP4

  • Clip hanging ends

  • Re-position problem loops:

    adjust gaps in alignment

  • Build again


5 round 2 pentamer check l.jpg

5. Round 2: Pentamer Check

  • Loops look OK

  • Build pentamer

  • Publish….

  • Oops: clash in pentamer assembly. Go back


5 close encounters of the 3 rd kind l.jpg

5. Close encounters of the 3rd Kind

  • Build only VP3 pentamer

  • N-terminus of each VP3 hydrogen-bonded

  • Also, in BEV, Asp-Lys ion pair

  • First 25 aa overlay v. well


5 fourth foray build with 5 vp3s l.jpg

5. Fourth foray: build with 5 VP3s

  • Only first 50aa of the other 4 VP3s included

  • Model resulted in knots due to insufficient refinement cycles

  • However, VP3 pentameric region OK


5 fifth and final attempt l.jpg

5. Fifth and final attempt


5 canyon pit and antigenic sites l.jpg

5. Canyon Pit and Antigenic Sites

Poliovirus

sites

Neurovirulent Polio (mouse)

Cardiovirus


5 putative antigenic sites l.jpg

5. Putative antigenic sites

VP1

VP2


5 hfmdv conclusions l.jpg

5. HFMDV Conclusions

  • Unique surface loops identified for

    • Immunodiagnostic assays

    • Vaccine design

    • Antibodies being generated

  • Canyon pit: depth is similar to BEV

  • Mapping the antigenic regions of other related enteroviruses on the HFMDV surface: specific VP1 and VP2 sites buried

  • Sunita Singh, Vincent T. K. Chow, C. L. Poh, M. C. Phoon: Dept. of Microbiology, NUS

  • Applied Bioinformatics, Vol 1, issue 1, 43-52: invited research article


ad
  • Login