biomolecular modeling building a 3d protein structure from its sequence
Download
Skip this Video
Download Presentation
Biomolecular Modeling: building a 3D protein structure from its sequence

Loading in 2 Seconds...

play fullscreen
1 / 96

BioMolecular Modeling - Building 3D Protein Structures from ... - PowerPoint PPT Presentation


  • 594 Views
  • Uploaded on

Prof Shoba Ranganathan Dept. of Chemistry and Biomolecular Sciences, Macquarie University, Sydney, Australia & Dept of Biochemistry, Yong Loo Lin School of Medicine National University of Singapore ([email protected]). Biomolecular Modeling: building a 3D protein

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'BioMolecular Modeling - Building 3D Protein Structures from ...' - Antony


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
biomolecular modeling building a 3d protein structure from its sequence

Prof Shoba RanganathanDept. of Chemistry and Biomolecular Sciences, Macquarie University, Sydney, Australia &Dept of Biochemistry, Yong Loo Lin School of MedicineNational University of Singapore([email protected])

Biomolecular Modeling:

building a 3D protein

structure from its sequence

why protein structure
Why protein structure?
  • In the factory of the living cell, proteins are the workers, performing a variety of tasks
  • Each protein adopts a particular folding pattern that determines its function
    • The 3D structure of a protein brings into close proximity residues that are far apart in the amino acid sequence
how does a protein fold
How does a protein fold?
  • Most newly synthesized proteins fold without assistance!
  • Ribonuclease A: denatured protein could refold and recover its activity (C. Anfinsen -1966)
    • “Structure implies function”
      • The amino acid sequence encodes the protein’s structural information
slide4
Understanding Protein Structure
  • A Quick Overview of Sequence Analysis
  • Finding a Structural Homologue
  • Template Selection
  • Aligning the Query Sequence to Template Structure(s)
  • Building the Model
the basics
The basics
  • Proteins are linear heteropolymers: one or more polypeptide chains
    • Repeat units: 20 amino acid residues
      • Range from a few 10s-1000s
        • Three-dimensional shapes (“folds”) adopted vary enormously
          • Experimental methods: X-ray crystallography, electron microscopy and NMR (nuclear magnetic resonance)
the l amino acid
The (L-)amino acid

N

C

O

O

R

Side chain = H,CH3,…

Backbone

Amino

C a

+

-

Carboxylate

levels of protein structure
Levels of protein structure
  • Zeroth: amino acid composition
  • Primary
    • This is simply the order of covalent linkages along the polypeptide chain, i.e. the sequence itself
levels of protein structure10
Levels of protein structure
  • Secondary
    • Local organization of the protein backbone: a-helix, b-strand (which assemble into b-sheets), turn and interconnecting loop
ramachandran phi psi plot
Ramachandran / phi-psi plot

b-sheet

a-helix (left handed)

y

a-helix (right handed)

f

levels of protein structure12
Levels of protein structure
  • Tertiary
    • packing of secondary structure elements into a compact spatial unit
    • “Fold” or domain – this is the level to which structure prediction is currently possible
levels of protein structure13
Levels of protein structure
  • Quaternary
    • Assembly of homo- or heteromeric protein chains
    • Usually the functional unit of a protein, especially for enzymes
structural classes
Structural classes

All-a (helical)

All-b (sheet)

slide15

Structural classes

a/b(parallelb-sheet)

a+b(antiparallelb-sheet)

structural information
Structural information
  • Protein Data Bank: maintained by the Research Collaboratory for Structural Bioinformatics
    • http://www.rcsb.org/pdb
    • > 45,744structures of proteins
    • Also contains structures of DNA, carbohydrates, protein-DNA complexes and numerous small ligand molecules.
the pdb data
The PDB data
  • Text files
  • Each entry is identified by a unique 4-letter code: say 1emg
  • 1emg entry
    • Header information
    • Atomic coordinates in Å (1 Ångstrom = 1.0e-10 m)
pdb header details
PDB Header details
  • identifies the molecule, any modifications, date of release of PDB entry
  • organism, keywords, method
  • Authors, reference, resolution if X-ray structure
  • Sequence, x-reference to sequence databases

HEADER GREENFLUORESCENT PROTEIN 12-NOV-98 1EMG

TITLE GREEN FLUORESCENT PROTEIN (65-67 REPLACED BY CRO, S65T

TITLE 2 SUBSTITUTION, Q80R)

COMPND MOL_ID: 1;

COMPND 2 MOLECULE: GREEN FLUORESCENT PROTEIN;

COMPND 3 CHAIN: A;

COMPND 4 ENGINEERED: YES;

COMPND 5 MUTATION: 65 - 67 REPLACED BY CRO, S65T SUBSTITUTION, Q80R

COMPND 6 SUBSTITUTION;

COMPND 7 BIOLOGICAL_UNIT: MONOMER

the data itself
The data itself
  • Coordinates for each heavy (non-hydrogen) atom from the first residue to the last
  • Any ligands (starting with HETATM) follow the biomacromolecule
  • O of water molecules (also HETATM) at the end

ATOM 1 N SER A 2 29.089 9.397 51.904 1.00 81.75

ATOM 2 CA SER A 2 27.883 10.162 52.185 1.00 79.71

ATOM 3 C SER A 2 26.659 9.634 51.463 1.00 82.64

ATOM 4 O SER A 2 26.718 8.686 50.686 1.00 81.02

ATOM 5 CB SER A 2 28.039 11.660 51.932 1.00 75.59

ATOM 6 OG SER A 2 27.582 12.038 50.639 1.00 43.28

-------

ATOM 1737 CD1 ILE A 229 39.535 21.584 52.346 1.00 41.62

TER 1738 ILE A 229

structural families
Structural Families
  • SCOP - Structural Classification Of Proteins
    • http://scop.mrc-lmb.cam.ac.uk/scop
  • FSSP – Family of Structurally Similar Proteins
    • http://www.ebi.ac.uk/dali/fssp/
  • CATH – Class, Architecture, Topology, Homology
    • http://www.biochem.ucl.ac.uk/bsm/cath
structure comparison facts
Structure comparison facts
  • Proteins adopt a limited number of topologies.
    • Homologous sequences show very similar structures, with strong conservation in secondary structural elements: variations in non-conserved regions.
      • In the absence of sequence homology, some folds are preferred by vastly different sequences.
structure comparison facts22
Structure comparison facts
  • The “active site” (a collection of functionally critical residues) is remarkably conserved, even when the protein fold is different.
    • Structural models (especially those based on homology) provide insights into possible function for new proteins.
      • Implications for
        • protein engineering
        • ligand/drug design,
        • function assignment of genomic data.
visualizing pdb information
Visualizing PDB information
  • RASMOL: most popular, available for all platforms

(Sayle et al, 2005)

http://www.bernstein-plus-sons.com/software/rasmol

  • DeepView Swiss-PDBViewer: from Swiss-Prot

(Guex & Peitsch, 1997)

http://tw.expasy.org/spdbv/

  • Chemscape Chime Plug-in: for PC and Mac

http://www.mdli.com/products/framework/chemscape

  • PyMOL: Very good, available for all platforms

(DeLano, W.L. The PyMOL Molecular Graphics System, 2002)

http://pymol.sourceforge.net

rasmol views sh2 domain
RASMOL views - SH2 domain

All-atom model

Space-filling model

Atom colors:NOCS

rasmol views 1sha
RASMOL views – 1sha

Ca Trace

Ribbon

Rainbow coloring:Nto C

Coloring:by structural units

homologous folds
Homologous folds
  • Hemoglobinand erythrocruorin: 31% sequence identity
analogous folds
Analogous folds
  • Hemoglobin and phycocyanin: 9% sequence identity
surface properties
Surface Properties

Cro repressor – DNA complex

  • Basic residues in blue
  • Acidic residues in red
mapping functional regions
Mapping Functional Regions

Immunoglobulin l light chain - dimer

  • Hydrophobhic residues in magenta
  • Hydrophilic and charged residues in cyan
slide30
Understanding Protein Structure
  • A Quick Overview of Sequence Analysis
  • Finding a Structural Homologue
  • Template Selection
  • Aligning the Query Sequence to Template Structure(s)
  • Building the Model
siblings and cousins
Siblings and Cousins
  • Siblings or homologues: sequences with at least 30% sequence identity over an alignment length of at least 125 residues and conservation of function.
  • Cousins or paralogues: < 30% identity but with conservation of function
  • Both show structural conservation
  • Homologues located using a database search tool such as BLAST (free webserver): http://www.ncbi.nlm.nih.gov/BLAST
  • Paralogues require a more sensitive method such as PSI-BLAST
multiple sequence alignment
Multiple Sequence Alignment

Finding the best way to match the residues of

related sequences

  • Identical residues must be lined up
  • The rest should be arranged, based on
    • observed substitution in protein families
    • chemical similarity
    • charge similarity
  • Where it is impossible to get the residues to line up, the biological concept of insertion/deletion in invoked: the ‘gap’ in alignments
msa methods
MSA Methods
  • CLUSTALW / CLUSTALX (Thompson et al, 1997): freely available for all platforms and one of the best alignment programs

http://www-igbmc.u-strasbg.fr/BioInfo/ClustalX/Top.html

  • MAXHOM (Sander & Schneider, 1991): alignment based on maximum homology; available via the PredictProtein webserver, free for academics

http://cubic.bioc.columbia.edu/predictprotein/

  • MALIGN (Johnson et al, 1994): freely available UNIX program, based on the structural alignment of protein families

http://www.abo.fi/fak/mnf/bkf/research/johnson/software.html

alignment checks
Alignment Checks
  • Conservation of functionally important residues: e.g. the catalytic triad (Asp-Ser-His) that are essential for serine proteinase activity
  • Line up of structurally important residues: e.g. cysteines forming disulfide bonds
  • Overall, maximizing the alignment of “like” residues
  • Completely conserved residues usually indicate some conserved structural or functional role, especially buried charges
sequence motifs patterns
Sequence Motifs & Patterns
  • From the analysis of the alignment of protein families
  • Conserved sequence features, usually associated with a specific function
  • PROSITE (Hulo et al, 2006) database for protein “signature” patterns: http://www.expasy.ch/prosite
aligned sequence families
Aligned Sequence Families
  • From alignments of homologous sequences:
    • PRINTS
    • PRODOM: http://www.toulouse.inra.fr/prodom.html
  • From Hidden Markov Model based methods:
    • PFAM: http://www.sanger.ac.uk/Pfam
protein domains
Protein Domains
  • Most proteins are composed of structural subunits called domains
  • A domain is a compact unit of protein structure, usually associated with a function.
  • It is usually a “fold” - in the case of monomeric soluble proteins.
  • A domain comprises normally only one protein chain: rare examples involving 2 chains are known.
  • Domains can be shared between different proteins: like a LEGO block
protein architectures
Protein Architectures
  • Beads-on-a-string: sequential location: tyrosine-protein kinase receptor TIE-1 (immunoglobulin, EGF, fibronectin type-3 and protein kinase).
  • Domain insertions: “plugged-in” - pyruvate kinase (1pyk)
  • SMART: smart.embl-heidelberg.de

Simple Modular Architecture Retrieval Tool

dissection into domains
Dissection into Domains
  • A sequence, usually > 125 residues should be routinely checked to see how many domains are present.
  • Conserved Domain Architecture Retrieval Tool (CDART) uses information in Pfam and SMART to assign domains along a sequence
  • E.g. NP_002917 shows similarity to G-protein regulators:
slide40
Understanding Protein Structure
  • A Quick Overview of Sequence Analysis
  • Finding a Structural Homologue
  • Template Selection
  • Aligning the Query Sequence to Template Structure(s)
  • Building the Model
structural homologues
Structural Homologues
  • BLASTP vs. PDB database or PSI-BLAST: look for 4-character PDB ID
    • E < 0.005
  • Domain coverage: at least 60% coverage is recommended
  • Gaps: we don’t want them. Choose between:
    • few gaps and reasonable similarity scores or
    • lots of gaps and high similarity scores?
small proteins disulfide bonds
Small Proteins: Disulfide bonds
  • BLAST-type methods may not locate homologues, if Conserved Domain search is not turned on.
  • Are the Cys residues conserved?
  • Gaps: where are they on the structure?

gnl|Pfam|pfam00095, wap, WAP-type (Whey Acidic Protein)

  • four-disulfide core\'.

CD-Length = 46 residues, 100.0% aligned

  • Score = 43.9 bits (102), Expect = 1e-06
  • Q:49 KAGFCPWNLLQMISSTGPCPMKIECSSDRECSGNMKCCNVDCVMTCTPP 97
  • D: 1 KPGVCPWVSISE---AGQCLELNPCQSDEECPGNKKCCPGSCGMSCLTP46
metal binding domains
Metal-binding domains

C2H2 Zinc Finger

  • 2 Cys & 2 His binding to Zinc
  • Not detected even by CD-search in BLAST
  • Detected by Pfam & SMART
  • Sequence Pattern:

#-X-C-X(1-5)-C-X3-#-X5-#-X2-H-X(3-6)-[H/C]

structure prediction methods
Structure Prediction Methods
  • Secondary Structure Prediction: identify local structural elements such as helices, strands and loops.
  • > 75% accuracy achievable
    • PredictProtein or PHD

http://cubic.bioc.columbia.edu/pp/

    • PSIPRED

http://bioinf.cs.ucl.ac.uk/psipred/

    • SSPro

http://promoter.ics.uci.edu/BRNN-PRED/

folds from secondary structure predictions
Folds from Secondary Structure Predictions
  • Assembling SSEs into folds is a combinatorial problem
  • Current methods depend on available structural data for mapping predictions:
    • FORREST http://abs.cit.nih.gov/foresst/foresst.html
    • TOPITS from the PHD server http://cubic.bioc.columbia.edu/pp
tertiary structure prediction
Tertiary Structure Prediction
  • Fold recognition/Threading: < 20% identity typically
  • Best results obtained by combining several database search and knowledge-based tools:
  • 3D-PSSM http://www.sbg.bio.ic.ac.uk/~3dpssm/
  • FUGUE http://www-cryst.bioc.cam.ac.uk/fugue/
slide47
Understanding Protein Structure
  • A Quick Overview of Sequence Analysis
  • Finding a Structural Homologue
  • Template Selection
  • Aligning the Query Sequence to Template Structure(s)
  • Building the Model
one or many templates
One or many templates?
  • Sequence similarity: extract template sequences and align with query: select the most similar structure
  • Completeness: Missing data?

REMARK 465 MISSING RESIDUES

REMARK 465 THE FOLLOWING RESIDUES WERE NOT LOCATED IN THE

REMARK 465 EXPERIMENT. (M=MODEL NUMBER; RES=RESIDUE NAME; C=CHAIN

REMARK 465 IDENTIFIER; SSSEQ=SEQUENCE NUMBER; I=INSERTION CODE.)

REMARK 465

REMARK 465 M RES C SSSEQI

REMARK 465 MET A 1

REMARK 465 THR A 230

REMARK 470 M RES CSSEQI ATOMS

REMARK 470 GLU A 5 OE2

REMARK 470 GLU A 6 CG CD OE1 OE2

REMARK 470 GLU A 17 OE1

one or many templates49
One or many templates?
  • X-ray or NMR?:
    • Lowest resolution X-ray structure
    • X-ray and then NMR
    • NMR average over assembly
  • One or many?:
    • Structure alignment of Ca atoms
    • If 2 templates are very close, keep only one
    • Keep templates that provide new information
many templates
Many templates
  • Sequence alignment from structure comparison of templates (SSA) can be different from a simple sequence alignment (SA).
  • For model building,
    • align templates structurally
    • extract the corresponding SSA
slide51
Understanding Protein Structure
  • A Quick Overview of Sequence Analysis
  • Finding a Structural Homologue
  • Template Selection
  • Aligning the Query Sequence to the Template Structure(s)
  • Building the Model
query template alignment
Query - Template Alignment
  • >40% identity: any alignment method is OK
  • Below this, checks are essential.
    • Collect close sequence homologues (about 10) and align to query to get MSA (multiple sequence alignment)
    • Collect several structural templates (at least 5) and align them using structure comparison methods: extract the SSA (structural sequence alignment)
    • Align MSA to SSA using profile alignment
    • Extract query and selected template(s) from the final alignment – QTA.
qta checks
QTA Checks
  • Residue conservation checks
    • Functional regions
    • Patterns/motifs conserved?
  • Indels
    • Combine gaps separated by few residues
  • Editing the alignment
    • Move gaps from secondary structures to loops
    • Within loops, move gaps to loop ends, i.e. turnaround point of backbone
qta checks54
QTA Checks
  • Residue conservation checks
    • Functional regions
    • Patterns/motifs conserved?
  • Indels
    • Combine gaps separated by few residues
  • Editing the alignment
    • Move gaps from secondary structures to loops
    • Within loops, move gaps to loop ends, i.e. turnaround point of backbone
visual inspection of indels
Visual Inspection of Indels
  • 2-residue deletion from sequence alignment
  • End-of-loop 2-residue deletion
slide56
Understanding Protein Structure
  • A Quick Overview of Sequence Analysis
  • Finding a Structural Homologue
  • Template Selection
  • Aligning the Query Sequence to Template Structure(s)
  • Building the Model
input for model building
Input for Model Building
  • Query sequence
  • Template structure
    • Template sequence
  • Query-template sequence alignment
methods available
Methods Available
  • WHATIF (Vriend G, 1990) : "
    • High quality models where template is available
      • Indels not modelled
    • Side chain rotamers
    • In silico mutations
    • In silico disulfide bond creation
methods available59
Methods Available
  • SWISS-MODEL (Schwede et al, 2003) : "
    • Automatic modeling mode with multiple templates
    • Query + template input
    • High Homology situations
    • DeepView for input file creation
methods available60
Methods Available
  • MODELLER (Sali & Blundell, 1993) :
    • High quality models
    • Sequence alignment
    • Structure analysis/alignment
    • Multiple templates
    • Multiple chains
    • Ligand/cofactor present
  • ESyPred3D (uses MODELLER): "
    • QTAs from several methods & neural networks
    • http://www.fundp.ac.be/urbm/bioinfo/esypred/
methods available61
Methods Available
  • ICM (Ruben et al, 1994) :
    • High quality models
    • Loop modelling
      • Multiple templates not possible
    • Sequence/Structure alignment/analysis
    • Ab initio peptide modeling
    • Secondary structure prediction
  • Geno3D (Combet et al, 2002) : "
    • Automated modelling
    • Distance geometry used for loops
    • http://geno
methods available62
Methods Available
  • 3D-JIGSAW (Bates et al, 2001) : "
    • Automatic modeling mode
    • Interactive user mode to select templates
    • Multiple templates
    • Multidomain protein modeling
methods available63
Methods Available
  • CPH-MODELS (Lund et al, 1997) : "
    • Fully automated
    • FASTA search for templates
    • Not validated
automatic or manual mode
Automatic or Manual Mode?
  • Automatic: High homology
  • Manual
    • Medium/Low homology
    • Template from structure prediction
    • Multiple templates
    • Multiple chains
    • Ligand present
how good is the model
How good is the model?

Structural Quality Analysis

  • PROCHECK (Laskowski et al, 1993) :
  • WHATIF (Vriend G, 1990) : "
  • ERRAT (Colovos & Yeates, 1993) : "
improving ill defined regions
Improving ill-defined regions
  • Iterative model building
    • Rebuild or anneal bad regions
    • Check/edit alignment and rebuild
  • Molecular dynamics and/or Monte Carlo simulations
    • Compute intensive
    • Input files need to be set up
    • Optional
molecular modeling protocol
Molecular Modeling Protocol
  • Resources required
    • The query sequence  
    • Personal computer with internet connectivity
    • RASMOL/DeepView for PDB structure visualization
    • CLUSTALX sequence alignment software
    • Access to a UNIX workstation
    • MODELLER/ICM – UNIX software
    • WHATIF – UNIX/PC software
    • PROCHECK – UNIX or Windows software
mm protocol input files
MM Protocol – Input Files

Minimum requirement

  • Query sequence
  • Template structure
    • Template sequence
  • Query-Template alignment
ex 1 high homology case
Ex 1. High Homology Case
  • Human SOX9 WT - homologous to

SRY (PDB: 1HRY) - 49% identity

S9WT: ..AGAACAATGG.. highest

SOXCORE: ..GCAACAATCT.. least

  • Mutants (campomelic dysplasia):
    • F12L: No DNA binding
    • H65Y: Minimal binding
    • P70R: altered specificity; no SOXCORE
    • A19V: near WT but normal binding
1 sox9 models
1. SOX9 Models

SOX9 & P70Rbased onSRY

  • WT & P70R models built
  • Ca overlay:

WT-SRY ~ 0.72 Å

  • J. Biol. Chem. 274 (1999) 24023
ex 1 sox9 dna models
Ex 1. SOX9 + DNA Models

SOX9-WT

  • Observed disease-linked mutations mapped
  • Other residues in DNA-binding groove determined

SRY

SOX9-P70R

ex 2 low homology situation
Ex 2. Low Homology Situation
  • Pigments from reef-building corals: similar to Pocilloporin
  • fluoresce under UV and visible radiation
  • similar to the Green Fluorescent Protein - GFP (19.6% identity)
  • contain ‘QYG’ instead of ‘SYG’ in GFP, as proposed fluorophore
2 poc4 model
2. POC4 Model
  • Barrel ends open
  • C-ter not included
  • b-sheet OK
  • ‘QYG’ fits the site!
  • 26 residues within 5Å of QYG (only 19 in GFP)
  • Increased thermal stability
  • UV protection
ex 3 small disulfide bonded protein complement factor h
Ex 3. Small Disulfide-bonded Protein: Complement Factor H

C-ter

C4

C2

HV loop

C3

C1

N-ter

  • 20 tandem homologous units = SCRs (short consensus repeat) or “sushi” regions
  • Each SCR is ~ 60 aa:
    • conserved Y, P, G
    • 2 disulfide bridges:1-3 & 2-4
    • Linkers of 3-8 aa
  • Heparin binding SCRs: 7 (high affinity) & 20
  • Previous SCRs required for activity: minimum constructs are fH67 and fH18-20
3 templates for fh scrs 6 7
3. Templates for fH SCRs 6-7

vcp4

hfH15

vcp4

vcp3

hfH16

hfH15

vcp3

hfH16

  • hfH SCRs 15&16 (fH1516; PDB ID: 1HFH)
  • Vaccinia virus complement control protein domains 3&4 (vcp34; PDB ID: 1VVC)
  • Orientations differ considerably
  • Vcp34 28% identical to hfH67 compared to hfH1516 (25%) !
3 hfh67 model
3. hfH67 model

Sialic acid

Heparin

disaccharide repeat

hfH67

hfH1516

3 locating residues for mutation from model
3. Locating residues for mutation from model

SCRs 6&7

SCRs 15&16

Pacific Symposium of Biocomputing 2000, 5:155

ex 4 protein engineering
Ex 4: Protein Engineering
  • Thermolysin-like protease unstable at high temperatures (> 40 ºC unlike trypsin)
  • Homology Model built
  • G8 & N60 suited for disulfide bond
  • Double Mutant functional at 92.5 ºC

J Mansfeld et al. Extreme Stabilization of a Thermolysin-like Protease by an Engineered Disulfide Bond” J. Biol. Chem. 1997 272: 11152-11156.

ex 5 multiple chains human hand foot mouth disease virus capsid
Ex 5. Multiple chains: Human Hand, Foot & Mouth Disease Virus capsid
  • 2000 outbreak of HFMD in Singapore: thousands of children affected – 4 deaths (The Lancet, 2000, 356, 1338)
  • Major etiological agent: EV71 (enterovirus group)
  • Neurological complications
5 ev71 genome structure
5. EV71 genome structure

Capsid

Replication

  • 95/94% (RNA) homology coxsackievirus A16/B3
  • Only 1% difference between neurovirulent and non-neuro virulent isolates
  • Most variations in non-capsid regions
  • Within capsid regions, VP1 shows maximum variability relative to other Evs
  • Differences in capsid region 1: VP1 & VP2, 2: VP3

VP4 VP2 VP3 VP1 2A 2B 2C 3A,B 3C 3D

PolyA

VPg

5’ AUG

3’ UAG

Pan-enterovirus primers

EV71 specific primers

5 picornaviridae
5. Picornaviridae

Icosahedral

Capsid

5 template hunt
5. Template hunt

BLASTP against PDB sequences

  • VP1: 3 templates
    • 1BEV 38.7% (bovine enterovirus)
    • 1EAH 36.5% (poliovirus type 2 strain Lansing)
    • 1FPN 38.0% (human rhinovirus serotype 2)
  • VP2: 1BEV 56.7%
  • VP3: 1BEV 54.9%
  • VP4: 1BEV* 50.0%
5 fixing the vp1 alignment
5. Fixing the VP1 Alignment
  • Structural alignment of templates: using VAST (Gibrat, Madej, & Bryant, 1996)
  • Extract corresponding sequence alignment
  • Match HFMDV VP1 to aligned templates using profile alignment in CLUSTALW
5 vp1 alignment to templates
5. VP1 alignment to templates

3,10-helices

a-helices

b-strands

Pocket-factor binding residues

5 model building steps
5. Model building steps
  • Build all 4 capsid proteins (VP1-VP4) together to ensure 3D fit
  • Use 1BEV alone for VP2-VP4
  • For VP1: use aligned 1BEV, 1EAH, 1FPN
  • Check model
5 round 1 vp1 vp2 vp3 vp4
5. Round 1: VP1,VP2,VP3,VP4
  • Clip hanging ends
  • Re-position problem loops:

adjust gaps in alignment

  • Build again
5 round 2 pentamer check
5. Round 2: Pentamer Check
  • Loops look OK
  • Build pentamer
  • Publish….
  • Oops: clash in pentamer assembly. Go back
5 close encounters of the 3 rd kind
5. Close encounters of the 3rd Kind
  • Build only VP3 pentamer
  • N-terminus of each VP3 hydrogen-bonded
  • Also, in BEV, Asp-Lys ion pair
  • First 25 aa overlay v. well
5 fourth foray build with 5 vp3s
5. Fourth foray: build with 5 VP3s
  • Only first 50aa of the other 4 VP3s included
  • Model resulted in knots due to insufficient refinement cycles
  • However, VP3 pentameric region OK
5 canyon pit and antigenic sites
5. Canyon Pit and Antigenic Sites

Poliovirus

sites

Neurovirulent Polio (mouse)

Cardiovirus

5 hfmdv conclusions
5. HFMDV Conclusions
  • Unique surface loops identified for
    • Immunodiagnostic assays
    • Vaccine design
    • Antibodies being generated
  • Canyon pit: depth is similar to BEV
  • Mapping the antigenic regions of other related enteroviruses on the HFMDV surface: specific VP1 and VP2 sites buried
  • Sunita Singh, Vincent T. K. Chow, C. L. Poh, M. C. Phoon: Dept. of Microbiology, NUS
  • Applied Bioinformatics, Vol 1, issue 1, 43-52: invited research article
ad