From Sequences to Structure

Illustrations from: C Branden and J Tooze, Introduction to Protein Structure, 2nd ed. Garland Pub. ISBN 0815302703


Protein Functions

  • Mechanoenzymes: myosin, actin

  • Rhodopsin: allows vision

  • Globins: transport oxygen

  • Antibodies: immune system

  • Enzymes: pepsin, renin, carboxypeptidase A

  • Receptors: transmembrane signaling

  • Vitellogenin: molecular velcro

    • And hundreds of thousands more…


Proteins are Chains of Amino Acids

  • Polymer – a molecule composed of repeating units


The Peptide Bond

  • Formed by dehydration synthesis: one water molecule is released per peptide bond

  • Repeating backbone: N–Cα–C–N–Cα–C

    • Convention – start at amino terminus and proceed to carboxy terminus



Peptidyl polymers

  • A few amino acids in a chain are called a polypeptide. A protein is usually composed of 50 to 400+ amino acids.

  • Since a water molecule (H from one amino acid, OH from the other) is lost during dehydration synthesis, we call the units of a protein amino acid residues.

(Figure labels: amide nitrogen, carbonyl carbon)


Side Chain Properties

  • Recall that the electronegativity of carbon is at about the middle of the scale for light elements

    • Carbon does not make hydrogen bonds with water easily – hydrophobic

    • O and N are generally more likely than C to hydrogen-bond with water – hydrophilic

  • We group the amino acids into three general groups:

    • Hydrophobic

    • Charged (positive/basic & negative/acidic)

    • Polar


The Hydrophobic Amino Acids

Proline severely limits allowable conformations!




More Polar Amino Acids

And then there’s…



Phi and psi

  • φ = ψ = 180° is the extended conformation

  • φ: rotation about the N–Cα bond (Cα to N–H)

  • ψ: rotation about the Cα–C bond (C=O to Cα)
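Both angles can be computed from atomic coordinates: φ of residue i is the dihedral C(i-1)–N(i)–Cα(i)–C(i), and ψ is N(i)–Cα(i)–C(i)–N(i+1). A minimal sketch in plain Python using the standard atan2 formula for a dihedral angle; the helper names are illustrative.

```python
import math

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees (between -180 and 180) defined by four points."""
    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b1, b2), cross(b2, b3)  # normals of the two planes
    x = dot(n1, n2)
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    return math.degrees(math.atan2(y, x))

# phi(i) = dihedral(C[i-1], N[i], CA[i], C[i])
# psi(i) = dihedral(N[i], CA[i], C[i], N[i+1])
print(dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0)))  # trans: 180.0
print(dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0)))   # cis: 0.0
```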


The Ramachandran Plot

  • G. N. Ramachandran – first calculations of sterically allowed regions of phi and psi

  • Note the structural importance of glycine

(Figure panels: observed φ/ψ values for non-glycine residues; observed values for glycine; calculated allowed regions)


Primary and Secondary Structure

  • Primary structure = the linear sequence of amino acids comprising a protein: AGVGTVPMTAYGNDIQYYGQVT…

  • Secondary structure

    • Regular patterns of hydrogen bonding in proteins result in two patterns that emerge in nearly every protein structure known: the α-helix and the β-sheet

    • The location and direction of these periodic, repeating structures is known as the secondary structure of the protein


The alpha Helix

60°


Properties of the alpha helix

  • 60°

  • Hydrogen bondsbetween C=O ofresidue n, andNH of residuen+4

  • 3.6 residues/turn

  • 1.5 Å/residue rise

  • 100°/residue turn
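These three numbers are consistent with one another; the rise per turn (the pitch) and the rotation per residue follow directly:

```latex
\[
\text{pitch} = 3.6\,\frac{\text{residues}}{\text{turn}} \times 1.5\,\frac{\text{\AA}}{\text{residue}} = 5.4\,\frac{\text{\AA}}{\text{turn}},
\qquad
\frac{360^\circ}{3.6\ \text{residues}} = 100^\circ\ \text{per residue}
\]
```

So, for example, a 20-residue helix is roughly 20 × 1.5 = 30 Å long.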


Properties of α-helices

  • 4 – 40+ residues in length

  • Often amphipathic or “dual-natured”

    • Half hydrophobic and half hydrophilic

    • Mostly when surface-exposed

  • If we examine many α-helices, we find trends…

    • Helix formers: Ala, Glu, Leu, Met

    • Helix breakers: Pro, Gly, Tyr, Ser


The beta Strand (and Sheet)

 135°+135°


Properties of beta sheets

  • Formed of stretches of 5-10 residues in extended conformation

  • Pleated – each Cα a bit above or below the previous one

  • Parallel/antiparallel, contiguous/non-contiguous

OCCBIO 2006 – Fundamental Bioinformatics


Parallel and anti-parallel β-sheets

  • Anti-parallel is slightly energetically favored

Anti-parallel

Parallel


Turns and Loops

  • Secondary structure elements are connected by regions of turns and loops

  • Turns – short regions of non-α, non-β conformation

  • Loops – larger stretches with no secondary structure. Often disordered.

    • “Random coil”

    • Sequences vary much more than secondary structure regions


Levels of Protein Structure

  • Secondary structure elements combine to form tertiary structure

  • Quaternary structure occurs in multienzyme complexes

    • Many proteins are active only as homodimers, homotetramers, etc.


Disulfide Bonds

  • Two cysteines in close proximity can form a covalent S–S bond

  • Also called a disulfide bridge or dicysteine bond

  • Significantly stabilizes tertiary structure.



Determining Protein Structure

  • There are ~ 100,000 distinct proteins in the human proteome.

  • 3D structures have been determined for 14,000 proteins, from all organisms

    • Includes duplicates with different ligands bound, etc.

  • Most coordinates are determined by X-ray crystallography


X-Ray diffraction

  • Image is averaged over:

    • Space (many copies)

    • Time (of the diffraction experiment)


Electron Density Maps

  • Resolution is dependent on the quality/regularity of the crystal

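  • R-factor measures the disagreement between the observed diffraction data and the data calculated from the model: the “leftover” that the model does not explain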

  • Solvent fitting

  • Refinement


The Protein Data Bank

  • http://www.rcsb.org/pdb/

ATOM 1 N ALA E 1 22.382 47.782 112.975 1.00 24.09 3APR 213
ATOM 2 CA ALA E 1 22.957 47.648 111.613 1.00 22.40 3APR 214
ATOM 3 C ALA E 1 23.572 46.251 111.545 1.00 21.32 3APR 215
ATOM 4 O ALA E 1 23.948 45.688 112.603 1.00 21.54 3APR 216
ATOM 5 CB ALA E 1 23.932 48.787 111.380 1.00 22.79 3APR 217
ATOM 6 N GLY E 2 23.656 45.723 110.336 1.00 19.17 3APR 218
ATOM 7 CA GLY E 2 24.216 44.393 110.087 1.00 17.35 3APR 219
ATOM 8 C GLY E 2 25.653 44.308 110.579 1.00 16.49 3APR 220
ATOM 9 O GLY E 2 26.258 45.296 110.994 1.00 15.35 3APR 221
ATOM 10 N VAL E 3 26.213 43.110 110.521 1.00 16.21 3APR 222
ATOM 11 CA VAL E 3 27.594 42.879 110.975 1.00 16.02 3APR 223
ATOM 12 C VAL E 3 28.569 43.613 110.055 1.00 15.69 3APR 224
ATOM 13 O VAL E 3 28.429 43.444 108.822 1.00 16.43 3APR 225
ATOM 14 CB VAL E 3 27.834 41.363 110.979 1.00 16.66 3APR 226
ATOM 15 CG1 VAL E 3 29.259 41.013 111.404 1.00 17.35 3APR 227
ATOM 16 CG2 VAL E 3 26.811 40.649 111.850 1.00 17.03 3APR 228
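These ATOM records can be read with a few lines of Python. A minimal sketch, assuming whitespace-separated fields as printed here; real PDB files are fixed-column (for example, x, y and z occupy columns 31-54), so a robust parser should slice by column rather than split on spaces. The `Atom` class and the function names are illustrative.

```python
import math
from dataclasses import dataclass

@dataclass
class Atom:
    serial: int
    name: str        # atom name, e.g. N, CA, C, O, CB
    res_name: str    # residue name, e.g. ALA
    chain: str
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float  # temperature factor

def parse_atom(line: str) -> Atom:
    """Parse one whitespace-separated ATOM record (as printed above)."""
    f = line.split()
    if f[0] != "ATOM":
        raise ValueError(f"not an ATOM record: {line!r}")
    return Atom(int(f[1]), f[2], f[3], f[4], int(f[5]),
                float(f[6]), float(f[7]), float(f[8]),
                float(f[9]), float(f[10]))

def distance(a: Atom, b: Atom) -> float:
    return math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))

records = """\
ATOM 1 N ALA E 1 22.382 47.782 112.975 1.00 24.09 3APR 213
ATOM 2 CA ALA E 1 22.957 47.648 111.613 1.00 22.40 3APR 214
ATOM 3 C ALA E 1 23.572 46.251 111.545 1.00 21.32 3APR 215
"""
atoms = [parse_atom(line) for line in records.splitlines()]
n, ca = atoms[0], atoms[1]
print(f"{n.res_name}{n.res_seq} N-CA bond: {distance(n, ca):.2f} Å")  # ALA1 N-CA bond: 1.48 Å
```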


Views of a Protein

Wireframe

Ball and stick


Views of a Protein, cont’d

Spacefill

Cartoon

CPK colors

  • Carbon = green or black

  • Nitrogen = blue

  • Oxygen = red

  • Sulfur = yellow

  • Hydrogen = white


The Protein Folding Problem

  • Central question of molecular biology: “Given a particular sequence of amino acid residues (primary structure), what will the tertiary/quaternary structure of the resulting protein be?”

  • Input: AAVIKYGCAL…

  • Output: (φ₁, ψ₁), (φ₂, ψ₂), … = backbone conformation (no side chains yet)


Forces Driving Protein Folding

  • It is believed that hydrophobic collapse is a key driving force for protein folding

    • Hydrophobic core

    • Polar surface interacting with solvent

  • Minimum volume (no cavities)

  • Disulfide bond formation stabilizes

  • Hydrogen bonds

  • Polar and electrostatic interactions


Folding Help

  • Proteins are, in fact, only marginally stable

    • Native state is typically only 5 to 10 kcal/mole more stable than the unfolded form

  • Many proteins help in folding

    • Protein disulfide isomerase – catalyzes shuffling of disulfide bonds

    • Chaperones – break up aggregates and (in theory) unfold misfolded proteins
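To put “marginally stable” in numbers, assume a simple two-state equilibrium between the native (N) and unfolded (U) forms at about 25 °C (298 K), where RT ≈ 0.593 kcal/mol and ΔG is the amount by which the native state is more stable:

```latex
\[
K = \frac{[\mathrm{N}]}{[\mathrm{U}]} = e^{\Delta G / RT}, \qquad
\Delta G = 5\ \text{kcal/mol} \Rightarrow K \approx e^{8.4} \approx 4.6 \times 10^{3}, \qquad
\Delta G = 10\ \text{kcal/mol} \Rightarrow K \approx e^{16.9} \approx 2.1 \times 10^{7}
\]
```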


The Hydrophobic Core

  • Hemoglobin A is the protein in red blood cells (erythrocytes) responsible for binding oxygen.

  • The mutation E6V in the  chain places a hydrophobic Val on the surface of hemoglobin

  • The resulting “sticky patch” causes hemoglobin S to agglutinate (stick together) and form fibers which deform the red blood cell and do not carry oxygen efficiently

  • Sickle cell anemia was the first identified molecular disease


Sickle Cell Anemia

Sequestering hydrophobic residues in the protein core protects proteins from hydrophobic agglutination.


Computational Problems in Protein Folding

  • Two key questions:

    • Evaluation – how can we tell a correctly-folded protein from an incorrectly folded protein?

      • H-bonds, electrostatics, hydrophobic effect, etc.

      • Derive a function, see how well it does on “real” proteins

    • Optimization – once we get an evaluation function, can we optimize it?

      • Simulated annealing/Monte Carlo

      • Evolutionary computation (EC)

      • Heuristics


Fold Optimization

  • Simple lattice models (HP-models)

    • Two types of residues: hydrophobic and polar

    • 2-D or 3-D lattice

    • The only force is hydrophobic collapse

    • Score = number of HH contacts


Scoring Lattice Models

H/P model scoring: count noncovalent hydrophobic interactions.

Sometimes: penalize buried polar or surface hydrophobic residues.
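A minimal sketch of HP scoring on a 2-D square lattice, assuming a fold is given as a self-avoiding list of (x, y) coordinates, one per residue. It counts H–H pairs that are lattice neighbors but not adjacent in the chain (the noncovalent contacts); the optional penalty is left out. The function name is illustrative.

```python
def hp_score(seq: str, coords: list) -> int:
    """Number of noncovalent H-H contacts (higher is better)."""
    if len(seq) != len(coords):
        raise ValueError("need one coordinate per residue")
    index = {c: i for i, c in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if seq[i] != "H":
            continue
        # Look only at the +x and +y neighbors so each lattice edge is counted once.
        for nb in ((x + 1, y), (x, y + 1)):
            j = index.get(nb)
            if j is not None and abs(i - j) > 1 and seq[j] == "H":
                contacts += 1
    return contacts

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(hp_score("HPPH", square))  # 1: residues 0 and 3 touch
```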


What can we do with lattice models?

  • For smaller polypeptides, exhaustive search can be used

    • Looking at the “best” fold, even in such a simple model, can teach us interesting things about the protein folding process

  • For larger chains, other optimization and search methods must be used

    • Greedy, branch and bound

    • Evolutionary computing, simulated annealing

    • Graph theoretical methods
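For short chains, exhaustive search amounts to enumerating every conformation and keeping the best-scoring one. A minimal sketch for the 2-D square lattice, assuming conformations are generated from relative moves (L, R, F) with the first bond fixed, which removes rotated copies of the same fold. Walks that run into themselves are discarded. The number of candidates grows as 3^(n-2), so this is only practical for short chains. The function names are illustrative.

```python
from itertools import product

HEADINGS = [(1, 0), (0, 1), (-1, 0), (0, -1)]  # E, N, W, S (counterclockwise)
TURN = {"L": 1, "F": 0, "R": -1}               # change in heading index

def walk(moves):
    """Coordinates for relative moves, or None if the walk hits itself."""
    coords, heading = [(0, 0), (1, 0)], 0  # first bond fixed along +x
    seen = set(coords)
    for m in moves:
        heading = (heading + TURN[m]) % 4
        dx, dy = HEADINGS[heading]
        x, y = coords[-1]
        nxt = (x + dx, y + dy)
        if nxt in seen:
            return None
        coords.append(nxt)
        seen.add(nxt)
    return coords

def hp_score(seq, coords):
    """Count noncovalent H-H lattice contacts."""
    index = {c: i for i, c in enumerate(coords)}
    score = 0
    for i, (x, y) in enumerate(coords):
        if seq[i] == "H":
            for nb in ((x + 1, y), (x, y + 1)):
                j = index.get(nb)
                if j is not None and abs(i - j) > 1 and seq[j] == "H":
                    score += 1
    return score

def best_fold(seq):
    """Exhaustively search all self-avoiding folds of seq (length >= 2)."""
    best_score, best_moves = -1, None
    for moves in product("LFR", repeat=len(seq) - 2):
        coords = walk(moves)
        if coords is None:
            continue
        s = hp_score(seq, coords)
        if s > best_score:
            best_score, best_moves = s, "".join(moves)
    return best_score, best_moves

print(best_fold("HPPH"))  # (1, 'LL')
```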


Learning from Lattice Models

The “hydrophobic zipper” effect:

Ken Dill ~ 1997


Representing a lattice model

  • Absolute directions: UURRDLDRRU

  • Relative directions: LFRFRRLLFFL

    • Advantage: immediate reversals such as UD or RL, which absolute directions allow, cannot occur

    • Only three directions: L, R, F

  • What about bumps? LFRRR

    • Bad score

    • Use a better representation
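Both encodings are easy to decode into lattice coordinates, which also makes bumps easy to detect. A minimal sketch, assuming U/D/L/R mean +y/-y/-x/+x for absolute moves, and that relative moves start with the first bond pointing along +x (L and R turn left and right, F goes straight). The function names are illustrative.

```python
ABS_STEP = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}
HEADINGS = [(1, 0), (0, 1), (-1, 0), (0, -1)]  # E, N, W, S (counterclockwise)
TURN = {"L": 1, "F": 0, "R": -1}

def decode_absolute(moves):
    coords = [(0, 0)]
    for m in moves:
        dx, dy = ABS_STEP[m]
        x, y = coords[-1]
        coords.append((x + dx, y + dy))
    return coords

def decode_relative(moves):
    coords, heading = [(0, 0), (1, 0)], 0  # first bond fixed along +x
    for m in moves:
        heading = (heading + TURN[m]) % 4
        dx, dy = HEADINGS[heading]
        x, y = coords[-1]
        coords.append((x + dx, y + dy))
    return coords

def has_bump(coords):
    """True if two residues occupy the same lattice site."""
    return len(set(coords)) < len(coords)

print(has_bump(decode_absolute("UURRDLDRRU")))   # False
print(has_bump(decode_relative("LFRFRRLLFFL")))  # False
print(has_bump(decode_absolute("UD")))           # True: immediate reversal
print(has_bump(decode_relative("LFRRR")))        # True: 7th residue lands on the 3rd
```

With relative moves a reversal like UD simply cannot be written, but longer self-intersections such as LFRRR still can.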


Preference-order representation

  • Each position has two “preferences”

    • If it can’t have either of the two, it will take the “least favorite” path if possible

  • Example: {LR},{FL},{RL},{FR},{RL},{RL},{FR},{RF}

  • Can still cause bumps: {LF},{FR},{RL},{FL},{RL},{FL},{RF},{RL},{FL}
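One way to decode the preference-order representation is sketched below under the same lattice conventions (first bond along +x, L/R turn left/right, F goes straight). At each step the decoder tries the first preference, then the second, then the remaining “least favorite” direction. If all three cells are occupied, the chain is forced into a bump. Taking the first preference in that case is an assumption of this sketch, and the function name is illustrative.

```python
HEADINGS = [(1, 0), (0, 1), (-1, 0), (0, -1)]  # E, N, W, S (counterclockwise)
TURN = {"L": 1, "F": 0, "R": -1}

def decode_preferences(prefs):
    """prefs: two-letter strings such as "LR". Returns (coords, number_of_bumps)."""
    coords, heading = [(0, 0), (1, 0)], 0  # first bond fixed along +x
    occupied = set(coords)
    bumps = 0
    for first, second in prefs:
        least = ({"L", "F", "R"} - {first, second}).pop()  # least favorite move
        x, y = coords[-1]
        choice = None
        for m in (first, second, least):
            dx, dy = HEADINGS[(heading + TURN[m]) % 4]
            if (x + dx, y + dy) not in occupied:
                choice = m
                break
        if choice is None:  # all three neighbors taken: forced bump
            choice = first
            bumps += 1
        heading = (heading + TURN[choice]) % 4
        dx, dy = HEADINGS[heading]
        coords.append((x + dx, y + dy))
        occupied.add(coords[-1])
    return coords, bumps

ok = ["LR", "FL", "RL", "FR", "RL", "RL", "FR", "RF"]
bad = ["LF", "FR", "RL", "FL", "RL", "FL", "RF", "RL", "FL"]
print(decode_preferences(ok)[1], decode_preferences(bad)[1])  # 0 1
```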


More Realistic Models

  • Higher resolution lattices (45° lattice, etc.)

  • Off-lattice models

    • Local moves

    • Optimization/search methods and φ/ψ representations

      • Greedy search

      • Branch and bound

      • EC, Monte Carlo, simulated annealing, etc.


The Other Half of the Picture

  • Now that we have a more realistic off-lattice model, we need a better energy function to evaluate a conformation (fold).

  • Theoretical force field:

    G = Gvan der Waals + Gh-bonds + Gsolvent + Gcoulomb

  • Empirical force fields

    • Start with a database

    • Look at neighboring residues – similar to known protein folds?
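Two of the theoretical terms, van der Waals (modeled here as a Lennard-Jones 12-6 potential) and Coulomb electrostatics, are simple pairwise sums. A minimal sketch, assuming atoms given as (x, y, z, charge) and a single hypothetical ε, σ and dielectric constant for every pair. Real force fields use per-atom-type parameters, exclude covalently bonded neighbors, and add H-bond and solvation terms, all of which are omitted here. The function names are illustrative.

```python
import math

COULOMB_K = 332.06  # converts e^2/Å to kcal/mol

def lennard_jones(r, epsilon, sigma):
    """12-6 van der Waals energy; zero at r = sigma, minimum -epsilon at 2**(1/6) * sigma."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def coulomb(q1, q2, r, dielectric):
    """Energy (kcal/mol) of two point charges (units of e) r Å apart."""
    return COULOMB_K * q1 * q2 / (dielectric * r)

def pairwise_energy(atoms, epsilon=0.1, sigma=3.4, dielectric=4.0):
    """Return (G_vdW, G_Coulomb) summed over all atom pairs (hypothetical parameters)."""
    g_vdw = g_coul = 0.0
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            *p_i, q_i = atoms[i]
            *p_j, q_j = atoms[j]
            r = math.dist(p_i, p_j)
            g_vdw += lennard_jones(r, epsilon, sigma)
            g_coul += coulomb(q_i, q_j, r, dielectric)
    return g_vdw, g_coul

# Two opposite partial charges 4 Å apart: the Coulomb term is attractive (negative).
print(pairwise_energy([(0.0, 0.0, 0.0, 0.5), (4.0, 0.0, 0.0, -0.5)]))
```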


Threading: Fold recognition

  • Given:

    • Sequence: IVACIVSTEYDVMKAAR…

    • A database of molecular coordinates

  • Map the sequence onto each fold

  • Evaluate

    • Objective 1: improve scoring function

    • Objective 2: folding
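A toy version of the map-and-evaluate loop is sketched below. It assumes each fold in the database is reduced to a string of residue environments (B = buried, E = exposed) and scores placements with a hypothetical rule: +1 for a hydrophobic residue in a buried position or a polar residue in an exposed one. The sequence is slid along each fold without gaps. Real threading methods align with gaps and use statistically derived potentials. The hydrophobic set, fold names and environments are illustrative.

```python
HYDROPHOBIC = set("AVILMFWC")  # assumption: a simple hydrophobic set

def env_score(seq, env):
    """+1 per residue whose hydrophobicity matches its environment."""
    return sum((aa in HYDROPHOBIC) == (e == "B") for aa, e in zip(seq, env))

def thread(seq, folds):
    """Best (score, fold_name, offset) over all folds and ungapped placements."""
    best = None
    for name, env in folds.items():
        for offset in range(len(env) - len(seq) + 1):
            s = env_score(seq, env[offset:offset + len(seq)])
            if best is None or s > best[0]:
                best = (s, name, offset)
    return best

folds = {"fold_a": "BBBEEEBB", "fold_b": "EEEEBBBB"}  # hypothetical environments
print(thread("VILKDE", folds))  # (6, 'fold_a', 0)
```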


Secondary Structure Prediction

AGVGTVPMTAYGNDIQYYGQVT…
A-VGIVPM-AYGQDIQY-GQVT…
AG-GIIP--AYGNELQ--GQVT…
AGVCTVPMTA---ELQYYG--T…

AGVGTVPMTAYGNDIQYYGQVT…
----hhhHHHHHHhhh--eeEE…


Secondary Structure Prediction, cont’d

  • Easier than folding

    • Current algorithms can predict secondary structure with 70-80% accuracy

  • Chou, P.Y. & Fasman, G.D. (1974). Biochemistry, 13, 211-222.

    • Based on frequencies of occurrence of residues in helices and sheets

  • Neural network based

    • Uses a multiple sequence alignment

    • Rost & Sander, Proteins, 1994 , 19, 55-72
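Prediction accuracy is commonly reported as Q3: the percentage of residues whose three-state label (helix, strand, coil) is predicted correctly. A minimal sketch, assuming H/E/- labels as in the prediction line shown earlier and treating lowercase letters the same as uppercase. The example strings are made up.

```python
def q3(predicted: str, observed: str) -> float:
    """Percentage of residues with the correct three-state (H/E/coil) label."""
    if len(predicted) != len(observed):
        raise ValueError("strings must be the same length")
    pred, obs = predicted.upper(), observed.upper()
    correct = sum(p == o for p, o in zip(pred, obs))
    return 100.0 * correct / len(obs)

print(q3("hhHH--eE", "HHHH---E"))  # 87.5: 7 of 8 residues correct
```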



Chou-Fasman Algorithm

  • Identify α-helices

    • Find 4 out of 6 contiguous amino acids that have P(a) > 100

    • Extend the region until 4 consecutive amino acids with P(a) < 100 are found

    • Compute P(a) and P(b); if the region is longer than 5 residues and P(a) > P(b), identify it as a helix

  • Repeat for β-sheets [use P(b)]

  • If an α and a β region overlap, the overlapping region is predicted according to P(a) and P(b)
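The helix rule can be sketched in Python with the Chou-Fasman propensity table (values ×100). With these values the sums for CAENKLDHV come out to the P(a) = 972 and P(b) = 843 used in the example that follows. This sketch extends the nucleus to the right only and stops at a run of 4 consecutive residues with P(a) < 100, matching the worked example. The published method also extends to the left and, as usually described, tests the average P(a) of the tetrapeptide. The function name is illustrative.

```python
# Chou-Fasman propensities (x100): residue -> (P(a), P(b), P(turn))
CF = {
    "A": (142, 83, 66),  "R": (98, 93, 95),   "N": (67, 89, 156),
    "D": (101, 54, 146), "C": (70, 119, 119), "Q": (111, 110, 98),
    "E": (151, 37, 74),  "G": (57, 75, 156),  "H": (100, 87, 95),
    "I": (108, 160, 47), "L": (121, 130, 59), "K": (114, 74, 101),
    "M": (145, 105, 60), "F": (113, 138, 60), "P": (57, 55, 152),
    "S": (77, 75, 143),  "T": (83, 119, 96),  "W": (108, 137, 96),
    "Y": (69, 147, 114), "V": (106, 170, 50),
}

def find_helix(seq, start):
    """Return (start, end, P(a), P(b)) for a helix nucleated at seq[start], else None.

    `end` is exclusive; P(a) and P(b) are summed over the region.
    """
    nucleus = seq[start:start + 6]
    # Nucleation: at least 4 of 6 contiguous residues with P(a) > 100.
    if len(nucleus) < 6 or sum(CF[r][0] > 100 for r in nucleus) < 4:
        return None
    end = start + 6
    # Extension: stop when the next 4 residues all have P(a) < 100.
    while end < len(seq):
        tetra = seq[end:end + 4]
        if len(tetra) == 4 and all(CF[r][0] < 100 for r in tetra):
            break
        end += 1
    region = seq[start:end]
    p_a = sum(CF[r][0] for r in region)
    p_b = sum(CF[r][1] for r in region)
    # Acceptance: longer than 5 residues and P(a) > P(b).
    if len(region) > 5 and p_a > p_b:
        return start, end, p_a, p_b
    return None

print(find_helix("CAENKLDHVRGPTCILFMTWYNDGP", 0))  # (0, 9, 972, 843)
```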


Chou-Fasman, cont’d

  • Identify hairpin turns:

    • $P(t) = f(i) \cdot f(i+1) \cdot f(i+2) \cdot f(i+3)$, where f(i) is the value for the residue at position i, f(i+1) for the next residue, f(i+2) for the following residue, and f(i+3) for the residue at position i+3

    • Predict a hairpin turn starting at positions where:

      • P(t) > 0.000075

      • The average P(turn) for the four residues > 100

      • P(a) < P(turn) > P(b) for the four residues

  • Accuracy ≈ 60-65%
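The last two averaged conditions can be checked directly from the propensity table. A minimal sketch: the product P(t) is passed in precomputed because the f values it is built from are not reproduced on these slides. The VRGP numbers match the example on the next slide. The function name is illustrative.

```python
# Chou-Fasman propensities (x100): residue -> (P(a), P(b), P(turn))
CF = {
    "A": (142, 83, 66),  "R": (98, 93, 95),   "N": (67, 89, 156),
    "D": (101, 54, 146), "C": (70, 119, 119), "Q": (111, 110, 98),
    "E": (151, 37, 74),  "G": (57, 75, 156),  "H": (100, 87, 95),
    "I": (108, 160, 47), "L": (121, 130, 59), "K": (114, 74, 101),
    "M": (145, 105, 60), "F": (113, 138, 60), "P": (57, 55, 152),
    "S": (77, 75, 143),  "T": (83, 119, 96),  "W": (108, 137, 96),
    "Y": (69, 147, 114), "V": (106, 170, 50),
}

def is_hairpin_turn(tetra: str, p_t: float, p_t_min: float = 0.000075) -> bool:
    """Chou-Fasman hairpin-turn test for four consecutive residues."""
    if len(tetra) != 4:
        raise ValueError("need exactly four residues")
    avg_a = sum(CF[r][0] for r in tetra) / 4
    avg_b = sum(CF[r][1] for r in tetra) / 4
    avg_t = sum(CF[r][2] for r in tetra) / 4
    # All three conditions from the slide: P(t) threshold, average P(turn) > 100,
    # and P(a) < P(turn) > P(b) for the four residues.
    return p_t > p_t_min and avg_t > 100 and avg_a < avg_t > avg_b

print(is_hairpin_turn("VRGP", 0.000085))  # True: P(turn) 113.25, P(a) 79.5, P(b) 98.25
```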


Chou-Fasman Example

  • CAENKLDHVRGPTCILFMTWYNDGP

  • CAENKL – potential helix nucleus: 4 of its 6 residues have P(a) > 100 (all but C and N)

    • Residues with P(a) < 100: RNCGPSTY

  • Extend: When we reach RGPT, we must stop

  • CAENKLDHV: P(a) = 972, P(b) = 843

  • Declare alpha helix

  • Identifying a hairpin turn

    • VRGP: P(t) = 0.000085

    • Average P(turn) = 113.25

      • Avg P(a) = 79.5, Avg P(b) = 98.25


Lots More to Come

  • Microarray analysis

  • Mass Spectrometry

  • Interactions/Knockouts

  • Synthetic Lethality

  • RPPA

  • …