Structural Bioinformatics


In this presentation… • Part 1 – Proteins & Proteomics • Part 2 – Protein Structure & Function • Part 3 – Analysis & Visualization • Part 4 – Protein Structure Prediction

Part 1 – Proteins & Proteomics

Proteins • Proteins are the fundamental building blocks of life • Enzymes are proteins that act as molecular machines, catalysing almost all of the chemical transformations that cells are capable of • Structures that are not made of protein are produced by enzymes (which are proteins) • A human contains on the order of 100,000 different proteins • Proteins vary in length and shape

Structural types and conceptual models • Globular proteins are soluble in predominantly aqueous solvents, such as the cytosol and extracellular fluids, whereas integral membrane proteins exist within the lipid-dominated environment of biological membranes • Conceptual models of protein structure are valuable aids to understanding protein bioinformatics

Globular proteins • The linear amino acid polymer forms a 3D structure by folding into a globular compact shape • Globular proteins tend to be soluble in aqueous solvents and folding is dominated by the hydrophobic effect, which directs hydrophobic amino acid side-chains to the structural core of the protein, away from the solvent

Secondary structure • Globular proteins usually contain elements of regular secondary structure, including α-helices and β-strands • These are stabilized by hydrogen bonding and contribute most of the amino acids to globular protein cores • Residues in regular secondary structures are given the symbol H, meaning helix, or E (or B), meaning extended or β-strand

C atoms of consecutive amino acid residues 0.54 nm (3.6 amino acid residues per turn) Position of polypeptide backbone consisting of C and peptide bond C-N atoms 0.15 nm (100° rotation per residue) Folding of polypeptide chain into an  helix Cross-sectional view of an  helix showing the positions of the side-chains (R groups) of the amino acids on the outside of the helix

H H H H H O H H O R R N N N C C C C C C N C C N C C R H H H R5 R1 R3 O O O R2 R4 R R R R R Amino acid side-chains In the  helix the CO group of residue n is hydrogen bounded to the NH group on residue (n+4) Hydrogen bond Cross-sectional view of an  helix showing the positions of the side-chains (R groups) of the amino acids on the outside of the helix
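The helix parameters above are linked by simple arithmetic: 360° divided by 3.6 residues per turn gives 100° per residue, and 3.6 residues × 0.15 nm rise gives the 0.54 nm pitch. The minimal Python sketch below, assuming an ideal, unbroken helix (the function names are illustrative), reproduces these numbers and lists the backbone hydrogen-bond partners implied by the n to n+4 rule.

```python
def helix_geometry(residues_per_turn: float = 3.6, rise_nm: float = 0.15):
    """Return (rotation per residue in degrees, pitch in nm) for an ideal helix."""
    rotation_deg = 360.0 / residues_per_turn   # 360 / 3.6 = 100 degrees
    pitch_nm = residues_per_turn * rise_nm     # 3.6 * 0.15 = 0.54 nm
    return rotation_deg, pitch_nm


def helix_hbond_pairs(start: int, end: int):
    """List (CO residue, NH residue) pairs for an ideal alpha helix from start to end.

    The C=O of residue n accepts a hydrogen bond from the N-H of residue n+4,
    so only pairs where both residues lie inside [start, end] are returned.
    """
    return [(n, n + 4) for n in range(start, end - 3)]


rot, pitch = helix_geometry()
print(round(rot, 1), round(pitch, 2))   # 100.0 0.54
print(helix_hbond_pairs(1, 10))         # [(1, 5), (2, 6), (3, 7), (4, 8), (5, 9), (6, 10)]
```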

Tertiary structure • It is the full 3D atomic structure of a single polypeptide chain • It can be viewed as the packing together of secondary structure elements, which are connected by irregular loops that lie predominantly on the protein surface • Loop residues are given the symbol C (coil) to distinguish them from residues in helices or strands
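Secondary structure assignment programs such as DSSP use more than three symbols; the H/E/C alphabet used here is usually obtained by merging them. A minimal sketch, assuming DSSP-style eight-state input and one common merging scheme (other schemes, for example treating G as coil, are also in use):

```python
from collections import Counter

# Assumed mapping from DSSP-style 8-state codes to the 3 states H, E, C.
EIGHT_TO_THREE = {
    "H": "H", "G": "H", "I": "H",   # alpha, 3-10 and pi helices -> helix
    "E": "E", "B": "E",             # extended strand and isolated bridge -> strand
}


def to_three_state(dssp: str) -> str:
    """Reduce an 8-state string to H/E/C; any unlisted code (T, S, blank, '-') becomes C."""
    return "".join(EIGHT_TO_THREE.get(code, "C") for code in dssp)


def composition(three_state: str) -> dict:
    """Fraction of residues in each of the states H, E and C."""
    n = len(three_state)
    if n == 0:
        return {state: 0.0 for state in "HEC"}
    counts = Counter(three_state)
    return {state: counts[state] / n for state in "HEC"}


ss = to_three_state("  HHHHGGG TT EEEEB S ")
print(ss)               # CCHHHHHHHCCCCEEEEECCC
print(composition(ss))
```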

Tertiary Structures

Quaternary structures • Several polypeptide chains, each folded into its own tertiary structure, may pack together to form the biologically functional quaternary structure

Quaternary Structures

Integral membrane proteins • These exist within biological lipid membranes and obey different structural principles compared with globular proteins • They contain runs of generally hydrophobic amino acids, associated with membrane-spanning segments (often but not exclusively helices), connected by more hydrophilic loops that lie in aqueous environments outside the membrane • Membrane proteins are very important components of cellular signaling and transport systems
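The runs of hydrophobic residues described above can be located with a sliding-window hydropathy profile. A minimal sketch, assuming the Kyte-Doolittle hydropathy scale, a 19-residue window and a mean-hydropathy cutoff of 1.6 (values commonly quoted for this scale, not taken from this text); the example sequence is made up, and the scale only covers the 20 standard amino acids.

```python
# Kyte-Doolittle hydropathy values (assumed scale; positive = hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list:
    """Mean hydropathy of every full window; entry i covers seq[i:i + window]."""
    values = [KD[aa] for aa in seq.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def candidate_tm_segments(seq: str, window: int = 19, cutoff: float = 1.6) -> list:
    """Merge consecutive above-cutoff windows into 0-based (start, end) spans, end exclusive."""
    profile = hydropathy_profile(seq, window)
    segments, start = [], None
    for i, score in enumerate(profile):
        if score > cutoff and start is None:
            start = i                                  # first hydrophobic window of a run
        elif score <= cutoff and start is not None:
            segments.append((start, i - 1 + window))   # last good window ended here
            start = None
    if start is not None:
        segments.append((start, len(profile) - 1 + window))
    return segments


# Made-up sequence: a stretch of leucines flanked by charged residues.
seq = "DEKRS" * 4 + "L" * 22 + "DEKRS" * 4
print(candidate_tm_segments(seq))
```

A window of about 19 residues is used because a helix of that length roughly spans the hydrophobic core of a lipid bilayer; shorter windows give noisier profiles.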

Domains • Proteins tend to have a modular architecture, and many contain several domains, often of mixed types; for example, a single protein may combine integral membrane and globular domains

Evolution • In globular proteins, surface residues in loops evolve (change) more quickly than residues in the hydrophobic core • In integral membrane proteins, the most slowly evolving residues are those in the membrane-spanning regions

Protein structure prediction • Identifying all of the proteins in a human is one thing, but to truly understand a protein’s function scientists must discern its shape and structure • The structural genomics initiative calls for the use of quasi-automated x-ray crystallography to study normal and abnormal proteins • Conventional structural biology is based on purifying a molecule, coaxing it to grow into crystals and then bombarding the crystals with x-rays. The x-rays are scattered by the molecule’s atoms, producing a diffraction pattern that can be interpreted to yield the molecule’s overall 3D shape • A structural genomics initiative would depend on scaling up and speeding up the current techniques

By figuring out which of the unknown proteins associate with previously identified ones, the CuraGen and University of Washington scientists were able to sort them into functional categories, such as energy generation, DNA repair and aging • Even though yeast is an excellent prototype, Drosophila is more suitable for studying a multicellular organism

Other methods for protein function prediction • Another method for studying proteomes is called “guilt by association”: learning about the function of a protein by assessing whether it interacts with another protein whose role in the cell is known • A group led by Stanley Fields of the University of Washington reported that they had deduced 957 interactions among 1,004 proteins in baker’s yeast [S. cerevisiae]
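The “guilt by association” idea can be sketched as a majority vote over a protein's interaction partners. In this minimal Python example the protein names, categories and interactions are all hypothetical; real analyses also weigh interaction reliability and look beyond direct partners.

```python
from collections import Counter


def predict_function(protein, interactions, known_functions):
    """Guess a function for `protein` by majority vote over its annotated partners.

    interactions: iterable of (a, b) pairs, treated as undirected edges.
    known_functions: dict mapping protein name -> functional category.
    Returns the most common partner category, or None if no partner is annotated.
    Ties are broken arbitrarily in this sketch.
    """
    partners = set()
    for a, b in interactions:
        if a == protein:
            partners.add(b)
        elif b == protein:
            partners.add(a)
    votes = Counter(known_functions[p] for p in partners if p in known_functions)
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Hypothetical interaction network (names and categories are illustrative only).
edges = [("unknown1", "repairA"), ("unknown1", "repairB"),
         ("unknown1", "energyC"), ("repairA", "repairB")]
annotations = {"repairA": "DNA repair", "repairB": "DNA repair",
               "energyC": "energy generation"}
print(predict_function("unknown1", edges, annotations))  # DNA repair
```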

A machine devised by Hochstrasser and his research group goes one step further than the robots. It would automatically extract the protein spots from the gels, use enzymes to chop the proteins into bits, feed the pieces into a laser mass spectrometer and transfer the information to a computer for analysis • With or without robotic arms, 2-D gels have their problems. Besides being tricky to make, they do not resolve highly charged or low mass proteins very well • They also do a poor job of resolving proteins with hydrophobic regions, such as those that span the cell membrane. This is a major limitation, because membrane-spanning receptors are important drug targets
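The mass-spectrometry step works because an enzyme such as trypsin cuts proteins at predictable sites, so measured peptide masses can be compared with masses predicted from candidate sequences (peptide mass fingerprinting). A minimal sketch, assuming the simple textbook trypsin rule (cut after K or R, but not before P) and standard monoisotopic residue masses; the sequence is made up, and real instruments report mass-to-charge ratios of ions rather than the neutral masses computed here.

```python
# Monoisotopic residue masses in daltons (standard values, assumed here).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free N- and C-termini


def trypsin_digest(seq: str) -> list:
    """Cut after K or R unless the next residue is P (simplified trypsin rule)."""
    peptides, current = [], ""
    for i, aa in enumerate(seq):
        current += aa
        next_aa = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(current)
            current = ""
    if current:
        peptides.append(current)
    return peptides


def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic mass: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


for pep in trypsin_digest("MAKGLRPSEKW"):   # made-up sequence
    print(pep, round(peptide_mass(pep), 4))
```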

Fields and his colleagues first devised a widely used method for studying protein interactions called the yeast two-hybrid system, which uses known protein “baits” to find “prey” proteins that bind to them • Another recently available way to study proteins involves so-called protein chips. Ciphergen Biosystems, a biotechnology company in Palo Alto, is selling a range of strips for isolating proteins according to various properties, such as whether they dissolve in water or bind to charged metal atoms. The strips can then be placed in a chip reader, which includes a mass spectrometer, for identifying the proteins

What’s new • Knowing the exact structural form of each of the proteins in the human proteome should, in theory, help drug designers devise chemicals that fit slots on the proteins and either activate them or prevent them from interacting • Such efforts, generally known as rational drug design, have not shown widespread success so far, but only roughly one percent of all human proteins have had their structures determined • After scientists catalogue the human proteome, it will be the proteins, not the genes, that will be all the rage

Part 2 – Protein Structure & Function

Structure and function • Proteins rely upon the shapes and properties of key functional areas of their 3D structures to carry out biological functions • Knowledge of protein structure is the key to understanding protein function and this is one reason for its importance in bioinformatics

[Figure: structures labelled MUTZM and WTZM]

Structural and functional constraints • Evolution accepts changes to amino acid residues in proteins where they have a neutral or advantageous effect on protein structural stability or protein function • Residues can be conserved for structural or functional reasons • Amino acids are conserved where they are uniquely able to fulfill particular structural roles • This often occurs with cysteine (able to form disulfide bonds), glycine (its missing side-chain allows unusual backbone conformations) and proline (its cyclic side-chain rigidly constrains the backbone)

[Figure: example structures labelled RPIP, TOLC, FGF1 and XRCC4]

Evolution of the overall protein fold • If two naturally occurring protein sequences can be aligned to show more than 25 percent identity over an alignment of 80 or more residues, then they will almost certainly share the same basic structure • The Sander-Schneider formula gives the higher percentage identities needed to infer structural similarity from shorter alignments
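The Sander-Schneider threshold can be written as a simple function of alignment length L. A minimal sketch, assuming the form usually quoted for the original curve, t(L) = 290.15 × L^-0.562 percent identity for alignments of 10 to 79 residues, with the 25 percent rule from the text for 80 residues or more; the constants are recalled rather than taken from this text and should be checked against the original publication before use.

```python
def identity_threshold(length: int) -> float:
    """Percent identity above which two aligned sequences are expected to share a fold.

    Assumed form: 290.15 * L**-0.562 for 10 <= L < 80, and the flat 25 % rule
    for L >= 80. Alignments shorter than 10 residues are treated as uninformative.
    """
    if length < 10:
        return 100.0
    if length < 80:
        return 290.15 * length ** -0.562
    return 25.0


def likely_same_fold(percent_identity: float, length: int) -> bool:
    """True if the observed identity exceeds the length-dependent threshold."""
    return percent_identity > identity_threshold(length)


for L in (20, 40, 79, 150):
    print(L, round(identity_threshold(L), 1))
```

The curve rises steeply for short alignments because a high identity over a few residues can easily arise by chance.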

Conservation of structure • Protein structures tend to be conserved even when evolution has changed the sequence almost beyond recognition • Structural knowledge is therefore a key factor in understanding protein evolution

Evolution of function • While structure tends to be conserved by evolution, function is observed to change • There are many examples of proteins whose sequence and structure are very similar, but which have different functions • When function has changed, key functional residues change as well, and this is often clear in multiple sequence alignments

Multiple sequence alignment • Understanding how structures evolve can help us understand multiple sequence alignments • Key structural and functional residues are often observed to be conserved • Insertions and deletions occur preferentially in hydrophilic surface loops rather than in regular secondary structure elements • Loops are also subject to faster mutational change • Conservation of hydrophobic core residues in secondary structure elements is also common, as are conservation patterns associated with amphipathic helices (helices with one hydrophobic and one hydrophilic face)
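Conservation in a multiple sequence alignment is often summarized column by column. A minimal sketch, assuming Shannon entropy of the residue distribution as the conservation score (one simple choice among many; it ignores sequence weighting and amino acid similarity) and reporting the fraction of gaps separately, since insertions and deletions concentrate in loops. The short alignment is made up.

```python
import math
from collections import Counter


def column_stats(alignment: list) -> list:
    """For each alignment column return (Shannon entropy in bits, gap fraction).

    Entropy is computed over non-gap residues only: 0 means fully conserved,
    larger values mean more variable. Gaps ('-') are counted separately.
    """
    n_seqs = len(alignment)
    stats = []
    for column in zip(*alignment):
        residues = [aa for aa in column if aa != "-"]
        gap_fraction = 1 - len(residues) / n_seqs
        total = len(residues)
        if total == 0:
            stats.append((0.0, gap_fraction))
            continue
        counts = Counter(residues)
        entropy = sum(-(c / total) * math.log2(c / total) for c in counts.values())
        stats.append((entropy, gap_fraction))
    return stats


aln = ["ACDGL", "ACEG-", "ACDA-", "SCDGL"]   # made-up alignment
for pos, (h, gaps) in enumerate(column_stats(aln), start=1):
    print(f"column {pos}: entropy {h:.2f} bits, gaps {gaps:.0%}")
```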

Part 3 – Analysis & Visualization

Software, data and WWW sites • A large variety of software for structure visualization, alignment and analysis is available on the WWW • All published protein structures are submitted to a public database, the Protein Data Bank (PDB). Database searches and downloads can be performed at various WWW sites • RasMol, Chime and Cn3D are commonly used programs for viewing structural data
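Structures downloaded from the Protein Data Bank are commonly distributed in the fixed-column PDB format, in which each atom is written as an ATOM record. A minimal sketch that extracts α-carbon coordinates, assuming the standard column layout of ATOM records; alternate locations and insertion codes are not handled, and the `atom_line` helper and its coordinates are hypothetical, used only to build demo input.

```python
def read_ca_atoms(pdb_text: str, chain=None) -> list:
    """Extract alpha-carbon ATOM records from PDB-format text.

    Column positions follow the fixed ATOM layout (1-based columns in the
    format description, 0-based slices here). HETATM records are skipped.
    """
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        if line[12:16].strip() != "CA":        # atom name, columns 13-16
            continue
        chain_id = line[21]                    # chain identifier, column 22
        if chain is not None and chain_id != chain:
            continue
        atoms.append({
            "res_name": line[17:20].strip(),   # columns 18-20
            "chain": chain_id,
            "res_seq": int(line[22:26]),       # columns 23-26
            "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        })
    return atoms


def atom_line(serial, name, res, chain, seq, x, y, z):
    """Hypothetical helper: format a minimal ATOM record (columns 1-54) for the demo."""
    return (f"ATOM  {serial:5d} {name:^4s} {res:3s} {chain}{seq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


demo = "\n".join([
    atom_line(1, "N", "MET", "A", 1, 38.198, 19.582, 28.998),
    atom_line(2, "CA", "MET", "A", 1, 38.521, 18.125, 29.018),
    atom_line(10, "CA", "GLY", "A", 2, 37.100, 17.400, 30.200),
])
for atom in read_ca_atoms(demo):
    print(atom["res_name"], atom["res_seq"], atom["xyz"])
```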

Structural and functional analysis of structures • There is an enormous amount of software available for structural data analysis, and also several WWW sites holding pre-prepared analyses • Functional sites in protein structures typically contain a few residues in defined spatial positions • Software and databases have been developed to locate and search for similarity in such sites
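Searching for a functional site defined by a few residues in fixed spatial positions can be sketched as a brute-force match against a small distance template. In this minimal example the coordinates and the Ser-His-Asp template distances are hypothetical illustration values, not real catalytic-site geometry; practical tools use indexing schemes to avoid testing every combination.

```python
import math
from itertools import product


def find_motif(residues, template_types, template_dists, tol=1.0):
    """Find residue combinations matching a small 3D template.

    residues: list of (name, (x, y, z)) tuples, e.g. alpha-carbon positions.
    template_types: residue name required at each template position.
    template_dists: dict {(i, j): distance} between template positions (same units as xyz).
    Returns index tuples whose pairwise distances all lie within `tol` of the template.
    """
    candidates = [[k for k, (name, _) in enumerate(residues) if name == t]
                  for t in template_types]
    hits = []
    for combo in product(*candidates):
        if len(set(combo)) < len(combo):
            continue  # one residue cannot fill two template positions
        if all(abs(math.dist(residues[combo[i]][1], residues[combo[j]][1]) - d) <= tol
               for (i, j), d in template_dists.items()):
            hits.append(combo)
    return hits


# Hypothetical coordinates and template (illustrative values only).
residues = [("SER", (0.0, 0.0, 0.0)), ("HIS", (6.0, 0.0, 0.0)),
            ("ASP", (6.0, 7.0, 0.0)), ("HIS", (20.0, 0.0, 0.0))]
template = {(0, 1): 6.0, (1, 2): 7.0, (0, 2): 9.2}
print(find_motif(residues, ["SER", "HIS", "ASP"], template))  # [(0, 1, 2)]
```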

Structural alignment • It can be very difficult to find correct, biologically meaningful alignments of very distantly related protein sequences because they contain only a very small proportion of identical residues • In such cases, structural information can help because evolution tends to change structure less than sequence • Superimposing the backbones of similar structures identifies structurally equivalent residues, and this process is known as structural alignment

Structural similarity • Structural alignment methods often produce a measure of structural similarity • The most common of these is the RMSD, which is reported by most programs • This is the root mean square deviation in position between the α-carbon atoms of aligned residues in optimal structural superposition
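RMSD after optimal superposition is commonly computed with the Kabsch algorithm: centre both coordinate sets, find the best rotation from a singular value decomposition, then average the squared deviations. A minimal sketch using NumPy, assuming the two sets of α-carbon coordinates are already paired residue by residue (the structural alignment step itself is not shown); the coordinates are made up.

```python
import numpy as np


def kabsch_rmsd(P, Q) -> float:
    """RMSD between two paired (N, 3) coordinate sets after optimal superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    Pc = P - P.mean(axis=0)                 # remove translation
    Qc = Q - Q.mean(axis=0)
    H = Pc.T @ Qc                           # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    D = np.diag([1.0, 1.0, d])              # forbid reflections (mirror images)
    R = Vt.T @ D @ U.T                      # rotation taking Pc onto Qc
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


coords = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0],
                   [1.5, 1.5, 0.0], [3.0, 1.5, 1.0]])
Rz = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])  # 90 degrees about z
moved = coords @ Rz.T + np.array([10.0, -5.0, 2.0])
print(round(kabsch_rmsd(coords, moved), 6))  # 0.0
```

The reflection check matters because a mirror image of a protein is not the same structure, so only proper rotations are allowed.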

Why classify protein structures?… • Classification groups together proteins with similar structures and common evolutionary origins • Examples • CATH, available at http://www.biochem.ucl.ac.uk/bsm/cath • SCOP, available at http://scop.mrc-lmb.cam.ac.uk/scop

Structural classes • Proteins can be assigned to broad structural classes based on secondary structure content and other criteria • CATH has four such broad classes, but SCOP uses more, giving a more detailed description of structural class
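Assigning a broad class from secondary structure content can be sketched as a few threshold tests on the fractions of helix and strand residues. The class names below are CATH's four classes; the threshold values are hypothetical, and the real classifications use other criteria as well as composition.

```python
def broad_class(helix_frac: float, strand_frac: float,
                major: float = 0.3, minor: float = 0.05) -> str:
    """Assign a CATH-style broad class from secondary structure fractions.

    `major` and `minor` are hypothetical cut-offs for illustration only.
    """
    has_helix = helix_frac >= major
    has_strand = strand_frac >= major
    if has_helix and strand_frac < minor:
        return "Mainly Alpha"
    if has_strand and helix_frac < minor:
        return "Mainly Beta"
    if helix_frac >= minor and strand_frac >= minor and (has_helix or has_strand):
        return "Alpha Beta"
    return "Few Secondary Structures"


for h, e in [(0.60, 0.00), (0.02, 0.45), (0.35, 0.25), (0.10, 0.10)]:
    print(h, e, broad_class(h, e))
```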

Fold or topology • All classifications gather together proteins with the same overall fold or topology • Proteins in the same fold or topology class contain more or less the same secondary structure elements (SSEs), connected in the same way and in similar relative spatial positions

Homologs and analogs • Homologs (homologous proteins) are related by divergent evolution from a common ancestor, and have the same fold • Analogs (analogous proteins) have the same fold, but other evidence for common ancestry is weak

Super-folds • Super-folds are protein folds that seem likely to have arisen more than once in evolution • They are thought to have advantageous physico-chemical properties • They appear in SCOP and CATH as fold or topology levels containing several distinct homologous superfamilies • Examples are the TIM barrel and the immunoglobulin fold • They tend to exhibit approximate symmetries and are built from repeated super-secondary structures

Part 4 – Protein Structure Prediction

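Why predict structure?… • Structure prediction is interesting because experimental structure determination is still much slower than sequence determination • Structure predictions help us to understand function and mechanism and can be used for rational drug design • The early work of Anfinsen, showing that a protein’s amino acid sequence determines its native structure, and of Levinthal, showing that a protein cannot find that structure by exhaustively searching all possible conformations, made structure prediction a fascinating scientific problem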
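Levinthal's argument is essentially arithmetic: if each residue can adopt only a few backbone conformations, the number of conformations grows exponentially with chain length, far too many to search one by one in the time a real protein takes to fold. A minimal sketch, assuming 3 states per residue and 10^13 conformations sampled per second (illustrative values, not figures from this text):

```python
def levinthal_search_time(n_residues: int, states_per_residue: int = 3,
                          samples_per_second: float = 1e13) -> tuple:
    """Return (number of conformations, years needed to visit them all one by one).

    Both the states per residue and the sampling rate are illustrative assumptions.
    """
    conformations = states_per_residue ** n_residues      # exact integer
    seconds = conformations / samples_per_second
    years = seconds / (365.25 * 24 * 3600)
    return conformations, years


n_conf, years = levinthal_search_time(100)
print(f"{n_conf:.2e} conformations, about {years:.1e} years")
# 5.15e+47 conformations, about 1.6e+27 years
```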

Structure prediction methods • Comparative modeling • Secondary structure prediction • Fold recognition • Ab initio prediction • Transmembrane segment prediction
