
G53BIO – Bioinformatics

G53BIO – Bioinformatics. Introduction to Proteins Dr. Jaume Bacardit - jqb@cs.nott.ac.uk Prof. Natalio Krasnogor – nxk@cs.nott.ac.uk.


Presentation Transcript


  1. G53BIO – Bioinformatics Introduction to Proteins Dr. Jaume Bacardit - jqb@cs.nott.ac.uk Prof. Natalio Krasnogor – nxk@cs.nott.ac.uk Some material taken from: Arthur Lesk, Introduction to Bioinformatics, 2nd edition, Oxford University Press, 2005; and Livingstone, C.D., Barton, G.J.: Protein sequence alignments: a strategy for the hierarchical analysis of residue conservation. Computer Applications in the Biosciences 9 (1993) 745-756.

  2. Today’s Lecture • Quick Intro to Proteins • How Proteins are made • Protein Structure • Protein Folding • Open questions

  3. Crucial molecules for the functioning of life • Structural proteins: the organism's basic building blocks, e.g. collagen and the keratin in nails and hair. • Enzymes: biological engines that mediate a multitude of biochemical reactions. Enzymes are usually very specific and catalyze only a single type of reaction, but they can play a role in more than one pathway. • Transmembrane proteins: the cell’s housekeepers, e.g. regulating cell volume, extracting and concentrating small molecules from the extracellular environment, and generating the ionic gradients essential for muscle and nerve cell function (the sodium/potassium pump is an example). • Proteins are polypeptide chains, built by joining amino acids in a linear way through peptide bonds • The chain of amino acids, however, folds to create very complex 3D structures

  4. How Are Proteins Made?(Translation)

  5. Uncovering the code • Scientists conjectured that proteins came from DNA; but how did DNA code for proteins? • If one nucleotide coded for one amino acid, there could be only $4^1 = 4$ amino acids • However, there are 20 amino acids, so at least 3 bases are needed to code for one amino acid, since $4^2 = 16 < 20$ and $4^3 = 64 \geq 20$ • This triplet of bases is called a “codon” • 64 different codons and only 20 amino acids means that the coding is degenerate: more than one codon can code for the same amino acid
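
The counting argument on this slide can be checked with a few lines of Python. This is only a sketch of the arithmetic; the function name `min_codon_length` is made up for illustration.

```python
def min_codon_length(n_amino_acids: int, alphabet_size: int = 4) -> int:
    """Smallest n such that alphabet_size**n >= n_amino_acids."""
    n = 1
    while alphabet_size ** n < n_amino_acids:
        n += 1
    return n


# Number of distinct "words" of length 1, 2 and 3 over the 4 bases.
for n in (1, 2, 3):
    print(n, 4 ** n)

# Codon length needed to encode 20 amino acids with 4 bases.
print(min_codon_length(20))
```

With 64 possible triplets and only 20 amino acids (plus stop signals), several codons necessarily map to the same amino acid, which is the degeneracy mentioned above.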

  6. Translation (second step from the Central Dogma) • The process of going from RNA to polypeptide. • Three consecutive bases of mRNA (called a codon) correspond to one amino acid based on a fixed table. • Translation always starts with methionine (the AUG start codon) and ends at a stop codon
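
As a rough illustration of the fixed table, the sketch below builds the standard genetic code and reads an mRNA string three bases at a time, starting at the first AUG and stopping at the first stop codon. The function name `translate` and the example mRNA are made up for illustration; real translation initiation is more involved than "find the first AUG".

```python
from itertools import product

# Standard genetic code, with codons enumerated in U, C, A, G order
# (first base varies slowest, third base fastest). "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate from the first AUG up to (not including) the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon found
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical mRNA fragment: GG | AUG AAA UAU UAA | GG
print(translate("GGAUGAAAUAUUAAGG"))
```

The table has 64 entries, of which 3 are stop codons, so 61 codons are shared among 20 amino acids.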

  7. Translation, continued • Catalyzed by the ribosome • Using two different sites, the ribosome continually binds tRNA, joins the amino acids together and moves to the next location along the mRNA • ~10 codons/second, but multiple translations can occur simultaneously • Image: http://wong.scripps.edu/PIX/ribosome.jpg • 2009 Nobel Prize: http://en.wikipedia.org/wiki/Ada_Yonath

  8. mRNA → Ribosome: Details • mRNA leaves the nucleus via nuclear pores. • The ribosome has 3 binding sites for tRNAs: • A-site: the vacant position to which an incoming aminoacyl-tRNA binds • P-site: the site that holds the tRNA carrying the growing peptide chain, where the new peptide bond is formed. • E-site: the exit site • The two ribosomal subunits join together on an mRNA molecule near the 5’ end. • The ribosome reads along the mRNA until AUG is reached, and then the initiator tRNA binds to the P-site of the ribosome. • Stop codons are not recognized by any tRNA. Instead, release factors bind to the ribosome, which causes the peptidyl transferase to catalyze the addition of water instead of an amino acid, releasing the polypeptide.

  9. Terminology for tRNA and proteins • Anticodon: The sequence of 3 nucleotides in tRNA that recognizes an mRNA codon through complementary base pairing. • C-terminal: The end of the protein with the free carboxyl group (COOH). • N-terminal: The beginning of the protein with the free amino group (NH2, protonated to NH3+ at physiological pH).

  10. Purpose of tRNA • The proper tRNA is chosen because its anticodon is complementary to the mRNA’s codon. • The tRNA then transfers its aminoacyl group to the growing peptide chain. • For example, the tRNA with the anticodon UAC (written 3’→5’) pairs with the codon AUG and attaches the amino acid methionine to the peptide chain.
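
Codon and anticodon pairing can be sketched as a simple base-complement lookup. The helper names below are made up for illustration, and the sketch assumes strict Watson-Crick pairing, so it ignores wobble pairing at the third codon position.

```python
# Watson-Crick pairing between RNA bases.
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon_3to5(codon: str) -> str:
    """Anticodon written 3'->5', i.e. aligned base by base under the 5'->3' codon."""
    return "".join(PAIR[base] for base in codon)


def anticodon_5to3(codon: str) -> str:
    """The same anticodon written in the conventional 5'->3' direction."""
    return anticodon_3to5(codon)[::-1]


print(anticodon_3to5("AUG"))  # anticodon aligned under the codon
print(anticodon_5to3("AUG"))  # same anticodon, read 5'->3'
```

The slide's UAC is the aligned (3'→5') form for the codon AUG; the same anticodon written 5'→3' is CAU.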

  11. Protein Structure

  12. Amino Acids

  13. Backbone and side chain • All amino acids have a common part: the backbone • Each amino acid type has a different side chain • The Cα atom connects the backbone and the side chain • The first carbon atom in the side chain is called Cβ (except for Gly)

  14. Amino Acids

  15. Protein Structure: Introduction • Different amino acids have different properties • These properties will affect the protein structure and function • Hydrophobicity, for instance, is the main driving force (but not the only one) of the folding process
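
One common way to put numbers on hydrophobicity is a per-residue scale averaged over a sliding window. The sketch below uses the Kyte-Doolittle hydropathy scale, which is a choice made here for illustration and is not something the lecture specifies; the function name and window size are likewise assumptions.

```python
# Kyte-Doolittle hydropathy values (higher = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 5) -> list[float]:
    """Mean hydropathy of every window of consecutive residues."""
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# First 20 residues of the example sequence shown on the next slide.
for i, value in enumerate(hydropathy_profile("MKYNNHDKIRDFIIIEAYMF")):
    print(i, round(value, 2))
```

Stretches with high average values are candidates for buried or membrane-spanning segments, although a scale like this is only a crude proxy for the forces involved in folding.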

  16. Protein Structure: Hierarchical nature of protein structure • Primary Structure = Sequence of amino acids: MKYNNHDKIRDFIIIEAYMFRFKKKVKPEVDMTIKEFILLTYLFHQQENTLPFKKIVSDLCYKQSDLVQHIKVLVKHSYISKVRSKIDERNTYISISEEQREKIAERVTLFDQIIKQFNLADQSESQMIPKDSKEFLNLMMYTMYFKNIIKKHLTLSFVEFTILAIITSQNKNIVLLKDLIETIHHKYPQTVRALNNLKKQGYLIKERSTEDERKILIHMDDAQQDHAEQLLAQVNQLLADKDHLHLVFE • Secondary Structure (local interactions) • Tertiary Structure (global interactions)

  17. Protein Structure: Hierarchical nature of protein structure • The amino acid sequence of a protein is called its primary structure or primary sequence • The folding process of a protein involves several steps • The protein forms some patterns due to local interactions with the closest residues in the chain. These patterns are called the protein's secondary structure • Afterwards, the secondary structure motifs organise into stable patterns, called tertiary structure • Finally, proteins can be composed of several subunits or monomers, forming the quaternary structure • Other, less used, levels of this hierarchy are • Supersecondary structure (recurrent patterns of interaction between secondary structure elements close in sequence) • Domains (subunits within a protein with quasi-independent folding stability)

  18. Backbone • The polypeptide chain of a protein is joined together in a very specific way • Two dihedral angles (phi and psi) define the torsion of each amino acid in the chain • Phi is the torsion angle around the N–Cα bond and psi is the torsion angle around the Cα–C bond. http://wiki.cmbi.ru.nl/index.php/Phi-psi_angle
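
A dihedral angle is defined by four consecutive atoms. For residue i, phi uses C(i-1), N(i), Cα(i), C(i) and psi uses N(i), Cα(i), C(i), N(i+1). The following is a minimal pure-Python sketch of the geometry; the function names and the test coordinates are made up for illustration and do not come from a real structure.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3) -> float:
    """Signed dihedral angle in degrees around the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Toy coordinates: the fourth point is rotated a quarter turn around the z axis.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Applied to the backbone atoms of every residue in a chain, a function like this yields the (phi, psi) pairs that describe the backbone conformation.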

  19. Protein Structure: Hierarchical nature of protein structure • There are two main kinds of secondary structure motifs: • α helices: residues form a helix with 3.6 residues per turn and a pitch of 5.4 Å per turn • β sheets: residues lie in extended strands arranged side by side. Sheets are called parallel if all strands have the same N-to-C orientation, and antiparallel if adjacent strands have opposite orientations • Residues that do not fall into these two categories are said to be in coil state
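
Taking the 5.4 Å figure as the helix pitch (the distance travelled along the helix axis per full turn), the α-helix numbers give the rise and rotation per residue directly. The snippet is only a worked piece of arithmetic; `helix_length` is a made-up helper name.

```python
RESIDUES_PER_TURN = 3.6
PITCH_ANGSTROM = 5.4  # rise along the helix axis per full turn

rise_per_residue = PITCH_ANGSTROM / RESIDUES_PER_TURN   # about 1.5 Angstrom
rotation_per_residue = 360.0 / RESIDUES_PER_TURN        # about 100 degrees


def helix_length(n_residues: int) -> float:
    """Approximate length in Angstrom of an ideal alpha helix of n residues."""
    return n_residues * rise_per_residue


print(round(rise_per_residue, 2), round(rotation_per_residue, 1))
print(round(helix_length(10), 1))
```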

  20. Protein Structure: Hierarchical nature of protein structure • Supersecondary structure elements β hairpin β-α-β unit

  21. Protein Folding • Proteins tend to fold into the lowest free energy conformation. • Proteins begin to fold while the peptide is still being translated. • Proteins bury most of their hydrophobic residues in an interior core. • Most proteins contain the secondary structure elements α helices and β sheets. • Molecular chaperones, such as hsp60 and hsp70, work with other proteins to help fold newly synthesized proteins. • Much of the protein modification and folding occurs in the endoplasmic reticulum and mitochondria.

  22. Protein Data Bank • Proteins whose structure scientists have been able to resolve (using X-ray crystallography, NMR, etc.) are stored in the Protein Data Bank (PDB) • Each PDB entry has a four-character ID code (PDB ID) • A fifth character (A, B, C, etc.) is used to identify the chain within the PDB entry • In most cases the different chains of a PDB entry correspond to the same protein, but not always • Format of PDB files • File for the 1A68 PDB entry
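
In the legacy PDB text format, atom coordinates are stored in fixed-width ATOM records, so they are read by column position and not by splitting on whitespace. The sketch below parses one such record; the sample line and its coordinates are invented for illustration and are not taken from the 1A68 entry, and `parse_atom_line` is a hypothetical helper name.

```python
def parse_atom_line(line: str) -> dict:
    """Extract the main fields of a fixed-width PDB ATOM record."""
    return {
        "serial": int(line[6:11]),      # columns 7-11: atom serial number
        "atom": line[12:16].strip(),    # columns 13-16: atom name
        "res_name": line[17:20],        # columns 18-20: residue name
        "chain": line[21],              # column 22: chain identifier
        "res_seq": int(line[22:26]),    # columns 23-26: residue number
        "x": float(line[30:38]),        # columns 31-38
        "y": float(line[38:46]),        # columns 39-46
        "z": float(line[46:54]),        # columns 47-54
    }


# Made-up sample record (values are illustrative only).
SAMPLE = "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N"

atom = parse_atom_line(SAMPLE)
print(atom["chain"], atom["res_name"], atom["res_seq"], atom["atom"])
print(atom["x"], atom["y"], atom["z"])
```

The chain field is the extra character mentioned above: combined with the four-character entry ID it identifies one chain of one entry.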

  23. Protein Structure: Ramachandran plots • We saw that the backbone of a residue is characterised by two angles: phi and psi. • Can they take any value? • Fortunately not • This effect was studied long ago by G.N. Ramachandran • He proposed a diagram to visualize these angles (phi on the X axis, psi on the Y axis) of amino acid residues • Different types of secondary structure are clustered in different regions of the diagram
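
The clustering can be made concrete with a crude classifier that assigns a (phi, psi) pair to a rectangular region of the diagram. The box boundaries below are rough values chosen purely for illustration, not published region definitions, and `ramachandran_region` is a made-up name.

```python
def ramachandran_region(phi: float, psi: float) -> str:
    """Assign a (phi, psi) pair in degrees to a coarse region of the plot.

    The rectangular boundaries are illustrative assumptions only.
    """
    # Right-handed alpha-helix region (ideal helix is near phi=-57, psi=-47).
    if -160 <= phi <= -20 and -120 <= psi <= 50:
        return "alpha"
    # Beta-strand region (upper left corner, wrapping around psi = +/-180).
    if -180 <= phi <= -45 and (psi >= 90 or psi <= -150):
        return "beta"
    return "other"


for phi, psi in [(-57, -47), (-120, 130), (60, 45)]:
    print(phi, psi, ramachandran_region(phi, psi))
```

Real analyses use empirically derived contours, and, as the next slide notes, plots for real proteins are less clear-cut than boxes like these suggest.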

  24. Protein Structure: Ramachandran plots • In real proteins, these plots are not so clear • You can create the Ramachandran plot for any protein in PDB at http://www.fos.su.se/~pdbdna/input_Raman.html • At the right there is the plot for a set of 80 proteins

  25. Protein Structure: Classifications of protein structure • Several tertiary structure classification methods exist, for instance SCOP, CATH, and FSSP/DDD. • No method is perfect, hence www.procksi.org was proposed. • SCOP is the most widespread of them • SCOP = Structural Classification Of Proteins http://scop.mrc-lmb.cam.ac.uk/scop/ • In its 1.75 release (June 2009) it catalogs 38221 PDB entries and 110800 domains • It uses a hierarchical system to catalog the proteins, according to evolutionary origin and structural similarity • The levels of the hierarchy are: class, fold, superfamily, family, protein and species

  26. Protein Structure: Classifications of protein structure • Main classes of SCOP (first level of hierarchy) • All α proteins – proteins that have (almost) only α helices • All β proteins – proteins that have (almost) only β sheets • α+β proteins – proteins that have both α helices and (mostly) antiparallel strands, but segregated in different parts of the protein • α/β proteins – proteins that have both α helices and (mostly) parallel strands, typically forming β-α-β units • Multidomain proteins – proteins having two or more domains belonging to different classes • Membrane and cell surface proteins • Small proteins (usually dominated by metal ligands, heme and/or disulfide bridges) • Coiled coil proteins • Low resolution protein structures • Peptides • Designed proteins

  27. Protein Structure: Classifications of protein structure • SCOP classification of Flavodoxin from Clostridium beijerinckii • Class: α/β • Fold: Flavodoxin-like: 3 layers, α/β/α; parallel β-sheet of 5 strands • Superfamily: Flavoproteins • Family: Flavodoxin-related (binds FMN) • Protein: Flavodoxin • Species: Clostridium beijerinckii • PDB ID: 5ULL

  28. Protein Structure: Why is structure important? • The function of a protein depends greatly on its structure • The structure that a protein adopts is vital to its chemistry • Its structure determines which of its amino acids are exposed to carry out the protein’s function • Its structure also determines what substrates it can react with

  29. Protein Structure: Open questions • Therefore, it is clear that knowing the structure of a protein is crucial for many tasks • However, we only know the structure for a very small fraction of all the proteins that we are aware of • The UniProtKB/TrEMBL archive contains 16886838 sequences • The PDB archive of protein structures contains only 76669 structures • In the native state, proteins fold on their own as soon as they are generated, amino acid by amino acid (with a few exceptions, e.g. those that need chaperones) → can we predict this process so as to close the gap between protein sequences and their 3D structures?
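
The size of the gap follows directly from the two counts quoted on this slide. The snippet below is just that arithmetic; it treats the two archives as directly comparable, which is a simplification, since PDB contains redundant structures and not every PDB entry maps to a TrEMBL sequence.

```python
n_sequences = 16_886_838   # UniProtKB/TrEMBL sequences (figure from the slide)
n_structures = 76_669      # PDB structures (figure from the slide)

coverage = n_structures / n_sequences
print(f"{coverage:.2%} of sequences have a known structure")
print(f"about {round(n_sequences / n_structures)} sequences per structure")
```

Under these figures, fewer than one sequence in two hundred has an experimentally determined structure.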

  30. Protein Structure: Open questions • Protein Structure Prediction (PSP) aims to predict the 3D structure of a protein based on its primary sequence

  31. Protein Structure: Open questions • Another open question is Protein Structure Comparison (PSC) • PSC aims at • Assessing the degree of similarity between two protein structures • Given an unknown protein structure, can you identify from PDB the protein with the most similar structure?
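
One of the simplest similarity measures between two structures is the root-mean-square deviation (RMSD) between corresponding atoms. The sketch below assumes the two coordinate lists already have a one-to-one residue correspondence and have already been superimposed; finding that correspondence and superposition is the hard part of PSC and is not attempted here. The function name and toy coordinates are made up for illustration.

```python
import math


def rmsd(coords_a, coords_b) -> float:
    """RMSD between two equal-length lists of (x, y, z) coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy example: the second "structure" is the first one shifted by 1 along x.
a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
b = [(x + 1.0, y, z) for x, y, z in a]
print(rmsd(a, a), rmsd(a, b))
```

RMSD is only one of many possible similarity measures, and different measures can rank the same pair of structures differently.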

  32. Protein Structure: Open questions • Protein function • We also know the exact function of very few proteins • Can we infer the function of an unknown protein based on • Sequence alignment? • Structure comparison? • Can we predict for a protein, from its sequence or structure • Its functionally important residues? • Binding sites?

  33. Protein Structure: Open questions • Protein design • When modifying known proteins, or designing proteins de novo • How can we know that a mutation will not affect the structure/function? • How can we make sure that the protein will have the structure that we are interested in generating? • All the topics mentioned in this lecture (structure prediction & comparison, function prediction) can be applied to protein design

  34. Central Dogma of Biology: A Bioinformatics Perspective • The information for making proteins is stored in DNA. • There is a process (transcription and translation) by which the information in DNA is used to make protein. • By understanding this process and how it is regulated we can make predictions and models of cells. • Computational problems: Assembly, Gene Finding, Sequence Analysis, Protein Sequence/Structure Analysis

  35. Questions?
