Advancements in Protein Crystallization Techniques at the Derewenda Lab, University of Virginia

Several SER Structures, Strategies, Surfaces, and Such. The Derewenda Lab University of Virginia Earth Day, 2008. Sponsored by the letter S.

Proteins crystallized in our group by the surface engineering approach, with solved crystal structures (as of March 2008)
• The RGSL domain of PDZRhoGEF (Longenecker, K.L. et al. & Derewenda, Z.S. Structure, 2001, 9:559-69)
• The LcrV antigen of the plague-causing bacterium Yersinia pestis (Derewenda, U. et al. & Waugh, D.S. Structure, 2001, 9:559-69)
• Product of the YkoF B. subtilis gene (Devedjiev, Y. et al. & Derewenda, Z.S. J. Mol. Biol. 2004, 343:395-406)
• Product of the YdeN B. subtilis gene (Janda, I. et al. & Derewenda, Z.S. Acta Cryst. 2004, D60:1101-1107)
• Product of the Hsp33 B. subtilis gene (Janda, I. et al. & Derewenda, Z.S. Structure 2004, 12:1901-1907)
• Product of the YkuD B. subtilis gene (Bielnicki, J. et al. & Derewenda, Z.S. Proteins, 2006, 62:144-51)
• The Ohr protein of B. subtilis (Cooper, D. et al. & Derewenda, Z.S. Acta Cryst. 2007, D63:1269-1273)
• The N-DCX domain of human doublecortin (Cierpicki, T. et al. & Derewenda, Z.S. Proteins, 2006, 64:874-882)
• The p23-like domain of the human nuclear migration NudC protein (Zheng, M. et al. & Derewenda, Z.S. in preparation)
• APC 1446 Bacillus subtilis (Derewenda, U. et al. & Derewenda, Z.S. in preparation)
• DinB Bacillus subtilis (Cooper, D.R. et al. & Derewenda, Z.S. in preparation)
• Tm0439 – VanR family transcription factor (Zheng, M. et al. & Derewenda, Z.S. in preparation)
• Tm1865 – endonuclease V (Utepbergenov, D. et al. & Derewenda, Z.S. in preparation)
• Tm0260 – phosphate transport regulator (Zheng, M. et al. & Derewenda, Z.S. in preparation)
• Tm1382 – NUDIX hydrolase (possible MutT family member) (Choi, W.C. et al. & Derewenda, Z.S. in preparation)

Publications by other groups reporting crystallization of novel proteins (green), or preparation of higher-quality crystal forms (red) of proteins previously crystallized, by the SER method (as of March 2008)
• The CUE:ubiquitin complex (Prag, G. et al. & Hurley, J.H. Cell, 2003, 113:609-20)
• Unactivated insulin-like growth factor-1 receptor kinase (Munshi, S. et al. & Kuo, L.C. Acta Cryst. 2003, D59:1725-1730)
• Human choline acetyltransferase (Kim, A-R. et al. & Shilton, B.H. Acta Cryst. 2005, D61:1306-1310)
• Activated factor XI in complex with benzamidine (Jin, L. et al. & Strickler, J.E. Acta Cryst. 2005, D61:1418-1425)
• Axon guidance protein MICAL (Nadella, M. et al. & Amzel, M.L. PNAS, 2005, 102:16830-16835)
• Functionally intact Hsc70 chaperone (Jiang, J. et al. & Sousa, R. Molecular Cell, 2005, 20:513-524)
• EscJ protein from the Type III secretion system (Yip, C.K. et al. & Strynadka, N.C.J. Nature, 435:702-707)
• L-rhamnulose kinase from E. coli (Grueninger, D. & Schultz, G.E. J. Mol. Biol., 2006, 359:787-797)
• T4 vertex gp24 protein (Boeshans, K.M. et al. & Ahvazi, B. Protein Expr. Purif., 2006, 49:235-243)
• Borrelia burgdorferi outer surface protein A (Makabe, K. et al. & Koide, S. Protein Science, 2006, 15:1907-1914)
• SH2 domain from the SH2-B murine adapter protein (Hu, J. & Hubbard, S.R. J. Mol. Biol., 2006, 361:69-79)
• Mycoplasma arthritidis-derived mitogen (Guo, Y. et al. & Li, H. J. Acta Cryst. 2006, F62:238-241)
• KChIP1 – Kv4.3 T1 complex (Pioletti, M. et al. & Minor, D.L. Nat. Struct. Mol. Biol., 2006, 13:988-995)
• Kinase domain of serum and glucocorticoid-regulated kinase 1 in complex with AMP-PNP (R126A) (Zhao, B. et al. & Schackenberg, C.G. Protein Science, 2007, 16:2761-2769)
• Human IL-7 bound to unglycosylated and glycosylated forms of its α-receptor (Wickham, J. Jr. & Walsh, S.T.R. Acta Cryst. 2007, F63:865-869)
• Human cyclin B1 (C167S, C283S, C350S, E183A, E184A) (Petri, E.T. et al. & Basavappa, R. Cell Cycle, 2007, 6:1342-1349)
• Candida boidinii formate dehydrogenase (Schirwitz, K., Schmidt, A. & Lamzin, V.S. Protein Science, 2007, 16:1146-1156)
• EpsI/EpsJ complex (Yanez, M.E. et al. & Hol, W.G.J. J. Mol. Biol., 2008, 375:471-486)
• Periplasmic domain of E. coli YidC (Paetzel, M. & Oliver, D.C. J. Biol. Chem., 2008, 283:5208-5216)
• β-ketoacyl acyl carrier protein from Streptococcus pneumoniae (FabF) (Parthasarathy, G. et al. & Soisson, S.M. Acta Cryst. 2008, D64:141-148)

Our Current SER strategy
• Target evaluation and selection. See the slides after the acknowledgements for information on:
  • PSI Structural Genomics Knowledgebase http://kb.psi-structuralgenomics.org/KB/
  • DisMeta (a disorder meta-server)
  • XtalPred
• Expression of wild type, taken through to crystallization trials.
  • Purification is performed on a chromatography system with gradient elution to determine the optimal imidazole concentration for washing.
  • We work with WT crystals for ~2 months before undertaking mutagenesis.
• Mutation site and replacement residue selection
  • We use the SERp server and take the three best sites.
  • We make Ala and Tyr variants for the top 3 clusters.
• QuikChange mutagenesis
  • We make them all at once.
• Purification, crystallization
  • We use gravity columns and wash with the imidazole concentration determined for the wild-type protein.
  • Some lab members like to purify all 6 variants at once; others like to purify the 1A and 1Y variants first.
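The bookkeeping behind "Ala and Tyr variants for the top 3 clusters" can be sketched in a few lines of Python. This is only an illustration: the site list is the one shown on the Tm1865 slide, while the function name `ser_variants` and the "1A"/"1Y" naming scheme as a dictionary key are assumptions made for the example, not part of the SERp server or any lab script.

```python
from itertools import product

# Site clusters as listed on the Tm1865 slide (one-letter residue + position).
SITES = {
    1: ["K49", "E50", "E51"],
    2: ["K173", "E174"],
    3: ["K25", "K26", "K28"],
}
REPLACEMENTS = ("A", "Y")  # Ala and Tyr variants


def ser_variants(sites, replacements=REPLACEMENTS):
    """Return {variant name: [point mutations]} for every site/replacement pair.

    Hypothetical helper: every residue in a cluster is replaced by the same
    amino acid, and the variant is named "<site number><new residue>".
    """
    variants = {}
    for (site_no, residues), new_aa in product(sorted(sites.items()), replacements):
        name = f"{site_no}{new_aa}"  # e.g. "1A", "1Y"
        variants[name] = [f"{res}{new_aa}" for res in residues]
    return variants


if __name__ == "__main__":
    for name, mutations in ser_variants(SITES).items():
        print(name, ", ".join(mutations))
```

Three sites and two replacement residues give the six variants mentioned in the purification step.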

Tm1865 Site 1) K49, E50, E51 Site 2) K173, E174 Site 3) K25, K26, K28

Endonuclease V (TM1865) is a DNA repair enzyme
• It cleaves the second phosphodiester bond (in 5’ direction) from a deaminated base.
• It recognizes an unusually broad range of irregularities in the DNA structure: hairpins, unpaired/mispaired bases, deaminated residues, abasic sites, etc.
  ATGCxTGC
  TACGTACG
• Found throughout nature: homologs in human, bacteria, archaea
• Structure unknown; function is believed to be DNA repair
• However, E. coli deficient in EndoV are generally normal and resistant to mutagens (except nitrosating agents). The enzyme is important for the resistance of E. coli to mutagenesis during nitrate/nitrite respiration.
• The enzyme is used for mutagenesis and for high-throughput detection of mutations in clinical samples
• E. coli enzyme commercially available from NEB
• Thermotoga enzyme commercially available from Fermentas

TM1865 – crystallization, structure solution
• Purifies and crystallizes easily as the wild type; no need to apply SER
• Crystals of the SeMet derivative were obtained directly from the JCSG screen (24% PEG 1500, 20% glycerol) using 1.5 M NaCl in the reservoir
• Space group P2₁2₁2₁, a = 69.27 Å, b = 71.37 Å, c = 119.78 Å
• Data scaled to 2.7 Å
• 3 molecules per ASU; solution from SHELX, model built with SOLVE/RESOLVE and O
• Current R-factor 18% (Rfree 29%); further refinement is still necessary
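As a sanity check on "3 molecules per ASU", the Matthews coefficient can be estimated from the cell dimensions above. The sketch below uses the cell and space group from the slide; the molecular weight is NOT given in the slides, so `ASSUMED_MW_DA` is a placeholder assumption (roughly what one might expect for a protein of a couple of hundred residues) and should be replaced with the real value. The 1.23 factor is the usual constant in the solvent-content estimate for a typical protein density.

```python
# Unit cell from the slide (orthorhombic, P2(1)2(1)2(1)), in angstroms.
A, B, C = 69.27, 71.37, 119.78
Z_P212121 = 4             # asymmetric units per cell in P2(1)2(1)2(1)
N_PER_ASU = 3             # molecules per asymmetric unit, from the slide
ASSUMED_MW_DA = 25_000.0  # ASSUMPTION: placeholder molecular weight in Da


def orthorhombic_volume(a, b, c):
    """Cell volume in cubic angstroms (all angles are 90 degrees)."""
    return a * b * c


def matthews_coefficient(cell_volume, z, n_per_asu, mw_da):
    """V_M in cubic angstroms per dalton."""
    return cell_volume / (z * n_per_asu * mw_da)


def solvent_fraction(vm):
    """Approximate solvent fraction, 1 - 1.23 / V_M."""
    return 1.0 - 1.23 / vm


if __name__ == "__main__":
    volume = orthorhombic_volume(A, B, C)
    vm = matthews_coefficient(volume, Z_P212121, N_PER_ASU, ASSUMED_MW_DA)
    print(f"cell volume  : {volume:.0f} A^3")
    print(f"V_M          : {vm:.2f} A^3/Da (with the assumed MW)")
    print(f"solvent frac.: {solvent_fraction(vm):.2f}")
```

Re-running with `N_PER_ASU` set to 2 or 4 shows how sensitive the estimate is to the assumed copy number.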

TM1865 – overall structure Asymmetric trimer Monomer

TM1865 belongs to the RNaseHI superfamily
[Figures: RNaseHI overall structure; structure of the catalytic center]
The catalytic site consists of 3-5 residues coordinating two metal ions (Mg or Mn). The metals are known to be crucial for catalysis: one is believed to lower the pKa of the attacking nucleophile (water), the other to stabilize the negative charge on the pentacovalent intermediate that forms.

RNaseHI fold family – proteins in the PDB with an RNaseHI-like fold
• RNaseHI – cleaves the RNA strand if it is in a duplex with DNA
• UvrC – major part of the bacterial DNA repair system; recognizes irregularities in the DNA structure
• RuvC – Holliday junction resolvase
• Retroviral integrase – integrates the viral genome into the host’s DNA
• Argonaute – important players in RNA interference
• Transposase – incorporates DNA fragments into another DNA
• Mitochondrial resolvase
• RNaseHII – cleaves the RNA strand if it is in a duplex with DNA
All these proteins cleave DNA or RNA strands to perform their function.

Closest homologs in PDB
• 2nrt (magenta) – subdomain of UvrC protein from TM. Uvr is a major DNA repair system in bacteria.
• 2dqe – protein with unknown function. UPF0125 proteins are found in some organisms living in extreme conditions.

Active sites of TM1865 (yellow) and UvrC (gray) seem to be identical

Tm1865 Conclusions
• Endonuclease V belongs to the RNase H superfamily of proteins
• There are no structures of Endonuclease V in the PDB, but 2 recent structures have a similar fold; more similar structures are known within the RNase H superfamily
• The catalytic sites of UvrC and Endonuclease V are identical

Tm0439 Site 1) E188,K119,K122 Site 2) K2, K3 Site 3) E30, K31

Unrooted tree of the proteins of the GntR family
[Figure: unrooted tree; domain scheme showing the HTH motif and the effector-binding domain]
• Four subfamilies: FadR, HutC, MocR, and YtrA
• FadR subfamily: FadR and VanR
• FadR: 1st subfamily, regroups 40%
• All-helical C-terminal domain, 7 or 6 helices
• VanR-like regulators, 170 aa and 150 aa
• Regulation of oxidized substrates
Rigali, S. et al. J. Biol. Chem. 2002; 277:12507-12515

SERp Crystal

Crystal contact of Tm0439
[Figure: crystal contact with the N- and C-termini and residues 130A, 131A, 134A labeled]
• Wild type: crystals, poor
• Mutant: 130E, 131K, 134K → AAA (1A), good quality

DNA-binding domain of Tm0439
[Figure: superposition of Tm0439, 2HS5 and 1E2X, with helices and boundary residues labeled]
• An HTH motif: α2 and α3, tight turn
• Superposition: conserved secondary structure elements
• HTH motif: Tm0439 V46-E70, 2HS5 E54-D78, 1E2X A33-D58
• 1-2 loops: equal length, conformation

Stereo model of the Tm0439-DNA complex
[Figure: stereo view of the proposed Tm0439-DNA binding mode, contact regions 1-3 labeled]
Putative DNA contacts: 4 distinct regions
1: At the N-terminus, side chains of V18, L19, V21, and M13-E17 couldn’t be seen
2: At the beginning of the α2 helix, V46 and R47
3: α3, major groove, residues S56, F57, T58, P59 and R61
4: At the tip of the β1-β2 hairpin, P78 and R79

Effector-binding domain of Tm0439
[Figure: superposition of Tm0439, 1E2X and 2HS5; helices 4-9 and residues 86 and 226 labeled]
• C-terminal domain: 6 α-helices (α4-α9) with short connecting loops, forming a bundle
• 1E2X has 7 helices
• 2HS5 has 6 helices
• All are helical bundles, shown superimposed

The putative switch mechanism of Tm0439
[Figure: FadR and the FadR dimer compared with Tm0439 and the Tm0439 dimer; helices 4-9, the cavity, and the N- and C-termini are labeled]

Tm1382 Site 1) K158,E159,K160 Site 2) K77,Q78,E80 Site 3) E47, E49

Nudix Hydrolase Superfamily
• Pyrophosphohydrolases that act upon a NUcleoside DIphosphate connected to another moiety (X). Such substrates include (d)NTPs (both canonical and oxidised derivatives), nucleotide sugars and alcohols, dinucleoside polyphosphates (NpnN), dinucleotide coenzymes and capped RNAs.
• The substrate diversity requires equally diverse chemistries.
• Tm1382 is classified as a MutT hydrolase by the JCSG, but it is 50% larger than most members of the family.
• Consensus Nudix sequence: Gx5Ex5[UA]xREx2EExGU
• Tm1382 sequence: Gx4Ex5LxREx2EExDV
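The difference between the consensus Nudix box and the Tm1382 variant can be checked with two regular expressions. In this sketch the wild-type sequence is copied from the alignment slide that follows; reading "U" as a bulky hydrophobic residue (I, L or V) is an assumption about the slide's notation, and `find_motif` is a hypothetical helper written for this example.

```python
import re

# Tm1382 wild-type sequence, as printed on the alignment slide (199 residues).
TM1382 = (
    "MKSERILVVKTEDFLKEFGEFEGFMRVNFEDFLNFLDQYGFFRERDEAEYDETTKQVIPY"
    "VVIMDGDRVLITKRTTKQSEKRLHNLYSLGIGGHVREGDGATPREAFLKGLEREVNEEVD"
    "VSLRELEFLGLINSSTTEVSRVHLGALFLGRGKFFSVKEKDLFEWELIKLEELEKFSGVM"
    "EGWSKISAAVLLNLFLTQN"
)

BULKY = "ILV"  # ASSUMPTION: "U" in the slide's motif means I, L or V

# Consensus from the slide: Gx5Ex5[UA]xREx2EExGU
CONSENSUS = re.compile(rf"G.{{5}}E.{{5}}[{BULKY}A].RE.{{2}}EE.G[{BULKY}]")
# Tm1382 variant from the slide: Gx4Ex5LxREx2EExDV
TM1382_VARIANT = re.compile(r"G.{4}E.{5}L.RE.{2}EE.DV")


def find_motif(pattern, sequence):
    """Return (1-based start, matched text) for the first hit, or None."""
    match = pattern.search(sequence)
    if match is None:
        return None
    return match.start() + 1, match.group()


if __name__ == "__main__":
    print("consensus     :", find_motif(CONSENSUS, TM1382))
    print("Tm1382 variant:", find_motif(TM1382_VARIANT, TM1382))
```

The strict consensus pattern should find nothing in Tm1382, while the relaxed pattern locates the box, which is consistent with the slide listing the two motifs separately.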

Tm1382 Current Working Model

Some parts are missing

tm1382-wt    MKSERILVVKTEDFLKEFGEFEGFMRVNFEDFLNFLDQYGFFRERDEAEYDETTKQVIPY  60
working-chA  --GGG---GGGGGFLKEFGEFEGFMRVNFEDFLNFLDQYGFFRERDEAEYDETTKQVIPY  55
working-chB  -----ILVVKTEDFLKEFGEFEGFMRVNFEDFLNFLDQYGFFRERDEAEYDETTKQVIPY  55

tm1382-wt    VVIMDGDRVLITKRTTKQSEKRLHNLYSLGIGGHVREGDGATPREAFLKGLEREVNEEVD 120
working-chA  VVIMDGDRVLITK-------------YSLGIGGHVRR-------EAFLKGLEREVNEEVD  95
working-chB  VVIMDGDRVLIT--------------YSLGIGGHVRE------REAFLKGLEREVNEEVD  95

tm1382-wt    VSLRELEFLGLINSSTTEVSRVHLGALFLGRGKFFSVKEKDLFEWELIKLEELEKFSGVM 180
working-chA  VGGGGGGFLGLINSSTTEVSRVHLGALFLGRGKFFSVGGGGG------GGGGGGGFSGVM 149
working-chB  VSLRELEFLGLINSSTTEVSRVHLGALFLGRGKFFSVGGGGG------GGGGGGGFSGVM 149

tm1382-wt    EGWSKISAAVLLNLFLTQN 199
working-chA  EGWSKISAAVLAG---GGG 165
working-chB  EGWSKISAAVLL------- 161

Nudix motif in Tm1382: Gx4Ex5LxREx2EExDV

Some Distant Homologues(Top Dali Hits) 1hx3 1htz ModBase Model Found on the PSI Knowledgebase 2fkb

Tm1679 Site 1) K159,E160 Site 2) K78,E79 Site 3) K100, K101

Tm1679
We thought there was no viable MR model (see below), but thanks to the PSI Structural Genomics Knowledgebase (http://kb.psi-structuralgenomics.org/KB/), we have the structure.
Model 2p4z: 35% identity, RFZ=7.3 TFZ=8.8 PAK=0 LLG=74 LLG=74

The Surface problem
• “In accordance with the assumption that solvent exposure of a residue is directly related to its probability of forming random contacts, accessible surface area might be used as the basis of a reference state to compute the number of random contacts expected.” (Dasgupta 1997)
• surface = sum over all atoms.
• 85% of residues have ASA > 0
[Figure: contacts, ASA, VdW]
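The ASA-based reference state in the quotation can be written down as a small calculation: the expected number of random contacts for a residue type is its share of the total accessible surface times the total number of contacts, and the propensity is observed over expected. The function name and the toy numbers below are made up purely for illustration; they are not data from the slides.

```python
def contact_propensity(observed_contacts, asa):
    """Observed/expected contact ratio per residue type.

    Expected contacts are assumed proportional to each type's share of the
    total accessible surface area (the ASA reference state).
    """
    total_contacts = sum(observed_contacts.values())
    total_asa = sum(asa.values())
    propensity = {}
    for residue, area in asa.items():
        expected = total_contacts * area / total_asa
        observed = observed_contacts.get(residue, 0)
        propensity[residue] = observed / expected if expected > 0 else float("nan")
    return propensity


if __name__ == "__main__":
    # Toy numbers (illustrative only): contacts per type and summed ASA in A^2.
    observed = {"LYS": 10, "LEU": 30}
    surface = {"LYS": 600.0, "LEU": 200.0}
    for residue, value in contact_propensity(observed, surface).items():
        print(f"{residue}: {value:.2f}")
```

A value below 1 means the residue type appears in contacts less often than its exposed area alone would predict, which is the sense in which high-entropy surface residues are said to be depleted from crystal contacts.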

Selection is futile • Area-based comparisons are almost as bad as number based. • No ASA or rASA threshold will fix different distributions Lys Leu

Patch analysis of crystal contacts
• Jones & Thornton introduced a patch methodology to analyse properties of biologically relevant interfaces on the protein surface.
• The major problems are:
  • defining a single contact (interface):
    • coordination number (only binary)
    • clustering (artifacts)
  • sampling the surface:
    • make random interfaces

Spherical protein approximation
• Coordinate system and distance measure: x,y → r,φ
• In 3D:
  - three (0,2π) angles, one for each axis
  - plus r, the radius
• Pros:
  - easy to cluster!
  - with r, Mahalanobis distance
• Do we need r?
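One possible reading of "three (0,2π) angles, one for each axis, plus r" is sketched below: each atom is expressed relative to the protein centroid as a radius and one rotation angle about each Cartesian axis. This interpretation, and the function names, are assumptions made for illustration; the slides do not give the exact definition used.

```python
import math

TWO_PI = 2.0 * math.pi


def centroid(coords):
    """Arithmetic mean of a list of (x, y, z) tuples."""
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


def axis_angles(point, center):
    """Return (r, (angle_x, angle_y, angle_z)) for a point relative to center.

    Each angle is measured in the plane perpendicular to the named axis and
    mapped into [0, 2*pi). ASSUMPTION: this is one plausible reading of the
    'one angle per axis' description.
    """
    x, y, z = (p - c for p, c in zip(point, center))
    r = math.sqrt(x * x + y * y + z * z)
    angle_x = math.atan2(z, y) % TWO_PI  # rotation about the x axis
    angle_y = math.atan2(x, z) % TWO_PI  # rotation about the y axis
    angle_z = math.atan2(y, x) % TWO_PI  # rotation about the z axis
    return r, (angle_x, angle_y, angle_z)


if __name__ == "__main__":
    atoms = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, -2.0, 0.0)]
    center = centroid(atoms)
    for atom in atoms:
        r, angles = axis_angles(atom, center)
        print(atom, round(r, 3), [round(a, 3) for a in angles])
```

Note that angles near 0 and near 2π describe neighbouring directions, so any distance measure used for clustering in this space has to account for the wrap-around; this is one way the choice of coordinate space can bias sampling.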

Space is the place ”Sun Ra” • We need to measure the distance between atoms to make continuous patches on the surface: • the coordinate space affects sampling frequency possibly introducing bias.

zenpdb
• getting information from pdb files
• robust ... workflow based ... scalable
• object oriented
• outsourcing:
  • Areaimol, Ncont/Act, Stride, MSMS
  • numpy/scipy (k-means clustering)
  • scipy-cluster (hierarchical clustering)
  • Bio.KDTree (NN distance look-up)
  • scikits.ANN (NN k look-up)
  • CGAL, CGAL-python (voronoi)
  • PyTables (bindings for hdf5)

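The noble 8-fold path:

    from zenpdb import *

    file_name = 'some_pdb_file'

    # Parse the PDB file into a structure object.
    parser = PDBParser(forgive=1)
    parser.set_file(file_name)
    structure = parser.get_structure(file_name[0:4])

    # Annotate atoms with crystal contacts (the slide passed an undefined
    # `in_file` here; the same input file is assumed).
    ACTAtomContacts(file_name, structure)

    # Select the residues that carry a contact annotation.
    residues = einput(structure, 'R')
    r_x = residues._select_children({}, 'gt',
                                    'CNT_ACT_X', xtra=True).values()

    # Cluster the contacting residues into at most 6 patches.
    HierarchicalResidueClusters(r_x, dmethod='mahalanobis',
                                lmethod='average',
                                criterion='maxclust', t=6)

    # Write the structure back out with the cluster labels.
    BeQu('new_pdb_file.pdb', structure, 'R', 'H_CLUST')

http://code.google.com/p/zenpdb/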
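The clustering step (average linkage, cut at a maximum number of clusters) can be illustrated without zenpdb. The sketch below is a dependency-free stand-in, not the zenpdb implementation: it uses plain Euclidean distance instead of the Mahalanobis distance named on the slide, and the function name `average_linkage_labels` is hypothetical.

```python
import math
from itertools import combinations


def average_linkage_labels(points, max_clusters):
    """Agglomerative clustering with average linkage.

    Merges the two closest clusters until only `max_clusters` remain and
    returns one integer label (starting at 1) per input point.
    NOTE: Euclidean distance is used here for simplicity; the slide's
    example uses a Mahalanobis distance.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(c1, c2):
        # Mean pairwise distance between the members of two clusters.
        total = sum(math.dist(points[a], points[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > max_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe: j > i by construction

    labels = [0] * len(points)
    for label, members in enumerate(clusters, start=1):
        for member in members:
            labels[member] = label
    return labels


if __name__ == "__main__":
    # Toy coordinates standing in for contacting-residue positions.
    coords = [(0, 0, 0), (0.1, 0, 0), (5, 5, 5), (5.1, 5, 5), (10, 0, 0)]
    print(average_linkage_labels(coords, max_clusters=3))
```

This brute-force version recomputes every linkage at each merge, so it is only suitable for the small number of contacting residues in a single structure; a library routine would be the better choice for large sets.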

Structures Around the Corner (need phasing power)
• Tm0260
  • Several data sets diffracting to ~2.2 Å (R32)
  • Should have 8 seleniums in the ASU
  • MR encouraging
• Tm1024
  • Lots of beautiful crystals
  • Several data sets to ~2.4 Å of the 1A and 1Y mutants
  • Only 1 methionine
  • Creating several L->M mutations
  • Creating the 1M mutant (K45M, K46M)

Tm0260 Putative phosphate regulatory protein Site 1) K153,E154,K155 Site 2) E10,E11 Site 3) E78,K79

MR encouraging, but… The closest model is only 16% identical and is symmetrical. Long helices can be seen, but there are no side chain features and the ends are ambiguous. 2iiu

UVA: Zygmunt Derewenda, Jakub Bielnicki, Marvin Cieslik, WonChan Choi, David Cooper, Ulla Derewenda, Monika Kijanska, Natalya Olekhnovich, Darkhan Utepbergenov, Jennifer Wingard, Meiying Zheng, Tomek Boczek, Kasia Grelewska, Gosia Pinkowska, Michal Zawadzki, Eliza Zylkiewic
Los Alamos Nat’l Lab: Tom Terwilliger, Chang Yub Kim
UCLA: David Eisenberg, Luki Goldschmidt, Tom Holton
Lawrence Berkeley Nat’l Lab: Li-Wei Hung, Minmin Yu (Big Thanks), Jeff Habel
And ALL ISFI members! The ISFI is funded by NIH U54 GM074946.
Several slides follow.

DisMeta – a NESG MetaServer http://www-nmr.cabm.rutgers.edu/bioinformatics/disorder/ Queries up to 12 different disorder prediction servers.

http://kb.psi-structuralgenomics.org/KB/ Submit a sequence!

http://kb.psi-structuralgenomics.org/KB/ Click Here To access these tabs

http://ffas.burnham.org/XtalPred-cgi/xtal.pl
