Proteomics and human diseases
Biotechnology in the “Nomic Era”
Jau-Song Yu (余兆松)
Department of Cell and Molecular Biology, Institute of Basic Medical Sciences, Medical College of Chang Gung University
“the PROTEin complement of the genOME”
The term proteome refers to the proteins that are encoded and expressed by a genome; it was first suggested in 1994 by Marc Wilkins. Wilkins defines proteomics as
"the study of proteins, how they're modified, when and where they're expressed, how they're involved in the metabolic pathways and how they interact with each other."
The University of New South Wales (UNSW), Sydney, Australia
Interactome research, proteomics, bioinformatics for proteomics and its application to biomedical research.
Changes of physiological functions
Alterations of functional molecules
99% sequence of human
Global changes of DNA, RNA and protein
15 February 2001
16 February 2001: The Human Genome
PNAS USA 98, 10869–10874 (2001)
one of the many factors determining the protein function in cells
alternative splicing, etc.
Modification of proteins
activator, inhibitor, etc.
How to analyze hundreds to thousands of proteins in cells or tissues simultaneously?
Separation of proteins on one or more matrixes ---
Identification and/or quantitation of separated proteins in a high-throughput way --- mass spectrometry
General principle and protocol of two-dimensional gel electrophoresis
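[Figure: IPG strip with a pH gradient running from pH 3 at the anode (+) to pH 9 at the cathode (-). The gradient is formed by acrylamido buffers of general structure CH2=CH-CO-NH-R, where R carries an acidic or a basic buffering group.]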
Production of Immobilized pH Gradient (IPG) strip
30 V, 12 h
First dimension: Isoelectric focusing
1. Place electrode pads (?)
2. 200 V step-n-hold, 1.5 h
3. 500 V step-n-hold, 1.5 h
4. 1000 V gradient, 1500 Vh
5. 8000 V gradient, (?) 36000 Vh
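The focusing program above can be summed into a total volt-hour figure. The following is a minimal sketch, assuming that the two step-n-hold steps contribute voltage times duration and that the values given for the gradient steps are already volt-hours; the 30 V, 12 h step is left out, and the value marked "(?)" on the slide is used as written. The names `IEF_STEPS`, `step_volt_hours` and `total_volt_hours` are made up for this illustration.

```python
# Focusing program as listed on the slide. Step-and-hold steps give a
# voltage and a duration; gradient steps give their volt-hours directly.
# The "(?)" on the slide means the last value is itself uncertain.
IEF_STEPS = [
    {"name": "200 V step-n-hold", "volts": 200, "hours": 1.5},
    {"name": "500 V step-n-hold", "volts": 500, "hours": 1.5},
    {"name": "1000 V gradient", "volt_hours": 1500},
    {"name": "8000 V gradient", "volt_hours": 36000},
]


def step_volt_hours(step):
    """Return the volt-hours contributed by one step of the program."""
    if "volt_hours" in step:
        return step["volt_hours"]
    # Assumption: a step-and-hold step contributes voltage x duration.
    return step["volts"] * step["hours"]


def total_volt_hours(steps):
    """Sum the volt-hours over all steps."""
    return sum(step_volt_hours(step) for step in steps)


if __name__ == "__main__":
    for step in IEF_STEPS:
        print(f"{step['name']}: {step_volt_hours(step):.0f} Vh")
    print(f"Total: {total_volt_hours(IEF_STEPS):.0f} Vh")
```

Under these assumptions the high-voltage gradient step dominates the total, which is why its uncertain value matters most when the protocol is reproduced.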
Marker in paper
in running buffer
SDS equilibration buffer
50 mM Tris-HCl
6 M Urea
Protocol of silver stain:
25% acetic acid
ddH2O x 3 times
0.004% DTT solution
2.3 M citric acid
5% acetic acid
Fluorescent dyes: Sypro Ruby, Cy3, Cy5, Cy2 etc.
How to analyze hundreds to thousands of proteins in cells or tissues simultaneously?
● Separation of proteins on one or more matrixes ---
● Identification and/or quantitation of separated proteins in a high-throughput way --- mass spectrometry
Gary Siuzdak (1996) Mass Spectrometry for Biotechnology, Academic Press
Two ionization methods
Electrospray ionization (ESI)
NATURE, 422, 198-207, (2003)
NATURE REVIEWS MOLECULAR CELL BIOLOGY, 5, 699-711 (2004)
MALDI-TOF MS (Matrix-assisted laser desorption/ionization-Time of flight)
Time of Flight
Kinetic energy: $KE = \tfrac{1}{2} m v^{2}$
$v = \sqrt{2\,KE/m}$
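To make the relation concrete, here is a minimal sketch that turns the kinetic-energy equation into a flight time. It assumes an idealized linear instrument in which every ion of charge z receives the kinetic energy z·e·V from the accelerating voltage and then drifts through a field-free tube. The 20 kV voltage and the 1 m tube length are illustrative assumptions, not values from the slides, and the function names are hypothetical.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs
DALTON = 1.66053906660e-27           # kilograms per dalton


def ion_velocity(mass_da, charge=1, accel_voltage=20_000.0):
    """Velocity (m/s) from v = sqrt(2 KE / m), with KE = z * e * V."""
    kinetic_energy = charge * ELEMENTARY_CHARGE * accel_voltage  # joules
    mass_kg = mass_da * DALTON
    return math.sqrt(2.0 * kinetic_energy / mass_kg)


def flight_time(mass_da, charge=1, accel_voltage=20_000.0, tube_length=1.0):
    """Drift time (s) through a field-free tube of the given length (m)."""
    return tube_length / ion_velocity(mass_da, charge, accel_voltage)


if __name__ == "__main__":
    for mass in (1000.0, 1347.7, 2000.0, 4000.0):
        microseconds = flight_time(mass) * 1e6
        print(f"{mass:8.1f} Da -> {microseconds:6.2f} microseconds")
```

Because all ions start with the same kinetic energy, the flight time grows with the square root of m/z: an ion four times heavier takes twice as long to reach the detector.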
$1347.7\ \text{g/mol} \times 5 \times 10^{-18}\ \text{mol} = 6.74 \times 10^{-15}\ \text{g}$
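The arithmetic above can be checked in a few lines. This sketch uses only the two numbers from the slide (a molar mass of 1347.7 g/mol and an amount of 5 × 10^-18 mol) plus the Avogadro constant; the function names are made up for illustration.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def sample_mass_grams(molar_mass_g_per_mol, amount_mol):
    """Mass of a sample in grams: molar mass times amount of substance."""
    return molar_mass_g_per_mol * amount_mol


def molecule_count(amount_mol):
    """Number of molecules in the given amount of substance."""
    return amount_mol * AVOGADRO


if __name__ == "__main__":
    amount = 5e-18  # mol, i.e. 5 attomoles
    print(f"mass: {sample_mass_grams(1347.7, amount):.3g} g")
    print(f"molecules: {molecule_count(amount):.3g}")
```

In other words, 5 attomoles of this peptide weigh a few femtograms and correspond to about three million molecules.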
Linking between genomics/bioinformatics/proteomics
Digested by trypsin (Lys, Arg)
(854, 931, 935, 1021, 1067, 1184, 1386, 1438)
(621, 754, 778, 835, 1204, 1398, 1476, 1582)
(664, 711, 735, 904, 1079, 1188, 1438)
(602, 755, 974, 1166, 1244, 1374)
(Masses of tryptic peptides are predictable from gene sequence databases)
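How such masses are predicted from a sequence can be sketched as an in-silico digest. The code below cleaves after Lys (K) and Arg (R), as stated on the slide. As an added assumption it skips sites followed by Pro, a common convention that the slide does not mention. It then sums standard monoisotopic residue masses plus one water per peptide. The example sequence and the function names are hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free N- and C-termini


def tryptic_digest(sequence):
    """Cleave after K or R, except when the next residue is P (assumption)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide without K/R
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


if __name__ == "__main__":
    hypothetical_protein = "AAKGGRPLKVTFISLLR"
    for pep in tryptic_digest(hypothetical_protein):
        print(f"{pep:10s} {peptide_mass(pep):10.4f} Da")
```

Running the same digest over every entry of a sequence database yields the table of predicted peptide masses against which a measured spectrum is compared.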
MALDI-TOF MS analysis
(854, 935, 1021, 1067, 1184, 1386, 1438)
(621, 778, 835, 1204, 1398, 1582)
(735, 904, 1079,
Protein identified (100%?)
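The matching step behind this identification can be sketched as a simple count of shared masses. The predicted lists below are the four peptide-mass lists from the slide, stored under made-up names (`protein_1` to `protein_4`), and the observed list is the first measured set. The ±0.5 Da tolerance and the scoring by raw match count are assumptions for illustration; real search engines use probabilistic scores.

```python
# Predicted tryptic peptide masses per database entry (values from the
# slide; the entry names are placeholders).
PREDICTED = {
    "protein_1": [854, 931, 935, 1021, 1067, 1184, 1386, 1438],
    "protein_2": [621, 754, 778, 835, 1204, 1398, 1476, 1582],
    "protein_3": [664, 711, 735, 904, 1079, 1188, 1438],
    "protein_4": [602, 755, 974, 1166, 1244, 1374],
}


def count_matches(observed, predicted, tolerance=0.5):
    """Count observed masses that lie within the tolerance of a predicted mass."""
    return sum(
        any(abs(obs - pred) <= tolerance for pred in predicted)
        for obs in observed
    )


def rank_candidates(observed, database, tolerance=0.5):
    """Return (name, match count) pairs, best match first."""
    scores = {
        name: count_matches(observed, masses, tolerance)
        for name, masses in database.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    observed = [854, 935, 1021, 1067, 1184, 1386, 1438]
    for name, score in rank_candidates(observed, PREDICTED):
        print(f"{name}: {score} of {len(PREDICTED[name])} predicted masses observed")
```

With these numbers the best candidate matches seven of its eight predicted peptides, while one mass (1438) is shared with another entry. This illustrates why a fingerprint match is strong evidence but, as the question mark on the slide suggests, not automatically a 100% identification.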
Direct identification of the amino acid sequence of peptides by tandem mass spectrometry
Cell. Mol. Life Sci. 62 (2005) 848–869
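Why fragmentation reveals sequence can be illustrated with the two most common fragment series. The sketch below computes singly charged b ions (N-terminal fragments) and y ions (C-terminal fragments) for an unmodified peptide from standard monoisotopic residue masses. The example peptide and the function name are hypothetical, and other ion types, higher charge states and modifications are ignored.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def fragment_ions(peptide):
    """Return (b_ions, y_ions) as lists of singly charged m/z values."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        # b_i: first i residues plus a proton
        b_ions.append(sum(masses[:i]) + PROTON)
        # y_i: last i residues plus water plus a proton
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)
    return b_ions, y_ions


if __name__ == "__main__":
    b, y = fragment_ions("GASVK")
    for i, (b_mz, y_mz) in enumerate(zip(b, y), start=1):
        print(f"b{i} = {b_mz:9.4f}    y{i} = {y_mz:9.4f}")
```

The spacing between neighbouring peaks of one series equals a residue mass, so reading the ladder of differences gives the amino acid sequence directly. Leu and Ile have identical masses and cannot be told apart this way.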
Recent successes illustrate the role of mass spectrometry-based proteomics as an indispensable tool for molecular and cellular biology and for the emerging field of systems biology. These include the study of protein–protein interactions via affinity-based isolations on a small and proteome-wide scale, the mapping of numerous organelles, the concurrent description of the malaria parasite genome and proteome, and the generation of quantitative protein profiles from diverse species. The ability of mass spectrometry to identify and, increasingly, to precisely quantify thousands of proteins from complex samples can be expected to impact broadly on biology and medicine.
Nature, 422, 198-207, 2003
[Figure: number of papers in PubMed for proteomics (since 1998) and genomics (since 1988).]
The Nobel Prize in Chemistry for 2002 is to be shared between scientists working on two very important methods of chemical analysis applied to biological macromolecules: mass spectrometry (MS) and nuclear magnetic resonance (NMR). Laureates John B. Fenn, Koichi Tanaka (MS) and Kurt Wüthrich (NMR) have pioneered the successful application of their techniques to biological macromolecules. Biological macromolecules are the main actors in the makeup of life whether expressed in prospering diversity or in threatening disease. To understand biology and medicine at molecular level where the identity, functional characteristics, structural architecture and specific interactions of biomolecules are the basis of life, we need to visualize the activity and interplay of large macromolecules such as proteins. To study, or analyse, the protein molecules, principles for their separation and determination of their individual characteristics had to be developed. Two of the most important chemical techniques used today for the analysis of biomolecules are mass spectrometry (MS) and nuclear magnetic resonance (NMR), the subjects of this year’s Nobel Prize award.
A high throughput process including subcellular fractionation and multiple protein separation and identification technology allowed us to establish the protein expression profile of human fetal liver, which was composed of at least 2,495 distinct proteins and 568 non-isoform groups identified from 64,960 peptides and 24,454 distinct peptides. In addition to the basic protein identification mentioned above, the MS data were used for complementary identification and novel protein mining. By doing the analysis with integrated protein, expressed sequence tag, and genome datasets, 223 proteins and 15 peptides were complementarily identified with high quality MS/MS data.
It has long been thought that blood plasma could serve as a window into the state of one’s organs in health and disease because tissue-derived proteins represent a significant fraction of the plasma proteome. Although substantial technical progress has been made toward the goal of comprehensively analyzing the blood plasma proteome, the basic assumption that proteins derived from a variety of tissues could indeed be detectable in plasma using current proteomics technologies has not been rigorously tested. Here we provide evidence that such tissue-derived proteins are both present and detectable in plasma via direct mass spectrometric analysis of captured glycopeptides and thus provide a conceptual basis for plasma protein biomarker discovery and analysis.
From the ‡Department of Biotechnology, AlbaNova University Center, Royal Institute of Technology (KTH), SE-106 91 Stockholm, Sweden and the ¶Department of Genetics and Pathology, Rudbeck Laboratory,
Uppsala University, SE-751 85 Uppsala, Sweden
Antibody-based proteomics provides a powerful approach for the functional study of the human proteome involving the systematic generation of protein-specific affinity reagents. We used this strategy to construct a comprehensive, antibody-based protein atlas for expression and localization profiles in 48 normal human tissues and 20 different cancers. Here we report a new publicly available database containing, in the first version, 400,000 high resolution images corresponding to more than 700 antibodies toward human proteins. Each image has been annotated by a certified pathologist to provide a knowledge base for functional studies and to allow queries about protein profiles in normal and disease tissues. Our results suggest it should be possible to extend this analysis to the majority of all human proteins thus providing a valuable tool for medical and biological research.
Human natural killer cell secretory lysosome --- 222 proteins --- MCP 2007
Human Jurkat T lymphoma cells protein kinases --- 140 kinases --- MCP 2006
Human amniotic fluid proteome --- 69 proteins --- Electrophoresis 2006
Human platelet proteome --- 641 proteins --- Proteomics 2005
Human salivary proteome --- 309 & 1381 proteins --- Proteomics 2005/J Proteome Res 2006
Human breast tumor interstitial fluid proteome --- 267 proteins --- MCP 2004
Human pituitary adenoma proteome --- 111 proteins --- Proteomics 2003
Human cell line (6) proteomes --- 2341 proteins --- MCP 2003
Human stomach tissue --- 136 proteins --- Electrophoresis 2002
Human colon cancer cell line membrane proteome --- 284 proteins --- Electrophoresis 2000
Human centrosome proteome --- 64 proteins --- Nature 2003
Human pleural effusion proteome --- 1415 proteins --- J Proteome Res 2005
Rat liver rough ER, smooth ER, and Golgi apparatus proteomes --- >1400 proteins --- Cell 2006
Mouse mitochondria proteome --- 591 proteins --- Cell 2003
Mouse cortical neuron proteome --- 3590 proteins --- MCP 2004
Plasma proteome of lymphoma-bearing SJL mice --- 1079 proteins --- J Proteome Res 2005
Bovine proteome database --- 534 proteins --- J Chromatography B 2005
Drosophila phosphoproteome --- 887 phosphopeptides --- Nat Methods 2007
C. elegans proteome --- 1616 proteins --- J Proteome Res 2003
Snake venom proteome --- 42 proteins --- Toxicon 2006
Malaria parasite Plasmodium falciparum proteome --- 2415 & 1289 proteins --- Nature 2002
Yeast proteome --- 2003 proteins --- Genome Biology 2006
Oral microorganisms proteomes --- 330 proteins --- Oral Microbiol Immunol. 2005
Bacillus subtilis phosphoproteome --- 78 phosphorylation sites --- MCP 2007
Rice proteome database --- 11941 proteins --- Nucleic Acids Res 2004
HMDB: the Human Metabolome Database --- >2180 metabolites --- Nucleic Acids Res 2007
Mass spectrometry (MS)-based proteomics has become a powerful technology to map the protein composition of organelles, cell types and tissues. In our department, a large-scale effort to map these proteomes is complemented by the Max-Planck Unified (MAPU) proteome database. MAPU contains several body fluid proteomes, including plasma, urine, and cerebrospinal fluid. Cell lines have been mapped to a depth of several thousand proteins and the red blood cell proteome has also been analyzed in depth. The liver proteome is represented with 3200 proteins. By employing high resolution MS and stringent validation criteria, false positive identification rates in MAPU are lower than 1:1000. Thus MAPU datasets can serve as reference proteomes in biomarker discovery. MAPU contains the peptides identifying each protein, measured masses, scores and intensities and is freely available at http://www.mapuproteome.com using a clickable interface of cell or body parts. Proteome data can be queried across proteomes by protein name, accession number, sequence similarity, peptide sequence and annotation information. More than 4500 mouse and 2500 human proteins have already been identified in at least one proteome. Basic annotation information and links to other public databases are provided in MAPU and we plan to add further analysis tools.
75 µm chromatography column and eluted using a 2 h gradient.
LTQ-FTICR MS or LTQ-Orbitrap MS
Figure 1. Workflow for protein identification and validation.
Results: In this study, we employ state-of-the-art mass spectrometric identification, using both a hybrid linear ion trap-Fourier transform (LTQ-FT) and a linear ion trap-Orbitrap (LTQ-Orbitrap) mass spectrometer, and high confidence identification by two consecutive stages of peptide fragmentation (MS/MS/MS or MS3), to characterize the protein content of the tear fluid. Low microliter amounts of tear fluid samples were either pre-fractionated with one-dimensional SDS-PAGE and digested in situ with trypsin, or digested in solution. Five times more proteins were detected after gel electrophoresis compared to in-solution digestion (320 versus 63 proteins). Ontology classification revealed that 64 of the identified proteins are proteases or protease inhibitors. Of these, only 24 have previously been described as components of the tear fluid. We also identified 18 anti-oxidant enzymes, which protect the eye from harmful consequences of its exposure to oxygen. Only two proteins with this activity have been previously described in the literature.
Conclusion: Interplay between proteases and protease inhibitors, and between oxidative reactions, is an important feature of the ocular environment. Identification of a large set of proteins participating in these reactions may allow discovery of molecular markers of disease conditions of the eye.
exists in clinical specimens?
A total of 682 individual protein spots were quantified in 90 lung adenocarcinomas by using quantitative two-dimensional polyacrylamide gel electrophoresis (2-DE) analysis. A leave-one-out cross-validation procedure using the top 20 survival-associated proteins identified by Cox modeling indicated that protein profiles as a whole can predict survival in stage I tumor patients (P<0.01)
Protein Expression Profiles Predict Survival in Stage I. Univariate Cox proportional hazards regression analysis using all 90 samples and 682 protein spots indicated 46 proteins were associated with patient survival (P<0.05, Table 1).
(A) Kaplan–Meier survival plots showing the relationship between patient survival and the risk index based on the leave-one-out cross-validation procedure using the top 20 survival-associated proteins among all 682 proteins using all 90 tumors. The high- and low-risk groups differ significantly (P 0.005).
(B) Relationship between patient survival and the risk index based on the leave-one-out cross-validation procedure using the top 20 survival-associated proteins among the 62 stage I tumors. The high- and low-risk groups differ significantly (P 0.01).
(C) Relationship between patient survival and PGK1 protein expression in an independent validation set of 90 lung adenocarcinomas. PGK1 immunohistochemical analysis of a tissue array indicates that increased PGK1 is associated with a reduced survival (P 0.04).
(D) Relationship between patient survival and serum PGK1 levels (ratio of PGK1/total serum protein) by using ELISA analysis with 107 lung adenocarcinomas (P 0.004).
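The survival curves in these panels are Kaplan–Meier estimates. As a minimal sketch of how one such curve is computed, the code below implements the product-limit estimator for a single patient group. The follow-up times and event flags are invented for illustration and are not data from the study. The Cox modeling and the leave-one-out cross-validation used to form the risk groups are not reproduced here.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  -- follow-up time of each patient
    events -- 1 if the patient died at that time, 0 if censored
    Returns a list of (time, survival probability) at each death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        current_time = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients that share this follow-up time.
        while i < len(data) and data[i][0] == current_time:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((current_time, survival))
        at_risk -= leaving  # deaths and censored patients leave the risk set
    return curve


if __name__ == "__main__":
    # Hypothetical follow-up in months; 0 marks a censored patient.
    months = [6, 12, 12, 20, 31, 40, 40, 55]
    died = [1, 1, 0, 1, 0, 1, 1, 0]
    for t, s in kaplan_meier(months, died):
        print(f"month {t:3d}: S(t) = {s:.3f}")
```

Computing one curve for the high-risk group and one for the low-risk group and comparing them, for example with a log-rank test, is the kind of analysis behind the P values quoted in the captions.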
exists in serum samples?
THE LANCET • Vol 359 • February 16, 2002
Use of proteomic patterns in serum to identify ovarian cancer
Chips for binding proteins from clinical samples
(Surface-enhanced laser desorption ionization)
This result yielded 100% sensitivity (95% CI 93–100) and 95% specificity (87–99). The positive predictive value for this sample set was 94% (84–99), compared with 35% for CA125 for the same samples.
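The three figures quoted above follow from a two-by-two table of test results against true disease status. In the sketch below the counts are illustrative assumptions, chosen only so that the results round to the reported 100%, 95% and 94%; they do not appear on the slide. The function name is hypothetical.

```python
def diagnostic_metrics(true_pos, false_neg, true_neg, false_pos):
    """Return (sensitivity, specificity, positive predictive value)."""
    sensitivity = true_pos / (true_pos + false_neg)  # cancers detected
    specificity = true_neg / (true_neg + false_pos)  # non-cancers cleared
    ppv = true_pos / (true_pos + false_pos)          # positives that are cancer
    return sensitivity, specificity, ppv


if __name__ == "__main__":
    # Illustrative counts (assumption): 50 cancers, 66 non-cancer samples.
    sens, spec, ppv = diagnostic_metrics(
        true_pos=50, false_neg=0, true_neg=63, false_pos=3
    )
    print(f"sensitivity: {sens:.1%}")
    print(f"specificity: {spec:.1%}")
    print(f"positive predictive value: {ppv:.1%}")
```

Note that the positive predictive value depends on how common the disease is in the sample set, so a value obtained in a study set enriched for cancer does not carry over to population screening.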
/ www.sciencexpress.org / 26 September 2002 / Page 1/ 10.
Contribution of Human α-Defensin-1, -2, and -3 to the Anti-HIV-1 Activity of CD8 Antiviral Factor
(by David D. Ho’s group)
It has been known since 1986 that CD8 T lymphocytes from certain HIV-1-infected individuals who are immunologically stable secrete a soluble factor, termed CAF, that suppresses HIV-1 replication. However, the identity of CAF remained elusive despite an extensive search. By means of a protein-chip technology, we identified a cluster of proteins that were secreted when CD8 T cells from long-term non-progressors with HIV-1 infection were stimulated. These proteins were identified as α-defensins-1, -2, and -3.
SELDI-TOF mass spectra of secretory proteins of CD8 T cells from different groups
α-defensins stain in green,
CD8 proteins in red,
and nuclei in blue.
Erika Check is Nature’s Washington biomedical correspondent.
Published: 9 June 2003
BMC Bioinformatics 2003, 4:24
Received: 28 March 2003
Accepted: 9 June 2003
Diagnostic value of Low M/Z values
pattern generated by the blood test he believes
can reliably diagnose cancer.
will learn from this experience what rules of evidence we might apply in the future to find useful results more efficiently.”--- Erika Check
Identification of Serum Amyloid A Protein As a Potentially Useful Biomarker to Monitor Relapse of Nasopharyngeal Cancer by Serum Proteomic Profiling
William C. S. Cho,1 Timothy T. C. Yip,1 Christine Yip,2 Victor Yip,2
Vanitha Thulasiraman,2 Roger K. C. Ngan,1 Tai-Tung Yip,2 Wai-Hon Lau,1 Joseph S. K. Au,1 Stephen C. K. Law,1 Wai-Wai Cheng,1 Victor W. S. Ma,1 and Cadmon K. P. Lim1
1Department of Clinical Oncology, Queen Elizabeth Hospital, Hong Kong Special Administrative Region, The People’s Republic of China and 2Ciphergen Biosystems Inc., Fremont, California
Vol. 10, 43–52, January 1, 2004 Clinical Cancer Research
Fig. 1 Identification of serum biomarkers associated with relapse of NPC
Serum samples were thawed, and 20 µl of each serum was denatured by adding 30 µl of 50 mM Tris-HCl buffer containing 9 M urea and 2% 3-[(3-cholamidopropyl)dimethylammonio]-1-propanesulfonic acid (pH 9). The proteins were fractionated in an anion exchange Q HyperD F 96-well filter plate (Ciphergen Biosystems, Fremont, CA). Six fractions (namely fractions from the flow-through, pH 7, pH 5, pH 4, and pH 3 and the organic eluant fraction) were collected by stepwise decrease in pH. The fractions were diluted and profiled on a Cu (II) Immobilized Metal Affinity Capture (IMAC3) Protein Chip Array (Ciphergen Biosystems, Fremont, CA; see Ref. 11). All fractionation and profiling steps were performed on a Biomek 2000 Robotic Station (Beckman Coulter).
Fig. 2. Distribution of the peak intensities of the two protein-chip-identified biomarkers (11.6 and 11.8 kDa) in nasopharyngeal carcinoma (NPC) patients, lung cancer patients, patients with benign metabolic disease (thyrotoxicosis), and normal individuals.
Table 1 Clinical parameters and peak intensities of the 11.6- and 11.8-kDa biomarkers in the relapse group of nasopharyngeal cancer patients under study
Table 2 Clinical parameters and peak intensities of the 11.6- and 11.8-kDa biomarkers in the remission group of nasopharyngeal cancer patients under study
A, peptide mapping of the two relapse-associated biomarkers by tryptic digestion. B, tandem mass spectrometry (MS/MS) fragmentation analysis of 2177.9-Da peptide generated from tryptic digest of the two biomarkers.
Fig. 4. Longitudinal monitoring of SAA protein level by protein chip profiling and immunoassay and circulating serum EBV DNA by real-time Q-PCR in NPC patients.
A, B, and C, three NPC patients in relapse were monitored by SAA protein chip (SAA Protein Chip), SAA enzyme immunoassay (SAA EIA), and EBV DNA by Q-PCR (EBV DNA Q-PCR). D, 11, 5, and 8 patients in remission were also monitored by the three techniques, respectively. and E, follow-up profiling curves of 11.6 and 11.8 kDa SAA isoforms by protein chip; ‚, follow-up curves of SAA protein by immunoassay; f (EBER1), follow-up of serum EBV DNA encoding EBV small RNA-1 by Q-PCR; BN 2°, bone metastasis; CR, complete response to chemotherapy; CT, salvage chemotherapy; DLN 2°, distant lymph node metastasis; DX, histopathological diagnosis of NPC; Groin LN 2°, metastasis in lymph nodes of the groin; LV 2°, liver metastasis; LV 2°PR, partial response to chemotherapy in tumor lesion in metastatic liver; DLN 2°CR, complete response to chemotherapy in tumor lesion in metastatic distant lymph node; PG, progression of disease; PR, partial response; RT, radiation therapy; SAA, immunoassay curves for SAA protein; SP 2°, spleen metastasis.
“Proteome”, defined as “the PROTEin complement of the genOME”, was first coined by Wilkins working as part of a collaborative team at Macquarie (Australia) and Sydney Universities (Australia) in 1995.
“the study of protein properties (expression level, post-translational modification, interactions, etc.) on a large-scale that results in a global integrated view of disease processes, cellular programs and networks at protein level”