“Proteomics & Bioinformatics”

“Proteomics & Bioinformatics” MBI, Master's Degree Program in Helsinki, Finland. Lecture 5, 11 May 2007. Sophia Kossida, BRF, Academy of Athens, Greece; Esa Pitkänen, University of Helsinki, Finland; Juho Rousu, University of Helsinki, Finland

Mining proteomes: to identify as many components of the proteome as possible. Mapping of proteomes of various organisms and tissues. Comparison of protein expression levels for the detection of disease biomarkers.

How to select a proteome? A proteome is defined by the state of the organism, tissue, or cell that produces it. Because these states are constantly changing, so are the proteomes. Examples of proteomes: different kinds of cells (liver, …); extracellular fluids (blood plasma, urine, CSF, …)

Applications Systems biology: understand cell pathways, networks, and complex interactions. Biological processes: characterize sub-proteomes such as protein complexes, cellular machines, organelles. Biomarkers: discovery of disease markers (serological, urine, other biological fluids); diagnostics, treating patients, monitoring therapies. Drug targets: evaluate toxicity and other biological or pharmaceutical parameters associated with drug treatment.

Protein Profiling Measure the expression of a set of proteins in two samples and compare them (comparative proteomics) • 2D gel electrophoresis • Difference gel electrophoresis (DIGE) • LC-MS/MS using coded affinity tagging (ICAT, iTRAQ, SILAC, ...) • ProteinChip Array (SELDI analysis) • Antibody arrays
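The comparison step of protein profiling can be sketched as a log2 ratio per protein. This is only a minimal illustration: the protein names and abundance values are hypothetical, and the slides do not prescribe a particular statistic for comparing the two samples.

```python
from math import log2

def log2_fold_changes(sample_a, sample_b):
    """Return log2(B/A) for every protein quantified in both samples."""
    shared = sample_a.keys() & sample_b.keys()
    return {
        protein: log2(sample_b[protein] / sample_a[protein])
        for protein in sorted(shared)
        if sample_a[protein] > 0 and sample_b[protein] > 0
    }

# Hypothetical abundances (arbitrary units) in two samples.
control = {"protein_1": 100.0, "protein_2": 80.0, "protein_3": 50.0}
disease = {"protein_1": 200.0, "protein_2": 40.0, "protein_4": 10.0}

changes = log2_fold_changes(control, disease)
for protein, value in changes.items():
    print(protein, value)
```

Proteins seen in only one sample are skipped here; a real analysis would need an explicit rule for such missing values.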

Laser-Capture Microdissection (LCM) Technique for selectively sampling certain cells within a tissue. [Figure: a transfer film is placed over a tissue sample (tumor biopsy) on a glass slide; the laser beam activates the film; selected cells are transferred for genomic/proteomic analysis.] Modified from “National Cancer Institute”, US National Institutes of Health: http://www.cancer.gov/cancertopics/understandingcancer/moleculardiagnostics/Slide29

2D gels, DIGE (Coomassie blue stained gels, silver stained gels) High resolving power. Absolute / relative quantity. Easily archived for further comparison. Detects some PTMs and alternative splice forms. Low throughput. Poor detection of large, acidic, basic and membrane proteins. Only high-abundance proteins.

DIGE Proteins are labeled with up to three different fluorescent cyanine dyes prior to running the first dimension. The labeled extracts are mixed. Allows use of an internal standard in each gel, which reduces gel-to-gel variation and the number of gels to be run. Adds 500 Da to the labeled protein. Additional post-electrophoretic staining needed.

Human brain proteins: differences in expression level in thalamus. [Figure: phosphoglycerate mutase in control vs. Alzheimer’s disease samples]

Example of different expression

LC-MS/MS using coded affinity tagging Moderate throughput, but can be automated. Detects some low-abundance proteins. Most isotope-label experiments are limited to two versions, heavy and light isotope, i.e. binary comparisons only. Poor detection of alternative splice forms and PTMs.

Labeling Chemical (ICAT, iTRAQ): chemical modifications to amino acids, generally after digestion; most labels differ by 3-10 Da in mass (not complete / interferences); compares only 2-8 samples. SILAC: stable isotopes incorporated during cell growth; must be able to grow cells; compares 2 or 3 samples; Lys (+8 Da) and Arg (+10 Da). Ion current: no labeling of any kind; see everything in the sample, not just what gets labeled; normalization issues (2 separate runs are compared); standards needed; robust, and many samples and experimental conditions can be compared.

Isotope Coded Affinity Tag (ICAT) Two protein samples are labeled with normal and heavy versions of the same isotope-coded affinity tag (ICAT) reagent, respectively. The reagent binds to cysteine residues and carries a biotin tag. Samples are mixed and digested, and ICAT-labeled peptides are recovered via the biotin tag of the ICAT reagents by affinity chromatography. Drawback: cysteine-containing peptides only. [Figure labels: samples A and B, LC-MS-MS, light and heavy peaks along m/z, quantification, identification]

ICAT • Label protein samples with heavy and light reagent • Reagent contains affinity tag and heavy or light isotopes Chemically reactive group: forms a covalent bond to the protein or peptide Isotope-labeled linker: heavy or light, depending on which isotope is used Affinity tag: enables the protein or peptide bearing an ICAT to be isolated by affinity chromatography in a single step Modified from http://skop.genetics.wisc.edu/AhnaMassSpecMethodsTheory.ppt#260,11,Mass Spectrometry

Example of an ICAT Reagent Reactive group: thiol-reactive group will bind to Cys. Linker: heavy version will have deuteriums at *; light version will have hydrogens at *. Biotin affinity tag: binds tightly to streptavidin-agarose resin. [Figure: chemical structure of the reagent with the isotope positions marked *] Modified from http://skop.genetics.wisc.edu/AhnaMassSpecMethodsTheory.ppt#260,11,Mass Spectrometry

Stable-isotope labeling Aebersold and Mann, Nature, 2004

Isobaric tag reagent Isobaric tags for relative and absolute quantification (iTRAQ). Allows us to compare the relative abundance of proteins from four different samples in a single mass spectrometry experiment. Isobaric tag (total mass = 145 Da). Peptide-reactive group: amine specific. Reporter (mass = 114 to 117): gives strong signature ion in MS/MS; good b- and y-series; maintains charge state and ion masses; signature ion masses lie in quiet low-mass region. Balance (mass 31 to 28): balances the mass change of the reporter to maintain a total mass of 145; neutral loss in MS/MS.

iTRAQ Uses up to 4 tag reagents that bind covalently to the N-terminus of the peptide and any lysine side chains at the amine group (global tagging). Each sample set is digested separately and then mixed with its specific iTRAQ tag (balance/reporter pairs 31/114, 30/115, 29/116, 28/117, each coupled as NHS + peptide). The samples are then mixed. MS: reporter, balance and peptide stay intact, so the 4 samples have identical m/z. MS/MS: peptide fragments (b and y ions) are equal, reporter ions are different. Modified from “Quantitative Proteomics Using Isotope Tagging of Peptides” by Kathryn Lilley
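The isobaric design can be checked with a little arithmetic: every reporter/balance pair sums to the same total, so labeled peptides coincide in MS, and the reporter ions in MS/MS carry the quantitative information. The sketch below uses the nominal masses from the slides; the intensity values and the choice of channel 114 as reference are hypothetical.

```python
# Nominal reporter -> balance masses (Da), as listed on the slides.
REPORTER_TO_BALANCE = {114: 31, 115: 30, 116: 29, 117: 28}

def tag_total_mass(reporter):
    """Reporter plus balance mass for one iTRAQ channel."""
    return reporter + REPORTER_TO_BALANCE[reporter]

def relative_abundance(reporter_intensities, reference=114):
    """Express each reporter-ion intensity relative to a reference channel."""
    ref = reporter_intensities[reference]
    if ref <= 0:
        raise ValueError("reference channel has no signal")
    return {channel: value / ref for channel, value in reporter_intensities.items()}

# Hypothetical reporter-ion intensities read from one MS/MS spectrum.
intensities = {114: 1000.0, 115: 2000.0, 116: 500.0, 117: 1000.0}
ratios = relative_abundance(intensities)

print({r: tag_total_mass(r) for r in REPORTER_TO_BALANCE})
print(ratios)
```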

iTRAQ spectrum

Stable isotope labeling in cell culture (SILAC) In vivo labeling in cell culture through amino acid metabolism. 1. Cell culture with normal arginine. 2. Cell culture plus “heavy” arginine. Combine, digest, (purification), LC-MS/MS. Quantify levels from the ratio of the light and heavy peaks along m/z. Steen & Mann, Nature, 2004

SILAC Example Ratio ~4:1. A peak spacing of 4 on the m/z axis for a +2 ion corresponds to a mass difference of 8 Da (Lys). From presentation by: Nicholas E. Sherman, Ph.D. http://www.healthsystem.virginia.edu/internet/biomolec/Keck_Dec12_2006.ppt#387,15,Slide 15
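The arithmetic behind this example is that a mass spectrometer measures m/z, so a label mass difference appears divided by the charge of the ion. A minimal sketch, using the Lys (+8 Da) and Arg (+10 Da) shifts quoted in the slides; the two peak intensities are hypothetical values chosen to give a 4:1 ratio.

```python
# Label mass shifts (Da) quoted in the slides for heavy Lys and Arg.
LABEL_SHIFT_DA = {"Lys": 8.0, "Arg": 10.0}

def mz_spacing(label_shift_da, charge):
    """Spacing between light and heavy peaks on the m/z axis."""
    return label_shift_da / charge

def label_shift(mz_gap, charge):
    """Mass difference (Da) implied by an observed m/z gap at a given charge."""
    return mz_gap * charge

def peak_ratio(intensity_1, intensity_2):
    """Ratio of two peak intensities, used as the relative protein level."""
    return intensity_1 / intensity_2

print(mz_spacing(LABEL_SHIFT_DA["Lys"], 2))   # doubly charged Lys-labeled peptide
print(label_shift(4.0, 2))                    # gap of 4 at charge +2
print(peak_ratio(4000.0, 1000.0))             # hypothetical intensities
```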

SELDI Surface Enhanced Laser Desorption Ionization Ionized proteins are detected and their mass accurately determined by Time-of-Flight Mass Spectrometry High throughput Small amounts of sample More reproducible than 2DE, but lower resolving power Applied for the analysis of crude samples Process is not standardized

The SELDI-chip Chemical surfaces: hydrophobic, anionic, cationic, metal ion, normal phase. Biological surfaces (PS10 or PS20): antibody-antigen, receptor-ligand, DNA-protein.

Antibody arrays Not discovery based Must have 1 or 2 specific high affinity antibodies Very high throughput Can be highly quantitative - relative and absolute Can design reagents to detect PTMs, splice forms

Antibody array Forward phase: antibody immobilized on glass substrate. Sandwich assay: detection with 2nd antibody. Direct assay: detection with labeled analyte. Reverse phase: analytes immobilized on glass substrate; detection with labeled antibody. Modified from slide; FullMoonBiosystemsInc. (http://www.fullmoonbio.com/Doc/Overview.pdf)

Protein-Protein Interactions From single proteins to systems biology

Protein-Protein Interactions Proteins “work together”, forming multiprotein complexes to carry out specific functions

Identification of interactions • Experimental: x-ray crystallography; NMR spectroscopy; mass spectrometry (tandem affinity purification); immunoprecipitation; yeast two-hybrid; microarrays • Computational, genomic data: phylogenetic profiling; gene context; gene fusion; symmetric evolution • Computational, structural data: sequence profile; 3D structural distance matrix; surface patches; binding interactions

X-ray crystallography Crystals hard to obtain. Good for large proteins. Modified from presentation; Bioinformatics center, University of Copenhagen: http://www.biosys.dk/courses/Previous_courses/Introductory_Bioinformatics/protein_structure.pdf

Nuclear Magnetic Resonance (NMR) Spectroscopy Multidimensional NMR. For proteins in solution. Better for small proteins than large ones.

Protein complex identification by mass spectrometry [Figure labels: immunoprecipitate (anti-); SDS-PAGE, MALDI-TOF; peptide mixture, LC-MS-MS “shotgun” identification]

Immunoprecipitation Immunoprecipitation of a protein of interest, analyzed by 1D SDS-PAGE. After electrophoretic transfer to a membrane, the membrane is probed with antibodies against proteins suspected to be partners of the target protein (Western blot). Only detects what one sets out to look for; other partners stay undetected. Obtaining a suitable antibody is important. The antibody might immunoprecipitate the protein successfully, but not when other interacting proteins are present. [Figure labels: protein complex, immunoprecipitation, SDS-PAGE, Western blot]

Yeast Two-Hybrid System A transcription factor is split into 2 domains and two hybrid proteins are designed. One protein of interest (bait) is typically fused to a DNA-binding domain. The proteins being screened for interactions with the bait (preys) are fused to a transcription-activating domain. An interaction between the bait and a prey will bring these 2 domains close together, which in turn results in the transcription of a reporter gene. The reporter can be essential, in which case the colony dies if there is no interaction; alternatively, the reporter gene can be attached to a green fluorescent protein. The false-positive rate is high (estimated > 45%). [Figure labels: bait protein, binding domain, prey protein, activation domain, promoter region, reporter gene, mRNA]

Microarray co-expression Microarray: study the expression of genes as a function of time, or following treatment with a drug, … Co-expression of two genes is often a sign that their proteins interact. [Figure: expression level of Gene A and Gene B plotted against time or treatment]

Identification of Co-expressed Genes To determine which genes have similar/correlated expression patterns, in order to derive their functional relationships • Data clustering • We can represent each gene as a vector, e.g. (5, 15, 10, 7, 5, 3) • So a set of expression data can be represented as a collection of data points in K-dimensional space • Genes with similar expression patterns form data clusters
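The clustering idea can be sketched with Pearson correlation as the similarity between expression vectors and a simple single-linkage grouping. This is only one possible choice: the slides do not name a specific clustering method, the 0.9 threshold is arbitrary, and apart from the vector (5, 15, 10, 7, 5, 3) the gene names and values are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equally long expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / (sd_x * sd_y)

def coexpression_clusters(profiles, threshold=0.9):
    """Group genes that are linked, directly or via other genes, by r >= threshold."""
    genes = list(profiles)
    unassigned = set(genes)
    clusters = []
    for gene in genes:
        if gene not in unassigned:
            continue
        unassigned.discard(gene)
        cluster, stack = [gene], [gene]
        while stack:
            current = stack.pop()
            linked = [g for g in genes if g in unassigned
                      and pearson(profiles[current], profiles[g]) >= threshold]
            for g in linked:
                unassigned.discard(g)
                cluster.append(g)
                stack.append(g)
        clusters.append(sorted(cluster))
    return clusters

profiles = {
    "geneA": [5, 15, 10, 7, 5, 3],     # vector from the slide
    "geneB": [10, 30, 20, 14, 10, 6],  # hypothetical: same shape, doubled
    "geneC": [3, 5, 7, 10, 15, 5],     # hypothetical: different time course
}
clusters = coexpression_clusters(profiles)
print(clusters)
```

Correlation ignores the overall expression level, so geneB groups with geneA even though its values are twice as large.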

In silico Prediction of PPI Phylogenetic Profile The phylogenetic profile of a protein is a string that encodes the presence or absence of the protein in every sequenced genome. Conserved presence or absence of a protein pair suggests functional coupling. • Phylogenetic profile (against N genomes): • For each gene X in a target genome: if gene X has a homolog in genome #i, the ith bit of X’s phylogenetic profile is “1”, otherwise it is “0”
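The profile construction described above can be sketched directly. The homolog table below is hypothetical (in practice it would come from a homology search against each genome), and the Hamming distance is just one simple way to compare two profiles.

```python
def phylogenetic_profile(gene, genomes, homolog_table):
    """Bit string: '1' if the gene has a homolog in genome i, else '0'."""
    present_in = homolog_table.get(gene, set())
    return "".join("1" if genome in present_in else "0" for genome in genomes)

def hamming_distance(profile_x, profile_y):
    """Number of genomes in which the two profiles disagree."""
    return sum(a != b for a, b in zip(profile_x, profile_y))

genomes = ["org1", "org2", "org3", "org4"]

# Hypothetical homolog table: gene -> genomes that contain a homolog.
homolog_table = {
    "A": {"org1", "org3", "org4"},
    "B": {"org1", "org3", "org4"},
    "C": {"org2", "org3"},
}

profiles = {g: phylogenetic_profile(g, genomes, homolog_table) for g in homolog_table}
print(profiles)
print(hamming_distance(profiles["A"], profiles["B"]))  # identical profiles
print(hamming_distance(profiles["A"], profiles["C"]))
```

Genes A and B share the same profile, which under the slide's reasoning suggests functional coupling; gene C does not.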

In silico Prediction of PPI • Gene Context • Conserved gene neighbourhood suggests position-function coupling • Gene Fusion (Rosetta stone) • Seemingly unrelated proteins are sometimes found fused in another organism. Though gene fusion has low prediction coverage, its false-positive rate is low. [Figure labels: Proteins A, B and C in Org 1 to Org 4; A and B in Org 1 and Org 2]

In silico Prediction of PPI Symmetric Evolution: interaction positions on different proteins should co-evolve so as to maintain the interface. Look for correlation between sequence changes at one position and those at another position in a multiple sequence alignment. Docking: determination of protein complex structure from individual protein structures.
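One way to look for correlated changes between two alignment positions is mutual information between the columns. This is an assumption made for illustration; the slides do not say which correlation measure is used. The toy columns below are hypothetical, with row i of each column taken from the same organism.

```python
from collections import Counter
from math import log2

def mutual_information(column_a, column_b):
    """Mutual information (bits) between two paired alignment columns."""
    n = len(column_a)
    count_a = Counter(column_a)
    count_b = Counter(column_b)
    count_ab = Counter(zip(column_a, column_b))
    mi = 0.0
    for (a, b), count in count_ab.items():
        p_ab = count / n
        p_a, p_b = count_a[a] / n, count_b[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi

# Hypothetical columns: one residue per organism, same organism order in each.
position_in_protein_1 = list("AAVV")
covarying_position = list("DDEE")    # changes whenever protein 1 changes
independent_position = list("KRKR")  # changes regardless of protein 1

print(mutual_information(position_in_protein_1, covarying_position))
print(mutual_information(position_in_protein_1, independent_position))
```

With so few sequences the estimate is only illustrative; real alignments also need correction for shared phylogeny, which this sketch ignores.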

Structure and interaction databases: STRING (EMBL), BOND (Unleashed Informatics), DIP (UCLA), iHOP

STRING http://string.embl.de

BOND Biomolecular Object Network Databank http://bond.unleashedinformatics.com

Database of Interacting Proteins The DIP database catalogs experimentally determined interactions between proteins. It combines information from a variety of sources to create a single, consistent set of protein-protein interactions. http://dip.doe-mbi.ucla.edu/

iHOP http://www.ihop-net.org/UniPub/iHOP/

Proteomics in human diseases

Identification of diagnostic proteomic patterns: fingerprinting of bladder cancer. Protein extract from blood/urine. Combination of LC and MALDI-TOF/TOF. Application of bioinformatics tools (feature extraction, classification algorithms). Disease classification: bladder cancer vs. benign. [Figure: mass spectrometer schematic with laser, ions (+) and flight tube]
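The classification step can be illustrated with a nearest-centroid classifier on peak-intensity features. The slides do not specify which classification algorithm is applied, so this is a simple stand-in, and all feature vectors below are hypothetical.

```python
def centroid(vectors):
    """Component-wise mean of a list of equally long feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def train(labeled_spectra):
    """Map each class label to the centroid of its training spectra."""
    return {label: centroid(vectors) for label, vectors in labeled_spectra.items()}

def classify(model, features):
    """Assign the label whose centroid is closest to the feature vector."""
    return min(model, key=lambda label: squared_distance(model[label], features))

# Hypothetical intensities of three extracted peaks per sample.
training = {
    "cancer": [[9.0, 1.0, 4.0], [8.0, 2.0, 5.0]],
    "benign": [[2.0, 7.0, 4.0], [3.0, 8.0, 5.0]],
}
model = train(training)

print(classify(model, [7.5, 2.5, 4.0]))
print(classify(model, [3.0, 6.0, 5.0]))
```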

Strategy for Biomarker Discovery Genomic analysis (mRNA level), disease vs. normal. Proteomic analysis (2D gels / MS). Discovery: candidate gene. Validation: in situ hybridization, immunohistochemistry. Application: large # samples, small # candidates. Clinical application: diagnostic, prognostic, therapeutic.

Proteins as biomarkers The protein composition may be associated with disease processes in the organism and thus have potential utility as diagnostic markers. Proteins are closer to the actual disease process, in most cases, than parent genes. Proteins are ultimate regulators of cellular function. Most cancer markers are proteins. The vast majority of drug targets are proteins. Individual biomarkers are not sufficient for accurate disease detection. A panel of biomarkers should be established.

Benefits of Molecular Diagnostics • Create new cancer screening tools • Inform design of new treatments • Monitor treatment effectiveness • Predict patient’s response to treatment [Figure labels: patient’s blood sample, proteins, MS, ovarian pattern]

From known samples to serum proteins: patterns as screening tool. Early diagnosis of disease. Early warning of toxicity. [Figure labels: cancer and no-cancer samples, proteins, MS, protein patterns]

Proteomics in nutrition of food Development of fingerprinting techniques to identify changes in modified organisms at different integration levels (2D gels, MALDI-MS).
