Protein Interaction Module Detection using Matrix-Based Graph Algorithms

Presentation Transcript


  1. Protein Interaction Module Detection using Matrix-Based Graph Algorithms. Chris Ding, Lawrence Berkeley National Laboratory

  2. Bioinformatics & Computational Biology. Computational genomics: molecular biology at the genomic level

  3. Genomics Research • More than 100 genomes' DNA sequenced • DNA microarray chip technology • Protein – protein interaction technology • Gene knock-out for gene regulatory networks • Many high-throughput technologies • Bio-imaging (embryo imaging, EM) • Huge number of databases • GenBank, Protein Data Bank, SCOP, Pfam • Gene Ontology

  4. A Genomics Research Trend • A large number of genomes have been sequenced. • Traditional approach: predict genes, predict proteins, predict structures, predict functions • This structural-genomics approach is inadequate • Protein interactions: a new approach

  5. Protein – Protein Interactions • Proteins carry out tasks together with other proteins • 83% of proteins interact with others • Proteins interact in promoters • Multi-protein complexes (assemblies) • Synergistic interactions • Complex – complex cross-talk • Proteins work in a modular fashion • Gene regulation • Biological pathways: most drugs block certain pathways • Major goal of research: detect protein modules

  6. Protein Interactions (figure: antibody – antigen binding; source: DOE Genome to Life)

  7. Protein Interaction Experiments • Two-hybrid assay • Protein coordination in promoter region • Binary interactions • Captures transient and unstable interactions • Mass spectrometry • TAP-MS: tandem affinity purification • HMS-PCI: high-throughput mass spectrometric protein complex identification • Uses bait proteins • Captures multi-protein complexes • Problems: • Results do not agree • Lots of noise

  8. Tandem Affinity Purification with Mass Spectrometry (TAP-MS) determines the constituents of multi-protein complexes. Many baits are processed simultaneously to obtain many complexes. Gavin, et al. Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature 2002;415(6868):141-147. More reliable technology (Deng, et al)

  9. Protein Interaction Experiments • Different experiments don't agree: their overlap is small (Salwinski and Eisenberg, 2003)

  10. Protein Interactions: a genome has ~5000 proteins, and each interacts with ~5 others.
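These round numbers imply a very sparse interaction graph. Here is a back-of-the-envelope calculation, taking n = 5000 proteins and an average degree of 5 from the slide:

```latex
|E| \approx \frac{n\,\bar d}{2} = \frac{5000 \times 5}{2} = 12{,}500,
\qquad
\rho = \frac{|E|}{\binom{n}{2}} = \frac{12{,}500}{12{,}497{,}500} \approx 1.0 \times 10^{-3}
```

Only about 0.1% of all protein pairs interact. This suggests that sparse adjacency matrices are a natural fit for the matrix-based algorithms below.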

  11. Outline • Protein interaction • Interaction Data • Graph models • Spectral Clustering • Cliques • Bi-cliques • Results

  12. Bipartite Graph Model • Protein complex data: p-nodes are proteins, c-nodes are protein complexes • Protein domain data: p-nodes are proteins, c-nodes are protein domains

  13. Unified Representation of Protein Complex Data • Input: protein complex data as a bipartite protein × complex incidence matrix $B$ • Protein – protein network: $BB^T$ • Protein complex – protein complex network: $B^TB$ (Ding, He, Meraz, Holbrook, Proteins, 2004)
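The following is a minimal sketch of this unified representation. It assumes that B is a 0/1 protein × complex incidence matrix and that the two networks are the products BB^T and B^TB, which is the standard construction for bipartite data. The function name, variable names and toy data are hypothetical.

```python
import numpy as np

def incidence_matrix(complexes, proteins):
    """Build the proteins x complexes 0/1 incidence matrix B from membership lists."""
    index = {p: i for i, p in enumerate(proteins)}
    B = np.zeros((len(proteins), len(complexes)), dtype=int)
    for j, members in enumerate(complexes):
        for p in members:
            B[index[p], j] = 1
    return B

# Hypothetical toy data: three complexes over five proteins.
proteins = ["P1", "P2", "P3", "P4", "P5"]
complexes = [["P1", "P2", "P3"], ["P3", "P4"], ["P2", "P3", "P5"]]
B = incidence_matrix(complexes, proteins)

W_protein = B @ B.T   # entry (i, k): number of complexes shared by proteins i and k
W_complex = B.T @ B   # entry (j, l): number of proteins shared by complexes j and l
```

The diagonal of `W_protein` counts how many complexes each protein belongs to. A graph algorithm may zero it out before use.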

  14. Co-location of Domains: Bridged Bipartite Graph (matrices A and B) • Pfam domains matched to SCOP domains • Matching reaches 90% accuracy compared to direct matching (Zhang, Chandonia, Ding, Holbrook, BMC Bioinformatics 2004)

  15. Protein Interaction Module: densely connected subgraphs

  16. Protein Interaction Modules • Find highly connected regions: • Graph clustering • Cliques • Bi-cliques

  17. Outline • Protein interaction • Interaction Data • Graph models • Spectral Clustering • Cliques • Bi-cliques

  18. Spectral Clustering: MinMaxCut • Minimize between-cluster similarities (weights) • Maximize within-cluster similarities (weights) (Ding, RECOMB'02)

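  19. Spectral Clustering Method (MinMaxCut) • Minimize similarity between A and B: $s(A,B) = \sum_{i \in A} \sum_{j \in B} w_{ij}$ • Maximize similarity within A and B: $s(A,A)$ and $s(B,B)$ • Combined objective: $\min J = \frac{s(A,B)}{s(A,A)} + \frac{s(A,B)}{s(B,B)}$ • Cluster membership indicator: $q_i = a$ if $i \in A$, $q_i = -b$ if $i \in B$ ($a, b > 0$) • Minimizing the relaxed objective leads to $(D - W)q = \lambda D q$, where $D = \mathrm{diag}(W\mathbf{1})$ • Solution given by the eigenvector $q_2$ of the second-smallest eigenvalue • Cluster assignment: split by the sign of $q_2$, i.e. $A = \{i : q_2(i) \ge 0\}$, $B = \{i : q_2(i) < 0\}$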
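Below is a minimal NumPy sketch of the relaxed 2-way cut. It assumes a symmetric, nonnegative weight matrix W with no isolated nodes. The generalized eigenproblem (D - W)q = λDq is solved through the symmetric matrix D^{-1/2} W D^{-1/2}, and nodes are assigned by the sign of the resulting eigenvector. Splitting by sign is the simplest assignment rule. Choosing the best cut point along the sorted q is a common refinement and is not shown. Function names and the toy graph are hypothetical.

```python
import numpy as np

def minmaxcut_objective(W, in_a):
    """J = s(A,B)/s(A,A) + s(A,B)/s(B,B), with s(X,Y) = sum of w_ij over i in X, j in Y."""
    in_b = ~in_a
    s_ab = W[np.ix_(in_a, in_b)].sum()
    s_aa = W[np.ix_(in_a, in_a)].sum()
    s_bb = W[np.ix_(in_b, in_b)].sum()
    return s_ab / s_aa + s_ab / s_bb

def spectral_bipartition(W):
    """Relaxed 2-way cut: q solves (D - W) q = lam D q for the second-smallest lam.

    Computed via the symmetric matrix D^{-1/2} W D^{-1/2}, whose eigenvalues are
    1 - lam, so we take the eigenvector of its second-largest eigenvalue.
    Assumes every node has positive degree.
    """
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    M = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(M)            # eigenvalues in ascending order
    q = d_inv_sqrt * vecs[:, -2]           # map back: q = D^{-1/2} z
    return q >= 0                          # cluster assignment by sign

# Hypothetical example: two 4-node cliques joined by one weak edge (weight 0.1).
W = np.ones((8, 8))
W[:4, 4:] = 0.0
W[4:, :4] = 0.0
np.fill_diagonal(W, 0.0)
W[3, 4] = W[4, 3] = 0.1

in_a = spectral_bipartition(W)
J = minmaxcut_objective(W, in_a)
```

The weak edge is the only between-cluster weight here. The split recovers the two cliques, and J = 0.1/12 + 0.1/12 ≈ 0.0167.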

  20. Graph clustering examples

  21. An NP-hard combinatorial optimization problem can be effectively solved (via continuous relaxation) by a simple eigenvector!

  22. Spectral Clustering • 2-way clustering • K-way clustering • Recursive 2-way clustering • K-way relaxation (K eigenvectors) • Cluster self-aggregation and perturbation analysis • Characteristics • Principled approach • Clear and well-motivated clustering objective functions • Everything is proved rigorously • Based on well-established matrix/algebra theory • A rich framework (clustering, ordering, ranking, etc.) • State-of-the-Art Algorithm

  23. Recursive MinMaxCut Clustering of Lymphoma Tissues (Alizadeh et al, 2000) • B-cell lymphoma goes through different stages • 3 normal stages • 3 cancer stages • Key question: can we detect them automatically? (Ding, RECOMB'02)
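Recursive 2-way clustering can be sketched by applying the relaxed 2-way cut repeatedly to each resulting cluster. This is an illustration under stated assumptions, not the procedure used for the lymphoma data. The rule for choosing which cluster to split next (here, simply the largest) is hypothetical, as are the function names and the toy graph. Subgraphs are assumed to have no isolated nodes.

```python
import numpy as np

def fiedler_split(W):
    """Relaxed 2-way cut: sign of q solving (D - W) q = lam D q for the
    second-smallest lam, computed via the symmetric D^{-1/2} W D^{-1/2}."""
    s = 1.0 / np.sqrt(W.sum(axis=1))         # assumes no isolated nodes
    _, vecs = np.linalg.eigh(s[:, None] * W * s[None, :])
    return (s * vecs[:, -2]) >= 0

def recursive_bipartition(W, k):
    """Split clusters 2-way until there are k of them.
    Assumption: the largest remaining cluster is always split next."""
    clusters = [np.arange(W.shape[0])]
    while len(clusters) < k:
        clusters.sort(key=len)
        idx = clusters.pop()                  # largest cluster
        mask = fiedler_split(W[np.ix_(idx, idx)])
        clusters += [idx[mask], idx[~mask]]
    return clusters

def toy_graph():
    """Hypothetical 12-node graph: two groups joined by a 0.01 edge; each group
    is two triangles joined by a 0.5 edge."""
    W = np.zeros((12, 12))
    for tri in ([0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]):
        for i in tri:
            for j in tri:
                if i != j:
                    W[i, j] = 1.0
    for i, j, w in [(2, 3, 0.5), (8, 9, 0.5), (5, 6, 0.01)]:
        W[i, j] = W[j, i] = w
    return W

clusters = recursive_bipartition(toy_graph(), 3)
```

The first split separates the two weakly joined groups. The second split divides one group into its two triangles, which gives clusters of sizes 6, 3 and 3.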

  24. Gene expression of lymphoma (Stanford) (Ding, RECOMB’02)

  25. Spectral Clustering • 2-way clustering • K-way clustering • Recursive 2-way clustering • K-way relaxation (K eigenvectors) (principled) • Cluster self-aggregation and perturbation analysis • Characteristics • Principled approach • Clear and well-motivated clustering objective functions • Everything is proved rigorously • Based on well-established matrix/algebra theory • A rich framework (clustering, ordering, ranking, etc.) • State-of-the-Art Algorithm

  26. Outline • Protein interaction • Interaction Data • Graph models • Spectral Clustering • Application to computing protein interaction modules • Cliques • Bi-cliques • Results

  27. Clustering Protein Complex Graph • Input: protein complex data as a bipartite protein × complex incidence matrix $B$ • Protein – protein network: $BB^T$ • Protein complex – protein complex network: $B^TB$ (Ding, He, Meraz, Holbrook, Proteins, 2004)

  28. Computed Protein Clusters

  29. Protein Complexes: Experimental vs. Computed

  30. Implications of Discovered Protein Clusters for Protein Interactions: F-statistics • F-statistics of amino acids and physical properties across all protein clusters measure statistical significance • Lys, Arg and Asp are the most significant => electrostatic forces are the dominant surface factors influencing protein interactions • Surprise: secondary structure is not an important factor in protein module formation
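An F-statistic of this kind compares how much a per-protein feature varies between clusters with how much it varies within them. The sketch below assumes the slide's statistic is the standard one-way ANOVA F. The input would be one group of feature values per cluster, for example a hypothetical per-protein Lys fraction. `scipy.stats.f_oneway` computes the same statistic if SciPy is available.

```python
import numpy as np

def one_way_f(groups):
    """One-way ANOVA F: between-cluster vs within-cluster variance of one feature."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)                                   # number of clusters
    n = sum(len(g) for g in groups)                   # number of proteins
    grand_mean = np.concatenate(groups).mean()
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

print(one_way_f([[1, 2, 3], [4, 5, 6]]))   # -> 13.5
```

A large F means the feature separates the clusters well. This is the sense in which Lys, Arg and Asp stand out on the slide.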

  31. Protein Secondary Structure • Alpha helix • Beta sheet • Coil regions

  32. Outline • Protein interaction • Interaction Data • Graph models • Spectral Clustering • Cliques • Bi-cliques • Results

  33. Protein Interaction Modules • Find highly connected regions: • Clique: every node connects to every other node • k-core: subgraph in which every node has degree ≥ k (e.g., in a 3-core, every node connects to at least 3 others)
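A k-core can be computed with the standard peeling procedure: repeatedly delete nodes whose remaining degree is below k. The sketch below assumes an undirected graph given as a dict of neighbor sets. The representation and the function name are hypothetical.

```python
def k_core(adj, k):
    """Return the node set of the k-core of an undirected graph.

    adj maps each node to the set of its neighbors (assumed symmetric).
    Nodes with degree < k are peeled off until every survivor has degree >= k.
    """
    alive = set(adj)
    degree = {v: len(adj[v]) for v in adj}
    stack = [v for v in adj if degree[v] < k]
    while stack:
        v = stack.pop()
        if v not in alive:          # already removed via another path
            continue
        alive.remove(v)
        for u in adj[v]:
            if u in alive:
                degree[u] -= 1      # u lost a neighbor
                if degree[u] < k:
                    stack.append(u)
    return alive

# Hypothetical graph: a 4-clique {1, 2, 3, 4} plus node 5 attached to 1 and 2.
adj = {1: {2, 3, 4, 5}, 2: {1, 3, 4, 5}, 3: {1, 2, 4}, 4: {1, 2, 3}, 5: {1, 2}}
print(k_core(adj, 3))   # -> {1, 2, 3, 4}
```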

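  34. Clique: every node connects to every other node • Motzkin-Straus formalism for computing maximal cliques • Clique computing is NP-hard; even approximating the maximum clique is hard • Motzkin-Straus theorem: $\max_x x^T W x$ subject to $x_i \ge 0$, $\sum_i x_i = 1$ equals $1 - 1/\omega$, where $W$ is the adjacency matrix, $x$ is a vector on all nodes of the graph, and $\omega$ is the size of the maximum clique • The L1 constraint enforces sparsity: the nonzero entries of the optimal $x$ define the clique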
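Here is a worked check of the theorem on a hypothetical graph whose maximum clique is the triangle {1, 2, 3} (ω = 3). W is the 0/1 adjacency matrix with a zero diagonal. Put equal weight on the clique nodes:

```latex
x = \big(\tfrac13, \tfrac13, \tfrac13, 0, \dots, 0\big), \qquad
x^T W x = \sum_{\substack{i, j \in \{1,2,3\} \\ i \ne j}} x_i x_j
        = 6 \cdot \tfrac19 = \tfrac23 = 1 - \tfrac13 = 1 - \tfrac{1}{\omega}
```

The sum runs over the 6 ordered pairs of distinct clique nodes. By the theorem, no other point of the simplex can exceed 2/3. Moving weight onto nodes outside the clique therefore cannot raise the value.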

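  35. Generalized Motzkin-Straus Formalism • Maximize $x^T W x$ over a vector $x \ge 0$ on all nodes of the graph, subject to $\sum_i x_i^{\beta} = 1$ • This L1-type constraint enforces sparsity; the nonzero entries define the clique • Setting $\beta = 1.05$, we can compute the maximum clique better than with the standard approach, $\beta = 1.0$ (Ding, Zhang, Holbrook, 2006)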

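  36. Algorithm for Computing Cliques • Solve the constrained quadratic programming problem (maximize $x^T W x$, $W$ the adjacency matrix, subject to the normalization constraint): initialize $x$, then apply the update rule iteratively • Theorem 1 (correctness): the solution converges to a local maximum • Theorem 2 (convergence): the iterative algorithm converges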
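The slide's update formula did not survive extraction. As a stand-in, the sketch below uses the classical replicator-dynamics iteration x_i ← x_i (Wx)_i / (x^T W x) for the standard, L1-constrained Motzkin-Straus problem. This is a well-known method but not necessarily the authors' update. The function name, parameters and toy graph are hypothetical. The sketch records the objective at every step, so you can observe its monotone increase directly.

```python
import numpy as np

def motzkin_straus_replicator(W, iters=5000, tol=1e-12):
    """Maximize x^T W x over the simplex {x >= 0, sum(x) = 1} by replicator dynamics."""
    n = W.shape[0]
    x = np.full(n, 1.0 / n)                  # start at the barycenter
    history = [x @ W @ x]
    for _ in range(iters):
        Wx = W @ x
        x_new = x * Wx / (x @ Wx)            # stays nonnegative and sums to 1
        history.append(x_new @ W @ x_new)
        converged = np.abs(x_new - x).sum() < tol
        x = x_new
        if converged:
            break
    return x, history

# Hypothetical graph: a 4-clique {0, 1, 2, 3} plus a tail 0 - 4 - 5.
W = np.zeros((6, 6))
for i in range(4):
    for j in range(4):
        if i != j:
            W[i, j] = 1.0
for i, j in [(0, 4), (4, 5)]:
    W[i, j] = W[j, i] = 1.0

x, history = motzkin_straus_replicator(W)
clique = np.flatnonzero(x > 1e-6)            # support of x = the detected clique
```

On this graph the support converges to the 4-clique and the objective approaches 1 - 1/4 = 0.75. From other starting points, replicator dynamics can stop at a maximal (not maximum) clique.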

  37. Proof of Correctness (Constrained Optimization Theory) • Introduce the Lagrangian function • KKT optimality condition (complementary slackness) • Lagrangian multiplier value • At convergence, the update rule satisfies the KKT condition
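The slide does not preserve the generalized constraint, so the correctness argument below is written out for the standard L1-constrained case only. The update assumed here is the classical replicator rule x_i ← x_i (Wx)_i / (x^T W x).

```latex
\begin{aligned}
\text{Lagrangian:}\quad & L(x,\lambda) = x^T W x - \lambda\Big(\textstyle\sum_i x_i - 1\Big), \quad x \ge 0 \\
\text{complementary slackness:}\quad & x_i\,\big[\,2(Wx)_i - \lambda\,\big] = 0 \quad \forall i \\
\text{sum over } i \text{ with } \textstyle\sum_i x_i = 1:\quad & \lambda = 2\,x^T W x \\
\text{fixed point of } x_i \leftarrow \frac{x_i (Wx)_i}{x^T W x}:\quad & x_i\,\big[(Wx)_i - x^T W x\big] = 0 \quad \forall i
\end{aligned}
```

The last line is exactly the complementary slackness condition with the multiplier λ = 2 x^T W x.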

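  38. Proof of Convergence Using an Auxiliary Function (from Machine Learning) • $G(x,x')$ is an auxiliary function of $L(x)$ if $G(x,x') \le L(x)$ and $G(x,x) = L(x)$ • We maximize this lower bound: set $x^{(t+1)} = \arg\max_x G(x, x^{(t)})$ • Then $L(x^{(t)}) = G(x^{(t)}, x^{(t)}) \le G(x^{(t+1)}, x^{(t)}) \le L(x^{(t+1)})$, so $L(x)$ is monotonically increasing and is bounded from above. Thus the algorithm converges.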

  39. Proof of Convergence (cont.) • Key steps: (1) find the auxiliary function; (2) find its global maximum • The first-order derivative of the auxiliary function gives its stationary point; its second-order derivative is negative definite • Thus $G(x,x')$ is concave in $x$, and its global maximum is easily obtained

  40. Partial list of Discovered Cliques

  41. Subunits of the SRP (Signal Recognition Particle) Complex • Clique: Srp19, Srp14, Srp21, Srp54, Srp68, Srp72 • The clique also includes the yeast protein Srp21, which is not found in mammalian SRP; a pre-SRP structure forms in the nucleolus and is translocated to the cytoplasm (Halic et al, 2004)

  42. The Signal Recognition Particle (SRP) helps proteins pass through the ER membrane (figure labels: ribosome; network to transport proteins and lipids; source: Lehninger, Fig. 27-33)

  43. Outline • Protein interaction • Interaction Data • Graph models • Spectral Clustering • Cliques • Bi-cliques • Results

  44. Cliques in a Bipartite Graph • Finding a complete block in the adjacency matrix • Similar to bi-clustering, widely used in bioinformatics • Example: in gene expression profiles, a gene is relevant only for a certain subset of cellular processes, not all processes • Two types of maximal bicliques (R: rows, C: columns): • Maximum node bicliques: max |R|+|C| (perimeter) • Maximum edge bicliques: max |R|*|C| (area)
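Maximum-node and maximum-edge bicliques can differ. The brute-force sketch below enumerates row subsets and, for each subset, takes every column that is 1 in all chosen rows. It keeps the best perimeter and the best area found. Its cost is exponential in the number of rows, so it is only suitable for tiny examples. The function name and the matrix are hypothetical.

```python
from itertools import combinations

import numpy as np

def best_bicliques(B):
    """Brute force over row subsets of a 0/1 matrix B.

    Returns (max_node, max_edge), each a (rows, cols) pair of tuples, maximizing
    |R| + |C| and |R| * |C| respectively over nonempty bicliques.
    """
    B = np.asarray(B, dtype=bool)
    m = B.shape[0]
    best_node, best_edge = None, None
    for r in range(1, m + 1):
        for rows in combinations(range(m), r):
            # all columns that are 1 in every chosen row: the largest C for this R
            cols = tuple(int(c) for c in np.flatnonzero(B[list(rows)].all(axis=0)))
            if not cols:
                continue
            if best_node is None or len(rows) + len(cols) > sum(map(len, best_node)):
                best_node = (rows, cols)
            if best_edge is None or len(rows) * len(cols) > len(best_edge[0]) * len(best_edge[1]):
                best_edge = (rows, cols)
    return best_node, best_edge

# Hypothetical 6 x 3 matrix: a 3 x 3 all-ones block plus three rows that share only column 0.
B = [[1, 1, 1]] * 3 + [[1, 0, 0]] * 3
node, edge = best_bicliques(B)
```

Here the maximum node biclique is the thin strip formed by all 6 rows and column 0 (perimeter 7, area 6). The maximum edge biclique is the 3 × 3 block (perimeter 6, area 9).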

  45. Bicliques in a 2D Dataset

  46. DNA Gene Expression in Lymphoma Cancer (Alizadeh et al, 2000) (figure: genes × tissue samples) • Effects of feature selection: select 900 genes out of 4025 genes

  47. Generalized Motzkin-Straus Theorem for Maximal Edge Biclique • Given a bipartite graph with adjacency matrix $B$, compute the maximal edge biclique • Generalized Motzkin-Straus theorem: maximize $x^T B y$, where $x \ge 0$ is a vector on the row nodes and $y \ge 0$ is a vector on the column nodes, subject to normalization constraints on $x$ and $y$ • The nonzero entries define the biclique (Ding, Zhang, Holbrook, 2006)

  48. Algorithm for Computing Bicliques • Solve the constrained quadratic programming problem by iterating the update rule • Theorem 1: at convergence, the solution satisfies the KKT condition • Theorem 2: $L$ monotonically increases under the update; the algorithm converges

  49. A New Upper Bound on the Size of the Maximum-Edge Biclique • Using the generalized Motzkin-Straus theorem, derive an upper bound in terms of the largest singular value of $B$ (Ding, Li, Jordan, 2007)
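The exact bound from the slide did not survive extraction. The sketch below shows one simple, well-known bound that also uses the largest singular value σ₁(B). It is not necessarily the bound of Ding, Li and Jordan (2007). The bound follows from two facts. An all-ones r × c block has largest singular value √(rc). The spectral norm of any submatrix is at most that of the whole matrix. Together these give rc ≤ σ₁(B)². The function name and the matrix are hypothetical.

```python
import numpy as np

def biclique_edge_bound(B):
    """Upper bound on r*c for any all-ones r x c submatrix (biclique) of a 0/1 matrix B.

    An all-ones r x c block has largest singular value sqrt(r*c), and a submatrix's
    spectral norm never exceeds that of the whole matrix, so r*c <= sigma_1(B)**2.
    """
    sigma1 = np.linalg.norm(np.asarray(B, dtype=float), 2)   # largest singular value
    return sigma1 ** 2

# Hypothetical 6 x 3 matrix whose maximum-edge biclique is rows 0-2 x all columns (area 9).
B = np.array([[1, 1, 1]] * 3 + [[1, 0, 0]] * 3)
bound = biclique_edge_bound(B)   # 6 + sqrt(18), about 10.24, which is >= 9
```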

  50. Biclique Example (figure panels: solution vector x, solution vector y)
