
Sho Murakami, Takuya Yoshihiro, Etsuko Inoue and Masaru Nakagawa

Predicting Combinatorial Protein-Protein Interactions from Protein Expression Data Based on Correlation Coefficient. Sho Murakami, Takuya Yoshihiro, Etsuko Inoue and Masaru Nakagawa, Faculty of Systems Engineering, Wakayama University.





Presentation Transcript


  1. Predicting Combinatorial Protein-Protein Interactions from Protein Expression Data Based on Correlation Coefficient. Sho Murakami, Takuya Yoshihiro, Etsuko Inoue and Masaru Nakagawa. Faculty of Systems Engineering, Wakayama University

  2. Agenda • Background • Combinatorial Protein-Protein Interactions • The Proposed Data Mining Method • Evaluation • Conclusion

  3. Background • Finding interactions among genes/proteins is important. • Many data-mining algorithms for discovering gene-gene (or protein-protein) interactions have been proposed so far. • One of the main data sources is gene or protein expression data. [Figure: microarray (for gene expression), where color strength is the expression level; 2D electrophoresis (for protein expression), where spot size is the expression level.]

  4. Related Work for Interaction Discovery • Bayesian networks • Discover interactions from expression data based on conditional probabilities among events. Ex. To discover protein-protein interactions among proteins A, B and C: 1. Define events A, B and C (e.g., the event “C is expressed”). 2. Compute the conditional probabilities relating A, B and C; if they are high, an interaction is predicted. [Figure: network over A, B and C, with samples partitioned by the event “C is expressed”.]
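As a rough illustration of the Bayesian-network building block described above, the sketch below binarizes expression levels with a hypothetical threshold and estimates P(C expressed | A and B expressed) by counting samples. The threshold, the helper names and the toy data are assumptions for illustration only; an actual Bayesian-network learner also searches over network structures. The returned support is the number of samples in the conditioning region, which is what becomes too small when few samples are available.

```python
from __future__ import annotations

from typing import Sequence


def binarize(levels: Sequence[float], threshold: float) -> list[bool]:
    """Event "protein is expressed": level above a (hypothetical) threshold."""
    return [x > threshold for x in levels]


def conditional_probability(
    a: Sequence[bool], b: Sequence[bool], c: Sequence[bool]
) -> tuple[float | None, int]:
    """Estimate P(C | A and B) from co-occurrence counts.

    Returns (estimate, support), where support is the number of samples in
    which both A and B are expressed. The estimate is None when support is 0.
    """
    support = 0
    hits = 0
    for ea, eb, ec in zip(a, b, c):
        if ea and eb:
            support += 1
            if ec:
                hits += 1
    if support == 0:
        return None, 0
    return hits / support, support


# Toy expression levels of A, B and C in six samples (illustrative only).
a = binarize([0.9, 0.8, 0.2, 0.7, 0.1, 0.95], threshold=0.5)
b = binarize([0.6, 0.9, 0.8, 0.3, 0.2, 0.7], threshold=0.5)
c = binarize([0.8, 0.7, 0.1, 0.2, 0.3, 0.9], threshold=0.5)
p, n = conditional_probability(a, b, c)
print(p, n)  # 1.0 3: only three samples support the estimate
```

With three proteins already only half of the six samples fall in the conditioning region; each additional conditioning event shrinks it further.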

  5. Problems of Bayesian Networks • Bayesian networks require a large number of samples. • For genes, microarrays provide cheap and fast experiments. • For proteins, 2D electrophoresis is time-consuming and expensive. Ex. To discover protein-protein interactions among proteins A, B and C: 1. Define events A, B and C. 2. Compute the conditional probabilities relating A, B and C. Are there sufficient samples in each region? [Figure: samples partitioned into regions by the events A, B and C.] Many samples are necessary to obtain statistically reliable results.

  6. The Objective of Our Study • Finding combinatorial protein-protein interactions from protein expression data with a small number of samples

  7. Expression Data • 2D electrophoresis is performed for each sample, yielding the expression level of each protein. • Expression levels are obtained by measuring the size of each spot area. • As pre-processing, normalization is applied. [Figure: 2D gels for sample1, sample2, sample3, ...; each black area indicates a protein, and its size represents the expression level.]
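The slides do not say which normalization is applied. A common choice for 2D-gel data, assumed here purely for illustration, is to divide each spot area by the total spot area of its gel, so that differences in the amount of protein loaded per sample cancel out. The sketch stores the data as a samples × proteins matrix (a list of rows); the function name is hypothetical.

```python
from __future__ import annotations


def normalize_by_total(areas: list[list[float]]) -> list[list[float]]:
    """Rescale each sample (row) so that its spot areas sum to 1.

    Assumption: "normalization" means per-gel total-area normalization;
    the original slides do not specify the method.
    """
    normalized = []
    for row in areas:
        total = sum(row)
        if total <= 0:
            raise ValueError("each sample needs a positive total spot area")
        normalized.append([x / total for x in row])
    return normalized


# Rows are samples, columns are proteins (spot areas in arbitrary units).
raw = [
    [10.0, 30.0, 60.0],   # sample1
    [20.0, 60.0, 120.0],  # sample2: same profile, twice the loading
]
print(normalize_by_total(raw))
```

After normalization both samples have the same profile, so the loading difference no longer masquerades as a change in expression.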

  8. Model of Protein-Protein Interaction Considered • Model: two proteins A and B affect the expression level of another protein C only when both A and B are expressed. • Usually, only the sole effects of A and B on C are considered. • Only if both A and B exist does the combinatorial effect act on C (e.g., through a complex of A and B). We want to estimate the combinatorial effect! [Figure: sole effects A → C and B → C versus the effect of the A-B complex on C's expression level.]

  9. Predicting Interactions by Correlation Coefficient • Compute the correlation coefficient between (A,B) and C. • The correlation coefficient requires fewer samples than Bayesian networks. • The amount of the A-B complex is estimated by min(A,B). • The total effect on C is high if the correlation is high. [Figure: for each sample, the expression levels of A and B; min(A,B) is the estimated amount of the A-B complex, which affects C; compute the correlation of min(A,B) and C.]

  10. The Problem of Scale Difference • The expression level corresponding to one molecule differs among proteins, so equal expression levels of A and B do not always combine one-to-one. • Therefore, taking min(A,B) directly does not give the correct amount of complex. • Solution: correct the scale of A, so that taking the min gives the correct amount of complex. [Figure: expression levels of proteins A and B and the level required for one complex; before scaling, the estimated number of complexes is incorrect; after correcting the scale of A, it is correct.]
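A small worked example of the scale problem, with made-up numbers: suppose one molecule of A registers 2 units of expression level while one molecule of B registers 1 unit, and a complex takes one molecule of each. Then min(A, B) on the raw levels overestimates the number of complexes, whereas rescaling A by 1/2 first gives the right count. The per-molecule factors and function names are hypothetical; in practice the factors are unknown, which is why the next slide searches over scales.

```python
def complexes_raw(a_level: float, b_level: float) -> float:
    """Naive estimate: min of the raw expression levels."""
    return min(a_level, b_level)


def complexes_scaled(a_level: float, b_level: float, k: float) -> float:
    """Estimate after rescaling A by k (k converts A's units into B's units)."""
    return min(k * a_level, b_level)


# Hypothetical: A reads 2 units per molecule, B reads 1 unit per molecule.
a_level, b_level = 10.0, 8.0  # i.e. 5 molecules of A, 8 molecules of B
print(complexes_raw(a_level, b_level))          # 8.0, but only 5 complexes can form
print(complexes_scaled(a_level, b_level, 0.5))  # 5.0
```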

  11. How to Determine the Correct Scale? • Select the scale that yields the maximum correlation coefficient between min(A,B) and C. • If an interaction following our model exists, a high correlation value must appear. [Figure: A is scaled by k1, k2, k3, ..., and the correlation of min(A,B) and C is computed for each scale (e.g., 0.1, 0.2, 0.3, 0.7); the maximum is Score S.] • We call this maximum Score S: the total effect of (A,B) on C.

  12. Estimating Combinatorial Effect from Score S • Score S consists of a “sole effect” and a “combinatorial effect”. • Compute Score S’: the score S expected assuming no combinatorial effect, obtained from a statistical distribution. • The difference between S and S’ is the level of combinatorial effect. [Figure: Score S’ reflects only the sole effects A → C and B → C; Score S additionally reflects the combinatorial effect of A and B on C.]

  13. How to Compute the Distribution of Score S’? • Assume that the expression levels of proteins A, B and C follow normal distributions. • Computer simulation yields the distribution of Score S’: ① Randomly generate A, B and C such that the correlation coefficient of A-C is α and that of B-C is β, and repeat the computation of Score S. ② Obtain the distribution of Score S’ (e.g., for α=0.5, β=0.3 and for α=0.5, β=0.4). ③ Create a table of the average and standard deviation for each α and β. [Figure: table indexed by α and β; upper value: average, lower value: stddev.] We can obtain the distribution for each α and β.

  14. Computing Combinatorial Effect as Z-score • Place Score S in the distribution of S’ for the corresponding α and β. • Z-score: the difference between Score S and the average of S’, measured in units of standard deviation: Z-score = (Score S - avg(S’)) / stddev(S’). • The higher the z-score, the stronger the combinatorial effect! [Figure: distribution of Score S’ with its average; Score S lies to the right, and its distance from the average, counted in standard deviations, is the z-score.]
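Given a table of the average and standard deviation of S’ per (α, β), the z-score of one combination is a table lookup followed by the formula above. The sketch assumes the table is keyed by α and β rounded to one decimal, as in the slide 13 examples (α=0.5, β=0.3); the function names and the table values in the usage example are made up.

```python
from __future__ import annotations


def z_score(score: float, mean_null: float, sd_null: float) -> float:
    """Z-score = (Score S - avg(S')) / stddev(S')."""
    if sd_null <= 0:
        raise ValueError("stddev of S' must be positive")
    return (score - mean_null) / sd_null


def lookup_z(score: float, alpha: float, beta: float,
             table: dict[tuple[float, float], tuple[float, float]]) -> float:
    """Use the S' distribution of the nearest tabulated (alpha, beta)."""
    key = (round(alpha, 1), round(beta, 1))
    mean_null, sd_null = table[key]
    return z_score(score, mean_null, sd_null)


# Made-up table entries, for illustration only.
table = {(0.5, 0.3): (0.55, 0.05), (0.5, 0.4): (0.60, 0.05)}
print(lookup_z(0.8, alpha=0.48, beta=0.31, table=table))
```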

  15. Summary of the Proposed Algorithm • ① Try all combinations of A, B and C. • ② For each combination, compute the maximum correlation coefficient over all scales of A and B to obtain Score S. • ③ Compute the z-score from the distribution of S’. • ④ Rank all combinations by z-score. [Figure: expression data flows through the four steps; in the example, the correlation for S’ is 0.3 and Score S is 0.8, giving a z-score of 5.5.]

  16. Evaluation • We apply our method to real expression data: protein expression data of black cattle. • Number of samples: 195; number of proteins: 879. • We search for combinatorial protein-protein interactions using our method.

  17. The Expression Data Follows a Normal Distribution • Using the Jarque-Bera test at a confidence level of 95%, we test whether the expression levels of each protein follow a normal distribution. • Result: the test does not reject normality for 454 of the 879 proteins. • Thus, we use these 454 proteins for evaluation.
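A pure-Python sketch of the Jarque-Bera filter follows (SciPy also provides the test as `scipy.stats.jarque_bera`). It uses the usual asymptotic approximation, under which the statistic follows a chi-squared distribution with 2 degrees of freedom; the authors' exact implementation is not stated, and the function names and test data are illustrative.

```python
from __future__ import annotations

import math
from statistics import NormalDist
from typing import Sequence


def jarque_bera(x: Sequence[float]) -> tuple[float, float]:
    """Jarque-Bera statistic and its asymptotic p-value.

    JB = n/6 * (S^2 + (K - 3)^2 / 4), with sample skewness S and kurtosis K
    computed from biased central moments. Under normality JB is asymptotically
    chi-squared with 2 degrees of freedom, whose upper tail is exp(-JB / 2).
    """
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 samples")
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    if m2 == 0:
        raise ValueError("constant data has no defined skewness or kurtosis")
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skew = m3 / m2**1.5
    kurt = m4 / m2**2
    jb = n / 6.0 * (skew**2 + (kurt - 3.0) ** 2 / 4.0)
    return jb, math.exp(-jb / 2.0)


def normal_proteins(data: dict[str, Sequence[float]], level: float = 0.05) -> list[str]:
    """Keep proteins whose test does not reject normality at `level`."""
    return [name for name, x in data.items() if jarque_bera(x)[1] > level]


n = 195
normal_like = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]  # normal quantiles
skewed = [-math.log(1.0 - (i + 0.5) / n) for i in range(n)]            # exponential quantiles
print(normal_proteins({"P1": normal_like, "P2": skewed}))
```

Note that failing to reject normality is weaker than demonstrating it; the filter simply keeps proteins for which the normal model of slide 13 is not contradicted by the data.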

  18. Results • We found many combinations of proteins that may have a combinatorial effect. • The maximum z-score is 11.0. • Combinations whose z-score exceeds about 5.5 (p-value less than 0.000000019 (= 0.05/454C3)) would have a combinatorial effect at a confidence level of 95%. [Figure: histogram of z-scores; x-axis: z-score, y-axis: number of combinations.]
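The threshold on this slide combines two standard quantities: the upper-tail probability of a standard normal at a given z-score, and a Bonferroni-corrected level, 0.05 divided by the number of tested combinations (taken here, as on the slide, to be 454C3). The sketch below computes both with the Python standard library so that the quoted values can be checked directly; treating the test as one-sided is an assumption.

```python
import math
from statistics import NormalDist


def upper_tail(z: float) -> float:
    """One-sided p-value P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


n_proteins = 454
n_trials = math.comb(n_proteins, 3)            # 454C3
alpha_bonferroni = 0.05 / n_trials             # Bonferroni-corrected level
z_bonferroni = -NormalDist().inv_cdf(alpha_bonferroni)

print(f"P(Z > 5.5)       = {upper_tail(5.5):.3g}")
print(f"454C3            = {n_trials}")
print(f"0.05 / 454C3     = {alpha_bonferroni:.3g}")
print(f"matching z-score = {z_bonferroni:.2f}")
```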

  19. Comparing Z-scores with the Normal Distribution • We compare the histogram with the one expected without any combinatorial effect. • The latter is created by scaling the standard normal distribution by the number of trials (454C3). • This suggests that the data includes a considerable amount of combinatorial effect. [Figure: left, histogram of z-scores obtained from real data; right, distribution of z-scores under the assumption of no combinatorial effect; axes: z-score vs. number of combinations.]
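The reference histogram can be sketched as expected counts per bin: if the z-scores of all 454C3 trials followed a standard normal distribution under the no-effect assumption, each bin would receive the number of trials times the normal probability mass of that bin. The unit-width bins from -6 to 12 and the function name are illustrative choices.

```python
import math
from statistics import NormalDist


def expected_null_counts(bin_edges, n_trials):
    """Expected number of combinations per z-score bin with no combinatorial effect.

    Assumes the z-scores of the n_trials combinations follow N(0, 1) under the
    null, so each bin receives n_trials * (CDF(right) - CDF(left)).
    """
    cdf = NormalDist().cdf
    return [
        n_trials * (cdf(right) - cdf(left))
        for left, right in zip(bin_edges[:-1], bin_edges[1:])
    ]


n_trials = math.comb(454, 3)
edges = [float(z) for z in range(-6, 13)]  # bins of width 1 from -6 to 12
counts = expected_null_counts(edges, n_trials)
for left, count in zip(edges, counts):
    print(f"[{left:5.1f}, {left + 1:5.1f}): {count:14.3f}")
```

Under this null, the expected number of combinations with a z-score of 6 or more is far below one, so a long right tail in the real histogram points to effects beyond chance.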

  20. The Ranking Based on Z-score • The ranking table shows that: • combinations with a low Score S are retrieved; • the same protein tends to appear many times. [Table: ranking by z-score obtained from real data, with columns Rank, A (protein no.), B (protein no.), C (protein no.), correlation of A-C, correlation of B-C, Score S and z-score; entries not recoverable.]
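The second observation, that the same protein tends to appear many times, can be quantified by counting protein occurrences among the top-ranked combinations. The sketch assumes ranking rows shaped as (z-score, Score S, A, B, C), matching the table columns; the function name and the rows in the usage example are made up.

```python
from collections import Counter


def protein_frequency(ranking, top_k):
    """Count how often each protein appears in the top_k ranked combinations.

    Each ranking row is (z, S, A, B, C); proteins are counted in any role.
    """
    counts = Counter()
    for _z, _s, a, b, c in ranking[:top_k]:
        counts.update((a, b, c))
    return counts.most_common()


# Illustrative ranking rows (protein names and scores are made up).
ranking = [
    (9.1, 0.42, "P7", "P12", "P3"),
    (8.7, 0.39, "P7", "P20", "P3"),
    (8.2, 0.35, "P5", "P7", "P9"),
]
print(protein_frequency(ranking, top_k=3))
```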

  21. Conclusion • Summary • We propose a method to estimate the combinatorial effect of three proteins from protein expression data. • Applying the method to real data, we found many combinations that may have a combinatorial effect. • Future work • To confirm the method's reliability, we plan to study whether the combinations found include well-known protein-protein interactions.
