Late fusion methods and performance metrics for the effective prioritization of drug candidates

Presentation Transcript

  1. Late fusion methods and performance metrics for the effective prioritization of drug candidates Author: Gábor Csizmadia Supervisor: Péter Antal

  2. Abstract • There are many ways to assess similarity between compounds: structure-based, chemical property-based, biological effect-based, literature-based • Combining different methods should give more accurate results -> data fusion • Implemented software: rank and score fusion methods, performance metrics

  3. Overview • Drug prioritization • Data fusion approaches • Rank/score fusion • Performance metrics • Implemented software • Future plans

  4. Drug prioritization • Start from a list of known active compounds for a specific condition • Assess the similarity of other compounds to the known actives • Predict which compounds in the unknown set are active

  5. Data fusion approaches • Early: data vectors are concatenated • Intermediate: similarity matrices are combined • Late: rankings or scorings are combined -> rank and score fusion

  6. Rank and score fusion Learning to rank Rank fusion methods: • Borda fusion • Rank vote • Pareto ranking • Parallel selection Score fusion methods: • Sum score

  7. Borda fusion • Each ranking assigns a certain number of points to the ranked compounds based on their rank • The points are then summed to get the score of each compound
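The Borda scheme above can be sketched in a few lines of Java. This is only an illustration, not the presented software: the class name `BordaFusionSketch`, the point scheme (n points for the top compound of an n-element ranking, 1 for the last) and the alphabetical tie-breaking are assumptions, since the slide does not specify them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of Borda fusion; names and point scheme are assumptions.
final class BordaFusionSketch {

    /** Sums Borda points over all rankings; each ranking lists compound ids best-first. */
    static Map<String, Integer> bordaScores(List<List<String>> rankings) {
        Map<String, Integer> points = new TreeMap<>();
        for (List<String> ranking : rankings) {
            int n = ranking.size();
            for (int pos = 0; pos < n; pos++) {
                // Assumed scheme: n points for the top compound, 1 for the last one.
                points.merge(ranking.get(pos), n - pos, Integer::sum);
            }
        }
        return points;
    }

    /** Orders compounds by descending total points; ties are broken alphabetically. */
    static List<String> fuse(List<List<String>> rankings) {
        Map<String, Integer> points = bordaScores(rankings);
        List<String> fused = new ArrayList<>(points.keySet());
        fused.sort((a, b) -> {
            int byPoints = Integer.compare(points.get(b), points.get(a));
            return byPoints != 0 ? byPoints : a.compareTo(b);
        });
        return fused;
    }

    public static void main(String[] args) {
        List<List<String>> rankings = List.of(
                List.of("A", "B", "C", "D"),
                List.of("B", "A", "D", "C"),
                List.of("B", "C", "A", "D"));
        System.out.println(bordaScores(rankings)); // {A=9, B=11, C=6, D=4}
        System.out.println(fuse(rankings));        // [B, A, C, D]
    }
}
```

Compounds missing from a ranking simply receive no points from it here; other conventions for incomplete rankings are possible.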

  8. Rank vote • Each ranking votes for its top n compounds • The ranking is based on how many votes a compound received
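A minimal sketch of rank vote, under stated assumptions: rankings are lists of compound ids ordered best-first, `topN` is the number of compounds each ranking votes for, and ties in the vote count are broken alphabetically. The class name `RankVoteSketch` and the tie-breaking rule are illustrative choices, not taken from the presentation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of rank vote fusion; names and tie-breaking are assumptions.
final class RankVoteSketch {

    /** Counts, for every compound, how many rankings place it among their top N. */
    static Map<String, Integer> countVotes(List<List<String>> rankings, int topN) {
        Map<String, Integer> votes = new TreeMap<>();
        for (List<String> ranking : rankings) {
            for (int pos = 0; pos < ranking.size(); pos++) {
                // Every compound gets an entry; only top-N positions add a vote.
                votes.merge(ranking.get(pos), pos < topN ? 1 : 0, Integer::sum);
            }
        }
        return votes;
    }

    /** Orders compounds by descending vote count. */
    static List<String> fuse(List<List<String>> rankings, int topN) {
        Map<String, Integer> votes = countVotes(rankings, topN);
        // TreeMap keys are alphabetical and the sort is stable,
        // so compounds with equal votes stay in alphabetical order.
        List<String> fused = new ArrayList<>(votes.keySet());
        fused.sort((a, b) -> Integer.compare(votes.get(b), votes.get(a)));
        return fused;
    }

    public static void main(String[] args) {
        List<List<String>> rankings = List.of(
                List.of("A", "B", "C", "D"),
                List.of("B", "A", "D", "C"),
                List.of("B", "C", "A", "D"));
        System.out.println(countVotes(rankings, 2)); // {A=2, B=3, C=1, D=0}
        System.out.println(fuse(rankings, 2));       // [B, A, C, D]
    }
}
```

Because many compounds can receive the same number of votes, the choice of tie-breaking rule can matter noticeably for this method.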

  9. Pareto ranking • Each compound is ranked by the number of compounds that are ranked better than it in all rankings • The fewer such compounds, the better the fused rank
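The Pareto count can be sketched as follows. The sketch assumes that every ranking contains the same set of compounds, that rankings are ordered best-first, and that ties in the count are broken alphabetically; the class name `ParetoRankingSketch` is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of Pareto ranking; assumes all rankings cover the same compounds.
final class ParetoRankingSketch {

    /** For each compound, counts the compounds ranked better than it in every ranking. */
    static Map<String, Integer> dominatorCounts(List<List<String>> rankings) {
        // Position lookup per ranking: compound id -> 0-based position.
        List<Map<String, Integer>> positions = new ArrayList<>();
        for (List<String> ranking : rankings) {
            Map<String, Integer> pos = new HashMap<>();
            for (int i = 0; i < ranking.size(); i++) {
                pos.put(ranking.get(i), i);
            }
            positions.add(pos);
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (String c : rankings.get(0)) {
            int dominators = 0;
            for (String d : rankings.get(0)) {
                if (betterEverywhere(d, c, positions)) {
                    dominators++;
                }
            }
            counts.put(c, dominators);
        }
        return counts;
    }

    /** True if d has a strictly smaller (better) position than c in every ranking. */
    static boolean betterEverywhere(String d, String c, List<Map<String, Integer>> positions) {
        for (Map<String, Integer> pos : positions) {
            if (pos.get(d) >= pos.get(c)) {
                return false;
            }
        }
        return true;
    }

    /** Orders compounds by ascending dominator count; equal counts stay alphabetical. */
    static List<String> fuse(List<List<String>> rankings) {
        Map<String, Integer> counts = dominatorCounts(rankings);
        List<String> fused = new ArrayList<>(counts.keySet());
        fused.sort((a, b) -> Integer.compare(counts.get(a), counts.get(b)));
        return fused;
    }

    public static void main(String[] args) {
        List<List<String>> rankings = List.of(
                List.of("A", "B", "C", "D"),
                List.of("B", "A", "D", "C"),
                List.of("B", "C", "A", "D"));
        System.out.println(dominatorCounts(rankings)); // {A=0, B=0, C=1, D=2}
        System.out.println(fuse(rankings));            // [A, B, C, D]
    }
}
```

In the example, A and B each beat the other in at least one ranking, so neither is counted against the other and both get a count of 0.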

  10. Parallel selection • Compounds are selected from each ranking in turn • If a compound that would be selected has already been selected before, the next compound from that ranking is selected instead
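Parallel selection is essentially a round-robin over the rankings. A minimal sketch, assuming rankings are lists of compound ids ordered best-first and that the rankings are visited in the order given (the class name `ParallelSelectionSketch` is hypothetical):

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of parallel selection; visiting order of rankings is an assumption.
final class ParallelSelectionSketch {

    /** Takes the best not-yet-selected compound from each ranking in turn. */
    static List<String> fuse(List<List<String>> rankings) {
        Set<String> selected = new LinkedHashSet<>(); // keeps selection order
        int[] next = new int[rankings.size()];        // next candidate index per ranking
        boolean progressed = true;
        while (progressed) {
            progressed = false;
            for (int r = 0; r < rankings.size(); r++) {
                List<String> ranking = rankings.get(r);
                // Skip compounds that an earlier turn has already selected.
                while (next[r] < ranking.size() && selected.contains(ranking.get(next[r]))) {
                    next[r]++;
                }
                if (next[r] < ranking.size()) {
                    selected.add(ranking.get(next[r]));
                    next[r]++;
                    progressed = true;
                }
            }
        }
        return new ArrayList<>(selected);
    }

    public static void main(String[] args) {
        List<List<String>> rankings = List.of(
                List.of("A", "B", "C", "D"),
                List.of("D", "A", "B", "C"));
        System.out.println(fuse(rankings)); // [A, D, B, C]
    }
}
```

The result depends on the order in which the rankings are visited, so the first ranking has a slight advantage in this sketch.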

  11. Sum score The normalized scores of each ranking are summed to get the fused score of a compound
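A sketch of sum score fusion follows. The slide does not say which normalization is used, so min-max scaling to [0, 1] is assumed here; it is also assumed that higher scores are better, that each source is non-empty, and that a compound missing from a source contributes nothing from that source. The class name `SumScoreSketch` is hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of sum score fusion; min-max normalization is an assumption.
final class SumScoreSketch {

    /** Rescales the scores of one source to [0, 1]; the source must be non-empty. */
    static Map<String, Double> minMaxNormalize(Map<String, Double> scores) {
        double min = Collections.min(scores.values());
        double max = Collections.max(scores.values());
        Map<String, Double> normalized = new HashMap<>();
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            // If all scores are equal, the source carries no information: use 0.
            double value = max > min ? (e.getValue() - min) / (max - min) : 0.0;
            normalized.put(e.getKey(), value);
        }
        return normalized;
    }

    /** Sums the normalized scores of every source per compound. */
    static Map<String, Double> fuse(List<Map<String, Double>> sources) {
        Map<String, Double> fused = new TreeMap<>();
        for (Map<String, Double> source : sources) {
            minMaxNormalize(source).forEach((id, v) -> fused.merge(id, v, Double::sum));
        }
        return fused;
    }

    public static void main(String[] args) {
        List<Map<String, Double>> sources = List.of(
                Map.of("A", 10.0, "B", 5.0, "C", 0.0),
                Map.of("A", 2.0, "B", 6.0, "C", 4.0));
        System.out.println(fuse(sources)); // {A=1.0, B=1.5, C=0.5}
    }
}
```

Sorting the compounds by descending fused score then gives the fused ranking (B, A, C in the example).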

  12. Performance metrics 1. The performance of a ranking (how early it ranks actives) can be measured in various ways, for example by the area under the curve (AUC) of the following curves: • AC (Accumulation Curve): plots the true positive rate as a function of the fraction of data classified as positive • ROC (Receiver Operating Characteristic): plots the true positive rate as a function of the false positive rate • CAC (Concentrated AC) • CROC (Concentrated ROC) [Figure: ROC curve, source: Wikipedia]
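The two basic AUC values can be computed directly from a ranking and the set of known actives. The sketch below assumes a ranking without ties, ordered best-first, that contains at least one active and one inactive compound; the AC area uses the trapezoidal rule. The class name `AucSketch` and the method names are hypothetical and do not mirror the `Metric` interface of the presented software.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch of AUC computation for ROC and accumulation curves.
final class AucSketch {

    private static int countActives(List<String> ranking, Set<String> actives) {
        int positives = 0;
        for (String id : ranking) {
            if (actives.contains(id)) {
                positives++;
            }
        }
        return positives;
    }

    /** Area under the ROC curve (true positive rate vs. false positive rate). */
    static double rocAuc(List<String> ranking, Set<String> actives) {
        int positives = countActives(ranking, actives);
        int negatives = ranking.size() - positives;
        if (positives == 0 || negatives == 0) {
            throw new IllegalArgumentException("need at least one active and one inactive");
        }
        long tp = 0;
        long area = 0;
        for (String id : ranking) {
            if (actives.contains(id)) {
                tp++;       // step up
            } else {
                area += tp; // step right: strip of height tp, width 1 (unnormalized)
            }
        }
        return (double) area / ((double) positives * negatives);
    }

    /** Area under the accumulation curve (true positive rate vs. fraction selected). */
    static double acAuc(List<String> ranking, Set<String> actives) {
        int positives = countActives(ranking, actives);
        if (positives == 0) {
            throw new IllegalArgumentException("need at least one active");
        }
        int tp = 0;
        double area = 0.0;
        for (String id : ranking) {
            int before = tp;
            if (actives.contains(id)) {
                tp++;
            }
            area += (before + tp) / 2.0; // trapezoid of width 1 (unnormalized)
        }
        return area / ((double) ranking.size() * positives);
    }

    public static void main(String[] args) {
        List<String> ranking = List.of("A", "B", "C", "D", "E");
        Set<String> actives = Set.of("A", "C");
        System.out.println(rocAuc(ranking, actives)); // 5/6, about 0.833
        System.out.println(acAuc(ranking, actives));  // about 0.7
    }
}
```

In the example, the ROC AUC of 5/6 equals the fraction of (active, inactive) pairs in which the active is ranked first: A precedes B, D and E, and C precedes D and E, which makes 5 of 6 pairs.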

  13. Performance metrics 2.

  14. Implemented software • Java language • Command-line interface • 2 modules: fuser (12 classes), performance tester (13 classes + 2 interfaces) • Dedicated class for scored rankings: Ranking • Common interface for all fusion methods: Fuser • Common interface for all metrics: Metric • Usage: java fusiontester.Main [type] [r1path] [r1ms] [r2path] [r2ms] ... • Usage: java performancetester.Main [type] [rankingpath] [activespath]

  15. Future plans • better handling of incomplete data • testing effects of noise • consider statistical significance of sources • ... • (TDK)

  16. References • Bolgár Bence Márton. Kernel fúziós módszerek alkalmazása a genomikai kísérlettervezésben és adatelemzésben [Application of kernel fusion methods in genomic experimental design and data analysis]. 2012. • Fredrik Svensson, Anders Karlén, and Christian Sköld. Virtual Screening Data Fusion Using Both Structure- and Ligand-based Methods. J. Chem. Inf. Model. 2012, 52, 225-232. • S. Joshua Swamidass, Chloé-Agathe Azencott, Kenny Daily, and Pierre Baldi. A CROC stronger than ROC: measuring, visualizing and optimizing early retrieval. Advance Access publication April 7, 2010. • Jean-François Truchon and Christopher I. Bayly. Evaluating Virtual Screening Methods: Good and Bad Metrics for the "Early Recognition" Problem. J. Chem. Inf. Model. 2007, 47, 488-508.