
Solving ILP Problems in the EELA infrastructure



Presentation Transcript


  1. Solving ILP Problems in the EELA infrastructure Inês Dutra Departamento de Ciência de Computadores Universidade do Porto, Portugal

  2. Outline • Introduction • ILP • Examples • Motivation • Experiments • Conclusions • Future Work

  3. Introduction • EELA selected application • Task 3.3: additional applications

  4. Introduction • What is ILP? • It is NOT Instruction Level Parallelism • It is NOT Integer Linear Programming • So, what is it???? • .......

  5. Introduction • It is Inductive Logic Programming • data mining • machine learning • Knowledge/information extraction • Where: • Given: • Set of observations (positive and negative) • Background knowledge (descriptions) • Language bias • Find: • A hypothesis (in a first-order language) that best explains the positive observations while covering none of the negative ones.
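
The "Given / Find" setting above can be illustrated with a minimal Python sketch. It is only an approximation: real ILP systems search over first-order clauses, whereas here a hypothesis is assumed to be a plain Python predicate and the examples are hypothetical dictionaries invented for illustration.

```python
from typing import Callable, Sequence, Tuple

# Assumption: a hypothesis is modelled as a boolean function over one example.
Hypothesis = Callable[[dict], bool]


def coverage(hypothesis: Hypothesis,
             positives: Sequence[dict],
             negatives: Sequence[dict]) -> Tuple[int, int]:
    """Return (positives covered, negatives covered) for a hypothesis."""
    covered_pos = sum(1 for example in positives if hypothesis(example))
    covered_neg = sum(1 for example in negatives if hypothesis(example))
    return covered_pos, covered_neg


def explains_all_and_only_positives(hypothesis: Hypothesis,
                                    positives: Sequence[dict],
                                    negatives: Sequence[dict]) -> bool:
    """True if every positive and no negative example is covered."""
    covered_pos, covered_neg = coverage(hypothesis, positives, negatives)
    return covered_pos == len(positives) and covered_neg == 0


# Hypothetical toy observations (not from the presentation).
positives = [{"legs": 2, "wings": True}, {"legs": 2, "wings": True}]
negatives = [{"legs": 4, "wings": False}, {"legs": 2, "wings": False}]


def candidate(example: dict) -> bool:
    """Candidate hypothesis: the example has wings."""
    return example["wings"]


print(coverage(candidate, positives, negatives))
print(explains_all_and_only_positives(candidate, positives, negatives))
```

In practice a hypothesis rarely covers every positive and no negative, which is why the slide says "best explains": a scoring function over the two coverage counts would replace the strict check.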

  6. Introduction • Advantages: • Use of an understandable description language • Relational knowledge

  7. Introduction: example • Figure: trains going east and trains going west

  8. Introduction: example

  9. Introduction: example • Figure: trains going east and trains going west

  10. Introduction: example • Figure: trains going east and trains going west • Learned rule: eastbound(T) IF has_car(T,C) AND short(C) AND closed(C)
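
To make the eastbound rule concrete, here is a small Python sketch that evaluates it on two trains. The `Car` and `Train` classes and the two sample trains are hypothetical; the slides show the trains only as pictures, so the data below is invented for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Car:
    short: bool
    closed: bool


@dataclass
class Train:
    name: str
    cars: List[Car]


def eastbound(train: Train) -> bool:
    """eastbound(T) IF has_car(T,C) AND short(C) AND closed(C).

    The rule holds if at least one car of the train is both short and closed.
    """
    return any(car.short and car.closed for car in train.cars)


# Hypothetical trains (not taken from the slides).
train_a = Train("a", [Car(short=False, closed=False), Car(short=True, closed=True)])
train_b = Train("b", [Car(short=True, closed=False), Car(short=False, closed=True)])

print(eastbound(train_a))
print(eastbound(train_b))
```

Note that `train_b` has a short car and a closed car, but no single car that is both, so the rule does not fire. This is the relational aspect of the rule: both conditions must hold for the same car C.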

  11. Another less “toyish” example: extracting knowledge from mammograms • Rule: is_malignant(A) if 'BIRADS_category'(A,b5), 'MassPAO'(A,present), 'Age'(A,age6570), previous_finding(A,B,C), 'MassesShape'(B,none), 'Calc_Punctate'(B,notPresent), previous_finding(A,C), 'BIRADS_category'(C,b3). • This rule states that finding (A) IS malignant IF it: • is classified as BI-RADS 5 • AND had a mass present • in a patient who: • was between the ages of 65 and 70 • had two prior mammograms (B, C) • AND prior mammogram (B): • had no mass shape described • had no punctate calcifications • AND prior mammogram (C) was classified as BI-RADS 3

  12. Introduction: Motivation • Applications: • Link discovery • Social Network Analysis • Equivalent identities • Drug design • Protein unfolding • Protein metabolism • Why not? Classifying grid failures • And...many others!

  13. Introduction: Motivation • Why does ILP need a grid? • Search space can become large very quickly • Need many experiments to obtain statistically significant results • Cross-validation • Training, tuning, testing • Can combine classifiers: ensembles

  14. Introduction: Motivation • Assume we want to run a task for one domain: find a “good” hypothesis that describes the positive examples • Assume we run 5x4-fold cross-validation • Assume we have 100 classifiers per fold • Number of experiments: 5 × 4 × 100 = 2,000
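
The count on this slide follows from enumerating every combination of repetition, fold, and classifier. A minimal Python sketch, in which the constant names and the tuple layout are illustrative assumptions rather than anything from the actual experiment scripts:

```python
from itertools import product

REPETITIONS = 5             # the "5" in 5x4-fold cross-validation
FOLDS = 4                   # folds per repetition
CLASSIFIERS_PER_FOLD = 100  # ensemble members trained on each fold

# One tuple (repetition, fold, classifier) per independent experiment.
experiments = list(product(range(REPETITIONS),
                           range(FOLDS),
                           range(CLASSIFIERS_PER_FOLD)))

print(len(experiments))  # 2000
```

Each tuple is independent of the others, which is what makes the workload a natural fit for a grid: every experiment can be submitted as its own job.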

  15. Introduction: Motivation • Now assume each experiment takes 1 hour to run • How long would it take to generate the 2,000 classifiers to be combined, one after another? ~83 days!!! • If we also vary learning parameters and learning algorithms, this number can become really big!!
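
The 83-day figure is 2,000 hours divided by 24. The sketch below reproduces it and shows an idealised estimate for parallel execution. The function name is hypothetical, and the model assumes every worker is always available with no queueing, transfer, or scheduling overhead, which a real grid will not deliver.

```python
import math


def wall_clock_days(n_jobs: int, hours_per_job: float, n_workers: int = 1) -> float:
    """Idealised wall-clock time in days.

    Jobs run in rounds of n_workers at a time; overheads are ignored.
    """
    rounds = math.ceil(n_jobs / n_workers)
    return rounds * hours_per_job / 24


# Sequential execution: 2,000 one-hour jobs on a single machine.
print(round(wall_clock_days(2000, 1.0), 1))  # 83.3

# Idealised parallel execution on 300 workers (7 rounds of one hour each).
print(round(wall_clock_days(2000, 1.0, n_workers=300), 2))  # 0.29
```

The value 300 is used only because a later slide mentions roughly 300 resources in LA; it is not a measured degree of parallelism.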

  16. Experiment • Predict carcinogenicity in rodents • Difficult task • large search space! • Important problem • Phase 1: • Tuning using 5x4-fold cross-validation • Generating ensembles up to 100 • Aleph: well-known ILP system • YAP: Yet Another Prolog
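
One common way to combine the classifiers of an ensemble is a majority vote. The slides do not say which combination scheme was used, so the sketch below is an assumption, and the three toy classifiers and the molecule features are hypothetical.

```python
from collections import Counter
from typing import Callable, Sequence

Classifier = Callable[[dict], bool]


def majority_vote(classifiers: Sequence[Classifier], example: dict) -> bool:
    """Predict True if strictly more classifiers vote True than False."""
    votes = Counter(clf(example) for clf in classifiers)
    return votes[True] > votes[False]


# Hypothetical stand-ins for learned rules (not the rules from the experiment).
classifiers = [
    lambda m: m["weight"] > 100,
    lambda m: m["has_methyl"],
    lambda m: m["charge"] <= -0.401,
]

molecule = {"weight": 120, "has_methyl": True, "charge": 0.2}
print(majority_vote(classifiers, molecule))
```

With this definition a tie counts as a negative prediction; using an odd ensemble size avoids ties altogether.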

  17. Experiment: one of the classifiers • active(A) if atom(A,_,n,32,B), B ≤ -0.401, has_property(A,cytogen_sce,n), methyl(A,_). • cytogen_sce: Sister Chromatid Exchange (SCE) • SCE is used for the determination of mutagenicity

  18. Experiment • 2 submissions: • From LA • From EU

  19. Submitting jobs from LA....

  20. Experiment • EELA resources utilised • ~300 resources in LA • 211 jobs in LA

  21. Experiments • Why 1,969 out of 2,000??? • 2 reasons: • Proxy expiration: • On submission (takes loooooong!!!) • On execution • Use of dynamic libraries

  22. Submitting jobs from EU... • from a non-EELA site, BUT • Using the EELA VO: • Jobs run only on EU resources... • Reasons: • Misconfiguration? • Closer brokers with more machines?

  23. Conclusions • Happiness: EELA is working!!! • We can run thousands of experiments! • Frida is happy!!! (see the Condor introductory tutorials if you are curious about Frida) • Experiment showed good utilization of EELA resources in LA and EU • Low failure rate (1%) • Failures caused by: • Dynamic libs not available on the remote machine • Proxy expiration

  24. Future work • More detailed analysis of jobs and logs • Full ILP experiment • More domains • Other kinds of experiments based on Statistical Relational Learning • And, do not forget: ILP can help to model and diagnose errors in the grid environment!

  25. Collaborators • Fernando Silva (DCC-UPorto) • Vítor Santos Costa (DCC-UPorto) • Rui Camacho (FE-UPorto) • Nuno Fonseca (IBMC/IBMEC, Porto) • Beth Burnside (UW-Madison hospital) • David Page (UW-Madison) • Jesse Davis (UWashington)

  26. Thanks!!! Questions??
