
Kepler/SPA Extensions for Scientific Workflows – Now and Upcoming

UC DAVIS Department of Computer Science. San Diego Supercomputer Center. Kepler/SPA Extensions for Scientific Workflows – Now and Upcoming. Ilkay Altintas, SWAT lead, San Diego Supercomputer Center, altintas@sdsc.edu. Bertram Ludäscher, Dept. of Computer Science & Genome Center


Presentation Transcript


  1. UC DAVIS Department of Computer Science · San Diego Supercomputer Center
     Kepler/SPA Extensions for Scientific Workflows – Now and Upcoming
     Ilkay Altintas, SWAT lead, San Diego Supercomputer Center, altintas@sdsc.edu
     Bertram Ludäscher, Dept. of Computer Science & Genome Center, University of California, Davis, ludaesch@ucdavis.edu
     + many other SDM/SPA & Kepler contributors!

  2. KEPLER/CSP: Contributors, Sponsors, Projects
     • Ilkay Altintas (SDM, NLADR, Resurgence, EOL, …)
     • Kim Baldridge (Resurgence, NMI)
     • Chad Berkley (SEEK)
     • Shawn Bowers (SEEK)
     • Terence Critchlow (SDM)
     • Tobin Fricke (ROADNet)
     • Jeffrey Grethe (BIRN)
     • Christopher H. Brooks (Ptolemy II)
     • Zhengang Cheng (SDM)
     • Dan Higgins (SEEK)
     • Efrat Jaeger (GEON)
     • Matt Jones (SEEK)
     • Werner Krebs (EOL)
     • Edward A. Lee (Ptolemy II)
     • Kai Lin (GEON)
     • Bertram Ludaescher (SDM, SEEK, GEON, BIRN, ROADNet)
     • Mark Miller (EOL)
     • Steve Mock (NMI)
     • Steve Neuendorffer (Ptolemy II)
     • Jing Tao (SEEK)
     • Mladen Vouk (SDM)
     • Xiaowen Xin (SDM)
     • Yang Zhao (Ptolemy II)
     • Bing Zhu (SEEK)
     • …
     Ptolemy II
     www.kepler-project.org
     LLNL, NCSU, SDSC, UCB, UCD, UCSB, UCSD, …, Zurich
     SPA Collab. tools: IRC, cvs, skype, Wiki: hotTopics, FAQs, …

  3. GEON Dataset Generation & Registration (a co-development in KEPLER)
     [Figure labels: "% Makefile", "$> ant run", "SQL database access (JDBC)"; Matt, Chad, Dan et al. (SEEK); Efrat (GEON); Ilkay (SDM/SPA); Yang (Ptolemy); Xiaowen (SDM/SPA); Edward et al. (Ptolemy)]

  4. Update: endo-SPA (exo-Kepler), endo-Kepler (exo-SPA), … w/o counting peas…
     • No/minor changes:
       • XSLT, email, …
     • Web service actor (SDM)
       • Updated: dynamic operation display, error reporting
     • Command line actor (SDM)
       • Updated: improved interface and error handling
     • SSH2 actor (SDM)
       • New: implements ssh2 protocol for remote execution (no plain password sent over the wire)
     • Timestamp actor (SDM)
       • New: for logging
     • BrowserUIv2.0 (SDM)
       • reimplemented, improved interface
       • v3.0 planned (“catching” http-get/post via localhost)
     • Execution logger (SDM)
       • New: workflow “black box” for keeping track of runs
     • Documentation framework (SDM)
       • Autogenerated actor documentation (new doclets and taglets)
     • Ontology-based actor and dataset classification (SEEK)
       • Finding relevant components: actors and datasets, suggesting possible connections, …
     • Kepler/SRB toolkit (GEON, SDM, SEEK, …)
       • improved interfaces, new functions
     • …
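The command line actor, timestamp actor, and execution logger described above can be illustrated together. The following is a minimal Python sketch, not Kepler code: the function name `run_logged` and the record fields are assumptions made for this example. It runs a command, captures its output and exit code, and appends a timestamped record to a run log, in the spirit of the workflow “black box”.

```python
import subprocess
import sys
from datetime import datetime, timezone


def run_logged(command, log):
    """Run `command` (a list of strings) and append a timestamped record to `log`.

    Hypothetical helper for illustration; it does not reflect the actual
    Kepler command line actor or execution logger interfaces.
    """
    started = datetime.now(timezone.utc)
    result = subprocess.run(command, capture_output=True, text=True)
    finished = datetime.now(timezone.utc)
    record = {
        "command": list(command),
        "started": started.isoformat(),
        "finished": finished.isoformat(),
        "exit_code": result.returncode,  # non-zero signals an error to report
        "stdout": result.stdout,
        "stderr": result.stderr,
    }
    log.append(record)
    return record


# Usage: run a trivial command with the current Python interpreter.
run_log = []
record = run_logged([sys.executable, "-c", "print('hello')"], run_log)
print(record["exit_code"], record["stdout"].strip())
```

Keeping the log as a plain list of records makes it easy to inspect after a run or to write out for later provenance queries; a real logger would presumably persist it.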

  5. Application Pull vs Technology Push
     • Use case driven (application pull)
       • PIW, TSI-1, TSI-2, …
       • Solve technology issues along the way
     (+) solve the particular scientists’ problem
     (-) one-of-a-kind solutions, few generic & reusable technology
     Example:
     • TSI-1 and TSI-2 are conceptually almost identical scientific (“Grid/HPC/HTC”) workflows
     • but implemented very differently → limited reuse, e.g., evolving/customizing one into the other is hard/impossible…

  6. Application Pull vs Technology Push
     • Technology driven (technology push)
       • Generic application integration mechanisms:
         • web service actor, harvester, command-line actors, ssh2 actor, BrowserUI, …
       • Specialized interfaces to HPC/HTC systems:
         • Large-scale data management:
           • SDSC SRB toolkit (set of SRB actors),
           • SRM?, PVFS2?, MPI-IO?, …
         • Interfacing with generic job schedulers:
           • NIMROD, Condor, APST, …
         • Interfacing with scientific packages:
           • Statistics toolkit (R, …), GIS (Grass, ArcIMS, Mapserver…)
           • GAMESS toolkit, APBS (visualization)…
     (+) developing a reusable technology / toolkits
     (!) still need guidance by domain scientists’ problems, but need to lift one-off solutions into a general SWF engineering methodology

  7. Increasing number of Kepler actors…

  8. … creating prototype workflows and test cases (for automated tests) …

  9. … putting them together in generic, reusable packages, e.g. Kepler/SRB toolkit
     SRB holdings @ SDSC only: 404 TB in 59 million files across 5167 users (12/16/’04, Reagan Moore)

  10. KEPLER/R Toolkit (under development) Source: Dan Higgins, Kepler/SEEK

  11. New Developments & Directions

  12. Ontology-based Actor & Dataset Discovery
      [Figure labels: “Ontology based actor (service) and dataset search”, “Result Display”]
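To make the idea of ontology-based discovery concrete, here is a small Python sketch. Everything in it is an assumption for illustration: the toy concept hierarchy, the actor names, and the functions `is_a`, `actors_accepting`, and `suggest_connections` are hypothetical and are not the SEEK or Kepler implementation. It shows how annotating actor ports with concepts lets a search find actors that accept a given kind of data and suggest possible connections.

```python
# Hypothetical toy ontology: each concept maps to its parent (None = root).
ONTOLOGY = {
    "Measurement": None,
    "SpeciesAbundance": "Measurement",
    "Biomass": "Measurement",
    "Model": None,
    "RegressionModel": "Model",
}

# Hypothetical actor registry: semantic type of each actor's input and output.
ACTORS = {
    "AbundanceReader": {"input": None, "output": "SpeciesAbundance"},
    "LinearFit": {"input": "Measurement", "output": "RegressionModel"},
    "BiomassPlot": {"input": "Biomass", "output": None},
    "ModelReport": {"input": "Model", "output": None},
}


def is_a(concept, ancestor):
    """True if `concept` equals `ancestor` or is one of its descendants."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = ONTOLOGY[concept]
    return False


def actors_accepting(concept):
    """Names of actors whose input type is `concept` or a more general concept."""
    return sorted(
        name
        for name, sig in ACTORS.items()
        if sig["input"] is not None and is_a(concept, sig["input"])
    )


def suggest_connections(actor_name):
    """Actors that could consume the output of `actor_name`."""
    output = ACTORS[actor_name]["output"]
    return [] if output is None else actors_accepting(output)


print(actors_accepting("Biomass"))
print(suggest_connections("AbundanceReader"))
```

The subsumption check is the key step: an actor declared to accept a general concept is also a candidate for any more specific dataset type.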

  13. Example: GAMESS Quantum-mechanics cheminformatics workflow
      • Job management infrastructure in place
      • Results database: under development
      • Goal: 1000’s of GAMESS jobs (quantum mechanics)

  14. Towards a Framework for “Grid/HPC/HTC” WFs & Job Management

  15. Technology-oriented meeting: May 12th Ptolemy/Kepler Miniconference in Berkeley

  16. What’s needed, what’s next
      • Build generic toolkits / packages
      • Don’t reinvent – Reuse!
      • Improved R coupling, SCIRun coupling, …
      • SWF Framework that lets scientists choose…
        • SRB (Sput, Sget, …), SRM, MPI-IO, GlobusTK (GridFTP, …), Sabul, …, pNetCDF, parallel-R, … packages
        • Condor, Nimrod, … schedulers
        • GRASS, …
      • General purpose SWF system/PSE that scientists can use themselves

  17. Towards a KEPLER School of Expression (Flow-based Design Patterns)
      • Generality vs specialization of actors
        • also loosely coupled vs tightly coupled
      • Data transformation pipelines
        • alternate compute and data transformation steps
      • Stage-execute-fetch pattern (Grid/HPC/HTC-WFs)
      • Loops, higher-order functions (map, foldr, …)
        • cf. Taverna’s automatic loop insertion based on data types
      • JDBC/SRB connection tokens, proxies, certificates
      [Figure labels: connect A, B, C; methods, functions; F-map: producers f and [f1, f2, …, fn] yielding [f(x1), …, f(xn)]; map: producer X with [x1, x2, …, xn]]
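The patterns listed on this slide can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Kepler or Ptolemy II code: `map_actor`, `foldr`, and `pipeline` are hypothetical helper names, and `stage`, `execute`, and `fetch` are stand-ins that only mimic the shape of a stage-execute-fetch job.

```python
from functools import reduce


def map_actor(f):
    """Lift a function on single tokens to one that processes a list of tokens."""
    def actor(xs):
        return [f(x) for x in xs]
    return actor


def foldr(f, init, xs):
    """Right fold: foldr(f, z, [x1, x2, x3]) == f(x1, f(x2, f(x3, z)))."""
    return reduce(lambda acc, x: f(x, acc), reversed(xs), init)


def pipeline(*steps):
    """Compose steps left to right into a single function."""
    def run(x):
        for step in steps:
            x = step(x)
        return x
    return run


# Hypothetical stand-ins for the stage-execute-fetch pattern.
def stage(job):
    return {"job": job, "staged": True}      # e.g. copy inputs to a remote host


def execute(staged):
    return {**staged, "result": staged["job"] ** 2}  # e.g. run the remote job


def fetch(done):
    return done["result"]                    # e.g. copy results back


stage_execute_fetch = pipeline(stage, execute, fetch)
results = map_actor(stage_execute_fetch)([1, 2, 3])
total = foldr(lambda x, acc: x + acc, 0, results)
print(results, total)
```

Expressing the per-job pattern once and applying it with `map_actor` keeps the loop out of the workflow graph, which is the same motivation as the automatic loop insertion mentioned for Taverna.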

  18. Blurring Design (ToDo) and Execution

  19. Kepler@UC Davis Genome Center: Scientific Workflows to Support the Complete (Wet-lab) Experiment Lifecycle
      • Try to capture and (semi-)automate the Experiment Lifecycle:
        • Discover similar experiments, …
        • reuse, customize,
        • execute, monitor,
        • manage results,
        • Register back to an experiment repository
      • Support Experiment Design, Execution, & Reuse
      • Scientific workflows and semantic extensions (ontologies, metadata++)

  20. Summary: What we could/should do
      • Push technology:
        • Distributed Kepler & “detached” execution
        • Making Kepler more X-aware, where …
          • … X=Data plumbing (SRB toolkit, GridTK, others, …)
          • … X=Grid & Scheduling (need a “Grid director”? Condor director?),
          • … X=Parameter-sweep (“Nimrod/APST”… director?)
          • … X=Statistics & other specialized packages (R, parallel-R?, …, Grass, …)
          • … X=Visualization (SCIRun, …)
      • Semantic extensions
        • Actors and datasets have “semantic types” to support resource discovery, WF design, …
      • Create “Packages” or “Rolls”
        • … targeting certain scientific user groups & communities
      • SWF Life-cycle support:
        • Design, execution, monitoring, archival, re-use/re-run
      • Design patterns, “Kepler School of Expression”
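As a rough illustration of what “parameter-sweep aware” execution means, the Python sketch below enumerates every combination of parameter values and applies a job function to each. It is an assumption-laden toy: `sweep` and `run_sweep` are hypothetical names, the parameter names are invented, and nothing here reflects how Nimrod, APST, or a Kepler director would schedule the runs.

```python
from itertools import product


def sweep(parameters):
    """Yield one dict per point in the Cartesian product of parameter values.

    `parameters` maps a parameter name to the list of values to try.
    """
    names = list(parameters)
    for values in product(*(parameters[name] for name in names)):
        yield dict(zip(names, values))


def run_sweep(job, parameters):
    """Apply `job` to every parameter combination; return (point, result) pairs."""
    return [(point, job(**point)) for point in sweep(parameters)]


# Usage with invented parameters and a stand-in job function.
def fake_job(basis, temperature):
    return f"{basis}@{temperature}"


for point, result in run_sweep(fake_job, {"basis": ["a", "b"], "temperature": [1, 2]}):
    print(point, result)
```

In a real setting each call to `job` would be submitted to a scheduler rather than executed inline; separating the enumeration (`sweep`) from the dispatch (`run_sweep`) keeps that substitution local.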
