
Ilkay ALTINTAS, Lab Director, Scientific Workflow Automation Technologies



Presentation Transcript


  1. Kepler Scientific Workflows: Current and Future Development
     Ilkay ALTINTAS, Lab Director, Scientific Workflow Automation Technologies
     San Diego Supercomputer Center, UCSD

  2. Scientific Workflow Systems
     • Combination of
       • data integration, analysis, and visualization steps
       • automated “scientific process”
     • Mission of scientific workflow systems
       • Promote “scientific discovery” by providing tools and methods to generate scientific workflows
       • Create an extensible and customizable graphical user interface for scientists from different scientific domains
       • Support computational experiment creation, execution, sharing, reuse and provenance
       • Design frameworks which define efficient ways to connect to the existing data and integrate heterogeneous data from multiple resources

  3. Ptolemy II: A laboratory for investigating design
     KEPLER: A problem-solving environment for Scientific Workflow
     KEPLER = “Ptolemy II + X” for Scientific Workflows
     Kepler is a Scientific Workflow System
     www.kepler-project.org
     • … and a cross-project collaboration
     • 3rd Beta release (Jan 8, 2007)
     • Builds upon the open-source Ptolemy II framework

  4. Ecology
     • SEEK: Ecological Niche Modeling and climate change
     • REAP: Modeling parasite invasions in grasslands using sensor networks
     • NEON: Ecological sensor networks
     • COMET: Environmental science
     Geosciences
     • GEON: LiDAR data processing, Geological data integration
     • NEESit: Earthquake engineering
     Molecular biology
     • SDM: Gene promoter identification and ScalaBLAST
     • ChIP-chip: Genome-scale research
     • CAMERA: Metagenomics
     Oceanography
     • REAP: SST data processing
     • LOOKING/OOI CI: ocean observing CI
     • ROADNet: real-time data modeling and analysis
     • Ocean Life project
     Phylogenetics
     • ATOL: Processing Phylodata
     • CiPRES: Phylogenetic tools
     Chemistry
     • Resurgence: Computational chemistry
     • DART/ARCHER: X-Ray crystallography
     Library science
     • DIGARCH: Digital preservation
     • UK Text Mining Center: Cheshire feature and archival
     Conservation biology
     • SanParks: Thresholds of Potential Concerns
     Physics
     • SDM: astrophysics TSI-1 and TSI-2
     • CPES: Plasma fusion simulation
     • ITER-EU: ITM fusion workflows
     Kepler use cases represent many science domains!

  5. Some of the current R&D
     • Distributed execution of workflow parts (peer to peer)
     • Efficient data transfer
     • Provenance tracking of data and processes
     • Tracking workflow evolution
     • Streaming data analysis
     • Easy-to-deploy batch interfaces
     • Intuitive workflow design
     • Customizable semantic typing
     • Interoperability with other workflow and analytical environments (at exec level)
     Production workflow examples:
     • GEON LiDAR workflow (GLW)
       • 116 registered, 106 active users
       • 2076 submitted jobs to date
     • Center for Plasma Edge Simulation Code-Coupling Workflow (CPES-CCW)
       • 2000 actors, 5 levels of model hierarchy
       • Longest run duration: 3 hours
     • PtII AirForce Lab Model
       • 12920 actors, 65331 attributes
       • Longest run duration: 10 minutes
     • Longest running real-time simple monitoring model in PtII: months at a time
     All generated using the GUI and executed in batch mode… No coding and text manipulation
     Kepler today is a research prototype and a production workflow tool!

  6. REAP: Realtime Environment for Analytical Processing
     reap.ecoinformatics.org
     • Management and Analysis of Observatory Data using Kepler Scientific Workflows
     • The vision:
       • An integrated environment for analyzing data from observatories
     • Funded 2006-2009
       • NSF CEO:P
       • Jones (PI), Altintas, Baru, Ludaescher, Schildhauer
     • Partners:
       • NCEAS/UCSB (Lead), SDSC/UCSD, UCDavis, CENS/UCLA, OpenDAP, OSU
     • Two scientific use cases:
       • Terrestrial ecology
       • Oceanography

  7. REAP Views
     • For scientists
       • capabilities for designing and executing complex analytical models over near real-time and archived data sources
     • For data-grid engineers
       • monitoring and management capabilities of underlying sensor networks
     • For outside users
       • access to observatory data and results of models, approachable to non-scientists

  8. REAP: Terrestrial Ecology Use Case
     Workflows to develop and test models exploring the impacts of abiotic factors (real-time light, temperature, and rainfall measurements) on the dynamics of plant host populations and their susceptibility to viral pathogens.

  9. REAP: RBNB Streaming Data Actor
     Example data from Terrestrial Use Case
     Hardware: a Campbell Scientific CR800 datalogger with eight attached sensors, operating on a workbench.
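A streaming data actor of this kind can be pictured as a source component that, each time it fires, hands the workflow whatever sensor samples have arrived since its previous firing. The sketch below is only an illustration under stated assumptions: it is not the RBNB or Kepler API, the class names `RingBufferChannel` and `StreamingSourceActor` are hypothetical, and it assumes the server keeps a fixed-size ring buffer per sensor channel (RBNB is commonly expanded as Ring Buffered Network Bus). Any sample values fed to it are synthetic.

```python
from collections import deque


class RingBufferChannel:
    """Hypothetical channel: keeps only the most recent samples of one sensor."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest sample when full.
        self._samples = deque(maxlen=capacity)

    def put(self, timestamp, value):
        """Store one (timestamp, value) sample; timestamps are assumed increasing."""
        self._samples.append((timestamp, value))

    def fetch_since(self, timestamp):
        """Return buffered samples strictly newer than the given timestamp."""
        return [(t, v) for t, v in self._samples if t > timestamp]


class StreamingSourceActor:
    """Hypothetical source actor: each firing emits the samples not yet seen."""

    def __init__(self, channels):
        self.channels = channels  # dict: channel name -> RingBufferChannel
        self._last_seen = {name: float("-inf") for name in channels}

    def fire(self):
        """Collect fresh samples from every channel and advance the read marks."""
        batch = {}
        for name, channel in self.channels.items():
            fresh = channel.fetch_since(self._last_seen[name])
            if fresh:
                self._last_seen[name] = fresh[-1][0]
            batch[name] = fresh
        return batch
```

With eight channels, one per attached sensor, a downstream actor would receive a dictionary of eight sample lists per firing. Samples that were overwritten in the ring buffer before the actor fired are simply lost, which is the trade-off a bounded buffer makes for long-running, near real-time sources.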

  10. REAP: Oceanographic Use Case
      Facilitate the quantitative evaluation of SST data sets.

  11. Kepler/C.O.R.E.
      kepler-project.org
      • SDCI NMI Improvement: Development of Kepler/CORE – A Comprehensive, Open, Reliable, and Extensible Scientific Workflow Infrastructure
      • The vision:
        • Coordinate development of a comprehensive, open, reliable and extensible Kepler scientific workflow infrastructure
      • Funded 2007-2010
        • NSF SDCI
        • Ludaescher (PI), Altintas, Bowers, Jones, McPhillips, Schildhauer
      • Partners:
        • Genome Center/UCDavis (Lead), SDSC/UCSD, NCEAS/UCSB
      Builds on community participation as a driving force for Kepler.

  12. Kepler/C.O.R.E.
      • Comprehensive
        • First-class support for technical features
      • Open
        • Well-designed and clearly articulated mechanisms and interfaces provided to facilitate developing extensions
      • Reliable
        • Both as a development platform and as a run-time environment for the user
      • Extensible
        • Independently extensible by groups not directly collaborating with the team

  13. Directors in Kepler
      • Means to execute networks of components under multiple execution models
        • Dataflow (SDF, PN, DDF) vs. time-based (CT) vs. event-based (DE) vs. all combined
      • Makes use of separation of concerns principle
        • e.g., component execution, workflow execution and provenance tracking
      • The manager acts like a “common execution environment”
        • governing different concerns related to execution of the network and services
      Ptolemy and Kepler are unique in combining different execution models in heterogeneous models!
      Dataflow, Time Triggered, Synchronous/reactive model, Discrete Event, Wireless, Process Networks, Rendezvous, Publish and Subscribe, Continuous Time, Finite State Machines
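The separation between components and directors can be illustrated with a small sketch in which the same network of actors is executed by two different directors. This is not the Kepler or Ptolemy II API (those are Java frameworks); every name here (`Actor`, `StaticScheduleDirector`, `DataDrivenDirector`, `build_demo_workflow`) is hypothetical, and the two directors are only loose analogues of a statically scheduled dataflow model and a data-driven one. The sketch assumes each actor consumes one token per input and produces one token per firing, and that the network has exactly one sink.

```python
from collections import deque


class Actor:
    """A component: reads one token per input, emits one token per firing."""

    def __init__(self, name, func, inputs=()):
        self.name = name
        self.func = func            # function applied to the input tokens
        self.inputs = list(inputs)  # names of upstream actors


def find_sink(actors):
    """Return the single actor whose output no other actor reads."""
    read_from = {u for a in actors for u in a.inputs}
    sinks = [a for a in actors if a.name not in read_from]
    if len(sinks) != 1:
        raise ValueError("this sketch assumes exactly one sink actor")
    return sinks[0]


def topological_order(actors):
    """Order actors so each one comes after the actors it reads from."""
    by_name = {a.name: a for a in actors}
    ordered, seen = [], set()

    def visit(actor):
        if actor.name in seen:
            return
        seen.add(actor.name)
        for upstream in actor.inputs:
            visit(by_name[upstream])
        ordered.append(actor)

    for actor in actors:
        visit(actor)
    return ordered


class StaticScheduleDirector:
    """Computes one firing order up front and repeats it for every token."""

    def run(self, actors, source_tokens):
        schedule = topological_order(actors)
        sink = find_sink(actors)
        outputs = []
        for token in source_tokens:
            values = {}
            for actor in schedule:
                # Source actors (no inputs) receive the external token.
                args = [values[u] for u in actor.inputs] or [token]
                values[actor.name] = actor.func(*args)
            outputs.append(values[sink.name])
        return outputs


class DataDrivenDirector:
    """Fires any actor whose input queues all hold at least one token."""

    def run(self, actors, source_tokens):
        sink = find_sink(actors)
        # One FIFO queue per edge; key (None, name) feeds a source actor.
        queues = {(u, a.name): deque() for a in actors for u in a.inputs}
        for a in actors:
            if not a.inputs:
                queues[(None, a.name)] = deque(source_tokens)
        outputs = []
        fired = True
        while fired:
            fired = False
            for a in actors:
                keys = [(u, a.name) for u in a.inputs] or [(None, a.name)]
                if all(queues[k] for k in keys):
                    result = a.func(*[queues[k].popleft() for k in keys])
                    if a.name == sink.name:
                        outputs.append(result)
                    for (producer, _consumer), queue in queues.items():
                        if producer == a.name:
                            queue.append(result)
                    fired = True
        return outputs


def build_demo_workflow():
    """read -> (scale, offset) -> combine, listed deliberately out of order."""
    return [
        Actor("combine", lambda a, b: a + b, inputs=["scale", "offset"]),
        Actor("scale", lambda x: x * 2, inputs=["read"]),
        Actor("offset", lambda x: x + 1, inputs=["read"]),
        Actor("read", lambda x: x),
    ]
```

Running `StaticScheduleDirector().run(build_demo_workflow(), [1, 2, 3])` and `DataDrivenDirector().run(build_demo_workflow(), [1, 2, 3])` both give `[4, 7, 10]`. The actors never change; only the policy that decides when they fire does, which is the separation of concerns the slide describes.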

  14. Credits
      • Kepler community and colleagues
      • On REAP and Kepler/CORE:
        • Shawn Bowers, Bertram Ludaescher, Timothy McPhillips, Genome Center, UCD
        • Matt Jones, Derik Barseghian, Mark Schildhauer, NCEAS, UCSB
        • Eric Seabloom, OSU
        • Peter Cornillon, OpenDAP

  15. Questions…
      Ilkay Altintas
      altintas@sdsc.edu
      +1 (858) 822-5453
      http://www.sdsc.edu
