Kepler project overview status future directions
Download
1 / 38

Kepler Project Overview, Status, Future Directions - PowerPoint PPT Presentation


  • 142 Views
  • Uploaded on

Kepler Project Overview, Status, Future Directions. Bertram Ludäscher University of California, Davis. Overview. History Origins, Diversity, Challenges Kepler, Kepler/CORE: Issues, Status Next steps Research: Scientific Workflows … Business (workflows) as usual? Or… ?.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about ' Kepler Project Overview, Status, Future Directions' - fisk


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
Kepler project overview status future directions

Kepler Project Overview, Status, Future Directions

Bertram Ludäscher

University of California, Davis


Overview
Overview

  • History

    • Origins, Diversity, Challenges

  • Kepler, Kepler/CORE:

    • Issues, Status

    • Next steps

  • Research: Scientific Workflows …

    • Business (workflows) as usual?

    • Or… ?


Kepler some history
Kepler: Some History

  • The Origins:

    • AD 2002: NSF/SEEK, DOE/SDM

      • Similar requirements for “scientific workflows”

      • Can we avoid reinventing the wheel (twice…) !?

      • Grass-roots effort, open source collaboration

  • The Head-start:

    • Adopting, extending Ptolemy II from Berkeley

    • Common software platform facilitates grass-root collaboration

    • More than software: Research Results

      • Heterogeneous Modeling & Design, dataflow-, actor-oriented MoCs


Scientific workflows cyberinfrastructure upperware

Upperware

Upper

Middleware

Middleware

Underware

Scientific Workflows: Cyberinfrastructure “Upperware”

NSF/SEEK ITR, 5 Year collaboration: SDSC, UCSB, UCD, UNM, UK, …


Scientific workflow
Scientific Workflow

Capture how a scientist works with data and analytical tools

  • data access, transformation, analysis, visualization

  • possible worldview: dataflow-oriented (cf. signal-processing)

    Scientific workflow (wf) benefits (compare w/ script-based approaches) :

  • wf automation

  • wf & component reuse

  • wf design, documentation

  • wf archival, sharing

  • built-in concurrency

    (task-, pipeline-parallelism)

  • built-in provenance support

  • distributed & parallel exec:

    Grid & cluster support


Simple kepler analysis workflow using r

Data source from EcoGrid

(metadata-driven ingestion)

R processing script

res <- lm(BARO ~ T_AIR)

res

plot(T_AIR, BARO)

abline(res)

Simple Kepler analysis workflow using R …

Dan Higgins, NCEAS


Vs plumbing workflow

to monitor, control remote supercomputer simulations …

50+ composite actors (subworkflows)

4 levels of hierarchy

1000+ atomic (Java) actors

43 actors, 3 levels

196 actors, 4 levels

30 actors

206 actors, 4 levels

33 actors

137 actors

123 actors

150

66 actors

12 actors

243 actors, 4 levels

… vs. “Plumbing” workflow

Norbert Podhorszki

Then: UC Davis, now: ORNL …


Kepler open source open community
Kepler: Open Source + Open Community

  • Huge diversity of domains => needs

    • Astrophysics, nuclear fusion research, geoinformatics, ecology, systematics, bioinformatics, genomics, environmental monitoring, simulation, …

    • Not just bioinformatics and cheminformatics …

  • A broad range of technical problems

    • Workflowdesign with a graphical UI

    • Sharing actors, workflows across communities

    • Distributed workflow execution

    • Data movement on the network

    • Integrate local apps, web services, native actors

    • Support a variety of computational models

    • Not just web service orchestration or Grid deployment


Kepler open source open community1
Kepler: Open Source + Open Community

3. Many kinds of users with different backgrounds and responsibilities:

  • Scientistsautomating and sharing their analyses of their own data or performing meta-analyses on others’ data

  • Software engineersdeveloping their own systems around Kepler

  • Computer scientists doing basic research in scientific workflows, data and provenance management, distributed and collaborative computing.

  • Not just biologists and chemists …

    4. Kepler used in many different deployment contexts

  • Standalone application on a scientist’s desktop computer or laptop.

  • Backend for web-based scientific applications.

  • Embedded workflow engine in larger systems.

  • One size (of deployment) does not fit all!

    5. Kepler open to contribution and extension by anyone:

  • Anyone can contribute to Kepler!

  • Anyone canuse Kepler in their own applications

  • Developing with Kepler doesn’t require collaboration with the “owners” …


Kepler

CORE

Kepler

CORE

COMET!

*

*

*

*

ChIP-chip

Phylogenetics

Astronomy

Library Science

Ecology

Conservation

Biology

Oceanography

Geosciences

Molecular Biology

Chemistry

Particle Physics


Kepler core mission
Kepler-CORE Mission

In collaboration with current and future contributors to Kepler, the Kepler/CORE team will …

  • Develop and maintain the essential, interdisciplinary software components of Kepler

  • Coordinate the contributions of the greater Kepler collaboration to the core (“kernel”) of the system

  • Increase the role of the current and future user community in specifying requirements and priorities


The kepler core project and team
The Kepler-CORE Project and Team

  • Kepler-CORE(sensu stricto)

    • 3-year, $1.7M NSF-OCI funded project

  • Kepler-CORE team @ UCD,UCSB, UCSD:

    • UC Davis

      • Bertram Ludäscher ([email protected]), Shawn Bowers (co-PI), Tim McPhillips (co-PI & software architect), David Welker (software engineer), Sean Riddle (software engineer)

    • UC Santa Barbara

      • Matthew Jones ([email protected]), Mark Schildhauer (co-PI), Aaron Schultz, Chad Berkley (software engineer)

    • UC San Diego

  • Kepler/CORE(sensu lato)

    • Goal: sustain long-term, beyond initial funding period

    • KEPLER = Kepler/Core + Kepler/X + Kepler/Y + …

      • Core, X, Y, … = open community of stakeholders, contributors, users, etc.


Kepler core vision

Kepler-CORE Duo

Kepler-CORE Vision

In the future we foresee Kepler…

  • Satisfying the scientific workflow automation needs of

    • Collaborative government-funded projects

    • Academic research groups

    • Individual researchers in diverse scientific disciplines

  • Enhancing the productivity of researchers by

    • Facilitating discovery and collaboration within and across disciplines

    • Being the best way for scientists to leverage developments and expertise in other domains

  • Leading to further breakthroughs and innovations in the fields of

    • Scientific data management

    • Data provenance

    • Collaborative scientific computing

  • Shepherded by a self-sustaining effort that thrives well beyond the lifetimes of the grants that have contributed to Kepler’s development.


These differences mean that the kepler collaboration will be unique too
These differences mean that the Kepler collaboration will be unique, too

  • Kepler cannot solve everyone’s problems right out of the box

    • Kepler must be adaptable to different domain sciences

    • Adaptation requires more than developing new actors

    • Kepler is as much a development platform as an “end-user” tool

    • No one group can take responsibility for supporting all the ways Kepler will be used …

  • Kepler is open-source but more complex than other open source projects

    • Diversity of domains, users, and deployment contexts mean there can be conflicts between the needs or priorities of contributors

    • Need a way of developing and adding extensions without breaking other’s systems

    • Software engineers developing code for Kepler often are not expert scientists, and cannot be the final authority on what the system should do (unlike projects like Apache, Linux, etc where the engineers are the expert users themselves and can add what they need)

    • PIs and project managers on projects extending Kepler must take responsibility for knowing what needs to be done.

    • It is essential that for each project employing Kepler, representatives authoritative on the scientific and technical needs of their projects participate in driving the future development of Kepler!


Stakeholders essential to success of kepler
Stakeholders: Essential to Success of Kepler unique, too

Kepler stakeholders …

  • Are projects and individuals whose work depend critically on the success of Kepler.

  • Are funded by a variety of sources and work in diverse fields of scientific research.

  • Are more likely to greatly extend Kepler and use Kepler within their own systems than simply develop packages of actors and workflows for use with a standard distribution of Kepler.

  • Need to deliver the software systems they develop to their own community of users.

  • Must deliver their software systems according to their own (e.g. release) schedules as determined by their research and funding programs.

  • Have different requirements that will conflict in the absence of mechanisms for enabling independent extension and deployment of Kepler-based systems.

  • Require recognition for the contributions they make to Kepler as well as for their own systems based on Kepler.

  • Know better than us what they need from Kepler.


Kepler core management
Kepler(-CORE) Management unique, too

  • Leadership Team

    • 3 year terms (current members from UC + [S] + {B, D} )

    • Focus on

      • long term viability of Kepler

      • strategic decisions on behalf of user community, Kepler project

  • Interest Groups

    • Communicate, collaborate on specialized capabilities

  • Development Teams

    • Design, develop, test specific software deliverables

  • Infrastructure Teams

    • Identify, discuss, design, implement Kepler Framework


New kepler build system
New Kepler Build System unique, too

  • Modules and suites

    • Develop against trunk or specific version, release

    • Tag, branch Kepler extensions independently of the kernel

    • easier to share develop, share extensions

    • svn repository (https://code.kepler-project.org/code/kepler/)

  • Module Manager

    • New component to simplify

      working with modules

David Welker et al.


Kepler release roadmap
Kepler Release Roadmap unique, too

  • https://kepler-project.org/developers/teams/build/kepler-release-roadmap

  • Kepler releases

    • based on a “standard set” of modules

  • Individual module releases

    • Provenance

    • Workflow reporting

    • COMAD

    • Distributed: Master/Slave

  • Kepler 2.0

    • Add modules, extensions to installed Kepler dynamically

    • Targeted for Summer 2009



Kepler reap workflow run manager
Kepler/REAP: Workflow Run Manager unique, too

Derik Barseghian et al.

  • Use case “Publication-Ready Archive”

  • Archive workflow with inputs, outputs

  • Tagging

  • Also: Outline view

    • to manage browsing of large/deeply nested workflows


Workflow reports provenance interest group
Workflow Reports (Provenance Interest Group) unique, too

Derik Barseghian et al.


Kepler and scientific workflow research
Kepler and Scientific Workflow Research unique, too

  • Scientific Workflows:

    • Business (workflows) as usual?

      • Data-oriented, data-centric …

        • … as opposed to control-, task-centric

    • Signal processing?

    • Or else …?

  • Modeling scientific processes, analysis methods

    • Understanding is in the Mind of the Beholder!

  • Example areas:

    • Workflow Modeling & Design

    • Provenance

    • Optimization


Modeling example chip chip workflow
Modeling Example (ChIP-chip workflow) unique, too

Tim McPhillips et al.


Modeling design the limits of my language mean the limits of my world

Vanilla Process Network unique, too

Functional Programming Dataflow Network

XML Transformation Network

Collection-oriented Modeling & Design framework (COMAD)

“Look Ma: No Shims!”

Modeling & Design: The limits of my language mean the limits of my world …



Data provenance

AXG unique, too

AYG

AZG

RI1

AI1

alignWarp:1

reslice:1

AH1

convert:1

WP1

slicer:1

RH1

AXG

AXS

RI2

AI2

alignWarp:2

reslice:2

AH2

AI

WP2

softmean:1

slicer:2

RH2

convert:2

RI

RH

RI4

AYG

AYS

AH

alignWarp:3

reslice:3

AI4

WP4

AH4

RH4

slicer:3

convert:3

RI4

AZG

reslice:4

alignWarp:4

AZS

AI4

WP4

AH4

RH4

outputs

inputs

Data Provenance

  • Keep track of data dependencies, processing history

     support interpretation, validation, reproducibility

AlignWarp

Reslice

Softmean

Slicer

Convert


Kepler ppod provenance browser
Kepler/pPOD: Provenance Browser unique, too

Shawn Bowers et al.

For conventional data provenance and

fine-grained dependencies (COMAD style)

Navigate forward and backward in time (VCR style) in different views (collections, processes, combined)


Collection history
Collection History unique, too

  • Collection and invocation view

  • Incrementally step through execution history



Fine grained data moc aware mop
Fine-grained, Data & MoC-aware MoP unique, too

Manish Anand, Shawn Bowers, et al.


Optimization multi level workflows
Optimization: Multi-level Workflows unique, too

Kepler

LPPN

Daniel Zinn et al.


Modeling optimization virtual assembly lines comad
Modeling + Optimization: unique, tooVirtual Assembly Lines (COMAD)


Layers in comad xml pipelines
Layers in COMAD / ∆-XML Pipelines unique, too

WF Graph

  • Access data in XML stream

  • Call Scientific Functions (Services)

  • Put results back into stream

Configurations

(white-box)

CipresRAxML

Scientific Functions

(black-boxes)

Out: (t:Tree, s:score)+

In: DNASeq+

Thres: Float

Method: String

Daniel Zinn (UC Davis)


Conventional vs assembly line comad thinking
Conventional vs Assembly Line / COMAD Thinking unique, too

Daniel Zinn (UC Davis)



Conceptual pipeline w scopes types
Conceptual Pipeline w/ Scopes & Types unique, too

Daniel Zinn (UC Davis)


X csr xml scissor cut ship reassemble
X-CSR (“XML Scissor”): Cut-Ship-Reassemble unique, too

Daniel Zinn (UC Davis)


Acknowledgments

UC unique, tooDAVIS

Department of

Computer Science

Acknowledgments

  • Kepler contributors

    • Many individuals: https://kepler-project.org/developers/kepler-contributors

    • Projects: Ptolemy, SEEK, SDM, CPES, GEON, REAP, CIPRes, ChIP2, pPOD, COMET, BAP, LTER, RAPR, ITER, …

  • Funding agencies: NSF, DOE, …

  • DAKS @ UC Davis Members

    • Research Staff

      • Drs. Shawn Bowers, Timothy McPhillips, Lei Dou, Ustun Yildiz

    • Developers:

      • Sean Riddle, David Welker, Gongjing Cao

    • Students:

      • Manish Anand, Dave Thau, Daniel Zinn, Sven Koehler, Saumen Dey, Supriya Gulati, Faraaz Sareshwala, Xuan Li


ad