New Developments in Kepler
Ilkay Altintas
January 23, 2006

Kepler System Architecture
[Architecture diagram. Labels: Authentication GUI, Kepler GUI Extensions, Vergil, Documentation, Smart Re-run / Failure Recovery, Provenance Framework, Kepler Object Manager, SMS, Type System Ext, Actor & Data Search, Kepler Core Extensions, Ptolemy]

Joint Authentication Framework
• Requirements:
  • Coordinating between the different security architectures
    • GEON uses GAMA, which requires a single certificate authority.
    • SEEK uses LDAP, which has a centralized certificate authority with distributed subordinate CAs.
    • Connecting LDAP with GAMA
    • Coordinating between 2 different GAMA servers
  • Single sign-on/authentication at the initialize step of the run for multiple actors that use authentication
    • This raises the issue of a single GAMA repository vs. multiple repositories, and requires users to have accounts on all servers.
  • Kepler needs to handle expired certificates for long-running workflows and/or for users who keep it open for a long time.
  • A trust relation between the different GAMA servers must be established to allow single authentication.
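The expired-certificate requirement above can be pictured as a periodic check run during a long workflow. The following is a minimal sketch only: it does not use any GAMA or LDAP API, and the function names, the server names, and the ten-minute renewal margin are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


def needs_renewal(not_after, now=None, margin=timedelta(minutes=10)):
    """Return True if a credential expiring at `not_after` should be renewed.

    `margin` is a hypothetical safety window, not a value from the slides.
    """
    now = now or datetime.now(timezone.utc)
    return not_after - now <= margin


def credentials_to_renew(credentials, now=None):
    """`credentials` maps a (hypothetical) server name to its expiry time."""
    return sorted(
        name for name, expiry in credentials.items() if needs_renewal(expiry, now)
    )
```

A workflow director could call `credentials_to_renew` before each actor that requires authentication fires, and prompt the user only for the servers it returns.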

Functional Prototype Completed
• APIs and test cases in place
• More work required on certificate renewal and multiple server access

Vergil is the GUI for Kepler
[Screenshot labels: Actor Search, Data Search]
• Actor ontology and semantic search for actors
• Search -> Drag and drop -> Link via ports
• Metadata-based search for datasets

Actor Search
• Challenges:
  • Building/searching a repository …
  • Making changes to MoML (see KAR)
  • GUI changes
  • Ontology management
• Kepler Actor Ontology
  • Used in searching actors and creating conceptual views (= folders)
  • Currently 160 Kepler actors added!
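Ontology-based actor search can be sketched as a subclass lookup: a query for a concept returns every actor annotated with that concept or with one of its descendants. This is an illustrative sketch, not Kepler's implementation. Only the concept name `ArithmeticMathOperationActor` appears in this deck; the other concepts, the annotation table, and the function names are hypothetical.

```python
# Hypothetical mini-ontology: each concept maps to its parent concept.
PARENT = {
    "ArithmeticMathOperationActor": "MathOperationActor",
    "MathOperationActor": "Actor",
    "DatabaseQueryActor": "Actor",
}

# Hypothetical annotations: actor name -> ontology concept.
ANNOTATIONS = {
    "Multiply or Divide": "ArithmeticMathOperationActor",
    "Database Query": "DatabaseQueryActor",
}


def is_a(concept, ancestor):
    """Walk up the parent chain and report whether `ancestor` is reached."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = PARENT.get(concept)
    return False


def search_actors(concept):
    """Return the names of actors annotated with `concept` or a sub-concept."""
    return sorted(name for name, c in ANNOTATIONS.items() if is_a(c, concept))
```

The same lookup could also populate a conceptual view: each folder corresponds to a concept, and its contents are the result of `search_actors` for that concept.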

Data Search and Usage of Results
• Kepler DataGrid
  • Discovery of data resources through local and remote services
    • SRB
    • Grid and Web Services
    • DB connections
  • Registry of datasets on the fly using workflows

Vergil Updates
• To make it more useful to the user
• Updated actor icons
• Menu redesign
• Improve readability
• Develop cohesive visual language
• Follow standard human factors (HF) principles
• Improve organization
[Actor icon categories shown: Composite, DB Query, Computation or Operation, Transformation, Filter, File Operation, Web Service]

Kepler Archives
• Purpose: Encapsulate WF data and actors in an archive file
  • … inlined or by reference
  • … version control
• More robust workflow exchange
• Easy management of semantic annotations
• Plug-in architecture (Drop in and use)
• Easy documentation updates
• A jar-like archive file (.kar) including a manifest
• All entities have unique ids (LSID)
• Custom object manager and class loader
• UI and API to create, define, search and load .kar files
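Because a .kar file is described as a jar-like archive with a manifest, its general shape can be sketched with a plain zip writer. The slides do not give the actual KAR manifest layout, so the `X-Example-LSID` header, the per-entry `Name:` sections, and the function names below are assumptions for illustration only.

```python
import io
import zipfile


def build_kar(entries, lsid):
    """Pack `entries` ({archive path: bytes}) into a jar-like archive.

    The manifest fields other than Manifest-Version are hypothetical.
    """
    manifest_lines = ["Manifest-Version: 1.0", f"X-Example-LSID: {lsid}", ""]
    for name in sorted(entries):
        manifest_lines += [f"Name: {name}", ""]
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # Jar convention: the manifest lives under META-INF/.
        archive.writestr("META-INF/MANIFEST.MF", "\n".join(manifest_lines))
        for name, data in entries.items():
            archive.writestr(name, data)
    return buffer.getvalue()


def list_kar(data):
    """Return the entry names stored in an archive built by build_kar."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return archive.namelist()
```

Keeping the manifest as the index of the archive is what would let a loader search or register the contents without unpacking every entry.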

KAR File Example

<entity name="Multiply or Divide" class="ptolemy.kernel.ComponentEntity">
  <property name="entityId" value="urn:lsid:localhost:actor:80:1" class="org.kepler.moml.NamedObjId"/>
  <property name="documentation" class="org.kepler.moml.DocumentationAttribute"></property>
  <property name="class" value="ptolemy.actor.lib.MultiplyDivide" class="ptolemy.kernel.util.StringAttribute">
    <property name="id" value="urn:lsid:localhost:class:955:1" class="ptolemy.kernel.util.StringAttribute"/>
  </property>
  <property name="multiply" class="org.kepler.moml.PortAttribute">
    <property name="direction" value="input" class="ptolemy.kernel.util.StringAttribute"/>
    <property name="dataType" value="unknown" class="ptolemy.kernel.util.StringAttribute"/>
    <property name="isMultiport" value="true" class="ptolemy.kernel.util.StringAttribute"/>
  </property>
  <property name="divide" class="org.kepler.moml.PortAttribute">
    <property name="direction" value="input" class="ptolemy.kernel.util.StringAttribute"/>
    <property name="dataType" value="unknown" class="ptolemy.kernel.util.StringAttribute"/>
    <property name="isMultiport" value="true" class="ptolemy.kernel.util.StringAttribute"/>
  </property>
  <property name="output" class="org.kepler.moml.PortAttribute">
    <property name="direction" value="output" class="ptolemy.kernel.util.StringAttribute"/>
    <property name="dataType" value="unknown" class="ptolemy.kernel.util.StringAttribute"/>
    <property name="isMultiport" value="false" class="ptolemy.kernel.util.StringAttribute"/>
  </property>
  <property name="semanticType00" value="http://seek.ecoinformatics.org/ontology#ArithmeticMathOperationActor" class="org.kepler.sms.SemanticType"/>
</entity>
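To show how the pieces of this entity description fit together, the sketch below reads an abbreviated copy of the example with Python's standard XML parser and pulls out the LSID, the port directions, and the semantic type. It is not Kepler's loader; `SAMPLE` is a shortened copy of the slide's XML, and `summarize_actor` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the entity shown on the slide.
SAMPLE = """<entity name="Multiply or Divide" class="ptolemy.kernel.ComponentEntity">
  <property name="entityId" value="urn:lsid:localhost:actor:80:1" class="org.kepler.moml.NamedObjId"/>
  <property name="multiply" class="org.kepler.moml.PortAttribute">
    <property name="direction" value="input" class="ptolemy.kernel.util.StringAttribute"/>
  </property>
  <property name="output" class="org.kepler.moml.PortAttribute">
    <property name="direction" value="output" class="ptolemy.kernel.util.StringAttribute"/>
  </property>
  <property name="semanticType00" value="http://seek.ecoinformatics.org/ontology#ArithmeticMathOperationActor" class="org.kepler.sms.SemanticType"/>
</entity>"""


def summarize_actor(xml_text):
    """Collect the id, port directions and semantic types of one entity."""
    root = ET.fromstring(xml_text)
    summary = {"name": root.get("name"), "lsid": None, "ports": {}, "semantic_types": []}
    for prop in root.findall("property"):  # direct children only
        cls = prop.get("class")
        if cls == "org.kepler.moml.NamedObjId":
            summary["lsid"] = prop.get("value")
        elif cls == "org.kepler.moml.PortAttribute":
            direction = prop.find("property[@name='direction']")
            summary["ports"][prop.get("name")] = (
                direction.get("value") if direction is not None else None
            )
        elif cls == "org.kepler.sms.SemanticType":
            summary["semantic_types"].append(prop.get("value"))
    return summary
```

Dispatching on the `class` attribute rather than the property name mirrors how the example distinguishes ids, ports, and semantic annotations.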

Kepler Object Manager
• Designed to access local and distributed objects
• Objects: data, metadata, annotations, actor classes, supporting libraries, native libraries, etc., archived in .kar files
• Advantages:
  • Reduce the size of the Kepler distribution
    • Only ship the core set of generic actors and domains
  • Easy exchange of full or partial workflows for collaborations
  • Publish full workflows with their bound data
  • Becomes a provenance system for derived data objects
=> Separate workflow repository and distributions easily

Initial Work on Provenance Framework
• Provenance
  • Track origin and derivation information about scientific workflows, their runs and derived information (datasets, metadata…)
• Need for Provenance
  • Association of process and results
  • Reproduce results
  • “Explain & debug” results (via lineage tracing, parameter settings, …)
  • Optimize: “Smart Re-Runs”
• Types of Provenance Information:
  • Data provenance
    • Intermediate and end results including files and db references
  • Process (= workflow instance) provenance
    • Keep the wf definition with data and parameters used in the run
  • Error and execution logs
  • Workflow design provenance (quite different)
    • WF design is a (little supported) process (art, magic, …)
    • For free via cvs: edit history
    • Need more “structure” (e.g. templates) for individual & collaborative workflow design

Kepler Provenance Recording Utility
• Parametric and customizable
  • Different report formats
  • Variable levels of detail
    • Verbose-all, verbose-some, medium, on error
  • Multiple cache destinations
• Saves information on
  • User name, Date, Run, etc…
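The "variable levels of detail" idea can be sketched as a recorder that filters events against a configured level. This is a hedged illustration rather than the utility's real interface: the class name, the numeric ordering of the four levels, and the record fields are assumptions; only the level names and the user/date/run fields come from the slide.

```python
from datetime import datetime, timezone

# Assumed ordering of the level names listed on the slide.
LEVELS = {"on error": 0, "medium": 1, "verbose-some": 2, "verbose-all": 3}


class ProvenanceRecorder:
    """Keep run events whose detail fits the configured level."""

    def __init__(self, user, run_id, level="medium"):
        self.user = user
        self.run_id = run_id
        self.threshold = LEVELS[level]
        self.records = []

    def record(self, event, detail=1, error=False, when=None):
        """Store `event`; `detail` runs from 1 (essential) to 3 (most verbose).

        Errors are always stored. Returns True if the event was kept.
        """
        if not error and detail > self.threshold:
            return False
        self.records.append({
            "user": self.user,
            "run": self.run_id,
            "date": (when or datetime.now(timezone.utc)).isoformat(),
            "event": event,
            "error": error,
        })
        return True
```

With the level set to "on error", every ordinary event is dropped and only failures are stored, which keeps the cache small for routine runs.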

Provenance: Possible Next Steps
• Provenance Meeting: Last week at SDSC
• Deciding on terms and definitions
• .kar file generation, registration and search for provenance information
• Possible data/metadata formats
• Automatic report generation from accumulated data
• A GUI to keep track of the changes
• Adding provenance repositories
• A relational schema for the provenance info in addition to the existing XML

What other system functions does provenance relate to?
• Failure recovery
• Smart re-runs
• Semantic extensions
• Kepler Data Grid
• Reporting and Documentation
• Authentication
• Data registration
Re-run only the updated/failed parts
Guided documentation generation and updates
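"Re-run only the updated/failed parts" amounts to a reachability question on the workflow graph: an actor must run again if it changed or failed, or if anything upstream of it must run again. The sketch below is one way to compute that set under stated assumptions. The graph encoding and the name `actors_to_rerun` are hypothetical, and results of untouched actors are assumed to be available from recorded provenance.

```python
def actors_to_rerun(upstream, dirty):
    """Return the actors that must execute again.

    upstream: {actor: [actors whose output it consumes]}
    dirty:    actors that were updated or that failed in the previous run
    """
    # Invert the dependency map so we can walk from producers to consumers.
    downstream = {}
    for actor, deps in upstream.items():
        downstream.setdefault(actor, [])
        for dep in deps:
            downstream.setdefault(dep, []).append(actor)

    rerun, stack = set(), list(dirty)
    while stack:
        actor = stack.pop()
        if actor in rerun:
            continue
        rerun.add(actor)
        stack.extend(downstream.get(actor, []))
    return rerun
```

Every actor outside the returned set can reuse its previous outputs, which is where the link between smart re-runs and provenance recording comes from.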

Hot Topics in Kepler http://kepler-project.org/Wiki.jsp?page=HotTopics
