
January, 23, 2006 Ilkay Altintas



Presentation Transcript


  1. New Developments in Kepler January, 23, 2006 Ilkay Altintas

  2. Kepler System Architecture
     [Architecture diagram labels: Authentication; GUI; …Kepler GUI Extensions…; Vergil; Documentation; Smart Re-run / Failure Recovery; Provenance Framework; Kepler Object Manager; SMS; Type System Ext; Actor&Data SEARCH; Kepler Core Extensions; Ptolemy]

  3. Joint Authentication Framework
     • Requirements:
       • Coordinating between the different security architectures
         • GEON uses GAMA, which requires a single certificate authority.
         • SEEK uses LDAP, which has a centralized certificate authority with distributed subordinate CAs.
         • To connect LDAP with GAMA
         • Coordinating between 2 different GAMA servers
       • Single sign-on/authentication at the initialize step of the run for multiple actors that use authentication
         • This raises issues of a single GAMA repository vs. multiple repositories, and requires users to have accounts on all servers.
       • Kepler needs to be able to handle expired certificates for long-running workflows and/or for users who use it for a long time.
       • A trust relation between the different GAMA servers must be established in order to allow for single authentication.
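The expired-certificate requirement above can be illustrated with a minimal sketch. Everything here is hypothetical: `Credential`, `ensure_valid`, and the `renew` callback are illustrative names, not part of GAMA, LDAP, or Kepler. The idea is only that a long-running workflow could check the remaining lifetime of a credential before each authenticated step and renew it when it is about to expire.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Credential:
    """Hypothetical stand-in for a certificate held by a running workflow."""
    subject: str
    expires_at: datetime


def ensure_valid(credential, now, renew, margin=timedelta(minutes=5)):
    """Return a credential that stays valid for more than `margin` past `now`.

    `renew` is a caller-supplied function (an assumption of this sketch)
    that takes the old credential and returns a fresh one.
    """
    if credential.expires_at - now > margin:
        return credential
    renewed = renew(credential)
    if renewed.expires_at - now <= margin:
        raise RuntimeError("renewal did not extend the credential")
    return renewed
```

A workflow engine could call `ensure_valid` before each actor that needs authentication, so a certificate that expires mid-run is replaced instead of failing the step.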

  4. Functional Prototype Completed
     • APIs and test cases in place
     • More work required on certificate renewal and multiple server access

  5. Vergil is the GUI for Kepler
     [Screenshot labels: Actor Search; Data Search]
     • Actor ontology and semantic search for actors
     • Search -> Drag and drop -> Link via ports
     • Metadata-based search for datasets

  6. Actor Search
     • Challenges:
       • Building/searching a repository …
       • Making changes to MoML (see KAR)
       • GUI changes
       • Ontology management
     • Kepler Actor Ontology
       • Used in searching actors and creating conceptual views (= folders)
       • Currently 160 Kepler actors added!

  7. Data Search and Usage of Results
     • Kepler DataGrid
     • Discovery of data resources through local and remote services
       • SRB
       • Grid and Web Services
       • DB connections
     • Registry of datasets on the fly using workflows

  8. Vergil Updates
     • To make it more useful to the user
     • Updated actor icons
     • Menu redesign
     • Improve readability
     • Develop cohesive visual language
     • Follow standard HF principles
     • Improve organization
     [Actor icon labels from the slide figure: Composite; DB Query; Computation or Operation; Transformation; Filter; File Operation; Web Service]

  9. Kepler Archives
     • Purpose: Encapsulate WF data and actors in an archive file
       • … inlined or by reference
       • … version control
     • More robust workflow exchange
     • Easy management of semantic annotations
     • Plug-in architecture (Drop in and use)
     • Easy documentation updates
     • A jar-like archive file (.kar) including a manifest
     • All entities have unique ids (LSID)
     • Custom object manager and class loader
     • UI and API to create, define, search and load .kar files
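As a rough illustration of "a jar-like archive file including a manifest", the sketch below builds a ZIP archive in memory with a manifest at the path that the JAR convention uses. The function names, the `X-Archive-LSID` manifest attribute, and the entry paths are assumptions made for this example; the slides do not show the actual .kar manifest layout.

```python
import io
import zipfile


def build_kar(entries, lsid):
    """Return the bytes of a jar-like archive: a manifest plus the given entries.

    `entries` maps archive paths to text content. The manifest attribute
    names are hypothetical, not Kepler's real .kar manifest keys.
    """
    manifest_lines = ["Manifest-Version: 1.0", f"X-Archive-LSID: {lsid}"]
    for path in sorted(entries):
        # One per-entry section, separated from the previous one by a blank line.
        manifest_lines += ["", f"Name: {path}"]
    manifest = "\r\n".join(manifest_lines) + "\r\n"

    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # JAR convention: the manifest is stored at META-INF/MANIFEST.MF.
        archive.writestr("META-INF/MANIFEST.MF", manifest)
        for path, content in sorted(entries.items()):
            archive.writestr(path, content)
    return buffer.getvalue()


def list_kar(data):
    """List the entry names stored in an archive produced by build_kar."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return archive.namelist()
```

Because the result is an ordinary ZIP file, standard ZIP or JAR tooling can open it, which is one plausible reason for choosing a jar-like container.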

  10. KAR File Example
      <entity name="Multiply or Divide" class="ptolemy.kernel.ComponentEntity">
        <property name="entityId" value="urn:lsid:localhost:actor:80:1" class="org.kepler.moml.NamedObjId"/>
        <property name="documentation" class="org.kepler.moml.DocumentationAttribute"></property>
        <property name="class" value="ptolemy.actor.lib.MultiplyDivide" class="ptolemy.kernel.util.StringAttribute">
          <property name="id" value="urn:lsid:localhost:class:955:1" class="ptolemy.kernel.util.StringAttribute"/>
        </property>
        <property name="multiply" class="org.kepler.moml.PortAttribute">
          <property name="direction" value="input" class="ptolemy.kernel.util.StringAttribute"/>
          <property name="dataType" value="unknown" class="ptolemy.kernel.util.StringAttribute"/>
          <property name="isMultiport" value="true" class="ptolemy.kernel.util.StringAttribute"/>
        </property>
        <property name="divide" class="org.kepler.moml.PortAttribute">
          <property name="direction" value="input" class="ptolemy.kernel.util.StringAttribute"/>
          <property name="dataType" value="unknown" class="ptolemy.kernel.util.StringAttribute"/>
          <property name="isMultiport" value="true" class="ptolemy.kernel.util.StringAttribute"/>
        </property>
        <property name="output" class="org.kepler.moml.PortAttribute">
          <property name="direction" value="output" class="ptolemy.kernel.util.StringAttribute"/>
          <property name="dataType" value="unknown" class="ptolemy.kernel.util.StringAttribute"/>
          <property name="isMultiport" value="false" class="ptolemy.kernel.util.StringAttribute"/>
        </property>
        <property name="semanticType00" value="http://seek.ecoinformatics.org/ontology#ArithmeticMathOperationActor" class="org.kepler.sms.SemanticType"/>
      </entity>
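To show how such an entity description could be consumed, here is a small sketch that reads a trimmed copy of the XML above with Python's standard `xml.etree.ElementTree` and pulls out the entity id and the port settings. The helper names `parse_lsid` and `describe_entity` are made up for this example and are not Kepler APIs. The LSID is split following the general `urn:lsid:authority:namespace:object[:revision]` layout.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the slide's example (documentation, divide port, dataType
# and semantic type omitted to keep the sketch short).
MOML = """
<entity name="Multiply or Divide" class="ptolemy.kernel.ComponentEntity">
  <property name="entityId" value="urn:lsid:localhost:actor:80:1"
            class="org.kepler.moml.NamedObjId"/>
  <property name="multiply" class="org.kepler.moml.PortAttribute">
    <property name="direction" value="input" class="ptolemy.kernel.util.StringAttribute"/>
    <property name="isMultiport" value="true" class="ptolemy.kernel.util.StringAttribute"/>
  </property>
  <property name="output" class="org.kepler.moml.PortAttribute">
    <property name="direction" value="output" class="ptolemy.kernel.util.StringAttribute"/>
    <property name="isMultiport" value="false" class="ptolemy.kernel.util.StringAttribute"/>
  </property>
</entity>
"""


def parse_lsid(lsid):
    """Split an LSID into authority, namespace, object and optional revision."""
    parts = lsid.split(":")
    if len(parts) < 5 or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {lsid!r}")
    return {
        "authority": parts[2],
        "namespace": parts[3],
        "object": parts[4],
        "revision": parts[5] if len(parts) > 5 else None,
    }


def describe_entity(xml_text):
    """Collect the entity name, its parsed LSID and its port settings."""
    root = ET.fromstring(xml_text.strip())
    summary = {"name": root.get("name"), "lsid": None, "ports": {}}
    for prop in root.findall("property"):
        kind = prop.get("class")
        if kind == "org.kepler.moml.NamedObjId":
            summary["lsid"] = parse_lsid(prop.get("value"))
        elif kind == "org.kepler.moml.PortAttribute":
            # Nested properties hold the port's direction, multiport flag, etc.
            summary["ports"][prop.get("name")] = {
                child.get("name"): child.get("value")
                for child in prop.findall("property")
            }
    return summary
```

Calling `describe_entity(MOML)` yields a plain dictionary, which is convenient for indexing actors by id or by port signature in a search tool.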

  11. Kepler Object Manager
      • Designed to access local and distributed objects
      • Objects: data, metadata, annotations, actor classes, supporting libraries, native libraries, etc. archived in kar files
      • Advantages:
        • Reduce the size of Kepler distribution
          • Only ship the core set of generic actors and domains
        • Easy exchange of full or partial workflows for collaborations
        • Publish full workflows with their bound data
        • Becomes a provenance system for derived data objects
      => Separate workflow repository and distributions easily

  12. Initial Work on Provenance Framework
      • Provenance
        • Track origin and derivation information about scientific workflows, their runs and derived information (datasets, metadata…)
      • Need for Provenance
        • Association of process and results
        • reproduce results
        • “explain & debug” results (via lineage tracing, parameter settings, …)
        • optimize: “Smart Re-Runs”
      • Types of Provenance Information:
        • Data provenance
          • Intermediate and end results including files and db references
        • Process (=workflow instance) provenance
          • Keep the wf definition with data and parameters used in the run
          • Error and execution logs
        • Workflow design provenance (quite different)
          • WF design is a (little supported) process (art, magic, …)
          • for free via cvs: edit history
          • need more “structure” (e.g. templates) for individual & collaborative workflow design

  13. Kepler Provenance Recording Utility
      • Parametric and customizable
        • Different report formats
        • Variable levels of detail
          • Verbose-all, verbose-some, medium, on error
        • Multiple cache destinations
      • Saves information on
        • User name, Date, Run, etc…
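The "variable levels of detail" and "different report formats" ideas can be sketched as follows. This is not the Kepler recorder: the class name, the event kinds (`fire`, `token`, `debug`, `error`), and the mapping from each level to the kinds it keeps are all assumptions chosen to make the four level names from the slide concrete.

```python
import json

# Assumed mapping from detail level to the event kinds it keeps.
KEPT_KINDS = {
    "verbose-all": {"fire", "token", "debug", "error"},
    "verbose-some": {"fire", "token", "error"},
    "medium": {"fire", "error"},
    "on-error": {"error"},
}


class ProvenanceRecorder:
    """Hypothetical recorder that filters events by a configured detail level."""

    def __init__(self, user, run_id, date, level="medium"):
        if level not in KEPT_KINDS:
            raise ValueError(f"unknown level: {level}")
        self.level = level
        self.header = {"user": user, "date": date, "run": run_id}
        self.events = []

    def record(self, actor, kind, detail=""):
        """Store the event only if the current level keeps this kind."""
        if kind in KEPT_KINDS[self.level]:
            self.events.append({"actor": actor, "kind": kind, "detail": detail})

    def report(self, fmt="text"):
        """Render the header and recorded events as plain text or JSON."""
        if fmt == "json":
            return json.dumps({"header": self.header, "events": self.events})
        if fmt == "text":
            lines = [f"{key}: {value}" for key, value in self.header.items()]
            lines += [
                f"{e['actor']} {e['kind']} {e['detail']}".rstrip()
                for e in self.events
            ]
            return "\n".join(lines)
        raise ValueError(f"unknown report format: {fmt}")
```

Keeping the filter in a lookup table means a new level or event kind is a one-line change, which fits the "parametric and customizable" goal.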

  14. Provenance: Possible Next Steps
      • Provenance Meeting: Last week at SDSC
      • Deciding on terms and definitions
      • .kar file generation, registration and search for provenance information
      • Possible data/metadata formats
      • Automatic report generation from accumulated data
      • A GUI to keep track of the changes
      • Adding provenance repositories
      • A relational schema for the provenance info in addition to the existing XML

  15. What other system functions does provenance relate to?
      • Failure recovery
      • Smart re-runs
      • Semantic extensions
      • Kepler Data Grid
      • Reporting and Documentation
      • Authentication
      • Data registration
      [Slide callouts: Re-run only the updated/failed parts; Guided documentation generation and updates]
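The "re-run only the updated/failed parts" idea can be sketched as a graph traversal: given which actors changed or failed, everything downstream of them must run again, and everything else can reuse recorded results. The function name and the dictionary representation of the workflow graph are assumptions for this example, not how Kepler stores workflows.

```python
from collections import deque


def actors_to_rerun(edges, changed):
    """Return the changed/failed actors plus every actor downstream of them.

    `edges` maps each actor name to the actors that consume its output.
    `changed` is the set of actors that were updated or that failed.
    """
    rerun = set(changed)
    pending = deque(changed)
    while pending:
        actor = pending.popleft()
        for downstream in edges.get(actor, ()):
            if downstream not in rerun:
                rerun.add(downstream)
                pending.append(downstream)
    return rerun
```

For a workflow `read -> filter -> {plot, stats}` with `stats -> report`, a failure in `stats` would require re-running only `stats` and `report`; the outputs of `read`, `filter`, and `plot` could come from provenance records.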

  16. Hot Topics in Kepler http://kepler-project.org/Wiki.jsp?page=HotTopics
