
Ravi Madduri University of Chicago Argonne National Laboratory

Creating and Sharing Re-usable Workflows in Cardiovascular Research: Lessons learned using Taverna. Ravi Madduri University of Chicago Argonne National Laboratory. About me. Research Fellow at the Computation Institute, University of Chicago


Presentation Transcript


  1. Creating and Sharing Re-usable Workflows in Cardiovascular Research: Lessons learned using Taverna • Ravi Madduri • University of Chicago • Argonne National Laboratory

  2. About me • Research Fellow at the Computation Institute, University of Chicago • Lead architect for Workflow technologies in the caBIG project • Workflow Working Group Chair and a key person in the BIRN project • Interested in informatics and in applications of high-throughput data transfer and computing in biomedical informatics

  3. And..

  4. Agenda • Introduction to Service Oriented Science (SoS) • Introduction to caBIG as an example of SoS • Introduce caGrid as an enabler of the SoS vision • Introduce Workflow concepts • Talk about our implementation using Taverna • Show a few Taverna workflows including the AutoQRS workflow from CVRG • Lessons learned and future directions

  5. Service-Oriented Science • People create services (data, code, instr.) … • which I discover (& decide whether to use) … • & compose to create a new function ... • & then publish as a new service. • I find “someone else” to host services, so I don’t have to become an expert in operating services & computers! • I hope that this “someone else” can manage security, reliability, scalability, …! • Source: “Service-Oriented Science”, Science, 2005
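
The discover, compose, and publish cycle on this slide can be illustrated with a toy example. This is only a sketch: the in-memory `registry`, the service names, and the helper functions are all hypothetical and stand in for a hosted service catalogue, not for any caGrid API.

```python
from typing import Callable, Dict, List

# Hypothetical in-memory catalogue standing in for a hosted service registry.
registry: Dict[str, Callable[[float], float]] = {}


def publish(name: str, service: Callable[[float], float]) -> None:
    """Make a service available to others under a name."""
    registry[name] = service


def discover(keyword: str) -> List[str]:
    """Return the names of published services that mention the keyword."""
    return sorted(name for name in registry if keyword in name)


def compose(first: str, second: str) -> Callable[[float], float]:
    """Chain two published services into a new function."""
    f, g = registry[first], registry[second]
    return lambda x: g(f(x))


# People create services ...
publish("normalize.scale", lambda x: x / 10.0)
publish("analysis.square", lambda x: x * x)

# ... which I discover, compose, and publish as a new service.
publish("workflow.scale_then_square",
        compose("normalize.scale", "analysis.square"))
```

The composed entry is itself just another service in the catalogue, which is the point of the last step on the slide.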

  6. caBIG Goal and Vision caBIG is a virtual web of interconnected data, individuals and organizations that redefines how research is conducted, care is provided, and patients/participants interact with the biomedical enterprise. • Connect the cancer research community through a shareable, interoperable infrastructure • Deploy and extend standard rules and a common language to more easily share information • Build or adapt tools for collecting, analyzing, integrating and disseminating information associated with cancer research and care

  7. caBIG function dimensions • caGrid • Clinical Data and Trials Management • Biospecimen Management • In Vivo Imaging • Molecular Characterization

  8. What is caGrid? • Biomedical applications that share data all have common needs for syntactic and semantic interoperability • caGrid is a software toolkit aimed at software developers creating Grid applications

  9. caGrid provides • Metadata services that add semantic information to all Grid services • The GAARDS toolkit, a standard security platform • Introduce: the ‘Eclipse’ for services development • Index Service: A service registry for advertisement and discovery of capabilities

  10. caGrid: nuts and bolts

  11. A scientific workflow • precisely defines a multi-step procedure, to seamlessly integrate and streamline local and remote heterogeneous computational and data resources to perform in silico scientific exploration.

  12. Workflow Requirements • Service discovery • Data access • Service interaction • Security enforcement • Knowledge sharing

  13. Overview of caGrid Workflow • Workflow as consumer • Easily reuse services for complex experiments. • Workflow as contributor • Workflow as “best practice” wrapped as services. • Workflow providing RoI for SOA (Diagram labels: Composition, Discovery, Orchestration, Connectivity, Analysis, Virtualization, Security; instruments, community data, computation resource; reuse, generate; caGrid.)

  14. caGrid Workflow Suite • Service discovery • Data access • Service interaction • Security enforcement • Knowledge sharing

  15. The caBIG Workflow System • Data-flow modeling flavor • caGrid activity • State management (WSRF) • Security (GSI) • Service discovery based on cancer research metadata • Implicit iteration: handle parallel execution • WSRF and GSI enforcement • Workflow Execution Service • Workflows in caGrid Portal • A “Facebook” for caGrid workflows (Diagram labels: Composition, Discovery, Execution, Reuse; community, reuse, generate; caGrid.)
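
Implicit iteration means that when a step expecting a single item receives a list, the engine applies the step to each element, possibly in parallel. The sketch below shows the idea in plain Python; `run_with_implicit_iteration` is a hypothetical helper written for illustration and is not part of Taverna.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


def run_with_implicit_iteration(service: Callable[[Any], Any],
                                value: Any,
                                max_workers: int = 4) -> Any:
    """Call `service` once for a single item, or once per element for a list.

    List elements are processed by a thread pool; `pool.map` returns results
    in input order, so the output list lines up with the input list.
    """
    if isinstance(value, list):
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            return list(pool.map(service, value))
    return service(value)


# A step written for one string also accepts a list of strings.
single = run_with_implicit_iteration(str.upper, "qrs")
many = run_with_implicit_iteration(str.upper, ["p", "qrs", "t"])
```

The workflow author writes the step for one item; the iteration over lists is supplied by the engine.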

  16. Semantic Service Discovery • Semantic search – searches Index Service for registered caGrid services matching various search criteria: • Service name, inputs, outputs, research center, class names, concept codes, etc.

  17. Semantic Service Discovery Service metadata • Types of query • String based. • Property based. • Semantic based.
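
The three query types can be sketched against a toy index. Everything here is made up for illustration: the `ServiceEntry` fields, the service names, the research centers, and the concept codes are hypothetical and do not reflect the real Index Service schema.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ServiceEntry:
    name: str
    research_center: str
    concept_codes: Set[str] = field(default_factory=set)


# Hypothetical registered services.
INDEX = [
    ServiceEntry("ExampleArrayDataService", "Example Center A", {"X001"}),
    ServiceEntry("ExamplePredictionService", "Example Center B", {"X002"}),
]


def string_query(text: str) -> List[str]:
    """String based: case-insensitive substring match on the service name."""
    return [s.name for s in INDEX if text.lower() in s.name.lower()]


def property_query(prop: str, value: str) -> List[str]:
    """Property based: exact match on one named metadata property."""
    return [s.name for s in INDEX if getattr(s, prop) == value]


def semantic_query(concept_code: str) -> List[str]:
    """Semantic based: match services annotated with a concept code."""
    return [s.name for s in INDEX if concept_code in s.concept_codes]
```

A semantic query matches on annotations rather than names, so it can find a service whose name gives no hint of what it does.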

  18. caBIG services palette • As a result of semantic search or direct adding • caBIG services appear in Taverna’s Service Panel • Ready to be dragged and dropped into caGrid workflows

  19. Data access: CQL Builder
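
A CQL builder produces an XML query that names a target class and constrains its attributes. The sketch below assembles such a document with the standard library. The namespace URI and the `Target`/`Attribute`/`predicate` layout are written from memory of the CQL 1 format and should be treated as assumptions to check against the caGrid schema; the class and attribute names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace of the CQL 1 query schema; verify before real use.
CQL_NS = "http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLQuery"


def build_cql_query(target_class: str, attribute: str, value: str,
                    predicate: str = "EQUAL_TO") -> str:
    """Return a query selecting objects of one class by one attribute."""
    query = ET.Element("CQLQuery", {"xmlns": CQL_NS})
    target = ET.SubElement(query, "Target", {"name": target_class})
    ET.SubElement(target, "Attribute",
                  {"name": attribute, "value": value, "predicate": predicate})
    return ET.tostring(query, encoding="unicode")


# Hypothetical domain class and attribute, for illustration only.
cql = build_cql_query("org.example.domain.Experiment", "identifier", "EXP-1")
```

The resulting string would be passed as the input of a data service's query operation in a workflow.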

  20. Service interaction: managing state

  21. Security enforcement • Authentication • Ability to invoke services secured by Grid Security Infrastructure (GSI) • Integrated caGrid Security framework (GAARDS) with Taverna’s Credential Manager • Transport Level Security • Authorization • Performed on the service side, based on the user’s credentials • Credential Delegation Service Integration

  22. Secure Grid services • Taverna can invoke secure Grid services that require user to log in to caGrid • Taverna interacts with caGrid’s GAARDS infrastructure to obtain user’s proxy: • Authenticate the user with user’s affiliated Authentication Service • Obtain user’s proxy from Dorian Service • Default proxy lifetime: 12 hours
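
The two-step login described on this slide can be sketched as follows. The functions are hypothetical stand-ins, not the GAARDS client API: `authenticate` plays the role of the Authentication Service, `request_proxy` plays the role of Dorian, and the token format is invented. Only the 12-hour default lifetime comes from the slide.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

DEFAULT_PROXY_LIFETIME = timedelta(hours=12)  # default stated on the slide


@dataclass
class Proxy:
    identity: str
    expires_at: datetime


def authenticate(auth_service_url: str, username: str, password: str) -> str:
    """Hypothetical step 1: exchange a username/password for an opaque token."""
    if not password:
        raise ValueError("password must not be empty")
    return f"token:{username}@{auth_service_url}"


def request_proxy(dorian_url: str, token: str, now: datetime,
                  lifetime: timedelta = DEFAULT_PROXY_LIFETIME) -> Proxy:
    """Hypothetical step 2: exchange the token for a time-limited proxy."""
    identity = token.split(":", 1)[1]
    return Proxy(identity=identity, expires_at=now + lifetime)


def login(auth_service_url: str, dorian_url: str, username: str,
          password: str, now: datetime) -> Proxy:
    token = authenticate(auth_service_url, username, password)
    return request_proxy(dorian_url, token, now)


proxy = login("https://auth.example.org", "https://dorian.example.org",
              "jdoe", "secret", now=datetime(2011, 1, 1, 8, 0))
```

Passing `now` explicitly keeps the expiry calculation easy to test.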

  23. Using secure caGrid services • Involves: • Discovering a secure caGrid service from Taverna • Logging onto the selected caGrid to obtain a proxy certificate • Saving and managing caGrid proxies, usernames, and passwords

  24. Configuring secure services (1/2) • Authentication Service and Dorian Service URLs are required in order to obtain the user’s proxy • Can be configured globally for all services from the same caGrid (in preferences) • Can be configured individually for a particular caGrid service (overrides configuration from preferences)
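
The override rule (per-service settings win over grid-wide preferences) can be sketched as a dictionary merge. The grid name, URLs, and key names below are hypothetical and are not Taverna's actual preference keys.

```python
from typing import Dict

# Hypothetical grid-wide preferences, keyed by caGrid name.
GLOBAL_PREFS: Dict[str, Dict[str, str]] = {
    "example-grid": {
        "auth_url": "https://auth.example.org",
        "dorian_url": "https://dorian.example.org",
    },
}

# Hypothetical per-service settings, keyed by service URL.
SERVICE_OVERRIDES: Dict[str, Dict[str, str]] = {
    "https://svc.example.org/secure": {
        "dorian_url": "https://dorian-alt.example.org",
    },
}


def resolve_security_config(grid: str, service_url: str) -> Dict[str, str]:
    """Start from the grid's preferences, then apply service-level overrides."""
    config = dict(GLOBAL_PREFS.get(grid, {}))
    config.update(SERVICE_OVERRIDES.get(service_url, {}))
    return config
```

A service with no individual settings simply inherits the grid-wide values.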

  25. Configuring secure services (2/2) • View a secure service’s details • Configure the service’s security properties

  26. Logging onto caGrid • User is prompted for his caGrid username and password when any secure service is invoked from a workflow for the first time

  27. Credential management • Taverna obtains a proxy for the user from Dorian Service using the user’s caGrid username and password • Proxies are saved and managed by Credential Manager • caGrid username and password can also be remembered
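
Saving proxies means the user is only asked to log in when no valid proxy is on hand. The sketch below is a minimal cache with expiry, assuming one proxy per caGrid; the `CredentialCache` class and the `fetch_proxy` callback are hypothetical and do not mirror Taverna's Credential Manager interface.

```python
from datetime import datetime, timedelta
from typing import Callable, Dict, Tuple

# fetch_proxy(grid, now) -> (proxy, expiry); stands in for the login round trip.
FetchProxy = Callable[[str, datetime], Tuple[str, datetime]]


class CredentialCache:
    def __init__(self, fetch_proxy: FetchProxy) -> None:
        self._fetch_proxy = fetch_proxy
        self._proxies: Dict[str, Tuple[str, datetime]] = {}

    def get(self, grid: str, now: datetime) -> str:
        """Return a saved proxy if it is still valid, else fetch a new one."""
        cached = self._proxies.get(grid)
        if cached is not None and cached[1] > now:
            return cached[0]
        proxy, expires_at = self._fetch_proxy(grid, now)
        self._proxies[grid] = (proxy, expires_at)
        return proxy


fetch_calls = []


def fake_fetch(grid: str, now: datetime) -> Tuple[str, datetime]:
    """Test double that records each login and issues a 12-hour proxy."""
    fetch_calls.append(grid)
    return f"proxy-{len(fetch_calls)}", now + timedelta(hours=12)


cache = CredentialCache(fake_fetch)
start = datetime(2011, 1, 1, 8, 0)
first = cache.get("example-grid", start)
reused = cache.get("example-grid", start + timedelta(hours=1))
renewed = cache.get("example-grid", start + timedelta(hours=13))
```

The second call reuses the saved proxy; the third happens after expiry and triggers a new login.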

  28. Workflow execution service • Taverna Workflow Service wraps the Taverna execution engine into a WS-Resource and exposes operations such as createResource, startWorkflow, getStatus, and getOutput for user-submitted workflows. (Diagram labels: Workflow Portlet; Taverna Workbench; Client API; Workflow Service; Taverna Engine; Data Services; Analytical Services; caGrid & Other Services; EPR; Stateful Resources (Resource Properties).)

  29. Workflow execution service • Taverna Workflow Service • Provides stateful resources that execute the workflows. • Supports caGrid security architecture (GSI Security). • Allows programmatic submission of workflows.
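
Programmatic submission follows the four operations named on the previous slide: create a resource, start the workflow, poll its status, and fetch the output. The sketch below runs that sequence against an in-memory fake. The operation names come from the slides; the `FakeWorkflowService` class, the status strings, and the argument shapes are assumptions made for illustration.

```python
import itertools
from typing import Any, Dict


class FakeWorkflowService:
    """In-memory stand-in that completes each workflow on the second poll."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._resources: Dict[str, Dict[str, Any]] = {}

    def createResource(self, workflow_definition: str) -> str:
        epr = f"resource-{next(self._ids)}"  # handle to the stateful resource
        self._resources[epr] = {"definition": workflow_definition,
                                "status": "Pending", "polls": 0,
                                "inputs": None, "output": None}
        return epr

    def startWorkflow(self, epr: str, inputs: Dict[str, str]) -> None:
        resource = self._resources[epr]
        resource["inputs"] = inputs
        resource["status"] = "Active"

    def getStatus(self, epr: str) -> str:
        resource = self._resources[epr]
        if resource["status"] == "Active":
            resource["polls"] += 1
            if resource["polls"] >= 2:
                resource["status"] = "Completed"
                resource["output"] = {"echo": resource["inputs"]}
        return resource["status"]

    def getOutput(self, epr: str) -> Dict[str, Any]:
        resource = self._resources[epr]
        if resource["status"] != "Completed":
            raise RuntimeError("workflow has not completed")
        return resource["output"]


def run_workflow(service: FakeWorkflowService, definition: str,
                 inputs: Dict[str, str], max_polls: int = 10) -> Dict[str, Any]:
    """Submit a workflow and poll until it completes or the budget runs out."""
    epr = service.createResource(definition)
    service.startWorkflow(epr, inputs)
    for _ in range(max_polls):
        if service.getStatus(epr) == "Completed":
            return service.getOutput(epr)
    raise TimeoutError("workflow did not complete in time")


result = run_workflow(FakeWorkflowService(), "<workflow/>",
                      {"experimentId": "EXP-1"})
```

A real client would sleep between polls and would carry the caller's credentials on each call.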

  30. Access Taverna workflow via caGrid portal Taverna Workflow Portlet is deployed in the caGrid Portal on the training Grid: URL : http://portal-demo.training.cagrid.org/web/guest/tools/taverna-workflow View : 1 • The Portlet currently lists a few workflows with their descriptions that can be browsed from the above URL • Users can select a workflow they are interested in running.

  31. Access Taverna workflow via caGrid portal URL : http://portal-demo.training.cagrid.org/web/guest/tools/taverna-workflow View : 2 • Based on the number of input ports in the workflow, the portlet prompts the user to enter the input values in text boxes. • For example, the Lymphoma workflow takes only one input, in the form of an Experiment ID that identifies the experiment that caArray uses for data collection. • Hit submit after entering the data.

  32. Access Taverna workflow via caGrid portal URL : http://portal-demo.training.cagrid.org/web/guest/tools/taverna-workflow Views : 3, 4, & 5 • The portlet stores the user-submitted workflows in the current session of the portal. • Users can view all the Active and Completed workflows in the session. • Clicking the Output button shows the output of the workflow. • The portlet provides workflow-specific view resolvers to render the outputs. For example, the Lymphoma workflow currently displays its output in an HTML table.
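
A workflow-specific view resolver can be thought of as a lookup from workflow name to rendering function. The sketch below is an assumption about the general shape only: the registry, the `"lymphoma"` key, and the row-based output format are hypothetical, and the real portlet is not written in Python.

```python
from html import escape
from typing import Any, Callable, Dict, List


def render_html_table(rows: List[List[str]]) -> str:
    """Render rows of cells as an HTML table, escaping cell text."""
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(cell)}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    return f"<table>{body}</table>"


def render_plain(output: Any) -> str:
    """Fallback for workflows without a dedicated resolver."""
    return f"<pre>{escape(str(output))}</pre>"


# Hypothetical mapping from workflow name to its renderer.
VIEW_RESOLVERS: Dict[str, Callable[[Any], str]] = {
    "lymphoma": render_html_table,
}


def render_output(workflow_name: str, output: Any) -> str:
    return VIEW_RESOLVERS.get(workflow_name, render_plain)(output)


table = render_output("lymphoma", [["sample", "type"], ["s1", "A<B"]])
```

Escaping the cell text keeps service output from being interpreted as markup in the portal page.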

  33. Knowledge Sharing • Search ‘cabig’ in myExperiment or • Type http://www.myexperiment.org/search?type=workflows&query=cabig • Type http://tinyurl.com/cabig-workflow
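
The search link on this slide is a plain query string, so it can be built for any keyword. A minimal sketch using only the standard library; the helper name is hypothetical and the URL pattern is the one shown on the slide.

```python
from urllib.parse import urlencode

SEARCH_BASE = "http://www.myexperiment.org/search"  # from the slide


def myexperiment_search_url(query: str, kind: str = "workflows") -> str:
    """Build a myExperiment search link for the given keyword."""
    return SEARCH_BASE + "?" + urlencode({"type": kind, "query": query})


url = myexperiment_search_url("cabig")
```

`urlencode` also takes care of escaping keywords that contain spaces or other reserved characters.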

  34. Discovery using myExperiment

  35. Lymphoma Prediction Workflow • Microarray from tumor tissue • Microarray preprocessing • Lymphoma prediction

  36. Lymphoma type prediction Acknowledgement: Juli Klemm, Xiaopeng Bian, Rashmi Srinivasa (NCI) Jared Nedzel (MIT)

  37. AutoQRS Analysis Workflow (Diagram labels: WFDB binary and Patient ID; Store WFDB; WFDB data service; Retrieve WFDB; Patient Record; Analysis Execution Record; JSDL service; AutoQRS Analytical Service; Invoke Processing; AutoQRS Output Data Service; AutoQRS XML Results.)

  38. The Taverna workflow

  39. The result in MS Excel

  40. Accomplishments • Lymphoma workflow – Among the top 20 most viewed/downloaded Workflows in myExperiment • This is more impressive given that this workflow was uploaded much later than the other workflows • Our BMC-Bioinformatics Article on “caGrid Workflow Toolkit: A Taverna based workflow tool for cancer Grid” achieved “Highly Accessed” relative to its age • We are part of the CVRG Project that recently got renewed

  41. Lessons Learned • Lower the barriers to entry for sharing data and analytics • Software is surprisingly hard to use for end users – more so if the benefit is not all too clear • Return on Investment of a SOA is in creating reusable workflows (LEGO blocks) • Workflows are only as good as the services we create • Traditional SDLC does not always work in the favor of the end users • 80-20 and KISS

  42. Goals of Workflow Project in CVRG • Deploy existing technology on the CVRG that can be used to store and execute workflows generated locally using the Taverna workbench • Develop new technology that allows non-expert users to graphically compose and execute workflows via a web interface. • Extend the Taverna Engine and add support for invocation of REST-style services so that users can annotate workflow inputs and outputs using ontology terms from NCBO BioPortal and other ontology repositories • Develop specifications describing how workflows should be designed, validated, and documented, and support user development of workflows. • Extend the technology so that workflows can be executed in a cloud-computing environment

  43. Suggested Direction • Hosted Workflow Solution: SaaS workflow tools • Globus Online • Galaxy

  44. Acknowledgements • Univ. Chicago / ANL • Ian Foster • Dinanath Sulakhe • Bo Liu • Univ. Manchester, UK • Carole Goble • Stian Soiland-Reyes • Alexandra Nenadic • Inventrio • Shannon Hastings • Stephen Langella • Scott Oster • Other colleagues from Ohio State University, National Cancer Institute, JHU …

  45. Journal papers & book chapters • Composition as a Service. IEEE Internet Computing. 2010 • A Comparison of Using Taverna and BPEL in Building Scientific Workflows: the case of caGrid. CCPE. 2010. • Data-driven Service Composition in Building SOA Solutions: A Petri Net Approach. IEEE T-ASE, 2010 • Scientific workflows that enable Web-scale collaboration: combining the power of Taverna and caGrid. IEEE Internet Computing. 2008 • Workflow in a Service Oriented Cyberinfrastructure Environment. in: Junwei Cao (Ed.). Cyberinfrastructure Technologies and Applications. Nova Science Publishers, 2008. (book chapter)

  46. Conference papers • Scientific workflows as services in caGrid: a Taverna and gRAVI approach. ICWS 2009 • Wrap Scientific Applications as WSRF Grid Services using gRAVI. ICWS 2009 • Orchestrating caGrid Services in Taverna. ICWS 2008 • Building Scientific Workflow with Taverna and BPEL: a Comparative Study in caGrid. WESOA 2008 • Build Grid Enabled Scientific Workflows using gRAVI and Taverna. SWBES 2008

  47. Contact information • Ravi Madduri • madduri@mcs.anl.gov • Computation Institute, Univ. Chicago • http://www.ci.uchicago.edu/
