1 / 50

Designing, Executing and Sharing Workflows with Taverna 2.2

Designing, Executing and Sharing Workflows with Taverna 2.2. Katy Wolstencroft myGrid University of Manchester. Exercise 1: Exploring the Workbench. Taverna can be downloaded from http://www.taverna.org.uk/ Go to the page and click on download Taverna 2.2.0

druce
Download Presentation

Designing, Executing and Sharing Workflows with Taverna 2.2

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Designing, Executing and Sharing Workflows with Taverna 2.2 Katy Wolstencroft myGrid University of Manchester

  2. Exercise 1: Exploring the Workbench Taverna can be downloaded from http://www.taverna.org.uk/ Go to the page and click on download Taverna 2.2.0 Download the correct version for your operating system Follow the instructions in the Taverna installer The following page shows a screenshot of Taverna and the different panels that make up the workbench

  3. Taverna Workbench Workflow Diagram Services Panel Workflow Explorer

  4. 1. Workflow Explorer • The Workflow Explorer is the primary editing component within Taverna. Through it you can load, save and edit any property of a workflow. • Details of workflow validation can also be found here. Before a workflow is run, Taverna checks to see if it is connected correctly and if its services are available • The workflow explorer is also where you find configuration details of services and advanced options like iteration and looping. We will come back to these things later

  5. 1. Workflow Diagram The visual representation of workflow • Shows inputs / outputs, services and control flows • Allows editing of the workflow by dragging and dropping and connecting services together • Enables saving of workflow diagrams for publishing and sharing

  6. 1. Available Services Panel Lists services available by default in Taverna • Local java services • Simple web services • Soaplab services – legacy command-line application • R Processor • BioMart database services • BioMoby services • Beanshell processor Allows the user to add new services or workflows from the web or from file systems – there are loads more available!

  7. Exercise 2: Adding New Services • New services can be gathered from anywhere on the web • We will find a new service and add it to the workbench • You can find more services in the BioCatalogue • The BioCatalogue is a public curated catalogue of Life Science web services from Manchester and the EBI

  8. 2: Adding New Services Go to: http://www.biocatalogue.org and explore. Through the BioCatalogue you can find, register, or annotate web services

  9. 2. Adding New Services • Type ‘blast’ into the Search box in the BioCatalogue • Select the Blast service from the DDBJ (Hint – it is from Japan) There it is!

  10. 2. Adding New Services • Clicking on the blast service brings • you to the page describing the • service and its operations • Copy the service WSDL location This is what Taverna needs…

  11. 2. Adding New Services Go to the services panel in Taverna and click “import new services”. For each type of service, you are given the option to add a new service Select ‘WSDL service…’ A window will pop-up asking for a web address

  12. 2. Adding New Services Enter the Blast Web service address you just copied Scroll down to the bottom of the Services list and look at the new DDBJ service that is now included.

  13. Exercise 3: Building a Simple Workflow Go to the Services Panel • Type ‘Fasta’ into the ‘search’ box at the top of the panel • You will see several services in the search results • Select ‘Get Protein FASTA’. This service returns a protein sequence in Fasta format from a database if you supply it with a sequence id • Drag this service across to the workflow explorer panel

  14. Exercise 3: Building a Simple Workflow • In a blank space in the workflow diagram, right-click and select “Add Workflow Input Port” • Type in a name for this input (e.g. ID) and click “ok” • Do the same to create a new workflow output. Call this output “sequence”

  15. Exercise 3: Building a Simple Workflow You now have 3 boxes in the diagram and we need to connect them up Click on the input box and drag towards “Get Protein Fasta” and let go. An arrow will connect the two boxes

  16. Exercise 3: Building a Simple Workflow Click on the output box, drag towards “Get protein fasta”, and let go. An arrow will connect the two boxes You have now built your first workflow! It should look something like this

  17. Exercise 3: Building a Simple Workflow Run the workflow by selecting “file -> run workflow”, or by clicking on the play button at the top of the workbench

  18. Exercise 3: Building a Simple Workflow An input window will appear. As you can see, we have not yet added a description of the workflow or of the input Click on ‘New Value’ in the input window and add a Genbank Gene identifier (e.g. 215422388) where it says “some input data goes here”

  19. Exercise 3: Building a Simple Workflow Click “run workflow” In the bottom left of the results window, click on the results. You will now see a protein sequence from genbank In the services panel, search for “blast” Find the result “SearchSimple – Execute Blast” and drag that across to the workflow panel (this is the service we added at the beginning)

  20. Exercise 3: Building a Simple Workflow Now we have 2 services to connect into a workflow. We will connect “Get_protein_fasta” to “SearchSimple” by right-clicking “Get_protein_fasta” and selecting “link from output output_text” You will get an arrow. Drag the arrow to “searchSimple”. A box will appear asking which port you want to connect to – select “query”. Now the services are connected

  21. 3: Building a Simple Workflow If you show the service ports, you can connect directly between an output port on one service and an input port on another Show the service ports by clicking on the blue square icon at the top of the workflow diagram (next to abc)

  22. Exercise 3: Building a Simple Workflow We need to finish building the workflow by adding inputs and outputs Right click on “SearchSimple -> Result” and select “connect as input to..New Workflow Output Port”

  23. Exercise 3: Building a Simple Workflow Taverna will suggest a name for the output, if this is ok, select “ok” Add two new workflow inputs (called ‘database’ and ‘program’) and connect these to ‘database’ and ‘program’ in SearchSimple

  24. 3: The Finished Workflow! Your workflow should look something like this

  25. 3. Validate your Workflow Taverna can check to see that everything is connected properly and that all the services in your workflow are available Go to the workflow explorer and click on ‘validation report’ See if Taverna has found any problems with the workflow. Errors will be displayed in red, warnings in yellow. Workflows with warnings often still run. If there are problems, follow the instructions to resolve them by clicking on the ‘Solution’ tab

  26. 3: Adding a Workflow Description Right-click on a blank part of the workflow diagram and select “show details” In the workflow explorer panel, the details page will open up. Add some details about the workflow e.g. who is the author, what does it do You can also add examples and descriptions for the workflow inputs by selecting them and selecting “details” An example for database is ‘SWISS’, for program, ‘blastp’, and for ID ‘215422388’ Save the workflow by going to “File -> save workflow”

  27. 4. Running the Workflow Go to “File -> run workflow”. A workflow input window will appear like before This time, each input has its own tab with descriptions and examples as well as a panel to enter data In the fasta_id input, select “New value” and add a genbank GI number (e.g. 215422388) In the database, add “SWISS” In the program, add “blastp” Select “run workflow” at the bottom of the panel to set the workflow going

  28. 4. Setting String Constants • For parameters that do not change often, you will not wish to always type them in as input. In this example, the database and blast program may only change occasionally, so there is an alternative way of defining them. • Go back to the workflow diagram and remove the ‘database’ and ‘program’ inputs by right-clicking and selecting ‘Delete workflow input port’

  29. 4. Setting String Constants • In a blank space in the workflow diagram, right-click and select ‘string constant’ • In the pop-up box add ‘SWISS’ as a value and change the name of the string constant to database • Connect this to the database port on the BLAST service • Create another string constant with a value ‘blastp’ and the name ‘program’ • Connect this to the program port on the BLAST service • Save the workflow and run it again – this time you will only be asked for one input

  30. 4. Checkpoint Exercise • Now modify your workflow so that BLAST searches across all protein databases and you only get back the top 5 hits in a tabular format • HINT: you will need to swap SearchSimple for another service from the same set.

  31. Exercise 5: Sharing Workflows Go to http://www.myexperiment.org myExperiment is a social networking site for sharing workflows and workflow expertise and experiences Browse around the site and see what it contains Create yourself an account and (if you wish) join the group called Tutorial (a useful place to share items from today)

  32. 5. Sharing workflows Explore myExperiment Which is the most downloaded workflow? Which is the most viewed workflow? Is it the same? Explore the workflows packs – how many packs feature workflows for microarray analysis? Find all the items relating to Systems Biology. How did you find them? How many are there? Can all the workflows be downloaded?

  33. 6. Using Workflows from myExperiment You can download and run the workflows from the myExperiment website, or you can use myExperiment directly from Taverna Go back to Taverna and click on the myExperiment icon at the top of the workbench In the search box, type ‘Kegg’. We are going to find all the workflows that explore kegg pathways In the results, find the workflow called “NCBI GI to Kegg Pathways” (by Paul Fisher)

  34. 6. Using Workflows from myExperiment We will add this workflow to our own blast workflow by clicking ‘import’ and selecting ‘Add as nested workflow’ in the pop-up window. NOTE: If you add a workflow as a nested workflow, it continues to be a separate module (a workflow within a workflow). We recommend this modular approach because it is easier to combine and reuse these functional models. You need to connect up the workflow as if it was any other kind of service

  35. 7. Reusing and connecting Workflows The nested workflow has 1 input and 4 outputs Connect the outer workflow input ‘ID’ to the nested workflow input

  36. 7. Reusing and connecting Workflows Create 2 new outputs (by right-clicking on the blank canvas) and call them ‘pathways’ and ‘pathway_descriptions’ Connect the nested workflow output ‘pathway_by_gene’ to the ‘pathways’ output and connect ‘pathway_descriptions’ to ‘pathway_descriptions’

  37. 7. Reusing and connecting Workflows Save the workflow and run it As the workflow runs, track its progress by looking at the graphical view and the progress report in the results panel. As services finish, they turn grey. You can pause and resume the workflow if you wish (this is more useful with longer running workflows!) Look at the results This time, you will have blast results and kegg pathway results

  38. 7. Looking at Intermediate Results You can also track intermediate workflow values through the results view. This is very useful for working out where unexpected results came from. On the diagram, click the service called ‘btit’ and look at its inputs and outputs in the results. This gives you the gene names plus a short description You can save the workflow back onto myExperiment if you wish, but make sure you give credit to the nested workflow author! We will come back to combining workflows later

  39. Controlling data flow in Workflows Taverna allows you to automatically iterate through large data sets. This section introduces you to some of the more advanced configuration options, such as setting iteration strategies and adding loops to your workflows

  40. As you have already seen, Taverna can automatically iterate over sets of data. When 2 sets of iterated data are combined, however, Taverna needs extra information about how they should be combined. You can have: A cross product – combining every item from list 1 with every item from list 2 - all against all A dot product – only combining item 1 from list 1 with item 1 from list 2, and so on – line against line 8. Iteration

  41. Find and load the workflow ‘Demonstration of configurable iteration’ from myExperiment Read the workflow metadata to find out what the workflow does (by looking at the ‘Details’) Select the ‘ColourAnimals’ service and select the ‘Details’ in the workflow explorer and ‘configure list handling’ Click on ‘dot product’ in the pop-up window. This allows you to switch to cross product 8. Iteration

  42. Run the workflow twice – once with ‘dot product’ and once with ‘cross product’. Save the first results so you can compare them – what is the difference? What does it mean to specify dot or cross product? 8. Iteration

  43. 9. Looping From myExperiment, load the workflow ‘EBI_InterproScan_NewServices’ by Katy Wolstencroft This workflow is asynchronous. This means that when you submit data to the ‘runInterproScan’ service, it will return a jobID and place your job in a queue (this is very useful if your job will take a long time!) The ‘Status’ nested workflow will query your job ID to find out if it is complete

  44. 9. Looping The default behaviour in a workflow is to call each service only once for each item of data – so what if your job has not finished when ‘Status’ workflow asks? Run the workflow Almost every time, the workflow will fail because the results have not been returned before the workflow reaches the ‘get_results’ service

  45. 9. Looping This is where looping is useful. Taverna can keep running the ‘status’ service until it reports that the job is done. Select the ‘Status’ nested workflow and click on the ‘details’ tab in the workflow explorer Select ‘advanced’ and click on ‘add looping’ Use the drop-down boxes in the looping window to set ‘get_status_output_status’ ‘is_not_equal_to’ RUNNING

  46. 9. Looping Save the workflow and run it again This time, the workflow will run until the ‘Status’ nested workflow reports that it is either DONE, or it has an ERROR. You will see results for ‘TextResults’, but you will still get an error for ‘Graphical_results’ and ‘XML_results’. This is because there is one more configuration to change – we also need ‘Control Links’

  47. A control link specifies that there is a dependency of one service on another even though there is no data flowing between them. A control link is a line with a white circle at the end that connects two services (see the link between the ‘Status’ nested workflow and ‘get_Result_input’ 10. Control Links

  48. 10. Control Links We will add control links to the other two output types Right-click on getResult_graphical_input and and select ‘Run after’ from the drop down menu. Set it to ‘Run after’ -> ‘Status’ Repeat this for the XML result Save and run the workflow Now you will see each result returned

  49. 11. Retries: Making your workflow more Robust Web services can sometimes fail due to network connectivity If you are iterating over lots of data items, you can guard against these temporary interruptions by adding retries to your workflow Upload the ‘retry-example’ workflow from the myExperiment pack. This workflow is designed to fail sometimes. Run the workflow as it is and count the number of failed iterations

  50. 11. Retries: Making your workflow more Robust Now, select the ‘sometimes_fails’ service and select the ‘details’ tab in the workflow explorer panel Click on ‘advanced’ and ‘configure’ for retries In the pop-up box, change it so that it retries each service iteration 2 times Run the workflow again – how many failures do you get this time? Change the workflow to retry 5 times – does it work every time now?

More Related