
Design Principles

Design Principles. Separation of components into a modular system: independent standalone modules that are also runnable programs. A collaborator wants to run srf2FastQ at home, without a MetaDB. A researcher tries custom parameters, but still tracks his run in the MetaDB.

kaili

Presentation Transcript


  1. Design Principles
     • Separation between components into a modular system
       • Independent standalone modules that are also runnable programs
       • Collaborator wants to run srf2FastQ at home, without a MetaDB
       • Researcher tries custom parameters, but still tracks his run in the MetaDB
     • XML workflows that define jobs and data dependencies
       • Parameterized to reuse workflows on different experiments
       • Based on the DAX standard
     • Execution engine uses the open-source Pegasus project
       • Wraps standard executables, so no modification to your code
       • Supports multiple cluster submission, including clusters living on EC2 and other clouds
       • Uses Globus to support SGE, PBS, Torque, Condor, LSF
       • Stages data and binaries to the appropriate cluster from whichever cluster has them
       • Manages temporary space and processing environment
         • Creating temp directories, moving input files in, staging and running your program, copying results out
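The temporary-space handling listed above (create a temp directory, move inputs in, run the program, copy results out) can be sketched in plain Java. This is only an illustration of the sequence under stated assumptions: `StagedRun`, `Step`, and the file names are hypothetical, and none of this is Pegasus code or its API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Stream;

/** Hypothetical illustration of a staged run; not part of Pegasus or SeqWare. */
final class StagedRun {

    /** Stand-in for a wrapped executable: reads inputs from workDir, returns the files it wrote. */
    interface Step {
        List<Path> run(Path workDir) throws IOException;
    }

    static void execute(List<Path> inputs, Path resultDir, Step step) throws IOException {
        Path workDir = Files.createTempDirectory("staged-run-");    // 1. create temp directory
        try {
            for (Path in : inputs) {                                 // 2. copy input files in
                Files.copy(in, workDir.resolve(in.getFileName()),
                        StandardCopyOption.REPLACE_EXISTING);
            }
            List<Path> outputs = step.run(workDir);                  // 3. run the program
            Files.createDirectories(resultDir);
            for (Path out : outputs) {                               // 4. copy results out
                Files.copy(out, resultDir.resolve(out.getFileName()),
                        StandardCopyOption.REPLACE_EXISTING);
            }
        } finally {
            deleteRecursively(workDir);                              // 5. release temp space
        }
    }

    /** Deletes children before parents by walking the tree in reverse order. */
    private static void deleteRecursively(Path root) throws IOException {
        try (Stream<Path> walk = Files.walk(root)) {
            walk.sorted(Comparator.reverseOrder()).forEach(p -> p.toFile().delete());
        }
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempDirectory("src-");
        Path input = Files.writeString(src.resolve("sample.fastq"), "@read1\nACGT\n+\nIIII\n");
        Path results = src.resolve("results");

        // The "program" here just counts lines in the staged input.
        execute(List.of(input), results, workDir -> {
            String text = Files.readString(workDir.resolve("sample.fastq"));
            Path out = Files.writeString(workDir.resolve("sample.count"),
                    String.valueOf(text.lines().count()));
            return List.of(out);
        });
        System.out.println(Files.readString(results.resolve("sample.count")));
    }
}
```

The `finally` block is the design point: the working directory is removed whether or not the step succeeds, so failed jobs do not leak scratch space.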

  2. Application Wrapper Interface
     Java API:
         public interface WrapperInterface {
             int init(); // Optional
             int get_syntax();
             int do_test();
             int do_verify_input();
             int do_verify_parameters();
             int do_run();
             int do_verify_output();
             int clean_up(); // Optional
         }
     • Application conforms to a standard interface
       • Developers and users do not have to understand the rest of the pipeline
     • Force users to adhere to best practices
       • Syntax, --help option
       • Required test harness
       • Verifications of input, output, parameters
     • Wrapped applications must be runnable both
     Local Execution:
         $ java SeqWareRunner bpostprocess --help
           → Reports get_syntax()
         $ java SeqWareRunner bpostprocess input
           → Run bpostprocess on the command line
         $ java SeqWareRunner bpostprocess --db input
           → Same as above, but without MetaDB feedback
         $ java SeqWareRunner bpostprocess --db input --config=config.txt
         $ java SeqWareRunner bpostprocess --db input -A 0 -n 8
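As a rough illustration of how a runner might drive the interface above, the sketch below calls the lifecycle methods in order and stops at the first failure. Several things here are assumptions, not taken from the slides: that a return value of 0 means success, the specific call order, and the names `WrapperLifecycleSketch`, `runAll`, and `EchoWrapper`. The real SeqWareRunner may behave differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntSupplier;

/** Hypothetical driver for the wrapper interface; not the actual SeqWareRunner. */
final class WrapperLifecycleSketch {

    /** Method names copied from the slide; return code 0 is ASSUMED to mean success. */
    interface WrapperInterface {
        int init();
        int get_syntax();
        int do_test();
        int do_verify_input();
        int do_verify_parameters();
        int do_run();
        int do_verify_output();
        int clean_up();
    }

    /** Runs the steps in an assumed order, stops at the first non-zero code, always cleans up. */
    static int runAll(WrapperInterface w, List<String> trace) {
        String[] names = {"init", "do_verify_parameters", "do_verify_input",
                          "do_run", "do_verify_output"};
        IntSupplier[] calls = {w::init, w::do_verify_parameters, w::do_verify_input,
                               w::do_run, w::do_verify_output};
        int status = 0;
        for (int i = 0; i < calls.length && status == 0; i++) {
            trace.add(names[i]);
            status = calls[i].getAsInt();
        }
        trace.add("clean_up");
        int cleanup = w.clean_up();
        return status != 0 ? status : cleanup;
    }

    /** Toy wrapped application: upper-cases its input string. */
    static final class EchoWrapper implements WrapperInterface {
        private final String input;
        private String output;

        EchoWrapper(String input) { this.input = input; }

        public int init() { return 0; }
        public int get_syntax() { System.out.println("usage: echo <input>"); return 0; }
        public int do_test() { return 0; }
        public int do_verify_input() { return input == null || input.isEmpty() ? 1 : 0; }
        public int do_verify_parameters() { return 0; }
        public int do_run() { output = input.toUpperCase(); return 0; }
        public int do_verify_output() { return output != null ? 0 : 1; }
        public int clean_up() { return 0; }
    }

    public static void main(String[] args) {
        List<String> trace = new ArrayList<>();
        int status = runAll(new EchoWrapper("acgt"), trace);
        System.out.println("status=" + status + " steps=" + trace);
    }
}
```

Running verification before `do_run()` means a bad input is rejected before any expensive work starts, which is one way to read the "best practices" bullet.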

  3. XML Workflow
     • Follows DAX Standard, which is input to Pegasus
     • Defines jobs, arguments, configuration, and data dependencies
     • Defines dependencies between jobs
     • Uses Java Freemarker to populate the XML template for each experiment

         <?xml version="1.0" encoding="UTF-8"?>
         <adag xmlns="http://pegasus.isi.edu/schema/DAX"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:schemaLocation="http://pegasus.isi.edu/schema/DAX http://pegasus.isi.edu/schema/dax-2.1.xsd"
               version="2.1" count="1" index="0" name="bfast"
               jobCount="3" fileCount="0" childCount="2">
           <!-- jobs -->
           <job id="ID0000001" namespace="seqware" name="runner" version="0.0.1">
             <argument>bfast matches %{reference_file} %{experiment}.fastq...</argument>
             <profile namespace="globus" key="max_memory">24576</profile>
             <profile namespace="globus" key="count">8</profile>
             <uses file="%{experiment}.fastq" link="input"/>
             <uses file="%{experiment}.bmf" link="output" transfer="false" register="false"/>
           </job>
           <job id="ID0000002" namespace="seqware" name="runner" version="0.0.1">
             <argument>bfast localalign ...</argument>
             <uses file="%{experiment}.bmf" link="input"/>
             <uses file="%{experiment}.baf" link="output" transfer="false" register="false"/>
           </job>
           <job id="ID0000003" namespace="seqware" name="runner" version="0.0.1">
             <argument>bfast postprocess ...</argument>
             <uses file="%{experiment}.bmf" link="input"/>
             <uses file="%{experiment}.bam" link="output" transfer="true" register="true"/>
           </job>
           .....
           <!-- Dependencies -->
           <child ref="ID0000002">
             <parent ref="ID0000001"/>
           </child>
           <child ref="ID0000003">
             <parent ref="ID0000001"/>
             <parent ref="ID0000002"/>
           </child>
         </adag>
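To make the parameterization concrete, the sketch below fills `%{name}` placeholders of the kind shown in the template. The slide says Freemarker does this job; the code here deliberately does not use Freemarker and is only a plain-regex stand-in for the idea. `DaxTemplateSketch`, `fill`, and the value `run42` are hypothetical.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical placeholder filler; a stand-in for the templating step, not Freemarker. */
final class DaxTemplateSketch {

    // Matches %{name} and captures the name.
    private static final Pattern PLACEHOLDER = Pattern.compile("%\\{(\\w+)\\}");

    /** Replaces every %{name} with its value; fails loudly on an unknown name. */
    static String fill(String template, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = values.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("No value for placeholder: " + m.group(1));
            }
            // quoteReplacement keeps '$' and '\' in values from being treated as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String template = "<uses file=\"%{experiment}.fastq\" link=\"input\"/>";
        // "run42" is an illustrative experiment name only.
        System.out.println(fill(template, Map.of("experiment", "run42")));
    }
}
```

Failing on a missing value, rather than leaving the placeholder in place, keeps a half-filled workflow from being submitted to the cluster.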

  4. Pegasus
     • Each task is a standalone application, independently runnable
       • Scientist asks 'how do I run Bfast'
       • Collaborator wants to run srf2FastQ at home, but does not have a pipeline or Metadata DB
       • Researcher wants to try some custom parameters, but we still want to track his run in the Metadata DB
     • Each application conforms to a standard, well-defined interface
       • The interface is abstract enough for users to wrap their applications without knowing anything about the pipeline
     • The interface forces users to adhere to best practices
       • Syntax, --help option
       • Required test harness
       • Verifications of input, output, parameters
     • Wrapped applications must be runnable both

