
Managing VO data and process flows




  1. THE US NATIONAL VIRTUAL OBSERVATORY
     Managing VO data and process flows
     Matthew J. Graham, CACR/Caltech
     NVO Summer School 2005

  2. Overview
     • Astronomical data
     • VOStore/VOSpace
     • Workflows
     • Astrogrid workflow
     • CEA

  3. The importance of data (figure: VO Wheel™)
     • Data is the raison d’être of the VO
     • LSST is the data source nonpareil
         • data rates of 540 MB/s → ~16 TB in 8 hrs
         • final archive > 3 PB of data
     • Well-established ways of handling distributed data:
         • SRB
         • PVFS
         • OGSA-DAI
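The nightly volume quoted on the slide follows directly from the data rate. A minimal check of the arithmetic, assuming decimal units (1 TB = 10^6 MB), which the slide does not specify:

```python
# Back-of-the-envelope check of the LSST figures quoted on the slide.
RATE_MB_PER_S = 540      # data rate from the slide
HOURS = 8                # duration from the slide
MB_PER_TB = 1_000_000    # assumption: decimal units, not stated in the slide


def volume_tb(rate_mb_per_s: float, hours: float) -> float:
    """Return the data volume in TB accumulated at a constant rate."""
    seconds = hours * 3600
    return rate_mb_per_s * seconds / MB_PER_TB


nightly_tb = volume_tb(RATE_MB_PER_S, HOURS)
print(f"{nightly_tb:.2f} TB")  # 15.55 TB, i.e. roughly 16 TB
```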

  4. Data use cases
     • Client has data:
         • stored locally: transfers it to service
         • stored locally: service retrieves it
         • stored elsewhere: service retrieves it
     • Service generates data:
         • stores it locally: notifies client of location
         • transfers it to the client’s local store
         • transfers it to a client-designated store

  5. VOStore
     • Provides a uniform interface to existing or new data storage locations (Facade pattern)
     • Structured and unstructured data are both first level
     • Methods:
         • get
         • put
         • list / listAll
         • importInit
         • importData (sync/async)
         • exportInit
         • exportData (sync/async)
         • delete
         • rename
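To make the Facade idea concrete, here is a minimal sketch of a store that exposes a few of the method names from the slide over a plain in-memory dictionary. The class name `InMemoryVOStore`, the method signatures, and the error behaviour are all assumptions for illustration; the slide gives only the method names, not the actual VOStore interface.

```python
from typing import Dict, List


class InMemoryVOStore:
    """Toy facade: a uniform get/put/list/delete/rename interface.

    Hypothetical sketch; a real backend could be a file system, SRB, etc.
    """

    def __init__(self) -> None:
        self._items: Dict[str, bytes] = {}

    def put(self, name: str, data: bytes) -> None:
        self._items[name] = data

    def get(self, name: str) -> bytes:
        return self._items[name]

    def list(self) -> List[str]:
        return sorted(self._items)

    def delete(self, name: str) -> None:
        del self._items[name]

    def rename(self, old: str, new: str) -> None:
        if new in self._items:
            raise KeyError(f"{new!r} already exists")
        self._items[new] = self._items.pop(old)


store = InMemoryVOStore()
store.put("catalog.vot", b"<VOTABLE/>")
store.rename("catalog.vot", "catalog-v2.vot")
names = store.list()
```

Clients only see the method names, so swapping the dictionary for another storage backend would not change calling code.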

  6. VOSpace
     • Orchestrates VOStores:
         • data collections: directories, user-defined
         • authorisation: user groups
         • processing efficiency: where is the nearest copy?
         • move
         • copy
         • identifiers
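The "where is the nearest copy?" question can be sketched as a lookup over replicas registered under one identifier. Everything here is hypothetical: the names `ToyVOSpace` and `Replica`, the `ivo://` identifier, the store names, and the idea of a single numeric transfer cost are assumptions, not part of any VOSpace specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Replica:
    store: str   # name of the VOStore holding a copy (hypothetical)
    cost: float  # assumed transfer cost from the client to that store


@dataclass
class ToyVOSpace:
    """Tracks which stores hold a copy of each identifier."""

    replicas: Dict[str, List[Replica]] = field(default_factory=dict)

    def register(self, identifier: str, store: str, cost: float) -> None:
        self.replicas.setdefault(identifier, []).append(Replica(store, cost))

    def nearest(self, identifier: str) -> Replica:
        # Pick the replica with the lowest assumed transfer cost.
        return min(self.replicas[identifier], key=lambda r: r.cost)


space = ToyVOSpace()
space.register("ivo://example/image42", "store-caltech", cost=5.0)
space.register("ivo://example/image42", "store-cambridge", cost=120.0)
best = space.nearest("ivo://example/image42")
```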

  7. A virtual super-peer data network?

  8. How to manage the flows?
     • Way of describing a flow:
         • processes/steps, inputs/outputs, serial/parallel execution, control logic, variables, inline scripting
         • preferably XML (verbose but rigorous)
     • Way of controlling a flow: engine
     • e-Science vs. e-Business:
         • open-ended vs. closed
         • verification and publication
         • static vs. dynamic workflows
         • volume and type of data
         • meta-transactions
         • customer, manager and user vs. scientist

  9. Workflow patterns
     • Sequence
     • Parallel split
     • Synchronisation
     • Exclusive choice
     • Simple merge
     • Multi choice
     • Synchronizing merge
     • Multi merge
     • Discriminator
     • Deferred choice
     • Multiple instances with/without synchronisation
     • Implicit termination
     • Interleaved parallel routing
     • Milestone
     [Diagram labels: AND, XOR, Multi +]
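Two of the patterns named above, parallel split and synchronisation, can be sketched with Python's standard `concurrent.futures` module. The step functions are placeholders invented for the example; the sketch only illustrates the control flow, not any particular workflow engine.

```python
from concurrent.futures import ThreadPoolExecutor


def step_a(x: int) -> int:
    return x + 1  # placeholder work for one branch


def step_b(x: int) -> int:
    return x * 2  # placeholder work for the other branch


def run(x: int) -> int:
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Parallel split: both branches start from the same input.
        futures = [pool.submit(step_a, x), pool.submit(step_b, x)]
        # Synchronisation: wait for every branch before continuing.
        results = [f.result() for f in futures]
    return sum(results)


total = run(10)  # 11 + 20 = 31
```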

  10. Workflow kerfuffle
     • Workflow languages: BPEL (BPEL4WS, WSBPEL, WSFL, XLANG), BPML, WS-CDL (WSCL, WSCI), XPDL, BPSS, PSL, AGWL, DGL, DPML, GJobDL, GSFL, GFDL, GWorkflowDL, MoML, SWFL, YAWL, SCUFL/Xscufl, WPDL, PIF, PSL, OWL-S, xWFL, XPL, INCA
     • Workflow engines: Taverna, Kepler, Pegasus, DiscoveryNet, Triana, SPA, Geodise, ICENI, Askalon, GridNexus, BioPipe, BizTalk, BPWS4J, DAGMan, GridAnt, GJH, GRMS, GWFE, GWES, ITIEE, JIGSA, Karajan, ScyFLOW, SDSC Matrix, SHOP2, wftk, YAWL Engine, WFEE

  11. Astrogrid workflow components
     • JES (Job Execution System)
         • Astrogrid workflow engine
         • Manages control flow
         • Runs steps in a controlled asynchronous fashion
     • CEC (Common Execution Controller)
         • Manages step execution
         • Manages data flow
     • CEA (Common Execution Architecture) apps
         • datacenters: support complex queries against archives
         • processing: consume data files and reduce them

  12. Astrogrid workflow schematic
     [Diagram] Components: Registry, Command Line, Portal, Client library, JES, CEC, CEA, Datacenter, MySpace
     Labelled interactions: Application list, Resolve application, Submit workflow, Save/load workflow, Save/load data

  13. Astrogrid workflow language
      <workflow name="a workflow">
        <description>description of the workflow</description>
        <sequence/flow>
          <set var="dec" value="15"/>
          <step name="a" result-var="a-results">
            <tool name="toolA" interface="simpleInterface">
              <input>
                <parameter name="RA"><value>21</value></parameter>
                <parameter name="Dec"><value>${dec}</value></parameter>
              </input>
              <output>
                <parameter name="results" indirect="true">
                  <value>ftp://aServer/myResults</value>
                </parameter>
              </output>
            </tool>
          </step>
          <step name="b">…
        </sequence/flow>
        <script>…
        <if test=…>
        <while test=…>
        <for var=… items=…>
        <parfor var=… items=…>
        <try>
        <catch>
      </workflow>
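To show how a `<set>` variable such as `dec` might feed a `${dec}` reference in a step's inputs, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It is not the Astrogrid engine: the XML is a well-formed reduction of the slide's example (the slide's `<sequence/flow>` shorthand is replaced by `<sequence>`, and the elided parts are left out), and the substitution rule is an assumption about how the variables behave.

```python
import re
import xml.etree.ElementTree as ET

# Well-formed subset of the slide's example; <sequence> chosen by assumption.
WORKFLOW = """
<workflow name="a workflow">
  <description>description of the workflow</description>
  <sequence>
    <set var="dec" value="15"/>
    <step name="a" result-var="a-results">
      <tool name="toolA" interface="simpleInterface">
        <input>
          <parameter name="RA"><value>21</value></parameter>
          <parameter name="Dec"><value>${dec}</value></parameter>
        </input>
      </tool>
    </step>
  </sequence>
</workflow>
"""


def resolve_inputs(xml_text: str) -> dict:
    """Return {step name: {parameter name: value}} with ${var} expanded."""
    root = ET.fromstring(xml_text)
    variables = {s.get("var"): s.get("value") for s in root.iter("set")}
    resolved = {}
    for step in root.iter("step"):
        params = {}
        for param in step.findall("./tool/input/parameter"):
            raw = param.findtext("value", default="")
            # Replace each ${name} with the value set earlier in the workflow.
            params[param.get("name")] = re.sub(
                r"\$\{(\w+)\}", lambda m: variables[m.group(1)], raw
            )
        resolved[step.get("name")] = params
    return resolved


inputs = resolve_inputs(WORKFLOW)
print(inputs)  # {'a': {'RA': '21', 'Dec': '15'}}
```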

  14. CEA
     • Create a uniform interface and model for an application and its parameters
     • Provides higher level description than WSDL:
         • Restrict how interfaces can be expressed
         • Provide specific semantics for astronomical quantities
         • Extra information, such as default values, GUI labels
         • VOResource extensions for a general application
     • Provide asynchronous operation:
         • callback, polling and job identification
     • Allow separate data and control flows
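The asynchronous operation described above (a job identifier returned at submission, then polling or a callback) can be sketched with standard-library threads. This is an illustration of the general pattern only: `ToyAsyncService`, its method names, and the status strings `RUNNING` and `COMPLETED` are hypothetical and not taken from the CEA interfaces.

```python
import threading
import time
import uuid


class ToyAsyncService:
    """Hypothetical service: submit returns a job id; poll reports status."""

    def __init__(self):
        self._lock = threading.Lock()
        self._status = {}
        self._results = {}

    def submit(self, func, *args, callback=None):
        job_id = uuid.uuid4().hex  # job identification
        with self._lock:
            self._status[job_id] = "RUNNING"

        def worker():
            result = func(*args)
            with self._lock:
                self._results[job_id] = result
                self._status[job_id] = "COMPLETED"
            if callback is not None:
                callback(job_id, result)  # callback notification

        threading.Thread(target=worker, daemon=True).start()
        return job_id

    def poll(self, job_id):
        with self._lock:
            return self._status[job_id]

    def result(self, job_id):
        with self._lock:
            return self._results[job_id]


notifications = []
notified = threading.Event()


def on_done(job_id, result):
    notifications.append((job_id, result))
    notified.set()


service = ToyAsyncService()
job_id = service.submit(lambda a, b: a * b, 6, 7, callback=on_done)

# Polling: check the status until the job completes or a deadline passes.
deadline = time.monotonic() + 5.0
while service.poll(job_id) != "COMPLETED" and time.monotonic() < deadline:
    time.sleep(0.01)
notified.wait(timeout=5.0)
```

Because the client holds only the job identifier, the result data can travel by a different route than the control messages, which is one way to read the "separate data and control flows" point.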

  15. Minimum CEA compliance
     • Must implement CommonExecutionConnector interface
     • Must send a message to services implementing ResultsListener interface
     • Should perform basic type checking on all parameter types during init phase
     • Should send messages to services implementing JobMonitor interface
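The "basic type checking during init phase" recommendation can be illustrated with a small validator that runs before any work starts. The parameter table and the function `check_parameters` are hypothetical; the real CEA parameter types and error reporting are not described in these slides.

```python
# Hypothetical declared parameter types for one application interface.
DECLARED_TYPES = {"RA": float, "Dec": float, "results": str}


def check_parameters(declared: dict, supplied: dict) -> list:
    """Return a list of error strings; an empty list means all checks passed."""
    errors = []
    for name, raw in supplied.items():
        expected = declared.get(name)
        if expected is None:
            errors.append(f"{name}: unknown parameter")
            continue
        try:
            expected(raw)  # attempt the conversion; the value is discarded
        except (TypeError, ValueError):
            errors.append(f"{name}: expected {expected.__name__}, got {raw!r}")
    return errors


good = check_parameters(DECLARED_TYPES, {"RA": "21", "Dec": "15"})
bad = check_parameters(DECLARED_TYPES, {"RA": "21", "Dec": "abc", "Mag": "3"})
```

Rejecting a job at this stage avoids starting a long-running step that would fail later on a malformed input.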
