Towards Scientific Workflows Based on Dataflow Process Networks (or from Ptolemy to Kepler)
Bertram Ludäscher, San Diego Supercomputer Center, [email protected]
A Note on the Style of the following Slides
Due to lack of time, most of the following slides will be “by reference” only ;-)
…Each speaker was given four minutes to present his paper, as there were so many scheduled -- 198 from 64 different countries. To help expedite the proceedings, all reports had to be distributed and studied beforehand, while the lecturer would speak only in numerals, calling attention in this fashion to the salient paragraphs of his work. ... Stan Hazelton of the U.S. delegation immediately threw the hall into a flurry by emphatically repeating: 4, 6, 11, and therefore 22; 5, 9, hence 22; 3, 7, 2, 11, from which it followed that 22 and only 22!! Someone jumped up, saying yes but 5, and what about 6, 18, or 4 for that matter; Hazelton countered this objection with the crushing retort that, either way, 22. I turned to the number key in his paper and discovered that 22 meant the end of the world… [The Futurological Congress, Stanislaw Lem, translated from the Polish by Michael Kandel, Futura 1977]
From: SciDAC/SDM project and collaboration w/ Matt Coleman (LLNL)
Conceptual Workflow (Promoter Identification Workflow PIW)
For each gene
Interactive nature of these workflows is critical (data verification) - can these steps be automated or semi-automated?
need metadata from collection equipment and experimental design !
Test sample (d)
Native range prediction
(native range) (c)
Environmental layers (native
area prediction map (f)
(invasion area) (c)
Environmental layers (invasion area) (b)
Species presence & absence points (invasion area) (a)
GARP Invasive Species Pipeline
From: NSF SEEK (Deana Pennington et al)
data provenance (“virtual data”; cf. several ITR and e-Science projects)
Source: Expressiveness and Suitability of Languages for Control Flow Modelling in Workflows, PhD thesis, Bartosz Kiepuszewski, 2002
Source: W.M.P. van der Aalst et al.
must see (now: snippets following; watch for new ways to compress slides ;-)
a sophisticated system to do “simple” things (dataflows) as well as highly complex things (hybrid models)
(compare to your favorite standard/approach/system)
Source: Edward Lee et al. http://ptolemy.eecs.berkeley.edu/ptolemyII/
Think of it as “Workflow Execution Model++”
X = F(X,I)
(my vote is for ‘Kepler’…)
SemType m1 ::
Observation & itemMeasured.AbundanceCount &
DerivedObservation & itemMeasured.MortalityRate & hasContext.appliesTo.LifeStageProperty
XML raw-data =(X)Query=> object model =link => OWL ontology
See why we said user-definable (or auto-generated) actor libraries?
designed to fit
hand-crafted control solution; also: forces sequential execution!
No data transformations available
Complex backward control-flow
genBankG       :: GeneId -> GeneSeq
genBankP       :: PromoterId -> PromoterSeq
blast          :: GeneSeq -> [PromoterId]
promoterRegion :: PromoterSeq -> PromoterRegion
transfac       :: PromoterRegion -> [TFBS]
gpr2str        :: (PromoterId, PromoterRegion) -> String

d0 = Gid "7"                -- start with some gene-id
d1 = genBankG d0            -- get its gene sequence from GenBank
d2 = blast d1               -- BLAST to get a list of potential promoters
d3 = map genBankP d2        -- get list of promoter sequences
d4 = map promoterRegion d3  -- compute list of promoter regions and ...
d5 = map transfac d4        -- ... get transcription factor binding sites
d6 = zip d2 d4              -- create list of pairs promoter-id/region
d7 = map gpr2str d6         -- pretty print into a list of strings
d8 = concat d7              -- concat into a single "file"
d9 = putStr d8              -- output that file
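The specification above gives only type signatures, so it does not run on its own. A minimal sketch that makes it executable follows. The newtype definitions and every function body are hypothetical stubs (the real steps would call GenBank, BLAST and Transfac); only the signatures and the d0..d8 wiring come from the slide.

```haskell
module Main where

-- Hypothetical stand-in types; the slide names them but does not define them.
newtype GeneId         = Gid String            deriving (Eq, Show)
newtype GeneSeq        = GeneSeq String        deriving (Eq, Show)
newtype PromoterId     = Pid String            deriving (Eq, Show)
newtype PromoterSeq    = PromoterSeq String    deriving (Eq, Show)
newtype PromoterRegion = PromoterRegion String deriving (Eq, Show)
newtype TFBS           = TFBS String           deriving (Eq, Show)

-- Stub bodies (assumptions): they only fabricate labelled strings.
genBankG :: GeneId -> GeneSeq
genBankG (Gid g) = GeneSeq ("seq-of-gene-" ++ g)

genBankP :: PromoterId -> PromoterSeq
genBankP (Pid p) = PromoterSeq ("seq-of-" ++ p)

blast :: GeneSeq -> [PromoterId]
blast (GeneSeq _) = [Pid "p1", Pid "p2"]

promoterRegion :: PromoterSeq -> PromoterRegion
promoterRegion (PromoterSeq s) = PromoterRegion ("region(" ++ s ++ ")")

transfac :: PromoterRegion -> [TFBS]
transfac (PromoterRegion r) = [TFBS ("tfbs@" ++ r)]

gpr2str :: (PromoterId, PromoterRegion) -> String
gpr2str (Pid p, PromoterRegion r) = p ++ "\t" ++ r ++ "\n"

-- Promoter regions for one gene (steps d1..d4), paired with their ids (d6).
piwRegions :: GeneId -> [(PromoterId, PromoterRegion)]
piwRegions d0 = zip d2 d4
  where
    d1 = genBankG d0
    d2 = blast d1
    d3 = map genBankP d2
    d4 = map promoterRegion d3

-- Binding sites per promoter region (step d5).
piwSites :: GeneId -> [[TFBS]]
piwSites = map (transfac . snd) . piwRegions

-- Pretty-printed report (steps d7 and d8).
piwReport :: GeneId -> String
piwReport = concat . map gpr2str . piwRegions

main :: IO ()
main = putStr (piwReport (Gid "7"))  -- step d9
```

Replacing the stubs with real service calls would move the functions into IO; the dataflow shape stays the same.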
(= a data streaming model!)
Re-introducing map(f) to Ptolemy-II (was there in PT Classic)
no control-flow spaghetti
free concurrent execution
free type checking
automatic support to go from piw(GeneId) to PIW := map(piw) over [GeneId]
Simplified Process Network PIW
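A minimal sketch of the lifting step, assuming a stand-in `piw` that just returns a label (the real sub-workflow is not reproduced here). Because Haskell lists are lazy, `map piw` also behaves like a stream: results are produced on demand, even over an unbounded list of gene ids.

```haskell
module Main where

newtype GeneId = Gid String deriving (Eq, Show)

-- Hypothetical stand-in for the per-gene sub-workflow piw(GeneId).
piw :: GeneId -> String
piw (Gid g) = "promoter report for gene " ++ g

-- PIW := map(piw) over [GeneId]
piwAll :: [GeneId] -> [String]
piwAll = map piw

main :: IO ()
main = do
  -- An unbounded input stream of gene ids; only three results are demanded.
  let geneIds = [Gid (show n) | n <- [1 :: Int ..]]
  mapM_ putStrLn (take 3 (piwAll geneIds))
```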
Powerful type checking
Generic, declarative “programming” constructs
Generic data transformation actors
Forward-only, abstractable sub-workflow piw(GeneId)
optimization via functional rewriting possible
e.g. map(f o g) = map(f) o map(g)
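The rewrite rule can be checked on a small example. This is a sketch only; the names `twoPasses` and `onePass` are introduced here for illustration and do not appear in the slides.

```haskell
module Main where

-- Unoptimized form: map f . map g builds an intermediate list.
twoPasses :: (b -> c) -> (a -> b) -> [a] -> [c]
twoPasses f g = map f . map g

-- Rewritten form: map (f . g) traverses the input once.
onePass :: (b -> c) -> (a -> b) -> [a] -> [c]
onePass f g = map (f . g)

main :: IO ()
main = do
  let xs = [1 .. 5] :: [Int]
  print (twoPasses show (* 2) xs)
  -- Both forms agree on this input.
  print (onePass show (* 2) xs == twoPasses show (* 2) xs)
```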
Technical report & PIW specification in Haskell
Optimization by Declarative Rewriting I
map(f o g) instead of map(f) o map(g)
Combination of map and zip
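One plausible reading of this rewrite, offered as an assumption rather than the slide's own rule: a `zip` of a list with a `map` over that same list can be replaced by a single `map` that pairs each element with its image. In the PIW specification this would turn `zip d2 (map promoterRegion (map genBankP d2))` into one pass over `d2`. The helper name `pairWith` is hypothetical.

```haskell
module Main where

-- zip xs (map f xs) rewritten as one map that pairs each input with its image.
pairWith :: (a -> b) -> [a] -> [(a, b)]
pairWith f = map (\x -> (x, f x))

main :: IO ()
main = do
  let ids = ["p1", "p22", "p333"]
  print (pairWith length ids)
  -- Agrees with the unoptimized zip/map form on this input.
  print (pairWith length ids == zip ids (map length ids))
```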
e.g., Haskell-like for FP and SQL (XQuery)-like for (XML) database querying
Optimization by Declarative Rewriting II
Source: Real-Time Signal Processing: Dataflow, Visual, and Functional Programming, Hideki John Reekie, University of Technology, Sydney
FYI: Flow-based programming has been re-discovered/re-invented several times:
Flow-based Programming, http://www.jpaulmorrison.com/fbp/index.shtm