The Way Things Go

e-Science is a complex activity. Scientific knowledge is comprehensible only in the context of those activities. Adopt the Rube Goldberg view. Grand challenge: systems-scale science. Observation and modeling of multiple systems at multiple scales.

cyrah

Presentation Transcript


  1. The Way Things Go • e-Science is a complex activity • Scientific knowledge is comprehensible only in the context of those activities • Adopt the Rube Goldberg view (image: Rube Goldberg)

  2. Grand challenge: systems-scale science • Observation and modeling of multiple systems at multiple scales • Linking data and tools from different disciplines • to get a valid global result! “... modeling complex systems will be a major research challenge for the 21st century” - National Science Foundation

  3. Building current practices up isn't working • Heterogeneous tools, data formats • Little global coordination of research • Little funding for sustained stewardship of tools and data (image: M.C. Escher, “Tower of Babel”, 1928)

  4. Proposed solutions aren't working • e-Journals – not machine-interpretable • Collaboration tools • scientists just use email like everyone else • Portals and digital libraries – typically: • centralized • domain-specific • The Grid – can orchestrate complex processing jobs, but that's not science

  5. Only networks work at scale • Single researcher (desktop) • Ad hoc data management, single-user apps • Community (workgroup) • Community tools, resources, control • Global (network) • No global practice, tools, control

  6. How do we get there? • e-Science means managing • Process, and • Data • Current approaches favor one or the other • Information is getting lost (diagram labels: model, refine, predict, observe, critical interface, data)

  7. Trends: process  data (diagram: a process axis labeled Batch, Interactive, Workflow and a data axis labeled Data, Metadata, Semantics; technologies plotted: mainframes, desktop apps, e-notebooks, digital libraries, portals, the grid, provenance, rules, formats, ontologies)

  8. Key technologies • Semantic web: data/metadata • Provides means of merging descriptive information even if it only partially agrees (e.g., comes from two different communities) • Workflow: process • Describes complex procedures independently of how they are executed • Provenance: process + data/metadata • Links workflow, data, and any ancillary descriptive information (e.g., attribution)

  9. Semantics: data to knowledge • Knowledge (abstract): ontologies, rules, models, etc. (a.k.a. semantics) • reached by learning, inference • Information: collections, tags, attributes, etc. (a.k.a. metadata) • reached by aggregation, annotation • Data (concrete): streams, arrays, swaths, etc. (a.k.a. files) (cf. Reagan Moore)

  10. Semantic web: RDF triple (subject, predicate, object) • Declarative: asserts a fact • Subject and object URIs identify arbitrary entities (things, people, concepts, events) • Predicate identifies the relationship between them
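
A minimal sketch of the triple structure described on this slide, using only the Python standard library rather than any RDF toolkit. The `Triple` type, the `describe` helper, and every `http://example.org/` URI are hypothetical names invented for illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One declarative statement: subject --predicate--> object."""
    subject: str    # URI of the entity being described
    predicate: str  # URI naming the relationship
    obj: str        # URI of the entity the subject is related to


# Hypothetical namespace and entities, assumed for this example only.
EX = "http://example.org/"
fact = Triple(EX + "rex", EX + "hasBreed", EX + "collie")


def describe(t: Triple) -> str:
    """Render a triple as a single N-Triples-style statement."""
    return f"<{t.subject}> <{t.predicate}> <{t.obj}> ."


print(describe(fact))
```

Because the triple only asserts a relationship between identifiers, nothing in it says where the fact is stored or who is allowed to state it.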

  11. Triples form an open network • Subject nodes aren't “owned” by any single agent or container • Any actor can add arcs to the implicit, total, world graph • Any two graphs can be joined (example arc label: hasBreed)
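
The claim that any two graphs can be joined can be sketched by treating a graph as a plain set of triples, so that joining is set union. This is an illustrative simplification, not how a production triple store works; the two graphs, the `arcs_from` helper, and the `http://example.org/` URIs are assumptions made up for the example.

```python
# A graph is modeled here as a set of (subject, predicate, object) tuples.
EX = "http://example.org/"  # hypothetical namespace

# Two actors describe the same subject node independently.
kennel_graph = {
    (EX + "rex", EX + "hasBreed", EX + "collie"),
}
vet_graph = {
    (EX + "rex", EX + "hasBreed", EX + "collie"),  # agrees with the kennel
    (EX + "rex", EX + "hasOwner", EX + "alice"),   # adds a new arc
}

# Joining two graphs is set union: shared URIs line up automatically
# and statements made by both actors collapse into one.
merged = kennel_graph | vet_graph


def arcs_from(graph, subject):
    """Return the sorted (predicate, object) pairs leaving a subject node."""
    return sorted((p, o) for s, p, o in graph if s == subject)


print(arcs_from(merged, EX + "rex"))
```

Neither actor needed permission from the other to add an arc to the node for `rex`, which is the sense in which subject nodes are not owned.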

  12. Non satis scire (to know is not enough) • Semantic web “layer cake” • Where do we manage process? • User interface? • Applications? • “Semantic Grid” (D. DeRoure, C. Goble) (source: World Wide Web Consortium)

  13. Workflow: process description • Describe complex operations as networks of simpler operations • Abstract operation execution from description • Can be shared (but may not be portable) (Taverna) (Kepler)

  14. Anatomy of a workflow • Declarative: says what to do • Modules identify arbitrary procedures • Arcs identify flow of control and/or data (data flow is usually implicit) (diagram labels: execution model (usually implicit), “module”, control flow)
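
The anatomy above can be sketched as a small declarative description plus a separate engine. This is a toy under stated assumptions, not the model used by Taverna, Kepler, or D2K: the module names, the `arcs` mapping, and the `run` function are all hypothetical, and the execution model chosen here is simply "run each module once, in dependency order".

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Modules: names bound to arbitrary procedures (hypothetical examples).
# Each procedure receives a dict of its upstream modules' outputs.
modules = {
    "load":   lambda inputs: [3.0, 4.0, 5.0],
    "square": lambda inputs: [x * x for x in inputs["load"]],
    "total":  lambda inputs: sum(inputs["square"]),
}

# Arcs: each module maps to the modules whose outputs it depends on.
arcs = {"load": [], "square": ["load"], "total": ["square"]}


def run(modules, arcs):
    """Toy engine: execute every module after all of its predecessors."""
    results = {}
    for name in TopologicalSorter(arcs).static_order():
        inputs = {dep: results[dep] for dep in arcs[name]}
        results[name] = modules[name](inputs)
    return results


print(run(modules, arcs)["total"])
```

The description (`modules` and `arcs`) says what to do; only `run` decides how and in what order, so a different engine could execute the same description in parallel or on remote services.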

  15. Workflow systems • Modules representing units of computation • Language for specifying WF • modules • control flow • Engine for executing WF D2K (source: NCSA)

  16. Work vs. workflow systems • Scientists are not WF modules • Science work also involves • social organization incl. funding • field and “wet lab” manual work • discourse: review, validation (source: CNRS/UCSD)

  17. Provenance: what happened • Answers critical questions • What led to this result? • When and how were observations made, conclusions reached? • Is a causal network of events

  18. Complementary, incomplete notions of provenance • Process-centric (e.g., workflow) • computational events (e.g., service invocations) • control flow • artifacts are either not mentioned or opaque (tool-specific) • Artifact-centric (e.g., digital libraries) • “lineage” = events in lifecycle of artifact, e.g., custody • IRs focus on curation events (not antecedent processes)

  19. Provenance Challenges 1 & 2 • IPAW 2006, HPDC 2007 • 20 teams, 1 workflow, 9 queries • major players • Interoperability? • lots of manual work required • call for standards (source: gridprovenance.org)

  20. Artifact + process provenance = “open provenance” • Can describe any process, not just WF execution (e.g., science!) • Allows alternate accounts by different observers • Rules for inferring transitive causal relationships (source: Luc Moreau et al.)

  21. Open Provenance Model (source: Luc Moreau et al.) • 3 node types – artifact, process, agent • 5 arc types – used, generated, triggered, derived, controlled – and inference rules • Generic – extensibility via annotation • Choice of granularity and focus (e.g., artifact or process-centric)
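
A rough sketch of provenance as a causal graph, loosely following the node and arc vocabulary on this slide. It is a deliberate simplification: every arc is treated as a causal dependency and followed transitively, which is not the inference rule set of the Open Provenance Model specification. The arc labels, file names, and the `causes_of` function are hypothetical.

```python
from collections import defaultdict

# Each arc is (effect, arc type, cause): arcs point from an effect back
# to the thing it depended on. Artifacts are files, processes are steps,
# and "alice" is an agent. All names are invented for this example.
arcs = [
    ("plot.png",        "generated_by",  "render"),
    ("render",          "used",          "model_output.nc"),
    ("model_output.nc", "generated_by",  "simulate"),
    ("simulate",        "used",          "observations.csv"),
    ("simulate",        "controlled_by", "alice"),
]


def causes_of(node, arcs):
    """Return every node reachable by following arcs from effect to cause."""
    direct = defaultdict(set)
    for effect, _kind, cause in arcs:
        direct[effect].add(cause)

    seen, stack = set(), [node]
    while stack:
        for cause in direct[stack.pop()]:
            if cause not in seen:
                seen.add(cause)
                stack.append(cause)
    return seen


# "What led to this result?" for the final artifact.
print(sorted(causes_of("plot.png", arcs)))
```

Answering the question for `plot.png` walks through both artifacts and processes, which is why a model restricted to only one of the two leaves gaps in the causal chain.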

  22. NCSA Provenance Infrastructure • Visualization, interaction: desktop, portal, etc. • Tracking, modeling, presentation: OPM toolkit, Open Provenance Model • Abstraction, inference, storage: Tupelo Semantic Content Repository (contexts, stores)

  23. Tupelo: semantic content • Abstracts content from storage implementations (e.g., Sesame, Mulgara) • Provides location-independent addressing of content and metadata • Supports transparent mirroring, caching, failover, etc. (tupeloproject.org)

  24. CyberIntegrator: workflow by example • Records what users do as provenance • source, intermediate, and final artifacts • steps and parameters • Can re-enact interaction as a workflow
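
The idea of recording what a user does and then re-enacting it can be sketched as follows. This is not CyberIntegrator's implementation or API: the `Recorder` class and the `scale` and `clip` steps are hypothetical, and the sketch keeps only steps and parameters, not the intermediate artifacts the slide also mentions.

```python
class Recorder:
    """Run steps interactively while logging them for later re-enactment."""

    def __init__(self):
        self.steps = []  # ordered (function, parameters) records

    def do(self, func, value, **params):
        """Apply one step to a value and remember the step and its parameters."""
        self.steps.append((func, params))
        return func(value, **params)

    def replay(self, value):
        """Re-enact the recorded steps, in order, on a new source value."""
        for func, params in self.steps:
            value = func(value, **params)
        return value


# Hypothetical analysis steps a user might perform by hand.
def scale(xs, factor):
    return [x * factor for x in xs]


def clip(xs, upper):
    return [min(x, upper) for x in xs]


session = Recorder()
out = session.do(scale, [1, 2, 3], factor=10)
out = session.do(clip, out, upper=25)

# The recorded interaction now behaves like a workflow on new data.
print(session.replay([4, 5]))
```

The user never writes a workflow description; the log of steps and parameters gathered during the session serves as one.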

  25. MAEviz: analysis/viz app, workflow “behind the scenes” • GIS app platform • Earthquake hazard analysis plug-in • Data catalog • built environment • fragility/hazard models • Driven by workflow -> provenance

  26. CyberCollaboratory: collaboration + provenance • User interaction with tools generates events • Events are captured using the OPM and published to Tupelo • Non-portal apps can browse / use provenance

  27. Summary • “The way things go” is critical to e-Science at scale • Provenance is an open causal network • New infrastructure supports provenance

  28. Resources / acknowledgements • Grid Provenance Challenge • http://twiki.gridprovenance.org/ • NCSA technologies • Tupelo: http://tupeloproject.org/ • CyberIntegrator: http://isda.ncsa.uiuc.edu/ • MAEviz: http://maeviz.cee.uiuc.edu/ • CyberCollaboratory: http://ecid.ncsa.uiuc.edu/cybercollab/ • Acknowledgements: • Jim Myers, Luc Moreau, Juliana Freire, Patrick Paulson, Simon Miles, Bob McGrath, and more ...
