Efrat Jaeger, Ilkay Altintas

Presentation Transcript


  1. Using Scientific Workflows in GEON Efrat Jaeger, Ilkay Altintas

  2. Mission of scientific workflow systems • Promote “scientific discovery” by providing tools and methods to generate scientific workflows • Create a generic, customizable graphical user interface for scientists from different scientific domains • Support computational experiment creation, execution, sharing, reuse and provenance • Design frameworks that define efficient ways to connect to existing data and integrate heterogeneous data from multiple resources • Large-scale resource sharing and management • Collaborative and distributed applications • Gluing it all together on the user’s monitor!

  3. Utilizing Kepler in GEON • An extensible, easy-to-use workflow design and prototyping tool • Integrating heterogeneous local and remote tools in a single interface: • Web and Grid services • GIS services • Legacy application integration via the Shell-Command actor • Remote tools via SSH, SCP and GridFTP • Relational and spatial database access • Reusable generic and domain-specific actors • Support for high-performance computing: • Job submission and monitoring • Logging of execution traces and registering intermediate products • Data provenance and failure recovery • Portal accessibility: • Deployment of workflows to the GEON portal • Harvesting data and tools from repositories: • Direct access to data and tools registered to the GEON portal • A web service harvester • Storage Resource Broker (SRB) • Reverse engineering of existing approaches

  4. Actor-Oriented Design • Actor • Encapsulation of parameterized actions • Interface defined by ports and parameters • Port • Communication of data between output and input ports • Without call-return semantics • Composite Actors • Abstraction of details • Sub-workflows • Model of computation (Director) • Communication semantics among ports • Flow of control
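To make the actor, port and director vocabulary concrete, here is a toy Python sketch. It is not Kepler's API (Kepler builds on Ptolemy II, which is written in Java); the classes `Port`, `Actor`, `Source`, `Scale`, `Sink` and `SimpleDirector` are hypothetical. Actors only push tokens into ports and never receive a return value, and a separate director object decides when each actor fires.

```python
from collections import deque


class Port:
    """Input port: a FIFO of tokens. Senders never get a return value."""
    def __init__(self):
        self.tokens = deque()


class Actor:
    """An encapsulated, parameterized action with named input and output ports."""
    def __init__(self, name, inputs=(), params=None):
        self.name = name
        self.inputs = {p: Port() for p in inputs}
        self.outputs = {}  # output port name -> list of connected input Ports
        self.params = params or {}

    def connect(self, out_port, other, in_port):
        self.outputs.setdefault(out_port, []).append(other.inputs[in_port])

    def send(self, out_port, token):
        for port in self.outputs.get(out_port, []):
            port.tokens.append(token)

    def ready(self):
        return all(p.tokens for p in self.inputs.values())

    def fire(self):
        raise NotImplementedError


class Source(Actor):
    """Emits a fixed list of tokens once."""
    def __init__(self, name, values):
        super().__init__(name)
        self.values = list(values)
        self.done = False

    def ready(self):
        return not self.done

    def fire(self):
        for v in self.values:
            self.send("out", v)
        self.done = True


class Scale(Actor):
    """Multiplies each incoming token by its 'factor' parameter."""
    def __init__(self, name, factor):
        super().__init__(name, inputs=("in",), params={"factor": factor})

    def fire(self):
        x = self.inputs["in"].tokens.popleft()
        self.send("out", x * self.params["factor"])


class Sink(Actor):
    """Collects every token it receives."""
    def __init__(self, name):
        super().__init__(name, inputs=("in",))
        self.received = []

    def fire(self):
        self.received.append(self.inputs["in"].tokens.popleft())


class SimpleDirector:
    """Model of computation: fire any ready actor until nothing can fire."""
    def __init__(self, actors):
        self.actors = actors

    def run(self):
        progress = True
        while progress:
            progress = False
            for actor in self.actors:
                if actor.ready():
                    actor.fire()
                    progress = True


src, scale, sink = Source("src", [1, 2, 3]), Scale("scale", 10), Sink("sink")
src.connect("out", scale, "in")
scale.connect("out", sink, "in")
SimpleDirector([src, scale, sink]).run()
```

Swapping `SimpleDirector` for a different scheduler changes the flow of control without touching the actors, which is the point of keeping the model of computation separate.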

  5. Workflow Design and Prototyping • Vergil is the graphical user interface for Kepler • Actor ontology and semantic search for actors • Search -> Drag and drop -> Link via ports • Metadata-based search for datasets (screenshot panels: Data Search, Actor Search)

  6. Actor Search • Kepler Actor Ontology • Used in searching actors and creating conceptual views (= folders) • Currently more than 200 Kepler actors added!

  7. Data Search and Usage of Results • Kepler DataGrid • Discovery of data resources through local and remote services • SRB • Grid and Web Services • DB connections • Registration of datasets on the fly using workflows

  8. Integrating heterogeneous local and remote tools in a single interface • Generic Web Service Client and Web Service Harvester • GIS Services • Legacy Application Integration via command-line wrapper tools, e.g. GMT • RDBMS and Spatial Database Access • Remote Tools Access via SSH, SCP and GridFTP • Grid actors: Globus Job Runner, GridFTP-based file access, Proxy Certificate Generator • Generic and domain-oriented actors: • Classification and interpolation algorithms • Native R support • Imaging, Gridding, Visualization Support • Textual and Graphical Output • …and more

  9. Some Features • Support for High-Performance Computing • Job submission and monitoring • Logging of execution traces and registering intermediate products • Data provenance and failure recovery • Portal accessibility • Deployment of workflows to the GEON portal • Harvesting data and tools from repositories • Direct access to data and tools registered to the GEON portal • A web service harvester • Storage Resource Broker (SRB)

  10. Some actors in place

  11. GEON Workflows Examples

  12. GEON Mineral Classification Workflow • An “early” example: classification for naming igneous rocks.

  13. GEON Mineral Classifier Workflow

  14. PointInPolygon algorithm
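The slides name a PointInPolygon step but do not show its code. A common choice is even-odd ray casting, sketched below under the assumption that each named field of a classification diagram is stored as a polygon of (x, y) vertices. The region names and coordinates are invented for illustration and do not come from any real diagram; points lying exactly on an edge may fall on either side.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd ray casting: cast a ray toward +x and count edge crossings."""
    inside = False
    j = len(polygon) - 1
    for i in range(len(polygon)):
        xi, yi = polygon[i]
        xj, yj = polygon[j]
        # Only edges that straddle the horizontal line through y can cross the ray.
        if (yi > y) != (yj > y):
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside
        j = i
    return inside


def classify(x, y, regions):
    """Return the name of the first region whose polygon contains (x, y), else None."""
    for name, polygon in regions.items():
        if point_in_polygon(x, y, polygon):
            return name
    return None


# Hypothetical diagram with two adjacent rectangular fields.
regions = {
    "field_A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "field_B": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
```

For example, `classify(1.5, 0.5, regions)` lands in `field_B`, and a point outside every field returns `None`.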

  15. Enter initial inputs, Run and Display results

  16. Output Visualizers Browser display of results

  17. Integration Scenario: A-type query • Classifying A-types from an igneous rock database • Integrating relational and spatial (shapefile) databases to query and interactively display GIS results • Reusing existing and generic Kepler components (Classifier, JDBC) • Ghulam Memon, Ashraf Memon

  18. Classification sub-workflow runs for each body, each sample and each diagram • Reusing the Mineral Classifier
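Running the classifier "for each body, each sample and each diagram" amounts to a nested iteration that calls one reusable component. The sketch below assumes samples are grouped by body as dicts with an `"id"` key; `toy_classifier` and the `"silica"` field are hypothetical stand-ins for the Mineral Classifier sub-workflow and its inputs, and the values are made up.

```python
from itertools import product


def classify_all(bodies, diagrams, classify):
    """Run `classify` once per (body, sample, diagram) combination.

    `bodies` maps a body name to its list of sample records; each record is a
    dict with at least an "id" key.
    """
    results = []
    for body, samples in bodies.items():
        for sample, diagram in product(samples, diagrams):
            results.append((body, sample["id"], diagram, classify(sample, diagram)))
    return results


# Hypothetical stand-in for the reusable Mineral Classifier sub-workflow.
def toy_classifier(sample, diagram):
    return f"{diagram}:{'high' if sample['silica'] > 55 else 'low'}"


bodies = {
    "body_1": [{"id": "s1", "silica": 50}, {"id": "s2", "silica": 70}],
    "body_2": [{"id": "s3", "silica": 60}],
}
results = classify_all(bodies, ["diagram_1", "diagram_2"], toy_classifier)
```

Passing the classifier in as a function mirrors the reuse on the slide: the iteration logic does not need to know how classification works.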

  19. Output

  20. Extraction of Datasets on the Fly • SQL database access (JDBC): query the UTEPgravity database for Bouguer anomalies • Translating the query XML response to the web service XML input format • worldImage XML SOAP response
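The translation step reshapes one XML document into another. The slides do not give either schema, so the element names below (`results`/`row`/`lat`/`lon`/`bouguer` on the query side, `points`/`point` on the service side) and the sample values are assumptions for illustration; the sketch uses Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET


def translate(query_xml: str) -> str:
    """Map each <row> of a query result onto a <point> element for a web service."""
    source = ET.fromstring(query_xml)
    target = ET.Element("points")
    for row in source.iter("row"):
        ET.SubElement(target, "point", {
            "x": row.findtext("lon", default=""),
            "y": row.findtext("lat", default=""),
            "value": row.findtext("bouguer", default=""),
        })
    return ET.tostring(target, encoding="unicode")


# Made-up query response; element names and values are illustrative only.
query_response = """<results>
  <row><lat>31.77</lat><lon>-106.50</lon><bouguer>-180.2</bouguer></row>
  <row><lat>31.80</lat><lon>-106.40</lon><bouguer>-175.9</bouguer></row>
</results>"""
service_input = translate(query_response)
```

In a workflow this would typically be a small transformation actor placed between the database actor and the web service client; an XSLT stylesheet is a common alternative for the same job.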

  21. Extraction of Datasets on the Fly • Creating shapefiles on the fly using ESRI mapping services • Translating the query XML response to the web service XML input format • worldImage XML SOAP response • Displaying an image of the shapefile on a browser interface

  22. Image of the resulting dataset (sample)

  23. GEON Dataset Registration • Annotation form (as in geonSearch)

  24. GEON Dataset Registration • ADN metadata • Metadata display and validation • Registering

  25. Putting it all together

  26. Beach Balls Workflow GOAL: Integrate seismic focal mechanisms with image services

  27. Beach Balls Workflow Output

  28. Gravity Modeling Workflow • Inputs: Observed Gravity, Topography, Pluton map, Sediments, Moho, Densities • Difference calculator • Output: Residual Map • Interactive 3D model defining the possible depth distribution of plutons • Source: (GEON) Dogan Seber, Randy Keller

  29. ToDo Kepler as a Modeling Tool: Gravity Modeling Workflow • Comparing synthetic and observed gravity models from heterogeneous data sources, creating a residual map of the difference using ESRI services and displaying it in a web browser • Portrays Kepler as a prototyping tool (“ToDo”) • Parameters are adjustable • Joint work between SDSC and UTEP.
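The "difference calculator" behind the residual map can be sketched as a cell-by-cell subtraction of two co-registered grids. The sign convention (observed minus synthetic) and the RMS summary are assumptions added here, not something the slides specify; the toy grid values in mGal are invented.

```python
import math


def residual_map(observed, synthetic):
    """Cell-by-cell difference observed - synthetic for two grids of equal shape."""
    if len(observed) != len(synthetic) or any(
        len(o_row) != len(s_row) for o_row, s_row in zip(observed, synthetic)
    ):
        raise ValueError("observed and synthetic grids must have the same shape")
    return [[o - s for o, s in zip(o_row, s_row)]
            for o_row, s_row in zip(observed, synthetic)]


def rms(grid):
    """Root-mean-square of all cells, a single-number summary of the misfit."""
    values = [v for row in grid for v in row]
    return math.sqrt(sum(v * v for v in values) / len(values))


# Toy 2 x 2 grids in mGal (values invented).
observed = [[10.0, 12.0], [8.0, 9.0]]
synthetic = [[9.0, 12.5], [8.0, 7.0]]
residual = residual_map(observed, synthetic)
misfit = rms(residual)
```

With adjustable model parameters (for example, densities), re-running the model and watching the misfit shrink is one simple way to use such a workflow for prototyping.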

  30. ToDo Gravity Modeling Workflow

  31. LiDAR Introduction • Stages: Survey; Process & Classify; Interpolate / Grid; Analyze / “Do Science” • Point Cloud: x, y, zn, … • Image credits: R. Haugerud, U.S.G.S.; D. Harding, NASA

  32. The Computational Challenge: • LiDAR generates massive data volumes: billions of returns are common. • Distributing these volumes of point cloud data to users via the internet is a significant challenge. • Processing and analyzing these data requires significant computing resources not available to most geoscientists. • Interpolating these data challenges typical GIS / interpolation software. • Our tests indicate that ArcGIS, Matlab and similar software packages struggle to interpolate even a small portion of these data. • Traditionally: Popularity > Resources
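A rough sense of "massive": assuming one billion returns, each stored only as x, y, z in 8-byte double precision with no attributes, the raw coordinates alone come to about 24 GB. Real LiDAR formats use integer-encoded coordinates and carry extra attributes, so actual file sizes differ; this is an order-of-magnitude illustration only.

```latex
N = 10^{9}\ \text{returns}, \qquad
S = N \times 3 \times 8\ \text{bytes} = 2.4 \times 10^{10}\ \text{bytes} \approx 24\ \text{GB}
```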

  33. A Three-Tier Architecture • GOAL: Efficient LiDAR interpolation and analysis using GEON infrastructure and tools • GEON Portal • Kepler Scientific Workflow System • GEON Grid • Use scientific workflows to glue/combine different tools and the infrastructure

  34. Kepler can be used as a batch execution engine • Diagram: move -> process -> move -> render -> display, spanning the Analyze and Visualize stages between the Portal and the Grid, with Monitoring/Translation and Scheduling/Output Processing • Configuration phase • Subset: DB2 query on DataStar • Interpolate: GRASS RST, GRASS IDW, GMT… • Visualize: Global Mapper, Fledermaus, ArcIMS
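IDW is one of the interpolators listed for the Interpolate step. The sketch below is the textbook inverse-distance-weighted estimate with weights $1/d^{p}$ over all samples; it is not GRASS's code, and production tools typically differ in details such as limiting the search to nearby points. The sample points are invented.

```python
import math


def idw(points, x, y, power=2.0):
    """Inverse distance weighted estimate at (x, y) from (px, py, pz) samples."""
    if not points:
        raise ValueError("need at least one sample point")
    num = den = 0.0
    for px, py, pz in points:
        d = math.hypot(x - px, y - py)
        if d == 0.0:
            return pz  # exact hit: return the sample value itself
        w = 1.0 / d ** power
        num += w * pz
        den += w
    return num / den


samples = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
```

Evaluating `idw` at every cell center of a grid produces a surface; because each cell touches every sample, this naive form costs O(cells × points), which is one reason billions of returns overwhelm desktop tools.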

  35. Lidar Processing Workflow (using Fledermaus) • Diagram: Subset from IBM DB2 on Datastar (d1) -> move -> process on the Arizona Cluster (Analyze) into a grid file (d2) -> move -> Fledermaus CreateScene file (sd) -> render/display in iView3D or a browser (Visualize); d1 and d2 are staged on NFS-mounted disks

  36. Lidar Processing Workflow (using Global Mapper) • Diagram: Subset from IBM DB2 on Datastar (d1) -> move -> process on the Arizona Cluster (Analyze) into a grid file (d2) -> move -> Global Mapper: get image for grid file -> display in a browser (Visualize); d1 and d2 are staged on NFS-mounted disks

  37. Lidar Processing Workflow (using ArcIMS) • Diagram: Subset from IBM DB2 on Datastar (d1) -> move -> process on the Arizona Cluster (Analyze) into a grid file (d2) -> move -> ArcInfo, ArcSDE, ArcIMS -> render/display (Visualize); d1 and d2 are staged on NFS-mounted disks

  38. Lidar Workflow Portlet • User selections from GUI • Translated into a query and a parameter file • Uploaded to remote machine • Workflow description created on the fly • Workflow response redirected back to portlet
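Translating GUI selections into "a query and a parameter file" might look like the sketch below. The dataset name, column names (`x`, `y`, `z`), selection keys and the `<parameters>` layout are all hypothetical, and the bounding-box `BETWEEN` filter is a plain-SQL stand-in for the DB2 spatial query the portal actually uses.

```python
import xml.etree.ElementTree as ET


def build_request(selection):
    """Turn portlet selections into an SQL subset query and a parameter XML string."""
    xmin, ymin, xmax, ymax = (float(v) for v in selection["bbox"])
    if xmin >= xmax or ymin >= ymax:
        raise ValueError("bounding box is empty")
    table = selection["dataset"]
    if not table.isidentifier():
        raise ValueError("unexpected dataset name")  # reject anything but a plain identifier
    query = (f"SELECT x, y, z FROM {table} "
             f"WHERE x BETWEEN {xmin} AND {xmax} AND y BETWEEN {ymin} AND {ymax}")
    params = ET.Element("parameters")
    ET.SubElement(params, "algorithm").text = selection["algorithm"]
    ET.SubElement(params, "resolution").text = str(selection["resolution"])
    ET.SubElement(params, "output").text = selection["output"]
    return query, ET.tostring(params, encoding="unicode")


query, param_xml = build_request({
    "dataset": "lidar_points",
    "bbox": (0, 0, 10, 5),
    "algorithm": "idw",
    "resolution": 1.0,
    "output": "ascii_grid",
})
```

Converting the bounding box to floats and validating the table name keeps free-form user text out of the generated SQL.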

  39. Lidar Post-Processing Workflow Portlet (Kepler workflow) • Client/GEON Portal: raw x, y, z and attribute data; parameter XML • Create Workflow Description: map parameters onto GRASS functions • Map onto the grid (Pegasus) • DB2 spatial query • Compute Cluster: GRASS surfacing algorithms (Spline, IDW, block mean, …) • Outputs: binary grid, ASCII grid, text file, TIFF/JPEG/GIF • Render map: ArcSDE, ArcInfo, ArcIMS • NFS-mounted disk • Submit; download data
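"Block mean" and "ASCII grid" can be illustrated together: bin the points into square cells, average the z values per cell, and write the result with an ESRI-style ASCII grid header. This is a minimal stand-alone sketch, not the GRASS implementation; the cell layout, NODATA value and sample points are assumptions.

```python
def block_mean(points, xll, yll, cellsize, ncols, nrows, nodata=-9999.0):
    """Average the z values of all points falling in each grid cell.

    Rows are ordered north to south (row 0 is the top), matching ASCII grid files.
    Cells with no points get `nodata`.
    """
    sums = [[0.0] * ncols for _ in range(nrows)]
    counts = [[0] * ncols for _ in range(nrows)]
    for x, y, z in points:
        col = int((x - xll) // cellsize)
        row_from_bottom = int((y - yll) // cellsize)
        if 0 <= col < ncols and 0 <= row_from_bottom < nrows:
            row = nrows - 1 - row_from_bottom
            sums[row][col] += z
            counts[row][col] += 1
    return [[sums[r][c] / counts[r][c] if counts[r][c] else nodata
             for c in range(ncols)] for r in range(nrows)]


def to_ascii_grid(grid, xll, yll, cellsize, nodata=-9999.0):
    """Serialize a grid with an ESRI-style ASCII grid header."""
    header = [
        f"ncols {len(grid[0])}",
        f"nrows {len(grid)}",
        f"xllcorner {xll}",
        f"yllcorner {yll}",
        f"cellsize {cellsize}",
        f"NODATA_value {nodata}",
    ]
    rows = [" ".join(str(v) for v in row) for row in grid]
    return "\n".join(header + rows) + "\n"


points = [(0.5, 0.5, 1.0), (0.6, 0.4, 3.0), (1.5, 1.5, 5.0)]
grid = block_mean(points, xll=0.0, yll=0.0, cellsize=1.0, ncols=2, nrows=2)
text = to_ascii_grid(grid, 0.0, 0.0, 1.0)
```

Block mean is cheap (one pass over the points) but leaves empty cells as NODATA, whereas spline or IDW surfaces fill gaps at higher cost.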

  40. Portlet User Interface - Main Page

  41. Behind the Scenes: Workflow Template

  42. Filled Template
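Creating the workflow description "on the fly" from a template can be sketched with Python's `string.Template`. The template below is a hypothetical, much-simplified XML document, not a real Kepler workflow file; values are XML-escaped before substitution, and the filled result is parsed once to confirm it is well-formed.

```python
import xml.etree.ElementTree as ET
from string import Template
from xml.sax.saxutils import escape

# Hypothetical, simplified workflow template; a real Kepler workflow file is far richer.
WORKFLOW_TEMPLATE = Template(
    '<workflow name="lidar_${job_id}">\n'
    '  <property name="query" value="${query}"/>\n'
    '  <property name="algorithm" value="${algorithm}"/>\n'
    '  <property name="resolution" value="${resolution}"/>\n'
    '</workflow>\n'
)


def fill_template(**values):
    """Substitute escaped values into the template and check the result parses."""
    safe = {k: escape(str(v), {'"': "&quot;"}) for k, v in values.items()}
    filled = WORKFLOW_TEMPLATE.substitute(safe)  # KeyError if a placeholder is missing
    ET.fromstring(filled)                        # ParseError if the XML is malformed
    return filled


workflow = fill_template(job_id=42, query="SELECT x, y, z FROM t WHERE x < 5",
                         algorithm="idw", resolution=1.0)
```

`substitute` (rather than `safe_substitute`) fails loudly when a placeholder has no value, which is usually preferable to submitting a half-filled job.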

  43. Example Outputs

  44. With Additional Algorithms

  45. GLW Monitoring • Job management: a unified interface to follow up on the status of submitted jobs • View job metadata • Zoom to a specific bounding box location • Track errors • Modify a job and resubmit • View the processing results • In the future, register desired workflow products, useful for publication • The system: GLW is exposed to a high risk of component failures • Long-running processes • Distributed computational resources under diverse controlling authorities • Provides transparent/background error handling using provenance data and ‘smart’ reruns
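One way to read ‘smart’ reruns is checkpointing: record each completed step's output, and on rerun skip any step whose output is already recorded. The slides do not describe GLW's actual mechanism, so this is a hypothetical sketch; the step names and the simulated failure are invented.

```python
calls = {"subset": 0, "interpolate": 0}


def run_with_provenance(steps, provenance, inputs):
    """Run (name, func) steps in order, reusing outputs already recorded in `provenance`."""
    data = inputs
    for name, func in steps:
        if name in provenance:
            data = provenance[name]   # finished in an earlier attempt: skip it
            continue
        data = func(data)             # may raise; earlier outputs stay recorded
        provenance[name] = data
    return data


def subset(points):
    calls["subset"] += 1
    return [p for p in points if p >= 0]


def interpolate(points):
    calls["interpolate"] += 1
    if calls["interpolate"] == 1:
        raise RuntimeError("simulated node failure")  # first attempt fails
    return sum(points) / len(points)


steps = [("subset", subset), ("interpolate", interpolate)]
provenance = {}
try:
    run_with_provenance(steps, provenance, [3, -1, 5])
except RuntimeError:
    pass  # the subset output survived in `provenance`
result = run_with_provenance(steps, provenance, [3, -1, 5])
```

Keying the record only by step name assumes the inputs and parameters have not changed between attempts; a fuller design would also key on those so that a modified, resubmitted job recomputes what it must.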

  46. Examples • Searching for actors and datasets • Actor search for ‘gis’ • Data search for ‘volcanic’ • Create a “Hello World!” workflow • <KEPLER_DIR>/demos/getting-started/04-HelloWorld.xml • Use of GEON data source and portal search • Search for ‘Igneous’ • Relational Database Access and Query • Connect to VT Igneous rocks database: • Database format: DB2 • URL: jdbc:db2://data.sdsc.geongrid.org:60000/IGNEOUS • User: readonly • Passwd: read0n1y • Web-service-based workflows • <KEPLER_DIR>/demos/getting-started/06-WebServicesAndDataTransformation.xml • Composite actors • Invoke a remote application via SSH • Run ls on a remote directory • Using various interpolation algorithms • interpolation actor • invoking a Perl script through SSH • through a web service

  47. GEON Mineral Classifier Workflow Demo

  48. Atype Workflow Demo

  49. Datasets Extraction and Registration Demo
