High Performance Workflows for Networks and Grids


Presentation Transcript


  1. High Performance Workflows for Networks and Grids Andrew H. Sherman Chief Technology Officer sherman@turboworx.com

  2. Outline • Technical Computing Workflows • Deploying Workflows in HPC Environments • TurboWorx Workflow Products

  3. Complex Technical Computations are Critical in Many Industries • Complex technical computing problems and algorithms have become “business critical” • Solutions often involve integrating several applications and many data sources into workflows • Automated coarse-grain parallelism and grid computing are emerging as key technologies

  4. Complex Technical Computations are Critical in Many Industries • Complex technical computing problems and algorithms have become “business critical” • Solutions often involve integrating several applications and many data sources into workflows • Automated coarse-grain parallelism and grid computing are emerging as key technologies
  • Life Sciences & Medicine: Discovery and Development (data- & compute-intensive applications; huge databases from multiple sources & in diverse formats; manual workflows) and Information-Based Medicine (complex, heterogeneous databases & applications; better and more effective diagnosis & treatment from faster, more accurate information interpretation)
  • Automotive/Aero: Design and Development (concurrent engineering requires integration and collaboration between the Concept, Design, and Development processes; global design teams that work around the clock; suppliers are part of the design and development process)
  • Finance: Portfolio Management/Pricing (scenario-based modeling; huge quantities of real-time data; time is money!)

  5. What is a Workflow? “The automation of a business process, in whole or parts, where documents, information or tasks are passed from one participant to another to be processed, according to a set of procedural rules” — Workflow Management Coalition

  6. Technical Computing Workflows How do technical computing workflows differ from traditional business process workflows? • Data flow vs. control flow • Widely distributed data (often with multiple owners) • Dynamic operating environment (e.g., the Grid) • Hierarchical workflow constructs • Requirement for parameterized executions • Evolving/Customized workflow definitions • Significance of collaboration and reuse
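
To make the first contrast concrete, here is a minimal, hypothetical Python sketch (not TurboWorx code) of a data-flow style task graph: each node declares its inputs and fires as soon as those inputs are available, rather than following an explicit control-flow sequence. The names Task and run_dataflow are illustrative assumptions.

    # Minimal data-flow sketch: tasks fire when their named inputs are available.
    # All names here (Task, run_dataflow) are illustrative, not a real workflow API.

    class Task:
        def __init__(self, name, inputs, outputs, fn):
            self.name, self.inputs, self.outputs, self.fn = name, inputs, outputs, fn

    def run_dataflow(tasks, initial_data):
        data = dict(initial_data)          # available named data items
        pending = list(tasks)
        while pending:
            ready = [t for t in pending if all(i in data for i in t.inputs)]
            if not ready:
                raise RuntimeError("deadlock: no task has all of its inputs")
            for t in ready:                # in a real engine these could run concurrently
                results = t.fn(*[data[i] for i in t.inputs])
                data.update(zip(t.outputs, results))
                pending.remove(t)
        return data

    # Example: B and C both depend only on A's output, so a scheduler may run them in parallel.
    tasks = [
        Task("A", ["raw"], ["cleaned"], lambda raw: (raw.strip(),)),
        Task("B", ["cleaned"], ["upper"], lambda s: (s.upper(),)),
        Task("C", ["cleaned"], ["length"], lambda s: (len(s),)),
    ]
    print(run_dataflow(tasks, {"raw": "  some sequence data  "}))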

  7. Characterizing Technical Computing Workflows [Chart: technical workflows positioned along axes of Business Value and Repetition] Ref: Production Workflow (Leymann, Roller)

  8. HPC Platforms: SMPs & Clusters [Diagram: UNIX database server and Linux computation cluster]
  • Shared Memory Multiprocessor: expensive to buy, costly to upgrade; poor scalability for computation; best use is data storage & access
  • Linux Clusters: cost-effective; scalable; modular — easy to upgrade to faster, better CPUs (e.g. 64-bit); great for computation
  • Blade Solutions: similar attributes to Linux clusters; more compact — better flops/ft³; often cheaper

  9. HPC Platforms: Enterprise Grids [Diagram: UNIX database server plus a grid of heterogeneous machines (Linux servers, AIX, Windows, and Mac OS X workstations)]
  • Efficient: uses all the hardware available
  • Provides user comfort and familiarity
  • More than cycle stealing on idle desktops — usually includes computing on heterogeneous collections of servers
  • Great for computation, particularly for Life Sciences, where desktop platforms are appropriate for many algorithms

  10. Technical Computing and Workflows Workflows can address some critical computing challenges: • Integrate, manage, and accelerate collections of heterogeneous applications, data, and platforms • Provide horsepower to process massive amounts of data by applying parallelism without source code modification • Address the needs of key user groups (end users, application experts, and IT staff) through easy-to-use interfaces • Facilitate collaboration and reuse to save time in the design, trials and testing, and deployment of new computing solutions

  11. But . . . There are difficulties to overcome: • Scalability & performance: going beyond multithreading with “transparent parallelism” • Management of dynamic computing environments • Automated data and application staging • Integration with rapidly evolving grid standards (to support reuse and collaboration) • Desktop tools for workflow creation; portals for execution • Debugging and monitoring interfaces

  12. Traditional Workflow Implementation • Large, complex scripts to orchestrate applications • Static embedded infrastructure control, usually aimed at a single machine • Communication via temp files • “Human-in-the-loop” operation
  What’s wrong with this?

  13. Traditional Workflow Implementation • Large, complex scripts to orchestrate applications • Static embedded infrastructure control, usually aimed at a single machine • Communication via temp files • “Human-in-the-loop” operation
  What’s wrong with this? • Poor performance — mainly aimed at SMPs (but scalability is often limited) • Lack of automation is inefficient and error-prone • No support for application integration or data conversion • Difficult to create, maintain, and modify (even for skilled programmers) • Little reusability or portability
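
As an illustration of the pattern being criticized, a traditional "workflow" often looks like the following hypothetical Python driver script: each application is launched one at a time on a single machine, stages communicate through temporary files, and a person inspects intermediate results before continuing. The program names and file paths are placeholders, not real tools.

    # Hypothetical traditional workflow script: sequential, single-machine,
    # temp-file plumbing, human-in-the-loop.  Program names/paths are placeholders.
    import subprocess

    def run_step(cmd, log):
        print("running:", " ".join(cmd))
        with open(log, "w") as out:
            subprocess.run(cmd, stdout=out, check=True)   # blocks until the step finishes

    run_step(["./fetch_data", "--out", "/tmp/input.dat"], "/tmp/fetch.log")
    run_step(["./analyze", "/tmp/input.dat", "/tmp/analysis.dat"], "/tmp/analyze.log")

    # Human-in-the-loop: someone inspects the intermediate file before continuing.
    input("Check /tmp/analysis.dat, then press Enter to continue...")

    run_step(["./store_results", "/tmp/analysis.dat"], "/tmp/store.log")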

  14. Traditional Life Science Workflows [Diagram: sequential pipeline Access Data → A (fast) → B (slow) → C (fast) → Store Data] Typical “Human-in-the-Loop” Workflow: • Manual component startup • “Cut and paste” data movement • Sequential execution • Limited throughput due to “bottleneck components”

  15. A Better Way: Automation & Parallelism TurboWorx High-Performance Workflow: [Diagram: the same pipeline with the slow component B replicated across machines, so every stage runs fast] • Automated component startup & data conversion • Pipeline acceleration: asynchronous, dynamic, concurrent execution on distributed machines • Transparent data-driven parallelism to eliminate bottlenecks
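
The pipeline acceleration described above can be sketched in plain Python (not TurboWorx code) with queues and worker threads: each stage consumes items as they arrive and passes results downstream, so A, B, and C overlap in time instead of running strictly one after another. The stage functions and queue names are assumptions for illustration.

    # Sketch of an asynchronous pipeline: stages A, B, C run concurrently,
    # each pulling work from an input queue and pushing results downstream.
    import queue, threading

    STOP = object()   # sentinel marking the end of the data stream

    def stage(fn, inq, outq):
        while True:
            item = inq.get()
            if item is STOP:
                outq.put(STOP)
                return
            outq.put(fn(item))

    q_in, q_ab, q_bc, q_out = (queue.Queue() for _ in range(4))
    threads = [
        threading.Thread(target=stage, args=(lambda x: x + 1,  q_in, q_ab)),   # stage A
        threading.Thread(target=stage, args=(lambda x: x * 10, q_ab, q_bc)),   # stage B
        threading.Thread(target=stage, args=(lambda x: -x,     q_bc, q_out)),  # stage C
    ]
    for t in threads:
        t.start()

    for record in range(5):          # feed the pipeline; downstream stages start immediately
        q_in.put(record)
    q_in.put(STOP)

    while True:
        result = q_out.get()
        if result is STOP:
            break
        print(result)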

  16. TurboWorx Enterprise Architecture [Diagram: user interfaces (Builder, Command Line, Web Portal) connect to the TurboWorx Hub, which draws on a Component Library, a Data Repository, and Data Storage, and dispatches work to heterogeneous workstations (Linux, AIX, Windows, Mac OS X) and to compute clusters managed by BQS/DRM systems]

  17. Workflow Lifecycle
  • Design: end user or developer? component & workflow development environment; integration with data; testing & debugging
  • Deployment: local storage vs. centralized storage; sharing & collaboration
  • Execution: execution interface (CLI, proprietary GUI, portal, Web/Grid service); access control for workflows and data; resource management
  • Monitoring: events reported by workflow and service execution
  • Refinement & Reuse

  18. TurboWorx Workflows Design & Deployment
  • Atomic Components: command-line programs (e.g. C/C++/Fortran, Perl), Java, Jython; XML wrappers created by wizards or by editing templates
  • Dataflow Components: workflows built from other components (including other workflows); automated data flow & transformations between components; created using a visual programming tool
  • Deployment: components stored in a “Component Library” (local or centralized); import/export and component sharing (collaboration); data references via a virtual “Data Repository” interface (supports WebDAV, Avaki, FTP, NFS)
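
Since the slide says atomic components can wrap command-line programs, here is a hedged Python sketch of what wrapping a command-line tool as a reusable component might look like. AtomicComponent is an invented name and the clustalw arguments are deliberately omitted; the actual TurboWorx wrapper format is the XML generated by its wizards, not this code.

    # Illustrative sketch of wrapping a command-line program as a reusable component.
    # "AtomicComponent" is an invented helper; real TurboWorx wrappers are XML.
    import subprocess

    class AtomicComponent:
        def __init__(self, name, command_template):
            self.name = name
            self.command_template = command_template   # e.g. ["clustalw", "{infile}"]

        def run(self, **params):
            cmd = [part.format(**params) for part in self.command_template]
            print(self.name, "->", cmd)
            return subprocess.run(cmd, capture_output=True, text=True)

    # Usage sketch: real clustalw arguments are omitted on purpose.
    align = AtomicComponent("align", ["clustalw", "{infile}"])
    # result = align.run(infile="family1.fasta")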

  19. TurboWorx Builder [Diagram: wizard-based atomic component creation wraps an application (e.g. ClustalW), a Java method, or a Jython script as a TurboWorx component; components are stored in the Component Library, from which workflow components are composed]

  20. Special Components: Conditionals

  21. Special Components: Loops Support for “For”, “While”, and “Do Until” loops [Diagram: example of a While loop component]
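
To indicate what a loop component does conceptually, here is a hypothetical Python sketch of a “While” construct that keeps re-running an inner workflow step on its own output until a condition is met; the names while_component and refine are illustrative, not TurboWorx API.

    # Conceptual sketch of a "While" loop component: re-run an inner step on its
    # own output until a condition no longer holds.  Names are illustrative only.
    def while_component(condition, body, state, max_iterations=100):
        iterations = 0
        while condition(state) and iterations < max_iterations:
            state = body(state)          # the inner workflow consumes and produces the state
            iterations += 1
        return state

    # Example: keep "refining" a score until it reaches a threshold.
    refine = lambda score: score * 1.5
    print(while_component(lambda s: s < 100.0, refine, 1.0))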

  22. Special Components: Splitters & Joiners • Components to convert between groups of many data elements and sequences of the individual data elements • Support “Fork-Join” data parallelism • Standard splitters/joiners provided with the TurboWorx system. Examples: • Arrays: convert between an array and its individual elements (in order) • Collections: convert between a java.util.Collection and its elements • Strings/Patterns: split an input stream based on regular expressions • Users may create additional types using Jython or Java
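
As a rough illustration of the splitter/joiner idea (not the actual TurboWorx classes), the sketch below shows a regular-expression splitter that turns one input stream into a sequence of elements, and a joiner that collects the processed elements back into a single result; the FASTA-style pattern and the process function are examples only.

    # Sketch of "fork-join" data parallelism with a splitter and a joiner.
    # The splitter/joiner functions and the FASTA-style pattern are illustrative only.
    import re
    from concurrent.futures import ThreadPoolExecutor

    def split_records(text, pattern=r"(?m)^>"):
        """Split one input stream into individual records on a regular expression."""
        parts = re.split(pattern, text)
        return [">" + p for p in parts if p.strip()]

    def join_records(results):
        """Collect the processed elements back into a single output."""
        return "\n".join(results)

    def process(record):                       # stand-in for the per-element component
        header = record.splitlines()[0]
        return header + " (processed)"

    stream = ">seq1\nMKV...\n>seq2\nGHT...\n>seq3\nLLA...\n"
    records = split_records(stream)
    with ThreadPoolExecutor() as pool:         # each record can be handled independently
        processed = list(pool.map(process, records))
    print(join_records(processed))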

  23. Parallelism in Practice TurboWorx High-Performance Workflow: [Diagram: Access Data → SPLIT → A → B → C → JOIN → Store Data] Splitting enables pipeline parallelism (A, B, C run concurrently on different data)

  24. Parallelism in Practice TurboWorx High-Performance Workflow: [Diagram: the same split/join pipeline with the slow component replicated, so all stages run fast] The scheduler determines the amount of data parallelism dynamically at run time
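
A plain-Python analogue of “the scheduler determines the amount of data parallelism at run time” is a worker pool whose width is chosen from the work and resources actually available when the job runs. This is an analogy, not the TurboWorx scheduler; slow_component and the chunk size are assumptions.

    # Analogy for run-time data parallelism: the pool width is chosen at run time
    # from the number of data chunks and the CPUs actually available.
    import os
    from concurrent.futures import ProcessPoolExecutor

    def slow_component(chunk):                  # stand-in for the bottleneck stage
        return sum(x * x for x in chunk)

    def run_parallel(chunks):
        workers = min(len(chunks), os.cpu_count() or 1)   # decided at run time, not design time
        with ProcessPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(slow_component, chunks))

    if __name__ == "__main__":
        data = list(range(1000))
        chunks = [data[i:i + 100] for i in range(0, len(data), 100)]   # splitter output
        print(run_parallel(chunks))                                    # joiner collects results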

  25. Protein Characterization Example Overall Task: Group protein domains into families
  Key programs: BLASTP, clustalw, hmmbuild, hmmsearch
  [Flowchart: Identify homologous pairs (BLASTP) → Build families around pairs (clustalw, hmmbuild, hmmsearch) → Refine & optimize protein families (clustalw, hmmbuild, hmmsearch) → Find consensus sequences (clustalw) → Compute identity scores vs. leaders (clustalw); part of the pipeline is encapsulated as the “Process Family” subworkflow shown on the next slide]
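
The protein-family pipeline above could be approximated outside TurboWorx by a small driver that chains the named programs; the sketch below only shows the shape of such a chain. Exact command-line arguments for blastp, clustalw, hmmbuild, and hmmsearch are deliberately left as placeholders because they depend on the installed versions and data formats, and the file names are hypothetical.

    # Rough shape of the protein characterization chain; argument lists are placeholders,
    # not the real flags of blastp/clustalw/hmmbuild/hmmsearch.
    import subprocess

    def run(cmd):
        print("step:", cmd[0])
        subprocess.run(cmd, check=True)

    def characterize_family(family_fasta, seq_db):
        run(["clustalw"] + ["<alignment arguments for %s>" % family_fasta])   # align family members
        run(["hmmbuild"] + ["<profile-HMM arguments>"])                       # build an HMM from the alignment
        run(["hmmsearch"] + ["<search arguments against %s>" % seq_db])       # search the database with the HMM

    # Overall task sketch: identify homologous pairs (blastp), then build and refine
    # families around them with the subworkflow above.
    # run(["blastp"] + ["<pairwise search arguments>"])
    # characterize_family("family1.fasta", "proteins.fasta")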

  26. Example: “Process Family” Workflow

  27. Protein Family Example

  28. Take-Home Points • Technical computing workflows are important in various industries • Effective application of workflows requires HPC, including fault-tolerant automation and dynamic parallelism in a grid-like computing environment • TurboWorx workflow products offer one end-to-end solution for developing and deploying high performance technical workflows
