Toward a Distributed and Parallel High Performance Computing Environment

Johan Carlsson and Nanbor Wang, {johan,nanbor}@txcorp.com, Tech-X Corporation, Boulder, CO. CCA Meeting, April 28, 2005. Funded by DOE OASCR SBIR Grant #DE-FG02-04ER84099.

Presentation Transcript


  1. Toward a Distributed and Parallel High Performance Computing Environment • Johan Carlsson and Nanbor Wang, {johan,nanbor}@txcorp.com • Tech-X Corporation, Boulder, CO • CCA Meeting, April 28, 2005 • Funded by DOE OASCR SBIR Grant #DE-FG02-04ER84099

  2. Outline • Phase I project • Phase I review • Phase I work • Phase II project and Future Work • Motivation for a Distributed and Parallel High-Performance Computing (DPHPC) environment • Phase II work plan

  3. Feasibility of Remoting CCA Components • Support distributed computation by efficiently composing remote-capable components into applications • Hide the distributed aspect from the localized CCA framework • Provide low-cost mechanisms for connecting incompatible CCA infrastructures, e.g., Ccaffeine, Dune, Ccain, and SCIRun

  4. Phase I Work • Using (almost) current CCA-tools (0.5.6) • Benchmarking two distributed middleware technologies • CORBA (TAO) • Web/Grid Services (gSOAP) • Using a simple strategy for connection setup – direct connection • Measuring the throughput for invoking a simple cube_double operation
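
The throughput measurement named on this slide can be illustrated with a minimal, local-only sketch. It is not the Phase I benchmark, which ran C++ components over CORBA (TAO) and gSOAP. The body of `cube_double` (returning the cube of its argument) and the helper `measure_throughput` are assumptions made for illustration; the slides give only the operation's name.

```python
import time


def cube_double(x: float) -> float:
    """Assumed stand-in for the benchmarked operation: return x cubed."""
    return x * x * x


def measure_throughput(func, arg, calls=100_000):
    """Invoke func(arg) `calls` times and return the rate in calls per second.

    Hypothetical helper, not part of CCA-tools, TAO, or gSOAP.
    """
    start = time.perf_counter()
    for _ in range(calls):
        func(arg)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very coarse timers.
    return calls / max(elapsed, 1e-12)


if __name__ == "__main__":
    rate = measure_throughput(cube_double, 3.0)
    print(f"local cube_double: {rate:,.0f} calls/s")
```

Replacing `cube_double` with a callable that forwards the request over a network transport would give the remote figure to compare against this local baseline.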

  5. Benchmarking Configuration

  6. Performance Results • Baseline results • Measuring calls between two local components with C++ calls • Cost for remoting operations far exceeds local CCA component calls • Distributed applications with remoting components can perform equally well

  7. Performance Results (II) • Making the application distributed by composing remoting components of different distributed technologies • CORBA provides better throughput even with short messages/frequent interactions • Results from other measurements show that CORBA outperforms gSOAP with large datasets

  8. Native Array Library (NAL) • Phase I proposed to implement NAL to provide efficient array access for FORTRAN programs • Babel has added r-array support since the proposal was accepted • Reviewed and experimented with r-array support; it makes NAL unnecessary • Led to the Phase II task on adding struct support

  9. Motivations for Mixing Distributed Tech. and Parallelism • Provide higher abstractions for HPC infrastructure • Motivating example scenarios: • Provide a different paradigm for partitioning problems – multi-physics simulations • Provide better utilization of high-CPU-count hardware • Combine computing resources of multiple clusters/computing centers • Enable parallel data streaming between computing tasks and post-processing tasks

  10. Phase II Goals • Overall, to provide a DPHPC environment • Specific technical goals: • Hardening of remoting component implementations and tools • Offer other modern language constructs • Examine and review different composition strategies for DPHPC applications • Provide examples of several DPHPC applications

  11. Hardening of Remoting Component Implementations • Proxy-style remoting components • Examine carefully the mapping from SIDL to CORBA IDL • Develop prototype tools for generating remoting component implementations • Integration with Babel RMI APIs • Implementing CORBA-based Babel RMI library • Implementing other efficient IPC mechanisms for certain hardware configurations
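
A proxy-style remoting component can be sketched as follows, assuming a port with a single `cube_double` method. All names here (`CubePort`, `CubeImpl`, `LoopbackTransport`, `CubeProxy`) are hypothetical; the transport is an in-process stand-in that uses JSON to mimic marshaling and does not use CORBA, gSOAP, or Babel RMI.

```python
import json
from abc import ABC, abstractmethod


class CubePort(ABC):
    """Hypothetical port interface seen by the local framework."""

    @abstractmethod
    def cube_double(self, x: float) -> float:
        ...


class CubeImpl(CubePort):
    """The real component; in a deployment it would run in a remote process."""

    def cube_double(self, x: float) -> float:
        return x ** 3


class LoopbackTransport:
    """In-process stand-in for a middleware layer such as CORBA or SOAP."""

    def __init__(self, servant):
        self._servant = servant

    def invoke(self, method, *args):
        # Marshal the request, then unmarshal it on the "server" side.
        request = json.loads(json.dumps({"method": method, "args": args}))
        result = getattr(self._servant, request["method"])(*request["args"])
        # Marshal and unmarshal the reply on the way back.
        return json.loads(json.dumps(result))


class CubeProxy(CubePort):
    """Implements the same port and forwards every call to the transport,
    so the caller cannot tell that the provider is remote."""

    def __init__(self, transport):
        self._transport = transport

    def cube_double(self, x: float) -> float:
        return self._transport.invoke("cube_double", x)


if __name__ == "__main__":
    proxy = CubeProxy(LoopbackTransport(CubeImpl()))
    print(proxy.cube_double(2.0))
```

Because the proxy satisfies the same interface as the implementation, the local framework can connect to it like any other component, which is the "hide the distributed aspect" goal stated earlier in the talk.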

  12. Support Modern Language Constructs for DPHPC • Discussion with LLNL researchers: struct is one of the most requested features • Work on FSP proposal also calls for “struct” support • Need to support both local and distributed cases • Will collaborate with LLNL researchers • BNF and AST extension • Code generation for C/C++, FORTRAN • Relationship with FORTRAN BIND(C)

  13. Deployment of DPHPC Applications • Local-CCA component centric view: • Local applications • Employ a distributed “builder service” for registering/requesting distributed ports • Distributed component centric view: • Two-tier deployment – remote components and their implementations • Grid view: • Exposing distributed components as grid services
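
The register/request role of a distributed "builder service" might look roughly like the sketch below. The class `DistributedBuilderService`, its methods, and the endpoint strings are hypothetical and are not the CCA BuilderService API; a real service would itself be reachable over the network rather than held in one process.

```python
class DistributedBuilderService:
    """Hypothetical registry mapping port names to provider endpoints."""

    def __init__(self):
        self._ports = {}

    def register_port(self, name, endpoint):
        """Record that `endpoint` provides the port called `name`."""
        if name in self._ports:
            raise ValueError(f"port {name!r} is already registered")
        self._ports[name] = endpoint

    def request_port(self, name):
        """Return the endpoint providing `name`, or raise LookupError."""
        try:
            return self._ports[name]
        except KeyError:
            raise LookupError(f"no provider registered for port {name!r}") from None


if __name__ == "__main__":
    service = DistributedBuilderService()
    # Illustrative endpoint only; host name and port number are made up.
    service.register_port("CubePort", "clusterA.example.org:2809")
    print(service.request_port("CubePort"))
```

A local component would request a port by name, receive an endpoint, and hand that endpoint to a remoting proxy, keeping the local application unaware of where the provider runs.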

  14. Example DPHPC Applications • Running HPC applications using multiple clusters • Utilize ORNL LDRD fusion CCA components • Running HPC applications using large-CPU-count hardware • Collaborate with the FSP team • Connecting HPC applications with online data analysis in real time • Utilize parallel data streaming

  15. Concluding Remarks • Current status • Benchmarking two major transport mechanisms, CORBA and SOAP • Two major types of interaction models • Frequent short control messages • Periodic large datasets • Our goals are: • Provide an environment for running DPHPC applications • Document usage patterns of developing DPHPC applications • Future work: • Hardening of remoting component implementations and tools • Support other modern language constructs • Examine and review different composition strategies for DPHPC applications • Provide examples of several DPHPC applications
