
Programming High Performance Applications using Components



1. Programming High Performance Applications using Components
Thierry Priol, PARIS research group, IRISA/INRIA, http://www.irisa.fr/paris
A joint work with A. Denis, C. Perez and A. Ribes. Supported by the French ACI GRID initiative.
Outline
• High-performance applications and code coupling
• The CORBA Component Model
• CCM in the context of HPC
• GridCCM: encapsulation of SPMD parallel codes into components
• Runtime support for GridCCM components
• Conclusions & future work
• Some references

2. High-performance applications: not a single parallel application anymore, but several of them coupled together
• High-performance applications are more and more complex, thanks to the increasing performance of off-the-shelf hardware
• Several codes are coupled together to simulate complex phenomena
  • Fluid-fluid, fluid-structure, structure-thermal, fluid-acoustic-vibration
  • Even more complex if you consider a parameterized study for optimal design
• Some examples
  • e-Science: weather forecast (sea-ice-ocean-atmosphere-biosphere)
  • Industry
    • Aircraft: CFD-structural mechanics, electromagnetism
    • Satellites: optics-thermal-dynamics-structural mechanics
(Diagrams: electromagnetic coupling between two objects, i.e. single-physics/multiple-object; a satellite coupling optics, thermal, dynamics and structural mechanics, i.e. multiple-physics/single-object; ocean-atmosphere coupling)

3. The current practice…
• Coupling is achieved through the use of specific code-coupling tools
  • Not just a matter of communication: interpolation, time management, …
  • Examples: MpCCI, OASIS, PAWS, CALCIUM, ISAS, …
• Limitations of existing code-coupling tools
  • Originally targeted at parallel machines, with some ongoing work to target Grid infrastructures
  • Static coupling (at compile time): not “plug and play”
  • Ad-hoc communication layers (MPI, sockets, shared memory segments, …)
  • Lack of explicit coupling interfaces
  • Lack of standardization
  • Lack of interoperability
(Diagrams: MPI codes #1…#n connected through a code coupler (MpCCI, OASIS, …) on top of an MPI implementation and a SAN; the MpCCI coupling library linking code A and code B within a cluster)

4. Another approach for code coupling
• Component definition by C. Szyperski: “A component is a unit of independent deployment”, “well separated from its environment and from other components”
• Component programming is well suited for code coupling
  • Codes are encapsulated into components
  • Components have public interfaces
  • Components can be coupled together or through a framework (code coupler)
  • Components are reusable (with other frameworks)
  • Application design is simpler through composition (but component models are often complex to master!)
• Some examples of component models
  • HPC component models: CCA, ICENI
  • Standard component models: EJB, DCOM/.NET, OMG CCM
(Diagram: codes A, B, C and D exposing In/Out code interfaces, connected either directly or through a code coupler in a component-based framework)

5. Distributed components: the OMG CCM
• A distributed component-oriented model
  • An architecture for defining components and their interactions
  • Interaction implemented through input/output interfaces
  • Synchronous and asynchronous communication
• A packaging technology for deploying binary, multi-lingual executables
• A runtime environment based on the notion of containers (lifecycle, security, transactions, persistence, events)
• Multi-language, multi-OS, multi-ORB, multi-vendor, …
• Includes a deployment model
  • Can be deployed and run on several distributed nodes simultaneously
(Diagram: a component interface exposing offered ports (facets, event sinks) and required ports (receptacles, event sources), plus attributes, within a component-based application)

6. From CORBA 2.x to …
• Open standard for distributed object computing, defined by the OMG
  • Software bus, object oriented
  • Remote invocation mechanism
• Hardware, operating system and programming language independence
• Vendor independence (interoperability)
• Example IDL file (a hedged client-side sketch follows below):
interface MatrixOperations {
  const long SIZE = 100;
  typedef double Vector[ SIZE ];
  typedef double Matrix[ SIZE ][ SIZE ];
  void multiply( in Matrix A, in Vector B, out Vector C );
};
(Diagram: the IDL compiler generates the client stub and the server skeleton; invocations go from the client stub through the Object Request Broker (ORB) and the POA to the object implementation, using GIOP/IIOP/ESIOP)
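For illustration only, here is a minimal C++ client for the MatrixOperations interface above. It is a hedged sketch: the generated header name, the corbaloc URI and the absence of error handling are assumptions, and the exact mapping of fixed-size IDL arrays depends on the ORB's generated code.

// Hypothetical client for the MatrixOperations IDL shown above.
// Assumes an IDL-compiler-generated header; the object URI is made up.
#include <iostream>
#include "MatrixOperations.hh"

int main(int argc, char** argv) {
  CORBA::ORB_var orb = CORBA::ORB_init(argc, argv);            // bootstrap the ORB
  CORBA::Object_var obj =
      orb->string_to_object("corbaloc::localhost:2809/MatrixOps");
  MatrixOperations_var ops = MatrixOperations::_narrow(obj);   // typed reference

  MatrixOperations::Matrix A;                                  // fixed-size IDL arrays map to C++ arrays
  MatrixOperations::Vector B, C;
  for (int i = 0; i < MatrixOperations::SIZE; ++i) {
    B[i] = 1.0;
    for (int j = 0; j < MatrixOperations::SIZE; ++j)
      A[i][j] = (i == j) ? 2.0 : 0.0;                          // identity-like test matrix
  }

  ops->multiply(A, B, C);                                      // remote invocation through the ORB
  std::cout << "C[0] = " << C[0] << std::endl;
  orb->destroy();
  return 0;
}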

7. … to CORBA 3.0 and the CCM
• Component interface specified in OMG IDL 3.0 (a hedged client-side sketch follows below):
interface Mvop {
  void mvmult(some parameters);
};
interface Stat {
  void sendt(some parameters);
};
component MatrixOp {
  attribute long size;
  provides Mvop the_mvop;
  uses Stat the_stat;
  publishes StatInfo avail;
  consumes MustIgo go;
};
home MvopHome manages MatrixOp {};
(Diagram: the MatrixOp component interface and its MvopHome; offered ports (facet, event sink) vs. required ports (receptacle, event source), plus the attribute)
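To make the port model concrete, a hedged sketch of how a client could use this component with the CCM C++ mapping, where the equivalent interface exposes generated operations such as provide_the_mvop() for the facet and connect_the_stat() for the receptacle. The header name, the home URI and the Stat object passed in are assumptions for illustration.

// Hypothetical client-side use of the MatrixOp component.
#include "MatrixOp.hh"   // assumed name of the generated header

void use_matrix_op(CORBA::ORB_ptr orb, Stat_ptr my_stat) {
  // Find the home, create a component instance, then wire and use its ports.
  CORBA::Object_var obj =
      orb->string_to_object("corbaloc::localhost:2809/MvopHome");
  MvopHome_var home = MvopHome::_narrow(obj);
  MatrixOp_var comp = home->create();          // implicit factory operation of the home

  comp->size(100);                             // set the 'size' attribute
  comp->connect_the_stat(my_stat);             // plug a Stat object into the receptacle
  comp->configuration_complete();              // standard CCM life-cycle call

  Mvop_var mvop = comp->provide_the_mvop();    // obtain the facet
  // mvop->mvmult(...);                        // invoke the provided interface
}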

8. CCM runtime environment & deployment
• CCM includes a runtime environment called the container
  • A container is a component's only contact with the outside
• CCM defines a packaging and deployment model
  • Components are packaged into a self-descriptive package (zip): one or more implementations (multi-binary) + the IDL of the component + XML descriptors
  • Packages can be assembled (zip): a set of components + an XML descriptor
  • Assemblies can be deployed through a deployment tool yet to be defined… such as a Grid middleware:
    [my_favorite_grid_middleware] run myapp.zip
(Diagram: the component implementation inside a container, connected to the ORB and to the persistency, notification, transaction and security services through internal interfaces and a callback API)

9. CCM in the context of HPC
• Encapsulation of parallel codes into software components
  • Parallelism should be hidden from the application designer as much as possible when assembling components
  • Issues:
    • How much does my parallel code have to be modified to fit the CCM model?
    • What has to be exposed outside the component?
• Communication between components
  • Components should use the available networking technologies to communicate efficiently
  • Issues:
    • How to combine multiple communication middleware/runtimes and make them run seamlessly and efficiently?
    • How to manage different networking technologies transparently?
    • Can my two components communicate using Myrinet for one run and Ethernet for another run without any modification or recompilation?

10. Encapsulating SPMD codes into CORBA components
• The obvious way: adopt a master/slave approach (a hedged sketch follows below)
  • One SPMD process acts as the master whereas the others act as slaves
  • The master drives the execution of the slaves through message passing
  • Communication between the two SPMD codes goes through the master processes
• Advantage
  • Could be used with any* CCM implementation (* in theory… but practically…)
• Drawbacks
  • Lack of scalability when communicating through the ORB
  • Needs modifications to the original MPI code
(Diagram: components A and B, each made of an MPI master process driving MPI slave SPMD processes over 100 Mb/s Ethernet switches, the two masters communicating through the ORB over a 40 Gbit/s WAN)
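A hedged sketch of the master/slave pattern described above: only the master (rank 0) is visible to the ORB; it forwards incoming data to the slaves and collects their results with MPI collectives. The data layout, the toy computation and the function names are assumptions, not the actual GridCCM code.

// Illustrative master/slave forwarding inside one SPMD component (C++/MPI).
#include <mpi.h>
#include <vector>

// Called on every rank; on rank 0, 'input' holds data received via the ORB.
std::vector<double> component_operation(std::vector<double> input) {
  int rank, size;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  int n = (rank == 0) ? static_cast<int>(input.size()) : 0;
  MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);                // master broadcasts the size...
  input.resize(n);
  MPI_Bcast(input.data(), n, MPI_DOUBLE, 0, MPI_COMM_WORLD);   // ...and the data to the slaves

  int chunk = n / size;                                        // assume n divisible by size
  std::vector<double> local(chunk);
  for (int i = 0; i < chunk; ++i)
    local[i] = 2.0 * input[rank * chunk + i];                  // each rank works on its share

  std::vector<double> result(rank == 0 ? n : 0);
  MPI_Gather(local.data(), chunk, MPI_DOUBLE,
             result.data(), chunk, MPI_DOUBLE, 0, MPI_COMM_WORLD); // master gathers the results
  return result;   // meaningful only on rank 0, which replies through the ORB
}

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  std::vector<double> in;
  if (rank == 0) in.assign(8, 1.0);        // pretend this arrived through the ORB
  std::vector<double> out = component_operation(in);
  MPI_Finalize();
  return 0;
}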

11. Making CORBA components parallel-aware
• Just to remind you: communication between components is implemented through a remote method invocation (to a CORBA object)
• Constraints
  • A parallel component with one SPMD process = a standard component
  • Communication between components should use the ORB to ensure interoperability
  • Little, but preferably no, modification to the CCM specification
  • Scalable communication between components, by having several data flows between components
(Diagram: what the application designer should see, two connected components A and B, versus how it must be implemented, two collections of SPMD processes with parallel data flows between them)

12. Parallel remote method invocation
• Based on the concept of “multi-function” introduced by J-P. Banâtre et al. (1986)
  • A set of processes collectively calls another set of processes through a multi-function (multi-RPC)
• Parallel remote method invocation
  • Notion of collections of objects (parallel objects)
  • Multi-RMI
• Parallel component
  • A collection of identical components
  • A collection of parallel interfaces (provide/use)
(Diagram: a set of caller processes, each executing e.g. “X=…; Y=…; call f(X, Y, Z); A = Z*2”, collectively invoking another set of processes that each execute “F(A, B, C): … C = A + 3*B; … return”, i.e. parallel component A calling parallel component B)

13. Parallel interface
• A parallel interface is mapped onto a collection of identical CORBA objects
• Invocation on a collection of objects is transparent to the client (either sequential or parallel)
• Collection specification through IDL extensions
  • Size and topology of the collection
  • Data distribution
  • Reduction operations
• Modification of the IDL compiler (Extended-IDL compiler)
  • Stub/skeleton code generation to handle parallel objects
  • New data type for distributed array parameters
    • Extension of the CORBA sequence
    • Data distribution information
• Impact on the standard:
  • Does not require modification of the ORB
  • Requires extensions to the IDL and to the naming service
  • Is not portable across CORBA implementations
• Example:
interface[*:2*n] MatrixOperations {
  typedef double Vector[SIZE];
  typedef double Matrix[SIZE][SIZE];
  void multiply(in dist[BLOCK][*] Matrix A, in Vector B, out dist[BLOCK] Vector C);
  void skal(in dist[BLOCK] Vector C, out csum double skal);
};
(Diagram: a sequential client invoking, through the ORB, a parallel CORBA object made of SPMD processes, each with its own POA, skeleton and MPI communication layer)

14. Parallel invocation and data distribution
• Data management is carried out by the stub and the skeleton
• What the stub has to do (a hedged sketch follows below):
  • Generate the requests, according to the number of objects belonging to the collection and to the data distribution
  • Scatter the data of the input parameters (in)
  • Gather the data of the output parameters (out)
• Example:
interface[*] Operations {
  typedef double Matrix[100][100];
  void foo(inout dist[BLOCK][*] Matrix A);
};
(Diagram: a sequential client calls Pco->foo(A); the stub scatters A into one request per object of the collection (Request0…Request3), each skeleton delivers its block to an SPMD process through its POA, and the stub gathers the blocks back on return)
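To fix ideas, an ORB-free, hedged sketch of what such a stub does for a BLOCK-distributed first dimension: compute each object's row range, build one request payload per object, then reassemble the output. The helper names and the flat row-major representation are assumptions; the real stub manipulates CORBA requests and sequences.

// Illustrative stub-side scatter/gather for "dist[BLOCK][*] Matrix A" (C++).
#include <algorithm>
#include <vector>

struct Block { int first_row; int num_rows; };

// Rows owned by object 'rank' out of 'nobjs', for a matrix with 'nrows' rows.
Block block_of(int rank, int nobjs, int nrows) {
  int base = nrows / nobjs, rest = nrows % nobjs;
  Block b;
  b.first_row = rank * base + std::min(rank, rest);
  b.num_rows  = base + (rank < rest ? 1 : 0);
  return b;
}

// What the stub conceptually does for an 'in' parameter: scatter the rows.
std::vector<std::vector<double>>
scatter_rows(const std::vector<double>& A, int nrows, int ncols, int nobjs) {
  std::vector<std::vector<double>> requests(nobjs);
  for (int r = 0; r < nobjs; ++r) {
    Block b = block_of(r, nobjs, nrows);
    requests[r].assign(A.begin() + b.first_row * ncols,
                       A.begin() + (b.first_row + b.num_rows) * ncols);
  }
  return requests;   // one request payload per object of the collection
}

// ...and for an 'out' parameter: gather the returned blocks in order.
std::vector<double> gather_rows(const std::vector<std::vector<double>>& replies) {
  std::vector<double> A;
  for (const auto& block : replies)
    A.insert(A.end(), block.begin(), block.end());
  return A;
}

int main() {
  std::vector<double> A(6 * 4, 1.0);            // a 6x4 matrix, flattened row-major
  auto requests = scatter_rows(A, 6, 4, 3);     // split across 3 objects
  std::vector<double> back = gather_rows(requests);
  return back.size() == A.size() ? 0 : 1;
}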

15. Parallel invocation and data redistribution
• Different scenarios
  • M x 1: the server is sequential
  • M x N: the server is parallel
• The stubs synchronize among themselves to elect those that call the server objects
  • Synchronization through MPI
• Data redistribution is performed by the stubs before calling the parallel CORBA object
  • Data redistribution library using MPI (a hedged sketch of the required bookkeeping follows below)
• No serialization at the client side, but a parallel connection to the server
(Diagram: the parallel client first redistributes parameter A from an A[BLOCK][*] layout to an A[*][BLOCK] layout through its MPI communication layer (step 1: redistribution), then each stub invokes its server object through the ORB (step 2: parallel invocation))
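As a hedged illustration of the bookkeeping such a redistribution library performs, the sketch below computes, for a 1D BLOCK distribution, which index range a source rank (out of M client processes) must send to each destination rank (out of N server objects). The actual library does this with MPI messages and for multidimensional layouts; all names here are illustrative.

// Block-to-block redistribution plan for a 1D array of length 'len' (C++).
#include <algorithm>
#include <cstdio>

struct Range { int begin; int end; };   // half-open interval [begin, end)

Range block_range(int rank, int nranks, int len) {
  int base = len / nranks, rest = len % nranks;
  int begin = rank * base + std::min(rank, rest);
  return { begin, begin + base + (rank < rest ? 1 : 0) };
}

// Print, for one source rank, the slice it must send to every destination rank.
void redistribution_plan(int src_rank, int M, int N, int len) {
  Range mine = block_range(src_rank, M, len);
  for (int dst = 0; dst < N; ++dst) {
    Range theirs = block_range(dst, N, len);
    int lo = std::max(mine.begin, theirs.begin);
    int hi = std::min(mine.end, theirs.end);
    if (lo < hi)   // overlapping indices go from src_rank to dst
      std::printf("src %d -> dst %d : elements [%d, %d)\n", src_rank, dst, lo, hi);
  }
}

int main() {
  // Example: redistribute 10 elements from 3 client processes to 2 server objects.
  for (int src = 0; src < 3; ++src)
    redistribution_plan(src, 3, 2, 10);
  return 0;
}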

16. Lessons learnt…
• Implementation dependent
  • The parallel CORBA object is only available for MICO
• Modification of the CORBA standard (Extended-IDL)
  • The OMG does not like such an approach
  • Neither do the end-users…
• Data distribution specified in the IDL
  • Difficult to add new data distribution schemes
• Other approaches
  • PARDIS (Indiana University, Prof. D. Gannon): modification of the IDL language (distributed sequences, futures)
  • Data Parallel CORBA (OMG): requires modifications to the ORB to support data-parallel objects; the data distribution specification is included in the client/server code and given to the ORB

17. Portable parallel CORBA objects
• No modification to the OMG standard
  • Standard IDL, standard ORB, standard naming service
• Parallelism specification through XML
  • Operation: sequential or parallel
  • Argument: replicated or distributed
  • Return argument: reduction or not
• Automatic generation of a new CORBA object (parallel stub and skeleton)
  • Renaming
  • Addition of new IDL interfaces
• Example, user IDL:
#include “Matrix.idl”
interface IExample {
  void send_data(Matrix m);
};
and its distribution description (XML):
Interface IExample
Method_name: send_data
Method_type: parallel
Return_type: none
Number_argument: 1
Type_Argument1: Distributed…
(Diagram: the user IDL and the XML distribution description go through the parallel IDL compiler, which generates the parallel CORBA stub & skeleton plus a parallel IDL; the latter goes through the standard IDL compiler to produce the standard stub and skeleton)

18. Portable parallel CORBA objects (continued)
• Behave like standard CORBA objects:
  I_var o = I::_narrow(...);
  o->fun();
• A parallel CORBA object is:
  • A collection of identical CORBA objects
  • A Manager object
• Data distribution managed by the parallel CORBA stub and skeleton
  • At the client or the server side
  • Through the ORB
• Interoperability with various CORBA implementations
(Diagram: on each side, the parallel user code (MPI) sits behind an Interface1 implementation, a parallel CORBA stub/skeleton and a ManagerInterface1 object; the standard CORBA stubs and skeletons communicate through the CORBA ORB, while each MPI world handles intra-component communication)

19. Back to CCM: GridCCM
• IDL:
interface Interfaces1 {
  void example(in Matrix mat);
};
…
component A {
  provides Interfaces1 to_client;
  uses Interfaces2 to_server;
};
• XML:
Component: A
Port: to_client
Name: Interfaces1.example
Type: Parallel
Argument1: *, bloc
…
(Diagram: a parallel component of type A, a collection of SPMD processes exposing the to_client and to_server ports)

20. Runtime support for a grid-aware component model
• Main goals for such a runtime
  • It should support several communication runtimes/middleware at the same time
    • A parallel runtime (MPI) and a distributed middleware (CORBA), as for GridCCM
  • The underlying networking technologies are not exposed to the applications
    • It should be independent from the networking interfaces
• Let's take a simple example
  • MPI and CORBA using the same protocol/network (TCP/IP, Ethernet)
  • MPI within a GridCCM component, CORBA between GridCCM components
  • The two libraries are linked together with the component code: does it work?
• Extracted from a mailing list:
  From: ------- -------- <-.-----@n...>
  Date: Mon May 27, 2002, 10:04 pm
  Subject: [mico-devel] mico with mpich
  “I am trying to run a Mico program in parallel using mpich. When calling CORBA::ORB_init(argc, argv) it seems to coredump. Does anyone have experience in running mico and mpich at the same time?”

21. PadicoTM: an open integration framework
• Provides a framework to combine various communication middleware and runtimes
  • For parallel programming:
    • Message-based runtimes (MPI, PVM, …)
    • DSM-based runtimes (TreadMarks, …)
  • For distributed programming:
    • RPC/RMI-based middleware (DCE, CORBA, Java)
    • Middleware for discrete-event based simulation (HLA)
• Handles a large number of networking interfaces
  • Avoid the NxM syndrome!
• Transparency with respect to the applications
• Get the maximum performance from the network!
  • Offer zero-copy mechanisms to middleware/runtimes
• What are the difficulties?
  • Provide a generic framework for both parallel-oriented runtimes (SPMD-based, fixed topology) and distributed-oriented middleware (MPMD-based, dynamic topology)
  • Multiplexing of the networking interfaces
    • A single runtime/middleware may use several networking interfaces at the same time
    • Multiple runtimes/middleware may simultaneously use a single networking interface
(Diagram: MPI codes, CORBA and HLA frameworks, Grid services and visualisation running on top of the PadicoTM framework, itself spanning WANs, SANs, supercomputers and homogeneous clusters)

22. PadicoTM architecture overview
• Provides a set of personalities to ease the porting of existing middleware
  • BSD sockets, AIO, Fast Messages, Madeleine, …
• The internal engine controls access to the network and the scheduling of threads
  • Arbitration, multiplexing, selection, …
• Low-level communication layer based on Madeleine
  • Available on a large number of networking technologies
  • Associated with Marcel (multithreading)
(Diagram: middleware such as OmniORB, MICO, ORBacus, Orbix/E, Mome, MPICH, Kaffe and CERTI on top of the PadicoTM services (CORBA, MPI, DSM, HLA, JVM); below them, the personality layer and the PadicoTM core internal engine, built on Madeleine for portability across networks (TCP, Myrinet, SCI) and on Marcel for I/O-aware multithreading)

23. PadicoTM implementation
• Implemented as a single SPMD process on each participating node
  • To gain access to networks that cannot be shared
  • Launched simultaneously on each participating node
• Middleware and user applications are loaded dynamically at runtime
  • They appear as modules
  • Modules can be binary shared objects (“.so” libraries on Unix systems) or Java classes
• Example of an XML module descriptor:
<defmod name="CORBA-omniORB-4.0" driver="binary">
  <attr label="NameService">
    corbaloc::paraski.irisa.fr:8088/NameService
  </attr>
  <requires>SysW</requires>
  <requires>VIO</requires>
  <requires>LibStdC++</requires>
  <unit>CORBA-omniORB-4.0.so</unit>
  <file>lib/libomniorb-4.A.so</file>
  <file>lib/libomnithread4.A.so</file>
</defmod>
(Diagram: user code and applications on top of the PadicoTM services (CORBA, MPI, DSM, HLA, JVM, SOAP); the PadicoTM core provides the TaskManager (module management: load/start/stop/unload, centralized thread policy), NetAccess (network multiplexing, polling/callback management), VSock (high-performance sockets) and Circuit (network topology management), built on Madeleine and Marcel)

24. Performance
• Experimental protocol
  • Runtimes and middleware: MPICH, CORBA (OmniORB 3), Kaffe JVM
  • Hardware: dual Pentium III 1 GHz nodes with Myrinet-2000 and Dolphin SCI
• Performance results
  • Myrinet-2000
    • MPI: 240 MB/s, 11 µs latency
    • CORBA: 240 MB/s, 20 µs latency
    • 96% of the maximum achievable bandwidth of Madeleine
  • SCI
    • MPI: 75 MB/s, 23 µs latency
    • CORBA: 89 MB/s, 55 µs latency
(Figure: bandwidth in MB/s versus message size in bytes for CORBA, MPI and Java over Myrinet-2000, CORBA and MPI over SCI, and TCP over Ethernet-100)

25. Preliminary GridCCM performance
• Composition scalability
  • GridCCM code hand-written
  • 1D block distribution
• Experimental protocol
  • Platform: 16 Pentium III 1 GHz nodes, Linux 2.2, Fast Ethernet + Myrinet-2000 networks
  • GridCCM implementations
    • Java: OpenCCM
    • C++: MicoCCM
    • C++/Myrinet: based on MicoCCM/PadicoTM
(Figure: aggregated bandwidth in MB/s for the component configurations 1->1, 2->2, 4->4 and 8->8, comparing Java, C++/Ethernet and C++/Myrinet)

26. Conclusion & future work
• Conclusions
  • Code coupling tools can benefit from component models
    • They provide a standard way to let HPC codes be reused and shared in various contexts
  • An existing component model such as the CCM can be extended to comply with HPC requirements without modifying the standard
    • We do not need to reinvent the wheel…
    • CORBA is a mature technology with several open-source implementations
  • Be careful when saying that CORBA is not suitable for HPC
    • It is a matter of implementation! (zero-copy ORBs exist, such as OmniORB)
    • It is a matter of having a suitable runtime to support HPC networks (PadicoTM)
  • Yes, CORBA is a bit complex, but do not expect a simple solution to programming distributed systems…
• Future work
  • Deployment of components using Grid middleware & dynamic composition

27. Some references
• The Padico project: http://www.irisa.fr/paris/Padico
• One of the best tutorials on CCM: http://www.omg.org/cgi-bin/doc?ccm/2002-04-01
• A web site with a lot of useful information about CCM: http://ditec.um.es/~dsevilla/ccm/
