Ccain: Essential CCA.

Ccain: Essential CCA. (or, a framework for picky users). Chris Rickett, Craig Rasmussen, Matthew Sottile April 2005 CCA Meeting Lincoln City, Oregon. This paper will also be presented at Parallel CFD ‘05 (these slides will not). Outline of talk. CCA design pattern Ccain design goals



Presentation Transcript


  1. Ccain: Essential CCA. (or, a framework for picky users)
     Chris Rickett, Craig Rasmussen, Matthew Sottile
     April 2005 CCA Meeting, Lincoln City, Oregon
     This paper will also be presented at Parallel CFD ’05 (these slides will not).

  2. Outline of talk
     • CCA design pattern
     • Ccain design goals
       • CCAIN = CCA INtegration framework
     • A case study: n-body simulations
     • Evaluation of Ccain
     • Discussion

  3. The Essence of CCA
     • What CCA is:
       • A design pattern.
       • A reasonable implementation of the pattern requires:
         1. Factory to create components
            • createInstance("ParticlePusher")
         2. Name service to find components
            • Port = getPort("pusher")
         • Well defined language bindings
     • What CCA is not:
       • Implementation details
         • dlopen: CCA doesn't mandate shared objects
       • Language interoperability tools
     • The frameworks are just realizations of the pattern.
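
     The two requirements on this slide, a factory and a name service, can be sketched in a few lines of plain C. This is an illustration only and not Ccain's API: create_instance, add_provides_port, get_port, the Component and Services types, and the fixed-size tables are hypothetical names chosen to echo createInstance and getPort above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical types and names; nothing here is taken from the Ccain sources. */
typedef struct Component { const char *type_name; } Component;
typedef Component *(*CreateFn)(void);

static Component *new_component(const char *type_name) {
    Component *c = malloc(sizeof *c);
    if (c) c->type_name = type_name;
    return c;
}
static Component *create_particle_pusher(void) { return new_component("ParticlePusher"); }
static Component *create_driver(void) { return new_component("Driver"); }

/* 1. Factory: a palette fixed at compile time, searched by type name. */
static const struct { const char *type_name; CreateFn create; } palette[] = {
    { "ParticlePusher", create_particle_pusher },
    { "Driver",         create_driver },
};

Component *create_instance(const char *type_name) {
    assert(type_name != NULL);
    for (size_t i = 0; i < sizeof palette / sizeof palette[0]; ++i)
        if (strcmp(palette[i].type_name, type_name) == 0)
            return palette[i].create();
    return NULL; /* unknown component type */
}

/* 2. Name service: ports are registered and looked up by string name. */
#define MAX_PORTS 16
typedef struct { const char *name; void *port; } PortEntry;
typedef struct { PortEntry entries[MAX_PORTS]; size_t count; } Services;

int add_provides_port(Services *s, const char *name, void *port) {
    if (s->count == MAX_PORTS) return -1; /* table full */
    s->entries[s->count].name = name;
    s->entries[s->count].port = port;
    s->count++;
    return 0;
}

void *get_port(const Services *s, const char *name) {
    for (size_t i = 0; i < s->count; ++i)
        if (strcmp(s->entries[i].name, name) == 0)
            return s->entries[i].port;
    return NULL; /* no such port */
}
```

     A caller would create a component with create_instance("ParticlePusher"), register one of its ports under a name such as "pusher", and later retrieve it with get_port. Neither step needs dlopen or shared objects.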

  4. Ccain design goals
     • Ccain was designed to respond to what our customers want.
     • Simple
       • Implementation difficulty should scale with component complexity (at worst).
     • High performance
       • Agnostic of parallel implementation.
         • Shared and distributed memory models.
       • Efficient implementation.
     • Portable
       • No platform specific code.
       • Minimal tool chain.
         • Framework requires a C compiler.
     • Non-intrusive
       • No interference with user programming style.
       • Data structure freedom.

  5. Ccain implementation
     • Pure C core.
       • Services, BuilderServices implementation.
       • 1700 lines of code, 1 week to develop a working, debugged implementation.
       • Thread safe implementation (pthreads now).
     • Fortran 90/95/2003, C++, and C bindings.
       • C++ via extern "C" interfaces, and so is Fortran.
       • Python, ZPL, … bindings trivial through C interface.
     • Static component palette.
       • …but component composition is dynamic.
         • After all, they're just pointers…
       • Removes many headaches.
         • Portability, framework complexity, no $!@*ing LD_LIBRARY_PATH, etc…
         • Failures occur at link time with static, not 3 weeks into a run.
       • We are not convinced people want a dynamic palette.
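
     The remark that composition stays dynamic because ports are "just pointers" can be illustrated as follows. This is a sketch under assumptions, not Ccain code: the PusherPort layout, the Driver struct, connect_pusher, and both push routines are hypothetical. Both implementations are linked statically, yet the driver can be rewired between them at run time by assigning a pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical port: a self handle plus one function pointer. */
typedef struct PusherPort {
    void *self;
    double (*push)(void *self, double x, double v, double dt);
} PusherPort;

/* Two statically linked implementations of the same port. */
static double euler_push(void *self, double x, double v, double dt) {
    (void)self;
    return x + v * dt;            /* move the particle */
}
static double frozen_push(void *self, double x, double v, double dt) {
    (void)self; (void)v; (void)dt;
    return x;                     /* leave the particle where it is */
}

PusherPort euler_port  = { NULL, euler_push };
PusherPort frozen_port = { NULL, frozen_push };

/* The driver's "uses" port is only a pointer slot. */
typedef struct Driver { PusherPort *pusher; } Driver;

/* Composition at run time: connecting is a pointer assignment. */
void connect_pusher(Driver *d, PusherPort *p) {
    assert(d != NULL);
    d->pusher = p;
}

double driver_step(const Driver *d, double x, double v, double dt) {
    assert(d->pusher != NULL);    /* port must be connected first */
    return d->pusher->push(d->pusher->self, x, v, dt);
}
```

     A missing implementation shows up as an unresolved symbol when the program is linked, which is the failure mode the slide prefers over a late dlopen error.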

  6. A case study: n-body simulations
     • The component design:
       • Driver provides a go port, uses an accelerator and pusher port.
       • N-body component provides accelerator and pusher ports.
     • Two source files.
       • Driver.c: main loop that calls accelerate and push over and over.
       • Nbody_naïve.f03: a basic O(n^2) implementation of accelerator and a simple pusher. No fancy algorithms in here.
       • Good test though: O(n^2) really hammers particle data structures.
     [Figure: component diagram with ports labeled "push" and "accelerate"]
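
     The structure of the case study, a driver loop that alternates accelerate and push over an O(n^2) accelerator, can be sketched in C. The slides do not give the accelerator's signature or its force law, so both are assumptions here: accelerate, push, and run_driver are hypothetical names, and the pairwise force is a simple spring-like attraction (linear in the separation) picked so the sketch needs no math library. It is not the authors' Fortran implementation.

```c
#include <assert.h>
#include <stddef.h>

typedef struct particle { double x, y, z, vx, vy, vz, m; } particle;

/* Naive O(n^2) accelerator: each particle sums a contribution from every
   other particle. Assumed toy force: attraction proportional to the
   neighbour's mass times the separation vector. */
void accelerate(particle *p, int n, double dt) {
    assert(p != NULL && n >= 0);
    for (int i = 0; i < n; ++i) {
        double ax = 0.0, ay = 0.0, az = 0.0;
        for (int j = 0; j < n; ++j) {
            if (j == i) continue;
            ax += p[j].m * (p[j].x - p[i].x);
            ay += p[j].m * (p[j].y - p[i].y);
            az += p[j].m * (p[j].z - p[i].z);
        }
        /* Only velocities change here, so positions read by later
           iterations of the outer loop are still the old ones. */
        p[i].vx += ax * dt;
        p[i].vy += ay * dt;
        p[i].vz += az * dt;
    }
}

/* Simple pusher: advance positions by one time step. */
void push(particle *p, int n, double dt) {
    for (int i = 0; i < n; ++i) {
        p[i].x += p[i].vx * dt;
        p[i].y += p[i].vy * dt;
        p[i].z += p[i].vz * dt;
    }
}

/* Driver main loop: accelerate, then push, over and over. */
void run_driver(particle *p, int n, int iterations, double dt) {
    for (int k = 0; k < iterations; ++k) {
        accelerate(p, n, dt);
        push(p, n, dt);
    }
}
```

     In the component version the two direct calls inside run_driver would go through the accelerator and pusher ports instead.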

  7. Ccain F03 Pusher Port Implementation

       subroutine NBody_push(c_self, p, n, dt) bind(C)
         use, intrinsic :: iso_c_binding
         type(C_PTR), value, intent(in) :: c_self
         integer(C_INT), value, intent(in) :: n
         real(C_DOUBLE), value, intent(in) :: dt   ! VALUE dummies cannot be intent(inout)
         type(particle) :: p(n)
         integer :: i
         ! Advance each particle's position by one time step.
         do i = 1, n
           p(i)%x = p(i)%x + p(i)%vx * dt
           p(i)%y = p(i)%y + p(i)%vy * dt
           p(i)%z = p(i)%z + p(i)%vz * dt
         end do
       end subroutine NBody_push

     • No modification to data structure usage
     • Core code unmodified

  8. Performance: Experimental setup
     • Ran tests with 500 particles, 1000 iterations with:
       • Five non-component implementations testing baseline performance of different data access methods and representations.
       • Babel/Ccaffeine components with arrays of doubles and arrays of structs.
       • Ccain components with arrays of doubles and arrays of structs.

       struct particle { double x,y,z,vx,vy,vz,m; };

       TYPE particle
         REAL(KIND=8) :: x,y,z,m
         REAL(KIND=8) :: vx,vy,vz
       END TYPE particle

       double particles[][] =
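
     The two data representations compared in the experiments, arrays of structs and arrays of doubles, can be put side by side in C. This is only a sketch of the idea: the exact array shape used in the tests is cut off on the slide, so the row-per-particle double array, the F_* field indices, and the function names below are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Array-of-structs representation, as on the slide. */
typedef struct particle { double x, y, z, vx, vy, vz, m; } particle;

/* Assumed array-of-doubles representation: one row of NFIELDS doubles
   per particle, indexed by these hypothetical field constants. */
enum { F_X, F_Y, F_Z, F_VX, F_VY, F_VZ, F_M, NFIELDS };

/* Push on the struct layout: fields are accessed by name. */
void push_structs(particle *p, int n, double dt) {
    for (int i = 0; i < n; ++i) {
        p[i].x += p[i].vx * dt;
        p[i].y += p[i].vy * dt;
        p[i].z += p[i].vz * dt;
    }
}

/* The same push on the double-array layout: fields are accessed by index. */
void push_doubles(double (*q)[NFIELDS], int n, double dt) {
    assert(q != NULL || n == 0);
    for (int i = 0; i < n; ++i) {
        q[i][F_X] += q[i][F_VX] * dt;
        q[i][F_Y] += q[i][F_VY] * dt;
        q[i][F_Z] += q[i][F_VZ] * dt;
    }
}
```

     Both routines perform the same arithmetic; only the way a field is addressed differs, which is the variable the baseline tests isolate.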

  9. Performance: Runtime Data (seconds)
     • Matt’s original F95 code, 5 flavors. Mean:
     • Comparing 3 versions of the code (2.4 GHz Xeon, Suse Linux)
     • Analysis: Set/get calls likely slowed down Babel/Ccaffeine. (We followed the instructions!)

  10. Fortran 2003 Status
     • ISO C binding module becoming available
       • Already in IBM XLF, Cray Fortran, (Sun?)
       • Coming soon to Intel, …
       • Chris adding to gfortran.
         • Craig taking credit for Chris's slave labor.
     • These F2003 compilers have … "issues".
       • 'i','c','k','y',' ','s','t','r','i','n','g','s',C_NULL_CHAR
     • Ccain works with standard F9x too.

  11. Measures of Complexity

                             Ccain      CCA/Babel
     Framework:
       Time to build         2.5 s      ~1 hour
       # files               10         1117
       # .in files           2          75
       # extra tools         0          4
       ls -l *.tar           0.7 MB     70 MB
     User:
       # make files          1          40
       # line mod            ~230       same or less?
       # files               9          142 (10% u)
       development time      ~1 day     ~10 days

  12. Mission Accomplished
     • Goals revisited:
     • Simple
       • 4 lines per port routine + the usual stuff (setServices, port definitions)
       • No stubs or skeletons.
         • Well… one for F90.
     • High performance
       • Ran the same speed as non-component code
     • Portable
       • Ran on P4/Xeon, G4, PPC970 (32 and 64 bit modes), AMD64, Cray X1
         • This is all we had available to test with last week.
       • Compiled with Intel, IBM, Cray, Visual C++, and GNU compilers.
       • Ran successfully under Suse and Yellowdog Y-HPC Linux, MacOS X, Windows XP Pro, UNICOS
       • … We are compiler, architecture, and operating system agnostic!
     • Non-intrusive
       • No data structure changes from original code.
       • Data types restricted by languages, NOT the framework.

  13. The end

  14. Ccain Fortran Port Definition

       ! NEED Comparable C header file too
       type :: PusherPort
         include "CcainBasePort.fh"
         type(C_FUNPTR) :: push
       end type PusherPort

       interface
         subroutine push(c_self, particles, numParticles, dt) bind(C)
           use, intrinsic :: iso_c_binding
           import :: particle   ! make the host's particle type visible in the interface body
           type(C_PTR), value, intent(in) :: c_self
           real(C_DOUBLE), value, intent(in) :: dt   ! VALUE dummies cannot be intent(inout)
           integer(C_INT), value, intent(in) :: numParticles
           type(particle), dimension(numParticles) :: particles
         end subroutine push
       end interface
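
     The slide notes that a comparable C header is needed. One possible shape for it is sketched below, derived only from the bind(C) interface above: c_self, numParticles, and dt are passed by value, and the particle array is passed by reference. The contents of CcainBasePort.fh are not shown in the slides, so the base-port fields are left out and this struct is not layout-compatible with the Fortran type as written. push_fn, call_push, and c_push are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef struct particle { double x, y, z, vx, vy, vz, m; } particle;

/* C view of the Fortran interface:
   push(c_self, particles, numParticles, dt) bind(C) */
typedef void (*push_fn)(void *c_self, particle *particles,
                        int numParticles, double dt);

typedef struct PusherPort {
    /* Fields contributed by CcainBasePort.fh would go here; they are
       not shown in the slides, so they are omitted in this sketch. */
    push_fn push;   /* corresponds to type(C_FUNPTR) :: push */
} PusherPort;

/* Call the port's push routine through its function pointer. */
void call_push(const PusherPort *port, void *c_self,
               particle *particles, int numParticles, double dt) {
    assert(port != NULL && port->push != NULL);
    port->push(c_self, particles, numParticles, dt);
}

/* Stand-in C implementation with the same signature, so the sketch can
   be exercised without a Fortran compiler. */
void c_push(void *c_self, particle *particles, int numParticles, double dt) {
    (void)c_self;
    for (int i = 0; i < numParticles; ++i) {
        particles[i].x += particles[i].vx * dt;
        particles[i].y += particles[i].vy * dt;
        particles[i].z += particles[i].vz * dt;
    }
}
```

     A routine such as NBody_push, declared with bind(C), could be stored in the push field in place of c_push, since both present the same C calling signature.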

  15. Ccain setServices

       subroutine NBody_setServices(c_self, services) &
           bind(C, name="NBody_setServices")
         type(C_PTR), value :: c_self, services
         type(NaiveNBody), pointer :: self
         type(PusherPort), pointer :: pPort
         ! 8 characters plus the terminating NUL that C expects
         character(kind=C_CHAR, len=9) :: pPortName = C_CHAR_"PushPort" // C_NULL_CHAR
         ! pPortType is not declared on this slide

         call C_F_Pointer(c_self, self)       ! recover the Fortran component from the C handle
         self%myServices = services
         call C_F_Pointer(self%c_pPort, pPort)
         pPort%push = C_FUNLOC(NBody_push)    ! publish the push routine as a C function pointer
         call addProvidesPort(services, self%c_pPort, pPortName, pPortType)
       end subroutine NBody_setServices
