SCILAB to SCILAB// The OURAGAN Project

E. Fleury, E. Jeannot (LORIA, INRIA Lorraine, Nancy, France)
E. Caron, D. Lazure, G. Utard (LARIA, Amiens, France)
F. Desprez, M. Quinson, F. Suter (INRIA Rhône-Alpes, LIP ENS Lyon, France)
S. Chaumette, P. Ramet, J. Roman, F. Rubi (LaBRI, U. Bordeaux, Bordeaux, France)
C. Gomez, M. Goursat (INRIA Rocquencourt, Paris, France)
S. Contassot, F. Lombard, J.-M. Nicod, L. Philippe (LIFC, Besançon, France)

Outline of the talk
• INTRODUCTION
• PARALLELIZATION OF MATLAB-like tools
• PARALLEL SCILAB
  • Interfaces to other software
    • Message passing interface
    • Parallel libraries interfaces
    • Netsolve interface
  • Out-of-core operations
  • Computational servers
• CONCLUSION AND FUTURE WORK

INTRODUCTION
• One long-term idea for Metacomputing
  • renting computation power and memory capacity over the Internet
• Very high potential
• Still difficult to use for non-specialists
• Lack of standards (CORBA, JAVA/JINI, sockets, …) to build the computational servers
• Matlab and Scilab = heavily used tools in the mathematics community
  • Easy to learn and to use
  • Limited because not parallelized (performance, memory capacity)
• Scilab// idea: to provide access to Metacomputing resources from the workstation using Scilab
• Goals: efficiency, transparency, and interactivity

VTHD
• High-speed network between INRIA research centers (2.5 Gb/s) and several other research centers
• Connecting several clusters of PCs, SGI O2K, NEC Cenju-4, and virtual reality caves
• Test platform for our developments
• Several Metacomputing projects
  • parallel CORBA objects, Metacomputing environments and communication layers evaluation, virtual reality, ...
[Map labels: Nancy, Rocquencourt, Rennes, Grenoble, Sophia]

PARALLELIZATION OF MATLAB-like TOOLS

Compilation approach
• Entry: a Matlab script
• We add compilation directives
• Compiled to a language like Fortran or C (taking care of type and variable-size problems)
• Using calls to sequential or parallel libraries
• Goals:
  • To avoid interpretation
  • No data-type problems
  • Calls to high-performance libraries
  • Using parallelizing compiler technology
• Falcon, Menhir, Match, Paradigm, Conlab, ...

Master-worker approach
• To keep the interactivity of Matlab
• We can duplicate Matlab on every processor and exchange messages between Matlab processes (either in an SPMD way or in a master-worker way)
  • Easy to develop
  • Interpretation overhead
• Or we can use parallel library servers
  • Need for a communication protocol between the client (the Matlab window) and the servers
  • More convenient to modify and improve
  • Independent of Matlab

Projects that leave Matlab interactive
• MultiMatlab (Cornell): Duplication + PVM, MPI, BLACS
• PPServer (MIT): Servers
  • ScaLAPACK, PLAPACK and PETSc interfaces
  • No complete transparency
  • Homogeneous environment
  • Non-shared servers
• Matpar (JPL):
  • ScaLAPACK server
• PSI (U. Austin):
  • PLAPACK Server Interface
• NETSOLVE

Goals of the OURAGAN project
• Scilab:
  • free Matlab-like tool
  • simple syntax
  • many mathematical functions
  • graphical interface
  • high-level language
• To develop tools for big applications in Scilab
  • parallel version of Scilab
  • dynamic load-balancing
  • parallel sparse solver libraries (PaStiX) with a parallel graph partitioning tool (Scotch), data visualization tool
  • computational servers with parallel libraries (in-core and OOC)
  • in a metacomputing environment

Several approaches
• Mixed parallelism management (task/data-parallelism)
• Message passing between Scilab tasks (PVM, MPI)
• Netsolve interface (Univ. of Tennessee)
• Mixed compilation/run-time support
• Parallel libraries interfaces (ScaLAPACK)
• Out-of-core parallelism (ScaLAPACK extension)
• Multithreaded Scilab scheduling & migration
• CORBA approach (management of servers)
• Computational servers (SLiM, FAST)
• Sparse matrix solvers in heterogeneous environments

Different levels
[Diagram labels: SCILAB//; dense libraries; sparse libraries; computational servers / Netsolve; PM-2 High-Perf; BIP-MPI / MADELEINE / PACX-MPI / PVM / Netsolve interface / CORBA; performance evaluation tools (NWS, ...)]

Today’s versions
• Message passing
  • assembly level!
• Netsolve interface
  • easy development
  • Netsolve has to be modified for our own application (evaluation of routines, scheduling of tasks, no transparency)
• ScaLAPACK interface + Scilab duplication
  • quite easy to develop
  • Scilab duplication
  • must add types (distributed matrices)
• Computational servers
  • master-worker style
  • choice of connection software between components (sockets, CORBA, …)
  • usable with other interfaces (languages, tools)
  • libraries integration (no need of specific data structures)

Message passing performances

Complex matrix product using Scilab-PVM

Scilab/ScaLAPACK interface: easy use of parallel programming
• Trying to keep the same Scilab interface and matrix language
• "Only" addition of a new distributed scalar data type (bi-dimensional)
• New commands
  • Initialization of Scilab//
  • Distribution of scalars
  • Overloading of common Scilab operators

    CTXT = blacs_init(P,Q);
    DIST = scip_init_dist("CC",0,0,CTXT(3),MB,NB);
    A = rand(M,N);
    B = rand(M,N);
    MatA = scip_redist("A", DIST);
    MatB = scip_redist("B", DIST);
    Res = MatA(1:1000,1:500) * MatB(1:500,1:1000);
    size(Res)
     ans  =
    !   1000.   1000. !
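The distribution step above can be pictured with a small sketch. It assumes that "CC" stands for a two-dimensional block-cyclic layout of the kind ScaLAPACK uses, which the slide does not spell out. The Python function names `owner` and `local_counts`, and the zero-based indices, are illustrative choices and not part of the Scilab// interface.

```python
def owner(i, j, mb, nb, p, q, rsrc=0, csrc=0):
    """Return the (row, col) grid coordinates of the process owning entry (i, j).

    Indices are zero-based. Blocks of mb x nb entries are dealt out
    cyclically over a p x q process grid, starting at process (rsrc, csrc).
    """
    prow = (i // mb + rsrc) % p
    pcol = (j // nb + csrc) % q
    return prow, pcol


def local_counts(m, n, mb, nb, p, q):
    """Count how many entries of an m x n matrix each process stores."""
    counts = {(r, c): 0 for r in range(p) for c in range(q)}
    for i in range(m):
        for j in range(n):
            counts[owner(i, j, mb, nb, p, q)] += 1
    return counts
```

With small blocks relative to the matrix size, every process receives a nearly equal share of the entries, which is what makes this layout attractive for load balancing in dense linear algebra.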

ScaLAPACK interface performances (SGI O2K)

Out-of-core extension for Scilab
• To be able to deal with huge matrices
• New object: distributed out-of-core matrices
• Reuse of parallel out-of-core ScaLAPACK subroutines
• New functions:
  • Basic operations
  • Matrix inversion
  • Conversion in-core/out-of-core
• Optimizations:
  • increase temporal locality
  • I/O overlap with computations

Out-of-core extension for Scilab (cont’d)
• Performance: Alpha cluster (6) + Fast Ethernet
• Scilab overhead is negligible

COMPUTATIONAL SERVERS

Computational Servers
• Ideas:
  • To be independent of Scilab
  • To avoid multiple interpretation steps
  • To ease the insertion of new libraries
  • To benefit from existing developments around metacomputing
• Developed from existing software:
  • Netsolve (University of Tennessee)
  • NWS (UCSD and UTK) for the dynamic evaluation of performance
  • Our developments on libraries (redistribution routines, sparse solver, out-of-core routines)
  • LDAP for the software database and CORBA for the management of servers

Our goals
• To add some features to Netsolve for our own application:
  • Data persistence on servers
  • Redistribution and parallelism between servers
  • Better accuracy of the evaluation of pairs [routine, machine] for small-grain computations
  • Improving the scheduler (testing on-line scheduling heuristics)
  • Portable database for the available libraries (LDAP)
• To have a test and experimentation platform for our developments:
  • Mixed parallelism approach (data-parallelism/task-parallelism)
  • Scheduling heuristics of data-parallel tasks
  • Parallel algorithms on heterogeneous platforms
  • Performance evaluation
  • CORBA management of servers
  • Interactive visualization of data

Server Data Persistence
• After a computation, data stay on the server
• The server waits for commands from the client
• 5 commands available:
  • Send one input data item
  • Send one output data item
  • Send all input data
  • Send all output data
  • Delete a data item
• Data can be sent to the client or to another server (data redistribution)
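The five commands above can be sketched as a toy in-memory server. This is only an illustration of the idea of persistence: the class name `PersistentServer`, its method names, and the use of scalars instead of matrices are all hypothetical and do not reflect the actual Netsolve extension. A destination is any object with a `store` method, so it can stand for the client or for another server.

```python
class PersistentServer:
    """Toy computational server that keeps operands and results between calls."""

    def __init__(self, name):
        self.name = name
        self.inputs = {}   # data received from the client or another server
        self.outputs = {}  # results produced by computations on this server

    def store(self, key, value):
        """Receive an input operand and keep it."""
        self.inputs[key] = value

    def multiply(self, a, b, result):
        """Compute result = a * b (scalars here); operands stay on the server."""
        self.outputs[result] = self._get(a) * self._get(b)

    def _get(self, key):
        # Earlier results can be reused as operands without a round trip.
        return self.outputs[key] if key in self.outputs else self.inputs[key]

    # The five commands listed on the slide.
    def send_input(self, key, dest):
        dest.store(key, self.inputs[key])

    def send_output(self, key, dest):
        dest.store(key, self.outputs[key])

    def send_all_inputs(self, dest):
        for key, value in self.inputs.items():
            dest.store(key, value)

    def send_all_outputs(self, dest):
        for key, value in self.outputs.items():
            dest.store(key, value)

    def delete(self, key):
        self.inputs.pop(key, None)
        self.outputs.pop(key, None)
```

A chain such as C = A*B on one server, D = E*F on a second, then G = C*D needs only one transfer (D, sent server to server), because C never leaves the server that produced it.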

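Data Redistribution
[Diagram, with control and data flows between the client, the AGENT, Server 1 (S1) and Server 2 (S2): request "C=A*B ?"; "A on S1, B on S2"; decision "C=A*B on S1" (bandwidth, speed); "Open a port for receiving B" (33120); "Send B to S1 on port 33120"; B moves from Server 2 to Server 1]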

Performance results (IBM SP2): several DGEMM on different servers
C = A*B
D = E*F
G = C*D

Agent
[Architecture diagram labels: Client; Agent-Client API (CORBA, sockets); AGENT: SCHEDULER, SLiM, LDAP, FAST, NWS, naming service; Agent-Servers API (CORBA, sockets); Servers]

SLiM: Scientific Libraries Metaserver
• Software resource naming service system
• To be able to tell the scheduler
  • which servers are able to perform which operation (using which library)
  • who the users of the environment are (and what operations they perform)
  • where the generic data are
• Use of LDAP (Lightweight Directory Access Protocol)
• Strengths and weaknesses of LDAP:
  • Distributed
  • Hierarchical
  • Open, standard protocol
  • Optimized for reads and queries (not for updates)

FAST: Fast Agent’s System Timer
• Performance evaluation of the platform, to be able to find an efficient server (redistribution and computation costs) without testing every configuration: a performance database for the scheduler
• Based on NWS (Network Weather Service)
• Computation performance
  • machine load, memory capacity, and batch queue performance (dynamic)
  • extensive testing of several libraries (static)
• Communication performance
  • To be able to estimate the cost of redistributing data between two servers as a function of the network architecture and dynamic information
  • Latency and bandwidth (hierarchical)
• Hierarchical set of agents
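The kind of estimate described above (redistribution cost plus computation cost) can be sketched as follows. This is a deliberately simple model under stated assumptions, not FAST's actual one: transfers are charged as latency plus size over bandwidth and summed serially, computation is charged as operation count over a fixed server speed, and the function names and data layout are hypothetical.

```python
def estimated_time(server, operands, flops, speeds, links):
    """Estimate the time to run one operation on `server`.

    operands: dict name -> (size_bytes, server_currently_holding_it)
    flops:    number of floating-point operations of the routine
    speeds:   dict server -> sustained speed in flop/s
    links:    dict (src, dst) -> (latency_s, bandwidth_bytes_per_s)
    """
    move = 0.0
    for size, where in operands.values():
        if where != server:
            # Operand must be redistributed before the computation starts.
            latency, bandwidth = links[(where, server)]
            move += latency + size / bandwidth
    return move + flops / speeds[server]


def pick_server(operands, flops, speeds, links):
    """Return the server with the smallest estimated completion time."""
    return min(
        speeds,
        key=lambda s: estimated_time(s, operands, flops, speeds, links),
    )
```

Even this crude model shows the trade-off the scheduler faces: a faster server wins for compute-heavy calls, while a slower server that already holds the large operand wins when the transfer dominates.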

Performance results

CORBA approach
• Automatic interface generation
• Object-oriented programming
• Transparent localization of servers / data (IOR)
• Existing ORBs on all systems
• Interoperability between ORBs (GIOP)

CORBA approach, future features
• Non-blocking calls
• Data duplication
• Smart transfers between parallel servers
• Traders federation for large-scale Metacomputing
  • multi-agent system
  • hierarchical organization
  • high scalability
  • distributed information services

PaStiX solver: Overview
• Parallel Cholesky factorization (LDLt) for sparse SPD matrices without pivoting
• Block partitioning and scheduling problems
• Focus on scalability and efficient memory management
• Size of matrices: more than 1 million unknowns
• Extensively tested on IBM SP2 (64 processors)
• Competing software: PSPASES, based on a multifrontal approach
• Scilab interface:
  • simple server
  • data persistence (for chaining several resolutions for one factorization)
  • data transfers using files
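The benefit of keeping a factorization on the server and chaining several resolutions can be seen in a small dense sketch. This is a textbook sequential LDLt without pivoting on a list-of-lists matrix, used here as a stand-in: PaStiX itself works on sparse matrices, by blocks and in parallel, and the function names `ldlt` and `solve` are illustrative only.

```python
def ldlt(a):
    """Factor a symmetric positive definite matrix a (list of lists) as L*D*L^T.

    Returns (l, d) with l unit lower triangular and d the diagonal of D.
    No pivoting is performed.
    """
    n = len(a)
    l = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for j in range(n):
        d[j] = a[j][j] - sum(l[j][k] ** 2 * d[k] for k in range(j))
        for i in range(j + 1, n):
            s = sum(l[i][k] * l[j][k] * d[k] for k in range(j))
            l[i][j] = (a[i][j] - s) / d[j]
    return l, d


def solve(l, d, b):
    """Solve L*D*L^T x = b by forward substitution, scaling, back substitution."""
    n = len(b)
    y = list(b)
    for i in range(n):
        y[i] -= sum(l[i][k] * y[k] for k in range(i))
    z = [y[i] / d[i] for i in range(n)]
    x = list(z)
    for i in reversed(range(n)):
        x[i] -= sum(l[k][i] * x[k] for k in range(i + 1, n))
    return x
```

The factorization is the expensive step; once `l` and `d` are kept, each additional right-hand side costs only the two triangular sweeps in `solve`.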

Software processing chain for the PaStiX solver
[Diagram labels: MESH; SCOTCH (partitioning, ordering); PaStiX (blocks mapping and static scheduling; factorization; triangular resolution); sequential step; parallel step]

General scheme of the static scheduling computation
[Diagram labels: symbolic block factorization; number of processors; cost modeling; block partitioning; block mapping; simulation based on a greedy algorithm; block computation and communication scheduling; parallel factorization and resolution; solution]

BMW3 16 processors

«The Visualization Interactive Tool that makes your application less fuzzy to understand»
Support and tools for handling data structures (sparse, irregular, distributed).
Our approach: «Our aim is not to provide support for using sparse and irregular data structures inside applications: we want to provide tools dealing with such data structures at a high level of abstraction. Furthermore, we want to make it easy to develop such tools.»
• Tools:
  • distribution visualization
  • traces visualization

[An example of the Scilab// VisIt tool] Irregular distribution visualization of a sparse matrix (from PaStiX). Axis label: Processors

[Another example of the Scilab// VisIt tool] Irregular distribution visualization of a large sparse matrix (74000 x 74000) using the MatView(*) tool, with a zoom: OilPan with 8 processors (Harwell-Boeing Sparse Matrix Collection)
(*) MatView from Oak Ridge National Laboratory

Conclusions and Future Work
• Message passing within Scilab
  • not adapted to interactive computing! Use of scripts
• ScaLAPACK interface (in-core and out-of-core)
  • very good performance
  • needs a version of Scilab on every machine
  • tough data (re)distribution
  • modification of Scilab internals to add automatic scheduling and performance analysis
• Netsolve interface
  • easy interfacing
  • needs operator overloading (data type problems)
  • modifications of Netsolve for our own needs (scheduler, performance evaluation, hierarchy of agents)

Conclusions and Future Work, cont.
• Sparse matrix solvers
  • Obvious need for high performance in these applications
• Many problems
  • tough to get into real codes (lots of re-coding, almost no automatic cleaning, …)
  • Needs problem analysis (Netsolve, LSA)
  • Interoperability of libraries (data redistribution between servers)
• Discussions with Globus, NINF, NWS, AppLeS, and Netsolve teams (at least)!
• Participation in the Grid Forum

References
• SCILAB: http://www-rocq.inria.fr/scilab/
• OURAGAN: http://www.ens-lyon.fr/~desprez/OURAGAN/
• SCOTCH: http://dept-info.labri.u-bordeaux.fr/~pelegrin/scotch/
• PASTIX: http://dept-info.labri.u-bordeaux.fr/~ramet/pastix/
• Scilab book: Engineering and Scientific Computing with Scilab, Birkhäuser, Claude Gomez (editor)
