Data Structures in Java for Matrix Computations

Geir Gundersen, Department of Informatics, University of Bergen, Norway. Joint work with Trond Steihaug.

Overview
We will show how to utilize Java’s native arrays for matrix computations.
• How to use Java arrays as a 2D array for efficient dense matrix computation.
• How to create an efficient sparse matrix data structure using Java arrays.
• Object-oriented programming has been favored in the last decade(s):
  • Easy-to-understand paradigm.
  • Straightforward to build large-scale applications.
• Java will be used for (limited) numerical computations.
• Java is already introduced as the programming language in introductory courses in scientific computation.
• Impact on computing will force new fields to use Java.

A “mathematical” 2D array

A 2D Java Array
• Array elements that refer to other arrays create a multidimensional array.

A true 2D Java Array

Java Arrays
• Java arrays are true objects.
• Thus creating an array is object creation.
• The objects of an array of objects are not necessarily stored contiguously.
• An array of objects stores references to the actual objects.
• The primitive elements of an array are most likely stored contiguously.
• An array of primitive elements holds the actual values for those elements.
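The points above can be illustrated with a minimal sketch. The class and method names (`ArrayOfArraysDemo`, `lowerTriangular`, `swapRows`) are hypothetical and not taken from the slides; the sketch only relies on the fact that a Java `double[][]` is an array of references to independent `double[]` row objects.

```java
class ArrayOfArraysDemo {
    /** Builds a jagged lower-triangular array: row i holds i + 1 elements. */
    static double[][] lowerTriangular(int n) {
        double[][] a = new double[n][];   // n row references, all null so far
        for (int i = 0; i < n; i++) {
            a[i] = new double[i + 1];     // each row is a separate array object
        }
        return a;
    }

    /** Swaps two rows by exchanging references; no elements are copied. */
    static void swapRows(double[][] a, int i, int j) {
        double[] tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }

    public static void main(String[] args) {
        double[][] a = lowerTriangular(3);
        a[2][0] = 7.0;
        swapRows(a, 0, 2);                // row 0 now refers to the old row 2
        System.out.println(a[0].length + " " + a[0][0]);
    }
}
```

Because rows are separate objects, they may differ in length and can be replaced individually, but nothing guarantees that consecutive rows are adjacent in memory.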

Frobenius Norm Example

Frobenius Norm Example
• Basic observation:
  • Accessing consecutive elements in a row will be faster than accessing consecutive elements in a column.
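The code from the original slide is not preserved here, so the following is only a sketch of the two traversal orders being compared. The names `FrobeniusNorm`, `byRows` and `byColumns` are assumptions, and the column version assumes a rectangular array.

```java
class FrobeniusNorm {
    /** Row-wise traversal: the inner loop stays inside one row object. */
    static double byRows(double[][] a) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double[] row = a[i];              // fetch the row reference once
            for (int j = 0; j < row.length; j++) {
                sum += row[j] * row[j];
            }
        }
        return Math.sqrt(sum);
    }

    /** Column-wise traversal: each inner step touches a different row object. */
    static double byColumns(double[][] a) {
        double sum = 0.0;
        int n = a.length == 0 ? 0 : a[0].length;  // assumes a rectangular array
        for (int j = 0; j < n; j++) {
            for (int i = 0; i < a.length; i++) {
                sum += a[i][j] * a[i][j];
            }
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        double[][] a = { {3.0, 0.0}, {0.0, 4.0} };
        System.out.println(byRows(a) + " " + byColumns(a));
    }
}
```

Both methods compute the same value; only the order in which the row objects are visited differs.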

Matrix Multiplication Algorithms
• The efficiency of the matrix multiplication operation depends on the details of the underlying data structure, in both hardware and software.
• We discuss several different implementations using Java arrays as the data structure:
  • A straightforward matrix multiplication algorithm.
  • A package implementation that is highly optimized.
  • An algorithm that takes the row-wise layout fully into consideration and uses the same optimizing techniques as the package implementation.

Matrix Multiplication Algorithms
A straightforward matrix multiplication operation:

    // C (m x n) += A (m x p) * B (p x n)
    for (int i = 0; i < m; i++) {
        for (int j = 0; j < n; j++) {
            for (int k = 0; k < p; k++) {
                C[i][j] += A[i][k] * B[k][j];
            }
        }
    }

Interchanging the three for loops gives six distinct orderings (pure row, pure column, and partial row/column).

Matrix Multiplication Algorithms
The loop order tells us how the matrices involved are traversed in the course of the matrix multiplication operation. We see the same time differences between pure row and pure column as we did in the Frobenius norm example; it is the same effect.

Matrix Multiplication Algorithms
• The time differences are due to accessing different object arrays when traversing columns, as opposed to accessing the same object array several times when traversing a row.
• For a rectangular array of primitive elements, the elements of a row will be stored contiguously, but the rows may be scattered.
• Differences between row and column traversal are also an issue in FORTRAN, C and C++, but the differences are not as significant.
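A pure row-oriented ordering can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the class name `PureRowMultiply` is hypothetical, the loop order is taken to be (i, k, j), and both inputs are assumed rectangular with p > 0.

```java
class PureRowMultiply {
    /** Returns C = A * B for A (m x p) and B (p x n), traversing only rows. */
    static double[][] multiply(double[][] a, double[][] b) {
        int m = a.length;
        int p = b.length;
        int n = b[0].length;
        double[][] c = new double[m][n];      // zero-initialized by Java
        for (int i = 0; i < m; i++) {
            double[] aRow = a[i];             // row objects are fetched once
            double[] cRow = c[i];
            for (int k = 0; k < p; k++) {
                double aik = aRow[k];
                double[] bRow = b[k];
                for (int j = 0; j < n; j++) {
                    cRow[j] += aik * bRow[j]; // inner loop walks two rows
                }
            }
        }
        return c;
    }

    public static void main(String[] args) {
        double[][] a = { {1, 2}, {3, 4} };
        double[][] b = { {5, 6}, {7, 8} };
        System.out.println(java.util.Arrays.deepToString(multiply(a, b)));
    }
}
```

In this ordering the innermost loop never moves down a column, so every inner iteration stays within the row arrays `cRow` and `bRow`.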

JAMA
• A basic linear algebra package implemented in Java.
• It provides user-level classes for constructing and manipulating real dense matrices.
• It is intended to serve as the standard matrix class for Java.
• JAMA comprises six Java classes:
  • Matrix
    • Matrix multiplication: A.times(B)
  • CholeskyDecomposition
  • LUDecomposition
  • QRDecomposition
  • SingularValueDecomposition
  • EigenvalueDecomposition

Matrix Multiplication Operations

JAMA versus Pure-Row
• A comparison on input AB is shown for square matrices.
• The pure row-oriented algorithm performs on average 30% better than JAMA's algorithm.

JAMA versus Pure-Row
• On input Ab, JAMA's algorithm is more efficient than the pure row-oriented algorithm by an average factor of two.

JAMA versus Pure-Row
• There is a significant difference between JAMA's algorithm and the pure row-oriented algorithm on input $b^T A$, with an average factor of 7.
• In this case JAMA is less efficient.
• The break even results.
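For the vector-matrix product, a row-oriented computation can be sketched as below. The class name `RowVectorTimesMatrix` is hypothetical and the code is not taken from the slides or from JAMA; it only shows how $c = b^T A$ can be accumulated without ever walking down a column of A.

```java
class RowVectorTimesMatrix {
    /** Returns c = b^T A for b of length m and a rectangular m x n array a. */
    static double[] multiply(double[] b, double[][] a) {
        int m = a.length;
        int n = m == 0 ? 0 : a[0].length;
        double[] c = new double[n];
        for (int i = 0; i < m; i++) {
            double bi = b[i];
            double[] row = a[i];          // one row object per outer iteration
            for (int j = 0; j < n; j++) {
                c[j] += bi * row[j];      // c accumulates b_i times row i of A
            }
        }
        return c;
    }

    public static void main(String[] args) {
        double[] b = {1, 2};
        double[][] a = { {1, 2, 3}, {4, 5, 6} };
        System.out.println(java.util.Arrays.toString(multiply(b, a)));
    }
}
```

The result is built as a linear combination of the rows of A, so each row array is read exactly once from start to end.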

Sparse Matrices
• A sparse matrix is usually defined as a matrix where "many" of its elements are equal to zero.
• We benefit in both time and space by working only on the nonzero data structure.
• Currently there are no packages implemented in Java for matrix computation on sparse matrices that are as complete as JAMA (for dense matrices).

Sparse Matrix Concept
• The Sparse Matrix Concept (SMC) is a general object-oriented structure.
• The Rows objects store the arrays for the nonzero values and indexes.
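The slides give no class definitions for SMC, so the following is only a guess at its shape: a matrix object holding an array of row objects, each of which holds one value array and one index array. All names (`SmcSketch`, `Rows`, `SparseMatrix`, `times`) are assumptions made for illustration.

```java
class SmcSketch {
    /** One sparse row: the nonzero values and their column indexes. */
    static final class Rows {
        final double[] values;
        final int[] indexes;

        Rows(double[] values, int[] indexes) {
            this.values = values;
            this.indexes = indexes;
        }
    }

    /** The matrix is an array of Rows objects, one per matrix row. */
    static final class SparseMatrix {
        final Rows[] rows;

        SparseMatrix(Rows[] rows) {
            this.rows = rows;
        }

        /** Returns y = A x, touching only the stored nonzeros. */
        double[] times(double[] x) {
            double[] y = new double[rows.length];
            for (int i = 0; i < rows.length; i++) {
                Rows r = rows[i];         // extra object layer per row
                double s = 0.0;
                for (int k = 0; k < r.values.length; k++) {
                    s += r.values[k] * x[r.indexes[k]];
                }
                y[i] = s;
            }
            return y;
        }
    }
}
```

Every access to a row goes through one additional object (`Rows`) before reaching the value and index arrays.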

Java Sparse Array
• The Java Sparse Array (JSA) format is a new concept for storing sparse matrices, made possible by Java.
• One array stores the references to the value arrays and one stores the references to the index arrays.
• Java's native arrays can store object references; therefore the extra Rows object layer in SMC is unnecessary in Java.
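A minimal sketch of the two-array layout described above follows. The class name `JsaSketch`, the field names and the dense-input constructor are assumptions for illustration only; the slides specify just that one array references the value arrays and another references the index arrays.

```java
class JsaSketch {
    final double[][] values;   // values[i]: nonzero values of row i
    final int[][] indexes;     // indexes[i]: their column indexes

    /** Builds the structure from a dense array, dropping exact zeros. */
    JsaSketch(double[][] dense) {
        int n = dense.length;
        values = new double[n][];
        indexes = new int[n][];
        for (int i = 0; i < n; i++) {
            int nnz = 0;
            for (double v : dense[i]) {
                if (v != 0.0) nnz++;
            }
            values[i] = new double[nnz];
            indexes[i] = new int[nnz];
            int pos = 0;
            for (int j = 0; j < dense[i].length; j++) {
                if (dense[i][j] != 0.0) {
                    values[i][pos] = dense[i][j];
                    indexes[i][pos] = j;
                    pos++;
                }
            }
        }
    }

    /** Returns y = A x using only the stored nonzeros. */
    double[] times(double[] x) {
        double[] y = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            double[] v = values[i];
            int[] idx = indexes[i];
            double s = 0.0;
            for (int k = 0; k < v.length; k++) {
                s += v[k] * x[idx[k]];
            }
            y[i] = s;
        }
        return y;
    }
}
```

An empty row is simply a pair of zero-length arrays, and a single row can be replaced by assigning new arrays to `values[i]` and `indexes[i]`.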

Compressed Row Storage
• The most commonly used storage schemes for large sparse matrices:
  • Compressed Row/Column Storage.
• These storage schemes have enjoyed several decades of research.
• The compressed storage schemes have minimal memory requirements.
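For comparison, the usual three-array form of Compressed Row Storage can be sketched as below. The names `CrsSketch`, `value`, `colIndex` and `rowPtr` are assumptions, as is the dense-input constructor; the layout itself (one value array, one column-index array, one array of row start offsets) is the standard CRS scheme.

```java
class CrsSketch {
    final double[] value;    // all nonzeros, stored row by row
    final int[] colIndex;    // column index of each stored nonzero
    final int[] rowPtr;      // row i occupies [rowPtr[i], rowPtr[i + 1])

    /** Builds the structure from a dense array, dropping exact zeros. */
    CrsSketch(double[][] dense) {
        int n = dense.length;
        int nnz = 0;
        for (double[] row : dense) {
            for (double v : row) {
                if (v != 0.0) nnz++;
            }
        }
        value = new double[nnz];
        colIndex = new int[nnz];
        rowPtr = new int[n + 1];
        int pos = 0;
        for (int i = 0; i < n; i++) {
            rowPtr[i] = pos;
            for (int j = 0; j < dense[i].length; j++) {
                if (dense[i][j] != 0.0) {
                    value[pos] = dense[i][j];
                    colIndex[pos] = j;
                    pos++;
                }
            }
        }
        rowPtr[n] = pos;
    }

    /** Returns y = A x using only the stored nonzeros. */
    double[] times(double[] x) {
        int n = rowPtr.length - 1;
        double[] y = new double[n];
        for (int i = 0; i < n; i++) {
            double s = 0.0;
            for (int k = rowPtr[i]; k < rowPtr[i + 1]; k++) {
                s += value[k] * x[colIndex[k]];
            }
            y[i] = s;
        }
        return y;
    }
}
```

All rows share the same two long arrays, so changing the number of nonzeros in one row requires shifting the entries of every later row.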

Numerical Results
These numerical results show that CRS, SMC and JSA have approximately the same performance.

Sparse Matrix Update
• Consider the outer product $ab^T$ of two vectors $a$ and $b$ in which many of the elements are 0.
• The outer product will be a sparse matrix with some rows where all elements are 0, and the corresponding sparse data structure will have rows without any elements.
• A typical operation is a rank-one update of an n x n matrix A:
  $A_{ij} \leftarrow A_{ij} + a_i b_j, \quad i, j = 1, \dots, n,$
  where $a_i$ is element i of $a$ and $b_j$ is element j of $b$. Thus only those rows of A where $a_i$ is different from 0 need to be updated.
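The update can be sketched on a two-array row structure (one array of value arrays, one array of index arrays). This is an illustration, not the authors' algorithm: the class and method names are hypothetical, the vectors are passed as dense arrays, and a dense work array of length n is used to merge each affected row, which is simple rather than optimal.

```java
class JsaRankOneUpdate {
    /** Performs A <- A + a b^T in place; rows with a[i] == 0 are not touched. */
    static void update(double[][] values, int[][] indexes, double[] a, double[] b) {
        int n = b.length;
        double[] work = new double[n];        // dense scratch row, kept all zero
        boolean[] occupied = new boolean[n];  // pattern of the merged row
        for (int i = 0; i < a.length; i++) {
            if (a[i] == 0.0) continue;        // nothing to add to this row
            int count = 0;
            for (int k = 0; k < indexes[i].length; k++) {   // scatter old row
                int j = indexes[i][k];
                work[j] = values[i][k];
                occupied[j] = true;
                count++;
            }
            for (int j = 0; j < n; j++) {     // add a_i * b_j where b_j != 0
                if (b[j] != 0.0) {
                    if (!occupied[j]) {
                        occupied[j] = true;
                        count++;
                    }
                    work[j] += a[i] * b[j];
                }
            }
            double[] newValues = new double[count];
            int[] newIndexes = new int[count];
            int pos = 0;
            for (int j = 0; j < n; j++) {     // gather and reset the scratch
                if (occupied[j]) {
                    newValues[pos] = work[j]; // entries that cancel stay as 0.0
                    newIndexes[pos] = j;
                    pos++;
                    work[j] = 0.0;
                    occupied[j] = false;
                }
            }
            values[i] = newValues;            // replace only this row's arrays
            indexes[i] = newIndexes;
        }
    }
}
```

Replacing a row is two reference assignments, so rows with $a_i = 0$ and the rest of the structure are neither copied nor traversed.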

Numerical Results
These numerical results show that JSA is more efficient than CRS by an average factor of 78, which is significant.

Concluding Remarks
• When using Java arrays as a 2D array for dense matrices, we need to consider that the rows are independent objects.
• Other suggestions to eliminate the row versus column “problem”:
  • Cluster row objects together in memory.
  • Create a Java array class, avoiding arrays of arrays.
• Java Sparse Array:
  • Manipulates only the affected rows, without updating or traversing the rest of the structure, unlike Compressed Row Storage.
  • Is more efficient, has lower memory requirements and has a more natural notation than SMC.
• People will use Java for numerical computations; therefore it may be useful to invest time and resources in finding out how to use Java for numerical computation.
• This work has given ideas of how some constructions in Java restrict natural development (rows versus columns).
• Java has flexibility that is not fully explored (Java Sparse Array).
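The suggestion of a Java array class that avoids arrays of arrays could look roughly like the sketch below. The class name `FlatMatrix` and its row-major layout are assumptions chosen for illustration; the slides do not specify a design.

```java
class FlatMatrix {
    final int rows;
    final int cols;
    final double[] data;   // one array object holding all elements, row-major

    FlatMatrix(int rows, int cols) {
        this.rows = rows;
        this.cols = cols;
        this.data = new double[rows * cols];
    }

    /** Element (i, j) is stored at offset i * cols + j. */
    double get(int i, int j) {
        return data[i * cols + j];
    }

    void set(int i, int j, double v) {
        data[i * cols + j] = v;
    }
}
```

With a single backing array there is only one array object for the whole matrix, at the cost of computing the offset on every access and losing the ability to replace or resize individual rows.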
