Han-Ku Lee, Department of Computer Science, Florida State University, Feb 19th, 2002, hkl@csit.fsu.edu

Efficient Compilation of the HPJava Language for HPC


Han-Ku Lee

Department of Computer Science

Florida State University

Feb 19th, 2002

hkl@csit.fsu.edu

Efficient Compilation of the HPJava Language for HPC

Outline


  • Background - review of data-parallel languages

  • HPspmd Programming Language Model

    • HPJava

  • The compilation strategies for HPJava

  • Author’s contributions and proposed work

  • Conclusions and Current Status


Research Objectives

  • Data-parallel programming and languages have played a major role in high-performance computing

  • HPF – difficult (compilation)

  • Library-based lower-level SPMD programming – successful

  • HPspmd programming language model – a flexible hybrid of HPF-like data-parallel language and the popular, library-oriented, SPMD style

  • The base language for the HPspmd model should offer clean and simple object semantics, cross-platform portability, security, and popularity – Java


Proposed Work

  • Efficient Compilation of the HPJava Language for HPC

  • The main thrust of the proposed work will be to explore the effectiveness of optimizations in the HPspmd translator

  • Continue to investigate which optimization strategies are most critical in a wider range of applications in High Performance Compilers


Data Parallel Languages

  • Large data-structures, typically arrays, are split across nodes

  • Each node performs similar computations on a different part of the data structure

  • SIMD – machines such as the Illiac IV and the Connection Machine introduced a new concept: distributed arrays

  • MIMD – asynchronous, flexible, hard to program

  • SPMD – loosely synchronous model (SIMD+MIMD)

    • Each node has its own local copy of program


HPF (High Performance Fortran)

  • By early 90s, value of portable, standardized languages universally acknowledged.

  • Goal of HPF Forum – a single language for high-performance programming, effective across architectures (vector, SIMD, MIMD), though with SPMD as a focus.

  • HPF - an extension of Fortran 90 to support the data parallel programming model on distributed memory parallel computers

  • Supported by Cray, DEC, Fujitsu, HP, IBM, Intel, Maspar, Meiko, nCube, Sun, and Thinking Machines



[Figure labels: Memory Area; Ideal data distribution]

  • Multi-processing and data distribution – communication and load-balance

  • HPF introduced processor arrangements and templates

  • Data Alignment


Features of HPJava

  • A language for parallel programming, especially suitable for massively parallel, distributed memory computers.

  • Takes various ideas from HPF.

    • e.g. - distributed array model

  • In other respects, HPJava is a lower-level parallel programming language than HPF.

    • explicit SPMD, needing explicit calls to communication libraries such as MPI or Adlib

  • The HPJava system is built on Java technology.

    • The HPJava programming language is an extension of the Java programming language.


Benefits of our HPspmd Model

  • Translators are much easier to implement than HPF compilers. No compiler magic needed

  • Attractive framework for library development, avoiding inconsistent parameterizations of distributed array arguments

  • Better prospects for handling irregular problems – easier to fall back on specialized libraries as required

  • Can directly call MPI functions from within an HPspmd program


HPspmd Architecture


Multidimensional Arrays

  • Java is an attractive language, but needs to be improved for large computational tasks

  • Java provides only arrays of arrays, which has disadvantages:

    • Time spent on out-of-bounds checking

    • The ability to alias rows of an array

    • The cost of accessing an element

  • HPJava introduces true multidimensional arrays and regular sections

  • For example

    int [[*,*]] a = new int [[5, 5]] ;

    for (int i=0; i<4; i++) a [i, i+1] = 19 ;

    foo ( a [[:, 0]] ) ;
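
The contrast between Java's arrays of arrays and a true multidimensional array can be sketched in plain Java. This is only an illustration under stated assumptions: the class `Flat2D`, its row-major layout, and its method names are hypothetical and are not part of HPJava or its run-time. The `column` method returns a copy, which is a simplification of a real section, which would share elements with its parent.

```java
// Hypothetical illustration only; Flat2D is not part of HPJava or its run-time.
final class Flat2D {
    final int[] data;      // all elements in one contiguous block
    final int rows, cols;

    Flat2D(int rows, int cols) {
        this.rows = rows;
        this.cols = cols;
        this.data = new int[rows * cols];
    }

    // Row-major offset of element (i, j): one multiply and one add.
    int offset(int i, int j) { return i * cols + j; }

    int get(int i, int j) { return data[offset(i, j)]; }

    void set(int i, int j, int v) { data[offset(i, j)] = v; }

    // Copy of column j, standing in for the section a [[:, j]].
    int[] column(int j) {
        int[] col = new int[rows];
        for (int i = 0; i < rows; i++) col[i] = get(i, j);
        return col;
    }

    public static void main(String[] args) {
        // Java array of arrays: two rows can alias the same storage.
        int[][] jagged = new int[5][5];
        jagged[1] = jagged[0];
        jagged[0][2] = 7;
        System.out.println("aliased row sees: " + jagged[1][2]);

        // Flat storage: same loop as the slide, a [i, i+1] = 19.
        Flat2D a = new Flat2D(5, 5);
        for (int i = 0; i < 4; i++) a.set(i, i + 1, 19);
        System.out.println("a[0,1] = " + a.get(0, 1));
    }
}
```

With a flat layout no row can be rebound to another row's storage, so the aliasing shown for `jagged` cannot occur.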

Procs2 p = new Procs2(2, 3) ;

on (p) {
    Range x = new BlockRange(N, p.dim(0)) ;
    Range y = new BlockRange(N, p.dim(1)) ;

    float [[-,-]] a = new float [[x, y]] ;
    float [[-,-]] b = new float [[x, y]] ;
    float [[-,-]] c = new float [[x, y]] ;

    … initialize ‘a’, ‘b’

    overall (i=x for :)
        overall (j=y for :)
            c [i, j] = a [i, j] + b [i, j] ;
}


  • An HPJava program is concurrently started on all members of some process collection – process groups

  • The on construct limits control to the active process group (APG), p





Distributed arrays

  • The most important feature of HPJava

  • A collective object shared by a number of processes

  • Elements of a distributed array are distributed

  • True multidimensional array

  • Can form a regular section of a distributed array

  • When N = 8 in the previous example code, the distributed array ‘a’ is distributed over the 2 × 3 process grid as follows:

    [Figure: distribution of ‘a’ over the process grid]
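
The layout for N = 8 on the 2 × 3 process grid can be worked out with a few lines of plain Java. This is a sketch under an assumption that is not stated on the slides: that a block range of n indices over p processes uses blocks of size ceil(n / p). The class `BlockLayout` and its methods are hypothetical names, not HPJava API.

```java
// Hypothetical helper, assuming a block size of ceil(n / p); not HPJava API.
final class BlockLayout {
    // Block size when n indices are split over p processes.
    static int blockSize(int n, int p) { return (n + p - 1) / p; }

    // Process coordinate that owns global index g.
    static int owner(int g, int n, int p) { return g / blockSize(n, p); }

    // Number of indices held by process coordinate c.
    static int localCount(int c, int n, int p) {
        int b = blockSize(n, p);
        int lo = c * b;
        return Math.max(0, Math.min(n, lo + b) - lo);
    }

    public static void main(String[] args) {
        int n = 8, p0 = 2, p1 = 3;   // N = 8 on a 2 x 3 process grid
        for (int c0 = 0; c0 < p0; c0++)
            for (int c1 = 0; c1 < p1; c1++)
                System.out.println("process (" + c0 + "," + c1 + ") holds "
                        + localCount(c0, n, p0) + " x "
                        + localCount(c1, n, p1) + " elements");
    }
}
```

Under that assumption the first dimension splits 4 + 4 and the second splits 3 + 3 + 2, so the last column of processes holds narrower blocks.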

Distribution format

  • HPJava provides further distribution formats for dimensions of distributed arrays without further extensions to the syntax

  • Instead, the Range class hierarchy is extended

  • BlockRange, CyclicRange, IrregRange, Dimension

  • ExtBlockRange – a BlockRange distribution extended with ghost regions

  • CollapsedRange – a range that is not distributed, i.e. all elements of the range mapped to a single process
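
The difference between the distribution formats comes down to how a global index maps to a process coordinate and a local index. The sketch below shows three of the mappings in plain Java. The class `RangeMaps` and its methods are hypothetical, the block size ceil(n / p) is an assumption, and the collapsed case is modelled as everything living on coordinate 0.

```java
// Hypothetical index mappings; these are not the HPJava Range classes.
final class RangeMaps {
    // Block format: contiguous chunks of ceil(n / p) indices per process.
    // Returns { process coordinate, local index }.
    static int[] block(int g, int n, int p) {
        int b = (n + p - 1) / p;
        return new int[] { g / b, g % b };
    }

    // Cyclic format: indices dealt out round-robin.
    static int[] cyclic(int g, int p) {
        return new int[] { g % p, g / p };
    }

    // Collapsed format: not distributed, every index on one process.
    static int[] collapsed(int g) {
        return new int[] { 0, g };
    }

    public static void main(String[] args) {
        int n = 8, p = 3;
        for (int g = 0; g < n; g++) {
            int[] b = block(g, n, p);
            int[] c = cyclic(g, p);
            System.out.println("g=" + g
                    + "  block -> (" + b[0] + "," + b[1] + ")"
                    + "  cyclic -> (" + c[0] + "," + c[1] + ")");
        }
    }
}
```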


Overall constructs

overall (i = x for 1: N-2: 2)

a[i] = i` ;

  • Distributed parallel loop

  • i – a distributed index whose value is a symbolic location (not an integer value)

  • Index triplet represents a lower bound, an upper bound, and a step – all of which are integer expressions

  • With a few exceptions, the subscript of a distributed array must be a distributed index, and x should be the range of the subscripted array (a)

  • This restriction is an important feature, ensuring that referenced array elements are locally held


Array Sections

  • HPJava supports subarrays modeled on the array sections of Fortran 90

  • The new array section is a subset of the elements of the parent array

  • Triplet subscript

  • The rank of an array section is equal to the number of triplet subscripts

  • e.g. float [[-,-]] a = new float [[x, y]] ;

    float [[-]] b = a [[0, :]] ;

    float [[-,-]] u = a [[0 : N/2-1, 0 : N-1 : 2]] ;
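
One common way to realise a section as a subset of the parent's elements is a view that shares the parent's storage and records a base offset plus a stride and extent per dimension. The sketch below does this in plain Java for the rank-2 case only. `Section2D` is a hypothetical class, not the HPJava implementation, and it assumes positive steps and inclusive upper bounds as in Fortran 90. A scalar subscript such as the one in a [[0, :]] could be handled the same way by folding it into the base offset and dropping that dimension.

```java
// Hypothetical strided view; not the HPJava implementation of sections.
final class Section2D {
    final float[] data;      // shared with the parent array
    final int base;          // offset of this view's element (0, 0)
    final int ext0, ext1;    // extents of the two dimensions
    final int str0, str1;    // strides into the shared data

    Section2D(float[] data, int base, int ext0, int str0, int ext1, int str1) {
        this.data = data; this.base = base;
        this.ext0 = ext0; this.str0 = str0;
        this.ext1 = ext1; this.str1 = str1;
    }

    // A fresh n0 x n1 array in row-major order.
    static Section2D create(int n0, int n1) {
        return new Section2D(new float[n0 * n1], 0, n0, n1, n1, 1);
    }

    float get(int i, int j) { return data[base + i * str0 + j * str1]; }

    void set(int i, int j, float v) { data[base + i * str0 + j * str1] = v; }

    // Section [[lo0 : hi0 : st0, lo1 : hi1 : st1]]; bounds inclusive, steps > 0.
    Section2D section(int lo0, int hi0, int st0, int lo1, int hi1, int st1) {
        return new Section2D(data,
                base + lo0 * str0 + lo1 * str1,
                (hi0 - lo0) / st0 + 1, str0 * st0,
                (hi1 - lo1) / st1 + 1, str1 * st1);
    }

    public static void main(String[] args) {
        int n = 8;
        Section2D a = create(n, n);
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) a.set(i, j, 10 * i + j);

        // Counterpart of: u = a [[0 : N/2-1, 0 : N-1 : 2]]
        Section2D u = a.section(0, n / 2 - 1, 1, 0, n - 1, 2);
        System.out.println("u is " + u.ext0 + " x " + u.ext1);
        System.out.println("u[1,2] = " + u.get(1, 2) + ", a[1,4] = " + a.get(1, 4));
    }
}
```

Because the view shares `data`, a write through `u` is visible through `a`, which matches the statement that a section is a subset of the parent's elements.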


Distributed Array Type

  • Type signature of a distributed array

    T [[attr_0, …, attr_{R-1}]] bras

    where R is the rank of the array and each term attr_r is either a single hyphen, -, or a single asterisk, *; the term bras is a string of zero or more bracket pairs, []

  • T can be any Java type other than an array type. This signature represents the type of a distributed array whose elements have Java type

    T bras

  • A distributed array type is not treated as a class type


Basic Translation Scheme

  • The HPJava system is not exactly a high-level parallel programming language – more like a tool to assist programmers generate SPMD parallel code

  • This suggests the translations the system applies should be relatively simple and well-documented, so programmers can exploit the tool more effectively

    • We don’t expect the generated code to be human readable or modifiable, but at least the programmer should be able to work out what is going on

  • The HPJava specification defines the basic translation scheme as a series of schemas


Translation of a distributed array declaration

SOURCE:       T [[attr_0, …, attr_{R-1}]] a ;

TRANSLATION:  T [] a'dat ;
              ArrayBase a'bas ;
              DIMENSION_TYPE (attr_0) a'0 ;
              …
              DIMENSION_TYPE (attr_{R-1}) a'R-1 ;

where DIMENSION_TYPE (attr_r) ≡ ArrayDim if attr_r is a hyphen, or
      DIMENSION_TYPE (attr_r) ≡ SeqArrayDim if attr_r is an asterisk

For example, the declaration

    float [[-,*]] var ;

is translated to

    float [] var__$DS ;
    ArrayBase var__$bas ;
    ArrayDim var__$0 ;
    SeqArrayDim var__$1 ;


Translation of the overall construct

SOURCE:       overall (i = x for e_lo : e_hi : e_stp) S

TRANSLATION:  Block b = x.localBlock(T [e_lo], T [e_hi], T [e_stp]) ;
              Group p = apg.restrict(x.dim(), apg) ;
              for (int l = 0; l < b.count; l ++) {
                  int sub = b.sub_bas + b.sub_stp * l ;
                  int glb = b.glb_bas + b.glb_stp * l ;
                  T [S | p]
              }

where: i is an index name in the source program,
       x is a simple expression in the source program,
       e_lo, e_hi, and e_stp are expressions in the source,
       S is a statement in the source program, and
       b, p, l, sub and glb are names of new variables
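
To make the roles of count, sub_bas, sub_stp, glb_bas and glb_stp concrete, the translated loop can be emulated in plain Java on a single JVM, looping over the process coordinates in turn. This is a sketch only. `OverallSketch` and its `localBlock` method are hypothetical stand-ins for the run-time call in the schema, and they assume a block-distributed range with block size ceil(n / p), an inclusive upper bound, and a positive step.

```java
// Hypothetical single-JVM emulation; not the HPJava run-time API.
final class OverallSketch {
    // Mirrors the fields that the translation schema reads from Block.
    static final class Block {
        final int count, sub_bas, sub_stp, glb_bas, glb_stp;
        Block(int count, int sub_bas, int sub_stp, int glb_bas, int glb_stp) {
            this.count = count;
            this.sub_bas = sub_bas; this.sub_stp = sub_stp;
            this.glb_bas = glb_bas; this.glb_stp = glb_stp;
        }
    }

    // Iterations of lo : hi : stp that fall on process coordinate c when
    // n indices are block-distributed over p processes.
    static Block localBlock(int n, int p, int c, int lo, int hi, int stp) {
        int b = (n + p - 1) / p;                 // assumed block size
        int first = c * b;                       // first global index held here
        int last = Math.min(n, first + b) - 1;   // last global index held here
        int start = Math.max(lo, first);
        int rem = (start - lo) % stp;
        if (rem != 0) start += stp - rem;        // round up onto lo, lo+stp, ...
        int end = Math.min(hi, last);
        int count = start > end ? 0 : (end - start) / stp + 1;
        return new Block(count, start - first, stp, start, stp);
    }

    public static void main(String[] args) {
        int n = 8, p = 3;
        // Emulates: overall (i = x for 1 : N-2 : 2) a [i] = i` ;
        for (int c = 0; c < p; c++) {
            int[] local = new int[(n + p - 1) / p];   // this process's part of a
            Block b = localBlock(n, p, c, 1, n - 2, 2);
            for (int l = 0; l < b.count; l++) {
                int sub = b.sub_bas + b.sub_stp * l;  // local subscript
                int glb = b.glb_bas + b.glb_stp * l;  // global index value
                local[sub] = glb;
                System.out.println("process " + c + ": local[" + sub + "] = " + glb);
            }
        }
    }
}
```

Each process visits only the iterations whose elements it holds, which is why the subscripting restriction on distributed indexes guarantees local access.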



  • Based on the observations for parallel algorithms such as Laplace equation using red-black iterations, distributed array element accesses are generally located in inner overall loops.

  • The problem is the complexity of the associated terms in the subscript expression of a distributed array element access. Candidate optimizations:

    • Strength Reduction - introducing the induction variables

    • Loop-unrolling - hoisting the run-time support classes

    • Common-subexpression elimination

  • The novelty is in adapting these optimizations to make HPspmd practical


Example of Optimization

  • Here we only consider strength reduction optimizations on the index expression

  • Consider the nested overall and loop constructs

    overall (i=x for :)
        overall (j=y for :) {
            float sum = 0 ;
            for (int k=0; k<N; k++)
                sum += a [i, k] * b [k, j] ;
            c [i, j] = sum ;
        }



A correct but naive translation

Block bi = x.localBlock() ;
for (int lx = 0; lx < bi.count; lx ++) {
    Block bj = y.localBlock() ;
    for (int ly = 0; ly < bj.count; ly ++) {
        float sum = 0 ;
        for (int k = 0; k < N; k ++)
            sum += a.dat() [a.bas() + (bi.sub_bas + bi.sub_stp * lx) * a.str(0) +
                            k * a.str(1)] *
                   b.dat() [b.bas() + (bj.sub_bas + bj.sub_stp * ly) * b.str(1) +
                            k * b.str(0)] ;
        c.dat() [c.bas() + (bi.sub_bas + bi.sub_stp * lx) * c.str(0) +
                 (bj.sub_bas + bj.sub_stp * ly) * c.str(1)] = sum ;
    }
}




Strength-Reduction Optimization

  • The problem is the complexity of the associated terms in the subscript expressions

  • The subscript expressions can be greatly simplified by application of strength-reduction optimization

  • Eliminate complicated expressions involving multiplication from inner loops by introducing induction variables

  • These can be computed efficiently by incrementing them at suitable points by the induction increments
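
The effect of strength reduction on the matrix multiplication loop can be sketched in plain Java on flat arrays. This is an illustration, not the translator's output: the class `StrengthReduction`, the row-major strides, and the induction variable names (`row`, `indB1`, `indC`, `indA`, `indB`) are all assumptions introduced here. The first method recomputes the full subscript on every access, and the second replaces each multiplication by an addition on an induction variable.

```java
import java.util.Arrays;

// Hypothetical illustration; variable names and layout are assumptions.
final class StrengthReduction {
    // Naive subscripting: the whole index expression is recomputed per access.
    static float[] naive(float[] a, float[] b, int n) {
        float[] c = new float[n * n];
        int str0 = n, str1 = 1;                  // assumed row-major strides
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) {
                float sum = 0;
                for (int k = 0; k < n; k++)
                    sum += a[i * str0 + k * str1] * b[k * str0 + j * str1];
                c[i * str0 + j * str1] = sum;
            }
        return c;
    }

    // Strength-reduced: multiplications become increments of induction variables.
    static float[] reduced(float[] a, float[] b, int n) {
        float[] c = new float[n * n];
        int str0 = n, str1 = 1;
        int row = 0;                             // tracks i * str0
        for (int i = 0; i < n; i++, row += str0) {
            int indB1 = 0;                       // tracks j * str1
            int indC = row;                      // tracks i * str0 + j * str1
            for (int j = 0; j < n; j++, indB1 += str1, indC += str1) {
                float sum = 0;
                int indA = row;                  // tracks i * str0 + k * str1
                int indB = indB1;                // tracks k * str0 + j * str1
                for (int k = 0; k < n; k++, indA += str1, indB += str0)
                    sum += a[indA] * b[indB];
                c[indC] = sum;
            }
        }
        return c;
    }

    public static void main(String[] args) {
        int n = 4;
        float[] a = new float[n * n], b = new float[n * n];
        for (int q = 0; q < n * n; q++) { a[q] = q; b[q] = n * n - q; }
        System.out.println("same result: "
                + Arrays.equals(naive(a, b, n), reduced(a, b, n)));
    }
}
```

Both methods perform the floating-point operations in the same order, so the results are expected to match exactly; only the integer index arithmetic differs.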


Why benchmark?

  • Before adapting optimization strategies in HPJava translator, need to benchmark hand-coded optimizations

  • Need to prove distributed arrays in Java don’t introduce unacceptable overhead



  • Benchmarked on Red Hat Linux 7.2 (Pentium 4, 1.5 GHz)

  • Linpack, matrix multiplication, Laplace equation using red-black relaxation

  • IBM Developer Kit 1.3 (JIT)

  • Compared Java and HPJava with GNU cc and Fortran77


Comparison of base languages

  • daxpy() kernel in Linpack

  • N = 200, iter = 100000 with Maximal Optimization
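
For readers unfamiliar with the kernel, daxpy computes y = y + alpha * x over vectors of doubles. The sketch below is a plain Java version with a simple timing loop using the slide's N and iteration count. It is not the benchmark code behind the results, and the class name `Daxpy`, the value of alpha, and the timing method are assumptions.

```java
// Hypothetical sketch of the kernel; not the benchmark code used in the talk.
final class Daxpy {
    // y[i] = y[i] + alpha * x[i] for the first n elements.
    static void daxpy(int n, double alpha, double[] x, double[] y) {
        if (alpha == 0.0) return;
        for (int i = 0; i < n; i++) y[i] += alpha * x[i];
    }

    public static void main(String[] args) {
        int n = 200, iter = 100000;      // sizes taken from the slide
        double[] x = new double[n], y = new double[n];
        for (int i = 0; i < n; i++) { x[i] = i; y[i] = 1.0; }

        long t0 = System.nanoTime();
        for (int it = 0; it < iter; it++) daxpy(n, 1e-6, x, y);
        long t1 = System.nanoTime();

        System.out.println("elapsed ms: " + (t1 - t0) / 1e6);
        System.out.println("y[n-1] = " + y[n - 1]);   // keeps the loop observable
    }
}
```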


HPJava: Matrix Multiplication

  • N = 100, iter = 100 with Maximal Optimization

  • HPJava uses a single processor


Laplace Equation using red-black relaxation

  • N = 500, count = 100 with Maximal Optimization
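
Red-black relaxation colours the grid like a chessboard and updates all points of one colour before the other, so every update within a sweep depends only on points of the opposite colour. The sketch below is a sequential plain Java version using the slide's N and count. It is not the benchmarked HPJava code, and the class name `RedBlackLaplace` and the boundary condition (top edge held at 1.0, other edges at 0.0) are assumptions.

```java
// Hypothetical sequential sketch; not the benchmarked HPJava program.
final class RedBlackLaplace {
    // Update interior points whose (i + j) parity equals parity (0 = red, 1 = black).
    static void sweep(double[][] u, int parity) {
        int n = u.length;
        for (int i = 1; i < n - 1; i++)
            for (int j = 1 + ((i + 1 + parity) % 2); j < n - 1; j += 2)
                u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                + u[i][j - 1] + u[i][j + 1]);
    }

    // Run count red-black iterations on an n x n grid.
    static double[][] solve(int n, int count) {
        double[][] u = new double[n][n];
        for (int j = 0; j < n; j++) u[0][j] = 1.0;   // assumed boundary condition
        for (int it = 0; it < count; it++) {
            sweep(u, 0);
            sweep(u, 1);
        }
        return u;
    }

    public static void main(String[] args) {
        double[][] u = solve(500, 100);              // sizes taken from the slide
        System.out.println("u[1][250] = " + u[1][250]);
    }
}
```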


Benchmark results

  • Naïve HPJava is slow because it must allow for distributed arrays, which complicates subscripting

  • Practical optimizations can remove this overhead

  • HPJava results are for a single processor – expected to scale with multiple processors

  • Java is quite competitive with other languages


Fortran is sometimes slower than C?

  • Could say the performance of Fortran and C is the same

  • But it depends on the compilers

  • The GNU Fortran 77 compiler generates more machine code than the GNU cc compiler does for the main loop in Linpack


Author’s Contributions to HPJava

  • Developing and maintaining the HPJava front-end and back-end environments at NPAC, CSIT, and Pervasive Technology Labs.

  • Translator, Type-Checker, and Type-Analyzer of HPJava.

  • Some of his early work at NPAC:

    • Unparser and Abstract Expression Node generator, and original implementation of the JNI interfaces of the run-time communication library, Adlib.


Current Status of HPJava

  • Collaborated with Bryan Carpenter, Geoffrey Fox, Guansong Zhang, Sang Lim and Zheng Qiang

  • The first fully functional HPJava translator (written in Java) is now operational

  • Parser – JavaCC and JTB tools

  • Has been tested and debugged against a small test suite and an 800-line multigrid code


Future Work

  • Efficient Compilation of the HPJava Language for HPC

    • optimizations of HPJava

  • The main thrust of the proposed work will be to explore the effectiveness of optimizations in the HPspmd translator

  • First, need to know which optimization strategies should be applied, by experimenting with hand-coded optimizations in HPJava, and need to benchmark on parallel machines such as the SP3

  • Next, develop the optimized HPJava translator, test codes and applications over next few months

  • Will continue to investigate which optimization strategies are most critical in a wider range of applications in HPspmd compilers


Publications and Plans

  • Han-Ku Lee, Bryan Carpenter, Geoffrey Fox, Sang Boem Lim. Benchmarking HPJava: Prospects for Performance. Feb 8, 2002. Submitted to Sixth Workshop on Languages, Compilers, and Run-time Systems for Scalable Computers (LCR2002). http://motefs.cs.umd.edu/lcr02/

  • Bryan Carpenter, Geoffrey Fox, Han-Ku Lee, and Sang Lim. Node Performance in the HPJava Parallel Programming Language. Feb, 2002. The 16th Annual ACM International Conference on Super Computing (ICS2001). http://www.lcpcworkshop.org/LCPC2001/

  • Bryan Carpenter, Geoffrey Fox, Han-Ku Lee, and Sang Lim. Translation of the HPJava Language for Parallel Programming. May 31, 2001. The 14th annual workshop on Languages and Compilers for Parallel Computing (LCPC2001). http://www.lcpcworkshop.org/LCPC2001/

  • Bryan Carpenter, Guansong Zhang, Han-Ku Lee, and Sang Lim. Parallel Programming in HPJava. Draft of May 2001. http://aspen.csit.fsu.edu/pss/HPJava/

Conclusions


  • Reviewed data-parallel languages such as HPF

  • Introduced HPspmd programming language model – SPMD framework for using libraries based on distributed arrays

    • Specific syntax, new control constructs, basic translation schemes, and basic optimization strategies for HPJava

  • Proposed work:

    • Efficient Compilation of the HPJava Language for HPC

Acknowledgements


  • This work was supported in part by the National Science Foundation (NSF) Division of Advanced Computational Infrastructure and Research

  • Contract number – 9872125
