The Berkeley UPC Compiler




  1. The Berkeley UPC Compiler. Wei Chen, the LBNL/Berkeley UPC Group

  2. Unified Parallel C (UPC)
     • UPC is a parallel extension to C for scientific computing
       • Distributed arrays, shared pointers, parallel loops, and a strict/relaxed memory model
     • Global address space abstraction
     • SPMD parallelism
     • Vendor compilers exist for several machines
       • HP AlphaServer, Cray, Sun, SGI
     • Open-source compiler developed by LBNL/UCB (beta release 3/31)

  3. Overview of Berkeley UPC Compiler
     • Two goals: portability and high performance
     • [Diagram: layer stack, top to bottom]
       • UPC Code
       • Translator (Open64 based)
       • Translator-Generated C Code
       • Berkeley UPC Runtime System
       • GASNet Communication System
       • Network Hardware
     • [Diagram annotations on the layers: platform-independent, network-independent, compiler-independent, language-independent]

  4. Implementing the UPC to C Translator
     • Source-to-source translation
     • Ported to gcc 3.2 (done by Rice Open64)
     • Supports both 32- and 64-bit platforms
     • Designed to incorporate the existing optimization framework (currently not enabled)
     • Communicates with the runtime via a standard API and configuration files
     • [Diagram: translation pipeline] Preprocessed File → UPC front end → VH Whirl w/ shared types → Backend lowering → High Whirl w/ runtime calls → Whirl2c → ANSI-compliant C Code

  5. Components in the Translator
     • Front end:
       • UPC extensions to C: shared qualifier, block size, forall loops, builtin functions and values (blocksizeof, localsizeof, etc.), strict/relaxed
       • Parses and type-checks UPC code, then generates Whirl, with UPC-specific information available in the symbol table
     • Backend:
       • Transforms shared reads and writes into calls to the runtime library (after LNO, on H Whirl)
       • Calls can be blocking, non-blocking, bulk, or register-based
     • Whirl2c:
       • Shared variables are declared as opaque pointers-to-shared
       • Static shared variables are allocated and initialized dynamically
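The backend step on this slide, turning shared reads and writes into runtime calls on an opaque pointer-to-shared, can be pictured with a small single-process simulation in plain C. Everything named `sim_*` below is hypothetical and invented for illustration. It is not the Berkeley UPC runtime API, and the thread count, segment size, and cyclic layout are assumptions.

```c
#include <assert.h>
#include <stddef.h>

enum { SIM_THREADS = 4, SIM_LOCAL_ELEMS = 8 };

/* Simulated per-thread memory, one row per thread. In a real system
   these segments live in separate address spaces. */
static int sim_segment[SIM_THREADS][SIM_LOCAL_ELEMS];

/* Hypothetical opaque pointer-to-shared: owning thread plus local offset. */
typedef struct {
    size_t thread;
    size_t offset;
} sim_pshared_t;

/* Address of element i of a cyclically distributed `shared int a[N]`. */
sim_pshared_t sim_index_cyclic(size_t i)
{
    sim_pshared_t p = { i % SIM_THREADS, i / SIM_THREADS };
    assert(p.offset < SIM_LOCAL_ELEMS);
    return p;
}

/* Hypothetical blocking read and write entry points. */
int sim_get_int(sim_pshared_t p)
{
    return sim_segment[p.thread][p.offset];
}

void sim_put_int(sim_pshared_t p, int value)
{
    sim_segment[p.thread][p.offset] = value;
}

/* UPC source statement:   a[i] = a[j] + 1;
   Lowered form: the shared read becomes a get call, the result is
   spilled to a private temporary, and the shared write becomes a put. */
void lowered_increment_copy(size_t i, size_t j)
{
    int tmp = sim_get_int(sim_index_cyclic(j));
    sim_put_int(sim_index_cyclic(i), tmp + 1);
}
```

Keeping the pointer-to-shared opaque means the generated C code never needs to know how a particular runtime or network represents remote addresses, which fits the portability goal stated earlier in the talk.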

  6. Modifications
     • Symbol table
       • Add flags for shared, strict/relaxed, and block size in TY_TAB
     • Intrinsics
       • Each UPC runtime function is represented by a new intrinsic (about 100 of them)
     • Driver
       • Use sgiupc to compile UPC programs
       • New flags for passing the config file and the number of threads
     • C front end
       • Modify gccfe/gnu to parse UPC extensions, plus fixes for ANSI compliance
       • Modify gccfe to support upc_forall loops (transformed to WHILE_DO, marked by a pragma)
       • Name mangling for static variables

  7. Modifications II
     • Backend
       • Add new lowering phases for transforming shared accesses
       • Use some VH Whirl (e.g. comma to spill the return value)
       • Adjust field offsets for structs that have shared pointers (also in the front end, for sizeof)
       • Symbol table is not consistent until lowering finishes
       • Dynamic nesting of forall loops
     • Whirl2c
       • Various UPC-specific changes and bug fixes
       • Access thread-local data through macros
       • Dynamically allocate static user data

  8. Future Work
     • Add UPC-specific optimizations
       • Possibly as a new phase
       • Likely will use or modify PREOPT and LNO (alias analysis, dependence analysis, prefetching)
       • Want WOPT too: is it possible to extend whirl2c to work for M Whirl?
     • Coordination among releases
       • Our version has been merged with the Rice Open64 project
       • Would like to merge with either Open64 or ORC
       • One common CVS tree, with each team on different branches?

  9. The End

  10. UPC Programming Model Features
     • SPMD parallelism
       • Fixed number of images during execution
       • Images operate asynchronously
     • Several kinds of array distributions
       • double a[n]: a private array on each processor
       • shared double a[n]: a shared array, with cyclic mapping
       • shared [4] double a[n]: a block-cyclic array with 4-element blocks
       • shared [0] double *a = (shared [0] double *) upc_alloc(n * sizeof(double)): a shared array with all elements local
     • Pointers for irregular data structures
       • shared double *sp: a pointer to shared data
       • double *lp: a pointer to private data
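The cyclic and block-cyclic distributions above come down to simple index arithmetic. The following plain C sketch shows which thread an element has affinity to and where it sits in that thread's local portion. The function names `owner_thread` and `local_index` are made up for this illustration and are not part of UPC or its runtime. The thread count is passed as a parameter rather than read from UPC's THREADS.

```c
#include <assert.h>
#include <stddef.h>

/* Thread that element i of `shared [block] T a[n]` has affinity to when
   the program runs with `threads` threads. block == 1 corresponds to the
   default cyclic mapping of `shared T a[n]`. */
size_t owner_thread(size_t i, size_t block, size_t threads)
{
    assert(block > 0 && threads > 0);
    return (i / block) % threads;
}

/* Position of element i inside its owner's local portion: count the full
   rounds of blocks already dealt to that thread, then add the position
   inside the current block. */
size_t local_index(size_t i, size_t block, size_t threads)
{
    assert(block > 0 && threads > 0);
    return (i / (block * threads)) * block + i % block;
}
```

With `shared [4] double a[n]` on 3 threads, for example, elements 0 to 3 land on thread 0, elements 4 to 7 on thread 1, elements 8 to 11 on thread 2, and element 12 wraps back to thread 0 as its fifth local element.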

  11. Parallel Loops in UPC
     • UPC has a "forall" construct for distributing computation
     • Example: vector addition (the fourth expression, &v3[i], is the affinity expression)
         shared int v1[N], v2[N], v3[N];
         upc_forall (i = 0; i < N; i++; &v3[i]) {
             v3[i] = v2[i] + v1[i];
         }
     • Two kinds of affinity expressions:
       • Integer (compare with thread id)
       • Shared address (check the affinity of the address)
     • Affinity tests are performed on every iteration
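The two affinity tests can be written out as ordinary C to show what each thread does on every iteration. This is a minimal sketch of the semantics and not the code the translator emits. The function names are hypothetical, and the thread id and thread count are plain parameters standing in for UPC's MYTHREAD and THREADS.

```c
#include <assert.h>
#include <stddef.h>

/* Integer affinity: `upc_forall (...; i)` runs iteration i on the thread
   whose id equals i modulo the number of threads. */
int runs_with_int_affinity(size_t i, size_t mythread, size_t threads)
{
    assert(threads > 0);
    return i % threads == mythread;
}

/* Shared-address affinity: `upc_forall (...; &v[i])` runs iteration i on
   the thread that v[i] has affinity to. For `shared [block] T v[n]` that
   thread is (i / block) modulo the number of threads. */
int runs_with_addr_affinity(size_t i, size_t block,
                            size_t mythread, size_t threads)
{
    assert(block > 0 && threads > 0);
    return (i / block) % threads == mythread;
}

/* The vector addition as a single thread would execute it: the thread
   walks all n iterations and tests affinity each time. Arrays are assumed
   to use the default cyclic layout (block == 1). Returns the number of
   iterations this thread actually executed. */
size_t vector_add_as_thread(const int *v1, const int *v2, int *v3,
                            size_t n, size_t mythread, size_t threads)
{
    size_t executed = 0;
    for (size_t i = 0; i < n; i++) {
        if (!runs_with_addr_affinity(i, 1, mythread, threads))
            continue; /* another thread owns v3[i] */
        v3[i] = v2[i] + v1[i];
        executed++;
    }
    return executed;
}
```

Calling `vector_add_as_thread` once for each thread id fills every element of `v3` exactly once, which is the work split that upc_forall expresses. The per-iteration test inside the loop is the overhead the last bullet refers to.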
