
Understanding UPC: A Parallel Language for Programmer Control and Global Address Space

UPC is an explicitly parallel language that allows programmers to have control over layout and scheduling, with the ability to read and write remote memory. It is easier to use than MPI for programs with complicated data structures and may offer comparable performance on certain machines. This article provides an overview of UPC, its implementations, funding, compiler status, runtime systems, and applications.


Presentation Transcript


  1. UPC at CRD/LBNL. Kathy Yelick, Dan Bonachea, Jason Duell, Paul Hargrove, Parry Husbands, Costin Iancu, Mike Welcome, Christian Bell

  2. What is UPC? • UPC is an explicitly parallel language • Global address space; can read/write remote memory • Programmer control over layout and scheduling • From Split-C, AC, PCP • Why a new language? • Easier to use than MPI, especially for programs with complicated data structures • Possibly faster on some machines, but the current goal is comparable performance (Figure: global address space spanning threads p0, p1, p2)
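To make the model concrete, here is a minimal UPC sketch (my illustration, not code from the slides) of the global address space: a shared array is distributed across threads, each thread writes the elements it has affinity to, and thread 0 then reads the remote elements directly.

```c
/* Minimal UPC sketch of the global address space model.
 * Illustrative only; not taken from the presentation. */
#include <upc.h>
#include <stdio.h>

#define N 16

shared int data[N * THREADS];   /* default layout: cyclic across threads */

int main(void) {
    int i;

    /* Each thread initializes only the elements it owns. */
    upc_forall (i = 0; i < N * THREADS; i++; &data[i])
        data[i] = MYTHREAD * 1000 + i;

    upc_barrier;                /* make all writes globally visible */

    if (MYTHREAD == 0) {        /* thread 0 reads remote memory directly */
        long sum = 0;
        for (i = 0; i < N * THREADS; i++)
            sum += data[i];
        printf("sum = %ld over %d threads\n", sum, THREADS);
    }
    return 0;
}
```

With the default cyclic layout, data[i] has affinity to thread i % THREADS, so the upc_forall body executes on the owning thread, and the reads in the final loop are a mix of local and remote accesses; there is no explicit message passing anywhere.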

  3. Background • UPC efforts elsewhere • IDA: Bill Carlson, UPC promoter • GMU (documentation) and UMC (benchmarking) • HP (Alpha cluster and C+MPI compiler, with MTU) • Cray (implementations) • Intrepid (SGI and T3E compiler) • UPC Book: • T. El-Ghazawi, B. Carlson, T. Sterling, K. Yelick • 3 chapters in draft form; goal is to have proofs by SC03 • Three components of the NERSC effort • Compilers for DOE machines (SP and PC clusters) • Runtime systems for our compiler and for others • Applications and benchmarks

  4. UPC Funding • Base program funding K52004 • Compiler/translator work • Applications • Runtime for DOE machines • Part of Pmodels Center K52018 • Runtime support common to Titanium (and hopefully CoArray Fortran, at some point) • Collaboration with ARMCI group • NSA funding • UPC for “clusters”

  5. Compiler Status • NERSC compiler/translator • Costin Iancu and Wei Chen • Translates UPC to C + “Berkeley UPC Runtime” • Based on Open64 compiler for C • Status • Complete in prototype form • Debugging, tuning, extensions ongoing • Release planned for next month: • Quadrics, Myrinet, IBM/SP, and MPI • Shared memory/process implementation is next • Investigating optimization opportunities • Communication optimizations • UPC language optimizations

  6. UPC Compiler • Compiler based on Open64 • Multiple front-ends, including gcc • Intermediate form called WHIRL • Leverage standard optimizations and analyses • Pointer analysis • Loop optimizations • Current focus on C backend • IA64 possible in future • UPC Runtime built on GASNet • Portable • Language-independent (Figure: compilation flow from UPC source to higher WHIRL, through optimizing transformations to lower WHIRL, then to either C + Runtime or assembly for IA64, MIPS, etc. + Runtime)

  7. Portable Runtime Support • Developing a runtime layer that can be easily ported and tuned to multiple architectures • Direct implementations of parts of the full GASNet runtime: global pointers (an opaque type with a rich set of pointer operations), memory management, job startup, etc. • Generic support for UPC, CAF, and Titanium • GASNet Extended API: supports put, get, locks, barriers, bulk transfers, scatter/gather • GASNet Core API: small interface based on "Active Messages"; the core alone is sufficient for a functional implementation • GASNet released 1/03
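The put/get operations of the extended API are what ultimately carry UPC's bulk transfers. As a rough sketch (mine, not from the slides), the standard upc_memput/upc_memget library calls move contiguous blocks between private and shared memory in exactly the one-sided style this layer provides.

```c
/* Illustrative UPC bulk transfers; calls like these are what a
 * GASNet-style put/get layer ultimately services. Not from the slides. */
#include <upc.h>

#define CHUNK 1024

/* One CHUNK-sized block of the array has affinity to each thread. */
shared [CHUNK] double buf[CHUNK * THREADS];

int main(void) {
    double local[CHUNK];
    int i;

    /* Fill a private buffer, then push it into our own shared block. */
    for (i = 0; i < CHUNK; i++)
        local[i] = (double)MYTHREAD;
    upc_memput(&buf[MYTHREAD * CHUNK], local, CHUNK * sizeof(double));

    upc_barrier;

    /* Pull the next thread's block with a single bulk get. */
    int peer = (MYTHREAD + 1) % THREADS;
    upc_memget(local, &buf[peer * CHUNK], CHUNK * sizeof(double));

    upc_barrier;
    return 0;
}
```

This matches the layering on the slide: the Active Message core alone is enough for a functional port, while the extended API lets networks with native put/get support service bulk transfers like these directly.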

  8. Communication Optimizations • Characterizing performance of current machines • Latency, overlap (communication & computation) • Plan to optimize automatically using a communication performance model • Preliminary results: 10x improvement on Matmul
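One language-level source of communication/computation overlap is UPC's split-phase barrier. The sketch below (illustrative, not the optimization work referred to above) hides purely local computation between upc_notify and upc_wait.

```c
/* Split-phase barrier: do independent local work between notify and wait,
 * overlapping it with other threads' synchronization and communication.
 * Illustrative sketch, not code from the presentation. */
#include <upc.h>

#define N 1024

shared [N] double grid[N * THREADS];   /* one block per thread */
double scratch[N];                     /* private, per-thread   */

int main(void) {
    int i;

    /* Phase 1: update the block we own. */
    for (i = 0; i < N; i++)
        grid[MYTHREAD * N + i] = 1.0;

    upc_notify;        /* signal: our updates are done */

    /* Independent local work that needs no remote data proceeds
     * while slower threads catch up. */
    for (i = 0; i < N; i++)
        scratch[i] = i * 0.5;

    upc_wait;          /* now the other threads' blocks are safe to read */

    double first_remote = grid[((MYTHREAD + 1) % THREADS) * N];
    (void)first_remote;
    return 0;
}
```

Only after upc_wait is it safe to read the blocks other threads wrote; the work on the private scratch array is free overlap in the meantime.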

  9. Performance without Communication

  10. Preliminary Parallel Performance

  11. Costs of Pointer-to-Shared Arithmetic – Berkeley vs. HP • HP is faster for most operations, since HP generates assembly code • Both compilers optimize for "phaseless" pointers • For some operations, Berkeley can beat HP (e.g., pointer comparison) • Expect the gap to narrow once the proper optimizations are built into Berkeley UPC
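For context (an illustrative sketch, not from the slides): "phaseless" pointers-to-shared are those with block size 1 (the default cyclic layout) or an indefinite block size, so their phase is always zero and the compiler never has to track a position within a block. Blocked layouts carry that extra phase field, which is what makes their pointer arithmetic more expensive.

```c
/* Pointer-to-shared flavors and why "phaseless" arithmetic is cheaper.
 * Illustrative sketch of standard UPC declarations, not presentation code. */
#include <upc.h>

shared int      cyc[100 * THREADS];   /* default block size 1: cyclic (phaseless) */
shared [4] int  blk[100 * THREADS];   /* block size 4: blocked layout             */
shared [] int  *ind;                  /* indefinite block size (also phaseless)   */

int main(void) {
    shared int     *p = &cyc[0];   /* phase is always 0                       */
    shared [4] int *q = &blk[0];   /* carries (thread, phase, local address)  */

    /* Advancing p cycles through threads with phase fixed at 0; advancing q
     * must also track its phase within each 4-element block and wrap to the
     * next thread when the block is exhausted. */
    p += 7;
    q += 7;

    return 0;
}
```

This is why both compilers special-case the phaseless forms: incrementing p needs only the thread and address fields, while q must keep thread, phase, and address consistent on every step.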

  12. Applications • NAS Parallel Benchmark Sized Apps • UPC MG complete • UPC CG complete • UPC GUPS • GWU has done IS, EP, and FT • Planning on • Several Splash benchmarks • Sparse Cholesky • Possibly AMR

  13. Mesh Generation • Parallel Mesh Generation in UPC • 2D Delaunay triangulation • Based on Triangle software by Shewchuk (UCB) • Parallel version from NERSC uses dynamic load balancing, software caching, and parallel sorting

  14. Summary • Lots of progress on • Compiler • Runtime • Portable communication layer (GASNet) • Applications • Working on developing a large application that depends on UPC • Mesh generation • AMR (?), Sparse LU (?)

  15. Future Plans • Runtime support for Intrepid • Gcc-based open source compiler • Performance tuning of runtime • Additional machines (Infiniband, X1, Dolphin) • Optimization of compiled code • Communication optimizations • Automatic search-based optimizations • Application efforts
