
Presentation Transcript


1. Programming Languages/Models and Compiler Technologies
Moderator: John Mellor-Crummey, Department of Computer Science, Rice University
Microsoft Manycore Workshop, June 21, 2007

2. Panelists
• David August - Princeton University
• Saman Amarasinghe - Massachusetts Institute of Technology
• Guy Blelloch - Carnegie Mellon University
• Charles Leiserson - Massachusetts Institute of Technology
• Uzi Vishkin - University of Maryland, College Park

3. Architectural Challenges
• Significant parallelism
• Multiple kinds of parallelism (see the sketch below)
  • cores
  • ILP
  • SIMD
• Diversity of cores
• Run-time throttling of cores for power management
• Memory hierarchy
  • bandwidth
    • near term: will continue to be a significant bottleneck
    • long term: 3D stacked memory?
  • long and often non-uniform memory latencies
  • scratch pads
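
To make "multiple kinds of parallelism" concrete, here is a minimal C++ sketch (not from the talk) that layers two of the levels listed above: core-level parallelism via std::thread and SIMD parallelism left to the compiler's auto-vectorizer on the stride-1 inner loop. The function names, sizes, and thread count are illustrative assumptions.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Stride-1 loop: amenable to compiler auto-vectorization (SIMD level).
double partial_dot(const float* a, const float* b, std::size_t n) {
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        sum += double(a[i]) * b[i];
    return sum;
}

// One thread per core: the core-level parallelism.
double dot(const std::vector<float>& a, const std::vector<float>& b) {
    const unsigned p = std::max(1u, std::thread::hardware_concurrency());
    std::vector<double> partial(p, 0.0);
    std::vector<std::thread> workers;
    const std::size_t chunk = a.size() / p;
    for (unsigned t = 0; t < p; ++t) {
        std::size_t lo = t * chunk;
        std::size_t hi = (t + 1 == p) ? a.size() : lo + chunk;
        workers.emplace_back([&, lo, hi, t] {
            partial[t] = partial_dot(a.data() + lo, b.data() + lo, hi - lo);
        });
    }
    for (auto& w : workers) w.join();
    return std::accumulate(partial.begin(), partial.end(), 0.0);
}
```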

4. Roles of Parallel Programming Models
• Enhance programmer productivity through abstraction
• Manage platform resources to deliver performance
• Provide standard interface for platform portability

5. The Goal
Simpler ways of conceptualizing, expressing, debugging, and tuning scalable parallel programs
• Multiple models will be necessary
• Models will necessarily trade off simplicity, expressivity, relevance to legacy code, and performance

6. To Succeed, Parallel Programming Models Must …
• Be ubiquitous
  • cross platform
  • at a minimum: laptops, SMP servers
  • distributed memory clusters?
• Be expressive
• Be productive
  • easy to write
  • easy to read and maintain
  • easy to reuse
• Have a promise of future availability and longevity
• Be efficient
• Be supported by tools

7. Simplifying Parallel Programming
A high-level parallel language should …
• Provide a global address space
  • beware exposed buffering …
• Separate concerns: partitioning, mapping, and synchronization vs. algorithm specification
  • “viscosity” comes from premature mingling of these issues
• Enable the programmer to manage locality at a high level (see the sketch below)
  • locality = performance
  • affinity between data and computation
  • e.g., HPF’s “ON HOME” declarations
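
As a rough illustration of the affinity point, and loosely analogous to HPF's ON HOME, the following C++ sketch applies the owner-computes rule: each worker updates only the block of the array it logically owns, keeping computation next to its data. The thread count and block distribution are assumptions made for illustration.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

int main() {
    const unsigned p = 4;                      // assumed number of workers
    std::vector<double> x(1 << 20, 1.0);
    const std::size_t block = x.size() / p;    // block-distributed "ownership"

    std::vector<std::thread> owners;
    for (unsigned t = 0; t < p; ++t) {
        owners.emplace_back([&, t] {
            // Owner computes: only thread t touches x[t*block .. (t+1)*block),
            // in the spirit of an ON HOME declaration.
            for (std::size_t i = t * block; i < (t + 1) * block; ++i)
                x[i] = 2.0 * x[i] + 1.0;
        });
    }
    for (auto& o : owners) o.join();
}
```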

8. Design Issues I
• Ultimate control vs. simplicity of use
  • “library developers” vs. “productivity users”
  • should it be the same language for both?
  • extensible language model (Sun’s Fortress)
  • kitchen-sink model (X10)
• Implicit vs. explicit parallelism
  • implicit parallelism is often more malleable
  • better supports dynamic adaptation
• Compiler-assisted vs. compiler-centric
  • Co-array Fortran and UPC: user control over work decomposition, data movement, and synchronization
  • HPF: compiler must deliver or all is lost
• Lazy vs. eager parallelism (see the sketch below)
  • Cilk’s lazy parallelism provides a model for “scalable” binaries
  • eager parallelism adds unnecessary overhead
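
The lazy-vs.-eager trade-off can be sketched on the usual recursive Fibonacci example. This is plain C++ with std::async, not Cilk: Cilk's laziness actually comes from work stealing, so the serial cutoff below is only a crude stand-in for the idea that unharvested parallelism should cost nothing. The cutoff value is an illustrative assumption.

```cpp
#include <cstdint>
#include <future>

// Eager: spawn a task for every recursive call, paying spawn overhead at
// each level (and risking OS-thread exhaustion for even modest n).
std::uint64_t fib_eager(int n) {
    if (n < 2) return n;
    auto left = std::async(std::launch::async, fib_eager, n - 1);
    return fib_eager(n - 2) + left.get();
}

std::uint64_t fib_serial(int n) {
    return n < 2 ? n : fib_serial(n - 1) + fib_serial(n - 2);
}

// "Lazy": below a cutoff, fall back to plain serial code so small
// subproblems pay no parallel overhead at all.
std::uint64_t fib_lazy(int n, int cutoff) {
    if (n < cutoff) return fib_serial(n);
    auto left = std::async(std::launch::async, fib_lazy, n - 1, cutoff);
    return fib_lazy(n - 2, cutoff) + left.get();
}
```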

9. Design Issues II
• Deterministic vs. non-deterministic models
  • deterministic “clocked final model”: Saraswat et al. (www.saraswat.org/cf.pdf)
• Static vs. dynamic scheduling (see the sketch below)
  • dynamic scheduling will be increasingly important: irregular computations, task parallelism
  • adaptive scheduling in response to “core throttling”
• Cooperative vs. independent scheduling of work
  • does the benefit of a shared cache outweigh the difficulty of using it?
  • tightly synchronous vs. more loosely synchronous
• Scalable to distributed-memory ensembles?
  • the broad community probably only cares about tightly-coupled platforms
  • some government and industry clients will always have extreme needs
• Importance of managing affinity between cores and data
  • important for highest efficiency for library developers
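
A minimal C++ sketch of the dynamic-scheduling point: instead of a static partition, threads self-schedule by pulling the next iteration index off a shared atomic counter, so variable-cost tasks balance automatically. The work() body is a stand-in for an irregular computation, not anything from the talk.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Stand-in for an irregular, variable-cost task body.
void work(int i) {
    volatile long s = 0;
    for (long k = 0; k < 1000L * (i % 7 + 1); ++k) s = s + k;
}

void run_dynamic(int n_tasks, unsigned n_threads) {
    std::atomic<int> next{0};
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < n_threads; ++t) {
        pool.emplace_back([&] {
            // Self-scheduling: grab the next unclaimed iteration; threads
            // that draw cheap tasks simply come back for more.
            for (int i = next.fetch_add(1); i < n_tasks; i = next.fetch_add(1))
                work(i);
        });
    }
    for (auto& th : pool) th.join();
}
```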

10. Transactions are not “THE” Answer
• Transactions are a piece of the puzzle: atomicity (sketched below)
• Other aspects of the parallel programming problem:
  • identifying concurrency
  • partitioning work
  • ordering actions
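
As a small C++ illustration of the slide's point: the compare-and-swap retry loop below gives a one-word "transaction" that commits atomically, yet it says nothing about the rest of the problem. The programmer still had to identify the concurrency, partition the work into threads, and decide the ordering. The counts are illustrative.

```cpp
#include <atomic>
#include <thread>
#include <vector>

std::atomic<double> total{0.0};

// A one-word "transaction": retry the compare-and-swap until the update
// commits atomically.
void add_atomically(double v) {
    double old = total.load();
    while (!total.compare_exchange_weak(old, old + v)) {
        // `old` was reloaded with the current value; just retry.
    }
}

int main() {
    std::vector<std::thread> ts;
    for (int t = 0; t < 4; ++t)
        ts.emplace_back([] {
            for (int i = 0; i < 100000; ++i) add_atomically(1.0);
        });
    for (auto& th : ts) th.join();
    // total is now exactly 400000.0: atomicity handled. But identifying the
    // concurrency, partitioning the work, and ordering actions were still
    // the programmer's job.
}
```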

11. Autotuning
• Seductive idea
• Very successful as a library-based approach (a toy version is sketched below)
  • FFTW, ATLAS, OSKI, …
• Much work needed to apply to applications rather than kernels
  • huge search space
  • progress in effective truncated search
  • model guidance can be effective
  • autotuning for parallelism: dangerously close to automatic parallelization
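
A toy C++ version of the library-style autotuning idea, in the spirit of FFTW/ATLAS but not their actual code: empirically time one kernel over a hand-truncated set of candidate tile sizes and keep the best. A real autotuner would search a far larger space, repeat timings, and possibly use model guidance to prune it.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <vector>

// A kernel whose performance depends on a tunable tile size.
void kernel(std::vector<double>& a, std::size_t tile) {
    for (std::size_t i = 0; i < a.size(); i += tile)
        for (std::size_t j = i; j < std::min(i + tile, a.size()); ++j)
            a[j] = 2.0 * a[j] + 1.0;
}

// Empirical search over a hand-truncated candidate set; returns the tile
// size that timed best. A real tuner would repeat runs to discard cache
// warm-up effects and measurement noise.
std::size_t autotune(std::vector<double>& a) {
    std::size_t best = 0;
    double best_time = 1e300;
    for (std::size_t tile : {64u, 256u, 1024u, 4096u}) {
        auto t0 = std::chrono::steady_clock::now();
        kernel(a, tile);
        double t = std::chrono::duration<double>(
                       std::chrono::steady_clock::now() - t0).count();
        if (t < best_time) { best_time = t; best = tile; }
    }
    return best;
}
```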

12. Rice Experience: Lessons from HPF
• Good data and computation partitionings are essential
  • without good partitionings, parallelism suffers
  • flexible user control is essential
• Excess communication undermines scalability
  • both frequency and volume must be right
  • embrace user hints to guide communication placement and optimization
    • e.g., HPF/JA directives: REFLECT, LOCAL, PIPELINE, etc.
• Single-processor efficiency is critical (see the sketch below)
  • must use caches effectively on microprocessors
  • I-cache: beware of complex machine-generated code
  • D-cache: beware of communication footprint
• Optimizing tightly-coupled algorithms can be hard
  • if the compiler doesn’t optimize it, performance may be doomed!
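
To make the cache lesson concrete, here is a standard tiled matrix transpose in C++ (a generic illustration, not dHPF output): blocking keeps each tile's footprint within the D-cache. The tile size B is an assumption a real code would tune, e.g. with the autotuning approach above.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Tiled (cache-blocked) transpose of an n x n row-major matrix: each B x B
// tile's working set stays resident in the D-cache while it is processed.
void transpose_tiled(const std::vector<double>& in, std::vector<double>& out,
                     std::size_t n, std::size_t B) {
    for (std::size_t ii = 0; ii < n; ii += B)
        for (std::size_t jj = 0; jj < n; jj += B)
            for (std::size_t i = ii; i < std::min(ii + B, n); ++i)
                for (std::size_t j = jj; j < std::min(jj + B, n); ++j)
                    out[j * n + i] = in[i * n + j];
}
```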

13. Rice Experience: HPF vs. Co-array Fortran
• Rice dHPF: a decade of investment in compiler technology
  • not quite a decade: government funding was cut here too, just like in architecture
  • polyhedral code-generation models (like Lethin described)
• Co-array Fortran for clusters
  • a few years’ effort by a pair of students
• Result: Co-array Fortran bests HPF
  • more expressive
  • higher performance
  • shorter time to solution
  • currently, it can be HARDER to program than MPI

14. Principal Compiler and Runtime Challenges
• Exploiting multiple levels of heterogeneous parallelism
• Choreographing parallelism, data movement, synchronization
• Managing memory hierarchy
  • cache
  • scratch pad
Warning: Don’t try this at home.

15. Programming Model Ecosystem Issues
• Semantic mismatch between programming model and execution model
• Debugging: data races and non-determinism (a minimal example follows below)
• Performance analysis: why isn’t performance scaling?
  • insufficient parallelism
  • parallelism too fine-grain to be efficient
  • architecture-level issues, e.g., false sharing
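
The debugging bullet is easy to demonstrate: the C++ snippet below contains a textbook data race on an unprotected counter, so its output varies from run to run, which is precisely the non-determinism that makes such bugs hard to reproduce and diagnose.

```cpp
#include <iostream>
#include <thread>

int counter = 0;   // shared and unprotected: a data race waiting to happen

int main() {
    auto bump = [] {
        for (int i = 0; i < 1000000; ++i)
            ++counter;   // unsynchronized read-modify-write
    };
    std::thread t1(bump), t2(bump);
    t1.join();
    t2.join();
    std::cout << counter << "\n";   // rarely 2000000; varies run to run
}
```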

16. A Path Forward
• Kernel-, benchmark-, and application-driven studies
  • assess strengths and weaknesses of models
• Explore alternatives & evaluate their effects on:
  • simplicity
  • expressiveness
  • correctness
  • performance
