Software Enablement for Multicore Architectures

David Bernstein, Bilha Mendelson (Bernstn@il.ibm.com, bilha@il.ibm.com)


Presentation Transcript


  1. Software Enablement for Multicore Architectures
     David Bernstein, Bilha Mendelson (Bernstn@il.ibm.com, bilha@il.ibm.com)

  2. Technology Scaling – We’ve Hit The Wall
     [Figure: relative device performance (axis from 0.2 to 20) versus year (1988 to 2012). Curves: Conventional Bulk CMOS, SOI (silicon-on-insulator), High mobility, Double-Gate, followed by "?".]

  3. Has This Ever Happened Before?
     [Figure: power density (Watts / cm2, axis from 0 to 140) versus year (1950 to 2010) for Bipolar and CMOS machines, with "Vacuum" and "Start of Water Cooling" annotations. Labeled systems: IBM 360, IBM 370, IBM 3033, IBM 3081, Fujitsu M380, IBM 4381, CDC Cyber 205, IBM 3090, Fujitsu M-780, NTT, IBM 3090S, Fujitsu VP2000, IBM ES9000, Pentium II(DSIP), Merced, Apache, IBM RY4, IBM RY5, IBM RY6, IBM RY7, Pulsar, Pentium 4, IBM GP, "?". Source: Bernie Meyerson, IBM]

  4. Industry trends
     • Sun’s 8-Core Chips: T1 - Niagara
     • Intel Quad-Core
     • Cell Broadband Engine

  5. Hierarchy of Modular Building Blocks
     • Systems will increasingly need to implement a hybrid execution model
     • New programming systems need to reduce the need for programmer awareness of the topology on which their program executes
     • The next-generation programming system must support programming simplicity while leveraging the performance of the underlying HW topology
     • Grid/Cluster and Rack
       • Hierarchical SMP servers with non-uniform memory access (NUMA) characteristics
     • Board
       • Homogeneous SMP on board
       • 2–128 HW contexts on board
       • Main processor(s) with accelerator(s)
       • Master-slave relationship between entities
     • Chip
       • Heterogeneous collection of processors on chip
       • Heterogeneity at data and control flow level
       • Homogeneous SMP on chip
       • 2–32 HW contexts on chip
       • Various forms of resource sharing
     • Core
       • Will support multiple HW threads sharing a single cache, exhibiting SMP characteristics
     [Figure labels: High Speed Network, SMP Interconnect, Memory, I/O Attach, Cache, Interconnect Fabric, MemCtrl, Core]

  6. Architecture trends
     • Several processor cores on a chip and specialized computing engines
       • XML processing, cryptography, graphics
     • Questions:
       • how to interconnect a large number of processor cores
       • how to provide sufficient memory bandwidth
       • how to structure the multilevel caching subsystem
       • how to balance the general-purpose computing resources with specialized processing engines and all the supporting memory, caching, and interconnect structure, given a constant power budget
     • Software development processes
       • how to program for multicore architectures
       • how to test and evaluate the performance of multithreaded applications

  7. Programming multiprocessor systems
     • Two main directions:
       • explicit manual programming
       • exploit the combination of compiler optimization, build tool chains, and run-time subsystems
     • In HPC and embedded communities:
       • the emphasis was more on explicit manual programming and special resources by expert programmers
       • this resulted in numerous home-grown language directives and extensions, internal tools, and obscure run-time systems
       • these are hardly portable to new generations of hardware
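
The two directions on this slide can be contrasted with a minimal sketch. It is an illustration only, not code from the presentation: the function names (`explicit_sum`, `runtime_sum`, `split`, `sum_squares`) and the default of four workers are assumptions. The first version manages every thread by hand; the second hands scheduling to a run-time pool from the Python standard library.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def sum_squares(values):
    """The per-chunk work: sum of squares of a sequence of numbers."""
    return sum(v * v for v in values)


def split(data, parts):
    """Split data into `parts` nearly equal contiguous chunks."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def explicit_sum(data, workers=4):
    """Explicit style: the programmer creates, starts, and joins every thread."""
    chunks = split(data, workers)
    partial = [0] * workers  # one slot per thread, so no lock is needed

    def work(index):
        partial[index] = sum_squares(chunks[index])

    threads = [threading.Thread(target=work, args=(i,)) for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partial)


def runtime_sum(data, workers=4):
    """Runtime-assisted style: a pool schedules the chunks for us."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum_squares, split(data, workers)))
```

Both functions return the same value for the same input. In standard CPython builds the global interpreter lock keeps pure-Python CPU-bound threads from running in parallel, so the sketch shows the difference in programming style rather than a speedup.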

  8. Programming languages
     • Very few new languages were invented in the last two decades
       • Java - virtual machine, interpreter, JIT, garbage collection, set of libraries, etc.
     • Can multicore spur development of a new language/environment for parallelism?
       • map-reduce, Cilk, UPC, X10, and STAPL
       • programmers can provide additional information related to parallelism
     • Multicore provides multiple types of parallelism
       • thread-level parallelism (TLP) – coarse-grain
         • OpenMP - standard for shared-memory models
         • MPI - standard for distributed-memory models
         • pthreads, Java threads - used explicitly
         • automatic parallelization optimizations
           • Most of the original auto-parallelizing compilers focused on FORTRAN
       • data-level parallelism (DLP) – fine-grain
         • auto-vectorization, auto-simdification
     • What about asymmetric multicore architectures (like the Cell processor)?
       • is it possible to have a single-source compilation for multiple ISAs? - initial attempts…
       • how OpenMP can be used for programs - streaming
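
As a rough illustration of the map-reduce model named on this slide, the sketch below counts words with an independent map step per line and an associative reduce step. It uses only the Python standard library; the function names (`map_count`, `merge`, `word_count`) are hypothetical, and a thread pool stands in for whatever parallel runtime a real map-reduce system would provide.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_count(line):
    """Map step: count the words of one line independently of all others."""
    return Counter(line.lower().split())


def merge(left, right):
    """Reduce step: combine two partial counts (associative, so order-free)."""
    return left + right


def word_count(lines, workers=4):
    """Apply the map step to each line in a pool, then fold the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = pool.map(map_count, lines)
        return reduce(merge, partial_counts, Counter())
```

The programmer states only that the map step has no dependencies between lines and that the reduce step is associative. That is the kind of additional information about parallelism the slide refers to, and it leaves the scheduling to the runtime.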

  9. Performance Analysis Tools
     • Profile-based tools – data aggregation
       • FDPR-Pro, Code Analyzer, Diablo
     • Performance evaluation is heavily influenced by thread interaction
       • stalls, locks, races, memory thrashing, pollution of hardware counters
     • Trace-based analysis and visualization
       • introduces timeline views and data to deal with communication issues
       • lack of scalability:
         • traces tend to grow fast, making them difficult to manipulate and visualize
         • in the HPC context: selecting an arbitrary subset of cores/threads and arbitrary time intervals
       • tracing might disturb the program's behavior
       • HPCToolkit, TAU, Paraver, VTune, Code Analyzer, PDT, Trace Analyzer
     • Lack of determinism
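
The difference between profile-based aggregation and trace-based analysis can be sketched on a tiny synthetic trace. Everything here is assumed for illustration: the `Event` record, its fields, and the sample data do not come from any of the tools listed on the slide.

```python
from collections import defaultdict
from typing import NamedTuple


class Event(NamedTuple):
    time: float      # timestamp in seconds (hypothetical unit)
    thread: str      # name of the thread or core that emitted the event
    name: str        # what happened, e.g. "lock_wait"
    duration: float  # how long it took, in seconds


def profile(events):
    """Profile view: aggregate total duration per event name (timeline is lost)."""
    totals = defaultdict(float)
    for e in events:
        totals[e.name] += e.duration
    return dict(totals)


def select(events, threads, start, end):
    """Trace view: keep events of the chosen threads within [start, end)."""
    return [e for e in events if e.thread in threads and start <= e.time < end]


# Synthetic example data, invented for this sketch.
trace = [
    Event(0.0, "T0", "compute", 2.0),
    Event(0.5, "T1", "lock_wait", 1.5),
    Event(2.0, "T0", "lock_wait", 0.5),
    Event(2.5, "T1", "compute", 1.0),
]
```

Here `profile(trace)` reports only totals per event kind, while `select(trace, {"T1"}, 0.0, 1.0)` keeps the ordering and timing needed to see which thread waited and when. Selecting a subset of threads and a time interval is also one way to cope with traces that grow too large to view whole.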

  10. Performance tools for multi-core: Cell
     • Visual Performance Analyzer 5.0
       • Infrastructure for collecting profiles on several systems
       • Infrastructure for using databases for large data sets
       • Set of interconnected views
       • Cell support
       • Components shown: LockAnalyzer, ProfileAnalyzer, CodeAnalyzer, PipelineAnalyzer, TraceAnalyzer
     • Cell SDK 3.0: PDT
       • Infrastructure for collecting traces on SDK 3.0 libraries
       • Analysis of lock usage
       • Input for Trace Analyzer

  11. Debugging and testing tools
     • Concurrency problems constitute about 10% of the bugs
     • Bugs like crashes (races) or freezes (deadlocks) stay in the application, reducing the up-time
     • Testing is done at load testing - very late in the process
     • We have been working on a tool-supported methodology
       • try to find the concurrency issues as early as possible:
         • teach how to write concurrent code
           • concurrent bug patterns
           • explain the concurrent programming constructs
           • teach general concurrency design patterns
         • reviews - developed a specialized review technique for concurrent code
         • teach how to do unit testing - developed synchronization coverage
       • ConTest - a tool-supported method for measuring contention
         • make tests more likely to exhibit bugs by changing the internal timing
       • Tools for pinpointing locations of bugs
         • applicable if we have a test that can cause the application to fail some of the time
       • Healing bugs so that the impact will not be seen
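
The idea of changing internal timing so that a test is more likely to expose a race can be sketched as follows. This is not ConTest or its API: the `noise` hook, the `Counter` class, and `run_trial` are hypothetical names, and the delay parameters are arbitrary assumptions chosen to keep the run short.

```python
import random
import threading
import time


def noise(probability=0.5, max_delay=0.001):
    """Hypothetical noise hook: occasionally sleep to perturb thread timing."""
    if random.random() < probability:
        time.sleep(random.uniform(0.0, max_delay))


class Counter:
    def __init__(self):
        self.value = 0
        self.lock = threading.Lock()

    def unsafe_increment(self):
        current = self.value       # read
        noise()                    # widen the window between read and write
        self.value = current + 1   # write; may overwrite another thread's update

    def safe_increment(self):
        with self.lock:            # the whole read-modify-write is now atomic
            current = self.value
            noise()
            self.value = current + 1


def run_trial(use_lock, threads=4, iterations=25):
    """Run all increments concurrently and return the final counter value."""
    counter = Counter()
    step = counter.safe_increment if use_lock else counter.unsafe_increment

    def worker():
        for _ in range(iterations):
            step()

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return counter.value
```

With the lock, `run_trial(True)` always returns `threads * iterations`. Without it, the injected delays make lost updates likely, so `run_trial(False)` usually returns less, although any single run may still happen to pass. This is why such a test fails only some of the time.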

  12. Software trends
     • Software enablement system for multicores
       • Various directions for providing solutions
     • Active area of research
       • only some early results in the academic and industrial worlds in terms of established standards and technology
       • much more will evolve in the years to come
     • Need:
       • programming models and compiler support for multicores
       • performance evaluation tools
       • testing and debugging tools

  13. Thank You
