
Parallelizing the GAP Kernel

Reimer Behrends, University of St. Andrews. The GAP kernel: 170,000 lines of sequential C code; hundreds of global and static variables; a custom generational garbage collector. Goal: allow multi-threaded execution.


Presentation Transcript


  1. Parallelizing the GAP Kernel • Reimer Behrends, University of St. Andrews

  2. The GAP Kernel • 170,000 lines of sequential C code. • Hundreds of global and static variables. • Custom generational garbage collector. • Goal: Allow multi-threaded execution.

  3. Multiple Interpreter Instances • Interpreter state stored in global variables. • Objectify interpreter state? – or – • Use thread-local storage?

  4. Objectify Interpreter State • Global variable use is pervasive. • The vast majority of functions/macros either need access to the state themselves or have to pass it to the functions they call. • Function tables are affected as well. • Too invasive for the code base overall.

  5. Thread-Local Storage • No portable solution. • Only some systems support a TLS ABI (__thread in gcc, .tls storage segment). • pthread_getspecific() is portable, but slow. • Approach used: SP/FP-relative addressing. • Thread stacks are allocated on power-of-2 boundaries. • Mask the lower bits of the stack or frame pointer to derive the base of the stack area. • pthread_attr_setstack(), alloca().
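
The stack-masking idea on this slide can be sketched in C roughly as follows. This is a minimal illustration under stated assumptions, not the GAP kernel's actual code: the names ThreadState, current_state and run_tls_demo, the 1 MiB stack area and the 4096-byte reserve are all invented here. It uses the POSIX calls posix_memalign() and pthread_attr_setstack(), and it assumes the compiler keeps locals on the real thread stack.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* All names and sizes below are illustrative assumptions. */
#define STACK_AREA_SIZE ((size_t)1 << 20) /* 1 MiB; must be a power of two */
#define STATE_RESERVE   ((size_t)4096)    /* bytes at the base kept for state */

typedef struct {
    int thread_id; /* stand-in for the interpreter's per-thread globals */
} ThreadState;

/* Find the calling thread's state by masking the address of a local
 * variable down to the start of its power-of-2 aligned stack area. */
static ThreadState *current_state(void)
{
    char marker; /* lives somewhere on the calling thread's stack */
    uintptr_t base = (uintptr_t)&marker & ~((uintptr_t)STACK_AREA_SIZE - 1);
    return (ThreadState *)base;
}

static void *worker(void *out)
{
    *(int *)out = current_state()->thread_id;
    return NULL;
}

/* Runs one thread on an aligned stack area and returns the id that the
 * thread recovered through current_state(), or -1 on failure. */
int run_tls_demo(int id)
{
    void *area = NULL;
    if (posix_memalign(&area, STACK_AREA_SIZE, STACK_AREA_SIZE) != 0)
        return -1;
    ((ThreadState *)area)->thread_id = id; /* state sits at the area base */

    pthread_attr_t attr;
    pthread_attr_init(&attr);
    /* Hand only the part above the reserved state block to the thread. */
    pthread_attr_setstack(&attr, (char *)area + STATE_RESERVE,
                          STACK_AREA_SIZE - STATE_RESERVE);

    pthread_t thread;
    int seen = -1;
    int rc = pthread_create(&thread, &attr, worker, &seen);
    if (rc == 0)
        pthread_join(thread, NULL);
    pthread_attr_destroy(&attr);
    free(area);
    return rc == 0 ? seen : -1;
}
```

Calling run_tls_demo(7) should hand back the id stored at the base of the thread's stack area. The lookup costs one mask operation, which is the attraction compared with a pthread_getspecific() call.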

  6. Garbage Collection • Current “gasman” collector: • Difficult to adapt to multi-threaded environment. • Serialization bottlenecks (CHANGED_BAG). • Interim solution: BDW conservative collector. • Has thread support. • Largely plug-and-play. • Adaptation uses gasman API. • However: Problems with the 64-bit version. • Need finalization.

  7. Synchronization • Programming model still “under construction”. • Build a set of basic thread manipulation and synchronization primitives.

  8. Thread Management • Thread management primitives: • id := CreateThread(func, arg1, …, argn); • WaitThread(id); • Example: x := a; id := CreateThread(function(y) x := x + y; end, b); WaitThread(id);

  9. Channels • Channels are FIFO queues • SendChannel(channel, object); • object := ReceiveChannel(channel); • Blocking and polling versions. • Both bounded and unbounded channel size. • Multiplexing: • object := ReceiveAnyChannel(ch1, …, chn);
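
As a rough C analogue of the channel primitives listed here, a bounded FIFO channel can be built from a mutex and two condition variables. This is a sketch under assumptions, not HPC-GAP's implementation: the Channel type, the channel_* function names and the capacity of 4 are invented for illustration, and only the bounded, blocking and polling variants are shown.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CHANNEL_CAPACITY 4 /* assumed bound, chosen for illustration */

typedef struct {
    void *items[CHANNEL_CAPACITY]; /* ring buffer of queued objects */
    size_t head, count;
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
} Channel;

void channel_init(Channel *ch)
{
    ch->head = ch->count = 0;
    pthread_mutex_init(&ch->lock, NULL);
    pthread_cond_init(&ch->not_empty, NULL);
    pthread_cond_init(&ch->not_full, NULL);
}

/* Blocking send: waits while the channel is full. */
void channel_send(Channel *ch, void *obj)
{
    pthread_mutex_lock(&ch->lock);
    while (ch->count == CHANNEL_CAPACITY)
        pthread_cond_wait(&ch->not_full, &ch->lock);
    ch->items[(ch->head + ch->count) % CHANNEL_CAPACITY] = obj;
    ch->count++;
    pthread_cond_signal(&ch->not_empty);
    pthread_mutex_unlock(&ch->lock);
}

/* Polling receive: returns false at once if the channel is empty. */
bool channel_try_receive(Channel *ch, void **out)
{
    bool ok = false;
    pthread_mutex_lock(&ch->lock);
    if (ch->count > 0) {
        *out = ch->items[ch->head];
        ch->head = (ch->head + 1) % CHANNEL_CAPACITY;
        ch->count--;
        pthread_cond_signal(&ch->not_full);
        ok = true;
    }
    pthread_mutex_unlock(&ch->lock);
    return ok;
}

/* Blocking receive: waits while the channel is empty. */
void *channel_receive(Channel *ch)
{
    pthread_mutex_lock(&ch->lock);
    while (ch->count == 0)
        pthread_cond_wait(&ch->not_empty, &ch->lock);
    void *obj = ch->items[ch->head];
    ch->head = (ch->head + 1) % CHANNEL_CAPACITY;
    ch->count--;
    pthread_cond_signal(&ch->not_full);
    pthread_mutex_unlock(&ch->lock);
    return obj;
}

static void *producer(void *arg)
{
    for (intptr_t i = 1; i <= 10; i++) /* more items than the capacity */
        channel_send((Channel *)arg, (void *)i);
    return NULL;
}

/* Sends 1..10 from a second thread and returns the sum received. */
long channel_demo_sum(void)
{
    Channel ch;
    pthread_t thread;
    long sum = 0;
    channel_init(&ch);
    if (pthread_create(&thread, NULL, producer, &ch) != 0)
        return -1;
    for (int i = 0; i < 10; i++)
        sum += (long)(intptr_t)channel_receive(&ch);
    pthread_join(thread, NULL);
    return sum;
}
```

Because the producer sends more items than the channel can hold, the demo exercises both the blocking send and the blocking receive paths. An unbounded channel would replace the ring buffer with a growable queue and drop the not_full condition.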

  10. Barriers • StartBarrier(barrier, count); • WaitBarrier(barrier); • WaitBarrier(barrier, function);
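
A barrier along these lines might look as follows in C. The slide does not say what the function argument of WaitBarrier does, so this sketch makes an assumption and labels it as such: the optional action is run once, by the last thread to arrive, before the others are released. The Barrier type and the barrier_* names are likewise invented for illustration.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t all_arrived;
    unsigned expected;   /* number of threads to wait for */
    unsigned waiting;    /* threads that have arrived so far */
    unsigned generation; /* bumped each time the barrier opens */
} Barrier;

void barrier_init(Barrier *b)
{
    pthread_mutex_init(&b->lock, NULL);
    pthread_cond_init(&b->all_arrived, NULL);
    b->expected = b->waiting = b->generation = 0;
}

/* Analogue of StartBarrier(barrier, count). */
void barrier_start(Barrier *b, unsigned count)
{
    pthread_mutex_lock(&b->lock);
    b->expected = count;
    b->waiting = 0;
    pthread_mutex_unlock(&b->lock);
}

/* Analogue of WaitBarrier; 'action' may be NULL. ASSUMPTION: the action
 * runs once, in the last arriving thread, before the barrier opens. */
void barrier_wait(Barrier *b, void (*action)(void *), void *arg)
{
    pthread_mutex_lock(&b->lock);
    unsigned gen = b->generation;
    if (++b->waiting == b->expected) {
        if (action != NULL)
            action(arg);
        b->waiting = 0;
        b->generation++;
        pthread_cond_broadcast(&b->all_arrived);
    } else {
        while (gen == b->generation) /* guards against spurious wakeups */
            pthread_cond_wait(&b->all_arrived, &b->lock);
    }
    pthread_mutex_unlock(&b->lock);
}

#define NTHREADS 4
static Barrier demo_barrier;
static atomic_int arrived, saw_all;

static void *phase_worker(void *unused)
{
    (void)unused;
    atomic_fetch_add(&arrived, 1);           /* phase 1 work */
    barrier_wait(&demo_barrier, NULL, NULL); /* wait for everyone */
    if (atomic_load(&arrived) == NTHREADS)   /* phase 2 sees all of it */
        atomic_fetch_add(&saw_all, 1);
    return NULL;
}

/* Returns how many threads saw every phase-1 update after the barrier. */
int barrier_demo(void)
{
    pthread_t threads[NTHREADS];
    barrier_init(&demo_barrier);
    barrier_start(&demo_barrier, NTHREADS);
    atomic_store(&arrived, 0);
    atomic_store(&saw_all, 0);
    for (int i = 0; i < NTHREADS; i++)
        if (pthread_create(&threads[i], NULL, phase_worker, NULL) != 0)
            return -1;
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(threads[i], NULL);
    return atomic_load(&saw_all);
}
```

The generation counter lets the same barrier be reused for several phases without a late waker from one phase being confused with the next.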

  11. Single Assignment Variables • WriteSyncVar(var, value); • Only one write permitted. • Subsequent writes result in errors. • value := ReadSyncVar(var); • Blocks if ‘var’ has not been written yet.
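
The write-once, blocking-read behaviour described here can be sketched in C with a mutex and a condition variable. The SyncVar type and the syncvar_* names are hypothetical, and a false return value stands in for the error that the slide says a second write produces.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t written;
    bool has_value;
    void *value;
} SyncVar;

void syncvar_init(SyncVar *v)
{
    pthread_mutex_init(&v->lock, NULL);
    pthread_cond_init(&v->written, NULL);
    v->has_value = false;
    v->value = NULL;
}

/* Analogue of WriteSyncVar: only the first write succeeds. Returns
 * false on any later write (standing in for the error case). */
bool syncvar_write(SyncVar *v, void *value)
{
    bool ok = false;
    pthread_mutex_lock(&v->lock);
    if (!v->has_value) {
        v->value = value;
        v->has_value = true;
        pthread_cond_broadcast(&v->written); /* wake all blocked readers */
        ok = true;
    }
    pthread_mutex_unlock(&v->lock);
    return ok;
}

/* Analogue of ReadSyncVar: blocks until the variable has been written. */
void *syncvar_read(SyncVar *v)
{
    pthread_mutex_lock(&v->lock);
    while (!v->has_value)
        pthread_cond_wait(&v->written, &v->lock);
    void *value = v->value;
    pthread_mutex_unlock(&v->lock);
    return value;
}

static void *writer(void *arg)
{
    syncvar_write((SyncVar *)arg, (void *)(intptr_t)42);
    return NULL;
}

/* Reads a value that another thread supplies; returns it as a long. */
long syncvar_demo(void)
{
    SyncVar var;
    pthread_t thread;
    syncvar_init(&var);
    if (pthread_create(&thread, NULL, writer, &var) != 0)
        return -1;
    long result = (long)(intptr_t)syncvar_read(&var); /* may block */
    pthread_join(thread, NULL);
    return result;
}
```

Since the value never changes after the first write, readers need no further coordination once syncvar_read() has returned.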

  12. Build Process • HPC GAP internal builds use SCons. • Automatic and clean dependency tracking for C. • Proper rebuilds for changes in build setup. • E.g., scons gmp=no. • Python easier to write than m4+/bin/sh+make.
