Compiler Support for Multithreaded Software

Jeremy Condit, Rob von Behren, Feng Zhou, Eric Brewer, George Necula

Designing Concurrent Systems • The great debate: threads vs. events • Thread model • Each logically concurrent task is represented by a thread • Modules communicate via call/return • Each thread gets its own stack • Event model • Each logically concurrent task is represented by an event • Modules communicate via event passing • Event handlers unwind stack after each event
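To make the contrast concrete, here is a minimal sketch of one two-step task written in both styles. The task (receive a header, then a body) and the names `thread_style` and `EventStyle` are hypothetical, chosen only for illustration. In the thread version the intermediate state stays in a local variable on the thread's own stack across blocking calls. In the event version each handler returns (unwinding the stack), so the state has to be stored in an object.

```python
import queue
import threading


def thread_style(inbox: queue.Queue, results: list) -> None:
    """Thread model: one thread per task, state lives on the stack."""
    header = inbox.get()  # blocks; `header` stays on this thread's stack
    body = inbox.get()    # blocks again; the stack survives the wait
    results.append(f"{header}:{body}")


class EventStyle:
    """Event model: handlers return after each event, so state is explicit."""

    def __init__(self, results: list) -> None:
        self.results = results
        self.header = None              # saved by hand between events
        self.handler = self.on_header   # which handler runs next

    def dispatch(self, event: str) -> None:
        self.handler(event)

    def on_header(self, event: str) -> None:
        self.header = event
        self.handler = self.on_body

    def on_body(self, event: str) -> None:
        self.results.append(f"{self.header}:{event}")
        self.handler = self.on_header


# Thread version: the worker blocks on the queue between steps.
thread_results: list = []
inbox: queue.Queue = queue.Queue()
worker = threading.Thread(target=thread_style, args=(inbox, thread_results))
worker.start()
inbox.put("GET /")
inbox.put("payload")
worker.join()

# Event version: a loop feeds events to whichever handler is current.
event_results: list = []
task = EventStyle(event_results)
for event in ["GET /", "payload"]:
    task.dispatch(event)
```

Both versions produce the same result. The difference is where the state between steps is kept, which is what the stack problem discussed later is about.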

Conventional Wisdom • Recent research favors event-based model • Event handlers execute atomically • Lower overhead for managing state • Better scheduling and locality • More flexible control flow • TinyOS, SEDA, Flash, … • We argue that all of these benefits can be achieved in thread-based systems • Thread systems and event systems are duals • Duality proposed by Lauer and Needham in 1978 (message passing vs. process-based systems)

The Stack Problem • How do we limit stack space? • Event systems: stacks are empty at end of handler • Thread systems: stacks can be arbitrarily large at (or between) blocking points • Old solution: preallocate large stacks • Inappropriate when memory available to each thread is limited • New solution: linked stack frames • This talk!

Linked Stack Goals • Limit amount of preallocated memory • Enable stacks of arbitrary size • Recursive functions • Temporary buffers • Short-lived “spikes” in stack size • Provide development tools • Existing debuggers • Profiling tools to tune stack allocation

Our Options • Preallocate whole stack • Preallocate small chunks • Never preallocate

Instrumenting Call Sites • Add instrumentation to some call sites • Check for sufficient stack space • Allocate and link new chunk if necessary • How much space is sufficient? • Largest amount of stack space used until another instrumented call site is reached • How do we get this information? • Analyze call graph at compile time • Dynamic programming • Instrumenting a call site corresponds to removing a graph edge
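The check performed at an instrumented call site can be sketched as a toy simulation. This is not the actual compiler-inserted code. The class `LinkedStack`, its methods, the chunk-size policy `max(min_chunk, needed)`, and the byte counts are all assumptions made for illustration.

```python
class LinkedStack:
    """Toy model of a stack built from linked chunks (sizes in bytes)."""

    def __init__(self, min_chunk: int) -> None:
        self.min_chunk = min_chunk
        # Assumption: one chunk of min_chunk bytes is preallocated.
        self.chunks = [[min_chunk, 0]]  # each entry: [capacity, bytes used]

    def free_in_top(self) -> int:
        capacity, used = self.chunks[-1]
        return capacity - used

    def check(self, needed: int) -> bool:
        """Instrumented call site: make sure `needed` bytes are available.

        `needed` stands for the bound computed at compile time.
        Returns True if a new chunk had to be linked.
        """
        if self.free_in_top() >= needed:
            return False
        self.chunks.append([max(self.min_chunk, needed), 0])
        return True

    def push(self, size: int) -> None:
        """Uninstrumented frame allocation; relies on an earlier check."""
        assert self.free_in_top() >= size, "stack bound violated"
        self.chunks[-1][1] += size


stack = LinkedStack(min_chunk=4096)
linked_first = stack.check(3000)   # fits in the initial chunk
stack.push(3000)
linked_second = stack.check(2000)  # only 1096 bytes left, so a chunk is linked
stack.push(2000)
```

Only `check` costs anything at run time. Calls between two instrumented sites allocate frames without checking, which is why the compile-time bound has to cover the whole path to the next instrumented site.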

Call Graph Analysis • Input: • Call graph • MaxPath parameter • Output: • Set of edges to instrument • Stack bound for each node • Algorithm: • Instrument all back edges • Process each node in call graph, bottom-up • For each successor • Let bound = successor’s bound + current node’s stack • If bound > MaxPath, instrument edge • Set node’s stack bound to the largest bound over its uninstrumented edges
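The analysis above can be sketched as a depth-first traversal. This is a minimal illustration under stated assumptions, not the authors' implementation. The function name `analyze`, the dictionary-based graph encoding, and the example graph with its frame sizes are hypothetical. A node with no uninstrumented successors is assumed to get its own frame size as its bound.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def analyze(graph: Dict[str, List[str]], frame: Dict[str, int],
            max_path: int, root: str) -> Tuple[Set[Edge], Dict[str, int]]:
    """Return (edges to instrument, stack bound for each reachable node)."""
    instrumented: Set[Edge] = set()
    bound: Dict[str, int] = {}
    on_path: Set[str] = set()  # nodes on the current DFS path

    def visit(node: str) -> None:
        on_path.add(node)
        best = frame[node]  # bound if every outgoing edge is instrumented
        for succ in graph.get(node, []):
            if succ in on_path:
                instrumented.add((node, succ))  # back edge: always instrument
                continue
            if succ not in bound:
                visit(succ)  # successors are finished first (bottom-up)
            candidate = bound[succ] + frame[node]
            if candidate > max_path:
                instrumented.add((node, succ))
            else:
                best = max(best, candidate)
        on_path.discard(node)
        bound[node] = best

    visit(root)
    return instrumented, bound


# Hypothetical call graph; frame sizes in KB.
graph = {"main": ["a", "b"], "a": ["c"], "b": ["c", "d"], "c": [], "d": ["b"]}
frame = {"main": 2, "a": 5, "b": 4, "c": 3, "d": 6}
edges, bounds = analyze(graph, frame, max_path=10, root="main")
```

In this example the recursive edge from `d` to `b` is instrumented because it is a back edge, and the edge from `main` to `b` is instrumented because following it would give a bound of 12 KB, above the 10 KB limit.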

Call Graph Example • [Figure: example call graph annotated with stack values from 1k to 10k] • MaxPath = 10k

Wasted Space • Two kinds of wasted space: • Internal: unused space at the end of interior chunks • External: unused space at the end of the final chunk
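The two kinds of waste can be computed directly from a list of chunks. The sketch below assumes each chunk is described by a hypothetical `(capacity, used)` pair, listed oldest first. The function name and the numbers are illustrative only.

```python
from typing import List, Tuple


def wasted_space(chunks: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Return (internal, external) waste in bytes.

    Internal waste is the unused tail of every chunk except the last.
    External waste is the unused tail of the last chunk.
    """
    if not chunks:
        return 0, 0
    internal = sum(capacity - used for capacity, used in chunks[:-1])
    last_capacity, last_used = chunks[-1]
    external = last_capacity - last_used
    return internal, external


# Two 4096-byte chunks holding 3000 and 2000 bytes of frames.
internal, external = wasted_space([(4096, 3000), (4096, 2000)])
```

Here the first chunk contributes 1096 bytes of internal waste and the final chunk contributes 2096 bytes of external waste.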

Tuning • Two parameters: • MaxPath: maximum desired path length • MinChunk: minimum allowable chunk size • Tradeoffs: MaxPath and MinChunk together determine the amount of instrumentation, the internal wasted space, and the external wasted space

Results: Apache 2.0.44 • MaxPath = 4 KB, MinChunk = 8 KB • Compile-time statistics • 7500 call sites • 17% instrumented external calls • 5% instrumented internal calls • Run-time statistics (one request) • 1300 function calls • 25% instrumented external calls • instrumentation can be eliminated in 95% of cases • 8% instrumented internal calls • new chunk linked in 25% of cases • 1000 instructions per request

Conclusions • Threads and events are duals • With proper compiler support, threads can perform just as well as events • Threads provide a more appropriate abstraction for many concurrent applications • Linked stacks can reduce stack waste • One example of compiler support for threads
