
Computer Systems Principles Concurrency Patterns


Presentation Transcript


  1. Computer Systems Principles: Concurrency Patterns
   Emery Berger and Mark Corner, University of Massachusetts Amherst

  2. Web Server
   • Client (browser): requests HTML, images
   • Server: caches requests, sends data to the client (or "not found")
   • Example request: http://server/Easter-bunny/200x100/75.jpg

  3. Possible Implementation

   while (true) {
     wait for connection;
     read from socket & parse URL;
     look up URL contents in cache;
     if (!in cache) {
       fetch from disk / execute CGI;
       put in cache;
     }
     send data to client;
   }

  4. Possible Implementation

   while (true) {
     wait for connection;               // net
     read from socket & parse URL;      // cpu
     look up URL contents in cache;     // cpu
     if (!in cache) {
       fetch from disk / execute CGI;   // disk
       put in cache;                    // cpu
     }
     send data to client;               // net
   }
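The loop above can be sketched as runnable Java; `SequentialServer`, `handle`, `fetchFromDisk`, and the `cache` map are hypothetical stand-ins, with the network accept/read steps elided so only the cache logic remains:

```java
import java.util.HashMap;
import java.util.Map;

// Sequential request handling, a sketch: one request at a time,
// with a simple in-memory cache. fetchFromDisk stands in for
// disk I/O or CGI execution.
public class SequentialServer {
    private final Map<String, String> cache = new HashMap<>();

    String fetchFromDisk(String url) {           // hypothetical slow path
        return "contents of " + url;
    }

    String handle(String url) {
        String contents = cache.get(url);        // look up URL in cache
        if (contents == null) {                  // not in cache
            contents = fetchFromDisk(url);       // fetch from disk / CGI
            cache.put(url, contents);            // put in cache
        }
        return contents;                         // send data to client
    }

    public static void main(String[] args) {
        SequentialServer s = new SequentialServer();
        System.out.println(s.handle("/index.html"));
        System.out.println(s.handle("/index.html")); // second request hits the cache
    }
}
```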

  5. Problem: Concurrency
   • Sequential is fine until:
     • More clients
     • Bigger servers: multicores, multiprocessors
   • Goals:
     • Hide the latency of I/O: don't keep clients waiting
     • Improve throughput: serve up more pages

  6. Building Concurrent Apps
   • Patterns / architectures:
     • Thread pools
     • Producer-consumer
     • "Bag of tasks"
     • Worker threads (work stealing)
   • Goals:
     • Minimize latency
     • Maximize parallelism
     • Keep programs simple to write & maintain

  7. Thread Pools
   • Thread creation is relatively expensive
   • Instead: use a pool of threads
   • When a new task arrives, get a thread from the pool to work on it; block if the pool is empty
   • Faster with many tasks
   • Limits the maximum number of threads (and thus resources)
   • (ThreadPoolExecutor class in Java)
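A minimal sketch of the pattern using Java's `Executors.newFixedThreadPool` (a `ThreadPoolExecutor` under the hood); the counting task body is an illustrative stand-in for real request handling:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Thread-pool sketch: a fixed pool of 4 threads serves 100 tasks.
// Threads are created once, up front; tasks queue up while all
// threads are busy, which bounds resource use.
public class PoolDemo {
    public static int runTasks() throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < 100; i++) {
            pool.submit(() -> { done.incrementAndGet(); }); // hand task to a pooled thread
        }
        pool.shutdown();                                    // accept no new tasks
        pool.awaitTermination(10, TimeUnit.SECONDS);        // wait for the queue to drain
        return done.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runTasks());  // 100
    }
}
```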

  8. Producer-Consumer
   • Can get pipeline parallelism:
     • One thread (producer) does work, e.g., I/O
     • and hands it off to another thread (consumer)

  12. Producer-Consumer
   • Can get pipeline parallelism:
     • One thread (producer) does work, e.g., I/O
     • and hands it off to another thread (consumer)
   • LinkedBlockingQueue: put() blocks if the queue is full; take() blocks if it is empty (poll() returns null immediately instead of blocking)
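A minimal producer-consumer sketch with a bounded `LinkedBlockingQueue`; the `-1` end-of-stream marker and the counting consumer are illustrative assumptions (`take()` is the blocking dequeue):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Producer-consumer sketch with a bounded LinkedBlockingQueue.
// put() blocks when the queue is full; take() blocks when it is
// empty, so the two threads rate-match automatically.
public class PipeDemo {
    public static int transfer(int n) throws InterruptedException {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>(8);
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < n; i++) queue.put(i);   // blocks if full
                queue.put(-1);                              // end-of-stream marker
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        final int[] count = {0};
        Thread consumer = new Thread(() -> {
            try {
                while (queue.take() != -1) count[0]++;      // blocks if empty
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        producer.start(); consumer.start();
        producer.join(); consumer.join();
        return count[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(transfer(100));  // 100
    }
}
```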

  13. Producer-Consumer Web Server
   • Use 2 threads: producer & consumer
   • queue.put(x) and x = queue.take();

   Original loop:
   while (true) {
     wait for connection;
     read from socket & parse URL;
     look up URL contents in cache;
     if (!in cache) {
       fetch from disk / execute CGI;
       put in cache;
     }
     send data to client;
   }

   Producer:
   while (true) {
     do something…
     queue.put (x);
   }

   Consumer:
   while (true) {
     x = queue.take();
     do something…
   }

  14. Producer-Consumer Web Server
   • Pair of threads: one reads, one writes

   Producer:
   while (true) {
     wait for connection;
     read from socket & parse URL;
     queue.put (URL);
   }

   Consumer:
   while (true) {
     URL = queue.take();
     look up URL contents in cache;
     if (!in cache) {
       fetch from disk / execute CGI;
       put in cache;
     }
     send data to client;
   }

  15. Producer-Consumer Web Server
   • More parallelism: optimizes the common case (cache hit)

   Thread 1:
   while (true) {
     wait for connection;
     read from socket & parse URL;
     queue1.put (URL);
   }

   Thread 2:
   while (true) {
     URL = queue1.take();
     look up URL contents in cache;
     if (!in cache) {
       queue2.put (URL);
       continue;
     }
     send data to client;
   }

   Thread 3:
   while (true) {
     URL = queue2.take();
     fetch from disk / execute CGI;
     put in cache;
     send data to client;
   }

  16. When to Use Producer-Consumer
   • Works well for pairs of threads
   • Best if producer & consumer are symmetric:
     • Proceed at roughly the same rate
     • Order of operations matters
   • Not as good for:
     • Many threads
     • Cases where order doesn't matter
     • Different rates of progress

  17. Producer-Consumer Web Server
   • Should balance load across threads

   Thread 1:
   while (true) {
     wait for connection;
     read from socket & parse URL;
     queue1.put (URL);
   }

   Thread 2:
   while (true) {
     URL = queue1.take();
     look up URL contents in cache;
     if (!in cache) {
       queue2.put (URL);
     }
     send data to client;
   }

   Thread 3:
   while (true) {
     URL = queue2.take();
     fetch from disk / execute CGI;
     put in cache;
     send data to client;
   }

  18. Bag of Tasks
   • Collection of mostly independent tasks
   [diagram: four worker threads sharing one bag]

  24. Bag of Tasks
   • Collection of mostly independent tasks
   • The bag could also be a LinkedBlockingQueue (put, take)
   [diagram: addWork feeding four workers through the bag]

  25. Exercise: Restructure into BOT
   • Re-structure this into bag of tasks:
     • addWork & worker threads
     • t = bag.take() or bag.put(t)

   while (true) {
     wait for connection;
     read from socket & parse URL;
     look up URL contents in cache;
     if (!in cache) {
       fetch from disk / execute CGI;
       put in cache;
     }
     send data to client;
   }

  26. Exercise: Restructure into BOT
   • Re-structure this into bag of tasks:
     • addWork & worker
     • t = bag.take() or bag.put(t)

   addWork:
   while (true) {
     wait for connection;
     t.URL = URL;
     t.sock = socket;
     bag.put (t);
   }

   worker:
   while (true) {
     t = bag.take();
     look up t.URL contents in cache;
     if (!in cache) {
       fetch from disk / execute CGI;
       put in cache;
     }
     send data to client via t.sock;
   }

  27. Bag of Tasks Web Server
   • t = bag.take() or bag.put(t)

   addWork:
   while (true) {
     wait for connection;
     bag.put (URL);
   }

   worker:
   while (true) {
     URL = bag.take();
     look up URL contents in cache;
     if (!in cache) {
       fetch from disk / execute CGI;
       put in cache;
     }
     send data to client;
   }
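The addWork/worker structure above can be sketched with a shared `LinkedBlockingQueue` as the bag; the URL strings and poison-pill shutdown are illustrative assumptions:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Bag-of-tasks sketch: one addWork thread fills a shared bag
// (a LinkedBlockingQueue); several identical workers drain it.
// POISON sentinels tell each worker to exit.
public class BagDemo {
    static final String POISON = "__done__";

    public static int run(int nTasks, int nWorkers) throws InterruptedException {
        BlockingQueue<String> bag = new LinkedBlockingQueue<>();
        AtomicInteger served = new AtomicInteger();

        Thread addWork = new Thread(() -> {
            try {
                for (int i = 0; i < nTasks; i++) bag.put("/page" + i); // bag.put(t)
                for (int i = 0; i < nWorkers; i++) bag.put(POISON);    // one pill per worker
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });

        Thread[] workers = new Thread[nWorkers];
        for (int w = 0; w < nWorkers; w++) {
            workers[w] = new Thread(() -> {
                try {
                    while (true) {
                        String t = bag.take();            // t = bag.take(), blocks if empty
                        if (t.equals(POISON)) return;
                        served.incrementAndGet();         // stands in for serving the request
                    }
                } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            });
        }
        addWork.start();
        for (Thread w : workers) w.start();
        addWork.join();
        for (Thread w : workers) w.join();
        return served.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(50, 4));  // 50
    }
}
```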

  28. Bag of Tasks vs. Producer-Consumer
   • Exploits more parallelism, even with coarse-grained threads
   • Don't have to break up tasks too finely
   • What does task size affect?
     • Possibly latency: smaller might be better
   • Easy to change or add new functionality
   • But: one major performance problem…

  29. What's the Problem?
   [diagram: addWork and four workers all sharing the single bag]

  30. What's the Problem?
   • Contention: a single lock on the shared structure
   • Bottleneck to scalability

  31. Work Queues
   • Each thread has its own work queue (a deque)
   • No single point of contention
   • Threads are now generic "executors"
   • Tasks (balls): blue = parse, yellow = connect, …

  35. Work Queues
   • Each thread has its own work queue
   • No single point of contention
   • Now what?

  36. Work Stealing
   • When a thread runs out of work, steal work from a random other thread

  37. Work Stealing
   • When a thread runs out of work, steal work from the top of a random deque
   • Optimal load-balancing algorithm
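Java's `ForkJoinPool` implements exactly this scheme: one deque per worker, with idle workers stealing from the top of other deques. A sketch with a hypothetical divide-and-conquer sum task:

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Work stealing in practice: ForkJoinPool gives each worker its own
// deque and steals when a worker runs dry. This task splits a range
// in half until it is small, then sums it directly.
public class StealDemo extends RecursiveTask<Long> {
    final long lo, hi;
    StealDemo(long lo, long hi) { this.lo = lo; this.hi = hi; }

    @Override protected Long compute() {
        if (hi - lo <= 1000) {                     // small enough: do it directly
            long sum = 0;
            for (long i = lo; i < hi; i++) sum += i;
            return sum;
        }
        long mid = (lo + hi) / 2;
        StealDemo left = new StealDemo(lo, mid);
        left.fork();                               // push onto this worker's deque
        long right = new StealDemo(mid, hi).compute();
        return left.join() + right;                // idle workers may steal `left`
    }

    public static void main(String[] args) {
        long sum = ForkJoinPool.commonPool().invoke(new StealDemo(0, 1_000_000));
        System.out.println(sum);  // 499999500000
    }
}
```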

  38. Work Stealing Web Server
   • Re-structure: readURL, lookUp, addToCache, output
   • myQueue.put(new readURL (url))

   while (true) {
     wait for connection;
     read from socket & parse URL;
     look up URL contents in cache;
     if (!in cache) {
       fetch from disk / execute CGI;
       put in cache;
     }
     send data to client;
   }

  39. readURL, lookUp, addToCache, output

   while (true) {
     wait for connection;
     read from socket & parse URL;
     look up URL contents in cache;
     if (!in cache) {
       fetch from disk / execute CGI;
       put in cache;
     }
     send data to client;
   }

   class Work {
   public:
     virtual void run();
   };

   class readURL : public Work {
   public:
     void run() { … }
     readURL (socket s) { … }
   };

  40. [diagram: readURL, lookUp, addToCache, and output tasks flowing through the worker deques]

  41.
   class readURL : public Work {
   public:
     void run() {
       read from socket, f = get file
       myQueue.put (new lookUp (_s, f));
     }
     readURL (socket s) { _s = s; }
   };

  42.
   class lookUp : public Work {
   public:
     void run() {
       look in cache for file _f
       if (!found)
         myQueue.put (new addToCache (_s, _f));
       else
         myQueue.put (new Output (_s, contents));
     }
     lookUp (socket s, string f) { _s = s; _f = f; }
   };

  43.
   class addToCache : public Work {
   public:
     void run() {
       fetch file _f from disk into contents
       add file to cache (hashmap)
       myQueue.put (new Output (_s, contents));
     }
     addToCache (socket s, string f) { _s = s; _f = f; }
   };

  44. Work Stealing Web Server
   • Re-structure: readURL, lookUp, addToCache, output
   • myQueue.put(new readURL (url))

   readURL (url) {
     wait for connection;
     read from socket & parse URL;
     myQueue.put (new lookUp (URL));
   }

  45. Work Stealing Web Server
   • Re-structure: readURL, lookUp, addToCache, output
   • myQueue.put(new readURL (url))

   readURL (url) {
     wait for connection;
     read from socket & parse URL;
     myQueue.put (new lookUp (URL));
   }

   lookUp (url) {
     look up URL contents in cache;
     if (!in cache) {
       myQueue.put (new addToCache (URL));
     } else {
       myQueue.put (new output (contents));
     }
   }

   addToCache (URL) {
     fetch from disk / execute CGI;
     put in cache;
     myQueue.put (new output (contents));
   }
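The staged hand-offs above can be sketched in Java with `Runnable` task objects; a single shared queue drained by one thread stands in for the per-thread deques so the chain is easy to follow (names like `StageDemo` and `lastOutput` are illustrative assumptions):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Task-object pipeline sketch: each unit of work runs, then posts
// the next stage to the queue, mirroring the lookUp -> addToCache
// -> output chain. With one deque per executor plus stealing, the
// same tasks would spread across threads automatically.
public class StageDemo {
    static final BlockingQueue<Runnable> myQueue = new LinkedBlockingQueue<>();
    static final Map<String, String> cache = new HashMap<>();
    static String lastOutput;

    static void lookUp(String url) {
        String contents = cache.get(url);
        if (contents == null) myQueue.add(() -> addToCache(url)); // cache miss: next stage
        else myQueue.add(() -> output(contents));                 // cache hit: skip ahead
    }
    static void addToCache(String url) {
        String contents = "contents of " + url;   // stands in for disk / CGI
        cache.put(url, contents);
        myQueue.add(() -> output(contents));
    }
    static void output(String contents) { lastOutput = contents; } // stands in for the send

    public static void main(String[] args) {
        myQueue.add(() -> lookUp("/index.html")); // readURL would enqueue this
        Runnable t;
        while ((t = myQueue.poll()) != null) t.run();  // drain the queue
        System.out.println(lastOutput);
    }
}
```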

  46. Work Stealing
   • Works great for heterogeneous tasks
   • Convert addWork and worker into units of work (different colors)
   • Flexible: can easily re-define tasks
     • Coarse-grained, fine-grained, or anything in between
   • Automatic load balancing
   • Separates thread logic from functionality
   • Popular model for structuring servers

  47. The End
