Andrew Hobbs
Threads Cannot be Implemented As a Library


As a library...what does that mean?

  • The language specification doesn't say anything about threads

    • The specification defines what compilers should do

    • So the compiler doesn't know about threads either


How does this affect programming?

  • The compiler transforms your code to hopefully make it as fast as possible

    • It has some restrictions, depending on the language specification

  • But if the compiler doesn't know about concurrency...

    • It can make optimizations that are valid in sequential programs but cause bugs in multithreaded ones


An example

Assuming x and y are both set to 0, suppose we have 2 threads:

Thread 1:

x = 1; r1 = y;

Thread 2:

y = 1; r2 = x;

What are the possible values of r1 and r2 at the end of both threads executing?


An example

But what if our compiler changes our code to the following?

Thread 1:

r1 = y; x = 1;

Thread 2:

r2 = x; y = 1;

What are the possible values of r1 and r2 at the end of both threads executing?

The results can now differ: r1 == 0 and r2 == 0 becomes possible, which no interleaving of the original code allows. From the compiler's view everything is fine, because it doesn't know that each thread can interact with others.
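The difference between the two versions can be checked by brute force. The sketch below is a single-threaded simulation, not real threads: it runs every interleaving that preserves each thread's own statement order and records which (r1, r2) pairs occur. The names `possible_outcomes`, `original_outcomes` and `reordered_outcomes` are illustrative and do not come from the slides.

```c
#include <assert.h>

/* The four statements from the slides, numbered for the simulation. */
enum op { STORE_X, LOAD_Y, STORE_Y, LOAD_X };

/* Execute one statement against the simulated shared state. */
static void step(enum op o, int *x, int *y, int *r1, int *r2)
{
    switch (o) {
    case STORE_X: *x = 1;   break;
    case LOAD_Y:  *r1 = *y; break;
    case STORE_Y: *y = 1;   break;
    case LOAD_X:  *r2 = *x; break;
    }
}

/* Try every interleaving that keeps each thread's own order.
   Bit (2*r1 + r2) of the result is set when that (r1, r2) pair occurs. */
unsigned possible_outcomes(const enum op t1[2], const enum op t2[2])
{
    unsigned seen = 0;
    for (unsigned mask = 0; mask < 16; ++mask) {
        /* Bit k of mask chooses which thread runs at step k;
           only masks with exactly two bits set are valid schedules. */
        int bits = 0;
        for (int k = 0; k < 4; ++k) bits += (int)((mask >> k) & 1u);
        if (bits != 2) continue;

        int x = 0, y = 0, r1 = 0, r2 = 0, i1 = 0, i2 = 0;
        for (int k = 0; k < 4; ++k) {
            if ((mask >> k) & 1u) step(t2[i2++], &x, &y, &r1, &r2);
            else                  step(t1[i1++], &x, &y, &r1, &r2);
        }
        assert(i1 == 2 && i2 == 2);  /* each thread ran both statements */
        seen |= 1u << (2 * r1 + r2);
    }
    return seen;
}

/* Thread 1: x = 1; r1 = y;   Thread 2: y = 1; r2 = x; */
unsigned original_outcomes(void)
{
    const enum op t1[2] = { STORE_X, LOAD_Y };
    const enum op t2[2] = { STORE_Y, LOAD_X };
    return possible_outcomes(t1, t2);
}

/* Thread 1: r1 = y; x = 1;   Thread 2: r2 = x; y = 1; */
unsigned reordered_outcomes(void)
{
    const enum op t1[2] = { LOAD_Y, STORE_X };
    const enum op t2[2] = { LOAD_X, STORE_Y };
    return possible_outcomes(t1, t2);
}
```

In this simulation, bit 0 (the pair r1 = 0, r2 = 0) is clear for the original ordering and set for the reordered one, while bit 3 (r1 = 1, r2 = 1) is set only for the original.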


Why did this happen?

  • The compiler didn't know about concurrency, so it performed optimizations assuming sequential execution

    • Some of these don't work with concurrency!

  • In fact, the hardware itself can also do this in an attempt to speed up execution, by (for example) putting loads before unrelated stores


The Pthreads approach

No thread shall read or modify a memory location while another thread may be modifying it (such a conflicting access is called a data race)

To restrict access, the programmer uses synchronization routines:

  • pthread_mutex_lock()

  • pthread_mutex_unlock()
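A minimal sketch of how these routines are typically used, assuming two worker threads that increment one shared counter. The names `counter`, `counter_lock`, `worker` and `run_two_workers` are made up for illustration; only the `pthread_*` calls are part of the Pthreads API.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static long counter = 0;  /* shared; only touched while counter_lock is held */

/* Each worker adds 1 to the shared counter n times. */
static void *worker(void *arg)
{
    long n = *(const long *)arg;
    for (long i = 0; i < n; ++i) {
        pthread_mutex_lock(&counter_lock);    /* enter critical section */
        ++counter;
        pthread_mutex_unlock(&counter_lock);  /* leave critical section */
    }
    return NULL;
}

/* Start two workers, wait for both, and return the final count. */
long run_two_workers(long n)
{
    pthread_t a, b;
    int rc;

    counter = 0;
    rc = pthread_create(&a, NULL, worker, &n);
    assert(rc == 0);
    rc = pthread_create(&b, NULL, worker, &n);
    assert(rc == 0);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return counter;  /* safe to read: both workers have been joined */
}
```

Because every access to `counter` happens between a lock and an unlock (or after both joins), the program contains no data race in the Pthreads sense, and `run_two_workers(n)` should return `2 * n`.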


The Pthreads approach

If the programmer uses the synchronization methods correctly to prevent race conditions, then they should have no issues

But this isn't quite true...


Concurrent modification

Suppose x and y are both initially 0 and we have the following two threads:

Thread 1:

if (x == 1) ++y;

Thread 2:

if (y == 1) ++x;

Is there a data race in this program?


Concurrent modification

What if our compiler modified our code a little?

Thread 1:

++y; if (x != 1) --y;

Thread 2:

++x; if (y != 1) --x;

Is there a data race in this program?
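One way to see the difference, assuming x and y both start at 0, is to replay a single interleaving by hand. The sketch below is a sequential stand-in for the two threads (no real concurrency), and the function names are hypothetical. With the original code neither condition is ever true, so nothing is written; with the transformed code both threads write unconditionally, and one possible interleaving leaves both variables at 1.

```c
#include <assert.h>

/* Original code, run as thread 1 then thread 2.
   Starting from x == y == 0, neither branch is taken in either order. */
void run_original(int *x, int *y)
{
    if (*x == 1) ++*y;   /* thread 1 */
    if (*y == 1) ++*x;   /* thread 2 */
}

/* Transformed code, replayed in one specific interleaving:
   both speculative increments happen before either check. */
void run_transformed_interleaved(int *x, int *y)
{
    assert(*x == 0 && *y == 0);  /* starting state assumed by this sketch */
    ++*y;                /* thread 1: speculative increment */
    ++*x;                /* thread 2: speculative increment */
    if (*x != 1) --*y;   /* thread 1: sees x == 1, so does not undo */
    if (*y != 1) --*x;   /* thread 2: sees y == 1, so does not undo */
}
```

Starting from zero, `run_original` leaves x == 0 and y == 0, while `run_transformed_interleaved` ends with x == 1 and y == 1, a result the original program could never produce.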


Adjacent data

Suppose we had the following structure definition:

struct { int a:17; int b:15; } x;

There are probably no machines that have a 17-bit wide store, so if someone were to attempt to execute: x.a = 42; it would probably be done like this:

{
  tmp = x;          // Read both fields into a 32-bit variable
  tmp &= ~0x1ffff;  // Mask off old a
  tmp |= 42;        // Insert new a
  x = tmp;          // Overwrite all of x
}
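The danger in that sequence is the final store, which writes back a stale copy of b. The sketch below replays the steps on a plain 32-bit word, with a second thread's update to b slotted in by hand between the read and the write-back. It assumes a occupies the low 17 bits and b the high 15 (bit-field layout is implementation-defined), and `lost_update_demo`, `A_MASK` and `B_SHIFT` are illustrative names.

```c
#include <assert.h>
#include <stdint.h>

#define A_MASK  0x1ffffu  /* low 17 bits hold a (assumed layout) */
#define B_SHIFT 17        /* high 15 bits hold b (assumed layout) */

/* Simulates x.a = 42 in one thread while another thread sets x.b = 7.
   Returns the final contents of the shared 32-bit word. */
uint32_t lost_update_demo(void)
{
    uint32_t x = 0;                        /* shared word: a = 0, b = 0 */

    uint32_t tmp = x;                      /* thread 1: read both fields */
    x = (x & A_MASK) | (7u << B_SHIFT);    /* thread 2: x.b = 7 lands here */
    tmp &= ~A_MASK;                        /* thread 1: mask off old a */
    assert((tmp & A_MASK) == 0);
    tmp |= 42u;                            /* thread 1: insert new a */
    x = tmp;                               /* thread 1: overwrite all of x */

    return x;                              /* b is 0 again: the update is lost */
}
```

In this replay the word ends up holding a = 42 and b = 0, even though the two threads only ever assigned to different fields in the source code.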


Adjacent data

Suppose we had the following structure definition:

struct { char a; char b; char c; char d;
         char e; char f; char g; char h; } x;

where a is the only field that needs to be protected by a lock.

In that case, a programmer might write the following code without holding the lock:

x.b = 'b'; x.c = 'c'; x.d = 'd';
x.e = 'e'; x.f = 'f'; x.g = 'g'; x.h = 'h';

But a compiler might realize that it could just write all of the data at once as a 64-bit quantity (not exact syntax):

x = 'hgfedcb\0' | x.a;

That single store also reads and rewrites x.a, so it can race with a thread that updates a while holding the lock.


Register Promotion

Suppose we had a global shared variable x, protected by a lock...but only conditionally, perhaps only if we had actually created other threads:

for (...) {
  ...
  if (mt) pthread_mutex_lock(...);
  x = ... x ...
  if (mt) pthread_mutex_unlock(...);
}

If the conditionals are rarely taken, the compiler might decide to promote x to a register r to increase performance:

r = x;
for (...) {
  ...
  if (mt) {
    x = r; pthread_mutex_lock(...); r = x;
  }
  r = ... r ...
  if (mt) {
    x = r; pthread_mutex_unlock(...); r = x;
  }
}
x = r;

The transformed code now reads and writes x outside the locked region, even though the original touched x only while holding the lock.


What does this mean?

Pthreads says that as long as we prevent data races with the synchronization functions, we will be fine

But since the compiler doesn't know about threads, it might make optimizations that introduce races into code that looks perfectly fine to us

We can't reason about locking at the source level if the presence of races depends on the compiler and the hardware


Performance

So why are we running multiple threads?

To (hopefully) get better performance out of our program

But locking is expensive! Atomic updates can be hundreds of times slower than normal ones


Is synchronization always needed?

Consider the following Sieve of Eratosthenes implementation:

for (my_prime = start; my_prime < 10000; ++my_prime)
  if (!get(my_prime)) {
    for (multiple = my_prime; multiple < 100000000; multiple += my_prime)
      if (!get(multiple)) set(multiple);
  }

What happens if we run this on multiple threads, with all of them accessing one shared data block?


The conclusions?

Sometimes you can gain large performance benefits by leaving out locks and atomic operations altogether

But if we use a library that disallows this (like Pthreads), we are throwing away this ability

And if we are allowed to do it, then we need the compiler and hardware to somehow know about it and help us


The conclusions?

So how do we get the compiler and hardware to help us?

We need to have the programming language itself define a memory model so that the programmer knows whether there are races

Only if we have that can we reason about our programs
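As one illustration of what language-level support can look like, the sketch below redoes the first two-thread example with C11 atomics, which postdate the original argument and are used here as an assumption rather than something the slides cover. Declaring the shared variables atomic tells the compiler they may be accessed concurrently; the default `atomic_store` and `atomic_load` are sequentially consistent, so the outcome r1 == 0 and r2 == 0 is ruled out. Thread creation still uses Pthreads, and names such as `ax`, `ay` and `count_both_zero` are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_int ax, ay;  /* the shared x and y, declared atomic */
static int r1, r2;         /* each written by one thread, read after join */

static void *thread1(void *unused)
{
    (void)unused;
    atomic_store(&ax, 1);       /* x = 1;  sequentially consistent store */
    r1 = atomic_load(&ay);      /* r1 = y; sequentially consistent load */
    return NULL;
}

static void *thread2(void *unused)
{
    (void)unused;
    atomic_store(&ay, 1);       /* y = 1; */
    r2 = atomic_load(&ax);      /* r2 = x; */
    return NULL;
}

/* Runs the two threads `runs` times and counts how often both
   registers ended up 0, the outcome the reordered code allowed. */
int count_both_zero(int runs)
{
    int count = 0;
    for (int i = 0; i < runs; ++i) {
        pthread_t a, b;
        int rc;

        atomic_store(&ax, 0);
        atomic_store(&ay, 0);
        rc = pthread_create(&a, NULL, thread1, NULL);
        assert(rc == 0);
        rc = pthread_create(&b, NULL, thread2, NULL);
        assert(rc == 0);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        if (r1 == 0 && r2 == 0) ++count;
    }
    return count;
}
```

With sequentially consistent atomics, neither the compiler nor the hardware may reorder each thread's store after its load, so `count_both_zero` should return 0 however many runs are made.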
