Determinate Imperative Programming: The CF Model. Vijay Saraswat IBM TJ Watson Research Center joint work with Radha Jagadeesan, Armando Solar-Lezama, Christoph von Praun http://www.saraswat.org/cf.html. Problem: Many concurrent imperative programs are determinate.


Determinate Imperative Programming: The CF Model

Vijay Saraswat

IBM TJ Watson Research Center

joint work with Radha Jagadeesan, Armando Solar-Lezama, Christoph von Praun

http://www.saraswat.org/cf.html

Outline
Problem:

Many concurrent imperative programs are determinate.

Determinacy is not apparent from the syntax.

Basic idea

A variable is the stream of values written to it by a thread.

Many examples

Semantics

Implementation

Future work

Background: X10
Five basic themes:

Partitioned address space

Pervasive explicit asynchrony (Cilk-style recursive parallelism)

Java base

Guaranteed VM invariants

Explicit, distributed VM

Few language extensions

<s> = async <s>

<s> = finish <s>

<s> = foreach ( <v>, …,<v> in <e>) <s>

Multidimensional arrays over distributions


Subsumes MPI, OpenMP, SPMD languages, Cilk …

X10: clocks, clocked final data structures
Clocks can be created dynamically.

Activities are registered with clocks.

An activity may register a newly created activity with one of its clocks.

“next;” resumes every clock the activity is registered on, then blocks until each of those clocks advances.

This is sufficient for deadlock-freedom.

Adequate for parallel operations on arrays

But not dataflow

Clock advances when all activities registered on it resume the clock.

Operations

c.resume(); next;

c.drop();

Clocked final datum

In each phase of the clock the datum is immutable.

Read gets current value; write updates in next phase.


Clocks do not introduce deadlock; clocked finals are determinate.
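
The read/write rule for a clocked final datum can be pictured as a double-buffered cell. The Python sketch below is only an illustration of that rule: the class name ClockedCell and its advance method are assumptions made for this example and are not part of X10.

```python
class ClockedCell:
    """Hypothetical model of a clocked final datum (not X10 API)."""

    def __init__(self, initial):
        self.current = initial   # value visible in the current phase
        self.pending = None      # value written during this phase, if any
        self.written = False

    def read(self):
        # A read always returns the current-phase value.
        return self.current

    def write(self, value):
        # At most one write per phase; it becomes visible in the next phase.
        if self.written:
            raise RuntimeError("cell already assigned in this phase")
        self.pending = value
        self.written = True

    def advance(self):
        # Stands in for the clock advancing: publish the pending write.
        if self.written:
            self.current = self.pending
        self.pending = None
        self.written = False


cell = ClockedCell(10)
cell.write(cell.read() + 1)   # reads 10, schedules 11 for the next phase
before = cell.read()          # still 10: immutable within a phase
cell.advance()
after = cell.read()           # 11 once the clock has advanced
print(before, after)
```

Because a write never changes what a read in the same phase returns, the order in which activities run inside a phase cannot affect the values they see.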

Clocked final example: Array relaxation

G elements are assigned to at most once in each phase of clock c.

Each activity is registered on c.

int clocked (c) final [0:M-1,0:N-1] G = …;

finish foreach (int i,j in [1:M-2,1:N-2]) clocked (c) {
  for (int p in [0:TimeStep-1]) {
    // Reads return the current value of each cell.
    // The write becomes visible (only) when the clock advances.
    G[i,j] = omega/4*(G[i-1,j]+G[i+1,j]+G[i,j-1]+G[i,j+1])+(1-omega)*G[i,j];
    next; // Wait for the clock to advance.
  }
}

Takeaway: Each cell is assigned a clocked stream of immutable values.
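
A sequential way to picture one clock phase of the relaxation: every interior cell reads the old grid and writes into a fresh grid, which becomes current when the phase ends. This Python sketch is an illustration under that assumption; the function names relax_phase and relax are invented for the example, and no real concurrency is modelled.

```python
def relax_phase(G, omega):
    """One phase: all reads see the old grid, all writes go to the new one."""
    M, N = len(G), len(G[0])
    nxt = [row[:] for row in G]          # boundary cells are carried over
    for i in range(1, M - 1):
        for j in range(1, N - 1):
            nxt[i][j] = (omega / 4 * (G[i-1][j] + G[i+1][j] + G[i][j-1] + G[i][j+1])
                         + (1 - omega) * G[i][j])
    return nxt                           # "clock advances": new values visible


def relax(G, omega, time_steps):
    for _ in range(time_steps):
        G = relax_phase(G, omega)
    return G


grid = [[0.0, 0.0, 0.0],
        [0.0, 4.0, 0.0],
        [0.0, 0.0, 0.0]]
result = relax(grid, 0.5, 1)
print(result[1][1])
```

Since each phase reads only values from the previous phase, the loop order over i and j is irrelevant, which is the sequential counterpart of scheduler independence.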

Imperative Programming Revisited
Variables

Value in a Box

Read: fetch current value

Write: change value

Stability condition: Value does not change unless a write is performed

Very powerful

Permit repeated many-writer, many-reader communication through arbitrary reference graphs

Asynchrony introduces indeterminacy

int x = 0;
async x=1;
print(x);

May write out either 0 or 1.

Reader-reader, reader-writer, writer-writer conflicts.
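
The two possible outcomes can be reproduced by running the write and the print in both orders. The Python sketch below is a simulation, not X10: the helper run and the step names "write" and "print" are assumptions introduced for the example.

```python
def run(schedule):
    """Execute the two steps of the racy program in the given order."""
    state = {"x": 0}                                   # int x = 0;
    printed = []
    steps = {
        "write": lambda: state.update(x=1),            # async x=1;
        "print": lambda: printed.append(state["x"]),   # print(x);
    }
    for name in schedule:
        steps[name]()
    return printed[0]


outcomes = {run(s) for s in (("write", "print"), ("print", "write"))}
print(sorted(outcomes))
```

Both schedules are legal for the original program, so the printed value depends on the scheduler.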

Determinate Concurrent Imperative Frameworks
Asynchronous Kahn networks

Nodes can be thought of as (continuous) functions over streams.

Pop/peek

Push

Node-local state may mutate arbitrarily

Concurrent Constraint Programming

Tell constraints

Ask if a constraint is true

Subsumes Kahn networks (dataflow).

Subsumes determinate concurrent logic programming and lazy functional programming.


Do not support arbitrary mutable variables.
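
The idea that Kahn nodes are functions over streams can be sketched with Python generators. This is a single-threaded, pull-based approximation (the asynchronous execution of a real Kahn network is not modelled), and the node names int_source, scale and add are hypothetical.

```python
from itertools import count, islice


def int_source():
    """Source node: pushes 0, 1, 2, ... onto its output stream."""
    yield from count(0)


def scale(stream, k):
    """One-input node: pops a value and pushes k times that value."""
    for value in stream:
        yield k * value


def add(left, right):
    """Two-input node: needs one value from each input before it can push."""
    for a, b in zip(left, right):
        yield a + b


network = add(scale(int_source(), 2), scale(int_source(), 3))
first = list(islice(network, 4))
print(first)
```

Each node's output depends only on its input streams, so the result does not depend on the order in which nodes are scheduled.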

Determinate Concurrent Imperative Frameworks
Safe Asynchrony (Steele 1991)

Parent may communicate with children.

Children may communicate with parent.

Siblings may communicate with each other only through commutative, associative writes (“commuting writes”).


Good:

int x=0;
finish foreach (int i in 1:N) {
  x += i;
}
print(x); // N*(N+1)/2

Bad:

int x=0;
finish foreach (int i in 1:N) {
  x += i;
  async print(x); // observes an intermediate sum that depends on the schedule
}

Useful but limited. Does not permit dataflow synchronization.
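
Why the first program is determinate and the second is not can be checked by trying every order in which the sibling activities might run. The Python sketch below is a simplified simulation: each sibling is assumed to run atomically, and the interleaved print is assumed to happen immediately after the sibling's own write.

```python
from itertools import permutations

N = 4
finals = set()      # final value of x after the finish
observed = set()    # sequences of values an interleaved print(x) would show

for order in permutations(range(1, N + 1)):
    x = 0
    seen = []
    for i in order:        # one sibling activity per i, in this schedule
        x += i             # commuting write
        seen.append(x)     # what print(x) would observe at this point
    finals.add(x)
    observed.add(tuple(seen))

print(finals)
print(len(observed) > 1)
```

The final sum is the same for every schedule because addition is commutative and associative, while the intermediate values are not.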

The CF Basic Model
A shared variable is a stream of immutable values.

Each activity maintains an index i + clean/dirty bit for every shared variable.

Initially i=1, v[0] contains initial value.

Read: If clean, block until v[i] is written, then return v[i++]; otherwise (dirty) return v[i-1]. In both cases, mark the variable as clean.

Write: Write into v[i++]. Mark as dirty.

A read stutters (returns value in last phase) if no activity can write in this phase.

E.g. for local variables.

World Map=Collection of indices for an activity.

Index transmission rules.

Activity initialized with current world map of parent activity.

On finish, the world map of the activity is joined (least upper bound, lub) with the world maps of the finished activities (clean lub dirty = clean).

All programs are determinate and scheduler independent.

May deadlock … nexts are not conjunctive.


The clock of clocked final is made implicit.
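
The read and write rules can be tried out with a small thread-based sketch in Python. It covers only a single shared variable and omits world-map inheritance, the lub on finish, and stuttering. The names SharedVar, View and run_example are hypothetical, and the sketch assumes that every activity starts with its bit set to dirty, so that its first read returns the initial value v[0]. The scenario is a two-activity program in which one activity reads x twice while the other writes 1 and then 2.

```python
import threading


class SharedVar:
    """A shared variable as a stream of immutable values v[0], v[1], ..."""

    def __init__(self, initial):
        self.values = {0: initial}            # v[0] holds the initial value
        self.cond = threading.Condition()


class View:
    """One activity's index and clean/dirty bit for one shared variable."""

    def __init__(self, var):
        self.var = var
        self.i = 1
        self.dirty = True    # assumption: the initial value is still unread

    def read(self):
        with self.var.cond:
            if self.dirty:
                value = self.var.values[self.i - 1]
            else:
                # Clean: block until some activity has written v[i].
                self.var.cond.wait_for(lambda: self.i in self.var.values)
                value = self.var.values[self.i]
                self.i += 1
            self.dirty = False
            return value

    def write(self, value):
        with self.var.cond:
            self.var.values[self.i] = value
            self.i += 1
            self.dirty = True
            self.var.cond.notify_all()


def run_example(start_writer_first):
    x = SharedVar(0)
    out = []

    def reader():
        v = View(x)
        out.append(v.read())
        out.append(v.read())

    def writer():
        v = View(x)
        v.write(1)
        v.write(2)

    bodies = [writer, reader] if start_writer_first else [reader, writer]
    threads = [threading.Thread(target=body) for body in bodies]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out


print(run_example(True), run_example(False))
```

Whichever thread starts first, the reader sees v[0] and then v[1]; the second write lands in v[2] and is simply not read by this activity.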

CF example: Array relaxation

shared int [0:M-1,0:N-1] G = …;

finish foreach (int i,j in [1:M-2,1:N-2]) {
  for (int p in [0:TimeStep-1]) {
    G[i,j] = omega/4*(G[i-1,j]+G[i+1,j]+G[i,j-1]+G[i,j+1])+(1-omega)*G[i,j];
  }
}

All clock manipulations are implicit.

Some simple examples

shared int x=0;
finish {
  async {int r1 = x; int r2 = x; println(r1); println(r2);}
  async {x=1;x=2;}
}

Output:
0
1

Only one result – independent of the scheduler!

Some simple examples

shared int x=0;
finish {
  async {int r1 = x; int r2 = x; println(r1); println(r2);}
  async {x=1;}
  async {x=1; int r3 = x; async {x=2;}}
}
println(x);

Output:
0
1
2

All programs are determinate.

Some StreamIt examples

StreamIt:

void->void pipeline Minimal {
  add IntSource;
  add IntPrinter;
}
void->int filter IntSource {
  int x;
  init {x=0;}
  work push 1 { push(x++);}
}
int->void filter IntPrinter {
  work pop 1 { print(pop());}
}

X10/CF:

shared int x=0;
async while (true) x++;
async while (true) println(x);

Output:
0
1

The communication is through assignment to x, so the same result is obtained with:

shared int x=0;
async while (true) ++x;
async while (true) println(x);

Output:
0
1

Each shared variable is a multi-reader, multi-writer stream.

Some StreamIt examples: Fibonacci

shared int x=1, y=1;
async while (true) y=x;   // Activity 1
async while (true) x+=y;  // Activity 2

Can express any recursive, asynchronous Kahn network.
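
One reading of the two activities under the CF read/write rules, assuming both start dirty on x and y: the k-th write of Activity 1 is y[k] = x[k-1], and the k-th write of Activity 2 is x[k] = x[k-1] + y[k-1]. The Python sketch below computes the two streams directly from that recurrence; fib_streams is a hypothetical helper and no concurrency is simulated.

```python
def fib_streams(steps):
    """Build the value streams of x and y from the assumed recurrence."""
    x = [1]   # v[0] of x
    y = [1]   # v[0] of y
    for k in range(1, steps + 1):
        y.append(x[k - 1])               # Activity 1: y = x
        x.append(x[k - 1] + y[k - 1])    # Activity 2: x += y
    return x, y


x, y = fib_streams(5)
print(x)
print(y)
```

The stream of x is 1, 2, 3, 5, 8, 13 and y trails it by one step, which is the Fibonacci sequence expressed as a two-node feedback network.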

StreamIt examples: Moving Average

StreamIt:

void->void pipeline MovingAverage {
  add IntSource();
  add Averager(10);
  add IntPrinter();
}
int->int filter Averager(int n) {
  work pop 1 push 1 peek n {
    int sum=0;
    for (int i=0; i < n; i++)
      sum += peek(i);
    push(sum/n);
    pop();
  }
}

X10/CF:

shared int y=0;
shared int x=0; async while (true) x++;
async while (true) {
  int sum=x;
  for (int i in 1:N-1) sum += peek(x, i);
  y = sum/N;
}

  • peek(x, i) reads the i’th future value, without popping it. Blocks if necessary.
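
The effect of combining one consuming read with N-1 peeks is a sliding window over the stream of x. The Python sketch below uses a finite list in place of the stream and assumes that peek(x, i) is indexed relative to the value just read; moving_average is a hypothetical helper.

```python
def moving_average(x, n):
    """k-th output is the integer mean of x[k], x[k+1], ..., x[k+n-1]."""
    out = []
    for k in range(len(x) - n + 1):
        total = x[k]                   # int sum = x;  (consumes one value)
        for i in range(1, n):
            total += x[k + i]          # sum += peek(x, i);  (does not consume)
        out.append(total // n)         # y = sum/N;  (integer division)
    return out


ys = moving_average(list(range(6)), 3)
print(ys)
```

In the streaming version the loop would block at a peek until the producer has written far enough ahead; the finite list simply stops when the window no longer fits.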
StreamIt examples: Bandpass filter

StreamIt:

float->float pipeline BandPassFilter(float rate, float low, float high, int taps) {
  add BPFCore(rate, low, high, taps);
  add Subtracter();
}
float->float splitjoin BPFCore(float rate, float low, float high, int taps) {
  split duplicate;
  add LowPass(rate, low, taps, 0);
  add LowPass(rate, high, taps, 0);
  join roundrobin;
}
float->float filter Subtracter {
  work pop 2 push 1 {
    push(peek(1)-peek(0));
    pop(); pop();
  }
}

X10/CF:

float bandPassFilter(float rate, float low, float high, int taps, int in) {
  int tmp=in;
  shared int in1=tmp, in2=tmp;
  async while (true) in1=in;
  async while (true) in2=in;
  shared int o1 = lowPass(rate, low, taps, 0, in1),
             o2 = lowPass(rate, high, taps, 0, in2);
  shared int o = o1-o2;
  async while (true) o = o1-o2;
  return o;
}

Functions return streams.

Cannon matrix multiplication

Parameters whose values are finalized.

<final int N> void canon (double[N,N] c, double[N,N] a, double[N,N] b) {
  // Initial alignment: rotate row i of a by i, column j of b by j.
  finish foreach (int i,j in [0:N-1,0:N-1]) {
    a[i,j] = a[i,(j+i) % N];
    b[i,j] = b[(i+j) % N, j];
  }
  for (int k in [0:N-1])
    finish foreach (int i,j in [0:N-1,0:N-1]) {
      c[i,j] = c[i,j] + a[i,j]*b[i,j];
      a[i,j] = a[i,(j+1) % N];
      b[i,j] = b[(i+1) % N, j];
    }
}

Local variables in each activity.

The natural sequential program works (for finish foreach).
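
The shifting scheme can be checked sequentially. The Python sketch below assumes the standard initial alignment of Cannon's algorithm (row i of a rotated left by i positions, column j of b rotated up by j positions) and models each finish foreach as "read the old arrays, build new ones". The function names cannon and naive are hypothetical, and c is assumed to start at zero.

```python
def cannon(a, b):
    """Multiply two n-by-n matrices using Cannon-style cyclic shifts."""
    n = len(a)
    # Initial alignment; every read uses the old array, as in one foreach phase.
    a = [[a[i][(j + i) % n] for j in range(n)] for i in range(n)]
    b = [[b[(i + j) % n][j] for j in range(n)] for i in range(n)]
    c = [[0] * n for _ in range(n)]
    for _ in range(n):
        c = [[c[i][j] + a[i][j] * b[i][j] for j in range(n)] for i in range(n)]
        a = [[a[i][(j + 1) % n] for j in range(n)] for i in range(n)]   # shift left
        b = [[b[(i + 1) % n][j] for j in range(n)] for i in range(n)]   # shift up
    return c


def naive(a, b):
    """Reference triple-loop product used to check the result."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(cannon(A, B))
```

Building a fresh array in each step plays the role of the CF rule that a read returns the value from the previous phase, so the in-place looking assignments never see a neighbour's new value.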

Histogram

<int N> [1:N][] histogram([1:N][] A) {
  final int[] B = new int [1:N];
  finish foreach(int i in A) B[A[i]]++;
  return B;
}

  • Permit “commuting” writes to be performed simultaneously in the same phase.
  • Phase is completed when all activities that can write have written.

B’s phase is not yet complete. A subsequent read will complete it.

Cilk programs with races

int x;
cilk void foo() {
  x = x + 1;
}
cilk int main() {
  x=0;
  spawn foo();
  spawn foo();
  sync;
  printf("x is %d\n", x);
  return 0;
}

Determinate: Will always print 1 in CF.

CF smoothly combines Cilk and StreamIt.

Implementation
Each activity’s world map increases monotonically with time.

Use garbage collection to erase past unreachable values.

Programs with no sibling communication may be executed in buffers with unit windows.

We are considering letting the user specify bounds on variables (compare the push/pop specifications in StreamIt).

This will force writes to become blocking as well.


Scheduling strategy affects size of buffers, not result.

Formalization
MJ/CF

Very straightforward additions to field read/write.

Paper contains details.


Surprisingly localized.

Future work
  • Paper contains ideas on detecting deadlock (stabilities) at runtime and recovering from them.
    • Programmability being investigated.
  • Implementation.
    • Leverage connection with StreamIt, and static scheduling.
  • Coarser granularity for indices.
    • Use same clock for many variables.
    • Permits “coordinated” changes to multiple variables.