Cid
CS498LVK, 4 April 2006
Aaron Becker, Abhinav Bhatele, Isaac Dooley
Overview
• Model: MIMD threads with lock-protected shared data
• Intended to be similar to SMP threaded programs
• Preprocessor for standard C compiler
Cid is C With…
• Global pointers
    global int* n;
    cid_get(n, CID_READ);
    cid_rel(n);
• Threads spawned on remote processors
    cid_fork(jv; ) do_work();
    cid_jwait(&jv);
• More stuff we’ll talk about later
    typedef struct node_s {
        int info;
        global struct node_s* left;
        global struct node_s* right;
    } node;

    cid_forkable global node* build_tree(int d)
    {
        node* nodep;
        cid_jvar jv = CID_JVAR_INITIAL;

        if (d == 0)
            return CID_NULL;
        else {
            nodep = (node*) malloc(sizeof(node));
            /* build both subtrees in forked threads */
            cid_fork(jv; ) nodep->left = build_tree(d-1);
            cid_fork(jv; ) nodep->right = build_tree(d-1);
            nodep->info = /* compute node info */;
            cid_jwait(&jv);   /* wait for both forks */
            return cid_to_gptr(nodep);
        }
    }
    cid_forkable int sum_tree(global node* nodep)
    {
        int i, s1, s2;
        cid_jvar jv = CID_JVAR_INITIAL;

        if (nodep == CID_NULL)
            return 0;
        else {
            cid_get(nodep, CID_READ);
            cid_fork(jv; ) s1 = sum_tree(nodep->left);
            cid_fork(jv; ) s2 = sum_tree(nodep->right);
            i = nodep->info;
            cid_rel(nodep);
            cid_jwait(&jv);
            return i + s1 + s2;
        }
    }

What’s wrong with this approach?
    cid_forkable int sum_tree(global node* nodep)
    {
        int i, s1, s2;
        cid_jvar jv = CID_JVAR_INITIAL;

        if (nodep == CID_NULL)
            return 0;
        else {
            cid_get(nodep, CID_READ);
            cid_fork(jv; cid_to_pe(nodep->left)) s1 = sum_tree(nodep->left);
            cid_fork(jv; cid_to_pe(nodep->right)) s2 = sum_tree(nodep->right);
            i = nodep->info;
            cid_rel(nodep);
            cid_jwait(&jv);
            return i + s1 + s2;
        }
    }
    /* Assumes node also carries a mark field, which the struct shown earlier omits. */
    cid_forkable int sum_graph(global node* nodep)
    {
        int i, s1, s2;
        cid_jvar jv = CID_JVAR_INITIAL;

        if (nodep == CID_NULL)
            return 0;
        else {
            cid_get(nodep, CID_WRITE);
            if (nodep->mark == TRUE) {
                /* already visited: release and contribute nothing */
                cid_rel(nodep);
                return 0;
            } else {
                nodep->mark = TRUE;
                cid_fork(jv; cid_to_pe(nodep->left)) s1 = sum_graph(nodep->left);
                cid_fork(jv; cid_to_pe(nodep->right)) s2 = sum_graph(nodep->right);
                i = nodep->info;
                cid_rel(nodep);
                cid_jwait(&jv);
                return i + s1 + s2;
            }
        }
    }
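The marking logic in sum_graph can be checked in isolation with a sequential sketch in plain C. This is not Cid code: the type gnode and the function sum_graph_seq are hypothetical names, there are no global pointers, forks, or locks, and the only point is that the mark test makes each node count once even when it is reachable along several paths or through a cycle.

```c
#include <stddef.h>

/* Hypothetical plain-C node: like the slide's node, plus a mark flag. */
typedef struct gnode {
    int info;
    int mark;                    /* 0 = not yet visited */
    struct gnode *left, *right;
} gnode;

/* Sequential analogue of sum_graph: a node that is already marked
 * contributes 0, so shared nodes and cycles are summed only once. */
int sum_graph_seq(gnode *n)
{
    if (n == NULL || n->mark)
        return 0;
    n->mark = 1;                 /* mark before descending, as the slide does */
    return n->info + sum_graph_seq(n->left) + sum_graph_seq(n->right);
}
```

In the Cid version the test and the update of mark happen while the node is held with CID_WRITE, which is what keeps two threads from both seeing an unmarked node.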
Automatic Load Balancing
• On fork, if destination PE not specified, runtime attempts to choose underutilized processor
• Work-stealing scheduler attempts to balance load dynamically
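The slide does not say how the runtime decides which processor is underutilized. As one possible reading, assuming each PE exposes a count of queued work, a fork without an explicit destination could go to the PE with the smallest count. The function choose_pe and the load array below are hypothetical and are not part of Cid.

```c
#include <assert.h>
#include <stddef.h>

/* Return the index of the PE with the smallest queued-work count.
 * Ties go to the lowest-numbered PE. Both parameters are assumptions
 * made for this sketch, not Cid runtime structures. */
size_t choose_pe(const unsigned load[], size_t npes)
{
    assert(npes > 0);
    size_t best = 0;
    for (size_t pe = 1; pe < npes; pe++)
        if (load[pe] < load[best])
            best = pe;
    return best;
}
```

A placement rule like this only acts at fork time; the work-stealing scheduler mentioned on the slide is what corrects imbalances that appear afterwards.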
Accumulators
    for (j=0; j<N; j++)
        cid_fork(…) results[j] = f(…);
    s = 0;
    for (j=0; j<N; j++)
        s += results[j];
Accumulators
    s = 0;
    for (j=0; j<N; j++)
        cid_fork(…) s += f(…);
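The second loop has every forked thread updating the same s, so the update needs protection of some kind. The slides do not show how Cid implements accumulators; the sketch below uses plain C with POSIX threads and a mutex to illustrate the idea. The names s_lock, worker, and accumulate are hypothetical, and f is a stand-in for the slide's f(…).

```c
#include <pthread.h>

#define N 8

/* Shared accumulator and the lock that protects it (pthreads, not Cid). */
static long s;
static pthread_mutex_t s_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical stand-in for the slide's f(...). */
static long f(long j) { return j * j; }

static void *worker(void *arg)
{
    long v = f(*(const long *)arg);   /* compute outside the lock */
    pthread_mutex_lock(&s_lock);
    s += v;                           /* only the update is serialized */
    pthread_mutex_unlock(&s_lock);
    return NULL;
}

/* Start up to N threads that each add f(j) into s, then join them all.
 * The join loop plays the role that cid_jwait plays on the slides.
 * If a thread cannot be created, the result is a partial sum. */
long accumulate(void)
{
    pthread_t t[N];
    long idx[N];
    int started = 0;

    s = 0;
    for (long j = 0; j < N; j++) {
        idx[j] = j;
        if (pthread_create(&t[started], NULL, worker, &idx[j]) != 0)
            break;
        started++;
    }
    for (int k = 0; k < started; k++)
        pthread_join(t[k], NULL);
    return s;
}
```

With f(j) = j*j and N = 8, accumulate() returns 140 regardless of the order in which the threads run, and no results array is needed.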
Distributed Arrays
• Similar to HPF
    cid_alloc_2d(&jv, &gp, NI, NJ, sizeof_elem,
                 distrib,         /* block, cyclic, etc. */
                 block_factor);   /* size of dist. unit */
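For readers unfamiliar with HPF-style distributions, the sketch below shows in plain C how an index along one dimension maps to an owning processor under block, cyclic, and block-cyclic schemes. The function names are hypothetical, and how Cid's distrib and block_factor arguments select among these schemes is an assumption based only on the slide's annotations.

```c
#include <assert.h>

/* Block: each of the p PEs owns one contiguous chunk of ceil(n/p) indices. */
int owner_block(int i, int n, int p)
{
    assert(p > 0 && i >= 0 && i < n);
    int chunk = (n + p - 1) / p;
    return i / chunk;
}

/* Cyclic: indices are dealt to PEs one at a time, round-robin. */
int owner_cyclic(int i, int p)
{
    assert(p > 0 && i >= 0);
    return i % p;
}

/* Block-cyclic: units of block_factor consecutive indices are dealt
 * round-robin; block_factor is the size of the distribution unit. */
int owner_block_cyclic(int i, int block_factor, int p)
{
    assert(p > 0 && block_factor > 0 && i >= 0);
    return (i / block_factor) % p;
}
```

For example, with 10 indices on 4 PEs, block distribution places index 9 on PE 3, while cyclic distribution places it on PE 1.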