
Cid




  1. Cid. CS498LVK, 4 April 2006. Aaron Becker, Abhinav Bhatele, Isaac Dooley

  2. Overview
     • Model: MIMD threads with lock-protected shared data
     • Intended to be similar to SMP threaded programs
     • Preprocessor for standard C compiler

  3. Cid is C With…
     • Global pointers

         global int* n;
         cid_get(n, CID_READ);
         cid_rel(n);

     • Threads spawned on remote processors

         cid_fork(jv; )
             do_work();
         cid_jwait(&jv);

     • More stuff we’ll talk about later

  4. Building a tree

         typedef struct node_s {
             int info;
             global struct node_s* left;
             global struct node_s* right;
         } node;

         cid_forkable global node* build_tree(int d)
         {
             node* nodep;
             cid_jvar jv = CID_JVAR_INITIAL;

             if (d == 0)
                 return CID_NULL;
             else {
                 nodep = (node*) malloc(sizeof(node));
                 cid_fork(jv; ) nodep->left = build_tree(d-1);
                 cid_fork(jv; ) nodep->right = build_tree(d-1);
                 nodep->info = /* compute node info */;
                 cid_jwait(&jv);
                 return cid_to_gptr(nodep);
             }
         }

  5. Summing a tree (first attempt)

         cid_forkable int sum_tree(global node* nodep)
         {
             int i, s1, s2;
             cid_jvar jv = CID_JVAR_INITIAL;

             if (nodep == CID_NULL)
                 return 0;
             else {
                 cid_get(nodep, CID_READ);
                 cid_fork(jv; ) s1 = sum_tree(nodep->left);
                 cid_fork(jv; ) s2 = sum_tree(nodep->right);
                 i = nodep->info;
                 cid_rel(nodep);
                 cid_jwait(&jv);
                 return i + s1 + s2;
             }
         }

     What’s wrong with this approach?

  6. Summing a tree (revised): each cid_fork now names a destination PE, cid_to_pe(...) of the child pointer.

         cid_forkable int sum_tree(global node* nodep)
         {
             int i, s1, s2;
             cid_jvar jv = CID_JVAR_INITIAL;

             if (nodep == CID_NULL)
                 return 0;
             else {
                 cid_get(nodep, CID_READ);
                 cid_fork(jv; cid_to_pe(nodep->left))  s1 = sum_tree(nodep->left);
                 cid_fork(jv; cid_to_pe(nodep->right)) s2 = sum_tree(nodep->right);
                 i = nodep->info;
                 cid_rel(nodep);
                 cid_jwait(&jv);
                 return i + s1 + s2;
             }
         }
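The fork/join shape of `sum_tree` can be tried out on a shared-memory machine with POSIX threads. The following is a minimal sketch under stated assumptions, not a translation of Cid: there are no global pointers and no PE placement here, so it shows only the "fork both children, then wait" structure. The names `tnode`, `build`, `sum_tree_threads`, and the depth cutoff of 2 are all hypothetical choices for illustration.

```c
#include <pthread.h>
#include <stdlib.h>

typedef struct tnode {
    int info;
    struct tnode *left, *right;
} tnode;

/* Build a complete tree of depth d in which every node holds 1. */
tnode *build(int d) {
    if (d == 0) return NULL;
    tnode *n = malloc(sizeof *n);
    if (n == NULL) abort();
    n->info = 1;
    n->left = build(d - 1);
    n->right = build(d - 1);
    return n;
}

/* Argument/result record handed to a forked thread. */
typedef struct { tnode *root; int par_depth; int sum; } task;

static int sum_tree_par(tnode *n, int par_depth);

static void *sum_task(void *p) {
    task *t = p;
    t->sum = sum_tree_par(t->root, t->par_depth);
    return NULL;
}

/* Fork a thread for the left subtree while the current thread handles
   the right one; below par_depth levels, recurse sequentially. */
static int sum_tree_par(tnode *n, int par_depth) {
    if (n == NULL) return 0;
    if (par_depth <= 0)
        return n->info + sum_tree_par(n->left, 0) + sum_tree_par(n->right, 0);

    task lt = { n->left, par_depth - 1, 0 };
    pthread_t th;
    int forked = pthread_create(&th, NULL, sum_task, &lt) == 0;
    if (!forked) sum_task(&lt);               /* fall back to running inline */

    int s2 = sum_tree_par(n->right, par_depth - 1);
    if (forked) pthread_join(th, NULL);       /* wait before reading lt.sum */
    return n->info + lt.sum + s2;
}

int sum_tree_threads(tnode *root) { return sum_tree_par(root, 2); }
```

For example, `sum_tree_threads(build(4))` walks a 15-node tree. The depth cutoff exists because creating one OS thread per node would be far too expensive.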

  7. Summing a graph (assumes node also has a mark field, which the struct on slide 4 does not show)

         cid_forkable int sum_graph(global node* nodep)
         {
             int i, s1, s2;
             cid_jvar jv = CID_JVAR_INITIAL;

             if (nodep == CID_NULL)
                 return 0;
             else {
                 cid_get(nodep, CID_WRITE);
                 if (nodep->mark == TRUE) {
                     cid_rel(nodep);
                     return 0;
                 }
                 else {
                     nodep->mark = TRUE;
                     cid_fork(jv; cid_to_pe(nodep->left))  s1 = sum_graph(nodep->left);
                     cid_fork(jv; cid_to_pe(nodep->right)) s2 = sum_graph(nodep->right);
                     i = nodep->info;
                     cid_rel(nodep);
                     cid_jwait(&jv);
                     return i + s1 + s2;
                 }
             }
         }

  8. Automatic Load Balancing
     • On fork, if destination PE not specified, runtime attempts to choose underutilized processor
     • Work-stealing scheduler attempts to balance load dynamically
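To make the first bullet concrete, here is a small sketch of one way a runtime could pick an underutilized processor when a fork names no destination. It is an assumption for illustration, not Cid's actual policy: load is modeled as a per-PE count of queued tasks, and `pick_pe` and `place_tasks` are hypothetical names. Work stealing (the second bullet) is not modeled.

```c
#include <stddef.h>

/* Return the index of the PE with the fewest queued tasks.
   Ties go to the lowest index; returns -1 if there are no PEs. */
int pick_pe(const int *load, size_t npes) {
    if (npes == 0) return -1;
    size_t best = 0;
    for (size_t pe = 1; pe < npes; pe++)
        if (load[pe] < load[best]) best = pe;
    return (int)best;
}

/* Place ntasks forks that have no explicit destination: each one goes
   to whichever PE is least loaded at that moment. */
void place_tasks(int *load, size_t npes, int ntasks) {
    for (int t = 0; t < ntasks; t++) {
        int pe = pick_pe(load, npes);
        if (pe < 0) return;
        load[pe]++;
    }
}
```

A greedy choice like this evens out queue lengths at fork time only. A work-stealing scheduler complements it by letting idle processors take work after the fact.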

  9. Accumulators

         for (j=0; j<N; j++)
             cid_fork(…) results[j] = f(…);
         s = 0;
         for (j=0; j<N; j++)
             s += results[j];

  10. Accumulators

         s = 0;
         for (j=0; j<N; j++)
             cid_fork(…) s += f(…);
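The second accumulator form only works if concurrent `s += …` updates do not interfere with each other. In plain threaded C that protection has to be written out, as in the sketch below. This is a POSIX threads analogy, not Cid: it assumes the accumulator behaves like a mutex-protected add, `f` is a hypothetical stand-in for the elided `f(…)`, and `NTASKS` and `accumulate` are names chosen for illustration.

```c
#include <pthread.h>
#include <stddef.h>

#define NTASKS 8

static int s = 0;
static pthread_mutex_t s_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical stand-in for f(...) on the slide. */
static int f(int j) { return j * j; }

/* Each worker computes its value outside the lock and holds the lock
   only for the update of the shared sum. */
static void *worker(void *arg) {
    int j = *(int *)arg;
    int v = f(j);
    pthread_mutex_lock(&s_lock);
    s += v;
    pthread_mutex_unlock(&s_lock);
    return NULL;
}

/* Fork NTASKS workers, wait for all of them, and return the sum. */
int accumulate(void) {
    pthread_t th[NTASKS];
    int idx[NTASKS];
    int started[NTASKS];

    s = 0;
    for (int j = 0; j < NTASKS; j++) {
        idx[j] = j;
        started[j] = pthread_create(&th[j], NULL, worker, &idx[j]) == 0;
        if (!started[j]) worker(&idx[j]);   /* run inline if the thread could not start */
    }
    for (int j = 0; j < NTASKS; j++)
        if (started[j]) pthread_join(th[j], NULL);
    return s;
}
```

Compared with the `results[j]` version on the previous slide, this needs no temporary array and no second loop, at the cost of serializing the updates to `s`.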

  11. Distributed Arrays
     • Similar to HPF

         cid_alloc_2d(&jv, &gp, NI, NJ, sizeof_elem, distrib, block_factor);

     • distrib: block, cyclic, etc.
     • block_factor: size of dist. unit
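The difference between block and cyclic distributions is easiest to see as an index-to-owner mapping. The sketch below shows the usual HPF-style formulas in one dimension. It is an illustration under assumptions, not the layout `cid_alloc_2d` actually uses: `owner_block` and `owner_cyclic` are hypothetical names, and all sizes are assumed to be nonzero.

```c
#include <stddef.h>

/* Block distribution: n elements are split into contiguous chunks of
   ceil(n / npes) elements, one chunk per PE. Requires n > 0, npes > 0. */
size_t owner_block(size_t i, size_t n, size_t npes) {
    size_t chunk = (n + npes - 1) / npes;
    return i / chunk;
}

/* Block-cyclic distribution: units of block_factor consecutive elements
   are dealt to PEs round-robin. A block_factor of 1 gives a pure cyclic
   distribution. Requires block_factor > 0, npes > 0. */
size_t owner_cyclic(size_t i, size_t block_factor, size_t npes) {
    return (i / block_factor) % npes;
}
```

With 10 elements on 4 PEs, the block mapping puts elements 0 to 2 on PE 0 and element 9 on PE 3, while the cyclic mapping with a block factor of 1 puts element 5 on PE 1. A 2D array would apply such a mapping per dimension.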
