
CRL (C Region Library)

CRL (C Region Library). Chao Huang, James Brodman, Hassan Jafri (CS498LVK). CRL is an all-software distributed shared memory (DSM) system, built on PVM, that provides a shared address space. A “region” is an arbitrarily sized, contiguous area of memory, and local nodes hold consistent cached copies.


Presentation Transcript


  1. CRL (C Region Library) Chao Huang, James Brodman, Hassan Jafri CS498LVK

  2. Introduction
     • CRL is an all-software distributed shared memory (DSM) system
     • Provides a shared address space
     • Built on PVM
     • “Region”: an arbitrarily sized, contiguous area of memory
     • Consistent cached copies at local nodes

  3. Functions
     • Environment
         • crl_init
         • crl_num_nodes, crl_self_addr
     • Basic region operations
         • rid_t rgn_create(unsigned size)
         • void rgn_destroy(rid_t rgn_id)
         • rid_t rgn_rid(void *rgn)
         • unsigned rgn_size(void *rgn)
         • void rgn_flush(void *rgn)

  4. Functions
     • Region mapping
         • void *rgn_map(rid_t rgn_id)
         • void rgn_unmap(void *rgn)
     • Region read and write
         • void rgn_start_read(void *rgn)
         • void rgn_end_read(void *rgn)
         • void rgn_start_write(void *rgn)
         • void rgn_end_write(void *rgn)
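The calls on slides 3 and 4 imply a fixed order: create a region, map it, bracket each access with a start/end pair, then unmap. The sketch below illustrates that order with a toy, single-process stand-in for the listed functions. It is not CRL: there is no networking or coherence, the `rid_t` typedef, the `struct region` layout, `MAX_REGIONS`, and the `fill_and_sum` helper are all assumptions made for illustration, and where a real DSM would presumably block on a conflicting access, this stand-in simply asserts.

```c
/* Toy single-process stand-in for the calls listed on the slides.
 * Assumption: this is NOT CRL. It only does the bookkeeping needed
 * to show the expected call order. */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef int rid_t;              /* assumption: the real rid_t is opaque */
#define MAX_REGIONS 16          /* hypothetical limit for this toy */

struct region {
    unsigned size;              /* payload size in bytes */
    int maps;                   /* outstanding rgn_map calls */
    int readers;                /* open read operations */
    int writing;                /* 1 while a write operation is open */
    max_align_t data[];         /* payload, aligned for any type */
};

static struct region *table[MAX_REGIONS];
static int n_regions;

/* Recover the bookkeeping header from a mapped payload pointer. */
static struct region *hdr(void *rgn) {
    return (struct region *)((char *)rgn - offsetof(struct region, data));
}

rid_t rgn_create(unsigned size) {
    assert(n_regions < MAX_REGIONS);
    struct region *r = calloc(1, offsetof(struct region, data) + size);
    assert(r != NULL);
    r->size = size;
    table[n_regions] = r;
    return n_regions++;
}

void rgn_destroy(rid_t id) {
    assert(id >= 0 && id < n_regions && table[id] != NULL);
    assert(table[id]->maps == 0);       /* must be unmapped first */
    free(table[id]);
    table[id] = NULL;
}

void *rgn_map(rid_t id) {
    assert(id >= 0 && id < n_regions && table[id] != NULL);
    table[id]->maps++;
    return table[id]->data;
}

void rgn_unmap(void *rgn) { assert(hdr(rgn)->maps > 0); hdr(rgn)->maps--; }
unsigned rgn_size(void *rgn) { return hdr(rgn)->size; }

/* Many readers or one writer; the toy asserts instead of blocking. */
void rgn_start_read(void *rgn) { assert(!hdr(rgn)->writing); hdr(rgn)->readers++; }
void rgn_end_read(void *rgn) { assert(hdr(rgn)->readers > 0); hdr(rgn)->readers--; }
void rgn_start_write(void *rgn) {
    assert(hdr(rgn)->readers == 0 && !hdr(rgn)->writing);
    hdr(rgn)->writing = 1;
}
void rgn_end_write(void *rgn) { assert(hdr(rgn)->writing); hdr(rgn)->writing = 0; }

/* Hypothetical helper: full lifecycle of one region holding n doubles. */
double fill_and_sum(int n) {
    rid_t id = rgn_create((unsigned)n * sizeof(double));
    double *v = rgn_map(id);

    rgn_start_write(v);                 /* writes only inside a write op */
    for (int i = 0; i < n; i++) v[i] = i + 1;
    rgn_end_write(v);

    double sum = 0.0;
    rgn_start_read(v);                  /* reads only inside a read op */
    for (int i = 0; i < n; i++) sum += v[i];
    rgn_end_read(v);

    rgn_unmap(v);
    rgn_destroy(id);
    return sum;
}
```

Calling `fill_and_sum(4)` walks through every stage once; the asserts inside the stand-ins fire if an access falls outside its start/end bracket or a region is destroyed while still mapped.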

  5. Functions
     • Global synchronization
         • void rgn_barrier(void)
         • void rgn_bcast_send(int len, void *buf)
         • void rgn_bcast_recv(int len, void *buf)
         • double rgn_reduce_dadd(double arg)
         • double rgn_reduce_dmin(double arg)
         • double rgn_reduce_dmax(double arg)

  6. Example

     /* Compute the dot product of
      * two n-element vectors, each
      * of which is represented by
      * appropriately-sized region
      * x: region identifier for 1st vector
      * y: address at which 2nd vector is already mapped
      */
     double dotprod(rid_t x, double *y, int n)
     {
         int i;
         double *z;
         double rslt;

         /* map 1st vector and initiate read operation */
         z = (double *) rgn_map(x);
         rgn_start_read(z);

         /* initiate read operation on 2nd vector */
         rgn_start_read(y);

         /* compute dot product */
         rslt = 0;
         for (i = 0; i < n; i++)
             rslt += z[i] * y[i];

         /* terminate read operations and unmap 1st vector */
         rgn_end_read(y);
         rgn_end_read(z);
         rgn_unmap(z);

         return rslt;
     }

  7. Discussions
     • All-software: communication latency may be higher than in a hardware-based system
     • Region size can be chosen to match user data structures (the programmer’s responsibility)
     • Fixed-home, directory-based invalidate protocol
     • Ordered message delivery: a 32-bit version number tags each region
     • Unmapped region cache (URC): a region’s mapping can stay cached after the region is unmapped
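To make "fixed-home, directory-based invalidate protocol" concrete, here is a minimal single-process simulation of one region. It is a sketch under stated assumptions, not CRL's actual protocol: the state names, the `struct sim` layout, the four-node limit, and the `sim_*` functions are all hypothetical. In particular, the slides only say that a 32-bit version number tags each region; using it to drop stale invalidations is one plausible reading.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_NODES 4                       /* hypothetical cluster size */

enum state { INVALID, SHARED, EXCLUSIVE };

struct home_dir {                         /* kept at the region's fixed home */
    uint32_t version;                     /* bumped on every write grant */
    unsigned sharers;                     /* bitmask of nodes with read copies */
    int owner;                            /* node with exclusive copy, or -1 */
};

struct node_copy { enum state st; uint32_t version; };

struct sim {
    struct home_dir dir;
    struct node_copy copy[NUM_NODES];
};

void sim_init(struct sim *s) {
    s->dir.version = 0;
    s->dir.sharers = 0;
    s->dir.owner = -1;
    for (int n = 0; n < NUM_NODES; n++) {
        s->copy[n].st = INVALID;
        s->copy[n].version = 0;
    }
}

/* Deliver an invalidation tagged with msg_version to a node.
 * Returns 1 if applied, 0 if dropped as stale (assumed use of the tag). */
int sim_invalidate(struct sim *s, int node, uint32_t msg_version) {
    if (msg_version <= s->copy[node].version)
        return 0;                         /* copy is newer than the message */
    s->copy[node].st = INVALID;
    return 1;
}

/* Start a read. Returns 1 if the home node had to be contacted. */
int sim_read(struct sim *s, int node) {
    if (s->copy[node].st != INVALID)
        return 0;                         /* satisfied from the local copy */
    if (s->dir.owner >= 0) {              /* writer keeps a read-only copy */
        int o = s->dir.owner;
        s->copy[o].st = SHARED;
        s->dir.sharers |= 1u << o;
        s->dir.owner = -1;
    }
    s->dir.sharers |= 1u << node;
    s->copy[node].st = SHARED;
    s->copy[node].version = s->dir.version;
    return 1;
}

/* Start a write. Returns the number of invalidations applied remotely. */
int sim_write(struct sim *s, int node) {
    int applied = 0;
    if (s->copy[node].st == EXCLUSIVE)
        return 0;                         /* already the sole owner */
    s->dir.version++;                     /* new version for this grant */
    for (int n = 0; n < NUM_NODES; n++) {
        if (n == node)
            continue;
        if (((s->dir.sharers >> n) & 1u) || s->dir.owner == n)
            applied += sim_invalidate(s, n, s->dir.version);
    }
    s->dir.sharers = 0;
    s->dir.owner = node;
    s->copy[node].st = EXCLUSIVE;
    s->copy[node].version = s->dir.version;
    return applied;
}
```

In this model, if nodes 0 and 1 read the region and node 2 then writes it, both read copies are invalidated and the version advances by one. A duplicate of that invalidation arriving after node 0 has re-read the region is dropped, because node 0's copy already carries the newer version.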

  8. URC (Unmapped Region Cache)
     • Enables Lazy Release Consistency for CRL
     • rgn_start_op can be satisfied locally if the region is not invalidated before the next time it is mapped
     • Even if the data/region is invalidated, later accesses can be satisfied more quickly
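The idea of keeping a mapping around after rgn_unmap can be sketched as a small lookup table that is consulted before doing a full map. This is only an illustration: the slides do not give the URC's size or replacement policy, so the four-slot capacity, the round-robin eviction, and the `urc_put` / `urc_take` names are assumptions. A real URC would also carry the region's cached data and coherence state, which this sketch omits.

```c
#include <assert.h>
#include <stddef.h>

#define URC_SLOTS 4                 /* hypothetical capacity */

struct urc_entry {
    int rid;                        /* region identifier */
    void *addr;                     /* local address of the cached mapping */
    int valid;
};

struct urc {
    struct urc_entry slot[URC_SLOTS];
    unsigned next;                  /* next victim when the cache is full */
    int hits, misses;
};

/* Called on unmap: remember where the region was mapped. */
void urc_put(struct urc *u, int rid, void *addr) {
    unsigned i;
    for (i = 0; i < URC_SLOTS; i++)
        if (!u->slot[i].valid)
            break;                  /* prefer an empty slot */
    if (i == URC_SLOTS) {           /* full: evict round-robin (assumed policy) */
        i = u->next;
        u->next = (u->next + 1) % URC_SLOTS;
    }
    u->slot[i].rid = rid;
    u->slot[i].addr = addr;
    u->slot[i].valid = 1;
}

/* Called on map: reuse a cached mapping if present. The entry leaves the
 * URC while the region is mapped. Returns NULL on a miss, in which case
 * the caller would fall back to a full map. */
void *urc_take(struct urc *u, int rid) {
    for (unsigned i = 0; i < URC_SLOTS; i++) {
        if (u->slot[i].valid && u->slot[i].rid == rid) {
            u->slot[i].valid = 0;
            u->hits++;
            return u->slot[i].addr;
        }
    }
    u->misses++;
    return NULL;
}
```

A map path built on this would call `urc_take` first and contact the home node only on a miss, and the unmap path would call `urc_put` instead of discarding the mapping.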

  9. Software
     • Prototype implementation available
     • Platforms
         • Thinking Machines CM-5 (message-passing multicomputer)
         • Alewife (distributed-memory multiprocessor); provides native shared-memory support
         • TCP/Unix implementation for SunOS
     • A Linux port is expected soon

  10. Machine Characteristics

  11. Basic Ops Latencies

  12. Applications
     • On 32 processors, completion times of applications using CRL on Alewife are comparable to those using Alewife’s native shared memory
     • How? Up to 5 remote headers supported by LimitLESS (Alewife’s software-based cache-coherence subsystem)
