Application Mapping Over OFIWG SFI

Application Mapping Over OFIWG SFI, by Sean Hefty. The slides walk through an example MPI implementation over SFI to demonstrate a possible usage model: initialization (querying interfaces and limits), send injection, send completions, polling, and RMA counters and completions.

Presentation Transcript


  1. Application Mapping Over OFIWG SFI (Sean Hefty)

  2. MPI Over SFI Example
     • MPI implementation over SFI
     • Demonstrates possible usage model
     • Initialization
     • Send injection
     • Send completions
     • Polling
     • RMA
     • Counters
     • Completions

  3. Query Interfaces: Tagged

       /* Tagged provider */
       /* Reliable unconnected endpoint */
       hints.type = FID_RDM;
       /* Address vector optimized for minimal memory footprint
          and no internal lookups */
       #ifdef MPIDI_USE_AV_MAP
       hints.addr_format = FI_ADDR;
       #else
       hints.addr_format = FI_ADDR_INDEX;
       #endif
       /* Transport agnostic */
       hints.protocol = FI_PROTO_UNSPEC;
       /* Behavior required by endpoint */
       hints.ep_cap = FI_TAGGED | FI_BUFFERED_RECV |
                      FI_REMOTE_COMPLETE | FI_CANCEL;
       /* Default flags to apply to data transfer operations */
       hints.op_flags = FI_REMOTE_COMPLETE;

  4. Query Interfaces: RMA/Atomics

     A separate endpoint is used for RMA operations.

       /* RMA provider */
       hints.type = FID_RDM;
       #ifdef MPIDI_USE_AV_MAP
       hints.addr_format = FI_ADDR;
       #else
       hints.addr_format = FI_ADDR_INDEX;
       #endif
       hints.protocol = FI_PROTO_UNSPEC;
       /* FI_RMA | FI_ATOMICS: support for RMA and atomic operations.
          FI_REMOTE_READ | FI_REMOTE_WRITE: remote RMA read and write support. */
       hints.ep_cap = FI_RMA | FI_ATOMICS | FI_REMOTE_COMPLETE |
                      FI_REMOTE_READ | FI_REMOTE_WRITE;
       hints.op_flags = FI_REMOTE_COMPLETE;

  5. Query Interfaces: Message Queue

       /* Event queue optimized to report tagged completions */
       eq_attr.mask = FI_EQ_ATTR_MASK_V1;
       eq_attr.domain = FI_EQ_DOMAIN_COMP;
       eq_attr.format = FI_EQ_FORMAT_TAGGED;
       fi_eq_open(domainfd, &eq_attr, &p2p_eqfd, NULL);

       /* Event queue optimized to report RMA completions */
       eq_attr.mask = FI_EQ_ATTR_MASK_V1;
       eq_attr.domain = FI_EQ_DOMAIN_COMP;
       eq_attr.format = FI_EQ_FORMAT_DATA;
       fi_eq_open(domainfd, &eq_attr, &rma_eqfd, NULL);

       /* Associate endpoints with event queues */
       fi_bind(tagged_epfd, p2p_eqfd, FI_SEND | FI_RECV);
       fi_bind(rma_epfd, rma_eqfd, FI_READ | FI_WRITE);

  6. Query Limits

     Query endpoint limits.

       /* Maximum 'inject' data size: the buffer is reusable
          immediately after the function call returns */
       optlen = sizeof(max_buffered_send);
       fi_getopt(tagged_epfd, FI_OPT_ENDPOINT, FI_OPT_MAX_INJECTED_SEND,
                 &max_buffered_send, &optlen);

       /* Maximum application-level message size */
       optlen = sizeof(max_send);
       fi_getopt(tagged_epfd, FI_OPT_ENDPOINT, FI_OPT_MAX_MSG_SIZE,
                 &max_send, &optlen);

  7. Short Send

     Small sends map directly to the tagged-injectto call. The fabric address is provided directly to the provider.

       int MPIDI_Send(buf, count, datatype, rank, tag, comm,
                      context_offset, **request)
       {
           data_sz = get_size(count, datatype);
           if (data_sz <= max_buffered_send) {
               match_bits = init_sendtag(comm->context_id + context_offset,
                                         comm->rank, tag, 0);
               /* COMM_TO_PHYS yields the fabric address of the peer */
               fi_tinjectto(tagged_epfd, buf, data_sz,
                            COMM_TO_PHYS(comm, rank), match_bits);
           } else {
               ...
           }
       }
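
The slide does not show how init_sendtag builds match_bits, so the following is only a sketch of one way to pack a context id, source rank, and tag into a single 64-bit value, together with the size test that selects the inject path. The field widths (16/16/32) and every sketch_* name are assumptions made for illustration; none of them come from the slides or from SFI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical field widths; the slides do not specify a layout. */
#define SKETCH_TAG_BITS  32
#define SKETCH_RANK_BITS 16

/* Pack context id, source rank, and tag into one 64-bit match value:
   [ context:16 | rank:16 | tag:32 ] */
static inline uint64_t sketch_pack_match_bits(uint16_t context_id,
                                              uint16_t src_rank,
                                              uint32_t tag)
{
    return ((uint64_t)context_id << (SKETCH_RANK_BITS + SKETCH_TAG_BITS)) |
           ((uint64_t)src_rank << SKETCH_TAG_BITS) |
           (uint64_t)tag;
}

/* Recover the individual fields from a packed match value. */
static inline uint32_t sketch_match_tag(uint64_t bits)
{
    return (uint32_t)(bits & 0xFFFFFFFFu);
}

static inline uint16_t sketch_match_rank(uint64_t bits)
{
    return (uint16_t)((bits >> SKETCH_TAG_BITS) & 0xFFFFu);
}

static inline uint16_t sketch_match_context(uint64_t bits)
{
    return (uint16_t)(bits >> (SKETCH_RANK_BITS + SKETCH_TAG_BITS));
}

/* Mirrors the slide's test: payloads no larger than the queried inject
   limit take the short-send path. */
static inline int sketch_use_inject(size_t data_sz, size_t max_buffered_send)
{
    return data_sz <= max_buffered_send;
}
```

Keeping all matching fields in one integer lets the receiver compare a message against a posted receive with a single masked comparison, which is presumably why the slide passes one match_bits value to the tagged call.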

  8. Large Message Send

     Large sends require request allocation. The SFI completion context is embedded in the request object.

       int MPIDI_Send(buf, count, datatype, rank, tag, comm,
                      context_offset, **request)
       {
           /* code for type calculations, tag creation, etc */
           REQUEST_CREATE(sreq);
           /* the last argument is the context embedded in the request */
           fi_tsendto(MPIDI_Global.tagged_epfd, send_buf, data_sz, NULL,
                      COMM_TO_PHYS(comm, rank), match_bits,
                      &(REQ_OF2(sreq)->of2_context));
           *request = sreq;
       }
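
Embedding the completion context inside the request means the request can be recovered from the context pointer with plain pointer arithmetic, without a lookup table. The sketch below illustrates that pattern in isolation; struct sketch_context, struct sketch_request, and the helper names are hypothetical stand-ins, not the types used by the MPI implementation or by SFI.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the opaque per-operation context passed to the provider. */
struct sketch_context {
    void *internal[4];
};

/* Hypothetical request object with the context embedded in it. */
struct sketch_request {
    int completed;
    void (*callback)(struct sketch_request *req);
    struct sketch_context ctx;   /* handed to the send call as its context */
};

/* Map a context pointer reported by a completion back to the request
   that contains it (the usual "container of" computation). */
static inline struct sketch_request *sketch_context_to_request(void *op_context)
{
    assert(op_context != NULL);
    return (struct sketch_request *)
        ((char *)op_context - offsetof(struct sketch_request, ctx));
}

/* Example completion callback: mark the request as done. */
static inline void sketch_mark_complete(struct sketch_request *req)
{
    req->completed = 1;
}
```

A caller would pass &req->ctx when posting the operation and, on completion, call sketch_context_to_request on the reported context before invoking req->callback.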

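  9. Progress/Polling for Completions

     Fields align between the tagged entry and data_entry, so a single tagged-entry buffer can receive completions from both event queues.

       int MPIDI_Progress()
       {
           eq_tagged_entry_t wc;
           fid_eq_t fd[2] = {p2p_eqfd, rma_eqfd};
           for (i = 0; i < 2; i++) {
               MPID_Request *req;
               rc = fi_eq_read(fd[i], (void *)&wc, sizeof(wc));
               handle_errs(rc);
               /* recover the request from the embedded completion context */
               req = context_to_request(wc.op_context);
               req->callback(req);
           }
       }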
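
To make the field-alignment remark concrete, here is a small self-contained sketch: two entry layouts that share their leading fields, compile-time checks that the shared fields sit at the same offsets, and a progress pass that reads both kinds of entry into one tagged-size buffer. The struct layouts, the in-memory queue, and all sketch_* names are assumptions for illustration only; they do not reproduce the SFI entry formats or fi_eq_read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical completion entry for data (RMA) completions. */
struct sketch_data_entry {
    void    *op_context;
    uint64_t flags;
    size_t   len;
    uint64_t data;
};

/* Hypothetical tagged entry: same leading fields plus the tag. */
struct sketch_tagged_entry {
    void    *op_context;
    uint64_t flags;
    size_t   len;
    uint64_t data;
    uint64_t tag;
};

/* The shared fields must sit at identical offsets for one buffer to work. */
_Static_assert(offsetof(struct sketch_data_entry, op_context) ==
               offsetof(struct sketch_tagged_entry, op_context),
               "op_context must align");
_Static_assert(offsetof(struct sketch_data_entry, data) ==
               offsetof(struct sketch_tagged_entry, data),
               "data must align");
_Static_assert(sizeof(struct sketch_data_entry) <=
               sizeof(struct sketch_tagged_entry),
               "tagged buffer must hold a data entry");

/* Toy in-memory queue of fixed-size entries standing in for an event queue. */
struct sketch_queue {
    const unsigned char *bytes;
    size_t entry_size;
    size_t count;
    size_t next;
};

/* Copy at most one entry into buf; return bytes copied, 0 when empty. */
static inline size_t sketch_queue_read(struct sketch_queue *q,
                                       void *buf, size_t buf_len)
{
    if (q->next == q->count || buf_len < q->entry_size)
        return 0;
    memcpy(buf, q->bytes + q->next * q->entry_size, q->entry_size);
    q->next++;
    return q->entry_size;
}

/* Example handler: op_context points at an int completion counter. */
static inline void sketch_count_completion(void *op_context)
{
    ++*(int *)op_context;
}

/* One progress pass: read at most one entry per queue into the same
   tagged-size buffer and dispatch on the shared op_context field. */
static inline int sketch_progress(struct sketch_queue *queues, size_t nqueues,
                                  void (*on_complete)(void *op_context))
{
    struct sketch_tagged_entry wc;
    int handled = 0;
    for (size_t i = 0; i < nqueues; i++) {
        if (sketch_queue_read(&queues[i], &wc, sizeof(wc)) == 0)
            continue;               /* nothing completed on this queue */
        on_complete(wc.op_context);
        handled++;
    }
    return handled;
}
```

Unlike the slide's loop, this sketch skips the callback when a queue returns no entry; in the slide that case is presumably covered by handle_errs.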

  10. RMA Completions (Counters and Completions)

       int MPIDI_Win_fence(MPID_Win *win)
       {
           /* Synchronize software counters via completions */
           PROGRESS_WHILE(win->started != win->completed);
           /* Synchronize hardware counters */
           fi_sync(WIN_OF2(win)->rma_epfd, FI_WRITE | FI_READ | FI_BLOCK, NULL);
           /* Notify any request based objects that use counter completion */
           RequestQ->notify();
       }
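
The software-counter half of the fence can be sketched on its own: every RMA operation bumps a started counter when issued, each completion bumps completed, and the fence drives progress until the two match. This is only an illustration under assumed names (struct sketch_win and the sketch_win_* helpers are hypothetical), and the stand-in progress function simply retires one operation per call; the hardware-counter synchronization done by fi_sync on the slide is not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical window state: operations issued and operations completed. */
struct sketch_win {
    uint64_t started;
    uint64_t completed;
};

/* Record that one RMA operation has been issued on the window. */
static inline void sketch_win_start_op(struct sketch_win *w)
{
    w->started++;
}

/* Stand-in for the progress engine: retires at most one outstanding
   operation per call and reports whether it did. */
static inline int sketch_win_progress(struct sketch_win *w)
{
    if (w->completed == w->started)
        return 0;
    w->completed++;
    return 1;
}

/* Fence: keep making progress until every started operation has
   completed. Returns the number of progress calls that were needed. */
static inline uint64_t sketch_win_fence(struct sketch_win *w)
{
    uint64_t polls = 0;
    while (w->started != w->completed) {
        sketch_win_progress(w);
        polls++;
    }
    return polls;
}
```

In a real implementation the completed counter would be advanced by completion callbacks run from the progress loop rather than by the fence itself.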
