Application Mapping Over OFIWG SFI

Sean Hefty

MPI Over SFI Example
  • MPI Implementation over SFI
  • Demonstrates possible usage model
    • Initialization
    • Send injection
    • Send Completions
    • Polling
    • RMA
      • Counters
      • Completions
Query Interfaces: Tagged

/* Tagged provider */
hints.type = FID_RDM;                 /* reliable unconnected endpoint */

/* Address vector optimized for minimal memory footprint and no internal lookups */
#ifdef MPIDI_USE_AV_MAP
hints.addr_format = FI_ADDR;
#else
hints.addr_format = FI_ADDR_INDEX;
#endif

hints.protocol = FI_PROTO_UNSPEC;     /* transport agnostic */

/* Behavior required by endpoint */
hints.ep_cap = FI_TAGGED |
               FI_BUFFERED_RECV |
               FI_REMOTE_COMPLETE |
               FI_CANCEL;

/* Default flags to apply to data transfer operations */
hints.op_flags = FI_REMOTE_COMPLETE;
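The ep_cap hints above read as a bitmask of behavior the endpoint requires. A minimal sketch of how such a request could be checked against what a provider supports follows. The EX_ bit values, caps_satisfied and caps_example are hypothetical names invented for illustration; they are not taken from the SFI headers, and the real matching logic may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit values for illustration only; the SFI headers define the real ones. */
#define EX_TAGGED          (UINT64_C(1) << 0)
#define EX_BUFFERED_RECV   (UINT64_C(1) << 1)
#define EX_REMOTE_COMPLETE (UINT64_C(1) << 2)
#define EX_CANCEL          (UINT64_C(1) << 3)
#define EX_RMA             (UINT64_C(1) << 4)

/* Returns 1 when every requested capability bit is present in supported. */
int caps_satisfied(uint64_t requested, uint64_t supported)
{
    return (requested & ~supported) == 0;
}

/* Usage: the tagged endpoint's request checked against two made-up providers. */
void caps_example(void)
{
    uint64_t want = EX_TAGGED | EX_BUFFERED_RECV | EX_REMOTE_COMPLETE | EX_CANCEL;

    assert(caps_satisfied(want, want | EX_RMA));   /* superset: accepted */
    assert(!caps_satisfied(want, EX_TAGGED));      /* missing bits: rejected */
}
```

A provider that offers more than was asked for still matches, which is why the test is a subset check rather than equality.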

Query Interfaces: RMA/Atomics

/* RMA provider: separate endpoint for RMA operations */
hints.type = FID_RDM;

#ifdef MPIDI_USE_AV_MAP
hints.addr_format = FI_ADDR;
#else
hints.addr_format = FI_ADDR_INDEX;
#endif

hints.protocol = FI_PROTO_UNSPEC;

hints.ep_cap = FI_RMA | FI_ATOMICS |      /* support for RMA and atomic operations */
               FI_REMOTE_COMPLETE |
               FI_REMOTE_READ |           /* remote RMA read and write support */
               FI_REMOTE_WRITE;

hints.op_flags = FI_REMOTE_COMPLETE;

Query Interfaces: Message Queue

/* Event queue optimized to report tagged completions */
eq_attr.mask   = FI_EQ_ATTR_MASK_V1;
eq_attr.domain = FI_EQ_DOMAIN_COMP;
eq_attr.format = FI_EQ_FORMAT_TAGGED;
fi_eq_open(domainfd, &eq_attr, &p2p_eqfd, NULL);

/* Event queue optimized to report RMA completions */
eq_attr.mask   = FI_EQ_ATTR_MASK_V1;
eq_attr.domain = FI_EQ_DOMAIN_COMP;
eq_attr.format = FI_EQ_FORMAT_DATA;
fi_eq_open(domainfd, &eq_attr, &rma_eqfd, NULL);

/* Associate endpoints with event queues */
fi_bind(tagged_epfd, p2p_eqfd, FI_SEND | FI_RECV);
fi_bind(rma_epfd, rma_eqfd, FI_READ | FI_WRITE);

Query Limits

/* Query endpoint limits */

/* Maximum 'inject' data size: the buffer is reusable immediately after the function call returns */
optlen = sizeof(max_buffered_send);
fi_getopt(tagged_epfd, FI_OPT_ENDPOINT,
          FI_OPT_MAX_INJECTED_SEND,
          &max_buffered_send, &optlen);

/* Maximum application level message size */
optlen = sizeof(max_send);
fi_getopt(tagged_epfd, FI_OPT_ENDPOINT,
          FI_OPT_MAX_MSG_SIZE,
          &max_send, &optlen);
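The two limits queried above suggest a simple dispatch on message size. The following is a minimal sketch, assuming the inject limit never exceeds the message-size limit. The enum, the function name ex_choose_path and the "too large" outcome are hypothetical additions for illustration and do not appear in the slides.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical labels for the three outcomes of the size check. */
enum ex_send_path { EX_PATH_INJECT, EX_PATH_SEND, EX_PATH_TOO_LARGE };

/* Picks a send path from the payload size and the two queried limits. */
enum ex_send_path ex_choose_path(size_t data_sz,
                                 size_t max_buffered_send,
                                 size_t max_send)
{
    /* Assumption: the inject limit is no larger than the message limit. */
    assert(max_buffered_send <= max_send);

    if (data_sz <= max_buffered_send)
        return EX_PATH_INJECT;      /* buffer reusable as soon as the call returns */
    if (data_sz <= max_send)
        return EX_PATH_SEND;        /* needs a request and a completion */
    return EX_PATH_TOO_LARGE;       /* caller would have to split the message */
}
```

The comparison against max_buffered_send is inclusive, matching the `data_sz <= max_buffered_send` test used in the short-send slide.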

Short Send

int MPIDI_Send(buf, count, datatype, rank, tag,
               comm, context_offset, **request)
{
    data_sz = get_size(count, datatype);

    if (data_sz <= max_buffered_send) {
        match_bits = init_sendtag(comm->context_id + context_offset,
                                  comm->rank, tag, 0);

        /* Small sends map directly to the tagged-injectto call */
        fi_tinjectto(tagged_epfd, buf, data_sz,
                     COMM_TO_PHYS(comm, rank),  /* fabric address provided directly to provider */
                     match_bits);
    } else {
        ...
    }
}
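The slides do not show how init_sendtag builds the match bits. One possible scheme packs the context id, source rank and tag into a single 64-bit value, sketched below. The field widths, shift constants and all ex_ names are assumptions made for illustration, not the layout used by the MPI implementation in the slides.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: [63:48] context id, [47:32] source rank, [31:0] tag. */
#define EX_CTX_SHIFT  48
#define EX_RANK_SHIFT 32

/* Packs the three matching fields into one 64-bit value. */
uint64_t ex_init_sendtag(uint16_t context_id, uint16_t source_rank, uint32_t tag)
{
    return ((uint64_t)context_id << EX_CTX_SHIFT) |
           ((uint64_t)source_rank << EX_RANK_SHIFT) |
           (uint64_t)tag;
}

/* Field extractors; each cast keeps only the low bits of the shifted value. */
uint16_t ex_tag_context(uint64_t bits) { return (uint16_t)(bits >> EX_CTX_SHIFT); }
uint16_t ex_tag_rank(uint64_t bits)    { return (uint16_t)(bits >> EX_RANK_SHIFT); }
uint32_t ex_tag_value(uint64_t bits)   { return (uint32_t)bits; }

/* Usage: pack, then confirm each field round-trips. */
void sendtag_example(void)
{
    uint64_t bits = ex_init_sendtag(7, 3, 42);

    assert(ex_tag_context(bits) == 7);
    assert(ex_tag_rank(bits) == 3);
    assert(ex_tag_value(bits) == 42);
}
```

Keeping all matching criteria in one integer lets the receive side compare with a single masked test, which is the usual motivation for this kind of packing.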

Large Message Send

int MPIDI_Send(buf, count, datatype, rank, tag,
               comm, context_offset, **request)
{
    /* code for type calculations, tag creation, etc */

    /* Large sends require request allocation */
    REQUEST_CREATE(sreq);

    fi_tsendto(MPIDI_Global.tagged_epfd, send_buf,
               data_sz,
               NULL,
               COMM_TO_PHYS(comm, rank),
               match_bits,
               &(REQ_OF2(sreq)->of2_context));  /* SFI completion context embedded in request object */

    *request = sreq;
}
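Embedding the completion context in the request means the request can later be recovered from the context pointer with plain pointer arithmetic. A minimal sketch of that pattern follows. The struct layouts and the ex_ names are stand-ins invented for illustration; the real request and context types are not shown in the slides.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the provider's per-operation context; its contents are assumed. */
struct ex_context { void *internal[4]; };

/* Stand-in request with the context embedded, as in REQ_OF2(sreq)->of2_context. */
struct ex_request {
    int               completed;
    void            (*callback)(struct ex_request *req);
    struct ex_context of2_context;
};

/* Recovers the enclosing request from the context pointer reported in a completion. */
struct ex_request *ex_context_to_request(void *op_context)
{
    assert(op_context != NULL);
    return (struct ex_request *)((char *)op_context -
                                 offsetof(struct ex_request, of2_context));
}

/* Example completion callback: marks the request as finished. */
void ex_mark_done(struct ex_request *req) { req->completed = 1; }
```

Because the context lives inside the request, no lookup table or extra allocation is needed to get from a completion back to its request.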

Progress/Polling for Completions

int MPIDI_Progress()
{
    /* Tagged entry fields align with those of the data entry */
    eq_tagged_entry_t wc;
    fid_eq_t fd[2] = {p2p_eqfd, rma_eqfd};

    for (i = 0; i < 2; i++)
    {
        MPID_Request *req;

        rc = fi_eq_read(fd[i], (void *)&wc, sizeof(wc));
        handle_errs(rc);
        req = context_to_request(wc.op_context);
        req->callback(req);
    }
}
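The progress loop above polls each queue once per call. The sketch below shows the same shape against a mock queue, with an explicit check that skips a queue that has nothing to report before the entry is used. The mock queue, its return convention (entry size on success, 0 when empty) and all ex_ names are assumptions for illustration, not the SFI event queue interface.

```c
#include <assert.h>
#include <stddef.h>

struct ex_entry { void *op_context; };

/* Mock queue: a fixed array drained front to back. Not the SFI event queue. */
struct ex_eq {
    struct ex_entry entries[4];
    size_t          head, count;
};

/* Copies one entry and returns its size, or 0 when the queue is empty. */
long ex_eq_read(struct ex_eq *eq, struct ex_entry *out)
{
    if (eq->head == eq->count)
        return 0;
    *out = eq->entries[eq->head++];
    return (long)sizeof(*out);
}

/* Example dispatch target: treats the context as a counter and increments it. */
void ex_count_dispatch(void *op_context) { (*(int *)op_context)++; }

/* Polls each queue once; returns how many completions were dispatched. */
int ex_progress(struct ex_eq *eqs[], size_t n, void (*dispatch)(void *op_context))
{
    int handled = 0;

    assert(dispatch != NULL);
    for (size_t i = 0; i < n; i++) {
        struct ex_entry wc;
        long rc = ex_eq_read(eqs[i], &wc);

        if (rc <= 0)
            continue;               /* nothing read: wc is not valid, skip it */
        dispatch(wc.op_context);
        handled++;
    }
    return handled;
}
```

Reading at most one entry per queue per call keeps each pass short, so a busy tagged queue cannot starve the RMA queue.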

RMA Completions (Counters and Completions)

int MPIDI_Win_fence(MPID_Win *win)
{
    /* Synchronize software counters via completions */
    PROGRESS_WHILE(win->started != win->completed);

    /* Synchronize hardware counters */
    fi_sync(WIN_OF2(win)->rma_epfd,
            FI_WRITE | FI_READ | FI_BLOCK,
            NULL);

    /* Notify any request-based objects that use counter completion */
    RequestQ->notify();
}
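The software-counter half of the fence can be sketched on its own: spin the progress engine until the number of completed operations catches up with the number started. The window struct, the mock progress function and the EX_PROGRESS_WHILE macro below are hypothetical stand-ins; the actual PROGRESS_WHILE definition is not shown in the slides.

```c
#include <assert.h>

/* Stand-in window holding only the two software counters used by the fence. */
struct ex_win { unsigned started, completed; };

/* Mock progress engine: each call retires at most one outstanding operation. */
void ex_progress_once(struct ex_win *win)
{
    if (win->completed != win->started)
        win->completed++;
}

/* Spins the progress engine while cond holds, in the spirit of PROGRESS_WHILE. */
#define EX_PROGRESS_WHILE(win, cond) \
    do { while (cond) ex_progress_once(win); } while (0)

/* Returns once every started operation has been counted as completed. */
void ex_fence(struct ex_win *win)
{
    EX_PROGRESS_WHILE(win, win->started != win->completed);
    assert(win->started == win->completed);
}
```

In the real code the completed counter would be advanced by completion callbacks run from the progress loop rather than by the mock function used here.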