
Fast Communication for Multi-Core SOPC



  1. Technion – Israel Institute of Technology, Department of Electrical Engineering, High Speed Digital Systems Lab, Spring 2007. Fast Communication for Multi-Core SOPC. Supervisor: Evgeny Fiksman. Performed by: Moshe Bino, Alex Tikh

  2. Table of Contents

  3. General • The presentation describes the topics covered so far and the work planned for the near future. • Topics are divided mainly into Software & Hardware.

  4. Topics covered so far (progress chart: completed milestones marked DONE, current milestone marked NOW)

  5. MPI - Message Passing Interface • MPI is a library specification (language independent) for message passing, proposed as a standard by a broadly based committee of vendors, implementers, and users. • Designed for high performance on both massively parallel machines and workstation clusters. • MPI is widely available, with both freely available and vendor-supplied implementations.

  6. Software • The MPI standard requires a set of 6 basic functions for inter-processor communication, with additional functions that optimize on top of the basic set. • We’ll implement the basic set plus one additional function – broadcast – taken from the complete MPI function set.

  7. MPI Functions 1. Each MPI program has to start and end with: • MPI_Init (argc, argv); • MPI_Finalize (); Where argc, argv are the arguments that function main receives from the command line. 2. Checking a processor’s group size and rank: • MPI_Comm_size (comm, size); • MPI_Comm_rank (comm, rank); Where comm specifies the group’s name, size is the group size, and rank is the rank of one processor within the group.
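
For illustration, a minimal program that uses only these four calls could look like the following (standard MPI C binding; our reduced implementation may differ in header name and error handling):

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int size, rank;

        MPI_Init(&argc, &argv);                  /* must be the first MPI call           */
        MPI_Comm_size(MPI_COMM_WORLD, &size);    /* how many processors are in the group */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* this processor's rank: 0 .. size-1   */

        printf("Processor %d of %d is alive\n", rank, size);

        MPI_Finalize();                          /* must be the last MPI call            */
        return 0;
    }

Here MPI_COMM_WORLD is the standard default communicator (group name); on our system that group would contain the 4 processors.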

  8. Example • Each group has a different comm name • Inside each group, every processor has a rank – a number: 0, 1 or 2

  9. MPI Functions 3. Data transfer between processors is done by: • MPI_Send (buf, count, data_type, dest, tag, comm); • MPI_Recv (buf, count, data_type, source, tag, comm, status); • MPI_Bcast (buf, count, data_type, root, comm); buf is the memory address which stores count items of type data_type. source, root, dest: rank of a processor inside the comm group. tag and status are used for getting additional info from the message. We’ll implement tag only, because status requires the use of additional MPI functions.
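
As a sketch of how these three calls fit together (standard MPI C binding; the buffer sizes and tag values below are arbitrary examples, and in our reduced interface the status argument may simply be ignored, as noted above):

    #include <mpi.h>

    /* Rank 0 broadcasts a parameter to the whole group, then sends a
       private message to rank 1, which receives it.                  */
    void exchange(int rank)
    {
        int        param   = (rank == 0) ? 42 : 0;
        double     data[4] = {0.0};
        MPI_Status status;

        /* every processor in the group calls MPI_Bcast; root is rank 0 */
        MPI_Bcast(&param, 1, MPI_INT, 0, MPI_COMM_WORLD);

        if (rank == 0) {
            data[0] = 3.14;
            MPI_Send(data, 4, MPI_DOUBLE, 1 /* dest */, 7 /* tag */, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(data, 4, MPI_DOUBLE, 0 /* source */, 7 /* tag */,
                     MPI_COMM_WORLD, &status);
        }
    }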

  10. Software Layers • Application layer: interface of 7 MPI functions • Network layer: hardware-independent implementation of these functions • Data layer: relies on command bit fields • Physical layer: designed for the FSL bus
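
A hedged sketch of how one call might travel down this stack is shown below; every identifier here (net_send, pack_header, fsl_put) is a placeholder for illustration, not the project’s actual code, and the FSL access is stubbed out:

    #include <stdint.h>

    /* Physical layer: one 32-bit write to the FSL bus (stub; on the
       MicroBlaze this would use the FSL put primitive).              */
    static void fsl_put(uint32_t word) { (void)word; }

    /* Data layer: combine the command bit fields into one header word.
       A fuller (still illustrative) layout follows slide 12; command,
       datatype and comm fields are omitted here for brevity.          */
    static uint32_t pack_header(uint32_t count, uint32_t dest, uint32_t tag)
    {
        return (count << 16) | (tag << 10) | (dest << 6);
    }

    /* Network layer: hardware-independent send - header word first,
       then the payload words.                                         */
    static void net_send(const uint32_t *buf, uint32_t count,
                         uint32_t dest, uint32_t tag)
    {
        fsl_put(pack_header(count, dest, tag));
        for (uint32_t i = 0; i < count; i++)
            fsl_put(buf[i]);
    }

    /* Application layer: MPI_Send() simply checks its arguments and
       forwards them to net_send().                                    */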

  11. Command Bit Fields • The tag, count, comm and dest fields have the same meaning as before. • The datatype field encodes the element type (char, integer, double, etc.). • The command field identifies the MPI command (send, recv, etc.).

  12. Loading an immediate (i.e., issuing an MPI command) into the FSL takes 2 clock cycles for a 16-bit immediate and 3 clock cycles for a 32-bit immediate. Therefore, we’ll try to fit all necessary command parameters into the lower 16 bits.
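
One hedged reading of this constraint, continuing the placeholder layout above: the command parameters (command, datatype, dest, comm, tag) are kept in bits 0-15, so that when they are known at compile time the parameter half-word can be loaded with a single 16-bit immediate, while the run-time count is combined into the upper half-word. The widths below are illustrative guesses, not the actual design:

    #include <stdint.h>

    /* Illustrative field positions; widths are assumptions, not the real layout. */
    #define CMD_SHIFT    0   /* bits  0-2  : MPI command (send / recv / bcast)    */
    #define DTYPE_SHIFT  3   /* bits  3-5  : datatype (char, int, double, ...)    */
    #define DEST_SHIFT   6   /* bits  6-7  : destination rank (4 processors)      */
    #define COMM_SHIFT   8   /* bits  8-9  : communicator / group id              */
    #define TAG_SHIFT   10   /* bits 10-15 : message tag                          */
    #define COUNT_SHIFT 16   /* bits 16-31 : element count                        */

    static uint32_t make_header(uint32_t cmd, uint32_t dtype, uint32_t dest,
                                uint32_t comm, uint32_t tag, uint32_t count)
    {
        return (cmd   << CMD_SHIFT)   |
               (dtype << DTYPE_SHIFT) |
               (dest  << DEST_SHIFT)  |
               (comm  << COMM_SHIFT)  |
               (tag   << TAG_SHIFT)   |   /* all of the above fit in bits 0-15 */
               (count << COUNT_SHIFT);    /* count alone occupies the upper 16 */
    }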

  13. Hardware Router • 4 processors will pass messages through a router. • The first implementation will use a First Come, First Served (FCFS) routing policy. • Since the message size is declared by the sending processor, a wider choice of routing policies is available for optimization.
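
Since the router itself will be implemented in hardware, the following is only a behavioral C model of FCFS arbitration, given to make the policy concrete (port count and data structures are illustrative):

    #include <stdbool.h>

    #define NPORTS 4

    /* Each pending request remembers when it arrived; every cycle the
       router grants the oldest request whose destination port is free. */
    typedef struct {
        bool     pending;   /* does this source port have a message waiting? */
        int      dest;      /* destination port of that message              */
        unsigned arrival;   /* arrival order stamp (smaller = earlier)       */
    } request_t;

    int fcfs_pick(const request_t req[NPORTS], const bool dest_busy[NPORTS])
    {
        int best = -1;
        for (int src = 0; src < NPORTS; src++) {
            if (!req[src].pending || dest_busy[req[src].dest])
                continue;
            if (best < 0 || req[src].arrival < req[best].arrival)
                best = src;                 /* older request wins */
        }
        return best;                        /* source port to serve, or -1 if none */
    }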

  14. Router: high level design

  15. Crossbar implementation • On a send command, the crossbar allows the router to connect two pairs of processors concurrently. • It also allows the router to broadcast to all three destinations simultaneously.

  16. Router design considerations • Embedded systems demand fast communication and on-chip space economy. • A fast and simple router must be chosen. • In bigger systems, with increased message sizes, bandwidth and routing policy become crucial. • A more complicated router may be needed.

  17. Possible system generalization • Adding more processors/sub-systems is possible by adding more FSL interfaces to the router. (Diagram: MicroBlaze, Sub-system #0, Sub-system #1)

  18. Possible system generalization • A router plus its processors comprise a cluster. • A system of several clusters can be constructed.

  19. Message ordering issue • Suppose a program in which processors #0 and #1 each need to send a message to processor #2. • Suppose that in this program, #2 should get the message from #0 prior to #1’s message.

  20. The problem • Since #0 and #1 are working concurrently, #1 might send its message first. • The router isn’t aware of the expected message arrival order, therefore it will pass #1’s message first.

  21. Solution: Flow Chart
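
The flow chart itself is not reproduced in this transcript. As one illustration of the requirement at the MPI level (not necessarily the flow chart’s mechanism): processor #2 can post its receives with explicit source ranks, so the first MPI_Recv completes only with #0’s message, whichever message physically arrives first; the lower layers then have to hold or buffer an early message from #1. Buffer sizes and tags below are arbitrary examples:

    #include <mpi.h>

    /* On processor #2: receive from #0 before accepting anything from #1. */
    void receive_in_order(int msg0[8], int msg1[8])
    {
        MPI_Status status;
        MPI_Recv(msg0, 8, MPI_INT, 0 /* source: #0 first */, 0, MPI_COMM_WORLD, &status);
        MPI_Recv(msg1, 8, MPI_INT, 1 /* then #1          */, 0, MPI_COMM_WORLD, &status);
    }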

  22. Present system block diagram

  23. Time Table – first semester

  24. Time Table – second semester • Build a quad-core system • Implement a router for the system • Test and debug • Run a test application

  25. QUESTIONS ?
