Integrating New Capabilities into NetPIPE

Integrating New Capabilities into NetPIPE

Dave Turner, Adam Oline, Xuehua Chen, and Troy Benjegerdes

Scalable Computing Laboratory of Ames Laboratory

This work was funded by the MICS office of the US Department of Energy


NetPIPE: the Network Protocol Independent Performance Evaluator

[Diagram: NetPIPE's modular design.
  • 2-sided software: MPI (MPICH and LAM/MPI on workstations, PCs, and clusters; MPI/Pro; MP_Lite), PVM, TCGMSG.
  • Native layers: TCP, GM (Myrinet cards), native InfiniBand (Mellanox VAPI).
  • 1-sided protocols: MPI-2 MPI_Put or MPI_Get; ARMCI (runs on internal TCP, GM, VIA, and Quadrics layers); LAPI (IBM SP); SHMEM puts and gets (Cray T3E, SGI systems) & GPSHMEM (over ARMCI).
  • Internal systems: memcpy.]

+ Basic send/recv with options to guarantee pre-posting or use MPI_ANY_SOURCE.
+ Option to measure performance without cache effects.
+ One-sided communications using either Get or Put, with or without fence calls.
+ Measure performance or do an integrity test.

http://www.scl.ameslab.gov/Projects/NetPIPE/



The NetPIPE utility

  • NetPIPE does a series of ping-pong tests between two nodes.

  • Message sizes are chosen at regular intervals, and with slight perturbations, to fully test the communication system for idiosyncrasies.

  • Latencies reported represent half the ping-pong time for messages smaller than 64 bytes.

Some typical uses

  • Measuring the overhead of message-passing protocols.

  • Help in tuning the optimization parameters of message-passing libraries.

  • Optimizing driver and OS parameters (socket buffer sizes, etc.).

  • Identifying dropouts in networking hardware and drivers.

What is not measured

  • NetPIPE cannot measure the load on the CPU yet.

  • The effects from the different methods for maintaining message progress.

  • Scalability with system size.



Recent additions to NetPIPE

  • Can do an integrity test instead of measuring performance.

  • Streaming mode measures performance in one direction only.

    • Must reset sockets to avoid effects from a collapsing window size.

  • A bi-directional ping-pong mode has been added (-2).

  • One-sided Get and Put calls can be measured (MPI or SHMEM).

    • Can choose whether to use an intervening MPI_Fence call to synchronize.

  • Messages can be bounced between the same buffers (default mode), or they can be started from a different area of memory each time.

    • There are lots of cache effects in SMP message-passing.

    • InfiniBand can show similar effects since memory must be registered with the card.

[Diagram: ping-pong between Process 0 and Process 1, with messages cycling through buffer regions 0-3 rather than reusing a single buffer.]



Current projects

  • Overlapping pair-wise ping-pong tests.

    • Must consider synchronization if not using bi-directional communications.

[Diagram: nodes n0-n3 on an Ethernet switch, comparing line-speed-limited versus end-point-limited pair-wise tests.]

  • Investigate other methods for testing the global network.

    • Evaluate the full range from simultaneous nearest neighbor communications to all-to-all.



Performance on Mellanox InfiniBand cards

A new NetPIPE module allows us to measure the raw performance across InfiniBand hardware (RDMA and Send/Recv).

Burst mode preposts all receives to duplicate the Mellanox test.

The no-cache performance is much lower when the memory has to be registered with the card.

An MP_Lite InfiniBand module will be incorporated into LAM/MPI.

[Graph: includes MVAPICH 0.9.1 for comparison.]



10 Gigabit Ethernet

Intel 10 Gigabit Ethernet cards

133 MHz PCI-X bus

Single mode fiber

Intel ixgb driver

Can only achieve 2 Gbps now.

Latency is 75 us.

Streaming mode delivers up to 3 Gbps.

Much more development work is needed.



Channel-bonding Gigabit Ethernet for better communications between nodes

Channel-bonding uses 2 or more Gigabit Ethernet cards per PC to increase the communication rate between nodes in a cluster.

GigE cards cost ~$40 each.

24-port switches cost ~$1400.

≈ $100 / computer

This is much more cost effective for PC clusters than using more expensive networking hardware, and may deliver similar performance.



Performance for channel-bonded Gigabit Ethernet

GigE can deliver 900 Mbps with latencies of 25-62 us for PCs with 64-bit / 66 MHz PCI slots.

Channel-bonding 2 GigE cards / PC using MP_Lite doubles the performance for large messages.

Adding a 3rd card does not help much.

Channel-bonding 2 GigE cards / PC using Linux kernel level bonding actually results in poorer performance.

The same tricks that make channel-bonding successful in MP_Lite should make Linux kernel bonding work even better.

Any message-passing system could then make use of channel-bonding on Linux systems.

[Graph: Channel-bonding multiple GigE cards using MP_Lite and Linux kernel bonding.]



Channel-bonding in MP_Lite

[Diagram: MP_Lite channel-bonding data path. The application on node 0 writes into two large socket buffers in user space; each stream then traverses its own TCP/IP stack in kernel space, is handed via dev_q_xmit to its own device queue, and is DMAed by the driver to its own GigE card.]

Flow control may stop a given stream at several places.

With MP_Lite channel-bonding, each stream is independent of the others.



Linux kernel channel-bonding

[Diagram: Linux kernel channel-bonding data path. The application on node 0 writes through a single large socket buffer and one TCP/IP stack; bonding.c then distributes packets via dev_q_xmit between the two device queues, each DMAed to its own GigE card.]

A full device queue will stop the flow at bonding.c to both device queues.

Flow control on the destination node may stop the flow out of the socket buffer.

In both of these cases, problems with one stream can affect both streams.



Comparison of high-speed interconnects

InfiniBand can deliver 4500-6500 Mbps at a 7.5 us latency.

Atoll delivers 1890 Mbps with a 4.7 us latency.

SCI delivers 1840 Mbps with only a 4.2 us latency.

Myrinet performance reaches 1820 Mbps with an 8 us latency.

Channel-bonded GigE offers 1800 Mbps for very large messages.

Gigabit Ethernet delivers 900 Mbps with a 25-62 us latency.

10 GigE only delivers 2 Gbps with a 75 us latency.


Conclusions


  • NetPIPE provides a consistent set of analytical tools in the same flexible framework to many message-passing and native communication layers.

  • New modules have been developed.

    • 1-sided MPI and SHMEM

    • GM, InfiniBand using the Mellanox VAPI, ARMCI, LAPI

    • Internal tests like memcpy

  • New modes have been incorporated into NetPIPE.

    • Streaming and bi-directional modes.

    • Testing without cache effects.

    • The ability to test integrity instead of performance.



Current projects

  • Developing new modules.

    • ATOLL

    • IBM Blue Gene/L

    • I/O performance

  • Need to be able to measure CPU load during communications.

  • Expanding NetPIPE to do multiple pair-wise communications.

    • Can measure the backplane performance on switches.

    • Compare the line speed to end-point limited performance.

  • Working toward measuring more of the global properties of a network.

    • The network topology will need to be considered.


Contact information


Dave Turner - [email protected]

http://www.scl.ameslab.gov/Projects/MP_Lite/

http://www.scl.ameslab.gov/Projects/NetPIPE/



One-sided Puts between two Linux PCs

  • MP_Lite is SIGIO based, so MPI_Put() and MPI_Get() finish without a fence.

  • LAM/MPI has no message progress, so a fence is required.

  • ARMCI uses a polling method, and therefore does not require a fence.

  • MPI-2 implementations of MPICH and MPI/Pro are under development.

[Graph: Netgear GA620 fiber GigE cards, 32/64-bit 33/66 MHz PCI, AceNIC driver.]



The MP_Lite message-passing library

  • A light-weight MPI implementation

  • Highly efficient for the architectures supported

  • Designed to be very user-friendly

  • Ideal for performing message-passing research

    http://www.scl.ameslab.gov/Projects/MP_Lite/



A NetPIPE example: Performance on a Cray T3E

Raw SHMEM delivers:

  • 2600 Mbps

  • 2-3 us latency

    Cray MPI originally delivered:

  • 1300 Mbps

  • 20 us latency

    MP_Lite delivers:

  • 2600 Mbps

  • 9-10 us latency

    New Cray MPI delivers:

  • 2400 Mbps

  • 20 us latency

The tops of the spikes are where the message size is divisible by 8 bytes.

