ADaPT: An Event-Passing Protocol for Reducing Delivery Costs in Scatter-Gather Parallel Processes

Outline

- Motivation
- Established Techniques
- ADaPT
- Performance Comparison
- Conclusions

What is the Laboratory for Neural Dynamics?

- A computational-science section of the Center for Neural Engineering
- Part of a National Science Foundation engineering research center dedicated to biomimetic microelectronic systems
- Combines computational electrophysiology, engineering, pharmacology, and other disciplines
- Integrates empirically measured, realistic, and biologically inspired synaptic models for the purpose of temporal signal processing

The Dynamic Synapse

- Biologically-inspired rather than realistic
- Computationally-complex and non-linear
- The signal-processing application was originally a proof of concept
- Now a synergistic field for the Center

Figure 1: Electro-chemical synaptic transmission. [Diagram labels: presynapse, postsynapse, neuron; Na+ and Ca2+; feedback; threshold; action potential (AP) input; glutamate release; synaptic potential (EPSP) summation; action potential output.]

Motivation: Dynamic Synapse Neural Networks

- Classical NN structure
- Increased synaptic functionality
- Parameter training via genetic algorithms

Figure 2: 3xK 2-Layer DSNN Single Word Classifier. [Diagram labels: microphone array; captured sound; K-length filter bank; Layer 1; array of K input neurons; Layer 2; 3xK pre-synaptic matrix; 3xK post-synaptic matrix; output neurons 1 to 3; feedback modulation; AP.]

Established Techniques

- Naïve Approach
  - Multiple sequential scatter-gathers
  - With uniform computation times, exhibits decent parallelism
  - With variable computation times, incurs significant idle time

Figure 3: Multi-phase evaluation of 3n genomes by n workers using naïve scatter-gathering. [Diagram: the master performs successive scatter-gathers across workers 1 through n; axis: computation time.]

Established Techniques

A More Efficient Mapping

- Asynchronous scattering
- Reduced idle time for workers
- Closer to optimal time to solution
- Dynamic allocation of resources
- More difficult

Figure 4: Multi-phase evaluation of 3n genomes by n workers using a more efficient mapping. [Diagram: master and workers 1 through n; axis: computation time.]

ADaPT

Adaptive Data-parallel Publish/Subscribe Transport Protocol

- Publish/Subscribe
  - Worker-centric, i.e., worker processes subscribe to the master
  - Data is transported (published) to workers as events
  - Unsubscription is possible
- Two-phase adaptive protocol
  - Learning phase: request-reply; the time between requests is monitored
  - Aggressive phase: events are pushed to workers at regular intervals
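
The two phases can be sketched as master-side bookkeeping for a single subscribed worker. Everything here is hypothetical: the `Subscription` class, its field names, the choice of averaging the observed gaps, and the number of samples are assumptions made for illustration, since the slides do not specify how the push interval is estimated.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Subscription:
    """Master-side state for one subscribed worker (hypothetical structure)."""
    samples_needed: int = 3                      # gaps to observe before switching
    request_times: List[float] = field(default_factory=list)
    interval: Optional[float] = None             # set when the learning phase ends
    next_push: Optional[float] = None

    @property
    def phase(self) -> str:
        return "learning" if self.interval is None else "aggressive"

    def on_request(self, now: float) -> None:
        """Learning phase: the worker requested an event; record when."""
        self.request_times.append(now)
        if len(self.request_times) > self.samples_needed:
            gaps = [b - a for a, b in
                    zip(self.request_times, self.request_times[1:])]
            self.interval = sum(gaps) / len(gaps)   # assumed estimator: mean gap
            self.next_push = now + self.interval

    def should_push(self, now: float) -> bool:
        """Aggressive phase: push once the learned interval has elapsed."""
        if self.phase != "aggressive" or now < self.next_push:
            return False
        self.next_push += self.interval
        return True
```

In this sketch the master calls `on_request` while it is still answering explicit requests, then polls `should_push` to decide when to send the next event unprompted.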

Performance Comparison

Message-Passing Costs for MPI Scatters

- Two protocols
  - Aggressive and conservative
- Scatter/gathers in most implementations use conservative protocols
- Analysis due to Gropp et al.

$$C(\text{MPI scatter}) = (\#\,\text{pop.})\,[3s + r(n + 3e)]$$

where n = event payload, e = envelope, r = network bandwidth term (time per unit of data, so that r(n + 3e) is a time), and s = latency.

Equation 1: Time cost of scatters in MPI.
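
As a quick way to plug numbers into Equation 1, the formula can be written as a function. This is a direct transcription, not an MPI measurement; the function name and the example values are made up for illustration, and `r` is assumed to be expressed as transfer time per unit of data so that the result is a time.

```python
def mpi_scatter_cost(pop, s, r, n, e):
    """Equation 1: cost of scattering `pop` items with a conservative protocol.

    s: latency per message, r: transfer time per unit of data (assumed units),
    n: event payload size, e: envelope size.
    """
    return pop * (3 * s + r * (n + 3 * e))


# Hypothetical numbers, chosen only to show the arithmetic.
print(mpi_scatter_cost(pop=10, s=1.0, r=0.5, n=4, e=2))
```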

Performance Comparison

Computational Costs for Multiple Scatters in MPI

- Our assumption is a normally distributed population of compute times
- An ideal ordering of computations would be sorted by compute time
- How much idle time is present?

$$C(\text{Computation}) = (\#\,\text{pop.})(\text{avg. compute time}) + (\#\,\text{workers})(\text{avg. compute time})$$

Equation 2: Computation time of a normally distributed population using scatters in MPI.

Figure 5: Graph of sorted compute times of a normal distribution illustrating idle time.

Performance Comparison

Message-Passing Costs for ADaPT

- Three different costs of event-passing in ADaPT:
  - Subscription
  - Learning phase
  - Aggressive phase

$$C(\text{subscription}) = (\#\,\text{workers}) \times [s + re]$$

$$C(\text{learning}) = (\#\,\text{samples}) \times [2s + r(n + 2e)]$$

$$C(\text{aggressive}) = (\#\,\text{pop} - \#\,\text{samples}) \times [s + r(n + e)]$$

Note: we assume control events to be of size e.

Equation 3: Event-passing costs of ADaPT.
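
The three terms of Equation 3 can be evaluated together with a small helper. The function name and the sample values are hypothetical; the formulas are transcribed from the slide, with control events taken to be of size e as the note states.

```python
def adapt_event_costs(pop, workers, samples, s, r, n, e):
    """Equation 3: (subscription, learning, aggressive) event-passing costs.

    s: latency, r: transfer time per unit of data (assumed units),
    n: event payload size, e: envelope (and control event) size.
    """
    subscription = workers * (s + r * e)
    learning = samples * (2 * s + r * (n + 2 * e))
    aggressive = (pop - samples) * (s + r * (n + e))
    return subscription, learning, aggressive


# Hypothetical numbers, chosen only to show the arithmetic.
sub, learn, aggr = adapt_event_costs(pop=100, workers=10, samples=10,
                                     s=1.0, r=0.5, n=4, e=2)
print(sub, learn, aggr, sub + learn + aggr)
```

Each aggressive-phase event costs one latency and one envelope, against two of each per learning-phase event, so the total falls as more of the population is delivered in the aggressive phase.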

Performance Comparison

Unsubscribe Costs for ADaPT

- An unsubscribe occurs when a worker's event buffer is in danger of overflowing
- With ADaPT, an overflow occurs when a worker receives m-1 events triggering compute times greater than the estimated average (assuming a worker buffers m events)
- Conservatively, we have decided that workers should clear their buffers before resubscribing
- We used a Monte Carlo simulation (details in paper) to determine E, the percentage of the population with compute times greater than the estimated mean (given the estimation error), as a function of the percentage of the population sampled

$$P(\text{unsubscribe}) = \frac{\binom{E \cdot \text{Pop}}{m-1}}{\binom{\text{Pop}}{m-1}}$$

$$C(\text{unsubscribe}) = P(\text{unsubscribe}) \times [2(s + re) + (m-1)(\text{avg. compute} + \Delta)]$$

Equation 4: Costs of worker unsubscription in ADaPT.
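
Equation 4 can be evaluated numerically as follows. This sketch reads "C" in the probability as the binomial coefficient ("choose"), takes E as a fraction between 0 and 1 rather than a percentage, and rounds E times the population to the nearest integer; those readings, the function names, and the example values are assumptions. Δ is passed in as a plain parameter because the slides do not define it further.

```python
from math import comb


def p_unsubscribe(E, pop, m):
    """Probability that m-1 buffered events all come from the slow fraction E."""
    slow = round(E * pop)                 # population members above the mean
    return comb(slow, m - 1) / comb(pop, m - 1)


def unsubscribe_cost(p, s, r, e, m, avg_compute, delta):
    """Equation 4: expected cost of an unsubscribe/resubscribe cycle."""
    return p * (2 * (s + r * e) + (m - 1) * (avg_compute + delta))


# Hypothetical numbers, chosen only to show the arithmetic.
p = p_unsubscribe(E=0.5, pop=10, m=3)
print(p, unsubscribe_cost(p, s=1.0, r=0.5, e=2, m=3,
                          avg_compute=10.0, delta=1.0))
```

Because the numerator chooses from a subset of the population, the probability shrinks quickly as the buffer size m grows.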

Performance Comparison

Analysis

- Which protocol is more appropriate?
- For simplicity of comparison we will drop the latency term and assume the number of samples to be equal to the number of workers
  - i.e. each worker's first computation is monitored

$$(\#\,\text{pop} - \#\,\text{samples})\,2re + (\#\,\text{samples})(\text{avg. compute time}) > \frac{\#\,\text{pop}}{m} \times P(\text{unsubscribe}) \times [2re + (m-1)(\text{avg. compute time})]$$

Equation 5: Cost comparison of MPI vs. ADaPT.

Figure 6: Graph of inequality in Equation 5.
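
The inequality in Equation 5 can be checked for particular parameter values. Its first left-hand term is what remains when the ADaPT event costs of Equation 3 are subtracted from the MPI cost of Equation 1 with latency dropped and the number of samples set equal to the number of workers. Reading the whole left side as the cost ADaPT avoids and the right side as its unsubscription penalty is an interpretation, not something the slides state. The function name and example values are hypothetical.

```python
def equation5_holds(pop, samples, m, p_unsub, r, e, avg_compute):
    """Evaluate both sides of Equation 5 and report whether LHS > RHS.

    r: transfer time per unit of data (assumed units), e: envelope size,
    m: events buffered per worker, p_unsub: P(unsubscribe) from Equation 4.
    """
    lhs = (pop - samples) * 2 * r * e + samples * avg_compute
    rhs = (pop / m) * p_unsub * (2 * r * e + (m - 1) * avg_compute)
    return lhs > rhs


# Hypothetical numbers: a rare unsubscribe versus a certain one.
print(equation5_holds(pop=100, samples=10, m=4, p_unsub=0.1,
                      r=0.5, e=2, avg_compute=10.0))
print(equation5_holds(pop=100, samples=10, m=2, p_unsub=1.0,
                      r=0.5, e=2, avg_compute=10.0))
```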

Conclusions

What have we shown?

- ADaPT is useful when multiple scatterings of data must occur due to natural aggregation
  - An example is the training of the DSNN using genetic algorithms
- Worker-centric approach for reduced processor idle time
- Unsubscription is expensive but can be avoided with greater event-buffering capabilities
- ADaPT exploits an event pattern which emerges from the application of a well-known architectural pattern

