Explore the implementation and optimization of data processing algorithms with sample sorting for parallel systems, covering topics such as bucket sort, recursive approaches, communication time complexities, and performance considerations. Learn about adaptive quadrature in numerical integration and the gravitational N-body problem.
Adding numbers
n data items, p processors
ts = O(n)
tp = O(n/p) if the data is already on each proc. => S = ts/tp = O(p)
tp = O(n + n/p) if the data needs broadcasting => S = ts/tp = O(1)
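The two cases above can be checked with a small unit-cost model. This is only a sketch: the function name `speedup_sum` and the assumption that every operation and every transferred item costs one time unit are illustrative, not from the slides.

```python
def speedup_sum(n: int, p: int, broadcast: bool) -> float:
    """Speedup ts/tp for summing n numbers on p processors (unit-cost model)."""
    ts = n                  # sequential: one pass over all n items
    tp = n / p              # parallel: each processor sums its own n/p items
    if broadcast:
        tp += n             # assumed: distributing the data first costs n units
    return ts / tp

# Without the broadcast the speedup grows with p; with it, it stays below 1.
for p in (4, 16, 64):
    print(p, speedup_sum(10**6, p, False), speedup_sum(10**6, p, True))
```

In this model the broadcast case gives S = p/(p+1), which is bounded by a constant no matter how many processors are added.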
Parallel Recursion
tcomm = O(n/2 + n/4 + ... + n/p) = O(n)
tcomp = O(n/2 + n/4 + ... + n/p) = O(n)
S = O(1)
tcomm = O(1 + 1 + ... + 1) = O(log p)
tcomp = O(1 + 1 + ... + 1) = O(log p)
S = O(n / log p)
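The O(log p) bound comes from combining values pairwise, one constant-size message per level. A minimal sketch, simulating the p processors as list slots (the name `tree_sum` and the pairing scheme are assumptions, not taken from the slides):

```python
def tree_sum(partials):
    """Combine one value per simulated processor in ceil(log2 p) pairwise rounds."""
    vals = list(partials)        # assumed non-empty: one value per processor
    rounds = 0
    stride = 1
    while stride < len(vals):
        # In one round, processor i receives a single value from processor i + stride.
        for i in range(0, len(vals) - stride, 2 * stride):
            vals[i] += vals[i + stride]
        stride *= 2
        rounds += 1
    return vals[0], rounds

total, rounds = tree_sum(range(1, 9))   # 8 simulated processors
print(total, rounds)
```

Each round costs one message and one addition per active processor, so both tcomm and tcomp grow with the number of rounds rather than with n.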
Sequential
m buckets, n numbers
ts = O(n + m((n/m) log (n/m))) = O(n log (n/m))
Parallel
m buckets, n numbers, p = m processors
tp = O(n + (n/p) log (n/p))
tp = O(n/p + (n/p) log (n/p)) = O((n/p) log (n/p)) => S = O(p)
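The sequential cost above splits into an O(n) distribution pass and m sorts of about n/m items each. A minimal sketch of that structure, assuming equal-width buckets over the value range (the slides do not say how bucket boundaries are chosen):

```python
def bucket_sort(xs, m):
    """Sort numbers by distributing them into m equal-width buckets."""
    if not xs:
        return []
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / m or 1.0         # avoid zero width when all values are equal
    buckets = [[] for _ in range(m)]
    for x in xs:                         # O(n) distribution pass
        i = min(int((x - lo) / width), m - 1)
        buckets[i].append(x)
    out = []
    for b in buckets:                    # m sorts of roughly n/m items each
        out.extend(sorted(b))
    return out

print(bucket_sort([0.42, 0.07, 0.93, 0.15, 0.61, 0.33, 0.88, 0.27], 4))
```

In the parallel version with p = m, bucket i would be sorted by processor i. The n/m estimate only holds when the input is spread roughly uniformly over the buckets.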
Det. Sample Sort
• sort locally and create p-sample
• send all p-samples to processor 1
• proc. 1: sort all received samples and compute global p-sample
• broadcast global p-sample
• bucket locally according to global p-sample
• send bucket i to proc. i
• re-sort locally
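The steps above can be sketched as a single-process simulation in which `parts[i]` stands for the data held by processor i. The helper names and the exact sample positions (every (len/p)-th element) are assumptions for illustration; the slides do not specify them, and a real implementation would use message passing rather than shared lists.

```python
import bisect

def p_sample(sorted_items, p):
    """Pick p evenly spaced elements from a non-empty sorted list."""
    return [sorted_items[i * len(sorted_items) // p] for i in range(p)]

def det_sample_sort(parts):
    """parts[i] is the local data of simulated processor i (assumed non-empty)."""
    p = len(parts)
    local = [sorted(part) for part in parts]                # sort locally
    samples = [s for part in local for s in p_sample(part, p)]  # gather p-samples
    global_sample = p_sample(sorted(samples), p)            # computed on proc. 1
    splitters = global_sample[1:]                           # p - 1 bucket boundaries
    received = [[] for _ in range(p)]
    for part in local:                                      # bucket locally and
        for x in part:                                      # send bucket i to proc. i
            received[bisect.bisect_right(splitters, x)].append(x)
    return [sorted(r) for r in received]                    # re-sort locally

parts = [[(7 * i + 3 * j) % 101 for j in range(16)] for i in range(4)]
for i, bucket in enumerate(det_sample_sort(parts)):
    print(i, bucket)
```

Concatenating the returned lists in processor order gives the fully sorted sequence, because every item in bucket i lies below the splitter that starts bucket i + 1.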
Det. Sample Sort
Lemma: Each proc. receives at most 2n/p data items
[Figure labels: n/p^2, global sample]
Det. Sample Sort
Post-Processing: “Array Balancing”
[Figure: processors 1 to 8, each holding n/p items]
2 Rounds:
• Each proc. sends rec. data size to all other proc.
• Move data to right location via one h-relation
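The two balancing rounds can be sketched as follows, again simulating processors as list entries. The function name `balance` and the choice of ceil(n/p) as the target size are assumptions; the idea is only that once every processor knows all sizes, each item's global rank determines its destination.

```python
def balance(parts):
    """Redistribute sorted parts so every simulated processor holds about n/p items."""
    p = len(parts)
    sizes = [len(part) for part in parts]   # round 1: everyone learns all sizes
    n = sum(sizes)
    target = -(-n // p)                     # ceil(n / p) items per processor
    out = [[] for _ in range(p)]
    offset = 0                              # global rank of this part's first item
    for part in parts:                      # round 2: one h-relation moves the data
        for k, x in enumerate(part):
            out[(offset + k) // target].append(x)
        offset += len(part)
    return out

print(balance([[1, 2, 3, 4, 5], [6], [7, 8], []]))
```

Because items keep their global order, the balanced array is still sorted across processors.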
Det. Sample Sort
• 5 MPI_Alltoallv for n/p > p^2
• O((n/p) log n) local comp.
• Goodrich (FOCS'98): O(1) rounds for n/p > p^ε
static assignment of processors to segments of [a,b]
area = d (f(p) + f(q))/2
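A static assignment can be sketched by giving each of p simulated processors one equal-width segment of [a,b] and summing trapezoids inside it. The function names and the `steps_per_proc` parameter are hypothetical, and d is taken to be the width of one trapezoid.

```python
def trapezoid(f, a, b, steps):
    """Sum of `steps` trapezoids, each of area d * (f(p) + f(q)) / 2, on [a, b]."""
    d = (b - a) / steps
    return sum(d * (f(a + i * d) + f(a + (i + 1) * d)) / 2 for i in range(steps))

def static_quadrature(f, a, b, p, steps_per_proc):
    """Give each of p simulated processors one equal-width segment of [a, b]."""
    width = (b - a) / p
    partial = [trapezoid(f, a + i * width, a + (i + 1) * width, steps_per_proc)
               for i in range(p)]
    return sum(partial)                  # final combine step (a reduction)

print(static_quadrature(lambda x: x * x, 0.0, 1.0, 4, 25))
```

Every processor does the same amount of work here, which is simple but spends as many steps on flat parts of the curve as on steep ones.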
Adaptive Quadrature
Terminate when C is sufficiently small
Problem: different parts of the curve need different resolution
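A minimal recursive sketch of the idea. The slide's C is defined in a figure that is not part of this text, so the code assumes C is the difference between one trapezoid over [p,q] and the two half-width trapezoids; the tolerance handling and `max_depth` guard are likewise assumptions.

```python
import math

def adaptive_quadrature(f, p, q, tol=1e-6, depth=0, max_depth=40):
    """Return (area, number of leaf segments) for f on [p, q]."""
    mid = (p + q) / 2
    whole = (q - p) * (f(p) + f(q)) / 2          # one trapezoid
    left = (mid - p) * (f(p) + f(mid)) / 2       # two half-width trapezoids
    right = (q - mid) * (f(mid) + f(q)) / 2
    c = abs(left + right - whole)                # assumed stand-in for the slide's C
    if c < tol or depth >= max_depth:
        return left + right, 1
    a1, n1 = adaptive_quadrature(f, p, mid, tol / 2, depth + 1, max_depth)
    a2, n2 = adaptive_quadrature(f, mid, q, tol / 2, depth + 1, max_depth)
    return a1 + a2, n1 + n2

area, leaves = adaptive_quadrature(math.sqrt, 0.0, 1.0)
print(area, leaves)
print(adaptive_quadrature(math.sqrt, 0.0, 0.5)[1],
      adaptive_quadrature(math.sqrt, 0.5, 1.0)[1])
```

For sqrt on [0,1] the half near 0 needs more leaf segments than the half near 1, which is the load imbalance the slide points to. Note that this simple test can stop too early when the three sample points happen to be collinear.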
[Figure: segments 1 to 5]
Gravitational N-Body Problem
for each time step:
    for each object:
        traverse tree to determine its forces
Problem: traversals have different lengths
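The uneven traversal lengths can be seen in a small one-dimensional sketch in the style of a Barnes-Hut tree. Everything here is an illustrative assumption: the slides do not name the tree type, and the 1-D setting, the opening parameter `theta`, the dictionary node layout, and the unit gravitational constant were chosen only to keep the example short.

```python
def build(points, lo, hi):
    """Binary tree over [lo, hi); points are distinct (position, mass) pairs."""
    mass = sum(m for _, m in points)
    if len(points) == 1:                          # leaf: a single object
        return {"size": hi - lo, "mass": mass, "com": points[0][0], "kids": []}
    com = sum(x * m for x, m in points) / mass    # centre of mass of the cell
    mid = (lo + hi) / 2
    kids = []
    for a, b in ((lo, mid), (mid, hi)):
        sub = [(x, m) for x, m in points if a <= x < b]
        if sub:
            kids.append(build(sub, a, b))
    return {"size": hi - lo, "mass": mass, "com": com, "kids": kids}

def force(node, x, theta=0.5):
    """Return (force on a unit mass at x, number of tree nodes visited)."""
    d = node["com"] - x
    far = d != 0 and node["size"] / abs(d) < theta
    if far or not node["kids"]:
        if d == 0:
            return 0.0, 1                         # the object itself
        return node["mass"] * (1 if d > 0 else -1) / (d * d), 1
    total, visited = 0.0, 1
    for kid in node["kids"]:                      # cell too close: open it
        f, v = force(kid, x, theta)
        total += f
        visited += v
    return total, visited

pts = [(0.01, 1.0), (0.02, 1.0), (0.03, 1.0), (0.04, 1.0), (0.9, 1.0)]
tree = build(pts, 0.0, 1.0)
for x, _ in pts:
    print(x, force(tree, x))
```

The isolated object at 0.9 treats the whole cluster near 0 as a single distant mass and visits few nodes, while objects inside the cluster must open many cells. Assigning objects to processors statically therefore gives unequal work.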
[Figure: objects 1 to 5]