
Parallel Workload



Presentation Transcript


  1. Parallel Workload Jongeun Lee Fall 2013

  2. N-Body • N-body problem: • to find the positions and velocities of a collection of interacting particles over a period of time • e.g., a collection of stars (astrophysics), or of molecules or atoms (chemistry) • Input: the mass, position, and velocity of each particle at the start of the simulation • Output: the position and velocity of each particle at a sequence of user-specified times

  3. The Problem • n-body solver that simulates the motions of planets or stars • Particle q has mass m_q; at time t its position is s_q(t), and f_qk(t) is the force on q exerted by particle k • Total force F_q(t) on q: the sum of the forces exerted by the other particles 0, 1, …, n − 1 (k ≠ q) • Applying Newton's second law, F_q = m_q a_q = m_q s_q'', gives us a system of differential equations to solve • Now let's find s_q(t) and s_q'(t) at the user-specified output times
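For a gravitational solver the pairwise force follows Newton's law of universal gravitation; a sketch of the resulting system in standard notation (the slide's own formula image is not part of the transcript):

    f_{qk}(t) = -\frac{G\, m_q m_k}{\lvert s_q(t) - s_k(t) \rvert^{3}} \bigl( s_q(t) - s_k(t) \bigr)
    F_q(t) = \sum_{k \neq q} f_{qk}(t)
    m_q\, s_q''(t) = F_q(t), \qquad v_q(t) = s_q'(t)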

  4. Serial Program

    Get input data
    for each timestep:
        Print positions and velocities of particles
        for each particle q:
            Compute total force on q
        for each particle q:
            Compute position and velocity of q
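A self-contained C sketch of this outline (the two-particle data, step size, and step count are illustrative values chosen here, not from the lecture); the two inner loops are exactly the ones detailed on the following slides:

    #include <stdio.h>
    #include <math.h>

    #define N 2          /* tiny example: two particles */
    #define X 0
    #define Y 1

    const double G = 6.673e-11;

    int main(void) {
        double masses[N] = {5.0e24, 7.0e22};
        double pos[N][2] = {{0.0, 0.0}, {4.0e8, 0.0}};
        double vel[N][2] = {{0.0, 0.0}, {0.0, 1.0e3}};
        double forces[N][2];
        double delta_t = 10.0;
        int n_steps = 3;

        for (int step = 0; step < n_steps; step++) {
            /* print positions and velocities of particles */
            for (int q = 0; q < N; q++)
                printf("step %d, particle %d: s = (%g, %g), v = (%g, %g)\n",
                       step, q, pos[q][X], pos[q][Y], vel[q][X], vel[q][Y]);

            /* first inner loop: compute total force on each particle q
               (basic algorithm, detailed on the next slide) */
            for (int q = 0; q < N; q++) {
                forces[q][X] = forces[q][Y] = 0.0;
                for (int k = 0; k < N; k++) {
                    if (k == q) continue;
                    double x_diff = pos[q][X] - pos[k][X];
                    double y_diff = pos[q][Y] - pos[k][Y];
                    double dist = sqrt(x_diff*x_diff + y_diff*y_diff);
                    double dist_cubed = dist*dist*dist;
                    forces[q][X] -= G*masses[q]*masses[k]/dist_cubed * x_diff;
                    forces[q][Y] -= G*masses[q]*masses[k]/dist_cubed * y_diff;
                }
            }

            /* second inner loop: update position and velocity of each q
               (Euler step, detailed on slide 8) */
            for (int q = 0; q < N; q++) {
                pos[q][X] += delta_t*vel[q][X];
                pos[q][Y] += delta_t*vel[q][Y];
                vel[q][X] += delta_t/masses[q]*forces[q][X];
                vel[q][Y] += delta_t/masses[q]*forces[q][Y];
            }
        }
        return 0;
    }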

  5. First Inner Loop: Basic Algorithm

    for each particle q {
        for each particle k != q {
            x_diff = pos[q][X] - pos[k][X];
            y_diff = pos[q][Y] - pos[k][Y];
            dist = sqrt(x_diff*x_diff + y_diff*y_diff);
            dist_cubed = dist*dist*dist;
            forces[q][X] -= G*masses[q]*masses[k]/dist_cubed * x_diff;
            forces[q][Y] -= G*masses[q]*masses[k]/dist_cubed * y_diff;
        }
    }

  6. Reduced Algorithm • Each pair (q, k) is visited only once: by Newton's third law, the force of k on q is the negative of the force of q on k, so one pairwise force updates both particles and roughly halves the work.

    for each particle q
        forces[q][X] = forces[q][Y] = 0;
    for each particle q {
        for each particle k > q {
            x_diff = pos[q][X] - pos[k][X];
            y_diff = pos[q][Y] - pos[k][Y];
            dist = sqrt(x_diff*x_diff + y_diff*y_diff);
            dist_cubed = dist*dist*dist;
            force_qk[X] = G*masses[q]*masses[k]/dist_cubed * x_diff;
            force_qk[Y] = G*masses[q]*masses[k]/dist_cubed * y_diff;
            forces[q][X] += force_qk[X];
            forces[q][Y] += force_qk[Y];
            forces[k][X] -= force_qk[X];
            forces[k][Y] -= force_qk[Y];
        }
    }

  7. Integration
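The integration scheme the next slide implements is a forward Euler step; in standard notation (a sketch, since the slide's own figure is not in the transcript):

    s_q(t + \Delta t) \approx s_q(t) + \Delta t\, v_q(t)
    v_q(t + \Delta t) \approx v_q(t) + \frac{\Delta t}{m_q} F_q(t)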

  8. Completing Serial Program • Second inner loop: computing position and velocity of q

    pos[q][X] += delta_t*vel[q][X];
    pos[q][Y] += delta_t*vel[q][Y];
    vel[q][X] += delta_t/masses[q]*forces[q][X];
    vel[q][Y] += delta_t/masses[q]*forces[q][Y];

  9. Communication among Tasks

  10. Mapping • How to map tasks to cores? • Two dimensions: n particles? T timesteps? • Load balancing? • Shared memory vs. message passing • Optimized algorithms • Hierarchical methods (e.g., Barnes-Hut, the Fast Multipole Method)
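One concrete shared-memory option, sketched here with OpenMP (an illustration, not the lecture's code): distribute the particle loop of the basic algorithm over threads. With the basic algorithm every iteration does the same amount of work, so a block (static) schedule balances the load; for the reduced algorithm a cyclic schedule such as schedule(static, 1) is the usual fix, since iteration q does work proportional to n - q.

    #include <math.h>

    #define X 0
    #define Y 1

    /* Compute the total force on every particle (basic algorithm, slide 5),
       with the outer particle loop mapped across cores by OpenMP.  Each thread
       writes only the forces[q] rows of its own iterations, so no locking is
       needed.  Compile with -fopenmp. */
    void compute_forces(int n, double pos[][2], double masses[],
                        double forces[][2], double G) {
        #pragma omp parallel for schedule(static)
        for (int q = 0; q < n; q++) {
            forces[q][X] = forces[q][Y] = 0.0;
            for (int k = 0; k < n; k++) {
                if (k == q) continue;
                double x_diff = pos[q][X] - pos[k][X];
                double y_diff = pos[q][Y] - pos[k][Y];
                double dist = sqrt(x_diff*x_diff + y_diff*y_diff);
                double dist_cubed = dist*dist*dist;
                forces[q][X] -= G*masses[q]*masses[k]/dist_cubed * x_diff;
                forces[q][Y] -= G*masses[q]*masses[k]/dist_cubed * y_diff;
            }
        }
    }

The reduced algorithm is harder to parallelize this way because forces[k] is also updated inside the loop, so threads would need per-thread force arrays or some form of synchronization.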

  11. Monte Carlo Method • Popular in • computational physics, numerical integration, optimization, etc. • Basic idea • How to calculate the probability of a solitaire game coming out successfully? • Very useful for • modeling phenomena with significant uncertainty in inputs (e.g., calculation of risk in business) • simulating systems with many coupled degrees of freedom • evaluating multidimensional definite integrals with complicated boundary conditions Source: wikipedia.com
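A minimal sketch of the idea using the textbook example of estimating pi (an illustration chosen here, not taken from the slide): sample random points in the unit square and count how many fall inside the quarter circle. Every sample is independent, which is what makes Monte Carlo workloads so easy to parallelize.

    #include <stdio.h>
    #include <stdlib.h>

    /* Monte Carlo estimate of pi: the fraction of random points in the unit
       square that land inside the quarter circle approaches pi/4. */
    int main(void) {
        long n_samples = 10000000, n_inside = 0;
        srand(2013);
        for (long i = 0; i < n_samples; i++) {
            double x = (double)rand() / RAND_MAX;
            double y = (double)rand() / RAND_MAX;
            if (x*x + y*y <= 1.0)
                n_inside++;
        }
        printf("pi is approximately %f\n", 4.0 * (double)n_inside / n_samples);
        return 0;
    }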

  12. MapReduce http://mm-tom.s3.amazonaws.com/blog/MapReduce.png

  13. MapReduce Example http://blogs.vmware.com/vfabric/files/2013/05/map-reduce-core-idea_numbered.jpg
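A sequential sketch of the word-count example that such figures usually illustrate (an assumption here, since the linked image is not reproduced in the transcript): map emits a (word, 1) pair for every word, and reduce sums the values of all pairs that share a key. In a real MapReduce framework the map and reduce calls run in parallel across many machines, with the framework doing the grouping (shuffle) in between.

    #include <stdio.h>
    #include <string.h>

    #define MAX_PAIRS 1024
    #define MAX_KEY   32

    /* Intermediate (key, value) pairs emitted by the map phase. */
    static char keys[MAX_PAIRS][MAX_KEY];
    static int  vals[MAX_PAIRS];
    static int  n_pairs = 0;

    /* map: emit (word, 1) for every word in one input record */
    static void map(const char *record) {
        char buf[256];
        strncpy(buf, record, sizeof buf - 1);
        buf[sizeof buf - 1] = '\0';
        for (char *w = strtok(buf, " \t\n"); w && n_pairs < MAX_PAIRS;
             w = strtok(NULL, " \t\n")) {
            strncpy(keys[n_pairs], w, MAX_KEY - 1);
            vals[n_pairs] = 1;
            n_pairs++;
        }
    }

    /* reduce: sum the values of all pairs that share the same key */
    static void reduce(void) {
        int done[MAX_PAIRS] = {0};
        for (int i = 0; i < n_pairs; i++) {
            if (done[i]) continue;
            int sum = vals[i];
            for (int j = i + 1; j < n_pairs; j++)
                if (!done[j] && strcmp(keys[i], keys[j]) == 0) {
                    sum += vals[j];
                    done[j] = 1;
                }
            printf("%s: %d\n", keys[i], sum);
        }
    }

    int main(void) {
        const char *input[] = { "the quick brown fox", "the lazy dog", "the fox" };
        for (int i = 0; i < 3; i++) map(input[i]);   /* map phase */
        reduce();                                    /* group by key + reduce phase */
        return 0;
    }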

  14. Structured Grid • A simple stencil • More generally: • physics simulation (e.g., simulation of the strong nuclear force, temperature of an oven, etc.) • multiple refinements, until an error threshold is reached • many dimensions (over 10^7 for QCD), many nodes • (Figure: the value of the red node is updated by a linear combination of the values of the blue nodes)
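A minimal C sketch of such a stencil (an illustration with made-up sizes and boundary values): a 2-D grid where each interior node is repeatedly replaced by the average of its four neighbors, written into a second buffer so that one sweep never mixes old and new values.

    #include <stdio.h>

    #define N 8   /* interior grid is N x N; the extra layer holds fixed boundary values */

    /* One sweep of a 5-point stencil: each interior node becomes the average of
       its four neighbors; results are written to a separate output buffer. */
    void sweep(double in[N + 2][N + 2], double out[N + 2][N + 2]) {
        for (int i = 1; i <= N; i++)
            for (int j = 1; j <= N; j++)
                out[i][j] = 0.25 * (in[i - 1][j] + in[i + 1][j] +
                                    in[i][j - 1] + in[i][j + 1]);
    }

    int main(void) {
        static double a[N + 2][N + 2], b[N + 2][N + 2];   /* zero-initialized */
        for (int j = 0; j < N + 2; j++)
            a[0][j] = b[0][j] = 100.0;                    /* one hot edge, e.g. an oven wall */

        for (int iter = 0; iter < 1000; iter++) {  /* fixed count; a real solver iterates until an error threshold is met */
            sweep(a, b);
            sweep(b, a);
        }
        printf("center value: %f\n", a[N / 2][N / 2]);
        return 0;
    }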

  15. Parallel Implementation • Simpler case: 1- to 3-dimensional grids • assign each processor a chunk of the grid, determined by a partitioning algorithm • each processor computes the updates of its chunk in each iteration • partitioning may be done statically or dynamically • Ghost nodes for the border grid points • Double-buffering
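A minimal message-passing sketch of these ideas in MPI (an illustration, not the lecture's code), assuming a 1-D grid split into equal chunks, a simple two-neighbor averaging update, and a fixed iteration count. Each process keeps two extra ghost nodes that mirror its neighbors' border points, refreshes them every iteration, and writes updates into a second buffer before swapping (double-buffering).

    #include <mpi.h>
    #include <stdio.h>

    #define N_LOCAL 100   /* interior points owned by each process (illustrative size) */

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, nprocs;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        /* Two buffers for double-buffering; indices 0 and N_LOCAL+1 are ghost nodes. */
        static double buf_a[N_LOCAL + 2], buf_b[N_LOCAL + 2];
        double *u = buf_a, *u_new = buf_b;
        for (int i = 1; i <= N_LOCAL; i++) u[i] = 1.0;   /* arbitrary initial interior values */

        int left  = (rank > 0)          ? rank - 1 : MPI_PROC_NULL;   /* no neighbor at the ends */
        int right = (rank < nprocs - 1) ? rank + 1 : MPI_PROC_NULL;

        for (int iter = 0; iter < 100; iter++) {
            /* Refresh ghost nodes: exchange border points with both neighbors. */
            MPI_Sendrecv(&u[1],           1, MPI_DOUBLE, left,  0,
                         &u[N_LOCAL + 1], 1, MPI_DOUBLE, right, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Sendrecv(&u[N_LOCAL],     1, MPI_DOUBLE, right, 1,
                         &u[0],           1, MPI_DOUBLE, left,  1,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);

            /* Update this process's chunk into the second buffer, then swap. */
            for (int i = 1; i <= N_LOCAL; i++)
                u_new[i] = 0.5 * (u[i - 1] + u[i + 1]);
            double *tmp = u; u = u_new; u_new = tmp;
        }

        if (rank == 0) printf("done after 100 iterations\n");
        MPI_Finalize();
        return 0;
    }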

  16. Advanced Solvers • Multi-grids • instead of fixed-size chunks, make several copies of the grid at various chunk granularities • a result from one node can propagate more quickly to far-away nodes • faster convergence

  17. Advanced Solvers • Adaptive mesh refinement • finer discretization in regions where the solution changes more rapidly in space or time • convergence rate can improve vastly • after each update, check the solution and decide whether to subdivide the region

  18. Advanced Solvers • Invariants • no communication between PEs on a given refinement level, except for the ghost nodes • need only one or a small number of prior iterations

  19. Example • Finite Difference method • simulate temperature of a rod insulated except at the ends • temperature: u(x,t) • heat equation
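In standard form (a sketch; the slide's own equation image is not in the transcript), with constant diffusivity \kappa, fixed temperatures at the two ends, and the usual explicit finite-difference update:

    \frac{\partial u}{\partial t} = \kappa\, \frac{\partial^2 u}{\partial x^2}
    u_i^{n+1} = u_i^n + \frac{\kappa\, \Delta t}{\Delta x^2} \bigl( u_{i+1}^n - 2 u_i^n + u_{i-1}^n \bigr)

This is the 1-D instance of the structured-grid pattern above: node i is updated from its own value and its two neighbors.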

  20. Seven Questions

  21. Summary I • Applications • 13 dwarfs that form the core of future apps • They span graphics, machine learning, AI, etc. • Hardware • 1000s of cores on a die • Heterogeneous cores for area and power advantages • Shared/transactional memory and full-empty bits for synchronization • On-chip communication crucial for high performance

  22. Summary II • Programming models • Based on psychology research to make them more intuitive • Independent of the number of processors • Systems software • Using search-based autotuners instead of compilers • Virtual machine-based approach • Success metrics • Maximize programmer productivity and app performance • Hardware counters & monitors to help calibrate success • Others?

  23. Summary III “This report is intended to be the start of a conversation about these perspectives. There is an open, exciting, and urgent research agenda…”
