
How to Succeed in MPI Without Really Trying


Presentation Transcript


  1. How to Succeed in MPI Without Really Trying Dr. Jeff Squyres jsquyres@cisco.com December 9, 2010

  2. Who am I? • MPI Architect at Cisco Systems, Inc. • Co-founder, Open MPI project • http://www.open-mpi.org/ • Language Bindings Chapter author, MPI-2 • Secretary, MPI-3 Forum • http://www.mpi-forum.org/ MPI Geek

  3. Assumptions • You generally know what MPI is • Examples: • You’ve run an MPI application • You’ve written an application that uses MPI • You’ve ported an existing application to use MPI • You know how to spell MPI

  4. How I think of MPI

  5. MPI is hard • Race conditions • Deadlocks • Performance bottlenecks • High latency • Low bandwidth …oh my!

  6. MPI is easy ✔ • Network agnostic • Multi-core aware • Waaay easier than native network APIs • Simple things are simple • Continually evolving …oh my!

  7. Where do you land? (Diagram: where does MPI fall for you, on a scale from Easy to Hard?)

  8. Happy MPI users MPI = Fun! MPI: will you marry me? MPI saved me 15% off my car insurance!

  9. MPI is ___ …a mindset • “Think in parallel” …only a tool • It is not your work, your data, or your algorithms …one of many tools • Is MPI the right tool for your application?

  10. Become a happy MPI user • There are many ways to do this • Sometimes it’s not about writing better code • Think about all the parts involved • What is the problem you are trying to solve?

  11. Way #1: Don’t reinvent the wheel • “Not invented here” syndrome is dumb • Did someone else MPI-ize (or otherwise parallelize) a tool that you can use? • Visit the library, troll around on Google • Become a happy MPI user by avoiding MPI • Focus on your work • It’s easier to think in serial than in parallel

  12. #2: Get help • Consult local MPI / parallel algorithm experts • Parallel programming is hard • There are people who can help • “Cross functional collaboration” is sexy these days • Consider: we readily ask for help from librarians, mechanics, doctors, … • Why not ask a computer scientist?

  13. Proof that networking is hard • Open MPI SVN commit r22788 • Fixes a bug in TCP IPv4 and IPv6 handling • Represents months of work • 87 line commit message • One character change in the code base: 0 to 1 • Users should not need to care about this junk • This is why MPI (middleware) is good for you

  14. #3: Focus on time to solution • Which is most important? • Individual application execution time • Overall time to solution • Put differently: • Can you tolerate 5-10% performance loss if you finish the code a week earlier? • “Premature optimization is the root of all evil” • Donald Knuth (famous computer scientist)

  15. #4: printf() is not enough • For the love of all that is holy, please, Please, PLEASE use tools • Debuggers • Memory checkers • Profilers • Deadlock checkers • Spending a day to learn how to use a tool can save you (at least) a week of debugging

  16. Non-determinism = BAD • Print statements (can) introduce non-determinism • Changes race condition timings • Makes Heisenbugs incredibly difficult to find • Remember: print statements take up bandwidth and CPU resources (Diagram: mpirun collecting output from many MPI processes)

  17. #5: Use the right equipment • Find the right-sized parallel resource • Your laptop • Your desktop • One of Stanford’s clusters • Amazon EC2 • NSF resource • Look at your requirements

  18. Application requirements • The Big 4: • Memory • Disk (IO) • Network • Processor • YMMV • ...but this is a good place to start

  19. #6: Avoid common MPI mistakes • Properly set up your parallel environment • Test those shell startup / module files • Make sure the basics work • Example: try “mpirun hostname” (with OMPI) • Example: try “mpirun ring” (a minimal ring program is sketched below) • …and so on • Ensure PATH and LD_LIBRARY_PATH settings are right and consistent
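
  A "ring" program is a classic sanity test for a new MPI environment. Below is a minimal sketch in C, assuming a standard MPI installation; it is not the exact ring example shipped with any particular MPI, just the same idea: pass a token once around all ranks.

    /* ring.c -- minimal sanity test: pass a token once around all ranks.
     * Build: mpicc ring.c -o ring      Run: mpirun -np 4 ./ring          */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, size, token;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int next = (rank + 1) % size;
        int prev = (rank + size - 1) % size;

        if (rank == 0) {
            token = 42;
            MPI_Send(&token, 1, MPI_INT, next, 0, MPI_COMM_WORLD);
            MPI_Recv(&token, 1, MPI_INT, prev, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("Token made it around %d ranks\n", size);
        } else {
            MPI_Recv(&token, 1, MPI_INT, prev, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(&token, 1, MPI_INT, next, 0, MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
    }

  If this runs and prints the final line on every host in your hostfile, the compilers, paths, and launcher are wired up correctly.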

  20. #6: Avoid common MPI mistakes • Don’t orphan MPI requests • Use MPI_TEST and MPI_WAIT • Don’t assume they have completed • Great way to leak resources and memory MPI_Isend(…, req) … MPI_Recv(…) // Ah ha! I know the // send has completed. // But don’t forget to // complete it anyway MPI_Wait(req, …)
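
  A minimal sketch of the pattern on this slide: even when a later MPI_Recv proves the matching send must have finished, the request object still has to be completed with MPI_WAIT or MPI_TEST so the implementation can release its resources. The function and variable names (exchange, peer, buf, ack) are illustrative, and MPI is assumed to be initialized.

    /* Sketch: always complete non-blocking requests, even when you "know"
     * they are done.                                                     */
    #include <mpi.h>

    void exchange(int peer, int *buf, int *ack) {
        MPI_Request req;

        MPI_Isend(buf, 1, MPI_INT, peer, 0, MPI_COMM_WORLD, &req);

        /* The peer replies only after receiving our message...          */
        MPI_Recv(ack, 1, MPI_INT, peer, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        /* ...so the send has certainly completed -- but the request must
         * still be completed, or it (and its resources) leaks.          */
        MPI_Wait(&req, MPI_STATUS_IGNORE);
    }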

  21. #6: Avoid common MPI mistakes • Avoid MPI_PROBE when possible • Usually forces an extra memory copy • Try pre-posting MPI_IRECVs instead • No additional copy while (…) { MPI_Probe(…); } // Pre-post receives MPI_Irecv(…) while (…) { MPI_Test(…) }
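
  A hedged sketch of the pre-posting pattern: instead of probing and then receiving (which typically forces the implementation to buffer the message and copy it a second time), post the receives up front so incoming data can land directly in your buffers. The names drain_messages, src, nmsgs, and len are illustrative; MPI is assumed to be initialized.

    /* Sketch: pre-post receives instead of MPI_Probe + MPI_Recv.        */
    #include <mpi.h>
    #include <stdlib.h>

    void drain_messages(int src, int nmsgs, int len) {
        MPI_Request *reqs = malloc(nmsgs * sizeof(MPI_Request));
        int *bufs = malloc((size_t)nmsgs * len * sizeof(int));

        /* Post all receives before the data arrives: no probe, no extra copy. */
        for (int i = 0; i < nmsgs; i++)
            MPI_Irecv(&bufs[i * len], len, MPI_INT, src, 0,
                      MPI_COMM_WORLD, &reqs[i]);

        /* ... do useful work here while messages arrive ... */

        MPI_Waitall(nmsgs, reqs, MPI_STATUSES_IGNORE);
        free(reqs);
        free(bufs);
    }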

  22. #6: Avoid common MPI mistakes • Avoid mixing compiler suites when possible • Compile all middleware and your app with a single compiler suite • Vendors do provide (some) inter-compiler compatibility • But it’s usually easier to avoid this

  23. #6: Avoid common MPI mistakes • Don’t assume MPI implementation quirks • Check what the MPI spec really says • http://www.mpi-forum.org/ • True quotes I’ve heard from users: • “MPI collectives always synchronize” • “MPI_BARRIER completes sends” • “MPI_RECV isn’t always necessary” (my personal favorite)

  24. #6: Avoid common MPI mistakes • Don’t blame MPI for application errors • Your application is huge and complex • Try to replicate the problem in a small example • Not to be a jerk, but it’s usually an application bug…

  25. #6: Avoid common MPI mistakes • Don’t (re-)use a buffer before it’s ready • Completion is not guaranteed until MPI_TEST or MPI_WAIT • Do not read / modify before completion! MPI_Isend(buf, …); buf[3] = 10; // BAD! MPI_Irecv(buf, …); MPI_Barrier(…); A = buf[3]; MPI_Wait(…, req); // Bad!
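
  A sketch contrasting the broken patterns on this slide with the safe ordering: a buffer handed to MPI_Isend or MPI_Irecv belongs to MPI until the request completes, so touch it only after MPI_WAIT / MPI_TEST. The function name safe_reuse and the peer argument are illustrative; MPI is assumed to be initialized.

    #include <mpi.h>

    void safe_reuse(int peer) {
        int sendbuf[4] = {0, 1, 2, 3};
        int recvbuf[4];
        MPI_Request reqs[2];

        MPI_Irecv(recvbuf, 4, MPI_INT, peer, 0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Isend(sendbuf, 4, MPI_INT, peer, 0, MPI_COMM_WORLD, &reqs[1]);

        /* sendbuf[3] = 10;      WRONG: MPI may still be reading sendbuf  */
        /* int a = recvbuf[3];   WRONG: the data may not have arrived yet,
                                 even if the peer has passed an MPI_Barrier */

        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

        sendbuf[3] = 10;         /* OK: the send has completed            */
        int a = recvbuf[3];      /* OK: the receive has completed         */
        (void)a;
    }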

  26. #6: Avoid common MPI mistakes • Do not mix MPI implementations • Compile with MPI ABC • Run with MPI XYZ • Do not mix MPI implementation versions • Sometimes it may work • …sometimes it may not • Be absolutely sure of your environment settings

  27. #6: Avoid common MPI mistakes • Avoid MPI_ANY_SOURCE when possible • Similar to MPI_PROBE • MPI_ANY_SOURCE can disable some internal optimizations • Instead, pre-post receives • …when there are only a few possible peers • Ok to use MPI_ANY_SOURCE when many possible peers
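
  A sketch of what "pre-post receives per peer" can look like when the set of senders is small and known. The names recv_from_known_peers, peers, npeers, and results are illustrative; MPI is assumed to be initialized.

    /* Sketch: with a small, known set of senders, post one receive per
     * peer instead of looping on MPI_ANY_SOURCE.                        */
    #include <mpi.h>

    void recv_from_known_peers(const int *peers, int npeers, int *results) {
        MPI_Request reqs[16];    /* this sketch assumes npeers <= 16      */

        for (int i = 0; i < npeers; i++)
            MPI_Irecv(&results[i], 1, MPI_INT, peers[i], 0,
                      MPI_COMM_WORLD, &reqs[i]);

        /* Explicit sources let the implementation match messages more
         * cheaply than MPI_ANY_SOURCE would.                            */
        MPI_Waitall(npeers, reqs, MPI_STATUSES_IGNORE);
    }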

  28. #6: Avoid common MPI mistakes • Double check for unintended serialization • Using non-blocking sends and receives • But there’s an accidental “domino effect” • Example: process X cannot continue until it receives from process (X-1) • Message analyzer tools make this effect obvious (Diagram: a chain of MPI processes, each waiting on the previous one)
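
  A sketch of softening that domino effect: here rank X genuinely depends on rank X-1, but posting the receive early and doing the purely local work while the message is in flight keeps the chain from serializing everything. The function pipeline_step and its arguments are illustrative; MPI is assumed to be initialized.

    #include <mpi.h>

    double pipeline_step(int rank, int size, const double *local, int n) {
        double upstream = 0.0, sum = 0.0;
        MPI_Request rreq, sreq;

        /* Post the receive early instead of blocking on it immediately... */
        if (rank > 0)
            MPI_Irecv(&upstream, 1, MPI_DOUBLE, rank - 1, 0, MPI_COMM_WORLD, &rreq);

        /* ...and do the purely local work while the message is in flight. */
        for (int i = 0; i < n; i++)
            sum += local[i];

        if (rank > 0) {
            MPI_Wait(&rreq, MPI_STATUS_IGNORE);   /* only now wait for rank-1 */
            sum += upstream;
        }
        if (rank < size - 1) {
            MPI_Isend(&sum, 1, MPI_DOUBLE, rank + 1, 0, MPI_COMM_WORLD, &sreq);
            MPI_Wait(&sreq, MPI_STATUS_IGNORE);
        }
        return sum;
    }

  A message analyzer will still show the unavoidable chain of messages, but the local work now overlaps it instead of waiting behind it.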

  29. #6: Avoid common MPI mistakes • Do not assume MPI_SEND behavior • It may block • It may not block • Completion ≠ the receiver has the data • Only means you can re-use the buffer • Every implementation is different • Never assume any of the above behaviors
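
  The classic trap here: two ranks that both call MPI_SEND first will work while the implementation buffers small messages and hang when it does not. MPI_Sendrecv (or non-blocking calls) is safe regardless. A sketch, with illustrative names (exchange_with_peer, peer, sendbuf, recvbuf) and MPI assumed to be initialized:

    #include <mpi.h>

    void exchange_with_peer(int peer, int *sendbuf, int *recvbuf, int n) {
        /* Risky version (do NOT do this):
         *   MPI_Send(sendbuf, n, MPI_INT, peer, 0, MPI_COMM_WORLD);
         *   MPI_Recv(recvbuf, n, MPI_INT, peer, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
         * Works only if the implementation happens to buffer the send.   */

        /* Safe version: the implementation handles the ordering.         */
        MPI_Sendrecv(sendbuf, n, MPI_INT, peer, 0,
                     recvbuf, n, MPI_INT, peer, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }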

  30. #7: Use MPI collectives • MPI implementations use the fruits of 15+ years of collective algorithm research • [Almost] Always better than application-provided versions • This was not always true • Collectives in mid-90’s were terrible • They’re much better now • Go audit your code • Still an active research field

  31. #7: Use MPI collectives (Diagram: application-provided broadcast vs. implementation-provided broadcast)
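
  A sketch of the difference: a hand-rolled broadcast is a serial loop of point-to-point sends from the root, while MPI_Bcast lets the implementation pick an algorithm tuned for the topology and message size. The function broadcast_params and its arguments are illustrative; MPI is assumed to be initialized.

    #include <mpi.h>

    void broadcast_params(double *params, int n, int root) {
        /* Hand-rolled version (don't):
         *   if (rank == root)
         *       for (int r = 0; r < size; r++)
         *           if (r != root)
         *               MPI_Send(params, n, MPI_DOUBLE, r, 0, MPI_COMM_WORLD);
         *   else
         *       MPI_Recv(params, n, MPI_DOUBLE, root, 0, MPI_COMM_WORLD,
         *                MPI_STATUS_IGNORE);                              */

        /* Collective version: one call, algorithm chosen by the library. */
        MPI_Bcast(params, n, MPI_DOUBLE, root, MPI_COMM_WORLD);
    }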

  32. #8: Location, location, location! • It’s all about the NUMA • Memory is distributed around the server • Avoid remote memory! • Don’t forget the internal network (!) • It now matters (Diagram: multiple CPUs, each with its own local memory)

  33. #8: Location, location, location! • Paired with MPI • For off-node communication / data access • (At least) 2 levels of networking • “NUNA” • Non-uniform network access (Diagram: nodes connected by a network)

  34. #8: Location, location, location! • Design your algorithms to exploit data locality • Use what is local first • Use non-blocking for remote access
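
  A sketch of "local first, non-blocking for remote": a 1-D halo exchange in which interior cells (purely local data) are updated while the halo messages are in flight. The function smooth_step, the stencil, and the left/right neighbor arguments are illustrative; at the ends of the chain the neighbors can be MPI_PROC_NULL, and MPI is assumed to be initialized.

    #include <mpi.h>

    void smooth_step(double *grid, double *next, int n, int left, int right) {
        double halo_l = 0.0, halo_r = 0.0;
        MPI_Request reqs[4];

        /* Start the remote traffic first...                               */
        MPI_Irecv(&halo_l, 1, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Irecv(&halo_r, 1, MPI_DOUBLE, right, 1, MPI_COMM_WORLD, &reqs[1]);
        MPI_Isend(&grid[0],     1, MPI_DOUBLE, left,  1, MPI_COMM_WORLD, &reqs[2]);
        MPI_Isend(&grid[n - 1], 1, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[3]);

        /* ...then update interior cells, which need only local data.      */
        for (int i = 1; i < n - 1; i++)
            next[i] = 0.5 * (grid[i - 1] + grid[i + 1]);

        /* Only the two boundary cells need the remote halo values.        */
        MPI_Waitall(4, reqs, MPI_STATUSES_IGNORE);
        next[0]     = 0.5 * (halo_l + grid[1]);
        next[n - 1] = 0.5 * (grid[n - 2] + halo_r);
    }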

  35. #8: Location, location, location! • Shameless plug: Hardware Locality Toolkit (hwloc) • http://www.open-mpi.org/projects/hwloc/ • Provides local topology via CLI and a C API
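
  A minimal sketch of the hwloc C API, assuming hwloc is installed (exact headers and object types vary a little between hwloc versions): discover the local topology and count cores and hardware threads. The lstopo command-line tool shows the same topology graphically.

    /* Build with something like: cc hw.c $(pkg-config --cflags --libs hwloc) */
    #include <hwloc.h>
    #include <stdio.h>

    int main(void) {
        hwloc_topology_t topo;
        hwloc_topology_init(&topo);        /* allocate a topology object    */
        hwloc_topology_load(topo);         /* discover the local machine    */

        int cores = hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_CORE);
        int pus   = hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_PU);
        printf("cores: %d, hardware threads: %d\n", cores, pus);

        hwloc_topology_destroy(topo);
        return 0;
    }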

  36. #9: Do not use MPI one-sided • Some applications (greatly) benefit from a one-sided communication model • MPI-2’s one-sided is terrible • It is being revamped in MPI-3 • Avoid MPI one-sided for now

  37. #10: Talk to your vendors • Server, network, MPI, … • Tell us what’s good • Tell us what’s bad • Then tell us what’s good again • We don’t know you’re having a problem unless you tell us! • Particularly true for open source software

  38. #11: Design for parallel • Retro-fitting parallelism can be Bad • Can make spaghetti code • Can lead to questionable speedups • Think about parallelism from the very beginning • Not everything parallelizes well • Might need a lot of synchronization • Might need to talk to all peers every iteration • Try solving the problem a different way • …or buy a Cray

  39. But… what about portability? • I didn’t really mention “portability” at all • Here’s the secret: • If you do what I said, your code will be as portable as possible • Write modular code for the non-portable sections

  40. Moral(s) of the story • It’s a complex world • But MPI (and parallelism) can be your friend • Focus on the problem you’re trying to solve • Design for parallelism • Design for change over time

  41. Additional resources • MPI Forum web site • The only site for the official MPI standards • http://www.mpi-forum.org/ • NCSA MPI basic and intermediate tutorials • Requires a free account • http://ci-tutor.ncsa.uiuc.edu/ • “MPI Mechanic” magazine columns • http://cw.squyres.com/

  42. Questions?
