
MPI in uClinux on Microblaze


Presentation Transcript


  1. MPI in uClinux on Microblaze Neelima Balakrishnan Khang Tran 05/01/2006

  2. Project Proposal • Port uClinux to work on Microblaze • Add MPI implementation on top of uClinux • Configure NAS parallel benchmarks and port them to work on RAMP

  3. What is Microblaze? • Soft-core processor, implemented using general logic primitives • 32-bit Harvard RISC architecture • Supported in the Xilinx Spartan and Virtex series of FPGAs • The customizability of the core makes the port challenging while opening up many possibilities for kernel configurations

  4. Components • uClinux - kernel v2.4 • MPICH2 - portable, high-performance implementation of the entire MPI-2 standard • Communication via different channels - sockets, shared memory, etc. • The MPI port for Microblaze communicates over FSL (the Fast Simplex Link)

  5. Components (contd.) • NASPB v2.4 - MPI-based source code implementations written and distributed by NAS (the NASA Advanced Supercomputing division) • 5 kernels • 3 pseudo-applications

  6. Porting uClinux to Microblaze • Done by Dr. John Williams - Embedded Systems group, University of Queensland in Brisbane, Australia • Part of their reconfigurable computing research program; the work is still ongoing • http://www.itee.uq.edu.au/~jwilliams/mblaze-uclinux

  7. Challenges in porting uClinux to Microblaze • uClinux is a Linux derivative for microprocessors that lack a memory management unit (MMU) • No memory protection • No virtual memory • For most user applications, the fork() system call is unavailable; vfork() is used instead (see the sketch below) • The malloc() implementation needs to be modified
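
A minimal sketch of the resulting process-creation pattern on MMU-less uClinux (the /bin/worker path is hypothetical): vfork() lends the parent's address space to the child and suspends the parent, so the child may only exec or exit immediately.

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/wait.h>

    int main(void)
    {
        /* No fork() without an MMU: vfork() shares the parent's address
         * space and suspends the parent until the child execs or exits. */
        pid_t pid = vfork();
        if (pid == 0) {
            /* Child: must call exec*() or _exit() right away; it may not
             * return or touch the parent's data. */
            execl("/bin/worker", "worker", (char *)NULL); /* hypothetical program */
            _exit(1); /* reached only if exec fails */
        } else if (pid > 0) {
            int status;
            waitpid(pid, &status, 0); /* parent resumes after the child execs */
        } else {
            perror("vfork");
            return 1;
        }
        return 0;
    }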

  8. MPI implementation • MPI – Message Passing Interface • Standard API used to create parallel applications • Designed primarily to support the SPMD (single program, multiple data) model (a minimal example follows) • Advantages over older message-passing libraries • Portability • Speed, since each implementation can be optimized for the hardware it runs on
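
For reference, a minimal SPMD program against the standard MPI C API (generic MPI, not specific to the Microblaze port): every node runs the same binary and branches on its rank.

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, size;

        MPI_Init(&argc, &argv);               /* start the MPI runtime */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* this process's id */
        MPI_Comm_size(MPI_COMM_WORLD, &size); /* total process count */

        /* Same program on every processor; behavior differs by rank. */
        printf("Hello from rank %d of %d\n", rank, size);

        MPI_Finalize();
        return 0;
    }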

  9. Interactions between Application and MPI [Diagram: the initiating application talks to its local MPI interface and MPI process manager, which communicate over a communication channel with the MPI process managers and interfaces of the applications on other processors]

  10. NAS parallel benchmarks • Set of 8 programs intended to aid in evaluating the performance of parallel supercomputers • Derived from computational fluid dynamics (CFD) applications • 5 kernels • 3 pseudo-applications • Used the NPB 2.4 version – an MPI-based source code implementation

  11. Phases • Studied uClinux and found the initial port done for Microblaze • Latest kernel (2.4) and distribution from uClinux.org • Compiled successfully for the Microblaze architecture • Chose MPICH2 from among the many MPI implementations • Investigated the MPICH2 implementation available from Argonne National Laboratory • Encountered challenges in porting MPI onto uClinux

  12. Challenges in porting MPI to uClinux • Use of fork and a complex state machine • The default process manager for Unix platforms is MPD, which is written in Python and uses a wrapper to call fork • A simple fork->vfork substitution is not possible, as the call sits deep inside other functions and would require a lot of stack unwinding (see the sketch below) • Alternate approaches • Port SMPD, which is written in C • This would still involve a complex state machine and stack unwinding after the fork • Use pthreads • Might involve a lot of rework, as the current implementation does not use pthreads • Would need to ensure thread safety
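
A sketch of the constraint, with hypothetical function names: a vfork() child runs on the parent's (suspended) stack, so returning up through the callers of the function that vforked corrupts frames the parent still needs. This is why the substitution only works when exec or _exit follows immediately.

    #include <unistd.h>
    #include <sys/types.h>

    /* Hypothetical stand-in for a process-manager helper that calls
     * fork() deep inside a call chain. */
    static int spawn_worker(void)
    {
        pid_t pid = vfork(); /* naive fork() -> vfork() substitution */
        if (pid == 0) {
            /* BROKEN: the child shares the parent's stack, so returning
             * here unwinds frames the suspended parent still needs.
             * A vfork() child may only call exec*() or _exit(). */
            return 0; /* undefined behavior after vfork() */
        }
        return (int)pid;
    }

    int main(void)
    {
        if (spawn_worker() == 0) {
            _exit(0); /* child: by now the parent's stack is already damaged */
        }
        return 0;
    }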

  13. NAS Parallel Benchmark • Used NAS PB v2.4 • Compiled and executed it on a desktop and on the Millennium cluster • Obtained information about • MOPS • Type of operation • Execution time • Number of nodes involved • Number of processes and iterations

  14. NAS PB simulation result (Millennium cluster, Class A)

  15. Simulation result (cont.)

  16. Estimated statistics for the floating point group • The 4 benchmarks that use floating-point ops heavily are BT, CG, MG, and SP • All four have very few floating-point comparison ops • BT (Block Tridiagonal): all fp ops are add, subtract, and multiply; about 5% of all ops are divides • CG (Conjugate Gradient): the highest share of sqrt ops at about 30%; add/multiply is about 60% and divide about 10% • MG (Multigrid): about 5% sqrt and 20% divide; the rest are add, subtract, and multiply • SP (Scalar Pentadiagonal): almost all ops are adds, with about 10% divides

  17. Floating Point Operation Frequency

  18. Most frequently used MPI functions in NASPB v2.4

  19. Observations about NASPB • In the NASPB suite, 6 out of the 8 benchmarks are predictive of parallel performance • EP – little to negligible communication between processors • IS – high communication overhead

  20. Project status • Compiled uClinux and put it on Microblaze • Worked on the MPI port, but it is not yet complete • Compiled and executed NASPB on a desktop and on Millennium (which currently uses 8 computing nodes)
