
Re-engineering Applications for Data-Parallel Hardware


Presentation Transcript


  1. Re-engineering Applications for Data-Parallel Hardware: Opportunities and Challenges. Santonu Sarkar, SETLabs, Infosys. The Chemistry of Concurrent and Distributed Programming, Mysore Park Workshop, Mysore, India, Feb 16-19, 2011

  2. Distributed Computing: Paradigm Shift in Computing

  3. GPGPU as a Computing Platform
  • General-Purpose computing on Graphics Processing Units: using GPUs for computation-intensive, non-graphical applications
  • Why GPU computing? GPUs are faster, programmable, easily available, and cheap
  • Change in computing paradigm: traditional super-scalar architectures have hit their limits for intensive workloads
  • Parallel computing is becoming commonplace, but it cannot be leveraged automatically

  4. Desktop “Super”computing (Linpack benchmark)
  • CPU server: 2x Intel Xeon X5550, 2.66 GHz, 48 GB memory, $7K, 0.55 KW
  • CPU-GPU server: 2x Tesla C2050 + 2x Intel Xeon X5550, 48 GB memory, $11K, 1 KW
  • 37 TFlop: Top 150 system
  http://www.vpac.org/files/GPU-Slides/01.tesla_introduction.pdf

  5. The new platform offers a HIGH cost-performance ratio and low power usage, BUT… programming for parallelism is not easy

  6. Why is Parallel Programming Difficult? For example, calculating the value of π:
  • Sequential approach (series summation): start with 1, add -1/3, add +1/5, …, add (-1)^n/(2n+1); then π = 4 × result
  • Parallel approach (Monte Carlo): generate a large number of random points (x, y) within (-1, +1); test, for each point independently, whether it falls within the unit circle; count the points inside; then π = 4 × (number within circle) / (total number of points)
  Parallel programming needs an entirely different way of thinking.
  D. Patterson, “The Trouble with Multi-Core”, IEEE Spectrum 2010
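
  As an illustration of the parallel approach, here is a minimal CUDA sketch of the Monte Carlo estimate (not from the slides: the kernel name, grid dimensions, and the xorshift pseudo-random generator are illustrative choices; sampling the quarter circle in [0,1) x [0,1) yields the same estimate, π ≈ 4 × hits / total):

    // Monte Carlo estimation of pi: every point test is independent, so each
    // GPU thread tests points_per_thread pseudo-random points and the hits
    // are accumulated with an atomic add.
    #include <cstdio>
    #include <cuda_runtime.h>

    __device__ unsigned int xorshift(unsigned int x) {
        // Simple per-thread pseudo-random generator (illustrative, not curand).
        x ^= x << 13; x ^= x >> 17; x ^= x << 5;
        return x;
    }

    __global__ void count_hits(unsigned long long *hits, int points_per_thread) {
        unsigned int seed = 1234567u + blockIdx.x * blockDim.x + threadIdx.x;
        unsigned long long local = 0;
        for (int k = 0; k < points_per_thread; ++k) {
            seed = xorshift(seed);
            float x = seed / 4294967296.0f;      // map to [0, 1)
            seed = xorshift(seed);
            float y = seed / 4294967296.0f;
            if (x * x + y * y <= 1.0f) ++local;  // inside the quarter circle
        }
        atomicAdd(hits, local);
    }

    int main() {
        const int blocks = 256, threads = 256, points_per_thread = 1024;
        unsigned long long *d_hits, h_hits = 0;
        cudaMalloc(&d_hits, sizeof(unsigned long long));
        cudaMemset(d_hits, 0, sizeof(unsigned long long));
        count_hits<<<blocks, threads>>>(d_hits, points_per_thread);
        cudaMemcpy(&h_hits, d_hits, sizeof(h_hits), cudaMemcpyDeviceToHost);
        double total = (double)blocks * threads * points_per_thread;
        printf("pi ~= %f\n", 4.0 * h_hits / total);
        cudaFree(d_hits);
        return 0;
    }

  The point of the example is the shift in thinking the slide describes: the series summation is inherently sequential, while every Monte Carlo sample can be evaluated by a different thread.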

  7. HPC – Crossing the Chasm
  • New infrastructure: more and more raw compute power (GPU / many-core / cloud)
  • Business/scientific computation: ever-increasing demand
  • New design challenges: architecture-aware design (GPU memory hierarchy, thread model), elastic infrastructure, data-driven computation (functional programming paradigm)
  • Software engineering support: design assistance, programming assistance, verification and validation, transformation and refactoring
  • Building a parallel algorithm is 5 to 10 times harder; existing applications are not meant for parallel infrastructure

  8. Focus Area

  9. Why is a Parallel Workbench Important? Challenges / research questions:
  • How do I refactor my application to exploit multiple cores on the CPU and GPU?
  • How do I simplify the design and implementation of parallel applications?
  • How do I optimally use the computing power (optimal usage of threads, memory, and clusters)?
  Expected benefits:
  • Faster to build, faster to refactor
  • Helps to hide architectural complexity
  • Better portability
  • Better code, better usage of hardware resources

  10. Source Code Parallelization Assistant

  11.

  12. Approach to Loop Analysis
  • Model the loop bounds as affine constraints (i.e., linear expressions plus constants); together they define a polytope. Example:
    for (i = 0; i < n; i++) {
        for (j = 0; j < i; j++) {
            S;
        }
    }
  • An integral polytope has an associated Ehrhart polynomial, which encodes the relationship between the volume of the polytope and the number of integer points it contains.
  • The integer points of the polytope are exactly the variable values for which the loop conditions are satisfied.
  • Use a polytope solver to approximate the total number of iterations.
  Barvinok, A. I. (2006). Computing the Ehrhart quasi-polynomial of a rational simplex. Math. Comp. 75, 1449–1466.
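
  Worked out by hand for the example loop nest above (this is the kind of closed-form count, parametric in n, that a polytope counter produces symbolically):

    % Iteration domain of the example loop as a parametric polytope in n:
    \[
      P(n) \;=\; \{\, (i, j) \in \mathbb{Z}^2 \;:\; 0 \le i \le n-1,\ 0 \le j \le i-1 \,\}
    \]
    % Counting its integer points gives the Ehrhart polynomial in n,
    % i.e. the number of times the statement S executes:
    \[
      |P(n)| \;=\; \sum_{i=0}^{n-1} i \;=\; \frac{n(n-1)}{2}
    \]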

  13. Volume Computation
  • Volume computation is performed by barvinok, an open-source polytope library.
  • Given a polytope represented by a set of affine inequalities, we can determine its volume by subdividing it into simplexes.
  • Simplexes are the generalization of the triangle to N dimensions; their volume can be computed easily using linear algebra.
  • The final result is obtained by summing the number of points inside all the simplexes.
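
  For reference, the linear-algebra step mentioned above is the standard determinant formula for the volume of a simplex (not shown explicitly on the slide):

    % Volume of an N-dimensional simplex with vertices v_0, ..., v_N in R^N:
    \[
      \mathrm{Vol}(v_0,\dots,v_N) \;=\; \frac{1}{N!}\,
      \bigl|\det\bigl(v_1 - v_0,\; v_2 - v_0,\; \dots,\; v_N - v_0\bigr)\bigr|
    \]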

  14. Example Code – dcraw.c

  15. Barvinok Equation

  16. Future Work
  • Enabling developers to supply domain-specific knowledge
  • Devising usable parameters
  • Use of source code annotations
  • Program slicing to enable quicker analysis
  • Loop iteration dependency analysis
  • Generation of GPU-specific code for the identified part
