
Using FPGAs to Supplement Ray-Tracing Computations on the Cray XD-1

Charles B. Cameron, Department of Electrical Engineering, United States Naval Academy, 105 Maryland Avenue, Stop 14B, Annapolis, Maryland 21402-5025.


Presentation Transcript


  1. Using FPGAs to Supplement Ray-Tracing Computations on the Cray XD-1 • Charles B. Cameron, Department of Electrical Engineering, United States Naval Academy, 105 Maryland Avenue, Stop 14B, Annapolis, Maryland 21402-5025 • Research supported by: • NASA Goddard Space Flight Center (Code 586) • NRL Applied Optics Branch (Code 5630) • DoD High Performance Computing Modernization Program at NRL (Code 5593) • United States Naval Academy • Xilinx, Inc.

  2. Topics • Ray tracing • Conventional parallel processing • Modulo scheduling • Coordination of sequential and parallel processing • Expected Performance

  3. Ray tracing • MODIS • Moderate-resolution Imaging Spectroradiometer • The Intersection Problem • Finding the Perpendicular • Refraction • Reflection

  4. MODIS Optical System (Moderate-resolution Imaging Spectroradiometer)

  5. MODIS Optical System • 485 pinholes • 400 rays per pinhole • 241 × 121 rays reflected from the diffuser • 5.66 × 10^9 rays
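The total on this slide appears to be the product of the four counts listed. A short check of that arithmetic, as a sketch (the variable names are illustrative and do not come from the presentation):

```python
# Ray count for the MODIS model, using the figures quoted on the slide.
pinholes = 485
rays_per_pinhole = 400
diffuser_rays = 241 * 121  # rays reflected from the diffuser

# Assumption: the slide's total is the product of the counts above.
total_rays = pinholes * rays_per_pinhole * diffuser_rays
print(f"{total_rays:,} rays, about {total_rays / 1e9:.2f} x 10^9")
```

The product rounds to the 5.66 × 10^9 rays quoted on the slide.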

  6. Ray Directed to a Surface • MODIS • Moderate-resolution Imaging Spectroradiometer • The Intersection Problem • Finding the Perpendicular • Refraction • Reflection • Coordinate Transformation

  7. Calculate the Intercept Point

  8. Find the Normal

  9. Find the Refracted Ray

  10. Find the Reflected Ray

  11. Coordinate Transformation (Hard to visualize this!)
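The transcript does not preserve the equations behind the refraction and reflection slides. As a hedged illustration only, the sketch below uses the standard vector forms of the law of reflection and Snell's law; the function names, the tuple representation, and the convention that the unit normal faces the incoming ray are assumptions, not the author's formulation.

```python
import math

def dot(a, b):
    """Dot product of two 3-vectors given as tuples."""
    return sum(x * y for x, y in zip(a, b))

def reflect(d, n):
    """Reflect unit direction d about unit normal n: r = d - 2 (d.n) n."""
    k = 2.0 * dot(d, n)
    return tuple(di - k * ni for di, ni in zip(d, n))

def refract(d, n, n1, n2):
    """Refract unit direction d passing from index n1 into index n2.

    n is the unit normal facing the incoming ray (d.n < 0).
    Returns None on total internal reflection.
    """
    eta = n1 / n2
    cos_i = -dot(d, n)
    sin2_t = eta * eta * (1.0 - cos_i * cos_i)
    if sin2_t > 1.0:
        return None  # no transmitted ray
    cos_t = math.sqrt(1.0 - sin2_t)
    k = eta * cos_i - cos_t
    return tuple(eta * di + k * ni for di, ni in zip(d, n))

# Example: a ray descending at 45 degrees onto a horizontal surface.
s = math.sqrt(0.5)
incoming = (s, 0.0, -s)
normal = (0.0, 0.0, 1.0)
reflected = reflect(incoming, normal)
refracted = refract(incoming, normal, 1.0, 1.5)  # e.g. air into glass
```

Here the reflected ray leaves at 45 degrees on the other side of the normal, and the refracted ray satisfies n1 sin(θi) = n2 sin(θt).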

  12. Topics • Ray tracing • Conventional parallel processing • Modulo scheduling • Coordination of sequential and parallel processing • Expected Performance

  13. Parallelism

  14. Performance (5.66 × 10^9 rays) * 99.998 % 5,857 % * Rate based on a linear regression of results obtained using varying numbers of processors.

  15. Performance (5.66 × 10^9 rays)

  16. Efficiency

  17. Topics • Ray tracing • Conventional parallel processing • Modulo scheduling • Coordination of sequential and parallel processing • Expected Performance

  18. Operations Required as a Function of Surface, Aperture, and Interaction Types • Not too many of these • Lots of these

  19. Quadratic Equation Latency • Critical Path (Data-Flow Limit): 88 cycles

  20. Modulo Scheduling: One Multiplier

  21. Modulo Scheduling: One Multiplier

  22. Modulo Scheduling: One Multiplier

  23. Modulo Scheduling: One Multiplier

  24. Modulo Scheduling: One Multiplier

  25. Modulo Scheduling: One Multiplier

  26. Modulo Scheduling: One Multiplier

  27. Modulo Scheduling: One Multiplier • Equal to the Data-Flow Limit

  28. Modulo Scheduling: Filling the Pipeline • One collective computation

  29. Modulo Scheduling: Filling the Pipeline

  30. Modulo Scheduling: Filling the Pipeline • Multipliers are 100 % utilized • No schedule conflicts

  31. Modulo Scheduling: Two Multipliers • Two multipliers with two multiplications each

  32. Modulo Scheduling: Two Multipliers • One adder with two additions • Two cycles • Maximum efficiency

  33. Modulo Scheduling: Two Multipliers • Improved efficiency: Up from 25 %

  34. Modulo Scheduling: Two Multipliers

  35. Modulo Scheduling: Two Multipliers

  36. Modulo Scheduling: Two Multipliers • Less than the Data-Flow Limit

  37. Modulo Scheduling: Two Multipliers • Less than the Data-Flow Limit, but double the throughput.
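The scheduling diagrams are not in the transcript, so the following is only a sketch of the resource bound that modulo scheduling works against. It assumes, from the wording of slides 31 and 32, an example with four multiplications and two additions, and it assumes fully pipelined units that accept one operation per cycle. The function names are hypothetical. A real modulo scheduler must also respect data dependences and avoid slot conflicts, which this bound ignores.

```python
import math

def resource_min_ii(op_counts, unit_counts):
    """Smallest initiation interval (cycles between successive starts)
    permitted by resources alone: ceil(ops / units) for each unit type."""
    return max(math.ceil(op_counts[k] / unit_counts[k]) for k in op_counts)

def utilization(op_counts, unit_counts, ii):
    """Fraction of issue slots each unit type uses at initiation interval ii."""
    return {k: op_counts[k] / (unit_counts[k] * ii) for k in op_counts}

# Assumed operation mix, read off slides 31 and 32.
ops = {"mul": 4, "add": 2}

one_multiplier = {"mul": 1, "add": 1}
two_multipliers = {"mul": 2, "add": 1}

ii_one = resource_min_ii(ops, one_multiplier)
ii_two = resource_min_ii(ops, two_multipliers)

print(ii_one, utilization(ops, one_multiplier, ii_one))
print(ii_two, utilization(ops, two_multipliers, ii_two))
```

Under these assumptions the second multiplier halves the initiation interval from four cycles to two, which doubles throughput and leaves both the multipliers and the adder fully occupied.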

  38. Topics • Ray tracing • Conventional parallel processing • Modulo scheduling • Coordination of sequential and parallel processing • Expected Performance

  39. Cray XD-1 • MPI (Message Passing Interface) • Master node • Reads file • Distributes file • Collates results

  40. One Node of the Cray XD-1 • OpenMP (Open Multi-Processing) • 144 of 220 nodes have a Xilinx Virtex II Pro FPGA • Opteron processors • Sequential program • Depth first • FPGA • Pipelined hardware • Breadth first
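One plausible reading of "depth first" versus "breadth first" here is the order in which the ray/surface grid is visited: a sequential program follows one ray through every surface before starting the next, while pipelined hardware pushes a whole batch of rays through one surface at a time. The sketch below illustrates that reading only; the function names and the toy surface functions are hypothetical stand-ins, not the author's code.

```python
def trace_depth_first(rays, surfaces):
    """Sequential style: finish one ray across all surfaces, then the next."""
    results = []
    for ray in rays:
        for surface in surfaces:
            ray = surface(ray)
        results.append(ray)
    return results

def trace_breadth_first(rays, surfaces):
    """Pipelined style: send the whole batch through one surface at a time."""
    batch = list(rays)
    for surface in surfaces:
        batch = [surface(ray) for ray in batch]
    return batch

# Hypothetical stand-ins: each "surface" maps a ray state to a new state.
surfaces = [lambda r: r + 1.0, lambda r: r * 2.0, lambda r: r - 3.0]
rays = [0.0, 1.0, 2.5]

depth_result = trace_depth_first(rays, surfaces)
breadth_result = trace_breadth_first(rays, surfaces)
```

Both orders produce the same results. The breadth-first order keeps a stream of independent rays available at each stage, which is what a deep pipeline needs to stay full.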

  41. Topics • Ray tracing • Conventional parallel processing • Modulo scheduling • Coordination of sequential and parallel processing • Expected Performance

  42. Performance

  43. Performance

  44. Performance

  45. Performance

  46. Summary • Modulo scheduling produces 100 % efficiency of critical resources. • Sequential processors get a boost from supplemental FPGA processing. • Deep pipelines are efficient only if filled much of the time. • FPGAs beat ASICs only if they can take advantage of special problem knowledge. • Opteron uses 55 W. • Virtex II Pro FPGA uses 4 W to 45 W.
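The point about deep pipelines can be made concrete with the usual fill-time estimate: a pipeline of depth D that receives N back-to-back inputs needs D + N - 1 cycles to deliver N results. The sketch below borrows the 88-cycle critical-path latency from slide 19 as the depth, which is an assumption made purely for illustration; the function name is hypothetical.

```python
def pipeline_efficiency(depth, items):
    """Fraction of cycles that deliver a result when `items` inputs
    enter a `depth`-stage pipeline on consecutive cycles."""
    total_cycles = depth + items - 1
    return items / total_cycles

DEPTH = 88  # assumed: the 88-cycle latency from slide 19 used as pipeline depth

for n in (1, 88, 10_000):
    print(n, round(pipeline_efficiency(DEPTH, n), 4))
```

A single ray keeps the pipeline busy for only about 1 % of its cycles, while a long uninterrupted stream approaches full efficiency.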

  47. Equations • Intersection of a Ray with a Plane • Intersection of a Ray with a Sphere • Intersection of a Ray with a Conicoid • Finding the Perpendicular • Interaction of a Ray with an Optical Surface • Coordinate Transformations

  48. Intersection of a Ray with a Plane • Figure labels: point in the plane, initial direction, final point, initial point, normal to the plane • List of equations
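The slide's list of equations did not survive extraction. As a hedged stand-in, the sketch below uses the textbook ray-plane intersection with the quantities named in the figure labels: initial point p0, initial direction d, a point q in the plane, and the plane normal n. The function and parameter names are assumptions.

```python
def dot(a, b):
    """Dot product of two 3-vectors given as tuples."""
    return sum(x * y for x, y in zip(a, b))

def intersect_plane(p0, d, q, n, eps=1e-12):
    """Final point where the ray p0 + t*d (t >= 0) meets the plane through q
    with normal n, or None if the ray is parallel or points away."""
    denom = dot(n, d)
    if abs(denom) < eps:
        return None  # ray runs parallel to the plane
    offset = tuple(qi - pi for qi, pi in zip(q, p0))
    t = dot(n, offset) / denom
    if t < 0.0:
        return None  # plane lies behind the initial point
    return tuple(pi + t * di for pi, di in zip(p0, d))

# Example: a ray travelling straight down onto the plane z = 0.
hit = intersect_plane((0.0, 0.0, 5.0), (0.0, 0.0, -1.0),
                      (0.0, 0.0, 0.0), (0.0, 0.0, 1.0))
```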

  49. Intersection of a Ray with a Sphere • Figure labels: initial direction, final point, initial point • List of equations
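The sphere case reduces to a quadratic in the ray parameter t. The slide's own equations are missing from the transcript, so this sketch uses the standard derivation from |p0 + t d - c|^2 = R^2. The names are assumptions, and returning the nearest forward root is also an assumption: an optical surface may need the other root, depending on which side of the sphere the ray should strike.

```python
import math

def dot(a, b):
    """Dot product of two 3-vectors given as tuples."""
    return sum(x * y for x, y in zip(a, b))

def intersect_sphere(p0, d, center, radius):
    """Nearest forward intersection of the ray p0 + t*d with a sphere,
    or None if the ray misses it."""
    oc = tuple(pi - ci for pi, ci in zip(p0, center))
    a = dot(d, d)
    b = 2.0 * dot(d, oc)
    c = dot(oc, oc) - radius * radius
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None  # no real roots: the ray misses the sphere
    root = math.sqrt(disc)
    for t in ((-b - root) / (2.0 * a), (-b + root) / (2.0 * a)):
        if t >= 0.0:
            return tuple(pi + t * di for pi, di in zip(p0, d))
    return None  # both intersections lie behind the initial point

# Example: a ray travelling down the z axis toward a unit sphere at the origin.
hit = intersect_sphere((0.0, 0.0, 5.0), (0.0, 0.0, -1.0), (0.0, 0.0, 0.0), 1.0)
miss = intersect_sphere((0.0, 3.0, 5.0), (0.0, 0.0, -1.0), (0.0, 0.0, 0.0), 1.0)
```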

  50. Intersection of a Ray with a Conicoid • Figure labels: final point, initial point, initial direction • List of equations
