Reformulating the WRF Model for Graphics Processors

By John Ciolek

Local-scale NWP on a $5K PC?

16th Meeting of the DMCC

Video Gaming Industry
  • Estimated size of the gaming industry
    • 2005: $31.3 Billion
    • 2006: $36.1 Billion
    • 2007: $42.8 Billion
  • Trend toward more realistic images
    • Requires more powerful rendering hardware
    • Created explosive growth in graphics processors

May 4, 2009

Graphics Cards
  • Meant to plug into a standard computer bus
    • Control rendering of pixels, voxels, facets, etc.
  • Controlled by the central processing unit (CPU)
  • Contain many processors
    • Graphics Processing Unit (GPU) (similar to CPU)
  • Stream processing
    • Input set of data (stream)
    • Kernel operates on the stream
      • Performs one or more operations
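
The stream-processing model above can be illustrated with a minimal Python sketch. It is not GPU code: the names `scale_and_offset` and `run_kernel` are hypothetical, and the loop runs sequentially where a GPU would apply the kernel to many stream elements at once.

```python
from typing import Callable, Iterable, List


def scale_and_offset(x: float) -> float:
    """Hypothetical kernel: performs two operations on one stream element."""
    return 2.0 * x + 1.0


def run_kernel(kernel: Callable[[float], float], stream: Iterable[float]) -> List[float]:
    """Apply the same kernel independently to every element of the stream.

    On a GPU each element would typically be handled by its own thread;
    here the elements are processed one after another for illustration.
    """
    return [kernel(x) for x in stream]


result = run_kernel(scale_and_offset, [0.0, 1.0, 2.0, 3.0])
print(result)
```

Because the kernel never looks at neighboring elements, the order of evaluation does not matter, which is what makes this pattern easy to spread across many processors.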

GPUs
  • Maximize number of processors
  • Minimize cache and control structures

Memory Access
  • Relies on localized memory
  • Slower access to main system memory
  • Note how threads are organized:
    • Grids
      • Blocks
        • Threads
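
One way to picture the grid, block, and thread hierarchy is the index arithmetic used to assign data elements to threads. The sketch below assumes a one-dimensional, CUDA-style launch in which every block holds the same number of threads; the function names and the sizes are hypothetical, and plain Python loops stand in for the hardware scheduler.

```python
def global_index(block_idx: int, block_dim: int, thread_idx: int) -> int:
    """Data index for one thread, following the CUDA-style convention
    block index * threads per block + thread index within the block."""
    return block_idx * block_dim + thread_idx


def blocks_needed(n_elements: int, block_dim: int) -> int:
    """Smallest number of blocks whose threads cover all elements."""
    return (n_elements + block_dim - 1) // block_dim


n_elements = 10          # hypothetical problem size
threads_per_block = 4    # hypothetical block size
n_blocks = blocks_needed(n_elements, threads_per_block)

covered = []
for b in range(n_blocks):                 # the grid: a collection of blocks
    for t in range(threads_per_block):    # a block: a collection of threads
        i = global_index(b, threads_per_block, t)
        if i < n_elements:                # guard: the last block may overshoot
            covered.append(i)

print(n_blocks, covered)
```

The guard matters whenever the problem size is not a multiple of the block size: the last block launches threads that have no element to work on.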

Programmer Accessibility
  • Vendors created Application Programming Interfaces (APIs)
    • Programmers can access GPU’s capabilities
  • Graphics card programming languages
    • Vendor specific
      • CUDA, Brook, Cell
    • Generic
      • OpenCL
  • GPUs gained more programmer functionality
    • BLAS, FFT, PhysX

Price/Performance Explosion
  • Chart comparing price/performance in TeraFLOPS/$Million for:
    • NVIDIA Tesla (960 cores)
    • PlayStation 3
    • Cluster of 8 PS3s
    • Earth Simulator (5,120 procs)
    • Blue Gene/L (65,536 procs)
    • Roadrunner (19,440 procs)
    • Cray 1 (1 proc)
    • ASCI Red (4,510 procs)
    • Cray Y-MP (8 procs)

Serious Experimenters
  • 23.2 TeraFLOPS!
    • Running Folding@home
  • 6,240 streaming processors
  • 13 GTX 295 graphics cards
  • 14 CPU cores
  • Cost ~ $15,000
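
The figures on this slide can be turned into a price/performance number with a few lines of arithmetic. The sketch below uses only the values quoted above; the cost is approximate in the source, so the result is a rough estimate, and the variable names are illustrative.

```python
# Values quoted on the slide.
teraflops = 23.2
streaming_processors = 6240
graphics_cards = 13
cost_dollars = 15_000        # stated as approximate ("~") in the slide

# Streaming processors per card implied by the two counts.
processors_per_card = streaming_processors / graphics_cards

# Price/performance in TeraFLOPS per million dollars.
teraflops_per_million = teraflops / (cost_dollars / 1_000_000)

print(f"{processors_per_card:.0f} streaming processors per card")
print(f"about {teraflops_per_million:.0f} TeraFLOPS/$Million")
```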

Serious Science
  • Astrophysics
  • Electrodynamics
  • Life sciences
  • Nanotechnology simulations
  • Computational fluid dynamics
  • Finance
  • Chemistry
  • Molecular dynamics
  • Etc.

The WRF Connection
  • John Michalakes (NCAR)
    • Formulating & optimizing WRF
    • Group working on reformulating WRF for GPUs
      • Mostly for CUDA on NVIDIA cards
  • Claim: “Most recent performance improvements came from CPU speed increases”
    • No recoding was required
    • This will not continue to be the case

What’s the Catch?
  • Need to identify segments of code that can be reformulated for stream processing
    • Recode those segments
    • Recompile & link (with optimization switches)
  • Must manage memory access
  • Machine specific
    • Need to use limited instruction set
    • CUDA allows upward portability on NVIDIA devices

WRF Reformulation Process
  • Identify target WRF packages
  • Benchmark performance of current coding
  • Identify quick improvement actions
    • Using CUDA compiler switches
    • CUDA intrinsic functions
    • FORTRAN to C conversion
  • Rewrite code
    • Rethink how to implement algorithms
    • Will take the most time
  • Revalidate
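
The benchmark and revalidate steps above might look something like the following sketch. Both kernels are hypothetical stand-ins, not WRF code: they compute the same quantity in two algebraically equivalent ways, so the rewritten version can be timed and then checked against the original within a floating-point tolerance. The tolerance values are assumptions; an appropriate tolerance depends on the physics being validated.

```python
import math
import time


def reference_kernel(values):
    """Hypothetical stand-in for the original code path."""
    return [math.exp(-v) * v for v in values]


def rewritten_kernel(values):
    """Hypothetical stand-in for the reformulated code path."""
    return [v / math.exp(v) for v in values]


def timed(func, values):
    """Run func once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(values)
    return result, time.perf_counter() - start


def validate(expected, actual, rel_tol=1e-9, abs_tol=1e-12):
    """True if both outputs have the same length and agree within tolerance."""
    return len(expected) == len(actual) and all(
        math.isclose(e, a, rel_tol=rel_tol, abs_tol=abs_tol)
        for e, a in zip(expected, actual)
    )


data = [0.1 * i for i in range(1000)]
ref, t_ref = timed(reference_kernel, data)
new, t_new = timed(rewritten_kernel, data)
ok = validate(ref, new)

print(f"reference: {t_ref:.6f} s, rewritten: {t_new:.6f} s, matches: {ok}")
```

A tolerance-based comparison is used instead of exact equality because a rewritten kernel rarely reproduces the original bit for bit.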

Early Successes
  • Early work on microphysics kernel
    • 0.4% of code
    • 25% of elapsed time
  • Results:
    • 5× to 20× speedup for this kernel
    • Translates to 1.25× to 1.3× overall improvement
      • Limited by Amdahl’s Law
    • Based on simple rewrite
      • Did not attempt CUDA optimizations
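
The overall figures above follow from Amdahl's Law applied to a kernel that takes 25% of elapsed time. The short sketch below reproduces the arithmetic; the function name is illustrative, and the only inputs are the numbers from this slide.

```python
def amdahl_speedup(fraction: float, kernel_speedup: float) -> float:
    """Overall speedup when `fraction` of the run time is accelerated
    by `kernel_speedup` and the rest is left unchanged."""
    return 1.0 / ((1.0 - fraction) + fraction / kernel_speedup)


fraction = 0.25                           # microphysics share of elapsed time
low = amdahl_speedup(fraction, 5.0)       # kernel runs 5x faster
high = amdahl_speedup(fraction, 20.0)     # kernel runs 20x faster
limit = 1.0 / (1.0 - fraction)            # kernel cost reduced to zero

print(f"5x kernel  -> {low:.2f}x overall")
print(f"20x kernel -> {high:.2f}x overall")
print(f"upper bound -> {limit:.2f}x overall")
```

Even an infinitely fast microphysics kernel would cap the overall gain at about 1.33×, which is why further improvement depends on moving more of the model onto the GPU.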

Microphysics Kernel Improvements
  • Compiler switch: -use_fast_math
  • Eliminated temporary array storage
  • Graph is based on recent results (March 2009)
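
The slides do not show the code change behind "eliminated temporary array storage", so the following is only a generic illustration of the idea. It assumes a two-step calculation whose intermediate result can live in a local scalar instead of a full array; the function names and the arithmetic are placeholders, not WRF microphysics.

```python
def with_temporary(values):
    """Two passes: the intermediate result is stored in a full-size array."""
    tmp = [v * v for v in values]       # temporary array, one entry per element
    return [t + 1.0 for t in tmp]


def fused(values):
    """One pass: the intermediate result is a scalar local to each element."""
    out = []
    for v in values:
        t = v * v                       # scalar temporary, no array needed
        out.append(t + 1.0)
    return out


sample = [1.0, 2.0, 3.0]
print(with_temporary(sample), fused(sample))
```

On a GPU, the fused form would let each thread keep its intermediate value in fast local storage instead of writing it out to memory and reading it back.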

Other Key Findings
  • Need to:
    • Reduce transfers between memories
    • Maximize number of threads actively running
    • Enhance fine-grained parallelism
      • Supports “strong-scaling”
        • N times more threads ~ N times better performance
    • Explore hardware-specific optimization
  • Work is continuing on WRF rewrite
    • Next WRF release will have GPU switch
    • Need additional help from community
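
The strong-scaling goal above (N times more threads giving roughly N times better performance) can be checked with two simple ratios. In this sketch the timings are made-up numbers chosen for illustration, not WRF measurements, and the function name is hypothetical.

```python
def strong_scaling(t_base: float, t_scaled: float, n: int):
    """Speedup and parallel efficiency for a fixed-size problem.

    t_base   : run time with the baseline thread count
    t_scaled : run time with n times as many threads
    """
    speedup = t_base / t_scaled
    efficiency = speedup / n     # 1.0 would be ideal strong scaling
    return speedup, efficiency


# Hypothetical timings: 120 s at baseline, 16 s with 8 times the threads.
speedup, efficiency = strong_scaling(120.0, 16.0, 8)
print(f"speedup {speedup:.2f}x, efficiency {efficiency:.1%}")
```

An efficiency well below 1.0 usually points back to the other items in the list, such as memory transfers or threads sitting idle.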

Target WRF Kernels
  • Single Moment 5 Cloud Microphysics
  • 5th Order Positive Definite Tracer Advection
  • KPP-generated Chemical-kinetics Solver
  • Long-wave Radiation Physics
  • Short-wave Radiation Physics

Quote:
  • “I wouldn’t recommend groups go out and buy GPU clusters just yet (to run WRF), but maybe by the end of the year…”
    • John Michalakes

The Beginning…

John Ciolek

jciolek@alphatrac.com

http://www.mmm.ucar.edu/wrf/WG2/GPU/

http://www.nvidia.com/page/home.html
