Modified from a survey of general purpose computation on graphics hardware
Download
1 / 41

Modified from: - PowerPoint PPT Presentation


  • 188 Views
  • Updated On :

Modified from: A Survey of General-Purpose Computation on Graphics Hardware. John Owens University of California, Davis. David Luebke University of Virginia. with Naga Govindaraju, Mark Harris, Jens Kr ü ger, Aaron Lefohn, Tim Purcell. Motivation: The Potential of GPGPU. In short:

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Modified from:' - ivanbritt


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
Modified from a survey of general purpose computation on graphics hardware l.jpg

Modified from:A Survey of General-Purpose Computation on Graphics Hardware

John Owens

University of California, Davis

David Luebke

University of Virginia

with Naga Govindaraju, Mark Harris, Jens Krüger, Aaron Lefohn, Tim Purcell


Motivation the potential of gpgpu l.jpg
Motivation: The Potential of GPGPU

  • In short:

    • The power and flexibility of GPUs makes them an attractive platform for general-purpose computation

    • Example applications range from in-game physics simulation to conventional computational science

    • Goal: make the inexpensive power of the GPU available to developers as a sort of computational coprocessor


Problems difficult to use l.jpg
Problems: Difficult To Use

  • GPUs designed for & driven by video games

    • Programming model unusual

    • Programming idioms tied to computer graphics

    • Programming environment tightly constrained

  • Underlying architectures are:

    • Inherently parallel

    • Rapidly evolving (even in basic feature set!)

    • Largely secret

  • Can’t simply “port” CPU code!


Star goals l.jpg
STAR Goals

  • Detailed & useful survey of general-purpose computing on graphics hardware

    • Hardware and software developments behind GPGPU

    • Building blocks: techniques for mapping general-purpose computation to the GPU

    • Applications: important applications of GPGPU

    • A comprehensive GPGPU bibliography


Nvidia geforce 6800 3d pipeline l.jpg
NVIDIA GeForce 6800 3D Pipeline

Vertex

Triangle Setup

Z-Cull

Shader Instruction Dispatch

Fragment

L2 Tex

Fragment Crossbar

Composite

Memory

Partition

Memory

Partition

Memory

Partition

Memory

Partition

Courtesy Nick Triantos, NVIDIA


Programming a gpu for graphics l.jpg

  • Each fragment is shaded w/ SIMD program

  • Shading can use values from texture memory

  • Image can be used as texture on future passes

Programming a GPU for Graphics


Programming a gpu for gp programs l.jpg

  • Run a SIMD kernel over each fragment

  • “Gather” is permitted from texture memory

  • Resulting buffer can be treated as texture on next pass

Programming a GPU for GP Programs


Feedback l.jpg
Feedback

  • Each algorithm step depend on the results of previous steps

  • Each time step depends on the results of the previous time step


Cpu gpu analogies l.jpg
CPU-GPU Analogies

. . .

Grid[i][j]= x; . . .

Array Write = Render to Texture

CPU

GPU


Cpu gpu analogies10 l.jpg
CPU-GPU Analogies

CPU GPU

Stream / Data Array = Texture

Memory Read = Texture Sample


Kernels l.jpg
Kernels

Kernel / loop body / algorithm step = Fragment Program

CPU

GPU


Scatter vs gather l.jpg
Scatter vs. Gather

  • Grid communication

    • Grid cells share information

  • Gather

    • Indirect read from memory ( x = a[i])

    • Naturally maps to a texture fetch

    • Used to access data structures and data streams

  • Scatter

    • Indirect write to memory (a[i] = x)

    • Difficult to emulate:

    • Usually done on CPU


Computational resources inventory l.jpg
Computational Resources Inventory

  • Programmable parallel processors

    • Vertex & Fragment pipelines

  • Rasterizer

    • Mostly useful for interpolating addresses (texture coordinates) and per-vertex constants

  • Texture unit

    • Read-only memory interface

  • Render to texture

    • Write-only memory interface


Vertex processor l.jpg
Vertex Processor

  • Fully programmable (SIMD / MIMD)

  • Processes 4-vectors (RGBA / XYZW)

  • Capable of scatter but not gather

    • Can change the location of current vertex

    • Cannot read info from other vertices

    • Can only read a small constant memory

  • Latest GPUs: Vertex Texture Fetch

    • Random access memory for vertices

    • Gather (But not from the vertex stream itself)


Fragment processor l.jpg
Fragment Processor

  • Fully programmable (SIMD)

  • Processes 4-component vectors (RGBA / XYZW)

  • Random access memory read (textures)

  • Capable of gather but not scatter

    • RAM read (texture fetch), but no RAM write

    • Output address fixed to a specific pixel

  • Typically more useful than vertex processor

    • More fragment pipelines than vertex pipelines

    • Direct output (fragment processor is at end of pipeline)



Gpgpu building blocks l.jpg
GPGPU Building Blocks

  • fundamental techniques & computational building blocks:

    • Flow control (a very fundamental building block)

    • Stream operations

    • Data structures

    • Differential equations & linear algebra

    • Data queries


Flow control l.jpg
Flow control

  • Surprising number of issues on GPUs

  • Main themes:

    • Avoid branching when possible

    • Move branching earlier in the pipeline when possible

    • Largely SIMD  coherent branching most efficient

  • Mechanisms:

    • Rasterized geometry

    • Z-cull

    • Occlusion query


Domain decomposition l.jpg
Domain Decomposition

  • Avoid branches where outcome is fixed

    • One region is always true, another false

    • Separate FPs for each region, no branches

  • Example: boundaries



Flat 3d textures21 l.jpg
Flat 3D Textures

  • Advantages

    • One texture update per operation

    • Better use of GPU parallelism

    • Non-power-of-two Textures

    • Quick simulation preview

  • Disadvantage

    • Must compute texture offsets


Staggered simulation l.jpg
Staggered Simulation

  • Non-interactive application:

    • Simulate as fast as possible

    • Frame rate suffers

20ms


Staggered simulation23 l.jpg
Staggered Simulation

  • Interactive frame rate!

    • Simulation still proceeds pretty fast

10

20ms


Z cull l.jpg
Z-Cull

  • In early pass, modify depth buffer

    • Write depth=0 for pixels that should not be modified by later passes

    • Write depth=1 for rest

  • Subsequent passes

    • Enable depth test (GL_LESS)

    • Draw full-screen quad at z=0.5

    • Only pixels with previous depth=1 will be processed

  • Can also use early stencil test

  • Note: Depth replace disables ZCull


Pre computation l.jpg
Pre-computation

  • Pre-compute anything that will not change every iteration!

  • Example: arbitrary boundaries

    • When user draws boundaries, compute texture containing boundary info for cells

    • Reuse that texture until boundaries modified

    • Combine with Z-cull for higher performance!


Stream operations l.jpg
Stream Operations

  • Several stream operations in GPGPU toolkit:

    • Map: apply a function to every element in a stream

    • Reduce: use a function to reduce a stream to a smaller stream (often 1 element)

    • Scatter/gather: indirect read and write

    • Filter: select a subset of elements in a stream

    • Sort: order elements in a stream

    • Search: find a given element, nearest neighbors, etc


Simple fire effect l.jpg
Simple Fire Effect

  • Blur and scroll upward

  • Trails of blur emerge from bright source ‘embers’ at the bottom

VA

VC

VB

VD


Cellular automata l.jpg
Cellular Automata

  • Great for generating noise and other animated patterns to use in blending

  • Game of Life in a Pixel Shader

    • Cell ‘state’ relative to the rules is computed at each texel

    • Dependent texture read

    • State accesses ‘rules’ table, which is a texture

  • Highly complex rules are easy!

  • The Rules

  • For a space that is 'populated':

    • Each cell with one or no neighbors dies,

    • as if by loneliness.

    • Each cell with four or more neighbors dies,

    • as if by overpopulation.

    • Each cell with two or three neighbors survives.

  • For a space that is 'empty' or 'unpopulated'

  • Each cell with three neighbors becomes populated


Lattice computations l.jpg
Lattice Computations

  • How far can we take them?

    • Anything we can describe with discrete PDE equations!

      • Discrete in space and time

    • Also other approximations


Approximate methods l.jpg
Approximate Methods

  • Several different approximations

    • Cellular Automata (CA)

    • Coupled Map Lattice (CML)

    • Lattice-Boltzmann Methods (LBM)


Coupled map lattice l.jpg
Coupled Map Lattice

  • Mapping:

    • Continuous state  lattice nodes

  • Coupling:

    • Nodes interact with each other to produce new state according to specified rules


Coupled map lattice32 l.jpg
Coupled Map Lattice

  • CML introduced by Kaneko (1980s)

    • Used CML to study spatio-temporal chaos

    • Others adapted CML to physical simulation:

      • Boiling [Yanagita 1992]

      • Convection [Yanagita 1993]

      • Clouds [Yanagita 1997; Miyazaki 2001]

      • Chemical reaction-diffusion [Kapral ‘93]

      • Saltation (sand ripples / dunes) [ Nishimori ‘93]

      • And more


Cml vs ca l.jpg
CML vs. CA

  • CML extends cellular automata (CA)


Cml vs ca34 l.jpg
CML vs. CA

  • Continuous state is more useful

    • Discrete: physical quantities difficult

      • Must filter over many nodes to get “real” values

    • Continuous: physical quantities easy

      • Real physical values at each node

      • Temperature, velocity, concentration, etc.


Rules l.jpg
Rules?

  • CML updated via simple, local rules

    • Simple: same rule applied at every cell (SIMD)

    • Local: cells updated according to some function of their neighbors’ state


Example buoyancy l.jpg
Example: Buoyancy

  • Used in temperature-based boiling simulation

  • At each cell:

    • If neighbors to left and right of cell are warmer, raise the cell’s temperature

    • If neighbors are cooler, lower its temperature


Cml operations l.jpg
CML Operations

  • Implement operations as building blocks for use in multiple simulations

    • Diffusion

    • Buoyancy (2 types)

    • Latent Heat

    • Advection

    • Viscosity / Pressure

    • Gray-Scott Chemical Reaction

    • Boundary Conditions

    • User interaction (drawing)

    • Transfer function (color gradient)


Anatomy of a cml operation l.jpg
Anatomy of a CML operation

  • Neighbor Sampling

    • Select and read values, v, of nearby cells

  • Computation on Neighbors

    • Compute f(v) for each sample (f can be arbitrary computation)

  • Combine new values (arithmetic)

  • Store new values back in lattice


Graphics hardware l.jpg
Graphics Hardware

  • Why use it?

    • Speed: up to 25x speedup in our sims

    • GPU perf. grows faster than CPU perf.

    • Cheap: GeForce 4 Ti 4200 < $130

    • Load balancing in complex applications

  • Why not use it?

    • Low precision computation (not anymore!)

    • Difficult to program (not anymore!)



Simulating the world l.jpg
Simulating the world

  • Simulate a wide variety of phenomena on GPUs

    • Anything we can describe with discrete PDEs or approximations of PDEs


ad