gpu architecture cg l.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
GPU Architecture & Cg PowerPoint Presentation
Download Presentation
GPU Architecture & Cg

Loading in 2 Seconds...

play fullscreen
1 / 39

GPU Architecture & Cg - PowerPoint PPT Presentation


  • 694 Views
  • Uploaded on

GPU Architecture & Cg Mark Colbert PhD Candidate UCF Graphics Group / MCL colbert@cs.ucf.edu © 2006 University of Central Florida welcome Assumptions Experienced in C\C++ Basic OpenGL\DirectX Knowledge Some Graphics Knowledge Familiarity with Geometric Transformations Linear Algebra

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'GPU Architecture & Cg' - emily


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
gpu architecture cg

GPU Architecture & Cg

Mark ColbertPhD CandidateUCF Graphics Group / MCL

colbert@cs.ucf.edu

© 2006 University of Central Florida

welcome
welcome
  • Assumptions
    • Experienced in C\C++
    • Basic OpenGL\DirectX Knowledge
    • Some Graphics Knowledge
    • Familiarity with Geometric Transformations
    • Linear Algebra
  • Purpose
    • Introduction to the Programmable GPU
    • Extremely Fast Paced
    • For the Geeky at Heart
overview
overview
  • GPU Architecture
  • GPU Pipeline
  • Introduction to Cg
  • Implications for GPGPU
slide4
GPU
  • Graphics Processing Unit
  • Parallelized SIMD Architecture
  • Denoted as Pipes
    • 24 fragment pipes on nVidia 7800
  • Each Pipe Handles 4 Vector Operations
rules of the game
rules of the game
  • Not a Generalized Vector Processor
  • Cannot read and write to same areas of memory
  • Limited output capability
    • Currently, very expensive to output to locations arbitrary locations in memory
notation
notation
  • Vertex
    • A data structure for a point in a mesh, containing position, normal, texture coordinates and more…
  • Fragment
    • A pixel, possibly sub-pixel, of a rasterized image
  • Shaders
    • Small programs ran in the GPU at specific stages of the GPU pipeline
memory constructs
memory constructs
  • Buffered Objects
  • Uniform Registers/State Table
  • Interpolated Registers
  • Temporary Registers
  • Textures
memory constructs8
memory constructs
  • Buffered Objects
    • CPU Generated Streams of Data
    • Limited Modifiability
    • Example
      • Vertex Data of a Mesh
memory constructs9
memory constructs
  • Uniform Registers/State Table
    • Constant Data through the Pipeline
      • Only Necessarily Constant for 1 Polygon
    • 32 general purpose registers
    • State Table Specific Registers
      • Projection/Model View Matrices
      • Lights
      • … and more
memory constructs10
memory constructs
  • Interpolated Registers
    • Per Vertex Data of a Polygon
    • Stores Information Interpolated Across Polygon
    • 10 General Purpose Interpolated Registers
memory constructs11
memory constructs
  • Temporary Registers
    • Standard Notion of Registers
    • Temporary Registers for In Shader Calculations
memory constructs12
memory constructs
  • Textures
    • Closest to Random Access Memory
    • Expensive to Access
      • Multiple Dependent AccessesExtremely Expensive
gpu pipeline
GPU pipeline

Program/

API

Driver

CPU

Bus

GPU

GPU Front End

Vertex

Processing

Primitive

Assembly

Rasterization &

Interpolation

Fragment

Processing

Raster

Operations

Framebuffer

gpu pipeline14

Program/

API

GPU pipeline
  • Program
    • Your Program
  • API
    • Either OpenGL or DirectX Interface
gpu pipeline15

Driver

GPU pipeline
  • Driver
    • Black-box
      • Implementations are Company Secrets
    • Largest Bottleneck in many GPU programs
gpu pipeline16

GPU Front End

GPU pipeline
  • GPU Front End
    • Receives commands & data from driver
    • PCI Express helps at this stage
gpu pipeline17

Vertex

Processing

GPU pipeline
  • Vertex Processing
    • Normally performs transformations
    • Programmable

data for rasterization

POSITION

vertex

POSITION, NORMAL, BINORMAL*, TANGENT*, TEXCOORD[0-7], COLOR[0-1], PSIZE

PSIZE

Vertex

Processor

FOG

TEXCOORD[0-7]

COLOR[0-1]

shader

data for interpolation

textures

gpu pipeline18

Primitive

Assembly

GPU pipeline
  • Primitive Assembly
    • Compiles Vertices into Points, Lines and/or Polygons
    • Link elements and set rasterizer
gpu pipeline19

Rasterization &

Interpolation

GPU pipeline
  • Rasterization
    • For each fragment determine respective area of triangle (Barycentric Coordinates) or other primitive
  • Interpolation

Primitive

Assembler

Primitive Type

data for rasterization

POSITION

Rasterizer

rasterized data

PSIZE

DEPTH

Barycentric

Coordinates

FOG

TEXCOORD[0-7]

COLOR[0-1]

Interpolator

TEXCOORD[0-7]

COLOR[0-1]

interpolated data

data for interpolation

gpu pipeline20

Fragment

Processing

GPU pipeline
  • Fragment Processing
    • Programmable

rasterized data

Fragment

Processor

data for raster ops

DEPTH

COLOR[0-3]

TEXCOORD[0-7]

COLOR[0-1]

DEPTH

shader

interpolated data

textures

gpu pipeline21

Raster

Operations

GPU pipeline
  • Depth Checking
    • Check framebuffer to see if lesser depth already exists (Z-Buffer)
    • Limited Programmability
  • Blending
    • Use alpha channel to combine colors already in the framebuffer
    • Limited Programmability
example
example

Program/

API

Code Snippet

….

glBegin(GL_TRIANGLES);

glTexCoord2f(1,0); glVertex3f(0,1,0);

glTexCoord2f(0,1); glVertex3f(-1,-1,0);

glTexCoord2f(0,0); glVertex3f(1,-1,0);

glEnd();

Driver

Bus

GPU Front End

Vertex

Processing

Primitive

Assembly

Rasterization &

Interpolation

Fragment

Processing

Raster

Operations

Framebuffer(s)

example23

GPU

example

Program/

API

Driver

Bus

GPU Front End

01001001100….

Vertex

Processing

Primitive

Assembly

Rasterization &

Interpolation

Fragment

Processing

Raster

Operations

Framebuffer(s)

example24
example

Program/

API

Driver

Bus

GPU Front End

Vertex

Processing

viewing frustum

Primitive

Assembly

Rasterization &

Interpolation

Fragment

Processing

Raster

Operations

Framebuffer(s)

example25
example

Program/

API

Driver

Bus

GPU Front End

Vertex

Processing

screen space

Primitive

Assembly

Rasterization &

Interpolation

Fragment

Processing

Raster

Operations

Framebuffer(s)

example26
example

Program/

API

Driver

Bus

GPU Front End

Vertex

Processing

framebuffer

Primitive

Assembly

Rasterization &

Interpolation

Fragment

Processing

Raster

Operations

Framebuffer(s)

example27
example

Program/

API

Driver

Bus

GPU Front End

Vertex

Processing

framebuffer

Primitive

Assembly

Rasterization &

Interpolation

Fragment

Processing

Raster

Operations

Framebuffer(s)

quick architecture notes
quick architecture notes
  • Limits in Shader Size
    • Pixel Shader 3.0 Spec
      • Vertex Program – 65535 asm instructions
      • Fragment Program – 65535+ asm instructions
  • MIMD
    • Branches are supported with a large overhead
  • Rasterizer & Interpolator
    • Programmable in DirectX 10
    • Geometric Shaders
  • Unified Shading Architecture
    • Xbox 360 – ATI
    • Pool of processors with load balancing
higher level shading languages
higher level shading languages
  • Vectorized languages for designing shader programs
  • Easy way out of tedious assembly coding
  • Not Perfect
    • Results Are Sometimes Clearly Not Optimized
  • Examples
    • Cg
    • GLSL
    • HLSL
slide30
Cg
  • nVidia’s Solution
  • Nearly Identical to HLSL
  • C++ Based
    • New Intrinsic Classes
    • New Intrinsic Functions
    • Semantics
slide31
Cg
  • Intrinsic Classes
    • Vectorized Primitives
      • i.e. float2, float3, float4
    • 16-bit Floating Point Constructs
      • half, half2, half3, half4
      • not enabled in ARB shaders
    • Fixed Precision Decimals
      • fixed, fixed2, fixed3, fixed4
      • Not enabled in ARB shaders
slide32
Cg
  • Intrinsic Classes (cont’d)
    • Membership Access
      • Constructor
        • e.g. float4 v = float4(a,b,c,d);
      • Array Operator
        • e.g. v[0], v[1], v[2], or v[3]
      • Swizzle Operator
        • Re-order/Build Vectors
          • e.g. v.xyz, v.xxxz, v.yyx, v.yx, v.xyzw
        • Replaceable with rgba instead of xyzw
slide33
Cg
  • Intrinsic Classes (cont’d)
    • Matrices
      • Compounded Vector Classes
        • e.g. float4x4
      • Constructed with multiple vectors
        • float4 v = float4(a,b,c,d); float4x4 m = float4x4(v,v,v,v);
    • Samplers
      • Texture Data Type
        • sampler1D, sampler2D, samplerRECT, sampler3D
        • samplerRECT – Same as sampler2D but uses pixel locations as texture coordinates instead of from [0,1]
slide34
Cg
  • Intrinsic Functions
    • Many have direct correspondence to assembly instructions or good approximations
    • Linear Algebra Functions
      • dot(a,b) – Dot Product
      • mul – Matrix-Matrix, Vector-Matrix, or Matrix-Vector multiplication
    • Texture Lookup Functions
      • tex*(sampler* texture, float* texCoord)
      • * - The dimensionality of the texture
slide35
Cg
  • Intrinsic Functions (cont’d)
    • Geometric Intrinsic
      • distance, faceforward, length, normalize, reflect, refract
    • A good chunk of math.h
      • Most Taylor series expansions for two coefficients
slide36
Cg
  • Semantics
    • Binds variables to GPU Memory Constructs
    • Uniform Registers
      • In declaration, use keyword uniform in front of variable type
    • Vertex Data/Interpolated Registers
      • float* varName : SEMANTIC
      • Only used as main function parameteror global variable
    • Textures
      • Same as uniform variable
fx composer
FX Composer
  • Program for quick shader design
  • Uses Cg as underlying shading language
  • Additional Semantic Bindings
  • NOTE: Uses DirectX as base, so uses vector-matrix multiplication notation
fx composer38
FX Composer
  • Walkthrough Example
gpgpu
GPGPU
  • General Purpose GPU Processing
  • Key Notes
    • Goal to exploit fragment processor
    • Each pixel represents a compacted 4-component element of data
    • Most optimal in gathering algorithms
      • Vertex shader needed to re-order output
      • Possibly Optimal in Unified Shading Architecture