1 / 34

A User-Programmable Vertex Engine

A User-Programmable Vertex Engine. Erik Lindholm Mark Kilgard Henry Moreton NVIDIA Corporation Presented by Han-Wei Shen. Where does the Vertex Engine fit? . Transform & Lighting. Traditional Graphics Pipeline. setup rasterizer. texture blending. frame-buffer anti-aliasing.

lori
Download Presentation

A User-Programmable Vertex Engine

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. A User-Programmable Vertex Engine • Erik Lindholm • Mark Kilgard • Henry Moreton • NVIDIA Corporation • Presented by Han-Wei Shen

  2. Where does the Vertex Engine fit? Transform & Lighting Traditional Graphics Pipeline setup rasterizer texture blending frame-buffer anti-aliasing

  3. GeForce 3 Vertex Engine Vertex Program Transform & Lighting setup rasterizer texture blending frame-buffer anti-aliasing

  4. API Support • Designed to fit into OpenGL and D3D API’s • Program mode vs. Fixed function mode • Load and bind program • Simple to add to old D3D and OpenGL programs

  5. Programming Model • Enable vertex program • glEnable(GL_VERTEX_PROGRAM_NV); • Create vertex program object • Bind vertex program object • Execute vertex program object

  6. Create Vertex Program • Programs (assembly) are defined inline as • character strings static const GLubyte vpgm[] = “\!!VP1. 0\ DP4 o[HPOS].x, c[0], v[0]; \ DP4 o[HPOS].y, c[1], v[0]; \ DP4 o[HPOS].z, c[2], v[0]; \ DP4 o[HPOS].w, c[3], v[0]; \ MOV o[COL0],v[3]; \ END";

  7. Create Vertex Program (2) • Load and bind vertex programs similar to texture objects glLoadProgramNV(GL_VERTEX_PROGRAM_NV, 7, strelen(programString), programString); …. glBindProgramNV(GL_VERTEX_PROGRAM_NV, 7);

  8. Invoke Vertex Program • The vertex program is initiated when a vertex is given, i.e., when • glBegin(…) • glVertex3f(x,y,z) • … • glEnd()

  9. Let’s look at the sample program static const GLubyte vpgm[] = “\!!VP1. 0\ DP4 o[HPOS].x, c[0], v[0]; \ DP4 o[HPOS].y, c[1], v[0]; \ DP4 o[HPOS].z, c[2], v[0]; \ DP4 o[HPOS].w, c[3], v[0]; \ MOV o[COL0],v[3]; \ END"; O[HPOS] = M(c0,c1,c2,c3) * v - HPOS? O[COL0] = v[3] - COL0? Calculate the clip space point position and Assign the vertex with v[3] as its diffuse color

  10. Programming Model V[0] … V[15] VertexSource Program Constants c[0] … c[96] 16x4 registers O[HPOS] O[COL0] O[COL1] O[FOGP] O[PSIZ] O[TEX0] … O[TEX7] Vertex Program 96x4 registers R0 … R11 Temporary Registers 128 instructions 12x4 registers Vertex Output 15x4 registers All quad floats

  11. Input Vertex Attributes • V[0] – V[15] • Aliased (tracked) with conventional per-vertex attributes (Table 3) • Use glVertexAttribNV() to explicitly assig values • Can also specify a scalar value to the vertex attribute array - glVertexAttributesNV() • Can change values inside or outside glBegin()/glEnd() pair

  12. Program Constants • Can only change values outside glBegin()/glEnd() pair • No automatic aliasing • Can be used to track OpenGl matrices (modelview, projection, texture, etc.) • Example: • glTrackMatrix(GL_VERTEX_PROGRAM_NV, 0, GL_MODELVIEW_PROJECTION_NV, GL_IDENTIGY_NV) • - track 4 contiguous program constants starting with c[0]

  13. Program Constants (cont’d) • DP4 o[HPOS].x, c[0], v[OPOS] • DP4 o[HPOS].y, c[1], v[OPOS] • DP4 o[HPOS].z, c[2], v[OPOS] • DP4 o[HPOS].w, c[3], v[OPOS] • What does it do?

  14. Program Constants (cont’d) • glTrackMatrixNV(GL_VERTEX_PROGRAM_NV, 4, GL_MODEL_VIEW, GL_INVERSE_TRANPOSE_NV) • DP3 R0.x, C[4], V[NRML] • DP3 R0.y, C[5[, V[NRML] • DP3 R0.z, C[6], V[NRML] • What doe it do?

  15. Hardware Block Diagram Vertex In Vertex Attribute Buffer (VAB) Vector FP Core Vertex Out

  16. Vertex Attribute Buffer (VAB) 128 ( 32 x 4 ) … VAB dirty bits 128 IB 0 1 14 15 ….

  17. HW Block Diagram

  18. Data Path X Y Z W X Y Z W X Y Z W Swizzle Swizzle Swizzle Negate Negate Negate FPU Core Write Mask X Y Z W

  19. Instruction Set: The ops • 17 instructions total • MOV, MUL, ADD, MAD, DST • DP3, DP4 • MIN, MAX, SLT, SGE • RCP, RSQ, LOG, EXP, LIT • ARL

  20. Instruction Set: The Core Features • Immediate access to sources • Swizzle/negate on all sources • Write mask on all destinations • DP3,DP4 most common graphics ops • Cross product is MUL+MAD with swizzling • LIT instruction implements phonglighting

  21. Dot Product Instruction • DP3 R0.x, R1, R2 • R0.x = R1.x * R2.x + R1.y * R1.y + R1.z * R2.z • DP4 R0.x, R1, R2 • 4-component dot product

  22. MUL instruction • MUL R1, R0, R2 (component-wise mult.) • R1.x = R0.x * R2.x • R1.y = R0.y * R2.y • R1.z = R0.z * R2.z • R1.w = R0.w * R2.w

  23. MAD instruction • MAD R1, R2, R3, R4 • R1 = R2 * R3 + R4 • *: component wise multiplication • Example: • MAD R1, R0.yzxw, R2.zxyw, -R1 • What does it do?

  24. Cross Product Coding Example # Cross product R2 = R0 x R1 MUL R2, R0.zxyw, R1.yzxw; MAD R2, R0.yzxw, R1.zxyw, -R2;

  25. Lighting instruction • LIT R1, R0 (phong light model) • Input: R0 = (diffuse, specular, ??, shiness) • Output R1 = (1, diffuse, specular^shininess, 1) • Usually followed by • DP3 o[COL0], C[21], R1 (assuming using c[21]) • where C[xx] = (ka, kd, ks, ??)

  26. Ready to trace some program?

  27. Previous Work: Geometry Engine • High bandwidth + lots of Flops • Low clock rate • No architectural continuity • VERY hard to program • Some high-level language support (maybe) • A compromise solution (vtx,prim,pix,…)

  28. Alternative: The CPU • Low bandwidth + reasonable Flops • High clock rate • Excellent architectural continuity • VERY hard to use efficiently • Excellent high-level language support • Flexible, but often too slow

  29. New Design: The Vertex Engine • Simple hardware for a commodity GPU • Allows user to manipulate vertex transform • Simple to use programming model • Superset of fixed function mode

  30. Why Vertex Processing? • Very parallel • Use single vertex programming model • Hardware can batch or interleave • KISS

  31. Why Not Primitive Processing? • Face culling and clipping break parallelism • Complicates memory accesses • Inefficient (control takes time) • Let hardware designers optimize

  32. Programming Model: Vertex I/O • Streaming vertex architecture • Source data converted to floats • Source data loaded • Run program • Destination data drained • Destination data re-formatted for hw

  33. Hardware Implementation • Vector SIMD Unit + Special Function Unit • Multithreaded and pipelined to hide latency • Any one instruction/cycle • All instructions equal latency • Free swizzling/negate/write mask support

  34. Conclusion • Very simple, efficient implementation • Allows vertex programming continuity • Stanford Imagine Architecture • A work in progress, lots more to come… • We welcome your feedback

More Related