Dive into the post-geometry pipeline, focusing on clipping, viewport transformation, triangle setup, traversal, and interpolation for optimized rasterization. Explore hardware requirements and traversal algorithms for high-performance rendering. Learn about fragment clipping, interpolation methods, and traversal algorithm effects for enhanced graphics processing.
Status – Week 240 Victor Moya
Summary • Post Geometry Pipeline. • Rasterization. • Triangle Setup. • Triangle Traversal. • Interpolation. • Current status.
Post Geometry Pipeline • Divide by w? • Clipping? • NVidia doesn’t seem to have geometric clipping. • Alpha kill in NV2x for user clip planes. • ATI seems to have geometric clipping. • Proper user clipping. • No support for transformed and lit vertex clipping. • What do we do?
Post Geometry Pipeline • Clipping: • 6 frustum clip planes. • At least 6 user clip planes. • Hardware requirements: • Plane – edge intersection (?). • Generates new vertices (1 or 2 for triangles). • Interpolate output attributes at the new vertex. • Can generate new triangles (1 for triangles). • Affects primitive assembly. • At least frustum clipping should be fast.
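The clipping requirements listed on this slide (plane/edge intersection, attribute interpolation at the new vertex, possible extra triangle) can be illustrated with a small software sketch. It is not taken from the slides: it uses Sutherland-Hodgman style clipping of one triangle against one plane, and it assumes clip-space positions `(x, y, z, w)`, a plane given as four coefficients with "inside" meaning `dot(plane, position) >= 0`, and hypothetical helper names (`dot4`, `lerp`, `clip_triangle`).

```python
def dot4(plane, pos):
    """Signed distance-like value of a homogeneous position against a plane."""
    return sum(p * v for p, v in zip(plane, pos))


def lerp(a, b, t):
    """Linear interpolation between two tuples of equal length."""
    return tuple(x + t * (y - x) for x, y in zip(a, b))


def clip_triangle(tri, plane):
    """Clip one triangle against one plane.

    tri is a list of three vertices, each a (position, attributes) pair.
    Returns a list with 0, 1 or 2 triangles.
    """
    out = []
    for i in range(3):
        cur, nxt = tri[i], tri[(i + 1) % 3]
        d_cur = dot4(plane, cur[0])
        d_nxt = dot4(plane, nxt[0])
        if d_cur >= 0:
            out.append(cur)  # vertex on the inside: keep it
        if (d_cur >= 0) != (d_nxt >= 0):
            # The edge crosses the plane: create a new vertex and
            # interpolate position and attributes with the same factor.
            t = d_cur / (d_cur - d_nxt)
            out.append((lerp(cur[0], nxt[0], t), lerp(cur[1], nxt[1], t)))
    # A clipped triangle is a polygon with 3 or 4 vertices: fan it.
    return [[out[0], out[i], out[i + 1]] for i in range(1, len(out) - 1)]


# Left frustum plane x + w >= 0, one vertex outside: two triangles come out.
tri = [((-2, 0, 0, 1), (0.0,)), ((0, 0, 0, 1), (1.0,)), ((0, 1, 0, 1), (1.0,))]
clipped = clip_triangle(tri, (1, 0, 0, 1))
```

With one vertex outside the plane the polygon has four vertices and yields two triangles; with two vertices outside it yields one. This is why clipping interacts with primitive assembly.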
Post Geometry Pipeline • Viewport Transformation • Delay to the end of rasterization (at the conversion of fragment attributes from fixed point to floating point). • Use fixed point device coordinates [-1, 1] for rasterization. • Rasterization.
[Figure: pipeline diagram with boxes MC, StF, StL, Shader, StOC, StC, PA, TS, TT and Int; labels on the diagram: 1, 2 and A*TL+L.] • MC: Memory Controller • StF: Streamer Fetch • StL: Streamer Loader • Shader: Vertex Shader • StOC: Streamer Output Cache • StC: Streamer Commit • PA: Primitive Assembly • TS: Triangle Setup • TT: Triangle Traversal • Int: Interpolation
Rasterization • We can divide it into three phases: • Setup. • Calculate linear equation coefficients, start values and slopes. • Perform area and face culling. • Traversal. • Traverse the triangle generating fragments inside the triangle. • Clipping of fragments by frustum and user clip. • Interpolation. • Interpolate all fragment attributes for the generated fragment.
Triangle Setup • Use 2DH rasterization setup. • Create matrix (inverse or just adjoint matrix?) from the three vertex 2DH positions. • Calculate determinant. • Cull for sign (face culling) and zero (zero area). • Send the edge equation coefficients and/or start and slope values to Triangle Traversal. • Optional: send other equations (1/w, clip planes, interpolators …).
Triangle Setup • Adjoint rasterization matrix adj(M): • First level: 18 muls. • Second level: 9 adds. • a0 = y1·w2 - y2·w1 • a1 = y2·w0 - y0·w2 • a2 = y0·w1 - y1·w0 • b0 = x2·w1 - x1·w2 • b1 = x0·w2 - x2·w0 • b2 = x1·w0 - x0·w1 • c0 = x1·y2 - x2·y1 • c1 = x2·y0 - x0·y2 • c2 = x0·y1 - x1·y0
Triangle Setup • Matrix determinant det(M): • 1 DP3: {w0, w1, w2} · {c0, c1, c2} • Inverse matrix M⁻¹ (not needed?): • First level: 1 reciprocal: 1/det(M). • Second level: 9 muls. • Edge equations: • M⁻¹ rows. • E0 = [a0, b0, c0] • E1 = [a1, b1, c1] • E2 = [a2, b2, c2]
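The setup arithmetic on the last two slides (adjoint, determinant, culling) can be written out as a short sketch. The function names are hypothetical, and the culling convention (a positive determinant means front-facing) is an assumption made only for this example; the slides do not fix a winding order.

```python
def adjoint(v0, v1, v2):
    """Rows [a_i, b_i, c_i] of adj(M) for 2DH vertices (x, y, w).

    18 multiplications and 9 subtractions, as counted on the slide.
    """
    (x0, y0, w0), (x1, y1, w1), (x2, y2, w2) = v0, v1, v2
    return [
        (y1 * w2 - y2 * w1, x2 * w1 - x1 * w2, x1 * y2 - x2 * y1),
        (y2 * w0 - y0 * w2, x0 * w2 - x2 * w0, x2 * y0 - x0 * y2),
        (y0 * w1 - y1 * w0, x1 * w0 - x0 * w1, x0 * y1 - x1 * y0),
    ]


def determinant(v0, v1, v2, adj):
    """det(M) as one DP3: {w0, w1, w2} . {c0, c1, c2}."""
    return v0[2] * adj[0][2] + v1[2] * adj[1][2] + v2[2] * adj[2][2]


def triangle_setup(v0, v1, v2):
    """Return (edge_equations, det), or None if the triangle is culled."""
    adj = adjoint(v0, v1, v2)
    det = determinant(v0, v1, v2, adj)
    if det == 0:
        return None  # zero area
    if det < 0:
        return None  # back-facing under the assumed convention
    return adj, det


edges, det = triangle_setup((0, 0, 1), (1, 0, 1), (0, 1, 1))
```

For this unit triangle the three rows are `-x - y + 1`, `x` and `y`: each edge equation is zero on one edge and equals `det` at the opposite vertex. The adjoint is enough for sign tests, which matches the "not needed?" remark about the full inverse.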
Triangle Setup • 1/w equation: • Sum of rows (param vector {1, 1, 1}). • Can be calculated as the sum of the edge equations. • Additional equations: • param vector {u0, u1, u2} · M⁻¹: 3 DP3. • Frustum/Viewport clip: • D0 = [1, 0, -x0] • D1 = [-1, 0, x0 + w] • D2 = [0, 1, -y0] • D3 = [0, -1, y0 + h]
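As a worked illustration of the 1/w and parameter equations, the sketch below builds the inverse matrix for one triangle and multiplies parameter vectors by it. The helper names and the sample vertices are assumptions made for the example. Evaluating a resulting equation at a projected position (x/w, y/w) gives the parameter divided by w.

```python
def inverse_rows(v0, v1, v2):
    """Rows of the inverse matrix for 2DH vertices (x, y, w): adj(M) / det(M)."""
    (x0, y0, w0), (x1, y1, w1), (x2, y2, w2) = v0, v1, v2
    adj = [
        (y1 * w2 - y2 * w1, x2 * w1 - x1 * w2, x1 * y2 - x2 * y1),
        (y2 * w0 - y0 * w2, x0 * w2 - x2 * w0, x2 * y0 - x0 * y2),
        (y0 * w1 - y1 * w0, x1 * w0 - x0 * w1, x0 * y1 - x1 * y0),
    ]
    det = w0 * adj[0][2] + w1 * adj[1][2] + w2 * adj[2][2]
    return [tuple(e / det for e in row) for row in adj]


def param_equation(u, inv):
    """Coefficients (a, b, c) of u/w: vector times matrix, i.e. 3 DP3."""
    return tuple(sum(u[i] * inv[i][j] for i in range(3)) for j in range(3))


def evaluate(eq, x, y):
    """Value of a linear equation a*x + b*y + c at a screen position."""
    a, b, c = eq
    return a * x + b * y + c


# Vertices project to (0, 0), (1, 0) and (0, 1) with w = 1, 2 and 4.
inv = inverse_rows((0, 0, 1), (2, 0, 2), (0, 4, 4))
one_over_w = param_equation((1, 1, 1), inv)   # sum of the rows
u_over_w = param_equation((10, 20, 40), inv)  # a per-vertex parameter
```

Here `one_over_w` evaluates to 1, 0.5 and 0.25 at the three projected vertices, and dividing `u_over_w` by it at the same positions recovers 10, 20 and 40.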
[Figure: hardware diagram built from multiplier (*), adder (+) and DP3 units.]
Triangle Traversal • Different algorithms: • I don’t know which is better. • Scanline. • Centerline (PixelVision). • Tiled (Neon, McCormack). • Incremental and Hierarchical Hilbert Order (McCool). • Others?
Triangle Traversal • Traversal algorithm effects: • Can improve the texture access pattern (Neon, Hilbert). • Can improve framebuffer memory access (Neon). • Traversal algorithm requirements: • Must produce at least 2x2 fragments per cycle or multiples (2 2x2 or 3 2x2, etc). • Must be efficient and generate as few fragments as possible outside the triangle. • Antialiasing?
Triangle Traversal • Uses the edge equation coefficients and/or the start and slope values calculated from them to walk the triangle. • One ‘step’ per cycle. • Fixed point arithmetic: integer addition. • Requires saving state (2 to 3 saved states) or walking back (which spends cycles). • Tests the sign of the edge equation values at n positions per cycle. • May test frustum and znear/zfar clip at the same time.
Triangle Traversal • Hardware requirements: • Multiple fixed point adders. • Multiple sign testers. • Registers for current (at least 3 for each edge equation) and saved states. • Registers for edge slopes/increments (as many as fragments generated per cycle and edge equations?).
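The incremental stepping described on this slide can be sketched as follows. This is a plain bounding-box scan, not one of the traversal algorithms listed earlier. It tests one position per step instead of a 2x2 block, and it treats a fragment as inside when all three edge values are `>= 0`, which is an assumption (a real rasterizer needs a tie-breaking fill rule). The function name and integer edge coefficients are hypothetical.

```python
def traverse(edges, xmin, ymin, xmax, ymax):
    """Walk a bounding box with integer additions only in the inner loops.

    edges holds three integer (a, b, c) triples for E(x, y) = a*x + b*y + c.
    Returns the list of (x, y) positions where every edge value is >= 0.
    """
    # Start values at the first position (multiplications happen once).
    row_start = [a * xmin + b * ymin + c for a, b, c in edges]
    fragments = []
    for y in range(ymin, ymax + 1):
        current = list(row_start)  # row_start is the saved state
        for x in range(xmin, xmax + 1):
            if all(value >= 0 for value in current):  # sign test
                fragments.append((x, y))
            # Step right: add the x slope of each edge equation.
            current = [v + a for v, (a, _, _) in zip(current, edges)]
        # Step down from the saved state: add the y slope.
        row_start = [v + b for v, (_, b, _) in zip(row_start, edges)]
    return fragments


# Edge equations of the triangle (0, 0), (4, 0), (0, 4) with w = 1.
edges = [(-4, -4, 16), (4, 0, 0), (0, 4, 0)]
fragments = traverse(edges, 0, 0, 4, 4)
```

Keeping `row_start` is the software analogue of a saved state: without it the walker would have to step back along the row to reach the start of the next one.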
[Figure: Traversal Algorithm diagram with adder (+) units feeding a TEST unit.]
Interpolation • Using barycentric method: • Use the edge equation result (McCool): • F0(x,y) = E0 • F1(x,y) = E1 • F2(x,y) = E2 • Calculate the sum of the edge equations at the fragment: • R’(x,y) = F0(x,y) + F1(x,y) + F2(x,y) • Calculate the reciprocal: • r = 1/R’(x,y) • Interpolate the attribute at the fragment: • p_k(x,y) = p_k0·r·F0(x,y) + p_k1·r·F1(x,y) + p_k2·r·F2(x,y)
Interpolation • Alternative (Olano & Greer): • At setup: • Use 2DH method and calculate coefficients for all the attributes. • Calculate 1/w (sum of rows) coefficients. • Requires a vector matrix mul per attribute. • At traverse/interpolation: • Interpolate 1/w and attributes using fixed point incremental arithmetic. • Calculate the reciprocal of 1/w. • Mul the interpolated attribute by the reciprocal of 1/w.
Interpolation • Barycentric coordinates (McCool): • no cost at setup. • store the parameter values at the three triangle vertices. • fixed: 1 addition, 1 reciprocal and 3 muls. • per parameter: 1 DP3. • Interpolation using Olano & Greer: • vector matrix mul at setup per parameter and 1/w: 3 DP3. • store current state and slope increment for all the parameters and 1/w. • fixed: 1 addition, 1 reciprocal. • per parameter: 1 addition, 1 mul.
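A minimal sketch of the barycentric formula on this slide, assuming the edge equations come from the adjoint matrix of the 2DH vertex positions. The helper names and sample values are hypothetical. Because the edge values already carry the 1/w factor, dividing by their sum gives a perspective-correct result.

```python
def edge_value(eq, x, y):
    """Value of one edge equation a*x + b*y + c at a fragment position."""
    a, b, c = eq
    return a * x + b * y + c


def interpolate_barycentric(edges, params, x, y):
    """p(x, y) = r * (p0*F0 + p1*F1 + p2*F2) with r = 1 / (F0 + F1 + F2)."""
    f0, f1, f2 = (edge_value(eq, x, y) for eq in edges)
    r = 1.0 / (f0 + f1 + f2)  # one addition chain and one reciprocal
    # The scaled weights r*F_i are shared by every parameter (3 muls);
    # each parameter then costs one DP3.
    return params[0] * (r * f0) + params[1] * (r * f1) + params[2] * (r * f2)


# Adjoint rows for 2DH vertices (0, 0, 1), (2, 0, 2), (0, 4, 4), which
# project to (0, 0), (1, 0) and (0, 1).
edges = [(-8, -8, 8), (4, 0, 0), (0, 2, 0)]
at_vertex = interpolate_barycentric(edges, (10.0, 20.0, 40.0), 1, 0)
at_midpoint = interpolate_barycentric(edges, (10.0, 20.0, 40.0), 0.5, 0)
```

At the screen-space midpoint of the first edge the result is 40/3 rather than 15, which shows the perspective correction.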
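The Olano & Greer alternative can be sketched for a single horizontal span. The coefficients for 1/w and for the attribute divided by w are assumed to have been computed at setup, floating point stands in for the fixed point arithmetic mentioned on the slide, and the function name and sample coefficients are hypothetical.

```python
def interpolate_span(attr_eq, one_over_w_eq, x_start, y, count):
    """Interpolate one attribute along a span of `count` fragments.

    attr_eq and one_over_w_eq are (a, b, c) coefficients of the linear
    equations for attribute/w and 1/w in screen space.
    """
    a_u, b_u, c_u = attr_eq
    a_w, b_w, c_w = one_over_w_eq
    # Start values at the first fragment of the span.
    u_over_w = a_u * x_start + b_u * y + c_u
    one_over_w = a_w * x_start + b_w * y + c_w
    values = []
    for _ in range(count):
        w = 1.0 / one_over_w          # reciprocal of 1/w, shared
        values.append(u_over_w * w)   # one mul per attribute
        u_over_w += a_u               # one addition per attribute
        one_over_w += a_w             # one addition for 1/w
    return values


# Equations for 2DH vertices (0, 0, 1), (2, 0, 2), (0, 4, 4) carrying the
# attribute values 10, 20 and 40.
span = interpolate_span((0.0, 0.0, 10.0), (-0.5, -0.75, 1.0), 0, 0, 2)
```

The span runs from the first vertex to the second one and returns the attribute values 10 and 20. Per fragment the work is one reciprocal plus one addition and one multiplication per attribute.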
Interpolation • How many attributes/parameters can be interpolated per cycle? • XBOX: • 5 interpolators? • general interpolator: color diffuse + color specular (shared). • Texture interpolators: 4? • Note: each of those interpolators is for a 4D vector.
[Figure: interpolator diagram from VERTEX ATTRIBUTES to FRAGMENT ATTRIBUTES, built from multiplier (*), adder (+) and reciprocal (1/x) units.]
Current status • Implemented Primitive Assembly box (with trivial degenerate triangle rejection). • Added GPU_VERTEX_OUTPUT_ATTRIBUTE register. • Boolean vector of MAX_VERTEX_ATTRIBUTES entries that stores whether a vertex output register is written in the shader (and therefore must be transmitted). • The transmission latency for a vertex between the Shader and Streamer Commit, and between Streamer Commit and Primitive Assembly, is now determined by the number of output attributes.
Current Status • Started Triangle Setup box and support classes.
Current Status • Comments: • Should the Streamer Loader to Shader transmission also have a transmission latency penalty? • Where are the vertex output attributes stored? • How many times must we pay the vertex transmission penalty?
Current Status • Signal Analyzer: • Already works with large traces.
References • Triangle Scan Conversion using 2D Homogeneous Coordinates, Marc Olano, Trey Greer. • Tiled Polygon Traversal Using Half-Plane Edge Functions, Joel McCormack, Robert McNamara. • Incremental and Hierarchical Hilbert Order Edge Equation Polygon Rasterization, Michael D. McCool, Chris Wales, Kevin Moule.
References • A Parallel Algorithm for Polygon Rasterization, Juan Pineda.