Evolution of the Graphical Processing Unit

Evolution of the Graphical Processing Unit: A professional paper submitted in partial fulfillment of the requirements for the degree of Master of Science with a major in Computer Science. Thomas Scott Crow, February 3, 2005

Acknowledgements • I would like to thank Dr. Harris for his considerable patience and help. • I would like to thank my committee members, Dr. Egbert and Dr. Mensing for their valuable time.

Overview • Introduction • “Computer Graphics” Milestones • The Modern GPU • General Purpose GPU Computing • Future of the GPU

Introduction • Definition: Used primarily for 3D applications, a graphical processing unit (GPU) is a single-chip processor that creates lighting effects and transforms objects every time a 3D scene is redrawn. These are mathematically intensive tasks that would otherwise put considerable strain on the CPU. • History: Graphics computation has evolved from software running on the main CPU, to specialized hardware handling certain types of graphics computation while the CPU performs the rest, to a fully implemented 3D graphics pipeline run entirely on a GPU. This history has closely followed the idea of the “Wheel of Reincarnation”, first presented by Myer and Sutherland in a 1968 ACM paper.

Introduction Myer and Sutherland’s “Wheel of Reincarnation”

“Computer Graphics” Milestones • MIT’s Whirlwind Project - 1944 Significance: First computer built specifically for interactive, real-time control which displayed real-time text and graphics on a video terminal.

“Computer Graphics” Milestones • “Magnetic” Core Memory (RAM) – 1951 Significance: Miniaturization, speed, and non-volatility.

“Computer Graphics” Milestones • SAGE (Semi-Automatic Ground Environment) – 1958 Significance: Introduced real-time software, showed feasibility of CRTs in interactive computing, and the light-pen as an input device.

“Computer Graphics” Milestones • SAGE (Semi-Automatic Ground Environment) –1958 With light-pen

“Computer Graphics” Milestones • MIT’s TX-0 (Transistorized Experimental Computer Zero) – 1956 Significance: First real-time, programmable, general-purpose computer made entirely from transistors and first ever operating system.

“Computer Graphics” Milestones • MIT’s TX-2 – 1959 Significance: Specialized I/O circuitry allowed for “online” computing which allowed for the creation of Sutherland’s “Sketchpad”.

“Computer Graphics” Milestones • Ivan Sutherland’s Sketchpad – 1963 Significance: Precursor of the direct manipulation computer graphic interface of today. Ancestor of Computer Aided Design (CAD) and the modern graphical user interface.

“Computer Graphics” Milestones • Digital Equipment Corporation (DEC) and the Minicomputer – 1957 Significance: Drastic shift away from the mainframe “time-sharing” model of computing. The VAX supermini would become the workhorse for the CAD industry.

“Computer Graphics” Milestones • Computer Aided Design (CAD) Systems Significance: Furthered the concept of Sketchpad by allowing the creation, rotation, and manipulation of 3D models. General Motors DAC-1

“Computer Graphics” Milestones Information Displays IDIIOM

“Computer Graphics” Milestones • The PC Revolution Significance: Allowed the computing power of the early mainframes and minicomputers to be available to consumers. Intel 4004, the first Microprocessor

“Computer Graphics” Milestones The Altair 8800 is considered the first personal computer.

The Modern GPU • Graphical Processing Unit (GPU)

The Modern GPU • Professional Graphics Adapter (PGA) • First processor based video card with an Intel 8088 microprocessor onboard. • All video related tasks were performed by onboard microprocessor.

The Modern GPU • Silicon Graphics Inc. (SGI) – 1980s SGI’s two most important contributions to the modern GPU: • OpenGL - Vendor-independent Application Programming Interface (API) for the development of 2D and 3D graphics applications. OpenGL has become an industry standard API used and supported by all major vendors. • Graphics Pipeline - A conceptual model of the stages that graphics data passes through. In essence, it is a process for converting the 3D coordinates of a model into 2D screen images.

The Modern GPU 3D Graphics Pipeline from nVidia

The Modern GPU • Generalized 2-Step Graphics Pipeline • Geometry Stage – Changes 3D object coordinates into 2D window coordinates. • Rendering Stage - Fills the area between the 2D coordinates with pixels to represent the surface of the object.

The Modern GPU • Main Components of the Geometry Stage • Transform and Lighting – Transform is the process of mapping the coordinates of a 3D object onto a 2D space, and lighting is the process of computing lighting effects for the scene. • Triangle Setup – Converts triangles, given by their vertices, into pixels and computes the rate of change of color values between pixels.
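The transform step described above can be sketched in a few lines. This is a minimal illustration only, assuming a simple pinhole camera model; the function name `project_vertex`, the `focal_length` parameter, and the 640x480 window size are hypothetical choices and do not come from the original slides.

```python
def project_vertex(x, y, z, focal_length=1.0, width=640, height=480):
    """Map a 3D camera-space point to 2D window coordinates (assumed pinhole model)."""
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    # Perspective divide: points farther away land closer to the image center.
    ndc_x = focal_length * x / z
    ndc_y = focal_length * y / z
    # Viewport mapping: [-1, 1] -> pixel coordinates, with the y axis pointing down.
    px = (ndc_x + 1.0) * 0.5 * width
    py = (1.0 - ndc_y) * 0.5 * height
    return px, py


# A point on the camera axis maps to the center of the window.
center = project_vertex(0.0, 0.0, 5.0)
```

A real geometry stage expresses this mapping as matrix multiplications and also handles clipping, which this sketch omits.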

The Modern GPU • GPU Timeline

The Modern GPU • Transform Matrix Multiplication • Transform Matrix – Made up of many interim action matrices multiplied together. • Interim Action Matrix – Includes such actions as scaling, rotation, translation, etc.
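As a rough sketch of how interim action matrices combine into a single transform matrix, the example below multiplies 4x4 homogeneous matrices for scaling, rotation, and translation. The helper names (`mat_mul`, `scaling`, `rotation_z`, `translation`, `transform_point`) and the column-vector convention are assumptions made for illustration, not details taken from the slides.

```python
import math


def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def scaling(sx, sy, sz):
    return [[sx, 0, 0, 0], [0, sy, 0, 0], [0, 0, sz, 0], [0, 0, 0, 1]]


def translation(tx, ty, tz):
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]


def rotation_z(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]


def transform_point(m, point):
    """Apply a 4x4 matrix to a 3D point using homogeneous coordinates (w = 1)."""
    v = (point[0], point[1], point[2], 1.0)
    out = [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]
    return tuple(out[:3])


# With column vectors the rightmost matrix is applied first:
# scale by 2, then rotate 90 degrees about z, then translate by 10 along x.
transform = mat_mul(translation(10, 0, 0),
                    mat_mul(rotation_z(math.pi / 2), scaling(2, 2, 2)))
moved = transform_point(transform, (1.0, 0.0, 0.0))  # approximately (10, 2, 0)
```

Precomputing one combined matrix means each vertex costs a single matrix-vector product regardless of how many interim actions were composed.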

The Modern GPU • Fixed Function Pipeline

The Modern GPU • Programmable Pipeline • Vertex Programs replace the T&L stages of pipeline • Fragment Programs replace multi-texturing and blending

The Modern GPU • The Classic Von Neumann Architecture • Von Neumann Bottleneck is the separation between the CPU and memory, which limits how quickly data can move between them.

The Modern GPU • The Stream Processing Model • Streams are sets of sequential data elements that require similar computation. • Kernels are pieces of code that operate on every element of a stream.

The Modern GPU • Three Levels of Parallelism Exposed by the Stream Processing Model • Instruction-Level Parallelism – Simultaneous execution of multiple instructions within a kernel. • Data-Level Parallelism – Instruction execution on multiple stream elements simultaneously. • Task-Level Parallelism – Multiple stream processors can divide the work from one kernel, or different kernels can run on different stream processors.
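The stream and kernel vocabulary can be illustrated with a small sketch. Here the "stream" is simply a list of RGB pixels and the "kernel" is an ordinary function; `run_kernel` and `brighten` are hypothetical names introduced for this example only.

```python
def run_kernel(kernel, stream):
    """Apply the same kernel to every element of a stream, independently."""
    return [kernel(element) for element in stream]


def brighten(pixel):
    """Kernel: add a fixed amount to each color channel, clamped to 255."""
    r, g, b = pixel
    return (min(r + 40, 255), min(g + 40, 255), min(b + 40, 255))


pixels = [(0, 0, 0), (100, 200, 250), (255, 10, 30)]
brightened = run_kernel(brighten, pixels)
# brightened == [(40, 40, 40), (140, 240, 255), (255, 50, 70)]
```

Because the kernel never looks at any element other than the one it is given, the elements can be processed in any order, which is the property stream hardware relies on.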
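Data-level parallelism can be approximated on a CPU by handing stream elements to a pool of workers. The sketch below uses Python's standard `concurrent.futures.ThreadPoolExecutor`; the names `square` and `run_kernel_parallel` are illustrative assumptions, and Python threads only model the structure here rather than delivering GPU-style speedups for arithmetic work.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x):
    """Kernel: depends only on its own stream element."""
    return x * x


def run_kernel_parallel(kernel, stream, workers=4):
    """Run one kernel over many stream elements using a pool of workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() hands elements to whichever worker is free and
        # returns the results in the original stream order.
        return list(pool.map(kernel, stream))


squares = run_kernel_parallel(square, range(6))
# squares == [0, 1, 4, 9, 16, 25]
```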

The Modern GPU • Memory Access is Expensive: CPUs use caches to reduce off-chip memory access. Caches benefit from: • Spatial Locality – Items located physically near an item referenced in the near past will have a higher probability of being referenced in the near future. • Temporal Locality – Items referenced in the near past have a higher probability of being re-referenced in the near future. GPUs benefit from: • Producer-Consumer Locality – Production of a stream that is immediately consumed by another kernel. Memory-to-Arithmetic Operations Ratio: • Traditional Accumulator 1:1 • Scalar Processor 1:4 • Stream Processor 1:100
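Producer-consumer locality can be sketched with Python generators: each element produced by one kernel is consumed by the next kernel right away, so the intermediate stream is never stored in full. The kernel names `scale_kernel` and `offset_kernel` are hypothetical and chosen only for this illustration.

```python
def scale_kernel(stream, factor):
    """Producer kernel: yields each scaled element as soon as it is computed."""
    for x in stream:
        yield x * factor


def offset_kernel(stream, offset):
    """Consumer kernel: consumes each element immediately after it is produced."""
    for x in stream:
        yield x + offset


# The intermediate (scaled) stream never exists as a whole in memory;
# each value flows straight from the producer into the consumer.
result = list(offset_kernel(scale_kernel(range(5), 2), 1))
# result == [1, 3, 5, 7, 9]
```

On stream hardware the analogous benefit is that intermediate results stay on chip instead of making a round trip to off-chip memory.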

General Purpose GPU Computing • Why General Purpose Computing on a GPU? • GPUs are not hampered by the classic sequential code structure of the CPU, so they can use additional transistors more effectively. • Moore’s Law says transistor count at a given die size doubles every 18 months. That of a GPU doubles every 6 months. • Pentium 4 has 222 million transistors. • GeForce 6 has more than double. • Speed - The lure of raw computational power through parallelism. • Cost - The multi-billion dollar gaming industry drives down the cost of the commodity GPU, making it a very cost-effective alternative to the CPU.
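The two doubling periods quoted above compound very differently. The short calculation below takes the 18-month and 6-month figures at face value and compares growth over an arbitrary three-year window; `growth_factor` is a hypothetical helper name.

```python
def growth_factor(months, doubling_period_months):
    """Multiplicative growth after `months` for a given doubling period."""
    return 2 ** (months / doubling_period_months)


cpu_growth = growth_factor(36, 18)  # 2 doublings in three years -> 4.0
gpu_growth = growth_factor(36, 6)   # 6 doublings in three years -> 64.0
# 64 == 4 ** 3: over any period, the 6-month rate is the 18-month rate cubed.
```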

General Purpose GPU Computing Moore’s Law Cubed From ‘Stream Programming Environments’ – Hanrahan, 2004

General Purpose GPU Computing • Current Research Topics • Computer Vision • Computational Geometry • Stream Processing • Cloud Simulation • Ice Crystal Growth Simulation • Database Queries • Monte Carlo Methods • Computational Fluid Dynamics • Collision Detection • Voronoi Computations • Molecular Dynamics • Many More…

General Purpose GPU Computing • Stanford’s “General Purpose” Imagine Stream Processor

General Purpose GPU Computing • Imagine Bandwidth Hierarchy

General Purpose GPU Computing • Matrix-Matrix Multiplication – A Test Case C=AB, where A and B are large, dense NxN matrices. System Requirements: CPU Test: • Pentium III 750MHz • ScienceMark 2.0 – BLAS (Basic Linear Algebra Subprograms) software suite. GPU Test: • GeForce FX 5200 – 1st fully programmable 3D Graphics Pipeline GPU. • Source code from GPUBench suite of performance testing tools, which is written in Cg “C for Graphics”. • Microsoft Visual Studio .Net 2003 – Programming Environment. • Cygwin – Linux environment for MS Windows.
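For reference, the operation being benchmarked can be sketched as follows. This is a plain, unoptimized Python version, not the BLAS or Cg code used in the test; it assumes the usual count of 2N^3 floating-point operations (N^3 multiplies and N^3 adds) for a dense NxN product, and the names `matmul` and `measure_gflops` are illustrative.

```python
import time


def matmul(a, b):
    """Naive dense product C = AB for square matrices stored as lists of rows."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            aik = a[i][k]
            for j in range(n):
                c[i][j] += aik * b[k][j]
    return c


def measure_gflops(n):
    """Time one NxN product and convert 2*N^3 operations into GFLOPS."""
    a = [[1.0] * n for _ in range(n)]
    b = [[2.0] * n for _ in range(n)]
    start = time.perf_counter()
    matmul(a, b)
    elapsed = time.perf_counter() - start
    return 2 * n ** 3 / elapsed / 1e9


small = matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
# small == [[19.0, 22.0], [43.0, 50.0]]
```

Interpreted Python will report a rate far below either processor's peak; the point is only to show how an observed GFLOPS figure is derived from N and the elapsed time.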

General Purpose GPU Computing • Results

General Purpose GPU Computing • Efficiency e = observed peak GFLOPS / theoretical peak GFLOPS CPU: Theoretical peak for the Pentium III 750MHz is 3 GFLOPS. Observed peak for this test was 1.2 GFLOPS. e = 1.2 / 3 = 40% efficiency. GPU: Theoretical peak for the GeForce FX 5200 is 4 GFLOPS. Observed peak for this test was 0.6 GFLOPS. e = 0.6 / 4 = 15% efficiency (NOT EXPECTED). • In this test the GPU is theoretically capable of about 33% more GFLOPS than the CPU, but was found to perform ½ as well.
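The efficiency figures can be reproduced from the quoted peak values by dividing observed peak GFLOPS by theoretical peak GFLOPS. The function name `efficiency` is a hypothetical helper; the four numbers are the ones reported on this slide.

```python
def efficiency(observed_gflops, theoretical_peak_gflops):
    """Fraction of theoretical peak throughput actually achieved."""
    return observed_gflops / theoretical_peak_gflops


cpu_efficiency = efficiency(1.2, 3.0)  # about 0.40 for the Pentium III
gpu_efficiency = efficiency(0.6, 4.0)  # about 0.15 for the GeForce FX 5200
observed_ratio = 0.6 / 1.2             # GPU achieved half the CPU's observed rate
```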


Future of the GPU • Potential Improvements • Design of new algorithms • New languages that are highly parallel and data streaming capable. • Compilers and tools to advance parallel stream programming. • Stanford University’s BrookGPU • Memory bandwidth hierarchy improvements.

Future of the GPU • GPU Clusters • nVIDIA SLI (Scalable Link Interface) – Links two GPUs and can as much as double the performance of a single GPU.

Future of the GPU Examples of Load Balancing: • Alternate Frame Rendering

Future of the GPU Examples of Load Balancing: • Split Frame Rendering
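The two load-balancing schemes named above can be sketched as simple work-assignment functions. This is an assumption-laden illustration: alternate frame rendering is modeled as round-robin assignment of whole frames, and split frame rendering as a static, equal split of scanlines, whereas real drivers may adjust the split dynamically. Both function names are hypothetical.

```python
def alternate_frame_rendering(frame_ids, gpu_count=2):
    """Assign whole frames to GPUs in round-robin order."""
    return {frame: frame % gpu_count for frame in frame_ids}


def split_frame_rendering(frame_height, gpu_count=2):
    """Split one frame into horizontal bands, one (start_row, end_row) band per GPU."""
    bounds = [frame_height * i // gpu_count for i in range(gpu_count + 1)]
    return [(bounds[i], bounds[i + 1]) for i in range(gpu_count)]


afr = alternate_frame_rendering(range(4))
# afr == {0: 0, 1: 1, 2: 0, 3: 1}
sfr = split_frame_rendering(480)
# sfr == [(0, 240), (240, 480)]
```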

Future of the GPU GPU Clustering at Stony Brook University

Questions
