Explore the significance of parallelism, survey various hardware options, and learn how to code effectively for multi- and many-core platforms. Understand the power and heat considerations, as well as the impact on existing hardware/software contracts. Discover how to start coding for CPUs, GPUs, and APUs. Dive into CUDA and OpenCL programming, undergraduate research opportunities, and advanced courses to stay ahead in this era of parallel computing.
Ubiquitous Parallelism: Are You Equipped to Code for Multi- and Many-Core Platforms?
Agenda • Introduction/Motivation • Why Parallelism? Why now? • Survey of Parallel Hardware • CPUs vs. GPUs • Conclusion • How Can I Start?
Talk Goal • Encourage undergraduates to answer the call of the era of parallelism • Education • Software Engineering
Why Parallelism? Why now? • You’ve already been exposed to parallelism • Bit Level Parallelism • Instruction Level Parallelism • Thread Level Parallelism
Why Parallelism? Why now? • Single-threaded performance has plateaued • Silicon Trends • Power Consumption • Heat Dissipation
Why Parallelism? Why now? • Issue: Power & Heat • Good: More, slower cores are cheaper than fewer, faster ones • Bad: Breaks the hardware/software contract
Why Parallelism? Why now? • Hardware/Software Contract • Maintain backward compatibility with existing code
Agenda • Introduction/Motivation • Why Parallelism? Why now? • Survey of Parallel Hardware • CPUs vs. GPUs • Conclusion • How Can I Start?
Personal Mobile Device Space • iPhone 5: 2 CPU cores / 3 GPU cores • Galaxy S3: 4 CPU cores / 4 GPU cores
Desktop Space • AMD Opteron 6272: 16 CPU cores • Rare to have a "single core" CPU • Clock speeds < 3.0 GHz • Power wall • Heat dissipation
Desktop Space • AMD Radeon 7970: 2048 GPU cores • General purpose • Power efficient • High performance • Not all problems can be done on the GPU
Warehouse Space (HokieSpeed) • Each node: • 2x Intel Xeon 5645 (6 cores each) • 2x NVIDIA C2050 (448 GPU cores each) • 209 nodes • 2,508 CPU cores • 187,264 GPU cores
Convergence in Computing • Three Classes: • Warehouse • Desktop • Personal Mobile Device • Main Criteria • Power, Performance, Programmability
Agenda • Introduction/Motivation • Why Parallelism? Why now? • Survey of Parallel Hardware • CPUs vs. GPUs • Conclusion • How Can I Start?
What is a CPU? • CPU ≈ SR-71 jet • Capacity: 2 passengers • Top speed: 2,200 mph
What is a GPU? • GPU ≈ Boeing 747 • Capacity: 605 passengers • Top speed: 570 mph
CPU Architecture • Latency Oriented (Speculation)
APU = CPU + GPU • Accelerated Processing Unit • Both CPU + GPU on the same die
CPUs, GPUs, APUs • How to handle parallelism? • How to extract performance? • Can I just throw processors at a problem?
CPUs, GPUs, APUs • CPUs: multi-threading (2-16 threads) • GPUs: massive multi-threading (100,000+) • The right choice depends on your problem
Agenda • Introduction/Motivation • Why Parallelism? Why now? • Survey of Parallel Hardware • CPUs vs. GPUs • Conclusion • How Can I Start?
How Can I Start? • CUDA programming • You most likely have a CUDA-enabled GPU if you have a recent NVIDIA card
How Can I Start? • CPU or GPU programming • Use OpenCL (your laptop can most likely run it)
How Can I start? • Undergraduate research • Senior/Grad Courses: • CS 4234 – Parallel Computation • CS 5510 – Multiprocessor Programming • ECE 4504/5504 – Computer Architecture • CS 5984 – Advanced Computer Graphics
In Summary … • Parallelism is here to stay • How does this affect you? • How fast is fast enough? • Are we content with current computer performance?
Thank you! • Carlo del Mundo, Senior, Computer Engineering • Website: http://filebox.vt.edu/users/cdel/ • E-mail: cdel@vt.edu
Programming Models • pthreads • MPI • CUDA • OpenCL
pthreads • A UNIX (POSIX) API to create, manage, and destroy threads
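A minimal sketch of the pthreads create/join pattern; the work function, thread count, and printed message are illustrative, not from the slides:

```c
#include <pthread.h>
#include <stdio.h>

#define NUM_THREADS 4              /* illustrative thread count */

/* Each spawned thread runs this function; arg carries its ID. */
static void *work(void *arg) {
    long id = (long)arg;
    printf("Hello from thread %ld\n", id);
    return NULL;
}

int main(void) {
    pthread_t threads[NUM_THREADS];
    for (long i = 0; i < NUM_THREADS; i++)
        pthread_create(&threads[i], NULL, work, (void *)i);
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(threads[i], NULL);  /* wait for each thread to finish */
    return 0;
}
```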
MPI • A message-passing standard • "Send and receive" messages between nodes
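A minimal send/receive sketch (run with at least two ranks, e.g. mpirun -np 2; the payload value is illustrative):

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, value = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;  /* illustrative payload */
        MPI_Send(&value, 1, MPI_INT, /*dest=*/1, /*tag=*/0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, /*source=*/0, /*tag=*/0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Rank 1 received %d from rank 0\n", value);
    }

    MPI_Finalize();
    return 0;
}
```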
CUDA • Massive multi-threading (100,000+) • Thread-level parallelism
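The core CUDA idiom behind that massive multi-threading: launch one lightweight thread per data element and compute a global index. A minimal sketch (the kernel name, block size, and scaling operation are illustrative):

```c
/* CUDA C: each of potentially 100,000+ threads handles one element. */
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  /* global thread index */
    if (i < n)                 /* guard: the grid may be larger than n */
        data[i] *= factor;
}

/* Host-side launch: enough 256-thread blocks to cover n elements.
   scale<<<(n + 255) / 256, 256>>>(d_data, 2.0f, n);               */
```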
OpenCL • A heterogeneous programming model that targets several device types (CPUs, GPUs, APUs)
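The same one-thread-per-element idea written as an OpenCL C kernel; the lengthy host-side setup (platform, context, queue, buffers via clCreateBuffer, launch via clEnqueueNDRangeKernel) is omitted, and the kernel name and operation are illustrative:

```c
/* OpenCL C kernel: the same code can run on CPUs, GPUs, and APUs. */
__kernel void scale(__global float *data, float factor, int n) {
    int i = get_global_id(0);   /* global work-item index */
    if (i < n)
        data[i] *= factor;
}
```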
Comparisons • [Table comparing pthreads, MPI, CUDA, and OpenCL]† • † Productivity is subjective and draws from my experiences
Parallel Applications • Vector Add • Matrix Multiplication
Vector Add • Serial: loop N times → N cycles† • Parallel: assume you have N cores → 1 cycle† • † Assume 1 add = 1 cycle
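A sketch of both versions, using CUDA for the parallel one (function names are illustrative; the cycle counts hold only under the slide's 1-add-per-cycle idealization):

```c
/* Serial: one core walks all N elements -> ~N add cycles. */
void vec_add_serial(const float *a, const float *b, float *c, int n) {
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* Parallel (CUDA): one thread per element. With N cores available,
   all N adds can happen at once -> ~1 cycle under the slide's model. */
__global__ void vec_add(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  /* global thread index */
    if (i < n)                                      /* grid may overshoot n */
        c[i] = a[i] + b[i];
}
```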