
Lecture 1 (Advanced Object Oriented Programming) GPGPU-Programming

Presentation Transcript


  1. Lecture 1 (Advanced Object Oriented Programming): GPGPU-Programming
  Arne Kutzner, Hanyang University / Seoul, Korea

  2. Contact
  • Contact data:
  • E-Mail: kutzner@hanyang.ac.kr
  • Phone: 2220 2397
  • Office: Room 77-714
  • In emergency cases: 010 3938 1997

  3. Prof. Dr. Arne Kutzner / Weekly Schedule 2010.2
  • Alg. Analysis: 13:00 - 14:30 (H77 - 203)
  • C++: 14:00 - 16:00 (H77 - 703)
  • Master Students: 14:00 - 17:00
  • C++: 16:00 - 18:00 (H77 - 703)
  • Alg. Analysis: 16:30 - 18:00 (H77 - 302)

  4. Goals
  • Study some selected problems and their solutions using GPGPU. Examples:
  • Multimedia content (e.g. video encoding, decoding)
  • Simulations (cellular automata)
  • Sorting, merging
  • Combinatorial problems
  • Scheduling problems
  • Brute-force attacks (cryptography)
  • Starting points:
  • Examples contained in the NVIDIA or ATI SDK
  • NVIDIA examples: http://developer.nvidia.com/object/cuda_sdk_samples.html
  • Papers from conferences with a focus on parallel computing
  • List of conferences on parallel computing: http://www.google.com.bz/Top/Computers/Parallel_Computing/Conferences/

  5. Structure of the Course
  • Seminar-like style
  • Each participant selects a topic and prepares a presentation about it
  • In parallel, we will discuss the ongoing research work in Prof. Kutzner's workgroup (parallel reductions)
  • Prof. Kutzner will report about his work in the area of merging and sorting using GPGPU
  • Participants can decide whether they want to report on more technical aspects or rather on algorithmic aspects in general
  • Grading: presentation, summary, participation, attendance

  6. Introduction to GPGPU

  7. Why parallel CPU/GPU architectures?
  • There are limits regarding the clock frequency of a single CPU
  • The higher the frequency, the higher the power consumption (and heat dissipation)
  • Practical limit nowadays ≈ 3 GHz
  • The energy efficiency of IT infrastructures (IT gadgets) becomes more and more important
  • Cost aspect (companies), battery lifespan (mobile phones etc.)
  • Increasing demand for computational performance, e.g. among hardcore gamers
  • Solution: parallel architectures

  8. Development in the field of parallel CPU/GPU architectures
  • Yesterday: single-core CPUs (general-purpose processing units); GPUs with OpenGL (architectures optimized for many parallel floating-point operations)
  • Today: multi-core CPUs (2, 4, 8 cores); GPUs with CUDA / OpenCL (general-purpose programming on GPUs)
  • Tomorrow?: multi-core CPUs with 64 or more cores, where some cores are "simplified"; many-core GPUs, where some cores behave like a CPU; flexible parallel architectures

  9. Modern GPU Architecture
  [Block diagram: the host feeds an input assembler and a thread execution manager (the "master control unit"); multiprocessors containing scalar processors (SP), parallel data caches and texture units are connected through load/store units to the global memory.]

  10. Programmer's view of GPGPU
  [Diagram: several thread blocks, each with its own shared memory; every thread within a block has its own registers; all blocks access the global memory on the graphics card.]

  11. GPGPU Basics (1)
  • All threads execute the same code (kernel)
  • Code on the GPU side is called a kernel
  • Threads have a unique thread id number
  • Blocks have a unique block id number
  • Using the thread id and the block id we can compute a unique global id for every thread (see the sketch below)
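  A minimal CUDA sketch of this id computation; the kernel name, the scaling task and the block size of 256 are illustrative assumptions, not taken from the slides:

  __global__ void scaleKernel(float *data, int n, float factor)
  {
      // blockIdx.x  = unique block id, blockDim.x = threads per block,
      // threadIdx.x = unique thread id within its block
      int globalId = blockIdx.x * blockDim.x + threadIdx.x;
      if (globalId < n)              // guard threads that fall outside the data
          data[globalId] *= factor;
  }

  // Host-side launch: 256 threads per block, enough blocks to cover n elements
  // scaleKernel<<<(n + 255) / 256, 256>>>(d_data, n, 2.0f);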

  12. Multiprocessors (cores) and thread blocks
  • The model allows some form of automatic scalability
  • If we have more cores, we don't have to change the code (a sketch follows below)
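  One common way to make this scalability explicit is a grid-stride loop; the kernel below is an assumption for illustration, not part of the original slide. The same code works for any number of blocks, so a GPU with more multiprocessors simply runs more blocks concurrently:

  __global__ void addOne(int *data, int n)
  {
      int stride = gridDim.x * blockDim.x;            // total number of threads in the grid
      for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
          data[i] += 1;                               // each thread handles several elements
  }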

  13. GPGPU Basics (2) – Memory Hierarchy
  • Access (read/write) to global memory is expensive
  • Access to shared memory is efficient, but can create access conflicts and requires synchronization
  • Access to registers is most efficient because it is free of conflicts (see the sketch below)
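  A sketch illustrating the three levels (the kernel, its name and the fixed block size of 256 are assumptions): each thread stages one element from slow global memory into shared memory, synchronizes, and then works on a register.

  __global__ void reverseBlock(float *data)
  {
      __shared__ float tile[256];        // shared memory: fast, but needs synchronization
      int i = blockIdx.x * blockDim.x + threadIdx.x;

      tile[threadIdx.x] = data[i];       // one (expensive) global memory read per thread
      __syncthreads();                   // make all shared memory writes visible to the block

      float value = tile[blockDim.x - 1 - threadIdx.x];   // 'value' lives in a register
      data[i] = value;                   // one global memory write per thread
  }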

  14. GPGPU Basics (3) – Memory Access
  • Modern cards try to "bundle" accesses to the global memory
  • This technique is called coalescing
  • The capabilities of the different cards vary; see the NVIDIA documentation (and the sketch below)
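  A sketch of the two access patterns involved (kernel names are assumptions): in the first kernel neighbouring threads of a warp read neighbouring addresses, so the hardware can coalesce them into few transactions; in the second kernel the stride breaks this bundling.

  __global__ void copyCoalesced(float *dst, const float *src, int n)
  {
      int i = blockIdx.x * blockDim.x + threadIdx.x;
      if (i < n) dst[i] = src[i];        // neighbouring threads touch neighbouring addresses
  }

  __global__ void copyStrided(float *dst, const float *src, int n, int stride)
  {
      int i = (blockIdx.x * blockDim.x + threadIdx.x) * stride;
      if (i < n) dst[i] = src[i];        // large gaps between the addresses of one warp
  }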

  15. GPGPU and programmers / algorithm engineering
  • The special architecture of modern GPUs requires a special way of thinking on the programmer's side
  • Ideas that work great in a single-threaded world can turn into a great mess in the context of GPGPU
  • Example: algorithms that contain some form of implicit serialization (see the sketch below)
  • Solving a problem in a partially redundant way can become advantageous in the GPGPU world
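  To illustrate the point about implicit serialization, here is a minimal in-block summation sketch (an assumption for illustration, not Prof. Kutzner's reduction work): a serial sum loop forces one addition after the other, whereas the tree-shaped reduction below needs only log2(blockDim.x) parallel steps.

  __global__ void blockSum(const float *in, float *out)
  {
      __shared__ float buf[256];                      // assumes blockDim.x == 256
      int tid = threadIdx.x;
      buf[tid] = in[blockIdx.x * blockDim.x + tid];
      __syncthreads();

      // Tree-shaped reduction: halve the number of active threads in every step
      for (int s = blockDim.x / 2; s > 0; s >>= 1) {
          if (tid < s) buf[tid] += buf[tid + s];
          __syncthreads();
      }
      if (tid == 0) out[blockIdx.x] = buf[0];         // one partial sum per block
  }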
