
Multicore, parallelism, and multithreading


Presentation Transcript


  1. Multicore, parallelism, and multithreading By: Eric Boren, Charles Noneman, and Kristen Janick

  2. Multicore Processing: Why we care

  3. What is it? • A processor with more than one core on a single chip • Core: An independent system capable of processing instructions and modifying registers and memory

  4. Motivation • Advances in component technology and circuit optimization now contribute only limited gains to single-core processor speed • Many CPU applications attempt to do multiple things at once: • Video editing • Multi-agent simulation • So, use multiple cores to get it done faster

  5. Hurdles • Instruction assignment (who does what?) • Mostly delegated to the operating system • Can be done to a small degree through dependency analysis on the chip • Cores must still communicate at times – how? • Shared-memory • Message passing

  6. Advantages • Multiple programs: • Can be separated between cores • Other programs don’t suffer when one hogs the CPU • Multi-threaded applications: • Independent threads don’t have to wait as long for each other, resulting in faster overall execution • vs. multiple processors: • Cores are closer together than separate chips, so communication is faster and the maximum clock rate is higher • Less expensive due to smaller overall chip area and shared components (caches, etc.)

  7. Disadvantages • The OS and programs must be optimized for multiple cores, or no gain will be seen • A single-threaded application sees little to no improvement • There is overhead in assigning tasks to cores • The real bottleneck is typically memory and disk access time, which is independent of the number of cores

  8. Amdahl’s Law • The potential speed-up from parallelizing a program is given by Amdahl’s law. • Large problems are made up of parallelizable parts and non-parallelizable (serial) parts. • With N processors: S = 1 / ((1 - P) + P/N) • As N grows without bound, the maximum speed-up approaches S = 1 / (1 - P) • S = speed-up of the program • P = fraction of the program that is parallelizable
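For example, if 90% of a program can be parallelized (P = 0.9), four cores give S = 1 / (0.1 + 0.9/4), or about 3.1x, and the speed-up can never exceed 1 / (1 - 0.9) = 10x no matter how many cores are added, because the serial 10% always remains.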

  9. Current State of the Art • Commercial processors: • Most have at least 2 cores • Quad-core are highly popular for desktop applications • 6-core processors have recently appeared on the market (Intel’s i7 980X) • 8-core exist but are less common • Academic and research: • MIT: RAW 16-core • Intel Polaris – 80-core • UC Davis: AsAP – 36 and 167-core, individually-clocked

  10. Parallelism

  11. What is Parallel Computing? • Form of computation in which many calculations are carried out simultaneously. • Operating on the principle that large problems can often be divided into smaller ones, which are solved concurrently.

  12. Types of Parallelism • Bit level parallelism • Increase processor word size • Instruction level parallelism • Instructions combined into groups • Data parallelism • Distribute data over different computing environments • Task parallelism • Distribute threads across different computing environments

  13. Flynn’s Taxonomy

  14. Single Instruction, Single Data (SISD) • Provides no parallelism in hardware • A single instruction stream operates on a single data stream • Instructions are executed in serial fashion

  15. Multiple Instruction, Single Data (MISD) • Processes a single data stream using multiple instruction streams simultaneously • More a theoretical model than a practical one

  16. Single Instruction, Multiple Data (SIMD) • A single instruction stream can process multiple data streams in one clock cycle • Takes the operation specified in one instruction and applies it to more than one set of data elements at a time • Well suited to graphics and image processing (see the sketch below)
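As an illustrative sketch, the C program below uses x86 SSE intrinsics so that a single add instruction operates on four floats at once; it assumes a compiler targeting x86 with SSE available (the values are arbitrary):

      #include <stdio.h>
      #include <xmmintrin.h>   /* SSE intrinsics */

      int main(void) {
          __m128 a = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);  /* pack four floats */
          __m128 b = _mm_set_ps(8.0f, 7.0f, 6.0f, 5.0f);
          __m128 c = _mm_add_ps(a, b);   /* one instruction, four additions */

          float out[4];
          _mm_storeu_ps(out, c);         /* prints 6 8 10 12 */
          printf("%f %f %f %f\n", out[0], out[1], out[2], out[3]);
          return 0;
      }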

  17. Multiple Instruction, Multiple Data (MIMD) • Different processors can execute different instructions on different pieces of data • Each processor can run independent task

  18. Automatic parallelization • The goal is to relieve programmers from the tedious and error-prone manual parallelization process. • A parallelizing compiler tries to split up a loop so that its iterations can be executed on separate processors concurrently • It identifies dependencies between memory references; independent operations can run in parallel (see the sketch below)
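A hypothetical C sketch of the distinction (the function and array names are illustrative): the first loop has independent iterations that a parallelizing compiler can distribute across processors, while the second has a loop-carried dependency and must run serially.

      void example(int n, float *a, float *b, float *c) {
          /* Independent iterations: each c[i] depends only on a[i] and b[i],
             so a parallelizing compiler can run them concurrently. */
          for (int i = 0; i < n; i++)
              c[i] = a[i] + b[i];

          /* Loop-carried dependency: iteration i reads a[i-1], which is
             written by iteration i-1, so the iterations must run in order. */
          for (int i = 1; i < n; i++)
              a[i] = a[i - 1] + b[i];
      }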

  19. Parallel Programming Languages • Concurrent programming languages, libraries, APIs, and parallel programming models have been created for programming parallel computers. • Parallel languages make it easier to write parallel algorithms • The resulting code can run more efficiently because the compiler has more information to work with • It is easier to identify data dependencies, so the runtime system can implicitly schedule independent work

  20. Multithreading techniques

  21. fork() • Makes a (nearly) exact duplicate of the calling process • Good when there is little or no need to communicate between processes • Often used for servers

  22. fork() [Diagram: the parent process (globals, heap, stack) is duplicated into several child processes, each with its own copy of the globals, heap, and stack]

  23. fork()
      #include <unistd.h>

      pid_t pID = fork();   /* returns 0 in the child, the child's PID in the parent */
      if (pID == 0) {
          // child
      } else {
          // parent
      }

  24. POSIX Threads • C library for threading • Available in Linux, OS X • Shared memory • Threads are created and destroyed manually • Provides locking mechanisms (mutexes) for shared memory

  25. POSIX Threads [Diagram: a single process with shared globals and heap; each thread has its own stack]

  26. POSIX Threads
      pthread_t thread;
      pthread_create(&thread, NULL, function_to_call, (void *) data);
      // Do stuff
      pthread_join(thread, NULL);   /* wait for the thread to finish */

  27. POSIX Threads
      int total = 0;
      void do_work() {
          // Do stuff to create “result”
          total = total + result;
      }
      One possible interleaving (assuming each thread’s result is 1):
      • Thread 1 reads total (0)
      • Thread 2 reads total (0)
      • Thread 1 adds and stores total (1)
      • Thread 2 adds and stores total (1), so Thread 1’s update is lost

  28. POSIX Threads
      int total = 0;
      pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

      void do_work() {
          // Do stuff to create “result”
          pthread_mutex_lock(&mutex);     /* only one thread may update total at a time */
          total = total + result;
          pthread_mutex_unlock(&mutex);
      }

  29. OpenMP • Library and compiler directives for multi-threading • Supported in Visual C++ and gcc • Code compiles even if the compiler doesn't support OpenMP • Popular in high-performance computing communities • Easy to add parallelism to existing code

  30. OpenMP: Initialize an Array
      const int array_size = 100000;
      int i, a[array_size];

      #pragma omp parallel for
      for (i = 0; i < array_size; i++) {
          a[i] = 2 * i;
      }
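With gcc, this pragma takes effect only when compiling with the -fopenmp flag; a compiler without OpenMP support simply ignores the #pragma line and runs the loop serially, which is why the code compiles either way, as slide 29 notes.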

  31. OpenMP: Reduction
      #pragma omp parallel for reduction(+:total)
      for (i = 0; i < array_size; i++) {
          total = total + a[i];
      }
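The reduction(+:total) clause gives each thread a private copy of total and adds the copies together when the loop finishes; without it, the concurrent updates to total would race exactly as in the POSIX threads example on slide 27.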

  32. Grand Central Dispatch • Apple technology for multi-threading • The programmer puts work into queues • A central system process determines the number of threads to give to each queue • Code is added to queues using a closure (a block) • Currently Mac-only, but open source • Easy to add parallelism to existing code

  33. Grand Central Dispatch: Initialize an Array
      dispatch_apply(array_size, dispatch_get_global_queue(0, 0), ^(size_t i) {
          a[i] = 2 * i;
      });

  34. Grand Central Dispatch: GUI Example
      void analyzeDocument(doc) {
          do_analysis(doc);    // May take a very long time
          update_display();
      }

  35. Grand Central Dispatch: GUI Example
      void analyzeDocument(doc) {
          dispatch_async(dispatch_get_global_queue(0, 0), ^{
              do_analysis(doc);    // now runs in the background, so the UI stays responsive
              update_display();
          });
      }

  36. Other Technologies • Threading in Java, Python, etc. • MPI – for clusters (a minimal sketch follows below)
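As a hedged illustration, here is a minimal MPI program in C; it assumes an MPI implementation such as Open MPI or MPICH is installed (compile with mpicc, launch with mpirun):

      #include <mpi.h>
      #include <stdio.h>

      int main(int argc, char **argv) {
          MPI_Init(&argc, &argv);

          int rank, size;
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's ID */
          MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of processes */

          printf("Hello from process %d of %d\n", rank, size);

          MPI_Finalize();
          return 0;
      }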

  37. Questions?

  38. Supplemental Reading • Introduction to Parallel Computing • https://computing.llnl.gov/tutorials/parallel_comp/#Abstract • Introduction to Multi-Core Architecture • http://www.intel.com/intelpress/samples/mcp_samplech01.pdf • CPU History: A timeline of microprocessors • http://everything2.com/title/CPU+history%253A+A+timeline+of+microprocessors
