
Multi-core Software Development with examples in C++

By Jon Nosacek. Multi-core Software Development with examples in C++. Why should you care? Multi-core systems are becoming the standard for all devices. Less heat: ideally, 2 cores at half frequency do the work of 1 full-speed core using ¼ of the power (P = C × V² × F).


Presentation Transcript


  1. By Jon Nosacek Multi-core Software Development with examples in C++

  2. Why should you care? • Multi-core systems are becoming the standard for all devices • Less heat • Ideally, 2 cores at half frequency do the work of 1 core at full frequency using ¼ of the power! • (P = C × V² × F, assuming the voltage can be lowered along with the frequency) • Designing a new system around a multi-core architecture can be quite difficult.
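The ¼ figure can be checked against the slide's formula. A short derivation, assuming (idealistically) that halving the frequency also allows the supply voltage to be halved, and that the work splits evenly across both cores:

```latex
\begin{aligned}
P_{1\,\text{core}} &= C V^{2} F \\
P_{\text{half-speed core}} &= C \left(\tfrac{V}{2}\right)^{2} \left(\tfrac{F}{2}\right) = \tfrac{1}{8}\, C V^{2} F \\
P_{2\,\text{cores}} &= 2 \cdot \tfrac{1}{8}\, C V^{2} F = \tfrac{1}{4}\, P_{1\,\text{core}}
\end{aligned}
```

If the voltage cannot be lowered, the two half-speed cores draw 2 × C × V² × (F/2) = C × V² × F, the same as one full-speed core; the savings come from the V² term.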

  3. Why should you care? (cont.) • Technology isn't evolving like it was before: single-core clock speeds have largely stopped climbing • Performance gains are no longer automatic • We want fast! • Our users deserve the same

  4. Multi-threaded vs. Multi-core • Same basic principle, but can yield very different results • Multi-threaded design assumes no knowledge of the release environment and can make the program slower on a single-core platform • Multi-core design means specifically targeting a platform that you know has two or more cores. Done correctly, it can yield significant performance gains

  5. Hardware • To understand how the software works, you must first understand how the hardware works • Very much a hardware-oriented evolution (Hardware could not keep up with our increasing demands)

  6. Why transition to multi-core? • Higher processor frequencies required better cooling • There is a limit set by materials and manufacturing methods • Computers are replacing us • The brain is not sequential

  7. Why multi-core? (cont.) • [Figure: "Traditional" vs. "Multi-core"]

  8. [Images] • Intel Core 2 Extreme Quad (http://www.techspot.com/articles-info/23/images/img2.jpg) • Intel Core i7 965 quad core, 8 threads (http://tinyurl.com/3tgfygn)

  9. Terminology • Thread • Smallest unit of execution that a program can be broken down into and the OS can schedule • Carries its own execution context (stack, registers, program counter) needed to run • Atomic statement • A single, indivisible operation by the processor; execution can't be time-sliced away in the middle of it

  10. Terminology (cont.) • Hyper-Threading (SMT, simultaneous multithreading) • Intel's approach of running 2 threads per core, so each core appears as two logical processors and less of the CPU sits idle • Logical (virtual) processors are not tied one-to-one to physical cores • Example of hardware helping software

  11. How to design a multi-core system • Planning • Implementation • Testing • Deployment • Maintenance

  12. Planning • A “code-and-fix”, laissez-faire mentality WILL NOT WORK • Too many things can go wrong, and problems are hard to pinpoint after the fact • Single most important step • Problems here will cascade into other steps and become worse • A clear vision is a must • How deep into threading do you want to go?

  13. Planning (cont.) • Opportunity comes during the decomposition phase • Need to model • the state of the threads and which combinations affect each other • Thread interaction • Number of threads • More threads => more problems • Balance performance with understandability, maintainability, and time • Fairness and priority • More threads => more communication

  14. Planning (cont.) • Error handling is more important • Who handles the errors? Other threads might take a while to respond, and what if every thread responds? • Synchronization and semaphores should be used sparingly • Threads should be as independent as possible • Need to make rules on memory access • Dataflow diagrams!

  15. Concurrent vs. Parallel Design • Which do you think is better? [Image: http://blog.rednael.com/content/binary/parallel%20vs%20concurrent.jpg]

  16. Concurrent Parallel • Easy to design and implement • Works well for IO • Minimal interaction to plan and synchronize • Less CPU waste • Even more difficult to track • CPU has to keep track and time slice more (swap time)

  17. Implementation • Languages are becoming more and more open to multi-core programming • There are libraries for C++ that help ease the workload • A lot of threading is OS-specific, and Microsoft knows its own better than anyone • Support usually comes to Linux and Windows first, then Macs • Watch for CPU-specific instructions that can improve performance

  18. Implementation (cont.) • Make sure resources are being managed • Update the models as the system changes • The IDE you choose during this phase can be very important and affects what you see your system doing • Existing libraries usually reduce the workload and are often more efficient • Make sure all basic/shared initializations are done before the threads are created

  19. Implementation (cont.) • Watch for evolving trends • If a lot of communication is going on between two threads, see whether their work can be merged or swapped • See which threads take up the most resources and what will increase program responsiveness • Keep the future in mind • Core counts will keep increasing • Think about the simplest case and expand into the complex • Also realize that more features are being added to C++ to help abstract multithreading

  20. // Basic example:
    #include <iostream>
    #include <pthread.h>

    // Task executed by ThreadA
    void *task1(void *)
    {
        std::cout << "Thread A complete" << std::endl;
        return nullptr;
    }

    // Task executed by ThreadB
    void *task2(void *)
    {
        std::cout << "Thread B complete" << std::endl;
        return nullptr;
    }

    int main()
    {
        pthread_t ThreadA, ThreadB;                         // declare threads
        pthread_create(&ThreadA, nullptr, task1, nullptr);  // create threads
        pthread_create(&ThreadB, nullptr, task2, nullptr);
        pthread_join(ThreadA, nullptr);                     // wait for threads to "join up"
        pthread_join(ThreadB, nullptr);
        return 0;                                           // the two lines may print in either order
    }

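  21. // Doing little things can make a big difference too:
    // Uses Microsoft's PPL (<ppl.h>, <concurrent_vector.h>, namespace concurrency).
    // fibonacci(n) and time_call(f) (returns elapsed ms) are helpers not shown here.
    array<int, 4> a = { 24, 26, 41, 42 };
    vector<tuple<int,int>> results1;             // serial results
    concurrent_vector<tuple<int,int>> results2;  // allows concurrent push_back
    auto elapsed = time_call([&] {
        for_each(a.begin(), a.end(), [&](int n) {
            results1.push_back(make_tuple(n, fibonacci(n)));
        });
    });
    elapsed = time_call([&] {
        parallel_for_each(a.begin(), a.end(), [&](int n) {
            results2.push_back(make_tuple(n, fibonacci(n)));  // order varies between runs
        });
    });
    // A 4-core system outputs: 9250 ms (serial), 5726 ms (parallel)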

  22. Testing • Race conditions are the most prevalent bugs • Identify critical paths • Balance threads and tweak for performance • Non-determinism (from the same initial state, different runs can reach different final states)

  23. Deployment • Mostly the same • See what platforms are actually running your program and tune as necessary

  24. Maintenance • Need to keep up with the changing tech (still pretty new) • Adding new functionality will be more difficult, especially when it's very different from existing features • Much more testing needed • Going back to the original plan to see how new features fit in and what is affected is much more important

  25. Maintenance (cont.) • What about adding to an existing system? • Very difficult • Should focus on the largest time consumers (IO, disk, complex algorithms) • Applications with low coupling are the best candidates for adding parallelism

  26. Challenges • Lots of planning needed • Thorough understanding of the environment • Very hard to debug • Built-in support is hit-and-miss (language & IDE) • Security concerns (from other programs as well as your own) • A lot of life-critical embedded systems are sticking with single-core platforms

  27. What apps can help me out? • Intel’s Threading Building Blocks • OpenMP • Microsoft Visual Studio • MULTI (Green Hills) • TotalView (Rogue Wave)

  28. Intel’s Threading Building Blocks • Template Library • Algorithms, containers, mutex, atomic statements, timing, scheduling • Implements “Task Stealing” • If one core is idle, it will take a scheduled task from another to reduce CPU waste • Automatically creates the threads for you to maximize performance • Much like parallel_for • Tries to be like the STL • ease of use, generality, but more aggressive

  29. Intel’s Threading Building Blocks (cont.) • A bit more memory/cache-oriented than the STL • Intel knows its own cores and how to schedule on them • Adds many more concurrency-oriented data types (concurrent_queue, concurrent_vector, concurrent_hash_map) • Also geared for easy scalability • More atomic operations (also from knowing its own cores) • Follows a pipeline architecture like graphics

  30. OpenMP

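  31. OpenMP
    // Compile with -fopenmp (GCC/Clang) or /openmp (MSVC)
    #include <iostream>
    #include <omp.h>
    using namespace std;

    int main()
    {
        int th_id, nthreads;
        #pragma omp parallel private(th_id) shared(nthreads)  // start a team of threads
        {
            th_id = omp_get_thread_num();
            #pragma omp critical  // one thread at a time prints
            {
                cout << "Hello World from thread " << th_id << '\n';
            }
            #pragma omp barrier  // wait until every thread has printed
            #pragma omp master   // only the master thread runs this block
            {
                nthreads = omp_get_num_threads();
                cout << "There are " << nthreads << " threads" << '\n';
            }
        }
        return 0;
    }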

  32. Microsoft Visual Studio • Thread View

  33. Microsoft Visual Studio (cont.)

  34. MULTI IDE – Green Hills • Cool debugging/recording features http://www.ghs.com/products/MULTI_IDE.html

  35. TotalView (Rogue Wave) • Thread viewer:

  36. Sources: • Buttari, Alfredo, Jack Dongarra, Jakub Kurzak, et al. The Impact of Multicore on Math Software. • Hughes, Cameron, and Tracey Hughes. Professional Multicore Programming: Design and Implementation for C++ Developers. Indianapolis, IN: Wiley Pub., 2008. • http://msdn.microsoft.com/en-us/concurrency/default.aspx • http://channel9.msdn.com/search?term=concurrency • http://www.cs.kent.edu/~farrell/amc09/lectures/

  37. Any Questions? • This all sounds like a lot of work. Why should we bother when something easier might come along? • It’s very much a game of figuring out which efforts get the largest returns. • True progress will take both EEs and SEs (and CSs too, if any showed up today) • It might be a long time before we see change
