This guide explores the significance of multi-core software development, emphasizing why it matters in today's technology landscape. With multi-core systems becoming the standard, optimizing your software can yield substantial performance gains. We delve into the differences between multi-threaded and multi-core architectures, offering practical insights on designing, implementing, and maintaining applications. Learn about effective planning, error handling, and communication strategies to maximize your system's efficiency. Utilize libraries and tools available for C++ to streamline your development process.
Multi-core Software Development with examples in C++ • By Jon Nosacek
Why should you care? • Multi-core systems are becoming the standard for all devices • Less heat • One core at full speed can be matched by two cores at half the frequency using roughly ¼ the power (P = C × V² × F, and a lower frequency allows a lower voltage; worked through below) • Designing a new system around a multi-core architecture can be quite difficult.
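To see where the ¼ figure comes from, here is a rough back-of-the-envelope calculation. It assumes voltage can be scaled down in proportion to frequency, which real chips only approximate:

P_{\text{one core}} = C V^2 F

P_{\text{per half-speed core}} = C \left(\tfrac{V}{2}\right)^2 \tfrac{F}{2} = \tfrac{1}{8} C V^2 F

P_{\text{two half-speed cores}} = 2 \cdot \tfrac{1}{8} C V^2 F = \tfrac{1}{4} P_{\text{one core}}

So two half-speed cores can deliver comparable aggregate throughput at about a quarter of the dynamic power.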
Why should you care? (cont.) • Processor technology isn’t evolving the way it used to • Performance gains are no longer automatic • We want fast! • Our users deserve the same
Multi-threaded VS Multi-core • Same basic principle, but can yield very different results • Multi-threaded assumes no knowledge of the release environment and can make the program slower on a single-core platform • Multi-core means specifically designing your system for a platform that you know has two or more cores. Can yield significant performance boosts if done correctly
Hardware • To understand how the software works, you must first understand how the hardware works • The move to multi-core was very much a hardware-driven evolution (hardware could not keep up with our increasing demands on a single core)
Why transition to multi-core? • Higher processor frequencies demanded better cooling • There is a limit set by materials and cooling methods • Computers are taking over more of the work we do ourselves • The human brain is not sequential; it works in parallel
Why multi-core? (cont.) • [Diagram: traditional single-core architecture vs. multi-core architecture]
Intel Core 2 Extreme Quad: http://www.techspot.com/articles-info/23/images/img2.jpg • Intel Core i7 965 quad core (8 threads): http://tinyurl.com/3tgfygn
Terminology • Thread • The smallest unit of execution that a program can be broken down into • Contains all the information it needs in order to run • Atomic statement • A single operation by the processor; it cannot be interrupted partway through execution
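To illustrate what "atomic" means in practice, here is a minimal sketch using C++11's std::atomic (an added example, not from the slide; the variable names are made up). A plain counter++ compiles to a load, an add, and a store that another thread can interleave with, while fetch_add on an atomic counter is one indivisible operation.

#include <atomic>
#include <iostream>

int plain_counter = 0;              // "plain_counter++" is three steps: load, add, store
std::atomic<int> atomic_counter{0}; // fetch_add is one indivisible (atomic) operation

int main()
{
    plain_counter++;                // could be interleaved if another thread ran this too
    atomic_counter.fetch_add(1);    // cannot be sliced apart mid-execution
    std::cout << plain_counter << " " << atomic_counter.load() << std::endl;
    return 0;
}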
Terminology (cont.) • Hyper-threading (SMT) • Intel’s approach of running two threads per core to simulate extra cores and reduce CPU waste • Virtual (logical) processors are not necessarily tied to physical ones • An example of hardware helping software
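One way to see hyper-threading from software is this small C++11 sketch (an addition, not from the slide): std::thread::hardware_concurrency() reports the number of logical processors the OS exposes, which on an SMT chip is typically twice the physical core count, such as the 8 threads on the quad-core i7 965 pictured earlier.

#include <iostream>
#include <thread>

int main()
{
    // Logical processors visible to the program; with hyper-threading this is
    // usually 2x the physical core count. May return 0 if it cannot be determined.
    unsigned int logical = std::thread::hardware_concurrency();
    std::cout << "Logical processors: " << logical << std::endl;
    return 0;
}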
How to design a multi-core system • Planning • Implementation • Testing • Deployment • Maintenance
Planning • A “code-and-fix”, laissez-faire mentality WILL NOT WORK • Too many things can go wrong, and problems are hard to pinpoint after the fact • Planning is the single most important step • Problems here cascade into later steps and get worse • A clear vision is a must • How deep into threading do you want to go?
Planning (cont.) • The opportunity comes during the decomposition phase • You need to model • The state of the threads and which combinations affect each other • Thread interaction • The number of threads • More threads => more problems • Balance performance against understandability, maintainability, and time • Fairness and priority • More threads => more communication
Planning (cont.) • Error handling becomes more important • Who handles an error? Other threads may take a while to respond, and what happens if they all respond? • Synchronization and semaphores should be used sparingly • Threads should be as independent as possible • You need to set rules for memory access (see the sketch below) • Dataflow diagrams!
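A minimal sketch of one such memory-access rule, using C++11 primitives (the names here are illustrative, not from the slides): every touch of the shared structure goes through a single mutex, and the lock is held only for the shortest possible critical section so threads stay independent the rest of the time.

#include <mutex>
#include <thread>
#include <vector>

std::vector<int> shared_results;  // shared between threads
std::mutex results_mutex;         // the rule: lock this before touching shared_results

void record_result(int value)
{
    std::lock_guard<std::mutex> lock(results_mutex); // held only for the push_back
    shared_results.push_back(value);
}   // lock released here

int main()
{
    std::thread a(record_result, 1), b(record_result, 2);
    a.join();
    b.join();
    return 0;
}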
Concurrent vs. Parallel Design • Which do you think is better? http://blog.rednael.com/content/binary/parallel%20vs%20concurrent.jpg
Concurrent • Easy to design and implement • Works well for IO • But the CPU has to keep track of threads and time-slice more (swap time) • Parallel • Minimal interaction to plan and synchronize • Less CPU waste • But even more difficult to track
Implementation • Languages are becoming more and more open to multi-core programming • There are libraries for C++ that help ease the workload • A lot of threading is tied to the OS, and Microsoft knows theirs better than anyone • Support usually arrives for Linux and Windows first, then Macs • Watch for CPU-specific instructions that can improve performance
Implementation (cont.) • Make sure resources are being managed • Update the models as the system changes • The IDE you choose during this phase matters a lot and affects what you can see your system doing • Using existing libraries usually reduces the workload, and they are often more efficient • Make sure all basic/shared initialization is done before the threads are created
Implementation (cont.) • Watch for evolving trends • If a lot of communication is going on between two threads, see if things can be merged/swapped • See which threads take up the most resources and what will increase program responsiveness • Keep the future in mind • More cores will always be added. • Think about the simplest case and expand into the complex • Also realize that more features are being added to C++ to help abstract multithreading
// Basic example: two POSIX threads doing independent work
#include <iostream>
#include <pthread.h>
using namespace std;

void *task1(void *X) // task executed by ThreadA
{
    cout << "Thread A complete" << endl;
    return NULL;
}

void *task2(void *X) // task executed by ThreadB
{
    cout << "Thread B complete" << endl;
    return NULL;
}

int main(int argc, char *argv[])
{
    pthread_t ThreadA, ThreadB;                   // declare threads
    pthread_create(&ThreadA, NULL, task1, NULL);  // create threads
    pthread_create(&ThreadB, NULL, task2, NULL);
    pthread_join(ThreadA, NULL);                  // wait for threads to “join up”
    pthread_join(ThreadB, NULL);
    return 0;
}
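For comparison, and as a nod to the newer C++ features mentioned above that abstract multithreading, here is the same pair of tasks written with C++11's std::thread (an added sketch, not from the original slides), which hides the pthread bookkeeping:

#include <iostream>
#include <thread>

void task1() { std::cout << "Thread A complete" << std::endl; }
void task2() { std::cout << "Thread B complete" << std::endl; }

int main()
{
    std::thread threadA(task1);  // create and start the threads
    std::thread threadB(task2);
    threadA.join();              // wait for both to finish
    threadB.join();
    return 0;
}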
// Doing little things can make a big difference too.
// Fragment using Microsoft's Parallel Patterns Library (<ppl.h>, <concurrent_vector.h>,
// namespace concurrency); fibonacci(), time_call(), and elapsed are helpers defined
// elsewhere in the surrounding example.
array<int, 4> a = { 24, 26, 41, 42 };
vector<tuple<int, int>> results1;             // plain vector for the sequential run
concurrent_vector<tuple<int, int>> results2;  // safe for concurrent push_back

// Sequential: one Fibonacci number at a time
elapsed = time_call([&] {
    for_each(a.begin(), a.end(), [&](int n) {
        results1.push_back(make_tuple(n, fibonacci(n)));
    });
});

// Parallel: the same work spread across the available cores
elapsed = time_call([&] {
    parallel_for_each(a.begin(), a.end(), [&](int n) {
        results2.push_back(make_tuple(n, fibonacci(n)));
    });
});

// A 4-core system outputs: 9250 ms, 5726 ms
Testing • Race conditions are the most prevalent problem • Identify the critical paths • Balance threads and tweak for performance • Non-determinism: from the same initial state, different runs can produce different final states
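A minimal sketch of the kind of race condition and non-determinism meant here (illustrative code, not from the slides): two threads increment an unprotected counter, and because counter++ is not atomic, runs starting from the same initial state can print different final values.

#include <iostream>
#include <thread>

int counter = 0;  // shared and unprotected: this is the race

void hammer()
{
    for (int i = 0; i < 100000; ++i)
        counter++;  // load/add/store can interleave with the other thread
}

int main()
{
    std::thread a(hammer), b(hammer);
    a.join();
    b.join();
    std::cout << counter << std::endl;  // often less than 200000, and varies per run
    return 0;
}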
Deployment • Mostly the same as for single-core software • See which platforms are actually running your program and tune as necessary
Maintenance • You need to keep up with the changing technology (it’s still pretty new) • Adding new functionality is more difficult, especially when it is very different from what already exists • Much more testing is needed • Going back to the original plan to see how new features fit in and what is affected becomes much more important
Maintenance (cont.) • What about adding parallelism to an existing system? • Very difficult • Focus on the largest time consumers (IO, disk, complex algorithms) • Applications with low coupling are the best candidates for adding parallel aspects
Challenges • Lots of planning needed • Requires a thorough understanding of the environment • Very hard to debug • Built-in support is hit-and-miss (language & IDE) • Security concerns (from other programs as well as your own) • A lot of life-critical embedded systems are sticking with single-core platforms
What apps can help me out? • Intel’s Threading Building Blocks • OpenMP • Microsoft Visual Studio • MULTI (Green Hills) • TotalView (Rogue Wave)
Intel’s Threading Building Blocks • Template Library • Algorithms, containers, mutex, atomic statements, timing, scheduling • Implements “Task Stealing” • If one core is idle, it will take a scheduled task from another to reduce CPU waste • Automatically creates the threads for you to maximize performance • Much like parallel_for • Tries to be like the STL • ease of use, generality, but more aggressive
Intel’s Threading Building Blocks (cont.) • A bit more memory/cache oriented than the STL • Intel knows their own cores and how to schedule on them • Adds many more concurrency-oriented data types (concurrent_queue, concurrent_vector, concurrent_hash_map) • Also geared for easy scalability • More atomic operations (again from knowing their own cores) • Follows a pipeline architecture, much like graphics • See the sketch below
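A small sketch of what combining TBB's parallel_for with a concurrent_vector can look like (an added example; the header paths and computation are illustrative, so check them against the TBB version you have):

#include <tbb/parallel_for.h>
#include <tbb/concurrent_vector.h>
#include <cstddef>
#include <iostream>

int main()
{
    tbb::concurrent_vector<int> squares;  // safe for concurrent push_back

    // TBB splits the range into chunks, creates the worker threads itself,
    // and steals work onto idle cores to reduce CPU waste.
    tbb::parallel_for(std::size_t(0), std::size_t(100), [&](std::size_t i) {
        squares.push_back(static_cast<int>(i * i));
    });

    std::cout << "computed " << squares.size() << " results" << std::endl;
    return 0;
}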
OpenMP
// Each thread prints a greeting; the master thread reports the thread count.
// With GCC or Clang, compile with the -fopenmp flag.
#include <iostream>
#include <omp.h>
using namespace std;

int main()
{
    int th_id, nthreads;
    #pragma omp parallel private(th_id) shared(nthreads)
    {
        th_id = omp_get_thread_num();
        #pragma omp critical
        {
            cout << "Hello World from thread " << th_id << '\n';
        }
        #pragma omp barrier
        #pragma omp master
        {
            nthreads = omp_get_num_threads();
            cout << "There are " << nthreads << " threads" << '\n';
        }
    }
    return 0;
}
Microsoft Visual Studio • Thread View
MULTI IDE – Green Hills • Cool debugging/recording features http://www.ghs.com/products/MULTI_IDE.html
TotalView (Rogue Wave) • Thread viewer
Sources: • Buttari, Alfredo, Jack Dongarra, Jakub Kurzak, et al. The Impact of Multicore on Math Software. • Hughes, Cameron, and Tracey Hughes. Professional Multicore Programming: Design and Implementation for C++ Developers. Indianapolis, IN: Wiley Pub., 2008. • http://msdn.microsoft.com/en-us/concurrency/default.aspx • http://channel9.msdn.com/search?term=concurrency • http://www.cs.kent.edu/~farrell/amc09/lectures/
Any Questions? • This all sounds like a lot of work. Why should we bother when something easier might come along? • It’s very much a game of figuring out how much effort gets the largest returns • True progress will take both EEs and SEs (and CSs too, if any showed up today) • It might be a long time before we see change