
Parallelism in C++ using the Concurrency Runtime

Don McCrady, Principal Development Lead, Parallel Computing Platform. June 7-10. Topics: some cool new C++, parallel iteration, tricks for reducing shared state, asynchronous agents, concurrent containers. Demo: N-Bodies.


Presentation Transcript


  1. Parallelism in C++ using the Concurrency Runtime. Don McCrady, Principal Development Lead, Parallel Computing Platform. June 7-10

  2. Topics • Some cool new C++ • Parallel Iteration • Tricks for reducing shared state • Asynchronous Agents • Concurrent containers

  3. Demo: N-Bodies

  4. Scale to many cores

  5. Concurrency Runtime • Part of the C++ Runtime • No new libraries to link in • PPL: Parallel Pattern Library • Agents: Asynchronous Agents Library • Abstracts away the notion of threads • Tasks are computations that may be run in parallel • Use PPL & Agents to express your potential concurrency • Let the runtime map it to the available concurrency • Scale from 1 to 256 cores

  6. Lambdas – Cool New C++

     Hand-written functor:

         class _FT {
         public:
             _FT(int x, int& y) : _x(x), _y(y) { }
             void operator()(int z) { _y += _x - z; }
         private:
             int _x;
             int& _y;
         };

         int x = 5;
         int y = 7;
         _FT functor(x, y);
         functor(3);
         cout << y;

     Equivalent lambda:

         int x = 5;
         int y = 7;
         auto functor =
             [x, &y] (int z) {
                 y += x - z;
             };
         functor(3);
         cout << y;

  7. Lambdas – Functional Programming

     Lambdas make functional programming palatable in C++.

         #include <vector>
         #include <algorithm>
         #include <iostream>
         using namespace std;

         vector<int> v = …;
         for_each(v.begin(), v.end(), [&v] (int item) {
             cout << item << endl;
         });

  8. parallel_for

     parallel_for iterates over a range in parallel.

         #include <ppl.h>
         using namespace Concurrency;

         parallel_for(0, 1000, [] (int i) {
             work(i);
         });

  9. parallel_for

         parallel_for(0, 1000, [] (int i) {
             work(i);
         });

     • Order of iteration is indeterminate.
     • Cores may come and go.
     • Ranges may be stolen by newly idle cores.

     Diagram: initial split of the range across four cores. Core 1: work(0…249); Core 2: work(250…499); Core 3: work(500…749); Core 4: work(750…999).
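The four-core picture on slide 9 corresponds to an even initial split of the range [0, 1000). The sketch below shows one way such a split could be computed in standard C++. `split_range` is a hypothetical helper written for illustration, not a PPL function, and the runtime is free to re-divide these ranges later when idle cores steal work.

```cpp
#include <utility>
#include <vector>

// Hypothetical helper (not a PPL API): divide [first, last) into `parts`
// contiguous half-open chunks whose sizes differ by at most one.
std::vector<std::pair<int, int>> split_range(int first, int last, int parts) {
    std::vector<std::pair<int, int>> chunks;
    if (parts <= 0 || last <= first) return chunks;
    const int total = last - first;
    const int base = total / parts;    // minimum chunk size
    const int extra = total % parts;   // the first `extra` chunks get one more
    int begin = first;
    for (int p = 0; p < parts; ++p) {
        const int size = base + (p < extra ? 1 : 0);
        chunks.emplace_back(begin, begin + size);
        begin += size;
    }
    return chunks;
}
```

Calling `split_range(0, 1000, 4)` yields the half-open chunks [0, 250), [250, 500), [500, 750) and [750, 1000), matching the per-core ranges in the diagram.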

  10. parallel_for • parallel_for considerations: • Designed for unbalanced loop bodies • An idle core can steal a portion of another core’s range of work • Supports cancellation • Early exit in search scenarios • For fixed-sized loop bodies that don’t need cancellation, consider parallel_for_fixed from the sample pack.

  11. parallel_for: Tips

      • Parallelize outer loops first.
      • Usually plenty of outer loop iterations to spread out to all cores.
      • Inner loops do sufficient work to overcome parallel overheads.

          parallel_for(0, yBound, [] (int y) {
              for (int x = 0; x < xBound; ++x) {
                  complex c(minReal + deltaReal * x,
                            minImag + deltaImag * y);
                  Color pixel = ComputeMandelBrotColor(c);
                  …
              }
          });

  12. parallel_for_each

      parallel_for_each iterates over an STL container in parallel.

          #include <ppl.h>
          using namespace Concurrency;

          vector<int> v = …;
          parallel_for_each(v.begin(), v.end(), [] (int i) {
              work(i);
          });

  13. parallel_for_each: Tips • Works best with containers that support random-access iterators: • std::vector, std::array, std::deque, Concurrency::concurrent_vector, … • Works okay, but with higher overhead on containers that support forward (or bi-di) iterators: • std::list, std::map, …
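One plausible reason for the overhead difference on slide 13 is the cost of dividing a range: finding a split point is constant-time with random-access iterators but requires a linear walk with forward or bidirectional ones. The sketch below illustrates that in standard C++. `is_random_access_v` and `midpoint_of` are hypothetical names for this example, and this is not a description of how PPL partitions work internally.

```cpp
#include <iterator>
#include <list>
#include <type_traits>
#include <vector>

// True when It is a random-access iterator (supports O(1) jumps).
template <typename It>
constexpr bool is_random_access_v = std::is_base_of<
    std::random_access_iterator_tag,
    typename std::iterator_traits<It>::iterator_category>::value;

// Hypothetical helper: find the midpoint used to split [first, last) in two.
// std::distance and std::next are O(1) for random-access iterators and
// O(n) for forward or bidirectional iterators.
template <typename It>
It midpoint_of(It first, It last) {
    return std::next(first, std::distance(first, last) / 2);
}
```

The same call works for `std::vector` and `std::list`; only the cost of reaching the midpoint differs.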

  14. Shared State

      Shared state kills scalability of parallel iteration.

          critical_section cs;
          double sum = 0;
          parallel_for(0, 1000, [&sum, &cs] (int i) {
              cs.lock();
              if (SomeCondition(i))
                  sum += SomeComputation(i);
              cs.unlock();
              SomeFurtherComputation(i);
          });

      • High contention: entire loop is serialized.
      • Cache thrashing.
      • Potential thread explosion.

  15. Shared State

      Reduce contention if possible.

          critical_section cs;
          double sum = 0;
          parallel_for(0, 1000, [&sum, &cs] (int i) {
              if (SomeCondition(i)) {
                  cs.lock();
                  sum += SomeComputation(i);
                  cs.unlock();
              }
              SomeFurtherComputation(i);
          });

      • Contention potentially reduced by moving lock inside the if-statement.
      • Still thrashes the cache.

  16. Shared State

      Use combinable for per-thread computations. Each thread has its own state; no shared state. Operations must be commutative (and associative), because partial results are combined in no particular order.

          combinable<double> sums;
          parallel_for(0, 1000, [&sums] (int i) {
              if (SomeCondition(i))
                  sums.local() += SomeComputation(i);
              SomeFurtherComputation(i);
          });
          double sum = sums.combine(std::plus<double>());

      • Practically zero contention.
      • No cache thrashing.
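The per-thread accumulation idea on slide 16 can be sketched in portable C++ with `std::thread`. This is an illustration of the technique only, not the ConcRT `combinable<T>` implementation. The function name `sum_if_parallel`, the strided work assignment, and the worker count are all assumptions made for this example.

```cpp
#include <functional>
#include <numeric>
#include <thread>
#include <vector>

// Portable sketch of per-thread accumulation: each worker sums into a
// private local variable and publishes it once, so the loop needs no lock.
double sum_if_parallel(int first, int last, unsigned workers,
                       const std::function<bool(int)>& condition,
                       const std::function<double(int)>& computation) {
    if (workers == 0) workers = 1;
    std::vector<double> partial(workers, 0.0);   // one result slot per worker
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&, w] {
            double local = 0.0;                  // thread-private state
            // Strided assignment: worker w handles first+w, first+w+workers, ...
            for (int i = first + static_cast<int>(w); i < last;
                 i += static_cast<int>(workers)) {
                if (condition(i)) local += computation(i);
            }
            partial[w] = local;                  // single write per worker
        });
    }
    for (auto& t : pool) t.join();
    // Combine step, analogous in spirit to sums.combine(std::plus<double>()).
    return std::accumulate(partial.begin(), partial.end(), 0.0);
}
```

Because each worker writes its slot only once, at the end, the shared `partial` vector sees very little traffic; the order in which slots are added does not matter as long as the combining operation is commutative and associative.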

  17. Demo: Relatively Prime Numbers

  18. Messaging and Agents • Not all patterns map to loops or tasks. • Pipelines, state machines, producer/consumer • Agent: an asynchronous object that communicates through message passing. • Message Blocks: participants in message-passing which transport from source to target. • Message: encapsulates state that is transferred between message blocks.

  19. Asynchronous Agents Library

      • Message blocks for storing data: unbounded_buffer<T>, overwrite_buffer<T>, single_assignment<T>
      • Message blocks for pipelining: transformer<T,U>, call<T>
      • Message blocks for joining data: choice, join
      • Send and receive: send, asend, receive, try_receive
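To make the send/receive vocabulary on slide 19 concrete, here is a minimal stand-in for a FIFO storage block, written in standard C++. `SimpleBuffer` is a hypothetical class for illustration. It mimics the general behavior of a queue-like message buffer (send never blocks, receive waits, try_receive returns immediately) but it is not the Agents Library implementation and does not model linking or propagation between blocks.

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <utility>

// Illustrative FIFO message buffer (not the Agents Library implementation).
template <typename T>
class SimpleBuffer {
public:
    // Enqueue a message; never blocks on capacity.
    void send(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push_back(std::move(value));
        }
        ready_.notify_one();
    }

    // Block until a message is available, then remove and return it.
    T receive() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !queue_.empty(); });
        T value = std::move(queue_.front());
        queue_.pop_front();
        return value;
    }

    // Non-blocking variant: returns false when no message is waiting.
    bool try_receive(T& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (queue_.empty()) return false;
        out = std::move(queue_.front());
        queue_.pop_front();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<T> queue_;
};
```

Messages come out in the order they were sent, which is the property a pipeline stage relies on when it consumes from such a buffer.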

  20. Simple Agents Example

      Diagram: send "glorp" → unbounded_buffer → propagate "glorp" → transformer (reverse) → propagate "prolg" → receive
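The data flow in the slide 20 diagram can be mimicked with a small synchronous sketch. `ReversePipeline` is a hypothetical class for illustration only: the real message blocks propagate asynchronously and are linked with `link_target`, whereas this version simply applies the reversing step inside `send`.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Synchronous sketch of the buffer -> transformer -> receive flow.
// All names are illustrative; this is not the Agents Library.
class ReversePipeline {
public:
    // "send": hand the message to the transformer stage, then
    // "propagate" the transformed result to the output stage.
    void send(const std::string& message) {
        output_.push_back(transform_(message));
    }

    // "receive": take the oldest transformed message; false if none is left.
    bool try_receive(std::string& out) {
        if (next_ >= output_.size()) return false;
        out = output_[next_++];
        return true;
    }

private:
    // The transformer stage: returns a reversed copy of its input.
    std::function<std::string(const std::string&)> transform_ =
        [](const std::string& in) {
            std::string reversed(in);
            std::reverse(reversed.begin(), reversed.end());
            return reversed;
        };
    std::vector<std::string> output_;
    std::size_t next_ = 0;
};
```

Sending "glorp" and then receiving yields "prolg", as in the diagram; sending "stop" yields "pots".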

  21. Simple Agents Example: ReverserAgent

          class ReverserAgent : public Concurrency::agent
          {
          private:
              transformer<string, string> reverser;
          public:
              unbounded_buffer<string> inputBuffer;
              // Constructor name must match the class name.
              ReverserAgent()
                  : reverser([] (string in) -> string {
                        string reversed(in);
                        reverse(reversed.begin(), reversed.end());
                        return reversed;
                    })
              {
                  inputBuffer.link_target(&reverser);
              }
          protected:
              virtual void run();
          };

  22. Simple Agents Example: ReverserAgent::run

          void ReverserAgent::run() {
              for (;;) {
                  string s = receive(&reverser);
                  if (s == "pots") {    // "stop" after reversal
                      done();
                      return;
                  }
                  cout << "Received message : " << s << endl;
              }
          }

  23. Simple Agents Example: Sending messages

          int main()
          {
              ReverserAgent reverseAgent;
              reverseAgent.start();
              for (;;) {
                  string s;
                  cin >> s;
                  send(reverseAgent.inputBuffer, s);
                  if (s == "stop")
                      break;
              }
              agent::wait(&reverseAgent);
          }

  24. Demo: String Reverse Agent

  25. Concurrent Containers

      • Two thread-safe, lock-free containers provided:
        • concurrent_vector<T>: lock-free push_back, element access, and iteration. No deletion!
        • concurrent_queue<T>: lock-free push and pop.
      • Sample pack adds:
        • concurrent_unordered_map<T,U>
        • concurrent_set<T>

  26. concurrent_vector<T>

          #include <ppl.h>
          #include <concurrent_vector.h>
          using namespace Concurrency;

          concurrent_vector<int> carmVec;
          parallel_for(2, 5000000, [&carmVec] (int i) {
              if (is_carmichael(i))
                  carmVec.push_back(i);
          });

  27. concurrent_queue<T>

          #include <ppl.h>
          #include <concurrent_queue.h>
          using namespace Concurrency;

          concurrent_queue<int> itemQueue;
          parallel_invoke(
              [&itemQueue] {    // Produce 1000 items
                  for (int i = 0; i < 1000; ++i)
                      itemQueue.push(i);
              },
              [&itemQueue] {    // Consume 1000 items
                  for (int i = 0; i < 1000; ++i) {
                      int result = -1;
                      while (!itemQueue.try_pop(result))
                          Context::Yield();
                      ProcessItem(result);
                  }
              });
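For readers without the Concurrency Runtime at hand, the producer/consumer pattern on slide 27 can be approximated in standard C++. The sketch below swaps in a mutex-guarded queue, so unlike the `concurrent_queue<T>` described in the slides it is not lock-free. `LockedQueue` and `produce_and_consume` are hypothetical names, and summing the items stands in for `ProcessItem`.

```cpp
#include <mutex>
#include <queue>
#include <thread>

// Mutex-guarded stand-in exposing the same push/try_pop shape.
// This sketch is NOT lock-free.
template <typename T>
class LockedQueue {
public:
    void push(const T& value) {
        std::lock_guard<std::mutex> lock(mutex_);
        items_.push(value);
    }
    // Returns false immediately when the queue is empty.
    bool try_pop(T& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.empty()) return false;
        out = items_.front();
        items_.pop();
        return true;
    }
private:
    std::mutex mutex_;
    std::queue<T> items_;
};

// Produce `count` items on one thread and consume them on another;
// returns the sum of the consumed items.
long long produce_and_consume(int count) {
    LockedQueue<int> queue;
    long long sum = 0;   // written only by the consumer, read after join
    std::thread producer([&] {
        for (int i = 0; i < count; ++i) queue.push(i);
    });
    std::thread consumer([&] {
        for (int i = 0; i < count; ++i) {
            int result = -1;
            // Spin politely until the producer has pushed something.
            while (!queue.try_pop(result)) std::this_thread::yield();
            sum += result;
        }
    });
    producer.join();
    consumer.join();
    return sum;
}
```

The consumer's retry loop plays the same role as the `Context::Yield()` loop on the slide: it gives up the processor instead of blocking while the queue is momentarily empty.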

  28. Take-aways • The “Many Core Shift” is happening • VS2010 with the Concurrency Runtime can help • Use PPL & Agents to express your potential concurrency • Let the runtime figure out the actual concurrency • Parallel iteration can help your application scale • Asynchronous Agents provide isolation from shared state • Concurrent collections are scalable and lock-free

  29. Resources • Parallel Computing Developer Center http://msdn.com/Concurrency • ConcRT Sample Pack http://code.msdn.com/concrtextras • Native Concurrency Blog http://blogs.msdn.com/nativeconcurrency • Forums http://social.msdn.microsoft.com/Forums/en-US/category/parallelcomputing

  30. Q&A

  31. Backup: is_carmichael()

          bool is_carmichael(const int n) {
              if (n < 2) { return false; }
              int k = n;
              for (int i = 2; i <= k / i; ++i) {
                  if (k % i == 0) {
                      if ((k / i) % i == 0) { return false; }
                      if ((n - 1) % (i - 1) != 0) { return false; }
                      k /= i;
                      i = 1;
                  }
              }
              return k != n && (n - 1) % (k - 1) == 0;
          }
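The function on slide 31 appears to apply Korselt's criterion: n is composite, square-free, and p - 1 divides n - 1 for every prime factor p. As a cross-check, the sketch below tests the definition directly instead: n is composite and a^(n-1) leaves remainder 1 modulo n for every a coprime to n. The helper names (`gcd_int`, `pow_mod`, `is_prime`, `is_carmichael_by_definition`) are hypothetical, and this brute-force version is far slower than the slide's code, so it is only suitable for small n.

```cpp
#include <cstdint>

// Greatest common divisor by Euclid's algorithm.
int gcd_int(int a, int b) {
    while (b != 0) {
        const int t = a % b;
        a = b;
        b = t;
    }
    return a;
}

// Modular exponentiation: (base^exp) mod m, using 64-bit intermediates.
std::uint64_t pow_mod(std::uint64_t base, std::uint64_t exp, std::uint64_t m) {
    std::uint64_t result = 1 % m;
    base %= m;
    while (exp > 0) {
        if (exp & 1) result = result * base % m;
        base = base * base % m;
        exp >>= 1;
    }
    return result;
}

// Trial-division primality test.
bool is_prime(int n) {
    if (n < 2) return false;
    for (int i = 2; i <= n / i; ++i)
        if (n % i == 0) return false;
    return true;
}

// Definition-based check: n is composite and a^(n-1) mod n == 1
// for every a in [2, n) that is coprime to n.
bool is_carmichael_by_definition(int n) {
    if (n < 3 || is_prime(n)) return false;
    for (int a = 2; a < n; ++a) {
        if (gcd_int(a, n) == 1 &&
            pow_mod(static_cast<std::uint64_t>(a),
                    static_cast<std::uint64_t>(n - 1),
                    static_cast<std::uint64_t>(n)) != 1) {
            return false;
        }
    }
    return true;
}
```

The three smallest Carmichael numbers, 561, 1105 and 1729, pass this check, while primes such as 7 and ordinary composites such as 15 do not.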
