Houston Tech Fest 2011 Scalable Concurrent C++ Using Microsoft ConcRT and AMP


Presented by David Cravey, 10/15/2011

About Me – David Cravey
  • Started programming in 4th grade
    • Learned BASIC on a V-Tech “Precomputer 1000” and then GW-BASIC, and eventually QuickBasic
    • Got bored with BASIC in 8th Grade so moved to C++
  • Software Development Manager at Vivicom
  • President of the Houston C++ User Group
    • Meets at Microsoft’s Houston Office
    • 1st Thursday of Each Month @ 7PM
  • Microsoft Visual C++ MVP
  • Why C++?
  • Concurrent Runtime
    • Tasks
    • PPL
    • Agents
    • AMP
  • Resources
  • Summary

The language of power!

Why C++


  • C++ Provides
    • Speed
      • Down to the metal performance!
    • Access to the Latest Hardware and Drivers
      • Example: GPGPU
    • Multi-paradigm Programming
      • Procedural
      • Object Oriented
      • Generic Programming
    • High Level Programming (i.e. Strong Abstractions)
      • Classes AND Templates
      • But still allows you to step down to Low Level as needed!
    • Portable Code
Modern C++:




*Used with permission from Herb Sutter’s “Writing modern C++ code: how C++ has evolved over the years” http://channel9.msdn.com/Events/BUILD/BUILD2011/TOOL-835T

Automatic Memory Management
  • Never type “delete” again!

unique_ptr< >

shared_ptr< >

weak_ptr< >
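The three smart pointers listed above can be sketched as follows. This is a minimal illustration, not code from the talk: `Widget`, `make_widget` and `observer_expires_with_last_owner` are hypothetical names, and only standard `<memory>` facilities are used.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical type used only for this illustration.
struct Widget {
    explicit Widget(std::string n) : name(std::move(n)) {}
    std::string name;
};

// unique_ptr: exactly one owner; the Widget is destroyed when the owner goes away.
inline std::unique_ptr<Widget> make_widget(const std::string& name) {
    return std::unique_ptr<Widget>(new Widget(name));
}

// shared_ptr + weak_ptr: the weak_ptr observes the object without keeping it alive.
inline bool observer_expires_with_last_owner() {
    std::weak_ptr<Widget> observer;
    {
        auto owner = std::make_shared<Widget>("shared");
        observer = owner;                      // does not add an owner
        if (observer.expired()) return false;  // still alive inside this scope
    }                                          // last shared_ptr destroyed here
    return observer.expired();                 // true: freed without typing "delete"
}
```

Calling `make_widget("gear")->name` yields `"gear"`, and the temporary `unique_ptr` releases the object at the end of the expression.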

What’s Different: At a Glance

T* shared_ptr<T>

new make_shared

  • Then
  • Now

auto type deduction

for/while/do std:: algorithms[&] lambda functions

no need for “delete”

automatic lifetime management


not exception-safe

missing try/catch, __try/__finally

circle* p = newcircle( 42 );

vector<shape*> vw= load_shapes();

for( vector<circle*>::iterator i = vw.begin(); i != vw.end(); ++i ) { if( *i && **i == *p ) cout << **i << “ is a match\n”;}

for( vector<circle*>::iterator i = vw.begin();i != vw.end(); ++i ) {delete *i;}

delete p;

auto p = make_shared<circle>( 42 );

vector<shared_ptr<shape>> vw= load_shapes();

for_each( begin(vw), end(vw), [&]( shared_ptr<circle>& s ) { if( s && *s == *p ) cout << *s << “ is a match\n”;} );

*Used with permission from Herb Sutter’s “Writing modern C++ code: how C++ has evolved over the years” http://channel9.msdn.com/Events/BUILD/BUILD2011/TOOL-835T


Because processors will keep getting more cores … but not very many more GHz!

Why Concurrency?

You can deal with problems faster if you have more threads (or “light sabers”)!!!


Why A Concurrency Runtime?
  • According to MSDN:


    • A runtime for concurrency provides uniformity and predictability to applications and application components that run simultaneously.
    • (i.e. Without a single concurrency runtime various libraries and routines will end up “competing” instead of “cooperating” for processor resources.)
Without a Concurrency Runtime

Threads will compete for system resources and the program will run slower instead of faster!!!!


With a Concurrency Runtime

Threads will cooperate to make maximum use of system resources and the program will run faster!!!!


What does ConcRT Provide?
  • Improved use of processing resources
    • Cooperative Task Scheduling
    • Cooperative Blocking
    • Work Stealing Task Queues
  • Low Level Building Blocks
    • Synchronization Primitives
    • Task Schedulers
    • Resource Managers
  • 2 High Level Libraries
    • PPL – Parallel Patterns Library
    • Agents – Asynchronous Agents Library
  • Concurrent Container and Message Passing Libraries
ConcRT Architecture Diagram

(Diagram taken from MSDN http://msdn.microsoft.com/en-us/library/ee207192.aspx)

ConcRT Tasks

MSDN - http://msdn.microsoft.com/en-us/library/dd492427.aspx

  • Basic building block for concurrency under ConcRT
  • A Task is a unit of work that performs a specific job
  • Tasks can be further broken down into more fine-grained tasks (fork and join on “child” tasks)
  • Tasks are kind of like very lightweight threads
    • Threads normally reserve 1MB of memory for their stacks.
    • Thread context switches eat processing time reducing throughput
Work Stealing
  • When a running task creates additional tasks it adds them to the bottom of the queue for the current Processor.
  • If another Processor does not have any tasks in its queue it will steal a task from the top of another Processor’s queue (the top of the queue is the least likely to still be in the other Processor’s Cache).

(Diagram: the task queues of Processor #1 and Processor #2, holding Task #1, Task #2 and Task #3)
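The queue discipline described above (the owner works at the bottom, an idle processor steals from the top) can be sketched with a toy queue. This is only an illustration under stated assumptions: `WorkQueue` and `steal_then_pop` are hypothetical names, a plain mutex stands in for whatever the real scheduler does, and nothing here is the ConcRT implementation.

```cpp
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

// Toy per-processor task queue (hypothetical; not the ConcRT scheduler).
class WorkQueue {
public:
    using Task = std::function<int()>;

    // The owning processor adds newly created tasks at the bottom.
    void push_bottom(Task t) {
        std::lock_guard<std::mutex> lock(m_);
        tasks_.push_back(std::move(t));
    }

    // The owner takes the newest task, which is the most likely to be cache-warm.
    bool pop_bottom(Task& out) {
        std::lock_guard<std::mutex> lock(m_);
        if (tasks_.empty()) return false;
        out = std::move(tasks_.back());
        tasks_.pop_back();
        return true;
    }

    // An idle processor steals the oldest task from the top.
    bool steal_top(Task& out) {
        std::lock_guard<std::mutex> lock(m_);
        if (tasks_.empty()) return false;
        out = std::move(tasks_.front());
        tasks_.pop_front();
        return true;
    }

private:
    std::mutex m_;
    std::deque<Task> tasks_;
};

// Queues tasks 1, 2, 3, then returns {id stolen from the top, id popped from the bottom}.
inline std::pair<int, int> steal_then_pop() {
    WorkQueue q;
    for (int id = 1; id <= 3; ++id) q.push_bottom([id] { return id; });
    WorkQueue::Task stolen, popped;
    q.steal_top(stolen);
    q.pop_bottom(popped);
    return std::make_pair(stolen(), popped());
}
```

With tasks 1, 2 and 3 queued in that order, the thief gets task 1 (the oldest) while the owner keeps working on task 3 (the newest).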

Synchronization Data Structures
  • Concurrency::critical_section
    • Cooperative mutual exclusion object
    • (yields to other tasks instead of preempting)
  • Concurrency::reader_writer_lock
    • Only allows a single writer
    • Allows multiple readers if no writers
  • Concurrency::critical_section::scoped_lock, Concurrency::reader_writer_lock::scoped_lock and Concurrency::reader_writer_lock::scoped_lock_read
    • RAII locking for critical_section and reader_writer_lock
  • Concurrency::event
    • Allows Tasks to signal each other that an Event has occurred
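The RAII locking idea above can be shown with a portable stand-in. The sketch below uses `std::mutex` and `std::lock_guard` from the standard library rather than the ConcRT types, so it blocks preemptively instead of yielding cooperatively; `locked_count` is a hypothetical helper.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Increments a shared counter from several threads. The lock_guard releases
// the mutex automatically at the end of each scope, even if an exception is thrown.
inline long locked_count(int threads, int increments_per_thread) {
    long counter = 0;
    std::mutex guard;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < increments_per_thread; ++i) {
                std::lock_guard<std::mutex> lock(guard);  // acquired here
                ++counter;
            }                                             // released here
        });
    }
    for (auto& w : workers) w.join();
    return counter;
}
```

Four threads doing 1000 increments each give exactly 4000, because every increment happens while the lock is held.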
Potential Concurrency
  • Potential Concurrency is the concurrency that your application could have if the computer could utilize it.
  • Tasks are lightweight so that they are “cheap” to create. This allows you to create many tasks to express the Potential Concurrency of your program.
  • In other words … expressing the Potential Concurrency of your application Future Proofs your application!
Parallel Patterns Library Overview
  • Task Parallelism
    • Tasks and Task Groups
      • Concurrency::task_group
      • Concurrency::structured_task_group
  • Parallel Algorithms
    • Concurrency::parallel_for
    • Concurrency::parallel_for_each
    • Concurrency::parallel_invoke
  • Parallel Containers and Objects
    • Concurrency::concurrent_vector<T>
    • Concurrency::concurrent_queue<T>
    • Concurrency::combinable<T>
PPL Task Groups
  • Tasks are grouped by the task group they are created within.
  • Tasks are cancelled as a group
    • This is useful for operations such as a search, where once the item being searched for is found, all tasks that are still searching should be cancelled.
    • Note that if a Task Group is cancelled while waiting on another Task Group to complete, the Task Group that is waiting will also be cancelled.
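The search scenario above can be sketched portably. This is not `Concurrency::task_group`: it assumes plain `std::thread` workers and a shared `std::atomic<bool>` as the group-wide cancellation signal, and `cancellable_find` is a hypothetical name.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Searches `data` for `target` with `workers` threads. The first thread to find
// it records the index and sets `cancelled`, so the rest of the group stops early.
// Returns the index of a match, or -1 if there is none.
inline long cancellable_find(const std::vector<int>& data, int target, unsigned workers) {
    if (workers == 0) workers = 1;
    std::atomic<bool> cancelled(false);
    std::atomic<long> found(-1);
    std::vector<std::thread> group;
    const std::size_t chunk = (data.size() + workers - 1) / workers;
    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t begin = std::min(data.size(), w * chunk);
        const std::size_t end = std::min(data.size(), begin + chunk);
        group.emplace_back([&, begin, end] {
            for (std::size_t i = begin; i < end && !cancelled.load(); ++i) {
                if (data[i] == target) {
                    found.store(static_cast<long>(i));
                    cancelled.store(true);  // cancel the whole group
                }
            }
        });
    }
    for (auto& t : group) t.join();  // wait for the group, cancelled or not
    return found.load();
}
```

Each worker checks the flag on every iteration, which is the cooperative part: nothing is forcibly stopped, the tasks simply notice the cancellation and return.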
PPL Algorithms Today
  • Concurrency::parallel_for
    • Performs parallel tasks using iteration values
    • (much like a normal for loop)
  • Concurrency::parallel_for_each
    • Performs parallel tasks for each item in an iterator range
    • (much like std::for_each)
  • Concurrency::parallel_invoke
    • Executes a set of tasks in parallel
  • PPL algorithms do not return until all the tasks within them complete or are canceled.
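To make the shape of a parallel loop concrete, here is a rough portable sketch of what a `parallel_for`-style call does. It is an assumption-laden stand-in built on `std::thread` with a fixed worker count, not the PPL implementation (which schedules lightweight tasks); `parallel_for_sketch` and `squares` are hypothetical names.

```cpp
#include <algorithm>
#include <thread>
#include <vector>

// Calls body(i) for every i in [first, last), splitting the range across
// `workers` threads, and returns only after all of them have finished.
template <typename Body>
void parallel_for_sketch(int first, int last, unsigned workers, Body body) {
    if (workers == 0) workers = 1;
    if (last <= first) return;
    const int count = static_cast<int>(workers);
    const int chunk = (last - first + count - 1) / count;
    std::vector<std::thread> pool;
    for (int begin = first; begin < last; begin += chunk) {
        const int end = std::min(last, begin + chunk);
        pool.emplace_back([=] {
            for (int i = begin; i < end; ++i) body(i);
        });
    }
    for (auto& t : pool) t.join();
}

// Usage: each iteration writes only its own element, so no locking is needed.
inline std::vector<int> squares(int n) {
    std::vector<int> out(n);
    parallel_for_sketch(0, n, 4, [&out](int i) { out[i] = i * i; });
    return out;
}
```

Like the PPL algorithms, the sketch does not return until every iteration has run, so `squares(5)` can safely return the filled vector.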
ConcRT Extras and Sample Pack
  • Microsoft has released the ConcRT Extras and Sample Pack to give early access to new enhancements to the ConcRT before the next version of VC++.
  • The ConcRT Extras and Sample Pack can be downloaded at:


  • These are Template Libraries, so you only need to include the header files.
  • Microsoft has stated that they encourage users not only to use the Libraries but also to modify them to learn more.
Upcoming PPL Algorithms
  • Currently Available as part of the ConcRT Sample Pack
    • Concurrency::parallel_transform
    • Concurrency::parallel_reduce
    • Concurrency::parallel_sort
    • Concurrency::parallel_buffered_sort
    • Concurrency::parallel_radixsort
    • Parallel Partitioners
  • These have been announced to be part of vNext


PPL Containers and Objects
  • Concurrency::concurrent_vector<T>
    • Provides Concurrent Safe
      • Random Access, Element Access, Iterator Access/Traversal
      • Append
    • Does Not Provide Deletion Of Elements
  • Concurrency::concurrent_queue<T>
    • Provides Concurrent Safe
      • Enqueue and Dequeue operations
  • Concurrency::combinable<T>
    • Reusable Thread Local Storage
    • Allows Associative Operations to be combined at the end of a parallel_for, parallel_for_each, etc.
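The idea behind `combinable<T>` (accumulate privately, combine once at the end) can be sketched without the PPL. The version below assumes one slot per `std::thread` worker instead of real thread-local storage, and `combinable_style_sum` is a hypothetical name.

```cpp
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Sums `values` with `workers` threads. Each thread accumulates privately and
// writes its result to its own slot; the slots are combined once at the end.
inline long long combinable_style_sum(const std::vector<int>& values, unsigned workers) {
    if (workers == 0) workers = 1;
    std::vector<long long> local(workers, 0);  // one slot per worker
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&, w] {
            long long sum = 0;  // private accumulator, no sharing in the hot loop
            for (std::size_t i = w; i < values.size(); i += workers) sum += values[i];
            local[w] = sum;     // single write to this worker's slot
        });
    }
    for (auto& t : pool) t.join();
    // The combine step: addition is associative, so the slot order does not matter.
    return std::accumulate(local.begin(), local.end(), 0LL);
}
```

Because no thread touches another thread's accumulator, the loop body needs no lock; the only synchronization is the final join.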
Upcoming PPL Containers
  • Currently Available as part of the ConcRT Sample Pack
    • concurrent_unordered_map
    • concurrent_unordered_multimap
    • concurrent_unordered_set
    • concurrent_unordered_multiset
  • Like the new algorithms these new containers have been announced to be part of vNext


When To Use PPL
  • When you have reasonably large tasks that can be processed in parallel
    • This often requires that you change your algorithm to be parallelizable (for example using combinable<T>)
  • It is easy to change your existing code to use PPL to accomplish:
    • Parallel Sorts
    • Parallel Sums/Counts/Averages (use combinable<T>)
    • Parallel Map/Reduce
PPL Best Practices

From MSDN - http://msdn.microsoft.com/en-us/library/ff601930.aspx

  • Do Not Parallelize Small Loop Bodies
  • Express Parallelism at the Highest Possible Level
  • Use parallel_invoke to Solve Divide-and-Conquer Problems
  • Use Cancellation or Exception Handling to Break from a Parallel Loop
  • Understand how Cancellation and Exception Handling Affect Object Destruction
  • Do Not Block Repeatedly in a Parallel Loop
  • Do Not Perform Blocking Operations When You Cancel Parallel Work
  • Do Not Write to Shared Data in a Parallel Loop
  • When Possible, Avoid False Sharing
  • Make Sure That Variables Are Valid Throughout the Lifetime of a Task
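One of the practices above, avoiding false sharing, can be illustrated with a small sketch. It assumes a 64-byte cache line (common, but not guaranteed on every CPU) and uses `std::thread`; `PaddedCounter` and `count_with_padded_counters` are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <thread>
#include <vector>

// 64 bytes is a common cache-line size; this is an assumption, not a guarantee.
constexpr std::size_t kCacheLine = 64;

// Aligning each counter to its own cache line keeps threads from invalidating
// each other's lines when they update neighbouring counters.
struct alignas(kCacheLine) PaddedCounter {
    long value = 0;
};

// Four threads each bump their own padded counter; the totals are added at the end.
inline long count_with_padded_counters(long per_worker) {
    constexpr unsigned kWorkers = 4;
    std::array<PaddedCounter, kWorkers> counters{};
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < kWorkers; ++w) {
        pool.emplace_back([&counters, w, per_worker] {
            for (long i = 0; i < per_worker; ++i) ++counters[w].value;
        });
    }
    for (auto& t : pool) t.join();
    long total = 0;
    for (const auto& c : counters) total += c.value;
    return total;
}
```

Without the `alignas`, the four `long` values would sit side by side in one or two cache lines, and the result would be the same but typically slower.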

Using the PPL to parallelize loops

Asynchronous Agents Overview
  • According to MSDN:

An asynchronous agent (or just agent) is an application component that works asynchronously with other agents to solve larger computing tasks.

(Diagram: an example pipeline of agents: Read File From Disk → Decrypt Input Data → Decompress Input Data → Process File Data → Compress Output Data → Encrypt Output Data → Transmit Output Data)

Agent Message Passing
  • Programming Model
    • Message Passing Based “Life Cycle” Pattern
  • Asynchronous Message Blocks
    • Concurrency::unbounded_buffer<T>
    • Concurrency::overwrite_buffer<T>
    • Concurrency::single_assignment<T>
  • Message Passing Functions
    • Concurrency::send<T>
    • Concurrency::asend<T>
    • Concurrency::receive<T>
    • Concurrency::try_receive<T>
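The send/receive model above can be sketched with a small standard-library stand-in. `MessageBuffer` below is a hypothetical class that behaves roughly like an unbounded FIFO buffer (its `send` never blocks, its `receive` waits for a message); it is not the Agents Library, and `pipeline_sum` with its 0 sentinel is an illustrative convention only.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Minimal stand-in for an unbounded message buffer; messages arrive in FIFO order.
template <typename T>
class MessageBuffer {
public:
    // Queues a message and wakes one waiting receiver.
    void send(T value) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push(std::move(value));
        }
        ready_.notify_one();
    }

    // Blocks until a message is available, then removes and returns it.
    T receive() {
        std::unique_lock<std::mutex> lock(m_);
        ready_.wait(lock, [this] { return !q_.empty(); });
        T value = std::move(q_.front());
        q_.pop();
        return value;
    }

private:
    std::mutex m_;
    std::condition_variable ready_;
    std::queue<T> q_;
};

// A two-stage pipeline: a producer sends 1..n followed by a 0 sentinel, and the
// calling thread receives and sums until it sees the sentinel.
inline int pipeline_sum(int n) {
    MessageBuffer<int> channel;
    std::thread producer([&channel, n] {
        for (int i = 1; i <= n; ++i) channel.send(i);
        channel.send(0);  // sentinel: no more messages
    });
    int total = 0;
    for (int msg = channel.receive(); msg != 0; msg = channel.receive()) total += msg;
    producer.join();
    return total;
}
```

The two stages share no state except the buffer, which is the point of the message-passing model: each stage can run at its own pace and the buffer absorbs the difference.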
Agent Message Passing Diagram

(Diagram taken from MSDN http://msdn.microsoft.com/en-us/library/ee207192.aspx)

When to use Asynchronous Agents
  • When you have multiple processing steps that can work in parallel to process data as a pipeline
  • (i.e. when you can arrange your code to work as an assembly line such that you can achieve parallelism)
  • Examples:
    • Image Processing
    • Large Calculations That Build Upon Previous Calculations
Heterogeneous Computing

Programming the GPU using AMP

The Power of Heterogeneous Computing
  • Interactive visualization of volumetric white matter connectivity
  • Ionic placement for molecular dynamics simulation on GPU
  • Astrophysics N-body simulation
  • Simulation in Matlab using .mex file CUDA function
  • Transcoding HD video stream to H.264
  • Financial simulation of LIBOR model with swaptions
  • Ultrasound medical imaging for cancer diagnostics
  • Highly optimized object oriented molecular dynamics
  • GLAME@lab: An M-script API for linear Algebra operations on GPU
  • Cmatch exact string matching to find similar proteins and gene sequences

*Used with permission from Daniel Moth’s “Taming GPU compute with C++ Accelerated Massive Parallelism”


CPUs vs GPUs today
  • CPU
    • Low memory bandwidth
    • Higher power consumption
    • Medium level of parallelism
    • Deep execution pipelines
    • Random accesses
    • Supports general code
    • Mainstream programming
  • GPU
    • High memory bandwidth
    • Lower power consumption
    • High level of parallelism
    • Shallow execution pipelines
    • Sequential accesses
    • Supports data-parallel code
    • Niche programming

images source: AMD

*Used with permission from Daniel Moth’s “Taming GPU compute with C++ Accelerated Massive Parallelism”

C++ AMP
  • Accelerated Massive Parallelism
  • Best for Data Parallelism
  • Bring GPGPU to the Masses
    • Write C++ Code that runs on the GPU
  • Available as part of the Visual Studio 11 Developer Preview
    • http://msdn.microsoft.com/en-US/vstudio/hh127353
  • When running VS11 on Win8 there is even GPGPU debugging!
  • Microsoft is submitting it as an Open Specification
    • Several other compiler vendors have committed to implementing AMP.
Hello World: Array Addition

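// Serial version
void AddArrays(int n, int * pA, int * pB, int * pC)
{
    for (int i=0; i<n; i++)
        pC[i] = pA[i] + pB[i];
}

// C++ AMP version (Developer Preview syntax; the released C++ AMP uses
// sum.extent and restrict(amp) in place of sum.grid and restrict(direct3d))
#include <amp.h>
using namespace concurrency;

void AddArrays(int n, int * pA, int * pB, int * pC)
{
    array_view<int,1> a(n, pA);
    array_view<int,1> b(n, pB);
    array_view<int,1> sum(n, pC);

    parallel_for_each(sum.grid,
        [=](index<1> i) restrict(direct3d)
        {
            sum[i] = a[i] + b[i];
        });
}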

*Used with permission from Daniel Moth’s “Taming GPU compute with C++ Accelerated Massive Parallelism”



For your reference

General C++ Links

Microsoft’s MSDN C++ Developer Center



(Great Site for quick reference to C++ and STL)


Visual Studio Team Blog


Herb Sutter’s Blog

(ISO C++ Chairman and Microsoft Software Architect)


Parallel Programming in Native Code Blog

Best Way To Stay Up To Date

  • Parallel Programming in Native Code Blog


Great tutorials and more

  • How to pick your parallel sort?


  • concurrent_vector and concurrent_queue explained


  • Synchronization with the Concurrency Runtime (2 parts)


  • Resource Management in the Concurrency Runtime (3 parts)


ConcRT Written Resources

MSDN - Concurrency Runtime




Parallel Programming with Microsoft Visual C++

(Free to read online; the printed book and e-book are not free)


Introducing the Visual C++ Concurrency Runtime (59 page hands on lab)


Parallel Programming in Native Code Blog


ConcRT Video Resources

Don McCrady - Parallelism in C++ Using the Concurrency Runtime


The Concurrency Runtime: Fine Grained Parallelism for C++


Parallel Programming for C++ Developers: Tasks and Continuations (2 Parts)


Native Parallelism with the Parallel Patterns Library


AMP Resources

Herb Sutter: Heterogeneous Computing and C++ AMP

(Learn about the future of computing)


Taming GPU compute with C++ AMP


Walkthrough: Debugging an AMP Application


Daniel Moth’s Blog

(AMP Project Manager)


  • C++ is a Modern Language
  • C++ is the language of choice to:
    • Maximize Speed
    • Minimize Power Consumption
    • Target the latest hardware
    • Have full control of your application
  • Native Concurrency using C++ PPL, Agents, and AMP provides a powerful set of tools to enable you to unlock your potential concurrency!!!
  • C++ is AMPed!!!
Thank you for coming!

Please fill out an evaluation form before you leave!

If you would like a copy of this slide deck please email me at dcravey@gmail.com

If you would like more information, please contact me or, better yet, come to either of the local C++ User Groups:

Houston C++ User Group (1st Thursday each month)

University of Houston C++ User Group (Wednesday before 1st Thursday each month)