slide1

INDIA │ 18-20 August 2010

virtual techdays

Parallelize applications using Intel Threading Building Blocks

Om Sachan │ SSG, Intel Corporation

slide2

SESSION AGENDA

  • Intel® Threading Building Blocks overview
  • Generic Parallel Algorithms
  • Lab: Parallelize serial application
  • Generic Concurrent Containers
  • Synchronization Primitives
  • Advanced Features Overview
  • Summary

slide3

Intel® Threading Building Blocks

Overview

  • Enables you to specify tasks instead of threads
      • The library automatically maps tasks onto physical threads in a way that makes efficient use of processor resources
  • Targets threading for performance
      • A solution for parallelizing computationally intensive work units while preserving good scalability across various hardware
  • Compatible with other threading packages
      • Works well for CPU-bound tasks, not I/O-bound ones; coexists with other threading packages
  • Emphasizes scalable, data-parallel programming
      • Scales well as the number of processors grows
  • Relies on generic programming
      • The set of templates implemented in Intel® TBB allows writing flexible algorithms

slide4

Intel® Threading Building Blocks

Overview

  • Product package includes:
    • Dynamic libraries (debug and release)
    • Header files
    • Sample code
    • Documentation: tutorial, getting started guide, reference
  • Supported Platforms:
    • IA-32, Intel64
    • Parallel Studio
  • Intel® TBB is a set of generic algorithms and data structures (C++ templates)

Trivial Intel® TBB program:

#include "tbb/task_scheduler_init.h"

using namespace tbb;  // all public classes and functions are in the tbb namespace

int main ()
{
    // The library requires explicit initialization:
    // at least one task_scheduler_init object must be active
    task_scheduler_init TBB_Init;
    return 0;
}

slide5

Intel® Threading Building Blocks

Usage Model

  • Algorithms and data structures are expressed in terms of concepts
    • A concept is a set of requirements on a type
    • A type models a concept when it meets those requirements
    • The program defines the types required by Intel® TBB constructs
  • Parallel Generic Algorithms and Concurrent Containers
    • C++ programming experience, basic STL knowledge, and basic threading knowledge are required to get started. There is no need to be a threading expert.
  • Task Scheduler
    • The engine that powers the Parallel Generic Algorithms, which hide the complexity of task management. The Task Scheduler may be used directly for advanced programming when your algorithm doesn't naturally map onto one of the pre-packaged Parallel Algorithms. Threading programming and tuning experience are required.
  • Synchronization Primitives
    • These objects should be used carefully, as inappropriate use of synchronization may lead to performance and correctness issues. Solid threading programming and tuning experience are required.
slide6

Intel® Threading Building Blocks

Generic Parallel Algorithms

slide7

Intel® Threading Building Blocks

Generic Parallel Algorithms : Basic Concepts

  • Splittable Concept
    • The type X is splittable if it has a constructor that allows an instance to be split into two pieces
    • X::X (X& x, split)
      • Splitting constructor. Splits x into x and y, where y is the newly constructed object
  • Range Concept
    • The type R represents a recursively divisible set of values; it must model the Splittable Concept
    • R::R (const R&)
      • Copy constructor
    • R::~R ()
      • Destructor
    • bool R::empty() const
      • Returns 'true' if the range is empty
    • bool R::is_divisible() const
      • Returns 'true' if the range can be partitioned into two sub-ranges
    • R::R (R&, split)
      • Splitting constructor
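
The Range requirements can be made concrete with a small hand-written type. The sketch below is plain C++ and does not use the library: `IndexRange` and `split_tag` are hypothetical names invented for illustration (the tag stands in for the library's split argument), and the emptiness test is spelled `empty()` here.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the library's split tag: an empty type whose
// only job is to select the splitting constructor.
struct split_tag {};

// IndexRange is an illustrative name, not a TBB class. It represents the
// half-open interval [begin, end) and models the Range requirements.
class IndexRange {
    std::size_t begin_;
    std::size_t end_;
    std::size_t grain_;  // ranges of this size or smaller are not divided
public:
    IndexRange(std::size_t b, std::size_t e, std::size_t grain = 1)
        : begin_(b), end_(e), grain_(grain) {
        assert(b <= e && grain > 0);
    }

    // The copy constructor and destructor are the compiler-generated ones.

    bool empty() const { return begin_ == end_; }
    bool is_divisible() const { return end_ - begin_ > grain_; }

    // Splitting constructor: r keeps the lower half, the new object
    // takes the upper half.
    IndexRange(IndexRange& r, split_tag)
        : begin_(r.begin_ + (r.end_ - r.begin_) / 2),
          end_(r.end_),
          grain_(r.grain_) {
        r.end_ = begin_;
    }

    std::size_t begin() const { return begin_; }
    std::size_t end() const { return end_; }
};
```

Splitting `IndexRange x(0, 10, 2)` with `IndexRange y(x, split_tag{})` leaves `x` covering [0, 5) and `y` covering [5, 10).
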
slide8

Intel® Threading Building Blocks

Generic Parallel Algorithms : parallel_for Template Function

  • #include "tbb/parallel_for.h"
  • template <typename Range, typename Body> void parallel_for (const Range& range, const Body& body);
    • represents parallel execution of Body over each value in the Range
  • Range type must model the Intel® Threading Building Blocks Range Concept described on the previous slide
  • parallel_for Body Concept Requirements
    • Body::Body (const Body&)
      • Copy constructor
    • Body::~Body ()
      • Destructor
    • void Body::operator() (Range&) const
      • Apply Body to Range
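
To see why the Body must be copyable and why its operator() is const, it may help to look at a serial stand-in for a parallel_for-style driver. This is a sketch in plain C++, not the library's implementation: `Span`, `DoubleElements`, `for_each_leaf` and `doubled` are hypothetical names, and where this code recurses serially a real scheduler would run the two halves as separate tasks.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal half-open range used only for this sketch (hypothetical type).
struct Span {
    std::size_t first, last, grain;
    bool is_divisible() const { return last - first > grain; }
    // Cuts this span in half and returns the upper half.
    Span split_off() {
        std::size_t mid = first + (last - first) / 2;
        Span upper = {mid, last, grain};
        last = mid;
        return upper;
    }
};

// A Body: copyable, destructible, and callable on a range through a const
// operator(). Here it doubles every element in the sub-range.
class DoubleElements {
    int* data_;
public:
    explicit DoubleElements(int* data) : data_(data) {}
    void operator()(const Span& s) const {
        for (std::size_t i = s.first; i != s.last; ++i) data_[i] *= 2;
    }
};

// Serial stand-in for the driver: split until the range is no longer
// divisible, then apply a copy of the body to each leaf.
template <typename Body>
void for_each_leaf(Span range, const Body& body) {
    if (!range.is_divisible()) {
        Body local(body);  // each task may work on its own copy
        local(range);
        return;
    }
    Span upper = range.split_off();
    for_each_leaf(range, body);
    for_each_leaf(upper, body);
}

// Convenience wrapper: returns a copy of v with every element doubled.
std::vector<int> doubled(std::vector<int> v, std::size_t grain) {
    assert(grain > 0);
    if (!v.empty()) {
        Span all = {0, v.size(), grain};
        for_each_leaf(all, DoubleElements(v.data()));
    }
    return v;
}
```

Because every leaf may be processed by a different copy of the Body, any state the Body carries has to be safe to copy, and operator() should not modify the Body itself.
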
slide9

Intel® Threading Building Blocks

Example: Parallelizing Simple Loops

  • Task: loop over a fixed-size array of elements and apply a function to each of them (iterations are independent)
  • Serial version of the solution:

const int N = 20000000;

// Doubles each of the first M elements of the array
void ChangeArraySerial (int* array, int M) {
    for (int i = 0; i < M; i++) {
        array[i] *= 2;
    }
}

int main () {
    static int A[N];  // static storage: 20,000,000 ints would not fit on a typical stack
    for (int i = 0; i < N; i++) { A[i] = i; }
    ChangeArraySerial (A, N);
    return 0;
}

slide10

Intel® Threading Building Blocks

  • Parallel solution with Intel® TBB : using parallel_for

#include "tbb/blocked_range.h"
#include "tbb/parallel_for.h"

using namespace tbb;

const int IdealGrainSize = <some number>;  // experiment with the grain size

// The ChangeArray class models the parallel_for Body concept
class ChangeArray {
    int* array;
public:
    ChangeArray (int* a): array(a) {}
    // Apply the change to each array element of the sub-range
    void operator()( const blocked_range<int>& r ) const {
        for (int i = r.begin(); i != r.end(); i++ ) {
            array[i] *= 2;
        }
    }
};

void ChangeArrayParallel (int* a, int n )
{
    // Call the generic function parallel_for<Range, Body>:
    //   Range -> blocked_range, a pre-packaged 1D iteration space that models the Range concept
    //   Body  -> ChangeArray
    parallel_for (blocked_range<int>(0, n, IdealGrainSize), ChangeArray(a));
}

int main () {
    static int A[N];  // N as in the serial version; static storage keeps the array off the stack
    // initialize tbb, array here…
    ChangeArrayParallel (A, N);
    return 0;
}
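
One way to reason about the grain size before experimenting is to count how many leaf sub-ranges a given value produces. The helper below is a hypothetical sketch in plain C++; it assumes the range is halved until a piece holds no more than `grain` iterations, which is not necessarily what every partitioning strategy does.

```cpp
#include <cassert>
#include <cstddef>

// Counts the leaf sub-ranges produced when a range of n iterations is
// halved until a piece holds no more than 'grain' iterations.
// Assumption: splitting continues until the range is no longer divisible.
std::size_t leaf_chunks(std::size_t n, std::size_t grain) {
    assert(grain > 0);
    if (n <= grain) return n == 0 ? 0 : 1;
    std::size_t lower = n / 2;
    return leaf_chunks(lower, grain) + leaf_chunks(n - lower, grain);
}
```

Under that assumption, 1000 iterations with a grain size of 100 give 16 chunks, a grain size of 1000 gives a single chunk (no parallelism), and a grain size of 1 gives 1000 chunks, where per-chunk overhead is likely to dominate for a loop body as cheap as a multiplication.
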
slide11

Intel® Threading Building Blocks

Lab 1:

  • Convert the serial matrix multiplication application into a parallel application using parallel_for.
slide12

Intel® Threading Building Blocks

Generic Parallel Algorithms : parallel_reduce Template Function

  • #include "tbb/parallel_reduce.h"
  • template <typename Range, typename Body> void parallel_reduce (const Range& range, Body& body);
    • represents parallel reduction of Body over each value in the Range
  • Range type must model the Intel® Threading Building Blocks Range Concept
  • parallel_reduce Body Concept Requirements
    • Body::Body (const Body&)
      • Copy constructor
    • Body::~Body ()
      • Destructor
    • void Body::operator() (Range&)
      • Apply Body to Range
    • Body::Body (Body&, split)
      • Splitting constructor; must be able to run concurrently with 'join' and 'operator()'
    • void Body::join (const Body& rhs)
      • The result of rhs must be merged with the result of 'this'
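
The interplay of the splitting constructor and join is easier to follow in a serial stand-in for a parallel_reduce-style driver. The following is a sketch in plain C++, not the library's implementation: `Span`, `SumBody`, `reduce_range`, `sum_all` and `split_tag` are hypothetical names, and the two recursive calls are where a real scheduler could run the halves concurrently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct split_tag {};  // hypothetical stand-in for the library's split tag

// Minimal splittable range for the sketch (hypothetical type).
struct Span {
    std::size_t first, last, grain;
    Span(std::size_t f, std::size_t l, std::size_t g)
        : first(f), last(l), grain(g) {}
    bool is_divisible() const { return last - first > grain; }
    // Splitting constructor: 'other' keeps the lower half.
    Span(Span& other, split_tag)
        : first(other.first + (other.last - other.first) / 2),
          last(other.last), grain(other.grain) {
        other.last = first;
    }
};

// A reduction Body: accumulates a partial sum over the sub-ranges it sees.
class SumBody {
    const int* data_;
public:
    long sum;
    explicit SumBody(const int* data) : data_(data), sum(0) {}
    // Splitting constructor: same input, empty partial result.
    SumBody(SumBody& other, split_tag) : data_(other.data_), sum(0) {}
    void operator()(const Span& s) {
        for (std::size_t i = s.first; i != s.last; ++i) sum += data_[i];
    }
    void join(const SumBody& rhs) { sum += rhs.sum; }
};

// Serial stand-in for the driver.
template <typename Body>
void reduce_range(Span range, Body& body) {
    if (!range.is_divisible()) { body(range); return; }
    Span upper(range, split_tag{});
    Body right(body, split_tag{});  // fresh body for the upper half
    reduce_range(range, body);      // lower half accumulates into body
    reduce_range(upper, right);     // upper half accumulates into right
    body.join(right);               // merge right's result into body
}

long sum_all(const std::vector<int>& v, std::size_t grain) {
    assert(grain > 0);
    SumBody body(v.data());
    reduce_range(Span(0, v.size(), grain), body);
    return body.sum;
}
```

The body is taken by non-const reference because the final result is read back from it after the reduction completes.
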
slide13

Intel® Threading Building Blocks

  • Parallel solution with Intel® TBB : using parallel_reduce

#include "tbb/blocked_range.h"
#include "tbb/parallel_reduce.h"

using namespace tbb;

const int IdealGrainSize = <some number>;

// The SumArray class models the parallel_reduce Body concept
class SumArray {
    int* array;
public:
    int sum;
    SumArray (int* a): array(a), sum(0) {}
    // Calculate the partial sum of the array elements in the sub-range
    void operator()( const blocked_range<int>& r ) {
        for (int i = r.begin(); i != r.end(); i++ ) {
            sum += array[i];
        }
    }
    // Splitting constructor: the new Body starts with an empty partial sum
    SumArray (SumArray& partial_sum, split): array(partial_sum.array), sum(0) {}
    // Perform the reduction: merge another Body's partial sum into this one
    void join (const SumArray& partial_sum) { sum += partial_sum.sum; }
};

int SumArrayParallel (int* a, int n )
{
    SumArray sum_array (a);
    // Call the generic function parallel_reduce<Range, Body>
    parallel_reduce (blocked_range<int>(0, n, IdealGrainSize), sum_array);
    return sum_array.sum;
}

slide14

Intel® Threading Building Blocks

Generic Concurrent Containers

slide15

Intel® Threading Building Blocks

Concurrent Containers

  • The library provides concurrent containers
    • STL containers are not thread-safe: attempts to modify them concurrently can corrupt the container
    • Standard practice is to wrap a lock around STL containers
      • This turns the container into a serial bottleneck
  • Interfaces are similar to STL but don't match 100%
    • Some STL interfaces are inherently not thread-safe
  • Fine-grained locking or lockless implementations
    • Worse single-thread performance, but better scalability
    • Can be used with the library, OpenMP, or native threads
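
The "wrap a lock around an STL container" practice, and the reason some STL interfaces cannot simply be kept, can be sketched as follows. `LockedQueue` is a hypothetical wrapper written in standard C++ for illustration; it is not one of the library's containers.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <queue>

// LockedQueue is an illustrative wrapper, not a TBB class: a single mutex
// serializes every operation on the underlying std::queue.
template <typename T>
class LockedQueue {
    std::queue<T> items_;
    mutable std::mutex guard_;
public:
    void push(const T& value) {
        std::lock_guard<std::mutex> hold(guard_);
        items_.push(value);
    }

    // Combines the emptiness test, front() and pop() in one critical
    // section. With three separate calls, another thread could empty the
    // queue between the test and the pop.
    bool try_pop(T& out) {
        std::lock_guard<std::mutex> hold(guard_);
        if (items_.empty()) return false;
        out = items_.front();
        items_.pop();
        return true;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> hold(guard_);
        return items_.size();
    }
};
```

Every caller contends for the same mutex, which is the serial bottleneck the slide refers to. The STL idiom of checking `empty()` and then calling `front()` and `pop()` is an example of an interface that is not thread-safe even when each call is locked individually, so a combined operation is needed instead.
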
slide16

Intel® Threading Building Blocks

Concurrent Containers : concurrent_hash_map

  • concurrent_hash_map <Key, T, HashCompare>
      • Maps a Key to an element of type T
      • Hash table of std::pair <const Key, T>
      • You should implement a HashCompare class and define two methods: 'hash' (maps a Key to a hash code of type size_t) and the predicate 'equal' (returns true if two Keys are equal)
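
A HashCompare class for string keys might look like the sketch below. `StringHashCompare` is a hypothetical name, the hash function is a simple multiplicative one chosen for brevity rather than quality, and the code is plain C++ so it can be read without the library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// StringHashCompare is an illustrative name. It supplies the two
// operations required of a HashCompare class, with std::string as the Key.
struct StringHashCompare {
    // Maps a key to a hash code; equal keys must produce equal codes.
    static std::size_t hash(const std::string& key) {
        std::size_t h = 0;
        for (std::string::size_type i = 0; i < key.size(); ++i) {
            h = h * 31 + static_cast<unsigned char>(key[i]);
        }
        return h;
    }

    // Returns true if the two keys are equal.
    static bool equal(const std::string& a, const std::string& b) {
        return a == b;
    }
};
```

Such a class would be supplied as the third template argument of the hash table; the only contract between the two methods is that keys for which `equal` returns true also hash to the same code.
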
slide17

Intel® Threading Building Blocks

Concurrent Containers : concurrent_vector

  • concurrent_vector <T>
      • Dynamically growable array of T: grow_by and grow_to_at_least
      • The clear() method is not thread-safe with respect to resizing
      • concurrent_vector never moves an element until the array is cleared
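
The "never moves an element" guarantee matters because other threads may hold pointers or references to elements while the container grows. As a single-threaded analogy only, `std::deque` has a similar property for growth at the end: `push_back` leaves references and pointers to existing elements valid, whereas `std::vector` may reallocate and relocate everything. The helper name below is hypothetical, and `std::deque` is not a concurrent container.

```cpp
#include <cassert>
#include <deque>

// Appends 'extra' more values to d and reports whether the element that
// was first before the growth still lives at the same address.
bool first_element_stays_put(std::deque<int>& d, int extra) {
    assert(!d.empty());
    const int* before = &d.front();
    for (int i = 0; i < extra; ++i) d.push_back(i);
    return before == &d.front();
}
```

A container that relocated its elements on growth could not safely hand out element references to threads that keep using them while another thread appends.
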
slide18

Intel® Threading Building Blocks

Concurrent Containers : concurrent_queue

  • concurrent_queue <T>
      • In a single-threaded run it supports "first-in-first-out" ordering
      • If one thread pushes two values and another thread pops those two values, they come out in the order in which they were pushed
      • The type of 'size' is a signed number: if the queue is empty and size() returns '-n', this means 'n' pops are pending
      • Method 'empty' returns true if size is zero or a negative value
slide19

Intel® Threading Building Blocks

Synchronization Primitives

slide20

Intel® Threading Building Blocks

Synchronization Primitives : Mutex Concept

Mutexes are C++ objects based on the scoped locking pattern
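
The scoped locking pattern ties the lifetime of a lock to a C++ object: the constructor acquires the mutex and the destructor releases it, so the mutex is released on every path out of the scope. The sketch below shows the pattern with a toy spin mutex built on `std::atomic_flag`; `ToySpinMutex` and `guarded_increment` are hypothetical names, and this is not how the library implements its mutexes.

```cpp
#include <atomic>
#include <cassert>

// ToySpinMutex is an illustrative class, not the library's implementation.
class ToySpinMutex {
    std::atomic_flag busy_;
public:
    ToySpinMutex() { busy_.clear(); }
    ToySpinMutex(const ToySpinMutex&) = delete;
    ToySpinMutex& operator=(const ToySpinMutex&) = delete;

    // Returns true if the mutex was free and is now held by the caller.
    bool try_acquire() {
        return !busy_.test_and_set(std::memory_order_acquire);
    }
    void release() { busy_.clear(std::memory_order_release); }

    // Scoped lock: the constructor acquires, the destructor releases.
    class scoped_lock {
        ToySpinMutex& mutex_;
    public:
        explicit scoped_lock(ToySpinMutex& m) : mutex_(m) {
            while (!mutex_.try_acquire()) { /* spin in user space */ }
        }
        ~scoped_lock() { mutex_.release(); }
        scoped_lock(const scoped_lock&) = delete;
        scoped_lock& operator=(const scoped_lock&) = delete;
    };
};

// The lock is released when 'lock' goes out of scope, including on return.
int guarded_increment(ToySpinMutex& m, int& counter) {
    ToySpinMutex::scoped_lock lock(m);
    return ++counter;
}
```

Because release happens in a destructor, an early return or an exception inside the protected region cannot leave the mutex locked.
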

slide21

Intel® Threading Building Blocks

Synchronization Primitives : Mutex Flavors

  • spin_mutex
      • Non-reentrant, unfair, spins in user space
      • VERY FAST in lightly contended situations; use it if you need to protect very few instructions
  • queuing_mutex
      • Non-reentrant, fair, spins in user space
      • Use queuing_mutex when scalability and fairness are important
  • queuing_rw_mutex
      • Non-reentrant, fair, spins in user space
  • spin_rw_mutex
      • Non-reentrant, unfair, spins in user space
      • Use a reader-writer mutex to allow non-blocking reads by multiple threads
  • mutex
      • Wrapper for OS synchronization: CRITICAL_SECTION on Windows*, pthread_mutex on Linux*
slide22

Intel® Threading Building Blocks

Synchronization Primitives : Example of spin_rw_mutex

  • Allows multiple threads to read the protected data, but only one thread (the writer) can change the data exclusively
  • Upgrade/Downgrade operations
      • upgrade_to_writer: returns true if it successfully upgraded the lock without temporarily releasing the mutex
      • downgrade_to_reader

#include "tbb/spin_rw_mutex.h"

using namespace tbb;

spin_rw_mutex MyMutex;

int foo () {
    /* Construction of 'lock' acquires 'MyMutex' as a reader */
    spin_rw_mutex::scoped_lock lock (MyMutex, /*is_writer*/ false);
    if (!lock.upgrade_to_writer ()) { … }  /* the mutex was temporarily released during the upgrade */
    else { … }
    return 0;
    /* Destructor of 'lock' releases 'MyMutex' */
}

slide23

Intel® Threading Building Blocks

Advanced Features Overview

slide24

Intel® Threading Building Blocks

Synchronization Primitives : Mutex Concept

  • Concurrent Containers
    • concurrent_hash_map
    • concurrent_queue
    • concurrent_vector
  • Generic Parallel Algorithms
    • parallel_for
    • parallel_while
    • parallel_reduce
    • pipeline
    • parallel_sort
    • parallel_scan
  • task_scheduler
  • Low-Level Synchronization Primitives
    • spin_mutex
    • queuing_rw_mutex
    • spin_rw_mutex
    • mutex

slide25

Intel® Threading Building Blocks : Summary

  • Scalable data-parallel decomposition, providing patterns for parallel algorithms and concurrent data structures
  • Paradigm of logical tasks that are efficiently and automatically mapped onto physical threads by the task scheduler
  • Works well for computationally intensive tasks, as the task scheduler efficiently load-balances tasks across the physical threads and is cache-aware
slide26

RESOURCES

  • Resource-1
    • http://www.threadingbuildingblocks.org/
  • Resource-2
    • You may participate in our community support web site.
    • Tools Knowledge Base: http://software.intel.com/en-us/articles/tools
    • User forums: http://software.intel.com/en-us/forums/
    • Intel® Software Product support info: http://www.intel.com/software/support
