
Virtual techdays

INDIA │ 18-20 August 2010

Parallelize applications using Intel® Threading Building Blocks

Om Sachan │ SSG, Intel Corporation



SESSION AGENDA

  • Intel® Threading Building Blocks overview

  • Generic Parallel Algorithms

  • Lab: Parallelize a serial application

  • Generic Concurrent Containers

  • Synchronization Primitives

  • Advanced Features Overview

  • Summary



Intel® Threading Building Blocks

Overview

  • Enables you to specify tasks instead of threads

    • automatically maps tasks onto physical threads in a way that makes efficient use of processor resources

  • Targets threading for performance

    • a solution for parallelizing computationally intensive work units while preserving good scalability across different hardware

  • Compatible with other threading packages

    • works well for CPU-bound tasks, not I/O-bound ones; coexists with other threading packages

  • Emphasizes scalable, data-parallel programming

    • scales well as the number of processors grows

  • Relies on generic programming

    • The set of templates implemented in Intel® TBB allows writing flexible algorithms.



    Intel® Threading Building Blocks

    Overview

    • Product package includes:

      • Dynamic libraries (debug and release)

      • Header files

      • Sample code

      • Documentation: tutorial, getting started guide, reference

    • Supported Platforms:

      • IA-32, Intel® 64

      • Parallel Studio

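    • Intel® TBB is a set of generic algorithms and data structures (C++ templates).

    • Trivial Intel® TBB program:

    #include "tbb/task_scheduler_init.h"
    using namespace tbb;

    int main () {
        // Initializes the task scheduler; it stays active until TBB_Init is destroyed
        task_scheduler_init TBB_Init;
        return 0;
    }

    • All public classes and functions are in the tbb namespace

    • The library requires explicit initialization: at least one task_scheduler_init object must be active (TBB 2.2 and later initialize the scheduler automatically on first use; task_scheduler_init remains useful for controlling the number of threads)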



    Intel® Threading Building Blocks

    Usage Model

    • Algorithms and data structures that operate on concepts

      • A concept is a set of requirements on a type

      • A type models a concept if it meets those requirements

      • The program defines the types required by Intel® TBB constructs

    • Parallel Generic Algorithms and Concurrent Containers

      • C++ programming experience and basic STL and threading knowledge are required to get started. There is no need to be a threading expert.

    • Task Scheduler

      • The engine that powers the Parallel Generic Algorithms and hides the complexity of task management. The Task Scheduler may be used for advanced programming when your algorithm doesn’t naturally map onto one of the pre-packaged Parallel Algorithms. Threading programming and tuning experience are required.

    • Synchronization Primitives

      • These objects should be used carefully, because inappropriate use of synchronization may lead to performance and correctness issues. Solid threading programming and tuning experience are required.
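The concept vocabulary above can be illustrated with a minimal sketch in standard C++. This is not TBB code; `largest_of` and `Version` are hypothetical names. The generic algorithm states its requirements on `T` only through the operations it uses. Any type that supplies those operations models the concept, just as a program's own classes model the Range and Body concepts that TBB templates require.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Generic algorithm: T must model "LessThanComparable" (supports a < b)
// and be copyable. Nothing else about T is assumed.
template <typename T>
T largest_of(const std::vector<T>& values) {
    T best = values.at(0);          // copy construction; throws if values is empty
    for (const T& v : values)
        if (best < v) best = v;     // the only comparison used is operator<
    return best;
}

// A user-defined type models the concept by providing operator<.
struct Version {
    int release;
    int patch;
};

inline bool operator<(const Version& a, const Version& b) {
    return a.release < b.release || (a.release == b.release && a.patch < b.patch);
}
```

`largest_of` works unchanged for `int`, `std::string` and `Version`, because all three meet the same requirements.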



    Intel® Threading Building Blocks

    Generic Parallel Algorithms


    Intel® Threading Building Blocks

    Generic Parallel Algorithms : Basic Concepts

    • Splittable Concept

      • The type X is splittable if it has a constructor that allows an instance to be split into two pieces

      • X::X (X& x, split): splitting constructor; splits x into x and the newly constructed object

    • Range Concept

      • The type R represents a recursively divisible set of values; it must model the Splittable Concept

      • R::R (const R&): copy constructor

      • R::~R (): destructor

      • bool R::is_empty() const: returns ‘true’ if the range is empty

      • bool R::is_divisible() const: returns ‘true’ if the range can be partitioned into two sub-ranges

      • R::R (R& r, split): splitting constructor
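The Range and Splittable concepts can be sketched in standard C++ without the library. `IntRange` below is a hypothetical type modeled loosely on `blocked_range<int>`: it is divisible while its size exceeds the grain size, and it splits at the midpoint. The local `split` struct stands in for `tbb::split`. The function `decompose` (also hypothetical) shows, serially, how recursive splitting produces the pieces a scheduler would turn into tasks.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for tbb::split: an empty tag that selects
// the splitting constructor.
struct split {};

// IntRange models the Range concept (and therefore Splittable):
// the half-open interval [begin, end), divisible while its size
// exceeds the grain size.
class IntRange {
public:
    IntRange(int begin, int end, int grainsize = 1)
        : begin_(begin), end_(end), grainsize_(grainsize) {}

    // The implicit copy constructor and destructor satisfy
    // R::R(const R&) and R::~R().

    // Splitting constructor: r keeps the lower half, *this takes the upper half.
    IntRange(IntRange& r, split)
        : begin_(r.begin_ + (r.end_ - r.begin_) / 2),
          end_(r.end_),
          grainsize_(r.grainsize_) {
        r.end_ = begin_;
    }

    bool is_empty() const { return !(begin_ < end_); }
    bool is_divisible() const { return grainsize_ < end_ - begin_; }

    int begin() const { return begin_; }
    int end() const { return end_; }

private:
    int begin_;
    int end_;
    int grainsize_;
};

// Serial illustration: split recursively and collect the pieces that a
// scheduler would turn into tasks.
void decompose(IntRange r, std::vector<IntRange>& leaves) {
    if (r.is_empty()) return;
    if (!r.is_divisible()) {
        leaves.push_back(r);
        return;
    }
    IntRange upper(r, split{});   // r now holds [begin, middle)
    decompose(r, leaves);
    decompose(upper, leaves);
}
```

For example, `IntRange(0, 100, 30)` decomposes into [0,25), [25,50), [50,75) and [75,100).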


    Intel® Threading Building Blocks

    Generic Parallel Algorithms : parallel_for Template Function

    • #include "tbb/parallel_for.h"

    • template <typename Range, typename Body> void parallel_for (const Range& range, const Body& body);

      • represents parallel execution of Body over each value in the Range

    • Range type must model the Intel® Threading Building Blocks Range Concept described on the previous slide

    • parallel_for Body Concept Requirements

      • Body::Body (const Body&): copy constructor

      • Body::~Body (): destructor

      • void Body::operator() (Range& range) const: apply Body to Range
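A `parallel_for`-style template can be written purely in terms of the Range and Body concepts, as the standard-C++ sketch below shows. It is a toy, not TBB's implementation: `toy_parallel_for`, `IntRange` and the local `split` tag are hypothetical. The sketch also starts a new thread per split with `std::async` instead of using a work-stealing task scheduler. It does show why the Body needs a copy constructor, since every piece of work receives its own copy.

```cpp
#include <cassert>
#include <future>

struct split {};  // hypothetical stand-in for tbb::split

// Minimal Range-concept model in the spirit of blocked_range<int>.
class IntRange {
public:
    IntRange(int begin, int end, int grainsize = 1)
        : begin_(begin), end_(end), grainsize_(grainsize) {}
    IntRange(IntRange& r, split)   // splitting constructor: r keeps the lower half
        : begin_(r.begin_ + (r.end_ - r.begin_) / 2), end_(r.end_), grainsize_(r.grainsize_) {
        r.end_ = begin_;
    }
    bool is_empty() const { return !(begin_ < end_); }
    bool is_divisible() const { return grainsize_ < end_ - begin_; }
    int begin() const { return begin_; }
    int end() const { return end_; }
private:
    int begin_, end_, grainsize_;
};

// Toy parallel_for: uses only the Range and Body concept operations.
template <typename Range, typename Body>
void toy_parallel_for(const Range& range, const Body& body) {
    if (range.is_empty()) return;
    if (!range.is_divisible()) {        // small enough: apply the body directly
        body(range);
        return;
    }
    Range lower(range);                 // copy constructor (Range concept)
    Range upper(lower, split{});        // splitting constructor: lower keeps first half
    Body upper_body(body);              // copy constructor (Body concept)
    auto pending = std::async(std::launch::async,
                              [upper, upper_body] { toy_parallel_for(upper, upper_body); });
    toy_parallel_for(lower, body);      // process the lower half on this thread
    pending.get();                      // wait for the upper half
}

// A Body in the style of the ChangeArray class used later in this deck.
class ChangeArray {
    int* array_;
public:
    explicit ChangeArray(int* a) : array_(a) {}
    void operator()(const IntRange& r) const {
        for (int i = r.begin(); i != r.end(); ++i) array_[i] *= 2;
    }
};
```

With 1000 elements and a grain size of 100, the range splits into 16 leaves, so the toy starts 15 extra threads. A real TBB program would run the same Body on a fixed pool of worker threads.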



    Intel® Threading Building Blocks

    Example: Parallelizing Simple Loops

    • Task: loop over a fixed-size array of elements and apply a function to each of them (iterations are independent)

    • Serial version of the solution:

    const int N = 20000000;

    void ChangeArraySerial (int* array, int M) {
        for (int i = 0; i < M; i++) {
            array[i] *= 2;
        }
    }

    int main () {
        static int A[N];   // static: 20 million ints (about 80 MB) would overflow a typical thread stack
        for (int i = 0; i < N; i++) { A[i] = i; }
        ChangeArraySerial (A, N);
        return 0;
    }


    Intel® Threading Building Blocks

    • Parallel solution with Intel® TBB : using parallel_for

    #include "tbb/blocked_range.h"
    #include "tbb/parallel_for.h"
    using namespace tbb;

    const int N = 20000000;
    const int IdealGrainSize = <some number>;

    class ChangeArray {
        int* array;
    public:
        ChangeArray (int* a): array(a) {}
        void operator()( const blocked_range<int>& r ) const {
            for (int i = r.begin(); i != r.end(); i++ ) {
                array[i] *= 2;
            }
        }
    };

    void ChangeArrayParallel (int* a, int n)
    {
        parallel_for (blocked_range<int>(0, n, IdealGrainSize), ChangeArray(a));
    }

    int main () {
        static int A[N];
        // initialize tbb, array here…
        ChangeArrayParallel (A, N);
        return 0;
    }

    • The ChangeArray class models the parallel_for Body Concept

    • blocked_range is a pre-packaged 1D iteration space that models the Range Concept

    • Apply the change to each array element in the body of operator()

    • Call the generic function parallel_for<Range, Body>: Range → blocked_range<int>, Body → ChangeArray

    • Experiment with the grain size
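The effect of the grain size can be estimated with a little arithmetic. The hypothetical helper below uses standard C++ only. It counts the leaf subranges produced when a range is halved recursively while its size exceeds the grain size, which is the splitting rule of `blocked_range`. It assumes every piece is split as far as allowed, as with `simple_partitioner`; other partitioners may stop splitting earlier.

```cpp
#include <cassert>
#include <cstddef>

// Counts the leaf subranges produced by recursively halving a range of n
// iterations while its size exceeds grainsize (blocked_range's rule),
// assuming every piece is split as far as allowed.
std::size_t count_chunks(std::size_t n, std::size_t grainsize) {
    if (grainsize == 0) grainsize = 1;   // a grain size must be at least 1
    if (n == 0) return 0;
    if (n <= grainsize) return 1;        // not divisible: becomes one task
    std::size_t lower = n / 2;           // split at the midpoint
    return count_chunks(lower, grainsize) + count_chunks(n - lower, grainsize);
}
```

For the N = 20000000 array above, a grain size of 10000 yields 2048 chunks, and a grain size of 100000 yields 256. Smaller grains create more tasks and more scheduling overhead. Larger grains create fewer tasks, which can leave processors idle when there are fewer chunks than cores.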



    Intel® Threading Building Blocks

    Lab 1:

    • Convert a serial matrix multiplication application into a parallel application using parallel_for.
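One possible starting point for the lab is sketched below in standard C++. It assumes n x n matrices stored row-major in `std::vector<double>`, and `MultiplyRows` and `multiply_threaded` are hypothetical names. The rows of C = A * B are independent, so a Body-like functor that computes a block of rows can be applied to disjoint row ranges concurrently. Here plain `std::thread`s stand in for `parallel_for`; in the lab, the row range would come from a `blocked_range<int>` instead.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Assumption for this sketch: n x n matrices stored row-major in flat vectors.
using Matrix = std::vector<double>;

// Body-like functor: computes rows [row_begin, row_end) of C = A * B.
// Different row ranges write disjoint parts of C, so no locking is needed.
class MultiplyRows {
    const Matrix& a_;
    const Matrix& b_;
    Matrix& c_;
    int n_;
public:
    MultiplyRows(const Matrix& a, const Matrix& b, Matrix& c, int n)
        : a_(a), b_(b), c_(c), n_(n) {}

    void operator()(int row_begin, int row_end) const {
        for (int i = row_begin; i < row_end; ++i)
            for (int j = 0; j < n_; ++j) {
                double sum = 0.0;
                for (int k = 0; k < n_; ++k)
                    sum += a_[i * n_ + k] * b_[k * n_ + j];
                c_[i * n_ + j] = sum;
            }
    }
};

// Stand-in for parallel_for: one contiguous block of rows per thread.
void multiply_threaded(const Matrix& a, const Matrix& b, Matrix& c, int n, int nthreads) {
    MultiplyRows body(a, b, c, n);
    std::vector<std::thread> workers;
    for (int t = 0; t < nthreads; ++t) {
        int lo = n * t / nthreads;
        int hi = n * (t + 1) / nthreads;
        workers.emplace_back(body, lo, hi);   // each thread runs its own copy of body
    }
    for (std::thread& w : workers) w.join();
}
```

The serial version is the same functor applied to rows [0, n). With TBB, its operator() would take a `const blocked_range<int>&` and loop from `r.begin()` to `r.end()`.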


    Intel® Threading Building Blocks

    Generic Parallel Algorithms : parallel_reduce Template Function

    • #include "tbb/parallel_reduce.h"

    • template <typename Range, typename Body> void parallel_reduce (const Range& range, Body& body);

      • represents parallel reduction of Body over each value in the Range

    • Range type must model the Intel® Threading Building Blocks Range Concept

    • parallel_reduce Body Concept Requirements

      • Body::Body (const Body&): copy constructor

      • Body::~Body (): destructor

      • void Body::operator() (const Range& range): apply Body to Range (accumulates into the Body, so it is not const)

      • Body::Body (Body& b, split): splitting constructor; must be able to run concurrently with ‘join’ and ‘operator()’

      • void Body::join (Body& rhs): the result of rhs must be merged into the result of this
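The division of labor between the splitting constructor and `join` shows up clearly in a toy reduction driver. The standard-C++ sketch below is hypothetical and does not reflect how TBB schedules work. `toy_parallel_reduce` splits the range and gives the upper half to a new Body made with the splitting constructor, which runs on another thread via `std::async`. It then joins that partial result into the original Body. `IntRange` and `split` are local stand-ins for `blocked_range<int>` and `tbb::split`.

```cpp
#include <cassert>
#include <future>

struct split {};  // hypothetical stand-in for tbb::split

// Minimal Range-concept model in the spirit of blocked_range<int>.
class IntRange {
public:
    IntRange(int begin, int end, int grainsize = 1)
        : begin_(begin), end_(end), grainsize_(grainsize) {}
    IntRange(IntRange& r, split)   // splitting constructor: r keeps the lower half
        : begin_(r.begin_ + (r.end_ - r.begin_) / 2), end_(r.end_), grainsize_(r.grainsize_) {
        r.end_ = begin_;
    }
    bool is_empty() const { return !(begin_ < end_); }
    bool is_divisible() const { return grainsize_ < end_ - begin_; }
    int begin() const { return begin_; }
    int end() const { return end_; }
private:
    int begin_, end_, grainsize_;
};

// Toy parallel_reduce: uses only the Range and reduce-Body concept operations.
template <typename Range, typename Body>
void toy_parallel_reduce(const Range& range, Body& body) {
    if (range.is_empty()) return;
    if (!range.is_divisible()) {
        body(range);                     // accumulate this piece into body
        return;
    }
    Range lower(range);
    Range upper(lower, split{});         // lower keeps the first half
    Body right(body, split{});           // splitting constructor: empty partial result
    auto pending = std::async(std::launch::async,
                              [&right, upper] { toy_parallel_reduce(upper, right); });
    toy_parallel_reduce(lower, body);    // left half accumulates into the original body
    pending.get();
    body.join(right);                    // merge the right partial result into body
}

// Body in the style of the SumArray class on the next slide.
class SumArray {
    const int* array_;
public:
    long long sum;
    explicit SumArray(const int* a) : array_(a), sum(0) {}
    SumArray(SumArray& other, split) : array_(other.array_), sum(0) {}
    void operator()(const IntRange& r) {
        for (int i = r.begin(); i != r.end(); ++i) sum += array_[i];
    }
    void join(const SumArray& rhs) { sum += rhs.sum; }
};
```

The right-hand partial result is always joined into the left-hand one. Operand order is therefore preserved, so an associative but non-commutative reduction would still combine its pieces in order.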


    Intel® Threading Building Blocks

    • Parallel solution with Intel® TBB : using parallel_reduce

    #include "tbb/blocked_range.h"
    #include "tbb/parallel_reduce.h"
    using namespace tbb;

    const int IdealGrainSize = <some number>;

    class SumArray {
        int* array;
    public:
        int sum;
        SumArray (int* a): array(a), sum(0) {}
        void operator()( const blocked_range<int>& r ) {
            for (int i = r.begin(); i != r.end(); i++ ) {
                sum += array[i];
            }
        }
        SumArray (SumArray& partial_sum, split): array(partial_sum.array), sum(0) {}
        void join (const SumArray& partial_sum) { sum += partial_sum.sum; }
    };

    int SumArrayParallel (int* a, int n)
    {
        SumArray sum_array (a);
        parallel_reduce (blocked_range<int>(0, n, IdealGrainSize), sum_array);
        return sum_array.sum;
    }

    • Class SumArray models the parallel_reduce Body Concept

    • Calculate the partial ‘sum’ of array elements in the body of operator()

    • Define the splitting constructor

    • Perform the reduction in the body of ‘join’

    • Call the generic function parallel_reduce<Range, Body>



    Intel® Threading Building Blocks

    Generic Concurrent Containers



    Intel® Threading Building Blocks

    Concurrent Containers

    • Provides concurrent containers

      • STL containers are not thread-safe: attempting to modify them concurrently can corrupt the container

      • Standard practice is to wrap a lock around STL containers

        • This turns the container into a serial bottleneck

    • Interfaces are similar to STL but do not match it exactly.

      • Some STL interfaces are inherently not thread-safe

    • Fine-grained locking or lockless implementations

      • Worse single-thread performance, but better scalability.

      • Can be used with the library, OpenMP, or native threads.
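The "wrap a lock around STL containers" practice can be sketched in standard C++; `LockedVector` and `fill_concurrently` are hypothetical names. Every operation takes the same `std::mutex`, so the container stays correct under concurrent `push_back`. However, all threads serialize on that one lock, which is the bottleneck the concurrent containers are designed to avoid.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// "Wrap a lock around an STL container": correct, but every call from every
// thread serializes on the single mutex_.
template <typename T>
class LockedVector {
    std::vector<T> items_;
    mutable std::mutex mutex_;
public:
    void push_back(const T& value) {
        std::lock_guard<std::mutex> guard(mutex_);
        items_.push_back(value);
    }
    std::size_t size() const {
        std::lock_guard<std::mutex> guard(mutex_);
        return items_.size();
    }
};

// Hypothetical helper: `threads` threads each push `per_thread` values.
std::size_t fill_concurrently(int threads, int per_thread) {
    LockedVector<int> v;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t)
        workers.emplace_back([&v, per_thread] {
            for (int i = 0; i < per_thread; ++i) v.push_back(i);
        });
    for (std::thread& w : workers) w.join();
    return v.size();
}
```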



    Intel® Threading Building Blocks

    Concurrent Containers : concurrent_hash_map

    • concurrent_hash_map <Key, T, HashCompare>

      • Maps a Key to an element of type T

      • Hash table of std::pair <const Key, T>

      • You implement a HashCompare class that defines two methods: ‘hash’ (maps a Key to a hash code of type size_t) and the predicate ‘equal’ (returns true if two Keys are equal)
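A HashCompare class might look like the following sketch, here for case-insensitive `std::string` keys. `CaseInsensitiveHashCompare` is a hypothetical name, and the two static member functions follow the style of the TBB tutorial's examples. The one invariant to keep is that keys reported as `equal` must produce the same `hash`.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical HashCompare for case-insensitive string keys.
struct CaseInsensitiveHashCompare {
    // Hash the lowercased characters, so "TBB" and "tbb" hash identically.
    static std::size_t hash(const std::string& key) {
        std::size_t h = 0;
        for (unsigned char ch : key)
            h = h * 31 + static_cast<std::size_t>(std::tolower(ch));
        return h;
    }
    // Keys are equal if they match character by character, ignoring case.
    static bool equal(const std::string& a, const std::string& b) {
        if (a.size() != b.size()) return false;
        for (std::size_t i = 0; i < a.size(); ++i)
            if (std::tolower(static_cast<unsigned char>(a[i])) !=
                std::tolower(static_cast<unsigned char>(b[i])))
                return false;
        return true;
    }
};
```

With the library, the class would be passed as the third template argument, as in `concurrent_hash_map<std::string, int, CaseInsensitiveHashCompare>`.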



    Intel® Threading Building Blocks

    Concurrent Containers : concurrent_vector

    • concurrent_vector <T>

      • Dynamically growable array of T: grow_by and grow_to_at_least

      • The clear() method is not thread-safe with respect to resizing

      • concurrent_vector never moves an element until the array is cleared



    Intel® Threading Building Blocks

    Concurrent Containers : concurrent_queue

    • concurrent_queue <T>

      • For a single-threaded run, it supports first-in-first-out ordering

      • If one thread pushes two values and another thread pops those two values, they come out in the order in which they were pushed

      • size() returns a signed number: if the queue is empty and size() returns -n, then n pops are pending

      • Method ‘empty’ returns true if size() is zero or negative
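The signed `size()` semantics can be sketched with a mutex and a condition variable. `SignedSizeQueue` is a hypothetical, lock-based illustration, not `tbb::concurrent_queue`. Its `size()` returns the number of pushes minus the number of pops that have started, so a consumer blocked in `pop()` on an empty queue makes `size()` negative. `empty()` is defined as `size() <= 0`.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>

template <typename T>
class SignedSizeQueue {
    std::deque<T> items_;
    std::ptrdiff_t waiting_ = 0;          // pops that have started but not finished
    mutable std::mutex mutex_;
    std::condition_variable not_empty_;
public:
    void push(const T& value) {
        {
            std::lock_guard<std::mutex> guard(mutex_);
            items_.push_back(value);
        }
        not_empty_.notify_one();          // wake one blocked consumer, if any
    }

    // Blocks until an element is available.
    T pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ++waiting_;                       // this pop is now pending
        not_empty_.wait(lock, [this] { return !items_.empty(); });
        --waiting_;
        T value = items_.front();
        items_.pop_front();
        return value;
    }

    // Pushes minus started pops: negative while consumers are waiting.
    std::ptrdiff_t size() const {
        std::lock_guard<std::mutex> guard(mutex_);
        return static_cast<std::ptrdiff_t>(items_.size()) - waiting_;
    }

    bool empty() const { return size() <= 0; }
};
```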



    Intel® Threading Building Blocks

    Synchronization Primitives



    Intel® Threading Building Blocks

    Synchronization Primitives : Mutex Concept

    Mutexes are C++ objects based on the scoped locking pattern: constructing a scoped_lock acquires the mutex, and its destructor releases it
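The scoped locking pattern can be sketched with a hypothetical `SpinMutex` in standard C++. It is not `tbb::spin_mutex` and is simpler than it, but it mirrors the nested `scoped_lock` style. The lock's constructor spins until it acquires the mutex, and its destructor releases it. The mutex is therefore released on every path out of the scope, including exceptions.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical SpinMutex (not tbb::spin_mutex) following the scoped locking
// pattern: the mutex is locked and unlocked only through scoped_lock objects.
class SpinMutex {
    std::atomic<bool> locked_{false};
public:
    class scoped_lock {
        SpinMutex& mutex_;
    public:
        // Constructor acquires: spin (yielding between attempts) until the
        // flag was previously false.
        explicit scoped_lock(SpinMutex& m) : mutex_(m) {
            while (mutex_.locked_.exchange(true, std::memory_order_acquire))
                std::this_thread::yield();
        }
        // Destructor releases, on every path out of the enclosing scope.
        ~scoped_lock() { mutex_.locked_.store(false, std::memory_order_release); }

        scoped_lock(const scoped_lock&) = delete;
        scoped_lock& operator=(const scoped_lock&) = delete;
    };
};

// Hypothetical helper: `threads` threads each increment a shared counter
// `per_thread` times under the lock; returns the final count.
int count_with_spin_mutex(int threads, int per_thread) {
    SpinMutex mutex;
    int counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t)
        workers.emplace_back([&] {
            for (int i = 0; i < per_thread; ++i) {
                SpinMutex::scoped_lock lock(mutex);   // acquired here
                ++counter;
            }                                         // released here
        });
    for (std::thread& w : workers) w.join();
    return counter;
}
```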



    Intel® Threading Building Blocks

    Synchronization Primitives : Mutex Flavors

    • spin_mutex

      • Non-reentrant, unfair, spins in the user space

      • VERY FAST in lightly contended situations; use it if you need to protect very few instructions

    • queuing_mutex

      • Non-reentrant, fair, spins in the user space

      • Use queuing_mutex when scalability and fairness are important

    • queuing_rw_mutex

      • Non-reentrant, fair, spins in the user space

    • spin_rw_mutex

      • Non-reentrant, unfair, spins in the user space

      • Use a ReaderWriterMutex (spin_rw_mutex or queuing_rw_mutex) to let multiple threads read concurrently without blocking each other

    • mutex

      • Wrapper for OS sync: CRITICAL_SECTION for Windows*, pthread_mutex on Linux*
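Fairness in a spinning lock can be illustrated with a ticket lock, swapped in here as a simple fair lock. It is not the queue-based algorithm behind `queuing_mutex`. Threads take numbered tickets and are admitted in ticket order, whereas in an unfair spin lock any waiting thread may win. `TicketMutex` and `count_with_ticket_mutex` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical fair spin lock: first come, first served by ticket number.
class TicketMutex {
    std::atomic<unsigned> next_ticket_{0};
    std::atomic<unsigned> now_serving_{0};
public:
    void lock() {
        const unsigned my_ticket = next_ticket_.fetch_add(1, std::memory_order_relaxed);
        while (now_serving_.load(std::memory_order_acquire) != my_ticket)
            std::this_thread::yield();    // spin until it is this ticket's turn
    }
    void unlock() {
        now_serving_.fetch_add(1, std::memory_order_release);   // admit the next ticket
    }
};

// Hypothetical helper: `threads` threads each increment a shared counter
// `per_thread` times under the ticket lock; returns the final count.
int count_with_ticket_mutex(int threads, int per_thread) {
    TicketMutex mutex;
    int counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t)
        workers.emplace_back([&] {
            for (int i = 0; i < per_thread; ++i) {
                std::lock_guard<TicketMutex> guard(mutex);   // lock()/unlock() suffice
                ++counter;
            }
        });
    for (std::thread& w : workers) w.join();
    return counter;
}
```

The fairness has a cost: under contention, each release can only hand the lock to one specific waiter, which is one reason fair locks may be slower than unfair ones in lightly contended code.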



    Intel® Threading Building Blocks

    Synchronization Primitives : Example of spin_rw_mutex

    • Allows multiple threads to read the protected data, but only one thread (the writer) can change it, exclusively

    • Upgrade/Downgrade operations

      • upgrade_to_writer: returns true if it upgraded the lock without temporarily releasing the mutex; false means the mutex was released and reacquired, so another writer may have changed the data in between

      • downgrade_to_reader

    #include "tbb/spin_rw_mutex.h"
    using namespace tbb;

    spin_rw_mutex MyMutex;

    int foo () {
        /* Construction of ‘lock’ acquires ‘MyMutex’ as a reader */
        spin_rw_mutex::scoped_lock lock (MyMutex, /*is_writer*/ false);
        if (!lock.upgrade_to_writer ()) { /* … */ }
        else { /* … */ }
        return 0;
        /* Destructor of ‘lock’ releases ‘MyMutex’ */
    }
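Standard C++17 provides a reader-writer mutex, `std::shared_mutex`, but no in-place upgrade. The closest idiom is to release the read lock, take the write lock, and re-check, which is the same situation as `upgrade_to_writer()` returning false. The hypothetical `Cache` below sketches that pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>

// Hypothetical read-mostly cache guarded by a C++17 reader-writer mutex.
class Cache {
    std::map<std::string, int> data_;
    mutable std::shared_mutex mutex_;
public:
    int get_or_insert(const std::string& key, int value_if_missing) {
        {
            std::shared_lock<std::shared_mutex> read(mutex_);   // many readers at once
            auto it = data_.find(key);
            if (it != data_.end()) return it->second;
        }   // read lock released: std::shared_mutex cannot be upgraded in place
        std::unique_lock<std::shared_mutex> write(mutex_);     // exclusive writer
        // Another writer may have inserted `key` while no lock was held, so
        // emplace() re-checks: it inserts only if the key is still missing.
        auto result = data_.emplace(key, value_if_missing);
        return result.first->second;
    }
    std::size_t size() const {
        std::shared_lock<std::shared_mutex> read(mutex_);
        return data_.size();
    }
};
```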



    Intel® Threading Building Blocks

    Advanced Features Overview



    Intel® Threading Building Blocks


    • Concurrent Containers: concurrent_hash_map, concurrent_queue, concurrent_vector

    • Generic Parallel Algorithms: parallel_for, parallel_while, parallel_reduce, pipeline, parallel_sort, parallel_scan

    • task_scheduler

    • Low-Level Synchronization Primitives: spin_mutex, queuing_rw_mutex, spin_rw_mutex, mutex



    Intel® Threading Building Blocks : Summary

    • Scalable data-parallel decomposition, providing patterns for parallel algorithms and concurrent data structures

    • Paradigm of logical tasks that the task scheduler efficiently and automatically maps onto physical threads

    • Works well for computationally intensive tasks, because the task scheduler efficiently load-balances tasks across the physical threads and is cache-aware


    RESOURCES

    • Resource-1

      • http://www.threadingbuildingblocks.org/

    • Resource-2

      • You may participate in our community support web site.

      • Tools Knowledge Base: http://software.intel.com/en-us/articles/tools

      • User forums: http://software.intel.com/en-us/forums/

      • Intel® Software Product support info: http://www.intel.com/software/support





    THANKS │ 18-20 August 2010
