Intel Array Building Blocks
Intel Array Building Blocks (ArBB) is a C++ API designed to simplify parallel programming on multicore processors. It grew out of Intel Ct, a SIMD-oriented programming model developed in 2007, and RapidMind, which Intel acquired in 2009. By leveraging Single Instruction, Multiple Data (SIMD) techniques, ArBB supports data-intensive computations in fields such as bioinformatics, engineering, and financial analytics. With built-in protection against race conditions and deadlocks, ArBB lets developers focus on application logic while the details of the hardware and vector instruction set architecture are abstracted away.
Presentation Transcript
Intel Array Building Blocks By: Edward Jones
Background • Intel Ct: • Developed in 2007 • Parallel programming model for multicore chips • Exploits Single Instruction, Multiple Data (SIMD) • RapidMind • Started in 2004 • Provided software product that simplifies the use of multi-core processors and graphics processing units (GPUs) • Intel acquired RapidMind on August 19, 2009
Intel ArBB • Intel ArBB is a C++ API • Promotes parallel programming • Hides the intricacies of the hardware and vector ISA • Oriented to data-intensive mathematical computations • Built-in protection • An ArBB program cannot create race conditions or deadlocks by default
What is it used for? • Bioinformatics • Engineering Design • Financial Analytics • Oil and Gas • Medical Imaging • Visual Computing • Signal and Image Processing • Science and Research • Enterprise
Extend C++ • Uses standard C++ features to create new types and operators • Constructs of ArBB • Scalar types – equivalent to primitive C++ types • Vector types – parallel collections of scalar data • Operators – scalar and vector operators • Functions – user-defined code fragments • Control flow
Dense Containers • Very similar to vectors • Can change size at runtime • Operations: • Element-wise scalar operations • Indexing • Reordering • Reductions • Property access • Most operations run in parallel
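Dense Containers Example
#include <arbb.hpp>
using namespace arbb;

#define SIZE 1024

// Element-wise sum of two dense containers.
void vecsum(dense<f32> a, dense<f32> b, dense<f32>& c) {
    c = a + b;
}

int main(int argc, char** argv) {
    float a[SIZE];
    float b[SIZE];
    float c[SIZE];
    // Bind each ArBB container to its C++ array.
    dense<f32> va; bind(va, a, SIZE);
    dense<f32> vb; bind(vb, b, SIZE);
    dense<f32> vc; bind(vc, c, SIZE);
    // Invoke vecsum through the ArBB runtime.
    call(vecsum)(va, vb, vc);
}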
Element-wise and Vector-scalar Operators • All standard C++ arithmetic, bitwise, and logical operators can be used in vector computations • This allows these operations to be done in parallel to speed up runtime. • Other operators
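The operator notation described on this slide can be sketched in standard C++ with std::valarray, which also overloads arithmetic operators for whole arrays. This is only an analogy, not ArBB code: the function names elementwise_sum and scale are made up for illustration, and std::valarray does not by itself run anything in parallel.

```cpp
#include <cassert>
#include <valarray>

// Element-wise sum of two equally sized arrays: result[i] = a[i] + b[i].
// (Hypothetical helper; std::valarray stands in for an ArBB dense container.)
std::valarray<float> elementwise_sum(const std::valarray<float>& a,
                                     const std::valarray<float>& b) {
    assert(a.size() == b.size());  // valarray arithmetic needs matching sizes
    return a + b;
}

// Vector-scalar operation: every element is multiplied by the same factor.
std::valarray<float> scale(const std::valarray<float>& a, float factor) {
    return a * factor;
}
```

For example, elementwise_sum applied to {1, 2, 3} and {10, 20, 30} gives {11, 22, 33}, with no explicit loop in user code.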
Collective Operators • Perform computations where the output(s) depend on all of the inputs. • Reduction – applies an operator over an entire vector to compute a distilled value or values. Example: add_reduce([1 0 2 -1 4]) yields 6 • Scan – computes reductions on all prefixes of a collection. Example: add_iscan([1 0 2 -1 4]) yields [1 (1+0) (1+0+2) (1+0+2+(-1)) (1+0+2+(-1)+4)] = [1 1 3 2 6]
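The two collective operations can be illustrated with sequential standard C++ algorithms. The names add_reduce_sketch and add_iscan_sketch are hypothetical stand-ins, not the ArBB functions, and the sketch only shows what the results mean, not how ArBB computes them in parallel.

```cpp
#include <numeric>
#include <vector>

// Reduction: fold the whole collection into a single value using +.
int add_reduce_sketch(const std::vector<int>& v) {
    return std::accumulate(v.begin(), v.end(), 0);
}

// Inclusive scan: element i of the result is the sum of v[0] through v[i].
std::vector<int> add_iscan_sketch(const std::vector<int>& v) {
    std::vector<int> out(v.size());
    std::partial_sum(v.begin(), v.end(), out.begin());
    return out;
}
```

With the slide's input [1 0 2 -1 4], the reduction returns 6 and the scan returns [1 1 3 2 6]; the last element of an inclusive scan always equals the reduction.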
Other Types of Operators • Permutation Operators • These operations alter the size and order of vectors • a = shift(b, -1, value); • a = rotate(b, -1); • Facility Operators • Provide data processing features
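As a rough standard C++ analogy for shift and rotate, std::valarray offers shift and cshift. The sketch below assumes nothing about ArBB's direction convention or fill-value handling: std::valarray always fills vacated slots with zero, whereas the slide's shift takes an explicit fill value. The wrapper names are hypothetical.

```cpp
#include <valarray>

// Shift: elements move by n positions and vacated slots are zero-filled.
// In std::valarray a positive n moves elements toward index 0,
// a negative n moves them toward the end.
std::valarray<int> shift_sketch(const std::valarray<int>& v, int n) {
    return v.shift(n);
}

// Rotate: like shift, but elements leaving one end re-enter at the other.
std::valarray<int> rotate_sketch(const std::valarray<int>& v, int n) {
    return v.cshift(n);
}
```

For v = {1, 2, 3, 4, 5}, shift_sketch(v, -1) gives {0, 1, 2, 3, 4} and rotate_sketch(v, -1) gives {5, 1, 2, 3, 4}.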
Differences from C++
_for (i32 i = 0, i <= N, i++) {
    /* code */
} _end_for;

_if (condition) {
    /* code */
} _else {
    /* code */
} _end_if;

_while (condition) {
    /* code */
} _end_while;
Functions • Calling ArBB functions is different from normal function calls • Form: mfc fnct = call(my_function); • Calling a function creates a closure for that function • The closure is created on the first call and reused on every later call • Allows for currying • The ‘map’ function lets the programmer execute a function for every element in a vector
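The ideas of mapping a function over a vector and fixing an argument ahead of time can be sketched in plain C++ with std::transform and a lambda. This is not the ArBB API: map_sketch and bind_first are hypothetical helpers, and bind_first shows partial application, which is closely related to (but not strictly the same as) currying.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// "map": apply a scalar function to every element of a vector.
std::vector<float> map_sketch(const std::function<float(float)>& f,
                              const std::vector<float>& in) {
    std::vector<float> out(in.size());
    std::transform(in.begin(), in.end(), out.begin(), f);
    return out;
}

// Fix the first argument of a two-argument function and return a
// one-argument callable that remembers it (a closure).
std::function<float(float)> bind_first(
        const std::function<float(float, float)>& f, float a) {
    return [f, a](float x) { return f(a, x); };
}
```

Binding 3 as the first argument of a multiply function and mapping the result over {1, 2, 4} gives {3, 6, 12}.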
Dynamic Execution Engine • Array Building Blocks provides a dynamic execution engine which comprises three major services: • Threading Runtime • Provides a fine-grained model for data- and task-parallel threading • Memory Manager • Segregates normal C++ memory from the ArBB memory • Set of lock-free memory interfaces as a garbage collector • Just-in-time Compiler/Dynamic Engine • Constructs an intermediate representation of computations, performs optimizations, and generates code.
Monte Carlo Computation of Pi: C/C++
#include <cmath>
#include <cstdlib>

double computepi() {
    int cnt = 0;
    for (int i = 0; i < NEXP; i++) {
        // Random point in the unit square.
        float x = float(rand()) / float(RAND_MAX);
        float y = float(rand()) / float(RAND_MAX);
        float dst = sqrtf(x*x + y*y);
        // Count points that fall inside the quarter circle.
        if (dst <= 1.0f) {
            cnt++;
        }
    }
    return 4.0 * ((double) cnt) / NEXP;
}
*NEXP = O(2p(n))
Monte Carlo Computation of Pi: ArBB
void computepi(f64& pi) {
    random_generator rng;
    dense<f32> x = rng.randomize(NEXP);
    dense<f32> y = rng.randomize(NEXP);
    dense<f32> dist = sqrt(x*x + y*y);
    // Mask of points inside the quarter circle, converted to 1/0 counts.
    dense<boolean> mask = (dist <= 1.0f);
    dense<i32> cnt = select(mask, 1, 0);
    pi = 4.0 * add_reduce(cnt) / NEXP;
}
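The whole-collection structure of the ArBB version (generate, mask, reduce) can be mirrored in standard C++ for comparison. This is a sequential sketch under stated assumptions, not ArBB code: compute_pi_sketch, its parameters, and the choice of std::mt19937 are all illustrative, and the square root is skipped because x*x + y*y <= 1 is equivalent to the distance test.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Estimate pi from n random points in the unit square, written as
// separate whole-collection steps rather than one fused loop.
double compute_pi_sketch(std::size_t n, unsigned seed) {
    assert(n > 0);
    std::mt19937 gen(seed);
    std::uniform_real_distribution<float> unit(0.0f, 1.0f);

    // Step 1: fill the coordinate collections.
    std::vector<float> x(n), y(n);
    for (std::size_t i = 0; i < n; ++i) {
        x[i] = unit(gen);
        y[i] = unit(gen);
    }

    // Step 2: 1 where the point lies inside the quarter circle, else 0
    // (the role played by mask and select in the ArBB slide).
    std::vector<int> inside(n);
    for (std::size_t i = 0; i < n; ++i) {
        inside[i] = (x[i] * x[i] + y[i] * y[i] <= 1.0f) ? 1 : 0;
    }

    // Step 3: reduce the 1/0 collection to a hit count.
    long long hits = std::accumulate(inside.begin(), inside.end(), 0LL);
    return 4.0 * static_cast<double>(hits) / static_cast<double>(n);
}
```

With a million samples the estimate typically lands within a few thousandths of pi; the exact value depends on the seed and the standard library's distribution implementation.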
Intel ArBB Today • Preview Release August 25, 2011 • 1.0 beta 6 • Project retired by Intel October 2012 • Overshadowed by Intel Cilk Plus and Intel Threading Building Blocks
Sources
http://www.drdobbs.com/parallel/array-building-blocks-a-flexible-paralle/227300084
http://openlab-mu-internal.web.cern.ch/openlab-mu-internal/03_Documents/4_Presentations/Slides/2010-list/02_CERN_openLab_Workshop-2010_Hans_Pabst.pdf