1 / 23

Hierarchically Tiled Arrays

Hierarchically Tiled Arrays. Taylor Finnell Alex Schwartz. Overview. Background and g oals of HTA approach Tiling and Locality Structure of HTAs Creating an HTA Accessing an HTA’s components Operations Communication Operations Global Computations Implementations of HTA (Libraries)

cutter
Download Presentation

Hierarchically Tiled Arrays

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Hierarchically Tiled Arrays Taylor Finnell Alex Schwartz

  2. Overview • Background and goals of HTA approach • Tiling and Locality • Structure of HTAs • Creating an HTA • Accessing an HTA’s components • Operations • Communication Operations • Global Computations • Implementations of HTA (Libraries) • MATLAB • C++ • Conclusions

  3. Background and Goals • Developed in 2006 by researchers from University of Illinois, IBM, and University of Coruña(Spain) • HTA is the first attempt to make tiles part of a language • Main goals of HTA: • Extend a sequential language to support parallel computation • Facilitate • Control of locality • Communication in parallel programs

  4. Locality • Definition • The phenomenon of the same value or related storage locations being frequently accessed • Types • Temporal Locality • Reuse of specific data within relatively small time durations • Spatial Locality • Use of data elements within relatively close storage locations • Sequential Locality • Special case of Spatial Locality • Data elements are arranged and accessed linearly

  5. Tiling • The structured division of memory to facilitate parallel computation • Used to organize computations so that communication costs in parallel programs are reduced • Important in scientific computing • Many iterative and recursive algorithms can be expressed naturally with tiles • Planned by programmer during program design • Some compilers have automatic tiling • Not always effective in tiling of complex algorithms

  6. Structure of HTA • Type of tiled array object • Tiles can be composed of additional nested tiles • Uses array operations to manipulate tiles directly • Parallel computations and interprocessor communications are represented by assignments between HTAs

  7. Creating an HTA • From an existing array (pictured) • Empty HTA of thesame size can becreated usinghta(3,3) • Empty HTAs must beassigned content tobe considered com-plete

  8. Accessing an HTA • Accessing Tiles • C{2,1} refers to lowerleft tile • Accessing Elements Directly • C(5,4) refers to specific element, disregarding the tile structure • Note parenthesis vs. brackets • Accessing Elements Relatively • C{2,1}{1,2}(1,2) and C{2,1}(1,4) refer to the same element as C(5,4)

  9. Accessing an HTA (cont.) • Regional Access (Flattening) • Disregards tiling, returns a standard array • C(1:2, 3:6) in example returns 2x4 matrix • Logical Indexing/Selection • Matrix of boolean valueswith same dimensions asthe HTA

  10. Operations on HTAs • Two types of operations • Communication Operations • Global Computations

  11. Communication Operations • Represented as assignments on distributed HTAs • Array Operations • HTA's provide overloaded array operations (circular shift, transpose, arithmetic operators, etc.) so that you can perform these operations at the tile level • Ex. Circular Shift will shift whole tiles instead of individual array elements

  12. Communication Operations (cont.) • Example: Cannon’s algorithm • Distributed algorithm for matrix multiplication • Uses circular shift to distribute the work • Normal implementation • shifts rows and columns • Performs the multiplication element by element • HTA implementation • Shifts tiles • Performs the multiplication as matrix multiplication of tiles • Each processor owns a tile • Advantages • Aggregates data into a tile for communication • Increased locality due to single matrix multiplication • Locality further increased if more levels of tiling are used

  13. Cannon’s Algorithm • Can use an alternative matrixmultiplication function toincrease locality

  14. Global Computations • Passing an HTA to a function/operation • Operates in parallel on a set of tiles from an HTA distributed across a parallel machine • parHTA(@func, H) • @func is a function pointer • Reduction • An operation applied to all the components of a vector to produce a scalar • Applied to all the components of a n-dimensional array to produce a n-1 dimensional array • Matrices have row reduction and column reduction • Ex. Matrix Vector multiplication

  15. Global Computations (cont.) • Ex. Matrix Vector multiplication • Matrix-vector multiplication takes place locally at each processor (the A*B part)

  16. Implementations of HTA • MATLAB • Advantages • Disadvantages • C++ • Allocation approach • Advantages over MATLAB • Comparison to existing paradigm (MPI)

  17. MATLAB implementation • Advantages • MATLAB's syntax for array access is convenient for representing HTA access (“Triplet Notation”) • Disadvantages • Not as efficient as C++'s implementation • Interpreted MATLAB limits efficiency due to immense overhead • Lots of temporary variables created to hold partial expression results • Parameters passed by value • Assignment statements also replicate data

  18. C++ Implementation • Allocation/construction of HTAs • HTAs represented as composite objects with methods to operate on HTAs • HTAs allocated on heap, return a handle/reference • All HTA access occurs through this handle • Uses reference counting for garbage collection • Advantages over MATLAB • Higher performance than MATLAB implementation • Memory layout is controllable by the user unlike MATLAB

  19. HTA vs. MPI (Message Passing Interface) • HTA is integrated into languages using operator overloading and polymorphism • Improved readability • HTA requires much less code • Communication takes place through assignment rather than • send and receive instructions • packing and unpacking data • checking boundary conditions • HTAs are partitioned and distributed through a single constructor • MPI has to compute • limits of data owned by each processor • Neighbors of a processor • Acitve set of processors, etc. • Communication is made more obvious

  20. HTA vs. MPI (cont.)

  21. Conclusions • Ideal for algorithms that can be naturally expressed in terms of blocks • Communication is less involved versus other parallel programming approaches • Slow and elegant in MATLAB, but fast and verbose in C++ • Due to lack of operator overloading in the C++ library • Good for architectures consisting of multiple independent processing cores • One example given by the authors is the CELL processor

  22. References Bikshandi, G. (2006, October 12). Hierarchically Tiled Arrays: A Programming paradigm for parallelism and locality. Urbana, IL, USA.:www.cs.uiuc.edu/class/fa06/cs498dp/notes/hta.pdf Bikshandi, Guo, & Hoeflinger. (2006, March 29). Programming for Parallelism and Locality with Hierarchically Tiled Arrays. New York, NY, USA.:https://wiki.engr.illinois.edu/download/attachments/195766122/p48-bikshandi.pdf Various. (2010, January 15). Programming for Parallelism and Locality with Hierarchically Tiled Arrays. Retrieved July 12, 2012, from CSU Computer Science Department Wiki: https://www.cs.colostate.edu/wiki/Programming_for_Parallelism_and_Locality_with_Hierarchically_Tiled_Arrays. Various. (2012, May 16). Locality of Reference. Retrieved July 12, 2012, from Wikipedia: http://en.wikipedia.org/wiki/Locality_of_reference

  23. Hierarchically Tiled Arrays Taylor Finnell Alex Schwartz

More Related