
Data Locality & Its Optimization Techniques

Presented by

Preethi Rajaram

CSS 548 Introduction to Compilers

Professor Carol Zander

Fall 2012



Why?

  • Processor Speed - increasing at a faster rate than the memory speed

  • Computer Architectures - more levels of cache memory

  • Cache - takes advantage of data locality

  • Good Data Locality - good application performance

  • Poor Data Locality - reduces the effectiveness of the cache



Data Locality

  • The property that references to the same memory location, or to adjacent locations, are reused within a short period of time

  • Temporal locality

  • Spatial locality

    Fig: Program to find the squares of the differences (a) without loop fusion (b) with loop fusion

    [Image from: The Dragon book 2nd edition]
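
The figure itself is not reproduced in this transcript. The following is a minimal C sketch of the same idea (array names X, Y, Z and the function names are illustrative, not taken from the book's figure):

    #include <stddef.h>

    /* (a) Without loop fusion: every Z[i] is written in the first loop and
     *     read again in the second, long after its cache line may have been
     *     evicted.                                                          */
    void squares_of_differences_unfused(const double *X, const double *Y,
                                        double *Z, size_t n) {
        for (size_t i = 0; i < n; i++)
            Z[i] = X[i] - Y[i];
        for (size_t i = 0; i < n; i++)
            Z[i] = Z[i] * Z[i];
    }

    /* (b) With loop fusion: each difference is squared immediately after it
     *     is computed, while it is still in a register or the cache.        */
    void squares_of_differences_fused(const double *X, const double *Y,
                                      double *Z, size_t n) {
        for (size_t i = 0; i < n; i++) {
            double d = X[i] - Y[i];
            Z[i] = d * d;
        }
    }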



Matrix Multiplication - Example

Fig: Basic Matrix Multiplication Algorithm

[Image from: The Dragon book 2nd edition]
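
The figure is not reproduced here; the basic algorithm is the standard triple loop. A C sketch (with an illustrative size N and row-major arrays) looks like this:

    #define N 1024   /* illustrative matrix size */

    /* Z = X * Y, all N x N and row-major. The innermost loop walks down a
     * column of Y, so successive iterations touch different cache lines of Y. */
    void matmul_basic(double Z[N][N], const double X[N][N], const double Y[N][N]) {
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++) {
                Z[i][j] = 0.0;
                for (int k = 0; k < N; k++)
                    Z[i][j] += X[i][k] * Y[k][j];
            }
    }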

  • Poor data locality

  • n² multiply-add operations separate reuses of the same data element in matrix Y

  • n operations separate reuses of the same cache line of Y

  • Solutions

    • Changing the layout of the data structures

    • Blocking



Matrix Multiplication – Example Contd…

  • Changing the data structure layout

    • Store Y in column-major order

    • Improves reuse of cache lines of matrix Y

    • Limited Applicability

  • Blocking

    • Changes the execution order of instructions

    • Divide the matrix into submatrices or blocks

    • Order the operations so that an entire block is used over a short period of time

    • Choose the block size B such that one block from each of the matrices fits in the cache

  [Image from: The Dragon book 2nd edition]
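
A minimal C sketch of blocking, assuming a block size B small enough that one B × B block of each matrix fits in the cache (N and B are illustrative, and B is assumed to divide N evenly):

    #define N 1024
    #define B 32    /* block size; assumed to divide N evenly for brevity */

    /* Blocked Z = X * Y: the three outer loops step over B x B blocks, and the
     * three inner loops reuse each block repeatedly while it is cache-resident. */
    void matmul_blocked(double Z[N][N], const double X[N][N], const double Y[N][N]) {
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                Z[i][j] = 0.0;
        for (int ii = 0; ii < N; ii += B)
            for (int jj = 0; jj < N; jj += B)
                for (int kk = 0; kk < N; kk += B)
                    for (int i = ii; i < ii + B; i++)
                        for (int j = jj; j < jj + B; j++)
                            for (int k = kk; k < kk + B; k++)
                                Z[i][j] += X[i][k] * Y[k][j];
    }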



Data Reuse

  • Locality Optimization

  • Identify the set of iterations that access the same data or the same cache line

  • Static access - an instruction in the program, e.g. x = Z[i,j]

  • Dynamic access - the execution of an instruction many times, as in a loop nest

  • Types of Reuse

    • Self

      • Iterations using the same data come from the same static access

    • Group

      • Iterations using the same data come from different static accesses

    • Temporal

      • The exact same memory location is referenced

    • Spatial

      • The same cache line is referenced
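
A small hypothetical loop nest (not from the slides) showing these kinds of reuse side by side:

    /* Illustrative 2-deep loop nest:
     *   X[i]         - self-temporal reuse: the same element is read in every
     *                  iteration of the inner j loop
     *   Z[j]         - self-spatial reuse: consecutive j touch the same cache
     *                  line (row-major layout)
     *   Z[j], Z[j+1] - group reuse: two different static accesses touch the
     *                  same elements of Z                                    */
    void reuse_kinds(double *Z, const double *X, int n) {
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n - 1; j++)
                Z[j] = Z[j + 1] + X[i];
    }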



Self Temporal Reuse

  • Exploiting self reuse can save a substantial number of memory accesses

  • Data accessed with dimensionality k in a loop nest of depth d is reused n^(d−k) times

    e.g., if a 3-deep loop nest accesses one column of an array, there is a potential saving of n² accesses

  • Dimensionality of the access = rank of the coefficient matrix of the access

  • Iterations referring to the same location are characterized by the null space of that matrix

  • Rank of a Matrix

    • The number of linearly independent rows (or columns)

  • Null Space of a Matrix

    • The set of vectors that the matrix maps to zero; iterations whose difference lies in the null space access the same array element

    • A reference of rank r in a d-deep loop nest accesses O(n^r) data elements over O(n^d) iterations, so on average O(n^(d−r)) iterations refer to the same array element

[Worked example from the slide (coefficient matrix not reproduced): loop depth = 3; the 2nd row equals the 1st + the 3rd and the 4th row equals the 3rd − 2 × the 1st, so rank = dimensionality = 2 and nullity = 3 − 2 = 1]
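
Since the matrix itself is missing from the transcript, here is a hypothetical coefficient matrix (not the one on the slide) with the same row relationships, worked through in LaTeX:

    % Hypothetical 4x3 coefficient matrix F (illustrative only) in which
    % row 2 = row 1 + row 3 and row 4 = row 3 - 2*row 1, as on the slide.
    % (Requires amsmath.)
    \[
    F = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 1 & 0 \\ -2 & 1 & 0 \end{pmatrix},
    \qquad \operatorname{rank}(F) = 2,
    \qquad \text{nullity}(F) = d - \operatorname{rank}(F) = 3 - 2 = 1
    \]
    % The null space is spanned by (0, 0, 1)^T: iterations that differ only in
    % the innermost loop index access the same array element.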



Self Spatial Reuse

  • Depends on the data layout of the array, e.g. row-major order

  • In an array of d dimensions, array elements are assumed to share a cache line only if they differ solely in the last dimension

    e.g. in a 2-D array, two elements share the same cache line if and only if they lie in the same row

  • The truncated matrix is obtained by dropping the last row of the coefficient matrix

  • If the truncated matrix has rank r less than the loop depth d, spatial reuse is assured

[Example from the slide (truncated matrix not reproduced): r = 1, d = 2; since r < d, spatial reuse is assured]
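
A C sketch of the consequence for row-major arrays (N is illustrative): traversing the last dimension in the inner loop reuses cache lines, while traversing the first dimension does not:

    #define N 1024

    /* Good self-spatial reuse: the inner loop varies the last index j, so
     * consecutive iterations touch adjacent elements on the same cache line. */
    double sum_row_order(const double A[N][N]) {
        double s = 0.0;
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                s += A[i][j];
        return s;
    }

    /* Poor self-spatial reuse: the inner loop varies the first index i, so
     * consecutive iterations are N elements apart, usually on a new cache line. */
    double sum_column_order(const double A[N][N]) {
        double s = 0.0;
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                s += A[i][j];
        return s;
    }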



Group Reuse

  • Group reuse is considered only among accesses in a loop that share the same coefficient matrix

    Fig: 2-deep loop nest

    [Image from: The Dragon book 2nd edition]

  • Z[i,j] and Z[i-1,j] access almost the same set of array elements

  • The data read by access Z[i-1,j] is the same as the data written by access Z[i,j], except for i = 1

Coefficient matrix: rank = 2, so no self-temporal reuse

Truncated matrix: rank = 1, so self-spatial reuse
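
The 2-deep loop nest from the figure is not reproduced here; the following C sketch is consistent with the two accesses named above, with an illustrative size N and an arbitrary right-hand side:

    #define N 1024

    /* Z[i][j] is written and Z[i-1][j] is read by a different static access with
     * the same coefficient matrix. Except for the first row, every value read
     * was written one outer-loop iteration earlier: group reuse between them.  */
    void stencil_rows(double Z[N][N]) {
        for (int i = 1; i < N; i++)
            for (int j = 0; j < N; j++)
                Z[i][j] = Z[i - 1][j] + 1.0;
    }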



Locality Optimization

  • Temporal Locality of data

    Use the results as soon as they are generated

Fig: Code excerpt for a multigrid algorithm (a) before partition (b) after partition

[Image from: The Dragon book 2nd edition]



Locality Optimization Contd…

  • Array Contraction

    Reduce the dimension of the array and reduce the number of memory locations accessed

Fig: Code excerpt for a multigrid algorithm after partition and after array contraction

[Image from: The Dragon book 2nd edition]
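
The multigrid excerpt is not reproduced here; a minimal C sketch of array contraction on a simpler, hypothetical pair of loops shows the idea. Once the producing loop and the consuming loop are merged into one partition, the temporary array can be contracted to a scalar:

    #include <stddef.h>

    /* Before contraction: T is a full temporary array of n elements, written by
     * the first loop and read back by the second.                              */
    void before_contraction(const double *A, const double *B, double *C,
                            double *T, size_t n) {
        for (size_t i = 0; i < n; i++)
            T[i] = A[i] + B[i];
        for (size_t i = 0; i < n; i++)
            C[i] = T[i] * T[i];
    }

    /* After fusing the loops and contracting the array: T shrinks to a scalar,
     * removing n memory writes and n memory reads.                             */
    void after_contraction(const double *A, const double *B, double *C, size_t n) {
        for (size_t i = 0; i < n; i++) {
            double t = A[i] + B[i];
            C[i] = t * t;
        }
    }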



Locality Optimization Contd…

  • Instead of executing the partitions one after the other, we interleave a number of them so that reuses among partitions occur close together

  • Interleaving Inner Loops in a Parallel Loop

  • Interleaving Statements in a Parallel Loop

Fig: Interleaving four instances of the inner loop

[Image from: The Dragon book 2nd edition]

Fig: The statement interleaving transformation

[Image from: The Dragon book 2nd edition]
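
The figures are not reproduced here. As a hypothetical C sketch of interleaving inner loops, the outer (parallel) loop below is blocked by a factor of four and the blocked loop is moved inside, so that four instances of the inner loop run interleaved and reuse the shared array X while it is still in the cache (names and sizes are illustrative):

    #define M 1024
    #define N 1024

    /* Before: each outer iteration j streams through the whole shared array X;
     * if X does not fit in the cache it is re-fetched from memory for every j. */
    void rows_separate(double Z[M][N], const double X[N]) {
        for (int j = 0; j < M; j++)
            for (int i = 0; i < N; i++)
                Z[j][i] = X[i] * (double)j;
    }

    /* After: four instances of the inner loop are interleaved, so each X[i]
     * brought into the cache is used by four outer iterations before moving on. */
    void rows_interleaved(double Z[M][N], const double X[N]) {
        for (int j = 0; j < M; j += 4)         /* M assumed divisible by 4 */
            for (int i = 0; i < N; i++)
                for (int jj = j; jj < j + 4; jj++)
                    Z[jj][i] = X[i] * (double)jj;
    }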



References

  • Wolf, Michael E., and Monica S. Lam. "A data locality optimizing algorithm." ACM Sigplan Notices 26.6 (1991): 30-44.

  • McKinley, Kathryn S., Steve Carr, and Chau-Wen Tseng. "Improving data locality with loop transformations." ACM Transactions on Programming Languages and Systems (TOPLAS) 18.4 (1996): 424-453.

  • Bodin, François, et al. "A quantitative algorithm for data locality optimization." Code Generation: Concepts, Tools, Techniques (1992): 119-145.

  • Kennedy, Ken, and Kathryn S. McKinley. "Optimizing for parallelism and data locality." Proceedings of the 6th international conference on Supercomputing. ACM, 1992.

  • Aho, Alfred V., Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman. Compilers: Principles, Techniques, and Tools. 2nd edition, Addison-Wesley, 2006.



Thank You!

Questions??

