Tile Size Selection Using Cache Organization and Data Layout

Stephanie Coleman, Intermetrics, Inc.; Kathryn S. McKinley, Computer Science, LGRC, University of Massachusetts Amherst. 10/27/01.

Presentation Transcript


  1. Tile Size Selection Using Cache Organization and Data Layout
     Stephanie Coleman, Intermetrics, Inc.
     Kathryn S. McKinley, Computer Science, LGRC, University of Massachusetts Amherst
     10/27/01

  2. Where to Use Tiling/Blocking?
     • Register
     • TLB
     • L1 cache
     • L2 cache
     • Any other level of the memory hierarchy

  3. Cache Misses
     • Compulsory misses
     • Capacity misses
     • Interference misses
       • Self-interference
       • Cross-interference

  4. Data Reuse and Locality
     • Data reuse
       • Temporal reuse
       • Spatial reuse
     • Locality: reused data remain in the cache between uses
     • Reuse does not necessarily result in locality

  5. Without Tiling
     • Matrix multiply:
         for I = 1 to N do
           for K = 1 to N do
             R = X(K,I)
             for J = 1 to N do
               Z(J,I) = Z(J,I) + R*Y(J,K)
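As a minimal sketch, the untiled loop nest can be written in Python, assuming 0-based indices and N x N nested lists where `Z[j][i]` stands for the slide's `Z(J,I)`; the function name `matmul_untiled` is hypothetical.

```python
def matmul_untiled(X, Y, Z):
    """Accumulate into Z using the slide's loop order I, K, J.

    All arrays are N x N nested lists; Z[j][i] mirrors Z(J,I) in the
    pseudocode, with 0-based indices instead of 1-based ones.
    """
    n = len(Z)
    for i in range(n):              # one column of Z per I iteration
        for k in range(n):
            r = X[k][i]             # scalar held in a register across the J loop
            for j in range(n):
                # add column K of Y, scaled by X(K,I), into column I of Z
                Z[j][i] += r * Y[j][k]
    return Z
```

With this indexing the nest computes Z = Z + Y·X. In the column-major layout the pseudocode assumes, the innermost J loop walks down columns contiguously; the Python lists only mirror the indexing, not the memory layout.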

  6. Reuse Pattern without tiling

  7. Reuse Pattern after tiling

  8. After Tiling (tile size = TK*TJ)
         for KK = 1 to N by TK do
           for JJ = 1 to N by TJ do
             for I = 1 to N do
               for K = KK to MIN(KK+TK-1, N) do
                 R = X(K,I)
                 for J = JJ to MIN(JJ+TJ-1, N) do
                   Z(J,I) = Z(J,I) + R*Y(J,K)
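A matching Python sketch of the tiled nest, under the same assumptions (0-based indices, nested lists, hypothetical name `matmul_tiled`). The 1-based inclusive bound MIN(KK+TK-1, N) becomes the exclusive bound `min(kk + tk, n)`.

```python
def matmul_tiled(X, Y, Z, tk, tj):
    """Tiled form of the slide's loop nest; the reused block of Y is tj x tk."""
    n = len(Z)
    for kk in range(0, n, tk):                      # tile loop over K
        for jj in range(0, n, tj):                  # tile loop over J
            for i in range(n):                      # element loop over columns of Z
                for k in range(kk, min(kk + tk, n)):
                    r = X[k][i]
                    for j in range(jj, min(jj + tj, n)):
                        # the block Y[jj:jj+tj][kk:kk+tk] is reused for every i
                        Z[j][i] += r * Y[j][k]
    return Z
```

Tiling only reorders the iterations, so the result matches the untiled nest for any tile sizes, including tiles larger than N.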

  9. General Formula for Tiling
     • Before tiling:
         for I = lo to hi do
     • Tiled into:
         for It = floor((lo-off)/ts)*ts+off to floor((hi-off)/ts)*ts+off by ts do
           for I = max(lo, It) to min(hi, It+ts-1) do
       (off: offset; ts: tile size)
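The general formula can be checked with a short Python sketch (hypothetical helper names `tile_starts` and `tiled_iterations`). Python's `//` is floor division, so it matches floor(...) for negative operands as well.

```python
def tile_starts(lo, hi, ts, off):
    """Values of It: floor((lo-off)/ts)*ts+off to floor((hi-off)/ts)*ts+off by ts."""
    first = ((lo - off) // ts) * ts + off
    last = ((hi - off) // ts) * ts + off
    return range(first, last + 1, ts)


def tiled_iterations(lo, hi, ts, off):
    """Yield I in the order visited by the tiled loop nest."""
    for it in tile_starts(lo, hi, ts, off):
        # element loop clipped to both the original bounds and the current tile
        for i in range(max(lo, it), min(hi, it + ts - 1) + 1):
            yield i
```

For lo = 3, hi = 17, ts = 4, off = 1, the tile loop takes It = 1, 5, 9, 13, 17, and the clipped element loops together visit 3 through 17 exactly once, in order.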

  10. Loop Interchange
     • Interchange an inner tile loop with an outer element loop:
         for I = max(l1, l2, …) to min(u1, u2, …) do
           for Jt = floor((kl*I+ml-off)/ts)*ts+off to floor((ku*I+mu-off)/ts)*ts+off by ts do
     • The limits of the I loop do not change.
     • The new lower limit of the Jt loop is the max of a set of expressions, each being its old lower limit with I replaced by one of l1, l2, … (if kl > 0) or u1, u2, … (if kl < 0). Symmetrically, the new upper limit is the min of its old upper limit with I replaced by one of u1, u2, … (if ku > 0) or l1, l2, … (if ku < 0).
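A brute-force check of the interchange rule, offered as a hedged sketch: it assumes the I loop has a single lower bound l1 and upper bound u1 (so each max/min has one term), J bounds of the form kl*I+ml and ku*I+mu, and the tile-start expression from the general formula. All function names are hypothetical.

```python
def tile_start(x, ts, off):
    """floor((x-off)/ts)*ts+off: start of the tile that contains x."""
    return ((x - off) // ts) * ts + off


def original_points(i_lo, i_hi, kl, ml, ku, mu):
    """Untiled nest: for I = i_lo..i_hi, for J = kl*I+ml .. ku*I+mu."""
    return [(i, j) for i in range(i_lo, i_hi + 1)
            for j in range(kl * i + ml, ku * i + mu + 1)]


def interchanged_points(i_lo, i_hi, kl, ml, ku, mu, ts, off):
    """Jt tile loop moved outside the I element loop."""
    # lower limit: old lower expression at I = l1 if kl > 0, else at I = u1
    jt_lo = tile_start(kl * (i_lo if kl > 0 else i_hi) + ml, ts, off)
    # upper limit: old upper expression at I = u1 if ku > 0, else at I = l1
    jt_hi = tile_start(ku * (i_hi if ku > 0 else i_lo) + mu, ts, off)
    points = []
    for jt in range(jt_lo, jt_hi + 1, ts):
        for i in range(i_lo, i_hi + 1):             # I limits unchanged
            j_lo = max(kl * i + ml, jt)
            j_hi = min(ku * i + mu, jt + ts - 1)
            for j in range(j_lo, j_hi + 1):
                points.append((i, j))
    return points
```

The check compares sorted lists of (I, J) pairs, so it would catch both skipped and duplicated iterations.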

  11. Tile Size Selection

  12. Tile Size Selection
     (Figure: cache layout with a tile size of 24)

  13. Potential Column Dimensions
     • Euclidean algorithm: GCD(a, b) = GCD(a-b, b)
         CS = q1*N + r1
         N  = q2*r1 + r2
         r1 = q3*r2 + r3
         …
       (CS: cache size; N: column dimension of the array)
     • Example (CS = 1024, N = 200):
         1024 = 5*200 + 24
         200  = 8*24 + 8
       Potential column dimensions: 24, 8
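The remainder sequence can be generated with a few lines of Python. This is a sketch with a hypothetical function name, `potential_column_sizes`, and it assumes N < CS as in the example.

```python
def potential_column_sizes(cs, n):
    """Nonzero remainders r1, r2, ... of the Euclidean algorithm on (CS, N).

    CS = q1*N + r1, N = q2*r1 + r2, r1 = q3*r2 + r3, ...
    Each nonzero remainder is a candidate column dimension.
    """
    sizes = []
    a, b = cs, n
    while b != 0:
        a, b = b, a % b     # one division step
        if b != 0:
            sizes.append(b)
    return sizes
```

`potential_column_sizes(1024, 200)` returns `[24, 8]`, the candidates listed on the slide.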

  14. Computing row size for a column size
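The transcript gives only the title for this slide. As a hypothetical illustration (not the authors' formula), the row size for a given column size can be found by brute force: place successive columns of a column-major array into the cache and stop at the first one whose footprint overlaps an earlier one. The sketch assumes a direct-mapped cache, sizes measured in array elements, address-mod-CS mapping, and an array that starts at cache offset 0; `row_size` is a hypothetical name.

```python
def row_size(cs, n, col_size):
    """Number of consecutive columns (stride n) whose first col_size
    elements fit in a direct-mapped cache of cs elements without
    self-interference, under the simplified model described above."""
    if not 1 <= col_size <= cs:
        raise ValueError("col_size must be between 1 and cs")
    occupied = set()
    cols = 0
    while True:
        start = (cols * n) % cs
        footprint = {(start + e) % cs for e in range(col_size)}
        if footprint & occupied:
            return cols             # the next column would evict tile data
        occupied |= footprint
        cols += 1                   # terminates: each column adds col_size slots
```

Under these assumptions, CS = 1024 and N = 200 give 41 columns for a column size of 24 and 128 columns for a column size of 8. These figures come from the simplified model, not from the slides.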

  15. Improve Spatial Locality with Cache Line Size
         colSize = colSize,                    if colSize mod CLS = 0, or if colSize = column length
         colSize = floor(colSize/CLS)*CLS,     otherwise
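The column-size adjustment translates directly into Python. `align_col_size` is a hypothetical name, and all sizes are assumed to be measured in array elements.

```python
def align_col_size(col_size, cls, col_length):
    """Round a candidate column size down to a multiple of the cache
    line size CLS, unless it already is one or spans the whole column."""
    if col_size % cls == 0 or col_size == col_length:
        return col_size
    return (col_size // cls) * cls   # becomes 0 if col_size < cls
```

With CLS = 4, a candidate of 24 is kept, 30 becomes 28, and 30 is kept when the column itself is 30 elements long.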

  16. Minimize Cross-Interference
     • Working set size constraint: TJ*TK + TJ + 1*CLS < CS
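A sketch of the working-set test, assuming all quantities are in array elements. Reading the terms as a TJ x TK tile of Y, a TJ-element segment of a column of Z, and one cache line for X is an interpretation, not something the slide states; `fits_working_set` and `max_tk` are hypothetical names.

```python
def fits_working_set(tj, tk, cls, cs):
    """The slide's constraint: TJ*TK + TJ + 1*CLS < CS."""
    return tj * tk + tj + 1 * cls < cs


def max_tk(tj, cls, cs):
    """Largest TK that satisfies the constraint for a fixed TJ (0 if none).

    The strict '<' is equivalent to TJ*TK <= CS - TJ - CLS - 1.
    """
    return max(0, (cs - tj - cls - 1) // tj)
```

With assumed values CS = 1024 and CLS = 4, a column size of TJ = 24 allows at most TK = 41.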

  17. Tile Size Selection Algorithm (TSS)

  18. Other Algorithms for Computing Tile Size
     • LRW
       • Improves the average cache performance
       • Sensitive to the array size
       • Can utilize the cache ineffectively
     • ESS
       • Effective only for one-dimensional tiling
       • Does not consider cross-interference

  19. Conclusion
     • TSS incorporates the effects of cache line size and of cross-interference between arrays
     • Performs better than ESS and LRW on both direct-mapped caches and caches with higher associativity
     • Sensitive to array dimensions
     • Does not fully exploit temporal reuse for some matrix sizes
