1 / 4

Software Prefetching

Software Prefetching. Reduzierung der Miss-Rate Erfordert Prefetch-Instruktionen Nicht blockierend Erzeugen keine Ausnahmen Nicht blockierende Caches Der Compiler fügt Prefetch-Instruktionen ein. Vorteil gegenüber Hardware Prefetching Auch unregelmäßige Zugriffe können verbessert werden.

naiya
Download Presentation

Software Prefetching

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Software Prefetching • Reduzierung der Miss-Rate • Erfordert • Prefetch-Instruktionen • Nicht blockierend • Erzeugen keine Ausnahmen • Nicht blockierende Caches • Der Compiler fügt Prefetch-Instruktionen ein. • Vorteil gegenüber Hardware Prefetching • Auch unregelmäßige Zugriffe können verbessert werden. • Bei geringer Latenzzeit kann Unrolling verwendet werden. • Bei langer Latenzzeit wird Software Pipelining eingesetzt.

  2. Beispiel Software Prefetching • Cache • 8 KB direct-mapped, 16-byte cache lines • Write back with write allocate • Cache misses • Array A exploits spatial locality: 3*100/2=150 misses • Array B doesn‘t exploit spatial locality but twice temporal locality • One miss for i=0, j=0 via b[j][0] • 100 misses for i=0, j=0..99 via b[j+1][0] • Total of 251 cache misses real*8 a[3][100],b[101][3] for (i=0; i<3;i++) for (j=0;j<100;j++) a[i][j]=b[j][0]*b[j+1][0]

  3. Example with Prefetching • Ignoring • No prefetching for first accesses • Not suppressing prefetches at end of the loop real*8 a[3][100],b[101][3] for (j=0;j<100;j++) prefetch (b[j+8][0]); //b(j,0) for 7 iterations later prefetch (a[0][j+8]); //a(0,j) for 8 iterations later a[0][j]=b[j][0]*b[j+1][0]; for (i=1;i<3;i++) for (j=0;j<100;j++) prefetch (a[i][j+8]); //a(i,j) for 8 iterations later a[i][j]=b[j][0]*b[j+1][0] • Misses • 7 in first loop: b[0][0] … b[6][0] • 4 in first loop: a[0][0],…,a[0][6] • 2*4 in second loop: a[1][0],…,a[1][6] ,…, a[2][0],…,a[2][6]

  4. Example: Result • Instead of 251 only 19 cache misses • Costs: 400 prefetch instructions • Further optimization • Elimination of every second prefetch for array A due to spatial locality.

More Related