Software Prefetching

Software Prefetching • Reduzierung der Miss-Rate • Erfordert • Prefetch-Instruktionen • Nicht blockierend • Erzeugen keine Ausnahmen • Nicht blockierende Caches • Der Compiler fügt Prefetch-Instruktionen ein. • Vorteil gegenüber Hardware Prefetching • Auch unregelmäßige Zugriffe können verbessert werden. • Bei geringer Latenzzeit kann Unrolling verwendet werden. • Bei langer Latenzzeit wird Software Pipelining eingesetzt.

Beispiel Software Prefetching • Cache • 8 KB direct-mapped, 16-byte cache lines • Write back with write allocate • Cache misses • Array A exploits spatial locality: 3*100/2=150 misses • Array B doesn‘t exploit spatial locality but twice temporal locality • One miss for i=0, j=0 via b[j][0] • 100 misses for i=0, j=0..99 via b[j+1][0] • Total of 251 cache misses real*8 a[3][100],b[101][3] for (i=0; i<3;i++) for (j=0;j<100;j++) a[i][j]=b[j][0]*b[j+1][0]

Example with Prefetching • Ignoring • No prefetching for first accesses • Not suppressing prefetches at end of the loop real*8 a[3][100],b[101][3] for (j=0;j<100;j++) prefetch (b[j+8][0]); //b(j,0) for 7 iterations later prefetch (a[0][j+8]); //a(0,j) for 8 iterations later a[0][j]=b[j][0]*b[j+1][0]; for (i=1;i<3;i++) for (j=0;j<100;j++) prefetch (a[i][j+8]); //a(i,j) for 8 iterations later a[i][j]=b[j][0]*b[j+1][0] • Misses • 7 in first loop: b[0][0] … b[6][0] • 4 in first loop: a[0][0],…,a[0][6] • 2*4 in second loop: a[1][0],…,a[1][6] ,…, a[2][0],…,a[2][6]

Example: Result • Instead of 251 only 19 cache misses • Costs: 400 prefetch instructions • Further optimization • Elimination of every second prefetch for array A due to spatial locality.

Software Prefetching

Software Prefetching

Presentation Transcript

Low-Cost Adaptive Data Prefetching

Prefetching for RC

Prefetching

Informed Prefetching and Caching

Informed Mobile Prefetching

Far Fetched Prefetching?

CS7810 Prefetching

The 1st JILP Data Prefetching Championship (DPC-1) Enhancement for Accurate Stream Prefetching

Fetch Directed Prefetching - a Study

Power-Aware Hardware Prefetching

Decoupled Architecture for Data Prefetching

Proposed Prefetching Methods

Energy Efficient Prefetching and Caching

Web Prefetching

Application-level Prefetching

Improving Index Performance through Prefetching

Prefetching Techniques

Feedback Directed Prefetching

Energy Efficient Prefetching and Caching

Precomputation-based Prefetching

Fetch Directed Prefetching - a Study