Access Map Pattern Matching Prefetch: Optimization Friendly Method
Yasuo Ishii (1), Mary Inaba (2), and Kei Hiraki (2)
(1) NEC Corporation, (2) The University of Tokyo
Background
• The speed gap between processor and memory has increased
• To hide long memory latency, many techniques have been proposed
• The importance of HW data prefetching has increased
• Many HW prefetchers have been proposed
Conventional Methods
• Prefetchers use:
  • Instruction address
  • Memory access order
  • Memory address
• Optimizations scramble this information:
  • Out-of-order memory access
  • Loop unrolling
Limitation of Stride Prefetch [Chen+95]: Out-of-Order Memory Access
for (int i = 0; i < N; i++) {
    load A[2*i];   // load instruction (A)
    ...
}
Figure: Accesses 1 to 4 from instruction A fall on addresses between 0xAB00 and 0xAB06 (cache-line granularity, address range 0xAAFF to 0xABFF) and are issued out of order. Stride table entry shown: Tag A, Address 0xAB04, Stride 2, State steady. Because the accesses arrive out of order, the prefetcher cannot detect the strides.
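A minimal Python sketch of why reordering defeats a stride table, assuming a simplified per-instruction entry that keeps only the last address, the last stride, and a count of consecutive confirmations. `StrideEntry`, `train`, and `max_confirmations` are hypothetical names; the reference prediction table of [Chen+95] uses a richer state machine.

```python
from dataclasses import dataclass


@dataclass
class StrideEntry:
    """Simplified stride-table entry (hypothetical, not the RPT of [Chen+95])."""
    last_addr: int
    stride: int = 0
    confirmations: int = 0  # consecutive times the same stride was observed


def train(table: dict, pc: str, addr: int) -> None:
    """Update the entry of instruction `pc` with a new (cache-line) address."""
    entry = table.get(pc)
    if entry is None:
        table[pc] = StrideEntry(last_addr=addr)
        return
    stride = addr - entry.last_addr
    if stride == entry.stride:
        entry.confirmations += 1
    else:
        # A different stride resets training.
        entry.stride = stride
        entry.confirmations = 0
    entry.last_addr = addr


def max_confirmations(addresses) -> int:
    """Feed accesses from instruction A and report the best training level reached."""
    table: dict = {}
    best = 0
    for a in addresses:
        train(table, "A", a)
        best = max(best, table["A"].confirmations)
    return best


in_order = [0xAB00, 0xAB02, 0xAB04, 0xAB06, 0xAB08, 0xAB0A]
out_of_order = [0xAB02, 0xAB00, 0xAB06, 0xAB04, 0xAB0A, 0xAB08]
print(max_confirmations(in_order))      # 4
print(max_confirmations(out_of_order))  # 0
```

The same stride-2 accesses train the entry steadily when issued in program order, but swapping neighbouring pairs makes the observed stride alternate between -2 and 6, so the entry never confirms a stride.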
Weaknesses of Conventional Methods
• Out-of-order memory access
  • Scrambles the memory access order
  • The prefetcher cannot detect address correlations
• Loop unrolling
  • Requires additional table entries
  • Each entry is trained slowly
An optimization-friendly prefetcher is required.
Access Map Pattern Matching
• Pattern matching
  • Order-free prefetching
  • Optimization-friendly prefetch
• Access map
  • Map-based history
  • 2-bit state map
  • Each state is attached to a cache block
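To make the map-based history concrete, here is a minimal Python sketch, assuming a 256-line zone and an arbitrary 2-bit encoding (Init = 0, Access = 1, Prefetch = 2, Success = 3). `AccessMap` and `map_after` are hypothetical names. Because each access only updates the state of its own cache line, the final map does not depend on the order in which the accesses arrive.

```python
from enum import IntEnum
from itertools import permutations


class State(IntEnum):
    # Assumed 2-bit encoding; the hardware encoding may differ.
    INIT = 0
    ACCESS = 1
    PREFETCH = 2
    SUCCESS = 3


class AccessMap:
    """Hypothetical access map: one 2-bit state per cache line, packed into an int."""

    def __init__(self, lines: int = 256):
        self.lines = lines
        self.bits = 0  # all lines start in INIT

    def get(self, i: int) -> State:
        return State((self.bits >> (2 * i)) & 0b11)

    def set(self, i: int, s: State) -> None:
        self.bits &= ~(0b11 << (2 * i))  # clear the line's 2-bit field
        self.bits |= int(s) << (2 * i)   # write the new state


def map_after(offsets) -> int:
    """Mark each accessed line offset as ACCESS and return the packed map."""
    m = AccessMap()
    for off in offsets:
        m.set(off, State.ACCESS)
    return m.bits


# Every arrival order of the same accesses yields the same map.
distinct_maps = {map_after(p) for p in permutations([0, 2, 4, 6])}
print(len(distinct_maps))  # 1
```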
State Diagram for Each Cache Block
Transitions: Init → Access (demand access); Init → Prefetch (prefetch issued); Prefetch → Success (demand access to prefetched data)
• Init: initialized state
• Access: already accessed
• Prefetch: prefetch request issued
• Success: prefetched data accessed
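The per-block state machine can be sketched in Python as two transition functions, one per event. This is a minimal illustration assuming that any event not drawn in the diagram leaves the state unchanged; `on_demand_access` and `on_prefetch_issued` are hypothetical names.

```python
from enum import Enum


class State(Enum):
    INIT = "Init"
    ACCESS = "Access"
    PREFETCH = "Prefetch"
    SUCCESS = "Success"


def on_demand_access(s: State) -> State:
    """Demand access: Init becomes Access, Prefetch becomes Success."""
    if s is State.INIT:
        return State.ACCESS
    if s is State.PREFETCH:
        return State.SUCCESS
    return s  # assumption: Access and Success stay unchanged


def on_prefetch_issued(s: State) -> State:
    """Prefetch request issued for the line: only Init lines become Prefetch."""
    return State.PREFETCH if s is State.INIT else s


s = State.INIT
s = on_prefetch_issued(s)  # Init -> Prefetch
s = on_demand_access(s)    # Prefetch -> Success
print(s.value)             # Success
```

Distinguishing Success from Access is what lets the prefetcher tell useful prefetches apart from ordinary demand history.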
Memory Access Pattern Map
Figure: a zone of the memory address space (Zone Size) maps to a memory access pattern map holding one state per cache line (I = Init, A = Access, P = Prefetch, S = Success), which feeds the pattern match logic.
• Corresponds to the memory address space
• Cache-line granularity
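Locating the map entry for an address reduces to splitting the cache-line address into a zone tag and a line index within the zone. A minimal Python sketch, assuming 64-byte cache lines (an assumption, not stated on the slides) and 256 lines per zone (matching the 256 states per map in the DPC configuration); `split_address` is a hypothetical name.

```python
LINE_BYTES = 64        # assumed cache-line size
LINES_PER_ZONE = 256   # one state per line; 256 states per map


def split_address(addr: int) -> tuple[int, int]:
    """Return (zone_tag, line_index) for a byte address."""
    line = addr // LINE_BYTES
    return line // LINES_PER_ZONE, line % LINES_PER_ZONE


print(split_address(0x12345678))  # (18641, 89)
```

Under these assumptions one zone covers 64 × 256 = 16,384 bytes.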
Pattern Matching Logic
Figure: the memory access pattern map passes through an Access Map Shifter, a Pattern Detector, a Pipeline Register, and a Prefetch Selector (Priority Encoder & Adder), with a feedback path. The shifter aligns the map on the accessed address (Addr), the detector flags candidate offsets (+1, +2, +3, ...), and the priority encoder and adder turn the selected offset into a prefetch request (e.g., Addr+2).
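A software sketch of the forward matching step, assuming the AMPM rule as we understand it: for a demand access at line t, offset t+k is a candidate when lines t-k and t-2k are both in Access or Success and line t+k is still Init. The loop over k stands in for the hardware's parallel detectors, and taking the smallest k first mimics the priority encoder (an assumption). `candidates`, `max_stride`, and `degree` are hypothetical names; backward candidates are omitted for brevity.

```python
from enum import Enum


class State(Enum):
    INIT = 0
    ACCESS = 1
    PREFETCH = 2
    SUCCESS = 3


ACCESSED = {State.ACCESS, State.SUCCESS}


def candidates(zone, t, max_stride=4, degree=2):
    """Forward prefetch candidates (line indices) for a demand access at line t."""
    out = []
    for k in range(1, max_stride + 1):  # smaller k has higher priority
        if t - 2 * k < 0 or t + k >= len(zone):
            continue  # pattern would fall outside this zone
        if (zone[t - k] in ACCESSED
                and zone[t - 2 * k] in ACCESSED
                and zone[t + k] is State.INIT):
            out.append(t + k)
            if len(out) == degree:
                break
    return out


zone = [State.INIT] * 16
for line in (0, 2, 4):
    zone[line] = State.ACCESS
print(candidates(zone, 4))  # [6], i.e. Addr+2
```

Because the rule only inspects which lines have been touched, not when, it finds the stride-2 pattern regardless of the order in which lines 0, 2, and 4 were accessed.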
Parallel Pattern Matching
Figure: example memory access pattern map (A I I A I A I I I A I A S I A ...)
• Detects patterns from the memory access map
• Detects address correlations in parallel
• Searches for candidates effectively
AMPM Prefetch
Figure: the memory address space is divided into zones; the maps of hot zones (states P, S, A, ..., I) are held in the Memory Access Map Table, and the pattern match logic reads the map of the accessed zone to issue prefetch requests.
• The memory address space is divided into zones
• Hot zones are detected
• Memory Access Map Table
  • LRU replacement
• Pattern matching
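The Memory Access Map Table can be sketched as a fully associative, LRU-managed dictionary from zone tag to map. A minimal Python sketch, assuming one byte per line state for simplicity (hardware would store 2 bits) and the 52-entry capacity from the DPC configuration; `AccessMapTable` and `lookup` are hypothetical names.

```python
from collections import OrderedDict


class AccessMapTable:
    """Hypothetical fully associative table of zone maps with LRU replacement."""

    def __init__(self, capacity: int = 52, lines_per_zone: int = 256):
        self.capacity = capacity
        self.lines_per_zone = lines_per_zone
        self.maps = OrderedDict()  # zone tag -> bytearray of line states

    def lookup(self, zone_tag: int) -> bytearray:
        """Return the map for zone_tag, allocating (and evicting LRU) on a miss."""
        if zone_tag in self.maps:
            self.maps.move_to_end(zone_tag)  # mark as most recently used
            return self.maps[zone_tag]
        if len(self.maps) >= self.capacity:
            self.maps.popitem(last=False)    # evict the least recently used zone
        new_map = bytearray(self.lines_per_zone)  # all zeros = Init
        self.maps[zone_tag] = new_map
        return new_map


table = AccessMapTable(capacity=2)
for tag in (1, 2, 1, 3):
    table.lookup(tag)
print(list(table.maps))  # [1, 3]: zone 2 was least recently used
```

Zones that keep being touched stay resident, which is one simple way to retain maps only for hot zones.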
Features of AMPM Prefetcher
• Pattern-matching-based prefetching
  • Map-based history
  • Optimization-friendly prefetching
• Parallel pattern matching
  • Searches for candidates effectively
  • Complexity-effective implementation
Configuration for DPC Competition
• AMPM prefetcher
  • Fully associative, 52 maps, 256 states per map
• Adaptive stream prefetcher [Hur+ 2006]
  • 16 histograms, 8 stream lengths
• MSHR configuration
  • 16 entries for demand requests (default)
  • 32 entries for prefetch requests (additional)
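The state storage implied by this configuration follows directly from the numbers above. A quick Python check, counting only the 2-bit line states (tags, LRU bits, the stream prefetcher, and the MSHRs are excluded):

```python
MAPS = 52             # fully associative map entries
STATES_PER_MAP = 256  # one state per cache line in a zone
BITS_PER_STATE = 2    # Init / Access / Prefetch / Success

state_bits = MAPS * STATES_PER_MAP * BITS_PER_STATE
print(state_bits, state_bits // 8)  # 26624 3328
```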
Methodology
• Simulation environment
  • DPC framework
  • Skips the first 4000M instructions and evaluates the following 100M instructions
• Benchmark
  • SPEC CPU2006 benchmark suite
  • Compile options: "-O3 -fomit-frame-pointer -funroll-all-loops"
IPC Measurement
• Improves performance by 53%
• Improves performance in all benchmarks
L2 Cache Miss Count
• Reduces L2 cache misses by 76%
Related Works
• Sequence-based prefetching
  • Sequential Prefetch [Smith+ 1978]
  • Stride Prefetching Table [Fu+ 1992]
  • Markov Predictor [Joseph+ 1997]
  • Global History Buffer [Nesbit+ 2004]
• Adaptive prefetching
  • AC/DC [Nesbit+ 2004]
  • Feedback Directed Prefetch [Srinath+ 2007]
  • Focus Prefetching [Manikantan+ 2008]
Conclusion
• Access Map Pattern Matching Prefetch
  • Order-free prefetch
  • Optimization-friendly prefetching
  • Parallel pattern matching
  • Complexity-effective implementation
• Optimized AMPM achieves good performance
  • Improves IPC by 53%
  • Reduces L2 cache misses by 76%
Q & A
Figure: timeline of prefetching research (labels shown: Software, Adaptive, Spatial, Hybrid, Sequence-Base (Order Sensitive), Commercial Processors)
• Buffer Block [Gindele 1977]
• Sequential [Smith+ 1978]
• Software Support [Mowry+ 1992]
• Stride Prefetch [Fu+ 1992]
• Adaptive Seq. [Dahlgren+ 1993]
• HW/SW Integrate [Gornish+ 1994]
• RPT [Chen+ 1995]
• Markov Prefetch [Joseph+ 1997]
• Hybrid [Hsu+ 1998]
• Locality Detect [Johnson+ 1998]
• Tag Correlation [Hu+ 2003]
• AC/DC [Nesbit+ 2004]
• GHB [Nesbit+ 2004]
• Spatial Pat. [Chen+ 2004]
• Adaptive Stream [Hur+ 2006]
• SMS [Somogyi 2006]
• FDP [Srinath+ 2007]
• AMPM Prefetch [Ishii+ 2009]
• Feedback based [Honjo 2009]
• Commercial processors: SuperSPARC, PA7200, R10000, Pentium 4, Power4