Prefetching in Embedded Mobile Systems Can Be Energy Efficient

  1. Prefetching in Embedded Mobile Systems Can Be Energy Efficient. Based on the paper by Jie Tang, Shaoshan Liu, Zhimin Gu, Chen Liu, and Jean-Luc Gaudiot, Fellow, IEEE. IEEE Computer Architecture Letters, Volume 10, Issue 1.

  2. Overview • Introduction • Motivation and Background • Previous Work • Methodology • Prefetcher Performance • Energy Efficiency • Energy Consumption Analysis • Energy Efficiency Model • Conclusion

  3. Introduction • Data prefetching is the process of fetching data that a program will need in advance, before the instruction that requires it is executed. • It hides apparent memory latency. • Data prefetching has been a successful technique in modern high-performance computing platforms. • It was found, however, that prefetching significantly increases power consumption.

  4. Introduction • Embedded mobile systems typically have tight constraints on space, cost, and power. • This means they cannot afford power-hungry mechanisms. • Hence, prefetching was long considered unsuitable for embedded systems.

  5. Background • Embedded mobile systems are now powered by capable processors, such as NVIDIA's dual-core Tegra 2. • Smartphone applications include web browsing, multimedia, gaming, and Webtop control, all of which demand very high performance from the computing system.

  6. Background • To meet this requirement, methods such as prefetching, which were previously shunned, can now be used. • With newer, more power-efficient technology, the energy consumption behavior may also have changed. • For this reason, we study and model the energy efficiency of different types of prefetchers.

  7. Previous Work • Over the years, the main bottleneck preventing systems from speeding up has been the slowness of memory rather than processor speed. • Data prefetching can be implemented in hardware by observing memory access patterns. • Sequential prefetching takes advantage of spatial locality in memory by fetching the next consecutive block. • Tagged prefetching associates a tag bit with every memory block and issues a prefetch when a prefetched block is referenced for the first time (see the sketch below).
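
As a minimal illustration of the sequential and tagged schemes above, the sketch below models a toy cache keyed by block address. Python is used purely for illustration; the class and field names are our own, not the paper's, and cache capacity and eviction are ignored.

```python
# Toy sketch of tagged sequential prefetching (illustrative, not the paper's design).
class TaggedSequentialPrefetcher:
    def __init__(self):
        self.cache = {}  # block address -> tag bit (True if brought in by a prefetch)

    def access(self, block):
        """Handle a demand reference to `block`; return the blocks prefetched."""
        prefetches = []
        if block not in self.cache:
            # Demand miss: fetch the block, then sequentially prefetch its successor.
            self.cache[block] = False
            if block + 1 not in self.cache:
                self.cache[block + 1] = True
                prefetches.append(block + 1)
        elif self.cache[block]:
            # First reference to a tagged (prefetched) block: clear the tag
            # and prefetch the next sequential block.
            self.cache[block] = False
            if block + 1 not in self.cache:
                self.cache[block + 1] = True
                prefetches.append(block + 1)
        return prefetches

# Example: a miss on block 100 prefetches 101; touching 101 then prefetches 102.
p = TaggedSequentialPrefetcher()
assert p.access(100) == [101]
assert p.access(101) == [102]
```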

  8. Previous Work • Stride-based prefetching detects stride patterns in the address stream, such as accesses made by successive iterations of the same loop (see the sketch after this slide). • Stream prefetchers try to capture sequences of nearby misses and prefetch an entire block at a time along the detected stream. • Correlation prefetchers issue prefetches based on previously recorded correlations between the addresses of cache misses.
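
A minimal sketch of the stride detection just described, assuming a reference prediction table indexed by the load instruction's program counter; the table layout, names, and prefetch degree are illustrative assumptions, not details from the paper.

```python
# Toy stride prefetcher: confirm a repeating stride per PC, then prefetch ahead.
class StridePrefetcher:
    def __init__(self, degree=2):
        self.table = {}       # PC -> (last address, last observed stride)
        self.degree = degree  # how many strides ahead to prefetch once confirmed

    def access(self, pc, addr):
        prefetches = []
        if pc in self.table:
            last_addr, last_stride = self.table[pc]
            stride = addr - last_addr
            if stride != 0 and stride == last_stride:
                # Same stride seen twice in a row (e.g. successive loop iterations):
                # issue prefetches `degree` strides ahead of the current address.
                prefetches = [addr + stride * k for k in range(1, self.degree + 1)]
            self.table[pc] = (addr, stride)
        else:
            self.table[pc] = (addr, 0)
        return prefetches

# Example: a load at PC 0x40 walking an array with a 64-byte stride.
sp = StridePrefetcher(degree=2)
for a in (0, 64, 128, 192):
    print(sp.access(0x40, a))   # stride confirmed from the third access onward
```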

  9. Previous Work • There have been some studies focused on improving the energy efficiency of hardware prefetching. PARE is one such technique: it constructs a power-aware hardware prefetching engine. • It categorizes memory accesses into different groups. • It uses an indexed hardware history table that is continuously updated; memory accesses are categorized into these groups, and prefetching decisions are based on the information in the table.

  10. Methodology: 1. Benchmarks • Modern embedded mobile systems execute a wide variety of workloads. • The first set includes two XML data-processing benchmarks taken from Xerces-C++. • They implement event-based, data-centric parsing (SAX) and tree-based, document-centric parsing (DOM). • Table 1: Benchmark Set

  11. Methodology: 1. Benchmarks • The second set is taken from MediaBench II, which provides application-level benchmarks representing multimedia and entertainment workloads based on the ISO JPEG-2000 standard. • It also includes the H.264 video compression standard. • The third set is taken from the PARSEC (Princeton Application Repository for Shared-Memory Computers) benchmark suite for multithreaded processors, whose workloads are used in many gaming applications.

  12. Methodology: 2. Hardware Prefetchers • In Table 2, Cache hierarchy indicates the level of cache that the prefetcher covers. • Prefetching degree shows whether the prefetcher's degree is static or dynamically adjusted. • Trigger L1 and Trigger L2 show what triggers a prefetch at the L1 and L2 caches, respectively (an illustrative encoding is sketched below). • Table 2: Summary of Prefetchers
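
The entries of Table 2 are not reproduced here, but the attributes it summarizes for each prefetcher could be encoded as follows; the field names and example values are assumptions for illustration only.

```python
# Illustrative encoding of the per-prefetcher attributes summarized in Table 2.
from dataclasses import dataclass

@dataclass
class PrefetcherConfig:
    name: str             # prefetcher label, e.g. "P1" .. "P4"
    cache_level: str      # level of the cache hierarchy the prefetcher covers
    dynamic_degree: bool  # True if the prefetching degree is adjusted at run time
    trigger_l1: str       # event at L1 that triggers a prefetch (e.g. "miss")
    trigger_l2: str       # event at L2 that triggers a prefetch (e.g. "miss")

# Hypothetical example entry, not an actual row from Table 2.
example = PrefetcherConfig("P1", cache_level="L2", dynamic_degree=False,
                           trigger_l1="miss", trigger_l2="miss")
```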

  13. Methodology: 3. Energy Modeling • To study the performance of the selected prefetchers, we use CMP$IM, a cache simulator, to model high-performance embedded systems. • It is a Pin-based multi-core cache simulator. • Simulation parameters are shown in Table 3; they resemble modern smartphone and e-book systems. • Table 3: Simulation Parameters

  14. Methodology: 3. Energy Modeling • To study the impact of prefetching on the energy consumption of the memory subsystem, we use CACTI to model the energy parameters of different technology implementations. • In the simulator, a hardware prefetcher is defined by a set of hardware tables, so its energy consumption can be modeled from the energy of accessing those tables.

  15. Prefetcher Performance • Prefetching techniques improve performance by more than 5% on average. In detail, the effectiveness of a prefetcher depends on both the prefetching technique itself and the nature of the application. • P3 yields the best average performance because it is the most aggressive prefetcher. • The JPEG 2000 decoding and encoding programs gain up to 22% in performance due to their streaming nature.

  16. Prefetcher Performance • Fig. 1: Performance Improvement

  17. Energy Efficiency • We study the energy efficiency of both 90 nm and 32 nm technologies. The results are summarized in Figures 2 and 3, respectively. • The baseline for comparison is the energy consumption without any prefetcher, so a positive value means that with the prefetcher the system dissipates more energy. • For instance, 0.1 means that with the prefetcher the system dissipates 10% more energy than the baseline.

  18. Energy Efficiency • In 90 nm technology, most prefetchers significantly increase overall energy consumption, which confirms the findings of previous studies. • Thus, in 90 nm technology, only very conservative prefetchers can be energy efficient.

  19. Energy Efficiency • Fig. 2: 90 nm

  20. Energy Efficiency • Fig. 3: 32 nm

  21. Energy Efficiency • In 32 nm technology, P4 is the most energy-efficient prefetcher, reducing overall energy by almost 4% on average; when running JPEG 2000 Decode, it achieves close to 10% energy savings. • P2 and P3 are still the most energy-inefficient prefetchers due to their aggressiveness. However, in the worst case they consume only 25% extra energy, a four-fold reduction compared to the 90 nm implementations. • Thus most prefetchers provide performance gains with less than 5% energy overhead, and P1 and P4 even result in 2% to 5% energy reductions.

  22. Energy Consumption Analysis • In Equation 1, the total energy consumption consists of two contributors: static energy (Estatic) and dynamic energy (Edynamic). • Nm is the number of read/write memory accesses. • Edynamic is the product of Nm and the energy dissipated on the bus and memory subsystem by each access (E'm).

  23. Energy Consumption Analysis • Estatic is the product of the overall execution time (t) and the system static power consumption (Pstatic). • When prefetchers accelerate execution, the reduced execution time lowers the static energy consumption. • However, prefetchers generate a significant number of extra memory subsystem accesses, which is pure dynamic-energy overhead. • Equation 1: E = Estatic + Edynamic = (Pstatic x t) + (Nm x E'm)
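
To make the trade-off in Equation 1 concrete, the sketch below evaluates it for a run without and a run with prefetching; every number is a made-up placeholder, not a value measured in the paper.

```python
# E = Estatic + Edynamic = (Pstatic x t) + (Nm x E'm), per Equation 1.
def total_energy(p_static_watts, exec_time_s, n_mem_accesses, e_per_access_j):
    e_static = p_static_watts * exec_time_s       # Estatic = Pstatic x t
    e_dynamic = n_mem_accesses * e_per_access_j   # Edynamic = Nm x E'm
    return e_static + e_dynamic

# Without a prefetcher: longer runtime, fewer memory accesses (placeholder values).
e_base = total_energy(p_static_watts=0.5, exec_time_s=1.00,
                      n_mem_accesses=1_000_000, e_per_access_j=2e-7)
# With a prefetcher: shorter runtime (less static energy) but extra accesses
# (more dynamic energy) -- the trade-off the slides analyze.
e_pref = total_energy(p_static_watts=0.5, exec_time_s=0.95,
                      n_mem_accesses=1_150_000, e_per_access_j=2e-7)
print(e_base, e_pref)  # roughly 0.70 J vs 0.705 J with these placeholders
```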

  24. Energy Consumption Analysis Table 4 Energy Category

  25. Energy Consumption Analysis Fig 4

  26. Energy Consumption Analysis • In 90 nm technology, dynamic energy contributes up to 66% of the total energy consumption: 14% from the prefetcher and 52% from the memory subsystem. Static energy accounts for only 34% of the total energy consumption. • Hence, although the prefetchers reduce execution time, there is little room for total energy saving, which makes most prefetchers energy inefficient in 90 nm implementations.

  27. Energy Consumption Analysis • In 32 nm technology, static energy contributes over 66% of the total energy consumption: 65% from the memory subsystem and 1% from the prefetcher hardware. • Dynamic energy is far smaller than static energy. • Therefore, in 32 nm technology, prefetchers become energy efficient in many more cases.

  28. Energy Efficiency Model • We propose an analytical model to evaluate energy efficiency. Equation 2: Eno-pref > Epref (the condition for the prefetcher to be energy efficient). • To simplify the model, we assume there is only one level in the memory subsystem. Compared to Eno-pref, Epref has two additional contributors: the static and dynamic energy consumption of the prefetcher hardware. Here t1 and Nm1 denote the execution time and the number of memory accesses without prefetching, t2 and Nm2 those with prefetching, and Np the number of prefetcher-hardware accesses. • Equation 3: (Pm-static x t1) + (Nm1 x E'm) > (Pm-static x t2) + (Nm2 x E'm) + (Pp-static x t2) + (Np x E'p) • Equation 4: (t1 - t2)/t1 > [(Nm2 - Nm1) x E'm + Np x E'p + Pp-static x t2] / (Pm-static x t1)

  29. Energy Efficiency Model • The left-hand side is the performance gain resulting from prefetching. • The numerator of the right-hand side contains three terms: the dynamic energy overhead incurred by the extra memory accesses, the dynamic energy of the prefetcher hardware, and the static energy consumption of the prefetcher hardware. • The denominator of the right-hand side represents the static energy of the original design without prefetching.

  30. Energy Efficiency Model • As summarized in Equation 5, for a prefetcher to be energy efficient, the performance gain (G) it brings must be greater than the ratio of the energy overhead (Eoverhead) it incurs to the original static energy (Eno-pref-static). • Equation 5: G > Eoverhead / Eno-pref-static • Equation 6: EEI = G - Eoverhead / Eno-pref-static
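
A small sketch of Equations 5 and 6, with the performance gain computed as G = (t1 - t2)/t1 per Equation 4; the numbers used are illustrative placeholders, not results from the paper.

```python
# EEI = G - Eoverhead / Eno-pref-static (Equation 6); a positive EEI means energy efficient.
def eei(t_no_pref, t_pref, e_overhead, e_no_pref_static):
    gain = (t_no_pref - t_pref) / t_no_pref        # performance gain G (left-hand side of Equation 4)
    return gain - e_overhead / e_no_pref_static    # Equation 6

# Example: a 5% speedup whose prefetcher overhead equals 3% of the baseline
# static energy gives EEI = 0.02 > 0, i.e. the prefetcher is energy efficient.
print(eei(t_no_pref=1.00, t_pref=0.95, e_overhead=0.03, e_no_pref_static=1.0))
```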

  31. Energy Efficiency Model • We define a metric, the Energy Efficiency Indicator (EEI), in Equation 6. A positive EEI indicates that the prefetcher is energy efficient, and vice versa. • We have validated the analytical results against the empirical results shown in the table, demonstrating the simplicity and effectiveness of our analytical model.

  32. Conclusion • With the new trend toward highly capable embedded mobile applications, it has become attractive to adopt high-performance techniques such as PREFETCHING. • Prefetchers no longer place a significant burden on energy consumption and should therefore be implemented.

  33. Conclusion • A simple analytical model has been demonstrated that estimates the energy effects of prefetching and allows them to be calculated effectively. • System designers can use it to estimate the energy efficiency of their hardware prefetcher designs and make changes accordingly.
