EFetch: Optimizing Instruction Fetch for Event-Driven Web Applications

Presentation Transcript


  1. EFetch: Optimizing Instruction Fetch for Event-Driven Web Applications. Gaurav Chadha, Scott Mahlke, Satish Narayanasamy. University of Michigan, Electrical Engineering and Computer Science. August 2014.

  2. Evolution of the Web
  • Web 1.0: static web pages; the server publishes content and users passively view it on the client.
  • Web 2.0: dynamic web pages; users collaborate and generate content, which flows between client and server.

  3. Evolution of the Web
  • In Web 2.0, computation happens on the client as well as the server, enabling a rich user experience.

  4. Evolution of the Web
  • Compared to Web 1.0 pages (e.g., yahoo.com in 1996), Web 2.0 pages (e.g., yahoo.com in 2014) execute 30x more instructions.
  • A rich user experience and browser responsiveness therefore demand good client-side performance.

  5. Core Specialization
  [Diagram: a multi-core processor with four cores (Core 1-4), each with private caches.]

  6. Web Core
  [Diagram: the same multi-core processor, with Core 1 specialized as a Web Core augmented with WebBoost.]

  7. WebBoost
  [Chart: breakdown of web browser computational components (client-side script vs. other) for Web 1.0 and Web 2.0.]
  • Browser responsiveness depends on web client-side script performance.
  • Script performance suffers from high L1-I cache misses.
  • Goal: a specialized instruction prefetcher for web client-side scripts.

  8. Poor I-Cache Performance
  • Web pages tend to support numerous functionalities (graphics effects, image editing, online forms, document editing, web personalization, games, audio & video), giving them a large instruction footprint and little hot code.
  • Web client-side script inefficiencies cause code bloat: the code is JIT-compiled by the JS engine (V8, IonMonkey, Nitro, Chakra) and dynamically typed.

  9. Lack of Hot Code
  [Chart data: 860 vs. 20,400; 95%.]

  10. Poor I-Cache Performance • Compared to conventional programs, JS code incurs many more L1-I misses • Perfect I-Cache: 53% speedup

  11. Problem Statement
  • Problem: poor I-cache performance for web client-side scripts.
  • Opportunity: web client-side scripts are executed in an event-driven model.
  • Solution: a specialized prefetcher, customized for the event-driven execution model, that identifies distinct events in the instruction stream.

  12. Outline

  13. Web Browser Events
  [Diagram: a mouse click is an external input event; an onload event is an internal browser event.]

  14. Event-driven Web Applications
  • Events are inserted into an event queue; the renderer thread pops the event at the head and executes it on the JS engine (sketched below). Events can generate other events, and when the event queue is empty the program waits.
  • External input events: mouse click, keyboard key press, GPS events. Internal events: timer event, DOMContentLoaded.
  • I-cache performance is poor because different events tend to execute different code, and events typically execute for a very short duration.
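
A minimal Python sketch of this execution model, not browser code: the event names and handler bodies are hypothetical, and a real renderer interleaves far more work (parsing, layout, painting) with script execution.

```python
from collections import deque

# Sketch of the event-driven execution model described above: events
# arrive in a queue, and the renderer thread pops them one at a time
# and runs each (typically short) handler on the JS engine. Handlers
# may themselves enqueue further (internal) events.

event_queue = deque()

def on_timer(queue):
    # Internal event; does a small amount of work and finishes.
    pass

def on_click(queue):
    # External input event; its handler schedules an internal event.
    queue.append(("timer", on_timer))

# External input events (e.g., a mouse click) are inserted at the tail.
event_queue.append(("click", on_click))

# Renderer loop: pop from the head, execute, repeat. When the queue is
# empty, the program waits for more input (here, we simply stop).
while event_queue:
    name, handler = event_queue.popleft()
    handler(event_queue)   # different events tend to run different code
```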

  15. EFetch
  • Event Fetch: an instruction prefetcher for event-driven web applications.
  • Technique: uses an event ID to identify distinct events (E1, E2, E3) in the renderer thread's instruction stream; the event ID is augmented to create an event signature that predicts control flow well.

  16. Event Signature
  • Event ID: formed by the browser from the event type and event handler; uniquely identifies an event.
  • Function call context: formed in the hardware from the context-depth (3) ancestor functions on the call stack.
  • Event signature = event ID + function call context; it correlates well with the program control flow (see the sketch below).
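
A minimal sketch of how an event signature might be formed, assuming the event ID is derived from the event type and handler address (supplied by the browser) and the hardware appends the top context-depth (3) ancestor functions from the call stack; the hashing and encodings here are illustrative, not the paper's exact hardware format.

```python
CONTEXT_DEPTH = 3  # ancestor functions taken from the call stack

def event_id(event_type, handler_addr):
    # Formed by the browser: uniquely identifies an event by its type
    # and the address of its registered handler.
    return hash((event_type, handler_addr))

def event_signature(ev_id, call_stack):
    # Formed in hardware: augment the event ID with the top
    # CONTEXT_DEPTH ancestor functions on the call stack, so that the
    # signature correlates well with control flow inside the event.
    ancestors = tuple(call_stack[-CONTEXT_DEPTH:])
    return hash((ev_id, ancestors))

# Example: a click event whose handler has called f() -> g() -> h();
# the function addresses are made up for illustration.
ev = event_id("click", 0x4000)
sig = event_signature(ev, call_stack=[0x4000, 0x4100, 0x4200])
```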

  17. Instruction Prefetcher: Facets
  • What to prefetch?
  • When to prefetch?

  18. What to Prefetch?
  • Naïve solution: on a function call, prefetch the function body. But this is too late.
  • Our approach: on a function call, predict its callees and prefetch their function body addresses (sketched below).
  [Diagram: the event signature indexes a table that maps each predicted callee ci to its I-cache addresses, e.g., c1: <I-Cache Addr>, c2: <I-Cache Addr>, c3: <I-Cache Addr>.]
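
A sketch of the "what to prefetch" step under the same assumptions: on entering a function, the prefetcher consults a table keyed by (event signature, caller) and enqueues the body addresses of the predicted callees. The table contents and the prefetch-queue interface are hypothetical placeholders.

```python
# Hypothetical context-table contents: (event signature, caller) maps
# to the predicted callees, each with the I-cache block addresses of
# its function body.
context_table = {
    (0xBEEF, "f"): {"c1": [0xA0, 0xA4], "c2": [0xB0], "c3": [0xC0, 0xC4]},
}

prefetch_queue = []

def on_function_call(signature, caller):
    # Prefetching a callee's body only when it is called is too late.
    # Instead, when `caller` is entered, predict its callees and issue
    # prefetches for their body addresses ahead of the calls.
    for callee, addrs in context_table.get((signature, caller), {}).items():
        prefetch_queue.extend(addrs)

on_function_call(0xBEEF, "f")
print(prefetch_queue)   # c1, c2, c3 body addresses, queued early
```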

  19. Duplication of Addresses
  • A function can appear in two distinct event signatures, so its body addresses might be duplicated.
  [Diagram: callee h is recorded with I-cache addresses <A, B, C> under one event and <A, C, D> under another.]

  20. Compacting I-Cache Addresses
  [Diagram: h's addresses are merged into a single list <A, B, C, D>; each event keeps a per-callee bit vector over that list, e.g., (1, 1, 1, 0) selects <A, B, C> and (1, 0, 1, 1) selects <A, C, D>.]

  21. Recording Callees and Function Bodies
  [Diagram: the Context Table, indexed by event signature, records each callee (c1, c2, ...) together with a bit vector; the Function Table holds each function's merged body addresses, e.g., <A, B, C, D>.]
  These two structures are sketched below.
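
A sketch of these recording structures, assuming one merged address list per function and one bit vector per (event signature, callee) pair; table sizing, indexing, and replacement are simplified away, and the addresses are made up.

```python
# Function Table: one merged, deduplicated list of I-cache block
# addresses per function (e.g., h -> <A, B, C, D>).
function_table = {}

# Context Table: per event signature, each observed callee maps to a
# bit vector over that function's merged list, so a function body is
# never stored twice.
context_table = {}

def record(signature, callee, addrs):
    merged = function_table.setdefault(callee, [])
    for a in addrs:
        if a not in merged:
            merged.append(a)        # grow the shared address list
    # Bit i is set if the i-th merged address was touched under this
    # event signature.
    context_table.setdefault(signature, {})[callee] = \
        [1 if a in addrs else 0 for a in merged]

def addresses_to_prefetch(signature, callee):
    merged = function_table.get(callee, [])
    bits = context_table.get(signature, {}).get(callee, [])
    return [a for a, b in zip(merged, bits) if b]

# Example from the slides: h touches <A, B, C> under one event
# signature and <A, C, D> under another; the merged list becomes
# <A, B, C, D>, and each signature keeps only a small bit vector.
A, B, C, D = 0xA0, 0xB0, 0xC0, 0xD0
record("event1", "h", [A, B, C])
record("event2", "h", [A, C, D])
print(addresses_to_prefetch("event1", "h"))   # A, B, C
print(addresses_to_prefetch("event2", "h"))   # A, C, D
```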

  22. Instruction Prefetcher: Facets
  • What to prefetch?
  • When to prefetch?

  23. When to Prefetch?
  • It is important to prefetch sufficiently in advance, but not too early.
  • Goal: prefetch the next predicted function. This hides LLC hit latency, which is typically sufficient because the instruction miss rate in the LLC is low.
  • Our design: keep track of a speculative call stack, the Predictor Stack.

  24. Predictor Stack
  • Maintains the call stack as predicted by the prefetcher.
  • Helps prefetch the next function predicted to be called (sketched below).
  [Diagram: as functions f, h, and i are called and return, the Predictor Stack runs ahead of the actual Call Stack, indicating which function to prefetch at each step.]
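
A sketch of the predictor-stack idea, assuming a separate prediction source that names the next expected callee for each (event signature, function); that table and the prefetch interface are placeholders, and the real hardware also handles mispredictions and return-path prefetching.

```python
# Speculative call stack maintained by the prefetcher. It mirrors the
# real call stack but runs one predicted call ahead, so the next
# function's body is requested early enough to hide an LLC hit
# (usually enough, since LLC instruction misses are rare).

# Hypothetical prediction: for each (signature, function), the callee
# expected to be invoked next.
next_callee = {("event1", "f"): "h", ("event1", "h"): "i"}

predictor_stack = []
prefetched = []

def on_call(signature, func):
    predictor_stack.append(func)
    nxt = next_callee.get((signature, func))
    if nxt is not None:
        prefetched.append(nxt)   # prefetch the predicted next callee

def on_return():
    # On a return, fall back to the caller, whose body is expected to
    # be resident already (it was prefetched before it was entered).
    if predictor_stack:
        predictor_stack.pop()

on_call("event1", "f")   # predicts and prefetches h
on_call("event1", "h")   # predicts and prefetches i
on_return()              # back in f
print(prefetched)        # ['h', 'i']
```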

  25. Architecture
  [Block diagram: the Event-ID and the function call context form the event signature, which indexes the Context Table to obtain the predicted callees and their bit vectors; the Function Table supplies the corresponding I-cache addresses, which are pushed into the Prefetch Queue; the Call Stack and Predictor Stack track actual and predicted calls.]

  26. Methodology
  • Instrumented an open-source browser, Chromium, which uses the V8 JS engine shared with Google Chrome.
  • Browsing sessions of popular websites were studied, and their instruction traces were simulated with Sniper Sim.
  • The simulation focused on JS code execution.

  27. Architectural Details
  • Modeled after the Samsung Exynos 5250.
  • Core: 4-wide OoO, 1.66 GHz.
  • L1-(I, D) caches: 32 KB, 2-way.
  • L2 cache: 2 MB, 16-way.
  • Energy modeling: Vdd = 1.2 V, 45 nm.

  28. Related Work
  We compare EFetch with the following designs:
  • L1I-64KB: the hardware overhead of EFetch provisioned instead as extra L1-I cache capacity (64 KB).
  • N2L: next-2-line prefetcher.
  • CGP: Call Graph Prefetching (Annavaram et al., HPCA '01).
  • PIF: Proactive Instruction Fetch (Ferdman et al., MICRO '11).
  • RDIP: Return-address-stack Directed Instruction Prefetching (Kolli et al., MICRO '13).

  29. Prefetcher Efficacy

  30. Performance

  31. Energy Consumption
  • The prefetching hardware structures consume little energy, ranging from 0.01% of total energy for EFetch to 1.06% for PIF.
  • Erroneous prefetches consume a significant fraction of energy.

  32. Energy, Performance, Area
  [Scatter plot of energy vs. performance for N2L, CGP, PIF, RDIP, and EFetch.]

  33. Conclusion
  • Web 2.0 places greater demands on client-side computing.
  • I-cache performance is poor for web client-side script execution.
  • EFetch exploits the event-driven nature of web client-side script execution.
  • It achieves a 29% performance improvement over no prefetching.

  34. EFetch: Optimizing Instruction Fetch for Event-Driven Web Applications. Gaurav Chadha, Scott Mahlke, Satish Narayanasamy. University of Michigan, Electrical Engineering and Computer Science. August 2014.

  35. Performance Potential Perfect I-Cache: 53% speedup
