
The Memory Behavior of Data Structures

Kartik K. Agaram, Stephen W. Keckler, Calvin Lin, Kathryn McKinley. Department of Computer Sciences, The University of Texas at Austin.



Presentation Transcript


  1. The Memory Behavior of Data Structures • Kartik K. Agaram, Stephen W. Keckler, Calvin Lin, Kathryn McKinley • Department of Computer Sciences, The University of Texas at Austin

  2. Memory hierarchy trends • Growing latency to main memory • Growing cache complexity • More cache levels • New mechanisms, optimizations • Growing application complexity • Lots of abstraction • Takeaway: it is hard to predict how an application will perform on a specific system

  3. Application understanding is hard • Observations can generate gigabytes of data • Aggregation is necessary • Current metrics are too lossy • Different application behaviors → similar miss rate • New metrics needed, richer but still concise • Our approach: data structure decomposition

  4. Why decompose by data structure? • Irregular app = multiple regular data structures • while (tmp) tmp=tmp->next; • Data structures are high-level • Results easy to visualize • Can be correlated back to application source code

  5. Outline • Data structure decomposition using DTrack • Automatic instrumentation + timing simulation • Methodology • Tools, configurations simulated, benchmarks studied • Results • Data structures causing the most misses • Different types of access patterns • Case study: data structure criticality

  6. Conventional simulation methodology • Simulated application shares resources with simulator • disk, file system, network • ...but is not aware of it • [Diagram: Application, Simulator, Host Processor, Resources]

  7. A different perspective • Application can communicate with simulator • Leave core application oblivious; automatically add simulator-aware instrumentation • [Diagram: Simulator, Application, Resources]

  8. DTrack • Application Sources → (Source Translator) → Instrumented Sources → (Compiler) → Application Executable → (Simulator) → Detailed Statistics • DTrack’s protocol for application-simulator communication

  9. DTrack’s protocol • Application stores mapping at a predetermined shared location • (start address, end address) → variable name • Application signals simulator somehow • We enhance ISA with new opcode • Other techniques possible • Simulator detects signal, reads shared location • Result: the simulator now knows the variable names of address regions
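The simulator side of this protocol can be pictured as a small table of address regions. The sketch below is only an illustration under stated assumptions, not DTrack's actual implementation: the names `region_t`, `region_register`, `region_lookup`, and the fixed capacity `MAX_REGIONS` are all hypothetical, and the end address is treated as exclusive.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_REGIONS 1024  /* arbitrary capacity chosen for this sketch */

/* One (start address, end address) -> variable name mapping. */
typedef struct {
    uintptr_t start;   /* first byte of the region */
    uintptr_t end;     /* one past the last byte (assumed exclusive) */
    const char *name;  /* source-level variable name */
} region_t;

static region_t regions[MAX_REGIONS];
static size_t region_count = 0;

/* Record a mapping the simulator has just read from the shared location.
 * Returns 0 on success, -1 if the region is empty or the table is full. */
int region_register(uintptr_t start, size_t size, const char *name)
{
    assert(name != NULL);
    if (size == 0 || region_count == MAX_REGIONS)
        return -1;
    regions[region_count].start = start;
    regions[region_count].end = start + size;
    regions[region_count].name = name;
    region_count++;
    return 0;
}

/* Map an accessed address to a variable name, or NULL if unknown.
 * Newest entries are searched first so that a reused heap address
 * resolves to its most recent allocation. */
const char *region_lookup(uintptr_t addr)
{
    for (size_t i = region_count; i > 0; i--) {
        const region_t *r = &regions[i - 1];
        if (addr >= r->start && addr < r->end)
            return r->name;
    }
    return NULL;
}
```

A linear scan keeps the sketch short; a simulator handling many live heap regions would more plausibly use an interval tree or a sorted array.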

  10. DTrack instrumentation • Global variables: just after initialization
      Before:
          int Time;
          int main() { … }
      After:
          int Time;
          int main() {
              print(FILE, "Time", Time, sizeof(Time));  /* publish name and size of the global */
              …
              asm("mop");                               /* signal the simulator */
          }

  11. DTrack instrumentation • Heap variables: just after allocation
      Before:
          x = malloc(4);
      After:
          x = malloc(4);
          DTRACK_PTR = x;
          DTRACK_NAME = "x";
          DTRACK_SIZE = 4;
          asm("mop");
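The heap instrumentation above could be packaged as a wrapper so that every allocation site announces itself the same way. This is a rough sketch, not the code DTrack's source translator emits: `dtrack_malloc`, `DTRACK_MALLOC`, `dtrack_signal`, and `dtrack_signal_count` are hypothetical names, the types of the three shared variables are assumed, and a plain function stands in for `asm("mop")` so the example builds on an ordinary host. Skipping the signal when `malloc` fails is also an assumption.

```c
#include <assert.h>
#include <stdlib.h>

/* Shared locations named on the slide; their types are assumptions. */
void *DTRACK_PTR;
const char *DTRACK_NAME;
size_t DTRACK_SIZE;

/* Hypothetical stand-in for asm("mop"): counts signals instead of
 * executing the special opcode a DTrack-aware simulator would detect. */
unsigned long dtrack_signal_count = 0;
void dtrack_signal(void)
{
    dtrack_signal_count++;
}

/* Allocate, publish (pointer, name, size), then signal the simulator. */
void *dtrack_malloc(size_t size, const char *name)
{
    assert(name != NULL);
    void *p = malloc(size);
    if (p != NULL) {           /* assumption: nothing to announce on failure */
        DTRACK_PTR = p;
        DTRACK_NAME = name;
        DTRACK_SIZE = size;
        dtrack_signal();
    }
    return p;
}

/* Uses the variable's own spelling as the name, like the slide's "x". */
#define DTRACK_MALLOC(var, size) ((var) = dtrack_malloc((size), #var))
```

With this in place, `DTRACK_MALLOC(x, 4);` plays the role of the four "After" statements on the slide.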

  12. Design decisions • Source-based rather than binary-based translation • Local variables – no instrumentation • Instrumenting every call/return is too much overhead • Doesn’t cause many cache misses anyway • Dynamic allocation on the stack: handle alloca just like malloc • Signalling opcode: overload an existing one • avoid modifying compiler, allow running natively

  13. Instrumentation can perturb app behavior • Minimizing perturbance • Global variables are easy • One-time cost • Heap variables are hard • DTRACK_PTR, etc. always hit in the cache • Measuring perturbance • Communicate specific start and end points in application to simulator • Compare instruction counts between them with and without instrumentation • Result: instruction count increases by less than 4% even with frequent malloc
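The perturbance measurement amounts to a relative difference between two instruction counts taken over the same start and end points. A minimal sketch, assuming the two counts are already available as integers; the function name `perturbance_percent` and any numbers passed to it are hypothetical, not figures from the talk.

```c
#include <assert.h>
#include <stdint.h>

/* Percentage of extra instructions executed between the same two program
 * points when the instrumentation is present. */
double perturbance_percent(uint64_t base_insns, uint64_t instrumented_insns)
{
    assert(base_insns > 0);  /* the uninstrumented run must execute something */
    return 100.0 * ((double)instrumented_insns - (double)base_insns)
                 / (double)base_insns;
}
```

For example, a made-up pair of 1,000,000 uninstrumented and 1,030,000 instrumented instructions corresponds to 3% overhead.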

  14. Outline • Data structure decomposition using DTrack • Automatic instrumentation + timing simulation • Methodology • Tools, configurations simulated, benchmarks studied • Results • Data structures causing the most misses • Different types of access patterns • Case study: data structure criticality

  15. Methodology • Source translator: C-Breeze • Compiler: Alpha GEM cc • Simulator: sim-alpha • Validated model of 21264 pipeline • Simulated machine: Alpha 21264 • 4-way issue, 64KB 3-cycle DL1 • Benchmarks: 12 C applications from SPEC CPU2000 suite

  16. Major data structures by DL1 misses • [Chart; axis: % DL1 misses]

  17. Large variety in access patterns (Code + Data profile = Access pattern)
      • art: f1[i], bu[i], each with i=i+1
      • mcf: node[i] with i=i+1; node->child, node->parent, node->sibling, node->siblingp with node = DFS(node)
      • twolf: t1 = b[c[i]->cblock]; t2 = t1->tileterm; t3 = n[t2->net]; … with i=rand()

  18. Most misses ≣ Most pipeline stalls? • Process: • Detect stall cycles when no instructions were committed • Assign blame to data structure of oldest instruction in pipeline • Result • Stall cycle ranks track miss count ranks • Exceptions: • tds in 179.art • search in 186.crafty
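The blame-assignment process on this slide can be sketched as a pass over a per-cycle trace. This is an illustration under assumptions, not sim-alpha's implementation: the `cycle_t` record, the `NO_STRUCT` marker, the fixed `NUM_STRUCTS`, and the function `blame_stalls` are hypothetical, and each cycle is assumed to already carry the data structure id touched by the oldest in-flight instruction.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_STRUCTS 8     /* arbitrary number of tracked data structures */
#define NO_STRUCT   (-1)  /* oldest instruction touches no tracked structure */

/* What the sketch assumes the simulator records for every cycle. */
typedef struct {
    unsigned committed;  /* instructions committed in this cycle */
    int oldest_struct;   /* id of the data structure accessed by the oldest
                            instruction in the pipeline, or NO_STRUCT */
} cycle_t;

/* For each cycle with zero commits, charge one stall cycle to the data
 * structure of the oldest instruction. The caller zeroes the counters. */
void blame_stalls(const cycle_t *trace, size_t n,
                  unsigned long stalls[NUM_STRUCTS],
                  unsigned long *unattributed)
{
    assert(trace != NULL && stalls != NULL && unattributed != NULL);
    for (size_t i = 0; i < n; i++) {
        if (trace[i].committed != 0)
            continue;                    /* not a stall cycle */
        int id = trace[i].oldest_struct;
        if (id >= 0 && id < NUM_STRUCTS)
            stalls[id]++;
        else
            (*unattributed)++;           /* stall not tied to a tracked structure */
    }
}
```

Ranking the resulting `stalls` counters and comparing that order with the miss-count ranking is the comparison the slide describes.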

  19. Summary • Toolchain for mapping addresses to high-level data structures • Communicating information to simulator • Reveals new patterns about applications • Applications show wide variety of distributions • Within an application, data structures have a variety of access patterns • Misses not correlated to accesses or footprint • ...but they correlate well with data structure criticality
