
Lucía G. Menezo, Valentín Puente, José Ángel Gregorio. University of Cantabria (Spain)

MOSAIC: The Case for a Scalable Coherence Protocol for Complex On-Chip Cache Hierarchies in Many-Core Systems



  1. MOSAIC: The Case for a Scalable Coherence Protocol for Complex On-Chip Cache Hierarchies in Many-Core Systems • Lucía G. Menezo, Valentín Puente, José Ángel Gregorio • University of Cantabria (Spain)

  2. Outline • Motivation • Directory Schemas • In-cache • Sparse • MOSAIC Coherence Protocol • Examples • Evaluation Results • Conclusions

  3. Motivation • Performance improvement: more processors per chip • Major challenges: off-chip bandwidth wall • Introduce cache into the chip • Complex on-chip cache hierarchies • Coherence protocol: fundamental role to play

  4. Motivation • Which coherence protocol to use with a large number of cores? • Broadcast-based protocols → high energy requirements • Directory-based protocols → more storage needed for sharing information • MOSAIC: a new coherence protocol • Directory without inclusiveness • Token Coherence to guarantee correctness

  5. Outline • Motivation • Directory Schemas • In-cache • Sparse • MOSAIC Coherence Protocol • Examples • Evaluation Results • Conclusions

  6. Directory schemas: In-cache • Each block in the LLC includes tag, data and the sharers information • The LLC receives the requests → it needs precise knowledge of the sharers • Inclusiveness is necessary: any block in the private levels must also be allocated in the LLC • Advantage: the coherence protocol is less complex • Disadvantage: every LLC block carries the storage overhead
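The storage disadvantage of the in-cache scheme can be made concrete with a small calculation. This is only a sketch: it assumes a full-map sharer vector (one bit per core) and 64-byte blocks, neither of which is stated on the slide, and the function name is hypothetical.

```python
def in_cache_directory_overhead(num_cores, block_bytes=64):
    """Extra storage per LLC block, as a fraction of the data size.

    Assumption (not from the slides): sharers are tracked with a
    full-map bit vector, one bit per core, attached to every LLC block.
    """
    sharer_bits = num_cores        # one presence bit per core
    data_bits = block_bytes * 8    # size of the data part of the block
    return sharer_bits / data_bits


if __name__ == "__main__":
    for cores in (16, 64, 256):
        pct = 100 * in_cache_directory_overhead(cores)
        print(f"{cores} cores: {pct:.2f}% extra storage on every LLC block")
```

Under these assumptions the overhead grows linearly with the core count and is paid on every LLC block, whether or not the block is actually shared.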

  7. Directory schemas: In-cache • [Figure: LLC with in-cache directory, connected through the interconnection network to the processors and their private caches; the directory storage in the LLC is marked "Overhead!!!"]

  8. Directory schemas: In-cache • [Figure: same diagram as the previous slide, with fewer processors and private caches; the directory storage in the LLC is marked "Overhead!!!" twice]

  9. Directory schemas: Sparse • Directory entries are separated from the data • Entries are allocated on demand • Overhead is proportional to the aggregate size of the private levels (not of the LLC) • Capacity and associativity have to be sufficient to keep the private-level cache tags

  10. Directory schemas: Sparse • [Figure: LLC and a separate sparse directory, connected through the interconnection network to the processors and their private caches]

  11. Directory schemas: Sparse • Duplicate-tag directory: holds all the tags of the private levels • Associativity = # cores * private-cache associativity • # sets = # private-cache sets • Example: 16 cores with 4-way 32KB L1 → 64-way directory • [Figure: grid of tag entries, one row per set]
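The sizing rule for a duplicate-tag directory can be checked with a few lines of arithmetic. The sketch below reproduces the 16-core, 4-way 32KB L1 example; the 64-byte block size and the function name are assumptions made for illustration.

```python
def duplicate_tag_geometry(num_cores, cache_bytes, ways, block_bytes=64):
    """Geometry of a duplicate-tag directory for identical private caches.

    Assumption (not from the slides): 64-byte cache blocks.
    """
    sets = cache_bytes // (ways * block_bytes)  # same set count as one private cache
    dir_ways = num_cores * ways                 # one way per private-cache way
    return {"sets": sets, "ways": dir_ways, "entries": sets * dir_ways}


if __name__ == "__main__":
    # 16 cores, each with a 4-way 32KB L1, as in the slide's example
    print(duplicate_tag_geometry(16, 32 * 1024, 4))
```

The 64-way result matches the slide, and it shows why this organization becomes impractical as cores are added: the associativity that must be searched on every lookup grows with the core count.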

  12. Directory schemas: Sparse • Decrease associativity: now << # cores * private-cache associativity • Increase the number of sets • One tag may be in various private caches, so each entry stores a tag plus its sharers • More than 1 tag per entry → conflicts • Inclusiveness needed → invalidate private data (recall messages) • [Figure: directory sets holding tag + sharers entries]

  13. Outline • Motivation • Directory Schemas • In-cache • Sparse • MOSAIC Coherence Protocol • Examples • Evaluation Results • Conclusions

  14. MOSAIC Protocol • In-cache or sparse → it doesn't matter • No inclusiveness • No invalidations of data in private caches • Reconstruction of sharing information on demand • Uses token counting to avoid extra traffic and guarantee correctness • Token Coherence protocol: • Initially each block has a fixed number of tokens (= # processors) • Read request: data and 1 token • Write request: data and all tokens
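The token rules listed on this slide can be illustrated with a toy model of a single block. This is a minimal sketch of generic token counting, not the MOSAIC implementation: the class, the "memory" holder and the donor-selection policy are all assumptions made for illustration.

```python
class TokenBlock:
    """Toy model of token counting for one cache block."""

    def __init__(self, num_procs):
        self.total = num_procs
        # Assumption: all tokens start at a holder called "memory".
        self.tokens = {"memory": num_procs}

    def _move(self, src, dst, count):
        # Tokens are only moved, never created or destroyed.
        self.tokens[src] -= count
        if self.tokens[src] == 0:
            del self.tokens[src]
        self.tokens[dst] = self.tokens.get(dst, 0) + count

    def read(self, requester):
        """Read request: the requester ends up with data and (at least) 1 token."""
        if self.tokens.get(requester, 0) >= 1:
            return
        # Assumed policy: take one token from the holder with the most tokens.
        donor = max(self.tokens, key=self.tokens.get)
        self._move(donor, requester, 1)

    def write(self, requester):
        """Write request: the requester collects data and all tokens."""
        for holder in list(self.tokens):
            if holder != requester:
                self._move(holder, requester, self.tokens[holder])

    def can_read(self, proc):
        return self.tokens.get(proc, 0) >= 1

    def can_write(self, proc):
        return self.tokens.get(proc, 0) == self.total


if __name__ == "__main__":
    block = TokenBlock(num_procs=4)
    block.read("P0")
    block.read("P1")
    block.write("P2")
    print(block.tokens)
```

Because the total number of tokens never changes, a writer that holds all of them knows no other copy can be readable, which is the correctness property the slide attributes to Token Coherence.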

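  15. MOSAIC Conceptual Approach • [Figure: processors P0, P1 and P2 with their private caches, connected through the on-chip network to the Last Level Cache (Data_slice and Dir_slice) and the Memory Controller. Cache entries are shown as State / Num. Tokens / Data, with the values I / 0 / N/A, O / 2 / DATA and S / 1 / DATA in the private caches; the directory slice holds a Sharers field. Numbered arrows (1 to 5) mark the message sequence.]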

  16. MOSAIC Key Facts • When data is not present in the LLC → broadcast for reconstruction • Private caches report the number of tokens they hold • Token counting avoids negative acknowledgements or timeouts • The reconstruction message piggybacks the type of request and the requestor • Key: the directory may replace entries silently, with no invalidations
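The reconstruction step can be sketched as a broadcast query whose replies carry token counts. This is only one plausible reading of the slide: the data layout, the function name and the idea of reporting tokens held outside the private caches are assumptions, not details taken from the presentation.

```python
def reconstruct_entry(private_caches, block, total_tokens):
    """Rebuild sharing information for `block` after a silent directory eviction.

    private_caches: dict mapping cache name -> {block: tokens held}
    (hypothetical layout; each lookup stands in for one broadcast reply).
    """
    held = {}
    for name, lines in private_caches.items():
        count = lines.get(block, 0)
        if count > 0:               # only caches holding tokens are sharers
            held[name] = count
    in_private = sum(held.values())
    return {
        "sharers": sorted(held),
        "tokens_in_private": in_private,
        # Tokens not reported by any private cache must be elsewhere,
        # for example at the LLC or in memory (assumption).
        "tokens_elsewhere": total_tokens - in_private,
    }


if __name__ == "__main__":
    caches = {"P0": {}, "P1": {"A": 2}, "P2": {"A": 1}, "P3": {}}
    print(reconstruct_entry(caches, "A", total_tokens=4))
```

Since every token is accounted for by counting, the sketch needs neither negative acknowledgements nor a timeout to decide that the sharing information is complete.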

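  17. MOSAIC Read Request • [Sequence diagram among P0, P1, P2, P3, Dir and LLC; labels recoverable from the figure, in order:] • Initial token counts: 3 tokens, 1 token • Read arrives with the directory entry in Invalid State → Reconstruction • Replies: Data + token, Info 1 token, Info 2 tokens + Owner • Directory contents as replies arrive: Sharers [P2], Owner: ? → Sharers [P2, P1], Owner: P1 → Sharers [P2, P1, P0], Owner: P1 • Unblock (info 1 token) • Second Read: Forward GETS to Owner, Data + token, Unblock → Sharers [P2, P1, P0, P3], Owner: P1 • State labels shown: IS, S, O, C, A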

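  18. MOSAIC Write Request • [Sequence diagram among P0, P1, P2, P3, Dir and LLC; labels recoverable from the figure, in order:] • Initial token counts: 3 tokens, 1 token • Write arrives with the directory entry in Invalid State → Reconstruction • Replies: Data + 3 tokens, 1 token • Unblock (info all tokens) → Sharers [P0], Owner: P0 • Directory Eviction • State labels shown: IS, S, O, C, A, IM, M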

  19. Outline • Motivation • Directory Schemas • In-cache • Sparse • MOSAIC Coherence Protocol • Examples • Evaluation Results • Conclusions

  20. Evaluation methodology • [Figure: floorplans of the simulated chips, showing cores, LLC slices and routers (R); labels run up to Core 15 and Slice 31]

  21. Simulation stack and Workloads • GEMS: full-system evaluation • SLICC: Specification Language for Implementing Cache Coherence

  22. MOSAIC Performance: Reducing associativity • [Chart: normalized execution time] • 128KB → 16K entries (8 bytes per entry)

  23. Number of misses • x2 • [Chart: normalized number of misses]

  24. MOSAIC Performance: Reducing associativity and capacity • [Chart: normalized execution time] • 128KB → 16K entries (8 bytes per entry) • 16KB → 2K entries
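The directory sizes quoted on the performance slides follow directly from the 8-byte entry size. A small sketch of the arithmetic (the function name is hypothetical):

```python
def directory_entries(capacity_bytes, entry_bytes=8):
    """Number of directory entries that fit in a given capacity.

    The 8-byte entry size is the figure quoted on the slides.
    """
    return capacity_bytes // entry_bytes


if __name__ == "__main__":
    for kib in (128, 16):
        entries = directory_entries(kib * 1024)
        print(f"{kib}KB directory: {entries // 1024}K entries")
```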

  25. MOSAIC Latency • 16KB → 2K entries

  26. MOSAIC Link Utilization Average network link utilization

  27. MOSAIC Link Utilization vs. Dir 40%!!

  28. MOSAIC Scalability • 16 cores configuration Normalized link utilization

  29. Conclusions • Low complexity and great scalability • Very low storage overhead • No noticeable energy cost • Alternative for future many-core cache-coherent CMPs • MOSAIC Coherence Protocol: the bandwidth scalability of a directory with the elegance of Token Coherence

  30. Thank you for your attention

  31. Realistic Cache Configuration • L1: 4-way 32KB / L2: 8-way 256KB • Directory sizes: x2 full dir and 1/10 full dir • [Chart: normalized execution time] • Same experiment with BASE: 20% impact in some cases

  32. MOSAIC Energy Normalized Dynamic Energy
