August 8th, 2011, Kevan Thompson

Creating a Scalable Coherent L2 Cache

Presentation Transcript


  1. Creating a Scalable Coherent L2 Cache
     August 8th, 2011, Kevan Thompson

  2. Outline
     • Motivation
     • Cache Background
     • System Overview
     • Methodology
     • Progress
     • Future Work

  3. Motivation
     Goal:
     • Create a configurable shared Last Level Cache for use in the PolyBlaze system

  4. Introduction
     Kevan, Zia, Eric

  5. Cache Background
     • In modern systems, processors outperform main memory, creating a bottleneck
     • This problem only gets worse as more cores contend for memory
     • This problem is reduced if each processor maintains a local copy of the data

  6. Caches
     • A cache is a small amount of memory on the same die as the processor
     • The cache provides lower latency and higher throughput than main memory
     • Systems may include multiple cache levels
     • The smallest and most local cache is the L1 cache; the next level is the L2, and so on

  7. Shared Last Level Cache
     • Acts as a common location for data
     • Can be used to maintain cache coherency between processors
     • Does not exist in the current MicroBlaze system
     • We will design our own shared L2 cache to maintain cache coherency

  8. Cache Speeds
     • In typical systems:
       • An L1 cache is very fast (1 or 2 cycles)
       • An L2 cache is slower (10's of cycles)
       • Main memory is very slow (100's of cycles)

  9. Cache Speeds
     • In our system we expect:
       • The L1 cache to be very fast (1 or 2 cycles)
       • The L2 cache to take about 10 cycles
       • Main memory to be relatively fast (10's of cycles)
     • To model the memory bottleneck of a much faster system, we'll need to stall main memory accesses
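One rough way to see why main memory needs to be stalled is the standard average memory access time (AMAT) formula. The sketch below uses cycle counts from the ranges on slides 8 and 9, but the miss rates and the `STALL_CYCLES` knob are made-up placeholders, not PolyBlaze measurements.

```python
def amat(l1_hit, l1_miss_rate, l2_hit, l2_miss_rate, mem_latency):
    """Average memory access time (cycles) for an L1 + L2 + main memory hierarchy."""
    return l1_hit + l1_miss_rate * (l2_hit + l2_miss_rate * mem_latency)


# Placeholder miss rates for illustration only; not measured PolyBlaze data.
L1_MISS_RATE = 0.05
L2_MISS_RATE = 0.20

typical = amat(2, L1_MISS_RATE, 10, L2_MISS_RATE, 200)  # 100's of cycles to memory
ours = amat(2, L1_MISS_RATE, 10, L2_MISS_RATE, 20)      # 10's of cycles to memory

# Hypothetical knob: extra stall cycles added to every main-memory access.
STALL_CYCLES = 180
ours_stalled = amat(2, L1_MISS_RATE, 10, L2_MISS_RATE, 20 + STALL_CYCLES)

print(f"typical: {typical:.2f}  ours: {ours:.2f}  ours + stall: {ours_stalled:.2f}")
```

With memory only about 20 cycles away, the L2 barely changes the average; padding memory accesses restores the memory-to-L2 latency ratio of a typical system, so the L2 cache is evaluated under a realistic bottleneck.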

  10. Direct Mapped Cache
      • Caches store data, a valid bit, and a unique identifier called a tag
      • In a direct mapped cache, each address can be stored in exactly one line
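A minimal sketch of a direct mapped lookup, assuming power-of-two line count and line size. The class and method names are hypothetical, and the data payload is left empty since only the valid bit and tag decide hits and misses.

```python
from dataclasses import dataclass


@dataclass
class Line:
    valid: bool = False
    tag: int = 0
    data: bytes = b""  # payload omitted in this sketch


class DirectMappedCache:
    """Toy direct mapped cache: every address maps to exactly one line."""

    def __init__(self, num_lines: int, line_size: int):
        # Powers of two keep the index and tag as simple bit fields.
        assert num_lines & (num_lines - 1) == 0
        assert line_size & (line_size - 1) == 0
        self.num_lines = num_lines
        self.line_size = line_size
        self.lines = [Line() for _ in range(num_lines)]

    def _split(self, addr: int):
        line_addr = addr // self.line_size  # drop the byte offset
        index = line_addr % self.num_lines  # the one line this address can use
        tag = line_addr // self.num_lines   # remaining high-order bits
        return index, tag

    def access(self, addr: int) -> bool:
        """Return True on a hit; on a miss, overwrite the line with the new tag."""
        index, tag = self._split(addr)
        line = self.lines[index]
        if line.valid and line.tag == tag:
            return True
        self.lines[index] = Line(valid=True, tag=tag)
        return False


cache = DirectMappedCache(num_lines=4, line_size=32)
results = [cache.access(a) for a in (0x00, 0x04, 0x80, 0x00)]
print(results)
```

Addresses `0x00` and `0x80` share index 0 but differ in tag, so each evicts the other; this conflict is what set-associative caches are meant to reduce.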

  11. Tags
      • As an example, imagine a system with the following:
        • 32-bit address bus and 32-bit word size
        • 64-KByte cache with 32-byte line size
      • Therefore we have 64 KB / 32 B = 2048 (2^11) lines
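Carrying the slide's example one step further: the 32-byte line gives a 5-bit offset, 2048 lines give an 11-bit index, and the remaining 16 bits of the 32-bit address form the tag. The helper names and the sample address `0x12345678` are arbitrary, chosen only for illustration.

```python
def field_widths(addr_bits, cache_bytes, line_bytes):
    """Tag / index / offset widths for a direct mapped cache.

    Assumes cache_bytes and line_bytes are powers of two.
    """
    num_lines = cache_bytes // line_bytes
    offset_bits = line_bytes.bit_length() - 1  # log2(line_bytes)
    index_bits = num_lines.bit_length() - 1    # log2(num_lines)
    tag_bits = addr_bits - index_bits - offset_bits
    return num_lines, tag_bits, index_bits, offset_bits


def split_address(addr, index_bits, offset_bits):
    """Return the (tag, index, offset) fields of an address."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (index_bits + offset_bits)
    return tag, index, offset


# Slide 11 example: 32-bit addresses, 64-KByte cache, 32-byte lines.
num_lines, tag_bits, index_bits, offset_bits = field_widths(32, 64 * 1024, 32)
print(num_lines, tag_bits, index_bits, offset_bits)  # 2048 16 11 5

tag, index, offset = split_address(0x12345678, index_bits, offset_bits)
print(hex(tag), hex(index), hex(offset))  # 0x1234 0x2b3 0x18
```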

  12. Set-Associative Cache
      • A cache with n possible entries for each address is called an n-way set-associative cache
      (Figure: 4-way set-associative cache)
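A toy n-way lookup, tracking tags only. In hardware all Ways of a set are compared in parallel (slide 20 describes one tag bank per Way); the loop below just stands in for those comparators. Class and method names are hypothetical.

```python
class SetAssociativeCache:
    """Toy n-way set-associative cache that tracks tags only."""

    def __init__(self, num_sets: int, ways: int, line_size: int):
        self.num_sets = num_sets
        self.ways = ways
        self.line_size = line_size
        # tags[set][way] is None while that entry is invalid.
        self.tags = [[None] * ways for _ in range(num_sets)]

    def _set_and_tag(self, addr: int):
        line_addr = addr // self.line_size
        return line_addr % self.num_sets, line_addr // self.num_sets

    def lookup(self, addr: int):
        """Return (set_index, way) on a hit, or (set_index, None) on a miss."""
        set_index, tag = self._set_and_tag(addr)
        for way, stored in enumerate(self.tags[set_index]):
            if stored == tag:
                return set_index, way
        return set_index, None

    def fill(self, addr: int, way: int):
        """Install addr's tag in the given way (the way is chosen by a replacement policy)."""
        set_index, tag = self._set_and_tag(addr)
        self.tags[set_index][way] = tag


cache = SetAssociativeCache(num_sets=2, ways=4, line_size=32)
# 0x000, 0x040 and 0x080 all map to set 0, yet can coexist in different Ways.
for way, addr in enumerate((0x000, 0x040, 0x080)):
    cache.fill(addr, way)
print(cache.lookup(0x080), cache.lookup(0x0C0), cache.lookup(0x020))
```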

  13. Replacement Policies
      • When an entry needs to be evicted from the cache, we need to decide which Way it is evicted from
      • To do this we use a replacement policy:
        • LRU
        • Clock
        • FIFO
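FIFO is the only policy listed here without its own slide, so a minimal sketch for a single set may help. It evicts the Way that was filled earliest and, unlike LRU, ignores hits. The `FifoSet` name is hypothetical.

```python
from collections import deque


class FifoSet:
    """FIFO replacement for one set: evict the Way that was filled earliest."""

    def __init__(self, ways):
        self.tags = [None] * ways
        self.order = deque()  # Ways in fill order, oldest on the left

    def access(self, tag):
        """Return (hit, way)."""
        if tag in self.tags:
            return True, self.tags.index(tag)  # hits do not change FIFO order
        if None in self.tags:
            way = self.tags.index(None)        # use an empty Way first
        else:
            way = self.order.popleft()         # evict the oldest fill
        self.tags[way] = tag
        self.order.append(way)
        return False, way


s = FifoSet(2)
outcomes = [s.access(t) for t in ("A", "B", "A", "C")]
print(outcomes)
```

The final access evicts `A` even though it was just used, which is exactly the case LRU handles differently.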

  14. LRU
      • Keep track of when each entry is accessed
      • Always evict the Least Recently Used entry
      • Implemented using a stack
      (Figure: LRU stack ordered from MRU to LRU, after "Access 4" and "Access 2")
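A minimal sketch of the stack idea for one set: an accessed Way moves to the top (MRU), and the victim is whatever sits at the bottom (LRU). It loosely follows the slide's "Access 4, Access 2" example, assuming 8 Ways; the class name is hypothetical.

```python
class LruSet:
    """LRU for one set, kept as a stack of Way numbers: MRU first, LRU last."""

    def __init__(self, ways):
        self.stack = list(range(ways))  # initial order is arbitrary

    def touch(self, way):
        """Move an accessed Way to the MRU position."""
        self.stack.remove(way)
        self.stack.insert(0, way)

    def victim(self):
        """The Way at the bottom of the stack is the least recently used."""
        return self.stack[-1]


lru = LruSet(8)
lru.touch(4)  # Access 4
lru.touch(2)  # Access 2
print(lru.stack, lru.victim())
```

A hardware version needs ordering state that grows with the number of Ways, which is one reason cheaper approximations such as Clock exist.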

  15. Clock
      • For each Way we store a Reference (R) bit
      • We also store a pointer to the oldest entry (the Hand)
      • Starting at the Hand, we test and clear each R bit until we reach one that is 0; that entry is evicted
      (Figure: example R bits for Ways 0 to 3 as the Hand advances)
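A sketch of Clock for one set, following the steps above. Setting R on every access and moving the Hand one past the victim are common choices, assumed here rather than taken from the PolyBlaze design; the class name is hypothetical.

```python
class ClockSet:
    """Clock (second-chance) replacement for one set."""

    def __init__(self, ways):
        self.ref = [0] * ways  # one Reference (R) bit per Way
        self.hand = 0          # points at the oldest entry

    def touch(self, way):
        """Set the R bit whenever a Way is accessed."""
        self.ref[way] = 1

    def victim(self):
        """Test and clear R bits from the Hand onward until one is 0; evict that Way."""
        while self.ref[self.hand] == 1:
            self.ref[self.hand] = 0
            self.hand = (self.hand + 1) % len(self.ref)
        way = self.hand
        self.hand = (self.hand + 1) % len(self.ref)  # assumption: Hand moves past the victim
        return way


clock = ClockSet(4)
clock.ref = [1, 0, 1, 0]
print(clock.victim(), clock.ref, clock.hand)
```

The loop always terminates: after at most one full sweep every R bit has been cleared.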

  16. System Overview

  17. PolyBlaze L2 Cache
      • 1- to 16-way set-associative cache
      • LRU or Clock replacement policy
      • 32- or 64-byte line width
      • 64-bit memory interface
      • Write-back cache
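A write-back cache marks a line dirty on a write and only updates main memory when that line is evicted. The toy model below shows the effect at line granularity (one value per line). Write-allocate on a write miss and the direct mapped organization are assumptions for brevity, not details from the slides; all names are hypothetical.

```python
class WriteBackCache:
    """Toy direct mapped write-back cache over a dict: line address -> value."""

    def __init__(self, num_lines, memory):
        self.num_lines = num_lines
        self.memory = memory
        self.lines = {}      # index -> [tag, value, dirty]
        self.mem_writes = 0  # counts write-backs to main memory

    def _fill(self, line_addr):
        index, tag = line_addr % self.num_lines, line_addr // self.num_lines
        entry = self.lines.get(index)
        if entry is not None and entry[0] == tag:
            return entry  # hit
        if entry is not None and entry[2]:
            # Evicting a dirty line: only now is it written back to memory.
            old_addr = entry[0] * self.num_lines + index
            self.memory[old_addr] = entry[1]
            self.mem_writes += 1
        entry = [tag, self.memory.get(line_addr, 0), False]
        self.lines[index] = entry
        return entry

    def read(self, line_addr):
        return self._fill(line_addr)[1]

    def write(self, line_addr, value):
        entry = self._fill(line_addr)  # assumption: write-allocate
        entry[1] = value
        entry[2] = True                # mark dirty instead of writing memory


mem = {}
cache = WriteBackCache(4, mem)
for v in range(10):
    cache.write(0, v)                  # ten writes, no memory traffic yet
writes_before = cache.mem_writes
value = cache.read(4)                  # same index, new tag: evicts dirty line 0
print(writes_before, cache.mem_writes, mem)
```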

  18. L2 Cache

  19. Reuse Policy
      • Determines which Way is evicted on a cache miss
      • Currently uses the LRU policy

  20. Tag Bank
      • Contains tags and valid bits
      • Stored on the FPGA using BRAMs
      • One bank is instantiated for each Way

  21. Control Unit
      • Finite state machine that controls L2 cache pipelining
      • While a request to NPI is outstanding, we can service other requests from the SRAM
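The benefit of servicing SRAM requests while an NPI request is outstanding can be illustrated with a rough cycle-count model. This is not the actual control-unit FSM. It assumes hits take 1 cycle, issuing a miss takes 1 cycle, at most one miss is outstanding, and memory latency is a fixed number of cycles. The function name and parameters are hypothetical.

```python
def finish_cycle(accesses, mem_latency, hit_under_miss):
    """Cycle at which the last access completes.

    accesses: sequence of booleans, True = hit in SRAM, False = miss sent to NPI.
    """
    cycle = 0      # next cycle at which the controller can accept an access
    miss_done = 0  # cycle when the outstanding miss returns
    finish = 0
    for is_hit in accesses:
        if is_hit:
            if not hit_under_miss:
                cycle = max(cycle, miss_done)  # blocking: wait for the miss first
            cycle += 1
            finish = max(finish, cycle)
        else:
            cycle = max(cycle, miss_done)      # only one miss outstanding
            miss_done = cycle + mem_latency
            cycle += 1                         # issuing the miss takes a cycle
            finish = max(finish, miss_done)
    return finish


trace = [False, True, True, True]  # one miss followed by three hits
print(finish_cycle(trace, 10, hit_under_miss=False),
      finish_cycle(trace, 10, hit_under_miss=True))
```

In this model the three hits finish behind the miss in the blocking case, but overlap with it when hits are serviced under the miss.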

  22. Data Bank
      • Control interface for the off-chip SRAM

  23. SRAM
      • 32-bit ZBT synchronous SRAM
      • 1 MB
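Combining the configuration options from slide 17 with this SRAM gives the cache geometry. The sketch assumes the full 1 MB holds cache data, 32-bit addresses, and power-of-two Way counts only; the function name is hypothetical. It also counts the 64-bit memory-interface transfers and 32-bit SRAM words needed per line.

```python
def l2_geometry(ways, line_bytes, capacity=1 << 20, addr_bits=32):
    """Return (sets, index_bits, tag_bits, mem_beats_per_line, sram_words_per_line)."""
    if line_bytes not in (32, 64):
        raise ValueError("line width must be 32 or 64 bytes")
    if not 1 <= ways <= 16 or ways & (ways - 1):
        raise ValueError("this sketch only handles power-of-two Way counts 1..16")
    sets = capacity // (line_bytes * ways)
    offset_bits = line_bytes.bit_length() - 1
    index_bits = sets.bit_length() - 1
    tag_bits = addr_bits - index_bits - offset_bits
    mem_beats = line_bytes // 8   # 64-bit memory interface
    sram_words = line_bytes // 4  # 32-bit SRAM
    return sets, index_bits, tag_bits, mem_beats, sram_words


for ways in (1, 4, 16):
    for line_bytes in (32, 64):
        print(ways, line_bytes, l2_geometry(ways, line_bytes))
```

More Ways or longer lines mean fewer sets, so the index shrinks and each tag bank entry must hold a wider tag.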

  24. Methodology
      • Break the L2 cache into three parts and test each separately, then combine them and test the system:
        • SRAM Controller
        • NPI Interface
        • L2 Core
        • Complete L2 Cache

  25. SRAM Controller
      • Create a wrapper that connects the SRAM controller to the MicroBlaze over an FSL
      • Write a program that writes and reads data at every address in the SRAM:
        • Write all 1's √
        • Write all 0's √
        • Alternate writing all 1's and all 0's √
        • Write random data √
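The four test patterns can be sketched as a host-side model with pluggable `write`/`read` callbacks. On the real system these accesses go from the MicroBlaze over the FSL. Interpreting "alternate" as alternating by address is an assumption, as are the function names.

```python
import random

ONES, ZEROS = 0xFFFFFFFF, 0x00000000  # 32-bit SRAM word


def sram_patterns(num_words, seed=0):
    rng = random.Random(seed)
    yield "all 1's", [ONES] * num_words
    yield "all 0's", [ZEROS] * num_words
    # Assumption: "alternate" means alternating 1's and 0's by address.
    yield "alternating", [ONES if a % 2 == 0 else ZEROS for a in range(num_words)]
    yield "random", [rng.getrandbits(32) for _ in range(num_words)]


def run_sram_test(write, read, num_words):
    """Write each pattern to every address, read it all back, and list mismatches."""
    failures = []
    for name, values in sram_patterns(num_words):
        for addr, value in enumerate(values):
            write(addr, value)
        for addr, value in enumerate(values):  # read back after the full write pass
            if read(addr) != value:
                failures.append((name, addr))
    return failures


# Model of a working SRAM and one with bit 3 stuck at 0.
good_sram = [0] * 256
good = run_sram_test(good_sram.__setitem__, good_sram.__getitem__, 256)

bad_sram = [0] * 256
def faulty_write(addr, value):
    bad_sram[addr] = value & ~(1 << 3)
bad = run_sram_test(faulty_write, bad_sram.__getitem__, 256)
print(len(good), len(bad))
```

The all-0's pattern cannot expose a stuck-at-0 bit, which is why several complementary patterns are used.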

  26. NPI Interface
      • Uses a custom FSL width, so it cannot be tested using the MicroBlaze
      • Create a hardware test bench that reads and writes data at every address:
        • Write all 1's X
        • Write all 0's X
        • Alternate writing all 1's and all 0's X
        • Write random data X

  27. L2 Core
      • Simulate the core of the L2 cache in iSim X
      • Write a test bench that approximates the responses from the L1/L2 Arbiter, SRAM Controller, and NPI Interface X
      • The test bench will write to each line multiple times to create a large number of cache misses X
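One way to generate a miss-heavy stimulus is to pick addresses that share a set index but differ in tag: adding `num_sets * line_bytes` to an address keeps its index and increments its tag. Cycling through more tags than there are Ways then misses on every access under LRU. The parameters below (64 sets, 4 Ways, 32-byte lines) and the function names are hypothetical.

```python
from collections import OrderedDict


def conflict_addresses(num_sets, line_bytes, target_set, count):
    """Addresses that all map to target_set but carry different tags."""
    stride = num_sets * line_bytes  # keeps the index, bumps the tag
    base = target_set * line_bytes
    return [base + i * stride for i in range(count)]


def simulate_lru(addresses, num_sets, ways, line_bytes):
    """Count misses in a set-associative LRU cache (tags only)."""
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in addresses:
        line = addr // line_bytes
        s, tag = sets[line % num_sets], line // num_sets
        if tag in s:
            s.move_to_end(tag)         # hit: becomes MRU
        else:
            misses += 1
            if len(s) == ways:
                s.popitem(last=False)  # evict the LRU (oldest) entry
            s[tag] = True
    return misses


five = conflict_addresses(num_sets=64, line_bytes=32, target_set=5, count=5)
four = conflict_addresses(num_sets=64, line_bytes=32, target_set=5, count=4)
print(simulate_lru(five * 10, 64, 4, 32), simulate_lru(four * 10, 64, 4, 32))
```

Five tags in a 4-way set miss on all 50 accesses, while four tags miss only on the first pass, so the tag count controls how hard the eviction path is exercised.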

  28. Complete L2 Cache
      • Combine the L2 cache with the rest of PolyBlaze X
      • Write test programs that read and write various regions of memory X

  29. Current Progress
      • SRAM Controller and Data Bank:
        • Designed and tested
      • NPI Interface:
        • Testing and debugging in progress
      • L2 Core:
        • Testing and debugging in progress

  30. Future Work
      • Add the Clock replacement policy to the L2 cache
      • Add a write-back buffer to the L2 cache
      • Migrate the system from the XUPV5 to a BEE3 so we can build a system with more cores
      • Modify the L2 cache for a NUMA system
      • Add custom hardware accelerators to PolyBlaze

  31. Questions?
