
Out of Order SuperScalar

Out of Order SuperScalar. Ankit Sethia, Daya Shanker, Gaurav Chadha, Kuldeep Singh. Basic Design: Out of Order (T3), 2-way SuperScalar, Number of RS: 16, Number of ROB: 64 (tested for 8 as well), PRF entries: 64, ALUs: 2, Multipliers: 2. SystemVerilog was used for the design process.


Presentation Transcript


  1. Out of Order SuperScalar • Ankit Sethia, Daya Shanker, Gaurav Chadha, Kuldeep Singh

  2. Basic Design • Out of Order (T3) • 2-way superscalar • RS (reservation station) entries: 16 • ROB (reorder buffer) entries: 64 (also tested with 8) • PRF (physical register file) entries: 64 • ALUs: 2 • Multipliers: 2 • SystemVerilog was used for the design process. • It was helpful in designing. • We had just 5 synthesis runs.
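
The ROB listed above can be pictured as a circular buffer that allocates in program order and retires in order. The sketch below is a minimal Python behavioral model under that assumption, not the team's SystemVerilog; the method names `dispatch`, `complete` and `retire` and the retire width of 2 (chosen to match the 2-way design) are hypothetical.

```python
class ReorderBuffer:
    """Behavioral model of a circular reorder buffer (ROB).

    Entries are allocated at the tail in program order and retired from the
    head only once they have completed, so retirement stays in order even
    though execution can complete out of order.
    """

    def __init__(self, size=64):
        self.size = size
        self.entries = [None] * size  # each entry: {"tag": ..., "done": bool}
        self.head = 0   # oldest in-flight instruction
        self.tail = 0   # next free slot
        self.count = 0

    def full(self):
        return self.count == self.size

    def empty(self):
        return self.count == 0

    def dispatch(self, tag):
        """Allocate an entry; return its index, or None if full (dispatch stalls)."""
        if self.full():
            return None
        idx = self.tail
        self.entries[idx] = {"tag": tag, "done": False}
        self.tail = (self.tail + 1) % self.size
        self.count += 1
        return idx

    def complete(self, idx):
        """Mark an entry as executed; this may happen in any order."""
        self.entries[idx]["done"] = True

    def retire(self, width=2):
        """Retire up to `width` completed instructions from the head, in order."""
        retired = []
        while len(retired) < width and not self.empty() and self.entries[self.head]["done"]:
            retired.append(self.entries[self.head]["tag"])
            self.entries[self.head] = None
            self.head = (self.head + 1) % self.size
            self.count -= 1
        return retired
```

Passing `size=8` models the smaller configuration the slide says was also tested: a younger instruction that completes first simply waits until every older entry ahead of it has completed.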

  3. Advanced Features • 2-way superscalar • Instruction prefetcher • Stride prefetcher • RAS (return address stack) • Load-store queue (4 loads, 4 stores) • BTB (branch target buffer), local branch predictor • Non-blocking D-Cache. Attempted Features: • Unconditional branch resolution in the IF stage.

  4. LSQ • Out-of-order load launch • Loads launch only after their dependences on preceding stores are resolved. • Data is forwarded from the store queue to the load structure. • The load structure is not a queue. • An auxiliary load queue holds outstanding loads.
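
The load-launch and forwarding rules above can be sketched as a single decision per load. This is a minimal Python model assuming that every store carries a program-order age, that an unknown store address blocks younger loads, and that addresses match only exactly (no partial overlaps); `StoreEntry` and `resolve_load` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class StoreEntry:
    age: int               # program-order sequence number (smaller = older)
    addr: Optional[int]    # None until the store's address has been computed
    data: Optional[int]    # None until the store's data is available


def resolve_load(load_age: int, load_addr: int,
                 store_queue: List[StoreEntry]) -> Tuple[str, Optional[int]]:
    """Decide what a load may do: 'wait', 'forward' (with data) or 'cache'."""
    older = [s for s in store_queue if s.age < load_age]
    # Dependency resolution: any older store with an unknown address might alias.
    if any(s.addr is None for s in older):
        return ("wait", None)
    matches = [s for s in older if s.addr == load_addr]
    if matches:
        # Forward from the youngest older store to the same address.
        youngest = max(matches, key=lambda s: s.age)
        if youngest.data is None:
            return ("wait", None)
        return ("forward", youngest.data)
    # No older store to this address: the load can go to the D-Cache.
    return ("cache", None)
```

Because the decision depends only on older stores, loads can be checked and launched in any order, which is one reason a non-FIFO load structure works here.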

  5. DCache • Handles hit under miss and miss under miss. • Supports up to 16 outstanding load requests. • Internal priority: eviction first, then the current request, then outstanding misses. • Has the highest priority among requests to memory.
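
The D-Cache's internal priority order and its 16-request limit can be modeled as a fixed-priority selector plus a bounded table of outstanding misses. This Python sketch assumes that memory responses carry a tag identifying the request; `select_dcache_action`, `MissTracker` and the tag scheme are hypothetical.

```python
from typing import Optional


def select_dcache_action(eviction_pending: bool, current_request: bool,
                         pending_miss: bool) -> Optional[str]:
    """Fixed priority: eviction > current request > outstanding misses."""
    if eviction_pending:
        return "eviction"
    if current_request:
        return "current"
    if pending_miss:
        return "outstanding_miss"
    return None


class MissTracker:
    """Bounded table of outstanding misses for a non-blocking cache."""

    def __init__(self, capacity=16):
        self.capacity = capacity
        self.outstanding = {}  # memory response tag -> missing load address

    def can_accept(self):
        return len(self.outstanding) < self.capacity

    def allocate(self, mem_tag, addr):
        """Record a new miss; return False (caller must stall) when the table is full."""
        if not self.can_accept():
            return False
        self.outstanding[mem_tag] = addr
        return True

    def fill(self, mem_tag):
        """Memory returned data for `mem_tag`; free its slot and return the address."""
        return self.outstanding.pop(mem_tag)
```

Hit under miss falls out of this structure: a current request that hits is served while entries remain in the miss table, and miss under miss simply allocates another entry until all 16 are in use.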

  6. Features Contd. • Heavy instruction prefetching • 60 at the max • Varied a lot • BTB/Branch Predictor • 2-bit local branch predictor
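
A 2-bit predictor is usually a table of saturating counters. The Python sketch below assumes the table is indexed by PC bits (4-byte instructions, hence `pc >> 2`) and has 64 entries; both the indexing and the size are hypothetical, since the slide gives neither.

```python
class TwoBitPredictor:
    """Table of 2-bit saturating counters indexed by branch PC.

    Counter values 0 and 1 predict not taken; 2 and 3 predict taken.
    """

    def __init__(self, entries=64):
        self.entries = entries
        self.counters = [1] * entries  # start weakly not taken

    def index(self, pc):
        # Drop the byte offset of a 4-byte instruction, then take low bits.
        return (pc >> 2) % self.entries

    def predict(self, pc):
        return self.counters[self.index(pc)] >= 2

    def update(self, pc, taken):
        """Move the counter one step toward the resolved outcome, saturating."""
        i = self.index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
```

The second bit adds hysteresis: a strongly taken loop branch needs two consecutive not-taken outcomes before the prediction flips.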

  7. Features Contd. • Unconditional branch resolution in the IF stage • Calculates the next PC for br/bsr in the IF stage • RAS • Many difficulties in implementing the RAS
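
Resolving br/bsr in the IF stage means computing the target directly from the fetched instruction, while a RAS supplies predicted return addresses. This Python sketch assumes an Alpha-style branch encoding (21-bit signed word displacement, target = PC + 4 + 4 × disp) and a hypothetical RAS depth of 8; `fetch_next_pc` and its `kind` values are hypothetical names, and instruction decoding is omitted.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def branch_target(pc, disp21):
    """Assumed Alpha-style PC-relative target: PC + 4 + 4 * sext(disp)."""
    return pc + 4 + 4 * sign_extend(disp21, 21)


class ReturnAddressStack:
    def __init__(self, depth=8):
        self.depth = depth
        self.stack = []

    def push(self, addr):
        if len(self.stack) == self.depth:
            self.stack.pop(0)  # overflow: drop the oldest return address
        self.stack.append(addr)

    def pop(self):
        return self.stack.pop() if self.stack else None  # empty: no prediction


def fetch_next_pc(pc, kind, disp21, ras):
    """Next PC chosen in IF. `kind` is one of 'other', 'br', 'bsr', 'ret'."""
    if kind == "br":
        return branch_target(pc, disp21)
    if kind == "bsr":
        ras.push(pc + 4)  # return address of the subroutine call
        return branch_target(pc, disp21)
    if kind == "ret":
        predicted = ras.pop()
        if predicted is not None:
            return predicted
    return pc + 4
```

This sketch leaves out restoring the RAS after a branch squash, when speculative pushes and pops must be undone; that recovery is a common source of trouble in RAS designs.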

  8. Stride Prefetcher • A data prefetching mechanism that prefetches data for stride-based access patterns. • Can handle up to four loads. • Keeps a table of four non-stride loads that may be present. • Has the 3rd-highest priority among requests to memory.
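
A common way to detect stride-based access is a small table keyed by load PC that stores the last address and last stride, issuing a prefetch once the same nonzero stride is seen twice in a row. This Python sketch assumes that scheme with four entries and LRU replacement; it does not model the separate table of non-stride loads, and `StridePrefetcher.access` is a hypothetical name.

```python
class StridePrefetcher:
    """PC-indexed stride detector with a fixed number of entries (LRU replacement)."""

    def __init__(self, entries=4):
        self.entries = entries
        self.table = {}  # load PC -> (last address, last stride)
        self.lru = []    # load PCs, least recently used first

    def access(self, pc, addr):
        """Observe one load; return an address to prefetch, or None."""
        prefetch = None
        if pc in self.table:
            last_addr, last_stride = self.table[pc]
            stride = addr - last_addr
            # Prefetch only when the same nonzero stride repeats.
            if stride != 0 and stride == last_stride:
                prefetch = addr + stride
            self.table[pc] = (addr, stride)
            self.lru.remove(pc)
        else:
            if len(self.table) == self.entries:
                victim = self.lru.pop(0)  # evict the least recently used load
                del self.table[victim]
            self.table[pc] = (addr, 0)
        self.lru.append(pc)
        return prefetch
```

Requiring two matching strides before prefetching trades one extra access of latency for fewer useless requests, which matters when the prefetcher competes for memory behind higher-priority requesters.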

  9. Results • Final clock period after synthesis: 6.7 ns • All 33 benchmarks passed in simulation and synthesis • CPI varies from 0.59 to 5.00
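
The reported clock period and CPI range can be converted into frequency and time per instruction using only the numbers on the slide. A CPI below 1 means more than one instruction retired per cycle on average, which only a superscalar pipeline such as this 2-way design can achieve.

```latex
f_{\text{clk}} = \frac{1}{T_{\text{clk}}} = \frac{1}{6.7\ \text{ns}} \approx 149\ \text{MHz}

t_{\text{instr}} = \text{CPI} \times T_{\text{clk}}:\quad
0.59 \times 6.7\ \text{ns} \approx 3.95\ \text{ns}, \qquad
5.00 \times 6.7\ \text{ns} = 33.5\ \text{ns}

\text{IPC}_{\text{best}} = \frac{1}{\text{CPI}_{\text{best}}} = \frac{1}{0.59} \approx 1.69
```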

  10. Results

  11. Interesting Bugs • In the I-cache, the input address doesn't change but the data has changed, so fetching stops. • Eviction during branch squash (reason: same reset). • A speculative load with an invalid address receives a continuous NACK from the cache controller.

  12. Suggestions • SystemVerilog was really helpful • always_comb (no inferred latches), always_ff • No need to worry about wire vs. reg; use the logic type. • Structures, multidimensional arrays, literal assignment. • Queues are used a lot (ROB, LSQ, D-Cache, prefetcher), so a robust queue module could be provided beforehand. • We faced problems with bottom-up synthesis; this could be made into a tutorial section.

  13. Questions? Comments?
