
Memory Consistency in Vector IRAM

David Martin


  1. Memory Consistency in Vector IRAM, David Martin

  2. The Memory Consistency Model
     • The consistency model applies to instructions in a single instruction stream (different from multiprocessor consistency!).
     • Legend: a = after, S = scalar, V = vector, VP = virtual processor, R = read, W = write, * = no sync required, + = sync required
     • Definition of an “XaY” sync:
        • All operations of type Y occurring before the sync in program order appear to execute before any operation of type X occurring after the sync in program order.
     • Definition of an “XaY” sync to vector register $vri:
        • The most recent operation of type Y to $vri appears to execute before any operation of type X occurring after the sync in program order.
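The “XaY” definition can be read as a property you can check against one execution. The Python sketch below is a hypothetical model, not part of the VIRAM tools: each operation carries a type letter ("S" or "V"), its position in program order (`seq`), and an assumed cycle (`done`) at which it appears to execute. `sync_respected` tests whether a sync at program position `sync_seq` was honored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    kind: str  # operation type, e.g. "S" (scalar) or "V" (vector)
    seq: int   # position in program order
    done: int  # hypothetical cycle at which the op appears to execute


def sync_respected(ops, sync_seq, x, y):
    """Check the "XaY" rule for a sync at program position sync_seq.

    Every type-y op before the sync (in program order) must appear to
    execute before every type-x op after the sync.
    """
    before = [o.done for o in ops if o.kind == y and o.seq < sync_seq]
    after = [o.done for o in ops if o.kind == x and o.seq > sync_seq]
    if not before or not after:
        return True  # nothing for the sync to order
    return max(before) < min(after)


# A scalar store (seq 0) followed by a VaS sync (seq 1) and a vector load (seq 2).
trace = [Op("S", 0, 5), Op("V", 2, 3)]
print(sync_respected(trace, 1, "V", "S"))  # False: the vector op appears first
```

The register form of the definition would narrow `before` to the single most recent type-Y operation on `$vri`; that variant is left out to keep the sketch short.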

  3. Why Relax Memory Consistency?
     • The natural microarchitecture has multiple paths to memory.
     • We want to decouple the scalar and vector units without complex hardware.
     (Figure: fetch unit, scalar core, sync, vector unit, and memory)
     • There is a trade-off between more complex hardware (speculation, disambiguation, cache coherence) and more complex software (sync instructions).
     • We should explore solutions to this trade-off that involve more hardware, e.g., hardware guarantees SaV and VaS ordering but leaves VaV and VP orderings to software.

  4. Software Conventions for Syncs
     • Vector code is responsible for not messing things up.
        • This allows us to vectorize libraries to speed up existing programs.
        • We don't want to assume that our compiler will compile and globally optimize all non-vector code that we run.
     • Alternative model: pass around flags to communicate sync requirements or history.
        • Must assume that our compiler compiles all code run on IRAM.
        • Not sure we want to accept that restriction.
     • Vector function conventions:
        1. Execute VaS and VaV syncs on entry to vector code.
        2. Execute SaV sync on exit from vector code.
     (Figure: scalar code -> VaS, VaV -> vector code -> SaV -> scalar code)
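The two vector function conventions can be pictured as a wrapper around every vector routine. This is a hedged Python sketch: `sync`, `vector_function`, `vadd`, and the `trace` list are hypothetical stand-ins that record which instructions would be issued, not a real VIRAM API.

```python
import functools

trace = []  # hypothetical instruction trace, for illustration only


def sync(kind):
    """Stand-in for issuing a sync instruction such as "VaS"."""
    trace.append(kind)


def vector_function(fn):
    """Wrap a vector routine with the entry/exit syncs of the convention."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        sync("VaS")  # earlier scalar memory ops before our vector ops
        sync("VaV")  # earlier vector memory ops before our vector ops
        try:
            return fn(*args, **kwargs)
        finally:
            sync("SaV")  # our vector memory ops before later scalar ops
    return wrapper


@vector_function
def vadd(a, b):
    trace.append("vadd")  # placeholder for the vector loads, add, and store
    return [x + y for x, y in zip(a, b)]


print(vadd([1, 2], [3, 4]))  # [4, 6]
print(trace)                 # ['VaS', 'VaV', 'vadd', 'SaV']
```

The caller needs no knowledge of the vector unit, which is the point of the convention; the price is that every call pays for all three syncs.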

  5. Sync Implementations and Costs
     • SaV: stall the fetch unit until the vector unit has committed all vector memory instructions.
        • Could take 1000s of cycles with many indexed vector memory operations in flight!
        • Very difficult to delay its issue, since it is often issued at the end of a vector routine.
     • VaS: stall the fetch unit until the scalar unit has committed all scalar memory instructions.
        • Not too expensive (10s of cycles?) because the scalar unit is ahead of the vector unit, the scalar core is simple, and the data cache is write-through.
        • Easy to delay its issue, because it is often issued at the start of a vector routine.
     • VaV and VPaVP: no operation.
        • Nop because we have one vector memory unit and no vector caches.
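These stall rules amount to a simple cost model. In this Python sketch `sync_stall` is a hypothetical helper and the pending commit cycles are made-up inputs; it only restates the slide: SaV waits for in-flight vector memory instructions, VaS for in-flight scalar ones, and VaV/VPaVP cost nothing on this implementation.

```python
def sync_stall(sync, now, scalar_pending, vector_pending):
    """Cycles the fetch unit stalls for a sync issued at cycle `now`.

    scalar_pending / vector_pending: hypothetical commit cycles of the
    scalar / vector memory instructions still in flight.
    """
    if sync == "SaV":
        pending = vector_pending  # wait until vector memory ops commit
    elif sync == "VaS":
        pending = scalar_pending  # wait until scalar memory ops commit
    elif sync in ("VaV", "VPaVP"):
        return 0  # nop: one vector memory unit, no vector caches
    else:
        raise ValueError(f"unknown sync {sync!r}")
    # Ops that have already committed contribute no stall.
    return max([0] + [t - now for t in pending])


print(sync_stall("SaV", 100, [], [1100, 2300]))  # 2200
print(sync_stall("VaS", 100, [112], []))         # 12
```

The asymmetry explains the conventions: a VaS at routine entry usually finds little pending, whereas an SaV at routine exit may wait for a long queue of indexed vector operations.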

  6. Current Sync Analysis Tool
     • Executes a program and tells you:
        1. Whenever two memory references are not:
           • ordered by architectural guarantees,
           • ordered by register dependencies, or
           • ordered by an intervening sync instruction.
        2. Whenever a sync instruction is not used to resolve any hazard, as described in (1).
     • Caveats:
        • Hazards are detected from a single program execution: the information may not hold true for all possible executions of the program.
        • Hazard detection is conservative in the presence of synchronization chains.
     Two examples of synchronization chains:
        Example 1:
           Write(A) <- r1
           RAW SYNC
           Read(A) <- r2
           WAR SYNC
           Write(A) <- r3
           Hazard?
        Example 2:
           Write(A) <- r1
           RAW SYNC
           Read(A) <- r2
           Write(A) <- r2
           Hazard?
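The tool's two reports can be approximated over a recorded trace. This Python sketch is a simplified, hypothetical model, not the actual tool. Syncs are typed by unit only ("S" or "V"), ignoring the R/W and VP refinements. It assumes the scalar core orders its own memory operations (`ARCH_ORDERED`). It treats a pair as a hazard only when both references touch the same address and at least one is a write. Like the real tool, it is conservative: it follows only direct register dependencies and single intervening syncs, never chains.

```python
from dataclasses import dataclass


@dataclass
class Mem:
    unit: str                        # "S" or "V"
    rw: str                          # "R" or "W"
    addr: int
    reads: frozenset = frozenset()   # registers read by the instruction
    writes: frozenset = frozenset()  # registers written by the instruction


@dataclass
class Sync:
    x: str  # later type; Sync("V", "S") is a VaS sync
    y: str  # earlier type


ARCH_ORDERED = {("S", "S")}  # assumption: scalar memory ops stay in order


def analyze(trace):
    """Return (hazards, unused_syncs) as lists of trace indices."""
    hazards, used = [], set()
    for j, b in enumerate(trace):
        if not isinstance(b, Mem):
            continue
        for i in range(j):
            a = trace[i]
            if not isinstance(a, Mem) or a.addr != b.addr:
                continue
            if "W" not in a.rw + b.rw:
                continue  # two reads never conflict
            if (a.unit, b.unit) in ARCH_ORDERED:
                continue  # ordered by architectural guarantee
            if a.writes & b.reads:
                continue  # ordered by a direct register dependency
            between = [k for k in range(i + 1, j)
                       if isinstance(trace[k], Sync)
                       and trace[k].y == a.unit and trace[k].x == b.unit]
            if between:
                used.update(between)  # these syncs resolve a hazard
            else:
                hazards.append((i, j))
    unused = [k for k, s in enumerate(trace)
              if isinstance(s, Sync) and k not in used]
    return hazards, unused


# A chain: vector write, SaV, scalar read, VaS, vector write to the same address.
chain = [Mem("V", "W", 0x40), Sync("S", "V"),
         Mem("S", "R", 0x40, writes=frozenset({"r2"})),
         Sync("V", "S"), Mem("V", "W", 0x40)]
print(analyze(chain))  # ([(0, 4)], [])
```

The chain example is actually ordered (write 0 precedes read 2 by the SaV, and read 2 precedes write 4 by the VaS), yet the pair (0, 4) is still reported. This mirrors the caveat about synchronization chains.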

  7. Optimizing Code
     • Basic problem:
        • The vector unit requires setup: VL, VPW, mask, exceptions.
        • Vector code is responsible for issuing syncs.
        • Both are required in a vector routine if nothing is known about the calling context!
     • All solutions share the notion of giving the compiler control of the calling context. Two options:
        (1) Pass around flags so that syncs and setup code can be avoided at run time.
        (2) Do global optimizations so that syncs and setup code can be eliminated at compile time.
     (Figure: the call sequence below repeats for every vector routine call)
        ...
        Scalar code
        Vector setup
        VaS and VaV sync
        Vector function
        SaV sync
        Scalar code
        Vector setup
        VaS and VaV sync
        Vector function
        SaV sync
        Scalar code
        ...
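Option (1) can be sketched with a small state record that callers and vector routines share. This Python sketch is hypothetical throughout: `CallerState`, its flag names, `vector_add`, and `scalar_store` are illustrative inventions, and the scheme only works if every routine, scalar code included, maintains the flags. That requirement is exactly the restriction slide 4 hesitates to accept. The routine skips setup when VL already matches, skips VaS when no scalar memory operation has occurred since the last one, and defers SaV until the next scalar memory access.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CallerState:
    """Hypothetical flags passed between routines (option 1)."""
    vl: Optional[int] = None           # vector length currently configured
    scalar_mem_since_vas: bool = True  # scalar memory ops since the last VaS?
    vector_mem_pending: bool = False   # vector memory ops not yet fenced by SaV?


def vector_add(state, a, b, trace):
    n = len(a)
    if state.vl != n:  # setup only when VL actually changes
        trace.append(f"setup VL={n}")
        state.vl = n
    if state.scalar_mem_since_vas:  # VaS only if scalar memory ops intervened
        trace.append("VaS")
        state.scalar_mem_since_vas = False
    trace.append("VaV")  # a nop on VIRAM-1, so always issued here
    trace.append("vadd")
    state.vector_mem_pending = True  # defer SaV to the next scalar access
    return [x + y for x, y in zip(a, b)]


def scalar_store(state, trace):
    if state.vector_mem_pending:  # issue the deferred SaV only when needed
        trace.append("SaV")
        state.vector_mem_pending = False
    trace.append("sw")
    state.scalar_mem_since_vas = True


state, t = CallerState(), []
r = vector_add(state, [1, 2], [3, 4], t)
r = vector_add(state, r, [1, 1], t)
scalar_store(state, t)
print(t)  # ['setup VL=2', 'VaS', 'VaV', 'vadd', 'VaV', 'vadd', 'SaV', 'sw']
```

Deferring SaV rather than dropping it keeps the program correct while moving the most expensive sync to the point where scalar code actually needs it.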

  8. Optimization Example
     • Demonstrates the potential benefit of optimizing scalar-vector communication.
     • The code computes A+B+C+D+E+F with five vector additions (figure: operands A, B, C, D, E, F combined by five "+" nodes).
     • Unoptimized code calls a general vector add routine 5 times.
     • The first optimization inlines the 5 routines and removes the vector initialization sequences.
     • The second optimization also removes unnecessary sync instructions.
     • The optimization goal is to avoid the “sawtooth” in instantaneous performance graphs caused by draining the vector pipelines between vector loops.
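The second optimization can be sketched as a compile-time pass over the inlined code. The slides do not say which rules the compiler applied, so the two rules here are hypothetical, derived only from the “XaY” definition. First, an XaY sync is redundant if an identical sync occurs earlier with no type-Y memory op in between. Second, it is redundant if an identical sync occurs later with no type-X memory op in between. The pass removes one sync at a time and rescans, so each removal is justified against the current code. The instruction strings ("V:ld", "S:st", and so on) are an invented notation.

```python
def unit_of(instr):
    """Memory-op unit ("S" or "V"), or None for syncs and non-memory ops."""
    return instr.split(":")[0] if ":" in instr else None


def is_sync(instr):
    return instr in ("SaV", "VaS", "VaV")


def redundant(code, k):
    """Is the XaY sync at code[k] subsumed by an identical sync?"""
    x, y = code[k][0], code[k][2]
    for i in range(k - 1, -1, -1):  # rule 1: look backward
        if code[i] == code[k]:
            return True
        if unit_of(code[i]) == y:
            break
    for i in range(k + 1, len(code)):  # rule 2: look forward
        if code[i] == code[k]:
            return True
        if unit_of(code[i]) == x:
            break
    return False


def eliminate_syncs(code):
    code = list(code)
    changed = True
    while changed:
        changed = False
        for k, instr in enumerate(code):
            if is_sync(instr) and redundant(code, k):
                del code[k]  # remove one sync, then rescan
                changed = True
                break
    return code


# Five inlined vector adds (setup already removed) between scalar code.
routine = ["VaS", "VaV", "V:ld", "V:ld", "vadd", "V:st", "SaV"]
unoptimized = ["S:st"] + routine * 5 + ["S:ld"]
optimized = eliminate_syncs(unoptimized)
print(sum(map(is_sync, unoptimized)), sum(map(is_sync, optimized)))  # 15 7
```

Only the first VaS and the last SaV survive, while all five VaV syncs remain because each orders a vector store before the next routine's vector loads. This matches the slides' emphasis that SaV syncs are the important ones to remove, while VaV syncs cost nothing on VIRAM-1.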

  9. Large optimization potential for short vector loops.
     • SaV syncs are the most important to eliminate or delay.
     • The performance impact of VaS syncs is unclear.
     • VaV syncs are virtually free in VIRAM-1.
     • Setup code is expensive. For this example, it is as expensive as the SaV syncs.
