
Trace Caches

Michele Co

CS 451

Motivation
  • High performance superscalar processors
    • High instruction throughput
    • Exploit ILP
      • Wider dispatch and issue paths
    • Execution units designed for high parallelism
      • Many functional units
      • Large issue buffers
      • Many physical registers
  • Fetch bandwidth becomes performance bottleneck
Fetch Performance Limiters
  • Cache hit rate
  • Branch prediction accuracy
  • Branch throughput
    • Need to predict more than one branch per cycle
  • Non-contiguous instruction alignment
  • Fetch unit latency
Problems with Traditional Instruction Cache
  • Contain instructions in compiled (static) order
    • Works well for sequential code with little branching, or code with large basic blocks
    • Taken branches make the dynamic instruction stream non-contiguous, so part of each fetched line is wasted
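The fetch-bandwidth problem above can be sketched as a tiny simulation (the fetch width and offsets are illustrative assumptions, not from the slides):

```python
# Sketch (hypothetical parameters): a traditional instruction cache
# delivers one contiguous block per cycle, so a taken branch in the
# middle of the block wastes the rest of that fetch.

FETCH_WIDTH = 16  # instructions per fetch block (assumed)

def useful_instructions(block_start, taken_branch_offsets):
    """Count instructions delivered before the first taken branch."""
    for offset in sorted(taken_branch_offsets):
        if block_start <= offset < block_start + FETCH_WIDTH:
            # Fetch stops after the taken branch; the rest of the
            # block is discarded.
            return offset - block_start + 1
    return FETCH_WIDTH

# Straight-line code uses the full fetch width...
print(useful_instructions(0, []))    # 16
# ...but a taken branch at offset 3 limits the fetch to 4 instructions.
print(useful_instructions(0, [3]))   # 4
```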
Suggested Solutions
  • Multiple branch target address prediction
    • Branch address cache (1993, Yeh, Marr, Patt)
      • Provides quick access to multiple target addresses
      • Disadvantages
        • Complex alignment network, additional latency
Suggested Solutions (cont’d)
  • Collapsing buffer
    • Multiple accesses to the BTB (1995, Conte, Mills, Menezes, Patel)
      • Allows fetching non-adjacent cache lines
      • Disadvantages
        • Bank conflicts
        • Poor scalability for interblock branches
        • Significant logic added before and after instruction cache
  • Fill unit (1988, Melvin, Shebanow, Patt)
    • Caches RISC-like instructions derived from the CISC instruction stream
Problems with Prior Approaches
  • Need to generate pointers for all noncontiguous instruction blocks BEFORE fetching can begin
    • Extra stages, additional latency
    • Complex alignment network necessary
  • Multiple simultaneous accesses to the instruction cache
    • Multiporting is expensive
  • Sequencing
    • Additional stages, additional latency
Potential Solution – Trace Cache
  • Rotenberg, Bennett, Smith (1996)
  • Advantages
    • Caches dynamic instruction sequences
      • Fetches past multiple branches
    • No additional fetch unit latency
  • Disadvantages
    • Redundant instruction storage
      • Between trace cache and instruction cache
      • Within trace cache
Trace Cache Details
  • Trace
    • Sequence of instructions potentially containing branches and their targets
    • Terminate on branches with indeterminate number of targets
      • Returns, indirect jumps, traps
  • Trace identifier
    • Start address + branch outcomes
  • Trace cache line
    • Valid bit
    • Tag
    • Branch flags
    • Branch mask
    • Trace fall-through address
    • Trace target address
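The structure above can be sketched in code. This is a minimal model following the slide's field list; the exact widths, naming, and lookup policy are assumptions for illustration:

```python
# Sketch (field layout assumed from the slide): a trace is identified
# by its start address plus the outcomes of the branches it contains,
# and each line stores enough metadata to choose the next fetch address.

from dataclasses import dataclass

@dataclass
class TraceCacheLine:
    valid: bool
    tag: int               # derived from the trace start address
    branch_flags: tuple    # outcome of each branch embedded in the trace
    branch_mask: int       # number of branches in the trace
    instructions: list
    fall_through: int      # next PC if the final branch is not taken
    target: int            # next PC if the final branch is taken

class TraceCache:
    def __init__(self):
        self.lines = {}

    def trace_id(self, start_pc, outcomes):
        # Trace identifier: start address + branch outcomes.
        return (start_pc, tuple(outcomes))

    def insert(self, start_pc, outcomes, line):
        self.lines[self.trace_id(start_pc, outcomes)] = line

    def lookup(self, start_pc, predicted_outcomes):
        """Hit only if a valid stored trace matches the predicted path."""
        line = self.lines.get(self.trace_id(start_pc, predicted_outcomes))
        return line if line is not None and line.valid else None
```

A lookup misses whenever the predicted branch outcomes differ from the stored trace's outcomes, which is why the same start address can legitimately appear in multiple lines.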
Next Trace Prediction (NTP)
  • History register
  • Correlating table
    • Complex history indexing
  • Secondary Table
    • Indexed by most recently committed trace ID
  • Index generating function
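The predictor's components above can be sketched as follows. The hashing in the index-generating function is a stand-in (the real design is more elaborate), and table sizes are assumed:

```python
# Sketch (hash and sizes assumed): the NTP keeps a history of recent
# trace IDs, folds them into an index for a correlating table, and
# falls back to a secondary table indexed by the most recently
# committed trace ID alone.

class NextTracePredictor:
    def __init__(self, depth=4, table_bits=12):
        self.history = []        # most recent trace IDs, oldest first
        self.depth = depth
        self.mask = (1 << table_bits) - 1
        self.correlating = {}    # hashed history -> predicted trace ID
        self.secondary = {}      # last trace ID   -> predicted trace ID

    def _index(self):
        # Index-generating function: fold the whole history into one
        # table index (simple shift/xor hash, purely illustrative).
        idx = 0
        for tid in self.history:
            idx = ((idx << 3) ^ hash(tid)) & self.mask
        return idx

    def predict(self):
        pred = self.correlating.get(self._index())
        if pred is None and self.history:
            pred = self.secondary.get(self.history[-1])
        return pred

    def update(self, actual_trace_id):
        self.correlating[self._index()] = actual_trace_id
        if self.history:
            self.secondary[self.history[-1]] = actual_trace_id
        self.history = (self.history + [actual_trace_id])[-self.depth:]
```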
Trace Cache Optimizations
  • Performance
    • Partial matching [Friendly, Patel, Patt (1997)]
    • Inactive issue [Friendly, Patel, Patt (1997)]
    • Trace preconstruction [Jacobson, Smith (2000)]
  • Power
    • Sequential access trace cache [Hu et al. (2002)]
    • Dynamic direction prediction based trace cache [Hu et al. (2003)]
    • Micro-operation cache [Solomon et al. (2003)]
Trace Processors
  • Trace Processor Architecture
    • Processing elements (PE)
      • Trace-sized instruction buffer
      • Multiple dedicated functional units
      • Local register file
      • Copy of global register file
    • Use hierarchy to distribute execution resources
  • Addresses superscalar processor issues
    • Complexity
      • Simplified multiple branch prediction (next trace prediction)
      • Elimination of local dependence checking (local register file)
      • Decentralized instruction issue and result bypass logic
    • Architectural limitations
      • Reduced bandwidth pressure on global register file (local register files)
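The PE structure above can be sketched as a data model. The shapes are assumed from the slide's bullets; real trace processors add much more (issue logic, bypass networks):

```python
# Sketch (structure assumed from the slide): each processing element
# holds one trace, its own functional units' buffer, a local register
# file for intra-trace values, and a copy of the global register file;
# only inter-trace values touch the global file.

from dataclasses import dataclass, field

@dataclass
class ProcessingElement:
    trace_buffer: list = field(default_factory=list)  # trace-sized
    local_regs: dict = field(default_factory=dict)    # intra-trace values
    global_copy: dict = field(default_factory=dict)   # snapshot of globals

class TraceProcessor:
    def __init__(self, num_pes, global_regs=None):
        self.pes = [ProcessingElement() for _ in range(num_pes)]
        self.global_regs = global_regs or {}
        self.next_pe = 0

    def dispatch(self, trace):
        """Assign a whole trace to one PE (round-robin for simplicity)."""
        pe = self.pes[self.next_pe]
        pe.trace_buffer = list(trace)
        pe.global_copy = dict(self.global_regs)  # copy of global registers
        self.next_pe = (self.next_pe + 1) % len(self.pes)
        return pe
```

Because each PE resolves local dependences against its own register file, the global file only sees inter-trace traffic, which is the bandwidth reduction the slide refers to.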
Trace Cache Variations
  • Block-based trace cache (BBTC)
    • Black, Rychlik, Shen (1999)
    • Less storage capacity needed
BBTC Optimization
  • Completion time multiple branch prediction (Rakvic et al., 2000)
    • Improvement over trace table predictions
Trace Cache Variations (cont’d)
  • Software trace cache
    • Ramirez, Larriba-Pey, Navarro, Torrellas (1999)
    • Profile-directed code reordering to maximize sequentiality
      • Convert taken branches to not-taken
      • Move unused basic blocks out of execution path
      • Inline frequent basic blocks
      • Map most popular traces to reserved area of i-cache
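The reordering steps above can be sketched as a greedy layout pass over profile data (the greedy chaining heuristic and the edge-count input format are illustrative assumptions):

```python
# Sketch (heuristic assumed): profile-directed reordering chains each
# basic block with its most frequently executed successor, so hot paths
# become sequential and their branches become fall-throughs.

def layout_traces(successors, entry):
    """successors: {block: [(next_block, exec_count), ...]}"""
    placed, order, worklist = set(), [], [entry]
    while worklist:
        block = worklist.pop(0)
        if block in placed:
            continue
        # Grow one trace: keep following the hottest unplaced successor.
        while block is not None and block not in placed:
            placed.add(block)
            order.append(block)
            succs = sorted(successors.get(block, []), key=lambda e: -e[1])
            # Cold successors start new traces later.
            worklist.extend(s for s, _ in succs)
            block = next((s for s, _ in succs if s not in placed), None)
    return order

# A->B is hot (90) and A->C cold (10): the layout places A, B, D
# contiguously and moves the rarely used block C out of the hot path.
cfg = {"A": [("B", 90), ("C", 10)], "B": [("D", 100)],
       "C": [("D", 10)], "D": []}
print(layout_traces(cfg, "A"))  # ['A', 'B', 'D', 'C']
```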