Hiding Synchronization Delays in a GALS Processor Microarchitecture

Greg Semeraro

David H. Albonesi

Grigorios Magklis

Michael L. Scott

Steven G. Dropsho

Sandhya Dwarkadas


Why GALS?

  • Simplified clock distribution network

  • Reduced clock power dissipation

  • Allows modular design of the processor

  • Can run each domain at optimal frequency

  • Can use conventional design and testing methods

  • Fine-grained DVS/DFS

ASYNC 2004 - University of Rochester


But there is a cost…

  • Inter-domain synchronization can hurt performance

  • Synchronization circuit costs in area and power

  • We have to be careful how we divide the processor



The MCD Microprocessor

[Figure: block diagram of the MCD processor. Domain labels: Frontend, Integer, Floating Pt, Memory. Block labels: L1 instr. cache, fetch, branch predict, rename, dispatch, IFQ, ROB, IIQ, int. register file, int. FUs, FIQ, fp. register file, fp. FUs, LSQ, L1 data cache, L2 unified cache, CPU, Main Memory.]



Inter-domain Synchronization

  • Queue design based on Chelcea and Nowick (WVLSI ’00)

    • Modified for Issue Queue configuration

  • Synchronization circuit based on Nyström and Martin (WCED ’02)

    • Converted to single-rail logic

  • Timing analysis based on Sjogren and Myers (ARVLSI ’97)

    • Skip a cycle rather than pause the clock



Synchronization via Queues

[Figure: two queue structures, labeled "FIFO Queue" and "Issue Queue".]



Timing Analysis

[Figure: timing diagram of CLK1 and CLK2 with clock edges numbered 1 to 4 and an interval labeled T.]

  • Source runs with CLK1, destination with CLK2

  • Source writes at edge 1

  • If T > Ts then the data can be used at edge 2

  • If T < Ts then the data can be used at edge 3

  • 25% < Ts < 35%
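
The edge-selection rule above can be sketched in Python. This is a minimal illustration, not the authors' circuit or simulator. It assumes that T is the interval from the source write (edge 1) to the next destination edge (edge 2), that Ts is given as an absolute time in picoseconds, and that the boundary case T == Ts counts as too close. The function name `usable_edge_index` and the example numbers are hypothetical.

```python
from bisect import bisect_right


def usable_edge_index(write_time_ps, dest_edges_ps, ts_ps):
    """Return the index of the destination edge at which the data can be used.

    write_time_ps : time of the source write (edge 1 on CLK1)
    dest_edges_ps : sorted rising-edge times of the destination clock (CLK2)
    ts_ps         : synchronization time Ts (assumed to be an absolute time)
    """
    # First destination edge strictly after the write: "edge 2" on the slide.
    i = bisect_right(dest_edges_ps, write_time_ps)
    if i >= len(dest_edges_ps):
        raise ValueError("no destination edge after the write")

    t = dest_edges_ps[i] - write_time_ps  # the interval T
    if t > ts_ps:
        return i  # T > Ts: data usable at edge 2

    # T <= Ts (tie handling is an assumption): wait for "edge 3".
    if i + 1 >= len(dest_edges_ps):
        raise ValueError("no following destination edge")
    return i + 1


if __name__ == "__main__":
    edges = [400, 1400, 2400]  # hypothetical CLK2 edges, in ps
    print(usable_edge_index(0, edges, 300))    # T = 400 > 300, prints 0
    print(usable_edge_index(200, edges, 300))  # T = 200 < 300, prints 1
```

Skipping to the following edge, rather than stretching the destination clock, matches the "skip a cycle rather than pause the clock" choice listed on the Inter-domain Synchronization slide.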



Simulation Methodology

  • Two processor pipelines

    • Alpha 21264

    • StrongARM SA-1110

  • Synchronization penalty was measured against an identical synchronous design

  • 30 benchmarks

    • MediaBench, Olden, SPEC 2000



Simulation Methodology

  • SimpleScalar + Wattch + MCD

  • Independent clock for each domain

    • Independent jitter for each domain

    • Next edge based on period, last edge, jitter

  • When source and destination clock edges are too close, a one-cycle penalty is assessed
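
The clock model described above can be sketched as follows. This is a rough illustration under stated assumptions, not the SimpleScalar/Wattch/MCD code: the class `DomainClock`, the function `count_penalties`, the Gaussian jitter distribution, and all numeric values are hypothetical. Each domain keeps its own clock, and each next edge is computed from the last edge, the period, and a per-domain jitter sample. A transfer is charged one cycle when the next destination edge falls within a synchronization window of the source edge.

```python
import random


class DomainClock:
    """Independent clock for one domain (hypothetical model)."""

    def __init__(self, period_ps, jitter_sigma_ps, seed, start_ps=0.0):
        self.period = period_ps
        self.sigma = jitter_sigma_ps
        self.rng = random.Random(seed)  # independent jitter source per domain
        self.last_edge = start_ps

    def next_edge(self):
        # Next edge = last edge + period + jitter. Gaussian jitter is an
        # assumption; sigma is assumed small relative to the period.
        jitter = self.rng.gauss(0.0, self.sigma)
        self.last_edge = self.last_edge + self.period + jitter
        return self.last_edge


def count_penalties(src, dst, window_ps, n_transfers):
    """Count transfers that incur a one-cycle synchronization penalty."""
    penalties = 0
    dst_edge = dst.next_edge()
    for _ in range(n_transfers):
        write = src.next_edge()          # source writes on its own edge
        while dst_edge <= write:         # find the next destination edge
            dst_edge = dst.next_edge()
        if dst_edge - write < window_ps:  # edges too close: skip a cycle
            penalties += 1
    return penalties


if __name__ == "__main__":
    src = DomainClock(period_ps=1000, jitter_sigma_ps=10, seed=1)
    dst = DomainClock(period_ps=1300, jitter_sigma_ps=10, seed=2)
    n = 10_000
    hits = count_penalties(src, dst, window_ps=300, n_transfers=n)
    print(f"{hits / n:.1%} of transfers penalized")
```

With jitter set to zero and equal periods, the result depends only on the phase offset between the two clocks, which makes the model easy to sanity-check before adding jitter.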



Synchronization Analysis

  • OoO and superscalar capabilities removed from Alpha



Synchronization Analysis

  • OoO and superscalar capabilities added to StrongARM



What we have learned

  • Synchronization penalty doesn’t mean performance loss

    • Out-of-order execution allows useful work to be performed when instructions are delayed

    • Superscalar design means that synchronization penalties can be “shared” across multiple instructions

    • For Alpha 95% of penalty hidden

    • For StrongARM++ 63% of penalty hidden

  • We have to be careful

    • Cannot have too many domains

    • Careful where you split!



Conclusions

  • GALS is a good idea for real processors

    • small IPC loss

    • clock network simplification

    • reduction in power dissipation

    • higher frequency

    • independent domain tuning
