Clocking and Timing in Fault-Tolerant Systems-on-Chip

Andreas Steininger. Outline: The Clock as a Blessing; The Clock as a Curse; Alternative Synchronization Schemes (GALS, fully asynchronous, the DARTS approach); Conclusion. Contributors to this work: the DARTS project team.


Presentation Transcript


  1. Clocking and Timing in Fault-Tolerant Systems-on-Chip Andreas Steininger

  2. Outline The Clock as a Blessing The Clock as a Curse Alternative Synchronization Schemes • GALS • fully asynchronous • the DARTS approach Conclusion

  3. Contributors to this Work The DARTS project team. TU Vienna: Gottfried Fuchs, Matthias Fuegger, Ulrich Schmid, Thomas Handl. RUAG Space: Gerald Kempf, Manfred Sust, Wolfgang Zangerl

  4. The Need for Fault Tolerance miniaturization is key to progress in VLSI => smaller structures => lower voltage swing => smaller critical charge => higher operating frequencies … result in higher susceptibility to faults (SET, EMI, …) => cannot avoid faults, need to tolerate them

  5. The Role of Time “The only reason for time is so that everything doesn’t happen at once”, Albert Einstein

  6. The Need for Clocking activities need to be co-ordinated • on system level (braking of wheels, …) • on algorithmic level (consensus, …) • on communication level • on logic level (state machine switching, …) co-ordination in the time domain (synchronization) is an efficient way to attain this => need a global notion of time (discrete „ticks“)

  7. The Quality of Synchronization [Figure: local time (number of ticks) plotted against real time, with precision π marked]

  8. Typical Precision Values • on system level: ms … ms • on algorithm level: ms … ms • on communication level: ns … ms • on logic level: ps … ns

  9. Synchronization Requirements • phase synchronisation (for „hardware clock“ on logic level) • clock synchronisation (for distributed time base on algorithmic level) 1 µs is excellent precision for a distributed clock; at 1 GHz this means 360,000° phase shift
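The phase-shift figure on this slide can be checked with one line of arithmetic. The sketch below assumes the precision in question is 1 µs (1000 ns), which is the value consistent with 360,000° at 1 GHz; the function name and the choice of ns/GHz units are illustrative, not from the slides.

```python
def phase_shift_degrees(offset_ns: float, clock_ghz: float) -> float:
    """Phase shift, in degrees, that a time offset represents at a given clock rate."""
    cycles = offset_ns * clock_ghz  # ns * GHz = number of clock periods in the offset
    return 360.0 * cycles


# A 1 µs (1000 ns) offset between two 1 GHz clocks spans 1000 full periods.
print(phase_shift_degrees(1000.0, 1.0))  # 360000.0
```

A precision that counts as excellent for a distributed time base is therefore about three orders of magnitude too coarse to serve as a phase-aligned hardware clock.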

  10. Globally Synchronous Design whole design is „isochronic“ („perfect“ precision) • time conveyed by clock transitions • perfect co-ordination of all activities very efficient design • can assume consistent states • high level of abstraction very efficient implementation: • single crystal oscillator • single control line (clock net)

  11. „Isochronic“ Regions? speed of light (in medium) = 2 × 10^8 m/s = 20 cm/ns [Figure: a 2 cm reference (Ref) compared at 1 GHz, 4 GHz and 8 GHz]
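To get a feel for how small an „isochronic“ region becomes, the sketch below computes how far a signal travels in one clock period at the slide's propagation speed of 20 cm/ns. The function name is illustrative, and the result is an optimistic upper bound because it considers propagation speed only.

```python
def distance_per_period_cm(clock_ghz: float, speed_cm_per_ns: float = 20.0) -> float:
    """Distance a signal covers during one clock period."""
    period_ns = 1.0 / clock_ghz          # period of a clock running at clock_ghz
    return speed_cm_per_ns * period_ns


for f_ghz in (1.0, 4.0, 8.0):
    print(f"{f_ghz} GHz: {distance_per_period_cm(f_ghz)} cm per period")
```

At 8 GHz a full period corresponds to only 2.5 cm, so a 2 cm die already spans most of a clock period.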

  12. The Variation Problem [Figure: the designer works with projected conditions, a system model, worst-case assumptions and safety margins; the user faces actual conditions (unknown) and the actual system (imperfections)] Timing completely fixed after design. No way to react to actual conditions & system („PVT variations“)

  13. Fault-Tolerant Architectures • Duplication & Comparison • Triple-Modular Redundancy [Figure: block diagrams with function units (FU), a comparator (=?) raising ERR, and a voter producing Y]
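The two architectures differ in what they can do with a disagreement, which a small behavioural sketch makes concrete. The function names and the (output, error) return convention are assumptions for illustration; real voters operate on hardware signals, not Python values.

```python
from collections import Counter


def compare(a, b):
    """Duplication & comparison: detects a mismatch but cannot tell which FU is wrong."""
    return a, a != b            # (output taken from the first FU, ERR flag)


def vote(a, b, c):
    """Triple-modular redundancy: majority vote masks a single faulty FU."""
    value, count = Counter([a, b, c]).most_common(1)[0]
    if count >= 2:
        return value, False     # at least two replicas agree
    return None, True           # no majority: more than one replica deviates


print(compare(4, 3))   # (4, True)  -> error detected, not corrected
print(vote(4, 4, 3))   # (4, False) -> single fault masked
```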

  14. Lock-Step Operation single clock • single point of failure • good replica determinism [Figure: three FUs handling values „3“ and „4“, feeding a voter with output Y]

  15. Lock-Step Operation independent clocks • single fault tolerant • bad replica determinism [Figure: three FUs handling values „3“ and „4“, feeding a voter with output Y]

  16. Fault-Tolerant HW-Clocking [Figure: three FUs, each paired with a block labelled v, feeding a voter with output Y]

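  17. Fault-Tolerant HW-Clocking [Figure: three FUs, each paired with a block labelled v, feeding a voter with output Y; blocks labelled D and question marks added]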

  18. The Charm of SoCs billions of transistors fit on one die => structuring into (IP) modules: „System-on-Chip“ BUT: large clock distribution networks => „isochronic“?? FT clocking does not work with large skew; may need individual clocks for function modules => clock-synchrony neither attainable nor desirable

  19. Co-ordination of Data Exchange When can SNK use its input? When it is valid and consistent. When can SRC apply the next input? When SNK has consumed the previous one. [Figure: SRC, f(x), SNK]

  20. The Synchronous Approach co-ordination based on (global) time [Figure: SRC, f(x), SNK]

  21. Alternative: Asynchronous Design co-ordination based on handshaking • REQ: „Data word valid, you can use it“ • ACK: „Data word consumed, send the next“ [Figure: SRC, f(x), SNK]
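The REQ/ACK exchange can be traced step by step in software. The sketch below assumes a four-phase (return-to-zero) protocol, which the slide does not specify, and runs SRC and SNK sequentially in one function, so it shows the ordering of events rather than real concurrency. All names are illustrative.

```python
def four_phase_transfer(words):
    """Move each word from SRC to SNK using a REQ/ACK handshake; return (received, trace)."""
    received, trace = [], []
    for word in words:
        data = word                    # SRC drives the data lines first ...
        trace.append(("REQ+", word))   # ... then raises REQ: "data word valid, you can use it"
        received.append(data)          # SNK captures the data only after seeing REQ
        trace.append(("ACK+", word))   # SNK raises ACK: "data word consumed, send the next"
        trace.append(("REQ-", word))   # SRC withdraws REQ once it has seen ACK
        trace.append(("ACK-", word))   # SNK withdraws ACK; the link is idle again
    return received, trace


received, trace = four_phase_transfer([0xFF14, 0x0001])
print(received)
print([event for event, _ in trace])
```

Each step waits for the previous one, so the transfer rate adapts to whichever side is slower instead of relying on a common time base.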

  22. Async. Design – Advantages closed-loop control makes timing much more robust and adaptive to PVT variations; no need for worst-case timing; local handshakes replace global clock; activity only when needed; beneficial for EMI; tends to stop operation in case of fault

  23. Async. Design – Disadvantages Need to handle race between REQ and data

  24. Async. Design – Disadvantages Need to handle race between REQ and data. REQ: „Data word valid, you can use it“ [Figure: SRC, f(x), SNK]

  25. Async. Design – Disadvantages Need to handle race between REQ and data. Solution 1: „Bundled Data“. REQ: „Data word valid, you can use it“ [Figure: SRC, f(x), SNK]

  26. Async. Design – Disadvantages Need to handle race between REQ and data. Solution 2: „Delay Insensitive“ (Coding). REQ: „Data word valid, you can use it“. Completion detection [Figure: SRC, f(x), SNK]
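With a delay-insensitive code the data word itself signals when it is valid, so no separate REQ can race against it. The sketch below uses dual-rail encoding as one common example of such a code; the slide does not name a specific code, and all function names are illustrative.

```python
SPACER = (0, 0)  # both rails low: no data on this bit yet


def encode_dual_rail(bits):
    """Each bit becomes a (true_rail, false_rail) pair with exactly one rail high."""
    return [(1, 0) if b else (0, 1) for b in bits]


def is_complete(rails):
    """Completion detection: the word is valid once every pair has exactly one rail high."""
    return all(t != f for t, f in rails)


def decode_dual_rail(rails):
    """Recover the bits of a complete word."""
    if not is_complete(rails):
        raise ValueError("word not yet complete")
    return [t for t, _ in rails]


word = encode_dual_rail([1, 0, 1])
arriving = [word[0], SPACER, word[2]]     # middle bit still in flight
print(is_complete(arriving), is_complete(word))
print(decode_dual_rail(word))
```

The price is visible in the encoding: two wires per bit plus the completion logic, which is the hardware overhead of this approach.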

  27. Async. Design – Disadvantages Need to handle race between REQ and data; significant HW overhead (coding, delay elements); „adaptive“ timing not as predictable; more difficult to design; classical fault-tolerance schemes not applicable; tends to stop operation in case of fault

  28. Best of Both Worlds GALS: Globally Asynchronous Locally Synchronous; retain efficiency of synchronous design wherever possible: „intra-module“; use asynchronous principle where clock distribution too cumbersome: „inter-module“. First mention in PhD thesis by Chapiro / Stanford 84

  29. A GALS Example DSP 2.7 GHz, CPU 2 GHz, PCI-IF 533 MHz, USB-IF 24 MHz

  30. Communication in GALS Shared Memory: producer writes to memory, consumer reads from there; pro: control flow stays independent • shared single-port memory • true dual-port memory Direct Messages (Data words): move data word from producer’s output register to consumer’s input register • non-buffered / buffered (FIFO-queues) • clock fixed, data-driven or pausible

  31. Shared Memory decoupling of clock domains by memory acting as a third party => high area overhead => unusual; for single-port memory arbitration required • arbitration problem (unbounded delay …) • one side may block the other at the arbiter; for multiport memory problems are confined to access to the same cell • busy flag may become metastable • blocking still possible for one specific address

  32. Shared Memory perfect decoupling of data path; potential metastability problems at arbitration logic; potential blocking through arbitration [Figure: DSP (2.7 GHz) and CPU (2 GHz) accessing address 0xff14 in a shared memory through arbitration]

  33. Direct Messages clock domain boundary is between producer’s output register and consumer’s input register; in general a synchronizer is needed at consumer’s input • definitely for conventional (fixed) clock • can be avoided by data-driven / pausible clocking; control flows of producer and consumer are strongly coupled: not maintaining the input/output register blocks other party; buffers/queues/FIFOs can • mitigate, but not avoid this problem (full/empty) • compensate variations in the data rate on both sides, but not different average data rates
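The last point, that a FIFO absorbs timing variation but not a mismatch in average rate, can be seen in a small step-based simulation. The sketch below is a simplified model with assumed names and patterns: time advances in discrete steps, at most one word is read and one written per step, and a stalled write is only counted, not retried.

```python
from collections import deque


def blocked_writes(produce, consume, depth):
    """Step a bounded FIFO and count the steps in which the producer found it full."""
    fifo = deque()
    blocked = 0
    for wants_write, wants_read in zip(produce, consume):
        if wants_read and fifo:
            fifo.popleft()              # consumer takes one word if available
        if wants_write:
            if len(fifo) < depth:
                fifo.append("word")     # producer stores one word
            else:
                blocked += 1            # FIFO full: producer is stalled this step
    return blocked


periods = 10
consumer = [1, 0, 1, 0] * periods   # 2 reads per 4 steps
bursty = [1, 1, 0, 0] * periods     # 2 writes per 4 steps: same average, different timing
faster = [1, 1, 1, 0] * periods     # 3 writes per 4 steps: higher average rate

print(blocked_writes(bursty, consumer, depth=2))   # burstiness absorbed by the buffer
print(blocked_writes(faster, consumer, depth=4))   # buffer fills up, producer stalls
```

Making the FIFO deeper only delays the first stall in the second case; it does not remove it.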

  34. Direct Messages data moving over clock domain boundary; metastability problems => need to insert handshake … with synchronizers and (optional) buffers [Figure: DSP (2.7 GHz) passing data word 0xff14 to CPU (2 GHz) through synchronizers (S)]

  35. Arbiter: Principle purpose: ○ manage concurrent requests to shared resource method: ○ handle pairs of request_in / grant_out ○ requests may arrive in any order ○ arbiter must activate only one grant_out at a time (respond to the first requester) Mutual Exclusion (MUTEX) problem: ○ resolve concurrent requests => metastability problem
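The request/grant rule can be written down as a small behavioural model. The class below is a sketch with assumed names: it handles two requesters and processes calls one after another, so it never sees truly simultaneous requests, which is exactly the case that causes metastability in hardware.

```python
class Arbiter:
    """Behavioural two-input arbiter: at most one grant is active at any time."""

    def __init__(self):
        self.requests = [False, False]
        self.granted = None               # index of the requester holding the grant

    def request(self, i):
        self.requests[i] = True
        if self.granted is None:          # resource free: the first requester wins
            self.granted = i

    def release(self, i):
        self.requests[i] = False
        if self.granted == i:
            other = 1 - i
            # hand the grant over if the other side is still waiting
            self.granted = other if self.requests[other] else None

    def grants(self):
        return [self.granted == 0, self.granted == 1]


arb = Arbiter()
arb.request(1)
arb.request(0)          # arrives second, has to wait
print(arb.grants())     # [False, True]
arb.release(1)
print(arb.grants())     # [True, False]
```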

  36. Arbiter: Circuit MUTEX-element: SR-latch; „Metastability filter“: e.g., hi-threshold inverter [Figure: circuit with requests R1, R2, internal nodes G1’, G2’ and grants G1, G2, plus a plot of Vout,FF over t with levels Vmeta and Vth,inv] [from D. J. Kinniment, „Synchronization and Arbitration in Digital Systems“, Wiley]

  37. Arbiter: Operation [Figure: arbiter circuit (R1, R2, G1’, G2’, G1, G2) with waveforms of R1, R2, G1 and G2]

  38. Muller C-Element IF a = b THEN y = a ELSE hold y [Figure: C-element symbol with inputs a, b and output y; realisation with an RS latch using set and reset]
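The rule on this slide translates directly into a few lines of code. The class below is a behavioural sketch only; the name `CElement` and the reset value of 0 are assumptions.

```python
class CElement:
    """Muller C-element: output follows the inputs when they agree, otherwise holds."""

    def __init__(self, initial=0):
        self.y = initial

    def update(self, a, b):
        if a == b:        # IF a = b THEN y = a
            self.y = a
        return self.y     # ELSE hold y


c = CElement()
for a, b in [(1, 0), (1, 1), (0, 1), (0, 0)]:
    print(a, b, "->", c.update(a, b))
```

The output rises only after both inputs have risen and falls only after both have fallen, which is why the element is used to join handshake signals.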

  39. Muller C-Element: Circuit [Alain Martin, Caltech]

  40. Data-Driven Clocking Principle: ○ as soon as new data arrive => start clocking ○ determine number k of clock cycles required to process new data ○ stop clocking after k cycles, wait for next data Properties: ○ need to switch clock on and off => beware spurious clock pulses! ○ no metastability problem: data stable as soon as consumer clock starts ○ potential for power saving ○ useful for specific applications only (no pipe!)

  41. Data-Driven Clock: Circuit / 1 • CLK half period determined by D [Figure: circuit with delay element D and output CLK out]

  42. Data-Driven Clock: Circuit / 2 • transition on REQ answered by transition on CLK out • min CLK half period determined by D [Figure: Muller C-element (C) with REQ and ACK, delay element D and output CLK out]

  43. Pausible Clocking Principle: ○ producer requests consumer’s clock to pause ○ data provided to input register during idle time ○ consumer’s clock may resume - free running („pausible clock“) - with one cycle only („stoppable clock“) Properties: ○ need to switch clock on and off => beware spurious clock pulses! => beware of clock tree delays! ○ producer controls consumer’s clock (blocking!) ○ applications must cope with paused clock

  44. Pausible Clock: Circuit / 1 • inverter generates next REQ from ACK • self-oscillation [Figure: Muller C-element (C) with REQ and ACK, delay element D and output CLK out]

  45. Pausible Clock: Circuit / 2 • external unit can safely stop CLK by activating REQ’ • … and gets ACK’ as a response [Figure: Muller C-element (C), arbiter (Arb), delay element D, signals REQ’, ACK’ and CLK out]

  46. Pausible Clock: Circuit / 3 • for more external sources arbiters can be added and “anded” before the Muller C-Element • the two inverters can be eliminated by using a Muller C-Element with inverting output [Figure: arbiters (Arb) for REQ1 … REQn and ACK1 … ACKn, Muller C-element (C), delay element D and CLK out]

  47. Advantages of GALS synchronous islands can be designed efficiently; modules operate independently; can use module-specific clock & timing; clocking is no single point of failure

  48. Problems with GALS operation of modules not (inherently) co-ordinated: synchrony for communication but not on system / algorithm level; communication has to cross clock boundaries; potential for metastability => performance penalty through synchronizers OR => module must handle irregular clocking

  49. The DARTS Idea Distributed Algorithms for Robust Tick Synchronization [Figure: phase synchronisation, tick synchronisation, clock synchronisation]

  50. The DARTS Approach Concept: Multiple synchronized tick generators Method: Distributed algorithm for fault-tolerant tick generation implemented in (asynchronous) digital logic Advantages • No crystal oscillator(s) • No critical clock tree • Clock is no single point of failure! • Reasonable synchrony
