On the Threat of Metastability in an Asynchronous Fault-Tolerant Clock Generation Scheme

Gottfried Fuchs, Matthias Függer and Andreas Steininger. Vienna University of Technology, Embedded Computing Systems Group. {fuchs, fuegger, steininger}@ecs.tuwien.ac.at

Presentation Transcript


  1. On the Threat of Metastability in an Asynchronous Fault-Tolerant Clock Generation Scheme Vienna University of Technology Embedded Computing Systems Group {fuchs, fuegger, steininger}@ecs.tuwien.ac.at Gottfried Fuchs, Matthias Függer and Andreas Steininger

  2. Outline
     • Asynchronous fault-tolerant algorithm
     • Investigate its susceptibility to metastability
     • In this context: study Sutherland’s micropipeline

  3. Clocking in SoCs
     GALS:
     • (+) no single point of failure
     • (-) no common time across chip
     DARTS:
     • (+) no single point of failure
     • (+) common time across chip (< small # of ticks)
     Synchronous SoC:
     • (-) single point of failure (Seifert et al.)
     • (+) common time across chip (< 1 tick)

  4. SoC with Common Time
     [Figure: tick timelines of nodes p and q, each in its local clock domain; example values π(t) = 2 and #ticks(Δ) = 3.]
     • precision: at any t, π(t) is bounded
     • accuracy: l(Δ) < #ticks in any Δ < u(Δ)
     Common time eases solving other problems (replica determinism, …).
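The two properties on this slide can be made concrete with a small sketch. It assumes that a node's clock value at time t is the number of ticks it has generated so far and that π(t) is the difference between the two clock values; the tick timestamps and the helper names (`clock_value`, `precision`, `ticks_in_interval`) are hypothetical and chosen only to reproduce the example values π(t) = 2 and #ticks(Δ) = 3.

```python
from bisect import bisect_right


def clock_value(tick_times, t):
    """Number of ticks generated at or before time t (tick_times must be sorted)."""
    return bisect_right(tick_times, t)


def precision(p_ticks, q_ticks, t):
    """pi(t): difference between the clock values of two nodes at time t."""
    return abs(clock_value(p_ticks, t) - clock_value(q_ticks, t))


def ticks_in_interval(tick_times, start, end):
    """#ticks(delta): ticks generated in the half-open interval (start, end]."""
    return bisect_right(tick_times, end) - bisect_right(tick_times, start)


# Hypothetical tick timestamps (arbitrary time units); q lags behind p.
p_ticks = [1.0, 2.0, 3.0, 4.0, 5.0]
q_ticks = [1.5, 2.5, 4.2, 5.5, 6.5]

print(precision(p_ticks, q_ticks, t=4.1))
print(ticks_in_interval(p_ticks, start=1.5, end=4.5))
```

Precision is checked pointwise in t, whereas accuracy bounds the tick count over every interval Δ, which is why the two helpers take different arguments.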

  5. DARTS Hardware Implementation
     Common time property proved in [EDCC06, PODC09].
     • Initially:
       • send tick(0) to all; clock := 0;
     • If received tick(m) from at least f+1 remote nodes and m > clock:
       • send tick(clock+1),…, tick(m) to all; clock := m;
     • If received tick(m) from at least 2f+1 remote nodes and m >= clock:
       • send tick(m+1) to all; clock := m+1;
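The three rules above can be exercised in a small message-passing simulation. This is only an illustrative software sketch, not the DARTS hardware: the node count n = 5, f = 1, the uniform link delays, the `max_tick` cutoff that ends the run, and all function names are assumptions, and the run is fault-free (no Byzantine node is modeled).

```python
import heapq
import random
from collections import defaultdict


def simulate(n=5, f=1, max_tick=10, seed=0):
    """Fault-free run of the three tick rules; returns (final clocks, max skew seen)."""
    rng = random.Random(seed)
    clock = [0] * n
    # received[p][m]: set of remote nodes from which p has received tick(m)
    received = [defaultdict(set) for _ in range(n)]
    events = []  # heap of (arrival_time, seq, receiver, sender, m)
    seq = 0
    max_skew = 0

    def broadcast(sender, m, now):
        nonlocal seq
        for q in range(n):
            if q != sender:
                delay = rng.uniform(1.0, 2.0)  # hypothetical link delay
                heapq.heappush(events, (now + delay, seq, q, sender, m))
                seq += 1

    for p in range(n):  # Initially: send tick(0) to all; clock := 0
        broadcast(p, 0, 0.0)

    while events:
        now, _, p, sender, m = heapq.heappop(events)
        received[p][m].add(sender)
        fired = True
        while fired:  # re-evaluate the rules until none applies
            fired = False
            for k, senders in sorted(received[p].items()):
                if len(senders) >= f + 1 and k > clock[p]:
                    # Relay rule: catch up by sending tick(clock+1) .. tick(k)
                    for j in range(clock[p] + 1, k + 1):
                        broadcast(p, j, now)
                    clock[p] = k
                    fired = True
                elif len(senders) >= 2 * f + 1 and k >= clock[p] and k < max_tick:
                    # Increment rule: send tick(k+1); the k < max_tick test
                    # is only the simulation cutoff, not part of the algorithm
                    broadcast(p, k + 1, now)
                    clock[p] = k + 1
                    fired = True
                if fired:
                    break
        max_skew = max(max_skew, max(clock) - min(clock))
    return clock, max_skew


final_clocks, skew = simulate()
print(final_clocks, skew)
```

Each rule firing strictly increases the local clock, so the inner loop terminates; in a fault-free run every node sends every tick up to `max_tick`, so all clocks end at the cutoff value.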

  6. DARTS Hardware Implementation
     Common time property proved in [EDCC06, PODC09].
     But: the proofs cover digital behavior only.
     What about metastability (during non-normal operation)?

  7. Potential for metastability (1)
     • TG-Alg has
       • (a) stable state
       • (b) fault → non-closed (unrestricted) environment
       • (no stability condition as in QDI)
     • → there exists a malicious input pulse.
     → Make sure metastability does not propagate across the ECR boundary.

  8. Existence of metastability barrier? (Sutherland)

  9. Does a micropipeline “synchronize”?
     [Figure: waveforms in(t), out(t) and malicious out(t), with times tE1 and tE2 marked.]
     Critical pulse window size (2 stages) = tE2 - tE1

  10. Does a micropipeline “synchronize”?
     [Figure: waveforms in(t), out(t) and malicious out(t).]
     Critical pulse window size (4 stages)

  11. Metastability decay in a C-Element (1)
     Latch:
     • Model
     • Decay towards LO/HI
     • MTBU formula
     C-Element:
     • Model
     • Do equivalent formulas exist?
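The latch formulas themselves did not survive in this transcript. As a reminder of what such formulas typically look like, the sketch below uses the common textbook small-signal latch model, in which a deviation v0 from the metastable point grows as v(t) = v0·exp(t/τ), together with the usual mean-time-between-upsets estimate exp(t_res/τ) / (T0·f_clk·f_data). These forms, the function names and all parameter values are assumptions, not taken from the slides.

```python
import math


def resolution_time(v0, v_logic, tau):
    """Time until an initial deviation v0 from the metastable point grows to
    v_logic, assuming exponential growth v(t) = v0 * exp(t / tau)."""
    return tau * math.log(v_logic / abs(v0))


def mtbu(t_res, tau, t0, f_clk, f_data):
    """Textbook mean time between upsets for a latch-based synchronizer:
    exp(t_res / tau) / (t0 * f_clk * f_data)."""
    return math.exp(t_res / tau) / (t0 * f_clk * f_data)


# Hypothetical parameters, for illustration only.
tau = 50e-12   # regeneration time constant [s]
t0 = 100e-12   # effective metastability window [s]
print(resolution_time(v0=1e-6, v_logic=0.9, tau=tau))
print(mtbu(t_res=2e-9, tau=tau, t0=t0, f_clk=100e6, f_data=10e6))
```

The exponential dependence on t_res/τ is what makes waiting an effective remedy in a latch; the slide asks whether a comparable relation holds for the C-element.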

  12. Metastability decay in a C-Element (2)
     a, b: inputs (b = armed); z: output; x: feedback
     [Figure: a(t) and f(a,b,x)(t), with x0 and tE marked.]
     For t > tE: consider the homogeneous solution, f(a,b,x)(t) = x(t)

  13. Metastability decay in a C-Element (2)
     Near metastability point: [formula not preserved]
     with assumption x0 = “midway” yields [formula not preserved]
     Remember the latch: → strong indication for synchronizing behavior

  14. Simulation Setup
     4-stage pipeline, MATLAB's stiff ODE solver
     Parameters: CMOS 180 nm, but G = 1.66 (numeric resolution)
     malicious out(t) → choose Tmaxcorr = 3 Tnom

  15. Simulation Results (1)
     Dependence on RC constants
     [Plot: critical window size over RC constants.]
     Approximately linear dependence only.

  16. Simulation Results (2)
     Dependence on #stages
     [Plot: critical window size over number of stages.]
     Critical window size scales by ~10^-1 per stage.
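The per-stage scaling can be read as a geometric law. The helper below extrapolates the window size from a reference pipeline depth under that reading; the function name, the reference window of 1 ps at 2 stages and the exact factor of 0.1 are hypothetical illustration values, not simulation results from the talk.

```python
def critical_window(stages, window_ref, stages_ref=2, factor_per_stage=0.1):
    """Extrapolate the critical window size from a reference pipeline depth,
    assuming it shrinks by factor_per_stage with every added stage."""
    return window_ref * factor_per_stage ** (stages - stages_ref)


# Hypothetical reference: a 1 ps critical window for a 2-stage pipeline.
for n in (2, 4, 6):
    print(n, critical_window(n, window_ref=1e-12))
```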

  17. Simulation Results (3)
     Dependence on G: ~10^-7 per unit of G
     In the case of DARTS, simulation indicates that the critical pulse window size is < 1 fs.

  18. Conclusions
     • Example of a fault-tolerant asynchronous algorithm: DARTS.
     • Identified micropipeline as metastability barrier.
     • Characterized its synchronizing behavior.
     Open research:
     • Refined C-Element models (yield results for larger G).
     • Extend analysis to incorporate masking effects and calculate metastability upset probability.
