A Microarchitectural Analysis of Soft Error Propagation in a Production-Level Embedded Microprocessor

Jason Blome, Scott Mahlke,

Daryl Bradley*, Krisztián Flautner*

Advanced Computer Architecture Lab, University of Michigan

*ARM Ltd.

Embedded Everywhere

  • Not just cellphones

  • Safety critical applications:

    • Automotive

    • Healthcare

Patterson and Hennessy 2005

Embedded Domain Constraints

  • Power efficient performance

    • Longer clock cycle times

    • Increased logic depth between stages

    • Higher area ratio of combinational logic to state elements

  • Less speculative state

    • Potentially less masking

  • Limited real estate

All of these high-level constraints affect both the behavior of faults and the potential of fault-tolerance techniques.

Objectives

  • Understand the effects of transient faults on a typical embedded design

    • Architectural contributions to soft error effects

    • Production-grade core

      • Reference synthesis flow

      • Design for test methodologies

  • Simulate faults in both combinational and sequential logic

Soft Error Rate Contributions

Mitra 2005

Shivakumar 2002

Increasing contribution of faults in combinational logic to the overall soft error rate

Processor Model

  • ARM926EJ-S

  • Cell library characterized for 130 nm

  • 5 ns clock cycle time

[Figure: ARM926EJ-S block diagram. Labeled blocks: Instruction Fetch, Instruction Decode, Register Bank, Mux Array, ALU, Shift, Multiply, Instruction Address Logic, Data Address Logic, Instruction cache, Data cache, MMU (instruction side and data side), Data Interface, Write Buffer/Bus Interface, Bus Interface.]


Analysis Infrastructure

[Figure: fault injection/error analysis framework. Labeled components: testbench, reference design, test design, benchmark, error checking and logging, fault injection scheduler, report generation.]
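The labels above (reference design, test design, fault injection scheduler, error checking and logging) suggest a lockstep comparison between a fault-free copy and a faulted copy of the design. The sketch below illustrates that idea only. It is a toy model, not the authors' framework: the `step` function, the 32-bit state word, and the name `run_with_fault` are all assumptions made for illustration.

```python
def step(state: int) -> int:
    """Toy next-state function standing in for one clock cycle of a design.

    Assumption: a 32-bit linear congruential update, chosen only because it
    is simple and deterministic. It does not model any real processor.
    """
    return (state * 1103515245 + 12345) & 0xFFFFFFFF


def run_with_fault(cycles: int, fault_cycle: int, fault_bit: int, seed: int = 1):
    """Run a reference copy and a test copy in lockstep.

    A single bit of the test copy is flipped at `fault_cycle`. Every cycle in
    which the two states differ is logged as (cycle, number_of_differing_bits).
    """
    ref = test = seed
    log = []
    for cycle in range(cycles):
        if cycle == fault_cycle:
            test ^= 1 << fault_bit  # inject a single-bit upset into the test copy
        diff = ref ^ test
        if diff:
            log.append((cycle, bin(diff).count("1")))
        ref, test = step(ref), step(test)
    return log


if __name__ == "__main__":
    for cycle, bits in run_with_fault(cycles=6, fault_cycle=2, fault_bit=0):
        print(f"cycle {cycle}: {bits} differing bit(s)")
```

Because the toy update is a bijection on 32-bit values, an injected fault is never masked here; a real design would mask many faults, which is exactly what such a comparison is meant to measure.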

Fault Masking

[Figure: latching-window timing diagram with labels CLK, tsetup, and thold.]

  • Logical: faulted value does not affect logical operation of the circuit

  • Architectural/Software: incorrect state is overwritten before it is read

  • Latching-Window: the fault pulse does not reach a state element within the latching window

  • Electrical: the fault pulse is electrically attenuated by subsequent gates in the circuit
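Two of the masking effects listed above can be made concrete with a small sketch. The first function checks logical masking on a two-input AND gate; the second uses a simplified latching-window model in which a pulse is captured only if it overlaps the interval from tsetup before the clock edge to thold after it. The function names, the 0.1 ns setup and hold times, and the pulse timings in the usage lines are illustrative assumptions; only the 5 ns clock period comes from the slides.

```python
def logically_masked(a_fault_free: int, a_faulted: int, b: int) -> bool:
    """True if a fault on input `a` of a 2-input AND gate does not change
    the gate output, given the value of the other input `b`."""
    return (a_fault_free & b) == (a_faulted & b)


def latched(pulse_start: float, pulse_width: float,
            clock_edge: float, t_setup: float, t_hold: float) -> bool:
    """Simplified latching-window model (times in ns).

    The pulse is treated as captured only if it overlaps the window
    [clock_edge - t_setup, clock_edge + t_hold].
    """
    window_start = clock_edge - t_setup
    window_end = clock_edge + t_hold
    pulse_end = pulse_start + pulse_width
    return pulse_start < window_end and pulse_end > window_start


if __name__ == "__main__":
    # Other AND input is 0, so flipping `a` cannot reach the output.
    print(logically_masked(a_fault_free=1, a_faulted=0, b=0))
    # Pulse early in a 5 ns cycle, well before the assumed window.
    print(latched(1.0, 0.2, clock_edge=5.0, t_setup=0.1, t_hold=0.1))
    # Pulse that straddles the assumed window around the clock edge.
    print(latched(4.85, 0.2, clock_edge=5.0, t_setup=0.1, t_hold=0.1))
```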

Observed Error Rates

[Figure: two panels, Faults Occurring in Registers and Faults Occurring in Combinational Logic. Chart callouts: 94%, 7%, 16%, 4%.]

At the software interface, error rates within 3%

Observed Error Rates

[Figure: two panels, Faults Occurring in Registers and Faults Occurring in Combinational Logic.]

Faults in combinational logic have a much more dramatic effect on system state

Architectural Errors per Cycle

[Figure: two panels, Faults Occurring in Registers and Faults Occurring in Combinational Logic.]

Architectural Corruption Characteristics

[Figure: two panels, Bits per Architectural Register Corrupted and Number of Architectural Registers Corrupted.]

Results Summary

  • Faults occurring in logic:

    • Will likely be much more frequent in embedded designs

    • Tend to have a more dramatic effect on system state

    • Multi-bit/multi-register architectural errors common

  • Design for test methodologies can greatly impact soft error characteristics

  • Error rates at the software interface consistent with those observed in high-performance microprocessors

Traditional Error Detection/Protection

  • Reliable Encoding

    • ECC/Parity

      • Limited use for faults in logic

      • Unclear where/how much to protect

  • Redundant Computation

    • In space

      • Area/energy overhead

    • In time

      • Energy overhead

      • Requires performance slack

Case Study I

[Figure: ARM926EJ-S block diagram, labeled IRoute, annotated with the errors observed in each cycle.]

  • Cycle 1: 51 Errors

    • instr_reg_ID[0, 16, 22, 31]

    • ID_decode_info[0, 16, 31]

    • stored_instr[29, 30]

  • Cycle 2: 51 Errors

    • instr_reg_EX[0, 16, 22, 31]

    • EX_decode_info[0, 16, 31]

  • Cycle 3: 17 Errors

    • ALU_out[0, 1, 2, 3, 4, 5, 6]

  • Cycle 4: 18 Errors

    • ALU_result_wb[0, 1, 2, 3, 4, 5, 6]

  • Cycle 5: 29 Errors

    • Reg0_reg[0, 1, 2, 3, 4, 5, 6]

Case Study II

[Figure: ARM926EJ-S block diagram, labeled IPipe, annotated with the errors observed in each cycle.]

  • Cycle 1: 9 Errors

    • instr_reg_ID[3, 12, 17, 18, 24, 26, 29, 30, 31]

  • Cycle 2: 62 Errors

    • instr_reg_EX

    • shifter_data_opEx_reg

    • Shifter_data_reg

    • alu_cc_reg

  • Cycle 3: 49 Errors

    • Shifter_data_EX

    • alu_out_reg

  • Cycle 4: 183 Errors

    • writeback and forwarding state

    • register bank

Fault Characteristics

  • Case Study I: uCORE.uIRoute.U600

    • First cycle error sites: 51 errors

      • uIRoute.INSTRHeld_reg[0]

      • uIRoute.INSTRHeld_reg[16]

      • uIRoute.INSTRHeld_reg[22]

      • uIRoute.INSTRHeld_reg[31]

      • u9EJ.uARM9.uCORECTL.uIPIPE.IDarmDeint_reg[0]

      • u9EJ.uARM9.uCORECTL.uIPIPE.IDarmDeint_reg[16]

      • u9EJ.uARM9.uCORECTL.uIPIPE.IDarmDeint_reg[31]

      • u9EJ.uARM9.uCORECTL.uIPIPE.StoredInstrInt_reg[29]

      • u9EJ.uARM9.uCORECTL.uIPIPE.StoredInstrInt_reg[30]

  • Case Study II: uCORE.u9EJ.uARM9.uCORECTL.uIPIPE.U3626

    • First cycle error sites: 9 errors

      • u9EJ.uARM9.uCORECTL.uIPIPE.IDarmDeint_reg[3]

      • u9EJ.uARM9.uCORECTL.uIPIPE.IDarmDeint_reg[12]

      • u9EJ.uARM9.uCORECTL.uIPIPE.IDarmDeint_reg[17]

      • u9EJ.uARM9.uCORECTL.uIPIPE.IDarmDeint_reg[18]

      • u9EJ.uARM9.uCORECTL.uIPIPE.IDarmDeint_reg[24]

      • u9EJ.uARM9.uCORECTL.uIPIPE.IDarmDeint_reg[26]

      • u9EJ.uARM9.uCORECTL.uIPIPE.IDarmDeint_reg[29]

      • u9EJ.uARM9.uCORECTL.uIPIPE.IDarmDeint_reg[30]

      • u9EJ.uARM9.uCORECTL.uIPIPE.IDarmDeint_reg[31]
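Error-site lists like the ones above are easier to compare once the sites are grouped by register. The following is a minimal sketch of that bookkeeping, using the Case Study I sites exactly as listed. The helper name `bits_per_register` and the `name[bit]` parsing convention are assumptions for illustration, not part of the authors' tooling.

```python
import re
from collections import defaultdict

# Matches "some.hierarchical.name_reg[17]" and captures the name and bit index.
SITE = re.compile(r"^(?P<reg>.+)\[(?P<bit>\d+)\]$")


def bits_per_register(sites):
    """Group error sites of the form 'register[bit]' by register name.

    Returns a dict mapping each register name to a sorted list of the
    bit positions reported as erroneous.
    """
    grouped = defaultdict(list)
    for site in sites:
        match = SITE.match(site)
        if match is None:
            raise ValueError(f"unrecognized error site: {site!r}")
        grouped[match["reg"]].append(int(match["bit"]))
    return {reg: sorted(bits) for reg, bits in grouped.items()}


# First-cycle error sites listed for Case Study I.
case_study_1 = [
    "uIRoute.INSTRHeld_reg[0]",
    "uIRoute.INSTRHeld_reg[16]",
    "uIRoute.INSTRHeld_reg[22]",
    "uIRoute.INSTRHeld_reg[31]",
    "u9EJ.uARM9.uCORECTL.uIPIPE.IDarmDeint_reg[0]",
    "u9EJ.uARM9.uCORECTL.uIPIPE.IDarmDeint_reg[16]",
    "u9EJ.uARM9.uCORECTL.uIPIPE.IDarmDeint_reg[31]",
    "u9EJ.uARM9.uCORECTL.uIPIPE.StoredInstrInt_reg[29]",
    "u9EJ.uARM9.uCORECTL.uIPIPE.StoredInstrInt_reg[30]",
]

if __name__ == "__main__":
    for reg, bits in bits_per_register(case_study_1).items():
        print(f"{reg}: bits {bits}")
```

Grouping this way makes the fanout pattern visible: a single fault site shows up as several bits in each of several registers.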

Embedded Design Space Potential

  • Leverage significant signal fanout

  • Detect a fault during the cycle in which it occurs

    • Transition detection circuits

  • Selectively deploy fault detection units

    • Intersection of high fanout fault targets

    • No roll-back necessary – simply flush the pipeline

    • Low cost/area overhead critical for embedded designs

Conclusion

  • Design domain critical:

    • Affects fault behavior

    • Limits applicable tolerance techniques

  • Key observations:

    • Faults in combinational logic much more likely in embedded designs

    • Faults in combinational logic behave dramatically differently from those in state elements

    • Fault fanout offers potential for low overhead detection

Soft Error Terminology

[Figure: diagram with labels transistor, transient fault, and soft error.]

Dependence on Fault Duration

Pulse Detection

[Figure: flip-flop (D, Q, ~Q, CLK) paired with a shadow latch, producing an error signal.]
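The slide gives only the labels flip-flop, shadow latch, and error, so the behavior sketched here is an assumption: the flip-flop samples D at the clock edge, the shadow latch samples the same signal a short delay later, and the error signal is raised when the two samples disagree. The function names, the delay value, and the pulse timings are hypothetical and chosen only to illustrate the idea of detecting a transient pulse in the cycle in which it occurs.

```python
def d_value(t: float, base: int, pulse_start: float, pulse_width: float) -> int:
    """Value on the D input at time t (ns).

    The signal normally holds `base` (0 or 1) and is inverted while a
    transient pulse is active.
    """
    in_pulse = pulse_start <= t < pulse_start + pulse_width
    return base ^ 1 if in_pulse else base


def sample_with_shadow(base: int, pulse_start: float, pulse_width: float,
                       clock_edge: float, shadow_delay: float):
    """Return (q, shadow, error) for one clock edge.

    q      : value captured by the flip-flop at the clock edge
    shadow : value captured by the shadow latch `shadow_delay` ns later
    error  : True when the two captured values differ
    """
    q = d_value(clock_edge, base, pulse_start, pulse_width)
    shadow = d_value(clock_edge + shadow_delay, base, pulse_start, pulse_width)
    return q, shadow, q != shadow


if __name__ == "__main__":
    # Pulse overlaps the clock edge: the flip-flop captures the wrong value,
    # the delayed shadow sample sees the settled value, and error is raised.
    print(sample_with_shadow(1, pulse_start=4.9, pulse_width=0.2,
                             clock_edge=5.0, shadow_delay=0.5))
    # Pulse long gone before the edge: both samples agree, no error.
    print(sample_with_shadow(1, pulse_start=1.0, pulse_width=0.2,
                             clock_edge=5.0, shadow_delay=0.5))
```

In this model a pulse wide enough to cover both sampling points would go undetected, which is one reason the choice of delay matters in such a scheme.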

Microarchitectural Errors per Cycle

[Figure: two panels, Faults Occurring in Registers and Faults Occurring in Combinational Logic.]

Multi-bit errors are common for faults in combinational logic
