A Microarchitectural Analysis of Soft Error Propagation in a Production-Level Embedded Microprocessor

Presentation Transcript

A Microarchitectural Analysis of Soft Error Propagation in a Production-Level Embedded Microprocessor

Jason Blome, Scott Mahlke, Daryl Bradley*, Krisztián Flautner*

Advanced Computer Architecture Lab, University of Michigan

*ARM Ltd.

1

Embedded Everywhere
  • Not just cellphones
  • Safety critical applications:
    • Automotive
    • Healthcare

Patterson and Hennessy 2005

2

Embedded Domain Constraints
  • Power efficient performance
    • Longer clock cycle times
    • Increased logic depth between stages
    • Higher area ratio of combinational logic to state elements
  • Less speculative state
    • Potentially less masking
  • Limited real estate

All of these high-level constraints affect how faults behave and how much fault tolerance techniques can offer

3

Objectives
  • Understand the effects of transient faults on a typical embedded design
    • Architectural contributions to soft error effects
    • Production-grade core
      • Reference synthesis flow
      • Design for test methodologies
  • Simulate faults in both combinational and sequential logic

4

Soft Error Rate Contributions

[Charts: soft error rate contributions. Sources: Mitra 2005; Shivakumar 2002]

Increasing contribution of faults in combinational logic to the overall soft error rate

5

Processor Model
  • ARM926EJ-S
  • Cell library characterized for 130 nm
  • 5 ns clock cycle time

[Block diagram: ARM926EJ-S. Labeled blocks: Instruction Fetch, Instruction Decode, Instruction Address Logic, Data Address Logic, Register Bank, Mux Array, ALU, Shift, Multiply, Instruction cache, Data cache, Data Interface, MMU (instruction and data sides), Write Buffer/Bus Interface, Bus Interface]

6

Analysis Infrastructure

[Diagram: testbench containing a reference design and a test design, both driven by a benchmark, with error checking and logging. A fault injection/error analysis framework contains the fault injection scheduler and report generation]
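
The flow in this diagram can be illustrated with a toy model: a reference design and a test design run the same benchmark in lockstep, a single bit flip is injected into the test copy at a scheduled cycle, and every cycle where the two diverge is logged. This is only a sketch under stated assumptions. The 8-bit accumulator "design", the `Fault` record, and the `run_campaign` function are hypothetical stand-ins, not the authors' framework or the ARM926EJ-S netlist.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Fault:
    cycle: int  # cycle at which the bit flip is injected
    bit: int    # which bit of the state register is flipped


def step(state: int, stimulus: int) -> int:
    """Toy 8-bit 'design' (assumption): an accumulator adding the stimulus each cycle."""
    return (state + stimulus) & 0xFF


def run_campaign(benchmark: List[int], fault: Fault) -> List[Tuple[int, int]]:
    """Run reference and test designs in lockstep.

    Returns a log of (cycle, xor_difference) for every cycle where they differ.
    """
    ref = test = 0
    log = []
    for cycle, stimulus in enumerate(benchmark):
        ref = step(ref, stimulus)
        test = step(test, stimulus)
        if cycle == fault.cycle:
            test ^= 1 << fault.bit  # inject the transient fault into the test design only
        if ref != test:
            log.append((cycle, ref ^ test))  # error checking and logging
    return log


log = run_campaign([1, 2, 3, 4], Fault(cycle=1, bit=0))
```

The XOR difference in each log entry marks which state bits disagree, which is the raw material for the per-cycle error counts reported later in the deck.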

7

Fault Masking
  • Logical: faulted value does not affect logical operation of the circuit
  • Architectural/Software: incorrect state is overwritten before it is read
  • Latching-Window: the fault pulse does not reach a state element within the latching window
  • Electrical: the fault pulse is electrically attenuated by subsequent gates in the circuit

[Figure: gate-level example with inputs at logic 0, and a CLK timing diagram with the latching window bounded by tsetup and thold]
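
Two of these masking mechanisms are easy to illustrate in a few lines. The sketch below checks logical masking on a single 2-input AND gate and latching-window masking as an interval-overlap test around the clock edge. The function names, the choice of an AND gate, and the setup/hold values in the usage lines are assumptions for illustration; only the 5 ns clock period comes from the Processor Model slide.

```python
def logically_masked(a: int, b: int, flip_a: bool) -> bool:
    """True if flipping input a of a 2-input AND gate leaves the output unchanged."""
    faulty_a = a ^ 1 if flip_a else a
    return (a & b) == (faulty_a & b)


def latching_window_masked(pulse_start: float, pulse_end: float,
                           t_clk: float, t_setup: float, t_hold: float) -> bool:
    """True if the fault pulse misses the window [t_clk - t_setup, t_clk + t_hold]."""
    window_start = t_clk - t_setup
    window_end = t_clk + t_hold
    overlaps = pulse_start <= window_end and pulse_end >= window_start
    return not overlaps


# The other AND input is 0, so the flipped input cannot change the output.
masked_by_logic = logically_masked(a=1, b=0, flip_a=True)

# Times in ns; clock edge at 5 ns, setup and hold of 0.1 ns are assumed values.
masked_by_timing = latching_window_masked(1.0, 1.2, t_clk=5.0, t_setup=0.1, t_hold=0.1)
```

A longer clock period with the same setup and hold times makes the latching window a smaller fraction of the cycle, so a pulse of a given width overlaps it less often.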

8

Observed Error Rates

[Charts: Faults Occurring in Registers; Faults Occurring in Combinational Logic. Values labeled on the charts: 94%, 7%, 16%, 4%]

At the software interface, error rates within 3%

9

Observed Error Rates

[Charts: Faults Occurring in Registers; Faults Occurring in Combinational Logic]

Faults in combinational logic have a much more dramatic effect on system state

10

Architectural Errors per Cycle

[Charts: Faults Occurring in Registers; Faults Occurring in Combinational Logic]

11

Architectural Corruption Characteristics

[Charts: Bits per Architectural Register Corrupted; Number of Architectural Registers Corrupted]
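
Both metrics named on this slide can be derived by XORing a fault-free register file against a faulty one and counting set bits. The sketch below is a minimal illustration; the function name and the sample register values are hypothetical, not data from the study.

```python
from typing import List, Tuple


def corruption_stats(ref_regs: List[int], test_regs: List[int]) -> Tuple[int, List[int]]:
    """Return (number of registers corrupted, corrupted-bit count for each corrupted register)."""
    bits_per_reg = [
        bin(ref ^ test).count("1")  # popcount of the differing bits
        for ref, test in zip(ref_regs, test_regs)
        if ref != test
    ]
    return len(bits_per_reg), bits_per_reg


# Hypothetical 3-register example: register 0 differs in bits 0..6, register 2 in bit 0.
num_corrupted, bits = corruption_stats([0x00, 0xFF, 0x10], [0x7F, 0xFF, 0x11])
```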

12

Results Summary
  • Faults occurring in logic:
    • Will likely be much more frequent in embedded designs
    • Tend to have a more dramatic effect on system state
    • Multi-bit/multi-register architectural errors common
  • Design for test methodologies can greatly impact soft error characteristics
  • Error rates at the software interface consistent with those observed in high-performance microprocessors

13

Traditional Error Detection/Protection
  • Reliable Encoding
    • ECC/Parity
      • Limited use for faults in logic
      • Unclear where/how much to protect
  • Redundant Computation
    • In space
      • Area/energy overhead
    • In time
      • Energy overhead
      • Requires performance slack
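
One reason parity has limited use for faults in logic is that a single parity bit only detects an odd number of bit flips, while a single fault in combinational logic can corrupt several bits of a register at once. The sketch below demonstrates this with a hypothetical `parity_detects` helper. The bit positions in the second usage line are the instr_reg_ID bits reported for cycle 1 of Case Study I; the stored word itself is an arbitrary assumed value.

```python
from typing import List


def parity(word: int) -> int:
    """Even-parity bit of a word: 1 if the number of set bits is odd."""
    return bin(word).count("1") & 1


def parity_detects(stored: int, flipped_bits: List[int]) -> bool:
    """True if a parity check catches the given (distinct) bit flips."""
    stored_parity = parity(stored)  # parity computed before the fault
    corrupted = stored
    for bit in flipped_bits:
        corrupted ^= 1 << bit
    return parity(corrupted) != stored_parity


single_flip_caught = parity_detects(0xE1A00000, [0])
multi_flip_caught = parity_detects(0xE1A00000, [0, 16, 22, 31])
```

An even number of flipped bits leaves the parity unchanged, so the four-bit corruption passes the check unnoticed.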

14

Case Study I
  • Fault location (diagram label): IRoute
  • Cycle 1: 51 Errors
    • instr_reg_ID[0, 16, 22, 31]
    • ID_decode_info[0, 16, 31]
    • stored_instr[29, 30]
  • Cycle 2: 51 Errors
    • instr_reg_EX[0, 16, 22, 31]
    • EX_decode_info[0, 16, 31]
  • Cycle 3: 17 Errors
    • ALU_out[0, 1, 2, 3, 4, 5, 6]
  • Cycle 4: 18 Errors
    • ALU_result_wb[0, 1, 2, 3, 4, 5, 6]
  • Cycle 5: 29 Errors
    • Reg0_reg[0, 1, 2, 3, 4, 5, 6]

[Block diagram: ARM926EJ-S, annotated with the per-cycle error sites listed above]

15

Case Study II
  • Fault location (diagram label): IPipe
  • Cycle 1: 9 Errors
    • instr_reg_ID[3, 12, 17, 18, 24, 26, 29, 30, 31]
  • Cycle 2: 62 Errors
    • instr_reg_EX
    • shifter_data_opEx_reg
    • Shifter_data_reg
    • alu_cc_reg
  • Cycle 3: 49 Errors
    • Shifter_data_EX
    • alu_out_reg
  • Cycle 4: 183 Errors
    • writeback and forwarding state
    • register bank

[Block diagram: ARM926EJ-S, annotated with the per-cycle error sites listed above]

16

Fault Characteristics
  • Case Study I: uCORE.uIRoute.U600
    • First cycle error sites: 51 errors
      • uIRoute.INSTRHeld_reg[0]
      • uIRoute.INSTRHeld_reg[16]
      • uIRoute.INSTRHeld_reg[22]
      • uIRoute.INSTRHeld_reg[31]
      • u9EJ.uARM9.uCORECTL.uIPIPE.IDarmDeint_reg[0]
      • u9EJ.uARM9.uCORECTL.uIPIPE.IDarmDeint_reg[16]
      • u9EJ.uARM9.uCORECTL.uIPIPE.IDarmDeint_reg[31]
      • u9EJ.uARM9.uCORECTL.uIPIPE.StoredInstrInt_reg[29]
      • u9EJ.uARM9.uCORECTL.uIPIPE.StoredInstrInt_reg[30]
  • Case Study II: uCORE.u9EJ.uARM9.uCORECTL.uIPIPE.U3626
    • First cycle error sites: 9 errors
      • u9EJ.uARM9.uCORECTL.uIPIPE.IDarmDeint_reg[3]
      • u9EJ.uARM9.uCORECTL.uIPIPE.IDarmDeint_reg[12]
      • u9EJ.uARM9.uCORECTL.uIPIPE.IDarmDeint_reg[17]
      • u9EJ.uARM9.uCORECTL.uIPIPE.IDarmDeint_reg[18]
      • u9EJ.uARM9.uCORECTL.uIPIPE.IDarmDeint_reg[24]
      • u9EJ.uARM9.uCORECTL.uIPIPE.IDarmDeint_reg[26]
      • u9EJ.uARM9.uCORECTL.uIPIPE.IDarmDeint_reg[29]
      • u9EJ.uARM9.uCORECTL.uIPIPE.IDarmDeint_reg[30]
      • u9EJ.uARM9.uCORECTL.uIPIPE.IDarmDeint_reg[31]

17

Embedded Design Space Potential
  • Leverage significant signal fanout
  • Determine that a fault has occurred during the cycle that it occurs
    • Transition detection circuits
  • Selectively deploy fault detection units
    • Intersection of high fanout fault targets
    • No roll-back necessary – simply flush the pipeline
    • Low cost/area overhead critical for embedded designs
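
The idea of placing detectors at the intersection of high-fanout fault targets can be illustrated with the first-cycle error sites listed on the Fault Characteristics slide. The sketch below counts how many fault sites reach each register bit and ranks the bits by that count. The ranking function is a hypothetical illustration, not the authors' selection method, and the hierarchical signal names are shortened for readability.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# First-cycle error sites per injected fault, taken from the two case studies
# (instance path prefixes dropped for brevity).
first_cycle_sites: Dict[str, Set[str]] = {
    "uIRoute.U600": {
        "INSTRHeld_reg[0]", "INSTRHeld_reg[16]", "INSTRHeld_reg[22]", "INSTRHeld_reg[31]",
        "IDarmDeint_reg[0]", "IDarmDeint_reg[16]", "IDarmDeint_reg[31]",
        "StoredInstrInt_reg[29]", "StoredInstrInt_reg[30]",
    },
    "uIPIPE.U3626": {
        "IDarmDeint_reg[3]", "IDarmDeint_reg[12]", "IDarmDeint_reg[17]",
        "IDarmDeint_reg[18]", "IDarmDeint_reg[24]", "IDarmDeint_reg[26]",
        "IDarmDeint_reg[29]", "IDarmDeint_reg[30]", "IDarmDeint_reg[31]",
    },
}


def rank_detector_sites(sites: Dict[str, Set[str]]) -> List[Tuple[str, int]]:
    """Rank register bits by how many distinct fault sites corrupt them in the first cycle."""
    counts = Counter(reg for regs in sites.values() for reg in regs)
    return counts.most_common()


best_site, faults_covered = rank_detector_sites(first_cycle_sites)[0]
```

With only these two faults, `IDarmDeint_reg[31]` is the one bit reached by both, so a single detector there would observe either fault in the cycle it occurs.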

18

Conclusion
  • Design domain critical:
    • Affects fault behavior
    • Limits applicable tolerance techniques
  • Key observations:
    • Faults in combinational logic much more likely in embedded designs
    • Faults in combinational logic behave dramatically differently from those in state elements
    • Fault fanout offers potential for low overhead detection

19

Pulse Detection

[Circuit diagram: flip-flop (D, CLK, Q, ~Q) paired with a shadow latch, producing an error signal]
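
The slide gives only the diagram labels, so the exact circuit is not recoverable here. As a behavioral sketch under a stated assumption, suppose the shadow latch captures D a short time after the main flip-flop does and the error output is the XOR of the two captured values. A transient pulse that is present at only one of the two sampling points then raises the error flag. The function name and the sampling scheme are hypothetical.

```python
from typing import Tuple


def detect_pulse(d_at_edge: int, d_after_edge: int) -> Tuple[int, int]:
    """Behavioral model (assumed): returns (q, error) for one clock cycle."""
    q = d_at_edge           # main flip-flop samples D on the clock edge
    shadow = d_after_edge   # shadow latch samples D again shortly afterwards (assumption)
    error = q ^ shadow      # a mismatch means D changed between the two samples
    return q, error


glitch = detect_pulse(d_at_edge=1, d_after_edge=0)   # pulse captured by the flip-flop only
clean = detect_pulse(d_at_edge=1, d_after_edge=1)    # stable data, no error
```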

22

Microarchitectural Errors per Cycle

[Charts: Faults Occurring in Registers; Faults Occurring in Combinational Logic]

Multi-bit errors common for faults in combinational logic

23