

Architectural Vulnerability Factor (AVF) Computation for Address-Based Structures

Arijit Biswas, Paul Racunas, Shubu Mukherjee (FACT Group, DEG, Intel)

Joel Emer (VSSAD, Intel)

Razvan Cheveresan (Sun Microsystems, Intern FACT Group)

Ram Rangan (Princeton University, Intern FACT Group)


Moore’s Law Graph

[Figure: failure rate from vulnerable latches by year (2003 to 2012), log scale from 1 to 10000, with curves for 100% Vulnerable and 20% Vulnerable latches, compared against a 1000 year MTBF Goal; annotated with a 12x GAP.]

  • Soft errors are a serious problem

    • Assuming a fixed per-latch error rate, the failure rate of the whole chip increases as latch counts grow

Chart based on 200,000 latches as used in the Fujitsu SPARC Processor (2003)

FACT Group, Intel


All bits are not created equal!

[Figure: a particle strike causes a bit flip in a stored bit (1 0).]


All bits are not created equal!

Particle Strike Causes Bit Flip!

  • Is the bit read?

    • no: benign fault, no error

    • yes: Does the bit have error protection?

      • Detection & Correction: benign fault, no error

      • Detection only: Does bit matter?

        • yes: True Detected Unrecoverable Error

        • no: False Detected Unrecoverable Error

      • no: Does bit matter?

        • yes: Silent Data Corruption

        • no: benign fault, no error
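The fault-outcome flow on this slide can be sketched as a small decision function. This is a minimal illustration, not the authors' tooling: the names `Protection` and `classify_bit_flip` are hypothetical, and the branch order is an assumption reconstructed from the slide labels (Read?, error protection, Does bit matter?).

```python
from enum import Enum


class Protection(Enum):
    """Hypothetical encoding of the protection options named on the slide."""
    NONE = "no"
    DETECTION_ONLY = "detection only"
    DETECTION_AND_CORRECTION = "detection & correction"


def classify_bit_flip(is_read: bool, protection: Protection, bit_matters: bool) -> str:
    """Return the outcome label for a single flipped bit."""
    if not is_read:
        # A flipped bit that is never read cannot affect anything.
        return "benign fault, no error"
    if protection is Protection.DETECTION_AND_CORRECTION:
        # The fault is detected and corrected before use.
        return "benign fault, no error"
    if protection is Protection.DETECTION_ONLY:
        # The fault is detected but cannot be repaired; it is a "true" error
        # only if the bit would have affected the program outcome.
        if bit_matters:
            return "True Detected Unrecoverable Error"
        return "False Detected Unrecoverable Error"
    # No protection: the fault goes unnoticed by the hardware.
    if bit_matters:
        return "Silent Data Corruption"
    return "benign fault, no error"


if __name__ == "__main__":
    print(classify_bit_flip(True, Protection.NONE, True))
    print(classify_bit_flip(True, Protection.DETECTION_ONLY, False))
```

Only the three non-benign outcomes matter for reliability; the "Does bit matter?" question is what the AVF quantifies.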


Does bit matter?

  • Architectural Vulnerability Factor (AVF)

    • Probability that a bit flip will cause user-visible error

  • Soft Error Rate of a Structure = $\mathrm{AVF}_{\mathrm{bit}} \times (\#\,\mathrm{Bits}) \times (\text{Intrinsic Error Rate})_{\mathrm{bit}}$

  • Reducing AVF reduces SER

    • High AVF indicates need for protection

    • Low AVF can help remove protection hardware

  • SER Protection can be Expensive

    • Impacts Area, Power, Performance, Design Time
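The soft error rate formula on this slide is a plain product, which the sketch below spells out. The function name `structure_ser` and the numbers in the usage lines are hypothetical; in particular the intrinsic per-bit error rate is a made-up value, not one taken from the slides.

```python
def structure_ser(avf_bit: float, num_bits: int, intrinsic_rate_per_bit: float) -> float:
    """Soft error rate of a structure = AVF_bit x (# bits) x intrinsic error rate per bit.

    The result is in the same unit as `intrinsic_rate_per_bit` (for example FIT).
    """
    if not 0.0 <= avf_bit <= 1.0:
        raise ValueError("AVF is a probability and must lie in [0, 1]")
    if num_bits < 0 or intrinsic_rate_per_bit < 0:
        raise ValueError("bit count and intrinsic rate must be non-negative")
    return avf_bit * num_bits * intrinsic_rate_per_bit


if __name__ == "__main__":
    # Hypothetical structure: 4096 bits, assumed intrinsic rate of 0.001 per bit.
    unprotected = structure_ser(1.0, 4096, 0.001)   # every bit assumed vulnerable
    derated = structure_ser(0.25, 4096, 0.001)      # only 25% of bit-time is ACE
    print(unprotected, derated)
```

Because the formula is linear in the AVF, halving the AVF halves the structure's SER, which is why a measured low AVF can justify removing protection hardware.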

Simple Examples

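  • Committed Program Counter AVF ~ 100% (nearly every bit flip changes which instruction executes)

  • Branch Predictor AVF = 0% (a corrupted prediction costs performance, not correctness)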

Complex Examples

  • Instruction Queue AVF = 29%

  • Execution Units AVF = 9%

  • Computed using a new concept

    • Architecturally Correct Execution (ACE)


Architecturally Correct Execution (ACE)

[Figure: data flow from Program Input to Program Outputs.]

  • ACE path requires only a subset of values to flow correctly through the program’s data flow graph (and the machine)

  • Anything else (un-ACE path) can be derated away


Example of un-ACE instruction: Dynamically Dead Instruction

[Figure: instruction sequence with one instruction marked as a Dynamically Dead Instruction.]

Most bits of an un-ACE instruction do not affect program output
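One way to find dynamically dead instructions in a trace is a backward liveness pass: an instruction is dead if its result is never needed by an instruction that reaches program output. The sketch below assumes a simplified register-only trace; `Instr`, its `is_output` flag, and `find_dynamically_dead` are hypothetical names, and registers still unread at the end of the trace are treated as dead unless listed in `live_out`.

```python
from typing import Iterable, List, NamedTuple, Optional, Sequence, Tuple


class Instr(NamedTuple):
    dest: Optional[str]          # register written, or None
    srcs: Tuple[str, ...]        # registers read
    is_output: bool = False      # assumed flag: instruction directly affects program output


def find_dynamically_dead(trace: Sequence[Instr], live_out: Iterable[str] = ()) -> List[int]:
    """Return indices of instructions whose results never reach program output."""
    live = set(live_out)         # registers whose current value is still needed later
    dead: List[int] = []
    for idx in range(len(trace) - 1, -1, -1):
        ins = trace[idx]
        needed = ins.is_output or (ins.dest is not None and ins.dest in live)
        if needed:
            if ins.dest is not None:
                live.discard(ins.dest)   # this write satisfies the later need
            live.update(ins.srcs)        # its inputs are needed in turn
        else:
            dead.append(idx)             # result overwritten or never used
    return sorted(dead)


if __name__ == "__main__":
    trace = [
        Instr("r1", ("r2", "r3")),             # 0: r1 is overwritten at 2 before any live use
        Instr("r6", ("r1",)),                  # 1: r6 is never read
        Instr("r1", ("r4", "r5")),             # 2
        Instr(None, ("r1",), is_output=True),  # 3: e.g. a store that reaches output
    ]
    print(find_dynamically_dead(trace))
```

Instruction 0 is only read by the dead instruction 1, so the pass also catches transitively dead instructions without extra bookkeeping.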

ACE Breakdown of Instruction Queue

Average across all of Spec2K slices for an IA64-like processor

ACE % = AVF = 29%

A New AVF Analysis – Address-Based Structures

  • Caches, data translation buffers, store buffers

    • Make up large portions of a modern chip

  • Simple ACE analysis is no longer enough

  • Data & Tag structures need new concepts

    • Extended Lifetime Analysis

    • Hamming-Distance-1 Analysis

    • Cooldown

    • AVF Reduction - Flushing

Lifetime Analysis

  • Idle is unACE

    • Assuming all time intervals are equal

    • For 3/5 of the lifetime the bit is valid

    • Gives a measure of the structure’s utilization

      • Number of useful bits

      • Amount of time useful bits are resident in structure

      • Valid for a particular trace

[Timeline: Idle | Fill | Valid | Read | Valid | Read | Valid | Evict | Idle]

Lifetime Analysis of Write-through Data Cache

  • Valid is not necessarily ACE

  • ACE % = AVF = 2/5 = 40%

  • Example Lifetime Components

    • ACE: fill-to-read, read-to-read

    • unACE: idle, read-to-evict, write-to-evict

[Timeline, Write-through Data Cache: Idle | Fill | ACE | Read | ACE | Read | unACE | Evict | Idle]

Lifetime Analysis of Write-through Data Cache

  • Data ACEness is a function of instruction ACEness

  • Second Read is by an unACE instruction

  • AVF = 1/5 = 20%

[Timeline, Write-through DCache: Idle | Fill | ACE | Read | unACE | Read (by unACE instruction) | unACE | Evict | Idle]
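The lifetime accounting for a write-through data cache can be sketched as a single pass over the events that touch one entry. This is a minimal illustration under stated assumptions: `Event` and `write_through_data_avf` are hypothetical names, time is in arbitrary equal units, a write is assumed to overwrite the whole entry, and an interval counts as ACE only if it ends in a read by an ACE instruction.

```python
from typing import NamedTuple, Optional, Sequence


class Event(NamedTuple):
    time: int
    kind: str                    # "fill", "read", "write", or "evict"
    by_ace_instr: bool = True    # only meaningful for reads


def write_through_data_avf(events: Sequence[Event], start: int, end: int) -> float:
    """Fraction of the window [start, end) during which the entry's data is ACE."""
    ace_time = 0
    mark: Optional[int] = None   # start of the span a later ACE read would make ACE
    for ev in events:            # events are assumed sorted by time
        if ev.kind in ("fill", "write"):
            mark = ev.time       # new data; anything before it no longer matters
        elif ev.kind == "read":
            if ev.by_ace_instr and mark is not None:
                ace_time += ev.time - mark   # fill-to-read or read-to-read is ACE
                mark = ev.time
            # a read by an unACE instruction adds no ACE time
        elif ev.kind == "evict":
            mark = None          # read-to-evict and write-to-evict are unACE
        else:
            raise ValueError(f"unknown event kind: {ev.kind}")
    return ace_time / (end - start)


if __name__ == "__main__":
    both_ace = [Event(1, "fill"), Event(2, "read"), Event(3, "read"), Event(4, "evict")]
    second_unace = [Event(1, "fill"), Event(2, "read"),
                    Event(3, "read", by_ace_instr=False), Event(4, "evict")]
    print(write_through_data_avf(both_ace, 0, 5))      # the 2/5 case
    print(write_through_data_avf(second_unace, 0, 5))  # the 1/5 case
```

The two usage traces correspond to the two write-through timelines: with both reads ACE the result is 2/5, and when the second read comes from an unACE instruction it drops to 1/5.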

Tags are Hard

  • A fault in a tag that is nominally associated with one instruction can impact the correct execution of a different, independent instruction

  • False Negatives (expected tag hit, but got miss) cause an error only if a writeback is necessary

    • Uses standard lifetime analysis

  • False Positives always result in error

    • Need bit-level analysis

False Positive

  • Expected Tag Miss, but got Hit – Error

  • How do you compute the AVF? Fault injection?

  • Expect: MISS

    • Incoming Address: 1 0 0 1

    • Tag Address: 1 0 0 0

  • Get: HIT

    • Incoming Address: 1 0 0 1

    • Tag Address: 1 0 0 1 (one bit flipped)

Hamming-Distance-1 Analysis

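  • Assuming a single-bit error model, only a tag that differs from the incoming address in exactly one bit can turn into a false hit

  • Now we can use lifetime analysis on the identified bit(s)

[Figure: Tag Array and Incoming Address. The value 101010 is compared against 001010, 111010, 000001, 111000, 010101, and 111111; only 001010 and 111010 differ from it in exactly one bit and are marked Hamming-Distance-1 Match.]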
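A Hamming-distance-1 check reduces to an XOR followed by a single-bit test. The sketch below is illustrative only; `hamming_distance_1_bit` is a hypothetical helper, and the values in the usage lines are the six-bit patterns shown on the slide.

```python
from typing import Optional


def hamming_distance_1_bit(tag: int, incoming: int) -> Optional[int]:
    """Return the index of the single differing bit, or None if the two
    values are equal or differ in more than one bit."""
    diff = tag ^ incoming
    # A non-zero value with exactly one bit set satisfies diff & (diff - 1) == 0.
    if diff != 0 and diff & (diff - 1) == 0:
        return diff.bit_length() - 1
    return None


if __name__ == "__main__":
    reference = 0b101010
    for other in (0b001010, 0b111010, 0b000001, 0b111000, 0b010101, 0b111111):
        print(format(other, "06b"), hamming_distance_1_bit(reference, other))
```

The returned bit index identifies the one tag bit whose flip would cause a false positive, so lifetime analysis only needs to track that bit for the duration of the match.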

Edge Effects

  • Simulation introduces unknown component

    • Simulation not run to completion

    • Only execute small segment of code

  • Worst Case AVF = Known AVF + Unknown AVF

  • How do we reduce/eliminate unknown?

[Timeline: Idle, Fill, Read, Read, then an Unknown interval up to Sim End; the later Evict and Idle are Not Simulated.]

Cooldown

  • Run the simulation beyond the end of the measured interval

    • Any bits that were still valid at the end of the interval (the unknown bits) are resolved

  • Trend: unknown AVF primarily resolves to unACE

  • Best Estimate AVF = Known AVF after Cooldown

[Chart: AVF with No Cooldown vs. Cooldown; 10 Million Instructions Simulation followed by 10 Million Instructions Cooldown.]
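The edge effect and its resolution by cooldown can be sketched by letting the event list run past the end of the measured interval while only counting time inside it. This is a simplified model for a single write-through entry, not the authors' simulator: `Event` and `avf_with_edge_effects` are hypothetical names, and events are assumed to be sorted and to start at or after `sim_start`.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class Event(NamedTuple):
    time: int
    kind: str                    # "fill", "read", "write", or "evict"
    by_ace_instr: bool = True    # only meaningful for reads


def avf_with_edge_effects(events: Sequence[Event], sim_start: int,
                          sim_end: int) -> Tuple[float, float]:
    """Return (known AVF, unknown AVF) for the window [sim_start, sim_end).

    Events after sim_end are cooldown events: they add no measured time but
    resolve whether the last interval inside the window was ACE.
    """
    ace_time = 0
    mark: Optional[int] = None   # start of the span a later ACE read would make ACE
    for ev in events:
        if ev.kind in ("fill", "write"):
            mark = ev.time
        elif ev.kind == "read":
            if ev.by_ace_instr and mark is not None:
                # Count only the part of the span that lies inside the window.
                ace_time += min(ev.time, sim_end) - min(mark, sim_end)
                mark = ev.time
        elif ev.kind == "evict":
            mark = None
        else:
            raise ValueError(f"unknown event kind: {ev.kind}")
    # An entry still live inside the window has an unresolved final interval.
    unknown_time = sim_end - mark if mark is not None and mark < sim_end else 0
    length = sim_end - sim_start
    return ace_time / length, unknown_time / length


if __name__ == "__main__":
    simulated = [Event(1, "fill"), Event(2, "read"), Event(3, "read")]
    known, unknown = avf_with_edge_effects(simulated, 0, 4)
    print("worst case AVF:", known + unknown)
    # Cooldown shows the entry is evicted without another read: unknown becomes unACE.
    print(avf_with_edge_effects(simulated + [Event(5, "evict")], 0, 4))
```

Without cooldown the worst case has to add the unknown fraction to the known AVF; with cooldown the unknown fraction either moves into the known AVF (a later ACE read) or disappears (eviction or overwrite).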

Data AVFs (Average)

  • STB (store buffer) AVF lower due to large idle component and bytemasks

  • DTB (data translation buffer) AVF higher due to high average utilization

  • Dcache (WB, write-back) AVF higher than Dcache (WT, write-through) since dirty bytes are still ACE after the last read

Data AVF of DTB

  • Large variability in AVF

    • Ranges from ~0% to 80%

  • Based on structure utilization by benchmark

Tag AVFs (Average)

  • Tag AVFs lower than expected for DTB and DCache (WT)

    • Only Hamming-Distance-1 matches contribute ACE time

  • Tag AVFs higher than data for STB and DCache (WB)

    • Dynamically dead tags are still ACE for dirty bytes

Tag AVF of DTB

  • AVFs surprisingly small, little variation

  • Protection added to DTB CAMs prior to AVF calculation (large # bits)

    • AVF calculation shows NO protection was needed in this case

AVF Observations

  • DTB and Write-through Data Cache

    • Typically Tag AVF < Data AVF

      • Only Hamming-distance-1 hits contribute to Tag AVF

      • Dynamically dead data are unACE

  • STB and Write-back Data Cache

    • Typically Tag AVF ≥ Data AVF

      • Tag is ACE until eviction if the line is dirty

      • Dynamically dead data can be ACE

      • Bytemasks and writes may make certain bytes of data unACE while all bits of the tag are always ACE


AVF Reduction: Flushing

  • Flushing (emulates a context switch)

    • Also eliminates unknowns by flushing all live entries at end of simulation

  • Main concept: transform part of the ACE time into unACE at the expense of some performance

[Figure: timeline with events Fill, Read, Read, Evict and intervals labeled Idle, ACE, ACE, Idle; further labels: Fill, Flush.]
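The trade-off behind flushing can be illustrated with a toy model of one write-through entry. Everything here is an assumption made for the sketch rather than the authors' setup: `avf_with_periodic_flush` is a hypothetical function, flushes happen at every multiple of `flush_period` and take zero time, a flushed entry is refilled at the moment of its next read, and each such refill is counted as one extra miss.

```python
from typing import Optional, Sequence, Tuple


def avf_with_periodic_flush(events: Sequence[Tuple[int, str]], start: int, end: int,
                            flush_period: Optional[int] = None) -> Tuple[float, int]:
    """Return (data AVF, extra misses) for one entry over the window [start, end).

    `events` is a time-sorted list of (time, kind) with kind in
    {"fill", "read", "evict"}; all reads are assumed to be ACE.
    """
    ace_time = 0
    extra_misses = 0
    mark: Optional[int] = None   # time since which the entry has held live data
    for time, kind in events:
        if flush_period is not None and mark is not None:
            last_flush = (time // flush_period) * flush_period
            if last_flush > mark:
                mark = None      # a flush discarded the entry before this event
        if kind == "fill":
            mark = time
        elif kind == "read":
            if mark is None:
                extra_misses += 1        # flushed data must be fetched again
            else:
                ace_time += time - mark  # data had to stay correct until this read
            mark = time
        elif kind == "evict":
            mark = None
        else:
            raise ValueError(f"unknown event kind: {kind}")
    return ace_time / (end - start), extra_misses


if __name__ == "__main__":
    trace = [(100, "fill"), (250, "read"), (400, "read"), (450, "evict")]
    print(avf_with_periodic_flush(trace, 0, 500))                    # no flushing
    print(avf_with_periodic_flush(trace, 0, 500, flush_period=300))  # flush every 300 cycles
```

In this toy trace the flush at cycle 300 turns the second read-to-read interval from ACE into unACE and costs one extra miss, which is the performance price the slide refers to.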


AVF Reduction: Flushing

  • >50% AVF reduction for 100K cycle Flush (Flush takes 0 time)

    • Max IPC reduction: 1.77% DTB, 1.25% WT/WB DCache

    • Avg IPC reduction: 0.56% DTB, 0.19% WT/WB DCache

[Charts: Data and Tags, each comparing No Flushing, 5M cycle Flush, 1M cycle Flush, and 100K cycle Flush.]

Summary

  • SER is an ever-increasing problem

    • Need standard, quantitative way to evaluate design cost of adding protection/recovery to structures

  • AVF Gives us a Quantitative way to Measure the cost of adding Protection

  • Presented a Methodology to Compute the AVF of Address Based Structures

    • Lifetime Analysis

    • False Negatives and False Positives

      • Hamming Distance-1 Analysis for False Positives

    • Edge Effects and Cooldown

      • Analogous to Warmup

    • AVF Reduction - Flushing
