Fault Analysis Using Pin

Srilatha (Bobbie) Manne

Intel


What are we trying to do?

  • Purpose: Simulate the occurrence of transient (or persistent) faults and analyze their impact on applications.

  • Why Pin?

    • Easy to model faults and measure their impact.

    • Relatively fast (5-10 minutes per fault injection)

    • Provides full program analysis


Pros & Cons

[Figure: fault-analysis approaches (Software Instrumentation, Architectural Simulator, RTL, Silicon) compared on two axes, Accuracy and Ease of Use.]


Pin's View of the World

[Figure: diagram labeled uArch State, Arch Reg, and Memory.]


Modeling Microarchitectural Faults in Pin

  • Accuracy of fault methodology depends on the complexity of the underlying microarchitecture

    • Easier to model faults in an in-order, single issue machine

  • Build a microarchitectural model into Pin

    • A low fidelity model may suffice

    • Adds complexity and increases simulation time

  • Mimic certain types of microarchitectural faults in Pin


Example: Destination Register Transmission Fault

Fault occurs in latches when forwarding instruction output

  • Change architectural value of destination register at the instruction where fault occurs

  • NOTE: This is different from inserting a fault into the register file, because the destination is selected based on the instruction where the fault occurs

[Figure: pipeline diagram labeled Exec Unit, Bypass Logic, ROB, RS, and Latches.]


Example: Load Data Transmission Faults

Fault occurs when loading data from the memory system

  • Before load instruction, insert fault into memory

  • Execute load instruction

  • After load instruction, remove fault from memory (Cleanup)

  • NOTE: This models a fault occurring in the transmission of data from the STB or L1 Cache
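The insert, load, cleanup sequence above can be sketched outside of Pin. This is only an illustration under stated assumptions: `loadWithTransientFault` and the byte-vector "memory" are hypothetical stand-ins, and in an actual Pin tool the flip and the restore would presumably live in analysis routines placed before and after the load.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of a load whose data is corrupted in transit.
// The stored byte is flipped only for the duration of the load, so the
// loaded value is faulty while memory itself ends up unchanged.
std::uint8_t loadWithTransientFault(std::vector<std::uint8_t>& memory,
                                    std::size_t addr, unsigned bit) {
    const std::uint8_t mask = static_cast<std::uint8_t>(1u << (bit & 7u));
    memory[addr] ^= mask;                       // before the load: insert fault
    const std::uint8_t loaded = memory[addr];   // execute the load
    memory[addr] ^= mask;                       // after the load: cleanup
    return loaded;
}
```

Because the same mask is applied twice, the cleanup restores the original byte exactly; only the consumer of the loaded value sees the fault.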

[Figure: memory-path diagram labeled STB, Load Buffer, Latches, and DCache.]


Five Step Program for Fault Analysis

  • Determine ‘when’ the fault occurs

  • Determine ‘where’ the fault occurs

  • Inject Fault

  • Cleanup (Optional)

  • Determine Outcome


Step 1: WHEN

  • Reality:

    • Assuming that environmental conditions stay the same, transient faults can occur with equal probability at any time during the run of the application.

  • Approximation:

    • Transient faults occur on any dynamic instruction with equal probability


Step 1: WHEN

  • Sample Pin Tool: InstCount.C

    • Purpose: Efficiently determines the number of dynamic instances of each static instruction.

  • Output: For each static instruction

    • Function name

    • Dynamic instructions per static instruction

IP: 135000941 Count: 492714322 Func: propagate_block.104

IP: 135000939 Count: 492714322 Func: propagate_block.104

IP: 135000961 Count: 492701800 Func: propagate_block.104

IP: 135000959 Count: 492701800 Func: propagate_block.104

IP: 135000956 Count: 492701800 Func: propagate_block.104

IP: 135000950 Count: 492701800 Func: propagate_block.104
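Given per-instruction counts like these, choosing a dynamic instruction uniformly at random amounts to choosing a static instruction with probability proportional to its count. A minimal sketch, assuming the counts have already been parsed into a vector; `StaticInst`, `totalDynamic`, `drawDynamicIndex`, and `pickStatic` are hypothetical names and are not part of InstCount.C.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

struct StaticInst {
    std::uint64_t ip;     // instruction address
    std::uint64_t count;  // dynamic executions of this static instruction
};

// Total number of dynamic instructions across all static instructions.
std::uint64_t totalDynamic(const std::vector<StaticInst>& insts) {
    std::uint64_t total = 0;
    for (const StaticInst& s : insts) total += s.count;
    return total;
}

// Draws a 1-based dynamic instruction index uniformly from [1, total].
// Requires total >= 1.
std::uint64_t drawDynamicIndex(std::uint64_t total, std::mt19937_64& rng) {
    std::uint64_t lo = 1;
    std::uniform_int_distribution<std::uint64_t> dist(lo, total);
    return dist(rng);
}

// Maps a 1-based draw k onto a static instruction by walking cumulative
// counts, so each entry is chosen with probability count / total.
// Returns insts.size() when k is out of range.
std::size_t pickStatic(const std::vector<StaticInst>& insts, std::uint64_t k) {
    if (k == 0) return insts.size();
    std::uint64_t seen = 0;
    for (std::size_t i = 0; i < insts.size(); ++i) {
        seen += insts[i].count;
        if (k <= seen) return i;
    }
    return insts.size();
}
```

Note that `pickStatic` is a weighted draw, not a replay of execution order; finding the exact dynamic instance still requires counting at run time.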


Step 2: WHERE

  • Reality:

    • Where the transient fault occurs is a function of the size of the structure on the chip.

    • Faults can occur in both architectural and microarchitectural state.

  • Approximation:

    • Pin only provides architectural state, not microarchitectural state (no uops, for instance)

      • Either inject faults only into architectural state

      • Or build an approximation for some microarchitectural state
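If the chance of a fault landing in a structure scales with that structure's size, the per-structure weights follow directly from bit counts. The sketch below assumes exactly that proportionality; the `Structure` type, its field names, and any sizes passed in are hypothetical and not taken from the tool.

```cpp
#include <cstdint>
#include <vector>

struct Structure {
    const char* name;    // label only, e.g. a register or buffer
    std::uint64_t bits;  // storage size in bits
};

// Returns, for each structure, the fraction of all bits it holds.
// Under the size-proportional assumption this is the probability that a
// single fault lands in that structure.
std::vector<double> faultProbabilities(const std::vector<Structure>& structures) {
    std::uint64_t totalBits = 0;
    for (const Structure& s : structures) totalBits += s.bits;

    std::vector<double> p;
    p.reserve(structures.size());
    for (const Structure& s : structures) {
        p.push_back(totalBits == 0
                        ? 0.0
                        : static_cast<double>(s.bits) / static_cast<double>(totalBits));
    }
    return p;
}
```

The resulting weights could be passed to `std::discrete_distribution` to draw the structure for each injection.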


Step 3: Injecting Fault

  • Pass context and other relevant information to analysis routine to modify the architectural state

  • Inject fault

  • Flush code cache to force immediate reinstrumentation

  • Force execution at a particular point using the context


Step 4: Cleanup

  • Cleanup is an optional step and is only necessary for modeling microarchitectural faults, not architectural faults

    • Example: modeling a fault in the transmission of data to a load op


Step 5: Determining Outcome

  • Outcomes that can be tracked:

    • Did the program complete?

    • Did the program complete and have the correct IO result?

    • If the program crashed, how many instructions were executed after fault injection before the crash?

    • If the program crashed, why did it crash (trapping signals)?
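The outcome questions above can be folded into a small classifier applied after each run. This is a sketch under assumptions: the `RunResult` record, the `Outcome` labels, and the idea of comparing against the output of a fault-free "golden" run are illustrative choices, not the tool's actual post-processing.

```cpp
#include <string>

enum class Outcome { Masked, IoError, ProgramFailure };

// Hypothetical summary of one fault-injection run.
struct RunResult {
    bool completed;                // did the program run to completion?
    int signal;                    // trapped signal number, 0 if none
    unsigned long postFaultInsts;  // instructions executed after the fault
    std::string output;            // program output captured from the run
};

// Classifies a run against the output of a fault-free run.
Outcome classify(const RunResult& run, const std::string& goldenOutput) {
    if (!run.completed || run.signal != 0) {
        return Outcome::ProgramFailure;  // crashed or was killed by a signal
    }
    return run.output == goldenOutput ? Outcome::Masked    // correct result
                                      : Outcome::IoError;  // completed, wrong result
}
```

For crashed runs, `signal` and `postFaultInsts` carry the "why" and "how long after the fault" answers alongside the classification.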


Fault Insertion State Diagram

[Flowchart with three phases: Pre-Fault, Fault, Post Fault. Nodes: START; Count By Basic Block; Reached Threshold?; Clear Code Cache; Count Every Instruction; Found Inst?; Insert Fault; Cleanup?; Cleanup Fault; Restart Using Context; Count Insts After Fault; Reached CheckPoint?; Print HB & Update Checkpoint Counter; Reached Max HB?; Detach From Pin & Run to Completion. Decision edges are labeled Yes/No.]


Register Fault Pin Tool: RegFault.C

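Main:

    int main(int argc, char *argv[])
    {
        // PIN_Init returns true when the command line cannot be parsed.
        if (PIN_Init(argc, argv))
        {
            return Usage();
        }

        out_file.open(KnobOutputFile.Value().c_str());
        faultInst = KnobFaultInst.Value();   // dynamic instruction at which to inject

        // Register trace- and instruction-level instrumentation and the exit callback.
        TRACE_AddInstrumentFunction(Trace, 0);
        INS_AddInstrumentFunction(Instruction, 0);
        PIN_AddFiniFunction(Fini, 0);

        // Intercept the signals recorded as program failures.
        PIN_AddSignalInterceptFunction(SIGSEGV, SigFunc, 0);
        PIN_AddSignalInterceptFunction(SIGFPE, SigFunc, 0);
        PIN_AddSignalInterceptFunction(SIGILL, SigFunc, 0);
        PIN_AddSignalInterceptFunction(SIGSYS, SigFunc, 0);

        PIN_StartProgram();   // does not return
        return 0;
    }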


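Trace instrumentation:

    // While counting coarsely, add one counter update per basic block.
    if (fineGrainCount == false)
    {
        for (BBL bbl = TRACE_BblHead(trace); BBL_Valid(bbl); bbl = BBL_Next(bbl))
        {
            // "If" routine: add the block's instruction count and test the threshold.
            BBL_InsertIfCall(bbl, IPOINT_BEFORE, (AFUNPTR)FindFineGrainThreshold,
                             IARG_UINT32, BBL_NumIns(bbl), IARG_END);
            // "Then" routine: runs only when the "if" routine returns nonzero.
            BBL_InsertThenCall(bbl, IPOINT_BEFORE, (AFUNPTR)SwitchToFineGrainCounting,
                               IARG_END);
        }
    }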

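Trace analysis:

    // Returns nonzero once execution is within fineGrainTrigger instructions
    // of the fault point. Written as an addition so the comparison cannot
    // wrap around if the counters are unsigned and faultInst < fineGrainTrigger.
    UINT32 FindFineGrainThreshold(UINT32 i)
    {
        curDynInst += i;
        return (curDynInst + fineGrainTrigger >= faultInst);
    }

    // Switches to per-instruction counting and discards existing
    // instrumentation so the code is reinstrumented with fineGrainCount set.
    VOID SwitchToFineGrainCounting()
    {
        if (fineGrainCount == false)
        {
            fineGrainCount = true;
            PIN_RemoveInstrumentation();
        }
    }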
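The hand-off from per-block to per-instruction counting can be simulated without Pin to see where the switch lands. The sketch below is a model of the counting logic only: `SwitchPoint` and `findSwitch` are hypothetical names, and the block sizes stand in for what `BBL_NumIns` would report.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct SwitchPoint {
    std::size_t block;       // index of the block that triggers the switch
    std::uint64_t executed;  // coarse instruction count at that moment
};

// Walks a sequence of executed basic blocks, adding each block's size to
// the running count, and reports the first block at which the count comes
// within fineGrainTrigger instructions of faultInst.
// Returns {blockSizes.size(), total} if the threshold is never reached.
SwitchPoint findSwitch(const std::vector<std::uint32_t>& blockSizes,
                       std::uint64_t faultInst,
                       std::uint64_t fineGrainTrigger) {
    std::uint64_t cur = 0;
    for (std::size_t i = 0; i < blockSizes.size(); ++i) {
        cur += blockSizes[i];
        if (cur + fineGrainTrigger >= faultInst) {
            return SwitchPoint{i, cur};
        }
    }
    return SwitchPoint{blockSizes.size(), cur};
}
```

In this model, the coarse count stays below the target at the switch as long as `fineGrainTrigger` is at least the size of the largest block and the target lies more than `fineGrainTrigger` instructions into the run.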


Instruction instrumentation:

    VOID Instruction(INS ins, VOID *v)
    {
        if (fineGrainCount == true)
        {
            if (faultDone == 0)
            {
                // Count every instruction; once the target is reached, call
                // InsertFault with the full register context.
                INS_InsertIfCall(ins, IPOINT_BEFORE, (AFUNPTR)FindFaultInst, IARG_END);
                INS_InsertThenCall(ins, IPOINT_BEFORE, (AFUNPTR)InsertFault, IARG_CONTEXT,
                                   IARG_END);
            }
            if (faultDone == 1)
            { ….

Instruction analysis:

    // Returns nonzero when the current instruction is the fault target.
    INT32 FindFaultInst()
    {
        curDynInst++;
        return (curDynInst >= faultInst);
    }


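Fault insertion analysis routine:

    VOID InsertFault(CONTEXT *_ctxt)
    {
        // Seeding with the dynamic instruction count makes the choice of
        // register and bit repeatable for a given injection point.
        srand(curDynInst);
        GetFaultyBit(_ctxt, &faultReg, &faultBit);

        // 32-bit values match the IA-32 registers used here.
        UINT32 old_val = PIN_GetContextReg(_ctxt, faultReg);
        faultMask = (1u << faultBit);           // unsigned shift: safe for bit 31
        UINT32 new_val = old_val ^ faultMask;   // flip exactly one bit
        PIN_SetContextReg(_ctxt, faultReg, new_val);

        // Drop the fault-search instrumentation, mark the fault as done, and
        // resume from the modified context. PIN_ExecuteAt does not return.
        PIN_RemoveInstrumentation();
        faultDone = 1;
        PIN_ExecuteAt(_ctxt);
    }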
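The core of the injection is a single XOR with a one-bit mask. A standalone sketch, with `flipBit` as a hypothetical helper name and the register modeled as a plain 32-bit integer:

```cpp
#include <cstdint>

// Flips one bit of a 32-bit register value. The bit index is reduced
// modulo 32 so the shift is always well defined.
std::uint32_t flipBit(std::uint32_t value, unsigned bit) {
    const std::uint32_t mask = std::uint32_t{1} << (bit & 31u);
    return value ^ mask;
}
```

For instance, `flipBit(0xbffeca90, 24)` yields `0xbefeca90`, and applying the same flip a second time restores the original value.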


Post-fault instruction instrumentation:

    VOID Instruction(INS ins, VOID *v)
    {
        if (fineGrainCount == true)
        {
            if (faultDone == 0)
            { …. }
            if (faultDone == 1)
            {
                // Count the instruction on its fall-through path. IPOINT_AFTER
                // is only valid for instructions that can fall through.
                if (INS_HasFallThrough(ins))
                {
                    INS_InsertCall(ins, IPOINT_AFTER, (AFUNPTR)PrintHeartbeat,
                                   IARG_END);
                }
                // Count branches and calls on the taken edge as well.
                if (INS_IsBranchOrCall(ins))
                {
                    INS_InsertCall(ins, IPOINT_TAKEN_BRANCH, (AFUNPTR)PrintHeartbeat,
                                   IARG_END);
                }
            }
        }
    }


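Post-fault analysis:

    VOID PrintHeartbeat()
    {
        postFaultInsts++;
        // dumpMask holds a single set bit, so a heartbeat is logged each time
        // the post-fault instruction count reaches the next power of two.
        if (postFaultInsts & dumpMask)
        {
            out_file << "H: " << dec << dumpMask << endl;
            out_file.flush();   // flush so the log survives a later crash
            dumpMask = dumpMask << 1;
        }
        // After the last heartbeat, detach and let the program run natively.
        if (dumpMask > maxHB)
        {
            PIN_Detach();
        }
    }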
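The heartbeat schedule is easy to check in isolation. This sketch replays the same mask logic over a given number of post-fault instructions; `heartbeats` is a hypothetical helper, and it assumes `dumpMask` starts at 1.

```cpp
#include <cstdint>
#include <vector>

// Replays the heartbeat logic for instsAfterFault instructions and returns
// the values that would be logged. Stops early once the mask exceeds maxHB,
// which is where the tool would detach.
std::vector<std::uint64_t> heartbeats(std::uint64_t instsAfterFault,
                                      std::uint64_t maxHB) {
    std::vector<std::uint64_t> logged;
    std::uint64_t dumpMask = 1;  // assumed initial value
    for (std::uint64_t n = 1; n <= instsAfterFault; ++n) {
        if (n & dumpMask) {
            logged.push_back(dumpMask);
            dumpMask <<= 1;
        }
        if (dumpMask > maxHB) break;  // detach point
    }
    return logged;
}
```

A run that crashes 38 instructions after the fault logs 1, 2, 4, 8, 16, 32, so the last heartbeat brackets the crash point to within a factor of two at a logarithmic logging cost.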


OUTPUT

Fault Masked:

    IP: 8192fcf COUNT: 937440391 REG: esi FBIT: 24 MASK: 1000000 OLD: bffeca90 NEW: befeca90
    H: 1
    H: 2
    H: 4
    H: 8
    ...
    H: 8388608

Program Failure:

    IP: 80babc0 COUNT: 92958481 REG: ebp FBIT: 20 MASK: 100000 OLD: 0 NEW: 100000
    H: 1
    H: 2
    H: 4
    H: 8
    H: 16
    H: 32
    Signal: 11 PostFaultInsts: 38


Sample Results

[Figure: results chart, not preserved in this transcript.]

Step 5: Determining Outcome, Extreme Edition

  • In the Inject Fault step (Step 3):

    • Fork a process and inject the fault into one of the two processes (the parent)

    • Communicate information between the processes (mkfifo)

  • After fault injection, keep track of all writes to memory

  • At each checkpoint, compare architectural state and stores

  • What if there’s a control deviation?

    • For every control operation, compare the next IP between processes

    • If the control flow deviates, wait until both processes return from the function where the deviation occurred before checking state.
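The fork-and-compare idea can be illustrated on a toy workload. This sketch makes several assumptions: it uses an anonymous `pipe()` rather than `mkfifo`, it compares one final value rather than checkpointed state and stores, and `runWorkload` and `faultChangesResult` are hypothetical names. As on the slide, the parent carries the fault and the child is the fault-free reference.

```cpp
#include <cstdint>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Toy stand-in for the application: only the low 16 bits of the register
// affect the result, so faults in the upper bits are masked.
static std::uint32_t runWorkload(std::uint32_t reg) {
    return (reg & 0xFFFFu) * 2654435761u;
}

// Forks, runs the fault-free workload in the child and the faulty one in
// the parent, and compares results. faultBit must be below 32.
// Returns 1 if the results differ, 0 if they match, -1 on error.
int faultChangesResult(std::uint32_t reg, unsigned faultBit) {
    int fds[2];
    if (pipe(fds) != 0) return -1;

    const pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        // Child: fault-free run, reports its result to the parent.
        close(fds[0]);
        const std::uint32_t golden = runWorkload(reg);
        const ssize_t n = write(fds[1], &golden, sizeof golden);
        close(fds[1]);
        _exit(n == static_cast<ssize_t>(sizeof golden) ? 0 : 1);
    }

    // Parent: inject the fault, run, then read the child's result.
    close(fds[1]);
    const std::uint32_t faulty = runWorkload(reg ^ (std::uint32_t{1} << faultBit));
    std::uint32_t golden = 0;
    const ssize_t n = read(fds[0], &golden, sizeof golden);
    close(fds[0]);
    int status = 0;
    waitpid(pid, &status, 0);
    if (n != static_cast<ssize_t>(sizeof golden)) return -1;
    return faulty != golden ? 1 : 0;
}
```

With this workload, a flip in bit 3 changes the result while a flip in bit 20 does not, which is the masked-versus-propagated distinction the checker is meant to expose.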


Step 5: Extreme Edition

  • Adding this fork-and-compare feature takes time, but it can be done.

  • What does it buy?

    • Does the fault propagate?

    • How far does it propagate?

    • How many registers, bytes of memory does it impact?

    • What happens when there is a control deviation?

    • Is there a higher incidence of program failure or IO error in the presence of a control deviation?


Pin Based Fault Checker

[Flowchart: same nodes and phases (Pre-Fault, Fault, Post Fault) as the Fault Insertion State Diagram, with an additional "No Change" edge label.]


Fault Checker: Fault Insertion

[Flowchart. Legend: Parent, Child, Both. Nodes: Fault Insertion; Fork Process & Setup Communication Links; Parent Process?; Insert Fault; Restart Using Context; Parent Process?; Cleanup Required?; Cleanup Fault; Post Fault. Decision edges are labeled Yes/No.]


Fault Checker: Post Fault

[Flowchart. Legend: Parent, Child, Both. Nodes: Post Fault; Get Next Inst & Count Insts; Store OP?; Old Data != New Data?; Save Data; Ctrl OP?; Parent IP != Child IP?; Ctrl Deviation; CheckPoint?; Checkpoint Comparison; Parent?; Read Info From Child & Compare state; Parent State == Child State; Send Continue Signal to Child; Send Done Signal to Child & Detach; Communicate Reg & Store Data to Parent; Done Or Cont?; Done?; Detach & Exit. Decision edges are labeled Yes/No.]


Fault Checker: Ctrl Deviation

[Flowchart. Legend: Parent, Child, Both. Nodes: Ctrl Deviation; Call Counter = 0; Get Next Inst; Store OP?; Old Data != New Data?; Save Data; Function Call; Call Counter ++; Function Return?; Call Counter --; Call Counter < 0?; Checkpoint Comparison. Decision edges are labeled Yes/No.]
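The call-counter bookkeeping in this flow can be sketched as a scan over the instruction stream that follows a control deviation. The `Event` classification and the `findReconvergence` name are hypothetical; the sketch assumes the counter starts at zero and that a negative value means the process has returned from the function in which the deviation occurred.

```cpp
#include <cstddef>
#include <vector>

enum class Event { Other, Call, Return };

// Scans the events that follow a control deviation. Nested calls raise the
// counter and their returns lower it; the first return that drives the
// counter below zero leaves the function where the deviation happened.
// Returns that event's index, or events.size() if no such return occurs.
std::size_t findReconvergence(const std::vector<Event>& events) {
    int callCounter = 0;
    for (std::size_t i = 0; i < events.size(); ++i) {
        if (events[i] == Event::Call) {
            ++callCounter;
        } else if (events[i] == Event::Return) {
            --callCounter;
            if (callCounter < 0) return i;  // checkpoint comparison happens here
        }
    }
    return events.size();
}
```

Each process would run this scan on its own instruction stream, and state is compared only after both have reached their reconvergence point.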


Fault Checker: Additional Info

  • Cannot check faults beyond a system call

    • Kill child process and detach parent process from Pin

    • Run parent/faulty process to completion

  • Although not shown in the flow chart, the Pin tool detaches after reaching a maximum number of checkpoints

  • Providing tighter bounds on control deviation:

    • It may take a long time to return from the function call

    • On a control deviation

      • For both parent and child processes, save each store address and data

      • For the parent process, tag the store with the number of instructions executed since control deviation occurred.

    • After control merges and if architectural state is the same between the two processes, walk the list of stores from oldest to youngest and determine where the two processes matched.


Conclusion

  • Fault insertion using Pin is a great way to determine the impact that faults have on an application

    • Easy to use

    • Enables full program analysis

    • Accurately describes fault behavior once the fault has reached architectural state

