
Automatic Data Structure Repair for Self-Healing Systems

Brian Demsky

Martin Rinard

Laboratory for Computer Science

Massachusetts Institute of Technology

Motivation

Broken Data Structure

Errors

  • Missing elements
  • Inappropriate sharing
  • Dangling references
  • Out of bounds array indices
  • Inconsistent values

[Figure: example of a broken data structure; its nodes are labeled with field values F, G, I, and J]

Goal

[Figure: Broken Data Structure + Consistency Properties From Developer → Repair Algorithm → Consistent Data Structure]

What Does Repair Algorithm Produce?
  • Data structure that
    • Satisfies consistency properties, and
    • Is heuristically close to the broken data structure
  • Not necessarily the same data structure as (hypothetical) correct program would produce
  • But enough to keep program operating successfully
Precursors
  • Data structure repair has historically appeared in systems with extreme reliability goals
    • 5ESS switch – hand coded audit routines
    • IBM MVS operating system – hand coded failure recovery routines
  • Key component of these systems
Where Is This Likely To Be Useful?
  • Not for systems with slack - can just reboot
    • Cause of error must go away after reboot
    • Must be OK to lose volatile state
    • Must be OK to wait for reboot
  • Persistent data structures (file systems, application files)
  • Autonomous and/or safety critical systems
    • Monitor/control unstable physical phenomena
    • Largely independent subcomputations
    • Moving time window
Architecture

[Figure: repair pipeline. Broken Bits → Model Definition & Translation → Broken Abstract Model → repair against Internal Consistency Properties → Repaired Abstract Model → External Consistency Properties → Repaired Bits]

Architecture Rationale

Why go through the abstract model?

  • Simple, uniform structure
    • Sets of objects
    • Relations between objects
  • Simplifies both
    • Expression of consistency properties
    • Repair algorithm
  • Enables system to support full range of efficient, heavily encoded data structures
File System Example

struct Entry {
  byte name[Length];
  int firstBlock;
}

struct Block {
  int nextBlock;
  byte data[BlockSize];
}

struct Disk {
  Entry dir[NumEntries];
  Block block[NumBlocks];
}

Disk D;

[Figure: Directory Entries: abst (firstBlock = 0), intro (firstBlock = 2). Disk Blocks 0 through 3 with nextBlock values 1, -5, 1, -1]

Model Definition
  • Sets of objects

set blocks of integer : partition used | free;

  • Relations between objects – values of object fields, referencing relationships between objects

relation next : used, used;

[Figure: the set blocks partitioned into used and free, with the relation next drawn on used]
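As a rough illustration (not part of the original slides), the two declarations above can be mimicked in Python with plain sets and sets of pairs. The helper names `is_partition` and `relation_well_typed` are invented for this sketch, and the sample values are the ones from the file system example.

```python
def is_partition(whole, *parts):
    """True if the parts are pairwise disjoint and together cover `whole`."""
    seen = set()
    for part in parts:
        if seen & part:          # overlaps an earlier part
            return False
        seen |= part
    return seen == whole


def relation_well_typed(relation, domain, codomain):
    """True if every pair <a, b> has a in `domain` and b in `codomain`."""
    return all(a in domain and b in codomain for (a, b) in relation)


# set blocks of integer : partition used | free;
blocks = {0, 1, 2, 3}
used, free = {0, 1, 2}, {3}

# relation next : used, used;
next_rel = {(0, 1), (2, 1)}

print(is_partition(blocks, used, free))            # True
print(relation_well_typed(next_rel, used, used))   # True
```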

Model Translation

Bits translated to sets and relations in abstract model using statements of the form:

Quantifiers, Condition ⇒ Inclusion Constraint

for i in 0..NumEntries, 0 ≤ D.dir[i].firstBlock and D.dir[i].firstBlock < NumBlocks ⇒ D.dir[i].firstBlock in used

for b in used, 0 ≤ D.block[b].nextBlock and D.block[b].nextBlock < NumBlocks ⇒ <b, D.block[b].nextBlock> in next

for <b,n> in next, true ⇒ n in used

for b in 0..NumBlocks, not (b in used) ⇒ b in free
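The four translation rules can be read as a small fixpoint computation. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name `build_model` and the list-based encoding of the disk are hypothetical, and the input values are the ones from the file system example (entries abst and intro, nextBlock values 1, -5, 1, -1).

```python
def build_model(first_blocks, next_block):
    """Translate the concrete disk into the sets used/free and the relation next."""
    num_blocks = len(next_block)
    used, nxt = set(), set()

    # Rule 1: every in-range firstBlock is in used.
    for first in first_blocks:
        if 0 <= first < num_blocks:
            used.add(first)

    # Rules 2 and 3: follow in-range nextBlock fields until nothing new appears.
    worklist = list(used)
    while worklist:
        b = worklist.pop()
        n = next_block[b]
        if 0 <= n < num_blocks:
            nxt.add((b, n))          # <b, nextBlock> in next
            if n not in used:        # target of a next pair is in used
                used.add(n)
                worklist.append(n)

    # Rule 4: evaluated last, because it negates membership in used.
    free = {b for b in range(num_blocks) if b not in used}
    return used, nxt, free


used, nxt, free = build_model([0, 2], [1, -5, 1, -1])
print(sorted(used))   # [0, 1, 2]
print(sorted(nxt))    # [(0, 1), (2, 1)]
print(sorted(free))   # [3]
```

Block 1 holds the out-of-range value -5, so it contributes no pair to next; it still lands in used because blocks 0 and 2 both point to it.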

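Model in Example

[Figure: Directory Entries: abst (firstBlock = 0), intro (firstBlock = 2). Disk Blocks 0 through 3 with nextBlock values 1, -5, 1, -1. Resulting abstract model: used = {0, 1, 2}, free = {3}, next = {<0,1>, <2,1>}]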

Internal Consistency Properties

Quantifiers, Body

  • Body is first-order property of basic propositions
  • Inequality constraints on values of numeric fields
    • V.R = E, V.R < E, V.R ≤ E, V.R ≥ E, V.R > E
  • Presence of required number of objects
    • size(S) = C, size(S) ≤ C, size(S) ≥ C
  • Topology of region surrounding each object
    • size(V.R) = C, size(V.R) ≤ C, size(V.R) ≥ C
    • size(R.V) = C, size(R.V) ≤ C, size(R.V) ≥ C
  • Inclusion constraints: V in S, V1 in V2.R, <V1,V2> in R
  • Example: for b in used, size(next.b) ≤ 1
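To make the example property concrete, here is a small sketch that evaluates "for b in used, size(next.b) ≤ 1" over a model held as Python sets. The names `inverse_image` and `violations` are assumptions of this sketch; next.b is read as the set of blocks whose next pair points at b.

```python
def inverse_image(relation, target):
    """next.b: all sources a such that <a, target> is in the relation."""
    return {src for (src, dst) in relation if dst == target}


def violations(used, nxt, limit=1):
    """Bindings of b in used for which size(next.b) <= limit is false."""
    return sorted(b for b in used if len(inverse_image(nxt, b)) > limit)


used = {0, 1, 2}
nxt = {(0, 1), (2, 1)}
print(violations(used, nxt))   # [1]
```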
Internal Consistency Violations

Evaluate consistency properties, find violations

for b in used, size(next.b) ≤ 1 is false for b = 1

[Figure: abstract model with used = {0, 1, 2}, free = {3}, next = {<0,1>, <2,1>}; block 1 has two incoming next pairs]

Repairing Violations of Internal Consistency Properties
  • Violation provides binding for quantified variables
  • Convert Body to disjunctive normal form

(p1 ∧ … ∧ pn) ∨ … ∨ (q1 ∧ … ∧ qm)

p1 … pn, q1 … qm are basic propositions

  • Choose a conjunction to satisfy
  • Repair violated basic propositions in conjunction
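The steps above can be sketched as follows. The slides do not say how a conjunction is chosen, so picking the one with the fewest violated propositions is purely an assumption of this sketch, as are the function name `choose_conjunction` and the toy propositions.

```python
def choose_conjunction(dnf, model):
    """dnf is a list of conjunctions; each conjunction is a list of
    predicates taking the model. Returns the chosen conjunction and the
    basic propositions in it that still need repair."""
    def violated(conj):
        return [p for p in conj if not p(model)]

    best = min(dnf, key=lambda conj: len(violated(conj)))
    return best, violated(best)


# Toy basic propositions over a model with two numeric fields.
def x_positive(m): return m["x"] > 0
def y_positive(m): return m["y"] > 0
def x_negative(m): return m["x"] < 0

dnf = [[x_positive, y_positive], [x_negative, y_positive]]
model = {"x": 5, "y": 0}

best, to_repair = choose_conjunction(dnf, model)
print([p.__name__ for p in best])        # ['x_positive', 'y_positive']
print([p.__name__ for p in to_repair])   # ['y_positive']
```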
Repairing Violations of Basic Propositions
  • Inequality constraints on values of numeric fields
    • V.R = E, V.R < E, V.R ≤ E, V.R ≥ E, V.R > E
    • Compute value of expression, assign field
  • Presence of required number of objects
    • size(S) = C, size(S) ≤ C, size(S) ≥ C
    • Remove or insert objects from/to set
  • Topology of region surrounding each object
    • size(V.R) = C, size(V.R) ≤ C, size(V.R) ≥ C
    • size(R.V) = C, size(R.V) ≤ C, size(R.V) ≥ C
    • Remove or insert pairs from/to relation
  • Inclusion constraints: V in S, V1 in V2.R, <V1,V2> in R
    • Remove or add the object or pair from/to set or relation
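Two of these repair actions are sketched below for illustration: assigning a field to satisfy V.R ≤ E, and inserting objects to satisfy size(S) ≥ C. The function names, the dictionary encoding of an object, and the `fresh` supply of new objects are all assumptions of this sketch.

```python
def repair_field_at_most(obj, field, bound):
    """V.R <= E: if violated, assign the field the computed bound."""
    if obj[field] > bound:
        obj[field] = bound


def repair_set_size_at_least(s, c, fresh):
    """size(S) >= C: insert objects drawn from `fresh` until S is large enough.
    Returns False if the supply runs out first (a resource limitation)."""
    for candidate in fresh:
        if len(s) >= c:
            break
        s.add(candidate)
    return len(s) >= c


record = {"count": 9}
repair_field_at_most(record, "count", 4)
print(record["count"])                    # 4

members = {1}
ok = repair_set_size_at_least(members, 3, iter([1, 2, 3, 4]))
print(ok, sorted(members))                # True [1, 2, 3]
```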

Repair in Example

for b in used, size(next.b) ≤ 1 is false for b = 1

Must repair size(next.1) ≤ 1

Can remove either <0,1> or <2,1> from next

[Figure: abstract model before and after repair. Before: used = {0, 1, 2}, free = {3}, next = {<0,1>, <2,1>}. After: one of the two pairs has been removed from next]
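A minimal sketch of this repair step, assuming the model is held as a Python set of pairs. Keeping the lowest-numbered predecessor is an arbitrary tie-break chosen for the sketch (either pair may be removed), and the name `repair_max_predecessors` is hypothetical.

```python
def repair_max_predecessors(nxt, b, limit=1):
    """Enforce size(next.b) <= limit by removing surplus pairs <a, b>."""
    preds = sorted(src for (src, dst) in nxt if dst == b)
    keep = set(preds[:limit])
    return {(src, dst) for (src, dst) in nxt if dst != b or src in keep}


nxt = {(0, 1), (2, 1)}
print(sorted(repair_max_predecessors(nxt, 1)))   # [(0, 1)]
```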

Acyclic Repair Dependences
  • Questions
    • Isn’t it possible for the repair of one constraint to invalidate another constraint?
    • What about infinite repair loops?
    • What about unsatisfiable specifications?
  • Answer
    • We require specifications to have no cyclic repair dependences between constraints
    • So all repair sequences terminate
    • Repair can fail only because of resource limitations
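One way to picture the acyclicity requirement is as a check on a directed graph whose edges say "repairing this constraint may violate that one". The sketch below is an illustration only: the slides do not describe how the dependences are computed, and the graph encoding and the name `has_cyclic_dependence` are assumptions.

```python
def has_cyclic_dependence(deps):
    """deps maps each constraint to the constraints its repair may violate.
    Returns True if the dependence graph contains a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {}

    def visit(node):
        color[node] = GREY                      # on the current DFS path
        for succ in deps.get(node, ()):
            state = color.get(succ, WHITE)
            if state == GREY:                   # back edge: cycle found
                return True
            if state == WHITE and visit(succ):
                return True
        color[node] = BLACK                     # fully explored
        return False

    return any(color.get(n, WHITE) == WHITE and visit(n) for n in deps)


print(has_cyclic_dependence({"c1": ["c2"], "c2": []}))       # False
print(has_cyclic_dependence({"c1": ["c2"], "c2": ["c1"]}))   # True
```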
External Consistency Constraints

Quantifiers, Condition ⇒ Body

  • Body of form V = E, V.F = E, V.F[I] = E
  • Example

for b in free, true ⇒ D.block[b].nextBlock = -2

for <i,j> in next, true ⇒ D.block[i].nextBlock = j

for b in used, size(b.next) = 0 ⇒ D.block[b].nextBlock = -1

  • Repair simply performs assignments
  • Translates model repairs to bit repairs
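The three example constraints amount to three loops of assignments. The sketch below applies them to the file system example, assuming the repaired model is used = {0, 1, 2}, free = {3}, next = {<0,1>}; the function name and the list encoding of the nextBlock fields are hypothetical.

```python
def apply_external_constraints(next_block, used, free, nxt):
    """Write the repaired model back into the nextBlock fields."""
    repaired = list(next_block)

    for b in free:                       # free blocks are marked -2
        repaired[b] = -2
    for (i, j) in nxt:                   # each next pair sets a link
        repaired[i] = j
    has_successor = {i for (i, _) in nxt}
    for b in used:                       # used blocks with no successor end a chain
        if b not in has_successor:
            repaired[b] = -1
    return repaired


broken = [1, -5, 1, -1]
print(apply_external_constraints(broken, {0, 1, 2}, {3}, {(0, 1)}))
# [1, -1, -1, -2]
```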
Repair in Example

[Figure: Inconsistent File System: Directory Entries abst (firstBlock = 0), intro (firstBlock = 2); Disk Blocks 0 through 3 with nextBlock values 1, -5, 1, -1. Repaired File System: same Directory Entries; nextBlock values 1, -1, -1, -2]

When to Test for Consistency and Repair
  • Persistent data structures
    • Repair can be independent activity, or
    • Repair when data written out or read in
  • Volatile data structures in running program
    • Under programmer control
    • Transaction-based approach
      • Identify transaction start and end
      • Repair at start, end, or both
    • Failure-based approach
      • Wait until program fails
      • Repair and restart from latest safe point
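The failure-based approach can be sketched as a retry loop. This is a deliberately simplified illustration: the "safe point" is just the start of the step, the airport codes and the clamping repair are invented for the example, and a real system would need a proper checkpoint mechanism.

```python
def run_with_repair(step, state, repair, max_attempts=3):
    """Run step(state); if it fails, repair the state and retry the step."""
    for _ in range(max_attempts):
        try:
            return step(state)
        except (IndexError, KeyError):
            repair(state)
    raise RuntimeError("repair did not restore a usable state")


airports = ["DFW", "LAX", "DEN"]          # hypothetical table
flight_plan = {"origin": 7}               # corrupted, out-of-bounds index


def lookup_origin(plan):
    return airports[plan["origin"]]


def clamp_origin(plan):
    """Force the index back into the valid range 0..len(airports)-1."""
    plan["origin"] = min(max(plan["origin"], 0), len(airports) - 1)


print(run_with_repair(lookup_origin, flight_plan, clamp_origin))   # DEN
```

The repaired plan names a different airport than the original data intended, but the computation continues instead of crashing.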
Experience
  • We acquired four benchmarks (written in C/C++)
    • CTAS (air-traffic control tool)
    • Simplified Linux file system
    • Freeciv interactive game
    • Microsoft Word files
  • We developed specifications for all four
    • Very little development time (days, not weeks)
    • Most of time spent figuring out Freeciv and CTAS
  • Each benchmark has
    • Workload
    • Fault insertion methodology
  • Ran benchmarks with and without repair
CTAS
  • Set of air-traffic control tools
    • Traffic management
    • Arrival planning
    • Flow visualization
    • Shortcut planning
  • Deployed in centers around country (Dallas/Ft. Worth, Los Angeles, Denver, Miami, Minneapolis/St. Paul, Atlanta, Oakland)
  • Approximately 1 million lines of C/C++ code
Results
  • Workload – recorded radar feed from DFW
  • Fault insertion
    • Simulate error in flight plan processing
    • Bad airport index in flight plan data structure
  • Without repair
    • System crashes – segmentation fault
  • With repair
    • Aircraft has different origin or destination
    • System continues to execute
    • Anomaly eventually flushed from system
Aspects of CTAS
  • Lots of independent subcomputations
    • System processes hundreds of aircraft – problem with one should not affect others
    • Multipurpose system (visualization, arrival planning, shortcuts, …) – problem in one purpose should not affect others
  • Sliding time window: anomalies eventually flushed
  • Rebooting ineffective – system will crash again as soon as it sees the problematic flight plan
Simplified Linux File System

[Figure: disk layout: super block, group block, inode bitmap block, block bitmap block, inode block (holding the inodes), directory block, and the remaining disk blocks]

Some Consistency Properties

  • inode bitmap consistent with inode usage
  • block bitmap consistent with block usage
  • directory entries refer to valid inodes
  • files contain valid blocks only
  • files do not share blocks
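Two of these properties are easy to picture in code. The sketch below recomputes a block bitmap from actual block usage and reports blocks claimed by more than one file. The dictionary encoding of files and both function names are assumptions of this sketch, not the benchmark's real layout.

```python
def rebuild_block_bitmap(num_blocks, files):
    """Block bitmap consistent with block usage: bit b is set exactly
    when some file refers to the valid block b."""
    in_use = {b for blocks in files.values() for b in blocks
              if 0 <= b < num_blocks}
    return [b in in_use for b in range(num_blocks)]


def shared_blocks(files):
    """Blocks that appear in more than one file."""
    owner, shared = {}, set()
    for name, blocks in files.items():
        for b in blocks:
            if b in owner and owner[b] != name:
                shared.add(b)
            owner.setdefault(b, name)
    return shared


files = {"a": [0, 1], "b": [1, 2]}
print(rebuild_block_bitmap(4, files))   # [True, True, True, False]
print(shared_blocks(files))             # {1}
```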
Results
  • Workload – write and verify several files
  • Fault insertion – crash file system
    • Inode and block bitmap errors
    • Partially initialized directory and inode entries
  • Without repair
    • Incorrect file contents because of inode and disk block sharing
  • With repair
    • Bitmaps repaired preventing illegal sharing, correct file contents
Freeciv

[Figure: Terrain Grid (O = Ocean, P = Plain, M = Mountain) with rows
O P M M
O O P M
O P M M
P P P M
and two City Structures with loc: 3,0 and loc: 2,3]

Consistency Properties

  • Tiles have valid terrain values
  • Cities are not in the ocean
  • Each city has exactly one reference from city location grid
  • City locations are consistent in
    • City structures and
    • tile grid
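As an illustration of the first property, the sketch below replaces invalid terrain values with a valid one. The single-letter encoding follows the figure legend; the choice of Plain as the replacement value and the function name `repair_terrain` are assumptions of this sketch.

```python
VALID_TERRAIN = {"O", "P", "M"}   # Ocean, Plain, Mountain


def repair_terrain(grid, default="P"):
    """Return a copy of the grid in which every invalid tile is set to `default`."""
    return [[t if t in VALID_TERRAIN else default for t in row] for row in grid]


corrupted = [["O", "?"],
             ["M", "P"]]
print(repair_terrain(corrupted))   # [['O', 'P'], ['M', 'P']]
```

The repaired grid is valid but not necessarily the original one, so the game can continue while playing out differently.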

Results
  • Workload – Freeciv software plays against itself
  • Fault insertion – randomly corrupt terrain values
  • Without repair – program fails (seg fault)
  • With repair
    • Game runs just fine
    • But game plays out differently because of the different terrain values
Microsoft Word Files
  • Files consist of a sequence of streams
  • Streams stored using FAT-based data structure
  • Consistency Properties
    • FAT blocks exist and contain valid entries
    • FAT streams are properly terminated
    • Free blocks properly marked
    • Streams contain valid blocks
    • No sharing of blocks between streams

[Figure: Directory Entries (abst, intro) pointing into a FAT whose entries chain the Disk Blocks]
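The termination and validity properties can be checked by walking a stream's chain through the FAT. In this sketch the FAT is a plain list, and the markers -1 (end of chain) and -2 (free) are borrowed from the earlier file system example rather than taken from the actual Word file format; `walk_stream` is a hypothetical name.

```python
END_OF_CHAIN = -1   # assumed marker, as in the file system example


def walk_stream(fat, start):
    """Follow a stream from `start`. Returns the list of blocks, or None if
    the chain leaves the FAT, hits a free block, or loops."""
    blocks = []
    b = start
    while b != END_OF_CHAIN:
        if not (0 <= b < len(fat)) or b in blocks:
            return None
        blocks.append(b)
        b = fat[b]
    return blocks


print(walk_stream([1, 2, -1, -2], 0))   # [0, 1, 2]
print(walk_stream([1, 0], 0))           # None
```

Running the walk for every stream and intersecting the resulting block lists would also expose sharing between streams.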

Results
  • Workload – several Microsoft Word files
  • Fault insertion – scramble FAT
  • Without repair
    • If blocks containing the FAT were incorrectly marked as free, Word successfully loads file
    • Otherwise, “The document name or path is not valid”
  • With repair
    • Word loads all files
Extensions
  • Elimination of external consistency constraints
    • Eliminates problems with translating repairs on the abstract model to the actual data structure
    • Repair algorithm analyzes model definition rules to generate repair actions for the actual data structure
  • Support for doubly linked data structures
    • Enables the repair algorithm to regenerate back links
  • Compilation and optimization of consistency checking
    • Achieved significant speedups (n x) by compiling the specification
    • Achieved further speedups () by partially optimizing away the construction of the abstract model
Related Work
  • Hand-coded repair
    • Lucent 5ESS switch
    • IBM MVS operating system
  • Self-stabilizing algorithms
  • Log-based recovery for database systems
  • Recovery-oriented computing
    • Recursive restartability
    • Undo framework
Conclusion
  • Data structure repair is an interesting way to (potentially) improve reliability
  • Specification-based approach promises to make technique more widely applicable
  • Moving towards more robust, probabilistic, continuous concept of system behavior