Rewind, Repair, Replay: Three R’s to improve dependability

Aaron Brown and David Patterson, ROC Research Group, University of California at Berkeley. SIGOPS European Workshop, 23 September 2002.

Presentation Transcript


  1. Rewind, Repair, Replay: Three R’s to improve dependability • Aaron Brown and David Patterson, ROC Research Group, University of California at Berkeley • SIGOPS European Workshop, 23 September 2002

  2. What if computer systems could travel in time? • We could have retroactive repair • travel back and fix problems before they had a chance to corrupt data • We could eliminate human operator error • make a mistake? Just travel back and try it again. • Our systems could be more robust • we could eliminate the dangers of upgrades • we could better tolerate buggy software • we might even be able to tolerate viruses and hackers • We could make more dependable systems

  3. Sci-fi vs. computer time travel • Sci-fi time travel: (1) our hero loses a loved one or lives through disaster; (2) hero uses time machine to travel back in time; (3) hero alters the past to avert the future disaster; (4) hero returns to the present; past changes have been merged into the original timeline • Computer time travel: (1) human error, software bug, or attack causes data loss; (2) Rewind: roll system state backwards in time; (3) Repair: make changes to avert foretold disaster; (4) Replay: roll system state forward, merging the original timeline with the effects of repairs • Three R’s are the fundamental primitives of computer time travel

  4. Key properties of the 3R’s • Recovery from problems at any system layer • rewind, repair, replay cover OS through application • Recovery from unanticipated problems • arbitrary repair • No assumptions about correct application behavior • physical rewind • Integrated interface • provide “undo for sysadmins”

  5. What about existing approaches?

  6. Designing a 3R system • Goals • application-neutrality • provide abstractions for reasoning about 3R behavior • Target domain: network services • accessed by remote users via well-defined interfaces • email, messaging, e-commerce, auctions, forums, web hosting, enterprise applications (J2EE, .NET), ... • Challenges, learned from first attempt • integrating history and repair during replay • managing inconsistency in externally-visible state

  7. Basic architecture • Application-independent undo manager • coordinates 3R cycle; manages external inconsistencies • linked via a set of APIs to application, time-travel storage, history log, and control UI • [Diagram: control UI, application service (includes user state, application, operating system), undo manager, time-travel storage layer, and history log; connections labeled "3R API" and "control"]

  8. Abstracting the application service • To the undo manager, the application is: • a collection of state • a history of events affecting the state • an event is typically a user interaction with the service • a model of acceptable external consistency • These are encoded into application-defined verbs • high-level encodings of user interactions (events) • records of intent to alter state, not actual state changes • reference application state by opaque UIDs • provide policies that define external consistency

  9. Verbs and the 3R cycle • Normal operation • undo manager logs application-provided verbs to disk • [Diagram: basic architecture; a user interaction enters the application service, and verbs pass through the undo manager to the history log]

  10. Verbs and the 3R cycle • Rewind • time-travel storage layer reverts system hard state to rewind point • all changes since rewind point are discarded • [Diagram: basic architecture]

  11. Verbs and the 3R cycle • Repair • operator edits logged history and/or makes arbitrary changes to system • [Diagram: basic architecture; the control UI issues "Repairs" to the system and "Edits" to the history log]

  12. Verbs and the 3R cycle • Replay • undo manager feeds verbs back to application for re-execution in the context of repaired system • [Diagram: basic architecture; verbs pass from the history log through the undo manager back to the application service]
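
The four phases on slides 9 to 12 can be illustrated with a toy model. This is a minimal sketch in Python, not the authors' Java prototype: `Verb`, `MailService`, `UndoManager`, and all their methods are hypothetical names, an in-memory dict stands in for the time-travel storage layer, and a Python list stands in for the history log. The "bug" (storing message bodies reversed) is borrowed from the email example later in the deck.

```python
from copy import deepcopy
from dataclasses import dataclass


@dataclass(frozen=True)
class Verb:
    """A logged record of user intent (hypothetical shape), not of a state change."""
    op: str
    args: tuple


class MailService:
    """Toy application service; `buggy` stands in for a software fault."""

    def __init__(self):
        self.buggy = True

    def execute(self, verb, state):
        if verb.op == "deliver":
            folder, body = verb.args
            stored = body[::-1] if self.buggy else body  # the bug reverses bodies
            state.setdefault(folder, []).append(stored)
        elif verb.op == "move":
            src, dst = verb.args
            state.setdefault(dst, []).extend(state.get(src, []))
            state[src] = []
        else:
            raise ValueError(f"unknown verb: {verb.op}")


class UndoManager:
    def __init__(self, service):
        self.service = service
        self.state = {}          # stands in for the time-travel storage layer
        self.log = []            # stands in for the history log
        self.rewind_points = {}  # label -> (state snapshot, log position)

    def checkpoint(self, label):
        self.rewind_points[label] = (deepcopy(self.state), len(self.log))

    def submit(self, verb):
        """Normal operation: log the verb, then let the service execute it."""
        self.log.append(verb)
        self.service.execute(verb, self.state)

    def rewind(self, label):
        """Discard all state changes since the rewind point; return the verbs to replay."""
        snapshot, position = self.rewind_points[label]
        self.state = deepcopy(snapshot)
        pending = self.log[position:]
        del self.log[position:]
        return pending

    def replay(self, verbs):
        """Re-execute logged verbs against the repaired system."""
        for verb in verbs:
            self.submit(verb)


service = MailService()
manager = UndoManager(service)
manager.checkpoint("before-bug")
manager.submit(Verb("deliver", ("Inbox", "Hello")))
manager.submit(Verb("move", ("Inbox", "Folder1")))
corrupted = deepcopy(manager.state)

pending = manager.rewind("before-bug")  # Rewind
service.buggy = False                   # Repair (here: patching the fault)
manager.replay(pending)                 # Replay
repaired = manager.state
```

Because the verbs are re-executed rather than restored, the replayed delivery picks up the repair: `corrupted` holds the reversed body in Folder1, while `repaired` holds the intended one.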

  13. The fundamental roles of verbs • Providing application-independence • verbs encapsulate application semantics, but remain semi-opaque to undo manager • Integration of repair into history • high-level specification of intent makes verbs relatively independent of system changes • verbs are re-executed, not restored, so they inherit effects of repairs • Scoping restored history • only changes logged as verbs will be preserved by 3Rs • effects of bugs, corruption, human error are discarded • can reason about what is preserved/lost in 3R cycle

  14. Managing external inconsistency • External inconsistency == time paradox? • system is internally-consistent after a 3R cycle • but external observers see inexplicable state changes • external inconsistency is OK unless affected state was externalized (observed) before the 3R cycle • Coping with external inconsistency • cannot eliminate • must manage: ignore, explain, compensate, encompass • Verbs let us manage external inconsistency

  15. Managing inconsistency with verbs • To detect inconsistencies: • verbs specify the state that they depend upon • undo manager tracks signatures of that state • if verb is altered or if signatures don’t match, there is an inconsistency • applications supporting relaxed consistency can replace signature-check with arbitrary consistency predicates • To detect state viewed externally: • verbs indicate what state they externalize • example: IMAP fetch verb externalizes email message • To handle externalized inconsistencies: • verb supplies compensation functions

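  16. Email example: original timeline • [Diagram, time running left to right, showing system boundary, system state, verbs, and history log: message m arrives with input “Hello”, is delivered to Inbox as “olleH”, is moved to Folder1, and is fetched across the system boundary as “olleH”] • History log: • DeliverMsg (Externalizes: none; ContentDep: none; ExistsDep: Inbox; + input “Hello”) • MoveMsg (Externalizes: none; ContentDep: none; ExistsDep: Inbox, Folder1) • FetchMsg (Externalizes: m; ContentDep: m; ExistsDep: m, Folder1; + Signature(m)=“olleH”)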

  17. Email example: replay timeline • [Diagram, time running left to right, showing system boundary, system state, verbs, and history log: on replay, message m with input “Hello” is delivered to Inbox and moved to Folder1 as “Hello”, but the logged fetch externalized “olleH”] • History log, unchanged from the original timeline: • DeliverMsg (Externalizes: none; ContentDep: none; ExistsDep: Inbox; + input “Hello”) • MoveMsg (Externalizes: none; ContentDep: none; ExistsDep: Inbox, Folder1) • FetchMsg (Externalizes: m; ContentDep: m; ExistsDep: m, Folder1; + Signature(m)=“olleH”) • At FetchMsg, the logged Signature(m)=“olleH” does not match the replayed m (“Hello”): mismatch! => inconsistency
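
The detection step in this email example can be sketched in a few lines. This is an illustrative Python sketch under stated assumptions, not the prototype's code: the `FetchMsg` class, its methods, and `replay_fetch` are hypothetical, SHA-256 is an arbitrary choice of signature function, and the compensation (prepending explanatory text) is one plausible option.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional


def signature(content: str) -> str:
    """Signature of a piece of state; SHA-256 is an arbitrary choice here."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


@dataclass
class FetchMsg:
    uid: str                                # opaque UID of the message
    logged_signature: Optional[str] = None  # Signature(m) from the original timeline

    def execute(self, store: Dict[str, str]) -> str:
        """Original execution: externalize the message and log its signature."""
        body = store[self.uid]
        self.logged_signature = signature(body)
        return body

    def is_consistent(self, store: Dict[str, str]) -> bool:
        """Replay check: does the state still match what was externalized?"""
        return signature(store[self.uid]) == self.logged_signature

    def compensate(self, store: Dict[str, str]) -> None:
        """Hypothetical compensation: prepend explanatory text to the message."""
        store[self.uid] = (
            "[This message changed during a system repair]\n" + store[self.uid]
        )


def replay_fetch(verb: FetchMsg, store: Dict[str, str]) -> bool:
    """Return True if the verb replayed consistently, compensating otherwise."""
    if verb.is_consistent(store):
        return True
    verb.compensate(store)
    return False


original_store = {"m": "olleH"}  # original timeline: the bug stored "Hello" reversed
fetch = FetchMsg(uid="m")
seen_by_user = fetch.execute(original_store)

replayed_store = {"m": "Hello"}  # replay timeline: the repaired delivery
consistent = replay_fetch(fetch, replayed_store)
```

The user already saw the reversed body, so the signature mismatch on replay is an externalized inconsistency and the compensation runs. Had the fetch never happened, the same change to m would have needed no compensation.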

  18. Recap: 3R architecture • Goal: application-neutral implementation of 3R’s • verb abstraction couples generic undo manager to app. • verbs provide tools to reason about 3R behavior • Challenges • integrating history and repair during replay • re-executing verbs restores intent of history • managing inconsistency in externally-visible state • verbs track externalization, state dependencies, and define compensations

  19. Status • Prototype implementation of 3R primitives nearly complete • app-independent undo manager written in Java • all APIs defined as Java interfaces • Network Appliance filer as time-travel storage layer • BerkeleyDB as history log • First target app: web-based email service • 3R-enhanced JavaMail API provider classes • plus additional hooks to verb-ify operator maintenance tasks like account creation • JWebMail web front-end • RDBMS-based backend mail store (DB2 or MySQL) • implementation in progress

  20. Open issues & future work • Resource impact of the 3R’s • what are the performance/space penalties for the 3R’s? • Verb definition • can we specify verbs & consistency policy declaratively? • Providing the 3R’s at multiple granularities • can we track & manage cross-granularity dependencies? • Measuring the dependability benefit of 3R’s • how do we build recovery/dependability benchmarks? • Other uses for verb-based characterizations • easy georeplication? online self-checking? automatic verification of upgrades?

  21. Conclusions • We can build time travel for computers • using the 3R’s: Rewind, Repair, Replay • An architecture for the 3R primitives • generic undo manager coupled to application by verbs • Verbs are a useful abstraction for the 3R’s • can use to reason about effects of 3R’s on state • help address problem of external inconsistencies • Prototype 3R-enabled email system under construction • hope to demonstrate increased dependability and faster recovery from problems

  22. Rewind, Repair, Replay: Three R’s to improve dependability • For more information: http://roc.cs.berkeley.edu/ • abrown@cs.berkeley.edu

  23. Backup slides

  24. Verbs vs. transactions • Both encapsulate state-altering events • But, unlike transactions: • verbs are higher-level, recording end-user intent, not specific state changes • verbs do not depend on internal data models (but do depend on external protocols) • transactions are the reverse • verbs do not necessarily conform to ACID consistency • verbs inherit consistency model provided by application at the external-protocol level

  25. Implementing verbs • Verbs are defined by a type hierarchy • base type defines interfaces for state dependencies, externalizations, predicates, compensations • applications subclass the base type for their verbs • additions to the type are opaque to the undo manager • Referencing state • all user-visible state named by time-invariant UIDs • undo manager requires signature method for all state • Consistency predicates and compensations are application-supplied functions • they encode the app’s external consistency model
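
One way to picture this type hierarchy is shown below. It is a sketch only, written in Python rather than the Java interfaces the prototype uses; the class and method names (`Verb`, `FetchMessage`, `dependencies`, `externalizes`, `is_consistent`, `compensate`, `check`) are assumptions chosen to mirror the slide's wording, and signatures are modeled as plain strings keyed by UID.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class Verb(ABC):
    """Base type: the only interface the undo manager relies on (names are hypothetical)."""

    @abstractmethod
    def dependencies(self) -> List[str]:
        """UIDs of the state this verb depends on."""

    @abstractmethod
    def externalizes(self) -> List[str]:
        """UIDs of the state this verb exposes outside the system."""

    def is_consistent(self, logged: Dict[str, str], current: Dict[str, str]) -> bool:
        """Default predicate: every dependency has the same signature as when logged."""
        return all(logged.get(uid) == current.get(uid) for uid in self.dependencies())

    def compensate(self) -> str:
        """Default compensation: nothing to explain."""
        return ""


class FetchMessage(Verb):
    """Application-defined verb; its extra fields are opaque to the undo manager."""

    def __init__(self, message_uid: str, folder_uid: str):
        self.message_uid = message_uid
        self.folder_uid = folder_uid

    def dependencies(self) -> List[str]:
        return [self.message_uid, self.folder_uid]

    def externalizes(self) -> List[str]:
        return [self.message_uid]

    def compensate(self) -> str:
        return f"Message {self.message_uid} changed after it was read."


def check(verb: Verb, logged: Dict[str, str], current: Dict[str, str]) -> str:
    """Undo-manager side: uses only the base interface, whatever the subclass."""
    # An inconsistency needs handling only if the verb externalized state.
    if verb.is_consistent(logged, current) or not verb.externalizes():
        return ""
    return verb.compensate()


fetch = FetchMessage("msg-1", "folder-1")
logged = {"msg-1": "sig-a", "folder-1": "sig-f"}
unchanged = check(fetch, logged, dict(logged))
changed = check(fetch, logged, {"msg-1": "sig-b", "folder-1": "sig-f"})
```

An application with a relaxed consistency model would override `is_consistent` in its subclass instead of relying on the default signature comparison.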

  26. Defining verbs • Currently, verbs are defined procedurally • provide dependency information via lists of state IDs • provide functions for special consistency predicates • provide functions for compensation • Better: declarative specification • compile textual specification into verb code using libraries of predicates and compensation fns • reduces complexity of adding 3R’s to the application • increases confidence in undo system via easier testing

  27. External consistency policies • Verbs capture external consistency policies • Example: email • message order in folder is irrelevant • AppendMessage verb does not express dependency on content of target folder, only its existence • content of messages is relevant, except for headers • ReadMsg verb depends on hash of target message body; if changed, compensate by inserting explanatory text • Example: e-commerce • order total depends on item prices, not descriptions • Checkout verb depends on prices of items in cart, not their hash-values; if sum of prices changed, compensate by emailing customer for approval
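
The e-commerce policy can be made concrete with a small sketch. The `Item` and `Checkout` classes, their fields, and the wording of the approval message are hypothetical; the only behavior taken from the slide is that the predicate looks at the sum of prices and ignores descriptions.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict, List


@dataclass
class Item:
    price: Decimal
    description: str


@dataclass
class Checkout:
    cart: List[str]        # UIDs of the items in the cart
    logged_total: Decimal  # order total the customer saw in the original timeline

    def total(self, catalog: Dict[str, Item]) -> Decimal:
        return sum((catalog[uid].price for uid in self.cart), Decimal("0"))

    def is_consistent(self, catalog: Dict[str, Item]) -> bool:
        """Relaxed predicate: only the sum of prices matters, not descriptions."""
        return self.total(catalog) == self.logged_total

    def compensate(self, catalog: Dict[str, Item]) -> str:
        """Hypothetical compensation: text of an approval email to the customer."""
        return (f"Your order total changed from {self.logged_total} "
                f"to {self.total(catalog)}. Please approve the new total.")


verb = Checkout(cart=["book", "pen"], logged_total=Decimal("12.50"))

# A repair that only rewrote a description leaves the verb consistent.
reworded = {"book": Item(Decimal("10.00"), "A revised description"),
            "pen": Item(Decimal("2.50"), "Blue pen")}

# A repair that changed a price makes it inconsistent and triggers compensation.
repriced = {"book": Item(Decimal("11.00"), "A book"),
            "pen": Item(Decimal("2.50"), "Blue pen")}
```

A plain hash over the whole cart would flag both repairs; the application-supplied predicate flags only the one that changes what the customer agreed to pay.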

  28. External consistency policies (2) • Example: auctions • new bid must be larger than prior bids • PlaceBid verb depends on content of all bids in bid set; if one is now larger than new bid, compensate by canceling new bid and informing bidder
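
A sketch of the auction rule on replay follows. `PlaceBid`, its `replay` method, and the notice text are hypothetical, amounts are plain integers (cents), and treating an equal prior bid as a conflict is an assumption read from "must be larger than prior bids".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PlaceBid:
    bidder: str
    amount: int  # bid in cents

    def replay(self, bids: List[int]) -> Optional[str]:
        """Re-execute against the replayed bid set.

        Returns None if the bid is accepted, or a notice for the bidder if it
        had to be canceled because a prior bid is now at least as large.
        """
        highest = max(bids, default=0)
        if highest >= self.amount:
            return (f"{self.bidder}: your bid of {self.amount} was canceled; "
                    f"the highest bid is now {highest}.")
        bids.append(self.amount)
        return None


bids_after_repair = [500, 800]  # suppose a repair changed an earlier bid to 800
late_bid = PlaceBid("alice", 700)
notice = late_bid.replay(bids_after_repair)

winning_bid = PlaceBid("bob", 900)
accepted = winning_bid.replay(bids_after_repair)
```

Here the 700 bid was valid in the original timeline but loses to the repaired 800 on replay, so it is canceled and the bidder is informed; the 900 bid replays cleanly.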

  29. Application implications • To support the 3R’s, an application must have: • a high-level, verb-structured interface/API for user, operator, and external actions • a state model where all user-visible state: • is nameable via the API • is tagged with GUIDs • supports a signature/hash method • a relaxed external consistency model that allows compensation for externalized inconsistent verbs

  30. Example: a 3R email store • State • mailstores, folders, messages, user properties, aliases • Verbs • transport: create/delete/alter mapping; deliver msg • directory: create/alter/delete user-entry; create/alter/delete filter-rule; add/remove maildrop • store: create/delete store; create/rename/delete folder; expunge folder; list folder; set folder flags; copy msg; append msg; fetch msg; set msg flags • [Diagram: components WebUI, Transport, Store, Directory/Auth., and UndoMgr; connections labeled HTTP, SMTP, IMAP, LDAP, "internal", and "verbs"]