
A chicken in every pot: a persistent snapshot memory scaled in time



Presentation Transcript


  1. A chicken in every pot: a persistent snapshot memory scaled in time. Liuba Shrira and Hao Xu, Brandeis University

  2. Storage systems: the 7-year itch. 1984: rotational delay (FFS). 1991: large memory (LFS). 1998: cheaper disk (Elephant). 2005: a chicken in every pot: a snapshot box on the side.

  3. Trends. Hardware: disk is cheap ($1/GB) and getting cheaper. Software industry: Forbes (12/2004) says the need for keeping past state is growing.

  4. Trends, cont. A casino chases a card counter. An IT department is chased by Sarbanes-Oxley. A Hippocratic DB is audited for patient privacy preservation. All of them need to analyze past activity.

  5. SNAP: a snapshot system for an object storage system. Goal: a storage system capability for back-in-time execution (BITE): the application runs against read-only snapshots, without synchronization, for analysis in retrospect.

  6. Baseline requirements for BITE. Consistent snapshots: the same (old) invariants hold. BITE of general code: after-the-fact, ad hoc analysis (vs predefined SQL access methods). App chooses the snapshot: the snapshot state is meaningful to the app (vs "some time in the past"). High time "resolution": fine-grained past analysis (vs backup for recovery).

  7. Over long time scales. Living with the past: how close? Today: too close (temporal DB, CVFS) or too far (warehouse, e.g. Netezza). Snapshots can be of long-term importance or transient. Today: uniform treatment, so apps cannot discriminate. Inherent tension: latency of access vs cost of representation (space and time). Today: limited adaptation (compress or not).

  8. Capturing past states. Two ways. Cheap: no-overwrite update. The past stays put and new state is copied: less to write, but the DB bloats and the past inherits the same representation. Opportunistic: in-place update. The past is copied out and separated: more to write, but it can be written smartly, the past representation can be tailored, and the DB stays clustered (vigor).

  9. Our requirements. Non-disruptive past: at just the right distance (separated). At adaptive distance: e.g. faster BITE on more recent states. Discriminated past: the application classifies, the snapshot system filters; some snapshots outlive others, and some can be accessed faster. Flexible classification: e.g. after the fact.

  10. Snapshot system operations. Request to take a snapshot (declaration): sid: snapshot_request(filter_spec). Request to access a snapshot: v: snapshot_access(sid). Request to specify a filter for a snapshot: v: lazy_filter(sid, filter_spec). Example history of transactions (T) and snapshot declarations (S): T1, T2, S1, T3, T4, T5, S2, …
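
A minimal sketch of how the three operations on this slide might look to an application. Only the operation names and arguments come from the slide; the class name, the use of integers for sid and v, and the string filter specs are assumptions made for illustration, not the SNAP interface itself.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SnapshotSystem:
    """Toy model of the snapshot operations (hypothetical class)."""
    next_sid: int = 1
    filters: Dict[int, Optional[str]] = field(default_factory=dict)

    def snapshot_request(self, filter_spec: Optional[str] = None) -> int:
        # Declare a snapshot; the filter spec may be omitted and supplied later.
        sid = self.next_sid
        self.next_sid += 1
        self.filters[sid] = filter_spec
        return sid

    def snapshot_access(self, sid: int) -> int:
        # Return a handle v for BITE; in this sketch v is simply the sid.
        if sid not in self.filters:
            raise KeyError(f"unknown snapshot {sid}")
        return sid

    def lazy_filter(self, sid: int, filter_spec: str) -> None:
        # Attach or change the filter spec after the fact.
        if sid not in self.filters:
            raise KeyError(f"unknown snapshot {sid}")
        self.filters[sid] = filter_spec


snap = SnapshotSystem()
s1 = snap.snapshot_request()         # S1 declared without a filter
s2 = snap.snapshot_request("long")   # S2 declared as long-lived
snap.lazy_filter(s1, "short")        # S1 classified later
v = snap.snapshot_access(s2)         # handle for running BITE on S2
```

The point of the sketch is that classification can happen either at declaration or later, which is the distinction the filtering slides build on.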

  11. Baseline storage system. General interface: pages and a page table; transactions access objects on pages. Server: DB disk with slotted pages of objects, physical oid (page#, o#), and a page table; transaction log; cache of pages and a modified object cache.

  12. Storage system, cont.: optimistic CC + ARIES. Clients fetch pages, run transactions, and send modified objects to the server. The server validates, commits (WAL), and caches committed modifications; no-force, no-STEAL.

  13. The snapshot system Archive separated from DB: Archive i/o sequential, DB random Copy-on-write (COW): copy out snapshot states into archive just before updating DB during cleaning.
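
A minimal sketch of the copy-on-write idea on this slide: the pre-state of a page is appended to a separate archive once per snapshot, just before the page is overwritten. The class and field names are hypothetical, and the sketch copies out on every write, whereas the slide says the real system does this during cleaning.

```python
class CowStore:
    """Toy copy-on-write store with a separate, append-only archive."""

    def __init__(self):
        self.db = {}                # page id -> current contents
        self.archive = []           # (snapshot id, page id, pre-state), append-only
        self.current_snapshot = 0   # 0 means no snapshot declared yet
        self.recorded = set()       # pages already copied out for this snapshot

    def declare_snapshot(self):
        self.current_snapshot += 1
        self.recorded = set()
        return self.current_snapshot

    def write_page(self, page, contents):
        # Copy out the pre-state at most once per snapshot, before overwriting.
        needs_copy = (
            self.current_snapshot > 0
            and page in self.db
            and page not in self.recorded
        )
        if needs_copy:
            self.archive.append((self.current_snapshot, page, self.db[page]))
            self.recorded.add(page)
        self.db[page] = contents


store = CowStore()
store.write_page("P", "p0")
v1 = store.declare_snapshot()
store.write_page("P", "p1")   # first overwrite after v1: p0 is archived
store.write_page("P", "p2")   # later overwrites need no further copies
```

Because the archive is only appended to, its writes stay sequential while the database writes remain random, as the slide notes.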

  14. Snapshot interface. Same as the DB: snapshot pages and a snapshot page table. So BITE is transparent: BITE on snapshot S(v) uses PageTable(v).

  15. Snapshot system, below the interface: some S(v) pages are in the archive and some are in the DB, and pages in the archive can have different representations.

  16. BITE (v): namespace redirection
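
A minimal sketch of the redirection described on the last three slides: a read on behalf of BITE(v) consults the page table of snapshot v, and falls back to the current database when the page has not been overwritten since v. The function name and the dictionary and list layouts are assumptions for illustration.

```python
def read_snapshot_page(page, snapshot_page_table, archive, db):
    """Return the page as of snapshot v.

    snapshot_page_table maps page id -> archive address for pages whose
    snapshot-v state was copied out; all other pages are still in the DB.
    """
    addr = snapshot_page_table.get(page)
    if addr is not None:
        return archive[addr]   # state was copied out to the archive
    return db[page]            # unmodified since v: current DB page is the snapshot


db = {"P": "p-now", "Q": "q-now"}
archive = ["p-old"]            # archive address 0 holds the old state of P
page_table_v = {"P": 0}

old_p = read_snapshot_page("P", page_table_v, archive, db)
old_q = read_snapshot_page("Q", page_table_v, archive, db)
```

Application code sees the same page interface in both cases, which is what makes BITE transparent.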

  17. Creating non-disruptive snapshots: (i/o bound system) Archiving snapshot states when cleaning can slow down cleaning compared to a system without snapshots. Copying to the archive disk (sequential I/O) in parallel to database I/O (random) can partially hide archiving cost behind database I/O.

  18. Creating snapshots: how well can you hide? Determined by how much is archived (compactness of snapshot representation, snapshot frequency, and update workload, i.e. overwriting) and by the cost of archiving (sequential writes, plus other archive traffic such as BITE).

  19. Creating snapshots: some issues. Issue: avoid overwriting snapshot states (without blocking, pinning, etc.). Issue: update snapshot metadata efficiently (large, dynamic page tables). Issue: filter out long-lived snaps (focus here).

  20. New techniques for copy-out snapshots. VMOB: an in-memory versioned data structure that preserves snapshot states without blocking. LPT: an incrementally archived page table with logarithmic reconstruction cost. Filtering: exploit a smart representation for past states (focus here).

  21. Filtering: motivation. We want an unlimited past at high resolution, but some snapshots are transient and others are of long-term interest to the application. The application needs to discriminate between snapshots.

  22. Thresher: a filtering system for SNAP

  23. Snapshot representation. What can representation do for filtering? Lifetime-based allocation avoids fragmentation. Diff-based encoding reduces the cost of copying. An adaptive combination is the real winner.

  24. Example: hierarchical snapshots at multiple time granularities. An ICU patient monitoring DB takes snapshots: minute by minute, of vital sign monitor readings; hourly, including the nurse's writeup summarizing monitor readings; daily, including the doctor's notes summarizing the nurse's checkups. The doctor's snapshots have a longer lifetime than the nurse's …

  25. Brief overview: snapshot creation. Some notation: snapshot span and recorded pages. Example: …, v4, T: w(x_P), T’: w(y_S), v5, T’’, … Span of v4: T, T’. Pages recorded by snapshot v4: P, S.

  26. Incremental snapshot creation. Archived snapshot pages are dispersed through the archive: v4 records P, S; v5 records P, Q; … Archived snapshot page tables (PT): PT(v4): addr(P4), addr(S4); PT(v5): addr(P5), addr(Q5); … Another talk covers how to construct archived page tables: ConstructAPT(v4) = recorded(v4) + ConstructAPT(v5).
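
A minimal sketch of the ConstructAPT recurrence on this slide, reading "+" as a merge in which a snapshot's own recorded entries take precedence over entries inherited from later snapshots. That reading, the function signature, and the dictionary layout are assumptions; the sketch is a plain linear scan, not the logarithmic LPT technique mentioned earlier in the talk.

```python
def construct_apt(v, recorded, last):
    """Build the archived page table for snapshot v.

    recorded maps snapshot id -> {page id: archive address} for the pages
    that snapshot recorded; last is the most recent snapshot id.
    """
    if v > last:
        return {}
    # Start from the table of the next snapshot ...
    table = dict(construct_apt(v + 1, recorded, last))
    # ... then let v's own recorded pages override inherited entries.
    table.update(recorded.get(v, {}))
    return table


recorded = {
    4: {"P": "P4", "S": "S4"},
    5: {"P": "P5", "Q": "Q5"},
}
apt4 = construct_apt(4, recorded, last=5)
apt5 = construct_apt(5, recorded, last=5)
```

In this example the table for v4 points at Q5 for page Q, since v4 did not record Q itself; pages absent from the table are read from the database.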

  27. Filtering example: filter out short-lived v5. Archive: v4 (doctor's) records P, S; v5 (nurse's) records P, Q; then v6. Filter: keep long-lived v4 and reclaim v5: reclaim P5, retain Q5 (v4 needs it). Filtering incremental snapshots creates fragmentation.

  28. Problem: fragmentation. With a fragmented archive, over time we get either non-sequential archive writes or random reads to copy out long-lived states.

  29. Our approach: filter spec. The filter spec determines relative snapshot lifetime. "App knows best": the app supplies a filter spec, and the system filters.

  30. Avoid fragmentation with the filter spec. Known at snapshot declaration: use lifetime-based allocation. After the fact: use a flexible representation to filter lazily; the representation allows an adaptive trade-off between cost of filtering and cost of BITE.

  31. App specifies filter at declaration. Long-lived area: P4, S4, Q5. Short-lived area: P5. Invariant: to reclaim without fragmentation, short-lived areas store no long-lived pages.
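
A minimal sketch of lifetime-based allocation with two lifetime classes, reproducing the layout on this slide. It assumes a copied-out page is needed by the snapshot that recorded it and by every earlier snapshot back to, but excluding, the previous snapshot that recorded the same page; the page goes to the long-lived area if any of those snapshots is long-lived. The function name and data layout are hypothetical.

```python
def allocate(recorded, long_lived):
    """Split copied-out pages into long-lived and short-lived archive areas.

    recorded: list of (snapshot id, [page ids]) in declaration order.
    long_lived: set of snapshot ids classified as long-lived.
    """
    order = [sid for sid, _ in recorded]
    long_area, short_area = [], []
    last_recorder = {}   # page id -> index of the last snapshot that recorded it
    for idx, (sid, pages) in enumerate(recorded):
        for page in pages:
            start = last_recorder.get(page, -1) + 1
            needed_by = order[start:idx + 1]   # snapshots that read this copy
            if any(s in long_lived for s in needed_by):
                long_area.append((sid, page))
            else:
                short_area.append((sid, page))
            last_recorder[page] = idx
    return long_area, short_area


recorded = [(4, ["P", "S"]), (5, ["P", "Q"])]
long_area, short_area = allocate(recorded, long_lived={4})
```

With v4 long-lived and v5 short-lived, Q5 lands in the long-lived area because v4 needs it, so reclaiming the short-lived area later leaves no holes among long-lived pages.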

  32. FilterTree: filter pages for free

  33. After-the-fact (lazy) filtering. Some applications want to defer filter specification. Lazy filtering requires copying. We can specialize the representation (make it compact) to reduce the copying cost.

  34. Compact representation: diffs. Two components, filtered separately. Compact diffs reduce the cost of copying (diffs are clustered by page). Checkpoints accelerate BITE (page-based snapshots, system-declared, can use FilterTree).

  35. Adaptive trade-off. Like a recovery log: less frequent checkpoints increase compactness; more frequent checkpoints accelerate BITE.
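
A minimal sketch of the trade-off on this slide, by analogy with a recovery log: a page version is rebuilt from the nearest earlier checkpoint plus the diffs after it, so denser checkpoints mean fewer diffs to apply. The direction of the diffs, the integer version numbers, and the function name are assumptions for illustration; the slides do not specify the actual encoding.

```python
def reconstruct(version, checkpoints, diffs):
    """Rebuild a page as of `version`.

    checkpoints: {version: {object id: value}} full page copies.
    diffs: {version: {object id: value}} objects changed at that version.
    Returns the page and the number of diff steps that were replayed.
    """
    base = max(v for v in checkpoints if v <= version)
    page = dict(checkpoints[base])
    steps = 0
    for v in range(base + 1, version + 1):
        page.update(diffs.get(v, {}))
        steps += 1
    return page, steps


diffs = {1: {"a": 1}, 2: {"b": 2}, 3: {"a": 3}}

sparse = {0: {"a": 0, "b": 0}}                       # one checkpoint: compact
dense = {0: {"a": 0, "b": 0}, 2: {"a": 1, "b": 2}}   # extra checkpoint: faster

page_sparse, steps_sparse = reconstruct(3, sparse, diffs)
page_dense, steps_dense = reconstruct(3, dense, diffs)
```

Both configurations yield the same page, but the denser one replays one diff instead of three at the price of storing an extra full page.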

  36. Lazy filtering: checkpoints filtered for free. Archive regions for diff extents; FilterTree for checkpoints. [Figure: archive layout showing diff extents G1(diffs), G2(diffs) and regions B1, B2, B3, E1, E2, E3.]

  37. But some applications want more: lazy filtering and faster BITE. For example, an app runs BITE on a batch of recent snapshots to decide which ones to retain, and needs fast BITE to keep up.

  38. Combined hybrid: faster BITE in a recent window, and lazy filtering.

  39. Hybrid: checkpoints and checkpoint filtered for free

  40. Status. Implemented: SNAP and Thresher for the Thor storage system. Performance results are encouraging. Here is a 5,000-foot view:

  41. Performance metrics. Cost of filtering: non-disruptiveness = rate-of-drain / rate-of-pour; t_clean determines rate-of-drain; workload parameter: overwriting. Compactness of the diff-based rep: retention relative to the page-based rep; R_diff is fixed, R_ckp is tunable by the frequency of checkpoints; workload parameter: density. BITE: page-based snapshots vs diff-based snapshots vs DB.

  42. Non-disruptiveness. Storage system with hybrid snapshots vs without snapshots (Thor): how much does rate-of-drain / rate-of-pour drop?

  43. Experimental configuration. Workloads: extend multiuser OO7 to control density and overwriting. System configuration: single client, medium OO7, small DB (185 MB); multiple clients, large DB (140 GB).

  44. FilterTree: free!

  45. Non-disruptiveness/ single client “summertime …life is easy”

  46. Non-disruptiveness/multi user: “DB works harder”

  47. Summary: non-disruptive snapshot memory. Unlimited filtered past is cheaper than you may think. A chicken in every pot: every storage system can have a snapshot box on the side.

  48. To get there. Generalize: ARIES/STEAL (underway); file systems (need extended interfaces). Beyond: upgrades (have techniques); provenance (need ideas).
