Deciding when to forget in the Elephant file system

Douglas S. Santry, Michael J. Feeley, Norman C. Hutchinson, Alistair C. Veitch, Ross W. Carton, Jacob Ofir. Key idea: Elephant automatically retains all important versions of user files.

Presentation Transcript


  1. Deciding when to forget in the Elephant file system
     Douglas S. Santry, Michael J. Feeley, Norman C. Hutchinson, Alistair C. Veitch, Ross W. Carton, Jacob Ofir

  2. Key Idea
     • Elephant automatically retains all important versions of user files
     • Elephant uses file-grain, user-specified retention policies to reclaim storage
     • Previous file versions are named by combining a traditional pathname with a time when the desired version of a file or directory existed

  3. INTRODUCTION
     • Modern file systems associate
        • deletion of a file with the immediate release of storage
        • file writes with the irrevocable change of file contents
     • Users control what is on disk by explicitly creating, updating and deleting files
     • This was the best approach when disk space was at a premium

  4. The problem
     • The key problem with the current approach is that user actions have an immediate and irrevocable effect on disk storage
        • Users are not protected against their own mistakes
        • This runs counter to a core file-system objective: protecting data against failure
     • We can do better today, now that disk space is no longer at a premium

  5. Current solutions (I)
     • Cedar protected against accidental overwrites by saving the last few versions of each file
        • Cedar files were immutable: each write created a new version of the file
        • Does nothing for deleted files
     • Windows and Mac OS allow users to undelete recently deleted files
        • Does nothing for files that were overwritten

  6. Current solutions (II)
     • Many systems are regularly backed up
        • Can restore the state of any file as of a backup time
     • Many users maintain multiple versions of their critical data

  7. Basic issues
     • We can maintain multiple versions of user files, but not all versions of all files
        • Need a retention policy
     • Should we involve the user in retention and reclamation decisions? Involving the user means
        • less protection from user mistakes
        • a retention policy that may be better suited to the user's needs

  8. Not all files are created equal
     • Read-only files (like application executables) have no version history
     • Derived files (like object files) can easily be reconstituted
     • Cached files require no version history
     • Temporary files might benefit from a short-term history but not from a long-term history
     • User-modified files would benefit most from both a short-term and a long-term history
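One way to read slide 8 is as a mapping from file kind to retention policy, using the policy names from slide 12. The sketch below makes that mapping explicit. The mapping itself is an assumption inferred from the history needs listed on the slide (no history, short-term only, or both), and `classify` is a hypothetical path-based heuristic for illustration, not how Elephant assigns policies; the slides say policies are user-specified per file.

```python
from enum import Enum


class FileKind(Enum):
    READ_ONLY = "read-only"          # e.g. application executables
    DERIVED = "derived"              # e.g. object files
    CACHED = "cached"
    TEMPORARY = "temporary"
    USER_MODIFIED = "user-modified"


class Policy(Enum):
    KEEP_ONE = "Keep One"
    KEEP_SAFE = "Keep Safe"
    KEEP_LANDMARKS = "Keep Landmarks"


# Inferred mapping (an assumption, not taken from the slides):
# no history needed -> Keep One; short-term history only -> Keep Safe;
# short- and long-term history -> Keep Landmarks.
POLICY_FOR_KIND = {
    FileKind.READ_ONLY: Policy.KEEP_ONE,
    FileKind.DERIVED: Policy.KEEP_ONE,
    FileKind.CACHED: Policy.KEEP_ONE,
    FileKind.TEMPORARY: Policy.KEEP_SAFE,
    FileKind.USER_MODIFIED: Policy.KEEP_LANDMARKS,
}


def classify(path: str) -> FileKind:
    """Hypothetical path-based heuristic for guessing a file's kind."""
    if path.startswith("/tmp/"):
        return FileKind.TEMPORARY
    if path.endswith((".o", ".pyc")):
        return FileKind.DERIVED
    if path.startswith(("/bin/", "/usr/bin/")):
        return FileKind.READ_ONLY
    if "/.cache/" in path:
        return FileKind.CACHED
    return FileKind.USER_MODIFIED


def policy_for(path: str) -> Policy:
    """Retention policy suggested for a path under the inferred mapping."""
    return POLICY_FOR_KIND[classify(path)]
```

Under these assumptions, `policy_for("/home/u/main.o")` suggests Keep One, while `policy_for("/home/u/thesis.tex")` suggests Keep Landmarks.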

  9. The two objectives
     • Give users the ability to undo recent changes
        • Keep the complete history of a file over a short period of time (one hour to one week)
     • Maintain a long-term history of the important versions of each file
        • Keep landmark versions of each file forever

  10. Finding the landmark versions
     • Could rely on the user
        • But a user's ability to recognize landmark versions of a file degrades as the versions age
     • Elephant detects landmark versions by looking at the timeline of updates to the file
        • It can identify groups of updates separated by long periods of stability
        • The last version in each group of updates is assumed to be a landmark version
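The landmark heuristic on slide 10 can be sketched as a single pass over a file's sorted update times. This is a minimal sketch, assuming that a "long period of stability" means a gap of at least `stability_gap` between consecutive updates; `stability_gap` and `landmark_versions` are hypothetical names, and the slides do not give a threshold value.

```python
from typing import List


def landmark_versions(update_times: List[float], stability_gap: float) -> List[float]:
    """Return the update times treated as landmark versions.

    Consecutive updates less than `stability_gap` apart belong to the same
    group; the last update of each group is taken as a landmark.
    `stability_gap` is a hypothetical tuning parameter.
    """
    times = sorted(update_times)
    landmarks = []
    for i, t in enumerate(times):
        # A version is the last of its group if it is the newest version,
        # or if the file then stayed unchanged for at least `stability_gap`.
        if i == len(times) - 1 or times[i + 1] - t >= stability_gap:
            landmarks.append(t)
    return landmarks


# Update times in hours; a one-day gap separates groups.
print(landmark_versions([0, 0.5, 1, 48, 48.2, 120], stability_gap=24))
# prints [1, 48.2, 120]
```

Here a burst of edits in the first hour, a second burst two days later, and a final edit at hour 120 yield one landmark per burst. The newest version is always reported, since nothing has followed it yet.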

  11. User interface
     • File versions are
        • indexed by their creation time
        • named by combining the file pathname with a date and time
     • Versioning is extended to directories
        • This allows deleted files to be recovered
     • Previous versions of a file or a directory are read-only
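Resolving a name made of a pathname plus a time amounts to finding the newest version created at or before the requested time. The sketch below assumes a hypothetical `path@time` syntax with numeric timestamps and stores whole file contents per version; these are illustrative choices, not Elephant's actual name syntax or on-disk layout, and paths that themselves contain `@` are not handled.

```python
import bisect
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class VersionHistory:
    """All versions of one file, indexed by creation time (oldest first)."""
    times: List[float] = field(default_factory=list)
    contents: List[bytes] = field(default_factory=list)

    def add(self, created_at: float, data: bytes) -> None:
        if self.times and created_at < self.times[-1]:
            raise ValueError("versions must be added in creation-time order")
        self.times.append(created_at)
        self.contents.append(data)

    def as_of(self, t: float) -> Optional[bytes]:
        """Contents of the version that existed at time t, or None if none did."""
        i = bisect.bisect_right(self.times, t)  # number of versions created at or before t
        return self.contents[i - 1] if i > 0 else None


def lookup(fs: Dict[str, VersionHistory], name: str) -> Optional[bytes]:
    """Resolve a hypothetical 'path@time' name (or a plain path) to contents."""
    path, sep, when = name.partition("@")
    history = fs.get(path)
    if history is None or not history.times:
        return None
    if not sep:
        return history.contents[-1]  # no time given: current version
    return history.as_of(float(when))


fs = {"/home/u/notes.txt": VersionHistory()}
fs["/home/u/notes.txt"].add(100, b"draft")
fs["/home/u/notes.txt"].add(200, b"final")
print(lookup(fs, "/home/u/notes.txt@150"))  # prints b'draft'
```

Because versions are kept sorted by creation time, each lookup is a binary search rather than a scan of the whole history.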

  12. Retention policies (I)
     • Keep One: keeps only the latest version of the file
     • Keep All: keeps all versions of the file
     • Keep Safe: keeps all versions of the file during a specified second-chance interval
     • Keep Landmarks: keeps all versions of the file during a specified second-chance interval, and only landmark versions after that
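The four policies can be compared side by side as one function that decides which versions survive reclamation. This sketch assumes that a version lies "within the second-chance interval" if it was superseded no earlier than `now - second_chance` (so it was the current version at some point during the interval); the landmark set is passed in rather than computed, and all names are hypothetical.

```python
from enum import Enum
from typing import AbstractSet, List


class Policy(Enum):
    KEEP_ONE = "Keep One"
    KEEP_ALL = "Keep All"
    KEEP_SAFE = "Keep Safe"
    KEEP_LANDMARKS = "Keep Landmarks"


def retained(version_times: List[float], now: float, policy: Policy,
             second_chance: float = 0.0,
             landmarks: AbstractSet[float] = frozenset()) -> List[float]:
    """Creation times of the versions that `policy` keeps at time `now`."""
    times = sorted(version_times)
    if not times:
        return []
    if policy is Policy.KEEP_ONE:
        return [times[-1]]          # only the current version
    if policy is Policy.KEEP_ALL:
        return times
    cutoff = now - second_chance

    def in_window(i: int) -> bool:
        # Version i was current from times[i] until times[i + 1]; it falls
        # inside the second-chance interval if it was superseded at or after
        # the cutoff. The newest version is always current.
        return i == len(times) - 1 or times[i + 1] >= cutoff

    if policy is Policy.KEEP_SAFE:
        return [t for i, t in enumerate(times) if in_window(i)]
    # KEEP_LANDMARKS: everything in the second-chance window plus landmarks.
    return [t for i, t in enumerate(times) if in_window(i) or t in landmarks]


versions = [0, 10, 20, 100, 105]
print(retained(versions, now=110, policy=Policy.KEEP_SAFE, second_chance=8))
# prints [100, 105]
```

With versions created at 0, 10, 20, 100 and 105, `now = 110` and an 8-unit interval, Keep Safe keeps the versions from 100 and 105 (the one from 100 was still current until 105, inside the window). Keep Landmarks with a landmark at 20 additionally keeps that version.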

  13. Retention policies (II)
     • The Keep Landmarks policy also allows the user to group files for consideration
        • Important for interdependent files, because their consistency requires viewing all of them as of the same point in time
     • The grouping policy is quite flexible: the user can specify
        • individual files
        • entire directories or subtrees
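Grouping can be sketched by running landmark detection on the merged update timeline of all files in the group, then keeping, for every member, the version that was current at each group landmark. This is one plausible reading of slide 13, not Elephant's documented algorithm; the gap-based detector, `stability_gap`, and all function names are assumptions.

```python
import bisect
from typing import Dict, List


def landmark_times(update_times: List[float], stability_gap: float) -> List[float]:
    """Last update of each burst; bursts are separated by gaps >= stability_gap."""
    times = sorted(update_times)
    return [t for i, t in enumerate(times)
            if i == len(times) - 1 or times[i + 1] - t >= stability_gap]


def group_landmarks(group: Dict[str, List[float]],
                    stability_gap: float) -> Dict[str, List[float]]:
    """Per file, the creation times of versions kept for the grouped landmarks."""
    # Treat every update to any member as an update to the group as a whole.
    merged = [t for times in group.values() for t in times]
    snapshots = landmark_times(merged, stability_gap)
    kept: Dict[str, List[float]] = {}
    for path, times in group.items():
        times = sorted(times)
        keep = set()
        for snap in snapshots:
            # Version of this file that was current at the group landmark.
            i = bisect.bisect_right(times, snap)
            if i > 0:
                keep.add(times[i - 1])
        kept[path] = sorted(keep)
    return kept


print(group_landmarks({"a.c": [0, 30], "a.h": [20]}, stability_gap=24))
# prints {'a.c': [30], 'a.h': [20]}
```

With `a.c` edited at times 0 and 30 and `a.h` at 20, a 24-unit gap makes the three edits one burst, so only the group state at time 30 (`a.c` from 30, `a.h` from 20) is kept. Per-file detection on `a.c` alone would also have treated its version from time 0 as a landmark, even though `a.h` was mid-edit around then.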

  14. Implementation (I)
     • I-nodes of non-versioned files are stored in a special i-node file
     • I-nodes of versioned files are stored in an i-node log
        • Versions are stored as an ordered sequence of i-nodes
     • Changes are detected at the block level
        • Versions of the same file share identical blocks
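The block-sharing idea on slide 14 can be illustrated with a toy copy-on-write model: each version is an immutable i-node holding block pointers, a file's i-node log is an ordered list of those i-nodes, and a new version reuses every pointer whose block did not change. In this sketch, change detection compares block contents against the previous version, standing in for a real write path that already knows which blocks were dirtied; the in-memory `disk` list and all class names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Inode:
    created_at: float
    block_ptrs: Tuple[int, ...]  # indices into the shared block array


class VersionedFile:
    """Toy copy-on-write file whose versions share unchanged blocks."""

    def __init__(self, disk: List[bytes]) -> None:
        self.disk = disk                  # hypothetical in-memory "disk" of blocks
        self.inode_log: List[Inode] = []  # ordered sequence of i-nodes, oldest first

    def _alloc(self, data: bytes) -> int:
        self.disk.append(data)
        return len(self.disk) - 1

    def write(self, t: float, blocks: List[bytes]) -> Inode:
        """Append a new version whose contents are `blocks`."""
        prev = self.inode_log[-1].block_ptrs if self.inode_log else ()
        ptrs = []
        for i, data in enumerate(blocks):
            if i < len(prev) and self.disk[prev[i]] == data:
                ptrs.append(prev[i])            # unchanged block: share it
            else:
                ptrs.append(self._alloc(data))  # changed or new block: allocate
        inode = Inode(t, tuple(ptrs))
        self.inode_log.append(inode)
        return inode

    def read(self, version: int) -> List[bytes]:
        """Contents of the given version (0 is the oldest)."""
        return [self.disk[p] for p in self.inode_log[version].block_ptrs]


disk: List[bytes] = []
f = VersionedFile(disk)
f.write(0, [b"aaaa", b"bbbb", b"cccc"])
f.write(1, [b"aaaa", b"BBBB", b"cccc"])
print(len(disk))  # prints 4
```

Writing three blocks and then changing only the middle one allocates four blocks in total rather than six, and both versions remain readable through their own i-nodes.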

  15. Implementation (II)
     • Elephant uses a different mechanism for versioned directories
     • We did not discuss it in class

  16. Performance
     • Somewhat slower than conventional file systems
     • Using HP-UX traces collected at HP Labs, one can estimate that Keep Landmarks files would account for 62.4% of files but only 15.2% of the disk space
