venti a new approach to archival storage
Skip this Video
Download Presentation
Venti : A new approach to archival storage

Loading in 2 Seconds...

play fullscreen
1 / 19

Venti : A new approach to archival storage - PowerPoint PPT Presentation

  • Uploaded on

Venti : A new approach to archival storage. Sean Quinlan & Sean Dorward Bell Labs, Lucent Technology FAST 2002 Presented by: Fabián E. Bustamante. Archival Storage. Abundant storage - storing data for long time, even forever Archival system – write-once policy – changes how we see storage

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
Download Presentation

PowerPoint Slideshow about ' Venti : A new approach to archival storage' - abner

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
venti a new approach to archival storage

Venti : A new approach to archival storage

Sean Quinlan & Sean Dorward

Bell Labs, Lucent Technology

FAST 2002

Presented by: Fabián E. Bustamante

archival storage
Archival Storage
  • Abundant storage - storing data for long time, even forever
  • Archival system – write-once policy – changes how we see storage
  • Prevalent form of archival storage - tape backup
    • Central service for a number of clients
    • Restoring data tedious and error prone (requires sys admin or privileged software)
    • Infrequent, so problems may go undetected for long
    • Tradeoff bet/ performance of backup & restore (incremental backups are more efficient to generate, but not self-contained)
  • Snapshots (Plan 8, Andrew FS, …)
    • A consistent read-only view of FS at some point in past
    • Maintains file system permissions
    • Can be accessed with standard tools – ls, cat, cp, grep, diff
    • Avoids tradeoff between full vs. incremental backup
      • Looks like full backup
      • Implementation resembles incremental backup – share blocks
      • Only a small number of versions are kept
  • GOAL: “To provide a write-once archival repository that can be shared by multiple client machines and applications”
  • Block level network storage system
    • Actually a backend storage for client apps
    • Block level i/f places few restrictions on data structures/format
  • Blocks addressed by hash of their contents
    • Uses SHA-1 algorithm
    • SHA-1 output is 160 bit (20 byte) fingerprint of data block
  • Write once policy
    • Once written cannot be deleted
  • Multiple writes of same data coalesced
    • Data sharing – increases effective storage capacity
    • Makes write operation idempotent
  • Multiple clients can share a Venti server
    • Hash fn gives an universal namespace
  • Inherent integrity checking of data
    • Fingerprint computation on retrieval
  • Caching is simplified
  • Uses magnetic disks as storage technology
    • Access time comparable to non-archival data
  • Hashing function – sha1 is more than enough for now
  • Exabyte of storage (1018bytes) in 8 Kbytes blocks (~1014 blocks - n) with sha1 (b = 160 bits) – p < 10-20

Number of pairs of blocks

Probability that a given pair will collide

applications general
Venti as a building block to construct storage apps

Data divided into blocks

App needs fingerprint for retrieval

One approach - fingerprints packed into pointer blocks, also written to server

Repeated recursively to get single fingerprint - root of tree

Applications - general
applications general1
Tree cannot be modified, but

Unchanged sections of tree reused

More complex data structures

Mixing data & fingerprints in a block

e.g. structure for storing file system

3 types of blocks

Directory – file metadata + root fingerprint



Applications - general
example app vac
Example app: Vac
  • Similar to zip & tar – store collection of files & directories as single object
  • Tree of blocks formed for selected files
  • vac archive file - 45 bytes long
    • 20 bytes for root fingerprint
    • 25 bytes fixed header string
    • any amount of data compressed down to 45 bytes
  • Unvac – to restore file from archive
  • Duplicate copies of file coalesced on server
    • Multiple users vac same data – only 1 copy stored
    • vac on changed contents
example app physical level backup
Example app: Physical level backup
  • Copy the raw disk blocks to Venti
    • No need to walk file hierarchy
    • Higher throughput
  • Duplicate blocks are coalesced
  • Free block – null in pointer tree
  • User sees full backup of device
  • Storage space advs of incremental backup retained
  • Random access possible
    • Directly mounting a backup file system image (w/ OS support)
    • Enables lazy (on demand) restore
example app plan 9 fs
Example app - Plan 9 FS
  • Previously
    • Magnetic disk + write-once optical jukebox
    • Cache for faster access & accumulated changes bet/ snapshots
    • Cache smaller than active file system
    • Cache misses are painful
  • Plan 9 FS on top of Venti
    • Venti as storage device (instead of jukebox)
    • Equalizes access to both active & archival storage
    • Cache can also be smaller
Append-only log of data blocks

RAID array

Separate index maps fingerprints to log location

Can be regenerated from data log



Venti Server










venti s data log
Venti’s data log

Append only section of data blocks (variable size)

Log divided into arenas for maintenance

For integrity checking & data recovery

  • Data compressed (LZ77) –
  • encoding – if, algo.
  • esize size after compression

For fast index rebuild & error recovery

Data log and directory grow in opposite directionsWhen arena is filled, mark as sealed, fingerprint compute

venti s index
Venti’s index
  • No. of fingerprints >> no. of blocks on a server
  • Index as disk-resident hash table
  • Hashing function maps fingerprints to index buckets
  • Going through index is main performance penalty
    • Caching (block and index)
    • Index striped across multiple disks – fingerprint location in index is random & index is too big to fit in memory
    • Write buffering (to increase number of concurrent accesses)
  • Block cache - hit - index lookup & data log bypassed
  • Index cache - hit - index lookup bypassed
performance computing environments
Performance: computing environments
  • Testbed
    • Dedicated dual 550Mhz Pentium 3, 2GB mem.
    • Access through a 100Mbps Ethernet
    • Data log stored on a 500GB MaxTronic IDE Raid 5
    • Index stored on 8 Seagate Cheetah 18XL 9GB SCSI
  • Micro-benchmark results
storage usage file servers used
Storage usage – file servers used
  • 2 plan 9 file servers, bootes and emelie
    • Spanning 1990 to 2001
  • 522 user accounts, 50-100 active at a time
  • Numerous development projects hosted
  • Several large data sets
    • Astronomical data, satellite imagery, multimedia files
  • Next figures
    • Size of active file system
    • Space consumed on jukebox
    • Space when using Venti
    • Ratio – using Venti reduces the cost of retaining snapshots (so ratio is smaller)
storage saving
Storage saving
  • When stored on Venti, size of jukebox data reduced by 3 factors
    • Elimination of duplicate blocks
    • Elimination of block fragmentation
    • Compression of block contents
reliability recovery
Reliability & recovery
  • Tools for integrity checking & error recovery
    • Verifying structure of arena
    • Checking there is an index entry for every block in data log, vice versa
    • Rebuilding index from data log
    • Copying arena to removable media
  • Data log on RAID 5 disk array
    • Protection against single drive failures
  • Off-site mirrors for extra protection, if that’s not enough …
  • Stored sealed arenas into removable media
future work
Future Work
  • Load balancing
    • Distribute Venti across multiple machines
    • Replicate server
    • Use of proxy server to hide it from client
  • Better access control
    • Currently just authentication to server
    • Single root fingerprint gives access to entire file tree
  • Use of variable sized blocks as in LBFS
  • Addressing block by SHA-1 hash of contents – good fit for archival storage
  • Magnetic disks as storage technology
    • Large capacity at low price – online storage
    • Random access
    • Performance comparable to non-archival data
  • Write once model
    • Reduces accidental or malicious data loss
    • Simplifies administration
    • Simplifies caching
    • Allows sharing of data