Venti: A new approach to archival storage

Sean Quinlan & Sean Dorward

Bell Labs, Lucent Technologies

FAST 2002

Presented by: Fabián E. Bustamante

Archival Storage

  • Abundant storage – it is now feasible to store data for a long time, even forever

  • An archival system with a write-once policy changes how we see storage

  • Prevalent form of archival storage - tape backup

    • Central service for a number of clients

    • Restoring data is tedious and error prone (requires a sys admin or privileged software)

    • Infrequent, so problems may go undetected for a long time

    • Tradeoff between backup & restore performance (incremental backups are more efficient to generate, but not self-contained)

  • Snapshots (Plan 9, Andrew FS, …)

    • A consistent read-only view of the file system at some point in the past

    • Maintains file system permissions

    • Can be accessed with standard tools – ls, cat, cp, grep, diff

    • Avoids tradeoff between full vs. incremental backup

      • Looks like full backup

      • Implementation resembles incremental backup – snapshots share blocks

      • Only a small number of versions are kept


  • GOAL: “To provide a write-once archival repository that can be shared by multiple client machines and applications”

  • Block level network storage system

    • Actually a backend storage for client apps

    • Block-level interface places few restrictions on data structures/formats

  • Blocks addressed by hash of their contents

    • Uses SHA-1 algorithm

    • SHA-1 output is 160 bit (20 byte) fingerprint of data block

  • Write once policy

    • Once written cannot be deleted

  • Multiple writes of same data coalesced

    • Data sharing – increases effective storage capacity

    • Makes write operation idempotent
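The coalescing behavior can be sketched as a minimal content-addressed store (an illustrative Python sketch; Venti itself is written in C, and all names here are invented):

```python
import hashlib

class BlockStore:
    """Minimal sketch of a content-addressed block store."""
    def __init__(self):
        self.blocks = {}  # fingerprint -> block contents

    def write(self, data: bytes) -> bytes:
        # The block's address is the SHA-1 hash of its contents.
        fp = hashlib.sha1(data).digest()
        # Writing the same data twice stores only one copy: idempotent.
        self.blocks.setdefault(fp, data)
        return fp

    def read(self, fp: bytes) -> bytes:
        data = self.blocks[fp]
        # Inherent integrity check: recompute the fingerprint on retrieval.
        assert hashlib.sha1(data).digest() == fp
        return data

store = BlockStore()
a = store.write(b"hello archival world")
b = store.write(b"hello archival world")  # duplicate write, coalesced
```

Both writes return the same fingerprint, and only one copy of the block is kept, which is why duplicate data increases effective capacity for free.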


  • Multiple clients can share a Venti server

    • Hash function gives a universal namespace

  • Inherent integrity checking of data

    • Fingerprint computation on retrieval

  • Caching is simplified

  • Uses magnetic disks as storage technology

    • Access time comparable to non-archival data

  • Hashing function – SHA-1 is more than enough for now

  • Collision probability (birthday bound): p ≈ [n(n−1)/2] × 2^(−b), where n(n−1)/2 is the number of pairs of blocks and 2^(−b) is the probability that a given pair collides

  • An exabyte of storage (10^18 bytes) in 8 KB blocks (n ≈ 10^14 blocks) with SHA-1 (b = 160 bits) gives p < 10^(−20)
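The bound on the slide can be checked numerically (a quick sketch using the slide's values):

```python
# Birthday bound for fingerprint collisions, with the slide's parameters.
n = 10**18 // (8 * 1024)   # ~1.2e14 blocks: an exabyte in 8 KB blocks
b = 160                    # SHA-1 fingerprint size in bits

pairs = n * (n - 1) // 2   # number of pairs of blocks
p = pairs / 2**b           # union bound: pairs x per-pair collision probability
assert p < 1e-20           # matches the slide's claim
```

The per-pair collision probability 2^(−160) is so small that even 10^14 blocks leave the overall collision probability far below the error rates of the underlying hardware.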

Applications - general

  • Venti as a building block to construct storage apps

  • Data divided into blocks; the app needs the fingerprint for retrieval

  • One approach – fingerprints packed into pointer blocks, also written to the server

    • Repeated recursively to get a single fingerprint – the root of a tree

  • The tree cannot be modified, but unchanged sections of the tree can be reused

  • More complex data structures – mixing data & fingerprints in a block

    • e.g., structure for storing a file system: 3 types of blocks, with a directory block holding file metadata + a root fingerprint
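The recursive packing of fingerprints into pointer blocks can be sketched as follows (illustrative Python; the tiny block size and fan-out are chosen for the example and are not Venti's actual layout):

```python
import hashlib

BLOCK_SIZE = 8        # tiny data-block size, for illustration only
PTRS_PER_BLOCK = 3    # tiny fan-out so the example builds a multi-level tree

store = {}            # fingerprint -> block contents

def write_block(data: bytes) -> bytes:
    fp = hashlib.sha1(data).digest()   # 20-byte fingerprint
    store[fp] = data
    return fp

def write_tree(data: bytes) -> bytes:
    """Store data as a tree of blocks; return the root fingerprint."""
    # Leaf level: split the data into fixed-size blocks.
    fps = [write_block(data[i:i + BLOCK_SIZE])
           for i in range(0, len(data), BLOCK_SIZE)]
    # Pack fingerprints into pointer blocks, recursively, until one remains.
    while len(fps) > 1:
        fps = [write_block(b"".join(fps[i:i + PTRS_PER_BLOCK]))
               for i in range(0, len(fps), PTRS_PER_BLOCK)]
    return fps[0]

root = write_tree(b"some file contents to be archived")
```

A single 20-byte root fingerprint names the entire tree, and an unchanged subtree written again produces identical fingerprints, so its blocks are reused rather than stored twice.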




Example app: Vac

  • Similar to zip & tar – store collection of files & directories as single object

  • Tree of blocks formed for selected files

  • vac archive file - 45 bytes long

    • 20 bytes for root fingerprint

    • 25 bytes fixed header string

    • any amount of data compressed down to 45 bytes

  • Unvac – to restore file from archive

  • Duplicate copies of file coalesced on server

    • Multiple users vac the same data – only 1 copy stored

    • vac on changed contents stores only the new blocks
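The constant-size archive file can be sketched from the layout on the slide (a 25-byte fixed header plus the 20-byte root fingerprint); the header string below is invented for the example, not vac's real header:

```python
import hashlib

HEADER = b"venti-archive-sketch-v00\n"   # hypothetical 25-byte header string

def write_vac_file(root_fp: bytes) -> bytes:
    """Per the slide: 25-byte fixed header + 20-byte root fingerprint = 45 bytes."""
    assert len(HEADER) == 25 and len(root_fp) == 20
    return HEADER + root_fp

archive = write_vac_file(hashlib.sha1(b"root pointer block").digest())
```

However much data was archived, the resulting file is always 45 bytes: the data itself lives on the Venti server, and the archive is just a name for its root.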

Example app: Physical level backup

  • Copy the raw disk blocks to Venti

    • No need to walk file hierarchy

    • Higher throughput

  • Duplicate blocks are coalesced

  • Free blocks represented as null pointers in the tree

  • User sees full backup of device

  • Storage space advantages of incremental backup retained

  • Random access possible

    • Directly mounting a backup file system image (w/ OS support)

    • Enables lazy (on demand) restore
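A physical-level backup can be sketched like this (illustrative Python under assumed toy structures; real backups would read raw device blocks):

```python
import hashlib

def backup_disk(blocks, allocated, store):
    """Sketch: write raw disk blocks to a content-addressed store.

    Free blocks become null pointers, so they cost no storage; blocks
    unchanged between backups coalesce in the content-addressed store.
    """
    pointers = []
    for i, blk in enumerate(blocks):
        if not allocated[i]:
            pointers.append(None)            # free block: null in the pointer tree
            continue
        fp = hashlib.sha1(blk).digest()
        store[fp] = blk                      # duplicate blocks coalesce automatically
        pointers.append(fp)
    return pointers

store = {}
day1 = backup_disk([b"boot", b"data", b"junk"], [True, True, False], store)
day2 = backup_disk([b"boot", b"data2", b"junk"], [True, True, False], store)
```

Each backup looks like a full image of the device, but unchanged blocks such as `b"boot"` are stored only once, giving incremental-backup space usage.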

Example app - Plan 9 FS

  • Previously

    • Magnetic disk + write-once optical jukebox

    • Cache for faster access & to accumulate changes between snapshots

    • Cache smaller than active file system

    • Cache misses are painful

  • Plan 9 FS on top of Venti

    • Venti as storage device (instead of jukebox)

    • Equalizes access to both active & archival storage

    • Cache can also be smaller

Venti Server

  • Append-only log of data blocks, stored on a RAID array

  • Separate index maps fingerprints to locations in the log

    • Can be regenerated from the data log
Venti’s data log

  • Append-only section of data blocks (variable size)

  • Log divided into arenas for maintenance: integrity checking & data recovery

  • Data compressed (LZ77)

    • encoding field – compression algorithm used, if any

    • esize field – size after compression

  • Each arena holds a directory of its blocks, for fast index rebuild & error recovery

  • Data log and directory grow in opposite directions

  • When an arena is filled, it is marked as sealed and a fingerprint of its contents is computed

Venti’s index

  • Number of possible fingerprints >> number of blocks stored on a server

  • Index as disk-resident hash table

  • Hashing function maps fingerprints to index buckets

  • Going through index is main performance penalty

    • Caching (block and index)

    • Index striped across multiple disks – fingerprint location in index is random & index is too big to fit in memory

    • Write buffering (to increase number of concurrent accesses)

  • Block cache - hit - index lookup & data log bypassed

  • Index cache - hit - index lookup bypassed
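The read path through the two caches might look like this (a sketch over assumed dictionary-like interfaces; the real server is written in C):

```python
def read_block(fp, block_cache, index_cache, disk_index, data_log):
    """Sketch of the read path: each cache hit skips later, slower steps."""
    # 1. Block cache hit: both the index lookup and the data log read are bypassed.
    if fp in block_cache:
        return block_cache[fp]
    # 2. Index cache hit: the disk-resident index lookup is bypassed.
    if fp in index_cache:
        offset = index_cache[fp]
    else:
        # 3. Slow path: the on-disk hash table mapping fingerprint -> log offset.
        offset = disk_index[fp]
        index_cache[fp] = offset
    data = data_log[offset]
    block_cache[fp] = data
    return data

# Usage with toy in-memory structures:
data_log = {0: b"hello"}
disk_index = {b"fp": 0}
block_cache, index_cache = {}, {}
first = read_block(b"fp", block_cache, index_cache, disk_index, data_log)
```

The first read takes the slow path and populates both caches; subsequent reads of the same fingerprint never touch the disk-resident index, which is the main performance penalty.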

Performance: computing environments

  • Testbed

    • Dedicated dual 550 MHz Pentium III, 2 GB memory

    • Access through a 100Mbps Ethernet

    • Data log stored on a 500 GB MaxTronic IDE RAID 5 array

    • Index stored on 8 Seagate Cheetah 18XL 9 GB SCSI disks

  • Micro-benchmark results

Storage usage – file servers used

  • Two Plan 9 file servers, bootes and emelie

    • Spanning 1990 to 2001

  • 522 user accounts, 50-100 active at a time

  • Numerous development projects hosted

  • Several large data sets

    • Astronomical data, satellite imagery, multimedia files

  • Next figures

    • Size of active file system

    • Space consumed on jukebox

    • Space when using Venti

    • Ratio of archival to active data – using Venti reduces the cost of retaining snapshots (so the ratio is smaller)

Storage usage

Storage saving

  • When stored on Venti, the size of the jukebox data is reduced by three factors

    • Elimination of duplicate blocks

    • Elimination of block fragmentation

    • Compression of block contents

Reliability & recovery

  • Tools for integrity checking & error recovery

    • Verifying structure of arena

    • Checking there is an index entry for every block in the data log, and vice versa

    • Rebuilding index from data log

    • Copying arena to removable media

  • Data log on RAID 5 disk array

    • Protection against single drive failures

  • Off-site mirrors for extra protection

  • If that’s not enough: sealed arenas copied to removable media

Future Work

  • Load balancing

    • Distribute Venti across multiple machines

    • Replicate server

    • Use of a proxy server to hide this from clients

  • Better access control

    • Currently just authentication to server

    • Single root fingerprint gives access to entire file tree

  • Use of variable sized blocks as in LBFS


Conclusions

  • Addressing blocks by the SHA-1 hash of their contents – a good fit for archival storage

  • Magnetic disks as storage technology

    • Large capacity at low price – online storage

    • Random access

    • Performance comparable to non-archival data

  • Write once model

    • Reduces accidental or malicious data loss

    • Simplifies administration

    • Simplifies caching

    • Allows sharing of data
