Venti : A new approach to archival storage

Sean Quinlan & Sean Dorward

Bell Labs, Lucent Technologies

FAST 2002

Presented by: Fabián E. Bustamante

Archival Storage

  • Abundant storage - storing data for long time, even forever

  • Archival system – write-once policy – changes how we see storage

  • Prevalent form of archival storage - tape backup

    • Central service for a number of clients

    • Restoring data tedious and error prone (requires sys admin or privileged software)

    • Infrequent, so problems may go undetected for long

    • Tradeoff between backup and restore performance (incremental backups are more efficient to generate, but are not self-contained)

  • Snapshots (Plan 9, Andrew FS, …)

    • A consistent read-only view of FS at some point in past

    • Maintains file system permissions

    • Can be accessed with standard tools – ls, cat, cp, grep, diff

    • Avoids tradeoff between full vs. incremental backup

      • Looks like full backup

      • Implementation resembles incremental backup – share blocks

      • Only a small number of versions are kept


  • GOAL: “To provide a write-once archival repository that can be shared by multiple client machines and applications”

  • Block level network storage system

    • Actually a backend storage for client apps

    • Block level interface places few restrictions on data structures/format

  • Blocks addressed by hash of their contents

    • Uses SHA-1 algorithm

    • SHA-1 output is 160 bit (20 byte) fingerprint of data block

  • Write once policy

    • Once written cannot be deleted

  • Multiple writes of same data coalesced

    • Data sharing – increases effective storage capacity

    • Makes write operation idempotent
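The points above (blocks addressed by SHA-1 fingerprint, write-once, duplicate writes coalesced) can be sketched as a toy in-memory store. This is only an illustration: `BlockStore` and its methods are hypothetical names, not the Venti protocol or server code.

```python
import hashlib


class BlockStore:
    """Toy in-memory stand-in for a Venti-style server (hypothetical API)."""

    def __init__(self):
        self._blocks = {}  # fingerprint (20 bytes) -> block contents

    def write(self, data: bytes) -> bytes:
        """Store a block and return its SHA-1 fingerprint."""
        fingerprint = hashlib.sha1(data).digest()
        # Write-once: an existing entry is never overwritten, so writing
        # the same data again changes nothing and returns the same address.
        self._blocks.setdefault(fingerprint, data)
        return fingerprint

    def read(self, fingerprint: bytes) -> bytes:
        """Return the block stored under this fingerprint."""
        return self._blocks[fingerprint]

    def __len__(self):
        return len(self._blocks)


store = BlockStore()
fp1 = store.write(b"hello venti")
fp2 = store.write(b"hello venti")  # duplicate write is coalesced
```

Because the address is derived from the contents, `fp1` and `fp2` are the same 20-byte value and the store holds a single copy. The sketch also offers no delete or overwrite operation, which mirrors the write-once policy.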


  • Multiple clients can share a Venti server

    • Hash function gives a universal namespace

  • Inherent integrity checking of data

    • Fingerprint computation on retrieval

  • Caching is simplified

  • Uses magnetic disks as storage technology

    • Access time comparable to non-archival data
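The integrity check mentioned above (recomputing the fingerprint on retrieval) can be sketched as follows. `verified_read` and the `fetch` callable are hypothetical names for illustration; `fetch` stands in for whatever request returns a block from the server.

```python
import hashlib


def verified_read(fetch, fingerprint: bytes) -> bytes:
    """Fetch a block and check it against the fingerprint that names it.

    `fetch` is a hypothetical callable (fingerprint -> bytes) standing in
    for a network request to the server.
    """
    data = fetch(fingerprint)
    if hashlib.sha1(data).digest() != fingerprint:
        raise ValueError("block does not match its fingerprint")
    return data


good = b"archived block"
address = hashlib.sha1(good).digest()

honest = {address: good}                # server returning the right data
corrupt = {address: b"archived blocK"}  # server returning damaged data

data = verified_read(honest.__getitem__, address)
```

Calling `verified_read(corrupt.__getitem__, address)` raises `ValueError`, so the client detects corruption without trusting the server. The same property is what simplifies caching: a block stored under a given fingerprint can never change, so a cached copy never needs to be invalidated.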

  • Hashing function – SHA-1 is more than enough for now

  • An exabyte of storage (10^18 bytes) in 8 KB blocks (n ≈ 10^14 blocks) with SHA-1 (b = 160 bits) gives p < 10^-20, where p is bounded by the number of pairs of blocks times the probability that a given pair will collide:

    $p \le \frac{n(n-1)}{2} \times \frac{1}{2^{b}}$
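The bound (number of pairs of blocks times the probability that a given pair collides) can be checked with exact arithmetic. The sketch below assumes 8 KB means 8192 bytes; the variable names are illustrative.

```python
from fractions import Fraction

storage_bytes = 10**18   # one exabyte
block_bytes = 8 * 1024   # 8 KB blocks (assumed to mean 8192 bytes)
b = 160                  # SHA-1 output size in bits

n = storage_bytes // block_bytes   # number of blocks, on the order of 10^14
pairs = n * (n - 1) // 2           # number of pairs of blocks
p_bound = Fraction(pairs, 2**b)    # pairs x per-pair collision probability

print(f"n = {n:.3e}, p <= {float(p_bound):.2e}")
```

The computed bound falls below 10^-20, consistent with the figure on the slide.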

Applications - general

Venti as a building block to construct storage apps

Data divided into blocks

App needs fingerprint for retrieval

One approach - fingerprints packed into pointer blocks, also written to server

Repeated recursively to get single fingerprint - root of tree
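A minimal sketch of the pointer-block approach described above: data blocks are written first, their fingerprints are packed into pointer blocks, and the process repeats until one root fingerprint remains. The function names, the dict used as a store, and returning the tree depth alongside the root are assumptions for illustration, not the format used by Venti applications.

```python
import hashlib

FP_SIZE = 20  # SHA-1 fingerprint length in bytes


def put(store: dict, block: bytes) -> bytes:
    """Store one block under its SHA-1 fingerprint."""
    fp = hashlib.sha1(block).digest()
    store.setdefault(fp, block)
    return fp


def write_tree(store: dict, data: bytes, block_size: int = 8192):
    """Write data as a tree of blocks; return (root fingerprint, depth)."""
    # Level 0: the data blocks themselves.
    fps = [put(store, data[i:i + block_size])
           for i in range(0, max(len(data), 1), block_size)]
    per_block = block_size // FP_SIZE  # fingerprints per pointer block
    depth = 0
    # Pack fingerprints into pointer blocks until one fingerprint remains.
    while len(fps) > 1:
        fps = [put(store, b"".join(fps[i:i + per_block]))
               for i in range(0, len(fps), per_block)]
        depth += 1
    return fps[0], depth


def read_tree(store: dict, root: bytes, depth: int) -> bytes:
    """Walk the tree from the root and reassemble the original data."""
    block = store[root]
    if depth == 0:
        return block
    children = [block[i:i + FP_SIZE] for i in range(0, len(block), FP_SIZE)]
    return b"".join(read_tree(store, fp, depth - 1) for fp in children)


store = {}
data = b"".join(i.to_bytes(4, "big") for i in range(256))  # 1024 bytes
root, depth = write_tree(store, data, block_size=64)
restored = read_tree(store, root, depth)
```

The application only has to remember the 20-byte root (plus, in this sketch, the depth) to retrieve everything beneath it.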

Applications - general

Tree cannot be modified, but

Unchanged sections of tree reused

More complex data structures

Mixing data & fingerprints in a block

e.g. structure for storing file system

3 types of blocks

Directory – file metadata + root fingerprint
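The reuse of unchanged sections can be illustrated by writing two versions of the same data and counting how many blocks the second version adds. This is a sketch under assumptions: the store is a plain dict, `write_tree` is a hypothetical helper, and a fan-out of 4 fingerprints per pointer block is chosen only to keep the numbers small.

```python
import hashlib


def write_tree(store, blocks, fanout=4):
    """Store data blocks plus pointer blocks; return the root fingerprint."""
    def put(block):
        fp = hashlib.sha1(block).digest()
        store.setdefault(fp, block)  # existing blocks are simply reused
        return fp

    fps = [put(block) for block in blocks]
    while len(fps) > 1:
        fps = [put(b"".join(fps[i:i + fanout]))
               for i in range(0, len(fps), fanout)]
    return fps[0]


store = {}
version1 = [bytes([i]) * 512 for i in range(64)]  # 64 distinct data blocks
root1 = write_tree(store, version1)
before = len(store)  # 64 data + 16 + 4 + 1 pointer blocks

version2 = list(version1)
version2[10] = b"edited block"  # change a single data block
root2 = write_tree(store, version2)
added = len(store) - before  # new data block + one pointer block per level
```

Neither tree is modified in place: the second write produces a new root, and only the changed block and the pointer blocks on its path to the root are new. Both `root1` and `root2` remain valid and share every other block.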




Example app: Vac

  • Similar to zip & tar – store collection of files & directories as single object

  • Tree of blocks formed for selected files

  • vac archive file - 45 bytes long

    • 20 bytes for root fingerprint

    • 25 bytes fixed header string

    • any amount of data compressed down to 45 bytes

  • Unvac – to restore file from archive

  • Duplicate copies of file coalesced on server

    • Multiple users vac same data – only 1 copy stored

    • vac on changed contents – only the changed blocks consume additional storage

Example app: Physical level backup

  • Copy the raw disk blocks to Venti

    • No need to walk file hierarchy

    • Higher throughput

  • Duplicate blocks are coalesced

  • Free block – null in pointer tree

  • User sees full backup of device

  • Storage space advantages of incremental backup retained

  • Random access possible

    • Directly mounting a backup file system image (w/ OS support)

    • Enables lazy (on demand) restore
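A rough sketch of the physical-level backup idea above: every raw block is copied to the store, free blocks become null entries, and any block can later be fetched on its own. The names `backup_device` and `read_block`, the dict store, and returning zeros for free blocks are illustrative assumptions.

```python
import hashlib


def backup_device(store, blocks, free):
    """Copy raw blocks into `store`; return one pointer per device block.

    Free blocks get None, standing in for the null entry in the pointer tree.
    """
    pointers = []
    for number, block in enumerate(blocks):
        if number in free:
            pointers.append(None)
            continue
        fp = hashlib.sha1(block).digest()
        store.setdefault(fp, block)  # duplicate blocks are coalesced
        pointers.append(fp)
    return pointers


def read_block(store, pointers, number, block_size):
    """Random access into a backup; free blocks read as zeros (assumption)."""
    fp = pointers[number]
    return bytes(block_size) if fp is None else store[fp]


BLOCK = 512
disk = [bytes([i % 7]) * BLOCK for i in range(16)]  # contains duplicates
free = {3, 9}
store = {}

monday = backup_device(store, disk, free)
stored_monday = len(store)

disk[5] = b"\xff" * BLOCK  # one block changes overnight
tuesday = backup_device(store, disk, free)
stored_tuesday = len(store)
```

Each pointer list looks like a full backup of the device, yet the second backup adds only the one changed block. `read_block` fetches a single block on demand, which is the access pattern a lazy restore or a mounted backup image would rely on.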

Example app - Plan 9 FS

  • Previously

    • Magnetic disk + write-once optical jukebox

    • Cache for faster access & to accumulate changes between snapshots

    • Cache smaller than active file system

    • Cache misses are painful

  • Plan 9 FS on top of Venti

    • Venti as storage device (instead of jukebox)

    • Equalizes access to both active & archival storage

    • Cache can also be smaller


  • Append-only log of data blocks

    • Stored on a RAID array

  • Separate index maps fingerprints to log locations

    • Can be regenerated from the data log
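The split between an append-only data log and a regenerable index can be sketched as below. The record layout (fingerprint, 4-byte length, data) and the class name `LogStore` are assumptions made for the example, not Venti's on-disk format.

```python
import hashlib
import struct


class LogStore:
    """Append-only log plus a separate index (illustrative layout)."""

    def __init__(self):
        self.log = bytearray()  # append-only data log
        self.index = {}         # fingerprint -> offset of its record in the log

    def write(self, data: bytes) -> bytes:
        fp = hashlib.sha1(data).digest()
        if fp not in self.index:
            self.index[fp] = len(self.log)
            # Assumed record layout: 20-byte fingerprint, 4-byte length, data.
            self.log += fp + struct.pack(">I", len(data)) + data
        return fp

    def read(self, fp: bytes) -> bytes:
        offset = self.index[fp] + 20
        (size,) = struct.unpack_from(">I", self.log, offset)
        return bytes(self.log[offset + 4:offset + 4 + size])

    def rebuild_index(self):
        """Regenerate the index by scanning the log from the start."""
        index, offset = {}, 0
        while offset < len(self.log):
            fp = bytes(self.log[offset:offset + 20])
            (size,) = struct.unpack_from(">I", self.log, offset + 20)
            index[fp] = offset
            offset += 24 + size
        self.index = index


s = LogStore()
fps = [s.write(block) for block in (b"alpha", b"beta", b"alpha", b"")]
saved = dict(s.index)

s.index = {}       # simulate losing the index
s.rebuild_index()  # recover it from the log alone
```

Because every record carries its own fingerprint, the log is the only data that must be protected; the index is purely an accelerator.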



[Figure: Venti server]

Venti’s data log

Append only section of data blocks (variable size)

Log divided into arenas for maintenance

For integrity checking & data recovery

  • Data compressed (LZ77)

  • encoding – whether the block is compressed and, if so, with which algorithm

  • esize – size after compression

For fast index rebuild & error recovery

Data log and directory grow in opposite directions

When an arena is filled, it is marked as sealed and its fingerprint is computed
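A sketch of one arena under stated assumptions: the header and directory layouts are invented for the example, and `zlib` (DEFLATE) stands in for the LZ77-based compression named on the slide. It shows the `encoding` and `esize` fields, the data log and directory growing toward each other, and sealing when the arena fills.

```python
import hashlib
import struct
import zlib

ENC_NONE, ENC_ZLIB = 0, 1
HEADER = struct.Struct(">20sBII")   # fingerprint, encoding, size, esize (assumed)
DIR_ENTRY = struct.Struct(">20sI")  # fingerprint, offset of the block (assumed)


class Arena:
    def __init__(self, capacity):
        self.buf = bytearray(capacity)
        self.log_end = 0           # data log grows upward from offset 0
        self.dir_start = capacity  # directory grows downward from the end
        self.sealed = False
        self.seal_fingerprint = None

    def append(self, data):
        """Append a block; return its offset, or None if the arena is full."""
        fp = hashlib.sha1(data).digest()
        packed, encoding = zlib.compress(data), ENC_ZLIB
        if len(packed) >= len(data):  # store raw if compression does not help
            packed, encoding = data, ENC_NONE
        record = HEADER.pack(fp, encoding, len(data), len(packed)) + packed
        room = self.dir_start - DIR_ENTRY.size - self.log_end
        if self.sealed or len(record) > room:
            self.seal()
            return None
        offset = self.log_end
        self.buf[offset:offset + len(record)] = record
        self.log_end += len(record)
        self.dir_start -= DIR_ENTRY.size
        DIR_ENTRY.pack_into(self.buf, self.dir_start, fp, offset)
        return offset

    def seal(self):
        """Mark the arena read-only and fingerprint its entire contents."""
        if not self.sealed:
            self.sealed = True
            self.seal_fingerprint = hashlib.sha1(self.buf).digest()

    def read(self, offset):
        fp, encoding, size, esize = HEADER.unpack_from(self.buf, offset)
        start = offset + HEADER.size
        body = bytes(self.buf[start:start + esize])
        data = zlib.decompress(body) if encoding == ENC_ZLIB else body
        if len(data) != size or hashlib.sha1(data).digest() != fp:
            raise ValueError("corrupt block")
        return data


arena = Arena(256)
first = arena.append(b"a" * 100)  # compressible, stored with ENC_ZLIB
while arena.append(bytes(range(64))) is not None:
    pass  # keep appending until the arena fills and seals itself
```

The per-block header is what supports integrity checking and recovery of individual blocks, while the directory at the far end lets an index rebuild skip over block contents.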

Venti’s index

  • No. of possible fingerprints >> no. of blocks stored on a server

  • Index as disk-resident hash table

  • Hashing function maps fingerprints to index buckets

  • Going through index is main performance penalty

    • Caching (block and index)

    • Index striped across multiple disks – fingerprint location in index is random & index is too big to fit in memory

    • Write buffering (to increase number of concurrent accesses)

  • Block cache hit – index lookup & data log access bypassed

  • Index cache hit – index lookup bypassed
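The read path through the two caches can be sketched as follows. Everything here is illustrative: `Server`, `LRU`, the bucket function (a fingerprint prefix modulo the bucket count), and the cache sizes are assumptions, and in-memory lists and dicts stand in for the on-disk log and index.

```python
import hashlib
from collections import OrderedDict


class LRU(OrderedDict):
    """Small least-recently-used cache."""

    def __init__(self, capacity):
        super().__init__()
        self.capacity = capacity

    def lookup(self, key):
        if key in self:
            self.move_to_end(key)
            return self[key]
        return None

    def insert(self, key, value):
        self[key] = value
        self.move_to_end(key)
        if len(self) > self.capacity:
            self.popitem(last=False)  # evict the least recently used entry


class Server:
    def __init__(self, n_buckets=1024):
        self.log = []                                  # stand-in for the data log
        self.buckets = [{} for _ in range(n_buckets)]  # stand-in for the disk index
        self.block_cache = LRU(64)
        self.index_cache = LRU(256)
        self.index_reads = 0  # counts simulated index disk accesses
        self.log_reads = 0    # counts simulated data log accesses

    def _bucket(self, fp):
        # Fingerprints are effectively random, so a prefix spreads them evenly.
        return self.buckets[int.from_bytes(fp[:4], "big") % len(self.buckets)]

    def write(self, data):
        fp = hashlib.sha1(data).digest()
        bucket = self._bucket(fp)
        if fp not in bucket:
            bucket[fp] = len(self.log)
            self.log.append(data)
        return fp

    def read(self, fp):
        data = self.block_cache.lookup(fp)
        if data is not None:
            return data            # block cache hit: skip index and log
        location = self.index_cache.lookup(fp)
        if location is None:
            self.index_reads += 1  # the expensive, random index access
            location = self._bucket(fp)[fp]
            self.index_cache.insert(fp, location)
        self.log_reads += 1
        data = self.log[location]
        self.block_cache.insert(fp, data)
        return data


server = Server()
fp = server.write(b"some block")
first = server.read(fp)   # misses both caches
second = server.read(fp)  # served from the block cache
```

After these two reads the counters show one index access and one log access. Evicting the block from `block_cache` and reading again would add a log access but no index access, which is the index-cache-hit case.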

Performance: computing environments

  • Testbed

    • Dedicated dual 550 MHz Pentium III, 2 GB memory

    • Access through a 100 Mbps Ethernet

    • Data log stored on a 500 GB MaxTronic IDE RAID 5

    • Index stored on 8 Seagate Cheetah 18XL 9 GB SCSI drives

  • Micro-benchmark results

Storage usage – file servers used

  • Two Plan 9 file servers, bootes and emelie

    • Spanning 1990 to 2001

  • 522 user accounts, 50-100 active at a time

  • Numerous development projects hosted

  • Several large data sets

    • Astronomical data, satellite imagery, multimedia files

  • Next figures

    • Size of active file system

    • Space consumed on jukebox

    • Space when using Venti

    • Ratio – using Venti reduces the cost of retaining snapshots (so ratio is smaller)

Storage saving

  • Three factors reduce the size of the jukebox data when it is stored on Venti

    • Elimination of duplicate blocks

    • Elimination of block fragmentation

    • Compression of block contents

Reliability & recovery

  • Tools for integrity checking & error recovery

    • Verifying structure of arena

    • Checking that there is an index entry for every block in the data log, and vice versa

    • Rebuilding index from data log

    • Copying arena to removable media

  • Data log on RAID 5 disk array

    • Protection against single drive failures

  • Off-site mirrors for extra protection, if that’s not enough …

  • Store sealed arenas on removable media

Future Work

  • Load balancing

    • Distribute Venti across multiple machines

    • Replicate server

    • Use of proxy server to hide it from client

  • Better access control

    • Currently just authentication to server

    • Single root fingerprint gives access to entire file tree

  • Use of variable sized blocks as in LBFS


  • Addressing block by SHA-1 hash of contents – good fit for archival storage

  • Magnetic disks as storage technology

    • Large capacity at low price – online storage

    • Random access

    • Performance comparable to non-archival data

  • Write once model

    • Reduces accidental or malicious data loss

    • Simplifies administration

    • Simplifies caching

    • Allows sharing of data