
A Low-Bandwidth Network File System

A Low-Bandwidth Network File System. A. Muthitacharoen, MIT; B. Chen, MIT; D. Mazieres, NYU. Key ideas: a network file system for slow or wide-area networks that exploits similarities between files or between versions of the same file.


Presentation Transcript


  1. A Low-Bandwidth Network File System A. Muthitacharoen, MIT B. Chen, MIT D. Mazieres, NYU

  2. Key Ideas • A network file system for slow or wide-area networks • Exploits similarities between files or versions of the same file • Avoids sending data that can already be found in the server’s file system or the client’s cache • Also uses conventional compression and caching • Requires 90% less bandwidth than traditional network file systems

  3. Working on slow networks • Make local copies • Must worry about update conflicts • Use remote login • Only practical for text-based applications • Use an LBFS instead • Better than remote login • Must deal with issues such as auto-saves blocking the editor for the duration of a transfer

  4. LBFS • Exploits cross-file similarities especially with previous versions of the same file • Auto-save files, … • LBFS file server divides the files it stores into chunks and indexes the chunks by hash value • LBFS client similarly indexes a large persistent file cache • LBFS never transfers chunks that the recipient already has

  5. Previous Work (I) • AFS callbacks require the server to notify clients when a cached file has been modified • Leases achieve the same goal but have an expiration time • Coda supports slow networks and even disconnected operation • Defers some updates to save bandwidth • OceanStore applies Bayou’s conflict resolution mechanisms to a file system

  6. Previous Work (II) • Operation-based updates (Lee et al.) • Proxy-client close to the server duplicates client computations in the hope of duplicating its output files • Spring and Wetherall propose to use two large cooperating caches storing identical copies of the last n megabytes of network traffic • Rsync uses directory tree mirroring at client and server.

  7. LBFS • LBFS provides close-to-open consistency • Similar to AFS session consistency • LBFS assumes clients will have a cache large enough to contain a user’s entire working set of files • When possible, LBFS reconstitutes files using chunks of existing data in the file system and client cache instead of transmitting those chunks over the network

  8. Indexing Issues • The major challenge is keeping the index a reasonable size while dealing with shifting offsets • Indexing conventional fixed-size file blocks would not work: an insertion shifts every subsequent block boundary, so none of the later blocks match • Indexing and hashing overlapping file blocks at all offsets would require too much space
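To see why conventional fixed-size blocks fail, the following Python sketch hashes 8 KB blocks of pseudo-random sample data before and after a one-byte insertion at the front. The block size, the sample data, and the function name `fixed_block_hashes` are assumptions chosen for illustration.

```python
import hashlib
import random

BLOCK = 8192  # hypothetical fixed block size (same as LBFS's expected chunk size)


def fixed_block_hashes(data: bytes) -> list[bytes]:
    """SHA-1 digest of each non-overlapping, fixed-size block."""
    return [hashlib.sha1(data[i:i + BLOCK]).digest()
            for i in range(0, len(data), BLOCK)]


rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(8 * BLOCK))  # 64 KB sample
edited = b"X" + original                  # insert a single byte at the front

before = fixed_block_hashes(original)
after = fixed_block_hashes(edited)
shared = set(before) & set(after)
print(f"{len(shared)} of {len(before)} blocks still match")
```

Every block boundary moves by one byte, so no block hash survives even though almost all of the content is unchanged.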

  9. LBFS Solution • Considers only non-overlapping chunks of files • Sets chunk boundaries based on file contents to avoid sensitivity to shifting file offsets • Examines every overlapping 48-byte region of the file to select boundary regions, or breakpoints, using Rabin fingerprints • Expected chunk size is 8 KB plus the size of the 48-byte breakpoint window
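The chunking step can be sketched in Python as follows. This is not LBFS’s implementation: it swaps in a simpler Karp-Rabin style polynomial rolling hash (modulo a prime) for true Rabin fingerprints, which are polynomials over GF(2). Following the LBFS paper’s description, a 48-byte window is a breakpoint when the low-order 13 bits of its fingerprint equal a chosen value, which gives the 8 KB (2^13 bytes) expected chunk size. The names `breakpoints` and `chunks` and the constants `MAGIC`, `BASE`, and `MOD` are hypothetical, and minimum/maximum chunk sizes are left out.

```python
import random

WINDOW = 48               # bytes per breakpoint window (from the slides)
MASK = (1 << 13) - 1      # low-order 13 bits -> expected chunk size 2**13 = 8 KB
MAGIC = 0x78              # hypothetical "chosen value" the low bits must equal
BASE = 257                # hypothetical polynomial base
MOD = (1 << 61) - 1       # hypothetical prime modulus (Mersenne prime 2**61 - 1)


def breakpoints(data: bytes) -> list[int]:
    """End offsets (exclusive) of content-defined chunks; the last is len(data)."""
    if len(data) < WINDOW:
        return [len(data)] if data else []
    top = pow(BASE, WINDOW - 1, MOD)      # weight of the byte leaving the window
    h = 0
    for b in data[:WINDOW]:               # hash of the first window
        h = (h * BASE + b) % MOD
    cuts = []
    for end in range(WINDOW, len(data) + 1):
        if end > WINDOW:                  # slide: drop data[end-49], add data[end-1]
            h = ((h - data[end - WINDOW - 1] * top) * BASE + data[end - 1]) % MOD
        if h & MASK == MAGIC:             # window data[end-48:end] is a breakpoint
            cuts.append(end)
    if not cuts or cuts[-1] != len(data):
        cuts.append(len(data))            # the end of the file always closes a chunk
    return cuts


def chunks(data: bytes) -> list[bytes]:
    """Split data at its content-defined breakpoints."""
    out, start = [], 0
    for end in breakpoints(data):
        out.append(data[start:end])
        start = end
    return out


rng = random.Random(42)
data = bytes(rng.getrandbits(8) for _ in range(200_000))
edited = data[:100_000] + b"a few inserted bytes" + data[100_000:]
orig, new = chunks(data), chunks(edited)
shared = set(orig) & set(new)
print(f"{len(shared)} of {len(orig)} chunks unchanged after the insertion")
```

Because each breakpoint depends only on the 48 bytes that end at it, chunks away from the edit keep their boundaries and hashes; only the chunk or two around the insertion point differ.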

  10. Handling Insertions

  11. More Indexing Issues • Pathological cases • Very small chunks • Sending the hashes of the chunks would consume as much bandwidth as just sending the file • Very large chunks • Cannot be sent in a single RPC • LBFS imposes minimum and maximum chunk sizes
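One way to apply a minimum and maximum is to post-process the candidate breakpoints, as in this Python sketch. The function name `enforce_limits` is hypothetical, and the 2 KB and 64 KB defaults are assumptions chosen for illustration. Candidates that would produce a too-small chunk are skipped, and a cut is forced whenever no breakpoint appears within the maximum size.

```python
MIN_CHUNK = 2 * 1024    # assumed minimum chunk size
MAX_CHUNK = 64 * 1024   # assumed maximum chunk size


def enforce_limits(candidates, length, min_size=MIN_CHUNK, max_size=MAX_CHUNK):
    """Adjust sorted candidate breakpoints so chunks respect the size limits.

    candidates: strictly increasing end offsets in (0, length].
    Every returned chunk is <= max_size; all but possibly the last are >= min_size.
    """
    cuts, start = [], 0
    for c in candidates:
        while c - start > max_size:       # no breakpoint soon enough: force one
            start += max_size
            cuts.append(start)
        if c - start >= min_size:         # accept only if the chunk is big enough
            cuts.append(c)
            start = c
    while length - start > max_size:      # tail after the last accepted cut
        start += max_size
        cuts.append(start)
    if start < length:                    # final (possibly short) chunk at EOF
        cuts.append(length)
    return cuts


print(enforce_limits([100, 5000, 6000, 200_000], 200_000))
```

Forced cuts at the maximum are not content-defined, so they lose some insertion resistance; that is acceptable because they only occur in unusual data with no natural breakpoint for a long stretch.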

  12. The Chunk Database • Indexes each chunk by the first 64 bits of its SHA-1 hash • To avoid synchronization problems, LBFS always recomputes the SHA-1 hash of any data chunk before using it • Simplifies crash recovery • Recomputed SHA-1 values are also used to detect hash collisions in the database
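A minimal Python sketch of such a database, assuming an in-memory dict stands in for both the on-disk index and the file system. It keys entries by the first 64 bits of the SHA-1 hash and recomputes the full SHA-1 on every lookup, so a stale entry (the file changed after indexing) or a 64-bit key collision is detected rather than used. `ChunkDB`, `key64`, and the `(path, offset, length)` entry layout are hypothetical.

```python
import hashlib


def key64(sha1_digest: bytes) -> int:
    """Index key: the first 64 bits of a SHA-1 digest."""
    return int.from_bytes(sha1_digest[:8], "big")


class ChunkDB:
    def __init__(self, files):
        self.files = files   # hypothetical path -> bytearray "file system"
        self.index = {}      # 64-bit key -> (path, offset, length)

    def add(self, path, offset, length):
        data = bytes(self.files[path][offset:offset + length])
        self.index[key64(hashlib.sha1(data).digest())] = (path, offset, length)

    def lookup(self, sha1_digest):
        """Return chunk data whose full SHA-1 matches sha1_digest, or None."""
        key = key64(sha1_digest)
        entry = self.index.get(key)
        if entry is None:
            return None
        path, offset, length = entry
        data = bytes(self.files.get(path, b"")[offset:offset + length])
        actual = hashlib.sha1(data).digest()   # always recompute before use
        if actual == sha1_digest:
            return data
        if key64(actual) != key:
            del self.index[key]   # data changed since indexing: drop stale entry
        return None               # otherwise a 64-bit collision with another chunk


files = {"notes.txt": bytearray(b"hello world")}   # hypothetical file contents
db = ChunkDB(files)
db.add("notes.txt", 0, 5)
digest = hashlib.sha1(b"hello").digest()
hit = db.lookup(digest)             # data still matches its hash
files["notes.txt"][0:5] = b"HELLO"  # file changed behind the database's back
stale = db.lookup(digest)           # recomputed SHA-1 no longer matches
```

Recomputing the hash means the index never has to be kept perfectly in sync with the file system, which is what simplifies crash recovery.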

  13. Protocol • Based on NFS version 3 • Adds • Extensions to exploit inter-file commonality (GETHASH) • Leases • Compresses all traffic using conventional gzip
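The read path can be sketched in Python with an in-process stand-in for the server. Here `gethash` plays the role of GETHASH, returning each chunk’s SHA-1 and size, and the client fetches only chunks whose hash is missing from its cache, with chunk data gzip-compressed on the wire. The class and method names, and reading chunks by index rather than by offset and count, are simplifying assumptions, not the LBFS RPC interface.

```python
import gzip
import hashlib


class Server:
    """Hypothetical in-memory server storing each file as a list of chunks."""

    def __init__(self, files):
        self.files = files                      # path -> list[bytes]

    def gethash(self, path):
        # Like GETHASH: describe the chunks (SHA-1, size) without sending data.
        return [(hashlib.sha1(c).digest(), len(c)) for c in self.files[path]]

    def read_chunk(self, path, index):
        # Chunk data is gzip-compressed before it goes on the wire.
        return gzip.compress(self.files[path][index])


def fetch(server, path, cache):
    """Reassemble a file, transferring only chunks absent from `cache`."""
    parts, wire_bytes = [], 0
    for i, (digest, size) in enumerate(server.gethash(path)):
        if digest not in cache:
            payload = server.read_chunk(path, i)
            wire_bytes += len(payload)
            cache[digest] = gzip.decompress(payload)
        if len(cache[digest]) != size:
            raise ValueError("cached chunk has unexpected size")
        parts.append(cache[digest])
    return b"".join(parts), wire_bytes


a, b = b"a" * 4096, b"b" * 4096
server = Server({"doc": [a, b]})
cache = {hashlib.sha1(a).digest(): a}       # client already has the first chunk
data, wire1 = fetch(server, "doc", cache)   # only chunk b's data crosses the wire
_, wire2 = fetch(server, "doc", cache)      # second fetch sends no chunk data
```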

  14. File Consistency (I) • Whenever a client makes any RPC on an LBFS file, it gets back a read lease on the file • If a user opens a file whose lease has expired, the client asks the server for the attributes of the file • This also grants the client a new lease on the file • The client can then check whether it has the current version of the file in its cache • If the file times have changed, the client must obtain the new contents of the file from the server
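The lease check on open can be sketched as follows in Python. The lease length, the `FakeServer`/`Client`/`FakeClock` names, and comparing only the modification time are assumptions; for brevity only GETATTR grants a lease here, whereas the slide says any RPC does.

```python
from dataclasses import dataclass

LEASE_SECONDS = 60.0  # hypothetical lease length


@dataclass
class FakeClock:
    now: float = 0.0


class FakeServer:
    """Hypothetical server holding path -> (mtime, data) and counting RPCs."""

    def __init__(self, files):
        self.files = dict(files)
        self.getattr_calls = 0
        self.read_calls = 0

    def getattr(self, path):
        """Return the file's mtime plus the duration of a new read lease."""
        self.getattr_calls += 1
        return self.files[path][0], LEASE_SECONDS

    def read(self, path):
        self.read_calls += 1
        return self.files[path][1]

    def update(self, path, mtime, data):
        self.files[path] = (mtime, data)


class Client:
    def __init__(self, server, clock):
        self.server, self.clock = server, clock
        self.cache = {}         # path -> (mtime, data)
        self.lease_until = {}   # path -> time the read lease expires

    def open(self, path):
        now = self.clock.now
        if path in self.cache and now < self.lease_until.get(path, float("-inf")):
            return self.cache[path][1]            # valid lease: trust the cache
        mtime, lease = self.server.getattr(path)  # no/expired lease: ask server
        self.lease_until[path] = now + lease
        cached = self.cache.get(path)
        if cached is not None and cached[0] == mtime:
            return cached[1]                      # file times unchanged
        data = self.server.read(path)             # file changed: fetch contents
        self.cache[path] = (mtime, data)
        return data


clock = FakeClock()
srv = FakeServer({"f": (1.0, b"v1")})
client = Client(srv, clock)
r1 = client.open("f")        # nothing cached: GETATTR + READ
r2 = client.open("f")        # lease valid: served from cache, no RPC
clock.now = 100.0            # lease has expired
r3 = client.open("f")        # GETATTR only; mtime unchanged
srv.update("f", 2.0, b"v2")
clock.now = 200.0
r4 = client.open("f")        # GETATTR shows a new mtime: READ again
```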

  15. File Consistency (II) • No need for write leases • LBFS provides close-to-open consistency • The server never demands back a dirty file • If multiple clients are writing the same file, the last one to close the file will overwrite changes from the others • File updates are atomic • Limits the damage caused by concurrent updates
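A Python sketch of these semantics, assuming an in-memory server: each client buffers its writes and sends the whole file on close, and the server installs it as one unit under a lock. Readers never see a half-written file, but the last client to close wins. All names are hypothetical.

```python
import threading


class Server:
    """Hypothetical in-memory server: each commit replaces a whole file at once."""

    def __init__(self):
        self._files = {}
        self._lock = threading.Lock()

    def commit(self, path, data):
        with self._lock:
            self._files[path] = bytes(data)   # atomic swap: no partial contents

    def read(self, path):
        with self._lock:
            return self._files.get(path, b"")


class OpenFile:
    """Client handle: writes stay local until close() ships the whole file."""

    def __init__(self, server, path):
        self.server, self.path = server, path
        self.buf = bytearray(server.read(path))

    def write(self, offset, data):
        end = offset + len(data)
        if end > len(self.buf):               # grow the file, zero-filling any gap
            self.buf.extend(b"\0" * (end - len(self.buf)))
        self.buf[offset:end] = data

    def close(self):
        self.server.commit(self.path, self.buf)


server = Server()
server.commit("f", b"base")
a, b = OpenFile(server, "f"), OpenFile(server, "f")
a.write(0, b"AAAA")
b.write(4, b"-bbb")
a.close()
after_a = server.read("f")     # a's version
b.close()
after_b = server.read("f")     # b closed last, so a's edit is lost
```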

  16. Security Issues • LBFS uses the SFS security infrastructure • Servers have public keys • Messages are encrypted • Specific security issue: • A user could check whether the file system contains a particular chunk of data by observing subtle timing differences in the server’s answers to CONDWRITE requests

  17. Implementation (I)

  18. Implementation (II) • Uses NFS • Two NFS-related issues • When the server commits a temporary file to a target file, it must copy the contents of the temporary file onto the target file to preserve the target file’s i-node • It is hard to preserve the previous contents of a truncated file • Message order is guaranteed by TCP
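The first NFS issue can be sketched in Python with ordinary files. Renaming the temporary file over the target (for example with `os.replace`) would give the target a new i-node; copying the bytes in place keeps it, so NFS file handles, which typically encode the i-node, keep working. The function name `commit_preserving_inode` is hypothetical, and unlike a rename this copy is not atomic.

```python
import os
import shutil
import tempfile


def commit_preserving_inode(tmp_path: str, target_path: str) -> None:
    """Copy tmp_path's bytes over target_path in place, then delete tmp_path."""
    with open(tmp_path, "rb") as src, open(target_path, "r+b") as dst:
        shutil.copyfileobj(src, dst)
        dst.truncate()              # cut off old bytes beyond the new end of file
        dst.flush()
        os.fsync(dst.fileno())      # make the new contents durable
    os.remove(tmp_path)


with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "target")
    tmp = os.path.join(d, "target.tmp")
    with open(target, "wb") as f:
        f.write(b"old contents that are longer")
    with open(tmp, "wb") as f:
        f.write(b"new")
    ino_before = os.stat(target).st_ino
    commit_preserving_inode(tmp, target)
    ino_after = os.stat(target).st_ino
    with open(target, "rb") as f:
        contents = f.read()
    tmp_gone = not os.path.exists(tmp)
```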

  19. Evaluation (I) • Commonality of data in /usr/local

  20. Evaluation (II) • Normalized bandwidth consumption (2 of 3 benchmarks)

  21. Key • The first four bars of each workload show upstream bandwidth, the second four downstream bandwidth • CIFS is the native Windows network file system • “Leases+Gzip” uses LBFS file caching, leases, and data compression but not its chunking scheme • “LBFS, new DB” is LBFS starting with a new database

  22. Evaluation (III) Normalized application times

  23. Key • Execution times were normalized • Measurements were made over a cable modem link with a 384 Kb/s uplink and a 1.5 Mb/s downlink • LAN data were obtained on a 100 Mb/s full-duplex LAN

  24. Conclusion • Under normal circumstances, LBFS consumes 90% less bandwidth than traditional file systems. • Makes transparent remote file access a viable and less frustrating alternative to running interactive programs on remote machines.
