FARSITE: Federated, Available, and Reliable Storage for an Incompletely Trusted Environment
Introduction • Farsite: a serverless distributed file system • Logically functions as a centralized file server • Designed for desktop environments • Requires some effort for initial configuration • Needs little central administration to maintain
Farsite Characteristics • Peer-to-peer among untrusted machines • Need to handle privacy, integrity, durability • Cryptography • Randomized replication • Byzantine fault-tolerance
Farsite Workloads • High access locality • Low update rate • Sequential accesses with rare concurrency
Administration • Machine certificates bind machines to their public keys • User certificates bind users to their public keys • Namespace certificates bind namespace roots to their managing machines
Design Assumptions • Designed for ~10^5 machines • All interconnected by a high-bandwidth, low-latency network • A majority of machines are up most of the time • Uncorrelated permanent machine failures • Read-mostly sharing • Few malicious users
Enabling Technology Trends • Increase in unused disk capacity • In 2000, 58% of disk capacity unused at Microsoft • Can replicate data for reliability • Decrease in the computational cost • Can easily encrypt at 53 MB/sec • Disk transfers at 32 MB/sec • Can use strong cryptography for security
Namespace Roots • Allow multiple roots for multiple machines
Trust and Certification • Based on public-key-cryptographic certificates • Encrypt(Key_public, text_plain) → text_cipher • Decrypt(Key_private, text_cipher) → text_plain • Encrypt(Key_private, text_plain) → text_cipher • Decrypt(Key_public, text_cipher) → text_plain
Public Key Encryption Basics • Idea • Public key is published • Private key is the secret • Encrypt(Key_my_public, “Hi, Andy”) • Anyone can create it, but only I can read it • Encrypt(Key_my_private, “I’m Andy”) • Everyone can read it, but only I can create it
Public Key Encryption Basics • Encrypt(Key_your_public, Encrypt(Key_my_private, “I know your secret”)) • Only you can read it, and only I can send it
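The nested Encrypt call above can be tried out with a toy example. This is a minimal sketch using textbook RSA with tiny hand-picked primes, which is insecure and is not taken from Farsite; the helper names `make_keypair` and `apply_key` are hypothetical, and the message is a small integer rather than text.

```python
# Toy textbook RSA with tiny primes: for illustration only, NOT secure,
# and not the cipher Farsite itself uses (that is an assumption-free zone:
# the slides do not name one).

def make_keypair(p, q, e):
    """Return (public_key, private_key) as (exponent, modulus) pairs."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # modular inverse of e (Python 3.8+)
    return (e, n), (d, n)

def apply_key(key, value):
    """Encrypt or decrypt: both are modular exponentiation in textbook RSA."""
    exponent, modulus = key
    return pow(value, exponent, modulus)

# "My" modulus (3233) is smaller than "your" modulus (8633), so the inner
# result always fits inside the outer encryption.
my_public, my_private = make_keypair(61, 53, 17)
your_public, your_private = make_keypair(89, 97, 5)

message = 42  # stands in for "I know your secret"

# Encrypt(Key_your_public, Encrypt(Key_my_private, message))
signed = apply_key(my_private, message)   # only I can create this
sealed = apply_key(your_public, signed)   # only you can read this

# The receiver reverses the two layers in the opposite order.
unsealed = apply_key(your_private, sealed)
recovered = apply_key(my_public, unsealed)
```

The inner layer proves who sent the message, because only the holder of my private key could have produced it; the outer layer keeps it private, because only the holder of your private key can strip it off.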
Basic System • Every machine has three roles • Client • A machine that interacts with a user • Directory group • A set of machines that manage files via a Byzantine-fault-tolerant protocol • Every group member owns a replica • File host
More on the Basic System • Pros: reliability, data integrity • Con: performance • A Byzantine-fault-tolerant protocol tolerates faults in fewer than 1/3 of its replicas • So it needs lots of replicas • Con: privacy • Con: storage consumption
System Enhancements • Local caching • A client can lease a copy of a file • Encrypt written files with public keys of all authorized clients • Offload those files to file hosts • Store only the content hash of those files locally • Can validate damaged copies • Can tolerate n – 1 file host failures
Traditional Byzantine Approach [CL99] • [Figure: a client reaches Byzantine servers, which hold both file data and metadata, through a Byzantine fault-tolerant protocol] • 3f + 1 file copies to handle f failures
Farsite: BFT only for meta-data • [Figure: a client reaches the directory group through a Byzantine fault-tolerant protocol; file contents live on file hosts] • f + 1 file copies for f failures
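The difference between the two replica counts is plain arithmetic, sketched below. The function names are hypothetical and exist only for illustration.

```python
def bft_replicas_needed(f: int) -> int:
    """Replicas a Byzantine-fault-tolerant group needs to survive f faulty members."""
    return 3 * f + 1

def file_host_replicas_needed(f: int) -> int:
    """File copies needed to survive f failed file hosts: one good copy must remain."""
    return f + 1

def bft_faults_tolerated(n: int) -> int:
    """Largest f such that 3f + 1 <= n, i.e. fewer than one third of n replicas."""
    return (n - 1) // 3

for f in (1, 2, 3):
    print(f, bft_replicas_needed(f), file_host_replicas_needed(f))
```

For f = 2, storing file contents under the Byzantine protocol would take 7 copies, while storing them on file hosts takes 3; only the much smaller metadata pays the 3f + 1 cost.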
Semantic Differences from NTFS • Hard limit on concurrent writes • Soft limit on concurrent reads • Sometimes supplies stale snapshots • No name-locking on an open file’s path
File System Features • Reliability • Availability • Security • Durability • Consistency • Scalability • Efficiency • Manageability
Reliability and Availability • Replication • When a machine is unavailable for an extended period • Its functions migrate to others • Caching
Privacy • File content and metadata are encrypted • Convergent encryption • Encrypt(Hash_one_way(block_plain), block_plain) → block_cipher • [Figure: each data block is hashed, and the hash is used as the key to encrypt that block]
More on Convergent Encryption • Block hashes are used to identify identical block contents • Block-level encryption allows block-level changes without re-encrypting the entire file
More on Convergent Encryption • Encrypt(Key_file, file_hashes_plain) → file_hashes_cipher • [Figure: the block hashes are encrypted with the file key]
More on Convergent Encryption • Encrypt(Key_client1_public, Key_file) → Key_file_cipher1 • Encrypt(Key_client2_public, Key_file) → Key_file_cipher2 • … • Store both the encrypted file and the encrypted keys
Directories • Also encrypted • Use exclusive encryption • Prevents a malicious client from encrypting a syntactically illegal name
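The core property of convergent encryption, that identical plaintext blocks produce identical ciphertext, can be demonstrated in a few lines. This is a minimal sketch: the SHA-256-based XOR keystream is a stand-in cipher chosen only because it needs nothing beyond the standard library, it is not what Farsite uses, and the function names are hypothetical.

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from `key` (toy stand-in cipher)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def xor_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with the keystream; applying it twice restores the input."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

def convergent_encrypt(block: bytes) -> tuple[bytes, bytes]:
    """Encrypt a block under the one-way hash of its own plaintext."""
    key = hashlib.sha256(block).digest()
    return key, xor_cipher(key, block)

def convergent_decrypt(key: bytes, cipher_block: bytes) -> bytes:
    return xor_cipher(key, cipher_block)

# Two users who hold the same block independently derive the same key and
# the same ciphertext, so the storage layer can spot the duplicate without
# ever seeing the plaintext.
block = b"shared application binary, block 0"
key_a, cipher_a = convergent_encrypt(block)
key_b, cipher_b = convergent_encrypt(block)
print(cipher_a == cipher_b)
print(convergent_decrypt(key_a, cipher_a) == block)
```

Because the key is derived from the plaintext, only someone who already has the block (or has been handed the key, encrypted under their public key) can decrypt it.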
Integrity • Use hash trees to compare files • If the roots match, the two files are identical • If not, compare the hashes at the next lower level • Repeat until the discrepancy is identified • The cost of an in-place update is logarithmic in the file size • Linear time to verify the integrity of individual blocks
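The top-down comparison described above can be sketched as follows. This is an illustrative sketch, not Farsite's code: it assumes SHA-256, two files with the same number of blocks, and a block count that is a power of two; `build_tree` and `first_difference` are hypothetical names.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_tree(blocks: list[bytes]) -> list[list[bytes]]:
    """Return the hash tree as a list of levels: leaf hashes first, root last.

    Assumes len(blocks) is a power of two to keep the sketch short.
    """
    level = [h(b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def first_difference(tree_a, tree_b):
    """Return the index of the first differing block, or None if the roots match."""
    if tree_a[-1][0] == tree_b[-1][0]:
        return None  # equal roots: files are identical
    index = 0
    # Walk down from just below the root, following the mismatching child.
    for depth in range(len(tree_a) - 2, -1, -1):
        left = 2 * index
        if tree_a[depth][left] != tree_b[depth][left]:
            index = left
        else:
            index = left + 1
    return index

original = [b"block0", b"block1", b"block2", b"block3"]
damaged = [b"block0", b"block1", b"blockX", b"block3"]
print(first_difference(build_tree(original), build_tree(damaged)))
```

Only one hash per level is compared on the way down, which is where the logarithmic cost comes from.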
Durability • Updates are logged and compressed locally • The log is pushed back to the directory group periodically and when a lease is recalled • Each log entry is verified
Consistency • Control can be loaned to clients • Content leases • Name leases • Mode leases • Access leases
Data Consistency • Content leases • Read/write • Read-only • Assures no stale data • Single-writer, multiple-reader semantics • A lease is kept until it expires or is recalled • Can lease a file, a directory, or a tree
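Single-writer, multiple-reader leasing can be modeled with a small table per file. The sketch below is a simplification under stated assumptions: the class and method names are hypothetical, lease expiry is left out, and a recall is only recorded in a list, whereas a real recall would also require the holder to push back its pending updates.

```python
class ContentLeases:
    """Tracks content leases for one file: many readers or one writer."""

    def __init__(self):
        self.readers = set()   # clients holding read-only leases
        self.writer = None     # client holding the read/write lease, if any
        self.recalled = []     # (client, lease kind) pairs, in recall order

    def acquire_read(self, client):
        if self.writer == client:
            return  # a read/write lease already covers reading
        if self.writer is not None:
            # Recall the writer's lease so the reader cannot see stale data.
            self.recalled.append((self.writer, "read/write"))
            self.writer = None
        self.readers.add(client)

    def acquire_write(self, client):
        # Recall every other reader before granting exclusive write access.
        for reader in sorted(self.readers - {client}):
            self.recalled.append((reader, "read-only"))
        self.readers.clear()
        if self.writer is not None and self.writer != client:
            self.recalled.append((self.writer, "read/write"))
        self.writer = client

leases = ContentLeases()
leases.acquire_read("alice")
leases.acquire_read("bob")
leases.acquire_write("carol")   # recalls alice's and bob's read-only leases
print(leases.writer, leases.recalled)
```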
Namespace Consistency • Name leases • Can create a file name • Can create a directory and its files and subdirectories
Windows File-Sharing Semantics • Mode leases • Read, write, delete, exclude-read, exclude-write, exclude-delete
Windows Deletion Semantics • Open it, mark it for deletion, close it • A file is not deleted until the last file close • Access leases • Public: the lease holder has the file open • Protected: no other client will be granted access without first contacting the lease holder • Private: no other client has any access lease on the file
Scalability • Hint-based pathname translation • Caching • Delayed directory-change notification
Space Efficiency • Reclaim space from duplicate files • Workgroup-shared documents • Multiple copies of common applications • Can save 50% of storage requirement • Based on hash comparisons
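Finding duplicates by hash comparison can be sketched as below. This works on plaintext held in memory purely for illustration; the function names and sample file names are hypothetical, and the sample data says nothing about the 50% figure quoted above.

```python
import hashlib
from collections import defaultdict

def find_duplicates(files: dict[str, bytes]) -> dict[str, list[str]]:
    """Group file names by content hash, keeping only groups with duplicates."""
    groups = defaultdict(list)
    for name, content in files.items():
        groups[hashlib.sha256(content).hexdigest()].append(name)
    return {digest: names for digest, names in groups.items() if len(names) > 1}

def bytes_reclaimable(files: dict[str, bytes]) -> int:
    """Bytes saved if every set of identical files is stored only once."""
    seen = set()
    saved = 0
    for content in files.values():
        digest = hashlib.sha256(content).digest()
        if digest in seen:
            saved += len(content)
        else:
            seen.add(digest)
    return saved

files = {
    "alice/report.doc": b"x" * 10,
    "bob/report.doc": b"x" * 10,    # same content as alice's copy
    "carol/notes.txt": b"y" * 5,
}
print(find_duplicates(files))
print(bytes_reclaimable(files))
```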
Time Efficiency • Insert a delay between a file's creation and its replication • Many files are expected to be deleted shortly after their creation • Reduces network traffic
Local-Machine Administration • Machine replacement • A special case of hardware failure • Little need for backup
Performance Measurements • Used only five machines… • With only 1 hour of file-system trace • 450,164 file operations • Reads, writes, and closes take 2 to 4 times as long as on NTFS • Opens take 9 times as long • Metadata accesses take 20 times as long • I/O latencies are 5.5 times higher