
A Locality Preserving Decentralized File System


Presentation Transcript


  1. A Locality Preserving Decentralized File System Jeffrey Pang, Suman Nath, Srini Seshan Carnegie Mellon University Haifeng Yu, Phil Gibbons, Michael Kaminsky Intel Research Pittsburgh

  2. Project Intro • Defragmenting DHT: data layout for objects in two tasks: • Improved availability for entire tasks • Amortize data lookup latency • Current DHT Data Layout: random placement • Defragmented DHT Data Layout: sequential placement

  3. Background • EXISTING DHT STORAGE SYSTEMS • Each server is responsible for a pseudo-random range of the ID space • Objects are given pseudo-random IDs (figure: example object IDs 324, 987, 160 assigned to servers owning ranges 150-210, 211-400, 401-513, 800-999)
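
A minimal sketch of this conventional layout, assuming a simple successor-style ring (the class and helper names here are illustrative, not from the paper): object IDs come from SHA-1, so two blocks of the same file usually land on unrelated servers.

```python
# Illustrative sketch (not the authors' code) of a conventional DHT layout:
# servers own contiguous ranges of the ID space and objects get
# pseudo-random SHA-1 IDs, so related blocks scatter across servers.
import bisect
import hashlib

def object_id(name: str) -> int:
    """Pseudo-random 160-bit object ID."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")

class Ring:
    def __init__(self, server_points):
        # Each server sits at a point on the ring and owns the IDs
        # between the previous point and its own (successor routing).
        self.points = sorted(server_points)

    def server_for(self, oid: int) -> int:
        i = bisect.bisect_left(self.points, oid) % len(self.points)
        return self.points[i]

ring = Ring(object_id(f"server-{i}") for i in range(8))
# Adjacent blocks of one file usually map to different servers.
for block in ("/home/bob/Mail/INBOX#0", "/home/bob/Mail/INBOX#1"):
    print(block, "-> server", hex(ring.server_for(object_id(block)))[:10])
```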

  4. Preserving Object Locality • Motivation • Fate sharing: all objects in a single operation are more likely to be available at once • Effective caching/prefetching: servers I’ve contacted recently are more likely to have what I want next • Design options: • Namespace locality (e.g., filesystem hierarchy) • Dynamic clustering (e.g., based on observed access patterns)

  5. Is Namespace Locality Good Enough?

  6. Encoding Object Names (figure: keys encode userid, path, and block ID; e.g., Bill = 6, Bob = 7, so blocks in Bill's Docs directory become adjacent keys 6.1.0…bid 1, 6.1.1…bid 1, 6.1.2…bid 2, which fall into contiguous server ranges 570-600, 601-660, 661-700)
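
As a toy illustration of this slide (Bill = 6 and Docs = 1 are taken from the figure; the tuple representation is only a stand-in for the real bit packing on slide 15), keys built from userid, path, and block ID keep related blocks next to each other in key order:

```python
# Illustrative only: tuples compare lexicographically, which is all the
# ordering argument needs; slide 15 gives the actual 480-bit layout.
def locality_key(userid: int, path_ids: list[int], block_id: int) -> tuple:
    return (userid, *path_ids, block_id)

bill_docs = [locality_key(6, [1], b) for b in range(3)]  # Bill = 6, Docs = 1
bob_docs = [locality_key(7, [1], b) for b in range(3)]   # Bob = 7
# Bill's blocks stay contiguous and precede Bob's in key order.
assert sorted(bill_docs + bob_docs) == bill_docs + bob_docs
```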

  7. Dynamic Load Balancing • Motivation • Hash function is no longer uniform • Uniform ID assignments to nodes lead to load imbalance • Design options: • Simple item balancing (MIT) • Mercury (CMU) (figure: storage load vs. node number; load balance with 1024 nodes using the Harvard trace)

  8. Results • How much improvement in availability and lookup latency can we expect? • What is the overhead of load balancing? • Setup • Trace-based simulation with Harvard trace • File blocks named using our encoding scheme • Same availability calculation as before • Clients keep open connections to 1-100 of the most recently contacted data servers • 1024 servers

  9. Potential Reduction in Lookups

  10. Potential Availability Improvement (figure: compares Random (expected), Ordered (unif), and Optimal placement) • Encoding has nearly identical failure probability to the “alphabetical” encoding (differs by ~0.0002)

  11. Data Migration Overhead

  12. Summary • Designed a DHT-based filesystem that preserves namespace locality • Potentially improves availability and lookup latency by an order of magnitude • Load balancing overhead is low • To do: complete the actual implementation and evaluation for NSDI

  13. Extra Slides

  14. Related Work • Namespace Locality: • Cylinder group allocation [FFS] • Co-locating data+meta-data [C-FFS] • Isolating user data in clusters [Archipelago] • Namespace flattening in object based storage [Self-*] • Load Balancing + Data Indirection: • DHT Item Balancing [SkipNets, Mercury] • Data Indirection [Total Recall]

  15. Example Encoding Object Names • Traditional DHT key encoding: 160 bits = SHA1(data) • Namespace locality preserving encoding: 480 bits = SHA1(pkey) (160 bits) . dir1 (16 bits) . dir2 (16 bits) . ... . file (16 bits) . block no. (64 bits) . ver. hash (64 bits) • Supports depth: 12, width: 65k, petabytes of data (b-tree-like 8kb blocks, hash(data) as version hash) • Example: intel.pkey /home/bob/Mail/INBOX → SHA1(intel.pkey) . 0000 . 0001 . 0004 . 0003 . 0000 . … • Leverage: • Large key space (amortized cost over wide-area is minimal) • Workload properties (e.g., 99% of the time directory depth < 12) • Corner cases: • Depth or width overflow: use 1 bit to signal overflow region and just use SHA1(filepath)
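
A sketch of this 480-bit layout as straight bit packing (zero-padding of unused path components and the masking details are assumptions; the overflow bit is omitted):

```python
import hashlib

DEPTH = 12        # max path components (dir1 ... file), 16 bits each
COMP_BITS = 16    # "width: 65k" names per directory level
BLOCK_BITS = 64   # block number within the file
VER_BITS = 64     # version hash

def encode_key(pkey: str, components: list[int], block_no: int, ver_hash: int) -> int:
    """Pack SHA1(pkey) | 12 x 16-bit path components | block no. | ver. hash into a 480-bit int."""
    assert len(components) <= DEPTH, "overflow case: fall back to SHA1(filepath)"
    key = int.from_bytes(hashlib.sha1(pkey.encode()).digest(), "big")  # 160 bits
    for i in range(DEPTH):
        comp = components[i] if i < len(components) else 0  # assumed zero padding
        key = (key << COMP_BITS) | (comp & (2**COMP_BITS - 1))
    key = (key << BLOCK_BITS) | (block_no & (2**BLOCK_BITS - 1))
    key = (key << VER_BITS) | (ver_hash & (2**VER_BITS - 1))
    return key

# /home/bob/Mail/INBOX under intel.pkey, path components 0000.0001.0004.0003
inbox = [0x0000, 0x0001, 0x0004, 0x0003]
k0 = encode_key("intel.pkey", inbox, block_no=0, ver_hash=0)
k1 = encode_key("intel.pkey", inbox, block_no=1, ver_hash=0)
assert k1 - k0 == 1 << VER_BITS   # consecutive blocks are adjacent in key space
assert k0.bit_length() <= 480
```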

  16. Handling Temporary Resource Constraints • Drastic storage distribution changes can cause frequent data movement • Node storage can be temporarily constrained (i.e., no more disk space) • Solution: • Lazy data movement • Node responsible for a key keeps a pointer to actual data blocks • Data blocks can be stored anywhere in system
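
A rough sketch of the lazy-movement idea, with invented class and method names: the node responsible for a key keeps only a pointer when it cannot hold the block itself.

```python
# Sketch only; details beyond the slide (capacities, spill-target choice,
# later migration back) are assumptions.
class Node:
    def __init__(self, name, capacity_blocks):
        self.name = name
        self.capacity = capacity_blocks
        self.blocks = {}    # key -> data held locally
        self.pointers = {}  # key -> name of the node actually holding the data

    def has_space(self):
        return len(self.blocks) < self.capacity

    def write(self, key, data, spill_target):
        if self.has_space():
            self.blocks[key] = data
        else:
            # Lazy data movement: store the block elsewhere and keep a
            # pointer; the block can migrate back when space frees up.
            spill_target.blocks[key] = data
            self.pointers[key] = spill_target.name

    def read(self, key, cluster):
        if key in self.blocks:
            return self.blocks[key]
        return cluster[self.pointers[key]].blocks[key]

full = Node("rep1", capacity_blocks=0)      # "NO SPACE!"
spare = Node("spare", capacity_blocks=100)
full.write("k", b"data", spill_target=spare)
print(full.read("k", {"spare": spare}))     # b'data', served via the pointer
```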

  17. Handling Temporary Resource Constraints (figure: a WRITE of data to replicas rep1, rep2, rep3 where rep1 reports NO SPACE; the block is stored on another node while rep1 keeps a pointer, as described above)

  18. Load Balancing Algorithm • Basic Idea: • Contact a random node in the ring • If myLoad > delta*hisLoad (or vice versa), the lighter node changes its ID to move before the heavier node • The heavy node's load splits in two • Node loads are within a factor of 4 in O(log(n)) steps • Mercury optimizations: • Continuous sampling of load around the ring • Use estimated load histogram to do informed probes
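
A simplified sketch of one balancing probe as described above (the delta value, ring size, and the handoff of the light node's old items to its successor are modeled loosely here, not taken from the paper):

```python
import random

DELTA = 2.0  # imbalance threshold; placeholder value

def balance_step(nodes):
    """One item-balancing probe. nodes: list of dicts with 'id' and 'load'."""
    a, b = random.sample(nodes, 2)
    heavy, light = (a, b) if a["load"] >= b["load"] else (b, a)
    if heavy["load"] <= DELTA * light["load"]:
        return
    # The light node hands its current items to its successor on the ring...
    others = [n for n in nodes if n is not light]
    successor = min(others, key=lambda n: (n["id"] - light["id"]) % 2**32)
    successor["load"] += light["load"]
    # ...then re-inserts itself just before the heavy node and takes half
    # of the heavy node's items, splitting the heavy node's load in two.
    light["load"], heavy["load"] = heavy["load"] // 2, heavy["load"] - heavy["load"] // 2
    light["id"] = (heavy["id"] - 1) % 2**32

nodes = [{"id": random.randrange(2**32), "load": random.randint(0, 1000)} for _ in range(16)]
for _ in range(500):
    balance_step(nodes)
print(max(n["load"] for n in nodes) / max(1, min(n["load"] for n in nodes)))
```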

  19. Load Balance Over Time
