
NCFS: On the Practicality and Extensibility of a Network-Coding-Based Distributed File System

Yuchong Hu (1), Chiu-Man Yu (2), Yan-Kit Li (2), Patrick P. C. Lee (2), John C. S. Lui (2). (1) Institute of Network Coding, (2) Department of Computer Science and Engineering, The Chinese University of Hong Kong


Presentation Transcript


  1. NCFS: On the Practicality and Extensibility of a Network-Coding-Based Distributed File System. Yuchong Hu (1), Chiu-Man Yu (2), Yan-Kit Li (2), Patrick P. C. Lee (2), John C. S. Lui (2). (1) Institute of Network Coding, (2) Department of Computer Science and Engineering, The Chinese University of Hong Kong. NetCod '11

  2. NCFS Overview • NCFS: Network-coding-based distributed file system • Use network coding to speed up repair while preserving fault tolerance of storage • No intelligence required on storage nodes • Data is transparently striped across nodes

  3. NCFS Overview • NCFS is a file system • Organizes storage in a hierarchical namespace • Manages file namespace metadata (e.g., filenames, directories, access rights) • Clients only see a mounted drive • Similar to Linux/Windows file systems. Example session:

    # Log in as root (currently NCFS needs root access)
    sudo -i
    # Mount the file system on ./mountdir
    ./ncfs rootdir mountdir
    # Copy a file to NCFS
    time cp test.dat ./mountdir

  4. Why Do We Build NCFS? • To develop a proof-of-concept prototype for experimenting with erasure codes and regenerating codes in practice • We by no means claim it is ready for large-scale use • To provide a storage interface between user applications and distributed storage backends • To provide an extensible platform for future systems research

  5. Definitions • A data stream is divided into block groups, each with m native blocks • The m native blocks are encoded to form c code blocks (also called parity blocks) • Native and code blocks are laid out in segments (aka stripes) across an array of n disks D1, D2, ..., Dn [Figure: a segment of native and code blocks striped across an array of n disks]

  6. MDS Codes • A maximum distance separable code, denoted MDS(n, k), has the property that any k (< n) out of the n nodes can be used to reconstruct the original native blocks • i.e., up to n-k disk failures can be tolerated • Example: RAID-5 is an MDS(n, n-1) code, as it can tolerate 1 disk failure

  7. Repair Degree • Repair degree d • Consider the case where one disk fails • The lost blocks of the failed disk are repaired by retrieving data from d surviving disks • The design of an MDS code is determined by the parameters (n, k, d, m, c)

  8. Classes of MDS Codes • Erasure codes • Stripe data with redundancy • e.g., RAID-like codes, array codes • Regenerating codes • Stripe data with redundancy and reduce the repair bandwidth based on network coding • We focus on optimal erasure codes (i.e., those that achieve the MDS property); near-optimal erasure codes require slightly more than k blocks to recover the original data. Dimakis et al., "A Survey on Network Codes for Distributed Storage", Proc. of the IEEE, March 2011

  9. Erasure Codes • RAID-5 • Define + as the XOR operator • Code block: c1 = m1 + m2 + m3, i.e., the XOR-sum of the native blocks in the same segment • Parameters: n = any value ≥ 2, k = n - 1, d = k, m = n - 1, c = 1 [Figure: a data stream of native blocks m1, m2, m3, ... striped across disks D1 to D4, with one code block per segment and the parity position rotated across disks]
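To make the XOR arithmetic concrete, here is a minimal Python sketch of the RAID-5 encode and repair logic described above (an illustrative toy, not NCFS's actual implementation; block placement and segment bookkeeping are omitted):

    from functools import reduce

    def xor_blocks(blocks):
        """XOR a list of equal-length byte blocks (the '+' operator above)."""
        return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

    def raid5_encode(natives):
        """Encode the m = n-1 native blocks of a segment into code block c1."""
        return xor_blocks(natives)

    def raid5_repair(surviving):
        """Any lost block of a segment is the XOR of the n-1 surviving blocks."""
        return xor_blocks(surviving)

    # Example with n = 4 disks (m = 3 native blocks per segment):
    m1, m2, m3 = b"\x01\x02", b"\x10\x20", b"\x0f\x0f"
    c1 = raid5_encode([m1, m2, m3])
    assert raid5_repair([m1, m3, c1]) == m2  # recover m2 after one disk fails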

  10. Erasure Codes • RAID-6 • Code blocks: c1 = m1 + m2 (XOR-sum of the native blocks in the same segment) and c2 = m1 + 2 * m2 (a linear combination based on Reed-Solomon codes) • Parameters: n = any value ≥ 3, k = n - 2, d = k, m = n - 2, c = 2 [Figure: a data stream of native blocks striped across disks D1 to D4, with two code blocks per segment and the parity positions rotated across disks] Intel, "Intelligent RAID 6 Theory Overview and Implementation", 2005.
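The "2 * m2" term is multiplication in a Galois field, not integer arithmetic. Below is a hedged Python sketch using GF(2^8) with the common generator polynomial 0x11D (an assumption for illustration; the cited Intel reference covers the standard RAID-6 field, but the exact field NCFS uses is not stated here):

    def gf_mul(a, b, poly=0x11D):
        """Multiply two bytes in GF(2^8): carry-less multiply, reduce mod poly."""
        r = 0
        while b:
            if b & 1:
                r ^= a
            a <<= 1
            if a & 0x100:
                a ^= poly
            b >>= 1
        return r

    def raid6_encode(m1, m2):
        """Per-byte RAID-6 parities: c1 = m1 + m2, c2 = m1 + 2*m2 in GF(2^8)."""
        c1 = bytes(x ^ y for x, y in zip(m1, m2))
        c2 = bytes(x ^ gf_mul(2, y) for x, y in zip(m1, m2))
        return c1, c2

Recovering from two simultaneous failures inverts the corresponding 2x2 system over GF(2^8), which additionally requires field division.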

  11. Regenerating Codes • Regenerating codes are MDS codes that achieve the optimal tradeoff between storage and repair bandwidth • Two extreme points of the tradeoff curve: • MBR (minimum bandwidth regenerating) codes • MSR (minimum storage regenerating) codes Dimakis et al., "Network Coding for Distributed Storage Systems", IEEE Trans. on Info. Theory, 2010

  12. Regenerating Codes • E-MBR: minimum bandwidth regenerating codes with exact repair • Minimize repair bandwidth while maintaining the MDS property • Idea: • Assume d = n - 1 (repair retrieves data from the n - 1 surviving disks during a single-node failure) • Make a duplicate copy of each native/code block • To repair a failed disk, download exactly one block per segment from each surviving disk Rashmi et al., "Explicit Construction of Optimal Exact Regenerating Codes for Distributed Storage", Allerton 2009.

  13. Regenerating Codes • E-MBR(n, k=n-1, d=n-1) • Duplicate each native block; a block and its duplicate are placed on different disks • Parameters: n = any value ≥ 2, k = n - 1, d = n - 1, m = n(n-1)/2, c = 0 (i.e., no code blocks) [Figure: native blocks m1 to m12 and their duplicates placed across disks D1 to D4, with each block stored on a distinct pair of disks]
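One way to realize this layout, consistent with the figure on this slide, is to assign each of the n(n-1)/2 native blocks in a block group to a distinct pair of disks (the block plus its duplicate). A small Python sketch (illustrative, not NCFS's actual placement code):

    from itertools import combinations

    def embr_placement(n):
        """E-MBR(n, k=n-1, d=n-1): map each native block of a block group
        to the distinct pair of disks holding it and its duplicate."""
        return {f"m{i + 1}": pair
                for i, pair in enumerate(combinations(range(n), 2))}

    print(embr_placement(4))
    # {'m1': (0, 1), 'm2': (0, 2), 'm3': (0, 3),
    #  'm4': (1, 2), 'm5': (1, 3), 'm6': (2, 3)}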

  14. Regenerating Codes • E-MBR(n, k=n-2, d=n-1) • Code block: c1 = m1 + m2 + m3 + m4 + m5 • Parameters: n = any value ≥ 3, k = n - 2, d = n - 1, m = (n-2)(n+1)/2, c = 1 [Figure: native blocks m1 to m5 and code block c1, each duplicated, placed across disks D1 to D4]

  15. Regenerating Codes • E-MBR(n, k, d=n-1) for general n and k • Each native/code block still has a duplicate copy • Code blocks are formed by Reed-Solomon codes • Parameters: n = any value ≥ k+1, k = any value, d = n - 1, m = k(2n-k-1)/2, c = (n-k)(n-k-1)/2
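A quick sanity check of the general formulas against the two special cases on the previous slides (a minimal sketch):

    def embr_params(n, k):
        """Native (m) and code (c) blocks per block group for E-MBR(n, k, d=n-1)."""
        m = k * (2 * n - k - 1) // 2
        c = (n - k) * (n - k - 1) // 2
        return m, c

    assert embr_params(4, 3) == (6, 0)  # k = n-1: m = n(n-1)/2, no code blocks
    assert embr_params(4, 2) == (5, 1)  # k = n-2: m = (n-2)(n+1)/2, one code block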

  16. Regenerating Codes • Repair in E-MBR: download one block per segment from each surviving disk [Figure: E-MBR(n, k=n-1, d=n-1) layout over disks D1 to D4; repairing a failed disk reads each lost block's duplicate from the surviving disks]
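Continuing the placement sketch from slide 13: every lost block has its duplicate on exactly one surviving disk, so the repair reads exactly one block from each of the n - 1 surviving disks and needs no decoding (illustrative Python, reusing the hypothetical embr_placement above):

    def embr_repair_reads(n, failed):
        """For E-MBR(n, k=n-1, d=n-1): which surviving disk supplies the
        duplicate of each block lost on the failed disk."""
        reads = {}
        for blk, (d1, d2) in embr_placement(n).items():
            if failed in (d1, d2):
                reads[blk] = d2 if failed == d1 else d1
        return reads

    print(embr_repair_reads(4, failed=0))
    # {'m1': 1, 'm2': 2, 'm3': 3} -- one block from each surviving disk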

  17. Regenerating Codes • Other regenerating codes (e.g., MSR) typically require storage nodes to perform encoding • This assumption may be too strong for some types of storage nodes • e.g., block devices such as hard disks

  18. Codes That We Consider • We consider the following MDS codes in this talk: • Erasure codes: RAID • Regenerating codes: E-MBR(d = n-1) • Neither RAID nor E-MBR(d = n-1) codes require encoding capabilities in the disks

  19. NCFS Architecture • NCFS is a proxy-based file system that interconnects different types of storage nodes • Both client-side and server-side deployments are feasible [Figure: client-side deployment, with the NCFS file client mounted at /mnt/ncfs and connected to the storage nodes over the network]

  20. NCFS Architecture [Figure: server-side deployment, with the file client reaching NCFS and the storage nodes at /mnt/ncfs over the network]

  21. NCFS Architecture • Repair through NCFS: (1) read data from the surviving nodes, (2) regenerate the lost data blocks, (3) write the regenerated blocks to a new node [Figure: NCFS reads blocks A and B from the surviving storage nodes, regenerates C = A + B lost with the failed node, and writes it to the new node]
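The three repair steps can be sketched as a proxy-side loop. This is a schematic toy in Python with nodes modeled as in-memory dicts; it is not NCFS's actual interface:

    def repair_node(surviving, new_node, segments, regenerate):
        """Proxy-side repair: read from surviving nodes, regenerate the lost
        block, write it to the new node. Nodes are dicts {segment: block};
        regenerate() is the code-specific decoder (e.g., XOR-sum for RAID-5)."""
        for seg in range(segments):
            blocks = [node[seg] for node in surviving]  # step 1: read
            new_node[seg] = regenerate(blocks)          # steps 2-3: regenerate, write

    # Toy example matching the slide's C = A + B:
    A, B = {0: 0b0101}, {0: 0b0011}
    C = {0: A[0] ^ B[0]}  # the parity node, which then fails
    new = {}
    repair_node([A, B], new, segments=1, regenerate=lambda bs: bs[0] ^ bs[1])
    assert new[0] == C[0]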

  22. Layered Design of NCFS • NCFS uses a layered design to allow extensibility • File system layer: implemented in FUSE, which provides all file system semantics and repair operations • Coding layer: RAID-5, RAID-6, MBR, MSR • Storage layer: interface to the storage nodes [Figure: the three layers sitting between the mount point /mnt/ncfs and the storage nodes]

  23. Experiments • Goal: empirically evaluate different coding schemes under NCFS • Upload/download throughput • Degraded download throughput • Repair throughput • NCFS deployment: an Intel quad-core machine running Linux

  24. Topologies • Three network topologies are considered: • 4-node switch setting (NAS nodes behind a 1Gb switch): tests basic functionality • 8-node switch setting: tests scalability • 4-node LAN setting (PCs over a departmental LAN): tests bandwidth bottlenecks [Figure: the three topologies, each with NCFS connected to NAS or PC storage nodes]

  25. Experiment: Normal Mode • Normal usage (i.e., all nodes work) • Upload a 256-MB file • Download a 256-MB file • Different coding schemes have different storage costs • e.g., actual storage for n = 4: RAID-5: 341 MB; RAID-6: 512 MB; E-MBR(k = n - 1): 512 MB; E-MBR(k = n - 2): 614 MB
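These figures follow from each code's per-block-group overhead; a quick check (our derivation from the parameters on the earlier slides, not taken from the paper):

    def storage_cost(file_mb, native, stored):
        """Actual storage = file size * (blocks stored per group / native blocks)."""
        return file_mb * stored / native

    print(storage_cost(256, 3, 4))   # RAID-5: 4 blocks stored per 3 native -> ~341 MB
    print(storage_cost(256, 2, 4))   # RAID-6: 4 per 2 -> 512 MB
    print(storage_cost(256, 6, 12))  # E-MBR(k=n-1): every block duplicated -> 512 MB
    print(storage_cost(256, 5, 12))  # E-MBR(k=n-2): (5 native + 1 code) x 2 -> ~614 MB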

  26. Normal Upload Throughput • E-MBR(k=n-1) has the highest upload throughput because it involves no code-block updates, even though it uploads more data

  27. Upload Time Breakdown [Figure: upload time breakdown in the 4-node switch setting]

  28. Experiment: Repair • Recover all lost data to a new node • Lost data includes native blocks and redundancy • Measure the effective throughput Teff = (1 - f) * N / t, where f = fraction of redundant blocks, N = size of all lost blocks, and t = time for recovery [Figure: a new node attached to NCFS through the 1Gb switch in the 4-node switch setting]
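As a worked example of the metric (the numbers below are made up for illustration, not measured results):

    def effective_throughput(f, lost_mb, seconds):
        """Teff = (1 - f) * N / t: counts only the native share of recovered data."""
        return (1 - f) * lost_mb / seconds

    # e.g., RAID-5 over n = 4 disks with rotated parity: about 1/4 of a failed
    # disk's blocks are redundant, so f = 0.25 (hypothetical timing below).
    print(effective_throughput(f=0.25, lost_mb=85, seconds=10))  # 6.375 MB/s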

  29. Repair Throughput • E-MBR has higher repair throughput, which conforms to the theory [Figure: repair throughput when repairing a single-node failure]

  30. Repair Time Breakdown [Figure: repair time breakdown when repairing a single-node failure in the 4-node switch setting]

  31. Experiment Summary • We demonstrate how NCFS can be used to test different coding schemes • E-MBR improves not only repair throughput but also normal upload/download throughput • Trade-off: higher storage overhead • More coding schemes can be plugged into NCFS

  32. Open Issues • Performance optimization • Caching strategy • Grouping multiple block accesses into one request • Parallelization • Security • Confidentiality • Byzantine fault tolerance • Pollution resilience

  33. NCFS References • Source code: • http://ansrlab.cse.cuhk.edu.hk/software/ncfs
