1 / 32

NOVA: A High-Performance, Fault-Tolerant File System for Non-Volatile Main Memories

Learn about NOVA, a high-performance and fault-tolerant file system designed for non-volatile main memories. This system ensures data integrity, provides direct access, and offers fast atomic operations.

ebirney
Download Presentation

NOVA: A High-Performance, Fault-Tolerant File System for Non-Volatile Main Memories

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. NOVA: A High-Performance,Fault-Tolerant File System for Non-Volatile Main Memories Andiry Xu, Lu Zhang, AmirsamanMemaripour, AkshathaGangadharaiah, Amit Borase, Tamires Brito Da Silva, Andy Rudoff (Intel), Steven Swanson Non-Volatile Systems Laboratory Department of Computer Science & Engineering University of California, San Diego

  2. Non-volatile main memory and DAX • Non-volatile main memory (NVMM) • Byte-addressable, durable memory • Latency and bandwidth close to DRAM • Battery-backed DRAM, PCM, 3D XPoint • DAX: direct access (from user-space) • Load/store access • Bypass page-cache • No kernel code in the way

  3. Expectations on NVMM file systems DAX Fault Tolerance Direct Access POSIX I/O Atomicity Speed

  4. DAX Ext4 XFS BtrFS F2FS ✔ ✔ Fault Tolerance Direct Access ❌ ❌ ❌ POSIX I/O Atomicity Speed

  5. DAX PMFSExt4-DAX XFS-DAX ✔ ✔ 3D XPoint NVDIMM DRAM + Flash NVDIMM ✔ ❌ ❌ Fault Tolerance Direct Access POSIX IO Atomicity Speed

  6. DAX NOVA FAST’16 3D XPoint NVDIMM DRAM + Flash NVDIMM ✔ ✔ ✔ ✔ Fault Tolerance Direct Access ❌ POSIX IO Atomicity Speed

  7. DAX NOVA-Fortis SOSP’2017 3D XPoint NVDIMM DRAM + Flash NVDIMM ✔ ✔ ✔ ✔ ✔ Fault Tolerance Direct Access POSIX IO Atomicity Speed

  8. Outline • NOVA architecture • Atomicity • Fault tolerance • Evaluations • Conclusions

  9. NOVA: Log Structured file system For NVMM • Scalable per-inode log design • Inode keeps head/tail pointers • Log contains operation entries • Data pages stored separately CPU 0 CPU 1 Inode table Inode Inode log Head Tail Data 0 Data 1 Data 2

  10. Atomicity: Single log appending • Log-structuring for atomic single log appending • Write, chmod, msync etc. • Atomically update the 8-byte tail • Lower overhead than journaling and shadow paging append(tail, entry); persist(); *tail = new_offset; persist(); Tail Tail Inode log

  11. Atomicity: Lightweight journaling • Lightweight journaling for update across logs • Create, rename, etc • Only journal inode tails instead of metadata or data Directory log Tail Tail Tail Tail File log Dir tail Journal File tail

  12. Atomicity: Copy-on-write • Copy-on-write for atomic file data writes • Data pages stored separately • Recycle pages when overwritten File log Tail Tail Data 0 Data 1 Data 1 Data 2

  13. Fault-tolerance • Snapshot: Point-in-time backups • Data integrity: Continuously detect and repair errors

  14. Snapshot for file access • Global snapshot ID • Saved to each log entry • Checked before recycling file pages Snapshot ID 0 1 Snapshot 0 0 Page 1 1 Page 1 1 Page 1 Inode log Data Data Data Data File write entry Epoch ID Data in snapshot Current data

  15. Snapshot for DAX: Idea • Set pages read-only, then copy-on-write Applications: no file system intervention File system: DAX-mmap() File data: RO

  16. Snapshot for DAX: Incorrect implementation • Application invariant: if V is True, then D is valid Snapshot values Application values • NOVA • snapshot Application thread D = ?; V = False; D = 5; V = True; snapshot_begin(); set_read_only(page_d); copy_on_write(page_d); set_read_only(page_v); snapshot_end(); ? 5 5 ? ? F F T T T ? page fault

  17. Snapshot for DAX: Correct implementation • Application invariant: if V is True, then D is valid Snapshot values Application values • NOVA • snapshot Application thread D = ?; V = False; D = 5; V = True; snapshot_begin(); set_read_only(page_d); set_read_only(page_v); snapshot_end(); copy_on_write(page_d); copy_on_write(page_v); 5 5 ? ? ? F T F F F ? page fault

  18. Application performance during snapshot • Two kinds of applications • File access • DAX-mmap • Normal execution v.s. taking a snapshot every 10 seconds File access DAX-mmap <1% throughput loss average 3.7% throughput loss

  19. NVMM failure modes • Detectable by hardware • Detectable & correctable • Detectable & uncorrectable • Undetectable by hardware • Undetectable media errors • “Scribbles” due to kernel bugs

  20. NVMM failure modes: Media failures • Detectable by hardware • Detectable & correctable • Detectable & uncorrectable • Undetectable by hardware • Undetectable media errors • “Scribbles” due to kernel bugs Software: NVMM Ctrl.: Consumes good data Read Detects & corrects errors NVMM data: Media error

  21. NVMM failure modes: Media failures • Detectable by hardware • Detectable & correctable • Detectable & uncorrectable • Undetectable by hardware • Undetectable media errors • “Scribbles” due to kernel bugs Software: NVMM Ctrl.: Receives MCE Read Detects uncorrectable errors Raises exception NVMM data: Media error

  22. NVMM failure modes: Media failures • Detectable by hardware • Detectable & correctable • Detectable & uncorrectable • Undetectable by hardware • Undetectable media errors • “Scribbles” due to kernel bugs Software: NVMM Ctrl.: Consumes corrupted data Read Sees no error NVMM data: Media error

  23. NVMM failure modes: Scribbles • Detectable by hardware • Detectable & correctable • Detectable & uncorrectable • Undetectable by hardware • Undetectable media errors • “Scribbles” due to kernel bugs Software: NVMM Ctrl.: Bug code scribbles NVMM Write Updates ECC NVMM data: Scribble error

  24. Capture machine checks • Copy (meta)data from NVMM to DRAM • memcpy with exception handling • Check memcpy return code

  25. NOVA metadata protection • Checksum + replication • Per-inode, per-log-entry checksum • Replicate all metadata structures Head Tail replica inode inode C1 Entn Cn Ent1 … Head’ Head1 Head1 Tail1 Tail’ Tail1 csum’ csum csum Head2 Head2 Tail2 Tail2 Ent1 C1 … Entn Cn Data 1 Data 2 Head Tail csum

  26. NOVA tick-tock metadata updates • Always have an fail-safe version of metadata Old Updating New replicainode Head1 Tail1 csum Head2 Tail2 inode Head1 Tail1 csum Head2 Tail2

  27. NOVA data integrity • Checksum + parity • Sub-page per-strip checksums • One parity strip for each page Data 1 Data 2 1 Page (8 strips) P = S0..7 S0 S1 S2 S3 S4 S5 S6 S7 P Ci = CRC32(Si) Replicated Ci

  28. Application performance on NOVA

  29. Application performance on NOVA

  30. Application performance on NOVA

  31. Conclusions • NOVA’s multi-log design achieves high performance, strong consistency, and fault tolerance on NVMM • NOVA supports consistent snapshot with DAX-mmap() • NOVA outperforms existing file systems while providing stronger atomicity and data integrity https://github.com/NVSL/linux-nova

  32. Thank you! https://github.com/NVSL/linux-nova

More Related