Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008

Presentation Transcript


  1. Transactional Flash. V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008. Presented by Shimin Chen, Big Data Reading Group

  2. Introduction • SSDs expose the same block-level API as disks • Lost opportunities • Goal: new abstractions that better match the nature of the new medium and the needs of file systems and databases

  3. Idea: Transactional Flash (TxFlash) • An SSD (w/ new features) • Addressing: a linear array of pages • Supports read and write operations • Supports a simple transactional construct • Each tranx consists of a series of write operations • Atomicity • Isolation • Durability

  4. Why is this useful? • Transaction abstraction required in many places: file system journals, etc. • Each application implements its own • Complexity • Redundant work • Reliability of the implementation • Great if the storage layer provides a transactional API

  5. Previous Work: disk-based • Copy-on-Write + Logging • Fragmentation → poor read performance • Checkpointing and cleaning • Cleaning cost • SSDs mitigate these problems • SSDs already do CoW for flash-related reasons • Random read accesses are fast

  6. Outline • Introduction • The Case for TxFlash • Commit Protocols • Implementation • Evaluation • Conclusion

  7. TxFlash Architecture & API • WriteAtomic(p1…pn): p1…pn are in a tranx; followed by write(p1)…write(pn); atomicity, isolation, durability • Abort: aborts an in-progress tranx • Core of TxFlash: in-progress tranx; conflicting writes are not issued

  8. Simple Interface • WriteAtomic: multi-page writes • Useful for file systems • Not full-fledged tranx: no reads in tranx • Reduce complexity • Backward compatible
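The WriteAtomic / write / Abort interface from slides 7 and 8 can be pictured with a toy in-memory model. This is only a sketch: the class `TxFlashModel`, its method names, and the restriction to a single in-progress tranx are assumptions made for illustration, not the device's actual design.

```python
class TxFlashModel:
    """Toy in-memory stand-in for a TxFlash device (hypothetical, illustration only)."""

    def __init__(self, num_pages):
        self.pages = [b""] * num_pages  # committed contents, one entry per page
        self.pending = None             # staged writes of the single in-progress tranx

    def write_atomic(self, page_numbers):
        # Announce the pages p1..pn that belong to the next tranx.
        self.pending = {p: None for p in page_numbers}

    def write(self, page, data):
        if self.pending is None or page not in self.pending:
            self.pages[page] = data     # ordinary, non-transactional write
            return
        self.pending[page] = data       # staged until every announced page has arrived
        if all(v is not None for v in self.pending.values()):
            for p, d in self.pending.items():
                self.pages[p] = d       # all pages become visible together
            self.pending = None

    def abort(self):
        # Drop staged writes; previously committed contents stay in place.
        self.pending = None

    def read(self, page):
        return self.pages[page]


dev = TxFlashModel(4)
dev.write_atomic([1, 2])
dev.write(1, b"a")   # page 1 is staged, not yet visible
dev.write(2, b"b")   # last page arrives, so both become visible
```

Because a tranx contains only writes, the model needs no read sets or conflict tracking for reads, which is the complexity reduction slide 8 points to.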

  9. Flash is good for this purpose • Copy-on-write: already supported by FTL • Fast random reads • High concurrency • multiple flash chips inside • New device: • New interface more likely

  10. Outline • Introduction • The Case for TxFlash • Commit Protocols • Implementation • Evaluation • Conclusion

  11. Traditional Commit • First write to a log: • Intention record: (data, page# & version#, tranx ID) • … • Intention record • Commit record • Tranx is committed == commit record exists • Intention records → modify original data • If modifications are done, the records can be garbage collected
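The traditional protocol above can be sketched as a log scan: a tranx counts as committed only if its commit record is in the log, and only then are its intention records applied. The record types `Intention` and `Commit` and the helper names are hypothetical, chosen to mirror the fields listed on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intention:
    tx_id: int
    page: int
    version: int
    data: bytes


@dataclass(frozen=True)
class Commit:
    tx_id: int


def committed_tx_ids(log):
    # A tranx is committed exactly when its commit record exists in the log.
    return {r.tx_id for r in log if isinstance(r, Commit)}


def apply_log(log, store):
    """Apply intention records of committed tranx to store (page -> (version, data))."""
    done = committed_tx_ids(log)
    for r in log:
        if isinstance(r, Intention) and r.tx_id in done:
            store[r.page] = (r.version, r.data)
    return store


log = [
    Intention(tx_id=1, page=5, version=1, data=b"a"),
    Intention(tx_id=2, page=6, version=1, data=b"b"),  # tranx 2 has no commit record
    Commit(tx_id=1),
]
store = apply_log(log, {})  # only page 5 is updated
```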

  12. Traditional Commit on SSDs • Optimizations: • All writes can be issued in parallel • No need to update the original data; just update the remap table • Problem: commit record • Extra latency after the other writes • Garbage collection is complicated: • Must know whether all the updates have completed

  13. New Proposal (1): Simple Cyclic Commit • No commit record • Intention records of the same tranx use next links to form a cycle • (data, page# & version#, next page# & version#) • Tranx is committed == all intention records are written • Flash page (4KB) + metadata (128B) are co-located
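On the writer side, the next links can be sketched as follows: each intention record points at the next page and version in the same tranx, and the last record points back at the first. The function name `build_intentions` and the dictionary layout are assumptions for illustration; the slide only specifies the fields.

```python
def build_intentions(writes):
    """Turn a list of (page, version, data) writes into intention records.

    Each record carries a next link to the following write in the tranx;
    the last record links back to the first, closing the cycle.
    """
    n = len(writes)
    records = []
    for i, (page, version, data) in enumerate(writes):
        next_page, next_version, _ = writes[(i + 1) % n]
        records.append({
            "page": page,
            "version": version,
            "next": (next_page, next_version),
            "data": data,
        })
    return records


# A three-page tranx: page 0 -> page 3 -> page 7 -> back to page 0.
records = build_intentions([(0, 1, b"a"), (3, 1, b"b"), (7, 2, b"c")])
```

A single-page tranx produces a record whose next link points at itself, which is a cycle of length one.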

  14. Problem

  15. Solution: • Any uncommitted intention on the stable storage must be erased before any new writes are issued to the same or a referenced page

  16. Operations • Initialization: • Setting version# to 0, next-link to self • Transaction • Garbage Collection: • For any uncommitted intention • For committed page if a newer version is committed • Recovery: scan all pages then look for cycles
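Initialization and recovery can be sketched together. The sketch assumes the stored metadata is available as a mapping from (page#, version#) to the next link, and that the rule from slide 15 holds, so a complete cycle can only belong to a committed tranx. The function names are hypothetical.

```python
def initialize(num_pages):
    # Every page starts at version 0 with a next link to itself:
    # a cycle of length one, so the initial state counts as committed.
    return {(p, 0): (p, 0) for p in range(num_pages)}


def in_complete_cycle(start, stored):
    """Follow next links from start; True only if they lead back to start."""
    seen = set()
    cur = start
    while cur not in seen:
        if cur not in stored:
            return False        # a linked intention record was never written
        seen.add(cur)
        cur = stored[cur]
    return cur == start


def recover(stored):
    """Scan all stored records and return the latest committed version per page."""
    latest = {}
    for (page, version) in stored:
        if in_complete_cycle((page, version), stored) and version > latest.get(page, -1):
            latest[page] = version
    return latest


stored = initialize(3)
stored[(0, 1)] = (1, 1)   # committed tranx over pages 0 and 1
stored[(1, 1)] = (0, 1)
stored[(2, 1)] = (0, 2)   # uncommitted: (0, 2) was never written
state = recover(stored)
```

In this example pages 0 and 1 recover to version 1, while page 2 falls back to version 0 because its version 1 record links to a record that is missing.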

  17. New Proposal (2): Back Pointer Cyclic Commit • Another way to deal with ambiguity • Intention record: • (data, page# & version#, next-link, link to last committed version)

  18. A3 is a straddler of A2 • Some complexity in garbage collection and recovery because of this

  19. Protocol Comparison

  20. Outline • Introduction • The Case for TxFlash • Commit Protocols • Implementation • Evaluation • Conclusion

  21. Implementation • Simulator • DiskSim trace-driven SSD simulator (USENIX ’08), with modifications for TxFlash • Supports tranx of maximum size 4MB • Pseudo-device driver for recording traces • TxExt3: • Employs TxFlash for the Ext3 file system • Tranx: Ext3 journal commit

  22. Experimental Setup • TxFlash device: • 32GB: 8x 4GB flash packages • 4 I/O operations within every flash package • 15% of space reserved for garbage collection • Workload on top of Ext3: • IOzone: micro benchmark (no sync writes) • Linux-build (no sync writes) • Maildir (sync writes) • TPC-B: simulate 10,000 credit-debit-like operations on TxExt3 file system (sync writes) • Synthetic workloads

  23. Cyclic commit vs. Traditional commit

  24. Unlike database logging, large tranx sizes: no sync; data are included

  25. Simple cyclic commit has a high cost if there are aborts

  26. TxFlash vs. SSD • Remove WriteAtomic from traces • Use SSD simulator • SSD does not provide any transaction guarantees (so should have better performance)

  27. Space comparison: TxFlash needs 25% more main memory than SSD • 4+1 MB per 4GB flash → 40 MB for the 32GB TxFlash device
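The figures on this slide can be checked with a few lines of arithmetic. The reading that 4 MB per package is the plain SSD baseline and the extra 1 MB is the TxFlash addition is an assumption, though it is the one that reproduces both the 25% and the 40 MB figures.

```python
DEVICE_GB = 32
PACKAGE_GB = 4
SSD_MB_PER_PACKAGE = 4        # assumed baseline memory per 4GB package
TXFLASH_EXTRA_MB = 1          # assumed extra memory TxFlash needs per package

packages = DEVICE_GB // PACKAGE_GB                                  # 8 packages
txflash_mb = packages * (SSD_MB_PER_PACKAGE + TXFLASH_EXTRA_MB)     # 8 * 5 = 40 MB
overhead = TXFLASH_EXTRA_MB / SSD_MB_PER_PACKAGE                    # 1 / 4 = 0.25

print(f"{txflash_mb} MB total, {overhead:.0%} more than the SSD baseline")
```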

  28. End-to-end performance • TxFlash: • Run pseudo-device driver on real SSD • The performance is close to that of TxFlash • Ext3: • Use SSD as journal • SSD cache is disabled in both cases

  29. Summary • TxFlash: • Adding transaction interface in SSD • Cyclic commit protocols • Nice solution for file system journaling
