Boxwood: Abstractions as the Foundation for Storage Infrastructure

Presentation Transcript

  1. Boxwood: Abstractions as the Foundation for Storage Infrastructure • Lidong Zhou, Microsoft Research Silicon Valley • Joint work with Chandu Thekkath, John MacCormick, Nick Murphy, and Marc Najork

  2. Distributed Storage Applications are Hard to Build • Distributed storage: low hardware cost, but high development/deployment cost • Application logic on low-level storage interface • Hardware parallelism and concurrency control • Fault tolerance a necessity • Incremental expansion and dynamic reconfiguration vs. system consistency • Our goal: Distributed storage applications made easy to design, build, and deploy Boxwood

  3. Target Application and Setting • Enterprise storage applications and back-end storage for data-intensive Internet services

  4. Roadmap • Boxwood Vision • Boxwood Architecture • Building Applications on Boxwood • Performance • Related Work and Conclusion

  5. Boxwood Vision • Incorporate rich virtualized abstractions into low levels of the storage • An evolution path for distributed storage [Diagram: Storage Applications]

  6. Boxwood Vision • Incorporate rich virtualized abstractions into low levels of the storage • An evolution path for distributed storage [Diagram: Storage Applications; Virtual Disk]

  7. Boxwood Vision • Incorporate rich virtualized abstractions into low levels of the storage • An evolution path for distributed storage [Diagram: Storage Applications; Tree, Table, List, …]

  8. Why High-Level Abstractions • Reduce the complexity of distributed storage applications • Natural continuum of storage virtualization • “High-level programming language” for building distributed storage applications • Potential built-in performance optimization by exploiting structural information (caching, prefetching)

  9. Roadmap • Boxwood Vision • Boxwood Architecture • Building Applications on Boxwood • Performance • Related Work and Conclusion

  10. Boxwood Architecture [Diagram, layers from top to bottom] • Storage Application • B-Tree (High-level Storage Abstractions) • Chunk Store and Replicated Logical Device (Reliable “Media”) • Magnetic Media • Services: Locking, Logging, Consensus

  11. Chunk Store • Persistent storage with “malloc”-like interface • Virtualization layer that hides the distributed nature • Manage address space or free space for higher layers • Reliable storage through replicated logical device • Interface: Allocate, De-allocate, Read, Write [Diagram: Chunk Store; Replicated Logical Device]
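
A minimal sketch of what a “malloc”-like chunk interface could look like, using the four operations named on the slide. The class name `ChunkStore`, the integer handles, and the in-memory dictionary are assumptions for illustration only; the real chunk store is persistent and replicated, and its actual signatures are not given in the slides.

```python
import itertools


class ChunkStore:
    """Hypothetical in-memory stand-in for a chunk store (not Boxwood's API)."""

    def __init__(self):
        self._chunks = {}                        # handle -> stored bytes
        self._next_handle = itertools.count(1)   # source of fresh opaque handles

    def allocate(self):
        """Reserve an empty chunk and return its opaque handle."""
        handle = next(self._next_handle)
        self._chunks[handle] = b""
        return handle

    def write(self, handle, data):
        """Replace the contents of an allocated chunk."""
        if handle not in self._chunks:
            raise KeyError(f"unknown chunk handle: {handle}")
        self._chunks[handle] = bytes(data)

    def read(self, handle):
        """Return the contents of an allocated chunk."""
        if handle not in self._chunks:
            raise KeyError(f"unknown chunk handle: {handle}")
        return self._chunks[handle]

    def deallocate(self, handle):
        """Free a chunk; its handle becomes invalid."""
        if handle not in self._chunks:
            raise KeyError(f"unknown chunk handle: {handle}")
        del self._chunks[handle]
```

The caller only ever sees handles, which is what lets a layer like this hide where the data physically lives.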

  12. B-Tree Abstraction • B-Tree: a proven useful data structure for storage applications • Distributed/reliable B-Link trees in Boxwood • B-Link trees: high concurrency with simple locking • Distributed reliable storage from chunk store • Caching for performance • Distributed lock service for consistency • Logging for recovery • Interface: Create, Lookup, Insert, Enumerate, Delete [Diagram: B-Link Tree; Logging; Locking; Chunk Store]
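
To illustrate why B-Link trees allow high concurrency with simple locking, here is a small sketch of the lookup path in the style of Lehman and Yao: every node carries a high key and a link to its right sibling, so a reader that arrives at a node after a concurrent split can simply move right. The `Node` layout and the `lookup` function are illustrative assumptions, not Boxwood's code, and the sketch omits insertion, locking, and storage.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    keys: list
    values: list = field(default_factory=list)     # payloads (leaf nodes only)
    children: list = field(default_factory=list)   # child nodes (internal nodes only)
    high_key: Optional[int] = None                  # largest key covered; None = unbounded
    right: Optional["Node"] = None                  # link to the right sibling

    @property
    def is_leaf(self):
        return not self.children


def lookup(node, key):
    """Descend from `node` to a leaf, following right links past stale nodes."""
    while True:
        # A split may have moved our key range to a right sibling.
        while node.high_key is not None and key > node.high_key:
            node = node.right
        if node.is_leaf:
            break
        # children[i] covers keys <= keys[i]; the last child covers the rest.
        i = 0
        while i < len(node.keys) and key > node.keys[i]:
            i += 1
        node = node.children[i]
    if key in node.keys:
        return node.values[node.keys.index(key)]
    return None
```

In the usage below, a leaf has been split but the parent has not yet learned about the new sibling; the lookup still succeeds by moving right.

```python
right_leaf = Node(keys=[3, 4], values=["c", "d"])
left_leaf = Node(keys=[1, 2], values=["a", "b"], high_key=2, right=right_leaf)
stale_root = Node(keys=[], children=[left_leaf])   # parent not yet updated
found = lookup(stale_root, 3)                      # reaches right_leaf via the link
```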

  13. Boxwood Services • Distributed lock service for coordinating concurrent access to shared data • Logging and recovery service for atomicity in the face of transient failures • Consensus service for system consistency • Clean design of these services is crucial for scalability and for managing complexity

  14. Roadmap • Boxwood Vision • Boxwood Architecture • Building Applications on Boxwood • Performance • Related Work and Conclusion

  15. Distributed Storage Applications on Boxwood: A Recipe • Design applications for local storage • Map application logic to storage abstractions • Adapt the design for a distributed storage infrastructure: Boxwood abstractions are virtualized, and Boxwood offers facilitating distributed services • Separating algorithmic design from distributed system concerns is attractive

  16. From B-Link Tree Algorithm to Distributed Reliable B-Link Trees • B-Link trees on a single machine [Diagram: B-Link Tree Algorithm; Local Locks; Logging; Local Disks]

  17. From B-Link Tree Algorithm to Distributed Reliable B-Link Trees • Distributed and reliable B-Link trees [Diagram: B-Link Tree Algorithm; Global Lock Service; Reliable Logging; Chunk Store; Replicated Logical Device]
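
The recipe on the preceding slides can be sketched as code written against small interfaces, so that the same algorithm runs unchanged when local components are swapped for distributed ones. Everything here is a hypothetical illustration: `LockService`, `Storage`, `LocalLocks`, `LocalDisk`, and `append_record` are invented names, and only the single-machine variants are shown.

```python
import threading
from typing import Protocol


class LockService(Protocol):
    def acquire(self, name: str) -> None: ...
    def release(self, name: str) -> None: ...


class Storage(Protocol):
    def read(self, handle: str) -> bytes: ...
    def write(self, handle: str, data: bytes) -> None: ...


class LocalLocks:
    """Single-machine locks; a global lock service would offer the same methods."""

    def __init__(self):
        self._guard = threading.Lock()
        self._locks = {}

    def _lock_for(self, name):
        with self._guard:
            return self._locks.setdefault(name, threading.Lock())

    def acquire(self, name):
        self._lock_for(name).acquire()

    def release(self, name):
        self._lock_for(name).release()


class LocalDisk:
    """Single-machine storage; a chunk store would offer the same methods."""

    def __init__(self):
        self._data = {}

    def read(self, handle):
        return self._data.get(handle, b"")

    def write(self, handle, data):
        self._data[handle] = data


def append_record(locks: LockService, storage: Storage, handle: str, record: bytes) -> None:
    """Application logic: depends only on the interfaces, not on where they run."""
    locks.acquire(handle)
    try:
        storage.write(handle, storage.read(handle) + record)
    finally:
        locks.release(handle)
```

Moving to a distributed setting would then mean supplying different implementations of the two interfaces while leaving `append_record` untouched, which is the separation the recipe argues for.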

  18. BoxFS: Multi-Node File Server on Boxwood • Exported via NFS v2 • Directory/File → B-Tree • Directory: maps names to NFS file handle with embedded B-tree handle • File: maps block number to chunk handle • File blocks → chunks • Locking/caching at file system level • ~2500 lines of C# code [Diagram: BoxFS; Services; B-Link Tree; Chunk Store]
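
A toy sketch of the two mappings described on this slide: a directory maps names to per-file maps, and each file maps block numbers to chunk handles. Plain dictionaries stand in for the B-trees and the chunk store, and `TinyFS`, `BLOCK_SIZE`, and the method names are assumptions for illustration; BoxFS itself is written in C# and its interfaces are not shown in the slides.

```python
BLOCK_SIZE = 4  # deliberately tiny so the example spans several blocks


class TinyFS:
    """Hypothetical single-directory file store (not the BoxFS implementation)."""

    def __init__(self):
        self.chunks = {}       # chunk handle -> bytes (stand-in for the chunk store)
        self.directory = {}    # name -> {block number -> chunk handle}
        self._next_handle = 0

    def _allocate(self, data):
        self._next_handle += 1
        self.chunks[self._next_handle] = data
        return self._next_handle

    def write_file(self, name, data):
        # Free chunks belonging to any previous version of the file.
        for handle in self.directory.get(name, {}).values():
            del self.chunks[handle]
        blocks = {}
        for offset in range(0, len(data), BLOCK_SIZE):
            block_number = offset // BLOCK_SIZE
            blocks[block_number] = self._allocate(data[offset:offset + BLOCK_SIZE])
        self.directory[name] = blocks

    def read_file(self, name):
        blocks = self.directory[name]
        return b"".join(self.chunks[blocks[n]] for n in sorted(blocks))
```

Replacing each dictionary with a B-tree keyed the same way gives the structure the slide describes, with one tree per directory and one per file.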

  19. Roadmap • Boxwood Vision • Boxwood Architecture • Building Applications on Boxwood • Performance • Related Work and Conclusion

  20. Prototype Deployment and Performance Evaluation • System setup: eight Dell PowerEdge 2650 servers with a single 2.4 GHz Xeon processor, 1 GB of RAM; Gigabit Ethernet switch; Adaptec AIC-7899 dual SCSI adapter, and 5 SCSI drives • Performance evaluation: single-machine non-replicated performance (BoxFS vs. NFS); B-tree operation scalability; BoxFS operation scalability

  21. BoxFS vs. NFS over NTFS: Connectathon Benchmarks

  22. B-Tree Scaling (Private Tree)

  23. BoxFS Scaling (Read)

  24. B-Tree Scaling (Shared Tree)

  25. BoxFS Scaling (Write/MkDirEnt)

  26. Roadmap • Boxwood Vision • Boxwood Architecture • Building Applications on Boxwood • Performance • Related Work and Conclusion

  27. Related Work • Distributed Storage/Operating Systems: virtual/logical disks; file systems; database systems • Scalable Distributed Data Structures: Linear Hash Table (LH) and its variants (Litwin, 1980 to present); scalable distributed hash table (Gribble et al., 2000); highly concurrent B-trees (Lehman and Yao, 1981; Sagiv, 1986)

  28. Conclusion and Future Directions • A storage infrastructure offering virtualized high-level abstractions is promising • Future work: explore more abstractions and applications; expose flexible interfaces (e.g., through hints) • Leverage high-level abstractions for better load balancing, prefetching, and caching • Graceful degradation during massive failures