Boxwood: Abstractions as the Foundation for Storage Infrastructure

Lidong Zhou, Microsoft Research Silicon Valley

Joint work with Chandu Thekkath, John MacCormick, Nick Murphy, and Marc Najork

Distributed Storage Applications are Hard to Build
  • Distributed storage: low hardware cost, but high development/deployment cost
    • Application logic on low-level storage interface
    • Hardware parallelism and concurrency control
    • Fault tolerance a necessity
    • Incremental expansion and dynamic reconfiguration vs. system consistency
  • Our goal: Distributed storage applications made easy to design, build, and deploy

Boxwood

Target Application and Setting

Enterprise storage applications and back-end storage for data-intensive Internet services

Roadmap
  • Boxwood Vision
  • Boxwood Architecture
  • Building Applications on Boxwood
  • Performance
  • Related Work and Conclusion

Boxwood Vision

Incorporate rich virtualized abstractions into low levels of the storage

An evolution path for distributed storage (diagram builds, in order):
  • Storage Applications
  • Storage Applications on a Virtual Disk
  • Storage Applications on Tree, Table, and List abstractions

Why High-Level Abstractions
  • Reduce the complexity of distributed storage applications
    • Natural continuum of storage virtualization
    • “High-level programming language” for building distributed storage applications
  • Potential built-in performance optimization by exploiting structural information
    • Caching
    • Prefetching

Boxwood Architecture

Layers, top to bottom:
  • Storage Application
  • B-Tree
  • Chunk Store
  • Replicated Logical Device
  • Magnetic Media

Side labels: High-level Storage Abstractions; Reliable “Media”

Services: Locking, Logging, Consensus

Chunk Store
  • Persistent storage with “malloc”-like interface
  • Virtualization layer that hides the distributed nature
  • Manages address space or free space for higher layers
  • Reliable storage through replicated logical device

Operations: Allocate, De-allocate, Read, Write

Diagram: Chunk Store on top of Replicated Logical Device
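To make the “malloc”-like interface concrete, here is a minimal single-process sketch of the four chunk store operations (Allocate, De-allocate, Read, Write). It is not Boxwood's API: the class name `InMemoryChunkStore`, the method signatures, and the integer handles are all assumptions for illustration, and nothing here is persistent, distributed, or replicated.

```python
import itertools


class InMemoryChunkStore:
    """Toy stand-in for a chunk store (hypothetical; not Boxwood's API).

    Callers see opaque handles instead of disk addresses, which is the
    point of the "malloc"-like interface.
    """

    def __init__(self):
        self._chunks = {}                      # handle -> bytearray
        self._next_handle = itertools.count(1)

    def allocate(self, size):
        """Reserve a zero-filled chunk of `size` bytes and return its handle."""
        handle = next(self._next_handle)
        self._chunks[handle] = bytearray(size)
        return handle

    def deallocate(self, handle):
        """Release a chunk; the handle is invalid afterwards."""
        del self._chunks[handle]

    def write(self, handle, data, offset=0):
        """Overwrite part of a chunk in place, refusing out-of-bounds writes."""
        chunk = self._chunks[handle]
        if offset < 0 or offset + len(data) > len(chunk):
            raise ValueError("write outside chunk bounds")
        chunk[offset:offset + len(data)] = data

    def read(self, handle, offset=0, length=None):
        """Return `length` bytes starting at `offset` (to the end by default)."""
        chunk = self._chunks[handle]
        end = len(chunk) if length is None else offset + length
        return bytes(chunk[offset:end])


store = InMemoryChunkStore()
h = store.allocate(16)
store.write(h, b"boxwood")
data = store.read(h, 0, 7)
store.deallocate(h)
```

A higher layer such as a B-tree would keep only handles in its nodes, leaving placement and free-space management to the store.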

B-Tree Abstraction
  • B-Tree: a proven useful data structure for storage applications
  • Distributed/reliable B-Link trees in Boxwood
    • B-Link trees: high concurrency with simple locking
    • Distributed reliable storage from chunk store
    • Caching for performance
    • Distributed lock service for consistency
    • Logging for recovery

Operations: Create, Lookup, Insert, Delete, Enumerate

Diagram: B-Link Tree, with Logging and Locking, on top of Chunk Store
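The reason B-Link trees get by with simple locking is the right-link: every node stores a high key and a pointer to its right sibling, so a reader that lands on a node that has just been split can recover by moving right. The sketch below shows only that lookup rule, single-threaded and in memory. The `Node` layout and the `lookup` function are illustrative assumptions, not Boxwood code, and locking, insertion, and splitting are omitted.

```python
import bisect
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    keys: list
    values: list = field(default_factory=list)     # used by leaves only
    children: list = field(default_factory=list)   # used by internal nodes only
    high_key: Optional[int] = None                 # None means "no upper bound"
    right: Optional["Node"] = None                 # right sibling (the "link")

    @property
    def is_leaf(self):
        return not self.children


def lookup(root, key):
    """Return the value stored under `key`, or None if it is absent."""
    node = root
    while True:
        # Move right: if the key exceeds this node's high key, the node was
        # split after our parent pointer was read, so follow the link.
        while node.high_key is not None and key > node.high_key:
            node = node.right
        if node.is_leaf:
            i = bisect.bisect_left(node.keys, key)
            if i < len(node.keys) and node.keys[i] == key:
                return node.values[i]
            return None
        # Internal node: child i covers keys up to and including keys[i].
        node = node.children[bisect.bisect_left(node.keys, key)]


# A tree caught mid-split: leaf1 has handed keys 5 and 7 to the new leaf2,
# but the root still points only at leaf1 and leaf3.
leaf3 = Node(keys=[12, 15], values=["l", "o"])
leaf2 = Node(keys=[5, 7], values=["e", "g"], high_key=10, right=leaf3)
leaf1 = Node(keys=[1, 3], values=["a", "c"], high_key=3, right=leaf2)
root = Node(keys=[10], children=[leaf1, leaf3])

found = lookup(root, 7)   # reaches leaf2 through leaf1's right link
```

Because the reader never needs the parent to be up to date, a split only has to lock the nodes it rewrites.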

Boxwood Services
  • Distributed lock service for coordinating concurrent access to shared data
  • Logging and recovery service for atomicity in the face of transient failures
  • Consensus service for system consistency

Clean design of these services is crucial for scalability and for managing complexity

Distributed Storage Applications on Boxwood: A Recipe
  • Design applications for local storage
    • Map application logic to storage abstractions
  • Adapt the design for a distributed storage infrastructure
    • Boxwood abstractions are virtualized
    • Boxwood provides distributed services that support the adaptation

Separating algorithmic design from distributed system concerns is attractive.

From B-Link Tree Algorithm to Distributed Reliable B-Link Trees

B-Link trees on a single machine:
  • B-Link Tree Algorithm
  • Local Locks
  • Logging
  • Local Disks

Distributed and reliable B-Link trees:
  • B-Link Tree Algorithm
  • Global Lock Service
  • Reliable Logging
  • Chunk Store
  • Replicated Logical Device
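One way to read the recipe is as dependency injection: the algorithm is written once against a lock interface and a storage interface, and only the implementations behind those interfaces change between the single-machine and distributed versions. The sketch below illustrates that idea with a trivial "increment" operation instead of a B-Link tree. `LocalLocks`, `StubGlobalLockService`, and `increment` are hypothetical names, the stub only records requests and does no cross-machine coordination, and a plain dict stands in for chunk-store-backed state.

```python
import threading
from contextlib import contextmanager


class LocalLocks:
    """Single-machine locking: one threading.Lock per resource name."""

    def __init__(self):
        self._locks = {}
        self._guard = threading.Lock()

    @contextmanager
    def held(self, name):
        with self._guard:
            lock = self._locks.setdefault(name, threading.Lock())
        with lock:
            yield


class StubGlobalLockService:
    """Placeholder for a distributed lock service (assumption: a real one
    would coordinate across machines; this stub only logs requests)."""

    def __init__(self):
        self.log = []

    @contextmanager
    def held(self, name):
        self.log.append(("acquire", name))
        try:
            yield
        finally:
            self.log.append(("release", name))


def increment(store, locks, key):
    """Application logic, written once against the two interfaces."""
    with locks.held(key):
        store[key] = store.get(key, 0) + 1
        return store[key]


# Single-machine design: local locks, local state.
local_result = increment({}, LocalLocks(), "counter")

# "Distributed" design: same algorithm, different services plugged in.
service = StubGlobalLockService()
shared = {}  # stands in for state kept in a chunk store
increment(shared, service, "counter")
distributed_result = increment(shared, service, "counter")
```

The body of `increment` is identical in both runs, which is the separation of algorithmic design from distributed-system concerns that the recipe argues for.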

BoxFS: Multi-Node File Server on Boxwood
  • Exported via NFS v2
  • Directory/File → B-Tree
    • Directory: maps names to NFS file handle with embedded B-tree handle
    • File: maps block number to chunk handle
  • File blocks → chunks
  • Locking/caching at file system level
  • ~2500 lines of C# code

Diagram: BoxFS on top of B-Link Tree and Chunk Store, with Services
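The two mappings in BoxFS (directory entry to file tree, block number to chunk handle) can be sketched with ordinary dictionaries standing in for B-trees and for the chunk store. Everything below is a simplified assumption for illustration: `ToyFS`, its methods, and the 4-byte block size are made up, the directory maps a name straight to the file's tree instead of to an NFS file handle, and there is no locking, caching, or NFS export.

```python
BLOCK_SIZE = 4  # deliberately tiny so the block boundaries are easy to see


class ToyFS:
    """Toy file layout (hypothetical): dicts stand in for B-trees and chunks."""

    def __init__(self):
        self.chunks = {}      # chunk handle -> bytes (stand-in for the chunk store)
        self.directory = {}   # file name -> file tree (stand-in for a directory B-tree)
        self._next_chunk = 0

    def write_file(self, name, data):
        """Split `data` into blocks, store each in its own chunk, index them."""
        file_tree = {}        # block number -> chunk handle
        for block_no, start in enumerate(range(0, len(data), BLOCK_SIZE)):
            handle = self._next_chunk
            self._next_chunk += 1
            self.chunks[handle] = data[start:start + BLOCK_SIZE]
            file_tree[block_no] = handle
        self.directory[name] = file_tree

    def read_block(self, name, block_no):
        """Directory lookup, then file-tree lookup, then chunk read."""
        file_tree = self.directory[name]
        return self.chunks[file_tree[block_no]]

    def read_file(self, name):
        """Concatenate all blocks of a file in block-number order."""
        file_tree = self.directory[name]
        return b"".join(self.chunks[file_tree[b]] for b in sorted(file_tree))


fs = ToyFS()
fs.write_file("a.txt", b"hello boxwood")
second_block = fs.read_block("a.txt", 1)
whole_file = fs.read_file("a.txt")
```

Reading a block takes one lookup per mapping, which mirrors the path from directory B-tree to file B-tree to chunk.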

Prototype Deployment and Performance Evaluation
  • System setup
    • Eight Dell PowerEdge 2650 servers, each with a single 2.4 GHz Xeon processor and 1 GB of RAM
    • Gigabit Ethernet switch
    • Adaptec AIC-7899 dual SCSI adapter and 5 SCSI drives
  • Performance evaluation
    • Single-machine non-replicated performance (BoxFS vs. NFS)
    • B-tree operation scalability
    • BoxFS operation scalability

Related Work
  • Distributed Storage/Operating Systems
    • Virtual/Logical disks
    • File systems
    • Database systems
  • Scalable Distributed Data Structures
    • Linear Hash Table (LH) and its variants (Litwin, 1980-present)
    • Scalable distributed hash table (Gribble et al., 2000)
  • Highly concurrent B-trees (Lehman and Yao, 1981; Sagiv, 1986)

Conclusion and Future Directions

A storage infrastructure offering virtualized high-level abstractions is promising

Future Work:

  • Explore more abstractions and applications; expose flexible interfaces (e.g., through hints)
  • Leverage high-level abstractions for better load balancing, prefetching, and caching
  • Graceful degradation during massive failures
