
BlueSSD: Distributed Flash Store for Big Data Analytics


Presentation Transcript


  1. BlueSSD: Distributed Flash Store for Big Data Analytics Sang Woo Jun, Ming Liu, Kermin Fleming, Arvind Computer Science and Artificial Intelligence Laboratory MIT

  2. Introduction – Flash Storage • Low latency, high density • Throughput per chip is fixed • Many chips are organized into multiple busses that can work concurrently • High throughput is achieved with more busses • Read/write speed difference, limited write lifetime • Not the main focus… yet
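  As a rough sanity check using numbers that appear later in the deck (and assuming the busses on a board contribute roughly equally): a custom flash board with 4 busses delivers about 80 MB/s, i.e. roughly 20 MB/s per bus, so aggregate throughput grows by adding busses, boards, and nodes rather than by making any single chip faster.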

  3. Flash Deployment Goals • High Capacity / Low Unit Cost • CORFU - shared distributed storage over a commodity network • TBs of storage at <1 ms latency, ~1 GB/s throughput when highly distributed • High Throughput / Low Latency • FusionIO - maximum performance using many busses/chips and PCIe • 100s of GB at 100s of µs latency, ~3 GB/s throughput

  4. BlueSSD – Best of Both Worlds • Shared distributed storage over a faster custom network to accelerate big data analytics • PCIe • PCIe 2.0 x8 (~1 GB/s) • Inter-FPGA SERDES • Low-latency sideband network (<1 µs, ~1 GB/s) • Automatic network/flow-control synthesis

  5. The Physical System (Old) [Photo labels: PCIe ~1 GB/s, sideband link ~1 GB/s, flash board ~80 MB/s]

  6. The Physical System (Now: 4 Nodes)

  7. System Configuration • 6 Xilinx ML605 Development Boards + Hosts • 4 Custom Flash Boards • 4 busses with 8 chips, 16 GB per board • 2 Xilinx XM104 Connector Expansion Boards • 5 SMA Connections • The ML605 only has one SMA port, requiring hub nodes [Diagram: the host PC connects over PCIe to a hub-node FPGA carrying two XM104 boards; SMA links fan out from the hub to storage nodes FPGA1–FPGA4, each with its own custom flash board]

  8. System Configuration • Single software host can access all nodes • All nodes have identical memory maps of the entire address space • Requests are redirected to the node that has the data (a sketch of the address-to-node lookup follows) [Diagram: host PC over PCIe to the hub FPGA with two XM104 boards; SMA links to FPGA1–FPGA4 and their custom flash boards]
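  A minimal sketch in C of the lookup implied by these bullets: every node applies the same striping function over the shared memory map, serves a request locally if it owns the stripe, and otherwise forwards it over the sideband network. The stripe size, node count, and the local_flash_read/remote_read helpers are hypothetical stand-ins, not the actual BlueSSD interfaces.

      /* Sketch: which node owns a given flash address under striping.
         NUM_NODES, STRIPE_BYTES, and the two helpers are illustrative only. */
      #include <stdint.h>
      #include <stdio.h>

      #define NUM_NODES    4
      #define STRIPE_BYTES (8 * 1024)   /* assume one 8 KB flash page per stripe */

      /* Every node holds the same map, so any node can resolve any address. */
      static int owner_node(uint64_t addr)
      {
          return (int)((addr / STRIPE_BYTES) % NUM_NODES);
      }

      /* Stubs standing in for the local flash controller and the SMA network. */
      static int local_flash_read(uint64_t addr, void *buf, uint32_t len)
      { (void)addr; (void)buf; return (int)len; }
      static int remote_read(int node, uint64_t addr, void *buf, uint32_t len)
      { (void)node; (void)addr; (void)buf; return (int)len; }

      /* Serve a read locally if we own the stripe, otherwise forward it. */
      static int blue_read(int my_node, uint64_t addr, void *buf, uint32_t len)
      {
          int node = owner_node(addr);
          return (node == my_node) ? local_flash_read(addr, buf, len)
                                   : remote_read(node, addr, buf, len);
      }

      int main(void)
      {
          char buf[4096];
          /* Node 0 reading an address owned by node 2 (16 KB / 8 KB = stripe 2). */
          printf("owner of 0x4000: node %d\n", owner_node(0x4000));
          return blue_read(0, 0x4000, buf, sizeof buf) == sizeof buf ? 0 : 1;
      }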

  9. Network Flash Controller [Diagram: requests and data flow from the host PC over PCIe into a client interface on the FPGA; an address-mapping stage sends each request either to the local flash controller and custom flash board, or over the SMA link (through the XM104) to the remote node that owns the data]

  10. Network Hub • Programmatically define high-level connections • An N-to-N crossbar-like network is generated (a software-analogy sketch follows) [Diagram: the hub's XM104/SMA ports link the ML605 boards so that FPGA1–FPGA4 are fully connected]
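  The real flow here synthesizes hardware from high-level connection declarations; as a software analogy only, the sketch below lists the declared endpoints and expands them into the full N-to-N set of routes a crossbar-like network provides. The names and output format are assumptions.

      /* Software analogy only: declare the endpoints, generate the full
         N-to-N set of routes between them. */
      #include <stdio.h>

      int main(void)
      {
          /* Declared high-level endpoints (illustrative names). */
          const char *node[] = { "FPGA1", "FPGA2", "FPGA3", "FPGA4" };
          const int n = sizeof node / sizeof node[0];

          /* The generated network gives every endpoint a route to every other. */
          for (int s = 0; s < n; s++)
              for (int d = 0; d < n; d++)
                  if (s != d)
                      printf("%s -> %s (one hop through the SMA hub)\n",
                             node[s], node[d]);
          return 0;
      }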

  11. Software • FUSE provides a file system abstraction • A custom FUSE module interfaces with the FPGA (a minimal sketch follows after this slide) • The entire storage can be accessed as a single regular file • Currently running SQLite off the shelf • How to benchmark? [Software stack: SQLite → stdio → file system → FUSE → PCIe driver → FPGA]
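  A minimal sketch, in C against the FUSE 2.x API, of the kind of module described here: it exports the whole store as one read-only regular file and turns read() offsets into requests to the FPGA. The exported file name, the 64 GB size (4 boards x 16 GB), the build line, and the fpga_read() stub standing in for the PCIe driver call are assumptions, not the actual BlueSSD module.

      /* Sketch: expose the distributed flash store as a single read-only file.
         FUSE 2.x API; build (assumed): gcc bluessd_fuse.c $(pkg-config fuse --cflags --libs) */
      #define FUSE_USE_VERSION 26
      #include <fuse.h>
      #include <sys/stat.h>
      #include <string.h>
      #include <errno.h>
      #include <stdint.h>

      #define STORE_PATH "/flash"          /* exported file name (assumed) */
      #define STORE_SIZE (64ULL << 30)     /* 4 boards x 16 GB */

      /* Stub standing in for the PCIe driver call that fetches data from the FPGA. */
      static int fpga_read(uint64_t off, char *buf, size_t size)
      {
          (void)off;
          memset(buf, 0, size);            /* real module issues a flash read request */
          return (int)size;
      }

      static int bs_getattr(const char *path, struct stat *st)
      {
          memset(st, 0, sizeof *st);
          if (strcmp(path, "/") == 0)             { st->st_mode = S_IFDIR | 0755; st->st_nlink = 2; }
          else if (strcmp(path, STORE_PATH) == 0) { st->st_mode = S_IFREG | 0444; st->st_nlink = 1;
                                                    st->st_size = STORE_SIZE; }
          else return -ENOENT;
          return 0;
      }

      static int bs_readdir(const char *path, void *buf, fuse_fill_dir_t fill,
                            off_t off, struct fuse_file_info *fi)
      {
          (void)off; (void)fi;
          if (strcmp(path, "/") != 0) return -ENOENT;
          fill(buf, ".", NULL, 0); fill(buf, "..", NULL, 0);
          fill(buf, STORE_PATH + 1, NULL, 0);      /* the single regular file */
          return 0;
      }

      static int bs_open(const char *path, struct fuse_file_info *fi)
      {
          (void)fi;
          return strcmp(path, STORE_PATH) == 0 ? 0 : -ENOENT;
      }

      static int bs_read(const char *path, char *buf, size_t size, off_t off,
                         struct fuse_file_info *fi)
      {
          (void)fi;
          if (strcmp(path, STORE_PATH) != 0) return -ENOENT;
          if ((uint64_t)off >= STORE_SIZE)   return 0;
          if ((uint64_t)off + size > STORE_SIZE) size = (size_t)(STORE_SIZE - (uint64_t)off);
          return fpga_read((uint64_t)off, buf, size);   /* read() becomes an FPGA request */
      }

      static struct fuse_operations bs_ops = {
          .getattr = bs_getattr, .readdir = bs_readdir,
          .open    = bs_open,    .read    = bs_read,
      };

      int main(int argc, char *argv[])
      {
          return fuse_main(argc, argv, &bs_ops, NULL);
      }

  Mounted at a directory of your choice, the store then appears as one ordinary file that SQLite or plain stdio can open, matching the "single regular file" bullet above.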

  12. Storage Structure • Focusing on read-intensive workloads • Writes are done offline, so there are no coherence issues • Addresses are striped across FPGAs • Concurrent writes will require more than coherence • SQLite assumes exclusive access to the storage • If we are to have more than one file, file system metadata will need to be synchronized

  13. Performance Measurement • Throughput is bottlenecked by the custom flash card • *CORFU performance is measured at 32 nodes [chart omitted]

  14. Scalability • The latency increase is small enough to accommodate 16+ FPGAs • A single SMA cable can accommodate the throughput of 10+ flash boards • More should be possible with a good topology • A different story if the flash boards get faster (link compression?)
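  A rough budget using the deck's own numbers (assuming the links and boards sustain the quoted rates): one SMA sideband link at ~1 GB/s divided by ~80 MB/s per flash board gives roughly 12 boards' worth of traffic per link, consistent with the 10+ figure above; if the flash boards were much faster, the link would become the bottleneck.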

  15. Future Work (1) • Bring up the 4-node system • Bring up the 8-node system • 8 more ML605 boards have been requested from Xilinx • More capacity + throughput

  16. Future Work (2) • Offload computation to the FPGA • Do computation near the storage • Relational algebra processor • Complex analytics? • Looking for interesting applications

  17. Future Work (3) • Multiple concurrent writers • Software-level transaction management • A hardware-level pseudo-filesystem is probably required

  18. The End • Thank you!
