FAWN: A Fast Array of Wimpy Nodes

Presented by: Aditi Bose & Hyma Chilukuri

Motivation
• Large-scale data-intensive applications, such as high-performance key-value storage systems, are increasingly used by Facebook, LinkedIn, and Amazon
• These workloads share several features: they are I/O-intensive rather than compute-intensive, require random access over large datasets, perform many parallel, concurrent, and mostly independent operations, need large clusters, and store small objects
• Metrics: system performance in queries/sec; energy efficiency in queries/Joule
• CPU performance vs. I/O bandwidth gap: for data-intensive workloads, storage, network, and memory bandwidth bottlenecks lead to low CPU utilization. Solution: wimpy processors reduce I/O-induced idle cycles
• CPU power consumption: operating processors at higher frequency requires more energy, and techniques that mask the CPU bottleneck (branch prediction, speculative execution) hurt energy efficiency because they need more processor die area. Solution: slower CPUs execute more instructions per Joule (about 1 billion vs. 100 million for fast CPUs)

FAWN: a cluster of embedded CPUs using flash storage
• Efficient: flash draws about 1 W at heavy load vs. about 10 W for a disk at load
• Fast random reads: up to 175 times faster than disk
• Slow random writes: updating a single page means erasing an entire block before writing the modified block in its place
• FAWN-KV: nodes are organized into a ring using consistent hashing; each physical node is a collection of virtual nodes (VIDs)
• FAWN-DS: a log-structured key-value store; each store contains the values for the key range associated with one VID
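To make the ring concrete, here is a minimal consistent-hashing sketch in which each physical node contributes several VIDs. It assumes SHA-1 for placing names on the 160-bit ring and the common convention that a key belongs to the first VID at or after its position (wrapping around). The VID naming scheme and the `vnodes_per_node` parameter are hypothetical, not taken from FAWN-KV.

```python
import bisect
import hashlib


def ring_position(name: str) -> int:
    """Map a string onto the 160-bit circular key space (SHA-1 is an assumption)."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")


class Ring:
    """Consistent-hashing ring whose points are virtual node IDs (VIDs)."""

    def __init__(self, physical_nodes, vnodes_per_node=3):
        self._points = []  # (position, vid, physical node), kept sorted by position
        for node in physical_nodes:
            for i in range(vnodes_per_node):
                vid = f"{node}#vid{i}"  # hypothetical VID naming scheme
                self._points.append((ring_position(vid), vid, node))
        self._points.sort()
        self._positions = [pos for pos, _, _ in self._points]

    def owner(self, key: str):
        """Return (vid, physical node) for the first VID clockwise from the key."""
        pos = ring_position(key)
        i = bisect.bisect_left(self._positions, pos)  # first VID position >= key
        if i == len(self._positions):  # past the last VID: wrap to the start
            i = 0
        _, vid, node = self._points[i]
        return vid, node
```

Because one physical node owns several scattered arcs, adding or removing a node moves only the keys in those arcs.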

FAWN - DS
• Uses an in-memory hash index to map a 160-bit key to a value stored in the data log
• The index stores only a fragment of the actual key
• Bucket index = the i low-order bits of the key; key fragment = the next 15 low-order bits
• Each bucket is 6 bytes: the key fragment, a valid bit, and a 4-byte pointer into the data log
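The bucket layout above can be sketched as follows. This assumes i = 16 (the slide leaves i unspecified), places the valid bit above the 15-bit fragment in a little-endian 2-byte header, and uses SHA-1 to derive the 160-bit key; all three are illustrative choices rather than FAWN-DS's actual encoding. Collision handling (probing further buckets) is omitted.

```python
import hashlib
import struct
from typing import Tuple

INDEX_BITS = 16                 # "i" on the slide; 16 is an illustrative choice
FRAG_BITS = 15                  # key-fragment width from the slide
FRAG_MASK = (1 << FRAG_BITS) - 1
VALID_BIT = 1 << FRAG_BITS      # assumed layout: valid bit sits above the fragment


def key160(raw_key: bytes) -> int:
    """Hash an application key into a 160-bit integer (SHA-1 is an assumption)."""
    return int.from_bytes(hashlib.sha1(raw_key).digest(), "big")


def bucket_and_fragment(key: int) -> Tuple[int, int]:
    bucket = key & ((1 << INDEX_BITS) - 1)        # i low-order bits select the bucket
    fragment = (key >> INDEX_BITS) & FRAG_MASK    # next 15 bits are stored as the fragment
    return bucket, fragment


def pack_bucket(fragment: int, valid: bool, offset: int) -> bytes:
    """Encode one 6-byte bucket: 2 bytes (valid bit + fragment), then a 4-byte log offset."""
    header = (VALID_BIT if valid else 0) | (fragment & FRAG_MASK)
    return struct.pack("<HI", header, offset)     # "<" means no padding: 2 + 4 = 6 bytes


def unpack_bucket(raw: bytes) -> Tuple[int, bool, int]:
    header, offset = struct.unpack("<HI", raw)
    return header & FRAG_MASK, bool(header & VALID_BIT), offset


def may_match(raw: bytes, key: int) -> bool:
    """A valid bucket with a matching fragment is only a candidate; the full key must be read from the log."""
    fragment, valid, _ = unpack_bucket(raw)
    return valid and fragment == bucket_and_fragment(key)[1]
```

Storing 15 bits instead of 160 is what keeps each bucket at 6 bytes; the price is an occasional extra flash read when two keys share a bucket and fragment.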

FAWN - DS
• Basic functions: Store, Lookup, Delete
• Virtual node maintenance: Split, Merge, Compact
• Concurrent operations
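A minimal in-memory model of these functions on an append-only log is sketched below. It is a simplification, not FAWN-DS itself: the index holds full keys instead of fragments, the log is a Python list instead of flash, and operations run sequentially. The class and method names are hypothetical; the behaviour follows the usual log-structured pattern (deletes append a delete entry, compaction keeps only the newest in-range entry per key).

```python
from typing import Dict, List, Optional, Tuple

# Each log entry is (key, value); a value of None is a delete entry.
Entry = Tuple[int, Optional[bytes]]


class DataStore:
    """Hypothetical model of one store owning keys in the range (lo, hi]."""

    def __init__(self, lo: int, hi: int) -> None:
        self.lo, self.hi = lo, hi
        self.log: List[Entry] = []        # append-only data log
        self.index: Dict[int, int] = {}   # key -> offset of its newest live entry

    def owns(self, key: int) -> bool:
        return self.lo < key <= self.hi

    def store(self, key: int, value: bytes) -> None:
        self.log.append((key, value))          # sequential append, never in place
        self.index[key] = len(self.log) - 1    # index points at the newest entry

    def lookup(self, key: int) -> Optional[bytes]:
        if not self.owns(key) or key not in self.index:
            return None
        stored_key, value = self.log[self.index[key]]
        return value if stored_key == key else None  # confirm the key read from the log

    def delete(self, key: int) -> None:
        self.log.append((key, None))   # delete entry records the deletion in the log
        self.index.pop(key, None)      # invalidate the index entry

    def compact(self) -> None:
        """Keep only live, in-range entries; skip orphaned, deleted, and out-of-range ones."""
        live = [
            (key, value)
            for offset, (key, value) in enumerate(self.log)
            if value is not None and self.owns(key) and self.index.get(key) == offset
        ]
        self.log, self.index = [], {}
        for key, value in live:
            self.store(key, value)

    def split(self, mid: int) -> "DataStore":
        """Scan the log and hand keys in (mid, hi] to a new store; keep (lo, mid]."""
        upper = DataStore(mid, self.hi)
        for key, value in self.log:
            if upper.owns(key):
                if value is None:
                    upper.delete(key)
                else:
                    upper.store(key, value)
        self.hi = mid
        self.index = {k: off for k, off in self.index.items() if self.owns(k)}
        return upper

    def merge(self, other: "DataStore") -> None:
        """Absorb the adjacent store covering (self.hi, other.hi] by replaying its log."""
        if other.lo != self.hi:
            raise ValueError("stores are not adjacent")
        self.hi = other.hi
        for key, value in other.log:
            if value is None:
                self.delete(key)
            else:
                self.store(key, value)
```

Split leaves the moved entries in the original log; a later Compact removes them, which mirrors how out-of-range leftovers are cleaned up in a log-structured design.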

FAWN - KV
• FAWN-KV organizes the back-end VIDs into a storage ring using consistent hashing
• A management node assigns each front-end a portion of the circular key space
• Front-end node: manages a fraction of the key space, maintains the VID membership list, and forwards out-of-range requests
• Back-end nodes (VIDs): each owns a key range and contacts the front-end when joining
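The front-end forwarding rule can be sketched as below. The equal-arc assignment policy, the `FrontEnd` class, and `route` are hypothetical; the slide only states that the management node assigns key space and that a front-end forwards requests outside its range.

```python
from dataclasses import dataclass
from typing import List

KEY_SPACE = 1 << 160  # size of the circular 160-bit key space


@dataclass
class FrontEnd:
    name: str
    lo: int  # inclusive start of the managed arc
    hi: int  # exclusive end of the managed arc

    def manages(self, key: int) -> bool:
        return self.lo <= key < self.hi


def assign_front_ends(names: List[str]) -> List[FrontEnd]:
    """Hypothetical management-node policy: give each front-end an equal contiguous arc."""
    step = KEY_SPACE // len(names)
    front_ends = []
    for i, name in enumerate(names):
        hi = KEY_SPACE if i == len(names) - 1 else (i + 1) * step  # last arc absorbs the remainder
        front_ends.append(FrontEnd(name, i * step, hi))
    return front_ends


def route(receiver: FrontEnd, front_ends: List[FrontEnd], key: int) -> FrontEnd:
    """Serve the key locally if it is in range; otherwise forward it to the managing front-end."""
    key %= KEY_SPACE
    if receiver.manages(key):
        return receiver
    for fe in front_ends:
        if fe.manages(key):
            return fe  # out-of-range request forwarded
    raise LookupError("key space not fully assigned")
```

Once the right front-end is found, it would consult its VID membership list to pick the back-end that owns the key.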

FAWN - KV Chain replication
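Chain replication is only named on this slide, so here is a minimal synchronous sketch of the general pattern: puts enter at the head and propagate down the chain, the tail acknowledges, and gets are served by the tail, which only holds writes that every replica has applied. Forming the chain from consecutive VIDs on the ring is assumed; the class names are hypothetical and failure handling is omitted.

```python
from typing import Dict, List, Optional


class Replica:
    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, bytes] = {}
        self.next: Optional["Replica"] = None  # successor in the chain; None at the tail

    def put(self, key: str, value: bytes) -> str:
        self.data[key] = value
        if self.next is not None:
            return self.next.put(key, value)  # propagate toward the tail
        return self.name                      # the tail acknowledges the write


class Chain:
    def __init__(self, names: List[str]) -> None:
        self.replicas = [Replica(n) for n in names]
        for a, b in zip(self.replicas, self.replicas[1:]):
            a.next = b

    def put(self, key: str, value: bytes) -> str:
        """Writes enter at the head; the reply comes from the tail."""
        return self.replicas[0].put(key, value)

    def get(self, key: str) -> Optional[bytes]:
        """Reads go to the tail, so they never observe a partially replicated write."""
        return self.replicas[-1].data.get(key)
```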

FAWN - KV
• Join: split key range, pre-copy, chain insertion, log flush
• Leave: merge key range, join into each chain
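The pre-copy and log-flush steps of a join can be sketched as follows, assuming the joining VID takes over the arc (lo, hi] from the current owner's log. Pre-copy ships every in-range entry that exists when copying starts; after the new VID is inserted into the chain, the log flush ships the entries written in the meantime. The function names are hypothetical and the sketch runs sequentially rather than concurrently.

```python
from typing import Dict, List, Optional, Tuple

Entry = Tuple[int, Optional[bytes]]  # (key, value); None marks a delete entry


def in_range(key: int, lo: int, hi: int) -> bool:
    """True if key lies in the ring arc (lo, hi], handling wrap-around."""
    if lo < hi:
        return lo < key <= hi
    return key > lo or key <= hi


def pre_copy(source_log: List[Entry], lo: int, hi: int) -> Tuple[List[Entry], int]:
    """Copy in-range entries present when copying starts; return them and the log position reached."""
    snapshot_end = len(source_log)
    copied = [e for e in source_log[:snapshot_end] if in_range(e[0], lo, hi)]
    return copied, snapshot_end


def log_flush(source_log: List[Entry], start: int, lo: int, hi: int, target: List[Entry]) -> None:
    """Append in-range entries written to the source after pre-copy began."""
    target.extend(e for e in source_log[start:] if in_range(e[0], lo, hi))


def materialize(log: List[Entry]) -> Dict[int, bytes]:
    """Replay a log to its final key-value state."""
    state: Dict[int, bytes] = {}
    for key, value in log:
        if value is None:
            state.pop(key, None)
        else:
            state[key] = value
    return state
```

Copying the bulk of the data before the chain change keeps the window in which writes must be replayed short.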

Individual Node Performance • Lookup speed • Bulk store speed: 23.2 MB/s, or 96% of raw speed

Individual Node Performance • Put speed • Compared to BerkeleyDB (0.07 MB/s), which shows the necessity of log-structured storage on flash

Individual Node Performance • Read- and write-intensive workloads

System Benchmarks • System throughput and power consumption

Impact of Ring Membership Changes • Query throughput during node join and maintenance operations

Alternative Architectures
• Large dataset, low query rate → FAWN+Disk: the number of nodes is dominated by storage capacity per node; lowest total cost per GB
• Small dataset, high query rate → FAWN+DRAM: the number of nodes is dominated by per-node query capacity; lowest cost for queries/sec
• Middle range → FAWN+SSD: best balance of storage capacity, query rate, and total cost
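The reasoning above reduces to a sizing rule: a cluster needs enough nodes for both capacity and query rate, so the node count is the larger of the two requirements. The sketch below encodes that rule plus an assumed cost model (purchase cost plus energy cost over a chosen period). The `NodeType` fields, the cost model, and any numbers fed into it are hypothetical inputs for illustration, not figures from the slides.

```python
import math
from dataclasses import dataclass
from typing import List

HOURS_PER_YEAR = 24 * 365


@dataclass
class NodeType:
    """Per-node figures; callers supply hypothetical values, not measurements."""
    name: str
    capacity_gb: float
    queries_per_sec: float
    capex_dollars: float
    watts: float


def nodes_needed(node: NodeType, dataset_gb: float, target_qps: float) -> int:
    by_capacity = math.ceil(dataset_gb / node.capacity_gb)     # storage-bound sizing
    by_queries = math.ceil(target_qps / node.queries_per_sec)  # query-bound sizing
    return max(by_capacity, by_queries)


def total_cost(node: NodeType, dataset_gb: float, target_qps: float,
               years: float, dollars_per_kwh: float) -> float:
    """Assumed model: purchase cost of all nodes plus their energy cost over `years`."""
    n = nodes_needed(node, dataset_gb, target_qps)
    energy_kwh = n * node.watts * HOURS_PER_YEAR * years / 1000
    return n * node.capex_dollars + energy_kwh * dollars_per_kwh


def cheapest(nodes: List[NodeType], dataset_gb: float, target_qps: float,
             years: float, dollars_per_kwh: float) -> NodeType:
    return min(nodes, key=lambda nt: total_cost(nt, dataset_gb, target_qps, years, dollars_per_kwh))
```

With a capacity-heavy and a query-heavy node type, a large dataset at a low query rate favours the former, and a small dataset at a high query rate favours the latter, matching the two extremes on the slide.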

Conclusion • Fast and energy efficient processing of random read-intensive workloads • Over an order of magnitude more queries per Joule than traditional disk-based systems
