
NoSQL DB Benchmarking with High Performance Networking Solutions







WBDB, Xian, July 2013



Leading Supplier of End-to-End Interconnect Solutions

  • Diagram components: Server / Compute, Switch / Gateway, Storage Front / Back-End

  • Virtual Protocol Interconnect link types: 56G IB & FCoIB, 56G InfiniBand, 10/40/56GbE & FCoE, 10/40/56GbE, Fibre Channel

Comprehensive End-to-End InfiniBand and Ethernet Portfolio

  • ICs

  • Adapter Cards

  • Switches/Gateways

  • Cables

  • Host/Fabric Software



Motivation to Accelerate Data Analytics

  • Data Analysis Requires a Faster Network

    • The Hadoop MapReduce framework is a network-intensive workload

      • Mapped data is shuffled between nodes in the cluster

    • Data Replication

      • A high availability event triggers the movement of multiple terabytes of data

  • Provide Higher Data Value

    • Expose the low-latency capabilities of SSDs

    • Better server/CPU utilization

Big Data Applications Require High Bandwidth and Low Latency Interconnect

* Data Source: Intersect360 Research, 2012, IT and Data scientists survey
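
The replication point above is simple arithmetic: moving a fixed amount of data takes time inversely proportional to link bandwidth. A minimal sketch, assuming a hypothetical 2 TB re-replication (one drive's worth of data in the benchmark systems), a single link, and no protocol overhead; the function name and the `efficiency` parameter are illustrative, not from the presentation:

```python
def transfer_hours(data_terabytes: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Hours needed to move data_terabytes over one link of link_gbps.

    Assumes decimal units (1 TB = 1e12 bytes, 1 Gb/s = 1e9 bits/s).
    efficiency is a hypothetical fudge factor (0..1] for protocol overhead.
    """
    bits = data_terabytes * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # Link speeds named in the slides: 1GbE, 10/40/56GbE.
    for gbps in (1, 10, 40, 56):
        print(f"{gbps:>2} Gb/s: {transfer_hours(2.0, gbps):.2f} h")
```

Real recovery traffic is spread across many nodes and links, so this is a per-link bound rather than a prediction of cluster behavior.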



Cassandra, Update Latency

  • Cassandra Database enables update capabilities

  • Latency factors

    • Commit-log settings

    • Workload



Cassandra, Read Latency

  • Cassandra Database Read

  • Latency factors

    • Media used

    • Workload



System Used for Cassandra Benchmark

  • 5 Nodes in the Ring

  • 64GB RAM

    • 8 x 8GB DDR3 1333MHz

  • 2 x E5-2670

    • 8 Cores per socket

  • 5 x Seagate® Constellation® ES SATA 6Gb/s 2TB Hard Drive

    • 7200 RPM

  • NIC: Mellanox Technologies MT27500 Family [ConnectX-3]

    • 10Gb Ethernet

    • FW_VER=2.11.500

  • Switch SX1036

  • OS: RH 6.3

    • MLNX_OFED_LINUX-1.5.3

  • Apache Cassandra 1.1.12, 2 seeds



Unlocking the Power of SSDs In Hadoop Environment

  • SSDs are becoming the de facto standard in HDFS deployments

    • Read capability is a critical factor for application performance

  • E-DFSIO, part of Intel’s HiBench test suite, profiles aggregated throughput on the cluster

    • A 1GbE network impedes any performance benefit from SSD deployment

E-DFSIO, Showing the Power of SSD @ HDFS
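
Why 1GbE hides the SSD benefit can be sketched as a bottleneck calculation: a remote read cannot go faster than the slower of the storage device and the network link. The device throughput figures below (500 MB/s for a SATA SSD, 150 MB/s for a 7200 RPM drive) are assumed round numbers for illustration, not measurements from this benchmark, and the function name is hypothetical:

```python
def remote_read_mbps(device_mbps: float, link_gbps: float) -> float:
    """Upper bound on remote read throughput in MB/s.

    The bound is the minimum of the device rate and the link rate,
    with the link rate converted from Gb/s to MB/s (decimal units).
    """
    link_mbps = link_gbps * 1000 / 8
    return min(device_mbps, link_mbps)


if __name__ == "__main__":
    # Assumed device rates; see the note above.
    devices = {"HDD (assumed 150 MB/s)": 150.0, "SSD (assumed 500 MB/s)": 500.0}
    for name, rate in devices.items():
        for gbps in (1, 10):
            print(f"{name} over {gbps}GbE: {remote_read_mbps(rate, gbps):.0f} MB/s")
```

Under these assumptions both device types are capped at the same 125 MB/s on 1GbE, and the difference between them only becomes visible at 10GbE and above.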



HBase Benchmarking, Update Latency

  • Updates are made to server memory

    • Extreme low latency for HBase

      • Java GC policy hurts latency at high throughput
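
The "updates are made to server memory" point can be illustrated with a toy write path: each update is appended to a log and applied to an in-memory map, and the map is flushed to an immutable sorted file once it grows past a threshold. This is a simplified sketch, not HBase code; the class name, the list standing in for a write-ahead log, and the entry-count threshold are all illustrative assumptions (a real region server flushes by size in bytes).

```python
class ToyMemStore:
    """Toy model of a log-plus-memory write path with periodic flushes."""

    def __init__(self, flush_threshold: int = 3):
        self.flush_threshold = flush_threshold  # hypothetical: entries, not bytes
        self.wal = []        # stand-in for an append-only write-ahead log
        self.memstore = {}   # in-memory buffer that absorbs updates
        self.flushed = []    # list of immutable sorted "files", oldest first

    def put(self, key, value):
        # An update only touches the log and memory, which keeps it fast.
        self.wal.append((key, value))
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the buffer out as one sorted, immutable file and start fresh.
        self.flushed.append(sorted(self.memstore.items()))
        self.memstore = {}

    def get(self, key):
        # Newest data wins: check memory first, then files from newest to oldest.
        if key in self.memstore:
            return self.memstore[key]
        for file_entries in reversed(self.flushed):
            for k, v in file_entries:
                if k == key:
                    return v
        return None
```

Because every update allocates objects in the server's heap before a flush, a model like this also hints at why garbage collection behavior matters as throughput grows.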



HBase Benchmarking, Read Latency

  • Reads hit the limits of the storage media



System Used for HBase Benchmarks

  • 4 Region servers, 1 Master, 3 Zookeeper quorum servers

  • 64GB RAM

    • 8 x 8GB DDR3 1333MHz

  • 2 x E5-2670

    • 8 Cores per socket

  • 5 x Seagate® Constellation® ES SATA 6Gb/s 2TB Hard Drive

    • 7200 RPM

  • NIC: Mellanox Technologies MT27500 Family [ConnectX-3]

    • 10Gb Ethernet

    • FW_VER=2.11.500

  • Switch SX1036

  • OS: RH 6.3

    • MLNX_OFED_LINUX-1.5.3

  • Apache HBase 0.94.9, Zookeeper 3.4.5, Apache Hadoop 1.1.2



Test Drive Your Big Data

  • EMC 1000-Node Analytic Platform

  • Accelerates Industry's Hadoop Development

  • 24 petabytes of physical storage

  • Mellanox VPI Solutions

Hadoop Acceleration

2X Faster Hadoop Job Run-Time

High Throughput, Low Latency, RDMA Critical for ROI



The Great Things in Hadoop Distributed File System

  • HDFS is a block storage solution

  • Block size can be modified to provide efficient solutions for very large files

  • Inherent reliability: no need for a high-end storage solution to make sure the data is there!

  • Tuned for Hadoop workloads: write once, read many
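
The effect of block size and replication can be worked out with a small calculation. A minimal sketch, assuming the Hadoop 1.x default block size of 64 MB and the default replication factor of 3; the function name is hypothetical:

```python
def hdfs_footprint(file_bytes: int, block_bytes: int = 64 * 1024**2, replication: int = 3):
    """Return (number of blocks, raw bytes stored across the cluster).

    HDFS does not pad the last block, so raw storage is simply the file
    size times the replication factor. Defaults are the Hadoop 1.x
    defaults (64 MB blocks, 3 replicas).
    """
    blocks = -(-file_bytes // block_bytes)  # ceiling division on integers
    raw_bytes = file_bytes * replication
    return blocks, raw_bytes


if __name__ == "__main__":
    one_gib = 1024**3
    print(hdfs_footprint(one_gib))                              # default 64 MB blocks
    print(hdfs_footprint(one_gib, block_bytes=128 * 1024**2))   # larger blocks, fewer entries
    print(hdfs_footprint(1024))                                 # a 1 KB file still needs one block entry
```

Larger blocks mean fewer block entries to track for very large files, while a tiny file still costs one block entry plus its replicas, which is one reason many small files are a poor fit.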



The Less Great Things in HDFS

  • Metadata Server Failure

  • Default 3x Replication

  • Small files or latency-sensitive workloads

  • It’s hard to manage the different settings to get the right nodes into the right capabilities.

  • Ingress and extraction of data require additional tools.



Local Disks – The Common Practice



Other Distributed Storage Solution for Hadoop, Really?!



OrangeFS as Hadoop Storage Solution



Lustre as Hadoop Storage Solution

Source: Map/Reduce on Lustre, Hadoop Performance in HPC Environments, Nathan Rutman, Senior Architect, Networked Storage Solutions, Xyratex



Ceph as Hadoop Storage Solution

  • Generating a lot of interest since the Ceph kernel client was pulled into Linux kernel 2.6.34

    • Object-based parallel file system

    • Scalable metadata server

    • Each file can specify its own striping strategy and object size

    • Automatic rebalancing of data with minimal data movement

    • Hadoop module for integrating Ceph has been in development since the 0.12 release

  • Benchmarks on Ceph are still WIP

    • We are currently working on running benchmarks on Ceph. Stay tuned!
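
The per-file striping strategy and object size mentioned above can be illustrated with a RAID-0-style mapping from a file offset to an object. This is a sketch under stated assumptions, not Ceph client code: the parameter names `stripe_unit`, `stripe_count`, and `object_size` follow Ceph's file layout terminology, while the function name and the chosen sizes are illustrative.

```python
def locate(offset: int, stripe_unit: int, stripe_count: int, object_size: int):
    """Map a byte offset in a file to (object index, offset inside that object).

    Data is written in stripe_unit chunks, round-robin across stripe_count
    objects; once those objects reach object_size, a new set of objects starts.
    """
    if object_size % stripe_unit != 0:
        raise ValueError("object_size must be a multiple of stripe_unit")
    units_per_object = object_size // stripe_unit

    unit_no = offset // stripe_unit          # which stripe unit of the file
    stripe_no = unit_no // stripe_count      # which row of units across the objects
    stripe_pos = unit_no % stripe_count      # which object within the current set
    object_set = stripe_no // units_per_object

    object_no = object_set * stripe_count + stripe_pos
    object_offset = (stripe_no % units_per_object) * stripe_unit + offset % stripe_unit
    return object_no, object_offset


if __name__ == "__main__":
    KB, MB = 1024, 1024**2
    # No striping: the file is simply cut into 4 MB objects.
    print(locate(5 * MB, stripe_unit=4 * MB, stripe_count=1, object_size=4 * MB))
    # 64 KB units striped across 4 objects of 256 KB each.
    for off in (0, 64 * KB, 256 * KB, 1 * MB):
        print(off, locate(off, stripe_unit=64 * KB, stripe_count=4, object_size=256 * KB))
```

A small stripe unit spread over several objects lets one large sequential read fan out across multiple storage nodes, while a stripe count of 1 keeps each object contiguous.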

