Apache Hadoop HBase • What is it? • Why use it? • Architecture • Storage • Related Projects
HBase – What is it? • A Hadoop data store • A NoSQL store for big data • Open source, written in Java • A distributed database • Automatic sharding: table data is split into regions spread over the cluster • Automatic region server failover
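To make "automatic sharding" concrete, here is a minimal Java sketch of routing a row to its region. It assumes a simplified model:

- Each region covers a sorted range of row keys beginning at its start key.
- A row belongs to the region with the greatest start key that is not larger than the row key.

`RegionRouter` and the region names are hypothetical. Real HBase compares row keys as byte arrays rather than Java strings. Clients locate regions through the `hbase:meta` table, which this sketch does not model.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified model of HBase-style sharding: each region owns a
// contiguous range of row keys, identified here by its (inclusive) start key.
public class RegionRouter {
    // Sorted map from region start key to region name.
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    public void addRegion(String startKey, String regionName) {
        regionsByStartKey.put(startKey, regionName);
    }

    // The owning region is the one with the greatest start key <= rowKey.
    public String regionFor(String rowKey) {
        Map.Entry<String, String> entry = regionsByStartKey.floorEntry(rowKey);
        if (entry == null) {
            throw new IllegalArgumentException("No region covers row " + rowKey);
        }
        return entry.getValue();
    }

    public static void main(String[] args) {
        RegionRouter router = new RegionRouter();
        router.addRegion("", "region-1");  // first region starts at the empty key
        router.addRegion("g", "region-2");
        router.addRegion("p", "region-3");
        System.out.println(router.regionFor("apple")); // region-1
        System.out.println(router.regionFor("hbase")); // region-2
        System.out.println(router.regionFor("zoo"));   // region-3
    }
}
```

Because the first region starts at the empty key, every row key falls into some region. Splitting a busy region amounts to adding one more start key to the map.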
HBase – Why / When use it? • Data in billions of rows • Complex data • High volume of I/O • Enough data nodes, 5 or more • No need for extra RDBMS features, e.g. transactions
HBase – Architecture • Where does HBase sit in relation to Hadoop?
HBase – Architecture • HBase is a data store • Uses Hadoop (HDFS) for distributed storage • Data is stored across region servers • Region server data is spread across HDFS data nodes • A write-ahead log (WAL) records changes
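A rough Java sketch of how regions might be spread over region servers, and moved when one server fails. It assumes simple round-robin placement.

`RegionAssignment` and the server names are hypothetical. In real HBase, the HMaster assigns regions and learns of region server failures through ZooKeeper. The failed server's WAL is then replayed before its regions come back online, a step this sketch leaves out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of region placement and failover.
public class RegionAssignment {
    private final List<String> liveServers = new ArrayList<>();
    private final Map<String, String> serverForRegion = new LinkedHashMap<>();

    public RegionAssignment(List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("Need at least one region server");
        }
        liveServers.addAll(servers);
    }

    // Round-robin placement keeps the number of regions per server balanced.
    public void assign(List<String> regions) {
        for (int i = 0; i < regions.size(); i++) {
            serverForRegion.put(regions.get(i), liveServers.get(i % liveServers.size()));
        }
    }

    // On failure, hand the dead server's regions to the surviving servers.
    public void failServer(String dead) {
        liveServers.remove(dead);
        if (liveServers.isEmpty()) {
            throw new IllegalStateException("No live region servers left");
        }
        int next = 0;
        for (Map.Entry<String, String> e : serverForRegion.entrySet()) {
            if (e.getValue().equals(dead)) {
                e.setValue(liveServers.get(next++ % liveServers.size()));
            }
        }
    }

    public String serverFor(String region) {
        return serverForRegion.get(region);
    }

    public static void main(String[] args) {
        RegionAssignment a = new RegionAssignment(List.of("rs1", "rs2", "rs3"));
        a.assign(List.of("region-1", "region-2", "region-3", "region-4"));
        a.failServer("rs2");
        System.out.println(a.serverFor("region-2")); // rs1
    }
}
```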
HBase – Storage • What is the storage architecture?
HBase – Storage • Client makes a call, e.g. put • Request is sent over RPC as a KeyValue to the region server • KeyValue is routed to the region that holds the row • Data is written to the WAL • Data is written to the region's MemStore • If the region server crashes, the WAL is used to recover the data
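The put path above can be sketched as a small, in-memory Java model. It assumes a single region, string row keys and one value per row.

`WritePathSketch`, `Region` and `Edit` are hypothetical names. A real HBase cell also carries a column family, qualifier and timestamp. The real WAL is a file on HDFS rather than a Java list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical model of the write path: log first, then the MemStore.
public class WritePathSketch {
    // Simplified change record (no family, qualifier or timestamp).
    record Edit(String row, String value) {}

    static class Region {
        private final List<Edit> wal;  // stands in for the WAL on HDFS
        // The MemStore keeps recent writes sorted by row key, in memory only.
        final TreeMap<String, String> memStore = new TreeMap<>();

        Region(List<Edit> wal) {
            this.wal = wal;
        }

        void put(String row, String value) {
            wal.add(new Edit(row, value)); // 1. record the change in the log
            memStore.put(row, value);      // 2. then apply it in memory
        }

        // After a crash the MemStore is lost; replaying the log rebuilds it.
        static Region recover(List<Edit> wal) {
            Region r = new Region(wal);
            for (Edit e : wal) {
                r.memStore.put(e.row(), e.value());
            }
            return r;
        }
    }

    public static void main(String[] args) {
        List<Edit> wal = new ArrayList<>();
        Region region = new Region(wal);
        region.put("row1", "a");
        region.put("row2", "b");
        region.put("row1", "c");                   // later write replaces earlier one
        Region recovered = Region.recover(wal);    // simulate a region server crash
        System.out.println(recovered.memStore);    // {row1=c, row2=b}
    }
}
```

Writing to the log before the MemStore is what makes recovery possible. Every accepted write has a log record that can be replayed, even though the MemStore itself lives only in memory.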
HBase – Related Projects • Apache Flume – move large data sets to Hadoop • Apache Sqoop – command line tool, moves RDBMS data to Hadoop • Apache HBase – non-relational database • Apache Pig – analyse large data sets • Apache Oozie – workflow scheduler • Apache Mahout – machine learning and data mining • Hue – Hadoop user interface • Apache ZooKeeper – configuration / coordination
Contact Us • Feel free to contact us at • www.semtech-solutions.co.nz • email@example.com • We offer IT project consultancy • We are happy to hear about your problems • You pay only for the hours you need to solve them