1 / 22

Introduction to Hbase

Introduction to Hbase. Agenda. What is Hbase About RDBMS Overview of Hbase Why Hbase instead of RDBMS Architecture of Hbase Hbase interface Summarize. What is Hbase. Hbase is an open source, distributed sorted map modeled after Google's BigTable. Open Source.

pink
Download Presentation

Introduction to Hbase

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Introduction to Hbase

  2. Agenda • What is Hbase • About RDBMS • Overview of Hbase • Why Hbase instead of RDBMS • Architecture of Hbase • Hbaseinterface • Summarize

  3. What is Hbase Hbaseis an open source, distributed sorted map modeled after Google's BigTable

  4. Open Source • Apache 2.0 License • Committers and contributors from diverse organizations like Facebook, Trend Micro etc.

  5. About RDBMS • Have a lot of Limitations • Both read / write throughput not high (transactional databases) • Specialized Hardware is quite expensive

  6. Background • Google releases paper on Bigtable – 2006 • First usable Hbase – 2007 • Hbase becomes Apache top-level project – 2010

  7. Overview of Hbase • Hbase is a part of Hadoop eco-system. • Apache Hadoop is an open source system to reliably store and process data across many commodity computers • Hadoop provides: • Fault tolerance • Scalability

  8. Hadoop's components • MapReduce (Process) • Fault tolerant distributed processing • HDFS (store) • Self-healing • High-bandwidth • Clustered storage

  9. Difference Between Hadoop/HDFS and Hbase • HDFS is a distributed file system that is well suited for the storage of large files. • Hbase is built on top of HDFS and provides fast record lookups (and updates) for large tables. • HDFS has based on GFS.

  10. Hbase is • Distributed – uses HDFS for storage • Column – Oriented • Multi-Dimensional • Storage System

  11. Hbase is NOT • A sql Database – No Joins, no query engine, no datatypes, no sql • No Schema

  12. Storage Model • Column – oriented database (column families) • Table consists of Rows, each which has a primary key(row key) • Each Row may have any number of columns • Table schema only defines Column families(column family can have any number of columns) • Each cell value has a timestamp

  13. Static Columns

  14. Something different • Row1 → ColA = Value • ColB = Value • ColC = Value • Row2 → ColX = Value • ColY= Value

  15. A Big Map Row Key + Column Key + timestamp => value

  16. RDBMS vsHbase

  17. Terms and Daemons • Region • A subset of table's rows • Region Server(slave) • Serves data for reads and writes • Master • Responsible for coordinating the slaves • Assigns regions, detects failures of Region Servers • Control some admin function

  18. Distributed coordination • To manage master election and server availability we use Zookeeper • Set up a cluster, provides distributed coordination primitives • An excellent tool for building cluster management systems

  19. Hbase Architecture

  20. Hbase Interface • Java • Thrift (Ruby, Php, Python, Perl, C++,..) • Hbase Shell

  21. Use Hbase if • You need random write, random read or both • You need to do many thousands of operations per sec on multiple TB of data • Your access patterns are simple

  22. Thank You

More Related