
HBase


caryn-weber




Presentation Transcript


  1. HBase: a column-oriented database

  2. Overview
     • An Apache project
     • Influenced by Google’s BigTable
     • Built on Hadoop
        • Stores its data in HDFS, the Hadoop Distributed File System
        • Supports MapReduce
     • Goals
        • Scalability
        • Versioning
        • Compression
        • In-memory tables

  3. Architectural issues
     • The general architecture is a cluster of nodes
     • A standalone mode runs on a single machine
     • The native API is in Java
     • The HBase shell is a JRuby shell that calls the Java API

  4. Modeling constructs
     • Table
        • Each row has a row key
        • A series of column families
           • Each column family holds columns, each with a column name and a value
     • Operations
        • Create a table
        • Insert a row with the “Put” command
           • In the shell, each put writes only one column at a time
        • Query a table with the “Get” command
           • (uses a table name and a row key)
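
The table layout and the put/get operations above can be illustrated with a toy, in-memory model. This is a sketch under stated assumptions, not the HBase client API: the class `ToyTable` and its methods are hypothetical names, columns are written as "family:qualifier" strings, and values are plain strings.

```python
from collections import defaultdict


class ToyTable:
    """Toy in-memory model of an HBase table (not the real client API).

    Layout: row key -> "family:qualifier" -> value. The column families
    are fixed when the table is created; columns inside them are not.
    """

    def __init__(self, name, families):
        self.name = name
        self.families = set(families)
        self.rows = defaultdict(dict)

    def put(self, row, column, value):
        # Like a shell put, one call writes one column ("family:qualifier").
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row][column] = value

    def get(self, row):
        # A get needs only the row key (the table is this object).
        return dict(self.rows.get(row, {}))


users = ToyTable("users", ["info", "stats"])
users.put("alice", "info:email", "alice@example.com")
users.put("alice", "stats:logins", "3")
print(users.get("alice"))
```

The column families are declared up front while new columns can be added freely by any put, which mirrors the split between table structure and per-row data described on this slide.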

  5. Filters
     • Scan
        • Retrieves a series of rows between two key values (a start row and a stop row)
        • Can apply a filter on things such as column families or timestamps
     • Filters are evaluated on the server side, so rows that do not match are not sent to the client
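
A scan over a key range with an optional filter can be sketched as follows. It assumes HBase's convention that the start row is inclusive and the stop row is exclusive; `scan` and `row_filter` are hypothetical names, and the filter is a plain Python predicate rather than an HBase filter class.

```python
import bisect


def scan(rows, start_row, stop_row, row_filter=None):
    """Toy scan: yield (key, columns) for start_row <= key < stop_row.

    rows maps row key -> {column: value}. row_filter is an optional
    predicate standing in for an HBase filter.
    """
    keys = sorted(rows)  # HBase keeps rows sorted already; a dict does not
    lo = bisect.bisect_left(keys, start_row)  # start row is inclusive
    hi = bisect.bisect_left(keys, stop_row)   # stop row is exclusive
    for key in keys[lo:hi]:
        columns = rows[key]
        # In HBase the filter runs on the region server, so rejected rows
        # never reach the client; here it is just a local check.
        if row_filter is None or row_filter(key, columns):
            yield key, columns


rows = {
    "row1": {"cf:a": "1"},
    "row2": {"cf:a": "2", "cf:b": "x"},
    "row3": {"cf:b": "y"},
    "row4": {"cf:a": "4"},
}
print([k for k, _ in scan(rows, "row2", "row4")])
print([k for k, _ in scan(rows, "row1", "row9", lambda k, c: "cf:a" in c)])
```

Because rows are kept in key order, a range scan only needs to find the two boundary positions and walk the rows in between.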

  6. Updating
     • When a column value is written to the database, old values are kept and organized by timestamp (up to a configurable maximum number of versions)
     • Each such value is a cell
     • You can assign timestamps explicitly
        • Otherwise, the current time is used as the timestamp at insert
     • A get returns the most recent version by default
     • Operations that alter the column family structure are expensive
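
The versioning rules above can be sketched for a single column. This toy `VersionedCell` class is hypothetical: it keeps (timestamp, value) pairs newest first, uses the current time in milliseconds when no timestamp is given, and trims to an illustrative `max_versions` limit (not an HBase default).

```python
import time


class VersionedCell:
    """Toy model of one HBase column: each write is kept as a timestamped version."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions  # illustrative limit, not an HBase default
        self.versions = []  # (timestamp, value) pairs, newest first

    def put(self, value, timestamp=None):
        if timestamp is None:
            timestamp = int(time.time() * 1000)  # current time in milliseconds
        # A write with an existing timestamp replaces that version.
        self.versions = [v for v in self.versions if v[0] != timestamp]
        self.versions.append((timestamp, value))
        self.versions.sort(key=lambda v: v[0], reverse=True)
        del self.versions[self.max_versions:]  # drop the oldest versions

    def get(self, as_of=None):
        """Return the newest value, or the newest one at or before as_of."""
        for ts, value in self.versions:
            if as_of is None or ts <= as_of:
                return value
        return None


cell = VersionedCell(max_versions=2)
cell.put("v1", timestamp=100)
cell.put("v2", timestamp=200)
cell.put("v3", timestamp=300)
print(cell.get(), cell.get(as_of=250), cell.get(as_of=150))
```

With a limit of two versions, the write at timestamp 100 is discarded, so a read as of time 150 finds nothing, while a plain get returns the newest value.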

  7. Other characteristics
     • Compression of stored data
     • Rows are stored sorted by row key
     • A region is a contiguous range of rows
        • Each region is served by a single region server
        • Regions can be automatically split and merged
     • Write-ahead logging prevents data loss when a node fails
        • The analogous technique in Unix file systems is called journaling
     • Supports master/slave replication across multiple clusters
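
Since rows are sorted by key and each region is a contiguous key range, finding the region for a row is a binary search over region start keys. The sketch below assumes a hypothetical `RegionMap` class; the first region starts at the empty key, and assigning the upper half of a split to a different server is a simplification, not how HBase places daughter regions.

```python
import bisect


class RegionMap:
    """Toy map from contiguous row-key ranges (regions) to region servers.

    Region i covers [start_keys[i], start_keys[i + 1]). The first region
    starts at the empty key, so every row key falls in exactly one region.
    """

    def __init__(self, server):
        self.start_keys = [""]
        self.servers = [server]

    def locate(self, row_key):
        # The owning region is the last one whose start key is <= row_key.
        i = bisect.bisect_right(self.start_keys, row_key) - 1
        return self.start_keys[i], self.servers[i]

    def split(self, split_key, new_server):
        # Split the region containing split_key; the upper half goes to
        # new_server (a simplification of how daughter regions are placed).
        i = bisect.bisect_right(self.start_keys, split_key) - 1
        if self.start_keys[i] == split_key:
            raise ValueError("split key is already a region boundary")
        self.start_keys.insert(i + 1, split_key)
        self.servers.insert(i + 1, new_server)


regions = RegionMap("rs1")
regions.split("m", "rs2")
regions.split("t", "rs3")
for key in ["apple", "melon", "pear", "zebra"]:
    print(key, regions.locate(key))
```

Splitting only inserts a new boundary, so rows never move between key ranges out of order; this is what makes automatic splitting cheap to describe.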

  8. An HBase cluster (figure taken from: http://www.packtpub.com/article/hbase-basic-performance-tuning)

  9. Tasks of components
     • A ZooKeeper cluster is the coordination service for the HBase cluster
        • Lets clients find the correct region server
        • Selects the active master
     • The master assigns regions to region servers and balances load
     • Region servers hold the regions
     • Hadoop supports MapReduce
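
The master's job of spreading regions across region servers can be illustrated with a simple greedy rule: give each region to the currently least-loaded server. This is only an illustration; `assign_regions` is a hypothetical function, ZooKeeper is not modeled, and HBase's real balancer uses its own, pluggable strategy.

```python
import heapq


def assign_regions(regions, servers):
    """Toy master: give each region to the currently least-loaded server.

    Returns a dict mapping region name -> server name. Ties are broken by
    server name because the heap holds (load, name) tuples.
    """
    heap = [(0, name) for name in servers]
    heapq.heapify(heap)
    assignment = {}
    for region in regions:
        load, name = heapq.heappop(heap)
        assignment[region] = name
        heapq.heappush(heap, (load + 1, name))
    return assignment


plan = assign_regions(["r1", "r2", "r3", "r4", "r5"], ["rs1", "rs2"])
print(plan)
```

Counting regions is a crude load measure; a real balancer would also weigh request rates and data locality, but the sketch shows why a central master is a natural place for this decision.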

  10. Some key concepts
     • De-normalization
     • Fast random retrieval of rows by key
     • A multi-component architecture that leverages existing software tools (Hadoop, ZooKeeper)
     • Control over which data is kept in memory
