PowerPoint Slideshow about 'HBase' - caryn-weber


HBase

A column-oriented database

Overview
  • An Apache project
  • Influenced by Google’s BigTable
  • Built on Hadoop
    • A distributed file system
    • Supports Map-Reduce
  • Goals
    • Scalability
    • Versioning of stored values
    • Compression
    • In-memory tables
Architectural issues
  • The general architecture is a cluster of nodes
  • There is a standalone mode for a single machine
  • There is a Java API, which can also be called from JRuby
  • There is an interactive shell built on JRuby
Modeling constructs
  • Table
    • Each row has a row key
    • Each row holds a series of column families
      • Each family contains columns, each with a column name and a value
  • Operations
    • Create table
    • Insert a row with the “Put” command
      • In the shell, only one column value at a time
    • Query a table with the “Get” command
      • (uses a table name and a row key)
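The table / row key / column family / column layout above can be pictured as nested maps. The following is a minimal Python sketch of that idea only; it is a toy in-memory model, not the HBase API, and the class name `ToyTable`, the table name `wiki`, and the column names are all made up for illustration.

```python
class ToyTable:
    """Toy in-memory model of an HBase-style table (hypothetical, not the HBase API)."""

    def __init__(self, name, families):
        self.name = name
        self.families = set(families)   # column families are declared up front
        self.rows = {}                  # row key -> {family -> {column name -> value}}

    def put(self, row_key, column, value):
        """Write one column value; `column` is written as 'family:name'."""
        family, name = column.split(":", 1)
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows.setdefault(row_key, {}).setdefault(family, {})[name] = value

    def get(self, row_key):
        """Return every column of one row as {'family:name': value}."""
        row = self.rows.get(row_key, {})
        return {f"{fam}:{name}": value
                for fam, cols in row.items()
                for name, value in cols.items()}


# Create a table with two column families, then write one column at a time.
wiki = ToyTable("wiki", ["text", "revision"])
wiki.put("Home", "text:body", "Welcome")
wiki.put("Home", "revision:author", "alice")
home = wiki.get("Home")
```

A get needs only the table and a row key, which is why the row key is the primary access path in this model.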
Filters
  • Scan
    • Can retrieve a range of rows bounded by two key values (a start key and a stop key)
    • Can take a filter on such things as column families and timestamps
    • Filters can be pushed to the server
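A range scan with a column-family filter can be sketched in a few lines of Python. This is a toy model rather than the HBase scanner: the function name `scan`, the sample rows, and the choice to treat the stop key as exclusive are assumptions made for the example.

```python
from bisect import bisect_left


def scan(rows, start_row, stop_row, family=None):
    """Yield (row_key, columns) for start_row <= row_key < stop_row.

    rows:   dict mapping row key -> {'family:name': value}
    family: optional column-family name; only its columns are returned
    """
    keys = sorted(rows)                  # rows are visited in key order
    lo = bisect_left(keys, start_row)
    hi = bisect_left(keys, stop_row)     # stop key is exclusive in this sketch
    for key in keys[lo:hi]:
        cols = rows[key]
        if family is not None:
            cols = {c: v for c, v in cols.items() if c.startswith(family + ":")}
        if cols:                         # skip rows with nothing left after filtering
            yield key, cols


rows = {
    "apple":  {"text:body": "a", "meta:author": "x"},
    "banana": {"text:body": "b"},
    "cherry": {"meta:author": "y"},
    "date":   {"text:body": "d"},
}
result = list(scan(rows, "b", "d", family="text"))
```

Applying the filter inside `scan`, before rows are returned, stands in for pushing the filter to the server: rows that do not match are never sent back to the caller.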
Updating
  • When a column value is written to the db, old values are kept and organized by timestamp
    • Each such value is a cell
  • You can assign timestamps explicitly
    • Otherwise, the current time at insert is used
    • A get returns the most recent version by default
  • Operations that alter column family structures are expensive
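The timestamp behavior above can be illustrated with a small Python sketch. It is a toy model of one column's history, not HBase code; the class name `VersionedCell` and the parameter `max_versions` are hypothetical names chosen for the example.

```python
import time


class VersionedCell:
    """Toy model of one column's history: every write is kept, keyed by timestamp."""

    def __init__(self):
        self.versions = {}   # timestamp -> value

    def put(self, value, timestamp=None):
        # An explicit timestamp may be supplied; otherwise use the current time (ms).
        if timestamp is None:
            timestamp = int(time.time() * 1000)
        self.versions[timestamp] = value

    def get(self, max_versions=1):
        """Return up to max_versions (timestamp, value) pairs, newest first."""
        newest_first = sorted(self.versions.items(), reverse=True)
        return newest_first[:max_versions]


cell = VersionedCell()
cell.put("draft", timestamp=100)
cell.put("final", timestamp=200)
cell.put("backfilled", timestamp=50)   # written last, but carries the oldest timestamp

latest = cell.get()
history = cell.get(max_versions=3)
```

Note that "most recent" is decided by timestamp, not by write order, so the value written last with an older explicit timestamp does not become the current one.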
Other characteristics
  • Text compression
  • Rows are stored in order by key value
  • A region is a contiguous set of rows
    • Each region is served by a single region server
    • Regions can be automatically merged and split
  • Uses write-ahead logging to prevent loss of data with node failures
    • This is called journaling in Unix file systems
  • Supports a master/slave multi-cluster strategy
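Write-ahead logging, mentioned above, follows a simple rule: record the change in a durable log first, then apply it in memory. The Python sketch below illustrates only that rule. It is a toy, not how HBase implements its log; `ToyRegionServer`, `memstore`, and the list used as a "durable" log are stand-ins chosen for the example.

```python
class ToyRegionServer:
    """Toy write-ahead log: append to a durable log before updating memory."""

    def __init__(self, log):
        self.log = log        # stands in for a log file on durable storage
        self.memstore = {}    # in-memory state, lost when the node fails

    def put(self, row_key, value):
        self.log.append((row_key, value))   # 1. record the write durably
        self.memstore[row_key] = value      # 2. only then apply it in memory

    @classmethod
    def recover(cls, log):
        """Rebuild in-memory state on a fresh server by replaying the log in order."""
        server = cls(log)
        for row_key, value in log:
            server.memstore[row_key] = value
        return server


durable_log = []
server = ToyRegionServer(durable_log)
server.put("row1", "a")
server.put("row2", "b")
server.put("row1", "c")

del server                                   # simulate the node failing
recovered = ToyRegionServer.recover(durable_log)
```

Because the log is replayed in write order, the recovered server ends up with the last value written for each row.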
Figure: An HBase cluster (taken from http://www.packtpub.com/article/hbase-basic-performance-tuning)
Tasks of components
  • The ZooKeeper cluster is a coordination service for the HBase cluster
    • Finds the correct server
    • Selects the master
  • The master allocates regions and handles load balancing
  • Region servers hold the regions
  • Hadoop supports Map-Reduce
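The division of labor above (something assigns regions to servers, something else finds the server for a given row) can be sketched in Python. This is a toy under stated assumptions: regions are identified by their start key, the first region starts at the empty string, and assignment is plain round-robin. None of this is claimed to match HBase's actual balancer or lookup protocol, and the server names are made up.

```python
from bisect import bisect_right


def assign_regions(region_start_keys, servers):
    """Master role (toy): spread regions across region servers round-robin."""
    return {start: servers[i % len(servers)]
            for i, start in enumerate(sorted(region_start_keys))}


def locate(row_key, assignment):
    """Lookup role (toy): find the region whose key range contains row_key.

    Assumes the first region starts at "" so every key falls in some region.
    """
    starts = sorted(assignment)
    region_start = starts[bisect_right(starts, row_key) - 1]
    return region_start, assignment[region_start]


# Three regions: ["", "g"), ["g", "p"), ["p", end of key space)
assignment = assign_regions(["", "g", "p"], ["rs1", "rs2"])

first = locate("apple", assignment)
middle = locate("g", assignment)
last = locate("zebra", assignment)
```

Since rows are ordered by key and each region covers a contiguous key range, locating a row is a binary search over region start keys.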
Some key concepts
  • De-normalization
  • Fast random retrieval by row key
  • Use of a multi-component architecture to leverage existing software tools
  • Control over which data is kept in memory
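As an illustration of how de-normalization supports fast retrieval by row key, the Python sketch below folds a one-to-many relationship into a single row, so that one key lookup replaces a join. The data, the `denormalize` function, and the `info` and `orders` family names are all invented for the example.

```python
# Normalized form: two collections that must be joined at read time.
users = {"u1": {"name": "Ada"}}
orders = [
    {"order_id": "o1", "user_id": "u1", "item": "book"},
    {"order_id": "o2", "user_id": "u1", "item": "pen"},
]


def denormalize(users, orders):
    """Fold each user's orders into that user's row as extra columns."""
    rows = {}
    for user_id, info in users.items():
        rows[user_id] = {"info:name": info["name"]}
    for order in orders:
        # One column per order, named after the order id.
        rows[order["user_id"]]["orders:" + order["order_id"]] = order["item"]
    return rows


rows = denormalize(users, orders)
ada = rows["u1"]   # a single lookup by row key returns the user and all orders
```

The trade-off in this sketch is the usual one: reads become a single lookup, while the same fact may need to be written in more than one place.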