Thanks to our Sponsors! To connect to wireless: 1. Choose Uguest in the wireless list. 2. Open a browser. This will open a U of U website. 3. Choose Login.
Introduction to HBase • Giri Vislawath, Senior Software Developer, Overstock.com • giri.vislawath@gmail.com
Agenda • What is HBase? • What HBase is NOT • Relational Database vs. HBase • HBase: Architecture, Data Model, Logical & Physical View, Design Considerations, Setup, Clients • Demo • Q & A
What is HBase? • Open source Apache project • Non-relational, distributed database • Runs on top of HDFS • Modeled after Google’s Bigtable technology • Written in Java • NoSQL (Not Only SQL) database • Consistent and partition tolerant • Runs on commodity hardware • Large database (terabytes to petabytes) • Low-latency random reads and writes on top of HDFS • Many companies use HBase: Facebook, Twitter, Adobe, Mozilla, Yahoo!, Trend Micro, and StumbleUpon
HBase is NOT • A direct replacement for an RDBMS • ACID (Atomicity, Consistency, Isolation, and Durability) compliant • HBase provides row-level atomicity • A scan is NOT a consistent view of a table (nor is it isolated) • All visible data is also durable data
Relational Database vs. HBase • Hardware: an RDBMS runs on expensive enterprise multiprocessor systems; HBase runs on the same hardware as Hadoop. • Fault tolerance: an RDBMS is configured for high availability, and server downtime is intolerable; in HBase fault tolerance is built into the architecture, and an individual node failure does not impact overall performance. • Database size: an RDBMS can hold up to TBs (terabytes); HBase can hold PBs (petabytes). • Data layout: an RDBMS is organized by rows and columns; HBase is column oriented.
Relational Database vs. HBase (continued) • Data types: an RDBMS has rich data types; HBase stores bytes. • Transactions: an RDBMS is fully ACID compliant; HBase is ACID on a single row only. • Indexes: an RDBMS has PK, FK, and other indexes; HBase has a sorted row key (not a real index).
HBase Architecture [Diagram: Zookeeper, Client, Master, and Region Servers 1, 2, and 3, running on top of HDFS / Hadoop]
HBase – Fault Tolerance • What if a region server dies? • The HBase master reassigns its regions to another region server. • What if the master dies? • The backup master takes over. • What if the backup master dies? • You are dead. • Replication of data • HBase achieves this using the HDFS replication mechanism. • Failure detection • Zookeeper is used to identify failed region servers.
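The region server failure case can be pictured with a toy simulation. This is only a sketch under simplifying assumptions: the function name `reassign_regions`, the server names, and the round-robin policy are illustrative and are not how the real master balances load. It shows only that the regions of a dead server end up on the survivors.

```python
def reassign_regions(assignments, dead_server):
    """Return a new {server: [regions]} map in which the dead server's
    regions are spread round-robin over the surviving servers.

    Toy model only: the real HBase master learns about the failure
    through Zookeeper and uses its own assignment logic.
    """
    survivors = sorted(s for s in assignments if s != dead_server)
    if not survivors:
        raise RuntimeError("no region servers left")
    # Copy the lists so the caller's map is not modified.
    new_assignments = {s: list(assignments[s]) for s in survivors}
    for i, region in enumerate(assignments.get(dead_server, [])):
        new_assignments[survivors[i % len(survivors)]].append(region)
    return new_assignments


# Hypothetical cluster: three region servers holding five regions.
cluster = {"rs1": ["a", "b"], "rs2": ["c"], "rs3": ["d", "e"]}
after_failure = reassign_regions(cluster, "rs3")
print(after_failure)
```

No data is copied in this picture because the region files already live in HDFS, which handles replication on its own.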
HBase Data Model • No schema • Table • Row key must be unique • Rows are formed by one or more columns • Columns are grouped into column families • Column families must be defined at table creation time • Any number of columns per column family • Columns can be added on the fly • Columns can be NULL • NULL columns are NOT stored (free of cost) • Columns only exist when inserted (sparse) • Cell • Addressed by row key, column family, qualifier, and timestamp / version • Data is represented as byte arrays • So are the table name, column family name, and column name
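The data model above can be sketched as a small in-memory structure. This is a toy model, not the HBase client API: the class name `SparseTable` and its methods are made up for illustration. It shows three points from the slide: column families are fixed at creation, qualifiers can be added on the fly, and a cell that was never written takes no space.

```python
import time
from collections import defaultdict


class SparseTable:
    """Toy in-memory model of an HBase table (hypothetical, not the real API)."""

    def __init__(self, column_families):
        # Column families must be known when the table is created.
        self.column_families = set(column_families)
        # (row_key, family, qualifier) -> {timestamp: value}
        self.cells = defaultdict(dict)

    def put(self, row_key, family, qualifier, value, timestamp=None):
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family!r}")
        if timestamp is None:
            timestamp = time.time_ns()
        # Any qualifier is accepted: columns are created on first write.
        self.cells[(row_key, family, qualifier)][timestamp] = value

    def get(self, row_key, family, qualifier):
        versions = self.cells.get((row_key, family, qualifier))
        if not versions:
            return None  # never written, so nothing is stored
        return versions[max(versions)]  # newest timestamp wins


# Keys, names, and values are all byte strings, as on the slide.
table = SparseTable([b"info", b"tweet"])
table.put(b"user1", b"info", b"name", b"old name", timestamp=1)
table.put(b"user1", b"info", b"name", b"new name", timestamp=2)
print(table.get(b"user1", b"info", b"name"))   # newest version
print(table.get(b"user1", b"info", b"email"))  # missing column
```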
HBase – Logical View of Data [Figure: the same data shown as an RDBMS view and as a logical HBase view]
HBase – Physical View of Data [Figure: cells stored separately for the info column family and the tweet column family] • Each cell is stored as KEY (ROW KEY, CF, QUALIFIER, TIMESTAMP) => VALUE
HBase – Logical to Physical View • Logical view: rows with columns in column families CF1 and CF2 • Physical view: one HFile per column family • HFile for CF1: ROW1:CF1:C1:V1, ROW1:CF1:C3:V3, ROW2:CF1:C1:V4, ROW2:CF1:C2:V6, ROW2:CF1:C4:V7, ROW3:CF1:C3:V6, ROW4:CF1:C1:V10, ROW4:CF1:C3:V11 • HFile for CF2: ROW1:CF2:C6:V6, ROW3:CF2:C6:V5, ROW4:CF2:C6:V2
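The mapping from the logical to the physical view can be reproduced with a few lines of code. This is a simplified sketch: the function name `to_physical` is made up, timestamps are left out as on the slide, and a real HFile is a binary format rather than a list of tuples.

```python
def to_physical(logical_rows):
    """Flatten {row: {family: {qualifier: value}}} into one sorted list of
    (row, family, qualifier, value) entries per column family."""
    files = {}
    for row, families in logical_rows.items():
        for family, columns in families.items():
            for qualifier, value in columns.items():
                files.setdefault(family, []).append((row, family, qualifier, value))
    for entries in files.values():
        entries.sort()  # entries are kept sorted by row key, then column
    return files


# The sample data from the slide, in its logical (row-by-row) form.
logical = {
    "ROW1": {"CF1": {"C1": "V1", "C3": "V3"}, "CF2": {"C6": "V6"}},
    "ROW2": {"CF1": {"C1": "V4", "C2": "V6", "C4": "V7"}},
    "ROW3": {"CF1": {"C3": "V6"}, "CF2": {"C6": "V5"}},
    "ROW4": {"CF1": {"C1": "V10", "C3": "V11"}, "CF2": {"C6": "V2"}},
}

physical = to_physical(logical)
for family in sorted(physical):
    print(f"HFile for {family}:")
    for entry in physical[family]:
        print("  " + ":".join(entry))
```

ROW2 has no entry at all in the CF2 list, which is the "NULL columns are not stored" property seen from the storage side.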
Design Considerations • Row key design • Row key design is very important for getting the most out of HBase • Design the row key based on how you access the data • Salt the row key (add a prefix) • Make sure data is uniformly distributed (avoid hotspotting) • Column family design • Group like information together (user base info, user tweets) • Use short column family names (every entry in an HFile contains the name, in bytes) • Two to three column families per table
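Row key salting can be sketched as follows. The bucket count, the `NN-` prefix format, and the function name `salted_key` are assumptions chosen for illustration; the slides do not prescribe a particular scheme. The idea is that a prefix derived from a hash of the key spreads otherwise sequential keys across the key space.

```python
import hashlib

NUM_BUCKETS = 8  # hypothetical; often chosen relative to the number of region servers


def salted_key(row_key: bytes, buckets: int = NUM_BUCKETS) -> bytes:
    """Prefix the row key with a bucket number derived from its hash.

    The salt is deterministic, so a reader can recompute it for a point lookup.
    """
    digest = hashlib.sha256(row_key).digest()
    bucket = digest[0] % buckets
    return b"%02d-" % bucket + row_key


# Sequential keys (for example timestamps) would all sort next to each other.
# After salting they are spread over the buckets.
for ts in range(1000, 1005):
    print(salted_key(b"event-%d" % ts))
```

The trade-off is on the read side: a range scan over the original key order now has to query every bucket and merge the results.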
HBase – Setup • HBase is written in Java • The HBase shell is based on JRuby’s IRB (interactive Ruby shell) • Download HBase from https://hbase.apache.org/ • Latest stable version is 0.94.17 • Standalone mode • Start: $HBASE_HOME/bin/start-hbase.sh • Stop: $HBASE_HOME/bin/stop-hbase.sh • Shell: $HBASE_HOME/bin/hbase shell • Single-node cluster mode (pseudo) • Cloudera VM (on VMware Player or VirtualBox) (www.cloudera.com)
HBase – Clients • Program / API-based clients • Java, REST, Thrift, Avro • Batch clients • MapReduce (Pig, Hive) • Shell • Command-line interface • Supports client and administrative operations • Web-based UI • HUI (HBase cluster UI)
HBase – Shell (commands) • CRUD explained • CREATE = PUT • READ = GET • UPDATE = PUT • DELETE = DELETE
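The CRUD mapping is easiest to see in a toy key-value store. This is a sketch in plain Python, not HBase shell syntax and not the HBase API: the class `ToyTable` and its methods are hypothetical. The point it illustrates is that a single put operation covers both CREATE and UPDATE.

```python
class ToyTable:
    """Hypothetical in-memory stand-in for a table, to illustrate CRUD."""

    def __init__(self):
        self.rows = {}  # row_key -> {"family:qualifier": value}

    def put(self, row_key, column, value):
        # CREATE and UPDATE are the same operation: write the cell.
        self.rows.setdefault(row_key, {})[column] = value

    def get(self, row_key):
        # READ: return a copy of all columns of the row (empty if absent).
        return dict(self.rows.get(row_key, {}))

    def delete(self, row_key, column=None):
        # DELETE: remove one column, or the whole row if no column is given.
        if column is None:
            self.rows.pop(row_key, None)
            return
        row = self.rows.get(row_key, {})
        row.pop(column, None)
        if not row:
            self.rows.pop(row_key, None)


table = ToyTable()
table.put("user1", "info:name", "Giri")      # CREATE
table.put("user1", "info:name", "Giri V.")   # UPDATE, same call
print(table.get("user1"))                     # READ
table.delete("user1", "info:name")            # DELETE
print(table.get("user1"))
```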
References • HBase: The Definitive Guide by Lars George • HBase in Action by Nick Dimiduk and Amandeep Khurana