1 / 21

Thanks to our Sponsors!

Thanks to our Sponsors! . To connect to wireless 1. Choose Uguest in the wireless list 2. Open a browser. This will open a Uof U website 3. Choose Login . Introduction to. Giri Vislawath Senior Software Developer Overstock.com giri.vislawath@gmail.com. Agenda.

magar
Download Presentation

Thanks to our Sponsors!

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Thanks to our Sponsors! To connect to wireless 1. Choose Uguest in the wireless list 2. Open a browser. This will open a Uof U website 3. Choose Login

  2. Introduction to Giri Vislawath Senior Software Developer Overstock.com giri.vislawath@gmail.com

  3. Agenda • What is HBase ? • What HBase is NOT? • Relational Database vsHBase • HBase • Architecture • Data Model • Logical & Physical View • Design Considerations • Setup • Clients • Demo • Q & A

  4. What is HBase? • Open source Apache project • Non-relational, distributed Database • Runs on top of HDFS • Modeled after Google’s BigTable technology • Written in Java • NoSQL (Not Only SQL) Database • Consistent and Partition tolerant • Runs on commodity hardware • Large Database ( terabytes to petabytes). • Low latency random read / write to HDFS. • Many companies are using HBase • Facebook, Twitter, Adobe, Mozilla, Yahoo!, Trend Micro, and StumbleUpon

  5. HBase is NOT • A direct replacement for RDBMS • ACID (Atomicity, Consistency, Isolation, and Durability) complaint • HBase provides row-level atomicity • A scan is NOT consistent view of a table (neither isolated) • All visible data is also durable data.

  6. Relational Database vsHBase • Hardware • Expensive Enterprise multiprocessor systems • Same as Hadoop • Fault Tolerance • RDBMS are configured with high availability. Server down time intolerable. • Built into the architecture. Individual Node failure does not impact overall performance. • Database Size • RDBMS can hold upto TBs (Tera bytes) • Hbase can hold PBs (Peta bytes) • Data Layout • RDBMS are rows and columns oriented • Hbase is Column oriented

  7. Relational Database vsHBase • Data Type • Rich data type. • Bytes • Transactions • Fully ACID complaint. • ACID on single row only. • Indexes • PK, FK and other indexes. • Sorted Row-key (not a real index)

  8. HBase Architecture Zookeeper Client Master Region Server 1 Region Server 2 Region Server 3 HDFS / Hadoop

  9. HBase – Fault Tolerance • What if region server dies? • The hbase master will assign a new regionserver. • What if maser dies? • The back up master will take over. • What if the backup master dies? • You are dead. • Replication of Data • HBase achieves this using HDFS replication mechanism. • Failure Detection • Zookeeper is used for identifying failed region servers.

  10. HBase Data Model • No Schema • Table • Row-key must be unique • Rows are formed by one or more columns • Columns are grouped into Column Families • Column Families must be defined at table creation time • Any number of Columns per column family • Columns can be added on the fly • Columns can be NULL • NULL columns are NOT stored (free of cost) • Column only exist when inserted (Sparse) • Cell • Row Key, Column Family, Qualifier , Timestamp / Version • Data represented in byte array • Table name, Column Family name, Column name

  11. HBase – Logical View of Data RDBMS View Logical Hbase View

  12. HBase – Physical View of Data Info column family tweet column family KEY (ROW KEY, CF, QUALIFIER, TIMESTAMP) => VALUE

  13. Hbase – Logical to Physical View CF1 CF2 HFile for CF2 HFile for CF1 ROW1:CF1:C1:V1 ROW1:CF1:C3:V3 ROW2:CF1:C1:V4 ROW2:CF1:C2:V6 ROW2:CF1:C4:V7 ROW3:CF1:C3:V6 ROW4:CF1:C1:V10 ROW4:CF1:C3:V11 ROW1:CF2:C6:V6 ROW3:CF2:C6:V5 ROW4:CF2:C6:V2 Physical View

  14. Design Considerations • Row Key design • To Leverage Hbase system, row-key design is very important • Row Key must be designed based on how you access data. • Salting rowkey (prefix) • Must be designed to make sure data uniformly distributed (Avoid hotspotting) • Column Family design • Designed based on grouping of like information (user base info, user tweets) • Short name for column family (every row in Hfile contains the name, in bytes) • Two to three column families per Table

  15. Hbase - Setup • HBase is written in Java • HBase Shell is based on JRuby’s IRB (interactive ruby shell) • Download HBasefrom https://hbase.apache.org/ • Latest stable version is 0.94.17 • Hbase • Standalone • $HBASE_HOME/bin/start-hbase.sh • $HBASE_HOME/bin/stop-hbase.sh • $HBASE_HOME/bin/hbase shell • Single Node Cluster mode (pseudo) • Cloudera VM (on VMPlayer or VirtualBox) (www.cloudera.com)

  16. HBase – Clients • Program / API based clients • Java, REST, Thrift, Avro • Batch Clients • MapReduce (Pig, Hive) • Shell • Command Line Interface • Supports Client and Administrative operations. • Web-based UI • HUI (Hbase cluster UI)

  17. Hbase – Shell (commands) CRUD explained CREATE = PUT READ = GET UPDATE = PUT DELETE = DELETE

  18. Hbase – Java API (examples)

  19. References HBase: The Definitive Guide by Lars George HBase in Action by Nick Dimiduk and AmandeepKhurana

  20. Thank You

More Related