HBase Operations at Facebook

HBase Operations at Facebook • Paul Tuckfield • January 2012

HBase Operations
• The HBase cells
  • Many HBase cells
  • 3 versions, several minor branches/revs
  • Mostly uniform host types
  • Varying network topologies/rack topologies
  • Varying sizes
• We (Ryan, Alex and me) are the “DBAs” or “SREs” of HBase at Facebook
  • Moving towards slightly more differentiation of roles for teams at Facebook as the HBase effort matures

We have some important use cases running on HBase, but they are small compared to what is running in MySQL and Hadoop. That said, there are some critical use cases, and even a fraction of the very large Facebook environment is still pretty large.

• The use cases: some live, some not
  • Titan (user-facing messaging)
  • Facebook-specific time series
    • Puma (user-facing stats)
    • ODS (system metrics)
  • Hashout
  • Eris: “multi-tenant” “dormitory” for incubation of new projects
  • CDB: a few use cases replacing what would have been on smallish sharded MySQL setups
  • ODS-HBase: Facebook instrumentation and alerting system, currently on MySQL
  • Prototype/testing of general user data on HBase

SMC / HSH: basic Facebook “cloud” tools used for HBase
• SMC:
  • User-defined sets of host:port “services”
  • Arbitrary metadata
  • Machine states (enabled, disabled)
• HSH:
  • Better version of dsh
  • Integration with SMC
• Deploy: push slaves info to SMC, use SMC/HSH to push code to the hosts that make up the cell
• Other examples besides deploy:
  • Cluster start/stop
  • Autostart
  • Scan ports
  • Scan logs
(Diagram labels: SMC; deploy tool/utility, whatever; HBase; SVN/Git)

HBase Maintenance
• “It’s self-healing”
• Backups
  • Stage 1, 2, 3
• Repairs
  • FBAR
• Upgrades
  • Rolling, cold
• Rack concerns

Attempt to standardize bandwidth/rack dispersion tradeoffs
• Running on several different generations of network core/rack switch combos, some slow, some fast
• A rack-oriented layout would have better intra-cell performance in worst-case situations (not uncommon)
• “Horizontally” organized cells can hopefully survive single-rack issues
• I’m not so sure it’s a good thing: the network is pretty reliable, so why emphasize uplink failure tolerance? Maybe we should have sharded HBase setups
• 2 cells of 40 hosts each, spread across 5 racks rather than “vertical”
(Diagram labels: Cell 1, Cell 2, Spares)

Things we monitor/alert
• Monitor hundreds of variables in ODS, the Facebook time series database
• Alert/SMS on:
  • hbck failures
  • DFS fsck failures
  • Probe/scan a table from a client
  • Throughput rates in some cases
• Most application alarms are left to other teams, in an attempt to be a relatively generic service to the rest of Facebook

Troubleshooting
• Typical problems
  • Regionserver/slave apocalypse
  • fsck inconsistencies
  • hbck inconsistencies
  • Long recoveries/timeouts after failures
  • Wedged regions/meta info
  • Log splitting during recovery
  • Memory/thread exhaustion -> regionserver deaths
  • GC pauses, tuning-related deaths
  • Rack switch bandwidth-related issues

Setting up HBase Clusters
• Doing all the things
  • HBase versions, 0.89 vs 0.92
  • Rack and host selection
  • Imaging and partitioning
  • Populating SMC tiers
  • Building from templates
  • Pushing
  • Starting up everything!

Tools use $CELLNAME envvar
• Typical session
  • Run “setcell” to set the environment; all subsequent commands are “pointed at” the given HBase cell
  • hbscan to see the status of hosts in that cell
  • hblog to look at logs
  • hbprocess (like showprocess)
  • Etc.
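A minimal sketch of how a tool might pick up the cell it is “pointed at”. Only the $CELLNAME variable comes from the slide; the function name current_cell and the error message are hypothetical, and the real tools may behave differently.

```python
import os


def current_cell(environ=None):
    """Return the cell name from CELLNAME, or raise if it is unset or empty.

    `environ` defaults to the process environment; passing a dict makes the
    function easy to exercise without touching os.environ.
    """
    env = os.environ if environ is None else environ
    cell = env.get("CELLNAME", "").strip()
    if not cell:
        raise RuntimeError("CELLNAME is not set; run setcell first")
    return cell
```

Failing loudly when the variable is missing avoids running a command against whatever cell a tool would otherwise default to.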

Typical operations: setcell/hbhost
• Typically start with “setcell”
• hbhost just shows what is in SMC for this cell
• “hbhost nn” or “hbhost master” to ssh to the given host without caring about hostnames

hbscan: Python “nmap”-like scan
• hbscan to get a quick impression of the state of the cell
• Queries SMC for topology
• Scans all hosts for all known ports (TCP connect)
• Takes a few seconds
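The TCP-connect scan described above can be sketched roughly as follows. This is not the actual hbscan code: the topology is passed in as a plain dict (the SMC lookup is not shown), and the names port_open and scan, the timeout, and the worker count are all assumptions.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connect to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def scan(topology, timeout=1.0, workers=64):
    """Probe every (host, port) pair in parallel.

    topology: dict mapping host -> list of ports expected on that host.
    Returns a dict mapping (host, port) -> bool.
    """
    targets = [(host, port) for host, ports in topology.items() for port in ports]
    if not targets:
        return {}
    with ThreadPoolExecutor(max_workers=min(workers, len(targets))) as pool:
        results = pool.map(lambda t: port_open(t[0], t[1], timeout), targets)
        return dict(zip(targets, results))
```

Running the connects in a thread pool is what keeps a scan of a whole cell down to a few seconds, since each probe mostly waits on the network.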

hblog: “normalize” and summarize log lines
• Attempt to remove entropy to get to the “core” message
• Fingerprint with md5
• Summarize by md5/host
• Columns -> cluster-wide errors
• Rows -> this particular node is jacked
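A rough sketch of the normalize-and-fingerprint idea. The slide does not say which parts of a line count as entropy, so the regexes below (timestamps, hex ids, bare numbers) are assumptions, as are the function names.

```python
import hashlib
import re
from collections import Counter

# Assumed sources of "entropy"; order matters (timestamps before bare numbers).
_PATTERNS = [
    (re.compile(r"\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}(?:[.,]\d+)?"), "<ts>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<hex>"),
    (re.compile(r"\b[0-9a-f]{16,}\b"), "<hex>"),
    (re.compile(r"\d+"), "<n>"),
]


def normalize(line):
    """Replace variable parts of a log line with placeholders."""
    for pattern, placeholder in _PATTERNS:
        line = pattern.sub(placeholder, line)
    return " ".join(line.split())


def fingerprint(line):
    """md5 of the normalized line; equal for lines with the same core message."""
    return hashlib.md5(normalize(line).encode("utf-8")).hexdigest()


def summarize(lines_by_host):
    """Count occurrences per (fingerprint, host).

    lines_by_host: dict mapping host -> iterable of raw log lines.
    Returns (counts, samples): counts is a Counter keyed by (fingerprint, host),
    samples maps each fingerprint to one normalized example line.
    """
    counts = Counter()
    samples = {}
    for host, lines in lines_by_host.items():
        for line in lines:
            fp = fingerprint(line)
            counts[(fp, host)] += 1
            samples.setdefault(fp, normalize(line))
    return counts, samples
```

Laid out as a hosts-by-fingerprints grid, a fingerprint that shows up on every host points to a cluster-wide error, while a host with many distinct fingerprints points to a single bad node.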

Observation: cluster is as slow as the slowest regionserver
• Common pattern is to ingest data and multiput to HBase from many frontends
• The larger the multiput, the more likely clients will serialize/collide on a hot regionserver
• Don’t look at the average . . . look at the average *and* the outliers
• But which metric?
• (Imagine lines drawn from every box to every can)
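To make the “average and the outliers” point concrete, here is a small sketch that reports both for a per-regionserver metric. It does not answer the “which metric?” question; the input is whatever value you choose to collect per server, and the median-times-factor outlier rule is an arbitrary assumption.

```python
from statistics import mean, median


def summarize_metric(per_server, factor=3.0):
    """Summarize one metric across regionservers.

    per_server: dict mapping regionserver name -> metric value (e.g. put
    latency in ms). A server is flagged as an outlier when its value exceeds
    `factor` times the median (an assumed, not a recommended, threshold).
    """
    values = list(per_server.values())
    med = median(values)
    outliers = {
        server: value
        for server, value in per_server.items()
        if med > 0 and value > factor * med
    }
    return {
        "mean": mean(values),
        "median": med,
        # A multiput touching every server finishes only when the slowest one does.
        "max": max(values),
        "outliers": outliers,
    }
```

With one hot server among many healthy ones, the mean moves only a little while the max, which is what a fanned-out multiput actually waits for, moves a lot.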

Observation: evolution/selection of balance
• In a few cases, performance issues or bugs relating to load cause hosts to crash
• When a crash happens, regions move around
• A new “hand is drawn” with different combinations of regions
• When the combination of regions is such that there’s no death . . . balanced!

Observation: balancing could be much better
• In cases where skew seems to dominate, we’ve experimented with manual region placement/splitting
• Developed basic JRuby/Groovy scripts using HBaseAdmin
• Maybe support “user space” balancers
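One way to picture a “user space” balancer is a planner that takes the current region assignment plus a per-region load estimate and proposes moves. The sketch below is a simple greedy heuristic written for illustration, not the scripts mentioned above; the function name, the inputs, and the load metric are all assumptions, and applying the moves (for example through HBaseAdmin) is left out.

```python
def plan_moves(assignment, load, servers=None, max_moves=10):
    """Greedily propose region moves from the busiest to the idlest server.

    assignment: dict mapping region -> server currently hosting it.
    load: dict mapping region -> numeric load estimate (assumed metric).
    servers: optional list of all servers, so empty ones can receive regions.
    Returns a list of (region, source_server, destination_server) tuples.
    """
    assignment = dict(assignment)  # work on a copy
    if servers is None:
        servers = sorted(set(assignment.values()))
    else:
        servers = sorted(servers)
    moves = []
    for _ in range(max_moves):
        totals = {server: 0.0 for server in servers}
        for region, server in assignment.items():
            totals[server] += load[region]
        src = max(servers, key=lambda s: totals[s])
        dst = min(servers, key=lambda s: totals[s])
        gap = totals[src] - totals[dst]
        # Only regions with 0 < load < gap narrow the gap between src and dst.
        candidates = [
            region
            for region, server in assignment.items()
            if server == src and 0 < load[region] < gap
        ]
        if not candidates:
            break
        # Moving a region whose load is about half the gap evens the pair best.
        region = min(candidates, key=lambda r: abs(load[r] - gap / 2))
        assignment[region] = dst
        moves.append((region, src, dst))
    return moves
```

Keeping the planner separate from the code that executes moves makes it possible to review a proposed plan before touching a live cell.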
