
CHAPTER 8

Hadoop Setup and Configuration

Outline
  • Preliminaries
  • Hadoop installation and configuration
  • HBase cluster installation
  • Basic Hadoop operations
  • Basic HBase operations
  • Web interface
Preliminaries (1/5)
  • Hadoop can be deployed on both GNU/Linux and Win32; this walkthrough uses GNU/Linux as the installation platform.
  • Two packages must be installed before Hadoop: Java and ssh. Hadoop is written in Java, so it runs in a Java runtime environment (JRE); the system must therefore have Java version 6 (or newer) installed.
Preliminaries (2/5)
  • The operating system used in this example is CentOS 5.5. During OS installation, the OpenJDK Java package is installed by default; you can check whether it is present with the command java -version:

~# java -version

java version "1.6.0_17"

OpenJDK Runtime Environment (IcedTea6 1.7.5) (rhel-1.16.b17.el5-i386)

OpenJDK Client VM (build 14.0-b16, mixed mode)

  • If you see output like the above, the OpenJDK package is installed. If it is not, you can install it with yum:

~# yum -y install java-1.6.0-openjdk

Preliminaries (3/5)
  • Although Hadoop runs on OpenJDK, OpenJDK does not support certain applications, and for the development work later on, this example uses the Oracle (Sun) Java JDK as the Java environment. The Oracle (Sun) Java JDK can be downloaded from the Oracle website (http://www.oracle.com).
Preliminaries (4/5)
  • In this example, jdk-6u25-linux-i586.bin is downloaded to /usr and then marked executable:

~# chmod +x jdk-6u25-linux-i586.bin

  • Then run the installer:

~# ./jdk-6u25-linux-i586.bin

  • The installer runs automatically and creates a directory named jdk1.6.0_25 under /usr (the directory containing the installer). Next, use the alternatives command to make the Oracle (Sun) Java JDK replace OpenJDK:

~# alternatives --install /usr/bin/java java /usr/jdk1.6.0_25/bin/java 20000

~# alternatives --install /usr/bin/javac javac /usr/jdk1.6.0_25/bin/javac 20000
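
After registering the alternatives entries above, /usr/bin/java should resolve, through the alternatives symlink chain, to the Oracle JDK binary. A minimal sketch of that symlink resolution, demonstrated on a scratch chain of links rather than the real /usr/bin/java:

```shell
#!/bin/sh
set -e

# resolve_link FILE: follow a chain of symlinks until a non-link is reached,
# the way /usr/bin/java -> /etc/alternatives/java -> .../bin/java resolves.
resolve_link() {
    p=$1
    while [ -h "$p" ]; do
        p=$(readlink "$p")
    done
    printf '%s\n' "$p"
}

# Scratch chain standing in for the real alternatives symlinks.
d=$(mktemp -d)
echo 'stand-in for the JDK java binary' > "$d/java-real"
ln -s "$d/java-real" "$d/alternatives-java"   # like /etc/alternatives/java
ln -s "$d/alternatives-java" "$d/java"        # like /usr/bin/java

resolve_link "$d/java"
```

On a real CentOS host, alternatives --display java should show the same chain along with the registered priorities.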

Preliminaries (5/5)
  • Finally, confirm once more that the Java environment was installed successfully:

~# java -version

java version "1.6.0_25"

Java(TM) SE Runtime Environment (build 1.6.0_25-b06)

Java HotSpot(TM) Client VM (build 20.0-b11, mixed mode, sharing)

~# javac -version

javac 1.6.0_25

  • Next, install ssh and rsync, then restart the service as follows:

~# yum -y install openssh rsync

~# /etc/init.d/sshd restart

  • Note: to keep the Hadoop installation simple, this walkthrough performs everything directly as root.
Hadoop Installation and Configuration
  • Hadoop can be installed in three modes:
    • Local (Standalone) Mode
      • Single-machine environment
      • Suitable for debugging
    • Pseudo-Distributed Mode
      • Single-machine environment
      • Simulates a multi-node runtime environment
    • Fully-Distributed Mode
      • Cluster environment
      • Distributed computation
Local (Standalone) Mode (1/7)
  • First, download the Hadoop archive from the Apache Hadoop website (http://hadoop.apache.org/).
    • Although the latest release is Hadoop 0.21.0, that version is still unstable.
    • This example therefore uses Hadoop 0.20.2; use wget to download hadoop-0.20.2.tar.gz from one of the mirror sites:
  • Once the download finishes, extract the archive:

~# wget http://apache.cs.pu.edu.tw//hadoop/common/hadoop-0.20.2/hadoop-0.20.2.tar.gz

~# tar zxvf hadoop-0.20.2.tar.gz

Local (Standalone) Mode (2/7)
  • Move the extracted hadoop-0.20.2 directory to /opt and rename it hadoop:

~# mv hadoop-0.20.2 /opt/hadoop

  • Next, add Java to Hadoop's environment variables. Enter the hadoop directory and edit conf/hadoop-env.sh with vi:

~# cd /opt/hadoop/

/hadoop# vi conf/hadoop-env.sh

Local (Standalone) Mode (3/7)
  • After opening hadoop-env.sh, add the JAVA_HOME path (export JAVA_HOME=/usr/jdk1.6.0_25) at the location shown below.
  • In addition, to avoid errors caused by the IPv6 protocol, either disable IPv6 first or add export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true to hadoop-env.sh so that IPv4 is used preferentially:

# Command specific options appended to HADOOP_OPTS when specified

...

...

...

export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_JOBTRACKER_OPTS"

export JAVA_HOME=/usr/jdk1.6.0_25 ←add the JAVA_HOME path here

export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true ←prefer IPv4
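
Rather than editing hadoop-env.sh by hand each time, the two export lines can be appended with a small script. A sketch (file path and values as in this walkthrough; demonstrated here on a scratch file) that appends each line only if it is not already present, so re-running it is harmless:

```shell
#!/bin/sh
set -e

# append_once LINE FILE: append LINE to FILE unless an identical line already
# exists, so the setup script can be re-run without duplicating exports.
append_once() {
    grep -qxF "$1" "$2" || printf '%s\n' "$1" >> "$2"
}

# Scratch file standing in for /opt/hadoop/conf/hadoop-env.sh.
env_sh=$(mktemp)
echo '# Command specific options appended to HADOOP_OPTS when specified' > "$env_sh"

append_once 'export JAVA_HOME=/usr/jdk1.6.0_25' "$env_sh"
append_once 'export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true' "$env_sh"
append_once 'export JAVA_HOME=/usr/jdk1.6.0_25' "$env_sh"   # repeat is a no-op
```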

Local (Standalone) Mode (4/7)
  • This completes the Hadoop Local (Standalone) Mode installation. Test that Hadoop works with the command below:
  • If you see the usage message shown, the installation succeeded; if an error appears instead, check that the JAVA_HOME path in conf/hadoop-env.sh is correct.

/hadoop# bin/hadoop

Usage: hadoop [--config confdir] COMMAND

where COMMAND is one of:

namenode -format format the DFS filesystem

...

...

...

or

CLASSNAME run the class named CLASSNAME

Most commands print help when invoked w/o parameters.

Local (Standalone) Mode (5/7)
  • You can now run the example program hadoop-0.20.2-examples.jar shipped inside the Hadoop archive, using its grep function to count, for each word in the input files that matches a given pattern, how many times it appears.
  • First create a directory named input with the commands below, and copy all the xml files from conf/ into it:

/hadoop# mkdir input

/hadoop# cp conf/*.xml input

Local (Standalone) Mode (6/7)
  • Now run the example hadoop-0.20.2-examples.jar, using its grep function to pick out every word in the input files that begins with "config":

/hadoop# bin/hadoop jar hadoop-0.20.2-examples.jar grep input output 'config[a-z.]+'

  • This example should finish quickly; view the result with:

/hadoop# cat output/*

13 configuration

4 configuration.xsl

1 configure
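
What the grep example computes can be imitated locally with ordinary shell tools: extract every match of config[a-z.]+ from the input files, then count occurrences of each distinct match. A rough sketch on a toy input file; the counts it prints are for the toy file, not the real conf/*.xml:

```shell
#!/bin/sh
set -e

# Toy stand-in for the conf/*.xml input files.
input_dir=$(mktemp -d)
cat > "$input_dir/a.xml" <<'EOF'
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
</configuration>
EOF

# Same shape as the MapReduce grep example: find every match of the pattern,
# count occurrences per distinct match, sort by count descending.
grep -ohE 'config[a-z.]+' "$input_dir"/*.xml | sort | uniq -c | sort -rn
```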

Local (Standalone) Mode (7/7)
  • After running the hadoop-0.20.2-examples.jar grep example, the output directory must be cleaned up; otherwise running the example again fails because the directory already exists. Delete the output directory:

/hadoop# rm -rf output

Pseudo-Distributed Mode (1/9)
  • Pseudo-Distributed Mode, like Local (Standalone) Mode, runs on a single machine.
  • Continuing from the steps above, edit core-site.xml, hdfs-site.xml, and mapred-site.xml in the conf directory.
  • First edit core-site.xml:

/hadoop# vi conf/core-site.xml

Pseudo-Distributed Mode (2/9)
  • Insert the following between <configuration> and </configuration> in core-site.xml:

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>

<name>fs.default.name</name>

<value>hdfs://localhost:9000</value>

</property>

</configuration>

Pseudo-Distributed Mode (3/9)
  • Next edit hdfs-site.xml:

/hadoop# vi conf/hdfs-site.xml

  • Insert the following between <configuration> and </configuration> in hdfs-site.xml:

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>

<name>dfs.replication</name>

<value>1</value>

</property>

</configuration>

Pseudo-Distributed Mode (4/9)
  • Finally edit mapred-site.xml:

/hadoop# vi conf/mapred-site.xml

  • Insert the following between <configuration> and </configuration> in mapred-site.xml:

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>

<name>mapred.job.tracker</name>

<value>localhost:9001</value>

</property>

</configuration>
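
All three site files receive the same kind of edit, so the insertion can be scripted instead of done in vi three times. A hedged sketch that splices a <property> block in front of the closing </configuration> tag with sed, demonstrated on a scratch copy standing in for conf/mapred-site.xml:

```shell
#!/bin/sh
set -e

# add_property FILE NAME VALUE: insert a <property> block just before the
# closing </configuration> tag. NAME and VALUE must not contain '|' or '&'.
add_property() {
    file=$1 name=$2 value=$3
    sed "s|</configuration>|<property>\\
<name>$name</name>\\
<value>$value</value>\\
</property>\\
</configuration>|" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Scratch copy standing in for conf/mapred-site.xml.
site=$(mktemp)
printf '%s\n' '<?xml version="1.0"?>' '<configuration>' '</configuration>' > "$site"

add_property "$site" mapred.job.tracker localhost:9001
cat "$site"
```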

Pseudo-Distributed Mode (5/9)
  • Set up passwordless SSH login:
    • The hosts in a Hadoop system communicate with each other over ssh, so ssh must be configured to log in without a password.
    • First check whether logging in to the local machine requires a password (the first login asks whether to continue connecting; type yes and press Enter to proceed):

~# ssh localhost

The authenticity of host 'localhost (127.0.0.1)' can't be established.

RSA key fingerprint is <your RSA key fingerprint>

Are you sure you want to continue connecting (yes/no)? yes ←type yes

Warning: Permanently added 'localhost' (RSA) to the list of known hosts.

root@localhost's password: ←prompts for a password

Pseudo-Distributed Mode (6/9)
  • If you are prompted for a password, press Ctrl+C to abort the password prompt. Then set up passwordless login with the following commands:

~# ssh-keygen -t rsa -f ~/.ssh/id_rsa -P ""

~# cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys

  • Afterwards, log in to the local machine again to confirm that no password is needed; type exit to log out:

~# ssh localhost

Last login: Mon May 16 10:04:39 2011 from localhost

~# exit
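
One caveat: the cp above overwrites authorized_keys, discarding any keys that were already authorized. A slightly safer sketch appends the public key only if it is missing and sets the permissions sshd expects, demonstrated here on a scratch directory standing in for ~/.ssh (the key line is fake):

```shell
#!/bin/sh
set -e

# Scratch directory standing in for ~/.ssh; on a real host use "$HOME/.ssh".
ssh_dir=$(mktemp -d)
echo 'ssh-rsa AAAAexamplekey root@Host01' > "$ssh_dir/id_rsa.pub"   # fake demo key

# Append the public key only if authorized_keys does not already contain it.
touch "$ssh_dir/authorized_keys"
grep -qxF "$(cat "$ssh_dir/id_rsa.pub")" "$ssh_dir/authorized_keys" \
    || cat "$ssh_dir/id_rsa.pub" >> "$ssh_dir/authorized_keys"

# sshd ignores keys with loose permissions: 700 on the dir, 600 on the file.
chmod 700 "$ssh_dir"
chmod 600 "$ssh_dir/authorized_keys"
```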

Pseudo-Distributed Mode (7/9)
  • Start Hadoop: with the steps above, the Hadoop environment setup is essentially complete. Next, run bin/hadoop namenode -format to format HDFS:

/hadoop# bin/hadoop namenode -format

11/05/16 10:20:27 INFO namenode.NameNode: STARTUP_MSG:

/************************************************************

STARTUP_MSG: Starting NameNode

...

...

...

11/05/16 10:20:28 INFO namenode.NameNode: SHUTDOWN_MSG:

/************************************************************

SHUTDOWN_MSG: Shutting down NameNode at localhost/127.0.0.1

************************************************************/

Pseudo-Distributed Mode (8/9)
  • Then simply run bin/start-all.sh to start the NameNode, Secondary NameNode, DataNode, JobTracker, and TaskTracker:

/hadoop# bin/start-all.sh

starting namenode, logging to /opt/hadoop/bin/../logs/hadoop-root-namenode-Host01.out

localhost: starting datanode, logging to /opt/hadoop/bin/../logs/hadoop-root-datanode-Host01.out

localhost: starting secondarynamenode, logging to /opt/hadoop/bin/../logs/hadoop-root-secondarynamenode-Host01.out

starting jobtracker, logging to /opt/hadoop/bin/../logs/hadoop-root-jobtracker-Host01.out

localhost: starting tasktracker, logging to /opt/hadoop/bin/../logs/hadoop-root-tasktracker-Host01.out

Pseudo-Distributed Mode (9/9)
  • Test example:
    • As before, run the hadoop-0.20.2-examples.jar grep example to check that everything works.
    • First use bin/hadoop fs -put to place all the files in the conf directory under the input directory on HDFS, then run the grep example:

/hadoop# bin/hadoop fs -put conf input

/hadoop# bin/hadoop jar hadoop-0.20.2-examples.jar grep input output 'config[a-z.]+'

Fully-Distributed Mode (1/14)
  • Before installing a Hadoop cluster, the cluster of machines itself must be set up; how to do that is not covered in this section.
  • The cluster environment used in this installation example is as follows:
    • Every machine, whether it acts as Master or Slave, must have Java and ssh installed first
Fully-Distributed Mode (2/14)
  • If you previously went through the single-machine installation, clear out the old setup first. Stop Hadoop with stop-all.sh, then delete the Hadoop and .ssh directories:

/hadoop# /opt/hadoop/bin/stop-all.sh

~# rm -rf /opt/hadoop

~# rm -rf ~/.ssh

~# rm -rf /tmp/*

  • Download Hadoop and install it on Host01: the steps mirror the single-machine installation, and all of the following is done on Host01. First download Hadoop 0.20.2 from the official site and extract it, then move the Hadoop directory to /opt/hadoop:

~# wget http://apache.cs.pu.edu.tw//hadoop/common/hadoop-0.20.2/hadoop-0.20.2.tar.gz

~# tar zxvf hadoop-0.20.2.tar.gz

~# mv hadoop-0.20.2 /opt/hadoop

Fully-Distributed Mode (3/14)
  • First configure conf/hadoop-env.sh. Enter the Hadoop directory /opt/hadoop and edit conf/hadoop-env.sh with vi:

~# cd /opt/hadoop/

/hadoop# vi conf/hadoop-env.sh

  • Open hadoop-env.sh and add the JAVA_HOME path at the location shown:

# Command specific options appended to HADOOP_OPTS when specified

export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_JOBTRACKER_OPTS"

export JAVA_HOME=/usr/jdk1.6.0_25 ←add the JAVA_HOME path here

Fully-Distributed Mode (4/14)
  • Next configure conf/core-site.xml; open it with vi:

/hadoop# vi conf/core-site.xml

  • Insert the following between <configuration> and </configuration> in conf/core-site.xml:
Fully-Distributed Mode (5/14)

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>

<name>fs.default.name</name>

<value>hdfs://Host01:9000</value>

</property>

<property>

<name>hadoop.tmp.dir</name>

<value>/var/hadoop/hadoop-${user.name}</value>

</property>

</configuration>

Fully-Distributed Mode (6/14)
  • Next configure conf/hdfs-site.xml; open it with vi:

/hadoop# vi conf/hdfs-site.xml

  • Insert the following between <configuration> and </configuration> in conf/hdfs-site.xml:

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>

<name>dfs.replication</name>

<value>2</value>

</property>

</configuration>

Fully-Distributed Mode (7/14)
  • Next configure conf/mapred-site.xml; open it with vi:

/hadoop# vi conf/mapred-site.xml

  • Insert the following between <configuration> and </configuration> in conf/mapred-site.xml:

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>

<name>mapred.job.tracker</name>

<value>Host01:9001</value>

</property>

</configuration>

Fully-Distributed Mode (8/14)
  • Next configure conf/masters; open it with vi:
    • Simply empty out this file

/hadoop# vi conf/masters

  • Then configure conf/slaves; open it with vi:
    • Delete the localhost entry in conf/slaves and replace it with Host02

/hadoop# vi conf/slaves

Fully-Distributed Mode (9/14)
  • Set up passwordless login between the two hosts:
    • Same as in the single-machine setup, with one extra step: use scp to copy the public key to the other host:

~# ssh-keygen -t rsa -f ~/.ssh/id_rsa -P ""

~# cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys

~# scp -r ~/.ssh Host02:~/

  • Then test that logins no longer require a password:

~# ssh Host02 ←log in to Host02 from Host01

~# ssh Host01 ←log in to Host01 from Host02

~# exit ←leave Host01

~# exit ←leave Host02 (back on the original Host01)

Fully-Distributed Mode (10/14)
  • Copy the Hadoop directory to the other hosts:
    • The fully configured Hadoop directory must be copied to every Slave; alternatively, a shared file system (such as NFS) can serve one directory to all hosts.
    • Copy the Hadoop directory from Host01 to Host02 as follows:

~# scp -r /opt/hadoop Host02:/opt/
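
With more than one Slave, the scp step repeats once per host. A sketch that reads the hostnames from conf/slaves and builds one copy command per entry; it only echoes the commands (remove the echo to actually copy), and the hostnames are this example's:

```shell
#!/bin/sh
set -e

# Scratch slave list standing in for /opt/hadoop/conf/slaves.
slaves=$(mktemp)
printf '%s\n' Host02 Host03 > "$slaves"

# Print one scp per Slave; drop the echo to actually perform the copies.
copy_to_slaves() {
    while IFS= read -r host; do
        [ -n "$host" ] || continue
        echo scp -r /opt/hadoop "$host:/opt/"
    done < "$1"
}

copy_to_slaves "$slaves"
```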

  • Start Hadoop: as in the single-machine installation, first format HDFS:

/hadoop# bin/hadoop namenode -format

Fully-Distributed Mode (11/14)
  • You should then see messages like the following:

11/05/16 21:52:13 INFO namenode.NameNode: STARTUP_MSG:

/************************************************************

STARTUP_MSG: Starting NameNode

STARTUP_MSG: host = Host01/127.0.0.1

STARTUP_MSG: args = [-format]

STARTUP_MSG: version = 0.20.2

STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010

************************************************************/

11/05/16 21:52:13 INFO namenode.FSNamesystem: fsOwner=root,root,bin,daemon,sys,adm,disk,wheel

...

...

11/05/16 21:52:13 INFO namenode.NameNode: SHUTDOWN_MSG:

/************************************************************

SHUTDOWN_MSG: Shutting down NameNode at Host01/127.0.0.1

************************************************************/

Fully-Distributed Mode (12/14)
  • Then start Hadoop:

/hadoop# bin/start-all.sh

starting namenode, logging to /opt/hadoop/bin/../logs/hadoop-root-namenode-Host01.out

Host02: starting datanode, logging to /opt/hadoop/bin/../logs/hadoop-root-datanode-Host02.out

starting jobtracker, logging to /opt/hadoop/bin/../logs/hadoop-root-jobtracker-Host01.out

Host02: starting tasktracker, logging to /opt/hadoop/bin/../logs/hadoop-root-tasktracker-Host02.out

Fully-Distributed Mode (13/14)
  • This completes the Fully-Distributed Mode setup. You can now check the state of HDFS with bin/hadoop dfsadmin -report.
    • The report shows the overall HDFS information first, followed by details for each DataNode:

/hadoop# bin/hadoop dfsadmin -report

Configured Capacity: 9231007744 (8.6 GB)

...

...

Blocks with corrupt replicas: 0

Missing blocks: 0

-------------------------------------------------

Datanodes available: 1 (1 total, 0 dead)

...

...

DFS Remaining%: 41.88%

Last contact: Mon May 16 22:15:03 CST 2011
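
Because the report is plain text, individual fields can be pulled out with awk for scripted monitoring. A sketch that extracts the DFS Remaining% value and the live DataNode count, run here against a captured sample since it assumes the 0.20.2 report layout shown above:

```shell
#!/bin/sh
set -e

# Captured sample lines in the 0.20.2 dfsadmin -report layout.
report=$(mktemp)
cat > "$report" <<'EOF'
Configured Capacity: 9231007744 (8.6 GB)
Datanodes available: 1 (1 total, 0 dead)
DFS Remaining%: 41.88%
EOF

# Pull out two fields useful for monitoring scripts.
remaining=$(awk -F': ' '/^DFS Remaining%/ {print $2}' "$report")
live=$(awk -F'[:(]' '/^Datanodes available/ {print $2 + 0}' "$report")
echo "live datanodes: $live, DFS remaining: $remaining"
```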

Fully-Distributed Mode (14/14)
  • Now test the hadoop-0.20.2-examples.jar grep example. First create a directory input on HDFS, upload everything from the Hadoop conf/ directory into it, and then run the grep example:

/hadoop# bin/hadoop fs -mkdir input

/hadoop# bin/hadoop fs -put conf/* input/

/hadoop# bin/hadoop jar hadoop-0.20.2-examples.jar grep input output 'config[a-z.]+'

  • View the result:

/hadoop# bin/hadoop fs -cat output/part-00000

19 configuration

6 configuration.xsl

1 configure

HBase Cluster Installation (1/9)
  • Installing HBase has three requirements:
    • A Hadoop cluster must already be installed and started
    • Cluster installations of HBase 0.20 and later must be paired with ZooKeeper
    • Use NTP to synchronize the clocks of all hosts in the cluster
HBase Cluster Installation (2/9)
  • Download HBase:
    • First download the HBase archive from the HBase website (http://hbase.apache.org/)
    • The latest downloadable release is hbase-0.90.2.tar.gz; extract it, move it into /opt/ and rename it hbase, then enter the HBase directory:

~# wget http://apache.cs.pu.edu.tw//hbase/hbase-0.90.2/hbase-0.90.2.tar.gz

~# tar zxvf hbase-0.90.2.tar.gz

~# mv hbase-0.90.2 /opt/hbase

~# cd /opt/hbase/

  • Edit conf/hbase-env.sh with vi:

/hbase# vi conf/hbase-env.sh

HBase Cluster Installation (3/9)
  • Add the following lines to conf/hbase-env.sh:

export JAVA_HOME=/usr/jdk1.6.0_25/

export HBASE_MANAGES_ZK=true

export HBASE_LOG_DIR=/tmp/hadoop/hbase-logs

export HBASE_PID_DIR=/tmp/hadoop/hbase-pids

  • Edit conf/hbase-site.xml to set some HBase parameters:

/hbase# vi conf/hbase-site.xml

HBase Cluster Installation (4/9)
  • Add the following to conf/hbase-site.xml:

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!--

-->

<configuration>

<property>

<name>hbase.rootdir</name>

<value>hdfs://Host01:9000/hbase</value>

</property>

<property>

<name>hbase.cluster.distributed</name>

<value>true</value>

</property>

<property>

<name>hbase.zookeeper.property.clientPort</name>

<value>2222</value>

</property>

<property>

<name>hbase.zookeeper.quorum</name>

<value>Host01,Host02</value>

</property>

<property>

<name>hbase.zookeeper.property.dataDir</name>

<value>/tmp/hadoop/hbase-data</value>

</property>

HBase Cluster Installation (5/9)

<property>

<name>hbase.tmp.dir</name>

<value>/var/hadoop/hbase-${user.name}</value>

</property>

<property>

<name>hbase.master</name>

<value>Host01:60000</value>

</property>

</configuration>

HBase Cluster Installation (6/9)
  • Next edit conf/regionservers with vi:

/hbase# vi conf/regionservers

  • This example has only one Slave, so conf/regionservers contains just one line, that Slave's hostname:

Host02

  • Then copy Hadoop's configuration files into HBase's conf/ directory:

/hbase# cp /opt/hadoop/conf/core-site.xml conf/

/hbase# cp /opt/hadoop/conf/mapred-site.xml conf/

/hbase# cp /opt/hadoop/conf/hdfs-site.xml conf/

HBase Cluster Installation (7/9)
  • Delete lib/hadoop-core-0.20-append-r1056497.jar from the hbase directory, and copy hadoop-0.20.2-core.jar from the hadoop directory into hbase's lib directory as its replacement:

/hbase# rm lib/hadoop-core-0.20-append-r1056497.jar

/hbase# cp /opt/hadoop/hadoop-0.20.2-core.jar ./lib/

  • Then copy the hbase directory to the other Slave nodes:

/hbase# scp -r /opt/hbase Host02:/opt/hbase

HBase Cluster Installation (8/9)
  • Start HBase:

/hbase# bin/start-hbase.sh

Host02: starting zookeeper, logging to /tmp/hadoop/hbase-logs/hbase-root-zookeeper-Host02.out

Host01: starting zookeeper, logging to /tmp/hadoop/hbase-logs/hbase-root-zookeeper-Host01.out

starting master, logging to /tmp/hadoop/hbase-logs/hbase-root-master-Host01.out

Host02: starting regionserver, logging to /tmp/hadoop/hbase-logs/hbase-root-regionserver-Host02.out

HBase Cluster Installation (9/9)
  • Run the hbase shell command to enter the HBase console, then type list. If it executes normally, HBase is installed correctly:

/hbase# bin/hbase shell

HBase Shell; enter 'help<RETURN>' for list of supported commands.

Type "exit<RETURN>" to leave the HBase Shell

Version 0.90.2, r1085860, Sun Mar 27 13:52:43 PDT 2011

hbase(main):001:0> list ←type list and press Enter

TABLE

0 row(s) in 0.3950 seconds

hbase(main):002:0>

Basic Hadoop Operations (2/7)
  • Other HDFS commands can be looked up with bin/hadoop fs:

/hadoop# bin/hadoop fs

Usage: java FsShell

[-ls <path>]

[-lsr <path>]

[-du <path>]

...

...

...

-files <comma separated list of files> specify comma separated files to be copied to the map reduce cluster

-libjars <comma separated list of jars> specify comma separated jar files to include in the classpath.

-archives <comma separated list of archives> specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is

bin/hadoop command [genericOptions] [commandOptions]

Basic Hadoop Operations (3/7)
  • MapReduce job commands:
    • In general, a MapReduce job in Hadoop must be packaged as a jar file before it can run in the Hadoop environment.
    • bin/hadoop jar [path to MapReduce job jar] [job main class] [job arguments]
    • Besides the grep function used earlier, the built-in hadoop-0.20.2-examples.jar offers many other functions, such as wordcount and pi, which you can list with:

/hadoop# bin/hadoop jar hadoop-0.20.2-examples.jar

Basic Hadoop Operations (4/7)
  • The other jar files in the Hadoop directory each serve a different purpose:
    • hadoop-0.20.2-core.jar contains the hadoop common, hdfs, and mapreduce classes
    • hadoop-0.20.2-test.jar contains tools for testing Hadoop
    • hadoop-0.20.2-ant.jar contains classes for use with Ant
Basic Hadoop Operations (5/7)
  • bin/hadoop job performs various job-related operations:
    • List all jobs with:

/hadoop# bin/hadoop job -list all

5 jobs submitted

States are:

Running : 1 Succeded : 2 Failed : 3 Prep : 4

JobId State StartTime UserName Priority SchedulingInfo

job_201105162211_0001 2 1305555169692 root NORMAL NA

job_201105162211_0002 2 1305555869142 root NORMAL NA

job_201105162211_0003 2 1305555912626 root NORMAL NA

job_201105162211_0004 2 1305633307809 root NORMAL NA

job_201105162211_0005 2 1305633347357 root NORMAL NA
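
The State column in this listing is numeric, keyed by the legend printed above it (1 Running, 2 Succeeded, 3 Failed, 4 Prep). A sketch that rewrites the codes into names with awk, run on sample lines in the same layout:

```shell
#!/bin/sh
set -e

# Sample lines in the bin/hadoop job -list all layout shown above.
jobs=$(mktemp)
cat > "$jobs" <<'EOF'
job_201105162211_0001 2 1305555169692 root NORMAL NA
job_201105162211_0002 1 1305555869142 root NORMAL NA
EOF

# Rewrite the numeric state in field 2 into its name from the legend.
name_states() {
    awk 'BEGIN { s[1]="RUNNING"; s[2]="SUCCEEDED"; s[3]="FAILED"; s[4]="PREP" }
         /^job_/ { $2 = s[$2] } { print }' "$1"
}

name_states "$jobs"
```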

Basic Hadoop Operations (6/7)
  • Check the status of a given job with:
    • bin/hadoop job -status [JobID]

/hadoop# bin/hadoop job -status job_201105162211_0001

  • View a job's history with:
    • bin/hadoop job -history [output directory]

/hadoop# bin/hadoop job -history /user/root/output

Hadoop job: job_201105162211_0007

=====================================

Job tracker host name: Host01

job tracker start time: Mon May 16 22:11:01 CST 2011

User: root

JobName: grep-sort

… (remainder omitted)

Basic Hadoop Operations (7/7)
  • Other job commands can be looked up with bin/hadoop job:

/hadoop# bin/hadoop job

Usage: JobClient <command> <args>

[-submit <job-file>]

[-status <job-id>]

[-counter <job-id> <group-name> <counter-name>]

[-kill <job-id>]

[-set-priority <job-id> <priority>]. Valid values for priorities are: VERY_HIGH HIGH NORMAL LOW VERY_LOW

...

...

...

The general command line syntax is

bin/hadoop command [genericOptions] [commandOptions]

Basic HBase Operations (1/10)
  • The following uses the example of building a table of student grades to introduce several common basic HBase commands.
  • First run bin/hbase shell to enter the HBase console:

/hbase# bin/hbase shell

HBase Shell; enter 'help<RETURN>' for list of supported commands.

Type "exit<RETURN>" to leave the HBase Shell

Version 0.90.2, r1085860, Sun Mar 27 13:52:43 PDT 2011

hbase(main):001:0>

Basic HBase Operations (2/10)
  • Create a table scores with the two columns studentid and course:
    • > create '[table name]', '[column name 1]', '[column name 2]', …

hbase(main):001:0> create 'scores', 'studentid', 'course'

0 row(s) in 1.8970 seconds

  • List all tables currently in HBase with the list command:

hbase(main):002:0> list

TABLE

scores

1 row(s) in 0.0170 seconds

Basic HBase Operations (3/10)
  • View a table's structure:
    • > describe '[table name]'

hbase(main):003:0> describe 'scores'

DESCRIPTION ENABLED

...

BLOCKCACHE => 'true'}]}

1 row(s) in 0.0260 seconds

  • Add a row named John to the scores table, with the value 1 in its studentid column:
    • > put '[table name]', '[row name]', '[column name]', '[value]'

hbase(main):004:0> put 'scores', 'John', 'studentid:', '1'

0 row(s) in 0.0600 seconds

Basic HBase Operations (4/10)
  • Add the column course:math to the John row with the value 80:

hbase(main):005:0> put 'scores', 'John', 'course:math', '80'

0 row(s) in 0.0100 seconds

  • Add the column course:history to the John row with the value 85:

hbase(main):006:0> put 'scores', 'John', 'course:history', '85'

0 row(s) in 0.0080 seconds

Basic HBase Operations (5/10)
  • In the same way, add another row Adam, with studentid 2, course:math 75, and course:history 90:

hbase(main):007:0> put 'scores', 'Adam', 'studentid:', '2'

0 row(s) in 0.0130 seconds

hbase(main):008:0> put 'scores', 'Adam', 'course:math', '75'

0 row(s) in 0.0100 seconds

hbase(main):009:0> put 'scores', 'Adam', 'course:history', '90'

0 row(s) in 0.0080 seconds
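
Entering one put per cell becomes tedious with more rows. A sketch that generates the equivalent HBase shell commands from a small tab-separated grade list; it only prints the commands (they could then be piped into bin/hbase shell), and the table and column names are the ones used in this example:

```shell
#!/bin/sh
set -e

# Tab-separated grades: name <TAB> studentid <TAB> math <TAB> history.
grades=$(mktemp)
printf 'John\t1\t80\t85\nAdam\t2\t75\t90\n' > "$grades"

# Emit one put per cell, mirroring the interactive commands above.
make_puts() {
    while IFS="$(printf '\t')" read -r name sid math hist; do
        echo "put 'scores', '$name', 'studentid:', '$sid'"
        echo "put 'scores', '$name', 'course:math', '$math'"
        echo "put 'scores', '$name', 'course:history', '$hist'"
    done < "$1"
}

make_puts "$grades"
```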

Basic HBase Operations (6/10)
  • Query all data in the scores table:
    • > scan '[table name]'

hbase(main):011:0> scan 'scores'

ROW COLUMN+CELL

Adam column=course:history, timestamp=1305704304053, value=90

Adam column=course:math, timestamp=1305704282591, value=75

Adam column=studentid:, timestamp=1305704186916, value=2

John column=course:history, timestamp=1305704046378, value=85

John column=course:math, timestamp=1305703949662, value=80

John column=studentid:, timestamp=1305703742527, value=1

2 row(s) in 0.0420 seconds

Basic HBase Operations (7/10)
  • Query John's data in the scores table:
    • > get '[table name]', '[row name]'

hbase(main):010:0> get 'scores', 'John'

COLUMN CELL

course:history timestamp=1305704046378, value=85

course:math timestamp=1305703949662, value=80

studentid: timestamp=1305703742527, value=1

3 row(s) in 0.0440 seconds

Basic HBase Operations (8/10)
  • Query all data in the course column family of the scores table:
    • > scan '[table name]', {COLUMNS => '[column family name]'}

hbase(main):011:0> scan 'scores', {COLUMNS => 'course:'}

ROW COLUMN+CELL

Adam column=course:history, timestamp=1305704304053, value=90

Adam column=course:math, timestamp=1305704282591, value=75

John column=course:history, timestamp=1305704046378, value=85

John column=course:math, timestamp=1305703949662, value=80

2 row(s) in 0.0250 seconds

Basic HBase Operations (9/10)
  • Query all data from several columns of the scores table at once:
    • > scan '[table name]', {COLUMNS => ['[column name 1]', '[column name 2]', …]}

hbase(main):012:0> scan 'scores', {COLUMNS => ['studentid','course:']}

ROW COLUMN+CELL

Adam column=course:history, timestamp=1305704304053, value=90

Adam column=course:math, timestamp=1305704282591, value=75

Adam column=studentid:, timestamp=1305704186916, value=2

John column=course:history, timestamp=1305704046378, value=85

John column=course:math, timestamp=1305703949662, value=80

John column=studentid:, timestamp=1305703742527, value=1

2 row(s) in 0.0290 seconds

Basic HBase Operations (10/10)
  • Delete a table:
    • Before a table can be deleted, it must first be disabled; it can then be removed with the drop command:

hbase(main):003:0> disable 'scores'

0 row(s) in 2.1510 seconds

hbase(main):004:0> drop 'scores'

0 row(s) in 1.7780 seconds

Web Interface (1/2)
  • Besides checking Hadoop's status with commands, Hadoop provides web monitoring interfaces for querying the status of the HDFS NameNode and the MapReduce JobTracker
    • On a machine with a local graphical interface, open a browser (e.g. Mozilla Firefox) and enter http://localhost:50070 in the address bar to see the NameNode status
    • Likewise, entering http://localhost:50030 shows the JobTracker status
Web Interface (2/2)
  • HBase also provides web interfaces for administrators to monitor:
    • Master status:
      • On the Master node, browse to http://localhost:60010/
    • Region Server status:
      • On a Slave node, browse to http://localhost:60030/
    • ZooKeeper status:
      • On the Master node, browse to http://localhost:60010/zk.jsp
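
These page checks can also be scripted as a quick health probe. A sketch that tries each interface with curl and reports up or down; the URLs are this chapter's defaults, and on a machine where the daemons are not running every probe simply reports down:

```shell
#!/bin/sh

# Default web UI endpoints from this chapter; adjust hosts for a real cluster.
urls="http://localhost:50070 http://localhost:50030 http://localhost:60010 http://localhost:60030"

# check_url URL: probe with curl (-s quiet, -f fail on HTTP errors, -m timeout).
check_url() {
    if curl -sf -m 2 -o /dev/null "$1" 2>/dev/null; then
        echo "$1 up"
    else
        echo "$1 down"
    fi
}

for u in $urls; do
    check_url "$u"
done
```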