
Hadoop Demo

Hadoop Demo. Presented by: Imranul Hoque. Topics: Hadoop running modes (standalone, pseudo-distributed, cluster), running MapReduce jobs, status/logs, sample MapReduce code. Required software: Hadoop (release 0.18.3)


Presentation Transcript


  1. Hadoop Demo • Presented by: Imranul Hoque

  2. Topics • Hadoop running modes: standalone, pseudo-distributed, cluster • Running MapReduce jobs • Status/logs • Sample MapReduce code

  3. Required Software • Hadoop (release 0.18.3): http://apache.osuosl.org/hadoop/core/hadoop-0.18.3/hadoop-0.18.3.tar.gz • Java Development Kit (JDK 1.6.0_01): http://java.sun.com/javase/downloads/index.jsp • Ant (1.7.1): http://apache.inetbridge.net/ant/binaries/apache-ant-1.7.1-bin.tar.gz

  4. Setup • NameNode: sherpa01 • JobTracker: sherpa02 • DataNode/TaskTracker: sherpa05, sherpa06

  5. Assumptions • ssh must be installed and sshd must be running • Shared home directory (nfs) across all nodes in the cluster (makes life easier)

  6. Steps • Install JDK, Ant • Passphraseless ssh • Compiling Hadoop • Setting up config parameters • Starting up Hadoop • Running jobs • Job status

  7. Passphraseless ssh (Source to Destination) • On Source, generate a private-public key pair: ~/.ssh/id_dsa and ~/.ssh/id_dsa.pub • Send the public key to Destination • On Destination, add the public key to the authorized key list: ~/.ssh/authorized_keys

  8. Passphraseless ssh (2) • Nodes: sherpa01, sherpa02, sherpa05, sherpa06 (home directory shared over NFS) • ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa • cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys (four times) • Modify hostname in authorized_keys • Add “StrictHostKeyChecking no” in /etc/ssh/ssh_config to turn off the host-key prompt
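The key setup on this slide can be sketched as a small shell function. This is a minimal sketch, not the presenter's script: the function name `setup_passphraseless_ssh` and its two parameters are hypothetical. The key type defaults to `dsa` as on the slide, but recent OpenSSH releases disable or drop DSA keys, so `rsa` or `ed25519` may be needed on a current system.

```shell
# Hypothetical helper: create a passphraseless key pair and authorize it.
# $1 = key type (default: dsa, as on the slide), $2 = ssh directory.
setup_passphraseless_ssh() {
  key_type="${1:-dsa}"
  ssh_dir="${2:-$HOME/.ssh}"
  key_file="$ssh_dir/id_$key_type"

  mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" || return 1

  # Generate the pair only if it does not exist yet; -P '' means no passphrase.
  [ -f "$key_file" ] || ssh-keygen -q -t "$key_type" -P '' -f "$key_file" || return 1

  # Authorize the public key. With an NFS-shared home directory, this one
  # authorized_keys file is seen by every node in the cluster.
  cat "$key_file.pub" >> "$ssh_dir/authorized_keys"
  chmod 600 "$ssh_dir/authorized_keys"
}
```

Calling `setup_passphraseless_ssh` with no arguments mirrors the slide; afterwards `ssh sherpa05 hostname` should return without asking for a password.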

  9. Setting the PATH • JAVA_HOME=/usr/java/jdk1.6.0_01 • ANT_HOME=~/ant • PATH=/usr/java/jdk1.6.0_01/bin:$PATH • PATH=~/ant/bin:$PATH
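Plain assignments like these are visible to ant and the Hadoop scripts only once they are exported. A minimal sketch for a shell startup file such as ~/.bashrc (the choice of file is an assumption) could look like this, using the paths from the slide:

```shell
# Paths are the ones shown on the slide; adjust them to the local installs.
export JAVA_HOME=/usr/java/jdk1.6.0_01
export ANT_HOME="$HOME/ant"

# Prepend both bin directories so these versions win over system defaults.
export PATH="$ANT_HOME/bin:$JAVA_HOME/bin:$PATH"
```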

  10. Installing and Configuring Hadoop • Extract • Build (ant) • Modify conf/hadoop-env.sh: • export JAVA_HOME=/usr/java/jdk1.6.0_01 • Inform Hadoop of the Masters and Slaves • conf/masters • conf/slaves • Modify conf/hadoop-site.xml
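The three configuration files could be written as sketched below for the four-node setup in this deck. Several values are assumptions that do not appear on the slides: the ports 9000 and 9001, the replication factor of 2, and listing sherpa01 in conf/masters (in this Hadoop generation the start scripts use that file to place the secondary NameNode). The function name `write_hadoop_conf` is hypothetical, and it overwrites any existing files in the target directory.

```shell
# Hypothetical helper: write conf/masters, conf/slaves and conf/hadoop-site.xml.
# $1 = configuration directory (default: conf).
write_hadoop_conf() {
  conf_dir="${1:-conf}"
  mkdir -p "$conf_dir" || return 1

  # Assumption: run the secondary NameNode next to the NameNode.
  echo "sherpa01" > "$conf_dir/masters"

  # DataNode/TaskTracker hosts from the Setup slide, one per line.
  printf '%s\n' sherpa05 sherpa06 > "$conf_dir/slaves"

  cat > "$conf_dir/hadoop-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- NameNode address; port 9000 is an assumption -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://sherpa01:9000</value>
  </property>
  <!-- JobTracker address; port 9001 is an assumption -->
  <property>
    <name>mapred.job.tracker</name>
    <value>sherpa02:9001</value>
  </property>
  <!-- Two DataNodes, so at most two replicas; value is an assumption -->
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
EOF
}
```

Because the home directory is shared over NFS, running `write_hadoop_conf conf` once from the Hadoop directory makes the same configuration visible on all four nodes.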

  11. Rack Awareness <property> <name>topology.script.file.name</name> <value>conf/fakedns.sh</value> </property> • In fakedns.sh: • echo /rack_id
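Hadoop runs the topology script with one or more node addresses as arguments and reads one rack path per argument from its output. The one-line fakedns.sh above prints a single rack for every call; a slightly fuller sketch that answers once per argument might look like this. The rack names and the host-to-rack mapping are hypothetical, and depending on the setup Hadoop may pass IP addresses rather than hostnames.

```shell
# Hypothetical mapping: print one rack path per node argument.
resolve_racks() {
  for node in "$@"; do
    case "$node" in
      sherpa05*) echo "/rack1" ;;
      sherpa06*) echo "/rack2" ;;
      *)         echo "/default-rack" ;;   # fallback for unknown nodes
    esac
  done
}

# In conf/fakedns.sh the last line would be:  resolve_racks "$@"
```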

  12. Starting Hadoop • Format NameNode FS (sherpa01): • bin/hadoop namenode -format • From NameNode (sherpa01): • bin/start-dfs.sh • From JobTracker (sherpa02): • bin/start-mapred.sh
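Since passphraseless ssh is already in place, the two start commands can be driven from a single machine. The sketch below assumes Hadoop is unpacked at `$HOME/hadoop-0.18.3` on the shared home directory; that path, the function name `start_cluster`, and the `RUN` dry-run switch are all hypothetical. Formatting the NameNode is left out on purpose because it is a one-time step that wipes HDFS metadata.

```shell
# Assumed install path; identical on every node because home is NFS-shared.
HADOOP_HOME="${HADOOP_HOME:-$HOME/hadoop-0.18.3}"

# Set RUN=echo to print the commands instead of executing them.
RUN="${RUN:-}"

start_cluster() {
  # HDFS first (NameNode host), then MapReduce (JobTracker host).
  $RUN ssh sherpa01 "cd $HADOOP_HOME && bin/start-dfs.sh" || return 1
  $RUN ssh sherpa02 "cd $HADOOP_HOME && bin/start-mapred.sh"
}
```

Running `RUN=echo; start_cluster` shows the two ssh commands without touching the cluster.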

  13. Running MapReduce • Copy data to HDFS • bin/hadoop dfs -copyFromLocal ~/data gutenberg • Run MapReduce • bin/hadoop jar hadoop-0.18.3-examples.jar wordcount -r 6 gutenberg gutenberg-output • Some HDFS commands • copyToLocal, cat, cp, rm, du, ls, etc.
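The copy, run, and inspect steps can be chained as sketched below. The function name `run_wordcount` and the `HADOOP` variable are hypothetical; the commands themselves are the ones from the slide plus `dfs -ls` and `dfs -cat` from its list of HDFS commands. The sketch assumes it is run from the Hadoop install directory so that `bin/hadoop` and the examples jar resolve.

```shell
# Assumption: the current directory is the Hadoop install directory.
HADOOP="${HADOOP:-bin/hadoop}"

# $1 = local input dir, $2 = HDFS input dir, $3 = HDFS output dir,
# $4 = number of reduce tasks (default 6, as on the slide).
run_wordcount() {
  local_input="$1"; hdfs_input="$2"; hdfs_output="$3"; reducers="${4:-6}"

  $HADOOP dfs -copyFromLocal "$local_input" "$hdfs_input" || return 1
  $HADOOP jar hadoop-0.18.3-examples.jar wordcount -r "$reducers" \
    "$hdfs_input" "$hdfs_output" || return 1

  # Each reduce task writes one part-NNNNN file; list them and peek at the first.
  $HADOOP dfs -ls "$hdfs_output"
  $HADOOP dfs -cat "$hdfs_output/part-00000" | head -n 20
}
```

For example, `run_wordcount ~/data gutenberg gutenberg-output 6` repeats the slide's commands. The job refuses to start if the output directory already exists, so it has to be removed before a rerun.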

  14. Job/Node Status • NameNode: • http://sherpa01.cs.uiuc.edu:50001 • JobTracker: • http://sherpa02.cs.uiuc.edu:50002 • Also look at the logs: • logs/
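When a daemon fails to come up, the quickest check is usually to search the logs for error-level lines. A minimal sketch, assuming the daemons write log4j-style lines (timestamp, level, class, message) into `*.log` files under logs/; the function name `scan_hadoop_logs` is hypothetical:

```shell
# Print every ERROR or FATAL line, prefixed with the file it came from.
# $1 = log directory (default: logs).
scan_hadoop_logs() {
  log_dir="${1:-logs}"
  grep -H -E ' (ERROR|FATAL) ' "$log_dir"/*.log 2>/dev/null
}
```

With the NFS-shared home directory, the logs of all four nodes land in the same logs/ directory, so one call covers the whole cluster. The function returns a non-zero status when nothing matches.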

  15. WordCount.java • src/examples/org/apache/hadoop/examples/WordCount.java • Map function • Reduce function • Driver function
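As a rough analogy for what the three parts of WordCount compute, the same result can be produced on a small local file with a Unix pipeline: splitting into words plays the role of the map function, sorting stands in for the shuffle, and counting equal neighbours is the reduce. This is an illustrative sketch, not a translation of WordCount.java, and the function name `local_wordcount` is hypothetical. It is handy for spot-checking job output on a small sample.

```shell
# Read text on stdin, print "word<TAB>count" sorted by word.
local_wordcount() {
  tr -s '[:space:]' '\n' |      # "map": one whitespace-separated word per line
    grep -v '^$' |              # drop the empty line left by leading whitespace
    LC_ALL=C sort |             # "shuffle": bring equal words together
    uniq -c |                   # "reduce": count each run of equal words
    awk '{ print $2 "\t" $1 }'  # reorder to word, then count
}
```

For example, `local_wordcount < ~/data/some-book.txt | head` shows the first few counts, where the file name is a placeholder.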

  16. Shutdown • From NameNode (sherpa01): • bin/stop-dfs.sh • From JobTracker (sherpa02): • bin/stop-mapred.sh

  17. Conclusion • For more details: • http://hadoop.apache.org/core/ • http://wiki.apache.org/hadoop/
