Comprehensive Guide to Setting Up Hadoop on Multiple Operating Systems
This guide outlines the prerequisites, setup, and execution of Hadoop on several systems, including Mac OS, Linux, and Windows via Cygwin. It emphasizes the need for a stable environment (only Ubuntu is supported by the TA) and provides detailed instructions for both single-node and cluster setups. Users will learn to configure essential properties, generate SSH keys, format the distributed filesystem, and start the Hadoop daemons. It also covers the layout of the configuration files and links to further resources.
Presentation Transcript
Prerequisites
• System: Mac OS / Linux / Cygwin on Windows
• Notice:
  1. Only Ubuntu will be supported by the TA. You may try other environments as a challenge.
  2. Cygwin on Windows is not recommended because of its instability and unforeseen bugs.
• Java Runtime Environment: Java 1.6.x recommended.
• ssh must be installed and sshd must be running in order to use the Hadoop scripts that manage remote Hadoop daemons.
Hadoop Setup
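The prerequisites above can be verified before unpacking anything. The script below is a minimal sketch (not part of Hadoop): it only checks that the `java`, `ssh`, and `sshd` commands are on the PATH; it does not check the Java version or whether sshd is actually running.

```shell
#!/bin/sh
# Minimal prerequisite check (a sketch, not part of Hadoop itself).
# Hadoop's cluster scripts log in to each node over ssh, and
# Hadoop 1.0.x expects Java 1.6 or later.
missing=""
command -v java >/dev/null 2>&1 || missing="$missing java"
command -v ssh  >/dev/null 2>&1 || missing="$missing ssh"
command -v sshd >/dev/null 2>&1 || missing="$missing sshd"

status="all prerequisites found"
if [ -n "$missing" ]; then
    status="missing:$missing"
fi
echo "$status"
```

On a correctly prepared Ubuntu machine this prints "all prerequisites found"; otherwise it names the missing commands.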
Single Node Setup (usually for debugging)
• Untar hadoop-*.**.*.tar.gz to your user path.
• About the version: the latest stable version, 1.0.1, is recommended.
• Edit the file conf/hadoop-env.sh to define at least JAVA_HOME as the root of your Java installation.
• Edit the following files to configure properties:

conf/core-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

conf/hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

conf/mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
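The JAVA_HOME edit above amounts to one line in conf/hadoop-env.sh. The path below is an assumed example; substitute the root of your own Java installation (the directory containing bin/java).

```shell
# conf/hadoop-env.sh (fragment) -- a sketch.
# The path below is an assumed example; point JAVA_HOME at the root of
# *your* Java installation, i.e. the directory that contains bin/java.
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk
```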
Cluster Setup (the only acceptable setup for the homework)
• Same steps as the single-node setup.
• Set the dfs.name.dir and dfs.data.dir properties in hdfs-site.xml.
• Add the master's node name to conf/masters.
• Add all the slaves' node names to conf/slaves.
• Edit /etc/hosts on each node: add an IP and node-name entry for every node.
  Suppose your master's node name is ubuntu1 and its IP is 192.168.0.2; then add the line "192.168.0.2 ubuntu1" to the file.
• Copy the Hadoop folder to the same path on all nodes.
• Notice: JAVA_HOME may not be the same on every node.
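For the dfs.name.dir and dfs.data.dir step, a fragment along these lines goes inside the &lt;configuration&gt; element of conf/hdfs-site.xml. The directory paths are example assumptions; pick locations that exist and are writable by the Hadoop user on every node.

```xml
<!-- conf/hdfs-site.xml (fragment, a sketch): the paths below are example
     assumptions. dfs.name.dir holds the namenode's metadata and
     dfs.data.dir holds the datanodes' block storage. -->
<property>
  <name>dfs.name.dir</name>
  <value>/home/hadoop/dfs/name</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/home/hadoop/dfs/data</value>
</property>
```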
Execution
• Generate an ssh key pair with an empty passphrase, so no prompt appears at startup:
  $ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
  $ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
  $ ssh localhost
• Format a new distributed filesystem:
  $ bin/hadoop namenode -format
• Start the Hadoop daemons:
  $ bin/start-all.sh
• The Hadoop daemon log output is written to the ${HADOOP_LOG_DIR} directory (defaults to ${HADOOP_HOME}/logs).
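The log-directory default mentioned above can be sketched as plain shell parameter expansion; the HADOOP_HOME path below is an assumed example, not a required location.

```shell
# How the log location is resolved, sketched: HADOOP_LOG_DIR falls back to
# ${HADOOP_HOME}/logs when it is not set. HADOOP_HOME here is an assumed
# example path.
HADOOP_HOME=/home/hadoop/hadoop-1.0.1
unset HADOOP_LOG_DIR
HADOOP_LOG_DIR="${HADOOP_LOG_DIR:-$HADOOP_HOME/logs}"
echo "$HADOOP_LOG_DIR"    # prints /home/hadoop/hadoop-1.0.1/logs
```

Checking this directory is the first step when a daemon fails to come up.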
Execution (continued)
• Copy the input files into the distributed filesystem:
  $ bin/hadoop fs -put conf input
• Run some of the examples provided:
  $ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
• Examine the output files on the distributed filesystem:
  $ bin/hadoop fs -cat output/*
• When you're done, stop the daemons with:
  $ bin/stop-all.sh
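To see what the regex in the grep example matches, here is a sketch using plain GNU grep on made-up sample data (the real job applies the same pattern across the HDFS input and counts the matches):

```shell
# Sketch of the grep example's pattern on made-up sample lines.
# 'dfs[a-z.]+' matches "dfs" followed by lowercase letters and dots,
# so it picks out dfs.* property names. Requires GNU grep (-o).
printf 'dfs.replication=1\nmapred.job.tracker=localhost:9001\ndfs.name.dir=/tmp/name\n' > sample.txt
matches=$(grep -oE 'dfs[a-z.]+' sample.txt)
echo "$matches"
rm -f sample.txt
```

Only the two dfs.* keys match; mapred.job.tracker is skipped because it does not start with "dfs".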
Details About Configuration Files
• Hadoop configuration is driven by two types of configuration files:
• Read-only default configuration:
  src/core/core-default.xml
  src/hdfs/hdfs-default.xml
  src/mapred/mapred-default.xml
  conf/mapred-queues.xml.template
• Site-specific configuration:
  conf/core-site.xml
  conf/hdfs-site.xml
  conf/mapred-site.xml
  conf/mapred-queues.xml
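Values in the site-specific files override the read-only defaults. As a sketch, a site property can also be marked final so that job submissions cannot override it; the value shown is the one used earlier in this guide.

```xml
<!-- conf/core-site.xml (fragment, a sketch): this site-specific value
     overrides the entry in core-default.xml, and <final> prevents
     individual jobs from overriding it again. -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
  <final>true</final>
</property>
```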
Details About Configuration Files (continued)
[Slides showing property tables for conf/core-site.xml, conf/hdfs-site.xml, and conf/mapred-site.xml]
You may get detailed information from:
• The official site: http://hadoop.apache.org
• Course slides & textbooks: http://www.cs.sjtu.edu.cn/~liwujun/course/mmds.html
• Michael G. Noll's blog (a good guide): http://www.michael-noll.com/
If you have good materials to share, please send them to the TA.