== Hadoop Cluster Kickstart ==

=== Preface ===

We work with Apache Hadoop release 1.0.4 from http://hadoop.apache.org/, which is the stable release as of February 2013.
In our setup the ''secondarynamenode'' runs on a different machine than the ''namenode''. Both the namenode and the secondarynamenode also act as ''datanodes'' and ''tasktrackers''. The ''jobtracker'' runs on the same machine as the namenode. For data storage we use a 16 TB RAID 6 disk array mounted under /dcache.

=== Configuration on all machines in cluster ===

 * We have to add the user hadoop in the group hadoop on all machines in the cluster:
{{{
groupadd -g 790 hadoop
}}}
{{{
useradd --comment "Hadoop" --shell /bin/zsh -m -r -g 790 -G hadoop --home /usr/local/hadoop hadoop
}}}
 * In the hadoop user's zshrc we have to add some variables:
{{{
export HADOOP_INSTALL=/usr/local/hadoop/hadoop-1.0.4
export HADOOP_CONF_DIR=$HADOOP_INSTALL/conf
export PATH=$PATH:$HADOOP_INSTALL/bin
}}}
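As a quick sanity check (a minimal sketch; assumes a new shell has picked up the zshrc changes), the account and the environment can be verified on each machine:
{{{
# verify the hadoop account and its group membership
id hadoop
# verify that the Hadoop binaries are on the PATH; should report release 1.0.4
hadoop version
}}}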
=== Configuration of Hadoop framework ===

==== conf/hadoop-env.sh ====

The following lines have to be added to hadoop-env.sh:

 * Setting JAVA_HOME
{{{
export JAVA_HOME=/etc/alternatives/jre_oracle
}}}
 * Setting cluster members
{{{
export HADOOP_SLAVES=$HADOOP_HOME/conf/slaves
}}}
 * Setting the path from which the Hadoop configuration should be rsync'd
{{{
export HADOOP_MASTER=ssu03:/usr/local/hadoop/hadoop-1.0.4
}}}

==== conf/hdfs-site.xml ====

We set the following ''properties'' (shown here as name/value pairs):

{{{
hadoop.tmp.dir              /dcache/hadoop/tmp
dfs.data.dir                /dcache/hadoop/hdfs/data
dfs.name.dir                /dcache/hadoop/hdfs/name
fs.default.name             hdfs://ssu03
dfs.hosts                   $(HADOOP_CONF_DIR)/slaves
dfs.replication             3   (Default block replication)
dfs.secondary.http.address  ssu04:50090
fs.checkpoint.dir           ssu04:/dcache/hadoop/secondary
dfs.http.address            ssu03:50090
}}}
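In the configuration file itself, each of the name/value pairs above is entered in Hadoop's XML property format. As an illustration (a sketch with only one property shown; the other pairs follow the same pattern), the replication setting would be written as:
{{{
<?xml version="1.0"?>
<configuration>
  <!-- one property element per name/value pair from the list above -->
  <property>
    <name>dfs.replication</name>
    <value>3</value>
    <description>Default block replication</description>
  </property>
</configuration>
}}}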