Hadoop Cluster Kickstart

Preface

We work with Apache Hadoop release 1.0.4 from http://hadoop.apache.org/, which was the stable version in February 2013.
In our setup the secondarynamenode runs on a different machine than the namenode. Both the namenode and the secondarynamenode machines are also datanodes and tasktrackers. The jobtracker runs on the same machine as the namenode. For data storage we used a 16 TB RAID 6 disk array mounted under /dcache.

Configuration on all machines in the cluster

  • We have to add the user hadoop in the group hadoop on all machines in the cluster:

  groupadd -g 790 hadoop

  useradd --comment "Hadoop" --shell /bin/zsh -m -r -g 790 -G hadoop --home /usr/local/hadoop hadoop
  • In zshrc we have to set some variables:

  export HADOOP_INSTALL=/usr/local/hadoop/hadoop-1.0.4
  export HADOOP_CONF_DIR=$HADOOP_INSTALL/conf
  export PATH=$PATH:$HADOOP_INSTALL/bin
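
A quick way to confirm that the variables are picked up in a new shell is a small check like the one below. This is a minimal sketch and not part of the original setup: the function name check_hadoop_env is hypothetical, and it only tests that the two variables are set and that the configuration directory exists.

```shell
# Hypothetical helper: verify the Hadoop variables set in zshrc.
check_hadoop_env() {
  # Both variables must be non-empty.
  if [ -z "${HADOOP_INSTALL:-}" ] || [ -z "${HADOOP_CONF_DIR:-}" ]; then
    echo "HADOOP_INSTALL or HADOOP_CONF_DIR is not set" >&2
    return 1
  fi
  # The configuration directory must exist.
  if [ ! -d "$HADOOP_CONF_DIR" ]; then
    echo "missing directory: $HADOOP_CONF_DIR" >&2
    return 1
  fi
  echo "ok: $HADOOP_CONF_DIR"
}
```

Running check_hadoop_env as the hadoop user after login should print the configuration directory; a non-zero return status means the variables are not in effect.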

Configuration of Hadoop framework

conf/hadoop-env.sh

The following lines have to be added to hadoop-env.sh:

  • Setting JAVA_HOME

  export JAVA_HOME=/etc/alternatives/jre_oracle
  • Setting cluster members

  export HADOOP_SLAVES=$HADOOP_HOME/conf/slaves
  • Setting the host and path from which the Hadoop installation is rsync'd

  export HADOOP_MASTER=ssu03:/usr/local/hadoop/hadoop-1.0.4
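
HADOOP_SLAVES points at a plain text file with one hostname per line. As a sketch, and assuming the cluster consists only of the two machines named on this page (ssu03 as namenode, ssu04 as secondarynamenode, both also datanodes), the host lists could be written as follows. The function name write_host_lists is hypothetical; in Hadoop 1.x the conf/masters file names the host on which the secondarynamenode is started.

```shell
# Hypothetical helper: write conf/slaves and conf/masters into a directory.
# Assumption: the cluster has only the two hosts mentioned on this page.
write_host_lists() {
  conf_dir=$1
  # Datanodes and tasktrackers, one hostname per line.
  printf '%s\n' ssu03 ssu04 > "$conf_dir/slaves"
  # Host that runs the secondarynamenode.
  printf '%s\n' ssu04 > "$conf_dir/masters"
}
```

It would be called as write_host_lists "$HADOOP_CONF_DIR"; a larger cluster needs one additional line in slaves per worker machine.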

conf/hdfs-site.xml

We set the following properties:

        <property>
                <name>hadoop.tmp.dir</name>
                <value>/dcache/hadoop/tmp</value>
        </property>
        <property>
                <name>dfs.data.dir</name>
                <value>/dcache/hadoop/hdfs/data</value>
        </property>
        <property>
                <name>dfs.name.dir</name>
                <value>/dcache/hadoop/hdfs/name</value>
        </property>
        <property>
                <name>fs.default.name</name>
                <value>hdfs://ssu03</value>
        </property>
        <property>
                <name>dfs.hosts</name>
                <value>$(HADOOP_CONF_DIR)/slaves</value>
        </property>
        <property>
                <name>dfs.replication</name>
                <value>3</value>
                <description>Default block replication</description>
        </property>
        <property>
                <name>dfs.secondary.http.address</name>
                <value>ssu04:50090</value>
        </property>
        <property>
                <name>fs.checkpoint.dir</name>
                <value>ssu04:/dcache/hadoop/secondary</value>
        </property>
        <property>
                <name>dfs.http.address</name>
                <value>ssu03:50090</value>
        </property>
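
The storage paths named in these properties must be writable by the hadoop user. One way to make sure of this is to create them up front. The following is a minimal sketch, assuming all four directories are created on every machine (strictly, dfs.name.dir is only used on the namenode and the checkpoint directory only on the secondarynamenode); the function name make_hadoop_dirs and its base-directory argument are hypothetical.

```shell
# Hypothetical helper: create the storage directories used in hdfs-site.xml.
# $1 is the base directory (/dcache/hadoop in this setup).
make_hadoop_dirs() {
  base=$1
  for d in tmp hdfs/data hdfs/name secondary; do
    # Stop at the first directory that cannot be created.
    mkdir -p "$base/$d" || return 1
  done
}
```

Run as root, make_hadoop_dirs /dcache/hadoop followed by chown -R hadoop:hadoop /dcache/hadoop would hand the whole tree to the hadoop user.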


Hadoop Cluster Kickstart (last edited 2013-02-12 11:56:04 by AndreasKnoepke)