Hadoop Cluster Kickstart

Preface

We work with Apache Hadoop release 1.0.4 from http://hadoop.apache.org/, which is the stable release as of February 2013.
In our setup the secondarynamenode runs on a different machine than the namenode. Both the namenode and the secondarynamenode are also datanodes and tasktrackers. The jobtracker runs on the same machine as the namenode. For data storage we use a 16 TB RAID 6 disk array mounted under /dcache.
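
A minimal sketch of fetching and unpacking the release (the tarball URL from the Apache archive and the unpack commands are assumptions; the target directory matches the HADOOP_INSTALL setting below):

  # assumed: fetch the 1.0.4 release tarball from the Apache archive
  cd /usr/local/hadoop
  wget http://archive.apache.org/dist/hadoop/core/hadoop-1.0.4/hadoop-1.0.4.tar.gz
  # unpack to /usr/local/hadoop/hadoop-1.0.4 and hand it over to the hadoop user
  # (user and group hadoop are created in the next section)
  tar xzf hadoop-1.0.4.tar.gz
  chown -R hadoop:hadoop hadoop-1.0.4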

Configuration on all machines in cluster

  • We have to add the user hadoop to the group hadoop on all machines in the cluster:

  groupadd -g 790 hadoop

  useradd --comment "Hadoop" --shell /bin/zsh -m -r -g 790 -G hadoop --home /usr/local/hadoop hadoop
  • In zshrc we have to add some environment variables (a short verification sketch follows this list):

  export HADOOP_INSTALL=/usr/local/hadoop/hadoop-1.0.4
  export HADOOP_CONF_DIR=$HADOOP_INSTALL/conf
  export PATH=$PATH:$HADOOP_INSTALL/bin
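
Once JAVA_HOME is set in hadoop-env.sh (next section), a quick sanity check that the variables are picked up (only a verification sketch, not part of the original setup):

  # re-read the shell config and confirm the hadoop binary is found on PATH
  source ~/.zshrc
  # should print "Hadoop 1.0.4"
  hadoop version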

Configuration of Hadoop framework

conf/hadoop-env.sh

The following lines have to be added to hadoop-env.sh:

  • Setting JAVA_HOME

  export JAVA_HOME=/etc/alternatives/jre_oracle
  • Setting cluster members

  export HADOOP_SLAVES=$HADOOP_HOME/conf/slaves
  • Setting the host:path from which the Hadoop code should be rsync'd

  export HADOOP_MASTER=ssu03:/usr/local/hadoop/hadoop-1.0.4

conf/core-site.xml

We manipulate the following properties (like all property snippets on this page, they go inside the <configuration> element of the file):

        <property>
                <name>fs.default.name</name>
                <value>hdfs://ssu03</value>
        </property>
        <property>
                <name>hadoop.tmp.dir</name>
        <value>/dcache/hadoop/tmp</value>
        </property>
        <property>
                <name>fs.inmemory.size.mb</name>
                <value>200</value>
        </property>
        <property>
                <name>io.sort.factor</name>
                <value>100</value>
        </property>
        <property>
                <name>io.sort.mb</name>
                <value>200</value>
        </property>

conf/hdfs-site.xml

We manipulate the following properties:

        <property>
                <name>hadoop.tmp.dir</name>
                <value>/dcache/hadoop/tmp</value>
        </property>
        <property>
                <name>dfs.data.dir</name>
                <value>/dcache/hadoop/hdfs/data</value>
        </property>
        <property>
                <name>dfs.name.dir</name>
                <value>/dcache/hadoop/hdfs/name</value>
        </property>
        <property>
                <name>fs.default.name</name>
                <value>hdfs://ssu03</value>
        </property>
        <property>
                <name>dfs.hosts</name>
                <value>${HADOOP_CONF_DIR}/slaves</value>
        </property>
        <property>
                <name>dfs.replication</name>
                <value>3</value>
                <description>Default block replication</description>
        </property>
        <property>
                <name>dfs.secondary.http.address</name>
                <value>ssu04:50090</value>
        </property>
        <property>
                <name>fs.checkpoint.dir</name>
                <value>ssu04:/dcache/hadoop/secondary</value>
        </property>
        <property>
                <name>dfs.http.address</name>
                <value>ssu03:50090</value>
        </property>
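
Before the first start the local directories named above have to exist and HDFS has to be formatted once. A minimal sketch (run as the hadoop user; the paths simply mirror the hadoop.tmp.dir, dfs.data.dir and dfs.name.dir values above):

  # on every datanode: local storage directories
  mkdir -p /dcache/hadoop/tmp /dcache/hadoop/hdfs/data
  # on the namenode (ssu03): name directory, then format the HDFS namespace once
  mkdir -p /dcache/hadoop/hdfs/name
  hadoop namenode -format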

conf/mapred-site.xml

We manipulate the following properties:

  • We have to create the mapred/system directory on all machines:

  mkdir -p /dcache/hadoop/mapred/system

        <property>
                <name>mapred.system.dir</name>
                <value>/dcache/hadoop/mapred/system</value>
        </property>
        <property>
                <name>mapred.job.tracker</name>
                <value>ssu03:9001</value>
        </property>
        <property>
                <name>mapred.hosts</name>
                <value>${HADOOP_CONF_DIR}/slaves</value>
        </property>
        <property>
                <name>dfs.hosts</name>
                <value>${HADOOP_CONF_DIR}/slaves</value>
        </property>

conf/masters

We have to add the host name of our namenode/jobtracker:

    ssu03

conf/slaves

We have to add the host names of all datanodes/tasktrackers:

    ssu03
    ssu01
    ssu04
    ssu05
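
With masters and slaves in place the cluster can be started from the namenode. A minimal sketch using the standard scripts shipped in bin/ of the 1.0.4 release (jps is only a convenience to check that the daemons came up):

  # starts namenode, secondarynamenode and all datanodes listed in conf/slaves
  start-dfs.sh
  # starts jobtracker and all tasktrackers
  start-mapred.sh
  # on each node: list the running Hadoop java daemons
  jps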

ssh settings
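
The start/stop scripts log into every host in conf/slaves via ssh, so the hadoop user needs passwordless ssh to all cluster members. A minimal sketch (key type and the use of ssh-copy-id are assumptions):

  # as user hadoop on the namenode: create a passphrase-less key
  ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
  # distribute the public key to all machines from conf/slaves
  for host in ssu01 ssu03 ssu04 ssu05; do
      ssh-copy-id hadoop@$host
  done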

Ports to open for datanode communication
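
As an orientation, the Hadoop 1.x default ports plus the ones configured above are listed here; the iptables lines are only a sketch, the concrete firewall setup is site specific:

  # assumed Hadoop 1.x defaults: 8020 namenode RPC (fs.default.name), 50010 datanode data
  # transfer, 50020 datanode IPC, 50075 datanode web UI; configured above: 9001 jobtracker
  # RPC, 50090 dfs.http.address / dfs.secondary.http.address
  for port in 8020 9001 50010 50020 50075 50090; do
      iptables -A INPUT -p tcp --dport $port -j ACCEPT
  done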

node commissioning
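
Since dfs.hosts and mapred.hosts point at conf/slaves above, commissioning a new node is roughly the following sketch (the host name newnode is a placeholder):

  # on the master: add the new host to the allowed hosts / slaves list
  echo newnode >> $HADOOP_CONF_DIR/slaves
  # make namenode and jobtracker re-read their host lists
  hadoop dfsadmin -refreshNodes
  hadoop mradmin -refreshNodes
  # on the new node: start the datanode and tasktracker daemons
  hadoop-daemon.sh start datanode
  hadoop-daemon.sh start tasktracker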

node decommissioning
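
Decommissioning usually goes through an exclude file; dfs.hosts.exclude / mapred.hosts.exclude are not part of the configuration above, so this is only a sketch of the common procedure (oldnode is a placeholder):

  # add the host to the exclude file referenced by dfs.hosts.exclude (assumed setting)
  echo oldnode >> $HADOOP_CONF_DIR/excludes
  # the namenode starts re-replicating the blocks of the leaving datanode
  hadoop dfsadmin -refreshNodes
  hadoop mradmin -refreshNodes
  # watch the decommissioning progress
  hadoop dfsadmin -report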
