Preface
We work with Apache Hadoop release 1.0.4 from http://hadoop.apache.org/, the stable version as of February 2013.
In our setup the secondarynamenode runs on a different machine than the namenode. Both the namenode and the secondarynamenode are also datanodes and tasktrackers. The jobtracker runs on the same machine as the namenode. For data storage we used a 16 TB RAID 6 disk array mounted under /dcache.
Configuration on all machines in cluster
- We have to add a user hadoop in group hadoop on all machines in the cluster:
groupadd -g 790 hadoop
useradd --comment "Hadoop" --shell /bin/zsh -m -r -g 790 -G hadoop --home /usr/local/hadoop hadoop
- In the zshrc we have to add some environment variables:
export HADOOP_INSTALL=/usr/local/hadoop/hadoop-1.0.4
export HADOOP_CONF_DIR=$HADOOP_INSTALL/conf
export PATH=$PATH:$HADOOP_INSTALL/bin
Configuration of Hadoop framework
conf/hadoop-env.sh
The following lines are to be added in hadoop-env.sh:
- Setting JAVA_HOME
export JAVA_HOME=/etc/alternatives/jre_oracle
- Setting cluster members
export HADOOP_SLAVES=$HADOOP_HOME/conf/slaves
- Setting path where hadoop conf should be rsync'd
export HADOOP_MASTER=ssu03:/usr/local/hadoop/hadoop-1.0.4
conf/core-site.xml
We manipulated the following properties:
<property>
  <name>fs.default.name</name>
  <value>hdfs://ssu03</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/dcache/hadoop/tmp</value>
</property>
<property>
  <name>fs.inmemory.size.mb</name>
  <value>200</value>
</property>
<property>
  <name>io.sort.factor</name>
  <value>100</value>
</property>
<property>
  <name>io.sort.mb</name>
  <value>200</value>
</property>
conf/hdfs-site.xml
We manipulated the following properties:
<property>
  <name>hadoop.tmp.dir</name>
  <value>/dcache/hadoop/tmp</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/dcache/hadoop/hdfs/data</value>
</property>
<property>
  <name>dfs.name.dir</name>
  <value>/dcache/hadoop/hdfs/name</value>
</property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://ssu03</value>
</property>
<property>
  <name>dfs.hosts</name>
  <value>${HADOOP_CONF_DIR}/slaves</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>3</value>
  <description>Default block replication</description>
</property>
<property>
  <name>dfs.secondary.http.address</name>
  <value>ssu04:50090</value>
</property>
<property>
  <name>fs.checkpoint.dir</name>
  <value>ssu04:/dcache/hadoop/secondary</value>
</property>
<property>
  <name>dfs.http.address</name>
  <value>ssu03:50090</value>
</property>
conf/mapred-site.xml
We manipulated the following properties.
First we have to create the mapred/system directory on all machines:
mkdir -p /dcache/mapred/system
<property>
  <name>mapred.system.dir</name>
  <value>/dcache/hadoop/mapred/system</value>
</property>
<property>
  <name>mapred.job.tracker</name>
  <value>ssu03:9001</value>
</property>
<property>
  <name>mapred.hosts</name>
  <value>${HADOOP_CONF_DIR}/slaves</value>
</property>
<property>
  <name>dfs.hosts</name>
  <value>${HADOOP_CONF_DIR}/slaves</value>
</property>
conf/masters
We have to add the host name of our namenode/jobtracker:
ssu03
conf/slaves
We have to add the host names of all datanodes/tasktrackers, one per line:
ssu03
ssu01
ssu04
ssu05
ssh settings
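Hadoop's start/stop scripts log in to every host in conf/slaves via ssh, so the hadoop user needs passwordless ssh from the master to all nodes. A minimal sketch, with illustrative paths (in a real setup the key lives in ~/.ssh of the hadoop user, and the public key is appended to ~/.ssh/authorized_keys on every node in conf/slaves):

```shell
# Generate a passwordless RSA key pair for the hadoop user
# (illustrative path; normally ~/.ssh/id_rsa):
rm -f /tmp/hadoop_id_rsa /tmp/hadoop_id_rsa.pub
ssh-keygen -q -t rsa -N "" -f /tmp/hadoop_id_rsa

# Append the public key to an authorized_keys file
# (shown locally here; in practice copy it to every slave,
# e.g. with ssh-copy-id hadoop@ssu01, ssh-copy-id hadoop@ssu04, ...):
mkdir -p /tmp/demo_authkeys
cat /tmp/hadoop_id_rsa.pub >> /tmp/demo_authkeys/authorized_keys
chmod 600 /tmp/demo_authkeys/authorized_keys
```

Afterwards, verify that `ssh ssuNN` from the master works without a password prompt for each slave before running start-dfs.sh.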
Ports to open for datanode communication
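For reference, the Hadoop 1.x datanode defaults are TCP 50010 (block data transfer), 50020 (IPC) and 50075 (HTTP web UI). A hedged iptables rules fragment, assuming these defaults; verify them against any dfs.datanode.* overrides in your hdfs-site.xml and adjust to your site's firewall policy:

```
# Hadoop 1.x datanode default ports (verify against your configuration):
-A INPUT -p tcp --dport 50010 -j ACCEPT   # dfs.datanode.address (block data transfer)
-A INPUT -p tcp --dport 50020 -j ACCEPT   # dfs.datanode.ipc.address
-A INPUT -p tcp --dport 50075 -j ACCEPT   # dfs.datanode.http.address (web UI)
```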
node commissioning
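Since dfs.hosts and mapred.hosts both point at ${HADOOP_CONF_DIR}/slaves, commissioning a node amounts to listing it there and telling the running daemons to re-read the file. A sketch of the steps with Hadoop 1.x commands (to be run on a live cluster; the exact order may vary with your setup):

```
# 1. Add the new host to conf/slaves on the master
#    (the file is rsync'd to the nodes via HADOOP_MASTER).
# 2. Ask the namenode and jobtracker to re-read the hosts files:
hadoop dfsadmin -refreshNodes
hadoop mradmin -refreshNodes
# 3. Start the daemons on the new node:
hadoop-daemon.sh start datanode
hadoop-daemon.sh start tasktracker
```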