Description: To install Hadoop in a multi-node cluster, start by installing Java on all nodes and configuring SSH for passwordless login from the name node to the data nodes. Install Hadoop on the master node, configure the core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml files, and copy the configuration to the other nodes. Format the HDFS filesystem and start the Hadoop daemons. Verify the installation using the jps command and check the web interfaces to confirm the cluster is running correctly. This setup establishes a multi-node Hadoop cluster for distributed data processing.
Step 1: First, update the system's package list to ensure access to the latest versions of software packages before starting the installation process.
Command: sudo apt update
Step 2: Install SSH to enable remote communication between nodes in the cluster.
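On Ubuntu, the OpenSSH packages can be installed with:
Command: sudo apt install openssh-server openssh-client -y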
Step 6: Now copy the authorized_keys file to all data nodes in the cluster. This enables the name node to connect to the data nodes without being prompted for a password.
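Assuming the data nodes are reachable as datanode1, datanode2, and datanode3 (placeholder hostnames) and the ~/.ssh directory already exists on each of them, the file can be copied with scp:
Command: scp ~/.ssh/authorized_keys ubuntu@datanode1:~/.ssh/authorized_keys
Repeat the command for datanode2 and datanode3.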
Step 7: Add the IP addresses and hostnames of all nodes to the /etc/hosts file on every node.
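The entries look like the following; the IP addresses and hostnames are placeholders for your own machines:
192.168.1.100 namenode
192.168.1.101 datanode1
192.168.1.102 datanode2
192.168.1.103 datanode3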
Step 8: Install JDK 1.8 on all four nodes.
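On Ubuntu, OpenJDK 8 can be installed with:
Command: sudo apt install openjdk-8-jdk -y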
Step 9: Download the latest Apache Hadoop release using the wget command.
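For example, for release 3.3.6 (the version number is only an example; check the Apache download page for the current release):
Command: wget https://downloads.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz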
Step 10: After the download finishes, extract the archive with the tar command and rename the extracted folder to "hadoop".
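Assuming the 3.3.6 archive from the previous step:
Command: tar -xzf hadoop-3.3.6.tar.gz
Command: mv hadoop-3.3.6 hadoop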
Step 11: Add the Hadoop environment variables to the .bashrc file. Open .bashrc in a text editor such as vi and append the variables below.
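A typical set of entries, assuming Hadoop was extracted to /home/ubuntu/hadoop and OpenJDK 8 sits at the default Ubuntu path (adjust both to your setup):
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64   # assumed OpenJDK 8 path
export HADOOP_HOME=/home/ubuntu/hadoop               # assumed extraction path
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin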
Step 12: Now reload the environment variables into the current session, or close and reopen the shell.
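Command: source ~/.bashrc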
Step 13: Once the Apache Hadoop installation completes, configure it by editing its configuration files. Make the changes below on the name node and then copy the updated files to all three data nodes in the cluster.
Step 14: Update core-site.xml
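A minimal core-site.xml sketch, assuming the name node is reachable by the hostname namenode and HDFS uses the common default port 9000:
<configuration>
  <property>
    <!-- hostname and port are examples; match your own cluster -->
    <name>fs.defaultFS</name>
    <value>hdfs://namenode:9000</value>
  </property>
</configuration>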
Step 15: Update hdfs-site.xml
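A minimal hdfs-site.xml sketch, assuming the /data directory that is created in Step 18 and a replication factor of 3:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <!-- storage paths are assumptions; see Step 18 -->
    <name>dfs.namenode.name.dir</name>
    <value>file:///data/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///data/datanode</value>
  </property>
</configuration>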
Step 16: Update yarn-site.xml
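A minimal yarn-site.xml sketch, again assuming the resource manager runs on the namenode host:
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>namenode</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>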
Step 17: Update mapred-site.xml
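A minimal mapred-site.xml sketch that tells MapReduce to run on YARN:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>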
Step 18: Create a data folder and change its ownership to the login user. I'm logged in as the ubuntu user, so the examples use ubuntu.
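Assuming the /data path referenced in hdfs-site.xml above:
Command: sudo mkdir -p /data/namenode /data/datanode
Command: sudo chown -R ubuntu:ubuntu /data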
Step 19: HDFS needs to be formatted like any classical file system. On the name node server (namenode), run the following command:
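Command: hdfs namenode -format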
Step 20: Start HDFS by running the start-dfs.sh script from the name node server (namenode).
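With $HADOOP_HOME/sbin on the PATH (see Step 11):
Command: start-dfs.sh
The YARN daemons, if needed, are started separately with start-yarn.sh.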
Step 21: Running the jps command on the name node should list the following:
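The process IDs will differ; with HDFS started, the output typically looks like:
14321 NameNode
14567 SecondaryNameNode
14892 Jps
(ResourceManager also appears here if YARN has been started.)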
Step 22: Running the jps command on the data nodes should list the following:
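Again, the process IDs will differ; a data node typically shows:
15010 DataNode
15233 Jps
(NodeManager also appears here if YARN has been started.)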
Step 23: Verify the installation by accessing the web interface.
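For Hadoop 3.x, the NameNode web UI is typically available at http://namenode:9870 and the YARN ResourceManager UI at http://namenode:8088; Hadoop 2.x uses port 50070 for the NameNode UI. The hostname and ports depend on your configuration and Hadoop version.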