Installation and Configuration of Hadoop on a Multi-Node Cluster

  • Step 1: Prerequisites for installing a multi-node cluster

    First, complete the single-node cluster configuration on each of the two machines (one becomes the master, the other the slave)

  • Step 2: Host Configuration on Master Machine

    Add the following lines to the /etc/hosts file
    192.168.0.1 master    # IP address of the master node
    192.168.0.2 slave     # IP address of the slave node
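
    The host entries above can also be appended from a script. The following is a minimal sketch, not part of the original procedure: add_host_entry is a hypothetical helper name, and it assumes a POSIX shell with awk. It appends a mapping only if the hostname is not already present, so running it twice does not duplicate lines.

```shell
# Hypothetical helper: append "IP hostname" to a hosts file unless the
# hostname is already mapped there.
# $1 = hosts file, $2 = IP address, $3 = hostname
add_host_entry() {
    # Strip comments, then look for the hostname in any field after the IP.
    if ! awk -v h="$3" '
            { sub(/#.*/, ""); for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
            END { exit !found }
        ' "$1"; then
        printf '%s %s\n' "$2" "$3" >> "$1"
    fi
}
```

    From a root shell this could be called as add_host_entry /etc/hosts 192.168.0.1 master and add_host_entry /etc/hosts 192.168.0.2 slave, using the addresses from this step.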

    SSH access

    Copy the master's public key to the slave so that hduser on the master can log in without a password
    $ ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@slave
    Connect from master to master
    $ ssh master
    Connect from master to slave
    $ ssh slave
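
    The two manual logins above can be turned into a non-interactive check. This is a sketch under assumptions: check_passwordless_ssh is a hypothetical helper name, and the 5-second timeout is an arbitrary choice. BatchMode=yes and ConnectTimeout are standard OpenSSH client options; with BatchMode=yes, ssh fails instead of prompting for a password, which is the case the Hadoop start scripts cannot handle.

```shell
# Hypothetical helper: succeed only if key-based login to the host works.
# $1 = host name (for example master or slave)
check_passwordless_ssh() {
    # BatchMode=yes disables password prompts, so a missing key shows up
    # as a non-zero exit status rather than an interactive prompt.
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}
```

    On the master, a loop such as for h in master slave; do check_passwordless_ssh "$h" || echo "no passwordless login to $h"; done reports any host that still asks for a password.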

  • Step 3: Configure the Hadoop directory on the master machine

    Edit the following files in the /usr/local/hadoop-1.2.1/conf/ directory

    1. $ vi masters
    master
    2. $ vi slaves
    master
    slave
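
    The same two files can be written without an editor. A minimal sketch, assuming a POSIX shell; write_cluster_files is a hypothetical helper name. In Hadoop 1.x the masters file names the host that runs the SecondaryNameNode, and the slaves file lists the hosts that run a DataNode and a TaskTracker.

```shell
# Hypothetical helper: write the masters and slaves files.
# $1 = Hadoop conf directory, $2 = master host, remaining args = worker hosts
write_cluster_files() {
    conf_dir=$1
    master_host=$2
    shift 2
    # masters: one line with the SecondaryNameNode host
    printf '%s\n' "$master_host" > "$conf_dir/masters"
    # slaves: one worker host per line
    printf '%s\n' "$@" > "$conf_dir/slaves"
}
```

    For the layout in this step the call would be write_cluster_files /usr/local/hadoop-1.2.1/conf master master slave, which lists the master as a worker too.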

  • Step 4: Host Configuration on Slave Machine

    Add the same lines to the /etc/hosts file on the slave machine
    192.168.0.1 master    # IP address of the master node
    192.168.0.2 slave     # IP address of the slave node

  • Step 5: Configure the Hadoop directory on all machines
    Add the following properties to the Hadoop XML configuration files

    These files are located in the following directory
    $ cd /usr/local/hadoop-1.2.1/conf

    1. core-site.xml
    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://master:54310</value>
        <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The URI's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class.</description>
      </property>
    </configuration>

    2. mapred-site.xml
    <configuration>
      <property>
        <name>mapred.job.tracker</name>
        <value>master:54311</value>
        <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.</description>
      </property>
    </configuration>

    3. hdfs-site.xml
    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>2</value>
        <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.</description>
      </property>
    </configuration>
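
    Because these files must match on every machine, it can help to read a value back after editing. The sketch below is an assumption-laden convenience, not a Hadoop tool: get_property is a hypothetical helper that scans for name and value tags line by line with awk. It is not a real XML parser and expects each name to appear before its value, as in the files above.

```shell
# Hypothetical helper: print the value of one property from a Hadoop
# configuration file. Prints nothing if the property is absent.
# $1 = config file, $2 = property name
get_property() {
    awk -v want="$2" '
        /<name>/ {
            # Remember the most recent property name.
            name = $0
            sub(/.*<name>/, "", name)
            sub(/<\/name>.*/, "", name)
        }
        /<value>/ && name == want {
            value = $0
            sub(/.*<value>/, "", value)
            sub(/<\/value>.*/, "", value)
            print value
            exit
        }
    ' "$1"
}
```

    For example, get_property /usr/local/hadoop-1.2.1/conf/hdfs-site.xml dfs.replication should print the replication factor configured in this step, and running it on both machines shows whether the files agree.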

  • Step 6: Format Hadoop NameNode

    Run the following command from the Hadoop directory on the master machine
    $ hadoop namenode -format

  • Step 7: Start Hadoop Daemons

    $ start-all.sh

  • Step 8: Verify the running state of daemons on master machine

    $ jps
    6146 JobTracker
    6400 TaskTracker
    6541 Jps
    5806 DataNode
    6057 SecondaryNameNode
    5474 NameNode

  • Step 9: Verify the running state on slave machine

    $ jps
    15183 DataNode
    15897 TaskTracker
    16284 Jps
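
    The jps listings in Steps 8 and 9 can be checked by script rather than by eye. A minimal sketch, assuming a POSIX shell with awk; missing_daemons is a hypothetical helper name. It reads jps output on standard input, compares the second column exactly (so NameNode does not match SecondaryNameNode), prints each expected daemon that is absent, and returns non-zero if any is missing.

```shell
# Hypothetical helper: report expected daemons that jps does not list.
# stdin = output of jps, arguments = expected daemon names
missing_daemons() {
    jps_output=$(cat)
    rc=0
    for daemon in "$@"; do
        # Exact match on the second field (the Java process name).
        if ! printf '%s\n' "$jps_output" |
                awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            echo "missing: $daemon"
            rc=1
        fi
    done
    return $rc
}
```

    On the master this would be run as jps | missing_daemons NameNode SecondaryNameNode JobTracker DataNode TaskTracker, and on the slave as jps | missing_daemons DataNode TaskTracker, matching the listings shown in these two steps.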

  • Step 10: Stop Hadoop Daemons

    $ stop-all.sh
    stopping namenode
    slave: Ubuntu 12.04
    slave: stopping datanode
    master: stopping datanode
    master: stopping secondarynamenode