How to Install Hadoop on a Standalone Machine

  • Step 1: Install the following software before installing Hadoop

    $sudo apt-get update
    $sudo apt-get install sun-java6-jdk
    $sudo update-java-alternatives -s java-6-sun

  • Step 2: Verify the Java installation using the following command

    $java -version

  • Step 3: Create a Hadoop user

    $sudo addgroup hadoop
    $sudo adduser --ingroup hadoop hduser

  • Step 4: SSH configuration

    $su hduser

    Generate ssh key

    $ssh-keygen -t rsa -P ""

    Enable the SSH access to local machine

    $cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

    Verify the SSH configuration using the following command (slogix.in is the example machine's hostname; use your own hostname or localhost)

    $ssh slogix.in
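The key-generation and authorization steps above can be rehearsed non-destructively before touching the real ~/.ssh. The sketch below uses a throwaway directory (an assumption for illustration) but the same flags as the tutorial:

```shell
# Sketch: rehearse key generation and authorization in a temporary
# directory so the real ~/.ssh is left untouched.
tmp=$(mktemp -d)

# Same flags as above: RSA key with an empty passphrase (-P "")
ssh-keygen -q -t rsa -P "" -f "$tmp/id_rsa"

# Enable key-based login by appending the public key to authorized_keys
cat "$tmp/id_rsa.pub" >> "$tmp/authorized_keys"
chmod 600 "$tmp/authorized_keys"

# The authorized_keys file now contains the generated public key
grep -c "ssh-rsa" "$tmp/authorized_keys"
```

The empty passphrase is what makes `ssh localhost` non-interactive, which Hadoop's start/stop scripts rely on when they launch daemons over SSH.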

  • Step 5: Disabling IPv6
    Add the following lines to the /etc/sysctl.conf file, then apply them with sudo sysctl -p (or reboot)

    net.ipv6.conf.all.disable_ipv6 = 1
    net.ipv6.conf.default.disable_ipv6 = 1
    net.ipv6.conf.lo.disable_ipv6 = 1


    Check whether IPv6 is enabled on the local machine using the following command

    $cat /proc/sys/net/ipv6/conf/all/disable_ipv6
    A return value of 0 means IPv6 is enabled, a value of 1 means disabled
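The edit can be staged safely before touching the real /etc/sysctl.conf. This sketch writes the three settings to a scratch file (a stand-in for sysctl.conf) and checks that all interfaces are covered:

```shell
# Sketch: stage the three IPv6 settings in a scratch copy instead of
# editing /etc/sysctl.conf directly.
conf=$(mktemp)
cat >> "$conf" <<'EOF'
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
EOF

# All three settings (all, default, lo) should now be present
grep -c "disable_ipv6 = 1" "$conf"

# On a real system, apply the file with: sudo sysctl -p
# then re-check /proc/sys/net/ipv6/conf/all/disable_ipv6 (1 = disabled)
```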

  • Step 6: Download Hadoop from the following link

    https://archive.apache.org/dist/hadoop/core/hadoop-1.2.1/

  • Step 7: Extract hadoop-1.2.1.tar.gz into the /usr/local/ directory using the following command

    $sudo tar xzf hadoop-1.2.1.tar.gz -C /usr/local/

  • Give ownership of the extracted directory to the Hadoop user

    $sudo chown -R hduser:hadoop /usr/local/hadoop-1.2.1
  • Step 8: Set the Hadoop environment variables as follows

    Open the .bashrc file of the Hadoop user

    $vi /home/hduser/.bashrc

  • Add the following lines to the .bashrc file

    # Set Hadoop-related environment variables
    export HADOOP_HOME=/usr/local/hadoop-1.2.1
    # Set JAVA_HOME
    export JAVA_HOME=/usr/lib/jvm/java-6-sun
    # Some convenient aliases and functions for running Hadoop-related commands
    unalias fs &> /dev/null
    alias fs="hadoop fs"
    unalias hls &> /dev/null
    alias hls="fs -ls"
    # Requires installed 'lzop' command.
    lzohead () {
        hadoop fs -cat $1 | lzop -dc | head -1000 | less
    }
    # Add Hadoop bin/ directory to PATH
    export PATH=$PATH:$HADOOP_HOME/bin
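After sourcing the updated .bashrc (source ~/.bashrc), the new variables can be checked from the shell. A minimal sketch, using the same HADOOP_HOME path as above:

```shell
# Sketch: set the Hadoop variables in the current shell and confirm
# that the bin/ directory landed on PATH (same paths as the .bashrc above).
export HADOOP_HOME=/usr/local/hadoop-1.2.1
export PATH=$PATH:$HADOOP_HOME/bin

case ":$PATH:" in
  *:/usr/local/hadoop-1.2.1/bin:*) echo "hadoop bin on PATH" ;;
  *) echo "PATH not updated" ;;
esac
```

Once PATH is set this way, the hadoop, start-all.sh, and stop-all.sh commands in the later steps can be run from any directory.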

  • Step 9: Configuration of Hadoop directory

    Add the following lines to the Hadoop configuration files, which are located in the following directory

    $cd /usr/local/hadoop-1.2.1/conf

    1.hadoop-env.sh
    export JAVA_HOME=/usr/lib/jvm/java-6-sun
    export HADOOP_HOME_WARN_SUPPRESS="TRUE"

    2.core-site.xml
    <configuration>
    <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop-${user.name}</value>
    <description>A base for other temporary directories.</description>
    </property>

    <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
    <description>The name of the default file system. </description>
    </property>

    </configuration>

    3.hdfs-site.xml
    <configuration>
    <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Default block replication.</description>
    </property>
    </configuration>

    4.mapred-site.xml
    <configuration>
    <property>
    <name>mapred.job.tracker</name>
    <value>localhost:54311</value>
    <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.
    </description>
    </property>

    </configuration>
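These configuration files can also be written from the shell with a here-document, which avoids hand-editing mistakes. A sketch for core-site.xml, using a temporary directory as a stand-in for /usr/local/hadoop-1.2.1/conf:

```shell
# Sketch: generate core-site.xml with a here-document; a temporary
# directory stands in for /usr/local/hadoop-1.2.1/conf here.
confdir=$(mktemp -d)
cat > "$confdir/core-site.xml" <<'EOF'
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
</property>
</configuration>
EOF

# Confirm the NameNode address made it into the file
grep "hdfs://localhost:54310" "$confdir/core-site.xml"
```

The quoted 'EOF' delimiter keeps the shell from expanding anything inside the document, so values like ${user.name} in hadoop.tmp.dir would also survive verbatim.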

  • Step 10: Format Hadoop NameNode

    Execute the following command as the Hadoop user (hduser)
    $hadoop namenode -format

  • Step 11: Start Hadoop Daemons

    $start-all.sh

  • Step 12: Verify the running state of Hadoop daemons

    $jps
    6146 JobTracker
    6400 TaskTracker
    6541 Jps
    5806 DataNode
    6057 SecondaryNameNode
    5474 NameNode
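The check above can be scripted so a missing daemon is reported by name. The sketch below greps sample jps output (taken from the listing above) for the five expected daemons; on a live cluster the variable would instead be filled with jps_out=$(jps):

```shell
# Sketch: verify that all five Hadoop 1.x daemons appear in jps output.
# Sample output from the step above; on a real system use: jps_out=$(jps)
jps_out='6146 JobTracker
6400 TaskTracker
5806 DataNode
6057 SecondaryNameNode
5474 NameNode'

missing=0
for d in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
  # -w matches whole words, so "NameNode" does not match "SecondaryNameNode"
  echo "$jps_out" | grep -qw "$d" || { echo "missing: $d"; missing=1; }
done
[ "$missing" -eq 0 ] && echo "all daemons running"
```

If a daemon is missing, its log file under $HADOOP_HOME/logs is the usual place to look for the cause.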

  • Step 13: Stop Hadoop Daemons

    $stop-all.sh
    stopping jobtracker
    slogix.in: stopping tasktracker
    stopping namenode
    slogix.in: stopping datanode
    slogix.in: stopping secondarynamenode