
How to Install Hadoop on a Standalone Machine?

Steps for Installing Hadoop on a Standalone Machine

  • Description:
    To install Hadoop on a standalone machine, ensure Java is installed, download the latest Hadoop version, and extract it to a directory. Set environment variables for JAVA_HOME and HADOOP_HOME in the .bashrc file. Configure Hadoop by editing core-site.xml (set fs.defaultFS to hdfs://localhost:9000) and hdfs-site.xml (set the replication factor to 1). Format the Hadoop filesystem using hdfs namenode -format and start the Hadoop services with start-dfs.sh. Finally, verify the installation by checking the Hadoop processes with jps and accessing the web interface at http://localhost:9870.
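
At a glance, the flow looks like this (a condensed sketch; the hdoop user, version 3.4.0, and the paths shown are the ones used in the steps below):

  • Command:
    sudo apt install openjdk-8-jdk        # install Java (Step 2)
    wget <mirror-link>                    # download Hadoop 3.4.0 (Steps 10-11)
    tar xzf hadoop-3.4.0.tar.gz           # unpack (Step 12)
    nano ~/.bashrc                        # set JAVA_HOME, HADOOP_HOME, PATH (Steps 13-16)
    hdfs namenode -format                 # initialize HDFS (Step 31)
    start-dfs.sh && start-yarn.sh         # start the daemons (Steps 32-33)
    jps                                   # verify (Step 34)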

Step 1: Update Package Lists for APT

  • Command:
    sudo apt update
  • Hadoop Step 1 Screenshot

Step 2: Install Java JDK 8

  • Command:
    sudo apt install openjdk-8-jdk
  • Hadoop Step 2 Screenshot

Step 3: Verify Java Installation

  • Command:
    java -version
  • Hadoop Step 3 Screenshot

Step 4: Install OpenSSH

  • Command:
    sudo apt install openssh-server openssh-client -y
  • Hadoop Step 4 Screenshot

Step 5: Create Hadoop User

  • Command:
    sudo adduser hdoop
  • Hadoop Step 5 Screenshot
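
The remaining steps do not require sudo for the hdoop user, but if you also want to administer the machine from that account, you can optionally add it to the sudo group:

  • Command:
    sudo usermod -aG sudo hdoop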

Step 6: Switch to the Hadoop User

  • Command:
    su - hdoop
  • Hadoop Step 6 Screenshot

Step 7: Generate SSH Key Pair

  • Command:
    ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
  • Hadoop Step 7 Screenshot

Step 8: Store Public Key

  • Command:
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
  • Hadoop Step 8 Screenshot

Step 9: Verify SSH Configuration

  • Command:
    ssh localhost
  • Hadoop Step 9 Screenshot
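
If ssh localhost prompts for a password or is refused, confirm the SSH service is running before retrying (the service name assumes Ubuntu's openssh-server package from Step 4):

  • Command:
    sudo systemctl status ssh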

Step 10: The steps outlined in this tutorial use the binary download for Hadoop version 3.4.0. Select your preferred option, and you will be presented with a mirror link to download the Hadoop tar package.

  • Hadoop Step 10 Screenshot

Step 11: Use the provided mirror link and download the Hadoop package using the wget command
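
For example, downloading from the Apache download area (the URL below is an assumption based on Apache's usual layout; substitute the mirror link you were given in Step 10):

  • Command:
    wget https://dlcdn.apache.org/hadoop/common/hadoop-3.4.0/hadoop-3.4.0.tar.gz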

Step 12: Once the download completes, extract the .tar.gz file

  • Command:
    tar xzf hadoop-3.4.0.tar.gz

Step 13: Edit the .bashrc shell configuration file using a text editor

  • Command:
    nano ~/.bashrc
  • Edit .bashrc File Screenshot

Step 14: Define the Hadoop environment variables by adding the following content at the end of the file

  • Configuration:
    # Hadoop Related Options
    export HADOOP_HOME=/home/hdoop/hadoop-3.4.0
    export HADOOP_INSTALL=$HADOOP_HOME
    export HADOOP_MAPRED_HOME=$HADOOP_HOME
    export HADOOP_COMMON_HOME=$HADOOP_HOME
    export HADOOP_HDFS_HOME=$HADOOP_HOME
    export YARN_HOME=$HADOOP_HOME
    export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
    export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
    export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
              

Step 15: Once you add the variables, save and exit the .bashrc file

  • Save and Exit .bashrc Screenshot

Step 16: Run the command below to apply the changes to the current environment

  • Command:
    source ~/.bashrc
  • Apply Changes Screenshot
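
You can confirm the variables are active in the current shell; the expected output is the path defined in Step 14:

  • Command:
    echo $HADOOP_HOME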

Step 17: Use the previously created $HADOOP_HOME variable to access the hadoop-env.sh file:

  • Command:
    nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh
  • Edit hadoop-env.sh Screenshot

Step 18: Uncomment the JAVA_HOME variable and set the full path to your OpenJDK installation

  • Configuration:
    export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
              
  • Set JAVA_HOME Screenshot
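
If you are unsure where OpenJDK lives on your system, you can resolve the path from the javac binary (a common trick; the result should match the path set above):

  • Command:
    readlink -f /usr/bin/javac | sed "s:/bin/javac::"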

Step 19: Open the core-site.xml file for editing

  • Command:
    nano $HADOOP_HOME/etc/hadoop/core-site.xml
  • Edit core-site.xml Screenshot

Step 20: Add the following configuration to the core-site.xml file

  • Configuration:
    <configuration>
      <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hdoop/tmpdata</value>
      </property>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://127.0.0.1:9000</value>
      </property>
    </configuration>
        

Step 21: Save and close the core-site.xml file; it resides in $HADOOP_HOME/etc/hadoop

  • Save Core-Site Configuration Screenshot
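
The hadoop.tmp.dir directory set in Step 20 must exist before Hadoop starts; create it now if it is not already present (the path assumes the configuration shown above):

  • Command:
    mkdir -p /home/hdoop/tmpdata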

Step 22: Use the following command to open the hdfs-site.xml file for editing

  • Command:
    nano $HADOOP_HOME/etc/hadoop/hdfs-site.xml
  • Edit hdfs-site.xml Screenshot

Step 23: Add the following configuration to the file and adjust directories as needed

  • Configuration:
    <configuration>
      <property>
        <name>dfs.namenode.name.dir</name>
        <value>/home/hdoop/dfsdata/namenode</value>
      </property>
      <property>
        <name>dfs.datanode.data.dir</name>
        <value>/home/hdoop/dfsdata/datanode</value>
      </property>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
    </configuration>

Step 24: Save and close the file; it resides at $HADOOP_HOME/etc/hadoop/hdfs-site.xml

  • Save hdfs-site.xml Screenshot
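
Likewise, create the NameNode and DataNode directories referenced in Step 23 before starting HDFS (paths assume the configuration shown above):

  • Command:
    mkdir -p /home/hdoop/dfsdata/namenode /home/hdoop/dfsdata/datanode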

Step 25: Use the following command to access the mapred-site.xml file and define MapReduce values

  • Command:
    nano $HADOOP_HOME/etc/hadoop/mapred-site.xml
  • Edit mapred-site.xml Screenshot

Step 26: Add the following configuration to change the default MapReduce framework name value to YARN

  • Configuration:
    <configuration>
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
      </property>
    </configuration>
        

Step 27: Save and close the file; it resides at $HADOOP_HOME/etc/hadoop/mapred-site.xml

  • Save mapred-site.xml Screenshot

Step 28: Open the yarn-site.xml file in a text editor

  • Command:
    nano $HADOOP_HOME/etc/hadoop/yarn-site.xml
  • Edit yarn-site.xml Screenshot

Step 29: Append the following configuration to the yarn-site.xml file

  • Configuration:
    <configuration>
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>
      <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
      </property>
      <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>127.0.0.1</value>
      </property>
      <property>
        <name>yarn.acl.enable</name>
        <value>0</value>
      </property>
      <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
      </property>
    </configuration>

Step 30: Save and close the file; it resides at $HADOOP_HOME/etc/hadoop/yarn-site.xml

  • Save yarn-site.xml Screenshot

Step 31: Format HDFS NameNode

  • Command:
    hdfs namenode -format
  • Format NameNode Screenshot

Step 32: Navigate to the hadoop-3.4.0/sbin directory and execute the following command to start the NameNode and DataNode:

  • Command:
    ./start-dfs.sh
  • Start DFS Screenshot

Step 33: Once the NameNode, DataNode, and secondary NameNode are up and running, start the YARN ResourceManager and NodeManagers by typing:

  • Command:
    ./start-yarn.sh
  • Start YARN Screenshot

Step 34: Run the following command to check if all the daemons are active and running as Java processes:

  • Command:
    jps
  • Verify Hadoop Cluster Screenshot
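
With all five daemons running, jps should list six Java processes in total. The listing below is an illustrative sketch only; the process IDs will differ on your machine:

  • Example Output:
    12001 NameNode
    12210 DataNode
    12456 SecondaryNameNode
    12789 ResourceManager
    13012 NodeManager
    13345 Jps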

Step 35: Use your preferred browser and navigate to your localhost URL or IP. The default port number 9870 gives you access to the Hadoop NameNode UI:

  • URL:
    http://localhost:9870/
  • Hadoop Web Interface Screenshot

Step 36: The ResourceManager is an invaluable tool for monitoring all running processes in your Hadoop cluster; its web UI listens on port 8088 by default (http://localhost:8088)

  • Summary: You have configured Hadoop, formatted the NameNode, and started both the DFS and YARN services.
  • Hadoop Web Interface Screenshot
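
When you are finished, you can stop all services from the same sbin directory; the stop scripts mirror the start scripts used in Steps 32 and 33:

  • Command:
    ./stop-yarn.sh
    ./stop-dfs.sh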