How to Install Hadoop in a Multi-Node Cluster?

Steps for Installing Hadoop in a Multi-Node Cluster

  • Description:
    To install Hadoop in a multi-node cluster, start by installing Java on all nodes and configuring passwordless SSH from the name node to the data nodes. Install Hadoop on the master node, configure the core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml files, and copy the configuration to the data nodes. Format the HDFS filesystem and start the Hadoop daemons from the name node. Verify the installation using the jps command and the web interfaces to ensure the cluster is running correctly. This setup establishes a multi-node Hadoop cluster for distributed data processing.

Step 1: First, update the system's package list to ensure access to the latest versions of software packages before starting the installation process.

  • Command:
    sudo apt update
  • Hadoop Step 1 Screenshot

Step 2: Install SSH to enable remote communication between nodes in the cluster.

  • Command:
    sudo apt install openssh-server openssh-client -y
  • Hadoop Step 2 Screenshot

Step 3: Set up passwordless login between the name node and data nodes by generating an SSH key pair on the name node.

  • Command:
    ssh-keygen
  • Hadoop Step 3 Screenshot

Step 4: Copy id_rsa.pub to authorized_keys

  • Command:
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
  • Hadoop Step 4 Screenshot

Step 5: Copy id_rsa.pub to authorized_keys under the ~/.ssh folder. Using >> appends the contents of the id_rsa.pub file to authorized_keys instead of overwriting it.

  • Command:
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
  • Hadoop Step 5 Screenshot

Step 6: Now copy authorized_keys to all the data nodes in the cluster. This enables the name node to connect to the data nodes without being prompted for a password.

  • Hadoop Step 6 Screenshot
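  • Example Command (illustrative; the ubuntu user and the host names datanode1, datanode2, and datanode3 are assumptions, and the ~/.ssh folder must already exist on each data node):
    scp ~/.ssh/authorized_keys ubuntu@datanode1:~/.ssh/authorized_keys
    scp ~/.ssh/authorized_keys ubuntu@datanode2:~/.ssh/authorized_keys
    scp ~/.ssh/authorized_keys ubuntu@datanode3:~/.ssh/authorized_keys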

Step 7: Add all the cluster nodes to /etc/hosts.

  • Hadoop Step 7 Screenshot
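  • Example /etc/hosts entries (the IP addresses and host names are assumptions for a 4-node cluster; 192.168.1.100 matches the name node address used in Step 23):
    192.168.1.100   namenode
    192.168.1.101   datanode1
    192.168.1.102   datanode2
    192.168.1.103   datanode3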

Step 8: Install JDK 1.8 on all 4 nodes.

  • Hadoop Step 8 Screenshot
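  • Example Command (one way to get JDK 1.8 on Ubuntu; run it on every node):
    sudo apt install openjdk-8-jdk -y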

Step 9: Download the latest Apache Hadoop release using the wget command.

  • Hadoop Step 9 Screenshot
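  • Example Command (version 3.3.6 is used only for illustration; substitute the release you want from the Apache download site):
    wget https://downloads.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz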

Step 10: After the download finishes, extract the archive using the tar command (the standard archive-extraction tool on Ubuntu) and rename the extracted folder to "hadoop".

  • Hadoop Step 10 Screenshot
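  • Example Command (assuming the hadoop-3.3.6 archive from the previous step):
    tar -xzf hadoop-3.3.6.tar.gz
    mv hadoop-3.3.6 hadoop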

Step 11: Add the Hadoop environment variables to the .bashrc file. Open the .bashrc file in the vi editor and add the variables below.

  • Hadoop Step 11 Screenshot
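  • Example variables (a minimal sketch; the paths assume Hadoop was extracted to the ubuntu user's home directory and OpenJDK 8 is installed on amd64, so adjust them to your setup):
    export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
    export HADOOP_HOME=/home/ubuntu/hadoop
    export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
    export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin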

Step 12: Now reload the environment variables in the current session, or close and reopen the shell.

  • Hadoop Step 12 Screenshot
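  • Command (re-reads .bashrc in the current shell):
    source ~/.bashrc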

Step 13: Once the Apache Hadoop installation completes, configure it by editing a few configuration files. Make the configurations below on the name node and copy them to all 3 data nodes in the cluster.

  • Hadoop Step 13 Screenshot
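  • Example Command (run from the name node after editing the files; the host names and the Hadoop installation path are assumptions carried over from the earlier steps):
    scp $HADOOP_HOME/etc/hadoop/*.xml ubuntu@datanode1:/home/ubuntu/hadoop/etc/hadoop/
    scp $HADOOP_HOME/etc/hadoop/*.xml ubuntu@datanode2:/home/ubuntu/hadoop/etc/hadoop/
    scp $HADOOP_HOME/etc/hadoop/*.xml ubuntu@datanode3:/home/ubuntu/hadoop/etc/hadoop/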

Step 14: Update core-site.xml

  • Hadoop Step 14 Screenshot
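  • Example core-site.xml (a minimal sketch; the namenode host name is the one added to /etc/hosts, and port 9000 is a common choice):
    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://namenode:9000</value>
      </property>
    </configuration>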

Step 15: Update hdfs-site.xml

  • Hadoop Step 15 Screenshot
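  • Example hdfs-site.xml (a minimal sketch; the storage paths are assumptions that must match the data folder created in Step 18, and replication 3 matches the 3 data nodes):
    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>3</value>
      </property>
      <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///data/nameNode</value>
      </property>
      <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///data/dataNode</value>
      </property>
    </configuration>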

Step 16: Update yarn-site.xml

  • Hadoop Step 16 Screenshot
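  • Example yarn-site.xml (a minimal sketch; the ResourceManager is assumed to run on the name node host):
    <configuration>
      <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>namenode</value>
      </property>
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>
    </configuration>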

Step 17: Update mapred-site.xml

  • Hadoop Step 17 Screenshot
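  • Example mapred-site.xml (a minimal sketch that runs MapReduce jobs on YARN):
    <configuration>
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
      </property>
    </configuration>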

Step 18: Create a data folder and change its ownership to the login user. Here the login user is ubuntu, so the commands and screenshots show ubuntu.

  • Hadoop Step 18 Screenshot
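  • Example Command (the /data path is an assumption matching the hdfs-site.xml sketch above; run this on every node):
    sudo mkdir -p /data
    sudo chown -R ubuntu:ubuntu /data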

Step 19: HDFS needs to be formatted like any classical file system. On the Name Node server (namenode), run the following command:

  • Hadoop Step 19 Screenshot
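  • Command (formats the HDFS metadata directory on the name node):
    hdfs namenode -format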

Step 20: Start HDFS by running the start-dfs.sh script from the Name Node server (namenode).

  • Hadoop Step 20 Screenshot
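  • Command (works as-is when $HADOOP_HOME/sbin is on the PATH, as in the Step 11 sketch):
    start-dfs.sh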

Step 21: Running the jps command on the namenode should list the following:

  • Hadoop Step 21 Screenshot
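  • Expected processes (a typical listing; jps prints a process ID before each name, and ResourceManager also appears if YARN has been started):
    NameNode
    SecondaryNameNode
    Jps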

Step 22: Running the jps command on the data nodes should list the following:

  • Hadoop Step 22 Screenshot
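  • Expected processes (a typical listing; NodeManager also appears if YARN has been started):
    DataNode
    Jps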

Step 23: Verify the installation by accessing the NameNode web interface in a browser (port 9870 is the NameNode UI port in Hadoop 3.x).

  • URL:
    http://192.168.1.100:9870
  • Hadoop Step 23 Screenshot