Description: To install Hadoop in a multi-node cluster, start by installing Java on all nodes and configuring SSH for passwordless login from the name node to the data nodes. Install Hadoop on the master node, configure the core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml files, and copy the configuration to the other nodes. Format the HDFS filesystem and start the Hadoop daemons. Verify the installation using the jps command and check the web interfaces to confirm the cluster is running correctly. This setup establishes a multi-node Hadoop cluster for distributed data processing.
Step 1: First, update the system's package list to ensure access to the latest versions of software packages before starting the installation process.
Command: sudo apt update
Step 2: Install SSH to enable remote communication between nodes in the cluster.
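On Ubuntu, the OpenSSH packages can be installed with:
Command: sudo apt install openssh-server openssh-client -y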
Step 6: Now copy the authorized_keys file to all data nodes in the cluster. This enables the name node to connect to the data nodes without being prompted for a password.
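Assuming the data nodes are reachable as datanode1, datanode2, and datanode3 (placeholder hostnames) and the ~/.ssh directory already exists on each of them, the file can be copied with scp:
Command: scp ~/.ssh/authorized_keys ubuntu@datanode1:~/.ssh/authorized_keys
Repeat the command for datanode2 and datanode3.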
Step 7: Add the IP addresses and hostnames of all nodes to the /etc/hosts file on every node.
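The entries look like the following; the IP addresses and hostnames are placeholders for your own machines:
192.168.1.100 namenode
192.168.1.101 datanode1
192.168.1.102 datanode2
192.168.1.103 datanode3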
Step 8: Install JDK 1.8 on all four nodes.
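On Ubuntu, OpenJDK 8 can be installed with:
Command: sudo apt install openjdk-8-jdk -y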
Step 9: Download the latest Apache Hadoop release using the wget command.
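For example, for release 3.3.6 (the version number is only an example; check the Apache download page for the current release):
Command: wget https://downloads.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz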
Step 10: After the download finishes, extract the archive with the tar command and rename the extracted folder to "hadoop".
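Assuming the 3.3.6 archive from the previous step:
Command: tar -xzf hadoop-3.3.6.tar.gz
Command: mv hadoop-3.3.6 hadoop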
Step 11: Add the Hadoop environment variables to the .bashrc file. Open .bashrc in a text editor such as vi and append the variables below.
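A typical set of entries, assuming Hadoop was extracted to /home/ubuntu/hadoop and OpenJDK 8 sits at the default Ubuntu path (adjust both to your setup):
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64   # assumed OpenJDK 8 path
export HADOOP_HOME=/home/ubuntu/hadoop               # assumed extraction path
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin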
Step 12: Now reload the environment variables into the current session, or close and reopen the shell.
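Command: source ~/.bashrc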
Step 13: Once the Apache Hadoop installation completes, configure it by editing its configuration files. Make the changes below on the name node and then copy the updated files to all three data nodes in the cluster.
Step 14: Update core-site.xml
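A minimal core-site.xml sketch, assuming the name node is reachable by the hostname namenode and HDFS uses the common default port 9000:
<configuration>
  <property>
    <!-- hostname and port are examples; match your own cluster -->
    <name>fs.defaultFS</name>
    <value>hdfs://namenode:9000</value>
  </property>
</configuration>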
Step 15: Update hdfs-site.xml
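A minimal hdfs-site.xml sketch, assuming the /data directory that is created in Step 18 and a replication factor of 3:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <!-- storage paths are assumptions; see Step 18 -->
    <name>dfs.namenode.name.dir</name>
    <value>file:///data/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///data/datanode</value>
  </property>
</configuration>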
Step 16: Update yarn-site.xml
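A minimal yarn-site.xml sketch, again assuming the resource manager runs on the namenode host:
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>namenode</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>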
Step 17: Update mapred-site.xml
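A minimal mapred-site.xml sketch that tells MapReduce to run on YARN:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>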
Step 18: Create a data folder and change its ownership to the login user. I'm logged in as the ubuntu user, so the examples use ubuntu.
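Assuming the /data path referenced in hdfs-site.xml above:
Command: sudo mkdir -p /data/namenode /data/datanode
Command: sudo chown -R ubuntu:ubuntu /data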
Step 19: HDFS needs to be formatted like any classical file system. On the name node server (namenode), run the following command:
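Command: hdfs namenode -format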
Step 20: Start HDFS by running the start-dfs.sh script from the name node server (namenode).
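With $HADOOP_HOME/sbin on the PATH (see Step 11):
Command: start-dfs.sh
The YARN daemons, if needed, are started separately with start-yarn.sh.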
Step 21: Running the jps command on the name node should list the following:
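The process IDs will differ; with HDFS started, the output typically looks like:
14321 NameNode
14567 SecondaryNameNode
14892 Jps
(ResourceManager also appears here if YARN has been started.)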
Step 22: Running the jps command on the data nodes should list the following:
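Again, the process IDs will differ; a data node typically shows:
15010 DataNode
15233 Jps
(NodeManager also appears here if YARN has been started.)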
Step 23: Verify the installation by accessing the web interface.
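For Hadoop 3.x, the NameNode web UI is typically available at http://namenode:9870 and the YARN ResourceManager UI at http://namenode:8088; Hadoop 2.x uses port 50070 for the NameNode UI. The hostname and ports depend on your configuration and Hadoop version.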