Installation and configuration of Apache Spark on a standalone cluster

  • Step 1: Install the following software before installing Spark

    Java JDK 1.8
    Hadoop 1.2.1
    Scala-2.10

  • Step 2: Java Installation

    $sudo apt-get update
    $sudo apt-get install oracle-java8-installer
    $sudo update-alternatives --config java

    Verify the Java installation using the following command

    $java -version
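
    The per-step verification above can be wrapped in a small helper; a minimal sketch, where check_cmd is a hypothetical name and not part of any installer:

```shell
# Hypothetical helper: report whether a required tool is on PATH,
# so each installation step can be confirmed before moving on.
check_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: installed"
    else
        echo "$1: missing"
    fi
}

check_cmd java    # should report "java: installed" once Step 2 has completed
check_cmd scala   # will report "scala: missing" until Step 4 is done
```

    The same helper can be reused after the Hadoop, Scala, and Spark steps to confirm each tool is on the PATH.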

  • Step 3: Hadoop Installation

    Install Hadoop using the guide at the following link

    http://slogix.in/how-to-install-hadoop-in-standalone-machine

  • Step 4: Scala Installation

    Download Scala from the following link

    http://www.scala-lang.org/download/

    Set the Scala environment variable as follows
    Modify .bashrc

    $vi /home/<username>/.bashrc

    Add the following lines to .bashrc file

    export SCALA_HOME=/home/user/scala-2.10

    Give the following command in Terminal

    $source ~/.bashrc

    Verify the Scala installation using the following command

    $scala -version
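
    Instead of editing .bashrc by hand, the export line can be appended from the shell; a minimal sketch, assuming a hypothetical helper named append_once (not a standard tool) and using a temporary file as a stand-in for the real .bashrc:

```shell
# append_once <file> <line>: append the line only if it is not already
# present, so re-running the installation step never duplicates exports.
append_once() {
    grep -qxF "$2" "$1" 2>/dev/null || echo "$2" >> "$1"
}

rc=$(mktemp)    # stand-in for /home/<username>/.bashrc
append_once "$rc" 'export SCALA_HOME=/home/user/scala-2.10'
append_once "$rc" 'export SCALA_HOME=/home/user/scala-2.10'   # second call is a no-op
cat "$rc"       # the export line appears exactly once
```

    The grep -qxF test matches the whole line literally, which is why repeated runs leave the file unchanged.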

  • Step 5: Spark Installation
    (i) Download Spark from the following link

    https://spark.apache.org/downloads.html

    (ii) Set the Spark environment variables as follows
    Modify .bashrc

    $vi /home/<username>/.bashrc


    Add the following lines to .bashrc file

    export SPARK_HOME=/home/user/spark-1.6.2

    export PATH=$HADOOP_HOME/bin:$SPARK_HOME/bin:$SCALA_HOME/bin:$PATH

    Give the following command in Terminal

    $source ~/.bashrc
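
    After sourcing .bashrc, you can confirm that the variables actually resolved; a minimal sketch, assuming a hypothetical helper named check_var:

```shell
# check_var <name>: print the variable's value, or note that it is unset.
check_var() {
    eval val=\"\$$1\"
    if [ -n "$val" ]; then
        echo "$1=$val"
    else
        echo "$1 is unset"
    fi
}

check_var SPARK_HOME
check_var SCALA_HOME
```

    If either variable prints as unset, re-check the export lines in .bashrc and source it again.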

    (iii) Configuration of Spark directory

    $ cd /home/user/spark-1.6.2/conf

    Rename spark-env.sh.template to spark-env.sh and add the following lines

    export SCALA_HOME=/home/user/scala-2.10

    export SPARK_WORKER_MEMORY=1g

    export SPARK_WORKER_INSTANCES=2

    export SPARK_WORKER_DIR=/home/user/work

    Rename slaves.template to slaves and edit it as follows

    $vi slaves

    <Master IP address>

    Rename spark-defaults.conf.template to spark-defaults.conf and edit it as follows

    $vi spark-defaults.conf

    spark.master spark://<Master IP address>:7077
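
    The conf-directory edits above can be scripted; a minimal sketch, assuming a hypothetical helper named write_conf and demonstrated against a temporary directory rather than the real conf directory (in a real conf directory, copy the .template files first as described above):

```shell
# write_conf <conf-dir> <master-ip>: write the slaves file, the
# spark-defaults.conf entry, and the spark-env.sh worker settings
# described in step 5 (iii).
write_conf() {
    conf="$1"; ip="$2"
    echo "$ip" > "$conf/slaves"
    echo "spark.master spark://$ip:7077" > "$conf/spark-defaults.conf"
    cat > "$conf/spark-env.sh" <<EOF
export SCALA_HOME=/home/user/scala-2.10
export SPARK_WORKER_MEMORY=1g
export SPARK_WORKER_INSTANCES=2
export SPARK_WORKER_DIR=/home/user/work
EOF
}

d=$(mktemp -d)                 # stand-in for /home/user/spark-1.6.2/conf
write_conf "$d" 192.168.1.10   # 192.168.1.10 is an example master IP
cat "$d/spark-defaults.conf"
```

    Keeping the configuration in a script makes it easy to reapply the same settings after a Spark upgrade.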

  • Step 6: Execute the following commands from the Spark sbin directory

    $cd /home/user/spark-1.6.2/sbin
    $./start-master.sh
    $./start-slaves.sh

  • Step 7: Verify the running state of Spark daemons
    $jps

    3590 Worker
    3273 Master
    2842 Worker
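
    The daemon check can be automated by counting the Master and Worker entries in the jps output; a minimal sketch using sample output (on a live cluster, replace the sample text with the actual output of jps):

```shell
# Sample jps output; the PIDs will differ on your machine.
jps_output='3590 Worker
3273 Master
2842 Worker'

masters=$(printf '%s\n' "$jps_output" | grep -c ' Master$')
workers=$(printf '%s\n' "$jps_output" | grep -c ' Worker$')
echo "masters=$masters workers=$workers"   # masters=1 workers=2
```

    With SPARK_WORKER_INSTANCES=2 as configured above, a healthy single-node cluster shows one Master and two Workers.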

  • Step 8: Verify the Spark installation by executing the following commands

    $cd /home/user/spark-1.6.2/bin
    $./spark-shell
    scala>