Spark 2.1.0 setup on YARN environment, along with Zeppelin notebook
by Hao WU
Summary
1. Introduction
2. Architecture
3. Spark Setup
4. Zeppelin Setup
1. Introduction
This post details how to set up a Spark environment on a YARN computing cluster, together with an Apache Zeppelin notebook.
2. Architecture
The architecture is described in my previous post; please refer to it to set up YARN first.
3. Spark Setup
P.S. We use hadoop as our login user.
3.1. Download and untar Spark bin package
wget http://d3kbcqa49mib13.cloudfront.net/spark-2.1.0-bin-without-hadoop.tgz
tar -xvf spark-2.1.0-bin-without-hadoop.tgz
rm spark-2.1.0-bin-without-hadoop.tgz
mv spark-2.1.0-bin-without-hadoop ~/spark-2.1.0
3.2. Add environment variables by appending the following to ~/.bashrc
export SPARK_MASTER_HOST=master
export SPARK_HOME=/home/hadoop/spark-2.1.0
export SPARK_LOCAL_DIRS=$SPARK_HOME/storage
export SPARK_DRIVER_MEMORY=1G
export PATH=$PATH:$SPARK_HOME/bin
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
Run source ~/.bashrc to make the environment variables take effect.
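A quick way to confirm that the variables really are visible in the current shell is a small check like the one below. This is only a sketch: the function name check_spark_env is my own and is not part of Spark.

```shell
# Hypothetical helper (not part of Spark): report which of the
# variables from step 3.2 are missing or empty in the current shell.
check_spark_env() {
  missing=0
  for name in SPARK_MASTER_HOST SPARK_HOME SPARK_LOCAL_DIRS \
              SPARK_DRIVER_MEMORY SPARK_DIST_CLASSPATH; do
    # Indirect lookup via eval keeps this portable to plain sh.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name"
      missing=1
    fi
  done
  return $missing
}

check_spark_env || echo "fix ~/.bashrc and source it again"
```

An empty SPARK_DIST_CLASSPATH usually means the hadoop command was not on the PATH when ~/.bashrc was sourced.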
3.3. List the slave nodes in conf/slaves
echo slave1 > $SPARK_HOME/conf/slaves
echo slave2 >> $SPARK_HOME/conf/slaves
3.4. Copy to slave nodes
scp ~/.bashrc slave1:~/
scp ~/.bashrc slave2:~/
scp -r ~/spark-2.1.0 slave1:~/
scp -r ~/spark-2.1.0 slave2:~/
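With more than a couple of slaves, the copy step can be driven from conf/slaves instead of being typed out per host. The sketch below assumes the same paths as above; the helper names run_or_print and distribute_spark and the DRY_RUN variable are hypothetical, introduced only for illustration. By default it just prints the scp commands so they can be reviewed first.

```shell
# Print the command in dry-run mode (the default), execute it otherwise.
run_or_print() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "$@"
  else
    "$@" < /dev/null   # keep the command from consuming the loop's stdin
  fi
}

# Copy ~/.bashrc and the Spark directory to every host in a slaves file.
distribute_spark() {
  slaves_file=$1
  while read -r host; do
    # Skip blank lines and comment lines.
    case "$host" in ''|'#'*) continue ;; esac
    run_or_print scp "$HOME/.bashrc" "${host}:~/"
    run_or_print scp -r "$HOME/spark-2.1.0" "${host}:~/"
  done < "$slaves_file"
}
```

Calling distribute_spark "$SPARK_HOME/conf/slaves" prints two scp commands per host; set DRY_RUN=0 beforehand to actually run them.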
3.5. Check if Spark works well in standalone cluster mode
~/spark-2.1.0/sbin/start-all.sh
Check that the web UI at master:8080 is up and displays correctly, then shut the cluster down, because we will use YARN as our cluster manager:
~/spark-2.1.0/sbin/stop-all.sh
3.6. The Spark commands below can now run on YARN when given --master yarn (HADOOP_CONF_DIR or YARN_CONF_DIR must point to the Hadoop configuration directory)
spark-submit
spark-shell
pyspark
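As a smoke test, the SparkPi example that ships with Spark can be submitted to YARN. The sketch below makes a few assumptions: the wrapper name submit_pi_on_yarn is hypothetical, the examples jar is expected under $SPARK_HOME/examples/jars (matched with a glob rather than a hard-coded file name), and client deploy mode is chosen so the result prints in the terminal.

```shell
# Sketch: submit the bundled SparkPi example to YARN in client mode.
# spark-submit needs HADOOP_CONF_DIR or YARN_CONF_DIR to locate the
# ResourceManager, so refuse to continue when neither is set.
submit_pi_on_yarn() {
  spark_home=${SPARK_HOME:-/home/hadoop/spark-2.1.0}
  if [ -z "${HADOOP_CONF_DIR:-}${YARN_CONF_DIR:-}" ]; then
    echo "set HADOOP_CONF_DIR or YARN_CONF_DIR first" >&2
    return 1
  fi
  # The trailing 10 is the number of partitions SparkPi uses.
  "$spark_home/bin/spark-submit" \
    --master yarn \
    --deploy-mode client \
    --class org.apache.spark.examples.SparkPi \
    "$spark_home"/examples/jars/spark-examples_*.jar \
    10
}
```

If the job is accepted, it should also show up as an application in the YARN ResourceManager web UI. The same --master yarn option applies to spark-shell and pyspark.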
4. Zeppelin Setup
P.S. We’ll continue to use hadoop as our login user.
4.1. Download and untar Zeppelin bin package
wget http://apache.mirrors.ionfish.org/zeppelin/zeppelin-0.7.0/zeppelin-0.7.0-bin-all.tgz
tar -xvf zeppelin-0.7.0-bin-all.tgz
rm zeppelin-0.7.0-bin-all.tgz
mv zeppelin-0.7.0-bin-all ~/zeppelin-0.7.0
4.2. Add environment variables by appending the following to ~/.bashrc
export ZEPPELIN_HOME=/home/hadoop/zeppelin-0.7.0
export PATH=$PATH:$ZEPPELIN_HOME/bin
Run source ~/.bashrc to make the environment variables take effect.
4.3. Check if Zeppelin works well
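Launch the Zeppelin daemon with zeppelin-daemon.sh start
Check that the web UI at master:8080 is up and working. This is Zeppelin's default port, the same one the Spark standalone master UI used, so make sure the standalone cluster from step 3.5 is stopped.
Use the commands below to operate the Zeppelin notebook: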
zeppelin-daemon.sh start/restart/stop
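To have Zeppelin's Spark interpreter use the Spark installation from section 3 and run on YARN, a few lines can be added to $ZEPPELIN_HOME/conf/zeppelin-env.sh (created by copying conf/zeppelin-env.sh.template). Treat the fragment below as a sketch rather than a verified configuration: the HADOOP_CONF_DIR path is an assumption about where the previous post installed Hadoop, so adjust it to your layout.

```shell
# Candidate additions to $ZEPPELIN_HOME/conf/zeppelin-env.sh (a sketch).

# Point Zeppelin at the Spark installation from section 3.
export SPARK_HOME=/home/hadoop/spark-2.1.0

# Assumption: HADOOP_HOME was set during the YARN setup; replace the
# fallback placeholder with the real Hadoop directory otherwise.
export HADOOP_CONF_DIR="${HADOOP_HOME:-/path/to/hadoop}/etc/hadoop"

# Ask the Spark interpreter to run on YARN in client mode.
export MASTER=yarn-client
```

Restart the daemon with zeppelin-daemon.sh restart after changing this file. The master can also be set from the Interpreter menu in the Zeppelin web UI.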