Spark Cluster Installation
Lab name: detailed steps for installing a Spark cluster
Installing a Spark Cluster: Task Introduction
We will build a Spark cluster on three Linux virtual machines and start its services.
Knowledge points involved
- Basic Linux commands
- Spark master and worker nodes
Cluster Installation
Completing this lab requires the following background knowledge:
- The archive extraction command
tar -zxvf XX.tar.gz -C <dest-dir>
- Using the vi editor
vi <file> opens a file; consult the vi documentation if you want to learn more
- Remote copy
scp -r srcfile user@hostName:destpath
- Disabling the firewall
service iptables stop
- Installing the JDK on Linux
- Passwordless SSH login on Linux (a setup sketch follows this list)
- Basic knowledge of Spark clusters
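For reference, here is a minimal sketch of setting up passwordless login from the master to the workers, assuming the hostnames used later in this lab (linux01, linux02, linux03) and the root account:
# On linux01: generate a key pair (accept the defaults), then push the public key to each worker
[root@linux01 ~]# ssh-keygen -t rsa
[root@linux01 ~]# ssh-copy-id root@linux02
[root@linux01 ~]# ssh-copy-id root@linux03
# Verify: this should open a shell on linux02 without prompting for a password
[root@linux01 ~]# ssh linux02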
Pre-installation Preparation
- Prepare three Linux virtual machines
- Configure IPs and hostnames; the table below shows the layout used in this lab (an /etc/hosts sketch follows this list)
ip | host | Spark role
---|---|---
192.168.1.111 | linux01 | spark-master
192.168.1.112 | linux02 | spark-worker
192.168.1.113 | linux03 | spark-worker
- Configure passwordless login: linux01 must be able to log in to linux02 and linux03 without a password
- Install JDK 8
- Prepare the spark-2.2.3-bin-hadoop2.7.tgz installation package
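For reference, the IP-to-hostname mappings above would normally be added to /etc/hosts on all three machines; a minimal sketch matching the table:
# /etc/hosts (same entries on linux01, linux02 and linux03)
192.168.1.111 linux01
192.168.1.112 linux02
192.168.1.113 linux03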
Now let's begin the installation.
Spark Cluster Installation Lab
- Upload the spark-2.2.3-bin-hadoop2.7.tgz installation file to /root/srcclauster
- On the master node, create a directory named apps to use as the installation directory
[root@linux01 ~]# mkdir /root/apps
- Extract Spark
[root@linux01 ~]# tar -zxvf /root/srcclauster/spark-2.2.3-bin-hadoop2.7.tgz -C /root/apps
- Configure Spark
Go to Spark's conf directory:
[root@linux01 ~]# cd apps/spark-2.2.3-bin-hadoop2.7/conf/
[root@linux01 conf]#
[root@linux01 conf]# ll
total 44
-rw-r--r--. 1 501 games 996 Jan 8 2019 docker.properties.template
-rw-r--r--. 1 501 games 1105 Jan 8 2019 fairscheduler.xml.template
-rw-r--r--. 1 501 games 2025 Jan 8 2019 log4j.properties.template
-rw-r--r--. 1 501 games 7313 Jan 8 2019 metrics.properties.template
-rw-r--r--. 1 root root 17 May 21 19:48 slaves
-rw-r--r--. 1 501 games 865 Jan 8 2019 slaves.template
-rw-r--r--. 1 501 games 1292 Jan 8 2019 spark-defaults.conf.template
-rwxr-xr-x. 1 501 games 3764 Jan 8 2019 spark-env.sh.template
[root@linux01 conf]#
Rename spark-env.sh.template to spark-env.sh:
[root@linux01 conf]# mv spark-env.sh.template spark-env.sh
[root@linux01 conf]# ll
total 44
-rw-r--r--. 1 501 games 996 Jan 8 2019 docker.properties.template
-rw-r--r--. 1 501 games 1105 Jan 8 2019 fairscheduler.xml.template
-rw-r--r--. 1 501 games 2025 Jan 8 2019 log4j.properties.template
-rw-r--r--. 1 501 games 7313 Jan 8 2019 metrics.properties.template
-rw-r--r--. 1 root root 17 May 21 19:48 slaves
-rw-r--r--. 1 501 games 865 Jan 8 2019 slaves.template
-rw-r--r--. 1 501 games 1292 Jan 8 2019 spark-defaults.conf.template
-rwxr-xr-x. 1 501 games 3764 Jan 8 2019 spark-env.sh
[root@linux01 conf]#
Edit spark-env.sh:
[root@linux01 conf]# vi spark-env.sh
#!/usr/bin/env bash
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# This file is sourced when running various Spark programs.
# Copy it as spark-env.sh and edit that to configure Spark for your site.
# Options read when launching programs locally with
# ./bin/run-example or ./bin/spark-submit
# - HADOOP_CONF_DIR, to point Spark towards Hadoop configuration files
# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
# - SPARK_PUBLIC_DNS, to set the public dns name of the driver program
# - SPARK_CLASSPATH, default classpath entries to append
# Options read by executors and drivers running inside the cluster
# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
# - SPARK_PUBLIC_DNS, to set the public DNS name of the driver program
# - SPARK_CLASSPATH, default classpath entries to append
# - SPARK_LOCAL_DIRS, storage directories to use on this node for shuffle and RDD data
# - MESOS_NATIVE_JAVA_LIBRARY, to point to your libmesos.so if you use Mesos
# Options read in YARN client mode
# - HADOOP_CONF_DIR, to point Spark towards Hadoop configuration files
# - SPARK_EXECUTOR_INSTANCES, Number of executors to start (Default: 2)
# - SPARK_EXECUTOR_CORES, Number of cores for the executors (Default: 1).
# - SPARK_EXECUTOR_MEMORY, Memory per Executor (e.g. 1000M, 2G) (Default: 1G)
# - SPARK_DRIVER_MEMORY, Memory for Driver (e.g. 1000M, 2G) (Default: 1G)
# - SPARK_YARN_APP_NAME, The name of your application (Default: Spark)
# - SPARK_YARN_QUEUE, The hadoop queue to use for allocation requests (Default: 'default')
# - SPARK_YARN_DIST_FILES, Comma separated list of files to be distributed with the job.
# - SPARK_YARN_DIST_ARCHIVES, Comma separated list of archives to be distributed with the job.
# Options for the daemons used in the standalone deploy mode
# - SPARK_MASTER_IP, to bind the master to a different IP address or hostname
# - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports for the master
# - SPARK_MASTER_OPTS, to set config properties only for the master (e.g. "-Dx=y")
# - SPARK_WORKER_CORES, to set the number of cores to use on this machine
# - SPARK_WORKER_MEMORY, to set how much total memory workers have to give executors (e.g. 1000m, 2g)
# - SPARK_WORKER_PORT / SPARK_WORKER_WEBUI_PORT, to use non-default ports for the worker
# - SPARK_WORKER_INSTANCES, to set the number of worker processes per node
# - SPARK_WORKER_DIR, to set the working directory of worker processes
# - SPARK_WORKER_OPTS, to set config properties only for the worker (e.g. "-Dx=y")
# - SPARK_DAEMON_MEMORY, to allocate to the master, worker and history server themselves (default: 1g).
# - SPARK_HISTORY_OPTS, to set config properties only for the history server (e.g. "-Dx=y")
# - SPARK_SHUFFLE_OPTS, to set config properties only for the external shuffle service (e.g. "-Dx=y")
# - SPARK_DAEMON_JAVA_OPTS, to set config properties for all daemons (e.g. "-Dx=y")
# - SPARK_PUBLIC_DNS, to set the public dns name of the master or workers
# Generic options for the daemons used in the standalone deploy mode
# - SPARK_CONF_DIR Alternate conf dir. (Default: ${SPARK_HOME}/conf)
# - SPARK_LOG_DIR Where log files are stored. (Default: ${SPARK_HOME}/logs)
# - SPARK_PID_DIR Where the pid file is stored. (Default: /tmp)
# - SPARK_IDENT_STRING A string representing this instance of spark. (Default: $USER)
# - SPARK_NICENESS The scheduling priority for daemons. (Default: 0)
export JAVA_HOME=/root/apps/jdk1.8.0_101
export HADOOP_HOME=/root/apps/hadoop-2.7.7
export HADOOP_CONF_DIR=/root/apps/hadoop-2.7.7/etc/hadoop
export SPARK_MASTER_IP=linux01
export SPARK_MASTER_PORT=7077
export SPARK_EXECUTOR_MEMORY=512m
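Note that start-all.sh reads the list of worker hosts from the slaves file in the same conf directory (visible in the listing above). For this layout it should contain one worker hostname per line, for example:
[root@linux01 conf]# cat slaves
linux02
linux03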
- Configure Spark's environment variables
[root@linux01 ~]# vi /etc/profile
export SPARK_HOME=/root/apps/spark-2.2.3-bin-hadoop2.7
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
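Reload the profile so the new PATH takes effect in the current shell; spark-submit --version should then print the Spark version:
[root@linux01 ~]# source /etc/profile
[root@linux01 ~]# spark-submit --version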
- Copy the installation to the other nodes with scp
[root@linux01 ~]# scp -r /root/apps/ root@linux02:/root
[root@linux01 ~]# scp -r /root/apps/ root@linux03:/root
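As a quick check that the copy reached the workers (this uses the passwordless login configured earlier):
[root@linux01 ~]# ssh linux02 ls /root/apps
[root@linux01 ~]# ssh linux03 ls /root/apps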
- Start the Spark cluster (from Spark's sbin directory)
[root@linux01 sbin]# ./start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /root/apps/spark-2.2.3-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.master.Master-1-linux01.out
linux02: starting org.apache.spark.deploy.worker.Worker, logging to /root/apps/spark-2.2.3-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-linux02.out
linux03: starting org.apache.spark.deploy.worker.Worker, logging to /root/apps/spark-2.2.3-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-linux03.out
linux03: failed to launch: nice -n 0 /root/apps/spark-2.2.3-bin-hadoop2.7/bin/spark-class org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://linux01:7077
linux03: full log in /root/apps/spark-2.2.3-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-linux03.out
- Check whether the startup succeeded. If a node reports "failed to launch" as shown above, inspect the full log file it points to; the worker may in fact have come up, which the jps checks below will confirm.
Run the jps command on the linux01 node:
[root@linux01 sbin]# jps
1649 Jps
1493 Master
[root@linux01 sbin]#
Run the jps command on the linux02 node:
[root@linux02 sbin]# jps
1649 Jps
1493 Worker
[root@linux02 sbin]#
Run the jps command on the linux03 node:
[root@linux03 sbin]# jps
1649 Jps
1493 Worker
[root@linux03 sbin]#
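Besides jps, the master's web UI (http://linux01:8080 by default) lists the registered workers. As a final smoke test, you can submit the bundled SparkPi example to the cluster; the examples jar name below assumes the Scala 2.11 build shipped with spark-2.2.3-bin-hadoop2.7:
[root@linux01 ~]# spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://linux01:7077 \
  /root/apps/spark-2.2.3-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.2.3.jar 100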
Lab Summary
Download and upload the Spark installation package, extract it to the installation directory, edit the spark-env.sh and slaves configuration files, and then scp the result to the other nodes.
The cluster is started with the start-all.sh script in the sbin directory.
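To shut the cluster down, use the matching script in the same sbin directory:
[root@linux01 sbin]# ./stop-all.sh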