Deploying a Spark History Server on K8S - Part 2

2023-07-04 17:58:47 server history deploy Spark K8S

1 Overview

Previously, our team ran the Spark 2.2 on K8s fork in production. Deploying on K8S requires at least a Dockerfile. Recently there have been plans to upgrade to the 3.0.0-SNAPSHOT branch, so I am taking this opportunity to write up some notes.

History Server => HS

2 Start

Since 2.3.0, Spark has shipped an official Dockerfile, which you can build yourself to match your production requirements. So here I investigate whether this Dockerfile can directly run an HS process.
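For reference, the image is normally built with the docker-image-tool.sh script shipped in the Spark distribution; the repository name and tag below are placeholders:

# Run from the root of the Spark distribution; this feeds the Dockerfile
# below to docker build. Repository name and tag are placeholders.
./bin/docker-image-tool.sh -r my-repo -t v3.0.0-snapshot build

# Optionally push the built image to your registry.
./bin/docker-image-tool.sh -r my-repo -t v3.0.0-snapshot push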

Here is that Dockerfile (with some comments removed).

FROM openjdk:8-alpine

ARG spark_uid=185

RUN set -ex && \
    apk upgrade --no-cache && \
    ln -s /lib /lib64 && \
    apk add --no-cache bash tini libc6-compat linux-pam krb5 krb5-libs nss && \
    mkdir -p /opt/spark && \
    mkdir -p /opt/spark/examples && \
    mkdir -p /opt/spark/work-dir && \
    touch /opt/spark/RELEASE && \
    rm /bin/sh && \
    ln -sv /bin/bash /bin/sh && \
    echo "auth required pam_wheel.so use_uid" >> /etc/pam.d/su && \
    chgrp root /etc/passwd && chmod ug+rw /etc/passwd

COPY jars /opt/spark/jars
COPY bin /opt/spark/bin
COPY sbin /opt/spark/sbin
COPY kubernetes/dockerfiles/spark/entrypoint.sh /opt/
COPY examples /opt/spark/examples
COPY kubernetes/tests /opt/spark/tests
COPY data /opt/spark/data

ENV SPARK_HOME /opt/spark

WORKDIR /opt/spark/work-dir
RUN chmod g+w /opt/spark/work-dir

ENTRYPOINT [ "/opt/entrypoint.sh" ]

# Specify the User that the actual main process will run as
USER ${spark_uid}
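Note that the Dockerfile itself has no dedicated entry for the HS. In the Spark 3.x entrypoint.sh, commands other than driver and executor are passed through as-is, so one way to smoke-test an HS container locally is the following sketch (image name and log directory are placeholder assumptions, carried over from the build step above):

# Image name and log directory are placeholders; 18080 is the default HS port.
# SPARK_NO_DAEMONIZE keeps start-history-server.sh in the foreground;
# otherwise the script daemonizes and the container exits immediately.
docker run --rm -p 18080:18080 \
  -v /tmp/spark-events:/tmp/spark-events \
  -e SPARK_NO_DAEMONIZE=true \
  -e SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=file:///tmp/spark-events" \
  my-repo/spark:v3.0.0-snapshot \
  /opt/spark/sbin/start-history-server.sh

If this comes up, the HS UI should be reachable at http://localhost:18080; on K8S, the same command and environment variables would go into the container spec.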
