Using a GPU with the python Docker image
Question
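I am using the python:3.7.4-slim-buster Docker image and I cannot change it.
I would like to know how to use my NVIDIA GPU with it.
I normally use tensorflow/tensorflow:1.14.0-gpu-py3, and with a simple --runtime=nvidia in the docker run command everything works fine, but now I have this constraint.
I assumed there is no shortcut for this type of image, so I followed this guide, https://towardsdatascience.com/how-to-properly-use-the-gpu-within-a-docker-container-4c699c78c6d1, and built the Dockerfile it suggests: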
FROM python:3.7.4-slim-buster
RUN apt-get update && apt-get install -y build-essential
RUN apt-get --purge remove -y nvidia*
# Get the install files you used to install CUDA and the NVIDIA drivers on your host
ADD ./Downloads/nvidia_installers /tmp/nvidia
# Install the driver.
RUN /tmp/nvidia/NVIDIA-Linux-x86_64-331.62.run -s -N --no-kernel-module
# For some reason the driver installer leaves temp files behind when run during a
# docker build (I don't have an explanation why), and the CUDA installer fails if
# they are still there, so we delete them.
RUN rm -rf /tmp/selfgz7
# CUDA driver installer.
RUN /tmp/nvidia/cuda-linux64-rel-6.0.37-18176142.run -noprompt
# CUDA samples; comment this out if you don't want them.
RUN /tmp/nvidia/cuda-samples-linux-6.0.37-18176142.run -noprompt -cudaprefix=/usr/local/cuda-6.0
# Add the CUDA library to your PATH
RUN export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64
# Update the ld.so.conf.d directory
RUN touch /etc/ld.so.conf.d/cuda.conf
# Delete installer files.
RUN rm -rf /temp/*
But it raises an error:
ADD failed: stat /var/lib/docker/tmp/docker-builder080208872/Downloads/nvidia_installers: no such file or directory
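What can I change so that the Docker image can easily see my GPU?
Solution
The TensorFlow images are split into several "partial" Dockerfiles. One of them contains all the dependencies TensorFlow needs to run on a GPU. Using it, you can easily create a custom image; you only need to change the default python to whatever version you need. To me this seems much easier than bringing the NVIDIA stuff into a Debian image (which, AFAIK, is not officially supported for CUDA and/or cuDNN).
Here is the Dockerfile: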
# TensorFlow image base written by TensorFlow authors.
# Source: https://github.com/tensorflow/tensorflow/blob/v2.3.0/tensorflow/tools/dockerfiles/partials/ubuntu/nvidia.partial.Dockerfile
# -------------------------------------------------------------------------
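# UBUNTU_VERSION is used in the FROM line below but is not declared in this partial;
# 18.04 is assumed here, matching the CUDA 10.1 / cuDNN 7 packages installed below.
ARG UBUNTU_VERSION=18.04
ARG ARCH=
ARG CUDA=10.1
FROM nvidia/cuda${ARCH:+-$ARCH}:${CUDA}-base-ubuntu${UBUNTU_VERSION} as base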
# ARCH and CUDA are specified again because the FROM directive resets ARGs
# (but their default value is retained if set previously)
ARG ARCH
ARG CUDA
ARG CUDNN=7.6.4.38-1
ARG CUDNN_MAJOR_VERSION=7
ARG LIB_DIR_PREFIX=x86_64
ARG LIBNVINFER=6.0.1-1
ARG LIBNVINFER_MAJOR_VERSION=6
# Needed for string substitution
SHELL ["/bin/bash", "-c"]
# Pick up some TF dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential \
        cuda-command-line-tools-${CUDA/./-} \
        # There appears to be a regression in libcublas10=10.2.2.89-1 which
        # prevents cublas from initializing in TF. See
        # https://github.com/tensorflow/tensorflow/issues/9489#issuecomment-562394257
        libcublas10=10.2.1.243-1 \
        cuda-nvrtc-${CUDA/./-} \
        cuda-cufft-${CUDA/./-} \
        cuda-curand-${CUDA/./-} \
        cuda-cusolver-${CUDA/./-} \
        cuda-cusparse-${CUDA/./-} \
        curl \
        libcudnn7=${CUDNN}+cuda${CUDA} \
        libfreetype6-dev \
        libhdf5-serial-dev \
        libzmq3-dev \
        pkg-config \
        software-properties-common \
        unzip
# Install TensorRT if not building for PowerPC
RUN [[ "${ARCH}" = "ppc64le" ]] || { apt-get update && \
        apt-get install -y --no-install-recommends libnvinfer${LIBNVINFER_MAJOR_VERSION}=${LIBNVINFER}+cuda${CUDA} \
        libnvinfer-plugin${LIBNVINFER_MAJOR_VERSION}=${LIBNVINFER}+cuda${CUDA} \
        && apt-get clean \
        && rm -rf /var/lib/apt/lists/*; }
# For CUDA profiling, TensorFlow requires CUPTI.
ENV LD_LIBRARY_PATH /usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/lib64:$LD_LIBRARY_PATH
# Link the libcuda stub to the location where tensorflow is searching for it and reconfigure
# dynamic linker run-time bindings
RUN ln -s /usr/local/cuda/lib64/stubs/libcuda.so /usr/local/cuda/lib64/stubs/libcuda.so.1 \
    && echo "/usr/local/cuda/lib64/stubs" > /etc/ld.so.conf.d/z-cuda-stubs.conf \
    && ldconfig
# -------------------------------------------------------------------------
#
# Custom part
FROM base
ARG PYTHON_VERSION=3.7
RUN apt-get update && apt-get install -y --no-install-recommends --no-install-suggests \
        python${PYTHON_VERSION} \
        python3-pip \
        python${PYTHON_VERSION}-dev \
    # Change default python
    && cd /usr/bin \
    && ln -sf python${PYTHON_VERSION} python3 \
    && ln -sf python${PYTHON_VERSION}m python3m \
    && ln -sf python${PYTHON_VERSION}-config python3-config \
    && ln -sf python${PYTHON_VERSION}m-config python3m-config \
    && ln -sf python3 /usr/bin/python \
    # Update pip and add common packages
    && python -m pip install --upgrade pip \
    && python -m pip install --upgrade \
        setuptools \
        wheel \
        six \
    # Cleanup
    && apt-get clean \
    && rm -rf $HOME/.cache/pip
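You can start from here: change the python version to the one you need (as long as it is available in the Ubuntu repositories), add packages, code, etc.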
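Once the image is built and started with --runtime=nvidia as in the question, a quick way to confirm that the container really sees the GPU is to ask nvidia-smi from Python. The script below is only a minimal sketch, not part of the TensorFlow Dockerfile: the function name visible_gpus is made up for illustration, and it assumes the NVIDIA runtime exposes the nvidia-smi tool inside the container (the nvidia/cuda base images are normally configured for that). It uses only the standard library, so it does not need TensorFlow to be installed.

```python
import shutil
import subprocess


def visible_gpus(smi_binary="nvidia-smi"):
    """Return the GPU names reported by nvidia-smi, or [] if none are visible.

    `visible_gpus` and `smi_binary` are illustrative names, not part of any
    NVIDIA or TensorFlow API.
    """
    smi_path = shutil.which(smi_binary)
    if smi_path is None:
        # The tool is not on PATH, e.g. the container was started
        # without --runtime=nvidia.
        return []

    result = subprocess.run(
        [smi_path, "--query-gpu=name", "--format=csv,noheader"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text (works on Python 3.7)
    )
    if result.returncode != 0:
        # nvidia-smi exists but could not talk to the driver.
        return []

    # One GPU name per non-empty output line.
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus:
        print("Visible GPUs: " + ", ".join(gpus))
    else:
        print("No GPU visible inside this container")
```

If the script reports no GPU, the usual suspects are a missing --runtime=nvidia flag on docker run or a host without the NVIDIA container runtime installed.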