How do I create a Docker image to run both Python and R?

2022-01-14 00:00:00 python docker r dockerfile devops

Question

I want to containerise a code pipeline that was developed mainly in Python but depends on a model trained in R. Both codebases also have their own requirements and packages. How can I create a Docker image from which I can build a container that runs the Python and R code together?

For context, I have R code that runs a model (a random forest), but it needs to be part of a data pipeline built in Python. The Python pipeline first does some processing and generates the model input, then executes the R code with that input, and finally passes the output to the next stage of the Python pipeline.
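
The hand-off described above (Python writes the model input, the R code runs on it, Python reads the result) could be sketched roughly as follows. This is only an illustration under assumptions that are not in the question: the data is exchanged as CSV files, the external command accepts an input path and an output path as its last two arguments, and the names run_model_stage, model_input.csv and model_output.csv are hypothetical.

```python
import csv
import subprocess
import tempfile
from pathlib import Path


def run_model_stage(rows, fieldnames, command):
    """Write `rows` to a CSV file, run `command <input> <output>`, return the output rows.

    `command` is the external program as a list, e.g. ["Rscript", "myscript.R"].
    The CSV exchange format is an assumption made for this sketch.
    """
    with tempfile.TemporaryDirectory() as workdir:
        input_path = Path(workdir) / "model_input.csv"
        output_path = Path(workdir) / "model_output.csv"

        # Stage 1: the Python pipeline writes the model input.
        with input_path.open("w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)

        # Stage 2: the external (R) code runs on that input.
        subprocess.run([*command, str(input_path), str(output_path)], check=True)

        # Stage 3: the Python pipeline picks the model output back up.
        with output_path.open(newline="") as handle:
            return list(csv.DictReader(handle))
```

With this shape, the R script would need to read the two paths itself, for example via commandArgs(trailingOnly = TRUE), and write its predictions to the second path.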

I've created a template for this process: a simple test Python script, test_call_r.py, which imports the subprocess module and calls the R code. I need to put it in a Docker container with the requirements and packages for both Python and R.
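
The contents of test_call_r.py are not shown in the question, so the following is only a guess at its general shape: a minimal sketch of launching Rscript through the subprocess module. The helper name call_r is hypothetical, and the code assumes Python 3.7 or newer (for capture_output) and that Rscript is on the PATH.

```python
import subprocess


def call_r(script_path, *args, rscript="Rscript"):
    """Run an R script in a child process and return what it printed to stdout.

    `rscript` is the interpreter to launch; it is a parameter only so the
    function can be pointed at a different executable.
    """
    command = [rscript, str(script_path), *(str(arg) for arg in args)]
    try:
        completed = subprocess.run(
            command, capture_output=True, text=True, check=True
        )
    except subprocess.CalledProcessError as error:
        # Surface R's own error message instead of a bare exit code.
        raise RuntimeError(
            f"{command[0]} exited with code {error.returncode}: {error.stderr.strip()}"
        ) from error
    return completed.stdout
```

Inside the container this might be used as print(call_r("myscript.R", "model_input.csv")), where the input file name is made up for illustration.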

I have been able to build a Docker image for the Python pipeline on its own, but I cannot get R and its packages installed alongside the Python requirements. I want to rewrite the Dockerfile so that one image provides both.

From the Docker Hub documentation, I can create an image for the Python pipeline using, e.g.,

FROM python:3
WORKDIR /app
COPY requirements.txt /app/
RUN pip install --no-cache-dir -r requirements.txt
COPY . /app
CMD [ "python", "./test_call_r.py" ]

Similarly, from Docker Hub I can use a base R image (or Rocker) to create a container that runs a randomForest model, e.g.,

FROM r-base
WORKDIR /app    
COPY myscripts /app/
RUN Rscript -e "install.packages('randomForest')"
CMD ["Rscript", "myscript.R"] 

What I need, though, is a single image that installs the requirements and packages for both Python and R and lets the Python code run R in a subprocess. How can I do this?


Solution

The Dockerfile I built to run Python and R together with their dependencies is:

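# Pin an Ubuntu release so the package names below stay valid
FROM ubuntu:22.04

# Suppress interactive prompts (e.g. the location prompt during the R install)
ENV DEBIAN_FRONTEND=noninteractive

# Install R, the randomForest R package, and Python 3 with pip
RUN apt-get update && apt-get install -y --no-install-recommends build-essential r-base r-cran-randomforest python3 python3-pip python3-setuptools python3-dev

WORKDIR /app

COPY requirements.txt /app/requirements.txt

# Python dependencies
RUN pip3 install -r requirements.txt

# Additional R packages from CRAN
RUN Rscript -e "install.packages('data.table')"

COPY . /app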
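
Once an image like this is built, a quick check that both runtimes actually ended up on the PATH can save a debugging round trip. The script below is a small sketch for that purpose and is not part of the original answer; the function name missing_tools and the list of tools checked are assumptions.

```python
import shutil


def missing_tools(names):
    """Return the entries of `names` that cannot be found on the PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # Tools the combined image is expected to provide.
    missing = missing_tools(["python3", "pip3", "Rscript"])
    print("missing:", ", ".join(missing) if missing else "none")
```

Run inside the container, it should report "missing: none" if the installation step succeeded. Whether the R packages themselves load can be checked separately, for example with Rscript -e "library(randomForest)".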

The commands to build the image, start the container (named SnakeR here), and execute the code are below. Because docker run -it holds the first terminal open on the container's shell, run docker exec from a second terminal:

docker build -t my_image .
docker run -it --name SnakeR my_image
docker exec SnakeR /bin/sh -c "python3 test_call_r.py"

I treated the image like an Ubuntu OS and built it as follows:

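  • suppress the prompts for choosing a location during the R installation (DEBIAN_FRONTEND=noninteractive);
  • refresh the package index with apt-get update;
  • set the following installation options:
    • -y = answer yes to prompts asking the user to proceed (e.g. memory allocation);
    • --no-install-recommends = install only the required dependencies, not the recommended ones;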

This is replicated from my blog post at https://datascienceunicorn.tumblr.com/post/182297983466/building-a-docker-to-run-python-r
