Is it still necessary to install CUDA before using the conda tensorflow-gpu package?

2022-01-10 00:00:00 python tensorflow conda cuda cudnn

Question

When I install tensorflow-gpu through conda, it gives me the following output:

conda install tensorflow-gpu
Collecting package metadata (current_repodata.json): done
Solving environment: done


## Package Plan ##

  environment location: /home/psychotechnopath/anaconda3/envs/DeepLearning3.6

  added / updated specs:
    - tensorflow-gpu


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    _tflow_select-2.1.0        |              gpu           2 KB
    cudatoolkit-10.1.243       |       h6bb024c_0       347.4 MB
    cudnn-7.6.5                |       cuda10.1_0       179.9 MB
    cupti-10.1.168             |                0         1.4 MB
    tensorflow-2.1.0           |gpu_py36h2e5cdaa_0           4 KB
    tensorflow-base-2.1.0      |gpu_py36h6c5654b_0       155.9 MB
    tensorflow-gpu-2.1.0       |       h0d30ee6_0           3 KB
    ------------------------------------------------------------
                                           Total:       684.7 MB

The following NEW packages will be INSTALLED:

  cudatoolkit        pkgs/main/linux-64::cudatoolkit-10.1.243-h6bb024c_0
  cudnn              pkgs/main/linux-64::cudnn-7.6.5-cuda10.1_0
  cupti              pkgs/main/linux-64::cupti-10.1.168-0
  tensorflow-gpu     pkgs/main/linux-64::tensorflow-gpu-2.1.0-h0d30ee6_0

I see that installing tensorflow-gpu automatically triggers the installation of cudatoolkit and cudnn. Does this mean that I no longer need to install CUDA and CuDNN manually to be able to use tensorflow-gpu? Where does this conda installation of CUDA reside?

I first installed CUDA and CuDNN the old way (e.g. by following these installation instructions: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html).

Then I noticed that tensorflow-gpu was also installing CUDA and CuDNN.

Do I now have two versions of CUDA/CuDNN installed, and how do I check this?


Answer

Do I now have two versions of CUDA installed, and how do I check this?

No.

conda installs the bare minimum of redistributable library components required to support the CUDA-accelerated packages it offers. The package name cudatoolkit is a complete misnomer; it is nothing of the sort. Even though it is now greatly expanded in scope from what it used to be (literally 5 files; I think at some point they must have gotten a licensing deal from NVIDIA, because some of this wasn't/isn't on the official "freely redistributable" list, AFAIK), it is still basically just a handful of libraries.

You can check this yourself:

cat /opt/miniconda3/conda-meta/cudatoolkit-10.1.168-0.json 
{
  "build": "0",
  "build_number": 0,
  "channel": "https://repo.anaconda.com/pkgs/main/linux-64",
  "constrains": [],
  "depends": [],
  "extracted_package_dir": "/opt/miniconda3/pkgs/cudatoolkit-10.1.168-0",
  "features": "",
  "files": [
    "lib/cudatoolkit_config.yaml",
    "lib/libcublas.so",
    "lib/libcublas.so.10",
    "lib/libcublas.so.10.2.0.168",
    "lib/libcublasLt.so",
    "lib/libcublasLt.so.10",
    "lib/libcublasLt.so.10.2.0.168",
    "lib/libcudart.so",
    "lib/libcudart.so.10.1",
    "lib/libcudart.so.10.1.168",
    "lib/libcufft.so",
    "lib/libcufft.so.10",
    "lib/libcufft.so.10.1.168",
    "lib/libcufftw.so",
    "lib/libcufftw.so.10",
    "lib/libcufftw.so.10.1.168",
    "lib/libcurand.so",
    "lib/libcurand.so.10",
    "lib/libcurand.so.10.1.168",
    "lib/libcusolver.so",
    "lib/libcusolver.so.10",
    "lib/libcusolver.so.10.1.168",
    "lib/libcusparse.so",
    "lib/libcusparse.so.10",
    "lib/libcusparse.so.10.1.168",
    "lib/libdevice.10.bc",
    "lib/libnppc.so",
    "lib/libnppc.so.10",
    "lib/libnppc.so.10.1.168",
    "lib/libnppial.so",
    "lib/libnppial.so.10",
    "lib/libnppial.so.10.1.168",
    "lib/libnppicc.so",
    "lib/libnppicc.so.10",
    "lib/libnppicc.so.10.1.168",
    "lib/libnppicom.so",
    "lib/libnppicom.so.10",
    "lib/libnppicom.so.10.1.168",
    "lib/libnppidei.so",
    "lib/libnppidei.so.10",
    "lib/libnppidei.so.10.1.168",
    "lib/libnppif.so",
    "lib/libnppif.so.10",
    "lib/libnppif.so.10.1.168",
    "lib/libnppig.so",
    "lib/libnppig.so.10",
    "lib/libnppig.so.10.1.168",
    "lib/libnppim.so",
    "lib/libnppim.so.10",
    "lib/libnppim.so.10.1.168",
    "lib/libnppist.so",
    "lib/libnppist.so.10",
    "lib/libnppist.so.10.1.168",
    "lib/libnppisu.so",
    "lib/libnppisu.so.10",
    "lib/libnppisu.so.10.1.168",
    "lib/libnppitc.so",
    "lib/libnppitc.so.10",
    "lib/libnppitc.so.10.1.168",
    "lib/libnpps.so",
    "lib/libnpps.so.10",
    "lib/libnpps.so.10.1.168",
    "lib/libnvToolsExt.so",
    "lib/libnvToolsExt.so.1",
    "lib/libnvToolsExt.so.1.0.0",
    "lib/libnvblas.so",
    "lib/libnvblas.so.10",
    "lib/libnvblas.so.10.2.0.168",
    "lib/libnvgraph.so",
    "lib/libnvgraph.so.10",
    "lib/libnvgraph.so.10.1.168",
    "lib/libnvjpeg.so",
    "lib/libnvjpeg.so.10",
    "lib/libnvjpeg.so.10.1.168",
    "lib/libnvrtc-builtins.so",
    "lib/libnvrtc-builtins.so.10.1",
    "lib/libnvrtc-builtins.so.10.1.168",
    "lib/libnvrtc.so",
    "lib/libnvrtc.so.10.1",
    "lib/libnvrtc.so.10.1.168",
    "lib/libnvvm.so",
    "lib/libnvvm.so.3",
    "lib/libnvvm.so.3.3.0"
  ]

  .....

i.e. what you get is (keeping in mind that most of the "files" above are just symlinks):

  • CUBLAS runtime
  • CUDA runtime library
  • CUFFT runtime
  • CURAND runtime
  • CUSPARSE runtime
  • CUSOLVER runtime
  • NPP runtime
  • nvblas runtime
  • NVTX runtime
  • NVgraph runtime
  • NVjpeg runtime
  • NVRTC/NVVM runtime
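
The same check can be scripted rather than read off the raw JSON. The following is a minimal sketch, assuming the environment keeps its package records in `<prefix>/conda-meta/` as in the `cat` command above; the function names `library_families` and `conda_cuda_records` are hypothetical helpers, not part of conda. It collapses each library's versioned names and symlinks into a single entry:

```python
import json
import os
import sys
from pathlib import Path


def library_families(files):
    """Collapse versioned .so names (and their symlinks) into one name each."""
    families = set()
    for name in files:
        base = os.path.basename(name)
        if ".so" in base:
            # "libcublas.so.10.2.0.168" -> "libcublas"
            families.add(base.split(".so")[0])
    return sorted(families)


def conda_cuda_records(prefix, names=("cudatoolkit", "cudnn")):
    """Map each matching conda-meta record to the library families it ships."""
    meta_dir = Path(prefix) / "conda-meta"
    found = {}
    for name in names:
        for record in sorted(meta_dir.glob(f"{name}-*.json")):
            data = json.loads(record.read_text())
            found[record.stem] = library_families(data.get("files", []))
    return found


if __name__ == "__main__":
    # Assumption: CONDA_PREFIX points at the active environment;
    # otherwise fall back to the prefix of the running interpreter.
    prefix = os.environ.get("CONDA_PREFIX", sys.prefix)
    for package, libs in conda_cuda_records(prefix).items():
        print(f"{package}: {len(libs)} library families")
        for lib in libs:
            print("  ", lib)
```

Run inside the activated environment, this should print only runtime libraries, which is consistent with the list above: there is no compiler and there are no headers in the cudatoolkit record.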

The CUDNN package that conda installs is the redistributable binary distribution, identical to what NVIDIA distributes: exactly two files, a header file and a library.

You would still require a supported NVIDIA driver installation to make the tensorflow that conda installs work.
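
One way to confirm that a driver is present is to ask `nvidia-smi`, which ships with the NVIDIA driver rather than with conda. This is a rough sketch, assuming `nvidia-smi` is on the `PATH`; the helper name `nvidia_driver_version` is made up for illustration:

```python
import shutil
import subprocess


def nvidia_driver_version():
    """Return the driver version reported by nvidia-smi, or None if unavailable."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None  # nvidia-smi not on PATH; the driver is probably missing
    try:
        result = subprocess.run(
            [exe, "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            check=True,
            timeout=30,
        )
    except (subprocess.SubprocessError, OSError):
        return None  # nvidia-smi exists but could not talk to the driver
    lines = result.stdout.strip().splitlines()
    # One line is printed per GPU; the driver version is the same for all.
    return lines[0].strip() if lines else None


if __name__ == "__main__":
    version = nvidia_driver_version()
    print("NVIDIA driver:", version if version else "not detected")
```

If a version is reported, the next check is inside the conda environment itself: with TensorFlow 2.1 or later, `tf.config.list_physical_devices("GPU")` should return a non-empty list when the driver and the conda-provided libraries work together.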

If you want to actually compile and build CUDA code, you need to install a separate CUDA toolkit, which contains all the development components that conda deliberately omits from its distribution.
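
To see whether such a separate toolkit is present alongside the conda libraries, you can look for the `nvcc` compiler, which the file list above does not include. The sketch below is only a heuristic: the helper name `locate_nvcc` is hypothetical, and the fallback path `/usr/local/cuda/bin/nvcc` assumes the default location used by NVIDIA's Linux installer.

```python
import os
import shutil
import sys
from pathlib import Path


def locate_nvcc(conda_prefix=None):
    """Return (path_to_nvcc_or_None, True if nvcc lives inside the conda prefix)."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        # Assumption: default install location of a system-wide toolkit on Linux.
        fallback = Path("/usr/local/cuda/bin/nvcc")
        if not fallback.exists():
            return None, False
        nvcc = str(fallback)
    prefix = Path(conda_prefix or os.environ.get("CONDA_PREFIX", sys.prefix)).resolve()
    nvcc_path = Path(nvcc).resolve()
    return str(nvcc_path), prefix in nvcc_path.parents


if __name__ == "__main__":
    path, inside_conda = locate_nvcc()
    if path is None:
        print("No nvcc found: only runtime libraries appear to be installed.")
    elif inside_conda:
        print("nvcc found inside the conda environment:", path)
    else:
        print("nvcc found outside conda (a separate CUDA toolkit):", path)
```

An `nvcc` outside the conda prefix indicates a system-wide toolkit, which can coexist with the runtime libraries conda placed in the environment's `lib` directory.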
