Why doesn't TensorFlow recognize my GPU after a conda install?
Problem description
I am new to deep learning, and for the last 2 days I have been trying in vain to install the tensorflow-gpu version on my PC. I avoided installing the CUDA and cuDNN drivers manually, since several online forums advise against it because of numerous compatibility issues. Since I was already using the conda distribution of Python, I went for conda install -c anaconda tensorflow-gpu
as written on their official website here: https://anaconda.org/anaconda/tensorflow-gpu .
However, even after installing the GPU version in a fresh virtual environment (to avoid potential conflicts with pip-installed libraries in the base env), tensorflow doesn't seem to recognize my GPU at all, for some mysterious reason.
Some of the code snippets I ran (in Anaconda Prompt) that show it isn't recognizing my GPU:
1.
>>> from tensorflow.python.client import device_lib
>>> print(device_lib.list_local_devices())
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 7692219132769779763
]
As you can see, it completely ignores the GPU.
2.
>>> tf.debugging.set_log_device_placement(True)
>>> a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
2020-12-13 10:11:30.902956: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
>>> b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
>>> c = tf.matmul(a, b)
>>> print(c)
tf.Tensor(
[[22. 28.]
 [49. 64.]], shape=(2, 2), dtype=float32)
Here, it was supposed to indicate that it ran on the GPU by showing Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:0
(as written here: https://www.tensorflow.org/guide/gpu), but nothing like that is present. I am also not sure what the message after the 2nd line means.
I have also searched for solutions online, including here, but almost all of the issues relate to the manual installation method, which I haven't tried yet since everyone recommended the conda approach.
I don't use cmd anymore, since the environment variables somehow got messed up after I uninstalled tensorflow-cpu from the base env; after re-installing, it worked perfectly in Anaconda Prompt but not in cmd. This is a separate (and widespread) issue, but I mention it in case it plays a role here. I installed the GPU version in a fresh virtual environment to ensure a clean installation, and as far as I understand, path variables need to be set up only for a manual installation of the CUDA and cuDNN libraries.
The card I use (which is CUDA enabled):
C:\WINDOWS\system32>wmic path win32_VideoController get name
Name
NVIDIA GeForce 940MX
Intel(R) HD Graphics 620
The TensorFlow and Python versions I am currently using:
>>> import tensorflow as tf
>>> tf.__version__
'2.3.0'
Python 3.8.5 (default, Sep 3 2020, 21:29:08) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
System information: Windows 10 Home, 64-bit operating system, x64-based processor.
Any help would be really appreciated. Thanks in advance.
Solution
August 2021: conda install may be working now. According to @ComputerScientist in the comments below, conda install tensorflow-gpu==2.4.1
will give cudatoolkit-10.1.243
and cudnn-7.6.5
The following was written in January 2021 and is out of date.
Currently conda install tensorflow-gpu
installs tensorflow v2.3.0 and does NOT install the conda cudnn or cudatoolkit packages. Installing them manually (e.g. with conda install cudatoolkit=10.1
) does not seem to fix the problem either.
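One way to see whether an environment actually contains the two packages is to scan the text that conda list prints. The following is only a minimal sketch: the helper names parse_conda_list and missing_gpu_packages are made up for illustration, and the sample listing (including its build strings) is invented, not real output.

```python
# Conda package names mentioned in the answer above.
REQUIRED = ("cudatoolkit", "cudnn")

# Made-up listing for illustration; paste the real output of `conda list` instead.
SAMPLE = """\
# Name                    Version                   Build  Channel
python                    3.8.5                         0
tensorflow-gpu            2.3.0                         0
"""


def parse_conda_list(text):
    """Map package name -> version from the plain-text output of `conda list`."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the '#' header lines
        fields = line.split()
        if len(fields) >= 2:
            packages[fields[0]] = fields[1]
    return packages


def missing_gpu_packages(text, required=REQUIRED):
    """Return the required package names that do not appear in the listing."""
    installed = parse_conda_list(text)
    return [name for name in required if name not in installed]


print("missing:", missing_gpu_packages(SAMPLE))
```

If the list comes back empty, both packages are present in the environment; that alone does not guarantee TensorFlow can load them.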
A solution is to install an earlier version of tensorflow, which does install cudnn and cudatoolkit, then upgrade with pip:
conda install tensorflow-gpu=2.1
pip install tensorflow-gpu==2.3.1
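After an install like this, a short check run inside the same environment can show whether TensorFlow now sees the GPU. This is a sketch rather than an official diagnostic: gpu_report is a hypothetical helper name, and it assumes TensorFlow 2.1 or later, where tf.config.list_physical_devices is available.

```python
def gpu_report():
    """Collect what TensorFlow reports about CUDA support and visible GPUs."""
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not importable in the active environment.
        return {"tensorflow": None}
    return {
        "tensorflow": tf.__version__,
        # True only if this TensorFlow binary was compiled with CUDA support.
        "built_with_cuda": tf.test.is_built_with_cuda(),
        # An empty list means no GPU is visible to TensorFlow.
        "gpus": [d.name for d in tf.config.list_physical_devices("GPU")],
    }


report = gpu_report()
for key, value in report.items():
    print(f"{key}: {value}")
```

An empty gpus list together with built_with_cuda being True would suggest the CUDA or cuDNN libraries could not be loaded, rather than a CPU-only build.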
(2.4.0 uses CUDA 11.0 and cuDNN 8.0; however, cuDNN 8.0 is not in anaconda as of 16/12/2020.)
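The version pairing is why the pip upgrade stops at 2.3.x. The sketch below encodes the CUDA and cuDNN versions that the official pip wheels were built against, as I recall them from TensorFlow's tested build configurations table; only the 2.4 row comes from this answer. The names TESTED and is_compatible are hypothetical, and conda's own builds may be compiled against different versions, as the August 2021 update suggests.

```python
# (CUDA, cuDNN) per TensorFlow minor version for the official pip wheels.
# Assumption: taken from TensorFlow's tested build configurations, not from conda.
TESTED = {
    "2.1": ("10.1", "7.6"),
    "2.2": ("10.1", "7.6"),
    "2.3": ("10.1", "7.6"),
    "2.4": ("11.0", "8.0"),
}


def minor(version):
    """Reduce a version such as '10.1.243' to its 'major.minor' part."""
    return ".".join(version.split(".")[:2])


def is_compatible(tf_version, cudatoolkit, cudnn):
    """True/False if the pairing matches the table, None if the version is unknown."""
    expected = TESTED.get(minor(tf_version))
    if expected is None:
        return None
    return (minor(cudatoolkit), minor(cudnn)) == expected


# Using the cudatoolkit and cudnn versions quoted in the August 2021 update.
print(is_compatible("2.3.1", "10.1.243", "7.6.5"))
print(is_compatible("2.4.0", "10.1.243", "7.6.5"))
```

Under these assumptions, the libraries pulled in for an older 2.x release still match the 2.3.1 pip wheel but not the 2.4.0 one.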
Please also see @GZ0's answer, which links to a GitHub discussion with a one-line solution.