Ubuntu怎么安装和卸载CUDA和CUDNN
这篇文章主要介绍了Ubuntu怎么安装和卸载CUDA和CUDNN的相关知识,内容详细易懂,操作简单快捷,具有一定借鉴价值,相信大家阅读完这篇Ubuntu怎么安装和卸载CUDA和CUDNN文章都会有所收获,下面我们一起来看看吧。
安装显卡驱动
禁用nouveau驱动
sudo vim /etc/modprobe.d/blacklist.conf
在文本最后添加:
blacklist nouveau
options nouveau modeset=0
然后执行:
sudo update-initramfs -u
重启后,执行以下命令,如果没有屏幕输出,说明禁用nouveau成功:
lsmod | grep nouveau
下载驱动
根据自己显卡的情况下载对应版本的显卡驱动,比如笔者的显卡是rtx2070:
下载完成之后会得到一个安装包,不同版本文件名可能不一样:
nvidia-linux-x86_64-410.93.run
卸载旧驱动
以下操作都需要在命令界面操作,执行以下快捷键进入命令界面,并登录:
ctrl-alt+f1
执行以下命令禁用x-window服务,否则无法安装显卡驱动:
sudo service lightdm stop
执行以下三条命令卸载原有显卡驱动:
sudo apt-get remove --purge nvidia*
sudo chmod +x nvidia-linux-x86_64-410.93.run
sudo ./nvidia-linux-x86_64-410.93.run --uninstall
安装新驱动
直接执行驱动文件即可安装新驱动,一直默认即可:
sudo ./nvidia-linux-x86_64-410.93.run
执行以下命令启动x-window服务
sudo service lightdm start
最后执行重启命令,重启系统即可:
reboot
注意: 如果系统重启之后出现重复登录的情况,多数情况下都是安装了错误版本的显卡驱动。需要下载对应本身机器安装的显卡版本。
卸载cuda
为什么一开始我就要卸载cuda呢,这是因为笔者是换了显卡rtx2070,原本就安装了cuda 8.0 和 cudnn 7.0.5不能够正常使用,笔者需要安装cuda 10.0 和 cudnn 7.4.2,所以要先卸载原来的cuda。注意以下的命令都是在root用户下操作的。
卸载cuda很简单,一条命令就可以了,主要执行的是cuda自带的卸载脚本,读者要根据自己的cuda版本找到卸载脚本:
sudo /usr/local/cuda-8.0/bin/uninstall_cuda_8.0.pl
卸载之后,还有一些残留的文件夹,之前安装的是cuda 8.0。可以一并删除:
sudo rm -rf /usr/local/cuda-8.0/
这样就算卸载完了cuda。
安装cuda
安装的cuda和cudnn版本:
cuda 10.0
cudnn 7.4.2
接下来的安装步骤都是在root用户下操作的。
下载和安装cuda
我们可以在官网:cuda10下载页面,
下载符合自己系统版本的cuda。页面如下:
下载完成之后,给文件赋予执行权限:
chmod +x cuda_10.0.130_410.48_linux.run
执行安装包,开始安装:
./cuda_10.0.130_410.48_linux.run
开始安装之后,需要阅读说明,可以使用
ctrl + c
直接阅读完成,或者使用空格键
慢慢阅读。然后进行配置,我这里说明一下:(是否同意条款,必须同意才能继续安装)
accept/decline/quit: accept
(这里不要安装驱动,因为已经安装最新的驱动了,否则可能会安装旧版本的显卡驱动,导致重复登录的情况)
install nvidia accelerated graphics driver for linux-x86_64 410.48?
(y)es/(n)o/(q)uit: n
install the cuda 10.0 toolkit?(是否安装cuda 10 ,这里必须要安装)
(y)es/(n)o/(q)uit: y
enter toolkit location(安装路径,使用默认,直接回车就行)
[ default is /usr/local/cuda-10.0 ]:
do you want to install a symbolic link at /usr/local/cuda?(同意创建软链接)
(y)es/(n)o/(q)uit: y
install the cuda 10.0 samples?(不用安装测试,本身就有了)
(y)es/(n)o/(q)uit: n
installing the cuda toolkit in /usr/local/cuda-10.0 ...(开始安装)
安装完成之后,可以配置他们的环境变量,在
vim ~/.bashrc
的最后加上以下配置信息:export cuda_home=/usr/local/cuda-10.0
export ld_library_path=${cuda_home}/lib64
export path=${cuda_home}/bin:${path}
最后使用命令
source ~/.bashrc
使它生效。可以使用命令
nvcc -v
查看安装的版本信息:test@test:~$ nvcc -v
nvcc: nvidia (r) cuda compiler driver
copyright (c) 2005-2018 nvidia corporation
built on sat_aug_25_21:08:01_cdt_2018
cuda compilation tools, release 10.0, v10.0.130
测试安装是否成功
执行以下几条命令:
cd /usr/local/cuda-10.0/samples/1_utilities/devicequery
make
./devicequery
正常情况下输出:
./devicequery starting...
cuda device query (runtime api) version (cudart static linking)
detected 1 cuda capable device(s)
device 0: "geforce rtx 2070"
cuda driver version / runtime version 10.0 / 10.0
cuda capability major/minor version number: 7.5
total amount of global memory: 7950 mbytes (8335982592 bytes)
(36) multiprocessors, ( 64) cuda cores/mp: 2304 cuda cores
gpu max clock rate: 1620 mhz (1.62 ghz)
memory clock rate: 7001 mhz
memory bus width: 256-bit
l2 cache size: 4194304 bytes
maximum texture dimension size (x,y,z) 1d=(131072), 2d=(131072, 65536), 3d=(16384, 16384, 16384)
maximum layered 1d texture size, (num) layers 1d=(32768), 2048 layers
maximum layered 2d texture size, (num) layers 2d=(32768, 32768), 2048 layers
total amount of constant memory: 65536 bytes
total amount of shared memory per block: 49152 bytes
total number of registers available per block: 65536
warp size: 32
maximum number of threads per multiprocessor: 1024
maximum number of threads per block: 1024
max dimension size of a thread block (x,y,z): (1024, 1024, 64)
max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
maximum memory pitch: 2147483647 bytes
texture alignment: 512 bytes
concurrent copy and kernel execution: yes with 3 copy engine(s)
run time limit on kernels: yes
integrated gpu sharing host memory: no
support host page-locked memory mapping: yes
alignment requirement for surfaces: yes
device has ecc support: disabled
device supports unified addressing (uva): yes
device supports compute preemption: yes
supports cooperative kernel launch: yes
supports multidevice co-op kernel launch: yes
device pci domain id / bus id / location id: 0 / 1 / 0
compute mode:
< default (multiple host threads can use ::cudasetdevice() with device simultaneously) >
devicequery, cuda driver = cudart, cuda driver version = 10.0, cuda runtime version = 10.0, numdevs = 1
result = pass
下载和安装cudnn
进入到cudnn的下载官网:,然点击download开始选择下载版本,当然在下载之前还有登录,选择版本界面如下,我们选择
cudnn library for linux
:下载之后是一个压缩包,如下:
cudnn-10.0-linux-x64-v7.4.2.24.tgz
然后对它进行解压,命令如下:
tar -zxvf cudnn-10.0-linux-x64-v7.4.2.24.tgz
解压之后可以得到以下文件:
cuda/include/cudnn.h
cuda/nvidia_sla_cudnn_support.txt
cuda/lib64/libcudnn.so
cuda/lib64/libcudnn.so.7
cuda/lib64/libcudnn.so.7.4.2
cuda/lib64/libcudnn_static.a
使用以下两条命令复制这些文件到cuda目录下:
cp cuda/lib64/* /usr/local/cuda-10.0/lib64/
cp cuda/include/* /usr/local/cuda-10.0/include/
拷贝完成之后,可以使用以下命令查看cudnn的版本信息:
cat /usr/local/cuda/include/cudnn.h | grep cudnn_major -a 2
测试安装结果
到这里就已经完成了cuda 10 和 cudnn 7.4.2 的安装。可以安装对应的pytorch的gpu版本测试是否可以正常使用了。安装如下:
pip3 install https://download.pytorch.org/whl/cu100/torch-1.0.0-cp35-cp35m-linux_x86_64.whl
pip3 install torchvision
然后使用以下的程序测试安装情况:
import torch
import torch.nn as nn
import torch.nn.functional as f
import torch.optim as optim
import torch.backends.cudnn as cudnn
from torchvision import datasets, transforms
class net(nn.module):
def __init__(self):
super(net, self).__init__()
self.conv1 = nn.conv2d(1, 10, kernel_size=5)
self.conv2 = nn.conv2d(10, 20, kernel_size=5)
self.conv2_drop = nn.dropout2d()
self.fc1 = nn.linear(320, 50)
self.fc2 = nn.linear(50, 10)
def forward(self, x):
x = f.relu(f.max_pool2d(self.conv1(x), 2))
x = f.relu(f.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
x = x.view(-1, 320)
x = f.relu(self.fc1(x))
x = f.dropout(x, training=self.training)
x = self.fc2(x)
return f.log_softmax(x, dim=1)
def train(model, device, train_loader, optimizer, epoch):
model.train()
for batch_idx, (data, target) in enumerate(train_loader):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = f.nll_loss(output, target)
loss.backward()
optimizer.step()
if batch_idx % 10 == 0:
print('train epoch: {} [{}/{} ({:.0f}%)] loss: {:.6f}'.format(
epoch, batch_idx * len(data), len(train_loader.dataset),
100. * batch_idx / len(train_loader), loss.item()))
def main():
cudnn.benchmark = true
torch.manual_seed(1)
device = torch.device("cuda")
kwargs = {'num_workers': 1, 'pin_memory': true}
train_loader = torch.utils.data.dataloader(
datasets.mnist('../data', train=true, download=true,
transform=transforms.compose([
transforms.totensor(),
transforms.normalize((0.1307,), (0.3081,))
])),
batch_size=64, shuffle=true, **kwargs)
model = net().to(device)
optimizer = optim.sgd(model.parameters(), lr=0.01, momentum=0.5)
for epoch in range(1, 11):
train(model, device, train_loader, optimizer, epoch)
if __name__ == '__main__':
main()
如果正常输出一下以下信息,证明已经安装成了:
train epoch: 1 [0/60000 (0%)] loss: 2.365850
train epoch: 1 [640/60000 (1%)] loss: 2.305295
train epoch: 1 [1280/60000 (2%)] loss: 2.301407
train epoch: 1 [1920/60000 (3%)] loss: 2.316538
train epoch: 1 [2560/60000 (4%)] loss: 2.255809
train epoch: 1 [3200/60000 (5%)] loss: 2.224511
train epoch: 1 [3840/60000 (6%)] loss: 2.216569
train epoch: 1 [4480/60000 (7%)] loss: 2.181396
相关文章