Nvidia
For user software use, see Help:Cuda linux.
This page covers installation of nVidia drivers, including the CUDA toolkit, for nVidia GPUs and GPGPUs.
A more detailed interactive version of this documentation is also available.
Note for all proprietary video driver installs:
- Make sure that the dkms package is installed. Some driver install scripts will autodetect this if it is pre-installed and use it.
- If DKMS is used, the driver will update itself with kernel updates. Otherwise, a driver reinstall will be required each time the kernel is updated.
NOTE: nVidia drivers are frequently only supported in Ubuntu LTS releases! Please check driver availability before upgrading!
Note: The nvidia legacy drivers nvidia-304 and nvidia-340 are buggy in Ubuntu 16.04 and break with kernel versions above 4.10. A beta version of these drivers works, but for legacy cards the open-source nouveau driver may also work; try uninstalling the nvidia drivers to re-enable nouveau.
Utilities
- nvidia-smi
- nvtop (Ubuntu 20 repos)
- nvidia-ps (local)
- heat-svg (local)
- gpust (local)
Versions
See CUDA
Software:
CUDA | released | OS | issues |
---|---|---|---|
9 | Sept 2017 - May 2018 | Ubuntu 16, 18 | won't run on Turing or Ampere GPUs |
10 | Sept 2018 - Nov 2019 | Ubuntu 18 | won't run on Ampere GPUs |
11 | March 2020 - Oct 2022 | Ubuntu 20, 22 | still supported |
12 | Dec 2022 - current | Ubuntu 20, 22 | current |
Hardware:
compute capability | generation |
---|---|
6.x | Pascal |
7.0, 7.2 | Volta |
7.5 | Turing |
8.0, 8.6, 8.7 | Ampere |
8.9 | Ada Lovelace |
9.0 | Hopper |
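To check which row of the hardware table a card falls into, its compute capability can be mapped to a generation name. A minimal sketch as a shell function; the name cc_to_generation is hypothetical, and the usage lines assume a driver recent enough that nvidia-smi supports the compute_cap query field.

```shell
# Hypothetical helper: map a compute capability such as "8.6" to the GPU
# generation listed in the hardware table. Returns 1 for values not in it.
cc_to_generation() {
  case "$1" in
    6.*)         echo "Pascal" ;;
    7.0|7.2)     echo "Volta" ;;
    7.5)         echo "Turing" ;;
    8.0|8.6|8.7) echo "Ampere" ;;
    8.9)         echo "Ada Lovelace" ;;
    9.0)         echo "Hopper" ;;
    *)           echo "unknown"; return 1 ;;
  esac
}

# Usage (assumes nvidia-smi supports --query-gpu=compute_cap):
#   nvidia-smi --query-gpu=compute_cap --format=csv,noheader | while read -r cc; do
#     cc_to_generation "$cc"
#   done
```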
Recent updates
The following items are not (yet) in the interactive version linked above:
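- Before trying to debug your nvidia driver, make sure you actually have an nvidia video card. Compute-only cards show up as "3D controller" rather than "VGA compatible controller", so also grep for nvidia:
update-pciids
lspci | grep -i vga
lspci | grep -i nvidia
- Ubuntu 18 repos include cuda and the drivers seem well integrated; use of PPAs is probably not needed anymore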
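The lspci check in the first item above can be wrapped in a small filter that catches both display cards and compute-only cards. A sketch, assuming lspci -nn output (which adds NVIDIA's PCI vendor ID 10de in brackets); nvidia_pci_devices is a hypothetical name.

```shell
# Hypothetical filter: read `lspci -nn` output on stdin and print NVIDIA
# display or compute controllers. Exits non-zero (grep's status) if none.
nvidia_pci_devices() {
  grep -Ei '(VGA compatible|3D) controller' | grep -Ei 'nvidia|\[10de:'
}

# Usage:
#   update-pciids
#   lspci -nn | nvidia_pci_devices || echo "no NVIDIA GPU found"
```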
Hardware with known issues
Cards known to be problematic:
- GeForce 7000 LT may require an older driver and is not supported by CUDA
- GeForce 8xxx requires the nvidia-340 driver (Ubuntu)
- Titan X (Pascal) DOES NOT WORK with the CUDA 7.5 toolkit; it requires 8.0. The final release of 8.0 DOES NOT NEED a newer driver
For these cards, you may need to install a specific driver instead of the default driver or the driver that comes with CUDA. See #Alternate install below for alternate driver installation methods.
Debugging and recovery
If the video works at POST and grub but locks up or fails at linux boot, add nomodeset to the kernel command line. See grub magic for all the rest of the options.
Current nVidia drivers explain what to do if they fail, and occasionally specify exactly what driver is needed instead of the latest one.
- CentOS and Red Hat
- check /var/log/messages
- Ubuntu
- check /var/log/syslog
- All operating systems
- dmesg may also show the error, immediately after boot or after a failed driver load (e.g. with modprobe).
Look for nVidia in the log or dmesg output.
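The log locations above can be checked in one go. A sketch with two hypothetical helpers: one picks whichever of /var/log/syslog (Ubuntu) or /var/log/messages (CentOS/Red Hat) exists, the other filters nvidia lines, including those with the NVRM: prefix that the nvidia kernel module uses for its messages.

```shell
# Hypothetical helper: print the main system log file. $1 is an optional
# root prefix (handy for inspecting a mounted disk from a rescue system).
system_log_file() {
  local f
  for f in "${1:-}/var/log/syslog" "${1:-}/var/log/messages"; do
    if [ -r "$f" ]; then
      printf '%s\n' "$f"
      return 0
    fi
  done
  return 1
}

# Hypothetical helper: print nvidia-related lines from the given files,
# or from stdin when no file is given.
nvidia_log_lines() {
  grep -iE 'nvidia|NVRM' "$@"
}

# Usage:
#   nvidia_log_lines "$(system_log_file)"
#   dmesg | nvidia_log_lines
```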
Diagnostics and stress tests
./cuda_memtest --num_iterations 10 --exit_on_error --stress --monitor_temp 5
Run without --stress first. --monitor_temp doesn't work on all cards.
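Following the advice to run without --stress first, a hypothetical wrapper can run a plain pass and start the stress pass only if the first one succeeds. The CUDA_MEMTEST variable (defaulting to ./cuda_memtest) is an assumption added so the binary location can be changed; the flags are the ones from the command above.

```shell
# Hypothetical wrapper: run a plain cuda_memtest pass first and start the
# --stress pass only if it succeeds. Extra arguments (for example
# --monitor_temp 5, on cards that support it) go to the stress pass only.
run_memtest() {
  local bin="${CUDA_MEMTEST:-./cuda_memtest}"   # assumed location
  "$bin" --num_iterations 10 --exit_on_error || return
  "$bin" --num_iterations 10 --exit_on_error --stress "$@"
}

# Usage:
#   run_memtest --monitor_temp 5
```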
Installing nvidia proprietary driver without cuda
If you are going to install cuda, skip this section and use the driver that comes with cuda instead, unless your card has a known problem with the cuda driver.
You may need to try multiple versions of the driver until one works. You can install drivers with
apt-get install package...
You must reboot to actually load the driver, as the previous driver version can't be unloaded with an active screen.
After the driver tries to load, read through the kernel messages to see if it succeeded. Some driver versions (especially newer ones) recognize that they don't support your video card and may suggest the correct version of the driver to load.
dmesg | less
Try these drivers first:
- nvidia-current
- nvidia-367 (latest supplied with ubuntu)
If those don't work and don't suggest an older driver, you can try a newer driver:
apt-add-repository ppa:graphics-drivers/ppa
apt-get update
apt-cache search 'nvidia-[0-9]'
Note that the drivers in this ppa may be incompatible with the nVidia CUDA toolkit.
You can check what driver is loaded with
lsmod | grep nvidia
If you believe you have installed the correct driver but it is still loading the wrong one, reconfigure it. For example:
dpkg-reconfigure nvidia-367
and make sure that DKMS correctly builds the driver without errors.
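Plain grep nvidia on lsmod output also matches helper modules such as nvidia_drm or nvidia_uvm. A slightly stricter check, sketched as a hypothetical filter over lsmod output, succeeds only when the core nvidia module itself is loaded.

```shell
# Hypothetical filter: read `lsmod` output on stdin, print the line for the
# core "nvidia" module and succeed only if that module is present.
# Helper modules (nvidia_drm, nvidia_uvm, nvidia_modeset) alone do not count.
nvidia_module_loaded() {
  awk '$1 == "nvidia" { print; found = 1 } END { exit !found }'
}

# Usage:
#   lsmod | nvidia_module_loaded || echo "nvidia driver not loaded; check dmesg"
```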
Ubuntu 18
ubuntu-drivers autoinstall
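To see which driver ubuntu-drivers marks as recommended before running autoinstall, the output of ubuntu-drivers devices can be filtered. This sketch assumes the usual "driver : <package> - ... recommended" line layout, which may differ between versions of the tool; recommended_driver is a hypothetical name.

```shell
# Hypothetical filter: read `ubuntu-drivers devices` output on stdin and
# print the package name on the line marked "recommended".
recommended_driver() {
  awk '$1 == "driver" && / recommended/ { print $3 }'
}

# Usage:
#   ubuntu-drivers devices | recommended_driver
```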
CUDA 7.5 / 10 conflict
If you try to install CUDA 10 on a system that already has CUDA 7.5, it will partially install and then fail with a circular dependency.
These procedures are not well tested and may be missing steps.
The best option is to remove CUDA 7.5 before attempting to install 10.
Alternatively, try the following to complete the CUDA 10 install:
apt-get remove --autoremove libcudart7.5 libcupti7.5
apt-get -o Dpkg::Options::="--force-overwrite" install cuda
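Before removing CUDA 7.5, it helps to see which 7.5 packages are actually installed. A rough sketch that scans dpkg -l output for installed (status ii) packages whose names look like CUDA 7.5 packages, such as libcudart7.5 and libcupti7.5. It matches on names only, so packages like nvidia-cuda-toolkit that carry the version only in their version string are not listed; cuda75_packages is a hypothetical helper.

```shell
# Hypothetical filter: read `dpkg -l` output on stdin and print installed
# package names containing cuda/cupti followed by 7.5 or 7-5.
# The ":amd64" architecture suffix is stripped from the printed name.
cuda75_packages() {
  awk '$1 == "ii" && $2 ~ /(cuda|cupti).*7[.-]5/ { sub(/:.*/, "", $2); print $2 }'
}

# Usage:
#   apt-get remove --autoremove $(dpkg -l | cuda75_packages)
```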
Ubuntu 18 CUDA
Ubuntu 18 has the latest version of CUDA in its own repos:
- apt-get install nvidia-cuda-toolkit
OR install from nvidia's repos:
nVidia has now released its own repo for drivers, so the instructions for 16 should also now work for 18.
However, nvidia only includes cuda 10.0 in this repo, so if you need older versions of cuda, you still need to install ubuntu 16.
- install cuda repo from nvidia: https://developer.nvidia.com/cuda-downloads Do not skip this step!
- recommend installer type deb network option
- If you can't download on the local computer, download elsewhere and use dpkg -i cuda*.deb to install from text mode
- sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
- apt-get update
- Install everything at once: apt-get install build-essential cuda environment-modules libprotobuf-dev libleveldb-dev libsnappy-dev libopencv-dev libhdf5-serial-dev libboost-all-dev libgflags-dev libgoogle-glog-dev liblmdb-dev protobuf-compiler libopenblas-dev python-pip python-protobuf git sshfs; apt-get remove unattended-upgrades
Ubuntu 16 CUDA
NOTE: see #Hardware with known issues above if you have a conflicting device
- install cuda repo from nvidia: https://developer.nvidia.com/cuda-downloads Do not skip this step!
- recommend installer type deb network option
- If you can't download on the local computer, download elsewhere and use dpkg -i cuda*.deb to install from text mode
- sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
- apt-get update
- apt-get install build-essential cuda environment-modules
- Caffe dependencies:
- apt-get install libprotobuf-dev libleveldb-dev libsnappy-dev libopencv-dev libhdf5-serial-dev libboost-all-dev libgflags-dev libgoogle-glog-dev liblmdb-dev protobuf-compiler libopenblas-dev python-pip python-protobuf
- Or install everything at once: apt-get install build-essential cuda environment-modules libprotobuf-dev libleveldb-dev libsnappy-dev libopencv-dev libhdf5-serial-dev libboost-all-dev libgflags-dev libgoogle-glog-dev liblmdb-dev protobuf-compiler libopenblas-dev python-pip python-protobuf git sshfs cuda-9-0 cuda-9-2 ; apt-get remove unattended-upgrades
- If your software needs specific versions: apt-get install cuda cuda-9-0 cuda-9-2
- Recommended: apt-get remove unattended-upgrades or you will need to reboot every time a driver update is automatically installed
- If you are doing deep learning, you need to also download and install the appropriate version of the cudnn libraries from the nvidia website.
- You may also want to install modules to help manage multiple versions of cudnn and cuda.
Make sure the video card works on reboot.
Make sure the nvidia-smi command line lists the card.
If it fails, boot in recovery mode or use a text console and find out why the driver is not working.
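To confirm that nvidia-smi sees every card, its -L output (one "GPU n: ..." line per card) can be counted and compared with the number of NVIDIA devices lspci reports. A minimal sketch; count_gpus is a hypothetical helper name.

```shell
# Hypothetical filter: read `nvidia-smi -L` output on stdin and print the
# number of "GPU <n>: ..." lines (prints 0 if there are none).
count_gpus() {
  grep -c '^GPU [0-9]'
}

# Usage:
#   nvidia-smi -L | count_gpus
```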
Upgrade conflict between CUDA 7.5 and CUDA 10
Before installing CUDA 10, you must remove CUDA 7.5. If you fail to do this, you'll get into an unresolvable dependency loop.
You can force apt-get to ignore the problem with:
apt-get -o Dpkg::Options::="--force-overwrite" install cuda
The nvidia-smi command may help debug. Possible problems include:
- secure boot is preventing the driver from loading (change secure boot mode and reboot)
- driver does not support hardware: check system messages for details and install the correct driver
If these directions don't work, additional things to try can be found at https://help.ubuntu.com/community/BinaryDriverHowto/Nvidia
EXTRA: Add cuda to the default path by saving the following to /etc/profile.d/cuda-path.sh
PATH=$PATH:/usr/local/cuda/bin
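If the profile script is sourced more than once (for example from nested login shells), the plain assignment appends the same directory repeatedly. A sketch of a duplicate-safe variant for the same file, using a hypothetical path_append helper written in POSIX sh; note the helper name stays defined in login shells.

```shell
# /etc/profile.d/cuda-path.sh (sketch, POSIX sh)
# Hypothetical helper: print the colon-separated list $1 with directory $2
# appended, unless $2 is already one of its entries.
path_append() {
  case ":$1:" in
    *":$2:"*) printf '%s\n' "$1" ;;
    *)        printf '%s\n' "${1:+$1:}$2" ;;
  esac
}

PATH=$(path_append "$PATH" /usr/local/cuda/bin)
export PATH
```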
Alternate install
If the cuda driver does not work or cuda is not needed, a specific driver can be installed.
On systems having problems, it may help to start with a clean slate. This is not necessary on most machines.
If you do not need cuda, you can try
apt-get install nvidia-current
If this fails on reboot, check /var/log/syslog and look for nvidia driver messages, which sometimes specify the exact driver version needed. If a newer driver is needed, try the Ubuntu proprietary driver PPA:
apt-add-repository ppa:graphics-drivers/ppa
apt-get update
If you have already installed cuda and the included driver does not recognize your card:
apt-get remove cuda-runtime-8-0
And then search for an appropriate driver:
apt-cache search 'nvidia-[0-9]'
and then install the highest relevant driver version, for instance
apt-get install nvidia-370
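Picking the highest version from apt-cache search output by eye is error-prone, since a plain text sort puts nvidia-96 after nvidia-384. A hypothetical filter that sorts the nvidia-NNN driver package names numerically. The highest version is not necessarily one that supports your card, so still check the kernel messages after rebooting.

```shell
# Hypothetical filter: read `apt-cache search 'nvidia-[0-9]'` output on stdin
# and print the nvidia-NNN driver package with the highest NNN.
# Names like nvidia-384-dev or nvidia-libopencl1-384 are ignored.
highest_nvidia_driver() {
  grep -oE '^nvidia-[0-9]+ ' | tr -d ' ' | sort -t- -k2,2n | tail -n 1
}

# Usage:
#   apt-cache search 'nvidia-[0-9]' | highest_nvidia_driver
```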
To prevent cuda from being autoremoved:
apt-get install nvidia-cuda-dev cuda-toolkit-8-0
The following packages may also be useful that are normally included with cuda-toolkit:
apt-get install nvidia-cuda-{doc,gdb} nvidia-{visual-,}profiler
Note: package list above generated from apt-cache rdepends cuda and following the dependency tree.
Upgrade conflict between CUDA 11.0 and 11.2
cuda-11-2 doesn't seem to install easily over cuda-11-0. The underlying conflict is between cuda-drivers-450 and cuda-drivers-460.
The following seems to fix it cleanly (a trailing - on a package name tells apt to remove that package):
apt update
apt install cuda cuda-drivers-450- libnvidia-extra-450-
apt full-upgrade
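The same approach should carry over to other driver branches: every installed cuda-drivers-NNN or libnvidia-extra-NNN package from the old branch gets a trailing - so apt removes it. A sketch that builds that list from dpkg -l output, assuming the package naming used in the command above; stale_driver_pkgs is a hypothetical helper, and other branch-specific packages are not handled.

```shell
# Hypothetical filter: read `dpkg -l` output on stdin and print, with apt's
# trailing "-" (remove) suffix, each installed cuda-drivers-NNN or
# libnvidia-extra-NNN package whose branch NNN differs from $1.
stale_driver_pkgs() {
  awk -v keep="$1" '
    $1 == "ii" {
      name = $2
      sub(/:.*/, "", name)            # drop any ":amd64" architecture suffix
      if (name ~ /^(cuda-drivers|libnvidia-extra)-[0-9]+$/) {
        branch = name
        sub(/.*-/, "", branch)        # keep only the NNN branch number
        if (branch != keep) print name "-"
      }
    }'
}

# Usage:
#   apt install cuda $(dpkg -l | stale_driver_pkgs 460)
```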
Configuration
systemd persistenced
This keeps the nvidia driver permanently loaded, which shortens startup time for cuda apps.
- cp /lib/systemd/system/nvidia-persistenced.service /etc/systemd/system/
- edit the copy and change --no-persistence-mode to --persistence-mode, then run systemctl daemon-reload and restart the service
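The edit to the copied unit file can be done non-interactively. A sketch, assuming the copy's ExecStart line contains --no-persistence-mode as described above; enable_persistence is a hypothetical helper.

```shell
# Hypothetical filter: read a systemd unit file on stdin and switch
# nvidia-persistenced from --no-persistence-mode to --persistence-mode.
enable_persistence() {
  sed 's/--no-persistence-mode/--persistence-mode/g'
}

# Usage:
#   unit=/etc/systemd/system/nvidia-persistenced.service
#   enable_persistence < "$unit" > "$unit.new" && mv "$unit.new" "$unit"
#   systemctl daemon-reload && systemctl restart nvidia-persistenced
#   nvidia-smi -q | grep -i 'persistence mode'
```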
Disable nvidia for video
If you want to use the internal VGA instead of the nvidia card:
- (supermicro bios) advanced -> PCIe/PCI/PNP configuration -> VGA Priority (onboard / offboard)
- update-alternatives --config x86_64-linux-gnu_egl_conf
- update-alternatives --config x86_64-linux-gnu_gl_conf
- ldconfig (and then restart the X server)
- Add to cuda module:
prepend-path PATH [join [glob /usr/lib/nvidia*/bin] ":"]
prepend-path LD_LIBRARY_PATH [join [glob /usr/lib/nvidia*] ":"]
Note: this breaks cuda in some cases.
rebuild xorg.conf
nvidia-xconfig --query-gpu-info
nvidia-xconfig
performance data collection
Long-term performance data collection can be done with collectd. The following pieces must be installed:
- collectd data collection back end
- apt-get install collectd
- collectd nvidia plugin
- pip install collectd-nvidianvml
- then configure the collectd plugin as described in its README
- data viewer (choose one)
- local desktop: kcollectd
- web: cgp
- web: php-collection
- web: (custom code)
Centos 6
caffe dependencies
- base
- openblas-devel lapack-devel atlas-devel boost-devel protobuf-devel boost-python snappy-devel
- epel
- hdf5-devel leveldb-devel lmdb-devel
- not in repos
- glog (newer gflags) (newer boost)
Makefile.config : USE_CUDNN := 1
Note: you may need to blacklist the nouveau driver if you use the nvidia proprietary driver from their website instead of the cuda drivers.
- add rdblacklist=nouveau to /etc/default/grub
- update grub: (verify correct filenames) grub2-mkconfig --output=/boot/grub2/grub.cfg
- xrandr will reset resolution to the correct one if possible, but use system->settings->display to permanently set it
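The rdblacklist=nouveau argument from the list above can be added without hand-editing. A sketch that assumes a grub2-style /etc/default/grub with a double-quoted GRUB_CMDLINE_LINUX line (matching the grub2-mkconfig step); add_kernel_arg is a hypothetical helper, and a backup of the file is kept in the usage lines.

```shell
# Hypothetical filter: read /etc/default/grub on stdin and append the kernel
# argument $1 to the double-quoted GRUB_CMDLINE_LINUX value, unless it is
# already present. All other lines are passed through unchanged.
add_kernel_arg() {
  awk -v arg="$1" '
    /^GRUB_CMDLINE_LINUX="/ {
      value = $0
      sub(/^GRUB_CMDLINE_LINUX="/, "", value)
      sub(/"$/, "", value)
      if (index(" " value " ", " " arg " ") == 0)
        value = (value == "") ? arg : value " " arg
      print "GRUB_CMDLINE_LINUX=\"" value "\""
      next
    }
    { print }
  '
}

# Usage:
#   cp /etc/default/grub /etc/default/grub.bak
#   add_kernel_arg rdblacklist=nouveau < /etc/default/grub.bak > /etc/default/grub
#   grub2-mkconfig --output=/boot/grub2/grub.cfg
```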
nvidia driver module rebuild:
- dkms status
- dkms build nvidia/version
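To find the argument for dkms build, the nvidia entries can be pulled out of dkms status. The output format has changed between dkms versions (older releases print "module, version, kernel, arch: state", newer ones "module/version, kernel, arch: state"); this sketch assumes one of those two layouts, and nvidia_dkms_modules is a hypothetical helper.

```shell
# Hypothetical filter: read `dkms status` output on stdin and print unique
# "module/version" strings for nvidia modules (e.g. nvidia or nvidia-384).
nvidia_dkms_modules() {
  awk -F', ' '
    $1 ~ /^nvidia(-[0-9]+)?\// {      # newer layout: module/version, ...
      mod = $1
      sub(/:.*/, "", mod)             # drop a trailing ": added" state
      print mod
      next
    }
    $1 ~ /^nvidia(-[0-9]+)?$/ {       # older layout: module, version, ...
      ver = $2
      sub(/:.*/, "", ver)
      print $1 "/" ver
    }
  ' | LC_ALL=C sort -u
}

# Usage:
#   dkms status | nvidia_dkms_modules
#   dkms build <one of the module/version strings printed above>
```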