Install CUDA 10.0 on Ubuntu 16.04 (for DGX-1)

I am trying to install CUDA 10.0 on Ubuntu 16.04 running on a DGX-1 server. I followed the instructions for “runfile installation” at https://docs.nvidia.com/cuda/archive/10.0/cuda-installation-guide-linux/index.html#runfile.

I chose to install the CUDA driver, the CUDA Toolkit and the CUDA Samples.

The previous versions of the NVIDIA driver and CUDA were removed using the following commands (as suggested in “How can I install CUDA on Ubuntu 16.04?”):

    sudo apt-get purge nvidia-cuda*
    sudo apt-get purge nvidia-*
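To double-check what the purge left behind, something like the following sketch could be used. It assumes the standard `dpkg -l` column layout (status in the first column, package name in the second); the function name `remaining_gpu_packages` is made up for illustration. Unlike a plain `grep nvidia`, it matches only on the package name, so packages that merely mention nvidia in their description are not listed.

```shell
#!/bin/sh
# Read `dpkg -l` output on stdin and print the names of packages that are
# still installed (status "ii") and whose NAME contains "nvidia" or "cuda".
# Hypothetical helper, not part of any NVIDIA or Ubuntu tooling.
remaining_gpu_packages() {
    awk '$1 == "ii" && $2 ~ /nvidia|cuda/ { print $2 }'
}

# Usage on the affected machine:
#   dpkg -l | remaining_gpu_packages
```

An empty result would suggest that no installed package with nvidia or cuda in its name remains.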

After step 4.2.6 (“Reboot the system to reload the graphical interface”), I checked the CUDA version as follows:

    nvcc --version
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2018 NVIDIA Corporation
    Built on Sat_Aug_25_21:08:01_CDT_2018
    Cuda compilation tools, release 10.0, V10.0.130

However, when I run “nvidia-smi”, I get the following error:

    nvidia-smi
    NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

I went on to step 4.4 (“Device Node Verification”) and found that the device files “/dev/nvidia*” don’t exist. I tried to create them manually, but running “modprobe” returns an error:

    sudo /sbin/modprobe nvidia
    modprobe: ERROR: could not insert 'nvidia': Exec format error
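One commonly reported cause of “Exec format error” from modprobe is a module built for a different kernel than the one that is running, although that is not confirmed here. A rough sketch of how this could be checked is below; it assumes that `modinfo -F vermagic nvidia` can find the module, and the helper name `vermagic_matches_kernel` is made up for illustration.

```shell
#!/bin/sh
# Compare a module's vermagic string with a kernel release.
# A vermagic string looks like "4.4.0-142-generic SMP mod_unload modversions";
# its first space-separated field is the kernel release the module was built for.
# Hypothetical helper, not part of the NVIDIA installer.
vermagic_matches_kernel() {
    vermagic=$1
    kernel=$2
    built_for=${vermagic%% *}   # keep everything before the first space
    if [ "$built_for" = "$kernel" ]; then
        echo "match: module built for $kernel"
    else
        echo "mismatch: module built for $built_for, running $kernel"
        return 1
    fi
}

# Usage on the affected machine:
#   vermagic_matches_kernel "$(modinfo -F vermagic nvidia)" "$(uname -r)"
```

Running `dmesg | tail` right after the failed modprobe may also show the kernel's own explanation for rejecting the module.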

Please help to solve the problem. Thanks!

==========================================================================

Other details:

    lspci | grep -i nvidia
    06:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 32GB] (rev a1)
    07:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 32GB] (rev a1)
    0a:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 32GB] (rev a1)
    0b:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 32GB] (rev a1)
    85:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 32GB] (rev a1)
    86:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 32GB] (rev a1)
    89:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 32GB] (rev a1)
    8a:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 32GB] (rev a1)

    uname -m && cat /etc/*release
    x86_64
    DGX_NAME="DGX Server"
    DGX_PRETTY_NAME="NVIDIA DGX Server"
    DGX_SWBUILD_DATE="2018-03-20"
    DGX_SWBUILD_VERSION="3.1.6"
    DGX_COMMIT_ID="1b0f58ecbf989820ce745a9e4836e1de5eea6cfd"
    DGX_SERIAL_NUMBER=QTFCOU8280021
    DISTRIB_ID=Ubuntu
    DISTRIB_RELEASE=16.04
    DISTRIB_CODENAME=xenial
    DISTRIB_DESCRIPTION="Ubuntu 16.04.6 LTS"
    NAME="Ubuntu"
    VERSION="16.04.6 LTS (Xenial Xerus)"
    ID=ubuntu
    ID_LIKE=debian
    PRETTY_NAME="Ubuntu 16.04.6 LTS"
    VERSION_ID="16.04"
    HOME_URL="http://www.ubuntu.com/"
    SUPPORT_URL="http://help.ubuntu.com/"
    BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"

    gcc --version
    gcc (GCC) 5.4.0
    Copyright (C) 2015 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions.  There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

    uname -r
    4.4.0-142-generic

    cat /proc/version
    Linux version 4.4.0-142-generic (buildd@lgw01-amd64-033) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.10) ) #168-Ubuntu SMP Wed Jan 16 21:00:45 UTC 2019

    dpkg -l | grep nvidia
    ii  dgx-peer-mem-loader                             1.1-10                                        amd64        Ensure nvidia is loaded before nv_peer_mem