... | @@ -96,9 +96,9 @@ NOTE: Make sure you download the cuda-samples repository that matches the cuda v |
|
## Mellanox Drivers - MLNX OFED
|
|
For the Mellanox PCIe card, you will need to install the drivers built for the aarch64 architecture and the Tegra kernel. The instructions at https://gitlab.ras.byu.edu/alpaca/wiki/-/wikis/Unix-Networking under Installing MLNX OFED still apply, with some minor adjustments to the commands. The driver ISO file can be found at https://network.nvidia.com/products/infiniband-drivers/linux/mlnx_ofed/. For the University Node, install version 23.10-3.2.2.0 of the driver; the drop-down menu should include this version. As for the changed commands, run the following with sudo, with the NIC installed, to mount the image and run the installer:
|
`mount -o ro,loop <.iso> /mnt`
|
`cd /mnt/`
|
`./mlnxofedinstall --without-dkms --add-kernel-support --kernel <kernel version> --without-fw-update --force --enable-gds`
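
The three commands above can be collected into a small helper. This is only a sketch under stated assumptions: the function names `mlnx_install_cmd` and `install_mlnx_ofed` are made up for illustration, the ISO path is supplied by the caller, and the kernel version defaults to the running kernel (`uname -r`), which may not be what you want if you are building for a different kernel.

```shell
# Print the installer command line for a given kernel version.
# The flags are copied verbatim from the command above.
mlnx_install_cmd() {
    printf '%s\n' "./mlnxofedinstall --without-dkms --add-kernel-support --kernel $1 --without-fw-update --force --enable-gds"
}

# Mount the ISO read-only through a loop device and run the installer.
#   $1: path to the driver ISO (supplied by the caller)
#   $2: kernel version (assumption: defaults to the running kernel)
install_mlnx_ofed() {
    iso="$1"
    kernel="${2:-$(uname -r)}"

    # The mount and the installer both need root privileges.
    [ "$(id -u)" -eq 0 ] || { echo "must be run as root" >&2; return 1; }
    [ -f "$iso" ] || { echo "ISO not found: $iso" >&2; return 1; }

    mount -o ro,loop "$iso" /mnt
    cd /mnt || return 1

    # Word splitting is intended: the string holds the command and its flags.
    $(mlnx_install_cmd "$kernel")
}
```

From a root shell, after sourcing the snippet, a call would look like `install_mlnx_ofed <.iso>`. Printing the command with `mlnx_install_cmd "$(uname -r)"` first is a cheap way to check the kernel version before anything is installed.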
|
If this command fails at any point, you will have to debug why it did not complete. The most likely cause is conflicting system packages. The `--force` flag should resolve this, but if it does not, be cautious about removing other packages, as CUDA might depend on some of them.
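
One way to act on that caution is to simulate a removal before doing it. The sketch below assumes an apt-based system and that the CUDA packages have `cuda` in their names; both are assumptions, and the function names are hypothetical. `apt-get -s` only simulates, so nothing is changed.

```shell
# Filter simulated apt-get output (read from stdin) down to the
# CUDA packages that would be removed. Removal lines start with "Remv".
cuda_removals() {
    grep -i '^Remv .*cuda' || true
}

# Simulate removing package $1 and list any CUDA packages that would go with it.
would_remove_cuda() {
    apt-get -s remove "$1" | cuda_removals
}
```

If `would_remove_cuda <package>` prints anything, removing that package would also take CUDA components with it.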
|