Setup PyTorch

PyTorch is a popular open-source deep learning framework known for its dynamic computation graph, which makes it flexible and easy to use for researchers and developers. With strong community support, PyTorch enables rapid experimentation and prototyping in machine learning.

Prerequisites

  • Ensure nerdctl is available on all cluster nodes.

  • If GPUs are present on the target nodes, install the NVIDIA CUDA (with containerd) or AMD ROCm drivers during provisioning. CPU-only nodes do not require any additional drivers.

  • Use local_repo.yml to create an offline PyTorch repository. For more information, click here.

Optional prerequisites

  • Ensure the system has sufficient disk space to pull and store the PyTorch container images.

  • Ensure the passed inventory file includes a kube_control_plane and a kube_node_group listing all cluster nodes. Click here for a sample file.

  • nerdctl does not support mounting directories as devices, because containerd (the runtime that nerdctl uses) does not provide this feature. Individual device files must be attached when running nerdctl, as shown in the sketch after this list.
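
For example, the GPU device nodes under /dev/dri must each be passed explicitly rather than attaching the directory itself. The sketch below is illustrative only; the device paths and image tag depend on the hardware and images present on your nodes:

    # Not supported: passing a directory (such as /dev/dri) as a device
    # nerdctl run -it --rm --device /dev/dri rocm/pytorch:latest

    # Supported: attach each required device file individually
    nerdctl run -it --rm --device /dev/kfd --device /dev/dri/renderD128 rocm/pytorch:latest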

Deploying PyTorch

  1. Change directories to the tools folder:

    cd tools
    
  2. Run the pytorch.yml playbook:

    ansible-playbook pytorch.yml -i inventory
    

Accessing PyTorch (CPU)

  1. Verify that the PyTorch image is present in the container engine images:

    nerdctl images
    
  2. Use the container image per your needs:

    nerdctl run -it --rm pytorch/pytorch:latest
    

For more information, click here.
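
To confirm that the container can run PyTorch end to end, the image's bundled Python interpreter can be invoked directly. This is a minimal sketch; it assumes the pytorch/pytorch:latest image provides python on its PATH:

    # Print the PyTorch version from inside a throwaway container
    nerdctl run --rm pytorch/pytorch:latest python -c "import torch; print(torch.__version__)"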

Accessing PyTorch (AMD)

  1. Verify that the PyTorch image is present in the container engine images:

    nerdctl images
    
  2. Use the container image per your needs:

    nerdctl run -it --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device /dev/dri/card0 --device /dev/dri/card1 --device /dev/dri/card2 --device /dev/dri/renderD128 --device /dev/dri/renderD129 --group-add video --ipc=host --shm-size 8G rocm/pytorch:latest
    

For more information, click here.
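
To confirm that the ROCm build of PyTorch can see the GPUs, a short check can be run inside the container. This is a minimal sketch; it assumes python3 is available in the rocm/pytorch:latest image, and the device files below are illustrative and should match the render nodes actually present on the host:

    # ROCm builds of PyTorch expose GPUs through the torch.cuda API
    nerdctl run --rm --device /dev/kfd --device /dev/dri/renderD128 --group-add video \
      rocm/pytorch:latest python3 -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"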

Accessing PyTorch (NVidia)

  1. Verify that the PyTorch image is present in the container engine images:

    nerdctl images
    
  2. Use the container image per your needs:

    nerdctl run --gpus all -it --rm nvcr.io/nvidia/pytorch:23.12-py3
    

For more information, click here.
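
To confirm that PyTorch can see the NVIDIA GPUs, a short check can be run inside the container. This is a minimal sketch; it assumes the NVIDIA drivers and containerd GPU support described in the prerequisites are already in place:

    # Report whether CUDA is available and how many GPUs are visible
    nerdctl run --rm --gpus all nvcr.io/nvidia/pytorch:23.12-py3 \
      python -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"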

If you have any feedback about Omnia documentation, please reach out at omnia.readme@dell.com.