Set up PyTorch
PyTorch is a popular open-source deep learning framework, renowned for its dynamic computation graph that enhances flexibility and ease of use, making it a preferred choice for researchers and developers. With strong community support, PyTorch facilitates seamless experimentation and rapid prototyping in the field of machine learning.
Prerequisites
Ensure nerdctl is available on all cluster nodes.
If GPUs are present on the target nodes, install the NVIDIA CUDA drivers (with containerd) or the AMD ROCm drivers during provisioning. CPU-only nodes do not require any additional drivers.
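As a quick sanity check after provisioning (not part of the Omnia workflow itself), the following commands can confirm that the accelerator drivers are working; they assume the standard `lspci`, `nvidia-smi`, and `rocm-smi` tools are available on the node:

```shell
# List GPU devices visible on the PCI bus (works even before drivers are installed).
lspci | grep -iE 'vga|3d|display'

# If NVIDIA CUDA drivers were installed, this should print driver and CUDA versions.
nvidia-smi

# If AMD ROCm drivers were installed, this should list the detected GPUs.
rocm-smi
```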
Use local_repo.yml to create an offline PyTorch repository. For more information, click here.
[Optional prerequisites]
Ensure the target nodes have sufficient disk space to store the PyTorch container images.
Ensure the passed inventory file includes a kube_control_plane and a kube_node group listing all cluster nodes. Click here for a sample file.
Note: nerdctl does not support mounting directories as devices, because that is not a feature of containerd (the runtime that nerdctl uses). Individual device files need to be attached when running nerdctl.
Deploying PyTorch
Change directories to the tools folder:
cd tools
Run the pytorch.yml playbook:
ansible-playbook pytorch.yml -i inventory
Accessing PyTorch (CPU)
Verify that the PyTorch image is present among the container engine images:
nerdctl images
Use the container image per your needs:
nerdctl run -it --rm pytorch/pytorch:latest
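To confirm that the CPU image works end to end, a one-off non-interactive run (a sketch, assuming the same pytorch/pytorch:latest tag used above) can print the installed PyTorch version instead of opening a shell:

```shell
# Start a throwaway container and print the PyTorch version it ships with.
nerdctl run --rm pytorch/pytorch:latest \
  python -c "import torch; print(torch.__version__)"
```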
For more information, click here.
Accessing PyTorch (AMD)
Verify that the PyTorch image is present among the container engine images:
nerdctl images
Use the container image per your needs:
nerdctl run -it --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device /dev/dri/card0 --device /dev/dri/card1 --device /dev/dri/card2 --device /dev/dri/renderD128 --device /dev/dri/renderD129 --group-add video --ipc=host --shm-size 8G rocm/pytorch:latest
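Once inside the running rocm/pytorch container, a quick check (a sketch, not part of the documented workflow) can confirm that PyTorch sees the AMD GPUs; note that ROCm builds of PyTorch report devices through the torch.cuda API:

```shell
# Run inside the rocm/pytorch container started above.
# ROCm-enabled PyTorch exposes AMD GPUs via the torch.cuda interface.
python -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"
```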
For more information, click here.
Accessing PyTorch (NVIDIA)
Verify that the PyTorch image is present among the container engine images:
nerdctl images
Use the container image per your needs:
nerdctl run --gpus all -it --rm nvcr.io/nvidia/pytorch:23.12-py3
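To confirm GPU access without opening an interactive session, a one-off run (a sketch, assuming the same NGC image tag used above) can report whether PyTorch detects the NVIDIA GPUs:

```shell
# Start a throwaway container and confirm PyTorch can see the NVIDIA GPUs.
nerdctl run --gpus all --rm nvcr.io/nvidia/pytorch:23.12-py3 \
  python -c "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))"
```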
For more information, click here.
If you have any feedback about Omnia documentation, please reach out at omnia.readme@dell.com.