Setup TensorFlow
TensorFlow is a widely used open-source deep learning framework. Its computation-graph execution is designed for performance and scalability, which makes it a common choice for deploying machine learning models at scale across industries.
With an Ansible script, deploy TensorFlow on both kube_nodes and the kube_control_node. After the deployment of TensorFlow, you gain access to the TensorFlow container.
Prerequisites
Ensure nerdctl is available on all cluster nodes.
If GPUs are present on the target nodes, install NVIDIA CUDA (with containerd) or AMD ROCm drivers during provisioning. CPU-only nodes do not require any additional drivers.
Use local_repo.yml to create an offline TensorFlow repository. For more information, click here.
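The prerequisites above can be spot-checked on a node before running the playbook. The following is a minimal sketch, not part of Omnia: the helper names have_cmd and detect_accelerator are hypothetical, and using /dev/nvidiactl as the marker for a loaded NVIDIA driver is an assumption (/dev/kfd is the ROCm device that appears in the AMD command later in this guide).

```shell
# Return success if the named command is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Print which accelerator device node is visible on this node.
# Assumption: /dev/kfd indicates ROCm, /dev/nvidiactl indicates the NVIDIA driver.
detect_accelerator() {
    if [ -e /dev/kfd ]; then
        echo amd
    elif [ -e /dev/nvidiactl ]; then
        echo nvidia
    else
        echo cpu
    fi
}

# Warn (without aborting) if nerdctl is missing, then report the accelerator type.
have_cmd nerdctl || echo "nerdctl not found on PATH" >&2
echo "accelerator: $(detect_accelerator)"
```

A device node only shows that a driver is loaded. It does not confirm that containerd is configured to pass the GPU through to containers.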
[Optional prerequisites]
Ensure the system has enough space.
Ensure the passed inventory file includes a kube_control_plane and a kube_node_group listing all cluster nodes. Click here for a sample file.
Nerdctl does not support mounting directories as devices because this is not a feature of containerd (the runtime that nerdctl uses). Individual device files must be attached when running nerdctl.
The Container Network Interface (CNI) should be enabled with nerdctl.
Deploying TensorFlow
Change directories to the tools folder:
cd tools
Run the tensorflow.yml playbook:
ansible-playbook tensorflow.yml -i inventory
Accessing TensorFlow (CPU)
Verify that the tensorflow image is present in the container engine images:
nerdctl images
Use the container image per your needs:
nerdctl run -it --rm tensorflow/tensorflow
For more information, click here.
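For a quick non-interactive check that the image starts and TensorFlow imports, a one-off command can replace the interactive session. This is a sketch under assumptions: the tf_version helper and the NERDCTL override variable are hypothetical names introduced here, and the image is assumed to provide a python executable on its PATH.

```shell
# Container CLI to invoke; override (for example NERDCTL=echo) for a dry run.
NERDCTL="${NERDCTL:-nerdctl}"

# Start a throwaway container from the given image, print the TensorFlow
# version from inside it, and remove the container afterwards (--rm).
tf_version() {
    image="$1"
    "$NERDCTL" run --rm "$image" python -c 'import tensorflow as tf; print(tf.__version__)'
}

# Example (requires the image to be present on the node):
#   tf_version tensorflow/tensorflow
```

Setting NERDCTL=echo prints the arguments that would be passed instead of starting a container, which is useful for reviewing the command first.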
Accessing TensorFlow (AMD)
Verify that the tensorflow image is present in the container engine images:
nerdctl images
Use the container image per your needs:
nerdctl run -it --network=host --device=/dev/kfd --device /dev/dri/card0 --device /dev/dri/card1 --device /dev/dri/card2 --device /dev/dri/renderD128 --device /dev/dri/renderD129 --ipc=host --shm-size 16G --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined rocm/tensorflow:latest
For more information, click here.
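Because nerdctl cannot attach /dev/dri as a whole directory, the command above lists each device file explicitly, and the card and render node numbers vary from host to host. The following sketch generates those flags from whatever entries exist. The helper name dri_device_flags is hypothetical, and matching on the card* and renderD* name patterns is an assumption based on the device names in the command above.

```shell
# Print one "--device=<path>" flag per line for every card*/renderD* entry
# in the given directory (default: /dev/dri).
dri_device_flags() {
    dir="${1:-/dev/dri}"
    for dev in "$dir"/card* "$dir"/renderD*; do
        # A glob with no matches stays unexpanded; skip it.
        [ -e "$dev" ] || continue
        printf '%s\n' "--device=$dev"
    done
}

# Example (the unquoted substitution is intentional so each flag becomes
# its own argument):
#   nerdctl run -it --network=host --device=/dev/kfd $(dri_device_flags) \
#       --ipc=host --shm-size 16G --group-add video rocm/tensorflow:latest
```

The helper only builds the device flags. The remaining options in the example are taken from the command above and can be adjusted as needed.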
Accessing TensorFlow (NVIDIA)
Verify that the tensorflow image is present in the container engine images:
nerdctl images
Use the container image per your needs:
nerdctl run --gpus all -it --rm nvcr.io/nvidia/tensorflow:23.12-tf2-py3
For more information, click here.
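To confirm that TensorFlow inside the container can see the GPUs, a one-off command can list the visible devices instead of opening an interactive shell. This is a sketch under assumptions: tf_list_gpus and the NERDCTL override variable are hypothetical names, and the image is assumed to provide python on its PATH. tf.config.list_physical_devices is the TensorFlow 2 call for enumerating devices.

```shell
# Container CLI to invoke; override (for example NERDCTL=echo) for a dry run.
NERDCTL="${NERDCTL:-nerdctl}"

# Run a throwaway GPU-enabled container and print the GPUs TensorFlow detects.
tf_list_gpus() {
    image="$1"
    "$NERDCTL" run --gpus all --rm "$image" \
        python -c 'import tensorflow as tf; print(tf.config.list_physical_devices("GPU"))'
}

# Example (requires the image to be present on the node):
#   tf_list_gpus nvcr.io/nvidia/tensorflow:23.12-tf2-py3
```

An empty list in the output suggests the container started without GPU access, which usually points back to the driver and containerd prerequisites.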
If you have any feedback about Omnia documentation, please reach out at omnia.readme@dell.com.