Before you run the provision tool
(Recommended) Run prereq.sh to get the system ready to deploy Omnia. Alternatively, ensure that Ansible 2.12.10 and Python 3.8 are installed on the system. SELinux should also be disabled.
Set the IP address of the control plane.
Set the hostname of the control plane using the hostname.domain name format.
Hostname requirements:
The hostname should not contain the following characters: , (comma), . (period), or _ (underscore). However, commas and periods are allowed in the domain name.
The hostname cannot start or end with a hyphen (-).
No upper case characters are allowed in the hostname.
The hostname cannot start with a number.
The hostname and the domain name (that is: hostname00000x.domain.xxx) cumulatively cannot exceed 64 characters. For example, if the node_name provided in input/provision_config.yml is ‘node’, and the domain_name provided is ‘omnia.test’, Omnia will set the hostname of a target compute node to ‘node00001.omnia.test’. Omnia appends 6 digits to the hostname to individually name each target node.
For example, controlplane.omnia.test is acceptable.
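The hostname rules above can be checked before the provision tool is run. The following is a minimal sketch, not part of Omnia: the function name `hostname_problems` and the constant `MAX_FQDN_LENGTH` are illustrative, and the sketch assumes the hostname is everything before the first period (so a period can never appear inside it).

```python
from typing import List

MAX_FQDN_LENGTH = 64  # combined hostname and domain name limit stated above


def hostname_problems(fqdn: str) -> List[str]:
    """Return the rule violations found in a `hostname.domain` string."""
    # Assumption: the hostname is the text before the first period and the
    # domain name is everything after it.
    hostname, sep, domain = fqdn.partition(".")
    if not sep or not hostname or not domain:
        return ["expected the form hostname.domain"]

    problems = []
    if "," in hostname or "_" in hostname:
        problems.append("hostname contains a comma or an underscore")
    if hostname.startswith("-") or hostname.endswith("-"):
        problems.append("hostname starts or ends with a hyphen")
    if any(ch.isupper() for ch in hostname):
        problems.append("hostname contains upper case characters")
    if hostname[0].isdigit():
        problems.append("hostname starts with a number")
    if len(fqdn) > MAX_FQDN_LENGTH:
        problems.append("hostname and domain name exceed 64 characters")
    return problems
```

An empty list means no rule was violated; for instance, `hostname_problems("controlplane.omnia.test")` reports nothing, while a name such as `Control_plane.omnia.test` is reported for both the underscore and the upper case character.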
Note
The domain name specified for the control plane should be the same as the one specified under domain_name in input/provision_config.yml.
To provision the bare metal servers, download one of the following ISOs for deployment:
Note the compatibility between cluster OS and control plane OS below:
To set up CUDA and OFED using the provisioning tool, download the required repositories from here:
To dictate IP address/MAC mapping, a host mapping file can be provided. Use the pxe_mapping_file.csv to create your own mapping file.
Ensure that all connection names under the network manager match their corresponding device names.
nmcli connection
In the event of a mismatch, edit the file /etc/sysconfig/network-scripts/ifcfg-<nic name> using the vi editor.
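The comparison of connection names and device names can also be scripted. The sketch below is an illustration under stated assumptions, not an Omnia utility: the helper names `find_mismatches` and `list_connections` are hypothetical, and it reads the terse, colon-separated output of `nmcli -t -f NAME,DEVICE connection show`.

```python
import subprocess
from typing import List, Tuple


def find_mismatches(terse_output: str) -> List[Tuple[str, str]]:
    """Return (connection name, device name) pairs whose names differ."""
    mismatches = []
    for line in terse_output.splitlines():
        # Split on the last ':' so that an escaped colon inside the
        # connection name stays with the name.
        name, sep, device = line.rpartition(":")
        if not sep:
            continue  # not a NAME:DEVICE line
        name = name.replace("\\:", ":")
        # Connections that are not bound to a device are skipped.
        if device in ("", "--"):
            continue
        if name != device:
            mismatches.append((name, device))
    return mismatches


def list_connections() -> str:
    """Run nmcli in terse mode and return its output (requires nmcli)."""
    result = subprocess.run(
        ["nmcli", "-t", "-f", "NAME,DEVICE", "connection", "show"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout
```

On the control plane, `find_mismatches(list_connections())` would list the connections whose ifcfg files may need editing; an empty list suggests that every active connection is named after its device.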
When discovering nodes via snmpwalk or a mapping file, all target nodes should be set up in PXE mode before running the playbook.
Nodes provisioned using the Omnia provision tool do not require a Red Hat subscription to run provision.yml on RHEL target nodes.
For RHEL target nodes not provisioned by Omnia, ensure that a Red Hat subscription is enabled on all target nodes. Every target node will require a Red Hat subscription.
Users should also ensure that all repos (AppStream, BaseOS and CRB) are available on the RHEL control plane.
Uninstall epel-release if it is installed on the control plane, as Omnia configures epel-release on the control plane itself. To uninstall epel-release, use the following command:
dnf remove epel-release -y
Note
To enable the repositories, run the following commands:
subscription-manager repos --enable=codeready-builder-for-rhel-8-x86_64-rpms
subscription-manager repos --enable=rhel-8-for-x86_64-appstream-rpms
subscription-manager repos --enable=rhel-8-for-x86_64-baseos-rpms
Verify your changes by running:
yum repolist enabled
Ensure that the pxe_nic and public_nic are in the firewalld zone: public.
The control plane NIC connected to remote servers (through the switch) should be configured with two IPs in a shared LOM setup. This NIC is configured by Omnia with the IPs xx.yy.255.254 and aa.bb.255.254 (where xx.yy are taken from bmc_nic_subnet and aa.bb are taken from admin_nic_subnet) when network_interface_type is set to lom. For other discovery mechanisms, only the admin NIC is configured with aa.bb.255.254 (where aa.bb is taken from admin_nic_subnet).
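To make the addressing rule above concrete, here is a small sketch of how the expected control plane IPs follow from the subnet values. It is an illustration only, not Omnia's implementation: the function names are hypothetical, the subnets in the usage note are example values, and it assumes each subnet is given as a dotted IPv4 address whose first two octets are kept.

```python
import ipaddress
from typing import Dict, Optional


def control_plane_ip(subnet: str) -> str:
    """Keep the first two octets of `subnet` and append .255.254."""
    # IPv4Address raises ValueError if the value is not a valid IPv4 address.
    octets = str(ipaddress.IPv4Address(subnet)).split(".")
    return ".".join(octets[:2] + ["255", "254"])


def expected_control_plane_ips(
    admin_nic_subnet: str,
    bmc_nic_subnet: Optional[str],
    network_interface_type: str,
) -> Dict[str, str]:
    """Return the IPs the control plane NIC is expected to receive."""
    ips = {"admin": control_plane_ip(admin_nic_subnet)}
    # Only a shared LOM setup adds the second, BMC-side IP.
    if network_interface_type == "lom":
        if bmc_nic_subnet is None:
            raise ValueError("bmc_nic_subnet is required for a LOM setup")
        ips["bmc"] = control_plane_ip(bmc_nic_subnet)
    return ips
```

With the example values `admin_nic_subnet = "10.5.0.0"` and `bmc_nic_subnet = "10.3.0.0"`, a `lom` setup gives 10.5.255.254 and 10.3.255.254; any other interface type gives only 10.5.255.254.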
Note
After configuration and installation of the cluster, changing the control plane is not supported. If you need to change the control plane, you must redeploy the entire cluster.
If there are errors while executing any of the Ansible playbook commands, then re-run the playbook.
For servers with an existing OS being discovered via BMC, ensure that the first PXE device on each target node is the designated active NIC for PXE booting.