Provisioning the cluster
Edit the input/provision_config.yml
, input/provision_config.yml
, and input/network_spec.yml
files to update the required variables. A list of the variables required is available by discovery mechanism.
Note
The first PXE device on target nodes should be the designated active NIC for PXE booting.
Optional configurations managed by the provision tool
Using multiple versions of a given OS
Omnia now supports deploying different versions of the same OS. With each run of discovery_provision.yml
, a new deployable OS image is created with a distinct type:
Rocky: 8.6, 8.7, 8.8
RHEL: 8.6, 8.7, 8.8
Ubuntu: 20.04, 22.04
depending on the values provided in input/software_config.json
.
Note
While Omnia deploys the minimal version of the OS, the multiple version feature requires that the Rocky full (DVD) version of the OS be provided.
Disk partitioning
Omnia now allows for customization of disk partitions applied to remote servers. The disk partition
desired_capacity
has to be provided in MB. Validmount_point
values accepted for disk partition are/var
,/tmp
,/usr
,swap
. The default partition size provided for RHEL/Rocky is /boot: 1024MB, /boot/efi: 256MB and remaining space to / partition. Default partition size provided for Ubuntu is /boot: 2148MB, /boot/efi: 1124MB and remaining space to / partition. Values are accepted in the form of JSON list such as:disk_partition: - { mount_point: "/var", desired_capacity: "102400" } - { mount_point: "swap", desired_capacity: "10240" }
Running the provision tool
To deploy the Omnia provision tool, ensure that input/provision_config.yml
, input/network_spec.yml
, and input/provision_config_credentials.yml
are updated and then run:
ansible-playbook discovery_provision.yml
discovery_provision.yml
runs in three stages that can be called individually:
Caution
Always execute discovery_provision.yml
within the omnia
directory. That is, always change directories (cd omnia
) to the path where the playbook resides before running the playbook.
Preparing the control plane
Installs required tool packages.
Verifies and updates firewall settings.
Installs xCAT.
Configures Omnia databases basis
input/network_spec.yml
.Creates empty inventory files in the control plane at
/opt/omnia/omnia_inventory/
. These inventory files will be filled with information of compute node service tag post provisioning based on type of CPUs and GPUs they have. The inventory files are:
compute_cpu_amd
compute_cpu_intel
compute_gpu_amd
compute_gpu_nvidia
compute_servicetag_ip
Note
Service tags will only be written into the inventory files after the nodes are successfully PXE booted post provisioning.
For a node’s service tag to list in an inventory file, two conditions must be met:
Node status must be “booted” in DB.
Node’s service tag information is present in DB.
Nodes are not removed from the inventory files even if they are physically disconnected. Ensure to run the delete node playbook to remove the node.
To regenerate an inventory file, use the playbook
omnia/utils/inventory_tagging.yml
.cd prepare_cp ansible-playbook prepare_cp.yml
Discovering the nodes
Discovers all target servers.
PostgreSQL database is set up with all relevant cluster information such as MAC IDs, hostname, admin IP, BMC IPs etc.
Configures the control plane with NTP services for cluster node synchronization.
To call this playbook individually, run:
cd discovery ansible-playbook discovery.yml
Provisioning the nodes
The intended operating system and version is provisioned on the primary disk partition on the nodes. If a BOSS Controller card is available on the target node, the operating system is provisioned on the boss controller disks.
To call this playbook individually, run:
cd provision ansible-playbook provision.yml
After successfully running discovery_provision.yml
, go to Building Clusters to setup Slurm, Kubernetes, NFS, BeeGFS and Authentication.
Note
racadm and ipmitool are installed on all target nodes except Ubuntu 20.04.
Ansible playbooks by default run concurrently on 5 nodes. To change this, update the
forks
value inansible.cfg
present in the respective playbook directory.While the
admin_nic
on cluster nodes is configured by Omnia to be static, the public NIC IP address should be configured by user.If the target nodes were discovered using switch-based or mapping mechanisms, manually PXE boot the target servers after the
discovery_provision.yml
playbook is executed and the target node lists as booted in the nodeinfo table.All ports required for xCAT to run will be opened (For a complete list, check out the Security Configuration Document).
After running
discovery_provision.yml
, the fileinput/provision_config_credentials.yml
will be encrypted. To edit the file, use the command:ansible-vault edit provision_config.yml --vault-password-file .provision_vault_key
Post execution of
discovery_provision.yml
, IPs/hostnames cannot be re-assigned by changing the mapping file. However, the addition of new nodes is supported as explained here.Default Python is installed during provisioning on Ubuntu cluster nodes. For Ubuntu 22.04, Python 3.10 is installed. For Ubuntu 20.04, Python 3.8 is installed.
Caution
Once xCAT is installed, restart your SSH session to the control plane to ensure that the newly set up environment variables come into effect. If the new environment variables still do not come into effect, enable manually using:
source /etc/profile.d/xcat.sh
To avoid breaking the passwordless SSH channel on the control plane, do not run
ssh-keygen
commands post execution ofdiscovery_provision.yml
to create a new key.- Do not delete the following directories:
/root/xcat
/root/xcat-dbback
/docker-registry
/opt/omnia
/var/log/omnia
On subsequent runs of
discovery_provision.yml
, if users are unable to log into the server, refresh the ssh key manually and retry.ssh-keygen -R <node IP>
If a subsequent run of
discovery_provision.yml
fails, theinput/provision_config.yml
file will be unencrypted.
To create a node inventory in /opt/omnia
, click here.
If you have any feedback about Omnia documentation, please reach out at omnia.readme@dell.com.