Before you run the provision tool

  • (Recommended) Run prereq.sh to get the system ready to deploy Omnia. Alternatively, ensure that Ansible 2.12.10 and Python 3.8 are installed on the system. SELinux should also be disabled.
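If prereq.sh is not used, the manual prerequisites can be checked with a short script. The following is a minimal sketch, assuming that ansible, python3, and getenforce are on the PATH when installed. The helper names extract_version and report_version are illustrative and are not part of Omnia.

```shell
# Print the first dotted version number found in a string.
extract_version() {
    printf '%s\n' "$1" | grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1
}

# Print a tool's version, or note that the tool is missing.
report_version() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: $(extract_version "$("$1" --version 2>&1 | head -n 1)")"
    else
        echo "$1: not installed"
    fi
}

report_version ansible    # this guide asks for 2.12.10
report_version python3    # this guide asks for 3.8

# getenforce prints Enforcing, Permissive, or Disabled.
if command -v getenforce >/dev/null 2>&1; then
    echo "SELinux: $(getenforce)"
else
    echo "SELinux: getenforce not found"
fi
```

Compare the printed versions and the SELinux state against the requirements listed above before continuing.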

  • Set the IP address of the control plane with a /16 subnet mask. The control plane NIC connected to remote servers (through the switch) should be configured with two IPs (BMC IP and admin IP) in a shared LOM or hybrid setup. In the case of a dedicated network topology, a single IP (admin IP) is required.

Figure: Control plane NIC IP configuration in a LOM setup

Figure: Control plane NIC IP configuration in a dedicated setup

  • Set the hostname of the control plane using the hostname.domain name format.

    Hostname requirements
    • The hostname should not contain the following characters: , (comma), . (period) or _ (underscore). However, commas and periods are allowed in the domain name.

    • The hostname cannot start or end with a hyphen (-).

    • No upper case characters are allowed in the hostname.

    • The hostname cannot start with a number.

    • The hostname and the domain name together (that is: hostname00000x.domain.xxx) cannot exceed 64 characters. For example, if the node_name provided in input/provision_config.yml is ‘node’, and the domain_name provided is ‘omnia.test’, Omnia will set the hostname of a target cluster node to ‘node00001.omnia.test’. Omnia appends 6 digits to the hostname to individually name each target node.

    For example, controlplane.omnia.test is acceptable.

    hostnamectl set-hostname controlplane.omnia.test
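
The hostname requirements above can be checked before calling hostnamectl. The following is a minimal sketch, assuming the name is passed in hostname.domain form. The function name check_hostname is illustrative and is not part of Omnia, and the sketch checks only the rules listed in this section.

```shell
# Validate a name of the form hostname.domain against the rules above.
# Prints "ok" on success, or the first rule that is violated.
check_hostname() {
    fqdn="$1"
    # Everything before the first period is treated as the hostname,
    # so the hostname itself can never contain a period.
    host="${fqdn%%.*}"
    case "$fqdn" in
        *.*) ;;
        *) echo "missing domain name"; return 1 ;;
    esac
    case "$host" in
        "")              echo "empty hostname"; return 1 ;;
        *[,_]*)          echo "hostname contains a comma or underscore"; return 1 ;;
        -*|*-)           echo "hostname starts or ends with a hyphen"; return 1 ;;
        [[:digit:]]*)    echo "hostname starts with a number"; return 1 ;;
        *[[:upper:]]*)   echo "hostname contains upper case characters"; return 1 ;;
    esac
    if [ "${#fqdn}" -gt 64 ]; then
        echo "hostname and domain name exceed 64 characters"
        return 1
    fi
    echo "ok"
}

check_hostname controlplane.omnia.test    # prints: ok
```

The function returns a non-zero status on failure, so it can guard the hostnamectl call in a script.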
    

Note

The domain name specified for the control plane should be the same as the one specified under domain_name in input/provision_config.yml.

  • To provision the bare metal servers, download one of the following ISOs to the control plane:

Caution

THE ROCKY LINUX OS VERSION ON THE CLUSTER WILL BE UPGRADED TO THE LATEST 8.x VERSION AVAILABLE IRRESPECTIVE OF THE PROVISION_OS_VERSION PROVIDED IN PROVISION_CONFIG.YML.

Note

Ensure that the ISO has downloaded completely and is not corrupted. Verify the SHA checksum or the download size of the ISO file before provisioning to avoid failures later.
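
The checksum comparison can be scripted. The following is a minimal sketch, assuming the ISO publisher provides a SHA-256 checksum and that sha256sum (GNU coreutils) is available. The function name verify_sha256 and the ISO path in the usage comment are placeholders.

```shell
# Compare a file's SHA-256 digest with an expected value.
# $1 = path to the file, $2 = expected checksum (hex, any letter case).
verify_sha256() {
    expected="$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')"
    actual="$(sha256sum "$1" 2>/dev/null | awk '{print $1}')"
    if [ -n "$actual" ] && [ "$actual" = "$expected" ]; then
        echo "checksum OK"
    else
        echo "checksum MISMATCH"
        return 1
    fi
}

# Usage (placeholder path and checksum):
#   verify_sha256 /path/to/downloaded.iso <checksum published with the ISO>
```

A missing or unreadable file is reported as a mismatch, so the check fails safely.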

Note the compatibility between cluster OS and control plane OS below:

Control Plane OS | Cluster Node OS | Compatibility
RHEL [1]         | RHEL            | Yes
RHEL [1]         | Rocky           | Yes
Rocky            | Rocky           | Yes

  • To optionally set up CUDA and OFED using the provisioning tool, download the required repositories to the control plane so that they can be deployed on the target nodes:

    1. For NVIDIA GPUs: CUDA is a parallel computing platform and application programming interface that allows software to use certain types of graphics processing units for general-purpose processing, an approach called general-purpose computing on GPUs.

    2. For Mellanox: OFED (OpenFabrics Enterprise Distribution) is open-source software for RDMA and kernel bypass applications. OFED can be used in business, research and scientific environments that require highly efficient networks, storage connectivity and parallel computing.

  • Ensure that all connection names under the network manager match their corresponding device names.

    To verify network connection names:

    nmcli connection
    

    To verify the device name:

    ip link show

    In the event of a mismatch, edit the file /etc/sysconfig/network-scripts/ifcfg-<nic name> using a text editor such as vi.
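
The comparison between connection names and device names can also be automated. The following is a minimal sketch, assuming nmcli's terse output mode (-t) with the NAME and DEVICE fields, which prints one name:device pair per line. The function name find_mismatches is illustrative, and connection names that themselves contain a colon are not handled.

```shell
# Read "name:device" lines on stdin and print every pair where the
# connection name differs from the device it is bound to.
# Connections with no device (inactive) are skipped.
find_mismatches() {
    awk -F: '$2 != "" && $1 != $2 { print $1 " -> " $2 }'
}

# Run against the live system only when nmcli is available.
if command -v nmcli >/dev/null 2>&1; then
    nmcli -t -f NAME,DEVICE connection show | find_mismatches
fi
```

No output means every active connection name already matches its device name.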

  • When discovering nodes via snmpwalk or a mapping file, all target nodes should be set up in PXE mode before running the playbook.

  • Nodes provisioned using the Omnia provision tool do not require a Red Hat subscription to run provision.yml on RHEL target nodes.

  • For RHEL target nodes not provisioned by Omnia, ensure that a Red Hat subscription is enabled on every target node.

  • Users should also ensure that all repos (AppStream, BaseOS and CRB) are available on the RHEL control plane.

Note

  • To enable a repository from your RHEL subscription, run the following commands:

    subscription-manager repos --enable=codeready-builder-for-rhel-8-x86_64-rpms
    subscription-manager repos --enable=rhel-8-for-x86_64-appstream-rpms
    subscription-manager repos --enable=rhel-8-for-x86_64-baseos-rpms
    
  • To enable an offline repository, create a .repo file in /etc/yum.repos.d/. Refer to the sample content below:

    [RHEL-8-appstream]
    name=Red Hat AppStream repo
    baseurl=http://xx.yy.zz/pub/Distros/RedHat/RHEL8/8.6/AppStream/x86_64/os/
    enabled=1
    gpgcheck=0

    [RHEL-8-baseos]
    name=Red Hat BaseOS repo
    baseurl=http://xx.yy.zz/pub/Distros/RedHat/RHEL8/8.6/BaseOS/x86_64/os/
    enabled=1
    gpgcheck=0

    [RHEL-8-crb]
    name=Red Hat CRB repo
    baseurl=http://xx.yy.zz/pub/Distros/RedHat/RHEL8/8.6/CRB/x86_64/os/
    enabled=1
    gpgcheck=0
    
  • Verify your changes by running:

    yum repolist enabled
    Updating Subscription Management repositories.
    Unable to read consumer identity
    This system is not registered with an entitlement server. You can use subscription-manager to register.
        repo id                                                           repo name
        RHEL-8-appstream-partners                                         Red Hat Enterprise Linux 8.6.0 Partners (AppStream)
        RHEL-8-baseos-partners                                            Red Hat Enterprise Linux 8.6.0 Partners (BaseOS)
        RHEL-8-crb-partners                                               Red Hat Enterprise Linux 8.6.0 Partners (CRB)
    
  • Uninstall epel-release if it is installed on the control plane, because Omnia configures epel-release on the control plane itself. To uninstall epel-release, use the following command:

    dnf remove epel-release -y
    
  • Ensure that the pxe_nic and public_nic are in the firewalld zone: public.
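
The zone assignment can be checked with firewall-cmd. The following is a minimal sketch, assuming firewalld is running and using the --get-zone-of-interface query. The NIC names eno1 and eno2 are hypothetical stand-ins for the values of pxe_nic and public_nic, and the function name zone_status is illustrative.

```shell
# Report whether a NIC is in the public zone.
# $1 = NIC name, $2 = zone name reported for that NIC (may be empty).
zone_status() {
    if [ "$2" = "public" ]; then
        echo "$1: ok"
    else
        echo "$1: in zone '${2:-none}', expected 'public'"
    fi
}

# Hypothetical NIC names; substitute the values of pxe_nic and public_nic.
if command -v firewall-cmd >/dev/null 2>&1; then
    for nic in eno1 eno2; do
        zone_status "$nic" "$(firewall-cmd --get-zone-of-interface="$nic" 2>/dev/null)"
    done
fi
```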

Note

  • After configuration and installation of the cluster, changing the control plane is not supported. If you need to change the control plane, you must redeploy the entire cluster.

  • For servers with an existing OS being discovered via BMC, ensure that the first PXE device on the target nodes is the designated active NIC for PXE booting.

If you have any feedback about Omnia documentation, please reach out at omnia.readme@dell.com.