Building clusters

  1. In the input/omnia_config.yml, input/security_config.yml, input/telemetry_config.yml and [optional] input/storage_config.yml files, provide the required details.

  2. Create an inventory file in the omnia folder. Check out the sample inventory for more information. If a hostname is used to refer to the target nodes, ensure that the domain name is included in the entry. IP addresses are also accepted in the inventory file.

Hostname requirements
  • The hostname should not contain the following characters: , (comma), . (period) or _ (underscore). However, commas and periods are allowed in the domain name.

  • The hostname cannot start or end with a hyphen (-).

  • No upper case characters are allowed in the hostname.

  • The hostname cannot start with a number.

  • The combined length of the hostname and the domain name (that is: hostname00000x.domain.xxx) cannot exceed 64 characters. Omnia appends 6 digits to the node_name to give each target node a unique hostname. For example, if the node_name provided in input/provision_config.yml is ‘node’, and the domain_name provided is ‘omnia.test’, Omnia will set the hostname of a target cluster node to ‘node000001.omnia.test’.
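The hostname rules above can be checked before an inventory is written. The following is a minimal sketch in Python, not part of Omnia: the function names build_fqdn and hostname_problems are hypothetical, and the sketch assumes that everything before the first period is the hostname and the rest is the domain name.

```python
import re

MAX_FQDN_LENGTH = 64   # hostname plus domain name, per the rule above
SUFFIX_DIGITS = 6      # Omnia appends 6 digits to node_name


def build_fqdn(node_name, domain_name, index):
    """Return a name shaped like 'node000001.omnia.test' (hypothetical helper)."""
    return f"{node_name}{index:0{SUFFIX_DIGITS}d}.{domain_name}"


def hostname_problems(fqdn):
    """Return a list of rule violations; an empty list means none were found."""
    # Assumption: the hostname is everything before the first period, so a
    # period inside the hostname cannot occur after this split.
    hostname, _, domain = fqdn.partition(".")
    problems = []
    if not domain:
        problems.append("domain name is missing")
    if not hostname:
        problems.append("hostname is empty")
        return problems
    if re.search(r"[,_]", hostname):
        problems.append("hostname contains a comma or underscore")
    if hostname.startswith("-") or hostname.endswith("-"):
        problems.append("hostname starts or ends with a hyphen")
    if any(ch.isupper() for ch in hostname):
        problems.append("hostname contains upper case characters")
    if hostname[0].isdigit():
        problems.append("hostname starts with a number")
    if len(fqdn) > MAX_FQDN_LENGTH:
        problems.append("hostname and domain name exceed 64 characters")
    return problems


if __name__ == "__main__":
    candidate = build_fqdn("node", "omnia.test", 1)
    print(candidate, hostname_problems(candidate))
```

An empty list means that none of the listed rules was violated. The sketch does not check that the name resolves or that it is otherwise valid for DNS.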

Note

  • Red Hat nodes that are not configured by Omnia need to have a valid subscription. To set up a subscription, click here.

  • Omnia creates a log file which is available at: /var/log/omnia.log.

  • If only Slurm is being installed on the cluster, docker credentials are not required.

  3. omnia.yml is a wrapper playbook comprising:

    1. security.yml: This playbook sets up centralized authentication (LDAP/FreeIPA) on the cluster. For more information, click here.

    2. storage.yml: This playbook sets up storage tools like BeeGFS and NFS.

    3. scheduler.yml: This playbook sets up job schedulers (Slurm or Kubernetes) on the cluster.

    4. telemetry.yml: This playbook sets up Omnia telemetry and/or iDRAC telemetry. It also installs Grafana and Loki as Kubernetes pods.

To run omnia.yml:

ansible-playbook omnia.yml -i inventory

Note

  • For a Kubernetes cluster installation, ensure that the inventory includes an [etcd] entry. etcd is a consistent and highly available key-value store used as Kubernetes’ backing store for all cluster data. For more information, click here.

  • If you want to view or edit the omnia_config.yml file, run one of the following commands:

    • ansible-vault view omnia_config.yml --vault-password-file .omnia_vault_key – To view the file.

    • ansible-vault edit omnia_config.yml --vault-password-file .omnia_vault_key – To edit the file.

  • Use the ansible-vault view or edit commands and not the ansible-vault decrypt or encrypt commands. If you have already used the ansible-vault decrypt or encrypt commands, set the permissions of the parameter files to 644.
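The [etcd] requirement noted above can be verified before running omnia.yml on a Kubernetes cluster. The following is a minimal sketch, not part of Omnia, that scans an INI-style inventory for a non-empty [etcd] group. The function names, the sample addresses and the group names other than etcd are hypothetical, and sections such as [group:children] or [group:vars] are deliberately ignored.

```python
def parse_inventory_groups(text):
    """Map each plain [group] header in an INI-style inventory to its hosts."""
    groups = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            name = line[1:-1].strip()
            # Sections such as [name:vars] or [name:children] hold no hosts.
            current = None if ":" in name else name
            if current is not None:
                groups.setdefault(current, [])
        elif current is not None:
            # Keep only the host part; anything after it is per-host variables.
            groups[current].append(line.split()[0])
    return groups


def has_etcd_entry(text):
    """Return True if the inventory has an [etcd] group with at least one host."""
    return bool(parse_inventory_groups(text).get("etcd"))


# Hypothetical inventory; only the [etcd] group name comes from the note above.
SAMPLE_INVENTORY = """
[kube_control_plane]
10.5.0.101

[kube_node]
10.5.0.102
node000003.omnia.test

[etcd]
10.5.0.101
"""

if __name__ == "__main__":
    print("etcd entry present:", has_etcd_entry(SAMPLE_INVENTORY))
```

To check a real file, read it with open("inventory").read() and pass the text to has_etcd_entry.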

Setting up a shared home directory

[Image: UserHomeDirectory.jpg]

Users wanting to set up a shared home directory for the cluster can do it in one of two ways:

  • Using the head node as an NFS host: Set enable_omnia_nfs (input/omnia_config.yml) to true and, in omnia_usrhome_share (input/omnia_config.yml), provide the share path to be configured on all nodes. During the execution of omnia.yml, the NFS share will be set up for access by all cluster nodes.

  • Using an external filesystem: Configure the external file storage using storage.yml. Set enable_omnia_nfs (input/omnia_config.yml) to false and provide the external share path in omnia_usrhome_share (input/omnia_config.yml). Run omnia.yml to configure access to the external share for deployments.

Slurm job-based user access

To ensure security while running jobs on the cluster, users can be assigned permissions to access cluster nodes only while their jobs are running. To enable the feature:

cd scheduler
ansible-playbook job_based_user_access.yml -i inventory

Note

  • The inventory used in the above command must be created by the user before running omnia.yml, because scheduler.yml is invoked by omnia.yml.

  • Only users added to the ‘slurm’ group can execute Slurm jobs. To add users to the group, use the command: usermod -a -G slurm <username>.

Configuring UCX and OpenMPI on the cluster

If a local repository for UCX and OpenMPI has been configured on the cluster, the following configurations take place when running omnia.yml or scheduler.yml.

  • UCX will be compiled and installed on the NFS share (based on the client_share_path provided in the nfs_client_params in input/storage_config.yml).

  • If the cluster uses Slurm and UCX, OpenMPI is configured to compile with UCX and Slurm on the NFS share (based on the client_share_path provided in the nfs_client_params in input/storage_config.yml).

  • All corresponding compiled UCX and OpenMPI files will be saved to the <client_share_path>/compile directory on the NFS share.

  • All corresponding UCX and OpenMPI executables will be saved to the <client_share_path>/benchmarks/ directory on the NFS share.
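To confirm where the UCX and OpenMPI output landed, the two directories named above can be derived from the client_share_path. The following is a minimal sketch, not part of Omnia: the function names are hypothetical, and the share path /mnt/omnia_share is an assumed example value, not a default.

```python
from pathlib import Path


def expected_dirs(client_share_path):
    """Return the two output directories described above for a given share path."""
    share = Path(client_share_path)
    return {
        "compiled files": share / "compile",
        "executables": share / "benchmarks",
    }


def report(client_share_path):
    """Return, for each expected directory, whether it exists on this node."""
    return {
        label: path.is_dir()
        for label, path in expected_dirs(client_share_path).items()
    }


if __name__ == "__main__":
    # Assumed example value; use the client_share_path from
    # input/storage_config.yml instead.
    share_path = "/mnt/omnia_share"
    for label, path in expected_dirs(share_path).items():
        print(f"{label}: {path}")
    print(report(share_path))
```

Running the sketch on a node that mounts the NFS share shows whether both directories are present after omnia.yml or scheduler.yml has finished.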

If you have any feedback about Omnia documentation, please reach out at omnia.readme@dell.com.