Adding new nodes
Provisioning the new node
A new node can be added in one of the following ways:
If `pxe_mapping_file_path` is provided in `input/provision_config.yml`:

1. Update the existing mapping file by appending the new entry (without disrupting the older entries), or provide a new mapping file by pointing `pxe_mapping_file_path` in `input/provision_config.yml` to the new location. An illustrative entry is sketched after this list.

   Note: When re-running `discovery_provision.yml` with a new mapping file, ensure that existing IPs from the current mapping file are not repeated in the new mapping file. Any IP overlap between mapping files will result in PXE failure, which can only be resolved by running the Clean Up script followed by `discovery_provision.yml`.

2. Run `discovery_provision.yml`:

   ansible-playbook discovery_provision.yml

3. Manually PXE boot the target servers (if `bmc_ip` is not provided in the mapping file) after the `discovery_provision.yml` playbook is executed and the target node is listed as booted in the nodeinfo table.
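For reference, here is a hypothetical mapping file entry. The column layout shown (service tag, hostname, admin MAC, admin IP, BMC IP) may differ between Omnia versions, and all values below are placeholders:

SERVICE_TAG,HOSTNAME,ADMIN_MAC,ADMIN_IP,BMC_IP
ABC1234,node005,aa:bb:cc:dd:ee:ff,10.5.0.105,10.3.0.105

Append new rows at the end of the file; do not modify or reorder the existing entries.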
When target nodes were discovered using BMC:

1. Run `discovery_provision.yml` once the node has joined the cluster using an IP that exists within the provided range:

   ansible-playbook discovery_provision.yml
When target nodes were discovered using `switch_based_details` in `input/provision_config.yml`:

1. Edit or append to the JSON list stored in `switch_based_details` in `input/provision_config.yml` (see the sketch after this list).

   Note:
   * All ports residing on the same switch should be listed in the same JSON list element.
   * Ports configured via Omnia should not be removed from `switch_based_details` in `input/provision_config.yml`.

2. Run `discovery_provision.yml`:

   ansible-playbook discovery_provision.yml

3. Manually PXE boot the target servers after the `discovery_provision.yml` playbook is executed and the target node is listed as booted in the nodeinfo table.
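As a sketch of the expected structure, each element of `switch_based_details` describes one switch (its IP plus the ports to scan); the IPs and port ranges below are placeholder values:

switch_based_details:
   - { ip: 172.96.28.12, ports: '1-48,49:3,50' }
   - { ip: 172.96.28.14, ports: '1,2,3,5' }

Because all ports on the same switch must live in the same list element, extend the `ports` string of the matching element rather than adding a second element for the same switch.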
Verify that the node has been provisioned successfully by checking the Omnia nodeinfo table.
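One way to inspect the nodeinfo table is to query the Omnia database on the control plane; this sketch assumes the default Postgres database (`omniadb`) and the `cluster.nodeinfo` table used by recent Omnia releases:

psql -U postgres -d omniadb -c "SELECT * FROM cluster.nodeinfo;"

The newly added node should appear with a status of booted.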
Adding new compute nodes to the cluster
Insert the new IPs into the existing inventory file, following the examples below.
Existing kubernetes inventory
[kube_control_plane]
10.5.0.101
[kube_node]
10.5.0.102
10.5.0.103
[auth_server]
10.5.0.101
[etcd]
10.5.0.110
Updated kubernetes inventory with the new node information
[kube_control_plane]
10.5.0.101
[kube_node]
10.5.0.102
10.5.0.103
10.5.0.105
10.5.0.106
[auth_server]
10.5.0.101
[etcd]
10.5.0.110
Existing Slurm inventory
[slurm_control_node]
10.5.0.101
[slurm_node]
10.5.0.102
10.5.0.103
[login]
10.5.0.104
[auth_server]
10.5.0.101
Updated Slurm inventory with the new node information
[slurm_control_node]
10.5.0.101
[slurm_node]
10.5.0.102
10.5.0.103
10.5.0.105
10.5.0.106
[login]
10.5.0.104
[auth_server]
10.5.0.101
In the above examples, nodes 10.5.0.105 and 10.5.0.106 have been added to the cluster as compute nodes.
Note
* The `[etcd]` group only supports an odd number of servers.
* Do not change the kube_control_plane/slurm_control_node/auth_server entries in the existing inventory. Simply add the new node information to the kube_node/slurm_node group.
* When re-running `omnia.yml` to add a new node, ensure that `input/security_config.yml` and `input/omnia_config.yml` are not edited between runs.
To install security, job scheduler, and storage tools (NFS, BeeGFS) on the new node, run `omnia.yml`:

ansible-playbook omnia.yml -i inventory
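Once the playbook completes, you can confirm that the new compute nodes have joined the cluster using standard scheduler commands (these are generic Kubernetes/Slurm commands, not Omnia-specific):

kubectl get nodes    # run on the kube_control_plane; new kube_node entries should report Ready
sinfo                # run on the slurm_control_node; new slurm_node entries should appear in the partition list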
If you have any feedback about Omnia documentation, please reach out at omnia.readme@dell.com.