Support for Placement Groups #32

@ghost

Description

As part of Terraform/#63 (AWS EFA support), support for AWS placement groups is required. I've been contemplating this a bit recently, as placement groups (AWS, Azure) and GCP Group Placement Policies are somewhat important for good performance with certain HPC jobs.

Placement groups are a great match for a single HPC job, or for a static set of nodes. They're not really conducive to very elastic environments, or to environments where you may mix & match instance types. While they can work there, you're more likely to hit capacity issues and instances failing to launch.

There are also some restrictions that are challenging to support:

  • On GCP, Group Placement Policies are limited to C2 node types (which aren't really supported by CitC yet), and to at most 22 VM instances. The number of instances that will be in the Group Placement Policy must be set when creating the policy.
  • On AWS, cluster placement groups don't support all VM types (e.g., burstable vCPU (T-series) and Mac instances).

Thus, placement groups need to be an optional feature, and it would be nice to treat AWS and GCP similarly, even though they have different restrictions.

I don't believe that we can create the placement groups as part of the Terraform process: at that point, limits.yaml doesn't exist, and we don't know how big the cluster could get (which matters for GCP, where the policy size is fixed at creation).

I don't believe that we can create the placement groups as part of the SLURM ResumeProgram call to startnode.py, as that call isn't directly linked to a single job. Creating a group for every startnode call would get messy, since the nodes don't all terminate at a set time, so cleanup becomes a challenge. That said, I do believe startnode ought to change so that all the nodes SLURM wishes to start at once are launched in a single API call: the cloud scheduler is more likely to find space for the whole set, placed compactly (in the placement group), if they are all started in one call.
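For illustration, here's a minimal sketch of what such a batched launch could look like on AWS with boto3. The function name, the launch-template parameter, and the all-or-nothing choice are my assumptions, not existing CitC code:

```python
import boto3

ec2 = boto3.client("ec2")

def start_nodes_batched(node_count, instance_type, launch_template_id,
                        placement_group=None):
    """Launch a whole set of nodes in a single RunInstances call.

    Hypothetical helper: MinCount == MaxCount makes the request
    all-or-nothing, so EC2 must find capacity for the full set
    (compactly, when a placement group is given).
    """
    params = {
        "LaunchTemplate": {"LaunchTemplateId": launch_template_id},
        "InstanceType": instance_type,
        "MinCount": node_count,
        "MaxCount": node_count,
    }
    if placement_group:
        params["Placement"] = {"GroupName": placement_group}
    return ec2.run_instances(**params)
```

With MinCount == MaxCount the request either places the whole set or fails fast, which seems like the behaviour you'd want before falling back to launching without the group.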

Suggested course

I'm currently thinking that update_config.py is our best spot for creating placement groups. Each call to update_config could clean up/terminate the existing placement groups that belong to our ${cluster_id}, and create new placement group(s).
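As a rough sketch of the cleanup half on the AWS side (the cluster tag key and the helper name are assumptions on my part):

```python
import boto3

ec2 = boto3.client("ec2")

def cleanup_placement_groups(cluster_id):
    """Delete placement groups previously created for this cluster.

    Assumes each group was tagged cluster=<cluster_id> at creation;
    DeletePlacementGroup fails if a group still contains instances.
    """
    resp = ec2.describe_placement_groups(
        Filters=[{"Name": "tag:cluster", "Values": [cluster_id]}]
    )
    for group in resp["PlacementGroups"]:
        ec2.delete_placement_group(GroupName=group["GroupName"])
```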

I feel like creating a placement group per shape defined in limits.yaml makes the most sense. That way we would, for example, group C5n instances together and C6gn instances together, rather than asking AWS to find a way to compactly mix ARM and x86 instances.
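The creation half might then look something like this; the limits.yaml layout, the naming scheme, and shape_supports_placement (sketched further below) are all assumptions:

```python
import boto3
import yaml

ec2 = boto3.client("ec2")

def create_placement_groups(cluster_id, limits_path="limits.yaml"):
    """Create one cluster placement group per shape in limits.yaml.

    Assumes limits.yaml maps shape names (e.g. c5n.18xlarge) to limits.
    """
    with open(limits_path) as f:
        limits = yaml.safe_load(f)
    for shape in limits:
        if not shape_supports_placement(shape):  # hypothetical check, see below
            continue
        ec2.create_placement_group(
            GroupName=f"{cluster_id}-{shape}",
            Strategy="cluster",
            TagSpecifications=[{
                "ResourceType": "placement-group",
                "Tags": [{"Key": "cluster", "Value": cluster_id}],
            }],
        )
```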

We would also want to update startnode to add the placement policy to the instance starts, in the cases where a placement group was created (i.e., we wouldn't create them for AWS t3a instances, as they're burstable, or for n1 instances on GCP).
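The eligibility check itself could be as simple as a prefix test; the list below is illustrative, not exhaustive, and only covers the AWS side:

```python
# Instance families that don't support cluster placement groups on AWS
# (illustrative, not exhaustive: burstable T-series and Mac instances).
UNSUPPORTED_PREFIXES = ("t2.", "t3.", "t3a.", "t4g.", "mac")

def shape_supports_placement(shape):
    """Return True if a cluster placement group should be used for this shape."""
    return not shape.startswith(UNSUPPORTED_PREFIXES)
```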

Is there already work in progress to support Placement Groups? If not, does my suggested course of action seem reasonable? I can work on this, and offer patches, but I wanted to make sure that the plan seems reasonable to the core team first.
