ROCm/container-toolkit

Overview

AMD Container Toolkit offers tools to streamline the use of AMD GPUs with containers. The toolkit includes the following packages.

  • amd-container-runtime - The AMD Container Runtime
  • amd-ctk - The AMD Container Toolkit CLI

Requirements

  • Ubuntu 22.04 or 24.04, or RHEL/CentOS 9
  • Docker version 25 or later
  • All 'amd-ctk runtime configure' commands must be run as root or with sudo

Note: Docker Desktop on Linux is not supported for GPU workloads; see the troubleshooting documentation for more information.
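
Before installing, you can confirm the prerequisites with a quick check (a minimal sketch; the exact output depends on the system):

# Expect Ubuntu 22.04, Ubuntu 24.04, or RHEL/CentOS 9
cat /etc/os-release

# Expect Docker Engine version 25 or later
docker version --format '{{.Server.Version}}'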

Quick Start

Install the AMD Container Toolkit.

Installing on Ubuntu

To install the AMD Container Toolkit on Ubuntu systems, follow these steps:

  1. Ensure the prerequisites are installed:

    apt update && apt install -y wget gnupg2
  2. Add the GPG key for the repository:

    wget https://repo.radeon.com/rocm/rocm.gpg.key -O - | gpg --dearmor | tee /etc/apt/keyrings/rocm.gpg > /dev/null
  3. Add the repository to the system. Replace noble with jammy when using Ubuntu 22.04:

    echo "deb [signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amd-container-toolkit/apt/ noble main" > /etc/apt/sources.list.d/amd-container-toolkit.list
  4. Update the package list and install the toolkit:

    apt update && apt install amd-container-toolkit
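
Optionally, verify the installation (a quick sanity check; package and binary names as installed above):

# Confirm the package is installed
dpkg -s amd-container-toolkit

# Confirm the toolkit binaries are on the PATH
which amd-ctk amd-container-runtime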

Installing on RHEL/CentOS 9

To install the AMD Container Toolkit on RHEL/CentOS 9 systems, follow these steps:

  1. Add the repository configuration:

    tee --append /etc/yum.repos.d/amd-container-toolkit.repo <<EOF
    [amd-container-toolkit]
    name=amd-container-toolkit
    baseurl=https://repo.radeon.com/amd-container-toolkit/el9/main/
    enabled=1
    priority=50
    gpgcheck=1
    gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
    EOF
  2. Clean the package cache and install the toolkit:

    dnf clean all
    dnf install -y amd-container-toolkit
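
Optionally, verify the installation (a quick sanity check):

# Confirm the package is installed
rpm -q amd-container-toolkit

# Confirm the CLI is on the PATH
which amd-ctk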

Configuring Docker

  1. Configure the AMD container runtime for Docker. The following command modifies the Docker configuration file, /etc/docker/daemon.json, so that Docker can use the AMD container runtime.

    > sudo amd-ctk runtime configure
    
  2. Restart the Docker daemon.

    > sudo systemctl restart docker
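
After restarting, you can confirm that Docker has registered the amd runtime (a minimal check; the output format depends on the Docker version):

# The configured runtimes should include an "amd" entry
docker info --format '{{json .Runtimes}}'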
    

Docker Runtime Integration

  1. Configure Docker to use the AMD container runtime.
> amd-ctk runtime configure --runtime=docker
  2. Specify the required GPUs. There are three ways to do this.

    1. Using AMD_VISIBLE_DEVICES environment variable

      • To use all available GPUs,
      > docker run --rm --runtime=amd -e AMD_VISIBLE_DEVICES=all rocm/rocm-terminal rocm-smi
      
      • To use a subset of available GPUs,
      > docker run --rm --runtime=amd -e AMD_VISIBLE_DEVICES=0,1,2 rocm/rocm-terminal rocm-smi
      
      • To use ranges of contiguously numbered GPUs (optionally combined with individual indices),
      > docker run --rm --runtime=amd -e AMD_VISIBLE_DEVICES=0-3,5,8 rocm/rocm-terminal rocm-smi
      
    2. Using CDI style

      • First, generate the CDI spec.
      > amd-ctk cdi generate --output=/etc/cdi/amd.json
      
      • Validate the generated CDI spec.
      > amd-ctk cdi validate --path=/etc/cdi/amd.json
      
      • To use all available GPUs,
      > docker run --rm --device amd.com/gpu=all rocm/rocm-terminal rocm-smi
      
      • To use a subset of available GPUs,
      > docker run --rm --device amd.com/gpu=0 --device amd.com/gpu=1 rocm/rocm-terminal rocm-smi
      
      • Note that once the CDI spec, /etc/cdi/amd.json, is available, --runtime=amd is not required in the docker run command.
    3. Using explicit device paths. Note that --runtime=amd is not required here.

    > docker run --device /dev/kfd --device /dev/dri/renderD128 --device /dev/dri/renderD129 rocm/rocm-terminal rocm-smi
    
  3. List available GPUs. If this command is run as root, the container-toolkit logs go to /var/log/amd-container-runtime.log; otherwise they go to the user's home directory (see the log-viewing sketch after this list).

> amd-ctk cdi list
Found 1 AMD GPU device
amd.com/gpu=all
amd.com/gpu=0
  /dev/dri/card1
  /dev/dri/renderD128
  4. Make the AMD container runtime the default runtime. Setting it as the default for Docker avoids having to specify the --runtime=amd option with each docker run command.
> amd-ctk runtime configure --runtime=docker --set-as-default
  5. Remove the AMD container runtime as the default runtime.
> amd-ctk runtime configure --runtime=docker --unset-as-default
  6. Remove the AMD container runtime configuration from Docker (undo the earlier configuration).
> amd-ctk runtime configure --runtime=docker --remove
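
If a container fails to start with the AMD runtime, the runtime log mentioned in step 3 is the first place to look (a sketch; this path assumes the command was run as root):

# Follow the AMD container runtime log while reproducing the issue
sudo tail -f /var/log/amd-container-runtime.log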

Device discovery and enumeration

The following command lists the GPUs available on the system and how they are enumerated. The GPUs are listed in CDI format, but the same enumeration applies when GPUs are specified through the OCI environment variable, AMD_VISIBLE_DEVICES.

> amd-ctk cdi list
Found 1 AMD GPU device
amd.com/gpu=all
amd.com/gpu=0
  /dev/dri/card1
  /dev/dri/renderD128

GPU UUID Support

The AMD Container Toolkit supports GPU selection using unique identifiers (UUIDs) in addition to device indices. This enables more precise and reliable GPU targeting, especially in multi-GPU systems and orchestrated environments.

Getting GPU UUIDs

GPU UUIDs can be obtained using different tools:

Using ROCm SMI

rocm-smi --showuniqueid

This will display output similar to:

GPU[0]          : Unique ID: 0xef2c1799a1f3e2ed
GPU[1]          : Unique ID: 0x1234567890abcdef

Using AMD-SMI

The amd-smi tool can also be used to get the ASIC_SERIAL, which serves as the GPU UUID:

amd-smi static -aB

This will display output similar to:

GPU: 0
    ASIC:
        MARKET_NAME: AMD Instinct MI210
        VENDOR_ID: 0x1002
        VENDOR_NAME: Advanced Micro Devices Inc. [AMD/ATI]
        SUBVENDOR_ID: 0x1002
        DEVICE_ID: 0x740f
        SUBSYSTEM_ID: 0x0c34
        REV_ID: 0x02
        ASIC_SERIAL: 0xD1CC3F11CFDD5112
        OAM_ID: N/A
        NUM_COMPUTE_UNITS: 104
        TARGET_GRAPHICS_VERSION: gfx90a
    BOARD:
        MODEL_NUMBER: 102-D67302-00
        PRODUCT_SERIAL: 692231000131
        FRU_ID: 113-HPED67302000B.009
        PRODUCT_NAME: Instinct MI210
        MANUFACTURER_NAME: AMD

Use the ASIC_SERIAL value (e.g., 0xD1CC3F11CFDD5112) as the GPU UUID in container configurations.
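
For scripting, the UUIDs can be pulled out of the amd-smi output directly (a minimal sketch based on the field name shown above):

# Print only the ASIC_SERIAL (GPU UUID) lines
amd-smi static -aB | grep ASIC_SERIAL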

Using UUIDs with Environment Variables

Both AMD_VISIBLE_DEVICES and DOCKER_RESOURCE_* environment variables support UUID specification:

Using AMD_VISIBLE_DEVICES

# Use specific GPUs by UUID
docker run --rm --runtime=amd \
  -e AMD_VISIBLE_DEVICES=0xef2c1799a1f3e2ed,0x1234567890abcdef \
  rocm/dev-ubuntu-24.04 rocm-smi

# Mix device indices and UUIDs
docker run --rm --runtime=amd \
  -e AMD_VISIBLE_DEVICES=0,0xef2c1799a1f3e2ed \
  rocm/dev-ubuntu-24.04 rocm-smi

Using DOCKER_RESOURCE_* Variables

# Docker Swarm generic resource format
docker run --rm --runtime=amd \
  -e DOCKER_RESOURCE_GPU=0xef2c1799a1f3e2ed \
  rocm/dev-ubuntu-24.04 rocm-smi

Docker Swarm Integration

GPU UUID support significantly improves Docker Swarm deployments by enabling precise GPU allocation across cluster nodes.

Docker Daemon Configuration for Swarm

Configure each swarm node's Docker daemon with GPU resources in /etc/docker/daemon.json:

{
  "default-runtime": "amd",
  "runtimes": {
    "amd": {
      "path": "amd-container-runtime",
      "runtimeArgs": []
    }
  },
  "node-generic-resources": [
    "AMD_GPU=0x378041e1ada6015",
    "AMD_GPU=0xef39dad16afb86ad",
    "GPU_COMPUTE=0x583de6f2d99dc333"
  ]
}

After updating the configuration, restart the Docker daemon:

sudo systemctl restart docker
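
From a manager node, you can confirm that the advertised generic resources were picked up (a sketch; replace <node-name> with a name from docker node ls):

# List swarm nodes, then inspect the generic resources advertised by one node
docker node ls
docker node inspect <node-name> --format '{{json .Description.Resources.GenericResources}}'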

Service Definition

Deploy services with specific GPU requirements using docker-compose:

Using generic resources:

# docker-compose.yml for Swarm deployment
version: '3.8'
services:
  rocm-service:
    image: rocm/dev-ubuntu-24.04
    command: rocm-smi
    deploy:
      replicas: 1
      resources:
        reservations:
          generic_resources:
            - discrete_resource_spec:
                kind: 'AMD_GPU'  # Matches daemon.json key
                value: 1

Using environment variables:

# docker-compose.yml for Swarm deployment with environment variable
version: '3.8'
services:
  rocm-service:
    image: rocm/dev-ubuntu-24.04
    command: rocm-smi
    environment:
      - AMD_VISIBLE_DEVICES=all
    deploy:
      replicas: 1

Deploy the service:

docker stack deploy -c docker-compose.yml rocm-stack
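
Once deployed, you can check that the service was scheduled and view its output (a sketch; the service name follows Docker's <stack>_<service> naming for the example above):

# Confirm the service is running and read the rocm-smi output from its task
docker service ls
docker service logs rocm-stack_rocm-service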

GPU Tracker

GPU Tracker is an optional, lightweight feature that tracks which containers use which GPUs and lets users set GPUs to shared or exclusive access. It is disabled by default; run amd-ctk gpu-tracker enable to turn it on, and use the status and reset subcommands to query or clear the tracked state. It applies only to containers started with docker run and AMD_VISIBLE_DEVICES.
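
A minimal workflow using the subcommands mentioned above might look like this (a sketch; whether root privileges are required is not covered here):

# Turn tracking on, then query which containers are using which GPUs
amd-ctk gpu-tracker enable
amd-ctk gpu-tracker status

# Clear the tracked state
amd-ctk gpu-tracker reset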

For full usage, examples, and limitations, see the GPU Tracker documentation in this repo, or the official documentation.

Release notes

v1.2.0
  • Features: GPU Tracker feature support; Docker Swarm support
  • Known issues: None

v1.1.0
  • Features: GPU partitioning support; full RPM package support; support for the range operator in the input string to the AMD_VISIBLE_DEVICES environment variable
  • Known issues: None

v1.0.0
  • Features: Initial release
  • Known issues: Partitioned GPUs are not supported; RPM builds are experimental

Building from Source

To build the Debian package, use the following commands.

make
make pkg-deb

To build the RPM package, use the following commands.

make build-dev-container-rpm
make pkg-rpm

The packages will be generated in the bin folder.
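
The build output can be confirmed with a directory listing (exact file names depend on the version being built):

# Packages produced by the make targets above
ls bin/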

Documentation

For detailed documentation including installation guides and configuration options, see the documentation.

License

This project is licensed under the Apache 2.0 License - see the LICENSE file for details.
